Web Scraping in Power BI

Scraping data from a website in Power BI is SO EASY. Let me say it one more time. Grabbing data from a website’s HTML in Power BI is so ridiculously easy it’ll make you mad. After I imported my first data from a website using Power BI I thought about all of the times that I had written R scripts to grab some data and exported it to CSV to grab with Tableau and it made me laugh.

In this blog I am going to show you how simple it is to scrape data from the web with Power BI using my Bulls Dynasty dashboard (below) as an example.

Step 1. Connecting to Data

To start, open up Power BI and click on Transform Data to open Power Query Editor. Now click, New Source. This will give us a few different options to connect by default, but you can click More at the bottom of the list to see an expanded list of options.

In this example we will be pulling data directly from basketball-reference.com. In order to do this we need to select the Web option under Other. When you hover over this option it says, “Import data from a Web page.” Sounds just like what we want to do! Click Connect.

Once you’ve clicked Connect you’ll find the screen below giving you two options, Basic or Advanced. Today we will just be using the Basic option. I know it seems like it should be harder than this, but just input the url from the website you’re trying to get data from and paste it in the bar and click Ok. For this example we are going to pull data for the 1994-95 Chicago Bulls using the url below:

https://www.basketball-reference.com/teams/CHI/1995.html

Step 2. Importing Data

When building an R script to do a similar task it’s important to know what tables you’re referring to in the HTML, but luckily Power BI separates out these tables for us. If you see below, Power BI returns the different sections of the web page allowing us to just import the data that we are interested in. For this use case, I am wanting just the Per Game Table, so I’m going to click on that box and press Ok.

Step 3. Transforming Data

Now that my data has been brought into my Power Query Editor I want to do some basic manipulation to make sure that my data model is set up appropriately.

First, I want to double click on the second column header which is currently blank and rename it Player. I then want to add a column to indicate what season these stats are for. Select the Add Column ribbon at the top and then select Custom Column. Name your column Year and make the value 1995. Click Ok.

Next, go to the left hand side and rename your sheet in accordance to the year and then add in another query for the 1996 season. This process should be the same process as importing 1995 except for the one character difference in the url. In my use case I created 8 queries, 1 for each of the seasons I wanted to look at.

Once you have all of the seasons that you want to include in your data analysis, you will need a way to union all of this data to create your final data set. Luckily, Power BI gives us an easy way to do that as well! Highlight all of your desired queries and the go to the top right of the Home ribbon and click the Append Queries drop down. Since I wanted to keep all of the original queries, I clicked Append Queries as New. This will create a new query table for you on the left hand side.

Once this has been done, click Close and Apply in the top left to get ready to visualize your data!

Step 4. Visualize Data

This was the easiest but most fun part of the process. I chose the Ribbon chart in the visualizations pane to represent this data as I felt that I wanted to show the ebb and flow of certain measures over the years. I placed the Year column that we made in the Axis field, the Player field in the Legend, and our desired measure, such as points per game shown below, in the Value field.

Using text boxes with 100% transparent backgrounds and manually placing them, I created data labels for our Ribbon chart. The trophy icons are just transparent images spaced manually.

Thanks for reading! I am having a lot of fun blogging about Power BI as I learn so hopefully you had some fun and learned a little bit too. My hope is that this blog showed you how easy it is to import web data in Power BI. If you have any questions, comments, or concerns please feel free to email me at spencer.baucke@tessellationconsulting.

3 comments

  1. Great post Spencer and thanks for the awesome break-down. After I was done, I did a little experiment to see if I could do something similar with SEMRush historical rank data over time (using the ribbon). Pretty amazing – especially with the ability to filter by keywords and see clear +/- trends over time.

Leave a Reply