If you’ve worked in the wide and diverse field of information technology for almost any amount of time, it probably hasn’t taken you long to discover that the one constant in IT is that the technologies and strategies involved change faster than you can learn them. And if you work in business intelligence like I do, you don’t have to look very far at all to see change. The Microsoft Power BI team rolls out a software update every month! If I want to stay current on the technology, I really have to be on top of things.
New to Power BI? Start here!
About ten years ago, when Hadoop was first being developed at Yahoo, I don’t think anyone could have anticipated the size of the ripples (more like cannonball-sized splashes) that access to Big Data could and would make in the IT industry. Hadoop (and other advances in hardware and software technologies) gave us something we never had before: the ability to access and report on data in real time on a scale never previously imagined. This gives an organization a unique way to identify and understand trends and patterns in the data and gain previously unknown insights. The organizations that are able to leverage big data will be the organizations that leave their competition in the dust.
Set Up and Configure the Hortonworks Sandbox in Azure
Not only does Power BI Desktop give us the ability to connect to the Hadoop Distributed File System (HDFS) for reporting, we can also mash it up with other, more traditional and structured data sources with minimal effort. But that’s not what this blog post is all about. This post is about setting up a virtual machine in Azure running Hadoop and connecting to our Hortonworks Sandbox with Power BI Desktop :).
The first thing to do if you don’t have access to a Hadoop cluster is to set up the Hortonworks Sandbox on Azure. The good news is it’s free (for the duration of the trial) and it’s super easy. Just follow the instructions at this link to set up the Hortonworks Sandbox.
Once that’s set up, you’ll need to add a mapping between the sandbox’s IP address and its host name to your hosts file. Devin Knight has a blog post on this that you’ll find helpful.
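For reference, a hosts file entry is just an IP address followed by the host name on a single line. Here’s a minimal Python sketch that builds such a line. The IP address below is a placeholder from a documentation-only range and sandbox.hortonworks.com is an assumed host name, so swap in the public IP of your own Azure VM and whatever host name your sandbox reports. On Windows the hosts file lives at C:\Windows\System32\drivers\etc\hosts, and you’ll need administrator rights to edit it.

```python
import ipaddress


def hosts_entry(ip: str, hostname: str) -> str:
    """Return one hosts-file line that maps hostname to ip."""
    # ip_address() raises ValueError if the address is malformed.
    ipaddress.ip_address(ip)
    if not hostname or any(ch.isspace() for ch in hostname):
        raise ValueError("hostname must be a single non-empty token")
    # A hosts-file line is the address, whitespace, then the name.
    return f"{ip}\t{hostname}"


# Placeholder values (assumptions): replace with your VM's public IP
# and the host name your sandbox actually uses.
SANDBOX_IP = "203.0.113.10"
SANDBOX_HOST = "sandbox.hortonworks.com"

print(hosts_entry(SANDBOX_IP, SANDBOX_HOST))
```

Copy the printed line to the bottom of your hosts file and save it. The sketch only formats and validates the entry; it deliberately doesn’t touch the hosts file itself.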
Connecting to HDFS with Power BI Desktop
Once your Hortonworks Sandbox is set up, you’re ready to set up your connection to Hadoop. Start up Power BI Desktop and click Get Data. Scroll down, select Hadoop File (HDFS), and click Connect.
From there you can follow the rest of the wizard to load the data into the semantic model.
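If the connection gives you trouble, it can help to check the sandbox outside of Power BI first. My understanding is that the Hadoop File (HDFS) connector talks to the cluster over the WebHDFS REST API, so here’s a rough Python sketch that builds a WebHDFS directory-listing URL and pulls the file names out of a response. The host name, port 50070 (the default NameNode web port on Hadoop 2.x), and the sample JSON are all assumptions for illustration. The JSON is hand-written in the documented LISTSTATUS shape, not captured from a real sandbox.

```python
import json
from urllib.parse import quote


def liststatus_url(host: str, path: str = "/", port: int = 50070) -> str:
    """Build the WebHDFS URL that lists the contents of an HDFS directory."""
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?op=LISTSTATUS"


def file_names(response_text: str) -> list:
    """Return the names of plain files in a LISTSTATUS JSON response."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses if s["type"] == "FILE"]


# Hand-written sample response (assumption), trimmed to the fields used here.
SAMPLE_RESPONSE = """
{"FileStatuses": {"FileStatus": [
  {"pathSuffix": "cdrs.txt", "type": "FILE", "length": 1024},
  {"pathSuffix": "tmp", "type": "DIRECTORY", "length": 0}
]}}
"""

# Assumed host name; use the one you mapped in your hosts file.
print(liststatus_url("sandbox.hortonworks.com"))
print(file_names(SAMPLE_RESPONSE))
```

Pasting the printed URL into a browser is a quick way to confirm that the host name resolves and the sandbox is answering before you go back to the wizard.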
Once the data is loaded, you’ll need to modify the query to navigate to the data you wish to use in your model.
In Power BI Desktop, go to the Home ribbon and click Edit Queries.
Next, filter the Extension column to show only the rows where the extension is .txt.
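If it helps to picture that filter in code, here’s a small Python sketch that keeps only the names ending in .txt. Apart from cdrs.txt, the file names are made up for the example, and ignoring case is a choice I made for the sketch. Treat it as an analogy for the Extension filter, not a description of what Power BI runs internally.

```python
from pathlib import PurePosixPath


def with_extension(names, ext=".txt"):
    """Keep only the names whose extension matches ext (case-insensitive)."""
    return [n for n in names if PurePosixPath(n).suffix.lower() == ext.lower()]


# Hypothetical listing; only cdrs.txt comes from the sandbox in this post.
listing = ["cdrs.txt", "README.md", "logs.TXT", "data"]

print(with_extension(listing))
```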
You’ll see in the Hortonworks Sandbox that there are a bunch of different data samples we can poke around in, but for this example I’m going to use the cdrs.txt file, which is some sample call detail record data that any telecommunications company might have. Click the Binary value in the Content column next to cdrs.txt to drill down into the contents of the file. Now we’re ready to party! Click Close & Load in the top left of the Query Editor when you’re done making any transformations.
Once you click Close & Load, your data will be loaded into the model and you’re ready to start building dashboards in Power BI Desktop or begin mashing up your big data with other data sources.
And in case you were wondering, here’s what the dashboard looks like on my Windows phone (Nokia Lumia 1520):
Resources
To create the Hortonworks Sandbox in Azure, follow these instructions: http://hortonworks.com/blog/hortonworks-sandbox-azure/
Watch this video on the latest version of the Power BI Desktop
If you want a complete walk through of Power BI Desktop and the new visualization types, check out these blog posts:
Devin Knight has a good blog on adding the mapping of the IP and host name to your hosts file: http://devinknightsql.com/2014/05/02/power-query-to-hdfs-remote-name-could-not-be-resolved/
Feedback?
So what’s your take on Big Data and how it relates to Power BI? Where do you see this going in the future? Leave a comment down below.