Category Archives: Data Warehouse Design

Data Warehouse Design and Power BI at SQL Saturday Tampa 2019

SQLSaturday #859 - Tampa 2019

Hello, fellow data nerds! I wanted to make you aware of a fantastic event coming up this month in Tampa, Florida. On Saturday, March 23, 2019, the annual SQL Saturday event will be hosted at the University of South Florida! This is a really tremendous opportunity to engage in some quality training at little to no cost. If you’re able to be in the Tampa area Saturday 3/23 then you need to register for this event. I’ve seen the schedule and the line up is impressive! Old dogs and young dogs alike will learn something new from a host of professionals and industry experts!

I’ll be presenting two sessions during the morning of March 23 at SQL Saturday Tampa and I’d love to see you there!

Making Sense of the Power BI Ecosystem

Power BI has taken the world by storm since it’s release in mid-2015. Since then, Microsoft’s cloud analytics service has matured and grown in scope and complexity. In this session, we’ll seek to understand the moving parts of the Power BI ecosystem including Power BI Pro, Premium, Report Server, Embedded, Data Flows, and more. By the end of this session the audience should have a firm grasp on the components ecosystem and how those components work together.

Building a Data Warehouse from the Ground Up

What if you could approach any business process in your organization and quickly design an effective and optimal dimensional model using a standardized step-by-step method? In this session we’ll discuss the steps required to design a unified dimensional model that is optimized for reporting and follows widely accepted best practices. We’ll also discuss how the design of our dimensional model affects a semantic modeling technologies like SQL Server Analysis Services or Power BI and how the choices we make during the data warehouse design phase can make or break our reports.

I hope you can make it to this really amazing event and can make it to one of my sessions! Don’t forget to register! If you do, swing by and say hi to me!

10 SQL Server Data Warehouse Design Best Practices to Follow (Part 1 )

This past Saturday I had the pleasure of speaking at SQL Saturday #552 here in beautiful Jacksonville, Florida. My good friend, Mitch Pearson (blog | twitter) and I presented our session, Designing a Data Warehouse from the Ground Up. We had a great crowd and lots of great questions from the audience!

Watch Designing a Data Warehouse from the Ground Up Webinar Recording

With all the talk about designing a data warehouse and best practices, I thought I’d take a few moment to jot down some of my thoughts around best practices and things to consider when designing your data warehouse. Below you’ll find the first five of ten data warehouse design best practices that I believe are worth considering. This list isn’t meant to be the ten best “best practices” to follow and are in no particular order. Of course, each design scenario is different so you may find that some of the best practices listed here aren’t optimal in your specific situation.

Continue reading 10 SQL Server Data Warehouse Design Best Practices to Follow (Part 1 )

Data Warehouse Design Challenge: Relating a Temporal Fact Table to a Date Dimension

This past week I ran into an interesting challenge with a client. The data warehouse is capturing testing data for an educational institution. In the screenshot below, you’ll see Continue reading Data Warehouse Design Challenge: Relating a Temporal Fact Table to a Date Dimension

Thank You for Attending my #SQLSatOrlando Session! Slides, Resources, Recording

SQL Saturday #442 in Orlando, FL has come and gone but what a turn out! The event was excellent, we had a great turnout for our session and had a blast! And as a bonus, the BBQ lunch, baked beans, coleslaw, mac n cheese and dessert were amazing. Seriously one of the best lunches I’ve had a SQL Saturday event! Plus, the Lego name tags were epic! 100% without a doubt the coolest name tag ever.

Continue reading Thank You for Attending my #SQLSatOrlando Session! Slides, Resources, Recording

Data Warehouse from the Ground Up at SQL Saturday Orlando, FL on Oct. 10th

SQL Saturday #442SQL Saturday #442 is upon us and yours truly will be presenting in Orlando, Florida on October 10th alongside Mitchell Pearson (b|t). The session is scheduled at 10:35 AM and will last until 11:35 AM. I’m very excited to be presenting at SQL Saturday Orlando this year as it’ll be my first presenting this session in person and my first time speaking at SQL Saturday Orlando! If you haven’t registered yet for this event, you need to do that. This event will be top notch! Continue reading Data Warehouse from the Ground Up at SQL Saturday Orlando, FL on Oct. 10th

Designing a Data Warehouse from the Ground Up Webinar Recording with Q & A

image Thank you to everyone that registered and attended my webinar Designing your Data Warehouse from the Ground Up webinar this past Tuesday. And I’d also like to give a special thanks to my good friend, Mitchell Pearson (b|t), for helping me present this webinar. We had a great time!

Also, thank to everyone that tuned into the live broadcast of the webinar on Periscope! I hope you enjoyed the unique perspective Periscope gives.

The good news is that Continue reading Designing a Data Warehouse from the Ground Up Webinar Recording with Q & A

#PowerBI Fantasy Football Player Stats Dashboards for Download

Every year at Pragmatic Works some coworkers, including consultants, marketing staff, support team members, software development staff and project management, partake in a company fantasy football league. And with the recent release of the new Power BI Desktop, I thought what better way is there to prepare to completely annihilate my coworkers and friends in an imaginary nonsensical game than by creating some nifty Power BI dashboards based on last years player stats as recorded by Yahoo! Sports. So I thought I’d walk you through some of the steps I followed to leverage the Yahoo! Sports NFL player stats page as a data source and some of the query transformations I applied to prepare the data for reporting.

Power BI dashboard with Power BI Desktop Continue reading #PowerBI Fantasy Football Player Stats Dashboards for Download

My Top Four Books for the MS Business Intelligence Professional

As a Business Intelligence Consultant, I do a decent amount of speaking, interacting with the community, and have written and contributed on a few SQL Server books. A question I’m often asked is if I can recommend any good books which brings me to this blog post. I wanted to make you aware of four books for learning data warehousing and  other MS BI technologies that I’ve found incredibly helpful over the years I’ve spent designing and implementing enterprise data warehouse and business intelligence solutions. Continue reading My Top Four Books for the MS Business Intelligence Professional

Learn Designing a Data Warehouse from the Ground Up at SQL Saturday 442 Orlando, FL

image I’m very excited to announce that I’ve been selected to present a session entitled Designing a Data Warehouse from the Ground Up on October 10, 2015 in Orlando, Florida at Seminole State College of Florida! I’ll be presenting this session for the first time along side Mitchell Pearson (b|t). This is going to be an amazing event with tons of amazing, FREE training for everyone including SQL server newbies on up to those who have been in the profession for years. Continue reading Learn Designing a Data Warehouse from the Ground Up at SQL Saturday 442 Orlando, FL

Gotchas With Referenced Dimensions

image

A referenced dimension relationship allows us to create a relationship between a measure group and a dimension that is indirectly related to the measure group using an intermediate dimension. This is great, because we can now create relationships between our facts and dimensions that may not be directly related. Unfortunately, there are a couple of gotchas that you may not be aware of, which I read about here and here. I’m just offering my take on these issues, so thanks to Alberto and Jesse for the great content.

You can see several examples of referenced relationship types in the Adventure Works cube. One of those examples is the relationship between the Sales Target measure group and the Sales Territory dimension.

image

If you’ve ever created a referenced dimension relationship, you know that the Materialize option is checked by default. So one might assume this is the best option to go with.

To Materialize?
image

But before we can decide if the option to materialize the relationship is best, we must first understand what the materialize option really does. When we check the materialize option, the link between the referenced dimension and the facts is materialized. During processing of the measure group, a join is made to the dimension to look up the referenced dimension’s key value. This leads to slower cube processing but better cube query performance.

There is, however, an important thing to realize: The materialized relationship link between the facts and the correct attribute members is created during the processing of the partition, not the dimension! Why is this important? Imagine we need to update the relationships between the facts and the referenced dimension. We can no longer simply process the dimension. The link to the older referenced dimension attribute member still exists in the partition! The partition must be processed! This could be a huge issue if your cube is very large and you’re only planning to process the most recent partition. If the relationships changed in the referenced dimension, you could find yourself forced to process more than just the most recent partition in order to ensure the relationships between the facts and the referenced dimension are correct.

To Not Materialize?
image

So what’s the other option? We could leave the option to materialize the relationship unchecked. The downside to this is that query performance will suffer since the member will need to be looked up on the fly. The upside is that processing will be faster and we no longer need to process the partition to update the relationship since its looked up at query time.

But there is also a gotcha with using un-materialized referenced dimension relationships. Aggregations can not be created for a referenced dimension when the relationship is not materialized. If you run the Aggregation wizard, you’ll notice that any referenced dimensions with un-materialized relationships will be left out. This is by design. An aggregation should not be created (which is possible to do manually) for an un-materialized referenced dimension relationship because the relationship is resolved at query time.

What About the TSQL Query?

But what happens with the TSQL query during processing of the measure group? Here’s an example in the Adventure Works cube that we can look at:

image

This referenced relationship between the Reseller Sales measure group is materialized. When the Reseller Sales measure group is processed, a join is made to the intermediate dimension, to look up the Geography Key, which you can see here:

image

This join allows SSAS to add the referenced key to the measure group providing better MDX query performance for the users. But if we’re dealing with a significant amount of data in the fact table and a large intermediate dimension, the processing performance impact could be very significant.

But what if I de-materialize the relationship? The good news is that the TSQL query used to load the measure group will perform better because we’re no longer making a join to the intermediate dimension:

image

The bad news is that during executions of MDX queries, SSAS will need to look up the key value for the referenced dimension leading to slower MDX query performance.

The Conclusion

My personal recommendation would be to add the referenced key to the fact table so that way you can eliminate the dilemma altogether. I’ve personally seen referenced relationships in a multi TB cube destroy processing performance. The resolution for this particular client’s issue was to add the referenced key to the fact tables and create the regular relationships between the measure groups and the referenced dimensions. After implementing the fix, processing for the partitions was very fast.

If for some reason you can’t add the referenced key to the fact table and you must decide between materializing the relationship or not, of course the answer is “It depends”. If your cube is extremely large featuring an incremental processing strategy, I would recommend to not materialize the relationship. This will prevent you from having to process the entire cube should something in the referenced dimension change. You’ll just have to live with the somewhat slower MDX query performance.

If, however, your cube is relatively small or your data warehouse is completely reloaded each night, I would probably recommend materializing the relationships. The increased time to process won’t be that big of a deal since the cube is small and your query times will benefit because of the materialized relationship. You’ll also be able to create aggregations since the relationship is materialized. Of course, the “it depends” applies differently to every situation.

Feedback?

I hope you found this useful! If you did, please share it!

Also, if you have any questions or feedback, please leave a comment below! Thanks for reading!