All posts by Dustin Ryan

Dustin is a Data Platform Solutions Architect on the Education team at Microsoft. You can find Dustin blogging and speaking at events like SQL Saturday, Code Camp, and SQL Rally. Follow Dustin on Twitter @SQLDusty.

Gotchas With Referenced Dimensions

image

A referenced dimension relationship allows us to create a relationship between a measure group and a dimension that is indirectly related to the measure group using an intermediate dimension. This is great, because we can now create relationships between our facts and dimensions that may not be directly related. Unfortunately, there are a couple of gotchas that you may not be aware of, which I read about here and here. I’m just offering my take on these issues, so thanks to Alberto and Jesse for the great content.

You can see several examples of referenced relationship types in the Adventure Works cube. One of those examples is the relationship between the Sales Target measure group and the Sales Territory dimension.

image

If you’ve ever created a referenced dimension relationship, you know that the Materialize option is checked by default. So one might assume this is the best option to go with.

To Materialize?
image

But before we can decide if the option to materialize the relationship is best, we must first understand what the materialize option really does. When we check the materialize option, the link between the referenced dimension and the facts is materialized. During processing of the measure group, a join is made to the dimension to look up the referenced dimension’s key value. This leads to slower cube processing but better cube query performance.

There is, however, an important thing to realize: The materialized relationship link between the facts and the correct attribute members is created during the processing of the partition, not the dimension! Why is this important? Imagine we need to update the relationships between the facts and the referenced dimension. We can no longer simply process the dimension. The link to the older referenced dimension attribute member still exists in the partition! The partition must be processed! This could be a huge issue if your cube is very large and you’re only planning to process the most recent partition. If the relationships changed in the referenced dimension, you could find yourself forced to process more than just the most recent partition in order to ensure the relationships between the facts and the referenced dimension are correct.

To Not Materialize?
image

So what’s the other option? We could leave the option to materialize the relationship unchecked. The downside is that query performance will suffer since the member will need to be looked up on the fly. The upside is that processing will be faster, and we no longer need to process the partition to update the relationship since it’s looked up at query time.

But there is also a gotcha with using un-materialized referenced dimension relationships. Aggregations cannot be created for a referenced dimension when the relationship is not materialized. If you run the Aggregation wizard, you’ll notice that any referenced dimensions with un-materialized relationships will be left out. This is by design. An aggregation should not be created (which is possible to do manually) for an un-materialized referenced dimension relationship because the relationship is resolved at query time.

What About the TSQL Query?

But what happens with the TSQL query during processing of the measure group? Here’s an example in the Adventure Works cube that we can look at:

image

This referenced relationship between the Reseller Sales measure group and the Geography dimension is materialized. When the Reseller Sales measure group is processed, a join is made to the intermediate dimension to look up the Geography Key, which you can see here:

image

This join allows SSAS to add the referenced key to the measure group providing better MDX query performance for the users. But if we’re dealing with a significant amount of data in the fact table and a large intermediate dimension, the processing performance impact could be very significant.
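The join SSAS issues during processing has roughly the following shape. This is a sketch, not the exact statement SSAS emits (the real query includes every bound column), and the table and column names assume the standard AdventureWorksDW sample schema:

```sql
-- Sketch of the processing query for a materialized referenced
-- relationship (assumes the AdventureWorksDW sample schema).
-- The intermediate Reseller dimension is joined to resolve the
-- Geography key for each fact row.
SELECT
    f.[SalesOrderNumber],
    f.[ResellerKey],
    f.[SalesAmount],
    r.[GeographyKey]   -- looked up through the intermediate dimension
FROM [dbo].[FactResellerSales] AS f
LEFT JOIN [dbo].[DimReseller] AS r
    ON f.[ResellerKey] = r.[ResellerKey];
```

The wider the intermediate dimension and the larger the fact table, the more this join costs at processing time.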

But what if I de-materialize the relationship? The good news is that the TSQL query used to load the measure group will perform better because we’re no longer making a join to the intermediate dimension:

image

The bad news is that during executions of MDX queries, SSAS will need to look up the key value for the referenced dimension leading to slower MDX query performance.
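For comparison, with the relationship un-materialized the processing statement no longer touches the intermediate dimension at all; a sketch, again assuming the AdventureWorksDW schema:

```sql
-- With an un-materialized relationship, SSAS reads the fact table
-- directly; no join to DimReseller is needed during processing.
SELECT
    f.[SalesOrderNumber],
    f.[ResellerKey],
    f.[SalesAmount]
FROM [dbo].[FactResellerSales] AS f;
```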

The Conclusion

My personal recommendation is to add the referenced key to the fact table so that you can eliminate the dilemma altogether. I’ve personally seen referenced relationships in a multi-terabyte cube destroy processing performance. The resolution for that particular client’s issue was to add the referenced key to the fact tables and create regular relationships between the measure groups and the referenced dimensions. After implementing the fix, processing the partitions was very fast.
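If you want to go that route, the fix can be as simple as the following sketch. The table and column names assume the AdventureWorksDW schema, so treat this as an illustration to adapt, not a drop-in script:

```sql
-- Hypothetical fix: persist the referenced key on the fact table so a
-- regular relationship can replace the referenced one.
ALTER TABLE [dbo].[FactResellerSales]
    ADD [GeographyKey] INT NULL;
GO

-- Backfill the new column through the intermediate dimension.
UPDATE f
SET f.[GeographyKey] = r.[GeographyKey]
FROM [dbo].[FactResellerSales] AS f
JOIN [dbo].[DimReseller] AS r
    ON f.[ResellerKey] = r.[ResellerKey];
```

From there you can bind the referenced dimension to the measure group with a regular relationship, and the materialize question goes away.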

If for some reason you can’t add the referenced key to the fact table and you must decide between materializing the relationship or not, the answer is, of course, “It depends.” If your cube is extremely large and uses an incremental processing strategy, I would recommend not materializing the relationship. This will prevent you from having to process the entire cube should something in the referenced dimension change. You’ll just have to live with somewhat slower MDX query performance.

If, however, your cube is relatively small or your data warehouse is completely reloaded each night, I would probably recommend materializing the relationships. The increased time to process won’t be that big of a deal since the cube is small and your query times will benefit because of the materialized relationship. You’ll also be able to create aggregations since the relationship is materialized. Of course, the “it depends” applies differently to every situation.

Feedback?

I hope you found this useful! If you did, please share it!

Also, if you have any questions or feedback, please leave a comment below! Thanks for reading!

Migrating From Informatica To SSIS?

image

If you’re even considering migrating your Informatica Workflows to SSIS packages, you’ll need a clear understanding of the level of difficulty you’ll be up against. That’s where Workflow Profiler comes in.

The Workflow Profiler allows you to gather a complete list of repository objects at the most granular level with the goal of determining their reusability. Workflow Profiler also identifies connections utilized within the Workflows. Once the workflows have been documented, detailed reports are generated describing the objects contained in the Informatica workflows.

First, identify the repository connection.

Then select the Workflows you want to document.

Next, you can select the type of report you’d like to generate. You can also specify if you would like to filter out certain types of objects.

And your report is ready for viewing and exporting to Excel, Word, or PDF formats.

Head over to PragmaticWorks.com and download the free trial of Workflow Profiler. If you’re considering migrating your ETL to SSIS, this is a tool you don’t want to miss out on.

Notes From SQL Saturday 168: Introduction To SSIS 2012

Thank you to everyone who attended my session, “SSIS 2012 In Yo’ Face: Introduction to SSIS”! I had a great time presenting and it was a whole lot of fun. Also, a big thank you to my co-presenter, Mitchell Pearson (blog | twitter)!

Also, thanks to everyone who attended my second bonus session, “What’s New In SSIS 2012”.

For those of you interested, here is the package we created in “SSIS 2012 In Yo’ Face”. Simply place the content of the zip file in the C:\Projects\SQLSaturday168 folder and unzip the SSIS project to the directory of your choice. If you have any questions, please feel free to post them here or send me a message on Twitter (@SQLDusty).

Again, thanks to everyone who came to my sessions! I hope you enjoyed the sessions and maybe even learned a little something.

BI xPress Report Deployment Tool

If you’ve followed my blog for a while, you’ve probably seen me blog about BI xPress because it’s such an awesome tool. I use BI xPress on every gig I go on, even if the client has not purchased a license. There have been several great enhancements since the earlier versions of BI xPress, but one of the more useful is the new Report Deployment Tool.

The Report Deployment Tool meets a practical need for SSRS developers and administrators: quickly and easily deploying SSRS reports from one environment to the next. With its drag and drop interface, I can move SSRS reports from the QA environment up to Production (or any other environment within my domain) simply by dragging and dropping the report.

After connecting to both SSRS servers, I can easily compare which reports exist in one environment versus the next. I can also bi-directionally move reports that are ready for deployment into Production.

I would highly recommend this tool for anyone who finds themselves facing challenging SSRS report deployment scenarios or for anyone who just wants to very easily move reports across servers in a split second. To learn more about BI xPress and the Report Deployment Tool, head over here.

PowerPivot 101 Training Webinar with Q&A

To watch the free recording of my webinar called PowerPivot 101, just head over to PragmaticWorks.com, create a free login and you’re all set! Please let me know if you have any questions.

PowerPivot 101 free webinar training

Once again, thanks to everyone that attended my session on PowerPivot. Here are my responses to some of the questions I received during my session but did not have time to answer until now.

1. Does the current version of PP support drill through?

The current version of PowerPivot does support drill through. All you have to do is right-click a measure in your spreadsheet and select Show More Details or alternatively you can double-click the measure.

image

2. Is PowerPivot add-in required for SharePoint?

PowerPivot for SharePoint is required if you wish to share Excel workbooks that contain PowerPivot data via SharePoint. For more information on installing and setting up PowerPivot for SharePoint, check this out.

3. Where will a replay of the PowerPivot 101 webinar be available?

Head over to the Pragmatic Works Learning Center to view the recording of this webinar as well as the recordings of all the previous webinars!

4. Book/website recommendations?

Definitely check out PowerPivotPro.com run by Rob Collie. It’s a great website with tons of cool PowerPivot-y stuff, plus Rob is a cool dude.

Check out these books!

image

PowerPivot for Excel 2010: Give Your Data Meaning
Marco Russo & Alberto Ferrari

image

Practical PowerPivot & DAX Formulas for Excel 2010
Art Tennick

5. What is the name of the component/function in SharePoint that would make the “live” data connections possible in published PowerPivot?

I assume the question is directed at utilizing a live connection to a data source, such as SQL Server. To my knowledge, this is not natively supported with PowerPivot. The data in your PowerPivot workbook must be refreshed.

There are two ways to refresh data in a PowerPivot workbook. You can refresh it manually anytime you want or you can schedule a refresh through SharePoint like I showed during the webinar. For more information on refreshing data in a PowerPivot workbook, check out MSDN.

6. I thought DAX was only supported using the Tabular mode of SSAS. My understanding is that PowerPivot is built on the Multidimensional model. I wouldn’t think DAX would be supported. Can you comment?

DAX (Data Analysis Expressions) is the formula language for PowerPivot and Tabular modeling and is used to specify calculations and to create new columns. It’s essentially an extension of the Excel formula language. PowerPivot is actually built on the same in-memory engine as Tabular models, not on the Multidimensional model, which is why DAX is fully supported. For more information related to DAX in regards to PowerPivot and Tabular modeling, read this and this.

7. Can you blog about having a calculated field that I can put on a slicer to filter by number of records (e.g., products with more than 100 records)?

To filter records where a measure is greater than or less than a certain amount is pretty easy. In your workbook, just click the drop down arrow above the attribute you wish to filter, go to Value Filters, and select Greater Than, Less Than, or whichever option you desire.

image

Then you can select which measure you want to use and what quantity to use.

image

Thanks for all the great questions everyone. Don’t forget to check the recording of the webinar in case you missed anything!

PowerPivot 101 Recording Now Available

Thanks to everyone who attended my PowerPivot 101 webinar last Tuesday morning! It was a blast and I had a great time presenting for the 300-strong crowd that attended!

In case you missed the webinar, you can easily view the recording here free of charge! Also, don’t forget to check out the dozens of other free webinar recordings. With this kind of free training available, there really is no excuse for you to not be learning something new!

If you’re looking for a great book to get you started with PowerPivot and DAX, I would suggest checking out the following books:

image

PowerPivot for Excel 2010: Give Your Data Meaning
Marco Russo & Alberto Ferrari

image

Practical PowerPivot & DAX Formulas for Excel 2010
Art Tennick

Also, check back for my PowerPivot Q&A responses in a later blog post! Thanks again to everyone who attended my session!

Comparing and Syncing Data With Data Inspector

image

Earlier this month Pragmatic Works released Workbench. Workbench combines the powerful tools many of you are already using with an entire host of new, powerful features that are part of DBA xPress.

DBA xPress features tools designed to make designing, maintaining, and administering your SQL Server databases easier than before. Now you can easily do things like visually navigate your databases, visualize schema dependencies, compare, synchronize, script, and navigate schemas, and more.

image

One of my personal favorite tools included with DBA xPress is the Data Inspector. Data Inspector allows you to create a comparison between the data stored in two different databases. Not only can you compare the data between databases, but you can also copy the data from one database to another.

With Data Inspector, there are three ways to sync data between databases: data can be synced directly between databases with the Data Inspector Synchronization Wizard, a script can be generated to synchronize the data for you, or a hybrid of the first two options will directly sync the data and generate the script for you.

I’m very excited about this tool and am looking forward to using it to speed up my data warehouse development projects. Head over to PragmaticWorks.com and check out DBA xPress.

Implementing Security With SSAS

Pragmatic Works just published a video on their YouTube channel put together by yours truly on implementing security in your SQL Server Analysis Services cube.

The video covers implementing basic dimensional security, cell security, as well as an extended look at implementing dynamic data driven security. The video is about an hour long so grab a bag of popcorn, sit back, and hopefully learn how to make your cube as secure as if it were guarded by a squad of Segway riding special forces commandos.

So check out my video Implementing Security With SSAS and feel free to post any questions or comments!

I’m Speaking at SQL Saturday #168: BI Edition!

SQL Saturday #168: BI Edition is coming up on November 17th in Tampa, Florida and yours truly will be presenting! This SQL Saturday event will be all Business Intelligence focused! All the sessions will either be ETL, analytics, or reporting related with a miscellaneous track focusing on professional development and DBA topics. Get registered ASAP for this awesome BI focused SQL Saturday!

I’ll be co-presenting with Mitchell Pearson (blog | twitter) on an introduction to SSIS 2012 called SSIS 2012 In Yo’ Face: Introduction to SSIS. We’re going to cover commonly used SSIS tasks and transforms, the SSIS expression language, and how to create powerfully dynamic packages.

So head over to the SQL Saturday #168 site and get registered. I look forward to seeing everyone in our session! It’s going to be a blast!

Generating Junk/Test Data For SSIS

Every once in a while I’ll run across a requirement to generate a bunch of junk data for the sole purpose of performance testing or benchmarking. There are plenty of ways I could generate my test data, including clever TSQL scripts or a Data Flow Script Component as a source. And those methods work well, but they are somewhat limited if I need to generate Unicode string data for testing. That’s where Task Factory comes in.

Included in the latest release of Task Factory is the Placeholder Source. The beauty of the Placeholder Source is that I can quickly and easily generate test data for my package.

Here you can see the editor for the Placeholder Source. On the Component Properties tab, I can specify how many rows of junk data I’d like to generate. The default is 10,000 rows, but I can set it to whatever I like. I’ve tested the source component with 100 million records before with no problem.

image

On the Input and Output Properties tab is where you specify which columns and data types you’d like to include in your junk data.

image

By default, the Placeholder Source creates 5 output columns of various data types, but you can add more fields of whatever data type you like. Just click Add Column and then select the data type you want. Here I’ve added an additional column and selected Integer as the data type.

All that’s left to do is wire up my package and click execute. Instantly tons of junk data is created for us to test or play with. If you’re interested in expanding your SSIS arsenal, head over to PragmaticWorks.com and download Task Factory. The best part is that Task Factory is free to download. A license is only required to run the Task Factory components remotely. So why not check it out?