Category Archives: MDX

How to Pass Multiple Values from an SSAS Report Drill Through Action to an SSRS Report

These past couple of weeks I faced a client requirement to create an SSRS Report Action in an SSAS cube. The requirement dictated that a user browsing the cube in an Excel pivot table should be able to drill through to an SSRS report (which used SQL Server as the data source) and view data at a monthly level. The twist was that if the user drilled from a year or quarter level, the appropriate months would be selected by default. I thought the solution I came up with was useful, so here it is, recreated using Adventure Works 🙂. I won’t walk through the step-by-step of creating a Report Action in SSAS, but I will hit the high points of overcoming this requirement.

I created the report action in the Cube Designer in SSDT and called it SSRS Sales Report. Because I want the users to be able to click on a measure, I set the Target type to “Cells” and the Target object to “All cells”.


But I really only want the users to see the action if they right-click the Reseller Sales Amount measure, so I added the following conditional logic to make the action appear only for the Reseller Sales Amount.

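The expression is a simple identity check on the current measure:

Measures.CurrentMember is [Measures].[Reseller Sales Amount]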

If you want the action to appear beneath more than one measure but not all measures, just use “or” and specify the other measures, as seen in the example below. Also, in my specific requirement, the action should only work for a certain time frame (2007 forward in this example), so I added the logic you see after the “and”.

(Measures.CurrentMember is [Measures].[Reseller Sales Amount]
 or Measures.CurrentMember is [Measures].[Internet Sales Amount])
and
COUNT(
    Intersect(
        {[Date].[Calendar Year].&[2007]:NULL},
        [Date].[Calendar Year].CURRENTMEMBER)
) > 0

Next, I added in the Report Server URL and SSRS report link. In my case, the report server was SharePoint Integrated.

Now this is where things start to get interesting. I need to dynamically pass the year and month values to the SSRS report. What values to pass from SSAS really depends on the parameters in the SSRS report. If the SSRS report is using SSAS as a data source, then you may be able to use the unique names of the attribute members. In my case, the SSRS report used a regular ole SQL Server database as the data source, so I needed to format the parameter values appropriately. The Parameter Name values should reflect the names of the parameters in the SSRS report. Also, if you’re using an SSRS report in SharePoint, don’t forget to include “rp:” as a prefix on the parameter name. For example, if your parameter is named “Year”, the parameter name you would enter into the action would be “rp:Year”. Again, this only applies if your SSRS report server is SharePoint integrated.

In the table below, you’ll notice I’m using the Right function to return the right four characters of the name of the Calendar Year member. This is because in SSAS the Calendar Year caption appears as “CY 2007” for 2007, for example, while in the SSRS report the parameter value would be “2007”. The Month parameter is a little trickier. Because I want the user to be able to view multiple months at a time depending on what level of the Date Hierarchy they click on, I used the Generate function with the Descendants function to create a delimited string of the member values at the month level.

Parameter Name    Parameter Value
rp:Year           RIGHT([Date].[Calendar Year].CURRENTMEMBER.NAME, 4)
rp:Month          GENERATE(
                      DESCENDANTS([Date].[Calendar].CURRENTMEMBER,
                          [Date].[Calendar].[Month]),
                      LEFT([Date].[Calendar].CURRENTMEMBER.NAME, 3),
                      "&rp:Month=")

So if you’re wondering what the Generate function is doing here, take a look at the following query and the query results and that should clear it up for you.

with member [Measures].[Months param] as
GENERATE(
    DESCENDANTS([Date].[Calendar].CURRENTMEMBER, [Date].[Calendar].[Month]),
    LEFT([Date].[Calendar].CURRENTMEMBER.NAME, 3),
    "&rp:Month=")

Select [Measures].[Months param] on 0,
DESCENDANTS(
    [Date].[Calendar].MEMBERS,
    [Date].[Calendar].[Month],
    SELF_AND_BEFORE
) on 1
From [Adventure Works]

In the results you can see that one of the neat things the Generate function can do is create a delimited list. The third argument of the Generate function is my delimiter, and in this case I want it to be “&rp:Month=” so I can specify multiple values to pass to the parameter in the SSRS report. So if the user right-clicks a quarter, 3 months will be passed to the report. If the user clicks a semester, 6 months will be passed.

And here’s a screen cap of the Parameters in the Cube Designer in SSDT in case you were wondering.


The last step is to make the caption for the action dynamic. Set the “Caption is MDX” property to True, and then you can do neat things with the action, like using the CurrentMember’s Name property in the caption of the action.

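For example, a caption expression along these lines (a sketch built on the Calendar hierarchy’s current member) produces captions like “View SSRS Sales Report for Q4 CY 2007”:

"View SSRS Sales Report for " + [Date].[Calendar].CurrentMember.Name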


So that’s fantastic and all, but what does it look like on the end user’s end? Here I have a pivot table with my date hierarchy and a couple of measures. If I right-click the Reseller Sales Amount for Q4 2007 and select Additional Actions, you’ll see the caption for my brand new action: View SSRS Sales Report for Q4 CY 2007.


And the report correctly picks up the values from the Action.

Keep in mind that this action works for any level of the Date Hierarchy down to the date level. Very slick!

If you look at the URL, you can see the parameters being passed to the report through the URL query string:

http://spsqlbi/sites/BI%20Center%20Test/_layouts/15/ReportServer/RSViewerPage.aspx?rv:RelativeReportUrl=/sites/BI%20Center%20Test/Reporting%20Services/Adventure%20Works%20Reports/SalesReport.rdl&rp:Year=2007&rp:Month=Oct&rp:Month=Nov&rp:Month=Dec&rs:Command=Render&rs:Renderer=HTML5

Chances are that you’ll have to struggle through this a few times like I did, so here are a couple of tips to help you troubleshoot:

1. If the action does not appear where you expect it to in the pivot table, there could be a syntax issue with the MDX in the Action definition in SSDT. Double-check your MDX.
2. If the action takes you to the report correctly but the values in the parameters aren’t selected, then you probably aren’t passing the correct values to the report for the parameter. For example, if the Month parameter in the report uses the month number as its values (i.e. 1 = Jan, 2 = Feb, etc.) but you pass “January” from the Action, the value will not be automatically selected.
3. Lastly, you are subject to the URL query string character limit enforced by the browser. For example, IE has a limit of 2,083 characters last I checked, so if you’re trying to pass a selection of 500 customer values to an SSRS report, you will probably run into problems.

I hope that some of you out there found this useful. If you did, share it with a friend or leave me a comment and let me know! Thanks!

How To Display 0’s Instead of NULL in your SSAS Cube & MDX Query Results

I’ve done a decent amount of SSAS and MDX query development over my career and one of the requirements that continually pops up is to display zeros instead of NULL when browsing the cube. Take the following query as an example:

Select {[Date].[Calendar Year].Children} on 0,
NON EMPTY {[Geography].[State-Province].members} on 1
From [Adventure Works DW]
Where [Measures].[Reseller Sales Amount]

And here are the results:


There are a couple of ways to display 0’s instead of NULL. The first way is to create logic in your MDX script that uses a CASE statement or an IIF expression to manually check for empty cube space using the ISEMPTY function, similar to the following example:

SCOPE ([Measures].[Reseller Sales Amount]);
    THIS = IIF(ISEMPTY([Measures].[Reseller Sales Amount]), 0, [Measures].[Reseller Sales Amount]);
END SCOPE;
The problem with this method is that any empty cube space will now be populated with 0’s, potentially exploding your results: rows that contain no data and would normally be filtered out are now displayed. Queries that usually return only a handful of rows could now return thousands or millions! Yikes! And that’s something we definitely want to avoid.
There is another way we can fill the NULLs with a 0 or any other value we choose: through the FormatString property. The FormatString property accepts four arguments separated by semicolons (;). The first argument is how positive values should be formatted, the second is how negative values should be formatted, the third is how 0 values should be formatted, and the fourth is how NULLs should be formatted. The fourth argument is the one we are interested in!
Here is an example of how I am formatting the Reseller Sales Amount: “#,##0.00;-#,##0.00;;0”.
By configuring the FormatString property this way, we are no longer filling the empty space in the cube with 0’s; rather, we are configuring the measure to display a 0 only when a NULL value appears in the results. Check it out.
No more NULL values. If you’d rather display a non-numeric value, you can do this: “#,##0.00;-#,##0.00;;Zero”
If you thought that was helpful or enjoyed reading my blog, leave a comment. Feel free to post any questions, as well!

Do You Know Why Your MDX Query Is Slow?

Performance tuning MDX queries can often be a daunting and challenging task. But the first step in deciding where to begin your efforts is to diagnose the source of the problem. There are two areas that could be causing the performance issue: the design of our SSAS solution, or the design of our query. We need to figure out if we’ve written a bad query or designed a bad cube :).

How Do We Test The Query?

Usually an issue is discovered when a user comes to the BI team with a report that appears to be running slowly. So for our example, I have a pivot table in an Excel workbook that is running a little slow. Let’s walk through this together and diagnose what the problem could be. Below you can see the pivot table.


The first thing you’ll need to do is test the query, and the best way to do this is to execute the query in isolation so we can eliminate outside factors. So in this example I would execute the report outside of business hours. I want to make sure I do this outside of peak use time because one of the things I’ll need to do is clear the cache. To prevent SSAS from satisfying our query from the formula engine cache or the storage engine cache, which would skew our results, we need to execute a ClearCache command. To do this, I’ll execute the following XMLA script in SSMS.

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
     <Object>
          <DatabaseID>AdventureWorksDW2012Multidimensional-EE</DatabaseID>
          <CubeID>Adventure Works</CubeID>
     </Object>
</ClearCache>

 

Also, I’ll execute the following MDX to initialize my cube’s MDX script.

 

Select {} on 0 
From [Adventure Works]

To figure out how long our query is taking to execute, we’re going to fire up SQL Server Profiler and execute a trace against SSAS while we execute the query so we can gather all the nitty gritty details of the query execution. By running a trace, we will be able to see all kinds of really helpful details like the total duration of the query, the partitions being queried, whether aggregations are being used to answer the query, which attributes are being used, and much more.

To begin our trace, go to Tools > SQL Server Profiler in SSMS.


Flip the Server type to Analysis Services and set the Server name to your SSAS instance that is home to the SSAS database your report is pointed at.


Next you’ll see the Trace Properties window. Go to the Events Selection tab and check the box near the bottom right of the window to Show all events. Scroll down a little more than halfway and find the events “Get Data From Aggregation” and “Query Subcube Verbose.” The “Get Data From Aggregation” event is fired when an aggregation is used to satisfy a query, which also makes it especially helpful when trying to determine if the aggregations you have designed are actually useful. The “Query Subcube Verbose” event will give you very detailed information on which members from which dimension attributes are being queried to satisfy the query. Click Run when you’re done.


Now that the trace is running, it’s time to conduct our test.

1. The first thing I’ll do is execute the Clear Cache command.

 

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
     <Object>
          <DatabaseID>AdventureWorksDW2012Multidimensional-EE</DatabaseID>
          <CubeID>Adventure Works</CubeID>
     </Object>
</ClearCache>

 

2. Initialize the Calculation script in your cube.

 

Select {} on 0 
From [Adventure Works]

3. Execute the report/query. In my case, my report is an Excel pivot table, so I’ll simply click the refresh button in Excel to execute my query.

The report may take a few seconds or minutes to run depending on the query, but it will probably take longer than you’re used to because the query is running against an empty cache and SSAS has to retrieve all of the data from storage.

When the query is finished executing, pause your trace in SQL Server Profiler by clicking the pause button at the top of the trace window. Now it’s time to take a look at the results. Find the Query End event in the trace results, then find the Duration column; this number displays the total query duration in milliseconds. In my example, my query took just over 13 seconds to execute. So it’s not unbelievably slow, but certainly slower than we’d like.


In order to take a more in-depth look at our query’s performance, let’s save the trace results to a SQL Server table so we can query it. Go to File > Save As > Trace Table.


Specify where you’d like to save the results and click OK.


Below you’ll see a very useful query that will break down where your query is experiencing a slowdown. If you’re using this query for your testing, don’t forget to alter the From clause to query your trace table.

 

SELECT x.ConnectionID,
       x.QueryID,
       x.QueryDuration,
       p.SEDuration,
       CASE
          WHEN p.SEDuration > x.QueryDuration THEN NULL
          ELSE x.QueryDuration - p.SEDuration
       END AS FEDuration,
       y.[Number of SE Queries],
       y.[Thread Duration of SE Queries],
       w.[Aggregations Read],
       x.TextData
FROM   (SELECT a.ConnectionID,
               a.Duration AS QueryDuration,
               a.TextData,
               CAST (HashBytes('SHA1', CAST (reverse(CAST (TextData AS VARCHAR (MAX))) AS NVARCHAR (4000))) AS INT) AS QueryID
        FROM   MyTraceTable AS a
        WHERE  a.EventClass = 10) AS x -- Query End Event
       LEFT OUTER JOIN
       (/* Determine Query Subcube Verbose of Non-cache data */
       SELECT   ConnectionID,
                 COUNT(*) AS [Number of SE Queries],
                 SUM(Duration) AS [Thread Duration of SE Queries]
        FROM     MyTraceTable
        WHERE    EventClass = 12 -- Query Subcube Verbose
                 AND EventSubclass = 22 -- Non-cache data
        GROUP BY ConnectionID) AS y
       ON y.ConnectionID = x.ConnectionID
       LEFT OUTER JOIN
       (/* Determine Aggregations that are read */
       SELECT   ConnectionID,
                 COUNT(*) AS [Aggregations Read]
        FROM     MyTraceTable
        WHERE    EventClass = 60 -- Read from Aggregations
        GROUP BY ConnectionID) AS w
       ON w.ConnectionID = x.ConnectionID
       LEFT OUTER JOIN
       (/* Determine SE time */
       SELECT   ConnectionID,
                 SUM(Duration) AS SEDuration
        FROM     MyTraceTable
        WHERE    EventClass = 11 -- Query SubCube
        GROUP BY ConnectionID) AS p
       ON p.ConnectionID = x.ConnectionID;

 

 

The results here are very telling. The column “QueryDuration” shows us the total execution time of the query. The column “SEDuration” shows us the amount of time SSAS spent pulling the data from storage (the Storage Engine). The column “FEDuration” shows how long SSAS spent calculating our query’s results (the Formula Engine).


In this particular example, the vast majority of the query’s execution time is spent in the Formula Engine. Of the 13+ seconds spent executing the query, more than 12 seconds are spent in the Formula Engine and only 297 milliseconds are spent pulling the data from storage. This tells us that the problem is probably not with the design of our SSAS solution, but rather with a poorly written query. Unfortunately, this being an MDX query generated by Excel, there’s not a lot we can do to alter the query.

How Do We Fix The Query?

Typically, when deciding where to spend your performance tuning efforts, you want to start in the area where your query spends more than 30% of its time (if it’s a 50/50 split, make an educated decision). In the previous example, we’ve determined our problem is with the query.

What can I do to improve my MDX query?

If you determine your query’s problem is the query itself, ensure SSAS is utilizing subspace computation instead of cell-by-cell computation. SSAS will usually evaluate groups of cells in your cube at a time, but in certain situations SSAS will evaluate cube space one cell at a time. We want to avoid that. You can get a hint that SSAS is calculating your results one cell at a time if the query against your trace table shows a large number of Storage Engine queries. Certain SSAS/MDX functions can disable subspace computation, as illustrated after the list below.

1. Late binding functions (e.g. the StrToMember and StrToSet functions)
2. Set aliases
3. LookupCube function
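
As a purely hypothetical illustration of the first item, the two queries below return the same result, but the first resolves the member from a string at query time, which forces late binding, while the second references the member directly:

-- Late binding: StrToMember resolves the member from a string
WITH MEMBER [Measures].[Sales 2007] AS
    (StrToMember("[Date].[Calendar Year].&[2007]"), [Measures].[Reseller Sales Amount])
SELECT [Measures].[Sales 2007] ON 0
FROM [Adventure Works]

-- Direct member reference: SSAS can stay in subspace mode
WITH MEMBER [Measures].[Sales 2007] AS
    ([Date].[Calendar Year].&[2007], [Measures].[Reseller Sales Amount])
SELECT [Measures].[Sales 2007] ON 0
FROM [Adventure Works]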

Also, check out this blog for more info on ways to improve your MDX query.

But what if the problem is my SSAS solution’s design?

If you conduct your test and determine the majority of the query duration is spent pulling the data from storage, there’s a lot to consider when discussing cube design best practices. But here are some brief highlights of things to consider.

1. Can we design aggregations to help our query? Look at your test results and see if aggregations are being used to satisfy the query.
2. Can we implement a partition design strategy that keeps SSAS from having to query larger partitions?
3. Are the right partitions being queried? For example, if your query is asking for data for 2010 and you notice in your trace that the partitions for all the other years are being queried, this could indicate that SSAS is having a hard time figuring out which partition has the correct data. You may need to set the Slice property on your partition (an example follows this list).
4. Create Attribute relationships
5. Leverage Natural Hierarchies
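
For item 3, the Slice property just takes an MDX expression identifying the data the partition contains. For a hypothetical partition holding only 2010 data, it could be as simple as:

[Date].[Calendar Year].&[2010]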

There are many more best practices for cube design, but this is probably a good starting point.

If you’ve found this helpful, share it with a colleague or a friend and leave me a comment. Feel free to leave me a question or feedback in the comments here or send me a note on Twitter! I love discussing new ideas and learning so don’t hesitate!

SSAS Static Named Sets Vs. Dynamic Named Sets

So I’m 95% sure that I blogged about this topic at some point over the last couple of years, but every time I try to find the link to show a class I’m teaching or to show a client, I can never find the darn thing. This is why I’m writing this blog. That, and it’s also nice to have a good example of this on hand, which is what we have here.

In SSAS we have the ability to create named sets. A named set is basically an aliased set expression that we can use within our MDX queries. This is very useful if we have a set that is commonly used in our organization’s reporting solution.

But there are two types of named sets: static and dynamic. Static and dynamic sets appear very similar but they actually behave very differently, which is why I present to you the following example.

Below you will see a snippet of MDX from my cube script that creates a named set called Top 10 Customers – Static. This is the basic syntax for creating a named set in your cube’s MDX script. You’ll notice the STATIC keyword, which specifies that we wish this named set to be static. The keyword is actually optional: if we leave it out of the CREATE SET statement, the set will still be created as a static named set.

CREATE STATIC SET CURRENTCUBE.[Top 10 Customers – Static]
AS TopCount(
    [Customer].[Customer].Children,
    10,
    [Measures].[Internet Sales Amount]
);

The next CREATE SET statement creates a dynamic named set called Top 10 Customers – Dynamic, the big difference obviously being the DYNAMIC keyword, which specifies that this named set should be created as a dynamic named set.

CREATE DYNAMIC SET CURRENTCUBE.[Top 10 Customers – Dynamic]
AS TopCount(
    [Customer].[Customer].Children,
    10,
    [Measures].[Internet Sales Amount]
);

Here you can see the create set statements in my Adventure Works cube script.


And here we can see the two sets as they appear in our cube’s metadata tab in SQL Server Management Studio.


Now this is where it gets interesting. Below we have an example of our static named set being used in a query on the row axis. And we can see that it works fine.

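The query is just the named set on rows and a measure on columns, something like this:

SELECT [Measures].[Internet Sales Amount] ON 0,
[Top 10 Customers – Static] ON 1
FROM [Adventure Works]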

But what happens if we add a constraint in the Where clause? Uh oh, we run into an issue. The static named set does not respect the Where clause (or a subselect statement in the From clause, for that matter). The named set displays the same members instead of displaying the top 10 customers for the year 2006. This could be a problem for our users depending on the requirements of the reporting solution.

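Recreated as a sketch, that test is the same query with the year added in the Where clause:

SELECT [Measures].[Internet Sales Amount] ON 0,
[Top 10 Customers – Static] ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2006]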

This could be where a dynamic named set may be more useful. Here you can see an example of a query that uses our dynamic named set.

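It’s the same shape of query with the dynamic set swapped in:

SELECT [Measures].[Internet Sales Amount] ON 0,
[Top 10 Customers – Dynamic] ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2006]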

This time, when we provide a constraint in the Where clause, the named set listens to the Where clause and displays the correct data. I know the Internet Sales Amount numbers are all the same, but that’s just the nature of the Adventure Works data.


I think this perfectly demonstrates the differences between static and dynamic named sets. Static named sets behave exactly as their name suggests: They are static and do not respect the Where clause or a subselect statement in the From clause. The dynamic named set is dynamic and will listen to a Where clause slicer or a subselect in the From clause.

If this is all a little overwhelming to you and anytime someone mentions using MDX you curl up into the fetal position, suck your thumb, and sob uncontrollably, I would suggest taking a look at the BI xPress calculation builder. BI xPress has a nifty little wizard that will help you create MDX calculations and named sets without you having to do any of the tough MDX writing on your own.

To create a named set with the BI xPress calculation builder, click the little calculate icon in the Calculations tab of the cube designer in BIDS or SSDT. This will open up the MDX Calculation Builder Wizard part of BI xPress.


Choose the Top 10 Count template under the Sets folder and click Next.


On the next screen we can pick the attribute required for our set. In this case, I’ll select the Customer attribute of the Customer dimension in order to create the Top 10 Customers set we were playing with earlier.


Then we select the measure we want to use to rank our customers. I’m selecting the Internet Sales Amount measure.


Lastly we give our named set a name and click finish. On this screen we can preview the MDX the BI xPress MDX Calculation Builder wrote for us.


And we’re done!


The BI xPress MDX Calculation Builder wrote all the MDX for us without us having to know a lick of MDX! Pretty nifty if I do say so myself. For more information on BI xPress or the BI xPress MDX Calculation Builder, head over to PragmaticWorks.com and download the free trial of BI xPress.

And if you have any questions or comments, please feel free to leave a comment or shout out on Twitter @SQLDusty! Thanks!

Recording Now Available For The Webinar, Choosing The Right Analysis Services: MOLAP Vs. Tabular


Thanks to everyone that attended Devin’s and my webinar called Choosing The Right Analysis Services: MOLAP vs. Tabular. I’m pleased to announce that the recording is now available to watch for free over at PragmaticWorks.com, so please go check it out. It’s a little less than an hour so you can watch it during your lunch break.

The PowerPoint slide deck Devin and I used during the webinar is also available for viewing now! Please visit this link to download the slide deck.

Now for the questions! Many of you asked some great questions, but unfortunately we ran out of time to answer all of them during the webinar. So here are a few of the questions we didn’t get to.

Q: How do I link if column have more than one column is key column in tabular?
A: If you need to create a composite key in a Tabular model table, you will need to create a calculated column that concatenates the columns that make up your composite key. You’ll need to do this in both tables you wish to relate. Once you’ve done that, you can create the relationship between the two tables using your new columns.
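
For example, a DAX calculated column along these lines (the column names here are hypothetical) added to both tables gives you a single column to relate on:

= [SalesOrderNumber] & "-" & [SalesOrderLineNumber]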

Q: Can DAX be used to access cubes?
A: In the SQL 2012 SP1 CU4 release, DAX support for multidimensional cubes was added, so as long as you are running on SQL 2012 SP1 CU4 or later, you should be able to query cubes with DAX expressions. On a side note, MDX can also be used to query a Tabular model.
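
As a quick illustration of that last point, an MDX query as simple as this (the measure name comes from the Adventure Works Tabular sample, and Model is the default cube name a Tabular model exposes) runs fine against a Tabular model:

SELECT [Measures].[Internet Total Sales] ON 0
FROM [Model]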

Q: Since tablular solution is many ways better than Muti Dimensional..then my question is when to go for Multi dimensional solution
A: This is one we covered extensively during the webinar. Here are some of the things to consider:

  1. How much data are you dealing with? If it’s too much to fit into memory for your Tabular model, then MOLAP is the way to go.
  2. Do you have a need for complex relationships? If so, MOLAP may be the answer. Role playing dimensions and many-to-many relationships are possible to create in a Tabular model, but they’re easier to create and manage in a MOLAP cube.
  3. Do you need to perform many complex calculations involving complex Scope assignments? If so, MOLAP is the answer here.

Q: Can you use a Multidimensional database as the source for a Tabular model and improve performance when creating low level granular reports?? This goes back to the performance differences between Multidimensional vs Tabular when creating granular reports.
A: You can use a Multidimensional database as a data source for a Tabular model, but I would suggest getting the data from the original source for the Tabular model. If granular queries are slow against your cube, those same queries are still going to be slow when you execute them to process your Tabular model.

Thanks to everyone who attended Devin’s and my webinar! If you have any other questions, please feel free to leave a comment or send me a message on Twitter!

SSAS Lessons Learned: 29% Better Compression and 11% Better Query Performance

The Importance of Sort Order

This past week I taught the SSAS Masters class, which is one of the virtual training classes offered by Pragmatic Works. One of the things we discuss in the class is the importance of sorting the fact data in your data warehouse in preparation for Analysis Services. Simply by sorting your fact data, you can see much improved compression, which can improve your query response time as well. But how much improvement in compression and query response could you see? That’s what I set out to discover by running a couple of little tests.

Better Compression? Yes, please.

I started with my beloved Fact Sales measure group in the Contoso Retail Operations cube. The Fact Sales measure group utilizes a named query in the DSV that is a basic select statement from the Fact Sales table in the Contoso Retail database. I checked the size of the single partition that made up the measure group and saw that it was just over 129 MB in size. Not big, but I thought we could improve on that.

1 Partition Properties Fact Sales Unsorted

So I set out to sort my data. The trick to sorting your data is to start with the fields that have the lowest cardinality (the least uniqueness). Try experimenting with different sort orders to see what kind of results you can get. For the FactSales table, I started with PromotionKey, CurrencyKey, and ChannelKey, then went from there (a quick way to check cardinality follows the query below). I simply set my partition to Query Bound and utilized the following query:

SELECT TOP 2147483647
       CONVERT (INT, CONVERT (CHAR (8), DateKey, 112)) AS DateKey,
       channelKey,
       StoreKey,
       ProductKey,
       PromotionKey,
       CurrencyKey,
       UnitCost,
       UnitPrice,
       SalesQuantity,
       ReturnQuantity,
       ReturnAmount,
       DiscountQuantity,
       DiscountAmount,
       TotalCost,
       SalesAmount,
       ETLLoadID,
       LoadDate,
       UpdateDate
FROM   dbo.FactSales
ORDER BY PromotionKey, CurrencyKey, ChannelKey, StoreKey, ProductKey, DateKey;
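
To find the low-cardinality candidates in the first place, a quick distinct count against the fact table (sketched here for FactSales) does the trick:

SELECT COUNT(DISTINCT PromotionKey) AS PromotionKeyValues,
       COUNT(DISTINCT CurrencyKey)  AS CurrencyKeyValues,
       COUNT(DISTINCT channelKey)   AS ChannelKeyValues
FROM   dbo.FactSales;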

Arguably there are better ways to sort the data for SSAS but that’s not the point of this blog post so I’ll leave that for you to decide.

I did a quick redeployment of the cube and processed the Fact Sales measure group.

2 Partition Properties Fact Sales Sorted

The partition size dropped down to 93.11 MB! That’s a whopping 28% decrease in size! Awesome!

28% is a pretty big storage savings, especially when we could potentially be dealing with a lot more data in an enterprise scenario. Personally, I’ve seen storage savings up to 45% simply by sorting the data in the relational engine.

Better Query Performance? Sign me up!

With smaller fact .data files, we should see better query performance, right? Theoretically it makes sense, but I was curious how much better query performance we would actually see. So I set out on another little experiment.

First, I used Excel to generate a nasty little MDX query for my testing, which I captured with profiler:

SELECT
{
[Measures].[Sales Amount]
,[Measures].[Sales Quantity]
,[Measures].[Sales Unit Cost]
}
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME
,HIERARCHY_UNIQUE_NAME
ON COLUMNS
,NON EMPTY
CrossJoin
(
Hierarchize
(
{
DrillDownLevel
(
{[Date].[Calendar Week].[All Date]}
,,,INCLUDE_CALC_MEMBERS
)
}
)
,Hierarchize
(
{
DrillDownLevel
(
{[Product].[Product Name].[All Products]}
,,,INCLUDE_CALC_MEMBERS
)
}
)
)
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME
,HIERARCHY_UNIQUE_NAME
,[Product].[Product Name].[Product Name].[Product Available For Sale Date]
,[Product].[Product Name].[Product Name].[Product Brand Name]
,[Product].[Product Name].[Product Name].[Product Category Description]
,[Product].[Product Name].[Product Name].[Product Category Label]
,[Product].[Product Name].[Product Name].[Product Class]
,[Product].[Product Name].[Product Name].[Product Color]
,[Product].[Product Name].[Product Name].[Product Description]
,[Product].[Product Name].[Product Name].[Product Image URL]
,[Product].[Product Name].[Product Name].[Product Label]
,[Product].[Product Name].[Product Name].[Product Manufacturer]
,[Product].[Product Name].[Product Name].[Product Size Range]
,[Product].[Product Name].[Product Name].[Product Size Unit Measure]
,[Product].[Product Name].[Product Name].[Product Status]
,[Product].[Product Name].[Product Name].[Product Stock Type]
,[Product].[Product Name].[Product Name].[Product Stop Sale Date]
,[Product].[Product Name].[Product Name].[Product Style]
,[Product].[Product Name].[Product Name].[Product Subcategory Description]
,[Product].[Product Name].[Product Name].[Product Subcategory Label]
,[Product].[Product Name].[Product Name].[Product Subcategory Name]
,[Product].[Product Name].[Product Name].[Product Unit Of Measure]
,[Product].[Product Name].[Product Name].[Product URL]
,[Product].[Product Name].[Product Name].[Product Weight Unit Measure]
,[Date].[Date].[Date].[Asia Season]
,[Date].[Date].[Date].[Calendar Month]
,[Date].[Date].[Date].[Calendar Week Day]
,[Date].[Date].[Date].[Date Description]
,[Date].[Date].[Date].[Europe Season]
,[Date].[Date].[Date].[Fiscal Month]
,[Date].[Date].[Date].[Is Work Day]
,[Date].[Date].[Date].[North America Season]
ON ROWS
FROM
[Operation]

CELL PROPERTIES
VALUE
,FORMAT_STRING
,LANGUAGE
,BACK_COLOR
,FORE_COLOR
,FONT_FLAGS;

I then modified my partition to use the unsorted data, redeployed, and reprocessed. Executing the query against the Contoso database with a cold cache returned the following execution time, which I captured with Profiler again:

3 Unsorted Query Duration Cold Cache

The query finished in just over 56 seconds. Against a warm cache, the query finished in about 50 seconds.

I once again altered my partition to be query bound using the T-SQL query previously mentioned, redeployed, reprocessed, cleared the cache, and ran my query. This time my query finished executing in 49 seconds!

5 Sorted Query Duration Cold Cache

So simply by sorting the data for loading into my partitions, I saved 28% storage space and improved my query’s performance by 11%! Not bad for about 10 minutes worth of work, huh? I conducted my tests several times and each time the results were about the same.

The Tradeoff

It’s not all sunshine and roses; there is a drawback you should be aware of, and it has to do with processing time. Adding the Order By clause to your partition queries means those queries will probably take longer to execute, which adds time to processing. Depending on many factors, the additional time spent processing could be minimal… or not. You’ll have to decide whether the additional processing time is worth the improved compression and query performance.

The Conclusion

The lesson to be learned here is the importance of sorting your data before loading it into your partitions. Sorting your data is an easy way to improve the compression of your partitions, which improves the storage of your data as well as query performance.

I’d be interested to see if any of my readers could conduct their own tests and see what kind of performance benefits they see. So if you have a few minutes of your own, try this out and then leave a comment with your results. Good luck!

Thanks For Attending My SQLSat192 Session!

Thanks to all who attended my session MDXplosion! Intro to MDX at SQL Saturday 192 in Tampa this past weekend! It was a great event and I had a blast presenting, networking, and hanging out with some great friends. And a special thanks to all the sponsors, volunteers, and Pam Shaw for making this great event possible.

If you’d like to download my slide deck and code examples from the presentation, you can get that here. If you have more questions about MDX and want to learn more, you can find my content on MDX here. And, of course, you can email me at dryan (at) pragmaticworks.com or find me on Twitter here.