After spending the last five years or so designing and performance tuning SSAS cubes and dimensions for the clients of Pragmatic Works, I’ve noticed there seems to be a recurring pattern with poorly designed dimensions. Because of this I wanted to take a few minutes to point out three easy things you can do to improve the performance and usability of your solution’s dimensions
Remember Less is More
One of the most common mistakes that I often see with poorly performing dimensions is large, bloated dimensions with attributes that are duplicated, unnecessary, and/or unused. These extra dimension attributes can seriously increase the amount of time required to process the dimension. It’s important to remember that for each attribute in a dimension, a Select Distinct query is executed against the data source. Depending on the number of rows in the table, unique values in the field and other factors, the processing time will be negatively impacted with each attribute that is added to the dimension.
I once worked for a client that had over 100 SSAS databases and very mature BI environment but had allowed the users to talk their team into making some poor dimension design decisions. One of their largest cubes featured an Employee dimension that included hundreds of thousands of members. The Employee dimension contained an unnecessary amount of attributes, many which contained the same data as other attributes or went unused. For example:
- Employee Name – Employee Number (ex. John Smith – 0004567)
- Employee Number – Employee Name (ex. 0004567 – John Smith)
- Employee Number Name (ex. 0004567/John Smith)
- Employee Name Number (ex. John Smith/0004567)
- Employee Number (ex. 0004567)
- Employee Name (ex. John Smith)
The reasons for the duplication of the data was that the dimension was leveraged by many different business groups. These users wanted the data formatted in the way that mirrored their previously used legacy systems. The above problem persisted in other dimensions in the cube, as well. As the data volume grew, more attributes were created, and more measure groups were added causing processing time to increase to unacceptable levels. Additional steps were necessary to performance tune the dimensions by eliminating the unused and unnecessary attribute hierarchies.
A better design strategy would have been to call a meeting to bring together members from the vested business groups in order to come to a consensus on how how the Employee Name and Employee Number attributes should be displayed with the purpose of cutting back on the number of attributes per dimension. Doing so would decrease processing time for the dimension and also provide better query performance as the query cache can be better utilized for a smaller number of attributes.
Use Member Properties
Sometimes in a dimension we will have attributes that are important to view as details of another attribute but by themselves these attributes are not all that important for slicing and dicing. The Adventure Works Employee dimension has many attributes configured as member properties (they’re the attributes with the grayed-out icons). These attributes contain data that the business has identified as necessary for viewing as a detail of a given employee but are not necessary for slicing and dicing sales metrics or for filtering reports. In this case, we can improve processing performance and save storage by setting the AttributeHierarchyEnabled property to False.
When we set AttributeHierarchyEnabled to False, we can disable building an attribute hierarchy for the attribute. This means the attribute alone can not be used to slice or filter data in the cube . It also means that indexes are not built for the attribute, saving processing time. This can be especially valuable for attributes that have a high level of uniqueness such as SSN, Login ID, or Emergency Contact Phone and that are only necessary to be viewed as details of another attribute.
After setting AttributeHierarchyEnabled to False, we are still able to query the data and interact with the data in Excel. If I right-click on the Employee attribute, select Show Properties in Report, I can then select the member property to display in the report.
Avoid String Data Types
In Analysis Services, string data is stored in special, separate files called string stores. String store files have various extensions depending on where the data is used (ie key, property, member-value, etc.). Strings have to be stored in a separate file because the records are dynamic in size. SSAS also requires at least 12 additional bytes of storage per string: 4 bytes to store the position, 4 bytes for the size, 2 bytes for the type, and 2 bytes for the string NULL terminator character (“String Store Structure”, Microsoft SQL Server 2008 Analysis Services Unleashed, p348-49). Because of these facts, storing strings in our SSAS databases is expensive and degrades performance. For every string added to the solution there will be an impact to processing and query performance.
So what does this mean for us as SSAS developers? Should we eliminate all strings from our cube? Hardly. We want to keep in mind the interests of the users in order to provide the users with an optimal and friendly experience. But it is also important to remember there are penalties for utilizing string data in the cube when unnecessary. Utilize fixed data types when at all possible. This especially applies to dimension attribute key values because many times the key values of a member are not exposed to the users. For example, if I were creating an Employee Name attribute, I could use the Employee Number (INT data type) as the attribute key column and then use the Employee Name (varchar(50)) as the attribute name column. This will allow us to limit the string data being stored to only what is necessary in order to preserve the user experience and optimize performance.
So here are the three best practices to keep in mind while designing your dimensions.
- Only include attributes that are necessary for your dimensions. If an attribute isn’t used, eliminate it. Also, avoid duplicating data.
- Create member properties for attributes that are not necessary for slicing and dicing or filtering.
- Avoid string data when possible.
For information on SSAS design best practices, take a look at Three SSAS Best Practices to Follow.
For some more information on designing and performance tuning SSAS solutions, I highly recommend the following resources:
- Microsoft SQL Server 2008 Analysis Services Unleashed
- Professional Microsoft SQL Server 2012 Analysis Services with MDX and DAX
What did you think about my three dimension design best practices? Did I leave something out that you think I should have covered in this post? Please let me know! I’d love to hear your thoughts.