This past week I ran into an interesting challenge with a client. The data warehouse is capturing testing data for an educational institution. In the screenshot below, you’ll see a simplified model I created that illustrates the basics of their data warehouse. Not too complex.
The goal of the model is to be able to track student test scores.
This model seemed to be what they were hoping for but they discovered an issue with their model as they began to build their first reports. A student can be related to more than one faculty member. Because of this, the client created a many-to-many bridge table between DimStudent and DimFaculty called BridgeStudentFaculty. BridgeStudentFaculty has an EntryDate and ExitDate to identify the time frame a student was related to a faculty member.
Here is the dimension usage of the example cube so you can better visualize the relationships.
The problem is that FactTest is only related to BridgeStudentFaculty through the DimStudent dimension, which means that when a student takes a test the client is unable to easily determine which faculty member the student was related to at the time the test was taken. The following query results illustrate the problem:
The rows that are highlighted in yellow are correct. We can see that the Full Date, which is the Test Date, falls between the Entry Date and Exit Date values.
The rows that are highlighted in red are incorrect. The Full Date is outside of the Entry Date and Exit Date values. These records are shown because the relationship between the Entry and Exit Dates is based on the Student and does not consider the dates.
There are a couple solutions here that could be considered. The first is that rather than leaving BridgeStudentFaculty as a temporal style fact table with one record per date range per student per faculty, I could transform the table into a daily snapshot fact table, with one row per day per student per faculty. By doing this I would then be able to create a direct relationship between BridgeStudentFaculty and the Test Date dimension, allow me to relate test dates to the days that a student was related to a faculty member. The reason that I did not go with this solution is that this would cause the size of the BridgeStudentFaculty table to explode to an unacceptable size.
The second option that I went with was to create a couple new tables to support a relationship between the Entry and Exit date ranges in BridgeStudentFaculty and the Test Date dimension. I created a new table called DimDateRange and another new table BridgeDateRangeDate. This new BridgeDateRangeDate table will allow the users to easily identify date ranges a student was related to a faculty member and will also be a much smaller table than the table required using the previous option. Here you can see the diagram:
The DimDateRange table contains one row for each unique date range that exists in the BridgeStudentFaculty table:
And the BridgeDateRangeDate table contains one row for each day in a date range that exists in the DimDateRange table:
Here are the updated relationships in the cube:
I created a new dimension based on DimDateRange and a new measure group based on BridgeDateRangeDate. I create the regular relationships between the tables but also added many-to-many relationships between Test Date and the Bridge Student Faculty measure group using the the BridgeDateRangeDate table and also a many-to-many relationship between DimDateRange and the Fact Test measure group using the BridgeDateRangeDate table.
So now, because of the new many-to-many relationships, when I run my query to analyze tests for students and faculty members I only see records where the test date is within the date range that a student was related to a faculty member.
I wanted to post this solution in hopes that someone else would benefit from it. Please leave your feedback and comments below and let me know if you found this helpful! Have a great day!