data warehouse design

Data Warehouse Design Challenge: Relating a Temporal Fact Table to a Date Dimension

This past week I ran into an interesting challenge with a client. The data warehouse is capturing testing data for an educational institution. In the screenshot below, you’ll see a simplified model I created that illustrates the basics of their data warehouse. Not too complex.

image

The goal of the model is to be able to track student test scores.

The Problem

This model seemed to be what they were hoping for but they discovered an issue with their model as they began to build their first reports. A student can be related to more than one faculty member. Because of this, the client created a many-to-many bridge table between DimStudent and DimFaculty called BridgeStudentFaculty. BridgeStudentFaculty has an EntryDate and ExitDate to identify the time frame a student was related to a faculty member.

Here is the dimension usage of the example cube so you can better visualize the relationships.

image

The problem is that FactTest is only related to BridgeStudentFaculty through the DimStudent dimension, which means that when a student takes a test the client is unable to easily determine which faculty member the student was related to at the time the test was taken. The following query results illustrate the problem:

image

The rows that are highlighted in yellow are correct. We can see that the Full Date, which is the Test Date, falls between the Entry Date and Exit Date values.

The rows that are highlighted in red are incorrect. The Full Date is outside of the Entry Date and Exit Date values. These records are shown because the relationship between the Entry and Exit Dates is based on the Student and does not consider the dates.

The Solution

There are a couple solutions here that could be considered. The first is that rather than leaving BridgeStudentFaculty as a temporal style fact table with one record per date range per student per faculty, I could transform the table into a daily snapshot fact table, with one row per day per student per faculty. By doing this I would then be able to create a direct relationship between BridgeStudentFaculty and the Test Date dimension, allow me to relate test dates to the days that a student was related to a faculty member. The reason that I did not go with this solution is that this would cause the size of the BridgeStudentFaculty table to explode to an unacceptable size.

The second option that I went with was to create a couple new tables to support a relationship between the Entry and Exit date ranges in BridgeStudentFaculty and the Test Date dimension. I created a new table called DimDateRange and another new table BridgeDateRangeDate. This new BridgeDateRangeDate table will allow the users to easily identify date ranges a student was related to a faculty member and will also be a much smaller table than the table required using the previous option. Here you can see the diagram:

image

The DimDateRange table contains one row for each unique date range that exists in the BridgeStudentFaculty table:

image

And the BridgeDateRangeDate table contains one row for each day in a date range that exists in the DimDateRange table:

image

Here are the updated relationships in the cube:

image

I created a new dimension based on DimDateRange and a new measure group based on BridgeDateRangeDate. I create the regular relationships between the tables but also added many-to-many relationships between Test Date and the Bridge Student Faculty measure group using the the BridgeDateRangeDate table and also a many-to-many relationship between DimDateRange and the Fact Test measure group using the BridgeDateRangeDate table.

So now, because of the new many-to-many relationships, when I run my query to analyze tests for students and faculty members I only see records where the test date is within the date range that a student was related to a faculty member.

image

Feedback?

I wanted to post this solution in hopes that someone else would benefit from it. Please leave your feedback and comments below and let me know if you found this helpful! Have a great day!

6 thoughts on “Data Warehouse Design Challenge: Relating a Temporal Fact Table to a Date Dimension”

  1. Hello Dustin, Just curious why did you not just create a basic star schema with dim_facluty directly related to the fact table and create role paying dimensions for the 2 date attributes?

    1. The reason I didn’t add faculty to the fact test table is because that would have lowered the grain of the table and there are measures that cannot be subdivided any further. Dividing test scores wasn’t an option.

      1. I assumed that this was a transnational fact table, 1 test per student per faculty member per Test Type. So is this fact table a Periodic Snapshot?

    1. Cool beans. Also, one other aspect I left out is that a student also has a m2m relationship with a Subgroup dim and a another dim (which I can’t recall) and the m2m bridges for those relationships also has an effective date range. This solution works with the multiple bridge tables.

Comments are closed.