Sunday, December 4, 2022
HomeBusiness IntelligenceSlowly Altering Dimension (SCD) in Energy BI, Half 1, Introduction to SCD

Slowly Altering Dimension (SCD) in Energy BI, Half 1, Introduction to SCD

Slowly altering dimension (SCD) is an information warehousing idea coined by the superb Ralph Kimball. The SCD idea offers with transferring a particular set of knowledge from one state to a different. Think about a human assets (HR) system having an Worker desk. As the next picture exhibits, Stephen Jiang is a Gross sales Supervisor having ten gross sales representatives in his staff:

SCD in Power BI, Stephen Jiang is the sales manager of a team of 10 sales representatives
Picture 1: Stephen Jiang is the gross sales supervisor of a staff of 10 gross sales representatives

Immediately, Stephen Jiang bought his promotion to the Vice President of Gross sales position, so his staff has grown in measurement from 10 to 17. Stephen is identical particular person, however his position is now modified, as proven within the following picture:

SCD in Power BI, Stephen's team after he was promoted to Vice President of Sales
Picture 2: Stephen’s staff after he was promoted to Vice President of Gross sales

One other instance is when a buyer’s deal with modifications in a gross sales system. Once more, the shopper is identical, however their deal with is now completely different. From an information warehousing standpoint, we’ve got completely different choices to cope with the info relying on the enterprise necessities, main us to various kinds of SDCs. It’s essential to notice that the info modifications within the transactional supply methods (in our examples, the HR system or a gross sales system). We transfer and remodel the info from the transactional methods by way of ETL (Extract, Transform, and Load) processes and land it in an information warehouse, the place the SCD idea kicks in. SCD is about how modifications within the supply methods replicate the info within the knowledge warehouse. These sorts of modifications within the supply system don’t occur fairly often therefore the time period slowly altering. Many SCD sorts have been developed through the years, which is out of the scope of this put up, however to your reference, we cowl the primary three sorts as follows.

SCD sort zero (SCD 0)

With this kind of SCD, we ignore all modifications in a dimension. So, when an individual’s residential deal with modifications within the supply system (an HR system, in our instance), we don’t change the touchdown dimension in our knowledge warehouse. In different phrases, we ignore the modifications inside the knowledge supply. SCD 0 is additionally known as mounted dimensions.

SCD sort 1 (SCD 1)

With an SCD 1 sort, we overwrite the outdated knowledge with the brand new. A superb instance of an SCD 1 sort is when the enterprise doesn’t want the shopper’s outdated deal with and solely must hold the shopper’s present deal with.

SCD sort 2 (SCD 2)

With this kind of SCD, we hold the historical past of knowledge modifications within the knowledge warehouse when the enterprise must hold the outdated and present knowledge. In an SCD 2 state of affairs, we have to keep the historic knowledge, so we insert a brand new row of knowledge into the info warehouse each time a transactional system modifications. A change within the transactional system is among the following:

  • Insertion: When a brand new row inserted into the desk
  • Updating: When an current row of knowledge is up to date with new knowledge
  • Deletion: When a row of knowledge is faraway from the desk

Let’s proceed with our earlier instance of a Human Useful resource system and the Worker desk. Inserting a brand new row of knowledge into the Worker dimension within the knowledge warehouse for each change inside the supply system causes knowledge duplications within the Worker dimensions within the knowledge warehouse. Due to this fact we can not use the EmployeeKey column as the first key of the dimension. Therefore, we have to introduce a brand new set of columns to ensure the distinctiveness of each row of the info, as follows:

  • A brand new key column that ensures rows’ uniqueness within the Worker dimension. This new key column is solely an index representing every row of knowledge saved in an information warehouse dimension. The brand new secret’s a so-called surrogate key. Whereas the Surrogate Key ensures every row within the dimension is exclusive, we nonetheless want to keep up the supply system’s main key. By definition, the supply system’s main keys at the moment are known as enterprise keys or alternate keys within the knowledge warehousing world.
  • Begin Date and an Finish Date column characterize the timeframe throughout which a row of knowledge is in its present state.
  • One other column exhibits the standing of every row of knowledge.

SCD 2 is probably the most widespread sort of SCD. After we create the required columns

Let’s revisit our state of affairs when Stephen Jiang was promoted from Gross sales Supervisor to Vice President of Gross sales. The next screenshot exhibits the info within the Worker dimensions within the knowledge warehouse earlier than Stephen bought the promotion:

SCD in Power BI, The employee data before Stephen was promoted
Picture 3: The worker knowledge earlier than Stephen was promoted

The EmployeeKey column is the Surrogate Key of the dimension, and the EmployeeBusinessKey column is the Enterprise Key (the first key of the shopper within the supply system); the Begin Date column exhibits the date Stephen Jiang began his job as North American Gross sales Supervisor, the Finish Date column has been left clean (null), and the Standing column exhibits Present. Now, let’s take a look on the knowledge after Stephen will get the promotion, which is illustrated within the following screenshot:

SCD in Power BI, The employee data after Stephen gets promoted
Picture 4: The worker knowledge after Stephen will get promoted

Because the above picture exhibits, Stephan Jiang began his new position as Vice President of Gross sales on 13/10/2012 and completed his job as North American Gross sales Supervisor on 12/10/2012. So, the info is remodeled whereas transferring from the supply system into the info warehouse. As you see, dealing with SCDs is among the most important duties within the ETL processes.

Let’s see what SCD 2 means in the case of knowledge modeling in Energy BI. The primary query is: Can we implement SCD 2 instantly in Energy BI Desktop with out having an information warehouse? To reply this query, we should do not forget that we all the time put together the info earlier than loading it into the mannequin. Alternatively, we create a semantic layer when constructing an information mannequin in Energy BI. In a earlier put up, I defined the completely different parts of a BI resolution, together with the ETL and the semantic layer. However I repeat it right here. In a Energy BI resolution, we maintain the ETL processes utilizing Energy Question, and the info mannequin is the semantic layer. The semantic layer, by definition, is a view of the supply knowledge (often an information warehouse), optimised for reporting and analytical functions. The semantic layer is to not substitute the info warehouse and isn’t one other model of the info warehouse both. So the reply is that we can not implement the SCD 2 performance purely in Energy BI. We have to both have an information warehouse protecting the historic knowledge, or the transactional system has a mechanism to assist sustaining the historic knowledge, similar to a temporal mechanism. A temporal mechanism is a characteristic that some relational database administration methods similar to SQL Server provide to supply details about the info saved in a desk at any time as a substitute of protecting the present knowledge solely. To be taught extra about temporal tables in SQL Server, verify this out.

After we load the info into the info mannequin in Energy BI Desktop, we’ve got all present and historic knowledge within the dimension tables. Due to this fact, we’ve got to watch out when coping with SCDs. As an illustration, the next screenshot exhibits reseller gross sales for workers:

SCD in Power BI, SCD in Power BI, Reseller sales for employees without considering SCD
Picture 5: Reseller gross sales for workers with out contemplating SCD

At a primary look, the numbers appear to be right. Nicely, they might be proper; they might be fallacious. It relies on what the enterprise expects to see on a report. Take a look at Picture 4, which exhibits Stephen’s modifications. Stephen had some gross sales values when he was a North American Gross sales Supervisor (EmployeeKey 272). However after his promotion (EmployeeKey 277), he isn’t promoting anymore. We didn’t think about SCD once we created the previous desk, which implies we think about Stephen’s gross sales values (EmployeeKey 272). However is that this what the enterprise requires? Does the enterprise count on to see all staff’ gross sales with out contemplating their standing? For extra readability, let’s add the Standing column to the desk.

SCD in Power BI, Reseller sales for employees and their status without considering SCD
Picture 6: Reseller gross sales for workers and their standing with out contemplating SCD

What if the enterprise must solely present gross sales values just for staff when their standing is Present? In that case, we must issue the SCD into the equation and filter out Stephen’s gross sales values. Relying on the enterprise necessities, we’d want so as to add the Standing column as a filter within the visualizations, whereas in different instances, we’d want to switch the measures by including the Begin DateFinish Date, and Standing columns to filter the outcomes. The next screenshot exhibits the outcomes once we use visible filters to take out Stephen’s gross sales:

SCD in Power BI, SCD in Power BI, Reseller sales for employees considering SCD
Picture 7: Reseller gross sales for workers contemplating SCD

Coping with SCDs is just not all the time so simple as this. Generally, we have to make some modifications to our knowledge mannequin.

So, do all of the above imply we can not implement any sorts of SCDs in Energy BI? The reply, as all the time, is “it relies upon.” In some eventualities, we are able to implement an answer much like the SCD 1 performance, which I clarify in one other weblog put up. However we’re out of luck in implementing the SCD 2 performance purely in Energy BI.

Have you ever used SCDs in Energy BI, I’m curious to know concerning the challenges you confronted. So please share you ideas within the feedback part beneath.



Please enter your comment!
Please enter your name here

Most Popular

Recent Comments