Shop Talk: 2020-05-18

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Data Architecture Day

I spent a little bit of time gleefully recapping Data Architecture Day. We had 376 viewers over the course of the 11 sessions. If you missed any sessions or want to review something, I have put together a playlist for you.

Resource Governor and Machine Learning Services

I ran into some issues around SQL Server Machine Learning Services and processes (R and Python) running out of memory. The reason is that, by default, SQL Server creates an external resource group with a limit of 20% of the allocated memory for external scripts. In my talk on Machine Learning Services in production, I cover how to change this value.

Slowly Changing Dimension Types

@iconpro5555 asked during Data Architecture Day: “what are some approaches to versioning changes?  do u mix. type 2 data with type 1?”

Answer: The reference here is to slowly changing dimensions in a Kimball style warehouse. If you want an excellent write-up, I recommend Simon Whiteley’s take on the matter. Here’s the script I used when putting together my answer:

USE tempdb
GO
-- Type 0:  Nothing will ever change...right?
CREATE TABLE dbo.Type0Dim
(
    Name VARCHAR(50),
    DateOfBirth DATE
);
-- To insert:  add a new row.
INSERT INTO dbo.Type0Dim(Name, DateOfBirth) VALUES('Tony', '2001-01-01');
SELECT * FROM dbo.Type0Dim;
-- To update:  we don't!
GO

-- Type 1:  History?  Who cares?
CREATE TABLE dbo.Type1Dim
(
    Name VARCHAR(50),
    DateOfBirth DATE
);
-- To insert:  add a new row.
INSERT INTO dbo.Type1Dim(Name, DateOfBirth) VALUES('Tony', NULL);
SELECT * FROM dbo.Type1Dim;
-- To update:  simple update statement.
UPDATE dbo.Type1Dim SET DateOfBirth = '2001-01-01' WHERE Name = 'Tony';
SELECT * FROM dbo.Type1Dim;
GO

-- Type 2:  History is a new row.
CREATE TABLE dbo.Type2Dim
(
    Name VARCHAR(50),
    FavoriteColor VARCHAR(30),
    IsCurrent BIT,
    PeriodBeginDate DATETIME2(3),
    PeriodEndDate DATETIME2(3)
);
-- To insert:  add a new row.
INSERT INTO dbo.Type2Dim(Name, FavoriteColor, IsCurrent, PeriodBeginDate, PeriodEndDate)
VALUES('Tony', 'Green', 1, GETUTCDATE(), '9999-12-31');
SELECT * FROM dbo.Type2Dim;
-- To update:  update the current row and add a new row.
DECLARE @Now DATETIME2(3) = GETUTCDATE();
UPDATE dbo.Type2Dim SET IsCurrent = 0, PeriodEndDate = @Now WHERE Name = 'Tony' and IsCurrent = 1;
INSERT INTO dbo.Type2Dim(Name, FavoriteColor, IsCurrent, PeriodBeginDate, PeriodEndDate)
VALUES('Tony', 'Puce', 1, @Now, '9999-12-31');
SELECT * FROM dbo.Type2Dim;
GO

-- Type 3:  History is the prior version.
CREATE TABLE dbo.Type3Dim
(
    Name VARCHAR(50),
    FavoriteColor VARCHAR(30),
    PriorFavoriteColor VARCHAR(30)
);
-- To insert:  add a new row WITHOUT prior values.
INSERT INTO dbo.Type3Dim(Name, FavoriteColor)
VALUES('Tony', 'Green');
SELECT * FROM dbo.Type3Dim;
-- To update:  set the prior value.
UPDATE dbo.Type3Dim SET PriorFavoriteColor = 'Green', FavoriteColor = 'Puce' WHERE Name = 'Tony';
SELECT * FROM dbo.Type3Dim;
GO

-- Type 4:  the Princess is in another castle.
CREATE TABLE dbo.Type4Dim
(
    Name VARCHAR(50),
    FavoriteColor VARCHAR(30)
);
CREATE TABLE dbo.Type4DimHistory
(
    HistoryKey BIGINT IDENTITY(1,1) NOT NULL,
    Name VARCHAR(50),
    FavoriteColor VARCHAR(30),
    CreateDate DATETIME2(3)
);
-- To insert:  add a row into the dimension *and* a row into the history.
INSERT INTO dbo.Type4Dim(Name, FavoriteColor) VALUES('Tony', 'Green');
INSERT INTO dbo.Type4DimHistory(Name, FavoriteColor, CreateDate) VALUES('Tony', 'Green', GETUTCDATE());
SELECT * FROM dbo.Type4Dim;
SELECT * FROM dbo.Type4DimHistory;
-- To update:  update the dimension *and* add a new row into the history.
UPDATE dbo.Type4Dim SET FavoriteColor = 'Puce' WHERE Name = 'Tony';
INSERT INTO dbo.Type4DimHistory(Name, FavoriteColor, CreateDate) VALUES('Tony', 'Puce', GETUTCDATE());
SELECT * FROM dbo.Type4Dim;
SELECT * FROM dbo.Type4DimHistory;
GO

-- Type 6:  I just can't decide!
CREATE TABLE dbo.Type6Dim
(
    Name VARCHAR(50),
    FavoriteColor VARCHAR(30),
    PriorFavoriteColor VARCHAR(30),
    IsCurrent BIT,
    PeriodBeginDate DATETIME2(3),
    PeriodEndDate DATETIME2(3)
);
-- To insert:  add a row without prior values.
INSERT INTO dbo.Type6Dim(Name, FavoriteColor, IsCurrent, PeriodBeginDate, PeriodEndDate)
VALUES('Tony', 'Green', 1, GETUTCDATE(), '9999-12-31');
SELECT * FROM dbo.Type6Dim;
-- To update:  update the current row and add a new row.
DECLARE @Now DATETIME2(3) = GETUTCDATE();
UPDATE dbo.Type6Dim SET IsCurrent = 0, PeriodEndDate = @Now WHERE Name = 'Tony' and IsCurrent = 1;
INSERT INTO dbo.Type6Dim(Name, FavoriteColor, PriorFavoriteColor, IsCurrent, PeriodBeginDate, PeriodEndDate)
VALUES('Tony', 'Puce', 'Green', 1, @Now, '9999-12-31');
SELECT * FROM dbo.Type6Dim;
GO

-- Clean up.
DROP TABLE IF EXISTS dbo.Type0Dim;
DROP TABLE IF EXISTS dbo.Type1Dim;
DROP TABLE IF EXISTS dbo.Type2Dim;
DROP TABLE IF EXISTS dbo.Type3Dim;
DROP TABLE IF EXISTS dbo.Type4Dim;
DROP TABLE IF EXISTS dbo.Type4DimHistory;
DROP TABLE IF EXISTS dbo.Type6Dim;
GO

One thing that I hit at the end of the discussion but want to mention here is that the types properly refer to attributes, not to dimensions. What I mean by this is, you can definitely have a dimension which combines types 0, 1, and 2 together. You may have some information which will never change, and so those attributes are type 0. Then, you may have some attributes for which we don’t care about history—those would be type 1 attributes. And some attributes might affect our fact tables, so we track the history as type 2.

Let’s say we have a Person dimension and a Sales fact. On the Person dimension, perhaps we have a DateJoined indicator. We never update that indicator, as the person joined on a specific date and that’s it. So DateJoined would be type 0. The person has a telephone number. We care about the current value of the telephone number, but the telephone number itself won’t tell us much about historical sales. At one point in time, a phone number change was typically indicative of moving to a different area, but between area code splits and mobile phones, your area code isn’t really a great indicator of your location any more. So a phone number might be a type 1 attribute: if it changes, we want to keep track of the latest value, but we don’t need to store a history of it. Favorite color, however, might be a type 2 attribute—if a person’s favorite color changes, that might affect their sales behavior as they shift from buying things in one color to buying things in a different color. In this case, we want to know when the person had a particular favorite color.

This starts to sound complex, but it’s the kind of analysis which is critical for a solid data warehouse. This may also be part of why so many data warehousing projects fail!

Mala’s Book Corner

Mala has a couple of book recommendations for us this week:

Experiences with Hands-On Labs

Tom asked for recommendations on hands-on labs. I shared some of my experiences and Neil Hambly brought up ideas as well. I don’t have a great answer, especially if you don’t do hands-on labs frequently. If you’re in the ML space, I like Azure Notebooks because it’s free and I don’t have to worry about the computers people bring in. We also talked about a few options for when notebooks aren’t a good solution: Docker containers, VMs on thumb drives, bringing your own hardware (as a trainer), and Azure VMs come up. Basically, unless you control the situation really well and pass out the hardware yourself, I would shy away from trying to use attendees’ machines directly for a hands-on lab and virtualize, containerize, or cloudify whatever I could.

Oxford Commas are Great

We had 100% approval of Oxford commas in chat. It was beautiful.

One Reply to “Shop Talk: 2020-05-18”

Leave a Reply

Your email address will not be published. Required fields are marked *