Shop Talk: 2020-06-08

The Recording

The Panelist

  • Kevin Feasel

Notes: Questions and Topics

Where Did Everybody Go?

Unfortunately, I was the only person able to make it on for Shop Talk last night. Fortunately, we had some great folks in chat to break up the monotony of me being me.

Jupyter Books and Calendar Tables

Mala asked (a while ago), what is the difference between a Jupyter Notebook and a Jupyter Book?

Answer: Jupyter Notebooks are individual documents which can contain code, markdown, and documentation. I used the analogy of “looseleaf” notebooks for a bundle of notebooks in the same directory which are somewhat related, but with nothing explicitly tying them together.

Jupyter Books are collections of notebooks with a table of contents and proper “binding.” In Azure Data Studio, you can see the output:

My example was Emanuele Meazzo’s SQL Server Diagnostic Book. You can see everything connected together in a way that simply having a folder of “looseleaf” notebooks doesn’t give you.

I also mentioned the Big Data Clusters notebook.

@johnfan14 then got me on the topic of calendar tables. I love calendar tables. They play an important role in my DataCamp course. I also have a two-part series on building a calendar table and using it to simplify queries.

Subscription Data

@thedukeny asked during Data Architecture Day, “One thing I’ve always wondered is how are people fitting subscription data to a star schema?  So many business are subscription based with THROUGH and FROM periods and not based on a particular date”

Answer:  The key here is to use the subscription as a dimension and include the time frames there. Then, in your ETL process, have the fact look up the right subscription key—it’s a lot better to take that extra time while loading the data and make data retrieval easier and faster.

Here’s the script I used to explain my thoughts:

CREATE TABLE #Subscription
(
    SubscriptionKey INT,
    SourceSubscriptionID UNIQUEIDENTIFIER,
    UserKey INT,
    Level VARCHAR(30),
    PeriodBeginDate DATE,
    PeriodEndDate DATE,
    IsCurrentRow BIT
);

-- Create a couple of people.  We'd have other dimensions too, of course.
INSERT INTO #Subscription
(
    SubscriptionKey,
    SourceSubscriptionID,
    UserKey,
    -- Other user information.  No worries about dupes...
    Level,
    PeriodBeginDate,
    PeriodEndDate,
    IsCurrentRow
)
VALUES
-- User 1 came in at Gold
(1, NEWID(), 1, 'Gold', '2020-05-01', '9999-12-31',  1),
-- User 2 came in at Talc and upgraded to Zirconium
(2, NEWID(), 2, 'Talc', '2018-04-01', '2018-12-01', 0),
(3, NEWID(), 2, 'Zirconium', '2018-12-01', '9999-12-31', 1),
-- User 3 came in at Quartz, downgraded to Amethyst, and left
(4, NEWID(), 3, 'Quartz', '2017-11-29', '2018-11-28', 0),
(5, NEWID(), 3, 'Amethyst', '2018-11-28', '2019-11-27', 1);

-- Example fact table for watching a video.
CREATE TABLE #CourseView
(
    DateKey INT,
    CourseKey INT,
    SubscriptionKey INT,
    -- Could include UserKey if we normally group by that...
    -- Other keys
    NumberOfMinutesViewed INT,
    NumberOfSegmentsViewed INT,
    FinishedCourse BIT
);


-- User 2 upgrades to Topaz, so that's an SCD2 change.
-- Update the current row and add a new row.
UPDATE #Subscription
SET PeriodEndDate = '2020-06-08', IsCurrentRow = 0
WHERE SubscriptionKey = 3;

INSERT INTO #Subscription
(
    SubscriptionKey,
    SourceSubscriptionID,
    UserKey,
    Level,
    PeriodBeginDate,
    PeriodEndDate,
    IsCurrentRow
)
VALUES
(6, NEWID(), 2, 'Topaz', '2020-06-08', '9999-12-31', 1);

-- User 1 leaves us.  This is changing the end date, so just SCD1.
UPDATE #Subscription
SET PeriodEndDate = '2020-06-30'
WHERE SubscriptionKey = 1;

I also ginned up a sample query against another data set based on some discussion with @thedukeny. This was off the top of my head, but gives an idea of how you can use it.

SELECT
    c.CalendarYear,
    c.CalendarQuarterName,
    e.FirstName,
    e.LastName,
    s.[Level],
    SUM(er.Amount) AS Total
FROM ExpenseReports.dbo.Employee e
    INNER JOIN #Subscription s ON e.EmployeeID = s.UserKey
    CROSS JOIN ExpenseReports.dbo.Calendar c
    LEFT OUTER JOIN ExpenseReports.dbo.ExpenseReport er
        ON er.EmployeeID = e.EmployeeID
        AND c.Date = er.ExpenseDate
WHERE c.[Date] >= s.PeriodBeginDate and c.[Date] < s.PeriodEndDate
and c.date < '2020-01-01'
GROUP BY c.CalendarYear, c.CalendarQuarterName, e.FirstName, e.LastName, s.Level
ORDER BY e.FirstName, c.CalendarYear, c.CalendarQuarterName;

This was a great question because modern web-based companies are moving to subscription models for payment, and some of the older books like Kimball’s don’t really cover subscription models that well.

Crazy Job Descriptions

From Mike Chrestensen: “This is the second job I’ve seen.  Am I crazy or is this impossible.  Full stack + DBA salesforce support + .ASP/.NET/C#, SQL Dev, Reporting and interface with business.”

Answer: I love these descriptions. I think that there are four major reasons why you see crazy job descriptions (though in fairness, the one Mike showed me wasn’t necessarily crazy):

  • HR got their hands on it. That’s where you start seeing things like one of my favorites, where you need 3 years of SQL Server 2012 experience…in the year 2014. Anders also pointed out a developer job which wanted several years of .NET experience in 1999—mind you, .NET was in preview at that time and C# didn’t even stabilize as a language until after 2000.
  • Janice just left the company and now we need a new Janice. Here’s all the stuff she did over the past decade. You should be able to do all of the things she did.
  • Rebecca needs a promotion, but the company requires that we post all jobs to the public. Therefore, we figure out all of the things Rebecca does and has experience with and make them mandatory for this new role. You can tell they’re gaming the system when those descriptions mention experience with internal products.
  • The company is a startup, or still has that startup mentality.

The fourth one is the one I’m most okay with, especially if you know it’s a startup. One person needs to wear a lot of hats in that kind of role, so it’s good to know what you’re in for.

I don’t want to make it sound like having a brain dump of every potentially-tangential technology is a good thing for a job description—it’s not. My ideal job description includes the stuff that you need to know, the stuff we want you to know, and the stuff which is a bonus clearly delineated. If it’s an app developer role, it’s okay to list SQL Server development skills, SSIS, and SSRS, but make it clear that these are small parts of the job and bonus skills rather than core to the position.

Leave a Reply

Your email address will not be published. Required fields are marked *