Shop Talk: 2021-01-04

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Happy Trails, Tom

Our major announcement is that Tom Norman is moving to Texas. I appreciate everything that Tom has done for us at TriPASS and we all wish him well.

As a result, we do have an opening for a board member. If you are a TriPASS member in good standing and interested in being the VP of Membership, please reach out to me.

TriPASS Survey Results

Our first topic of the night was the results of the TriPASS survey. We had enough respondents to get some interesting results. Here’s the quick version:

  • A small majority of TriPASS members who voted said that they would potentially be interested in paid training sponsored by TriPASS, with a price point between $100 and $150, so basically the price of a SQL Saturday pre-con. There were quite a few topics mentioned, but little consistency in topic selection so we’ll have to talk about it as a board to see if there is a good training option we might put on.
  • Thanks to everybody who contributed topics of interest for us to cover. The good news is that we have plans to cover at least 70% or so of the requested topics in the next year. And I might find a way to bring in one or two more.
  • We asked about special interest groups, and we will continue to have two special interest groups in addition to the main meeting. The second Tuesday of the month will continue to be Advanced DBA. For the fourth Tuesday of the month, the title will be Data Science and Business Intelligence. This means that we will include more topics like Power BI, Tableau, and data modeling, but will keep data science and machine learning topics in play as well.
  • Finally, the last question on the survey that I covered pertained to Shop Talk cadence. We have decided to move Shop Talk to run once every two weeks. This means that we will meet every other Monday at 7 PM Eastern.

Thank you to everybody who took the time to fill out the survey.

Power BI End-to-End

Melissa Coates has an updated document for us: the Power BI End-to-End Diagram. Melissa has kept this up to date for several years, and it’s crazy to see just how much Power BI has changed over time. We joke about having a release every other day, but it feels true…

Data Professional Salary Survey, 2021 Edition

Brent Ozar has released the results of the 2021 Data Professional Salary Survey. I haven’t taken the time to look at the data yet, but I have for prior years. It’s a skewed data set—primarily people who know of Brent (or are in that extended network), so it’s going to focus mostly on SQL Server. But if you understand that this isn’t the entire data platform professional population, it’s a very interesting data set for analysis.

Shop Talk: 2020-12-28

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano

Notes: Questions and Topics

Announcements

Our major announcement is that the TriPASS 2020 membership survey is open. If you haven’t filled it out, it will be open for the next couple of weeks.

Data Saturdays

We spent the first segment of the show talking about the post-PASS future. We included a link to Data Saturdays, which is one step toward replacing SQL Saturdays.

I’d also recommend checking out Call for Data Speakers, a service hosted by Daniel Hutmacher to give data platform speakers and conferences a central location, even outside of the SQL Saturday/Data Saturdays paradigm.

Azure Databricks and Azure Synapse Analytics

We spent a good amount of time walking through the differing use cases of Azure Databricks and Azure Synapse Analytics. Microsoft has an architecture guide walking through the use cases. One point of difference I have: I don’t think HDInsight is worth using.

Ivana Pejeva has a great article on the topic. One thing about the article, though, is that it was written a few months back, so Azure Synapse Analytics has changed a bit since.

We also talked about Snowflake competitors, which in the Azure cloud is Azure Synapse Analytics dedicated SQL pools.

Time Series Databases

@rnicrosoft asked a question around analytics when you have 800 million or so key-value pairs. The transactional side solution is typically something like Cosmos DB, where you’re reading and writing single records at a time. But what happens when you need to perform analysis on the data?

One solution is to use something like Azure Synapse Link to pull that data from Cosmos DB into Azure Synapse Analytics and organize the data in a classic fact-dimension model.

But another solution would be to store the data in a time series database like InfluxDB and visualize it in Grafana. Tracy and I have implemented monitoring with InfluxDB, Telegraf, and Grafana, and you can use this for “normal” analytics as well.

Shop Talk: 2020-12-14

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano

Notes: Questions and Topics

Announcements

Our major announcement is that the TriPASS 2020 membership survey is open. If you haven’t filled it out, it will be open for the next couple of weeks.

Stuff’s Broken: Google

We started out by covering the Google outage that morning. Interestingly, several sources I’ve seen say that the outage was “about an hour,” but I know YouTube was down at 4:30 AM Eastern and services weren’t fully back on until approximately 7 AM.

We also talked about redundancy in communication technologies: if you’re using GSuite, have an alternative. If you’re heavily invested in O365, have an alternative. When we’re all remote, having those backup methods of communication may feel redundant but it can be critical. Anders brought up cases where critical people had satellite phones. That answer might be a bit extreme for many companies, but before laughing it off, ask yourself how much it costs the company if nobody can communicate for a few hours. Also, I mentioned the possibility of dual-use systems: if you have meeting software (Zoom, GoToMeeting, Teams, etc.) in addition to a separate provider for chat and e-mail, you can use it as a backstop…unless it’s also down.

Stuff’s Broken: the Rest of the World

Our major topic was all about Solarwinds Orion. This is a pretty big deal, as attackers were able to leverage a vulnerability in a downstream vendor to attack the real targets, which include quite a few US federal government agencies. The CISA (which does not have authority over the Department of Defense) mandated all non-defense agencies remove from the network any server running Solarwinds Orion software and not to bring those machines back onto the network until they get the all clear and wipe those machines.

The Departments of Commerce and Treasury have already announced breaches and Brian Krebs has a bit more, including speculation that it’s hit more than just those two departments.

Lesley Carhart lays out more of the story and points out both that this is not a new style of attack and that this is an extremely difficult vector to protect.

Pluralsight Acquisition

Our last major topic is that Vista Equity Partners is aiming to acquire Pluralsight for $3.5 billion. After an IPO in 2018, Vista would take Pluralsight back private. This still has to go through regulators, but I’m not sure I’d see any major roadblocks there.

We then spent a fair amount of time talking about various training offerings and how the market has shifted. When Pluralsight first came out, it was not the first of its kind, but it was a great way for developers to acquire professional training via video. They expanded this out over the years, but that market is pretty well filled out. Pluralsight has responded by moving more to certification training and project work rather than directed video training, and we’ve seen other training players move in that direction as well.

Shop Talk: 2020-12-07

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano

Notes: Questions and Topics

Announcements

Our major announcement is that the TriPASS 2020 membership survey is open. If you haven’t filled it out, it will be open for the next couple of weeks.

ETL Best Practices

Our big topic was around ETL best practices. We spent quite a bit of time on tooling, processes, ETL vs ELT, and scaling (or lack thereof) in major tools.

Books, Books Everywhere

I showed off my brand-new copy of Itzik Ben-Gan’s T-SQL Window Functions, 2nd Edition. I read the first edition several years back and I’m looking forward to seeing how Itzik has made this even better.

Mala, meanwhile, showed off her copy of Don Jones’s Shell of an Idea, which covers the history of Powershell.

Shop Talk: 2020-11-30

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Announcements

We had one announcement: TriPASS board elections are open until December 3rd. All active TriPASS Meetup members are eligible to vote.

AWS Kinesis Outage

Tom and I talked a bit about another cloud outage: this time, AWS’s Kinesis service went down during Cyber 5. It had some big effects on downstream customers.

Building a Safe Sequence

The majority of this episode is dedicated to building a safe sequence for when you absolutely need contiguous numbers for ID values. Identity and sequence can, by design, leave gaps. 99% of the time (or more), that’s not an issue. But in the rare case where numeric IDs are part of an audit process, I show a technique to eliminate those gaps, though at the cost of slowing down insert performance considerably.

The notebook is part of my upcoming post-con for Data Platform Summit, so I’m not releasing it right now. But I might release a version of it in the future.

Shop Talk: 2020-11-16

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Announcements

We had one announcement: TriPASS board elections are coming up this Thursday. They will run for the next three weeks. Because we had one nominee for each position, we’ll have a up-or-down vote on the entire slate.

Upcoming Events

Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.

PASS Summit Thoughts

Tom and I talked about PASS Summit for the entire episode. We had some great conversations in chat as well.

Mala wasn’t able to make it, but she shared her experiences in a blog post that we walked through. I also made mention of Andy Warren’s series on PASS Summit. Andy’s a guy I have a lot of respect for in the community and I’m glad he put in so much time to reflect on Summit from start to end.

After we finished recording, I saw Deb Melkin’s blog post summarizing PASS Virtual Summit, as well as Andy Levy’s. Check those out as well for additional thoughts on Summit.

TriPASS Plans in November

We will not have Shop Talk next Monday, November 23rd, as it is the Monday of Thanksgiving. We also are done with regular data science meetings for the year, so no regular meeting that week.

We will have Shop Talk on November 30th.

Shop Talk: 2020-11-02

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Announcements

We had one announcement: TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Conferences

PASS Summit is coming up, from November 10th through the 13th. Registration is still open and you can use the discount code LGDISCWY7 to save some cash. Tom also gave the full-court press on why you should sign up for PASS Virtual Summit this year.

Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.

SSMS and ADS Joined Together

Our big topic for the evening was the about SQL Server Management Studio 18.7, which now forces installation of Azure Data Studio.

Tom, Mala, and I all share fairly similar opinions, so I’ll lay out mine just a little bit. I don’t hate the idea, but I do see two problems with it. First, this is a required installation. Microsoft’s saying that you can uninstall ADS afterward if you choose, but I’d much rather there be an option to install or not rather than having to uninstall later. I use Azure Data Studio pretty regularly and I still don’t like the notion of forcing an installation when you wanted SQL Server Management Studio.

Second, I’m a bit curious about release cadence and bloat. SSMS releases less frequently than ADS, so if I subsequently install 18.7.2, will it try to install an old version of Azure Data Studio over my current version? I don’t know the answer for that and figure we’ll have to wait and see. But we do see from Glenn Berry that ADS is installed as a system installation rather than a user installation, so you might have two copies as a result. You might have a third copy if you’ve used the Insiders track. There’s nothing SSMS-specific about this as you can install multiple copies all on your own, but I’m sure it’ll ruffle a few feathers.

Mala did bring up a good point that I’d like to mention here as well: SQL Server Management Studio is already a collection of installed applications—it brings along with it Profiler, Replication Monitor, Database Tuning Advisor (ugh), the Analysis Services deployment tool, and more. These are all separate applications

But that’s not to say I’m dismissive of the people who don’t want to install ADS, as well as the people who want to make ADS an optional installation rather than mandatory. I understand concerns about corporate compliance, potential security risks involved (you are, after all, installing Python and a very extensible IDE), etc. And given that you don’t need ADS to use SSMS, I’d rather they not require installation. Make it an option and I think a lot of the concern within the community simmers down.

This topic also led us down a rabbit hole of installing SSMS on servers. In real life, I’m not too hard-pressed on the idea because it’s nice to have the tools there in case you need direct server access. But I wanted to push the other side of the argument as hard as I could, and I think SSMS 18.7 makes that “Don’t install SSMS on servers ever” argument a bit easier because it increases the scope of what you need to worry about on a server. Or we just keep copies of 18.6 around for a few years…

TriPASS Plans in November

We will not have Shop Talk next week (November 9th), as this is the start of pre-cons for PASS Summit. We also will not have an Advanced DBA group meeting that week, so expect no TriPASS-specific content unless I decide to do something ad hoc.

We will have Shop Talk on the 16th and Tracy will talk about dbatools & dbachecks; register for that now to get a reminder or just show up when we go live on Twitch.

We will probably not have Shop Talk on November 23rd, as it is the Monday of Thanksgiving. We also are done with regular data science meetings for the year, so no regular meeting that week.

Finally, we will have Shop Talk on November 30th.

Shop Talk: 2020-10-26

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala gave us an update on the Azure SQL Championship, which is now winding down.

Conferences

PASS Summit is coming up, from November 10th through the 13th. Registration is still open and you can use the discount code LGDISCWY7 to save some cash.

Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.

SolarWinds Purchasing SentryOne

Our big topic for the evening was the announcement that SolarWinds intends to purchase SentryOne. SentryOne has a press release as well on the topic.

We talked for a little while about this topic, but I’m interested to see where SolarWinds takes the product. SentryOne isn’t perfect by any means—and I kind of threatened to get into why I’m not a fan of their user experience (or the user experience of any other third-party SQL Server monitoring solutions). Here are my notes for what would make for a good monitoring product:

  • The key question:  is my system currently experiencing problems?  How can I know?  What does normal look like versus the current state?
  • If we are experiencing problems, what is the cause?  Or what are the causes?
  • If I am not particularly familiar with this system but am knowledgeable in maintenance, what are the pain points I can most easily fix?
  • If I am not very knowledgeable with this platform, what should I look for?

Ultimately, a monitoring solution—like any other dashboard—should evoke an emotion: either anger (because something is broken and I need to fix it now) or happiness (because everything is going great and I can keep doing whatever it is I do). With most products, there are so many things going on that it’s hard to tell what you should feel, even if you are familiar with the environment and the tool.

Building a Free Monitoring Solution

While we were talking about third-party monitoring solutions, I wanted to call out the work that Shop Talker Tracy Boggiano has put into a free monitoring solution for SQL Server. I ended up converting this into a post of my own, so you might want to check that out as well.

Anders shared some concern around how third-party tools are generally better than first-party from the standpoint that at least third-party tools have support and (usually) documentation, whereas in-house solutions have nothing if the person who put it together leaves. I don’t completely agree with that notion, but you’ll have to watch the video for my argument against.

QDS Toolbox

I am excited to be able to promote the QDS Toolbox, a solution to make Query Store better. You can also check out my blog post for my thoughts in detail.

Robert Half Technology Salary Guide 2021

The last topic of discussion tonight revolved around Robert Half’s Technology Salary Guide for 2021. The PDF is freely available and gives you an idea of salary ranges by job title.

One thing I found interesting is that Software Engineer, Software Developer, and Developer / Programmer Analyst have a fairly wide discrepancy in salary expectations: a Software Engineer at the median salary in the survey earns $123,250, a Software Developer $118,250, and a Developer / Programmer Analyst $109,500. That’s a $14K per year range for people doing ostensibly the same job, but I do have a conjecture regarding why they’re different. My conjecture is that you’re more likely to find people with the title Software Engineer at companies which build software products for customers, whereas you’d find Software Developers at companies which aren’t necessarily selling software to customers; those developers would be working more on back office or internal solutions. Meanwhile, Programmer Analyst is a classic term for employees in the government, where salaries tend to be lower (particularly at the state level). That’s a conjecture, mind, so it could very well be wrong.

On the data side, I was surprised to see the median DBA salary at $100,750; I would have expected DBA salaries to be at least as high as developer salaries. We do see Database Developer roles earning as much as Software Developers, so that’s about what I’d expect. Also surprising is that Data Scientist salaries seem to have dropped considerably, though I surmise the reason here is that there are a lot of newly-minted Data Scientists and people switching from Data Analyst job titles pushing those salaries down.

The salary tables include salaries in USD normalized across the United States, as well as weighting factors for particular cities, so for example, Raleigh salaries are about 4% above the quoted norm. After those tables, there are similar tables for Canadians in CAD. Surprising to me there is that Canadian salaries in CAD are about 75-85% of US salaries in USD, so there’s a huge price difference. A DBA at the 50th percentile in the US has an expected salary of $100,750 USD in the survey. A DBA at the 50th percentile in Canada has an expected salary of $84,000 CAD. Right now, the exchange rate is approximately 4 CAD = 3 USD, so $84,000 CAD * 0.75 USD/CAD = $63,000 USD.

Take all survey numbers with as many grains of salt as you need, but it’s an interesting way of getting a feel for the industry.

Shop Talk: 2020-10-19

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala brought up the Azure SQL Championship, so check that out.

Date Buckets in Azure SQL Edge

Our first topic for the night was a function available only in Azure SQL Edge: DATE_BUCKET(). I walked through a sample of some of the code I plan to show off during my PASS Summit pre-con, and I’m glad I did this because I was not very happy with the delivery. The concept feels a little backwards to me, as “date bucket” sounds like it should be numeric: bucket 1, bucket 2, bucket 3. Instead, it’s the begin date for the bucket itself: 2020-09-01, 2020-09-15, 2020-09-29.

I certainly believe it’s useful and would like to see it in SQL Server and Azure SQL Database, but I’ll need to work on my examples to make sure they’re clearer.

Thoughts on Graph Databases

The other key topic for the night was a discussion of graph databases. We recently recorded an episode of the SQL Data Partners podcast with Mala as the guest and the two of us had some detailed thoughts on the utility of graph databases. We took some time tonight to flesh out those thoughts a bit further.

Mala shared her thoughts and included a useful video on graph databases, including their pros and cons.

The short version of it for me is that graph concepts are extremely useful; graph databases are not, unless your company’s key differentiator requires solving graph-style problems with large amounts of data in near real time. An example I like to give for this is logistics, in which you need to know where people and things are and where they need to go in the most efficient manner. If you’re developing software for logistics companies to use to solve this problem, then a graph database might be the best option. Check out the show (as well as the upcoming podcast episode, #208) for the rest of my thoughts on the topic.

Mala’s Books & More Kiosk

We’ve upgraded Mala’s Book Corner to include a lot more. This time around, she has one blog post and three podcasts for us.

Shop Talk: 2020-10-12

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala brought up the Azure SQL Championship, so check that out.

Code Review Tools

Our first topic for the night was code review tools, focusing mostly on the database world. Mala led us through some of the research she has done to find an alternative to Crucible and Fisheye. Mala keys us in on several tools, with an emphasis on tsqllint.

Xenographics

Over at Curated SQL, I posted about Xenographics, a website dedicated to…uncommon visuals. I enjoyed walking through several of them. This site includes some visuals I like as well as some which I can’t understand even after reviewing them.

Watch the video for the visuals we look at, but I wanted to take a moment and hit six characteristics I think make for a good visual. This list is neither necessary nor sufficient—a good visual need not be all of these at once, and I won’t claim that this is the authoritative list of rules for quality visuals. That said, here they are:

  • Intuitive — A visual should be easy for a person to understand despite not having much context. In some cases, you have the opportunity to provide additional context, be it in person or in a magazine. That lets you increase the complexity a bit, but some visuals are really difficult to understand and if you don’t have the luxury to provide additional context, it makes your viewer’s job difficult.
  • Compact — Given two visuals, the one which can put more information into a given space without losing fidelity or intuitiveness is preferable. This lets you save more screen real estate for additional visuals and text. There are certainly limits to this philosophy, so consider it a precept with diminishing marginal returns.
  • Concise — Remove details other than what helps tell the story. This fits in with compactness: if you have unnecessary visual elements, removing them lets you reclaim that space without losing any fidelity. Also, remove unnecessary coloration, changes in line thickness, and other things which don’t contribute to understanding the story. Please note that this doesn’t mean removing all color—just coloration which doesn’t make it easier for a person to understand what’s happening.
  • Consistent — By consistency, what I mean is that the meaning of elements on the visual does not change within a run or between runs. Granted, this is more relevant to dashboards than individual visuals, but think about a Reporting Services report which uses default colors for lines on a chart. If you refresh the page and the colors for different indicators change, it’s hard for a person to build that mental link to understand what’s happening.
  • Glanceable — Concise and consistent visuals tend to be more glanceable than their alternatives. Glanceable means that you are able to pick out some key information without needing to stare the the visual. Ideally, a quick glance at a visual tells you enough of what you need to know, especially if you have seen the same visual in prior states.
  • Informative — This last consideration is critical but often goes overlooked. The data needs to be useful and pertinent to users, describing the situation at the appropriate grain: it includes all of the necessary detail for understanding while eschewing unnecessary detail.