Shop Talk: 2020-11-16

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Announcements

We had one announcement: TriPASS board elections are coming up this Thursday. They will run for the next three weeks. Because we had one nominee for each position, we’ll have a up-or-down vote on the entire slate.

Upcoming Events

Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.

PASS Summit Thoughts

Tom and I talked about PASS Summit for the entire episode. We had some great conversations in chat as well.

Mala wasn’t able to make it, but she shared her experiences in a blog post that we walked through. I also made mention of Andy Warren’s series on PASS Summit. Andy’s a guy I have a lot of respect for in the community and I’m glad he put in so much time to reflect on Summit from start to end.

After we finished recording, I saw Deb Melkin’s blog post summarizing PASS Virtual Summit, as well as Andy Levy’s. Check those out as well for additional thoughts on Summit.

TriPASS Plans in November

We will not have Shop Talk next Monday, November 23rd, as it is the Monday of Thanksgiving. We also are done with regular data science meetings for the year, so no regular meeting that week.

We will have Shop Talk on November 30th.

Shop Talk: 2020-11-02

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Announcements

We had one announcement: TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Conferences

PASS Summit is coming up, from November 10th through the 13th. Registration is still open and you can use the discount code LGDISCWY7 to save some cash. Tom also gave the full-court press on why you should sign up for PASS Virtual Summit this year.

Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.

SSMS and ADS Joined Together

Our big topic for the evening was the about SQL Server Management Studio 18.7, which now forces installation of Azure Data Studio.

Tom, Mala, and I all share fairly similar opinions, so I’ll lay out mine just a little bit. I don’t hate the idea, but I do see two problems with it. First, this is a required installation. Microsoft’s saying that you can uninstall ADS afterward if you choose, but I’d much rather there be an option to install or not rather than having to uninstall later. I use Azure Data Studio pretty regularly and I still don’t like the notion of forcing an installation when you wanted SQL Server Management Studio.

Second, I’m a bit curious about release cadence and bloat. SSMS releases less frequently than ADS, so if I subsequently install 18.7.2, will it try to install an old version of Azure Data Studio over my current version? I don’t know the answer for that and figure we’ll have to wait and see. But we do see from Glenn Berry that ADS is installed as a system installation rather than a user installation, so you might have two copies as a result. You might have a third copy if you’ve used the Insiders track. There’s nothing SSMS-specific about this as you can install multiple copies all on your own, but I’m sure it’ll ruffle a few feathers.

Mala did bring up a good point that I’d like to mention here as well: SQL Server Management Studio is already a collection of installed applications—it brings along with it Profiler, Replication Monitor, Database Tuning Advisor (ugh), the Analysis Services deployment tool, and more. These are all separate applications

But that’s not to say I’m dismissive of the people who don’t want to install ADS, as well as the people who want to make ADS an optional installation rather than mandatory. I understand concerns about corporate compliance, potential security risks involved (you are, after all, installing Python and a very extensible IDE), etc. And given that you don’t need ADS to use SSMS, I’d rather they not require installation. Make it an option and I think a lot of the concern within the community simmers down.

This topic also led us down a rabbit hole of installing SSMS on servers. In real life, I’m not too hard-pressed on the idea because it’s nice to have the tools there in case you need direct server access. But I wanted to push the other side of the argument as hard as I could, and I think SSMS 18.7 makes that “Don’t install SSMS on servers ever” argument a bit easier because it increases the scope of what you need to worry about on a server. Or we just keep copies of 18.6 around for a few years…

TriPASS Plans in November

We will not have Shop Talk next week (November 9th), as this is the start of pre-cons for PASS Summit. We also will not have an Advanced DBA group meeting that week, so expect no TriPASS-specific content unless I decide to do something ad hoc.

We will have Shop Talk on the 16th and Tracy will talk about dbatools & dbachecks; register for that now to get a reminder or just show up when we go live on Twitch.

We will probably not have Shop Talk on November 23rd, as it is the Monday of Thanksgiving. We also are done with regular data science meetings for the year, so no regular meeting that week.

Finally, we will have Shop Talk on November 30th.

Shop Talk: 2020-10-26

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala gave us an update on the Azure SQL Championship, which is now winding down.

Conferences

PASS Summit is coming up, from November 10th through the 13th. Registration is still open and you can use the discount code LGDISCWY7 to save some cash.

Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.

SolarWinds Purchasing SentryOne

Our big topic for the evening was the announcement that SolarWinds intends to purchase SentryOne. SentryOne has a press release as well on the topic.

We talked for a little while about this topic, but I’m interested to see where SolarWinds takes the product. SentryOne isn’t perfect by any means—and I kind of threatened to get into why I’m not a fan of their user experience (or the user experience of any other third-party SQL Server monitoring solutions). Here are my notes for what would make for a good monitoring product:

  • The key question:  is my system currently experiencing problems?  How can I know?  What does normal look like versus the current state?
  • If we are experiencing problems, what is the cause?  Or what are the causes?
  • If I am not particularly familiar with this system but am knowledgeable in maintenance, what are the pain points I can most easily fix?
  • If I am not very knowledgeable with this platform, what should I look for?

Ultimately, a monitoring solution—like any other dashboard—should evoke an emotion: either anger (because something is broken and I need to fix it now) or happiness (because everything is going great and I can keep doing whatever it is I do). With most products, there are so many things going on that it’s hard to tell what you should feel, even if you are familiar with the environment and the tool.

Building a Free Monitoring Solution

While we were talking about third-party monitoring solutions, I wanted to call out the work that Shop Talker Tracy Boggiano has put into a free monitoring solution for SQL Server. I ended up converting this into a post of my own, so you might want to check that out as well.

Anders shared some concern around how third-party tools are generally better than first-party from the standpoint that at least third-party tools have support and (usually) documentation, whereas in-house solutions have nothing if the person who put it together leaves. I don’t completely agree with that notion, but you’ll have to watch the video for my argument against.

QDS Toolbox

I am excited to be able to promote the QDS Toolbox, a solution to make Query Store better. You can also check out my blog post for my thoughts in detail.

Robert Half Technology Salary Guide 2021

The last topic of discussion tonight revolved around Robert Half’s Technology Salary Guide for 2021. The PDF is freely available and gives you an idea of salary ranges by job title.

One thing I found interesting is that Software Engineer, Software Developer, and Developer / Programmer Analyst have a fairly wide discrepancy in salary expectations: a Software Engineer at the median salary in the survey earns $123,250, a Software Developer $118,250, and a Developer / Programmer Analyst $109,500. That’s a $14K per year range for people doing ostensibly the same job, but I do have a conjecture regarding why they’re different. My conjecture is that you’re more likely to find people with the title Software Engineer at companies which build software products for customers, whereas you’d find Software Developers at companies which aren’t necessarily selling software to customers; those developers would be working more on back office or internal solutions. Meanwhile, Programmer Analyst is a classic term for employees in the government, where salaries tend to be lower (particularly at the state level). That’s a conjecture, mind, so it could very well be wrong.

On the data side, I was surprised to see the median DBA salary at $100,750; I would have expected DBA salaries to be at least as high as developer salaries. We do see Database Developer roles earning as much as Software Developers, so that’s about what I’d expect. Also surprising is that Data Scientist salaries seem to have dropped considerably, though I surmise the reason here is that there are a lot of newly-minted Data Scientists and people switching from Data Analyst job titles pushing those salaries down.

The salary tables include salaries in USD normalized across the United States, as well as weighting factors for particular cities, so for example, Raleigh salaries are about 4% above the quoted norm. After those tables, there are similar tables for Canadians in CAD. Surprising to me there is that Canadian salaries in CAD are about 75-85% of US salaries in USD, so there’s a huge price difference. A DBA at the 50th percentile in the US has an expected salary of $100,750 USD in the survey. A DBA at the 50th percentile in Canada has an expected salary of $84,000 CAD. Right now, the exchange rate is approximately 4 CAD = 3 USD, so $84,000 CAD * 0.75 USD/CAD = $63,000 USD.

Take all survey numbers with as many grains of salt as you need, but it’s an interesting way of getting a feel for the industry.

Shop Talk: 2020-10-19

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala brought up the Azure SQL Championship, so check that out.

Date Buckets in Azure SQL Edge

Our first topic for the night was a function available only in Azure SQL Edge: DATE_BUCKET(). I walked through a sample of some of the code I plan to show off during my PASS Summit pre-con, and I’m glad I did this because I was not very happy with the delivery. The concept feels a little backwards to me, as “date bucket” sounds like it should be numeric: bucket 1, bucket 2, bucket 3. Instead, it’s the begin date for the bucket itself: 2020-09-01, 2020-09-15, 2020-09-29.

I certainly believe it’s useful and would like to see it in SQL Server and Azure SQL Database, but I’ll need to work on my examples to make sure they’re clearer.

Thoughts on Graph Databases

The other key topic for the night was a discussion of graph databases. We recently recorded an episode of the SQL Data Partners podcast with Mala as the guest and the two of us had some detailed thoughts on the utility of graph databases. We took some time tonight to flesh out those thoughts a bit further.

Mala shared her thoughts and included a useful video on graph databases, including their pros and cons.

The short version of it for me is that graph concepts are extremely useful; graph databases are not, unless your company’s key differentiator requires solving graph-style problems with large amounts of data in near real time. An example I like to give for this is logistics, in which you need to know where people and things are and where they need to go in the most efficient manner. If you’re developing software for logistics companies to use to solve this problem, then a graph database might be the best option. Check out the show (as well as the upcoming podcast episode, #208) for the rest of my thoughts on the topic.

Mala’s Books & More Kiosk

We’ve upgraded Mala’s Book Corner to include a lot more. This time around, she has one blog post and three podcasts for us.

Shop Talk: 2020-10-12

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala brought up the Azure SQL Championship, so check that out.

Code Review Tools

Our first topic for the night was code review tools, focusing mostly on the database world. Mala led us through some of the research she has done to find an alternative to Crucible and Fisheye. Mala keys us in on several tools, with an emphasis on tsqllint.

Xenographics

Over at Curated SQL, I posted about Xenographics, a website dedicated to…uncommon visuals. I enjoyed walking through several of them. This site includes some visuals I like as well as some which I can’t understand even after reviewing them.

Watch the video for the visuals we look at, but I wanted to take a moment and hit six characteristics I think make for a good visual. This list is neither necessary nor sufficient—a good visual need not be all of these at once, and I won’t claim that this is the authoritative list of rules for quality visuals. That said, here they are:

  • Intuitive — A visual should be easy for a person to understand despite not having much context. In some cases, you have the opportunity to provide additional context, be it in person or in a magazine. That lets you increase the complexity a bit, but some visuals are really difficult to understand and if you don’t have the luxury to provide additional context, it makes your viewer’s job difficult.
  • Compact — Given two visuals, the one which can put more information into a given space without losing fidelity or intuitiveness is preferable. This lets you save more screen real estate for additional visuals and text. There are certainly limits to this philosophy, so consider it a precept with diminishing marginal returns.
  • Concise — Remove details other than what helps tell the story. This fits in with compactness: if you have unnecessary visual elements, removing them lets you reclaim that space without losing any fidelity. Also, remove unnecessary coloration, changes in line thickness, and other things which don’t contribute to understanding the story. Please note that this doesn’t mean removing all color—just coloration which doesn’t make it easier for a person to understand what’s happening.
  • Consistent — By consistency, what I mean is that the meaning of elements on the visual does not change within a run or between runs. Granted, this is more relevant to dashboards than individual visuals, but think about a Reporting Services report which uses default colors for lines on a chart. If you refresh the page and the colors for different indicators change, it’s hard for a person to build that mental link to understand what’s happening.
  • Glanceable — Concise and consistent visuals tend to be more glanceable than their alternatives. Glanceable means that you are able to pick out some key information without needing to stare the the visual. Ideally, a quick glance at a visual tells you enough of what you need to know, especially if you have seen the same visual in prior states.
  • Informative — This last consideration is critical but often goes overlooked. The data needs to be useful and pertinent to users, describing the situation at the appropriate grain: it includes all of the necessary detail for understanding while eschewing unnecessary detail.

Shop Talk: 2020-10-05

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

Announcements

We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing of eligible to run, and the election will run from November 19th through December 3rd, for terms beginning February 23, 2021 through February 26, 2023. If you’re interested, e-mail (shoptalk at tripass dot org will work) and I can provide more details as needed.

Second, Mala brought up the Azure SQL Championship, so check that out.

So You Want to Learn about Database Development

Leslie sent in an e-mail (shoptalk at tripass dot org) asking for guidance or a syllabus for learning to become a database developer, so Mala and I tackled that. We broke it down into five categories and focused mostly on books. One book-buying tip I have: if you’re looking at older books, check out a website like ISBN.nu, as it aggregates across several book sellers to find the best price. I’ve used that site for about 15 years now. Most of the links I provide are from Amazon, but that’s almost never the cheapest price.

First up is T-SQL, as I think you should learn the language before trying to work through the rest of it. Carlos Chacon has a great book called From Zero to SQL in 20 Lessons, which was written for someone with no experience to get started. From there, I’d recommend Itzik Ben-Gan’s T-SQL Fundamentals or Kathi Kellenberger’s Beginning T-SQL. Mala brought up John Deardurff’s series on learning T-SQL as well as the SQL Quickstart Guide. Finally, once you’ve spent a couple of years building up those SQL skills, grab a copy of Grant Fritchey’s SQL Server 2017 Query Performance Tuning book, which is a massive tome full of great info. Or you can get the much smaller version from Red Gate.

After taking on T-SQL, learn about data modeling. The single best reference is the Handbook of Relational Database Design, released in 1989. The first half of the book is gold; the second half is only useful if you’re trying to implement this stuff on a late 1980s Oracle or Informix database… For something a bit more recent, check out Louis Davidson’s Pro SQL Server Relational Database Design. The 6th edition is coming out soon, but if you’re impatient, Louis and Jessica Moss teamed up on the 5th edition. I have an older edition, and I think Louis’s explanation of 4th normal form is the best I’ve ever read. Mala has a couple of recommendations as well: Information Modeling on Relational Databases and Database Design for Mere Mortals.

If you want to move from OLTP-style data modeling and into warehousing, Ralph Kimball’s Data Warehouse Toolkit is the book to get. The Kimball model is everywhere and this is the type of book you can come back to multiple times and learn more and more each time. If you’re interested in the Data Vault approach, check out Building a Scalable Data Warehouse with Data Vault 2.0. I haven’t read it, but people who are into Data Vault have recommended the book.

Of course, once you’ve learned a bit about warehousing, you might want to read up on ways to work with warehouses. If you’re looking at report-writing, Kathi Kellenberger’s Beginning SQL Server Reporting Services is a good start. From there, Paul Turley, et al’s Pro Microsoft SQL Server 2016 Reporting Services is the next step, but I’d probably only go that far if I were writing SSRS reports as a main part of the job.

Meanwhile, if you want to learn about Power BI, Marco Russo and Alberto Ferrari have you covered. They’ve released a number of books and also have plenty of videos available for free. I’d also recommend Phillip Seamark’s Beginning DAX with Power BI, though I don’t think it’s really a beginner book by any stretch of the imagination. Finally, Mala reminded me that I couldn’t finish a list of Power BI resources without mentioning Guy in a Cube.

Career Next Steps

During our coverage, @jpanagi1 asked us about career guidance:

my background – dont have compsci degree – engineering background – learned fundamentals of sql – recommended next steps? do i need more coege or certs? what are employeers looking for?

We spent some time talking through this specific case. To summarize, @jpanagi is currently working at a company which has teams working on interesting projects, and the question is, how can I get into that position?

The simple form of our advice is, work with the team a bit on projects. If you have official projects, that’s easiest; if you can volunteer to help them out a bit, that can work too. If you have a reasonable manager, bring up that you’d like to do that kind of work in the medium term. A reasonable manager might not want to lose you, but there are several options here.

One is “in-house” work. Suppose that team is a report-writing and dashboarding team and they’re using some nice products in interesting ways. You might be able to build a dashboard specifically for your team with the same products, gaining experience while providing direct benefits to your team and your manager.

Another option is working part-time on projects with members of the report-writing and dashboarding team. It might not be directly for your team, but maybe a time-sharing of 80-20 or 90-10 could get you some experience on projects and try out something new without your manager losing a person.

If your manager isn’t reasonable, things get harder. In that case, you might need to volunteer on the sly and work a few extra hours with that team after you get your regular work done. I’ve known people who were trained up “after hours,” where they’d stay late at work and learn from someone in another department. Smart managers at good companies formalize this kind of training, but don’t let that limit you otherwise.

Also, a bit of freelancing doesn’t hurt. Again, this is off-hours, but if you want to try out a different path, see if there’s something tangentially related to your team. Learn about the products and try them out. Then, if you come up with something interesting, you can bring it up with your manager. If you ultimately get locked out from doing this thing, well, at least you have a project for your resume, and you can decide whether the position you’re at is the right one.

Shop Talk: 2020-09-28

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

Azure Active Directory Outage

Shortly before the episode began, Azure had a major outage in its Active Directory service. We used this as a springboard for discussion around what happens when you rely on a cloud service for your company’s infrastructure, but that cloud service goes down. The easiest answer is multi-cloud and cloud/on-premises hybrid scenarios, but those can get expensive and also limit you from some of the benefits of Platform-as-a-Service offerings.

SOS_SCHEDULER_YIELD

Our next segment was around a server having issues that I was asked to take a look at. The two biggest waits on the server were CXPACKET and SOS_SCHEDULER_YIELD. The knee-jerk reaction when you see SOS_SCHEDULER_YIELD is to say that it’s a CPU problem, but Paul Randal explains why that is not necessarily the case.

From there, you want to see if the signal wait time is high—the rule of thumb I’ve seen is that if signal waits are more than about 20% of total wait time for SOS_SCHEDULER_YIELD, then we have a problem where there isn’t enough CPU to go around. In my scenario, signal waits were approximately 99.999% of total waits. So the kneejerk reaction is to blame CPU, the more thoughtful reaction is to investigate further, and the results of the investigation were to blame CPU.

If that’s the case, here are a few of the options potentially available:

  1. Tune some queries. Look for high-CPU queries, especially ones which run frequently. There are a few of those in my scenario, so every second of CPU you shave off per query call is a second which can go to another query and lessen the waits.
  2. Drop MAXDOP and cost threshold for parallelism if it is too high. I’m not talking about the knee-jerk reaction of “Parallelism is high, so drop MAXDOP to 1” but if you have 32 cores and your MAXDOP is 32, that probably doesn’t make much sense for an OLTP system. As a warning, though, changing MAXDOP without reviewing poorly-performing queries can lead to bigger problems, as those poorly-performing queries are already struggling with a high value of MAXDOP, so if you reduce .
  3. Increase CPU cores if you can. If your queries are looking good, it may just be that you have too much load on your server and adding more hardware can be the solution.

SSIS and ADO.NET: A Rant from Someone Not Me

Mala spent far too long working through an issue with multi-subnet failover on Availability Groups and SSIS. We use OLEDB drivers very heavily when working with SSIS, as we appreciate work finishing at a reasonable pace. But the OLEDB drivers installed with SQL Server 2017 don’t include support for multi-subnet failover. But hey, ADO.NET drivers do. So Mala spent a lot of time trying to switch over to that and catalogs some of the issues she ran into.

Furthermore, it turns out that the latest OLEDB driver (installed with SQL Server 2019 and available separately if you don’t have that version of SQL Server) does have support for multi-subnet failover, so we just needed to update drivers.

Shop Talk: 2020-09-21

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Microsoft Ignite

Microsoft Ignite is this week. Tom and I spent some time talking about the sessions and themes in the data space. Tom mentioned some sessions he’ll attend. I’m going to check out the Azure SQL Edge session.

Kerberos with Tom

Tom is working through some Kerberos double-hop issues, so we talked a bit about SPNs. Tom’s question was whether we needed to register the SPN for just the SQL service account, or also for SSRS, SSIS, etc. My recollection was that you only needed to register for the SQL Server account. It looks like that’s correct.

Innovate and Modernize Apps with Data and AI

I wanted to call out that a Microsoft Cloud Workshop that I helped build: Innovate and Modernize Apps with Data and AI. It hits on a lot of Azure technologies, so if you’re interested in how some of these technologies can fit together in an event-driven architecture, check it out.

Conference Thoughts

We wrapped up tonight’s show with a quick discussion of virtual conferences.

Shop Talk: 2020-09-14

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Azure Labs

Tom has spent a lot of time recently with Azure Lab Services, so our first segment is an interview where we cover some of the details of Azure Labs, how it worked for Tom, how much it ended up costing him, and some of the things you can do with the product.

Tom is doing another training for O’Reilly on December 4th. Here is a link to his prior training, to give you an idea of the offering.

Refactoring Databases

Raymond brought us into the main topic of the evening: practices and planning for refactoring databases.

We go into it in some detail, but I wanted to share my notes as there were a couple of topics I did not have a chance to cover. What follows is a disjointed set of concepts which hopefully resemble advice in the aggregate.

First, refactoring is not performance tuning. Refactoring is supposed to be a zero-sum code change intended to make maintenance easier. It, by its nature, is not intended to make things faster or change behavior. I gave a few classic examples of refactoring in application code, as well as a couple examples of database refactoring. In code, think about breaking a segment of code out into its own method/function, or separating two things into different classes. This doesn’t necessarily improve application performance but it does make it easier for developers to understand and maintain. On the database side, normalizing a set of tables can be an example of refactoring—you’re not necessarily changing the performance of the application, but instead making it easier to manage query calls to insert, update, and retrieve information. Another example of database refactoring would be changing text data types to VARCHAR(MAX).

One key consideration is, do you have the downtime to perform refactoring? In other words, do you need the application running while your changes happen? If so, check out my talk on approaching zero-downtime code deployments.

I think the most important mindset to take when refactoring is to ask why you need to refactor. Make sure you have a good reason and it’s not just “It’s not my code, so it’s bad.” Understand that code has a story and the developer who wrote it knows more about the story than you do—even if you wrote the code, the you of two years ago was closer to the problem then than you are today. Chesterton’s Fence is a good mindset to have: before you change code, understand what it is doing and why it is there. If you can’t explain why the code looks the way that it does, be very hesitant about changing it.

Keep in line with your application deployment. Working with app devs reduces the risk that you’ll drop procedures in use or make changes which break the code. Also, try to use stored procedures as interfaces whenever possible. They allow you to make changes much more cleanly than writing queries directly through the application or via an ORM.

Another thing to consider is whether to include a rollback process, or if forward is the only direction. This will depend on your (and your company’s) risk appetite. Only moving forward is a lot easier to develop against, but it does require foresight and is much higher risk. Rollback scripts force you to think about how to back out of a problem, but because they are (hopefully) almost never used, it’s a lot of extra development time. This will depend on the company and scenario—if you’re working on a real-time financial system, then you don’t have much of a choice. But if you’re working on a personal website or an in-house product with a small number of users, it may make more sense just to keep moving forward.

Whenever possible, have tests. A thorough set of database integration tests is great. But if all you have is a hand-created workbench of queries to run, that’s still better than nothing and can be sufficient. The goal is to have enough information to determine if there are issues with the rollforward phase as quickly as possible, ideally before users experience them.

Write your scripts to be re-runnable and put them through change management. Store the scripts in source control. There are a couple philosophies on what to store in source control: either the current state of the system (creation scripts for tables, stored procedures, etc.) or the change process (the set of database modification scripts run over time). I like a compromise approach of having the current state plus some history of changes.

Finally, early refactoring can be painful, especially because you’re usually trying to untangle years of growth. Choose some easy targets to build momentum. Those tend to be tangential projects or small parts of a major database. Start building some of the muscle memory around rollback scripts (if needed), change control, and testing; that will make subsequent work that much easier.

Shop Talk: 2020-08-31

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman
  • Tracy Boggiano
  • Mala Mahadevan

Notes: Questions and Topics

The DBA as Gatekeeper

Our first topic was all about the DBA as a gatekeeper. Kenneth Fisher’s blog post inspired the discussion. We got into a fairly detailed discussion on what gatekeeping means, where it makes sense, and where it doesn’t. Mike in chat summed it up with an excellent analogy: gatekeepers guard gates, not walls. In other words, “No” can be a viable answer, but can’t be the only answer.

The Case for Heap Tables

From there, Mala gave us the second topic of the night: is there a place for heap tables? tg came in quickly with “Yes, in a museum.”

The general consensus is that yes, there are cases for heap tables, but they are rare: rare enough that we’re talking 1% or fewer of user tables across an environment. But there are some good cases:

  • Tiny tables which fit on one page, such as enumeration tables. There’s no benefit to a clustered index when you’ve only got one page.
  • “Write-only” tables, such as log tables, can potentially be faster with heaps than with clustered indexes. That’s not guaranteed and advice can vary based on the version of SQL Server we’re talking about (where later versions make it less likely that you want to use a heap), but it’s possible.
  • Temporary tables. It may sound like a bit of cheating for me to include these, but “temporary” can include “temporary permanent” tables: non-temp tables which you don’t expect to be around for very long and aren’t used in your application.
  • If I need to perform a full scan of the table every time I query it—if I actually need every record and don’t look for ranges or individual rows—then it might make sense to leave the table as a heap.
  • In Azure Synapse Analytics SQL pools, we have three options: clustered columnstore index, clustered index, and heap. Clustered columnstore indexes are recommended for fact-like tables (lots of numeric values, no long strings) with at least 60 million rows. Clustered indexes are recommended for cases when you perform a single-row lookup. And heaps are recommended in other cases: small data sets supporting scans rather than point lookups.

Mala’s Book Corner

Mala has two book recommendations for us: