Shop Talk: 2020-09-14

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Azure Labs

Tom has spent a lot of time recently with Azure Lab Services, so our first segment is an interview where we cover some of the details of Azure Labs, how it worked for Tom, how much it ended up costing him, and some of the things you can do with the product.

Tom is doing another training for O’Reilly on December 4th. Here is a link to his prior training, to give you an idea of the offering.

Refactoring Databases

Raymond brought us into the main topic of the evening: practices and planning for refactoring databases.

We go into it in some detail, but I wanted to share my notes as there were a couple of topics I did not have a chance to cover. What follows is a disjointed set of concepts which hopefully resemble advice in the aggregate.

First, refactoring is not performance tuning. Refactoring is a behavior-preserving code change intended to make maintenance easier; by its nature, it is not meant to make things faster or to change what the code does. I gave a few classic examples of refactoring in application code, as well as a couple of examples of database refactoring. In code, think about breaking a segment of code out into its own method or function, or separating two responsibilities into different classes. This doesn't necessarily improve application performance, but it does make the code easier for developers to understand and maintain. On the database side, normalizing a set of tables can be an example of refactoring: you aren't necessarily changing the performance of the application, but you are making it easier to write the queries which insert, update, and retrieve information. Another example of database refactoring is converting legacy TEXT columns to VARCHAR(MAX).
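
To make that last example concrete, here is a minimal sketch of the TEXT-to-VARCHAR(MAX) conversion. The dbo.Orders table and OrderNotes column are hypothetical names used purely for illustration.

    -- Behavior-preserving refactoring: swap a deprecated TEXT column for
    -- VARCHAR(MAX). The data and the application's behavior stay the same;
    -- only the maintainability of the schema improves.
    ALTER TABLE dbo.Orders
        ALTER COLUMN OrderNotes VARCHAR(MAX) NULL;

    -- Optionally rewrite the existing values so they move out of the old
    -- LOB text structures and behave like normal VARCHAR(MAX) data.
    UPDATE dbo.Orders
        SET OrderNotes = OrderNotes;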

One key consideration: do you have a maintenance window in which to perform the refactoring, or does the application need to keep running while your changes happen? If it's the latter, check out my talk on approaching zero-downtime code deployments.

I think the most important mindset to take when refactoring is to ask why you need to refactor. Make sure you have a good reason and it’s not just “It’s not my code, so it’s bad.” Understand that code has a story and the developer who wrote it knows more about the story than you do—even if you wrote the code, the you of two years ago was closer to the problem then than you are today. Chesterton’s Fence is a good mindset to have: before you change code, understand what it is doing and why it is there. If you can’t explain why the code looks the way that it does, be very hesitant about changing it.

Keep database changes in step with application deployments. Working with the app devs reduces the risk that you'll drop procedures still in use or make changes which break the code. Also, try to use stored procedures as interfaces whenever possible: they let you make changes much more cleanly than queries written directly in the application or generated by an ORM.
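
As a sketch of the stored-procedures-as-interfaces idea (the procedure, table, and column names here are hypothetical), the application only ever calls the procedure, so the tables underneath can be refactored freely as long as the parameters and result set stay stable.

    CREATE OR ALTER PROCEDURE dbo.GetCustomerOrders
        @CustomerID INT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- The tables behind this query can be split, renamed, or normalized
        -- later; callers only depend on the procedure's contract.
        SELECT o.OrderID, o.OrderDate, o.OrderTotal
        FROM dbo.Orders o
        WHERE o.CustomerID = @CustomerID;
    END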

Another thing to consider is whether to include a rollback process, or if forward is the only direction. This will depend on your (and your company’s) risk appetite. Only moving forward is a lot easier to develop against, but it does require foresight and is much higher risk. Rollback scripts force you to think about how to back out of a problem, but because they are (hopefully) almost never used, it’s a lot of extra development time. This will depend on the company and scenario—if you’re working on a real-time financial system, then you don’t have much of a choice. But if you’re working on a personal website or an in-house product with a small number of users, it may make more sense just to keep moving forward.
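
As an illustration of the rollback question, here is a minimal pair of scripts for a hypothetical change (adding a nullable column); the rollback script is written at the same time as the rollforward script even though we hope it never runs.

    -- Rollforward script
    ALTER TABLE dbo.Orders
        ADD ShipMethod VARCHAR(30) NULL;

    -- Paired rollback script, stored alongside the rollforward script
    ALTER TABLE dbo.Orders
        DROP COLUMN ShipMethod;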

Whenever possible, have tests. A thorough set of database integration tests is great. But if all you have is a hand-created workbench of queries to run, that’s still better than nothing and can be sufficient. The goal is to have enough information to determine if there are issues with the rollforward phase as quickly as possible, ideally before users experience them.
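
A hand-created workbench can be as simple as a few sanity checks you run before and after the deployment. This sketch assumes a hypothetical refactoring which split address data out of dbo.Customer into dbo.CustomerAddress.

    -- Row counts should match the pre-deployment numbers.
    SELECT COUNT(*) AS CustomerRows FROM dbo.Customer;
    SELECT COUNT(*) AS CustomerAddressRows FROM dbo.CustomerAddress;

    -- No customer should have lost its address during the split.
    SELECT c.CustomerID
    FROM dbo.Customer c
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM dbo.CustomerAddress ca
        WHERE ca.CustomerID = c.CustomerID
    );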

Write your scripts to be re-runnable and put them through change management. Store the scripts in source control. There are a couple philosophies on what to store in source control: either the current state of the system (creation scripts for tables, stored procedures, etc.) or the change process (the set of database modification scripts run over time). I like a compromise approach of having the current state plus some history of changes.
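
Re-runnable means the script checks the current state before making a change, so executing it twice leaves the database in the same place. A minimal sketch, with hypothetical object names:

    -- Safe to re-run: the index is only created if it does not already exist.
    IF NOT EXISTS
    (
        SELECT 1
        FROM sys.indexes
        WHERE object_id = OBJECT_ID('dbo.Orders')
            AND name = 'IX_Orders_CustomerID'
    )
    BEGIN
        CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
    END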

Finally, early refactoring can be painful, especially because you’re usually trying to untangle years of growth. Choose some easy targets to build momentum. Those tend to be tangential projects or small parts of a major database. Start building some of the muscle memory around rollback scripts (if needed), change control, and testing; that will make subsequent work that much easier.

Shop Talk: 2020-08-31

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman
  • Tracy Boggiano
  • Mala Mahadevan

Notes: Questions and Topics

The DBA as Gatekeeper

Our first topic was all about the DBA as a gatekeeper. Kenneth Fisher’s blog post inspired the discussion. We got into a fairly detailed discussion on what gatekeeping means, where it makes sense, and where it doesn’t. Mike in chat summed it up with an excellent analogy: gatekeepers guard gates, not walls. In other words, “No” can be a viable answer, but can’t be the only answer.

The Case for Heap Tables

From there, Mala gave us the second topic of the night: is there a place for heap tables? tg came in quickly with “Yes, in a museum.”

The general consensus is that yes, there are cases for heap tables, but they are rare: rare enough that we're talking 1% or fewer of user tables across an environment. Some of the better cases:

  • Tiny tables which fit on one page, such as enumeration tables. There’s no benefit to a clustered index when you’ve only got one page.
  • “Write-only” tables, such as log tables, can potentially be faster as heaps than with clustered indexes (see the sketch after this list). That isn't guaranteed, and the advice can vary based on the version of SQL Server in question (later versions make it less likely that you want a heap), but it's possible.
  • Temporary tables. It may sound like a bit of cheating for me to include these, but “temporary” can include “temporary permanent” tables: non-temp tables which you don’t expect to be around for very long and aren’t used in your application.
  • If I need to perform a full scan of the table every time I query it—if I actually need every record and don’t look for ranges or individual rows—then it might make sense to leave the table as a heap.
  • In Azure Synapse Analytics SQL pools, we have three options: clustered columnstore index, clustered index, and heap. Clustered columnstore indexes are recommended for fact-like tables (lots of numeric values, no long strings) with at least 60 million rows. Clustered indexes are recommended for cases when you perform a single-row lookup. And heaps are recommended in other cases: small data sets supporting scans rather than point lookups.
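
Here is the sketch mentioned above for the "write-only" log table case; the table and column names are hypothetical.

    CREATE TABLE dbo.AppLog
    (
        LogID BIGINT IDENTITY(1,1) NOT NULL,
        LogDate DATETIME2(3) NOT NULL
            CONSTRAINT DF_AppLog_LogDate DEFAULT SYSUTCDATETIME(),
        Severity TINYINT NOT NULL,
        LogMessage NVARCHAR(4000) NOT NULL
    );
    -- No PRIMARY KEY or CREATE CLUSTERED INDEX statement, so this stays a
    -- heap: inserts go wherever there is free space, which can reduce
    -- contention on the hot last page of an ever-increasing clustered key.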

Mala’s Book Corner

Mala has two book recommendations for us:

Shop Talk: 2020-08-24

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano

Notes: Questions and Topics

Working with ORMs

Our primary topic for the evening was working with ORMs. We’re data platform specialists, so the feeling is generally fairly negative toward ORMs, but I tried to give the “pro” side a reasonable airing, as there certainly are valuable uses for them.

One of the key things to consider about ORMs is that there are two major varieties: lightweight ORMs (also known as micro-ORMs) and their heavyweight brethren. Two examples of micro-ORMs that I personally like are Dapper and FSharp.Data.SqlClient. These are small wrappers around ADO.NET which create simple objects from T-SQL statements and stored procedures. They save the development time of mapping result set outputs to .NET classes or record types without adding a lot of overhead. For that reason, I'm a big fan of using them.

On the other side, we have heavyweight ORMs like Entity Framework and NHibernate. These do a lot more and aim to create a single development experience in C#, so that developers don’t have to think in two languages. They also work well with fluent APIs like LINQ, translating those statements into SQL queries.

As far as performance goes, micro-ORMs are faster in most cases. Products like EF and NHibernate can generate some really nasty SQL and cause performance problems on complicated queries. But if you stick to fairly simple queries—especially simple insert, update, and delete operations—heavyweight ORMs can save you a good bit of time.

Tips for Creating a Presentation

The secondary topic for this evening was tips for creating a presentation. Mala, Tracy, and I have all put together presentations and so have several of the great people in chat. We talked about some ideas on how to get into presenting and shared a few stories of things that go wrong, with the expectation that hey, stuff happens but the presentation still works out in the end.

Mala’s Book Corner

Mala has two book recommendations for us:

Shop Talk: 2020-08-17

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano

Notes: Questions and Topics

Thoughts on PASS Pro Membership

Mala, Tracy, and I spent a good bit of time talking about PASS’s professional tier membership. It’s pretty common for trade organizations to require member dues—I brought up as examples the ACM and IEEE, and Tracy & Mala came up with other examples like Oracle’s official user groups. In this case, there’s no requirement: the free tier of PASS membership is the same as before.

We talked for a while about the question of whether to become a pro member. The really short version is that if you’re taking a hard look at membership benefits versus the annual cost, it’s not worth it today. But if you have goodwill built up for PASS, it’s worth it, as it helps keep PASS afloat.

Architectural Diagrams

The other key theme of the evening was architectural diagrams. Mala and Raymond both shared the same article on what makes for a good diagram, so that seems like a sign that it’s a good article.

My key judgments on what makes for a good architectural diagram are:

  • Concise. Show what you need, but don't include a lot of unnecessary detail. For example, if we're talking about an ETL process in Azure, such a diagram might show a virtual machine pushing data through Azure Data Factory into an Azure Synapse Analytics SQL Pool, and from there into Azure Analysis Services and Power BI. At this high level, including the list of specific VNet settings is unnecessary. Even the set of data flows you're creating through this process is unnecessary unless there's a specific need, such as two processes you have to differentiate.
  • At the appropriate level. Ideally, boxes or shapes in a diagram should be independent units, each of which is necessary to understand for the solution.
  • Built with the audience in mind. We can have multiple diagrams for different people, and diagrams aren’t (or at least shouldn’t be!) the only documentation available.

As far as diagramming tools go, I brought up Diagrams.net and Diagrams as Code. We also received recommendations for PowerPoint, Visio, and Lucidchart.

Mala’s Book Corner

Mala has two book recommendations for us:

Jobs

Shop Talk: 2020-08-10

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano
  • Tom Norman

Notes: Questions and Topics

Transaction Modes

The first topic of the evening was around transaction modes, following from a blog post and video I released yesterday.

Not surprisingly, most of us use explicit transactions in many cases, particularly higher-risk scenarios. Tracy has a template which she fills in, and Mala follows a similar path to mine: begin a transaction if the change is potentially scary or if it's happening in application procedures. Tom's answer: it depends. Most of the time he uses autocommit because he's in easy mode, but when he kicks it up to dangerous mode, or when he needs to wrap changes to multiple tables in a single transaction, he uses explicit transactions.
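
For the "dangerous mode" cases, the template tends to look something like this minimal sketch; the table and values are hypothetical, but the shape is the point.

    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE dbo.Account
            SET Balance = Balance - 100
            WHERE AccountID = 1;

        UPDATE dbo.Account
            SET Balance = Balance + 100
            WHERE AccountID = 2;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- If anything failed, undo both updates and surface the error.
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH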

We also got into a rat's nest on nested transactions and savepoints. Nested transactions are a lie. Savepoints are mostly a lie.
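
A quick sketch of why we say that, with the behavior noted in the comments:

    BEGIN TRANSACTION;        -- @@TRANCOUNT = 1
        BEGIN TRANSACTION;    -- @@TRANCOUNT = 2, but no new transaction starts
            -- ... some work ...
        COMMIT TRANSACTION;   -- @@TRANCOUNT back to 1; nothing is durable yet
    ROLLBACK TRANSACTION;     -- undoes all of the work, inner "commit" included

    -- Savepoints are a bit more honest: you can roll back to one, but the
    -- outer transaction still has to commit or roll back as a whole, and a
    -- doomed transaction cannot roll back to a savepoint at all.
    BEGIN TRANSACTION;
    SAVE TRANSACTION BeforeRiskyStep;
        -- ... risky work ...
    ROLLBACK TRANSACTION BeforeRiskyStep;  -- undoes only the risky work
    COMMIT TRANSACTION;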

Maximum Number of Joins in a Query

Mala brought up a topic asking about the maximum number of joins in a query. Tom took a strong stand with “It depends.” Mostly it depends on how big the tables are.

@srutzky had the best answer: 20 to 30, but they must all be to views which have 20-30 joins. I’d add on that the views need to be nested views going at least three or four levels deep.

I don’t think there’s a real answer to the question. I’ve run into cases with 18-20 joins where the query plan just falls apart and taking one of those joins out (even to a minor table where it’s a simple nested loop lookup of a fairly small number of rows) makes the query perform a lot faster. But I’ve also worked with queries with more joins than that which worked quite nicely. At the end of the day, if you are able to generate a stable plan, that’s how many joins you can get away with.

As a bonus, I rant about the phrase “Normalize until it hurts, denormalize until it works.” This isn’t the 1980s; that phrase generally doesn’t apply to OLTP systems anymore and hasn’t for a good decade-plus. If you need to denormalize your tables to get queries to run efficiently, it probably wasn’t really normalized.

Mala’s Book Corner

Mala is back with two book recommendations for us:

A Diversion on Technical Writing

We ended the broadcast with a discussion on the importance of technical writing and some of the difficulties around it. It started on the idea of writing a book, but we ended up focusing on the documentation itself. One thing I want to stress is just how difficult it is to get this documentation right, especially because we tend to take mental shortcuts and expect that others will know the context currently in our heads. I’m really bad at it and have to try hard to remember that the reader needs all of the relevant context. It’s particularly difficult because the reader will go from A to B to C to D, but I might have written it B, D, A, C, such that by the time I get to A, I forget that I needed to explain something to make B make sense.

Raymond also asked where you can store documents. It's a tough problem, and we kicked it around a bit.

Shop Talk: 2020-08-03

The Recording

The Panelists

  • Kevin Feasel
  • Tom Norman

Notes: Questions and Topics

Tracy the MVP

Congratulations to Tracy Boggiano for finally getting her MVP. It’s been a long time coming and I’m glad that she is getting the appropriate recognition for her community support.

Azure Data Studio Database Projects

Tom gave us his first thoughts on Azure Data Studio database projects, walking us through the feature and referencing a blog post by Wolfgang Strasser along the way.

We then turned this into an extended discussion on the state of Azure Data Studio today.

A Rant on the XEvent Profiler

Tom and I then discussed Profiler. No, not that profiler; the other one. I hate the fact that Microsoft named this the XEvent Profiler because it really muddies the waters. The product itself is fine and is starting to give Extended Events a reasonable UI. But that name…

The biggest problem I have with the name is that it seems to be intentionally confusing, and as long as there are two tools called Profiler, that ambiguity will lead to confusion. “Oh, I heard from <insert name here> that Profiler is bad, so I’ll avoid this thing called XE Profiler. What’s an XE?” It would have been better to name it something different and make it easier for people to say something like “Avoid Profiler and use the SQL Server Performance Tracker instead.”

The product is fine; the name is not.

Licensing

We had several questions around licensing, and I’m bundling them here.

First, if you have questions about SQL Server licensing, Thomas Grohser did a talk for our group last month and he explains it better than I ever will.

We also talked about licensing in tools like Visual Studio Code, which has its own license based on MIT. We talked a bit about which licenses tend to pass muster in legal teams at organizations, as well as some of the ones which don’t.

I also talked about why I hate Oracle and the exact amount of my Oracle razzing which is real versus me being a troll.

Mala’s Book Corner, by Kevin

In this week's edition of Mala's Book Corner, rebranded for the evening as Kevin's Book Hovel, I recommended Spark in Action, 2nd Edition by Jean-Georges Perrin. Jean-Georges is local to us in the Triangle area and published a great book on Spark. The examples are a lot better than what I've seen in other Spark books and training materials, so if you're interested in learning about Spark, get this book.

Shop Talk: 2020-07-27

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano
  • Tom Norman

Notes: Questions and Topics

PASS Summit Speakers

I wanted to call out that there are nine separate North Carolinian speakers for PASS Summit this year.

On Shop Talk, Tracy Boggiano, Tom Norman, and I will all be speaking at PASS Summit. Other speakers include friends of the show Chris Voss and Jared Poche.

Angela Henry heads up the group from the Triad area.

Melissa Coates (whose pre-con is on a different day from mine, so you can see both of ours!), Brad Llewellyn, and Ben DeBow round things out from the Charlotte area.

Dealing with XML and JSON

@iconpro555555 asked a question at the very end of last week’s Shop Talk around handling structured data—that is, XML and JSON—in SQL Server.

SQL Server has a long history of dealing with XML data, including an XML data type, XML-specific indexes, and XPath query notation. It’s notorious among DBAs as “Something we don’t want,” and Tracy points out the biggest problem: once you start to get a decent amount of data, everything falls apart.

In SQL Server 2016, we got JSON support through OPENJSON, FOR JSON, and JSON_VALUE. There is no separate JSON data type or direct indexing, though if you want, you can create a persisted computed column which uses JSON_VALUE to extract a specific value from the JSON, and then index that persisted computed column.
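
A minimal sketch of what that looks like in practice; the table and the JSON shape are hypothetical.

    CREATE TABLE dbo.OrderEvent
    (
        OrderEventID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        EventData NVARCHAR(MAX) NOT NULL,
        -- Extract one value we filter on frequently and persist it...
        OrderNumber AS JSON_VALUE(EventData, '$.orderNumber') PERSISTED
    );

    -- ...so we can index it like any other column.
    CREATE INDEX IX_OrderEvent_OrderNumber ON dbo.OrderEvent (OrderNumber);

    -- Shred the rest on demand with OPENJSON.
    SELECT oe.OrderEventID, j.ItemID, j.Quantity
    FROM dbo.OrderEvent oe
        CROSS APPLY OPENJSON(oe.EventData, '$.lineItems')
            WITH (ItemID INT '$.itemId', Quantity INT '$.quantity') j
    WHERE oe.OrderNumber = N'SO-12345';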

I don’t necessarily mind XML or JSON data in the database and it can be the best solution. But one pre-condition for it being the best solution is that you shouldn’t be shredding that data regularly. If you do, build out a proper, normalized data model and take advantage of what relational databases do best.

Storing and Using Secrets in Powershell

The next topic was around secrets in Powershell. This was a question from the very beginning of Shop Talk’s history, but because Rob Sewell recently blogged about this, I wanted to bring it up.

A Rant on Database Diagramming Tools

My official Rant of the Night was around how weak the database diagramming tool in SQL Server Management Studio is. The thing barely works at all, requires you to create database objects (meaning that developers with read-only permissions can't even create these diagrams for themselves), provides only a modicum of relevant information, and is a royal pain to disentangle if you have more than just a couple of tables.

Tom defended the honor of SSMS database diagrams, while folks in chat recommended tools like DBeaver and DBDiagram. I brought up Mermaid, as I've just started getting into it for presentations. It allows for code-centric modeling, meaning that you describe your model in a text-based markup language. I could see that tying in well with queries against system tables to get tables and columns, foreign keys, and the like. I also mentioned SqlDBM in an off-hand fashion because I couldn't remember its name. I've used it before to reasonably good effect, and the free version works pretty well if you have one fairly stable database. Visio database reverse engineering is excellent as well when your goal is to understand an existing, stable design.

Shop Talk: 2020-07-20

The Recording

The Panelists

  • Kevin Feasel
  • Tracy Boggiano

Notes: Questions and Topics

Azure and Lego Pieces

I got things going with a rant about how things in Azure feel like Lego pieces which don't quite fit together. The first five hours of an Azure project generally involve me enjoying how well all of these services fit together. Then you find a spot where things don't quite work, and you spend the next 15 hours trying to bludgeon the thing into submission.

My example was around using Azure Synapse Analytics. I used a Spark pool to train a model, and the notebook worked great, saving my model to Data Lake Storage Gen2 and everything.

Then, I wanted to deploy the model using Azure ML. Now, Azure ML lets you deploy PySpark models, so that’s no problem. The problem was, the model is in HDFS (really Data Lake but let’s not split hairs too much given that Blob Storage is an implementation of WebHDFS) but Azure ML won’t let you use Data Lake Storage for the model location—it needs to be local to the machine. Which means local to the driver. Which is the opposite of what you want with Apache Spark.

The end result was that I could get the pieces to fit, but it involved a bit too much welding, filing, and sanding for my taste.

Dealing with a Production Outage

Tracy tells a story of teamwork during an outage. I’d link to it, but unfortunately she’s dealing with a blog outage.

How is the DBA Role Evolving?

Mala couldn’t make it tonight but she did give us a great topic: how is the DBA role evolving?

I took it three ways, noting that this is focused on “production DBAs” rather than database developers:

  1. Is the DBA role evolving? My short answer is, I can argue either way. On the “yes” side, DBAs are expected to have more development skills than they had before. We want them to know about source control, Powershell, and DevOps. They also, in cloud environments, have fewer knobs to turn and lose ultimate control over the machine. But on the “no” side, the key set of DBA roles hasn’t really changed that much over the past several decades. Think about taking backups and testing restoration, ensuring high availability and/or disaster recovery, preventing or correcting corruption, tuning performance, and monitoring systems. The means have changed but the motivations and goals are the same.
  2. Can a DBA from 1985, 1995, or 2005 be teleported into today’s world and survive? Assume that there’s some time for learning about technological changes. Well, take a look at DBAs around you. Many of them got their starts in the 1990s, so the answer is probably “yes.” Now, the person from 1985 or 1995 would need to unlearn quite a bit and learn quite a bit, but I don’t think this is some crazy sci-fi future scenario where a DBA from then gets dropped in and has absolutely no clue how to be a DBA today.
  3. Will there be DBAs 10 years from now? 20 years from now? I think the answer is absolutely yes, because the things which define a DBA’s role don’t come about because DBAs wormed their way into organizations. They are critical for the business: we have data, data which the company finds valuable. As a result, we need to make sure that the data is available, that it won’t disappear tomorrow, that it won’t become corrupt tomorrow, and that business users will be able to get to it in a reasonable amount of time. There will be people who specialize in this administration of the bases of data as long as people at businesses care about data.

Shop Talk: 2020-07-13

The Recording

The Panelists

  • Kevin Feasel
  • Tracy Boggiano
  • Tom Norman

Notes: Questions and Topics

Migrating from Azure VMs to Azure SQL Database

Tom started us off with a discussion of how simple it was to migrate from SQL Server on a virtual machine hosted in Azure to running Azure SQL Database via Availability Groups. The one problem he ran into was around things like logins and SQL Agent jobs not going over to the AG secondaries because system databases can’t be part of an Availability Group. This feature was announced for SQL Server 2019, but had to be pulled, and as of CU5 at least, is not in the product.

Goodbye, Azure Notebooks

I reported the sad news that Azure Notebooks is going away on October 9th. There are several alternatives available. For training people, Azure Labs is probably the best option. For personal work, Visual Studio Code and Azure Data Studio offer notebook options, though I’d probably just use Jupyter Notebooks off of Anaconda and be done with it.

Visual Studio Codespaces does look pretty good as well, and the pricing isn't too bad. But none of these has the thing I really appreciated about Azure Notebooks: being free.

Mala’s Book Corner, Hosted by Kevin

Without Mala here to join us, I decided to take on the mantle of book cornerist and recommended Learn Azure in a Month of Lunches by Iain Foulds. I'm going through it now, as it just released; I got my copy about a week ago. It looks like Microsoft is offering a free copy of the e-book as well if you provide them a bit of info.

Staying at a Job

My last question of the night was, how long should you stay at a job? In this case, I broke it down into two categories: what’s the minimum number of years you should stay at a job to avoid the “job-hopper” label, and what are the key conditions for leaving a job?

As far as years of service go, Tracy said 2-3 years. Anders mentioned that he starts looking at 2 years just to stay in tune with the market. When I started my career, 5+ years was the expectation, though I did get some pushback from folks in the audience and we settled on 5 years being a Midwestern thing. Anyhow, I'd say that the number today is 2 years: stay at a job that long and you're at "career break-even."

This ties in with the other side, conditions for leaving. @thedukeny mentioned boredom and the quest for more money, and Anders echoed the boredom comment. @rporrata mentioned three things: money, family, and (lack of) training. Tom mentioned the company not paying for training and also falling behind on technology. With Tracy, it's terrible bosses and old systems.

I mentioned one other criterion: moving up. In the Olden Days, you had clearly defined levels where you could move up in a company, from Programmer 1 to Programmer 2 to 3 to 4 to Architect, and so on. Each role had specified tasks and commensurate pay. You could expect some level of growth at a company over the course of a couple decades. But that’s changed radically at most organizations. Now, you have a role: Programmer. In many places, there aren’t even labels for Junior or Senior—you’re just Programmer. This has nice benefits for employees in making it easier to pick up new problems to solve—you aren’t constrained to doing tasks A, B, and C because that’s all a Programmer 1 does. But it also means that your potential for growth is quite limited. There are no new titles to which you can aspire or big salary bumps which go along with it. And companies have this strange habit of assuming that an across-the-board 2% pay increase for employees is fine, regardless of how that person’s market value has changed.

As a result, people leave companies to grow. Each job change is an opportunity to get back up to your market salary level as well as take on more roles and assignments. There’s also much less stigma about people leaving and returning after a few years, as most people recognize that this isn’t the 1950s or the 1980s in terms of working at companies. There are some “lifer” companies out there with a fairly high percentage of extremely long-term employees, but those are rare.

Shop Talk: 2020-07-06

The Recording

The Panelists

  • Kevin Feasel
  • Tracy Boggiano
  • Mala Mahadevan
  • Special Guest Star Jared Poche

For show canon purposes, Tom was captured in a multi-state illegal fireworks ring, smuggling the good stuff.

Notes: Questions and Topics

Last night was the “Let’s get a copyright strike from the TV” edition of Shop Talk.

I Remember Halloween

Mala started us off with a question about Halloween protection. What is Halloween protection? How can we identify this? What are possible solutions?

Answer: Jared gave us a description of Halloween protection, and I followed up with a quick demo. If you want more thorough answers on Halloween protection and solutions to the performance issues it can cause, I recommend Paul White's series on the topic and Jared's blog post; Mala also passed along an article from Itzik Ben-Gan.
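
For reference, the classic scenario looks something like this sketch (the table is hypothetical), where the UPDATE reads from the same index it is modifying:

    CREATE TABLE dbo.Employee
    (
        EmployeeID INT IDENTITY(1,1) PRIMARY KEY,
        Salary DECIMAL(10,2) NOT NULL,
        INDEX IX_Employee_Salary (Salary)
    );

    -- Without Halloween protection, a row bumped above 25000 could move
    -- ahead of the index scan and be updated again. SQL Server guards
    -- against this by fully separating the read phase from the write phase,
    -- which often shows up as an Eager Spool in the execution plan.
    UPDATE dbo.Employee
        SET Salary = Salary * 1.10
        WHERE Salary < 25000;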

Upgrading SQL Server via Distributed Availability Groups

Tracy then told us about a project she had undertaken to upgrade SQL Server using Distributed Availability Groups, a feature introduced in SQL Server 2016 and enhanced with direct seeding in 2017. Tracy has used this technique to perform near-instant upgrades from 2016 to 2017, and she points out that it can work for upgrading any instance from 2012 on. She shared a few important tips, such as changing the listener rather than updating connection strings, as that removes the risk of missing some connection string somewhere.
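
The core of the technique is a distributed availability group spanning the old environment's AG and the new one. A rough sketch, run on the old environment's primary; the AG names and listener URLs are hypothetical, and both AGs and their endpoints must already exist.

    CREATE AVAILABILITY GROUP [DAG_Upgrade]
        WITH (DISTRIBUTED)
        AVAILABILITY GROUP ON
            'AG_Old' WITH
            (
                LISTENER_URL = 'tcp://ag-old-listener.contoso.com:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SEEDING_MODE = AUTOMATIC   -- direct seeding copies the databases over
            ),
            'AG_New' WITH
            (
                LISTENER_URL = 'tcp://ag-new-listener.contoso.com:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SEEDING_MODE = AUTOMATIC
            );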

Tracy should have a blog post forthcoming on the topic as well.

The Silliness of Security Compliance

In last night’s Rant of the Evening, I channeled Sean McCown in pointing out that there is nothing inherently wrong with enabling xp_cmdshell. If you’re not a sysadmin, you can’t use xp_cmdshell by default; if you are a sysadmin, you can enable xp_cmdshell whenever you want. So any security check telling you to disable xp_cmdshell is wrong—the correct answer is not to hand out xp_cmdshell rights willy-nilly.
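
For context, here is what enabling it actually involves; the point is that only someone who already holds sysadmin-level rights can flip the switch.

    -- Requires ALTER SETTINGS permission (effectively sysadmin/serveradmin).
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1;
    RECONFIGURE;

    -- Non-sysadmin logins still cannot run xp_cmdshell unless someone grants
    -- EXECUTE on it and configures a proxy account; that grant, not the
    -- server option itself, is the thing worth auditing.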

All of this comes in the context of security compliance audits. It was Tracy’s topic and she did a good job airing out some of the sillier ones when I wasn’t cutting her off.

As a point of order, the concept of security is by no means silly. Having people who understand the behaviors (including unintended ones!) of first- and third-party applications and systems is critical, and I support those people however I can.

Mala’s Book Corner

Mala recommended two books for us this week:

Getting a Job

I wrapped things up with a question: how do you get a new job in adverse conditions? I also tried to tie it into the dot-com collapse, as that was the best analog I had for today’s environment.