Shop Talk: 2020-04-13

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Deleting Lots of Rows

Anders Pedersen starts us off with a doozy. When deleting a large number of rows, should we do this in one transaction or not?

Answer: Nope. Delete in batches, although this can take a while for enormous tables. If you’re retaining a tiny percentage of rows, then it might be easier to create a new table, migrate the data you want to keep to that table, drop the old table, and rename the new table back to the old name.

If you’re using Enterprise Edition, you can partition your tables by date and use partition switching.

On Long-Term Storage of Data

As part of deleting lots of data, we ended up talking about long-term archival storage of data. Tom brought up Stretch DB and I laughed. I laughed because Stretch DB is dead on arrival as soon as you look at the price.

If you aren’t made of money, there are a few other options. One I like is to use PolyBase for cold storage of data. Solomon Rutzky also recommended storing archival data on slow disk within SQL Server.

Magnetic Storage Has Its Place

Mike Lisanke calls me out and says that magnetic storage has its place in the world.

To that I say, this is true. I want things as fast as possible, though, and faster storage is one of the easiest ways to make your SQL Server a lot faster. Spinning disk and tape are good for long-term backup storage.  But they’re generally not for OLTP or even OLAP scenarios. Give me NVMe or even SSD any day of the week.

Why do Databases Not Have Multi-Level Caching?

From Mike Lisanke, why do databases not have the concept of multi-level caching?

Answer: This answer is for SQL Server in particular; it may be different for other database technologies.

SQL Server has a buffer pool, where data is read into memory before it is returned.  That’s one level of caching. From there, multi-level caching is more of an architecture decision:  adding caching apps like Redis or using in-process cache in your app servers. That’s outside of the database but replaces database calls, so it effectively acts as another layer of caching.

Also, there is a concept of aggregations in SQL Server Analysis Services, where the engine creates pre-computed aggregations of slices of your data. That gives you a performance boost sort of like what caching does, and you can replicate this in the database engine with rollup tables.

Tools for Recording Presentations

Mala recommends Skype, as it is free and lets you save recordings. She also recommended checking out work from Doug Lane (for example, his gear to make technical videos—though that is a few years old) and Erik Darling.

Tom uses GoToWebinar but doesn’t do many recordings.

I use Whereby for streams and you can record on there.  I use Camtasia for professional video editing and post-processing.  OBS Studio is great for gonzo work when you don’t want post-processing. It’s also the software I use for streaming.  Windows Video Editor is a thing but I have no experience with it so I don’t know how well it would work here.  Adobe Premiere Pro is great if you can afford it.

Mike Chrestensen recommended frame.io and Loom as well. And Raymond Porrata has a whole list of open source video editing tools.

Mike Lisanke had a follow-up question here about using transcription services. Anders Pederson recommended Rev. You can also use things like Azure Speech to Text.

What Are You Working on Now?

Mala is currently going through SSIS training from Andy Leonard. Andy is an outstanding teacher and one of the best at SSIS. If you get a chance to learn from Andy, take it.

Tom is working on building an enclave in his environment so he can use Always Encrypted with enclaves.

On Tuning a Stored Procedure

John fan Zhang had a lengthy question for us which I’m summarizing as, given a new stored procedure which inserts batches of rows into a table, I am seeing resulting worse database performance.  What can I do about this? The table is a heap. Will a unique clustered index help?

Answer: My first thought is, check your storage. If you have cheap disk, get better disk performance and your problem probably goes away.

Inserting into heaps can be faster than inserting into tables with clustered indexes due to the hot page problem.  This typically matters more when dealing with concurrent insertion rather than single batch operation.  Still, in most cases, a clustered index will be faster for insert than a heap.

MERGE Versus INSERT

Mike Chrestensen asks, will using MERGE to insert data be faster than INSERT?

Answer: No. Also, avoid MERGE.  It has lots of bugs.  It’s easy to end up with terrible performance.  It’s generally slower than independent INSERT/UPDATE/DELETE operations.

Shop Talk: 2020-04-06

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

SSIS and SSMS

Can SQL Server Management Studio 18.4 connect to SSIS 2017 and SSIS 2019?

Answer: Yes. As of SSMS 18, you can connect to Integration Services 2017 and 2019. For prior versions of Integration Services, you will need the same version of SSMS as SSIS.

Azure Data Studio and Notebooks

Kevin’s mini-rant about Azure Data Studio shortcuts can be summed up in two GitHub issues: supporting Jupyter shortcuts and supporting Command Mode. Please upvote those by choosing a thumbs-up reaction if you want to see these in Azure Data Studio.

Module Signing

From chat:

johnfan14: Can I ask a question? The question is If we must modify the Orders table to meet the following requirements: 1. Create new rows in the table without granting INSERT permissions to the table. 2. Notify the sales person who places an order whether or not the order was completed. What should we create?

For the first part, my recommendation is to use certificate signing. Solomon Rutzky has an excellent tutorial on that. Solomon happened to be in chat and mentioned that ownership chaining can work as well for many circumstances.

The answer to the second part is generally to use something like Service Broker. For more on that, I’d recommend Colleen Morrow’s series of posts on the topic.

SQL Server in Docker Containers

We had a question in chat about using SQL Server in containers. Microsoft has some good documentation on how to get that going.

I also mentioned running Linux containers natively in Windows without emulation via Hyper-V. You can read more about that on the Docker website.