Shop Talk: 2020-04-13

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Deleting Lots of Rows

Anders Pedersen starts us off with a doozy. When deleting a large number of rows, should we do this in one transaction or not?

Answer: Nope. Delete in batches, although this can take a while for enormous tables. If you’re retaining a tiny percentage of rows, then it might be easier to create a new table, migrate the data you want to keep to that table, drop the old table, and rename the new table back to the old name.
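As a minimal sketch of the batching approach, assuming a hypothetical dbo.EventLog table and a date-based retention cutoff, it might look something like this:

-- Delete in small batches so each transaction stays short and the log can clear.
-- Table, column, cutoff date, and batch size are all hypothetical; tune them for your workload.
DECLARE @BatchSize INT = 10000;
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.EventLog
    WHERE EventDate < '2018-01-01';

    SET @RowsAffected = @@ROWCOUNT;
END;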

If you’re using Enterprise Edition, you can partition your tables by date and use partition switching.
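As a rough sketch of what partition switching might look like, assuming dbo.EventLog is partitioned by date and dbo.EventLog_Staging is an empty table with an identical structure on the same filegroup:

-- Switch the oldest partition out to a staging table (a metadata-only operation),
-- then truncate the staging table to remove the data almost instantly.
-- Table names and the partition number are hypothetical.
ALTER TABLE dbo.EventLog
SWITCH PARTITION 1 TO dbo.EventLog_Staging;

TRUNCATE TABLE dbo.EventLog_Staging;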

On Long-Term Storage of Data

As part of deleting lots of data, we ended up talking about long-term archival storage of data. Tom brought up Stretch DB and I laughed. I laughed because Stretch DB is dead on arrival as soon as you look at the price.

If you aren’t made of money, there are a few other options. One I like is to use PolyBase for cold storage of data. Solomon Rutzky also recommended storing archival data on slow disk within SQL Server.
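As a sketch of the slow-disk option, assuming a hypothetical SalesDB database and archive table, you can add a filegroup on cheaper storage and create archival tables there:

-- Hypothetical example: a filegroup on slow, inexpensive disk for archival data.
ALTER DATABASE SalesDB ADD FILEGROUP ArchiveFG;

ALTER DATABASE SalesDB ADD FILE
(
    NAME = N'SalesDB_Archive',
    FILENAME = N'E:\SlowDisk\SalesDB_Archive.ndf',
    SIZE = 10GB
) TO FILEGROUP ArchiveFG;

-- Archival tables then live on the slow disk while hot tables stay on fast storage.
CREATE TABLE dbo.EventLogArchive
(
    EventID BIGINT NOT NULL,
    EventDate DATETIME2(0) NOT NULL,
    Payload NVARCHAR(MAX) NULL,
    CONSTRAINT PK_EventLogArchive PRIMARY KEY CLUSTERED (EventID)
) ON ArchiveFG;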

Magnetic Storage Has Its Place

Mike Lisanke calls me out and says that magnetic storage has its place in the world.

To that I say, this is true. I want things as fast as possible, though, and faster storage is one of the easiest ways to make your SQL Server a lot faster. Spinning disk and tape are good for long-term backup storage.  But they’re generally not for OLTP or even OLAP scenarios. Give me NVMe or even SSD any day of the week.

Why Do Databases Not Have Multi-Level Caching?

From Mike Lisanke, why do databases not have the concept of multi-level caching?

Answer: This applies to SQL Server in particular; it may differ for other database technologies.

SQL Server has a buffer pool, where data is read into memory before it is returned. That's one level of caching. From there, multi-level caching is more of an architecture decision: adding a caching tier like Redis or using an in-process cache in your app servers. That sits outside the database but replaces database calls, so it effectively acts as another layer of caching.
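If you want to see that first level of caching in action, here is an illustrative query against the buffer pool DMV showing how much memory each database currently occupies:

-- How much of the buffer pool each database is using (pages are 8 KB apiece).
SELECT
    DB_NAME(database_id) AS DatabaseName,
    COUNT(*) * 8 / 1024 AS BufferPoolMB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY BufferPoolMB DESC;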

Also, there is a concept of aggregations in SQL Server Analysis Services, where the engine creates pre-computed aggregations of slices of your data. That gives you a performance boost sort of like what caching does, and you can replicate this in the database engine with rollup tables.
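As a sketch of the rollup table idea, with hypothetical dbo.Sales and dbo.SalesDailyRollup tables, you pre-aggregate on a schedule and point reports at the small table:

-- Hypothetical rollup table holding pre-computed daily totals.
CREATE TABLE dbo.SalesDailyRollup
(
    SaleDate DATE NOT NULL,
    TotalAmount DECIMAL(18, 2) NOT NULL,
    SaleCount INT NOT NULL,
    CONSTRAINT PK_SalesDailyRollup PRIMARY KEY CLUSTERED (SaleDate)
);

-- Rebuild yesterday's slice, e.g. from a nightly SQL Agent job.
DECLARE @Yesterday DATE = DATEADD(DAY, -1, CAST(GETDATE() AS DATE));

DELETE FROM dbo.SalesDailyRollup
WHERE SaleDate = @Yesterday;

INSERT INTO dbo.SalesDailyRollup (SaleDate, TotalAmount, SaleCount)
SELECT @Yesterday, ISNULL(SUM(s.Amount), 0), COUNT(*)
FROM dbo.Sales s
WHERE s.SaleDate >= @Yesterday
  AND s.SaleDate < DATEADD(DAY, 1, @Yesterday);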

Tools for Recording Presentations

Mala recommends Skype, as it is free and lets you save recordings. She also recommended checking out work from Doug Lane (for example, his gear for making technical videos, though that post is a few years old) and Erik Darling.

Tom uses GoToWebinar but doesn’t do many recordings.

I use Whereby for streams and you can record on there.  I use Camtasia for professional video editing and post-processing.  OBS Studio is great for gonzo work when you don’t want post-processing. It’s also the software I use for streaming.  Windows Video Editor is a thing but I have no experience with it so I don’t know how well it would work here.  Adobe Premiere Pro is great if you can afford it.

Mike Chrestensen recommended frame.io and Loom as well. And Raymond Porrata has a whole list of open source video editing tools.

Mike Lisanke had a follow-up question here about using transcription services. Anders Pedersen recommended Rev. You can also use things like Azure Speech to Text.

What Are You Working on Now?

Mala is currently going through SSIS training from Andy Leonard. Andy is an outstanding teacher and one of the best at SSIS. If you get a chance to learn from Andy, take it.

Tom is working on building an enclave in his environment so he can use Always Encrypted with enclaves.

On Tuning a Stored Procedure

John fan Zhang had a lengthy question for us which I'm summarizing as: given a new stored procedure which inserts batches of rows into a table, database performance has gotten worse as a result. What can I do about this? The table is a heap. Will a unique clustered index help?

Answer: My first thought is, check your storage. If you have cheap disk, getting faster storage will probably make the problem go away.

Inserting into heaps can be faster than inserting into tables with clustered indexes due to the hot page problem. This typically matters more for concurrent inserts than for a single batch operation. Still, in most cases, a clustered index will be faster for inserts than a heap.
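If you want to test the clustered index route on that heap, the change itself is simple; the table and column names here are hypothetical:

-- Turn the heap into a clustered table keyed on an ever-increasing column,
-- so new rows append at the end of the index rather than scattering across pages.
CREATE UNIQUE CLUSTERED INDEX CIX_StagingRows_RowID
    ON dbo.StagingRows (RowID);

Measure insert timings before and after the change; the right answer depends on your workload.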

MERGE Versus INSERT

Mike Chrestensen asks, will using MERGE to insert data be faster than INSERT?

Answer: No. Also, avoid MERGE.  It has lots of bugs.  It’s easy to end up with terrible performance.  It’s generally slower than independent INSERT/UPDATE/DELETE operations.
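If you need upsert behavior, here is a sketch of the separate-statements pattern, using hypothetical dbo.Target and dbo.Source tables:

-- Update rows which already exist, then insert the ones which do not,
-- wrapped in a single transaction to keep the operation atomic.
BEGIN TRANSACTION;

UPDATE t
SET t.Amount = s.Amount
FROM dbo.Target t
INNER JOIN dbo.Source s
    ON s.TargetID = t.TargetID;

INSERT INTO dbo.Target (TargetID, Amount)
SELECT s.TargetID, s.Amount
FROM dbo.Source s
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Target t
    WHERE t.TargetID = s.TargetID
);

COMMIT TRANSACTION;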
