Shop Talk: 2020-05-11

The Recording

The Panelists

  • Kevin Feasel
  • Tracy Boggiano
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

GUIs for Git

Mark Gordon wanted to know what people use for working with Git.

Answer: Here’s what everybody uses:

  • Tom — Visual Studio’s Git integration most of the time and Sourcetree for tricky parts.
  • Mala — TortoiseGit
  • Tracy — Visual Studio Code’s Git integration
  • Kevin — Mostly Sourcetree, sometimes TortoiseGit, and sometimes just using the Git bash shell (for things like git stash and git stash pop)

Presentations and Sessions

@johnfan14 asks, “How many presentations do you do in a month on average? How many sessions do you normally submit to a conference? What do you think the optimal numbers are?”

Answer:  This really depends on the person and we spent some time covering the various criteria. The short answer is that it really depends on your budget and available time, as well as location. Being based in the southeastern United States and in a city with a reasonably good airport, all four of us have an easy time traveling to a variety of events.

I didn’t really say out loud my answer to the specific questions so I’ll do so here. Prior to this year, my presentation counts were 63 in 2019, 70 in 2018, 50 in 2017, 53 in 2016, and 19 in 2015. Some events had me give multiple presentations, so I wasn’t at 63 events in 2019, for example, but the answer is “quite a few.” But I’m an extreme outlier, so I didn’t want to skew things too much.

As far as session submissions, this depends on the conference rules, but when there is no limit, I submit 4. That way there’s a variety of topics (which makes me more likely to be selected) but not so much noise that an organizer needs to wade through a dozen submissions. If you have one good talk or two good talks, just submit those—you don’t need 4.

Database Documentation Tools

From @rporrata: “DB Documentation what tools have you used especially for redesigning systems outside of the usual suspects erwin/studio.”

Answer: This ended up blending together two sets of tools: database documentation tools and database architecture tools. I’m not a big fan of database architecture tools in general, so without further ado, here are the resources we came up with, along with several that chat helped us find.

Architecture tools:

  • Visio allows you to reverse engineer database diagrams, though watch out: there were some editions which didn’t have this functionality.
  • SSMS database diagrams are a low-end way of showing relationships. They barely function but work okay with small numbers of tables and columns.
  • Lucidchart has a database design tool
  • @mikec_07 recommended Aqua Data Studio
  • Solomon Rutzky recommended Diagrams.Net, which is an old favorite of mine—it used to be called Draw.io.
  • I’ve used SqlDBM a few times. It’s fine for building static models, though there are niceties I wish it had (like being able to prevent table overlap).

Documentation tools:

Free Databases for Teaching SQL

Mark Hutchinson e-mailed before the show began and had a question for us: “I’m going to teach a couple of friends SQL.  Just the DML subset of the language to start (Select, Insert, Update, Delete).  Currently, we’re going to use MS Access, since the GUI can help do some of the heavy lifting at the early part of the course.  For later, maybe a second course, I was thinking about introducing the students to a large database, such as SQL Server.  What is the best free (or damned cheap) database?  One of the students is out of work and I’m retired, so money is an issue.”

Answer: We’re SQL Server folks, so we’re going to be biased. But each one of us came up with SQL Server Developer Edition. It is 100% free for non-production use and easy to install. We debated the merits of other editions as well, so here’s the quick +/- on each:

  • Developer Edition — Installation is really easy (basically next-next-next if you’re installing for a development box and don’t need custom configuration like you would for production) and you get the full SQL Server surface area.
  • Express Edition — Installation is also easy and you can use Express Edition in production environments for free. It’s limited to 10GB per database, no more than 4 cores on 1 socket, and a limited amount of RAM, but if you wanted a zero-cost SQL Server in production, that’s an option.
  • LocalDB — SQL Server Express LocalDB integrates really well with Visual Studio and lets you work with database projects. It’s a single-user variant of SQL Server with practically no configuration necessary. Some of the surface area is different in LocalDB, but that’s stuff you wouldn’t notice while learning SQL.
  • Containerized Developer Edition — You can certainly get SQL Server as a Docker container. If you’re already familiar with Docker and have it set up on the training machines, then this is a good technique as it’s really easy to spin up a new container and blow it away if you mess it up too badly. But if you don’t have things configured already, it can turn into a mess where you spend so much time trying to set up your environment that you lose out on what you wanted to do.
  • Azure SQL Database — Avoid installation altogether with Azure SQL Database. There is no free tier option anymore, but the Basic tier DTU model is $5 per month and there’s no installation or configuration at all.

While you’re at it, I’ll drop another plug for Itzik Ben-Gan’s T-SQL Fundamentals 3rd Edition, which would work as a textbook. Also, Kendra Little has a beginning T-SQL series worth checking out.

Expression Order

From @mikec_07:

My question is when we say “all expressions that appear in the same logical query processing phase are evaluated conceptually at the same point in time,” does that mean the query looks for all instances of FROM then all instances of WHERE, etc.?
How would this be processed? SELECT c.CUSTOMERID FROM CUSTOMER c WHERE c.CUSTOMERID NOT IN (SELECT ag.CUSTOMERID FROM ANGRYCUSTOMERS ag)

Answer: Mike is talking about clause ordering, where the FROM clause parses before the WHERE clause, etc. And within the WHERE clause, all predicates are handled at the same time, meaning that if you have a WHERE clause with X=1 AND Y=2 AND Z=3, the database engine can take them in any order (regardless of the order in which you wrote them). From your perspective, all of those filters happen concurrently, so you can write them in any order and the database engine will (hopefully) pick the smartest path. This is why short-circuiting may not work in SQL the way you’d expect it to in an imperative language: because the optimizer doesn’t care about the order in which you write things and can shake up the order if it looks like a better path.

In this particular instance, FROM CUSTOMER c will process first, and then we will get to the WHERE clause. Inside there, we have a NOT IN operator which operates on a sub-query, so we move to FROM ANGRYCUSTOMERS ag and then SELECT ag.CUSTOMERID. After that completes, we finally get to SELECT c.CUSTOMERID.
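To make that processing order concrete, here’s a small sketch of Mike’s query running against SQLite from Python. The table and column names (CUSTOMER, ANGRYCUSTOMERS, CUSTOMERID) come from his question; the sample data is invented for illustration.

```python
import sqlite3

# Set up the two tables from the question with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUSTOMERID INTEGER PRIMARY KEY);
    CREATE TABLE ANGRYCUSTOMERS (CUSTOMERID INTEGER PRIMARY KEY);
    INSERT INTO CUSTOMER VALUES (1), (2), (3), (4);
    INSERT INTO ANGRYCUSTOMERS VALUES (2), (4);
""")

# Logically: FROM CUSTOMER runs first, then the WHERE clause evaluates the
# NOT IN sub-query (its own FROM, then its SELECT), and the outer SELECT
# list resolves last.
rows = conn.execute("""
    SELECT c.CUSTOMERID
    FROM CUSTOMER c
    WHERE c.CUSTOMERID NOT IN (SELECT ag.CUSTOMERID FROM ANGRYCUSTOMERS ag)
    ORDER BY c.CUSTOMERID
""").fetchall()

print([r[0] for r in rows])  # the non-angry customers: [1, 3]
```

The optimizer is free to implement this however it likes (an anti-join, for instance), but the logical result is the same.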

Events of Note

We wrapped up with one event of note because I forgot about the other one:

Shop Talk: 2020-05-04

The Recording

The Panelists

  • Kevin Feasel
  • Tracy Boggiano
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

The Right PC for the Home Lab

Chris Voss e-mailed us with a great question. “What are the “home lab” setups like? What computers/specs does everyone have, and for what purposes? I’m talking about personal systems rather than work. Part of why I ask is because I’m looking at new computers, so I’m asking basically every tech person to ensure I’m doing this right.”

Answer: This will depend on whether you want a desktop or a laptop. Or you could just have a server room in your basement…

Tom and Tracy went over some of the characteristics they look for in laptops, starting with 64 GB of RAM. I mentioned that I’d much rather have extremely fast disk for a single SQL Server installation, though both of them have multiple SQL Server instances running on multiple VMs, so the need for lots of RAM makes perfect sense.

If you want a resource for building a desktop machine, the folks at Logical Increments do an incredible job. The service is entirely free and I used it to build my machine learning and video processing desktop. You pick the price point and they give you several recommendations on hardware choices. One thing I would say is that I’d recommend going up a notch on drives—prosumer grade NVMe (like the Samsung Pro series) over consumer-grade SSD. Fill your motherboard’s NVMe slots first before using SSD or HDD.

Neil Hambly had some nice recommendations as well, including Overclockers for the UK folks and making sure that you swap out SSD every 18-24 months to eliminate the risk of a drive dying on you. Unlike hard disks, SSD doesn’t give you as much warning before it dies out, and it can just suddenly drop off.

SSD vs NVMe

As a bit of kismet, Mark Gordon had e-mailed me earlier with a great follow-on question: “When it comes to storage for SQL Server, does NVMe offer an improvement over SSD?”

Answer: Oh, you bet it does. On stream, I read a tiny bit from this article on the differences between NVMe and SSD. The relevant portion is:

NVMe is not affected by the ATA interface constrictions as it sits right on the top of the PCI Express directly connected to the CPU. That results in 4 times faster Input/Output Operations Per Second (IOPs) rivaling the fastest SAS option out there. The seek time for data is ten times faster. NVMe can deliver sustained read-write speed of 2000MB per second, way faster than the SATA SSD III, which limits at 600MB per second. Here the bottleneck is NAND technology, which is rapidly advancing, which means we’ll likely see higher speeds soon with NVMe.

With SQL Server, you will notice the difference under load. NVMe is still nowhere near as fast as RAM, but it’s a lot closer than SSD (which is itself way closer than 15K spinning disk).

By the way, for the pedantic-minded, I am aware that NVMe disks are still SSD; when I say SSD, I mean SSD over SATA in the classic 2.5″ form factor.

Mark had some follow-up bits I can hit briefly here. He mentioned tempdb as a good candidate for fast disk and that’s a smart idea: tempdb should be on the fastest disk you can possibly get. Here’s a rough guide that I’m coming up with off the top of my head, ranking things in order of best to worst:

  • All direct-attached NVMe
  • All-flash SAN array
  • Tempdb on NVMe and all other databases on direct-attached SSD
  • All direct-attached SSD
  • SSD SAN array
  • Tempdb on direct-attached SSD and all other databases on HDD
  • Hard disks everywhere

There are other configurations that nestle in between some of these (e.g., direct-attached SSD for tempdb but SSD SAN for the rest is slightly better than SSD SAN array but slightly worse than all direct-attached SSD), but the general rule of thumb is that direct-attached beats SAN and that NVMe > SSD > HDD.

Learning T-SQL

John Fan Zhang asked for a good book to learn SQL.

My recommendation, to the point where I have purchased this book for one of my employees needing to learn T-SQL, is Itzik Ben-Gan’s T-SQL Fundamentals 3rd Edition. Itzik is brilliant and an outstanding teacher, and even if you have an advanced knowledge of T-SQL, you’ll still pick up things from his beginner-level book.

Mala also recommended Itzik’s training, available on demand.

Fun with RIGHT JOIN

Gabriel hit me up with this question before the stream began: “Is there a really a need to support and maintain RIGHT JOIN?”

Answer: Tracy says no, Tom says no, Mala says no, and Kevin says mostly no.

The thing about RIGHT JOIN is that it is usually confusing to people because it’s backwards from how we want to read. In English, we read left to right, top to bottom. We also work from the assumption that the most important stuff comes first. RIGHT JOIN violates this by making the latter table the “important” one. The other consideration here is that every RIGHT OUTER JOIN operation can be rewritten as a logically equivalent LEFT OUTER JOIN.

That said, I have personally run into a couple of cases where it made sense to use a RIGHT JOIN rather than switching to LEFT JOIN. These cases were mostly around complex join criteria with a combination of LEFT JOIN and INNER JOIN and one last RIGHT OUTER JOIN to catch the “I don’t have anything else” scenario. So I wouldn’t get rid of RIGHT OUTER JOIN, but if I see it in a code review, my first question is why it needs to be a ROJ and cannot be a LOJ.
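To show the rewrite in action, here’s a sketch using SQLite from Python (table names invented for illustration). The T-SQL form FROM Orders o RIGHT JOIN Customer c keeps every customer even with no orders; swapping the table order gives the logically equivalent LEFT JOIN, which reads left-to-right the way we expect.

```python
import sqlite3

# Two invented tables: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
    CREATE TABLE Customer (CustomerID INTEGER, Name TEXT);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Equivalent to "FROM Orders o RIGHT JOIN Customer c ON ..." with the
# tables swapped: every customer survives, orderless ones get NULL.
rows = conn.execute("""
    SELECT c.Name, o.OrderID
    FROM Customer c
    LEFT JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.Name, o.OrderID
""").fetchall()

print(rows)  # Bob appears with None (NULL): no orders, but still listed
```

Same result set, but the “important” table now comes first.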

Aliases and Table Names

Finally, chat got off onto the tangent of aliases and table names. On this topic, @iconpro555 tossed us into the briar patch with “why not use 4 letter names because that is what you use for aliases anyway?”

As far as naming goes, my rule of thumb is: make it clear but not overly verbose. 4 characters is fine if a table is called dbo.Home and represents information about a home (location, square footage, tax appraisal, etc.). But don’t be afraid to add a few extra characters to a column name if it clarifies intent. One thing I really like to see is unit of measure. You show me a thing called TotalCPUTime, but is that in seconds? milliseconds? microseconds? This gets really annoying even with SQL Server DMVs because some of them are milliseconds and others microseconds.

Names are for developers, whether that’s the current developer or a future maintainer. Just like with the discussion about RIGHT OUTER JOIN, we are optimizing for developers rather than for the database engine. There are times when you need to optimize for the sake of the database engine rather than the developer, and that’s where you start adding copious notes clarifying your intent.

Events of Note

We wrapped up with one event of note because I forgot about the other one:

Shop Talk: 2020-04-27

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Upgrades

Mala had a two-part question for us. When do you upgrade? And why do you upgrade?

Answer: Each of us has different opinions.

Mala would regularly wait for SP1 of a product before upgrading. With Microsoft eliminating regular service packs for SQL Server, she’s not quite sure what rule of thumb to follow.

Tom likes to push the envelope, preferring to upgrade quickly. Though he doesn’t hit each version—he tends to skip a version, e.g., 2016 to 2019. He does want to see compelling items in a version before upgrading.

Kevin likes to upgrade for upgrading’s sake. Or something like that… I have enjoyed being part of the Early Access Program for SQL Server and getting a chance to try out products under development. I pushed back a bit against the “Wait for SP1” argument, but one thing I failed to say during it is that if everybody waits for SP1, SP1 will still have a bunch of bugs. I am thankful for the people whose philosophy is “Someone’s got to find the bugs, and it might as well be me,” and everybody who waits to upgrade should be thankful for them as well.

From there, I derailed things onto my refusal to work for a company stuck on an old version of SQL Server with no plan to upgrade (or a plan but no real desire to follow it). Tom and Mala made me walk it back a bit.

ANSI SQL

Mike Lisanke wanted to know why we call the language SQL for SQL Server, Oracle, DB/2, Postgres, etc., and yet they’re all different languages.

We covered a lot in here, but the gist is that ANSI releases versions of the standard which companies subsequently adopt in part (and extend in part). I mentioned that there isn’t “an” ANSI SQL standard and Wikipedia has a nice table (about 2/5 of the way down) showing the different versions of ANSI SQL. I had guessed about the pre-89 versions and wasn’t quite right—there was only one pre-89 version, there wasn’t a 1997 version, and 2000 was actually 1999. Other than that, the answer was fine! But there have been 10 iterations of the ANSI SQL standard.

We also talked about the origin stories of a few platforms, including Sybase/SQL Server, Oracle/Postgres, and MySQL/MariaDB. We also talked about coding for ANSI compliance. Tom likes that idea (or just using PolyBase—which I recommend!). I don’t care much for coding for ANSI compliance for most places because you lose chances to improve performance for a chimerical gain. The exception here is if you must write software which is cross-platform; then you’re stuck.

I brought up lateral joins. Here’s a sample implementation in Apache Drill. I still prefer the name CROSS APPLY.

Hierarchies and Relational Database Design

Tom mentioned making use of hierarchyid in SQL Server. Then we started name-dropping books.

First up, I recommend Adam Machanic, et al’s, Expert SQL Server 2005 Development. I haven’t read it in a while and obviously the development surface area has changed in 15 years, but there is an excellent chapter on trees and hierarchies.

Mala and I both recommend Louis Davidson and Jessica Moss’s Pro SQL Server Relational Database Design and Implementation. I have an older edition, but I mentioned that it has the best explanation of 4th normal form that I’ve ever read.

I then pulled out my copy of Candace Fleming and Barbara von Halle’s Handbook of Relational Database Design. I consider it the best explanation of normalization I’ve ever seen in print (and thanks to Grant Fritchey for the recommendation!). Just don’t read the second half of the book unless you want a story of how to implement databases on ancient systems.

Events of Note

We wrapped up with one event of note:

Shop Talk: 2020-04-20

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Tips for Working from Home

I started us off with a topic of discussion: working from home. Mala and Tom both have significant experience with the topic and they shared their thoughts. Stick around for a bit of ranting about Microsoft Teams. @thedukeny points out this highly-upvoted item to allow for multiple Teams accounts at the same time. Slack does it right, and Teams is painful.

Tom brought up desk-sharing, which I absolutely hate. On the plus side, it did remind me of a Dilbert strip from 25 years ago.

Original strip: https://dilbert.com/strip/1995-01-09

Containers

Chris Voss asked a question a while back and I finally got a chance to answer: Our team is starting the use of containers for local environments to test our database development, before deploying to the shared dev environment. Can anyone share their container strategies, and what are space considerations for local sandboxes? Would it make sense to put an entire application code base in the same container?

Answer: There are a few questions in here, so let’s take them in turn.

As far as space goes, Tom Norman pointed out that containers won’t save you space across machines: if you have a 500GB database you need on every developer’s laptop, even if that database is in a container, it’ll cost you 500GB of disk space per laptop. Kevin pointed out that the container savings come when you can layer your containers: if you have a bunch of applications using .NET Core, for example, you can reuse container layers so that you might have a couple dozen .NET Core apps which all use the same base layer, and that layer gets stored on disk only once.

Does it make sense to put application code in the same container as database code? No, for the same reason that you wouldn’t put app code on the same server as your database. Keeping components isolated reduces system complexity and makes it easier to upgrade or swap out parts.

The Telemetry Service

Mark Gordon raised a question about the telemetry service which derived from a weird account setup. Mark’s research led him to read up a bit on the telemetry service. We then had a bit of discussion about the telemetry service itself and I referenced a Brent Ozar post on the topic.

My personal opinion is that I’m fine with a telemetry service. I build telemetry in my applications and would expect the same from products like SQL Server. There are differing opinions on the topic, though.

Events of Note

We wrapped up with a few events of note:

Shop Talk: 2020-04-13

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

Deleting Lots of Rows

Anders Pedersen starts us off with a doozy. When deleting a large number of rows, should we do this in one transaction or not?

Answer: Nope. Delete in batches, although this can take a while for enormous tables. If you’re retaining a tiny percentage of rows, then it might be easier to create a new table, migrate the data you want to keep to that table, drop the old table, and rename the new table back to the old name.

If you’re using Enterprise Edition, you can partition your tables by date and use partition switching.
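As a rough sketch of the batching approach, here’s a runnable example using SQLite from Python (the table and the cutoff are invented for illustration). In T-SQL you’d use DELETE TOP (@n) in a loop; SQLite can batch by keying off the primary key with a LIMIT in a sub-query. The important part is that each batch is its own short transaction.

```python
import sqlite3

# Invented table: 10,000 log rows spread across years 2015-2020.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Log (id INTEGER PRIMARY KEY, created_year INT)")
conn.executemany("INSERT INTO Log (created_year) VALUES (?)",
                 [(2015 + (i % 6),) for i in range(10_000)])
conn.commit()

BATCH_SIZE = 500
while True:
    # Each batch commits separately, keeping locks and transaction log
    # growth small compared to one massive delete.
    cur = conn.execute("""
        DELETE FROM Log
        WHERE id IN (SELECT id FROM Log WHERE created_year < 2018 LIMIT ?)
    """, (BATCH_SIZE,))
    conn.commit()
    if cur.rowcount == 0:
        break

remaining = conn.execute("SELECT COUNT(*) FROM Log").fetchone()[0]
print(remaining)  # rows from 2018 onward survive
```

On SQL Server you’d also want to mind index fragmentation and log backups between batches; this sketch just shows the loop structure.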

On Long-Term Storage of Data

As part of deleting lots of data, we ended up talking about long-term archival storage of data. Tom brought up Stretch DB and I laughed. I laughed because Stretch DB is dead on arrival as soon as you look at the price.

If you aren’t made of money, there are a few other options. One I like is to use PolyBase for cold storage of data. Solomon Rutzky also recommended storing archival data on slow disk within SQL Server.

Magnetic Storage Has Its Place

Mike Lisanke calls me out and says that magnetic storage has its place in the world.

To that I say, this is true. I want things as fast as possible, though, and faster storage is one of the easiest ways to make your SQL Server a lot faster. Spinning disk and tape are good for long-term backup storage.  But they’re generally not for OLTP or even OLAP scenarios. Give me NVMe or even SSD any day of the week.

Why do Databases Not Have Multi-Level Caching?

From Mike Lisanke, why do databases not have the concept of multi-level caching?

Answer: This answer is for SQL Server in particular; it may be different for other database technologies.

SQL Server has a buffer pool, where data is read into memory before it is returned.  That’s one level of caching. From there, multi-level caching is more of an architecture decision:  adding caching apps like Redis or using in-process cache in your app servers. That’s outside of the database but replaces database calls, so it effectively acts as another layer of caching.
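To illustrate that architectural layer, here’s a minimal cache-aside sketch in Python. A plain dict stands in for something like Redis, and SQLite stands in for SQL Server; the table, TTL, and function names are all invented for the example.

```python
import sqlite3
import time

# SQLite stands in for the database; a dict stands in for Redis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget')")

cache: dict[int, tuple[str, float]] = {}
TTL_SECONDS = 30.0

def get_product_name(product_id: int) -> str:
    entry = cache.get(product_id)
    if entry is not None and time.monotonic() - entry[1] < TTL_SECONDS:
        return entry[0]                     # cache hit: no database call
    row = conn.execute("SELECT name FROM Product WHERE id = ?",
                       (product_id,)).fetchone()
    cache[product_id] = (row[0], time.monotonic())  # populate on miss
    return row[0]

print(get_product_name(1))  # miss: reads the database
print(get_product_name(1))  # hit: served from the cache layer
```

The database never knows this layer exists, which is exactly the point: the multi-level cache lives in the application architecture, not the engine.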

Also, there is a concept of aggregations in SQL Server Analysis Services, where the engine creates pre-computed aggregations of slices of your data. That gives you a performance boost sort of like what caching does, and you can replicate this in the database engine with rollup tables.

Tools for Recording Presentations

Mala recommends Skype, as it is free and lets you save recordings. She also recommended checking out work from Doug Lane (for example, his gear to make technical videos—though that is a few years old) and Erik Darling.

Tom uses GoToWebinar but doesn’t do many recordings.

I use Whereby for streams and you can record on there. As for recording and editing tools:

  • Camtasia is what I use for professional video editing and post-processing.
  • OBS Studio is great for gonzo work when you don’t want post-processing. It’s also the software I use for streaming.
  • Windows Video Editor is a thing, but I have no experience with it, so I don’t know how well it would work here.
  • Adobe Premiere Pro is great if you can afford it.

Mike Chrestensen recommended frame.io and Loom as well. And Raymond Porrata has a whole list of open source video editing tools.

Mike Lisanke had a follow-up question here about using transcription services. Anders Pedersen recommended Rev. You can also use things like Azure Speech to Text.

What Are You Working on Now?

Mala is currently going through SSIS training from Andy Leonard. Andy is an outstanding teacher and one of the best at SSIS. If you get a chance to learn from Andy, take it.

Tom is working on building an enclave in his environment so he can use Always Encrypted with enclaves.

On Tuning a Stored Procedure

John Fan Zhang had a lengthy question for us which I’m summarizing as: given a new stored procedure which inserts batches of rows into a table, I am seeing worse database performance as a result. What can I do about this? The table is a heap. Will a unique clustered index help?

Answer: My first thought is, check your storage. If you have cheap disk, get better disk performance and your problem probably goes away.

Inserting into heaps can be faster than inserting into tables with clustered indexes due to the hot page problem.  This typically matters more when dealing with concurrent insertion rather than single batch operation.  Still, in most cases, a clustered index will be faster for insert than a heap.

MERGE Versus INSERT

Mike Chrestensen asks, will using MERGE to insert data be faster than INSERT?

Answer: No. Also, avoid MERGE. It has lots of bugs, it’s easy to end up with terrible performance, and it’s generally slower than independent INSERT/UPDATE/DELETE operations.
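For reference, here’s a sketch of the independent UPDATE-then-INSERT upsert pattern often recommended instead of MERGE, runnable against SQLite from Python (table and data invented for illustration). In T-SQL you’d wrap the two statements in one explicit transaction with appropriate locking hints.

```python
import sqlite3

# Invented price table with one existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Price (sku TEXT PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO Price VALUES ('A1', 10.0)")

def upsert_price(sku: str, amount: float) -> None:
    # Step 1: UPDATE the row if it exists.
    cur = conn.execute("UPDATE Price SET amount = ? WHERE sku = ?",
                       (amount, sku))
    # Step 2: INSERT only when the UPDATE touched nothing.
    if cur.rowcount == 0:
        conn.execute("INSERT INTO Price (sku, amount) VALUES (?, ?)",
                     (sku, amount))
    conn.commit()

upsert_price("A1", 12.5)   # existing row: takes the UPDATE path
upsert_price("B2", 99.0)   # new row: takes the INSERT path
print(conn.execute("SELECT sku, amount FROM Price ORDER BY sku").fetchall())
```

Two simple statements whose behavior you can reason about beat one complicated statement whose edge cases you can’t.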

Shop Talk: 2020-04-06

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tom Norman

Notes: Questions and Topics

SSIS and SSMS

Can SQL Server Management Studio 18.4 connect to SSIS 2017 and SSIS 2019?

Answer: Yes. As of SSMS 18, you can connect to Integration Services 2017 and 2019. For prior versions of Integration Services, you will need the same version of SSMS as SSIS.

Azure Data Studio and Notebooks

Kevin’s mini-rant about Azure Data Studio shortcuts can be summed up in two GitHub issues: supporting Jupyter shortcuts and supporting Command Mode. Please upvote those by choosing a thumbs-up reaction if you want to see these in Azure Data Studio.

Module Signing

From chat:

johnfan14: Can I ask a question? The question is If we must modify the Orders table to meet the following requirements: 1. Create new rows in the table without granting INSERT permissions to the table. 2. Notify the sales person who places an order whether or not the order was completed. What should we create?

For the first part, my recommendation is to use certificate signing. Solomon Rutzky has an excellent tutorial on that. Solomon happened to be in chat and mentioned that ownership chaining can work as well for many circumstances.

The answer to the second part is generally to use something like Service Broker. For more on that, I’d recommend Colleen Morrow’s series of posts on the topic.

SQL Server in Docker Containers

We had a question in chat about using SQL Server in containers. Microsoft has some good documentation on how to get that going.

I also mentioned running Linux containers natively in Windows without emulation via Hyper-V. You can read more about that on the Docker website.