Show notes for the Triangle Area SQL Server User Group's Shop Talk program.
Author: Kevin Feasel
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.
Our first topic was a viewer request from the prior show and was about HYCU and Azure Backup. I wasn’t familiar with HYCU and so I spent a little bit of time looking at their Azure offering and their data protection documentation. I still haven’t tried it out but that didn’t stop me from talking about it!
OpenPilot
This was a departure from the normal data platform fare, but I’ve been using Openpilot and the Comma 3X a lot for assisted driving. Behaviors are a lot smoother than my stock driver assist options, especially when I’m not on highways. It’s definitely not perfect for me, especially in how it tries (and fails) to make smooth turns, but a surprisingly large percentage of my driving involves driving approximately in a straight line, and this does a fantastic job of that.
We opened up with a note that the Live! 360 call for speakers is currently open. I’ve had the fortune to be able to speak there several years now and I think it’s an excellent conference covering a wide variety of tech tracks.
The Modern Data Stack
Friend of the show Chris Voss chimed in with the major theme for the night: the modern data stack.
Summarizing Chris’s question in my own words:
What, exactly, is the modern data stack? Many companies are selling it as cloud-native (or at least cloud-first) services, consolidating data lakes and data warehouses together.
Is this concept just a bunch of marketing hype or is there something real to it?
We spent a lot of time talking about Microsoft Fabric, data lakes and lakehouses, and a variety of other topics in and around the “modern data stack” idea.
I have no problem with any of Hugo’s conclusions and appreciate the level of effort he went through. In fact, after reading the blog post, I’ve changed my mind on MERGE. Early on, I was gung-ho about it because I knew about Oracle’s UPSERT and wanted an equivalent for SQL Server. But once all the bugs and horror stories came out about MERGE, I stopped using it altogether. Now, it’s good to know that most of the problems have been fixed. There are still some cases in which you don’t want to use MERGE and Hugo highlights those instances, but for the rest of the time, it’s just as safe as having separate operations for insert, update, and delete operations.
F# and the CLR
Our other topic was around a ticket I’ve been watching for a few years to support F# in the SQL Server CLR. I got a rush of excitement when it was closed as fixed, but after a bit of confusion, the ticket was re-opened and it’s still awaiting work. Hopefully this will make its way out to the world someday. It would have been a great addition 15 years ago, when the CLR was still novel, as F# is an excellent language for the small types of functions that work well as CLR functions. But still, better late than never.
Our first topic of the night was TriPASS election results. With a resounding percentage of the vote, Tracy Boggiano and Mike Chrestensen are still stuck doing work for me TriPASS board members. I congratulate them and am glad to have them (as well as all of the board members) aboard.
Is SQL Server Old and Not Exciting?
Mala brought up the first real topic of the night, in which we talked a bit about how exciting SQL Server is. My belief is that yes, it’s not exciting for most people, but that’s okay: it works and it works well. That said, even within that answer, I’d still say that there are plenty of things in the product which do excite me. Of course, I’ve also focused a large portion of my career around SQL Server, so you’d hope that it would interest me.
I have a longer answer in the segment but basically, relational databases aren’t exciting and that’s generally a good thing. Exciting data platform technologies tend to lose your important data, even if they’re web-scale.
Skepticism of Low-Code Solutions
The other article was an interesting blog post from Nick Scialli on low-code products. I agree with the crux of the article and we spent a good amount of time talking about it, reminiscing about FoxPro, doing the opposite of reminiscing about PowerBuilder, and plenty more.
Mike started us off with a neat lab to learn how to write SQL, specifically using the SQLite syntax. The story is a murder mystery and you figure out whodunnit based on data in the tables. There’s also a walkthrough if you don’t know where to begin.
This kind of reminds me of the Kusto Detective Agency project, though that one ramps up really fast into complicated solutions rather than staying at a beginner level.
Erik Darling on Stored Procedures
Our primary topic was around a series of blog posts Erik Darling has released. Erik is responsible for maintaining some rather large open-source stored procedures and is a clear writer, so this has been quite interesting to follow. I don’t agree with 100% of what he has to say and talk a bit about some of the disagreements, but these are usually minor quibbles rather than “He’s totally wrong” disagreements.
This was a fairly packed episode of Shop Talk. We first talked about the three major conferences that happened the prior week: PASS Summit, Microsoft Ignite, and Live! 360. I went to Live! 360, so spent the most amount of time talking about that, but Anders went to Summit and shared some experiences. We also spent some time going through the Microsoft Ignite book of news.
Microsoft Fabric Now GA
One of the major announcements at Ignite was that Microsoft Fabric is now generally available. I shared some thoughts about the product: to wit, that there’s a lot still coming, and I hope it doesn’t develop a reputation of “too much broken, too much missing” before it stabilizes as a really useful service. The problem is, I think that it’ll take about 1 1/2 to 2 years before it is a really useful service for most companies, and a lot of opinion solidification can happen in the meantime.
OpenAI Wackiness
Our other major topic was some of the news happening at OpenAI right then. As of the time of recording, the OpenAI board booted CEO Sam Altman, and chairman of the board Greg Brockman resigned in protest. Then, Altman and Brockman went to work for Microsoft heading their AI team and it seemed a lot of OpenAI employees were headed that way. The news is still in flux as I post this recap—for example, Altman is back as CEO of OpenAI, while the board members who ousted him are now gone.
There will be an election for two TriPASS board positions this year. Tracy Boggiano and Mike Chrestensen currently hold these positions and are re-running. If you are a TriPASS member in good standing (basically, you’ve signed up on Meetup and haven’t broken any of the bylaws), you can also throw your hat into the ring and run for office. If so, please let me know, either via e-mail (if you have it) or reaching out to me on Meetup.
Upcoming Events
The first topic of the night is that there are two major conferences coming up next week: PASS Summit and Live! 360. Tracy will be at the former and I’ll be at the latter, so if you’re going to either, come say hello.
Somehow, we got sidelined on using Visual Studio Code extensions in Azure Data Studio. I used the example of the excellent Rainbow CSV extension for Visual Studio Code and showed how to get it working in Azure Data Studio. It failed the first time because my version of Azure Data Studio was a bit old, but after updating, the second time was the charm.
Oracle Migrations Are Scary
The next topic covered some pain Mike is going through right now as his organization does a big migration from Oracle to SQL Server. I don’t have a lot of great advice around this, sadly. Or maybe happily, as it means I haven’t had to do one of those.
Options for Improving View Performance
The final topic of the night also came from Mike and it has to do with optimizing some nasty-sounding views. We had to stick to high-level ideas because there was no example to troubleshoot live, but here’s some of the key guidance:
Views are just saved queries. Try running the query outside a view and see if there’s a drastic difference. There shouldn’t be, but if there is, it can be an indicator of some other problem we didn’t discuss.
Review the execution plans for common view queries. Optimize them like any other query plan: look for big hitters like large (and unnecessary) table scans, lazy spooling, or reading millions of rows to return back a few hundred.
Don’t nest views. The optimizer can give up if you nest views deeply enough and you can end up with many joins to the same tables, causing additional I/O that you could optimize out with a better query. Make the query optimizer’s job as simple as possible by keeping your queries simple.
We talked a little bit about the possibility of indexed views but passed on them because the situation was not conducive to using them.
Solomon brought up some good ideas as well: Make sure the data itself is in the correct data types and you don’t have implicit (or explicit) conversions on joins requiring scans.
There will be an election for two TriPASS board positions this year. Tracy Boggiano and Mike Chrestensen currently hold these positions and are re-running. If you are a TriPASS member in good standing (basically, you’ve signed up on Meetup and haven’t broken any of the bylaws), you can also throw your hat into the ring and run for office. If so, please let me know, either via e-mail (if you have it) or reaching out to me on Meetup.
Plan for the Year
The next order of business was a discussion of our plan for the rest of the year. Because of various holidays, TriPASS usually slows down in November and December, so here’s what you have to look forward to:
Rick Pack presented on Tuesday night (video available here), closing out our Data Science and Business Intelligence series for the year. We’ll be back next year with more talks on those topics.
There will be no advanced DBA meeting in November because that’s the same week as PASS Summit and Live! 360.
Mala Mahadevan will speak at the main meeting on November 21st.
Torsten Strauss will close out our Advanced DBA group on December 12th. We’ll be back in 2024 with more advanced DBA meetings as well.
Surbhi Pokhama will speak at our final main meeting for the year.
Shop Talk will continue every two weeks, though we might miss one in December, as well as the one scheduled for New Year’s Day 2024. Be on the lookout in Meetup closer to December to see if we’re hosting Shop Talk those days.
RCSI and ETL Processes
Mala wanted to talk about a nice article from Michael J. Swart about Read Committed Snapshot Isolation and one risk you can run into if you design ETL processes in a particular way. We talked about the nature of the problem and why it happens, as well as several workarounds. I definitely wouldn’t use this as a reason to avoid RCSI, but it is worth checking your ETL processes to see how they load data.
I also brought up a discussion of RCSI versus Snapshot isolation, using a great blog post from Kendra Little as a guide. I’ll admit that I’ve never actually used Snapshot isolation in a production environment, though I can understand the issue that Kendra mentions in the post.
Our first topic was Data Relay, where I had spent the prior week. If you’re not familiar with it, Data Relay is a touring conference which spans 5 cities in 5 days across the UK. We talked quite a bit about the first day’s events, as that happened to be the National Museum of Computing in Bletchley Park.
Yes Aliases
Mike was responsible for the next topic, covering whether or not to alias table names in queries. We used this Reddit post to springboard the discussion, but it’s something Mike is dealing with at work as well. We talked about how a “no aliases” policy is a bad idea, but also about rules of thumb which make aliasing easier to understand. I tend not to have hard-and-fast rules around aliasing and certainly don’t expect aliases to be globally unique and non-repetitive (e.g., a dbo.Patient table always and everywhere aliased as pat, but no other table either being aliased as pat), but they should be meaningful enough within the context of a query to make clear which table, view, or expression it is.
Fully Qualified Object Names
Mike also had a question about qualifying object names. For example, in SQL Server, you can have up to four-part naming: instance.database.schema.object. We talked a bit about what each of those four parts actually means and why I always include schema and object when querying, almost never include database, and even more rarely include instance.
The next article we looked at was a good article on row-level security attacks. For the most part, this boiled down to attacks around row-level security rather than through row-level security, although there was an interesting side channel attack involving division by 0.
SQL Server 2022 Utilization
We got a question in chat about whether SQL Server 2022 is worth learning, as many companies haven’t moved to it. It’s a fair question, but my short answer is yes, it can make sense to learn about SQL Server 2022 even if your company is using 2016. There are a couple of reasons. First, SQL Server’s database engine doesn’t tend radically to change between versions, so it’s pretty easy to get up to speed on multiple versions of SQL Server. Second, you might find something useful in SQL Server 2022 which isn’t available in prior versions, and that could drive a change.