Our primary topic for the evening was a review of Red Gate’s State of the Database Landscape 2024 survey. Red Gate received 3849 responses and put together a great deal of information, so we covered a fair amount of it. I also got sidetracked a couple of times on specific choices for visuals, but that complaining aside, I appreciate all of the work Red Gate put into the survey and their findings were interesting enough that we discussed them for quite some time.
OpEx vs CapEx
Along the way, ENHANCE_QC brought up a comment about operating expenditures versus capital expenditures in chat and I dove into the topic a bit, referencing a couple of slides from a talk I deliver on saving money in the cloud.
The first topic of the night was a blog post by Pete Warden, asking why the Internet of Things phenomenon never got past the trough of disillusionment in Gartner-speak. Pete brings up several good points, though I spend a long time railing on one point I didn’t see from Pete: the utter lack of security. But in fairness, I’m also not sure that the utter lack of security is actually a factor in why the mass market generally rejects IoT objects, because I’m very cynical about this topic.
Anders and I also spent a bit of time covering industrial IoT, where there’s a much stronger use case. And I went half-ranty on the desire for Intranet of Things and not Internet of Things.
Goodbye, System.Data.SqlClient Package
Topic number two comes from a David Engel announcement: Microsoft is deprecating the System.Data.SqlClient package. For the most part, this shouldn’t be a big deal to switch over because Microsoft.Data.SqlClient has been around for several years and has significantly grown in capability. It did remind me of FSharp.Data.SqlClient, which needed System.Data.SqlClient back when I used it.
Improving Full-Text Index Performance
Mala wanted us to talk about a Microsoft article on improving full-text index performance. This article brought back memories, many of which were not good. For example, it reminded me that AWE was a thing. For those who are blissfully unaware, prior to SQL Server being a 64-bit installation, the x86 version was limited to 2GB of RAM. Unless, of course, you enabled the Address Windowing Extensions capability, which allowed you to use 3GB of RAM for the database engine. Given that this article prominently talks about AWE, I think it gives you an idea of how quickly advice changes around full-text indexing.
I will give the article major kudos on one point: it provides an explicit formula for “How much memory will I need for this?” Too often, the answer to that question is, “Dunno, how much you got?” And that’s not a particularly satisfying answer. In fairness, the reason for this is there are so many factors that affect memory needs, including database sizes, indexing strategy, expected query response times, other services running on the machine, etc. That makes it really hard to give a satisfying a priori answer. This article does that, at least with respect to how much memory the fdhost.exe service will require. I think it definitely falls short on amount of memory left over, especially for modern servers, but I still applaud its attempt to get to a conclusive answer.
The first topic of the night was EightKB, an online conference for advanced data platform topics. Their annual event happened recently and you can see all of the videos on their YouTube channel. If you want to dig into topics around SQL Server and other data platform technologies, EightKB is one of the best places to go.
Training Resources for R
Our next question came via e-mail and was around good training resources for a person learning R. In this case, the person was not necessarily that technical but did want to learn about data analysis. Mala, Mike, and I all know R and so we put our heads together and came with the following as resources we can recommend:
R for Data Science (2nd Edition) — Free book by one of the core developers of the R programming language
The Epidemiologist R Handbook — Free handbook that specializes in epidemiology but does teach a lot about principles of R as well, so it’s certainly not limited to health topics. I haven’t read this but Mala Mahadevan recommended it.
An Introduction to R — Another free book. I haven’t read this but Mala Mahadevan recommended it.
Practical Data Science with R (2nd Edition) — This is a book I whole-heartedly recommend for people getting into data science. Your wife probably won’t care about some of it (stuff like application lifecycle management) but it does a great job on the topic.
DataCamp — There are 127 separate courses on R currently hosted on DataCamp (as of the time of recording). This is a paid service, but I’ve been paying for it for several years.
Running out of Flight Numbers and other Data Growth Problems
Our third major topic of the night was a post from View from the Wing (via the Brent Ozar Newsletter) around airlines getting closer to having 10,000 flight numbers and some of the challenges around it, especially for places that still have analog boards displaying flight information. None of us hosts has enough knowledge of airline industry software to know how easy or difficult it would be to change (though probably absurdly difficult, given that this seems to be the norm), though Anders in chat did share some interesting thoughts on the topic. We did digress a bit to discuss other problems companies run into, such as running out of identity integers.
The first topic of the night covered the recent CrowdStrike incident. Mala pointed us to one article and I brought up an article in The Register. There’s a fair amount that we know about the story, so we covered what we had, including a brief discussion of travel issues. I was fortunate in that my travel happened after American Airlines figured out their issues, but Friend of the Show Marshall had some tales of woe in Atlanta.
Languages and the Stack Overflow 2024 Survey
Our next topic came from Friend of the Show Mark Gordon, who asked what languages developers are into these days. I shared my list: C# and Java are common but not necessarily popular languages. Python, Rust, and Kotlin are popular languages (as is Go, though I forgot to mention that one), with Python really dominating the mainstream in a bunch of fields. And, of course, for any front-end development, there’s the JavaScript Framework of the Month Club: React, Vue, Express, etc.
This also happened to coincide with Stack Overflow’s 2024 survey of nearly 50,000 developers. We looked at the languages on this list and talked about which languages are over-represented (due to the survey being a sample of Stack Overflow users, who are themselves a biased sample of the overall developer population). A few languages like COBOL are underrepresented, I argued, as people aren’t going to Stack Overflow to answer COBOL questions. We also talked about longevity in systems: C++ and C are both still high on the list, even though I’d imagine that most new development isn’t happening in those languages. But that’s the effect of legacy in programming languages.
GitHub Forks and Repo Privacy
Our final topic was an article from Joe Leon at Truffle Security, looking at accessing deleted and private repo data on GitHub by use of forks. My stance on this is, it’s worth knowing that there’s a risk from forks, but the behavior makes sense and enables open-source projects to offer the ability for people to make pull requests without needing to grant write permissions to randos. GitHub can also rightly say RTFM, where they clearly indicate the consequences of enabling forks on a repository. So if you are administrator of GitHub repos in an enterprise, read the article, understand what it means, and carry on.
Our first topic was me talking about some of the audio silliness I put up with for years because I didn’t know any better. I blame nobody but myself for hiding all of those decibels for so long. Things are better now, by which I mean I’m a lot louder. It also means that people who don’t speak English will understand me better. Because I’m louder.
Version-Specific Functionality
Mala brought up our primary topic for the day, a post on X from Grant Fritchey regarding version-specific functionality. My official ranking of responses is:
Build and optimize for what you have if it’s an internal app. If your systems are running SQL Server 2022, take advantage of what the product has. Don’t limit yourself to old versions of T-SQL for some silly reason. I specifically disagreed with one response talking about limiting T-SQL to “2008/2012”—which is itself a wild answer, considering that 2012 was a huge change in the T-SQL surface area. Think about extended window functions, TRY_CAST(), PARSE() and TRY_PARSE(), CONCAT(), OFFSET/FETCH, and the use of sp_describe_first_result_set. Those are available in 2012, not 2008.
If you are building something for internal use, don’t build “for portability” unless you actually port databases with regularity. Because almost no company actually does that. They stick with a platform for decades, and even if a database does move from one platform to another, it’s a one-time thing. You’re crippling your database capabilities in an attempt to be the lowest common denominator.
If you are building a third-party product as an ISV and you expect your database to be “real” (in the sense that you have sufficient data and number of queries per second that it couldn’t comfortably run on a Raspberry Pi), build for each platform you support. Don’t use generic ANSI SQL just because you’re supporting SQL Server, Oracle, PostgreSQL, and MySQL. Write and use T-SQL for SQL Server, PL/SQL for Oracle, etc. Take advantage of what’s in each platform and then have your application’s data layer sort out the differences so that your business objects can be the same regardless of the data platform technology.
If you are building a third-party product as an ISV and your database can comfortably run on a Raspberry Pi, go ahead and do whatever you want.
Normalization
Our next topic was a good article from Daniel Calbimonte on normalization. I covered this on Curated SQL, but wanted to get a bit further into detail on why I argue that atomicity is not actually a pre-requisite for 1st Normal Form. In Daniel’s defense, a lot of references on database design and normalization include atomicity as a pre-req, but I base my argument on CJ Date’s definition of 1NF and his critique that “atomic” doesn’t actually have a precise meaning, whereas each rule of normalization must have a precise mathematical definition.
I also touched upon a fantastic essay from Phil Factor about the entity-attribute-value anti-pattern. I don’t know that I’ve ever met Phil (though if you’re out there, let me know—I can keep a secret!) but this is a must-read essay.
Our first topic was a SQL Saturday survey that Steve Rezhener put together and that Steve Jones has promoted. I do recommend filling out the survey, and you can also view the results. Some of the questions on the survey were a bit of a challenge to understand, so I think that has limited the number of responses (in fact, Anders quit about halfway through the survey for that reason). But the information is potentially useful for getting a pulse of where SQL Saturday attendees are in terms of their current capabilities, as well as an idea of what types of sessions might be interesting to include.
Along the way, I talked a little bit about designing surveys, took some potshots at DB2, learned from Solomon that some of my cheap shots at DB2 around 8-character column names are no longer valid, and dove into how ranking mechanisms work for sites like DB-Engines and why its results might so radically differ from the Stack Overflow survey. Populations matter a lot for surveying.
Paginated Queries
The other major topic came from Tracy, despite her not being on the show this time around. She wanted to talk a bit about pagination and OFFSET/FETCH in particular. I happen to have a talk that includes pagination techniques and went into detail on how each works and the expected performance profile for each one. OFFSET/FETCH is fine for systems where you don’t have a huge number of pages, or where people rarely go past the first few pages. But it performs steadily worse the farther in a person goes, as it requires scanning all of the offset records and then pulling in the fetch records. I also lamented how much of a challenge pagination is, and how there are so few really good patterns for it given how common a technique it is and how important it is for websites.
The second topic was a blog post about two new additions to T-SQL in Azure SQL Database: UNISTR() and ||. The || operator acts very similarly to the + operator and serves to concatenate two items together. || is the ANSI SQL standard, so that makes it a perfectly reasonable addition to the syntax, though I’ll use CONCAT() and FORMATMESSAGE() over || or + most days of the week.
The other addition is UNISTR(), which lets you escape Unicode characters. Solomon had a good explanation of why he’s not the biggest fan of this change.
SQL Server Linting
Our final major topic was thanks to Tracy: linting in SQL Server. Tracy recommended super-linter as a project for linting in a wide variety of languages. In this broader project, there are a pair of linters for relational databases, and Tracy’s preference is SQLFluff. We explained what the purpose of a linter is, looked at some linting rules, and got derailed on “usp_” and passive voice, as is my wont.
Our first topic came from Brent Ozar’s Monday links newsletter and covers a report that Vista Equity is writing off the entire equity value in Pluralsight, which they bought for $3.5 billion a few years back. I ranted a bit about private equity and talked about some of the challenges that learning platforms have in our modern era. I really liked Pluralsight and was a subscriber for years, only cancelling the subscription after I moved to a totally different role and really didn’t have time to take any more courses.
Copilot+ Recall: A Bad Idea
Our other big topic was an amazing article covering a terrible idea. Kevin Beaumont has written a top-notch post describing the many ways that Copilot+ Recall is not ready for prime time. And one cynical take I read after broadcast summed up the problem quite nicely: if Copilot+ Recall does become generally available, you have to assume that any document you ever send to a user running Windows 11 (or later) will be stolen.
All the Copilots
Speaking of Copilots, we also talked about the various Copilots Microsoft has available right now, based on this Ginger Grant article. Ginger lists 10 of them and we might even have quibbles about there being additional ones (e.g., PowerPoint and Word being separate Copilots rather than one Office 365 Copilot) in practice if not in pricing.
Our first topic was a quick recap of SQL Saturday Richmond. It was good to see the RTP area turnout, as well as some people we don’t get to see very often, like Anders. I also talked about how the SQL Saturday Richmond people gave me 2 hours and 15 minutes to talk about normalization and I still went long. It’s a mad world.
From there, I had to show the Nuke map, which gives you an idea of the effective zone of a nuclear explosion. Marshall brought up the Mars Bluff atomic weapon blast that affected his grandparents.
Back on Azure regions, I showed one of my favorite sites for pricing; CloudPrice.NET. This started as a discussion of relative region prices and then went into discussion of specific VM costs. Searching across regions is a great way to save money, assuming you have isolated resources and aren’t sending a whole lot of data cross-region.
We also dug into a linked article on prompt injection and had a bit of fun with other topics tangentially related, as happens from time to time (by which I mean “every time”) on Shop Talk.
First up, we wished Tracy Boggiano a happy birthday, as one does. After that, Mala announced that she’s graduating from a Master’s program in a couple of weeks, so we wished her an early happy graduation day.
T-SQL Tuesday
I gave a sneak preview of this month’s T-SQL Tuesday, which I am hosting. The theme is, what is your favorite job interview question? I talked at some length about questions as the interviewee or candidate, because asking good questions is something a lot of candidates don’t think to do. You’re interviewing the company just as much as the company is interviewing you, after all.
Mala also shared a link to this article on the Structured and Unstructured Query Language, otherwise known as SUQL. I had far too much fun mocking the acronym. As an idea? I dunno, maybe it’ll work out great. But as a name? No, no, no, please talk to the marketing department first.
Anders Has a Song
Anders used Suno to generate a song for the DBA, lyrics, chords, and all. You can and should enjoy it for what it is.