Our first topic comes from Tracy, who noticed an odd situation in which using CONVERT() on a dataset ended up about 200ms slower than using CAST(). I didn’t have any good answers, though chat and I speculated on possibilities.
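For anyone unfamiliar with the two, CAST is the ANSI-standard form while CONVERT is T-SQL-specific and accepts an optional style argument. A minimal sketch of the two forms, with a hypothetical table and column name:

```sql
-- ANSI-standard CAST: no style argument available.
SELECT CAST(OrderDate AS varchar(30)) AS OrderDateString
FROM dbo.Orders;  -- hypothetical table

-- T-SQL-specific CONVERT: optionally takes a style code
-- (here 120 = ODBC canonical, yyyy-mm-dd hh:mi:ss).
SELECT CONVERT(varchar(30), OrderDate, 120) AS OrderDateString
FROM dbo.Orders;
```

For a simple conversion like this, you would expect the two to produce equivalent plans, which is what made the 200ms gap surprising.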
The Future (and Past) of No-Code Programming Languages
After that, we meandered a bit on no-code programming languages, using this article as a jumping-off point. There’s some value in no-code languages, but finding a general-purpose, no-code programming language which doesn’t make you want to tear your hair out in frustration as soon as you leave the happy path is like searching for El Dorado.
If you’re on that happy path, then they can be a good way of getting started. Examples I used include SSIS, Informatica, and the Azure ML designer. These are custom-purpose canvases for specific operations. Trying to turn diagrams into code is something that we’ve seen since Rational Rose (and maybe even earlier). It just doesn’t work.
Using AI for Movie Funding Decisions
Our final topic came from this Hollywood Reporter article about Warner Brothers using an ML model to help determine whether to fund movie production. We talked a bit about it, including some parts without snark. Just a bit.
Our first topic was an update on SQL Server tools. Mala and I talked about our experiences using Azure Data Studio, and I mentioned a three-part series from Erin Stellato on the most recent release for SQL Server Management Studio (part 1, part 2, part 3).
Azure Data Studio has made some great strides, especially in how it displays execution plans. If T-SQL development is your primary use case, give Azure Data Studio a try. If you’re mostly a database administrator, you can also give Azure Data Studio a try, though SQL Server Management Studio is likely to have more of what you need to do your job.
AI Feedback Loops and Costs of Training
Anders pitched me a softball with this article about the AI feedback loop. The idea is that, with the proliferation of Large Language Models (LLMs), they’re generating a large amount of the data which will be used to train the next generation of LLMs, and we tend to see model quality drop with auto-generated gibberish, leading to even more auto-generated gibberish.
I also covered an article on diminishing returns to Deep Learning techniques, indicating that it takes about k^9 times as much computational power to cut errors down to 1/k. For example, if we want to cut training errors in half, that’s k=2, and we need about 500x as much computational power to do that. For this reason, there are some slowly-changing boundaries around how much neural network model improvement we can see with additional hardware generations.
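Spelling out that arithmetic, with $C$ for compute and $k$ for the error-reduction factor (symbols mine, not the article’s):

```latex
C \propto k^{9}
\qquad\Longrightarrow\qquad
k = 2:\; 2^{9} = 512 \approx 500\times,
\qquad
k = 10:\; 10^{9} = \text{one billion}\times
```

That ninth-power relationship is why each additional hardware generation buys a smaller and smaller reduction in error.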
A Case Study in Insider Risk Management
The last topic was an interesting story I learned about because of a training I helped deliver. The case study involved a woman who stole a formula while working at Coca-Cola and tried to use it to bootstrap a new company in China. The whole story is quite long, but it goes into a good amount of detail on the topic and was pretty interesting, especially as it covers how she was able to circumvent the Data Loss Prevention software Coca-Cola was using at the time.
Microsoft Build was the day after our prior episode of Shop Talk, so this time around, we dove into Microsoft Fabric. I answered some questions for Mala and gave her a better idea of where this fits in the modern data warehousing world. We spent a bit of time talking about licensing, and I talked a bit about why I really don’t like the name.
OpenAI and Yet Another LLM Rant
One of these days, I’ll stop ranting about LLMs. Probably when the next thing hits the peak of the hype cycle. We spent some more time talking about OpenAI and LLMs, as it seemed that about 2/3 of Build content was geared that way. I was surprisingly less critical of it than I have been over the past few weeks, but I think that’s just an internal attempt not to go too far overboard in my critique—it’s not that there’s zero utility in LLMs; it’s that humans need to be in the loop the whole time, and we need to remember that these are language models and not fact models. I also talked a bit about a great article by Eric Hartford on uncensored models. I completely agree with Eric: it’s fine to add your own bias after the fact—for example, I don’t want my corporate-friendly chat assistant to start engaging in lewd conversation with potential customers—but I also don’t like the idea of some unrelated organization deciding what’s acceptable and what isn’t.
Microsoft Build was the day after this episode of Shop Talk, so we discussed a few items around Microsoft’s then-known emphasis on AI’ing all the things, as well as poking fun at a few deprecated features like StreamInsight.
We had a question come in from chat about how to get started with Azure SQL, getting an idea of what it entails versus on-premises SQL Server. I recommended the Azure SQL Workshop and Mala included a YouTube video from Anna Hoffman explaining when to use each service.
Recently, I have taken over the DB tasks at the company I work at. The structure is either defined by software we use or logically set up decently well. The only problem I’ve noticed so far, though, is the array of methods by which data is manipulated in SQL Server for user-defined tables. For example, we use:
– SSIS packages in the Integration Services Catalog (SSISDB) – found eventually by using a file table, but manually exporting right now since FILESTREAM is not set up on the server
– SSIS packages in a file system called by SQL Server Agent job – parsed with a PowerShell script
– SQL Server Agent TSQL steps – using TSQL to parse sys tables to read the commands of steps for keywords
– Procedures – TSQL script looking for keyword through procedure definitions in all databases
– Triggers – Similar to procedures
Normally I use all these tools I developed to try to track where the changes are being made, and it actually works really well, until the other day 🙁 For the life of me, I can’t find where a table is being updated, and it’s driving me crazy! I realized I even missed cases like SQL Server Maintenance Plans, Task Scheduler events, software methods/functions that connect to the database, etc. Is there a way to see where and how a DB object is manipulated, with the ability to go modify that code? We would like to stay away from installing 3rd-party tools, but I am open to suggestions. The other day, I read about SQL Server Profiler, which comes with SSMS. I think it is enabled, and I am going to try it on Monday. However, I am curious: will it return every context of how the object is updated? I have zero knowledge about this tool. Also, if there is another built-in tool you would recommend, I am all ears, as I am new to SQL Server.
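As one concrete form of the approach the questioner describes (searching module definitions for a keyword), here is a minimal sketch; the table name is hypothetical, and this only catches references inside modules in the current database. It will not see ad hoc statements coming from applications, which is where a tracing tool like Extended Events comes in.

```sql
-- Search stored procedure, trigger, function, and view definitions
-- in the current database for a reference to a specific table.
DECLARE @keyword nvarchar(128) = N'dbo.MyTable';  -- hypothetical table name

SELECT
    DB_NAME() AS database_name,
    OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
    OBJECT_NAME(m.object_id) AS object_name,
    o.type_desc
FROM sys.sql_modules AS m
    INNER JOIN sys.objects AS o
        ON m.object_id = o.object_id
WHERE m.definition LIKE N'%' + @keyword + N'%';
```

Run this in each database of interest (or wrap it in a loop over databases) to cover the procedure and trigger cases in one pass.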
When you are content simply passing back a “bag of attributes” to the application and have it sort things out, especially if you know the result set will definitely be small.
When building a proof of concept and you are positive (not just hopeful) that you will rebuild the data model before the first actual release.
The second use case is the scary one, as there are many things we tell ourselves to mollify the voices in our heads saying that this is a bad idea.
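For context, a “bag of attributes” design typically looks something like the following sketch; all of the names here are hypothetical.

```sql
-- A minimal entity-attribute-value style "bag of attributes" table.
CREATE TABLE dbo.EntityAttribute
(
    EntityID int NOT NULL,
    AttributeName nvarchar(100) NOT NULL,
    AttributeValue nvarchar(max) NULL,
    CONSTRAINT PK_EntityAttribute PRIMARY KEY (EntityID, AttributeName)
);

-- The application receives every attribute for one entity and sorts them out.
SELECT AttributeName, AttributeValue
FROM dbo.EntityAttribute
WHERE EntityID = 42;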
Are Databases Legacy Technology?
The last topic came from a couple of anecdotes about “kids these days” and coming out of computer science programs not knowing about databases. Most computer science programs are bad at teaching anything database-related—if you want to learn about relational theory, there’s usually at most one course (undergraduate or graduate-level), but if you want to learn practical applications of databases, you’re usually out of luck.
I think that this is a challenge, as it means we have new developers coming out of school who are completely ignorant of how data management works. At the same time, this is a challenge which has been with us for quite some time. When I was an undergraduate, I had…one course offered on relational theory. I don’t think the pattern has gotten worse, but there’s obviously a gap and the question is, is there a broader-based way of filling that gap than trying to introduce a database course one university at a time? I don’t have a good answer for this, though we do talk out the idea a bit and maybe that will help somebody come up with a better solution.
Last week, Tracy and I were in Redmond for MVP Summit. It was good seeing a bunch of people for the first time in several years.
We next talked a bit about the notion of cross-platform code, that is, code which runs in Oracle, SQL Server, and other relational databases. There’s some value in the idea, but the problem is that you lose out on some of the platform-specific performance and code benefits and end up designing for mediocrity rather than high performance.
What Compatibility Level Does
Our first real topic was a Brent Ozar blog post on compatibility level. The short version is that recent versions of SQL Server have introduced more functionality wrapped around compatibility level, and it’s good to know which things are gated by compatibility level—and importantly, which things aren’t.
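As a quick illustration of checking and changing it (the database name is hypothetical; 160 corresponds to SQL Server 2022):

```sql
-- Check the current compatibility level of each database.
SELECT name, compatibility_level
FROM sys.databases;

-- Set a database to SQL Server 2022 behavior (compatibility level 160).
ALTER DATABASE MyDatabase
SET COMPATIBILITY_LEVEL = 160;
```

Because a fair amount of optimizer behavior now hangs off this setting, it is worth confirming it after any upgrade or restore rather than assuming it moved with the server version.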
Another Large Language Model Rant
Chat then baited me into a discussion on large language models and using ChatGPT for coding assistance. Where I’m at with this is, if you’re good enough to know what code you intend to write (and you don’t worry about potential licensing problems), you can probably treat ChatGPT as a drunken intern who writes code that you need to review carefully. Aside from that, I’m rather pessimistic about the whole notion.
I started off with a ramble around a comment I made on the SQL Data Partners podcast: that you’d have to pay me to learn about Google Cloud Platform. Someone reached out to ask for more info and whether that meant I dislike GCP or something.
The short version is no, I have no ill feelings toward GCP. Neither do I have positive feelings toward it. It fits squarely into a purposeful blind spot for me, which comes about because the opportunity cost of learning something is the value of learning the next-best alternative. In other words, there are only so many hours in the day to learn things, so I’m going to prioritize things I find interesting, things which are easy for me to pick up, or things which (eventually?) make me money. Azure I know because I get free credits and get paid to know it well. AWS I know because I’ve worked in jobs where they’ve paid me to know enough about it. I’ve never had anyone pay me to learn GCP, so there’s no external incentive. If a customer came to me and said that they were switching to GCP and would like me to learn it, then yeah, I’d pick it up and see how things differ from Azure and AWS. But otherwise, it’s not on my radar.
Now, one thing I didn’t get into is that philosophically, I do find value in the learning equivalent of “wandering aimlessly.” I’m the type of person who would walk up and down the aisles in university libraries, not looking for a specific book but just willing to let whatever drew my attention guide me. This style of learning doesn’t always pay off, though I’ve found its hit rate is a lot higher than you’d first expect. So even if nobody pays me, there is a chance that someday I pick up GCP and try out some things. But the probability is still low—there are a lot of books on those shelves.
Draft Flag Driven Development
Mala pointed out a link to this Alex Bunardzic article on what he calls Draft Flag Driven Development. It took me a bit of discussion in chat and noodling through the idea to figure out the problem that I have with it. I do understand that, for many companies, the signal from code in a test environment succeeding (or failing) is not an extremely strong indicator of production succeeding or failing. But the big concern I have with this style of development is the risk of “not only did I break this thing, but I also broke a bunch of other stuff along the way” problems, where reverting to the prior release isn’t enough—think catastrophic data failure or sending permanent changes to a third-party provider.
Tracy and Mala started off with a quick review of SQLBits, with Mala mentioning that it was probably the best hybrid experience she’s had with a large conference.
Parameter Sensitive Plan Optimization
After that, Mala shared her thoughts on a new feature in SQL Server 2022 that she’s been trying out: parameter sensitive plan optimization. Jared mentioned some of the challenges with it but we also talked about how some of the criticism of this feature is a bit overblown.
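For reference, Parameter Sensitive Plan optimization ties to compatibility level 160, and there is a database-scoped switch for turning it off if it misbehaves; the database name here is hypothetical:

```sql
-- PSP optimization requires the SQL Server 2022 compatibility level.
ALTER DATABASE MyDatabase
SET COMPATIBILITY_LEVEL = 160;

-- It is on by default at that level; this is the escape hatch to disable it.
ALTER DATABASE SCOPED CONFIGURATION
SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = OFF;
```

Having a per-database off switch is part of why some of the criticism feels overblown: if the feature causes regressions in one workload, you can disable it there without giving up the new compatibility level everywhere.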
40 Problems with a Stored Procedure
Mark Hutchinson got us to talk about this article from Aaron Bertrand involving a code review of a nasty piece of work. Aaron found 40 separate problems, so we went through and talked about each of them. I came in expecting to disagree with 10 or so, but I think I really only disagreed with 3-4. I was actually a little surprised by that, though then we had some fun pointing out the formatting problems in Aaron’s updated procedure. Sometimes what is best in life is to be just a little petty.
Mike and I had a mini-debate for this topic. While we were talking about the topic, I included this explanation of ChatGPT. Personally, I am very pessimistic about the idea of using ChatGPT for anything other than enjoyment at the clever way in which it puts together words. It is a language model, not a truth model: there is no concept of truthfulness in its responses and there is no ghost in the shell. My response to this comes from three places. First, a strong agreement with the thrust of Charlie Stross’s post about this being a rather fishy time for a bunch of ChatGPT-related endeavors to pop up, just in time to soak money after the last bubble. Second, I’ve heard some really dumb ideas involving ChatGPT, like having it write academic papers or code. And third, because I am a strong believer in the weak AI theory (quick note: I misspoke and said “hard” and “soft” AI when I meant “strong” and “weak” AI). As I mentioned in the video, I’m obviously not able to prove that there will never be a strong AI, but I’m quite skeptical of the notion, and if I had to put money on it, I would be more comfortable with the “never” bet than with betting it actually occurs before any specific time frame.
Mike, meanwhile, talked about some of the practical things he was using ChatGPT for, and he also accidentally exposed a weakness in ChatGPT to old information when asking a question about PASS Summit.
We had the great honor of having Kevin Kline on, so we spent most of the episode grilling him and Mala about the history of the SQL Server community and PASS as an organization. Both of them have such a great deal of knowledge about the organization and broader community, so if there was ever a good episode for me to lose my voice, this is the one.