Show notes for the Triangle Area SQL Server User Group's Shop Talk program.
Author: Kevin Feasel
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.
Microsoft Build was the day after this episode of Shop Talk, so we discussed a few items around Microsoft’s then-known emphasis on AI’ing all the things, as well as poking fun at a few deprecated features like StreamInsight.
We had a question come in from chat about how to get started with Azure SQL, getting an idea of what it entails versus on-premises SQL Server. I recommended the Azure SQL Workshop and Mala included a YouTube video from Anna Hoffman explaining when to use each service.
Recently I have taken over the DB tasks at the company I work at. The structure is either defined by software we use or logically set up decently. The only problem I’ve noticed so far though is the array of methods how data is manipulated in SQL server for user defined tables. For example, we use:
– SSIS packages in the Integration Catalog (SSISDB) – found eventually by using a file table but manually exporting right now since not setup with FileStream on the server
– SSIS packages in a file system called by SQL Server Agent job – parsed with a PowerShell script
– SQL Server Agent TSQL steps – using TSQL to parse sys tables to read the commands of steps for keywords
– Procedures – TSQL script looking for keyword through procedure definitions in all databases
– Triggers – Similar to procedures
Normally I use all these tools developed to try to track where the changes are being made and it actually works really well until the other day 🙁 For the life of me I can’t find where a table is being updated and it’s driving me crazy! I realized I even missed cases like SQL Server Maintenance Plans, Task Scheduler events, software methods/functions that connect to the database, etc. Is there a way to see where and how a DB object is manipulated and have the ability to go modify that code? We would like to stay away from installing 3rd party tools but I am open to suggestions. The other day I read about SQL Server Profiler that comes with SSMS. I think that is enabled and I am going to try that on Monday. However, I am curious will that return every context of how the object is updated? I have 0 knowledge about this tool. Also, if there is another built-in tool recommended I am all ears as I am new to SQL Server.
When you are content simply passing back a “bag of attributes” to the application and have it sort things out, especially if you know the result set will definitely be small.
When building a proof of concept and you are positive (not just hopeful) that you will rebuild the data model before the first actual release.
The second use case is the scary one, as there are many things we tell ourselves to mollify the voices in our heads saying that this is a bad idea.
Are Databases Legacy Technology?
The last topic came from a couple of anecdotes about “kids these days” and coming out of computer science programs not knowing about databases. Most computer science programs are bad at teaching anything database-related—if you want to learn about relational theory, there’s usually at most one course (undergraduate or graduate-level), but if you want to learn practical applications of databases, you’re usually out of luck.
I think that this is a challenge, as it means we have new developers coming out of school who are completely ignorant of how data management works. At the same time, this is a challenge which has been with us for quite some time. When I was an undergraduate, I had…one course offered on relational theory. I don’t think the pattern has gotten worse, but there’s obviously a gap and the question is, is there a broader-based way of filling that gap than trying to introduce a database course one university at a time? I don’t have a good answer for this, though we do talk out the idea a bit and maybe that will help somebody come up with a better solution.
Last week, Tracy and I were in Redmond for MVP Summit. It was good seeing a bunch of people for the first time in several years.
We next talked a bit about the notion of cross-platform code, that is, code which runs in Oracle, SQL Server, and other relational databases. There’s some value in the idea but the problem is that you lose out on some of the platform-specific performance and code benefits and end up designing for mediocrity rather than high performance.
What Compatibility Level Does
Our first real topic was a Brent Ozar blog post on compatibility level. The short version is that recent versions of SQL Server have introduced more functionality wrapped around compatibility level, and it’s good to know which things are gated by compatibility level—and importantly, which things aren’t.
Another Large Language Model Rant
Chat then baited me into a discussion on large language models and using ChatGPT for coding assistance. Where I’m at with this is, if you’re good enough to know what code you intend to write (and you don’t worry about potential licensing problems), you can probably treat ChatGPT as a drunken intern who writes code that you need carefully to review. Aside from that, I’m rather pessimistic about the whole notion.
I started off with a ramble around a comment I made on the SQL Data Partners podcast: that you’d have to pay me to learn about Google Cloud Platform. Someone reached out to me to ask for more info and if that meant that I dislike GCP or something.
The short version is no, I have no ill feelings toward GCP. Neither do I have positive feelings toward it. It fits squarely into a purposeful blind spot for me, which comes about because the opportunity cost of learning something is the value of learning the next-best alternative. In other words, there’s only a certain number of hours in the day to learn things, so I’m going to prioritize things I find interesting, things which are easy for me to pick up, or things which (eventually?) make me money. Azure I know because I get free credits and get paid to know it well. AWS I know because I’ve worked in jobs where they’ve paid me to know enough about it. I’ve never had anyone pay me to learn GCP, so there’s no external incentive. If a customer came to me and said that they were switching to GCP and would like me to learn it, then yeah, I’d pick it up and see how things differ from Azure and AWS. But otherwise, it’s not on my radar.
Now, one thing I didn’t get into is that philosophically, I do find value in the learning equivalent of “wandering aimlessly.” I’m the type of person who would walk up and down the aisles in university libraries, not looking for a specific book but just willing to let whatever drew my attention guide me. This style of learning doesn’t always pay off, though I’ve found its hit rate is a lot higher than you’d first expect. So even if nobody pays me, there is a chance that someday I pick up GCP and try out some things. But the probability is still low—there are a lot of books on those shelves.
Draft Flag Driven Development
Mala pointed out a link to this Alex Bunardzic article on what he calls Draft Flag Driven Development. It took me a bit of discussion in chat and noodling through the idea to figure out the problem that I have with it. I do understand that, for many companies, the signal from code in a test environment succeeding (or failing) is not an extremely strong indicator of production succeeding or failing. But the big concern I have with this style of development is the risk of “not only did I break this thing, but I also broke a bunch of other stuff along the way” problems, where reverting back to the prior release isn’t enough—think catastrophic data failure or sending permanent changes to a third party provider.
Tracy and Mala started off with a quick review of SQLbits, with Mala mentioning that it was probably the best hybrid experience she’s had with a large conference.
Parameter Sensitive Plan Optimization
After that, Mala shared her thoughts on a new feature in SQL Server 2022 that she’s been trying out: parameter sensitive plan optimization. Jared mentioned some of the challenges with it but we also talked about how some of the criticism of this feature is a bit overblown.
40 Problems with a Stored Procedure
Mark Hutchinson got us to talk about this article from Aaron Bertrand involving a code review of a nasty piece of work. Aaron found 40 separate problems, so we went through and talked about each of them. I came in expecting to disagree with 10 or so, but I think I really only disagreed with 3-4. I was actually a little surprised by that, though then we had some fun pointing out the formatting problems in Aaron’s updated procedure. Sometimes what is best in life is to be just a little petty.
Mike and I had a mini-debate for this topic. While we were talking about the topic, I included this explanation of ChatGPT. Personally, I am very pessimistic on the idea of using ChatGPT for anything other than enjoyment at the clever way in which it puts together words. It is a language model, not a truth model: there is no concept of truthfulness in its responses and there is no ghost in the shell. My response to this comes from three places. First, a strong agreement with the thrust of Charlie Stross’s post about this being a rather fishy time for a bunch of ChatGPT-related endeavors to pop up, just in time to soak money after the last bubble. Second, I’ve heard some really dumb ideas involving ChatGPT, like having it write academic papers or code. And third, because I am a strong believer in the weak AI theory (quick note: I misspoke and said “hard” and “soft” AI when I meant “strong” and “weak” AI). As I mentioned in the video, I’m obviously not able to prove that there will never be a strong AI, but I’m quite skeptical of the notion and if I had to put money on it, would be more comfortable with the “never” bet than it actually occurring before any specific time frame.
Mike, meanwhile, talked about some of the practical things he was using ChatGPT for, and he also accidentally exposed a weakness in ChatGPT to old information when asking a question about PASS Summit.
We had the great honor of having Kevin Kline on, so we spent most of the episode grilling him and Mala about the history of the SQL Server community and PASS as an organization. Both of them have such a great deal of knowledge about the organization and broader community, so if there was ever a good episode for me to lose my voice, this is the one.
Because there was nobody to stop me from spiraling, I started off the episode with some bad news:
We probably aren’t going to have a SQL Saturday Raleigh this year due to difficulty finding an appropriate venue. I had a bunch of places shoot us down or ghost us, so although I’m sure we could have found somewhere to host, we weren’t able to figure out where that place was in time.
I got the privilege of telling my employees that we were all being laid off as part of a reorganization plan.
Thoughts on Synapse
After that, I riffed for a while on a blog post by Eugene Meidinger covering the difficulty in learning Azure Synapse Analytics from someone without that classical warehousing or ETL experience. Earlier that day, Eugene, Carlos L. Chacon, and I interviewed someone (and I’m being a little cagey here just because the episode hasn’t come out yet so I don’t want to spoil too much) on this topic.
“Big Data” and Its Discontents
The final topic of the evening was a discussion of how “Big Data” platforms—the author’s experience is in BigQuery but I’d also include Hadoop and even things like the Azure Synapse Analytics dedicated SQL pool—have become less common over the past several years. I think the article makes a good number of points, particularly around the major increases in per-machine power we’ve seen over the past decade. There are a couple of parts where I think the author overplays his hand, but overall, the article is worth the read.
The first topic of the night was a couple upcoming events the Shop Talk crew will be at. I’ll be at SQL Saturday Atlanta BI Edition on February 25th. Tracy will be in Wales for SQLbits in March and Mala will present remotely.
Laid Off? Andy Leonard Has Free Training for You
Andy Leonard has a generous offer for anyone who has been laid off recently: a full year of free access to his training catalog. Andy has a lot of great content and is a great person to learn from when it comes to data movement in SSIS or Azure Data Factory.
Implicit Conversions are Bad
Tracy authored a blog post recently on eliminating implicit conversions in Hibernate and JDBC. She wasn’t able to make the show but Mala and I talked about the topic and Solomon Rutzky reminded us that the most likely problem Tracy ran into involved collations and data type mismatches—with Windows collations, we wouldn’t see these issues.
Debugging T-SQL Code
Mala wanted us to talk about a recent Brent Ozar post on debugging T-SQL code. I agree with Brent that RAISERROR and table variables form a potent combination for error handling. I will, however, never pronounce it as “raise-roar.”
We wrapped things up with a diversion around this Maelle Salmon post on code commenting, with an emphasis on R. I like the principles of it and it got me thinking about whether there are languages which are more or less comment-needy: in other words, are there some languages in which you absolutely need more comments and other languages in which you definitely don’t need more? As a first approximation, I went with math-heavy (and functional) programming languages as benefitting more from detailed comments, and I could see relatively more verbose languages like COBOL needing fewer explicit comments. I’m not sure this is actually correct, however; I’d have to think about it some more.
Because we talked about this during the last episode, here’s a quick update. We have booked all three groups (Advanced DBA, main meeting, and BI/Data Science) through July. The call for speakers is still up, however, and if you want to speak for our group, please submit one or more sessions..
Workplace “Red Flags”
A Kevin Kline tweet formed the basis of our first topic:
Mala and I shared some painful responses, though I cheated a bit and picked several situations in which I saw the red flag before taking the job.
“Big Data” Trends
We spent the rest of the episode taking a look at this Petr Nemeth article. We looked at and responded to each of Petr’s main trends. Some of them, I think, are reasonable; others have been a pipe dream for the past 15 years and I don’t foresee that changing.