Last week, Tracy and I were in Redmond for MVP Summit. It was good seeing a bunch of people for the first time in several years.
Cross-Platform Code
We next talked a bit about the notion of cross-platform code, that is, code which runs in Oracle, SQL Server, and other relational databases. There's some value in the idea, but the problem is that you lose out on platform-specific performance and code benefits, and you end up designing for mediocrity rather than high performance.
What Compatibility Level Does
Our first real topic was a Brent Ozar blog post on compatibility level. The short version is that recent versions of SQL Server have introduced more functionality wrapped around compatibility level, and it’s good to know which things are gated by compatibility level—and importantly, which things aren’t.
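If you want to see where your databases stand, it only takes a couple of lines of T-SQL. Here's a minimal sketch, using a hypothetical database named Sandbox:

```sql
-- See the compatibility level for each database on the instance.
SELECT name, compatibility_level
FROM sys.databases;

-- Move a (hypothetical) database named Sandbox to the SQL Server 2022 level.
-- Intelligent query processing features gate on this setting; other behaviors,
-- such as database scoped configurations, do not.
ALTER DATABASE Sandbox SET COMPATIBILITY_LEVEL = 160;
```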
Another Large Language Model Rant
Chat then baited me into a discussion on large language models and using ChatGPT for coding assistance. Where I'm at with this: if you're good enough to know what code you intend to write (and you don't worry about potential licensing problems), you can probably treat ChatGPT as a drunken intern whose code you need to review carefully. Aside from that, I'm rather pessimistic about the whole notion.
I started off with a ramble around a comment I made on the SQL Data Partners podcast: that you'd have to pay me to learn about Google Cloud Platform. Someone reached out to ask for more info and whether that meant I dislike GCP or something.
The short version is no, I have no ill feelings toward GCP. Neither do I have positive feelings toward it. It fits squarely into a purposeful blind spot for me, which comes about because the opportunity cost of learning something is the value of learning the next-best alternative. In other words, there are only so many hours in the day to learn things, so I'm going to prioritize things I find interesting, things which are easy for me to pick up, or things which (eventually?) make me money. Azure I know because I get free credits and get paid to know it well. AWS I know because I've worked in jobs where they've paid me to know enough about it. I've never had anyone pay me to learn GCP, so there's no external incentive. If a customer came to me and said that they were switching to GCP and would like me to learn it, then yeah, I'd pick it up and see how things differ from Azure and AWS. But otherwise, it's not on my radar.
Now, one thing I didn’t get into is that philosophically, I do find value in the learning equivalent of “wandering aimlessly.” I’m the type of person who would walk up and down the aisles in university libraries, not looking for a specific book but just willing to let whatever drew my attention guide me. This style of learning doesn’t always pay off, though I’ve found its hit rate is a lot higher than you’d first expect. So even if nobody pays me, there is a chance that someday I pick up GCP and try out some things. But the probability is still low—there are a lot of books on those shelves.
Draft Flag Driven Development
Mala pointed out a link to this Alex Bunardzic article on what he calls Draft Flag Driven Development. It took me a bit of discussion in chat and some noodling through the idea to figure out the problem I have with it. I do understand that, for many companies, the signal from code succeeding (or failing) in a test environment is not an extremely strong indicator of production success or failure. But my big concern with this style of development is the risk of "not only did I break this thing, but I also broke a bunch of other stuff along the way" problems, where reverting to the prior release isn't enough: think catastrophic data loss or sending permanent changes to a third-party provider.
Tracy and Mala started off with a quick review of SQLBits, with Mala mentioning that it was probably the best hybrid experience she's had with a large conference.
Parameter Sensitive Plan Optimization
After that, Mala shared her thoughts on a new feature in SQL Server 2022 that she’s been trying out: parameter sensitive plan optimization. Jared mentioned some of the challenges with it but we also talked about how some of the criticism of this feature is a bit overblown.
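For reference, this ties back to the compatibility level discussion above: Parameter Sensitive Plan optimization only kicks in at compatibility level 160. A minimal sketch, again assuming a hypothetical database named Sandbox:

```sql
-- Parameter Sensitive Plan optimization requires compatibility level 160.
ALTER DATABASE Sandbox SET COMPATIBILITY_LEVEL = 160;

-- It is on by default at that level. If it causes regressions, you can
-- turn it off for the database without dropping the compatibility level.
ALTER DATABASE SCOPED CONFIGURATION
    SET PARAMETER_SENSITIVE_PLAN_OPTIMIZATION = OFF;
```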
40 Problems with a Stored Procedure
Mark Hutchinson got us to talk about this article from Aaron Bertrand involving a code review of a nasty piece of work. Aaron found 40 separate problems, so we went through and talked about each of them. I came in expecting to disagree with 10 or so, but I think I really only disagreed with 3-4. I was actually a little surprised by that, though then we had some fun pointing out the formatting problems in Aaron’s updated procedure. Sometimes what is best in life is to be just a little petty.
Mike and I had a mini-debate for this topic. While we were talking, I included this explanation of ChatGPT: personally, I am very pessimistic about using ChatGPT for anything other than enjoying the clever way in which it puts together words. It is a language model, not a truth model: there is no concept of truthfulness in its responses and there is no ghost in the shell. My response comes from three places. First, strong agreement with the thrust of Charlie Stross's post about this being a rather fishy time for a bunch of ChatGPT-related endeavors to pop up, just in time to soak up money after the last bubble. Second, I've heard some really dumb ideas involving ChatGPT, like having it write academic papers or code. And third, I am a strong believer in the weak AI theory (quick note: I misspoke and said "hard" and "soft" AI when I meant "strong" and "weak" AI). As I mentioned in the video, I obviously can't prove that there will never be a strong AI, but I'm quite skeptical of the notion; if I had to put money on it, I'd be more comfortable betting "never" than betting that it arrives within any specific time frame.
Mike, meanwhile, talked about some of the practical things he was using ChatGPT for, and he also accidentally exposed ChatGPT's weakness with outdated information when asking a question about PASS Summit.
We had the great honor of having Kevin Kline on, so we spent most of the episode grilling him and Mala about the history of the SQL Server community and PASS as an organization. Both of them have such a great deal of knowledge about the organization and broader community, so if there was ever a good episode for me to lose my voice, this was the one.
Because there was nobody to stop me from spiraling, I started off the episode with some bad news:
We probably aren’t going to have a SQL Saturday Raleigh this year due to difficulty finding an appropriate venue. I had a bunch of places shoot us down or ghost us, so although I’m sure we could have found somewhere to host, we weren’t able to figure out where that place was in time.
I got the privilege of telling my employees that we were all being laid off as part of a reorganization plan.
Thoughts on Synapse
After that, I riffed for a while on a blog post by Eugene Meidinger covering the difficulty of learning Azure Synapse Analytics for someone without classical data warehousing or ETL experience. Earlier that day, Eugene, Carlos L. Chacon, and I interviewed someone (and I'm being a little cagey here just because the episode hasn't come out yet, so I don't want to spoil too much) on this topic.
“Big Data” and Its Discontents
The final topic of the evening was a discussion of how “Big Data” platforms—the author’s experience is in BigQuery but I’d also include Hadoop and even things like the Azure Synapse Analytics dedicated SQL pool—have become less common over the past several years. I think the article makes a good number of points, particularly around the major increases in per-machine power we’ve seen over the past decade. There are a couple of parts where I think the author overplays his hand, but overall, the article is worth the read.
The first topic of the night was a couple of upcoming events the Shop Talk crew will be at. I'll be at SQL Saturday Atlanta BI Edition on February 25th. Tracy will be in Wales for SQLBits in March and Mala will present remotely.
Laid Off? Andy Leonard Has Free Training for You
Andy Leonard has a generous offer for anyone who has been laid off recently: a full year of free access to his training catalog. Andy has a lot of great content and is an excellent person to learn from when it comes to data movement in SSIS or Azure Data Factory.
Implicit Conversions are Bad
Tracy authored a blog post recently on eliminating implicit conversions in Hibernate and JDBC. She wasn't able to make the show, but Mala and I talked about the topic, and Solomon Rutzky reminded us that the most likely culprit in Tracy's case was a combination of collations and data type mismatches: with Windows collations, we wouldn't see these issues.
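To make the problem concrete, here's a minimal sketch of the kind of mismatch we discussed, using a hypothetical dbo.Customer table whose CustomerName column is VARCHAR with a SQL collation:

```sql
-- With a SQL collation (e.g., SQL_Latin1_General_CP1_CI_AS), comparing a
-- VARCHAR column to an NVARCHAR literal converts the column side of the
-- comparison, turning a potential index seek into a scan.
SELECT c.CustomerID
FROM dbo.Customer c
WHERE c.CustomerName = N'Aiden';  -- N'...' is an NVARCHAR literal

-- Matching the data types avoids the implicit conversion entirely.
SELECT c.CustomerID
FROM dbo.Customer c
WHERE c.CustomerName = 'Aiden';
```

With a Windows collation, the optimizer can still perform a seek on the first query despite the conversion, which is why the collation choice matters so much here.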
Debugging T-SQL Code
Mala wanted us to talk about a recent Brent Ozar post on debugging T-SQL code. I agree with Brent that RAISERROR and table variables form a potent combination for debugging. I will, however, never pronounce it as "raise-roar."
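As a quick illustration of why that combination works, here's a minimal sketch (table and message names are made up):

```sql
-- Table variables are not affected by ROLLBACK, so they can capture
-- debugging breadcrumbs that survive a failed transaction.
DECLARE @debug TABLE
(
    LogTime DATETIME2(0) NOT NULL,
    Message NVARCHAR(200) NOT NULL
);

BEGIN TRANSACTION;
    INSERT INTO @debug (LogTime, Message)
    VALUES (SYSDATETIME(), N'About to do something risky.');
    -- Imagine the risky work failing here.
ROLLBACK TRANSACTION;

-- The breadcrumb survives the rollback.
SELECT LogTime, Message FROM @debug;

-- RAISERROR at severity 0 WITH NOWAIT prints status messages immediately,
-- unlike PRINT, which waits for the output buffer to fill.
DECLARE @msg NVARCHAR(200) = CONCAT(N'Finished at ', SYSDATETIME());
RAISERROR(@msg, 0, 1) WITH NOWAIT;
```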
Code Commenting
We wrapped things up with a diversion around this Maëlle Salmon post on code commenting, with an emphasis on R. I like its principles, and it got me thinking about whether some languages are more comment-needy than others: in other words, are there languages in which you absolutely need more comments and others in which you can get away with fewer? As a first approximation, I went with math-heavy (and functional) programming languages as benefiting more from detailed comments, and I could see more verbose languages like COBOL needing fewer explicit comments. I'm not sure this is actually correct, however; I'd have to think about it some more.
Because we talked about this during the last episode, here's a quick update. We have booked all three groups (Advanced DBA, main meeting, and BI/Data Science) through July. The call for speakers is still up, however, and if you want to speak for our group, please submit one or more sessions.
Workplace “Red Flags”
A Kevin Kline tweet about workplace red flags formed the basis of our first topic.
Mala and I shared some painful responses, though I cheated a bit and picked several situations in which I saw the red flag before taking the job.
“Big Data” Trends
We spent the rest of the episode taking a look at this Petr Nemeth article. We looked at and responded to each of Petr’s main trends. Some of them, I think, are reasonable; others have been a pipe dream for the past 15 years and I don’t foresee that changing.
The first topic of the night was that we are looking for speakers for the Advanced DBA and Business Intelligence / Data Science TriPASS meetings. These are (currently) remote-only, so all are welcome to submit sessions. The call for speakers is currently up and running.
SQL Saturday Raleigh Update
As a first step toward hosting SQL Saturday Raleigh in 2023, we started looking for a venue. The place which hosted us last time around is no longer doing weekend events and I’m currently 0 for 4 on locations. We have a few other irons in the fire and, assuming we can lock down a venue, will get to work on hosting SQL Saturday Raleigh. Our provisional date is April 15th but there’s no call for speakers or official announcement yet.
We had a chat question come in around normalizing addresses: that is, given some arbitrary string a user typed in, what is the “official” address? We recommended Melissa Data for this, as they handle files and have an API, as well as SSIS components. Other alternatives we kicked around were the Google Maps API and OpenStreetMap, both of which have APIs to support address lookup.
PyTorch Compromise
Our final topic of the night involved PyTorch, a popular deep learning library for Python. It seems that, sometime shortly after Christmas, someone pulled off a supply chain attack on PyTorch, uploading to PyPI a malicious package with the same name as an internal PyTorch dependency (a classic dependency confusion attack). This only affected people who installed the nightly build between December 25th and December 30th, and the PyTorch website has cleanup instructions as well as more details. The specific nature of the attack was particularly interesting, as the attackers put a lot of effort into staying hidden.
We’ve completed another round of TriPASS elections and the slate of candidates passed: Kevin Feasel as President, Rick Pack as VP of Marketing, and Mala Mahadevan as Treasurer. Thank you to any TriPASS member who voted.
The Siren Song of Reusable Queries
Our big topic for this episode was around reusable code and how much of a trap it can be in SQL Server. Thinking about ways to reuse code is great in most procedural languages, but we covered in some detail why that plan can fall apart with common T-SQL constructs, including functions and views.
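To give one flavor of the problem, here's a minimal sketch using a hypothetical dbo.Orders table: a scalar UDF looks like nicely reusable code, but before SQL Server 2019's UDF inlining, it runs once per row and inhibits parallelism.

```sql
-- A scalar UDF: tidy, reusable, and (pre-inlining) a performance trap.
CREATE OR ALTER FUNCTION dbo.DiscountedPrice (@price DECIMAL(10, 2))
RETURNS DECIMAL(10, 2)
AS
BEGIN
    RETURN @price * 0.9;
END;
GO

-- Executes the function once per row and can block parallel plans.
SELECT o.OrderID, dbo.DiscountedPrice(o.Price) AS FinalPrice
FROM dbo.Orders o;

-- Inlining the expression keeps the query set-based and parallel-friendly,
-- at the cost of repeating the logic.
SELECT o.OrderID, o.Price * 0.9 AS FinalPrice
FROM dbo.Orders o;
```

The same tension shows up with nested views: each layer looks reusable on its own, but the optimizer has to untangle the entire stack on every call.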
Resume Thoughts
The other topic we covered involved resumes. I looked at it from two angles: as a hiring manager and as a candidate. A few of the big things I'm looking for:
Brevity. My resume is 1 page long and I’ve done a few things. Your resume is not a curriculum vitae: it’s not intended to be everything you’ve ever done, just items which are most relevant to the job at hand. As you gain more experience, it’s okay to leave off older jobs, especially when they aren’t directly relevant.
Impact. You worked at BigCo for 14 years but what did you do? Pick one or two major projects which had the biggest impact and give me concrete measures of how you made somebody’s life better.
Appropriate humility. If you call yourself an expert on something, be prepared: that’s a big target on your back. But at the same time, if you’ve written a book and delivered a 6-lecture series at Oxford on a topic, don’t underplay your level of knowledge. Finding the appropriate level is tough, especially when there aren’t clear, common delineations between levels of expertise in a given field.
Hit the HR bullet points. This isn't something I look for as a hiring manager, but it can keep your resume from ever reaching me. When you customize your resume for a particular job, be sure to include as many of the relevant keywords as possible, as automated HR systems act as gatekeepers here. If the job mentions T-SQL, SQL, database administration, query tuning, and database security, fit those in. You should still be able to keep it to 1 page of impact-driven statements, especially if you include a "Key skills" section with a line or two of relevant skills that you demonstrate (even if between the lines) in your job experience section.