Shop Talk: 2023-05-08

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

Auditing Table Changes

The first topic came in the form of an e-mail:

Recently I have taken over the DB tasks at the company I work at. The structure is either defined by software we use or logically set up decently. The only problem I’ve noticed so far though is the array of methods how data is manipulated in SQL server for user defined tables. For example, we use:

– SSIS packages in the Integration Catalog (SSISDB) – found eventually by using a file table but manually exporting right now since not setup with FileStream on the server

– SSIS packages in a file system called by SQL Server Agent job – parsed with a PowerShell script

– SQL Server Agent TSQL steps – using TSQL to parse sys tables to read the commands of steps for keywords

– Procedures – TSQL script looking for keyword through procedure definitions in all databases

– Triggers – Similar to procedures

Normally I use all these tools developed to try to track where the changes are being made and it actually works really well until the other day šŸ™ For the life of me I can’t find where a table is being updated and it’s driving me crazy! I realized I even missed cases like SQL Server Maintenance Plans, Task Scheduler events, software methods/functions that connect to the database, etc. Is there a way to see where and how a DB object is manipulated and have the ability to go modify that code? We would like to stay away from installing 3rd party tools but I am open to suggestions. The other day I read about SQL Server Profiler that comes with SSMS. I think that is enabled and I am going to try that on Monday. However, I am curious will that return every context of how the object is updated? I have 0 knowledge about this tool. Also, if there is another built-in tool recommended I am all ears as I am new to SQL Server.

Mala and I walk through the recommended answer, which is to use SQL Audit. Mark Gordon had a great presentation on the topic for TriPASS, so I recommend you check it out if you want to get started. If you just want to see the very specific answer of how to set up a SQL Audit for a single table, Nic Cain has a DBA Stack Exchange answer which does the trick. You can also dig into what the fn_get_audit_file() function does and see things like application name, which (if your connection strings are configured correctly), will provide you an answer of what is updating that table.

Entity-Attribute-Value Tables

The next topic was a bit of a blast from the past: Entity-Attribute-Value tables. This is a pretty good look at what they are, and Phil Factor has the best way of explaining them. I am not entirely anti-EAV, but I am anti-EAV about 90% of the time. The exceptional cases are:

  • When you are content simply passing back a “bag of attributes” to the application and have it sort things out, especially if you know the result set will definitely be small.
  • When building a proof of concept and you are positive (not just hopeful) that you will rebuild the data model before the first actual release.

The second use case is the scary one, as there are many things we tell ourselves to mollify the voices in our heads saying that this is a bad idea.

Are Databases Legacy Technology?

The last topic came from a couple of anecdotes about “kids these days” and coming out of computer science programs not knowing about databases. Most computer science programs are bad at teaching anything database-related—if you want to learn about relational theory, there’s usually at most one course (undergraduate or graduate-level), but if you want to learn practical applications of databases, you’re usually out of luck.

I think that this is a challenge, as it means we have new developers coming out of school who are completely ignorant of how data management works. At the same time, this is a challenge which has been with us for quite some time. When I was an undergraduate, I had…one course offered on relational theory. I don’t think the pattern has gotten worse, but there’s obviously a gap and the question is, is there a broader-based way of filling that gap than trying to introduce a database course one university at a time? I don’t have a good answer for this, though we do talk out the idea a bit and maybe that will help somebody come up with a better solution.

Shop Talk: 2023-04-24

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

MVP Summit

Last week, Tracy and I were in Redmond for MVP Summit. It was good seeing a bunch of people for the first time in several years.

Cross-Platform Code

We next talked a bit about the notion of cross-platform code, that is, code which runs in Oracle, SQL Server, and other relational databases. There’s some value in the idea but the problem is that you lose out on some of the platform-specific performance and code benefits and end up designing for mediocrity rather than high performance.

What Compatibility Level Does

Our first real topic was a Brent Ozar blog post on compatibility level. The short version is that recent versions of SQL Server have introduced more functionality wrapped around compatibility level, and it’s good to know which things are gated by compatibility level—and importantly, which things aren’t.

Another Large Language Model Rant

Chat then baited me into a discussion on large language models and using ChatGPT for coding assistance. Where I’m at with this is, if you’re good enough to know what code you intend to write (and you don’t worry about potential licensing problems), you can probably treat ChatGPT as a drunken intern who writes code that you need carefully to review. Aside from that, I’m rather pessimistic about the whole notion.

Shop Talk: 2022-04-10

The Recording

The Panelists

  • Kevin Feasel
  • Mike Chrestensen

Notes: Questions and Topics

Caring about GCP

I started off with a ramble around a comment I made on the SQL Data Partners podcast: that you’d have to pay me to learn about Google Cloud Platform. Someone reached out to me to ask for more info and if that meant that I dislike GCP or something.

The short version is no, I have no ill feelings toward GCP. Neither do I have positive feelings toward it. It fits squarely into a purposeful blind spot for me, which comes about because the opportunity cost of learning something is the value of learning the next-best alternative. In other words, there’s only a certain number of hours in the day to learn things, so I’m going to prioritize things I find interesting, things which are easy for me to pick up, or things which (eventually?) make me money. Azure I know because I get free credits and get paid to know it well. AWS I know because I’ve worked in jobs where they’ve paid me to know enough about it. I’ve never had anyone pay me to learn GCP, so there’s no external incentive. If a customer came to me and said that they were switching to GCP and would like me to learn it, then yeah, I’d pick it up and see how things differ from Azure and AWS. But otherwise, it’s not on my radar.

Now, one thing I didn’t get into is that philosophically, I do find value in the learning equivalent of “wandering aimlessly.” I’m the type of person who would walk up and down the aisles in university libraries, not looking for a specific book but just willing to let whatever drew my attention guide me. This style of learning doesn’t always pay off, though I’ve found its hit rate is a lot higher than you’d first expect. So even if nobody pays me, there is a chance that someday I pick up GCP and try out some things. But the probability is still low—there are a lot of books on those shelves.

Draft Flag Driven Development

Mala pointed out a link to this Alex Bunardzic article on what he calls Draft Flag Driven Development. It took me a bit of discussion in chat and noodling through the idea to figure out the problem that I have with it. I do understand that, for many companies, the signal from code in a test environment succeeding (or failing) is not an extremely strong indicator of production succeeding or failing. But the big concern I have with this style of development is the risk of “not only did I break this thing, but I also broke a bunch of other stuff along the way” problems, where reverting back to the prior release isn’t enough—think catastrophic data failure or sending permanent changes to a third party provider.

Ordered Columnstore Indexes in SQL Server 2022

Ed Pollack has a great article on ordered columnstore indexes in SQL Server 2022. We walked through the article in detail, covering scenarios where columnstore itself works well, where ordered columnstore indexes are useful, and some of the pitfalls you might hit when using this feature.

Shop Talk: 2027-03-27

The Recording

The Panelists

  • Kevin Feasel
  • Tracy Boggiano
  • Mala Mahadevan
  • Gatecrasher Special guest star Jared Poche

Notes: Questions and Topics

SQLbits Review

Tracy and Mala started off with a quick review of SQLbits, with Mala mentioning that it was probably the best hybrid experience she’s had with a large conference.

Parameter Sensitive Plan Optimization

After that, Mala shared her thoughts on a new feature in SQL Server 2022 that she’s been trying out: parameter sensitive plan optimization. Jared mentioned some of the challenges with it but we also talked about how some of the criticism of this feature is a bit overblown.

40 Problems with a Stored Procedure

Mark Hutchinson got us to talk about this article from Aaron Bertrand involving a code review of a nasty piece of work. Aaron found 40 separate problems, so we went through and talked about each of them. I came in expecting to disagree with 10 or so, but I think I really only disagreed with 3-4. I was actually a little surprised by that, though then we had some fun pointing out the formatting problems in Aaron’s updated procedure. Sometimes what is best in life is to be just a little petty.

Shop Talk: 2023-03-13

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

Exponential Moving Average in T-SQL

The first topic of the night came from Mala, as we talked about an article on financial modeling via T-SQL. Along the way, I shared one of my favorite “don’t do this but I love it” scenarios in T-SQL, an outstanding Gail Shaw blog post on using genetic algorithms within T-SQL to solve the knapsack problem. These are the types of things that I find interesting, though for “proper” use, I’d definitely pick a different language.

ChatGPT and the AI Bubble

Mike and I had a mini-debate for this topic. While we were talking about the topic, I included this explanation of ChatGPT. Personally, I am very pessimistic on the idea of using ChatGPT for anything other than enjoyment at the clever way in which it puts together words. It is a language model, not a truth model: there is no concept of truthfulness in its responses and there is no ghost in the shell. My response to this comes from three places. First, a strong agreement with the thrust of Charlie Stross’s post about this being a rather fishy time for a bunch of ChatGPT-related endeavors to pop up, just in time to soak money after the last bubble. Second, I’ve heard some really dumb ideas involving ChatGPT, like having it write academic papers or code. And third, because I am a strong believer in the weak AI theory (quick note: I misspoke and said “hard” and “soft” AI when I meant “strong” and “weak” AI). As I mentioned in the video, I’m obviously not able to prove that there will never be a strong AI, but I’m quite skeptical of the notion and if I had to put money on it, would be more comfortable with the “never” bet than it actually occurring before any specific time frame.

Mike, meanwhile, talked about some of the practical things he was using ChatGPT for, and he also accidentally exposed a weakness in ChatGPT to old information when asking a question about PASS Summit.

Shop Talk: 2023-02-27

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano
  • Special Guest Star: Kevin Kline

Notes: Questions and Topics

PASS and Community

We had the great honor of having Kevin Kline on, so we spent most of the episode grilling him and Mala about the history of the SQL Server community and PASS as an organization. Both of them have such a great deal of knowledge about the organization and broader community, so if there was ever a good episode for me to lose my voice, this is the one.

Shop Talk: 2023-02-13

The Recording

The Panelists

  • Kevin Feasel

Notes: Questions and Topics

The Disappointment Episode

Because there was nobody to stop me from spiraling, I started off the episode with some bad news:

  • We probably aren’t going to have a SQL Saturday Raleigh this year due to difficulty finding an appropriate venue. I had a bunch of places shoot us down or ghost us, so although I’m sure we could have found somewhere to host, we weren’t able to figure out where that place was in time.
  • I got the privilege of telling my employees that we were all being laid off as part of a reorganization plan.

Thoughts on Synapse

After that, I riffed for a while on a blog post by Eugene Meidinger covering the difficulty in learning Azure Synapse Analytics from someone without that classical warehousing or ETL experience. Earlier that day, Eugene, Carlos L. Chacon, and I interviewed someone (and I’m being a little cagey here just because the episode hasn’t come out yet so I don’t want to spoil too much) on this topic.

“Big Data” and Its Discontents

The final topic of the evening was a discussion of how “Big Data” platforms—the author’s experience is in BigQuery but I’d also include Hadoop and even things like the Azure Synapse Analytics dedicated SQL pool—have become less common over the past several years. I think the article makes a good number of points, particularly around the major increases in per-machine power we’ve seen over the past decade. There are a couple of parts where I think the author overplays his hand, but overall, the article is worth the read.

Shop Talk: 2023-01-30

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

Upcoming Events

The first topic of the night was a couple upcoming events the Shop Talk crew will be at. I’ll be at SQL Saturday Atlanta BI Edition on February 25th. Tracy will be in Wales for SQLbits in March and Mala will present remotely.

Laid Off? Andy Leonard Has Free Training for You

Andy Leonard has a generous offer for anyone who has been laid off recently: a full year of free access to his training catalog. Andy has a lot of great content and is a great person to learn from when it comes to data movement in SSIS or Azure Data Factory.

Implicit Conversions are Bad

Tracy authored a blog post recently on eliminating implicit conversions in Hibernate and JDBC. She wasn’t able to make the show but Mala and I talked about the topic and Solomon Rutzky reminded us that the most likely problem Tracy ran into involved collations and data type mismatches—with Windows collations, we wouldn’t see these issues.

Debugging T-SQL Code

Mala wanted us to talk about a recent Brent Ozar post on debugging T-SQL code. I agree with Brent that RAISERROR and table variables form a potent combination for error handling. I will, however, never pronounce it as “raise-roar.”

Code Commenting

We wrapped things up with a diversion around this Maelle Salmon post on code commenting, with an emphasis on R. I like the principles of it and it got me thinking about whether there are languages which are more or less comment-needy: in other words, are there some languages in which you absolutely need more comments and other languages in which you definitely don’t need more? As a first approximation, I went with math-heavy (and functional) programming languages as benefitting more from detailed comments, and I could see relatively more verbose languages like COBOL needing fewer explicit comments. I’m not sure this is actually correct, however; I’d have to think about it some more.

Shop Talk: 2023-01-16

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

TriPASS Booked through July

Because we talked about this during the last episode, here’s a quick update. We have booked all three groups (Advanced DBA, main meeting, and BI/Data Science) through July. TheĀ call for speakers is still up, however, and if you want to speak for our group, please submit one or more sessions..

Workplace “Red Flags”

A Kevin Kline tweet formed the basis of our first topic:

Mala and I shared some painful responses, though I cheated a bit and picked several situations in which I saw the red flag before taking the job.

“Big Data” Trends

We spent the rest of the episode taking a look at this Petr Nemeth article. We looked at and responded to each of Petr’s main trends. Some of them, I think, are reasonable; others have been a pipe dream for the past 15 years and I don’t foresee that changing.

Shop Talk: 2023-01-02

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Tracy Boggiano

Notes: Questions and Topics

TriPASS Is Looking for Speakers

The first topic of the night is that we are looking for speakers for the Advanced DBA and Business Intelligence / Data Science TriPASS meetings. These are (currently) remote-only, so all are welcome to submit sessions. The call for speakers is currently up and running.

SQL Saturday Raleigh Update

As a first step toward hosting SQL Saturday Raleigh in 2023, we started looking for a venue. The place which hosted us last time around is no longer doing weekend events and I’m currently 0 for 4 on locations. We have a few other irons in the fire and, assuming we can lock down a venue, will get to work on hosting SQL Saturday Raleigh. Our provisional date is April 15th but there’s no call for speakers or official announcement yet.

A New Demo Database

I showed off a demo database that Daniel Hutmacher put together, covering approximately 20 years of parking ticket data for Chicago, Illinois. We got some fun stories from Solomon (who I knew lived in Chicago for a while) on insta-tickets due to street cleaning and snow limitations.

Address and Business Name Validation

We had a chat question come in around normalizing addresses: that is, given some arbitrary string a user typed in, what is the “official” address? We recommended Melissa Data for this, as they handle files and have an API, as well as SSIS components. Other alternatives we kicked around were the Google Maps API and OpenStreetMap, both of which have APIs to support address lookup.

PyTorch Compromise

Our final topic of the night involved PyTorch, a popular deep learning library for Python. It seems that, sometime shortly after Christmas, someone pulled off a supply chain attack on PyTorch, creating a malicious package with the same name as an internal PyTorch package. This only affected people who installed the nightly build between December 25th and December 30th and the PyTorch website has cleanup instructions, as well as more details. The specific nature of the attack was particularly interesting, as the attackers put a lot of effort into staying hidden.