Shop Talk: 2025-10-06

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

Reading Parquet Files in SQL Server 2022

Our first question came from a reader. The gist of the question is as follows:

I’ve got a vendor who sends us Parquet files. A quick Google search suggests that SQL Server’s PolyBase feature allows SQL Server 2022 to read directly from Parquet files without having to do ELT.

Fortunately, I happened to put together a video on how to read local files in SQL Server with PolyBase and MinIO.

A Checklist for Database Reliability Engineers

Our next major topic was reviewing this blog post from Amy Abel on what Database Reliability Engineers should know. The main takeaway I have from it is that 90% of the contents are simply things that good DBAs should know and do. There are some exceptions to this, particularly around knowledge of things like Terraform or other Infrastructure as Code techniques, but the majority of this I would lump into “Stuff a good DBA should know how to do.”

The Monty Hall Problem

After that, we covered one of my favorite paradoxical concepts in statistics: the Monty Hall problem. I’ve also created a video on the topic that you’re welcome to watch if you want to dig into the simulation further. This concept is one that really drives home the importance of updating knowledge based on new information, and of applying that new information correctly.
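If you would like to check the result yourself, here is a minimal simulation sketch in Python. It is not the code from the video; the function names (`play_round`, `win_rate`), the trial count, and the seed are my own assumptions for illustration. It assumes the standard rules: three doors, one car, and a host who knows where the car is and always opens a goat door that the contestant did not pick.

```python
import random


def play_round(rng, switch):
    """Play one round and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door
    # that the contestant did not pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=42):
    """Estimate the win probability for a strategy over many rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(rng, switch) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

Staying should win roughly one time in three and switching roughly two times in three. The host's door is new information, but it is not random information: the host can never reveal the car, which is why the two remaining doors are not a 50/50 proposition.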

DBSCAN in SQL Server

Our final major topic was a fun article about implementing the DBSCAN clustering algorithm in SQL Server. I was legitimately surprised that the performance was fine for ~200 records. Granted, 200 records isn’t a huge number or anything, but I figured it would be so suboptimal compared to R or Python code that we’d see it hit its limits by that point.
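For a sense of what the algorithm does, here is a rough pure-Python sketch of DBSCAN. This is not the article's T-SQL implementation, just a textbook-style version under my own assumptions: Euclidean distance, a brute-force neighbor search, and a point counting as its own neighbor. The names `dbscan`, `eps`, `min_pts`, and `NOISE` are illustrative.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force region query: O(n) per call, O(n^2) overall.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

With a brute-force neighbor search like this one, 200 points means about 40,000 distance checks per pass, which is a small workload for any engine. The quadratic growth is what would start to hurt at larger row counts.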
