Show notes for the Triangle Area SQL Server User Group's Shop Talk program.
Author: Kevin Feasel
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather's nice enough.
Our first topic of the night hit on SQLBits, which is coming up March 8-12. Registration is open right now and there are in-person and virtual options available. Note that this is a paid conference.
Becoming a Data Analyst
Our main topic of the night came in from a chat question:
I want to become a data analyst. I’m currently working in a profession not related to tech. Is this a pipe dream?
This is a great question and we spend a lot of time on (hopefully!) practical tips on how to get into the world of data analysis. Mike, Mala, and I come at this from different angles and one of the things I reiterate a few times is that data work is a career people move into, not start in. Quite often, somebody starts in a different field, begins to do data analysis within that area, and eventually decides to specialize in it when they find out how fun it is.
The main topic of this episode was around code reviews. We talked about some of the pain points when doing code reviews for database changes, how they differ from application code reviews, and also ended up going down some cryptographical rabbit holes.
Board elections for the Triangle Area SQL Server Users Group have wrapped up and we are happy to have Tracy Boggiano and Mike Chrestensen back for their two-year terms. Thank you to everybody who voted.
Technical Debt
Our big topic was around technical debt, including a description of what it is, how you can minimize it, and even how much it matters. I decided to play Devil’s Advocate a bit, as developers tend to be on the “fix all the technical debt!” train, and somebody’s got to make it interesting…
Mike brought up Martin Fowler’s tech debt quadrant, which gives one helpful way of distinguishing classes of technical debt. I fumbled around with some thoughts and will need to write a fuller blog post to get my ideas out. Jeff Moden also made an appearance and talked about his situation, which sounds to me like a really difficult one to be in (to put it nicely).
Board elections for the Triangle Area SQL Server Users Group are currently running. Get in touch with me via Meetup if you are eligible to vote (i.e., are a member of the Meetup) and have a vote up or down on the slate of candidates.
Reflections on the Year
This episode was an early “reflect on the last year and think about the next” type of episode, as we didn’t have any planned content and I was traveling. Thank you to everybody who is a part of TriPASS; we appreciate your existence.
Board elections for the Triangle Area SQL Server Users Group are coming up soon. We have two candidates running for re-election: Mike for the Vice President of Membership and Tracy for Vice President of Programs. If you are a TriPASS member in good standing, you can also run for one of the leadership positions; get in touch with me via Meetup if interested.
Conference Thoughts
All four of us shared our thoughts on the recent PASS Summit. Overall, I think it was a successful first attempt, but it did have some rough edges. From there, I spoke a bit about the Live! 360 conference last week, where it was good to see some people for the first time in a couple of years.
NULL, Sargability, and Performance
The back half of the episode mostly dealt with performance tuning considerations, particularly when dealing with NULL. I started off hinting that I’d disagree with a good bit, but in the end, most of the advice was reasonable.
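To illustrate the sort of NULL-handling advice we discussed, here is a hedged sketch (the table and column names are hypothetical, not from the episode) of how wrapping a column in ISNULL makes a predicate non-sargable, along with an equivalent rewrite the optimizer can satisfy with an index seek:

```sql
-- Hypothetical table and column names for illustration only.

-- Non-sargable: the function around ShippedDate hides the column
-- from the optimizer, forcing a scan even with a covering index.
SELECT OrderID
FROM dbo.Orders
WHERE ISNULL(ShippedDate, '19000101') = '19000101';

-- Sargable rewrite with the same meaning: the optimizer can seek
-- on an index over ShippedDate for each branch of the OR.
SELECT OrderID
FROM dbo.Orders
WHERE ShippedDate = '19000101'
   OR ShippedDate IS NULL;
```

The two queries return the same rows; the difference is purely in how the predicate is expressed, which is the crux of most sargability advice.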
We started out with some thoughts about the first day of PASS Summit, which is now almost over because I’m bad at uploading these videos in a timely fashion. I moderated Steph Locke’s pre-con on data science, which was really good. Tracy saw Rob Sewell’s pre-con on ARM templates and Bicep, which was also really good.
Special Guest Interrogation
We invited Chris Wood on to ask a whole bunch of questions about PolyBase. This is probably a good reason to point at PolyBase Revealed, the world’s best book on PolyBase.
We dove into several topics, including parts of how PolyBase works, how you can analyze problems, and where you can find details on why performance is slow.
Registration is open for PASS Data Community Summit. This includes two days of pre-cons as well as three days of conference sessions. The pre-cons are priced at $200 apiece and the main conference is free, so check it out.
Query Editors: SSMS vs Azure Data Studio vs Whatever
Our first topic of the night was based on two separate questions. First, Mike (who sadly could not make it) asked what tools people use for query editing. He uses a tool called Aqua Data Studio, which looks fine but I’d guess doesn’t have a huge uptake in the SQL Server community. Almost everybody uses SQL Server Management Studio regularly, and I was the only person who also uses Azure Data Studio regularly, though this was no scientific survey.
We talked a bit about how VS Code has overtaken Visual Studio, yet the SQL Server analog has had so much trouble getting adoption. I threw out a couple of conjectures on the topic. First, there’s a lot of functionality in SSMS that isn’t in Azure Data Studio, and sometimes there’s stuff in ADS which is garbage compared to SSMS. Execution plans are the best example of this: as they are in Azure Data Studio today, execution plans are significantly worse than what you can get in SSMS, much less a tool like SolarWinds Plan Explorer. Considering that Azure Data Studio is intended for developers, that’s a massive pain point.
The other conjecture I threw out is that DBAs and data platform developers tend to be much more reticent to change than application developers, including when it comes to tooling. Considering that we still have people yearning for the good ol’ days of SQL Server Enterprise Manager and Query Analyzer, there’s an uphill battle here for the Azure Data Studio team.
Also of interest is that we have a bunch of people who use Notepad++ as their editor for pretty much everything.
We had a couple questions around stored procedures that we covered briefly. First, how do you convince developers and managers that stored procedures are useful (at least in SQL Server; for other database platforms, this advice may vary)? I’ve found that the most useful argument is to note that stored procedures act as interfaces between the code and database, allowing you to do a lot of interesting work without needing to re-compile and re-deploy code. Developers understand and (typically) like interfaces, so that helps it click for them. It also allows you to isolate the SQL code away from other languages, which lets multiple people work on separate problems with less risk of merge conflicts. Solomon also brought up important points like the ability for enhanced security via module signing.
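As a minimal sketch of the "procedure as interface" argument (all object names here are hypothetical, not from the show), the application only ever calls the procedure by name; the query behind it can be tuned, hinted, or rewritten without touching application code:

```sql
-- Hypothetical example: the procedure is the contract between app and
-- database. Changing the query body (new join, index hint, refactor)
-- requires no application re-compile or re-deploy.
CREATE OR ALTER PROCEDURE dbo.GetCustomerOrders
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderID, o.OrderDate, o.TotalDue
    FROM dbo.Orders o
    WHERE o.CustomerID = @CustomerID
    ORDER BY o.OrderDate DESC;
END;
```

The application's call site (`EXEC dbo.GetCustomerOrders @CustomerID = 42;`) stays stable while the SQL evolves independently, which is exactly the property developers like about interfaces.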
The other question concerned unit testing stored procedures. Some people are big on that, like Mala, who uses tSQLt a lot. Solomon prefers DbFit as an integration test framework. I like the concept of tSQLt a lot, but the big issue is that the things I most want to test are things which are really difficult to test (like triggers, schema modifications, complex dynamic SQL operations, etc.) because they have dependencies on actual objects.
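For readers who haven't seen tSQLt, here is a rough sketch of what a test looks like (the table and procedure names are hypothetical; the tSQLt calls are the framework's standard ones):

```sql
-- Sketch of a tSQLt test with hypothetical object names.
EXEC tSQLt.NewTestClass 'OrderTests';
GO
CREATE OR ALTER PROCEDURE OrderTests.[test orders filter to one customer]
AS
BEGIN
    -- FakeTable swaps in an empty, constraint-free copy of the table,
    -- so the test controls exactly what data exists.
    EXEC tSQLt.FakeTable 'dbo.Orders';
    INSERT INTO dbo.Orders (OrderID, CustomerID) VALUES (1, 42), (2, 99);

    SELECT OrderID INTO #Actual
    FROM dbo.Orders
    WHERE CustomerID = 42;

    SELECT OrderID INTO #Expected FROM (VALUES (1)) AS v (OrderID);
    EXEC tSQLt.AssertEqualsTable '#Expected', '#Actual';
END;
GO
EXEC tSQLt.Run 'OrderTests';
```

This pattern works well for set-based logic; the pain I mentioned comes when the thing under test (a trigger, schema change, or dynamic SQL) depends on real objects that FakeTable can't cleanly substitute.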
ORMs and Stored Procedures
Our first topic of the night related to Object-Relational Mappers (ORMs) and specifically, this post from Erik Darling about when to write a stored procedure to perform some activity versus trying to do it strictly from the ORM. Erik has some interesting thoughts on the topic that are worth your time.
At this point, I agree with something Anders mentioned in chat: ORMs are fine for simple CRUD operations: if I need something which inserts a row into a table and don’t have to worry about things like high concurrency, likelihood of failure on insertion, or multi-table insertions, then the ORM is fine. As the operation becomes more complex, however, ORMs tend to fall apart and that’s where it’s time to have a specialist working on the issue.
Speaking of specialists, I spent a bit of time criticizing full-stack generalists who don’t know the first thing about databases. I don’t expect a generalist to know a deep amount about all topics, but data is important enough that you should take it upon yourself to know enough not to make horrible mistakes.
I’m strongly in favor of right to repair, which boils down to a simple concept: make the schematics available, preferably to the general public but at least to independent repair shops. People end up throwing away so many things because some component broke—sometimes, it could be a component as simple as a capacitor on a circuit board. Fixing this may be a $50-100 job, but without detailed repair documents available, people are severely limited in their ability to make the economical choice, leaving them to purchase something anew.
Definitely check out this segment, as there was a lot of great feedback and discussion from chat. I do want to reiterate one thing that I mentioned along the way: the importance of right to repair is not that it’s a way to make people retain things longer, but rather that it provides more opportunity for people to make the best decisions based on their circumstances. One person may choose to repair an item, another may choose to have a repair shop fix it, and a third person may choose to scrap the item and buy a new one. All three choices are the right choice, but in a world without right to repair, we’re limited in our ability to make them.
Presentation Recording Tips
The rest of the show was dedicated to audio and presentation quality tips from all of us. The jumping-off point was Anthony Nocentino’s post on his recording setup, which led me to pester him a bit and then buy the same stuff he has (though no amount of audio quality will give me Anthony’s dulcet tones!). Solomon also asked a very similar question right before the show started, so we spent a lot of time talking about audio, using OBS Studio, video editing products like Camtasia, and a detailed discussion on different levels of quality and roughly how much they cost. The most important takeaway is that you can create good presentations without spending a lot of money on professional-level gear. That stuff is for people who make a living (or a serious part of their total income) creating recorded content. Otherwise, you can sound great with a $50-100 microphone.
Next up, Mala couldn’t make it but she did suggest we cover this Daniel Hutmacher post. It’s a clever use of temporal tables, especially if you don’t have a crazy busy system. If your system does get crazy busy (millions of rows updated per day, for example), I’d prefer an events architecture built around modifications getting their own records and summarizing results periodically. That way, you don’t run the risk of index fragmentation issues causing slow queries.
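For context on the feature itself, here is a minimal system-versioned temporal table sketch (all names are hypothetical; this is the standard SQL Server 2016+ syntax, not Daniel's specific technique):

```sql
-- Minimal temporal table: SQL Server maintains the history table
-- automatically on every UPDATE and DELETE.
CREATE TABLE dbo.Price
(
    ProductID INT NOT NULL PRIMARY KEY,
    ListPrice DECIMAL(10, 2) NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PriceHistory));

-- Point-in-time query against current rows plus history:
SELECT ProductID, ListPrice
FROM dbo.Price FOR SYSTEM_TIME AS OF '2021-06-01';
```

On a very busy table, that automatically-maintained history is exactly where the churn (and the index fragmentation risk I mentioned) accumulates, which is why I'd lean toward an explicit events design at high volume.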
PASS Data Community Summit Pre-Cons
Our last major topic of the night was a discussion of PASS Data Community Summit and especially Tracy’s pre-con on Azure SQL fundamentals. If you haven’t already signed up for the Summit, the main event is free and pre-cons are quite well-priced at $200 apiece.