Show notes for the Triangle Area SQL Server User Group's Shop Talk program.
Author: Kevin Feasel
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather's nice enough.
The majority of this episode is dedicated to building a safe sequence for when you absolutely need contiguous numbers for ID values. Identity and sequence can, by design, leave gaps. 99% of the time (or more), that’s not an issue. But in the rare case where numeric IDs are part of an audit process, I show a technique to eliminate those gaps, though at the cost of slowing down insert performance considerably.
The notebook is part of my upcoming post-con for Data Platform Summit, so I’m not releasing it right now. But I might release a version of it in the future.
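In the meantime, here's a minimal sketch of the classic counter-table pattern at the heart of the technique (table, procedure, and column names are mine, not the notebook's):

```sql
CREATE TABLE dbo.AuditCounter
(
    CounterName VARCHAR(50) NOT NULL PRIMARY KEY,
    CurrentValue BIGINT NOT NULL
);

INSERT INTO dbo.AuditCounter(CounterName, CurrentValue)
VALUES ('InvoiceID', 0);
GO
CREATE OR ALTER PROCEDURE dbo.GetNextInvoiceID
    @NextValue BIGINT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- The compound assignment increments and reads the counter in one atomic
    -- statement. The exclusive row lock holds until the caller's transaction
    -- commits, so a rollback also rolls back the increment: no gaps, but
    -- every insert serializes on this one row.
    UPDATE dbo.AuditCounter
    SET @NextValue = CurrentValue = CurrentValue + 1
    WHERE CounterName = 'InvoiceID';
END;
GO
```

Call the procedure inside the same transaction as the insert which uses the value; that single-row serialization is exactly where the insert-performance cost comes from.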
We had one announcement: TriPASS board elections are coming up this Thursday. They will run for the next three weeks. Because we had one nominee for each position, we’ll have an up-or-down vote on the entire slate.
Upcoming Events
Data Platform Summit is coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.
PASS Summit Thoughts
Tom and I talked about PASS Summit for the entire episode. We had some great conversations in chat as well.
After we finished recording, I saw Deb Melkin’s blog post summarizing PASS Virtual Summit, as well as Andy Levy’s. Check those out for additional thoughts on Summit.
TriPASS Plans in November
We will not have Shop Talk next Monday, November 23rd, as it is the Monday of Thanksgiving. We also are done with regular data science meetings for the year, so no regular meeting that week.
We had one announcement: TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing are eligible to run, and the election will run from November 19th through December 3rd, for terms running from February 23, 2021 through February 26, 2023. If you’re interested, e-mail us (shoptalk at tripass dot org will work) and I can provide more details as needed.
Conferences
PASS Summit is coming up, from November 10th through the 13th. Registration is still open and you can use the discount code LGDISCWY7 to save some cash. Tom also gave the full-court press on why you should sign up for PASS Virtual Summit this year.
Data Platform Summit is also coming up, from November 30th through December 8th. Registration is open and you can use the discount code TRIANGLESQL for a nice discount.
SSMS and ADS Joined Together
Our big topic for the evening was SQL Server Management Studio 18.7, which now forces installation of Azure Data Studio.
Tom, Mala, and I all share fairly similar opinions, so I’ll lay out mine just a little bit. I don’t hate the idea, but I do see two problems with it. First, this is a required installation. Microsoft’s saying that you can uninstall ADS afterward if you choose, but I’d much rather have the option to install it or not than have to uninstall it later. I use Azure Data Studio pretty regularly and I still don’t like the notion of forcing an installation when all you wanted was SQL Server Management Studio.
Second, I’m a bit curious about release cadence and bloat. SSMS releases less frequently than ADS, so if I subsequently install 18.7.2, will it try to install an old version of Azure Data Studio over my current version? I don’t know the answer to that and figure we’ll have to wait and see. But we do see from Glenn Berry that ADS is installed as a system installation rather than a user installation, so you might end up with two copies as a result—and a third copy if you’ve used the Insiders track. There’s nothing SSMS-specific about this, as you can install multiple copies all on your own, but I’m sure it’ll ruffle a few feathers.
Mala did bring up a good point that I’d like to mention here as well: SQL Server Management Studio is already a collection of installed applications—it brings along with it Profiler, Replication Monitor, Database Tuning Advisor (ugh), the Analysis Services deployment tool, and more. These are all separate applications which already ride along with an SSMS installation.
But that’s not to say I’m dismissive of the people who don’t want to install ADS, as well as the people who want to make ADS an optional installation rather than mandatory. I understand concerns about corporate compliance, potential security risks involved (you are, after all, installing Python and a very extensible IDE), etc. And given that you don’t need ADS to use SSMS, I’d rather they not require installation. Make it an option and I think a lot of the concern within the community simmers down.
This topic also led us down a rabbit hole of installing SSMS on servers. In real life, I don’t push too hard against the idea, because it’s nice to have the tools there in case you need direct server access. But I wanted to push the other side of the argument as hard as I could, and I think SSMS 18.7 makes that “Don’t install SSMS on servers ever” argument a bit easier because it increases the scope of what you need to worry about on a server. Or we just keep copies of 18.6 around for a few years…
TriPASS Plans in November
We will not have Shop Talk next week (November 9th), as this is the start of pre-cons for PASS Summit. We also will not have an Advanced DBA group meeting that week, so expect no TriPASS-specific content unless I decide to do something ad hoc.
We will have Shop Talk on the 16th and Tracy will talk about dbatools & dbachecks; register for that now to get a reminder or just show up when we go live on Twitch.
We will probably not have Shop Talk on November 23rd, as it is the Monday of Thanksgiving. We also are done with regular data science meetings for the year, so no regular meeting that week.
We had two announcements. First, TriPASS board elections are coming up in mid-November. We have three positions up for election this year: President, VP of Marketing, and Treasurer. All TriPASS members in good standing are eligible to run, and the election will run from November 19th through December 3rd, for terms running from February 23, 2021 through February 26, 2023. If you’re interested, e-mail us (shoptalk at tripass dot org will work) and I can provide more details as needed.
Second, Mala gave us an update on the Azure SQL Championship, which is now winding down.
We talked for a little while about SolarWinds acquiring SentryOne, and I’m interested to see where SolarWinds takes the product. SentryOne isn’t perfect by any means—and I kind of threatened to get into why I’m not a fan of its user experience (or the user experience of any other third-party SQL Server monitoring solution). Here are my notes on what would make for a good monitoring product:
The key question: is my system currently experiencing problems? How can I know? What does normal look like versus the current state?
If we are experiencing problems, what is the cause? Or what are the causes?
If I am not particularly familiar with this system but am knowledgeable in maintenance, what are the pain points I can most easily fix?
If I am not very knowledgeable with this platform, what should I look for?
Ultimately, a monitoring solution—like any other dashboard—should evoke an emotion: either anger (because something is broken and I need to fix it now) or happiness (because everything is going great and I can keep doing whatever it is I do). With most products, there are so many things going on that it’s hard to tell what you should feel, even if you are familiar with the environment and the tool.
Anders made the case that third-party tools are generally better than in-house ones, from the standpoint that third-party tools at least have support and (usually) documentation, whereas in-house solutions have nothing if the person who put them together leaves. I don’t completely agree with that notion, but you’ll have to watch the video for my argument against it.
QDS Toolbox
I am excited to be able to promote the QDS Toolbox, a solution to make Query Store better. You can also check out my blog post for my thoughts in detail.
One thing I found interesting is that Software Engineer, Software Developer, and Developer / Programmer Analyst have a fairly wide discrepancy in salary expectations: a Software Engineer at the median salary in the survey earns $123,250, a Software Developer $118,250, and a Developer / Programmer Analyst $109,500. That’s a $14K per year range for people doing ostensibly the same job, but I do have a conjecture regarding why they’re different. My conjecture is that you’re more likely to find people with the title Software Engineer at companies which build software products for customers, whereas you’d find Software Developers at companies which aren’t necessarily selling software to customers; those developers would be working more on back office or internal solutions. Meanwhile, Programmer Analyst is a classic term for employees in the government, where salaries tend to be lower (particularly at the state level). That’s a conjecture, mind, so it could very well be wrong.
On the data side, I was surprised to see the median DBA salary at $100,750; I would have expected DBA salaries to be at least as high as developer salaries. We do see Database Developer roles earning as much as Software Developers, so that’s about what I’d expect. Also surprising is that Data Scientist salaries seem to have dropped considerably, though I surmise the reason here is that there are a lot of newly-minted Data Scientists and people switching from Data Analyst job titles pushing those salaries down.
The salary tables include salaries in USD normalized across the United States, as well as weighting factors for particular cities; for example, Raleigh salaries are about 4% above the quoted norm. After those tables, there are similar tables for Canadians in CAD. What surprised me there is that Canadian salaries in CAD are about 75-85% of US salaries in USD, so there’s a huge pay difference. A DBA at the 50th percentile in the US has an expected salary of $100,750 USD in the survey, while a DBA at the 50th percentile in Canada has an expected salary of $84,000 CAD. Right now, the exchange rate is approximately 4 CAD = 3 USD, so $84,000 CAD * 0.75 USD/CAD = $63,000 USD.
Take all survey numbers with as many grains of salt as you need, but it’s an interesting way of getting a feel for the industry.
Our first topic for the night was a function available only in Azure SQL Edge: DATE_BUCKET(). I walked through a sample of some of the code I plan to show off during my PASS Summit pre-con, and I’m glad I did because I was not very happy with the delivery. The concept feels a little backwards to me, as “date bucket” sounds like it should be numeric: bucket 1, bucket 2, bucket 3. Instead, the function returns the start date of the bucket itself: 2020-09-01, 2020-09-15, 2020-09-29.
I certainly believe it’s useful and would like to see it in SQL Server and Azure SQL Database, but I’ll need to work on my examples to make sure they’re clearer.
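In the meantime, here’s a rough sketch of the behavior, with an explicit origin and dates chosen to line up with the buckets above:

```sql
-- DATE_BUCKET(datepart, number, date, origin) returns the start date of the
-- bucket containing the input date, not a bucket number.
DECLARE @origin DATETIME2(0) = '2020-09-01';

SELECT
    v.d,
    DATE_BUCKET(DAY, 14, v.d, @origin) AS BucketStart
FROM (VALUES
    (CAST('2020-09-05' AS DATETIME2(0))),
    (CAST('2020-09-20' AS DATETIME2(0))),
    (CAST('2020-10-01' AS DATETIME2(0)))
) AS v(d);
-- BucketStart comes back as 2020-09-01, 2020-09-15, and 2020-09-29, respectively.
```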
Thoughts on Graph Databases
The other key topic for the night was a discussion of graph databases. We recently recorded an episode of the SQL Data Partners podcast with Mala as the guest and the two of us had some detailed thoughts on the utility of graph databases. We took some time tonight to flesh out those thoughts a bit further.
The short version of it for me is that graph concepts are extremely useful; graph databases are not, unless your company’s key differentiator requires solving graph-style problems with large amounts of data in near real time. An example I like to give for this is logistics, in which you need to know where people and things are and where they need to go in the most efficient manner. If you’re developing software for logistics companies to use to solve this problem, then a graph database might be the best option. Check out the show (as well as the upcoming podcast episode, #208) for the rest of my thoughts on the topic.
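Worth noting as a middle ground: SQL Server itself has had basic graph table support since 2017, which can cover graph concepts without a dedicated graph database. A tiny sketch (the tables here are hypothetical):

```sql
CREATE TABLE dbo.Person
(
    PersonID INT NOT NULL PRIMARY KEY,
    PersonName NVARCHAR(100) NOT NULL
) AS NODE;

CREATE TABLE dbo.Knows AS EDGE;

-- Find everyone a given person knows directly, using the MATCH syntax.
SELECT p2.PersonName
FROM dbo.Person AS p1,
     dbo.Knows AS k,
     dbo.Person AS p2
WHERE MATCH(p1-(k)->p2)
    AND p1.PersonName = N'Mala';
```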
Mala’s Books & More Kiosk
We’ve upgraded Mala’s Book Corner to include a lot more. This time around, she has one blog post and three podcasts for us.
Our first topic for the night was code review tools, focusing mostly on the database world. Mala led us through some of the research she has done to find an alternative to Crucible and Fisheye, keying us in on several tools with an emphasis on tsqllint.
Xenographics
Over at Curated SQL, I posted about Xenographics, a website dedicated to…uncommon visuals. I enjoyed walking through several of them. This site includes some visuals I like as well as some which I can’t understand even after reviewing them.
Watch the video for the visuals we look at, but I wanted to take a moment and hit six characteristics I think make for a good visual. This list is neither necessary nor sufficient—a good visual need not be all of these at once, and I won’t claim that this is the authoritative list of rules for quality visuals. That said, here they are:
Intuitive — A visual should be easy for a person to understand despite not having much context. In some cases, you have the opportunity to provide additional context, be it in person or in a magazine. That lets you increase the complexity a bit, but some visuals are really difficult to understand and if you don’t have the luxury to provide additional context, it makes your viewer’s job difficult.
Compact — Given two visuals, the one which can put more information into a given space without losing fidelity or intuitiveness is preferable. This lets you save more screen real estate for additional visuals and text. There are certainly limits to this philosophy, so consider it a precept with diminishing marginal returns.
Concise — Remove details other than what helps tell the story. This fits in with compactness: if you have unnecessary visual elements, removing them lets you reclaim that space without losing any fidelity. Also, remove unnecessary coloration, changes in line thickness, and other things which don’t contribute to understanding the story. Please note that this doesn’t mean removing all color—just coloration which doesn’t make it easier for a person to understand what’s happening.
Consistent — By consistency, what I mean is that the meaning of elements on the visual does not change within a run or between runs. Granted, this is more relevant to dashboards than individual visuals, but think about a Reporting Services report which uses default colors for lines on a chart. If you refresh the page and the colors for different indicators change, it’s hard for a person to build that mental link to understand what’s happening.
Glanceable — Concise and consistent visuals tend to be more glanceable than their alternatives. Glanceable means that you are able to pick out key information without needing to stare at the visual. Ideally, a quick glance at a visual tells you enough of what you need to know, especially if you have seen the same visual in prior states.
Informative — This last consideration is critical but often goes overlooked. The data needs to be useful and pertinent to users, describing the situation at the appropriate grain: it includes all of the necessary detail for understanding while eschewing unnecessary detail.
Leslie sent in an e-mail (shoptalk at tripass dot org) asking for guidance or a syllabus for learning to become a database developer, so Mala and I tackled that. We broke it down into five categories and focused mostly on books. One book-buying tip I have: if you’re looking at older books, check out a website like ISBN.nu, as it aggregates across several book sellers to find the best price. I’ve used that site for about 15 years now. Most of the links I provide are from Amazon, but that’s almost never the cheapest price.
First up is T-SQL, as I think you should learn the language before trying to work through the rest of it. Carlos Chacon has a great book called From Zero to SQL in 20 Lessons, which was written for someone with no experience to get started. From there, I’d recommend Itzik Ben-Gan’s T-SQL Fundamentals or Kathi Kellenberger’s Beginning T-SQL. Mala brought up John Deardurff’s series on learning T-SQL as well as the SQL Quickstart Guide. Finally, once you’ve spent a couple of years building up those SQL skills, grab a copy of Grant Fritchey’s SQL Server 2017 Query Performance Tuning book, which is a massive tome full of great info. Or you can get the much smaller version from Red Gate.
After taking on T-SQL, learn about data modeling. The single best reference is the Handbook of Relational Database Design, released in 1989. The first half of the book is gold; the second half is only useful if you’re trying to implement this stuff on a late 1980s Oracle or Informix database… For something a bit more recent, check out Louis Davidson’s Pro SQL Server Relational Database Design. The 6th edition is coming out soon, but if you’re impatient, Louis and Jessica Moss teamed up on the 5th edition. I have an older edition, and I think Louis’s explanation of 4th normal form is the best I’ve ever read. Mala has a couple of recommendations as well: Information Modeling on Relational Databases and Database Design for Mere Mortals.
If you want to move from OLTP-style data modeling and into warehousing, Ralph Kimball’s Data Warehouse Toolkit is the book to get. The Kimball model is everywhere and this is the type of book you can come back to multiple times and learn more and more each time. If you’re interested in the Data Vault approach, check out Building a Scalable Data Warehouse with Data Vault 2.0. I haven’t read it, but people who are into Data Vault have recommended the book.
Of course, once you’ve learned a bit about warehousing, you might want to read up on ways to work with warehouses. If you’re looking at report-writing, Kathi Kellenberger’s Beginning SQL Server Reporting Services is a good start. From there, Paul Turley, et al’s Pro Microsoft SQL Server 2016 Reporting Services is the next step, but I’d probably only go that far if I were writing SSRS reports as a main part of the job.
Meanwhile, if you want to learn about Power BI, Marco Russo and Alberto Ferrari have you covered. They’ve released a number of books and also have plenty of videos available for free. I’d also recommend Phillip Seamark’s Beginning DAX with Power BI, though I don’t think it’s really a beginner book by any stretch of the imagination. Finally, Mala reminded me that I couldn’t finish a list of Power BI resources without mentioning Guy in a Cube.
Career Next Steps
During our coverage, @jpanagi1 asked us about career guidance:
my background – don’t have a compsci degree – engineering background – learned fundamentals of sql – recommended next steps? do i need more college or certs? what are employers looking for?
We spent some time talking through this specific case. To summarize, @jpanagi is currently working at a company which has teams working on interesting projects, and the question is how to move into that kind of position.
The simple form of our advice is, work with the team a bit on projects. If you have official projects, that’s easiest; if you can volunteer to help them out a bit, that can work too. If you have a reasonable manager, bring up that you’d like to do that kind of work in the medium term. A reasonable manager might not want to lose you, but there are several options here.
One is “in-house” work. Suppose that team is a report-writing and dashboarding team and they’re using some nice products in interesting ways. You might be able to build a dashboard specifically for your team with the same products, gaining experience while providing direct benefits to your team and your manager.
Another option is working part-time on projects with members of the report-writing and dashboarding team. It might not be directly for your team, but maybe a time-sharing of 80-20 or 90-10 could get you some experience on projects and try out something new without your manager losing a person.
If your manager isn’t reasonable, things get harder. In that case, you might need to volunteer on the sly and work a few extra hours with that team after you get your regular work done. I’ve known people who were trained up “after hours,” where they’d stay late at work and learn from someone in another department. Smart managers at good companies formalize this kind of training, but don’t let that limit you otherwise.
Also, a bit of freelancing doesn’t hurt. Again, this is off-hours, but if you want to try out a different path, see if there’s something tangentially related to your team. Learn about the products and try them out. Then, if you come up with something interesting, you can bring it up with your manager. If you ultimately get locked out from doing this thing, well, at least you have a project for your resume, and you can decide whether the position you’re at is the right one.
Shortly before the episode began, Azure had a major outage in its Active Directory service. We used this as a springboard for discussion around what happens when you rely on a cloud service for your company’s infrastructure, but that cloud service goes down. The easiest answer is multi-cloud and cloud/on-premises hybrid scenarios, but those can get expensive and also limit you from some of the benefits of Platform-as-a-Service offerings.
SOS_SCHEDULER_YIELD
Our next segment was around a server with issues that I was asked to take a look at. The two biggest waits on the server were CXPACKET and SOS_SCHEDULER_YIELD. The knee-jerk reaction when you see SOS_SCHEDULER_YIELD is to say that it’s a CPU problem, but Paul Randal explains why that is not necessarily the case.
From there, you want to see if the signal wait time is high—the rule of thumb I’ve seen is that if signal waits are more than about 20% of total wait time for SOS_SCHEDULER_YIELD, then there isn’t enough CPU to go around. In my scenario, signal waits were approximately 99.999% of total waits. So the knee-jerk reaction is to blame CPU, the more thoughtful reaction is to investigate further, and the result of the investigation was to blame CPU.
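If you want to check that ratio yourself, it’s a quick query against the wait stats DMV:

```sql
-- Signal wait time is time spent sitting on the runnable queue waiting for
-- CPU after the awaited resource was already available, so a high ratio
-- points at CPU pressure.
SELECT
    wait_type,
    wait_time_ms,
    signal_wait_time_ms,
    100.0 * signal_wait_time_ms / NULLIF(wait_time_ms, 0) AS signal_wait_pct
FROM sys.dm_os_wait_stats
WHERE wait_type = N'SOS_SCHEDULER_YIELD';
```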
If that’s the case, here are a few of the options potentially available:
Tune some queries. Look for high-CPU queries, especially ones which run frequently. There are a few of those in my scenario, so every second of CPU you shave off per query call is a second which can go to another query and lessen the waits. (See the query sketch after this list.)
Drop MAXDOP and the cost threshold for parallelism if they are set too high. I’m not talking about the knee-jerk reaction of “Parallelism is high, so drop MAXDOP to 1,” but if you have 32 cores and your MAXDOP is 32, that probably doesn’t make much sense for an OLTP system. As a warning, though, changing MAXDOP without reviewing poorly-performing queries can lead to bigger problems: those queries are already struggling even with a high MAXDOP, so reducing it without tuning them first can make them slower still.
Increase CPU cores if you can. If your queries are looking good, it may just be that you have too much load on your server and adding more hardware can be the solution.
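For the query-tuning option, a common starting point is the plan cache sorted by total CPU; this is a sketch, so adjust the TOP and add filters to taste:

```sql
-- Top cached statements by cumulative CPU time. Frequent, moderately
-- expensive queries often matter more than rare monsters.
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```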
SSIS and ADO.NET: A Rant from Someone Not Me
Mala spent far too long working through an issue with multi-subnet failover on Availability Groups and SSIS. We use OLEDB drivers very heavily when working with SSIS, as we appreciate work finishing at a reasonable pace. But the OLEDB drivers installed with SQL Server 2017 don’t include support for multi-subnet failover—the ADO.NET drivers, however, do. So Mala spent a lot of time trying to switch over and cataloged some of the issues she ran into.
Microsoft Ignite is this week. Tom and I spent some time talking about the sessions and themes in the data space. Tom mentioned some sessions he’ll attend. I’m going to check out the Azure SQL Edge session.
Kerberos with Tom
Tom is working through some Kerberos double-hop issues, so we talked a bit about SPNs. Tom’s question was whether we needed to register the SPN for just the SQL Server service account, or also for SSRS, SSIS, etc. My recollection was that you only need to register it for the SQL Server service account, and it looks like that’s correct.
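One easy way to verify after registering an SPN is to ask SQL Server which authentication scheme your connection actually negotiated:

```sql
-- KERBEROS means the SPN resolved and the double hop has a chance of working;
-- NTLM after an SPN change usually means the SPN is missing or wrong.
SELECT session_id, auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```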
Innovate and Modernize Apps with Data and AI
I wanted to call out a Microsoft Cloud Workshop that I helped build: Innovate and Modernize Apps with Data and AI. It hits on a lot of Azure technologies, so if you’re interested in how some of these technologies can fit together in an event-driven architecture, check it out.
Conference Thoughts
We wrapped up tonight’s show with a quick discussion of virtual conferences.
Tom has spent a lot of time recently with Azure Lab Services, so our first segment is an interview where we cover some of the details of Azure Labs, how it worked for Tom, how much it ended up costing him, and some of the things you can do with the product.
Raymond brought us into the main topic of the evening: practices and planning for refactoring databases.
We go into it in some detail, but I wanted to share my notes as there were a couple of topics I did not have a chance to cover. What follows is a disjointed set of concepts which hopefully resemble advice in the aggregate.
First, refactoring is not performance tuning. Refactoring is a behavior-preserving code change intended to make maintenance easier; by its nature, it is not intended to make things faster or change what the code does. I gave a few classic examples of refactoring in application code, as well as a couple of examples of database refactoring. In code, think about breaking a segment of code out into its own method or function, or separating two things into different classes. This doesn’t necessarily improve application performance, but it does make the code easier for developers to understand and maintain. On the database side, normalizing a set of tables can be an example of refactoring—you’re not necessarily changing the performance of the application, but you are making it easier to manage the queries which insert, update, and retrieve information. Another example of database refactoring is changing deprecated TEXT columns to VARCHAR(MAX).
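That last one is about as mechanical as database refactoring gets. A sketch against a hypothetical table:

```sql
-- TEXT is deprecated and awkward to query; VARCHAR(MAX) holds the same data
-- and supports the full set of string functions. Match the column's existing
-- nullability when you do this for real.
ALTER TABLE dbo.CustomerNote
    ALTER COLUMN NoteText VARCHAR(MAX) NULL;
```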
One key consideration is, do you have the downtime to perform refactoring? In other words, do you need the application running while your changes happen? If so, check out my talk on approaching zero-downtime code deployments.
I think the most important mindset to take when refactoring is to ask why you need to refactor. Make sure you have a good reason and it’s not just “It’s not my code, so it’s bad.” Understand that code has a story and the developer who wrote it knows more about the story than you do—even if you wrote the code, the you of two years ago was closer to the problem then than you are today. Chesterton’s Fence is a good mindset to have: before you change code, understand what it is doing and why it is there. If you can’t explain why the code looks the way that it does, be very hesitant about changing it.
Keep your database changes in line with your application deployment. Working with app devs reduces the risk that you’ll drop procedures in use or make changes which break the code. Also, try to use stored procedures as interfaces whenever possible; they allow you to make changes much more cleanly than writing queries directly in the application or via an ORM.
Another thing to consider is whether to include a rollback process, or if forward is the only direction. This will depend on your (and your company’s) risk appetite. Only moving forward is a lot easier to develop against, but it does require foresight and is much higher risk. Rollback scripts force you to think about how to back out of a problem, but because they are (hopefully) almost never used, it’s a lot of extra development time. This will depend on the company and scenario—if you’re working on a real-time financial system, then you don’t have much of a choice. But if you’re working on a personal website or an in-house product with a small number of users, it may make more sense just to keep moving forward.
Whenever possible, have tests. A thorough set of database integration tests is great. But if all you have is a hand-created workbench of queries to run, that’s still better than nothing and can be sufficient. The goal is to have enough information to determine if there are issues with the rollforward phase as quickly as possible, ideally before users experience them.
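As an example of the hand-created workbench, here’s a pattern I find handy for refactorings which move data around (table and view names are hypothetical):

```sql
-- The refactored structure should return exactly what the old one did.
-- Zero rows back means nothing was lost; run the EXCEPT in both directions
-- to confirm nothing was invented, either.
SELECT OrderID, CustomerID, OrderDate, TotalDue
FROM dbo.Orders_PreRefactor
EXCEPT
SELECT OrderID, CustomerID, OrderDate, TotalDue
FROM dbo.vw_Orders;
```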
Write your scripts to be re-runnable and put them through change management. Store the scripts in source control. There are a couple philosophies on what to store in source control: either the current state of the system (creation scripts for tables, stored procedures, etc.) or the change process (the set of database modification scripts run over time). I like a compromise approach of having the current state plus some history of changes.
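Re-runnable just means guarding each change so that running the script a second time is harmless. A typical guard, with hypothetical object names:

```sql
-- Safe to run any number of times: the column is only added if absent.
IF NOT EXISTS
(
    SELECT 1
    FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.Invoice')
        AND name = N'InvoiceNote'
)
BEGIN
    ALTER TABLE dbo.Invoice ADD InvoiceNote VARCHAR(MAX) NULL;
END;
```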
Finally, early refactoring can be painful, especially because you’re usually trying to untangle years of growth. Choose some easy targets to build momentum. Those tend to be tangential projects or small parts of a major database. Start building some of the muscle memory around rollback scripts (if needed), change control, and testing; that will make subsequent work that much easier.