Show notes for the Triangle Area SQL Server User Group's Shop Talk program.
Author: Kevin Feasel
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.
The main topic for tonight started with a blog post Tracy wrote to store BlitzIndex results to a table. We expanded this to discuss a variety of other topics, like what else we might store in tables: wait stats, sp_whoisactive results, and other things which are transient. We also talked about cleanup strategies, the folly of not having a cleanup strategy, and long-term data retention. There was a bit of meandering and ranting here, as you’d expect.
Mike talked about some pain that some of his colleagues have been having with Azure Data Factory and data cleanup / obfuscation via the FHIR service. It doesn’t sound like there’s a great answer to the problems his colleague ran into, as they had similar issues with two other providers.
When contained databases are enabled, database users with the ALTER ANY USER permission, such as members of the db_owner and db_accessadmin database roles, can grant access to databases and by doing so, grant access to the instance of SQL Server. This means that control over access to the server is no longer limited to members of the sysadmin and securityadmin fixed server role, and logins with the server level CONTROL SERVER and ALTER ANY LOGIN permission. Before allowing contained databases, you should understand the risks associated with contained databases. For more information, see Security Best Practices with Contained Databases.
They interpreted this as granting rights to the partially contained database user to access other databases on the instance. Instead, what it’s saying is that any partially contained user must have the ability to connect to the instance in order to access the partially contained database. But they cannot connect to any other databases on the instance.
Thoughts on MySQL
Mala has been doing some work with MySQL lately. For the sake of being the bad cop, I made fun of MySQL, though that is in good fun—I like MySQL for what it is but it’s probably number 4 on my list of relational databases, well behind SQL Server, PostgreSQL, and Oracle. Anyhow, Mala is impressed with their diagram creation tool, which is way better than what SQL Server has built in.
Our first topic of the night was a quick discussion about SQL Server 2022, including some of the features forthcoming and how to try it out. If you haven’t tried it yet, the easiest approach at the moment would be to grab the 2022 CTP container. But if you want to try out the Windows version on a VM, that’s another good approach.
The next topic was around CVE-2022-29143, which affects all versions of SQL Server since 2014 (but probably not 2012?). As of the time of recording, there wasn’t much information available on this issue, as it was not an issue in the wild and sounds like the problem was discovered by Microsoft themselves or a customer and not a third-party security researcher. That means no cool websites, exciting names, fancy logos, or mascots.
Still, patch your servers.
SSISDB in Full Recovery Mode?
Our third topic came from an e-mail by friend of the show Mark Gordon, who ran into a SQL Server instance whose SSISDB was in Full recovery mode while pretty much every other database was in Simple. In this segment, we talked a bit about why having SSISDB in Full recovery mode may (or more likely, will not) matter. The short version: it’s probably not too important as long as you take regular database backups. But if you have it in Full recovery mode, that certainly doesn’t harm things.
Indexed Views in SQL Server
Our final topic came from Mike, who wasn’t able to make it to the show. He asked about materialized views, a concept in Oracle, and what the SQL Server equivalent is. The closest equivalent is indexed views, so I talked about them. One thing I just glanced upon but didn’t go into detail over was that indexed views are indeed different from Oracle (and PostgreSQL)’s concept of materialized views. SQL Server’s indexed views are truer to the concept of views-as-tables, in the sense that any update to the underlying tables also necessarily updates the indexed view. There is no concept of deferring updates or running them on demand. This means that you won’t get stale data from an indexed view but also means that they can be the cause of performance issues when the underlying tables are sufficiently busy.
Our first major topic of the night came from an Aaron Bertrand post covering expressive T-SQL queries. This led me to a discussion of language generations. This post does a good job of describing the generations though I disagree strongly with the idea that Perl, Python, Ruby, etc. are 4th generation languages—they are third generation languages. The most popular 4th gen language is either SQL or the Unix shell but it turns out that there were a few more popular languages on the list that I didn’t see during the show.
For a really quick synopsis:
The first generation of programming language is machine code.
The second generation is assembly, which acts as an abstraction of machine code.
The third generation is structured, imperative programming languages. C and its family are some of the big boys here but almost every general-purpose programming language fits in this space.
The fourth generation covers declarative languages: I explain to you what I need and you determine how to get it. I don’t tell the database engine how to get the data, just that I need data meeting certain criteria.
The fifth generation of programming languages were intended for knowledge systems: I explain some facts to you and you draw inferences from them. In practice, 5GL never really took off, though we do see use of it in academia and AI research.
Fun with Indexing
Mike had a couple of questions which ate up the rest of the show.
If I have no filters on a query and select an indexed column vs. not, is there a performance difference? In other words, if I have EmployeeID INT which has a non-clustered index and SomeOtherID INT which does not have a non-clustered index, is there a performance difference if I run SELECT EmployeeID FROM Table versus SELECT SomeOtherID FROM Table?
If an index has multiple columns, how do you best make use of the index in terms of writing a query?
Mike had a data problem. Specifically, he had “parent” and “child” records in the same table. He knew which was a parent and which a child and he also knew that the children always came immediately after their parent alphabetically. The question is, how could he associate child records with their parents?
One easy answer to this is a concept called Last Observation Carried Forward (LOCF). LOCF is possible…in Azure SQL Edge. It’s now available in SQL Server 2022 but when we recorded this, Build had not yet happened and therefore we didn’t know anything about SQL Server 2022.
Fortunately, for those of you who didn’t immediately jump and put a first CTP into production, there are still solutions. Itzik Ben-Gan, naturally, has a solution. Specifically, he has two solutions but one of them was way more complex than I wanted to describe on the air.
Saying Goodbye to User Groups
Mala made the point that quite a few user groups have gone by the wayside these past couple of years. As things (slowly) begin to re-open, we’re seeing some user groups return from dormancy—for example, TriNUG, our local .NET user group, has started back up and you should stop on by one of these days. But for every TriNUG, there seem to be several groups whose leaders have given up the ghost. We talk a bit about why that can be.
New Doesn’t Mean Better
Our final topic of the night turned out to be an extended ranty discussion based on a really good blog post by Andy Leonard. The gist of Andy’s post is that novelty is not, strictly speaking, a virtue. A few of the highlights of this include:
The benefits of old code
Chesterton’s Fence, aka why you shouldn’t make rash decisions without sufficient information
Ugly code is usually ugly for a reason and if you don’t understand that reason, your beautiful new code will get real ugly
On the other side, technological decay does happen and things become obsolete
Still, anybody pushing new technology is trying to cross the chasm and drag you along with it, whether it makes sense or not
Today’s episode of Shop Talk didn’t really have a theme other than “watch Kevin fight allergies and rant about various things.” We talked about Windows 11, IoT, data privacy, the joys of The Internet Archive, and more.
If you’re looking at going to PASS Summit in Seattle this year, use the discount code Azure175 to save $175 off the cost of registration.
Least Privilege and SQL Server
Mala started us off with a fun(?) question about whether anybody actually implements the principle of least privilege with SQL Server. We got into the discussion quite a bit but my short version is, the principle of least privilege is an aspiration. Solomon mentioned module signing as the answer. My counter-point is that yes, module signing is the right tool but least privilege is as much a people & process problem as it is a tooling problem. The tool allows you to get there if and only if you have the people and process in place to do so.
Bro, Do You Even GROUPING SETS?
After that, we talked about a topic from Mike: grouping options, including CUBE, ROLLUP, and GROUPING SETS. I like ROLLUP for hierarchical data, such as Country -> State -> City. I don’t much like CUBE and I’ve never found a really good non-trivial example for why I’d use it. GROUPING SETS, however, is a top-notch operator. Read on for more info here.
Hardware for Practicing SQL Server Operations
Mike also asked for some hardware recommendations for doing “real” query tuning work with SQL Server. We talked about physical hardware, cloud VMs, and simulating weak networks (a trick I learned from Kendra Little).
Remembering the Basics
Our last topic came from Tracy and is around how easy it is to forget the “basics” when troubleshooting, especially in new environments. We make a lot of assumptions around the state of a system and sometimes those assumptions come back to bite us. Tracy’s example was using a balanced power plan on a server running SQL Server, thereby throttling CPU. We talked about the importance of checklists on the air but I want to expound on my thoughts just a little bit more.
Checklists are critical for troubleshooting but so is the right mindset. That mindset starts with careful reading and paying attention to available signals. That means reading error messages and understanding the content, which can sometimes be enough right there. Error messages often don’t do a great job of explaining themselves so it probably isn’t enough. Knowing where to look for more information, especially in non-destructive ways, is also vital. Some of this is “in-built” in the sense that it’s attitude but it is attitude which you can train into people. The rest is knowledge that you gain over time, and that’s what gets converted into the checklist.
The first topic of the evening was an after-action report of DevIntersection, as well as some discussion around in-person TriPASS events. We talked about why there won’t be a SQL Saturday Raleigh in 2022…but there might be in 2023. Regardless of when we go in-person again, I’ll still have the recording equipment available.
Our first topic of the night was a review of Microsoft’s write-up of the LAPSUS$ group, including how the group infiltrates organizations, what they tend to do, and how they operate. It’s a really interesting dive with some solid recommendations on how to prevent attacks like this in the future.
Replication: Push or Pull?
A while back, Mark Gordon reached out and asked about replication models, specifically whether push or pull should be the default at organizations. My quick thoughts are as follows:
Push is a good default because it lets you centralize jobs and processes (though if you have a really large number of jobs, you might need a separate job server to manage all of it). This setup works best when machines are on the same domain and the same people own and operate all of the servers.
When those conditions don’t hold, pull may be a better option. For example, if the machines are on different domains and there is no trust relationship between the domains, you can give the subscribers a SQL authenticated account and let them retrieve data. This also works well when you don’t want the publisher to have access to remote subscriber networks and when the people operating subscribers are different than the publisher server operators.