Welcome to Shop Talk, a regular Q&A-style broadcast hosted by the Triangle Area SQL Server Users Group (TriPASS).
After each episode airs, you will see the show notes as well as an embedded link below.
Mala kicked us off with an announcement from the WIT group: a mental health day on Friday, May 7th.
Our big item was Grafana moving from Apache 2.0 licensing to AGPL. This primed a long discussion on a variety of topics, starting with open source licensing but going way off the rails. What could possibly go wrong?
Tonight’s episode was a fun topic where I essentially gave a survey but without collecting responses. Because that’s how I roll. Here were the questions (including a couple I didn’t ask), along with my original answers.
@rajeshparuchuri asks, “How do you delete old backup files? Or what is your preferred way of deleting old backup files?”
We had some discussion about different methods for deletion. One key method is xp_cmdshell to delete the files. This wasn't available to Rajesh, so we had to move on.
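When xp_cmdshell isn't an option, a common alternative is a small cleanup script run on a schedule outside of SQL Server (Task Scheduler, cron, or a SQL Agent job step). Here is a minimal sketch in Python; the backup directory, the .bak extension, and the 30-day retention window are all assumptions you would adjust for your environment.

```python
import time
from pathlib import Path

def delete_old_backups(backup_dir: str, days: int = 30, pattern: str = "*.bak") -> list:
    """Delete backup files under backup_dir older than `days` days.

    Returns the list of file paths that were removed.
    """
    cutoff = time.time() - days * 86400  # the retention boundary, in seconds since the epoch
    removed = []
    for f in sorted(Path(backup_dir).glob(pattern)):
        # st_mtime is the last-modified time; older than the cutoff means stale
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(str(f))
    return removed
```

Point it at the backup share and schedule it to run daily; anything older than the retention window gets swept up.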
@rednelo7625 asked: “I have a question about storing Word docs in a document DB as VARBINARY(MAX). I’m limiting my users to an upload size of 3MB. I’m storing them this way so that I can encrypt them and back them up. Do you see any problem with this practice?”
The short answer is yes, there are some risks with this. The biggest problem is that if the database becomes really popular, it can bloat quickly and make activities like database backups and CHECKDB runs a lot slower. And with blob data like this, there's not much we can do about it, especially with something like DOCX files, which are already zipped (spoiler: .docx is basically a zipped series of XML files), so you don't get any real compression benefits from using backup compression.
As for the reasoning, it’s to encrypt and back up files. My response was to recommend checking into FileTable functionality that was introduced in SQL Server 2012. FileTable gives you a nice compromise between a desire to keep file metadata in SQL Server while keeping the binary data out. If you take a full backup of your database, it does back up the FileTable data (unless you choose to exclude it), so you do get backup protection that way without needing to store the files directly in the database.
As far as encryption goes, I’d recommend enabling disk-based encryption on the storage where those files are (assuming you use FileTable). I made the guess that rednelo was using Transparent Data Encryption for this, and encrypting the disks provides a similar level of protection—it protects data at rest. For further protection, it would be possible to write an app which accepts those Word documents and saves them in a password-protected archive, or encrypts the binary data and saves it as an encrypted file, decrypting it later for access.
Eventually, we got pulled off onto a sidetrack around IoT, especially in agriculture. This is an area of interest to me because the average person has no clue just how sophisticated modern agriculture is. Anders and @ENHANCE_QC have some knowledge in the area (family members in farming) and added some nice context too.
We covered one of the key problems with IoT: you may own the hardware, but if it has to talk to a remote server to survive, your expensive product can quickly become a brick. This also ties to right-to-repair laws, which I fully support. My philosophy on this is that if I bought the product, I should be able to alter the product, including repairs and modifications.
Twice in six months, Azure Active Directory has gone down. Both times, it happened right around an episode of Shop Talk. I think they’re just trying to give us content.
Microsoft has provided a summary of the issue. This was pretty big—bigger than I understood at the time of the episode, as this one did affect services more than the issue six months prior.
Fred e-mailed and asked a few questions about modern data warehousing. I gave fairly substantial answers, so check out the video for those details. Here’s the brief summary.
I want to understand how to create the hub and spoke data warehousing architecture.
Hub and spoke has a couple of meanings. The original term came from “departmental data marts,” which is a style popularized by Bill Inmon. I’m not the biggest fan of this approach. Instead, I like the bus architecture by Ralph Kimball. I think this approach makes it much easier for business analysts to get the data they need without needing to know much (if any) SQL.
By the way, because he’s a really smart guy, I want to link to Simon Whiteley’s take on whether the Kimball model fits in the modern data warehouse. My short version is that Kimball is still quite relevant, in part because its intended purpose is ease of use rather than performance, strictly speaking.
How will my schema look?
Ellis Butterfield has a presentation from 2017 covering hub-and-spoke in a Massively Parallel Processing (MPP) system, such as Azure Synapse Analytics dedicated SQL pools (née Azure SQL Data Warehouse). The idea is that dedicated SQL pools have concurrency limits: only a certain number of queries may run at a time. If you're hitting those limits and queries are waiting, you might want to offload processing onto different machines.
In short, Ellis’s example uses Azure SQL Database elastic pools as the spokes, creating external tables which point back to the dedicated SQL pool, allowing you to isolate workloads on specific pools.
Another option is to use the serverless SQL pool, creating external tables against pre-cleaned data in the lake. This might help offload queries from a dedicated SQL pool.
A third option is to use Azure SQL Databases as the spokes, using either replication or ETL to migrate data from the dedicated SQL pool into SQL databases. Essentially, you’d treat these as data marts: line-of-business specific databases which serve queries for business analysts in those departments. This fits really well with the Kimball approach but does add more in terms of complexity.
A fourth option is to use Power BI as the spoke, migrating data from the dedicated SQL pool into a Power BI data model and using that for reporting. This assumes that your relevant data set can fit into Power BI's Tabular model or that you're using Azure Analysis Services to fit it in.
A fifth option would be to use result set caching in the dedicated SQL pool, especially if you have infrequently-changing data and many occurrences of the same queries. The idea here is that if you can make each individual query faster, you have less concurrency, as each gets in and out sooner and there’s less chance a lot of them will be hanging around at the same time.
What do I need to consider when I am transforming the data?
Here are a few things to consider:
If I am going to use Azure services, is there a cheaper substitute for Azure Synapse Analytics?
If your database is large enough—50-100 terabytes or more—then no, you’re probably not going to find a cheaper alternative. But if it’s small, on the order of 1 TB or less, go with Azure SQL Database. Azure Synapse Analytics dedicated SQL pools don’t really make much sense for a database under 1 TB.
Here are a few tips for saving cash when using Azure Synapse Analytics:
Thanks for the great question, Fred! If you have any questions you want us to handle on Shop Talk, hit us up at shoptalk at tripass dot org.
Mala and I spent some time talking about Microsoft Ignite and the data sessions. This ran Tuesday through Thursday, so you can check out highlights, etc.
Look, I’ve made a lot of mistakes in my life. Defending COBOL (as someone who doesn’t even like COBOL!) probably wasn’t one of them. But we spent a lot of time on technology, innovation, and my recognition (admonition?) that there’s nothing new under the sun.
As a bonus: COBOL-On-Wheelchair, 2013’s answer to Ruby on Rails.
Our first topic was around PASS content being available through Redgate. They’ve made PASS Pro content available on the Redgate Hub, and accessing that previously-gated content is now free.
In addition, they have taken control of the PASStv YouTube account. They’re uploading the PASS Summit 2019 Learning Pathway videos as well as PASS Virtual Summit 2020 videos.
We also spent some time on the relative merits of Redgate directly advertising on the PASStv YouTube account. Here’s my stance (after Solomon made good points in chat): if Redgate intends to release PASS content as a good deed, then yes, slap Redgate logos on things, make it available via Redgate’s website, and advertise away. They bought the rights to this content and PASS had the authority to sell them the rights, so no complaints there.
If Redgate intends to foster a new community organization, the better approach is to keep it as vendor-neutral as possible. I don’t think a SQL Server community really succeeds if it’s tied closely into one commercial enterprise—even if that enterprise were Microsoft, but especially not a third-party vendor. It plays havoc with the ability to be a 501(c)(3) (something PASS never was due to its early closeness to Microsoft) and probably drives away other vendors who supported PASS in the past but don’t want to give too much support to their competitor.
Ultimately, that’s Redgate’s choice to make, but I don’t think there’s a way to thread the needle.
Our next topic was around Always Encrypted, sent in by @rednelo7625. Here’s a summary of the question:
I have a table with about 5000 rows called [Case] with a column named “Title”. It was encrypted with an encryption type of Randomized. After a year, I needed to change the encryption type to Deterministic because of a new query which includes “Title” in a GROUP BY clause. You can only do that if the type is Deterministic.
So far this week 142 cases have been added to the table. And I’ve seen 1 user still get the same 500 error when inserting a new record through the website:
Operand type clash: nvarchar(10) encrypted with (encryption_type = 'RANDOMIZED', encryption_algorithm_name = 'AEAD_AES_256_CBC_HMAC_SHA_256', column_encryption_key_name = 'CEK_Auto1', column_encryption_key_database_name = 'DBNAME') is incompatible with nvarchar(max) encrypted with (encryption_type = 'DETERMINISTIC', encryption_algorithm_name = 'AEAD_AES_256_CBC_HMAC_SHA_256', column_encryption_key_name = 'CEK_Auto1', column_encryption_key_database_name = 'DBNAME').
Also, now when I rebuild the index on the table I get the following DBCC error:
Msg 206, Level 16, State 2, Line 29
Operand type clash: nvarchar(max) encrypted with (encryption_type = 'DETERMINISTIC', encryption_algorithm_name = 'AEAD_AES_256_CBC_HMAC_SHA_256', column_encryption_key_name = 'DBNAME', column_encryption_key_database_name = 'DBNAME') is incompatible with varchar
So, it appears that somewhere the column is still associated as Randomized or not at all…
There are two error messages of note here. The first is that nvarchar(10) clashes with nvarchar(max). The second is that nvarchar(max) clashes with varchar.
For the first message, my best guess is that some stored procedure or .NET calling code has a parameter defined with a length of 10 and is trying to update the column. When working with Always Encrypted, the data lengths must match up, so if you define the column as nvarchar(max), all calls from .NET must have a data length of -1 (the equivalent of max).
For the second message, that seems kind of weird and is around index rebuilding. My recommendation there was to rebuild the table. Essentially, take a downtime and:
SELECT * INTO dbo.BackupTableName FROM dbo.CurrentTableName to move all data over to a new table.
I’m not sure if switching from randomized to deterministic encryption in Always Encrypted is officially supported, and there isn’t much information available online about it. Given rednelo’s table size (about 5500 rows), I’d rather just re-create the table than try to troubleshoot it and possibly see other errors pop up in the future.
If you are new to Power BI, the Dashboard in a Day is a good way of understanding some of what is possible with the product. Microsoft has an actively updated list of events, so check one out in your time zone. If you’re already familiar with Power BI, you won’t gain much from it, though—it’s really intended to be an introductory overview of the product.
Welcome to TriPASS’s newest board member, Mike Chrestensen. Mike is taking over for Tom Norman, who has moved down to Texas. Mike will serve out the remainder of Tom’s term and will be up for election at the end of 2021.
Denise e-mailed us with a question:
My question is about the 70-761 Microsoft Exam and AZ-900 Exam (I do have books). Do you know anybody in our group that passed that and would provide tips for further study & preparation for the exam? Recommendations for testing tools?
Mike and I both recommended Itzik Ben-Gan’s exam prep book for 761 (and its prior iteration, 461). Frankly, even if you never intend to take the exam, this is a book worth owning because Itzik does a great job of teaching and includes a surprising amount of depth in an exam study book.
As far as AZ-900 goes, this is an introductory exam. My recommendation for that one is to check out the exam skills outline and also try the Microsoft Learn items linked to the exam. You’re being tested on concepts, basic services, and higher-level generalities more than deep knowledge on any given service, and so I wouldn’t stress too hard on this one.
We also covered Redgate’s acquisition of PASS assets. At this point, there isn’t too much information available, but we do know a couple of things:
If you want to share your thoughts on what the future of the SQL Server community should look like, there’s a forum on SQL Server Central for this.
Mala and I shared our guarded optimism on this, though it will be interesting seeing if Redgate is able to get a consortium of vendors interested in sponsoring a Redgate-led conference.
Our key time sink for the evening was a Brent Ozar blog post which Mala and I both marked on our show agenda. The gist is that Brent tweeted out a question: if you could get rid of one feature in SQL Server, what would it be? For the show, we looked at each of the features that people submitted and gave our takes. Forthwith:
LIKE '%Something%' searches all over your code.
Summing it up, I agree with some of these, particularly auto-shrink, priority boost, the IMAGE data type, and lightweight pooling. I can be swayed on a couple more. But most of these are useful tools that their detractors have seen misused. It’s not the trigger’s fault that some developer created cross-server, multi-step triggers which call multi-statement UDFs, and that we update it by hitting the ASMX web service hosted on our SQL Server.
The first topic of discussion was around the proper scope of resource groups. Here’s my thought on it:
My general philosophy is that a single resource group contains the whole set of things you’d destroy when it’s time to decommission something. From that perspective, it’s easy to think of one application per resource group: as it’s time to tear down an application, you delete the resource group and you’re done. Even with a hundred applications, I don’t think I’d be too upset about one resource group per set of application services if they’re independent applications.
That said, there are services which go across several applications. For example, a data warehouse or data lake might be used in several applications as a data source. Those core services could also get their own resource groups, but that comes with the downside risk that if all of the dependent users of the service get destroyed, the service might still be around, at which point it would be wasting money. So there’d have to be some sort of architectural mapping across application/resource group boundaries to keep track of dependencies. The list of resource groups itself would no longer be sufficient to provide all of this information.
Another thought would be, how independent are these applications? If they’re effectively independent microservices which communicate via expected protocols (like HTTP or gRPC) rather than shared infrastructure, then one resource group per application is quite viable. But if some of the applications are tightly coupled together, it may make more sense to put them all into the same resource group, especially if there is considerable overlap. Suppose, for example, that five applications all use the same SQL Server VM to store data, that there are cross-database queries galore, and that each app depends on the others for survival: if you got rid of the other four, any individual one would not be particularly useful. In that case, it could make more sense to put that cluster of applications into a single resource group.
A related thought is, how much resource overlap is there? Do all of the apps use the same Azure App Services? Are you sharing one Azure Container Registry for everything? Is there one single Azure Synapse Analytics workspace across all of the applications? Or maybe one Azure Function app? The more interconnected services exist, the less well “one resource group per application” works in practice. That said, there will probably be some overlap automatically: they may all use the same Azure DevOps implementation and store all of their code in the same GitHub account (though in different repositories). The fact that there is some overlap isn’t necessarily contradictory; it’s the level of overlap which counts.
@lisanke chimed in with a really good question:
does anyone have or work for an organization that bought Solarwinds and caught a sunburst? Or does anyone buy a single source security mechanism for their Org? Do you now worry that u’ve put all your security eggs into one basket? np if this is too big for an add-on discussion tonight… maybe something to ask for a future talk. Or more general, supply-chain bugs.
This led to a lengthy discussion around risk.
@vijayb5 has a good question as well:
what approach you would think will guide us to determine root cause to find failover of alwayson availability group i have used Failover Detection Utility most of the time it never give proper reason other than showing the possible error around that time. i am interested to find what caused the error.
This is one that I tried to answer, but we’re bouncing it off of Tracy for a better answer in the next episode.
Chris Wood hits us with a question, too:
My upgrade problem. I have a SQL2019 CU4 server running on Windows 2019 that had the server renamed. The appropriate drop servername and add servername was run but the registry wasn’t changed to indicate the current servername
When I ran the CU8 upgrade it throw a 574 error running msdb110_upgrade.sql against master and master wasn’t recoverable. We had disk snapshots and recovered to another server. I had heard of using Trace Flag 902 to stop the master being upgraded. I would like to try again sometime but I am now unsure what to do. This is the first major failure in about 20 years of SQL upgrades
In this case, it sounds like there’s an issue with running an upgrade script for SQL Agent. I was working from the assumption that it’s related to the server name change, though Solomon cautions that this isn’t necessarily the case. In any event, Mala’s recommendation on this is to contact Microsoft support. Mine is to try to convince the customer to install on a new server rather than upgrading in-place.
I ended the episode with an issue that a person e-mailed me about. It shows just how difficult it can be to get PolyBase to perform well when running SQL Server to SQL Server. The short version is that a particular query can work fine when returning a result set, but as soon as we try to save the result as a variable, performance tanks due to a predicate not being pushed down to the remote server. It’s a weird one and I’ll have a proper blog post about it someday.
Our major announcement is that Tom Norman is moving to Texas. I appreciate everything that Tom has done for us at TriPASS and we all wish him well.
As a result, we do have an opening for a board member. If you are a TriPASS member in good standing and interested in being the VP of Membership, please reach out to me.
Our first topic of the night was the results of the TriPASS survey. We had enough respondents to get some interesting results. Here’s the quick version:
Thank you to everybody who took the time to fill out the survey.
Melissa Coates has an updated document for us: the Power BI End-to-End Diagram. Melissa has kept this up to date for several years, and it’s crazy to see just how much Power BI has changed over time. We joke about having a release every other day, but it feels true…
Brent Ozar has released the results of the 2021 Data Professional Salary Survey. I haven’t taken the time to look at the data yet, but I have for prior years. It’s a skewed data set—primarily people who know of Brent (or are in that extended network), so it’s going to focus mostly on SQL Server. But if you understand that this isn’t the entire data platform professional population, it’s a very interesting data set for analysis.
Our major announcement is that the TriPASS 2020 membership survey is open. If you haven’t filled it out, it will be open for the next couple of weeks.
We spent the first segment of the show talking about the post-PASS future. We included a link to Data Saturdays, which is one step toward replacing SQL Saturdays.
I’d also recommend checking out Call for Data Speakers, a service hosted by Daniel Hutmacher to give data platform speakers and conferences a central location, even outside of the SQL Saturday/Data Saturdays paradigm.
We spent a good amount of time walking through the differing use cases of Azure Databricks and Azure Synapse Analytics. Microsoft has an architecture guide walking through the use cases. One point of difference I have: I don’t think HDInsight is worth using.
Ivana Pejeva has a great article on the topic. One thing about the article, though, is that it was written a few months back, so Azure Synapse Analytics has changed a bit since.
We also talked about competitors to Snowflake; in the Azure cloud, the closest is Azure Synapse Analytics dedicated SQL pools.
@rnicrosoft asked a question around analytics when you have 800 million or so key-value pairs. The transactional side solution is typically something like Cosmos DB, where you’re reading and writing single records at a time. But what happens when you need to perform analysis on the data?
One solution is to use something like Azure Synapse Link to pull that data from Cosmos DB into Azure Synapse Analytics and organize the data in a classic fact-dimension model.
But another solution would be to store the data in a time series database like InfluxDB and visualize it in Grafana. Tracy and I have implemented monitoring with InfluxDB, Telegraf, and Grafana, and you can use this for “normal” analytics as well.