Show notes for the Triangle Area SQL Server User Group's Shop Talk program.
Author: Kevin Feasel
Kevin Feasel is a Microsoft Data Platform MVP and CTO at Envizage, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and author of PolyBase Revealed (https://www.apress.com/us/book/9781484254608). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.
Our big item was Grafana moving from Apache 2.0 licensing to AGPL. This primed a long discussion on a variety of topics, starting with open source licensing but going way off the rails. What could possibly go wrong?
Tonight’s episode was a fun topic where I essentially gave a survey but without collecting responses. Because that’s how I roll. Here were the questions (including a couple I didn’t ask), along with my original answers.
Best Named SQL feature: Hekaton was, at least until marketing got in the way and made it In-Memory OLTP.
Worst Named SQL feature: The timestamp data type. As a runner-up, Azure Data Studio.
SQL Feature Most Likely To Have Been Named By Marketing: Always On Availability Groups. Or maybe AlwaysOn. I think it might have been Always-On at one point as well, but don’t recall.
Most Accurately Named SQL Feature: I want jokingly to say “Priority boost” here. On the plus side, most features are named clearly enough, even if sometimes they take away nice names and give them marketing-heavy names.
Least Accurately Named SQL Feature: The timestamp data type. As a runner-up, Azure Data Studio.
SQL Feature That Should Throw A Hard 24 Error If You Try to Use It In A New Project (But Continue To Work For All Your Existing Crap): Non-schemabound table-valued functions
SQL Feature That Just Needs A Little Love to Be Great: If this was 2010, I would have said Service Broker. Right now, PolyBase.
SQL Feature That Can Never Be Great Even With Barry White Levels of Love: CLR strict security.
Best SQL Feature: Metadata is also described in terms of SQL.
Suckiest SQL Feature: Stretch DB is too easy. CLR strict security could be on here, but I’ll pick something different: SSIS scale-out. It’s a half-hearted attempt to say that SSIS scales.
Surprisingly Useful SQL Feature: Batch mode processing. To clarify, the surprising part is just how often batch mode processing can speed things up.
Surprisingly Useless SQL Feature: Temporal tables. I think they tried to solve two problems at once and ended up providing a halfway-viable solution to each, but not a complete solution to either. A bit of that Barry White love could fix this.
SQL Feature I’ll Probably Never Use But I Like That It Exists: Query Store plan forcing for cursors. It’s easy to dunk on cursors and performance, but I appreciate that they were able to figure this out.
@rajeshparuchuri asks, “How do you delete old backup files? Or what is your preferred way of deleting old backup files?”
We had some discussions about different methods for deleting, but there are several key methods:
Use xp_cmdshell to delete the files. This wasn’t available to Rajesh, so we had to move on.
Write a PowerShell script to delete the files and call that script in the backup job. Essentially, know where the backups are stored and write a one-liner to get child items (with a .trn or .bak extension), filter to get just those older than your retention period, and delete the objects.
Use a maintenance plan. I include this in the list for the sake of completeness, not because it’s a great idea.
If you’re writing backups to a cloud store like AWS S3 or Azure Blob Storage, you can set a retention plan on the bucket/container and automatically delete files older than a certain age.
Similarly, many newer SANs offer the ability to set a retention policy on a share.
The Downside of Storing VARBINARY in a Table
@rednelo7625 asked: “I have a question about storing Word docs in a document DB as VARBINARY(MAX). I’m limiting my users to an upload size of 3MB. I’m storing them this way so that I can encrypt them and back them up. Do you see any problem with this practice?”
The short answer is yes, there are some risks with this. The biggest problem is that if the database becomes really popular, it can bloat up quickly and make activities like database backups, CHECKDB runs, etc. run a lot slower. And with blob data like this, there’s not much we can do about it—especially something like DOCX files which are already zipped (spoiler: .docx is basically a zipped series of XML files), so you don’t get any real compression benefits from using COMPRESS().
As for the reasoning, it’s to encrypt and back up files. My response was to recommend checking into FileTable functionality that was introduced in SQL Server 2012. FileTable gives you a nice compromise between a desire to keep file metadata in SQL Server while keeping the binary data out. If you take a full backup of your database, it does back up the FileTable data (unless you choose to exclude it), so you do get backup protection that way without needing to store the files directly in the database.
As far as encryption goes, I’d recommend enabling disk-based encryption on the storage where those files are (assuming you use FileTable). I made the guess that rednelo was using Transparent Data Encryption for this, and encrypting the disks provides a similar level of protection—it protects data at rest. For further protection, it would be possible to write an app which accepts those Word documents and saves them in a password-protected archive, or encrypts the binary data and saves it as an encrypted file, decrypting it later for access.
IoT Everywhere
Eventually, we got pulled off onto a sidetrack around IoT, especially in agriculture. This is an area of interest to me because the average person has no clue just how sophisticated modern agriculture is. Anders and @ENHANCE_QC have some knowledge in the area (family members in farming) and added some nice context too.
We covered one of the key problems with IoT: you may own the hardware, but if it has to talk to a remote server to survive, your expensive product can quickly become a brick. This also ties to right-to-repair laws, which I fully support. My philosophy on this is that if I bought the product, I should be able to alter the product, including repairs and modifications.
Twice in six months, Azure Active Directory has gone down. Both times, it happened right around an episode of Shop Talk. I think they’re just trying to give us content.
Microsoft has provided a summary of the issue. This was pretty big—bigger than I understood at the time of the episode, as this one did affect services more than the issue six months prior.
Modern Data Warehousing
Fred e-mailed and asked a few questions about modern data warehousing. I gave fairly substantial answers, so check out the video for those details. Here’s the brief summary.
I want to understand how to create the hub and spoke data warehousing architecture.
Hub and spoke has a couple of meanings. The original term came from “departmental data marts,” which is a style popularized by Bill Inmon. I’m not the biggest fan of this approach. Instead, I like the bus architecture by Ralph Kimball. I think this approach makes it much easier for business analysts to get the data they need without needing to know much (if any) SQL.
Ellis Butterfield has a presentation from 2017 covering hub-and-spoke in a Massive Parallel Processing system, such as Azure Synapse Analytics dedicated SQL pools (nee Azure SQL Data Warehouse). The idea is that dedicated SQL pools have concurrency limits, such that only a certain number of queries may run at a time. If you’re hitting those limits and queries are waiting, you might want to offload processing onto different machines.
In short, Ellis’s example uses Azure SQL Database elastic pools as the spokes, creating external tables which point back to the dedicated SQL pool, allowing you to isolate workloads on specific pools.
Another option is to use the serverless SQL pool, creating external tables against pre-cleaned data in the lake. This might help offload queries from a dedicated SQL pool.
A third option is to use Azure SQL Databases as the spokes, using either replication or ETL to migrate data from the dedicated SQL pool into SQL databases. Essentially, you’d treat these as data marts: line-of-business specific databases which serve queries for business analysts in those departments. This fits really well with the Kimball approach but does add more in terms of complexity.
A fourth option is to use Power BI as the spoke, migrating data from the dedicated SQL pool into a Power BI data model and use that for reporting. This assumes that your relevant data set can fit into Power BI’s Tabular model or that you’re using Azure Analysis Services to fit it in.
A fifth option would be to use result set caching in the dedicated SQL pool, especially if you have infrequently-changing data and many occurrences of the same queries. The idea here is that if you can make each individual query faster, you have less concurrency, as each gets in and out sooner and there’s less chance a lot of them will be hanging around at the same time.
What do I need to consider when I am transforming the data?
Here are a few things to consider:
Explore raw flat files using Azure Synapse Studio. This will give you an opportunity to see what the incoming data looks like and get a quick determination on data quality and shape.
Mapping Data Flows in Azure Data Factory can transform data stored in your data lake. These are great for relatively clean data.
You can also use Apache Spark and Spark clusters to wrangle nasty data, particularly if you need to do things like parsing regular expressions or fixing file delimiters.
Store your transformed data in Parquet format. It’s a great format in general and works extremely well with fact-style data, which has a lot of columnar repetition.
Generally, you’ll have three layers of data in your data lake. You’ll sometimes hear them referred to as bronze, silver, and gold. Occasionally you’ll hear raw, refined, and ready.
For loading data into clustered columnstore indexes, stage your data in heap tables or temp tables before running those transformations.
If I am going to use Azure services, is there a cheaper substitute for Azure Synapse Analytics?
If your database is large enough—50-100 terabytes or more—then no, you’re probably not going to find a cheaper alternative. But if it’s small, on the order of 1 TB or less, go with Azure SQL Database. Azure Synapse Analytics dedicated SQL pools don’t really make much sense for a database under 1 TB.
Here are a few tips for saving cash when using Azure Synapse Analytics:
Keep your dedicated SQL pools as small as possible. This component is less expensive than Azure SQL Data Warehouse was, but it is still costly.
Use serverless queries for one-off stuff rather than building it into the dedicated pool.
If you do a good job of laying out your data lake files and folders, serverless queries can be really inexpensive. It’s $5 per terabyte of data scanned, but even if you have 500+ TB of data stored, you might only need to scan a few gigabytes to get your answer.
Use Power BI Premium and store segments of the data model there. If you use that for reporting and analysis, you can scale down the dedicated SQL pool. And if you’re using Power BI Premium Per User pricing, that’s $20 per user per month. Or $5K a month if you use the Premium SKU.
Pause dedicated SQL pools if there’s a time frame you’re allowed to do that. For example, if your analysts are only in the office from 6 AM until 6 PM, you may be able to shut off the dedicated SQL pool for 10-12 hours a day.
If you can’t turn it off, that’s fine. Reserved instance pricing can cut down on the cost of dedicated SQL pools considerably. A 3-year reservation can save 65% or so off the pay-as-you-go price. If you do reserve, I’d recommend reserving a bit below where you normally are, as then you have the opportunity to scale back some if you get major query gains and you can negotiate up more easily than down.
Keep an eye on Spark pools and make sure they run only when needed. Those can also get expensive. Spark pools have auto-shutdown capabilities so if somebody forgets to turn one off, it can shut down automatically.
Thanks for the great question, Fred! If you have any questions you want us to handle on Shop Talk, hit us up at shoptalk at tripass dot org.
Mala and I spent some time talking about Microsoft Ignite and the data sessions. This ran Tuesday through Thursday, so you can check out highlights, etc.
Did Kevin Defend COBOL?
Look, I’ve made a lot of mistakes in my life. Defending COBOL (as someone who doesn’t even like COBOL!) probably wasn’t one of them. But we spent a lot of time on technology, innovation, and my recognition (admonition?) that there’s nothing new under the sun.
Mala and Kathi Kellenberger are hosting Data Platform WIT Day on March 5th, and Tracy will be one of the speakers. Registration is free via Meetup.
PASS Content via Redgate
Our first topic was around PASS content being available through Redgate. They’ve made PASS Pro content available on the Redgate Hub, and accessing that previously-gated content is now free.
In addition, they have taken control of the PASStv YouTube account. They’re uploading the PASS Summit 2019 Learning Pathway videos as well as PASS Virtual Summit 2020 videos.
We also spent some time on the relative merits of Redgate directly advertising on the PASStv YouTube account. Here’s my stance (after Solomon made good points in chat): if Redgate intends to release PASS content as a good deed, then yes, slap Redgate logos on things, make it available via Redgate’s website, and advertise away. They bought the rights to this content and PASS had the authority to sell them the rights, so no complaints there.
If Redgate intends to foster a new community organization, the better approach is to keep it as vendor-neutral as possible. I don’t think a SQL Server community really succeeds if it’s tied closely into one commercial enterprise—even if that enterprise were Microsoft, but especially not a third-party vendor. It plays havoc with the ability to be a 501(c)(3) (something PASS never was due to its early closeness to Microsoft) and probably drives away other vendors who supported PASS in the past but don’t want to give too much support to their competitor.
Ultimately, that’s Redgate’s choice to make, but I don’t think there’s a way to thread the needle.
Always Encrypted Issues
Our next topic was around Always Encrypted, sent in by @rednelo7625. Here’s a summary of the question:
I have a table with about 5000 rows called [Case] with a column named “Title”. It was encrypted with an encryption type of Randomized. After a year, I needed to change the encryption type to Deterministic because of a new query which includes “Title” in group a by clause. You can only do that if the type is Deterministic.
[…]
So far this week 142 cases have been added to the table. And I’ve seen 1 user still get the same 500 error when inserting a new record through the website:
Operand type clash: nvarchar(10) encrypted with (encryption_type = ‘RANDOMIZED’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘CEK_Auto1’, column_encryption_key_database_name = ‘DBNAME’) is incompatible with nvarchar(max) encrypted with (encryption_type = ‘DETERMINISTIC’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘CEK_Auto1’, column_encryption_key_database_name = ‘DBNAME’).
Also, now when I rebuild the index on the table I get the following DBCC error:
Msg 206, Level 16, State 2, Line 29
Operand type clash: nvarchar(max) encrypted with (encryption_type = ‘DETERMINISTIC’, encryption_algorithm_name = ‘AEAD_AES_256_CBC_HMAC_SHA_256’, column_encryption_key_name = ‘DBNAME’, column_encryption_key_database_name = ‘DBNAME’) is incompatible with varchar
So, it appears that somewhere the column is still associated as Randomized or not at all…
There are two error messages of note here. The first is that nvarchar(10) clashes with nvarchar(max). The second is that nvarchar(max) clashes with varchar.
For the first message, my best guess is that some stored procedure or .NET calling code has a parameter set to 10 bytes and is trying to update the column. When working with Always Encrypted, the data lengths must match up, so if you define the column as nvarchar(max), all calls from .NET must have a data length of -1 (equivalent to max).
For the second message, that seems kind of weird and is around index rebuilding. My recommendation there was to rebuild the table. Essentially, take a downtime and:
Change all of the encrypted columns back to plaintext
SELECT * INTO dbo.BackupTableName FROM dbo.CurrentTableName to move all data over to a new table.
Drop foreign key constraints or stored procedures which reference the table. Drop the table.
Re-create the table with plaintext columns.
Re-load the data from dbo.BackupTableName.
Encrypt using the combination of deterministic and randomized encryption as desired.
Re-create the stored procedures which reference the table. Also re-create any foreign key constraints as needed.
Make sure any .NET code is using correct data lengths.
I’m not sure if switching from deterministic to randomized encryption in Always Encrypted is officially supported but there isn’t much information available online about it. Given rednelo’s table size (about 5500 rows), I’d rather just re-create the thing than try to troubleshoot it and possibly see other errors pop up in the future.
Power BI Dashboard in a Day
If you are new to Power BI, the Dashboard in a Day is a good way of understanding some of what is possible with the product. Microsoft has an actively updated list of events, so check one out in your time zone. If you’re already familiar with Power BI, you won’t gain much from it, though—it’s really intended to be an introductory overview of the product.
Welcome to TriPASS’s newest board member, Mike Chrestensen. Mike is taking over for Tom Norman, who has moved down to Texas. Mike will serve out the remainder of Tom’s term and will be up for election at the end of 2021.
Tips for Exam Study
Denise e-mailed us with a question:
My question is about the 70-761 Microsoft Exam and Azure-900 Exam (I do have books),do you know anybody in our group that passed that and would provide tips for further study & preparation for the Exam. Recommendations for testing tools?
Mike and I both recommended Itzik Ben-Gan’s exam prep book for 761 (and its prior iteration, 461). Frankly, even if you never intend to take the exam, this is a book worth owning because Itzik does a great job of teaching and includes a surprising amount of depth in an exam study book.
As far as AZ-900 goes, this is an introductory exam. My recommendation for that one is to check out the exam skills outline and also try the Microsoft Learn items linked to the exam. You’re being tested on concepts, basic services, and higher-level generalities more than deep knowledge on any given service, and so I wouldn’t stress too hard on this one.
Redgate Acquires PASS Assets
We also covered Redgate’s acquisition of PASS assets. At this point, there isn’t too much information available, but we do know a couple of things:
Redgate does not get access to any user or attendee data that PASS previously owned. If you want to keep in the loop, you’ll have to sign up anew.
Redgate promises a PASS Summit-like conference, as well as the return of SQL Saturdays. As of right now, the SQL Saturday URL doesn’t direct to a server, but I imagine they’re working on building out a new site. PASS did not give them any code as part of the acquisition, so they’re building anew. Which, frankly, is a good idea when you consider what the SQL Saturday system ran on…
Redgate did obtain recordings from various PASS virtual groups and events, and they are working to bring that back online.
If you want to share your thoughts on what the future of the SQL Server community should look like, there’s a forum on SQL Server Central for this.
Mala and I shared our guarded optimism on this, though it will be interesting seeing if Redgate is able to get a consortium of vendors interested in sponsoring a Redgate-led conference.
Removing Features from SQL Server
Our key time sink for the evening was a Brent Ozar blog post which Mala and I both marked on our show agenda. The gist is that Brent tweeted out a question: if you could get rid of one feature in SQL Server, what would it be? For the show, we looked at each of the features that people submitted and gave our takes. Forthwith:
Cursors. Nope, keep them. Cursors can solve certain problems that you can’t do set-based. Prior to SQL Server 2012, cursors were the best solution for running totals (assuming you didn’t want to bank on accidental behavior around clustered index sorting). I’ve also used cursors to solve problems which don’t have pleasant set-based solutions. Then there’s the value in using cursors for administrative work. Yeah, you can say to use Powershell or .NET or bash or some other language to do the work, but if I’m in T-SQL and don’t want to build out a bunch of extra code in other languages, this is a good route. Also, for all of the people who said cursors, nobody wanted to get rid of WHILE. This is despite WHILE typically being much slower than properly-tuned cursors. I get that there are people who misuse cursors, and I get that set-based solutions are generally much better. But that doesn’t mean it makes sense to drop the functionality.
MERGE. This is one where I’d rather they fix it than dump it. There are too many bugs around MERGE for me to recommend people actually use, but it’d be so nice if it did work as expected.
Triggers. Again, no. Those are useful for good reasons. They can be misused, but there’s functionality that I can’t easily replicate without triggers. Or I can, but then other people want to get rid of them as well…
VARBINARY(MAX) or IMAGE data types. No to the first, yes to the second. The IMAGE data type is already deprecated; don’t use it anymore. As far as VARBINARY(MAX) goes, it’s quite useful. I can store compressed data, store ML models, and store things which I need access to in SQL Server without having to go elsewhere. Yeah, I get the pain of storing website images in a database—don’t do that. But that doesn’t mean getting rid of the data type is good idea.
GEOGRAPHY and GEOMETRY data types. Those are quite useful for spatial data, and if I had a dollar for every time somebody put longitude in the latitude column or vice versa, I’d have several dollars. Also, what are you storing longitude and latitude as? Float? Decimal? VARCHAR? It’s also a lot more pleasant to perform geographical distance calculations with these data types than it is if you’re just storing floats.
UNION and DISTINCT. This was a person who’s been burned by slower-than-necessary code. Otherwise, these constructs are so useful that it’s a non-starter.
WITH(NOLOCK) and instead force the use of the term READ UNCOMMITTED. I can get behind this, though I shudder at the hundreds of millions of lines of code which would break as soon as that happened…including, I’m sure, Microsoft code.
Auto-shrink. Sign me up for this one. I don’t think there’s a single good use for it, and would be happy to see it disappear entirely from the product. If you need to shrink a database, that should be an uncommon occurrence. Uncommon enough that you shouldn’t need to automate it.
Auto-close. I do have a good use for auto-close: development instances with large numbers of infrequently-accessed databases. Yeah, that’s a pretty specific use case, but if you’re in a low-memory situation or have a huge number of databases which rarely get used (and don’t mind the spinup cost when you do use them), auto-close is a viable solution.
Lightweight pooling. My recollection is that this was useful a couple decades ago, but has been a bad idea since, oh, 2005 or so.
OLE Automation. I haven’t worked with this, to be honest, so I don’t have a strong opinion.
CLR. A lot of DBAs dislike CLR, but I think it’s reflexive, like a vampire seeing sunlight. Or a developer seeing sunlight. If you want to argue that CLR has a fairly niche role, okay. As long as you ignore all of the stuff in T-SQL which is really CLR (PARSE, FORMAT, STRING_SPLIT, STRING_AGG, etc.). But I’ve written and used CLR modules to great effect, as it lets me do things in-database, simplifying solutions.
Service Broker. My beef with Service Broker is that it never got the UI it deserved. It’s a solid queueing system which works quite well, but it takes a while to understand.
Linked servers. The person mentioned that any work which requires linked servers should be done “in the mid-tier.” Sometimes I don’t have a mid-tier. Sometimes I don’t want a mid-tier. Occasionally, I just want to read the contents of a text file, and linked servers are an option (or OPENROWSET or PolyBase, both of which I’m also counting in this). Linked servers are useful for ETL, reference data lookups, migrating data from other data platforms, and plenty more. Yes, they can be slow and I don’t recommend using them frequently in OLTP scenarios. But SQL Server handles more than OLTP.
Database Tuning Advisor. This is an interesting one to think about. The kind of shop which uses DTA typically doesn’t know better and often ends up in a hole. But without DTA, they’d be in a different and possibly more painful hole. I don’t like DTA at all and hate having to clean up after it so much, but when you don’t have a DBA and “the database is slow,” I see the allure.
Full-text indexing. Meh. I’ve seen in in action a couple of times, seen it fail to work a couple of times (usually because data sets are too large), and have never been too thrilled with it. But it’s definitely better than LIKE '%Something%' searches all over your code.
XML indexing. Again, meh. I don’t think I’ve ever seen it in production, but it does work. I guess.
FILESTREAM / FileTable. I like FileTable. I think it does a good job of what it’s supposed to do: showing metadata for files in a share. You can’t complain about storing files in the database, but you also don’t need some custom-written app to keep a table in a database up to date with a file share. It’s a niche set of functionality, but again, I’ve benefitted from having this.
SSAS / SSRS / SSIS, and change the pricing model. Strongly against. SQL Server has always had an AND pricing model: you get this AND this AND this AND this, all for the same price. By contrast, Oracle has an OR pricing model. If you want feature A, you pay $$. For feature B, you pay an additional $$. For feature C, you pay an additional $$$. Pretty soon, $$$$$$$ adds up to a bunch of dollars. The biggest problem is that splitting out the products wouldn’t really decrease the price; it’d likely kill the other products as cross-subsidies would no longer apply. Today, I pay for SQL Server and get SSAS for free, so I can try it out. Maybe it turns out to be great and I use it so much that I add it to another server (and pay that additional license fee). Or maybe I don’t like it. If business decision-makers had to buy SSAS separately, there’d be a lot less of it and those of us who do use it would be worse off as a result.
Summing it up, I agree with some of these, particularly auto-shrink, priority boost, the IMAGE data type, and lightweight pooling. I can be swayed to a couple more. But most of these are useful tools whose detractors see being misused. It’s not the trigger’s fault that some developer created cross-server, multi-step triggers which call multi-statement UDFs and we update it by hitting the ASMX web service hosted on our SQL Server.
The first topic of discussion was around the proper scope of resource groups. Here’s my thought on it:
My general philosophy is that a single resource group contains the whole set of things you’d destroy when it’s time to decommission something. From that perspective, it’s easy to think of one application per resource group: as it’s time to tear down an application, you delete the resource group and you’re done. Even with a hundred applications, I don’t think I’d be too upset about one resource group per set of application services if they’re independent applications. That said, there are services which go across several applications. For example, a data warehouse or data lake might be used in several applications as a data source. Those core services could also get their own resource groups, but that comes with the downside risk that if all of the dependent users of the service get destroyed, the service might still be around, at which point it would be wasting money. So there’d have to be some sort of architectural mapping across application/resource group boundaries to keep track of dependencies. The list of resource groups itself would no longer be sufficient to provide all of this information. Another thought would be, how independent are these applications? If they’re effectively independent microservices which communicate via expected protocols (like HTTP or gRPC) rather than shared infrastructure, then one resource group per application can sound really viable. But if some of the applications are tightly coupled together, it may make more sense to put them all into the same resource group, especially if there is a considerable overlap. Suppose for example that five applications all use the same SQL Server VM to store data, that there are cross-database queries galore, and that each app is dependent on the others for survival–that if you got rid of the other four, any individual one would not be particularly useful. In that case, it could make more sense to put that cluster of applications into a single resource group. A related thought to the prior is, how much resource overlap is there? Do all of the apps use the same Azure App Services? Are you sharing one Azure Container Repository for everything? Is there one single Azure Synapse Analytics workspace across all of the applications? Or maybe one Azure Function app? The more interconnected services exist, the less useful “one resource group per application” works in practice. That said, there will probably be some overlap automatically–they may all use the same Azure DevOps implementation and store all of their code in the same GitHub account (though different repositories). The fact that there is some overlap isn’t necessarily contradictory; it’s the level of overlap which counts.
Solarwinds and Security
@lisanke chimed in with a really good question:
does anyone have or work for an organization that bought Solarwinds and caught a sunburst? Or does anyone buy a single source security mechanism for their Org? Do you now worry that u’ve put all your security eggs into one basket? np if this is too big for an add-on discussion tonight… maybe something to ask for a future talk. Or more general, supply-chain bugs.
This led to a lengthy discussion around risk.
Availability Group Failover Causes
@vijayb5 has a good question as well:
what approach you would think will guide us to determine root cause to find failover of alwayson availability group i have used Failover Detection Utility most of the time it never give proper reason other than showing the possible error around that time. i am interested to find what caused the error.
This is one that I tried to answer, but we’re bouncing it off of Tracy for a better answer in the next episode.
Upgrades and Server Names
Chris Wood hits us with a question, too:
My upgrade problem. I have a SQL2019 CU4 server running on Windows 2019 that had the server renamed. The appropriate drop servername and add servername was run but the registry wasn’t changed to indicate the current servername
When I ran the CU8 upgrade it throw a 574 error running msdb110_upgrade.sql against master and master wasn’t recoverable. We had disk snapshots and recovered to another server. I had heard of using Trace Flag 902 to stop the master being upgraded. I would like to try again sometime but I am now unsure what to do. This is the first major failure in about 20 years of SQL upgrades
In this case, it sounds like there’s an issue with running an upgrade script for SQL Agent. I was working from the assumption that it’s related to the server name change, though Solomon cautions that this isn’t necessarily the case. In any event, Mala’s recommendation on this is to contact Microsoft support. Mine is to try to convince the customer to install on a new server rather than upgrading in-place.
PolyBase and Pushdown
I ended the episode with an issue that a person e-mailed me about. It shows just how difficult it can be to get PolyBase to perform well when running SQL Server to SQL Server. The short version is that a particular query can work fine when returning a result set, but as soon as we try to save the result as a variable, performance tanks due to a predicate not being pushed down to the remote server. It’s a weird one and I’ll have a proper blog post about it someday.
Our major announcement is that Tom Norman is moving to Texas. I appreciate everything that Tom has done for us at TriPASS and we all wish him well.
As a result, we do have an opening for a board member. If you are a TriPASS member in good standing and interested in being the VP of Membership, please reach out to me.
TriPASS Survey Results
Our first topic of the night was the results of the TriPASS survey. We had enough respondents to get some interesting results. Here’s the quick version:
A small majority of TriPASS members who voted said that they would potentially be interested in paid training sponsored by TriPASS, with a price point between $100 and $150, so basically the price of a SQL Saturday pre-con. There were quite a few topics mentioned, but little consistency in topic selection so we’ll have to talk about it as a board to see if there is a good training option we might put on.
Thanks to everybody who contributed topics of interest for us to cover. The good news is that we have plans to cover at least 70% or so of the requested topics in the next year. And I might find a way to bring in one or two more.
We asked about special interest groups, and we will continue to have two special interest groups in addition to the main meeting. The second Tuesday of the month will continue to be Advanced DBA. For the fourth Tuesday of the month, the title will be Data Science and Business Intelligence. This means that we will include more topics like Power BI, Tableau, and data modeling, but will keep data science and machine learning topics in play as well.
Finally, the last question on the survey that I covered pertained to Shop Talk cadence. We have decided to move Shop Talk to run once every two weeks. This means that we will meet every other Monday at 7 PM Eastern.
Thank you to everybody who took the time to fill out the survey.
Power BI End-to-End
Melissa Coates has an updated document for us: the Power BI End-to-End Diagram. Melissa has kept this up to date for several years, and it’s crazy to see just how much Power BI has changed over time. We joke about having a release every other day, but it feels true…
Data Professional Salary Survey, 2021 Edition
Brent Ozar has released the results of the 2021 Data Professional Salary Survey. I haven’t taken the time to look at the data yet, but I have for prior years. It’s a skewed data set—primarily people who know of Brent (or are in that extended network), so it’s going to focus mostly on SQL Server. But if you understand that this isn’t the entire data platform professional population, it’s a very interesting data set for analysis.
Our major announcement is that the TriPASS 2020 membership survey is open. If you haven’t filled it out, it will be open for the next couple of weeks.
Data Saturdays
We spent the first segment of the show talking about the post-PASS future. We included a link to Data Saturdays, which is one step toward replacing SQL Saturdays.
I’d also recommend checking out Call for Data Speakers, a service hosted by Daniel Hutmacher to give data platform speakers and conferences a central location, even outside of the SQL Saturday/Data Saturdays paradigm.
Azure Databricks and Azure Synapse Analytics
We spent a good amount of time walking through the differing use cases of Azure Databricks and Azure Synapse Analytics. Microsoft has an architecture guide walking through the use cases. One point of difference I have: I don’t think HDInsight is worth using.
We also talked about Snowflake competitors, which in the Azure cloud is Azure Synapse Analytics dedicated SQL pools.
Time Series Databases
@rnicrosoft asked a question around analytics when you have 800 million or so key-value pairs. The transactional side solution is typically something like Cosmos DB, where you’re reading and writing single records at a time. But what happens when you need to perform analysis on the data?
One solution is to use something like Azure Synapse Link to pull that data from Cosmos DB into Azure Synapse Analytics and organize the data in a classic fact-dimension model.
But another solution would be to store the data in a time series database like InfluxDB and visualize it in Grafana. Tracy and I have implemented monitoring with InfluxDB, Telegraf, and Grafana, and you can use this for “normal” analytics as well.