Shop Talk: 2021-08-16

The Recording

The Panelists

  • Kevin Feasel
  • Mike Chrestensen

Notes: Questions and Topics

Text Delimiters and CSVs

From Mark G: “I was teaching some new SQL developers about exporting to text/CSV files using the wizard. When it got to having a text qualifier value of NONE, or actually putting in something such as ", the question came up: why would someone put something in there?”

This led to a fair amount of ranting on my part, as well as Solomon calling me out for using the term “quoted delimiter” as a sloppy combination of “quoted identifier” and “text delimiter.” I complained a lot about how PolyBase quoted identifiers don’t actually do anything useful and how you have to be careful about each CSV parser because they all seem to behave differently.
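
To make the point concrete, here's a small Python sketch (standing in for whatever CSV parser you happen to be stuck with; the sample values are made up) showing why a text qualifier matters as soon as a field contains the delimiter:

```python
import csv
import io

# Fields that contain the delimiter (,) and the qualifier (") itself.
rows = [["1", "a value, with a comma"], ["2", 'a "quoted" value']]

# Write with a text qualifier: fields are wrapped in quotes and any
# embedded quote characters are doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, quotechar='"', lineterminator="\n").writerows(rows)
text = buf.getvalue()

# A qualifier-aware parser recovers the original fields exactly...
assert list(csv.reader(io.StringIO(text), quotechar='"')) == rows

# ...while a naive split on commas gets the field count wrong.
naive = text.splitlines()[0].split(",")
assert len(naive) == 3  # the embedded comma split one field into two
```

And this is exactly the failure mode you hit when two tools disagree on the qualifier: the writer assumed one convention and the reader assumed another.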

DBCC DBREINDEX and Always Encrypted

Bob also e-mailed about a strange error when running DBCC DBREINDEX to rebuild an index on a table with Always Encrypted enabled:

Msg 206, Level 16, State 2, Line 3
Operand type clash: nvarchar(max) encrypted with (encryption_type = 'DETERMINISTIC', encryption_algorithm_name = 'AEAD_AES_256_CBC_HMAC_SHA_256', column_encryption_key_name = 'CEK_Auto1', column_encryption_key_database_name = 'DBNAME') is incompatible with varchar
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Completion time: 2021-07-30T17:53:06.1301441-04:00

Fortunately, Bob was able to find an answer pretty quickly, and the solution follows classic Microsoft error policy where the error message is technically correct but not helpful in discovering the cause.

In this case, Bob’s answer was to drop auto-generated statistics on the table and then rebuild the index again. That happened to work out just fine.

CycleCloud

Mike wrapped us up with a quick discussion of CycleCloud, an interesting way to scale VM resources up and down in Azure. Platform-as-a-service offerings tend to have sliders to control this sort of thing, but VMs don’t have an easy answer. CycleCloud still isn’t an easy answer, but it’s definitely easier.

Shop Talk: 2021-08-02

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

PrintNightmare Updates

Just in case you thought PrintNightmare was done, it’s not. This bypass isn’t as bad as the original exploit, but given that this is a bypass response to a patch Microsoft just put out, it’s a good indicator that we’ll have more print spooler problems to deal with over the coming months.

Using Lookup Tables and the OTLT Anti-Pattern

Mike then regaled us with a story about lookup tables, though by “lookup table” he referred to metadata about tables and columns rather than reference data. I took the opportunity to talk about the One True Lookup Table anti-pattern and where it can all go wrong, and all three of us recommended that you read SQL Antipatterns.

Regular Expressions in SQL Server?

Mark H. asked a good question:

Do the recent versions of SQL Server come with (.Net framework) regular expression support?

The short answer is no. The longer answer is that you can use SQLCLR to call out to C# code, and you have been able to since 2005. You could build your own implementation or use Solomon Rutzky’s SQL# library, which I highly recommend. I also pointed out the ability to use SQL Server Machine Learning Services to do this in R (since SQL Server 2016), Python (2017), or Java (2019), though Solomon rightly points out that this is more a hack than a great solution: you have to send the entire dataset to ML Services, and it would not work well in a row-by-row operation.
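
To illustrate the shape of the ML Services approach, here's a hedged Python sketch of the kind of script you might hand to sp_execute_external_script. In the real thing, SQL Server passes the query result in and reads a result set back out as data frames; this standalone version uses plain lists, and the column name "phone" and the sample data are made up. The key point is that the whole dataset goes through at once, which is why it fits set-based work but not row-by-row calls:

```python
import re

# A pattern for US-style phone numbers, e.g. (919) 555-0123.
pattern = re.compile(r"^\(\d{3}\) \d{3}-\d{4}$")

def filter_valid_phones(rows):
    """Return only the rows whose 'phone' field matches the pattern.

    In an ML Services script, 'rows' would be the entire input data set
    handed over by SQL Server in one shot.
    """
    return [r for r in rows if pattern.match(r["phone"])]

rows = [
    {"id": 1, "phone": "(919) 555-0123"},
    {"id": 2, "phone": "not a phone"},
]
valid = filter_valid_phones(rows)
```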

Solomon and I then got off onto the topic of CLR strict security, and I recommend a lengthy series on the topic that Solomon wrote. Start with part 1 and check the sidebar for the other parts of the series. In short, CLR strict security is the answer to a question nobody asked.

At the end of the show, we also spoke a bit pessimistically about the future of CLR and how it’s likely to struggle along in the background as .NET Core takes over. I’d like that not to be the case—one area where SQL Server is way behind other relational database management systems is in the relative lack of capability for extensibility scripting. Compare this to something like PostgreSQL, where you can write functions and procedures in multiple languages natively (and others with plugins) and you can see the problem even more clearly.

Shop Talk: 2021-07-19

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

SQL Server Version Support

We started off tonight by mentioning that SQL Server 2016 ended mainstream support on July 13th and now has entered extended support. SQL Server 2012 leaves extended support in July of 2022. We also talked a little bit about what this means.

On the other side of support, we talked about a couple of surveys, from Brent Ozar and Steve Stedman respectively, which look at who (among their user bases) is using which version of SQL Server. I also went into some detail about what I mean when I say that the population surveyed may not be the same as the broader population of SQL Server installs.

Print Spoolers? We Don’t Need No Steenkin’ Print Spoolers!

It has not been a good month for print spooling on Windows.

Reinforcement Learning and Pong

Andrej Karpathy has a great post from a few years back about reinforcement learning. The last segment of the program tonight covered some minor updates I’ve made to the code, as well as a demonstration of a model I trained over the course of a weekend.

Shop Talk: 2021-06-21

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

The Value of Community

We started off tonight by mentioning that Mike got a promotion at his job thanks, in part, to what he’s been able to learn by being part of the SQL community. We’re all quite happy for Mike and wish him the best, and we’re glad that he’s a part of the community.

Linked Servers and their Alternatives

Mala brought up our first major topic tonight: what are some alternatives to using linked servers? Here’s a brief synopsis, but you’ll definitely want to listen to the video.

  • First, are you sure you need an alternative? Linked servers do work for specific purposes and they do a pretty good job in that role. The main problem comes when we try to move lots of data over linked servers, especially when we do it over and over.
  • PolyBase is an alternative. Of course I’d talk about PolyBase.
  • You can also use OPENROWSET for one-off data queries.
  • If you need to migrate data frequently, perhaps using an ETL tool would be preferable. That could be a product you purchase (SSIS, Informatica, etc.) or an application you develop. The next night, Joshua Higginbotham talked to us about using Python for ETL work, so it’s a good lead-in.
  • Data virtualization solutions are also available.
  • One idea I hadn’t thought of but makes a lot of sense is using Machine Learning Services, especially if you need to do data analytics from remote data and then store it in your local SQL Server instance.
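
To make the “application you develop” option concrete, here's a minimal, hypothetical Python sketch of the extract/transform/load shape. The real extract and load steps would use a driver such as pyodbc (noted in comments); they are stubbed with in-memory lists here so the sketch runs standalone:

```python
# Pull rows from the source server, transform them locally, and bulk-load
# them into the target, instead of dragging them across a linked server
# inside a single query.

def extract(source):
    # Real version: cursor.execute("SELECT id, amount FROM dbo.Sales")
    # followed by fetchall(); here the "server" is just a list.
    return list(source)

def transform(rows):
    # Keep only positive amounts and convert cents to dollars.
    return [(rid, cents / 100.0) for rid, cents in rows if cents > 0]

def load(target, rows):
    # Real version: cursor.executemany(
    #     "INSERT INTO dbo.SalesClean VALUES (?, ?)", rows)
    target.extend(rows)

source_rows = [(1, 1500), (2, -300), (3, 2500)]
target_table = []
load(target_table, transform(extract(source_rows)))
```

Because the transform runs in your application rather than inside the remote query, the optimizer on either server never has to reason about the other one, which is where linked server performance usually falls apart.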

ML Experiments on Large Data

Our other major topic was from George Walkey, who asked for my preferences around performing machine learning experiments on large data sets, specifically asking about Databricks, SparkML, Azure Synapse Analytics, Azure Machine Learning, and on-premises solutions. Here’s a synopsis of what I wrote back:

My team generally uses on-premises tooling because that’s what we had available.  We started out with SQL Server Machine Learning Services and built a few products off of that.  However, most of our new products don’t use ML Services and we’ve started to move existing ones away where it doesn’t make sense (mostly, it doesn’t make sense when you need sub-second performance, aren’t using batch operations to generate lots of records at a time, don’t have the data in SQL Server, don’t store the results in SQL Server, and can’t use native scoring and the PREDICT operator).  Instead, we are prone to building out Linux-based servers running R and/or Python to host APIs.  We also tried out Azure ML and team members really liked it, but internal problems kept us from being able to use it.

As for what I like:

  • On-prem is easy if you’re on a tight budget.  There are a lot of tools available, and if you have knowledge in the ML space, you can do pretty much anything on-prem that you could do in a cloud.
  • I’m not the biggest fan of SparkML (and by extension, the Databricks ML capabilities).  Spark’s machine learning capabilities aren’t fantastic, and they tend to lag way behind native scikit-learn or R packages.  Sure, you can run native scikit-learn code on a Spark cluster, but then you don’t get the advantages of scale-out.
  • Azure ML (and AWS SageMaker) are good options, especially for people without an enormous amount of ML experience.  They’re going to constrain you a bit on the algorithms you can choose, but that’s usually okay–and pro users can find their way around that by bringing in Python libraries.
  • Azure Synapse Analytics offers ML capabilities, but it’s worth keeping in mind that Spark pool machine learning has the same SparkML downsides as what you find in Databricks. Also, between Azure Synapse Analytics and Azure ML, Microsoft has two different ML products run by two different teams, both competing to be “the right choice.”  Last year, the guidance was to use Azure ML for everything machine learning-related in Azure, and Azure Synapse Analytics Spark pools with SparkML only in cases where you already had existing Spark code.  I’m not sure what the guidance will look like over the next few months, but that does leave me wondering.

Shop Talk: 2021-06-07

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan

Notes: Questions and Topics

SMO

Mala started us off with a discussion of SQL Server and SQL Management Objects (SMO). We spent the first half of the episode discussing SMO as a concept, understanding where and how to use it, and working through some of the foibles associated with it.

Miscellany

Chat did a good job of carrying this episode. Check out things like:

  • Customer support and how terrible it is, as well as my thoughts on the nature of level 1 & level 2 support versus level 3 and higher.
  • How containers differ from the Java Virtual Machine and why it makes sense for the former to be completely unrelated to the latter.
  • How to pronounce squid (Skoo-id, obviously).
  • Emanuele Meazzo’s SQL Server Diagnostic Book. I’m a fan of it.

And more!

Shop Talk: 2021-05-24

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

Technologies to Learn

Chris Voss e-mailed (shoptalk at tripass dot org) with a great question:

When it comes to deciding what technology or data skill you wish to learn next, what’s your process? Do you base it on what looks great, what could be good for a given task at your job, what tool or language appears to be most in demand? With all that considered, how much time then goes into such external learning? I know it depends, but I wonder what everyone has to say.

I’m going to keep some of my powder dry because this makes up a fair percentage of my upcoming Dataminutes talk, but here are some quick thoughts:

  1. Build a not-to-do list. It’s easy to add things to to-do lists, but a not-to-do list is even more important. Doing this gives you focus. So what goes on your not-to-do list? That’s for Dataminutes…
  2. Think about how good you are at your current job, in terms of knowing what you need to know as well as job security. If you’re at risk, focus your learning time on what makes you better at your current job. Keeping the job you have is important, as it’s much easier to get a new job if you have one than if you’re out of work and looking.
  3. If you’re in a pretty good place job-wise, think about what you want your next job to be and spend some amount of time on that. You shouldn’t spend 100% of your learning on “the next role,” but this is a continuum, in that as you get more comfortable in the current position, you should be able to branch further.
  4. Spend an hour a day at work learning. If you need to (and can!), set up a daily meeting on your calendar during a time frame when you aren’t that busy, and use it to learn. Bias toward things helpful for your current position, so that if pressed, you can bring up specific cases where the time you spent learning directly and positively affected the company.
  5. Don’t forget about learning for fun. That may also be part of your “next job” learning, but it can be enjoyable to learn something simply for the sake of learning it.
  6. Go outside your comfort zone. Nobody respects a Fachidiot.
  7. Also, don’t forget about leaving your normal realm. I’m assuming that most of the Shop Talk audience is technical people, like application developers, database administrators, and the like. Pick up a book on literature, history, or philosophy.
  8. Your brain needs breaks, too. Specifically, physical activity is really good for giving your brain a break. If you spend 8+ hours a day sitting in front of a screen at work and navigating logic problems, it can wear you out to keep trying to solve similar logic problems at night. Switch it up a bit and you’ll find more success long-term.

Shop Talk: 2021-05-10

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

WIT Mental Health Day Review

Mala kicked us off with a review of a recent conference from the WIT group: a mental health day on Friday, May 7th.

Build and/or Buy

Mala gave us the theme of the night, starting with a fairly recent article by Grant Fritchey regarding the choice to build or buy software.

We ended up spending a lot of time on this topic, as well as a detailed discussion on just how likely it is that companies will be cloud-only anytime soon.

Shop Talk: 2021-04-26

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

WIT Mental Health Day

Mala kicked us off with an announcement of a conference from the WIT group: a mental health day on Friday, May 7th.

Grafana Going to AGPL

Our big item was Grafana moving from Apache 2.0 licensing to AGPL. This primed a long discussion on a variety of topics, starting with open source licensing but going way off the rails. What could possibly go wrong?

Shop Talk: 2021-04-12

The Recording

The Panelists

  • Kevin Feasel
  • Mike Chrestensen

Notes: Questions and Topics

Questions of Critical Importance

Tonight’s episode was a fun topic where I essentially gave a survey but without collecting responses. Because that’s how I roll. Here were the questions (including a couple I didn’t ask), along with my original answers.

  • Best Named SQL feature: Hekaton was, at least until marketing got in the way and made it In-Memory OLTP.
  • Worst Named SQL feature: The timestamp data type.  As a runner-up, Azure Data Studio.
  • SQL Feature Most Likely To Have Been Named By Marketing: Always On Availability Groups.  Or maybe AlwaysOn.  I think it might have been Always-On at one point as well, but don’t recall.
  • Most Accurately Named SQL Feature: I’m tempted to jokingly say “priority boost” here.  On the plus side, most features are named clearly enough, even if sometimes they take away nice names and give them marketing-heavy names.
  • Least Accurately Named SQL Feature: The timestamp data type.  As a runner-up, Azure Data Studio.
  • SQL Feature That Should Throw A Hard 24 Error If You Try to Use It In A New Project (But Continue To Work For All Your Existing Crap): Non-schemabound table-valued functions.
  • SQL Feature That Just Needs A Little Love to Be Great: If this was 2010, I would have said Service Broker.  Right now, PolyBase.
  • SQL Feature That Can Never Be Great Even With Barry White Levels of Love: CLR strict security.
  • Best SQL Feature: Metadata is itself described in terms of SQL, so you can query the system catalog views the same way you query any other table.
  • Suckiest SQL Feature: Stretch DB is too easy.  CLR strict security could be on here, but I’ll pick something different:  SSIS scale-out.  It’s a half-hearted attempt to say that SSIS scales.
  • Surprisingly Useful SQL Feature: Batch mode processing.  To clarify, the surprising part is just how often batch mode processing can speed things up.
  • Surprisingly Useless SQL Feature: Temporal tables.  I think they tried to solve two problems at once and ended up providing a halfway-viable solution to each, but not a complete solution to either. A bit of that Barry White love could fix this.
  • SQL Feature I’ll Probably Never Use But I Like That It Exists: Query Store plan forcing for cursors.  It’s easy to dunk on cursors and performance, but I appreciate that they were able to figure this out.

Shop Talk: 2021-03-29

The Recording

The Panelists

  • Kevin Feasel
  • Mala Mahadevan
  • Mike Chrestensen

Notes: Questions and Topics

Deleting Backup Files

@rajeshparuchuri asks, “How do you delete old backup files? Or what is your preferred way of deleting old backup files?”

We had some discussions about different methods for deleting, but there are several key methods:

  • Use xp_cmdshell to delete the files. This wasn’t available to Rajesh, so we had to move on.
  • Write a PowerShell script to delete the files and call that script in the backup job. Essentially, know where the backups are stored and write a one-liner to get child items (with a .trn or .bak extension), filter to get just those older than your retention period, and delete the objects.
  • Use a maintenance plan. I include this in the list for the sake of completeness, not because it’s a great idea.
  • If you’re writing backups to a cloud store like AWS S3 or Azure Blob Storage, you can set a retention plan on the bucket/container and automatically delete files older than a certain age.
  • Similarly, many newer SANs offer the ability to set a retention policy on a share.
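
As a sketch of the script-it-yourself option (written in Python rather than PowerShell purely for illustration; the directory and file names are made up), the retention logic is the same get-filter-delete shape as the PowerShell one-liner:

```python
import os
import tempfile
import time
from pathlib import Path

def delete_old_backups(backup_dir, retention_days=7):
    """Delete .bak/.trn files older than the retention period, mirroring
    the Get-ChildItem | Where-Object | Remove-Item approach."""
    cutoff = time.time() - retention_days * 86400
    deleted = []
    for path in Path(backup_dir).iterdir():
        if path.suffix in (".bak", ".trn") and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return deleted

# Quick demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    old = Path(d) / "full_20200101.bak"
    new = Path(d) / "full_today.bak"
    old.write_bytes(b"")
    new.write_bytes(b"")
    os.utime(old, (time.time() - 30 * 86400,) * 2)  # pretend it's 30 days old
    removed = delete_old_backups(d)
```

Schedule it as a step in the backup job (or as an Agent job of its own) and the retention policy takes care of itself.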

The Downside of Storing VARBINARY in a Table

@rednelo7625 asked: “I have a question about storing Word docs in a document DB as VARBINARY(MAX). I’m limiting my users to an upload size of 3MB. I’m storing them this way so that I can encrypt them and back them up. Do you see any problem with this practice?”

The short answer is yes, there are some risks with this. The biggest problem is that if the database becomes really popular, it can bloat up quickly and make activities like database backups, CHECKDB runs, etc. run a lot slower. And with blob data like this, there’s not much we can do about it—especially something like DOCX files which are already zipped (spoiler: .docx is basically a zipped series of XML files), so you don’t get any real compression benefits from using COMPRESS().

As for the reasoning, it’s to encrypt and back up files. My response was to recommend checking into FileTable functionality that was introduced in SQL Server 2012. FileTable gives you a nice compromise between a desire to keep file metadata in SQL Server while keeping the binary data out. If you take a full backup of your database, it does back up the FileTable data (unless you choose to exclude it), so you do get backup protection that way without needing to store the files directly in the database.

As far as encryption goes, I’d recommend enabling disk-based encryption on the storage where those files are (assuming you use FileTable). I made the guess that rednelo was using Transparent Data Encryption for this, and encrypting the disks provides a similar level of protection—it protects data at rest. For further protection, it would be possible to write an app which accepts those Word documents and saves them in a password-protected archive, or encrypts the binary data and saves it as an encrypted file, decrypting it later for access.

IoT Everywhere

Eventually, we got pulled off onto a sidetrack around IoT, especially in agriculture. This is an area of interest to me because the average person has no clue just how sophisticated modern agriculture is. Anders and @ENHANCE_QC have some knowledge in the area (family members in farming) and added some nice context too.

We covered one of the key problems with IoT: you may own the hardware, but if it has to talk to a remote server to survive, your expensive product can quickly become a brick. This also ties to right-to-repair laws, which I fully support. My philosophy on this is that if I bought the product, I should be able to alter the product, including repairs and modifications.