- Kevin Feasel
- Mala Mahadevan
Notes: Questions and Topics
Azure Active Directory Outage
Shortly before the episode began, Azure had a major outage in its Active Directory service. We used this as a springboard for discussion around what happens when you rely on a cloud service for your company’s infrastructure, but that cloud service goes down. The easiest answer is multi-cloud and cloud/on-premises hybrid scenarios, but those can get expensive and also limit you from some of the benefits of Platform-as-a-Service offerings.
Our next segment was around a server having issues that I was asked to take a look at. The two biggest waits on the server were
SOS_SCHEDULER_YIELD. The knee-jerk reaction when you see
SOS_SCHEDULER_YIELD is to say that it’s a CPU problem, but Paul Randal explains why that is not necessarily the case.
From there, you want to see if the signal wait time is high—the rule of thumb I’ve seen is that if signal waits are more than about 20% of total wait time for
SOS_SCHEDULER_YIELD, then we have a problem where there isn’t enough CPU to go around. In my scenario, signal waits were approximately 99.999% of total waits. So the kneejerk reaction is to blame CPU, the more thoughtful reaction is to investigate further, and the results of the investigation were to blame CPU.
If that’s the case, here are a few of the options potentially available:
- Tune some queries. Look for high-CPU queries, especially ones which run frequently. There are a few of those in my scenario, so every second of CPU you shave off per query call is a second which can go to another query and lessen the waits.
- Drop MAXDOP and cost threshold for parallelism if it is too high. I’m not talking about the knee-jerk reaction of “Parallelism is high, so drop MAXDOP to 1” but if you have 32 cores and your MAXDOP is 32, that probably doesn’t make much sense for an OLTP system. As a warning, though, changing MAXDOP without reviewing poorly-performing queries can lead to bigger problems, as those poorly-performing queries are already struggling with a high value of MAXDOP, so if you reduce .
- Increase CPU cores if you can. If your queries are looking good, it may just be that you have too much load on your server and adding more hardware can be the solution.
SSIS and ADO.NET: A Rant from Someone Not Me
Mala spent far too long working through an issue with multi-subnet failover on Availability Groups and SSIS. We use OLEDB drivers very heavily when working with SSIS, as we appreciate work finishing at a reasonable pace. But the OLEDB drivers installed with SQL Server 2017 don’t include support for multi-subnet failover. But hey, ADO.NET drivers do. So Mala spent a lot of time trying to switch over to that and catalogs some of the issues she ran into.
Furthermore, it turns out that the latest OLEDB driver (installed with SQL Server 2019 and available separately if you don’t have that version of SQL Server) does have support for multi-subnet failover, so we just needed to update drivers.