Tech Trained Monkey

Everyday Problem Solvers

Advanced Troubleshooting Sql Server Pt.1

Detecting a Bad SPN

Okay, we all know the error that this causes. Even saying SSPI context is bad juju, and I feel dirty when I talk about it, but it has to be said:

Cannot generate SSPI Context

This is a generic error that can be caused by 1000 different things, but experience has shown me that 98% of the time, it is caused by 1 of 2 problems. Either the server is not able to connect to a domain controller to establish the SSPI context or there is an invalid SPN for the SQL Server service in AD. To verify if the domain controller connection may be a problem, I check the System event log. There may be errors stating that SQL Server was not able to connect to a domain controller or kerberos errors. The error logged may have occurred several days before you see the SSPI context errors. 99% of the time, this error is resolved by rebooting the server.

I want to talk about the second cause. An invalid SPN will cause this error. No SPN at all will NOT cause this error. That bears repeating: No SPN at all will NOT cause this error. Adding an SPN to AD will NOT fix this error. Setting up kerberos or constrained delegation when there is an invalid SPN will NOT fix this error. Starting to see a pattern?

To summarize the above, there are a lot of things that people will recommend trying that do not fix this error. If anything, they complicate trying to fix it. So what does fix this error? Well, there is only one way to fix an SSPI context error caused by an invalid SPN. Delete the invalid SPN.

It is actually quite simple to detect that your SPN is invalid as long as you are using a domain account for your SQL Server service (which you should be). Simply check for SPN’s that are registered for your service account and for the server. If you do not find matching SQL Server SPN’s for both, then there is an invalid SPN. If the SPN’s match, but the server returns extra SPN’s that are not in the service account list, there is an invalid SPN. And lastly, if everything matches, checking the spelling carefully as I have been bitten by misspelled manually created SPN’s several times.

The tool I use for checking for SPN’s is setspn from a command window. Let me demonstrate:

Example 1:
Let’s say that our service account is MyDomain\SQLSrvrSvc and the name of my SQL Server is ProdSQL02. My check of the SPN’s might look like this:

C:\users\sqlsoldier>setspn -l MyDomain\SQLSrvrSvc
Registered ServicePrincipalNames for CN=SQL Server Service Acct,OU=UserAccounts,DC=MyDomain,DC=com:

C:\users\sqlsoldier>setspn -l ProdSQL02
Registered ServicePrincipalNames for CN=ProdSQL02,OU=Workstations,OU=Machines,DC=MyDomain,DC=com:
TERMSRV/ProdSQL02
TERMSRV/ProdSQL02.MyDomain.com
HOST/ProdSQL02
HOST/ProdSQL02.MyDomain.com
MSSQLSvc/ProdSQL02:1433
MSSQLSvc/ProdSQL02.MyDomain.com:1433

Do you see the problem? No? When I checked for SPN’s for the server, 2 SPN’s for SQL Server show up. When I checked for SPN’s for the service account, no SPN’s for that server show up. This means that the SPN’s configured for SQL Server are not owned by that service account. Most likely, the service account was changed recently, and these SPN’s were left over from the previous service account. This happens frequently when changing the service account from Local Service to a domain account (more on this later). This will generate the SSPI context error, and connections will fail. Delete the SPN’s.

Example 2:
Let’s say that our service account is MyDomain\SQLSrvrSvc and the name of my SQL Server is ProdSQL02. My check of the SPN’s might look like this:

C:\users\sqlsoldier>setspn -l MyDomain\SQLSrvrSvc
Registered ServicePrincipalNames for CN=SQL Server Service Acct,OU=UserAccounts,DC=MyDomain,DC=com:
MSSQLSvc/ProdSQL02:1433

C:\users\sqlsoldier>setspn -l ProdSQL02
Registered ServicePrincipalNames for CN=ProdSQL02,OU=Workstations,OU=Machines,DC=MyDomain,DC=com:
TERMSRV/ProdSQL02
TERMSRV/ProdSQL02.MyDomain.com
HOST/ProdSQL02
HOST/ProdSQL02.MyDomain.com
MSSQLSvc/ProdSQL02:1433
MSSQLSvc/ProdSQL02.MyDomain.com:1433

Do you see it this time? There are matching SPN’s for the account without the fully qualified domain name (FQDN), but the SPN for the non-FQDN does not have a match. In this scenario, some users will connect fine, and some may not. Specifically, anyone using the FQDN will fail with the SSPI Context error, but users not using the FQDN will connect successfully. Delete the SPN that does not match or just delete all of them if you are not using them for establishing Kerberos connections.

Example 3:
Let’s say that our service account is MyDomain\SQLSrvrSvc and the name of my SQL Server is ProdSQL02. My check of the SPN’s might look like this:

C:\users\sqlsoldier>setspn -l MyDomain\SQLSrvrSvc
Registered ServicePrincipalNames for CN=SQL Server Service Acct,OU=UserAccounts,DC=MyDomain,DC=com:
MSSQLSvc/ProdSQL02:1433
MSSQLSvc/ProdSQL02.MyDoman.com:1433

C:\users\sqlsoldier>setspn -l ProdSQL02
Registered ServicePrincipalNames for CN=ProdSQL02,OU=Workstations,OU=Machines,DC=MyDomain,DC=com:
TERMSRV/ProdSQL02
TERMSRV/ProdSQL02.MyDomain.com
HOST/ProdSQL02
HOST/ProdSQL02.MyDomain.com
MSSQLSvc/ProdSQL02:1433
MSSQLSvc/ProdSQL02.MyDoman.com:1433

Do you see it this time? This one is a little trickier. I already mentioned it, so it’s not that tricky. There are matching SPN’s for both the FQDN and the non-FQDN, but there is still an invalid SPN. I’ll give you a hint, users using the non-FQDN can connect successfully, but users using the FQDN get the SSPI Context error. Ah, now you go it!! Yes, the domain name is misspelled and therefore the SPN is invalid. User’s are connecting via ProdSQL02.MyDomain.com.

What Causes a Bad SPN

Other than Nargles, most invalid SPN’s are caused by one of 3 things. Either the service account was changed from Local Service to a domain account incorrectly; the SPN was created manually, and the SPN was not deleted and regenerated; or it was manually created incorrectly (human error). I want to take a closer look at the first option because this seems to be the most common cause I deal with.

There is what I consider to be a bug in SQL Server Configuration Manager. If you change the service account of the SQL Server service and then restart the service so that the change takes affect, the existing SPN is not deleted. Local Service is a privileged account, and is allowed to create and delete its own SPN’s in AD. As a result, if the service starts as Local System, an SPN will be successfully created. If you then stop the service, the SPN will be deleted. If you change the service account from Local System without stopping the service first, the SPN may not be deleted. Now this is not a 100% scenario. Sometimes it will be deleted and sometimes not. If not, then the SPN will be invalid when the service restarts with a different account and users will get the SSPI Context error.

To fix this problem, you need someone who has sufficient permissions in AD to delete the account. In most production domains, this is not going to be you or anyone on your team. This means contacting whoever manages your domain or its domain entities. This of course means delays while you wait for them to make the change for you. Luckily, we know someone who has privileges to delete this SPN. He’s right in front of us.

As I said above, Local System has permissions to create and delete its own SPN’s. This SPN is owned by Local System. Here is a fix that is generally much faster than waiting for someone else to fix it for you. And if nobody can connect to the server anyway, restarting the service is not an issue as you are effectively down anyway. In one of the cases above where some users can connect and some can’t, then you may be better of waiting for the deletion.

To delete the SPN manually:

  • Stop the SQL Server service
  • Change the SQL Server service account back to the old account (Local System)
  • Start the SQL Server service
  • Stop the SQL Server service
  • Change the service accounts back to the domain account
  • Start the SQL Server service
  • Start the SQL Server Agent service if it was originally running

That’s it. The SPN should be deleted and connections should work now. To ward of Nargles (and prevent this error from reoccurring), always follow these steps when changing the service account of the SQL Server service.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: