Windows – DB auto failover in c# does not work when the principal server physically goes offline

sqlsql-server-2008windows

I'm setting up DB auto failover in C# with SQL Server 2008 and I have a 'high safety with automatic failover mirror' using a witness setup and my connection string looks like

"Server=tcp:DC01; Failover Partner=tcp:DC02; database=dbname; uid=sewebsite;pwd=somerndpwd;Connect Timeout=10;Pooling=True;"

During testing, when I turn off the SQL Server service on the principal server the auto failover works like a charm, but if I take the principal server offline (by shutting down the server or killing the network card) auto failover does not work and my website just times out.

I found this article where the second last post suggests that its because we are using named pipes which does not work when the principal goes offline, but we force TCP in our connection string.

What am I missing to get this DB auto failover working?

Best Answer

After working with MS for a week, we've worked out why this happens.

Essentially, the application is not failing over because it needs to be sure that the database has failed over - and the sql connection is timing out before the connection has determined that the db has failed over.

The process to confirm that the database has failed over (with all the default tcp registry settings) is to:

  1. try to communicate with the principal, see that it is not the principal anymore
  2. communicate with the failover to make sure that it has failed over and that it is now the new principal.

When the principal is down, this communication takes about 21 seconds because it will:

  1. try to communicate with the principal, wait 3 seconds, timeout
  2. try to communicate with the principal again, wait 6 seconds, timeout
  3. try to communicate with the principal again, wait 12 seconds, timeout
  4. try to communicate with the failover partner, see that it has failed over, so fail over in the application.

So if your sql connection isn't waiting 21 seconds (probably more in reality) then its going to timeout before it finishes this dance and its not going to fail over at all.

Solution is to set the timeout in your connection string to large value, we use 60 seconds just to be safe.

Cheers

Related Topic