Sql-server – Combining failover clustering and database mirroring

Tags: cluster, database-mirroring, sql server, sql-server-2005, sql-server-2008

When you combine failover clustering and database mirroring in SQL Server, you need to change the mirroring partner timeout value so that the local cluster gets a chance to fail over before database mirroring fails over. I'm curious as to what people are doing when combining these technologies – I teach various HA classes and this is not too common a combination.

Here are my questions IF you are using failover clustering and database mirroring combined. If you could answer them all in each response, that would be very useful to me. I don't need an explanation of why things need to be changed or how the technologies work – I used to own them both when at Microsoft – I'm interested in industry practices now that the possibility of marrying them has been out there for 4 years.

1) how long, on average, does it take for a clustered SQL Server instance to fail over for you? (I know it depends on how much crash recovery is required, but what's an average for you?)

2) for these same instances, what do you set the mirroring partner timeout to?

3) are you comfortable with the fact that a real cluster outage could occur and it may be quite a while until mirroring notices that the failure has occurred because you've bumped the mirroring partner timeout up?

Thanks for all responses!

Best Answer

Paul,

  1. Typically a few seconds, up to a couple of minutes depending on ... (you know the rest).

  2. If I were to set up auto failover I'd go for several minutes. That way site-to-site VPN connections would have time to come back up, the cluster could restart, etc. At a minimum I'd probably go with 4 minutes longer than it would take the nodes of the cluster to restart in the event of a local power outage.

  3. Yep. DR issues are usually defined as a failure lasting over an hour. Besides, it'll probably take longer than that for the global load balancer to notice the other site is down and update all the DNS, plus the TTL time on the DNS. This total time should be the upper end of the amount of time for auto failover.
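
To make the arithmetic in the answer above concrete, here is a rough Python sketch. It is not from the original answer: the function names, the database name and every number in the usage note are made up for illustration. The only real syntax it relies on is the T-SQL statement `ALTER DATABASE ... SET PARTNER TIMEOUT <seconds>`, which it builds as a string rather than executing.

```python
def partner_timeout_seconds(cluster_restart_s: int, margin_s: int = 240) -> int:
    """Lower bound suggested in the answer: the time the cluster nodes need
    to restart, plus a 4-minute (240 s) margin."""
    return cluster_restart_s + margin_s


def failover_upper_bound_seconds(glb_detect_s: int, dns_update_s: int, dns_ttl_s: int) -> int:
    """Upper bound suggested in the answer: time for the global load balancer
    to notice the outage, plus the DNS update, plus the DNS TTL."""
    return glb_detect_s + dns_update_s + dns_ttl_s


def set_partner_timeout_sql(database: str, timeout_s: int) -> str:
    """Build the T-SQL statement that sets the mirroring partner timeout.
    The statement is only returned as text; nothing is sent to a server."""
    if timeout_s <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    safe_name = database.replace("]", "]]")  # escape a closing bracket in the name
    return f"ALTER DATABASE [{safe_name}] SET PARTNER TIMEOUT {timeout_s};"
```

For example, with an assumed 180-second cluster restart, `partner_timeout_seconds(180)` gives 420 seconds. With assumed values of 120 s for load balancer detection, 60 s for the DNS update and a 300 s TTL, `failover_upper_bound_seconds(120, 60, 300)` gives 480 seconds, so a 420-second partner timeout would sit inside that window. `set_partner_timeout_sql("Sales", 420)` then yields the statement to run on the principal.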