Troubleshooting SQL connectivity issues after database migration

sql-server-2008

We have just completed a migration from SQL 2000 to SQL 2008 R2 and have started to intermittently receive SqlExceptions with the following error message:

  • A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 – The semaphore timeout period has expired.)

We have 3 web servers running around 100 applications connecting to this SQL Server (all of the applications access the same 8 databases on the SQL Server).

Because these exceptions were not occurring on the 2000 server, we feel it is unlikely to be an application issue (although we are not ruling that out). Traffic on the web sites is at its typical level, which rules out a high-traffic issue. The old SQL 2000 box had 4 CPUs and 8 GB RAM, while the new one has 16 CPUs and 24 GB RAM (and was underutilized both during the incident and now).
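
For reference, a minimal sketch of a DMV query that can be used to sanity-check how many sessions each web server is holding on the new instance (illustrative only; it assumes VIEW SERVER STATE permission):

    -- Minimal sketch: count user sessions per client host and application,
    -- to see how the three web servers' connections are distributed.
    SELECT
        s.host_name,
        s.program_name,
        COUNT(*) AS session_count
    FROM sys.dm_exec_sessions AS s
    WHERE s.is_user_process = 1
    GROUP BY s.host_name, s.program_name
    ORDER BY session_count DESC;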

These errors occurred over a period of about 5 minutes several hours ago and have not yet recurred.

The sys.dm_os_ring_buffers system view does not show entries for these disconnects, and there are no corresponding event log entries on either the server or the client.
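
For reference, a minimal sketch of one way to inspect the connectivity ring buffer for disconnect records (this assumes SQL Server 2008 or later, where RING_BUFFER_CONNECTIVITY records are available, plus VIEW SERVER STATE permission):

    -- Minimal sketch: pull connectivity ring buffer records, newest first,
    -- so any server-side trace of the disconnects can be examined as XML.
    SELECT
        rb.[timestamp]         AS ms_since_server_start,
        CAST(rb.record AS XML) AS record_xml
    FROM sys.dm_os_ring_buffers AS rb
    WHERE rb.ring_buffer_type = N'RING_BUFFER_CONNECTIVITY'
    ORDER BY rb.[timestamp] DESC;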

Some googling has turned up a few similar reports; however, nothing seems definitive (see links below). Has anyone seen errors like this after migrating from SQL 2000 to SQL 2008 R2?

Links:

Best Answer

We have tracked down and fixed this issue in our environment. The description below is as I understand it (please excuse any inaccuracies; this is how I, as a software developer, understood the explanation given to me by our Network Administrator, who was also working with our hosting company).

The cause was eventually traced to a network configuration issue involving the load balancer. We had assumed that the load balancer sat between the internet and our web servers, and that all of our servers were communicating freely with each other. Unfortunately, the network was set up in such a way that all network traffic (including traffic between the SQL Server and the web servers) passed through the load balancer. The load balancer was configured to limit the bandwidth passing through it, and when that limit was exceeded it simply dropped packets. The limit was often exceeded when large file transfers were occurring between the servers (e.g. when database backups were copied off of the database server). This was hard for us to see because we didn't have access to the load balancer (only our hosting provider did), and as far as we could tell we were nowhere near saturating our network interfaces. Additionally, the issue was extremely sporadic (on the order of a handful of minutes every 3-5 months).

The fix was to rearrange the environment so that our internal network traffic no longer passed through the load balancer; I believe the network was reworked into a one-armed load balancing architecture. Since making this change we have not experienced the intermittent connectivity issues.