What is causing a VM to exhibit packet loss?

packetloss sql-server-2005 vmware-esx windows-server-2003

We have a pretty nice piece of hardware set up to run multiple virtual machines in VMware, and one of the VMs is an instance of Windows Server 2003 running SQL Server 2005. For some reason we occasionally see 10-20 seconds of continuous packet loss to this machine, both from remote machines (my workstation) and from other VMs on the same physical hardware. I am using PingPlotter to keep a close eye on the packet loss.

So far we've turned off flow control on the NIC, but we are already running out of things to try. What might be causing this, and how can I identify the problem?

Note: We also have another server with a very similar configuration that shows the same type of problem to a lesser extent (because it's not used as heavily?).

Best Answer

Interesting. First, let's establish some specifics...

You have an ESX host that is running multiple VMs, right?

One of those VMs is a Windows Server 2003 machine.

You say when you run pings from a "remote" machine to that VM, you see 10-20 seconds of packet loss.

OK, immediate questions:

1) Does the packet loss occur when pinging from one of the other VMs running on that host?

2) Do any of the other VMs on that host (or the host itself) display the same behavior when you ping them in an identical manner from an identical place on the network?

3) Are any of the other VMs running the same operating system as the VM displaying the behavior?

4) Is there any kind of timing pattern? Does it happen every 5 minutes? Is it every so many packets? Do you always lose the same number of packets?

5) When you go into the vSphere console, do you see any kind of performance graph changes that match the timing of your ping loss?

6) Is VMware Tools installed on the VM and up to date?
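
For question 4, it can help to turn the raw ping results into a list of loss bursts so any regular interval stands out. Here is a minimal Python sketch, assuming you can get your ping samples into a time-ordered list of (timestamp, replied) pairs by whatever means you collect them. The function names `find_loss_bursts` and `burst_intervals` are made up for illustration and are not part of PingPlotter or any VMware tool.

```python
from datetime import datetime, timedelta


def find_loss_bursts(samples):
    """Group consecutive lost pings into bursts.

    samples: time-ordered list of (timestamp, replied) tuples, where
    replied is True if the ping got an answer (assumed input format).
    Returns a list of (first_lost, last_lost, lost_count) tuples.
    """
    bursts = []
    start = end = None
    lost = 0
    for ts, replied in samples:
        if not replied:
            if start is None:
                start = ts  # first lost ping of a new burst
            end = ts
            lost += 1
        elif start is not None:
            # a reply arrived, so the current burst is over
            bursts.append((start, end, lost))
            start, lost = None, 0
    if start is not None:
        # the capture ended while pings were still being lost
        bursts.append((start, end, lost))
    return bursts


def burst_intervals(bursts):
    """Seconds between the starts of consecutive bursts."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(bursts, bursts[1:])
    ]
```

If the intervals come out roughly constant (for example, always close to 300 seconds), or the lost count per burst is always the same, that points toward something scheduled rather than random load, and the burst start times are what you would line up against the performance graphs from question 5.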