In setting up a development lab, I've got a desktop system running ESXi 4.1.0 (free license) on SATA RAID 0 (already purchased and configured when I started this job; I'm open to hardware input as it pertains to my problem). Its guests so far include two Win2008 Server R2 64-bit VMs and one Ubuntu 10.04 64-bit VM. I'm installing onto the Windows servers.
We've been copying off some fairly large files (over a gigabyte) for an installation, hoping to install more quickly from a (virtual) hard drive than from the network or from BD-ROM. The problem is that the copies keep coming up with different checksums from the originals. The file sizes are the same, but md5sum reports different numbers (and so does the installer, which refuses to continue when the checksums don't match).
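For anyone wanting to repeat the check without md5sum on hand (on the Windows guests, say), here is a minimal Python sketch of the same comparison. The function names `md5_of_file` and `copies_match` are made up for illustration, and the paths you pass in are whatever your original and copy happen to be. It reads in chunks so a multi-gigabyte file doesn't need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """Return True when both files hash to the same MD5 digest."""
    return md5_of_file(original_path) == md5_of_file(copy_path)
```

Calling `copies_match` on the original and the copy should give the same verdict as comparing the two md5sum outputs by eye.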
I've tried copying directly from the BD-ROM (attaching the OS drive to the host system's physical drive). I've tried copying the large files onto a co-worker's Windows machine from his Blu-Ray drive; when I do that, the checksums match. But when I copy from his machine to the VM guest over a network share, the checksums no longer match.
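A checksum mismatch only says that the files differ, not where. As a rough sketch (the helper name `differing_offsets` and its `limit` parameter are my own invention, not part of any tool mentioned here), something like the following could list the first few byte offsets at which a good copy and a bad copy disagree. It assumes the two files are the same size, which is what I'm seeing.

```python
def differing_offsets(path_a, path_b, chunk_size=1024 * 1024, limit=10):
    """Return up to `limit` byte offsets at which the two files differ.

    Assumes both files have the same length; any extra trailing bytes
    in the longer file are not reported.
    """
    offsets = []
    position = 0  # offset of the current chunk within the files
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while len(offsets) < limit:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both files exhausted
            if chunk_a != chunk_b:
                # Only walk byte by byte when the chunks actually differ.
                for index, (byte_a, byte_b) in enumerate(zip(chunk_a, chunk_b)):
                    if byte_a != byte_b:
                        offsets.append(position + index)
                        if len(offsets) >= limit:
                            break
            position += max(len(chunk_a), len(chunk_b))
    return offsets
```

Running it against the same pair of files after several separate copy attempts would show whether the damage lands at the same offsets each time or moves around.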
Thinking this meant a corrupt destination drive, I deleted it in vSphere and added another freshly created drive. The problem persists. I'm not sure what to try next.
Best Answer
So this was a combination of a bad stick of RAM and a Linux kernel bug affecting SATA. I'd put Ubuntu 10.04 on there, and eventually left memtest86+ running all night (as running it for 1.5 passes before hadn't flushed out the problem).
After I removed the bad RAM, I started seeing SATA errors in /var/log/syslog, similar to this:
I finally discovered this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892?comments=all which led me to try an earlier Linux kernel (the one that ships with Ubuntu 8.04). The machine's been working great ever since.