Troubleshooting ESXi hardware freeze with PCI Passthrough

Tags: hardware, troubleshooting, virtualization, vmware-esxi, vmware-vsphere

I have a Supermicro X9SCM board with an Atheros AR5008 PCI Express card (D-LINK DWA-556, Device=0024&Vendor=168C). The card can be marked for PCI passthrough in ESXi without problems (I tried versions 4.1 and 5.0), but each time I start a VM with the Wi-Fi card assigned, the entire host freezes and requires a hard reset.

There is a good chance this card is just not compatible for some reason, though I could find at least one report of it "working", or at least of the guest being able to boot. I would really like to understand why it is failing, though. I have tried digging into log files and other resources to learn how best to troubleshoot this, but I am far from an expert with VMware tools.

Here is what I have looked at so far:

  • BIOS: I tried the latest version (1.1a) and one older version (1.0c).
  • The BIOS has a log that reports "PCI ERR" or "PCI ERR – Asserted" whenever this freeze event happens.
  • I grabbed the various logs from /var/log on the ESXi host, but I haven't found anything useful in them yet. Maybe I don't know where to look.
  • I tried adding the PCI card to the passthru.map file to hint to ESXi how it should handle the device, with no luck. (Note: I haven't tried all combinations of reset method / fptShareable yet.)
  • I have read there may be a difference with "Active" PCI Express cards. I believe this refers to Active State Power Management (ASPM), though I am not sure how to check for it.
  • I have contacted Supermicro support to ask whether there is a known issue with the BIOS / hardware, but I haven't heard back. I also tried to post on the VMware community forums, but I haven't been able to activate my account for some strange reason.
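
The untried reset method / fptShareable combinations mentioned above can be enumerated mechanically. This is only a minimal Python sketch. It assumes each passthru.map line has the form `vendor-id device-id resetMethod fptShareable`, and that the accepted values are the ones listed in the comment header of the stock file (worth verifying on the host before relying on them). `passthru_entries` is a hypothetical helper name, not part of any VMware tool.

```python
from itertools import product

# Device IDs taken from the question (Atheros AR5008, D-LINK DWA-556).
VENDOR_ID = "168c"
DEVICE_ID = "0024"

# ASSUMPTION: values as recalled from the comment header of the stock
# passthru.map file; check the file on your own host before using them.
RESET_METHODS = ("flr", "d3d0", "link", "bridge", "default")
FPT_SHAREABLE = ("default", "false")


def passthru_entries(vendor=VENDOR_ID, device=DEVICE_ID):
    """Return one candidate passthru.map line per reset/fptShareable pair."""
    return [
        f"{vendor} {device} {reset} {shareable}"
        for reset, shareable in product(RESET_METHODS, FPT_SHAREABLE)
    ]


if __name__ == "__main__":
    # Print the candidate lines so they can be tried one at a time.
    for line in passthru_entries():
        print(line)
```

Only one of these lines should be present for the device at a time, and the host generally needs a reboot after each change for it to take effect, so working through the list is slow but systematic.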

Again, my real question is: how do I go about understanding why this device causes the hypervisor to lock up when it is assigned to a guest?

Best Answer

It is likely not the hypervisor that is locking up, but some piece of hardware (such as the PCIe switch). You would have a hard time debugging this without PCIe debugging hardware and a good deal of PCIe-specific knowledge, so it is probably not worth pursuing. In general, PCI passthrough is not something you should use without a great deal of consideration.

If you need a wireless-connected interface on the virtual machine, consider using an external device (router/bridge) that bridges the wireless network to a wired one, and connect the virtual machine to that network through an ordinary virtual interface. Another option would be a USB wireless interface together with a USB network redirector.