Uncorrectable machine check

hardware hp hp-proliant

I am experiencing rare but real unrecoverable machine checks on an HP DL370 G6 dual-core Xeon server. I have already run memtest86+ and CPU-intensive workloads without any problems.

In your opinion, does this indicate a real problem, or is it normal and expected behavior?

How would you approach this problem?

EDIT: After some troubleshooting, it seems that these machine checks, as well as problems when opening Device Manager, can be traced back to the NC375i NICs. All is well when the NICs are not in the server.

History of machine checks


Further stability improvements for HP Gen6 servers with Intel Xeon CPUs came with the BIOS update on the September 2013 HP Update DVD. Intel's newer microcode makes these CPUs much more stable. We haven't seen any hardware-related BSODs since the update in September.

Best Answer

The NC375 NICs are problem-ridden. I've had a LOT of bad luck with them across the various customers we work with: total loss of connectivity across all ports, lockups, etc. HP has also issued multiple updates to its critical advisories for this NIC. Our general rule with these NICs is to replace them with something else if you can; otherwise, ask HP to replace them with a newer hardware revision and grab the latest firmware/drivers as soon as possible.