Linux – ‘Memory read error’, Server hardware error

hardware linux

I got an error on my server, which is running CentOS 5.5.

MCE 20
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 8 TSC 6ab9ff9745f62 [at 2394 Mhz 9 days 1:50:52 uptime (unreliable)]
MISC cf36ad0100081186 ADDR 203376500 
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0

What is the problem?
Is it a memory module error or a memory controller error?

Best Answer

If you can restart the machine and get into the BIOS, you may be able to see whether it reports a failed DIMM.

Your OS has reported a hardware fault through the CPU's machine check mechanism (the "MCE" in the log), and the decoded lines point at a memory read. You now need to work out which component is actually at fault. Before anything else, back up your data to another machine or a USB drive if you can, and then start troubleshooting.
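As a rough illustration of what the STATUS value in the question encodes, here is a minimal Python sketch that splits it into fields. It assumes the architectural IA32_MCi_STATUS layout described in Intel's Software Developer's Manual; the function name `decode_mci_status` and the dictionary keys are made up for this example, and the corrected-error count field is only meaningful on CPUs that implement it.

```python
# Sketch only: field positions assume the architectural IA32_MCi_STATUS layout.
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEMORY_OPS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into a dict of named fields."""
    mca_code = status & 0xFFFF
    result = {
        "flags": {name: bool((status >> bit) & 1) for bit, name in FLAGS.items()},
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,
        # Bits 52:38; assumption: the CPU implements this counter.
        "corrected_count": (status >> 38) & 0x7FFF,
    }
    # Memory controller errors look like 0000 0000 1MMM CCCC
    # (bit 12, the filter bit, is ignored by the mask).
    if (mca_code & 0xEF80) == 0x0080:
        channel = mca_code & 0xF
        result["operation"] = MEMORY_OPS.get((mca_code >> 4) & 0x7, "reserved")
        result["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return result


if __name__ == "__main__":
    info = decode_mci_status(0x8C0000400001009F)  # STATUS from the question
    for key, value in info.items():
        print(key, "=", value)
```

For the value in the question, the UC bit is clear and the error code decodes to a memory read with no channel specified, which is consistent with the mcelog text above. A clear UC bit usually means the hardware (for example ECC) corrected the error, but the register alone does not tell you whether a DIMM or the controller is to blame.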

Depending on your server hardware, the motherboard may detect the fault itself and light an LED beside the faulty DIMM.

Check your hardware vendor's website for a diagnostic tool that can test the hardware. Sometimes this can be done without rebooting, but you will likely have to create a boot disk to run their tool.
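If a reboot is not possible right away, one thing that can sometimes be checked from the running system is the kernel's EDAC error counters. The sketch below is only an illustration under several assumptions: that an EDAC driver for your memory controller is loaded (it may not be on a stock CentOS 5.5 kernel), that the counters are exposed under `/sys/devices/system/edac/mc` in the per-csrow layout, and that a Python 3 interpreter is available. The function name `read_edac_counts` is hypothetical.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {"mcN/csrowM": (corrected, uncorrected)} for each csrow found.

    Returns an empty dict if no EDAC driver has populated the directory.
    """
    counts = {}
    pattern = os.path.join(edac_root, "mc*", "csrow*", "ce_count")
    for ce_path in sorted(glob.glob(pattern)):
        csrow_dir = os.path.dirname(ce_path)
        label = os.path.relpath(csrow_dir, edac_root)
        with open(ce_path) as f:
            corrected = int(f.read().strip())
        # ue_count normally sits next to ce_count; treat a missing file as 0.
        ue_path = os.path.join(csrow_dir, "ue_count")
        uncorrected = 0
        if os.path.exists(ue_path):
            with open(ue_path) as f:
                uncorrected = int(f.read().strip())
        counts[label] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC counters found (driver not loaded or not supported).")
    for label, (ce, ue) in found.items():
        print(label, "corrected:", ce, "uncorrected:", ue)
```

A csrow whose corrected count keeps climbing is a reasonable first suspect, though mapping a csrow to a physical DIMM slot depends on the board, so the vendor's documentation or diagnostic tool is still needed to confirm it.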