Dell PowerEdge C1100: DIMMs replaced after MRC failure, new memory not recognized in previously failed slots

dell dell-poweredge memory

The server is a Dell PowerEdge C1100 with BIOS version DS993B22 (as reported via BMC IPMI). All 18 memory slots are populated with 4 GB modules.

The following memory errors occurred:

MRC Event: Memory sensor, MRC Warning(1B.01): Lane failures during Dqs clean-up!
MRC Event: Memory sensor, MRC Warning(1C): Hardware Memtest failed and the DIMM is disabled. Node 1, Channel F, DIMM 0.
MRC Event: Memory sensor, MRC Warning(0B): DIMM was disabled due to MemTest errors. Node 1, Channel F, DIMM 0.
MRC Event: Memory sensor, MRC Warning(0B): DIMM was disabled due to MemTest errors. Node 1, Channel F, DIMM 1.
MRC Event: Memory sensor, MRC Warning(0B): DIMM was disabled due to MemTest errors. Node 1, Channel F, DIMM 2.

I replaced Channel F, DIMM 0, 1, and 2 modules with known good modules, and booted the system. Both BIOS and syslog show no memory problems, but the server only shows 72 GB of memory in the BIOS and POST.

I shut the server down and replaced all of the Channel D and Channel E modules with known-good modules as well, so that all modules for CPU1 are identical. After booting the server back up, the issue is the same: only 72 GB is shown in the BIOS and POST.

The memory mode section of the BIOS shows Independent mode set. Prior to the memory errors mentioned above, the system did have a fully functioning 96 GB.

Is there a setting somewhere to enable the "disabled" slots? I was unable to locate any such setting in the BIOS or BMC screens.

These are the SEL Event Data codes for this issue. Some codes appeared more than once, but only once for each slot:

AF2900 WARN_DQS_TEST_MINOR_CLEANUP
AF2B60 WARN_MEM_TEST
AF1760 WARN_MEM_TEST_DIMM_DISABLE
AF1764 WARN_MEM_TEST_DIMM_DISABLE
AF1768 WARN_MEM_TEST_DIMM_DISABLE
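
For anyone tallying these by hand, the codes listed above can be kept in a small lookup table. This is only a sketch built from the five code/name pairs in this post; it is not Dell's decoder, the names `SEL_CODE_NAMES` and `summarize` are made up for illustration, and it makes no attempt to interpret the individual bytes.

```python
from collections import Counter

# Code -> warning name, copied verbatim from the list in this post.
# Nothing here interprets the individual bytes; that meaning is not documented here.
SEL_CODE_NAMES = {
    "AF2900": "WARN_DQS_TEST_MINOR_CLEANUP",
    "AF2B60": "WARN_MEM_TEST",
    "AF1760": "WARN_MEM_TEST_DIMM_DISABLE",
    "AF1764": "WARN_MEM_TEST_DIMM_DISABLE",
    "AF1768": "WARN_MEM_TEST_DIMM_DISABLE",
}


def summarize(codes):
    """Count how often each warning name occurs; unknown codes are kept as-is."""
    return Counter(
        SEL_CODE_NAMES.get(code.upper(), f"UNKNOWN({code})") for code in codes
    )


summary = summarize(["AF2900", "AF2B60", "AF1760", "AF1764", "AF1768"])
for name, count in summary.items():
    print(f"{name}: {count}")
```

The three WARN_MEM_TEST_DIMM_DISABLE entries line up with the three "DIMM was disabled" MRC events for Channel F, DIMM 0, 1, and 2.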

The "How to decode raw data on ECC memory errors for the PowerEdge C1100, C2100, C6100, C6105, and C6145" tool provided by Dell doesn't decode those errors.

For reference, here's the slot layout from the Dell PowerEdge C1100 Hardware Owner's Manual on Dell.com:

[Figure: Memory Socket Location on C1100 System Board]

Best Answer

This issue turned out to be a combination of outdated docs and user error.

The C1100 has 18 memory slots, each holding a 4 GB module.

18 * 4 = 72

These servers have only 72 GB of RAM, not 96 GB. Since the server shows 72 GB functioning after the RAM replacement, everything is fine.
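
The same sanity check can be written out as a short script. This is just a sketch of the arithmetic above, assuming every slot holds an identical module; the function names are made up for illustration.

```python
def total_memory_gb(slots: int, module_gb: int) -> int:
    """Total memory when every slot holds an identical module."""
    return slots * module_gb


def slots_needed(target_gb: int, module_gb: int) -> int:
    """Number of identical modules needed to reach target_gb, rounded up."""
    return -(-target_gb // module_gb)


C1100_SLOTS = 18  # slot count from the question
MODULE_GB = 4     # module size from the question

installed = total_memory_gb(C1100_SLOTS, MODULE_GB)
needed_for_96 = slots_needed(96, MODULE_GB)

print(f"Fully populated: {installed} GB")
print(f"Slots needed for 96 GB with {MODULE_GB} GB modules: {needed_for_96}")
```

With 4 GB modules, 96 GB would take 24 slots, which is more than the 18 the board has, so 72 GB is the expected total for this configuration.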