Memory – Does it Make Sense to Install Online Spare Memory?

hp, hp-proliant, memory

I'm using an HP DL360p Generation 8. I need a fairly reliable server, so I'm running RAID 1 with a spare drive, and I've added a second power supply. Should I also install online spare memory, or is that just a waste of money?

Best Answer

It's not worth it. With ECC RAM and the HP management agents running, bad memory is easy to detect. There are typically several warnings, and chances to intervene, before you see a major problem that affects operation. Under standard support, RAM replacement is next-business-day, so there's no need to complicate your RAM arrangement by adding spare DIMMs.

The worst HP ProLiant memory issue I've had eventually crashed the server after several ECC alerts over the course of a week. The errors came, the server rebooted via ASR (Automatic Server Recovery), and the machine came back up with the bad DIMM disabled. This was an HP ProLiant DL580 G4 system, and the error logs were as follows...

0004 Repaired       22:21  12/01/2008 22:21  12/01/2008 0001
LOG: Corrected Memory Error threshold exceeded (Slot 1, Memory Module 1)

0005 Repaired       20:41  12/06/2008 20:43  12/06/2008 0002
LOG: POST Error: 201-Memory Error Single-bit error occured during memory initialization, Board 1, DIMM 1. Bank containing DIMM(s) has been disabled.
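
If you would rather script the check than read the log by eye, here is a rough Python sketch that pulls memory-related entries out of text shaped like the excerpt above. The header layout (entry number, status, first time/date, last time/date, count) and all field names are my assumptions, inferred only from the two entries shown; this is not an official HP format description, so adjust the pattern to whatever your tooling actually emits.

```python
import re

# Header pattern inferred from the two sample entries (an assumption):
# entry number, status, first time/date, last time/date, count.
HEADER = re.compile(
    r"^(?P<num>\d{4})\s+(?P<status>\w+)\s+"
    r"(?P<first_time>\d{2}:\d{2})\s+(?P<first_date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<last_time>\d{2}:\d{2})\s+(?P<last_date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<count>\d{4})$"
)


def parse_entries(text):
    """Split log text into entries; message lines are appended to the latest header."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = match.groupdict()
            current["message"] = ""
            entries.append(current)
        elif current is not None:
            if line.startswith("LOG:"):
                line = line[4:].strip()
            # Wrapped message lines are joined with a single space.
            current["message"] = (current["message"] + " " + line).strip()
    return entries


def memory_entries(entries):
    """Keep only entries whose message mentions memory."""
    return [e for e in entries if "memory" in e["message"].lower()]


SAMPLE = """\
0004 Repaired       22:21  12/01/2008 22:21  12/01/2008 0001
LOG: Corrected Memory Error threshold exceeded (Slot 1, Memory Module 1)

0005 Repaired       20:41  12/06/2008 20:43  12/06/2008 0002
LOG: POST Error: 201-Memory Error Single-bit error occured during memory initialization, Board 1, DIMM 1. Bank
containing DIMM(s) has been disabled.
"""

if __name__ == "__main__":
    for entry in memory_entries(parse_entries(SAMPLE)):
        print(entry["num"], entry["first_date"], entry["message"])
```

The point is only that the corrected-error warning shows up days before the DIMM is disabled, so an alert on the first kind of entry gives you time to schedule a replacement.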

Back in the day, I installed many HP ProLiant DL740 servers that featured a RAID5-style memory array. So a 16GB RAM server actually had 20GB installed in hot-swappable banks of 8 DIMMs. Across the dozens of those servers that I deployed and ran for 5+ years, I only had one DIMM fail. Figures...
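
To make the 16GB-usable-from-20GB arithmetic concrete, here is a small Python sketch of generic RAID 5-style XOR parity. It assumes five equal banks with one bank's worth of capacity spent on parity, which is what the 20:16 ratio implies; the bank count and the XOR scheme are my illustration of the general technique, not a description of HP's actual implementation.

```python
from functools import reduce


def usable_capacity_gb(installed_gb, banks):
    """Usable capacity when one bank's worth of space holds parity."""
    return installed_gb * (banks - 1) / banks


def parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks (one per data bank) plus one parity block.
data = [b"\x10\x20", b"\x0f\xf0", b"\xaa\x55", b"\x01\x02"]
p = parity(data)

# Simulate losing the third bank, then rebuild it from the survivors and parity.
survivors = data[:2] + data[3:]
rebuilt = parity(survivors + [p])

if __name__ == "__main__":
    print(usable_capacity_gb(20, 5))  # 16.0
    print(rebuilt == data[2])         # True
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining ones, which is what lets a failed bank be swapped while the system keeps running.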

Edit:
You're planning to use this in a high-frequency trading environment, and you asked about latency with spare RAM in a server like this. Typically, for low-latency applications, I disable the memory pre-failure checks on my host systems. This is the recommendation from HP on page 7 of their "Configuring the HP ProLiant Server BIOS for Low-Latency Applications" white paper. It's a matter of monitoring and risk. I rarely have DIMMs fail. Do you care more about speed or resiliency? You won't get both at the hardware level...
