Detect Supermicro DIMM (memory) errors as reported during POST without rebooting

ipmi ipmitool memory supermicro

One of our Supermicro servers reports an error like this during POST:

Failing DIMM: DIMM location (Correctable memory component found)

DIMMB2

I can also see this in the Health Event Log in the IPMI web interface:

Failing DIMM: DIMM location. (Correctable memory component found) (DIMMB2)

Until I rebooted it (for unrelated reasons), the server had been running fine, so I had no idea anything was wrong with its RAM. Is there any way to find errors like this without rebooting the server, e.g. with some ipmitool command?

If not, is there at least a scriptable way to see these errors after a server has been rebooted, i.e. without using the web interface? I tried ipmitool sel elist, but it shows these entries as "Unknown" events:

5 | 10/11/2019 | 11:21:25 | Unknown #0xff | | Asserted
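For scripting, a line like the one above can at least be split into fields so that undecoded entries are flagged for a closer look. The following is a minimal sketch, assuming every record uses the six pipe-separated columns shown above; the names SelEntry, parse_sel_elist and undecoded are made up for this illustration and are not part of ipmitool.

```python
from typing import List, NamedTuple


class SelEntry(NamedTuple):
    """One row of `ipmitool sel elist` output (hypothetical helper type)."""
    record_id: str
    date: str
    time: str
    sensor: str
    description: str
    state: str


def parse_sel_elist(text: str) -> List[SelEntry]:
    """Split pipe-separated SEL lines into fields.

    Assumption: each record has at least six columns, as in the
    example line from the question. Shorter lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 6:
            continue
        entries.append(SelEntry(*fields[:6]))
    return entries


def undecoded(entries: List[SelEntry]) -> List[SelEntry]:
    """Return the entries whose sensor column ipmitool could not decode."""
    return [entry for entry in entries if entry.sensor.startswith("Unknown")]
```

Feeding it the captured output of ipmitool sel elist and checking whether undecoded(...) is non-empty gives a crude alert that something was logged which ipmitool could not name. It does not say what the event was.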

Edit: I found that Supermicro's proprietary tool, IPMICFG, can show these events (IPMICFG-Linux.x86_64 -sel list) but it would still be nice to have a way to do this with ipmitool and, most importantly, without rebooting.

Best Answer

Try FreeIPMI instead (ipmi-sel, for instance): there's a good chance it will give you more information than ipmitool, as its codebase is more actively maintained.
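To make that scriptable, ipmi-sel can be wrapped in a small script that keeps only the memory-related lines. This is a rough sketch, not a tested recipe: it assumes ipmi-sel is installed and can reach the BMC (usually this needs root), and that the decoded event text mentions "DIMM" or "memory" as the web interface does. Whether the --interpret-oem-data option actually decodes this particular Supermicro event depends on the board and the FreeIPMI version. The function names and the keyword list are made up for this example.

```python
import subprocess
from typing import Iterable, List

# Assumption: decoded memory events contain one of these words,
# as the Health Event Log text in the question does.
MEMORY_KEYWORDS = ("dimm", "memory")


def filter_memory_events(lines: Iterable[str]) -> List[str]:
    """Keep only lines that mention a memory-related keyword (case-insensitive)."""
    return [
        line for line in lines
        if any(keyword in line.lower() for keyword in MEMORY_KEYWORDS)
    ]


def read_sel_with_freeipmi() -> List[str]:
    """Run ipmi-sel and return its output lines.

    Raises CalledProcessError if ipmi-sel exits with a non-zero status,
    and FileNotFoundError if it is not installed.
    """
    result = subprocess.run(
        ["ipmi-sel", "--interpret-oem-data"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling filter_memory_events(read_sel_with_freeipmi()) from a cron job or a monitoring check, and alerting when the result is non-empty, would surface such entries without opening the web interface.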