Linux – Clear Dell OpenManage SBE memory log of all and specific connectors without rebooting the server

dell-openmanage linux

Running omreport chassis results in:

Health

Main System Chassis

SEVERITY : COMPONENT
Ok       : Fans
Ok       : Intrusion
Critical : Memory
Ok       : Power Management
Ok       : Processors
Ok       : Temperatures
Ok       : Voltages
Ok       : Hardware Log
Ok       : Batteries

For further help, type the command followed by -?

Running dcicfg command=clearmemfailures to clear the single-bit error (SBE) failures results in:

Clearing failures using mask: 31
DIMM_X1 : failed status: 270

Based on this message, I assumed that the command has to be issued against the specific memory module that is causing the issue.

Consulting the help by executing dcicfg command=clearmemfailures -? resulted in:

Dell(R) Data Engine Data Engine Configuration Utility  7.4.0 (BLD_1)
Copyright (C) Dell Inc. 1995-2013

Usage: dcicfg command=COMMAND [PARAMETERS...] [OPTIONS...]

COMMAND:
  clearmemfailures    Clear memory device failure mode

PARAMETERS:
  listonly=BOOLN      (opt.) list all occupied memory connectors
  connectors=STRING   (opt.) memory device connector name (default=all)
  failures=STRING     (opt.) failure type to clear (default=all)

Running omreport chassis memory indicates which memory is causing the issue:

Index          : 3
Status         : Critical
Connector Name : DIMM_Y1
Type           : DDRY - Synchronous Unregistered (Unbuffered)
Size           : Y  MB

and issuing dcicfg command=clearmemfailures connectors=DIMM_Y1 indicated that the memory connector cannot be found:

Clearing failures using mask: 31
failed to find any memory connector based on the names provided
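
Since the help text lists a listonly parameter, one way to check which connector names dcicfg itself recognizes might be the sketch below. The value true for BOOLN and the DCICFG variable are assumptions; the help output does not show the accepted boolean syntax, so it may need adjusting.

```shell
# Assumption: dcicfg is on PATH; override DCICFG to point elsewhere if not.
DCICFG="${DCICFG:-dcicfg}"

list_dcicfg_connectors() {
    # listonly is taken from the help output above.
    # "true" is an assumed spelling of the BOOLN value.
    "$DCICFG" command=clearmemfailures listonly=true
}
```

If the names printed differ from the Connector Name reported by omreport, that difference could explain the "failed to find any memory connector" message.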

omreport chassis memory index=3 indicates that the memory has thrown SBEs:

Memory Device Information

Health : Critical

Status      : Critical
Device Name : DIMM_Y1
Size        : Y MB
Type        : DDRY Synchronous Unregistered (Unbuffered)
Speed       : Y ns
Rank        : Dual
Failures    : Single-bit warning error rate exceeded.
              Single-bit failure error rate exceeded.
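
On a server with many DIMMs, the affected connectors can be picked out of the omreport chassis memory output with a small filter. This is a minimal sketch that assumes each record lists a "Status" line before its "Connector Name" line, as in the output shown earlier; the function name critical_connectors is made up for illustration.

```shell
# Print the connector name of every memory device whose status is not "Ok".
# Reads omreport-style "Key : Value" lines from standard input.
critical_connectors() {
    awk -F'[ \t]*:[ \t]*' '
        # Trim leading and trailing whitespace; this also re-splits the fields.
        { gsub(/^[ \t]+|[ \t]+$/, "") }
        # Remember the most recent status line.
        $1 == "Status" { status = $2 }
        # Report the connector if its preceding status was not Ok.
        $1 == "Connector Name" {
            if (status != "" && status != "Ok") print $2
            status = ""
        }
    '
}
```

Usage: omreport chassis memory | critical_connectors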

Questions

  1. What does the failed status 270 mean?
  2. Why can the memory connector not be found even though it was specified and exists?
  3. How can the SBEs be cleared?

Attempts to solve the issue

The following commands from this Q&A:

  1. sudo omconfig system esmlog action=clear
  2. sudo omconfig system alertlog action=clear

were issued to clear the SBE, but the Critical memory status persists.

Best Answer

I had trouble clearing the SBE log using dcicfg. The steps below worked for me:

Download the Dell Support Live Image (the download link is at the bottom of the linked page), then:

  1. Boot the system using the "DOS-Based Diagnostic Tools (Dell 9G-10G servers)" or "DOS-Based Diagnostic Tools (Dell 11G servers)" option. The Customer Diagnostic Menu Ver 1.6 is displayed.
  2. When the "Enter option or letter" prompt is displayed, press the <4> key. The MS-DOS prompt is displayed.
  3. Type C:, and then press <Enter>. The current drive changes to C:.
  4. Type "mpmemory -ptech -tlogclr", and then press <Enter>.