Drive faults not logged in HP IML on RHEL-based ProLiant server

Tags: hp, hp-proliant, monitoring, rhel5

Today I have had disks on two separate HP ProLiant servers go into Predictive Failure. One of these servers runs Windows Server 2008 R2 and one runs Oracle Enterprise Linux 5 (a RHEL5-based distro).

If I look in the Integrated Management Logs for these servers, the Windows server has a 'Caution' entry announcing the Predictive Failure, but the OEL server does not have the same.

We have some existing business process around the IML (ticket integration, reporting, etc.), hence the preference to have these messages there. All the right bells and whistles sounded for the Windows box, but nothing from the OEL server.

I've gone back through my monitoring system's alert history, and it shows that this has always been the case: the Windows server reports its disk failures (predictive and actual), while the OEL server does not.

SNMP trap alerts appear to be working; these are logged in root's mail file and are captured in the /var/log/messages file. Interestingly, the IML on the OEL server does appear to be showing me Repaired entries for previous disk failures. It is just the initial Caution or Failure entry that appears to be missing from the log.

The Windows server has all the HP Management agents installed as part of the Intelligent Provisioning/Smart Start install of the OS. The OEL server has the RHEL5 HP yum repo enabled, and has the hpsmh, hpilo, hp-health and hp-snmp-agents packages installed.

The Windows server is a DL380p Gen8, while the OEL server is a DL380 G7. I have no other server generations running OEL to compare (although it does appear to be common to the three DL380 G7 servers I have running OEL). Further checking shows IML-logged drive errors on other Windows servers, at least as far back as G5 (so I don't think it is a generation issue).

I've also looked at the startup/config scripts in /opt/hp/hp-snmp-agents/storage/etc/cma* but can't see anything that pertains to the IML (not that I really know what I am looking for here).

Is it a missing package or config statement (i.e. something readily rectifiable) that is preventing these messages reaching the IML?

Or is it a known issue (leaving me no choice but to hack something else into the business process)?

Best Answer

I don't think you should rely on the HP IML log alone. Not everything is reported there, and the log can be cleared. I don't look at it as an authoritative source of system health status. Plus items get marked as repaired, depending on the event.

If you need a comparison of what a busy EL5 system's IML log should look like, see this pastebin. Most of my IML logs have been cleared at some point, though. For example:

# hplog -v

ID   Severity       Initial Time      Update Time       Count
-------------------------------------------------------------
0000 Information    03:14  02/26/2014 03:14  02/26/2014 0001
LOG: Maintenance note: IML cleared through hpasmcli

0001 Repaired       20:09  05/07/2014 02:38  09/08/2014 0005
LOG: Network Adapter Link Down (Slot 0, Port 1)

0002 Information    05:29  06/30/2014 05:29  06/30/2014 0001
LOG: Firmware flashed (iLO 4 1.51)

0003 Information    03:07  08/12/2014 03:07  08/12/2014 0001
LOG: Firmware flashed (iLO 4 2.00)
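
If you want to feed IML entries into reporting, output in the format above is easy to filter. The following is only a rough sketch: the function name iml_filter is made up, and it assumes every entry is a header line (four-digit ID, then a one-word severity) followed by a "LOG:" line, as in the sample. I have not checked it against every event type.

```shell
#!/bin/sh
# iml_filter SEVERITY
# Reads `hplog -v` style text on stdin and prints "ID SEVERITY: message"
# for every entry whose severity column equals SEVERITY.
# Assumption: the layout matches the sample output shown above.
iml_filter() {
    awk -v want="$1" '
        # Entry header: four-digit ID, then the severity in column 2.
        /^[0-9][0-9][0-9][0-9] / { id = $1; sev = $2; next }
        # The LOG: line that follows carries the message for that entry.
        /^LOG: / && sev == want  { sub(/^LOG: /, ""); print id, sev ": " $0 }
    '
}
```

Typical use would be something like `hplog -v | iml_filter Caution`. On a box where the Caution entry never reaches the IML, as in the question, that will of course print nothing.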

The HP management agents in Linux can easily be set to send SNMP traps and also email.

Typical config in /etc/snmp/snmpd.conf:

# Following entries were added by HP Insight Management Agents at
#      Wed Feb 26 03:12:45 PST 2014
dlmod cmaX /usr/lib64/libcmaX64.so
rwcommunity  bigbanana
rocommunity  bigbanana
syscontact Systems <systems@bigbanana.net>
syslocation Anaheim, CA

And in /opt/hp/hp-snmp-agents/cma.conf:

########################################################################
# trapemail is used for configuring email command(s) which will be
# executed whenever a SNMP trap is generated.
# Multiple trapemail lines are allowed.
# Note: any command that reads standard input can be used. For example:
#             trapemail /usr/bin/logger
#       will log trap messages into system log (/var/log/messages).
########################################################################
trapemail /bin/mail -s 'HP Insight Management Agents Trap Alarm - Big Banana' systems@bigbanana.net
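
Since trapemail accepts any command that reads standard input, one possible way to get drive traps into an existing ticketing process is a small wrapper script instead of (or alongside) mail. The sketch below is hypothetical. The script path, the "hp-trap" syslog tag, the daemon.warning priority and the --log switch are my own choices, not part of the HP agents. Because the exact trap text the agents pipe in is not shown here, the script treats it as opaque text and just flattens it to one line.

```shell
#!/bin/sh
# Hypothetical trapemail handler. cma.conf would reference it as, e.g.:
#   trapemail /usr/local/bin/hp-trap-to-syslog.sh --log
# (path, tag and priority are assumptions chosen for illustration).

# format_trap: collapse the multi-line trap text on stdin into a single
# line so it lands in syslog as one record.
format_trap() {
    tr '\n' ' ' | sed 's/  */ /g; s/^ //; s/ $//'
}

# Only write to syslog when called with --log, so the function above
# can be sourced and tried out without side effects.
if [ "${1:-}" = "--log" ]; then
    format_trap | logger -t hp-trap -p daemon.warning
fi
```

A fixed tag makes the entries easy to pick out of /var/log/messages or to route with a syslog rule. Multiple trapemail lines are allowed, so the mail alert can stay in place.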

Installing the HP management agents for Linux should be straightforward. You'll want the following packages:

hp-snmp-agents, hpssa, hp-health, hp-smh-templates, hpsmh, hpssacli, hponcfg
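
To see which of the packages listed above are missing from a given server, something along these lines should work. It is a sketch: the function name and the PKG_QUERY override are made up for illustration, and whether every package is available for your release depends on the HP repo you have enabled.

```shell
#!/bin/sh
# Package names are taken from the list above.
HP_PACKAGES="hp-snmp-agents hpssa hp-health hp-smh-templates hpsmh hpssacli hponcfg"

# missing_packages: print each package that the query command reports as
# not installed. PKG_QUERY defaults to `rpm -q` and is only overridable
# so the function can be exercised on a machine without rpm.
missing_packages() {
    for pkg in $HP_PACKAGES; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Running `missing_packages` on the OEL server lists whatever is absent, and the result can then be passed to yum, e.g. `yum install $(missing_packages)`.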