Debian – Email notifications about hardware RAID status

Tags: debian, hardware-raid, hp, hp-proliant, snmp

I have a server with an HP Smart Array hardware RAID controller. To monitor its status, I use cpqarrayd. /etc/default/cpqarrayd contains DAEMON_OPTS="-t localhost:162" so that SNMP traps are sent when something happens. The traps are handled by snmptrapd; /etc/snmp/snmptrapd.conf contains:

disableAuthorization yes
traphandle default mailx -s "SNMP Trap" admin@example.com

The e-mails received this way contain the SNMP traps, but they are not human-readable: it is impossible to tell what they are about, or even whether they were issued by cpqarrayd.
Is it possible to send human-readable e-mails when the RAID status changes?
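One way to make the snmptrapd mails easier to read is to put a small formatting script between snmptrapd and mailx. snmptrapd gives a traphandle program the sender's hostname on the first line of standard input, its transport address on the second, and one "OID value" pair on each following line. The sketch below relies only on that; the function name format_trap, the output layout, and the script path are illustrative assumptions, and OIDs appear as readable names only if snmptrapd has the relevant MIBs loaded.

```shell
#!/bin/sh
# Sketch of a traphandle filter: turns snmptrapd's stdin into a labelled report.
# format_trap is a hypothetical helper name, not part of Net-SNMP.

format_trap() {
    # Line 1: hostname of the sender; line 2: its transport address.
    IFS= read -r host
    IFS= read -r addr
    printf 'Trap received from: %s\n' "$host"
    printf 'Transport address:  %s\n' "$addr"
    printf 'Variables:\n'
    # Remaining lines: "OID value"; the first word is the OID, the rest the value.
    while read -r oid value; do
        [ -n "$oid" ] && printf '  %s = %s\n' "$oid" "$value"
    done
    return 0
}

# Example wiring (assumption: script saved as /usr/local/bin/trap-mail):
#   traphandle default /usr/local/bin/trap-mail
# with the script ending in:
#   format_trap | mailx -s "SNMP Trap" admin@example.com
```

This does not decode what a trap means; it only labels the fields, which at least shows which host and which enterprise OID a mail came from.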

Solution

The following script, placed in /etc/cron.hourly, mails the volume status whenever it changes:

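#!/bin/sh
# Mail the cciss_vol_status output whenever it differs from the previous run.

CCISS_DEVICE=/dev/cciss/c0d1
STATUS_FILE=/var/cciss_vol_status
# mktemp replaces $TMPDIR/$RANDOM, which are not reliably set under /bin/sh.
TMP_FILE=$(mktemp) || exit 1

# Keep the previous status for comparison (it does not exist on the first run).
[ -f "$STATUS_FILE" ] && mv "$STATUS_FILE" "$TMP_FILE"
cciss_vol_status "$CCISS_DEVICE" >"$STATUS_FILE"

if ! cmp -s "$STATUS_FILE" "$TMP_FILE"; then
    mailx -s "CCISS status changed" admin@example.com <"$STATUS_FILE"
fi

rm -f "$TMP_FILE"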

Best Answer

First, see: How do I get my HP servers to email me when a drive fails?

In short, the HP SNMP Management Agents, installed as part of the Service Pack for ProLiant or the Management Component Pack (for Debian), provide the proper alerts for the system's health. This includes traps for disks, the array controller, fans, temperature, power supplies, iLO, NICs, etc.

This is fully supported under Debian. You will find the downloads in the HP Software Delivery Repository.

There are two parts to this, both configured automatically by the installer:

In your snmpd.conf file:

# Following entries were added by HP Insight Management Agents at
#      Thu Mar 18 04:14:43 PDT 2010
dlmod cmaX /usr/lib64/libcmaX64.so

That registers the HP health agents with SNMP.

And the /opt/hp/hp-snmp-agents/cma.conf file:

############################################################
#
# cma.conf: HP Insight Management Agents configuration file
#
############################################################

########################################################################
# trapemail is used for configuring email command(s) which will be
# executed whenever a SNMP trap is generated.
# Multiple trapemail lines are allowed.
# Note: any command that reads standard input can be used. For example:
#             trapemail /usr/bin/logger
#       will log trap messages into system log (/var/log/messages).
########################################################################
trapemail /bin/mail -s 'HP Insight Management Agents Trap Alarm' alerts@brazzers.com

Typical RAID alert emails will look like:

Trap-ID=3040

Accelerator Board Battery status change, slot number: 1.
Battery failed. Status: Failed..

or

Trap-ID=3034

Logical Drive Status Change: Slot 1, Drive: 2.Status is now Rebuilding.

or

Trap-ID=3034

Logical Drive Status Change: Slot 1, Drive: 1.Status is now OK.
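Because trapemail accepts any command that reads standard input, the fixed subject line can be replaced with one derived from the message itself. The following is a minimal sketch, assuming the text passed to the command looks like the sample e-mails above (a Trap-ID=NNNN line followed by a description). trap_subject and the wrapper path are hypothetical names, not part of the HP agents.

```shell
#!/bin/sh
# Sketch of a trapemail wrapper helper: builds a mail subject from the trap ID
# and the first description line. Assumes input shaped like the samples above.

trap_subject() {
    # $1: full trap message text
    id=$(printf '%s\n' "$1" | sed -n 's/^Trap-ID=\([0-9][0-9]*\).*/\1/p' | head -n 1)
    # First non-empty line that is not the Trap-ID line.
    desc=$(printf '%s\n' "$1" | grep -v '^Trap-ID=' | grep -v '^[[:space:]]*$' | head -n 1)
    printf 'HP trap %s: %s\n' "${id:-unknown}" "${desc:-no description}"
}

# Intended use inside the wrapper (assumption: saved as
# /usr/local/bin/hp-trap-mail and referenced from cma.conf as
# "trapemail /usr/local/bin/hp-trap-mail"):
#   msg=$(cat)
#   printf '%s\n' "$msg" | mailx -s "$(trap_subject "$msg")" admin@example.com
```

With a subject like this, a battery failure and a "Status is now OK" message can be told apart in the inbox without opening them.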

EDIT:

It appears you're having difficulty with a 100-series ProLiant, the HP Health agents and Debian. This is a supported combination, but depending on how you've installed and configured the agents, you may run into problems. Given that, you can probably just install the cciss_vol_status utility and run a periodic check via cron.
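A periodic check of this kind could look roughly like the sketch below, which mails only when a volume is not healthy rather than on every change. It assumes that cciss_vol_status prints one line per logical volume ending in "status: OK." when the volume is healthy; failed_volumes is a hypothetical helper name, and the pattern may need adjusting for your version of the utility.

```shell
#!/bin/sh
# Sketch: report only the logical volumes whose status is not OK.
# Assumes cciss_vol_status prints one line per volume ending in "status: OK."
# when healthy; adjust the pattern if your version prints something else.

failed_volumes() {
    # Reads cciss_vol_status output on stdin, prints the lines that are not OK.
    grep 'status:' | grep -v 'status: OK\.[[:space:]]*$'
    return 0
}

# Intended use from cron (device path as in the Solution script above):
#   bad=$(cciss_vol_status /dev/cciss/c0d1 | failed_volumes)
#   [ -n "$bad" ] && printf '%s\n' "$bad" | mailx -s "CCISS volume not OK" admin@example.com
```

On Debian, a script dropped into /etc/cron.hourly must have a name without dots (for example cciss-check, not cciss-check.sh), since run-parts skips such files by default.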
