We are monitoring the disks on our servers using Smartmontools and Nagios with the check_smartmon or another Nagios plugin. It appears to work, as there are no errors. But how do I know if it is truly working?
It would be great to simulate an error on the disk and observe the error through the entire Nagios pipeline. From the Linux or FreeBSD command line, is there a way to trigger a SMART fault on a disk drive or array without damaging the disk?
I found an old discussion on the smartmontools-support mailing list, but it's not clear whether this functionality was ever added.
Best Answer
If the drive firmware supports it, hdparm can be used to manually corrupt some sectors via its --make-bad-sector option. Note that this will really corrupt a sector, which means that the Current Pending Sector and Reallocated Sector Count SMART attributes can be affected.
Please note that hdparm distinguishes between a "normal" and a "flagged" corruption: in the former, any read will time out as if the sector were genuinely bad; in the latter, any read will be aborted immediately.
Be sure to understand that, with the method above, you are really corrupting sectors, with the associated reallocation events; that is, you are in some sense "damaging" your drive.
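
As a rough sketch of what these invocations look like, the helper below only prints the hdparm command lines; it never runs them. The function name hdparm_sector_cmd, the sector number and /dev/sdX are placeholders made up for illustration. The "f" prefix for the flagged variant and the --yes-i-know-what-i-am-doing confirmation flag are taken from the hdparm man page; check the man page of your installed version before running anything for real.

```shell
#!/bin/sh
# Dry run only: this prints hdparm command lines, it never executes them.
# The sector number and /dev/sdX used below are placeholders.

# hdparm_sector_cmd MODE SECTOR DEVICE
#   normal  : corrupt the sector so reads fail like a genuine media error
#   flagged : prefix the sector with "f" so reads are aborted immediately
#   repair  : overwrite the sector (with zeros) to clear the error
hdparm_sector_cmd() {
    mode=$1
    sector=$2
    device=$3
    confirm="--yes-i-know-what-i-am-doing"   # hdparm refuses these options without it

    # Accept only a plain decimal sector number (LBA).
    case $sector in
        ''|*[!0-9]*) echo "sector must be a decimal number" >&2; return 2 ;;
    esac

    case $mode in
        normal)  echo "hdparm $confirm --make-bad-sector $sector $device" ;;
        flagged) echo "hdparm $confirm --make-bad-sector f$sector $device" ;;
        repair)  echo "hdparm $confirm --repair-sector $sector $device" ;;
        *)       echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

hdparm_sector_cmd normal  123456 /dev/sdX
hdparm_sector_cmd flagged 123456 /dev/sdX
hdparm_sector_cmd repair  123456 /dev/sdX
```

Printing the commands first makes it easy to double-check the device name and sector before pasting them into a root shell on a disposable test drive.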
Finally, to recover a sector before it is reallocated, you can use the --repair-sector option.
Back to smartmontools: you can use an old drive to simulate such errors, giving smartd a chance to alert you and checking the effectiveness of your smartctl configuration.
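
To confirm by hand that the corrupted sector actually shows up before waiting for Nagios, something like the following could be used. It is a minimal sketch, not check_smartmon or any existing plugin: check_sector_attrs is a hypothetical function that reads smartctl -A output on stdin, assumes the usual ATA attribute table layout (attribute name in column 2, raw value in column 10), and returns the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# check_sector_attrs: hypothetical helper, not part of smartmontools or Nagios.
# Reads `smartctl -A` style text on stdin and looks at the raw values of
# Current_Pending_Sector and Reallocated_Sector_Ct.
# Assumption: attribute name is field 2 and the raw value is field 10.
check_sector_attrs() {
    awk '
        $2 == "Current_Pending_Sector" { pending = $10; seen++ }
        $2 == "Reallocated_Sector_Ct"  { realloc = $10; seen++ }
        END {
            # Both attributes must be present, otherwise the result is unknown.
            if (seen < 2) {
                print "UNKNOWN: sector attributes not found"
                exit 3
            }
            # Any pending or reallocated sector is reported as critical.
            if (pending + realloc > 0) {
                printf "CRITICAL: pending=%d reallocated=%d\n", pending, realloc
                exit 2
            }
            printf "OK: pending=%d reallocated=%d\n", pending, realloc
            exit 0
        }'
}

# Typical use on a test machine (needs root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | check_sector_attrs; echo "exit code: $?"
```

If this reports CRITICAL after the sector was corrupted but Nagios stays green, the problem is somewhere in the plugin or Nagios configuration rather than in the drive's SMART reporting.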