Simulate a SMART error on a drive for test purposes

smart, smartmontools

We are monitoring the disks on our servers using Smartmontools and Nagios with the check_smartmon or another Nagios plugin. It appears to work, as there are no errors. But how do I know if it is truly working?

It would be great to simulate an error on the disk and observe the error through the entire Nagios pipeline. From the Linux or FreeBSD command line, is there a way to trigger a SMART fault on a disk drive or array without damaging the disk?

I found an old discussion on the smartmontools-support mailing list, but it's not clear whether this functionality was ever added.

Best Answer

If the drive firmware supports it, hdparm can be used to manually corrupt some sectors via its --make-bad-sector option. Note that this will really corrupt a sector, which means that:

  • on subsequent read, the sector will be "discovered" as unreadable with a corresponding increase in SMART attribute 197 - Current Pending Sector
  • on subsequent write, the sector will be remapped using a spare sector, with a corresponding increase in SMART attribute 5 - Reallocated Sector Count
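To see whether those two counters actually move, their raw values can be read before and after the experiment. Below is a minimal Python sketch, assuming the usual ten-column attribute table printed by `smartctl -A` for ATA drives; the function names are made up for illustration, and `/dev/sdX` stands in for your own device.

```python
import subprocess


def parse_raw_values(smartctl_output, wanted=(5, 197)):
    """Return {attribute_id: raw_value} for the wanted SMART attribute IDs.

    Assumes the classic `smartctl -A` table layout, where the raw value
    is the tenth whitespace-separated column of each attribute row.
    """
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # header lines and blank lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in wanted and fields[9].isdigit():
            values[attr_id] = int(fields[9])
    return values


def read_raw_values(device):
    """Run `smartctl -A` on a device (hypothetical helper, needs root)."""
    # smartctl uses its exit status as a bitmask, so a non-zero status
    # does not necessarily mean the command failed: do not raise on it.
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_raw_values(result.stdout)
```

Calling `read_raw_values("/dev/sdX")` once before corrupting a sector and once after reading it back, then comparing the two dictionaries, shows whether attribute 197 (and later 5) changed.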

Please note that hdparm distinguishes between a "normal" and a "flagged" corruption: in the former, any read will time out as if the sector were genuinely bad; in the latter, any read will be aborted immediately.

Be sure to understand that, with the method above, you are really corrupting sectors and may trigger the corresponding reallocation events. In other words, you are to some extent "damaging" your drive.

Finally, to recover a sector before it is reallocated you can use the --repair-sector option.
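The hdparm invocations discussed above can be sketched as follows. This is only an illustration in Python that builds the argument lists without running anything; the helper name, the device `/dev/sdX` and the sector number are hypothetical. It assumes hdparm's documented behaviour: an `f` prefixed to the sector number requests the "flagged" variant, `--read-sector` (not mentioned in the answer) reads the sector back, and the destructive options also require `--yes-i-know-what-i-am-doing`.

```python
def hdparm_sector_command(device, sector, action, flagged=False):
    """Build (but do not run) an hdparm command line for one sector.

    action is one of "make-bad", "read" or "repair".
    Hypothetical helper: it only returns the argv list for review.
    """
    if sector < 0:
        raise ValueError("sector must be a non-negative LBA")
    if action == "make-bad":
        # An "f" prefix asks for a flagged bad sector (reads abort at once);
        # without it, reads retry and time out like a genuinely bad sector.
        target = f"f{sector}" if flagged else str(sector)
        return ["hdparm", "--yes-i-know-what-i-am-doing",
                "--make-bad-sector", target, device]
    if action == "read":
        # Reading the sector back is what lets the drive "discover" it.
        return ["hdparm", "--read-sector", str(sector), device]
    if action == "repair":
        # Rewrites the sector, making it readable again.
        return ["hdparm", "--yes-i-know-what-i-am-doing",
                "--repair-sector", str(sector), device]
    raise ValueError(f"unknown action: {action}")


# Print the three steps for inspection only; nothing is executed.
for step in ("make-bad", "read", "repair"):
    print(" ".join(hdparm_sector_command("/dev/sdX", 12345, step, flagged=True)))
```

Keeping command construction separate from execution makes it easy to review exactly what would be run before pointing it at a real (ideally expendable) drive.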

Back to smartmontools: you can use an old drive to simulate such errors, giving smartd a chance to alert you and letting you check the effectiveness of your smartd configuration.
