Linux – Maintain RAID to keep it healthy – data scrub

bad-blocks linux raid

I want to be as well prepared as possible to keep my Linux software RAIDs healthy. I know it's very important to maintain the data on a RAID so that it stays readable in the event of a disk failure. Otherwise, a URE (unrecoverable read error) could prevent a correct rebuild. So I ask myself: what would be a good (or maybe the best) way to scrub my data? I'm considering:

echo "check" >/sys/block/md0/md/sync_action

or

echo "repair" >/sys/block/md0/md/sync_action

and

badblocks -n

For the first two, I'm not sure whether "check" is enough. I read that "repair" will correct any errors it finds by using the parity data. So that should be what I want, if I understand it correctly? But what does "check" do, then? Does it only find problems?

But whichever of the two I run, it always starts from the beginning, which takes a long time. So maybe badblocks with the -n option could be a better solution, because it lets me define start and end blocks. That way I could run small jobs at night and start the next night where the previous job stopped. But badblocks refuses to work on a mounted filesystem unless given the -f option, which the man page advises against.

How do you maintain your RAID? Maybe I'm barking up the wrong tree and there are better solutions…

Best Answer

Running the check function as a cron job has always been sufficient for me. I've never had it actually find any errors.
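
As a rough sketch of what such a cron job could run (assuming the array is md0 and the standard md sysfs layout under /sys/block/md0/md), the functions below start a check only when the array is idle and read back mismatch_cnt afterwards. According to md(4), "check" reads everything and counts inconsistencies, and a read error hit along the way goes through the normal read-error handling, which rewrites the block from the other devices; "repair" additionally rewrites blocks where the copies or parity disagree. MD_SYSFS_ROOT is a made-up variable, only there so the functions can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: helper functions for a scheduled md "check" pass.
# MD_SYSFS_ROOT is a hypothetical override; on a real system it defaults to /sys.

md_dir() {
    # $1 = array name, e.g. md0
    printf '%s/block/%s/md\n' "${MD_SYSFS_ROOT:-/sys}" "$1"
}

md_start_check() {
    # Start a check only if the array is idle (not already syncing or checking).
    dir=$(md_dir "$1")
    state=$(cat "$dir/sync_action") || return 1
    if [ "$state" != "idle" ]; then
        echo "$1 is busy ($state), not starting a check" >&2
        return 1
    fi
    echo check > "$dir/sync_action"
}

md_mismatches() {
    # Print the mismatch counter left behind by the last check or repair.
    cat "$(md_dir "$1")/mismatch_cnt"
}
```

Calling md_start_check md0 from a root cron job once a month, and md_mismatches md0 some time after it finishes, would cover the routine described above. Some distributions already ship a similar scheduled check with their mdadm package, so it is worth looking before adding your own.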

Beyond that, remember that RAID is not a backup, so keep a backup of all important data. Offsite is best, but an external USB drive and an rsnapshot cron job is a good first step.

Running badblocks on an md block device would not work, because the RAID, depending on the RAID level you're running, is going to hide any bad blocks from it. And if it thought it had found a bad block, which drive would sector 88376283 be on?

Modern hard drives take care of themselves and relocate bad blocks to spare blocks. So if you're seeing bad blocks on a drive, that's not a good sign: it typically means the drive has run out of spare blocks and can no longer cope with media errors. Keep in mind that badblocks is an old program whose original purpose was to work around bad blocks on a drive, back when drives were expensive and could not relocate bad blocks on their own. That's not to say badblocks isn't useful on modern cheap drives; I just don't think it will help you protect the data on a disk, because if it does show errors, your disk is already in pretty bad shape.
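
For the "small jobs at night" idea from the question, md itself may be the better tool: the kernel's md documentation describes sync_min and sync_max files next to sync_action, which limit a check to a range of sectors. The sketch below is untested on real hardware and rests on that documentation; md_check_range and MD_SYSFS_ROOT are hypothetical names, and the sector values should be multiples of the chunk size.

```shell
#!/bin/sh
# Sketch only: run an md "check" over a limited sector range.
# MD_SYSFS_ROOT is a hypothetical override; on a real system it defaults to /sys.

md_check_range() {
    # $1 = array name, $2 = first sector, $3 = last sector (or "max")
    dir="${MD_SYSFS_ROOT:-/sys}/block/$1/md"
    # Refuse to interfere with a running sync, check or repair.
    [ "$(cat "$dir/sync_action")" = "idle" ] || return 1
    # Reset the lower bound first so the new upper bound is never below it.
    echo 0     > "$dir/sync_min" || return 1
    echo "$3"  > "$dir/sync_max" || return 1
    echo "$2"  > "$dir/sync_min" || return 1
    echo check > "$dir/sync_action"
}
```

As far as the documentation goes, the check pauses when it reaches sync_max rather than finishing, so a nightly job would write idle to sync_action to end the pass and call md_check_range the next night with the previous end sector as the new start. Once the whole array has been covered, sync_max should be set back to max.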

Outside of this, you can monitor the drives' health with S.M.A.R.T. It's not perfect either, but it is another layer of monitoring you can add. There are plenty of questions here about S.M.A.R.T. that go into its pros, cons, and abilities.
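
A minimal sketch of that extra layer, assuming smartmontools is installed and the member disks are ATA or SCSI devices that smartctl can read: the two helpers below only parse smartctl output from stdin, so they can be wired into whatever monitoring you already have. The function names are made up for illustration, and the column position used for attribute 5 (Reallocated_Sector_Ct) assumes the usual `smartctl -A` table layout.

```shell
#!/bin/sh
# Sketch only: small parsers for smartctl output, fed via a pipe.

smart_health_ok() {
    # Reads `smartctl -H` output on stdin; succeeds if the drive reports healthy
    # (ATA drives print "test result: PASSED", SCSI drives "SMART Health Status: OK").
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

reallocated_sectors() {
    # Reads `smartctl -A` output on stdin; prints the raw value (10th column)
    # of attribute 5, the reallocated sector count.
    awk '$1 == 5 { print $10 }'
}
```

For example, `smartctl -H /dev/sda | smart_health_ok || echo "sda: SMART health check failed"` flags an unhealthy drive, and `smartctl -A /dev/sda | reallocated_sectors` shows how many sectors the drive has already relocated, where /dev/sda stands in for each member disk. A count that keeps growing is the kind of early warning the relocation discussion above is about.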