Why does a RAID-0 volume randomly fail with healthy drives

hard-drive hardware-raid raid

I have had a pair of WD VelociRaptors (5-year warranty) hardware-striped (RAID-0) on an Intel ICH8R motherboard controller for about 1.5 years.

The other day, the volume randomly failed during no specific activity and the RAID bios indicated one of the drives had failed.

I did extensive diagnostics with SpinRite and WD Diag on each drive, and they picked up no surface issues, no sector errors, and no SMART warnings.

I then recreated the volume with the same drives, restored from backup, and have been up and running fine for 2 weeks now with no issues.

What happened?

Are my drives okay? Can there be something unhealthy with one of my drives that the diags are not picking up?

Best Answer

You ran into the worst problem with stripe-only arrays: RAID 0 is completely unforgiving of any I/O interruption. If any drive bobbles, you have to rebuild the array from scratch. This is why I almost always use RAID level 1 or higher.

Many things can cause a drive to have temporary I/O issues: power fluctuations, heat, vibration, and dirty connections are just a few. Dust in the system can build up and cause airflow problems and heat buildup. Dust can also work its way into connections.

You may want to clean the inside of your machine to remove the dust and gunk that builds up and re-seat all of the drive connections. Measure the internal temperature, not just on the system board but near or between the drives. Add airflow if the temperature seems too warm. This should take care of heat and dirty connections as possible causes.
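As a way to keep an eye on the attributes mentioned above, here is a minimal sketch that parses `smartctl -A`-style output and flags drive temperature and sector-remapping counters. The sample report text, the attribute selection, and the 45&nbsp;°C threshold are illustrative assumptions, not values from the drives in question:

```python
# Sketch: flag SMART attributes worth watching after an unexplained
# RAID-member drop. SAMPLE mimics `smartctl -A` output; it is made up
# for illustration, not captured from a real drive.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   105   100   000    Old_age   Always       -       47
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

TEMP_WARN_C = 45  # hypothetical threshold; pick one for your enclosure
SECTOR_ATTRS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

def flag_attributes(report: str) -> list[str]:
    """Return human-readable warnings for attributes that look unhealthy."""
    warnings = []
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip everything else.
        if not fields or not fields[0].isdigit():
            continue
        attr_id, raw_value = int(fields[0]), int(fields[-1])
        if attr_id == 194 and raw_value > TEMP_WARN_C:
            warnings.append(f"drive temp {raw_value}C exceeds {TEMP_WARN_C}C")
        elif attr_id in SECTOR_ATTRS and raw_value > 0:
            warnings.append(f"{SECTOR_ATTRS[attr_id]} raw value is {raw_value}")
    return warnings

print(flag_attributes(SAMPLE))  # -> ['drive temp 47C exceeds 45C']
```

Running something like this periodically (fed from `smartctl -A /dev/sdX`) catches a drive that is running hot or quietly remapping sectors before it bobbles again, even when a one-off diagnostic pass comes back clean.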

Power problems are a different beast altogether. If you have adequate power and filtering, it shouldn't be a problem. If you are hanging the machine off mains power without any sort of line conditioning or UPS, you are just asking for trouble.
