Linux – SMART warns me but I don’t trust it

hardware, linux, monitoring, smart

I've got a server with four Samsung hard drives. All drives are the same model and have been bought together. The drives are SAMSUNG HE753LJ with firmware 1AA01113.

I'm getting SMART errors, but I have the feeling that smartctl does not understand the values it gets from the hard drive.

Here's the output of a SMART health check:

asgard:~# smartctl -H /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0007   001   001   011    Pre-fail  Always   FAILING_NOW 60340

I don't trust SMART because:

  • For over a year, all the disks have been reported as about to fail within 24 hours. Nothing has blown up yet.
  • Wikipedia says that
    "Spin-Up Time is the average time of spindle spin up (from zero RPM to fully operational [millisecs])." With a raw value of 60340, that would mean the drives need about one minute to spin up?!

I would like to follow smartctl's advice and replace these disks, but I just don't trust the results I'm reading.

What do you think about this?
What would you do?

Thanks for your help.

Best Answer

"All drives are the same model and have been bought together."

This is a ticking bomb.

Based on both the SMART message and the quote above, you should replace the disks right away.

Since the drives were bought together and are the same model, they probably share the same weaknesses and are likely to fail at around the same time under the same conditions.

RAID relies on disks failing at different times, which gives you the opportunity to swap one disk at a time and avoid data loss.

Others have reported the simultaneous failure of an entire RAID array of identical disks from the same production batch, all subject to the same weakness.

I can't stress this enough: You need to start swapping your drives!
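
For reference, here is a rough sketch of how the failing line in the question can be read. As far as I know, smartctl flags an attribute as failed when its normalized VALUE is less than or equal to THRESH; the RAW_VALUE column is vendor-specific, so 60340 is not necessarily a number of milliseconds. The column layout is assumed from the smartctl 5.38 output quoted above, and the names `parse_attribute` and `is_failing` are made up for this example, not part of smartmontools.

```python
import re

# Column layout assumed from the smartctl 5.38 output quoted in the question:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_LINE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)


def parse_attribute(line):
    """Parse one attribute row; return a dict, or None for any other line."""
    m = ATTR_LINE.match(line)
    if m is None:
        return None
    return {
        "id": int(m.group("id")),
        "name": m.group("name"),
        "value": int(m.group("value")),    # normalized value
        "worst": int(m.group("worst")),    # lowest normalized value seen so far
        "thresh": int(m.group("thresh")),  # failure threshold set by the vendor
        "type": m.group("type"),
        "when_failed": m.group("when_failed"),
        "raw": m.group("raw").strip(),     # vendor-specific, kept as text
    }


def is_failing(attr):
    """An attribute counts as failed when normalized VALUE <= THRESH."""
    return attr["value"] <= attr["thresh"]


# The line reported for /dev/sdb in the question.
SAMPLE = ("  3 Spin_Up_Time            0x0007   001   001   011    "
          "Pre-fail  Always   FAILING_NOW 60340")

attr = parse_attribute(SAMPLE)
print(attr["name"], attr["value"], attr["thresh"], is_failing(attr))
```

Here the normalized value (1) is below the threshold (11), which is why the drive is reported as FAILING_NOW regardless of how the raw number is interpreted.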