Linux – How to use hdparm to fix a pending sector

bad-blockshdparmlinuxsmartctl

SMART is stating one pending sector on of my server's hdd. I've read lot's of articles recommending using hdparm to "easily" force the disk to relocated the bad sector, but I can't find the correct way to use it.

Some info from my "smartctl":

Error 95 occurred at disk power-on lifetime: 20184 hours (841 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d7 55 dd 02  Error: UNC at LBA = 0x02dd55d7 = 48059863

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 d6 55 dd e2 00  18d+05:13:42.421  READ DMA
  27 00 00 00 00 00 e0 00  18d+05:13:42.392  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02  18d+05:13:42.378  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 02  18d+05:13:42.355  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  18d+05:13:42.327  READ NATIVE MAX ADDRESS EXT

 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)        LBA_of_first_error
 # 1  Extended offline    Completed: read failure       90%     20194         48059863
 # 2  Short offline       Completed without error       00%     15161         -

With that "bad LBA" (48059863) in hand, how do I use hdparm? What type of address the parameters "–read-sector" and "–write-sector" should have?

If I issue the command hdparm –read-sector 48095863 /dev/sda it reads and dumps data. If this command was right, I should expect an I/O error, right?

Instead, it dumps data:

$ ./hdparm --read-sector 48059863 /dev/sda

/dev/sda:
reading sector 48059863: succeeded
4b50 5d1b 7563 a932 618d 1f81 4514 2343
8a16 3342 5e36 2591 3b4e 762a 4dd7 037f
6a32 6996 816f 573f eee1 bc24 eed4 206e
(...)

Best Answer

If for whatever reason you prefer to try to clear those bad sectors, and you do not care about the existing contents of a drive, the below shell snippet may help. I tested this on an older Seagate Barracuda drive that is well past its warranty anyway. It might not work right with other drive models or manufacturers, but it should put you on the right path if you must script something. It will destroy any content you have on the drive.

You may prefer just running badblocks, an hdparm Secure Erase (SE) (https://wiki.archlinux.org/index.php/Securely_wipe_disk), or some other tool that is actually designed for this. Or even the manufacturer provided tools like SeaTools (there is a 32bit linux 'enterprise' version, google it).

Make sure the drive in question is completely unused/unmounted before doing this. Also, I know, while loop, no excuses. It is a hack, you can make it better...

baddrive=/dev/sdb
badsect=1
while true; do
  echo Testing from LBA $badsect
  smartctl -t select,${badsect}-max ${baddrive} 2>&1 >> /dev/null

  echo "Waiting for test to stop (each dot is 5 sec)"
  while [ "$(smartctl -l selective ${baddrive} | awk '/^ *1/{print substr($4,1,9)}')" != "Completed" ]; do
    echo -n .
    sleep 5
  done
  echo

  badsect=$(smartctl -l selective ${baddrive} | awk '/# 1  Selective offline   Completed: read failure/ {print $10}')
  [ $badsect = "-" ] && exit 0

  echo Attempting to fix sector $badsect on $baddrive
  hdparm --repair-sector ${badsect} --yes-i-know-what-i-am-doing $baddrive
  echo Continuning test
done

One advantage of using the 'selftest' method is the load is handled by the drive firmware, so the PC it is connected to is not loaded down like it would be with dd or badblocks.

NOTE : I'm sorry, I made a mistake, the correct while condition is like this :

while [ "$(smartctl -l selective ${baddrive} | awk '/^ *1/{print $4}')" = "Self_test_in_progess" ]; do

And the exit condition of the script becomes :

[ $badsect = "-" ] || [ "$badsect" = "" ] && exit 0