Ubuntu – SSD disk reports ATA errors


Yesterday I got a report about SSD ATA errors on one of my hosts.

The SSD is a 128 GB OCZ-VERTEX4 (firmware rev 1.3), about 8 months old.

The OS is Ubuntu 11.04 running kernel 2.6.38-16-generic.

The motherboard is an Intel DP35DP.

Apart from the two events below, there have been no read errors or other disk errors.

Should I prepare replacement drive?

SMART attributes:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   ---   ---   ---    Old_age   Offline      -       393222
  3 Spin_Up_Time            0x0000   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   ---   ---   ---    Old_age   Offline      -       507536484
 12 Power_Cycle_Count       0x0000   ---   ---   ---    Old_age   Offline      -       1664100
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       4804710657
233 Media_Wearout_Indicator 0x0000   099   000   000    Old_age   Offline      -       99
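To check a table like this one programmatically, here is a minimal Python sketch that parses rows of `smartctl -A` output and flags attributes whose normalized VALUE has dropped to or below a non-zero THRESH. It assumes the ten-column layout shown above; `parse_smart_table` and `failing_attributes` are hypothetical helper names. Rows with `---` in VALUE are skipped, and a threshold of 000 is treated as "never trips", so nothing in the table above would be flagged.

```python
def _num(field):
    """Convert a normalized column to int, or None when smartctl prints '---'."""
    return None if field == "---" else int(field)


def parse_smart_table(text):
    """Parse `smartctl -A` attribute rows into dicts.

    Assumes the column order:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have at least 10 columns;
        # this skips the header line and blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": _num(fields[3]),
            "worst": _num(fields[4]),
            "thresh": _num(fields[5]),
            "raw": fields[9],  # first token of RAW_VALUE only
        })
    return rows


def failing_attributes(rows):
    """Names of attributes whose value is at or below a non-zero threshold."""
    return [
        r["name"]
        for r in rows
        if r["value"] is not None and r["thresh"] and r["value"] <= r["thresh"]
    ]
```

Usage: `failing_attributes(parse_smart_table(output))`, where `output` is the text printed by `smartctl -A /dev/sdX` (device name is a placeholder).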

Kernel log:

Jun  1 11:50:42 kernel: [424453.095411] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Jun  1 11:50:42 kernel: [424453.095415] ata4: irq_stat 0x00400040, connection status changed
Jun  1 11:50:42 kernel: [424453.095418] ata4: SError: { PHYRdyChg DevExch }
Jun  1 11:50:42 kernel: [424453.095422] ata4: hard resetting link
Jun  1 11:50:42 kernel: [424453.840022] ata4: SATA link down (SStatus 0 SControl 300)
Jun  1 11:50:44 kernel: [424455.948532] ata4: hard resetting link
Jun  1 11:50:45 kernel: [424456.490021] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun  1 11:50:45 kernel: [424456.490288] ata4.00: configured for UDMA/133
Jun  1 11:50:45 kernel: [424456.490294] ata4: EH complete

Jun  1 19:18:23 kernel: [451311.319525] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Jun  1 19:18:23 kernel: [451311.319529] ata4.00: irq_stat 0x00400040, connection status changed
Jun  1 19:18:23 kernel: [451311.319532] ata4: SError: { PHYRdyChg DevExch }
Jun  1 19:18:23 kernel: [451311.319535] ata4.00: failed command: FLUSH CACHE
Jun  1 19:18:23 kernel: [451311.319541] ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun  1 19:18:23 kernel: [451311.319542]          res 40/00:0c:78:c6:c7/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
Jun  1 19:18:23 kernel: [451311.319545] ata4.00: status: { DRDY }
Jun  1 19:18:23 kernel: [451311.319549] ata4: hard resetting link
Jun  1 19:18:23 kernel: [451312.060033] ata4: SATA link down (SStatus 0 SControl 300)
Jun  1 19:18:23 kernel: [451314.082062] ata4: hard resetting link
Jun  1 19:18:23 kernel: [451314.630022] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun  1 19:18:23 kernel: [451314.630295] ata4.00: configured for UDMA/133
Jun  1 19:18:23 kernel: [451314.630298] ata4.00: retrying FLUSH 0xe7 Emask 0x10
Jun  1 19:18:23 kernel: [451314.630320] ata4: EH complete
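libata prints `SStatus` in hexadecimal, so `SStatus 123` is 0x123. A minimal Python sketch decoding its DET (bits 0-3), SPD (bits 4-7) and IPM (bits 8-11) fields, following the SATA SStatus register layout; `decode_sstatus` is a hypothetical helper name and only the commonly seen field values are listed. It shows the link going from no device detected back to a device present at Gen2 (3.0 Gbps) in the active state.

```python
# Commonly seen values of the SATA SStatus register fields.
DET = {
    0: "no device detected",
    1: "device present, Phy communication not established",
    3: "device present, Phy communication established",
    4: "Phy offline",
}
SPD = {
    0: "no speed negotiated",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3.0 Gbps)",
    3: "Gen3 (6.0 Gbps)",
}
IPM = {
    0: "device not present or communication not established",
    1: "active",
    2: "partial",
    6: "slumber",
}


def decode_sstatus(value):
    """Split an SStatus value into its DET, SPD and IPM meanings."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return (
        DET.get(det, f"unlisted DET {det}"),
        SPD.get(spd, f"unlisted SPD {spd}"),
        IPM.get(ipm, f"unlisted IPM {ipm}"),
    )


# "SATA link down (SStatus 0 ...)" and "SATA link up 3.0 Gbps (SStatus 123 ...)"
print(decode_sstatus(0x0))
print(decode_sstatus(0x123))
```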

Best Answer

The cable may be bad, but the drive's firmware could also be at fault. Very rarely, it can also happen as a one-off. This error appears when the drive fails to respond to ATA commands, or when data is not coming across the connection properly.
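Both events report the same `SErr 0x4010000`. A minimal Python sketch decoding it bit by bit, labelled with the short names libata prints in `SError: { ... }` (bit positions from the SATA SError register; `decode_serror` is a hypothetical helper name). Only bit 16 (PHYRdyChg, the PhyRdy signal changed) and bit 26 (DevExch, device exchanged) are set, i.e. the link dropped and came back, with no CRC or disparity bits set. That fits a connection-level problem rather than a media error.

```python
# Bit positions of the SATA SError register, labelled with the short
# names the Linux libata error handler prints.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return libata-style names for every SError bit set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


print(decode_serror(0x4010000))  # ['PHYRdyChg', 'DevExch']
```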

Consider replacing the cable and checking for firmware updates (and if you're not taking backups, yesterday is a perfectly fine time to start). If this happens again, or more frequently, you will need to replace the drive.
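To tell whether this happens again or more often, one option is to count libata exception events per port in the kernel log. A minimal Python sketch; `count_error_events` is a hypothetical helper name, and reading `/var/log/kern.log` (the usual Ubuntu location) is an assumption. Each `exception Emask` line is counted once, so the two events in the question count as 2 for `ata4`, with `ata4.00` folded into its port.

```python
import re
from collections import Counter

# Matches "ata4: exception Emask ..." and "ata4.00: exception Emask ...";
# group 1 captures the port name without the device suffix.
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")


def count_error_events(lines):
    """Count libata exception events per ATA port."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location on Ubuntu; adjust if your syslog setup differs.
    with open("/var/log/kern.log", errors="replace") as log:
        for port, n in sorted(count_error_events(log).items()):
            print(port, n)
```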

Very rarely, this can also be caused by a bad SATA controller (on your RAID card or motherboard).