Linux – xfs fails with errors on dmesg


I have a strange error on a linux box with xfs, and I don't know how to debug and fix it.

Below is an excerpt from dmesg :

Info fld=0x17
end_request: I/O error, dev sde, sector 34412208504
sd 7:0:0:0: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sde, sector 35840057200
sd 7:0:0:0: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sde, sector 35799212408
sd 7:0:0:0: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sde, sector 39444095352
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sdf, sector 32974487928
device-mapper: multipath: Failing path 8:80.
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sdf, sector 32973734264
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sdf, sector 22213009752
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sdf, sector 32940065144
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sdf, sector 32974552944
sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

Info fld=0x17
end_request: I/O error, dev sdf, sector 17956282744
Buffer I/O error on device dm-3, logical block 9666270717
lost page write due to I/O error on dm-3
I/O error in filesystem ("dm-3") meta-data dev dm-3 block 0xe7ffb01c2       ("xlog_iodone") error 5 buf count 12800
Buffer I/O error on device dm-3, logical block 4028959741
lost page write due to I/O error on dm-3
xfs_force_shutdown(dm-3,0x2) called from line 956 of file fs/xfs/xfs_log.c.  Return address = 0xffffffff883bec58
Filesystem "dm-3": Log I/O Error Detected.  Shutting down filesystem: dm-3
Please umount the filesystem, and rectify the problem(s)

How can I debug this?

Thanks.

Best Answer

I know this is a very old post, but as the answer is incorrect, I think it can be useful to future visitors to post a correct answer...

The errors reported by the OP have nothing to do with XFS itself; they are the result of a bad drive or cable. Examining the error entry:

end_request: I/O error, dev sde, sector 39444095352

The system cannot retrieve the data located on sde at LBA 39444095352. This generally means a bad block on the disk.
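To make the entry above concrete, here is a minimal Python sketch that pulls the device and sector out of such a line and converts the sector to a byte offset. The function name `parse_io_error` is made up for illustration, and the 512-byte sector unit is an assumption (it matches the bs=512 used with dd further down).

```python
import re

# Pattern for the block-layer error line as it appears in the dmesg excerpt.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

SECTOR_SIZE = 512  # assumption: sectors are reported in 512-byte units


def parse_io_error(line):
    """Return (device, sector, byte_offset), or None if the line does not match."""
    match = IO_ERROR.search(line)
    if match is None:
        return None
    device = match.group(1)
    sector = int(match.group(2))
    return device, sector, sector * SECTOR_SIZE


if __name__ == "__main__":
    line = "end_request: I/O error, dev sde, sector 39444095352"
    device, sector, offset = parse_io_error(line)
    print(f"{device}: sector {sector} starts at byte {offset} "
          f"({offset / 2**40:.2f} TiB into the device)")
```

Running the same function over every line of the dmesg output gives the full list of sectors that failed on each path.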

sd 7:0:0:1: SCSI error: return code = 0x08000002
sdf: Current: sense key: Aborted Command
   <<vendor>> ASC=0xc0 ASCQ=0x23ASC=0xc0 ASCQ=0x23

The SCSI command was aborted due to a timeout (caused by the bad block), and the disk returned a vendor-specific code explaining the error in more detail.
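As a rough illustration of how the numbers in these lines break down, the sketch below splits the return code into its four bytes. It assumes the layout used by the SCSI layer of older Linux kernels (status, message, host and driver bytes, from low to high); the function names are made up for this example.

```python
def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four component bytes.

    Assumed layout (older Linux kernels): bits 0-7 status, 8-15 message,
    16-23 host, 24-31 driver.
    """
    return {
        "status": result & 0xFF,          # 0x02 is CHECK CONDITION
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,    # 0x00 means the host adapter saw no error
        "driver": (result >> 24) & 0xFF,  # 0x08 means sense data is available
    }


def is_vendor_specific_asc(asc):
    """ASC values 0x80-0xFF are reserved for vendor-specific meanings."""
    return 0x80 <= asc <= 0xFF


if __name__ == "__main__":
    print(decode_scsi_result(0x08000002))
    print(is_vendor_specific_asc(0xC0))
```

Because ASC 0xc0 falls in the vendor-specific range, its exact meaning can only be looked up in the storage vendor's documentation.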

Issuing smartctl --all <dev> shows various internal disk counters. Attributes with ID 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) are of special interest because they show how many disk blocks are unreadable or have been remapped.
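If you want to check those three attributes from a script, something along these lines could work. It is only a sketch: it assumes the usual ATA attribute table layout printed by smartctl (ten columns, with RAW_VALUE last), and the `SAMPLE` text is invented for illustration, not taken from the OP's disk.

```python
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Invented sample in the usual attribute-table layout (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""


def watched_raw_values(smartctl_output):
    """Map attribute ID -> raw value for the watched IDs found in the text."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            raw = fields[9]  # RAW_VALUE column
            found[attr_id] = int(raw) if raw.isdigit() else raw
    return found


if __name__ == "__main__":
    for attr_id, raw in watched_raw_values(SAMPLE).items():
        print(f"{attr_id:>3} {WATCHED[attr_id]}: {raw}")
```

Any non-zero raw value for these attributes is a reason to look more closely at the disk.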

What can you do in this case? The safest and strongly preferred approach is to back up all readable content to another, healthy disk (ideally using a tool that is resilient to disk errors, such as ddrescue).

If this approach is not possible, then two possibilities remain:

  1. reboot with a live distro and run badblocks -n <dev>: this starts a non-destructive read/write test, which should trigger the disk's own bad-block remapping procedure
  2. manually overwrite the affected bad blocks with something similar to: dd if=/dev/zero of=/dev/sde bs=512 count=1 seek=39444095352

Note that the two approaches above (especially the second one) will cause data loss, as the affected, unreadable sector will be overwritten.
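Before overwriting anything, it may be worth confirming which of the reported sectors really are unreadable. The sketch below does a read-only check of a single sector. The helper `sector_is_readable` is hypothetical, it assumes 512-byte sectors, and it reads through the page cache, so on a live system a cached copy could mask a failing sector.

```python
import os


def sector_is_readable(path, sector, sector_size=512):
    """Try to read one sector from path without writing anything.

    Returns True only if a full sector could be read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an absolute offset and does not move the file position.
        data = os.pread(fd, sector_size, sector * sector_size)
    except OSError:
        # An I/O error (EIO) from the kernel ends up here.
        return False
    finally:
        os.close(fd)
    return len(data) == sector_size


if __name__ == "__main__":
    # Device and sector taken from the dmesg excerpt; needs root on a real system.
    print(sector_is_readable("/dev/sde", 39444095352))
```

A sector that still fails here is one that the overwrite in option 2 would zero out.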

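After the recovery/overwrite has completed, you should run a complete filesystem check, in this case by issuing xfs_repair against the unmounted device that holds the filesystem (dm-3 in the log above, rather than the underlying path sde).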