I have a server with mdadm raid0:
# mdadm --version
mdadm - v3.1.4 - 31st August 2010
# uname -a
Linux orkan 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux
One of the disks has failed:
# grep sdf /var/log/kern.log | head
Jan 30 19:08:06 orkan kernel: [163492.873861] sd 2:0:9:0: [sdf] Unhandled error code
Jan 30 19:08:06 orkan kernel: [163492.873869] sd 2:0:9:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 30 19:08:06 orkan kernel: [163492.873874] sd 2:0:9:0: [sdf] Sense Key : Hardware Error [deferred]
Right now in dmesg I can see:
Jan 31 15:59:49 orkan kernel: [238587.307760] sd 2:0:9:0: rejecting I/O to offline device
Jan 31 15:59:49 orkan kernel: [238587.307859] sd 2:0:9:0: rejecting I/O to offline device
Jan 31 16:03:58 orkan kernel: [238836.627865] __ratelimit: 10 callbacks suppressed
Jan 31 16:03:58 orkan kernel: [238836.627872] mdadm: sending ioctl 1261 to a partition!
Jan 31 16:03:58 orkan kernel: [238836.627878] mdadm: sending ioctl 1261 to a partition!
Jan 31 16:04:09 orkan kernel: [238847.215187] mdadm: sending ioctl 1261 to a partition!
Jan 31 16:04:09 orkan kernel: [238847.215195] mdadm: sending ioctl 1261 to a partition!
But mdadm has not noticed that the drive failed:
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Jan 13 15:19:05 2011
     Raid Level : raid0
     Array Size : 71682176 (68.36 GiB 73.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Sep 22 14:37:24 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           UUID : 7e018643:d6173e01:17ab5d05:f75b494e
         Events : 0.9

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
Also, forcing a read from /dev/md0 supports the theory that /dev/sdf has failed, and yet mdadm does not mark the drive as failed (the chunk arithmetic after this output shows why these particular offsets misbehave):
# dd if=/dev/md0 of=/root/md.data bs=512 skip=255 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00367142 s, 139 kB/s
# dd if=/dev/md0 of=/root/md.data bs=512 skip=256 count=1
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000359543 s, 0.0 kB/s
# dd if=/dev/md0 of=/root/md.data bs=512 skip=383 count=1
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000422959 s, 0.0 kB/s
# dd if=/dev/md0 of=/root/md.data bs=512 skip=384 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000314845 s, 1.6 MB/s
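These offsets line up with the RAID0 chunk layout. With a 64K chunk (128 sectors of 512 bytes) striped round-robin over three members, md0 sector N should fall on member (N / 128) mod 3, counting sdb1, sde1, sdf1 as 0, 1, 2. This is only a back-of-the-envelope sketch; it assumes equal-sized members and data starting at offset 0 of each partition, as with the 0.90 superblock:
# for s in 255 256 383 384; do echo "sector $s -> member $(( (s / 128) % 3 ))"; done
sector 255 -> member 1
sector 256 -> member 2
sector 383 -> member 2
sector 384 -> member 0
So the reads that fail (sectors 256-383) are exactly the ones that land on member 2, which is /dev/sdf1.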
However, trying to access the /dev/sdf disk directly fails with:
# dd if=/dev/sdf of=/root/sdf.data bs=512 count=1
dd: opening `/dev/sdf': No such device or address
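That matches the "rejecting I/O to offline device" messages above: the SCSI layer appears to have taken the disk offline. If the kernel has merely offlined the device rather than removed it, its state should still be visible in sysfs (a sketch; the exact layout can differ between kernels):
# cat /sys/block/sdf/device/state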
The data is not that important to me; I just want to understand why mdadm insists that the array is "State : clean".
Best Answer
The md(4) man page sheds some light on how the word "clean" is used here.
It's plausible that the disk failed after the RAID had been safely and normally stopped by the system (at, e.g., a shutdown). In other words, the disk failure happened with the RAID in a consistent, synchronized state. The array would then have been flagged "clean", and when it was next assembled and one of its member disks failed, the flag would simply remain.
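This also fits the output in the question: the Update Time shown there is Thu Sep 22 14:37:24 2011, months before the failure, so the "clean" flag you are seeing was recorded long before /dev/sdf went away. If you want to see what is actually stored on a surviving member rather than the kernel's view of the running array, you can read its superblock directly; a minimal sketch using a member from this array (field names vary slightly by superblock version):
# mdadm --examine /dev/sdb1 | grep -E 'State|Update Time'
If the timestamp there also predates the failure, it confirms that the state was written while the array was still healthy.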