If I run the following:
/opt/MegaRAID/MegaCli/MegaCli -LDInfo -Lall -aAll -NoLog > /tmp/tmp
/opt/MegaRAID/MegaCli/MegaCli -LDPDInfo -aAll -NoLog >> /tmp/tmp
I see these errors:
Media Error Count: 11
Other Error Count: 5
Question
What do they mean? Are they critical?
Full output:
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (target id: 0)
Name:Virtual Disk 0
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:951296MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:5
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Adapter #0
Number of Virtual Disks: 1
Virtual Disk: 0 (target id: 0)
Name:Virtual Disk 0
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:951296MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:5
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Number of Spans: 1
Span: 0 - Number of PDs: 5
PD: 0 Information
Enclosure Device ID: N/A
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000000000000
Connected Port Number: 0
Inquiry Data: ATA WDC WD2500JS-75N2E04 WD-WCANK9523610
PD: 1 Information
Enclosure Device ID: N/A
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 11
Other Error Count: 5
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000001000000
Connected Port Number: 1
Inquiry Data: ATA WDC WD2500JS-75N2E04 WD-WCANK9507278
PD: 2 Information
Enclosure Device ID: N/A
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000002000000
Connected Port Number: 2
Inquiry Data: ATA WDC WD2500JS-75N2E04 WD-WCANK9504713
PD: 3 Information
Enclosure Device ID: N/A
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000003000000
Connected Port Number: 3
Inquiry Data: ATA WDC WD2500JS-75N2E04 WD-WCANK9503028
PD: 4 Information
Enclosure Device ID: N/A
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000004000000
Connected Port Number: 4
Inquiry Data: ATA WDC WD2500JS-75N2E04 WD-WCANK9503793
Best Answer
You have a problem with the drive in slot 1. It's RAID 5, so your data is still protected, but you've effectively lost redundancy (one disk is not reliable). A media error means the drive has run out of spare sectors to remap bad sectors to (see http://kb.lsi.com/KnowledgebaseArticle15809.aspx and http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?inc=7468). If it were my data, I'd be doubly scrupulous about backups, then remove the drive, replace it with a new one and let the array resynchronise. Some vendors (e.g. IBM) will accept an RMA based on predictive failure indicators; some won't. If your vendor does not accept a disk with bad, unremappable sectors as faulty, take it out of the array and exercise it in a test system. It should fail within a reasonable time.
Edit:
Media errors were non-zero only for the disk in slot 1; the log you've provided shows the slot number for each drive entry. The strange thing is that the RAID reports its state as Optimal despite the media errors on that disk. Still, I wouldn't trust the disk.
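Because each drive entry in the log carries its slot number, a short filter can pull out just the error counters per slot. This is a minimal sketch using standard awk, assuming the output was saved to /tmp/tmp as in the question and that each drive block lists Slot Number, Media Error Count, Other Error Count and Predictive Failure Count in that order, as in the log above. The function name report_pd_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarise per-slot error counters from saved
# MegaCli -LDPDInfo output. Reads the file given as $1, or stdin if none.
# Assumes the field order seen in the log: Slot Number first, then
# Media Error Count, Other Error Count and Predictive Failure Count.
report_pd_errors() {
  awk -F': ' '
    /Slot Number:/              { slot = $2 }
    /Media Error Count:/        { media = $2 + 0 }
    /Other Error Count:/        { other = $2 + 0 }
    /Predictive Failure Count:/ {
      # Last counter in each drive block: emit one summary line per drive
      pred = $2 + 0
      flag = (media + other + pred > 0) ? "  <-- check this drive" : ""
      printf "slot %s: media=%s other=%s predictive=%s%s\n", slot, media, other, pred, flag
    }
  ' "${1:--}"
}
```

Run it as `report_pd_errors /tmp/tmp`; on the log above, only the slot 1 line should carry the warning marker.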
A RAID 5 array built from n disks of the same size gives you the capacity of (n-1) disks, because it stores one disk's worth of parity data. Therefore, if you have six 250 GB disks and 1 TB of usable space, they are most likely arranged as a 5-disk RAID 5 (which gives you 4 x 250 GB of usable space) plus 1 spare disk.
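The capacity rule can be checked against the sizes in the log with plain shell arithmetic. This is a minimal sketch; the function name raid5_usable_mb is hypothetical, and it uses each drive's coerced size (237824 MB), since four times that value exactly matches the reported Size:951296MB.

```shell
#!/bin/sh
# Hypothetical helper: usable RAID 5 capacity in MB.
# With n equal disks, one disk's worth of space goes to parity,
# so usable space is (n - 1) * per-disk size.
raid5_usable_mb() {
  n=$1        # number of member disks
  size_mb=$2  # per-disk size in MB (coerced size reported by MegaCli)
  echo $(( (n - 1) * size_mb ))
}

raid5_usable_mb 5 237824   # 5-disk array from the log: prints 951296
raid5_usable_mb 6 237824   # if all six disks were members: prints 1189120
```

The reported size matches the 5-disk figure rather than the 6-disk one, which supports the conclusion that the sixth disk is not a member of this array.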