DPM 2010 “Disk failed or disk not found”

scdpm windows-server-2008-r2

I have an HP ProLiant ML110 G5 server running Windows Server 2008 R2, dedicated solely to DPM 2010. The server supports at most 8 TB of hard-disk storage, and I have already reached that limit.

I'm now stuck: my disk keeps failing, showing "Disk failed or Disk not found" in Disk Management. The disk only comes back after I reboot the system. Today I was running my monthly tape backup for one protection group, and the disk failed again while the tape job was running, so the job did not complete.

This is the description of the error in the alerts: "The disk Disk 1 – Hitachi HDS722020ALA330 SCSI Disk Device cannot be detected or has stopped responding. All subsequent protection activities that use this disk will fail until the disk is brought back online. (ID 3120)".

My backup system is becoming useless! I don't think this is a hardware issue (please correct me if I'm wrong), since the disk works fine for a while, although that period keeps getting shorter.
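One way to check whether "shorter and shorter" is a real trend rather than an impression is to record, for each cycle, when the disk came back after a reboot and when the ID 3120 alert fired, then compare the online times. This is a minimal Python sketch; the timestamps are hypothetical placeholders, not data from this server, and the names `cycles`, `hours_online` and `is_shrinking` are made up for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical data: (time the disk came back after a reboot,
# time the ID 3120 alert fired). Illustrative values only.
cycles = [
    ("2011-03-01 08:00", "2011-03-15 20:00"),
    ("2011-03-16 09:00", "2011-03-24 09:00"),
    ("2011-03-24 10:00", "2011-03-28 10:00"),
    ("2011-03-28 11:00", "2011-03-29 23:00"),
]


def hours_online(cycles):
    """Hours the disk stayed online in each reboot-to-failure cycle."""
    result = []
    for up, down in cycles:
        delta = datetime.strptime(down, FMT) - datetime.strptime(up, FMT)
        result.append(delta.total_seconds() / 3600)
    return result


def is_shrinking(values):
    """True if each cycle is strictly shorter than the one before it."""
    return len(values) >= 2 and all(b < a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    online = hours_online(cycles)
    for i, h in enumerate(online, 1):
        print(f"cycle {i}: {h:.1f} h online")
    print("shrinking trend:", is_shrinking(online))
```

A steadily shrinking online time like this is more consistent with a degrading disk than with a configuration problem, though on its own it does not prove it.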

I'm basically out of options to fix this problem. I tried to fix every error that showed up in Event Viewer, with no luck (including one about the SQL Server 2008 compatibility issue). The disk keeps failing! Now I'm just trying to recover/migrate the data from the problem disk, but I cannot add any drives to the server because it already has the maximum 8 TB of storage installed.

I have thought of two simple options. Please tell me what you think:

  1. Unplug one of the two storage pool disks (disk0, the one without problems) from the machine and install a new disk, so I can migrate the data with the DPM migration tool. Then remove the defective disk (disk1), put disk0 back, and run synchronization/consistency checks on all protection groups to recreate the replicas and recovery points.

  2. Run diskpart.exe and clean the disk (losing all its data), and hope that it works after I synchronize all the protection groups.

Neither solution is elegant, but I have no better options at the moment. Please, I need some help.

Thanks for your time

Angelo

Best Answer

Your disk IS going bad; it's just sliding quietly into the night instead of dropping dead on the spot. I'm not sure about the ML110 line, but it sounds like the reboot clears the error counters your disk controller uses to decide when to take the disk out of service. Either that, or the disk can only run for so long before the creeping error reappears. It's time for a new disk.
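A reboot resets the controller's view of the disk, but the drive's own SMART attributes are stored on the drive and survive reboots, so they offer a second opinion. The Python sketch below assumes you can get `smartctl -A` output from smartmontools for this disk (whether that works through the ML110 G5's controller is an assumption to verify) and flags non-zero raw values for reallocated, pending and offline-uncorrectable sectors. The sample output is hypothetical, and `WATCH`, `parse_attributes` and `warning_signs` are illustrative names.

```python
# Hypothetical `smartctl -A` attribute table; the values are made up.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
194 Temperature_Celsius     0x0002   171   171   000    Old_age   Always       -       35 (Min/Max 20/45)
"""

# Attributes whose non-zero raw counts commonly indicate media problems.
WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text):
    """Map attribute name -> first token of RAW_VALUE (int when numeric)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # this skips the header and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        attrs[fields[1]] = int(raw) if raw.isdigit() else raw
    return attrs


def warning_signs(attrs):
    """Watched attributes with a non-zero numeric raw value."""
    return {k: v for k, v in attrs.items()
            if k in WATCH and isinstance(v, int) and v > 0}


if __name__ == "__main__":
    print(warning_signs(parse_attributes(SAMPLE)))
```

Non-zero counts in these attributes are commonly treated as an early sign of media failure; a count that grows between runs is a stronger signal than a single reading.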

Since this drive is part of a RAID 1 set, a new drive should remirror cleanly once it is installed.
