Storage Spaces Failed Disk – Volume Offline

storage-spaces windows-server-2012-r2 windows-storage-server

We have set up an inexpensive physical server with a bunch of 3TB disks to use as a backup staging area before we push to tape. We've installed Windows Server 2012 R2 and set up Storage Spaces/Pools. We back up with Veeam to a faster server running on Fibre Channel, and then use scripts to move backups older than a set number of days to our Storage Spaces server.

We had some failures originally, because using Robocopy to move the data by UNC path didn't gracefully close the SMB connection. We resolved this by adding net use and then net use /delete to the script (and using the mapped drive letter as the Robocopy target). This has worked beautifully for the last week or two.
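For anyone wanting to reproduce the workaround, here is a minimal sketch of the map, move, unmap sequence. It is not our actual script: the paths, the drive letter, the `min_age_days` value and the function names are all hypothetical, and it assumes Robocopy's usual convention that exit codes of 8 or higher mean at least one failure.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (Windows only in practice)."""
    return subprocess.run(list(cmd)).returncode


def move_old_backups(source: str, share: str, drive: str = "Z:",
                     min_age_days: int = 7,
                     run: Runner = default_runner) -> int:
    """Map the share, move old files with Robocopy, then always unmap.

    All argument values are illustrative assumptions, not the real setup.
    """
    # Map the UNC path to a drive letter so the SMB session is explicit.
    if run(["net", "use", drive, share]) != 0:
        raise RuntimeError("could not map " + share + " to " + drive)
    try:
        # /MOV deletes source files after copying, /E includes subfolders,
        # /MINAGE:n skips files newer than n days.
        rc = run(["robocopy", source, drive + "\\",
                  "/MOV", "/E", "/MINAGE:" + str(min_age_days)])
    finally:
        # Drop the mapping even if Robocopy failed, closing the connection.
        run(["net", "use", drive, "/delete"])
    if rc >= 8:
        raise RuntimeError("robocopy reported failures, exit code " + str(rc))
    return rc
```

The `run` parameter exists only so the sequence can be exercised without touching a real share; calling `move_old_backups(r"E:\Backups", r"\\staging01\backups")` on the source server would use the real commands.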

This morning, though, the scripts reported a failure. On investigation I found a series of event ID 51 warnings (source: disk), followed by event ID 134 (source: ReFS). This looks to me like a physical disk in the storage pool has failed. However, Server Manager showed the virtual disk (or volume; I'm not quite sure what to call it) as 'offline'. Simply bringing it back online worked, and there are no failed physical disks in the storage pool. There are also two hot spares, and neither has been swapped in.

What happened here, and why did the volume go offline? I thought the whole point of ReFS and Storage Pools was to provide resilience against exactly these kinds of failures.

EDIT: Adding all relevant logs below.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="disk" /> 
  <EventID Qualifiers="32772">51</EventID> 
  <Level>3</Level> 
  <Task>0</Task> 
  <Keywords>0x80000000000000</Keywords> 
  <TimeCreated SystemTime="2014-12-23T22:13:12.704827200Z" /> 
  <EventRecordID>23901</EventRecordID> 
  <Channel>System</Channel> 
  <Computer>****</Computer> 
  <Security /> 
  </System>
 <EventData>
  <Data>\Device\Harddisk25\DR25</Data> 
  <Binary>040080000100000000000000330004802D0100006B0400C000000000000000000000000000000000FC8F470200000000FFFFFFFF0100000058000030020000000020101280032040000080003C000000000020AB09E0FFFF783583D201E0FFFF0000000000000000507383D201E0FFFF30C99FC108E0FFFF6B0400C0000000008A00000000027C288D60000008000000000000000000000000000000000000000000000000000000</Binary> 
  </EventData>
  </Event>

An error was detected on device \Device\Harddisk25\DR25 during a
paging operation.

FYI: Disk25 is the virtual disk created by Storage Spaces, not one of the physical disks.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
 <System>
  <Provider Name="ReFS" Guid="{036647D2-2FB0-4E32-8349-3F5C19C16E5E}" /> 
  <EventID>134</EventID> 
  <Version>0</Version> 
  <Level>2</Level> 
  <Task>0</Task> 
  <Opcode>0</Opcode> 
  <Keywords>0x8000000000000000</Keywords> 
  <TimeCreated SystemTime="2014-12-23T22:13:13.329846900Z" /> 
  <EventRecordID>23902</EventRecordID> 
  <Correlation /> 
  <Execution ProcessID="4" ThreadID="31267444" /> 
  <Channel>System</Channel> 
  <Computer>*****</Computer> 
  <Security UserID="S-1-5-18" /> 
  </System>
<EventData>
  <Data Name="VolumeIdLength">2</Data> 
  <Data Name="VolumeId">D:</Data> 
  <Data Name="FailureReason">0xc000000e</Data> 
  </EventData>
  </Event>

The file system was unable to write metadata to the media backing
volume D:. A write failed with status "A device which does not exist
was specified." ReFS will take the volume offline. It may be mounted
again automatically.
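The FailureReason value in the event is an NTSTATUS code, and 0xc000000e is STATUS_NO_SUCH_DEVICE, which matches the message text above. As a rough sketch of pulling that field out of an exported event and decoding it (the lookup table is a small hand-picked subset, and the function names are my own):

```python
import xml.etree.ElementTree as ET

# Namespace used by exported Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hand-picked subset of NTSTATUS values; not a complete table.
KNOWN_STATUS = {
    0xC000000E: "STATUS_NO_SUCH_DEVICE",
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED",
    0xC0000185: "STATUS_IO_DEVICE_ERROR",
}

# The top two bits of an NTSTATUS value encode its severity.
SEVERITY = ["success", "informational", "warning", "error"]


def event_data(xml_text: str) -> dict:
    """Return the named <Data> fields of one event as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {d.get("Name"): (d.text or "").strip()
            for d in root.findall("e:EventData/e:Data", NS)}


def describe_status(hex_text: str) -> str:
    """Turn a hex NTSTATUS string into a readable summary."""
    code = int(hex_text, 16)
    name = KNOWN_STATUS.get(code, "unknown")
    return "0x%08X %s (%s)" % (code, name, SEVERITY[code >> 30])


# Trimmed copy of the ReFS event above, keeping only the EventData part.
SAMPLE = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="VolumeIdLength">2</Data>
    <Data Name="VolumeId">D:</Data>
    <Data Name="FailureReason">0xc000000e</Data>
  </EventData>
</Event>
"""
```

With this, `describe_status(event_data(SAMPLE)["FailureReason"])` gives `0xC000000E STATUS_NO_SUCH_DEVICE (error)`.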

 <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
 <System>
  <Provider Name="Microsoft-Windows-StorageSpaces-Driver" Guid="{595F7F52-C90A-4026-A125-8EB5E083F15E}" /> 
  <EventID>304</EventID> 
  <Version>0</Version> 
  <Level>3</Level> 
  <Task>0</Task> 
  <Opcode>0</Opcode> 
  <Keywords>0x8000000000000000</Keywords> 
  <TimeCreated SystemTime="2014-12-30T23:43:40.519688500Z" /> 
  <EventRecordID>21</EventRecordID> 
  <Correlation /> 
  <Execution ProcessID="4" ThreadID="3723912" /> 
  <Channel>Microsoft-Windows-StorageSpaces-Driver/Operational</Channel> 
  <Computer>****</Computer> 
  <Security UserID="S-1-5-18" /> 
  </System>
 <EventData>
  <Data Name="Id">{DE94C7EF-6A25-11E4-80B7-647002019326}</Data> 
  </EventData>
  </Event>

The virtual disk {de94c7ef-6a25-11e4-80b7-647002019326} is in a
degraded state. This can happen when a physical disk hosting the
virtual disk fails, is disconnected, or experiences a write error.

Windows will attempt to repair the virtual disk. No action is needed at this time.

Best Answer

Assuming you are definitely using a fault-tolerant layout such as parity or mirror, that error should not be possible. I was able to reproduce it on a striped (simple) layout using a disk I know to be bad. So either your virtual disk is set up for striping, or you have found a bug. I would involve Microsoft at this point, if you haven't already.
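To rule out the striping case, it is worth confirming which resiliency setting the virtual disk actually has. A rough sketch, assuming PowerShell's Storage module is available on the server (Get-VirtualDisk exposes a ResiliencySettingName of Simple, Mirror or Parity); the Python function names here are hypothetical helpers, not part of any tool:

```python
import json
import subprocess

# PowerShell pipeline that lists each virtual disk with its layout.
PS_QUERY = ("Get-VirtualDisk | "
            "Select-Object FriendlyName, ResiliencySettingName | "
            "ConvertTo-Json")


def query_virtual_disks() -> str:
    """Run the PowerShell query and return its JSON text (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_QUERY],
        capture_output=True, text=True, check=True)
    return result.stdout


def parse_virtual_disks(json_text: str) -> list:
    """Parse ConvertTo-Json output, which is an object for a single disk."""
    if not json_text.strip():
        return []
    data = json.loads(json_text)
    return data if isinstance(data, list) else [data]


def is_fault_tolerant(setting: str) -> bool:
    """Mirror and Parity keep redundancy; Simple is plain striping."""
    return setting.strip().lower() in {"mirror", "parity"}


def summarize(disks: list) -> list:
    """Return one human-readable line per virtual disk."""
    lines = []
    for disk in disks:
        setting = disk.get("ResiliencySettingName") or ""
        verdict = "fault tolerant" if is_fault_tolerant(setting) else "NOT fault tolerant"
        lines.append("%s: %s (%s)" % (disk.get("FriendlyName"), setting, verdict))
    return lines
```

On the server itself, `print("\n".join(summarize(parse_virtual_disks(query_virtual_disks()))))` would show the layout of each virtual disk; anything reported as Simple has no redundancy, so a single bad disk can take it offline.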
