Windows – How to fix system state backup error and NTDS VSS (error: 0x800423f4) state [11]

system-state vss windows windows-server-backup

The SBS 2011 server (Windows Server 2008 R2, Exchange at SP1) suddenly stopped making backups, failing with this error:

Backup unsuccessful. A Volume Shadow Copy Service operation failed. Unknown error (0x800423f4).

When manually starting the backup from SBS console, the backup will fail after 52 seconds.

Hardware setup

The backup sources are two RAID-1 volumes connected to a P420:

  • 2 x 128GB Samsung SSD 840 — 78 GB out of 119 GB available
  • 2 x 300GB ATA WDC WD3000HLFS — 218 GB out of 279 GB available

The backup destination is a USB drive with 298 GB of (free) space.

System State backup fails

> wbadmin start systemstatebackup -backuptarget:\\?\Volume{3956a561-b129-11e3-805c-7446a0f49555}
...(203.18 MB)...

Failure in a Volume Shadow Copy Service operation.

ERROR - Volume Shadow Copy Service operation error (0x800423f4)
The writer experienced a non-transient error.  If the backup process is retried,
the error is likely to reoccur.

I could not read .etl files

The wbadmin command output also points to log files that should be available at C:\Windows\Logs\WindowsServerBackup\, however there are no .log files there (only .etl files).

NTDS writer is in state "[11] Failed"

> Vssadmin list writers

The only item with an error is the NTDS writer:

Writer name: 'NTDS'
   Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
   Writer Instance Id: {d88809aa-a5ef-460e-84c0-4dd8a8350184}
   State: [11] Failed
   Last error: Non-retryable error

Event viewer

In the Application event log in Event Viewer, the wbadmin start systemstatebackup command registers

  • an error for application Backup with Event-ID 521 and error number 2155348129.
  • After starting the command the ESENT event IDs occur in this order: 2001, 2001, 2003, 2006, 2003, 2006,
  • then there is the VSS event 8229 with error 0x800423f4,
  • then there are 18264 events (MSSQL database backup succeeded for MICROSOFT##SSEE, SBSMONITORING and SHAREPOINT),
  • and finally there is the Backup event 521 with error 2155348129.
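
For reference, the decimal error number in the Backup events and the hex code in the VSS events can be compared by putting both in the same form. The following is a minimal Python sketch, assuming the standard 32-bit HRESULT layout (severity in bit 31, an 11-bit facility in bits 16-26, a 16-bit code in bits 0-15). The function name `decode_hresult` is made up for illustration and is not part of any Windows tool.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its standard fields (assumed layout)."""
    value &= 0xFFFFFFFF  # normalise to an unsigned 32-bit value
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # severity bit: 1 means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility code
        "code": value & 0xFFFF,             # 16-bit error code
    }


# Values taken from the event log and the wbadmin output in this question.
for raw in (2155348129, 0x800423F4):
    print(decode_hresult(raw))
```

Under these assumptions, 2155348129 is 0x807800A1, so Backup event 521 carries a different code from the 0x800423f4 reported by VSS event 8229 and by wbadmin itself.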

What I have tried (without success)

  • Reboot
  • Disable CrashPlan backup service
  • Disable SQL Server VSS Writer
  • C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN>PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures
  • Clear Volume Shadow Copy files for boot volume

    > vssadmin delete shadows /for=c: /all

  • Set Volume Shadow Copy to use unlimited space on both volumes

  • Delete backup catalog

    > wbadmin delete catalog

  • Restart the COM and DCOM services

  • Restart the Volume Shadow Copy Service
  • Uninstall Windows Backup component; reboot; install Windows Backup component
  • Install Update Rollup 4 for Windows Small Business Server 2011 Standard (KB2885319)
  • Re-register the VSS DLLs
  • Install Sharepoint 2010 Foundation SP2
  • cd "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN";PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures
  • Increase swap file from 32MB to 1.5x RAM (90000 MB)
  • Run dcdiag /fix; remove old domain controller; reboot; run dcdiag /fix again

Command "dcdiag /fix" fails

Starting test: NCSecDesc
    Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
       Replicating Directory Changes In Filtered Set
    access rights for the naming context:
    DC=DomainDnsZones,DC=CONTOSO,DC=COM
    Error NT AUTHORITY\ENTERPRISE DOMAIN CONTROLLERS doesn't have
       Replicating Directory Changes In Filtered Set
    access rights for the naming context:
    DC=ForestDnsZones,DC=CONTOSO,DC=COM
    ......................... Contoso-DC1 failed test NCSecDesc 

FRS event log

The File Replication Service log shows some errors with ID 13568 (translated from Dutch): The File Replication Service encountered the following error in the replica set DOMAIN SYSTEM VOLUME (SYSVOL SHARE): JRNL_WRAP_ERROR.

How do I get this server to complete its backups again?

Best Answer

Volume shadow copying may stop working at times for a number of reasons I don't really get. But I have had success in making the VSS service run correctly again by deleting all existing shadow copies on a particular volume. Do it like this from an elevated command prompt:

vssadmin delete shadows /for=c: /all

I see that you tried to reset the VSS copies for your volumes, but did you do it like this?

Next, check out the ETL files you get - they are parseable if you use the VSS tracing tools available here. In particular, try doing:

vsstrace -etl <file.etl> -o <outfile>

This should give you the logged events in a readable format. If this doesn't give you anything worthwhile, try getting a list of VSS writers like this:

vssadmin list writers

The result should be a list of entities that use the VSS service to write stuff along with a Last error: entry per writer. In particular, you should check if there is more than just the one failing component.
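
If the writer list is long, a small script can pick out the entries that need attention. This is only a rough Python sketch: it assumes English-language output in the layout shown in the question, and it assumes a healthy writer reports `State: [1] Stable`. The function name `unstable_writers` and the healthy sample entry are invented for illustration.

```python
import re


def unstable_writers(vssadmin_output: str) -> list[dict]:
    """Return every writer whose state is not '[1] Stable'."""
    writers = []
    current = None
    for line in vssadmin_output.splitlines():
        line = line.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            # A new writer block starts here.
            current = {"name": match.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    # Writers with a missing State line are reported too, since they are not known to be stable.
    return [w for w in writers if w["state"] != "[1] Stable"]


# First entry is an illustrative healthy writer; the NTDS entry is from the question.
sample = """\
Writer name: 'Example Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'NTDS'
   Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
   Writer Instance Id: {d88809aa-a5ef-460e-84c0-4dd8a8350184}
   State: [11] Failed
   Last error: Non-retryable error
"""

for writer in unstable_writers(sample):
    print(writer["name"], writer["state"], writer["last_error"])
```

To use it on real output, redirect the command to a file (`vssadmin list writers > writers.txt`) and pass the file's contents to the function.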

EDIT: and this - I just remembered I fixed wbadmin strangeness by resetting the backup catalog. This may or may not be an option for you, but I did it like this:

wbadmin delete catalog

Hope it helps!