Sql-server – How to detect robocopy’s failure to delete from source

robocopysql serverwindows-server-2008

On my SQL Server instance I use SQL Agent to run a daily backup job in two steps.

One of the steps uses Robocopy to move local backup files to network storage.

The command to move the files looks like this:

robocopy M:\backups \\NAS\backups$\Database /MOV /MIR /XJ /NP /COPY:DT

The output of the command in the job step history looks like this:

-------------------------------------------------------------------------------     
ROBOCOPY     ::     Robust File Copy for Windows                                
-------------------------------------------------------------------------------      
Started : Fri Jul 20 00:55:42 2012     
Source : M:\backups\       
Dest : \\NAS\backups$\Database        
Files : *.*            
Options : *.* /S /E /COPY:DT /MOV /PURGE /MIR /NP /XJ /R:1000000 /W:30     
------------------------------------------------------------------------------  
3  M:\backups\
      *EXTRA File       15.5 m  GeoDisplay_Full_2012-07-19-000004.bak
      *EXTRA File       41.3 m  GeoDisplay2_Full_2012-07-19-000004.bak
      *EXTRA File      264.1 g  Webstore_Full_2012-07-19-000004.bak
      New File          15.5 m  GeoDisplay_Full_2012-07-20-000002.bak
      New File          41.4 m  GeoStore_Full_2012-07-20-000002.bak
      New File         302.1 g  Webstore_Full_2012-07-20-000002.bak

      2012/07/20 04:34:50 ERROR 32 (0x00000020) Deleting Source File M:\backups\Webstore_Full_2012-07-20-000002.bak  The process cannot access the file because it is being used by another process.       
------------------------------------------------------------------------------
      Total    Copied   Skipped  Mismatch    FAILED    Extras
Dirs :         1         0         1         0         0         0     
Files :         3         3         0         0         0         3     
Bytes : 302.187 g 302.187 g         0         0         0 264.181 g     
Times :   3:38:57   3:38:45                       0:00:00   0:00:11         
Speed :            24720063 Bytes/sec.     
Speed :            1414.493 MegaBytes/min.       
Ended : Fri Jul 20 04:34:50 2012.  
Process exit code 3.

The text output clearly shows ERROR 32. Robocopy failed to honor the /MOV switch (delete from source after copy) because another process had a handle on one file when robocopy tried to delete it.

Robocopy returned exit code 3 (new files copied to destination and extra files deleted from destination). This is correct but incomplete because there is no way to tell from the error code that any operation failed.

SQL Agent considers only the return code, not the command output, when determining the success of a command shell operation. I could modify the Robocopy command to save the output to disk, and parse the output in an extra job step, but this requires extra programming and would add another dependency to the backup job.

Is there any way to detect this kind of failure without searching Robocopy's text output for the string ERROR 32?

Best Answer

Whether it's RoboCopy or some other method, I prefer to log all output and then post-process the log. I personally prefer Perl for that task but use whatever you're comfortable with. I aslo like to have the test checking script email me the results, showing just whether it succeeded or in the case of a failure the relevant lines from the log.

In my opinion, any backup operation that does not include reporting the results is a disaster waiting to happen, simply because you cannot possibly have confidence in an operation that is not checked. Humans do an appallingly poor job of checking logs, so make the extra effort to write the script. Without it you can just about be guaranteed that some critical operation will fail one day and you'll be none the wiser for it, possibly risking far more than just the data.