Network tape restore is faster than disk to disk copy

bandwidth, hard drive, performance, tape, veritas

How can this be?

Running cp or rsync (with -W --inplace) takes two hours for 93 GB; a tape restore over the dedicated backup network takes 41 minutes. The tape restore runs at 50 MB/s; disk to disk was measured and calculated at 16 MB/s tops, or 2 MB/s if the CPU is busy.
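As a sanity check on those figures, here is a minimal sketch of the average-throughput arithmetic. It assumes the sizes are in gigabytes (1 GB = 1024 MB) and reports megabytes per second; the function name is made up for illustration. The averages come out lower than the quoted rates, which were presumably peak or spot measurements, but the roughly threefold gap is the same.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / (minutes * 60)

# Figures from the question: 93 GB in two hours (disk) vs. 41 minutes (tape).
disk_copy = throughput_mb_per_s(93, 120)
tape_restore = throughput_mb_per_s(93, 41)

print(f"disk to disk: {disk_copy:.1f} MB/s")          # 13.2 MB/s
print(f"tape restore: {tape_restore:.1f} MB/s")       # 38.7 MB/s
print(f"ratio: {tape_restore / disk_copy:.1f}x")      # 2.9x
```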

The restore software is Veritas NetBackup; the disks are on an EMC Symmetrix array over Fibre Channel. The box is an HP rx6600 (Itanium) with 16 GB of RAM running HP-UX 11i v2. All the disks are on one Fibre Channel card, listed as:

HP AD194-60001 PCI/PCI-X Fibre Channel 2-port 4Gb FC/2-port 1000B-T Combo Adapter (FC Port 1)

The disks are also all using Veritas Volume Manager (instead of HP LVM).


Update: It occurs to me that this is not just a straight disk-to-disk copy; in reality, it is a snapshot to disk copy. Could reading the snapshot be slowing things down that much? The snapshot is an HP VxFS snapshot (not a vxsnap); perhaps the interaction between the snapshot and VxVM is causing speed degradation?


Update: Using fstyp -v, it appears that the block size (f_bsize) is 8192; the default UNIX block size is 512 (or 8192/16). When testing with dd, I used a block size of 1024k (or 1048576, or 8192*128).
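To see whether block size alone matters, something like the following sketch could time sequential reads at the three sizes mentioned above (512, 8192, and 1048576 bytes). It is a Python stand-in for the dd test, not the commands actually run; the function name and the small temporary sample file are assumptions for illustration, and on a real test the path would point at the snapshot or the volume.

```python
import os
import tempfile
import time

def read_throughput(path, block_size):
    """Read the file sequentially in block_size chunks; return (bytes_read, MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 so each read() call requests block_size bytes from the OS
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate

# Hypothetical 4 MiB sample file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    sample_path = tmp.name

results = {}
for block_size in (512, 8192, 8192 * 128):
    results[block_size] = read_throughput(sample_path, block_size)
    print(f"bs={block_size:>7}: {results[block_size][1]:.1f} MB/s")

os.remove(sample_path)
```

Note that a small file read repeatedly will be served from the page cache, so the numbers are only meaningful against a file much larger than RAM or with caching bypassed.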

I really wonder if it is the block size. I read over at PerlMonks that the Perl module File::Copy is faster than cp; that is intriguing: I wonder.

If NetBackup is using tar, then it is not using cp: that might explain the speed increase as well.


Update: It appears that reading from the snapshot takes almost twice as long as reading from the actual device. Running cp is slow, as is tar writing to standard output. Using tar is slightly better (when writing to a file) but is limited to 8 GB files (the file in question is 96 GB or so). Using Perl's File::Copy with a non-snapshot volume seems to be one of the fastest ways to go.

I'm going to try that and will report here what I get.
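For comparison, a copy loop with an explicit large buffer can be sketched as below. This is a Python illustration of the general idea of copying in big chunks, not what File::Copy or NetBackup does internally; the helper name, the 1 MiB default, and the temporary files are all assumptions.

```python
import os
import shutil
import tempfile

def copy_with_buffer(src, dst, buffer_size=1024 * 1024):
    """Copy src to dst in buffer_size chunks; return the number of bytes written."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, buffer_size)
    return os.path.getsize(dst)

# Hypothetical small source file so the sketch is self-contained.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.dat")
dst_path = os.path.join(workdir, "copy.dat")
payload = os.urandom(3 * 1024 * 1024 + 17)  # deliberately not a multiple of the buffer
with open(src_path, "wb") as f:
    f.write(payload)

copied = copy_with_buffer(src_path, dst_path)
with open(dst_path, "rb") as f:
    copied_data = f.read()
print(f"copied {copied} bytes")

shutil.rmtree(workdir)
```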

Best Answer

Another question is whether you're I/O bound inside the FC network. Ask the SAN guys to demonstrate (graphs are good) the actual spare bandwidth available (oh, and if the FC switches are the Cisco ones, how they're ensuring they avoid the bandwidth issues inside the switch).