How to Eliminate Tape Backup and Off-site Storage Service

backup · deduplication · replication · tape

PLEASE READ UPDATE AT THE BOTTOM. THANKS! 😉

Environment Info (all Windows):

  • 2 sites
  • 30 servers site #1 (3TB of backup data)
  • 5 servers site #2 (1TB of backup data)
  • MPLS backbone tunnel connecting site #1 and site #2

Current Backup Process:

Online Backup (disk-to-disk)

Site #1 has a server running Symantec Backup Exec 12.5 with four 1TB USB 2.0 disks attached. BE jobs run full backups of every server in site #1 to these disks nightly. Site #2 backs up to a central file server there using software that site already had in place when we acquired it. A BE job pulls that data to site #1 nightly and stores it on the same disks.

Off-site Backup (tape)

A tape drive is connected to our backup server. Once a week, BE backs up the external disks to tape, and the tapes are picked up by our off-site storage company. Naturally, we rotate two sets of tapes so that one is always on-site and one is always off-site.

Requirements:

  • Eliminate the need for tape and off-site storage service by doing disk-to-disk at each site and replicating site #1 to site #2 and vice versa.
  • Software-based solution, as hardware options have been too pricey (e.g., SonicWall, Arkeia).
  • Agents for Exchange, SharePoint, and SQL.

Some Ideas So Far:

Storage

DroboPro at each site with an initial 8TB of storage (currently expandable to 16TB). I like these because they are rack-mountable, accept disparate drives, and have an iSCSI interface. They are relatively cheap, too.

Software

Symantec Backup Exec 12.5 already has all the agents and licenses we need. I'd like to keep using it unless there is a better solution, similarly priced, that does everything BE does plus deduplication and replication.

Server

Because there is no longer any need for a SCSI adapter (for the tape drive), we are going to virtualize our backup server; it is currently the only remaining physical machine aside from the SQL boxes.

Problems:

  • When replicating between sites we want as little data as possible to go across the pipe. There is no deduplication or compression in what I have laid out here so far.
  • The files being replicated are BE's virtual tape libraries from our disk-to-disk backups. Since those huge files change every day, each of them would have to cross the wire in full every week.

And Finally, the Question:

Is there any software out there that does deduplication, or at least compression, to handle just our site-to-site replication? Or, looking at our setup, is there any other solution that I am missing that might be cheaper, faster, better?

Thanks. Sorry so long.

UPDATE 1:

I've set a bounty on this question to get it more attention. I'm looking for software that will handle replication of data between two sites while sending the least data possible over the wire (through compression, deduplication, or some other method). Something similar to rsync would work, but it needs to be native to Windows and not a port that takes shenanigans to get up and running. I'd prefer a GUI-based product, and I don't mind shelling out a few bones if it works.

Please, answers that meet the above criteria only. If you don't think one exists, or if you think I'm being too restrictive, keep it to yourself. If after seven days there is no answer at all, so be it. Thanks again, everyone.

UPDATE 2:

I really appreciate everyone coming forward with suggestions. There is no way for me to try all of these before the bounty expires. For now I'm going to let this bounty run out and whoever has the most votes will get the 100 rep points. Thanks again!

Best Answer

Windows Server 2003 R2 and later support DFSR (DFS Replication), which I have used extensively to sync and back up large amounts of data over a rather small pipe across three sites (80GB+ over a T1<-->T1<-->T1 topology).

msdn.microsoft.com/en-us/library/bb540025(VS.85).aspx

Replicating data to multiple servers increases data availability and gives users in remote sites fast, reliable access to files. DFSR uses a new compression algorithm called Remote Differential Compression (RDC). RDC is a "diff over the wire" protocol that can be used to efficiently update files over a limited-bandwidth network. RDC detects insertions, removals, and rearrangements of data in files, enabling DFSR to replicate only the deltas (changes) when files are updated.

DFSR is fully multimaster and can be configured however you want. That will keep your data in sync at the "backup" location for a very small amount of bandwidth and CPU.
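If you would rather script the setup than click through the DFS Management console, a rough sketch with the dfsradmin command-line tool might look like this. The group, server, and path names are placeholders, and the exact switches differ a bit between the 2003 R2 and 2008 versions of the tool, so treat it as a starting point rather than something to paste verbatim.

    :: Create a replication group and a replicated folder for the backup data
    dfsradmin rg new /rgname:BackupReplication
    dfsradmin rf new /rgname:BackupReplication /rfname:BackupData

    :: Add the member server at each site
    dfsradmin mem new /rgname:BackupReplication /memname:BACKUPSRV-SITE1
    dfsradmin mem new /rgname:BackupReplication /memname:BACKUPSRV-SITE2

    :: Create a connection in each direction between the two members
    dfsradmin conn new /rgname:BackupReplication /sendmem:BACKUPSRV-SITE1 /recvmem:BACKUPSRV-SITE2
    dfsradmin conn new /rgname:BackupReplication /sendmem:BACKUPSRV-SITE2 /recvmem:BACKUPSRV-SITE1

    :: Point each member at its local folder; site #1 seeds the initial (primary) copy
    dfsradmin membership set /rgname:BackupReplication /rfname:BackupData /memname:BACKUPSRV-SITE1 /localpath:D:\Backups /membershipenabled:true /isprimary:true
    dfsradmin membership set /rgname:BackupReplication /rfname:BackupData /memname:BACKUPSRV-SITE2 /localpath:D:\Backups /membershipenabled:true

From here, you can layer the Volume Shadow Copy Service on top: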

technet.microsoft.com/en-us/library/cc785914.aspx

The Volume Shadow Copy Service can produce consistent shadow copies by coordinating with business applications, file-system services, backup applications, fast-recovery solutions, and storage hardware. Several features in the Windows Server 2003 operating systems use the Volume Shadow Copy Service, including Shadow Copies for Shared Folders and Backup.

The shadow copies reside on disk and take "no space" aside from the blocks that changed from snapshot to snapshot. This is a process that can run on a live dataset with no ill effects, aside from slightly increased disk I/O while the snapshot is being created.

I used this solution for quite some time with great success. Changes to files were written out to the other sites within seconds (even over the low-bandwidth links), even in cases where just a few bytes out of a very large file changed. Each snapshot can be accessed independently of any other snapshot taken at any point in time, which gives you backups in case of emergency with very little overhead. I set the snapshots to fire at 5-hour intervals, plus once before the workday started, once during the lunch hour, and once after the day was over.
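You can set that kind of schedule through the Shadow Copies tab in the volume's properties, or script it. A minimal sketch from the command line, assuming the backup data lives on D: (the drive letter, storage size, and task names are placeholders):

    :: Reserve shadow copy storage on the data volume (size is just an example)
    vssadmin add shadowstorage /for=D: /on=D: /maxsize=20GB

    :: Take a snapshot of D: right now (available on the server SKUs)
    vssadmin create shadow /for=D:

    :: Schedule extra snapshots, e.g. before and after the workday
    schtasks /create /tn "Shadow Copy D 0700" /sc daily /st 07:00 /ru SYSTEM /tr "vssadmin create shadow /for=D:"
    schtasks /create /tn "Shadow Copy D 1800" /sc daily /st 18:00 /ru SYSTEM /tr "vssadmin create shadow /for=D:"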

With this, you could store all data in parallel at both locations, kept relatively up to date and "backed up" (which amounts to versioned, really) as often as you want it to.

The Shadow Copy Client can be installed on the client computers to give them access to the versioned files, too.

www.microsoft.com/downloads/details.aspx?FamilyId=E382358F-33C3-4DE7-ACD8-A33AC92D295E&displaylang=en

If a user accidentally deletes a file, they can right-click the folder, open Properties, go to the Previous Versions (Shadow Copies) tab, select the latest snapshot, and copy the file out of the snapshot into the live folder, right where it belongs.

MSSQL backups can be written out to a specific folder (or network share), which would then automatically be synced between sites and versioned on the schedule you define.
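For example, a nightly native full backup dropped into a folder that DFSR replicates might look like this (the server, database, and path names are placeholders):

    :: Write a full native backup into a DFSR-replicated folder
    sqlcmd -S SQLSERVER01 -E -Q "BACKUP DATABASE [ProductionDB] TO DISK = N'D:\Backups\SQL\ProductionDB.bak' WITH INIT, NAME = 'Nightly full backup'"

Schedule that with SQL Server Agent or schtasks, and DFSR takes care of getting the .bak file to the other site, where the shadow copy schedule versions it.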

I've found that data redundancy and versioning with these can act as an awesome backup system. It also gives you the option to copy a specific snapshot offsite without interfering with the workflow, as the files it reads from aren't in use...
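On Server 2008 and later, for example, diskshadow can expose one of those snapshots as a drive letter so an offsite copy job reads from the frozen view instead of the live files (the shadow copy ID and drive letters below are placeholders):

    :: Find the ID of the snapshot you want to copy out
    vssadmin list shadows

    :: Expose that snapshot as Z: using a one-line diskshadow script
    echo expose {3f7c3a1e-1234-5678-9abc-000000000000} Z: > expose.txt
    diskshadow /s expose.txt

    :: Copy the point-in-time data wherever you like
    robocopy Z:\ E:\OffsiteStaging /MIR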

This should work with your setup, as the second backup site can be configured as a read-only sync/mirror.