LTO-4 Tape Write Source Throughput Needs


I am looking to begin a tape backup regimen and want to keep data flowing to the tape drive at a sufficient rate (target: 120+ MB/s sustained), but I cannot figure out how to do so without a dedicated source drive/array that idles when not writing tapes. The documentation for our specific drive mentions no minimum required throughput.

Environment

  • Debian Linux, writing to tape using mt & tar, backing up RAR archives with recovery records, each ~1GB-300GB in size
  • LTO-4 Tapes on Quantum TC-42BN tape drive via SAS over external SFF Cable
  • Server is used for file backups only, no network services or fileserving.
  • MD RAID arrays with data intermittently being read/written in spurts throughout the day/night.

If the source array sees significant reads/writes (from scheduled backups) during a tape write, throughput to the tape drops dramatically, even if only temporarily. So, some questions about source array throughput during tape writes:

  1. I am assuming a sustained drop in throughput to below 10-20MB/s (or less) on the source during a tape write would be a problem?
  2. Do I need to have a source guaranteed to have no backups scheduled to it? Essentially 2 arrays minimum; one for backups and one for archives and tape writing?
  3. Is there a QOS for drives/arrays that could prioritize the tape writing over all else?
  4. LTO-4 tape drives throttle, so is there a common lower throughput limit to maintain for LTO-4 or does it vary widely per drive? Again, documentation mentions max designed speed and "variable speed transfers", but no mention of how variable.
  5. Am I missing something in this source-throughput equation, or have unfounded worries?

Update:

I decided to tax things minimally with a single I/O stream: a 600GB archive job reading from the array at about 30MB/s sustained while a tar was being written to the tape from a 4-drive RAID 6 with consumer SATA disks. Judging by the sound of the drive, the tape definitely slowed to a crawl but did NOT seem to run out of data or shoe-shine. This tells me NOT to expect our hardware configuration to keep up during a full scheduled backup, though it can handle a less taxing I/O job while writing to tape.

Of note, LTO-4 tapes must do 56 end-to-end passes, so the drive effectively writes in ~14GB chunks before it stops for a few seconds to slow down and then "go" in the other direction. I think this helped keep the drive "fed" with data under lower throughput, as I have read-ahead and async writes set in stinit.def.
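A quick sanity check of that ~14GB figure, as a rough sketch assuming the nominal LTO-4 native capacity of 800GB and the rated 120MB/s maximum (both are general LTO-4 numbers, not taken from my drive's documentation):

```shell
#!/bin/sh
# Back-of-envelope figures only; capacity and rate are nominal LTO-4 specs (assumed).
capacity_gb=800   # native (uncompressed) capacity
passes=56         # end-to-end passes per tape
rate_mb_s=120     # rated maximum native transfer rate

# GB written per end-to-end pass (integer arithmetic, rounds down)
gb_per_pass=$(( capacity_gb / passes ))
# Seconds of streaming per pass at the rated speed (1 GB taken as 1000 MB)
sec_per_pass=$(( capacity_gb * 1000 / passes / rate_mb_s ))

echo "per pass: ~${gb_per_pass} GB, ~${sec_per_pass} s at ${rate_mb_s} MB/s"
```

So at full speed the drive would turn around roughly every two minutes, and less often when it throttles down.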

Another note: a read with "dd if=/dev/st0 of=/dev/null" only reached 107MB/s. I would assume this, and NOT 120 MB/s, is the real-world maximum effective throughput of this drive. The drive is currently on a dedicated SAS PCIe HBA with no other PCIe cards installed.

In the meantime, I set up a 1TB RAID0 as a Disk2Tape buffer and had to add another disk to the server to make this feasible.

I would still love to find a way to apply some sort of QOS to the tape drive and make writing to tape the top priority so we can simplify our arrays and reduce parasitic hardware costs. In the meantime, I'm not seeing a way around having a dedicated disk2tape buffer if I want to ensure continuous writes no matter what scheduled jobs hit the array.

Best Answer

mbuffer is a small, handy tool that can help you maintain a sustained data flow to the tape drive. It’s available on most Linux distributions.

mbuffer - buffers I/O operations and displays the throughput rate. It is multi-threaded, supports network connections, and offers more options than the standard buffer.


Example usage with multithreaded compression on-the-fly:

tar cvf - /backupdir | lbzip2 | mbuffer -m 4G -L -P 80 > /dev/st0

  1. start adding files to the tar file archive
  2. (optional) compress it with lbzip2 to use all CPU cores
  3. begin filling memory buffer
  4. once filled to 80%, start sending data to the tape drive

mbuffer parameters explained:

  • -m 4G 4GB memory buffer size. If necessary or available, use a larger buffer.
  • -L locked in memory (optional)
  • -P 80 start writing to the tape after 80% of the buffer has filled. There's no need to put 100 as it'll take some time for a tape drive to start writing, and it'll probably fill to 100% by then.

In this example, once the buffer fills to 80% of capacity, mbuffer starts sending data to the tape while continuing to receive the archive stream.

If the archiving process is slow and mbuffer does not receive data fast enough to keep up with the tape drive, the buffer drains, and once it reaches 0% mbuffer stops sending data to the tape drive. When the memory buffer has filled to 80% again, it resumes sending data to the tape drive, and recording continues at full speed.

This way the "shoe-shining" of the tape is reduced to a minimum, and while it is writing the tape drive gets the data at the maximum speed it requires to sustain the stream.
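To get a feel for how long each write burst lasts, here is a rough sketch of the drain/refill cycle. The function name buffer_cycle and its arguments are made up for illustration, and the model is a simplification: it assumes the tape always consumes data at a fixed rate and ignores the drive's own speed matching.

```shell
#!/bin/sh
# buffer_cycle BUF_MB START_PCT TAPE_MB_S SRC_MB_S   (hypothetical helper)
# Prints how long the tape streams before the buffer runs dry and how long
# the refill to START_PCT takes. Integer arithmetic, results rounded down.
buffer_cycle() {
    buf_mb=$1; start_pct=$2; tape=$3; src=$4
    filled=$(( buf_mb * start_pct / 100 ))   # MB buffered when writing starts
    if [ "$src" -ge "$tape" ]; then
        echo "source keeps up; buffer never drains"
        return 0
    fi
    if [ "$src" -le 0 ]; then
        echo "stream $(( filled / tape ))s, refill never"
        return 0
    fi
    stream=$(( filled / (tape - src) ))      # buffer drains at (tape - src) MB/s
    refill=$(( filled / src ))               # tape stopped, refill at src MB/s
    echo "stream ${stream}s, refill ${refill}s"
}

# 4GB buffer, start at 80%, tape at 120 MB/s, source delivering 30 MB/s
buffer_cycle 4096 80 120 30
```

With the numbers from the question (a source slowed to about 30MB/s), the tape would write for roughly half a minute and then wait almost two minutes for the buffer to refill, so a larger buffer mainly lengthens each burst rather than removing the pauses.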

You can also use mbuffer in the reverse direction to read backup data from a tape drive and store the stream on some slower media or send it over the network.
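A minimal sketch of the reverse direction, assuming the tape was written with the pipeline above (lbzip2-compressed tar, mbuffer's default block size). The function name restore_from_tape and the target directory are placeholders, and the 20% threshold is only an example value:

```shell
#!/bin/sh
# restore_from_tape DEVICE TARGET_DIR   (hypothetical helper)
# Reads from the tape through a memory buffer, decompresses, and unpacks.
restore_from_tape() {
    device=$1; target=$2
    # -i: read from the given device instead of stdin
    # -m: total buffer size
    # -p: resume reading from the tape once the buffer drops below 20%
    # If a non-default block size (-s) was used when writing, pass the same here.
    mbuffer -i "$device" -m 4G -p 20 | lbzip2 -d | tar xvf - -C "$target"
}
```

It would be called as, for example, restore_from_tape /dev/st0 /restore, with the buffer letting the tape read in long bursts even when the target disk is slower than the drive.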