Write speed requirement: 1.1GB/s possibilities

performance · storage · write · zfs

We will have a machine at work that, at peak performance, should be able to push 50 "write heads" x 75GB of data per hour. That's a peak write speed of ~1100MB/s. To get that out of the machine, it requires two 10GbE links. My question is: what kind of server + technology can handle/store such a data flow?
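For reference, the arithmetic behind that figure:

    50 x 75 GB/hour = 3,750 GB/hour
    3,750 GB / 3,600 s ≈ 1.04 GB/s ≈ 1,040 MB/s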

Currently we use ZFS for data storage, although write speed was never a concern (we are not even close to these speeds). Would ZFS (ZFS on Linux) be an option? We also need to store a lot of data; the "IT guide" suggests somewhere between 50 and 75 TB in total. So it probably can't be all SSDs unless we want to offer up our first-born child.

Some additions based on the excellent replies:

  • The maximum is 50x75GB/hour during peak, which lasts less than 24h (most likely <6h).
  • We don't expect this to happen soon; most likely we will run 5-10x75GB/hour.
  • It's a pre-alpha machine, but the requirements should still be met (even though a lot of question marks are in play).
  • We would use NFS as the connection from the machine to the server (see the mount sketch after this list).
  • Layout: generating machine -> storage (this one) -> (safe RAID 6) -> compute cluster.
  • Read speed is not essential, but it would be nice to read the data from the compute cluster (this is completely optional).
  • Most likely it's going to be large data files (not many small ones).
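A minimal sketch of that NFS link, assuming a pool named tank, an export at /tank/data, and a 10.0.0.0/24 client subnet (all hypothetical names):

    # storage server, /etc/exports: "sync" keeps synchronous write semantics
    /tank/data  10.0.0.0/24(rw,sync,no_subtree_check)

    # generating machine: large rsize/wsize suit big sequential transfers
    mount -t nfs -o rw,hard,rsize=1048576,wsize=1048576 \
        storage:/tank/data /mnt/data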

Best Answer

Absolutely... ZFS on Linux is a possibility if architected correctly. There are many cases of poor ZFS design, but done well, your requirements can be met.

So the main determinant will be how you're connecting to this data storage system. Is it NFS? CIFS? How are the clients connecting to the storage? Or is the processing, etc. done on the storage system?

Fill in some more details and we can see if we can help.

For instance, if this is NFS with synchronous mounts, then it's definitely possible to scale ZFS on Linux to meet the write-performance needs and still maintain the long-term storage capacity requirement. Is the data compressible? How is each client connected? Gigabit Ethernet?
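If the data does turn out to be compressible, ZFS makes that cheap to exploit and easy to measure; a quick sketch (the pool name tank is hypothetical):

    # inline lz4 compression is nearly free on modern CPUs
    zfs set compression=lz4 tank

    # after writing representative data, check the achieved ratio
    zfs get compressratio tank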


Edit:

Okay, I'll bite:

Here's a spec that's roughly $17k-$23k and fits in a 2U rack space.

HP ProLiant DL380 Gen9 2U Rackmount
2 x Intel E5-2620v3 or v4 CPUs (or better)
128GB RAM
2 x 900GB Enterprise SAS OS drives 
12 x 8TB Nearline SAS drives
1 or 2 x Intel P3608 1.6TB NVMe drives

This setup would give you 80TB of usable space using either hardware RAID 6 or ZFS RAIDZ2.
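For the ZFS route, pool creation would look roughly like this; the device names are hypothetical, and in production you would use stable /dev/disk/by-id paths:

    # 12 x 8TB in one RAIDZ2 vdev: 10 data + 2 parity = 80TB usable
    # ashift=12 aligns writes to the 4K sectors of nearline drives
    zpool create -o ashift=12 tank raidz2 \
        sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm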

Since the focus is NFS-based performance (assuming synchronous writes), the P3608 NVMe drives (as a striped SLOG) can absorb all of those writes easily. They can sustain 3GB/s in sequential writes and have a high enough endurance rating to handle the workload you've described continuously. The drives can easily be overprovisioned to add some protection under a SLOG use case.
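One way to do that overprovisioning is to partition only a slice of each NVMe device and leave the rest unallocated as spare area; a sketch with hypothetical sizes and device names:

    # carve ~100GiB from each 1.6TB P3608 for the SLOG
    parted -s /dev/nvme0n1 mklabel gpt mkpart slog 1MiB 100GiB
    parted -s /dev/nvme1n1 mklabel gpt mkpart slog 1MiB 100GiB

    # log devices listed without "mirror" are striped
    zpool add tank log nvme0n1p1 nvme1n1p1

At ~1.1GB/s with a 15-30 second flush interval, the SLOG only ever holds a few tens of GB of in-flight synchronous writes, so 100GiB per device is generous.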

With the NFS workload, writes will be coalesced and flushed to spinning disk. Under Linux, we would tune this to flush every 15-30 seconds. The spinning disks can handle that rate, and may benefit even more if the data is compressible.
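On ZFS on Linux, that flush interval is the transaction-group timeout; a sketch of the tuning, using 15 seconds as the example value:

    # runtime change
    echo 15 > /sys/module/zfs/parameters/zfs_txg_timeout

    # persist across reboots
    echo "options zfs zfs_txg_timeout=15" > /etc/modprobe.d/zfs.conf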

The server can be expanded with 4 open PCIe slots, plus a FlexibleLOM slot that takes dual-port 10GbE FLR adapters. So you have networking flexibility.
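Since the question calls for two 10GbE lines, those links could be aggregated with Linux bonding; a sketch using iproute2, with hypothetical interface names and a switch configured for LACP:

    # create an 802.3ad (LACP) bond and enslave the two 10GbE ports
    ip link add bond0 type bond mode 802.3ad
    ip link set ens1f0 down && ip link set ens1f0 master bond0
    ip link set ens1f1 down && ip link set ens1f1 master bond0
    ip link set bond0 up
    ip addr add 10.0.0.10/24 dev bond0

Note that LACP hashes per flow, so a single NFS TCP connection still tops out at one 10GbE link; it takes multiple clients or connections to fill both.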