Real-time file sync between servers with hundreds of thousands of small files

files realtime rsync

I've been given the task of setting up two CentOS 7 servers on which not only the databases but also the files will be replicated. My problem is that there will probably be hundreds of thousands of files, if not a million, with sizes ranging from a few KB to ~1 GB.

I've read about

  • incron
  • lsyncd
  • git-annex
  • ChironFS

I'd like to hear about your experience with any of these, if you have used or are currently using them. How do they perform when files change, particularly with copies and deletions? I'm very wary of using anything based on rsync, because in my experience it is not very fast with a lot of small files, so I can't really use it for real-time file replication. Or am I wrong? Please prove me wrong. 🙂

Or maybe I'll need a 3rd and 4th server as fileservers? If yes, then the question still remains: How to replicate the files between the two servers in realtime?

Cheers!

Best Answer

If your servers are on the same LAN, then a clustered filesystem (e.g. GlusterFS) or a shared storage solution (e.g. via NFS) should be the better choice.

If your servers are in different locations, with only WAN connectivity, the above solutions will not work well. In this case, and if you only need one-way replication (i.e. from the active to the backup server), lsyncd is a good solution. Another option is csync2. Finally, another possibility is to use DRBD + DRBD Proxy (please note that the proxy component is a commercial plugin).
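To make the lsyncd suggestion more concrete: it watches the tree through inotify, aggregates the events for a few seconds, and then runs rsync on the changed paths only, instead of rescanning the whole tree. The Python sketch below illustrates that batching idea. It is not lsyncd's actual code; the helper names, the `/srv/files` path and the `backup` host are hypothetical, and deletions are deliberately left out.

```python
import os
import subprocess
from typing import Iterable, List


def coalesce(events: Iterable[str], root: str) -> List[str]:
    """Collapse raw change events into a sorted, de-duplicated list
    of paths relative to root (many events often hit the same file)."""
    return sorted({os.path.relpath(path, root) for path in events})


def build_rsync_command(root: str, target: str) -> List[str]:
    # --files-from=- makes rsync read the paths to transfer from stdin,
    # so it copies only the listed files and never walks the full tree.
    return ["rsync", "-a", "--files-from=-", root.rstrip("/") + "/", target]


def sync_batch(events: Iterable[str], root: str, target: str) -> None:
    """Push one batch of changed files to the target (assumed reachable
    over SSH). Does nothing if the batch is empty."""
    paths = coalesce(events, root)
    if not paths:
        return
    subprocess.run(
        build_rsync_command(root, target),
        input="\n".join(paths) + "\n",
        text=True,
        check=True,
    )
```

Calling something like `sync_batch(changed_paths, "/srv/files", "backup:/srv/files/")` every few seconds keeps the cost proportional to the number of changed files, which is why this approach copes with a very large tree where a plain periodic rsync of everything would not.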

Finally, if your servers only have WAN connectivity and you need bidirectional replication (i.e. both servers are active at the same time), basically no silver bullet exists. I'll list some possibilities, but I am far from recommending such a setup:

  • unison with its real-time plugin
  • psync, which I wrote precisely to solve a similar problem (but please note that it has its own share of idiosyncrasies, and I provide no support for it)
  • syncthing with its real-time plugin (but it has significant limitations, namely it does not preserve ACLs or the file's owner/group)