CentOS – Please help me find a solution for two-way, real-time synchronization on CentOS 5.5 64-bit

centos synchronization

I need real-time, two-way synchronization software for CentOS 5.5 (64-bit).

Here's a little explanation:

It needs to be able to perform:

  1. Two way synchronization.

  2. It must be real-time. By real-time I mean near real-time; a delay of 1 second, for example, is fine.

  3. And the folders are on the same server.

I am currently using GlusterFS across two web servers. However, it has extremely poor small-file read performance, and it's slowing down my website. I have already tested many configurations and nothing more can be done to improve this. As a solution, I was going to mount a RAM drive (tmpfs) that mirrors the GlusterFS web files, and get the web server to use the RAM drive instead.

The issue is that I need two-way, real-time mirroring or replication between GlusterFS and the RAM drive. I need this because Apache writes files as well.

As I said, I need real-time, two-way synchronization across two folders, which are in fact two different mount points:
the RAM (tmpfs) mount point and the GlusterFS mount point.

I already know about:

  • Rsync – Which is one way
  • Unison – Which is not realtime

Please suggest any solution, free or paid.

Thanks in advance

Best Answer

While AFS looks to be an obvious solution, I previously looked at this in some depth for a highly custom web application, and the fastest, most efficient and most reliable solution was to implement the replication within the app, using rsync when bringing nodes back online. I did have a longer-term plan to implement demand-based resynchronisation using inotify as the trigger (but never found the time).
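To make "replication within the app" a bit more concrete, here is a minimal sketch of how I'd picture it, assuming Python 3 and an application that funnels all its file writes through a single helper. The function name `replicated_write`, the list of root directories and the 0644 file mode are made up for illustration; this is not code from the application I mentioned.

```python
import os
import tempfile


def replicated_write(relpath, data, roots):
    """Write `data` (bytes) to `relpath` under every directory in `roots`.

    Each copy is written to a temporary file and then renamed into place,
    so a reader never sees a half-written file.
    """
    for root in roots:
        target = os.path.join(root, relpath)
        parent = os.path.dirname(target)
        os.makedirs(parent, exist_ok=True)
        # Create the temp file next to the target so the rename stays
        # on the same filesystem.
        fd, tmp = tempfile.mkstemp(dir=parent)
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)
            # mkstemp creates the file as 0600; 0644 is an assumed mode
            # so that the web server can read the result.
            os.chmod(tmp, 0o644)
            os.replace(tmp, target)
        except BaseException:
            # Clean up the temp file if anything failed before the rename.
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
```

The app would call something like `replicated_write("uploads/a.txt", payload, [ram_root, gluster_root])`, where both roots are whatever mount points you use. Note that a failure part-way through the loop leaves the copies out of step, which is where a later rsync pass comes in.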

> As a solution, I was going to mount a RAM drive (tmpfs) that mirrors the GlusterFS web files but get the webserver to use the RAM drive

Yes, if GlusterFS does not support synchronization of cache invalidations, then it might be slightly faster (at the cost of not updating in real time). However, unless you have a very high data turnover, you'll probably find it faster to use an optimized file system on a conventional device rather than a RAM drive.
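Rather than take my word for it, it is easy enough to measure. The following is a rough sketch, assuming Python 3, that times small-file reads in whichever directory you point it at; the function name and the default file count and size are arbitrary choices of mine.

```python
import os
import time


def time_small_reads(directory, count=200, size=2048):
    """Create `count` files of `size` bytes in `directory`, then return
    the average number of seconds taken to open and read one of them.
    The files are removed again afterwards."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "bench_%d.tmp" % i)
        with open(path, "wb") as fh:
            fh.write(b"x" * size)
        paths.append(path)

    # Time only the open + read phase.
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as fh:
            fh.read()
    elapsed = time.perf_counter() - start

    for path in paths:
        os.unlink(path)
    return elapsed / count
```

Run it once against a directory on each candidate (the GlusterFS mount, the tmpfs mount, a local disk) and compare the numbers. Bear in mind that the files have only just been written, so on a local filesystem the reads will mostly be served from the page cache; treat the result as an indication, not a proper benchmark.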

You're probably going to see similar problems with any shared disk file systems (but this is mostly guesswork on my part).

A better solution would be to use a database (cluster?) to store any data - clustering is much more manageable (and easier to implement). See also MySQL replication and Cassandra.

/me wonders if you could use overlay filesystems (unionfs) - putting the local copy on top and the remote system underneath then run rsync periodically from top to bottom - although I suspect it may prove difficult to delete files.
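For what it's worth, the "top to bottom" pass could look something like the sketch below. It is written with Python 3's `shutil` in place of rsync purely so it is self-contained, and `push_newer`, `top` and `bottom` are names I've made up. It also shows the deletion problem: a file removed from the top directory is never removed from the bottom one.

```python
import os
import shutil


def push_newer(top, bottom):
    """Copy files from `top` to `bottom` when they are missing there or
    older there. Returns the relative paths that were copied.

    Deletions are NOT propagated: a file removed from `top` stays in
    `bottom`.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(top):
        rel_dir = os.path.relpath(dirpath, top)
        dest_dir = os.path.normpath(os.path.join(bottom, rel_dir))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                shutil.copy2(src, dst)  # copy2 also copies the mtime
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

You would call this from cron or a loop with a short sleep. It is one-way only, so it does nothing for changes made directly on the bottom side.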

HTH

C.