Web-server – Best way to keep servers in sync without any broken/missing files between syncs

rsync, synchronization, web-server

I'm assuming this is the right place to ask this sort of question, but please don't shoot me if I should have gone to a different Stack Exchange.

Anyhoo, what would you say is the best way to keep servers in sync? Obviously something like rsync, but here's the issue. Say you have two servers hosting the exact same files, and your A records point the same domain at both of them. Basically, what we are looking at here is a basic round-robin DNS load-balancing setup.

This obviously works fine if you only have static files, but what about user uploads? Say you host a simple board-based site (like 4chan) and load balance it across two servers. One person uploads an image, and it gets stored on whichever server their PC connected to (via the round-robin DNS). In that instant, another person comes along to view the file, but they are connected to the other server. The file hasn't had time to sync across the two servers yet, so they are left with a broken image.

I figured a way to get around this could be to have a separate subdomain for each server and if the file hasn't synced, then load from whichever servers it is already on (using some magic PHP code). I'm sure there must be something simpler though.
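To make that idea concrete, here is a rough sketch of the fallback, written in Python rather than PHP. Everything in it is an assumption for illustration: the function names `exists_on` and `pick_image_url`, the example host names, and the `have_local` callback that reports whether this server already has the file.

```python
import urllib.error
import urllib.request


def exists_on(base_url, path, timeout=2.0):
    """Return True if a HEAD request for the file succeeds on that server."""
    request = urllib.request.Request(base_url + path, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # 404s, refused connections, and timeouts all count as "not there".
        return False


def pick_image_url(path, local_base, peer_bases, have_local, check=exists_on):
    """Prefer the local copy; otherwise link to the first peer that has it."""
    if have_local(path):
        return local_base + path
    for base in peer_bases:
        if check(base, path):
            return base + path
    # Nobody has it yet: fall back to the local URL and let it 404.
    return local_base + path
```

A page template would call something like `pick_image_url("/img/a.png", "http://s1.example.com", ["http://s2.example.com"], os.path.exists_wrapper)` for each image, where the last argument is whatever local existence check fits the site. The cost is an extra HTTP round trip per missing file, which is part of why this feels more complicated than it should be.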

Edit:

Obviously I'm just talking about small-scale stuff hosted on something like a couple of VPSes (this is more of a hypothetical question that may come in useful some time), rather than a huge data centre with custom technology like Facebook's or Google's.

Best Answer

The "most correct" way to ensure that multiple machines have access to the same data is to place the data on a shared filesystem, which all machines access. NFS is the canonical solution, but it's far from the only option.

If you want to stick with something rsync-based, that's entirely doable for a small-scale site. The best way to avoid potential "missing data" problems is to save the file and sync it to the other machine before you send back the response to the upload. This will slow down upload responses, but it guarantees that any later request finds the file on both machines: until the response to the upload is issued, the client can't make any assumptions about the existence of the resource. Similarly, save and sync the file before creating its record in the database, so that another request (one that, say, enumerates entries in the DB) can't reference the file before it's available everywhere.
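The ordering described above (save, sync, record, respond) can be sketched roughly as follows. This is a minimal Python illustration, not a drop-in handler: `handle_upload`, `rsync_push`, the `push` and `record` callbacks, and the example target `web2:/srv/uploads/` are all hypothetical names, and the filename is assumed to be sanitized already.

```python
import os
import subprocess
import tempfile


def rsync_push(path, remote):
    """Copy one file to the peer with rsync; raises if rsync fails."""
    # remote is a hypothetical target such as "web2:/srv/uploads/"
    subprocess.run(["rsync", "-a", path, remote], check=True)


def handle_upload(data, filename, upload_dir, push, record):
    """Save, sync, record, and only then report success."""
    # Write to a temp file and rename, so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=upload_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    # mkstemp creates the file as 0600; make it readable by the web server.
    os.chmod(tmp_path, 0o644)
    final_path = os.path.join(upload_dir, filename)
    os.replace(tmp_path, final_path)

    # Block until the peer has the file; an exception here aborts the upload.
    push(final_path)

    # Only now make the file discoverable through the database.
    record(filename)
    return final_path
```

In a real handler, `push` might be `lambda p: rsync_push(p, "web2:/srv/uploads/")` and `record` whatever inserts the database row; the HTTP response is sent only after `handle_upload` returns. If the sync fails, the exception propagates before the row is written, so no other request can be pointed at a file that exists on only one machine.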
