Linux – How to host a single website on multiple geographically-diverse servers

cloud, geodns, linux, web-hosting, web-server

I currently have two servers, both using cPanel/WHM. The first is a VPS hosted in London (we'll call it "international") and the second is a dedicated server located in my country (we'll call it "local").

"local" will have unlimited local bandwidth, however it will only have 1Mbps of international bandwidth.

I need to host a single website (or maybe multiple websites) on both servers and serve each visitor based on their country of origin: when the visitor is from my own country, the data will be served from "local", and when the visitor is from any other country, the data will be served from "international".

Both types of visitor can perform read/write operations on the servers, so I need to sync files and databases between the two, as both servers will have updated files and databases.

So, how can this be done with regard to DNS and synchronisation? What is both easy and possible? Can anyone guide me through the steps I have to perform?

Best Answer

The first, simple, straightforward, and above all, robust solution is to give up on your plans for having two servers, and just run one machine out of a suitably central location. While I understand the rationale for not hosting anything off your local server, because of the constrained international bandwidth, I don't see anything in your question that requires a local server presence.

If your reasons for wanting a local server are purely about performance, I'd seriously recommend looking at a local static asset server, with all of the dynamic stuff going to London. While geoDNS isn't trivial, it's an awful lot easier than robust real-time synchronisation of your dynamic assets and database. This mechanism is used by many sites (this one included) to improve overall perceived page speed, and it works rather well.
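To make that concrete, here's a minimal sketch of the local static-asset vhost (Apache syntax, since cPanel ships Apache; the hostname and paths are illustrative assumptions). The pages generated in London would then reference their images, CSS, and JS via that hostname:

    # Local static-asset vhost; static.example.com and the
    # DocumentRoot are placeholders -- substitute your own.
    <VirtualHost *:80>
        ServerName static.example.com
        DocumentRoot /var/www/static

        # Static assets change rarely, so let clients cache them hard.
        <IfModule mod_expires.c>
            ExpiresActive On
            ExpiresDefault "access plus 7 days"
        </IfModule>
    </VirtualHost>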

Assuming that isn't the case here, and you really do need two servers, I see a massive flaw in your plan -- the 1Mbps international bandwidth is going to be fairly saturated by your synchronisation traffic. You'll want to hope that your site doesn't get too popular, or you'll be in a whole world of pain.

You're in a fairly favourable position vis-à-vis DNS, because you've got a clearly defined subset of addresses that you want to serve particular records to. Presumably you can get a list of netblocks from your provider that delineate what counts as "local, bandwidth unlimited" traffic, and what counts as "international, 1 Mbps capped" traffic. If your provider can't do that, I'd be asking them how the hell they're actually doing the rate limiting, because there's got to be a list in there somewhere. Worst case, if they're just doing it based on "anything that we see announced over this BGP link is local", you still should be able to get a list of the prefixes on that link.

So, the DNS stuff comes down to "for A record requests to www.example.com, serve localip if the source address is in the list of local prefixes, and internationalip otherwise". How you script that for your given DNS servers is up to you; I'd go with tinydns, because I use it for everything I can and it's pretty awesome at this particular task.
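For illustration, a tinydns data file for this looks roughly like the following (the prefixes here are made-up examples; substitute the netblock list you get from your provider):

    # Clients matching these prefixes are in location "lo" (local);
    # the empty prefix is a catch-all for everyone else ("in").
    %lo:203.0.113
    %lo:198.51.100
    %in:

    # Same name, different A record per location; keep the TTL short
    # so clients re-resolve reasonably quickly.
    +www.example.com:203.0.113.10:300::lo
    +www.example.com:192.0.2.10:300::in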

But that's about 1% of the total problem. You've got a much, much bigger issue on the dynamic assets side of town.

The database is actually the (relatively) easy bit. Both MySQL and PostgreSQL support multi-master replication, whereby writes to either database get replicated to the other (more or less) automatically. It's not exactly trivial to set up, and you need to monitor the bejesus out of it to detect when it breaks and fix it, but it is possible in a fairly standardised way.
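For the MySQL side, a minimal sketch of the classic two-node master-master setup (server IDs, hostnames, and credentials here are placeholders):

    # my.cnf on one server; on the other, use server-id = 2 and
    # auto_increment_offset = 2 so the two sides never hand out
    # colliding auto-increment keys.
    [mysqld]
    server-id                = 1
    log-bin                  = mysql-bin
    auto_increment_increment = 2
    auto_increment_offset    = 1

Then, on each server, point the replication at the other one:

    -- Log file and position come from SHOW MASTER STATUS on the peer.
    CHANGE MASTER TO
        MASTER_HOST     = 'other-server.example.com',
        MASTER_USER     = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS  = 4;
    START SLAVE;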

Your files, on the other hand, require a lot more local intelligence. To make this work, you'll need to design your file storage properly to allow the replication to work. It's even more entertaining because you say you need to support deletion.

Really, periodic rsync is your best friend for this. Ignoring the modification and deletion aspects of things for a second, if you make sure that your filenames can't collide on both sides (using UUIDs or database PKs as the basis for all your filenames will work nicely) you should be able to just do periodic rsyncs of each side to the other, and all new files created on each side will magically appear on the other. How often you do the rsync depends on how much time you can stand before everything is synchronised -- that's a call you have to make. Your application also needs to intelligently handle cases where (for example) the DB records have synchronised but the files haven't.
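As a minimal sketch, assuming UUID-based filenames, passwordless SSH between the two boxes, and illustrative paths:

    #!/bin/sh
    # Two-way sync of new files only. No --delete: with collision-free
    # (UUID) filenames, pulling then pushing simply unions the two
    # trees. Hostname and paths are placeholders.
    LOCAL=/var/www/example.com/uploads/
    REMOTE=sync@international.example.com:/var/www/example.com/uploads/

    rsync -a "$REMOTE" "$LOCAL"   # grab their new files
    rsync -a "$LOCAL" "$REMOTE"   # send them ours

Run that from cron at whatever interval matches how much sync lag you can tolerate.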

The deletion makes things a lot harder: you can't just run a blind rsync -a --delete, because anything that the sender doesn't have will be deleted from the receiver -- a great way to lose lots of data. I'd prefer to have a deletion log, and run through it every now and then and delete things from the other side. If that doesn't appeal, you can go fancier, with two separate filesystems at each end (one for "local data", the other for "replica of the other end"), and either access both of them from your application, or use a union filesystem layer to make them look like one filesystem to the webserver.
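Here's a sketch of the deletion-log idea, assuming the application appends one relative path per line to a log file whenever it removes something (the log location and hostname are assumptions). Run it before the rsync pass above, or the other side will cheerfully copy the deleted file straight back:

    #!/bin/sh
    # Replay locally-logged deletions against the other server.
    DOCROOT=/var/www/example.com/uploads
    LOG=/var/log/app/deleted.log

    # Rotate the log atomically so entries written mid-run aren't lost.
    [ -f "$LOG" ] || exit 0
    mv "$LOG" "$LOG.work"

    while IFS= read -r path; do
        ssh sync@international.example.com "rm -f '$DOCROOT/$path'"
    done < "$LOG.work"
    rm -f "$LOG.work"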

Modification is just a complete nightmare -- your risk is simultaneous modifications on both servers, at which point you're just screwed. In the sort of "eventual consistency" model you're working with here (which, for the geographically-distributed, high-latency replication system you're forced to deal with, is the only option) you simply cannot handle this at the infrastructure level -- you have to make some sort of compromise at your application to decide how to deal with these sorts of issues. You can help the situation by treating your filesystem as an append-only store (if you want to modify a file, you write a new version and update your database to point to the new record), but since your database, too, is only eventually-consistent, you can't solve the problem completely. At least if your database is the single point of truth, though, you'll have guaranteed consistency, if not guaranteed correctness, which is half the battle.
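In schema terms, the append-only idea looks something like this (a sketch; the table and column names are assumptions, not anything your application necessarily has):

    -- Every write creates a new row; the UUID doubles as the on-disk
    -- filename, which also gives you the collision-free names the
    -- rsync scheme above relies on.
    CREATE TABLE file_versions (
        id          CHAR(36) NOT NULL PRIMARY KEY,  -- UUID = filename
        document_id INT      NOT NULL,
        created_at  DATETIME NOT NULL,
        INDEX (document_id, created_at)
    );

    -- The "current" file for a document is simply its newest version.
    SELECT id
      FROM file_versions
     WHERE document_id = 42
     ORDER BY created_at DESC
     LIMIT 1;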

And I think that just about covers everything. To reiterate, though, life is a lot simpler if you don't have to go with geographically-distributed servers. If you're implementing this because it "sounds cool", step away from the keyboard. If you want to do cool stuff, do it on your own time, or as a science experiment. You're paid to do what's most effective for your employer, not what gives you a geek priapism.
