Best failover strategy for e-mail servers on AWS to ensure high availability

amazon-s3 amazon-web-services backup email-server failover

We have our e-mail server hosted on AWS. Last week Amazon had a failure in their East Coast region which brought down our server along with many others.

We now want to implement a failover strategy so that if the mail server becomes unavailable again then we can simply switch to another mail server in a different zone and users can continue sending and receiving mail AS WELL AS still having access to their existing mail items.

Obviously, periodic backups of messages alone aren't a good enough solution, because there is a constant stream of incoming and outgoing emails being written to disk.

We are running MailEnable Enterprise on Windows Server 2008. The MailEnable configuration (e.g. user accounts, passwords, etc.) is stored in a SQL Server database on the mail server.

We are considering the following solution:

  • Mount S3 storage as a Windows drive to store messages, using a tool like TntDrive. Unlike EBS storage (which is restricted to a single availability zone), S3 storage is available across availability zones, which would keep our storage available even if a single zone fails.
  • We take daily snapshots of the mail server and copy them to S3.
  • If the mail server fails, we create a new instance of the mail server from our latest snapshot (this means that configuration changes made since the snapshot was taken, such as password changes or new user accounts, will not be included, but we can accept that risk).
  • We mount the S3 storage containing the messages as a drive on the new server.
  • We switch the Elastic IP for the mail server to the new server, and we have a mail server that is available again!
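
To make the recovery steps above concrete, here is a rough Python sketch of the "launch from snapshot, then move the Elastic IP" part. It assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), that the daily snapshot has been registered as an AMI, and that the Elastic IP is identified by an allocation ID. The function name and all parameter values are hypothetical, and mounting the S3 drive would still have to happen on the new server itself.

```python
def fail_over_mail_server(ec2, image_id, allocation_id, zone, instance_type):
    """Launch a replacement mail server and point the Elastic IP at it.

    `ec2` is assumed to be a boto3 EC2 client; all IDs are placeholders
    supplied by the caller.
    """
    # Start one new instance from the saved image, in a healthy zone.
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
    instance_id = response["Instances"][0]["InstanceId"]

    # Block until the instance reports the "running" state.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Move the Elastic IP, even if it is still attached to the failed server.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=instance_id,
        AllowReassociation=True,
    )
    return instance_id
```

Keep in mind that this depends on the EC2 API itself being reachable during the outage, so it is worth rehearsing before you need it.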

Will this solution work? I am a bit worried about the latency and cost of S3 as compared to EBS (see http://jimliddle.sys-con.com/node/1103438/mobile). Is there a different approach we should be looking at? Would you recommend different Amazon tools to solve the problem?

Best Answer

You can clone the current mail server onto another EC2 instance and run it as a backup MX server. The databases on both servers should be kept in sync with database-level replication, and the disks should be synced hourly with rsync/DeltaCopy. When the primary goes down, sending mail servers will automatically try the secondary MX server, and users can still access old and new emails. When the main server comes back, the backup server syncs the newest emails back to it.
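
The automatic fallback comes from how sending servers treat MX records: they try the host with the lowest preference number first and move on to the next one if it is unreachable. A minimal Python sketch of that selection logic follows. The hostnames and the `is_reachable` callback are hypothetical, and a real MTA also randomizes among hosts with equal preference and retries later, which this sketch ignores.

```python
from typing import Callable, Iterable, Optional, Tuple

MxRecord = Tuple[int, str]  # (preference, hostname); lower preference is tried first


def pick_mx(records: Iterable[MxRecord],
            is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable host in MX preference order, or None."""
    for _preference, host in sorted(records, key=lambda record: record[0]):
        if is_reachable(host):
            return host
    return None


# Hypothetical records: primary at preference 10, backup at preference 20.
records = [(20, "mail2.example.com"), (10, "mail1.example.com")]

# Simulate the primary being down: delivery falls through to the backup.
chosen = pick_mx(records, lambda host: host != "mail1.example.com")
```

With both servers up, the same call picks `mail1.example.com`, so the backup only receives mail while the primary is unreachable.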