Linux – How to move Mdadm RAID drive (EBS based) to different AWS Instance

amazon-ebs, amazon-web-services, linux, mdadm

We have a media-rich web application that is hosted on AWS. We have several Web Servers and we have an NFS server.

On the NFS server (a Linux server) we have several EBS volumes attached, and we've used mdadm to combine them into a single RAID volume. The Web Servers simply access the NFS storage through a mount point.

Amazon has now let us know that they will be performing power maintenance on this server in a couple of days' time. Since all our media is stored there, our site would be unusable for the hours that Amazon is working on it. We want to try to prevent this downtime.

I was thinking that we can prevent server downtime by perhaps setting up a new server temporarily and attaching the EBS drives (raid volume) to that server and have our web servers point there during maintenance.

This is a very high risk operation since this involves several terabytes of our production data.

What would be the safe way to move our logical RAID drive (md0) to a new Amazon instance? I was hoping to start by building the new server, attaching the EBS volumes and assembling the RAID array with mdadm --assemble --scan before unmounting from the existing instance, so that I could first test that everything works. That would mean having the array mounted on two instances at the same time, though, and I don't believe that is possible with the way that filesystems work.

The question "How do I move a Linux software RAID to a new machine?" suggests a way to move drives, but it isn't really a cloud-based question. Perhaps there are simpler ways to prevent downtime, given that our solution is hosted in the cloud? I have considered taking an EBS snapshot, but that would have to replicate all the many terabytes of mounted storage, so it is not a practical solution.

Any ideas?

Best Answer

You can only attach an EBS device to a single instance, so you're going to have to detach it when moving. I'm assuming that you want to avoid data intensive processes like creating an EBS snapshot or rsyncing the data to a new instance. I'm also assuming that you're using RAID1.

The safest option will require a couple of minutes of downtime. You would start up a new instance, and install and configure the necessary software (e.g. the NFS server). Then, on the old instance, unmount the filesystem, stop the array and detach the two EBS devices. Next, attach the EBS devices to the new instance, start the array and mount the filesystem. Finally, get the web servers to mount NFS from the new instance. Starting the array should just be a case of running the mdadm command you describe, but I'd definitely test this first.
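As a rough sketch of that sequence (not a tested procedure), the steps could be wrapped in a few shell functions. Only md0 and mdadm --assemble --scan come from the question; the mount point, volume IDs, instance ID and device names below are hypothetical placeholders, and the AWS CLI is assumed to be installed and configured. The functions only print the commands unless DRY_RUN is set to 0, so the order can be reviewed before anything is touched.

```shell
#!/bin/sh
# Sketch only: every ID, device name and path here is a hypothetical placeholder.
set -e

MD_DEV=/dev/md0              # array name taken from the question
MOUNT_POINT=/mnt/media       # assumed mount point of the RAID filesystem
NEW_INSTANCE=i-0123abcd      # assumed ID of the replacement instance
# Assumed "volume-id:device" pairs for the two EBS members of the mirror.
VOLUMES="vol-1111aaaa:/dev/sdf vol-2222bbbb:/dev/sdg"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Old instance, once the NFS server no longer holds the filesystem open.
release_on_old() {
    run umount "$MOUNT_POINT"
    run mdadm --stop "$MD_DEV"
    for pair in $VOLUMES; do
        run aws ec2 detach-volume --volume-id "${pair%%:*}"
    done
}

# Any machine with AWS credentials: wait for each detach, then re-attach.
attach_to_new() {
    for pair in $VOLUMES; do
        vol=${pair%%:*}
        run aws ec2 wait volume-available --volume-ids "$vol"
        run aws ec2 attach-volume --volume-id "$vol" \
            --instance-id "$NEW_INSTANCE" --device "${pair#*:}"
    done
}

# New instance: assemble from the on-disk superblocks, check, then mount.
# The array may come up under a different name (for example /dev/md127),
# so inspect /proc/mdstat before trusting MD_DEV.
assemble_on_new() {
    run mdadm --assemble --scan
    run cat /proc/mdstat
    run mount "$MD_DEV" "$MOUNT_POINT"
}
```

Each function belongs on a different machine, so in practice you would source the file and call one function at a time, first in dry-run mode and then with DRY_RUN=0. Rehearsing this against a pair of small throwaway volumes is a cheap way to do the testing recommended above.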

The second option potentially has lower downtime (assuming that you can operate in read-only mode for a while), but is more dangerous. You'd start the new instance as above. On the old instance, remount the filesystem in read-only mode. Then fail one of the RAID devices, detach that EBS device and attach it to the new instance. Start the array in degraded mode on the new instance, mount the filesystem, and get the web servers to mount NFS from the new instance (the site should be fully available at this stage). Then stop the array on the old instance, detach the remaining EBS device, attach it to the new instance and add it to the array. This may, however, trigger a full resync, so again, test this first.
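The mdadm side of this split-mirror approach might look something like the sketch below. It assumes a two-device RAID1 as stated earlier; the device names and mount point are hypothetical, the device names may well differ between the two instances, and the EBS detach and attach calls between the steps are left out for brevity. As before, commands are only printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: device names and the mount point are hypothetical placeholders.
set -e

MD_DEV=/dev/md0            # array name taken from the question
MOUNT_POINT=/mnt/media     # assumed mount point of the RAID filesystem
FIRST_DEV=/dev/xvdf        # assumed mirror half that is moved first
SECOND_DEV=/dev/xvdg       # assumed mirror half that follows later

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Old instance: go read-only, then release one half of the mirror.
split_on_old() {
    run mount -o remount,ro "$MOUNT_POINT"
    run mdadm "$MD_DEV" --fail "$FIRST_DEV" --remove "$FIRST_DEV"
}

# New instance, after FIRST_DEV has been attached to it:
# --run starts the array even though one member is missing.
start_degraded_on_new() {
    run mdadm --assemble --run "$MD_DEV" "$FIRST_DEV"
    run mount "$MD_DEV" "$MOUNT_POINT"
}

# Old instance, after the web servers have switched to the new NFS server.
retire_on_old() {
    run umount "$MOUNT_POINT"
    run mdadm --stop "$MD_DEV"
}

# New instance, after SECOND_DEV has been attached to it:
# adding the device back starts a resync, which /proc/mdstat reports.
rejoin_on_new() {
    run mdadm "$MD_DEV" --add "$SECOND_DEV"
    run cat /proc/mdstat
}
```

Until the resync in the last step finishes, the array on the new instance has no redundancy, which is part of why this route is the riskier of the two.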

Whatever you do, make sure to test the process first so you know exactly how to perform it, and make sure you have backups in case it goes horribly wrong. (Also, consider storing your media on S3.)