LVM – Moving a Logical Volume directly from one server to another over the network


I have a KVM host machine with several VMs on it. Each VM uses a Logical Volume on the host. I need to copy the LVs to another host machine.

Normally, I would use something like:

dd if=/the/logical-volume of=/some/path/machine.dd

This turns the LV into an image file, which I then move with scp. On the new host, I use dd again to copy the file into a new LV.

The problem with this method is that it needs twice as much disk space as the VM takes, on both machines: a 5GB LV uses 5GB for the LV itself, and the dd copy uses another 5GB for the image. This is fine for small LVs, but what if (as in my case) you have a 500GB LV for a big VM? The new host machine has a 1TB hard drive, so it can't hold a 500GB dd image file and a 500GB logical volume to copy it to, and still have room for the host OS and for other, smaller guests.

What I would like to do is something like:

dd if=/dev/mygroup-mylv of=192.168.1.103/dev/newvgroup-newlv

In other words, copy the data directly from one logical volume to the other over the network and skip the intermediate image file.

Is this possible?

Best Answer

Sure, of course it's possible.

dd if=/dev/mygroup-mylv | ssh 192.168.1.103 dd of=/dev/newvgroup-newlv

Boom.

Do yourself a favor, though, and use something larger than dd's default block size of 512 bytes. For example, add bs=4M to both dd commands to read and write in 4 MiB chunks. There's some nitpicking about block sizes in the comments; if this is something you find yourself doing fairly often, take a little time to try the transfer with a few different block sizes and see for yourself which gets you the best transfer rate.
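One way to run that block-size comparison is sketched below. The helper name try_blocksizes, the three block sizes, and the 1 GiB sample size are all assumptions made for illustration. It streams a sample of the source through dd at each block size into whatever sink command you give it, and reports wall-clock seconds.

```shell
# Hypothetical helper: stream a sample of SRC through dd at several block
# sizes into a sink command, and print how long each run took.
# Usage: try_blocksizes SRC SINK_COMMAND [ARGS...]
try_blocksizes() {
    src=$1; shift
    # Each pair is "block-size count"; every pair reads at most 1 GiB.
    for pair in "64K 16384" "1M 1024" "4M 256"; do
        bs=${pair% *}
        count=${pair#* }
        start=$(date +%s)
        dd if="$src" bs="$bs" count="$count" 2>/dev/null | "$@" >/dev/null
        end=$(date +%s)
        echo "bs=$bs took $((end - start))s"
    done
}

# Real use, with the host and device names from the answer. The sample is
# discarded on the remote side, so the destination LV is not touched:
# try_blocksizes /dev/mygroup-mylv ssh 192.168.1.103 'cat >/dev/null'
```

Timing with date has one-second resolution, so the sample needs to be large enough for differences to show. Later runs can also benefit from data already sitting in the page cache, so it is worth repeating the runs in a different order before drawing conclusions.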

Answering one of the questions from the comments:

You can pipe the transfer through pv to get statistics about the transfer. It's a lot nicer than the output you get from sending signals to dd.
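As a rough sketch of what that looks like, pv simply sits in the middle of the pipeline. The helper name send_with_progress is made up for this example, and the fallback branch for machines without pv is an assumption, not part of the original answer.

```shell
# Hypothetical helper: stream SRC into a sink command, through pv when it
# is installed.
# Usage: send_with_progress SRC SINK_COMMAND [ARGS...]
send_with_progress() {
    src=$1; shift
    if command -v pv >/dev/null 2>&1; then
        # pv passes the data through unchanged and reports bytes
        # transferred, elapsed time and throughput on stderr.
        dd if="$src" bs=4M 2>/dev/null | pv | "$@"
    else
        # pv is not installed: fall back to the plain pipeline.
        dd if="$src" bs=4M 2>/dev/null | "$@"
    fi
}

# Real use, with the host and device names from the answer:
# send_with_progress /dev/mygroup-mylv ssh 192.168.1.103 dd of=/dev/newvgroup-newlv bs=4M
```

If you tell pv the total size with -s (for a block device, blockdev --getsize64 prints it in bytes), it can also show a percentage and an ETA.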

I will also say that while of course using netcat -- or anything else that does not impose the overhead of encryption -- is going to be more efficient, I usually find that the additional speed comes at some loss of convenience. Unless I'm moving around really large datasets, I usually stick with ssh despite the overhead because in most cases everything is already set up to Just Work.
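Whichever transport you use, you may want to confirm afterwards that the copy matches the source. A minimal sketch, assuming sha256sum is available on both hosts; the helper name lv_digest is hypothetical. It hashes only the first N bytes, because the destination LV may be slightly larger than the source.

```shell
# Hypothetical helper: SHA-256 of the first BYTES bytes of a device or file.
# Usage: lv_digest PATH BYTES
lv_digest() {
    head -c "$2" "$1" | sha256sum | cut -d' ' -f1
}

# Real use, with the host and device names from the answer:
# size=$(blockdev --getsize64 /dev/mygroup-mylv)
# here=$(lv_digest /dev/mygroup-mylv "$size")
# there=$(ssh 192.168.1.103 "head -c $size /dev/newvgroup-newlv | sha256sum | cut -d' ' -f1")
# [ "$here" = "$there" ] && echo "copy verified"
```

This reads the whole volume again on both sides, so for a 500GB LV it takes a while, but it sends nothing over the network except the two digests.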
