Linux – Copy 10 million images in a single folder to another server

linux · rsync · tar

Now I know you should never put 10 million files in a single directory to begin with. Blame the developers, but as it stands, that's where I am. We will fix it and move them into folder groups, but first we have to get them copied off the production box.

I first tried rsync, but it failed miserably. I assume building the full list of file names and paths in memory exceeded the available RAM and swap space.

Then I tried compressing it all into a tar.gz, but it couldn't be extracted: a "file too large" error (the archive was about 60 GB).

I then tried a straight tar-to-tar extraction, but got a "cannot open: file too large" error:

tar -cf - images/ | tar -xf - -C /mnt/coverimages/
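
(Note the explicit -f - above: without it, some tar builds default to a tape device rather than stdin/stdout. Since the title mentions another server, the same pipe can also be streamed over SSH instead of through the NFS mount. A minimal sketch, assuming passwordless SSH to a hypothetical host "backup01" and a hypothetical destination path:

# stream the archive over the network, extracting on the remote side
tar -cf - images/ | ssh backup01 'tar -xf - -C /srv/coverimages/'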

Extra Info:

/mnt/coverimages/ is an NFS share where we want to move the images.

All files are images

OS: Gentoo

Best Answer

If you install rsync version 3 or later, it builds the file list incrementally while transferring, so it won't need to keep the entire list in memory. In the future you probably want to consider hashing the filenames and creating a directory structure based on parts of those hashes.
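
A minimal sketch of that approach, assuming rsync 3.x and the paths from the question (the trailing slash on the source copies the directory's contents rather than the directory itself):

# rsync 3.x recurses incrementally, so memory use stays roughly constant
rsync -a images/ /mnt/coverimages/images/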

You can see this answer to get an idea of what I mean by the hashing.
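
As a rough illustration of a hashed layout (a minimal sketch in bash, assuming md5sum is available; "cover.jpg" is a placeholder filename):

name="cover.jpg"
hash=$(printf '%s' "$name" | md5sum)    # hex digest of the filename
dir="images/${hash:0:2}/${hash:2:2}"    # first four hex chars -> e.g. images/a1/b2/
mkdir -p "$dir" && mv "$name" "$dir/"

Two levels of two hex characters gives 65,536 buckets, so 10 million files works out to roughly 150 files per directory.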