The fastest way to “clone” a file in Linux

filesystems linux

I would like to use an application API that is not "crash safe"; in other words, there is a high likelihood of the data file being corrupt and unreadable if the application crashes.

The file itself is a "metadata file" and should not get very big: a few hundred MB at most.

What I want to do is:

  1. Force the application to access the file in "direct mode" (no OS caching).
  2. Pause updates at regular "checkpoint" intervals.
  3. Perform a flush() (some data has probably been flushed automatically already).
  4. Now that I know the file is consistent, clone it.
  5. If there is an "old clone", delete it.
  6. Resume making changes to the original file.
  7. Loop.

Could I use a special-purpose file system that makes some kind of "zero copy" of the file, combined with copy-on-write of the modified sectors of the original file, to get the clone "almost free" (with minimum disk IO)?

Also, can I do the "clone" without having to fork a process? (I don't know whether the Linux file API offers a "cp" system call.)
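For reference, Linux does expose in-process copy primitives, so no fork is required: the FICLONE ioctl shares extents copy-on-write on filesystems that support reflinks (for example Btrfs, or XFS created with reflink enabled), and copy_file_range(2) copies inside the kernel. The following is only a minimal sketch of how the two might be combined. The function name clone_file and the fallback order are illustrative assumptions, and the FICLONE constant is the value from <linux/fs.h> on x86 and ARM; other architectures may encode it differently.

```python
import fcntl
import os
import shutil

# Assumption: value of FICLONE from <linux/fs.h> on x86/ARM, i.e. _IOW(0x94, 9, int).
FICLONE = 0x40049409


def clone_file(src_path, dst_path):
    """Copy src_path to dst_path without forking; return the method that worked."""
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # 1. Reflink: the clone shares blocks with the source until either changes.
        try:
            fcntl.ioctl(dst_fd, FICLONE, src_fd)
            return "reflink"
        except OSError:
            pass  # filesystem (or kernel) cannot share extents

        # 2. In-kernel copy: data never passes through user-space buffers.
        try:
            remaining = os.fstat(src_fd).st_size
            offset = 0
            while remaining > 0:
                copied = os.copy_file_range(src_fd, dst_fd, remaining, offset, offset)
                if copied == 0:
                    break
                offset += copied
                remaining -= copied
            if remaining == 0:
                return "copy_file_range"
        except (OSError, AttributeError):
            pass  # syscall unavailable, or Python older than 3.8
    finally:
        os.close(src_fd)
        os.close(dst_fd)

    # 3. Last resort: an ordinary copy, still inside this process.
    shutil.copyfile(src_path, dst_path)
    return "copyfile"
```

Writing the clone under a temporary name and then calling os.replace() to move it over the previous clone would cover step 5 without leaving a window in which no clone exists.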

Best Answer

You could use LVM snapshots for this instead of cloning the file. If something goes wrong, just copy the file back from the snapshot.

There is libdevmapper / libdevmapper-event-lvm2snapshot, which could help you do this programmatically (without a fork): http://sourceware.org/dm/

Edit:

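If you can change your program, here is another solution: https://stackoverflow.com/questions/1565177/can-i-do-a-copy-on-write-memcpy-in-linux

mmap() the file twice, once normally (MAP_SHARED) and once with MAP_PRIVATE. Writes made through the private mapping are copy-on-write: they stay in memory and never reach the file.

This would avoid the side effects of LVM, especially its performance cost.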
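A minimal sketch of the double mapping, assuming Python's mmap module, where ACCESS_WRITE corresponds to a shared mapping and ACCESS_COPY to a MAP_PRIVATE one. The function name and the sample data are made up for illustration. It shows that a write through the private mapping leaves both the shared view and the file on disk untouched.

```python
import mmap
import os
import tempfile


def demo_private_mapping():
    """Return (shared view, private view, file contents) after a private write."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"checkpoint-data")
        f.flush()  # the file must have its final size before it is mapped
        fd = f.fileno()

        shared = mmap.mmap(fd, 0, access=mmap.ACCESS_WRITE)   # MAP_SHARED
        private = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)   # MAP_PRIVATE
        try:
            # Copy-on-write: only this process's private pages are modified.
            private[0:10] = b"SCRATCHPAD"
            on_disk = os.pread(fd, len(shared), 0)
            return bytes(shared[:]), bytes(private[:]), on_disk
        finally:
            private.close()
            shared.close()
```

One caveat worth keeping in mind: POSIX leaves it unspecified whether later changes to the file show up in private pages that have not been written yet, so the private mapping is best treated as a scratch area for modifications, not as a frozen snapshot of the file.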