I've always been a block-level kinda guy, but I'm interested in hearing some real-world experiences with file-level cloning. What are some of the advantages and disadvantages, and what tools work best?
Linux – block level vs. file level cloning
block-level, clone, filesystems, linux, unix
Related Solutions
For more on the layout of Linux file-systems, see the Filesystem Hierarchy Standard (version 2.3 at the time of writing, with the beta of version 3.0 already deployed on most recent distros). It explains where some of the names came from:
- /bin - Binaries.
- /boot - Files required for booting.
- /dev - Device files.
- /etc - Et cetera. The name is inherited from the earliest Unixes, which is when it became the spot to put config-files.
- /home - Where home directories are kept.
- /lib - Where code libraries are kept.
- /media - A more modern directory, where removable media gets mounted.
- /mnt - Where temporary file-systems are mounted.
- /opt - Where optional add-on software is installed. This is distinct from /usr/local/ for reasons I'll get to later.
- /run - Where runtime variable data is kept.
- /sbin - Where system binaries (sometimes read as "superuser binaries") are stored. These are mostly administration tools and usually only work when run as root.
- /srv - Stands for "serve". This directory is intended for static files that are served out: /srv/http would be for static websites, /srv/ftp for an FTP server.
- /tmp - Where temporary files may be stored.
- /usr - Another directory inherited from the Unixes of old. It is commonly expanded as "UNIX System Resources" rather than "user" (see the Debian Wiki), although in the very earliest Unixes it did hold users' home directories. This directory should be shareable between hosts: it can safely be NFS-mounted on multiple hosts, and it can safely be mounted read-only.
- /var - Another directory inherited from the Unixes of old, it stands for "variable". This is where system data that varies may be stored. Such things as spool and cache directories may be located here. If a program needs to write to the local file-system and isn't serving that data to someone directly, it'll go here.
/opt vs /usr/local
The rule of thumb I've seen is best described as:
Use /usr/local for things that would normally go into /usr, or that override things already in /usr. Use /opt for things that install all in one directory, or are otherwise special.
ZFS deduplication works on blocks (records, whose maximum size is set by the recordsize property); it does not know or care about files. Each block is checksummed, using SHA-256 by default (this can be changed). If the checksum matches that of another block, ZFS simply references the existing record and no new data is written. One problem with ZFS deduplication is that the dedup table holding these checksums needs to stay in memory to perform well, so large pools require a lot of RAM. Because the table has one entry per unique block, a larger recordsize means fewer entries, so you should only enable deduplication when using a large recordsize.
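To make the mechanism concrete, here is a minimal Python sketch of block-level deduplication. It is a toy model, not ZFS's implementation: a dict keyed by SHA-256 digest stands in for the dedup table, and any fixed-size record whose digest is already present just gets another reference. The `ddt_memory_estimate` helper is hypothetical, and its default of 320 bytes per table entry is an assumed rule-of-thumb figure, not a ZFS constant; it only illustrates why a larger recordsize means fewer entries and less RAM.

```python
import hashlib

RECORDSIZE = 128 * 1024  # assumed 128K records


class DedupPool:
    """Toy block-level dedup: a dict maps each record's SHA-256 digest to a
    reference count, standing in for ZFS's dedup table. Like dedup without
    byte-for-byte verification, it trusts a matching hash."""

    def __init__(self, recordsize: int = RECORDSIZE) -> None:
        self.recordsize = recordsize
        self.table: dict[bytes, int] = {}  # digest -> reference count
        self.physical_bytes = 0  # bytes that actually had to be stored

    def write(self, data: bytes) -> int:
        """Split data into records and store them; return how many were new."""
        new_records = 0
        for off in range(0, len(data), self.recordsize):
            record = data[off:off + self.recordsize]
            digest = hashlib.sha256(record).digest()
            if digest in self.table:
                self.table[digest] += 1  # duplicate: just add a reference
            else:
                self.table[digest] = 1  # unique: store it once
                self.physical_bytes += len(record)
                new_records += 1
        return new_records


def ddt_memory_estimate(unique_bytes: int, recordsize: int,
                        bytes_per_entry: int = 320) -> int:
    """Hypothetical helper: table RAM if every record is unique.
    bytes_per_entry=320 is an assumed rule of thumb, not a ZFS constant."""
    entries = -(-unique_bytes // recordsize)  # ceiling division
    return entries * bytes_per_entry
```

With the assumed 320 bytes per entry, 1 TiB of unique data needs about 2.5 GiB of table at a 128K recordsize, versus about 40 GiB at 8K.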
Assuming a recordsize of 128K (the ZFS default):
If I write a randomly filled 1 GB file, then write a second file that is the same except that halfway through I change one of the bytes, will that file be deduplicated (all except for the changed byte's block)?
Yes. Only the one block containing the changed byte will not be deduplicated.
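The one-changed-byte case can be checked with a scaled-down Python model: an 8 MiB random buffer stands in for the 1 GB file, records are hashed with SHA-256 at an assumed 128K recordsize, and we count how many records of the second file are not already present. This models dedup by whole-record checksums; it does not talk to ZFS.

```python
import hashlib
import os

RECORDSIZE = 128 * 1024  # assumed recordsize of 128K


def record_digests(data: bytes, recordsize: int = RECORDSIZE) -> list[bytes]:
    """SHA-256 digest of each fixed-size record in data."""
    return [hashlib.sha256(data[i:i + recordsize]).digest()
            for i in range(0, len(data), recordsize)]


def new_blocks_after(first: bytes, second: bytes,
                     recordsize: int = RECORDSIZE) -> int:
    """Count records of `second` whose digest does not already occur in `first`."""
    seen = set(record_digests(first, recordsize))
    return sum(1 for d in record_digests(second, recordsize) if d not in seen)


if __name__ == "__main__":
    size = 8 * 1024 * 1024  # 8 MiB stands in for the 1 GB file
    original = os.urandom(size)
    modified = bytearray(original)
    modified[size // 2] ^= 0xFF  # change one byte halfway through
    print(new_blocks_after(original, bytes(modified)))  # prints 1
```

Only the record holding the flipped byte gets a new digest; the other 63 records match and would be shared.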
If I write a single-byte file, will it take a whole 128 kilobytes? If not, will the blocks get larger as the file grows?
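No. The recordsize is an upper bound, not a fixed allocation: a file smaller than the recordsize is stored in a single block just big enough to hold it (rounded up to the pool's sector size), so a one-byte file does not take 128K. Yes, that single block grows with the file until it reaches the recordsize; after that, more recordsize blocks are allocated as needed.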
If a file takes two 64-kilobyte blocks (would this ever happen?), would an identical file that takes a single 128-kilobyte block get deduplicated?
A file of that size takes a single 128K block, so an identical file will be deduplicated.
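Since deduplication only matches whole blocks by checksum, the same bytes stored with different block boundaries cannot deduplicate against each other. The Python sketch below (a model, not ZFS code) shows this. One way a file could end up in 64K blocks is if it was written while the dataset's recordsize was 64K, because changing recordsize only affects files created afterwards.

```python
import hashlib
import os


def record_digests(data: bytes, blocksize: int) -> set:
    """SHA-256 digests of data cut into blocksize-sized blocks."""
    return {hashlib.sha256(data[i:i + blocksize]).digest()
            for i in range(0, len(data), blocksize)}


data = os.urandom(128 * 1024)                   # one file's contents
as_one_128k = record_digests(data, 128 * 1024)  # stored as a single 128K block
as_two_64k = record_digests(data, 64 * 1024)    # same bytes as two 64K blocks
again_128k = record_digests(data, 128 * 1024)   # identical file, same layout

print(len(as_one_128k & as_two_64k))  # 0: no block in common, nothing dedups
print(len(as_one_128k & again_128k))  # 1: the whole block is deduplicated
```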
If a file is shortened, part of its block would no longer be used, and perhaps that data would not be reset to 0x00 bytes. Would a half-used block get deduplicated?
Yes, if an exact match for that block is found.
Best Answer
Well, the most obvious advantage of file-level cloning is that you don't waste time cloning unused blocks. For example, cloning a 40G partition that holds 10G of data takes 40G of reads and 40G of writes at the block level, but only about 10G of reads and 10G of writes at the file level.
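The difference in I/O can be sketched in Python. This is a minimal illustration under stated assumptions, not a cloning tool: `block_clone` copies every byte of a source image whether or not it is in use, while `file_clone` copies only a directory tree with `shutil.copytree`. In the demo, a 40 MiB image file and a tree holding 10 MiB of data are unrelated stand-ins for the 40G partition and its 10G of data; all paths are hypothetical temporary files.

```python
import os
import shutil
import tempfile

CHUNK = 1024 * 1024  # copy 1 MiB at a time


def block_clone(src_dev: str, dst_dev: str) -> int:
    """Block-level: copy every byte of the source, used or not."""
    copied = 0
    with open(src_dev, "rb") as src, open(dst_dev, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
            copied += len(chunk)
    return copied


def file_clone(src_root: str, dst_root: str) -> int:
    """File-level: copy only the directory tree (copy2 keeps mode and
    timestamps; symlinks stay symlinks). Returns file-data bytes copied."""
    shutil.copytree(src_root, dst_root, symlinks=True,
                    copy_function=shutil.copy2)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(dst_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A 40 MiB "partition" image standing in for the 40G partition.
        image = os.path.join(tmp, "part.img")
        with open(image, "wb") as f:
            f.truncate(40 * CHUNK)  # zero-filled (sparse on most filesystems)
        # 10 MiB of real file data standing in for the 10G of data.
        tree = os.path.join(tmp, "data")
        os.makedirs(tree)
        with open(os.path.join(tree, "big.bin"), "wb") as f:
            f.write(os.urandom(10 * CHUNK))
        print(block_clone(image, os.path.join(tmp, "clone.img")))  # 41943040
        print(file_clone(tree, os.path.join(tmp, "clone")))  # 10485760
```

The block-level copy has no notion of which blocks are in use, which is exactly the cost described above; the file-level copy pays only for the data that exists.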
One minor benefit of file-level cloning is that it effectively gives you a perfectly defragmented filesystem at the same time, whereas block-level cloning copies the fragmentation as well.
Block-level cloning is simpler: you don't have to worry about permissions or other metadata issues, and you know with 100% certainty that the clone will be identical to the original. File-level cloning, by contrast, can go wrong if you mess up your settings.
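One way to catch a file-level clone that went wrong is to compare metadata between the source and the clone afterwards. The Python sketch below is a hypothetical checker, not an existing tool: it records file type, permission bits, owner, group, mtime, size and symlink target for every path and reports anything missing, extra, or different. It does not check ACLs or extended attributes. Note that Python's `shutil.copy2` itself does not preserve owner and group, so a clone made that way of files owned by other users would be flagged here.

```python
import os
import shutil
import stat
import tempfile


def snapshot(root: str) -> dict:
    """Map each path under root (relative) to the metadata a faithful clone
    should reproduce: type, permission bits, owner, group, mtime, size, link."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe symlinks, not their targets
            is_link = stat.S_ISLNK(st.st_mode)
            entries[os.path.relpath(path, root)] = (
                stat.S_IFMT(st.st_mode),
                stat.S_IMODE(st.st_mode),
                st.st_uid,
                st.st_gid,
                st.st_mtime_ns,
                st.st_size if stat.S_ISREG(st.st_mode) else None,
                os.readlink(path) if is_link else None,
            )
    return entries


def compare_trees(src: str, dst: str) -> list:
    """List the differences between a source tree and its file-level clone."""
    a, b = snapshot(src), snapshot(dst)
    problems = [f"missing in clone: {p}" for p in sorted(a.keys() - b.keys())]
    problems += [f"extra in clone: {p}" for p in sorted(b.keys() - a.keys())]
    problems += [f"metadata differs: {p}"
                 for p in sorted(a.keys() & b.keys()) if a[p] != b[p]]
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        target = os.path.join(src, "a.txt")
        with open(target, "w") as f:
            f.write("hello\n")
        os.chmod(target, 0o644)
        dst = os.path.join(tmp, "dst")
        shutil.copytree(src, dst)  # copy2 by default: mode and times kept
        print(compare_trees(src, dst))  # []
        os.chmod(os.path.join(dst, "a.txt"), 0o600)  # a "messed up" setting
        print(compare_trees(src, dst))  # ['metadata differs: a.txt']
```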