Seek performance inside files under btrfs with LZO compression

Tags: btrfs, compression, performance

I am planning to use btrfs on a 50 TB RAID6 array and I want to enable LZO compression.
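For reference, this is done with the compress mount option; the device and mount point below are placeholders for the actual setup:

    mount -o compress=lzo /dev/sdX /mnt/array

or persistently via the corresponding fstab entry:

    /dev/sdX  /mnt/array  btrfs  compress=lzo  0  0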

This is for a bioinformatics setup where a lot of seeking occurs within large (1–20 TB) files. (The software reads only small chunks of data scattered across the file.)

What worries me is that I don't understand how seeking is performed on compressed filesystems like btrfs. Does the file need to be decompressed from the beginning up to the sought-after position first? That would have a huge negative impact on my setup.

Or, a more general question: does seek time scale with file size the same way as on a non-compressed filesystem, or does it get worse, e.g. O(file_length)?

Best Answer

Random seek times will be roughly O(1), just as on uncompressed filesystems, with the caveat that up to 128 KiB of data is compressed together: to read even a single byte, all the data in that 128 KiB block has to be read and decompressed. Depending on the access pattern this can have a fairly large performance impact, so you need to benchmark it with your specific application and dataset.
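To quantify that impact on this workload, here is a minimal benchmark sketch (assuming Python 3 on Linux; the file path is a hypothetical placeholder for a large file on the array). Run it once on a compress=lzo mount and once on an uncompressed mount, and compare the averages:

    #!/usr/bin/env python3
    # Random small-read latency sketch. PATH is a placeholder -- point
    # it at a large existing file on the volume under test. Drop the
    # page cache between runs (as root: echo 3 > /proc/sys/vm/drop_caches)
    # or the reads will be served from RAM instead of disk.
    import os
    import random
    import time

    PATH = "/mnt/array/testfile"  # placeholder test file
    READ_SIZE = 4096              # small chunk, well under 128 KiB
    SAMPLES = 1000

    fd = os.open(PATH, os.O_RDONLY)
    size = os.fstat(fd).st_size

    start = time.perf_counter()
    for _ in range(SAMPLES):
        # Seek to a random offset and read a small chunk, mimicking
        # the scattered access pattern described in the question.
        offset = random.randrange(0, max(1, size - READ_SIZE))
        os.pread(fd, READ_SIZE, offset)
    elapsed = time.perf_counter() - start
    os.close(fd)

    print(f"{elapsed / SAMPLES * 1000:.3f} ms average per {READ_SIZE}-byte read")

Raising READ_SIZE toward and past 128 KiB should make the extent-granularity effect described above visible: the gap between the compressed and uncompressed runs should be largest for reads much smaller than the compressed block.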

