Linux Filesystems – Interaction Between Filesystem Commit Interval and vm.dirty_expire_centisecs


Question

How do filesystem "commit interval" options interact with vm.dirty_expire_centisecs? What happens when one is shorter than the other? Does it ever make sense to set these differently?

My understanding is that filesystem commit interval settings control how often the filesystem will proactively write dirty data and metadata to disk, even when fsync hasn't been called by the application.

Separately, vm.dirty_expire_centisecs seems to have a similar role, but at the VM layer rather than the filesystem layer.


References

ext4 commit mount option:

Ext4 can be told to sync all its data and metadata every 'nrsec' seconds. The default value is 5 seconds. This means that if you lose your power, you will lose as much as the latest 5 seconds of work (your filesystem will not be damaged though, thanks to the journaling).

btrfs commit mount option:

Set the interval of periodic commit. Higher values defer data being synced to permanent storage with obvious consequences when the system crashes.
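For what it's worth, both filesystems take the setting as a `commit=N` mount option, so the value in effect can be read back from /proc/mounts. The following is only a minimal sketch: the function names are made up for illustration, and it assumes that a mount line without `commit=` means the filesystem is using its built-in default (returned here as None).

```python
def commit_interval_from_mount_line(line):
    """Return the commit= value in seconds from one /proc/mounts line, or None."""
    fields = line.split()
    if len(fields) < 4:
        return None
    # The fourth field is the comma-separated mount option list.
    for opt in fields[3].split(","):
        key, sep, value = opt.partition("=")
        if key == "commit" and sep and value.isdigit():
            return int(value)
    # Assumption: no explicit commit= means the filesystem default applies.
    return None


def commit_intervals(path="/proc/mounts", fstypes=("ext4", "btrfs")):
    """Map mount point -> explicit commit interval (or None) for the given types."""
    result = {}
    try:
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 4 and fields[2] in fstypes:
                    result[fields[1]] = commit_interval_from_mount_line(line)
    except OSError:
        # Not on Linux, or /proc is unavailable: report nothing.
        pass
    return result


if __name__ == "__main__":
    for mountpoint, seconds in commit_intervals().items():
        print(mountpoint, "default" if seconds is None else f"{seconds}s")
```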

Note, I'm leaving out XFS for now, as its fs.xfs.xfssyncd_centisecs option appears to apply to metadata only.

vm.dirty_expire_centisecs:

This tunable is used to define when dirty data is old enough to be eligible
for writeout by the kernel flusher threads. It is expressed in 100'ths
of a second. Data which has been dirty in-memory for longer than this
interval will be written out next time a flusher thread wakes up.
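Since the tunable is in hundredths of a second while the mount options are in seconds, a small conversion helps when comparing the two. This is a rough sketch with hypothetical helper names; it assumes the usual /proc/sys path for the sysctl and returns None if the file cannot be read.

```python
def centisecs_to_seconds(raw):
    """Convert a sysctl value given in hundredths of a second to seconds."""
    return int(raw.strip()) / 100.0


def read_dirty_expire_seconds(path="/proc/sys/vm/dirty_expire_centisecs"):
    """Read vm.dirty_expire_centisecs and return it in seconds, or None."""
    try:
        with open(path) as f:
            return centisecs_to_seconds(f.read())
    except (OSError, ValueError):
        # File missing (non-Linux) or unexpected contents.
        return None


if __name__ == "__main__":
    print("dirty_expire:", read_dirty_expire_seconds(), "seconds")
```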

Best Answer

I posted this question to the linux-ext4@ mailing list, and the answer from Jan Kara was:

Yes, the effect is rather similar but not quite the same. The first thing to observe is the kind of obvious fact that the ext4 commit interval influences just the particular filesystem, while dirty_expire_centisecs influences the behavior of global writeback over all filesystems.

Secondly, the commit interval is really the maximum age of an ext4 transaction. So if there is a metadata change pending in the journal, it will become persistent at the latest after this time; a 'mkdir', say, will be persistent at the latest after this time. For data operations things are more complex. E.g. when delayed allocation is used (which is the default), the change gets logged in the journal only during writeback. So it can take up to dirty_expire_centisecs for data to be written back from the page cache, which results in the filesystem journalling block allocations etc., and then it can take up to the commit interval for these changes to become persistent. So in this case the intervals add up. There are also other special cases somewhere in between, but generally it is reasonable to assume that data gets automatically persistent in dirty_expire_centisecs + commit_interval time. Note that both of these are actually times when writeback is triggered, so if the disk gets too busy, the actual time when data is completely on disk may be much higher.
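The rule of thumb from the reply can be written out as simple arithmetic. This is only a sketch of that estimate: the function name and the `delayed_data` flag are hypothetical, and the result is the time until writeback is triggered, not a guarantee of when data is on disk.

```python
def worst_case_trigger_delay(dirty_expire_centisecs, commit_interval_secs,
                             delayed_data=True):
    """Estimate seconds until a change is scheduled to become persistent.

    delayed_data=True models file data under delayed allocation, where the
    page cache expiry and the journal commit interval add up.
    delayed_data=False models a pure metadata change (e.g. mkdir), which is
    bounded by the commit interval alone.
    """
    if delayed_data:
        return dirty_expire_centisecs / 100.0 + commit_interval_secs
    return float(commit_interval_secs)


if __name__ == "__main__":
    # Assumed values: dirty_expire_centisecs = 3000 and ext4 commit = 5.
    print(worst_case_trigger_delay(3000, 5))         # 35.0
    print(worst_case_trigger_delay(3000, 5, False))  # 5.0
```

With those assumed values, file data could sit for roughly 35 seconds before its writeback and commit have even been triggered, whereas a metadata-only change is bounded by the 5 second commit interval.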