ZFS: zpool space map thrashing – ever fixed?

solaris zfs

For zpools which are rather full and/or heavily fragmented, I usually enable metaslab debugging (echo metaslab_debug/W1 | mdb -kw) to avoid space map thrashing and the severe write performance hit that comes with it. The problem itself seems to be old and well understood, and a fix has been rumored to be "in the works" for a while now, as has the defrag API which should presumably help as well, yet I could not find an "official" approach that fixes it by default in production code.
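
In case it matters, this is roughly how I apply it; the tunable name is the one I know from older Solaris/illumos builds (newer OpenZFS splits it into metaslab_debug_load and metaslab_debug_unload), so verify it against your release:

    # Runtime change (lost on reboot); tunable name as on older
    # Solaris/illumos builds
    echo "metaslab_debug/W1" | mdb -kw

    # To make it persistent, add the equivalent line to /etc/system:
    #   set zfs:metaslab_debug = 1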

Is there something I have missed?

Some environment data: my zpools are of moderate size (typically < 10 TB) and mostly hold ZFS datasets with zvols using the default record size of 8K (which is in fact variable, since compression is typically enabled). Over the years I have seen this problem appear in different versions of Solaris, especially with aged zpools which have seen a lot of data. Note that this is not the same as the zpool 90%-full performance wall: space map thrashing due to fragmentation hits at a significantly lower space utilization level (I have seen it occur at 70% on a couple of old pools).
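
For what it's worth, these are the kinds of numbers I watch on such pools; the pool and zvol names below are just placeholders:

    # How full the pool is (thrashing started well below the 90% wall here)
    zpool list -o name,size,allocated,capacity tank

    # Per-zvol block size and compression settings
    zfs get volblocksize,compression,compressratio tank/somevol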

Best Answer

Unfortunately, in a word: no.

In a longer word: sort of. The method by which ZFS finds free space to allocate has been altered in recent builds of ZFS (OpenZFS) to mitigate the issue somewhat; the underlying fragmentation remains, but the 'fix' is that it has less of an impact on performance.
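
If your build is recent enough to have the spacemap_histogram feature, you can at least watch the fragmentation metric that the newer allocation code works from (the pool name is a placeholder):

    # Per-pool fragmentation metric alongside capacity
    zpool list -o name,capacity,fragmentation tank

    # Confirm the pool has the relevant feature enabled/active
    zpool get feature@spacemap_histogram tank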

The only true 'fix' you can use at the moment is to zfs send the data off the pool, wipe the pool out, and zfs send the data back. Obviously the problem will then reappear at some later date, depending on your workload and how quickly you re-fragment the space maps.
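
A rough sketch of that round trip, assuming a scratch pool named backup with enough room and a pool named tank (the names, snapshot label, and vdev layout are placeholders, and the receive flags may need adjusting for your datasets):

    # Snapshot everything and replicate it to the scratch pool
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs receive -Fu backup/tankcopy

    # Recreate the pool from scratch, then replicate the data back
    zpool destroy tank
    zpool create tank mirror c0t1d0 c0t2d0   # placeholder vdev layout
    zfs send -R backup/tankcopy@migrate | zfs receive -Fu tank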

There are other potential fixes and workarounds being discussed or in the works, but I certainly couldn't give any sort of ETA.