Linux – JFS: long fsck time on large filesystem

filesystems, fsck, jfs, linux, performance

There was a power failure recently that took down one of my servers. On reboot, the main storage filesystem – JFS on a 7TB (9x1TB RAID6) volume – needed an fsck before it could be mounted read-write. After I started the fsck, I watched it for a while in top – memory usage was rising steadily (but not too rapidly), and CPU usage was pegged at or near 100%.

Now, about 12 hours in, the fsck process has consumed almost 94% of the 4GB of memory in the system and CPU usage has dropped to around 2%. The process is still running (and gives no indication as to further running time).

First off: is this indicative of a problem? I'm worried by the fact that CPU usage has dropped so dramatically – it seems almost as though the process has become memory-bound, and the fsck will take forever to complete because it's spending all its time swapping. (I noticed that kswapd0 is floating uncomfortably close to the top of the list in top, actually beating out the fsck process for CPU usage more than half the time.) If this isn't the case – if fsck just slows down CPU-wise near the end of its run – that's fine; I just need to know that.
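One way to check whether the box is actively thrashing (rather than just holding a lot of resident memory) is to watch the kernel's cumulative swap-in/swap-out page counters; this is a generic diagnostic sketch, not something specific to fsck.jfs:

```shell
# Snapshot the cumulative swap-in (pswpin) and swap-out (pswpout)
# page counts, wait, then snapshot again. A large delta between the
# two readings means the system is actively swapping right now.
grep -E '^(pswpin|pswpout) ' /proc/vmstat
sleep 10
grep -E '^(pswpin|pswpout) ' /proc/vmstat
```

Sustained growth in both counters while fsck.jfs sits in the `D` state would support the memory-bound theory; flat counters would suggest it is blocked on disk I/O instead.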

If this is a problem, what can I do to improve fsck performance? I'm open to almost anything, up to and including "buy more memory for the system."
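If more RAM can't be installed immediately, one stopgap (a sketch only – the path and size here are arbitrary examples, and this needs root plus free space on a filesystem that is already mountable) is to add a temporary swap file so the fsck has somewhere to spill besides the existing 964MB swap partition:

```shell
# Create and enable an 8GB temporary swap file (example path/size).
dd if=/dev/zero of=/var/tmp/extra.swap bs=1M count=8192
chmod 600 /var/tmp/extra.swap   # swap files must not be world-readable
mkswap /var/tmp/extra.swap      # write swap signature
swapon /var/tmp/extra.swap      # enable it

# When the fsck is done, remove it:
# swapoff /var/tmp/extra.swap && rm /var/tmp/extra.swap
```

More swap won't make a truly memory-bound fsck fast – disk-backed swap is orders of magnitude slower than RAM – but it can keep the process from failing outright if it runs out of memory.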

The relevant line from top:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
 5201 root      20   0 58.1g 3.6g  128 D    2 93.8   1071:27 fsck.jfs

And the result of free -m:

             total       used       free     shared    buffers     cached 
Mem:          3959       3932         26          0          0          6 
-/+ buffers/cache:       3925         33 
Swap:          964        482        482

Best Answer

Correct me if I'm wrong, but JFS is not a full journaling file system: it only journals metadata. This means that the fsck command can take a very long time to complete if you have lots of data.

I suggest you investigate switching to a fully journaled file system (ext3/ext4): that should remove the need for a full fsck after an abrupt failure.