Linux – interpreting disk stats using sar

io, linux, sar

I have a JBOD with Cassandra installed on it. I'm trying to figure out whether my disks are totally free, that is, whether they are boring. The sar -d output is at the end. As I understand it, as long as %util is low (less than 1), all the disks are boring. Am I interpreting the results correctly? Or should I check something else as well?

00:00:01          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:00:01       dev8-0     11.44      0.43    172.52     15.12      0.00      0.08      0.04      0.05
12:00:01      dev8-16     47.49      0.00   2872.14     60.48      0.01      0.28      0.04      0.19
12:00:01      dev8-32      1.23     24.78    140.97    134.93      0.00      2.75      1.33      0.16
12:00:01      dev8-80      2.07    160.71    139.63    145.26      0.00      1.57      1.47      0.30
12:00:01      dev8-64      0.87     85.07     22.82    124.59      0.00      3.22      3.17      0.27
12:00:01      dev8-96      3.89     62.73    287.07     89.84      0.01      1.32      1.02      0.40
12:00:01      dev8-48      1.08     37.81     68.02     97.61      0.00      2.05      1.86      0.20
12:00:01     dev8-144      1.22     86.92     51.66    113.43      0.01      4.14      3.56      0.43
12:00:01     dev8-112      0.43     53.76     20.51    174.24      0.01     14.11     12.57      0.54
12:00:01     dev8-160      2.29     26.02    200.23     98.84      0.00      1.62      0.87      0.20
12:00:01     dev8-128      2.30     12.06    208.93     95.98      0.00      0.57      0.48      0.11
12:00:01     dev253-0      0.05      0.43      0.00      8.00      0.00      4.50      0.56      0.00
12:00:01     dev253-1      0.03      0.00      0.27      8.00      0.00      0.90      0.60      0.00
12:00:01     dev253-2     25.26     26.02    200.23      8.96      0.02      0.79      0.08      0.20
12:00:01     dev253-3      6.89     86.92     51.66     20.11      0.02      2.65      0.63      0.43
12:00:01     dev253-4     26.17     12.06    208.93      8.44      0.02      0.91      0.04      0.11
12:00:01     dev253-5      2.84     53.76     20.51     26.19      0.01      2.46      1.89      0.54
12:00:01     dev253-6     36.13     62.73    287.07      9.68      0.04      1.15      0.11      0.40
12:00:01     dev253-7     18.19    160.71    139.63     16.52      0.02      0.97      0.17      0.30
12:00:01     dev253-8      3.19     85.07     22.82     33.84      0.01      2.76      0.86      0.27
12:00:01     dev253-9      8.65     37.81     68.02     12.23      0.02      2.35      0.23      0.20
12:00:01    dev253-10     17.81     24.78    140.97      9.31      0.02      0.85      0.09      0.16
12:00:01    dev253-11    359.02      0.00   2872.14      8.00      0.21      0.57      0.01      0.19
12:00:01    dev253-12      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01    dev253-13     19.27      0.00    154.16      8.00      0.00      0.10      0.01      0.02
12:00:01    dev253-14      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01    dev253-15      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:00:01    dev253-16      2.26      0.00     18.10      8.00      0.00      1.02      0.11      0.03

Best Answer

I haven't heard it called 'boring' before - the term I've heard used is 'idle'.

But yes, utilisation is an indication of how busy your disk is. As a very rough rule of thumb, above 60% you will start to see a performance impact.

The things that are most significant for your user experience, though, are svctm and await. svctm is the service time, the time the device takes to respond to an I/O request; await also includes the time the request spent waiting in the queue. Both are reported in milliseconds, and lower is better. Above about 10 is a warning sign, and above 20 is when you'll start to see applications noticeably suffering. dev8-112 falls into the warning range (await 14.11, svctm 12.57), but it still has quite a low utilisation.
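To apply these rules of thumb to a whole report, something like the following rough sketch could be used. It assumes the ten-column `sar -d` layout shown in the question (time, DEV, tps, rd_sec/s, wr_sec/s, avgrq-sz, avgqu-sz, await, svctm, %util); other sysstat versions print different columns. The `classify` function and the threshold names are made up for illustration, and the thresholds are only the rough guides given above.

```python
# Rough thresholds from the rule of thumb above; guides, not hard limits.
UTIL_BUSY = 60.0   # %util above which performance impact is likely
AWAIT_WARN = 10.0  # await (ms) that counts as a warning sign
AWAIT_BAD = 20.0   # await (ms) at which applications start to suffer


def classify(line):
    """Return (device, verdict) for one data row of `sar -d` output.

    Assumes the column order from the question: await is the 8th
    whitespace-separated field and %util is the 10th.
    """
    fields = line.split()
    dev = fields[1]
    await_ms = float(fields[7])
    util = float(fields[9])
    if util > UTIL_BUSY or await_ms > AWAIT_BAD:
        verdict = "busy"
    elif await_ms > AWAIT_WARN:
        verdict = "warning"
    else:
        verdict = "idle"
    return dev, verdict


# Two rows copied from the report in the question.
rows = [
    "12:00:01       dev8-0     11.44      0.43    172.52     15.12      0.00      0.08      0.04      0.05",
    "12:00:01     dev8-112      0.43     53.76     20.51    174.24      0.01     14.11     12.57      0.54",
]
for row in rows:
    print(classify(row))
```

With these two rows, dev8-0 comes out as idle and dev8-112 as a warning, because its await is above 10 ms even though its utilisation is well under 1%.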

I'm somewhat intrigued by what model of disks these are: you've got dev253-11 doing 359 tps (transfers per second) and ~2870 wr_sec/s (sectors written per second), but it's still showing less than 1% utilisation.
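For a sense of scale, the sector counts can be converted into throughput. The sketch below assumes the 512-byte sector size that sar uses for the rd_sec/s, wr_sec/s and avgrq-sz columns; the helper names are made up for illustration.

```python
SECTOR_BYTES = 512  # sar counts sectors of 512 bytes


def write_throughput_mib(wr_sec_per_s):
    """Convert wr_sec/s (sectors per second) to MiB per second."""
    return wr_sec_per_s * SECTOR_BYTES / (1024 * 1024)


def avg_request_kib(avgrq_sz_sectors):
    """Convert avgrq-sz (sectors per request) to KiB per request."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


# Figures for dev253-11 taken from the report in the question.
print(round(write_throughput_mib(2872.14), 2), "MiB/s written")
print(avg_request_kib(8.00), "KiB average request size")
```

For dev253-11 this works out to roughly 1.4 MiB/s written in 4 KiB requests, a small amount of data in absolute terms, which may be part of why the utilisation stays so low despite the high tps.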