Xfs slow performance; ls hang

centos7performancexfs

I have a very strange problem with a xfs on a centos7 storage server.
On this one particular fs in lvm, it had been very slow in many different occasion. At first I thought it was a read and/or write, as cp of a 4.5 GB iso file took 100x longer than in a good fs. I wondered if it was an alignment problem.
But now it is hanging on a simple ls -lh, with nothing else going on to this file system, something else must be very wrong here…

Have a look at this series of commands. ls was fine at first, then after a file creation test, a second ls hung:

# cd perf_test/
# ls
CentOS-7-x86_64-DVD-1708.iso  CentOS-7-x86_64-DVD-1708.iso.cp1
# ls -l
total 8830984
-rw-rw-r--. 1 t    t    4521459712 Sep  6  2017 CentOS-7-x86_64-DVD-1708.iso
-rw-r--r--. 1 root root 4521459712 Apr  9 21:47 CentOS-7-x86_64-DVD-1708.iso.cp1
# date; time -p dd if=/dev/zero of=test.dd bs=1k count=64k ; echo $? ; date
Thu Apr 12 23:50:58 PDT 2018
65536+0 records in
65536+0 records out
67108864 bytes (67 MB) copied, 0.264055 s, 254 MB/s
real 0.27
user 0.02
sys 0.24
0
Thu Apr 12 23:50:58 PDT 2018
# ls -lh
(this ls has been running for 20 minutes now and still hasn't returned)

If I open another ssh session, /bin/ls -lh returns right away.
sure, the default version of ls is aliased to –color, it takes a bit longer to stat all the inodes (for all of 3 files), it should certainly not hang?

The FS is pretty new and has very little data on it:
/dev/mapper/datavg02-datalv02 125T 1.1T 124T 1% /mnt/tmp_data02

A number of strange things have happened:

  • both FS and its LV "host" was first created at ~10GB, then lvextend -L+500G/xfs_growfs and yet again lvextend -L 125T/xfs_growfs.
  • early functionality test was okay.
  • Started copying files to it. Now at 1.1TB.
  • At the time machine felt like it was somehow very slow.
  • Issued reboot, took a long time (5+ minutes), thought machine was hung and decided to power off.
  • mount took a very long time to mount the fs again (like 24+ hours).
  • but fs can now umount and mount very quickly.

Some details on the sw stack:

# uname -a
Linux roost 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# rpm -qa | grep xfs
xfsprogs-4.5.0-12.el7.x86_64
xfsdump-3.1.4-1.el7.x86_64

RAID card:

b3:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)

It is a raid 6 "Virtual Disk" with 16x 10 TB drives. LVM2 on this VD, with two XFS on this volume group. the other XFS volume behave just fine, though that's a 123GB FS.

What could be wrong and how should I go about fixing it?

Thanks!

T

~~~~~

Thanks for the feedback Tux 🙂

The ls never finished and I had to kill it. kill -9 didnt work. I had to kill the parent bash process.

I unmounted, xfs_repair (which took 25 hours), mount again.

The first cp of a 4.5 GB iso file was horribly slow. took 826 seconds rather than an expected ~8 seconds on a xfs based on a different raid volume.

The strange thing is, a second cp test of the same file completed in the expected ~8 second.
I wonder if I am hitting the bug mentioned in:
https://groups.google.com/forum/#!topic/linux.kernel/kG1gtLc7R8Q

I havent finished reading that long thread to see what the solution.
But, whats "too large agcount" these days for xfs?

# xfs_info /dev/datavg02/datalv02
meta-data=/dev/mapper/datavg02-datalv02 isize=512    agcount=5295839, agsize=6336 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=33554432000, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=1605, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

as for stats…
below are couple of snapshot of iotop and iostat during the first cp that took 100x the expected time:

# date; time -p cp CentOS-6-x86_64-DVD-1708.iso CentOS-7-x86_64-DVD-1708.iso.after_xfs_repair ; echo $? ; date
Sat Apr 14 23:00:55 PDT 2018
real 826.37
user 0.04
sys 5.96
1
Sat Apr 14 23:14:41 PDT 2018


Total DISK READ :     650.00 K/s | Total DISK WRITE :    1040.00 K/s
Actual DISK READ:     650.00 K/s | Actual DISK WRITE:       0.00 B/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                        
222036 be/4 root        0.00 B/s    3.82 K/s  0.00 %  0.00 % tee iotop.out   
   937 be/4 root        0.00 B/s    3.82 K/s  0.00 %  0.00 % rsyslogd -n [rs:main Q:Reg]
380133 be/4 root      650.00 K/s    0.00 B/s  0.00 %  0.00 % [kworker/u896:0]
  1489 be/4 ganglia     0.00 B/s 1032.35 K/s  0.00 %  0.00 % gmetad -d 1
  2048 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~tabase -seat seat0 -nolisten tcp vt1 [llvmpipe-6]
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 21
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
  2052 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~abase -seat seat0 -nolisten tcp vt1 [llvmpipe-10]
     5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
  2054 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~abase -seat seat0 -nolisten tcp vt1 [llvmpipe-12]
     7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    10 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
    12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
    13 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
    15 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H]
    16 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/2]
    17 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]



Total DISK READ :     567.90 K/s | Total DISK WRITE :    1008.77 K/s
Actual DISK READ:     567.90 K/s | Actual DISK WRITE:    1374.92 K/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                        
380133 be/4 root      567.90 K/s    0.00 B/s  0.00 %  0.00 % [kworker/u896:0]
  1489 be/4 ganglia     0.00 B/s 1008.77 K/s  0.00 %  0.00 % gmetad -d 1 
  2048 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~tabase -seat seat0 -nolisten tcp vt1 [llvmpipe-6]
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 21
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
  2052 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~abase -seat seat0 -nolisten tcp vt1 [llvmpipe-10]
     5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
  2054 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~abase -seat seat0 -nolisten tcp vt1 [llvmpipe-12]
     7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    10 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
    12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
    13 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
    15 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H]
    16 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/2]
    17 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
    18 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
  2051 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~tabase -seat seat0 -nolisten tcp vt1 [llvmpipe-9]
    20 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0H]
    21 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/3]
    22 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
    23 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
    25 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0H]
    26 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/4]
    27 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/4]
    28 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/4]
    30 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/4:0H]
    31 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/5]
    32 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/5]
    33 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/5]
    35 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/5:0H]
    36 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/6]
    37 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/6]
    38 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/6]
    40 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/6:0H]
    41 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/7]
    42 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/7]
    43 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/7]
    45 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/7:0H]
    46 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/8]
    47 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/8]
    48 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/8]
  2056 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % X :0 -background none -noreset -audit 4 -verbose~abase -seat seat0 -nolisten tcp vt1 [llvmpipe-14]
    50 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/8:0H]



avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.28    0.00    0.28    0.01    0.00   99.44

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             261.80         1.60      1130.40          8       5652
sdb              21.80      3072.00         0.00      15360          0
sdc             155.60       622.40         0.00       3112          0
dm-0            262.20         1.60      1130.40          8       5652
dm-1              0.00         0.00         0.00          0          0
dm-2             21.80      3072.00         0.00      15360          0
dm-3              0.00         0.00         0.00          0          0
dm-4            155.60       622.40         0.00       3112          0
dm-5              0.00         0.00         0.00          0          0
dm-6              0.00         0.00         0.00          0          0




avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.30    0.00    0.00   99.20

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              45.60         0.00       185.60          0        928
sdb               0.00         0.00         0.00          0          0
sdc             161.80       647.20         0.00       3236          0
dm-0             45.60         0.00       185.60          0        928
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              0.00         0.00         0.00          0          0
dm-4            162.00       648.00         0.00       3240          0
dm-5              0.00         0.00         0.00          0          0
dm-6              0.00         0.00         0.00          0          0




# dmesg | grep -v IPTable ; uptime
[614878.648714] XFS (dm-4): Mounting V5 Filesystem
[614878.800398] XFS (dm-4): Ending clean mount
 23:04:43 up 7 days,  2:53,  4 users,  load average: 2.04, 1.25, 0.71

Best Answer

both FS and its LV "host" was first created at ~10GB, then lvextend -L+500G/xfs_growfs and yet again lvextend -L 125T/xfs_growfs.

This probably is your problem. Filesystem grow are generally fine for 10X (one order of magnitude); anything bigger severely penalize filesystem performance compared to a correctly sized one.

In you case, you grow it from 10 GB to >100 TB, an increase of 10000 times (4 order of magnitude). So, I strongly suggest you to reformat your filesystem with the correct size. This will obviously means to backup and restore anything you load on it.

Anyway, are you sure no disk has issues? Can you post the output of iostat -x -k 1 during the blocked ls ?