AWS EC2 – Why Are AWS i3en.3xlarge IOPS Really Low?

amazon ec2, amazon-web-services, iops, nvme, Ubuntu

I just launched a new EC2 instance of type i3en.3xlarge. The operating system is Ubuntu. I mounted the NVMe instance store, but every speed test I run comes out incredibly low, at around 7k IOPS. What am I doing wrong?

Here are the steps I took:

1) Check available SSDs with nvme list:

---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     vol012301587a8724842 Amazon Elastic Block Store               1           8.59  GB /   8.59  GB    512   B +  0 B   1.0     
/dev/nvme1n1     AWS16AAAC6C7BFAC4972 Amazon EC2 NVMe Instance Storage         1           7.50  TB /   7.50  TB    512   B +  0 B   0

2) Create a new XFS file system on nvme1n1:

sudo mkfs -t xfs /dev/nvme1n1

3) Mount it at /home:

sudo mount /dev/nvme1n1 /home

4) Check df -h:

ubuntu@ip-172-31-35-146:/home$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       7.7G  2.8G  4.9G  37% /
devtmpfs         47G     0   47G   0% /dev
tmpfs            47G     0   47G   0% /dev/shm
tmpfs           9.4G  852K  9.4G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            47G     0   47G   0% /sys/fs/cgroup
/dev/loop0       25M   25M     0 100% /snap/amazon-ssm-agent/4046
/dev/loop3       43M   43M     0 100% /snap/snapd/14066
/dev/loop2       68M   68M     0 100% /snap/lxd/21835
/dev/loop1       56M   56M     0 100% /snap/core18/2284
/dev/loop4       62M   62M     0 100% /snap/core20/1242
/dev/loop6       56M   56M     0 100% /snap/core18/2253
/dev/loop5       44M   44M     0 100% /snap/snapd/14549
/dev/loop7       62M   62M     0 100% /snap/core20/1328
tmpfs           9.4G     0  9.4G   0% /run/user/1000
/dev/nvme1n1    6.9T   49G  6.8T   1% /home

5) Run a test with fio:

fio -direct=1 -iodepth=1 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Rand_Read_Testing

Fio Results:

fio-3.16
Starting 1 process
Rand_Read_Testing: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=28.5MiB/s][r=7297 IOPS][eta 00m:00s]
Rand_Read_Testing: (groupid=0, jobs=1): err= 0: pid=1701: Sat Jan 29 22:28:17 2022
  read: IOPS=7139, BW=27.9MiB/s (29.2MB/s)(1024MiB/36717msec)
    slat (nsec): min=2301, max=39139, avg=2448.98, stdev=311.68
    clat (usec): min=32, max=677, avg=137.06, stdev=26.98
     lat (usec): min=35, max=680, avg=139.59, stdev=26.99
    clat percentiles (usec):
     |  1.00th=[   35],  5.00th=[   99], 10.00th=[  100], 20.00th=[  124],
     | 30.00th=[  125], 40.00th=[  126], 50.00th=[  139], 60.00th=[  141],
     | 70.00th=[  165], 80.00th=[  167], 90.00th=[  169], 95.00th=[  169],
     | 99.00th=[  172], 99.50th=[  174], 99.90th=[  212], 99.95th=[  281],
     | 99.99th=[  453]
   bw (  KiB/s): min=28040, max=31152, per=99.82%, avg=28506.48, stdev=367.13, samples=73
   iops        : min= 7010, max= 7788, avg=7126.59, stdev=91.80, samples=73
  lat (usec)   : 50=1.29%, 100=9.46%, 250=89.19%, 500=0.06%, 750=0.01%
  cpu          : usr=1.43%, sys=2.94%, ctx=262144, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=27.9MiB/s (29.2MB/s), 27.9MiB/s-27.9MiB/s (29.2MB/s-29.2MB/s), io=1024MiB (1074MB), run=36717-36717msec

Disk stats (read/write):
  nvme1n1: ios=259894/5, merge=0/3, ticks=35404/0, in_queue=35404, util=99.77%

According to published benchmarks, the IOPS performance should be far better.

So am I missing something here?

Thanks in advance

Best Answer

I spun up one of these instances to test this for myself. My steps were only slightly different:

  1. Partition the disk first using parted
  2. Make the filesystem
  3. Mount at /opt, as /home already existed and contained my user's home directory (ubuntu).
  4. apt update && apt upgrade, then install fio
  5. Run the same command as you: fio -direct=1 -iodepth=1 -rw=randread -ioengine=libaio -bs=4k -size=1G -numjobs=1 -runtime=1000 -group_reporting -filename=iotest -name=Rand_Read_Testing from within /opt, with sudo.

I got similar results, with read: IOPS=7147.

I then ran another test:

/opt$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75
fiotest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
fiotest: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=332MiB/s,w=109MiB/s][r=85.1k,w=28.0k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=1): err= 0: pid=26470: Mon Jan 31 09:14:45 2022
  read: IOPS=91.5k, BW=357MiB/s (375MB/s)(6141MiB/17187msec)
   bw (  KiB/s): min=339568, max=509896, per=100.00%, avg=366195.29, stdev=59791.96, samples=34
   iops        : min=84892, max=127474, avg=91548.82, stdev=14947.99, samples=34
  write: IOPS=30.5k, BW=119MiB/s (125MB/s)(2051MiB/17187msec); 0 zone resets
   bw (  KiB/s): min=111264, max=170424, per=100.00%, avg=122280.71, stdev=20225.33, samples=34
   iops        : min=27816, max=42606, avg=30570.18, stdev=5056.32, samples=34
  cpu          : usr=19.73%, sys=41.60%, ctx=742611, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=1572145,525007,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=357MiB/s (375MB/s), 357MiB/s-357MiB/s (375MB/s-375MB/s), io=6141MiB (6440MB), run=17187-17187msec
  WRITE: bw=119MiB/s (125MB/s), 119MiB/s-119MiB/s (125MB/s-125MB/s), io=2051MiB (2150MB), run=17187-17187msec

Disk stats (read/write):
  nvme1n1: ios=1563986/522310, merge=0/0, ticks=927244/24031, in_queue=951275, util=99.46%

...which looks a lot better - read: IOPS=91.5k.

I suspect the difference is due to how the read-only test works, or to some nuance of reading from the disk you're on, or to some other limitation.

I ran my test a couple more times and got similar results each time.
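
One likely explanation, offered as a sketch rather than a confirmed diagnosis: with iodepth=1 and numjobs=1 only one read is ever in flight, so IOPS cannot exceed 1 / (mean latency). Plugging in the "lat (usec) ... avg=139.59" figure from the question's output lands very close to the measured number. The function name iops_bound is my own, not anything from fio.

```python
def iops_bound(iodepth: int, mean_latency_usec: float) -> float:
    """Upper bound on IOPS when `iodepth` requests are kept in flight.

    This is Little's law rearranged: throughput = concurrency / latency.
    """
    return iodepth / (mean_latency_usec * 1e-6)


# Mean total latency ("lat (usec) ... avg=139.59") from the iodepth=1 run.
qd1_bound = iops_bound(iodepth=1, mean_latency_usec=139.59)
print(f"iodepth=1 bound: {qd1_bound:.0f} IOPS")  # fio measured IOPS=7139
```

If that reasoning holds, roughly 7k IOPS is about all a single outstanding 4k read can achieve at that latency, and raising iodepth (or numjobs) is what lets the device work on many requests at once.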

I then ran another read-only test, this time with --iodepth=64, and got this:

/opt$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randread
fiotest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=332MiB/s][r=85.1k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=1): err= 0: pid=26503: Mon Jan 31 09:17:57 2022
  read: IOPS=88.6k, BW=346MiB/s (363MB/s)(8192MiB/23663msec)
   bw (  KiB/s): min=339560, max=787720, per=100.00%, avg=354565.45, stdev=72963.81, samples=47
   iops        : min=84890, max=196930, avg=88641.40, stdev=18240.94, samples=47
  cpu          : usr=15.37%, sys=31.05%, ctx=844523, majf=0, minf=72
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=2097152,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=346MiB/s (363MB/s), 346MiB/s-346MiB/s (363MB/s-363MB/s), io=8192MiB (8590MB), run=23663-23663msec

Disk stats (read/write):
  nvme1n1: ios=2095751/1, merge=0/0, ticks=1468160/0, in_queue=1468159, util=99.64%
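
This output has no latency lines, but the mean latency can be estimated from the queue depth and the IOPS. The snippet below is a rough sketch that assumes the queue stayed full at 64 for the whole run (the "IO depths" line reports >=64=100.0%); the helper name mean_latency_usec is my own.

```python
def mean_latency_usec(iodepth: int, iops: float) -> float:
    """Mean per-request latency implied by Little's law, in microseconds."""
    return iodepth / iops * 1e6


qd64_latency = mean_latency_usec(iodepth=64, iops=88_600)  # read: IOPS=88.6k
speedup = 88_600 / 7_139  # versus the iodepth=1 run (read: IOPS=7139)
print(f"implied latency at iodepth=64: {qd64_latency:.0f} usec")
print(f"IOPS gain over iodepth=1: {speedup:.1f}x")
```

Under that assumption each read takes several times longer than the ~140 usec seen at iodepth=1, yet total IOPS is about 12x higher, because many requests are being serviced at the same time.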

This is much better read performance. I suspect the arguments you gave your command are not letting the test get the best performance from the disk, perhaps because of block size, file size, etc. I also noticed they were all single-dashed arguments (e.g. -bs=4k) rather than double-dashed (--bs=4k), so they might not even be parsed correctly...
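
The single-dash worry can be checked against the fio output itself. As a rough sketch (the helper name bytes_per_io is my own), dividing the data read by the number of requests issued recovers the block size fio actually used. The first run's output also reports depth=1 on its latency line and 1=100.0% under "IO depths".

```python
MIB = 1024 * 1024


def bytes_per_io(total_mib: int, issued_ios: int) -> float:
    """Average request size, from the data moved and the number of requests issued."""
    return total_mib * MIB / issued_ios


# iodepth=1 run: "1024MiB" read, "issued rwts: total=262144"
qd1_block = bytes_per_io(1024, 262_144)
# iodepth=64 read-only run: "8192MiB" read, "issued rwts: total=2097152"
qd64_block = bytes_per_io(8192, 2_097_152)
print(qd1_block, qd64_block)
```

Both come out at 4096 bytes, which suggests -bs=4k and -iodepth=1 were parsed as intended, and that the low figure is more likely down to the queue depth than to a parsing problem.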
