Very low disk throughput on AWS EBS volumes

amazon-ebs amazon-web-services

I am manually copying data from one EBS volume to another, smaller one, because the volumes use XFS, which cannot be shrunk.

I am using a t3.micro instance (EBS-optimised) with the Amazon Linux 2 AMI. Both gp2 EBS volumes are attached to it in addition to the instance's root volume, and everything is in the same Availability Zone.

I have done this before and it took around 5-10 minutes to copy 95 GB of data (at 10 minutes that works out to roughly 162 MB/s of throughput), but now, with the same volumes, it is very slow.

The copying process is:

tar cSf - /mnt/nvme1n1p1/ | cat | (cd ../nvme2n1p1/ && tar xSBf -)
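(As far as I can tell the cat in the middle only adds an extra pipe; a variant of the same copy that also reports throughput while it runs, assuming pv is installed, e.g. from EPEL, and using the absolute mount paths, would be:)

# same tar-to-tar copy, with pv showing current and average transfer rate
tar cSf - -C /mnt/nvme1n1p1 . | pv | (cd /mnt/nvme2n1p1 && tar xSBf -)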

I have it running in the background, and checking at the same time with iostat -xm 5 3 I am getting these results:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.02    0.86   39.62    0.05   59.39

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme1n1           0.00     0.00   54.20    0.00     6.70     0.00   253.19     0.94   34.62   34.62    3.56  17.32  93.90
nvme2n1           0.00     0.28    0.06   27.20     0.00     6.71   503.98     0.14    6.67    0.31    6.68   1.22   3.32
nvme0n1           0.00     0.02    2.10    0.90     0.04     0.00    30.65     0.00    0.63    0.63    0.62   0.08   0.02

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    0.70   37.54    0.00   61.66

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme1n1           0.00     0.00   46.40    0.00     5.80     0.00   256.00     1.00   43.16   43.16    0.00  21.48  99.68
nvme2n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.90   38.66    0.10   60.34

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme1n1           0.00     0.00   53.80    0.00     6.73     0.00   256.00     1.00   36.67   36.67    0.00  18.57  99.92
nvme2n1           0.00     0.00    0.00   16.00     0.00     4.00   512.00     0.03    3.20    0.00    3.20   0.80   1.28
nvme0n1           0.00     0.60    0.00    1.40     0.00     0.02    23.14     0.00    0.00    0.00    0.00   0.00   0.00

As you can see, I am getting throughput below 10 MB/s, and it keeps dropping. I have been reading about EBS throughput and cannot find any clue about what it could be, whether there is some penalty or something similar…

Do you know what it could be?

Thanks in advance! šŸ™‚

More requested info:

ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3700
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3700
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

df -h:

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        463M     0  463M   0% /dev
tmpfs           480M     0  480M   0% /dev/shm
tmpfs           480M  380K  480M   1% /run
tmpfs           480M     0  480M   0% /sys/fs/cgroup
/dev/nvme0n1p1  8.0G  1.1G  7.0G  13% /
tmpfs            96M     0   96M   0% /run/user/1000
/dev/nvme1n1p1  500G   93G  408G  19% /mnt/nvme1n1p1
/dev/nvme2n1p1  150G   55G   96G  37% /mnt/nvme2n1p1

The EBS Burst Balance stays above 98% the whole time.

EDIT: the next time I ran the copy, the problem did not happen again.

Best Answer

Open Amazon CloudWatch and review the "CPUCreditBalance" metric for the instance. Look at the total credits available with a sample rate of 5 minutes. Are the credits dropping to near zero at any point?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-monitoring-cpu-credits.html
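As a sketch, the same metric can also be pulled with the AWS CLI if that is easier than the console (the instance ID and time window below are placeholders, and this assumes the CLI is configured):

# CPUCreditBalance for the instance over the last 6 hours, in 5-minute datapoints
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Average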

A 'T' type AWS instance is a burstable, performance-limited type. A t3.micro earns only 12 CPU credits per hour across its two vCPUs. This means the CPU can only run at a sustained 10% usage per vCPU before it chews through its credits and slows to a crawl.

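As a quick sanity check of that baseline figure (using the docs' definition that one CPU credit equals one vCPU running at 100% for one minute):

# 12 credits/hour spread over 2 vCPUs and 60 minutes => baseline % per vCPU
awk 'BEGIN { printf "%.0f%% baseline per vCPU\n", 12 / (60 * 2) * 100 }'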

Increase the size of your instance. I would recommend changing to a sufficiently sized 'C' type instance until the copy is done; you can downgrade back to a smaller instance afterwards.
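For completeness, a rough sketch of that resize with the AWS CLI (the instance ID and target type are placeholders; the instance must be stopped first, and its public IP will change unless an Elastic IP is attached):

# stop, change the instance type, then start again
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type Value=c5.large
aws ec2 start-instances --instance-ids i-0123456789abcdef0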
