Linux – Load average is 50 while CPU Utilization is 60%

amazon-ec2 amazon-web-services linux

We use EC2 Auto Scaling and recently switched the instance type from m2.2xlarge to c1.xlarge (High-Memory to High-CPU). Average RAM usage per instance is about 2 GB, so we don't need the 34 GB that m2.2xlarge provides, and getting the extra CPU power of c1.xlarge for the same price seemed like a good idea.

But after switching to c1.xlarge, we ran into the following issues:

  1. Load average rose to 50 while CPU Utilization dropped from 70% to 60%.
  2. Scaling in from 6 instances to 4 doesn't affect the CPU Utilization CloudWatch metric.
  3. Response times became very slow, and Auto Scaling keeps replacing instances because they fail the ELB health check.
  4. Auto Scaling reduced the number of instances from 8 to 4 because CPU Utilization dropped.

Can you explain what might be causing this behavior and what I can do about it?

EC2 Instance Types Info:

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

EDIT:

$ iostat -x
Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.34    0.00    0.13    0.02    0.29   98.23

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.04     0.09    0.08    0.13     1.50     0.87    22.99     0.01   36.59   23.42   44.75   4.04   0.08
xvdb              0.00     0.00    0.01    0.00     0.03     0.00     9.37     0.00    1.04    0.95   15.00   1.04   0.00



$ iostat
Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.45    0.00    0.14    0.02    0.31   98.08

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.21         1.50         0.87      93689      54728
xvdb              0.01         0.03         0.00       1575          8



$ top
top - 05:30:08 up 17:20,  3 users,  load average: 15.13, 10.24, 9.66
Tasks: 166 total,  20 running, 146 sleeping,   0 stopped,   0 zombie
Cpu(s): 65.3%us,  4.7%sy,  0.0%ni, 13.5%id,  0.0%wa,  0.0%hi,  0.7%si, 15.8%st
Mem:   7130236k total,   463440k used,  6666796k free,    19100k buffers
Swap:        0k total,        0k used,        0k free,    95136k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                          
 6457 ubuntu    20   0  257m  11m 4820 S   24  0.2   0:16.73 apache2                                                                                                                                                                          
 6416 ubuntu    20   0  257m  11m 4820 R   23  0.2   0:17.36 apache2                                                                                                                                                                          
 6375 ubuntu    20   0  257m  11m 4820 R   22  0.2   0:17.62 apache2                                                                                                                                                                          
 6402 ubuntu    20   0  257m  11m 4820 R   22  0.2   0:16.85 apache2                                                                                                                                                                          
 6472 ubuntu    20   0  257m  11m 4820 S   22  0.2   0:08.95 apache2                                                                                                                                                                          
 6311 ubuntu    20   0  257m  11m 4820 S   21  0.2   0:24.91 apache2                                                                                                                                                                          
 6446 ubuntu    20   0  257m  11m 4820 R   21  0.2   0:16.91 apache2                                                                                                                                                                          
 6372 ubuntu    20   0  257m  11m 4820 R   21  0.2   0:17.89 apache2                                                                                                                                                                          
 6460 ubuntu    20   0  257m  11m 4820 R   21  0.2   0:16.73 apache2                                                                                                                                                                          
 6379 ubuntu    20   0  257m  11m 4820 R   20  0.2   0:16.24 apache2                                                                                                                                                                          
 6380 ubuntu    20   0  257m  11m 4820 S   20  0.2   0:17.20 apache2                                                                                                                                                                          
 6450 ubuntu    20   0  257m  11m 4820 S   20  0.2   0:16.89 apache2                                                                                                                                                                          
 6426 ubuntu    20   0  257m  11m 4820 R   20  0.2   0:16.96 apache2                                                                                                                                                                          
 6432 ubuntu    20   0  257m  11m 4820 S   20  0.2   0:17.78 apache2                                                                                                                                                                          
 6433 ubuntu    20   0  257m  11m 4820 R   20  0.2   0:14.37 apache2                                                                                                                                                                          
 6476 ubuntu    20   0  257m  11m 4816 R   20  0.2   0:02.92 apache2                                                                                                                                                                          
 6386 ubuntu    20   0  257m  11m 4824 S   20  0.2   0:17.94 apache2                                                                                                                                                                          
 6475 ubuntu    20   0  257m  11m 4820 S   19  0.2   0:03.41 apache2                                                                                                                                                                          
 6355 ubuntu    20   0  257m  11m 4820 S   19  0.2   0:24.39 apache2                                                                                                                                                                          
 6417 ubuntu    20   0  257m  11m 4820 R   18  0.2   0:16.66 apache2                                                                                                                                                                          
 6455 ubuntu    20   0  257m  11m 4820 R   18  0.2   0:16.27 apache2                                                                                                                                                                          
 6393 ubuntu    20   0  257m  11m 4820 S   18  0.2   0:16.60 apache2                                                                                                                                                                          
 6325 ubuntu    20   0  257m  11m 4820 R   18  0.2   0:25.66 apache2                                                                                                                                                                          
 6403 ubuntu    20   0  257m  11m 4820 S   18  0.2   0:15.61 apache2                                                                                                                                                                          
 6474 ubuntu    20   0  257m  11m 4812 S   18  0.2   0:04.37 apache2                                                                                                                                                                          
 6477 ubuntu    20   0  257m  11m 4800 S   18  0.2   0:01.43 apache2                                                                                                                                                                          
 6315 ubuntu    20   0  257m  11m 4820 S   17  0.2   0:25.27 apache2                                                                                                                                                                          
 6376 ubuntu    20   0  257m  11m 4820 R   17  0.2   0:17.53 apache2                                                                                                                                                                          
 6478 ubuntu    20   0  257m  11m 4800 S   15  0.2   0:00.45 apache2                                                                                                                                                                          
 6359 ubuntu    20   0  257m  11m 4820 R   15  0.2   0:23.60 apache2   



$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1            7.9G  1.4G  6.1G  19% /
none                  3.4G  112K  3.4G   1% /dev
none                  3.4G     0  3.4G   0% /dev/shm
none                  3.4G   72K  3.4G   1% /var/run
none                  3.4G     0  3.4G   0% /var/lock
/dev/xvdb             414G  199M  393G   1% /mnt
XXXX.compute.internal:/share_0
                       99G   28G   66G  30% /data_0
XXXX.compute.internal:/share_17
                       99G   30G   64G  33% /data_17
XXXX.compute.internal:/share_13
                       99G   30G   64G  33% /data_13
XXXX.compute.internal:/share_18
                       99G   31G   64G  33% /data_18
XXXX.compute.internal:/share_15
                       99G   28G   66G  30% /data_15
XXXX.compute.internal:/share_10
                       99G   28G   67G  30% /data_10
XXXX.compute.internal:/share_16
                       99G   30G   64G  32% /data_16
XXXX.internal:/share_3
                       99G   29G   66G  31% /data_3
XXXX.compute.internal:/share_11
                       99G   30G   64G  32% /data_11
XXXX.compute.internal:/share_7
                       99G   28G   66G  30% /data_7
XXXX.compute.internal:/share
                       99G   58G   37G  62% /share
XXXX.compute.internal:/share_2
                       99G   28G   66G  30% /data_2
XXXX.compute.internal:/share_8
                       99G   28G   67G  30% /data_8
XXXX.compute.internal:/share_19
                       99G   28G   66G  30% /data_19
XXXX.compute.internal:/share_14
                       99G   31G   64G  33% /data_14
XXXX.compute.internal:/share_5
                       99G   28G   66G  30% /data_5
XXXX.compute.internal:/share_6
                       99G   28G   67G  30% /data_6
XXXX.compute.internal:/share_1
                       99G   28G   66G  30% /data_1
XXXX.compute.internal:/share_12
                       99G   31G   64G  33% /data_12
XXXX.compute.internal:/share_4
                       99G   29G   66G  31% /data_4
XXXX.compute.internal:/share_9
                       99G   28G   66G  30% /data_9



$ free -g
             total       used       free     shared    buffers     cached
Mem:             6          0          6          0          0          0
-/+ buffers/cache:          0          6
Swap:            0          0          0



$ sar 1
Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)

05:33:02 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:33:03 AM     all     69.27      0.00      5.90      0.00     13.83     11.00
05:33:04 AM     all     70.88      0.00      7.62      0.00     16.50      5.01
05:33:05 AM     all     64.41      0.00      5.35      0.00     17.90     12.34
05:33:06 AM     all     66.41      0.00      9.16      0.00     13.09     11.34
05:33:07 AM     all     74.55      0.00      7.06      0.00     11.21      7.17
05:33:08 AM     all     62.31      0.00      7.49      0.00     13.38     16.81
05:33:09 AM     all     73.65      0.00      5.61      0.00     16.04      4.70
05:33:10 AM     all     76.79      0.00      8.20      0.00      9.70      5.31
05:33:11 AM     all     70.91      0.00      5.86      0.00     14.21      9.02
05:33:12 AM     all     73.95      0.00      6.37      0.00     12.51      7.17
05:33:13 AM     all     63.50      0.00      6.03      0.00     17.52     12.95
05:33:14 AM     all     61.92      0.00      4.42      0.00     17.66     16.00
05:33:15 AM     all     63.56      0.00      6.42      0.00     15.11     14.91
05:33:16 AM     all     72.63      0.00      7.51      0.00     14.90      4.97
05:33:17 AM     all     60.68      0.00      6.17      0.00     15.09     18.06



$ sar -w 1
Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)

09:34:23 AM    proc/s   cswch/s
09:34:24 AM      0.00   4795.00
09:34:25 AM      0.00   4174.00
09:34:26 AM      0.00   4194.23
09:34:27 AM      1.00   3645.00
09:34:28 AM      0.00   4564.00
09:34:29 AM      0.00   4473.00
09:34:30 AM      0.00   4225.00
09:34:31 AM      0.00   4064.36
09:34:32 AM      0.00   4740.00
09:34:33 AM      0.00   4589.22
09:34:34 AM      0.00   3887.00
09:34:35 AM      0.00   4579.00
09:34:36 AM      0.00   4408.00
09:34:37 AM      1.00   4390.00
09:34:38 AM      0.00   4628.00

Best Answer

Please add the output of sar -w 1. I suspect the number of context switches per second is killing your performance, because there are many more processes running than available processors. I think context switches on a virtual machine are expensive.
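To put rough numbers on that suspicion, here is a minimal sketch. The helper names load_per_cpu and avg_cswch are made up for illustration; the only tools assumed are awk, nproc, and /proc/loadavg on a Linux host. It divides a load average by the CPU count and averages the cswch/s column of sar -w style output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any standard tool.

# Ratio of a load average to the number of CPUs.
# Values well above 1 mean more runnable tasks than CPUs to run them.
load_per_cpu() {
    # $1 = load average, $2 = CPU count
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Average the last column (cswch/s) of `sar -w` style lines read from stdin.
# Header and blank lines are skipped because their last field is not numeric.
avg_cswch() {
    awk '$NF ~ /^[0-9.]+$/ { sum += $NF; n++ }
         END { if (n) printf "%.0f\n", sum / n }'
}

# Live check on the current host, if the needed files and tools are present.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    echo "load per CPU: $(load_per_cpu "$load1" "$(nproc)")"
fi
```

For the top output in the question, load_per_cpu 15.13 8 gives about 1.89 runnable tasks per virtual core; piping saved sar -w 1 output into avg_cswch gives a single figure to compare before and after any tuning.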

If that's the case, there are some kernel tunables that can help you lower the number of context switches:

  • Check the value of kernel.sched_min_granularity_ns with sysctl kernel.sched_min_granularity_ns. Double it with a command similar to sysctl -w kernel.sched_min_granularity_ns=2000000. Retest. Double it again. Retest. Repeat. Try to find a value that doesn't cripple interactivity too much but doesn't allow too many context switches, and write it to /etc/sysctl.conf so it is set at startup.

  • Set the Apache scheduling policy to SCHED_BATCH by starting it with chrt -b 0 apache2.
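The double-and-retest loop above could be scripted roughly as follows. This is only a sketch under assumptions: the function names are hypothetical, the commands need root, sar comes from the sysstat package, and on newer kernels this tunable may no longer be exposed through sysctl. The batch_running_apache helper is an addition for processes that are already running, using chrt -p; the answer itself only covers starting Apache under chrt.

```shell
#!/bin/sh
# Hypothetical helpers sketching the tuning steps; nothing runs until called.

# Print double the given nanosecond value.
next_granularity() {
    echo $(( $1 * 2 ))
}

# One tuning step: double the tunable (needs root), then sample
# context switches for 10 seconds while the load test is running.
tune_step() {
    current=$(sysctl -n kernel.sched_min_granularity_ns) || return 1
    new=$(next_granularity "$current")
    sysctl -w kernel.sched_min_granularity_ns="$new" || return 1
    sar -w 1 10
}

# Assumption: switch already-running apache2 processes to SCHED_BATCH
# instead of restarting Apache under `chrt -b 0 apache2`.
batch_running_apache() {
    for pid in $(pgrep apache2); do
        chrt -b -p 0 "$pid"
    done
}

# Once a good value is found, persist it (as root) with a line such as:
#   kernel.sched_min_granularity_ns = <chosen value>
# in /etc/sysctl.conf.
```

Call tune_step by hand between test runs and compare both cswch/s and response times; stop doubling once interactive latency starts to suffer.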