VPS – System load average is extremely high

codeigniter, high-load, load-average, vps

I own a website that has been running on a VPS since last week. From Monday to Saturday everything runs smoothly: the website gets around 4,500 unique visitors a day, and the load average and response time are fine.

On Sundays the website gets around 11,000 unique visitors, because we offer unique and exclusive content on that day. The content is stored in a MySQL database, which runs on a separate VPS and uses the InnoDB engine. This is where things go wrong: as the number of visitors increases, the load average climbs to extreme levels, until the website becomes unreachable.

Here is the top output:

 This is an automated message notifying you that the 5 minute load average on your system is 238.37.
 This has exceeded the 10 threshold.

 One Minute      - 237.31
 Five Minutes    - 238.37
 Fifteen Minutes - 231.1

 top - 16:41:12 up 5 days, 18:51,  1 user,  load average: 238.68, 238.62, 231.25
 Tasks: 517 total, 246 running, 271 sleeping,   0 stopped,   0 zombie
 Cpu(s):  1.8%us,  0.3%sy,  0.0%ni, 97.6%id,  0.0%wa,  0.0%hi,  0.1%si,  0.2%st
 Mem:   3922920k total,  3542968k used,   379952k free,     2736k buffers
 Swap:  1048564k total,   105316k used,   943248k free,   142772k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    
 14395 apache    20   0  313m  13m 4044 R  2.8  0.4   0:09.81 /usr/sbin/httpd -k start -DSSL
 13405 apache    20   0  314m  15m 4432 R  2.3  0.4   0:17.87 /usr/sbin/httpd -k start -DSSL
 15865 apache    20   0  312m  13m 4176 R  2.3  0.4   0:01.28 /usr/sbin/httpd -k start -DSSL
 15930 apache    20   0  310m  11m 4060 R  2.3  0.3   0:00.88 /usr/sbin/httpd -k start -DSSL
 15978 apache    20   0  310m  11m 4048 R  2.3  0.3   0:01.08 /usr/sbin/httpd -k start -DSSL
 16041 apache    20   0  309m  10m 4052 R  2.1  0.3   0:00.58 /usr/sbin/httpd -k start -DSSL
 16082 apache    20   0  211m 4192 2276 R  1.9  0.1   0:00.09 /usr/sbin/httpd -k start -DSSL
 14298 apache    20   0  310m  11m 4044 R  0.6  0.3   0:09.56 /usr/sbin/httpd -k start -DSSL
 14457 apache    20   0  311m  11m 4068 R  0.6  0.3   0:10.18 /usr/sbin/httpd -k start -DSSL
 14486 apache    20   0  310m  11m 4464 R  0.6  0.3   0:06.13 /usr/sbin/httpd -k start -DSSL
 15287 apache    20   0  313m  14m 4048 R  0.6  0.4   0:05.21 /usr/sbin/httpd -k start -DSSL
 15363 apache    20   0  310m  11m 4064 R  0.6  0.3   0:04.13 /usr/sbin/httpd -k start -DSSL
 15400 apache    20   0  313m  13m 4048 R  0.6  0.4   0:04.09 /usr/sbin/httpd -k start -DSSL
 15404 apache    20   0  310m  11m 4056 R  0.6  0.3   0:04.22 /usr/sbin/httpd -k start -DSSL
 15649 apache    20   0  313m  14m 4432 R  0.6  0.4   0:02.88 /usr/sbin/httpd -k start -DSSL
 15675 apache    20   0  310m  10m 4044 S  0.6  0.3   0:02.22 /usr/sbin/httpd -k start -DSSL
 15692 apache    20   0  310m  11m 4084 R  0.6  0.3   0:01.46 /usr/sbin/httpd -k start -DSSL
 15702 apache    20   0  311m  12m 4044 R  0.6  0.3   0:01.85 /usr/sbin/httpd -k start -DSSL
 15719 apache    20   0  310m  10m 4048 R  0.6  0.3   0:02.32 /usr/sbin/httpd -k start -DSSL
 15781 apache    20   0  318m  18m 4044 R  0.6  0.5   0:01.91 /usr/sbin/httpd -k start -DSSL
 15788 apache    20   0  312m  13m 4048 R  0.6  0.4   0:02.13 /usr/sbin/httpd -k start -DSSL
 15823 apache    20   0  310m  11m 4060 R  0.6  0.3   0:02.04 /usr/sbin/httpd -k start -DSSL
 15837 apache    20   0  311m  12m 4052 R  0.6  0.3   0:01.64 /usr/sbin/httpd -k start -DSSL

On Sundays, the website has to perform a fairly large query with a couple of LEFT JOINs across different tables.

The website runs on a VPS with 2 x 2.4 GHz processors and 4 GB of RAM. The database runs on an SSD VPS with 2 x 2.4 GHz processors and 2 GB of RAM.

On that particular Sunday, I also got this message in the server's error log:

 [Sun Nov 24 15:03:34 2013] [error] server reached MaxClients setting, consider raising the MaxClients setting

The website is built with the PHP CodeIgniter framework and worked fine for the first 8 weeks on shared hosting (with the same code). After those weeks the problem started, which is why I decided to move to a VPS. But the problem persists.

I have absolutely no clue where things are going wrong, so any help would be highly appreciated.

Best Answer

The answer to your question is to leverage memory caching as much as possible, i.e. memcache, Varnish, etc., and then use nginx, which you can scale horizontally, and behind it a php-fpm pool appropriately sized to your load, fully meshed with the upstream nginx boxes.
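As a rough illustration of what "appropriately sized" can mean, here is a minimal Python sketch that estimates how many worker processes fit in RAM before the box starts swapping. The function name and the numbers are assumptions for illustration only: 4 GB total comes from the question, about 13 MB per worker is read off the RES column of the top output, and the 1 GB reserved for the OS and other services is a guess. RES includes shared pages and real PHP workers often grow larger under load, so treat the result as an upper bound to test against, not a setting to copy.

```python
def estimate_max_workers(total_ram_mb, reserved_mb, avg_worker_mb):
    """Return how many worker processes fit in the RAM left after the reserve."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Always allow at least one worker, even if the reserve eats all the RAM.
    return max(1, usable_mb // avg_worker_mb)


# Illustrative numbers only (see the assumptions above).
workers = estimate_max_workers(total_ram_mb=4096, reserved_mb=1024, avg_worker_mb=13)
print(workers)
```

The same arithmetic applies whether the limit is called MaxClients (Apache prefork) or pm.max_children (php-fpm): the cap should follow from memory per worker, not from the number of visitors you hope to serve.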

Once you get to a certain level of traffic, it's not so much about throwing hardware at the problem as about leveraging caching and having individual tiers that can be upgraded/updated individually.

You can't have a super high availability site on a single VPS unless it's static HTML, and even then Varnish is ideal.

Get a pair of HAProxy frontend load balancers, distributing to Varnish, pulling from nginx / PHP / memcache / Redis / MySQL (Postgres). That's it in a nutshell :P
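To make the caching point concrete, here is a minimal cache-aside sketch in Python. It uses a plain in-process dictionary with a time-to-live as a stand-in for memcache or Redis; the class name, the cache key and `run_expensive_query` are all hypothetical, and the 60-second TTL is an arbitrary choice. The idea is only that the large Sunday query runs once per TTL window instead of once per visitor.

```python
import time


class TTLCache:
    """Tiny in-process cache-aside helper (stand-in for memcache/Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive call
        value = compute()  # cache miss or expired: hit the database once
        self._store[key] = (now + self._ttl, value)
        return value


calls = 0


def run_expensive_query():
    """Hypothetical placeholder for the large query with LEFT JOINs."""
    global calls
    calls += 1
    return ["row-1", "row-2"]


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):  # simulate 1000 page views
    rows = cache.get_or_compute("sunday_content", run_expensive_query)

print(calls)  # number of times the database was actually queried
```

With a shared cache such as memcache in place of the dictionary, every web worker and every web server benefits from the same cached result, which is what lets the web tier scale without multiplying database load.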
