Linux – How to diagnose a web server locking up under high load

apache-2.2, linux, security, web-server

I'm having trouble understanding what could cause the occasional hangs the server suffers during sudden high-load spikes. I am not a system administrator (I'm a PHP programmer), but since the official sysadmin puts in very little effort, I've been asked to find a solution myself.

The server runs Debian Lenny and uses Apache to serve a WordPress + vBulletin site with 40-60k visits/day. Having done all the application-side optimization I could, we got to the point where the site runs smoothly for weeks at a time, then trips on something that makes the server load jump to 80+. Restarting Apache helps, but the load usually calms down by itself if given enough time. It can "crash" twice in a day or run without problems for weeks. It seems totally random.

One specific, strange thing did happen, though. I was warned about some odd behavior, and on inspection I found that the .htaccess file had been changed to redirect traffic coming from search engines to an external site. I checked the code and every plugin (all up to date), and finally tried the "hard way": chowning .htaccess to root.root. The weird part is that when another issue came up, I found the file owned again by the user assigned to the website's virtualhost. As I understand it, there is no way for this to happen through a web exploit alone. Am I mistaken?
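One way to find out exactly when the file changes is to poll it and log its inode, owner, permission bits and content hash. Below is a minimal sketch, assuming Python 3 is available on the server; the polling interval is an arbitrary choice. Recording the inode is deliberate: a process running as the virtualhost user cannot chown a root-owned file, but if it can write to the directory containing .htaccess (and that directory has no sticky bit), it can delete the file and create a new one, which is then owned by that user. A new inode number tells you the file was replaced rather than chowned in place.

```python
import hashlib
import os
import pwd
import sys
import time


def snapshot(path):
    """Record inode, owner, permission bits and SHA-256 of the file at `path`."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry
        owner = str(st.st_uid)
    return {
        "inode": st.st_ino,
        "owner": owner,
        "mode": oct(st.st_mode & 0o7777),
        "sha256": digest,
    }


def describe_change(old, new):
    """Return the names of the attributes that differ between two snapshots."""
    changes = [key for key in old if old[key] != new[key]]
    if "inode" in changes:
        # A new inode means the file was deleted and recreated (or renamed over),
        # not chowned or edited in place.
        changes.append("replaced: new inode")
    return changes


def watch(path, interval=10):
    """Poll `path` every `interval` seconds and print a line whenever it changes."""
    last = snapshot(path)
    while True:
        time.sleep(interval)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            current = snapshot(path)
        except FileNotFoundError:
            print(stamp, "missing", flush=True)
            continue
        if current != last:
            print(stamp, describe_change(last, current), current, flush=True)
            last = current


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])
```

Running it under screen or nohup with the full path to the .htaccess file as its only argument (the script name you save it under is up to you) gives you timestamps you can match against the Apache access and error logs for the same moment.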

How can I find the cause of these high load spikes?
What could explain a file owned by root.root changing ownership, other than someone with root access doing it?
Could these two things be linked to some kind of attack?
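To catch a spike in the act, a minimal sketch (assuming Python 3 and the Linux /proc filesystem) can poll the 1-minute load average and, when it exceeds a threshold you choose, log which commands hold the most memory and how many of their processes are in uninterruptible sleep (state D). On Linux the load average counts D-state tasks as well as runnable ones, so a load of 80+ with many Apache children in state D points at disk or swap I/O rather than CPU. The threshold and the 30-second interval are arbitrary.

```python
import os
import sys
import time
from collections import defaultdict


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_stat(text):
    """Return (comm, state, rss_pages) from the contents of /proc/<pid>/stat."""
    # comm may contain spaces or parentheses, so split on the outermost parens.
    start, end = text.index("("), text.rindex(")")
    comm = text[start + 1:end]
    rest = text[end + 1:].split()
    # rest[0] is field 3 (state); rss is field 24, i.e. rest[21].
    return comm, rest[0], int(rest[21])


def summarize(stats, page_size):
    """Group (comm, state, rss_pages) tuples by command name.

    Returns {comm: {"count": n, "rss_mb": total, "D": blocked}}, where "D"
    counts processes in uninterruptible sleep.
    """
    summary = defaultdict(lambda: {"count": 0, "rss_mb": 0.0, "D": 0})
    for comm, state, rss_pages in stats:
        entry = summary[comm]
        entry["count"] += 1
        entry["rss_mb"] += rss_pages * page_size / (1024 * 1024)
        if state == "D":
            entry["D"] += 1
    return dict(summary)


def read_all_stats():
    """Read /proc/<pid>/stat for every running process."""
    stats = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as fh:
                stats.append(parse_stat(fh.read()))
        except OSError:
            continue  # process exited while we were scanning
    return stats


def monitor(threshold, interval=30):
    """Print a per-command summary whenever the 1-minute load exceeds `threshold`."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    while True:
        with open("/proc/loadavg") as fh:
            one, five, fifteen = parse_loadavg(fh.read())
        if one > threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} load={one} {five} {fifteen}", flush=True)
            summary = summarize(read_all_stats(), page_size)
            top = sorted(summary.items(), key=lambda kv: kv[1]["rss_mb"], reverse=True)
            for comm, info in top[:10]:
                print(f"  {comm}: {info['count']} procs, "
                      f"{info['rss_mb']:.0f} MB RSS, {info['D']} in D state", flush=True)
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    monitor(float(sys.argv[1]))
```

Started with a threshold such as 20 and its output redirected to a log file, it leaves a record of what the box looked like during the next spike, even if nobody is watching at the time.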

Best Answer

Regarding the Apache issue, one possible cause is that your MaxClients setting (together with ServerLimit, which caps it under the prefork MPM) is too high. When traffic spikes, Apache spawns so many child processes that they use up all your RAM and the system starts swapping, which very quickly kills performance. Next time you have the issue, check the output of top or free and see whether any swap is in use. If it is, try reducing MaxClients.
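The sizing idea can be expressed as a rough rule of thumb (not an official Apache formula): MaxClients should be no larger than the RAM you can spare for Apache divided by the typical resident size of one Apache child, so that a traffic spike cannot push the machine into swap. The minimal sketch below, assuming Python 3 on Linux, also reads the swap figures that free reports from /proc/meminfo. The inputs (RAM left over after MySQL and the OS, average child size from the RES column in top) have to be measured on your own server; because RES includes shared pages, the estimate errs on the conservative side.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo into {field: value}; values are in kB where a unit is shown."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def swap_used_mb(info):
    """Swap currently in use, in MB."""
    return (info["SwapTotal"] - info["SwapFree"]) / 1024


def estimate_max_clients(ram_for_apache_mb, avg_child_mb):
    """Rough MaxClients ceiling: how many children of the given size fit in RAM."""
    return max(1, int(ram_for_apache_mb // avg_child_mb))


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print(f"swap in use: {swap_used_mb(info):.0f} MB")
```

For example, with about 1500 MB left for Apache and children averaging 40 MB each, estimate_max_clients(1500, 40) gives 37.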

I also had an issue with Apache 1.3 where some connections would never close; after a few days, 90% of the connections would be doing nothing, leaving no children free to handle new incoming connections. I solved it by simply restarting Apache every day. From the sound of it, though, you don't have enough traffic or enough time between incidents for this to be a likely cause.
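To check whether workers are piling up in one state, one option (assuming mod_status is enabled and reachable from the server, which the original does not mention) is to count the scoreboard characters on the machine-readable /server-status?auto page. A minimal Python 3 sketch follows; a scoreboard dominated by a single non-waiting state such as K (keepalive) or C (closing) with almost no _ slots would match the stuck-connection symptom described above.

```python
import sys
import urllib.request
from collections import Counter

# Scoreboard characters as listed in the mod_status "Scoreboard Key".
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def parse_scoreboard(auto_output):
    """Count worker slots by state from /server-status?auto output."""
    for line in auto_output.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    raise ValueError("no Scoreboard line in mod_status output")


def describe(counts):
    """Map scoreboard characters to readable state names."""
    return {SCOREBOARD_KEYS.get(ch, ch): n for ch, n in counts.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the status URL, e.g. http://localhost/server-status?auto
    with urllib.request.urlopen(sys.argv[1]) as resp:
        text = resp.read().decode("ascii", "replace")
    counts = parse_scoreboard(text)
    for state, n in sorted(describe(counts).items(), key=lambda kv: -kv[1]):
        print(f"{n:5d}  {state}")
```

Sampling this every few minutes (for instance alongside a load monitor) shows whether the slots fill up gradually over days or all at once during a spike.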