SSH – 100% CPU keeps me from logging into the machine

amazon-ec2 central-processing-unit ssh

I have a CentOS 5 instance running on Amazon EC2. The normal CPU usage hovers around 10-20%. About 4 times in the past week, however, CPU usage has suddenly shot up to 100% and stayed at a constant 100% until I rebooted the instance.

I'm sure this is a bug or a misconfiguration with something on the server, but when the instance gets into this state, I can't log in via SSH to do any investigating. Unfortunately, Amazon doesn't provide a way for you to access the instance via a console.

So my question is: is there a way to configure the machine so that, in any 100% CPU situation, SSH gets priority and root can still log in and investigate?

Or at least, is there an easy way to automatically kill the offending process or processes when this sort of situation occurs?

By the way, this is a "c1.xlarge" instance on Amazon, which means it has 8 cores.

Also, if it helps, the machine is set up as a web server running Plesk. And don't tell me that Plesk can't be run within EC2, because I've been doing it just fine for months … until recently. The machine is already running Plesk's version of monit, so I'd rather not set up a second monit.

Best Answer

You could try modifying the sshd init script to start the daemon with a nice value of -5 or -10. Every SSH session inherits that value from sshd, so all SSH logins will run at the higher priority, which may be fine for you.
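
If you'd rather not edit the init script, a similar effect can probably be had by renicing the sshd daemon once it is already running, since sessions forked afterwards inherit its nice value. This is a different technique from the init-script change, and the code below is only a rough sketch. It assumes Python 3.3 or newer is installed (for `os.setpriority`), that sshd writes its PID to `/var/run/sshd.pid`, and that the script runs as root. The function name `renice_from_pidfile` is made up for illustration.

```python
import os


def renice_from_pidfile(pidfile="/var/run/sshd.pid", nice_value=-10,
                        setter=os.setpriority):
    """Set the nice value of the process whose PID is stored in pidfile.

    The pidfile path and default nice value are assumptions; adjust them
    for your system. `setter` is injectable so the logic can be exercised
    without root privileges.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    # Negative nice values raise priority and normally require root.
    setter(os.PRIO_PROCESS, pid, nice_value)
    return pid


# Example (as root): renice_from_pidfile(nice_value=-5)
```

Only sessions started after the call pick up the new value. Existing sessions keep whatever nice value they already had.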