I have a CentOS 5 instance running on Amazon EC2. Normal CPU usage hovers around 10-20%. About 4 times in the past week, however, CPU usage has suddenly shot up to 100% and stayed at a constant 100% until I rebooted the instance.
I'm sure this is a bug or a misconfiguration with something on the server, but when the instance gets into this state, I can't log in via SSH to do any investigating. Unfortunately, Amazon doesn't provide a way for you to access the instance via a console.
So my question is: is there a way to configure the machine so that, in any 100% CPU situation, SSH gets priority and root can still log in and investigate?
Or, failing that, is there an easy way to automatically kill the offending process or processes when this sort of situation occurs?
By the way, this is a "C1.xlarge" instance on Amazon, which means it has 8 cores.
Also, if it helps, the machine is set up as a web server running Plesk. And don't tell me that Plesk can't be run within EC2, because I've been doing it just fine for months … until recently. The machine is already running Plesk's version of monit, so I'd rather not set up a second monit.
Best Answer
You could try modifying the sshd init script so that it starts sshd with a nice value of -5 or -10. Because login sessions inherit the daemon's nice value, that will change the priority for all SSH logins, which may be fine for you.
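A rough sketch of what that could look like, assuming sshd lives at /usr/sbin/sshd and that -10 is the value you settle on (both are assumptions, as are the helper names `sshd_start_cmd`, `nice_of` and `renice_running_sshd`, which are made up for illustration). The first function prints the kind of command line the init script's start section would need to run; the last one re-prioritises an sshd that is already running, so you can try the idea without editing the init script at all. Negative nice values need root.

```shell
#!/bin/sh
# Assumed values, not taken from the original post.
NICE_LEVEL="${NICE_LEVEL:--10}"          # anything in the -5..-10 range suggested above
SSHD_BIN="${SSHD_BIN:-/usr/sbin/sshd}"   # assumed location of the sshd binary

# Print the command line that would launch sshd at the chosen priority.
sshd_start_cmd() {
    printf 'nice -n %s %s\n' "$NICE_LEVEL" "$SSHD_BIN"
}

# Print the nice value of a running process, given its PID.
nice_of() {
    ps -o ni= -p "$1" | tr -d ' '
}

# Re-prioritise every sshd process that is already running.
# Must be run as root when NICE_LEVEL is negative.
renice_running_sshd() {
    for pid in $(pgrep -x sshd); do
        renice "$NICE_LEVEL" -p "$pid"
    done
}
```

After sourcing this as root, running `renice_running_sshd` and then `nice_of <pid>` on one of the sshd PIDs should show whether the new value took effect. Only logins made after the change inherit it from the listening daemon.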