Websites down, EC2 instance inaccessible via SSH, CPU utilisation at 100% for the last few hours: what should I do?

amazon ec2

I have multiple websites hosted on a single EC2 instance.

  • One website, "abc", was down for a few hours. Sometimes it threw a database connection error and sometimes it just took too long to respond.
  • One website, "def", was incredibly slow but still up and running.
  • The rest of the websites had the same symptoms as "abc".

I can afford at most 15 minutes of downtime for "def".

Should I (in the AWS console):

  • reboot my instance,
  • create an AMI from my instance, launch it and associate my Elastic IP with the new instance, or
  • use "Launch More Like This"?
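The AMI option can also be scripted instead of clicked through in the console. The sketch below is a minimal example, assuming Python with a boto3 EC2 client passed in as `ec2`, a VPC setup (so the Elastic IP is identified by an allocation ID) and default subnet and security group settings for the new instance; the function name and parameters are hypothetical. The old instance keeps serving while the image is created, so the interruption for "def" should mostly be limited to moving the Elastic IP, although imaging an overloaded instance may itself take a long time.

```python
# Hypothetical helper: image the running instance, launch a copy, move the Elastic IP.
# Assumptions: `ec2` is a boto3 EC2 client, the account uses a VPC (Elastic IP has an
# AllocationId), and default subnet/security group settings are fine for the new instance.

def relaunch_from_ami(ec2, instance_id, allocation_id, instance_type, image_name):
    """Create an AMI from `instance_id`, launch one instance from it, and associate
    the Elastic IP `allocation_id` with the new instance. Returns the new instance ID."""
    # NoReboot=True keeps the live sites up while imaging, at the cost of a
    # crash-consistent (not application-consistent) filesystem image.
    image_id = ec2.create_image(
        InstanceId=instance_id, Name=image_name, NoReboot=True
    )["ImageId"]
    # AMI creation snapshots every attached EBS volume, which can take a while.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    new_id = ec2.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=1, MaxCount=1
    )["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

    # Traffic to the Elastic IP switches over to the new instance at this step.
    ec2.associate_address(
        AllocationId=allocation_id, InstanceId=new_id, AllowReassociation=True
    )
    return new_id
```

With boto3 installed, this would be called as `relaunch_from_ami(boto3.client("ec2"), ...)` with your own instance ID, allocation ID, instance type and image name.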

Background on what may have happened to my EC2 instance:

  • The last time I made changes was 21 hours ago.
  • A cron job to create snapshots ran around 19 hours ago, and it has been running for a long time.
  • Google Analytics shows that traffic to my websites, such as kidlander.sg, has been nothing exceptional.
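To see whether a snapshot started by that cron job is actually still in progress on the AWS side, the account's pending snapshots can be listed. This is a minimal sketch, assuming Python with a boto3 EC2 client passed in as `ec2`; the function name is hypothetical, and only the first page of results is read.

```python
# Hypothetical helper: list this account's EBS snapshots that are still in progress.
# Assumptions: `ec2` is a boto3 EC2 client; pagination is ignored (first page only).

def pending_snapshots(ec2):
    """Return (snapshot_id, volume_id, progress) for snapshots in the 'pending' state."""
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "status", "Values": ["pending"]}],
    )
    return [
        (s["SnapshotId"], s["VolumeId"], s.get("Progress", ""))
        for s in resp["Snapshots"]
    ]
```

An empty list means no snapshot is still being taken, which would point to the cron process on the instance itself rather than the snapshot.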

Are there any other actions I should take, or better options I have?
(I have already contacted AWS support, but their turnaround is 12 hours, so I appreciate all the help I can get.)

Update
I got everything back up and running, and CPU utilisation is back to normal at around 30%.

There is one difference between "def" on the one hand and "abc" and my other websites on the other:

  • "def"'s database is hosted on RDS.
  • "abc"'s database is hosted on a separate EC2 instance (not my web server instance) that I configured myself.

Nevertheless, I checked the EC2 instance I use as a MySQL server yesterday, and it was absolutely fine during the incident:

  • low CPU utilisation
  • I could log in from the Linux command line

Best Answer

Reboot it. If that doesn't work, stop it and start it again. A stop/start can move the instance onto different underlying hardware, which helps if the old host has been marked as unavailable or problematic.
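The same escalation (reboot first, then stop/start) can be done from a script. This is a minimal sketch, assuming Python with a boto3 EC2 client passed in as `ec2` and an EBS-backed instance (instance-store-backed instances cannot be stopped); the function names are hypothetical.

```python
# Hypothetical helpers for the reboot -> stop/start escalation.
# Assumptions: `ec2` is a boto3 EC2 client and the instance is EBS-backed.

def reboot(ec2, instance_id):
    """Request a reboot; the instance stays on the same underlying host."""
    ec2.reboot_instances(InstanceIds=[instance_id])

def stop_then_start(ec2, instance_id, force=False):
    """Stop the instance, wait until it is fully stopped, then start it again.
    force=True skips flushing file system caches, so use it only as a last resort."""
    ec2.stop_instances(InstanceIds=[instance_id], Force=force)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Note that a stop/start changes the instance's auto-assigned public IP. An Elastic IP stays associated in a VPC, but on the older EC2-Classic platform it had to be re-associated after the start.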

If you are using EBS volumes, you should not worry about losing data: a reboot or a stop/start leaves them intact. (Besides, only terminating an instance actually destroys it.)
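Before stopping, it may be worth confirming that the instance really is EBS-backed and which volumes would be deleted on termination. This is a minimal sketch, assuming Python with a boto3 EC2 client passed in as `ec2`; the function name is hypothetical. Instance store (ephemeral) volumes are generally not listed in this output, and any data on them is lost on stop.

```python
# Hypothetical helper: summarise how an instance's storage behaves on stop vs terminate.
# Assumption: `ec2` is a boto3 EC2 client; instance store volumes are not covered.

def storage_summary(ec2, instance_id):
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    inst = resp["Reservations"][0]["Instances"][0]
    return {
        # "ebs": can be stopped/started; "instance-store": only reboot or terminate.
        "root_device_type": inst["RootDeviceType"],
        # Volumes with DeleteOnTermination=True are deleted when the instance is terminated.
        "delete_on_termination": {
            m["DeviceName"]: m["Ebs"]["DeleteOnTermination"]
            for m in inst.get("BlockDeviceMappings", [])
            if "Ebs" in m
        },
    }
```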