High CPU usage without useful information from top command

high-loadtop

Over the past 2 days I started having problems with a server which has a few users on it. The server is an OpenVZ VPS. Normally when I experience high CPU usage, I always use top command to find out the reason. But for this server, I don't receive any useful info from the top command. Below is an example screenshot for the problem that I am experiencing
enter image description here

As can be seen in the screenshot, the %CPU column is almost always zero for all process, and practically most of the time I see all values are zeroes, yet the CPU usage reached up to 10 cores!

I'm totally lost and don't know what to do to find out the reason. So I'd like to ask if anyone have any idea about the possible causes that I am facing? Could it be due to the server issues?

Thanks for any suggestion!

Edit:

Please note that this screenshot is taken only when high load happens. It happens every few hours, and last for about 20 minutes. Normal usage is only about 0.0-0.2 cores. Below is an example of normal usage.

enter image description here

Further update

Just now it happened again, here is the screenshot of the suggested commands

enter image description here

I'm sorry that I'm a noob about these, but if I understand correctly, there is not anything wrong with the disk usage, io usage is very low.

Last update

I have tried using the suggested methods using vmstat, ps given in the answers and comments, but couldn't find useful information. When the spike happens, I even stop apache, mysql but it didn't help. I finally contacted the VPS provider and requested for changing into another node. He told me that he was aware of the issues with the node, which is being heavily abused by malicious customers recently, and he is working to fix the issues. So I guess I don't have to do anything from my end now. Nevertheless, I would like to thank all members who have given suggestions, making this Q&A useful for future reference!

Best Answer

About 30% of CPU time seems to be spent in WAIT, and your (1 minute, at least) load is very high.

So I would start by checking your storage, and storage usage patterns. A good starting point might be taking a look at iostat and/or monitoring for any slow ops (writes and reads). You might also check the time spent on each process in top, to see if anything stands out. As you have mysql running, I would also check how it is doing.

You also seem to be swapping out even though you have a reasonable amount of spare memory, so I would see what adjusting vm.swappiness (using sysctl) could do for you. RAM is faster than swap, so if you can use it, I would.

All else failing, look at what probing sysrq can do for you.

If a more visual overview might help you, then I would maybe look at firehol's netdata (https://netdata.firehol.org/), if you don't have a pre-existing solution.

Related Topic