Tomcat – Java eats up 100% CPU

cpu-usage, java, tomcat

I have a CentOS server running a Java web app (Tomcat 6 + Hibernate + MySQL + Struts 2).

CPU usage is usually about 10%, but sometimes it suddenly jumps to 100% and the application freezes. The process responsible is the java command, and the server has to be rebooted to get things back to normal. This happens completely irregularly, so it seems unlikely to be an app bug.

This is the output of top under normal conditions:

top - 12:50:35 up 21 min,  1 user,  load average: 0.13, 0.18, 0.21
Mem:   8300688k total,   836232k used,  7464456k free,    22168k buffers
Swap: 16779884k total,        0k used, 16779884k free,   309080k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP   TIME CODE DATA nFLT COMMAND
 3292 tomcat    18   0 1382m 415m  10m S 11.0  5.1   2:55.45 967m   2:55   36 1.3g  537 java
 3165 mysql     15   0  137m  25m 4908 S  5.3  0.3   0:26.64 111m   0:26 6496 124m   82 mysqld
 3456 root      34  19 25660   9m 2076 S  0.0  0.1   0:00.01  15m   0:00    4 8060    2 yum-updatesd
 3345 root      18   0 23040 9420 5520 S  0.0  0.1   0:00.08  13m   0:00  300 3860   20 httpd
 3421 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3422 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3423 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3424 apache    18   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3425 apache    23   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3426 apache    24   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3427 apache    23   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 3428 apache    23   0 23040 4808  880 S  0.0  0.1   0:00.00  17m   0:00  300 3860    0 httpd
 2951 haldaemo  19   0  5744 3944 1692 S  0.0  0.0   0:00.52 1800   0:00  268 2236    0 hald
 2669 named     19   0  109m 3684 1928 S  0.0  0.0   0:00.08 105m   0:00  364 102m    3 named

And this is the output when the problem occurs:

top - 12:25:10 up 59 min,  3 users,  load average: 1.09, 0.97, 0.64
Tasks: 192 total,   1 running, 189 sleeping,   2 stopped,   0 zombie
Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8300688k total,  2303376k used,  5997312k free,    85104k buffers
Swap: 16779884k total,        0k used, 16779884k free,   882748k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP   TIME CODE DATA nFLT COMMAND
 6609 root      18   0 1356m 1.2g  10m S 101.9 14.8   4:50.37 154m   4:50   36 1.3g    1 java
    1 root      15   0  2068  628  536 S  0.0  0.0   0:01.25 1440   0:01   32  280   20 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/2
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/2
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/2
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 migration/3
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 ksoftirqd/3
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00    0   0:00    0    0    0 watchdog/3

Interestingly, the java process is owned by the tomcat user when everything is fine, but it is owned by root when the problem occurs.

What could cause this?

Best Answer

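Most likely a thread is stuck, for example spinning in a loop.

kill -3 processid

This sends SIGQUIT to the JVM, which makes it print a thread dump (the stack trace of every thread in the Java app) to its standard output. For Tomcat that usually ends up in catalina.out. The process keeps running afterwards. Take a few dumps while the CPU is at 100%, collect them, and send them to the developers.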
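To narrow a dump down to the thread that is actually burning CPU, one common approach is to match the busy thread ID from top against the nid field in the dump. The following is a minimal sketch, assuming a HotSpot-style dump where each thread header contains nid=0x... and entries are separated by blank lines. The function names tid_to_nid and find_thread_in_dump and the file name dump.txt are hypothetical, made up for this example.

```shell
#!/bin/sh
# Rough workflow (run by hand, not by this script):
#   1. top -H -p <pid>   shows per-thread CPU; note the ID of the busiest thread
#   2. kill -3 <pid>     makes the JVM write a thread dump to its stdout
#   3. save the dump to a file and look up the busy thread with the helpers below

# Hypothetical helper: convert a decimal thread ID (as shown by top -H)
# into the hexadecimal nid=0x... form used in HotSpot thread dumps.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# Hypothetical helper: print the dump entry for one thread.
# usage: find_thread_in_dump <decimal-thread-id> <dump-file>
find_thread_in_dump() {
    nid=$(tid_to_nid "$1")
    # RS="" puts awk in paragraph mode, so each blank-line-separated
    # thread entry is one record. The trailing space after the nid
    # avoids matching a longer ID that merely starts with the same digits.
    awk -v nid="$nid " 'BEGIN { RS = "" } index($0, nid)' "$2"
}
```

For example, if top -H reports thread 6609 as the busy one, find_thread_in_dump 6609 dump.txt prints the stack of the thread whose header contains nid=0x19d1. If the same stack shows up in several dumps taken a few seconds apart, that is a strong hint about where the code is stuck.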