Jenkins Docker container exits with exit code 137 while OOMKilled is false


We run our Jenkins master in a Docker container, and the container exits with exit code 137 roughly every 3 days. To make it operational again we have to restart the EC2 instance and then start the image; just restarting the container does not work.

docker inspect on the exited container gives us this:
"State": {
    "Status": "exited",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 0,
    "ExitCode": 137,
    "Error": "",
    "StartedAt": "2019-09-05T11:00:29.683406065Z",
    "FinishedAt": "2019-09-05T13:24:26.336749715Z"
}

The EC2 instance is an m5a.large with 8 GB of memory, and docker stats on the running container gives me this:
CONTAINER ID   NAME                        CPU %   MEM USAGE / LIMIT     MEM %    NET I/O           BLOCK I/O        PIDS
88530703bf01   xenodochial_chandrasekhar   0.58%   1.092GiB / 7.546GiB   14.47%   3.12MB / 6.67MB   368MB / 11.3MB   0

Memory usage is 1.092GiB / 7.546GiB, so the container's memory limit is simply the host's 8 GB of memory (no separate container limit is set), and OOMKilled is false.

The docker logs command does not show much useful information.

Has anyone else faced this issue? How do we know the reason for the container exit?

Best Answer

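Exit code 137 means that the process was killed with signal 9 (SIGKILL). By shell and Docker convention, a process killed by signal N is reported with exit code 128 + N, and 137 = 128 + 9. Decoding 137 with Python's wait-status helpers yields the same signal number: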

Python 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from os import *
>>> x = 137
>>> WIFEXITED(x)
False
>>> WIFSIGNALED(x)
True
>>> WTERMSIG(x)
9
>>> 

This means that some other process (or maybe the kernel itself) sent a SIGKILL signal to the main process running in the container. That is why it exited.
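The 128 + N convention can be turned into a small helper for reading container exit codes. This is only a sketch: the function name describe_exit_code is made up for illustration, and it assumes that any code above 128 came from a signal, which is the usual convention but not a guarantee (a program may also choose such a code itself).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code using the 128 + N signal convention.

    Assumption: a code above 128 means "killed by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))
```

Applied to the ExitCode field from docker inspect, this reports SIGKILL for 137 and SIGTERM for 143.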

If it was the kernel, there will be a message in the kernel log (dmesg, /var/log/kern.log). Look for a "Killed process" message.
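Searching the kernel log for that message can be scripted. The sketch below assumes the message contains text of the form "Killed process <pid> (<name>)"; the exact wording differs between kernel versions, so treat the pattern as a starting point. The helper names are hypothetical, and reading dmesg may require root on some systems.

```python
import re
import subprocess

# Assumed message shape: "Killed process <pid> (<name>)".
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_killed_processes(lines):
    """Return (pid, process name) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def scan_dmesg():
    """Run dmesg and report the processes the kernel says it killed."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return find_killed_processes(result.stdout.splitlines())
```

The same find_killed_processes function can be fed the lines of /var/log/kern.log instead of dmesg output. An empty result suggests that the SIGKILL came from somewhere other than the kernel's OOM killer.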

This could also be caused by the docker kill command.

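This can also happen if docker stop (or restart, etc.) is run, but the process does not exit within the grace period, which is 10 seconds by default: Docker first sends SIGTERM and then sends SIGKILL once the timeout expires. So if something is stopping or restarting the container, but your Jenkins process takes a long time to exit (e.g. if it waits for jobs to complete), that could cause this behavior. The grace period can be lengthened with the -t option of docker stop.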
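The stop-then-kill sequence can be imitated with an ordinary child process to see where the 137 comes from. This is a rough sketch of the behavior described above, not Docker's actual implementation; stop_with_grace and as_shell_exit_code are names invented for this example, and it assumes a POSIX system.

```python
import signal
import subprocess


def stop_with_grace(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask a process to stop, then force-kill it if it is still running."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL, which cannot be caught or ignored
        proc.wait()
    return proc.returncode


def as_shell_exit_code(returncode: int) -> int:
    """Map Python's negative 'killed by signal N' code to 128 + N."""
    return 128 - returncode if returncode < 0 else returncode
```

A child that ignores SIGTERM, or is simply slow to shut down, ends with a return code of -9 in Python, which as_shell_exit_code maps to 137. A child that exits promptly on SIGTERM never reaches the kill step.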

Hope this helps.
