We are running a Docker container for our Jenkins master, and it exits with exit code 137 every 3 days. We have to restart the EC2 instance and start the image again to make it operational; just restarting the container does not work.
Running docker inspect on the exited container gives us this:
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 137,
"Error": "",
"StartedAt": "2019-09-05T11:00:29.683406065Z",
"FinishedAt": "2019-09-05T13:24:26.336749715Z"
The EC2 is an m5a.large instance with 8GB of memory, and docker stats on the running container gives me this:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
88530703bf01 xenodochial_chandrasekhar 0.58% 1.092GiB / 7.546GiB 14.47% 3.12MB / 6.67MB 368MB / 11.3MB 0
Memory usage says 1.092GiB / 7.546GiB, which means the memory limit defaults to the host's 8GB (no container limit is set), and OOMKilled is false.
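As a quick sanity check, the MEM % column in docker stats is just usage divided by limit. A minimal shell sketch using the numbers from the output above (the values are taken from this post, not computed live):

```shell
# Recompute MEM % from the usage and limit shown by `docker stats`:
# 1.092 GiB used out of 7.546 GiB available.
usage=1.092
limit=7.546
awk -v u="$usage" -v l="$limit" 'BEGIN { printf "%.2f%%\n", u / l * 100 }'
# -> 14.47%  (matches the MEM % column)
```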
The docker logs command does not show much information.
Has anyone else faced this issue? How do we know the reason for the container exit?
Best Answer
Exit code 137 means the process was killed with signal 9 (SIGKILL): exit codes above 128 encode 128 + the signal number, and 137 - 128 = 9. This means that some other process (or maybe the kernel itself) sent a SIGKILL signal to the main process running in the container. That is why it exited.

If it was the kernel, there will be a message in the kernel log (dmesg, /var/log/kern.log). Look for a "Killed process" message.

This could also be caused by the docker kill command.

This can also happen if docker stop (or restart, etc.) is run but the process does not exit within 10 seconds. So if something is stopping or restarting the container, and your Jenkins process takes a long time to exit (e.g. if it waits for jobs to complete), that could cause this behavior.

Hope this helps.