Docker – CoreOS: `fleetctl stop ` always exits with status 137

coreosdockersystemd

I've been trying to familiarise with CoreOS (v. 633.1.0), and I've been playing around with fleet (I'm using the recommended Vagrant 3-cluster setup on my local machine). I have created the following very basic service (apache@.service):

[Unit]
Description="Dummy Apache service"
After="docker.service"
Requires="docker.service"

[Service]
TimeoutStartSec=0
TimeoutStopSec=30
ExecStartPre=-/usr/bin/docker kill apache1
ExecStartPre=-/usr/bin/docker rm apache1
ExecStartPre=/usr/bin/docker pull coreos/apache
ExecStart=/usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND
ExecStop=/usr/bin/docker stop apache1

[X-Fleet]
Conflicts=apache@*.service

When I start it, it just works flawlessly, however, whenever I try to stop it, it is marked as failed. This is the output of e.g. fleetctl status apache@2 when the stop command starts executing:

core@core-01 ~ $ fleetctl stop apache@2
Unit apache@2.service loaded on a91e28b0.../172.17.8.101
core@core-01 ~ $ fleetctl status apache@2
● apache@2.service - "Dummy Apache container"
   Loaded: loaded (/run/fleet/units/apache@2.service; linked-runtime; vendor preset: disabled)
   Active: deactivating (stop) since Wed 2015-04-15 18:45:46 UTC; 2s ago
  Process: 1038 ExecStartPre=/usr/bin/docker pull coreos/apache (code=exited, status=0/SUCCESS)
  Process: 1030 ExecStartPre=/usr/bin/docker rm apache1 (code=exited, status=1/FAILURE)
  Process: 1024 ExecStartPre=/usr/bin/docker kill apache1 (code=exited, status=1/FAILURE)
 Main PID: 1375 (docker);         : 1522 (docker)
   CGroup: /system.slice/system-apache.slice/apache@2.service
           ├─1375 /usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND
           └─control
             └─1522 /usr/bin/docker stop apache1

Apr 15 18:43:18 core-01 docker[1038]: 9cd978db300e: Pulling fs layer
Apr 15 18:44:26 core-01 docker[1038]: 9cd978db300e: Download complete
Apr 15 18:44:26 core-01 docker[1038]: 87026dcb0044: Pulling metadata
Apr 15 18:44:27 core-01 docker[1038]: 87026dcb0044: Pulling fs layer
Apr 15 18:44:53 core-01 docker[1038]: 87026dcb0044: Download complete
Apr 15 18:44:53 core-01 docker[1038]: 87026dcb0044: Download complete
Apr 15 18:44:53 core-01 docker[1038]: Status: Downloaded newer image for coreos/apache:latest
Apr 15 18:44:53 core-01 systemd[1]: Started "Dummy Apache container".
Apr 15 18:44:53 core-01 docker[1375]: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.0.2 for ServerName
Apr 15 18:45:46 core-01 systemd[1]: Stopping "Dummy Apache container"...

However, after a few seconds:

core@core-01 ~ $ fleetctl status apache@2
● apache@2.service - "Dummy Apache container"
   Loaded: loaded (/run/fleet/units/apache@2.service; linked-runtime; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2015-04-15 18:45:56 UTC; 1s ago
  Process: 1522 ExecStop=/usr/bin/docker stop apache1 (code=exited, status=0/SUCCESS)
  Process: 1375 ExecStart=/usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND (code=exited, status=137)
  Process: 1038 ExecStartPre=/usr/bin/docker pull coreos/apache (code=exited, status=0/SUCCESS)
  Process: 1030 ExecStartPre=/usr/bin/docker rm apache1 (code=exited, status=1/FAILURE)
  Process: 1024 ExecStartPre=/usr/bin/docker kill apache1 (code=exited, status=1/FAILURE)
 Main PID: 1375 (code=exited, status=137)

Apr 15 18:44:53 core-01 docker[1038]: 87026dcb0044: Download complete
Apr 15 18:44:53 core-01 docker[1038]: Status: Downloaded newer image for coreos/apache:latest
Apr 15 18:44:53 core-01 systemd[1]: Started "Dummy Apache container".
Apr 15 18:44:53 core-01 docker[1375]: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.0.2 for ServerName
Apr 15 18:45:46 core-01 systemd[1]: Stopping "Dummy Apache container"...
Apr 15 18:45:56 core-01 docker[1522]: apache1
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service: main process exited, code=exited, status=137/n/a
Apr 15 18:45:56 core-01 systemd[1]: Stopped "Dummy Apache container".
Apr 15 18:45:56 core-01 systemd[1]: Unit apache@2.service entered failed state.
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service failed.

From what I can see, it seems like docker is killing the container (which is what the ExecStartPre command is supposed to be doing when the service starts). I have also investigated the logs with journalctl, and it seems to be the case.

Correction: Upon further inspection (and eyes closer to the screen), I noticed this very important line in the logs:

Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="Container afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1 failed to exit within 10 seconds of SIGTERM -

So I can see that docker is unable to stop the service properly (the reason for which I don't understand), however I still don't understand why it goes ahead and attempts to destroy the container. Here is the rest of the relevant logs:

Apr 15 18:45:46 core-01 systemd[1]: Stopping "Dummy Apache container"...
Apr 15 18:45:46 core-01 fleetd[880]: INFO manager.go:145: Triggered systemd unit apache@2.service stop: job=2115
Apr 15 18:45:46 core-01 fleetd[880]: INFO reconcile.go:311: AgentReconciler completed task: type=StopUnit job=apache@2.service reason="unit currently launched but desired state is loaded"
Apr 15 18:45:46 core-01 dockerd[881]: time="2015-04-15T18:45:46Z" level="info" msg="POST /v1.17/containers/apache1/stop?t=10"
Apr 15 18:45:46 core-01 dockerd[881]: time="2015-04-15T18:45:46Z" level="info" msg="+job stop(apache1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="Container afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1 failed to exit within 10 seconds of SIGTERM -
Apr 15 18:45:56 core-01 kernel: docker0: port 1(veth87a841f) entered disabled state
Apr 15 18:45:56 core-01 kernel: device veth87a841f left promiscuous mode
Apr 15 18:45:56 core-01 kernel: docker0: port 1(veth87a841f) entered disabled state
Apr 15 18:45:56 core-01 systemd-networkd[830]: veth87a841f     : lost carrier
Apr 15 18:45:56 core-01 systemd-networkd[830]: veth87a841f     : could not find udev device: No such device
Apr 15 18:45:56 core-01 systemd-networkd[830]: docker0         : lost carrier
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job log(die, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job log(die, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job release_interface(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job attach(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="POST /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1/wait"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job wait(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job release_interface(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job log(stop, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job log(stop, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job stop(apache1) = OK (0)"
Apr 15 18:45:56 core-01 docker[1522]: apache1
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job wait(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="GET /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1/json"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job container_inspect(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job container_inspect(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="DELETE /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1?v=1"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job rm(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="POST /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1/kill?signal=TERM"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job kill(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, TERM)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job log(destroy, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job log(destroy, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job rm(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service: main process exited, code=exited, status=137/n/a
Apr 15 18:45:56 core-01 systemd[1]: Stopped "Dummy Apache container".
Apr 15 18:45:56 core-01 systemd[1]: Unit apache@2.service entered failed state.
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service failed.
Apr 15 18:45:56 core-01 dockerd[881]: No such container: afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job kill(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, TERM) = ERR (1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="error" msg="Handler for POST /containers/{name:.*}/kill returned error: No such container: afa4e52d8ff2edaa53d2bae1177535d669a4758f
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="error" msg="HTTP Error: statusCode=404 No such container: afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1"

I'm not really sure what's going on here, so any help is greatly appreciated.

UPDATE: Just for the sake of testing, I've increased the timeouts in my service file (docker stop -t 300 apache1 and TimeoutStopSec = 300), however the container fails to stop even after 5 minutes wait. To be sure I tried to stop the container directly on the command line (docker stop apache1), and of course it works just fine. So there seems to be something that prevents container from being stopped properly through fleetctl.

Thanks for your time!

Best Answer

The reason why docker is stopping the container is because apache in the container is failing.