Docker-compose ELK container fails to start

docker, docker-compose, elasticsearch

I am deploying a docker-compose stack of 5 applications on a single AWS EC2 host with 32GB RAM. This includes: 2 x Java containers, 2 x Rails containers, 1 x Elasticsearch/Logstash/Kibana (ELK) container (from https://hub.docker.com/r/sebp/elk/).

When I bring the stack up for the first time, all containers start. The ELK container takes about 3 minutes to start. The others come up straight away.

But the ELK container exits after about 5 minutes. I can see from the logs that the elasticsearch service will not start. The log messages indicate a memory limitation error.

However, when I then tear everything down, and bring it up again, all the containers start straight away, including the ELK container, and everything remains stable. The issue only occurs the first time I start the stack on a new EC2 instance.

I can see from docker stats that the ELK container is only using 2-3GB of the 32GB RAM available on the instance.

The ELK container is configured as follows:

elk:
  image: sebp/elk
  hostname: elk
  container_name: elk
  volumes:
    - ./pipeline/:/etc/logstash/conf.d/
  tty: true
  expose:
    - "12201/udp"
  network_mode: host
  ports:
    - "5601:5601"
    - "9200:9200"
    - "12201:12201"
  ulimits:
    nofile:
      soft: 65536
      hard: 65536

There are no dependencies between the containers on start up.

What is happening with elasticsearch when it first runs that causes the container to fail when starting?

Best Answer

The issue here was that the sebp/elk image contains a timeout parameter that was expiring before elasticsearch could start.

This is controlled by the ES_CONNECT_RETRY environment variable, which defaults to 30 seconds. Elasticsearch takes longer than this to start the first time, so when I set it to 300 seconds, it worked. You can add this as an environment variable for the elk container in your docker-compose manifest:

elk:
  image: sebp/elk
  hostname: elk
  container_name: elk
  environment:
    - ES_CONNECT_RETRY=300

In addition to this, you also need to set the COMPOSE_HTTP_TIMEOUT environment variable before you run docker-compose; otherwise docker-compose will time out before elasticsearch can start. Set it to a value greater than the one you chose for ES_CONNECT_RETRY, e.g.:

COMPOSE_HTTP_TIMEOUT=360
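
For anyone scripting this, here is a minimal sketch of deriving the docker-compose timeout from the ES_CONNECT_RETRY value so the two stay in step. The MARGIN variable is my own illustrative name, not something docker-compose or the sebp/elk image defines, and 60 seconds is simply the gap between the 300 and 360 used above. The variable has to be exported (or given on the same command line) for docker-compose to see it.

```shell
#!/bin/sh
# Value passed to the elk container in the docker-compose manifest (see above).
ES_CONNECT_RETRY=300

# Hypothetical safety margin in seconds (an assumption, not a documented setting);
# 60 reproduces the 360 suggested above.
MARGIN=60

# docker-compose reads COMPOSE_HTTP_TIMEOUT from its environment, so export it.
COMPOSE_HTTP_TIMEOUT=$((ES_CONNECT_RETRY + MARGIN))
export COMPOSE_HTTP_TIMEOUT

echo "COMPOSE_HTTP_TIMEOUT=$COMPOSE_HTTP_TIMEOUT"

# Then bring the stack up from the same shell session:
#   docker-compose up -d
```

For a one-off run, setting the variable inline should work equally well: COMPOSE_HTTP_TIMEOUT=360 docker-compose up -d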

It's possible that this is occurring because I am using a larger EC2 instance (32GB RAM). It may not be an issue on smaller instances.