vm.max_map_count problems on GKE Elasticsearch StatefulSet

elasticsearch · gcloud · google-kubernetes-engine · kubernetes

A problem appeared on a previously working Elasticsearch cluster on GKE.
Nodes with the "data" role began to crash unexpectedly with this error:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

bootstrap checks failed
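For reference, this is the node-level kernel setting that Elasticsearch's bootstrap check reads. You can inspect it directly on any Linux host (for example after SSH-ing into the GKE node); the comparison against 262144 below is just a convenience sketch, not part of the original cluster setup:

```shell
# Check the node's current vm.max_map_count against what Elasticsearch requires.
required=262144
current=$(cat /proc/sys/vm/max_map_count)
if [ "$current" -lt "$required" ]; then
  echo "vm.max_map_count=$current is too low, need at least $required"
else
  echo "vm.max_map_count=$current is sufficient"
fi
```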

Of course, there is an init container in this StatefulSet which sets vm.max_map_count to 262144.

Moreover, this init container seems to complete successfully:

kubectl describe pod elastic-data-0


Init Containers:
  init-sysctl:
    Container ID:  docker://23d3b3d11198510aa01aef340b92e1603785804fbf75e963fdbd61acfe458318
    Image:         busybox:latest
    Image ID:      docker-pullable://busybox@sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd
    Port:          <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    State:          Terminated
      Reason:       Completed

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: elastic-data
  labels:
    component: elasticsearch
    role: data
spec:
  serviceName: elasticsearch-data
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  template:
    metadata:
      labels:
        component: elasticsearch
        role: data
    spec:
      initContainers:
      - name: init-sysctl
        image: busybox:latest
        imagePullPolicy: Always
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: role
                  operator: In
                  values:
                  - data
              topologyKey: kubernetes.io/hostname
      containers:
      - name: es-data
        image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
        env:
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: cluster.name
          value: "elastic"
        - name: node.master
          value: "false"
        - name: node.data
          value: "true"
        - name: node.ingest
          value: "false"
        - name: http.enabled
          value: "true"
        - name: bootstrap.memory_lock
          value: "false"
        - name: path.data
          value: "/data/data"
        - name: path.logs
          value: "/data/log"
        - name: discovery.zen.ping.unicast.hosts
          value: "elasticsearch-discovery"
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: processors
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 300m
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        livenessProbe:
          tcpSocket:
            port: transport
          initialDelaySeconds: 20
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /_cluster/health
            port: http
          initialDelaySeconds: 20
          timeoutSeconds: 5
        volumeMounts:
        - name: storage-volume
          mountPath: /data
      securityContext:
        runAsUser: 1000
        fsGroup: 100
  volumeClaimTemplates:
  - metadata:
      name: storage-volume
    spec:
      storageClassName: manual
      accessModes: [ ReadWriteOnce ]
      resources:
        requests:
          storage: 300Gi

Logs:

kubectl logs elastic-data-0 
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2018-08-16T15:40:33,998][INFO ][o.e.n.Node               ] [elastic-data-0] initializing ...
[2018-08-16T15:40:34,163][INFO ][o.e.e.NodeEnvironment    ] [elastic-data-0] using [1] data paths, mounts [[/data (/dev/sdb)]], net usable_space [231.2gb], net total_space [245gb], types [ext4]
[2018-08-16T15:40:34,165][INFO ][o.e.e.NodeEnvironment    ] [elastic-data-0] heap size [503.6mb], compressed ordinary object pointers [true]
[2018-08-16T15:40:34,544][INFO ][o.e.n.Node               ] [elastic-data-0] node name [elastic-data-0], node ID [C2vCCIpHS3mpiDHduimS0g]
[2018-08-16T15:40:34,545][INFO ][o.e.n.Node               ] [elastic-data-0] version[6.3.2], pid[1], build[default/tar/053779d/2018-07-20T05:20:23.451332Z], OS[Linux/4.14.22+/amd64], JVM["Oracle Corporation"/OpenJDK 64-Bit Server VM/10.0.2/10.0.2+13]
[2018-08-16T15:40:34,545][INFO ][o.e.n.Node               ] [elastic-data-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.zCR3bQNp, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.cgroups.hierarchy.override=/, -Xms512m, -Xmx512m, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2018-08-16T15:40:36,612][WARN ][o.e.d.c.s.Settings       ] [http.enabled] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2018-08-16T15:40:38,484][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [aggs-matrix-stats]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [analysis-common]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [ingest-common]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [lang-expression]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [lang-mustache]
[2018-08-16T15:40:38,485][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [lang-painless]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [mapper-extras]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [parent-join]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [percolator]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [rank-eval]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [reindex]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [repository-url]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [transport-netty4]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [tribe]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-core]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-deprecation]
[2018-08-16T15:40:38,486][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-graph]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-logstash]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-ml]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-monitoring]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-rollup]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-security]
[2018-08-16T15:40:38,487][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-sql]
[2018-08-16T15:40:38,488][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-upgrade]
[2018-08-16T15:40:38,488][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded module [x-pack-watcher]
[2018-08-16T15:40:38,488][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded plugin [ingest-geoip]
[2018-08-16T15:40:38,489][INFO ][o.e.p.PluginsService     ] [elastic-data-0] loaded plugin [ingest-user-agent]
[2018-08-16T15:40:44,991][INFO ][o.e.x.s.a.s.FileRolesStore] [elastic-data-0] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2018-08-16T15:40:45,793][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/61] [Main.cc@109] controller (64 bit): Version 6.3.2 (Build 903094f295d249) Copyright (c) 2018 Elasticsearch BV
[2018-08-16T15:40:47,003][INFO ][o.e.d.DiscoveryModule    ] [elastic-data-0] using discovery type [zen]
[2018-08-16T15:40:48,139][INFO ][o.e.n.Node               ] [elastic-data-0] initialized
[2018-08-16T15:40:48,140][INFO ][o.e.n.Node               ] [elastic-data-0] starting ...
[2018-08-16T15:40:48,337][INFO ][o.e.t.TransportService   ] [elastic-data-0] publish_address {10.0.1.11:9300}, bound_addresses {[::]:9300}
[2018-08-16T15:40:48,452][INFO ][o.e.b.BootstrapChecks    ] [elastic-data-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-08-16T15:40:48,477][INFO ][o.e.n.Node               ] [elastic-data-0] stopping ...
[2018-08-16T15:40:48,503][INFO ][o.e.n.Node               ] [elastic-data-0] stopped
[2018-08-16T15:40:48,504][INFO ][o.e.n.Node               ] [elastic-data-0] closing ...
[2018-08-16T15:40:48,525][INFO ][o.e.n.Node               ] [elastic-data-0] closed
[2018-08-16T15:40:48,529][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started

Kubernetes version is 1.10.5-gke.4 (yes, it's on GKE)

Any ideas are appreciated.

Best Answer

The problem is that init containers run only when a pod is created on a node. If the underlying Kubernetes node is restarted or replaced afterwards, the sysctl change is lost, and the Elasticsearch container then fails with:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Normally, to fix this you would run the command below on each node after it restarts:

sudo sysctl -w vm.max_map_count=262144
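Doing that by hand for every node is tedious, so a small loop can drive it. This is a rough sketch, assuming `gcloud compute ssh` access to the nodes; the function name, the example node names, and the `DRY_RUN` guard (which only prints the commands instead of executing them) are my own additions for illustration:

```shell
# Apply the sysctl on every node over SSH.
# DRY_RUN=1 (the default here) prints each command instead of running it.
apply_max_map_count() {
  nodes="$1"
  for node in $nodes; do
    cmd="gcloud compute ssh $node --command 'sudo sysctl -w vm.max_map_count=262144'"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      eval "$cmd"
    fi
  done
}

# Preview the commands for two hypothetical node names:
apply_max_map_count "gke-node-1 gke-node-2"
```

In a real cluster you would feed it the output of `kubectl get nodes` and set `DRY_RUN=0`. But this still has to be re-run after every node restart, which is exactly why a DaemonSet is the better answer.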

Instead, I suggest you run a DaemonSet on your cluster, which applies the setting on every node, including new and restarted ones. That seems to do the trick. (Google's startup-script container can be used for the same purpose.) You will find the solution below:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: sysctl-conf
  name: sysctl-conf
spec:
  template:
    metadata:
      labels:
        k8s-app: sysctl-conf
    spec:
      containers:
      - command:
        - sh
        - -c
        - sysctl -w vm.max_map_count=262144 && while true; do sleep 86400; done
        image: busybox:1.26.2
        name: sysctl-conf
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 1

To validate the update, SSH into one of the nodes and check the current value:

sudo sysctl -a | grep vm.max_map_count
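With the DaemonSet in place you can also check every node at once from your workstation by reading the value through each `sysctl-conf` pod. A sketch, using the `k8s-app=sysctl-conf` label from the manifest above (the helper function name is mine):

```shell
# Print vm.max_map_count as seen from each sysctl-conf DaemonSet pod,
# i.e. from every node the DaemonSet is scheduled on.
check_all_nodes() {
  for pod in $(kubectl get pods -l k8s-app=sysctl-conf -o name); do
    printf '%s: ' "$pod"
    kubectl exec "${pod#pod/}" -- sysctl -n vm.max_map_count
  done
}
```

Run `check_all_nodes` against the cluster; every line should report at least 262144.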