Docker – Java application gets killed in Kubernetes even though resource limits and heap size are specified

docker, java, kubernetes, oom

Background

A Spring Boot Java application is deployed in a Kubernetes cluster and gets killed several times per day.

I'm using openjdk:8u181-jre for my Java apps.

Kubernetes version: v1.11.5

Node os: CentOS 7.4 x64

JAVA_OPTS is set according to this post about letting the Java application read the cgroup memory limit:
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/

env:
- name: JAVA_OPTS
  value: " -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Xms512M"
resources:
  requests:
    memory: "4096Mi"
    cpu: "1"
  limits:
    memory: "4096Mi"
    cpu: "1"

Each node in the cluster has 16 GiB of memory, and the pod requests 4 GiB.
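To double-check what heap limit the JVM actually derives from these flags, one option (a sketch only: <pod-name> is a placeholder, and it assumes sh is available in the image and JAVA_OPTS is set from the pod spec) is to print the final flag values inside the running container:

# print the maximum heap size the JVM computes from the cgroup limit and JAVA_OPTS
kubectl exec <pod-name> -- sh -c 'java $JAVA_OPTS -XX:+PrintFlagsFinal -version | grep -i maxheapsize'

With the 4096Mi limit and MaxRAMFraction=2, this should report a MaxHeapSize of roughly 2 GiB.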

Error

But the application still gets OOM-killed from time to time.

The system events:

Jan 16 23:29:58 localhost kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
Jan 16 23:29:58 localhost kernel: java cpuset=docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope mems_allowed=0
Jan 16 23:29:58 localhost kernel: CPU: 7 PID: 19904 Comm: java Tainted: G           OE  ------------ T 3.10.0-693.2.2.el7.x86_64 #1
Jan 16 23:29:58 localhost kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Jan 16 23:29:58 localhost kernel: ffff880362700000 000000008b5adefc ffff88034078bc90 ffffffff816a3db1
Jan 16 23:29:58 localhost kernel: ffff88034078bd20 ffffffff8169f1a6 ffff8803b1642680 0000000000000001
Jan 16 23:29:58 localhost kernel: 0000000000000000 ffff880407eeaad0 ffff88034078bcd0 0000000000000046
Jan 16 23:29:58 localhost kernel: Call Trace:
Jan 16 23:29:58 localhost kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Jan 16 23:29:58 localhost kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Jan 16 23:29:58 localhost kernel: [<ffffffff81185ee6>] ? find_lock_task_mm+0x56/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Jan 16 23:29:58 localhost kernel: [<ffffffff811f52a6>] mem_cgroup_oom_synchronize+0x546/0x570
Jan 16 23:29:58 localhost kernel: [<ffffffff811f4720>] ? mem_cgroup_charge_common+0xc0/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186c24>] pagefault_out_of_memory+0x14/0x90
Jan 16 23:29:58 localhost kernel: [<ffffffff8169d56e>] mm_fault_error+0x68/0x12b
Jan 16 23:29:58 localhost kernel: [<ffffffff816b0231>] __do_page_fault+0x391/0x450
Jan 16 23:29:58 localhost kernel: [<ffffffff810295da>] ? __switch_to+0x15a/0x510
Jan 16 23:29:58 localhost kernel: [<ffffffff816b03d6>] trace_do_page_fault+0x56/0x150
Jan 16 23:29:58 localhost kernel: [<ffffffff816afa6a>] do_async_page_fault+0x1a/0xd0
Jan 16 23:29:58 localhost kernel: [<ffffffff816ac578>] async_page_fault+0x28/0x30
Jan 16 23:29:58 localhost kernel: Task in /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope killed as a result of limit of /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice
Jan 16 23:29:58 localhost kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 7722
Jan 16 23:29:58 localhost kernel: memory+swap: usage 4194304kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-58ff049ead2b1713e8a6c736b4637b64f8b6b5c9d1232101792b4d1e8cf03d6a.scope: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope: cache:32KB rss:4194232KB rss_huge:3786752KB mapped_file:8KB swap:0KB inactive_anon:0KB active_anon:4194232KB inactive_file:0KB active_file:32KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Jan 16 23:29:58 localhost kernel: [19357]     0 19357      254        1       4        0          -998 pause
Jan 16 23:29:58 localhost kernel: [19485]     0 19485     1071      161       7        0          -998 sh
Jan 16 23:29:58 localhost kernel: [19497]     0 19497  2008713  1051013    2203        0          -998 java
Jan 16 23:29:58 localhost kernel: Memory cgroup out of memory: Kill process 31404 (java) score 6 or sacrifice child
Jan 16 23:29:58 localhost kernel: Killed process 19497 (java) total-vm:8034852kB, anon-rss:4188424kB, file-rss:15628kB, shmem-rss:0kB

I'm quite confused: the heap size should be limited to roughly 2 GiB, since MaxRAMFraction is set to 2, yet the container is still killed. 🙁

Could you please help me find the right way to dig into this error?
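For example, would enabling native memory tracking along these lines be a reasonable way to see the heap vs. non-heap breakdown? (A sketch only: it assumes a JDK-based image, since jcmd ships with the JDK rather than the JRE, that the JVM runs as PID 1 in the container, and <pod-name> is a placeholder.)

# add to JAVA_OPTS so the JVM tracks its own native allocations (adds a small overhead)
-XX:NativeMemoryTracking=summary

# then, while the pod is running, ask the JVM for a heap/metaspace/threads/code-cache breakdown
kubectl exec <pod-name> -- jcmd 1 VM.native_memory summary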

Best Answer

The article you mention was updated twice with new information. There are many similar blog posts out there, and tools like the java-buildpack-memory-calculator may still be helpful, but the general conclusion is that Java 10 and later are much better suited for running in containers. Also keep in mind that the heap is only part of the JVM's footprint: Metaspace, thread stacks, the code cache, direct buffers and other native allocations count against the same cgroup limit, so capping the heap at half of the limit does not by itself guarantee the process stays under it.
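As a rough sketch (the values are illustrative, not taken from the article): on JDK 10+, or JDK 8u191 and later, the experimental cgroup flags above can be replaced with the container-aware options, e.g.

env:
- name: JAVA_OPTS
  # UseContainerSupport is enabled by default on JDK 10+ / 8u191+;
  # MaxRAMPercentage=50.0 corresponds to the old MaxRAMFraction=2 (half the cgroup memory limit)
  value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0 -Xms512M"

Whether 50% is the right percentage depends on how much non-heap memory the application needs under the 4 GiB limit.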