Linux – Oracle invoked oom-killer even though plenty of RAM and swap appear to be free

Tags: linux, linux-kernel, oom-killer, oracle, process

Can you please tell me what caused this Oracle process to be killed? There seems to be plenty of free RAM and plenty of free swap. Several other Oracle processes were killed afterwards.
The VM has 16 GB of memory and 8 vCPUs.
Here is the log for the first Oracle process that was killed:

 Mar  1 20:00:58 ******* kernel: oracle invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Mar  1 20:00:58 ******* kernel: oracle cpuset=/ mems_allowed=0
Mar  1 20:00:58 ******* kernel: Pid: 2370, comm: oracle Not tainted 2.6.32-431.el6.x86_64 #1
Mar  1 20:00:58 ******* kernel: Call Trace:
Mar  1 20:00:58 ******* kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Mar  1 20:00:58 ******* kernel: [] ? dump_header+0x90/0x1b0
Mar  1 20:00:58 ******* kernel: [] ? security_real_capable_noaudit+0x3c/0x70
Mar  1 20:00:58 ******* kernel: [] ? oom_kill_process+0x82/0x2a0
Mar  1 20:00:58 ******* kernel: [] ? select_bad_process+0xe1/0x120
Mar  1 20:00:58 ******* kernel: [] ? out_of_memory+0x220/0x3c0
Mar  1 20:00:58 ******* kernel: [] ? __alloc_pages_nodemask+0x8ac/0x8d0
Mar  1 20:00:58 ******* kernel: [] ? alloc_pages_vma+0x9a/0x150
Mar  1 20:00:58 ******* kernel: [] ? handle_pte_fault+0x73d/0xb00
Mar  1 20:00:58 ******* kernel: [] ? free_pgtables+0xce/0x120
Mar  1 20:00:58 ******* kernel: [] ? unmap_region+0xcd/0x130
Mar  1 20:00:58 ******* kernel: [] ? vma_prio_tree_add+0x75/0xd0
Mar  1 20:00:58 ******* kernel: [] ? handle_mm_fault+0x22a/0x300
Mar  1 20:00:58 ******* kernel: [] ? __do_page_fault+0x138/0x480
Mar  1 20:00:58 ******* kernel: [] ? do_mmap_pgoff+0x335/0x380
Mar  1 20:00:58 ******* kernel: [] ? do_page_fault+0x3e/0xa0
Mar  1 20:00:58 ******* kernel: [] ? page_fault+0x25/0x30
Mar  1 20:00:58 ******* kernel: Mem-Info:
Mar  1 20:00:58 ******* kernel: Node 0 DMA per-cpu:
Mar  1 20:00:58 ******* kernel: CPU    0: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    1: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    2: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    3: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    4: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    5: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    6: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    7: hi:    0, btch:   1 usd:   0
Mar  1 20:00:58 ******* kernel: Node 0 DMA32 per-cpu:
Mar  1 20:00:58 ******* kernel: CPU    0: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    1: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    2: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    3: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    4: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    5: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    6: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    7: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: Node 0 Normal per-cpu:
Mar  1 20:00:58 ******* kernel: CPU    0: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    1: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    2: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    3: hi:  186, btch:  31 usd:  20
Mar  1 20:00:58 ******* kernel: CPU    4: hi:  186, btch:  31 usd:  32
Mar  1 20:00:58 ******* kernel: CPU    5: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: CPU    6: hi:  186, btch:  31 usd: 184
Mar  1 20:00:58 ******* kernel: CPU    7: hi:  186, btch:  31 usd:   0
Mar  1 20:00:58 ******* kernel: active_anon:2673615 inactive_anon:368657 isolated_anon:0
Mar  1 20:00:58 ******* kernel: active_file:3541 inactive_file:3962 isolated_file:32
Mar  1 20:00:58 ******* kernel: unevictable:0 dirty:3 writeback:2770 unstable:0
Mar  1 20:00:58 ******* kernel: free:33763 slab_reclaimable:16555 slab_unreclaimable:28221
Mar  1 20:00:58 ******* kernel: mapped:1517627 shmem:1730877 pagetables:906135 bounce:0
Mar  1 20:00:58 ******* kernel: Node 0 DMA free:15132kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14740kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar  1 20:00:58 ******* kernel: lowmem_reserve[]: 0 3000 16130 16130
Mar  1 20:00:58 ******* kernel: Node 0 DMA32 free:64904kB min:12556kB low:15692kB high:18832kB active_anon:2064816kB inactive_anon:516452kB active_file:492kB inactive_file:188kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:2319432kB shmem:2352892kB slab_reclaimable:7420kB slab_unreclaimable:3620kB kernel_stack:832kB pagetables:24672kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1 all_unreclaimable? no
Mar  1 20:00:58 ******* kernel: lowmem_reserve[]: 0 0 13130 13130
Mar  1 20:00:58 ******* kernel: Node 0 Normal free:55016kB min:54964kB low:68704kB high:82444kB active_anon:8629644kB inactive_anon:958176kB active_file:13672kB inactive_file:15660kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:13445120kB mlocked:0kB dirty:12kB writeback:11080kB mapped:3751076kB shmem:4570616kB slab_reclaimable:58800kB slab_unreclaimable:109264kB kernel_stack:5360kB pagetables:3599868kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no
Mar  1 20:00:58 ******* kernel: lowmem_reserve[]: 0 0 0 0
Mar  1 20:00:58 ******* kernel: Node 0 DMA: 3*4kB 2*8kB 2*16kB 3*32kB 2*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 3*4096kB = 15132kB
Mar  1 20:00:58 ******* kernel: Node 0 DMA32: 1225*4kB 859*8kB 878*16kB 547*32kB 184*64kB 34*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 65596kB
Mar  1 20:00:58 ******* kernel: Node 0 Normal: 9165*4kB 1804*8kB 46*16kB 2*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 55924kB
Mar  1 20:00:58 ******* kernel: 1760824 total pagecache pages
Mar  1 20:00:58 ******* kernel: 22460 pages in swap cache
Mar  1 20:00:58 ******* kernel: Swap cache stats: add 6636857, delete 6614397, find 15635455/16141480
Mar  1 20:00:58 ******* kernel: Free swap  = 33548340kB
Mar  1 20:00:58 ******* kernel: Total swap = 36184056kB
Mar  1 20:00:58 ******* kernel: 4194288 pages RAM
Mar  1 20:00:58 ******* kernel: 111808 pages reserved
Mar  1 20:00:58 ******* kernel: 59252583 pages shared
Mar  1 20:00:58 ******* kernel: 2502605 pages non-shared
Mar  1 20:00:58 ******* kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Mar  1 20:00:58 ******* kernel: [  612]     0   612     2769       42   2     -17         -1000 udevd
Mar  1 20:00:58 ******* kernel: [ 1872]     0  1872    47365      204   7       0             0 vmtoolsd
Mar  1 20:00:58 ******* kernel: [ 1980]     0  1980    23294      109   6     -17         -1000 auditd
Mar  1 20:00:58 ******* kernel: [ 1996]     0  1996    62898      842   4       0             0 rsyslogd
Mar  1 20:00:58 ******* kernel: [ 2025]     0  2025     2738       93   3       0             0 irqbalance
Mar  1 20:00:58 ******* kernel: [ 2039]    32  2039     4744       68   4       0             0 rpcbind
Mar  1 20:00:58 ******* kernel: [ 2071]    29  2071     5837       61   3       0             0 rpc.statd
Mar  1 20:00:58 ******* kernel: [ 2092]     0  2092     5773       31   1       0             0 rpc.idmapd
Mar  1 20:00:58 ******* kernel: [ 2211]     0  2211    39323      127   5       0             0 pbx_exchange
Mar  1 20:00:58 ******* kernel: [ 2223]     0  2223    48106      158   5       0             0 winbindd
Mar  1 20:00:58 ******* kernel: [ 2237]     0  2237     1020       48   4       0             0 acpid
Mar  1 20:00:58 ******* kernel: [ 2323]     0  2323    49766      281   0       0             0 winbindd
Mar  1 20:00:58 ******* kernel: [ 2540]     0  2540    26827       11   5       0             0 rpc.rquotad
Mar  1 20:00:58 ******* kernel: [ 2544]     0  2544     5414       41   5       0             0 rpc.mountd
Mar  1 20:00:58 ******* kernel: [ 2580]     0  2580     1570       23   0       0             0 mcelog
Mar  1 20:00:58 ******* kernel: [ 2592]     0  2592    16651       78   5     -17         -1000 sshd
Mar  1 20:00:58 ******* kernel: [ 2600]     0  2600     5545      105   3       0             0 xinetd
Mar  1 20:00:58 ******* kernel: [ 2608]    38  2608     7147      132   5       0             0 ntpd
Mar  1 20:00:58 ******* kernel: [ 2618]   498  2618    25741       57   2       0             0 uuidd
Mar  1 20:00:58 ******* kernel: [ 2630]     0  2630    43170      139   3       0             0 vnetd
Mar  1 20:00:58 ******* kernel: [ 2638]     0  2638    52398      158   2       0             0 bpcd
Mar  1 20:00:58 ******* kernel: [ 2655]     0  2655   198335      478   4       0             0 nbdisco
Mar  1 20:00:58 ******* kernel: [ 2676]     0  2676    76958       82   2       0             0 mtstrmd
Mar  1 20:00:58 ******* kernel: [ 2707]     0  2707    22314      141   0       0             0 sendmail
Mar  1 20:00:58 ******* kernel: [ 2716]    51  2716    19658       80   4       0             0 sendmail
Mar  1 20:00:58 ******* kernel: [ 2734]     0  2734   200856      353   7       0             0 avagent.bin
Mar  1 20:00:58 ******* kernel: [ 2747]     0  2747    44287      178   3       0             0 tuned
Mar  1 20:00:58 ******* kernel: [ 2757]     0  2757    29333      103   6       0             0 crond
Mar  1 20:00:58 ******* kernel: [ 2778]     0  2778    27431      167   7       0             0 saphostexec
Mar  1 20:00:58 ******* kernel: [ 2805]   600  2805   545016     4031   5       0             0 sapstartsrv
Mar  1 20:00:58 ******* kernel: [ 2885]   834  2885   100602      294   3       0             0 sapstartsrv
Mar  1 20:00:58 ******* kernel: [ 2904]     0  2904     5385       31   6       0             0 atd
Mar  1 20:00:58 ******* kernel: [ 2928]     0  2928    26005       69   5       0             0 rhsmcertd
Mar  1 20:00:58 ******* kernel: [ 2935]     0  2935     8154     1110   0       0             0 saposcol
Mar  1 20:00:58 ******* kernel: [ 3098]   834  3098    13538       50   3       0             0 sapstart
Mar  1 20:00:58 ******* kernel: [ 3128]   834  3128    43278      119   5       0             0 jc.sapDAA_SMDA9
Mar  1 20:00:58 ******* kernel: [ 3144]   834  3144  1276839    57796   4       0             0 jstart
Mar  1 20:00:58 ******* kernel: [ 3211]   703  3211    33752      378   5       0             0 perl
Mar  1 20:00:58 ******* kernel: [ 3288]   703  3288  1181563    62355   0       0             0 java
Mar  1 20:00:58 ******* kernel: [ 3497]     0  3497     1016       34   1       0             0 mingetty
Mar  1 20:00:58 ******* kernel: [ 3499]     0  3499     1016       34   1       0             0 mingetty
Mar  1 20:00:58 ******* kernel: [ 3502]     0  3502     1016       34   1       0             0 mingetty
Mar  1 20:00:58 ******* kernel: [ 3504]     0  3504     1016       34   2       0             0 mingetty
Mar  1 20:00:58 ******* kernel: [ 3506]     0  3506     1016       34   1       0             0 mingetty
Mar  1 20:00:58 ******* kernel: [ 3508]     0  3508     1016       34   1       0             0 mingetty
Mar  1 20:00:58 ******* kernel: [ 3515]     0  3515     3098       41   2     -17         -1000 udevd
Mar  1 20:00:58 ******* kernel: [ 3516]     0  3516     3098       41   4     -17         -1000 udevd
Mar  1 20:00:58 ******* kernel: [13764]     0 13764    48089       89   7       0             0 winbindd
Mar  1 20:00:58 ******* kernel: [13765]     0 13765    48089       92   7       0             0 winbindd
Mar  1 20:00:58 ******* kernel: [13873]   703 13873  2403434     6196   5       0             0 oracle
Mar  1 20:00:58 ******* kernel: [13875]   703 13875  2402873      651   3       0             0 oracle
Mar  1 20:00:58 ******* kernel: [13880]   703 13880  2402873      423   4       0             0 oracle

.. Note: I removed a number of oracle process entries here to stay within the character limit for posting. A total of 296 oracle processes were running.
..
Mar  1 20:00:59 ******* kernel: [18644]     0 18644    44207      371   1       0             0 bpclntcmd
Mar  1 20:00:59 ******* kernel: [18647]   703 18647    57442      240   3       0             0 oracle
Mar  1 20:00:59 ******* kernel: [18656]   703 18656    57442      185   6       0             0 oracle
Mar  1 20:00:59 ******* kernel: [18657] 54329 18657     9279      196   1       0             0 nrpe
Mar  1 20:00:59 ******* kernel: [18660] 54329 18660     9314      255   2       0             0 nrpe
Mar  1 20:00:59 ******* kernel: [18662]     0 18662    39263      289   5       0             0 crond
Mar  1 20:00:59 ******* kernel: [18663]     0 18663     5745      341   1       0             0 saposcol
Mar  1 20:00:59 ******* kernel: [18664] 54329 18664     9315      146   3       0             0 nrpe
Mar  1 20:00:59 ******* kernel: [18665] 54329 18665     5730       76   0       0             0 check_open_file
Mar  1 20:00:59 ******* kernel: [18666] 54329 18666     6611      191   4       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18667]     0 18667     8389      183   1       0             0 sapcimb
Mar  1 20:00:59 ******* kernel: [18669]     0 18669     6610      171   0       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18670]     0 18670     6610      171   0       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18677]     0 18677     6610      177   5       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18678]   703 18678    29497      275   4       0             0 perl
Mar  1 20:00:59 ******* kernel: [18682]   703 18682    29497      252   7       0             0 perl
Mar  1 20:00:59 ******* kernel: [18683]   703 18683    29497      231   0       0             0 perl
Mar  1 20:00:59 ******* kernel: [18687]     0 18687     2620       92   1       0             0 .SAPOSCOL_00000
Mar  1 20:00:59 ******* kernel: [18688]     0 18688     6610      186   5       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18689]     0 18689     6610      189   2       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18690]     0 18690     6610      191   3       0             0 xinetd
Mar  1 20:00:59 ******* kernel: [18691]     0 18691     6610      194   2       0             0 xinetd
Mar  1 20:00:59 ******* kernel: Out of memory: Kill process 13900 (oracle) score 77 or sacrifice child
Mar  1 20:00:59 ******* kernel: Killed process 13900, UID 703, (oracle) total-vm:9622308kB, anon-rss:5180kB, file-rss:4028040kB
 

From the above, I think these lines say that I have plenty of RAM and swap, right?

Node 0 DMA free:15132kB
Node 0 DMA32 free:64904kB
Node 0 Normal free:55016kB
Free swap  = 33548340kB
Total swap = 36184056kB

I am also wondering what "all_unreclaimable? yes" means for Node 0 DMA, and what "all_unreclaimable? no" means for Node 0 DMA32 and Node 0 Normal.

Also, here is some output that might tell you more about the server settings:

$ sudo sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.msgmni = 1024
kernel.sem = 1250 256000 100 8192
vm.max_map_count = 1000000
kernel.shmall = 1152921504606846720
fs.file-max = 19801952
net.core.rmem_default = 1048576
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
vm.swappiness = 0
vm.dirty_background_ratio = 3
vm.dirty_ratio = 15
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
kernel.shmmni = 4096

Best Answer

You don't have much free memory at all.
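To see why the per-zone "free" figures are less generous than they look, here is a rough sketch of the kernel's watermark test. It is a simplified model, not the actual kernel code: it assumes the failing request is an ordinary user page fault that targets the Normal zone, and that a zone is only usable when its free memory exceeds its min watermark plus the lowmem_reserve value for that request (third column of the `lowmem_reserve[]` lines, in pages). All numbers are copied from the log in the question.

```python
PAGE_KB = 4  # x86_64 base page size in kB

# Values copied from the "Node 0 ..." and "lowmem_reserve[]" lines in the question.
# reserve_pages is the lowmem_reserve entry that applies to a Normal-zone request
# (an assumption about which column applies to this allocation).
zones = {
    "DMA":    {"free_kb": 15132, "min_kb": 60,    "reserve_pages": 16130},
    "DMA32":  {"free_kb": 64904, "min_kb": 12556, "reserve_pages": 13130},
    "Normal": {"free_kb": 55016, "min_kb": 54964, "reserve_pages": 0},
}

def headroom_kb(zone):
    """kB left above (min watermark + lowmem reserve); zero or negative means unusable."""
    return zone["free_kb"] - zone["min_kb"] - zone["reserve_pages"] * PAGE_KB

for name, zone in zones.items():
    print(f"{name:7s} headroom: {headroom_kb(zone):8d} kB")
```

Under these assumptions DMA and DMA32 come out negative, so their free pages are held back for low-memory users, and Normal has only about 52 kB of headroom above its min watermark.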

First, vm.swappiness = 0: only do this if you are definitely sure you have enough memory. A low but nonzero value, such as 10, might prevent an out-of-memory condition, and it will actually make use of your paging space.

From the Mem-Info summary of node 0, your 16 GB holds roughly 3.5 GB of page tables (pagetables:906135 pages), roughly 6.6 GB of shared memory (shmem:1730877 pages), and most of the remainder is anonymous program memory, plus various odds and ends. Notice that the readily available file memory plus free memory is only about 160 MB in total, which is not large. It won't be able to give you another GB or so of shared memory.
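The breakdown can be checked by converting the page counters from the log into GiB. This is a minimal sketch that assumes the 4 kB page size of x86_64; the counter values are copied from the Mem-Info lines in the question, and the helper names are made up for illustration.

```python
PAGE_KB = 4  # x86_64 base page size in kB (assumption stated above)

# Page counts copied from the Mem-Info lines in the question.
counters = {
    "active_anon": 2673615,
    "inactive_anon": 368657,
    "active_file": 3541,
    "inactive_file": 3962,
    "free": 33763,
    "shmem": 1730877,
    "pagetables": 906135,
}
total_ram_pages = 4194288  # from "4194288 pages RAM"

def pages_to_gib(pages):
    """Convert a count of 4 kB pages to GiB."""
    return pages * PAGE_KB / (1024 * 1024)

def summarize(counters, total_pages):
    """Map each counter to (size in GiB, percent of total RAM)."""
    return {
        name: (round(pages_to_gib(pages), 2), round(100 * pages / total_pages, 1))
        for name, pages in counters.items()
    }

summary = summarize(counters, total_ram_pages)
for name, (gib, pct) in summary.items():
    print(f"{name:14s} {gib:6.2f} GiB {pct:5.1f}%")

# Memory that could be handed out quickly: free pages plus file cache.
available_pages = counters["free"] + counters["active_file"] + counters["inactive_file"]
available_mib = available_pages * PAGE_KB / 1024
print(f"free + file cache: {available_mib:.0f} MiB")
```

With these inputs the page tables come out near 3.5 GiB (about 22% of RAM) and free plus file cache near 160 MiB.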

Page tables are eating you alive. You may not have huge pages enabled; both Oracle and Red Hat recommend them for databases.
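A back-of-the-envelope estimate shows why huge pages matter here. The sketch below rests on several assumptions: every one of the 296 oracle processes maps and touches the whole shared memory area, that area is about the size of the `shmem` counter in the log, each page needs one 8-byte page table entry per process, and the upper levels of the page table tree are ignored. The function name is hypothetical.

```python
PTE_BYTES = 8  # size of one page table entry on x86_64
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def page_table_bytes(mapped_bytes, page_size, processes):
    """Worst-case last-level page table size when every process maps the full area."""
    entries = -(-mapped_bytes // page_size)  # ceiling division
    return entries * PTE_BYTES * processes

# shmem:1730877 pages of 4 kB from the log, taken as a stand-in for the SGA size.
sga_bytes = 1730877 * 4 * KIB
processes = 296  # oracle process count given in the question

small = page_table_bytes(sga_bytes, 4 * KIB, processes)
huge = page_table_bytes(sga_bytes, 2 * MIB, processes)

print(f"4 kB pages: {small / GIB:.2f} GiB of page tables")
print(f"2 MB pages: {huge / MIB:.2f} MiB of page tables")
```

Under these assumptions the 4 kB case lands at roughly 3.8 GiB, in the same range as the page table usage in the log, while 2 MB huge pages would need only a few MiB.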