Linux – Server overloaded, acts like out of memory, but that's not true

Tags: centos6, kvm-virtualization, linux, memory-usage

I have a CentOS 6.5 server running QEMU-KVM virtualization:

hardware:

  • 40 CPUs
  • 400 GB RAM

software:

  • Kernel: 2.6.32-431.17.1.el6.x86_64
  • Qemu: 0.12.1.2
  • Libvirt: 0.10.2

There are 3 guests, all with an identical hardware configuration:

  • 16 CPUs
  • 120 GB RAM

<memory unit='KiB'>125829120</memory>
<currentMemory unit='KiB'>125829120</currentMemory>
<vcpu placement='static'>16</vcpu>

Guests are running Apache and MySQL.

Besides the virtual machines, the host runs only some backup and maintenance scripts, nothing else.

Always after a few days of running, problems start to show up. The load on the guests randomly spikes up to about 150, with 10-15% of CPU time in steal. On the host the load is around 38-40, with about 30-40% of CPU time in user and 40-50% in system.
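(Side note from the editor, a sketch: inside a guest, the steal figure can be read straight from /proc/stat rather than from top; it is the 8th counter on the aggregate cpu line.)

```shell
# Steal time is the 8th counter on the "cpu" line of /proc/stat:
# USER_HZ ticks during which the hypervisor ran something else
# while this guest had runnable work.
awk '/^cpu / {print "steal ticks:", $9}' /proc/stat
```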

The most CPU-consuming processes on the host at that moment are the QEMU processes of the virtual guests, followed immediately by kswapd0 and kswapd1 at 100% CPU usage.

Memory usage in that moment:

  • RAM total 378.48 GB
  • RAM used 330.82 GB
  • RAM free 47.66 GB
  • SWAP total 500.24 MB
  • SWAP used 497.13 MB
  • SWAP free 3192 kB

plus 10-20 GB RAM in buffers.
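(Editor's addition, a sketch: the swap figures above can be recomputed directly from /proc/meminfo, which is handy for watching how quickly swap refills.)

```shell
# Percentage of swap in use, computed from /proc/meminfo (values in kB)
awk '/^SwapTotal/ {t=$2} /^SwapFree/ {f=$2}
     END {printf "swap used: %.1f%%\n", 100*(t-f)/t}' /proc/meminfo
```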

So, from the point of view of memory usage, there shouldn't be any problem. But the heavy work of the kswapd processes indicates memory shortage, and the full swap points in the same direction (when I turn swap off and back on, it fills up again within moments). And once in a while, the OOM killer kills some process:

Nov 20 12:42:42 wv2-f302 kernel: active_anon:79945387 inactive_anon:3660742 isolated_anon:0
Nov 20 12:42:42 wv2-f302 kernel: active_file:252 inactive_file:0 isolated_file:0
Nov 20 12:42:42 wv2-f302 kernel: unevictable:0 dirty:2 writeback:0 unstable:0
Nov 20 12:42:42 wv2-f302 kernel: free:12513746 slab_reclaimable:5001 slab_unreclaimable:1759785
Nov 20 12:42:42 wv2-f302 kernel: mapped:213 shmem:41 pagetables:188243 bounce:0
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA free:15728kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 2965 193855 193855
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA32 free:431968kB min:688kB low:860kB high:1032kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3037072kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 0 190890 190890
Nov 20 12:42:42 wv2-f302 kernel: Node 0 Normal free:6593828kB min:44356kB low:55444kB high:66532kB active_anon:178841380kB inactive_anon:7783292kB active_file:540kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:195471360kB mlocked:0kB dirty:8kB writeback:0kB mapped:312kB shmem:48kB slab_reclaimable:11136kB slab_unreclaimable:1959664kB kernel_stack:5104kB pagetables:397332kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 0 0 0
Nov 20 12:42:42 wv2-f302 kernel: Node 1 Normal free:43013460kB min:45060kB low:56324kB high:67588kB active_anon:140940168kB inactive_anon:6859676kB active_file:468kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:198574076kB mlocked:0kB dirty:0kB writeback:0kB mapped:540kB shmem:116kB slab_reclaimable:8868kB slab_unreclaimable:5079476kB kernel_stack:2856kB pagetables:355640kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov 20 12:42:42 wv2-f302 kernel: lowmem_reserve[]: 0 0 0 0
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA: 2*4kB 1*8kB 2*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15728kB
Nov 20 12:42:42 wv2-f302 kernel: Node 0 DMA32: 10*4kB 11*8kB 12*16kB 13*32kB 12*64kB 5*128kB 7*256kB 10*512kB 9*1024kB 6*2048kB 98*4096kB = 431968kB
Nov 20 12:42:42 wv2-f302 kernel: Node 0 Normal: 1648026*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6592104kB
Nov 20 12:42:42 wv2-f302 kernel: Node 1 Normal: 8390977*4kB 1181188*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43013412kB
Nov 20 12:42:42 wv2-f302 kernel: 49429 total pagecache pages
Nov 20 12:42:42 wv2-f302 kernel: 48929 pages in swap cache
Nov 20 12:42:42 wv2-f302 kernel: Swap cache stats: add 2688331, delete 2639402, find 16219898/16530111
Nov 20 12:42:42 wv2-f302 kernel: Free swap  = 3264kB
Nov 20 12:42:42 wv2-f302 kernel: Total swap = 512248kB
Nov 20 12:42:44 wv2-f302 kernel: 100663294 pages RAM
Nov 20 12:42:44 wv2-f302 kernel: 1446311 pages reserved
Nov 20 12:42:44 wv2-f302 kernel: 10374115 pages shared
Nov 20 12:42:44 wv2-f302 kernel: 84534113 pages non-shared

Oct 27 14:24:43 wv2-f302 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Oct 27 14:24:43 wv2-f302 kernel: [ 3878]     0  3878 32042399 31569413  10       0             0 qemu_wl52
Oct 27 14:24:43 wv2-f302 kernel: [ 4321]     0  4321 32092081 31599762  20       0             0 qemu_wl51
Oct 27 14:24:43 wv2-f302 kernel: [ 4394]     0  4394 32106979 31575717  15       0             0 qemu_wl50
...
Oct 27 14:24:43 wv2-f302 kernel: Out of memory: Kill process 3878 (qemu_wl52) score 318 or sacrifice child
Oct 27 14:24:43 wv2-f302 kernel: Killed process 3878, UID 0, (qemu_wl52) total-vm:128169596kB, anon-rss:126277476kB, file-rss:176kB

Complete dump: http://evilcigi.eu/msg/msg.txt
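(One detail worth noting when reading the OOM table, added by the editor: total_vm and rss are reported in 4 kB pages, so the numbers line up exactly with the kill message.)

```shell
# total_vm for pid 3878 is 32042399 pages of 4 kB each:
echo $((32042399 * 4))   # kB; matches "total-vm:128169596kB" in the kill line
```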

Then I start the killed guest again, and from that moment everything is OK for a few days, with the same memory usage as before the problem:

  • RAM total 378.48 GB
  • RAM used 336.15 GB
  • RAM free 42.33 GB
  • SWAP total 500.24 MB
  • SWAP used 344.55 MB
  • SWAP free 155.69 MB

Is it possible that the server somehow miscounts memory? Or is there something I'm missing?

One thing that comes to my mind: could the host be putting all free memory into buffers and cache and then suffering a memory shortage (invoking the OOM killer)? But that, I think, shouldn't happen, right? Also, it doesn't explain the behavior before the killing.

Thank you in advance.


update – today the problem occurred again; here is the content of /proc/meminfo:

MemTotal:       396867932 kB
MemFree:         9720268 kB
Buffers:        53354000 kB
Cached:            22196 kB
SwapCached:       343964 kB
Active:         331872796 kB
Inactive:       41283992 kB
Active(anon):   305458432 kB
Inactive(anon): 14322324 kB
Active(file):   26414364 kB
Inactive(file): 26961668 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        512248 kB
SwapFree:              0 kB
Dirty:                48 kB
Writeback:             0 kB
AnonPages:      319438656 kB
Mapped:             8536 kB
Shmem:               164 kB
Slab:            9052784 kB
SReclaimable:    2014752 kB
SUnreclaim:      7038032 kB
KernelStack:        8064 kB
PageTables:       650892 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    198946212 kB
Committed_AS:   383832752 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1824832 kB
VmallocChunk:   34157271228 kB
HardwareCorrupted:     0 kB
AnonHugePages:  31502336 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7852 kB
DirectMap2M:     3102720 kB
DirectMap1G:    399507456 kB

It seems that all the "free" memory is spent in buffers.
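(Editor's addition, a sketch: a 2.6.32 kernel has no MemAvailable field in /proc/meminfo, so a rough "actually reclaimable" figure has to be summed by hand from the fields that the kernel could free without swapping.)

```shell
# Rough estimate of reclaimable memory on kernels without MemAvailable:
# MemFree + Buffers + Cached + SReclaimable (all in kB)
awk '/^MemFree|^Buffers|^Cached|^SReclaimable/ {sum += $2}
     END {print sum " kB potentially reclaimable"}' /proc/meminfo
```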


After hints from @Matthew Ife about memory fragmentation, I compacted the memory and also dropped the caches (freeing up the 60 GB held in buffers) on the host, with these commands:

echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory

Here is what memory fragmentation looks like now:

# cat /proc/buddyinfo
Node 0, zone      DMA      2      1      2      2      2      1      0      0      1      1      3 
Node 0, zone    DMA32     12     12     13     16     10      5      7     10      9      6     98 
Node 0, zone   Normal 2398537 469407 144288  97224  58276  24155   8153   3141   1299    451     75 
Node 1, zone   Normal 9182926 2727543 648104  81843   7915   1267    244     67      3      1      0
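(A quick way to sanity-check those rows, added by the editor: column k of /proc/buddyinfo counts free blocks of 2^k contiguous 4 kB pages, so each zone's total free memory can be recomputed from them.)

```shell
# Column k (k = 0..10) counts free blocks of 2^k contiguous 4 kB pages;
# sum them to get each zone's free memory in kB.
awk '{kb = 0; for (i = 5; i <= NF; i++) kb += $i * 4 * 2^(i-5);
      print $1, $2, $3, $4, "=", kb, "kB"}' /proc/buddyinfo
```

Run against the "Node 0, zone DMA" row above, this reproduces the 15728 kB figure reported in the OOM dump.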

update 2014/11/25 – the server is overloaded again:

# cat /proc/buddyinfo
Node 0, zone      DMA      2      1      2      2      2      1      0      0      1      1      3 
Node 0, zone    DMA32     12     12     13     16     10      5      7     10      9      6     98 
Node 0, zone   Normal 4374385  85408      0      0      0      0      0      0      0      0      0 
Node 1, zone   Normal 1830850 261703    460     14      0      0      0      0      0      0      0 

# cat /proc/meminfo 
MemTotal:       396867932 kB
MemFree:        28038892 kB
Buffers:        49126656 kB
Cached:            19088 kB
SwapCached:       303624 kB
Active:         305426204 kB
Inactive:       49729776 kB
Active(anon):   292040988 kB
Inactive(anon): 13969376 kB
Active(file):   13385216 kB
Inactive(file): 35760400 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        512248 kB
SwapFree:             20 kB
Dirty:                28 kB
Writeback:             0 kB
AnonPages:      305706632 kB
Mapped:             9324 kB
Shmem:               124 kB
Slab:            8616228 kB
SReclaimable:    1580736 kB
SUnreclaim:      7035492 kB
KernelStack:        8200 kB
PageTables:       702268 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    198946212 kB
Committed_AS:   384014048 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1824832 kB
VmallocChunk:   34157271228 kB
HardwareCorrupted:     0 kB
AnonHugePages:  31670272 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7852 kB
DirectMap2M:     3102720 kB
DirectMap1G:    399507456 kB

and in the syslog there are some page allocation failures:

Nov 25 09:14:07 wv2-f302 kernel: qemu_wl50: page allocation failure. order:4, mode:0x20
Nov 25 09:14:07 wv2-f302 kernel: Pid: 4444, comm: qemu_wl50 Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Nov 25 09:14:07 wv2-f302 kernel: Call Trace:
Nov 25 09:14:07 wv2-f302 kernel: <IRQ>  [<ffffffff8112f64a>] ? __alloc_pages_nodemask+0x74a/0x8d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116e082>] ? kmem_getpages+0x62/0x170
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116ec9a>] ? fallback_alloc+0x1ba/0x270
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116ea19>] ? ____cache_alloc_node+0x99/0x160
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116fbe0>] ? kmem_cache_alloc_node_trace+0x90/0x200
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8116fdfd>] ? __kmalloc_node+0x4d/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8144ff5a>] ? __alloc_skb+0x7a/0x180
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81451070>] ? skb_copy+0x40/0xb0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa013a55c>] ? tg3_start_xmit+0xa8c/0xd80 [tg3]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814603e4>] ? dev_hard_start_xmit+0x224/0x480
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8147be6a>] ? sch_direct_xmit+0x15a/0x1c0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814608e8>] ? dev_queue_xmit+0x228/0x320
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8898>] ? br_dev_queue_push_xmit+0x88/0xc0 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8928>] ? br_forward_finish+0x58/0x60 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8ae8>] ? __br_deliver+0x78/0x110 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c8bb5>] ? br_deliver+0x35/0x40 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c78f4>] ? br_dev_xmit+0x114/0x140 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814603e4>] ? dev_hard_start_xmit+0x224/0x480
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8146087d>] ? dev_queue_xmit+0x1bd/0x320
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81466785>] ? neigh_resolve_output+0x105/0x2d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8149a2f8>] ? ip_finish_output+0x148/0x310
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8149a578>] ? ip_output+0xb8/0xc0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8149983f>] ? __ip_local_out+0x9f/0xb0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81499875>] ? ip_local_out+0x25/0x30
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81499d50>] ? ip_queue_xmit+0x190/0x420
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814af06e>] ? tcp_transmit_skb+0x40e/0x7b0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b15b0>] ? tcp_write_xmit+0x230/0xa90
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b2130>] ? __tcp_push_pending_frames+0x30/0xe0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814a9893>] ? tcp_data_snd_check+0x33/0x100
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814ad491>] ? tcp_rcv_established+0x381/0x7f0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b5873>] ? tcp_v4_do_rcv+0x2e3/0x490
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02b1557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0124441>] ? nf_nat_fn+0x91/0x260 [iptable_nat]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814b717a>] ? tcp_v4_rcv+0x51a/0x900
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81494300>] ? ip_local_deliver_finish+0x0/0x2d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814943dd>] ? ip_local_deliver_finish+0xdd/0x2d0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81494668>] ? ip_local_deliver+0x98/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81493b2d>] ? ip_rcv_finish+0x12d/0x440
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff814940b5>] ? ip_rcv+0x275/0x350
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81489509>] ? nf_iterate+0x69/0xb0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145b5db>] ? __netif_receive_skb+0x4ab/0x750
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f1f0>] ? netif_receive_skb+0x0/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f248>] ? netif_receive_skb+0x58/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c9af8>] ? br_handle_frame_finish+0x1e8/0x2a0 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa02c9d5a>] ? br_handle_frame+0x1aa/0x250 [bridge]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145b659>] ? __netif_receive_skb+0x529/0x750
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f248>] ? netif_receive_skb+0x58/0x60
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8145f350>] ? napi_skb_finish+0x50/0x70
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81460ab9>] ? napi_gro_receive+0x39/0x50
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0136b54>] ? tg3_poll_work+0xc24/0x1020 [tg3]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0136f9c>] ? tg3_poll_msix+0x4c/0x150 [tg3]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff81460bd3>] ? net_rx_action+0x103/0x2f0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810a6da9>] ? ktime_get+0x69/0xf0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8107a551>] ? __do_softirq+0xc1/0x1e0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810e6b20>] ? handle_IRQ_event+0x60/0x170
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8107a405>] ? irq_exit+0x85/0x90
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff815312c5>] ? do_IRQ+0x75/0xf0
Nov 25 09:14:07 wv2-f302 kernel: <EOI>  [<ffffffffa018e271>] ? kvm_arch_vcpu_ioctl_run+0x4c1/0x10b0 [kvm]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa018e25f>] ? kvm_arch_vcpu_ioctl_run+0x4af/0x10b0 [kvm]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810aee2e>] ? futex_wake+0x10e/0x120
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffffa0175b04>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8119d802>] ? vfs_ioctl+0x22/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8119dcca>] ? do_vfs_ioctl+0x3aa/0x580
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff810b186b>] ? sys_futex+0x7b/0x170
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8119df21>] ? sys_ioctl+0x81/0xa0
Nov 25 09:14:07 wv2-f302 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

edit:
The problem was solved by reducing the memory allocated to the guests. Now there are 3 guests with 80 GB RAM each, leaving about 150 GB of RAM to the host system:

# free -h
              total        used        free      shared  buff/cache   available
Mem:           377G        243G         29G        1,9G        104G        132G

Best Answer

There's a lot of free memory, but these zones are totally fragmented:

Node 0 Normal: 1648026*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6592104kB
Node 1 Normal: 8390977*4kB 1181188*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43013412kB

There are very few higher-order pages left, and in one of the zones none at all.

I can't guarantee anything, but you may want to try turning off ksmd and re-compacting memory. Compaction is only invoked automatically on higher-order page allocations and never calls the OOM killer, so I assume the system tried to allocate memory at order 2 or 3 and got stuck.
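(Editor's note, a sketch untested on this exact setup: on RHEL/CentOS 6 with qemu-kvm, ksmd is managed by the ksm and ksmtuned services, and writing 2 to the KSM run knob stops the daemon and un-merges already-shared pages.)

```shell
# Stop the KSM tuning daemon and KSM itself (RHEL/CentOS 6 service names)
service ksmtuned stop
service ksm stop
chkconfig ksmtuned off
chkconfig ksm off
# 2 = stop ksmd and un-merge all currently merged pages
echo 2 > /sys/kernel/mm/ksm/run
```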

To compact memory, run:

echo 1 > /proc/sys/vm/compact_memory

There's only so much to go on in this question, but I suspect ksmd is causing the fragmentation by scanning for pages duplicated across the VMs and shuffling them around.