Proceso de Linux eliminado aunque haya suficiente memoria disponible

9

Estoy investigando por qué dos de nuestros procesos fueron eliminados por el asesino de OOM de Linux, a pesar de que parece haber habido suficiente RAM y suficiente SWAP disponible en ambas ocasiones.

Cuando lo interpreto como en esta respuesta, la primera solicitud de memoria solicitó 2 ^ 2 = 4 páginas (16 KB) de memoria (indicador de orden) y quería que fuera de la zona "Normal".

Jan 27 04:26:14 kernel: [639964.652706] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0

Y si analizo correctamente la salida, hay espacio más que suficiente:

Node 0 Normal free:178144kB min:55068kB low:68832kB high:82600kB 

La segunda vez tiene la misma solicitud unos minutos más tarde, y también parece haber suficiente espacio disponible.

¿Por qué se activó el asesino OOM entonces? ¿Estoy analizando mal la información?

  • El sistema es un Ubuntu 14.04 con el kernel 4.4.0-59 x64
  • El vm.overcommit_memoryajuste se establece en "0" ( heurístico ) que puede no ser óptimo.

Instancia uno:

Jan 27 04:26:14 kernel: [639964.652706] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 27 04:26:14 kernel: [639964.652711] java cpuset=/ mems_allowed=0
Jan 27 04:26:14 kernel: [639964.652716] CPU: 5 PID: 2152 Comm: java Not tainted 4.4.0-59-generic #80~14.04.1-Ubuntu
Jan 27 04:26:14 kernel: [639964.652717] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
Jan 27 04:26:14 kernel: [639964.652719]  0000000000000000 ffff88041a963b38 ffffffff813dbd6c ffff88041a963cf0
Jan 27 04:26:14 kernel: [639964.652721]  0000000000000000 ffff88041a963bc8 ffffffff811fafc6 0000000000000000
Jan 27 04:26:14 kernel: [639964.652722]  0000000000000000 0000000000000000 ffff88042a6d1b88 0000000000000015
Jan 27 04:26:14 kernel: [639964.652724] Call Trace:
Jan 27 04:26:14 kernel: [639964.652731]  [<ffffffff813dbd6c>] dump_stack+0x63/0x87
Jan 27 04:26:14 kernel: [639964.652736]  [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
Jan 27 04:26:14 kernel: [639964.652741]  [<ffffffff813766f1>] ? apparmor_capable+0xd1/0x180
Jan 27 04:26:14 kernel: [639964.652746]  [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
Jan 27 04:26:14 kernel: [639964.652747]  [<ffffffff8118916b>] out_of_memory+0x40b/0x460
Jan 27 04:26:14 kernel: [639964.652749]  [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
Jan 27 04:26:14 kernel: [639964.652752]  [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
Jan 27 04:26:14 kernel: [639964.652754]  [<ffffffff8118e32d>] alloc_kmem_pages_node+0x4d/0xd0
Jan 27 04:26:14 kernel: [639964.652758]  [<ffffffff8107c125>] copy_process+0x185/0x1ce0
Jan 27 04:26:14 kernel: [639964.652763]  [<ffffffff810fd0b4>] ? do_futex+0xf4/0x520
Jan 27 04:26:14 kernel: [639964.652766]  [<ffffffff810a71c9>] ? resched_curr+0xa9/0xd0
Jan 27 04:26:14 kernel: [639964.652768]  [<ffffffff8107de1a>] _do_fork+0x8a/0x310
Jan 27 04:26:14 kernel: [639964.652769]  [<ffffffff8107e149>] SyS_clone+0x19/0x20
Jan 27 04:26:14 kernel: [639964.652775]  [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 27 04:26:14 kernel: [639964.652776] Mem-Info:
Jan 27 04:26:14 kernel: [639964.652780] active_anon:1596719 inactive_anon:281182 isolated_anon:0
Jan 27 04:26:14 kernel: [639964.652780]  active_file:953586 inactive_file:952370 isolated_file:0
Jan 27 04:26:14 kernel: [639964.652780]  unevictable:0 dirty:7358 writeback:0 unstable:0
Jan 27 04:26:14 kernel: [639964.652780]  slab_reclaimable:217903 slab_unreclaimable:12162
Jan 27 04:26:14 kernel: [639964.652780]  mapped:40068 shmem:34861 pagetables:8261 bounce:0
Jan 27 04:26:14 kernel: [639964.652780]  free:71705 free_pcp:0 free_cma:0
Jan 27 04:26:14 kernel: [639964.652783] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 27 04:26:14 kernel: [639964.652787] lowmem_reserve[]: 0 2951 16005 16005 16005
Jan 27 04:26:14 kernel: [639964.652789] Node 0 DMA32 free:92784kB min:12448kB low:15560kB high:18672kB active_anon:1094416kB inactive_anon:368444kB active_file:579188kB inactive_file:561504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:3048784kB mlocked:0kB dirty:1188kB writeback:0kB mapped:32604kB shmem:27372kB slab_reclaimable:336288kB slab_unreclaimable:7196kB kernel_stack:1520kB pagetables:3964kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:26:14 kernel: [639964.652793] lowmem_reserve[]: 0 0 13054 13054 13054
Jan 27 04:26:14 kernel: [639964.652795] Node 0 Normal free:178144kB min:55068kB low:68832kB high:82600kB active_anon:5292460kB inactive_anon:756284kB active_file:3235156kB inactive_file:3247976kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367448kB mlocked:0kB dirty:28244kB writeback:0kB mapped:127668kB shmem:112072kB slab_reclaimable:535324kB slab_unreclaimable:41436kB kernel_stack:3968kB pagetables:29080kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
Jan 27 04:26:14 kernel: [639964.652798] lowmem_reserve[]: 0 0 0 0 0
Jan 27 04:26:14 kernel: [639964.652800] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan 27 04:26:14 kernel: [639964.652807] Node 0 DMA32: 18127*4kB (UME) 2601*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 93316kB
Jan 27 04:26:14 kernel: [639964.652814] Node 0 Normal: 32943*4kB (UMEH) 5702*8kB (UMEH) 19*16kB (H) 13*32kB (H) 9*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 178940kB
Jan 27 04:26:14 kernel: [639964.652820] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 27 04:26:14 kernel: [639964.652821] 1949078 total pagecache pages
Jan 27 04:26:14 kernel: [639964.652822] 8225 pages in swap cache
Jan 27 04:26:14 kernel: [639964.652824] Swap cache stats: add 1131771, delete 1123546, find 7366438/7540102
Jan 27 04:26:14 kernel: [639964.652824] Free swap  = 4080988kB
Jan 27 04:26:14 kernel: [639964.652825] Total swap = 4194300kB
Jan 27 04:26:14 kernel: [639964.652826] 4194174 pages RAM
Jan 27 04:26:14 kernel: [639964.652826] 0 pages HighMem/MovableOnly
Jan 27 04:26:14 kernel: [639964.652827] 86139 pages reserved
Jan 27 04:26:14 kernel: [639964.652828] 0 pages cma reserved
Jan 27 04:26:14 kernel: [639964.652828] 0 pages hwpoisoned
Jan 27 04:26:14 kernel: [639964.652829] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 27 04:26:14 kernel: [639964.652834] [  424]     0   424     4909      388      14       3       68             0 upstart-udev-br
Jan 27 04:26:14 kernel: [639964.652836] [  439]     0   439    13075      456      29       3      322         -1000 systemd-udevd
Jan 27 04:26:14 kernel: [639964.652839] [  724]     0   724     3816      226      13       3       53             0 upstart-socket-
Jan 27 04:26:14 kernel: [639964.652840] [  813]     0   813     5856      449      16       3       57             0 rpcbind
Jan 27 04:26:14 kernel: [639964.652842] [  865]   108   865     5386      456      16       3      113             0 rpc.statd
Jan 27 04:26:14 kernel: [639964.652844] [ 1034]     0  1034     3820      281      12       3       35             0 upstart-file-br
Jan 27 04:26:14 kernel: [639964.652846] [ 1041]   102  1041     9817      366      23       3       50             0 dbus-daemon
Jan 27 04:26:14 kernel: [639964.652847] [ 1045]   101  1045    65018     1203      31       3      384             0 rsyslogd
Jan 27 04:26:14 kernel: [639964.652849] [ 1056]     0  1056    10870      525      26       4       49             0 systemd-logind
Jan 27 04:26:14 kernel: [639964.652851] [ 1063]     0  1063     5870        0      16       3       53             0 rpc.idmapd
Jan 27 04:26:14 kernel: [639964.652852] [ 1153]     0  1153     2558      371       9       3      517             0 dhclient
Jan 27 04:26:14 kernel: [639964.652854] [ 1374]     0  1374     3955      401      13       3       40             0 getty
Jan 27 04:26:14 kernel: [639964.652855] [ 1377]     0  1377     3955      406      13       3       38             0 getty
Jan 27 04:26:14 kernel: [639964.652857] [ 1383]     0  1383     3955      406      13       3       39             0 getty
Jan 27 04:26:14 kernel: [639964.652858] [ 1384]     0  1384     3955      418      13       3       37             0 getty
Jan 27 04:26:14 kernel: [639964.652859] [ 1386]     0  1386     3955      418      12       3       38             0 getty
Jan 27 04:26:14 kernel: [639964.652861] [ 1403]     0  1403    15346      735      34       3      142         -1000 sshd
Jan 27 04:26:14 kernel: [639964.652863] [ 1436]     0  1436     4825      408      13       3       28             0 irqbalance
Jan 27 04:26:14 kernel: [639964.652864] [ 1440]     0  1440     1093      379       8       3       35             0 acpid
Jan 27 04:26:14 kernel: [639964.652866] [ 1442]     0  1442     4785      176      14       3       38             0 atd
Jan 27 04:26:14 kernel: [639964.652867] [ 1443]     0  1443     5914      466      17       3       43             0 cron
Jan 27 04:26:14 kernel: [639964.652869] [ 1464]   105  1464    61957     3600      59       3      273          -900 postgres
Jan 27 04:26:14 kernel: [639964.652870] [ 1561]   107  1561     7864      657      21       3      113             0 ntpd
Jan 27 04:26:14 kernel: [639964.652872] [ 1762]   105  1762    62036    35419     117       3      264             0 postgres
Jan 27 04:26:14 kernel: [639964.652873] [ 1763]   105  1763    61957    35051     117       3      266             0 postgres
Jan 27 04:26:14 kernel: [639964.652875] [ 1764]   105  1764    61957     1773      52       3      306             0 postgres
Jan 27 04:26:14 kernel: [639964.652877] [ 1765]   105  1765    62166     4999     116       3      374             0 postgres
Jan 27 04:26:14 kernel: [639964.652878] [ 1766]   105  1766    25910      617      48       3      274             0 postgres
Jan 27 04:26:14 kernel: [639964.652880] [ 1834]  1004  1834  2002886   692615    1549      10    12707             0 java
Jan 27 04:26:14 kernel: [639964.652881] [ 1921]   106  1921     5835      452      16       3      112             0 nrpe
Jan 27 04:26:14 kernel: [639964.652883] [ 1943]     0  1943   175986      420      41       4       50             0 nscd
Jan 27 04:26:14 kernel: [639964.652884] [ 1978]   109  1978   111112      309      48       4      213             0 nslcd
Jan 27 04:26:14 kernel: [639964.652886] [ 2007]     8  2007     3172      326      11       3       52             0 nullmailer-send
Jan 27 04:26:14 kernel: [639964.652887] [ 2092]     0  2092    34005     1947      70       3     3067             0 /usr/bin/monito
Jan 27 04:26:14 kernel: [639964.652889] [ 2110]     0  2110     1901      367       9       3       25             0 getty
Jan 27 04:26:14 kernel: [639964.652891] [ 2146] 65534  2146    34005     1101      67       3     3810             0 monitorix-httpd
Jan 27 04:26:14 kernel: [639964.652893] [24525]   105 24525  1826264  1151331    3568      10      299             0 postgres
Jan 27 04:26:14 kernel: [639964.652895] [20380]   105 20380    62511    36514     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652897] [21273]   105 21273    62532    36508     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652898] [22133]   105 22133    62610    36827     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652900] [22135]   105 22135    62541    34994     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652901] [22428]     0 22428    15436      739      35       4       11             0 cron
Jan 27 04:26:14 kernel: [639964.652903] [22429]     0 22429    15489      749      35       4       12             0 cron
Jan 27 04:26:14 kernel: [639964.652904] [22442]     0 22442     1112      198       8       3        0             0 sh
Jan 27 04:26:14 kernel: [639964.652906] [22443]  1004 22443     1112      191       8       3        0             0 sh
Jan 27 04:26:14 kernel: [639964.652908] [22444]  1004 22444     3102      748      11       3        0             0 syncDaily.sh
Jan 27 04:26:14 kernel: [639964.652909] [22445]     0 22445     1112      420       8       3        0             0 cron-apt
Jan 27 04:26:14 kernel: [639964.652911] [22465]  1004 22465    55074    10532     113       3        0             0 rsync
Jan 27 04:26:14 kernel: [639964.652912] [22466]     0 22466     1087      171       8       3        0             0 sleep
Jan 27 04:26:14 kernel: [639964.652914] [22467]  1004 22467    29820     9151      62       3        0             0 rsync
Jan 27 04:26:14 kernel: [639964.652915] [22468]  1004 22468    61770     7168     125       3        0             0 rsync
Jan 27 04:26:14 kernel: [639964.652917] [22990]   105 22990    62490    35099     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652919] [23138]   105 23138    62491    35578     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652920] [23139]   105 23139    62690    36657     121       3      236             0 postgres
Jan 27 04:26:14 kernel: [639964.652922] [23140]   105 23140    62455    32973     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652923] [23631]   105 23631    62518    34978     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652925] [23635]   105 23635    62506    35193     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652927] [23636]   105 23636    62455    30085     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652928] [23637]   105 23637    62470    33106     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652930] [23639]   105 23639    62511    34295     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652940] Out of memory: Kill process 24525 (postgres) score 224 or sacrifice child
Jan 27 04:26:14 kernel: [639964.652975] Killed process 24525 (postgres) total-vm:7305056kB, anon-rss:4460244kB, file-rss:145080kB

Instancia dos

Jan 27 04:34:36 kernel: [640466.131656] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 27 04:34:36 kernel: [640466.131660] java cpuset=/ mems_allowed=0
Jan 27 04:34:36 kernel: [640466.131665] CPU: 7 PID: 2152 Comm: java Not tainted 4.4.0-59-generic #80~14.04.1-Ubuntu
Jan 27 04:34:36 kernel: [640466.131666] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
Jan 27 04:34:36 kernel: [640466.131668]  0000000000000000 ffff88041a963b38 ffffffff813dbd6c ffff88041a963cf0
Jan 27 04:34:36 kernel: [640466.131670]  0000000000000000 ffff88041a963bc8 ffffffff811fafc6 0000000000000000
Jan 27 04:34:36 kernel: [640466.131671]  0000000000000000 0000000000000000 ffff88042a6d1b88 0000000000000015
Jan 27 04:34:36 kernel: [640466.131673] Call Trace:
Jan 27 04:34:36 kernel: [640466.131698]  [<ffffffff813dbd6c>] dump_stack+0x63/0x87
Jan 27 04:34:36 kernel: [640466.131712]  [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
Jan 27 04:34:36 kernel: [640466.131721]  [<ffffffff813766f1>] ? apparmor_capable+0xd1/0x180
Jan 27 04:34:36 kernel: [640466.131728]  [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
Jan 27 04:34:36 kernel: [640466.131730]  [<ffffffff8118916b>] out_of_memory+0x40b/0x460
Jan 27 04:34:36 kernel: [640466.131732]  [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
Jan 27 04:34:36 kernel: [640466.131734]  [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
Jan 27 04:34:36 kernel: [640466.131736]  [<ffffffff8118e32d>] alloc_kmem_pages_node+0x4d/0xd0
Jan 27 04:34:36 kernel: [640466.131745]  [<ffffffff8107c125>] copy_process+0x185/0x1ce0
Jan 27 04:34:36 kernel: [640466.131755]  [<ffffffff810fd0b4>] ? do_futex+0xf4/0x520
Jan 27 04:34:36 kernel: [640466.131761]  [<ffffffff810a71c9>] ? resched_curr+0xa9/0xd0
Jan 27 04:34:36 kernel: [640466.131763]  [<ffffffff8107de1a>] _do_fork+0x8a/0x310
Jan 27 04:34:36 kernel: [640466.131765]  [<ffffffff8107e149>] SyS_clone+0x19/0x20
Jan 27 04:34:36 kernel: [640466.131779]  [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 27 04:34:36 kernel: [640466.131781] Mem-Info:
Jan 27 04:34:36 kernel: [640466.131784] active_anon:463046 inactive_anon:339934 isolated_anon:0
Jan 27 04:34:36 kernel: [640466.131784]  active_file:1074992 inactive_file:1398029 isolated_file:0
Jan 27 04:34:36 kernel: [640466.131784]  unevictable:0 dirty:1307 writeback:0 unstable:0
Jan 27 04:34:36 kernel: [640466.131784]  slab_reclaimable:626085 slab_unreclaimable:26239
Jan 27 04:34:36 kernel: [640466.131784]  mapped:40618 shmem:35429 pagetables:4038 bounce:0
Jan 27 04:34:36 kernel: [640466.131784]  free:161367 free_pcp:0 free_cma:0
Jan 27 04:34:36 kernel: [640466.131788] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 27 04:34:36 kernel: [640466.131792] lowmem_reserve[]: 0 2951 16005 16005 16005
Jan 27 04:34:36 kernel: [640466.131794] Node 0 DMA32 free:112056kB min:12448kB low:15560kB high:18672kB active_anon:465908kB inactive_anon:478436kB active_file:620808kB inactive_file:963844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:3048784kB mlocked:0kB dirty:844kB writeback:0kB mapped:11132kB shmem:5644kB slab_reclaimable:390764kB slab_unreclaimable:8488kB kernel_stack:1408kB pagetables:2304kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:34:36 kernel: [640466.131798] lowmem_reserve[]: 0 0 13054 13054 13054
Jan 27 04:34:36 kernel: [640466.131800] Node 0 Normal free:517520kB min:55068kB low:68832kB high:82600kB active_anon:1386276kB inactive_anon:881300kB active_file:3679160kB inactive_file:4628272kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367448kB mlocked:0kB dirty:4384kB writeback:0kB mapped:151340kB shmem:136072kB slab_reclaimable:2113576kB slab_unreclaimable:96452kB kernel_stack:3904kB pagetables:13848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:34:36 kernel: [640466.131803] lowmem_reserve[]: 0 0 0 0 0
Jan 27 04:34:36 kernel: [640466.131805] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan 27 04:34:36 kernel: [640466.131812] Node 0 DMA32: 20157*4kB (UME) 4165*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 113948kB
Jan 27 04:34:36 kernel: [640466.131817] Node 0 Normal: 119665*4kB (UMEH) 4706*8kB (UMEH) 12*16kB (H) 13*32kB (H) 10*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 518068kB
Jan 27 04:34:36 kernel: [640466.131824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 27 04:34:36 kernel: [640466.131825] 2516698 total pagecache pages
Jan 27 04:34:36 kernel: [640466.131826] 8199 pages in swap cache
Jan 27 04:34:36 kernel: [640466.131828] Swap cache stats: add 1131970, delete 1123771, find 7374629/7548428
Jan 27 04:34:36 kernel: [640466.131828] Free swap  = 4085700kB
Jan 27 04:34:36 kernel: [640466.131829] Total swap = 4194300kB
Jan 27 04:34:36 kernel: [640466.131830] 4194174 pages RAM
Jan 27 04:34:36 kernel: [640466.131830] 0 pages HighMem/MovableOnly
Jan 27 04:34:36 kernel: [640466.131831] 86139 pages reserved
Jan 27 04:34:36 kernel: [640466.131832] 0 pages cma reserved
Jan 27 04:34:36 kernel: [640466.131832] 0 pages hwpoisoned
Jan 27 04:34:36 kernel: [640466.131833] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 27 04:34:36 kernel: [640466.131838] [  424]     0   424     4909      388      14       3       68             0 upstart-udev-br
Jan 27 04:34:36 kernel: [640466.131841] [  439]     0   439    13075      456      29       3      322         -1000 systemd-udevd
Jan 27 04:34:36 kernel: [640466.131843] [  724]     0   724     3816      226      13       3       53             0 upstart-socket-
Jan 27 04:34:36 kernel: [640466.131845] [  813]     0   813     5856      449      16       3       57             0 rpcbind
Jan 27 04:34:36 kernel: [640466.131846] [  865]   108   865     5386      456      16       3      113             0 rpc.statd
Jan 27 04:34:36 kernel: [640466.131848] [ 1034]     0  1034     3820      281      12       3       35             0 upstart-file-br
Jan 27 04:34:36 kernel: [640466.131850] [ 1041]   102  1041     9817      366      23       3       50             0 dbus-daemon
Jan 27 04:34:36 kernel: [640466.131852] [ 1045]   101  1045    65018     1255      31       3      362             0 rsyslogd
Jan 27 04:34:36 kernel: [640466.131854] [ 1056]     0  1056    10870      525      26       4       49             0 systemd-logind
Jan 27 04:34:36 kernel: [640466.131855] [ 1063]     0  1063     5870        0      16       3       53             0 rpc.idmapd
Jan 27 04:34:36 kernel: [640466.131857] [ 1153]     0  1153     2558      371       9       3      517             0 dhclient
Jan 27 04:34:36 kernel: [640466.131858] [ 1374]     0  1374     3955      401      13       3       40             0 getty
Jan 27 04:34:36 kernel: [640466.131860] [ 1377]     0  1377     3955      406      13       3       38             0 getty
Jan 27 04:34:36 kernel: [640466.131861] [ 1383]     0  1383     3955      406      13       3       39             0 getty
Jan 27 04:34:36 kernel: [640466.131863] [ 1384]     0  1384     3955      418      13       3       37             0 getty
Jan 27 04:34:36 kernel: [640466.131864] [ 1386]     0  1386     3955      418      12       3       38             0 getty
Jan 27 04:34:36 kernel: [640466.131866] [ 1403]     0  1403    15346      735      34       3      142         -1000 sshd
Jan 27 04:34:36 kernel: [640466.131868] [ 1436]     0  1436     4825      408      13       3       28             0 irqbalance
Jan 27 04:34:36 kernel: [640466.131869] [ 1440]     0  1440     1093      379       8       3       35             0 acpid
Jan 27 04:34:36 kernel: [640466.131871] [ 1442]     0  1442     4785      176      14       3       38             0 atd
Jan 27 04:34:36 kernel: [640466.131872] [ 1443]     0  1443     5914      466      17       3       43             0 cron
Jan 27 04:34:36 kernel: [640466.131874] [ 1464]   105  1464    61957     4409      59       3      254          -900 postgres
Jan 27 04:34:36 kernel: [640466.131876] [ 1561]   107  1561     7864      657      21       3      113             0 ntpd
Jan 27 04:34:36 kernel: [640466.131877] [ 1834]  1004  1834  2002886   692883    1549      10    12598             0 java
Jan 27 04:34:36 kernel: [640466.131879] [ 1921]   106  1921     5835      452      16       3      112             0 nrpe
Jan 27 04:34:36 kernel: [640466.131880] [ 1943]     0  1943   175986      420      41       4       50             0 nscd
Jan 27 04:34:36 kernel: [640466.131882] [ 1978]   109  1978   111112      309      48       4      213             0 nslcd
Jan 27 04:34:36 kernel: [640466.131883] [ 2007]     8  2007     3172      326      11       3       52             0 nullmailer-send
Jan 27 04:34:36 kernel: [640466.131885] [ 2092]     0  2092    34005     1947      70       3     3067             0 /usr/bin/monito
Jan 27 04:34:36 kernel: [640466.131887] [ 2110]     0  2110     1901      367       9       3       25             0 getty
Jan 27 04:34:36 kernel: [640466.131888] [ 2146] 65534  2146    34005     1101      67       3     3810             0 monitorix-httpd
Jan 27 04:34:36 kernel: [640466.131891] [22428]     0 22428    15436      739      35       4       11             0 cron
Jan 27 04:34:36 kernel: [640466.131892] [22429]     0 22429    15489      749      35       4       12             0 cron
Jan 27 04:34:36 kernel: [640466.131894] [22442]     0 22442     1112      198       8       3        0             0 sh
Jan 27 04:34:36 kernel: [640466.131895] [22443]  1004 22443     1112      191       8       3        0             0 sh
Jan 27 04:34:36 kernel: [640466.131897] [22444]  1004 22444     3102      748      11       3        0             0 syncDaily.sh
Jan 27 04:34:36 kernel: [640466.131899] [22445]     0 22445     1112      420       8       3        0             0 cron-apt
Jan 27 04:34:36 kernel: [640466.131900] [22465]  1004 22465    54754    27012     113       3        0             0 rsync
Jan 27 04:34:36 kernel: [640466.131902] [22466]     0 22466     1087      171       8       3        0             0 sleep
Jan 27 04:34:36 kernel: [640466.131903] [22467]  1004 22467    34234    26953      72       3        0             0 rsync
Jan 27 04:34:36 kernel: [640466.131905] [22468]  1004 22468    62154    15613     126       3        0             0 rsync
Jan 27 04:34:36 kernel: [640466.131907] [24170]   105 24170    61990    34251     117       3      237             0 postgres
Jan 27 04:34:36 kernel: [640466.131908] [24171]   105 24171    61957    33191     115       3      238             0 postgres
Jan 27 04:34:36 kernel: [640466.131910] [24172]   105 24172    61957     1190      52       3      238             0 postgres
Jan 27 04:34:36 kernel: [640466.131911] [24173]   105 24173    62166     1333      54       3      239             0 postgres
Jan 27 04:34:36 kernel: [640466.131913] [24174]   105 24174    25876      642      48       3      242             0 postgres
Jan 27 04:34:36 kernel: [640466.131915] [24175]   105 24175    62464    35199     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131916] [24203]   105 24203    62467    22296     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131918] [24266]   105 24266    62475    36452     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131920] [24317]   105 24317    62424    17702     119       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131921] [24318]   105 24318    62449    24858     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131923] [24320]   105 24320    62485    24779     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131925] [24321]   105 24321    62449    27595     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131926] [24452]   105 24452    62484    16118     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131928] Out of memory: Kill process 1834 (java) score 137 or sacrifice child
Jan 27 04:34:36 kernel: [640466.132070] Killed process 1834 (java) total-vm:8011544kB, anon-rss:2763340kB, file-rss:8192kB
Patricio
fuente
Vea la excelente respuesta de Matthew Ife a dupe serverfault.com/a/762327/9517
user9517
@istheEnglishway ¿Es esto realmente un engañado? Nuestra solicitud es para la "Zona normal" (gfp que termina en cero) y la salida enumera dicha zona. Mientras que en el engaño al que hizo referencia, ambos hacen referencia a una zona de memoria diferente (HighMem) que no está presente en la salida ...
Patrick
1
Vale la pena señalar que, incluso si hay suficiente memoria libre, no se garantiza que haya 16 KB contiguos. No estoy seguro si eso se aplica aquí o no.
kasperd
Correcto, pero si analizo la salida correctamente, entonces debería haber varios bloques de 16 KB disponibles: "19 * 16 kB (H)" para la primera aparición. Sin embargo, es posible que haya encontrado un error de kernel relevante, lo publicaré como una (posible) respuesta.
Patrick

Respuestas:

5

Una posible razón parece ser que se trata de un error del kernel de Ubuntu. Hay un problema de launchpad que describe situaciones de OOM para este kernel: 1655842

Después de volver a un kernel anterior (4.4.0-57), el problema no volvió a ocurrir en las dos semanas que monitoreamos el sistema.

¡Las explicaciones / respuestas alternativas son muy bienvenidas!

Patricio
fuente
¿Alguna idea de si la última versión de 4.4.0corrige este error?
Elad Nava
1
Que yo sepa, sí. Desde entonces, hemos actualizado a los nuevos núcleos 4.4.0 y ya no hemos encontrado este problema.
Patrick