¿Por qué se invoca OOM-killer incluso cuando hay una gran cantidad de RAM libre?

8

En la instancia de ubuntu xlarge respaldada por ebs de EC2, se invoca a oom-killer. De la salida / var / log / syslog a continuación, parece que ZONE_NORMAL se está quedando sin memoria:

Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB

pero ¿por qué ZONE_NORMAL asigna solo 11 MB de RAM total de 15 GB? ¿O hay alguna otra razón para quedarse sin memoria?

La RAM disponible en la máquina (instancia de xlarge) es de 15 GB. La suma de la columna rss en el registro a continuación es 3.7GB y la de total_vm es 11.4GB.

Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456146] node invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456153] node cpuset=/ mems_allowed=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456157] Pid: 639, comm: node Not tainted 2.6.38-1-virtual #28-Ubuntu
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456160] Call Trace:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456170]  [<ffffffff810b259d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456189]  [<ffffffff81106a01>] ? dump_header+0x91/0x1e0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456196]  [<ffffffff81007e7f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456203]  [<ffffffff815c92de>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456208]  [<ffffffff8110705d>] ? oom_kill_process+0x8d/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456213]  [<ffffffff81107492>] ? out_of_memory+0x102/0x240
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456218]  [<ffffffff8110c209>] ? __alloc_pages_nodemask+0x709/0x7f0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456224]  [<ffffffff81105c78>] ? filemap_fault+0x398/0x490
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456230]  [<ffffffff811414bd>] ? alloc_pages_current+0x9d/0x110
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456235]  [<ffffffff8104338b>] ? pte_alloc_one+0x1b/0x50
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456239]  [<ffffffff8100530e>] ? xen_pud_val+0xe/0x10
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456245]  [<ffffffff811255c5>] ? __pte_alloc+0x35/0x100
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456250]  [<ffffffff81129559>] ? handle_mm_fault+0x129/0x250
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456257]  [<ffffffff815ccf82>] ? do_page_fault+0x1a2/0x540
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456262]  [<ffffffff8109aacb>] ? do_futex+0xbb/0x210
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456266]  [<ffffffff8109ac9b>] ? sys_futex+0x7b/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456274]  [<ffffffff814c2975>] ? sys_recvmsg+0x75/0x90
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456278]  [<ffffffff815c9a15>] ? page_fault+0x25/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456282] Mem-Info:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456284] Node 0 DMA per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456288] CPU    0: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456290] CPU    1: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456293] CPU    2: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456295] CPU    3: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456299] Node 0 DMA32 per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456303] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456307] CPU    1: hi:  186, btch:  31 usd:  81
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456310] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456314] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456317] Node 0 Normal per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456321] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456323] CPU    1: hi:  186, btch:  31 usd: 175
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456326] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456329] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456334] active_anon:3726183 inactive_anon:44 isolated_anon:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456335]  active_file:214 inactive_file:237 isolated_file:64
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456336]  unevictable:0 dirty:1 writeback:0 unstable:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456337]  free:18955 slab_reclaimable:2811 slab_unreclaimable:7277
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456338]  mapped:256 shmem:101 pagetables:12634 bounce:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456341] Node 0 DMA free:15880kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456353] lowmem_reserve[]: 0 4024 15142 15142
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456360] Node 0 DMA32 free:48596kB min:4180kB low:5224kB high:6268kB active_anon:3968448kB inactive_anon:32kB active_file:168kB inactive_file:208kB unevic
table:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:328kB shmem:80kB slab_reclaimable:2208kB slab_unreclaimable:8604kB kernel_stack:48kB pagetabl
es:9724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:601 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456371] lowmem_reserve[]: 0 0 11117 11117
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456377] Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB
unevictable:0kB isolated(anon):0kB isolated(file):256kB present:11384720kB mlocked:0kB dirty:4kB writeback:0kB mapped:696kB shmem:324kB slab_reclaimable:9036kB slab_unreclaimable:20504kB kernel_stack:20
16kB pagetables:40812kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2190 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456388] lowmem_reserve[]: 0 0 0 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456393] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456409] Node 0 DMA32: 715*4kB 525*8kB 343*16kB 245*32kB 147*64kB 50*128kB 16*256kB 5*512kB 2*1024kB 0*2048kB 1*4096kB = 48996kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456423] Node 0 Normal: 1919*4kB 5*8kB 2*16kB 2*32kB 1*64kB 1*128kB 2*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 11588kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456438] 768 total pagecache pages
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456440] 0 pages in swap cache
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456443] Swap cache stats: add 0, delete 0, find 0/0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456445] Free swap  = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456447] Total swap = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495285] 3934192 pages RAM
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495290] 91275 pages reserved
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495291] 1661 pages shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495293] 3822220 pages non-shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495296] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495310] [  228]     0   228     4297       71   2       0             0 upstart-udev-br
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495315] [  242]     0   242     5353      123   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495319] [  319]     0   319     5352      126   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495323] [  320]     0   320     5352      126   0     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495327] [  392]     0   392     1771      124   2       0             0 dhclient
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495332] [  467]     0   467    12366      140   3     -17         -1000 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495336] [  474]   101   474    47476      160   2       0             0 rsyslogd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495340] [  502]   102   502     5927      108   2       0             0 dbus-daemon
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495344] [  534]     0   534     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495348] [  539]     0   539     1035       27   0       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495352] [  543]     0   543    13589     2510   1       0             0 supervisord
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495356] [  544]     0   544     1035       25   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495360] [  545]     0   545     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495364] [  547]     0   547     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495368] [  552]     0   552     5302       65   1       0             0 cron
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495372] [  553]     0   553     4753       44   1       0             0 atd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495376] [  565]     0   565     3946       46   2       0             0 irqbalance
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495380] [  571]   106   571    35760     3859   1       0             0 memcached
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495384] [  637]  1000   637   222128    36427   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495388] [  639]  1000   639   189634     4396   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495392] [  696]     0   696     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495396] [  755]     0   755    31312      372   2       0             0 console-kit-dae
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495401] [ 2404]     0  2404    35058     3644   1       0             0 carbon-cache.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495405] [ 2601]  1000  2601     5982      199   1       0             0 screen
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495409] [ 2602]  1000  2602     5576      766   0       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495413] [ 2661]     0  2661     6801       67   1       0             0 sudo
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495417] [ 2663]     0  2663     7133      851   1       0             0 run-graphite-de
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495421] [ 2668]     0  2668     1056       25   1       0             0 sh
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495425] [ 2669]     0  2669    12906     3942   1       0             0 django-admin.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495429] [ 2670]     0  2670    50313    11546   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495433] [ 2785]     0  2785    15953      489   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495437] [ 3123]  1000  3123  1184911   266122   1       0             0 java
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495443] [18135]     0 18135  5884532     9128   1       0             0 mongod
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495449] [12574]  1000 12574   966781   932357   2       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495455] [30139]  1000 30139   759010   741642   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495461] [14186]    33 14186    19905     4657   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495465] [14187]    33 14187    19726     4527   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495470] [14190]  1000 14190    23767     8459   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495474] [14201]  1000 14201  1706261  1687237   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495478] [24553]     0 24553    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495482] [24584]  1000 24584    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495486] [24585]  1000 24585     6464     1655   1       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495490] Out of memory: Kill process 14201 (python) score 439 or sacrifice child
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495509] Killed process 14201 (python) total-vm:6825044kB, anon-rss:6748848kB, file-rss:100kB

¡Gracias por tu ayuda!

Tarkeshwar
fuente
44
¿Porque no has deshabilitado oom-killer?
HopelessN00b
2
Tal vez deberías incluir información sobre cómo hacerlo.
mdpc
pmap -x <pid> muestra los valores de rss en kb. En el informe de bloqueo de Oom, la columna rss está en número de páginas. Multiplique por el tamaño de la página (4kB).
Tarkeshwar

Respuestas:

13

Parece un OOM bastante legítimo para mí:

  1. Tiene un nodo NUMA (es decir, el Nodo 0) que tiene tres zonas:
    1. DMA que tiene 15880kB gratis
    2. DMA32 que tiene 48996kB gratis
    3. Normal que tiene 11588kB gratis
  2. Casi toda su memoria 3968448kB + 10936284kBestá en la active_anonlista LRU.
  3. Los tamaños de memoria en la tabla están en páginas (que en su arquitectura probablemente sean 4kb), por lo que suman hasta 14.8Gb, no 3.7.
SaveTheRbtz
fuente
0

Ambos tienen razón 3.7Gb (4) = 14.8Gb. Cada valor de rss debe multiplicarse por el tamaño de las páginas para tener el tamaño real asignado en la memoria. En The Memory Descriptor el rss es solo el número de / * páginas asignadas * /

Saludos

Samuel Paz Nuñez
fuente