From: Andrea Arcangeli <andrea@suse.de>
To: linux-mm@kvack.org
Subject: [PATCH 00 of 16] OOM related fixes
Date: Fri, 08 Jun 2007 22:02:58 +0200
Message-ID: <patchbomb.1181332978@v2.random>

Hello everyone,

this is a set of fixes developed against a quite evil workload: many tasks
reading large files from nfs in parallel, with big read buffers, until the
system goes oom. Almost all of these fixes seem to be required to fix the
customer workload on top of an older sles kernel. The forward port of the
fixes has already been tested successfully on similar evil workloads.
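
For readers who want a feel for the shape of such a workload, here is a
minimal sketch. It is not the recursive_readd program that shows up in the
logs below (its source is not part of this message); the function names, the
default buffer size and the default worker count are all assumptions made for
illustration. Whether it drives a machine to oom depends on the amount of
RAM, the file sizes and the number of readers.

```python
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path, buf_size):
    """Read `path` sequentially in `buf_size` chunks; return bytes read."""
    total = 0
    # buffering=0 gives a raw file object, so each read() asks the kernel
    # for up to buf_size bytes directly instead of using Python's buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def parallel_read(paths, buf_size=64 * 1024 * 1024, workers=32):
    """Read every file in `paths` concurrently; return total bytes read.

    buf_size and workers are hypothetical defaults, not values taken
    from the original test setup.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: read_whole_file(p, buf_size), paths))
```

To approximate the scenario described above, `paths` would point at large
files on an nfs mount and `workers` would be raised until memory pressure
builds up.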

mainline vanilla running a somewhat simulated workload:

Jun  8 06:06:56 kvm kernel: Out of memory: Killed process 3282 (klauncher).
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 3002 (qmgr) score 11225 or a child
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 3001 (pickup) score 11216 or a child
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 2186 (hald) score 11004 or a child
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 3515 (bash) score 9447 or a child
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 2186 (hald) score 8558 or a child
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 2142 (dbus-daemon) score 5591 or a child
Jun  8 06:17:35 kvm kernel: Out of memory: kill process 3549 (recursive_readd) score 4597 or a child
Jun  8 06:17:43 kvm kernel: Out of memory: kill process 3591 (pickup) score 9756 or a child
Jun  8 06:17:43 kvm kernel: Out of memory: kill process 2204 (hald-addon-acpi) score 4121 or a child
Jun  8 06:17:43 kvm kernel: Out of memory: kill process 3515 (bash) score 3808 or a child
Jun  8 06:17:45 kvm kernel: Out of memory: kill process 3555 (recursive_readd) score 2330 or a child
Jun  8 06:17:53 kvm kernel: Out of memory: kill process 3554 (recursive_readd) score 2605 or a child
Jun  8 06:18:00 kvm kernel: Out of memory: kill process 3170 (nscd) score 1985 or a child
Jun  8 06:18:00 kvm kernel: Out of memory: kill process 3187 (nscd) score 1985 or a child
Jun  8 06:18:00 kvm kernel: Out of memory: kill process 3188 (nscd) score 1985 or a child
Jun  8 06:18:00 kvm kernel: Out of memory: kill process 2855 (portmap) score 1965 or a child
Jun  8 06:18:00 kvm kernel: Out of memory: kill process 3551 (recursive_readd) score 859 or a child
[ eventually it deadlocks and stops killing new tasks ]

mainline + fixes running the same simulated workload:

Jun  8 13:35:32 kvm kernel: Out of memory: kill process 3494 (recursive_readd) score 3822 or a child
Jun  8 13:35:33 kvm kernel: Out of memory: kill process 3494 (recursive_readd) score 3822 or a child
Jun  8 13:35:33 kvm kernel: Out of memory: kill process 3494 (recursive_readd) score 3822 or a child
Jun  8 13:37:33 kvm kernel: Out of memory: kill process 3505 (recursive_readd) score 622 or a child
Jun  8 13:37:34 kvm kernel: Out of memory: kill process 3510 (recursive_readd) score 418 or a child
Jun  8 13:37:36 kvm kernel: Out of memory: kill process 3535 (recursive_readd) score 377 or a child
Jun  8 13:37:36 kvm kernel: Out of memory: kill process 3498 (recursive_readd) score 370 or a child
Jun  8 13:37:36 kvm kernel: Out of memory: kill process 3516 (recursive_readd) score 364 or a child
Jun  8 13:37:36 kvm kernel: Out of memory: kill process 3515 (recursive_readd) score 357 or a child
Jun  8 13:40:49 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:50 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:51 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:51 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:51 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:40:51 kvm kernel: Out of memory: kill process 3537 (recursive_readd) score 2391 or a child
Jun  8 13:41:55 kvm kernel: Out of memory: kill process 3558 (recursive_readd) score 356 or a child
Jun  8 13:41:56 kvm kernel: Out of memory: kill process 3578 (recursive_readd) score 355 or a child
Jun  8 13:41:56 kvm kernel: Out of memory: kill process 3577 (recursive_readd) score 350 or a child
Jun  8 13:41:56 kvm kernel: Out of memory: kill process 3572 (recursive_readd) score 347 or a child
Jun  8 13:41:56 kvm kernel: Out of memory: kill process 3568 (recursive_readd) score 346 or a child
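
One way to compare the two runs above is to tally the oom killer selections
by process name. The following is a minimal sketch that assumes the syslog
line format shown in the logs above; the regular expression and the
function name are illustrative, not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches both "Killed process N (name)." and
# "kill process N (name) score S or a child" lines.
OOM_LINE = re.compile(
    r"Out of memory: [Kk]ill(?:ed)? process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r"(?: score (?P<score>\d+))?"
)


def tally_victims(lines):
    """Count oom killer selections per process name."""
    counts = Counter()
    for line in lines:
        m = OOM_LINE.search(line)
        if m:
            counts[m.group("comm")] += 1
    return counts
```

Fed the vanilla log, the tally includes system daemons such as hald, nscd and
portmap next to recursive_readd; fed the log with the fixes applied, every
selection is a recursive_readd task. Note that this counts log lines, so a
pid that is selected repeatedly (like 3537 above) is counted once per line.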

The oom deadlock detection triggers a couple of times against the PG_locked
deadlock:

Jun  8 13:51:19 kvm kernel: Killed process 3504 (recursive_readd)
Jun  8 13:51:19 kvm kernel: detected probable OOM deadlock, so killing another task
Jun  8 13:51:19 kvm kernel: Out of memory: kill process 3532 (recursive_readd) score 1225 or a child

Example stack trace of a TIF_MEMDIE killed task (I did not literally verify
that this was the one with TIF_MEMDIE set, but the trace is the same as an
earlier one that I did verify):

recursive_rea D ffff810001056418     0  3548   3544 (NOTLB)
 ffff81000e57dba8 0000000000000082 ffff8100010af5e8 ffff8100148df730
 ffff81001ff3ea10 0000000000bd2e1b ffff8100148df908 0000000000000046
 ffff81001fd5f170 ffffffff8031c36d ffff81001fd5f170 ffff810001056418
Call Trace:
 [<ffffffff8031c36d>] __generic_unplug_device+0x13/0x24
 [<ffffffff80244163>] sync_page+0x0/0x40
 [<ffffffff804cdf5b>] io_schedule+0xf/0x17
 [<ffffffff8024419e>] sync_page+0x3b/0x40
 [<ffffffff804ce162>] __wait_on_bit_lock+0x36/0x65
 [<ffffffff80244150>] __lock_page+0x5e/0x64
 [<ffffffff802321f1>] wake_bit_function+0x0/0x23
 [<ffffffff802440c0>] find_get_page+0xe/0x40
 [<ffffffff80244a33>] do_generic_mapping_read+0x200/0x450
 [<ffffffff80243f26>] file_read_actor+0x0/0x11d
 [<ffffffff80247fd4>] get_page_from_freelist+0x2d3/0x36e
 [<ffffffff802464d0>] generic_file_aio_read+0x11d/0x159
 [<ffffffff80260bdc>] do_sync_read+0xc9/0x10c
 [<ffffffff80252adb>] vma_merge+0x10c/0x195
 [<ffffffff802321c3>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80253a06>] do_mmap_pgoff+0x5e1/0x74c
 [<ffffffff8026134d>] vfs_read+0xaa/0x132
 [<ffffffff80261662>] sys_read+0x45/0x6e
 [<ffffffff8020991e>] system_call+0x7e/0x83


Thread overview: 77+ messages
2007-06-08 20:02 Andrea Arcangeli [this message]
2007-06-08 20:02 ` [PATCH 01 of 16] remove nr_scan_inactive/active Andrea Arcangeli
2007-06-10 17:36   ` Rik van Riel
2007-06-10 18:17     ` Andrea Arcangeli
2007-06-11 14:58       ` Rik van Riel
2007-06-26 17:08       ` Rik van Riel
2007-06-26 17:55         ` Andrew Morton
2007-06-26 19:02           ` Rik van Riel
2007-06-28 22:44           ` Rik van Riel
2007-06-28 22:57             ` Andrew Morton
2007-06-28 23:04               ` Rik van Riel
2007-06-28 23:13                 ` Andrew Morton
2007-06-28 23:16                   ` Rik van Riel
2007-06-28 23:29                     ` Andrew Morton
2007-06-29  0:00                       ` Rik van Riel
2007-06-29  0:19                         ` Andrew Morton
2007-06-29  0:45                           ` Rik van Riel
2007-06-29  1:12                             ` Andrew Morton
2007-06-29  1:20                               ` Rik van Riel
2007-06-29  1:29                                 ` Andrew Morton
2007-06-28 23:25                   ` Andrea Arcangeli
2007-06-29  0:12                     ` Andrew Morton
2007-06-29 13:38             ` Lee Schermerhorn
2007-06-29 14:12               ` Andrea Arcangeli
2007-06-29 14:59                 ` Rik van Riel
2007-06-29 22:39                 ` "Noreclaim Infrastructure" [was Re: [PATCH 01 of 16] remove nr_scan_inactive/active] Lee Schermerhorn
2007-06-29 22:42                 ` RFC "Noreclaim Infrastructure - patch 1/3 basic infrastructure" Lee Schermerhorn
2007-06-29 22:44                 ` RFC "Noreclaim Infrastructure patch 2/3 - noreclaim statistics..." Lee Schermerhorn
2007-06-29 22:49                 ` "Noreclaim - client patch 3/3 - treat pages w/ excessively references anon_vma as nonreclaimable" Lee Schermerhorn
2007-06-26 20:37         ` [PATCH 01 of 16] remove nr_scan_inactive/active Andrea Arcangeli
2007-06-26 20:57           ` Rik van Riel
2007-06-26 22:21             ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 02 of 16] avoid oom deadlock in nfs_create_request Andrea Arcangeli
2007-06-10 17:38   ` Rik van Riel
2007-06-10 18:27     ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 03 of 16] prevent oom deadlocks during read/write operations Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 04 of 16] serialize oom killer Andrea Arcangeli
2007-06-09  6:43   ` Peter Zijlstra
2007-06-09 15:27     ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 05 of 16] avoid selecting already killed tasks Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 06 of 16] reduce the probability of an OOM livelock Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 07 of 16] balance_pgdat doesn't return the number of pages freed Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 08 of 16] don't depend on PF_EXITING tasks to go away Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 09 of 16] fallback killing more tasks if tif-memdie doesn't " Andrea Arcangeli
2007-06-08 21:57   ` Christoph Lameter
2007-06-08 20:03 ` [PATCH 10 of 16] stop useless vm trashing while we wait the TIF_MEMDIE task to exit Andrea Arcangeli
2007-06-08 21:48   ` Christoph Lameter
2007-06-09  1:59     ` Andrea Arcangeli
2007-06-09  3:01       ` Christoph Lameter
2007-06-09 14:05         ` Andrea Arcangeli
2007-06-09 14:38           ` Andrea Arcangeli
2007-06-11 16:07             ` Christoph Lameter
2007-06-11 16:50               ` Andrea Arcangeli
2007-06-11 16:57                 ` Christoph Lameter
2007-06-11 17:51                   ` Andrea Arcangeli
2007-06-11 17:56                     ` Christoph Lameter
2007-06-11 18:22                       ` Andrea Arcangeli
2007-06-11 18:39                         ` Christoph Lameter
2007-06-11 18:58                           ` Andrea Arcangeli
2007-06-11 19:25                             ` Christoph Lameter
2007-06-11 16:04           ` Christoph Lameter
2007-06-08 20:03 ` [PATCH 11 of 16] the oom schedule timeout isn't needed with the VM_is_OOM logic Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 12 of 16] show mem information only when a task is actually being killed Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 13 of 16] simplify oom heuristics Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 14 of 16] oom select should only take rss into account Andrea Arcangeli
2007-06-10 17:17   ` Rik van Riel
2007-06-10 17:30     ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 15 of 16] limit reclaim if enough pages have been freed Andrea Arcangeli
2007-06-10 17:20   ` Rik van Riel
2007-06-10 17:32     ` Andrea Arcangeli
2007-06-10 17:52       ` Rik van Riel
2007-06-11 16:23         ` Christoph Lameter
2007-06-11 16:57           ` Rik van Riel
2007-06-08 20:03 ` [PATCH 16 of 16] avoid some lock operation in vm fast path Andrea Arcangeli
2007-06-08 21:26 ` [PATCH 00 of 16] OOM related fixes William Lee Irwin III
2007-06-09 14:55   ` Andrea Arcangeli
2007-06-12  8:58     ` Petr Tesarik
