From: David Rientjes <rientjes@google.com>
To: Paul Jackson <pj@sgi.com>
Cc: clameter@sgi.com, andrea@suse.de, akpm@linux-foundation.org,
linux-mm@kvack.org
Subject: Re: [patch 4/4] oom: serialize for cpusets
Date: Thu, 28 Jun 2007 01:05:36 -0700 (PDT)
Message-ID: <alpine.DEB.0.99.0706280039510.17762@chino.kir.corp.google.com>
In-Reply-To: <20070628003334.1ed6da96.pj@sgi.com>
On Thu, 28 Jun 2007, Paul Jackson wrote:
> > There are only three cases I'm aware of (and correct me if I'm wrong) where
> > that can happen: the GFP_ATOMIC exception, tasks that have switched their
> > cpuset attachment, or a change in p->mems_allowed that left pages behind on
> > other nodes with memory_migrate set to 0.
>
> Perhaps also shared memory - shared with another task in another cpuset
> that originally placed the page, then exited, leaving the current task
> as the only one holding it.
>
That's possible, but then the user gets what he deserves because he's
chosen to share memory across cpusets.  Without an expensive search
through all system tasks, examining each task's mm to determine whether
it has allocated memory on a node that is OOMing, we can't fix that
problem.  And even then, killing such a task would only help if it were
the sole user of that memory; otherwise we kill it unnecessarily.
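To be concrete about why that search is a non-starter, here is a rough
sketch of what it would have to look like.  Note that
task_has_pages_on_node() is purely hypothetical (no such helper exists);
implementing it would mean walking every VMA and page table of each mm,
which is exactly the expensive part:

#include <linux/sched.h>

/*
 * Hypothetical sketch only: task_has_pages_on_node() does not exist.
 * Implementing it would require walking every VMA and page table of
 * the mm, which is why this search is prohibitively expensive on
 * large systems.
 */
static struct task_struct *find_task_with_memory_on_node(int nid)
{
	struct task_struct *p;

	read_lock(&tasklist_lock);
	for_each_process(p) {
		if (!p->mm)
			continue;	/* kernel thread */
		if (task_has_pages_on_node(p, nid)) {	/* hypothetical */
			get_task_struct(p);
			read_unlock(&tasklist_lock);
			return p;
		}
	}
	read_unlock(&tasklist_lock);
	return NULL;
}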
From Christoph Lameter:
> Filtering tasks is a very expensive operation on huge systems.  We have had
> cases where it took an hour or so for the OOM to complete.  OOM usually
> occurs under heavy processing loads, which makes the taking of global locks
> quite expensive.
>
The cost of this operation, as I've enabled it in the OOM killer, would
be similar to cat'ing /dev/cpuset/my_cpuset/tasks, with the exception
that it will take slightly longer if we have an elaborate hierarchy of
non-mem_exclusive cpusets.  We need to hold read_lock(&tasklist_lock)
and callback_mutex for this, but I would argue that if that is perfectly
legitimate for a system-wide (CONSTRAINT_NONE) OOM condition, then it
should be just as legitimate for a cpuset-constrained (CONSTRAINT_CPUSET)
OOM.  Surely not all "huge systems" currently use cpusets, yet they are
not noticeably hurt by this lock contention under current mainline
behavior for system-wide OOMs, or we'd be hearing complaints.  The OOM
killer does not act egregiously here.
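To be concrete about the filtering pass in question, here is a
simplified sketch.  oom_mems stands in for the nodemask of the OOMing
cpuset (however it is obtained), and the real code would also have to
descend into non-mem_exclusive child cpusets:

#include <linux/sched.h>
#include <linux/nodemask.h>

/*
 * Simplified sketch: count tasks whose memory placement intersects
 * the nodes of the OOMing cpuset.  p->mems_allowed assumes
 * CONFIG_CPUSETS.  Cost is one pass over the tasklist, roughly what
 * reading /dev/cpuset/my_cpuset/tasks costs.
 */
static int count_oom_candidates(const nodemask_t *oom_mems)
{
	struct task_struct *p;
	int candidates = 0;

	read_lock(&tasklist_lock);
	for_each_process(p) {
		if (!p->mm)
			continue;	/* kernel thread */
		if (nodes_intersects(p->mems_allowed, *oom_mems))
			candidates++;
	}
	read_unlock(&tasklist_lock);
	return candidates;
}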
Also from Christoph Lameter:
> The "kill-the-current-process" approach is most effective in hitting the
> process that is allocating the most. And as far as I can tell its easiest
> to understand for our customer.
Hmm, it probably goes without saying that I disagree with the first
sentence, or otherwise I wouldn't have written the patchset.  There is
actually no guarantee at all that current is allocating the most; it may
simply have attempted an allocation at a very unfortunate time and ended
up as the OOM killer's sacrificial lamb.
A much better way to determine the best task to kill is through the
select_bad_process() heuristics, which take things such as OOM_DISABLE
and total VM size into account when scoring tasks.  It's certainly the
fairest way of choosing a task whose death will, hopefully, alleviate
the OOM condition for that cpuset as soon as possible.  I would also
argue that going through select_bad_process() is not as great a
performance hit as you might suspect compared with git HEAD's behavior
of trying to kill current.  current may be ineligible for several
different reasons, which makes out_of_memory() a no-op; we then loop
back into __alloc_pages(), reschedule, and spin until an eligible task
happens to call out_of_memory() itself, which requires that task to
make an explicit memory allocation attempt in the first place.
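For reference, here is a greatly simplified sketch of
select_bad_process()-style scoring.  The real badness() heuristic in
mm/oom_kill.c weighs much more (CPU time, nice level, capabilities,
children's VM sizes); this keeps only the two factors named above:

#include <linux/sched.h>
#include <linux/oom.h>		/* OOM_DISABLE */

/*
 * Greatly simplified sketch of select_bad_process()-style scoring:
 * skip unkillable tasks, score the rest by total VM size, pick the
 * largest.  Caller holds read_lock(&tasklist_lock).
 */
static struct task_struct *pick_victim(void)
{
	struct task_struct *p, *victim = NULL;
	unsigned long maxpoints = 0;

	for_each_process(p) {
		unsigned long points;

		if (!p->mm)
			continue;		/* kernel thread */
		if (p->oomkilladj == OOM_DISABLE)
			continue;		/* marked unkillable */
		points = p->mm->total_vm;	/* VM size in pages */
		if (points > maxpoints) {
			maxpoints = points;
			victim = p;
		}
	}
	return victim;
}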
David
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>