From: David Rientjes <rientjes@google.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
anfei <anfei.zhou@gmail.com>,
nishimura@mxp.nes.nec.co.jp,
Balbir Singh <balbir@linux.vnet.ibm.com>,
linux-mm@kvack.org
Subject: Re: [patch -mm] memcg: make oom killer a no-op when no killable task can be found
Date: Thu, 8 Apr 2010 11:05:58 -0700 (PDT) [thread overview]
Message-ID: <alpine.DEB.2.00.1004081036520.25592@chino.kir.corp.google.com> (raw)
In-Reply-To: <20100407205418.FB90.A69D9226@jp.fujitsu.com>
On Wed, 7 Apr 2010, KOSAKI Motohiro wrote:
> > > > oom-badness-heuristic-rewrite.patch
> > >
> > > Do you have any specific feedback that you could offer on why you decided
> > > to nack this?
> > >
> >
> > I like this patch. But I think no one can't Ack this because there is no
> > "correct" answer. At least, this show good behavior on my environment.
>
> see diffstat. that's perfectly crap, obviously need to make separate patches
> individual one. Who can review it?
>
> Documentation/filesystems/proc.txt | 95 ++++----
> Documentation/sysctl/vm.txt | 21 +
> fs/proc/base.c | 98 ++++++++
> include/linux/memcontrol.h | 8
> include/linux/oom.h | 17 +
> include/linux/sched.h | 3
> kernel/fork.c | 1
> kernel/sysctl.c | 9
> mm/memcontrol.c | 18 +
> mm/oom_kill.c | 319 ++++++++++++++-------------
> 10 files changed, 404 insertions(+), 185 deletions(-)
>
> additional commets is in below.
>
This specific change cannot be broken down into individual patches as much
as I'd like to. It's a complete rewrite of the badness() function and
requires two new tunables to be introduced, determination of the amount of
memory available to current, formals being changed around, and
documentation.
A review tip: the change itself is in the rewrite of the function now
called oom_badness(), so I recommend applying downloading mmotm and
reading it there as well as the documentation change. The remainder of
the patch fixes up the various callers of that function and isn't
interesting.
> If you suggest to revert pagefault_oom itself, it is considerable. but
> even though I don't think so.
>
> quote nick's mail
>
> The thing I should explain is that user interfaces are most important
> for their intended semantics. We don't generally call bugs or oversights
> part of the interface, and they are to be fixed unless some program
> relies on them.
>
I disagree, I believe the long-standing semantics of user interfaces such
as panic_on_oom are more important than what the name implies or what it
was intended for when it was introduced.
> Nowhere in the vm documentation does it say anything about "pagefault
> ooms", and even in the kernel code, even to mm developers (who mostly
> don't care about oom killer) probably wouldn't immediately think of
> pagefault versus any other type of oom.
>
> Given that, do you think it is reasonable, when panic_on_oom is set,
> to allow a process to be killed due to oom condition? Or do you think
> that was an oversight of the implementation?
>
Users have a well-defined and long-standing method of protecting their
applications from oom kill and that is OOM_DISABLE. With my patch, if
current is unkillable because it is OOM_DISABLE, then we fallback to a
tasklist scan iff panic_on_oom is unset.
> Regardless of what architectures currently do. Yes there is a
> consistency issue, and it should have been fixed earlier, but the
> consistency issue goes both ways now. Some (the most widely tested
> and used, if that matters) architectures, do it the right way.
>
> So, this patch is purely backstep. it break panic_on_oom.
> If anyone post "pagefault_out_of_memory() aware pagefault for ppc" or
> something else architecture, I'm glad and ack it.
>
It's not a backstep, it's making all architectures consistent as it sits
right now in mmotm. If someone would like to change all VM_FAULT_OOM
handlers to do a tasklist scan and not default to killing current, that is
an extension of this patchset. Likewise, if we want to ensure
panic_on_oom is respected even for pagefault ooms, then we need to do that
on all architectures so that we don't have multiple definitions depending
on machine type. The semantics of a sysctl shouldn't depend on the
architecture and right now it does, so this patch fixes that. In other
words: if you want to extend the definition of panic_on_oom, then do so
completely for all architectures first and then add it to the
documentation.
> > > > oom-deprecate-oom_adj-tunable.patch
> > >
> > > Alan had a concern about removing /proc/pid/oom_adj, or redefining it with
> > > different semantics as I originally did, and then I updated the patchset
> > > to deprecate the old tunable as Andrew suggested.
> > >
> > > My somewhat arbitrary time of removal was approximately 18 months from
> > > the date of deprecation which would give us 5-6 major kernel releases in
> > > between. If you think that's too early of a deadline, then I'd happily
> > > extend it by 6 months or a year.
> > >
> > > Keeping /proc/pid/oom_adj around indefinitely isn't very helpful if
> > > there's a finer grained alternative available already unless you want
> > > /proc/pid/oom_adj to actually mean something in which case you'll never be
> > > able to seperate oom badness scores from bitshifts. I believe everyone
> > > agrees that a more understood and finer grained tunable is necessary as
> > > compared to the current implementation that has very limited functionality
> > > other than polarizing tasks.
>
> The problem is, oom_adj is one of most widely used knob. it is not only used
> admin, but also be used applications. in addition, oom_score_adj is bad interface
> and no good to replace oom_adj. kamezawa-san, as following your mentioned.
>
oom_adj is retained but deprecated, so I'm not sure what you're suggesting
here. Do you think we should instead keep oom_adj forever in parallel
with oom_score_adj? It's quite clear that a more powerful, finer-grained
solution is necessary than what oom_adj provides. I believe the
deprecation for 5-6 major kernel releases is enough, but we can certainly
talk about extending that by a year if you'd like.
Can you elaborate on why you believe oom_score_adj is a bad interface or
have had problems with it in your personal use?
> agreed. oom_score_adj is completely crap. should gone.
> but also following pseudo scaling adjustment is crap too. it don't consider
> both page sharing and mlock pages. iow, it never works correctly.
>
>
> + points = (get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS)) * 1000 /
> + totalpages;
>
That baseline actually does work much better than total_vm as we've
discussed multiple times on LKML leading up to the development of this
series, but if you'd like to propose additional considerations into the
heuristic, than please do so.
> > > > oom-replace-sysctls-with-quick-mode.patch
> > > >
> > > > IIRC, alan and nick and I NAKed such patch. everybody explained the reason.
> > >
> > > Which patch of the four you listed are you referring to here?
> > >
> > replacing used sysctl is bad idea, in general.
> >
> > I have no _strong_ opinion. I welcome the patch series. But aboves are my concern.
> > Thank you for your work.
>
> I really hate "that is _inteltional_ regression" crap. now almost developers
> ignore a bug report and don't join problem investigate works. I and very few
> people does that. (ok, I agree you are in such few developers, thanks)
>
> Why can't we discard it simplely? please don't make crap.
>
Perhaps you don't understand. The users of oom_kill_allocating_task are
those systems that have extremely large tasklists and so iterating through
it comes at a substantial cost. It was originally requested by SGI
because they preferred an alternative to the tasklist scan used for
cpuset-constrained ooms and were satisfied with simply killing something
quickly instead of iterating the tasklist.
This patchset, however, enables oom_dump_tasks by default because it
provides useful information to the user to understand the memory use of
their applications so they can hopefully determine why the oom occurred.
This requires a tasklist scan itself, so those same users of
oom_kill_allocating_task are no longer protected from that cost by simply
setting this sysctl. They must also disable oom_dump_tasks or we're at
the same efficiency that we were before oom_kill_allocating_task was
introduced.
Since they must modify their startup scripts, and since the users of both
of these sysctls are the same and nobody would use one without the other,
it should be possible to consolidate them into a single sysctl. If
additional changes are made to the oom killer in the future, it would then
be possible to test for this single sysctl, oom_kill_quick, instead
without introducing additional sysctls and polluting procfs.
Thus, it's completely unnecessary to keep oom_kill_allocating_task and we
can redefine it for those systems. What alternatives do you have in mind
or what part of this logic do you not agree with?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-04-08 18:06 UTC|newest]
Thread overview: 116+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-24 16:25 [PATCH] oom killer: break from infinite loop Anfei Zhou
2010-03-25 2:51 ` KOSAKI Motohiro
2010-03-26 22:08 ` Andrew Morton
2010-03-26 22:33 ` Oleg Nesterov
2010-03-28 14:55 ` anfei
2010-03-28 16:28 ` Oleg Nesterov
2010-03-28 21:21 ` David Rientjes
2010-03-29 11:21 ` Oleg Nesterov
2010-03-29 20:49 ` [patch] oom: give current access to memory reserves if it has been killed David Rientjes
2010-03-30 15:46 ` Oleg Nesterov
2010-03-30 20:26 ` David Rientjes
2010-03-31 17:58 ` Oleg Nesterov
2010-03-31 20:47 ` Oleg Nesterov
2010-04-01 8:35 ` David Rientjes
2010-04-01 8:57 ` [patch -mm] oom: hold tasklist_lock when dumping tasks David Rientjes
2010-04-01 14:27 ` Oleg Nesterov
2010-04-01 19:16 ` David Rientjes
2010-04-01 13:59 ` [patch] oom: give current access to memory reserves if it has been killed Oleg Nesterov
2010-04-01 19:12 ` David Rientjes
2010-04-02 11:14 ` Oleg Nesterov
2010-04-02 18:30 ` [PATCH -mm 0/4] oom: linux has threads Oleg Nesterov
2010-04-02 18:31 ` [PATCH -mm 1/4] oom: select_bad_process: check PF_KTHREAD instead of !mm to skip kthreads Oleg Nesterov
2010-04-02 19:05 ` David Rientjes
2010-04-02 18:32 ` [PATCH -mm 2/4] oom: select_bad_process: PF_EXITING check should take ->mm into account Oleg Nesterov
2010-04-06 11:42 ` anfei
2010-04-06 12:18 ` Oleg Nesterov
2010-04-06 13:05 ` anfei
2010-04-06 13:38 ` Oleg Nesterov
2010-04-02 18:32 ` [PATCH -mm 3/4] oom: introduce find_lock_task_mm() to fix !mm false positives Oleg Nesterov
2010-04-02 18:33 ` [PATCH -mm 4/4] oom: oom_forkbomb_penalty: move thread_group_cputime() out of task_lock() Oleg Nesterov
2010-04-02 19:04 ` David Rientjes
2010-04-05 14:23 ` [PATCH -mm] oom: select_bad_process: never choose tasks with badness == 0 Oleg Nesterov
2010-04-02 19:02 ` [patch] oom: give current access to memory reserves if it has been killed David Rientjes
2010-04-02 19:14 ` Oleg Nesterov
2010-04-02 19:46 ` David Rientjes
2010-04-02 19:54 ` [patch -mm] oom: exclude tasks with badness score of 0 from being selected David Rientjes
2010-04-02 21:04 ` Oleg Nesterov
2010-04-02 21:22 ` [patch -mm v2] " David Rientjes
2010-04-02 20:55 ` [patch] oom: give current access to memory reserves if it has been killed Oleg Nesterov
2010-03-31 21:07 ` David Rientjes
2010-03-31 22:50 ` Oleg Nesterov
2010-03-31 23:30 ` Oleg Nesterov
2010-03-31 23:48 ` David Rientjes
2010-04-01 14:39 ` Oleg Nesterov
2010-04-01 18:58 ` David Rientjes
2010-04-01 8:25 ` David Rientjes
2010-04-01 15:26 ` Oleg Nesterov
2010-04-08 21:08 ` David Rientjes
2010-04-09 12:38 ` Oleg Nesterov
2010-03-30 16:39 ` [PATCH] oom: fix the unsafe proc_oom_score()->badness() call Oleg Nesterov
2010-03-30 17:43 ` [PATCH -mm] proc: don't take ->siglock for /proc/pid/oom_adj Oleg Nesterov
2010-03-30 20:30 ` David Rientjes
2010-03-31 9:17 ` Oleg Nesterov
2010-03-31 18:59 ` Oleg Nesterov
2010-03-31 21:14 ` David Rientjes
2010-03-31 23:00 ` Oleg Nesterov
2010-04-01 8:32 ` David Rientjes
2010-04-01 15:37 ` Oleg Nesterov
2010-04-01 19:04 ` David Rientjes
2010-03-30 20:32 ` [PATCH] oom: fix the unsafe proc_oom_score()->badness() call David Rientjes
2010-03-31 9:16 ` Oleg Nesterov
2010-03-31 20:17 ` Oleg Nesterov
2010-04-01 7:41 ` David Rientjes
2010-04-01 13:13 ` [PATCH 0/1] oom: fix the unsafe usage of badness() in proc_oom_score() Oleg Nesterov
2010-04-01 13:13 ` [PATCH 1/1] " Oleg Nesterov
2010-04-01 19:03 ` David Rientjes
2010-03-29 14:06 ` [PATCH] oom killer: break from infinite loop anfei
2010-03-29 20:01 ` David Rientjes
2010-03-30 14:29 ` anfei
2010-03-30 20:29 ` David Rientjes
2010-03-31 0:57 ` KAMEZAWA Hiroyuki
2010-03-31 6:07 ` David Rientjes
2010-03-31 6:13 ` KAMEZAWA Hiroyuki
2010-03-31 6:30 ` Balbir Singh
2010-03-31 6:31 ` KAMEZAWA Hiroyuki
2010-03-31 7:04 ` David Rientjes
2010-03-31 6:32 ` David Rientjes
2010-03-31 7:08 ` [patch -mm] memcg: make oom killer a no-op when no killable task can be found David Rientjes
2010-03-31 7:08 ` KAMEZAWA Hiroyuki
2010-03-31 8:04 ` Balbir Singh
2010-03-31 10:38 ` David Rientjes
2010-04-04 23:28 ` David Rientjes
2010-04-05 21:30 ` Andrew Morton
2010-04-05 22:40 ` David Rientjes
2010-04-05 22:49 ` Andrew Morton
2010-04-05 23:01 ` David Rientjes
2010-04-06 12:08 ` KOSAKI Motohiro
2010-04-06 21:47 ` David Rientjes
2010-04-07 0:20 ` KAMEZAWA Hiroyuki
2010-04-07 13:29 ` KOSAKI Motohiro
2010-04-08 18:05 ` David Rientjes [this message]
2010-04-21 19:17 ` Andrew Morton
2010-04-21 22:04 ` David Rientjes
2010-04-22 0:23 ` KAMEZAWA Hiroyuki
2010-04-22 8:34 ` David Rientjes
2010-04-27 22:58 ` [patch -mm] oom: reintroduce and deprecate oom_kill_allocating_task David Rientjes
2010-04-28 0:57 ` KAMEZAWA Hiroyuki
2010-04-22 7:23 ` [patch -mm] memcg: make oom killer a no-op when no killable task can be found Nick Piggin
2010-04-22 7:25 ` KAMEZAWA Hiroyuki
2010-04-22 10:09 ` Nick Piggin
2010-04-22 10:27 ` KAMEZAWA Hiroyuki
2010-04-22 21:11 ` David Rientjes
2010-04-22 10:28 ` David Rientjes
2010-04-22 15:39 ` Nick Piggin
2010-04-22 21:09 ` David Rientjes
2010-05-04 23:55 ` David Rientjes
2010-04-08 17:36 ` David Rientjes
2010-04-02 10:17 ` [PATCH] oom killer: break from infinite loop Mel Gorman
2010-04-04 23:26 ` David Rientjes
2010-04-05 10:47 ` Mel Gorman
2010-04-06 22:40 ` David Rientjes
2010-03-29 11:31 ` anfei
2010-03-29 11:46 ` Oleg Nesterov
2010-03-29 12:09 ` anfei
2010-03-28 2:46 ` David Rientjes
2010-05-04 23:51 [patch -mm] memcg: make oom killer a no-op when no killable task can be found David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.00.1004081036520.25592@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=anfei.zhou@gmail.com \
--cc=balbir@linux.vnet.ibm.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=nishimura@mxp.nes.nec.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox