From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: akpm@linux-foundation.org, oleg@redhat.com, rientjes@google.com,
linux-mm@kvack.org
Subject: Re: [PATCH] mm,oom_reaper: Use try_oom_reaper() for reapability test.
Date: Thu, 14 Apr 2016 16:30:32 +0200
Message-ID: <20160414143031.GJ2850@dhcp22.suse.cz>
In-Reply-To: <201604142301.BJG51570.LFSOOVFMHJQtOF@I-love.SAKURA.ne.jp>
On Thu 14-04-16 23:01:41, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 14-04-16 20:34:18, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > [...]
> > > > The patch seems correct, I just do not see any point in it because I do
> > > > not think it handles any real-life situation. I basically consider any
> > > > workload where only _certain_ thread(s) or process(es) sharing the mm have
> > > > OOM_SCORE_ADJ_MIN set as invalid. Why should we care about those? This
> > > > requires root to cripple the system. Or am I missing a valid
> > > > configuration where this would make any sense?
> > >
> > > Because __oom_reap_task() as of current linux.git marks only one of the
> > > thread groups as OOM_SCORE_ADJ_MIN and happily disables further reaping
> > > (I'm relying on that behavior for catching bugs which occur under an
> > > almost-OOM situation).
> >
> > I am not really sure I understand what you mean here. Let me try. You
> > have N tasks sharing the same mm. OOM killer selects one of them and
> > kills it, grants TIF_MEMDIE and schedules it for oom_reaper. Now the oom
> > reaper handles that task and marks it OOM_SCORE_ADJ_MIN. Others will
> > have fatal_signal_pending without OOM_SCORE_ADJ_MIN. The shared mm was
> > already reaped so there is not much left we can do about it. What now?
>
> You finally understood what I mean here.
OK, good to know we are on the same page.
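For context, the relevant tail of __oom_reap_task() essentially does the
following (paraphrased from memory, not a verbatim quote of the current
code):

        /* the mm has been reaped; nothing more the OOM killer can gain here */
        tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
        /* ... TIF_MEMDIE is also dropped so the task stops using memory reserves */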
> Say, there are TG1 and TG2 sharing the same mm which are marked as
> OOM_SCORE_ADJ_MAX. First round of the OOM killer selects TG1 and sends
> SIGKILL to TG1 and TG2. The OOM reaper reaps memory via TG1 and marks
> TG1 as OOM_SCORE_ADJ_MIN and revokes TIF_MEMDIE from TG1. Then, next
> round of the OOM killer selects TG2 and sends SIGKILL to TG1 and TG2.
> But since TG1 is already marked as OOM_SCORE_ADJ_MIN by the OOM reaper,
> the OOM reaper is not called.
which doesn't matter, because this mm has already been reaped and further
attempts are basically doomed to fail as well. This is the reason why I
completely failed to see your point previously: it is not the oom reaper
which makes the situation worse; we just never cared about this possible
case.
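(For reference, the reason the reaper is not invoked for the second round
is the sharer check in oom_kill_process(), which goes roughly like this -
paraphrased rather than quoted verbatim:)

        for_each_process(p) {
                if (!process_shares_mm(p, mm) || same_thread_group(p, victim))
                        continue;
                if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p) ||
                    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
                        /*
                         * An oom-unkillable sharer means the memory might
                         * still be in use, so reaping is not safe.
                         */
                        can_oom_reap = false;
                        continue;
                }
                do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
        }
        if (can_oom_reap)
                wake_oom_reaper(victim);

Once the reaper itself has set OOM_SCORE_ADJ_MIN on TG1, any later round
for TG2 trips over that very check.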
> This is a situation which the patch you show below will solve.
OK, great.
[...]
> Michal Hocko wrote:
> > On Thu 14-04-16 14:01:06, Michal Hocko wrote:
> > [...]
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 716759e3eaab..d5a4d08f2031 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -286,6 +286,13 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
> > >                  return OOM_SCAN_CONTINUE;
> > >
> > >          /*
> > > +        * mm of this task has already been reaped so it doesn't make any
> > > +        * sense to select it as a new oom victim.
> > > +        */
> > > +       if (test_bit(MMF_OOM_REAPED, &task->mm->flags))
> > > +               return OOM_SCAN_CONTINUE;
> >
> > This will have to move to oom_badness to where we check for
> > OOM_SCORE_ADJ_MIN to catch the case where we try to sacrifice a child...
>
> oom_badness() should return 0 if MMF_OOM_REAPED is set (please be careful
> with the race of task->mm becoming NULL).
This is what I ended up with. Patch below for reference. I plan to repost
it together with the other 3 patches posted recently as one series,
hopefully sometime next week.
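(Not the actual patch - just to illustrate the shape of the oom_badness()
check discussed above, with find_lock_task_mm() covering the race of
task->mm going NULL that you mention:)

        p = find_lock_task_mm(p);
        if (!p)
                return 0;

        /*
         * Do not even consider tasks which are explicitly marked oom
         * unkillable or whose mm has already been reaped.
         */
        adj = (long)p->signal->oom_score_adj;
        if (adj == OOM_SCORE_ADJ_MIN ||
            test_bit(MMF_OOM_REAPED, &p->mm->flags)) {
                task_unlock(p);
                return 0;
        }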
> But oom_scan_process_thread() should not
> return OOM_SCAN_ABORT if one of the threads in TG1 or TG2 still has
> TIF_MEMDIE (because it is possible that one of the threads in TG1 or TG2
> gets TIF_MEMDIE via the fatal_signal_pending(current) shortcut in
> out_of_memory()).
This would be a separate patch again. I still have to think about how to
deal with this case, but the most straightforward thing to do would be to
simply disable those shortcuts for cross-process shared mm-s. They are just
too weird, and I do not think we want to support all the potential corner
cases; dropping an optimistic heuristic in the name of overall sanity
sounds like a good compromise to me.
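(For the record, the shortcut in question in out_of_memory() looks roughly
like this:

        if (current->mm &&
            (fatal_signal_pending(current) || task_will_free_mem(current))) {
                mark_oom_victim(current);
                return true;
        }

so disabling it for cross-process shared mm-s would mean adding one more
condition there; the exact form is still to be decided.)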
---