From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@suse.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
rientjes@google.com, mjaggi@caviumnetworks.com, mgorman@suse.de,
oleg@redhat.com, vdavydov.dev@gmail.com, vbabka@suse.cz
Subject: Re: [RFC PATCH 2/2] mm,oom: Try last second allocation after selecting an OOM victim.
Date: Sat, 9 Sep 2017 09:55:00 +0900 [thread overview]
Message-ID: <201709090955.HFA57316.QFOSVMtFOJLFOH@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20170825080020.GE25498@dhcp22.suse.cz>
There has been no response to your suggestion. Can we agree on going in this direction?
If there is still no response, I will for now push the "ignore MMF_OOM_SKIP once" approach.
Michal Hocko wrote:
> On Thu 24-08-17 23:40:36, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Thu 24-08-17 21:18:26, Tetsuo Handa wrote:
> > > > Manish Jaggi noticed that running the LTP oom01/oom02 tests on a
> > > > system with a high core count causes random kernel panics when the
> > > > OOM killer selects an OOM victim that consumed memory in a way the
> > > > OOM reaper cannot help with [1].
> > > >
> > > > Since commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
> > > > oom_reaped tasks") changed task_will_free_mem(current) in out_of_memory()
> > > > to return false as soon as MMF_OOM_SKIP is set, many threads sharing the
> > > > victim's mm were not able to try allocation from memory reserves after the
> > > > OOM reaper gave up reclaiming memory.
> > > >
> > > > I proposed a patch which allows task_will_free_mem(current) in
> > > > out_of_memory() to ignore MMF_OOM_SKIP once, so that all OOM victim
> > > > threads are guaranteed to have tried an ALLOC_OOM allocation before
> > > > we start selecting the next OOM victim [2], because Michal Hocko did
> > > > not like calling get_page_from_freelist() from the OOM killer, which
> > > > is a layering violation [3]. But now Michal thinks that calling
> > > > get_page_from_freelist() after the task_will_free_mem(current) test
> > > > is better than allowing task_will_free_mem(current) to ignore
> > > > MMF_OOM_SKIP once [4], because it would also help other cases where
> > > > we race with an exiting task or somebody managed to free memory
> > > > while we were selecting an OOM victim, which can take quite some
> > > > time.
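
For those joining the thread here, the control flow under discussion looks
roughly like the sketch below. This is only an illustration, not the RFC patch
itself; out_of_memory(), task_will_free_mem(), select_bad_process() and
oom_kill_process() are the real mm/oom_kill.c functions, but
try_last_second_alloc() is a hypothetical placeholder for the
get_page_from_freelist() call being debated, and the function is heavily
abridged.

/* Abridged sketch of mm/oom_kill.c::out_of_memory(), for illustration only. */
bool out_of_memory(struct oom_control *oc)
{
        /* ... notifiers, !__GFP_FS bail-out, oom_kill_allocating_task, etc. omitted ... */

        /*
         * If current is already exiting or killed and its memory is still
         * reclaimable, let it use memory reserves instead of killing more
         * tasks.  Since commit 696453e66630ad45 this shortcut is refused as
         * soon as the OOM reaper sets MMF_OOM_SKIP on the victim's mm.
         */
        if (task_will_free_mem(current)) {
                mark_oom_victim(current);
                wake_oom_reaper(current);
                return true;
        }

        select_bad_process(oc);         /* can take a long time on big systems */

        /*
         * Proposed last-second attempt (hypothetical helper): memory may have
         * been freed while we were selecting a victim, so try the freelists
         * once more before killing.  Reaching into the page allocator's
         * get_page_from_freelist() from here is the layering concern above.
         */
        if (oc->chosen && try_last_second_alloc(oc))
                return true;

        if (oc->chosen)
                oom_kill_process(oc, "Out of memory");
        return !!oc->chosen;
}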
> > >
> > > This is a lot of text which can be more confusing than helpful. Could you
> > > state the problem clearly without detours? Yes, the oom killer selection
> > > can race with those freeing memory. And it has been like that since
> > > basically ever.
> >
> > The problem which Manish Jaggi reported (and which I can still reproduce)
> > is that the OOM killer gives up on an mm with MMF_OOM_SKIP set too early.
> > The problem became real in 4.8 due to commit 696453e66630ad45 ("mm, oom:
> > task_will_free_mem should skip oom_reaped tasks"). Thus, it has _not_ been
> > like that since basically ever.
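
The check in question, added by that commit, looks roughly like this
(abridged sketch of task_will_free_mem(), for illustration only):

/* Abridged from mm/oom_kill.c::task_will_free_mem() after 696453e66630ad45. */
static bool task_will_free_mem(struct task_struct *task)
{
        struct mm_struct *mm = task->mm;

        /* ... other early-return checks omitted ... */

        /*
         * This task has already been drained by the oom reaper, so there are
         * only small chances it will free some more.  For the shortcut in
         * out_of_memory() this means a dying victim thread is no longer
         * treated as "about to free memory" once the reaper gives up, and
         * the OOM killer moves on to selecting the next victim.
         */
        if (test_bit(MMF_OOM_SKIP, &mm->flags))
                return false;

        /* ... walk other tasks sharing this mm, omitted ... */
        return true;
}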
>
> Again, you are mixing more things together. Manish's usecase triggers a
> pathological case where the oom reaper is not able to reclaim basically
> any memory, and so we unnecessarily kill another victim if the original
> one doesn't finish quickly enough.
>
> This patch and your former attempts will only help (for that particular
> case) if the victim itself wanted to allocate and didn't manage to pass
> through the ALLOC_OOM attempt since it was killed. This is yet again a
> corner case and something this patch won't plug in general (it only
> takes another task going down that path). That's why I consider it
> confusing to mention in the changelog.
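
For context, only tasks that have been marked as OOM victims are allowed to
dip into the ALLOC_OOM part of the memory reserves, roughly as sketched
below. This is an abridged illustration based on the "do not rely on
TIF_MEMDIE for memory reserves access" work; details differ between trees
since ALLOC_OOM is only being introduced around this time.

/*
 * Abridged sketch of how the page allocator slowpath decides whether the
 * current task may use the OOM part of the memory reserves (ALLOC_OOM).
 * Simplified; the !MMU and softirq cases are omitted.
 */
static int __gfp_pfmemalloc_flags(gfp_t gfp_mask)
{
        if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
                return 0;
        if (gfp_mask & __GFP_MEMALLOC)
                return ALLOC_NO_WATERMARKS;
        if (!in_interrupt() && (current->flags & PF_MEMALLOC))
                return ALLOC_NO_WATERMARKS;
        /* OOM victims may use the OOM reserves to get out of the way quickly. */
        if (!in_interrupt() && tsk_is_oom_victim(current))
                return ALLOC_OOM;
        return 0;
}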
>
> What I am trying to say is that time-to-check vs. time-to-kill has
> been a race window since basically forever, and a large amount of memory
> can be released during that time. This patch definitely reduces that time
> window _considerably_. There is still a race window left, but this is
> inherently racy, so you could argue that the remaining window is too
> small to lose sleep over. After all, this is a corner case again. From my
> years of experience with OOM reports I haven't met many (if any) cases
> like that. So the primary question is whether we do care about this race
> window enough to even try to fix it. Considering the absolute lack of
> reports I would tend to say we don't, but if the fix can be made
> non-intrusive, which seems likely, then we can at least try it out.
>
> > > I wanted to remove this some time
> > > ago, but it has been pointed out that this was really needed:
> > > https://patchwork.kernel.org/patch/8153841/
> > > Maybe things have changed; if so, please explain.
> >
> > The get_page_from_freelist() call in __alloc_pages_may_oom() will remain
> > needed because it helps allocations which never reach oom_kill_process()
> > to succeed; that is, allocations which do "goto out;" in
> > __alloc_pages_may_oom() without calling out_of_memory(), and allocations
> > which do "return;" in out_of_memory() without calling oom_kill_process()
> > (e.g. !__GFP_FS allocations).
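
In other words, the existing call sits at the top of __alloc_pages_may_oom(),
before any of the bail-out paths. Abridged excerpt for illustration only;
oom_lock handling and several "goto out;" conditions are omitted.

/* Abridged from mm/page_alloc.c::__alloc_pages_may_oom(). */
static inline struct page *
__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
                      const struct alloc_context *ac,
                      unsigned long *did_some_progress)
{
        struct oom_control oc = {
                .zonelist = ac->zonelist,
                .nodemask = ac->nodemask,
                .gfp_mask = gfp_mask,
                .order = order,
        };
        struct page *page;

        *did_some_progress = 0;

        /*
         * Go through the zonelist once more with a very high watermark.
         * This catches a parallel OOM kill that already freed memory, and
         * it is the last attempt that allocations which later "goto out;"
         * here (or bail out of out_of_memory(), e.g. !__GFP_FS) get before
         * failing or retrying.
         */
        page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
                                      ALLOC_WMARK_HIGH | ALLOC_CPUSET, ac);
        if (page)
                goto out;

        /* The OOM killer will not help higher-order allocations. */
        if (order > PAGE_ALLOC_COSTLY_ORDER)
                goto out;
        /* ... other "goto out;" conditions omitted ... */

        if (out_of_memory(&oc))
                *did_some_progress = 1;
out:
        return page;
}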
>
> I do not understand. Those requests will simply back off and retry the
> allocation, or bail out and fail the allocation. My primary question was
>
> : that the above link contains an explanation from Andrea that the reason
> : for the high wmark is to reduce the likelihood of livelocks and be sure
> : to invoke the OOM killer,
>
> I am not sure how much that reason still applies to the current code, but
> if it does then we should do the same for the later last-minute
> allocation as well. Having both of them disagree is just a mess.
> --
> Michal Hocko
> SUSE Labs
>