From: Michal Hocko <mhocko@suse.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
aarcange@redhat.com, hannes@cmpxchg.org
Subject: Re: [PATCH 1/5] mm,page_alloc: Update comment for last second allocation attempt.
Date: Thu, 9 Nov 2017 12:30:40 +0100
Message-ID: <20171109113040.77gapoevxszejyfm@dhcp22.suse.cz>
In-Reply-To: <201711091945.IAD64050.MtLFFQOOSOFJHV@I-love.SAKURA.ne.jp>

On Thu 09-11-17 19:45:04, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 08-11-17 20:01:44, Tetsuo Handa wrote:
> > > __alloc_pages_may_oom() is doing last second allocation attempt using
> > > ALLOC_WMARK_HIGH before calling out_of_memory(). This had two reasons.
> > >
> > > The first reason is explained in the comment that it aims to catch
> > > potential parallel OOM killing. But there is no longer parallel OOM
> > > killing (in the sense that out_of_memory() is called "concurrently")
> > > because we serialize out_of_memory() calls using oom_lock.
> > >
> > > The second reason is explained by Andrea Arcangeli (who added that code)
> > > that it aims to reduce the likelihood of OOM livelocks and be sure to
> > > invoke the OOM killer. There was a risk of livelock or anyway of delayed
> > > OOM killer invocation if ALLOC_WMARK_MIN is used, for relying on last
> > > few pages which are constantly allocated and freed in the meantime will
> > > not improve the situation.
>
> Above part is OK, isn't it?
>
> >
> > > But there is no longer possibility of OOM
> > > livelocks or failing to invoke the OOM killer because we need to mask
> > > __GFP_DIRECT_RECLAIM for last second allocation attempt because oom_lock
> > > prevents __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations which last
> > > second allocation attempt indirectly involve from failing.
> >
> > This is an unfounded, misleading and actually even wrong statement that
> > has nothing to do with what Andrea had in mind. __GFP_DIRECT_RECLAIM
> > doesn't have anything to do with the livelock as I've already mentioned
> > several times.
>
> I know that this part is not what Andrea had in mind when he added this comment.
> What I'm saying is that "the precondition has changed since Andrea added this comment"
> and "the reasons which Andrea had in mind when he added this comment no longer
> hold". I'm posting this "for the record", in order to describe the reasons for the
> current code.
>
> When we introduced oom_lock (or formerly the per-zone oom lock) for serializing invocation
> of the OOM killer, we introduced two bugs at the same time. One bug is that since doing
> __GFP_DIRECT_RECLAIM with oom_lock held can make __GFP_DIRECT_RECLAIM && !__GFP_NORETRY
> allocations (which __GFP_DIRECT_RECLAIM indirectly involves) lock up, we need to avoid
> __GFP_DIRECT_RECLAIM allocations with oom_lock held. This is why commit e746bf730a76fe53
> ("mm,page_alloc: don't call __node_reclaim() with oom_lock held.") was made. This in turn
> forbids using __GFP_DIRECT_RECLAIM for last second allocation attempt which was not
> forbidden when Andrea added this comment.

But this doesn't have anything to do with the original motivation for the
high watermark allocation.

> ( The other bug is that we assumed that somebody is making progress for us when
> mutex_trylock(&oom_lock) in __alloc_pages_may_oom() failed, for we did not take
> scheduling priority into account when we introduced oom_lock. But the other bug
> is not what I'm writing in this patch. You can forget about the other bug
> regarding this patch. )
>
> >
> > > Since the OOM killer does not always kill a process consuming significant
> > > amount of memory (the OOM killer kills a process with highest OOM score
> > > (or instead one of its children if any)), there will be cases where
> > > ALLOC_WMARK_HIGH fails and ALLOC_WMARK_MIN succeeds.
> >
> > This is possible but not really interesting case as already explained.
> >
> > > Since the gap between ALLOC_WMARK_HIGH and ALLOC_WMARK_MIN can be changed
> > > by /proc/sys/vm/min_free_kbytes parameter, using ALLOC_WMARK_MIN for last
> > > second allocation attempt might be better for minimizing number of OOM
> > > victims. But that change should be done in a separate patch. This patch
> > > just clarifies that ALLOC_WMARK_HIGH is an arbitrary choice.
> >
> > Again unfounded claim.
>
> Since use of __GFP_DIRECT_RECLAIM for last second allocation attempt is now
> forbidden due to oom_lock already held, possibility of failing last allocation
> attempt has increased compared to when Andrea added this comment. Andrea said
>
> The high wmark is used to be sure the failure of reclaim isn't going to be
> ignored. If using the min wmark like you propose there's risk of livelock or
> anyway of delayed OOM killer invocation.
Wrong. It just takes an unrelated single page alloc/free loop to prevent
the OOM killer invocation.
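
To illustrate that point, here is a toy userspace model. It is only a
sketch, not kernel code: struct toy_zone, toy_watermark_ok() and
toy_rounds_until_oom() are hypothetical names, and the model assumes a
single zone, order-0 allocations, and one unrelated page being freed
just before each last second attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, not kernel code. All names are hypothetical. */
struct toy_zone {
	long free_pages;
	long wmark_min;
	long wmark_high;
};

/* Order-0 check: succeed only while free pages are above the watermark. */
static bool toy_watermark_ok(const struct toy_zone *z, long wmark)
{
	return z->free_pages > wmark;
}

/*
 * Each round an unrelated task frees one page, then the task whose reclaim
 * failed makes its last second attempt against wmark. Returns the round in
 * which the OOM killer would be invoked, or max_rounds if it never is.
 */
static long toy_rounds_until_oom(struct toy_zone z, long wmark, long max_rounds)
{
	for (long round = 0; round < max_rounds; round++) {
		z.free_pages++;			/* unrelated free */
		if (!toy_watermark_ok(&z, wmark))
			return round;		/* attempt failed: invoke the OOM killer */
		z.free_pages--;			/* attempt consumed the freed page */
	}
	return max_rounds;
}
```

In this model, if free_pages starts at wmark_min, the attempt against
wmark_min succeeds in every round and the OOM killer is never reached
within max_rounds, while the attempt against wmark_high fails in the
first round.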
[...]
> So, I believe that the changelog is not wrong, and I don't want to preserve
>
> keep very high watermark here, this is only to catch a parallel oom killing,
> we must fail if we're still under heavy pressure
>
> part which lost strong background.
I do not see how. You simply do not address the original concern Andrea
had and keep repeating unrelated stuff.
--
Michal Hocko
SUSE Labs