From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Andrea Arcangeli <aarcange@redhat.com>,
David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Manish Jaggi <mjaggi@caviumnetworks.com>,
Mel Gorman <mgorman@suse.de>, Oleg Nesterov <oleg@redhat.com>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH] mm,oom: Try last second allocation before and after selecting an OOM victim.
Date: Mon, 30 Oct 2017 15:18:15 +0100 [thread overview]
Message-ID: <20171030141815.lk76bfetmspf7f4x@dhcp22.suse.cz> (raw)
In-Reply-To: <1509178029-10156-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
On Sat 28-10-17 17:07:09, Tetsuo Handa wrote:
> This patch splits last second allocation attempt into two locations, once
> before selecting an OOM victim and again after selecting an OOM victim,
> and uses normal watermark for last second allocation attempts.
Why do we need both?
> As of linux-2.6.11, nothing prevented from concurrently calling
> out_of_memory(). TIF_MEMDIE test in select_bad_process() tried to avoid
> needless OOM killing. Thus, it was safe to do __GFP_DIRECT_RECLAIM
> allocation (apart from which watermark should be used) just before
> calling out_of_memory().
>
> As of linux-2.6.24, try_set_zone_oom() was added to
> __alloc_pages_may_oom() by commit ff0ceb9deb6eb017 ("oom: serialize out
> of memory calls") which effectively started acting as a kind of today's
> mutex_trylock(&oom_lock).
>
> As of linux-4.2, try_set_zone_oom() was replaced with oom_lock by
> commit dc56401fc9f25e8f ("mm: oom_kill: simplify OOM killer locking").
> At least by this time, it became no longer safe to do
> __GFP_DIRECT_RECLAIM allocation with oom_lock held.
>
> And as of linux-4.13, last second allocation attempt stopped using
> __GFP_DIRECT_RECLAIM by commit e746bf730a76fe53 ("mm,page_alloc: don't
> call __node_reclaim() with oom_lock held.").
>
> Therefore, there is no longer valid reason to use ALLOC_WMARK_HIGH for
> last second allocation attempt [1].
Another reason to use the high watermark as explained by Andrea was
"
: Elaborating the comment: the reason for the high wmark is to reduce
: the likelihood of livelocks and be sure to invoke the OOM killer, if
: we're still under pressure and reclaim just failed. The high wmark is
: used to be sure the failure of reclaim isn't going to be ignored. If
: using the min wmark like you propose there's risk of livelock or
: anyway of delayed OOM killer invocation.
"
How is that affected by changes in locking you discribe above?
> And this patch changes to do normal
> allocation attempt, with handling of ALLOC_OOM added in order to mitigate
> extra OOM victim selection problem reported by Manish Jaggi [2].
>
> Doing really last second allocation attempt after selecting an OOM victim
> will also help the OOM reaper to start reclaiming memory without waiting
> for oom_lock to be released.
The changelog is much more obscure than it really needs to be. You fail
to explain _why_ we need this and and _what_ the actual problem is. You
are simply drowning in details here (btw. this is not the first time
your changelog has this issues). Try to focus on _what_ is the problem
_why_ do we care and _how_ are you addressing it.
[...]
> +struct page *alloc_pages_before_oomkill(struct oom_control *oc)
> +{
> + /*
> + * Make sure that this allocation attempt shall not depend on
> + * __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation, for the caller is
> + * already holding oom_lock.
> + */
> + const gfp_t gfp_mask = oc->gfp_mask & ~__GFP_DIRECT_RECLAIM;
> + struct alloc_context *ac = oc->ac;
> + unsigned int alloc_flags = gfp_to_alloc_flags(gfp_mask);
> + const int reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
> +
> + /* Need to update zonelist if selected as OOM victim. */
> + if (reserve_flags) {
> + alloc_flags = reserve_flags;
> + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> + ac->high_zoneidx, ac->nodemask);
> + }
Why do we need this zone list rebuilding?
> + return get_page_from_freelist(gfp_mask, oc->order, alloc_flags, ac);
> +}
> +
> static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> int preferred_nid, nodemask_t *nodemask,
> struct alloc_context *ac, gfp_t *alloc_mask,
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-10-30 14:18 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-28 8:07 Tetsuo Handa
2017-10-30 14:18 ` Michal Hocko [this message]
2017-10-31 10:40 ` Tetsuo Handa
2017-10-31 12:10 ` Michal Hocko
2017-10-31 12:42 ` Tetsuo Handa
2017-10-31 12:48 ` Michal Hocko
2017-10-31 13:13 ` Tetsuo Handa
2017-10-31 13:22 ` Michal Hocko
2017-10-31 13:51 ` Tetsuo Handa
2017-10-31 14:10 ` Michal Hocko
2017-11-01 11:58 ` Tetsuo Handa
2017-11-01 12:46 ` Michal Hocko
2017-11-01 14:38 ` Tetsuo Handa
2017-11-01 14:48 ` Michal Hocko
2017-11-01 15:37 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171030141815.lk76bfetmspf7f4x@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mjaggi@caviumnetworks.com \
--cc=oleg@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
--cc=vbabka@suse.cz \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox