From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
mhocko@suse.cz, dchinner@redhat.com, linux-mm@kvack.org,
rientjes@google.com, oleg@redhat.com, akpm@linux-foundation.org,
mgorman@suse.de, torvalds@linux-foundation.org, xfs@oss.sgi.com
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Wed, 18 Feb 2015 09:54:30 +1100
Message-ID: <20150217225430.GJ4251@dastard>
In-Reply-To: <20150217125315.GA14287@phnom.home.cmpxchg.org>
[ cc xfs list - experienced kernel devs should not have to be
reminded to do this ]
On Tue, Feb 17, 2015 at 07:53:15AM -0500, Johannes Weiner wrote:
> On Tue, Feb 17, 2015 at 09:23:26PM +0900, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > > Johannes Weiner wrote:
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 8e20f9c2fa5a..f77c58ebbcfa 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -2382,8 +2382,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > > > if (high_zoneidx < ZONE_NORMAL)
> > > > goto out;
> > > > /* The OOM killer does not compensate for light reclaim */
> > > > - if (!(gfp_mask & __GFP_FS))
> > > > + if (!(gfp_mask & __GFP_FS)) {
> > > > + /*
> > > > + * XXX: Page reclaim didn't yield anything,
> > > > + * and the OOM killer can't be invoked, but
> > > > + * keep looping as per should_alloc_retry().
> > > > + */
> > > > + *did_some_progress = 1;
> > > > goto out;
> > > > + }
> > >
> > > Why do you omit out_of_memory() call for GFP_NOIO / GFP_NOFS allocations?
> >
> > I can see "possible memory allocation deadlock in %s (mode:0x%x)" warnings
> > at kmem_alloc() in fs/xfs/kmem.c . I think commit 9879de7373fcfb46 "mm:
> > page_alloc: embed OOM killing naturally into allocation slowpath" introduced
> > a regression and below one is the fix.
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2381,9 +2381,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > /* The OOM killer does not needlessly kill tasks for lowmem */
> > if (high_zoneidx < ZONE_NORMAL)
> > goto out;
> > - /* The OOM killer does not compensate for light reclaim */
> > - if (!(gfp_mask & __GFP_FS))
> > - goto out;
> > /*
> > * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
> > * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
>
> Again, we don't want to OOM kill on behalf of allocations that can't
> initiate IO, or even actively prevent others from doing it. Not per
> default anyway, because most callers can deal with the failure without
> having to resort to killing tasks, and NOFS reclaim *can* easily fail.
> It's the exceptions that should be annotated instead:
>
> void *
> kmem_alloc(size_t size, xfs_km_flags_t flags)
> {
> 	int retries = 0;
> 	gfp_t lflags = kmem_flags_convert(flags);
> 	void *ptr;
>
> 	do {
> 		ptr = kmalloc(size, lflags);
> 		if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
> 			return ptr;
> 		if (!(++retries % 100))
> 			xfs_err(NULL,
> 		"possible memory allocation deadlock in %s (mode:0x%x)",
> 				__func__, lflags);
> 		congestion_wait(BLK_RW_ASYNC, HZ/50);
> 	} while (1);
> }
>
> This should use __GFP_NOFAIL, which is not only designed to annotate
> broken code like this, but also recognizes that endless looping on a
> GFP_NOFS allocation needs the OOM killer after all to make progress.
>
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index a7a3a63bb360..17ced1805d3a 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -45,20 +45,12 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  void *
>  kmem_alloc(size_t size, xfs_km_flags_t flags)
>  {
> -	int retries = 0;
>  	gfp_t lflags = kmem_flags_convert(flags);
> -	void *ptr;
>
> -	do {
> -		ptr = kmalloc(size, lflags);
> -		if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
> -			return ptr;
> -		if (!(++retries % 100))
> -			xfs_err(NULL,
> -		"possible memory allocation deadlock in %s (mode:0x%x)",
> -				__func__, lflags);
> -		congestion_wait(BLK_RW_ASYNC, HZ/50);
> -	} while (1);
> +	if (!(flags & (KM_MAYFAIL | KM_NOSLEEP)))
> +		lflags |= __GFP_NOFAIL;
> +
> +	return kmalloc(size, lflags);
>  }
Hmmm - the only reason there is a focus on this loop is that it
emits warnings about allocations failing. It's obvious that the
problem being dealt with here is a fundamental design issue w.r.t.
locking and the OOM killer, but the proposed special casing
hack^H^H^H^Hband aid^W^Wsolution is not "working" because some code
in XFS started emitting warnings about allocations failing more
often.
So the answer is to remove the warning? That's like killing the
canary to stop the methane leak in the coal mine. No canary? No
problems!
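
To make the canary point concrete, here is a minimal userspace sketch in C. It is not kernel code: `fake_alloc()` and `fail_count` are hypothetical stand-ins for kmalloc() under memory pressure, and `alloc_retry_warn()` / `alloc_retry_silent()` are illustrative names only. The first loop is modelled on the quoted kmem_alloc() and warns every 100 failed attempts. The second retries just as long but reports nothing, which is roughly what the caller sees once the retrying is handed to the allocator. The OOM killer is not modelled here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for kmalloc(): the next fail_count calls fail. */
static int fail_count;

static void *fake_alloc(size_t size)
{
	if (fail_count > 0) {
		fail_count--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Retry loop modelled on the quoted kmem_alloc(): never gives up,
 * but warns once per 100 failed attempts (the "canary").
 */
static void *alloc_retry_warn(size_t size, int *warnings)
{
	int retries = 0;
	void *ptr;

	for (;;) {
		ptr = fake_alloc(size);
		if (ptr)
			return ptr;
		if (!(++retries % 100)) {
			fprintf(stderr,
				"possible memory allocation deadlock (%d retries)\n",
				retries);
			(*warnings)++;
		}
	}
}

/*
 * Same endless retry with the warning removed: the stall still
 * happens, the caller just never hears about it.
 */
static void *alloc_retry_silent(size_t size)
{
	void *ptr;

	while (!(ptr = fake_alloc(size)))
		;
	return ptr;
}
```

With `fail_count` set to 250, `alloc_retry_warn()` reports twice (at 100 and 200 retries) before it succeeds, while `alloc_retry_silent()` goes through the same 250 failed attempts and prints nothing.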
Right now, the oom killer is a liability. Over the past 6 months
I've slowly had to exclude filesystem regression tests from running
on small memory machines because the OOM killer is now so unreliable
that it kills the test harness regularly rather than the process
generating memory pressure. That's a big red flag to me that all
this hacking around the edges is not solving the underlying problem,
but instead is breaking things that did once work.
And, well, then there's this (gfp.h):
* __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
* cannot handle allocation failures. This modifier is deprecated and no new
* users should be added.
So, is this another policy revelation from the mm developers about
the kmalloc API? i.e. that __GFP_NOFAIL is no longer deprecated?
Or just another symptom of frantic thrashing because nobody actually
understands the problem or those that do are unwilling to throw out
the broken crap and redesign it?
If you are changing allocator behaviour and constraints, then you
better damn well think through those changes fully, then document
those changes, change all the relevant code to use the new API (not
just those that throw warnings in your face) and make sure
*everyone* knows about it. e.g. an LWN article explaining the changes
and how memory allocation is going to work into the future would be
a good start.
Otherwise, this just looks like another knee-jerk band aid for an
architectural problem that needs more than special case hacks to
solve.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com