From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
mhocko@suse.cz, dchinner@redhat.com, linux-mm@kvack.org,
rientjes@google.com, oleg@redhat.com, akpm@linux-foundation.org,
mgorman@suse.de, torvalds@linux-foundation.org, xfs@oss.sgi.com
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Fri, 20 Feb 2015 09:52:17 +1100 [thread overview]
Message-ID: <20150219225217.GY12722@dastard> (raw)
In-Reply-To: <20150219102431.GA15569@phnom.home.cmpxchg.org>
On Thu, Feb 19, 2015 at 05:24:31AM -0500, Johannes Weiner wrote:
> On Wed, Feb 18, 2015 at 09:54:30AM +1100, Dave Chinner wrote:
> > [ cc xfs list - experienced kernel devs should not have to be
> > reminded to do this ]
> >
> > On Tue, Feb 17, 2015 at 07:53:15AM -0500, Johannes Weiner wrote:
> > > - do {
> > > - ptr = kmalloc(size, lflags);
> > > - if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
> > > - return ptr;
> > > - if (!(++retries % 100))
> > > - xfs_err(NULL,
> > > - "possible memory allocation deadlock in %s (mode:0x%x)",
> > > - __func__, lflags);
> > > - congestion_wait(BLK_RW_ASYNC, HZ/50);
> > > - } while (1);
> > > + if (!(flags & (KM_MAYFAIL | KM_NOSLEEP)))
> > > + lflags |= __GFP_NOFAIL;
> > > +
> > > + return kmalloc(size, lflags);
> > > }
> >
> > Hmmm - the only reason there is a focus on this loop is that it
> > emits warnings about allocations failing. It's obvious that the
> > problem being dealt with here is a fundamental design issue w.r.t.
> > to locking and the OOM killer, but the proposed special casing
> > hack^H^H^H^Hband aid^W^Wsolution is not "working" because some code
> > in XFS started emitting warnings about allocations failing more
> > often.
> >
> > So the answer is to remove the warning? That's like killing the
> > canary to stop the methane leak in the coal mine. No canary? No
> > problems!
>
> That's not what happened. The patch that affected behavior here
> transformed code that an incoherent collection of conditions to
> something that has an actual model.
Which is entirely undocumented. If you have a model, the first thing
to do is document it and communicate that model to everyone who
needs to know about that new model. I have no idea what that model
is. Keeping it in your head and changing code that other people
maintain without giving them any means of understanding WTF you are
doing is a really bad engineering practice.
And yes, I have had a bit to say about this in public recently.
Go watch my recent LCA talk, for example....
And, FWIW, email discussions on a list is no substitute for a
properly documented design that people can take their time to
understand and digest.
> That model is that we don't loop
> in the allocator if there are no means to making forward progress. In
> this case, it was GFP_NOFS triggering an early exit from the allocator
> because it's not allowed to invoke the OOM killer per default, and
> there is little point in looping for times to better on their own.
So you keep saying....
> So these deadlock warnings happen, ironically, by the page allocator
> now bailing out of a locked-up state in which it's not making forward
> progress. They don't strike me as a very useful canary in this case.
... yet we *rarely* see the canary warnings we emit when we do too
many allocation retries, the code has been that way for 13-odd
years. Hence, despite your protestations that your way is *better*,
we have code that is tried, tested and proven in rugged production
environments. That's far more convincing evidence that the *code
should not change* than your assertions that it is broken and needs
to be fixed.
> > Right now, the oom killer is a liability. Over the past 6 months
> > I've slowly had to exclude filesystem regression tests from running
> > on small memory machines because the OOM killer is now so unreliable
> > that it kills the test harness regularly rather than the process
> > generating memory pressure. That's a big red flag to me that all
> > this hacking around the edges is not solving the underlying problem,
> > but instead is breaking things that did once work.
> >
> > And, well, then there's this (gfp.h):
> >
> > * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
> > * cannot handle allocation failures. This modifier is deprecated and no new
> > * users should be added.
> >
> > So, is this another policy relevation from the mm developers about
> > the kmalloc API? i.e. that __GFP_NOFAIL is no longer deprecated?
> > Or just another symptom of frantic thrashing because nobody actually
> > understands the problem or those that do are unwilling to throw out
> > the broken crap and redesign it?
>
> Well, understand our dilemma here. __GFP_NOFAIL is a liability
> because it can trap tasks with unknown state and locks in a
> potentially never ending loop, and we don't want people to start using
> it as a convenient solution to get out of having a fallback strategy.
>
> However, if your entire architecture around a particular allocation is
> that failure is not an option at this point, and you can't reasonably
> preallocate - although that would always be preferrable - then please
> do not open code an endless loop around the call to the allocator but
> use __GFP_NOFAIL instead so that these callsites are annotated and can
> be reviewed.
I will actively work around aanything that causes filesystem memory
pressure to increase the chance of oom killer invocations. The OOM
killer is not a solution - it is, by definition, a loose cannon and
so we should be reducing dependencies on it.
I really don't care about the OOM Killer corner cases - it's
completely the wrong way line of development to be spending time on
and you aren't going to convince me otherwise. The OOM killer a
crutch used to justify having a memory allocation subsystem that
can't provide forward progress guarantee mechanisms to callers that
need it.
I've proposed a method of providing this forward progress guarantee
for subsystems of arbitrary complexity, and this removes the
dependency on the OOM killer for fowards allocation progress in such
contexts (e.g. filesystems). We should be discussing how to
implement that, not what bandaids we need to apply to the OOM
killer. I want to fix the underlying problems, not push them under
the OOM-killer bus...
> And please understand that this callsite blowing up is a chance to
> better the code and behavior here. Where previously it would just
> endlessly loop in the allocator without any means to make progress,
Again, this statement ignores the fact we have *no credible
evidence* that this is actually a problem in production
environments.
And, besides, even if you do force through changing the XFS code to
GFP_NOFAIL, it'll get changed back to a retry loop in the near
future when we add admin configurable error handling behaviour to
XFS, as I pointed Michal to....
(http://oss.sgi.com/archives/xfs/2015-02/msg00346.html)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-02-19 22:52 UTC|newest]
Thread overview: 177+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-12 13:54 [RFC PATCH] oom: Don't count on mm-less current process Tetsuo Handa
2014-12-16 12:47 ` Michal Hocko
2014-12-17 11:54 ` Tetsuo Handa
2014-12-17 13:08 ` Michal Hocko
2014-12-18 12:11 ` Tetsuo Handa
2014-12-18 15:33 ` Michal Hocko
2014-12-19 12:07 ` Tetsuo Handa
2014-12-19 12:49 ` Michal Hocko
2014-12-20 9:13 ` Tetsuo Handa
2014-12-20 11:42 ` Tetsuo Handa
2014-12-22 20:25 ` Michal Hocko
2014-12-23 1:00 ` Tetsuo Handa
2014-12-23 9:51 ` Michal Hocko
2014-12-23 11:46 ` Tetsuo Handa
2014-12-23 11:57 ` Tetsuo Handa
2014-12-23 12:12 ` Tetsuo Handa
2014-12-23 12:27 ` Michal Hocko
2014-12-23 12:24 ` Michal Hocko
2014-12-23 13:00 ` Tetsuo Handa
2014-12-23 13:09 ` Michal Hocko
2014-12-23 13:20 ` Tetsuo Handa
2014-12-23 13:43 ` Michal Hocko
2014-12-23 14:11 ` Tetsuo Handa
2014-12-23 14:57 ` Michal Hocko
2014-12-19 12:22 ` How to handle TIF_MEMDIE stalls? Tetsuo Handa
2014-12-20 2:03 ` Dave Chinner
2014-12-20 12:41 ` Tetsuo Handa
2014-12-20 22:35 ` Dave Chinner
2014-12-21 8:45 ` Tetsuo Handa
2014-12-21 20:42 ` Dave Chinner
2014-12-22 16:57 ` Michal Hocko
2014-12-22 21:30 ` Dave Chinner
2014-12-23 9:41 ` Johannes Weiner
2014-12-24 1:06 ` Dave Chinner
2014-12-24 2:40 ` Linus Torvalds
2014-12-29 18:19 ` Michal Hocko
2014-12-30 6:42 ` Tetsuo Handa
2014-12-30 11:21 ` Michal Hocko
2014-12-30 13:33 ` Tetsuo Handa
2014-12-31 10:24 ` Tetsuo Handa
2015-02-09 11:44 ` Tetsuo Handa
2015-02-10 13:58 ` Tetsuo Handa
2015-02-10 15:19 ` Johannes Weiner
2015-02-11 2:23 ` Tetsuo Handa
2015-02-11 13:37 ` Tetsuo Handa
2015-02-11 18:50 ` Oleg Nesterov
2015-02-11 18:59 ` Oleg Nesterov
2015-03-14 13:03 ` Tetsuo Handa
2015-02-17 12:23 ` Tetsuo Handa
2015-02-17 12:53 ` Johannes Weiner
2015-02-17 15:38 ` Michal Hocko
2015-02-17 22:54 ` Dave Chinner
2015-02-17 23:32 ` Dave Chinner
2015-02-18 8:25 ` Michal Hocko
2015-02-18 10:48 ` Dave Chinner
2015-02-18 12:16 ` Michal Hocko
2015-02-18 21:31 ` Dave Chinner
2015-02-19 9:40 ` Michal Hocko
2015-02-19 22:03 ` Dave Chinner
2015-02-20 9:27 ` Michal Hocko
2015-02-19 11:01 ` Johannes Weiner
2015-02-19 12:29 ` Michal Hocko
2015-02-19 12:58 ` Michal Hocko
2015-02-19 15:29 ` Tetsuo Handa
2015-02-19 21:53 ` Tetsuo Handa
2015-02-20 9:13 ` Michal Hocko
2015-02-20 13:37 ` Stefan Ring
2015-02-19 13:29 ` Tetsuo Handa
2015-02-20 9:10 ` Michal Hocko
2015-02-20 12:20 ` Tetsuo Handa
2015-02-20 12:38 ` Michal Hocko
2015-02-19 21:43 ` Dave Chinner
2015-02-20 12:48 ` Michal Hocko
2015-02-20 23:09 ` Dave Chinner
2015-02-19 10:24 ` Johannes Weiner
2015-02-19 22:52 ` Dave Chinner [this message]
2015-02-20 10:36 ` Tetsuo Handa
2015-02-20 23:15 ` Dave Chinner
2015-02-21 3:20 ` Theodore Ts'o
2015-02-21 9:19 ` Andrew Morton
2015-02-21 13:48 ` Tetsuo Handa
2015-02-21 21:38 ` Dave Chinner
2015-02-22 0:20 ` Johannes Weiner
2015-02-23 10:48 ` Michal Hocko
2015-02-23 11:23 ` Tetsuo Handa
2015-02-23 21:33 ` David Rientjes
2015-02-22 14:48 ` __GFP_NOFAIL and oom_killer_disabled? Tetsuo Handa
2015-02-23 10:21 ` Michal Hocko
2015-02-23 13:03 ` Tetsuo Handa
2015-02-24 18:14 ` Michal Hocko
2015-02-25 11:22 ` Tetsuo Handa
2015-02-25 16:02 ` Michal Hocko
2015-02-25 21:48 ` Tetsuo Handa
2015-02-25 21:51 ` Andrew Morton
2015-02-21 12:00 ` How to handle TIF_MEMDIE stalls? Tetsuo Handa
2015-02-23 10:26 ` Michal Hocko
2015-02-21 11:12 ` Tetsuo Handa
2015-02-21 21:48 ` Dave Chinner
2015-02-21 23:52 ` Johannes Weiner
2015-02-23 0:45 ` Dave Chinner
2015-02-23 1:29 ` Andrew Morton
2015-02-23 7:32 ` Dave Chinner
2015-02-27 18:24 ` Vlastimil Babka
2015-02-28 0:03 ` Dave Chinner
2015-02-28 15:17 ` Theodore Ts'o
2015-03-02 9:39 ` Vlastimil Babka
2015-03-02 22:31 ` Dave Chinner
2015-03-03 9:13 ` Vlastimil Babka
2015-03-04 1:33 ` Dave Chinner
2015-03-04 8:50 ` Vlastimil Babka
2015-03-04 11:03 ` Dave Chinner
2015-03-07 0:20 ` Johannes Weiner
2015-03-07 3:43 ` Dave Chinner
2015-03-07 15:08 ` Johannes Weiner
2015-03-02 20:22 ` Johannes Weiner
2015-03-02 23:12 ` Dave Chinner
2015-03-03 2:50 ` Johannes Weiner
2015-03-04 6:52 ` Dave Chinner
2015-03-04 15:04 ` Johannes Weiner
2015-03-04 17:38 ` Theodore Ts'o
2015-03-04 23:17 ` Dave Chinner
2015-02-28 16:29 ` Johannes Weiner
2015-02-28 16:41 ` Theodore Ts'o
2015-02-28 22:15 ` Johannes Weiner
2015-03-01 11:17 ` Tetsuo Handa
2015-03-06 11:53 ` Tetsuo Handa
2015-03-01 13:43 ` Theodore Ts'o
2015-03-01 16:15 ` Johannes Weiner
2015-03-01 19:36 ` Theodore Ts'o
2015-03-01 20:44 ` Johannes Weiner
2015-03-01 20:17 ` Johannes Weiner
2015-03-01 21:48 ` Dave Chinner
2015-03-02 0:17 ` Dave Chinner
2015-03-02 12:46 ` Brian Foster
2015-02-28 18:36 ` Vlastimil Babka
2015-03-02 15:18 ` Michal Hocko
2015-03-02 16:05 ` Johannes Weiner
2015-03-02 17:10 ` Michal Hocko
2015-03-02 17:27 ` Johannes Weiner
2015-03-02 16:39 ` Theodore Ts'o
2015-03-02 16:58 ` Michal Hocko
2015-03-04 12:52 ` Dave Chinner
2015-02-17 14:59 ` Michal Hocko
2015-02-17 14:50 ` Michal Hocko
2015-02-17 14:37 ` Michal Hocko
2015-02-17 14:44 ` Michal Hocko
2015-02-16 11:23 ` Tetsuo Handa
2015-02-16 15:42 ` Johannes Weiner
2015-02-17 11:57 ` Tetsuo Handa
2015-02-17 13:16 ` Johannes Weiner
2015-02-17 16:50 ` Michal Hocko
2015-02-17 23:25 ` Dave Chinner
2015-02-18 8:48 ` Michal Hocko
2015-02-18 11:23 ` Tetsuo Handa
2015-02-18 12:29 ` Michal Hocko
2015-02-18 14:06 ` Tetsuo Handa
2015-02-18 14:25 ` Michal Hocko
2015-02-19 10:48 ` Tetsuo Handa
2015-02-20 8:26 ` Michal Hocko
2015-02-23 22:08 ` David Rientjes
2015-02-24 11:20 ` Tetsuo Handa
2015-02-24 15:20 ` Theodore Ts'o
2015-02-24 21:02 ` Dave Chinner
2015-02-25 14:31 ` Tetsuo Handa
2015-02-27 7:39 ` Dave Chinner
2015-02-27 12:42 ` Tetsuo Handa
2015-02-27 13:12 ` Dave Chinner
2015-03-04 12:41 ` Tetsuo Handa
2015-03-04 13:25 ` Dave Chinner
2015-03-04 14:11 ` Tetsuo Handa
2015-03-05 1:36 ` Dave Chinner
2015-02-17 16:33 ` Michal Hocko
2014-12-29 17:40 ` [PATCH] mm: get rid of radix tree gfp mask for pagecache_get_page (was: Re: How to handle TIF_MEMDIE stalls?) Michal Hocko
2014-12-29 18:45 ` Linus Torvalds
2014-12-29 19:33 ` Michal Hocko
2014-12-30 13:42 ` Michal Hocko
2014-12-30 21:45 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150219225217.GY12722@dastard \
--to=david@fromorbit.com \
--cc=akpm@linux-foundation.org \
--cc=dchinner@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=oleg@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
--cc=torvalds@linux-foundation.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox