linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	dchinner@redhat.com, linux-mm@kvack.org, rientjes@google.com,
	oleg@redhat.com, akpm@linux-foundation.org, mgorman@suse.de,
	torvalds@linux-foundation.org
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Wed, 18 Feb 2015 09:48:42 +0100	[thread overview]
Message-ID: <20150218084842.GB4478@dhcp22.suse.cz> (raw)
In-Reply-To: <20150217232552.GK4251@dastard>

On Wed 18-02-15 10:25:52, Dave Chinner wrote:
> On Tue, Feb 17, 2015 at 05:50:24PM +0100, Michal Hocko wrote:
> > On Tue 17-02-15 08:16:18, Johannes Weiner wrote:
> > > On Tue, Feb 17, 2015 at 08:57:05PM +0900, Tetsuo Handa wrote:
> > > > Johannes Weiner wrote:
> > > > > On Mon, Feb 16, 2015 at 08:23:16PM +0900, Tetsuo Handa wrote:
> > > > > >   (2) Implement TIF_MEMDIE timeout.
> > > > > 
> > > > > How about something like this?  This should solve the deadlock problem
> > > > > in the page allocator, but it would also simplify the memcg OOM killer
> > > > > and allow its use by in-kernel faults again.
> > > > 
> > > > Yes, basic idea would be same with
> > > > http://marc.info/?l=linux-mm&m=142002495532320&w=2 .
> > > > 
> > > > But Michal and David do not like the timeout approach.
> > > > http://marc.info/?l=linux-mm&m=141684783713564&w=2
> > > > http://marc.info/?l=linux-mm&m=141686814824684&w=2
> > 
> > Yes I really hate time based solutions for reasons already explained in
> > the referenced links.
> >  
> > > I'm open to suggestions, but we can't just stick our heads in the sand
> > > and pretend that these are just unrelated bugs.  They're not. 
> > 
> > Requesting GFP_NOFAIL allocation with locks held is IMHO a bug and
> > should be fixed.
> 
> That's rather naive.
> 
> Filesystems do demand paging of metadata within transactions, which
> means we are guaranteed to be holding locks when doing memory
> allocation. Indeed, this is what the GFP_NOFS allocation context is
> supposed to convey - we currently *hold locks* and so reclaim needs
> to be careful about recursion. I'll also argue that it means the OOM
> killer cannot kill the process attempting memory allocation for the
> same reason.

I am not sure I understand. Do you mean that OOM killer should attempt
to select a victim which is doing doing GFP_NOFS allocation or an
allocation in general?

> We are also guaranteed to be in a state where memory allocation
> failure *cannot be tolerated* because failure to complete the
> modification leaves the filesystem in a "corrupt in memory" state.
> We don't use GFP_NOFAIL because it's deprecated, but the reality is
> that we need to ensure memory allocation eventually succeeds because
> we *cannot go backwards*.
> 
> The choice is simple: memory allocation fails, we shut down the
> filesystem and guarantee that we DOS the entire machine because the
> filesystems have gone AWOL; or we keep trying memory allocation
> until it succeeds.

Would it be possible to drop the locks and retry the allocations?
Is the context which is doing this transaction a killable context?

> So, memory allocation generally succeeds eventually, so we have
> these loops around kmalloc(), kmem_cache_alloc() and alloc_page()
> that ensure allocation succeeds. Those loops also guarantee we get
> warnings when allocation is repeatedly failing and we might have
> actually hit a OOM deadlock situation.

As pointed in another email this should be done in the page allocator
IMO.

> > Hopelessly looping in the page allocator without GFP_NOFAIL is too risky
> > as well and we should get rid of this.
> 
> Yet the exact situation we need GFP_NOFAIL is the situation that you
> are calling a bug.
> 
> > Why should we still try to loop
> > when previous 1000 attempts failed with OOM killer invocation? Can we
> > simply fail after a configurable number of attempts?
> 
> OTOH, why should the memory allocator care what failure policy the
> callers have?

It is not about failure policy of the caller. It is about how long the
allocator tries before it gives up. A good allocator tries hard but not
too much if the caller is able to handle the failure because it is the
caller who defines the fallback policy.

> > This is prone to
> > reveal unchecked allocation failures but those are bugs as well and we
> > shouldn't pretend otherwise.
> > 
> > > As long
> > > as it's legal to enter the allocator with *anything* that can prevent
> > > another random task in the system from making progress, we have this
> > > deadlock potential.  One side has to give up, and it can't be the page
> > > allocator because it has to support __GFP_NOFAIL allocations, which
> > > are usually exactly the allocations that are buried in hard-to-unwind
> > > state that is likely to trip up exiting OOM victims.
> > 
> > I am not convinced that GFP_NOFAIL is the biggest problem. Most if
> > OOM livelocks I have seen were either due to GFP_KERNEL treated as
> > GFP_NOFAIL or an incorrect gfp mask (e.g. GFP_FS added where not
> > appropriate). I think we should focus on this part before we start
> > adding heuristics into OOM killer.
> 
> Having the OOM killer being able to kill the process that triggered
> it would be a good start.

Not sure I understand. Do you mean sysctl_oom_kill_allocating_task?

> More often than not, that is the process
> that needs killing, and the oom killer implementation currently
> cannot do anything about that process.

Can you elaborate? AFAICS the process which has triggered the OOM is the
easiest to kill victim. It is not blocked on any locks so it just needs
to get outside of the kernel.

> Make the OOM killer only be
> invoked by kswapd or some other independent kernel thread so that it
> is independent of the allocation context that needs to invoke it,
> and have the invoker wait to be told what to do.

Again, I am not sure I understand. The OOM killer doesn't block the
context which has triggered OOM condition. Allocation is retried after
OOM killer invocation and if the current context is the victim the
allocation failure is expedited.

> That way it can kill the invoking process if that's the one that
> needs to be killed, and then all "can't kill processes because the
> invoker holds locks they depend on" go away.

Except that killing the messenger is not the best strategy...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-02-18  8:48 UTC|newest]

Thread overview: 177+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-12 13:54 [RFC PATCH] oom: Don't count on mm-less current process Tetsuo Handa
2014-12-16 12:47 ` Michal Hocko
2014-12-17 11:54   ` Tetsuo Handa
2014-12-17 13:08     ` Michal Hocko
2014-12-18 12:11       ` Tetsuo Handa
2014-12-18 15:33         ` Michal Hocko
2014-12-19 12:07           ` Tetsuo Handa
2014-12-19 12:49             ` Michal Hocko
2014-12-20  9:13               ` Tetsuo Handa
2014-12-20 11:42                 ` Tetsuo Handa
2014-12-22 20:25                   ` Michal Hocko
2014-12-23  1:00                     ` Tetsuo Handa
2014-12-23  9:51                       ` Michal Hocko
2014-12-23 11:46                         ` Tetsuo Handa
2014-12-23 11:57                           ` Tetsuo Handa
2014-12-23 12:12                             ` Tetsuo Handa
2014-12-23 12:27                             ` Michal Hocko
2014-12-23 12:24                           ` Michal Hocko
2014-12-23 13:00                             ` Tetsuo Handa
2014-12-23 13:09                               ` Michal Hocko
2014-12-23 13:20                                 ` Tetsuo Handa
2014-12-23 13:43                                   ` Michal Hocko
2014-12-23 14:11                                     ` Tetsuo Handa
2014-12-23 14:57                                       ` Michal Hocko
2014-12-19 12:22           ` How to handle TIF_MEMDIE stalls? Tetsuo Handa
2014-12-20  2:03             ` Dave Chinner
2014-12-20 12:41               ` Tetsuo Handa
2014-12-20 22:35                 ` Dave Chinner
2014-12-21  8:45                   ` Tetsuo Handa
2014-12-21 20:42                     ` Dave Chinner
2014-12-22 16:57                       ` Michal Hocko
2014-12-22 21:30                         ` Dave Chinner
2014-12-23  9:41                           ` Johannes Weiner
2014-12-24  1:06                             ` Dave Chinner
2014-12-24  2:40                               ` Linus Torvalds
2014-12-29 18:19                     ` Michal Hocko
2014-12-30  6:42                       ` Tetsuo Handa
2014-12-30 11:21                         ` Michal Hocko
2014-12-30 13:33                           ` Tetsuo Handa
2014-12-31 10:24                             ` Tetsuo Handa
2015-02-09 11:44                           ` Tetsuo Handa
2015-02-10 13:58                             ` Tetsuo Handa
2015-02-10 15:19                               ` Johannes Weiner
2015-02-11  2:23                                 ` Tetsuo Handa
2015-02-11 13:37                                   ` Tetsuo Handa
2015-02-11 18:50                                     ` Oleg Nesterov
2015-02-11 18:59                                       ` Oleg Nesterov
2015-03-14 13:03                                         ` Tetsuo Handa
2015-02-17 12:23                                   ` Tetsuo Handa
2015-02-17 12:53                                     ` Johannes Weiner
2015-02-17 15:38                                       ` Michal Hocko
2015-02-17 22:54                                       ` Dave Chinner
2015-02-17 23:32                                         ` Dave Chinner
2015-02-18  8:25                                         ` Michal Hocko
2015-02-18 10:48                                           ` Dave Chinner
2015-02-18 12:16                                             ` Michal Hocko
2015-02-18 21:31                                               ` Dave Chinner
2015-02-19  9:40                                                 ` Michal Hocko
2015-02-19 22:03                                                   ` Dave Chinner
2015-02-20  9:27                                                     ` Michal Hocko
2015-02-19 11:01                                               ` Johannes Weiner
2015-02-19 12:29                                                 ` Michal Hocko
2015-02-19 12:58                                                   ` Michal Hocko
2015-02-19 15:29                                                     ` Tetsuo Handa
2015-02-19 21:53                                                       ` Tetsuo Handa
2015-02-20  9:13                                                       ` Michal Hocko
2015-02-20 13:37                                                         ` Stefan Ring
2015-02-19 13:29                                                   ` Tetsuo Handa
2015-02-20  9:10                                                     ` Michal Hocko
2015-02-20 12:20                                                       ` Tetsuo Handa
2015-02-20 12:38                                                         ` Michal Hocko
2015-02-19 21:43                                                   ` Dave Chinner
2015-02-20 12:48                                                     ` Michal Hocko
2015-02-20 23:09                                                       ` Dave Chinner
2015-02-19 10:24                                         ` Johannes Weiner
2015-02-19 22:52                                           ` Dave Chinner
2015-02-20 10:36                                             ` Tetsuo Handa
2015-02-20 23:15                                               ` Dave Chinner
2015-02-21  3:20                                                 ` Theodore Ts'o
2015-02-21  9:19                                                   ` Andrew Morton
2015-02-21 13:48                                                     ` Tetsuo Handa
2015-02-21 21:38                                                     ` Dave Chinner
2015-02-22  0:20                                                     ` Johannes Weiner
2015-02-23 10:48                                                       ` Michal Hocko
2015-02-23 11:23                                                         ` Tetsuo Handa
2015-02-23 21:33                                                       ` David Rientjes
2015-02-22 14:48                                                     ` __GFP_NOFAIL and oom_killer_disabled? Tetsuo Handa
2015-02-23 10:21                                                       ` Michal Hocko
2015-02-23 13:03                                                         ` Tetsuo Handa
2015-02-24 18:14                                                           ` Michal Hocko
2015-02-25 11:22                                                             ` Tetsuo Handa
2015-02-25 16:02                                                               ` Michal Hocko
2015-02-25 21:48                                                                 ` Tetsuo Handa
2015-02-25 21:51                                                                   ` Andrew Morton
2015-02-21 12:00                                                   ` How to handle TIF_MEMDIE stalls? Tetsuo Handa
2015-02-23 10:26                                                   ` Michal Hocko
2015-02-21 11:12                                                 ` Tetsuo Handa
2015-02-21 21:48                                                   ` Dave Chinner
2015-02-21 23:52                                             ` Johannes Weiner
2015-02-23  0:45                                               ` Dave Chinner
2015-02-23  1:29                                                 ` Andrew Morton
2015-02-23  7:32                                                   ` Dave Chinner
2015-02-27 18:24                                                     ` Vlastimil Babka
2015-02-28  0:03                                                       ` Dave Chinner
2015-02-28 15:17                                                         ` Theodore Ts'o
2015-03-02  9:39                                                     ` Vlastimil Babka
2015-03-02 22:31                                                       ` Dave Chinner
2015-03-03  9:13                                                         ` Vlastimil Babka
2015-03-04  1:33                                                           ` Dave Chinner
2015-03-04  8:50                                                             ` Vlastimil Babka
2015-03-04 11:03                                                               ` Dave Chinner
2015-03-07  0:20                                                         ` Johannes Weiner
2015-03-07  3:43                                                           ` Dave Chinner
2015-03-07 15:08                                                             ` Johannes Weiner
2015-03-02 20:22                                                     ` Johannes Weiner
2015-03-02 23:12                                                       ` Dave Chinner
2015-03-03  2:50                                                         ` Johannes Weiner
2015-03-04  6:52                                                           ` Dave Chinner
2015-03-04 15:04                                                             ` Johannes Weiner
2015-03-04 17:38                                                               ` Theodore Ts'o
2015-03-04 23:17                                                                 ` Dave Chinner
2015-02-28 16:29                                                 ` Johannes Weiner
2015-02-28 16:41                                                   ` Theodore Ts'o
2015-02-28 22:15                                                     ` Johannes Weiner
2015-03-01 11:17                                                       ` Tetsuo Handa
2015-03-06 11:53                                                         ` Tetsuo Handa
2015-03-01 13:43                                                       ` Theodore Ts'o
2015-03-01 16:15                                                         ` Johannes Weiner
2015-03-01 19:36                                                           ` Theodore Ts'o
2015-03-01 20:44                                                             ` Johannes Weiner
2015-03-01 20:17                                                         ` Johannes Weiner
2015-03-01 21:48                                                       ` Dave Chinner
2015-03-02  0:17                                                         ` Dave Chinner
2015-03-02 12:46                                                           ` Brian Foster
2015-02-28 18:36                                                 ` Vlastimil Babka
2015-03-02 15:18                                                 ` Michal Hocko
2015-03-02 16:05                                                   ` Johannes Weiner
2015-03-02 17:10                                                     ` Michal Hocko
2015-03-02 17:27                                                       ` Johannes Weiner
2015-03-02 16:39                                                   ` Theodore Ts'o
2015-03-02 16:58                                                     ` Michal Hocko
2015-03-04 12:52                                                       ` Dave Chinner
2015-02-17 14:59                                     ` Michal Hocko
2015-02-17 14:50                                 ` Michal Hocko
2015-02-17 14:37                             ` Michal Hocko
2015-02-17 14:44                               ` Michal Hocko
2015-02-16 11:23                           ` Tetsuo Handa
2015-02-16 15:42                             ` Johannes Weiner
2015-02-17 11:57                               ` Tetsuo Handa
2015-02-17 13:16                                 ` Johannes Weiner
2015-02-17 16:50                                   ` Michal Hocko
2015-02-17 23:25                                     ` Dave Chinner
2015-02-18  8:48                                       ` Michal Hocko [this message]
2015-02-18 11:23                                         ` Tetsuo Handa
2015-02-18 12:29                                           ` Michal Hocko
2015-02-18 14:06                                             ` Tetsuo Handa
2015-02-18 14:25                                               ` Michal Hocko
2015-02-19 10:48                                                 ` Tetsuo Handa
2015-02-20  8:26                                                   ` Michal Hocko
2015-02-23 22:08                                 ` David Rientjes
2015-02-24 11:20                                   ` Tetsuo Handa
2015-02-24 15:20                                     ` Theodore Ts'o
2015-02-24 21:02                                       ` Dave Chinner
2015-02-25 14:31                                         ` Tetsuo Handa
2015-02-27  7:39                                           ` Dave Chinner
2015-02-27 12:42                                             ` Tetsuo Handa
2015-02-27 13:12                                               ` Dave Chinner
2015-03-04 12:41                                                 ` Tetsuo Handa
2015-03-04 13:25                                                   ` Dave Chinner
2015-03-04 14:11                                                     ` Tetsuo Handa
2015-03-05  1:36                                                       ` Dave Chinner
2015-02-17 16:33                             ` Michal Hocko
2014-12-29 17:40                   ` [PATCH] mm: get rid of radix tree gfp mask for pagecache_get_page (was: Re: How to handle TIF_MEMDIE stalls?) Michal Hocko
2014-12-29 18:45                     ` Linus Torvalds
2014-12-29 19:33                       ` Michal Hocko
2014-12-30 13:42                         ` Michal Hocko
2014-12-30 21:45                           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150218084842.GB4478@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox