linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.cz>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	dchinner@redhat.com, linux-mm@kvack.org, rientjes@google.com,
	oleg@redhat.com, Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Tue, 23 Dec 2014 04:41:32 -0500
Message-ID: <20141223094132.GA12208@phnom.home.cmpxchg.org>
In-Reply-To: <20141222213058.GQ15665@dastard>

On Tue, Dec 23, 2014 at 08:30:58AM +1100, Dave Chinner wrote:
> On Mon, Dec 22, 2014 at 05:57:36PM +0100, Michal Hocko wrote:
> > On Mon 22-12-14 07:42:49, Dave Chinner wrote:
> > [...]
> > > "memory reclaim gave up"? So why the hell isn't it returning a
> > > failure to the caller?
> > > 
> > > i.e. We have a perfectly good page cache allocation failure error
> > > path here all the way back to userspace, but we're invoking the
> > > OOM-killer to kill random processes rather than returning ENOMEM to
> > > the processes that are generating the memory demand?
> > > 
> > > Further: when did the oom-killer become the primary method
> > > of handling situations when memory allocation needs to fail?
> > > __GFP_WAIT does *not* mean memory allocation can't fail - that's what
> > > __GFP_NOFAIL means. And none of the page cache allocations use
> > > __GFP_NOFAIL, so why aren't we getting an allocation failure before
> > > the oom-killer is kicked?
> > 
> > Well, it has been an unwritten rule that GFP_KERNEL allocations for
> > low-order (<=PAGE_ALLOC_COSTLY_ORDER) never fail. This is a long ago
> > decision which would be tricky to fix now without silently breaking a
> > lot of code. Sad...
> 
> Wow.
> 
> We have *always* been told memory allocations are not guaranteed to
> succeed, ever, unless __GFP_NOFAIL is set, but that's deprecated and
> nobody is allowed to use it any more.
> 
> Lots of code has dependencies on memory allocation making progress
> or failing for the system to work in low memory situations. The page
> cache is one of them, which means all filesystems have that
> dependency. We don't explicitly ask memory allocations to fail, we
> *expect* the memory allocation failures will occur in low memory
> conditions. We've been designing and writing code with this in mind
> for the past 15 years.
> 
> How did we get so far away from the message of "the memory allocator
> never guarantees success" that it will never fail to allocate memory
> even if it means we livelock the entire system?

I think this is less an allocation guarantee than a consequence of
the thought that once we can't satisfy such low orders anymore, the
system is so entirely unusable that the only remaining thing to do is
to kill processes one by one until the situation is resolved.

Hard to say, though, because this has been the behavior for longer
than the initial git import of the tree, without any code comment.

And yes, it's flawed, because the allocating task looping might be
what's holding up progress, as we can see here.

> > Nevertheless the caller can prevent an endless loop by using
> > __GFP_NORETRY, so this could be used as a workaround.
> 
> That's just a never-ending game of whack-a-mole that we will
> continually lose. It's not a workable solution.

Agreed.

> > The default should be opposite IMO and only those who really
> > require some guarantee should use a special flag for that purpose.
> 
> Yup, totally agree.

So how about something like the following change?  It restricts the
allocator's endless OOM killing loop to __GFP_NOFAIL contexts, which
are annotated at the callsite and thus easier to review for locks etc.
Otherwise, the allocator tries only as long as page reclaim makes
progress, the idea being that failures are handled gracefully in the
callsites, and that page faults restart automatically anyway.  The OOM
killing in that case is deferred to the end of the exception handler.
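
To make the proposed policy concrete, here is a minimal user-space
sketch of the retry decision described above. It is not the patch and
not kernel code: MODEL_GFP_NOFAIL, struct model_mm, model_reclaim(),
model_oom_kill() and model_alloc() are all hypothetical names invented
for this illustration, and the page accounting is a toy. It only shows
the shape of the rule: keep retrying while reclaim makes progress, and
fall through to OOM killing only when the caller asked for NOFAIL. The
deferral of OOM killing to the end of the page fault handler is not
modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag for this model only; not the kernel's bit value. */
#define MODEL_GFP_NOFAIL 0x1u

/* Toy memory state; every field is an assumption of the model. */
struct model_mm {
	unsigned long free_pages;     /* pages available right now */
	unsigned long reclaimable;    /* pages reclaim can still free */
	unsigned long oom_kills;      /* simulated OOM kills so far */
	unsigned long pages_per_kill; /* pages freed by one simulated kill */
	unsigned long killable;       /* simulated victims remaining */
};

/* Move up to 'want' pages from reclaimable to free; return progress. */
static unsigned long model_reclaim(struct model_mm *mm, unsigned long want)
{
	unsigned long got = mm->reclaimable < want ? mm->reclaimable : want;

	mm->reclaimable -= got;
	mm->free_pages += got;
	return got;
}

/* Simulate one OOM kill; return false if no victim is left. */
static bool model_oom_kill(struct model_mm *mm)
{
	if (mm->killable == 0)
		return false;
	mm->killable--;
	mm->oom_kills++;
	mm->free_pages += mm->pages_per_kill;
	return true;
}

/* Return true if nr_pages could be allocated under the sketched policy. */
bool model_alloc(struct model_mm *mm, unsigned long nr_pages,
		 unsigned int flags)
{
	assert(nr_pages > 0);

	for (;;) {
		if (mm->free_pages >= nr_pages) {
			mm->free_pages -= nr_pages;
			return true;
		}
		/* Retry as long as reclaim makes progress. */
		if (model_reclaim(mm, nr_pages - mm->free_pages) > 0)
			continue;
		/* No progress: without NOFAIL, fail back to the caller. */
		if (!(flags & MODEL_GFP_NOFAIL))
			return false;
		/*
		 * NOFAIL: OOM kill and retry.  The model gives up once it
		 * runs out of victims so that it always terminates; a real
		 * NOFAIL allocation would keep looping instead.
		 */
		if (!model_oom_kill(mm))
			return false;
	}
}
```

In this model, a caller without the NOFAIL flag sees a failure as soon
as reclaim stops making progress and never triggers a kill, which is
the case the callsites would have to handle gracefully.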

Preliminary testing confirms that the system is indeed trying just as
hard before OOM killing in the page fault case.  However, it doesn't
look like all callsites are prepared for failing smaller allocations:

[   55.553822] Out of memory: Kill process 240 (anonstress) score 158 or sacrifice child
[   55.561787] Killed process 240 (anonstress) total-vm:1540044kB, anon-rss:1284068kB, file-rss:468kB
[   55.571083] BUG: unable to handle kernel paging request at 00000000004006bd
[   55.578156] IP: [<00000000004006bd>] 0x4006bd
[   55.582584] PGD c8f3f067 PUD c8f48067 PMD c8f15067 PTE 0
[   55.588016] Oops: 0014 [#1] SMP 
[   55.591337] CPU: 1 PID: 240 Comm: anonstress Not tainted 3.18.0-mm1-00081-gf6137925fc97-dirty #188
[   55.600435] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H61M-DGS, BIOS P1.30 05/10/2012
[   55.610030] task: ffff8802139b9a10 ti: ffff8800c8f64000 task.ti: ffff8800c8f64000
[   55.617623] RIP: 0033:[<00000000004006bd>]  [<00000000004006bd>] 0x4006bd
[   55.624512] RSP: 002b:00007fffd43b7220  EFLAGS: 00010206
[   55.629901] RAX: 00007f87e6e01000 RBX: 0000000000000000 RCX: 00007f87f64fe25a
[   55.637104] RDX: 00007f879881a000 RSI: 000000005dc00000 RDI: 0000000000000000
[   55.644331] RBP: 00007fffd43b7240 R08: 00000000ffffffff R09: 0000000000000000
[   55.651569] R10: 0000000000000022 R11: 0000000000000283 R12: 0000000000400570
[   55.658796] R13: 00007fffd43b7340 R14: 0000000000000000 R15: 0000000000000000
[   55.666040] FS:  00007f87f69d1700(0000) GS:ffff88021f280000(0000) knlGS:0000000000000000
[   55.674221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.680055] CR2: 00007fdd676ad480 CR3: 00000000c8f3e000 CR4: 00000000000407e0
[   55.687272] 
[   55.688780] RIP  [<00000000004006bd>] 0x4006bd
[   55.693304]  RSP <00007fffd43b7220>
[   55.696850] CR2: 00000000004006bd
[   55.700207] ---[ end trace b9cb4f44f8e47bc3 ]---
[   55.704903] Kernel panic - not syncing: Fatal exception
[   55.710208] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[   55.720517] Rebooting in 30 seconds..

Obvious bugs aside, though, the thought of failing order-0 allocations
after such a long time is scary...

---

