From: Dave Chinner <david@fromorbit.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: dchinner@redhat.com, mhocko@suse.cz, linux-mm@kvack.org,
rientjes@google.com, oleg@redhat.com
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Mon, 22 Dec 2014 07:42:49 +1100 [thread overview]
Message-ID: <20141221204249.GL15665@dastard> (raw)
In-Reply-To: <201412211745.ECD69212.LQOFHtFOJMSOFV@I-love.SAKURA.ne.jp>
On Sun, Dec 21, 2014 at 05:45:32PM +0900, Tetsuo Handa wrote:
> Thank you for detailed explanation.
>
> Dave Chinner wrote:
> > So, going back to the lockup, doesn't hte fact that so many
> > processes are spinning in the shrinker tell you that there's a
> > problem in that area? i.e. this:
> >
> > [ 398.861602] [<ffffffff8159f814>] _cond_resched+0x24/0x40
> > [ 398.863195] [<ffffffff81122119>] shrink_slab+0x139/0x150
> > [ 398.864799] [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
> >
> > tells me a shrinker is not making progress for some reason. I'd
> > suggest that you run some tracing to find out what shrinker it is
> > stuck in. there are tracepoints in shrink_slab that will tell you
> > what shrinker is iterating for long periods of time. i.e instead of
> > ranting and pointing fingers at everyone, you need to keep digging
> > until you know exactly where reclaim progress is stalling.
>
> I checked using below patch that shrink_slab() is called for many times but
> each call took 0 jiffies and freed 0 objects. I think shrink_slab() is merely
> reported since it likely works as a location for yielding CPU resource.
So we've got a situation where memory reclaim is not making
progress because there's nothing left to free, and everything is
backed up waiting for memory allocation to complete so that locks
can be released.
> Since trying to trigger the OOM killer means that memory reclaim subsystem
> has gave up, the memory reclaim subsystem had been unable to find
> reclaimable memory after PID=12718 got TIF_MEMDIE at 548 sec.
> Is this interpretation correct?
"memory reclaim gave up"? So why the hell isn't it returning a
failure to the caller?
i.e. We have a perfectly good page cache allocation failure error
path here all the way back to userspace, but we're invoking the
OOM-killer to kill random processes rather than returning ENOMEM to
the processes that are generating the memory demand?
Further: when did the oom-killer become the primary method
of handling situations when memory allocation needs to fail?
__GFP_WAIT does *not* mean memory allocation can't fail - that's what
__GFP_NOFAIL means. And none of the page cache allocations use
__GFP_NOFAIL, so why aren't we getting an allocation failure before
the oom-killer is kicked?
> I guess __alloc_pages_direct_reclaim() returns NULL with did_some_progress > 0
> so that __alloc_pages_may_oom() will not be called easily. As long as
> try_to_free_pages() returns non-zero, __alloc_pages_direct_reclaim() might
> return NULL with did_some_progress > 0. So, do_try_to_free_pages() is called
> for many times and is likely to return non-zero. And when
> __alloc_pages_may_oom() is called, TIF_MEMDIE is set on the thread waiting
> for mutex_lock(&"struct inode"->i_mutex) at xfs_file_buffered_aio_write()
> and I see no further progress.
Of course - TIF_MEMDIE doesn't do anything to the task that is
blocked, and the SIGKILL signal can't be delivered until the syscall
completes or the kernel code checks for pending signals and handles
EINTR directly. Mutexes are uninterruptible by design so there's no
EINTR processing, hence the oom killer cannot make progress when
everything is blocked on mutexes waiting for memory allocation to
succeed or fail.
i.e. until the lock holder exists from direct memory reclaim and
releases the locks it holds, the oom killer will not be able to save
the system. IOWs, the problem is that memory allocation is not
failing when it should....
Focussing on the OOM killer here is the wrong way to solve this
problem - the problem that needs to be solved is sane handling of
OOM conditions to avoid needing to invoke the OOM-killer...
> I don't know where to examine next. Would you please teach me command line
> for tracepoints to examine?
Tracepoints for what purpose?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-12-21 20:42 UTC|newest]
Thread overview: 177+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-12 13:54 [RFC PATCH] oom: Don't count on mm-less current process Tetsuo Handa
2014-12-16 12:47 ` Michal Hocko
2014-12-17 11:54 ` Tetsuo Handa
2014-12-17 13:08 ` Michal Hocko
2014-12-18 12:11 ` Tetsuo Handa
2014-12-18 15:33 ` Michal Hocko
2014-12-19 12:07 ` Tetsuo Handa
2014-12-19 12:49 ` Michal Hocko
2014-12-20 9:13 ` Tetsuo Handa
2014-12-20 11:42 ` Tetsuo Handa
2014-12-22 20:25 ` Michal Hocko
2014-12-23 1:00 ` Tetsuo Handa
2014-12-23 9:51 ` Michal Hocko
2014-12-23 11:46 ` Tetsuo Handa
2014-12-23 11:57 ` Tetsuo Handa
2014-12-23 12:12 ` Tetsuo Handa
2014-12-23 12:27 ` Michal Hocko
2014-12-23 12:24 ` Michal Hocko
2014-12-23 13:00 ` Tetsuo Handa
2014-12-23 13:09 ` Michal Hocko
2014-12-23 13:20 ` Tetsuo Handa
2014-12-23 13:43 ` Michal Hocko
2014-12-23 14:11 ` Tetsuo Handa
2014-12-23 14:57 ` Michal Hocko
2014-12-19 12:22 ` How to handle TIF_MEMDIE stalls? Tetsuo Handa
2014-12-20 2:03 ` Dave Chinner
2014-12-20 12:41 ` Tetsuo Handa
2014-12-20 22:35 ` Dave Chinner
2014-12-21 8:45 ` Tetsuo Handa
2014-12-21 20:42 ` Dave Chinner [this message]
2014-12-22 16:57 ` Michal Hocko
2014-12-22 21:30 ` Dave Chinner
2014-12-23 9:41 ` Johannes Weiner
2014-12-24 1:06 ` Dave Chinner
2014-12-24 2:40 ` Linus Torvalds
2014-12-29 18:19 ` Michal Hocko
2014-12-30 6:42 ` Tetsuo Handa
2014-12-30 11:21 ` Michal Hocko
2014-12-30 13:33 ` Tetsuo Handa
2014-12-31 10:24 ` Tetsuo Handa
2015-02-09 11:44 ` Tetsuo Handa
2015-02-10 13:58 ` Tetsuo Handa
2015-02-10 15:19 ` Johannes Weiner
2015-02-11 2:23 ` Tetsuo Handa
2015-02-11 13:37 ` Tetsuo Handa
2015-02-11 18:50 ` Oleg Nesterov
2015-02-11 18:59 ` Oleg Nesterov
2015-03-14 13:03 ` Tetsuo Handa
2015-02-17 12:23 ` Tetsuo Handa
2015-02-17 12:53 ` Johannes Weiner
2015-02-17 15:38 ` Michal Hocko
2015-02-17 22:54 ` Dave Chinner
2015-02-17 23:32 ` Dave Chinner
2015-02-18 8:25 ` Michal Hocko
2015-02-18 10:48 ` Dave Chinner
2015-02-18 12:16 ` Michal Hocko
2015-02-18 21:31 ` Dave Chinner
2015-02-19 9:40 ` Michal Hocko
2015-02-19 22:03 ` Dave Chinner
2015-02-20 9:27 ` Michal Hocko
2015-02-19 11:01 ` Johannes Weiner
2015-02-19 12:29 ` Michal Hocko
2015-02-19 12:58 ` Michal Hocko
2015-02-19 15:29 ` Tetsuo Handa
2015-02-19 21:53 ` Tetsuo Handa
2015-02-20 9:13 ` Michal Hocko
2015-02-20 13:37 ` Stefan Ring
2015-02-19 13:29 ` Tetsuo Handa
2015-02-20 9:10 ` Michal Hocko
2015-02-20 12:20 ` Tetsuo Handa
2015-02-20 12:38 ` Michal Hocko
2015-02-19 21:43 ` Dave Chinner
2015-02-20 12:48 ` Michal Hocko
2015-02-20 23:09 ` Dave Chinner
2015-02-19 10:24 ` Johannes Weiner
2015-02-19 22:52 ` Dave Chinner
2015-02-20 10:36 ` Tetsuo Handa
2015-02-20 23:15 ` Dave Chinner
2015-02-21 3:20 ` Theodore Ts'o
2015-02-21 9:19 ` Andrew Morton
2015-02-21 13:48 ` Tetsuo Handa
2015-02-21 21:38 ` Dave Chinner
2015-02-22 0:20 ` Johannes Weiner
2015-02-23 10:48 ` Michal Hocko
2015-02-23 11:23 ` Tetsuo Handa
2015-02-23 21:33 ` David Rientjes
2015-02-22 14:48 ` __GFP_NOFAIL and oom_killer_disabled? Tetsuo Handa
2015-02-23 10:21 ` Michal Hocko
2015-02-23 13:03 ` Tetsuo Handa
2015-02-24 18:14 ` Michal Hocko
2015-02-25 11:22 ` Tetsuo Handa
2015-02-25 16:02 ` Michal Hocko
2015-02-25 21:48 ` Tetsuo Handa
2015-02-25 21:51 ` Andrew Morton
2015-02-21 12:00 ` How to handle TIF_MEMDIE stalls? Tetsuo Handa
2015-02-23 10:26 ` Michal Hocko
2015-02-21 11:12 ` Tetsuo Handa
2015-02-21 21:48 ` Dave Chinner
2015-02-21 23:52 ` Johannes Weiner
2015-02-23 0:45 ` Dave Chinner
2015-02-23 1:29 ` Andrew Morton
2015-02-23 7:32 ` Dave Chinner
2015-02-27 18:24 ` Vlastimil Babka
2015-02-28 0:03 ` Dave Chinner
2015-02-28 15:17 ` Theodore Ts'o
2015-03-02 9:39 ` Vlastimil Babka
2015-03-02 22:31 ` Dave Chinner
2015-03-03 9:13 ` Vlastimil Babka
2015-03-04 1:33 ` Dave Chinner
2015-03-04 8:50 ` Vlastimil Babka
2015-03-04 11:03 ` Dave Chinner
2015-03-07 0:20 ` Johannes Weiner
2015-03-07 3:43 ` Dave Chinner
2015-03-07 15:08 ` Johannes Weiner
2015-03-02 20:22 ` Johannes Weiner
2015-03-02 23:12 ` Dave Chinner
2015-03-03 2:50 ` Johannes Weiner
2015-03-04 6:52 ` Dave Chinner
2015-03-04 15:04 ` Johannes Weiner
2015-03-04 17:38 ` Theodore Ts'o
2015-03-04 23:17 ` Dave Chinner
2015-02-28 16:29 ` Johannes Weiner
2015-02-28 16:41 ` Theodore Ts'o
2015-02-28 22:15 ` Johannes Weiner
2015-03-01 11:17 ` Tetsuo Handa
2015-03-06 11:53 ` Tetsuo Handa
2015-03-01 13:43 ` Theodore Ts'o
2015-03-01 16:15 ` Johannes Weiner
2015-03-01 19:36 ` Theodore Ts'o
2015-03-01 20:44 ` Johannes Weiner
2015-03-01 20:17 ` Johannes Weiner
2015-03-01 21:48 ` Dave Chinner
2015-03-02 0:17 ` Dave Chinner
2015-03-02 12:46 ` Brian Foster
2015-02-28 18:36 ` Vlastimil Babka
2015-03-02 15:18 ` Michal Hocko
2015-03-02 16:05 ` Johannes Weiner
2015-03-02 17:10 ` Michal Hocko
2015-03-02 17:27 ` Johannes Weiner
2015-03-02 16:39 ` Theodore Ts'o
2015-03-02 16:58 ` Michal Hocko
2015-03-04 12:52 ` Dave Chinner
2015-02-17 14:59 ` Michal Hocko
2015-02-17 14:50 ` Michal Hocko
2015-02-17 14:37 ` Michal Hocko
2015-02-17 14:44 ` Michal Hocko
2015-02-16 11:23 ` Tetsuo Handa
2015-02-16 15:42 ` Johannes Weiner
2015-02-17 11:57 ` Tetsuo Handa
2015-02-17 13:16 ` Johannes Weiner
2015-02-17 16:50 ` Michal Hocko
2015-02-17 23:25 ` Dave Chinner
2015-02-18 8:48 ` Michal Hocko
2015-02-18 11:23 ` Tetsuo Handa
2015-02-18 12:29 ` Michal Hocko
2015-02-18 14:06 ` Tetsuo Handa
2015-02-18 14:25 ` Michal Hocko
2015-02-19 10:48 ` Tetsuo Handa
2015-02-20 8:26 ` Michal Hocko
2015-02-23 22:08 ` David Rientjes
2015-02-24 11:20 ` Tetsuo Handa
2015-02-24 15:20 ` Theodore Ts'o
2015-02-24 21:02 ` Dave Chinner
2015-02-25 14:31 ` Tetsuo Handa
2015-02-27 7:39 ` Dave Chinner
2015-02-27 12:42 ` Tetsuo Handa
2015-02-27 13:12 ` Dave Chinner
2015-03-04 12:41 ` Tetsuo Handa
2015-03-04 13:25 ` Dave Chinner
2015-03-04 14:11 ` Tetsuo Handa
2015-03-05 1:36 ` Dave Chinner
2015-02-17 16:33 ` Michal Hocko
2014-12-29 17:40 ` [PATCH] mm: get rid of radix tree gfp mask for pagecache_get_page (was: Re: How to handle TIF_MEMDIE stalls?) Michal Hocko
2014-12-29 18:45 ` Linus Torvalds
2014-12-29 19:33 ` Michal Hocko
2014-12-30 13:42 ` Michal Hocko
2014-12-30 21:45 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141221204249.GL15665@dastard \
--to=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=oleg@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox