From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: hannes@cmpxchg.org, tytso@mit.edu
Cc: david@fromorbit.com, mhocko@suse.cz, dchinner@redhat.com,
linux-mm@kvack.org, rientjes@google.com, oleg@redhat.com,
akpm@linux-foundation.org, mgorman@suse.de,
torvalds@linux-foundation.org, xfs@oss.sgi.com,
fernando_b1@lab.ntt.co.jp
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Sun, 1 Mar 2015 20:17:56 +0900
Message-ID: <201503012017.EAD00571.HOOJVOStMFLFQF@I-love.SAKURA.ne.jp>
In-Reply-To: <20150228221558.GA23028@phnom.home.cmpxchg.org>
Johannes Weiner wrote:
> On Sat, Feb 28, 2015 at 11:41:58AM -0500, Theodore Ts'o wrote:
> > On Sat, Feb 28, 2015 at 11:29:43AM -0500, Johannes Weiner wrote:
> > >
> > > I'm trying to figure out if the current nofail allocators can get
> > > their memory needs figured out beforehand. And reliably so - what
> > > good are estimates that are right 90% of the time, when failing the
> > > allocation means corrupting user data? What is the contingency plan?
> >
> > In the ideal world, we can figure out the exact memory needs
> > beforehand. But we live in an imperfect world, and given that block
> > devices *also* need memory, the answer is "of course not". We can't
> > be perfect. But we can least give some kind of hint, and we can offer
> > to wait before we get into a situation where we need to loop in
> > GFP_NOWAIT --- which is the contingency/fallback plan.
>
> Overestimating should be fine, the result would a bit of false memory
> pressure. But underestimating and looping can't be an option or the
> original lockups will still be there. We need to guarantee forward
> progress or the problem is somewhat mitigated at best - only now with
> quite a bit more complexity in the allocator and the filesystems.
>
> The block code would have to be looked at separately, but doesn't it
> already use mempools etc. to guarantee progress?
>
If underestimating is tolerable, can we simply set different watermark
levels for GFP_ATOMIC / GFP_NOIO / GFP_NOFS / GFP_KERNEL allocations?
For example,
GFP_KERNEL (or above) can fail if memory usage exceeds 95%
GFP_NOFS can fail if memory usage exceeds 97%
GFP_NOIO can fail if memory usage exceeds 98%
GFP_ATOMIC can fail if memory usage exceeds 99%
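
To make the idea concrete, here is a minimal userspace sketch of such per-class
ceilings. It is not kernel code and does not reflect how the page allocator
actually computes watermarks; the names alloc_class, usage_limit_pct and
may_allocate() are hypothetical, and the percentages are simply the ones from
the list above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocation classes, ordered from least to most critical. */
enum alloc_class {
	CLASS_KERNEL,	/* GFP_KERNEL (or above) */
	CLASS_NOFS,	/* GFP_NOFS */
	CLASS_NOIO,	/* GFP_NOIO */
	CLASS_ATOMIC,	/* GFP_ATOMIC */
	CLASS_COUNT
};

/* Usage ceilings in percent, copied from the example list above. */
static const unsigned long usage_limit_pct[CLASS_COUNT] = {
	[CLASS_KERNEL] = 95,
	[CLASS_NOFS]   = 97,
	[CLASS_NOIO]   = 98,
	[CLASS_ATOMIC] = 99,
};

/*
 * Return true if an allocation of the given class is still allowed,
 * i.e. memory usage does not exceed that class's ceiling.
 */
static bool may_allocate(enum alloc_class cls, unsigned long used_pages,
			 unsigned long total_pages)
{
	assert(cls < CLASS_COUNT && total_pages > 0 &&
	       used_pages <= total_pages);
	/* used / total > limit / 100  <=>  used * 100 > limit * total */
	return used_pages * 100 <= usage_limit_pct[cls] * total_pages;
}
```

With 975 of 1000 pages in use, may_allocate() in this sketch refuses
CLASS_KERNEL and CLASS_NOFS but still admits CLASS_NOIO and CLASS_ATOMIC, so
the more critical classes keep some headroom after the less critical ones have
started failing.
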
I think it is strange that the order-0 GFP_NOIO allocation shown below enters
a retry-forever loop as soon as GFP_KERNEL (or above) allocations start
waiting for reclaim. Using the same watermark for all of them prevents kernel
worker threads from processing their workqueues. While it is legal to perform
a blocking operation from a workqueue, blocking forever monopolizes the
workqueue; other jobs in the workqueue get stuck.
[ 907.302050] kworker/1:0 R running task 0 10832 2 0x00000080
[ 907.303961] Workqueue: events_freezable_power_ disk_events_workfn
[ 907.305706] ffff88007c8ab7d8 0000000000000046 ffff88007c8ab8a0 ffff88007c894190
[ 907.307761] 0000000000012500 ffff88007c8abfd8 0000000000012500 ffff88007c894190
[ 907.309894] 0000000000000020 ffff88007c8ab8b0 0000000000000002 ffffffff81848408
[ 907.311949] Call Trace:
[ 907.312989] [<ffffffff8159f814>] _cond_resched+0x24/0x40
[ 907.314578] [<ffffffff81122119>] shrink_slab+0x139/0x150
[ 907.316182] [<ffffffff811252bf>] do_try_to_free_pages+0x35f/0x4d0
[ 907.317889] [<ffffffff811254c4>] try_to_free_pages+0x94/0xc0
[ 907.319535] [<ffffffff8111a793>] __alloc_pages_nodemask+0x4e3/0xa40
[ 907.321259] [<ffffffff8115a8ce>] alloc_pages_current+0x8e/0x100
[ 907.322945] [<ffffffff8125bed6>] bio_copy_user_iov+0x1d6/0x380
[ 907.324606] [<ffffffff8125e4cd>] ? blk_rq_init+0xed/0x160
[ 907.326196] [<ffffffff8125c119>] bio_copy_kern+0x49/0x100
[ 907.327788] [<ffffffff810a14a0>] ? prepare_to_wait_event+0x100/0x100
[ 907.329549] [<ffffffff81265e6f>] blk_rq_map_kern+0x6f/0x130
[ 907.331184] [<ffffffff8116393e>] ? kmem_cache_alloc+0x48e/0x4b0
[ 907.332877] [<ffffffff813a66cf>] scsi_execute+0x12f/0x160
[ 907.334452] [<ffffffff813a7f14>] scsi_execute_req_flags+0x84/0xf0
[ 907.336156] [<ffffffffa01e29cc>] sr_check_events+0xbc/0x2e0 [sr_mod]
[ 907.337893] [<ffffffff8109834c>] ? put_prev_entity+0x2c/0x3b0
[ 907.339539] [<ffffffffa01d6177>] cdrom_check_events+0x17/0x30 [cdrom]
[ 907.341289] [<ffffffffa01e2e5d>] sr_block_check_events+0x2d/0x30 [sr_mod]
[ 907.343115] [<ffffffff812701c6>] disk_check_events+0x56/0x1b0
[ 907.344771] [<ffffffff81270331>] disk_events_workfn+0x11/0x20
[ 907.346421] [<ffffffff8107ceaf>] process_one_work+0x13f/0x370
[ 907.348057] [<ffffffff8107de99>] worker_thread+0x119/0x500
[ 907.349650] [<ffffffff8107dd80>] ? rescuer_thread+0x350/0x350
[ 907.351295] [<ffffffff81082f7c>] kthread+0xdc/0x100
[ 907.352765] [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 907.354520] [<ffffffff815a383c>] ret_from_fork+0x7c/0xb0
[ 907.356097] [<ffffffff81082ea0>] ? kthread_create_on_node+0x1b0/0x1b0
When I changed GFP_NOIO in scsi_execute() to GFP_ATOMIC, the above trace went
away. If we can reserve some amount of memory for the block / filesystem
layers rather than letting non-critical allocations consume it, the above
trace will likely go away.
Or, if we can implement a freelist for GFP_NOIO, maybe we can instead change
GFP_NOIO allocations to take these steps:
(1) try the allocation using GFP_ATOMIC|__GFP_NOWARN
(2) try allocating from the freelist for GFP_NOIO
(3) fail the allocation with a warning message
Ditto for GFP_NOFS.
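
The three steps could look roughly like the toy model below. This is only a
userspace sketch under stated assumptions: struct toy_allocator,
toy_alloc_noio() and the page counters are all hypothetical, no such freelist
exists in the kernel today, and real code would hand out pages rather than
decrement counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model: counts of free pages, not real page allocator state. */
struct toy_allocator {
	unsigned int general_free;	/* pages a non-blocking attempt may take */
	unsigned int noio_free;		/* pages set aside on the GFP_NOIO freelist */
};

enum alloc_result { FROM_GENERAL, FROM_NOIO_FREELIST, ALLOC_FAILED };

static enum alloc_result toy_alloc_noio(struct toy_allocator *a)
{
	assert(a != NULL);

	/* (1) Non-blocking attempt: no reclaim, no warning on failure. */
	if (a->general_free > 0) {
		a->general_free--;
		return FROM_GENERAL;
	}

	/* (2) Fall back to the dedicated GFP_NOIO freelist. */
	if (a->noio_free > 0) {
		a->noio_free--;
		return FROM_NOIO_FREELIST;
	}

	/* (3) Fail with a warning instead of retrying forever. */
	fprintf(stderr, "toy_alloc_noio: allocation failed\n");
	return ALLOC_FAILED;
}
```

The point of the ordering is that the caller never sleeps in reclaim: it
either gets a page immediately, consumes the reserve, or sees a failure it
has to handle.
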