linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Brian Foster <bfoster@redhat.com>, Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Xiong Zhou <xzhou@redhat.com>,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: mm allocation failure and hang when running xfstests generic/269 on xfs
Date: Thu, 2 Mar 2017 16:14:11 +0100
Message-ID: <20170302151411.GM1404@dhcp22.suse.cz>
In-Reply-To: <20170302145131.GF3213@bfoster.bfoster>

On Thu 02-03-17 09:51:31, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote:
> > On Thu 02-03-17 09:23:15, Brian Foster wrote:
> > > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote:
> > > > On Thu 02-03-17 08:41:58, Brian Foster wrote:
> > > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> > > > [...]
> > > > > > > I see your argument about being in sync with other kmem helpers but
> > > > > > > those are a bit different because the regular page/slab allocators allow
> > > > > > > never-fail semantics (even though this is mostly ignored by those helpers,
> > > > > > > which implement their own retries, but that is a different topic).
> > > > > > 
> > > > > 
> > > > > ... but what I'm trying to understand here is whether this failure
> > > > > scenario is specific to vmalloc() or whether the other kmem_*()
> > > > > functions are susceptible to the same problem. For example, suppose we
> > > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> > > > > KM_SLEEP) call. Could we hit the same problem if the process is killed?
> > > > 
> > > > > Well, kmem_zalloc uses kmalloc, which can also fail when we are out of
> > > > > memory, but in that case we can expect the OOM killer to release some
> > > > > memory, which would allow us to make forward progress on the next
> > > > > retry. So essentially retrying around kmalloc is much safer in this
> > > > > regard. A failing vmalloc might be permanent because there is no vmalloc
> > > > > space to allocate from or, much more likely, due to the already mentioned
> > > > > patch. So vmalloc is different, really.
> > > 
> > > Right.. that's why I'm asking. So it's technically possible but highly
> > > unlikely due to the different failure characteristics. That seems
> > > reasonable to me, then. 
> > > 
> > > To be clear, do we understand what causes the vzalloc() failure to be
> > > effectively permanent in this specific reproducer? I know you mention
> > > above that we could be out of vmalloc space, but that doesn't clarify
> > > whether there are other potential failure paths or then what this has to
> > > do with the fact that the process was killed. Does the pending signal
> > > cause the subsequent failures or are you saying that there is some other
> > > root cause of the failure, this process would effectively be spinning
> > > here anyways, and we're just noticing it because it's trying to exit?
> > 
> > In this particular case it is fatal_signal_pending that causes the
> > permanent failure. This check was added to prevent complete depletion
> > of memory reserves on OOM, when a killed task has a free ticket to the
> > reserves and vmalloc requests can be really large. In this case there
> > was no OOM killer going on, but fsstress has SIGKILL pending for another
> > reason, most probably as a result of the group_exit when all threads
> > are killed (see zap_process). I could have turned fatal_signal_pending
> > into tsk_is_oom_victim, which would be less likely to hit, but in
> > principle fatal_signal_pending should be better because we do want to
> > bail out as soon as possible when the process is exiting.
> > 
> > What I really wanted to say is that there are other possible permanent
> > failure paths in vmalloc AFAICS. They are much less probable but they
> > still exist.
> > 
> > Does that make more sense now?
> 
> Yes, thanks. That explains why this crops up now where it hasn't in the
> past. Please include that background in the commit log description.

OK, does this sound better? I am open to any suggestions to improve this,
of course.

: xfs: allow kmem_zalloc_greedy to fail
: 
: Even though kmem_zalloc_greedy is documented as possibly failing, the
: current code doesn't really implement this properly and loops on the
: smallest allowed size forever. This is a problem because vzalloc might
: fail permanently: we might run out of vmalloc space or, since
: 5d17a73a2ebe ("vmalloc: back off when the current task is killed"), the
: current task might have been killed. The latter makes the failure
: scenario much more probable than it used to be. Fix this by bailing out
: if the minimum size request failed.
: 
: This was noticed by Xiong Zhou as a hung generic/269 xfstest.
: 
: fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
: fsstress cpuset=/ mems_allowed=0-1
: CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
: Call Trace:
:  dump_stack+0x63/0x87
:  warn_alloc+0x114/0x1c0
:  ? alloc_pages_current+0x88/0x120
:  __vmalloc_node_range+0x250/0x2a0
:  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
:  ? free_hot_cold_page+0x21f/0x280
:  vzalloc+0x54/0x60
:  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
:  kmem_zalloc_greedy+0x2b/0x40 [xfs]
:  xfs_bulkstat+0x11b/0x730 [xfs]
:  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
:  ? selinux_capable+0x20/0x30
:  ? security_capable+0x48/0x60
:  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
:  xfs_file_ioctl+0x9dd/0xad0 [xfs]
:  ? do_filp_open+0xa5/0x100
:  do_vfs_ioctl+0xa7/0x5e0
:  SyS_ioctl+0x79/0x90
:  do_syscall_64+0x67/0x180
:  entry_SYSCALL64_slow_path+0x25/0x25
: 
: fsstress keeps looping inside kmem_zalloc_greedy without any way out
: because vmalloc keeps failing due to fatal_signal_pending.
: 
: Reported-by: Xiong Zhou <xzhou@redhat.com>
: Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
: Signed-off-by: Michal Hocko <mhocko@suse.com>
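
For illustration only, here is a minimal user-space sketch of the retry
logic the commit message describes. It is not the kernel code: fake_vzalloc(),
fail_above, alloc_calls and greedy_alloc() are hypothetical names standing in
for vzalloc() and kmem_zalloc_greedy(), and the stub allocator simply refuses
any request larger than a configurable limit. The point it shows is the early
exit once the minimum size has been tried and has failed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for vzalloc(): every request larger than
 * fail_above bytes fails, so a permanent failure can be simulated. */
static size_t fail_above;
static unsigned int alloc_calls;

static void *fake_vzalloc(size_t size)
{
	alloc_calls++;
	if (size > fail_above)
		return NULL;
	return calloc(1, size);
}

/* Model of a greedy allocation: start at maxsize, halve on failure,
 * never go below minsize, and give up once minsize itself has failed. */
static void *greedy_alloc(size_t *size, size_t minsize, size_t maxsize)
{
	size_t kmsize = maxsize;
	void *ptr;

	while (!(ptr = fake_vzalloc(kmsize))) {
		/* The minimum was just tried and failed: stop retrying. */
		if (kmsize == minsize)
			break;
		kmsize >>= 1;
		if (kmsize <= minsize)
			kmsize = minsize;
	}
	if (ptr)
		*size = kmsize;
	return ptr;
}
```

Without the kmsize == minsize check the loop would keep calling the
allocator with the minimum size for as long as it keeps failing, which is
the hang seen with fsstress.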

> Also, that kind of makes me think that a fatal_signal_pending() check is
> still appropriate in the loop, even if we want to drop the infinite
> retry loop in kmem_zalloc_greedy() as well. There's no sense in doing
> however many retries are left before we return and that's also more
> explicit for the next person who goes to change this code in the future.

I am not objecting to adding fatal_signal_pending as well. I just thought
that, from the logic POV, breaking after reaching the minimum size is just
the right thing to do. We can optimize further by checking
fatal_signal_pending and reducing retries when we know they don't make
much sense, but that should be done on top as an optimization IMHO.
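
As a rough user-space sketch of that follow-up optimization (an
assumption about its shape, not the actual patch): task_killed is a
hypothetical flag modelling fatal_signal_pending(current), model_vzalloc()
is a hypothetical stub that always fails while the flag is set, and
greedy_alloc_killable() is an illustrative name. Checking the flag inside
the loop skips the remaining smaller-size attempts, which are known to fail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of fatal_signal_pending(current). */
static bool task_killed;
static unsigned int attempts;

/* Hypothetical stand-in for vzalloc(): refuses every request while the
 * task is marked as killed, otherwise returns zeroed memory. */
static void *model_vzalloc(size_t size)
{
	attempts++;
	if (task_killed)
		return NULL;
	return calloc(1, size);
}

static void *greedy_alloc_killable(size_t *size, size_t minsize,
				   size_t maxsize)
{
	size_t kmsize = maxsize;
	void *ptr;

	while (!(ptr = model_vzalloc(kmsize))) {
		/* Retrying cannot help a killed task, and the minimum
		 * size is the last attempt in any case. */
		if (task_killed || kmsize == minsize)
			break;
		kmsize >>= 1;
		if (kmsize <= minsize)
			kmsize = minsize;
	}
	if (ptr)
		*size = kmsize;
	return ptr;
}
```

With the flag set, this model gives up after a single failed attempt
instead of walking all the way down to the minimum size.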

> Otherwise, I'm fine with breaking the infinite retry loop at the same
> time. It looks like Christoph added this function originally so this
> should probably require his ack as well..

What do you think, Christoph?
-- 
Michal Hocko
SUSE Labs


