From: Dave Chinner <dchinner@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-block@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Eric Sandeen <sandeen@redhat.com>, Christoph Hellwig <hch@lst.de>,
	Zhang Yi <yi.zhang@redhat.com>
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages
Date: Fri, 28 Apr 2023 09:33:20 +1000
Message-ID: <ZEsGQFN4Pd12r+Nt@rh>
In-Reply-To: <ZEnb7KuOWmu5P+V9@ovpn-8-24.pek2.redhat.com>

On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> Hello Guys,
> 
> I got one report in which buffered write IO hangs in balance_dirty_pages,
> after one nvme block device is unplugged physically, then umount can't
> succeed.

The bug here is that the device unplug code has not told the
filesystem that it's gone away permanently.

This is the same problem we've been having for the past 15 years -
when a block device goes away permanently, it leaves the filesystem
and everything else dependent on that device completely unaware that
they can no longer function. IOWs, the block device removal path is
relying on -unreliable side effects- of filesystem IO error handling
to produce what we'd call "correct behaviour".

The block device needs to shut down the filesystem when it hits
some sort of fatal, unrecoverable error like this (e.g. hot
unplug). We have the XFS_IOC_GOINGDOWN ioctl for telling the
filesystem it can't function anymore. This ioctl
(_IOR('X',125,__u32)) has also been replicated into ext4, f2fs and
CIFS and it gets exercised heavily by fstests. Hence this isn't XFS
specific functionality, nor is it untested functionality.

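For reference, driving that shutdown from userspace needs nothing
more than the ioctl number quoted above. A minimal sketch - not a
tested tool; the flag value 0 (flush everything, then shut down) and
the CAP_SYS_ADMIN requirement are assumptions taken from the XFS
implementation, and this is roughly what xfs_io's "shutdown" command
does:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>	/* _IOR() */
#include <linux/types.h>	/* __u32 */

#ifndef FS_IOC_SHUTDOWN
#define FS_IOC_SHUTDOWN	_IOR('X', 125, __u32)	/* number quoted above */
#endif

int main(int argc, char **argv)
{
	__u32 flags = 0;	/* assumed: flush log and data, then shut down */
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
		return 1;
	}

	/* Any fd on the filesystem will do; the mountpoint is easiest. */
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Tell the filesystem it can't function anymore; from here on
	 * new IO should fail fast instead of hanging. */
	if (ioctl(fd, FS_IOC_SHUTDOWN, &flags) < 0) {
		perror("FS_IOC_SHUTDOWN");
		close(fd);
		return 1;
	}

	close(fd);
	return 0;
}
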
The ioctl should be lifted to the VFS as FS_IOC_SHUTDOWN and a
super_operations method added to trigger a filesystem shutdown.
That way, if there's a superblock associated with the block device,
the block device removal code could simply call
sb->s_op->shutdown(sb, REASON) if it exists rather than
sync_filesystem(sb). Then all these hangs on dead devices would
simply go away.

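To make that concrete, here's a hypothetical sketch of what the hook
could look like - none of this exists in the tree today; the
->shutdown method, the reason codes and bdev_shutdown_filesystems()
are names made up purely to illustrate the proposal:

#include <linux/fs.h>	/* get_super(), drop_super(), sync_filesystem() */

enum sb_shutdown_reason {
	SB_SHUTDOWN_DEVICE_GONE,	/* hot unplug, device deleted */
	SB_SHUTDOWN_DEVICE_ERROR,	/* fatal, unrecoverable IO error */
};

/* Proposed new method in struct super_operations:
 *
 *	void (*shutdown)(struct super_block *sb,
 *			 enum sb_shutdown_reason reason);
 */

/*
 * Called from the block device removal path. If the filesystem
 * provides ->shutdown, tell it the device is gone so it can fail
 * all further IO immediately and toss dirty pages; otherwise fall
 * back to today's best-effort sync_filesystem() and hope the IO
 * errors it generates are enough to shut the filesystem down.
 */
static void bdev_shutdown_filesystems(struct block_device *bdev)
{
	struct super_block *sb = get_super(bdev);

	if (!sb)
		return;		/* no mounted filesystem on this bdev */

	if (sb->s_op->shutdown)
		sb->s_op->shutdown(sb, SB_SHUTDOWN_DEVICE_GONE);
	else
		sync_filesystem(sb);

	drop_super(sb);
}

Each filesystem would presumably wire ->shutdown up to the same
internal path its GOINGDOWN ioctl already uses, so there's very
little new code needed on the filesystem side.
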
This way we won't have to spend another two decades listening to
people complain about how applications and filesystems hang when the
storage device is pulled out from under them and the filesystem
doesn't notice until the system has already hung....

> So far only observed on ext4 FS, not see it on XFS.

Pure dumb luck - a journal IO failed on XFS (probably during the
sync_filesystem() call) and that shut the filesystem down.

> I guess it isn't
> related with disk type, and not tried such test on other type of disks yet,
> but will do.

It can happen on any block device based storage that gets pulled
from under any filesystem without warning.

> Seems like dirty pages aren't cleaned after ext4 bio is failed in this
> situation?

Yes, because the filesystem wasn't shut down on device removal, so
it doesn't know it's allowed to toss away dirty pages that can never
be cleaned via the IO path....

-Dave.
-- 
Dave Chinner
dchinner@redhat.com


