From: Theodore Ts'o <tytso@mit.edu>
To: Jan Kara <jack@suse.cz>
Cc: "Aleksa Sarai" <asarai@suse.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	"Dave Chinner" <david@fromorbit.com>,
	"Михаил Гаврилов" <mikhail.v.gavrilov@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Jan Blunck" <jblunck@infradead.org>,
	linux-mm@kvack.org, "Oscar Salvador" <osalvador@suse.com>,
	"Hannes Reinecke" <hare@suse.de>,
	linux-xfs@vger.kernel.org
Subject: Re: kernel BUG at fs/xfs/xfs_aops.c:853! in kernel 4.13 rc6
Date: Tue, 17 Oct 2017 10:12:33 -0400
Message-ID: <20171017141233.l3avshagrv7fr7xt@thunk.org>
In-Reply-To: <20171017092017.GN9762@quack2.suse.cz>

On Tue, Oct 17, 2017 at 11:20:17AM +0200, Jan Kara wrote:
> The operation we are speaking about here is different. It is more along the
> lines of "release this device".  And in the current world of containers,
> mount namespaces, etc. it is not trivial for userspace to implement this
> using umount(2) as Ted points out. I believe we could do that by walking
> through all mount points of a superblock and unmounting them (and I don't
> want to get into a discussion how to efficiently implement that now but in
> principle the kernel has all the necessary information).

Yes, this is what I want.  And regardless of how efficiently or not
the kernel can implement such an operation, by definition it will be
more efficient than if we have to do it in userspace.  (And I don't
think it has to be super-efficient, since this is not a hot path.  So
for the record, I wouldn't want to add any extra linked list
references, etc.)
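
To give a sense of what the userspace version involves today, here is
a minimal sketch, assuming the /proc/<pid>/mountinfo layout documented
in proc(5).  The function name and the sample data are hypothetical.
It only finds the mount points of one device within a single
mountinfo listing; a real tool would have to repeat this for every
mount namespace it can reach and then enter each one to unmount.

```python
def mounts_of_device(mountinfo_text, dev):
    """Return mount points in mountinfo_text whose major:minor equals dev.

    Assumes the proc(5) mountinfo layout:
    mount_id parent_id major:minor root mount_point options ... - fstype source super_opts
    """
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        if fields[2] == dev:
            # Spaces inside paths are escaped as \040, so split() is safe here.
            points.append(fields[4])
    return points


# Hypothetical sample: one XFS superblock (8:17) visible at two mount points.
SAMPLE = """\
36 25 8:17 / /mnt/data rw,relatime shared:1 - xfs /dev/sdb1 rw
41 25 8:1 / / rw,relatime shared:2 - ext4 /dev/sda1 rw
52 36 8:17 /sub /srv/bind rw,relatime shared:1 - xfs /dev/sdb1 rw
"""

print(mounts_of_device(SAMPLE, "8:17"))
```

Even in this small form the walk is racy: mounts can appear or vanish
between reading the listing and acting on it, which is part of why
doing it inside the kernel looks attractive.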

> What I'm a bit concerned about is the "release device reference" part - for
> a block device to stop looking busy we have to do that; however, then the
> block device can go away and the filesystem isn't prepared for that - we
> reference sb->s_bdev in lots of places, we have buffer heads which are part
> of bdev page cache, and probably other indirect assumptions I forgot about
> now. One solution to this is to not just stop accessing the device but
> truly clean up the filesystem up to a point where it is practically
> unmounted. I like this solution more but we have to be careful to block
> any access attempts high enough in VFS, ideally before ever entering fs code.

Right, so the first step would be to block access attempts high up in
the VFS.  The second would be to point any file descriptors at a
revoked NULL struct file, redirect any task struct's CWD so it is as
if the directory had gotten rmdir'ed, and munmap any mapped regions.
At that point, all of the file descriptors will be closed.  The third
step would be to do a syncfs(), which will force out any dirty pages.
And then finally, to call umount() in all of the namespaces, which
will naturally take care of any buffer or page cache references once
the ref count of the struct super_block goes to zero.

This doesn't all have to be a single system call.  Perhaps it would
make sense for the first and second steps to be one system call ---
call it revokefs(2), perhaps.  And then the last step could be another
system call --- maybe umountall(2).
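
The ordering of the four steps can be sketched as a toy model.  This
is not kernel code: revokefs() and umountall() below stand in for the
proposed, not yet existing, system calls, and the class, its fields,
and the sample values are all assumptions made up for illustration.
The only point is that the final unmount succeeds because the earlier
steps have already removed every user of the superblock.

```python
class ToySuperblock:
    """Toy model of one mounted filesystem; not kernel code."""

    def __init__(self, open_files, mounts, dirty_pages):
        self.open_files = open_files   # fds, CWDs and mmaps, lumped together
        self.mounts = set(mounts)      # (namespace, mount point) pairs
        self.dirty_pages = dirty_pages
        self.blocked = False           # True once the VFS refuses new access
        self.device_busy = True

    def open(self):
        if self.blocked:
            raise PermissionError("filesystem is being revoked")
        self.open_files += 1


def revokefs(sb):
    """Hypothetical revokefs(2): steps one and two."""
    sb.blocked = True      # step 1: block new access high up in the VFS
    sb.open_files = 0      # step 2: revoke fds, redirect CWDs, munmap


def syncfs(sb):
    """Step 3: force out dirty pages (modelled on the existing syncfs(2))."""
    sb.dirty_pages = 0


def umountall(sb):
    """Hypothetical umountall(2): step four, unmount in every namespace."""
    if sb.open_files:
        raise OSError("EBUSY: superblock still has users")
    sb.mounts.clear()
    sb.device_busy = False  # last reference gone, the bdev can be released


sb = ToySuperblock(open_files=3,
                   mounts={("ns1", "/mnt"), ("ns2", "/data")},
                   dirty_pages=10)
revokefs(sb)
syncfs(sb)
umountall(sb)
print(sb.device_busy)
```

Calling umountall() in this model without revokefs() first fails with
EBUSY, which mirrors what a plain umount(2) does today when files are
still open.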

> Another option would be to do something similar to what we do when the
> device just gets unplugged under our hands - we detach bdev from gendisk,
> leave it dangling and invisible. But we would still somehow have to
> convince DM that the bdev practically went away by calling
> disk->fops->release() and it all just seems fragile to me. But I wanted to
> mention this option in case the above solution proves to be too difficult.

Yeah, that's similarly as fragile as using the ext4/xfs/f2fs
shutdown/goingdown ioctl.  In order to do this right I really think we
need to get the VFS involved, so it can be a real, clean unmount, as
opposed to something where we just rip the file system away from the
bdev.

						- Ted
						

