From: Jan Kara <jack@suse.cz>
To: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
colyli@suse.com, "Jan Kara" <jack@suse.cz>,
"Aleksa Sarai" <asarai@suse.de>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Dave Chinner" <david@fromorbit.com>,
"Михаил Гаврилов" <mikhail.v.gavrilov@gmail.com>,
"Christoph Hellwig" <hch@infradead.org>,
"Jan Blunck" <jblunck@infradead.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Oscar Salvador" <osalvador@suse.com>,
"Hannes Reinecke" <hare@suse.de>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: kernel BUG at fs/xfs/xfs_aops.c:853! in kernel 4.13 rc6
Date: Tue, 7 Nov 2017 16:26:29 +0100
Message-ID: <20171107152629.GF11391@quack2.suse.cz>
In-Reply-To: <CAB=NE6UK3463JfiZQFHUiMj=v6HDG0k+uEE-2OvRMsW7i1EMhA@mail.gmail.com>
On Mon 06-11-17 11:25:34, Luis R. Rodriguez wrote:
> On Tue, Oct 17, 2017 at 7:12 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> > On Tue, Oct 17, 2017 at 11:20:17AM +0200, Jan Kara wrote:
> >> The operation we are speaking about here is different. It is more along the
> >> lines of "release this device". And in the current world of containers,
> >> mount namespaces, etc. it is not trivial for userspace to implement this
> >> using umount(2) as Ted points out. I believe we could do that by walking
> >> through all mount points of a superblock and unmounting them (and I don't
> >> want to get into a discussion how to efficiently implement that now but in
> >> principle the kernel has all the necessary information).
> >
> > Yes, this is what I want. And regardless of how efficiently or not
> > the kernel can implement such an operation, by definition it will be
> > more efficient than if we have to do it in userspace.
>
> It seems most folks agree we could all benefit from this, to help
> userspace with a sane implementation.
>
> >> What I'm a bit concerned about is the "release device reference" part - for
> >> a block device to stop looking busy we have to do that; however, then the
> >> block device can go away and the filesystem isn't prepared for that - we
> >> reference sb->s_bdev in lots of places, we have buffer heads which are part
> >> of bdev page cache, and probably other indirect assumptions I forgot about
> >> now.
>
> Is this new operation really the only place where this type of work
> could be useful, or are there existing use cases this sort of
> functionality could also be used for?

The ability to "invalidate" an open file descriptor, so that it no longer
points to the object it used to, is useful for other cases as well, I
guess...

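
As a rough single-process analogy only (this is not the kernel mechanism
being discussed, which would have to act on every process holding the file),
userspace can already repoint one of its own descriptors at /dev/null with
dup2(2). The helper name revoke_fd below is made up for illustration:

```python
import os
import tempfile


def revoke_fd(fd):
    """Repoint an open descriptor at /dev/null.

    The descriptor number stays valid, but it no longer refers to the
    original file. This only affects the calling process.
    """
    null_fd = os.open(os.devnull, os.O_RDWR)
    try:
        # dup2 atomically closes whatever fd referred to and makes it
        # a duplicate of null_fd.
        os.dup2(null_fd, fd)
    finally:
        os.close(null_fd)
```

After the call, reads on the old descriptor see end-of-file and writes are
discarded, while the original file itself is untouched.
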
> For instance I don't think we do something similar to revokefs(2) (as
> described below) when a device has been removed from a system, you
> seem to suggest we remove the dev from gendisk leaving it dangling and
> invisible. But other than this, it would seem it's up to the filesystem
> to get anything else implemented correctly?

Yes, that's the current situation. When the device is yanked from under a
filesystem, the current implementation keeps things relatively
straightforward from the fs POV: as far as the fs is concerned, the
underlying device still exists; it just returns errors for any IO submitted
to it. It is up to the fs implementation to deal with that and to shut
itself down correctly in such a case.

> > This all doesn't have to be a single system call. Perhaps it would
> > make sense for first and second step to be one system call --- call it
> > revokefs(2), perhaps. And then the last step could be another system
> > call --- maybe umountall(2).
>
> Wouldn't *some* part of this also help *enhance* filesystem suspend /
> thaw when used on system suspend / resume as well?
>
> If I may, if we split these up, into two, say revokefs(2) and
> umountall(2), how about:
>
> a) revokefs(2): ensures all file descriptors for the fs are closed
> - blocks access attempts high up in VFS
> - point any file descriptor to a revoked null struct file
> - redirect any task struct CWDs as if the directory had been rmdir'd
> - munmap any mapped regions
>
> Of these only the first one seems useful for fs suspend?

If you mean "blocks access attempts high up in VFS", that already
happens for writes when you freeze the filesystem. Also, suspend is
different in that userspace is already frozen by the time you get to
freezing filesystems, so you care only about in-kernel users, and there you
do not have a standard set of entry points anyway... So I don't see much
overlap with system suspend here.
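
For reference, freezing as seen from userspace is the FIFREEZE / FITHAW
ioctl pair from <linux/fs.h> (the same thing fsfreeze(8) uses). A minimal
sketch follows; the helper names are made up, the request numbers are
computed with the generic ioctl encoding, which is an assumption that does
not hold on every architecture, and the calls need CAP_SYS_ADMIN:

```python
import fcntl
import os


def _iowr(type_char, nr, size):
    # Generic (asm-generic) ioctl number encoding for a read/write request.
    # Some architectures use a different layout; this is an assumption.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FIFREEZE = _iowr("X", 119, 4)  # _IOWR('X', 119, int)
FITHAW = _iowr("X", 120, 4)    # _IOWR('X', 120, int)


def _fs_ioctl(mountpoint, request):
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, request, 0)
    finally:
        os.close(fd)


def freeze(mountpoint):
    """Block new writes to the filesystem mounted at mountpoint."""
    _fs_ioctl(mountpoint, FIFREEZE)


def thaw(mountpoint):
    """Undo a previous freeze()."""
    _fs_ioctl(mountpoint, FITHAW)
```

While frozen, writers block until the thaw, which is the "blocking high up
in VFS" behavior for writes mentioned above.
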
>
> b) umountall(2): properly unmounts filesystem from all namespaces
> - May need to verify if revokefs(2) was called, if so, now that all
> file descriptors should be closed, do syncfs() to force out any
> dirty pages

IMHO it doesn't need to verify this. The unmount will just fail if someone
is still using the fs.

> - unmount() in all namespaces, this takes care of any buffer or page
> cache reference once the ref count of the struct super block goes
> to zero
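
To illustrate why this is awkward from userspace today: the best a process
can do is list the mount points of a device that are visible in its own
mount namespace, e.g. by parsing /proc/self/mountinfo (format per proc(5)).
A minimal sketch, assuming the major:minor pair identifies the superblock
(not true for every filesystem) and using a made-up helper name:

```python
import re


def _unescape(field):
    # mountinfo encodes space, tab, newline and backslash as octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_points_of(mountinfo_text, dev):
    """Return mount points whose major:minor field equals dev.

    mountinfo_text is the content of /proc/<pid>/mountinfo; dev is a
    string such as "8:17". Only mounts visible in that process's mount
    namespace are listed.
    """
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # malformed or empty line
        # Field 3 is major:minor, field 5 is the mount point.
        if fields[2] == dev:
            points.append(_unescape(fields[4]))
    return points
```

Typical use would be mount_points_of(open("/proc/self/mountinfo").read(),
"8:17"). Mounts in other namespaces stay invisible to this, which is the
part only the kernel can do properly.
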
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR