From: Matthew Wilcox <mawilcox@microsoft.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Theodore Ts'o <tytso@mit.edu>,
Christoph Hellwig <hch@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
Dave Chinner <david@fromorbit.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Jan Kara <jack@suse.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: RE: [PATCH v2 2/9] ext2: tell DAX the size of allocation holes
Date: Fri, 9 Sep 2016 20:35:01 +0000
Message-ID: <DM2PR21MB0089BCA980B67D8C53B25A1BCBFA0@DM2PR21MB0089.namprd21.prod.outlook.com>
In-Reply-To: <20160909164808.GC18554@linux.intel.com>
I feel like we're not only building on shifting sands, but we haven't decided whether we're building a Pyramid or a Sphinx.
I thought after Storage Summit, we had broad agreement that we were moving to a primary DAX API that was not BH (nor indeed iomap) based. We would still have DAX helpers for block based filesystems (because duplicating all that code between filesystems is pointless), but I now know of three filesystems which are not block based that are interested in using DAX. Jared Hulbert's AXFS is a nice public example.
I posted a prototype of this here:
https://groups.google.com/d/msg/linux.kernel/xFFHVCQM7Go/ZQeDVYTnFgAJ
It is, of course, woefully out of date, but some of the principles in it are still good (and I'm working to split it into digestible chunks).
The essence:
1. VFS or VM calls filesystem (eg ->fault())
2. Filesystem calls DAX (eg dax_fault())
3. DAX looks in radix tree, finds no information.
4. DAX calls (NEW!) mapping->a_ops->populate_pfns
5a. Filesystem (if not block based) does its own thing to find out the PFNs corresponding to the requested range, then inserts them into the radix tree (possible helper in DAX code)
5b. Filesystem (if block based) looks up its internal data structure (eg extent tree) and
calls dax_create_pfns() (see giant patch from yesterday, only instead of
passing a get_block_t, the filesystem has already filled in a bh which
describes the entire extent that this access happens to land in).
6b. DAX takes care of calling bdev_direct_access() from dax_create_pfns().
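To make the flow in steps 1-6b concrete, here is a minimal userspace sketch in C. It is a toy model under loud assumptions, not kernel code: every `model_*` name is hypothetical, a flat array stands in for the radix tree, `model_direct_access()` stands in for bdev_direct_access(), and the filesystem has exactly one hard-coded extent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_PAGES 16          /* size of the toy per-file "radix tree" */
#define MODEL_NO_PFN   UINT64_MAX  /* marks a slot with no information yet */

struct model_mapping;

/* Hypothetical stand-in for the proposed a_ops->populate_pfns hook (step 4). */
typedef int (*model_populate_pfns_t)(struct model_mapping *m, size_t pgoff);

struct model_mapping {
    uint64_t pfn[MODEL_NR_PAGES];        /* stands in for the radix tree */
    model_populate_pfns_t populate_pfns; /* supplied by the filesystem */
    unsigned populate_calls;             /* counts filesystem lookups */
};

/* The extent a block based filesystem hands to DAX (the role of the bh). */
struct model_extent {
    size_t first_pgoff;   /* first file page covered by the extent */
    size_t nr_pages;      /* length of the extent in pages */
    uint64_t first_block; /* first device block backing the extent */
};

static void model_mapping_init(struct model_mapping *m, model_populate_pfns_t fn)
{
    for (size_t i = 0; i < MODEL_NR_PAGES; i++)
        m->pfn[i] = MODEL_NO_PFN;
    m->populate_pfns = fn;
    m->populate_calls = 0;
}

/* Step 6b: toy replacement for bdev_direct_access(), block -> PFN. */
static uint64_t model_direct_access(uint64_t block)
{
    return 0x1000 + block;
}

/* Steps 5b/6b: DAX helper that fills the tree for a whole extent. */
static int model_dax_create_pfns(struct model_mapping *m,
                                 const struct model_extent *ext)
{
    if (ext->first_pgoff >= MODEL_NR_PAGES ||
        ext->nr_pages > MODEL_NR_PAGES - ext->first_pgoff)
        return -1;
    for (size_t i = 0; i < ext->nr_pages; i++)
        m->pfn[ext->first_pgoff + i] =
            model_direct_access(ext->first_block + i);
    return 0;
}

/* Step 5b: a block based filesystem looks up its extent and calls DAX. */
static int model_blockfs_populate_pfns(struct model_mapping *m, size_t pgoff)
{
    /* Assumed layout: file pages 4..7 live in device blocks 100..103. */
    static const struct model_extent ext = {
        .first_pgoff = 4, .nr_pages = 4, .first_block = 100,
    };

    m->populate_calls++;
    if (pgoff < ext.first_pgoff || pgoff >= ext.first_pgoff + ext.nr_pages)
        return -1; /* hole: nothing to insert in this model */
    return model_dax_create_pfns(m, &ext);
}

/* Steps 2-4: DAX checks the tree and asks the filesystem only on a miss. */
static int model_dax_fault(struct model_mapping *m, size_t pgoff, uint64_t *pfn)
{
    assert(m != NULL && pfn != NULL);
    if (pgoff >= MODEL_NR_PAGES)
        return -1;
    if (m->pfn[pgoff] == MODEL_NO_PFN) {
        int err = m->populate_pfns(m, pgoff);
        if (err)
            return err;
    }
    if (m->pfn[pgoff] == MODEL_NO_PFN)
        return -1;
    *pfn = m->pfn[pgoff];
    return 0;
}
```

In this model, a first fault on page 5 calls the filesystem once and populates the whole extent, so a later fault on page 6 is served from the tree without a second lookup. A filesystem that is not block based (step 5a) would supply a different `populate_pfns` callback and write the slots itself.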
Now, notice that there's no interaction with the rest of the filesystem here. We can swap out BHs and iomaps relatively trivially; there's no call for making grand changes, like converting ext2 over to iomap. The BH or iomap is only used for communicating the extent from the filesystem to DAX.
Do we have agreement that this is the right way to go?
-----Original Message-----
From: Ross Zwisler [mailto:ross.zwisler@linux.intel.com]
Sent: Friday, September 9, 2016 12:48 PM
To: Theodore Ts'o <tytso@mit.edu>; Christoph Hellwig <hch@infradead.org>; Ross Zwisler <ross.zwisler@linux.intel.com>; linux-kernel@vger.kernel.org; Andrew Morton <akpm@linux-foundation.org>; linux-nvdimm@ml01.01.org; Matthew Wilcox <mawilcox@microsoft.com>; Dave Chinner <david@fromorbit.com>; linux-mm@kvack.org; Andreas Dilger <adilger.kernel@dilger.ca>; Alexander Viro <viro@zeniv.linux.org.uk>; Jan Kara <jack@suse.com>; linux-fsdevel@vger.kernel.org; linux-ext4@vger.kernel.org
Subject: Re: [PATCH v2 2/9] ext2: tell DAX the size of allocation holes
On Mon, Aug 29, 2016 at 08:57:41AM -0400, Theodore Ts'o wrote:
> On Mon, Aug 29, 2016 at 12:41:16AM -0700, Christoph Hellwig wrote:
> >
> > We're going to move forward killing buffer_heads in XFS. I think ext4
> > would dramatically benefit from this as well, as would ext2 (although I
> > think all that DAX work in ext2 is a horrible idea to start with).
>
> It's been on my todo list. The only reason why I haven't done it yet
> is because I knew you were working on a solution, and I didn't want to
> do things one way for buffered I/O, and a different way for Direct
> I/O, and disentangling the DIO code and the different assumptions of
> how different file systems interact with the DIO code is a *mess*.
>
> It may have gotten better more recently, but a few years ago I took a
> look at it and backed slowly away.....
Ted, what do you think of the idea of moving to struct iomap in ext2?
If ext2 stays with the current struct buffer_head + get_block_t interface,
then it looks like DAX basically has three options:
1) Support two I/O paths and two versions of each of the fault paths (PTE,
PMD, etc). One of each of these would be based on struct iomap and would be
used by xfs and potentially ext4, and the other would be based on struct
buffer_head + get_block_t and would be used by ext2.
2) Only have a single struct iomap based I/O path and fault path, and add
shim/support code so that ext2 can use it, leaving the rest of ext2 to be
struct buffer_head + get_block_t based.
3) Only have a single struct buffer_head + get_block_t based DAX I/O and fault
path, and have XFS and potentially ext4 do the translation from their native
struct iomap interface.
It seems ideal for ext2 to switch along with everyone else, if getting rid of
struct buffer_head is a global goal. If not, I guess barring technical issues
#2 above seems cleanest - move DAX to the new structure, and provide backwards
compatibility to ext2. Thoughts?
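
As a rough illustration of what the shim in option 2 might involve, here is a small userspace sketch in C. All of it is assumed for illustration: the `model_*` structs are simplified stand-ins, not the kernel's struct buffer_head, get_block_t or struct iomap, and the toy filesystem maps only file blocks 0..3. The point is just that the shim asks the get_block-style callback once and reports back either a mapped extent or the size of the hole.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BLOCK_SIZE 4096u

/* Simplified stand-in for the buffer_head fields a get_block call fills in. */
struct model_bh {
    bool mapped;      /* true if the range is backed by device blocks */
    uint64_t blocknr; /* first device block, valid only when mapped */
    uint64_t size;    /* in: bytes wanted; out: bytes actually described */
};

/* Hypothetical get_block_t-like callback: file block in, mapping out. */
typedef int (*model_get_block_t)(uint64_t iblock, struct model_bh *bh);

enum model_iomap_type { MODEL_IOMAP_HOLE, MODEL_IOMAP_MAPPED };

/* Simplified stand-in for the extent description an iomap path consumes. */
struct model_iomap {
    enum model_iomap_type type;
    uint64_t offset; /* file offset of the extent, in bytes */
    uint64_t length; /* length of the extent or hole, in bytes */
    uint64_t blkno;  /* first device block, valid only when MAPPED */
};

/* Shim: describe [pos, pos + length) as an iomap using a get_block callback. */
static int model_iomap_from_get_block(model_get_block_t get_block,
                                      uint64_t pos, uint64_t length,
                                      struct model_iomap *iomap)
{
    /* Round the byte range out to whole blocks. */
    uint64_t start = pos - pos % MODEL_BLOCK_SIZE;
    uint64_t end = (pos + length + MODEL_BLOCK_SIZE - 1)
                   / MODEL_BLOCK_SIZE * MODEL_BLOCK_SIZE;
    struct model_bh bh = { .mapped = false, .blocknr = 0, .size = end - start };
    int err = get_block(start / MODEL_BLOCK_SIZE, &bh);

    if (err)
        return err;
    if (bh.size == 0)
        return -1; /* empty request or callback described nothing */

    iomap->type = bh.mapped ? MODEL_IOMAP_MAPPED : MODEL_IOMAP_HOLE;
    iomap->offset = start;
    iomap->length = bh.size;
    iomap->blkno = bh.mapped ? bh.blocknr : 0;
    return 0;
}

/* Toy filesystem: file blocks 0..3 map to device blocks 200..203. */
static int model_toy_get_block(uint64_t iblock, struct model_bh *bh)
{
    uint64_t want = bh->size / MODEL_BLOCK_SIZE;

    if (iblock < 4) {
        uint64_t avail = 4 - iblock;

        bh->mapped = true;
        bh->blocknr = 200 + iblock;
        bh->size = (want < avail ? want : avail) * MODEL_BLOCK_SIZE;
    } else {
        bh->mapped = false; /* size left as requested: all of it is a hole */
    }
    return 0;
}
```

With this toy callback, a request starting inside file block 1 comes back as a mapped extent trimmed to the three blocks that remain, and a request at file block 5 comes back as a hole of the requested size. The rest of the filesystem stays on its existing interface; only this translation layer knows about both representations.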