From: Dan Williams <dan.j.williams@intel.com>
To: Dave Chinner <david@fromorbit.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, <akpm@linux-foundation.org>,
"Matthew Wilcox" <willy@infradead.org>, Jan Kara <jack@suse.cz>,
"Darrick J. Wong" <djwong@kernel.org>,
Christoph Hellwig <hch@lst.de>,
John Hubbard <jhubbard@nvidia.com>,
<linux-fsdevel@vger.kernel.org>, <nvdimm@lists.linux.dev>,
<linux-xfs@vger.kernel.org>, <linux-mm@kvack.org>,
<linux-ext4@vger.kernel.org>
Subject: Re: [PATCH v2 10/18] fsdax: Manage pgmap references at entry insertion and deletion
Date: Thu, 22 Sep 2022 19:01:56 -0700
Message-ID: <632d13949e113_4a6742947c@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <20220923013634.GY3600936@dread.disaster.area>

Dave Chinner wrote:
> On Thu, Sep 22, 2022 at 02:54:42PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > > On Wed, Sep 21, 2022 at 07:17:40PM -0700, Dan Williams wrote:
> > > > Jason Gunthorpe wrote:
> > > > > On Wed, Sep 21, 2022 at 05:14:34PM -0700, Dan Williams wrote:
> > > > >
> > > > > > > Indeed, you could reasonably put such a liveness test at the moment
> > > > > > > every driver takes a 0 refcount struct page and turns it into a 1
> > > > > > > refcount struct page.
> > > > > >
> > > > > > I could do it with a flag, but the reason to have pgmap->ref managed at
> > > > > > the page->_refcount 0 -> 1 and 1 -> 0 transitions is so at the end of
> > > > > > time memunmap_pages() can look at the one counter rather than scanning
> > > > > > and rescanning all the pages to see when they go to final idle.
> > > > >
> > > > > That makes some sense too, but the logical way to do that is to put some
> > > > > counter along the page_free() path, and establish a 'make a page not
> > > > > free' path that does the other side.
> > > > >
> > > > > ie it should not be in DAX code, it should be all in common pgmap
> > > > > code. The pgmap should never be freed while any page->refcount != 0
> > > > > and that should be an intrinsic property of pgmap, not relying on
> > > > > external parties.
> > > >
> > > > I just do not know where to put such intrinsics since there is nothing
> > > > today that requires going through the pgmap object to discover the pfn
> > > > and 'allocate' the page.
> > >
> > > > I think that is just a new API that wraps the set refcount = 1,
> > > percpu refcount and maybe building appropriate compound pages too.
> > >
> > > Eg maybe something like:
> > >
> > > struct folio *pgmap_alloc_folios(pgmap, start, length)
> > >
> > > And you get back maximally sized allocated folios with refcount = 1
> > > that span the requested range.
> > >
> > > > In other words make dax_direct_access() the 'allocation' event that pins
> > > > the pgmap? I might be speaking a foreign language if you're not familiar
> > > > with the relationship of 'struct dax_device' to 'struct dev_pagemap'
> > > > > instances. This is not the first time I have considered making them one
> > > > > and the same.
> > >
> > > I don't know enough about dax, so yes very foreign :)
> > >
> > > I'm thinking broadly about how to make pgmap usable to all the other
> > > drivers in a safe and robust way that makes some kind of logical sense.
> >
> > I think the API should be pgmap_folio_get() because, at least for DAX,
> > the memory is already allocated. The 'allocator' for fsdax is the
> > filesystem block allocator, and pgmap_folio_get() grants access to a
>
> No, the "allocator" for fsdax is the inode iomap interface, not the
> filesystem block allocator. The filesystem block allocator is only
> involved in iomapping if we have to allocate a new mapping for a
> given file offset.
>
> A better name for this is "arbiter", not allocator. To get an
> active mapping of the DAX pages backing a file, we need to ask the
> inode iomap subsystem to *map a file offset* and it will return
> kaddr and/or pfns for the backing store the file offset maps to.
>
> IOWs, for FSDAX, access to the backing store (i.e. the physical pages) is
> arbitrated by the *inode*, not the filesystem allocator or the dax
> device. Hence if a subsystem needs to pin the backing store for some
> use, it must first ensure that it holds an inode reference (direct
> or indirect) for that range of the backing store that spans the
> life of the pin. When the pin is done, it can tear down the mappings
> it was using and then the inode reference can be released.
>
> This ensures that any racing unlink of the inode will not result in
> the backing store being freed from under the application that has a
> pin. It will prevent the inode from being reclaimed and so
> potentially accessing stale or freed in-memory structures. And it
> will prevent the filesystem from being unmounted while the
> application using FSDAX access is still actively using that
> functionality even if it's already closed all its fds....

Sounds so simple when you put it that way. I'll give it a shot and stop
the gymnastics of trying to get in front of truncate_inode_pages_final()
with a 'dax break layouts', and instead just hold it off until the final
unpin.
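
To make the ordering Dave describes concrete (take an inode reference
before pinning, tear down the mapping, then drop the reference), here is
a rough userspace sketch. It is a toy model only, not kernel code:
struct toy_inode and every toy_*() helper are hypothetical names made up
for illustration, and the real inode lifetime and fsdax pin paths involve
locking and eviction details that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: none of these names exist in the kernel. */
struct toy_inode {
	int refs;          /* inode references (open fds, pins, ...) */
	int pins;          /* active pins on the backing store */
	bool unlinked;     /* unlink has been requested */
	bool blocks_freed; /* backing store returned to the filesystem */
};

static void toy_inode_get(struct toy_inode *inode)
{
	inode->refs++;
}

/* Dropping the last reference on an unlinked inode frees its blocks. */
static void toy_inode_put(struct toy_inode *inode)
{
	assert(inode->refs > 0);
	if (--inode->refs == 0 && inode->unlinked)
		inode->blocks_freed = true;
}

/* Unlink only frees the backing store if nobody holds the inode. */
static void toy_unlink(struct toy_inode *inode)
{
	inode->unlinked = true;
	if (inode->refs == 0)
		inode->blocks_freed = true;
}

/* A pin takes its own inode reference that spans the life of the pin. */
static bool toy_pin(struct toy_inode *inode)
{
	if (inode->blocks_freed)
		return false;
	toy_inode_get(inode);
	inode->pins++;
	return true;
}

/* Tear down the pin first, then release the inode reference. */
static void toy_unpin(struct toy_inode *inode)
{
	assert(inode->pins > 0);
	inode->pins--;
	toy_inode_put(inode);
}
```

In this model, an application can pin, close its last fd, and race with
unlink, and the backing store is still only freed by the final
toy_unpin(), which is the "hold it off until the final unpin" behavior.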