From: Jason Gunthorpe <jgg@nvidia.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, jhubbard@nvidia.com, rcampbell@nvidia.com,
willy@infradead.org, dan.j.williams@intel.com,
david@fromorbit.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
djwong@kernel.org, hch@lst.de, david@redhat.com
Subject: Re: ZONE_DEVICE refcounting
Date: Fri, 8 Mar 2024 09:44:18 -0400
Message-ID: <20240308134418.GH9179@nvidia.com>
In-Reply-To: <87ttlhmj9p.fsf@nvdebian.thelocal>

On Fri, Mar 08, 2024 at 03:24:35PM +1100, Alistair Popple wrote:
> Hi,
>
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trivial).
>
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off by one". Unfortunately
> it's not that simple. For starters doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
>
> So instead I'm thinking of doing something along the lines of the following:
>
> 1. Refcount FS DAX pages normally. Ie. map them with vm_insert_page() and
> increment the refcount inline with mapcount and decrement it when pages are
> unmapped.

This is the right thing to do.

> 2. As per normal pages the pages are considered free when the refcount drops
> to zero.
>
> 3. Because these are treated as normal pages for refcounting we no longer map
> them as pte_devmap() (possibly freeing up a PTE bit).

Yes, the pmd/pte_devmap() should ideally go away.

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.
>
> 5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
> the page->pgmap field with page->compound_head, but this isn't a problem
> because the LSB of page->pgmap is free and we can still get pgmap from
> compound_head(page)->pgmap.

Right, this is the actual work - the mm is obviously already happy
with its part, fsdax just needs to create a properly sized folio and
map it properly.

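As a rough userspace sketch of the tail-page point in item 5 (this is
not kernel code; every toy_* name is made up for illustration, and the
single-word layout is an assumption standing in for the real
struct page union): a head page stores the pgmap pointer with the LSB
clear, a tail page stores its head pointer with the LSB set, and the
pgmap is always looked up through the head.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dev_pagemap. */
struct toy_pgmap {
	int id;
};

/*
 * Hypothetical stand-in for struct page. One word is shared:
 *   head page: pointer to the pgmap, LSB clear (pointers are aligned)
 *   tail page: pointer to the head page, LSB set
 */
struct toy_page {
	uintptr_t pgmap_or_head;
};

/* A set LSB marks the word as a tagged head pointer, i.e. a tail page. */
static int toy_is_tail(const struct toy_page *page)
{
	return (int)(page->pgmap_or_head & 1);
}

/* Strip the tag bit to reach the head; a head page is its own head. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	if (toy_is_tail(page))
		return (struct toy_page *)(page->pgmap_or_head & ~(uintptr_t)1);
	return page;
}

/* The pgmap is only ever read from the head page. */
static struct toy_pgmap *toy_page_pgmap(struct toy_page *page)
{
	return (struct toy_pgmap *)toy_compound_head(page)->pgmap_or_head;
}

/* Turn pages[0..nr-1] into one compound page owned by pgmap. */
static void toy_prep_compound(struct toy_page *pages, size_t nr,
			      struct toy_pgmap *pgmap)
{
	pages[0].pgmap_or_head = (uintptr_t)pgmap;
	for (size_t i = 1; i < nr; i++)
		pages[i].pgmap_or_head = (uintptr_t)&pages[0] | 1;
}
```

Calling toy_page_pgmap() on any of the pages set up by
toy_prep_compound() returns the same pgmap, which is all the sharing
argument needs.
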
> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
> from the pgmap->ops->page_free() callback.
>
> 7. We could probably get rid of the pgmap refcounting because we can just scan
> pages and look for any pages with non-zero references and wait for them to be
> freed whilst ensuring no new mappings can be created (some drivers do a
> similar thing for private pages today). This might be a follow-up
> change.

Yeah, the pgmap refcounting needs some cleanup for sure.

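The scan-and-wait idea in item 7 could look roughly like the userspace
sketch below. It is only a model under stated assumptions: the
sketch_* names are hypothetical, refcounts are 0-based as proposed in
item 1, and the flag check in sketch_try_map() is not race-free
against teardown (a real implementation would need proper
synchronization between the two).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with a 0-based refcount: 0 means free/idle. */
struct sketch_page {
	atomic_int refcount;
};

/* Hypothetical device memory region being torn down. */
struct sketch_region {
	struct sketch_page *pages;
	size_t nr_pages;
	atomic_bool dying;	/* set once teardown starts */
};

static void sketch_region_init(struct sketch_region *r,
			       struct sketch_page *pages, size_t nr_pages)
{
	r->pages = pages;
	r->nr_pages = nr_pages;
	atomic_init(&r->dying, false);
	for (size_t i = 0; i < nr_pages; i++)
		atomic_init(&pages[i].refcount, 0);
}

/* Take a reference for a new mapping; refused once teardown began.
 * NOTE: check-then-increment is racy here, kept simple on purpose. */
static bool sketch_try_map(struct sketch_region *r, size_t idx)
{
	if (atomic_load(&r->dying))
		return false;
	atomic_fetch_add(&r->pages[idx].refcount, 1);
	return true;
}

/* Drop the reference taken by sketch_try_map(). */
static void sketch_unmap(struct sketch_region *r, size_t idx)
{
	atomic_fetch_sub(&r->pages[idx].refcount, 1);
}

/* Step 1 of teardown: ensure no new mappings can be created. */
static void sketch_begin_teardown(struct sketch_region *r)
{
	atomic_store(&r->dying, true);
}

/* Step 2: scan for pages that still hold references. Teardown may
 * finish once this returns 0; the caller waits and rescans otherwise. */
static size_t sketch_count_busy(struct sketch_region *r)
{
	size_t busy = 0;

	for (size_t i = 0; i < r->nr_pages; i++)
		if (atomic_load(&r->pages[i].refcount) != 0)
			busy++;
	return busy;
}
```

With this shape no per-pgmap reference count is needed: the per-page
refcounts plus the "dying" flag carry the same information.
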
Jason