linux-mm.kvack.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, jhubbard@nvidia.com, rcampbell@nvidia.com,
	willy@infradead.org, dan.j.williams@intel.com,
	david@fromorbit.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
	djwong@kernel.org, hch@lst.de, david@redhat.com
Subject: Re: ZONE_DEVICE refcounting
Date: Fri, 8 Mar 2024 09:44:18 -0400
Message-ID: <20240308134418.GH9179@nvidia.com>
In-Reply-To: <87ttlhmj9p.fsf@nvdebian.thelocal>

On Fri, Mar 08, 2024 at 03:24:35PM +1100, Alistair Popple wrote:
> Hi,
> 
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trivial).
> 
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off-by-one". Unfortunately
> it's not that simple. For starters, doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
> 
> So instead I'm thinking of doing something along the lines of the following:
> 
> 1. Refcount FS DAX pages normally, i.e. map them with vm_insert_page(),
>    increment the refcount in line with the mapcount, and decrement it when
>    pages are unmapped.

This is the right thing to do.

> 2. As per normal pages the pages are considered free when the refcount drops
>    to zero.
> 
> 3. Because these are treated as normal pages for refcounting we no longer map
>    them as pte_devmap() (possibly freeing up a PTE bit).

Yes, the pmd/pte_devmap() should ideally go away.
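A minimal userspace sketch of points 1 and 2, assuming hypothetical model_* names (this is not kernel API): each mapping takes a reference alongside the mapcount, so a mapped page never sits at refcount zero, and "free" simply means the refcount has dropped to zero with no 1-based offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a struct page; not the kernel layout. */
struct model_page {
	int refcount;   /* stands in for page->_refcount, 0 == free */
	int mapcount;   /* number of page table mappings (0-based here) */
	int free_calls; /* how many times the page became free */
};

static void model_get(struct model_page *p)
{
	p->refcount++;
}

/* Dropping the last reference is where the page is considered free. */
static void model_put(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->free_calls++;
}

/* Models the vm_insert_page() path: the mapping holds its own reference. */
static void model_map(struct model_page *p)
{
	model_get(p);
	p->mapcount++;
}

/* Unmapping drops the mapcount and the reference the mapping held. */
static void model_unmap(struct model_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;
	model_put(p);
}

/* Idle detection needs no special case: zero references means idle. */
static bool model_page_idle(const struct model_page *p)
{
	return p->refcount == 0;
}
```

Because the refcount is an ordinary count in this model, nothing in it needs a special PTE bit to tell DAX pages apart, which is the motivation for dropping pte_devmap() in point 3.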

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.
> 
> 5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
>    the page->pgmap field with page->compound_head, but this isn't a problem
>    because the LSB of page->pgmap is free and we can still get pgmap from
>    compound_head(page)->pgmap.

Right, this is the actual work - the mm is obviously already happy
with its part, fsdax just needs to create a properly sized folio and
map it properly.
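A minimal userspace sketch of the aliasing described in point 5, using hypothetical model_* names rather than the real struct page: one word holds either a pgmap pointer (head pages, bit 0 clear because the pointer is aligned) or the head page address with bit 0 set (tail pages), and pgmap is always read through the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_pgmap { int id; };

/* Hypothetical page: one word shared by compound_head and pgmap. */
struct model_page {
	uintptr_t word;
};

static void model_set_pgmap(struct model_page *head, struct model_pgmap *pgmap)
{
	/* Aligned pointers leave bit 0 free to mark tail pages. */
	assert(((uintptr_t)pgmap & 1) == 0);
	head->word = (uintptr_t)pgmap;
}

static void model_set_tail(struct model_page *tail, struct model_page *head)
{
	tail->word = (uintptr_t)head | 1;
}

/* Tail pages decode the head pointer; head pages return themselves. */
static struct model_page *model_compound_head(struct model_page *page)
{
	if (page->word & 1)
		return (struct model_page *)(page->word - 1);
	return page;
}

/* pgmap lookup always goes through the head page. */
static struct model_pgmap *model_page_pgmap(struct model_page *page)
{
	return (struct model_pgmap *)model_compound_head(page)->word;
}

/* Sets up nr pages as one compound page owned by pgmap. */
static void model_prep_compound(struct model_page *pages, size_t nr,
				struct model_pgmap *pgmap)
{
	model_set_pgmap(&pages[0], pgmap);
	for (size_t i = 1; i < nr; i++)
		model_set_tail(&pages[i], &pages[0]);
}
```

Overwriting the tail pages' pgmap slot in this way loses nothing, since every tail can still reach the pgmap through its head.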

> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
>    from the pgmap->ops->page_free() callback.
> 
> 7. We could probably get rid of the pgmap refcounting because we can just scan
>    pages and look for any pages with non-zero references and wait for them to be
>    freed whilst ensuring no new mappings can be created (some drivers do a
>    similar thing for private pages today). This might be a follow-up
>    change.

Yeah, the pgmap refcounting needs some cleanup for sure.
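A minimal single-threaded sketch of points 6 and 7, assuming hypothetical model_* names and a hypothetical fs_page_free() hook (none of these are kernel API): the last put calls a page_free-style callback to notify the filesystem, and teardown first blocks new mappings and then scans for pages that still hold references instead of keeping a separate pgmap refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page;

struct model_pgmap_ops {
	/* Stands in for pgmap->ops->page_free(): runs when refcount hits 0. */
	void (*page_free)(struct model_page *page);
};

struct model_pgmap {
	const struct model_pgmap_ops *ops;
	bool dying;               /* set at teardown: no new mappings */
	struct model_page *pages;
	size_t nr_pages;
};

struct model_page {
	int refcount;
	struct model_pgmap *pgmap;
};

/* Hypothetical filesystem hook that just counts notifications. */
static int fs_pages_freed;
static void fs_page_free(struct model_page *page)
{
	(void)page;
	fs_pages_freed++;
}

/* New mappings are refused once teardown has started. */
static bool model_try_map(struct model_page *page)
{
	if (page->pgmap->dying)
		return false;
	page->refcount++;
	return true;
}

/* The last put notifies the owner through the ops callback. */
static void model_put(struct model_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0 && page->pgmap->ops->page_free)
		page->pgmap->ops->page_free(page);
}

/* Counts pages still referenced; real code would sleep and rescan until 0. */
static size_t model_busy_pages(const struct model_pgmap *pgmap)
{
	size_t busy = 0;

	for (size_t i = 0; i < pgmap->nr_pages; i++)
		if (pgmap->pages[i].refcount != 0)
			busy++;
	return busy;
}

static void model_pgmap_kill(struct model_pgmap *pgmap)
{
	pgmap->dying = true;
}
```

This model ignores concurrency. A real version would need the "dying" check and the refcount increment to be ordered against the teardown scan.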

Jason


Thread overview: 10+ messages
2024-03-08  4:24 Alistair Popple
2024-03-08 13:44 ` Jason Gunthorpe [this message]
2024-03-13  6:32 ` Dan Williams
2024-03-20  5:20   ` Alistair Popple
2024-03-21  5:26     ` Alistair Popple
2024-03-21  6:03       ` Dan Williams
2024-03-22  0:01         ` Alistair Popple
2024-03-22  3:18           ` Dave Chinner
2024-03-22  5:34             ` Alistair Popple
2024-03-22  6:58               ` Dan Williams
