From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org
Subject: Re: How should we RCU-free folios?
Date: Tue, 24 Jun 2025 21:29:31 +0100
Message-ID: <aFsKq-en-hLoTJgR@casper.infradead.org>
In-Reply-To: <86582169-b4fa-4f4a-9480-612002b63174@redhat.com>

On Tue, Jun 24, 2025 at 02:13:47PM +0200, David Hildenbrand wrote:
> On 29.05.25 17:02, Matthew Wilcox wrote:
> > When folios are allocated separately from the underlying pages they
> > represent, they must also be freed. See
> > https://kernelnewbies.org/MatthewWilcox/FolioAlloc
> >
> > Since we want to do lockless lookups of folios in the page cache and
> > GUP,
>
> And in PFN walkers as well.

Good point.  For those not quite clear what David means here, think of
migration, where we're doing a physical walk and trying to decide what
to do with the memory.  That's just one example; hwpoison detection has
similar problems.

> > 1. Free the folio back to the slab immediately, and mark the slab as
> > TYPESAFE_BY_RCU. That means that the folio may get reallocated at
> > any time, but it must always remain a folio (until an RCU grace period
> > has passed and then the entire slab may be reallocated to a different
> > purpose). Lookups will do:
> >
> > a. Get a pointer to the folio
> > b. Tryget a refcount on the folio
> > c. If it succeeds, re-check the folio is still the one we want
> > (If pagecache, check the xarray still points to the folio; if GUP,
> > check the page still points to the folio)
>
> Hm, that means that all PFN walker would now also have to do a tryget
> unconditionally.

To a certain extent.  At least for migration, there's a first pass where
we can just look at the value contained in the memdesc to decide if this
block is migratable; only in the second pass do we get the refcount and
start doing migration things to each page.

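
As a rough userspace model of the option 1 lookup quoted above (get the
pointer, tryget, re-check), here is a minimal sketch using C11 atomics.
Every name in it (demo_folio, demo_slot, demo_lookup and friends) is
hypothetical and none of this is kernel API.  It has no RCU, so it only
shows the ordering of the three steps, and it returns NULL where a real
lookup would probably retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a separately allocated folio. */
struct demo_folio {
	atomic_int refcount;	/* 0 means free, possibly about to be reused */
};

/* Hypothetical stand-in for a page cache slot or a page's memdesc pointer. */
struct demo_slot {
	_Atomic(struct demo_folio *) folio;
};

/* Step b: take a reference only if the count is not already zero. */
static bool demo_folio_tryget(struct demo_folio *folio)
{
	int old = atomic_load(&folio->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&folio->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_folio_put(struct demo_folio *folio)
{
	int old = atomic_fetch_sub(&folio->refcount, 1);

	assert(old > 0);
}

/*
 * Returns the folio with an elevated refcount, or NULL.  In the kernel
 * this would run inside an RCU read-side critical section so the slab
 * backing the folio cannot be repurposed; this model has no RCU.
 */
static struct demo_folio *demo_lookup(struct demo_slot *slot)
{
	struct demo_folio *folio = atomic_load(&slot->folio);	/* step a */

	if (!folio || !demo_folio_tryget(folio))		/* step b */
		return NULL;
	if (atomic_load(&slot->folio) != folio) {		/* step c */
		demo_folio_put(folio);	/* reallocated under us: drop it */
		return NULL;
	}
	return folio;
}
```

The re-check in step c is what option 2 would let us drop: there, a
successful tryget already implies the folio was not freed and reused.
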
> Also, free hugetlb folios have a refcount of 0 right now ...

Right ... I think handling of hugetlb folios will probably change
a bit.  Freeing a hugetlb folio probably won't free the folio itself,
but might set a flag indicating that it's free.  It'd be up to the
PFN walker to, say, grab the hugetlb_lock, which would make sure this
hugetlb folio isn't allocated while the walker is messing with it.

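
To make that concrete, here is a minimal userspace sketch, assuming a
"free" flag that is only changed under a lock.  The pthread mutex stands
in for hugetlb_lock, and demo_hugetlb_folio, its fields and both
functions are hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical: stands in for hugetlb_lock. */
static pthread_mutex_t demo_hugetlb_lock = PTHREAD_MUTEX_INITIALIZER;

struct demo_hugetlb_folio {
	bool is_free;	/* hypothetical "in the free pool" flag, under the lock */
	int inspected;	/* counts walker visits, for illustration only */
};

/* Allocating from the pool clears the flag under the lock. */
static bool demo_hugetlb_alloc(struct demo_hugetlb_folio *folio)
{
	bool ok;

	pthread_mutex_lock(&demo_hugetlb_lock);
	ok = folio->is_free;
	if (ok)
		folio->is_free = false;
	pthread_mutex_unlock(&demo_hugetlb_lock);
	return ok;
}

/*
 * PFN walker side: while the lock is held and the flag is set, the
 * folio cannot be handed out by demo_hugetlb_alloc().
 */
static bool demo_walk_free_hugetlb(struct demo_hugetlb_folio *folio)
{
	bool handled = false;

	pthread_mutex_lock(&demo_hugetlb_lock);
	if (folio->is_free) {
		folio->inspected++;	/* the walker's work would go here */
		handled = true;
	}
	pthread_mutex_unlock(&demo_hugetlb_lock);
	return handled;
}
```

The walker never touches the refcount here; the flag plus the lock is
what keeps the free folio stable while it is being examined.
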
> > 2. RCU-free the folio. The folio will not be reallocated until the
> > reader drops the RCU read lock. The read side still needs to tryget
> > the folio refcount. However, if it succeeds, it does not need to
> > re-check the pointer to the folio as the folio cannot have been
> > freed. The downside is that folios will hang around in the system for
> > longer before being reallocated, and this may be an unacceptable
> > increase in memory usage.
> >
> > 3. RCU free the folio and RCU free the memory it controls. Now an
> > RCU-protected lookup doesn't need to bump the refcount; if it found the
> > pointer, it knows the memory cannot be freed. I think this is a
> > step too far and would
>
> That sound nice, though :)
>
> >
> > I'm favouring option 1; it's what we currently do. But I wanted to
> > give people a chance to chime in and tell me my tradeoffs are wrong.
> > Or propose a fourth option.
>
> I really dislike the refcount dependency.
>
> Also ... what about memdescs without a refcount (e.g., PFN walkers and
> slab?)?

Each PFN walker needs to know how to handle each kind of memdesc.
Migration might choose to skip slabs and so "handle" them by moving on
to the next block.  hwpoison doesn't need to handle them either (the
system is dead if we see poison in a slab).  I'm not sure how a PFN
walker can protect against slab "doing something" with the struct slab.
Maybe something like slab_lock() will be needed (yes, I know slab mostly
bypasses slab_lock).  But it is going to be a per-memdesc kind of
problem to solve.

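
One way to picture the per-memdesc decision is a small dispatch in the
walker.  This is only a sketch: the enum values and
demo_migration_action() are hypothetical, and skipping unknown types is
an assumption made here to keep the example conservative.

```c
#include <assert.h>

/* Hypothetical memdesc kinds a PFN walker might encounter. */
enum demo_memdesc_type {
	DEMO_MEMDESC_FOLIO,
	DEMO_MEMDESC_SLAB,
	DEMO_MEMDESC_OTHER,
};

/* What the walker decides to do with the block it is looking at. */
enum demo_walk_action {
	DEMO_WALK_SKIP,		/* move on to the next block */
	DEMO_WALK_TRYGET,	/* try to take a refcount, then work on it */
};

/* Migration "handles" slabs by skipping them; folios need a tryget. */
static enum demo_walk_action demo_migration_action(enum demo_memdesc_type type)
{
	switch (type) {
	case DEMO_MEMDESC_FOLIO:
		return DEMO_WALK_TRYGET;
	case DEMO_MEMDESC_SLAB:
		return DEMO_WALK_SKIP;
	default:
		/* Assumption: skip anything this walker does not understand. */
		return DEMO_WALK_SKIP;
	}
}
```
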

Two things I did want to raise though:

First, this is an improvement.  There's altogether too much code that
thinks "If I raise the refcount on the page, that will prevent the memory
from being freed".  And it'll certainly prevent the page from being
returned to the page allocator, but it won't prevent the slab allocator
from reusing the memory.  Other allocators (e.g. dma_pool)?  No idea.

Second, struct slab doesn't need to be RCU freed (unless we discover
PFN walkers are going to force us to).  The slab allocator knows it is
the only user, and when it's done, it can just free it and there's no
chance anybody else is looking at it.  Unless PFN walkers look at it,
which they can't today because struct slab is in mm/slab.h.