From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, Fuad Tabba <tabba@google.com>,
linux-mm@kvack.org, kvm@vger.kernel.org,
nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org,
muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com,
pbonzini@redhat.com, seanjc@google.com, jhubbard@nvidia.com,
ackerleytng@google.com, vannapurve@google.com,
mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
qperret@google.com, keirf@google.com, roypat@amazon.co.uk
Subject: Re: [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
Date: Wed, 13 Nov 2024 04:57:28 +0000 [thread overview]
Message-ID: <ZzQxuAiJLbqm5xGO@casper.infradead.org> (raw)
In-Reply-To: <430b6a38-facf-4127-b1ef-5cfe7c495d63@redhat.com>
On Tue, Nov 12, 2024 at 03:22:46PM +0100, David Hildenbrand wrote:
> On 12.11.24 14:53, Jason Gunthorpe wrote:
> > On Tue, Nov 12, 2024 at 10:10:06AM +0100, David Hildenbrand wrote:
> > > On 12.11.24 06:26, Matthew Wilcox wrote:
> > > > I don't want you to respin. I think this is a bad idea.
> > >
> > > I'm hoping you'll find some more time to explain what exactly you don't
> > > like, because this series only refactors what we already have.
> > >
> > > I enjoy seeing the special casing (especially hugetlb) gone from mm/swap.c.
I don't. The list of 'if's is better than the indirect function call.
That's terribly expensive, and the way we reuse the lru.next field
is fragile. Not to mention that it introduces a new thing for the
hardening people to fret over.
> > And, IMHO, seems like overkill. We have only a handful of cases -
> > maybe we shouldn't be trying to get to full generality but just handle
> > a couple of cases directly? I don't really think it is such a bad
> > thing to have an if ladder on the free path if we have only a couple
> > things. Certainly it looks good instead of doing overlaying tricks.
>
> I'd really like to abstract hugetlb handling if possible. The way it stands
> it's just very odd.
There might be ways to make that better. I haven't really been looking
too hard at making that special handling go away.
> We'll need some reliable way to identify these folios that need care.
> guest_memfd will be using folio->mapcount for now, so for now we couldn't
> set a page type like hugetlb does.
If hugetlb can set lru.next at a certain point, then guestmemfd could
set a page type at a similar point, no?
> > Also how does this translate to Matthew's memdesc world?
In a memdesc world, pages no longer have a refcount. We might still
have put_page() which will now be a very complicated (and out-of-line)
function that looks up what kind of memdesc it is and operates on the
memdesc's refcount ... if it has one. I don't know if it'll be exported
to modules; I can see uses in the mm code, but I'm not sure if modules
will have a need.
Each memdesc type will have its own function to call to free the memdesc.
So we'll still have folio_put(). But slab does not have, need nor want
a refcount, so it'll just slab_free(). I expect us to keep around a
list of recently-freed memdescs of a particular type with their pages
still attached so that we can allocate them again quickly (or reclaim
them under memory pressure). Once that freelist overflows, we'll free
a batch of them to the buddy allocator (for the pages) and the slab
allocator (for the memdesc itself).
> guest_memfd and hugetlb would be operating on folios (at least for now),
> which contain the refcount,lru,private, ... so nothing special there.
>
> Once we actually decoupled "struct folio" from "struct page", we *might*
> have to play less tricks, because we could just have a callback pointer
> there. But well, maybe we also want to save some space in there.
>
> Do we want dedicated memdescs for hugetlb/guest_memfd that extend folios in
> the future? I don't know, maybe.
I've certainly considered going so far as a per-fs folio. So we'd
have an ext4_folio, an btrfs_folio, an iomap_folio, etc. That'd let us
get rid of folio->private, but I'm not sure that C's type system can
really handle this nicely. Maybe in a Rust world ;-)
What I'm thinking about is that I'd really like to be able to declare
that all the functions in ext4_aops only accept pointers to ext4_folio,
so ext4_dirty_folio() can't be called with pointers to _any_ folio,
but specifically folios which were previously allocated for ext4.
I don't know if Rust lets you do something like that.
> I'm currently wondering if we can use folio->private for the time being.
> Either
>
> (a) If folio->private is still set once the refcount drops to 0, it
> indicates that there is a freeing callback/owner_ops. We'll have to make
> hugetlb not use folio->private and convert others to clear folio->private
> before freeing.
>
> (b) Use bitX of folio->private to indicate that this has "owner_ops"
> meaning. We'll have to make hugetlb not use folio->private and make others
> not use bitX. Might be harder and overkill, because right now we only really
> need the callback when refcount==0.
>
> (c) Use some other indication that folio->private contains folio_ops.
I really don't want to use folio_ops / folio_owner_ops. I read
https://lore.kernel.org/all/CAGtprH_JP2w-4rq02h_Ugvq5KuHX7TUvegOS7xUs_iy5hriE7g@mail.gmail.com/
and I still don't understand what you're trying to do.
Would it work to use aops->free_folio() to notify you when the folio is
being removed from the address space?
next prev parent reply other threads:[~2024-11-13 4:57 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-08 16:20 Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 01/10] mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 02/10] mm/migrate: don't call folio_putback_active_hugetlb() on dst hugetlb folio Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 03/10] mm/hugetlb: rename "folio_putback_active_hugetlb()" to "folio_putback_hugetlb()" Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 04/10] mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 05/10] mm/hugetlb: use folio->lru int demote_free_hugetlb_folios() Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 06/10] mm/hugetlb: use separate folio->_hugetlb_list for hugetlb-internals Fuad Tabba
2024-11-12 15:28 ` wang wei
2024-11-12 15:48 ` [RFC " David Hildenbrand
2024-11-08 16:20 ` [RFC PATCH v1 07/10] mm: Introduce struct folio_owner_ops Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 08/10] mm: Use getters and setters to access page pgmap Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 09/10] mm: Use owner_ops on folio_put for zone device pages Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 10/10] mm: hugetlb: Use owner_ops on folio_put for hugetlb Fuad Tabba
2024-11-08 17:05 ` [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops Jason Gunthorpe
2024-11-08 19:33 ` David Hildenbrand
2024-11-11 8:26 ` Fuad Tabba
2024-11-12 5:26 ` Matthew Wilcox
2024-11-12 9:10 ` David Hildenbrand
2024-11-12 13:53 ` Jason Gunthorpe
2024-11-12 14:22 ` David Hildenbrand
2024-11-13 4:57 ` Matthew Wilcox [this message]
2024-11-13 11:27 ` David Hildenbrand
2024-11-14 4:02 ` John Hubbard
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZzQxuAiJLbqm5xGO@casper.infradead.org \
--to=willy@infradead.org \
--cc=ackerleytng@google.com \
--cc=airlied@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=jgg@nvidia.com \
--cc=jglisse@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=keirf@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mail@maciej.szmigiero.name \
--cc=maz@kernel.org \
--cc=muchun.song@linux.dev \
--cc=nouveau@lists.freedesktop.org \
--cc=pbonzini@redhat.com \
--cc=qperret@google.com \
--cc=quic_eberman@quicinc.com \
--cc=roypat@amazon.co.uk \
--cc=rppt@kernel.org \
--cc=seanjc@google.com \
--cc=simona@ffwll.ch \
--cc=tabba@google.com \
--cc=vannapurve@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox