Re: [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, Fuad Tabba <tabba@google.com>,
	linux-mm@kvack.org, kvm@vger.kernel.org,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org,
	muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com,
	pbonzini@redhat.com, seanjc@google.com, jhubbard@nvidia.com,
	ackerleytng@google.com, vannapurve@google.com,
	mail@maciej.szmigiero.name, kirill.shutemov@linux.intel.com,
	quic_eberman@quicinc.com, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk
Subject: Re: [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
Date: Wed, 13 Nov 2024 04:57:28 +0000	[thread overview]
Message-ID: <ZzQxuAiJLbqm5xGO@casper.infradead.org> (raw)
In-Reply-To: <430b6a38-facf-4127-b1ef-5cfe7c495d63@redhat.com>

On Tue, Nov 12, 2024 at 03:22:46PM +0100, David Hildenbrand wrote:
> On 12.11.24 14:53, Jason Gunthorpe wrote:
> > On Tue, Nov 12, 2024 at 10:10:06AM +0100, David Hildenbrand wrote:
> > > On 12.11.24 06:26, Matthew Wilcox wrote:
> > > > I don't want you to respin.  I think this is a bad idea.
> > > 
> > > I'm hoping you'll find some more time to explain what exactly you don't
> > > like, because this series only refactors what we already have.
> > > 
> > > I enjoy seeing the special casing (especially hugetlb) gone from mm/swap.c.

I don't.  The list of 'if's is better than the indirect function call.
That's terribly expensive, and the way we reuse the lru.next field
is fragile.  Not to mention that it introduces a new thing for the
hardening people to fret over.

> > And, IMHO, seems like overkill. We have only a handful of cases -
> > maybe we shouldn't be trying to get to full generality but just handle
> > a couple of cases directly? I don't really think it is such a bad
> > thing to have an if ladder on the free path if we have only a couple
> > things. Certainly it looks good instead of doing overlaying tricks.
> 
> I'd really like to abstract hugetlb handling if possible. The way it stands
> it's just very odd.

There might be ways to make that better.  I haven't really been looking
too hard at making that special handling go away.

> We'll need some reliable way to identify these folios that need care.
> guest_memfd will be using folio->mapcount for now, so for now we couldn't
> set a page type like hugetlb does.

If hugetlb can set lru.next at a certain point, then guestmemfd could
set a page type at a similar point, no?

> > Also how does this translate to Matthew's memdesc world?

In a memdesc world, pages no longer have a refcount.  We might still
have put_page() which will now be a very complicated (and out-of-line)
function that looks up what kind of memdesc it is and operates on the
memdesc's refcount ... if it has one.  I don't know if it'll be exported
to modules; I can see uses in the mm code, but I'm not sure if modules
will have a need.

Each memdesc type will have its own function to call to free the memdesc.
So we'll still have folio_put().  But slab does not have, need nor want
a refcount, so it'll just slab_free().  I expect us to keep around a
list of recently-freed memdescs of a particular type with their pages
still attached so that we can allocate them again quickly (or reclaim
them under memory pressure).  Once that freelist overflows, we'll free
a batch of them to the buddy allocator (for the pages) and the slab
allocator (for the memdesc itself).

> guest_memfd and hugetlb would be operating on folios (at least for now),
> which contain the refcount,lru,private, ... so nothing special there.
> 
> Once we actually decoupled "struct folio" from "struct page", we *might*
> have to play less tricks, because we could just have a callback pointer
> there. But well, maybe we also want to save some space in there.
> 
> Do we want dedicated memdescs for hugetlb/guest_memfd that extend folios in
> the future? I don't know, maybe.

I've certainly considered going so far as a per-fs folio.  So we'd
have an ext4_folio, an btrfs_folio, an iomap_folio, etc.  That'd let us
get rid of folio->private, but I'm not sure that C's type system can
really handle this nicely.  Maybe in a Rust world ;-)

What I'm thinking about is that I'd really like to be able to declare
that all the functions in ext4_aops only accept pointers to ext4_folio,
so ext4_dirty_folio() can't be called with pointers to _any_ folio,
but specifically folios which were previously allocated for ext4.

I don't know if Rust lets you do something like that.

> I'm currently wondering if we can use folio->private for the time being.
> Either
> 
> (a) If folio->private is still set once the refcount drops to 0, it
> indicates that there is a freeing callback/owner_ops. We'll have to make
> hugetlb not use folio->private and convert others to clear folio->private
> before freeing.
> 
> (b) Use bitX of folio->private to indicate that this has "owner_ops"
> meaning. We'll have to make hugetlb not use folio->private and make others
> not use bitX. Might be harder and overkill, because right now we only really
> need the callback when refcount==0.
> 
> (c) Use some other indication that folio->private contains folio_ops.

I really don't want to use folio_ops / folio_owner_ops.  I read
https://lore.kernel.org/all/CAGtprH_JP2w-4rq02h_Ugvq5KuHX7TUvegOS7xUs_iy5hriE7g@mail.gmail.com/
and I still don't understand what you're trying to do.

Would it work to use aops->free_folio() to notify you when the folio is
being removed from the address space?

next prev parent reply	other threads:[~2024-11-13  4:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 16:20 Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 01/10] mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 02/10] mm/migrate: don't call folio_putback_active_hugetlb() on dst hugetlb folio Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 03/10] mm/hugetlb: rename "folio_putback_active_hugetlb()" to "folio_putback_hugetlb()" Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 04/10] mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 05/10] mm/hugetlb: use folio->lru int demote_free_hugetlb_folios() Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 06/10] mm/hugetlb: use separate folio->_hugetlb_list for hugetlb-internals Fuad Tabba
2024-11-12 15:28   ` wang wei
2024-11-12 15:48     ` [RFC " David Hildenbrand
2024-11-08 16:20 ` [RFC PATCH v1 07/10] mm: Introduce struct folio_owner_ops Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 08/10] mm: Use getters and setters to access page pgmap Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 09/10] mm: Use owner_ops on folio_put for zone device pages Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 10/10] mm: hugetlb: Use owner_ops on folio_put for hugetlb Fuad Tabba
2024-11-08 17:05 ` [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops Jason Gunthorpe
2024-11-08 19:33   ` David Hildenbrand
2024-11-11  8:26     ` Fuad Tabba
2024-11-12  5:26       ` Matthew Wilcox
2024-11-12  9:10         ` David Hildenbrand
2024-11-12 13:53           ` Jason Gunthorpe
2024-11-12 14:22             ` David Hildenbrand
2024-11-13  4:57               ` Matthew Wilcox [this message]
2024-11-13 11:27                 ` David Hildenbrand
2024-11-14  4:02                 ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZzQxuAiJLbqm5xGO@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=ackerleytng@google.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=nouveau@lists.freedesktop.org \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_eberman@quicinc.com \
    --cc=roypat@amazon.co.uk \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=simona@ffwll.ch \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox