linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Fuad Tabba <tabba@google.com>
Cc: linux-mm@kvack.org, kvm@vger.kernel.org,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	rppt@kernel.org, jglisse@redhat.com, akpm@linux-foundation.org,
	muchun.song@linux.dev, simona@ffwll.ch, airlied@gmail.com,
	pbonzini@redhat.com, seanjc@google.com, willy@infradead.org,
	jhubbard@nvidia.com, ackerleytng@google.com,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	kirill.shutemov@linux.intel.com, quic_eberman@quicinc.com,
	maz@kernel.org, will@kernel.org, qperret@google.com,
	keirf@google.com, roypat@amazon.co.uk
Subject: Re: [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops
Date: Fri, 8 Nov 2024 20:33:49 +0100	[thread overview]
Message-ID: <9dc212ac-c4c3-40f2-9feb-a8bcf71a1246@redhat.com> (raw)
In-Reply-To: <20241108170501.GI539304@nvidia.com>

On 08.11.24 18:05, Jason Gunthorpe wrote:
> On Fri, Nov 08, 2024 at 04:20:30PM +0000, Fuad Tabba wrote:
>> Some folios, such as hugetlb folios and zone device folios,
>> require special handling when the folio's reference count reaches
>> 0, before being freed. Moreover, guest_memfd folios will likely
>> require special handling to notify it once a folio's reference
>> count reaches 0, to facilitate shared to private folio conversion
>> [*]. Currently, each usecase has a dedicated callback when the
>> folio refcount reaches 0 to that effect. Adding yet more
>> callbacks is not ideal.
> 

Thanks for having a look!

Replying to clarify some things. Fuad, feel free to add additional 
information.

> Honestly, I question this thesis. How complex would it be to have 'yet
> more callbacks'? Is the challenge really that the mm can't detect when
> guestmemfd is the owner of the page because the page will be
> ZONE_NORMAL?

Fuad might have been a bit imprecise here: We don't want an ever growing 
list of checks+callbacks on the page freeing fast path.

This series replaces the two cases we have by a single generic one, 
which is nice independent of guest_memfd I think.

> 
> So the point of this is really to allow ZONE_NORMAL pages to have a
> per-allocator callback?

To intercept the refcount going to zero independent of any zones or 
magic page types, without as little overhead in the common page freeing 
path.

It can be used to implement custom allocators, like factored out for 
hugetlb in this series. It's not necessarily limited to that, though. It 
can be used as a form of "asynchronous page ref freezing", where you get 
notified once all references are gone.

(I might have another use case with PageOffline, where we want to 
prevent virtio-mem ones of them from getting accidentally leaked into 
the buddy during memory offlining with speculative references -- 
virtio_mem_fake_offline_going_offline() contains the interesting bits. 
But I did not look into the dirty details yet, just some thought where 
we'd want to intercept the refcount going to 0.)

> 
> But this is also why I suggested to shift them to ZONE_DEVICE for
> guestmemfd, because then you get these things for free from the pgmap.

With this series even hugetlb gets it for "free", and hugetlb is not 
quite the nail for the ZONE_DEVICE hammer IMHO :)

For things we can statically set aside early during boot and never 
really want to return to the buddy/another allocator, I would agree that 
static ZONE_DEVICE would have possible.

Whenever the buddy or other allocators are involved, and we might have 
granularity as a handful of pages (e.g., taken from the buddy), getting 
ZONE_DEVICE involved is not a good (or even feasible) approach.

After all, all we want is intercept the refcount going to 0.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-11-08 19:33 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 16:20 Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 01/10] mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 02/10] mm/migrate: don't call folio_putback_active_hugetlb() on dst hugetlb folio Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 03/10] mm/hugetlb: rename "folio_putback_active_hugetlb()" to "folio_putback_hugetlb()" Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 04/10] mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 05/10] mm/hugetlb: use folio->lru int demote_free_hugetlb_folios() Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 06/10] mm/hugetlb: use separate folio->_hugetlb_list for hugetlb-internals Fuad Tabba
2024-11-12 15:28   ` wang wei
2024-11-12 15:48     ` [RFC " David Hildenbrand
2024-11-08 16:20 ` [RFC PATCH v1 07/10] mm: Introduce struct folio_owner_ops Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 08/10] mm: Use getters and setters to access page pgmap Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 09/10] mm: Use owner_ops on folio_put for zone device pages Fuad Tabba
2024-11-08 16:20 ` [RFC PATCH v1 10/10] mm: hugetlb: Use owner_ops on folio_put for hugetlb Fuad Tabba
2024-11-08 17:05 ` [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops Jason Gunthorpe
2024-11-08 19:33   ` David Hildenbrand [this message]
2024-11-11  8:26     ` Fuad Tabba
2024-11-12  5:26       ` Matthew Wilcox
2024-11-12  9:10         ` David Hildenbrand
2024-11-12 13:53           ` Jason Gunthorpe
2024-11-12 14:22             ` David Hildenbrand
2024-11-13  4:57               ` Matthew Wilcox
2024-11-13 11:27                 ` David Hildenbrand
2024-11-14  4:02                 ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9dc212ac-c4c3-40f2-9feb-a8bcf71a1246@redhat.com \
    --to=david@redhat.com \
    --cc=ackerleytng@google.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=nouveau@lists.freedesktop.org \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_eberman@quicinc.com \
    --cc=roypat@amazon.co.uk \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=simona@ffwll.ch \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox