Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages

From: Fuad Tabba <tabba@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: Vishal Annapurve <vannapurve@google.com>,
	kvm@vger.kernel.org,  linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, pbonzini@redhat.com,  chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  seanjc@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	 willy@infradead.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com,  yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org,
	 amoorthy@google.com, dmatlack@google.com,
	yu.c.zhang@linux.intel.com,  isaku.yamahata@intel.com,
	mic@digikod.net, vbabka@suse.cz,  mail@maciej.szmigiero.name,
	david@redhat.com, michael.roth@amd.com,  wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com,  quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com,
	 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	jgg@nvidia.com,  rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com, hughd@google.com,  jthoughton@google.com
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
Date: Mon, 10 Feb 2025 16:04:07 +0000
Message-ID: <CA+EHjTwGMYkGUWCghBqN=MTuLLn_SCWZJNhdGYAmg=mn-YQiyg@mail.gmail.com>
In-Reply-To: <diqzed0aowwa.fsf@ackerleytng-ctop-specialist.c.googlers.com>

Hi Ackerley,

On Fri, 7 Feb 2025 at 10:46, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Vishal Annapurve <vannapurve@google.com> writes:
>
> > On Wed, Feb 5, 2025 at 9:39 AM Vishal Annapurve <vannapurve@google.com> wrote:
> >>
> >> On Wed, Feb 5, 2025 at 2:07 AM Fuad Tabba <tabba@google.com> wrote:
> >> >
> >> > Hi Vishal,
> >> >
> >> > On Wed, 5 Feb 2025 at 00:42, Vishal Annapurve <vannapurve@google.com> wrote:
> >> > >
> >> > > On Fri, Jan 17, 2025 at 8:30 AM Fuad Tabba <tabba@google.com> wrote:
> >> > > >
> >> > > > Before transitioning a guest_memfd folio to unshared, thereby
> >> > > > disallowing access by the host and allowing the hypervisor to
> >> > > > transition its view of the guest page as private, we need to be
> >> > > > sure that the host doesn't have any references to the folio.
> >> > > >
> >> > > > This patch introduces a new type for guest_memfd folios, and uses
> >> > > > that to register a callback that informs the guest_memfd
> >> > > > subsystem when the last reference is dropped, therefore knowing
> >> > > > that the host doesn't have any remaining references.
> >> > > >
> >> > > > Signed-off-by: Fuad Tabba <tabba@google.com>
> >> > > > ---
> >> > > > The function kvm_slot_gmem_register_callback() isn't used in this
> >> > > > series. It will be used later in code that performs unsharing of
> >> > > > memory. I have tested it with pKVM, based on downstream code [*].
> >> > > > It's included in this RFC since it demonstrates the plan to
> >> > > > handle unsharing of private folios.
> >> > > >
> >> > > > [*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm
> >> > >
> >> > > Should the invocation of kvm_slot_gmem_register_callback() happen in
> >> > > the same critical block as setting the guest memfd range mappability
> >> > > to NONE, otherwise conversion/truncation could race with registration
> >> > > of callback?
> >> >
> >> > I don't think it needs to, at least not as far as potential races are
> >> > concerned. First because kvm_slot_gmem_register_callback() grabs the
> >> > mapping's invalidate_lock as well as the folio lock, and
> >> > gmem_clear_mappable() grabs the mapping lock and the folio lock if a
> >> > folio has been allocated before.
> >>
> >> I was hinting towards such a scenario:
> >> Core1
> >> Shared to private conversion
> >>   -> Results in mappability attributes
> >>       being set to NONE
> >> ...
> >>         Trigger private to shared conversion/truncation for
> >> ...
> >>         overlapping ranges
> >> ...
> >> kvm_slot_gmem_register_callback() on
> >>       the guest_memfd ranges converted
> >>       above (This will end up registering callback
> >>       for guest_memfd ranges which possibly don't
> >>       carry *_MAPPABILITY_NONE)
> >>
> >
> > Sorry for the format mess above.
> >
> > I was hinting towards such a scenario:
> > Core1
> > Shared to private conversion -> Results in mappability attributes
> > being set to NONE
> > ...
> > Core2
> > Trigger private to shared conversion/truncation for overlapping ranges
> > ...
> > Core1
> > kvm_slot_gmem_register_callback() on the guest_memfd ranges converted
> > above (This will end up registering callback for guest_memfd ranges
> > which possibly don't carry *_MAPPABILITY_NONE)
> >
>
> In my model (I'm working through internal processes to open source this)
> I set up the folio_put() callback to be registered on truncation
> regardless of mappability state.
>
> The folio_put() callback has multiple purposes, see slide 5 of this deck
> [1]:
>
> 1. Transitioning mappability from NONE to GUEST
> 2. Merging the folio if it is ready for merging
> 3. Keeping subfolio around (even if refcount == 0) until folio is ready
>    for merging or return it to hugetlb
>
> So it is okay and in fact better to have the callback registered:
>
> 1. Folios with mappability == NONE can be transitioned to GUEST
> 2. Folios with mappability == GUEST/ALL can be merged if the other subfolios
>    are ready for merging
> 3. And no matter the mappability, if subfolios are not yet merged, they
>    have to be kept around even with refcount 0 until they are merged.
>
> The model doesn't model locking so I'll have to code it up for real to
> verify this, but for now I think we should take a mappability lock
> during mappability read/write, and do any necessary callback
> (un)registration while holding the lock. There's no concern of nested
> locking here since callback registration will purely (un)set
> PGTY_guest_memfd and does not add/drop refcounts.
>
> With the callback registration locked with mappability updates, the
> refcounting and folio_put() callback should keep guest_memfd in a
> consistent state.

So if I understand you correctly, we'll always need to register the
callback for large folios, right? If that's the case, we could extend
the check that decides whether to register the callback so that it's
always registered for large folios. As I said, for small folios the
common case is that registration would just be additional overhead.
Right?
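
To make the question concrete, here is a rough userspace sketch of the
kind of check I mean. It is not code from the series: the sketch_*
names, the enum and the struct are all hypothetical stand-ins, and the
small-folio rule (register only while mappability is NONE) is my
assumption about what the existing check would keep doing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel types from the series. */
enum sketch_mappability {
	SKETCH_MAPPABLE_ALL,	/* host and guest may map */
	SKETCH_MAPPABLE_GUEST,	/* only the guest may map */
	SKETCH_MAPPABLE_NONE,	/* transient: waiting for host refs to drop */
};

struct sketch_folio {
	unsigned int order;	/* 0 means a small (single-page) folio */
	enum sketch_mappability mappability;
};

static bool sketch_folio_is_large(const struct sketch_folio *folio)
{
	return folio->order > 0;
}

/*
 * Decide whether the final-folio_put() callback should be registered.
 * Large folios: always (assumed to be needed for merging).
 * Small folios: only while mappability is NONE (assumed existing rule),
 * so the common small-folio case pays no extra cost.
 */
static bool sketch_should_register_callback(const struct sketch_folio *folio)
{
	if (sketch_folio_is_large(folio))
		return true;

	return folio->mappability == SKETCH_MAPPABLE_NONE;
}

/* Exercise the three interesting cases. */
static void sketch_demo(void)
{
	struct sketch_folio large_all = { .order = 9, .mappability = SKETCH_MAPPABLE_ALL };
	struct sketch_folio small_guest = { .order = 0, .mappability = SKETCH_MAPPABLE_GUEST };
	struct sketch_folio small_none = { .order = 0, .mappability = SKETCH_MAPPABLE_NONE };

	assert(sketch_should_register_callback(&large_all));
	assert(!sketch_should_register_callback(&small_guest));
	assert(sketch_should_register_callback(&small_none));
}
```

The only point of the sketch is that the large-folio test comes first
and short-circuits, so the mappability state is consulted only for
small folios.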

Cheers,
/fuad

> >> >
> >> > Second, __gmem_register_callback() checks before returning whether all
> >> > references have been dropped, and adjusts the mappability/shareability
> >> > if needed.
> >> >
> >> > Cheers,
> >> > /fuad
>
> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3704/guest-memfd-1g-page-support-2025-02-06.pdf
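
For reference, the locking idea quoted above (take a mappability lock
for mappability reads/writes and do callback (un)registration while
holding it) could be modelled in userspace roughly as follows. This is
only a sketch under assumptions: the model_* names are hypothetical, a
pthread mutex stands in for whatever lock the real code would use, a
bool stands in for the PGTY_guest_memfd type bit, and the policy shown
(registered exactly when mappability is NONE) is just one possible
choice, picked to mirror the race Vishal described.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel data structures. */
enum model_mappability {
	MODEL_MAPPABLE_ALL,
	MODEL_MAPPABLE_GUEST,
	MODEL_MAPPABLE_NONE,
};

struct model_range {
	pthread_mutex_t lock;			/* the "mappability lock" */
	enum model_mappability mappability;
	bool callback_registered;		/* stands in for PGTY_guest_memfd */
};

static void model_range_init(struct model_range *range)
{
	pthread_mutex_init(&range->lock, NULL);
	range->mappability = MODEL_MAPPABLE_ALL;
	range->callback_registered = false;
}

/*
 * Update mappability and (un)register the callback in one critical
 * section, so another core can never observe one without the other.
 * Registration only flips a flag and takes no references, so nothing
 * is called with the lock held.
 */
static void model_set_mappability(struct model_range *range,
				  enum model_mappability mappability)
{
	pthread_mutex_lock(&range->lock);
	range->mappability = mappability;
	range->callback_registered = (mappability == MODEL_MAPPABLE_NONE);
	pthread_mutex_unlock(&range->lock);
}

/* Read both fields under the lock and report whether they agree. */
static bool model_range_consistent(struct model_range *range)
{
	bool consistent;

	pthread_mutex_lock(&range->lock);
	consistent = range->callback_registered ==
		     (range->mappability == MODEL_MAPPABLE_NONE);
	pthread_mutex_unlock(&range->lock);

	return consistent;
}
```

In this model the interleaving from the scenario (Core1 converts to
private, Core2 converts back to shared, Core1 then registers) cannot
leave a callback registered on a range that is no longer NONE, because
there is no separate registration step left to race with.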
