From: Fuad Tabba <tabba@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org,  pbonzini@redhat.com, chenhuacai@kernel.org,
	mpe@ellerman.id.au,  anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	 aou@eecs.berkeley.edu, seanjc@google.com,
	viro@zeniv.linux.org.uk,  brauner@kernel.org,
	willy@infradead.org, akpm@linux-foundation.org,
	 xiaoyao.li@intel.com, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com,  jarkko@kernel.org,
	amoorthy@google.com, dmatlack@google.com,
	 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
	mic@digikod.net,  vbabka@suse.cz, vannapurve@google.com,
	mail@maciej.szmigiero.name,  david@redhat.com,
	michael.roth@amd.com, wei.w.wang@intel.com,
	 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com,  quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com,
	 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	jgg@nvidia.com,  rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com, hughd@google.com,  jthoughton@google.com
Subject: Re: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and functions that manage their transition
Date: Thu, 20 Feb 2025 09:26:20 +0000
Message-ID: <CA+EHjTwmbK916i=W7WkDQc+kGGee_rvwUGfhbWiJQRsfB0ZW0w@mail.gmail.com>
In-Reply-To: <diqzzfih8q7r.fsf@ackerleytng-ctop.c.googlers.com>

Hi Ackerley,

On Wed, 19 Feb 2025 at 23:33, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> This question should not block merging of this series, since
> performance can be improved in a separate series:
>
> > <snip>
> >
> > +
> > +/*
> > + * Marks the range [start, end) as mappable by both the host and the guest.
> > + * Usually called when guest shares memory with the host.
> > + */
> > +static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> > +{
> > +     struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> > +     void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
> > +     pgoff_t i;
> > +     int r = 0;
> > +
> > +     filemap_invalidate_lock(inode->i_mapping);
> > +     for (i = start; i < end; i++) {
>
> Were any alternative data structures considered, or does anyone have
> suggestions for alternatives? Doing xa_store() in a loop here will take
> a long time for large ranges.
>
> I looked into the following:
>
> Option 1: (preferred) Maple trees
>
> Maple tree has a nice API, though it would be better if it could
> combine adjacent ranges that have the same value.
>
> I will have to dig into performance, but I'm assuming that even large
> ranges are stored in a few nodes, so this would be faster than
> iterating over indices in an xarray.
>
> void explore_maple_tree(void)
> {
>         DEFINE_MTREE(mt);
>
>         mt_init_flags(&mt, MT_FLAGS_LOCK_EXTERN | MT_FLAGS_USE_RCU);
>
>         mtree_store_range(&mt, 0, 16, xa_mk_value(0x20), GFP_KERNEL);
>         mtree_store_range(&mt, 8, 24, xa_mk_value(0x32), GFP_KERNEL);
>         mtree_store_range(&mt, 5, 10, xa_mk_value(0x32), GFP_KERNEL);
>
>         {
>                 void *entry;
>                 MA_STATE(mas, &mt, 0, 0);
>
>                 mas_for_each(&mas, entry, ULONG_MAX) {
>                         pr_err("[%lu, %lu]: 0x%lx\n", mas.index, mas.last, xa_to_value(entry));
>                 }
>         }
>
>         mtree_destroy(&mt);
> }
>
> stdout:
>
> [0, 4]: 0x20
> [5, 10]: 0x32
> [11, 24]: 0x32
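
As an aside, and purely as a hypothetical sketch rather than code from
this series: if struct kvm_gmem_private carried a struct maple_tree
instead of the xarray, gmem_set_mappable() could collapse its
per-index loop into a single ranged store, e.g.:

static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
{
	struct maple_tree *mt = &kvm_gmem_private(inode)->mappable_offsets;
	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
	int r;

	filemap_invalidate_lock(inode->i_mapping);
	/* Maple tree ranges are inclusive, hence end - 1. */
	r = mtree_store_range(mt, start, end - 1, xval, GFP_KERNEL);
	filemap_invalidate_unlock(inode->i_mapping);

	return r;
}

The cost would then scale with the number of stored ranges rather than
the number of 4K offsets in the range.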
>
> Option 2: Multi-index xarray
>
> The API is more complex than the maple tree's, and IIUC multi-index
> xarrays are not generalizable to arbitrary ranges, so the range can't
> be 8 1G pages + 1 4K page, for example. The size of the range has to
> be a power of 2 that is greater than 4K.
>
> Using multi-index xarrays would also mean computing the order at
> which to store multi-index entries. The order can be derived from the
> size of the range to be added, but it is an additional source of
> errors.
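
To make the order computation concrete, a hypothetical helper storing
one naturally aligned, power-of-2-sized multi-index entry might look
like this (assuming CONFIG_XARRAY_MULTI, with index aligned to
1 << order):

static int gmem_store_mappable_entry(struct xarray *mappable_offsets,
				     pgoff_t index, unsigned int order)
{
	XA_STATE_ORDER(xas, mappable_offsets, index, order);
	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);

	/* Usual xa_state idiom: retry the store whenever nodes must be allocated. */
	do {
		xas_lock(&xas);
		xas_store(&xas, xval);
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	return xas_error(&xas);
}

An arbitrary range would still have to be split into such chunks and
an order derived for each, which is the extra source of errors noted
above.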
>
> Option 3: Interval tree, which is built on top of red-black trees
>
> The API is at a lower level: a macro is used to define interval
> trees, the user has to deal with nodes in the tree directly, and
> functions to override sub-ranges within larger ranges have to be
> defined separately.
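
To make "lower level" concrete, a hypothetical insertion using the
generic interval tree from include/linux/interval_tree.h could look
like the following; note that the caller allocates and owns the nodes,
and that inserting does nothing to resolve overlaps with existing
nodes:

static struct rb_root_cached mappable_root = RB_ROOT_CACHED;

static int mark_mappable(unsigned long start, unsigned long last, gfp_t gfp)
{
	struct interval_tree_node *node = kzalloc(sizeof(*node), gfp);

	if (!node)
		return -ENOMEM;

	node->start = start;
	node->last = last;	/* interval tree ranges are inclusive */
	interval_tree_insert(node, &mappable_root);

	/*
	 * Unlike mtree_store_range(), any nodes overlapping [start, last]
	 * stay in place; the caller would have to find them with
	 * interval_tree_iter_first()/interval_tree_iter_next() and split
	 * or remove them by hand.
	 */
	return 0;
}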

I didn't consider any other data structures, mainly out of laziness :)
What I mean by that is that the xarray is what guest_memfd already
uses for tracking other gfn-related items, even though many have
talked about replacing it with something else in the future.

I agree with you that it's not the ideal data structure, but, like you
said, it isn't part of the interface, so it would be easy to replace
later. As you mention, one of the challenges is figuring out the
performance impact in practice; once things have settled down and the
interface is more or less final, some benchmarking would be useful to
guide us here.

Thanks!
/fuad

> > +             r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
> > +             if (r)
> > +                     break;
> > +     }
> > +     filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +     return r;
> > +}
> > +
> > +/*
> > + * Marks the range [start, end) as not mappable by the host. If the host doesn't
> > + * have any references to a particular folio, then that folio is marked as
> > + * mappable by the guest.
> > + *
> > + * However, if the host still has references to the folio, then the folio is
> > + * marked as not mappable by anyone. Marking it as not mappable allows the
> > + * remaining host references to be drained, and ensures that the hypervisor
> > + * does not transition the folio to private while the host might still
> > + * access it.
> > + *
> > + * Usually called when guest unshares memory with the host.
> > + */
> > +static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> > +{
> > +     struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> > +     void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> > +     void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
> > +     pgoff_t i;
> > +     int r = 0;
> > +
> > +     filemap_invalidate_lock(inode->i_mapping);
> > +     for (i = start; i < end; i++) {
> > +             struct folio *folio;
> > +             int refcount = 0;
> > +
> > +             folio = filemap_lock_folio(inode->i_mapping, i);
> > +             if (!IS_ERR(folio)) {
> > +                     refcount = folio_ref_count(folio);
> > +             } else {
> > +                     r = PTR_ERR(folio);
> > +                     if (WARN_ON_ONCE(r != -ENOENT))
> > +                             break;
> > +
> > +                     folio = NULL;
> > +             }
> > +
> > +             /*
> > +              * Expect folio_nr_pages() page cache references, plus one
> > +              * taken by filemap_lock_folio().
> > +              */
> > +             if (folio && refcount > folio_nr_pages(folio) + 1) {
> > +                     /*
> > +                      * Outstanding references, the folio cannot be faulted
> > +                      * in by anyone until they're dropped.
> > +                      */
> > +                     r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
> > +             } else {
> > +                     /*
> > +                      * No outstanding references. Transition the folio to
> > +                      * guest mappable immediately.
> > +                      */
> > +                     r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
> > +             }
> > +
> > +             if (folio) {
> > +                     folio_unlock(folio);
> > +                     folio_put(folio);
> > +             }
> > +
> > +             if (WARN_ON_ONCE(r))
> > +                     break;
> > +     }
> > +     filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +     return r;
> > +}
> > +
> >
> > <snip>


