From: Nikita Kalyazin <kalyazin@amazon.com>
To: <willy@infradead.org>, <pbonzini@redhat.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>
Cc: <michael.day@amd.com>, <david@redhat.com>,
	<jthoughton@google.com>, <michael.roth@amd.com>,
	<ackerleytng@google.com>, <graf@amazon.de>, <jgowans@amazon.com>,
	<roypat@amazon.co.uk>, <derekmn@amazon.com>, <nsaenz@amazon.es>,
	<xmarcalx@amazon.com>, <kalyazin@amazon.com>
Subject: [RFC PATCH 0/2] mm: filemap: add filemap_grab_folios
Date: Fri, 10 Jan 2025 15:46:57 +0000
Message-ID: <20250110154659.95464-1-kalyazin@amazon.com>

Based on David's suggestion for speeding up guest_memfd memory
population [1], made at the guest_memfd upstream call on 5 Dec 2024 [2],
this series adds `filemap_grab_folios`, which grabs multiple folios at a
time.

Motivation

When profiling guest_memfd population and comparing the results with
population of anonymous memory via UFFDIO_COPY, I observed that the
former was up to 20% slower, mainly due to adding newly allocated pages
to the pagecache.  As far as I can see, the two main contributors are
the pagecache locking and the tree traversals required for every folio.
The RFC attempts to partially mitigate those by adding multiple folios
to the pagecache at a time.
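
For context, population today boils down to a per-folio loop along the
lines of the sketch below (a simplified illustration, not code taken
from guest_memfd): every iteration grabs exactly one folio, which means
taking the pagecache lock and walking the xarray once per index.

/*
 * Simplified illustration of per-folio population via the existing
 * filemap_grab_folio() API: each index is looked up (and, if missing,
 * allocated and inserted) individually, taking the pagecache lock and
 * walking the xarray once per folio.
 */
static int populate_per_folio(struct address_space *mapping,
                              pgoff_t start, pgoff_t nr)
{
        pgoff_t index;

        for (index = start; index < start + nr; index++) {
                struct folio *folio = filemap_grab_folio(mapping, index);

                if (IS_ERR(folio))
                        return PTR_ERR(folio);

                /* ... copy or clear the contents of the folio ... */

                folio_unlock(folio);
                folio_put(folio);
        }

        return 0;
}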

Testing

With the change applied, I was able to observe a 10.3% (708 to 635 ms)
speedup in a selftest that populated 3GiB guest_memfd and a 9.5% (990 to
904 ms) speedup when restoring a 3GiB guest_memfd VM snapshot using a
custom Firecracker version, both on Intel Ice Lake.

Limitations

While a fully general `filemap_grab_folios` would need to handle
THP/large folios internally and deal with reclaim artifacts in the
pagecache (shadow entries), the RFC does not support those, for
simplicity: it demonstrates the optimisation applied to guest_memfd,
which only uses small folios and does not support reclaim at the moment.

Implementation

I am aware of existing filemap APIs operating on folio batches; however,
I was not able to find one that fits the use case in question.  I also
considered making use of the `folio_batch` struct, but could not
convince myself that it was useful here.  Instead, a plain array of
folio pointers is allocated on the stack and passed down the callchain.
A bitmap is used to keep track of the indexes whose folios were already
present in the pagecache, so that no new folios are allocated for them.
This does not look very clean to me and I am more than open to hearing
about better approaches.
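
To make this concrete, a caller might look roughly like the sketch
below.  Only the function name comes from the series; the exact
signature, the batch size, and the bitmap convention are hypothetical
here, reconstructed from the description above rather than from patch 1.

/*
 * Hypothetical sketch of a filemap_grab_folios() caller: an on-stack
 * array receives the folio pointers and a bitmap marks the indexes
 * whose folios were already present in the pagecache (i.e. were not
 * freshly allocated).  The real interface in patch 1 may differ.
 */
#define GRAB_BATCH      16

static long populate_batch(struct address_space *mapping, pgoff_t index)
{
        struct folio *folios[GRAB_BATCH];
        DECLARE_BITMAP(existing, GRAB_BATCH);
        long nr, i;

        bitmap_zero(existing, GRAB_BATCH);

        nr = filemap_grab_folios(mapping, index, folios, GRAB_BATCH,
                                 existing);
        if (nr < 0)
                return nr;

        for (i = 0; i < nr; i++) {
                if (!test_bit(i, existing)) {
                        /* freshly allocated: fill it with data */
                }
                folio_unlock(folios[i]);
                folio_put(folios[i]);
        }

        return nr;
}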

Not being an xarray expert, I do not know an idiomatic way to advance
the index when `xas_next` is called directly on a freshly instantiated
state that has never been walked, so I used a call to `xas_set`.
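
For illustration (this is not code from the patch), the quirk is roughly
the following: on a state that has never been walked, `xas_next` behaves
like `xas_load` and returns the entry at the current index without
advancing, so the position has to be moved explicitly with `xas_set`.

/*
 * Illustration of the quirk described above (not code from the patch).
 * On a freshly initialised xa_state, xas_next() behaves like
 * xas_load(): it returns the entry at the current index without
 * advancing, so the position is moved explicitly with xas_set().
 */
static void walk_from(struct address_space *mapping, pgoff_t index)
{
        XA_STATE(xas, &mapping->i_pages, index);
        void *entry;

        rcu_read_lock();
        entry = xas_next(&xas);         /* acts like xas_load() at 'index' */
        xas_set(&xas, index + 1);       /* reposition the state explicitly */
        entry = xas_next(&xas);         /* now loads the entry at 'index + 1' */
        (void)entry;
        rcu_read_unlock();
}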

While the series focuses on optimising _adding_ folios to the pagecache,
I also experimented with batching pagecache _queries_.  Specifically, I
tried to make use of `filemap_get_folios` instead of
`filemap_get_entry`, but could not observe any visible speedup.
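
For reference, the batched query path I experimented with follows
roughly this pattern (an illustrative sketch of the existing API, not
the actual guest_memfd code):

/*
 * Illustrative use of the existing batched query API mentioned above:
 * filemap_get_folios() fills a folio_batch with folios present in
 * [*start, end], takes a reference on each, and advances *start.
 */
static void scan_present_folios(struct address_space *mapping, pgoff_t end)
{
        struct folio_batch fbatch;
        pgoff_t start = 0;
        unsigned int i, nr;

        folio_batch_init(&fbatch);
        while ((nr = filemap_get_folios(mapping, &start, end, &fbatch))) {
                for (i = 0; i < nr; i++) {
                        /* ... inspect fbatch.folios[i] ... */
                }
                folio_batch_release(&fbatch);
        }
}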

The series applies on top of [1].  The 1st patch implements
`filemap_grab_folios`, while the 2nd patch makes use of it in
guest_memfd's write syscall as a first user.

Questions:
 - Does the approach look reasonable in general?
 - Can the API be kept specialised to the non-reclaim-supported case or
   does it need to be generic?
 - Would it be sensible to add a specialised small-folio-only version of
   `filemap_grab_folios` at the beginning and extend it to large folios
   later on?
 - Are there better ways to implement batching, or even to achieve the
   optimisation goal in another way?

[1]: https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com/T/
[2]: https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?tab=t.0

Thanks
Nikita

Nikita Kalyazin (2):
  mm: filemap: add filemap_grab_folios
  KVM: guest_memfd: use filemap_grab_folios in write

 include/linux/pagemap.h |  31 +++++
 mm/filemap.c            | 263 ++++++++++++++++++++++++++++++++++++++++
 virt/kvm/guest_memfd.c  | 176 ++++++++++++++++++++++-----
 3 files changed, 437 insertions(+), 33 deletions(-)


base-commit: 643cff38ebe84c39fbd5a0fc3ab053cd941b9f94
-- 
2.40.1


