From: David Hildenbrand <david@redhat.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: "Shah, Amit" <Amit.Shah@amd.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Roth, Michael" <Michael.Roth@amd.com>,
"liam.merwick@oracle.com" <liam.merwick@oracle.com>,
"seanjc@google.com" <seanjc@google.com>,
"jroedel@suse.de" <jroedel@suse.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Sampat, Pratik Rajesh" <PratikRajesh.Sampat@amd.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Lendacky, Thomas" <Thomas.Lendacky@amd.com>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
"quic_eberman@quicinc.com" <quic_eberman@quicinc.com>,
"Kalra, Ashish" <Ashish.Kalra@amd.com>,
"ackerleytng@google.com" <ackerleytng@google.com>,
"vannapurve@google.com" <vannapurve@google.com>
Subject: Re: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes
Date: Fri, 14 Mar 2025 10:33:07 +0100
Message-ID: <18db10a0-bd40-4c6a-b099-236f4dcaf0cf@redhat.com>
In-Reply-To: <Z9PyLE/LCrSr2jCM@yzhao56-desk.sh.intel.com>
On 14.03.25 10:09, Yan Zhao wrote:
> On Wed, Jan 22, 2025 at 03:25:29PM +0100, David Hildenbrand wrote:
>> (split is possible if there are no unexpected folio references; private
>> pages cannot be GUP'ed, so it is feasible)
> ...
>>>> Note that I'm not quite sure about the "2MB" interface, should it be a
>>>> "PMD-size" interface?
>>>
>>> I think Mike and I touched upon this aspect too - and I may be
>>> misremembering - Mike suggested getting 1M, 2M, and bigger page sizes
>>> in increments -- and then fitting in PMD sizes when we've had enough of
>>> those. That is to say he didn't want to preclude it, or gate the PMD
>>> work on enabling all sizes first.
>>
>> Starting with 2M is reasonable for now. The real question is how we want to
>> deal with
> Hi David,
>
Hi!
> I'm just trying to understand the background of in-place conversion.
>
> Regarding to the two issues you mentioned with THP and non-in-place-conversion,
> I have some questions (still based on starting with 2M):
>
>> (a) Not being able to allocate a 2M folio reliably
> If we start with fault in private pages from guest_memfd (not in page pool way)
> and shared pages anonymously, is it correct to say that this is only a concern
> when memory is under pressure?
Usually, fragmentation starts being a problem under memory pressure, and
memory pressure can show up simply because the page cache makes use of as
much memory as it wants.
As soon as we start allocating a 2 MB page for guest_memfd, to then
split it up + free only some parts back to the buddy (on private->shared
conversion), we create fragmentation that cannot get resolved as long as
the remaining private pages are not freed. A new conversion from
shared->private on the previously freed parts will allocate other
unmovable pages (not the freed ones) and make fragmentation worse.
In-place conversion improves that quite a lot, because guest_memfd itself
will not cause unmovable fragmentation. Of course, under memory
pressure, when we cannot allocate a 2M page for guest_memfd, it's
unavoidable. But then, we already had fragmentation (and did not really
cause any new one).
We discussed in the upstream call that if guest_memfd (primarily) only
allocates 2M pages and frees 2M pages, it will not cause fragmentation
itself, which is pretty nice.
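
To make the difference concrete, here is a toy sketch (not kernel code)
that counts how many 2M blocks end up holding unmovable guest_memfd pages
after a single partial private->shared->private round trip. The function
name, the block layout, and the allocation policy are all hypothetical;
in particular, it assumes 4 KiB base pages and that the newly allocated
private pages land in a different 2M block than the freed ones.

```python
PAGES_PER_2M = 512  # 4 KiB base pages in one 2 MiB block (x86-64 assumption)


def blocks_pinned(in_place, nr_blocks=4, converted_pages=16):
    """Count 2M blocks holding at least one unmovable guest_memfd page
    after one private->shared->private round trip (toy model)."""
    # unmovable[b] = number of unmovable guest_memfd 4K pages in block b
    unmovable = [0] * nr_blocks
    unmovable[0] = PAGES_PER_2M  # guest_memfd allocates one 2M folio

    if not in_place:
        # private->shared: split the folio, free some pages to the buddy.
        unmovable[0] -= converted_pages
        # shared->private: new unmovable pages are allocated. Assumption:
        # they are not the freed ones and land in another block (block 1).
        unmovable[1] += converted_pages
    # In-place conversion: pages change state but never leave the folio.

    return sum(1 for count in unmovable if count > 0)


print("in-place:    ", blocks_pinned(in_place=True))
print("non-in-place:", blocks_pinned(in_place=False))
```

In this model the in-place case keeps all unmovable pages inside the one
2M block, while the non-in-place case spreads them over two blocks, and
neither block can be handed out as a 2M page until the remaining private
pages are freed.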
>
>> (b) Partial discarding
> For shared pages, page migration and folio split are possible for shared THP?
I assume by "shared" you mean "not guest_memfd, but some other memory we
use as an overlay" -- so no in-place conversion.
Yes, that should be possible as long as nothing else prevents
migration/split (e.g., long-term pinning).
>
> For private pages, as you pointed out earlier, if we can ensure there are no
> unexpected folio references for private memory, splitting a private huge folio
> should succeed.
Yes, and maybe (hopefully) we'll reach a point where private parts will
not have a refcount at all (initially, frozen refcount, discussed during
the last upstream call).
> Are you concerned about the memory fragmentation after repeated
> partial conversions of private pages to and from shared?
Not only repeated, even just a single partial conversion. But of course,
repeated partial conversions will make it worse (e.g., never getting a
private huge page back when there was a partial conversion).
--
Cheers,
David / dhildenb