From: Ackerley Tng <ackerleytng@google.com>
To: kalyazin@amazon.com, "Edgecombe,
Rick P" <rick.p.edgecombe@intel.com>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"kalyazin@amazon.co.uk" <kalyazin@amazon.co.uk>,
"kernel@xen0n.name" <kernel@xen0n.name>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"loongarch@lists.linux.dev" <loongarch@lists.linux.dev>
Cc: "david@kernel.org" <david@kernel.org>,
"palmer@dabbelt.com" <palmer@dabbelt.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"svens@linux.ibm.com" <svens@linux.ibm.com>,
"jgross@suse.com" <jgross@suse.com>,
"surenb@google.com" <surenb@google.com>,
"riel@surriel.com" <riel@surriel.com>,
"pfalcato@suse.de" <pfalcato@suse.de>,
"peterx@redhat.com" <peterx@redhat.com>,
"x86@kernel.org" <x86@kernel.org>,
"rppt@kernel.org" <rppt@kernel.org>,
"thuth@redhat.com" <thuth@redhat.com>,
"maz@kernel.org" <maz@kernel.org>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"ast@kernel.org" <ast@kernel.org>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"Annapurve, Vishal" <vannapurve@google.com>,
"borntraeger@linux.ibm.com" <borntraeger@linux.ibm.com>,
"alex@ghiti.fr" <alex@ghiti.fr>,
"pjw@kernel.org" <pjw@kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"willy@infradead.org" <willy@infradead.org>,
"hca@linux.ibm.com" <hca@linux.ibm.com>,
"wyihan@google.com" <wyihan@google.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"yang@os.amperecomputing.com" <yang@os.amperecomputing.com>,
"jmattson@google.com" <jmattson@google.com>,
"luto@kernel.org" <luto@kernel.org>,
"aneesh.kumar@kernel.org" <aneesh.kumar@kernel.org>,
"haoluo@google.com" <haoluo@google.com>,
"patrick.roy@linux.dev" <patrick.roy@linux.dev>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"coxu@redhat.com" <coxu@redhat.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>, "hpa@zytor.com" <hpa@zytor.com>,
"song@kernel.org" <song@kernel.org>,
"oupton@kernel.org" <oupton@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"maobibo@loongson.cn" <maobibo@loongson.cn>,
"lorenzo.stoakes@oracle.com" <lorenzo.stoakes@oracle.com>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
"Yu, Yu-cheng" <yu-cheng.yu@intel.com>,
"Jonathan.Cameron@huawei.com" <Jonathan.Cameron@huawei.com>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"yonghong.song@linux.dev" <yonghong.song@linux.dev>,
"chenhuacai@kernel.org" <chenhuacai@kernel.org>,
"shuah@kernel.org" <shuah@kernel.org>,
"prsampat@amd.com" <prsampat@amd.com>,
"kevin.brodsky@arm.com" <kevin.brodsky@arm.com>,
"shijie@os.amperecomputing.com" <shijie@os.amperecomputing.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
"itazur@amazon.co.uk" <itazur@amazon.co.uk>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
"dev.jain@arm.com" <dev.jain@arm.com>,
"gor@linux.ibm.com" <gor@linux.ibm.com>,
"jackabt@amazon.co.uk" <jackabt@amazon.co.uk>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"agordeev@linux.ibm.com" <agordeev@linux.ibm.com>,
"andrii@kernel.org" <andrii@kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
"joey.gouly@arm.com" <joey.gouly@arm.com>,
"derekmn@amazon.com" <derekmn@amazon.com>,
"xmarcalx@amazon.co.uk" <xmarcalx@amazon.co.uk>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"sdf@fomichev.me" <sdf@fomichev.me>,
"jackmanb@google.com" <jackmanb@google.com>,
"bp@alien8.de" <bp@alien8.de>, "corbet@lwn.net" <corbet@lwn.net>,
"jannh@google.com" <jannh@google.com>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"kas@kernel.org" <kas@kernel.org>,
"will@kernel.org" <will@kernel.org>,
"seanjc@google.com" <seanjc@google.com>
Subject: Re: [PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct map
Date: Tue, 27 Jan 2026 16:21:11 -0800
Message-ID: <CAEvNRgFCwU7ezDV4Spj=H1JZohG9CSQRKMh_h1OGY1GrR2=7Eg@mail.gmail.com>
In-Reply-To: <afddc163-4b1e-46ee-920a-85de3b347291@amazon.com>

Nikita Kalyazin <kalyazin@amazon.com> writes:
> On 22/01/2026 18:37, Ackerley Tng wrote:
>> Nikita Kalyazin <kalyazin@amazon.com> writes:
>>
>>> On 16/01/2026 00:00, Edgecombe, Rick P wrote:
>>>> On Wed, 2026-01-14 at 13:46 +0000, Kalyazin, Nikita wrote:
>>>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>>>> +{
>>>>> + /*
>>>>> + * Direct map restoration cannot fail, as the only error condition
>>>>> + * for direct map manipulation is failure to allocate page tables
>>>>> + * when splitting huge pages, but this split would have already
>>>>> + * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().
>>
>> Do you know if folio_restore_direct_map() will also end up merging page
>> table entries to a higher level?
>>
>>>>> + * Thus folio_restore_direct_map() here only updates prot bits.
>>>>> + */
>>>>> + if (kvm_gmem_folio_no_direct_map(folio)) {
>>>>> + WARN_ON_ONCE(folio_restore_direct_map(folio));
>>>>> + folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>>>> + }
>>>>> +}
>>>>> +
>>>>
>>>> Does this assume the folio would not have been split after it was zapped? As in,
>>>> if it was zapped at 2MB granularity (no 4KB direct map split required) but then
>>>> restored at 4KB (split required)? Or it gets merged somehow before this?
>>
>> I agree with the rest of the discussion that this will probably land
>> before huge page support, so I will have to figure out the intersection
>> of the two later.
>>
>>>
>>> AFAIK it can't be zapped at 2MB granularity as the zapping code will
>>> inevitably cause splitting because guest_memfd faults occur at the base
>>> page granularity as of now.
>>
>> Here's what I'm thinking for now:
>>
>> [HugeTLB, no conversions]
>> With initial HugeTLB support (no conversions), host userspace
>> guest_memfd faults will be:
>>
>> + For guest_memfd with PUD-sized pages
>> + At PUD level or PTE level
>> + For guest_memfd with PMD-sized pages
>> + At PMD level or PTE level
>>
>> Since this guest_memfd doesn't support conversions, the folio is never
>> split/merged, so the direct map is restored at whatever level it was
>> zapped. I think this works out well.
>>
>> [HugeTLB + conversions]
>> For a guest_memfd with HugeTLB support and conversions, host userspace
>> guest_memfd faults will always be at PTE level, so the direct map will
>> be split and the faulted pages have the direct map zapped in 4K chunks
>> as they are faulted.
>>
>> On conversion back to private, put those back into the direct map
>> (putting aside whether to merge the direct map PTEs for now).
>
> Makes sense to me.
>
>>
>>
>> Unfortunately there's no unmapping callback for guest_memfd to use, so
>> perhaps the principle should be to put the folios back into the direct
>> map ASAP - at unmapping if guest_memfd is doing the unmapping, otherwise
>> at freeing time?
>
> I'm not sure I fully understand what you mean here. What would be the
> purpose for hooking up to unmapping? Why would making sure we put
> folios back into the direct map whenever they are freed or converted to
> private not be sufficient?

I think putting the folios back into the direct map when the folios are
freed or converted to private should cover all cases.

I was just thinking that being able to hook up to unmapping is nice
since unmapping is the counterpart to mapping when the folios are
removed from the direct map.
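
To make the "restore on free or on conversion to private" rule concrete,
here is a minimal userspace sketch of the bookkeeping. It is only a toy
model, not the patch's code: struct model_folio, MODEL_NO_DIRECT_MAP and
all model_*() helpers are hypothetical names made up for illustration,
and the flag handling merely mirrors the idea of keeping a "no direct
map" bit in folio->private. The point it tries to show is that if both
paths funnel into one flag-checked restore helper, restoring twice is
harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flag bit kept in folio->private. */
#define MODEL_NO_DIRECT_MAP 0x1UL

/* Toy folio: only the state this sketch needs (an assumption). */
struct model_folio {
	uintptr_t private_bits;	/* models folio->private */
	bool in_direct_map;	/* models the direct map entry being present */
	int restore_calls;	/* counts restorations that did real work */
};

static bool model_no_direct_map(const struct model_folio *f)
{
	return (f->private_bits & MODEL_NO_DIRECT_MAP) != 0;
}

/* Models removal from the direct map when the folio is faulted in. */
static void model_zap(struct model_folio *f)
{
	if (model_no_direct_map(f))
		return;		/* already zapped, nothing to do */
	f->in_direct_map = false;
	f->private_bits |= MODEL_NO_DIRECT_MAP;
}

/* Single restore helper: a no-op unless the flag says the folio was zapped. */
static void model_restore(struct model_folio *f)
{
	if (!model_no_direct_map(f))
		return;
	f->in_direct_map = true;
	f->restore_calls++;
	f->private_bits &= ~MODEL_NO_DIRECT_MAP;
}

/* Both exit paths discussed above go through the same helper. */
static void model_convert_to_private(struct model_folio *f)
{
	model_restore(f);
}

static void model_free(struct model_folio *f)
{
	model_restore(f);
}
```

In this model, calling model_zap(), then model_convert_to_private(),
then model_free() on the same folio restores the direct map entry once;
the second call sees the flag already cleared and does nothing.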