Date: Fri, 16 Jan 2026 14:56:46 +0000
From: Nikita Kalyazin
Subject: Re: [PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct map
References: <20260114134510.1835-1-kalyazin@amazon.com> <20260114134510.1835-8-kalyazin@amazon.com>

On 15/01/2026 20:00, Ackerley Tng wrote:
> "Kalyazin, Nikita" writes:
>
>> From: Patrick Roy
>>
>> Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
>> ioctl. When set, guest_memfd folios will be removed from the direct map
>> after preparation, with direct map entries only restored when the folios
>> are freed.
>>
>> To ensure these folios do not end up in places where the kernel cannot
>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>> address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
>>
>> Note that this flag causes removal of direct map entries for all
>> guest_memfd folios independent of whether they are "shared" or "private"
>> (although current guest_memfd only supports either all folios in the
>> "shared" state, or all folios in the "private" state if
>> GUEST_MEMFD_FLAG_MMAP is not set). The use case for also removing direct
>> map entries of the shared parts of guest_memfd is a special type of
>> non-CoCo VM where host userspace is trusted to have access to all of
>> guest memory, but where Spectre-style transient execution attacks
>> through the host kernel's direct map should still be mitigated. In this
>> setup, KVM retains access to guest memory via userspace mappings of
>> guest_memfd, which are reflected back into KVM's memslots via
>> userspace_addr. This is needed for things like MMIO emulation on x86_64
>> to work.
>>
>> Direct map entries are zapped right before guest or userspace mappings
>> of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
>> kvm_gmem_get_pfn() [called from the KVM MMU code]. The only place where
>> a gmem folio can be allocated without being mapped anywhere is
>> kvm_gmem_populate(), where handling potential failures of direct map
>> removal is not possible (by the time direct map removal is attempted,
>> the folio is already marked as prepared, meaning attempting to re-try
>> kvm_gmem_populate() would just result in -EEXIST without fixing up the
>> direct map state). These folios are then removed from the direct map
>> upon kvm_gmem_get_pfn(), e.g. when they are mapped into the guest later.
>>
>> Signed-off-by: Patrick Roy
>> Signed-off-by: Nikita Kalyazin
>> ---
>>  Documentation/virt/kvm/api.rst | 22 ++++++++------
>>  include/linux/kvm_host.h       | 12 ++++++++
>>  include/uapi/linux/kvm.h       |  1 +
>>  virt/kvm/guest_memfd.c         | 54 ++++++++++++++++++++++++++++++++++
>>  4 files changed, 80 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 01a3abef8abb..c5f54f1370c8 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6440,15 +6440,19 @@ a single guest_memfd file, but the bound ranges must not overlap).
>>  The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
>>  specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
>>
>> -  ============================ ================================================
>> -  GUEST_MEMFD_FLAG_MMAP        Enable using mmap() on the guest_memfd file
>> -                               descriptor.
>> -  GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
>> -                               KVM_CREATE_GUEST_MEMFD (memory files created
>> -                               without INIT_SHARED will be marked private).
>> -                               Shared memory can be faulted into host userspace
>> -                               page tables. Private memory cannot.
>> -  ============================ ================================================
>> +  ============================== ================================================
>> +  GUEST_MEMFD_FLAG_MMAP          Enable using mmap() on the guest_memfd file
>> +                                 descriptor.
>> +  GUEST_MEMFD_FLAG_INIT_SHARED   Make all memory in the file shared during
>> +                                 KVM_CREATE_GUEST_MEMFD (memory files created
>> +                                 without INIT_SHARED will be marked private).
>> +                                 Shared memory can be faulted into host userspace
>> +                                 page tables. Private memory cannot.
>> +  GUEST_MEMFD_FLAG_NO_DIRECT_MAP The guest_memfd instance will behave similarly
>> +                                 to memfd_secret, and unmaps the memory backing
>
> Perhaps the reference to memfd_secret can be dropped to avoid anyone
> assuming further similarities between guest_memfd and memfd_secret. This
> could just say that "The guest_memfd instance will unmap the memory
> backing it from the kernel's address space...".

Agree, it may lead to confusion down the line, thanks.

>
>> +                                 it from the kernel's address space before
>> +                                 being passed off to userspace or the guest.
>> +  ============================== ================================================
>>
>>  When the KVM MMU performs a PFN lookup to service a guest fault and the backing
>>  guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 27796a09d29b..d4d5306075bf 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -738,10 +738,22 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
>>  	if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
>>  		flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
>>
>> +	if (kvm_arch_gmem_supports_no_direct_map())
>> +		flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
>> +
>>  	return flags;
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_KVM_GUEST_MEMFD
>> +#ifndef kvm_arch_gmem_supports_no_direct_map
>> +static inline bool kvm_arch_gmem_supports_no_direct_map(void)
>> +{
>> +	return false;
>> +}
>> +#endif
>> +#endif /* CONFIG_KVM_GUEST_MEMFD */
>> +
>>  #ifndef kvm_arch_has_readonly_mem
>>  static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
>>  {
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index dddb781b0507..60341e1ba1be 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1612,6 +1612,7 @@ struct kvm_memory_attributes {
>>  #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
>>  #define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
>>  #define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)
>> +#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP	(1ULL << 2)
>>
>>  struct kvm_create_guest_memfd {
>>  	__u64 size;
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 92e7f8c1f303..43f64c11467a 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -7,6 +7,9 @@
>>  #include
>>  #include
>>  #include
>> +#include
>> +
>> +#include
>>
>>  #include "kvm_mm.h"
>>
>> @@ -76,6 +79,43 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>>  	return 0;
>>  }
>>
>> +#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
>> +
>> +static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
>> +{
>> +	return ((u64) folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
>
> Nit: I think there shouldn't be a space between (u64) and what's being
> casted.

True, will remove.

>
>> +}
>> +
>> +static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
>> +{
>> +	u64 gmem_flags = GMEM_I(folio_inode(folio))->flags;
>> +	int r = 0;
>> +
>> +	if (kvm_gmem_folio_no_direct_map(folio) || !(gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
>> +		goto out;
>> +
>> +	folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>> +	r = folio_zap_direct_map(folio);
>> +
>> +out:
>> +	return r;
>> +}
>> +
>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>> +{
>> +	/*
>> +	 * Direct map restoration cannot fail, as the only error condition
>> +	 * for direct map manipulation is failure to allocate page tables
>> +	 * when splitting huge pages, but this split would have already
>> +	 * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().
>> +	 * Thus folio_restore_direct_map() here only updates prot bits.
>> +	 */
>
> Thanks for this comment :)

Thanks to Patrick :)

>
>> +	if (kvm_gmem_folio_no_direct_map(folio)) {
>> +		WARN_ON_ONCE(folio_restore_direct_map(folio));
>> +		folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>> +	}
>> +}
>> +
>>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>  {
>>  	folio_mark_uptodate(folio);
>> @@ -398,6 +438,7 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>>  	struct folio *folio;
>>  	vm_fault_t ret = VM_FAULT_LOCKED;
>> +	int err;
>>
>>  	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>  		return VM_FAULT_SIGBUS;
>> @@ -423,6 +464,12 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>  		kvm_gmem_mark_prepared(folio);
>>  	}
>>
>> +	err = kvm_gmem_folio_zap_direct_map(folio);
>
> Perhaps the check for gmem_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP should
> be done here before making the call to kvm_gmem_folio_zap_direct_map()
> to make it more obvious that zapping is conditional.

Makes sense to me.

>
> Perhaps also add a check for kvm_arch_gmem_supports_no_direct_map() so
> this call can be completely removed by the compiler if it wasn't
> compiled in.

But if it is compiled in, we will be paying the cost of the call on
every page fault?  E.g. on arm64, it will call the following:

bool can_set_direct_map(void)
{
...
	return rodata_full || debug_pagealloc_enabled() ||
		arm64_kfence_can_set_direct_map() || is_realm_world();
}

>
> The kvm_gmem_folio_no_direct_map() check should probably remain in
> kvm_gmem_folio_zap_direct_map() since that's a "if already zapped, don't
> zap again" check.
>
>> +	if (err) {
>> +		ret = vmf_error(err);
>> +		goto out_folio;
>> +	}
>> +
>>  	vmf->page = folio_file_page(folio, vmf->pgoff);
>>
>>  out_folio:
>> @@ -533,6 +580,8 @@ static void kvm_gmem_free_folio(struct folio *folio)
>>  	kvm_pfn_t pfn = page_to_pfn(page);
>>  	int order = folio_order(folio);
>>
>> +	kvm_gmem_folio_restore_direct_map(folio);
>> +
>
> I can't decide if the kvm_gmem_folio_no_direct_map(folio) should be in
> the caller or within kvm_gmem_folio_restore_direct_map(), since this
> time it's a folio-specific property being checked.

I'm tempted to keep it similar to the kvm_gmem_folio_zap_direct_map()
case.  How does the fact that it's a folio-specific property change your
reasoning?

>
> Perhaps also add a check for kvm_arch_gmem_supports_no_direct_map() so
> this call can be completely removed by the compiler if it wasn't
> compiled in. IIUC whether the check is added in the caller or within
> kvm_gmem_folio_restore_direct_map() the call can still be elided.

Same concern as above for kvm_gmem_folio_zap_direct_map(), i.e. the
performance of the case where kvm_arch_gmem_supports_no_direct_map()
exists.

>
>>  	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
>>  }
>>
>> @@ -596,6 +645,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>>  	/* Unmovable mappings are supposed to be marked unevictable as well. */
>>  	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>>
>> +	if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
>> +		mapping_set_no_direct_map(inode->i_mapping);
>> +
>>  	GMEM_I(inode)->flags = flags;
>>
>>  	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
>> @@ -807,6 +859,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>>  	if (!is_prepared)
>>  		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>
>> +	kvm_gmem_folio_zap_direct_map(folio);
>> +
>
> Is there a reason why errors are not handled when faulting private memory?

No, I can't see a reason.  Will add a check, thanks.
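
To illustrate the shape such a check might take, here is a small
user-space model that compiles on its own.  It is only a sketch under
assumptions: struct model_folio, its zap_result/zap_calls fields and all
model_* helpers are hypothetical stand-ins, not kernel APIs, and only
the two bit positions are taken from the patch.  The helper keeps the
patch's order (state bit set before the zap primitive runs); the caller
forwards the result instead of dropping it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the bit positions mirror the patch. */
#define MODEL_FLAG_NO_DIRECT_MAP	(1ULL << 2)	/* per-file creation flag */
#define MODEL_FOLIO_NO_DIRECT_MAP	(1ULL << 0)	/* per-folio state bit */

struct model_folio {
	void *private;		/* low bit records "already zapped" */
	uint64_t gmem_flags;	/* flags of the owning file (made-up field) */
	int zap_result;		/* value the fake zap primitive returns */
	int zap_calls;		/* number of times the fake primitive ran */
};

static bool model_folio_no_direct_map(const struct model_folio *f)
{
	return ((uintptr_t)f->private & MODEL_FOLIO_NO_DIRECT_MAP) != 0;
}

/* Fake replacement for the real zap primitive: records the call only. */
static int model_zap_primitive(struct model_folio *f)
{
	f->zap_calls++;
	return f->zap_result;
}

/* Same control flow as the helper in the patch, on the model types. */
static int model_folio_zap_direct_map(struct model_folio *f)
{
	if (model_folio_no_direct_map(f) ||
	    !(f->gmem_flags & MODEL_FLAG_NO_DIRECT_MAP))
		return 0;

	f->private = (void *)((uintptr_t)f->private | MODEL_FOLIO_NO_DIRECT_MAP);
	return model_zap_primitive(f);
}

/*
 * Assumed form of the fix: only zap when preparation succeeded, and
 * hand any zap failure back to the caller.
 */
static int model_get_pfn(struct model_folio *f, int prepare_result)
{
	int r = prepare_result;

	if (!r)
		r = model_folio_zap_direct_map(f);
	return r;
}
```

With this shape, a folio whose file lacks the flag never reaches the zap
primitive, and a failing zap surfaces as the return value of the lookup.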
>
>>  	folio_unlock(folio);
>>
>>  	if (!r)
>> --
>> 2.50.1
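
On the compile-time elision point discussed above: the kvm_host.h hunk
relies on an "#ifndef name / static inline fallback" pattern, where an
architecture opts in by defining a macro with the same name as the
helper.  The sketch below models that pattern in stand-alone C.  All
demo_* and DEMO_* names are made up for illustration, both base flags
are advertised unconditionally for brevity (the real helper gates them),
and the creation-time rejection of unadvertised bits is an assumption
about how the flag mask is consumed, not something shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the uapi hunk; the DEMO_ names are made up. */
#define DEMO_FLAG_MMAP			(1ULL << 0)
#define DEMO_FLAG_INIT_SHARED		(1ULL << 1)
#define DEMO_FLAG_NO_DIRECT_MAP		(1ULL << 2)

/*
 * An "arch header" would opt in by defining a macro with this name
 * before this point.  Nothing does so here, so the constant-false
 * fallback is used and the branch below can be dropped by the compiler.
 */
#ifndef demo_arch_supports_no_direct_map
static inline bool demo_arch_supports_no_direct_map(void)
{
	return false;
}
#endif

static inline uint64_t demo_supported_flags(void)
{
	uint64_t flags = DEMO_FLAG_MMAP | DEMO_FLAG_INIT_SHARED;

	if (demo_arch_supports_no_direct_map())
		flags |= DEMO_FLAG_NO_DIRECT_MAP;
	return flags;
}

/* Assumed creation-time validation: reject bits that are not advertised. */
static inline int demo_check_create_flags(uint64_t flags)
{
	return (flags & ~demo_supported_flags()) ? -1 : 0;
}
```

When the fallback is in effect the helper is a compile-time constant, so
the cost concern only applies to architectures that provide a real
implementation, which is where a cheap per-file flag test ahead of the
call would matter.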