Date: Tue, 24 Jun 2025 16:40:18 -0700
Mime-Version: 1.0
References: <20250611133330.1514028-1-tabba@google.com> <20250611133330.1514028-11-tabba@google.com>
Subject: Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
From: Ackerley Tng
To: Sean Christopherson, Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, peterx@redhat.com, pankaj.gupta@amd.com, ira.weiny@intel.com
Content-Type: text/plain; charset="UTF-8"
Sean Christopherson writes:

> On Wed, Jun 11, 2025, Fuad Tabba wrote:
>> From: Ackerley Tng
>>
>> For memslots backed by guest_memfd with shared mem support, the KVM MMU
>> must always fault in pages from guest_memfd, and not from the host
>> userspace_addr. Update the fault handler to do so.
>
> And with a KVM_MEMSLOT_GUEST_MEMFD_ONLY flag, this becomes super obvious.
>
>> This patch also refactors related function names for accuracy:
>
> This patch. And phrase changelogs as commands.
>
>> kvm_mem_is_private() returns true only when the current private/shared
>> state (in the CoCo sense) of the memory is private, and returns false if
>> the current state is shared explicitly or implicitly, e.g., belongs to a
>> non-CoCo VM.
>
> Again, state changes as commands. For the above, it's not obvious if you're
> talking about the existing code versus the state of things after "this patch".
>

Will fix these, thanks!

>> kvm_mmu_faultin_pfn_gmem() is updated to indicate that it can be used to
>> fault in not just private memory, but more generally, from guest_memfd.
>
>> +static inline u8 kvm_max_level_for_order(int order)
>
> Do not use "inline" for functions that are visible only to the local compilation
> unit. "inline" is just a hint, and modern compilers are smart enough to inline
> functions when appropriate without a hint.
>
> A longer explanation/rant here: https://lore.kernel.org/all/ZAdfX+S323JVWNZC@google.com
>

Will fix this!
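Just to confirm I understand the preferred style before respinning: the
helper stays file-local, is declared static without the "inline" hint, and
the compiler decides whether to inline it. Something along these lines (a
rough sketch only, the exact body may differ from what ends up in mmu.c):

/*
 * File-local helper: "static" limits visibility to this compilation unit,
 * and there is no "inline" hint; the compiler inlines it if worthwhile.
 */
static u8 kvm_max_level_for_order(int order)
{
        if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
                return PG_LEVEL_1G;

        if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
                return PG_LEVEL_2M;

        return PG_LEVEL_4K;
}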
>> +static inline int kvm_gmem_max_mapping_level(const struct kvm_memory_slot *slot,
>> +                                             gfn_t gfn, int max_level)
>> +{
>> +        int max_order;
>>
>>          if (max_level == PG_LEVEL_4K)
>>                  return PG_LEVEL_4K;
>
> This is dead code, the one and only caller has *just* checked for this condition.
>
>>
>> -        host_level = host_pfn_mapping_level(kvm, gfn, slot);
>> -        return min(host_level, max_level);
>> +        max_order = kvm_gmem_mapping_order(slot, gfn);
>> +        return min(max_level, kvm_max_level_for_order(max_order));
>>  }
>
> ...
>
>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
>> -                                        u8 max_level, int gmem_order)
>> +static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>
> This is comically verbose. C ain't Java. And having two separate helpers makes
> it *really* hard to (a) even see there are TWO helpers in the first place, and
> (b) understand how they differ.
>
> Gah, and not your bug, but completely ignoring the RMP in kvm_mmu_max_mapping_level()
> is wrong. It "works" because guest_memfd doesn't (yet) support dirty logging, and
> no one enables the NX hugepage mitigation on AMD hosts.
>
> We could plumb in the pfn and private info, but I don't really see the point,
> at least not at this time.
>
>> +                                            struct kvm_page_fault *fault,
>> +                                            int order)
>>  {
>> -        u8 req_max_level;
>> +        u8 max_level = fault->max_level;
>>
>>          if (max_level == PG_LEVEL_4K)
>>                  return PG_LEVEL_4K;
>>
>> -        max_level = min(kvm_max_level_for_order(gmem_order), max_level);
>> +        max_level = min(kvm_max_level_for_order(order), max_level);
>>          if (max_level == PG_LEVEL_4K)
>>                  return PG_LEVEL_4K;
>>
>> -        req_max_level = kvm_x86_call(private_max_mapping_level)(kvm, pfn);
>> -        if (req_max_level)
>> -                max_level = min(max_level, req_max_level);
>> +        if (fault->is_private) {
>> +                u8 level = kvm_x86_call(private_max_mapping_level)(kvm, fault->pfn);
>
> Hmm, so the interesting thing here is that (IIRC) the RMP restrictions aren't
> just on the private pages, they also apply to the HYPERVISOR/SHARED pages. (Don't
> quote me on that).
>
> Regardless, I'm leaning toward dropping the "private" part, and making SNP deal
> with the intricacies of the RMP:
>
>         /* Some VM types have additional restrictions, e.g. SNP's RMP. */
>         req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
>         if (req_max_level)
>                 max_level = min(max_level, req_max_level);
>
> Then we can get to something like:
>
> static int kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
>                                       struct kvm_page_fault *fault)
> {
>         int max_level, req_max_level;
>
>         max_level = kvm_max_level_for_order(order);
>         if (max_level == PG_LEVEL_4K)
>                 return PG_LEVEL_4K;
>
>         req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
>         if (req_max_level)
>                 max_level = min(max_level, req_max_level);
>
>         return max_level;
> }
>
> int kvm_mmu_max_mapping_level(struct kvm *kvm,
>                               const struct kvm_memory_slot *slot, gfn_t gfn)
> {
>         int max_level;
>
>         max_level = kvm_lpage_info_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM);
>         if (max_level == PG_LEVEL_4K)
>                 return PG_LEVEL_4K;
>
>         /* TODO: Comment goes here about KVM not supporting this path (yet). */

Which path does KVM not support?

>         if (kvm_mem_is_private(kvm, gfn))
>                 return PG_LEVEL_4K;
>

Just making sure - this suggestion does take into account that
kvm_mem_is_private() will be querying guest_memfd for memory privacy
status, right?

So the check below for kvm_is_memslot_gmem_only() will only be handling
the cases where the memory is shared, and only guest_memfd is used for
this gfn?
>         if (kvm_is_memslot_gmem_only(slot)) {
>                 int order = kvm_gmem_mapping_order(slot, gfn);
>
>                 return min(max_level, kvm_gmem_max_mapping_level(kvm, order, NULL));
>         }
>
>         return min(max_level, host_pfn_mapping_level(kvm, gfn, slot));
> }
>
> static int kvm_mmu_faultin_pfn_gmem(struct kvm_vcpu *vcpu,
>                                     struct kvm_page_fault *fault)
> {
>         struct kvm *kvm = vcpu->kvm;
>         int order, r;
>
>         if (!kvm_slot_has_gmem(fault->slot)) {
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>                 return -EFAULT;
>         }
>
>         r = kvm_gmem_get_pfn(kvm, fault->slot, fault->gfn, &fault->pfn,
>                              &fault->refcounted_page, &order);
>         if (r) {
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>                 return r;
>         }
>
>         fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>         fault->max_level = kvm_gmem_max_mapping_level(kvm, order, fault);
>
>         return RET_PF_CONTINUE;
> }
>
> int sev_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault)
> {
>         int level, rc;
>         bool assigned;
>
>         if (!sev_snp_guest(kvm))
>                 return 0;
>
>         if (WARN_ON_ONCE(!fault) || !fault->is_private)
>                 return 0;
>
>         rc = snp_lookup_rmpentry(fault->pfn, &assigned, &level);
>         if (rc || !assigned)
>                 return PG_LEVEL_4K;
>
>         return level;
> }

I like this. Thanks for the suggestion, I'll pass Fuad some patch(es)
for v13.

>> +/*
>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> + * private.
>> + *
>> + * A return value of false indicates that the gfn is explicitly or implicitly
>> + * shared (i.e., non-CoCo VMs).
>> + */
>>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>  {
>> -        return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> -               kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> +        struct kvm_memory_slot *slot;
>> +
>> +        if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> +                return false;
>> +
>> +        slot = gfn_to_memslot(kvm, gfn);
>> +        if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> +                /*
>> +                 * Without in-place conversion support, if a guest_memfd memslot
>> +                 * supports shared memory, then all the slot's memory is
>> +                 * considered not private, i.e., implicitly shared.
>> +                 */
>> +                return false;
>
> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
> mappable guest_memfd. You need to do that no matter what.

Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
out.

Where do people think we should check the mutual exclusivity?

In kvm_supported_mem_attributes() I'm thinking that we should still
allow the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other
non-guest_memfd-only gfn ranges. Or do people think we should just
disallow KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one
memslot is a guest_memfd-only memslot?

If we check mutual exclusivity when handling
kvm_vm_set_memory_attributes(), as long as part of the range where
KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
whose slot is guest_memfd-only, the ioctl will return EINVAL.

> Then you don't need
> to sprinkle special case code all over the place.
>

That's true, thanks. I guess the special-casing will come back when
guest_memfd supports conversions (and stores shareability).

After guest_memfd supports conversions, if guest_memfd-only memslot,
check with guest_memfd. Else, look up memory attributes with
kvm_get_memory_attributes().
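Concretely, once guest_memfd supports conversions, I'm picturing
something like the sketch below. This is only an illustration:
kvm_gmem_is_private() is a made-up name for the eventual guest_memfd
shareability query, not an existing helper.

static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_memory_slot *slot;

        if (!IS_ENABLED(CONFIG_KVM_GMEM))
                return false;

        slot = gfn_to_memslot(kvm, gfn);

        /*
         * guest_memfd-only slots would track shareability themselves, so
         * ask guest_memfd directly (kvm_gmem_is_private() is hypothetical).
         */
        if (kvm_is_memslot_gmem_only(slot))
                return kvm_gmem_is_private(slot, gfn);

        /* Everything else keeps using the VM-wide memory attributes. */
        return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}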
>> +        }
>> +
>> +        return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>>  }
>>  #else
>>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> --
>> 2.50.0.rc0.642.g800a2b2222-goog
>>