Date: Tue, 15 Apr 2025 14:50:02 -0700
In-Reply-To: <6121b93b-6390-49e9-82db-4ed3a6797898@redhat.com>
References: <20250318161823.4005529-1-tabba@google.com>
 <20250318161823.4005529-5-tabba@google.com>
 <8ebc66ae-5f37-44c0-884b-564a65467fe4@redhat.com>
 <6121b93b-6390-49e9-82db-4ed3a6797898@redhat.com>
Subject: Re: [PATCH v7 4/9] KVM: guest_memfd: Handle in-place shared memory as guest_memfd backed memory
From: Ackerley Tng
To: David Hildenbrand, Fuad Tabba, kvm@vger.kernel.org,
 linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, mail@maciej.szmigiero.name, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com

David Hildenbrand writes:

>>> I've been thinking long about this, and was wondering if we should instead
>>> clean up the code to decouple the "private" from gmem handling first.
>>>
>>
>> Thank you for making this suggestion more concrete, I like the renaming!
>>
>
> Thanks for the fast feedback!
>
>>> I know, this was already discussed a couple of times, but faking that
>>> shared memory is private looks odd.
>>>
>>> I played with the code to start cleaning this up. I ended up with the following
>>> gmem-terminology cleanup patches (not even compile tested)
>>>
>>> KVM: rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_KVM_GENERIC_GMEM_POPULATE
>>> KVM: rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GMEM
>>> KVM: rename kvm_arch_has_private_mem() to kvm_arch_supports_gmem()
>>> KVM: x86: rename kvm->arch.has_private_mem to kvm->arch.supports_gmem
>>> KVM: rename kvm_slot_can_be_private() to kvm_slot_has_gmem()
>>
>> Perhaps zooming into this [1] can clarify a lot. In
>> kvm_mmu_max_mapping_level(), it was
>>
>>      bool is_private = kvm_slot_has_gmem(slot) && kvm_mem_is_private(kvm, gfn);
>>
>> and now it is
>>
>>      bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);
>>
>> Is this actually saying that the mapping level is to be fully determined
>> from lpage_info as long as this memslot has gmem and
>
> With this change in particular I was not quite sure what to do, maybe it should
> stay specific to private memory only? But yeah, the idea was that
> kvm_mem_from_gmem() would express:
>
> (a) if guest_memfd only supports private memory, it would translate to
>      kvm_mem_is_private() -> no change.
>
> (b) with guest_memfd having support for shared memory (+ support being enabled!),
>      it would only rely on the slot, not gfn information. Because it will all be
>      consumed from guest_memfd.
>
> This hunk was missing
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d9616ee6acc70..cdcd7ac091b5c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2514,6 +2514,12 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>   }
>   #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>
> +static inline bool kvm_mem_from_gmem(struct kvm *kvm, gfn_t gfn)
> +{
> +       /* For now, only private memory gets consumed from guest_memfd. */
> +       return kvm_mem_is_private(kvm, gfn);
> +}
> +
>
>

I looked a little deeper and got help from James Houghton on
understanding this too.

Specifically for the usage of kvm_mem_is_private() in
kvm_mmu_max_mapping_level(), the intention there is probably to skip
querying userspace page tables in __kvm_mmu_max_mapping_level(): private
memory will never be faulted into userspace, so there is no need to
check.

So kvm_mem_is_private() there is really meant to query the private-ness
of the gfn, rather than just whether the gfn is backed by guest_memfd
(kvm_mem_from_gmem()).

But then again, if kvm_mem_from_gmem() is true, guest_memfd should be
queried for max_mapping_level. guest_memfd would know, for both private
and shared memory, what page size the page was split to, and what level
it was faulted in at. (Exception: if/when guest_memfd supports THP,
depending on how that is done, querying userspace page tables might be
necessary to determine the max_mapping_level.)

>>
>> A. this specific gfn is backed by gmem, or
>> B. if the specific gfn is private?
>>
>> I noticed some other places where kvm_mem_is_private() is left as-is
>> [2], is that intentional? Are you not just renaming but splitting out
>> the two cases A and B?
>
> That was the idea, yes.
>
> If we get a private fault and !kvm_mem_is_private(), or a shared fault and
> kvm_mem_is_private(), then we should handle it like today.
>
> That is the kvm_mmu_faultin_pfn() case, where we
>
>         if (fault->is_private != kvm_mem_is_private(kvm, fault->gfn)) {
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>                 return -EFAULT;
>         }
>
> which can be reached by arch/x86/kvm/svm/svm.c:npf_interception()
>
>         if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
>                 error_code |= PFERR_PRIVATE_ACCESS;
>
> In summary: the memory attribute mismatch will be handled as is, but not how
> we obtain the gfn.
>
> At least that was the idea (-issues in the commit).
>
> What are your thoughts about that direction?

I still like the renaming. :)

I looked into kvm_mem_is_private() and I believe it has the following
uses:

1. Determining max_mapping_level (kvm_mmu_max_mapping_level() and
   friends)
2. Querying the kernel's record of private/shared state, which is used
   to handle
    (a) a mismatch between fault->is_private and the kernel's record
        (handling implicit conversions)
    (b) how to prefault pages
    (c) how to fault in pages for KVM_X86_SW_PROTECTED_VMs

So perhaps we could leave kvm_mem_is_private() without renaming it, but
as part of the series introducing mmap and conversions
(CONFIG_KVM_GMEM_SHARED_MEM), we should also have kvm_mem_is_private()
query guest_memfd for shareability status, and perhaps
kvm_mmu_max_mapping_level() could query guest_memfd for page size (after
splitting, etc).

IIUC the maximum mapping level is determined by these factors:

1. Attribute granularity (lpage_info)
2. Page size (guest_memfd for guest_memfd backed memory)
3. Size of mapping in host page table (for non-guest_memfd backed
   memory, and important for THP if/when/depending on how guest_memfd
   supports THP)

>
> --
> Cheers,
>
> David / dhildenb
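
To make the three factors listed above a bit more concrete, here is a
rough userspace sketch of how they could combine. This is not kernel
code: struct gfn_info, enum map_level and max_mapping_level() are names
made up for illustration, and the sketch assumes guest_memfd does not
support THP, so the host page table is consulted only for memory that is
not backed by guest_memfd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapping levels, ordered from smallest to largest. */
enum map_level { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };

/* Hypothetical per-gfn summary of the three factors; not a kernel struct. */
struct gfn_info {
	bool from_gmem;               /* gfn is backed by guest_memfd */
	enum map_level attr_level;    /* 1. attribute granularity (lpage_info) */
	enum map_level gmem_level;    /* 2. page size as known to guest_memfd */
	enum map_level host_pt_level; /* 3. size of mapping in host page table */
};

static inline enum map_level min_level(enum map_level a, enum map_level b)
{
	return a < b ? a : b;
}

/*
 * The attribute granularity always caps the level. The backing page size
 * comes from guest_memfd for guest_memfd-backed gfns and from the host
 * page table otherwise. THP in guest_memfd is deliberately ignored here.
 */
static inline enum map_level max_mapping_level(const struct gfn_info *info)
{
	enum map_level backing;

	assert(info != NULL);
	backing = info->from_gmem ? info->gmem_level : info->host_pt_level;
	return min_level(info->attr_level, backing);
}
```

For example, a guest_memfd-backed gfn whose attributes are uniform over
1G but whose page was split to 2M in guest_memfd would be mapped at 2M
under this sketch, regardless of what the host page table says.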