From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 10 Mar 2025 14:23:41 +0000
References: <20250303171013.3548775-3-tabba@google.com>
Subject: Re: [PATCH v5 2/9] KVM: guest_memfd: Handle final folio_put() of guest_memfd pages
From: Ackerley Tng
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
	anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
	vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com,
	michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
	jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
	jthoughton@google.com, peterx@redhat.com
Content-Type: text/plain; charset="UTF-8"

Fuad Tabba writes:

> Hi Ackerley,
>
> On Fri, 7 Mar 2025 at 17:04,
> Ackerley Tng wrote:
>>
>> Fuad Tabba writes:
>>
>> > Before transitioning a guest_memfd folio to unshared, thereby
>> > disallowing access by the host and allowing the hypervisor to
>> > transition its view of the guest page as private, we need to be
>> > sure that the host doesn't have any references to the folio.
>> >
>> > This patch introduces a new type for guest_memfd folios, which
>> > isn't activated in this series but is here as a placeholder and
>> > to facilitate the code in the subsequent patch series. This will
>> > be used in the future to register a callback that informs the
>> > guest_memfd subsystem when the last reference is dropped,
>> > therefore knowing that the host doesn't have any remaining
>> > references.
>> >
>> > This patch also introduces the configuration option,
>> > KVM_GMEM_SHARED_MEM, which toggles support for mapping
>> > guest_memfd shared memory at the host.
>> >
>> > Signed-off-by: Fuad Tabba
>> > Acked-by: Vlastimil Babka
>> > Acked-by: David Hildenbrand
>> > ---
>> >  include/linux/kvm_host.h   |  7 +++++++
>> >  include/linux/page-flags.h | 16 ++++++++++++++++
>> >  mm/debug.c                 |  1 +
>> >  mm/swap.c                  |  9 +++++++++
>> >  virt/kvm/Kconfig           |  5 +++++
>> >  5 files changed, 38 insertions(+)
>> >
>> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> > index f34f4cfaa513..7788e3625f6d 100644
>> > --- a/include/linux/kvm_host.h
>> > +++ b/include/linux/kvm_host.h
>> > @@ -2571,4 +2571,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>> >  				    struct kvm_pre_fault_memory *range);
>> >  #endif
>> >
>> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> > +static inline void kvm_gmem_handle_folio_put(struct folio *folio)
>> > +{
>> > +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>> > +}
>> > +#endif
>> > +
>> >  #endif
>>
>> Following up with the discussion at the guest_memfd biweekly call on the
>> guestmem library, I think this folio_put() handler for guest_memfd could
>> be the first function that's refactored out into (placeholder name)
>> mm/guestmem.c.
>>
>> This folio_put() handler has to stay in memory even after KVM (as a
>> module) is unloaded from memory, and so it is a good candidate for the
>> first function in the guestmem library.
>>
>> Along those lines, CONFIG_KVM_GMEM_SHARED_MEM in this patch can be
>> renamed CONFIG_GUESTMEM, and CONFIG_GUESTMEM will guard the existence of
>> PGTY_guestmem.
>>
>> CONFIG_KVM_GMEM_SHARED_MEM can be introduced in the next patch of this
>> series, which could, in Kconfig, select CONFIG_GUESTMEM.
>>
>> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> > index 6dc2494bd002..daeee9a38e4c 100644
>> > --- a/include/linux/page-flags.h
>> > +++ b/include/linux/page-flags.h
>> > @@ -933,6 +933,7 @@ enum pagetype {
>> >  	PGTY_slab = 0xf5,
>> >  	PGTY_zsmalloc = 0xf6,
>> >  	PGTY_unaccepted = 0xf7,
>> > +	PGTY_guestmem = 0xf8,
>> >
>> >  	PGTY_mapcount_underflow = 0xff
>> >  };
>> > @@ -1082,6 +1083,21 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
>> >  FOLIO_TEST_FLAG_FALSE(hugetlb)
>> >  #endif
>> >
>> > +/*
>> > + * guestmem folios are used to back VM memory as managed by guest_memfd. Once
>> > + * the last reference is put, instead of freeing these folios back to the page
>> > + * allocator, they are returned to guest_memfd.
>> > + *
>> > + * For now, guestmem will only be set on these folios as long as they cannot be
>> > + * mapped to user space ("private state"), with the plan of always setting that
>> > + * type once typed folios can be mapped to user space cleanly.
>> > + */
>> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> > +FOLIO_TYPE_OPS(guestmem, guestmem)
>> > +#else
>> > +FOLIO_TEST_FLAG_FALSE(guestmem)
>> > +#endif
>> > +
>> >  PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
>> >
>> >  /*
>> > diff --git a/mm/debug.c b/mm/debug.c
>> > index 8d2acf432385..08bc42c6cba8 100644
>> > --- a/mm/debug.c
>> > +++ b/mm/debug.c
>> > @@ -56,6 +56,7 @@ static const char *page_type_names[] = {
>> >  	DEF_PAGETYPE_NAME(table),
>> >  	DEF_PAGETYPE_NAME(buddy),
>> >  	DEF_PAGETYPE_NAME(unaccepted),
>> > +	DEF_PAGETYPE_NAME(guestmem),
>> >  };
>> >
>> >  static const char *page_type_name(unsigned int page_type)
>> > diff --git a/mm/swap.c b/mm/swap.c
>> > index 47bc1bb919cc..241880a46358 100644
>> > --- a/mm/swap.c
>> > +++ b/mm/swap.c
>> > @@ -38,6 +38,10 @@
>> >  #include
>> >  #include
>> >
>> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> > +#include
>> > +#endif
>> > +
>> >  #include "internal.h"
>> >
>> >  #define CREATE_TRACE_POINTS
>> > @@ -101,6 +105,11 @@ static void free_typed_folio(struct folio *folio)
>> >  	case PGTY_hugetlb:
>> >  		free_huge_folio(folio);
>> >  		return;
>> > +#endif
>> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>> > +	case PGTY_guestmem:
>> > +		kvm_gmem_handle_folio_put(folio);
>> > +		return;
>> >  #endif
>> >  	default:
>> >  		WARN_ON_ONCE(1);
>> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> > index 54e959e7d68f..37f7734cb10f 100644
>> > --- a/virt/kvm/Kconfig
>> > +++ b/virt/kvm/Kconfig
>> > @@ -124,3 +124,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
>> >  config HAVE_KVM_ARCH_GMEM_INVALIDATE
>> >         bool
>> >         depends on KVM_PRIVATE_MEM
>> > +
>> > +config KVM_GMEM_SHARED_MEM
>> > +       select KVM_PRIVATE_MEM
>> > +       depends on !KVM_GENERIC_MEMORY_ATTRIBUTES
>>
>> Enforcing that KVM_GENERIC_MEMORY_ATTRIBUTES is not selected should not
>> be a strict requirement. Fuad explained in an offline chat that this is
>> just temporary.
>>
>> If we have CONFIG_GUESTMEM, then this question is moot, I think
>> CONFIG_GUESTMEM would just be independent of everything else; other
>> configs would depend on CONFIG_GUESTMEM.
>
> There are two things here. First of all, the unfortunate naming
> situation where PRIVATE could mean GUESTMEM, or private could mean not
> shared. I plan to tackle this aspect (i.e., the naming) in a separate
> patch series, since that will surely generate a lot of debate :)
>

Oops. By "depend on CONFIG_GUESTMEM" I meant "depend on the introduction
of the guestmem shim". I think this is a good time to introduce the shim
because the folio_put() callback needs to be in mm and not just in KVM,
which is a loadable module and hence can be removed from memory.

If we do introduce the shim, the config flag CONFIG_KVM_GMEM_SHARED_MEM
will be replaced by CONFIG_GUESTMEM (or some other name), and then the
question of depending on !KVM_GENERIC_MEMORY_ATTRIBUTES will be moot,
since I think an mm config flag wouldn't place a constraint on a module
config flag?

When I wrote this, I thought that config flags aren't easily renamed
since they're an interface and are user-facing, but I realized that
config flags do seem to get renamed fairly easily, based on this
search [1].

If we're going with renaming in a separate patch series, some mechanism
should be introduced here to handle the case where

1. Kernel (and KVM module) is compiled with KVM_GMEM_SHARED_MEM set
2. KVM is unloaded
3. folio_put() tries to call kvm_gmem_handle_folio_put()

> The other part is that, with shared memory in-place, the memory
> attributes are an orthogonal matter. The attributes are the userspace's
> view of what it expects the state of the memory to be, and are used to
> multiplex whether the memory being accessed is guest_memfd or the
> regular (i.e., most likely anonymous) memory used normally by KVM.
>
> This behavior however would be architecture, or even vm-type specific.
>

I agree it is orthogonal, but I'm calling this out because "depends on
!KVM_GENERIC_MEMORY_ATTRIBUTES" means that if I set
CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES, I can't use PGTY_guestmem, since
CONFIG_KVM_GMEM_SHARED_MEM would get unset.

I was trying to test this with a KVM_X86_SW_PROTECTED_VM, setting up to
use the ioctl to convert memory, and hit this issue.

> Cheers,
> /fuad
>
>> > +       bool

[1] https://lore.kernel.org/all/?q=s%3Arename+dfn%3AKconfig+merged
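
To make the unload case (kernel built with the flag set, KVM unloaded,
then a final folio_put()) more concrete, here is a minimal user-space
sketch in C. It is not kernel code and not what this series implements.
All names in it (struct guestmem_ops, guestmem_register_ops(),
guestmem_unregister_ops(), guestmem_handle_folio_put(), the stand-in
struct folio) are hypothetical and made up for illustration. The only
idea shown is that the put handler lives in always-resident code and
dispatches through a pointer that the module sets on load and clears on
unload. A real shim would also need proper synchronization against
unload (for example RCU or a module reference), which this sketch
ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct folio. */
struct folio {
	int id;
};

/* Hypothetical ops table that the module side would register. */
struct guestmem_ops {
	void (*handle_folio_put)(struct folio *folio);
};

/* State owned by the always-resident side (the assumed shim). */
static const struct guestmem_ops *registered_ops;
static int guestmem_orphaned_puts;

static void guestmem_register_ops(const struct guestmem_ops *ops)
{
	registered_ops = ops;
}

static void guestmem_unregister_ops(void)
{
	registered_ops = NULL;
}

/* Called on the final put of a guestmem folio; never jumps into unloaded code. */
static void guestmem_handle_folio_put(struct folio *folio)
{
	if (registered_ops && registered_ops->handle_folio_put) {
		registered_ops->handle_folio_put(folio);
		return;
	}
	/* No module registered: account for the folio locally instead. */
	guestmem_orphaned_puts++;
}

/* "Module" side: the handler that goes away when the module is unloaded. */
static int module_puts_seen;

static void module_folio_put(struct folio *folio)
{
	(void)folio;
	module_puts_seen++;
}

static const struct guestmem_ops module_ops = {
	.handle_folio_put = module_folio_put,
};

/* Walks the three steps: module loaded, module unloaded, late final put. */
int guestmem_demo(void)
{
	struct folio f = { .id = 1 };

	guestmem_register_ops(&module_ops);	/* module loaded */
	guestmem_handle_folio_put(&f);		/* routed to the module */
	assert(module_puts_seen == 1);

	guestmem_unregister_ops();		/* module unloaded */
	guestmem_handle_folio_put(&f);		/* late put stays in the shim */
	assert(guestmem_orphaned_puts == 1);
	assert(module_puts_seen == 1);

	return 0;
}
```

Calling guestmem_demo() exercises both paths: the first put reaches the
registered handler, and the put after unregistering is absorbed by the
resident code rather than calling through a stale pointer.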