From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DADA5C282EC for ; Mon, 17 Mar 2025 16:51:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7A83D280002; Mon, 17 Mar 2025 12:51:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 77DF1280001; Mon, 17 Mar 2025 12:51:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 64603280002; Mon, 17 Mar 2025 12:51:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 46985280001 for ; Mon, 17 Mar 2025 12:51:03 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 32545C0366 for ; Mon, 17 Mar 2025 16:51:03 +0000 (UTC) X-FDA: 83231632806.22.2EBBBF5 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) by imf29.hostedemail.com (Postfix) with ESMTP id 17977120004 for ; Mon, 17 Mar 2025 16:51:00 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=kdHbPO+6; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf29.hostedemail.com: domain of tabba@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=tabba@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1742230261; a=rsa-sha256; cv=none; b=CFZB/hIFszvVR4+AuoDUiTVbAM/Sq3uvTqeFYn/7cpGV0xO50e575wkL2o3EjNQTG/88Ad hZtc8lDcvqahdzq5Vxdeh5Ix8+27uRQCybnFELj6DPw261qFQFwpcdnM6pqkgjCW6IGBwR SAtLWojbOXpwefxtGI86yEdMj9R/7Qo= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=kdHbPO+6; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf29.hostedemail.com: domain of tabba@google.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=tabba@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1742230261; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BMJCbVZNzWKyrw0PFDa/hXbadO5iF//m5/R/8CU725g=; b=p0CCDsDFRg42kd6M0SHHiLPNqDIsVZyK5u+x7Ox25ojZpXKqTn/Lp1IgGRayS8nrWOkEac EVkA/mzCjNRQZogsHQvOkZlxrcnYn8Cgv3iV58zw9kZLsayQn6hBOqLosKQ+/r1bqmVm54 Dhq0Xgmu/b9Ex5ndC0T2qqxxA9mF7DU= Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-476693c2cc2so82151cf.0 for ; Mon, 17 Mar 2025 09:51:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1742230260; x=1742835060; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=BMJCbVZNzWKyrw0PFDa/hXbadO5iF//m5/R/8CU725g=; b=kdHbPO+6gsxhQ2A9PrHP01SM85yhhuB2d1b57Yrgv+Ux5JwyEYwegLNS/eSCb5ocbr 9K/gLKU+NVGDnvLpTyDBQ2WAZ81/7IJEWxq3BAfnzf1a7U7oC4aAGamCnnqSuwkx1UVh XORqebWuMF58LDZ/lXMWzyhUMU/beoiYhkcnoTQsmL/26q548p3EXUPou0TjtF4Iw7m9 QBEjVjZtHLx8reiFtdSnXhJPgO0FyNwlD3aaXr+ysPi+Fj51v18X0OYxsmEmgr8JS1bp CZamTzAlALialSmjMzENUVcaxEvHZCVS6MzekpB7rAtWKTSpmRnXjNHUxmNrEgR+4b65 k/sQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742230260; x=1742835060; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=BMJCbVZNzWKyrw0PFDa/hXbadO5iF//m5/R/8CU725g=; b=cyBnjcJ5z8/VDE8RArkpm24zZfm+NJrSuEwpq8/oIhfHuk+IOJ5HS9Rn92SqUdLgUC y3J7uXH0xzka3LVNDTU5WHopENoDgJKVeAxeNHDO7u8i6Hq4/M85sjd3lVKqvsUjI9Z7 1MmhLjlY3X3b5dfFVC2fS59ezuYbHWxcvU441K4u4bqW5GBslEq79vXrzNXMQ6nsZoI2 U1D5kkumYBl4KOu5ujx+ljnLPBdAgzmTaQgfJCacXjMxifKfWc7WRF89o2om2tiMws9/ zzmdtJPCtEwmH1NIK3VFsyFCPzZAo9gcvLlPvvXrFABvj4uuOBb3SRfoWDiwNm6U6sff xHzw== X-Forwarded-Encrypted: i=1; AJvYcCUd673RINqdy/smSR31F6ZD8dumGysOCk39xhPqzije4Gt3n9cajDLfHzX4EIIFm0OJyHLFvGlGoA==@kvack.org X-Gm-Message-State: AOJu0YwB3LiBZJ0z0TFVw63oRBVoCx+LZuEAp7scUaIthMIi0UZEZJOO joM/J63z8zQHI9TBXZ6s9C8akZc40/TpSFcXBVJwwbHOekiSmSX4r/2qxYEZ1I535amZAUCrcB6 LBWEJYil1gZOMH8Sad39YQ3lMnNgLBSHFLN8h X-Gm-Gg: ASbGncv1RdRbOUNKmM+38vG5B0lcRdtTuy1fLUPKyNVeuXrD2p0LetWOhWHdfgQYizZ q/sdA2MdY/D8FTPAeMp2+LDnET8xub367ADufwq+BsYvdFh7K10ewJckBVleyBNpfD85uSPtHmL PYY1PTddQtu95OOnh+y8xyM+Mj X-Google-Smtp-Source: AGHT+IF6Orr4sUHoJpUx90HW8r5dlBhwAPLedSUqI8aM7Wg0m66ieHSV0fIkeZ/KxvbrgcW2stl5fVd/vCyh98qDxow= X-Received: by 2002:a05:622a:2d1:b0:476:d668:fd1c with SMTP id d75a77b69052e-476d668fd5bmr5944231cf.2.1742230260012; Mon, 17 Mar 2025 09:51:00 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Fuad Tabba Date: Mon, 17 Mar 2025 16:50:22 +0000 X-Gm-Features: AQ5f1JphqvzmOeuUctWoYWwk76QOo4D6Zmd-e11cY6CJEXV8UR8W47qbNU6GbeI Message-ID: Subject: Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module To: Ackerley Tng Cc: Vlastimil Babka , kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, isaku.yamahata@intel.com, mic@digikod.net, vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, peterx@redhat.com Content-Type: text/plain; charset="UTF-8" X-Stat-Signature: znmgzj1s95mmfkkh4ydk8fgsqk9qwtyo X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 17977120004 X-Rspam-User: X-HE-Tag: 1742230260-177638 X-HE-Meta: U2FsdGVkX1+vmyn3ROcKeMvRx7p1XVS6Wr5MhFupRstkmAyhLjBApxgRGCLsOJ20VqcwGFP69B+jC4AlJlFe2RSj9AQGWI9iypx2nuwCYKwilPPg2PxEw6GXAWWO1T23HqtKsyV9PJdCRHVEIXgkNUbjuEvjXEJXIMKPvzEou3UXEG4KD2BDu5LyS8Cm+ZCM8KhrwlvrIF5Jye6eP/BSzIdAAVhzJi6nd4mKgzEpMcpJPNJ2Y+Irm5mUF4tMFwIfy1+wFhgHySAE9VMKne8GYR5hVt/1XPMjzfa9cKRxq7E0XJo58wgYASN36Gv7MsfC6vwXb1E9jRsh2+q5fOBD5mSzj4QcA7z0WZBBcKNSrneOIKw/yR3TpWhCgyRGUC4k/lfWG6PkTherZ5vSTccOVhP2jZ6bp1CGKAEKuZMZaLDNfjoqieTCoaoljk9Rplsg0SU3IE/JCfFuew3dI9zaYFV7puf1iPhifvwoBh9np6QGLAb03mCPtqvztkGD7BiKpPmCLPyUHn2eY1aLzJnqkfOciAIJ91PeRJFnbzykdke+v2vI86msG9JowwcwnzzaBZBfBtyRyEr63mSmr0aRC9DeEOjjoOUk8m6v2lsYt7lxSGqKsK0UX5NCgN1npAAF/4DSI/TIcGecYDy2QIP/AMPm4Ttm+RVWtjYEuC3dVfbGNJV0BH7+KcPMyZCmUIPO5jxtfcFfELs2tQLdEVwelyjo6Ck7D+vLO756Sb5OAQY+YH3LNA944b07yDZw37tEwJ948pm9nWNWfGvn7tJXOKYjtSfib+wntpd9PN2di0D85nUtGoiXgk66Cig2YdqfHR/0aunsBbO429Ro0F09Tm/NhJTBdd56FHQYA4LxYZKozPi2DsuwNlcPUi6V+ka/ILF6TAgOLe0Zre+TiI0cv3JjFsaZNZs2tqMQJoDKPMmokJCxbXuQq91FqutSsx/boeXCEy5vV6HQEGtsjXu 8Se+wipB C3w/WtPf6pZVTWhbzVgkf0p4PMqfRhsnGxRq7eZuL4AN/sT0tCGszFvfy4ER867qFK4ARFPGdW9nieTvfT4LHnlVjWzeGSuAHrb+Bs5VwwYchMcuoemgVv5KKk5ymDDyge1zdnYnxFng+eg9+VtekVTtAOXji95pXE+vpXlU97pz9uE1ttT455sunvYraW6Dn4ZdxhIaFBWlsaAX1KZUtdaaxUrQc4dhadP5SGF8O1OoRZMVtnTy1CDy3Rnw2BDECHiDwiU9L52r670mQUScRaHgOcy3N45Zu13ncRWlG9OXs4di1yglaq6qVY8/m4/hqAF1WxOy4Vdqkz3IWx1F1W969Qw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000005, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Ackerley, On Mon, 17 Mar 2025 at 16:27, Ackerley Tng wrote: > > Vlastimil Babka writes: > > > On 3/13/25 14:49, Ackerley Tng wrote: > >> Fuad Tabba writes: > >> > >>> In some architectures, KVM could be defined as a module. If there is a > >>> pending folio_put() while KVM is unloaded, the system could crash. By > >>> having a helper check for that and call the function only if it's > >>> available, we are able to handle that case more gracefully. > >>> > >>> Signed-off-by: Fuad Tabba > >>> > >>> --- > >>> > >>> This patch could be squashed with the previous one of the maintainers > >>> think it would be better. > >>> --- > >>> include/linux/kvm_host.h | 5 +---- > >>> mm/swap.c | 20 +++++++++++++++++++- > >>> virt/kvm/guest_memfd.c | 8 ++++++++ > >>> 3 files changed, 28 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > >>> index 7788e3625f6d..3ad0719bfc4f 100644 > >>> --- a/include/linux/kvm_host.h > >>> +++ b/include/linux/kvm_host.h > >>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, > >>> #endif > >>> > >>> #ifdef CONFIG_KVM_GMEM_SHARED_MEM > >>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio) > >>> -{ > >>> - WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress."); > >>> -} > >>> +void kvm_gmem_handle_folio_put(struct folio *folio); > >>> #endif > >>> > >>> #endif > >>> diff --git a/mm/swap.c b/mm/swap.c > >>> index 241880a46358..27dfd75536c8 100644 > >>> --- a/mm/swap.c > >>> +++ b/mm/swap.c > >>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio) > >>> unlock_page_lruvec_irqrestore(lruvec, flags); > >>> } > >>> > >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM > >>> +static void gmem_folio_put(struct folio *folio) > >>> +{ > >>> +#if IS_MODULE(CONFIG_KVM) > >>> + void (*fn)(struct folio *folio); > >>> + > >>> + fn = symbol_get(kvm_gmem_handle_folio_put); > >>> + if (WARN_ON_ONCE(!fn)) > >>> + return; > >>> + > >>> + fn(folio); > >>> + symbol_put(kvm_gmem_handle_folio_put); > >>> +#else > >>> + kvm_gmem_handle_folio_put(folio); > >>> +#endif > >>> +} > >>> +#endif > > > > Yeah, this is not great. The vfio code isn't setting a good example to follow :( > > > >> Sorry about the premature sending earlier! > >> > >> I was thinking about having a static function pointer in mm/swap.c that > >> will be filled in when KVM is loaded and cleared when KVM is unloaded. > >> > >> One benefit I see is that it'll avoid the lookup that symbol_get() does > >> on every folio_put(), but some other pinning on KVM would have to be > >> done to prevent KVM from being unloaded in the middle of > >> kvm_gmem_handle_folio_put() call. > > > > Isn't there some "natural" dependency between things such that at the point > > the KVM module is able to unload itself, no guest_memfd areas should be > > existing anymore at that point, and thus also not any pages that would use > > this callback should exist? In that case it would mean there's a memory leak > > if that happens so while we might be trying to avoid calling a function that > > was unleaded, we don't need to try has hard as symbol_get()/put() on every > > invocation, but a racy check would be good enough? > > Or would such a late folio_put() be legitimate to happen because some > > short-lived folio_get() from e.g. a pfn scanner could prolong the page's > > lifetime beyond the KVM module? I'd hope that since you want to make pages > > PGTY_guestmem only in certain points of their lifetime, then maybe this > > should not be possible to happen? > > > > IIUC the last refcount on a guest_memfd folio may not be held by > guest_memfd if the folios is already truncated from guest_memfd. The > inode could already be closed. If the inode is closed then the KVM is > free to be unloaded. > > This means that someone could hold on to the last refcount, unload KVM, > and then drop the last refcount and have the folio_put() call a > non-existent callback. > > If we first check that folio->mapping != NULL and then do > kvm_gmem_handle_folio_put(), then I think what you suggested would work, > since folio->mapping is only NULL when the folio has been disassociated > from the inode. > > gmem_folio_put() should probably end with > > if (folio_ref_count(folio) == 0) > __folio_put(folio) > > so that if kvm_gmem_handle_folio_put() is done with whatever it needs to > (e.g. complete the conversion) gmem_folio_put() will free the folio. Right, this is important. Into the next respin. Thanks, /fuad > >> Do you/anyone else see pros/cons either way? > >> > >>> + > >>> static void free_typed_folio(struct folio *folio) > >>> { > >>> switch (folio_get_type(folio)) { > >>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio) > >>> #endif > >>> #ifdef CONFIG_KVM_GMEM_SHARED_MEM > >>> case PGTY_guestmem: > >>> - kvm_gmem_handle_folio_put(folio); > >>> + gmem_folio_put(folio); > >>> return; > >>> #endif > >>> default: > >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > >>> index b2aa6bf24d3a..5fc414becae5 100644 > >>> --- a/virt/kvm/guest_memfd.c > >>> +++ b/virt/kvm/guest_memfd.c > >>> @@ -13,6 +13,14 @@ struct kvm_gmem { > >>> struct list_head entry; > >>> }; > >>> > >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM > >>> +void kvm_gmem_handle_folio_put(struct folio *folio) > >>> +{ > >>> + WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress."); > >>> +} > >>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put); > >>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */ > >>> + > >>> /** > >>> * folio_file_pfn - like folio_file_page, but return a pfn. > >>> * @folio: The folio which contains this index.