From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F379C282EC for ; Mon, 17 Mar 2025 16:27:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EE650280002; Mon, 17 Mar 2025 12:27:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E9778280001; Mon, 17 Mar 2025 12:27:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D68A6280002; Mon, 17 Mar 2025 12:27:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id B7BEA280001 for ; Mon, 17 Mar 2025 12:27:55 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 3C71A80300 for ; Mon, 17 Mar 2025 16:27:57 +0000 (UTC) X-FDA: 83231574594.29.DFEBB29 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) by imf21.hostedemail.com (Postfix) with ESMTP id 75BAB1C000F for ; Mon, 17 Mar 2025 16:27:55 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="Fl4evPv/"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of 3ik3YZwsKCMsrt1v82vFA4xx55x2v.t532z4BE-331Crt1.58x@flex--ackerleytng.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=3ik3YZwsKCMsrt1v82vFA4xx55x2v.t532z4BE-331Crt1.58x@flex--ackerleytng.bounces.google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1742228875; a=rsa-sha256; cv=none; b=mLKKRrR9NmaUTrMKPAcdgSRLsHpVocBzfpAfpepVekds1lkzKZXmyy5r2ghxgqJ3bno/8T jA826k6df1jJ1NrPXYC2jHUHh/XIbqstMBa8iIkwxfNUFzsdu+VS2XPu31Y1frjQxXhaef 1z9A45fs94ChGHvcLRtperZRnddqyQM= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="Fl4evPv/"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of 3ik3YZwsKCMsrt1v82vFA4xx55x2v.t532z4BE-331Crt1.58x@flex--ackerleytng.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=3ik3YZwsKCMsrt1v82vFA4xx55x2v.t532z4BE-331Crt1.58x@flex--ackerleytng.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1742228875; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=W8d3f9bwZb24F92RREK1YCHX2o38Tp6QBPGEvO0sKaU=; b=8TMU80hByrWTCTtDsC4lHhmWy4PHERTWZi220XKE92A1O3NOG8kQ44cRYYWv7IimuXjfAi 0UGiRcO3lkzX5RQ6UZgl/4Tb8U46r4Bb4wXICdtfOHVoF5rJktUa+83FFcS0mmeFAWsg5f EQPvz49vTI+hhg2eOaANAzNoY9JqDHw= Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-3011cfa31f6so3515148a91.3 for ; Mon, 17 Mar 2025 09:27:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1742228874; x=1742833674; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=W8d3f9bwZb24F92RREK1YCHX2o38Tp6QBPGEvO0sKaU=; b=Fl4evPv/VStGvrCCpiYf2/xHBXLM6Mzhw3TvSuacJ35vCOAFZeIjTZmMN4LS2l8sSU /9NVS3FlNZulTYqNgfLhjxFkCoNliabYo5tXaMy1OEwacmNLnNOTOFSCUcASS/PsCBKt ZOQpWjx+xf1awMk6I8j6V32pcB4laz39KbzSLCaAzLJ6zraG8in4FwLB5QdKdmSBWE58 hdbZHdH/cg3yV6AA4FyQJqGhI2OeVcRMqhvn3+Vwn+C+uox9PfbEVKWTY7LI0OtjQh3i J6fmvEs3SfBFsJDANmxbzuArU0rFNvRr3ZZtey4/APier4L8KyBRB/Wi+tg5ypAnAdgg xHUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742228874; x=1742833674; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W8d3f9bwZb24F92RREK1YCHX2o38Tp6QBPGEvO0sKaU=; b=pXLHIifoD6iEpj2qD7wtiIpGdGdBn3UIote9y8/hBn8go2+sSc2a499a8i2jvksyCB 6Z6+ORzBu2gZoDhoAWJlWbcxrXWt0nQYU0GzV7M4QWqO2e8dGrxz2WEEji3l8GDriBzp sxgrnRPa9xvjeg9SaoRxCHNpy7NNWjHLc02H2xc/fo+xXD6E2thX62qESdnwb1SkVmHT tDDpWLCO3OEOlP5UxJ3uSoWF3huxB0DEKxi8cAmp1sjj8jHA6kyLhkdGUPfZfM0dMuId M94V5TReS08oJ2BSy+yltThkQUAFF8bXS9p1YOJPjaAam+6Hc6z3dxSl8irMIXq06FnO PKpQ== X-Forwarded-Encrypted: i=1; AJvYcCVNcBIVfz8IKOML/AU1pC2Ovp4Jvpq0eONSHoclxlttsXyHNCCQBCH3y5v6tq1SBgNL3MRlTuYKCQ==@kvack.org X-Gm-Message-State: AOJu0YyotVOz/aWMs9iBpKQaglBMXQX62r7v7ldcSTVHSC/a/P6rHZBS S9bvIudQwvM7Be/c5kIw9TKQD15aNUUYc6UIc1AkRkVgUMeN4PWZakZEmuljfK5WzyA3y9QucRv wVYBmJ2HfhlvHK67FrvSGoA== X-Google-Smtp-Source: AGHT+IG4WirUOV3+r5B2lb31NOl3SYbtGGIlb7cjVN18KcQU2Z2EqVpQ/NTkkFp10CSOZgMn9KFkZaVD4S3aseWUuQ== X-Received: from pjn6.prod.google.com ([2002:a17:90b:5706:b0:2f9:e05f:187f]) (user=ackerleytng job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2e44:b0:2ff:502e:62d4 with SMTP id 98e67ed59e1d1-30151d5bcdamr14234779a91.32.1742228874055; Mon, 17 Mar 2025 09:27:54 -0700 (PDT) Date: Mon, 17 Mar 2025 16:27:52 +0000 In-Reply-To: Mime-Version: 1.0 References: Message-ID: Subject: Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module From: Ackerley Tng To: Vlastimil Babka , Fuad Tabba Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, isaku.yamahata@intel.com, mic@digikod.net, vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, peterx@redhat.com Content-Type: text/plain; charset="UTF-8" X-Rspam-User: X-Rspamd-Queue-Id: 75BAB1C000F X-Stat-Signature: rm6i61wobfpp675xg43eti5apuaohg8g X-Rspamd-Server: rspam06 X-HE-Tag: 1742228875-405854 X-HE-Meta: U2FsdGVkX19768SlLp/Cgpb67qUCXppCKIw5U3UonIbd59G1RBTKYNGHTSuoI6jFj0pxcIWOC2nJDf7VII7kcFMsfBuz0ee6lqeDOXa0b8vhZcBV5Wo6fwjc1t/+CRm3DcAqAfwAI2UVNLz9TN16dwcM3zanVXfssqL4Rqe7KxFYcHTQXTOxf9fhk4WyzFWr1aj6PYu4VL0S7NHluOw2nkP4wCTtk2LvzCyhBZv5i79apyJ/HqlgvcmOsgqHNciSgpWt0mALqbGvkjQVJf7aJYsLc9eL/XC0y5puYZVsMkCukD8n9rlaYEaM3bxESlsLAmHxkRnPG+kjSCvgoP6WFum00i8LqfLgRvrdC52BOiLfJOBblQpio5LHf6rrl+BOc8UcD+ZVXWshKBs1hlkx2uyM1FP3qqL7z2NwmcfmODsU/OhBt+yBbZzXupJ8TzAmpgquvmM+pPAFKuU7ELFMB5XLoMaOWUG4XXLlfu7w7cMzjnjTVESCdLLX4WImFEc7MQ1fXAlP1NgZtlUd8JQ4a3ReNXvTjWJYVdmUU6x8Egj0nboJEqGM+Yei88ZsmJnGhqhsvPPqjRqyc2VdUjvAQ4DiQ2l8ynu4T0S8YmfUHmqWCIu8HtPx7AVbkukMvD0PU3Lkrp6RLys7iZqC37tF8LqSkEscO1istbJGdqf+bimhu8jlrYRcOd5AQQv87PptnRZXxf/m6FWkwHEvcmvg+9e74qyXitw5ZE5xk+QrCo7Gf1DmxIRx9JaZi4/lwqf21p7Ep4N4gj60YaWPmEK2sKaXrxmEl0qA2EOIRyEF4NXeR/XqbNfu/MLlc01AP7XduXA6HaKeoTHh0KXV0fjYuQx8qYR8C0GoNFcjsEJs18lWLVIaK8gNOVHjQdJUzsjkidPsgayWOPEb4WGrXYA8LpC0KrV33I+LHAx20bwQPxIH9p/A6jjHqwsDcKbnZT4KwkKowuBNy8Gxct36dnj JjjJ3VWb C1yUeG6tmTivXzROntHEmDoqAxh5mnCFv+Qwzy5zglaBiPMnbhRw8nQN4HyD4RufmVmE2d+jmofXMjE2Sw21VhJp+4d2cqFeqSvgjZeOGdN9aEX/45NUjdA0Uden07bCJXvibe6WuwdKee2UPkJ64eV19+X5O9lc2TOczniKi91tuRG2im4oak+mKvsU4u/KOimQpCaimnr1qkSF9SV+BfpVxqktCRYo7HB7B93UhcPoF5/nVkl4yid54tQ19B++bX1cLSS29BtIXKqzuxiu4MOz/Ujx/TufUlPGwZ+aX0WcWNd5DBEPSMpiqDjfuMdtkYY+5k7AQbTC/pSblbOORKdTQL8I0nQj9FDqn2R5PiZjIckqFquORaKspQ6B0/jBzDSzR/jU0lfVgvrZdI1MDuvLy1HuMj3KigyeboC71t9utzVioJyY1sThJgqpPbHvJ+xvHGpXKWUF9y65K7MJiI4i84Us4vgJInNyp21MYeamSdVApThXWJ297TW1kK4WrGXdCo1Zll5kglSnECxIUOm4pWiVBieWcYx2hYg38utiLMi17u/yS5zUq+zc/9r56DkXvWVGcHADsROfUfBTeb/5p0R4OuKNqSEBoAm9KP7c2p97F0jUq2oBOvdUtCArLZnx1/ZOCjZVRXemKqBcGPPPBqJsscfGl20fdkg5BOfW5mMLitxUyQHKIiw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Vlastimil Babka writes: > On 3/13/25 14:49, Ackerley Tng wrote: >> Fuad Tabba writes: >> >>> In some architectures, KVM could be defined as a module. If there is a >>> pending folio_put() while KVM is unloaded, the system could crash. By >>> having a helper check for that and call the function only if it's >>> available, we are able to handle that case more gracefully. >>> >>> Signed-off-by: Fuad Tabba >>> >>> --- >>> >>> This patch could be squashed with the previous one of the maintainers >>> think it would be better. >>> --- >>> include/linux/kvm_host.h | 5 +---- >>> mm/swap.c | 20 +++++++++++++++++++- >>> virt/kvm/guest_memfd.c | 8 ++++++++ >>> 3 files changed, 28 insertions(+), 5 deletions(-) >>> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >>> index 7788e3625f6d..3ad0719bfc4f 100644 >>> --- a/include/linux/kvm_host.h >>> +++ b/include/linux/kvm_host.h >>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, >>> #endif >>> >>> #ifdef CONFIG_KVM_GMEM_SHARED_MEM >>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio) >>> -{ >>> - WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress."); >>> -} >>> +void kvm_gmem_handle_folio_put(struct folio *folio); >>> #endif >>> >>> #endif >>> diff --git a/mm/swap.c b/mm/swap.c >>> index 241880a46358..27dfd75536c8 100644 >>> --- a/mm/swap.c >>> +++ b/mm/swap.c >>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio) >>> unlock_page_lruvec_irqrestore(lruvec, flags); >>> } >>> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM >>> +static void gmem_folio_put(struct folio *folio) >>> +{ >>> +#if IS_MODULE(CONFIG_KVM) >>> + void (*fn)(struct folio *folio); >>> + >>> + fn = symbol_get(kvm_gmem_handle_folio_put); >>> + if (WARN_ON_ONCE(!fn)) >>> + return; >>> + >>> + fn(folio); >>> + symbol_put(kvm_gmem_handle_folio_put); >>> +#else >>> + kvm_gmem_handle_folio_put(folio); >>> +#endif >>> +} >>> +#endif > > Yeah, this is not great. The vfio code isn't setting a good example to follow :( > >> Sorry about the premature sending earlier! >> >> I was thinking about having a static function pointer in mm/swap.c that >> will be filled in when KVM is loaded and cleared when KVM is unloaded. >> >> One benefit I see is that it'll avoid the lookup that symbol_get() does >> on every folio_put(), but some other pinning on KVM would have to be >> done to prevent KVM from being unloaded in the middle of >> kvm_gmem_handle_folio_put() call. > > Isn't there some "natural" dependency between things such that at the point > the KVM module is able to unload itself, no guest_memfd areas should be > existing anymore at that point, and thus also not any pages that would use > this callback should exist? In that case it would mean there's a memory leak > if that happens so while we might be trying to avoid calling a function that > was unleaded, we don't need to try has hard as symbol_get()/put() on every > invocation, but a racy check would be good enough? > Or would such a late folio_put() be legitimate to happen because some > short-lived folio_get() from e.g. a pfn scanner could prolong the page's > lifetime beyond the KVM module? I'd hope that since you want to make pages > PGTY_guestmem only in certain points of their lifetime, then maybe this > should not be possible to happen? > IIUC the last refcount on a guest_memfd folio may not be held by guest_memfd if the folios is already truncated from guest_memfd. The inode could already be closed. If the inode is closed then the KVM is free to be unloaded. This means that someone could hold on to the last refcount, unload KVM, and then drop the last refcount and have the folio_put() call a non-existent callback. If we first check that folio->mapping != NULL and then do kvm_gmem_handle_folio_put(), then I think what you suggested would work, since folio->mapping is only NULL when the folio has been disassociated from the inode. gmem_folio_put() should probably end with if (folio_ref_count(folio) == 0) __folio_put(folio) so that if kvm_gmem_handle_folio_put() is done with whatever it needs to (e.g. complete the conversion) gmem_folio_put() will free the folio. >> Do you/anyone else see pros/cons either way? >> >>> + >>> static void free_typed_folio(struct folio *folio) >>> { >>> switch (folio_get_type(folio)) { >>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio) >>> #endif >>> #ifdef CONFIG_KVM_GMEM_SHARED_MEM >>> case PGTY_guestmem: >>> - kvm_gmem_handle_folio_put(folio); >>> + gmem_folio_put(folio); >>> return; >>> #endif >>> default: >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c >>> index b2aa6bf24d3a..5fc414becae5 100644 >>> --- a/virt/kvm/guest_memfd.c >>> +++ b/virt/kvm/guest_memfd.c >>> @@ -13,6 +13,14 @@ struct kvm_gmem { >>> struct list_head entry; >>> }; >>> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM >>> +void kvm_gmem_handle_folio_put(struct folio *folio) >>> +{ >>> + WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress."); >>> +} >>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put); >>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */ >>> + >>> /** >>> * folio_file_pfn - like folio_file_page, but return a pfn. >>> * @folio: The folio which contains this index.