From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE265C3600C for ; Thu, 3 Apr 2025 08:58:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 44C58280003; Thu, 3 Apr 2025 04:58:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3D1EE280001; Thu, 3 Apr 2025 04:58:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 24B71280003; Thu, 3 Apr 2025 04:58:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 02A43280001 for ; Thu, 3 Apr 2025 04:58:50 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 93328A95D9 for ; Thu, 3 Apr 2025 08:58:51 +0000 (UTC) X-FDA: 83292132462.06.632E9DB Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) by imf04.hostedemail.com (Postfix) with ESMTP id B55A64000D for ; Thu, 3 Apr 2025 08:58:49 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Fdrb6lrB; spf=pass (imf04.hostedemail.com: domain of tabba@google.com designates 209.85.160.176 as permitted sender) smtp.mailfrom=tabba@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743670729; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fgEvXXGP1m3zmYeTcI4rgOS5YVwaXYUmFGOy6VOVNWo=; b=43RmmmiHuRqkdUZ1tOX+vrJJWjnS2BEuGdcQVgUSEO1gvlQTqHleBJg8CFflvGxIRhTknv 6lUbxclyBd9gI2K5TytoO70W5FON91w58FrZUL6jKBakuq+qHClgirup45VaSUGAydh8F2 A75gif6DLlI4bKzo9341e3MpdjAOkoc= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Fdrb6lrB; spf=pass (imf04.hostedemail.com: domain of tabba@google.com designates 209.85.160.176 as permitted sender) smtp.mailfrom=tabba@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743670729; a=rsa-sha256; cv=none; b=yVOYaYe8im/7aJErWz+LvKhpz9HJMDwEcsRFdp3zWCeqaiklcRGomgHscjhydBPpA9Kmgo sjRa7on08ecaiQqKQYdxP8eBKBsIrpZF6fJbceF9VwDTfYTcDYCuxyZujFw9wi4obh7dq5 u0Ilxd8IA4/ne0YdWbcbR7Sa0HQ5VBI= Received: by mail-qt1-f176.google.com with SMTP id d75a77b69052e-47666573242so853631cf.0 for ; Thu, 03 Apr 2025 01:58:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1743670729; x=1744275529; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=fgEvXXGP1m3zmYeTcI4rgOS5YVwaXYUmFGOy6VOVNWo=; b=Fdrb6lrBysConQatT97wHcwT6rhcKhqX1dVv9dnJ+U8b6bb0/92TWrIpgn0skVUdHo 0Qs2tiLGBkhfTHBufY/ew5v9p8SzuHVYoszKziZRG4bpQofycbBwc6NeIV/IRsaun5u/ LGrM+eLsZDod3HAK7Suo50Fz04YqsigjFHdCfqhCtpgmN1lLC58YZy73AgB5W4y28+Ym gv/XprhLIrn71XMrS79VbchLgr3URGN8NX3gdT0jKv158xIBjjyubpjwVlFqU0cEG4l1 VNf/1D8yT0G/aPjlfJch7AbIY9uD2EAiBA4xEwScRL7qIqrgtbG0OLHFUUmRpjm3hD7m tncQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743670729; x=1744275529; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fgEvXXGP1m3zmYeTcI4rgOS5YVwaXYUmFGOy6VOVNWo=; b=gHuMsmidvECmtu0DSDk8CQZWD1Bt2HKsOcRQoU2hDGC1kPDD6f0S9HqrsGuLmDqYLE +QYGBGnQ2i43iYn7OczdOiDAZ9qmqwK377tw+eN0/tg+IompH6zVCAHUbj2iguS9eWvS UKbHzg1R3YCiOMcLp/lHf84zBWo63Fiooet+9/nJf4kFFXnUy3aC6/cjgnaO9FPESGcE bjbTIX7RZkyg6Vy08u8d78vAoS1I5EjcbEYxAdlF4Qse3jlB+4jvkO2tBJt4dm/PLzTi vGEhOVapbLVWq34FBEM6lgjt3ybu3opUPqFMbvf7zY2ZcCdo2uKRjCaYXfZhWFMvDCPg V1Xw== X-Forwarded-Encrypted: i=1; AJvYcCXtC+3b2TTQ08DumCP+lM3VRSbLy3hpP0c9O2VeiXQEgKSVXmIaYG2QMZ2ErV2QEpckeM5rupRrBg==@kvack.org X-Gm-Message-State: AOJu0YwZmkyqAwvQg8kx3G52mCBcIL7dUMed9xWYpQa1Ks3S/lcLhh/x cWJlNPoo5xHDZD0xfQ+XFkW7UNGmMcMD2F/1jfkaCmVI0JgwVY5/qtThqiFN9H4mn3ff29gSaSq qjRds8IRSS4OtrhJ/6NJIGWeCcr1VvrYvIwLn X-Gm-Gg: ASbGnctyR2v0rdtz32dSCqQ3gp6h9MSyvE56GCS0AfDOs5rZEJ+iDw0GJBbiNQk/Yi8 AEhPMPPCeu8pvheUnnkxrdkCYJsu/4+woZ0YxuAeb09A7ataOCsKwTzVEnEgi7SLOfWuK1g0dXQ /aX2VxyGsCaLN0bKJMOqkiz67Utw== X-Google-Smtp-Source: AGHT+IH4z1ZoSa8PwetB2kA35wKdfJ5q0G85rEwN1y8WRnFjZbzsNpHMZmrHEnpAA2ytzXMV3Aqw3jt3qenjGkw42No= X-Received: by 2002:ac8:754f:0:b0:479:1985:6687 with SMTP id d75a77b69052e-47919856707mr2761801cf.10.1743670728422; Thu, 03 Apr 2025 01:58:48 -0700 (PDT) MIME-Version: 1.0 References: <20250328153133.3504118-4-tabba@google.com> In-Reply-To: From: Fuad Tabba Date: Thu, 3 Apr 2025 09:58:11 +0100 X-Gm-Features: ATxdqUEvRL22zZX7SKy0CqhFyY24csNGzfb54MwVhadb3Rb98o--x59svE21pzI Message-ID: Subject: Re: [PATCH v7 3/7] KVM: guest_memfd: Track folio sharing within a struct kvm_gmem_private To: Sean Christopherson Cc: Ackerley Tng , kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, peterx@redhat.com, pankaj.gupta@amd.com Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: B55A64000D X-Rspamd-Server: rspam05 X-Rspam-User: X-Stat-Signature: oq6zeh3rwoo3qndtf4wz73hwos8pwsdu X-HE-Tag: 1743670729-810410 X-HE-Meta: U2FsdGVkX19sR+2cNJK1p/RV6P8yWvGSPhuuXGF+W5Mc9xiFHXLzlQNjUMCP89EL1nYb/oSTW7dJGjwnOS4Kl7ln91XP8GGUIZchewQo5S8DbOL7+GKPb6Wve9IWavQfrKlrvYPNhpBx0A6S8T/x/Bg9lpWbfN3/m2oV9t0obs40xqYsJ074IWNtKrEeCdUewva/IVxF99w/5cN6jml4rD3k1O6QRbTiPM+nPZJDSzoJYw5yAxyKpS6ZPvWIVoWF6coDk/3dU5jfRvIz/ugJvB1SZEW2je5d3ROvQ1Fzem/JL1OFoeFZ/DxLKvulSRiwp/3eCcsfROJr8dVwCqiUiocfXqdYfIUjdgib4Iu6Q3rTIfRu6dmUmEM5J+nDLzfY7xIxHfvuw1WlL26mAZY4nL832Mf5uZzl2RhPyd/fv2DgiOBoKtcsk/4oDXj8uh4aQuREnyCqvjmMjgsDulwuwDKAVDos4BIE7vO7Zo5ClgoUFBfPdWfxKg5EVulI/uKHe1WUFei+nvjzZUlSN2GG2IEy+As9tdZYUF7Cr7P6ATxhvISgHVdoiX4hLd9nLdSN2a7EOn83HjUO8WL+xEDXYVaivqXLKYKk6YorDi13VBDFq5zBdigqkoBF+5Fp419ds0G/x+CFrMpCPQDCef3bBVJTCOh2AYSmkVJmuxw81tm9fOfw6cNea8YC9vaPHoS1U8IKTS7UaJGSuQMd0V6LnvYCsALGyiYcgKMiw/X1LRaAaumTJpq1TXTxpx3OYtrOn0dEfP0dVWvLzbGW8ZIxJFJxkE6oAfhyLAXkodzQ8Pt2bou7346pfcBnIN98Jk3HqtYmPAb7vG4CX0J0ya/RvtCdoBC+3yBPENjpYEPssJa4jjKYJBtW2kV1l1KBhfffTIOaiuAEZX7c9XGRyst1j/nkTXtqfu8tcAT4ko0VypWV0pzSJqKt/H3c/PxrFJMp05lqDodEokwnMYP5Zo2 0Cfju28O yg8LaMpPJxMr8leLPzpeT7fxw2KKFUa4Wu75LPgZudcBtgr70A60QI6Rb/oToNVJxWYn+aucNOIqYVyy2Aw+Ou7jSnUum1xICWIUryLa0jVQTmhajtyKrtTFPJrrbWl2SPrMD1eA/LGuRqAGe2ZF9fnAlNUoyZNAsxL4Okt9wfIPteGhSvYNfuOwZvZY1eBlq+/ax/h9bGkBj7Ikizig8ivLIRZ0LVD6oZNWfLdByspNoRVUf/rMSl6XKtC2Cl+nVJxyuK7XDjrfdECfV+tGADlsIl8eX7KSmK+HxRzrMuwvh7jbLiEGHtnXBW5bBBSBk4UOswvJM1BJAxDC4eF4VaGkZ/Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Ackerley and Sean, On Thu, 3 Apr 2025 at 00:56, Sean Christopherson wrote: > > On Wed, Apr 02, 2025, Ackerley Tng wrote: > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > > index ac6b8853699d..cde16ed3b230 100644 > > > --- a/virt/kvm/guest_memfd.c > > > +++ b/virt/kvm/guest_memfd.c > > > @@ -17,6 +17,18 @@ struct kvm_gmem { > > > struct list_head entry; > > > }; > > > > > > +struct kvm_gmem_inode_private { > > > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM > > > + struct xarray shared_offsets; > > > + rwlock_t offsets_lock; > > > > This lock doesn't work, either that or this lock can't be held while > > faulting, because holding this lock means we can't sleep, and we need to > > sleep to allocate. > > rwlock_t is a variant of a spinlock, which can't be held when sleeping. > > What exactly does offsets_lock protect, and what are the rules for holding it? > At a glance, it's flawed. Something needs to prevent KVM from installing a mapping > for a private gfn that is being converted to shared. KVM doesn't hold references > to PFNs while they're mapped into the guest, and kvm_gmem_get_pfn() doesn't check > shared_offsets let alone take offsets_lock. You're right about the rwlock_t. The goal of the offsets_lock is to protect the shared offsets -- i.e., it's just meant to protect the SHARED/PRIVATE status of a folio, not more, hence why it's not checked in kvm_gmem_get_pfn(). It used to be protected by the filemap_invalidate_lock, but the problem is that it would be called from an interrupt context. However, this is wrong, as you've pointed out. The purpose of locking is to ensure that no two conversions of the same folio happen at the same time. An alternative I had written up is to rely on having exclusive access to the folio to ensure that, since this is tied to the folio. That could be either by acquiring the folio lock, or ensuring that the folio doesn't have any outstanding references, indicating that we have exclusive access to it. This would avoid the whole locking issue. > ... Something needs to prevent KVM from installing a mapping > for a private gfn that is being converted to shared. ... > guest_memfd currently handles races between kvm_gmem_fault() and PUNCH_HOLE via > kvm_gmem_invalidate_{begin,end}(). I don't see any equivalent functionality in > the shared/private conversion code. For in-place sharing, KVM can install a mapping for a SHARED gfn. What it cannot do is install a mapping for a transient (i.e., NONE) gfn. We don't rely on kvm_gmem_get_pfn() for that, but on the individual KVM mmu fault handlers, but that said... > In general, this series is unreviewable as many of the APIs have no callers, > which makes it impossible to review the safety/correctness of locking. Ditto > for all the stubs that are guarded by CONFIG_KVM_GMEM_SHARED_MEM. > > Lack of uAPI also makes this series unreviewable. The tracking is done on the > guest_memfd inode, but it looks like the uAPI to toggle private/shared is going > to be attached to the VM. Why? That's just adds extra locks and complexity to > ensure the memslots are stable. You're right. Back to your point from above, it might make more sense to have that logic in kvm_gmem_get_pfn() instead. I'll work something out. > Lastly, this series is at v7, but to be very blunt, it looks RFC quality to my > eyes. On top of the fact that it's practically impossible to review, it doesn't > even compile on x86 when CONFIG_KVM=m. > > mm/swap.c:110:(.text+0x18ba): undefined reference to `kvm_gmem_handle_folio_put' > ERROR: modpost: "security_inode_init_security_anon" [arch/x86/kvm/kvm.ko] undefined! > > The latter can be solved with an EXPORT, but calling module code from core mm/ > is a complete non-starter. > > Maybe much of this has discussed been discussed elsewhere and there's a grand > plan for how all of this will shake out. I haven't been closely following > guest_memfd development due to lack of cycles, and unfortunately I don't expect > that to change in the near future. I am more than happy to let y'all do the heavy > lifting, but when you submit something for inclusion (i.e. not RFC), then it needs > to actually be ready for inclusion. > > I would much, much prefer one large series that shows the full picture than a > mish mash of partial series that I can't actually review, even if the big series > is 100+ patches (hopefully not). Dropping the RFC from the second series was not intentional, the first series is the one where I intended to drop the RFC. I apologize for that. Especially since I obviously don't know how to handle modules and wanted some input on how to do that :) Splitting this series into two was my attempt at making it easier to review, but I see it ended up making things more complicated. The next one will be just one series. Cheers, /fuad