References: <20250328153133.3504118-4-tabba@google.com>
From: Fuad Tabba
Date: Fri, 4 Apr 2025 07:43:32 +0100
Subject: Re: [PATCH v7 3/7] KVM: guest_memfd: Track folio sharing within a struct kvm_gmem_private
To: Sean Christopherson
Cc: Ackerley Tng, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
    linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
    mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
    palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
    vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com,
    michael.roth@amd.com, wei.w.wang@intel.com,
    liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_eberman@quicinc.com,
    quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
    quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
    quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
    catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
    oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
    qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
    shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
    rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
    hughd@google.com, jthoughton@google.com, peterx@redhat.com,
    pankaj.gupta@amd.com

Hi Sean,

On Thu, 3 Apr 2025 at 15:51, Sean Christopherson wrote:
>
> On Thu, Apr 03, 2025, Fuad Tabba wrote:
> > On Thu, 3 Apr 2025 at 00:56, Sean Christopherson wrote:
> > > On Wed, Apr 02, 2025, Ackerley Tng wrote:
> > > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > > > index ac6b8853699d..cde16ed3b230 100644
> > > > > --- a/virt/kvm/guest_memfd.c
> > > > > +++ b/virt/kvm/guest_memfd.c
> > > > > @@ -17,6 +17,18 @@ struct kvm_gmem {
> > > > >  	struct list_head entry;
> > > > >  };
> > > > >
> > > > > +struct kvm_gmem_inode_private {
> > > > > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> > > > > +	struct xarray shared_offsets;
> > > > > +	rwlock_t offsets_lock;
> > > >
> > > > This lock doesn't work, either that or this lock can't be held while
> > > > faulting, because holding this lock means we can't sleep, and we need to
> > > > sleep to allocate.
> > >
> > > rwlock_t is a variant of a spinlock, which can't be held when sleeping.
> > >
> > > What exactly does offsets_lock protect, and what are the rules for holding it?
> > > At a glance, it's flawed. Something needs to prevent KVM from installing a mapping
> > > for a private gfn that is being converted to shared. KVM doesn't hold references
> > > to PFNs while they're mapped into the guest, and kvm_gmem_get_pfn() doesn't check
> > > shared_offsets let alone take offsets_lock.
> >
> > You're right about the rwlock_t.
> > The goal of the offsets_lock is to
> > protect the shared offsets -- i.e., it's just meant to protect the
> > SHARED/PRIVATE status of a folio, not more, hence why it's not checked
> > in kvm_gmem_get_pfn(). It used to be protected by the
> > filemap_invalidate_lock, but the problem is that it would be called
> > from an interrupt context.
> >
> > However, this is wrong, as you've pointed out. The purpose of locking
> > is to ensure that no two conversions of the same folio happen at the
> > same time. An alternative I had written up is to rely on having
> > exclusive access to the folio to ensure that, since this is tied to
> > the folio. That could be either by acquiring the folio lock, or
> > ensuring that the folio doesn't have any outstanding references,
> > indicating that we have exclusive access to it. This would avoid the
> > whole locking issue.
> >
> > > ... Something needs to prevent KVM from installing a mapping
> > > for a private gfn that is being converted to shared. ...
> >
> > > guest_memfd currently handles races between kvm_gmem_fault() and PUNCH_HOLE via
> > > kvm_gmem_invalidate_{begin,end}(). I don't see any equivalent functionality in
> > > the shared/private conversion code.
> >
> > For in-place sharing, KVM can install a mapping for a SHARED gfn. What
> > it cannot do is install a mapping for a transient (i.e., NONE) gfn. We
> > don't rely on kvm_gmem_get_pfn() for that, but on the individual KVM
> > mmu fault handlers, but that said...
>
> Consumption of shared/private physical pages _must_ be enforced by guest_memfd.
> The private vs. shared state in the MMU handlers is that VM's view of the world
> and desired state. The guest_memfd inode is the single source of truth for the
> state of the _physical_ page.
>
> E.g. on TDX, if KVM installs a private SPTE for a PFN that is in actuality shared,
> there will be machine checks and the host will likely crash.

I agree. As a plus, I've made that change and it actually simplifies the logic.
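
To illustrate the point just agreed on (the inode, not the MMU fault handler, decides whether a physical page may be consumed as shared or private), here is a minimal userspace sketch. It is not the code from this series: the names gmem_state, gmem_inode and gmem_get_pfn, the fixed-size arrays, and the chosen error codes are all assumptions made for the example, and it deliberately leaves out synchronization, which is the open question in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-offset state; the names are illustrative only. */
enum gmem_state {
	GMEM_PRIVATE,	/* page may only be mapped as private */
	GMEM_SHARED,	/* page may only be mapped as shared */
	GMEM_NONE,	/* transient: a conversion is in flight */
};

#define GMEM_NR_PAGES 8

/* Stand-in for the inode: it alone records the state of each page. */
struct gmem_inode {
	enum gmem_state state[GMEM_NR_PAGES];
	unsigned long pfn[GMEM_NR_PAGES];
};

/*
 * Hand out a PFN only if the state recorded in the inode matches the
 * kind of mapping the caller wants to install.
 *
 * Returns 0 on success, -EINVAL for an out-of-range index, -EAGAIN if
 * the page is in the transient state, and -EACCES on a mismatch.
 */
static int gmem_get_pfn(const struct gmem_inode *inode, size_t index,
			bool want_private, unsigned long *pfn)
{
	assert(inode != NULL && pfn != NULL);

	if (index >= GMEM_NR_PAGES)
		return -EINVAL;

	switch (inode->state[index]) {
	case GMEM_NONE:
		return -EAGAIN;
	case GMEM_PRIVATE:
		if (!want_private)
			return -EACCES;
		break;
	case GMEM_SHARED:
		if (want_private)
			return -EACCES;
		break;
	}

	*pfn = inode->pfn[index];
	return 0;
}
```

With the check in the lookup path, a caller whose own view says "private" still cannot obtain a PFN that the inode records as shared or transient; it gets an error instead of a mapping.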

> > > I would much, much prefer one large series that shows the full picture than a
> > > mish mash of partial series that I can't actually review, even if the big series
> > > is 100+ patches (hopefully not).
> >
> > Dropping the RFC from the second series was not intentional, the first
> > series is the one where I intended to drop the RFC. I apologize for
> > that. Especially since I obviously don't know how to handle modules
> > and wanted some input on how to do that :)
>
> In this case, the rules for modules are pretty simple. Code in mm/ can't call
> into KVM. Either avoid callbacks entirely, or implement via a layer of
> indirection, e.g. function pointer or ops table, so that KVM can provide its
> implementation at runtime.

Ack.

Thanks again!
/fuad
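
The "layer of indirection" rule quoted above can be sketched in plain userspace C as an ops table that the core side owns and a provider fills in at runtime. This is only an illustration of the pattern: gmem_ops, gmem_register_ops, core_notify and the provider callback are hypothetical names, and the sketch ignores module lifetime and synchronization, which real kernel code would have to handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback table that the core (mm-like) side knows about. */
struct gmem_ops {
	/* Called by core code to notify the provider about a page. */
	void (*notify)(unsigned long pfn);
};

/* NULL until a provider (e.g. a loadable module) registers itself. */
static const struct gmem_ops *registered_ops;

/* Provider-facing API: install the table, refusing a second provider. */
static int gmem_register_ops(const struct gmem_ops *ops)
{
	assert(ops != NULL);

	if (registered_ops)
		return -EBUSY;
	registered_ops = ops;
	return 0;
}

static void gmem_unregister_ops(void)
{
	registered_ops = NULL;
}

/*
 * Core side: never references a provider symbol directly, it only goes
 * through the table. Returns 1 if a provider handled the event, else 0.
 */
static int core_notify(unsigned long pfn)
{
	const struct gmem_ops *ops = registered_ops;

	if (!ops || !ops->notify)
		return 0;
	ops->notify(pfn);
	return 1;
}

/* Provider side: the implementation supplied at runtime. */
static unsigned long last_seen_pfn;

static void provider_notify(unsigned long pfn)
{
	last_seen_pfn = pfn;
}

static const struct gmem_ops provider_ops = {
	.notify = provider_notify,
};
```

The core side compiles and links without the provider; until gmem_register_ops(&provider_ops) is called, core_notify() simply reports that nothing handled the event.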