Date: Fri, 07 Feb 2025 10:46:13 +0000
In-Reply-To: (message from Vishal Annapurve on Wed, 5 Feb 2025 09:42:17 -0800)
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
From: Ackerley Tng
To: Vishal Annapurve
Cc: tabba@google.com, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, mail@maciej.szmigiero.name,
 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
 shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
 jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com
Vishal Annapurve writes:

> On Wed, Feb 5, 2025 at 9:39 AM Vishal Annapurve wrote:
>>
>> On Wed, Feb 5, 2025 at 2:07 AM Fuad Tabba wrote:
>> >
>> > Hi Vishal,
>> >
>> > On Wed, 5 Feb 2025 at 00:42, Vishal Annapurve wrote:
>> > >
>> > > On Fri, Jan 17, 2025 at 8:30 AM Fuad Tabba wrote:
>> > > >
>> > > > Before transitioning a guest_memfd folio to unshared, thereby
>> > > > disallowing access by the host and allowing the hypervisor to
>> > > > transition its view of the guest page as private, we need to be
>> > > > sure that the host doesn't have any references to the folio.
>> > > >
>> > > > This patch introduces a new type for guest_memfd folios, and uses
>> > > > that to register a callback that informs the guest_memfd
>> > > > subsystem when the last reference is dropped, therefore knowing
>> > > > that the host doesn't have any remaining references.
>> > > >
>> > > > Signed-off-by: Fuad Tabba
>> > > > ---
>> > > > The function kvm_slot_gmem_register_callback() isn't used in this
>> > > > series. It will be used later in code that performs unsharing of
>> > > > memory. I have tested it with pKVM, based on downstream code [*].
>> > > > It's included in this RFC since it demonstrates the plan to
>> > > > handle unsharing of private folios.
>> > > >
>> > > > [*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm
>> > >
>> > > Should the invocation of kvm_slot_gmem_register_callback() happen in
>> > > the same critical block as setting the guest memfd range mappability
>> > > to NONE, otherwise conversion/truncation could race with registration
>> > > of callback?
>> >
>> > I don't think it needs to, at least not as far as potential races are
>> > concerned.
>> > First because kvm_slot_gmem_register_callback() grabs the
>> > mapping's invalidate_lock as well as the folio lock, and
>> > gmem_clear_mappable() grabs the mapping lock and the folio lock if a
>> > folio has been allocated before.
>>
>> I was hinting towards such a scenario:
>> Core1
>> Shared to private conversion
>> -> Results in mappability attributes
>> being set to NONE
>> ...
>> Trigger private to shared conversion/truncation for
>> ...
>> overlapping ranges
>> ...
>> kvm_slot_gmem_register_callback() on
>> the guest_memfd ranges converted
>> above (This will end up registering callback
>> for guest_memfd ranges which possibly don't
>> carry *_MAPPABILITY_NONE)
>>
>
> Sorry for the format mess above.
>
> I was hinting towards such a scenario:
> Core1
> Shared to private conversion -> Results in mappability attributes
> being set to NONE
> ...
> Core2
> Trigger private to shared conversion/truncation for overlapping ranges
> ...
> Core1
> kvm_slot_gmem_register_callback() on the guest_memfd ranges converted
> above (This will end up registering callback for guest_memfd ranges
> which possibly don't carry *_MAPPABILITY_NONE)
>

In my model (I'm working through internal processes to open source
this), I set up the folio_put() callback to be registered on truncation
regardless of mappability state.

The folio_put() callback has multiple purposes; see slide 5 of this
deck [1]:

1. Transitioning mappability from NONE to GUEST
2. Merging the folio if it is ready for merging
3. Keeping the subfolio around (even if refcount == 0) until the folio
   is ready for merging or is returned to hugetlb

So it is okay, and in fact better, to have the callback registered
regardless of mappability:

1. Folios with mappability == NONE can be transitioned to GUEST
2. Folios with mappability == GUEST/ALL can be merged if the other
   subfolios are ready for merging
3. No matter the mappability, if subfolios are not yet merged, they
   have to be kept around even with refcount 0 until they are merged.
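A minimal userspace sketch of the three purposes listed above, assuming a huge folio split into a fixed number of subfolios and a simple readiness rule (refcount 0 and mappability GUEST or ALL). Every name here (struct huge_folio, model_folio_put(), gmem_final_put(), huge_folio_init(), NR_SUBFOLIOS) is hypothetical, callback_registered stands in for the PGTY_guest_memfd page type mentioned in this reply, and locking is ignored, so this is neither the patch series' code nor the model referred to above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mappability states, named as in the discussion. */
enum mappability { MAPPABILITY_NONE, MAPPABILITY_GUEST, MAPPABILITY_ALL };

#define NR_SUBFOLIOS 4 /* hypothetical: one huge folio split into 4 */

struct subfolio {
	int refcount;
	enum mappability mappability;
	bool callback_registered; /* stands in for PGTY_guest_memfd */
	bool freed;               /* released to the allocator on its own */
};

struct huge_folio {
	struct subfolio sub[NR_SUBFOLIOS];
	bool merged;              /* reassembled, could go back to hugetlb */
};

/* Hypothetical readiness rule: no references left and mappability is
 * GUEST or ALL. */
static bool subfolio_ready(const struct subfolio *s)
{
	return s->refcount == 0 && s->mappability != MAPPABILITY_NONE;
}

/* Sketch of the final-put callback covering the three purposes. */
static void gmem_final_put(struct huge_folio *hf, size_t i)
{
	struct subfolio *s = &hf->sub[i];

	/* 1. No host references remain, so NONE may become GUEST. */
	if (s->mappability == MAPPABILITY_NONE)
		s->mappability = MAPPABILITY_GUEST;

	/* 3. s is deliberately not freed, even though its refcount is 0. */

	/* 2. Merge only once every subfolio is ready. */
	for (size_t j = 0; j < NR_SUBFOLIOS; j++)
		if (!subfolio_ready(&hf->sub[j]))
			return;
	hf->merged = true;
}

/* Models folio_put(): only the final put reaches the callback, and only
 * if it was registered; otherwise the subfolio is simply freed. */
void model_folio_put(struct huge_folio *hf, size_t i)
{
	struct subfolio *s = &hf->sub[i];

	assert(s->refcount > 0);
	if (--s->refcount > 0)
		return;
	if (s->callback_registered)
		gmem_final_put(hf, i);
	else
		s->freed = true;
}

/* Sets up a split folio holding one reference per subfolio. */
void huge_folio_init(struct huge_folio *hf, enum mappability m, bool reg)
{
	*hf = (struct huge_folio){ 0 };
	for (size_t i = 0; i < NR_SUBFOLIOS; i++) {
		hf->sub[i].refcount = 1;
		hf->sub[i].mappability = m;
		hf->sub[i].callback_registered = reg;
	}
}
```

Calling model_folio_put() once per subfolio on a folio set up with huge_folio_init(&hf, MAPPABILITY_NONE, true) moves each subfolio to GUEST and sets hf.merged only on the last put. With the callback unregistered, the final put marks the subfolio freed instead, so the huge folio can no longer be merged; that is the case registering the callback regardless of mappability is meant to avoid.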
The model doesn't model locking, so I'll have to code it up for real to
verify this, but for now I think we should take a mappability lock
during mappability reads/writes, and do any necessary callback
(un)registration while holding the lock. There's no concern about nested
locking here, since callback registration will purely (un)set
PGTY_guest_memfd and does not add or drop refcounts.

With callback registration serialized against mappability updates by
that lock, the refcounting and the folio_put() callback should keep
guest_memfd in a consistent state.

>> >
>> > Second, __gmem_register_callback() checks before returning whether all
>> > references have been dropped, and adjusts the mappability/shareability
>> > if needed.
>> >
>> > Cheers,
>> > /fuad

[1] https://lpc.events/event/18/contributions/1764/attachments/1409/3704/guest-memfd-1g-page-support-2025-02-06.pdf
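A minimal sketch of the locking proposed above, assuming one hypothetical mutex (mappability_lock) that guards both the mappability value and callback registration for a range. core1() and core2() mirror the Core1/Core2 scenario quoted earlier, and an observer thread samples the state under the same lock, as a folio_put() callback might. All names, and the invariant checked (a range marked NONE always has the callback registered), are illustrative assumptions, not guest_memfd code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mappability states, named as in the discussion. */
enum mappability { MAPPABILITY_NONE, MAPPABILITY_GUEST, MAPPABILITY_ALL };

struct gmem_range {
	pthread_mutex_t mappability_lock; /* hypothetical lock */
	enum mappability mappability;
	bool callback_registered;         /* stands in for PGTY_guest_memfd */
};

/* Registration only flips a flag and takes no references, so calling it
 * with mappability_lock held cannot nest into refcount-driven paths.
 * Caller must hold mappability_lock. */
static void register_callback_locked(struct gmem_range *r)
{
	r->callback_registered = true;
}

/* Core1: shared -> private. The mappability update and the callback
 * registration happen in a single critical section. */
void convert_to_private(struct gmem_range *r)
{
	pthread_mutex_lock(&r->mappability_lock);
	r->mappability = MAPPABILITY_NONE;
	register_callback_locked(r);
	pthread_mutex_unlock(&r->mappability_lock);
}

/* Core2: private -> shared (or truncation). The callback stays
 * registered, since it is also needed for merging. */
void convert_to_shared(struct gmem_range *r)
{
	pthread_mutex_lock(&r->mappability_lock);
	r->mappability = MAPPABILITY_ALL;
	pthread_mutex_unlock(&r->mappability_lock);
}

/* Invariant: a range marked NONE always has the callback registered. */
bool range_consistent(struct gmem_range *r)
{
	pthread_mutex_lock(&r->mappability_lock);
	bool ok = r->mappability != MAPPABILITY_NONE || r->callback_registered;
	pthread_mutex_unlock(&r->mappability_lock);
	return ok;
}

static void *core1(void *arg) { convert_to_private(arg); return NULL; }
static void *core2(void *arg) { convert_to_shared(arg); return NULL; }

/* Samples the state under the lock and counts invariant violations. */
static void *observer(void *arg)
{
	intptr_t bad = 0;

	for (int i = 0; i < 10000; i++)
		if (!range_consistent(arg))
			bad++;
	return (void *)bad;
}

/* Races Core1, Core2 and the observer once; returns violations seen. */
intptr_t run_race_once(void)
{
	struct gmem_range r = { .mappability = MAPPABILITY_ALL };
	pthread_t t1, t2, t3;
	void *bad;

	pthread_mutex_init(&r.mappability_lock, NULL);
	pthread_create(&t3, NULL, observer, &r);
	pthread_create(&t1, NULL, core1, &r);
	pthread_create(&t2, NULL, core2, &r);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, &bad);
	pthread_mutex_destroy(&r.mappability_lock);
	return (intptr_t)bad;
}
```

run_race_once() should always return 0. If convert_to_private() released the lock between setting NONE and registering the callback, the observer could see NONE without a registered callback. Doing the registration under the lock is only safe because it flips a flag and takes no references, which is the point made above about nested locking.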