From: Fuad Tabba
Date: Mon, 10 Feb 2025 16:04:07 +0000
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
To: Ackerley Tng
Cc: Vishal Annapurve, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, mail@maciej.szmigiero.name,
 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
 shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
 jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com

Hi Ackerley,

On Fri, 7 Feb 2025 at 10:46, Ackerley Tng wrote:
>
> Vishal Annapurve writes:
>
> > On Wed, Feb 5, 2025 at 9:39 AM Vishal Annapurve wrote:
> >>
> >> On Wed, Feb 5, 2025 at 2:07 AM Fuad Tabba wrote:
> >> >
> >> > Hi Vishal,
> >> >
> >> > On Wed, 5 Feb 2025 at 00:42, Vishal Annapurve wrote:
> >> > >
> >> > > On Fri, Jan 17, 2025 at 8:30 AM Fuad Tabba wrote:
> >> > > >
> >> > > > Before transitioning a guest_memfd folio to unshared, thereby
> >> > > > disallowing access by the host and allowing the hypervisor to
> >> > > > transition its view of the guest page as private, we need to be
> >> > > > sure that the host doesn't have any references to the folio.
> >> > > >
> >> > > > This patch introduces a new type for guest_memfd folios, and uses
> >> > > > that to register a callback that informs the guest_memfd
> >> > > > subsystem when the last reference is dropped, therefore knowing
> >> > > > that the host doesn't have any remaining references.
> >> > > >
> >> > > > Signed-off-by: Fuad Tabba
> >> > > > ---
> >> > > > The function kvm_slot_gmem_register_callback() isn't used in this
> >> > > > series. It will be used later in code that performs unsharing of
> >> > > > memory. I have tested it with pKVM, based on downstream code [*].
> >> > > > It's included in this RFC since it demonstrates the plan to
> >> > > > handle unsharing of private folios.
> >> > > >
> >> > > > [*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm
> >> > >
> >> > > Should the invocation of kvm_slot_gmem_register_callback() happen in
> >> > > the same critical block as setting the guest memfd range mappability
> >> > > to NONE, otherwise conversion/truncation could race with registration
> >> > > of callback?
> >> >
> >> > I don't think it needs to, at least not as far as potential races are
> >> > concerned. First because kvm_slot_gmem_register_callback() grabs the
> >> > mapping's invalidate_lock as well as the folio lock, and
> >> > gmem_clear_mappable() grabs the mapping lock and the folio lock if a
> >> > folio has been allocated before.
> >>
> >> I was hinting towards such a scenario:
> >> Core1
> >> Shared to private conversion
> >> -> Results in mappability attributes
> >> being set to NONE
> >> ...
> >> Trigger private to shared conversion/truncation for
> >> ...
> >> overlapping ranges
> >> ...
> >> kvm_slot_gmem_register_callback() on
> >> the guest_memfd ranges converted
> >> above (This will end up registering callback
> >> for guest_memfd ranges which possibly don't
> >> carry *_MAPPABILITY_NONE)
> >>
> >
> > Sorry for the format mess above.
> >
> > I was hinting towards such a scenario:
> > Core1-
> > Shared to private conversion -> Results in mappability attributes
> > being set to NONE
> > ...
> > Core2
> > Trigger private to shared conversion/truncation for overlapping ranges
> > ...
> > Core1
> > kvm_slot_gmem_register_callback() on the guest_memfd ranges converted
> > above (This will end up registering callback for guest_memfd ranges
> > which possibly don't carry *_MAPPABILITY_NONE)
> >
>
> In my model (I'm working through internal processes to open source this)
> I set up the folio_put() callback to be registered on truncation
> regardless of mappability state.
>
> The folio_put() callback has multiple purposes, see slide 5 of this deck
> [1]:
>
> 1. Transitioning mappability from NONE to GUEST
> 2. Merging the folio if it is ready for merging
> 3. Keeping subfolio around (even if refcount == 0) until folio is ready
>    for merging or return it to hugetlb
>
> So it is okay and in fact better to have the callback registered:
>
> 1. Folios with mappability == NONE can be transitioned to GUEST
> 2. Folios with mappability == GUEST/ALL can be merged if the other subfolios
>    are ready for merging
> 3. And no matter the mappability, if subfolios are not yet merged, they
>    have to be kept around even with refcount 0 until they are merged.
>
> The model doesn't model locking so I'll have to code it up for real to
> verify this, but for now I think we should take a mappability lock
> during mappability read/write, and do any necessary callback
> (un)registration while holding the lock. There's no concern of nested
> locking here since callback registration will purely (un)set
> PGTY_guest_memfd and does not add/drop refcounts.
>
> With the callback registration locked with mappability updates, the
> refcounting and folio_put() callback should keep guest_memfd in a
> consistent state.

So if I understand you correctly, we'll always need to register the
callback for large folios, right? If that's the case, we could extend
the check that decides whether to register the callback so that it is
always registered for large folios. For small folios, like I said, the
common case is that it would be just additional overhead. Right?
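
To make the question concrete, here is a toy Python sketch of what such
an extended check might look like. It is not code from the series:
Folio, expected_refs and should_register_callback() are hypothetical
names, and the small-folio rule (register only while the host still
holds extra references) is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    """Toy stand-in for a guest_memfd folio (hypothetical, not struct folio)."""
    nr_pages: int        # 1 for a small folio, >1 for a large folio
    refcount: int        # current reference count
    expected_refs: int   # references guest_memfd itself is assumed to hold

    def is_large(self) -> bool:
        return self.nr_pages > 1


def should_register_callback(folio: Folio) -> bool:
    """Decide whether to register the final-folio_put() callback.

    Assumed policy (illustration only):
      - large folios: always register, since the callback is also
        needed for merging and for keeping subfolios around;
      - small folios: register only while extra references are still
        outstanding; otherwise the callback is just overhead.
    """
    if folio.is_large():
        return True
    return folio.refcount > folio.expected_refs
```

With this policy a small folio that nobody else references skips
registration entirely, which keeps the common case cheap, while every
large folio gets the callback regardless of its mappability.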

Cheers,
/fuad

> >> >
> >> > Second, __gmem_register_callback() checks before returning whether all
> >> > references have been dropped, and adjusts the mappability/shareability
> >> > if needed.
> >> >
> >> > Cheers,
> >> > /fuad
>
> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3704/guest-memfd-1g-page-support-2025-02-06.pdf
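
The locking rule proposed in the quoted text (take a mappability lock
for every mappability read/write, and do callback (un)registration
while holding it) can be modelled in a few lines of Python. This is a
toy userspace model under stated assumptions, not the series' code:
GmemRange, its methods and hammer() are hypothetical names, the
comments on the three states are my reading of the thread, and the
invariant "callback registered if and only if mappability is NONE" is a
simplification (the model described in the thread registers the
callback more broadly).

```python
import threading
from enum import Enum


class Mappability(Enum):
    NONE = 0    # assumed: mappable by neither host nor guest
    GUEST = 1   # assumed: mappable by the guest only
    ALL = 2     # assumed: mappable by host and guest


class GmemRange:
    """Toy model of one guest_memfd range (hypothetical names throughout)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # stands in for the mappability lock
        self.mappability = Mappability.ALL
        self.callback_registered = False

    def convert_to_private(self) -> None:
        # The state change and the registration share one critical
        # section, so no other conversion can slip in between them.
        with self._lock:
            self.mappability = Mappability.NONE
            self.callback_registered = True

    def convert_to_shared(self) -> None:
        with self._lock:
            self.mappability = Mappability.ALL
            self.callback_registered = False

    def consistent(self) -> bool:
        # Invariant assumed by this model: registered iff NONE.
        with self._lock:
            return self.callback_registered == (
                self.mappability is Mappability.NONE
            )


def hammer(rng: GmemRange, rounds: int = 1000) -> bool:
    """Run both conversions concurrently; True if the invariant always held."""
    results = []

    def worker(op):
        for _ in range(rounds):
            op()
            results.append(rng.consistent())

    threads = [
        threading.Thread(target=worker, args=(rng.convert_to_private,)),
        threading.Thread(target=worker, args=(rng.convert_to_shared,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)
```

If the registration were instead done in a second, separate critical
section, a concurrent convert_to_shared() could run between the two
steps and leave a callback registered on a range that no longer carries
NONE, which is the interleaving described in the Core1/Core2 scenario.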