From: Vishal Annapurve
Date: Tue, 4 Feb 2025 21:58:27 -0800
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
To: Ackerley Tng
Cc: tabba@google.com, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, mail@maciej.szmigiero.name,
 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
 shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
 jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com

On Tue, Feb 4, 2025 at 8:31 PM Ackerley Tng wrote:
>
> Vishal Annapurve writes:
>
> > On Thu, Jan 23, 2025 at 1:51 AM Fuad Tabba wrote:
> >>
> >> On Wed, 22 Jan 2025 at 22:16, Ackerley Tng wrote:
> >> >
> >> > Fuad Tabba writes:
> >> >
> >> > Hey Fuad, I'm still working on verifying all this but for now this is
> >> > one issue. I think this can be fixed by checking if the folio->mapping
> >> > is NULL. If it's NULL, then the folio has been disassociated from the
> >> > inode, and during the dissociation (removal from filemap), the
> >> > mappability can also either
> >> >
> >> > 1. Be unset so that the default mappability can be set up based on
> >> >    GUEST_MEMFD_FLAG_INIT_MAPPABLE, or
> >> > 2. Be directly restored based on GUEST_MEMFD_FLAG_INIT_MAPPABLE
> >>
> >> Thanks for pointing this out. I hadn't considered this case. I'll fix
> >> in the respin.
> >>
> >
> > Can the below scenario cause trouble?
> > 1) Userspace converts a certain range of guest memfd as shared and
> > grabs some refcounts on shared memory pages through existing kernel
> > exposed mechanisms.
> > 2) Userspace converts the same range to private which would cause the
> > corresponding mappability attributes to be *MAPPABILITY_NONE.
> > 3) Userspace truncates the range which will remove the page from pagecache.
> > 4) Userspace does the fallocate again, leading to a new page getting
> > allocated without freeing the older page which is still refcounted
> > (step 1).
> >
> > Effectively this could allow userspace to keep allocating multiple
> > pages for the same guest_memfd range.
>
> I'm still verifying this but for now here's the flow Vishal described in
> greater detail:
>
> + guest_memfd starts without GUEST_MEMFD_FLAG_INIT_MAPPABLE
>     + All new pages will start with mappability = GUEST
> + guest uses a page
>     + Get new page
>     + Add page to filemap
> + guest converts page to shared
>     + Mappability is now ALL
> + host uses page
>     + host takes transient refcounts on page
>     + Refcount on the page is now (a) filemap's refcount (b) vma's refcount
>       (c) transient refcount
> + guest converts page to private
>     + Page is unmapped
>     + Refcount on the page is now (a) filemap's refcount (b) transient
>       refcount
>     + Since refcount is elevated, the mappabilities are left as NONE
>     + Filemap's refcounts are removed from the page
>     + Refcount on the page is now (a) transient refcount
> + host punches hole to deallocate page
>     + Since mappability was NONE, restore filemap's refcount
>     + Refcount on the page is now (a) transient refcount (b) filemap's
>       refcount
>     + Mappabilities are reset to GUEST for truncated range
>     + Folio is removed from filemap
>     + Refcount on the page is now (a) transient refcount
>     + Callback remains registered so that when the transient refcounts are
>       dropped, cleanup can happen - this is where merging will happen
>       with 1G page support
> + host fallocate()s in the same address range
>     + will get a new page
>
> Though the host does manage to get a new page while the old one stays
> around, I think this is working as intended, since the transient
> refcounts are truly holding the old folio around. When the transient
> refcounts go away, the old folio will still get cleaned up (with 1G page
> support: merged and returned) to as expected. The new page will also be
> freed at some point later.
>
> If the userspace program decides to keep taking transient refcounts to hold
> pages around, then the userspace program is truly leaking memory and it
> shouldn't be guest_memfd's bug.

I wouldn't call such references transient. That said, a similar scenario
applies to shmem files, so it makes sense to call out this behavior as
working as intended (WAI).
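
For anyone following along, the refcount sequence Ackerley lists above can
be walked through with a toy userspace model. This is only a sketch under
stated assumptions: Folio, ToyGuestMemfd, walk_flow() and the holder names
("filemap", "vma", "transient") are made up for illustration and are not
guest_memfd or mm interfaces, and the model tracks named holders instead of
a real refcount.

```python
from dataclasses import dataclass, field


@dataclass
class Folio:
    """Toy stand-in for a folio: records who holds a reference."""
    name: str
    holders: list = field(default_factory=list)
    freed: bool = False

    def get(self, who):
        self.holders.append(who)

    def put(self, who):
        self.holders.remove(who)
        if not self.holders:
            # Final put: this is where the cleanup callback would run.
            self.freed = True


class ToyGuestMemfd:
    """Toy model of a single guest_memfd offset (hypothetical names)."""

    def __init__(self):
        self.filemap = None         # folio currently in the "filemap"
        self.mappability = "GUEST"  # flow starts without INIT_MAPPABLE

    def allocate(self, name):
        folio = Folio(name)
        folio.get("filemap")
        self.filemap = folio
        return folio

    def convert_to_shared(self):
        self.mappability = "ALL"

    def convert_to_private(self, folio):
        if "vma" in folio.holders:
            folio.put("vma")        # page is unmapped
        extra = [h for h in folio.holders if h != "filemap"]
        if extra:
            # Elevated refcount: leave NONE, drop the filemap's reference.
            self.mappability = "NONE"
            folio.put("filemap")
        else:
            self.mappability = "GUEST"

    def punch_hole(self, folio):
        if self.mappability == "NONE":
            folio.get("filemap")    # restore the filemap's reference
        self.mappability = "GUEST"  # reset for the truncated range
        self.filemap = None
        folio.put("filemap")        # folio is removed from the filemap


def walk_flow():
    """Replay the quoted flow up to the second fallocate()."""
    gmem = ToyGuestMemfd()
    old = gmem.allocate("old")      # guest uses a page
    gmem.convert_to_shared()
    old.get("vma")                  # host maps the page
    old.get("transient")            # host takes an extra reference
    gmem.convert_to_private(old)
    gmem.punch_hole(old)
    new = gmem.allocate("new")      # host fallocate()s the same range
    return gmem, old, new
```

In this model the old folio outlives the hole punch only because the
"transient" holder is still present; calling old.put("transient") afterwards
is the final put and marks it freed, while the new folio stays in the
filemap.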