Date: Wed, 05 Feb 2025 04:31:53 +0000
In-Reply-To: (message from Vishal Annapurve on Tue, 4 Feb 2025 17:28:08 -0800)
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
From: Ackerley Tng
To: Vishal Annapurve
Cc: tabba@google.com, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com

Vishal Annapurve writes:

> On Thu, Jan 23, 2025 at 1:51 AM Fuad Tabba wrote:
>>
>> On Wed, 22 Jan 2025 at 22:16, Ackerley Tng wrote:
>> >
>> > Fuad Tabba writes:
>> >
>> > Hey Fuad, I'm still working on verifying all this but for now this is
>> > one issue. I think this can be fixed by checking if the folio->mapping
>> > is NULL. If it's NULL, then the folio has been disassociated from the
>> > inode, and during the dissociation (removal from filemap), the
>> > mappability can also either
>> >
>> > 1. Be unset so that the default mappability can be set up based on
>> > GUEST_MEMFD_FLAG_INIT_MAPPABLE, or
>> > 2. Be directly restored based on GUEST_MEMFD_FLAG_INIT_MAPPABLE
>>
>> Thanks for pointing this out. I hadn't considered this case. I'll fix
>> in the respin.
>>
>
> Can the below scenario cause trouble?
> 1) Userspace converts a certain range of guest memfd as shared and
> grabs some refcounts on shared memory pages through existing kernel
> exposed mechanisms.
> 2) Userspace converts the same range to private which would cause the
> corresponding mappability attributes to be *MAPPABILITY_NONE.
> 3) Userspace truncates the range which will remove the page from pagecache.
> 4) Userspace does the fallocate again, leading to a new page getting
> allocated without freeing the older page which is still refcounted
> (step 1).
>
> Effectively this could allow userspace to keep allocating multiple
> pages for the same guest_memfd range.
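
The folio->mapping check proposed in the quoted thread, using option 2 (restore the default directly during removal from the filemap), can be modeled in userspace roughly as follows. This is a hypothetical sketch, not kernel code: `struct model_inode`, `struct model_folio`, the `MAPPABILITY_*` names and the flag's numeric value are invented for illustration, and it assumes that a final put on a folio still attached to the inode completes a pending conversion back to GUEST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three mappability states discussed in this thread. */
enum mappability { MAPPABILITY_GUEST, MAPPABILITY_ALL, MAPPABILITY_NONE };

/* Assumed flag value for this model only; the real uAPI value may differ. */
#define GUEST_MEMFD_FLAG_INIT_MAPPABLE (1u << 0)

#define NR_INDICES 4

struct model_inode {
	unsigned int flags;
	enum mappability mappability[NR_INDICES];	/* per-index state */
};

struct model_folio {
	struct model_inode *mapping;	/* NULL once removed from the filemap */
	size_t index;
};

/* Default mappability for a range, derived from the creation flags. */
static enum mappability default_mappability(const struct model_inode *inode)
{
	return (inode->flags & GUEST_MEMFD_FLAG_INIT_MAPPABLE) ?
	       MAPPABILITY_ALL : MAPPABILITY_GUEST;
}

/* Option 2: restore the default directly while removing the folio from the filemap. */
static void model_filemap_remove(struct model_folio *folio)
{
	struct model_inode *inode = folio->mapping;

	assert(inode != NULL && folio->index < NR_INDICES);
	inode->mappability[folio->index] = default_mappability(inode);
	folio->mapping = NULL;	/* folio is now disassociated from the inode */
}

/*
 * Modeled final folio_put() handling: only a folio still attached to the
 * inode may complete a pending conversion. A NULL mapping means the inode's
 * state for this index may already describe a different folio, so it is left
 * alone. Returns true if the inode was updated.
 */
static bool model_final_put(struct model_folio *folio)
{
	if (folio->mapping == NULL)
		return false;
	folio->mapping->mappability[folio->index] = MAPPABILITY_GUEST;
	return true;
}
```

With this ordering, a late final put on a truncated folio cannot overwrite the mappability of whatever occupies the same index afterwards.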

I'm still verifying this, but for now here's the flow Vishal described in
greater detail:

+ guest_memfd starts without GUEST_MEMFD_FLAG_INIT_MAPPABLE
  + All new pages will start with mappability = GUEST
+ guest uses a page
  + Get new page
  + Add page to filemap
+ guest converts page to shared
  + Mappability is now ALL
+ host uses page
  + host takes transient refcounts on page
  + Refcount on the page is now
    (a) filemap's refcount
    (b) vma's refcount
    (c) transient refcount
+ guest converts page to private
  + Page is unmapped
    + Refcount on the page is now
      (a) filemap's refcount
      (b) transient refcount
  + Since refcount is elevated, the mappabilities are left as NONE
  + Filemap's refcounts are removed from the page
    + Refcount on the page is now
      (a) transient refcount
+ host punches hole to deallocate page
  + Since mappability was NONE, restore filemap's refcount
    + Refcount on the page is now
      (a) transient refcount
      (b) filemap's refcount
  + Mappabilities are reset to GUEST for truncated range
  + Folio is removed from filemap
    + Refcount on the page is now
      (a) transient refcount
  + Callback remains registered so that when the transient refcounts are
    dropped, cleanup can happen; this is where merging will happen with
    1G page support
+ host fallocate()s in the same address range
  + will get a new page

Though the host does manage to get a new page while the old one stays
around, I think this is working as intended, since the transient
refcounts are truly holding the old folio around. When the transient
refcounts go away, the old folio will still get cleaned up (with 1G page
support: merged and returned) as expected. The new page will also be
freed at some point later.

If the userspace program decides to keep taking transient refcounts to
hold pages around, then the userspace program is truly leaking memory,
and that shouldn't be treated as a guest_memfd bug.
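
The refcount bookkeeping in the flow above can be checked with a small userspace model. This is a hypothetical sketch, not the guest_memfd implementation: `struct model_page`, `struct model_slot`, the `model_*` helpers and the fixed page pool are invented for illustration, refcounts are plain integers, the registered callback is reduced to a `cleaned_up` flag, and only the case without GUEST_MEMFD_FLAG_INIT_MAPPABLE is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the flow above; not kernel code. */
enum mappability { MAPPABILITY_GUEST, MAPPABILITY_ALL, MAPPABILITY_NONE };

struct model_page {
	int refcount;
	bool final_put_cb;	/* cleanup callback registered */
	bool cleaned_up;	/* callback ran on the final put */
	bool freed;
};

/* One index of a guest_memfd created without GUEST_MEMFD_FLAG_INIT_MAPPABLE. */
struct model_slot {
	enum mappability mappability;
	struct model_page *page;	/* NULL when nothing is in the filemap */
};

#define POOL_SIZE 8
static struct model_page pool[POOL_SIZE];
static size_t nr_allocated;

static void model_get(struct model_page *page)
{
	assert(!page->freed);
	page->refcount++;
}

static void model_put(struct model_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount > 0)
		return;
	/* Final put: a registered callback would clean up (and, with 1G pages, merge). */
	page->cleaned_up = page->final_put_cb;
	page->freed = true;
}

/* fallocate(): a new page enters the filemap holding the filemap's refcount. */
static struct model_page *model_fallocate(struct model_slot *slot)
{
	assert(slot->page == NULL && nr_allocated < POOL_SIZE);
	slot->page = &pool[nr_allocated++];
	slot->page->refcount = 1;
	slot->mappability = MAPPABILITY_GUEST;
	return slot->page;
}

/* Guest converts to shared; the host then maps the page (vma's refcount). */
static void model_convert_to_shared(struct model_slot *slot)
{
	slot->mappability = MAPPABILITY_ALL;
	model_get(slot->page);
}

/* Guest converts to private: unmap, then check for leftover references. */
static void model_convert_to_private(struct model_slot *slot)
{
	struct model_page *page = slot->page;

	model_put(page);	/* page is unmapped: vma's refcount dropped */
	if (page->refcount > 1) {
		/* Elevated refcount: leave NONE, drop filemap's ref, defer to callback. */
		slot->mappability = MAPPABILITY_NONE;
		page->refcount--;
		page->final_put_cb = true;
	} else {
		slot->mappability = MAPPABILITY_GUEST;
	}
}

/* Host punches a hole over the slot. */
static void model_punch_hole(struct model_slot *slot)
{
	struct model_page *page = slot->page;

	if (slot->mappability == MAPPABILITY_NONE)
		page->refcount++;	/* restore filemap's refcount */
	slot->mappability = MAPPABILITY_GUEST;	/* reset for truncated range */
	slot->page = NULL;	/* folio removed from filemap */
	model_put(page);	/* filemap's refcount dropped */
}
```

Driving these helpers through the listed steps in order leaves two pages live between the second fallocate() and the last transient put, after which the old page is cleaned up and only the new one remains; in this model the extra page is pinned by the host's transient reference rather than lost by guest_memfd.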