Date: Fri, 30 May 2025 13:32:51 -0700
In-Reply-To: <21b9b151-6e4f-47b8-9c6b-73eeb0c20165@linux.intel.com>
References: <37f60bbd7d408cf6d421d0582462488262c720ab.1747264138.git.ackerleytng@google.com>
 <21b9b151-6e4f-47b8-9c6b-73eeb0c20165@linux.intel.com>
Subject: Re: [RFC PATCH v2 05/51] KVM: guest_memfd: Skip LRU for guest_memfd folios
From: Ackerley Tng
To: Binbin Wu
Cc: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org, aik@amd.com,
 ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com,
 anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu,
 bfoster@redhat.com, brauner@kernel.org, catalin.marinas@arm.com,
 chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com,
 david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk,
 erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com,
 haibo1.xu@intel.com, hch@infradead.org, hughd@google.com,
 ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
 james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
 jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
 jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
 kent.overstreet@linux.dev, kirill.shutemov@intel.com,
 liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
 mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
 michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
 nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
 palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
 pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
 pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
 rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
 tabba@google.com, thomas.lendacky@amd.com, vannapurve@google.com,
 vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com,
 wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
 xiaoyao.li@intel.com, yan.y.zhao@intel.com, yilun.xu@intel.com,
 yuzenghui@huawei.com, zhiquan1.li@intel.com

Binbin Wu writes:

> On 5/15/2025 7:41 AM, Ackerley Tng wrote:
>> filemap_add_folio(), called from filemap_grab_folio(), adds the folio
>> onto some LRU list, which is not necessary for guest_memfd since
>> guest_memfd folios don't participate in any swapping.
>>
>> This patch reimplements part of filemap_add_folio() to avoid adding
>> allocated guest_memfd folios to the filemap.
>
> filemap -> LRU list?
>

Yes, thank you. Will fix this in the next revision.

>>
>> With shared to private conversions dependent on refcounts, avoiding
>> usage of LRU ensures that LRU lists no longer take any refcounts on
>> guest_memfd folios and significantly reduces the chance of elevated
>> refcounts during conversion.
>>
>> Signed-off-by: Ackerley Tng
>> Change-Id: Ia2540d9fc132d46219e6e714fd42bc82a62a27fa
>> ---
>>  mm/filemap.c           |  1 +
>>  mm/memcontrol.c        |  2 +
>>  virt/kvm/guest_memfd.c | 91 ++++++++++++++++++++++++++++++++++++++----
>>  3 files changed, 86 insertions(+), 8 deletions(-)
>>
> [...]
>>  /*
>>   * Returns a locked folio on success. The caller is responsible for
>>   * setting the up-to-date flag before the memory is mapped into the guest.
>> @@ -477,8 +509,46 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>>   */
>>  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>>  {
>> +	struct folio *folio;
>> +	gfp_t gfp;
>> +	int ret;
>> +
>> +repeat:
>> +	folio = filemap_lock_folio(inode->i_mapping, index);
>> +	if (!IS_ERR(folio))
>> +		return folio;
>> +
>> +	gfp = mapping_gfp_mask(inode->i_mapping);
>> +
>>  	/* TODO: Support huge pages. */
>> -	return filemap_grab_folio(inode->i_mapping, index);
>> +	folio = filemap_alloc_folio(gfp, 0);
>> +	if (!folio)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	ret = mem_cgroup_charge(folio, NULL, gfp);
>> +	if (ret) {
>> +		folio_put(folio);
>> +		return ERR_PTR(ret);
>> +	}
>> +
>> +	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index);
>> +	if (ret) {
>> +		folio_put(folio);
>> +
>> +		/*
>> +		 * There was a race, two threads tried to get a folio indexing
>> +		 * to the same location in the filemap. The losing thread should
>> +		 * free the allocated folio, then lock the folio added to the
>> +		 * filemap by the winning thread.
>
> How about changing
> “then lock the folio added to the filemap by the winning thread”
> to
> "the winning thread locks the folio added to the filemap"?
>

How about:

    There was a race. Threads tried to get a folio indexing to the same
    location in the filemap. The winning thread allocated and locked the
    folio at the requested index. The losing threads should free the
    extra allocated folio, then wait to lock the same folio allocated
    (and locked) by the winning thread.

>> +		 */
>> +		if (ret == -EEXIST)
>> +			goto repeat;
>> +
>> +		return ERR_PTR(ret);
>> +	}
>> +
>> +	__folio_set_locked(folio);
>> +	return folio;
>>  }
>>
>>  static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
>> @@ -956,23 +1026,28 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *folio)
>>  }
>>
>>  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>> +static void kvm_gmem_invalidate(struct folio *folio)
>> +{
>> +	kvm_pfn_t pfn = folio_pfn(folio);
>> +
>> +	kvm_arch_gmem_invalidate(pfn, pfn + folio_nr_pages(folio));
>> +}
>> +#else
>> +static inline void kvm_gmem_invalidate(struct folio *folio) {}
>
> No need to tag a local static function with "inline".
>

Will fix in the next revision.

>> +#endif
>> +
> [...]
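
Since the body of kvm_gmem_filemap_add_folio() falls inside the elided
hunks above, here is a rough sketch of what "reimplementing part of
filemap_add_folio()" without the LRU hookup could look like. To be
clear, this is an illustration, not the helper from this patch: it
assumes order-0 folios and omits the shadow-entry handling and
NR_FILE_PAGES accounting that __filemap_add_folio() performs. Memcg
charging and folio locking are left to the caller, matching the
kvm_gmem_get_folio() code quoted above.

/* Hypothetical sketch, not the helper from this patch. */
static int kvm_gmem_filemap_add_folio(struct address_space *mapping,
				      struct folio *folio, pgoff_t index)
{
	XA_STATE(xas, &mapping->i_pages, index);
	gfp_t gfp = mapping_gfp_mask(mapping);
	int ret;

	/* Take the page cache's reference and bind the folio. */
	folio_get(folio);
	folio->mapping = mapping;
	folio->index = index;

	do {
		xas_lock_irq(&xas);
		if (xas_load(&xas)) {
			/* Lost the race: another thread filled this index. */
			xas_set_err(&xas, -EEXIST);
		} else {
			xas_store(&xas, folio);
			if (!xas_error(&xas))
				mapping->nrpages++;
		}
		xas_unlock_irq(&xas);
	} while (xas_nomem(&xas, gfp));

	ret = xas_error(&xas);
	if (ret) {
		folio->mapping = NULL;
		folio_put(folio);
	}

	/* Unlike filemap_add_folio(), no folio_add_lru() call here. */
	return ret;
}

Because the folio never reaches an LRU list, the LRU code never holds a
reference on it, which is what keeps guest_memfd folio refcounts
predictable for the shared-to-private conversion checks described in
the commit message.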