References: <20250115033808.40641-1-21cnbao@gmail.com> <20250115033808.40641-5-21cnbao@gmail.com>
From: Lance Yang
Date: Wed, 15 Jan 2025 13:41:56 +0800
Subject: Re: [PATCH v3 4/4] mm: Avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
 baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
 lorenzo.stoakes@oracle.com, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 x86@kernel.org, ying.huang@intel.com, zhengtangquan@oppo.com

On Wed, Jan 15, 2025 at 1:09 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 6:01 PM Lance Yang wrote:
> >
> > On Wed, Jan 15, 2025 at 11:38 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > From: Barry Song
> > >
> > > The try_to_unmap_one() function currently handles PMD-mapped THPs
> > > inefficiently. It first splits the PMD into PTEs, copies the dirty
> > > state from the PMD to the PTEs, iterates over the PTEs to locate
> > > the dirty state, and then marks the THP as swap-backed. This process
> > > involves unnecessary PMD splitting and redundant iteration. Instead,
> > > this functionality can be efficiently managed in
> > > __discard_anon_folio_pmd_locked(), avoiding the extra steps and
> > > improving performance.
> > >
> > > The following microbenchmark redirties folios after invoking MADV_FREE,
> > > then measures the time taken to perform memory reclamation (actually
> > > set those folios swapbacked again) on the redirtied folios.
> > >
> > > #include <stdio.h>
> > > #include <sys/mman.h>
> > > #include <string.h>
> > > #include <time.h>
> > >
> > > #define SIZE 128*1024*1024  // 128 MB
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >         while(1) {
> > >                 volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
> > >                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > >
> > >                 memset((void *)p, 1, SIZE);
> > >                 madvise((void *)p, SIZE, MADV_FREE);
> > >                 /* redirty after MADV_FREE */
> > >                 memset((void *)p, 1, SIZE);
> > >
> > >                 clock_t start_time = clock();
> > >                 madvise((void *)p, SIZE, MADV_PAGEOUT);
> > >                 clock_t end_time = clock();
> > >
> > >                 double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
> > >                 printf("Time taken by reclamation: %f seconds\n", elapsed_time);
> > >
> > >                 munmap((void *)p, SIZE);
> > >         }
> > >         return 0;
> > > }
> > >
> > > Testing results are as below,
> > > w/o patch:
> > > ~ # ./a.out
> > > Time taken by reclamation: 0.007300 seconds
> > > Time taken by reclamation: 0.007226 seconds
> > > Time taken by reclamation: 0.007295 seconds
> > > Time taken by reclamation: 0.007731 seconds
> > > Time taken by reclamation: 0.007134 seconds
> > > Time taken by reclamation: 0.007285 seconds
> > > Time taken by reclamation: 0.007720 seconds
> > > Time taken by reclamation: 0.007128 seconds
> > > Time taken by reclamation: 0.007710 seconds
> > > Time taken by reclamation: 0.007712 seconds
> > > Time taken by reclamation: 0.007236 seconds
> > > Time taken by reclamation: 0.007690 seconds
> > > Time taken by reclamation: 0.007174 seconds
> > > Time taken by reclamation: 0.007670 seconds
> > > Time taken by reclamation: 0.007169 seconds
> > > Time taken by reclamation: 0.007305 seconds
> > > Time taken by reclamation: 0.007432 seconds
> > > Time taken by reclamation: 0.007158 seconds
> > > Time taken by reclamation: 0.007133 seconds
> > > …
> > >
> > > w/ patch
> > >
> > > ~ # ./a.out
> > > Time taken by reclamation: 0.002124 seconds
> > > Time taken by reclamation: 0.002116 seconds
> > > Time taken by reclamation: 0.002150 seconds
> > > Time taken by reclamation: 0.002261 seconds
> > > Time taken by reclamation: 0.002137 seconds
> > > Time taken by reclamation: 0.002173 seconds
> > > Time taken by reclamation: 0.002063 seconds
> > > Time taken by reclamation: 0.002088 seconds
> > > Time taken by reclamation: 0.002169 seconds
> > > Time taken by reclamation: 0.002124 seconds
> > > Time taken by reclamation: 0.002111 seconds
> > > Time taken by reclamation: 0.002224 seconds
> > > Time taken by reclamation: 0.002297 seconds
> > > Time taken by reclamation: 0.002260 seconds
> > > Time taken by reclamation: 0.002246 seconds
> > > Time taken by reclamation: 0.002272 seconds
> > > Time taken by reclamation: 0.002277 seconds
> > > Time taken by reclamation: 0.002462 seconds
> > > …
> > >
> > > This patch significantly speeds up try_to_unmap_one() by allowing it
> > > to skip redirtied THPs without splitting the PMD.
> > >
> > > Suggested-by: Baolin Wang
> > > Suggested-by: Lance Yang
> > > Signed-off-by: Barry Song
> > > ---
> > >  mm/huge_memory.c | 24 +++++++++++++++++-------
> > >  mm/rmap.c        | 13 ++++++++++---
> > >  2 files changed, 27 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 3d3ebdc002d5..47cc8c3f8f80 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3070,8 +3070,12 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> > >         int ref_count, map_count;
> > >         pmd_t orig_pmd = *pmdp;
> > >
> > > -       if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
> > > +       if (pmd_dirty(orig_pmd))
> > > +               folio_set_dirty(folio);
> > > +       if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
> > > +               folio_set_swapbacked(folio);
> > >                 return false;
> > > +       }
> >
> > If either the PMD or the folio is dirty, should we just return false right away,
> > regardless of VM_DROPPABLE? There’s no need to proceed further in that
> > case, IMHO ;)
>
> I don't quite understand you, but we need to proceed to clear pmd entry.
> if vm_droppable is true, even if the folio is dirty, we still drop the folio.

Ah, you're right, and I completely got it wrong ;(

Thanks,
Lance

>
> >
> > Thanks,
> > Lance
> >
> > >
> > >         orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
> > >
> > > @@ -3098,8 +3102,15 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> > >          *
> > >          * The only folio refs must be one from isolation plus the rmap(s).
> > >          */
> > > -       if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
> > > -           ref_count != map_count + 1) {
> > > +       if (pmd_dirty(orig_pmd))
> > > +               folio_set_dirty(folio);
> > > +       if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
> > > +               folio_set_swapbacked(folio);
> > > +               set_pmd_at(mm, addr, pmdp, orig_pmd);
> > > +               return false;
> > > +       }
> > > +
> > > +       if (ref_count != map_count + 1) {
> > >                 set_pmd_at(mm, addr, pmdp, orig_pmd);
> > >                 return false;
> > >         }
> > > @@ -3119,12 +3130,11 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> > >  {
> > >         VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> > >         VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > > +       VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> > > +       VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
> > >         VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
> > >
> > > -       if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
> > > -               return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
> > > -
> > > -       return false;
> > > +       return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
> > >  }
> > >
> > >  static void remap_page(struct folio *folio, unsigned long nr, int flags)
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index be1978d2712d..a859c399ec7c 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1724,9 +1724,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > >                 }
> > >
> > >                 if (!pvmw.pte) {
> > > -                       if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
> > > -                                                 folio))
> > > -                               goto walk_done;
> > > +                       if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
> > > +                               if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
> > > +                                       goto walk_done;
> > > +                               /*
> > > +                                * unmap_huge_pmd_locked has either already marked
> > > +                                * the folio as swap-backed or decided to retain it
> > > +                                * due to GUP or speculative references.
> > > +                                */
> > > +                               goto walk_abort;
> > > +                       }
> > >
> > >                         if (flags & TTU_SPLIT_HUGE_PMD) {
> > >                                 /*
> > > --
> > > 2.39.3 (Apple Git-146)
> > >
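
For reference, the VM_DROPPABLE point discussed above can be sketched as a
small userspace model. This is only an illustration under stated
assumptions, not kernel code: struct model_folio, model_discard_pmd() and
model_self_check() are hypothetical names, and the plain booleans and
integers stand in for pmd_dirty(), folio_test_dirty(), the VM_DROPPABLE bit
in vma->vm_flags, and the ref_count/map_count values used in the quoted
hunks. The PMD clear/flush and the set_pmd_at() restore are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two folio flags the quoted hunks touch. */
struct model_folio {
	bool dirty;
	bool swapbacked;
};

/*
 * Userspace model of the decision order in the patched
 * __discard_anon_folio_pmd_locked(). Returns true when the lazyfree folio
 * would be discarded, false when it would be kept.
 */
static bool model_discard_pmd(struct model_folio *folio, bool pmd_is_dirty,
			      bool vma_droppable, int ref_count, int map_count)
{
	/* Dirty state found in the PMD is transferred to the folio. */
	if (pmd_is_dirty)
		folio->dirty = true;

	/* Redirtied and not droppable: mark swap-backed again and keep it. */
	if (folio->dirty && !vma_droppable) {
		folio->swapbacked = true;
		return false;
	}

	/* Extra references (e.g. GUP or speculative): keep the folio. */
	if (ref_count != map_count + 1)
		return false;

	/* Clean, or dirty but droppable, with no extra references. */
	return true;
}

/* Small self-check covering the cases raised in the thread. */
static void model_self_check(void)
{
	struct model_folio redirtied = { false, false };
	struct model_folio droppable = { true, false };

	/* Redirtied via the PMD, not droppable: kept and set swap-backed. */
	assert(!model_discard_pmd(&redirtied, true, false, 2, 1));
	assert(redirtied.dirty && redirtied.swapbacked);

	/* Dirty but VM_DROPPABLE: still discarded, never set swap-backed. */
	assert(model_discard_pmd(&droppable, false, true, 2, 1));
	assert(!droppable.swapbacked);
}
```

Calling model_self_check() exercises both cases: the dirty, non-droppable
folio is retained and marked swap-backed, while the dirty folio in a
droppable mapping falls through to the reference check and is discarded.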