From: Lance Yang <ioworker0@gmail.com>
Date: Wed, 15 Jan 2025 14:26:19 +0800
Subject: Re: [PATCH v3 4/4] mm: Avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, baolin.wang@linux.alibaba.com,
    chrisl@kernel.org, david@redhat.com, kasong@tencent.com,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
    ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
    ying.huang@intel.com, zhengtangquan@oppo.com
References: <20250115033808.40641-1-21cnbao@gmail.com> <20250115033808.40641-5-21cnbao@gmail.com>

On Wed, Jan 15, 2025 at 1:09 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 6:01 PM Lance Yang wrote:
> >
> > On Wed, Jan 15, 2025 at 11:38 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > From: Barry Song
> > >
> > > The try_to_unmap_one() function currently handles PMD-mapped THPs
> > > inefficiently. It first splits the PMD into PTEs, copies the dirty
> > > state from the PMD to the PTEs, iterates over the PTEs to locate
> > > the dirty state, and then marks the THP as swap-backed. This process
> > > involves unnecessary PMD splitting and redundant iteration. Instead,
> > > this functionality can be efficiently managed in
> > > __discard_anon_folio_pmd_locked(), avoiding the extra steps and
> > > improving performance.
> > >
> > > The following microbenchmark redirties folios after invoking MADV_FREE,
> > > then measures the time taken to perform memory reclamation (actually
> > > set those folios swapbacked again) on the redirtied folios.
> > >
> > > #include <stdio.h>
> > > #include <sys/mman.h>
> > > #include <string.h>
> > > #include <time.h>
> > >
> > > #define SIZE 128*1024*1024 // 128 MB
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >         while(1) {
> > >                 volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
> > >                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > >
> > >                 memset((void *)p, 1, SIZE);
> > >                 madvise((void *)p, SIZE, MADV_FREE);
> > >                 /* redirty after MADV_FREE */
> > >                 memset((void *)p, 1, SIZE);
> > >
> > >                 clock_t start_time = clock();
> > >                 madvise((void *)p, SIZE, MADV_PAGEOUT);
> > >                 clock_t end_time = clock();
> > >
> > >                 double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
> > >                 printf("Time taken by reclamation: %f seconds\n", elapsed_time);
> > >
> > >                 munmap((void *)p, SIZE);
> > >         }
> > >         return 0;
> > > }
> > >
> > > Testing results are as below,
> > > w/o patch:
> > > ~ # ./a.out
> > > Time taken by reclamation: 0.007300 seconds
> > > Time taken by reclamation: 0.007226 seconds
> > > Time taken by reclamation: 0.007295 seconds
> > > Time taken by reclamation: 0.007731 seconds
> > > Time taken by reclamation: 0.007134 seconds
> > > Time taken by reclamation: 0.007285 seconds
> > > Time taken by reclamation: 0.007720 seconds
> > > Time taken by reclamation: 0.007128 seconds
> > > Time taken by reclamation: 0.007710 seconds
> > > Time taken by reclamation: 0.007712 seconds
> > > Time taken by reclamation: 0.007236 seconds
> > > Time taken by reclamation: 0.007690 seconds
> > > Time taken by reclamation: 0.007174 seconds
> > > Time taken by reclamation: 0.007670 seconds
> > > Time taken by reclamation: 0.007169 seconds
> > > Time taken by reclamation: 0.007305 seconds
> > > Time taken by reclamation: 0.007432 seconds
> > > Time taken by reclamation: 0.007158 seconds
> > > Time taken by reclamation: 0.007133 seconds
> > > …
> > >
> > > w/ patch
> > >
> > > ~ # ./a.out
> > > Time taken by reclamation: 0.002124 seconds
> > > Time taken by reclamation: 0.002116 seconds
> > > Time taken by reclamation: 0.002150 seconds
> > > Time taken by reclamation: 0.002261 seconds
> > > Time taken by reclamation: 0.002137 seconds
> > > Time taken by reclamation: 0.002173 seconds
> > > Time taken by reclamation: 0.002063 seconds
> > > Time taken by reclamation: 0.002088 seconds
> > > Time taken by reclamation: 0.002169 seconds
> > > Time taken by reclamation: 0.002124 seconds
> > > Time taken by reclamation: 0.002111 seconds
> > > Time taken by reclamation: 0.002224 seconds
> > > Time taken by reclamation: 0.002297 seconds
> > > Time taken by reclamation: 0.002260 seconds
> > > Time taken by reclamation: 0.002246 seconds
> > > Time taken by reclamation: 0.002272 seconds
> > > Time taken by reclamation: 0.002277 seconds
> > > Time taken by reclamation: 0.002462 seconds
> > > …
> > >
> > > This patch significantly speeds up try_to_unmap_one() by allowing it
> > > to skip redirtied THPs without splitting the PMD.
> > >
> > > Suggested-by: Baolin Wang
> > > Suggested-by: Lance Yang
> > > Signed-off-by: Barry Song
> > > ---
> > >  mm/huge_memory.c | 24 +++++++++++++++++-------
> > >  mm/rmap.c        | 13 ++++++++++---
> > >  2 files changed, 27 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 3d3ebdc002d5..47cc8c3f8f80 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3070,8 +3070,12 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> > >          int ref_count, map_count;
> > >          pmd_t orig_pmd = *pmdp;
> > >
> > > -        if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
> > > +        if (pmd_dirty(orig_pmd))
> > > +                folio_set_dirty(folio);
> > > +        if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
> > > +                folio_set_swapbacked(folio);
> > >                  return false;
> > > +        }
> >
> > If either the PMD or the folio is dirty, should we just return false right away,
> > regardless of VM_DROPPABLE? There's no need to proceed further in that
> > case, IMHO ;)
>
> I don't quite understand you, but we need to proceed to clear pmd entry.
> if vm_droppable is true, even if the folio is dirty, we still drop the folio.

Hey Barry,

One thing I still don't quite understand is as follows:

One of the semantics of VM_DROPPABLE is that, under memory pressure, the
kernel can drop the pages. Similarly, for MADV_FREE, one of its semantics
is that the kernel can free the pages when memory pressure occurs, but only
if there is no subsequent write (i.e., the PMD is clean).

So, if VM_DROPPABLE is true, we still drop the folio even if it's dirty. This
seems to conflict with the semantics of MADV_FREE, which requires the folio
or PMD to be clean before being dropped.

(To be concrete, I've put a small sketch of how I read the new dirty
handling at the very end of this mail, below the quoted diff.)

wdyt?

Thanks,
Lance

> >
> > Thanks,
> > Lance
> >
> > >
> > >          orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
> > >
> > > @@ -3098,8 +3102,15 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
> > >           *
> > >           * The only folio refs must be one from isolation plus the rmap(s).
> > >           */
> > > -        if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
> > > -            ref_count != map_count + 1) {
> > > +        if (pmd_dirty(orig_pmd))
> > > +                folio_set_dirty(folio);
> > > +        if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
> > > +                folio_set_swapbacked(folio);
> > > +                set_pmd_at(mm, addr, pmdp, orig_pmd);
> > > +                return false;
> > > +        }
> > > +
> > > +        if (ref_count != map_count + 1) {
> > >                  set_pmd_at(mm, addr, pmdp, orig_pmd);
> > >                  return false;
> > >          }
> > > @@ -3119,12 +3130,11 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
> > >  {
> > >          VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> > >          VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > > +        VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> > > +        VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
> > >          VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
> > >
> > > -        if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
> > > -                return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
> > > -
> > > -        return false;
> > > +        return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
> > >  }
> > >
> > >  static void remap_page(struct folio *folio, unsigned long nr, int flags)
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index be1978d2712d..a859c399ec7c 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1724,9 +1724,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> > >                  }
> > >
> > >                  if (!pvmw.pte) {
> > > -                        if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
> > > -                                                  folio))
> > > -                                goto walk_done;
> > > +                        if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
> > > +                                if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
> > > +                                        goto walk_done;
> > > +                                /*
> > > +                                 * unmap_huge_pmd_locked has either already marked
> > > +                                 * the folio as swap-backed or decided to retain it
> > > +                                 * due to GUP or speculative references.
> > > +                                 */
> > > +                                goto walk_abort;
> > > +                        }
> > >
> > >                          if (flags & TTU_SPLIT_HUGE_PMD) {
> > >                                  /*
> > > --
> > > 2.39.3 (Apple Git-146)
> > >
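
Coming back to my question above: to make sure we're reading the new flow
the same way, here is a minimal sketch of the dirty handling in
__discard_anon_folio_pmd_locked() as I understand it from the hunks quoted
above. It's simplified pseudocode, not the actual kernel code -- the TLB
flush, the ref_count/map_count check and the pmd restore paths are left out:

        /* Sketch only -- condensed from the hunks above. */
        if (pmd_dirty(orig_pmd))
                folio_set_dirty(folio); /* a write through the PMD redirties the folio */

        if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
                /*
                 * Lazyfree (MADV_FREE) folio that was redirtied: mark it
                 * swap-backed again and keep it instead of discarding it.
                 */
                folio_set_swapbacked(folio);
                return false;
        }

        /*
         * We only get here for a clean lazyfree folio, or for a
         * VM_DROPPABLE mapping even when the folio is dirty; the rest of
         * the function then clears the PMD and discards the folio. The
         * dirty VM_DROPPABLE case is the one my question is about.
         */

Hopefully that makes it clearer which case I'm asking about.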