From: Lance Yang
Date: Thu, 18 Apr 2024 20:48:12 +0800
Subject: Re: [PATCH v9 4/4] mm/madvise: optimize lazyfreeing with mTHP in madvise_free
To: David Hildenbrand
Cc: akpm@linux-foundation.org, ryan.roberts@arm.com, 21cnbao@gmail.com,
    mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
    shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
    songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240418105750.98866-1-ioworker0@gmail.com>
    <20240418105750.98866-5-ioworker0@gmail.com>
    <2fdcee93-b8ad-4374-a8ab-7c7bed463813@redhat.com>
    <89b534ab-ce9f-4a8a-984c-8460f686980d@redhat.com>
In-Reply-To: <89b534ab-ce9f-4a8a-984c-8460f686980d@redhat.com>

On Thu, Apr 18, 2024 at 8:44 PM David Hildenbrand wrote:
>
> On 18.04.24 14:33, Lance Yang wrote:
> > On Thu, Apr 18, 2024 at 8:03 PM David Hildenbrand wrote:
> >>
> >> On 18.04.24 12:57, Lance Yang wrote:
> >>> This patch optimizes lazyfreeing with PTE-mapped mTHP[1]
> >>> (Inspired by David Hildenbrand[2]). We aim to avoid unnecessary folio
> >>> splitting if the large folio is fully mapped within the target range.
> >>>
> >>> If a large folio is locked or shared, or if we fail to split it, we just
> >>> leave it in place and advance to the next PTE in the range. But note that
> >>> the behavior is changed; previously, any failure of this sort would cause
> >>> the entire operation to give up. As large folios become more common,
> >>> sticking to the old way could result in wasted opportunities.
> >>>
> >>> On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
> >>> the same size results in the following runtimes for madvise(MADV_FREE) in
> >>> seconds (shorter is better):
> >>>
> >>> Folio Size |   Old    |   New    | Change
> >>> ------------------------------------------
> >>>       4KiB | 0.590251 | 0.590259 |     0%
> >>>      16KiB | 2.990447 | 0.185655 |   -94%
> >>>      32KiB | 2.547831 | 0.104870 |   -95%
> >>>      64KiB | 2.457796 | 0.052812 |   -97%
> >>>     128KiB | 2.281034 | 0.032777 |   -99%
> >>>     256KiB | 2.230387 | 0.017496 |   -99%
> >>>     512KiB | 2.189106 | 0.010781 |   -99%
> >>>    1024KiB | 2.183949 | 0.007753 |   -99%
> >>>    2048KiB | 0.002799 | 0.002804 |     0%
> >>>
> >>> [1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
> >>> [2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com
> >>>
> >>> Reviewed-by: Ryan Roberts
> >>> Signed-off-by: Lance Yang
> >>> ---
> >>>   mm/madvise.c | 85 +++++++++++++++++++++++++++-------------------------
> >>>   1 file changed, 44 insertions(+), 41 deletions(-)
> >>>
> >>> diff --git a/mm/madvise.c b/mm/madvise.c
> >>> index 4597a3568e7e..375ab3234603 100644
> >>> --- a/mm/madvise.c
> >>> +++ b/mm/madvise.c
> >>> @@ -643,6 +643,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>>                                   unsigned long end, struct mm_walk *walk)
> >>>
> >>>   {
> >>> +       const cydp_t cydp_flags = CYDP_CLEAR_YOUNG | CYDP_CLEAR_DIRTY;
> >>>         struct mmu_gather *tlb = walk->private;
> >>>         struct mm_struct *mm = tlb->mm;
> >>>         struct vm_area_struct *vma = walk->vma;
> >>> @@ -697,44 +698,57 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>>                         continue;
> >>>
> >>>                 /*
> >>> -                * If pmd isn't transhuge but the folio is large and
> >>> -                * is owned by only this process, split it and
> >>> -                * deactivate all pages.
> >>> +                * If we encounter a large folio, only split it if it is not
> >>> +                * fully mapped within the range we are operating on. Otherwise
> >>> +                * leave it as is so that it can be marked as lazyfree. If we
> >>> +                * fail to split a folio, leave it in place and advance to the
> >>> +                * next pte in the range.
> >>>                  */
> >>>                 if (folio_test_large(folio)) {
> >>> -                       int err;
> >>> +                       bool any_young, any_dirty;
> >>>
> >>> -                       if (folio_likely_mapped_shared(folio))
> >>> -                               break;
> >>> -                       if (!folio_trylock(folio))
> >>> -                               break;
> >>> -                       folio_get(folio);
> >>> -                       arch_leave_lazy_mmu_mode();
> >>> -                       pte_unmap_unlock(start_pte, ptl);
> >>> -                       start_pte = NULL;
> >>> -                       err = split_folio(folio);
> >>> -                       folio_unlock(folio);
> >>> -                       folio_put(folio);
> >>> -                       if (err)
> >>> -                               break;
> >>> -                       start_pte = pte =
> >>> -                               pte_offset_map_lock(mm, pmd, addr, &ptl);
> >>> -                       if (!start_pte)
> >>> -                               break;
> >>> -                       arch_enter_lazy_mmu_mode();
> >>> -                       pte--;
> >>> -                       addr -= PAGE_SIZE;
> >>> -                       continue;
> >>> +                       nr = madvise_folio_pte_batch(addr, end, folio, pte,
> >>> +                                                    ptent, &any_young, NULL);
> >>> +
> >>> +                       if (nr < folio_nr_pages(folio)) {
> >>> +                               int err;
> >>> +
> >>> +                               if (folio_likely_mapped_shared(folio))
> >>> +                                       continue;
> >>> +                               if (!folio_trylock(folio))
> >>> +                                       continue;
> >>> +                               folio_get(folio);
> >>> +                               arch_leave_lazy_mmu_mode();
> >>> +                               pte_unmap_unlock(start_pte, ptl);
> >>> +                               start_pte = NULL;
> >>> +                               err = split_folio(folio);
> >>> +                               folio_unlock(folio);
> >>> +                               folio_put(folio);
> >>> +                               start_pte = pte =
> >>> +                                       pte_offset_map_lock(mm, pmd, addr, &ptl);
> >>
> >> I'd just put it on a single line.
> >
> > start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> >
> > I suddenly realized that putting it on a single line would exceed the
> > 80-char limit.
>
> Which is fine according to Documentation/process/coding-style.rst
>
> ... as long as it aids readability.
>
> Alternatively, the following might do:
>
> pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> start_pte = pte;

Yep, I understood.

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
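
For readers following the quoted patch, the split decision it describes
(split a large folio only when fewer of its pages fall inside the madvise
range than the folio contains) can be modelled in plain userspace C. This is
only a toy sketch under stated assumptions, not the kernel code: struct
toy_folio, toy_pte_batch(), toy_needs_split() and the 4 KiB PAGE_SIZE are
hypothetical names and values invented for illustration, and the model
ignores PTE contents, locking and sharing entirely.

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Hypothetical stand-in for a large folio mapped by consecutive PTEs. */
struct toy_folio {
	unsigned long start;	/* address where the folio's first page is mapped */
	unsigned long nr_pages;	/* number of pages in the folio */
};

/*
 * Toy analogue of the batching step: count the folio's pages that are
 * mapped consecutively from addr and still lie inside [addr, end).
 * Returns 0 if addr is not inside the folio or the range is empty.
 */
static inline unsigned long toy_pte_batch(unsigned long addr,
					  unsigned long end,
					  const struct toy_folio *folio)
{
	unsigned long folio_end = folio->start + folio->nr_pages * PAGE_SIZE;
	unsigned long limit = end < folio_end ? end : folio_end;

	if (addr < folio->start || addr >= limit)
		return 0;
	return (limit - addr) / PAGE_SIZE;
}

/*
 * Mirrors the "nr < folio_nr_pages(folio)" test in the quoted hunk: a
 * split is only considered when the range does not cover the whole folio.
 */
static inline bool toy_needs_split(unsigned long addr, unsigned long end,
				   const struct toy_folio *folio)
{
	return toy_pte_batch(addr, end, folio) < folio->nr_pages;
}
```

With a 16-page (64KiB) folio mapped at 0x100000, a range of
[0x100000, 0x110000) covers it completely, so no split is needed, while a
range that stops at 0x108000 or starts at 0x104000 covers only part of it
and would take the split path.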