From: Lance Yang
Date: Thu, 18 Apr 2024 20:33:12 +0800
Subject: Re: [PATCH v9 4/4] mm/madvise: optimize lazyfreeing with mTHP in madvise_free
To: David Hildenbrand
Cc: akpm@linux-foundation.org, ryan.roberts@arm.com, 21cnbao@gmail.com,
 mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
 shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
 songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <2fdcee93-b8ad-4374-a8ab-7c7bed463813@redhat.com>
References: <20240418105750.98866-1-ioworker0@gmail.com>
 <20240418105750.98866-5-ioworker0@gmail.com>
 <2fdcee93-b8ad-4374-a8ab-7c7bed463813@redhat.com>

On Thu, Apr 18, 2024 at 8:03 PM David Hildenbrand wrote:
>
> On 18.04.24 12:57, Lance Yang wrote:
> > This patch optimizes lazyfreeing with PTE-mapped mTHP[1]
> > (Inspired by David Hildenbrand[2]). We aim to avoid unnecessary folio
> > splitting if the large folio is fully mapped within the target range.
> >
> > If a large folio is locked or shared, or if we fail to split it, we just
> > leave it in place and advance to the next PTE in the range. But note that
> > the behavior is changed; previously, any failure of this sort would cause
> > the entire operation to give up. As large folios become more common,
> > sticking to the old way could result in wasted opportunities.
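The behavior change described above can be illustrated with a small userspace model of the PTE walk. This is a hypothetical sketch, not kernel code: `map[]`, `struct model_folio`, `batch_len()` and `model_walk()` are invented names, and the `busy` flag stands in for a folio that is likely shared, cannot be trylocked, or fails to split. It assumes the walk advances by the batch length `nr` on each iteration, which is what the `nr = 0` reset in the quoted hunk suggests, and it models only the large-folio branch of that hunk; checks made elsewhere in `madvise_free_pte_range()` are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: map[i] is the index of the folio that PTE i maps. */
struct model_folio {
	int nr_pages;	/* folio size in pages */
	bool busy;	/* stands in for: likely shared, trylock or split failed */
	bool split;	/* set once the model has split this folio */
};

struct walk_result {
	int lazyfreed;	/* pages marked lazyfree */
	int splits;	/* folios split */
};

/* Count consecutive PTEs from index i (below end) that map the same folio. */
static int batch_len(const int *map, int i, int end)
{
	int n = 1;

	assert(i < end);
	while (i + n < end && map[i + n] == map[i])
		n++;
	return n;
}

/*
 * Walk PTEs [0, end). With new_behaviour == false, every large folio is
 * split and the walk gives up at the first busy one. With true, only
 * folios not fully mapped in the range are split, and busy ones are
 * skipped batch by batch.
 */
static struct walk_result model_walk(const int *map, int end,
				     struct model_folio *folios,
				     bool new_behaviour)
{
	struct walk_result r = { 0, 0 };
	int i = 0;

	while (i < end) {
		struct model_folio *f = &folios[map[i]];
		int nr = 1;

		if (f->nr_pages > 1 && !f->split) {
			nr = batch_len(map, i, end);
			if (!new_behaviour || nr < f->nr_pages) {
				if (f->busy) {
					if (!new_behaviour)
						break;	/* old: give up on the walk */
					i += nr;	/* new: skip this batch */
					continue;
				}
				f->split = true;	/* revisit as small pages */
				r.splits++;
				continue;
			}
		}
		r.lazyfreed += nr;	/* whole folio, or one small page */
		i += nr;
	}
	return r;
}
```

For example, with `map[] = {0, 0, 0, 0, 1, 1, 2, 2}`, where folio 0 has 4 pages, folio 1 has 4 pages (only 2 in range) and is busy, and folio 2 has 2 pages, the old rules split folio 0 and then give up at folio 1 (4 pages lazyfreed, 1 split), while the new rules lazyfree 6 pages without splitting anything.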
> >
> > On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
> > the same size results in the following runtimes for madvise(MADV_FREE) in
> > seconds (shorter is better):
> >
> > Folio Size |   Old    |   New    | Change
> > ------------------------------------------
> >       4KiB | 0.590251 | 0.590259 |     0%
> >      16KiB | 2.990447 | 0.185655 |   -94%
> >      32KiB | 2.547831 | 0.104870 |   -95%
> >      64KiB | 2.457796 | 0.052812 |   -97%
> >     128KiB | 2.281034 | 0.032777 |   -99%
> >     256KiB | 2.230387 | 0.017496 |   -99%
> >     512KiB | 2.189106 | 0.010781 |   -99%
> >    1024KiB | 2.183949 | 0.007753 |   -99%
> >    2048KiB | 0.002799 | 0.002804 |     0%
> >
> > [1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
> > [2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com
> >
> > Reviewed-by: Ryan Roberts
> > Signed-off-by: Lance Yang
> > ---
> >  mm/madvise.c | 85 +++++++++++++++++++++++++++-------------------------
> >  1 file changed, 44 insertions(+), 41 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 4597a3568e7e..375ab3234603 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -643,6 +643,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >  				unsigned long end, struct mm_walk *walk)
> >
> >  {
> > +	const cydp_t cydp_flags = CYDP_CLEAR_YOUNG | CYDP_CLEAR_DIRTY;
> >  	struct mmu_gather *tlb = walk->private;
> >  	struct mm_struct *mm = tlb->mm;
> >  	struct vm_area_struct *vma = walk->vma;
> > @@ -697,44 +698,57 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >  			continue;
> >
> >  		/*
> > -		 * If pmd isn't transhuge but the folio is large and
> > -		 * is owned by only this process, split it and
> > -		 * deactivate all pages.
> > +		 * If we encounter a large folio, only split it if it is not
> > +		 * fully mapped within the range we are operating on. Otherwise
> > +		 * leave it as is so that it can be marked as lazyfree. If we
> > +		 * fail to split a folio, leave it in place and advance to the
> > +		 * next pte in the range.
> >  		 */
> >  		if (folio_test_large(folio)) {
> > -			int err;
> > +			bool any_young, any_dirty;
> >
> > -			if (folio_likely_mapped_shared(folio))
> > -				break;
> > -			if (!folio_trylock(folio))
> > -				break;
> > -			folio_get(folio);
> > -			arch_leave_lazy_mmu_mode();
> > -			pte_unmap_unlock(start_pte, ptl);
> > -			start_pte = NULL;
> > -			err = split_folio(folio);
> > -			folio_unlock(folio);
> > -			folio_put(folio);
> > -			if (err)
> > -				break;
> > -			start_pte = pte =
> > -				pte_offset_map_lock(mm, pmd, addr, &ptl);
> > -			if (!start_pte)
> > -				break;
> > -			arch_enter_lazy_mmu_mode();
> > -			pte--;
> > -			addr -= PAGE_SIZE;
> > -			continue;
> > +			nr = madvise_folio_pte_batch(addr, end, folio, pte,
> > +						     ptent, &any_young, NULL);
> > +
> > +			if (nr < folio_nr_pages(folio)) {
> > +				int err;
> > +
> > +				if (folio_likely_mapped_shared(folio))
> > +					continue;
> > +				if (!folio_trylock(folio))
> > +					continue;
> > +				folio_get(folio);
> > +				arch_leave_lazy_mmu_mode();
> > +				pte_unmap_unlock(start_pte, ptl);
> > +				start_pte = NULL;
> > +				err = split_folio(folio);
> > +				folio_unlock(folio);
> > +				folio_put(folio);
> > +				start_pte = pte =
> > +					pte_offset_map_lock(mm, pmd, addr, &ptl);
>
> I'd just put it on a single line.

start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);

I suddenly realized that putting it on a single line would exceed the
80-char limit.

Or:

start_pte = pte = pte_offset_map_lock(
	mm, pmd, addr, &ptl);

Thanks,
Lance

>
> > +				if (!start_pte)
> > +					break;
> > +				arch_enter_lazy_mmu_mode();
> > +				if (!err)
> > +					nr = 0;
> > +				continue;
> > +			}
> > +
> > +			if (any_young)
> > +				ptent = pte_mkyoung(ptent);
> > +			if (any_dirty)
>
> any_dirty is never set, likely missed to pass it to
> madvise_folio_pte_batch().
>
> Apart from that LGTM and this patch is much easier to review now!
>
>
> With above:
>
> Acked-by: David Hildenbrand
>
> --
> Cheers,
>
> David / dhildenb
>
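The runtimes quoted in the commit message come from timing madvise(MADV_FREE) over a 1GiB VMA. A minimal userspace harness for that kind of measurement might look like the following; it is a sketch, not the author's benchmark, and `time_madv_free()` is a hypothetical name. MADV_FREE needs Linux 4.5 or later. Which folio size backs the mapping is decided by the kernel; this code assumes it is configured beforehand through the per-size mTHP sysfs controls (for example `/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled`) and does not touch them itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Map len bytes of private anonymous memory, fault every page in, then
 * time a single madvise(MADV_FREE) over the whole range. Returns the
 * elapsed time in seconds, or -1.0 if any step fails.
 */
static double time_madv_free(size_t len)
{
	struct timespec t0, t1;
	double secs;
	char *buf;

	assert(len > 0);
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1.0;

	memset(buf, 1, len);	/* populate (and dirty) every page */

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0 ||
	    madvise(buf, len, MADV_FREE) != 0 ||
	    clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
		munmap(buf, len);
		return -1.0;
	}

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	munmap(buf, len);
	return secs;
}
```

Calling `time_madv_free((size_t)1 << 30)` matches the 1GiB size in the commit message; repeating it under different mTHP size settings, on kernels with and without the patch, would produce measurements comparable to the quoted table.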