From: Lance Yang
Date: Mon, 26 Feb 2024 21:54:07 +0800
Subject: Re: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
To: Ryan Roberts
Cc: akpm@linux-foundation.org, zokeefe@google.com, shy828301@gmail.com,
 david@redhat.com, mhocko@suse.com, wangkefeng.wang@huawei.com,
 songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240225123215.86503-1-ioworker0@gmail.com>

Hey Ryan,

Thanks for taking time to review!

On Mon, Feb 26, 2024 at 9:00 PM Ryan Roberts wrote:
>
> Hi Lance,
>
> Thanks for working on this!
>
>
> On 25/02/2024 12:32, Lance Yang wrote:
> > This patch improves madvise_free_pte_range() to correctly
> > handle large folio that is smaller than PMD-size
>
> When you say "correctly handle" are you implying there is a bug with the current
> implementation or are you just saying you can optimize this to improve
> performance? I'm not convinced there is a bug, but I agree there is certainly
> room for performance improvement.

I agree with your point, and will update the changelog in v2.

Thanks again for your time!

Best,
Lance

>
> Thanks,
> Ryan
>
> > (for example, 16KiB to 1024KiB[1]). It’s probably part of
> > the preparation to support anonymous multi-size THP.
> >
> > Additionally, when the consecutive PTEs are mapped to
> > consecutive pages of the same large folio (mTHP), if the
> > folio is locked before madvise(MADV_FREE) or cannot be
> > split, then all subsequent PTEs within the same PMD will
> > be skipped. However, they should have been MADV_FREEed.
> >
> > Moreover, this patch also optimizes lazyfreeing with
> > PTE-mapped mTHP (Inspired by David Hildenbrand[2]). We
> > aim to avoid unnecessary folio splitting if the large
> > folio is entirely within the given range.
> >
> > On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by
> > PTE-mapped folios of the same size results in the following
> > runtimes for madvise(MADV_FREE) in seconds (shorter is better):
> >
> > Folio Size |   Old    |   New    | Change
> > ----------------------------------------------
> >       4KiB | 0.590251 | 0.590264 |     0%
> >      16KiB | 2.990447 | 0.182167 |   -94%
> >      32KiB | 2.547831 | 0.101622 |   -96%
> >      64KiB | 2.457796 | 0.049726 |   -98%
> >     128KiB | 2.281034 | 0.030109 |   -99%
> >     256KiB | 2.230387 | 0.015838 |   -99%
> >     512KiB | 2.189106 | 0.009149 |   -99%
> >    1024KiB | 2.183949 | 0.006620 |   -99%
> >    2048KiB | 0.002799 | 0.002795 |     0%
> >
> > [1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
> > [2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com/
> >
> > Signed-off-by: Lance Yang
> > ---
> >  mm/madvise.c | 69 +++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 58 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index cfa5e7288261..bcbf56595a2e 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -676,11 +676,43 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >  		 */
> >  		if (folio_test_large(folio)) {
> >  			int err;
> > +			unsigned long next_addr, align;
> >
> > -			if (folio_estimated_sharers(folio) != 1)
> > -				break;
> > -			if (!folio_trylock(folio))
> > -				break;
> > +			if (folio_estimated_sharers(folio) != 1 ||
> > +			    !folio_trylock(folio))
> > +				goto skip_large_folio;
> > +
> > +			align = folio_nr_pages(folio) * PAGE_SIZE;
> > +			next_addr = ALIGN_DOWN(addr + align, align);
> > +
> > +			/*
> > +			 * If we mark only the subpages as lazyfree,
> > +			 * split the large folio.
> > +			 */
> > +			if (next_addr > end || next_addr - addr != align)
> > +				goto split_large_folio;
> > +
> > +			/*
> > +			 * Avoid unnecessary folio splitting if the large
> > +			 * folio is entirely within the given range.
> > +			 */
> > +			folio_test_clear_dirty(folio);
> > +			folio_unlock(folio);
> > +			for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
> > +				ptent = ptep_get(pte);
> > +				if (pte_young(ptent) || pte_dirty(ptent)) {
> > +					ptent = ptep_get_and_clear_full(
> > +						mm, addr, pte, tlb->fullmm);
> > +					ptent = pte_mkold(ptent);
> > +					ptent = pte_mkclean(ptent);
> > +					set_pte_at(mm, addr, pte, ptent);
> > +					tlb_remove_tlb_entry(tlb, pte, addr);
> > +				}
> > +			}
> > +			folio_mark_lazyfree(folio);
> > +			goto next_folio;
> > +
> > +split_large_folio:
> >  			folio_get(folio);
> >  			arch_leave_lazy_mmu_mode();
> >  			pte_unmap_unlock(start_pte, ptl);
> > @@ -688,13 +720,28 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >  			err = split_folio(folio);
> >  			folio_unlock(folio);
> >  			folio_put(folio);
> > -			if (err)
> > -				break;
> > -			start_pte = pte =
> > -				pte_offset_map_lock(mm, pmd, addr, &ptl);
> > -			if (!start_pte)
> > -				break;
> > -			arch_enter_lazy_mmu_mode();
> > +
> > +			/*
> > +			 * If the large folio is locked before madvise(MADV_FREE)
> > +			 * or cannot be split, we just skip it.
> > +			 */
> > +			if (err) {
> > +skip_large_folio:
> > +				if (next_addr >= end)
> > +					break;
> > +				pte += (next_addr - addr) / PAGE_SIZE;
> > +				addr = next_addr;
> > +			}
> > +
> > +			if (!start_pte) {
> > +				start_pte = pte = pte_offset_map_lock(
> > +					mm, pmd, addr, &ptl);
> > +				if (!start_pte)
> > +					break;
> > +				arch_enter_lazy_mmu_mode();
> > +			}
> > +
> > +next_folio:
> >  			pte--;
> >  			addr -= PAGE_SIZE;
> >  			continue;
>