From: Lance Yang
Date: Thu, 11 Apr 2024 20:23:45 +0800
Subject: Re: [PATCH v5 1/2] mm/madvise: optimize lazyfreeing with mTHP in madvise_free
To: Ryan Roberts
Cc: David Hildenbrand, akpm@linux-foundation.org, 21cnbao@gmail.com,
    mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
    shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
    songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240408042437.10951-1-ioworker0@gmail.com>
    <20240408042437.10951-2-ioworker0@gmail.com>
    <38c4add8-53a2-49ca-9f1b-f62c2ee3e764@arm.com>
    <013334d5-62d2-4256-8045-168893a0a0cf@redhat.com>

On Thu, Apr 11, 2024 at 7:27 PM Ryan Roberts wrote:
>
> On 11/04/2024 12:20, David Hildenbrand wrote:
> > On 11.04.24 13:11, Ryan Roberts wrote:
> >> On 08/04/2024 05:24, Lance Yang wrote:
> >>> This patch optimizes
> >>> lazyfreeing with PTE-mapped mTHP[1]
> >>> (Inspired by David Hildenbrand[2]). We aim to avoid unnecessary folio
> >>> splitting if the large folio is fully mapped within the target range.
> >>>
> >>> If a large folio is locked or shared, or if we fail to split it, we just
> >>> leave it in place and advance to the next PTE in the range. But note that
> >>> the behavior is changed; previously, any failure of this sort would cause
> >>> the entire operation to give up. As large folios become more common,
> >>> sticking to the old way could result in wasted opportunities.
> >>>
> >>> On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
> >>> the same size results in the following runtimes for madvise(MADV_FREE) in
> >>> seconds (shorter is better):
> >>>
> >>> Folio Size |   Old    |   New    | Change
> >>> ------------------------------------------
> >>>       4KiB | 0.590251 | 0.590259 |     0%
> >>>      16KiB | 2.990447 | 0.185655 |   -94%
> >>>      32KiB | 2.547831 | 0.104870 |   -95%
> >>>      64KiB | 2.457796 | 0.052812 |   -97%
> >>>     128KiB | 2.281034 | 0.032777 |   -99%
> >>>     256KiB | 2.230387 | 0.017496 |   -99%
> >>>     512KiB | 2.189106 | 0.010781 |   -99%
> >>>    1024KiB | 2.183949 | 0.007753 |   -99%
> >>>    2048KiB | 0.002799 | 0.002804 |     0%
> >>>
> >>> [1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
> >>> [2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com
> >>>
> >>> Signed-off-by: Lance Yang
> >>> ---
> >>>  include/linux/pgtable.h |  34 +++++++++
> >>>  mm/internal.h           |  12 +++-
> >>>  mm/madvise.c            | 149 ++++++++++++++++++++++-----------------
> >>>  mm/memory.c             |   4 +-
> >>>  4 files changed, 129 insertions(+), 70 deletions(-)
> >>>
> >>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> >>> index 0f4b2faa1d71..4dd442787420 100644
> >>> --- a/include/linux/pgtable.h
> >>> +++ b/include/linux/pgtable.h
> >>> @@ -489,6 +489,40 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> >>>  }
> >>>  #endif
> >>> +#ifndef mkold_clean_ptes
> >>> +/**
> >>> + * mkold_clean_ptes - Mark PTEs that map consecutive pages of the same folio
> >>> + *                    as old and clean.
> >>> + * @mm: Address space the pages are mapped into.
> >>> + * @addr: Address the first page is mapped at.
> >>> + * @ptep: Page table pointer for the first entry.
> >>> + * @nr: Number of entries to mark old and clean.
> >>> + *
> >>> + * May be overridden by the architecture; otherwise, implemented by
> >>> + * get_and_clear/modify/set for each pte in the range.
> >>> + *
> >>> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
> >>> + * some PTEs might be write-protected.
> >>> + *
> >>> + * Context: The caller holds the page table lock. The PTEs map consecutive
> >>> + * pages that belong to the same folio. The PTEs are all in the same PMD.
> >>> + */
> >>> +static inline void mkold_clean_ptes(struct mm_struct *mm, unsigned long addr,
> >>> +                                    pte_t *ptep, unsigned int nr)
> >>

Thanks for the suggestions, Ryan, David!

> >> Just thinking out loud, I wonder if it would be cleaner to convert mkold_ptes()
> >> (which I added as part of swap-out) to something like:

Yeah, this is definitely cleaner than before.

> >>
> >> clear_young_dirty_ptes(struct mm_struct *mm, unsigned long addr,
> >>                        pte_t *ptep, unsigned int nr,
> >>                        bool clear_young, bool clear_dirty);
> >>
> >> Then we can use the same function for both use cases and also have the ability
> >> to only clear dirty in future if we ever need it. The other advantage is that we
> >> only need to plumb a single function down the arm64 arch code. As it currently
> >> stands, those 2 functions would be duplicating most of their code.

Agreed. It's indeed a good idea to use a single function for both use cases.

> >
> > Yes. Maybe better use proper __bitwise flags, the compiler should be smart
> > enough to optimize either way.

Nice. I'll use the __bitwise flags as the input.

>
> Agreed.
> I was also thinking perhaps it makes sense to start using output bitwise
> flags for folio_pte_batch() since this patch set takes us up to 3 optional bool
> pointers for different things. Might be cleaner to have input flags to tell it
> what we care about and output flags to highlight those things. I guess the
> compiler should be able to optimize in the same way.
>

Should I start using output bitwise flags for folio_pte_batch() in this
patch set?

Thanks,
Lance
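
For illustration only, and not taken from the posted patch: a minimal
userspace sketch of the flag-based interface discussed above. Everything
here is an assumption made for the example: pte_t is modelled as a plain
integer, the PTE_* bit positions are invented, and the cydp_t type and
CYDP_* flag names are hypothetical. The mm and addr parameters from the
proposed signature are dropped because the sketch has no page tables to
walk. In the kernel the flag type would presumably be a __bitwise typedef
checked by sparse.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's arch-specific pte_t. */
typedef uint64_t pte_t;
#define PTE_YOUNG ((pte_t)1 << 0)	/* assumed bit positions, illustration only */
#define PTE_DIRTY ((pte_t)1 << 1)
#define PTE_WRITE ((pte_t)1 << 2)

/* Hypothetical flag type; a plain integer here, __bitwise in kernel code. */
typedef unsigned int cydp_t;
#define CYDP_CLEAR_YOUNG ((cydp_t)1u << 0)
#define CYDP_CLEAR_DIRTY ((cydp_t)1u << 1)

/*
 * Clear the young and/or dirty bit on nr consecutive entries, as selected
 * by flags. All other bits (e.g. write permission) are left untouched.
 */
static void clear_young_dirty_ptes(pte_t *ptep, unsigned int nr, cydp_t flags)
{
	for (unsigned int i = 0; i < nr; i++) {
		pte_t pte = ptep[i];

		if (flags & CYDP_CLEAR_YOUNG)
			pte &= ~PTE_YOUNG;
		if (flags & CYDP_CLEAR_DIRTY)
			pte &= ~PTE_DIRTY;
		ptep[i] = pte;
	}
}
```

With an interface of this shape, the swap-out case that mkold_ptes() covers
would pass only the clear-young flag, while the lazyfree case that
mkold_clean_ptes() covers would pass both flags.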