From: Barry Song <21cnbao@gmail.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>,
Vernon Yang <vernon2gm@gmail.com>,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
ziy@nvidia.com, dev.jain@arm.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Vernon Yang <yanglincheng@kylinos.cn>
Subject: Re: [PATCH mm-new v7 4/5] mm: khugepaged: skip lazy-free folios
Date: Sun, 8 Feb 2026 06:01:51 +0800
Message-ID: <CAGsJ_4yEfgipUe37_k5rArrYMPY_31JUKQGjRk+NNJTK9QhBWQ@mail.gmail.com>
In-Reply-To: <c1280171-b9e7-4f10-adb5-b6a8ed69e54b@kernel.org>
On Sun, Feb 8, 2026 at 5:38 AM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 2/7/26 14:51, Lance Yang wrote:
> >
> >
> > On 2026/2/7 16:34, Barry Song wrote:
> >> On Sat, Feb 7, 2026 at 4:16 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >>>
> >>> From: Vernon Yang <yanglincheng@kylinos.cn>
> >>>
> >>> For example, create three tasks: hot1 -> cold -> hot2. After all three
> >>> tasks are created, each allocates 128 MB of memory. The hot1/hot2 tasks
> >>> continuously access their 128 MB of memory, while the cold task only
> >>> accesses its memory briefly and then calls madvise(MADV_FREE). However,
> >>> khugepaged still prioritizes scanning the cold task and only scans the
> >>> hot2 task after completing the scan of the cold task.
> >>>
> >>> And if we collapse a lazyfree page, its content will never become none,
> >>> and the deferred shrinker cannot reclaim it.
> >>>
> >>> So if the user has explicitly told us via MADV_FREE that this memory
> >>> will be freed, it is appropriate for khugepaged to simply skip it,
> >>> thereby avoiding unnecessary scan and collapse operations and reducing
> >>> wasted CPU time.
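A minimal userspace sketch of the "cold" task in this scenario, assuming Linux 4.5 or later (where MADV_FREE exists) and glibc. The helper name run_cold_task() and its length parameter are illustrative only; the actual benchmark behind the numbers below is not shown in the thread.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper modelling the "cold" task: allocate an anonymous
 * region, touch it briefly, then tell the kernel via MADV_FREE that the
 * contents may be discarded lazily. Returns 0 on success, -1 on failure.
 */
int run_cold_task(size_t len)
{
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Brief access: write every page so it is actually faulted in. */
	memset(buf, 0x5a, len);

	/*
	 * Mark the range lazy-free. The kernel may now reclaim these pages
	 * without swapping them out, unless they are written again first.
	 */
	int ret = madvise(buf, len, MADV_FREE);

	/*
	 * The cold task in the description keeps its mapping alive; this
	 * sketch unmaps it so it does not leak when used as a helper.
	 */
	munmap(buf, len);
	return ret;
}
```

In the scenario above, such a region would be left mapped, so khugepaged would still find its (now lazy-free) folios while scanning.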
> >>>
> >>> Here are the performance test results:
> >>> (For Throughput, bigger is better; for the other metrics, smaller is better.)
> >>>
> >>> Testing on x86_64 machine:
> >>>
> >>> | task hot2 | without patch | with patch | delta |
> >>> |---------------------|---------------|---------------|---------|
> >>> | total accesses time | 3.14 sec | 2.93 sec | -6.69% |
> >>> | cycles per access | 4.96 | 2.21 | -55.44% |
> >>> | Throughput | 104.38 M/sec | 111.89 M/sec | +7.19% |
> >>> | dTLB-load-misses | 284814532 | 69597236 | -75.56% |
> >>>
> >>> Testing on qemu-system-x86_64 -enable-kvm:
> >>>
> >>> | task hot2 | without patch | with patch | delta |
> >>> |---------------------|---------------|---------------|---------|
> >>> | total accesses time | 3.35 sec | 2.96 sec | -11.64% |
> >>> | cycles per access | 7.29 | 2.07 | -71.60% |
> >>> | Throughput | 97.67 M/sec | 110.77 M/sec | +13.41% |
> >>> | dTLB-load-misses | 241600871 | 3216108 | -98.67% |
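The delta column is consistent with a plain relative change from the "without patch" value to the "with patch" value; for example, (2.93 - 3.14) / 3.14 is about -6.69%, matching the first row. A small helper (the name pct_delta() is hypothetical) that reproduces the column:

```c
/*
 * Relative change in percent, as in the "delta" column:
 * (with_patch - without_patch) / without_patch * 100.
 * For "smaller is better" metrics a negative delta is an improvement;
 * for Throughput a positive delta is an improvement.
 */
double pct_delta(double without_patch, double with_patch)
{
	return (with_patch - without_patch) / without_patch * 100.0;
}
```

For example, pct_delta(284814532.0, 69597236.0) gives about -75.56, the dTLB-load-misses row of the x86_64 table.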
> >>>
> >>> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> >>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
> >>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> >>> ---
> >>> include/trace/events/huge_memory.h | 1 +
> >>> mm/khugepaged.c | 13 +++++++++++++
> >>> 2 files changed, 14 insertions(+)
> >>>
> >>> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> >>> index 384e29f6bef0..bcdc57eea270 100644
> >>> --- a/include/trace/events/huge_memory.h
> >>> +++ b/include/trace/events/huge_memory.h
> >>> @@ -25,6 +25,7 @@
> >>> EM( SCAN_PAGE_LRU, "page_not_in_lru") \
> >>> EM( SCAN_PAGE_LOCK, "page_locked") \
> >>> EM( SCAN_PAGE_ANON, "page_not_anon") \
> >>> + EM( SCAN_PAGE_LAZYFREE, "page_lazyfree") \
> >>> EM( SCAN_PAGE_COMPOUND, "page_compound") \
> >>> EM( SCAN_ANY_PROCESS, "no_process_for_page") \
> >>> EM( SCAN_VMA_NULL, "vma_null") \
> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index 8b68ae3bc2c5..0d160e612e16 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -46,6 +46,7 @@ enum scan_result {
> >>> SCAN_PAGE_LRU,
> >>> SCAN_PAGE_LOCK,
> >>> SCAN_PAGE_ANON,
> >>> + SCAN_PAGE_LAZYFREE,
> >>> SCAN_PAGE_COMPOUND,
> >>> SCAN_ANY_PROCESS,
> >>> SCAN_VMA_NULL,
> >>> @@ -583,6 +584,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >>> folio = page_folio(page);
> >>> VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
> >>>
> >>> + if (cc->is_khugepaged && !pte_dirty(pteval) &&
> >>> + folio_test_lazyfree(folio)) {
> >>
> >> We have two corner cases here:
> >
> > Good catch!
> >
> >>
> >> 1. If the VMA has the VM_DROPPABLE flag, a lazyfree folio may still be
> >> dropped, even when its PTE is dirty.
>
> Good point!
>
> >
> > Right. When the VMA has VM_DROPPABLE, we would drop the lazyfree folio
> > regardless of whether it (or the PTE) is dirty in try_to_unmap_one().
> >
> > So, IMHO, we could go with:
> >
> > cc->is_khugepaged && folio_test_lazyfree(folio) &&
> > (!pte_dirty(pteval) || (vma->vm_flags & VM_DROPPABLE))
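A userspace model of the state that this check inspects, and of Lance's variant. The struct and function names are hypothetical stand-ins for cc->is_khugepaged, vma->vm_flags & VM_DROPPABLE, folio_test_lazyfree() and pte_dirty(); the reclaim rule is a simplification of the try_to_unmap_one() behaviour described above, not a copy of kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state used by the check. */
struct collapse_state {
	bool is_khugepaged;	/* cc->is_khugepaged (false for MADV_COLLAPSE) */
	bool vma_droppable;	/* vma->vm_flags & VM_DROPPABLE */
	bool folio_lazyfree;	/* folio_test_lazyfree(folio) */
	bool pte_dirty;		/* pte_dirty(pteval) */
};

/*
 * Simplified reclaim rule from the thread: a lazyfree folio is discarded
 * if its PTE is clean, or regardless of dirtiness in a VM_DROPPABLE VMA.
 */
bool reclaim_would_discard(const struct collapse_state *s)
{
	return s->folio_lazyfree && (!s->pte_dirty || s->vma_droppable);
}

/* Lance's proposal: khugepaged skips exactly what reclaim would discard. */
bool lance_should_skip(const struct collapse_state *s)
{
	return s->is_khugepaged && reclaim_would_discard(s);
}
```

Expressing the skip condition in terms of the reclaim rule makes the intent explicit: do not spend collapse effort on folios that reclaim may drop anyway.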
>
> Hm. In a VM_DROPPABLE mapping all folios should be marked as lazy-free
> (see folio_add_new_anon_rmap()).
>
> The new (collapsed) folio will also be marked lazy-free (due to
> folio_add_new_anon_rmap()) and can just get dropped at any time.
>
> So likely we should just not skip collapse for lazyfree folios in
> VM_DROPPABLE mappings?
Maybe change “just not skip” to “just skip”?
If the goal is to avoid the collapse overhead for folios that are
about to be dropped, we might consider skipping collapse for the
entire VMA?
>
> if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
> ...
> }
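The same kind of hypothetical model for David's variant, plus the VMA-wide skip Barry raises as a question. Unlike Lance's version, David's check never skips inside a VM_DROPPABLE mapping, on the reasoning that every folio there is lazy-free anyway and the collapsed folio will be too. All names below are illustrative, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state used by the check. */
struct collapse_state {
	bool is_khugepaged;	/* cc->is_khugepaged */
	bool vma_droppable;	/* vma->vm_flags & VM_DROPPABLE */
	bool folio_lazyfree;	/* folio_test_lazyfree(folio) */
	bool pte_dirty;		/* pte_dirty(pteval) */
};

/*
 * David's proposal: outside VM_DROPPABLE mappings, khugepaged skips
 * clean lazyfree folios; inside them, it never skips per folio.
 */
bool david_should_skip(const struct collapse_state *s)
{
	return s->is_khugepaged && !s->vma_droppable &&
	       s->folio_lazyfree && !s->pte_dirty;
}

/*
 * Barry's alternative: have khugepaged skip a VM_DROPPABLE VMA as a
 * whole instead of deciding folio by folio.
 */
bool vma_level_skip(const struct collapse_state *s)
{
	return s->is_khugepaged && s->vma_droppable;
}
```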
Thanks
Barry