From: Vernon Yang <vernon2gm@gmail.com>
To: akpm@linux-foundation.org, david@kernel.org, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, npache@redhat.com, baohua@kernel.org,
lance.yang@linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Vernon Yang <yanglincheng@kylinos.cn>
Subject: [PATCH 3/4] mm: khugepaged: move mm to list tail when MADV_COLD/MADV_FREE
Date: Mon, 15 Dec 2025 17:04:18 +0800
Message-ID: <20251215090419.174418-4-yanglincheng@kylinos.cn>
In-Reply-To: <20251215090419.174418-1-yanglincheng@kylinos.cn>
For example, create three tasks: hot1 -> cold -> hot2. After all three
tasks are created, each allocates 128 MB of memory. The hot1/hot2 tasks
continuously access their 128 MB of memory, while the cold task only
accesses its memory briefly and then calls madvise(MADV_COLD). However,
khugepaged still scans the cold task first and only reaches the hot2
task after it has finished scanning the cold task.
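
The scenario above can be approximated from user space with a sketch like
the following. It assumes Linux 5.4 or later (for MADV_COLD) and THP set to
"always" so that khugepaged tracks the mm; cold_task() and touch_pages() are
hypothetical helpers, not the author's test program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* uapi value, available since Linux 5.4 */
#endif

/* Write one byte per page so that every page of buf is faulted in. */
static void touch_pages(volatile char *buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(page > 0);
	for (size_t off = 0; off < len; off += (size_t)page)
		buf[off]++;
}

/*
 * Hypothetical model of the "cold" task: map len bytes of anonymous
 * memory, access it once, then tell the kernel it is cold. The mapping
 * is returned (and kept) so that khugepaged would still see it; on
 * failure NULL is returned with errno set.
 */
static char *cold_task(size_t len)
{
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int saved_errno;

	if (buf == MAP_FAILED)
		return NULL;

	touch_pages(buf, len);				/* brief access phase */
	if (madvise(buf, len, MADV_COLD) != 0) {	/* the cold hint */
		saved_errno = errno;
		munmap(buf, len);
		errno = saved_errno;
		return NULL;
	}
	return buf;
}
```

In the full scenario each task would map 128 MB ((size_t)128 << 20), and
the hot tasks would call touch_pages() in an endless loop instead of
issuing the hint.
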
So if the user has explicitly told us via MADV_COLD/MADV_FREE that this
memory is cold or will be freed, it is appropriate for khugepaged to
scan it as late as possible, thereby avoiding unnecessary scan and
collapse operations and reducing wasted CPU time.
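
To make the reordering concrete, here is a user-space model of the scan
list (not kernel code). It assumes khugepaged walks its mm_slot list from
head to tail and that, as in the patch's khugepaged_move_tail(), the slot
the scanner currently sits on (khugepaged_scan.mm_slot) is never moved.
struct slot, struct scan_state, move_tail() and scan_order() are
hypothetical names, and the list helpers only mimic <linux/list.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list, mimicking <linux/list.h> (user-space model). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry and re-insert it just before head, i.e. at the tail. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_add_tail(entry, head);
}

/* Hypothetical stand-in for struct mm_slot: one node per registered mm. */
struct slot {
	int id;
	struct list_head node;
};

/* Hypothetical stand-in for khugepaged_scan: list head plus scan cursor. */
struct scan_state {
	struct list_head head;
	struct slot *cursor;	/* slot being scanned right now, or NULL */
};

/* Same condition as the patch: skip a NULL slot and the current cursor. */
static void move_tail(struct scan_state *s, struct slot *slot)
{
	if (slot && s->cursor != slot)
		list_move_tail(&slot->node, &s->head);
}

/* Store slot ids in scan (head-to-tail) order; return how many. */
static int scan_order(struct scan_state *s, int *out, int max)
{
	int n = 0;

	for (struct list_head *p = s->head.next; p != &s->head && n < max;
	     p = p->next)
		out[n++] = container_of(p, struct slot, node)->id;
	return n;
}
```

Registering hot1, cold and hot2 in that order and then calling
move_tail() on cold yields the scan order hot1, hot2, cold. Our reading
of why the cursor is skipped: the scanner picks the next mm from the
cursor's list position, so moving the cursor itself to the tail would
make the walk wrap around early.
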
Here are the performance test results:
(For Throughput, bigger is better; for the other metrics, smaller is better.)
Testing on an x86_64 machine:
| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total access time   | 3.14 sec      | 2.92 sec      | -7.01%  |
| cycles per access   | 4.91          | 2.07          | -57.84% |
| Throughput          | 104.38 M/sec  | 112.12 M/sec  | +7.42%  |
| dTLB-load-misses    | 288966432     | 1292908       | -99.55% |
Testing on qemu-system-x86_64 -enable-kvm:
| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total access time   | 3.35 sec      | 2.96 sec      | -11.64% |
| cycles per access   | 7.23          | 2.12          | -70.68% |
| Throughput          | 97.88 M/sec   | 110.76 M/sec  | +13.16% |
| dTLB-load-misses    | 237406497     | 3189194       | -98.66% |
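
The delta column appears to be the relative change of the patched run
against the unpatched one, (with - without) / without * 100. This is our
reading rather than something stated above, but it reproduces every delta
in both tables. A small helper (delta_pct() is a hypothetical name) makes
the arithmetic explicit:

```c
#include <assert.h>

/* Relative change, in percent, from the unpatched to the patched run. */
static double delta_pct(double without_patch, double with_patch)
{
	assert(without_patch != 0.0);
	return (with_patch - without_patch) / without_patch * 100.0;
}
```

For example, delta_pct(3.14, 2.92) is about -7.006, which rounds to the
-7.01% in the first row of the x86_64 table.
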
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
include/linux/khugepaged.h | 1 +
mm/khugepaged.c | 14 ++++++++++++++
mm/madvise.c | 3 +++
3 files changed, 18 insertions(+)
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index eb1946a70cff..726e99de84e9 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -15,6 +15,7 @@ extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern void khugepaged_enter_vma(struct vm_area_struct *vma,
 				 vm_flags_t vm_flags);
+void khugepaged_move_tail(struct mm_struct *mm);
 extern void khugepaged_min_free_kbytes_update(void);
 extern bool current_is_khugepaged(void);
 extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1ec1af5be3c8..91836dda2015 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -468,6 +468,20 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 	}
 }
 
+void khugepaged_move_tail(struct mm_struct *mm)
+{
+	struct mm_slot *slot;
+
+	if (!mm_flags_test(MMF_VM_HUGEPAGE, mm))
+		return;
+
+	spin_lock(&khugepaged_mm_lock);
+	slot = mm_slot_lookup(mm_slots_hash, mm);
+	if (slot && khugepaged_scan.mm_slot != slot)
+		list_move_tail(&slot->mm_node, &khugepaged_scan.mm_head);
+	spin_unlock(&khugepaged_mm_lock);
+}
+
 void __khugepaged_exit(struct mm_struct *mm)
 {
 	struct mm_slot *slot;
diff --git a/mm/madvise.c b/mm/madvise.c
index fb1c86e630b6..3f9ca7af2c82 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -608,6 +608,8 @@ static long madvise_cold(struct madvise_behavior *madv_behavior)
 	madvise_cold_page_range(&tlb, madv_behavior);
 	tlb_finish_mmu(&tlb);
 
+	khugepaged_move_tail(vma->vm_mm);
+
 	return 0;
 }
 
@@ -835,6 +837,7 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior)
 			&walk_ops, tlb);
 	tlb_end_vma(tlb, vma);
 	mmu_notifier_invalidate_range_end(&range);
+	khugepaged_move_tail(mm);
 	return 0;
 }
 
--
2.51.0
Thread overview: 42+ messages
2025-12-15 9:04 [PATCH 0/4] Improve khugepaged scan logic Vernon Yang
2025-12-15 9:04 ` [PATCH 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event Vernon Yang
2025-12-18 9:24 ` David Hildenbrand (Red Hat)
2025-12-19 5:21 ` Vernon Yang
2025-12-15 9:04 ` [PATCH 2/4] mm: khugepaged: remove mm when all memory has been collapsed Vernon Yang
2025-12-15 11:52 ` Lance Yang
2025-12-16 6:27 ` Vernon Yang
2025-12-15 21:45 ` kernel test robot
2025-12-16 6:30 ` Vernon Yang
2025-12-15 23:01 ` kernel test robot
2025-12-16 6:32 ` Vernon Yang
2025-12-17 3:31 ` Wei Yang
2025-12-18 3:27 ` Vernon Yang
2025-12-18 3:48 ` Wei Yang
2025-12-18 4:41 ` Vernon Yang
2025-12-18 9:29 ` David Hildenbrand (Red Hat)
2025-12-19 5:24 ` Vernon Yang
2025-12-19 9:00 ` David Hildenbrand (Red Hat)
2025-12-19 8:35 ` Vernon Yang
2025-12-19 8:55 ` David Hildenbrand (Red Hat)
2025-12-23 11:18 ` Dev Jain
2025-12-25 16:07 ` Vernon Yang
2025-12-29 6:02 ` Vernon Yang
2025-12-22 19:00 ` kernel test robot
2025-12-15 9:04 ` Vernon Yang [this message]
2025-12-15 21:12 ` [PATCH 3/4] mm: khugepaged: move mm to list tail when MADV_COLD/MADV_FREE kernel test robot
2025-12-16 7:00 ` Vernon Yang
2025-12-16 13:08 ` kernel test robot
2025-12-16 13:31 ` kernel test robot
2025-12-18 9:31 ` David Hildenbrand (Red Hat)
2025-12-19 5:29 ` Vernon Yang
2025-12-19 8:58 ` David Hildenbrand (Red Hat)
2025-12-21 2:10 ` Wei Yang
2025-12-21 4:25 ` Vernon Yang
2025-12-21 9:24 ` David Hildenbrand (Red Hat)
2025-12-21 12:34 ` Vernon Yang
2025-12-23 9:59 ` David Hildenbrand (Red Hat)
2025-12-25 15:12 ` Vernon Yang
2025-12-21 12:38 ` Wei Yang
2025-12-15 9:04 ` [PATCH 4/4] mm: khugepaged: set to next mm direct when mm has MMF_DISABLE_THP_COMPLETELY Vernon Yang
2025-12-18 9:33 ` David Hildenbrand (Red Hat)
2025-12-19 5:31 ` Vernon Yang