From: Vernon Yang <vernon2gm@gmail.com>
To: akpm@linux-foundation.org, david@kernel.org, lorenzo.stoakes@oracle.com
Cc: ziy@nvidia.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, richard.weiyang@gmail.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Vernon Yang <yanglincheng@kylinos.cn>
Subject: [PATCH v2 3/4] mm: khugepaged: set VM_NOHUGEPAGE flag when MADV_COLD/MADV_FREE
Date: Mon, 29 Dec 2025 13:51:50 +0800
Message-ID: <20251229055151.54887-4-yanglincheng@kylinos.cn>
In-Reply-To: <20251229055151.54887-1-yanglincheng@kylinos.cn>

For example, create three tasks in the order hot1 -> cold -> hot2. Once
all three tasks are created, each allocates 128 MB of memory. The hot1
and hot2 tasks access their 128 MB continuously, while the cold task
touches its memory only briefly and then calls madvise(MADV_COLD).
However, khugepaged still scans the cold task ahead of hot2, and only
moves on to the hot2 task after it has finished scanning the cold task.
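
The cold task's behaviour described above can be sketched in userspace
roughly as follows. This is a minimal illustration, not the benchmark
used for the numbers in this patch: the helper name
touch_then_mark_cold() is hypothetical, and the fallback definition of
MADV_COLD (20, the Linux UAPI value) is only there for older libc
headers. On kernels without MADV_COLD support, madvise() is expected to
fail with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* Linux UAPI value; fallback for old libc headers */
#endif

/*
 * Hypothetical helper: map an anonymous region, touch it once, then
 * tell the kernel that the region is cold.
 * Returns 0 on success or a negative errno value on failure.
 */
static int touch_then_mark_cold(size_t len)
{
	int ret = 0;
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -errno;

	/* Brief access: fault every page in once. */
	memset(buf, 0x5a, len);

	/* Hint that this range will not be accessed in the near future. */
	if (madvise(buf, len, MADV_COLD) != 0)
		ret = -errno;

	/*
	 * A real cold task would keep the mapping alive; it is unmapped
	 * here only to keep the helper self-contained.
	 */
	munmap(buf, len);
	return ret;
}
```

In the scenario above the cold task would call something like
touch_then_mark_cold(128UL << 20) but keep the mapping, while hot1 and
hot2 loop over their own buffers.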

So if the user has explicitly told us via MADV_COLD/MADV_FREE that this
memory is cold or will be freed, it is appropriate for khugepaged to
simply skip it, avoiding unnecessary scan and collapse operations and
reducing wasted CPU time.
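
The flag update this relies on can be sketched outside the kernel as
below. This is only an illustration under stated assumptions: the
DEMO_* constants are stand-ins chosen to mirror the kernel's
VM_HUGEPAGE and VM_NOHUGEPAGE bits, and demo_collapse_allowed() is a
hypothetical, heavily simplified model of the eligibility check, not
the kernel's actual logic.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's VM_HUGEPAGE / VM_NOHUGEPAGE bits. */
#define DEMO_VM_HUGEPAGE	0x20000000UL
#define DEMO_VM_NOHUGEPAGE	0x40000000UL

/* Clear the "prefer THP" hint and set the "no THP" hint; keep other bits. */
static unsigned long demo_mark_nohugepage(unsigned long flags)
{
	return (flags & ~DEMO_VM_HUGEPAGE) | DEMO_VM_NOHUGEPAGE;
}

/* Simplified assumption: a VMA carrying the "no THP" bit is skipped. */
static bool demo_collapse_allowed(unsigned long flags)
{
	return !(flags & DEMO_VM_NOHUGEPAGE);
}
```

The update is idempotent and leaves unrelated flag bits untouched, so
applying it on repeated MADV_COLD/MADV_FREE calls is harmless in this
model.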

Here are the performance test results (for Throughput, higher is
better; for the other metrics, lower is better).

Testing on an x86_64 machine:

| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.14 sec      | 2.93 sec      | -6.69%  |
| cycles per access   | 4.96          | 2.21          | -55.44% |
| Throughput          | 104.38 M/sec  | 111.89 M/sec  | +7.19%  |
| dTLB-load-misses    | 284814532     | 69597236      | -75.56% |

Testing on qemu-system-x86_64 -enable-kvm:

| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.35 sec      | 2.96 sec      | -11.64% |
| cycles per access   | 7.29          | 2.07          | -71.60% |
| Throughput          | 97.67 M/sec   | 110.77 M/sec  | +13.41% |
| dTLB-load-misses    | 241600871     | 3216108       | -98.67% |
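
The delta column appears to be the relative change from the "without
patch" value to the "with patch" value, which is consistent with every
row above. A small sketch of that arithmetic, with percent_delta() as a
hypothetical helper name:

```c
/* Relative change from "before" to "after", in percent. */
static double percent_delta(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For instance, percent_delta(3.14, 2.93) evaluates to roughly -6.69,
matching the first row of the x86_64 table.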
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
mm/madvise.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index b617b1be0f53..3a48d725a3fc 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1360,11 +1360,8 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 		return madvise_remove(madv_behavior);
 	case MADV_WILLNEED:
 		return madvise_willneed(madv_behavior);
-	case MADV_COLD:
-		return madvise_cold(madv_behavior);
 	case MADV_PAGEOUT:
 		return madvise_pageout(madv_behavior);
-	case MADV_FREE:
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
 		return madvise_dontneed_free(madv_behavior);
@@ -1378,6 +1375,18 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 	/* The below behaviours update VMAs via madvise_update_vma(). */
+	case MADV_COLD:
+		error = madvise_cold(madv_behavior);
+		if (error)
+			goto out;
+		new_flags = (new_flags & ~VM_HUGEPAGE) | VM_NOHUGEPAGE;
+		break;
+	case MADV_FREE:
+		error = madvise_dontneed_free(madv_behavior);
+		if (error)
+			goto out;
+		new_flags = (new_flags & ~VM_HUGEPAGE) | VM_NOHUGEPAGE;
+		break;
 	case MADV_NORMAL:
 		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
 		break;
@@ -1756,7 +1765,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	switch (madv_behavior->behavior) {
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
-	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
@@ -1766,7 +1774,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	case MADV_GUARD_REMOVE:
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
-	case MADV_FREE:
 		return MADVISE_VMA_READ_LOCK;
 	default:
 		return MADVISE_MMAP_WRITE_LOCK;
--
2.51.0