linux-mm.kvack.org archive mirror
From: Raghavendra K T <raghavendra.kt@amd.com>
To: <raghavendra.kt@amd.com>
Cc: <AneeshKumar.KizhakeVeetil@arm.com>, <Michael.Day@amd.com>,
	<akpm@linux-foundation.org>, <bharata@amd.com>,
	<dave.hansen@intel.com>, <david@redhat.com>,
	<dongjoo.linux.dev@gmail.com>, <feng.tang@intel.com>,
	<gourry@gourry.net>, <hannes@cmpxchg.org>, <honggyu.kim@sk.com>,
	<hughd@google.com>, <jhubbard@nvidia.com>, <jon.grimm@amd.com>,
	<k.shutemov@gmail.com>, <kbusch@meta.com>,
	<kmanaouil.dev@gmail.com>, <leesuyeon0506@gmail.com>,
	<leillc@google.com>, <liam.howlett@oracle.com>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<mgorman@techsingularity.net>, <mingo@redhat.com>,
	<nadav.amit@gmail.com>, <nphamcs@gmail.com>,
	<peterz@infradead.org>, <riel@surriel.com>, <rientjes@google.com>,
	<rppt@kernel.org>, <santosh.shukla@amd.com>, <shivankg@amd.com>,
	<shy828301@gmail.com>, <sj@kernel.org>, <vbabka@suse.cz>,
	<weixugc@google.com>, <willy@infradead.org>,
	<ying.huang@linux.alibaba.com>, <ziy@nvidia.com>,
	<Jonathan.Cameron@huawei.com>, <dave@stgolabs.net>,
	<yuanchu@google.com>, <kinseyho@google.com>, <hdanton@sina.com>,
	<harry.yoo@oracle.com>
Subject: [RFC PATCH V3 00/17] mm: slowtier page promotion based on PTE A bit
Date: Thu, 14 Aug 2025 15:32:50 +0000
Message-ID: <20250814153307.1553061-1-raghavendra.kt@amd.com>

This series adds further enhancements and incorporates review comments on
top of RFC V2.

It provides an additional source of hot page information alongside NUMAB,
IBS [4], and KMGLRUD [5].

Introduction:
=============
In the current hot page promotion, all the activities, including process
address space scanning, NUMA hint fault handling, and page migration, are
performed in process context, i.e., the scanning overhead is borne by
applications.

This RFC V3 patch series does slow-tier page promotion by using PTE Accessed
bit scanning. Scanning is done by a global kernel thread which routinely
scans all the processes' address spaces and checks for accesses by reading
the PTE A bit.
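
As a rough illustration of the scanning idea described above, the following
Python sketch models a scanner that reads a per-page accessed flag on each
pass. It is a userspace toy model, not the kscand code: `Page`, `touch`,
`scan_once`, and the dictionary standing in for an address space are
hypothetical names introduced here, and clearing the flag after reading it
is an assumption about how fresh accesses are told apart from old ones.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a PTE: only tracks an accessed flag."""
    accessed: bool = False


def touch(pages, addr):
    """Model the hardware setting the A bit when a page is accessed."""
    pages[addr].accessed = True


def scan_once(pages):
    """Return addresses accessed since the last scan, clearing each flag.

    Clearing the flag (an assumption of this model) means the next scan
    reports only accesses that happened after this one.
    """
    hot = []
    for addr in sorted(pages):
        page = pages[addr]
        if page.accessed:
            hot.append(addr)
            page.accessed = False
    return hot


if __name__ == "__main__":
    pages = {addr: Page() for addr in range(4)}
    touch(pages, 1)
    touch(pages, 3)
    print(scan_once(pages))  # addresses seen as accessed in this pass
    print(scan_once(pages))  # nothing was touched since the previous pass
```

In this model the application threads never do any scanning work; only the
scanner loop walks the pages, which is the property the series aims for.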

A separate migration thread migrates/promotes the pages to the top-tier
node based on a simple heuristic that uses top-tier scan/access information
of the mm.

Additionally, based on feedback, a prctl knob with a scalar value is
provided to control per-task scanning.

Changes since RFC V2:
=====================
 - Enhanced logic to migrate on second access.

 - Use the prctl scalar value to further tune scanning efficiency.

 - Use PFN instead of folio to record hot pages, for easier integration
   with kpromoted/kmigrated [4].

 - Rebased on top of the fork/exec changes in v6.16.

 - Revisited the mm_walk logic and folio validation based on Harry's comments.

 - Feedback from the migration system to slow down scanning when more
   migration failures happen.

 - Addressed Masami's comment on the trace patch.

 - Fixed a crash on an overnight idle system caused by incorrect kmem_cache
   usage.

 - Enhanced the target node finding logic to also obtain fallback nodes to
   migrate to.
   (TBD: This needs a follow-up patch that actually migrates to fallback
   target nodes.)
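
To make the "migrate on second access" item concrete, here is a minimal
Python sketch of one way such a filter could behave. The cover letter does
not describe how the series records the first access, so the `seen_once` set
and the `select_for_promotion` function are assumptions made purely for
illustration, keyed by PFN as in the list above.

```python
def select_for_promotion(accessed_pfns, seen_once):
    """Return PFNs to promote and update seen_once in place.

    A PFN is promoted only if an earlier scan already recorded it as
    accessed; otherwise it is remembered and left where it is for now.
    """
    promote = []
    for pfn in accessed_pfns:
        if pfn in seen_once:
            # Second observed access: treat the page as hot.
            seen_once.discard(pfn)
            promote.append(pfn)
        else:
            # First observed access: remember it, do not migrate yet.
            seen_once.add(pfn)
    return promote


if __name__ == "__main__":
    seen = set()
    print(select_for_promotion([10, 20], seen))  # first sighting of both
    print(select_for_promotion([20, 30], seen))  # 20 is seen a second time
```

The point of such a filter is to avoid migrating pages that were touched only
once, at the cost of one extra scan interval before promotion.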

Changes since RFC V1:
=====================
- Addressed the review comments by Jonathan (thank you for your close
  reviews).

- Per-mm migration list with a separate lock to resolve race
  conditions/softlockups reported by Davidlohr.

- Added one more filter before migration for the LRU_GEN case to check
  whether the folio is still hot.

- Renamed kmmscand ==> kscand and kmmmigrated ==> kmigrated (hopefully this
  gets merged into Bharata's upcoming migration thread).

Changes since RFC V0:
=====================
- A separate migration thread is used for migration, thus alleviating the
  need for multi-threaded scanning (at least as per tracing).

- A simple heuristic for target node calculation is added.

- A prctl (David R) interface with a scalar value is added to control
  per-task scanning.

- Steve's comment on tracing is incorporated.

- Fix for the bug reported by Davidlohr.

- Initial scan delay, similar to NUMAB1 mode, is added.

- Got rid of the migration lock during mm_walk.

A note on per mm migration list using mm_slot:
==============================================
Using a per-mm migration list (mm_slot) has helped to reduce contention,
which in turn eases mm teardown during process exit.

It also helps to tie a PFN/folio to its mm so that the heuristics work
better, and it would further help to throttle migration per mm (or per
process) (TBD).

A note on PTE A bit scanning:
=============================
Major positive: the current patchset is able to cover the entire process
address space effectively, with simple algorithms to tune scan_size and
scan_period.

Thanks to Jonathan, Davidlohr, David, Harry, Masami, and Steve for review
feedback on the RFCs.

Future plans:
=============
Evaluate how integration with the hotness monitoring subsystem works, or
alternatively a standalone integration with the kmigrated API of [4].

Results:
========
The Cbench benchmark (by Bharata) is used to evaluate promotion performance
on a slow-tier system.

The benchmark allocates memory on both a regular NUMA node and a slow-tier
node, then accesses it continuously.
Goal: finish a fixed number of accesses in less time.

SUT: Genoa+ EPYC system

base:    6.16 with NUMAB2 (because this has the best perf)
patched: 6.16 + current series

Time taken in sec (lower is better)
               base           patched
8GB            228            206
32GB           547            534
128GB          1100           920
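
For readers who prefer relative numbers, the short Python sketch below
converts the table above into a percentage reduction in run time. It only
restates the measured values; the `improvement_pct` helper name is introduced
here for illustration.

```python
def improvement_pct(base_sec, patched_sec):
    """Percentage reduction in run time relative to the base kernel."""
    return (base_sec - patched_sec) / base_sec * 100.0


# (base, patched) run times in seconds, copied from the table above.
results = {"8GB": (228, 206), "32GB": (547, 534), "128GB": (1100, 920)}

if __name__ == "__main__":
    for size, (base, patched) in results.items():
        print(f"{size}: {improvement_pct(base, patched):.1f}% less time")
```

This works out to roughly 9.6% (8GB), 2.4% (32GB), and 16.4% (128GB) less
time than the base kernel.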

Links:
[1] RFC V0: https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
[2] RFC V1: https://lore.kernel.org/linux-mm/20250319193028.29514-1-raghavendra.kt@amd.com/
[3] RFC V2: https://lore.kernel.org/linux-mm/20250624055617.1291159-1-raghavendra.kt@amd.com/
[4] Hotpage detection and promotion: https://lore.kernel.org/linux-mm/20250814134826.154003-1-bharata@amd.com/T/#t
[5] MGLRU: https://lkml.org/lkml/2025/3/24/1458

Patch organization:
patch 1-5: initial skeleton for scanning and migration
patch 6: migration
patch 7-9: scanning optimizations
patch 10: target_node heuristic
patch 11: migration failure feedback
patch 12-14: sysfs, vmstat and tracing
patch 15-16: prctl implementation and enhancements to scanning
patch 17: fallback target node finding

Raghavendra K T (17):
  mm: Add kscand kthread for PTE A bit scan
  mm: Maintain mm_struct list in the system
  mm: Scan the mm and create a migration list
  mm/kscand: Add only hot pages to migration list
  mm: Create a separate kthread for migration
  mm/migration: migrate accessed folios to toptier node
  mm: Add throttling of mm scanning using scan_period
  mm: Add throttling of mm scanning using scan_size
  mm: Add initial scan delay
  mm: Add a heuristic to calculate target node
  mm/kscand: Implement migration failure feedback
  sysfs: Add sysfs support to tune scanning
  mm/vmstat: Add vmstat counters
  trace/kscand: Add tracing of scanning and migration
  prctl: Introduce new prctl to control scanning
  prctl: Fine tune scan_period with prctl scale param
  mm: Create a list of fallback target nodes

 Documentation/filesystems/proc.rst |    2 +
 fs/proc/task_mmu.c                 |    4 +
 include/linux/kscand.h             |   30 +
 include/linux/migrate.h            |    2 +
 include/linux/mm.h                 |   13 +
 include/linux/mm_types.h           |    7 +
 include/linux/vm_event_item.h      |   12 +
 include/trace/events/kmem.h        |   99 ++
 include/uapi/linux/prctl.h         |    7 +
 kernel/fork.c                      |    6 +
 kernel/sys.c                       |   25 +
 mm/Kconfig                         |    8 +
 mm/Makefile                        |    1 +
 mm/internal.h                      |    1 +
 mm/kscand.c                        | 1754 ++++++++++++++++++++++++++++
 mm/migrate.c                       |    2 +-
 mm/mmap.c                          |    2 +
 mm/vma_exec.c                      |    3 +
 mm/vmstat.c                        |   12 +
 19 files changed, 1989 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/kscand.h
 create mode 100644 mm/kscand.c


base-commit: 038d61fd642278bab63ee8ef722c50d10ab01e8f
-- 
2.34.1


