From: Bharata B Rao <bharata@amd.com>
To: <bharata@amd.com>
Cc: <Jonathan.Cameron@huawei.com>, <akpm@linux-foundation.org>,
<alok.rathore@samsung.com>, <balbirs@nvidia.com>,
<byungchul@sk.com>, <dave.hansen@intel.com>, <dave@stgolabs.net>,
<david@redhat.com>, <gourry@gourry.net>,
<joshua.hahnjy@gmail.com>, <kinseyho@google.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<mgorman@techsingularity.net>, <mingo@redhat.com>,
<nifan.cxl@gmail.com>, <peterz@infradead.org>,
<raghavendra.kt@amd.com>, <riel@surriel.com>,
<rientjes@google.com>, <shivankg@amd.com>, <sj@kernel.org>,
<weixugc@google.com>, <willy@infradead.org>,
<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
<ying.huang@linux.alibaba.com>, <yuanchu@google.com>,
<ziy@nvidia.com>
Subject: Re: [RFC PATCH v3 0/8] mm: Hot page tracking and promotion infrastructure
Date: Wed, 19 Nov 2025 18:36:24 +0530 [thread overview]
Message-ID: <20251119130624.74880-1-bharata@amd.com> (raw)
In-Reply-To: <20251110052343.208768-1-bharata@amd.com>
On 10-Nov-25 10:53 AM, Bharata B Rao wrote:
<snip>
> Results
> =======
Earlier I included results from a scenario where there was enough free
memory in the top-tier node and hence demotions weren't getting triggered.
Here I am including results from a similar microbenchmark that triggers
demotion too.
System details
--------------
3 node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)
$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node     0    1    2
  0:    10   32   50
  1:    32   10   60
  2:   255  255   10
Microbenchmark details
----------------------
Single-threaded application that allocates memory on both the DRAM and CXL
nodes using mmap(MAP_POPULATE). Every 1GB region of the allocated memory on
the CXL node is accessed randomly and repetitively at 4K granularity to
build up hotness in that region. This should drive promotion.
For promotion to succeed, the DRAM memory that has been provisioned
(but is not being accessed) must be demoted first. There is enough free
memory on the CXL node for demotions.
In summary, this benchmark creates memory pressure on the DRAM node and
performs CXL memory accesses to drive both demotion and promotion.
The number of accesses is fixed, so the quicker the accessed pages get
promoted to DRAM, the sooner the benchmark is expected to finish.
DRAM-node = 1
CXL-node = 2
Initial DRAM alloc ratio = 75%
Allocation-size = 171798691840
Initial DRAM Alloc-size = 128849018880
Initial CXL Alloc-size = 42949672960
Hot-region-size = 1073741824
Nr-regions = 160
Nr-regions DRAM = 120 (provisioned but not accessed)
Nr-hot-regions CXL = 40
Access pattern = random
Access granularity = 4096
Delay b/n accesses = 0
Load/store ratio = 50l50s
THP used = no
Nr accesses = 42949672960
Nr repetitions = 1024
Hotness sources
---------------
NUMAB0 - Without NUMA Balancing in base case and with no source enabled
in the patched case. No migrations.
NUMAB2 - Existing hot page promotion for the base case and
use of hint faults as source in the patched case.
pgtscan - Klruscand (MGLRU based PTE A bit scanning) source
hwhints - IBS as source
Time taken (microseconds, lower is better)
----------------------------------------------
Source      Base         Patched      Change
----------------------------------------------
NUMAB0      63,036,030   64,441,675    +2.2%
NUMAB2      62,286,691   68,786,394   +10.4%(#)
pgtscan     NA           68,702,226
hwhints     NA           67,455,607
----------------------------------------------
Pages migrated (pgpromote_success)
----------------------------------------------
Source      Base         Patched
----------------------------------------------
NUMAB0      0            0
NUMAB2      82,134(*)    0(#)
pgtscan     NA           6,561,136
hwhints     NA           3,293($)
----------------------------------------------
(#) Unlike base NUMAB2, pghot migrates only after 2 accesses.
Getting two successive accesses within the observation window is hard
with NUMA hint faults. The default sysctl_numa_balancing_scan_size of
256MB is too small to generate a significant number of hint faults.
(*) High run-to-run variation, so the average isn't really representative.
Hint fault latency mostly comes out above the default 1s threshold,
preventing migrations.
($) Sampling limitation
Pages demoted (pgdemote_kswapd+pgdemote_direct)
(This data is not really a comparison point; these numbers are provided
just to show that the workload results in both promotion and demotion)
----------------------------------------------
Source      Base         Patched
----------------------------------------------
NUMAB0      5,222,366    5,341,502
NUMAB2      5,256,310    5,325,845
pgtscan     NA           5,317,709
hwhints     NA           5,287,091
----------------------------------------------
Promotion candidate pages (pgpromote_candidate)
----------------------------------------------
Source      Base         Patched
----------------------------------------------
NUMAB0      0            0
NUMAB2      82,848       0
pgtscan     NA           0
hwhints     NA           0
----------------------------------------------
Non-rate limited Promotion candidate pages (pgpromote_candidate_nrl)
----------------------------------------------
Source      Base         Patched
----------------------------------------------
NUMAB0      0            0
NUMAB2      0            0
pgtscan     NA           6,561,147
hwhints     NA           3,292
----------------------------------------------