From: Bharata B Rao <bharata@amd.com>
To: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Cc: <Jonathan.Cameron@huawei.com>, <dave.hansen@intel.com>,
	<gourry@gourry.net>, <mgorman@techsingularity.net>,
	<mingo@redhat.com>, <peterz@infradead.org>,
	<raghavendra.kt@amd.com>, <riel@surriel.com>,
	<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
	<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
	<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
	<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
	<akpm@linux-foundation.org>, <david@redhat.com>,
	<byungchul@sk.com>, <kinseyho@google.com>,
	<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
	<balbirs@nvidia.com>, <alok.rathore@samsung.com>,
	<shivankg@amd.com>
Subject: Re: [RFC PATCH v5 00/10] mm: Hot page tracking and promotion infrastructure
Date: Mon, 23 Feb 2026 19:57:39 +0530
Message-ID: <a8d1efd6-2ca4-4f1d-9c0a-c8aa17732ee9@amd.com>
In-Reply-To: <20260129144043.231636-1-bharata@amd.com>

On 29-Jan-26 8:10 PM, Bharata B Rao wrote:
> 
> Results
> =======
> TODO: Will post benchmark numbers as reply to this patchset soon.
> 

Here are some numbers from the NAS Parallel Benchmark (NPB) suite, using the
BT application:

Test system details
-------------------
3-node AMD Zen5 system with two regular NUMA nodes (0, 1) and one CXL node (2)

$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node   0   1   2
  0:  10  32  50
  1:  32  10  60
  2:  255  255  10

Hotness sources
---------------
NUMAB0 - Without NUMA Balancing in base case and with no source enabled
         in the pghot case. No migrations occur.
NUMAB2 - Existing hot page promotion for the base case and
         use of hint faults as source in the pghot case.
         Both promotion and demotion are enabled in this case.

pghot promotes after two accesses by default, but for the NUMAB2 source
promotion is done after one access to match the base behaviour
(/sys/kernel/debug/pghot/freq_threshold=1).
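
As a minimal sketch of applying the setting above, the threshold can be
written to the debugfs file named in the parenthetical. The helper name
set_freq_threshold is hypothetical. The code assumes debugfs is mounted at
/sys/kernel/debug, that the caller is root on a kernel carrying this patchset,
and that the file accepts a plain decimal value. Only the path and the value 1
come from the text above.

```python
from pathlib import Path

# Path taken from the mail above; assumes debugfs is mounted at /sys/kernel/debug.
PGHOT_FREQ_THRESHOLD = Path("/sys/kernel/debug/pghot/freq_threshold")


def set_freq_threshold(value: int, knob: Path = PGHOT_FREQ_THRESHOLD) -> str:
    """Write the promotion threshold and return the value read back.

    Hypothetical helper: assumes the knob accepts a plain decimal integer.
    """
    if value < 1:
        raise ValueError("threshold must be at least 1 access")
    knob.write_text(f"{value}\n")
    return knob.read_text().strip()
```

Calling set_freq_threshold(1) as root would correspond to the NUMAB2 setup
described above. The knob argument exists only so the helper can be pointed at
a scratch file for testing.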


NAS-BT details
--------------
Command: mpirun -np 16 /usr/bin/numactl --cpunodebind=0,1 NPB3.4.4/NPB3.4-MPI/bin/bt.F.x

Class D uses around 24G of memory, which is too little to show the benefit of
promotion, while class E needs around 368G, which overflows my top tier. I
wanted something in between these classes, so I modified class F to a problem
size of 768, which results in around 160GB of memory.

After the memory consumption stabilizes, all the rank PIDs are paused and
their memory is moved to the CXL node using the migratepages command. This
simulates memory residing on the lower-tier node, with subsequent accesses by
the BT processes leading to promotion.
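
The pause-and-migrate step could be scripted roughly as follows. This is a
sketch, not the procedure actually used for these runs: the function names are
hypothetical, the node numbers (0,1 as source, 2 as target) are assumptions
read off the topology above, and resuming the ranks with SIGCONT afterwards is
assumed rather than stated. The migratepages invocation follows its documented
"pid from-nodes to-nodes" form.

```python
import subprocess
from typing import List, Sequence


def build_migration_plan(pids: Sequence[int],
                         from_nodes: str = "0,1",
                         to_node: str = "2") -> List[List[str]]:
    """Return the commands, in order, that pause, migrate and resume each rank.

    Node numbers are assumptions: 0 and 1 are the regular nodes and
    2 is the CXL node in the topology shown earlier.
    """
    plan: List[List[str]] = []
    # Pause every rank first so none of them touches memory during migration.
    for pid in pids:
        plan.append(["kill", "-STOP", str(pid)])
    # migratepages(8) syntax: migratepages pid from-nodes to-nodes
    for pid in pids:
        plan.append(["migratepages", str(pid), from_nodes, to_node])
    # Resuming is an assumption; the description only mentions the pause.
    for pid in pids:
        plan.append(["kill", "-CONT", str(pid)])
    return plan


def run_plan(plan: List[List[str]]) -> None:
    """Execute the plan, stopping at the first command that fails."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

Building the plan separately from running it lets the command list be
inspected before anything is signalled. run_plan() needs enough privilege to
signal and migrate the rank processes.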

Time in seconds - Lower is better
Mop/s total - Higher is better
=====================================================================================
                        Base            Base            pghot-default   pghot-precise
                        NUMAB0          NUMAB2          NUMAB2          NUMAB2
=====================================================================================
Time in seconds         7349.86         4422.50         6219.71         4113.56
Mop/s total             53247.66        88493.630       62923.030       95139.810

pgpromote_success       0               42181834        248503390       41955718
pgpromote_candidate     0               0               577086192       0
pgpromote_candidate_nrl 0               42181834        29410329        41956171
pgdemote_kswapd         0               0               216489010       0
numa_pte_updates        0               42252749        607470975       42037882
numa_hint_faults        0               42183772        606540729       41968150
=====================================================================================

- In the base case, hot page promotion improves the benchmark numbers
  significantly: the run time drops from 7349.86 s (NUMAB0) to 4422.50 s
  (NUMAB2).
- Though the benchmark runs for over an hour, the pages get promoted within
  the first few minutes.
- pghot-precise matches the base NUMAB2 numbers and is slightly ahead in this
  run (4113.56 s vs 4422.50 s).
- The benchmark suffers in the pghot-default case because promotion is limited
  to the default NID (0) only. This leads to excessive PTE updates, hint
  faults, and demotion/promotion churn.
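
To put the observations above in relative terms, the following sketch derives
two ratios from the table: speedup over the NUMAB0 base run, and hint faults
per successful promotion. The figures are copied from the table; the
dictionary layout and function names are illustrative only.

```python
# Figures copied from the results table above.
RESULTS = {
    "Base NUMAB0":          {"time_s": 7349.86, "promoted": 0,         "hint_faults": 0},
    "Base NUMAB2":          {"time_s": 4422.50, "promoted": 42181834,  "hint_faults": 42183772},
    "pghot-default NUMAB2": {"time_s": 6219.71, "promoted": 248503390, "hint_faults": 606540729},
    "pghot-precise NUMAB2": {"time_s": 4113.56, "promoted": 41955718,  "hint_faults": 41968150},
}


def speedup(config, baseline="Base NUMAB0"):
    """Run time of the baseline divided by run time of the given config."""
    return RESULTS[baseline]["time_s"] / RESULTS[config]["time_s"]


def faults_per_promotion(config):
    """numa_hint_faults per pgpromote_success, or None if nothing was promoted."""
    row = RESULTS[config]
    if row["promoted"] == 0:
        return None
    return row["hint_faults"] / row["promoted"]


def summarize():
    """Return one formatted line per configuration."""
    lines = []
    for name in RESULTS:
        fpp = faults_per_promotion(name)
        fpp_text = "n/a" if fpp is None else f"{fpp:.2f}"
        lines.append(
            f"{name:<22} speedup {speedup(name):.2f}x  faults/promotion {fpp_text}"
        )
    return lines
```

With these inputs the speedups over NUMAB0 come out at roughly 1.66x for base
NUMAB2, 1.18x for pghot-default and 1.79x for pghot-precise. pghot-default
takes about 2.4 hint faults per promoted page, against about one in the other
two NUMAB2 runs, which is consistent with the churn noted above.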

