Re: [RFC PATCH v5 00/10] mm: Hot page tracking and promotion infrastructure

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Bharata B Rao <bharata@amd.com>
To: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Cc: <Jonathan.Cameron@huawei.com>, <dave.hansen@intel.com>,
	<gourry@gourry.net>, <mgorman@techsingularity.net>,
	<mingo@redhat.com>, <peterz@infradead.org>,
	<raghavendra.kt@amd.com>, <riel@surriel.com>,
	<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
	<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
	<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
	<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
	<akpm@linux-foundation.org>, <david@redhat.com>,
	<byungchul@sk.com>, <kinseyho@google.com>,
	<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
	<balbirs@nvidia.com>, <alok.rathore@samsung.com>,
	<shivankg@amd.com>
Subject: Re: [RFC PATCH v5 00/10] mm: Hot page tracking and promotion infrastructure
Date: Mon, 9 Feb 2026 09:00:48 +0530	[thread overview]
Message-ID: <4df58408-58d7-41ad-afa7-c42a64689ec8@amd.com> (raw)
In-Reply-To: <20260129144043.231636-1-bharata@amd.com>

On 29-Jan-26 8:10 PM, Bharata B Rao wrote:
> Results
> =======
> TODO: Will post benchmark nubmers as reply to this patchset soon.

Numbers from redis-memtier benchmark:

Test system details
-------------------
3 node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)

$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node   0   1   2
  0:  10  32  50
  1:  32  10  60
  2:  255  255  10

Hotness sources
---------------
NUMAB0 - Without NUMA Balancing in base case and with no source enabled
         in the patched case. No migrations occur.
NUMAB2 - Existing hot page promotion for the base case and
         use of hint faults as source in the patched case.

Pghot by default promotes after two accesses but for NUMAB2 source,
promotion is done after one access to match the base behaviour.
(/sys/kernel/debug/pghot/freq_threshold=1)

==============================================================
Scenario 1 - Enough memory in toptier and hence only promotion
==============================================================
In the setup phase, 64GB database is provisioned and explicitly moved
to Node 2 by migrating redis-server's memory to Node 2.
Memtier is run on Node 1.

Parallel distribution, 50% of the keys accessed, each 4 times.
16        Threads
100       Connections per thread
77808     Requests per client

==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9
Latency       KB/sec
--------------------------------------------------------------------------------------------------
Base, NUMAB0
Totals     225827.75       226.49746       225.27900       425.98300
454.65500    513106.09
--------------------------------------------------------------------------------------------------
Base, NUMAB2
Totals     254869.29       205.61759       216.06300       399.35900
454.65500    579091.74
--------------------------------------------------------------------------------------------------
pghot-default, NUMAB2
Totals     264229.35       202.81411       215.03900       393.21500
446.46300    600358.86
--------------------------------------------------------------------------------------------------
pghot-precise, NUMAB2
Totals     261136.17       203.32692       215.03900       391.16700
446.46300    593330.81
==================================================================================================

pgpromote_success
==================================
Base, NUMAB0            0
Base, NUMAB2            10,435,178
pghot-default, NUMAB2   10,435,031
pghot-precise, NUMAB2   10,435,245
==================================

- There is a clear benefit of hot page promotion seen. Both
  base and pghot show similar benefits.
- The number of pages promoted in both cases are more or less
  same.

==============================================================
Scenario 2 - Toptier memory overcommited, promotion + demotion
==============================================================
In the setup phase, 192GB database is provisioned. The database occupies
Node 1 entirely(~128GB) and spills over to Node 2 (~64GB).
Memtier is run on Node 1.

Parallel distribution, 50% of the keys accessed, each 4 times.
16        Threads
100       Connections per thread
233424    Requests per client

==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9
Latency       KB/sec
--------------------------------------------------------------------------------------------------
Base, NUMAB0
Totals     246474.55       211.90623       192.51100       370.68700
448.51100    560235.63
--------------------------------------------------------------------------------------------------
Base, NUMAB2
Totals     232790.88       221.18604       214.01500       419.83900
509.95100    529132.72
--------------------------------------------------------------------------------------------------
pghot-default, NUMAB2
Totals     241615.60       216.12761       210.94300       391.16700
475.13500    549191.27
--------------------------------------------------------------------------------------------------
pghot-precise, NUMAB2
Totals     238557.37       217.57630       207.87100       395.26300
471.03900    542239.92
==================================================================================================
                        pgpromote_success       pgdemote_kswapd
===============================================================
Base, NUMAB0            0                       832,494
Base, NUMAB2            352,075                 720,409
pghot-default, NUMAB2   25,865,321              26,154,984
pghot-precise, NUMAB2   25,525,429              25,838,095
===============================================================

- No clear benefit is seen with hot page promotion both in base and pghot case.
- Most promotion attempts in base case fail because the NUMA hint fault latency
  is found to exceed the threshold value (default threshold of 1000ms) in
  majority of the promotion attempts.
- Unlike base NUMAB2 where the hint fault latency is the difference between the
  PTE update time (during scanning) and the access time (hint fault), pghot uses
  a single latency threshold (4000ms in pghot-default and 5000ms in
  pghot-precise) for two purposes.
        1. If the time difference between successive accesses are within the
           threshold, the page is marked as hot.
        2. Later when kmigrated picks up the page for migration, it will migrate
           only if the difference between the current time and the time when the
          page was marked hot is with the threshold.
  Because of the above difference in behaviour, more number of pages get
  qualified for promotion compared to base NUMAB2.

next prev parent reply	other threads:[~2026-02-09  3:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-29 14:40 Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 01/10] mm: migrate: Allow misplaced migration without VMA Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 02/10] migrate: Add migrate_misplaced_folios_batch() Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 03/10] mm: Hot page tracking and promotion Bharata B Rao
2026-02-11 15:40   ` Bharata B Rao
2026-02-11 16:08     ` Gregory Price
2026-02-12  2:03       ` Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 04/10] mm: pghot: Precision mode for pghot Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 05/10] mm: sched: move NUMA balancing tiering promotion to pghot Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 06/10] x86: ibs: In-kernel IBS driver for memory access profiling Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 07/10] x86: ibs: Enable IBS profiling for memory accesses Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 08/10] mm: mglru: generalize page table walk Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 09/10] mm: klruscand: use mglru scanning for page promotion Bharata B Rao
2026-01-29 14:40 ` [RFC PATCH v5 10/10] mm: pghot: Add folio_mark_accessed() as hotness source Bharata B Rao
2026-02-09  3:25 ` [RFC PATCH v5 00/10] mm: Hot page tracking and promotion infrastructure Bharata B Rao
2026-02-09  3:30 ` Bharata B Rao [this message]
2026-02-11 15:30 ` Bharata B Rao
2026-02-11 16:04   ` Gregory Price
2026-02-12  2:16     ` Bharata B Rao
2026-02-11 16:06   ` Gregory Price
2026-02-12 16:15   ` Bharata B Rao
2026-02-13 14:56 ` Gregory Price
2026-02-16  3:00   ` Bharata B Rao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4df58408-58d7-41ad-afa7-c42a64689ec8@amd.com \
    --to=bharata@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alok.rathore@samsung.com \
    --cc=balbirs@nvidia.com \
    --cc=byungchul@sk.com \
    --cc=dave.hansen@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=gourry@gourry.net \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kinseyho@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@redhat.com \
    --cc=nifan.cxl@gmail.com \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=shivankg@amd.com \
    --cc=sj@kernel.org \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=xuezhengchu@huawei.com \
    --cc=yiannis@zptcorp.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuanchu@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox