From: Bharata B Rao <bharata@amd.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<dave.hansen@intel.com>, <gourry@gourry.net>,
<hannes@cmpxchg.org>, <mgorman@techsingularity.net>,
<mingo@redhat.com>, <peterz@infradead.org>,
<raghavendra.kt@amd.com>, <riel@surriel.com>,
<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
<akpm@linux-foundation.org>, <david@redhat.com>,
<byungchul@sk.com>, <kinseyho@google.com>,
<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
<balbirs@nvidia.com>, <alok.rathore@samsung.com>
Subject: Re: [RFC PATCH v2 8/8] mm: sched: Move hot page promotion from NUMAB=2 to kpromoted
Date: Mon, 6 Oct 2025 11:27:21 +0530
Message-ID: <b13fc805-728a-494e-93ea-f2dea351eb00@amd.com>
In-Reply-To: <20251003133818.000017af@huawei.com>

On 03-Oct-25 6:08 PM, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:16:53 +0530
> Bharata B Rao <bharata@amd.com> wrote:
>
>> Currently hot page promotion (the NUMA_BALANCING_MEMORY_TIERING
>> mode of NUMA Balancing) performs hot page detection (via hint faults),
>> hot page classification and eventual promotion all by itself, and
>> sits within the scheduler.
>>
>> With the new hot page tracking and promotion mechanism being
>> available, NUMA Balancing can limit itself to the detection of
>> hot pages (via hint faults) and off-load the rest of the
>> functionality to the common hot page tracking system.
>>
>> The pghot_record_access(PGHOT_HINT_FAULT) API is used to feed the
>> hot page info. In addition, the migration rate limiting and
>> dynamic threshold logic are moved to kpromoted so that they
>> can be used for hot pages reported by other sources too.
>>
>> Signed-off-by: Bharata B Rao <bharata@amd.com>
>
> Making a direct replacement without any fallback to the previous method
> is going to need a lot of data to show there are no important regressions.
>
> So bold move if that's the intent!

Firstly, I am only moving the existing hot page heuristics that are part
of NUMAB=2 to kpromoted, so that they can be applied to hot pages being
identified by other sources. The hint fault mechanism that is inherent
to NUMAB=2 still remains.

In fact, the kscand effort started as a potential replacement for the
existing hot page promotion mechanism by getting rid of hint faults and
moving the page table scanning out of process context.

In any case, I will start including numbers from the next post.
>>
>> static unsigned int sysctl_pghot_freq_window = KPROMOTED_FREQ_WINDOW;
>>
>> +/* Restrict the NUMA promotion throughput (MB/s) for each target node. */
>> +static unsigned int sysctl_pghot_promote_rate_limit = 65536;
>
> If the comment correlates with the value, this is 64 GiB/s? That seems
> unlikely, though I guess possible.

IIUC, the existing logic tries to limit the promotion rate to 64 GiB/s by
limiting the number of candidate pages that are promoted within the
1s observation interval.

Are you saying that achieving a rate of 64 GiB/s is not possible
or unlikely?
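
For reference, this is roughly how the existing logic (which this patch
moves) turns the MB/s value into a per-window page budget. This is a
simplified sketch of the current NUMAB=2 code, with approximate
identifier names:

	/*
	 * MB -> pages: 1 MB is 1 << 20 bytes, so shift by
	 * (20 - PAGE_SHIFT). With 4K pages, the default of
	 * 65536 MB/s becomes 16M pages/s, i.e. 64 GiB/s.
	 */
	rate_limit = sysctl_pghot_promote_rate_limit << (20 - PAGE_SHIFT);

	/* Start a new 1s observation window when the old one expires. */
	now = jiffies_to_msecs(jiffies);
	start = pgdat->nbp_rl_start;
	if (now - start > MSEC_PER_SEC &&
	    cmpxchg(&pgdat->nbp_rl_start, start, now) == start)
		pgdat->nbp_rl_nr_cand = nr_cand;

	/* Refuse promotion once this window's page budget is spent. */
	if (nr_cand - pgdat->nbp_rl_nr_cand >= rate_limit)
		return false;

So the sysctl only caps how many candidate pages may be promoted per
target node per second; whether the hardware can actually sustain
64 GiB/s is a separate question.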
>
>> +
>> #ifdef CONFIG_SYSCTL
>> static const struct ctl_table pghot_sysctls[] = {
>> {
>> @@ -44,8 +50,17 @@ static const struct ctl_table pghot_sysctls[] = {
>> .proc_handler = proc_dointvec_minmax,
>> .extra1 = SYSCTL_ZERO,
>> },
>> + {
>> + .procname = "pghot_promote_rate_limit_MBps",
>> + .data = &sysctl_pghot_promote_rate_limit,
>> + .maxlen = sizeof(unsigned int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec_minmax,
>> + .extra1 = SYSCTL_ZERO,
>> + },
>> };
>> #endif
>> +
> Put that in earlier patch to reduce noise here.

This patch moves the hot page heuristics to kpromoted, and hence the
related sysctl is moved in this patch as well.
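
As a usage note (assuming the knob is registered under /proc/sys/vm/
like the other pghot sysctls -- the registration path is not shown in
this hunk), capping promotion at, say, 32 GiB/s per node would look like:

	echo 32768 > /proc/sys/vm/pghot_promote_rate_limit_MBps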
>
>> static bool phi_heap_less(const void *lhs, const void *rhs, void *args)
>> {
>> return (*(struct pghot_info **)lhs)->frequency >
>> @@ -94,11 +109,99 @@ static bool phi_heap_insert(struct max_heap *phi_heap, struct pghot_info *phi)
>> return true;
>> }
>>
>> +/*
>> + * For memory tiering mode, if there are enough free pages (more than
>> + * enough watermark defined here) in fast memory node, to take full
>
> I'd use enough_wmark, just because "more than enough" is a common
> English phrase and I at least tripped over that sentence as a result!

Ah, I see that, but as you note later, I am currently only doing the
code movement.
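
For context, the existing check being moved here works out to
enough_wmark = max(1 GiB worth of pages, node_present_pages / 16).
For example, on a 512 GiB toptier node that is max(1 GiB, 32 GiB) =
32 GiB: the node needs that much free space above the promotion
watermark before pages get promoted without the hot-threshold check.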
>
>> + * advantage of fast memory capacity, all recently accessed slow
>> + * memory pages will be migrated to fast memory node without
>> + * considering hot threshold.
>> + */
>> +static bool pgdat_free_space_enough(struct pglist_data *pgdat)
>> +{
>> + int z;
>> + unsigned long enough_wmark;
>> +
>> + enough_wmark = max(1UL * 1024 * 1024 * 1024 >> PAGE_SHIFT,
>> + pgdat->node_present_pages >> 4);
>> + for (z = pgdat->nr_zones - 1; z >= 0; z--) {
>> + struct zone *zone = pgdat->node_zones + z;
>> +
>> + if (!populated_zone(zone))
>> + continue;
>> +
>> + if (zone_watermark_ok(zone, 0,
>> + promo_wmark_pages(zone) + enough_wmark,
>> + ZONE_MOVABLE, 0))
>> + return true;
>> + }
>> + return false;
>> +}
>
>> +
>> +static void kpromoted_promotion_adjust_threshold(struct pglist_data *pgdat,
>
> Needs documentation of the algorithm and the reasons for various choices.
>
> I see it is a code move though so maybe that's a job for another day.

Sure.
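
Until that documentation lands, here is the gist of the algorithm being
moved, as a simplified sketch of the existing NUMAB=2 code (identifier
names approximate):

	/*
	 * Once per adjustment period, compare the promotion candidates
	 * seen in the period (diff_cand) against what the rate limit
	 * would allow in that period (ref_cand), and nudge the hot
	 * threshold (in ms) in NUMA_MIGRATION_ADJUST_STEPS-sized steps.
	 */
	unit_th = ref_th * 2 / NUMA_MIGRATION_ADJUST_STEPS;
	th = pgdat->nbp_threshold ? : ref_th;
	if (diff_cand > ref_cand * 11 / 10)
		/* Too many candidates: tighten (lower) the threshold. */
		th = max(th - unit_th, unit_th);
	else if (diff_cand < ref_cand * 9 / 10)
		/* Too few: relax the threshold, capped at 2x reference. */
		th = min(th + unit_th, ref_th * 2);
	pgdat->nbp_threshold = th;

In short, the threshold self-tunes so that the number of pages classified
as hot stays close to what the rate limit can actually promote.
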
Regards,
Bharata.