From: Bharata B Rao <bharata@amd.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, mingo@redhat.com,
peterz@infradead.org, mgorman@techsingularity.net,
raghavendra.kt@amd.com, dave.hansen@linux.intel.com,
hannes@cmpxchg.org
Subject: Re: [RFC PATCH 0/2] Hot page promotion optimization for large address space
Date: Thu, 28 Mar 2024 11:19:40 +0530
Message-ID: <dd2bc563-7654-4d83-896e-49a7291dd1aa@amd.com>
In-Reply-To: <87il16lxzl.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 28-Mar-24 11:05 AM, Huang, Ying wrote:
> Bharata B Rao <bharata@amd.com> writes:
>
>> In order to check how efficiently the existing NUMA balancing
>> based hot page promotion mechanism can detect hot regions and
>> promote pages for workloads with large memory footprints, I
>> wrote and tested a program that allocates a huge amount of
>> memory but routinely touches only small parts of it.
>>
>> This microbenchmark provisions memory on both the DRAM node and the
>> CXL node. It then divides the entire allocated memory into smaller
>> chunks and randomly chooses a chunk for generating memory accesses.
>> Each chunk is then accessed for a fixed number of iterations to
>> create the notion of hotness. Within each chunk, the individual
>> pages at 4K granularity are again accessed in a random fashion.
>>
>> When a chunk is taken up for access in this manner, its pages
>> may be resident either on DRAM or on CXL. In the latter case, the
>> NUMA balancing driven hot page promotion logic is expected to detect
>> and promote the hot pages that reside on CXL.
>>
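(For reference, the access pattern is roughly of the shape sketched
below. This is a minimal illustration and not the actual benchmark
source; the sizes, node ids and constants are assumptions. Build with
-lnuma.)

#include <numa.h>       /* libnuma: numa_alloc_onnode() */
#include <stdlib.h>
#include <time.h>

#define PAGE_SZ    4096UL
#define CHUNK_SZ   (256UL << 20)     /* illustrative chunk size */
#define TOTAL_SZ   (128UL << 30)     /* illustrative total allocation */
#define NR_CHUNKS  (TOTAL_SZ / CHUNK_SZ)
#define HOT_ITERS  100000UL          /* accesses that make a chunk hot */

int main(void)
{
	/* Provision half of the memory from the DRAM node and half
	 * from the CXL node (node ids 0 and 2 are assumptions). */
	char *dram = numa_alloc_onnode(TOTAL_SZ / 2, 0);
	char *cxl  = numa_alloc_onnode(TOTAL_SZ / 2, 2);

	if (!dram || !cxl)
		return 1;

	srand(time(NULL));
	for (;;) {
		/* Randomly choose one chunk out of the whole allocation. */
		unsigned long c = rand() % NR_CHUNKS;
		char *base = (c < NR_CHUNKS / 2) ?
			dram + c * CHUNK_SZ :
			cxl + (c - NR_CHUNKS / 2) * CHUNK_SZ;

		/* Touch random 4K pages within the chosen chunk for a
		 * fixed number of iterations to create hotness. */
		for (unsigned long i = 0; i < HOT_ITERS; i++) {
			unsigned long pg = rand() % (CHUNK_SZ / PAGE_SZ);
			base[pg * PAGE_SZ]++;
		}
	}
}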
>> The experiment was conducted on a 2P AMD Bergamo system that has
>> CXL as the 3rd node.
>>
>> $ numactl -H
>> available: 3 nodes (0-2)
>> node 0 cpus: 0-127,256-383
>> node 0 size: 128054 MB
>> node 1 cpus: 128-255,384-511
>> node 1 size: 128880 MB
>> node 2 cpus:
>> node 2 size: 129024 MB
>> node distances:
>> node     0    1    2
>>    0:   10   32   60
>>    1:   32   10   50
>>    2:  255  255   10
>>
>> It is seen that the number of pages that get promoted is really low,
>> and the reason is that the NUMA hint fault latency turns out to be
>> much higher than the hot threshold most of the time. Here are a few
>> latency and threshold sample values captured from the
>> should_numa_migrate_memory() routine when the benchmark was run:
>>
>> latency (ms)    threshold (ms)
>>        20620              1125
>>        56185              1125
>>        98710              1250
>>       148871              1375
>>       182891              1625
>>       369415              1875
>>       630745              2000
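(To spell out what these numbers feed into: a page resident on the
slower tier gets promoted only when its hint fault latency, i.e. the
time between the NUMA balancing scan marking the page and the task
actually faulting on it, falls below the hot threshold. A simplified
sketch of that check follows; it is not the kernel code verbatim, and
the real should_numa_migrate_memory() additionally applies rate
limiting and threshold adjustment.)

#include <stdbool.h>

static bool hot_enough_to_promote(unsigned int latency_ms,
				  unsigned int threshold_ms)
{
	/*
	 * With the sampled values above (latencies of ~20s to ~630s
	 * against thresholds of ~1.1s to 2s), this check fails almost
	 * every time, so very few pages end up being promoted.
	 */
	return latency_ms < threshold_ms;
}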
>
> The access latency of your workload is 20s to 630s, which appears too
> long. Can you try increasing the threshold to cover that range?
> For example,
>
> echo 100000 > /sys/kernel/debug/sched/numa_balancing/hot_threshold_ms
That of course should help. But I was exploring alternatives where the
notion of hotness can be de-linked from the absolute scanning time to
the extent possible. For large-memory workloads where only parts of the
memory get accessed at once, the scanning time can lag behind the actual
access time significantly, as the data above shows. I am wondering if
such cases can be addressed without having to tune for each workload.
Regards,
Bharata.