From: "Huang, Ying" <ying.huang@intel.com>
To: Bharata B Rao <bharata@amd.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<akpm@linux-foundation.org>, <mingo@redhat.com>,
<peterz@infradead.org>, <mgorman@techsingularity.net>,
<raghavendra.kt@amd.com>, <dave.hansen@linux.intel.com>,
<hannes@cmpxchg.org>
Subject: Re: [RFC PATCH 0/2] Hot page promotion optimization for large address space
Date: Thu, 28 Mar 2024 14:03:53 +0800 [thread overview]
Message-ID: <87edbulwom.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <dd2bc563-7654-4d83-896e-49a7291dd1aa@amd.com> (Bharata B. Rao's message of "Thu, 28 Mar 2024 11:19:40 +0530")
Bharata B Rao <bharata@amd.com> writes:
> On 28-Mar-24 11:05 AM, Huang, Ying wrote:
>> Bharata B Rao <bharata@amd.com> writes:
>>
>>> In order to check how efficiently the existing NUMA balancing
>>> based hot page promotion mechanism can detect hot regions and
>>> promote pages for workloads with large memory footprints, I
>>> wrote and tested a program that allocates huge amount of
>>> memory but routinely touches only small parts of it.
>>>
>>> This microbenchmark provisions memory both on DRAM node and CXL node.
>>> It then divides the entire allocated memory into chunks of smaller
>>> size and randomly chooses a chunk for generating memory accesses.
>>> Each chunk is then accessed for a fixed number of iterations to
>>> create the notion of hotness. Within each chunk, the individual
>>> pages at 4K granularity are again accessed in random fashion.
>>>
>>> When a chunk is taken up for access in this manner, its pages
>>> can either be residing on DRAM or CXL. In the latter case, the NUMA
>>> balancing driven hot page promotion logic is expected to detect and
>>> promote the hot pages that reside on CXL.
>>>
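The access pattern described above can be sketched in a few lines of user-space C (a minimal illustration with tiny sizes; names and constants are made up here, and the DRAM/CXL placement via numactl/mbind is not reproduced):

```c
/*
 * Minimal sketch of the described access pattern: a large buffer is
 * divided into chunks; a random chunk is chosen and its 4K pages are
 * touched repeatedly in random order to make it "hot".  Sizes are
 * tiny for illustration only.
 */
#include <stdlib.h>

#define PAGE_SZ     4096
#define CHUNK_PAGES 16          /* 4K pages per chunk */
#define NCHUNKS     8
#define ITERS       4           /* iterations per hot chunk */

static void access_chunk(unsigned char *chunk)
{
	/* Touch the chunk's pages in random order, ITERS times over. */
	for (int it = 0; it < ITERS; it++)
		for (int i = 0; i < CHUNK_PAGES; i++)
			chunk[((size_t)rand() % CHUNK_PAGES) * PAGE_SZ]++;
}

unsigned long run_benchmark(unsigned char *buf, int rounds)
{
	unsigned long accesses = 0;

	for (int r = 0; r < rounds; r++) {
		/* Randomly choose a chunk to make hot. */
		size_t c = (size_t)rand() % NCHUNKS;

		access_chunk(buf + c * (size_t)CHUNK_PAGES * PAGE_SZ);
		accesses += (unsigned long)ITERS * CHUNK_PAGES;
	}
	return accesses;
}
```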
>>> The experiment was conducted on a 2P AMD Bergamo system that has
>>> CXL as the 3rd node.
>>>
>>> $ numactl -H
>>> available: 3 nodes (0-2)
>>> node 0 cpus: 0-127,256-383
>>> node 0 size: 128054 MB
>>> node 1 cpus: 128-255,384-511
>>> node 1 size: 128880 MB
>>> node 2 cpus:
>>> node 2 size: 129024 MB
>>> node distances:
>>> node 0 1 2
>>> 0: 10 32 60
>>> 1: 32 10 50
>>> 2: 255 255 10
>>>
>>> It is seen that the number of pages that get promoted is really low,
>>> and the reason is that the NUMA hint fault latency turns out to be
>>> much higher than the hot threshold most of the time. Here are a few
>>> latency and threshold sample values captured from the
>>> should_numa_migrate_memory() routine when the benchmark was run:
>>>
>>> latency (ms)	threshold (ms)
>>> 20620 1125
>>> 56185 1125
>>> 98710 1250
>>> 148871 1375
>>> 182891 1625
>>> 369415 1875
>>> 630745 2000
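The promotion decision being discussed reduces to a latency-vs-threshold comparison; a hedged sketch of that check (the function name is illustrative, not the kernel's actual should_numa_migrate_memory()):

```c
/*
 * Illustrative model of the check under discussion: a page is a
 * promotion candidate only when its hint-fault latency is below the
 * hot threshold.  This is a sketch, not the kernel's actual code.
 */
static int latency_below_threshold(unsigned long latency_ms,
				   unsigned long threshold_ms)
{
	return latency_ms < threshold_ms;
}
```

With the first sample above, a latency of 20620 ms against a threshold of 1125 ms, the page is rejected; only a much larger threshold would accept it.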
>>
>> The access latency of your workload is 20s to 630s, which appears too
>> long. Can you try increasing the threshold range to deal with that?
>> For example,
>>
>> echo 100000 > /sys/kernel/debug/sched/numa_balancing/hot_threshold_ms
>
> That of course should help. But I was exploring alternatives where the
> notion of hotness can be de-linked from the absolute scanning time to
In fact, only the relative time from scan to hint fault is recorded and
used in the calculation; we have only a limited number of bits for it.
> the extent possible. For large memory workloads where only parts of memory
> get accessed at once, the scanning time can lag from the actual access
> time significantly as the data above shows. Wondering if such cases can
> be addressed without having to be workload-specific.
Does it really matter to promote such cold pages (accessed less often
than once every 20s)? And if so, how can we adjust the current algorithm
to cover that? I think that may be possible by extending the threshold
range, and we can find a way to extend the range by default if
necessary.
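The "limited bits" constraint mentioned above can be illustrated with a wrapping timestamp (a hypothetical model; the bit width and helper names here are made up and do not reflect the kernel's actual encoding):

```c
/*
 * Hypothetical illustration of a scan timestamp kept in limited bits:
 * only the low TIME_BITS of the time survive, so the scan-to-fault
 * latency is computed modulo 2^TIME_BITS and very long gaps alias
 * onto short ones.  Bit width and names are illustrative only.
 */
#define TIME_BITS 16
#define TIME_MASK ((1u << TIME_BITS) - 1)

unsigned int record_scan_time(unsigned int now)
{
	return now & TIME_MASK;			/* only the low bits are stored */
}

unsigned int hint_fault_latency(unsigned int scan_stamp, unsigned int now)
{
	return (now - scan_stamp) & TIME_MASK;	/* wraps every 2^TIME_BITS */
}
```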
--
Best Regards,
Huang, Ying