From: Bharata B Rao <bharata@amd.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mgorman@suse.de, peterz@infradead.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
akpm@linux-foundation.org, luto@kernel.org, tglx@linutronix.de,
yue.li@memverge.com, Ravikumar.Bangoria@amd.com
Subject: Re: [RFC PATCH 0/5] Memory access profiler(IBS) driven NUMA balancing
Date: Fri, 3 Mar 2023 10:55:54 +0530 [thread overview]
Message-ID: <41b47cd7-1ba9-3205-165e-02e8384e7064@amd.com> (raw)
In-Reply-To: <87jzzz8tgm.fsf@yhuang6-desk2.ccr.corp.intel.com>
On 02-Mar-23 1:40 PM, Huang, Ying wrote:
> Bharata B Rao <bharata@amd.com> writes:
>>
>> Here is the data for the benchmark run:
>>
>> Time taken or overhead (us) for fault, task_work and sched_switch
>> handling
>>
>>                        Default        IBS
>> Fault handling         2875354862     2602455
>> Task work handling     139023         24008121
>> Sched switch handling  -              37712
>> Total overhead         2875493885     26648288
>>
>> Default
>> -------
>>                  Total        Min     Max       Avg
>> do_numa_page     2875354862   0.08    392.13    22.11
>> task_numa_work   139023       0.14    5365.77   532.66
>> Total            2875493885
>>
>> IBS
>> ---
>>                        Total       Min     Max      Avg
>> ibs_overflow_handler   2602455     0.14    103.91   1.29
>> task_ibs_access_work   24008121    0.17    485.09   37.65
>> hw_access_sched_in     37712       0.15    287.55   1.35
>> Total                  26648288
>>
>>
>>                          Default        IBS
>> Benchmark score (us)     160171762.0    40323293.0
>> numa_pages_migrated      2097220        511791
>> Overhead per page (us)   1371           52
>> Pages migrated per sec   13094          12692
>> numa_hint_faults_local   2820311        140856
>> numa_hint_faults         38589520       652647
>
> For default, numa_hint_faults >> numa_pages_migrated. This is hard to
> understand.
Most of the migration requests from the numa hint page fault path
are failing because the pages cannot be isolated.
This is the check in migrate_misplaced_page() where it returns
without even attempting the subsequent migrate_pages() call:

    isolated = numamigrate_isolate_page(pgdat, page);
    if (!isolated)
        goto out;
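
For reference, the usual early-return paths there look roughly like this
(an abridged sketch of numamigrate_isolate_page() from mm/migrate.c around
v6.2; details vary across kernel versions):

/* Abridged sketch, not the full function */
static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
{
	int nr_pages = thp_nr_pages(page);

	/* Do not migrate THP mapped by multiple processes */
	if (PageTransHuge(page) && total_mapcount(page) > 1)
		return 0;

	/* Avoid migrating to a node that is nearly full */
	if (!migrate_balanced_pgdat(pgdat, nr_pages))
		return 0;

	/* Fails if the page is not on the LRU (e.g. already isolated) */
	if (isolate_lru_page(page))
		return 0;

	/* accounting and refcount handling elided */
	return 1;
}

So a failed isolation here typically means either that the target node is
close to full or that the page could not be taken off the LRU at that
moment.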
I will further investigate this.
> I guess that there aren't many shared pages in the
> benchmark?
I have a version of the benchmark in which sets of threads share a
fraction of memory in addition to their per-set exclusive memory. The
same performance difference is seen there too.
> And I guess that the free pages in the target node are enough
> too?
The benchmark uses 16G in total, with 8G being accessed by threads
on each of the two nodes. There is enough memory on the target
node to accept the incoming page migration requests.
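
(For concreteness, here is a hypothetical sketch of that kind of access
pattern, not the actual benchmark: two sets of threads, each set running
on one node while iterating over an 8G region that initially sits on the
other node. The thread count and iteration count are made up for
illustration.)

/*
 * Hypothetical sketch only -- not the actual benchmark used above.
 * Two sets of threads, each set running on one node and repeatedly
 * touching an 8G region that was allocated on the other node, so
 * NUMA balancing has to migrate pages towards the accessing node.
 *
 * Build (assumes libnuma): gcc -O2 -pthread sketch.c -lnuma
 */
#include <numa.h>
#include <pthread.h>
#include <stddef.h>

#define REGION_SZ  (8UL << 30)	/* 8G per set, 16G in total */
#define THREADS    8		/* threads per set (assumed) */
#define ITERS      1024		/* access iterations (assumed) */

struct set { char *mem; int run_node; };

static void *worker(void *p)
{
	struct set *s = p;

	numa_run_on_node(s->run_node);		/* run on the accessing node */
	for (int it = 0; it < ITERS; it++)
		for (size_t off = 0; off < REGION_SZ; off += 4096)
			s->mem[off]++;		/* touch every page */
	return NULL;
}

int main(void)
{
	/* Each set's exclusive region starts out on the remote node */
	struct set set[2] = {
		{ numa_alloc_onnode(REGION_SZ, 1), 0 },
		{ numa_alloc_onnode(REGION_SZ, 0), 1 },
	};
	pthread_t tid[2 * THREADS];

	for (int i = 0; i < 2 * THREADS; i++)
		pthread_create(&tid[i], NULL, worker, &set[i % 2]);
	for (int i = 0; i < 2 * THREADS; i++)
		pthread_join(tid[i], NULL);

	numa_free(set[0].mem, REGION_SZ);
	numa_free(set[1].mem, REGION_SZ);
	return 0;
}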
>
>> hint_faults_local/hint_faults 7% 22%
>>
>> Here is the summary:
>>
>> - In case of IBS, the benchmark completes 75% faster compared to
>> the default case. The gain varies based on how many iterations of
>> memory accesses we run as part of the benchmark. For 2048 iterations
>> of accesses, I have seen a gain of around 50%.
>> - The overhead of NUMA balancing (as measured by the time taken in
>> the fault handling, task_work time handling and sched_switch time
>> handling) in the default case is seen to be pretty high compared to
>> the IBS case.
>> - The number of hint-faults in the default case is significantly
>> higher than the IBS case.
>> - The local hint-faults percentage is much better in the IBS
>> case compared to the default case.
>> - As shown in the graphs (in other threads of this mail thread), in
>> the default case, the page migrations start a bit slowly while IBS
>> case shows steady migrations right from the start.
>> - I have also shown (via graphs in other threads of this mail thread)
>> that in IBS case the benchmark is able to steadily increase
>> the access iterations over time, while in the default case, the
>> benchmark doesn't do forward progress for a long time after
>> an initial increase.
>
> Hard to understand this too. Pages are migrated to local, but
> performance doesn't improve.
Migrations start a bit late, and too much time is spent later in the
run on hint faults and on failed migration attempts (due to the failure
to isolate pages); that is probably the reason.
>
>> - Early migrations due to relevant access sampling from IBS are most
>> probably the main reason for the uplift that the IBS case sees.
>
> In original kernel, the NUMA page table scanning will delay for a
> while. Please check the below comments in task_tick_numa().
>
> /*
> * Using runtime rather than walltime has the dual advantage that
> * we (mostly) drive the selection from busy threads and that the
> * task needs to have done some actual work before we bother with
> * NUMA placement.
> */
>
> I think this is generally reasonable, while it's not best for this
> micro-benchmark.
This is in addition to the initial scan delay that we have via
sysctl_numa_balancing_scan_delay. I have an equivalent of that in the
IBS case: access sampling is not started for a task until an initial
delay has elapsed.
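
For reference, the runtime-driven trigger being referred to is roughly
the following (an abridged sketch of task_tick_numa() from
kernel/sched/fair.c around v6.2; details vary across versions):

/* Abridged sketch, not the full function */
static void task_tick_numa(struct rq *rq, struct task_struct *curr)
{
	struct callback_head *work = &curr->numa_work;
	u64 period, now;

	/* Scanning is paced by task runtime, not walltime */
	now = curr->se.sum_exec_runtime;
	period = (u64)curr->numa_scan_period * NSEC_PER_MSEC;

	if (now > curr->node_stamp + period) {
		if (!curr->node_stamp)
			curr->numa_scan_period = task_scan_start(curr);
		curr->node_stamp += period;

		/* Queue task_numa_work() to run on return to userspace */
		if (!time_before(jiffies, curr->mm->numa_next_scan))
			task_work_add(curr, work, TWA_RESUME);
	}
}

So a task becomes a scanning candidate only after accumulating
numa_scan_period worth of CPU time, on top of the per-mm
numa_next_scan / scan-delay pacing mentioned above.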
Thanks for your observations.
Regards,
Bharata.