From: Ankur Arora <ankur.a.arora@oracle.com>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
rostedt@goodmis.org, tglx@linutronix.de,
vincent.guittot@linaro.org, jon.grimm@amd.com, bharata@amd.com,
boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH 0/9] x86/clear_huge_page: multi-page clearing
Date: Sat, 08 Apr 2023 15:46:56 -0700
Message-ID: <87ttxqf0v3.fsf@oracle.com>
In-Reply-To: <271b85ec-281e-d33b-5495-59eb2bc9fde4@amd.com>

Raghavendra K T <raghavendra.kt@amd.com> writes:

> On 4/3/2023 10:52 AM, Ankur Arora wrote:
>> This series introduces multi-page clearing for hugepages.
>>
>> *Milan*     mm/clear_huge_page   x86/clear_huge_page    change
>>                  (GB/s)               (GB/s)
>>
>> pg-sz=2MB        12.24                17.54             +43.30%
>> pg-sz=1GB        17.98                37.24            +107.11%
>
>
> Hello Ankur,
>
> I was able to test your patches. To summarize, I am seeing a 2x-3x perf
> improvement for the 2M and 1GB base hugepage sizes.

Great. Thanks Raghavendra.

> SUT: Genoa AMD EPYC
> Thread(s) per core: 2
> Core(s) per socket: 128
> Socket(s): 2
>
> NUMA:
> NUMA node(s): 2
> NUMA node0 CPU(s): 0-127,256-383
> NUMA node1 CPU(s): 128-255,384-511
>
> Test: use mmap(MAP_HUGETLB) to demand-fault a 64GB region (on NUMA
> node0), for both base hugepage sizes, 2M and 1GB:
>
> perf stat -r 10 -d -d numactl -m 0 -N 0 <test>
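>
> Roughly, the test program does something like the following (a minimal
> sketch only; the actual map_hugetlb_2M/map_hugetlb_1G programs may
> differ, presumably just in the MAP_HUGE_* page-size flag and the touch
> stride):
>
>   #include <sys/mman.h>
>
>   #ifndef MAP_HUGE_SHIFT
>   #define MAP_HUGE_SHIFT 26
>   #endif
>   #define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)  /* log2(2MB) = 21 */
>   #define REGION_SZ    (64UL << 30)            /* 64GB region */
>   #define STEP         (2UL << 20)             /* one touch per 2MB page */
>
>   int main(void)
>   {
>           char *p = mmap(NULL, REGION_SZ, PROT_READ | PROT_WRITE,
>                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
>                          MAP_HUGE_2MB, -1, 0);
>           if (p == MAP_FAILED)
>                   return 1;
>
>           /* Each write demand-faults one huge page, i.e. triggers
>            * one huge-page clear in the kernel. */
>           for (unsigned long off = 0; off < REGION_SZ; off += STEP)
>                   p[off] = 1;
>           return 0;
>   }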
>
> Result: time elapsed in seconds (average of 10 runs; lower is better)
>
> page-size   mm/clear_huge_page   x86/clear_huge_page
> 2M               5.4567               2.6774
> 1G               2.64452              1.011281

So translating into BW, for Genoa we have:

page-size   mm/clear_huge_page   x86/clear_huge_page
                 (GB/s)               (GB/s)
2M                11.74                23.97
1G                24.24                63.36

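(Computed as BW = 64GB / elapsed time; e.g. 64 / 2.6774 ~= 23.9 GB/s for
the pg-sz=2M case.)
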
That's a pretty good bump over Milan:

> *Milan*     mm/clear_huge_page   x86/clear_huge_page
>                  (GB/s)               (GB/s)
> pg-sz=2MB        12.24                17.54
> pg-sz=1GB        17.98                37.24
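
The gain mostly comes from clearing a contiguous extent in one call
rather than iterating over it 4KB at a time. As a rough sketch of the
idea only (not the patch code; the series' actual interface and
implementation differ):

  /*
   * Hand the CPU's string ops the whole extent instead of looping
   * clear_page() over 4KB units; with fast string ops (ERMS),
   * REP STOSB streams the entire range.
   */
  static void clear_pages_sketch(void *addr, unsigned long npages)
  {
          /* 4KB pages to bytes (PAGE_SHIFT == 12). */
          unsigned long bytes = npages << 12;

          asm volatile("rep stosb"
                       : "+D" (addr), "+c" (bytes)
                       : "a" (0)
                       : "memory");
  }
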
Btw, are these numbers with boost=1?
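
(That is, with frequency boost enabled, e.g.
/sys/devices/system/cpu/cpufreq/boost = 1 under acpi-cpufreq; boost on
vs. off can shift these bandwidth numbers noticeably.)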
> Full perf stat info
>
> page size = 2M mm/clear_huge_page
>
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_2M' (10 runs):
>
>           5,434.71 msec task-clock              #  0.996 CPUs utilized          ( +-  0.55% )
>                  8      context-switches        #  1.466 /sec                   ( +-  4.66% )
>                  0      cpu-migrations          #  0.000 /sec
>             32,918      page-faults             #  6.034 K/sec                  ( +-  0.00% )
>     16,977,242,482      cycles                  #  3.112 GHz                    ( +-  0.04% )  (35.70%)
>          1,961,724      stalled-cycles-frontend #  0.01% frontend cycles idle   ( +-  1.09% )  (35.72%)
>         35,685,674      stalled-cycles-backend  #  0.21% backend cycles idle    ( +-  3.48% )  (35.74%)
>      1,038,327,182      instructions            #  0.06 insn per cycle
>                                                 #  0.04 stalled cycles per insn ( +-  0.38% )  (35.75%)
>        221,409,216      branches                # 40.584 M/sec                  ( +-  0.36% )  (35.75%)
>            350,730      branch-misses           #  0.16% of all branches        ( +-  1.18% )  (35.75%)
>      2,520,888,779      L1-dcache-loads         # 462.077 M/sec                 ( +-  0.03% )  (35.73%)
>      1,094,178,209      L1-dcache-load-misses   # 43.46% of all L1-dcache accesses    ( +-  0.02% )  (35.71%)
>         67,751,730      L1-icache-loads         # 12.419 M/sec                  ( +-  0.11% )  (35.70%)
>            271,118      L1-icache-load-misses   #  0.40% of all L1-icache accesses    ( +-  2.55% )  (35.70%)
>            506,635      dTLB-loads              # 92.866 K/sec                  ( +-  3.31% )  (35.70%)
>            237,385      dTLB-load-misses        # 43.64% of all dTLB cache accesses   ( +-  7.00% )  (35.69%)
>                268      iTLB-load-misses        # 6700.00% of all iTLB cache accesses ( +- 13.86% )  (35.70%)
>
> 5.4567 +- 0.0300 seconds time elapsed ( +- 0.55% )
>
> page size = 2M x86/clear_huge_page
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_2M' (10 runs):
>
>           2,780.69 msec task-clock              #  1.039 CPUs utilized          ( +-  1.03% )
>                  3      context-switches        #  1.121 /sec                   ( +- 21.34% )
>                  0      cpu-migrations          #  0.000 /sec
>             32,918      page-faults             # 12.301 K/sec                  ( +-  0.00% )
>      8,143,619,771      cycles                  #  3.043 GHz                    ( +-  0.25% )  (35.62%)
>          2,024,872      stalled-cycles-frontend #  0.02% frontend cycles idle   ( +-320.93% )  (35.66%)
>        717,198,728      stalled-cycles-backend  #  8.82% backend cycles idle    ( +-  8.26% )  (35.69%)
>        606,549,334      instructions            #  0.07 insn per cycle
>                                                 #  1.39 stalled cycles per insn ( +-  0.23% )  (35.73%)
>        108,856,550      branches                # 40.677 M/sec                  ( +-  0.24% )  (35.76%)
>            202,490      branch-misses           #  0.18% of all branches        ( +-  3.58% )  (35.78%)
>      2,348,818,806      L1-dcache-loads         # 877.701 M/sec                 ( +-  0.03% )  (35.78%)
>      1,081,562,988      L1-dcache-load-misses   # 46.04% of all L1-dcache accesses    ( +-  0.01% )  (35.78%)
>    <not supported>      LLC-loads
>    <not supported>      LLC-load-misses
>         43,411,167      L1-icache-loads         # 16.222 M/sec                  ( +-  0.19% )  (35.77%)
>            273,042      L1-icache-load-misses   #  0.64% of all L1-icache accesses    ( +-  4.94% )  (35.76%)
>            834,482      dTLB-loads              # 311.827 K/sec                 ( +-  9.73% )  (35.72%)
>            437,343      dTLB-load-misses        # 65.86% of all dTLB cache accesses   ( +-  8.56% )  (35.68%)
>                  0      iTLB-loads              #  0.000 /sec                   (35.65%)
>                160      iTLB-load-misses        # 1777.78% of all iTLB cache accesses ( +- 15.82% )  (35.62%)
>
> 2.6774 +- 0.0287 seconds time elapsed ( +- 1.07% )
>
> page size = 1G mm/clear_huge_page
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_1G' (10 runs):
>
>           2,625.24 msec task-clock              #  0.993 CPUs utilized          ( +-  0.23% )
>                  4      context-switches        #  1.513 /sec                   ( +-  4.49% )
>                  1      cpu-migrations          #  0.378 /sec
>                214      page-faults             # 80.965 /sec                   ( +-  0.13% )
>      8,178,624,349      cycles                  #  3.094 GHz                    ( +-  0.23% )  (35.65%)
>          2,942,576      stalled-cycles-frontend #  0.04% frontend cycles idle   ( +- 75.22% )  (35.69%)
>          7,117,425      stalled-cycles-backend  #  0.09% backend cycles idle    ( +-  3.79% )  (35.73%)
>        454,521,647      instructions            #  0.06 insn per cycle
>                                                 #  0.02 stalled cycles per insn ( +-  0.10% )  (35.77%)
>        113,223,853      branches                # 42.837 M/sec                  ( +-  0.08% )  (35.80%)
>             84,766      branch-misses           #  0.07% of all branches        ( +-  5.37% )  (35.80%)
>      2,294,528,890      L1-dcache-loads         # 868.111 M/sec                 ( +-  0.02% )  (35.81%)
>      1,075,907,551      L1-dcache-load-misses   # 46.88% of all L1-dcache accesses    ( +-  0.02% )  (35.78%)
>         26,167,323      L1-icache-loads         #  9.900 M/sec                  ( +-  0.24% )  (35.74%)
>            139,675      L1-icache-load-misses   #  0.54% of all L1-icache accesses    ( +-  0.37% )  (35.70%)
>              3,459      dTLB-loads              #  1.309 K/sec                  ( +- 12.75% )  (35.67%)
>                732      dTLB-load-misses        # 19.71% of all dTLB cache accesses   ( +- 26.61% )  (35.62%)
>                 11      iTLB-load-misses        # 192.98% of all iTLB cache accesses  ( +-238.28% )  (35.62%)
>
> 2.64452 +- 0.00600 seconds time elapsed ( +- 0.23% )
>
>
> page size = 1G x86/clear_huge_page
> Performance counter stats for 'numactl -m 0 -N 0 map_hugetlb_1G' (10 runs):
>
>           1,009.09 msec task-clock              #  0.998 CPUs utilized          ( +-  0.06% )
>                  2      context-switches        #  1.980 /sec                   ( +- 23.63% )
>                  1      cpu-migrations          #  0.990 /sec
>                214      page-faults             # 211.887 /sec                  ( +-  0.16% )
>      3,154,980,463      cycles                  #  3.124 GHz                    ( +-  0.06% )  (35.77%)
>            145,051      stalled-cycles-frontend #  0.00% frontend cycles idle   ( +-  6.26% )  (35.78%)
>        730,087,143      stalled-cycles-backend  # 23.12% backend cycles idle    ( +-  9.75% )  (35.78%)
>         45,813,391      instructions            #  0.01 insn per cycle
>                                                 # 18.51 stalled cycles per insn ( +-  1.00% )  (35.78%)
>          8,498,282      branches                #  8.414 M/sec                  ( +-  1.54% )  (35.78%)
>             63,351      branch-misses           #  0.74% of all branches        ( +-  6.70% )  (35.69%)
>         29,135,863      L1-dcache-loads         # 28.848 M/sec                  ( +-  5.67% )  (35.68%)
>          8,537,280      L1-dcache-load-misses   # 28.66% of all L1-dcache accesses    ( +- 10.15% )  (35.68%)
>          1,040,087      L1-icache-loads         #  1.030 M/sec                  ( +-  1.60% )  (35.68%)
>              9,147      L1-icache-load-misses   #  0.85% of all L1-icache accesses    ( +-  6.50% )  (35.67%)
>              1,084      dTLB-loads              #  1.073 K/sec                  ( +- 12.05% )  (35.68%)
>                431      dTLB-load-misses        # 40.28% of all dTLB cache accesses   ( +- 43.46% )  (35.68%)
>                 16      iTLB-load-misses        #  0.00% of all iTLB cache accesses   ( +- 40.54% )  (35.68%)
>
> 1.011281 +- 0.000624 seconds time elapsed ( +- 0.06% )
>
> Please feel free to add
>
> Tested-by: Raghavendra K T <raghavendra.kt@amd.com>

Thanks
Ankur

> I will come back with further observations on the patches/performance,
> if any.