From: Ankur Arora <ankur.a.arora@oracle.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
rostedt@goodmis.org, tglx@linutronix.de,
vincent.guittot@linaro.org, jon.grimm@amd.com, bharata@amd.com,
boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
ankur.a.arora@oracle.com
Subject: [PATCH 0/9] x86/clear_huge_page: multi-page clearing
Date: Sun, 2 Apr 2023 22:22:24 -0700
Message-ID: <20230403052233.1880567-1-ankur.a.arora@oracle.com>
This series introduces multi-page clearing for hugepages.
This is a follow up of some of the ideas discussed at:
https://lore.kernel.org/lkml/CAHk-=wj9En-BC4t7J9xFZOws5ShwaR9yor7FxHZr8CTVyEP_+Q@mail.gmail.com/
On x86, page clearing is typically done via string instructions. These,
unlike a MOV loop, allow us to explicitly advertise the region-size to
the processor, which could serve as a hint to current (and/or
future) uarchs to elide cacheline allocation.
Among current-generation processors, Milan (and presumably other Zen
variants) uses the hint to elide cacheline allocation (for
region-size > LLC-size).
An additional reason for doing this is that string instructions are
typically microcoded, and clearing in bigger chunks than the current
page-at-a-time logic amortizes some of that cost.
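As a rough userspace illustration of what advertising the region-size
means (this is not the code from the series): with REP STOSB the byte
count sits in RCX, so a single instruction can cover either one page or
a whole multi-page extent. The names clear_region(),
clear_page_at_a_time(), clear_multi_page() and PAGE_SZ are hypothetical
and exist only for this sketch; non-x86 builds fall back to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096UL	/* assumed base page size, for illustration only */

/* Clear len bytes with one string instruction, so the CPU sees the full length. */
static void clear_region(void *dst, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* REP STOSB: RDI = destination, RCX = byte count, AL = fill value. */
	__asm__ volatile("rep stosb"
			 : "+D"(dst), "+c"(len)
			 : "a"(0)
			 : "memory");
#else
	memset(dst, 0, len);	/* portable fallback for non-x86 builds */
#endif
}

/* Page-at-a-time: every string instruction advertises only PAGE_SZ bytes. */
static void clear_page_at_a_time(void *dst, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		clear_region((char *)dst + i * PAGE_SZ, PAGE_SZ);
}

/* Multi-page: one string instruction advertises the whole extent. */
static void clear_multi_page(void *dst, size_t npages)
{
	assert(npages > 0);
	clear_region(dst, npages * PAGE_SZ);
}
```

Both variants leave the buffer zeroed; they differ only in how much of
the region each string instruction gets to see, which is the property
the hint depends on.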
All uarchs tested (Milan, Icelakex, Skylakex) showed improved performance.
There are, however, some problems:
1. extended zeroing periods mean increased latency due to
the now-missing preemption points.
That's handled in patches 7, 8, 9:
"sched: define TIF_ALLOW_RESCHED"
"irqentry: define irqentry_exit_allow_resched()"
"x86/clear_huge_page: make clear_contig_region() preemptible"
by the context marking itself reschedulable, and rescheduling in
irqexit context if needed (for PREEMPTION_NONE/_VOLUNTARY.)
2. the current page-at-a-time clearing logic does left-right narrowing
towards the faulting page, which maintains cache locality for
workloads with a sequential access pattern. Clearing in large chunks
loses that.
Some (but not all) of that could be ameliorated by something like
this patch:
https://lore.kernel.org/lkml/20220606203725.1313715-1-ankur.a.arora@oracle.com/
But before doing that, I'd like some comments on whether it is
worth doing for this specific use case.
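For readers unfamiliar with the left-right narrowing mentioned in
problem 2, here is a small userspace sketch of the visiting order,
modeled on my reading of the existing process_huge_page() logic rather
than copied from it. narrowing_order() is a hypothetical helper: it only
records subpage indices and clears nothing. Subpages far from the
faulting one are visited first, and the faulting subpage is visited
last, so its cachelines are the most recently touched when the fault
returns.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[] with the sequence in which nr subpages are visited when
 * the faulting subpage has index target. Returns the number of entries
 * written (always nr).
 */
static size_t narrowing_order(size_t nr, size_t target, size_t *order)
{
	size_t base, l, k = 0;

	assert(target < nr);

	if (2 * target <= nr) {
		/* Target in the first half: visit the far tail first, backwards. */
		base = 0;
		l = target;
		for (size_t i = nr; i > 2 * target; i--)
			order[k++] = i - 1;
	} else {
		/* Target in the second half: visit the far head first, forwards. */
		l = nr - target;
		base = nr - 2 * l;
		for (size_t i = 0; i < base; i++)
			order[k++] = i;
	}

	/* Alternate left and right, converging on the target subpage. */
	for (size_t i = 0; i < l; i++) {
		order[k++] = base + i;
		order[k++] = base + 2 * l - 1 - i;
	}
	return k;
}
```

With nr = 8 and target = 2 this visits 7, 6, 5, 4, 0, 3, 1, 2. A single
large clear has no such ordering, which is the locality loss described
above.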
Rest of the series:
Patches 1, 2, 3:
"huge_pages: get rid of process_huge_page()"
"huge_page: get rid of {clear,copy}_subpage()"
"huge_page: allow arch override for clear/copy_huge_page()"
are mechanical and they simplify some of the current clear_huge_page()
logic.
Patches 4, 5:
"x86/clear_page: parameterize clear_page*() to specify length"
"x86/clear_pages: add clear_pages()"
add clear_pages() and helpers.
Patch 6: "mm/clear_huge_page: use multi-page clearing" adds the
chunked x86 clear_huge_page() implementation.
Performance
==
Demand fault performance gets a decent boost:
*Icelakex*    mm/clear_huge_page    x86/clear_huge_page    change
                    (GB/s)                (GB/s)

pg-sz=2MB            8.76                 11.82            +34.93%
pg-sz=1GB            8.99                 12.18            +35.48%

*Milan*       mm/clear_huge_page    x86/clear_huge_page    change
                    (GB/s)                (GB/s)

pg-sz=2MB           12.24                 17.54            +43.30%
pg-sz=1GB           17.98                 37.24           +107.11%
vm-scalability/case-anon-w-seq-hugetlb gains in stime but performs
worse when user space tries to touch those pages:

*Icelakex*    mm/clear_huge_page    x86/clear_huge_page    change
(mem=4GB/task, tasks=128)

stime         293.02 +-  .49%       239.39 +-  .83%        -18.30%
utime         440.11 +-  .28%       508.74 +-  .60%        +15.59%
wall-clock      5.96 +-  .33%         6.27 +- 2.23%        + 5.20%

*Milan*       mm/clear_huge_page    x86/clear_huge_page    change
(mem=1GB/task, tasks=512)

stime         490.95 +- 3.55%       466.90 +- 4.79%        - 4.89%
utime         276.43 +- 2.85%       311.97 +- 5.15%        +12.85%
wall-clock      3.74 +- 6.41%         3.58 +- 7.82%        - 4.27%
Also at:
github.com/terminus/linux clear-pages.v1
Comments appreciated!
Ankur Arora (9):
huge_pages: get rid of process_huge_page()
huge_page: get rid of {clear,copy}_subpage()
huge_page: allow arch override for clear/copy_huge_page()
x86/clear_page: parameterize clear_page*() to specify length
x86/clear_pages: add clear_pages()
mm/clear_huge_page: use multi-page clearing
sched: define TIF_ALLOW_RESCHED
irqentry: define irqentry_exit_allow_resched()
x86/clear_huge_page: make clear_contig_region() preemptible
arch/x86/include/asm/page.h | 6 +
arch/x86/include/asm/page_32.h | 6 +
arch/x86/include/asm/page_64.h | 25 +++--
arch/x86/include/asm/thread_info.h | 2 +
arch/x86/lib/clear_page_64.S | 45 ++++++--
arch/x86/mm/hugetlbpage.c | 59 ++++++++++
include/linux/sched.h | 29 +++++
kernel/entry/common.c | 8 ++
kernel/sched/core.c | 36 +++---
mm/memory.c | 174 +++++++++++++++--------------
10 files changed, 270 insertions(+), 120 deletions(-)
--
2.31.1