From: Mateusz Guzik <mjguzik@gmail.com>
To: x86@kernel.org, linux-mm <linux-mm@kvack.org>
Subject: idea to ponder: handling extra pages on faults of anonymous areas
Date: Tue, 15 Apr 2025 18:58:43 +0200
Message-ID: <CAGudoHGY012mwJqtGPUQ9mfQEVF1_otr9NSbbTYi_vazS09-CQ@mail.gmail.com>
If you have an area not backed by a huge page and fault on it, you
only get the 4K page zeroed even if the given mmapped area is bigger.
I have a promising result from zeroing more pages than that, but don't
have time to evaluate more workloads or code up a proper patch.
Hopefully someone(tm) will be interested enough to pick it up.
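To make the per-page cost visible from userspace, here is a minimal sketch that counts minor faults while touching a fresh private anonymous mapping. It assumes Linux (or another Unix) and uses only Python's standard mmap and resource modules. The helper name minor_faults_touching is made up for illustration, and the count it reports depends on the local THP settings.

```python
import mmap
import resource


def minor_faults_touching(size_bytes, step=mmap.PAGESIZE):
    """Write one byte every `step` bytes of a fresh private anonymous
    mapping and return how many minor faults the process took meanwhile."""
    area = mmap.mmap(-1, size_bytes,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for offset in range(0, size_bytes, step):
        area[offset] = 1  # first write to a page triggers a fault
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    area.close()
    return after - before


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    print("pages touched:", size // mmap.PAGESIZE)
    print("minor faults: ", minor_faults_touching(size))
```

With everything under /sys/kernel/mm/transparent_hugepage disabled, the fault count should land close to the number of pages touched; with larger folios enabled it should drop accordingly.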
Rationale:
4K pages predate the fall of the Soviet Union and RAM sizes went up
orders of magnitude since then even on what's considered low end
systems today. Similarly, memory usage of programs went up
significantly. It is not a stretch to suspect a bigger size would
serve real workloads better. 2MB pages of course are applicable in
some capacity, but my testing shows there are still tons of faults on
areas where they are not used.
In particular when running everyone's favourite workload of compiling
stuff, kernel time is quite big (e.g., > 15%), where a large chunk is
spent handling page faults.
While the hardware does not provide good granularity (the immediate
4KB -> 2MB jump) and the kernel will still need to use 4KB pages, fault
handling overhead can go down by speculatively sorting out more than
just the page which got faulted on.
I suspect rolling with 8KB would provide a good enough improvement
while suffering negligible waste in practice.
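The trade-off between fewer faults and wasted pages can be sketched with a toy model. This is not kernel code: the function simulate and the assumption that each fault populates an aligned group of pages are hypothetical, picked only to illustrate the reasoning, and mapping boundaries are ignored.

```python
def simulate(touched_pages, pages_per_fault):
    """Toy model: every fault populates the aligned group of
    `pages_per_fault` pages that contains the faulting page.
    Returns (fault count, pages populated but never touched)."""
    populated = set()
    faults = 0
    for page in touched_pages:
        if page in populated:
            continue  # already mapped by an earlier fault
        faults += 1
        start = page - page % pages_per_fault
        populated.update(range(start, start + pages_per_fault))
    wasted = len(populated - set(touched_pages))
    return faults, wasted


if __name__ == "__main__":
    dense = range(1000)          # every page gets touched
    sparse = range(0, 1000, 4)   # only every fourth page gets touched
    for group in (1, 2, 4):
        print(group, simulate(dense, group), simulate(sparse, group))
```

In this model a dense access pattern halves its fault count at 8KB with no waste, while a pattern that only touches every fourth page keeps the same fault count and wastes one extra page per fault. Which of the two real workloads resemble is exactly what would need measuring.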
While testing 8KB would require patching the kernel, I was pointed at
knobs in /sys/kernel/mm/transparent_hugepage which facilitate early
experiments. The smallest available size is 16K, so that's what I used
below for benchmarking.
I conducted a simple experiment building will-it-scale like so:
taskset --cpu-list 1 hyperfine "gmake -s -j 1 clean all"
stock:
Time (mean ± σ): 20.707 s ± 0.080 s [User: 17.222 s, System: 3.376 s]
16K pages:
Time (mean ± σ): 19.471 s ± 0.046 s [User: 16.836 s, System: 2.608 s]
Or to put it differently, a reliable reduction of roughly 6% in real
time. Page fault count dropped to less than half, which suggests the
majority of the improvement would show up with a mere 8K instead of 16K.
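For reference, the relative changes can be worked out directly from the hyperfine means quoted above. The snippet only does arithmetic on those numbers; the helper name reduction_pct is made up for illustration.

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Mean times in seconds, copied from the hyperfine output above.
stock = {"real": 20.707, "user": 17.222, "system": 3.376}
mthp_16k = {"real": 19.471, "user": 16.836, "system": 2.608}

if __name__ == "__main__":
    for key in stock:
        pct = reduction_pct(stock[key], mthp_16k[key])
        print(f"{key}: {pct:.1f}% lower with 16K")
```

Most of the gain comes out of system time, which is consistent with the fault handling argument.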
The 16K thing was tested with:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
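To check what a given machine is actually set to before and after flipping the knob, something like the following could be used. It is a sketch under assumptions: it expects per-size directories named hugepages-<size>kB as in the command above, and it assumes the enabled file marks the active choice with square brackets (for example "always inherit madvise [never]"). The function names are made up for illustration.

```python
from pathlib import Path

THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def selected_mode(text):
    """Return the bracketed choice from a sysfs line such as
    'always inherit madvise [never]', or None if nothing is bracketed."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def mthp_modes(root=THP_ROOT):
    """Map each hugepages-<size>kB directory name to its active mode.
    Returns an empty dict when the kernel exposes no such knobs."""
    modes = {}
    for knob in sorted(root.glob("hugepages-*kB/enabled")):
        modes[knob.parent.name] = selected_mode(knob.read_text())
    return modes


if __name__ == "__main__":
    for name, mode in mthp_modes().items():
        print(f"{name}: {mode}")
```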
I stress the proposal is not necessarily to use mTHPs here (or
whatever the name), the above was merely employed because it was
readily available. I'm told the use of these might prevent other
optimizations by the kernel -- these are artifacts of the
implementation and are not inherent to the idea.
The proposal is to fill in more than one page on faults on anonymous
areas, regardless of how it is specifically handled. I speculate
handling two pages (aka 8KB of size) will be an overall win and should
not affect anything else (huge page promotions, whatever TLB
fuckery and what have you). Worst case, you get a page you are not
going to use.
I think a good quality proposal is quite time consuming to produce and
I don't have the cycles. I also can't guarantee the mm overlords will
accept something like that. I can however point out that Google
experimented with 16KB pages for arm64 and got very promising results
(I have no idea if they switched to use them) -- I would start with
prodding those folks.
cheers
--
Mateusz Guzik <mjguzik gmail.com>
Thread overview: 2+ messages
2025-04-15 16:58 Mateusz Guzik [this message]
2025-04-16 7:34 ` Hao Li