From: Andrew Morton <akpm@linux-foundation.org>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
david@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, mingo@redhat.com, mjguzik@gmail.com,
luto@kernel.org, peterz@infradead.org, tglx@linutronix.de,
willy@infradead.org, raghavendra.kt@amd.com, chleroy@kernel.org,
ioworker0@gmail.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com
Subject: Re: [PATCH v10 7/8] mm, folio_zero_user: support clearing page ranges
Date: Tue, 16 Dec 2025 07:12:50 -0800
Message-ID: <20251216071250.e49ecf7490acf7f377dbfdc0@linux-foundation.org>
In-Reply-To: <874ipqexai.fsf@oracle.com>

On Mon, 15 Dec 2025 22:49:25 -0800 Ankur Arora <ankur.a.arora@oracle.com> wrote:
> >> [#] Notice that we perform much better with preempt=full|lazy. As
> >> mentioned above, preemptible models not needing explicit invocations
> >> of cond_resched() allow clearing of the full extent (1GB) as a
> >> single unit.
> >> In comparison the maximum extent used for preempt=none|voluntary is
> >> PROCESS_PAGES_NON_PREEMPT_BATCH (8MB).
> >>
> >> The larger extent allows the processor to elide cacheline
> >> allocation (on Milan the threshold is LLC-size=32MB.)
> >
> > It is this?
>
> Yeah I think so. For size >= 32MB, the microcode can really just elide
> cacheline allocation, and with the foreknowledge of the extent can perhaps
> optimize on cache coherence traffic (this last one is my speculation).
>
> On cacheline allocation elision, compare the L1-dcache-load in the two versions
> below:
>
> pg-sz=1GB:
> - 9,250,034,512 cycles # 2.418 GHz ( +- 0.43% ) (46.16%)
> - 544,878,976 instructions # 0.06 insn per cycle
> - 2,331,332,516 L1-dcache-loads # 609.471 M/sec ( +- 0.03% ) (46.16%)
> - 1,075,122,960 L1-dcache-load-misses # 46.12% of all L1-dcache accesses ( +- 0.01% ) (46.15%)
>
> + 3,688,681,006 cycles # 2.420 GHz ( +- 3.48% ) (46.01%)
> + 10,979,121 instructions # 0.00 insn per cycle
> + 31,829,258 L1-dcache-loads # 20.881 M/sec ( +- 4.92% ) (46.34%)
> + 13,677,295 L1-dcache-load-misses # 42.97% of all L1-dcache accesses ( +- 6.15% ) (46.32%)
>
That says L1 d-cache loads went from 600 million/sec down to 20
million/sec when using 32MB chunks?
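
As a rough illustration of the comparison being asked about, the sketch below puts the two quoted perf runs side by side and computes how far each counter dropped. The counter values are copied from the quoted output; the dictionary layout and the helper name reduction_factors are assumptions made for this example, not part of the patch series or of perf.

```python
# Counter values copied from the quoted perf output (pg-sz=1GB).
# "before" is the "-" run, "after" is the "+" run.
before = {
    "cycles": 9_250_034_512,
    "instructions": 544_878_976,
    "L1-dcache-loads": 2_331_332_516,
    "L1-dcache-load-misses": 1_075_122_960,
}
after = {
    "cycles": 3_688_681_006,
    "instructions": 10_979_121,
    "L1-dcache-loads": 31_829_258,
    "L1-dcache-load-misses": 13_677_295,
}


def reduction_factors(before, after):
    """Return before/after for every counter (hypothetical helper)."""
    return {name: before[name] / after[name] for name in before}


factors = reduction_factors(before, after)
for name, factor in factors.items():
    print(f"{name:24s} {factor:6.1f}x fewer")
```

Note that the M/sec figures in the perf output are rates, so they shrink less than the raw counts do: the second run also finishes in fewer cycles.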
Do you know what happens to preemption latency if you increase that
chunk size from 8MB to 32MB? At 42GB/sec, 32MB will take less than a
millisecond, yes? I'm not aware of us really having any latency
targets in these preemption modes, but 1 millisecond sounds pretty
good.
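
The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope calculation only: it assumes "42GB/sec" means 42e9 bytes per second, that MB and GB chunk sizes are binary (MiB, GiB), and that clearing proceeds at a constant rate with no other overhead. The function name clear_time_ms is made up for the example.

```python
MIB = 1024 * 1024

# Assumption: "42GB/sec" is read as decimal gigabytes per second.
BANDWIDTH_BYTES_PER_SEC = 42e9


def clear_time_ms(chunk_bytes, bandwidth=BANDWIDTH_BYTES_PER_SEC):
    """Time in milliseconds to clear chunk_bytes at a constant bandwidth."""
    return chunk_bytes / bandwidth * 1000.0


# Chunk sizes discussed in the thread: the current non-preempt batch (8MB),
# the proposed 32MB, and the full 1GB extent.
for label, size in (("8MB", 8 * MIB), ("32MB", 32 * MIB), ("1GB", 1024 * MIB)):
    print(f"{label:>5s}: {clear_time_ms(size):7.3f} ms")
```

Under these assumptions a 32MB chunk comes in at roughly 0.8 ms, so it stays under the 1 millisecond mark, while a full 1GB extent cleared without a scheduling point would take on the order of 25 ms.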