linux-mm.kvack.org archive mirror
* [RFC] [PATCH] mm/page_alloc: pcp->batch tuning
@ 2025-10-06 14:54 Joshua Hahn
  2025-10-08 15:34 ` Dave Hansen
  0 siblings, 1 reply; 5+ messages in thread
From: Joshua Hahn @ 2025-10-06 14:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm

Recently, while working on another patch that batches
free_pcppages_bulk() [1], I was curious why pcp->batch was always 63 on
my machine. This led me to zone_batchsize(), where I found this set of
lines that determines the batch size for the host:

	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;

All of this is good, except for the comment, which says "We effectively
*= 4 below". Nowhere else in zone_batchsize() is there a corresponding
multiplication by 4. Looking into the history of this, it seems Dave
Hansen had also noticed this back in 2013 [2]. It turns out there *used*
to be a corresponding *= 4, which was later turned into a *= 6 for use
in pageset_setup_from_batch_size(), a function which no longer exists.

We are left with a /= 4 that has no matching *= 4 anywhere, so
pcp->batch is mistuned relative to the intent when it was introduced.
This is made worse by the fact that pcp lists are generally larger today
than they were in 2013, meaning batch sizes should have increased, not
decreased.

While the obvious solution is to remove this /= 4 to restore the
original tuning heuristics, I think this discovery opens up a discussion
on what pcp->batch should be, and whether this is something that should
be dynamically tuned based on the system's usage, like pcp->high.

Naively removing the /= 4 also changes the tuning for the entire system,
so I am hesitant to simply delete it, even though I believe a larger
batch size (with this change, the default batch size becomes the number
of pages it takes to cover 1M) can be helpful for the general scale of
machines running today, as opposed to 12 years ago.

I've left this patch as an RFC to see what folks have to say about this
decision.

[1] https://lore.kernel.org/all/20251002204636.4016712-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20131015203547.8724C69C@viggo.jf.intel.com/

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/page_alloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1d037f97c5f..b4db0d09d145 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5815,7 +5815,6 @@ static int zone_batchsize(struct zone *zone)
 	 * and zone lock contention.
 	 */
 	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
-	batch /= 4;		/* We effectively *= 4 below */
 	if (batch < 1)
 		batch = 1;
 

base-commit: 097a6c336d0080725c626fda118ecfec448acd0f
-- 
2.47.3



Thread overview: 5+ messages
2025-10-06 14:54 [RFC] [PATCH] mm/page_alloc: pcp->batch tuning Joshua Hahn
2025-10-08 15:34 ` Dave Hansen
2025-10-08 19:36   ` Joshua Hahn
2025-10-09  2:57     ` Huang, Ying
2025-10-09 14:41       ` Joshua Hahn
