linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups
@ 2025-10-09 19:29 Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  0 siblings, 2 replies; 5+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm, kernel-team

Two small cleanups for mm/page_alloc.

Patch 1 cleans up a misleading comment about how pcp->batch is calculated
and folds the stale /= 4 into the calculation for clarity. No functional
change intended.

Patch 2 prevents zones from reporting a pcp->batch of 0 when it is
actually 1. In particular, it stops ZONE_DMA from reporting a batch size
of 0.

Joshua Hahn (2):
  mm/page_alloc: Clarify batch tuning in zone_batchsize
  mm/page_alloc: Prevent reporting pcp->batch = 0

 mm/page_alloc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)


base-commit: ec714e371f22f716a04e6ecb2a24988c92b26911
-- 
2.47.3



* [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize
  2025-10-09 19:29 [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups Joshua Hahn
@ 2025-10-09 19:29 ` Joshua Hahn
  2025-10-13 12:55   ` Vlastimil Babka
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  1 sibling, 1 reply; 5+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm, kernel-team

Recently while working on another patch about batching
free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
machine. This led me to zone_batchsize(), where I found this set of
lines to determine what the batch size should be for the host:

	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;
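
For reference, here is how the 63 falls out, assuming my machine uses the
common 4K PAGE_SIZE (so SZ_1M / PAGE_SIZE = 256) and noting that the
2^n - 1 clamp is applied later in the same function:

	/* = 256 on a large zone with 4K pages */
	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	/* = 64 */
	batch /= 4;
	...
	/* later in zone_batchsize(): rounddown_pow_of_two(96) - 1 = 63 */
	batch = rounddown_pow_of_two(batch + batch/2) - 1;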

All of this is fine, except for the comment which says "We effectively
*= 4 below": nowhere else in zone_batchsize() is there a corresponding
multiplication by 4. Looking into the history, it seems Dave Hansen had
already noticed this back in 2013 [2]. It turns out there *used* to be a
corresponding *= 4, which was later turned into a *= 6 to be used in
pageset_setup_from_batch_size(), a function that no longer exists.

The comment was never corrected, but the /= 4 itself cannot simply be
dropped either: doing so seems to lead to a performance regression on
machines with less than 250G of memory and 176 processors. As such, let
us preserve the behavior but clean up the comments.

Fold the /= 4 into the calculation above: shift by 10+2=12 instead of
10, and divide 256KB instead of 1MB by PAGE_SIZE, adjusting the comments
accordingly. No functional change intended.
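
As a quick sanity check (not part of this patch, just an illustration),
the following userspace sketch compares the old and new expressions over
a range of zone sizes and power-of-two page sizes:

	#include <assert.h>
	#include <stdio.h>

	#define SZ_256K	(256UL * 1024)
	#define SZ_1M	(1024UL * 1024)

	static unsigned long min_ul(unsigned long a, unsigned long b)
	{
		return a < b ? a : b;
	}

	int main(void)
	{
		unsigned long pages, page_size;

		/* Power-of-two page sizes from 4K up to 1M. */
		for (page_size = 4096; page_size <= SZ_1M; page_size <<= 1) {
			for (pages = 0; pages < (1UL << 24); pages += 1023) {
				unsigned long old_batch =
					min_ul(pages >> 10, SZ_1M / page_size) / 4;
				unsigned long new_batch =
					min_ul(pages >> 12, SZ_256K / page_size);

				/* The fold must not change the result. */
				assert(old_batch == new_batch);
			}
		}
		printf("old and new batch formulas agree\n");
		return 0;
	}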

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

[1] https://lore.kernel.org/all/20251002204636.4016712-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20131015203547.8724C69C@viggo.jf.intel.com/
---
 mm/page_alloc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..39368cdc953d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5860,13 +5860,12 @@ static int zone_batchsize(struct zone *zone)
 	int batch;
 
 	/*
-	 * The number of pages to batch allocate is either ~0.1%
-	 * of the zone or 1MB, whichever is smaller. The batch
+	 * The number of pages to batch allocate is either ~0.025%
+	 * of the zone or 256KB, whichever is smaller. The batch
 	 * size is striking a balance between allocation latency
 	 * and zone lock contention.
 	 */
-	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
-	batch /= 4;		/* We effectively *= 4 below */
+	batch = min(zone_managed_pages(zone) >> 12, SZ_256K / PAGE_SIZE);
 	if (batch < 1)
 		batch = 1;
 
-- 
2.47.3



* [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0
  2025-10-09 19:29 [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
@ 2025-10-09 19:29 ` Joshua Hahn
  2025-10-13 12:58   ` Vlastimil Babka
  1 sibling, 1 reply; 5+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm, kernel-team

zone_batchsize() returns the value that should be used for pcp->batch.
For a zone with fewer than 4096 pages, or when PAGE_SIZE > 1M, however,
the math goes wrong.

In those cases we end up with an intermediate value of 1, which is then
rounded down to the nearest power of two and has 1 subtracted from it.
Since 1 is already a power of two, we get batch = 1 - 1 = 0:

	batch = rounddown_pow_of_two(batch + batch/2) - 1;
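
Concretely, for the 3998-page DMA zone shown below (assuming 4K pages,
so SZ_256K / PAGE_SIZE = 64):

	batch = min(3998 >> 12, 64);	/* = min(0, 64) = 0 */
	if (batch < 1)
		batch = 1;
	...
	batch = rounddown_pow_of_two(1 + 1/2) - 1;	/* = 1 - 1 = 0 */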

A pcp->batch value of 0 is nonsensical. If this were actually set, then
functions like drain_zone_pages would become no-ops, since they could
only free 0 pages at a time.

Of the two callers of zone_batchsize(), the one that actually sets
pcp->batch works around this by using the maximum of 1 and the returned
value. The other caller, zone_pcp_init(), however, prints the zone's
batch size as 0.

This is probably rare for a typical zone, but the DMA zone often has
fewer than 4096 pages, which means it will print "LIFO batch:0".

Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1

Instead of handling this in the callers and accepting the mismatch
between the reported and actual batch size, just return 1 from
zone_batchsize() if the batch is 1 page or less before the rounding.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39368cdc953d..10a908793b4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5866,8 +5866,8 @@ static int zone_batchsize(struct zone *zone)
 	 * and zone lock contention.
 	 */
 	batch = min(zone_managed_pages(zone) >> 12, SZ_256K / PAGE_SIZE);
-	if (batch < 1)
-		batch = 1;
+	if (batch <= 1)
+		return 1;
 
 	/*
 	 * Clamp the batch to a 2^n - 1 value. Having a power
@@ -6018,7 +6018,7 @@ static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online)
 {
 	int new_high_min, new_high_max, new_batch;
 
-	new_batch = max(1, zone_batchsize(zone));
+	new_batch = zone_batchsize(zone);
 	if (percpu_pagelist_high_fraction) {
 		new_high_min = zone_highsize(zone, new_batch, cpu_online,
 					     percpu_pagelist_high_fraction);
-- 
2.47.3



* Re: [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
@ 2025-10-13 12:55   ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-10-13 12:55 UTC (permalink / raw)
  To: Joshua Hahn, Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Zi Yan, linux-kernel, linux-mm, kernel-team

On 10/9/25 21:29, Joshua Hahn wrote:
> Recently while working on another patch about batching
> free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
> machine. This led me to zone_batchsize(), where I found this set of
> lines to determine what the batch size should be for the host:
> 
> 	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
> 	batch /= 4;		/* We effectively *= 4 below */
> 	if (batch < 1)
> 		batch = 1;
> 
> All of this is fine, except for the comment which says "We effectively
> *= 4 below": nowhere else in zone_batchsize() is there a corresponding
> multiplication by 4. Looking into the history, it seems Dave Hansen had
> already noticed this back in 2013 [2]. It turns out there *used* to be
> a corresponding *= 4, which was later turned into a *= 6 to be used in
> pageset_setup_from_batch_size(), a function that no longer exists.
> 
> The comment was never corrected, but the /= 4 itself cannot simply be
> dropped either: doing so seems to lead to a performance regression on
> machines with less than 250G of memory and 176 processors. As such,
> let us preserve the behavior but clean up the comments.
> 
> Fold the /= 4 into the calculation above: shift by 10+2=12 instead of
> 10, and divide 256KB instead of 1MB by PAGE_SIZE, adjusting the
> comments accordingly. No functional change intended.
> 
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
@ 2025-10-13 12:58   ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-10-13 12:58 UTC (permalink / raw)
  To: Joshua Hahn, Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Zi Yan, linux-kernel, linux-mm, kernel-team

On 10/9/25 21:29, Joshua Hahn wrote:
> zone_batchsize() returns the value that should be used for pcp->batch.
> For a zone with fewer than 4096 pages, or when PAGE_SIZE > 1M, however,
> the math goes wrong.
> 
> In those cases we end up with an intermediate value of 1, which is
> then rounded down to the nearest power of two and has 1 subtracted
> from it. Since 1 is already a power of two, we get batch = 1 - 1 = 0:
> 
> 	batch = rounddown_pow_of_two(batch + batch/2) - 1;
> 
> A pcp->batch value of 0 is nonsensical. If this were actually set, then
> functions like drain_zone_pages would become no-ops, since they could
> only free 0 pages at a time.
> 
> Of the two callers of zone_batchsize(), the one that actually sets
> pcp->batch works around this by using the maximum of 1 and the
> returned value. The other caller, zone_pcp_init(), however, prints the
> zone's batch size as 0.
> 
> This is probably rare for a typical zone, but the DMA zone often has
> fewer than 4096 pages, which means it will print "LIFO batch:0".
> 
> Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
> After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1
> 
> Instead of handling this in the callers and accepting the mismatch
> between the reported and actual batch size, just return 1 from
> zone_batchsize() if the batch is 1 page or less before the rounding.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



