* [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups
@ 2025-10-09 19:29 Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  0 siblings, 2 replies; 5+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel, linux-mm,
	kernel-team

Two small cleanups for mm/page_alloc.

Patch 1 cleans up a misleading comment about how pcp->batch is
calculated, and folds in the calculation to increase clarity. No
functional change intended.

Patch 2 prevents zones from reporting that their pcp->batch is 0 when it
is actually 1. In particular, it stops ZONE_DMA from reporting a batch
size of 0.

Joshua Hahn (2):
  mm/page_alloc: Clarify batch tuning in zone_batchsize
  mm/page_alloc: Prevent reporting pcp->batch = 0

 mm/page_alloc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)


base-commit: ec714e371f22f716a04e6ecb2a24988c92b26911
-- 
2.47.3

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize
  2025-10-09 19:29 [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups Joshua Hahn
@ 2025-10-09 19:29 ` Joshua Hahn
  2025-10-13 12:55   ` Vlastimil Babka
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  1 sibling, 1 reply; 5+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel, linux-mm,
	kernel-team

Recently while working on another patch about batching
free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
machine. This led me to zone_batchsize(), where I found this set of
lines that determines the batch size for the host:

	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;

All of this is good, except the comment above, which says "We
effectively *= 4 below". Nowhere else in zone_batchsize() is there a
corresponding multiplication by 4. Looking into the history of this, it
seems that Dave Hansen had already noticed this back in 2013 [2]. It
turns out there *used* to be a corresponding *= 4, which was later
turned into a *= 6 to be used in pageset_setup_from_batch_size(), a
function that no longer exists.

Despite the comment never being corrected, it seems that getting rid of
the /= 4 leads to a performance regression on machines with less than
250G of memory and 176 processors. As such, let us preserve the
functionality but clean up the comments.

Fold the /= 4 into the calculation above: shift by 10+2=12 and use 256KB
instead of 1MB as the cap, then adjust the comments accordingly. No
functional change intended.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

[1] https://lore.kernel.org/all/20251002204636.4016712-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20131015203547.8724C69C@viggo.jf.intel.com/
---
 mm/page_alloc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..39368cdc953d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5860,13 +5860,12 @@ static int zone_batchsize(struct zone *zone)
 	int batch;
 
 	/*
-	 * The number of pages to batch allocate is either ~0.1%
-	 * of the zone or 1MB, whichever is smaller. The batch
+	 * The number of pages to batch allocate is either ~0.025%
+	 * of the zone or 256KB, whichever is smaller. The batch
 	 * size is striking a balance between allocation latency
 	 * and zone lock contention.
 	 */
-	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
-	batch /= 4;		/* We effectively *= 4 below */
+	batch = min(zone_managed_pages(zone) >> 12, SZ_256K / PAGE_SIZE);
 	if (batch < 1)
 		batch = 1;
 
-- 
2.47.3

^ permalink raw reply	[flat|nested] 5+ messages in thread
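For illustration, here is a minimal user-space sketch of the arithmetic
above. It is not kernel code: the zone sizes and the 4KB PAGE_SIZE are
assumed values, min_ul(), batch_old() and batch_new() are hypothetical
helper names, and rounddown_pow_of_two() is reimplemented locally. It
shows why a large zone ends up with pcp->batch = 63, and that the old
and the folded calculations agree.

/*
 * Sketch of the zone_batchsize() arithmetic -- not kernel code.
 * Zone sizes and PAGE_SIZE are assumptions for illustration only.
 */
#include <stdio.h>

#define PAGE_SIZE	4096UL
#define SZ_256K		(256UL * 1024)
#define SZ_1M		(1024UL * 1024)

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Simplified stand-in for the kernel's rounddown_pow_of_two(). */
static unsigned long rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

/* Old calculation: ~0.1% of the zone or 1MB, then /= 4. */
static long batch_old(unsigned long managed_pages)
{
	long batch = min_ul(managed_pages >> 10, SZ_1M / PAGE_SIZE);

	batch /= 4;
	if (batch < 1)
		batch = 1;
	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

/* Folded calculation: ~0.025% of the zone or 256KB. */
static long batch_new(unsigned long managed_pages)
{
	long batch = min_ul(managed_pages >> 12, SZ_256K / PAGE_SIZE);

	if (batch < 1)
		batch = 1;
	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

int main(void)
{
	/* Assumed zones: a 64GB zone and a ~16MB DMA-sized zone. */
	unsigned long pages[] = { 16UL * 1024 * 1024, 3998UL };

	for (int i = 0; i < 2; i++)
		printf("%8lu pages: old batch = %ld, new batch = %ld\n",
		       pages[i], batch_old(pages[i]), batch_new(pages[i]));
	return 0;
}

For the 64GB zone both variants print 63, which is the value the patch
author observed; for the 3998-page case both print 0, the corner case
that the next patch in the series addresses.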
* Re: [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
@ 2025-10-13 12:55   ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-10-13 12:55 UTC (permalink / raw)
  To: Joshua Hahn, Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Zi Yan, linux-kernel, linux-mm, kernel-team

On 10/9/25 21:29, Joshua Hahn wrote:
> Recently while working on another patch about batching
> free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
> machine. This led me to zone_batchsize(), where I found this set of
> lines that determines the batch size for the host:
>
> 	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
> 	batch /= 4;		/* We effectively *= 4 below */
> 	if (batch < 1)
> 		batch = 1;
>
> All of this is good, except the comment above, which says "We
> effectively *= 4 below". Nowhere else in zone_batchsize() is there a
> corresponding multiplication by 4. Looking into the history of this, it
> seems that Dave Hansen had already noticed this back in 2013 [2]. It
> turns out there *used* to be a corresponding *= 4, which was later
> turned into a *= 6 to be used in pageset_setup_from_batch_size(), a
> function that no longer exists.
>
> Despite the comment never being corrected, it seems that getting rid of
> the /= 4 leads to a performance regression on machines with less than
> 250G of memory and 176 processors. As such, let us preserve the
> functionality but clean up the comments.
>
> Fold the /= 4 into the calculation above: shift by 10+2=12 and use 256KB
> instead of 1MB as the cap, then adjust the comments accordingly. No
> functional change intended.
>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0
  2025-10-09 19:29 [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
@ 2025-10-09 19:29 ` Joshua Hahn
  2025-10-13 12:58   ` Vlastimil Babka
  1 sibling, 1 reply; 5+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel, linux-mm,
	kernel-team

zone_batchsize() returns the value that should be used for pcp->batch.
If it finds a zone with fewer than 4096 pages or PAGE_SIZE > 1M,
however, the math goes wrong.

In that case we get an intermediate value of 1, which is then rounded
down to the nearest power of two, with 1 subtracted from it. Since 1 is
already a power of two, we end up with batch = 1-1 = 0:

	batch = rounddown_pow_of_two(batch + batch/2) - 1;

A pcp->batch value of 0 is nonsensical. If it were actually set,
functions like drain_zone_pages() would become no-ops, since they could
only free 0 pages at a time.

Of the two callers of zone_batchsize(), the one that actually sets
pcp->batch works around this by using the maximum of 1 and
zone_batchsize(). The other caller, zone_pcp_init(), however,
incorrectly prints the zone's batch size as 0.

This is rare for a typical zone, but the DMA zone often has fewer than
4096 pages, which means it prints "LIFO batch:0".

Before: [ 0.001216] DMA zone: 3998 pages, LIFO batch:0
After:  [ 0.001210] DMA zone: 3998 pages, LIFO batch:1

Instead of dealing with the error handling and the mismatch between the
reported and actual zone batch size, just return 1 if the batch is 1
page or less before the rounding.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39368cdc953d..10a908793b4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5866,8 +5866,8 @@ static int zone_batchsize(struct zone *zone)
 	 * and zone lock contention.
 	 */
 	batch = min(zone_managed_pages(zone) >> 12, SZ_256K / PAGE_SIZE);
-	if (batch < 1)
-		batch = 1;
+	if (batch <= 1)
+		return 1;
 
 	/*
 	 * Clamp the batch to a 2^n - 1 value. Having a power
@@ -6018,7 +6018,7 @@ static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online)
 {
 	int new_high_min, new_high_max, new_batch;
 
-	new_batch = max(1, zone_batchsize(zone));
+	new_batch = zone_batchsize(zone);
 	if (percpu_pagelist_high_fraction) {
 		new_high_min = zone_highsize(zone, new_batch, cpu_online,
 					     percpu_pagelist_high_fraction);
-- 
2.47.3

^ permalink raw reply	[flat|nested] 5+ messages in thread
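As an illustration of the clamping step described above, here is a small
user-space sketch (not kernel code; finish_old() and finish_new() are
hypothetical names for the tail of zone_batchsize() before and after
this patch, and rounddown_pow_of_two() is reimplemented locally). It
shows why a pre-rounding batch of 1 collapses to 0, and that returning 1
early only changes the result for batches of 0 or 1.

/*
 * Sketch of the zone_batchsize() clamping step -- not kernel code.
 * Input values are illustrative only.
 */
#include <stdio.h>

/* Simplified stand-in for the kernel's rounddown_pow_of_two(). */
static int rounddown_pow_of_two(int n)
{
	int p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

/* Tail of zone_batchsize() before this patch. */
static int finish_old(int batch)
{
	if (batch < 1)
		batch = 1;
	/* For batch == 1: rounddown_pow_of_two(1 + 0) - 1 == 0. */
	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

/* Tail of zone_batchsize() after this patch. */
static int finish_new(int batch)
{
	if (batch <= 1)
		return 1;	/* tiny zones: skip the rounding entirely */
	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

int main(void)
{
	for (int batch = 0; batch <= 4; batch++)
		printf("pre-rounding batch %d: old -> %d, new -> %d\n",
		       batch, finish_old(batch), finish_new(batch));
	return 0;
}

For every pre-rounding batch of 2 or more the two variants agree, which
matches the claim that only the value reported for tiny zones changes.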
* Re: [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
@ 2025-10-13 12:58   ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-10-13 12:58 UTC (permalink / raw)
  To: Joshua Hahn, Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Zi Yan, linux-kernel, linux-mm, kernel-team

On 10/9/25 21:29, Joshua Hahn wrote:
> zone_batchsize() returns the value that should be used for pcp->batch.
> If it finds a zone with fewer than 4096 pages or PAGE_SIZE > 1M,
> however, the math goes wrong.
>
> In that case we get an intermediate value of 1, which is then rounded
> down to the nearest power of two, with 1 subtracted from it. Since 1 is
> already a power of two, we end up with batch = 1-1 = 0:
>
> 	batch = rounddown_pow_of_two(batch + batch/2) - 1;
>
> A pcp->batch value of 0 is nonsensical. If it were actually set,
> functions like drain_zone_pages() would become no-ops, since they could
> only free 0 pages at a time.
>
> Of the two callers of zone_batchsize(), the one that actually sets
> pcp->batch works around this by using the maximum of 1 and
> zone_batchsize(). The other caller, zone_pcp_init(), however,
> incorrectly prints the zone's batch size as 0.
>
> This is rare for a typical zone, but the DMA zone often has fewer than
> 4096 pages, which means it prints "LIFO batch:0".
>
> Before: [ 0.001216] DMA zone: 3998 pages, LIFO batch:0
> After:  [ 0.001210] DMA zone: 3998 pages, LIFO batch:1
>
> Instead of dealing with the error handling and the mismatch between the
> reported and actual zone batch size, just return 1 if the batch is 1
> page or less before the rounding.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 5+ messages in thread