[PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups

* [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups
@ 2025-10-09 19:29 Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  0 siblings, 2 replies; 8+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm, kernel-team

Two small cleanups for mm/page_alloc.

Patch 1 cleans up a misleading comment about how pcp->batch is calculated,
and folds in the calculation to increase clarity. No functional change
intended.

Patch 2 stops zones from reporting a pcp->batch of 0 when the value in
use is actually 1. In particular, it stops ZONE_DMA from reporting a
batch size of 0.

Joshua Hahn (2):
  mm/page_alloc: Clarify batch tuning in zone_batchsize
  mm/page_alloc: Prevent reporting pcp->batch = 0

 mm/page_alloc.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)


base-commit: ec714e371f22f716a04e6ecb2a24988c92b26911
-- 
2.47.3



* [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize
  2025-10-09 19:29 [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups Joshua Hahn
@ 2025-10-09 19:29 ` Joshua Hahn
  2025-10-13 12:55   ` Vlastimil Babka
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  1 sibling, 1 reply; 8+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm, kernel-team

Recently while working on another patch about batching
free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
machine. This led me to zone_batchsize(), where I found this set of
lines to determine what the batch size should be for the host:

	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;

All of this is good, except the comment above which says "We effectively
*= 4 below". Nowhere else in zone_batchsize() is there a corresponding
multiplication by 4. Looking into the history of this, it seems like Dave
Hansen had also noticed this back in 2013 [2]. Turns out there *used* to
be a corresponding *= 4, which was turned into a *= 6 later on to be used
in pageset_setup_from_batch_size(), which no longer exists.

Despite this mismatch not being corrected in the comments, it seems that
getting rid of the /= 4 leads to a performance regression on machines
with less than 250G memory and 176 processors. As such, let us preserve
the functionality but clean up the comments.

Fold the /= 4 into the calculation above: bitshift by 10+2=12, and
instead of dividing 1MB, divide 256KB and adjust the comments
accordingly. No functional change intended.
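
As a sanity check on the "no functional change" claim, here is a minimal
userspace sketch comparing the two forms. The names min_ul(), batch_old(),
batch_new() and check_forms_agree() are hypothetical helpers for
illustration, not kernel functions, and the page sizes tested are only
examples:

```c
#include <assert.h>

/* Same values as the kernel's SZ_1M and SZ_256K. */
#define SZ_1M   0x00100000UL
#define SZ_256K 0x00040000UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Old form: cap at 1MB worth of pages, then divide by 4. */
static unsigned long batch_old(unsigned long pages, unsigned long page_size)
{
	unsigned long batch = min_ul(pages >> 10, SZ_1M / page_size);

	batch /= 4;
	return batch < 1 ? 1 : batch;
}

/* New form: the division by 4 is folded into the shift and the cap. */
static unsigned long batch_new(unsigned long pages, unsigned long page_size)
{
	unsigned long batch = min_ul(pages >> 12, SZ_256K / page_size);

	return batch < 1 ? 1 : batch;
}

/* Assert that both forms agree over a range of zone and page sizes. */
static void check_forms_agree(void)
{
	static const unsigned long page_sizes[] = { 4096, 16384, 65536 };
	unsigned long pages;
	unsigned int i;

	for (i = 0; i < sizeof(page_sizes) / sizeof(page_sizes[0]); i++)
		for (pages = 0; pages <= (1UL << 20); pages++)
			assert(batch_old(pages, page_sizes[i]) ==
			       batch_new(pages, page_sizes[i]));
}
```

Calling check_forms_agree() from a small main() should complete without
tripping an assertion, since dividing by 4 after taking the minimum is the
same as dividing both operands by 4 first under integer division.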

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

[1] https://lore.kernel.org/all/20251002204636.4016712-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20131015203547.8724C69C@viggo.jf.intel.com/
---
 mm/page_alloc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..39368cdc953d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5860,13 +5860,12 @@ static int zone_batchsize(struct zone *zone)
 	int batch;

 	/*
-	 * The number of pages to batch allocate is either ~0.1%
-	 * of the zone or 1MB, whichever is smaller. The batch
+	 * The number of pages to batch allocate is either ~0.025%
+	 * of the zone or 256KB, whichever is smaller. The batch
 	 * size is striking a balance between allocation latency
 	 * and zone lock contention.
 	 */
-	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
-	batch /= 4;		/* We effectively *= 4 below */
+	batch = min(zone_managed_pages(zone) >> 12, SZ_256K / PAGE_SIZE);
 	if (batch < 1)
 		batch = 1;

-- 
2.47.3


* [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0
  2025-10-09 19:29 [PATCH v2 0/2] mm/page_alloc: pcp->batch cleanups Joshua Hahn
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
@ 2025-10-09 19:29 ` Joshua Hahn
  2025-10-13 12:58   ` Vlastimil Babka
  2025-12-16 21:47   ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure] Guenter Roeck
  1 sibling, 2 replies; 8+ messages in thread
From: Joshua Hahn @ 2025-10-09 19:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm, kernel-team

zone_batchsize returns the value that should be used for pcp->batch. For
a zone with fewer than 4096 pages, or when PAGE_SIZE > 1M, however, the
math goes wrong.

In the above case, we will get an intermediary value of 1, which is then
rounded down to the nearest power of two, and 1 is subtracted from it.
Since 1 is already a power of two, we will get batch = 1-1 = 0:

	batch = rounddown_pow_of_two(batch + batch/2) - 1;
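
To make the arithmetic concrete, here is a small userspace sketch of that
clamp step. rounddown_pow2() is a hypothetical stand-in for the kernel's
rounddown_pow_of_two(), and clamp_batch() is a name made up for
illustration:

```c
#include <assert.h>

/* Userspace stand-in for rounddown_pow_of_two(); n must be nonzero. */
static unsigned long rounddown_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Round 1.5 * batch down to a power of two, then subtract 1. */
static int clamp_batch(int batch)
{
	return (int)rounddown_pow2((unsigned long)(batch + batch / 2)) - 1;
}
```

With these helpers, clamp_batch(1) evaluates to 0, which is the bad case
described above. clamp_batch(64) evaluates to 63, which would match a
large zone that hits the 256KB cap, assuming 4KB pages.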

A pcp->batch value of 0 is nonsensical. If this were actually set, then
functions like drain_zone_pages would become no-ops, since they could
only free 0 pages at a time.

Of the two callers of zone_batchsize, the one that is actually used to
set pcp->batch works around this by setting pcp->batch to the maximum
of 1 and zone_batchsize. However, the other caller, zone_pcp_init,
incorrectly prints out the batch size of the zone to be 0.

This is probably rare in a typical zone, but the DMA zone can often have
less than 4096 pages, which means it will print out "LIFO batch:0".

Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1

Instead of dealing with the error handling and the mismatch between the
reported and actual zone batchsize, just return 1 if the zone_batchsize
is 1 page or less before the rounding.
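
The before and after behaviour can be modelled in userspace as follows.
This is only a sketch of the calculation touched by this patch: the
function names, the explicit page_size parameter and rounddown_pow2() are
assumptions made for illustration, and nothing outside this calculation
is modelled.

```c
#include <assert.h>

#define SZ_256K 0x00040000UL	/* same value as the kernel's SZ_256K */

/* Userspace stand-in for rounddown_pow_of_two(); n must be nonzero. */
static unsigned long rounddown_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* min(managed pages >> 12, 256KB worth of pages) */
static int initial_batch(unsigned long pages, unsigned long page_size)
{
	unsigned long by_zone = pages >> 12;
	unsigned long cap = SZ_256K / page_size;

	return (int)(by_zone < cap ? by_zone : cap);
}

/* Before this patch: a batch of 1 is still sent through the clamp. */
static int zone_batchsize_before(unsigned long pages, unsigned long page_size)
{
	int batch = initial_batch(pages, page_size);

	if (batch < 1)
		batch = 1;
	return (int)rounddown_pow2((unsigned long)(batch + batch / 2)) - 1;
}

/* After this patch: a batch of 1 or less returns 1 before the clamp. */
static int zone_batchsize_after(unsigned long pages, unsigned long page_size)
{
	int batch = initial_batch(pages, page_size);

	if (batch <= 1)
		return 1;
	return (int)rounddown_pow2((unsigned long)(batch + batch / 2)) - 1;
}
```

For the 3998-page DMA zone shown above, and assuming 4KB pages,
zone_batchsize_before(3998, 4096) yields 0 while
zone_batchsize_after(3998, 4096) yields 1. Zones large enough to reach the
clamp with a batch of 2 or more get the same result from both versions.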

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39368cdc953d..10a908793b4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5866,8 +5866,8 @@ static int zone_batchsize(struct zone *zone)
 	 * and zone lock contention.
 	 */
 	batch = min(zone_managed_pages(zone) >> 12, SZ_256K / PAGE_SIZE);
-	if (batch < 1)
-		batch = 1;
+	if (batch <= 1)
+		return 1;

 	/*
 	 * Clamp the batch to a 2^n - 1 value. Having a power
@@ -6018,7 +6018,7 @@ static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online)
 {
 	int new_high_min, new_high_max, new_batch;

-	new_batch = max(1, zone_batchsize(zone));
+	new_batch = zone_batchsize(zone);
 	if (percpu_pagelist_high_fraction) {
 		new_high_min = zone_highsize(zone, new_batch, cpu_online,
 					     percpu_pagelist_high_fraction);
-- 
2.47.3


* Re: [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize
  2025-10-09 19:29 ` [PATCH v2 1/2] mm/page_alloc: Clarify batch tuning in zone_batchsize Joshua Hahn
@ 2025-10-13 12:55   ` Vlastimil Babka
  0 siblings, 0 replies; 8+ messages in thread
From: Vlastimil Babka @ 2025-10-13 12:55 UTC (permalink / raw)
  To: Joshua Hahn, Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Zi Yan, linux-kernel, linux-mm, kernel-team

On 10/9/25 21:29, Joshua Hahn wrote:
> Recently while working on another patch about batching
> free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
> machine. This led me to zone_batchsize(), where I found this set of
> lines to determine what the batch size should be for the host:
> 
> 	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
> 	batch /= 4;		/* We effectively *= 4 below */
> 	if (batch < 1)
> 		batch = 1;
> 
> All of this is good, except the comment above which says "We effectively
> *= 4 below". Nowhere else in zone_batchsize() is there a corresponding
> multiplication by 4. Looking into the history of this, it seems like Dave
> Hansen had also noticed this back in 2013 [2]. Turns out there *used* to
> be a corresponding *= 4, which was turned into a *= 6 later on to be used
> in pageset_setup_from_batch_size(), which no longer exists.
> 
> Despite this mismatch not being corrected in the comments, it seems that
> getting rid of the /= 4 leads to a performance regression on machines
> with less than 250G memory and 176 processors. As such, let us preserve
> the functionality but clean up the comments.
> 
> Fold the /= 4 into the calculation above: bitshift by 10+2=12, and
> instead of dividing 1MB, divide 256KB and adjust the comments
> accordingly. No functional change intended.
> 
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
@ 2025-10-13 12:58   ` Vlastimil Babka
  2025-12-16 21:47   ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure] Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: Vlastimil Babka @ 2025-10-13 12:58 UTC (permalink / raw)
  To: Joshua Hahn, Andrew Morton
  Cc: Dave Hansen, Brendan Jackman, Johannes Weiner, Michal Hocko,
	Suren Baghdasaryan, Zi Yan, linux-kernel, linux-mm, kernel-team

On 10/9/25 21:29, Joshua Hahn wrote:
> zone_batchsize returns the appropriate value that should be used for
> pcp->batch. If it finds a zone with less than 4096 pages or PAGE_SIZE >
> 1M, however, it leads to some incorrect math.
> 
> In the above case, we will get an intermediary value of 1, which is then
> rounded down to the nearest power of two, and 1 is subtracted from it.
> Since 1 is already a power of two, we will get batch = 1-1 = 0:
> 
> 	batch = rounddown_pow_of_two(batch + batch/2) - 1;
> 
> A pcp->batch value of 0 is nonsensical. If this were actually set, then
> functions like drain_zone_pages would become no-ops, since they could
> only free 0 pages at a time.
> 
> Of the two callers of zone_batchsize, the one that is actually used to
> set pcp->batch works around this by setting pcp->batch to the maximum
> of 1 and zone_batchsize. However, the other caller, zone_pcp_init,
> incorrectly prints out the batch size of the zone to be 0.
> 
> This is probably rare in a typical zone, but the DMA zone can often have
> less than 4096 pages, which means it will print out "LIFO batch:0".
> 
> Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
> After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1
> 
> Instead of dealing with the error handling and the mismatch between the
> reported and actual zone batchsize, just return 1 if the zone_batchsize
> is 1 page or less before the rounding.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure]
  2025-10-09 19:29 ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 Joshua Hahn
  2025-10-13 12:58   ` Vlastimil Babka
@ 2025-12-16 21:47   ` Guenter Roeck
  2025-12-16 23:20     ` Daniel Palmer
  2025-12-17  5:16     ` Joshua Hahn
  1 sibling, 2 replies; 8+ messages in thread
From: Guenter Roeck @ 2025-12-16 21:47 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Andrew Morton, Dave Hansen, Brendan Jackman, Johannes Weiner,
	Michal Hocko, Suren Baghdasaryan, Vlastimil Babka, Zi Yan,
	linux-kernel, linux-mm, kernel-team, Geert Uytterhoeven,
	linux-m68k

Hi,

On Thu, Oct 09, 2025 at 12:29:31PM -0700, Joshua Hahn wrote:
> zone_batchsize returns the appropriate value that should be used for
> pcp->batch. If it finds a zone with less than 4096 pages or PAGE_SIZE >
> 1M, however, it leads to some incorrect math.
> 
> In the above case, we will get an intermediary value of 1, which is then
> rounded down to the nearest power of two, and 1 is subtracted from it.
> Since 1 is already a power of two, we will get batch = 1-1 = 0:
> 
> 	batch = rounddown_pow_of_two(batch + batch/2) - 1;
> 
> A pcp->batch value of 0 is nonsensical. If this were actually set, then
> functions like drain_zone_pages would become no-ops, since they could
> only free 0 pages at a time.
> 
> Of the two callers of zone_batchsize, the one that is actually used to
> set pcp->batch works around this by setting pcp->batch to the maximum
> of 1 and zone_batchsize. However, the other caller, zone_pcp_init,
> incorrectly prints out the batch size of the zone to be 0.
> 
> This is probably rare in a typical zone, but the DMA zone can often have
> less than 4096 pages, which means it will print out "LIFO batch:0".
> 
> Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
> After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1
> 
> Instead of dealing with the error handling and the mismatch between the
> reported and actual zone batchsize, just return 1 if the zone_batchsize
> is 1 page or less before the rounding.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

With this patch in the tree, the qemu 'mcf5208evb' machine fails to boot
with memory errors such as

S01syslogd: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null)
CPU: 0 UID: 0 PID: 34 Comm: S01syslogd Not tainted 6.19.0-rc1 #1 NONE
Stack from 407d7ce0:
        407d7ce0 403df960 403df960 00000000 00000001 00000007 40027c60 403df960
        400c06be 00000cc0 00000001 407d7d7e 400bf614 407d7d34 403df3ba 407d7d14
        407d7db8 400c0e5c 00000cc0 00000000 403df3ba 00000007 00000007 00000cc0
        000d8000 00000018 0000006c 00000001 00000000 40fe6640 00000000 40fe81e4
        403ffa40 4085eff4 00000000 00000400 00000000 001008c0 00000000 40854041
        f4fe0000 00004041 f4fe0000 00000000 00010000 403ffa40 4085ed00 4085e800
Call Trace: [<40027c60>] dump_stack+0xc/0x10
 [<400c06be>] warn_alloc+0xdc/0x1bc
 [<400bf614>] get_page_from_freelist+0x0/0xfa6
 [<400c0e5c>] __alloc_frozen_pages_noprof+0x6be/0x8be
 [<400c1358>] get_free_pages_noprof+0x16/0x3e

Reverting this patch fixes the problem.

Bisect log is attached for reference.

Guenter

---
# bad: [416f99c3b16f582a3fc6d64a1f77f39d94b76de5] Merge tag 'driver-core-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
# good: [559e608c46553c107dbba19dae0854af7b219400] Merge tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3
git bisect start '416f99c3b16f' '559e608c4655'
# good: [fa5ef105618ae9b5aaa51b3f09e41d88d4514207] Merge tag 'spi-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect good fa5ef105618ae9b5aaa51b3f09e41d88d4514207
# bad: [399ead3a6d76cbdd29a716660db5c84a314dab70] Merge tag 'uml-for-linux-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
git bisect bad 399ead3a6d76cbdd29a716660db5c84a314dab70
# good: [a3ebb59eee2e558e8f8f27fc3f75cd367f17cd8e] Merge tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio
git bisect good a3ebb59eee2e558e8f8f27fc3f75cd367f17cd8e
# bad: [faf3c923523e5c8fc3baaa413d62e913774ae52f] mm: fix vma_start_write_killable() signal handling
git bisect bad faf3c923523e5c8fc3baaa413d62e913774ae52f
# bad: [915a2453d824a9b6bf724e3f970d86ae1d092a61] mm/damon/tests/core-kunit: handle alloc failure on damon_test_set_attrs()
git bisect bad 915a2453d824a9b6bf724e3f970d86ae1d092a61
# bad: [c707a68f9468e4ef4a3546b636a9dd088fe7b7f1] mm: abstract io_remap_pfn_range() based on PFN
git bisect bad c707a68f9468e4ef4a3546b636a9dd088fe7b7f1
# bad: [ca30ac479e6cf7a210dcad32fa2ee99ca0357e91] mm/page_owner: simplify zone iteration logic in init_early_allocated_pages()
git bisect bad ca30ac479e6cf7a210dcad32fa2ee99ca0357e91
# good: [138336d674d2e51f1e5699d2a30af1e9aa1352b4] mm/zswap: remove unnecessary dlen writes for incompressible pages
git bisect good 138336d674d2e51f1e5699d2a30af1e9aa1352b4
# good: [0de9a442eeba4a6435af74120822b10b12ab8449] mm/page_owner: update Documentation with 'show_handles' and 'show_stacks_handles'
git bisect good 0de9a442eeba4a6435af74120822b10b12ab8449
# bad: [95b34d66480bbc9bc31e78c26b1d5be47358ffc0] mm: always call rmap_walk() on locked folios
git bisect bad 95b34d66480bbc9bc31e78c26b1d5be47358ffc0
# bad: [2783088ef24e32df9d70eb2a24f70de28b476a05] mm/page_alloc: prevent reporting pcp->batch = 0
git bisect bad 2783088ef24e32df9d70eb2a24f70de28b476a05
# good: [4dcf65bf5be22e32d389628b0e655731f97f525e] mm/page_alloc: clarify batch tuning in zone_batchsize
git bisect good 4dcf65bf5be22e32d389628b0e655731f97f525e
# first bad commit: [2783088ef24e32df9d70eb2a24f70de28b476a05] mm/page_alloc: prevent reporting pcp->batch = 0



* Re: [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure]
  2025-12-16 21:47   ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure] Guenter Roeck
@ 2025-12-16 23:20     ` Daniel Palmer
  2025-12-17  5:16     ` Joshua Hahn
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Palmer @ 2025-12-16 23:20 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Joshua Hahn, Andrew Morton, Dave Hansen, Brendan Jackman,
	Johannes Weiner, Michal Hocko, Suren Baghdasaryan,
	Vlastimil Babka, Zi Yan, linux-kernel, linux-mm, kernel-team,
	Geert Uytterhoeven, linux-m68k

Hi Guenter,

On Wed, 17 Dec 2025 at 06:47, Guenter Roeck <linux@roeck-us.net> wrote:
> With this patch in the tree, the qemu 'mcf5208evb' machine fails to boot
> with memory errors such as
>
> S01syslogd: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null)
> CPU: 0 UID: 0 PID: 34 Comm: S01syslogd Not tainted 6.19.0-rc1 #1 NONE
> Stack from 407d7ce0:
>         407d7ce0 403df960 403df960 00000000 00000001 00000007 40027c60 403df960
>         400c06be 00000cc0 00000001 407d7d7e 400bf614 407d7d34 403df3ba 407d7d14
>         407d7db8 400c0e5c 00000cc0 00000000 403df3ba 00000007 00000007 00000cc0
>         000d8000 00000018 0000006c 00000001 00000000 40fe6640 00000000 40fe81e4
>         403ffa40 4085eff4 00000000 00000400 00000000 001008c0 00000000 40854041
>         f4fe0000 00004041 f4fe0000 00000000 00010000 403ffa40 4085ed00 4085e800
> Call Trace: [<40027c60>] dump_stack+0xc/0x10
>  [<400c06be>] warn_alloc+0xdc/0x1bc
>  [<400bf614>] get_page_from_freelist+0x0/0xfa6
>  [<400c0e5c>] __alloc_frozen_pages_noprof+0x6be/0x8be
>  [<400c1358>] get_free_pages_noprof+0x16/0x3e

I reported a different issue but with the same change:

https://lore.kernel.org/lkml/CAFr9PX=_HaM3_xPtTiBn5Gw5-0xcRpawpJ02NStfdr0khF2k7g@mail.gmail.com/
https://lore.kernel.org/lkml/20251211102607.2538595-1-daniel@thingy.jp/

Cheers,

Daniel



* Re: [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure]
  2025-12-16 21:47   ` [PATCH v2 2/2] mm/page_alloc: Prevent reporting pcp->batch = 0 [mcf5208evb boot failure] Guenter Roeck
  2025-12-16 23:20     ` Daniel Palmer
@ 2025-12-17  5:16     ` Joshua Hahn
  1 sibling, 0 replies; 8+ messages in thread
From: Joshua Hahn @ 2025-12-17  5:16 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Andrew Morton, Dave Hansen, Brendan Jackman, Johannes Weiner,
	Michal Hocko, Suren Baghdasaryan, Vlastimil Babka, Zi Yan,
	linux-kernel, linux-mm, kernel-team, Geert Uytterhoeven,
	linux-m68k

On Tue, 16 Dec 2025 13:47:03 -0800 Guenter Roeck <linux@roeck-us.net> wrote:

> Hi,
> 
> On Thu, Oct 09, 2025 at 12:29:31PM -0700, Joshua Hahn wrote:
> > zone_batchsize returns the appropriate value that should be used for
> > pcp->batch. If it finds a zone with less than 4096 pages or PAGE_SIZE >
> > 1M, however, it leads to some incorrect math.
> > 
> > In the above case, we will get an intermediary value of 1, which is then
> > rounded down to the nearest power of two, and 1 is subtracted from it.
> > Since 1 is already a power of two, we will get batch = 1-1 = 0:
> > 
> > 	batch = rounddown_pow_of_two(batch + batch/2) - 1;
> > 
> > A pcp->batch value of 0 is nonsensical. If this were actually set, then
> > functions like drain_zone_pages would become no-ops, since they could
> > only free 0 pages at a time.
> > 
> > Of the two callers of zone_batchsize, the one that is actually used to
> > set pcp->batch works around this by setting pcp->batch to the maximum
> > of 1 and zone_batchsize. However, the other caller, zone_pcp_init,
> > incorrectly prints out the batch size of the zone to be 0.
> > 
> > This is probably rare in a typical zone, but the DMA zone can often have
> > less than 4096 pages, which means it will print out "LIFO batch:0".
> > 
> > Before: [    0.001216]   DMA zone: 3998 pages, LIFO batch:0
> > After:  [    0.001210]   DMA zone: 3998 pages, LIFO batch:1
> > 
> > Instead of dealing with the error handling and the mismatch between the
> > reported and actual zone batchsize, just return 1 if the zone_batchsize
> > is 1 page or less before the rounding.
> > 
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 
> With this patch in the tree, the qemu 'mcf5208evb' machine fails to boot
> with memory errors such as
> 
> S01syslogd: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null)
> CPU: 0 UID: 0 PID: 34 Comm: S01syslogd Not tainted 6.19.0-rc1 #1 NONE
> Stack from 407d7ce0:
>         407d7ce0 403df960 403df960 00000000 00000001 00000007 40027c60 403df960
>         400c06be 00000cc0 00000001 407d7d7e 400bf614 407d7d34 403df3ba 407d7d14
>         407d7db8 400c0e5c 00000cc0 00000000 403df3ba 00000007 00000007 00000cc0
>         000d8000 00000018 0000006c 00000001 00000000 40fe6640 00000000 40fe81e4
>         403ffa40 4085eff4 00000000 00000400 00000000 001008c0 00000000 40854041
>         f4fe0000 00004041 f4fe0000 00000000 00010000 403ffa40 4085ed00 4085e800
> Call Trace: [<40027c60>] dump_stack+0xc/0x10
>  [<400c06be>] warn_alloc+0xdc/0x1bc
>  [<400bf614>] get_page_from_freelist+0x0/0xfa6
>  [<400c0e5c>] __alloc_frozen_pages_noprof+0x6be/0x8be
>  [<400c1358>] get_free_pages_noprof+0x16/0x3e
> 
> Reverting this patch fixes the problem.

Hi Guenter,

Thank you for the report. Daniel Palmer has identified an issue on NOMMU
systems, and I think this is caused by the same issue. It seems like
mcf5208evb is also NOMMU (arch/m68k/Kconfig.cpu shows config M520x depends on
!MMU), so I imagine this is the same issue that was reported.

Andrew let me know that the commit has already been merged into mainline,
so I'll be sending a fix shortly. Sorry about the problem, and thank you
again for reporting it. I hope you have a great day!
Joshua

