* [PATCH 1/4] mm/page_alloc/vmstat: Simplify refresh_cpu_vm_stats change detection
2025-09-19 19:52 [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Joshua Hahn
@ 2025-09-19 19:52 ` Joshua Hahn
2025-09-19 19:52 ` [PATCH 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone Joshua Hahn
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2025-09-19 19:52 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner
Cc: Chris Mason, Kiryl Shutsemau, Liam R. Howlett, Brendan Jackman,
David Hildenbrand, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
linux-mm, kernel-team
Currently, refresh_cpu_vm_stats returns an int indicating how many
changes were made during its updates. Using this information, callers
like vmstat_update can heuristically determine whether more work is
likely in the future.
However, all of refresh_cpu_vm_stats's callers either (a) ignore the
result, only caring about performing the updates, or (b) only care about
whether changes were made, but not *how many* changes were made.
Simplify the code by returning a bool instead, indicating whether any
updates were made.
In addition, simplify fold_diff and decay_pcp_high to return a bool
for the same reason.
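To illustrate, here is a sketch of the kind of caller in question,
modeled on vmstat_update (the re-queueing details are paraphrased for
illustration, not quoted verbatim from the tree):

static void vmstat_update(struct work_struct *w)
{
	if (refresh_cpu_vm_stats(true)) {
		/*
		 * Some counter was updated, so more work may accumulate;
		 * whether one or fifty counters changed makes no
		 * difference to this decision.
		 */
		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
				this_cpu_ptr(&vmstat_work),
				round_jiffies_relative(sysctl_stat_interval));
	}
}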
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
include/linux/gfp.h | 2 +-
mm/page_alloc.c | 8 ++++----
mm/vmstat.c | 26 +++++++++++++-------------
3 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5ebf26fcdcfa..63c72cb1d117 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -386,7 +386,7 @@ extern void free_pages(unsigned long addr, unsigned int order);
#define free_page(addr) free_pages((addr), 0)
void page_alloc_init_cpuhp(void);
-int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
+bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
void drain_all_pages(struct zone *zone);
void drain_local_pages(struct zone *zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1d037f97c5f..77e7d9a5f149 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2561,10 +2561,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
* Called from the vmstat counter updater to decay the PCP high.
* Return whether there is additional work to do.
*/
-int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
+bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
{
int high_min, to_drain, batch;
- int todo = 0;
+ bool todo = false;
high_min = READ_ONCE(pcp->high_min);
batch = READ_ONCE(pcp->batch);
@@ -2577,7 +2577,7 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
pcp->high = max3(pcp->count - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
pcp->high - (pcp->high >> 3), high_min);
if (pcp->high > high_min)
- todo++;
+ todo = true;
}
to_drain = pcp->count - pcp->high;
@@ -2585,7 +2585,7 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
spin_lock(&pcp->lock);
free_pcppages_bulk(zone, to_drain, pcp, 0);
spin_unlock(&pcp->lock);
- todo++;
+ todo = true;
}
return todo;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 71cd1ceba191..1f74a3517ab2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -771,25 +771,25 @@ EXPORT_SYMBOL(dec_node_page_state);
/*
* Fold a differential into the global counters.
- * Returns the number of counters updated.
+ * Returns whether counters were updated.
*/
static int fold_diff(int *zone_diff, int *node_diff)
{
int i;
- int changes = 0;
+ bool changed = false;
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
if (zone_diff[i]) {
atomic_long_add(zone_diff[i], &vm_zone_stat[i]);
- changes++;
+ changed = true;
}
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
if (node_diff[i]) {
atomic_long_add(node_diff[i], &vm_node_stat[i]);
- changes++;
+ changed = true;
}
- return changes;
+ return changed;
}
/*
@@ -806,16 +806,16 @@ static int fold_diff(int *zone_diff, int *node_diff)
* with the global counters. These could cause remote node cache line
* bouncing and will have to be only done when necessary.
*
- * The function returns the number of global counters updated.
+ * The function returns whether global counters were updated.
*/
-static int refresh_cpu_vm_stats(bool do_pagesets)
+static bool refresh_cpu_vm_stats(bool do_pagesets)
{
struct pglist_data *pgdat;
struct zone *zone;
int i;
int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
- int changes = 0;
+ bool changed = false;
for_each_populated_zone(zone) {
struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
@@ -839,7 +839,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
if (do_pagesets) {
cond_resched();
- changes += decay_pcp_high(zone, this_cpu_ptr(pcp));
+ changed |= decay_pcp_high(zone, this_cpu_ptr(pcp));
#ifdef CONFIG_NUMA
/*
* Deal with draining the remote pageset of this
@@ -861,13 +861,13 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
}
if (__this_cpu_dec_return(pcp->expire)) {
- changes++;
+ changed = true;
continue;
}
if (__this_cpu_read(pcp->count)) {
drain_zone_pages(zone, this_cpu_ptr(pcp));
- changes++;
+ changed = true;
}
#endif
}
@@ -887,8 +887,8 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
}
}
- changes += fold_diff(global_zone_diff, global_node_diff);
- return changes;
+ changed |= fold_diff(global_zone_diff, global_node_diff);
+ return changed;
}
/*
--
2.47.3
* [PATCH 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone
2025-09-19 19:52 [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Joshua Hahn
2025-09-19 19:52 ` [PATCH 1/4] mm/page_alloc/vmstat: Simplify refresh_cpu_vm_stats change detection Joshua Hahn
@ 2025-09-19 19:52 ` Joshua Hahn
2025-09-19 19:52 ` [PATCH 3/4] mm/page_alloc: Batch page freeing in decay_pcp_high Joshua Hahn
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2025-09-19 19:52 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner
Cc: Chris Mason, Kiryl Shutsemau, Brendan Jackman, Michal Hocko,
Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
linux-mm, kernel-team
drain_pages_zone completely drains a zone of its pcp free pages by
repeatedly calling free_pcppages_bulk until pcp->count reaches 0.
In this loop, it already performs batched calls to ensure that
free_pcppages_bulk isn't called to free too many pages at once, and
relinquishes & reacquires the lock between each call to prevent
lock starvation from other processes.
However, the current batching does not prevent lock starvation. The
current implementation creates batches of
pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX pages, which has been seen in
Meta workloads to be as large as 64 << 5 == 2048 pages.
While CONFIG_PCP_BATCH_SCALE_MAX is a config option and can indeed be
adjusted by the system admin to any value from 0 to 6, its default
value of 5 still produces batches too large to be reasonable on any
system.
Instead, let's create batches of pcp->batch pages, which gives a more
reasonable 64 pages per call to free_pcppages_bulk. This gives other
processes a chance to grab the lock and prevents starvation. Each
individual call to drain_pages_zone may take longer, but we avoid the
worst-case scenario of completely starving other system-critical
threads of the pcp lock while 2048 pages are freed one by one.
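For reference, the drain loop around this hunk then behaves roughly as
follows (a simplified sketch of drain_pages_zone after this change, not
the verbatim function):

static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
	int count;

	do {
		spin_lock(&pcp->lock);
		count = pcp->count;
		if (count) {
			/* Free at most pcp->batch pages per lock hold. */
			int to_drain = min(count, pcp->batch);

			free_pcppages_bulk(zone, to_drain, pcp, 0);
			count -= to_drain;
		}
		/* Dropping the lock lets waiting threads make progress. */
		spin_unlock(&pcp->lock);
	} while (count);
}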
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
mm/page_alloc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e7d9a5f149..b861b647f184 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2623,8 +2623,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
spin_lock(&pcp->lock);
count = pcp->count;
if (count) {
- int to_drain = min(count,
- pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+ int to_drain = min(count, pcp->batch);
free_pcppages_bulk(zone, to_drain, pcp, 0);
count -= to_drain;
--
2.47.3
* [PATCH 3/4] mm/page_alloc: Batch page freeing in decay_pcp_high
2025-09-19 19:52 [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Joshua Hahn
2025-09-19 19:52 ` [PATCH 1/4] mm/page_alloc/vmstat: Simplify refresh_cpu_vm_stats change detection Joshua Hahn
2025-09-19 19:52 ` [PATCH 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone Joshua Hahn
@ 2025-09-19 19:52 ` Joshua Hahn
2025-09-19 19:52 ` [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit Joshua Hahn
2025-09-19 20:06 ` [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Andrew Morton
4 siblings, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2025-09-19 19:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Chris Mason, Kiryl Shutsemau, Brendan Jackman,
Michal Hocko, Suren Baghdasaryan, Vlastimil Babka, Zi Yan,
linux-kernel, linux-mm, kernel-team
It is possible for pcp->count - pcp->high to greatly exceed pcp->batch.
When this happens, we should perform batching to ensure that
free_pcppages_bulk isn't called with too many pages to free at once,
which would starve out other threads that need the pcp lock.
Since we are still only freeing the difference between the initial
pcp->count and pcp->high values, there should be no change to how many
pages are freed.
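As a worked example with assumed values (for illustration, not
measurements): if pcp->count is 4096, pcp->high is 1024, and pcp->batch
is 64, then to_drain starts at 3072 and the loop below makes 48 calls
to free_pcppages_bulk of 64 pages each, releasing the pcp lock between
calls, instead of one 3072-page call under a single lock hold.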
Suggested-by: Chris Mason <clm@fb.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Co-developed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
mm/page_alloc.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b861b647f184..467e524a99df 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2563,7 +2563,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
*/
bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
{
- int high_min, to_drain, batch;
+ int high_min, to_drain, to_drain_batched, batch;
bool todo = false;
high_min = READ_ONCE(pcp->high_min);
@@ -2581,11 +2581,14 @@ bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
}
to_drain = pcp->count - pcp->high;
- if (to_drain > 0) {
+ while (to_drain > 0) {
+ to_drain_batched = min(to_drain, batch);
spin_lock(&pcp->lock);
- free_pcppages_bulk(zone, to_drain, pcp, 0);
+ free_pcppages_bulk(zone, to_drain_batched, pcp, 0);
spin_unlock(&pcp->lock);
todo = true;
+
+ to_drain -= to_drain_batched;
}
return todo;
--
2.47.3
* [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit
2025-09-19 19:52 [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Joshua Hahn
` (2 preceding siblings ...)
2025-09-19 19:52 ` [PATCH 3/4] mm/page_alloc: Batch page freeing in decay_pcp_high Joshua Hahn
@ 2025-09-19 19:52 ` Joshua Hahn
2025-09-20 14:41 ` kernel test robot
2025-09-20 17:12 ` kernel test robot
2025-09-19 20:06 ` [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Andrew Morton
4 siblings, 2 replies; 9+ messages in thread
From: Joshua Hahn @ 2025-09-19 19:52 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner
Cc: Chris Mason, Kiryl Shutsemau, Brendan Jackman, Michal Hocko,
Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
linux-mm, kernel-team
Before returning, free_frozen_page_commit calls free_pcppages_bulk,
using nr_pcp_free to determine how many pages can appropriately be
freed based on the tunable parameters stored in pcp. While this number
is an accurate representation of how many pages should be freed in
total, it is not an appropriate number of pages to free in a single
call to free_pcppages_bulk: we have seen the value consistently go
above 2000 in the Meta fleet on larger machines.
As such, perform batched page freeing in free_frozen_page_commit,
capping each call to free_pcppages_bulk at pcp->batch pages. To ensure
that other processes are not starved of the pcp (and zone) lock,
release the pcp lock between batches.
Note that because free_frozen_page_commit now drops and reacquires the
pcp spinlock inside the function (and the reacquisition can fail), it
may return with the pcp lock released.
To handle this, return true if the pcp is locked on exit and false otherwise.
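For context, here is a sketch of the trylock bracket that callers wrap
around this function (modeled on the __free_frozen_pages hunk below; on
!SMP builds the prepare/finish macros save and restore IRQ flags into
the caller-local UP_flags variable):

	unsigned long __maybe_unused UP_flags;
	struct per_cpu_pages *pcp;

	pcp_trylock_prepare(UP_flags);
	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
	if (pcp) {
		/* Only unlock if the callee still holds the pcp lock. */
		if (free_frozen_page_commit(zone, pcp, page, migratetype,
					    order, fpi_flags))
			pcp_spin_unlock(pcp);
	} else {
		free_one_page(zone, page, pfn, order, fpi_flags);
	}
	pcp_trylock_finish(UP_flags);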
Suggested-by: Chris Mason <clm@fb.com>
Co-developed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
mm/page_alloc.c | 45 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 9 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 467e524a99df..dc9412e295dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2821,11 +2821,19 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
return high;
}
-static void free_frozen_page_commit(struct zone *zone,
+/*
+ * Tune pcp alloc factor and adjust count & free_count. Free pages to bring the
+ * pcp's watermarks below high.
+ *
+ * May return a freed pcp, if during page freeing the pcp spinlock cannot be
+ * reacquired. Return true if pcp is locked, false otherwise.
+ */
+static bool free_frozen_page_commit(struct zone *zone,
struct per_cpu_pages *pcp, struct page *page, int migratetype,
unsigned int order, fpi_t fpi_flags)
{
int high, batch;
+ int to_free, to_free_batched;
int pindex;
bool free_high = false;
@@ -2864,17 +2872,34 @@ static void free_frozen_page_commit(struct zone *zone,
* Do not attempt to take a zone lock. Let pcp->count get
* over high mark temporarily.
*/
- return;
+ return true;
}
high = nr_pcp_high(pcp, zone, batch, free_high);
- if (pcp->count >= high) {
- free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
- pcp, pindex);
+ to_free = nr_pcp_free(pcp, batch, high, free_high);
+ while (to_free > 0 && pcp->count >= high) {
+ to_free_batched = min(to_free, batch);
+ free_pcppages_bulk(zone, to_free_batched, pcp, pindex);
if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
zone_watermark_ok(zone, 0, high_wmark_pages(zone),
ZONE_MOVABLE, 0))
clear_bit(ZONE_BELOW_HIGH, &zone->flags);
+
+ high = nr_pcp_high(pcp, zone, batch, free_high);
+ to_free -= to_free_batched;
+ if (pcp->count >= high) {
+ pcp_spin_unlock(pcp);
+ pcp_trylock_finish(UP_flags);
+
+ pcp_trylock_prepare(UP_flags);
+ pcp = pcp_spin_trylock(zone->per_cpu_pageset);
+ if (!pcp) {
+ pcp_trylock_finish(UP_flags);
+ return false;
+ }
+ }
}
+
+ return true;
}
/*
@@ -2922,8 +2947,9 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
pcp_trylock_prepare(UP_flags);
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (pcp) {
- free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
- pcp_spin_unlock(pcp);
+ if (free_frozen_page_commit(zone, pcp, page, migratetype, order,
+ fpi_flags))
+ pcp_spin_unlock(pcp);
} else {
free_one_page(zone, page, pfn, order, fpi_flags);
}
@@ -3022,8 +3048,9 @@ void free_unref_folios(struct folio_batch *folios)
migratetype = MIGRATE_MOVABLE;
trace_mm_page_free_batched(&folio->page);
- free_frozen_page_commit(zone, pcp, &folio->page, migratetype,
- order, FPI_NONE);
+ if (!free_frozen_page_commit(zone, pcp, &folio->page,
+ migratetype, order, FPI_NONE))
+ pcp = NULL;
}
if (pcp) {
--
2.47.3
* Re: [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit
2025-09-19 19:52 ` [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit Joshua Hahn
@ 2025-09-20 14:41 ` kernel test robot
2025-09-20 17:12 ` kernel test robot
1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-09-20 14:41 UTC (permalink / raw)
To: Joshua Hahn, Andrew Morton, Johannes Weiner
Cc: llvm, oe-kbuild-all, Linux Memory Management List, Chris Mason,
Kiryl Shutsemau, Brendan Jackman, Michal Hocko,
Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
kernel-team
Hi Joshua,
kernel test robot noticed the following build errors:
[auto build test ERROR on 097a6c336d0080725c626fda118ecfec448acd0f]
url: https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/mm-page_alloc-vmstat-Simplify-refresh_cpu_vm_stats-change-detection/20250920-035357
base: 097a6c336d0080725c626fda118ecfec448acd0f
patch link: https://lore.kernel.org/r/20250919195223.1560636-5-joshua.hahnjy%40gmail.com
patch subject: [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit
config: x86_64-buildonly-randconfig-003-20250920 (https://download.01.org/0day-ci/archive/20250920/202509202221.bpQxfR5E-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250920/202509202221.bpQxfR5E-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509202221.bpQxfR5E-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/page_alloc.c:2891:23: error: use of undeclared identifier 'UP_flags'
2891 | pcp_trylock_finish(UP_flags);
| ^
>> mm/page_alloc.c:2891:23: error: use of undeclared identifier 'UP_flags'
mm/page_alloc.c:2893:24: error: use of undeclared identifier 'UP_flags'
2893 | pcp_trylock_prepare(UP_flags);
| ^
mm/page_alloc.c:2893:24: error: use of undeclared identifier 'UP_flags'
mm/page_alloc.c:2896:24: error: use of undeclared identifier 'UP_flags'
2896 | pcp_trylock_finish(UP_flags);
| ^
mm/page_alloc.c:2896:24: error: use of undeclared identifier 'UP_flags'
6 errors generated.
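These errors come down to scoping: UP_flags is a local variable that
free_frozen_page_commit's callers declare next to their
pcp_trylock_prepare() calls, so it is not visible inside the function
itself. And since pcp_trylock_prepare()/pcp_trylock_finish() save and
restore IRQ flags on !SMP configurations (as the macro expansions below
show), the function would also need the caller's saved flags to drop
and retake the bracket correctly. One conceivable direction, shown
purely as an illustration and not the author's eventual fix, would be
to pass the caller's flags in by reference:

/* Hypothetical signature: the caller passes a pointer to its UP_flags. */
static bool free_frozen_page_commit(struct zone *zone,
		struct per_cpu_pages *pcp, struct page *page, int migratetype,
		unsigned int order, fpi_t fpi_flags, unsigned long *UP_flags)
{
	/* ... body unchanged up to the batching loop ... */
	if (pcp->count >= high) {
		pcp_spin_unlock(pcp);
		pcp_trylock_finish(*UP_flags);	/* restores caller-saved state */

		pcp_trylock_prepare(*UP_flags);
		pcp = pcp_spin_trylock(zone->per_cpu_pageset);
		if (!pcp) {
			pcp_trylock_finish(*UP_flags);
			return false;
		}
	}
	/* ... */
}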
vim +/UP_flags +2891 mm/page_alloc.c
2823
2824 /*
2825 * Tune pcp alloc factor and adjust count & free_count. Free pages to bring the
2826 * pcp's watermarks below high.
2827 *
2828 * May return a freed pcp, if during page freeing the pcp spinlock cannot be
2829 * reacquired. Return true if pcp is locked, false otherwise.
2830 */
2831 static bool free_frozen_page_commit(struct zone *zone,
2832 struct per_cpu_pages *pcp, struct page *page, int migratetype,
2833 unsigned int order, fpi_t fpi_flags)
2834 {
2835 int high, batch;
2836 int to_free, to_free_batched;
2837 int pindex;
2838 bool free_high = false;
2839
2840 /*
2841 * On freeing, reduce the number of pages that are batch allocated.
2842 * See nr_pcp_alloc() where alloc_factor is increased for subsequent
2843 * allocations.
2844 */
2845 pcp->alloc_factor >>= 1;
2846 __count_vm_events(PGFREE, 1 << order);
2847 pindex = order_to_pindex(migratetype, order);
2848 list_add(&page->pcp_list, &pcp->lists[pindex]);
2849 pcp->count += 1 << order;
2850
2851 batch = READ_ONCE(pcp->batch);
2852 /*
2853 * As high-order pages other than THP's stored on PCP can contribute
2854 * to fragmentation, limit the number stored when PCP is heavily
2855 * freeing without allocation. The remainder after bulk freeing
2856 * stops will be drained from vmstat refresh context.
2857 */
2858 if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
2859 free_high = (pcp->free_count >= (batch + pcp->high_min / 2) &&
2860 (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
2861 (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
2862 pcp->count >= batch));
2863 pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
2864 } else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
2865 pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
2866 }
2867 if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
2868 pcp->free_count += (1 << order);
2869
2870 if (unlikely(fpi_flags & FPI_TRYLOCK)) {
2871 /*
2872 * Do not attempt to take a zone lock. Let pcp->count get
2873 * over high mark temporarily.
2874 */
2875 return true;
2876 }
2877 high = nr_pcp_high(pcp, zone, batch, free_high);
2878 to_free = nr_pcp_free(pcp, batch, high, free_high);
2879 while (to_free > 0 && pcp->count >= high) {
2880 to_free_batched = min(to_free, batch);
2881 free_pcppages_bulk(zone, to_free_batched, pcp, pindex);
2882 if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
2883 zone_watermark_ok(zone, 0, high_wmark_pages(zone),
2884 ZONE_MOVABLE, 0))
2885 clear_bit(ZONE_BELOW_HIGH, &zone->flags);
2886
2887 high = nr_pcp_high(pcp, zone, batch, free_high);
2888 to_free -= to_free_batched;
2889 if (pcp->count >= high) {
2890 pcp_spin_unlock(pcp);
> 2891 pcp_trylock_finish(UP_flags);
2892
2893 pcp_trylock_prepare(UP_flags);
2894 pcp = pcp_spin_trylock(zone->per_cpu_pageset);
2895 if (!pcp) {
2896 pcp_trylock_finish(UP_flags);
2897 return false;
2898 }
2899 }
2900 }
2901
2902 return true;
2903 }
2904
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit
2025-09-19 19:52 ` [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit Joshua Hahn
2025-09-20 14:41 ` kernel test robot
@ 2025-09-20 17:12 ` kernel test robot
1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-09-20 17:12 UTC (permalink / raw)
To: Joshua Hahn, Andrew Morton, Johannes Weiner
Cc: llvm, oe-kbuild-all, Linux Memory Management List, Chris Mason,
Kiryl Shutsemau, Brendan Jackman, Michal Hocko,
Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
kernel-team
Hi Joshua,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 097a6c336d0080725c626fda118ecfec448acd0f]
url: https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/mm-page_alloc-vmstat-Simplify-refresh_cpu_vm_stats-change-detection/20250920-035357
base: 097a6c336d0080725c626fda118ecfec448acd0f
patch link: https://lore.kernel.org/r/20250919195223.1560636-5-joshua.hahnjy%40gmail.com
patch subject: [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit
config: powerpc-taishan_defconfig (https://download.01.org/0day-ci/archive/20250921/202509210003.8eaSRC0k-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250921/202509210003.8eaSRC0k-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509210003.8eaSRC0k-lkp@intel.com/
All warnings (new ones prefixed by >>):
mm/page_alloc.c:2891:23: error: use of undeclared identifier 'UP_flags'; did you mean 'fpi_flags'?
2891 | pcp_trylock_finish(UP_flags);
| ^~~~~~~~
| fpi_flags
mm/page_alloc.c:109:53: note: expanded from macro 'pcp_trylock_finish'
109 | #define pcp_trylock_finish(flags) local_irq_restore(flags)
| ^
include/linux/irqflags.h:240:61: note: expanded from macro 'local_irq_restore'
240 | #define local_irq_restore(flags) do { raw_local_irq_restore(flags); } while (0)
| ^
include/linux/irqflags.h:177:28: note: expanded from macro 'raw_local_irq_restore'
177 | typecheck(unsigned long, flags); \
| ^
include/linux/typecheck.h:11:9: note: expanded from macro 'typecheck'
11 | typeof(x) __dummy2; \
| ^
mm/page_alloc.c:2833:29: note: 'fpi_flags' declared here
2833 | unsigned int order, fpi_t fpi_flags)
| ^
>> mm/page_alloc.c:2891:4: warning: comparison of distinct pointer types ('unsigned long *' and 'typeof (fpi_flags) *' (aka 'int *')) [-Wcompare-distinct-pointer-types]
2891 | pcp_trylock_finish(UP_flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/page_alloc.c:109:35: note: expanded from macro 'pcp_trylock_finish'
109 | #define pcp_trylock_finish(flags) local_irq_restore(flags)
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/irqflags.h:240:39: note: expanded from macro 'local_irq_restore'
240 | #define local_irq_restore(flags) do { raw_local_irq_restore(flags); } while (0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/irqflags.h:177:3: note: expanded from macro 'raw_local_irq_restore'
177 | typecheck(unsigned long, flags); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/typecheck.h:12:18: note: expanded from macro 'typecheck'
12 | (void)(&__dummy == &__dummy2); \
| ~~~~~~~~ ^ ~~~~~~~~~
mm/page_alloc.c:2891:23: error: use of undeclared identifier 'UP_flags'; did you mean 'fpi_flags'?
2891 | pcp_trylock_finish(UP_flags);
| ^~~~~~~~
| fpi_flags
mm/page_alloc.c:109:53: note: expanded from macro 'pcp_trylock_finish'
109 | #define pcp_trylock_finish(flags) local_irq_restore(flags)
| ^
include/linux/irqflags.h:240:61: note: expanded from macro 'local_irq_restore'
240 | #define local_irq_restore(flags) do { raw_local_irq_restore(flags); } while (0)
| ^
include/linux/irqflags.h:179:26: note: expanded from macro 'raw_local_irq_restore'
179 | arch_local_irq_restore(flags); \
| ^
mm/page_alloc.c:2833:29: note: 'fpi_flags' declared here
2833 | unsigned int order, fpi_t fpi_flags)
| ^
mm/page_alloc.c:2893:24: error: use of undeclared identifier 'UP_flags'; did you mean 'fpi_flags'?
2893 | pcp_trylock_prepare(UP_flags);
| ^~~~~~~~
| fpi_flags
mm/page_alloc.c:108:51: note: expanded from macro 'pcp_trylock_prepare'
108 | #define pcp_trylock_prepare(flags) local_irq_save(flags)
| ^
include/linux/irqflags.h:239:55: note: expanded from macro 'local_irq_save'
239 | #define local_irq_save(flags) do { raw_local_irq_save(flags); } while (0)
| ^
include/linux/irqflags.h:172:28: note: expanded from macro 'raw_local_irq_save'
172 | typecheck(unsigned long, flags); \
| ^
include/linux/typecheck.h:11:9: note: expanded from macro 'typecheck'
11 | typeof(x) __dummy2; \
| ^
mm/page_alloc.c:2833:29: note: 'fpi_flags' declared here
2833 | unsigned int order, fpi_t fpi_flags)
| ^
mm/page_alloc.c:2893:4: warning: comparison of distinct pointer types ('unsigned long *' and 'typeof (fpi_flags) *' (aka 'int *')) [-Wcompare-distinct-pointer-types]
2893 | pcp_trylock_prepare(UP_flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/page_alloc.c:108:36: note: expanded from macro 'pcp_trylock_prepare'
108 | #define pcp_trylock_prepare(flags) local_irq_save(flags)
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/irqflags.h:239:36: note: expanded from macro 'local_irq_save'
239 | #define local_irq_save(flags) do { raw_local_irq_save(flags); } while (0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/irqflags.h:172:3: note: expanded from macro 'raw_local_irq_save'
172 | typecheck(unsigned long, flags); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/typecheck.h:12:18: note: expanded from macro 'typecheck'
12 | (void)(&__dummy == &__dummy2); \
| ~~~~~~~~ ^ ~~~~~~~~~
mm/page_alloc.c:2893:24: error: use of undeclared identifier 'UP_flags'
2893 | pcp_trylock_prepare(UP_flags);
| ^
mm/page_alloc.c:2896:24: error: use of undeclared identifier 'UP_flags'; did you mean 'fpi_flags'?
2896 | pcp_trylock_finish(UP_flags);
| ^~~~~~~~
| fpi_flags
mm/page_alloc.c:109:53: note: expanded from macro 'pcp_trylock_finish'
109 | #define pcp_trylock_finish(flags) local_irq_restore(flags)
| ^
include/linux/irqflags.h:240:61: note: expanded from macro 'local_irq_restore'
240 | #define local_irq_restore(flags) do { raw_local_irq_restore(flags); } while (0)
| ^
include/linux/irqflags.h:177:28: note: expanded from macro 'raw_local_irq_restore'
177 | typecheck(unsigned long, flags); \
| ^
include/linux/typecheck.h:11:9: note: expanded from macro 'typecheck'
11 | typeof(x) __dummy2; \
| ^
mm/page_alloc.c:2833:29: note: 'fpi_flags' declared here
2833 | unsigned int order, fpi_t fpi_flags)
| ^
mm/page_alloc.c:2896:5: warning: comparison of distinct pointer types ('unsigned long *' and 'typeof (fpi_flags) *' (aka 'int *')) [-Wcompare-distinct-pointer-types]
2896 | pcp_trylock_finish(UP_flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/page_alloc.c:109:35: note: expanded from macro 'pcp_trylock_finish'
109 | #define pcp_trylock_finish(flags) local_irq_restore(flags)
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/irqflags.h:240:39: note: expanded from macro 'local_irq_restore'
240 | #define local_irq_restore(flags) do { raw_local_irq_restore(flags); } while (0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/irqflags.h:177:3: note: expanded from macro 'raw_local_irq_restore'
177 | typecheck(unsigned long, flags); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/typecheck.h:12:18: note: expanded from macro 'typecheck'
12 | (void)(&__dummy == &__dummy2); \
vim +2891 mm/page_alloc.c
2823
2824 /*
2825 * Tune pcp alloc factor and adjust count & free_count. Free pages to bring the
2826 * pcp's watermarks below high.
2827 *
2828 * May return a freed pcp, if during page freeing the pcp spinlock cannot be
2829 * reacquired. Return true if pcp is locked, false otherwise.
2830 */
2831 static bool free_frozen_page_commit(struct zone *zone,
2832 struct per_cpu_pages *pcp, struct page *page, int migratetype,
2833 unsigned int order, fpi_t fpi_flags)
2834 {
2835 int high, batch;
2836 int to_free, to_free_batched;
2837 int pindex;
2838 bool free_high = false;
2839
2840 /*
2841 * On freeing, reduce the number of pages that are batch allocated.
2842 * See nr_pcp_alloc() where alloc_factor is increased for subsequent
2843 * allocations.
2844 */
2845 pcp->alloc_factor >>= 1;
2846 __count_vm_events(PGFREE, 1 << order);
2847 pindex = order_to_pindex(migratetype, order);
2848 list_add(&page->pcp_list, &pcp->lists[pindex]);
2849 pcp->count += 1 << order;
2850
2851 batch = READ_ONCE(pcp->batch);
2852 /*
2853 * As high-order pages other than THP's stored on PCP can contribute
2854 * to fragmentation, limit the number stored when PCP is heavily
2855 * freeing without allocation. The remainder after bulk freeing
2856 * stops will be drained from vmstat refresh context.
2857 */
2858 if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
2859 free_high = (pcp->free_count >= (batch + pcp->high_min / 2) &&
2860 (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
2861 (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
2862 pcp->count >= batch));
2863 pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
2864 } else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
2865 pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
2866 }
2867 if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
2868 pcp->free_count += (1 << order);
2869
2870 if (unlikely(fpi_flags & FPI_TRYLOCK)) {
2871 /*
2872 * Do not attempt to take a zone lock. Let pcp->count get
2873 * over high mark temporarily.
2874 */
2875 return true;
2876 }
2877 high = nr_pcp_high(pcp, zone, batch, free_high);
2878 to_free = nr_pcp_free(pcp, batch, high, free_high);
2879 while (to_free > 0 && pcp->count >= high) {
2880 to_free_batched = min(to_free, batch);
2881 free_pcppages_bulk(zone, to_free_batched, pcp, pindex);
2882 if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
2883 zone_watermark_ok(zone, 0, high_wmark_pages(zone),
2884 ZONE_MOVABLE, 0))
2885 clear_bit(ZONE_BELOW_HIGH, &zone->flags);
2886
2887 high = nr_pcp_high(pcp, zone, batch, free_high);
2888 to_free -= to_free_batched;
2889 if (pcp->count >= high) {
2890 pcp_spin_unlock(pcp);
> 2891 pcp_trylock_finish(UP_flags);
2892
2893 pcp_trylock_prepare(UP_flags);
2894 pcp = pcp_spin_trylock(zone->per_cpu_pageset);
2895 if (!pcp) {
2896 pcp_trylock_finish(UP_flags);
2897 return false;
2898 }
2899 }
2900 }
2901
2902 return true;
2903 }
2904
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk
2025-09-19 19:52 [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Joshua Hahn
` (3 preceding siblings ...)
2025-09-19 19:52 ` [PATCH 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit Joshua Hahn
@ 2025-09-19 20:06 ` Andrew Morton
2025-09-19 21:06 ` Joshua Hahn
4 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2025-09-19 20:06 UTC (permalink / raw)
To: Joshua Hahn
Cc: Johannes Weiner, Chris Mason, Kiryl Shutsemau, Liam R. Howlett,
Brendan Jackman, David Hildenbrand, Lorenzo Stoakes,
Michal Hocko, Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Zi Yan, linux-kernel, linux-mm, kernel-team
On Fri, 19 Sep 2025 12:52:18 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> While testing workloads with high sustained memory pressure on large machines
> (1TB memory, 316 CPUs), we saw an unexpectedly high number of softlockups.
> Further investigation showed that the lock in free_pcppages_bulk was being held
> for a long time, even being held while 2k+ pages were being freed [1].
What problems are caused by this, apart from a warning which can
presumably be suppressed in some fashion?
> This causes starvation in other processes for both the pcp and zone locks,
> which can lead to softlockups that cause the system to stall [2].
[2] doesn't describe such stalls.
>
> ...
>
> In our fleet, we have seen that performing batched lock freeing has led to
> significantly lower rates of softlockups, while incurring relatively small
> regressions (relative to the workload and relative to the variation).
"our" == Meta?
* Re: [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk
2025-09-19 20:06 ` [PATCH 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Andrew Morton
@ 2025-09-19 21:06 ` Joshua Hahn
0 siblings, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2025-09-19 21:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Chris Mason, Kiryl Shutsemau, Liam R. Howlett,
Brendan Jackman, David Hildenbrand, Lorenzo Stoakes,
Michal Hocko, Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Zi Yan, linux-kernel, linux-mm, kernel-team
Hello Andrew,
Thank you for your review, as always!
On Fri, 19 Sep 2025 13:06:44 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 19 Sep 2025 12:52:18 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> > While testing workloads with high sustained memory pressure on large machines
> > (1TB memory, 316 CPUs), we saw an unexpectedly high number of softlockups.
> > Further investigation showed that the lock in free_pcppages_bulk was being held
> > for a long time, even being held while 2k+ pages were being freed [1].
>
> What problems are caused by this, apart from a warning which can
> presumably be suppressed in some fashion?
There are softlockup panics that we (Meta) saw in the fleet previously. For
some reason I can't get them to reproduce again, but let me try to find a way
to trigger these softlockups again.
> > This causes starvation in other processes for both the pcp and zone locks,
> > which can lead to softlockups that cause the system to stall [2].
>
> [2] doesn't describe such stalls.
You're absolutely right -- I was revising this cover letter a bit and was
going to link the below message separately, but decided to put it in the
message and forgot to remove the footnote. The message below isn't a
softlockup, but let me try to get one to add to the cover letter in a reply
to this.
> >
> > ...
> >
> > In our fleet, we have seen that performing batched lock freeing has led to
> > significantly lower rates of softlockups, while incurring relatively small
> > regressions (relative to the workload and relative to the variation).
>
> "our" == Meta?
Yes -- sorry, I think I made this same mistake in the original version as well.
I'll be more careful about this!
Thank you again for your feedback Andrew, I hope you have a great day!
Joshua