[PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
@ 2026-01-21  6:57 Qiliang Yuan
  2026-01-21 20:56 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-21  6:57 UTC (permalink / raw)
  To: akpm
  Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
	jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel,
	Qiliang Yuan

Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
'Soft Boost' mechanism to mitigate this.

When a GFP_ATOMIC request fails or enters the slowpath, the preferred
zone's watermark_boost is increased. This triggers kswapd to proactively
reclaim memory, creating a safety buffer for future atomic bursts.

To prevent excessive reclaim during packet storms, a 1-second debounce
timer (last_boost_jiffies) is added to each zone to rate-limit boosts.

This approach reuses existing watermark_boost infrastructure, ensuring
minimal overhead and asynchronous background reclaim via kswapd.

Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
v5:
  - Replaced custom watermark_scale_boost and manual recomputations with 
    native boost_watermark reuse.
  - Simplified logic to use existing 'boost' architecture for better 
    community acceptability.
v4:
  - Introduced watermark_scale_boost and gradual decay via balance_pgdat.
  - Added proactive soft-boosting when entering slowpath.
v3:
  - Moved debounce timer to per-zone to avoid cross-node interference.
  - Optimized candidate zone selection to reduce global reclaim pressure.
v2:
  - Added basic debounce logic and scaled boosting strength based on zone size.
v1:
  - Initial proposal: Basic watermark boost on atomic allocation failure.
---
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 29 ++++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
 	/* zone watermarks, access with *_wmark_pages(zone) macros */
 	unsigned long _watermark[NR_WMARK];
 	unsigned long watermark_boost;
+	unsigned long last_boost_jiffies;
 
 	unsigned long nr_reserved_highatomic;
 	unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..1faace9e2dc5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone)
 
 	max_boost = max(pageblock_nr_pages, max_boost);
 
-	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
+	zone->watermark_boost = min(zone->watermark_boost +
+		max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
 		max_boost);
 
 	return true;
 }
 
+static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+	struct zoneref *z;
+	struct zone *zone;
+	unsigned long now = jiffies;
+
+	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+		/* 1 second debounce to avoid spamming boosts in a burst */
+		if (time_after(now, zone->last_boost_jiffies + HZ)) {
+			zone->last_boost_jiffies = now;
+			if (boost_watermark(zone))
+				wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+			/* Only boost the preferred zone to be precise */
+			break;
+		}
+	}
+}
+
 /*
  * When we are falling back to another migratetype during allocation, should we
  * try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/* Proactively boost for atomic requests entering slowpath */
+	if ((gfp_mask & GFP_ATOMIC) && order == 0)
+		boost_zones_for_atomic(ac, gfp_mask);
+
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
 	 * that we have enough base pages and don't need to reclaim. For non-
@@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto retry;
 	}
 fail:
+	/* Boost watermarks on atomic allocation failure to trigger kswapd */
+	if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
+		boost_zones_for_atomic(ac, gfp_mask);
+
 	warn_alloc(gfp_mask, ac->nodemask,
 			"page allocation failure: order:%u", order);
 got_pg:
-- 
2.51.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-21  6:57 [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure Qiliang Yuan
@ 2026-01-21 20:56 ` Andrew Morton
  2026-01-22  1:40   ` Qiliang Yuan
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andrew Morton @ 2026-01-21 20:56 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
	jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev,
	Eric Dumazet

On Wed, 21 Jan 2026 01:57:40 -0500 Qiliang Yuan <realwujing@gmail.com> wrote:

> Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> pressure as they cannot enter direct reclaim. This patch introduces a
> 'Soft Boost' mechanism to mitigate this.
> 
> When a GFP_ATOMIC request fails or enters the slowpath, the preferred
> zone's watermark_boost is increased. This triggers kswapd to proactively
> reclaim memory, creating a safety buffer for future atomic bursts.
> 
> To prevent excessive reclaim during packet storms, a 1-second debounce
> timer (last_boost_jiffies) is added to each zone to rate-limit boosts.
> 
> This approach reuses existing watermark_boost infrastructure, ensuring
> minimal overhead and asynchronous background reclaim via kswapd.
> 
> Allocation failure logs:
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1

This seems sensible to me - dynamically boost reserves in response to
sustained GFP_ATOMIC allocation failures.

It's very much a networking thing and I expect the networking people
have been looking at these issues for years.  So let's start by cc'ing
them!

Obvious question, which I think was asked before: what about gradually
decreasing those reserves when the packet storm has subsided?

> v4:
>   - Introduced watermark_scale_boost and gradual decay via balance_pgdat.

And there it is, but v5 removed this.  Why?

Or perhaps I'm misreading the implementation.

>   - Added proactive soft-boosting when entering slowpath.
> v3:
>   - Moved debounce timer to per-zone to avoid cross-node interference.
>   - Optimized candidate zone selection to reduce global reclaim pressure.
> v2:
>   - Added basic debounce logic and scaled boosting strength based on zone size.
> v1:
>   - Initial proposal: Basic watermark boost on atomic allocation failure.
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c        | 29 ++++++++++++++++++++++++++++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 75ef7c9f9307..8e37e4e6765b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,6 +882,7 @@ struct zone {
>  	/* zone watermarks, access with *_wmark_pages(zone) macros */
>  	unsigned long _watermark[NR_WMARK];
>  	unsigned long watermark_boost;
> +	unsigned long last_boost_jiffies;
>  
>  	unsigned long nr_reserved_highatomic;
>  	unsigned long nr_free_highatomic;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..1faace9e2dc5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone)
>  
>  	max_boost = max(pageblock_nr_pages, max_boost);
>  
> -	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> +	zone->watermark_boost = min(zone->watermark_boost +
> +		max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),

">> 10" is a magic number.  What is the reasoning behind choosing this
value?

>  		max_boost);
>  
>  	return true;
>  }
>  
> +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +	unsigned long now = jiffies;
> +
> +	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
> +		/* 1 second debounce to avoid spamming boosts in a burst */
> +		if (time_after(now, zone->last_boost_jiffies + HZ)) {
> +			zone->last_boost_jiffies = now;
> +			if (boost_watermark(zone))
> +				wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
> +			/* Only boost the preferred zone to be precise */
> +			break;
> +		}
> +	}
> +}
> +
>  /*
>   * When we are falling back to another migratetype during allocation, should we
>   * try to claim an entire block to satisfy further allocations, instead of
> @@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (page)
>  		goto got_pg;
>  
> +	/* Proactively boost for atomic requests entering slowpath */
> +	if ((gfp_mask & GFP_ATOMIC) && order == 0)
> +		boost_zones_for_atomic(ac, gfp_mask);
> +
>  	/*
>  	 * For costly allocations, try direct compaction first, as it's likely
>  	 * that we have enough base pages and don't need to reclaim. For non-
> @@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		goto retry;
>  	}
>  fail:
> +	/* Boost watermarks on atomic allocation failure to trigger kswapd */
> +	if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
> +		boost_zones_for_atomic(ac, gfp_mask);
> +
>  	warn_alloc(gfp_mask, ac->nodemask,
>  			"page allocation failure: order:%u", order);
>  got_pg:
> -- 
> 2.51.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-21 20:56 ` Andrew Morton
@ 2026-01-22  1:40   ` Qiliang Yuan
  2026-01-22  2:00   ` [PATCH] " Qiliang Yuan
  2026-01-22  2:07   ` [PATCH v6] " Qiliang Yuan
  2 siblings, 0 replies; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-22  1:40 UTC (permalink / raw)
  To: akpm
  Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
	jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev,
	edumazet

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 4687 bytes --]

On Wed, 21 Jan 2026 12:56:03 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> This seems sensible to me - dynamically boost reserves in response to
> sustained GFP_ATOMIC allocation failures.  It's very much a networking
> thing and I expect the networking people have been looking at these
> issues for years.  So let's start by cc'ing them!

Thank you for the feedback and for cc'ing the networking folks! I appreciate
your continued engagement throughout this patch series (v1-v5).

> Obvious question, which I think was asked before: what about gradually
> decreasing those reserves when the packet storm has subsided?
> 
> > v4:
> > - Introduced watermark_scale_boost and gradual decay via balance_pgdat.
> 
> And there it is, but v5 removed this.  Why?  Or perhaps I'm misreading
> the implementation.

You're absolutely right - v4 did include a gradual decay mechanism. The
evolution from v1 to v5 was driven by community feedback, and I'd like to
explain the rationale for each major change:

**v1 → v2**: Following your and Matthew Wilcox's feedback on v1, I:
- Reduced the boost from doubling (100%) to 50% increase
- Added a decay mechanism (5% every 5 minutes)
- Added debounce logic
- v1: https://lore.kernel.org/all/tencent_9DB6637676D639B4B7AEA09CC6A6F9E49D0A@qq.com/
- v2: https://lore.kernel.org/all/tencent_6FE67BA7BE8376AB038A71ACAD4FF8A90006@qq.com/

**v2 → v3**: Following Michal Hocko's suggestion to use watermark_scale_factor
instead of min_free_kbytes, I switched to the watermark_boost infrastructure.
This was a significant simplification that reused existing MM subsystem patterns.
- v3: https://lore.kernel.org/all/tencent_44B556221480D8371FBC534ACCF3CE2C8707@qq.com/

**v3 → v4**: Added watermark_scale_boost and gradual decay via balance_pgdat()
to provide more fine-grained control over the reclaim aggressiveness.
- v4: https://lore.kernel.org/all/tencent_D23BFCB69EA088C55AFAF89F926036743E0A@qq.com/

**v4 → v5**: Removed watermark_scale_boost for the following reasons:
- v5: https://lore.kernel.org/all/20260121065740.35616-1-realwujing@gmail.com/

1. **Natural decay exists**: The existing watermark_boost infrastructure already
   has a built-in decay path. When kswapd successfully reclaims memory and the
   zone becomes balanced, kswapd_shrink_node() automatically resets
   watermark_boost to 0. This happens organically without custom decay logic.

2. **Simplicity**: The v4 approach added custom watermark_scale_boost tracking
   and manual decay in balance_pgdat(). This added complexity that duplicated
   functionality already present in the kswapd reclaim path.

3. **Production validation**: In our production environment (high-throughput
   networking workloads), the natural decay via kswapd proved sufficient. Once
   memory pressure subsides and kswapd successfully reclaims to the high
   watermark, the boost is cleared automatically within seconds.

However, I recognize this is a trade-off. The v4 gradual decay provided more
explicit control over the decay rate. If you or the networking maintainers feel
that explicit decay control is important for packet storm scenarios, I'm happy
to reintroduce the v4 approach or explore alternative decay strategies (e.g.,
time-based decay independent of kswapd success).

> > +	zone->watermark_boost = min(zone->watermark_boost +
> > +		max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
> 
> ">> 10" is a magic number.  What is the reasoning behind choosing this
> value?

Good catch. The ">> 10" (divide by 1024) was chosen to provide a
zone-proportional boost that scales with zone size:

- For a 1GB zone: ~1MB boost per trigger
- For a 16GB zone: ~16MB boost per trigger

The rationale:
1. **Proportionality**: Larger zones experiencing atomic allocation pressure
   likely need proportionally larger safety buffers. A fixed pageblock_nr_pages
   (typically 2MB) might be insufficient for large zones under heavy load.

2. **Conservative scaling**: 1/1024 (~0.1%) is aggressive enough to help during
   sustained pressure but conservative enough to avoid over-reclaim. This was
   empirically tuned based on our production workload.

3. **Production results**: In our high-throughput networking environment
   (100Gbps+ traffic bursts), this value reduced GFP_ATOMIC failures by ~95%
   without causing excessive kswapd activity or impacting normal allocations.

I should document this better. I propose adding a #define:

```c
/*
 * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
 * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
 */
#define ATOMIC_BOOST_SCALE_SHIFT 10
```

Best regards,
Qiliang Yuan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-21 20:56 ` Andrew Morton
  2026-01-22  1:40   ` Qiliang Yuan
@ 2026-01-22  2:00   ` Qiliang Yuan
  2026-01-22  2:17     ` Qiliang Yuan
  2026-01-22  2:07   ` [PATCH v6] " Qiliang Yuan
  2 siblings, 1 reply; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-22  2:00 UTC (permalink / raw)
  To: akpm
  Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
	jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev,
	edumazet, jis1, wangh13, liyi1, sunshx, zhangzq20, zhangjn11,
	Qiliang Yuan, Qiliang Yuan

Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
'Soft Boost' mechanism to mitigate this.

When a GFP_ATOMIC request fails or enters the slowpath, the preferred
zone's watermark_boost is increased. This triggers kswapd to proactively
reclaim memory, creating a safety buffer for future atomic bursts.

To prevent excessive reclaim during packet storms, a 1-second debounce
timer (last_boost_jiffies) is added to each zone to rate-limit boosts.

This approach reuses existing watermark_boost infrastructure, ensuring
minimal overhead and asynchronous background reclaim via kswapd.

Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
v6:
  - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define
  - Add documentation explaining 0.1% zone size boost rationale
v5:
  - Simplify to use native boost_watermark() instead of custom logic
v4:
  - Add watermark_scale_boost and gradual decay via balance_pgdat
v3:
  - Move debounce timer to per-zone; optimize zone selection
v2:
  - Add debounce logic and zone-proportional boosting
v1:
  - Initial: boost min_free_kbytes on GFP_ATOMIC failure
---
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)
---
 mm/page_alloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1faace9e2dc5..8ea2435125d5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags);
 
+/*
+ * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
+ * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
+ * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
+ */
+#define ATOMIC_BOOST_SCALE_SHIFT 10
+
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
  *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
@@ -2190,7 +2197,7 @@ static inline bool boost_watermark(struct zone *zone)
 	max_boost = max(pageblock_nr_pages, max_boost);
 
 	zone->watermark_boost = min(zone->watermark_boost +
-		max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
+		max(pageblock_nr_pages, zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT),
 		max_boost);
 
 	return true;
-- 
2.51.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v6] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-21 20:56 ` Andrew Morton
  2026-01-22  1:40   ` Qiliang Yuan
  2026-01-22  2:00   ` [PATCH] " Qiliang Yuan
@ 2026-01-22  2:07   ` Qiliang Yuan
  2026-01-22 12:22     ` Vlastimil Babka
  2 siblings, 1 reply; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-22  2:07 UTC (permalink / raw)
  To: akpm
  Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
	jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev,
	edumazet, jis1, wangh13, liyi1, sunshx, zhangzq20, zhangjn11,
	Qiliang Yuan, Qiliang Yuan

Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
'Soft Boost' mechanism to mitigate this.

When a GFP_ATOMIC request fails or enters the slowpath, the preferred
zone's watermark_boost is increased. This triggers kswapd to proactively
reclaim memory, creating a safety buffer for future atomic bursts.

To prevent excessive reclaim during packet storms, a 1-second debounce
timer (last_boost_jiffies) is added to each zone to rate-limit boosts.

This approach reuses existing watermark_boost infrastructure, ensuring
minimal overhead and asynchronous background reclaim via kswapd.

Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
v6:
  - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define
  - Add documentation explaining 0.1% zone size boost rationale
v5:
  - Simplify to use native boost_watermark() instead of custom logic
v4:
  - Add watermark_scale_boost and gradual decay via balance_pgdat
v3:
  - Move debounce timer to per-zone; optimize zone selection
v2:
  - Add debounce logic and zone-proportional boosting
v1:
  - Initial: boost min_free_kbytes on GFP_ATOMIC failure
---
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)
---
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
 	/* zone watermarks, access with *_wmark_pages(zone) macros */
 	unsigned long _watermark[NR_WMARK];
 	unsigned long watermark_boost;
+	unsigned long last_boost_jiffies;
 
 	unsigned long nr_reserved_highatomic;
 	unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..8ea2435125d5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags);
 
+/*
+ * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
+ * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
+ * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
+ */
+#define ATOMIC_BOOST_SCALE_SHIFT 10
+
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
  *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
@@ -2189,12 +2196,31 @@ static inline bool boost_watermark(struct zone *zone)
 
 	max_boost = max(pageblock_nr_pages, max_boost);
 
-	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
+	zone->watermark_boost = min(zone->watermark_boost +
+		max(pageblock_nr_pages, zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT),
 		max_boost);
 
 	return true;
 }
 
+static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+	struct zoneref *z;
+	struct zone *zone;
+	unsigned long now = jiffies;
+
+	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+		/* 1 second debounce to avoid spamming boosts in a burst */
+		if (time_after(now, zone->last_boost_jiffies + HZ)) {
+			zone->last_boost_jiffies = now;
+			if (boost_watermark(zone))
+				wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+			/* Only boost the preferred zone to be precise */
+			break;
+		}
+	}
+}
+
 /*
  * When we are falling back to another migratetype during allocation, should we
  * try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4768,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/* Proactively boost for atomic requests entering slowpath */
+	if ((gfp_mask & GFP_ATOMIC) && order == 0)
+		boost_zones_for_atomic(ac, gfp_mask);
+
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
 	 * that we have enough base pages and don't need to reclaim. For non-
@@ -4947,6 +4977,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto retry;
 	}
 fail:
+	/* Boost watermarks on atomic allocation failure to trigger kswapd */
+	if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
+		boost_zones_for_atomic(ac, gfp_mask);
+
 	warn_alloc(gfp_mask, ac->nodemask,
 			"page allocation failure: order:%u", order);
 got_pg:
-- 
2.51.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-22  2:00   ` [PATCH] " Qiliang Yuan
@ 2026-01-22  2:17     ` Qiliang Yuan
  0 siblings, 0 replies; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-22  2:17 UTC (permalink / raw)
  To: realwujing
  Cc: akpm, david, edumazet, hannes, jackmanb, jis1, lance.yang,
	linux-kernel, linux-mm, liyi1, mhocko, netdev, rppt, sunshx,
	surenb, vbabka, wangh13, weixugc, willy, yuanql9, zhangjn11,
	zhangzq20, ziy

Hi,

Please ignore this patch. I realized that I forgot to label the
subject as v6, and I also forgot to rebase properly, so the changes
from v5 were not correctly merged into this version. 

I will rebase and send a proper v6 shortly. Sorry for the noise.

Best regards,
Qiliang Yuan


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-22  2:07   ` [PATCH v6] " Qiliang Yuan
@ 2026-01-22 12:22     ` Vlastimil Babka
  2026-01-23  6:42       ` [PATCH v7] " Qiliang Yuan
  0 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2026-01-22 12:22 UTC (permalink / raw)
  To: Qiliang Yuan, akpm
  Cc: david, mhocko, willy, lance.yang, hannes, surenb, jackmanb, ziy,
	weixugc, rppt, linux-mm, linux-kernel, netdev, edumazet, jis1,
	wangh13, liyi1, sunshx, zhangzq20, zhangjn11, Qiliang Yuan

On 1/22/26 03:07, Qiliang Yuan wrote:
> Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> pressure as they cannot enter direct reclaim. This patch introduces a
> 'Soft Boost' mechanism to mitigate this.
> 
> When a GFP_ATOMIC request fails or enters the slowpath, the preferred
> zone's watermark_boost is increased. This triggers kswapd to proactively
> reclaim memory, creating a safety buffer for future atomic bursts.
> 
> To prevent excessive reclaim during packet storms, a 1-second debounce
> timer (last_boost_jiffies) is added to each zone to rate-limit boosts.
> 
> This approach reuses existing watermark_boost infrastructure, ensuring
> minimal overhead and asynchronous background reclaim via kswapd.
> 
> Allocation failure logs:
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
> 
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
> ---
> v6:
>   - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define
>   - Add documentation explaining 0.1% zone size boost rationale
> v5:
>   - Simplify to use native boost_watermark() instead of custom logic
> v4:
>   - Add watermark_scale_boost and gradual decay via balance_pgdat
> v3:
>   - Move debounce timer to per-zone; optimize zone selection
> v2:
>   - Add debounce logic and zone-proportional boosting
> v1:
>   - Initial: boost min_free_kbytes on GFP_ATOMIC failure
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c        | 36 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c        | 36 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 75ef7c9f9307..8e37e4e6765b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,6 +882,7 @@ struct zone {
>  	/* zone watermarks, access with *_wmark_pages(zone) macros */
>  	unsigned long _watermark[NR_WMARK];
>  	unsigned long watermark_boost;
> +	unsigned long last_boost_jiffies;
>  
>  	unsigned long nr_reserved_highatomic;
>  	unsigned long nr_free_highatomic;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..8ea2435125d5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
>  static void __free_pages_ok(struct page *page, unsigned int order,
>  			    fpi_t fpi_flags);
>  
> +/*
> + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
> + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
> + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
> + */
> +#define ATOMIC_BOOST_SCALE_SHIFT 10
> +
>  /*
>   * results with 256, 32 in the lowmem_reserve sysctl:
>   *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
> @@ -2189,12 +2196,31 @@ static inline bool boost_watermark(struct zone *zone)
>  
>  	max_boost = max(pageblock_nr_pages, max_boost);
>  
> -	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> +	zone->watermark_boost = min(zone->watermark_boost +
> +		max(pageblock_nr_pages, zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT),

So IIUC you are not changing (increasing) the maximum boost, but the amount
in one step. It would be more descriptive to first set a local variable with
this amount and then use it for the boosting.

This change also affects the original boost_watermark() caller. Maybe it's
fine, can't say without any measurements.

>  		max_boost);
>  
>  	return true;
>  }
>  
> +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +	unsigned long now = jiffies;
> +
> +	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
> +		/* 1 second debounce to avoid spamming boosts in a burst */
> +		if (time_after(now, zone->last_boost_jiffies + HZ)) {
> +			zone->last_boost_jiffies = now;
> +			if (boost_watermark(zone))
> +				wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);

The other caller of boost_watermark() is under zone->lock and it makes those
zone->watermark_boost increments safe, and balance_pgdat() takes it for the
decrements too with "/* Increments are under the zone lock */ " comment,
otherwise I wouldn't realize this.

It probably wouldn't hurt to add a lockdep assert into boost_watermark() to
prevent mistakes.

But the other caller also takes care not to call wakeup_kswapd() under the
zone lock so I would not do it as well - see commit 73444bc4d8f92

> +			/* Only boost the preferred zone to be precise */
> +			break;
> +		}
> +	}
> +}
> +
>  /*
>   * When we are falling back to another migratetype during allocation, should we
>   * try to claim an entire block to satisfy further allocations, instead of
> @@ -4742,6 +4768,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (page)
>  		goto got_pg;
>  
> +	/* Proactively boost for atomic requests entering slowpath */
> +	if ((gfp_mask & GFP_ATOMIC) && order == 0)
> +		boost_zones_for_atomic(ac, gfp_mask);
> +
>  	/*
>  	 * For costly allocations, try direct compaction first, as it's likely
>  	 * that we have enough base pages and don't need to reclaim. For non-
> @@ -4947,6 +4977,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		goto retry;
>  	}
>  fail:
> +	/* Boost watermarks on atomic allocation failure to trigger kswapd */
> +	if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
> +		boost_zones_for_atomic(ac, gfp_mask);

We already did the boosting when entering slowpath, there's 1 second
debounce and GFP_ATOMIC can't really do anything in the slowpath to spend 1
second, so I think this is redundant.

> +
>  	warn_alloc(gfp_mask, ac->nodemask,
>  			"page allocation failure: order:%u", order);
>  got_pg:



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-22 12:22     ` Vlastimil Babka
@ 2026-01-23  6:42       ` Qiliang Yuan
  2026-01-27  6:06         ` kernel test robot
  0 siblings, 1 reply; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-23  6:42 UTC (permalink / raw)
  To: vbabka
  Cc: akpm, david, edumazet, hannes, jackmanb, jis1, lance.yang,
	linux-kernel, linux-mm, liyi1, mhocko, netdev, realwujing, rppt,
	sunshx, surenb, wangh13, weixugc, willy, yuanql9, zhangjn11,
	zhangzq20, ziy

Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
watermark boost mechanism to mitigate this issue.

When a GFP_ATOMIC request enters the slowpath, the preferred zone's
watermark_boost is increased under zone->lock protection. This triggers
kswapd to proactively reclaim memory, creating a safety buffer for
future atomic allocations. A 1-second debounce timer prevents excessive
boosts during traffic bursts.

This approach reuses existing watermark_boost infrastructure with
minimal overhead and proper locking to ensure thread safety.

Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
v7:
  - Use local variable for boost_amount to improve code readability
  - Add zone->lock protection in boost_zones_for_atomic()
  - Add lockdep assertion in boost_watermark() to prevent locking mistakes
  - Remove redundant boost call at fail label due to 1-second debounce
v6:
  - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define
  - Add documentation explaining 0.1% zone size boost rationale
v5:
  - Simplify to use native boost_watermark() instead of custom logic
v4:
  - Add watermark_scale_boost and gradual decay via balance_pgdat
v3:
  - Move debounce timer to per-zone; optimize zone selection
v2:
  - Add debounce logic and zone-proportional boosting
v1:
  - Initial: boost min_free_kbytes on GFP_ATOMIC failure

 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 46 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
 	/* zone watermarks, access with *_wmark_pages(zone) macros */
 	unsigned long _watermark[NR_WMARK];
 	unsigned long watermark_boost;
+	unsigned long last_boost_jiffies;
 
 	unsigned long nr_reserved_highatomic;
 	unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..94168571cc38 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags);
 
+/*
+ * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
+ * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
+ * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
+ */
+#define ATOMIC_BOOST_SCALE_SHIFT 10
+
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
  *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
@@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
 static inline bool boost_watermark(struct zone *zone)
 {
 	unsigned long max_boost;
+	unsigned long boost_amount;
+
+	lockdep_assert_held(&zone->lock);
 
 	if (!watermark_boost_factor)
 		return false;
@@ -2189,12 +2199,40 @@ static inline bool boost_watermark(struct zone *zone)
 
 	max_boost = max(pageblock_nr_pages, max_boost);
 
-	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
-		max_boost);
+	boost_amount = max(pageblock_nr_pages,
+			   zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT);
+	zone->watermark_boost = min(zone->watermark_boost + boost_amount,
+				    max_boost);
 
 	return true;
 }
 
+static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+	struct zoneref *z;
+	struct zone *zone;
+	unsigned long now = jiffies;
+	bool should_wake;
+
+	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+		/* Rate-limit boosts to once per second per zone */
+		if (time_after(now, zone->last_boost_jiffies + HZ)) {
+			zone->last_boost_jiffies = now;
+
+			/* Modify watermark under lock, wake kswapd outside */
+			spin_lock(&zone->lock);
+			should_wake = boost_watermark(zone);
+			spin_unlock(&zone->lock);
+
+			if (should_wake)
+				wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+
+			/* Boost only the preferred zone */
+			break;
+		}
+	}
+}
+
 /*
  * When we are falling back to another migratetype during allocation, should we
  * try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4780,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/* Boost watermarks for atomic requests entering slowpath */
+	if ((gfp_mask & GFP_ATOMIC) && order == 0)
+		boost_zones_for_atomic(ac, gfp_mask);
+
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
 	 * that we have enough base pages and don't need to reclaim. For non-
-- 
2.51.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-01-23  6:42       ` [PATCH v7] " Qiliang Yuan
@ 2026-01-27  6:06         ` kernel test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2026-01-27  6:06 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: oe-lkp, lkp, Qiliang Yuan, linux-mm, vbabka, akpm, david,
	edumazet, hannes, jackmanb, jis1, lance.yang, linux-kernel,
	liyi1, mhocko, netdev, realwujing, rppt, sunshx, surenb, wangh13,
	weixugc, willy, zhangjn11, zhangzq20, ziy, oliver.sang



Hello,

kernel test robot noticed "WARNING:inconsistent_lock_state" on:

commit: 4f0cbecbc533f56605274f6211e31907ed792bdf ("[PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure")
url: https://github.com/intel-lab-lkp/linux/commits/Qiliang-Yuan/mm-page_alloc-boost-watermarks-on-atomic-allocation-failure/20260123-144418
base: v6.19-rc6
patch link: https://lore.kernel.org/all/20260123064231.250767-1-realwujing@gmail.com/
patch subject: [PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 600s



config: x86_64-randconfig-075-20251114
compiler: gcc-14
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202601271341.5d24a59f-lkp@intel.com



[  151.153230][ T1379] WARNING: inconsistent lock state
[  151.153836][ T1379] 6.19.0-rc6-00001-g4f0cbecbc533 #1 Not tainted
[  151.154605][ T1379] --------------------------------
[  151.155192][ T1379] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  151.155825][ T1379] trinity-c0/1379 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  151.156424][ T1379] ffffffff8865d9e8 (&zone->lock){+.?.}-{3:3}, at: __alloc_pages_slowpath+0x1265/0x1b00
[  151.157399][ T1379] {IN-SOFTIRQ-W} state was registered at:
[  151.158029][ T1379]   __lock_acquire (kernel/locking/lockdep.c:5191 (discriminator 1))
[  151.158629][ T1379]   lock_acquire (include/trace/events/lock.h:24 (discriminator 15) include/trace/events/lock.h:24 (discriminator 15) kernel/locking/lockdep.c:5831 (discriminator 15))
[  151.159221][ T1379]   lock_acquire (kernel/locking/lockdep.c:5833)
[  151.159670][ T1379]   _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[  151.160180][ T1379]   rmqueue_bulk (mm/page_alloc.c:2592 (discriminator 1))
[  151.160637][ T1379]   __rmqueue_pcplist (mm/page_alloc.c:3374 (discriminator 3))
[  151.161132][ T1379]   rmqueue+0x1b3c/0x3400
[  151.161630][ T1379]   get_page_from_freelist (mm/page_alloc.c:3982)
[  151.162294][ T1379]   __alloc_frozen_pages_noprof (mm/page_alloc.c:5282)
[  151.163006][ T1379]   allocate_slab (mm/slub.c:3079 mm/slub.c:3248)
[  151.163482][ T1379]   ___slab_alloc (mm/slub.c:3302 mm/slub.c:4656)
[  151.163962][ T1379]   __slab_alloc+0x30/0x80
[  151.164464][ T1379]   __kmalloc_noprof (mm/slub.c:4855 mm/slub.c:5251 mm/slub.c:5656 mm/slub.c:5669)
[  151.164954][ T1379]   alloc_ep_req (drivers/usb/gadget/u_f.c:22 (discriminator 4))
[  151.165425][ T1379]   source_sink_start_ep (drivers/usb/gadget/function/f_sourcesink.c:292 drivers/usb/gadget/function/f_sourcesink.c:608)
[  151.166375][ T1379]   enable_source_sink (drivers/usb/gadget/function/f_sourcesink.c:662)
[  151.167006][ T1379]   sourcesink_set_alt (drivers/usb/gadget/function/f_sourcesink.c:744)
[  151.167515][ T1379]   set_config+0x323/0xb00
[  151.168019][ T1379]   composite_setup (include/linux/spinlock.h:391 drivers/usb/gadget/composite.c:1901)
[  151.168548][ T1379]   dummy_timer (include/linux/spinlock.h:351 drivers/usb/gadget/udc/dummy_hcd.c:1929)
[  151.169019][ T1379]   __hrtimer_run_queues (kernel/time/hrtimer.c:1777 kernel/time/hrtimer.c:1841)
[  151.169550][ T1379]   hrtimer_run_softirq (kernel/time/hrtimer.c:1860)
[  151.170198][ T1379]   handle_softirqs (arch/x86/include/asm/jump_label.h:37 include/trace/events/irq.h:142 kernel/softirq.c:623)
[  151.170817][ T1379]   __irq_exit_rcu (kernel/softirq.c:657 kernel/softirq.c:496 kernel/softirq.c:723)
[  151.171325][ T1379]   irq_exit_rcu (kernel/softirq.c:741 (discriminator 38))
[  151.171780][ T1379]   sysvec_call_function_single (arch/x86/kernel/smp.c:266 (discriminator 35) arch/x86/kernel/smp.c:266 (discriminator 35))
[  151.172366][ T1379]   asm_sysvec_call_function_single (arch/x86/include/asm/idtentry.h:704)
[  151.172952][ T1379]   pv_native_safe_halt (arch/x86/kernel/paravirt.c:82)
[  151.173449][ T1379]   arch_cpu_idle (arch/x86/kernel/process.c:805)
[  151.173991][ T1379]   default_idle_call (include/linux/cpuidle.h:143 (discriminator 1) kernel/sched/idle.c:123 (discriminator 1))
[  151.174997][ T1379]   cpuidle_idle_call (kernel/sched/idle.c:192)
[  151.175913][ T1379]   do_idle (kernel/sched/idle.c:332)
[  151.176666][ T1379]   cpu_startup_entry (kernel/sched/idle.c:429)
[  151.177536][ T1379]   rest_init (init/main.c:757)
[  151.178357][ T1379]   start_kernel (init/main.c:1206)
[  151.179249][ T1379]   x86_64_start_reservations (arch/x86/kernel/head64.c:310)
[  151.180213][ T1379]   x86_64_start_kernel (??:?)
[  151.181119][ T1379]   common_startup_64 (arch/x86/kernel/head_64.S:419)
[  151.181992][ T1379] irq event stamp: 16437333
[  151.182773][ T1379] hardirqs last  enabled at (16437333): _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
[  151.184399][ T1379] hardirqs last disabled at (16437332): _raw_spin_lock_irq (include/linux/spinlock_api_smp.h:117 (discriminator 1) kernel/locking/spinlock.c:170 (discriminator 1))
[  151.186068][ T1379] softirqs last  enabled at (16422642): handle_softirqs (kernel/softirq.c:469 (discriminator 2) kernel/softirq.c:650 (discriminator 2))
[  151.187758][ T1379] softirqs last disabled at (16422637): __irq_exit_rcu (kernel/softirq.c:657 kernel/softirq.c:496 kernel/softirq.c:723)
[  151.189383][ T1379]
[  151.189383][ T1379] other info that might help us debug this:
[  151.190809][ T1379]  Possible unsafe locking scenario:
[  151.190809][ T1379]
[  151.192092][ T1379]        CPU0
[  151.192740][ T1379]        ----
[  151.193401][ T1379]   lock(&zone->lock);
[  151.194192][ T1379]   <Interrupt>
[  151.194940][ T1379]     lock(&zone->lock);
[  151.195734][ T1379]
[  151.195734][ T1379]  *** DEADLOCK ***
[  151.195734][ T1379]
[  151.197223][ T1379] 2 locks held by trinity-c0/1379:
[  151.198134][ T1379]  #0: ffff8881001b2400 (sb_writers#5){.+.+}-{0:0}, at: ksys_fallocate (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) fs/open.c:358 (discriminator 1))
[  151.199767][ T1379]  #1: ffff8881373cd0d0 (&sb->s_type->i_mutex_key#12){+.+.}-{4:4}, at: shmem_fallocate (mm/shmem.c:3688)
[  151.201543][ T1379]
[  151.201543][ T1379] stack backtrace:
[  151.202709][ T1379] CPU: 0 UID: 65534 PID: 1379 Comm: trinity-c0 Not tainted 6.19.0-rc6-00001-g4f0cbecbc533 #1 PREEMPT  5190a26909b47a16cbbcf00ba20dd51a77658a62
[  151.202723][ T1379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  151.202735][ T1379] Call Trace:
[  151.202739][ T1379]  <TASK>
[  151.202743][ T1379]  dump_stack_lvl (lib/dump_stack.c:122)
[  151.202760][ T1379]  dump_stack (lib/dump_stack.c:130)
[  151.202766][ T1379]  print_usage_bug+0x25d/0x380
[  151.202775][ T1379]  mark_lock_irq (kernel/locking/lockdep.c:4268)
[  151.202782][ T1379]  ? save_trace (kernel/locking/lockdep.c:557 (discriminator 1) kernel/locking/lockdep.c:594 (discriminator 1))
[  151.202792][ T1379]  mark_lock (kernel/locking/lockdep.c:4753)
[  151.202798][ T1379]  mark_usage (kernel/locking/lockdep.c:4666 (discriminator 1))
[  151.202803][ T1379]  __lock_acquire (kernel/locking/lockdep.c:5191 (discriminator 1))
[  151.202809][ T1379]  ? mark_held_locks (kernel/locking/lockdep.c:4325 (discriminator 1))
[  151.202815][ T1379]  lock_acquire (include/trace/events/lock.h:24 (discriminator 15) include/trace/events/lock.h:24 (discriminator 15) kernel/locking/lockdep.c:5831 (discriminator 15))
[  151.202820][ T1379]  ? __alloc_pages_slowpath+0x1265/0x1b00
[  151.202832][ T1379]  ? get_page_from_freelist (mm/page_alloc.c:3733 (discriminator 1) mm/page_alloc.c:3936 (discriminator 1))
[  151.202838][ T1379]  ? wakeup_kswapd (arch/x86/include/asm/atomic64_64.h:15 include/linux/atomic/atomic-arch-fallback.h:2583 include/linux/atomic/atomic-long.h:38 include/linux/atomic/atomic-instrumented.h:3189 include/linux/mmzone.h:1106 include/linux/mmzone.h:1600 mm/vmscan.c:7378)
[  151.202848][ T1379]  lock_acquire (kernel/locking/lockdep.c:5833)
[  151.202853][ T1379]  ? __alloc_pages_slowpath+0x1265/0x1b00
[  151.202861][ T1379]  _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[  151.202869][ T1379]  ? __alloc_pages_slowpath+0x1265/0x1b00
[  151.202876][ T1379]  __alloc_pages_slowpath+0x1265/0x1b00
[  151.202887][ T1379]  ? warn_alloc (mm/page_alloc.c:4731)
[  151.202896][ T1379]  ? get_page_from_freelist (mm/page_alloc.c:3733 (discriminator 1) mm/page_alloc.c:3936 (discriminator 1))
[  151.202902][ T1379]  ? lock_is_held_type (kernel/locking/lockdep.c:5601 (discriminator 1) kernel/locking/lockdep.c:5940 (discriminator 1))
[  151.202913][ T1379]  __alloc_frozen_pages_noprof (mm/page_alloc.c:5295)
[  151.202920][ T1379]  ? __alloc_pages_slowpath+0x1b00/0x1b00
[  151.202929][ T1379]  ? filemap_get_entry (mm/filemap.c:1888)
[  151.202936][ T1379]  ? __folio_lock_or_retry (mm/filemap.c:1888)
[  151.202944][ T1379]  __folio_alloc_noprof (mm/page_alloc.c:5316 mm/page_alloc.c:5326)
[  151.202951][ T1379]  shmem_alloc_and_add_folio+0xfc/0x3c0
[  151.202961][ T1379]  shmem_get_folio_gfp+0x388/0xd80
[  151.202970][ T1379]  shmem_fallocate (mm/shmem.c:3780)
[  151.202980][ T1379]  ? shmem_get_link (mm/shmem.c:3675)
[  151.202987][ T1379]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
[  151.202998][ T1379]  ? __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1))
[  151.203003][ T1379]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
[  151.203013][ T1379]  ? lock_acquire (include/trace/events/lock.h:24 (discriminator 15) include/trace/events/lock.h:24 (discriminator 15) kernel/locking/lockdep.c:5831 (discriminator 15))
[  151.203018][ T1379]  ? ksys_fallocate (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) fs/open.c:358 (discriminator 1))
[  151.203030][ T1379]  ? lock_is_held_type (kernel/locking/lockdep.c:5601 (discriminator 1) kernel/locking/lockdep.c:5940 (discriminator 1))
[  151.203039][ T1379]  vfs_fallocate (fs/open.c:339 (discriminator 1))
[  151.203047][ T1379]  ksys_fallocate (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) fs/open.c:358 (discriminator 1))
[  151.203055][ T1379]  __ia32_sys_ia32_fallocate (arch/x86/kernel/sys_ia32.c:119)
[  151.203065][ T1379]  ? __do_fast_syscall_32 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 arch/x86/entry/syscall_32.c:280)
[  151.203074][ T1379]  ia32_sys_call (arch/x86/entry/syscall_32.c:50)
[  151.203079][ T1379]  __do_fast_syscall_32 (arch/x86/entry/syscall_32.c:83 (discriminator 1) arch/x86/entry/syscall_32.c:307 (discriminator 1))
[  151.203088][ T1379]  do_fast_syscall_32 (arch/x86/entry/syscall_32.c:332 (discriminator 1))
[  151.203096][ T1379]  do_SYSENTER_32 (arch/x86/entry/syscall_32.c:371)
[  151.203103][ T1379]  entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:127)
[  151.203111][ T1379] RIP: 0023:0xf7ed9589
[  151.203118][ T1379] Code: 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 2e 8d b4 26 00 00 00 00 8d b4 26 00 00 00
All code
========
   0:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
	...
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:	89 e5                	mov    %esp,%ebp
  26:*	0f 34                	sysenter		<-- trapping instruction
  28:	cd 80                	int    $0x80
  2a:	5d                   	pop    %rbp
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	ret
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	2e 8d b4 26 00 00 00 	cs lea 0x0(%rsi,%riz,1),%esi
  39:	00 
  3a:	8d                   	.byte 0x8d
  3b:	b4 26                	mov    $0x26,%ah
  3d:	00 00                	add    %al,(%rax)
	...

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	ret
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	2e 8d b4 26 00 00 00 	cs lea 0x0(%rsi,%riz,1),%esi
   f:	00 
  10:	8d                   	.byte 0x8d
  11:	b4 26                	mov    $0x26,%ah
  13:	00 00                	add    %al,(%rax)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260127/202601271341.5d24a59f-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-01-27  6:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-21  6:57 [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure Qiliang Yuan
2026-01-21 20:56 ` Andrew Morton
2026-01-22  1:40   ` Qiliang Yuan
2026-01-22  2:00   ` [PATCH] " Qiliang Yuan
2026-01-22  2:17     ` Qiliang Yuan
2026-01-22  2:07   ` [PATCH v6] " Qiliang Yuan
2026-01-22 12:22     ` Vlastimil Babka
2026-01-23  6:42       ` [PATCH v7] " Qiliang Yuan
2026-01-27  6:06         ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox