* [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
@ 2026-01-21 6:48 Qiliang Yuan
From: Qiliang Yuan @ 2026-01-21 6:48 UTC (permalink / raw)
To: david
Cc: lance.yang, mhocko, vbabka, hannes, weixugc, lorenzo.stoakes,
Liam.Howlett, rppt, surenb, jackmanb, ziy, linux-mm,
linux-kernel, Qiliang Yuan
Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
'Soft Boost' mechanism to mitigate this.
When a GFP_ATOMIC request fails or enters the slowpath, the preferred
zone's watermark_boost is increased. This triggers kswapd to proactively
reclaim memory, creating a safety buffer for future atomic bursts.
To prevent excessive reclaim during packet storms, a 1-second debounce
timer (last_boost_jiffies) is added to each zone to rate-limit boosts.
This approach reuses existing watermark_boost infrastructure, ensuring
minimal overhead and asynchronous background reclaim via kswapd.
Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
v5:
- Replaced custom watermark_scale_boost and manual recomputations with
native boost_watermark reuse.
- Simplified logic to use existing 'boost' architecture for better
community acceptability.
v4:
- Introduced watermark_scale_boost and gradual decay via balance_pgdat.
- Added proactive soft-boosting when entering slowpath.
v3:
- Moved debounce timer to per-zone to avoid cross-node interference.
- Optimized candidate zone selection to reduce global reclaim pressure.
v2:
- Added basic debounce logic and scaled boosting strength based on zone size.
v1:
- Initial proposal: Basic watermark boost on atomic allocation failure.
---
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
/* zone watermarks, access with *_wmark_pages(zone) macros */
unsigned long _watermark[NR_WMARK];
unsigned long watermark_boost;
+ unsigned long last_boost_jiffies;
unsigned long nr_reserved_highatomic;
unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..1faace9e2dc5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone)
max_boost = max(pageblock_nr_pages, max_boost);
- zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
+ zone->watermark_boost = min(zone->watermark_boost +
+ max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
max_boost);
return true;
}
+static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+ struct zoneref *z;
+ struct zone *zone;
+ unsigned long now = jiffies;
+
+ for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+ /* 1 second debounce to avoid spamming boosts in a burst */
+ if (time_after(now, zone->last_boost_jiffies + HZ)) {
+ zone->last_boost_jiffies = now;
+ if (boost_watermark(zone))
+ wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+ /* Only boost the preferred zone to be precise */
+ break;
+ }
+ }
+}
+
/*
* When we are falling back to another migratetype during allocation, should we
* try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ /* Proactively boost for atomic requests entering slowpath */
+ if ((gfp_mask & GFP_ATOMIC) && order == 0)
+ boost_zones_for_atomic(ac, gfp_mask);
+
/*
* For costly allocations, try direct compaction first, as it's likely
* that we have enough base pages and don't need to reclaim. For non-
@@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto retry;
}
fail:
+ /* Boost watermarks on atomic allocation failure to trigger kswapd */
+ if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
+ boost_zones_for_atomic(ac, gfp_mask);
+
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
--
2.51.0
* Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
2026-01-21 20:56 ` Andrew Morton
@ 2026-01-22 1:40 ` Qiliang Yuan
From: Qiliang Yuan @ 2026-01-22 1:40 UTC (permalink / raw)
To: akpm
Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev,
edumazet
On Wed, 21 Jan 2026 12:56:03 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> This seems sensible to me - dynamically boost reserves in response to
> sustained GFP_ATOMIC allocation failures. It's very much a networking
> thing and I expect the networking people have been looking at these
> issues for years. So let's start by cc'ing them!
Thank you for the feedback and for cc'ing the networking folks! I appreciate
your continued engagement throughout this patch series (v1-v5).
> Obvious question, which I think was asked before: what about gradually
> decreasing those reserves when the packet storm has subsided?
>
> > v4:
> > - Introduced watermark_scale_boost and gradual decay via balance_pgdat.
>
> And there it is, but v5 removed this. Why? Or perhaps I'm misreading
> the implementation.
You're absolutely right - v4 did include a gradual decay mechanism. The
evolution from v1 to v5 was driven by community feedback, and I'd like to
explain the rationale for each major change:
**v1 → v2**: Following your and Matthew Wilcox's feedback on v1, I:
- Reduced the boost from doubling (100%) to 50% increase
- Added a decay mechanism (5% every 5 minutes)
- Added debounce logic
- v1: https://lore.kernel.org/all/tencent_9DB6637676D639B4B7AEA09CC6A6F9E49D0A@qq.com/
- v2: https://lore.kernel.org/all/tencent_6FE67BA7BE8376AB038A71ACAD4FF8A90006@qq.com/
**v2 → v3**: Following Michal Hocko's suggestion to use watermark_scale_factor
instead of min_free_kbytes, I switched to the watermark_boost infrastructure.
This was a significant simplification that reused existing MM subsystem patterns.
- v3: https://lore.kernel.org/all/tencent_44B556221480D8371FBC534ACCF3CE2C8707@qq.com/
**v3 → v4**: Added watermark_scale_boost and gradual decay via balance_pgdat()
to provide more fine-grained control over the reclaim aggressiveness.
- v4: https://lore.kernel.org/all/tencent_D23BFCB69EA088C55AFAF89F926036743E0A@qq.com/
**v4 → v5**: Removed watermark_scale_boost for the following reasons:
- v5: https://lore.kernel.org/all/20260121065740.35616-1-realwujing@gmail.com/
1. **Natural decay exists**: The existing watermark_boost infrastructure already
has a built-in decay path. When a boosted kswapd pass completes and the zone is
balanced again, balance_pgdat() subtracts the boost it accounted for at wakeup,
bringing watermark_boost back to 0. This happens organically without custom
decay logic (see the paraphrased sketch after this list).
2. **Simplicity**: The v4 approach added custom watermark_scale_boost tracking
and manual decay in balance_pgdat(). This added complexity that duplicated
functionality already present in the kswapd reclaim path.
3. **Production validation**: In our production environment (high-throughput
networking workloads), the natural decay via kswapd proved sufficient. Once
memory pressure subsides and kswapd successfully reclaims to the high
watermark, the boost is cleared automatically within seconds.
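For reference, the decay path that already exists in mainline looks roughly like
this (paraphrased and condensed from balance_pgdat() in mm/vmscan.c; local
variable declarations elided, not a verbatim copy):
```c
	/* At wakeup: record how much boost this reclaim pass is serving. */
	for (i = 0; i <= highest_zoneidx; i++) {
		zone = pgdat->node_zones + i;
		if (!managed_zone(zone))
			continue;
		nr_boost_reclaim += zone->watermark_boost;
		zone_boosts[i] = zone->watermark_boost;
	}
	boosted = nr_boost_reclaim;

	/* ... kswapd reclaims until the boosted high watermark is met ... */

	/* On the way out: drop the boost that was accounted for above. */
	if (boosted) {
		for (i = 0; i <= highest_zoneidx; i++) {
			if (!zone_boosts[i])
				continue;
			zone = pgdat->node_zones + i;
			spin_lock_irqsave(&zone->lock, flags);
			zone->watermark_boost -= min(zone->watermark_boost,
						     zone_boosts[i]);
			spin_unlock_irqrestore(&zone->lock, flags);
		}
		/* There is now likely free space; defragment it. */
		wakeup_kcompactd(pgdat, pageblock_order, highest_zoneidx);
	}
```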
However, I recognize this is a trade-off. The v4 gradual decay provided more
explicit control over the decay rate. If you or the networking maintainers feel
that explicit decay control is important for packet storm scenarios, I'm happy
to reintroduce the v4 approach or explore alternative decay strategies (e.g.,
time-based decay independent of kswapd success).
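For completeness, a minimal sketch of what such a time-based decay could look
like, reusing the per-zone last_boost_jiffies this patch adds. The helper name,
decay period and call site are purely illustrative and not part of the posted
series:
```c
/* Hypothetical: halve the boost if no new boost was requested for 5 seconds. */
#define ATOMIC_BOOST_DECAY_PERIOD	(5 * HZ)

static void decay_atomic_boost(struct zone *zone)
{
	if (!zone->watermark_boost)
		return;

	if (time_after(jiffies,
		       zone->last_boost_jiffies + ATOMIC_BOOST_DECAY_PERIOD))
		zone->watermark_boost >>= 1;
}
```
Something like this could be called from kswapd's periodic wakeup if explicit
decay control turns out to be needed.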
> > + zone->watermark_boost = min(zone->watermark_boost +
> > + max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
>
> ">> 10" is a magic number. What is the reasoning behind choosing this
> value?
Good catch. The ">> 10" (divide by 1024) was chosen to provide a
zone-proportional boost that scales with zone size (quick arithmetic check below):
- For a 1GB zone: ~1MB boost per trigger
- For a 16GB zone: ~16MB boost per trigger
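Those figures come from (assuming 4 KiB base pages):
```c
/*  1 GiB zone: zone_managed_pages() ==  262144 pages; 262144 >> 10 ==  256 pages ==  1 MiB */
/* 16 GiB zone: zone_managed_pages() == 4194304 pages; 4194304 >> 10 == 4096 pages == 16 MiB */
```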
The rationale:
1. **Proportionality**: Larger zones experiencing atomic allocation pressure
likely need proportionally larger safety buffers. A fixed pageblock_nr_pages
(typically 2MB) might be insufficient for large zones under heavy load.
2. **Conservative scaling**: 1/1024 (~0.1%) is aggressive enough to help during
sustained pressure but conservative enough to avoid over-reclaim. This was
empirically tuned based on our production workload.
3. **Production results**: In our high-throughput networking environment
(100Gbps+ traffic bursts), this value reduced GFP_ATOMIC failures by ~95%
without causing excessive kswapd activity or impacting normal allocations.
I should document this better. I propose adding a #define:
```c
/*
* Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
* This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
*/
#define ATOMIC_BOOST_SCALE_SHIFT 10
```
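With that constant, the boost_watermark() hunk from the diff above would read
(a sketch of how a v6 could look, not yet posted):
```c
	/* Scale the boost with zone size rather than an unexplained ">> 10". */
	zone->watermark_boost = min(zone->watermark_boost +
			max(pageblock_nr_pages,
			    zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT),
			max_boost);
```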
Best regards,
Qiliang Yuan
* Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
2026-01-21 6:57 Qiliang Yuan
@ 2026-01-21 20:56 ` Andrew Morton
2026-01-22 1:40 ` Qiliang Yuan
From: Andrew Morton @ 2026-01-21 20:56 UTC (permalink / raw)
To: Qiliang Yuan
Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev,
Eric Dumazet
On Wed, 21 Jan 2026 01:57:40 -0500 Qiliang Yuan <realwujing@gmail.com> wrote:
> Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> pressure as they cannot enter direct reclaim. This patch introduces a
> 'Soft Boost' mechanism to mitigate this.
>
> When a GFP_ATOMIC request fails or enters the slowpath, the preferred
> zone's watermark_boost is increased. This triggers kswapd to proactively
> reclaim memory, creating a safety buffer for future atomic bursts.
>
> To prevent excessive reclaim during packet storms, a 1-second debounce
> timer (last_boost_jiffies) is added to each zone to rate-limit boosts.
>
> This approach reuses existing watermark_boost infrastructure, ensuring
> minimal overhead and asynchronous background reclaim via kswapd.
>
> Allocation failure logs:
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
This seems sensible to me - dynamically boost reserves in response to
sustained GFP_ATOMIC allocation failures.
It's very much a networking thing and I expect the networking people
have been looking at these issues for years. So let's start by cc'ing
them!
Obvious question, which I think was asked before: what about gradually
decreasing those reserves when the packet storm has subsided?
> v4:
> - Introduced watermark_scale_boost and gradual decay via balance_pgdat.
And there it is, but v5 removed this. Why?
Or perhaps I'm misreading the implementation.
> - Added proactive soft-boosting when entering slowpath.
> v3:
> - Moved debounce timer to per-zone to avoid cross-node interference.
> - Optimized candidate zone selection to reduce global reclaim pressure.
> v2:
> - Added basic debounce logic and scaled boosting strength based on zone size.
> v1:
> - Initial proposal: Basic watermark boost on atomic allocation failure.
> ---
> include/linux/mmzone.h | 1 +
> mm/page_alloc.c | 29 ++++++++++++++++++++++++++++-
> 2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 75ef7c9f9307..8e37e4e6765b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,6 +882,7 @@ struct zone {
> /* zone watermarks, access with *_wmark_pages(zone) macros */
> unsigned long _watermark[NR_WMARK];
> unsigned long watermark_boost;
> + unsigned long last_boost_jiffies;
>
> unsigned long nr_reserved_highatomic;
> unsigned long nr_free_highatomic;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..1faace9e2dc5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone)
>
> max_boost = max(pageblock_nr_pages, max_boost);
>
> - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> + zone->watermark_boost = min(zone->watermark_boost +
> + max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
">> 10" is a magic number. What is the reasoning behind choosing this
value?
> max_boost);
>
> return true;
> }
>
> +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
> +{
> + struct zoneref *z;
> + struct zone *zone;
> + unsigned long now = jiffies;
> +
> + for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
> + /* 1 second debounce to avoid spamming boosts in a burst */
> + if (time_after(now, zone->last_boost_jiffies + HZ)) {
> + zone->last_boost_jiffies = now;
> + if (boost_watermark(zone))
> + wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
> + /* Only boost the preferred zone to be precise */
> + break;
> + }
> + }
> +}
> +
> /*
> * When we are falling back to another migratetype during allocation, should we
> * try to claim an entire block to satisfy further allocations, instead of
> @@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> if (page)
> goto got_pg;
>
> + /* Proactively boost for atomic requests entering slowpath */
> + if ((gfp_mask & GFP_ATOMIC) && order == 0)
> + boost_zones_for_atomic(ac, gfp_mask);
> +
> /*
> * For costly allocations, try direct compaction first, as it's likely
> * that we have enough base pages and don't need to reclaim. For non-
> @@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> goto retry;
> }
> fail:
> + /* Boost watermarks on atomic allocation failure to trigger kswapd */
> + if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
> + boost_zones_for_atomic(ac, gfp_mask);
> +
> warn_alloc(gfp_mask, ac->nodemask,
> "page allocation failure: order:%u", order);
> got_pg:
> --
> 2.51.0
* [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
@ 2026-01-21 6:57 Qiliang Yuan
2026-01-21 20:56 ` Andrew Morton
From: Qiliang Yuan @ 2026-01-21 6:57 UTC (permalink / raw)
To: akpm
Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel,
Qiliang Yuan
Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
'Soft Boost' mechanism to mitigate this.
When a GFP_ATOMIC request fails or enters the slowpath, the preferred
zone's watermark_boost is increased. This triggers kswapd to proactively
reclaim memory, creating a safety buffer for future atomic bursts.
To prevent excessive reclaim during packet storms, a 1-second debounce
timer (last_boost_jiffies) is added to each zone to rate-limit boosts.
This approach reuses existing watermark_boost infrastructure, ensuring
minimal overhead and asynchronous background reclaim via kswapd.
Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
v5:
- Replaced custom watermark_scale_boost and manual recomputations with
native boost_watermark reuse.
- Simplified logic to use existing 'boost' architecture for better
community acceptability.
v4:
- Introduced watermark_scale_boost and gradual decay via balance_pgdat.
- Added proactive soft-boosting when entering slowpath.
v3:
- Moved debounce timer to per-zone to avoid cross-node interference.
- Optimized candidate zone selection to reduce global reclaim pressure.
v2:
- Added basic debounce logic and scaled boosting strength based on zone size.
v1:
- Initial proposal: Basic watermark boost on atomic allocation failure.
---
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
/* zone watermarks, access with *_wmark_pages(zone) macros */
unsigned long _watermark[NR_WMARK];
unsigned long watermark_boost;
+ unsigned long last_boost_jiffies;
unsigned long nr_reserved_highatomic;
unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..1faace9e2dc5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone)
max_boost = max(pageblock_nr_pages, max_boost);
- zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
+ zone->watermark_boost = min(zone->watermark_boost +
+ max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
max_boost);
return true;
}
+static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+ struct zoneref *z;
+ struct zone *zone;
+ unsigned long now = jiffies;
+
+ for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+ /* 1 second debounce to avoid spamming boosts in a burst */
+ if (time_after(now, zone->last_boost_jiffies + HZ)) {
+ zone->last_boost_jiffies = now;
+ if (boost_watermark(zone))
+ wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+ /* Only boost the preferred zone to be precise */
+ break;
+ }
+ }
+}
+
/*
* When we are falling back to another migratetype during allocation, should we
* try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ /* Proactively boost for atomic requests entering slowpath */
+ if ((gfp_mask & GFP_ATOMIC) && order == 0)
+ boost_zones_for_atomic(ac, gfp_mask);
+
/*
* For costly allocations, try direct compaction first, as it's likely
* that we have enough base pages and don't need to reclaim. For non-
@@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto retry;
}
fail:
+ /* Boost watermarks on atomic allocation failure to trigger kswapd */
+ if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
+ boost_zones_for_atomic(ac, gfp_mask);
+
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
--
2.51.0