* [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
@ 2026-01-21 6:57 Qiliang Yuan
2026-01-21 20:56 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Qiliang Yuan @ 2026-01-21 6:57 UTC (permalink / raw)
To: akpm
Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb,
jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel,
Qiliang Yuan
Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
'Soft Boost' mechanism to mitigate this.
When a GFP_ATOMIC request fails or enters the slowpath, the preferred
zone's watermark_boost is increased. This triggers kswapd to proactively
reclaim memory, creating a safety buffer for future atomic bursts.
To prevent excessive reclaim during packet storms, a 1-second debounce
timer (last_boost_jiffies) is added to each zone to rate-limit boosts.
This approach reuses existing watermark_boost infrastructure, ensuring
minimal overhead and asynchronous background reclaim via kswapd.
Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
v5:
- Replaced custom watermark_scale_boost and manual recomputations with
native boost_watermark reuse.
- Simplified logic to use existing 'boost' architecture for better
community acceptability.
v4:
- Introduced watermark_scale_boost and gradual decay via balance_pgdat.
- Added proactive soft-boosting when entering slowpath.
v3:
- Moved debounce timer to per-zone to avoid cross-node interference.
- Optimized candidate zone selection to reduce global reclaim pressure.
v2:
- Added basic debounce logic and scaled boosting strength based on zone size.
v1:
- Initial proposal: Basic watermark boost on atomic allocation failure.
---
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
/* zone watermarks, access with *_wmark_pages(zone) macros */
unsigned long _watermark[NR_WMARK];
unsigned long watermark_boost;
+ unsigned long last_boost_jiffies;
unsigned long nr_reserved_highatomic;
unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..1faace9e2dc5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone)
max_boost = max(pageblock_nr_pages, max_boost);
- zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
+ zone->watermark_boost = min(zone->watermark_boost +
+ max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
max_boost);
return true;
}
+static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+ struct zoneref *z;
+ struct zone *zone;
+ unsigned long now = jiffies;
+
+ for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+ /* 1 second debounce to avoid spamming boosts in a burst */
+ if (time_after(now, zone->last_boost_jiffies + HZ)) {
+ zone->last_boost_jiffies = now;
+ if (boost_watermark(zone))
+ wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+ /* Only boost the preferred zone to be precise */
+ break;
+ }
+ }
+}
+
/*
* When we are falling back to another migratetype during allocation, should we
* try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ /* Proactively boost for atomic requests entering slowpath */
+ if ((gfp_mask & GFP_ATOMIC) && order == 0)
+ boost_zones_for_atomic(ac, gfp_mask);
+
/*
* For costly allocations, try direct compaction first, as it's likely
* that we have enough base pages and don't need to reclaim. For non-
@@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto retry;
}
fail:
+ /* Boost watermarks on atomic allocation failure to trigger kswapd */
+ if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0))
+ boost_zones_for_atomic(ac, gfp_mask);
+
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
--
2.51.0
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-21 6:57 [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure Qiliang Yuan @ 2026-01-21 20:56 ` Andrew Morton 2026-01-22 1:40 ` Qiliang Yuan ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Andrew Morton @ 2026-01-21 20:56 UTC (permalink / raw) To: Qiliang Yuan Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb, jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev, Eric Dumazet On Wed, 21 Jan 2026 01:57:40 -0500 Qiliang Yuan <realwujing@gmail.com> wrote: > Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory > pressure as they cannot enter direct reclaim. This patch introduces a > 'Soft Boost' mechanism to mitigate this. > > When a GFP_ATOMIC request fails or enters the slowpath, the preferred > zone's watermark_boost is increased. This triggers kswapd to proactively > reclaim memory, creating a safety buffer for future atomic bursts. > > To prevent excessive reclaim during packet storms, a 1-second debounce > timer (last_boost_jiffies) is added to each zone to rate-limit boosts. > > This approach reuses existing watermark_boost infrastructure, ensuring > minimal overhead and asynchronous background reclaim via kswapd. > > Allocation failure logs: > [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0 > [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317 > [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) > [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 > [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144 > [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383 > [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) > [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 > [38535649.655523] warn_alloc: 59 callbacks suppressed > [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) > [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1 This seems sensible to me - dynamically boost reserves in response to sustained GFP_ATOMIC allocation failures. It's very much a networking thing and I expect the networking people have been looking at these issues for years. So let's start by cc'ing them! Obvious question, which I think was asked before: what about gradually decreasing those reserves when the packet storm has subsided? > v4: > - Introduced watermark_scale_boost and gradual decay via balance_pgdat. And there it is, but v5 removed this. Why? Or perhaps I'm misreading the implementation. > - Added proactive soft-boosting when entering slowpath. > v3: > - Moved debounce timer to per-zone to avoid cross-node interference. > - Optimized candidate zone selection to reduce global reclaim pressure. > v2: > - Added basic debounce logic and scaled boosting strength based on zone size. > v1: > - Initial proposal: Basic watermark boost on atomic allocation failure. > --- > include/linux/mmzone.h | 1 + > mm/page_alloc.c | 29 ++++++++++++++++++++++++++++- > 2 files changed, 29 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 75ef7c9f9307..8e37e4e6765b 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -882,6 +882,7 @@ struct zone { > /* zone watermarks, access with *_wmark_pages(zone) macros */ > unsigned long _watermark[NR_WMARK]; > unsigned long watermark_boost; > + unsigned long last_boost_jiffies; > > unsigned long nr_reserved_highatomic; > unsigned long nr_free_highatomic; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c380f063e8b7..1faace9e2dc5 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2189,12 +2189,31 @@ static inline bool boost_watermark(struct zone *zone) > > max_boost = max(pageblock_nr_pages, max_boost); > > - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages, > + zone->watermark_boost = min(zone->watermark_boost + > + max(pageblock_nr_pages, zone_managed_pages(zone) >> 10), ">> 10" is a magic number. What is the reasoning behind choosing this value? > max_boost); > > return true; > } > > +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask) > +{ > + struct zoneref *z; > + struct zone *zone; > + unsigned long now = jiffies; > + > + for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) { > + /* 1 second debounce to avoid spamming boosts in a burst */ > + if (time_after(now, zone->last_boost_jiffies + HZ)) { > + zone->last_boost_jiffies = now; > + if (boost_watermark(zone)) > + wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx); > + /* Only boost the preferred zone to be precise */ > + break; > + } > + } > +} > + > /* > * When we are falling back to another migratetype during allocation, should we > * try to claim an entire block to satisfy further allocations, instead of > @@ -4742,6 +4761,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > if (page) > goto got_pg; > > + /* Proactively boost for atomic requests entering slowpath */ > + if ((gfp_mask & GFP_ATOMIC) && order == 0) > + boost_zones_for_atomic(ac, gfp_mask); > + > /* > * For costly allocations, try direct compaction first, as it's likely > * that we have enough base pages and don't need to reclaim. For non- > @@ -4947,6 +4970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > goto retry; > } > fail: > + /* Boost watermarks on atomic allocation failure to trigger kswapd */ > + if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0)) > + boost_zones_for_atomic(ac, gfp_mask); > + > warn_alloc(gfp_mask, ac->nodemask, > "page allocation failure: order:%u", order); > got_pg: > -- > 2.51.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-21 20:56 ` Andrew Morton @ 2026-01-22 1:40 ` Qiliang Yuan 2026-01-22 2:00 ` [PATCH] " Qiliang Yuan 2026-01-22 2:07 ` [PATCH v6] " Qiliang Yuan 2 siblings, 0 replies; 9+ messages in thread From: Qiliang Yuan @ 2026-01-22 1:40 UTC (permalink / raw) To: akpm Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb, jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev, edumazet [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=y, Size: 4687 bytes --] On Wed, 21 Jan 2026 12:56:03 -0800 Andrew Morton <akpm@linux-foundation.org> wrote: > This seems sensible to me - dynamically boost reserves in response to > sustained GFP_ATOMIC allocation failures. It's very much a networking > thing and I expect the networking people have been looking at these > issues for years. So let's start by cc'ing them! Thank you for the feedback and for cc'ing the networking folks! I appreciate your continued engagement throughout this patch series (v1-v5). > Obvious question, which I think was asked before: what about gradually > decreasing those reserves when the packet storm has subsided? > > > v4: > > - Introduced watermark_scale_boost and gradual decay via balance_pgdat. > > And there it is, but v5 removed this. Why? Or perhaps I'm misreading > the implementation. You're absolutely right - v4 did include a gradual decay mechanism. The evolution from v1 to v5 was driven by community feedback, and I'd like to explain the rationale for each major change: **v1 → v2**: Following your and Matthew Wilcox's feedback on v1, I: - Reduced the boost from doubling (100%) to 50% increase - Added a decay mechanism (5% every 5 minutes) - Added debounce logic - v1: https://lore.kernel.org/all/tencent_9DB6637676D639B4B7AEA09CC6A6F9E49D0A@qq.com/ - v2: https://lore.kernel.org/all/tencent_6FE67BA7BE8376AB038A71ACAD4FF8A90006@qq.com/ **v2 → v3**: Following Michal Hocko's suggestion to use watermark_scale_factor instead of min_free_kbytes, I switched to the watermark_boost infrastructure. This was a significant simplification that reused existing MM subsystem patterns. - v3: https://lore.kernel.org/all/tencent_44B556221480D8371FBC534ACCF3CE2C8707@qq.com/ **v3 → v4**: Added watermark_scale_boost and gradual decay via balance_pgdat() to provide more fine-grained control over the reclaim aggressiveness. - v4: https://lore.kernel.org/all/tencent_D23BFCB69EA088C55AFAF89F926036743E0A@qq.com/ **v4 → v5**: Removed watermark_scale_boost for the following reasons: - v5: https://lore.kernel.org/all/20260121065740.35616-1-realwujing@gmail.com/ 1. **Natural decay exists**: The existing watermark_boost infrastructure already has a built-in decay path. When kswapd successfully reclaims memory and the zone becomes balanced, kswapd_shrink_node() automatically resets watermark_boost to 0. This happens organically without custom decay logic. 2. **Simplicity**: The v4 approach added custom watermark_scale_boost tracking and manual decay in balance_pgdat(). This added complexity that duplicated functionality already present in the kswapd reclaim path. 3. **Production validation**: In our production environment (high-throughput networking workloads), the natural decay via kswapd proved sufficient. Once memory pressure subsides and kswapd successfully reclaims to the high watermark, the boost is cleared automatically within seconds. However, I recognize this is a trade-off. The v4 gradual decay provided more explicit control over the decay rate. If you or the networking maintainers feel that explicit decay control is important for packet storm scenarios, I'm happy to reintroduce the v4 approach or explore alternative decay strategies (e.g., time-based decay independent of kswapd success). > > + zone->watermark_boost = min(zone->watermark_boost + > > + max(pageblock_nr_pages, zone_managed_pages(zone) >> 10), > > ">> 10" is a magic number. What is the reasoning behind choosing this > value? Good catch. The ">> 10" (divide by 1024) was chosen to provide a zone-proportional boost that scales with zone size: - For a 1GB zone: ~1MB boost per trigger - For a 16GB zone: ~16MB boost per trigger The rationale: 1. **Proportionality**: Larger zones experiencing atomic allocation pressure likely need proportionally larger safety buffers. A fixed pageblock_nr_pages (typically 2MB) might be insufficient for large zones under heavy load. 2. **Conservative scaling**: 1/1024 (~0.1%) is aggressive enough to help during sustained pressure but conservative enough to avoid over-reclaim. This was empirically tuned based on our production workload. 3. **Production results**: In our high-throughput networking environment (100Gbps+ traffic bursts), this value reduced GFP_ATOMIC failures by ~95% without causing excessive kswapd activity or impacting normal allocations. I should document this better. I propose adding a #define: ```c /* * Boost watermarks by ~0.1% of zone size on atomic allocation pressure. * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size. */ #define ATOMIC_BOOST_SCALE_SHIFT 10 ``` Best regards, Qiliang Yuan ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-21 20:56 ` Andrew Morton 2026-01-22 1:40 ` Qiliang Yuan @ 2026-01-22 2:00 ` Qiliang Yuan 2026-01-22 2:17 ` Qiliang Yuan 2026-01-22 2:07 ` [PATCH v6] " Qiliang Yuan 2 siblings, 1 reply; 9+ messages in thread From: Qiliang Yuan @ 2026-01-22 2:00 UTC (permalink / raw) To: akpm Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb, jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev, edumazet, jis1, wangh13, liyi1, sunshx, zhangzq20, zhangjn11, Qiliang Yuan, Qiliang Yuan Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory pressure as they cannot enter direct reclaim. This patch introduces a 'Soft Boost' mechanism to mitigate this. When a GFP_ATOMIC request fails or enters the slowpath, the preferred zone's watermark_boost is increased. This triggers kswapd to proactively reclaim memory, creating a safety buffer for future atomic bursts. To prevent excessive reclaim during packet storms, a 1-second debounce timer (last_boost_jiffies) is added to each zone to rate-limit boosts. This approach reuses existing watermark_boost infrastructure, ensuring minimal overhead and asynchronous background reclaim via kswapd. Allocation failure logs: [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0 [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317 [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144 [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383 [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 [38535649.655523] warn_alloc: 59 callbacks suppressed [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1 Signed-off-by: Qiliang Yuan <realwujing@gmail.com> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> --- v6: - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define - Add documentation explaining 0.1% zone size boost rationale v5: - Simplify to use native boost_watermark() instead of custom logic v4: - Add watermark_scale_boost and gradual decay via balance_pgdat v3: - Move debounce timer to per-zone; optimize zone selection v2: - Add debounce logic and zone-proportional boosting v1: - Initial: boost min_free_kbytes on GFP_ATOMIC failure --- include/linux/mmzone.h | 1 + mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-) --- mm/page_alloc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1faace9e2dc5..8ea2435125d5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly; static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags); +/* + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure. + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size. + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves. + */ +#define ATOMIC_BOOST_SCALE_SHIFT 10 + /* * results with 256, 32 in the lowmem_reserve sysctl: * 1G machine -> (16M dma, 800M-16M normal, 1G-800M high) @@ -2190,7 +2197,7 @@ static inline bool boost_watermark(struct zone *zone) max_boost = max(pageblock_nr_pages, max_boost); zone->watermark_boost = min(zone->watermark_boost + - max(pageblock_nr_pages, zone_managed_pages(zone) >> 10), + max(pageblock_nr_pages, zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT), max_boost); return true; -- 2.51.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-22 2:00 ` [PATCH] " Qiliang Yuan @ 2026-01-22 2:17 ` Qiliang Yuan 0 siblings, 0 replies; 9+ messages in thread From: Qiliang Yuan @ 2026-01-22 2:17 UTC (permalink / raw) To: realwujing Cc: akpm, david, edumazet, hannes, jackmanb, jis1, lance.yang, linux-kernel, linux-mm, liyi1, mhocko, netdev, rppt, sunshx, surenb, vbabka, wangh13, weixugc, willy, yuanql9, zhangjn11, zhangzq20, ziy Hi, Please ignore this patch. I realized that I forgot to label the subject as v6, and I also forgot to rebase properly, so the changes from v5 were not correctly merged into this version. I will rebase and send a proper v6 shortly. Sorry for the noise. Best regards, Qiliang Yuan ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-21 20:56 ` Andrew Morton 2026-01-22 1:40 ` Qiliang Yuan 2026-01-22 2:00 ` [PATCH] " Qiliang Yuan @ 2026-01-22 2:07 ` Qiliang Yuan 2026-01-22 12:22 ` Vlastimil Babka 2 siblings, 1 reply; 9+ messages in thread From: Qiliang Yuan @ 2026-01-22 2:07 UTC (permalink / raw) To: akpm Cc: david, mhocko, vbabka, willy, lance.yang, hannes, surenb, jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev, edumazet, jis1, wangh13, liyi1, sunshx, zhangzq20, zhangjn11, Qiliang Yuan, Qiliang Yuan Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory pressure as they cannot enter direct reclaim. This patch introduces a 'Soft Boost' mechanism to mitigate this. When a GFP_ATOMIC request fails or enters the slowpath, the preferred zone's watermark_boost is increased. This triggers kswapd to proactively reclaim memory, creating a safety buffer for future atomic bursts. To prevent excessive reclaim during packet storms, a 1-second debounce timer (last_boost_jiffies) is added to each zone to rate-limit boosts. This approach reuses existing watermark_boost infrastructure, ensuring minimal overhead and asynchronous background reclaim via kswapd. Allocation failure logs: [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0 [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317 [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144 [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383 [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 [38535649.655523] warn_alloc: 59 callbacks suppressed [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1 Signed-off-by: Qiliang Yuan <realwujing@gmail.com> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> --- v6: - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define - Add documentation explaining 0.1% zone size boost rationale v5: - Simplify to use native boost_watermark() instead of custom logic v4: - Add watermark_scale_boost and gradual decay via balance_pgdat v3: - Move debounce timer to per-zone; optimize zone selection v2: - Add debounce logic and zone-proportional boosting v1: - Initial: boost min_free_kbytes on GFP_ATOMIC failure --- include/linux/mmzone.h | 1 + mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-) --- include/linux/mmzone.h | 1 + mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 75ef7c9f9307..8e37e4e6765b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -882,6 +882,7 @@ struct zone { /* zone watermarks, access with *_wmark_pages(zone) macros */ unsigned long _watermark[NR_WMARK]; unsigned long watermark_boost; + unsigned long last_boost_jiffies; unsigned long nr_reserved_highatomic; unsigned long nr_free_highatomic; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c380f063e8b7..8ea2435125d5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly; static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags); +/* + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure. + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size. + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves. + */ +#define ATOMIC_BOOST_SCALE_SHIFT 10 + /* * results with 256, 32 in the lowmem_reserve sysctl: * 1G machine -> (16M dma, 800M-16M normal, 1G-800M high) @@ -2189,12 +2196,31 @@ static inline bool boost_watermark(struct zone *zone) max_boost = max(pageblock_nr_pages, max_boost); - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages, + zone->watermark_boost = min(zone->watermark_boost + + max(pageblock_nr_pages, zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT), max_boost); return true; } +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask) +{ + struct zoneref *z; + struct zone *zone; + unsigned long now = jiffies; + + for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) { + /* 1 second debounce to avoid spamming boosts in a burst */ + if (time_after(now, zone->last_boost_jiffies + HZ)) { + zone->last_boost_jiffies = now; + if (boost_watermark(zone)) + wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx); + /* Only boost the preferred zone to be precise */ + break; + } + } +} + /* * When we are falling back to another migratetype during allocation, should we * try to claim an entire block to satisfy further allocations, instead of @@ -4742,6 +4768,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (page) goto got_pg; + /* Proactively boost for atomic requests entering slowpath */ + if ((gfp_mask & GFP_ATOMIC) && order == 0) + boost_zones_for_atomic(ac, gfp_mask); + /* * For costly allocations, try direct compaction first, as it's likely * that we have enough base pages and don't need to reclaim. For non- @@ -4947,6 +4977,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto retry; } fail: + /* Boost watermarks on atomic allocation failure to trigger kswapd */ + if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0)) + boost_zones_for_atomic(ac, gfp_mask); + warn_alloc(gfp_mask, ac->nodemask, "page allocation failure: order:%u", order); got_pg: -- 2.51.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-22 2:07 ` [PATCH v6] " Qiliang Yuan @ 2026-01-22 12:22 ` Vlastimil Babka 2026-01-23 6:42 ` [PATCH v7] " Qiliang Yuan 0 siblings, 1 reply; 9+ messages in thread From: Vlastimil Babka @ 2026-01-22 12:22 UTC (permalink / raw) To: Qiliang Yuan, akpm Cc: david, mhocko, willy, lance.yang, hannes, surenb, jackmanb, ziy, weixugc, rppt, linux-mm, linux-kernel, netdev, edumazet, jis1, wangh13, liyi1, sunshx, zhangzq20, zhangjn11, Qiliang Yuan On 1/22/26 03:07, Qiliang Yuan wrote: > Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory > pressure as they cannot enter direct reclaim. This patch introduces a > 'Soft Boost' mechanism to mitigate this. > > When a GFP_ATOMIC request fails or enters the slowpath, the preferred > zone's watermark_boost is increased. This triggers kswapd to proactively > reclaim memory, creating a safety buffer for future atomic bursts. > > To prevent excessive reclaim during packet storms, a 1-second debounce > timer (last_boost_jiffies) is added to each zone to rate-limit boosts. > > This approach reuses existing watermark_boost infrastructure, ensuring > minimal overhead and asynchronous background reclaim via kswapd. > > Allocation failure logs: > [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0 > [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317 > [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) > [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 > [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144 > [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383 > [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) > [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 > [38535649.655523] warn_alloc: 59 callbacks suppressed > [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) > [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1 > > Signed-off-by: Qiliang Yuan <realwujing@gmail.com> > Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> > --- > v6: > - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define > - Add documentation explaining 0.1% zone size boost rationale > v5: > - Simplify to use native boost_watermark() instead of custom logic > v4: > - Add watermark_scale_boost and gradual decay via balance_pgdat > v3: > - Move debounce timer to per-zone; optimize zone selection > v2: > - Add debounce logic and zone-proportional boosting > v1: > - Initial: boost min_free_kbytes on GFP_ATOMIC failure > --- > include/linux/mmzone.h | 1 + > mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++++- > 2 files changed, 36 insertions(+), 1 deletion(-) > --- > include/linux/mmzone.h | 1 + > mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++++- > 2 files changed, 36 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 75ef7c9f9307..8e37e4e6765b 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -882,6 +882,7 @@ struct zone { > /* zone watermarks, access with *_wmark_pages(zone) macros */ > unsigned long _watermark[NR_WMARK]; > unsigned long watermark_boost; > + unsigned long last_boost_jiffies; > > unsigned long nr_reserved_highatomic; > unsigned long nr_free_highatomic; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c380f063e8b7..8ea2435125d5 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly; > static void __free_pages_ok(struct page *page, unsigned int order, > fpi_t fpi_flags); > > +/* > + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure. > + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size. > + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves. > + */ > +#define ATOMIC_BOOST_SCALE_SHIFT 10 > + > /* > * results with 256, 32 in the lowmem_reserve sysctl: > * 1G machine -> (16M dma, 800M-16M normal, 1G-800M high) > @@ -2189,12 +2196,31 @@ static inline bool boost_watermark(struct zone *zone) > > max_boost = max(pageblock_nr_pages, max_boost); > > - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages, > + zone->watermark_boost = min(zone->watermark_boost + > + max(pageblock_nr_pages, zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT), So IIUC you are not changing (increasing) the maximum boost, but the amount in one step. It would be more descriptive to first set a local variable with this amount and then use it for the boosting. This change also affects the original boost_watermark() caller. Maybe it's fine, can't say without any measurements. > max_boost); > > return true; > } > > +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask) > +{ > + struct zoneref *z; > + struct zone *zone; > + unsigned long now = jiffies; > + > + for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) { > + /* 1 second debounce to avoid spamming boosts in a burst */ > + if (time_after(now, zone->last_boost_jiffies + HZ)) { > + zone->last_boost_jiffies = now; > + if (boost_watermark(zone)) > + wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx); The other caller of boost_watermark() is under zone->lock and it makes those zone->watermark_boost increments safe, and balance_pgdat() takes it for the decrements too with "/* Increments are under the zone lock */ " comment, otherwise I wouldn't realize this. It probably wouldn't hurt to add a lockdep assert into boost_watermark() to prevent mistakes. But the other caller also takes care not to call wakeup_kswapd() under the zone lock so I would not do it as well - see commit 73444bc4d8f92 > + /* Only boost the preferred zone to be precise */ > + break; > + } > + } > +} > + > /* > * When we are falling back to another migratetype during allocation, should we > * try to claim an entire block to satisfy further allocations, instead of > @@ -4742,6 +4768,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > if (page) > goto got_pg; > > + /* Proactively boost for atomic requests entering slowpath */ > + if ((gfp_mask & GFP_ATOMIC) && order == 0) > + boost_zones_for_atomic(ac, gfp_mask); > + > /* > * For costly allocations, try direct compaction first, as it's likely > * that we have enough base pages and don't need to reclaim. For non- > @@ -4947,6 +4977,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > goto retry; > } > fail: > + /* Boost watermarks on atomic allocation failure to trigger kswapd */ > + if (unlikely(page == NULL && (gfp_mask & GFP_ATOMIC) && order == 0)) > + boost_zones_for_atomic(ac, gfp_mask); We already did the boosting when entering slowpath, there's 1 second debounce and GFP_ATOMIC can't really do anything in the slowpath to spend 1 second, so I think this is redundant. > + > warn_alloc(gfp_mask, ac->nodemask, > "page allocation failure: order:%u", order); > got_pg: ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-22 12:22 ` Vlastimil Babka @ 2026-01-23 6:42 ` Qiliang Yuan 2026-01-27 6:06 ` kernel test robot 0 siblings, 1 reply; 9+ messages in thread From: Qiliang Yuan @ 2026-01-23 6:42 UTC (permalink / raw) To: vbabka Cc: akpm, david, edumazet, hannes, jackmanb, jis1, lance.yang, linux-kernel, linux-mm, liyi1, mhocko, netdev, realwujing, rppt, sunshx, surenb, wangh13, weixugc, willy, yuanql9, zhangjn11, zhangzq20, ziy Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory pressure as they cannot enter direct reclaim. This patch introduces a watermark boost mechanism to mitigate this issue. When a GFP_ATOMIC request enters the slowpath, the preferred zone's watermark_boost is increased under zone->lock protection. This triggers kswapd to proactively reclaim memory, creating a safety buffer for future atomic allocations. A 1-second debounce timer prevents excessive boosts during traffic bursts. This approach reuses existing watermark_boost infrastructure with minimal overhead and proper locking to ensure thread safety. Allocation failure logs: [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0 [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317 [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144 [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383 [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC) [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0 [38535649.655523] warn_alloc: 59 callbacks suppressed [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1 Signed-off-by: Qiliang Yuan <realwujing@gmail.com> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> --- v7: - Use local variable for boost_amount to improve code readability - Add zone->lock protection in boost_zones_for_atomic() - Add lockdep assertion in boost_watermark() to prevent locking mistakes - Remove redundant boost call at fail label due to 1-second debounce v6: - Replace magic number ">> 10" with ATOMIC_BOOST_SCALE_SHIFT define - Add documentation explaining 0.1% zone size boost rationale v5: - Simplify to use native boost_watermark() instead of custom logic v4: - Add watermark_scale_boost and gradual decay via balance_pgdat v3: - Move debounce timer to per-zone; optimize zone selection v2: - Add debounce logic and zone-proportional boosting v1: - Initial: boost min_free_kbytes on GFP_ATOMIC failure include/linux/mmzone.h | 1 + mm/page_alloc.c | 46 ++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 45 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 75ef7c9f9307..8e37e4e6765b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -882,6 +882,7 @@ struct zone { /* zone watermarks, access with *_wmark_pages(zone) macros */ unsigned long _watermark[NR_WMARK]; unsigned long watermark_boost; + unsigned long last_boost_jiffies; unsigned long nr_reserved_highatomic; unsigned long nr_free_highatomic; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c380f063e8b7..94168571cc38 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly; static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags); +/* + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure. + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size. + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves. + */ +#define ATOMIC_BOOST_SCALE_SHIFT 10 + /* * results with 256, 32 in the lowmem_reserve sysctl: * 1G machine -> (16M dma, 800M-16M normal, 1G-800M high) @@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag static inline bool boost_watermark(struct zone *zone) { unsigned long max_boost; + unsigned long boost_amount; + + lockdep_assert_held(&zone->lock); if (!watermark_boost_factor) return false; @@ -2189,12 +2199,40 @@ static inline bool boost_watermark(struct zone *zone) max_boost = max(pageblock_nr_pages, max_boost); - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages, - max_boost); + boost_amount = max(pageblock_nr_pages, + zone_managed_pages(zone) >> ATOMIC_BOOST_SCALE_SHIFT); + zone->watermark_boost = min(zone->watermark_boost + boost_amount, + max_boost); return true; } +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask) +{ + struct zoneref *z; + struct zone *zone; + unsigned long now = jiffies; + bool should_wake; + + for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) { + /* Rate-limit boosts to once per second per zone */ + if (time_after(now, zone->last_boost_jiffies + HZ)) { + zone->last_boost_jiffies = now; + + /* Modify watermark under lock, wake kswapd outside */ + spin_lock(&zone->lock); + should_wake = boost_watermark(zone); + spin_unlock(&zone->lock); + + if (should_wake) + wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx); + + /* Boost only the preferred zone */ + break; + } + } +} + /* * When we are falling back to another migratetype during allocation, should we * try to claim an entire block to satisfy further allocations, instead of @@ -4742,6 +4780,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (page) goto got_pg; + /* Boost watermarks for atomic requests entering slowpath */ + if ((gfp_mask & GFP_ATOMIC) && order == 0) + boost_zones_for_atomic(ac, gfp_mask); + /* * For costly allocations, try direct compaction first, as it's likely * that we have enough base pages and don't need to reclaim. For non- -- 2.51.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure 2026-01-23 6:42 ` [PATCH v7] " Qiliang Yuan @ 2026-01-27 6:06 ` kernel test robot 0 siblings, 0 replies; 9+ messages in thread From: kernel test robot @ 2026-01-27 6:06 UTC (permalink / raw) To: Qiliang Yuan Cc: oe-lkp, lkp, Qiliang Yuan, linux-mm, vbabka, akpm, david, edumazet, hannes, jackmanb, jis1, lance.yang, linux-kernel, liyi1, mhocko, netdev, realwujing, rppt, sunshx, surenb, wangh13, weixugc, willy, zhangjn11, zhangzq20, ziy, oliver.sang Hello, kernel test robot noticed "WARNING:inconsistent_lock_state" on: commit: 4f0cbecbc533f56605274f6211e31907ed792bdf ("[PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure") url: https://github.com/intel-lab-lkp/linux/commits/Qiliang-Yuan/mm-page_alloc-boost-watermarks-on-atomic-allocation-failure/20260123-144418 base: v6.19-rc6 patch link: https://lore.kernel.org/all/20260123064231.250767-1-realwujing@gmail.com/ patch subject: [PATCH v7] mm/page_alloc: boost watermarks on atomic allocation failure in testcase: trinity version: trinity-i386-abe9de86-1_20230429 with following parameters: runtime: 600s config: x86_64-randconfig-075-20251114 compiler: gcc-14 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202601271341.5d24a59f-lkp@intel.com [ 151.153230][ T1379] WARNING: inconsistent lock state [ 151.153836][ T1379] 6.19.0-rc6-00001-g4f0cbecbc533 #1 Not tainted [ 151.154605][ T1379] -------------------------------- [ 151.155192][ T1379] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 151.155825][ T1379] trinity-c0/1379 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 151.156424][ T1379] ffffffff8865d9e8 (&zone->lock){+.?.}-{3:3}, at: __alloc_pages_slowpath+0x1265/0x1b00 [ 151.157399][ T1379] {IN-SOFTIRQ-W} state was registered at: [ 151.158029][ T1379] __lock_acquire (kernel/locking/lockdep.c:5191 (discriminator 1)) [ 151.158629][ T1379] lock_acquire (include/trace/events/lock.h:24 (discriminator 15) include/trace/events/lock.h:24 (discriminator 15) kernel/locking/lockdep.c:5831 (discriminator 15)) [ 151.159221][ T1379] lock_acquire (kernel/locking/lockdep.c:5833) [ 151.159670][ T1379] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 151.160180][ T1379] rmqueue_bulk (mm/page_alloc.c:2592 (discriminator 1)) [ 151.160637][ T1379] __rmqueue_pcplist (mm/page_alloc.c:3374 (discriminator 3)) [ 151.161132][ T1379] rmqueue+0x1b3c/0x3400 [ 151.161630][ T1379] get_page_from_freelist (mm/page_alloc.c:3982) [ 151.162294][ T1379] __alloc_frozen_pages_noprof (mm/page_alloc.c:5282) [ 151.163006][ T1379] allocate_slab (mm/slub.c:3079 mm/slub.c:3248) [ 151.163482][ T1379] ___slab_alloc (mm/slub.c:3302 mm/slub.c:4656) [ 151.163962][ T1379] __slab_alloc+0x30/0x80 [ 151.164464][ T1379] __kmalloc_noprof (mm/slub.c:4855 mm/slub.c:5251 mm/slub.c:5656 mm/slub.c:5669) [ 151.164954][ T1379] alloc_ep_req (drivers/usb/gadget/u_f.c:22 (discriminator 4)) [ 151.165425][ T1379] source_sink_start_ep (drivers/usb/gadget/function/f_sourcesink.c:292 drivers/usb/gadget/function/f_sourcesink.c:608) [ 151.166375][ T1379] enable_source_sink (drivers/usb/gadget/function/f_sourcesink.c:662) [ 151.167006][ T1379] sourcesink_set_alt (drivers/usb/gadget/function/f_sourcesink.c:744) [ 151.167515][ T1379] set_config+0x323/0xb00 [ 151.168019][ T1379] composite_setup (include/linux/spinlock.h:391 drivers/usb/gadget/composite.c:1901) [ 151.168548][ T1379] dummy_timer (include/linux/spinlock.h:351 drivers/usb/gadget/udc/dummy_hcd.c:1929) [ 151.169019][ T1379] __hrtimer_run_queues (kernel/time/hrtimer.c:1777 kernel/time/hrtimer.c:1841) [ 151.169550][ T1379] hrtimer_run_softirq (kernel/time/hrtimer.c:1860) [ 151.170198][ T1379] handle_softirqs (arch/x86/include/asm/jump_label.h:37 include/trace/events/irq.h:142 kernel/softirq.c:623) [ 151.170817][ T1379] __irq_exit_rcu (kernel/softirq.c:657 kernel/softirq.c:496 kernel/softirq.c:723) [ 151.171325][ T1379] irq_exit_rcu (kernel/softirq.c:741 (discriminator 38)) [ 151.171780][ T1379] sysvec_call_function_single (arch/x86/kernel/smp.c:266 (discriminator 35) arch/x86/kernel/smp.c:266 (discriminator 35)) [ 151.172366][ T1379] asm_sysvec_call_function_single (arch/x86/include/asm/idtentry.h:704) [ 151.172952][ T1379] pv_native_safe_halt (arch/x86/kernel/paravirt.c:82) [ 151.173449][ T1379] arch_cpu_idle (arch/x86/kernel/process.c:805) [ 151.173991][ T1379] default_idle_call (include/linux/cpuidle.h:143 (discriminator 1) kernel/sched/idle.c:123 (discriminator 1)) [ 151.174997][ T1379] cpuidle_idle_call (kernel/sched/idle.c:192) [ 151.175913][ T1379] do_idle (kernel/sched/idle.c:332) [ 151.176666][ T1379] cpu_startup_entry (kernel/sched/idle.c:429) [ 151.177536][ T1379] rest_init (init/main.c:757) [ 151.178357][ T1379] start_kernel (init/main.c:1206) [ 151.179249][ T1379] x86_64_start_reservations (arch/x86/kernel/head64.c:310) [ 151.180213][ T1379] x86_64_start_kernel (??:?) [ 151.181119][ T1379] common_startup_64 (arch/x86/kernel/head_64.S:419) [ 151.181992][ T1379] irq event stamp: 16437333 [ 151.182773][ T1379] hardirqs last enabled at (16437333): _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202) [ 151.184399][ T1379] hardirqs last disabled at (16437332): _raw_spin_lock_irq (include/linux/spinlock_api_smp.h:117 (discriminator 1) kernel/locking/spinlock.c:170 (discriminator 1)) [ 151.186068][ T1379] softirqs last enabled at (16422642): handle_softirqs (kernel/softirq.c:469 (discriminator 2) kernel/softirq.c:650 (discriminator 2)) [ 151.187758][ T1379] softirqs last disabled at (16422637): __irq_exit_rcu (kernel/softirq.c:657 kernel/softirq.c:496 kernel/softirq.c:723) [ 151.189383][ T1379] [ 151.189383][ T1379] other info that might help us debug this: [ 151.190809][ T1379] Possible unsafe locking scenario: [ 151.190809][ T1379] [ 151.192092][ T1379] CPU0 [ 151.192740][ T1379] ---- [ 151.193401][ T1379] lock(&zone->lock); [ 151.194192][ T1379] <Interrupt> [ 151.194940][ T1379] lock(&zone->lock); [ 151.195734][ T1379] [ 151.195734][ T1379] *** DEADLOCK *** [ 151.195734][ T1379] [ 151.197223][ T1379] 2 locks held by trinity-c0/1379: [ 151.198134][ T1379] #0: ffff8881001b2400 (sb_writers#5){.+.+}-{0:0}, at: ksys_fallocate (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) fs/open.c:358 (discriminator 1)) [ 151.199767][ T1379] #1: ffff8881373cd0d0 (&sb->s_type->i_mutex_key#12){+.+.}-{4:4}, at: shmem_fallocate (mm/shmem.c:3688) [ 151.201543][ T1379] [ 151.201543][ T1379] stack backtrace: [ 151.202709][ T1379] CPU: 0 UID: 65534 PID: 1379 Comm: trinity-c0 Not tainted 6.19.0-rc6-00001-g4f0cbecbc533 #1 PREEMPT 5190a26909b47a16cbbcf00ba20dd51a77658a62 [ 151.202723][ T1379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 151.202735][ T1379] Call Trace: [ 151.202739][ T1379] <TASK> [ 151.202743][ T1379] dump_stack_lvl (lib/dump_stack.c:122) [ 151.202760][ T1379] dump_stack (lib/dump_stack.c:130) [ 151.202766][ T1379] print_usage_bug+0x25d/0x380 [ 151.202775][ T1379] mark_lock_irq (kernel/locking/lockdep.c:4268) [ 151.202782][ T1379] ? save_trace (kernel/locking/lockdep.c:557 (discriminator 1) kernel/locking/lockdep.c:594 (discriminator 1)) [ 151.202792][ T1379] mark_lock (kernel/locking/lockdep.c:4753) [ 151.202798][ T1379] mark_usage (kernel/locking/lockdep.c:4666 (discriminator 1)) [ 151.202803][ T1379] __lock_acquire (kernel/locking/lockdep.c:5191 (discriminator 1)) [ 151.202809][ T1379] ? mark_held_locks (kernel/locking/lockdep.c:4325 (discriminator 1)) [ 151.202815][ T1379] lock_acquire (include/trace/events/lock.h:24 (discriminator 15) include/trace/events/lock.h:24 (discriminator 15) kernel/locking/lockdep.c:5831 (discriminator 15)) [ 151.202820][ T1379] ? __alloc_pages_slowpath+0x1265/0x1b00 [ 151.202832][ T1379] ? get_page_from_freelist (mm/page_alloc.c:3733 (discriminator 1) mm/page_alloc.c:3936 (discriminator 1)) [ 151.202838][ T1379] ? wakeup_kswapd (arch/x86/include/asm/atomic64_64.h:15 include/linux/atomic/atomic-arch-fallback.h:2583 include/linux/atomic/atomic-long.h:38 include/linux/atomic/atomic-instrumented.h:3189 include/linux/mmzone.h:1106 include/linux/mmzone.h:1600 mm/vmscan.c:7378) [ 151.202848][ T1379] lock_acquire (kernel/locking/lockdep.c:5833) [ 151.202853][ T1379] ? __alloc_pages_slowpath+0x1265/0x1b00 [ 151.202861][ T1379] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 151.202869][ T1379] ? __alloc_pages_slowpath+0x1265/0x1b00 [ 151.202876][ T1379] __alloc_pages_slowpath+0x1265/0x1b00 [ 151.202887][ T1379] ? warn_alloc (mm/page_alloc.c:4731) [ 151.202896][ T1379] ? get_page_from_freelist (mm/page_alloc.c:3733 (discriminator 1) mm/page_alloc.c:3936 (discriminator 1)) [ 151.202902][ T1379] ? lock_is_held_type (kernel/locking/lockdep.c:5601 (discriminator 1) kernel/locking/lockdep.c:5940 (discriminator 1)) [ 151.202913][ T1379] __alloc_frozen_pages_noprof (mm/page_alloc.c:5295) [ 151.202920][ T1379] ? __alloc_pages_slowpath+0x1b00/0x1b00 [ 151.202929][ T1379] ? filemap_get_entry (mm/filemap.c:1888) [ 151.202936][ T1379] ? __folio_lock_or_retry (mm/filemap.c:1888) [ 151.202944][ T1379] __folio_alloc_noprof (mm/page_alloc.c:5316 mm/page_alloc.c:5326) [ 151.202951][ T1379] shmem_alloc_and_add_folio+0xfc/0x3c0 [ 151.202961][ T1379] shmem_get_folio_gfp+0x388/0xd80 [ 151.202970][ T1379] shmem_fallocate (mm/shmem.c:3780) [ 151.202980][ T1379] ? shmem_get_link (mm/shmem.c:3675) [ 151.202987][ T1379] ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1)) [ 151.202998][ T1379] ? __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1)) [ 151.203003][ T1379] ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1)) [ 151.203013][ T1379] ? lock_acquire (include/trace/events/lock.h:24 (discriminator 15) include/trace/events/lock.h:24 (discriminator 15) kernel/locking/lockdep.c:5831 (discriminator 15)) [ 151.203018][ T1379] ? ksys_fallocate (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) fs/open.c:358 (discriminator 1)) [ 151.203030][ T1379] ? lock_is_held_type (kernel/locking/lockdep.c:5601 (discriminator 1) kernel/locking/lockdep.c:5940 (discriminator 1)) [ 151.203039][ T1379] vfs_fallocate (fs/open.c:339 (discriminator 1)) [ 151.203047][ T1379] ksys_fallocate (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) fs/open.c:358 (discriminator 1)) [ 151.203055][ T1379] __ia32_sys_ia32_fallocate (arch/x86/kernel/sys_ia32.c:119) [ 151.203065][ T1379] ? __do_fast_syscall_32 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 arch/x86/entry/syscall_32.c:280) [ 151.203074][ T1379] ia32_sys_call (arch/x86/entry/syscall_32.c:50) [ 151.203079][ T1379] __do_fast_syscall_32 (arch/x86/entry/syscall_32.c:83 (discriminator 1) arch/x86/entry/syscall_32.c:307 (discriminator 1)) [ 151.203088][ T1379] do_fast_syscall_32 (arch/x86/entry/syscall_32.c:332 (discriminator 1)) [ 151.203096][ T1379] do_SYSENTER_32 (arch/x86/entry/syscall_32.c:371) [ 151.203103][ T1379] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:127) [ 151.203111][ T1379] RIP: 0023:0xf7ed9589 [ 151.203118][ T1379] Code: 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 2e 8d b4 26 00 00 00 00 8d b4 26 00 00 00 All code ======== 0: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi ... 20: 00 51 52 add %dl,0x52(%rcx) 23: 55 push %rbp 24: 89 e5 mov %esp,%ebp 26:* 0f 34 sysenter <-- trapping instruction 28: cd 80 int $0x80 2a: 5d pop %rbp 2b: 5a pop %rdx 2c: 59 pop %rcx 2d: c3 ret 2e: 90 nop 2f: 90 nop 30: 90 nop 31: 90 nop 32: 2e 8d b4 26 00 00 00 cs lea 0x0(%rsi,%riz,1),%esi 39: 00 3a: 8d .byte 0x8d 3b: b4 26 mov $0x26,%ah 3d: 00 00 add %al,(%rax) ... Code starting with the faulting instruction =========================================== 0: 5d pop %rbp 1: 5a pop %rdx 2: 59 pop %rcx 3: c3 ret 4: 90 nop 5: 90 nop 6: 90 nop 7: 90 nop 8: 2e 8d b4 26 00 00 00 cs lea 0x0(%rsi,%riz,1),%esi f: 00 10: 8d .byte 0x8d 11: b4 26 mov $0x26,%ah 13: 00 00 add %al,(%rax) The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20260127/202601271341.5d24a59f-lkp@intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-01-27 6:06 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2026-01-21 6:57 [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure Qiliang Yuan 2026-01-21 20:56 ` Andrew Morton 2026-01-22 1:40 ` Qiliang Yuan 2026-01-22 2:00 ` [PATCH] " Qiliang Yuan 2026-01-22 2:17 ` Qiliang Yuan 2026-01-22 2:07 ` [PATCH v6] " Qiliang Yuan 2026-01-22 12:22 ` Vlastimil Babka 2026-01-23 6:42 ` [PATCH v7] " Qiliang Yuan 2026-01-27 6:06 ` kernel test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox