* [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure
@ 2026-02-13  3:17 Qiliang Yuan
  2026-02-13  8:46 ` Vlastimil Babka
  2026-02-13 19:36 ` Johannes Weiner
  0 siblings, 2 replies; 4+ messages in thread
From: Qiliang Yuan @ 2026-02-13  3:17 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Brendan Jackman, Johannes Weiner, Zi Yan, Lance Yang
  Cc: linux-mm, linux-kernel, Qiliang Yuan

Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
pressure as they cannot enter direct reclaim. This patch introduces a
watermark boost mechanism to mitigate this issue.

When a GFP_ATOMIC request enters the slowpath, the preferred zone's
watermark_boost is increased under zone->lock protection. This triggers
kswapd to proactively reclaim memory, creating a safety buffer for
future atomic allocations. A 1-second debounce timer prevents excessive
boosts during traffic bursts.

This approach reuses existing watermark_boost infrastructure with
minimal overhead and proper locking to ensure thread safety.

Allocation failure logs:
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
v9:
- Use mult_frac() for boost calculation. (SJ)
- Add !can_direct_reclaim check. (Vlastimil)
- Code cleanup: naming, scope, and line limits. (SJ)
- Update tags: Add Vlastimil's Acked-by.

v8:
- Use spin_lock_irqsave() to prevent inconsistent lock state.

v7:
- Use local variable for boost_amount.
- Add zone->lock protection.
- Add lockdep assertion.

v6:
- Use ATOMIC_BOOST_SCALE_SHIFT define.
- Add documentation for 0.1% rationale.

v5:
- Use native boost_watermark().

v4:
- Add watermark_scale_boost and gradual decay.

v3:
- Per-zone debounce timer.

v2:
- Debounce logic and zone-proportional boosting.

v1:
- Initial version.
---
Link to v8: https://lore.kernel.org/r/20260212-wujing-mm-page_alloc-v8-v8-1-daba38990cd3@gmail.com
---
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..8e37e4e6765b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,6 +882,7 @@ struct zone {
 	/* zone watermarks, access with *_wmark_pages(zone) macros */
 	unsigned long _watermark[NR_WMARK];
 	unsigned long watermark_boost;
+	unsigned long last_boost_jiffies;
 
 	unsigned long nr_reserved_highatomic;
 	unsigned long nr_free_highatomic;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..8af88584a8bd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags);
 
+/*
+ * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
+ * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
+ * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
+ */
+#define ATOMIC_BOOST_FACTOR 1
+
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
  *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
@@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
 static inline bool boost_watermark(struct zone *zone)
 {
 	unsigned long max_boost;
+	unsigned long boost_amount;
+
+	lockdep_assert_held(&zone->lock);
 
 	if (!watermark_boost_factor)
 		return false;
@@ -2189,12 +2199,43 @@ static inline bool boost_watermark(struct zone *zone)
 
 	max_boost = max(pageblock_nr_pages, max_boost);
 
-	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
-		max_boost);
+	boost_amount = max(pageblock_nr_pages,
+			   mult_frac(zone_managed_pages(zone), ATOMIC_BOOST_FACTOR, 1000));
+	zone->watermark_boost = min(zone->watermark_boost + boost_amount,
+				    max_boost);
 
 	return true;
 }
 
+static void boost_zone_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
+{
+	struct zoneref *z;
+	struct zone *zone;
+	unsigned long now = jiffies;
+
+	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+		/* Rate-limit boosts to once per second per zone */
+		if (time_after(now, zone->last_boost_jiffies + HZ)) {
+			unsigned long flags;
+			bool should_wake;
+
+			zone->last_boost_jiffies = now;
+
+			/* Modify watermark under lock, wake kswapd outside */
+			spin_lock_irqsave(&zone->lock, flags);
+			should_wake = boost_watermark(zone);
+			spin_unlock_irqrestore(&zone->lock, flags);
+
+			if (should_wake)
+				wakeup_kswapd(zone, gfp_mask, 0,
+					      ac->highest_zoneidx);
+
+			/* Boost only the preferred zone */
+			break;
+		}
+	}
+}
+
 /*
  * When we are falling back to another migratetype during allocation, should we
  * try to claim an entire block to satisfy further allocations, instead of
@@ -4742,6 +4783,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/* Boost watermarks for atomic requests entering slowpath */
+	if ((gfp_mask & GFP_ATOMIC) && order == 0 && !can_direct_reclaim)
+		boost_zone_for_atomic(ac, gfp_mask);
+
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
 	 * that we have enough base pages and don't need to reclaim. For non-

---
base-commit: b54345928fa1dbde534e32ecaa138678fd5d2135
change-id: 20260206-wujing-mm-page_alloc-v8-fb1979bac6fe

Best regards,
-- 
Qiliang Yuan <realwujing@gmail.com>




* Re: [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-02-13  3:17 [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure Qiliang Yuan
@ 2026-02-13  8:46 ` Vlastimil Babka
  2026-02-13 15:07   ` SeongJae Park
  2026-02-13 19:36 ` Johannes Weiner
  1 sibling, 1 reply; 4+ messages in thread
From: Vlastimil Babka @ 2026-02-13  8:46 UTC (permalink / raw)
  To: Qiliang Yuan, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Brendan Jackman,
	Johannes Weiner, Zi Yan, Lance Yang, SeongJae Park
  Cc: linux-mm, linux-kernel

On 2/13/26 04:17, Qiliang Yuan wrote:
> Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> pressure as they cannot enter direct reclaim. This patch introduces a
> watermark boost mechanism to mitigate this issue.
> 
> When a GFP_ATOMIC request enters the slowpath, the preferred zone's
> watermark_boost is increased under zone->lock protection. This triggers
> kswapd to proactively reclaim memory, creating a safety buffer for
> future atomic allocations. A 1-second debounce timer prevents excessive
> boosts during traffic bursts.
> 
> This approach reuses existing watermark_boost infrastructure with
> minimal overhead and proper locking to ensure thread safety.
> 
> Allocation failure logs:
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> ---
> v9:
> - Use mult_frac() for boost calculation. (SJ)
> - Add !can_direct_reclaim check. (Vlastimil)
> - Code cleanup: naming, scope, and line limits. (SJ)
> - Update tags: Add Vlastimil's Acked-by.
> 
> v8:
> - Use spin_lock_irqsave() to prevent inconsistent lock state.
> 
> v7:
> - Use local variable for boost_amount.
> - Add zone->lock protection.
> - Add lockdep assertion.
> 
> v6:
> - Use ATOMIC_BOOST_SCALE_SHIFT define.
> - Add documentation for 0.1% rationale.
> 
> v5:
> - Use native boost_watermark().
> 
> v4:
> - Add watermark_scale_boost and gradual decay.
> 
> v3:
> - Per-zone debounce timer.
> 
> v2:
> - Debounce logic and zone-proportional boosting.
> 
> v1:
> - Initial version.
> ---
> Link to v8: https://lore.kernel.org/r/20260212-wujing-mm-page_alloc-v8-v8-1-daba38990cd3@gmail.com
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 75ef7c9f9307..8e37e4e6765b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,6 +882,7 @@ struct zone {
>  	/* zone watermarks, access with *_wmark_pages(zone) macros */
>  	unsigned long _watermark[NR_WMARK];
>  	unsigned long watermark_boost;
> +	unsigned long last_boost_jiffies;
>  
>  	unsigned long nr_reserved_highatomic;
>  	unsigned long nr_free_highatomic;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..8af88584a8bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
>  static void __free_pages_ok(struct page *page, unsigned int order,
>  			    fpi_t fpi_flags);
>  
> +/*
> + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
> + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
> + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
> + */
> +#define ATOMIC_BOOST_FACTOR 1

... so now we #define the value 1, but it makes little sense without that
hardcoded 1000 below.

> +
>  /*
>   * results with 256, 32 in the lowmem_reserve sysctl:
>   *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
> @@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
>  static inline bool boost_watermark(struct zone *zone)
>  {
>  	unsigned long max_boost;
> +	unsigned long boost_amount;
> +
> +	lockdep_assert_held(&zone->lock);
>  
>  	if (!watermark_boost_factor)
>  		return false;
> @@ -2189,12 +2199,43 @@ static inline bool boost_watermark(struct zone *zone)
>  
>  	max_boost = max(pageblock_nr_pages, max_boost);
>  
> -	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> -		max_boost);
> +	boost_amount = max(pageblock_nr_pages,
> +			   mult_frac(zone_managed_pages(zone), ATOMIC_BOOST_FACTOR, 1000));

I don't think mult_frac() was a great suggestion. We're talking about right
shifting by a constant 10. In the other cases of mult_frac() we use dynamic
values for x and n so it's justified. But this IMHO is unnecessary complication.
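
For illustration, an untested sketch of the shift form (>> 10 divides by
1024, i.e. roughly 0.098%, close enough to the intended 0.1%):

	boost_amount = max(pageblock_nr_pages,
			   zone_managed_pages(zone) >> 10);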

> +	zone->watermark_boost = min(zone->watermark_boost + boost_amount,
> +				    max_boost);
>  
>  	return true;
>  }
>  
> +static void boost_zone_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +	unsigned long now = jiffies;
> +
> +	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
> +		/* Rate-limit boosts to once per second per zone */
> +		if (time_after(now, zone->last_boost_jiffies + HZ)) {
> +			unsigned long flags;
> +			bool should_wake;
> +
> +			zone->last_boost_jiffies = now;
> +
> +			/* Modify watermark under lock, wake kswapd outside */
> +			spin_lock_irqsave(&zone->lock, flags);
> +			should_wake = boost_watermark(zone);
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +
> +			if (should_wake)
> +				wakeup_kswapd(zone, gfp_mask, 0,
> +					      ac->highest_zoneidx);
> +
> +			/* Boost only the preferred zone */
> +			break;
> +		}
> +	}
> +}
> +
>  /*
>   * When we are falling back to another migratetype during allocation, should we
>   * try to claim an entire block to satisfy further allocations, instead of
> @@ -4742,6 +4783,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (page)
>  		goto got_pg;
>  
> +	/* Boost watermarks for atomic requests entering slowpath */
> +	if ((gfp_mask & GFP_ATOMIC) && order == 0 && !can_direct_reclaim)
> +		boost_zone_for_atomic(ac, gfp_mask);
> +
>  	/*
>  	 * For costly allocations, try direct compaction first, as it's likely
>  	 * that we have enough base pages and don't need to reclaim. For non-
> 
> ---
> base-commit: b54345928fa1dbde534e32ecaa138678fd5d2135
> change-id: 20260206-wujing-mm-page_alloc-v8-fb1979bac6fe
> 
> Best regards,




* Re: [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-02-13  8:46 ` Vlastimil Babka
@ 2026-02-13 15:07   ` SeongJae Park
  0 siblings, 0 replies; 4+ messages in thread
From: SeongJae Park @ 2026-02-13 15:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: SeongJae Park, Qiliang Yuan, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Brendan Jackman, Johannes Weiner, Zi Yan, Lance Yang,
	linux-mm, linux-kernel

On Fri, 13 Feb 2026 09:46:14 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 2/13/26 04:17, Qiliang Yuan wrote:
> > Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> > pressure as they cannot enter direct reclaim. This patch introduces a
> > watermark boost mechanism to mitigate this issue.
> > 
> > When a GFP_ATOMIC request enters the slowpath, the preferred zone's
> > watermark_boost is increased under zone->lock protection. This triggers
> > kswapd to proactively reclaim memory, creating a safety buffer for
> > future atomic allocations. A 1-second debounce timer prevents excessive
> > boosts during traffic bursts.
> > 
> > This approach reuses existing watermark_boost infrastructure with
> > minimal overhead and proper locking to ensure thread safety.
[...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c380f063e8b7..8af88584a8bd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
> >  static void __free_pages_ok(struct page *page, unsigned int order,
> >  			    fpi_t fpi_flags);
> >  
> > +/*
> > + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
> > + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
> > + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
> > + */
> > +#define ATOMIC_BOOST_FACTOR 1
> 
> ... so now we #define 1 but it makes little sense without that hardcoded
> 1000 below.

I agree.  I think it could be easier to understand if we use 10000 as the
denominator, consistent with other similar ones, like watermark_scale_factor.
Or, defining it as a constant local variable or hard-coded value right before
its single real use might be easier to read, for the reason mentioned below.

> 
> > +
> >  /*
> >   * results with 256, 32 in the lowmem_reserve sysctl:
> >   *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
> > @@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
> >  static inline bool boost_watermark(struct zone *zone)
> >  {
> >  	unsigned long max_boost;
> > +	unsigned long boost_amount;
> > +
> > +	lockdep_assert_held(&zone->lock);
> >  
> >  	if (!watermark_boost_factor)
> >  		return false;
> > @@ -2189,12 +2199,43 @@ static inline bool boost_watermark(struct zone *zone)
> >  
> >  	max_boost = max(pageblock_nr_pages, max_boost);
> >  
> > -	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> > -		max_boost);
> > +	boost_amount = max(pageblock_nr_pages,
> > +			   mult_frac(zone_managed_pages(zone), ATOMIC_BOOST_FACTOR, 1000));
> 
> I don't think mult_frac() was a great suggestion. We're talking about right
> shifting by a constant 10. In the other cases of mult_frac() we use dynamic
> values for x and n so it's justified. But this IMHO is unnecessary complication.

This file uses mult_frac() in two places with a hard-coded denominator of
10000.  Hence I feel it is more consistent to use mult_frac() with that same
denominator (10000) and consistent naming.  In terms of overhead, I think the
addition is negligible, since the atomic boost path is debounced to at most
once per second per zone.

No strong opinion, just trivial personal taste, though.  Right shifting would
also be fine with me. :)

And now I realize I was thinking ATOMIC_BOOST_SHIFT could be better for
consistency with other similar code because it is defined as a macro.  That
is, I was assuming it would be used in multiple places and should therefore
be easy for readers to understand.  Now I see it is actually used only here.
What about defining it as a constant local variable here, or just hard-coding
the value?
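
For example, an untested sketch of the hard-coded form with the 10000
denominator (the 10 is just 0.1% expressed against that denominator):

	/* Boost by ~0.1% of the zone: roughly 1MB per 1GB of managed memory */
	boost_amount = max(pageblock_nr_pages,
			   mult_frac(zone_managed_pages(zone), 10, 10000));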


Thanks,
SJ

[...]



* Re: [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure
  2026-02-13  3:17 [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure Qiliang Yuan
  2026-02-13  8:46 ` Vlastimil Babka
@ 2026-02-13 19:36 ` Johannes Weiner
  1 sibling, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2026-02-13 19:36 UTC (permalink / raw)
  To: Qiliang Yuan
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Brendan Jackman, Zi Yan, Lance Yang, linux-mm,
	linux-kernel

On Fri, Feb 13, 2026 at 11:17:59AM +0800, Qiliang Yuan wrote:
> Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> pressure as they cannot enter direct reclaim. This patch introduces a
> watermark boost mechanism to mitigate this issue.
> 
> When a GFP_ATOMIC request enters the slowpath, the preferred zone's
> watermark_boost is increased under zone->lock protection. This triggers
> kswapd to proactively reclaim memory, creating a safety buffer for
> future atomic allocations. A 1-second debounce timer prevents excessive
> boosts during traffic bursts.
> 
> This approach reuses existing watermark_boost infrastructure with
> minimal overhead and proper locking to ensure thread safety.
> 
> Allocation failure logs:
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> ---
> v9:
> - Use mult_frac() for boost calculation. (SJ)
> - Add !can_direct_reclaim check. (Vlastimil)
> - Code cleanup: naming, scope, and line limits. (SJ)
> - Update tags: Add Vlastimil's Acked-by.
> 
> v8:
> - Use spin_lock_irqsave() to prevent inconsistent lock state.
> 
> v7:
> - Use local variable for boost_amount.
> - Add zone->lock protection.
> - Add lockdep assertion.
> 
> v6:
> - Use ATOMIC_BOOST_SCALE_SHIFT define.
> - Add documentation for 0.1% rationale.
> 
> v5:
> - Use native boost_watermark().
> 
> v4:
> - Add watermark_scale_boost and gradual decay.
> 
> v3:
> - Per-zone debounce timer.
> 
> v2:
> - Debounce logic and zone-proportional boosting.
> 
> v1:
> - Initial version.
> ---
> Link to v8: https://lore.kernel.org/r/20260212-wujing-mm-page_alloc-v8-v8-1-daba38990cd3@gmail.com
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 75ef7c9f9307..8e37e4e6765b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,6 +882,7 @@ struct zone {
>  	/* zone watermarks, access with *_wmark_pages(zone) macros */
>  	unsigned long _watermark[NR_WMARK];
>  	unsigned long watermark_boost;
> +	unsigned long last_boost_jiffies;
>  
>  	unsigned long nr_reserved_highatomic;
>  	unsigned long nr_free_highatomic;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..8af88584a8bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
>  static void __free_pages_ok(struct page *page, unsigned int order,
>  			    fpi_t fpi_flags);
>  
> +/*
> + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
> + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
> + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
> + */
> +#define ATOMIC_BOOST_FACTOR 1
> +
>  /*
>   * results with 256, 32 in the lowmem_reserve sysctl:
>   *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
> @@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
>  static inline bool boost_watermark(struct zone *zone)
>  {
>  	unsigned long max_boost;
> +	unsigned long boost_amount;
> +
> +	lockdep_assert_held(&zone->lock);
>  
>  	if (!watermark_boost_factor)
>  		return false;

watermark_boost_factor is for fragmentation management. It's valid to
have this set to 0 and still want boosting for atomic.

> @@ -2189,12 +2199,43 @@ static inline bool boost_watermark(struct zone *zone)
>  
>  	max_boost = max(pageblock_nr_pages, max_boost);
>  
> -	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> -		max_boost);
> +	boost_amount = max(pageblock_nr_pages,
> +			   mult_frac(zone_managed_pages(zone), ATOMIC_BOOST_FACTOR, 1000));
> +	zone->watermark_boost = min(zone->watermark_boost + boost_amount,
> +				    max_boost);

Likewise, you don't want to add the atomic boost every time there is a
fragmentation event. You need to separate these paths.

The mult_frac() with constants seems a bit funny to me. Just do
zone_managed_pages(zone) / 1000, drop the define, and move the comment here.
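
An untested sketch of how a separated atomic path could look;
boost_watermark_atomic() and its ~1% cap are made up here purely for
illustration:

	static bool boost_watermark_atomic(struct zone *zone)
	{
		unsigned long boost, max_boost;

		lockdep_assert_held(&zone->lock);

		/*
		 * Boost by ~0.1% of the zone: a zone-proportional safety
		 * buffer of roughly 1MB per 1GB of managed memory.
		 */
		boost = max(pageblock_nr_pages,
			    zone_managed_pages(zone) / 1000);

		/* Arbitrary cap for this sketch: at most ~1% of the zone */
		max_boost = zone_managed_pages(zone) / 100;

		zone->watermark_boost = min(zone->watermark_boost + boost,
					    max_boost);

		return true;
	}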

> +static void boost_zone_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +	unsigned long now = jiffies;
> +
> +	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {

for_each_zone_zonelist_nodemask() with ac->nodemask?

> +		/* Rate-limit boosts to once per second per zone */
> +		if (time_after(now, zone->last_boost_jiffies + HZ)) {
> +			unsigned long flags;
> +			bool should_wake;
> +
> +			zone->last_boost_jiffies = now;
> +
> +			/* Modify watermark under lock, wake kswapd outside */
> +			spin_lock_irqsave(&zone->lock, flags);
> +			should_wake = boost_watermark(zone);
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +
> +			if (should_wake)
> +				wakeup_kswapd(zone, gfp_mask, 0,
> +					      ac->highest_zoneidx);
> +
> +			/* Boost only the preferred zone */
> +			break;
> +		}
> +	}

This is a bit strange to me. By the time you boost, all eligible zones
have been tried, and ALL their reserves were found to be inadequate
for the incoming atomic requests. They all *should* be boosted.

By doing them one by one, you risk additional failures even though you
already KNOW at this point that these other zones are problematic too.

So IMO, by the time you reach here, they should all be boosted.
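
Untested, but the loop shape could be something like the below, using the
nodemask iterator mentioned above and boosting every eligible zone whose
debounce window has expired instead of breaking out after the first:

	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
					ac->highest_zoneidx, ac->nodemask) {
		unsigned long flags;
		bool should_wake;

		/* Rate-limit boosts to once per second per zone */
		if (!time_after(now, zone->last_boost_jiffies + HZ))
			continue;

		zone->last_boost_jiffies = now;

		spin_lock_irqsave(&zone->lock, flags);
		should_wake = boost_watermark(zone);
		spin_unlock_irqrestore(&zone->lock, flags);

		if (should_wake)
			wakeup_kswapd(zone, gfp_mask, 0,
				      ac->highest_zoneidx);
	}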

> @@ -4742,6 +4783,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (page)
>  		goto got_pg;
>  
> +	/* Boost watermarks for atomic requests entering slowpath */
> +	if ((gfp_mask & GFP_ATOMIC) && order == 0 && !can_direct_reclaim)

This is a bit weird. GFP_ATOMIC is a mask, so this check will trigger
on anything that has __GFP_KSWAPD_RECLAIM set (which is most things),
so in turn you then have to filter out direct reclaim again (which the
real GFP_ATOMIC implies).

	if (gfp_has_flags(gfp_mask, GFP_ATOMIC))

> +		boost_zone_for_atomic(ac, gfp_mask);
> +



