* [PATCH 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment
@ 2026-01-04 12:23 wujing
2026-01-04 12:26 ` [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure wujing
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: wujing @ 2026-01-04 12:23 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, linux-mm, linux-kernel, Qiliang Yuan,
wujing
Atomic allocations (GFP_ATOMIC), particularly in network interrupt contexts,
are prone to failure during bursts of traffic if the pre-configured
min_free_kbytes (atomic reserve) is insufficient. These failures lead to
packet drops and performance degradation.
Static tuning of vm.min_free_kbytes is often challenging: setting it too
low risks drops, while setting it too high wastes valuable memory.
This patch series introduces a reactive mechanism that:
1. Detects critical order-0 GFP_ATOMIC allocation failures.
2. Automatically doubles vm.min_free_kbytes to reserve more memory for
future bursts.
3. Enforces a safety cap (1% of total RAM) to prevent OOM or excessive waste.
This allows the system to self-adjust to the workload's specific atomic
memory requirements without manual intervention.
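Modeled as plain userspace C (an illustrative sketch of the policy in steps 1-3 above, not the kernel code; the function name is hypothetical):

```c
/*
 * Illustrative model of the boost policy described above:
 * double the reserve on failure, capped at 1% of total RAM.
 * Returns the new value, or the current one if already capped.
 */
static long boost_reserve_kbytes(long cur_kbytes, long total_ram_kbytes)
{
	long cap = total_ram_kbytes / 100;	/* 1% safety cap */
	long next = cur_kbytes * 2;		/* exponential increase */

	if (next > cap)
		next = cap;
	return next > cur_kbytes ? next : cur_kbytes;
}
```

On a 64 GiB box (67108864 kB) starting from 90112 kB, repeated failures would step the reserve 90112 -> 180224 -> 360448 -> 671088 and then hold there.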
wujing (1):
mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
mm/page_alloc.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
--
2.39.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
2026-01-04 12:23 [PATCH 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
@ 2026-01-04 12:26 ` wujing
2026-01-04 18:14 ` Andrew Morton
2026-01-05 6:38 ` Lance Yang
2026-01-05 8:17 ` [PATCH v2 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
` (2 subsequent siblings)
3 siblings, 2 replies; 11+ messages in thread
From: wujing @ 2026-01-04 12:26 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, linux-mm, linux-kernel, Qiliang Yuan,
wujing
Introduce a mechanism to dynamically increase vm.min_free_kbytes when
critical atomic allocations (GFP_ATOMIC, order-0) fail. This prevents
recurring network packet drops or other atomic failures by proactively
reserving more memory.
The adjustment doubles min_free_kbytes upon failure (exponential
increase), capped at 1% of total RAM.
Observed failure logs:
[38535641.026406] node 0: slabs: 941, objs: 54656, free: 0
[38535641.037711] node 1: slabs: 349, objs: 22096, free: 272
[38535641.049025] node 1: slabs: 349, objs: 22096, free: 272
[38535642.795972] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535642.805017] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535642.816311] node 0: slabs: 854, objs: 42320, free: 0
[38535642.823066] node 1: slabs: 400, objs: 25360, free: 294
[38535643.070199] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535643.078861] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535643.089719] node 0: slabs: 841, objs: 41824, free: 0
[38535643.096513] node 1: slabs: 393, objs: 24480, free: 272
[38535643.484149] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535643.492831] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535643.503666] node 0: slabs: 898, objs: 43120, free: 159
[38535643.510140] node 1: slabs: 404, objs: 25424, free: 319
[38535644.699224] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535644.707911] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
Signed-off-by: wujing <realwujing@qq.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
mm/page_alloc.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..9a24e2b6cfbf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -30,6 +30,7 @@
#include <linux/oom.h>
#include <linux/topology.h>
#include <linux/sysctl.h>
+#include <linux/workqueue.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/pagevec.h>
@@ -3975,6 +3976,9 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
mem_cgroup_show_protected_memory(NULL);
}
+static void boost_min_free_kbytes_workfn(struct work_struct *work);
+static DECLARE_WORK(boost_min_free_kbytes_work, boost_min_free_kbytes_workfn);
+
void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
{
struct va_format vaf;
@@ -4947,6 +4951,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto retry;
}
fail:
+ /* Auto-tuning: trigger boost if atomic allocation fails */
+ if ((gfp_mask & GFP_ATOMIC) && order == 0)
+ schedule_work(&boost_min_free_kbytes_work);
+
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
@@ -7682,3 +7690,28 @@ struct page *alloc_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int or
return page;
}
EXPORT_SYMBOL_GPL(alloc_pages_nolock_noprof);
+
+static void boost_min_free_kbytes_workfn(struct work_struct *work)
+{
+ int new_min;
+
+ /* Cap at 1% of total RAM for safety */
+ unsigned long total_kbytes = totalram_pages() << (PAGE_SHIFT - 10);
+ int max_limit = total_kbytes / 100;
+
+ /* Exponential increase: double the current value */
+ new_min = min_free_kbytes * 2;
+
+ if (new_min > max_limit)
+ new_min = max_limit;
+
+ if (new_min > min_free_kbytes) {
+ min_free_kbytes = new_min;
+ /* Update user_min_free_kbytes so it persists through recalculations */
+ if (new_min > user_min_free_kbytes)
+ user_min_free_kbytes = new_min;
+
+ setup_per_zone_wmarks();
+ pr_info("Auto-tuning: unexpected atomic failure detected, increasing min_free_kbytes to %d\n", min_free_kbytes);
+ }
+}
--
2.39.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
2026-01-04 12:26 ` [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure wujing
@ 2026-01-04 18:14 ` Andrew Morton
2026-01-05 2:32 ` Matthew Wilcox
2026-01-05 6:38 ` Lance Yang
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2026-01-04 18:14 UTC (permalink / raw)
To: wujing
Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
Qiliang Yuan
On Sun, 4 Jan 2026 20:26:52 +0800 wujing <realwujing@qq.com> wrote:
> Introduce a mechanism to dynamically increase vm.min_free_kbytes when
> critical atomic allocations (GFP_ATOMIC, order-0) fail. This prevents
> recurring network packet drops or other atomic failures by proactively
> reserving more memory.
Seems like a good idea, however it's very likely that the networking
people have looked into this rather a lot. Can I suggest that you
engage with them? netdev@vger.kernel.org.
> The adjustment doubles min_free_kbytes upon failure (exponential
> increase), capped at 1% of total RAM.
But no attempt to reduce it again after the load spike has gone away.
> Observed failure logs:
> [38535641.026406] node 0: slabs: 941, objs: 54656, free: 0
> [38535641.037711] node 1: slabs: 349, objs: 22096, free: 272
> [38535641.049025] node 1: slabs: 349, objs: 22096, free: 272
>
> ...
>
> +static void boost_min_free_kbytes_workfn(struct work_struct *work);
> +static DECLARE_WORK(boost_min_free_kbytes_work, boost_min_free_kbytes_workfn);
> +
> void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
> {
> struct va_format vaf;
> @@ -4947,6 +4951,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> goto retry;
> }
> fail:
> + /* Auto-tuning: trigger boost if atomic allocation fails */
> + if ((gfp_mask & GFP_ATOMIC) && order == 0)
> + schedule_work(&boost_min_free_kbytes_work);
> +
Probably this should be selectable and tunable via a kernel boot
parameter or a procfs tunable. But I suggest you not do that work
until having discussed the approach with the networking developers.
> warn_alloc(gfp_mask, ac->nodemask,
> "page allocation failure: order:%u", order);
> got_pg:
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
2026-01-04 18:14 ` Andrew Morton
@ 2026-01-05 2:32 ` Matthew Wilcox
0 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2026-01-05 2:32 UTC (permalink / raw)
To: Andrew Morton
Cc: wujing, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
Qiliang Yuan
On Sun, Jan 04, 2026 at 10:14:43AM -0800, Andrew Morton wrote:
> On Sun, 4 Jan 2026 20:26:52 +0800 wujing <realwujing@qq.com> wrote:
>
> > Introduce a mechanism to dynamically increase vm.min_free_kbytes when
> > critical atomic allocations (GFP_ATOMIC, order-0) fail. This prevents
> > recurring network packet drops or other atomic failures by proactively
> > reserving more memory.
>
> Seems like a good idea, however it's very likely that the networking
> people have looked into this rather a lot. Can I suggest that you
> engage with them? netdev@vger.kernel.org.
Agreed, the networking people should definitely be brought into this.
I'm broadly in favour of something like this patch. We should do more
auto-tuning and be less reliant on sysadmin intervention. I have two
questions:
1. Is doubling too aggressive? Would an increase of, say, 10% or 20%
be more appropriate?
2. Do we have to wait for failure before increasing? Could we schedule
the increase for when we get to within, say, 10% of the current limit?
> > The adjustment doubles min_free_kbytes upon failure (exponential
> > increase), capped at 1% of total RAM.
>
> But no attempt to reduce it again after the load spike has gone away.
Hm, how would we do that? Automatically decay by 5%, 300 seconds after
increasing; then schedule another decay for 300 seconds after that until
we get down to something appropriately smaller?
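A back-of-the-envelope userspace model of that decay chain (illustrative only; a kernel version would presumably use a delayed work item):

```c
/*
 * Illustrative: decay the boosted value by 5% per period until it
 * reaches the target; returns how many periods that takes. With
 * 300-second periods, halving a boost takes roughly 70 minutes.
 */
static int decay_periods(long boosted, long target)
{
	int periods = 0;

	while (boosted > target) {
		boosted -= boosted / 20;	/* -5% per period */
		periods++;
	}
	return periods;
}
```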
> > + /* Auto-tuning: trigger boost if atomic allocation fails */
> > + if ((gfp_mask & GFP_ATOMIC) && order == 0)
> > + schedule_work(&boost_min_free_kbytes_work);
> > +
>
> Probably this should be selectable and tunable via a kernel boot
> parameter or a procfs tunable. But I suggest you not do that work
> until having discussed the approach with the networking developers.
Ugh, please, no new tunables. Let's just implement an algorithm that
works.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
2026-01-04 12:26 ` [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure wujing
2026-01-04 18:14 ` Andrew Morton
@ 2026-01-05 6:38 ` Lance Yang
2026-01-05 7:29 ` wujing
1 sibling, 1 reply; 11+ messages in thread
From: Lance Yang @ 2026-01-05 6:38 UTC (permalink / raw)
To: realwujing
Cc: david, akpm, hannes, jackmanb, linux-kernel, linux-mm, mhocko,
surenb, vbabka, yuanql9, ziy, Lance Yang
On Sun, 4 Jan 2026 20:26:52 +0800, wujing wrote:
> Introduce a mechanism to dynamically increase vm.min_free_kbytes when
> critical atomic allocations (GFP_ATOMIC, order-0) fail. This prevents
> recurring network packet drops or other atomic failures by proactively
> reserving more memory.
Just wondering, could we adjust watermark_scale_factor instead of
min_free_kbytes?
Increasing min_free_kbytes directly reduces available memory, while
watermark_scale_factor just makes kswapd wake up earlier for reclaim ...
Seems like it would have fewer side effects than min_free_kbytes :)
Cheers,
Lance
>
> The adjustment doubles min_free_kbytes upon failure (exponential
> increase), capped at 1% of total RAM.
>
> Observed failure logs:
> [38535641.026406] node 0: slabs: 941, objs: 54656, free: 0
> [38535641.037711] node 1: slabs: 349, objs: 22096, free: 272
> [38535641.049025] node 1: slabs: 349, objs: 22096, free: 272
> [38535642.795972] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535642.805017] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535642.816311] node 0: slabs: 854, objs: 42320, free: 0
> [38535642.823066] node 1: slabs: 400, objs: 25360, free: 294
> [38535643.070199] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535643.078861] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535643.089719] node 0: slabs: 841, objs: 41824, free: 0
> [38535643.096513] node 1: slabs: 393, objs: 24480, free: 272
> [38535643.484149] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535643.492831] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535643.503666] node 0: slabs: 898, objs: 43120, free: 159
> [38535643.510140] node 1: slabs: 404, objs: 25424, free: 319
> [38535644.699224] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535644.707911] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
>
> Signed-off-by: wujing <realwujing@qq.com>
> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
> ---
> mm/page_alloc.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..9a24e2b6cfbf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -30,6 +30,7 @@
> #include <linux/oom.h>
> #include <linux/topology.h>
> #include <linux/sysctl.h>
> +#include <linux/workqueue.h>
> #include <linux/cpu.h>
> #include <linux/cpuset.h>
> #include <linux/pagevec.h>
> @@ -3975,6 +3976,9 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
> mem_cgroup_show_protected_memory(NULL);
> }
>
> +static void boost_min_free_kbytes_workfn(struct work_struct *work);
> +static DECLARE_WORK(boost_min_free_kbytes_work, boost_min_free_kbytes_workfn);
> +
> void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
> {
> struct va_format vaf;
> @@ -4947,6 +4951,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> goto retry;
> }
> fail:
> + /* Auto-tuning: trigger boost if atomic allocation fails */
> + if ((gfp_mask & GFP_ATOMIC) && order == 0)
> + schedule_work(&boost_min_free_kbytes_work);
> +
> warn_alloc(gfp_mask, ac->nodemask,
> "page allocation failure: order:%u", order);
> got_pg:
> @@ -7682,3 +7690,28 @@ struct page *alloc_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned int or
> return page;
> }
> EXPORT_SYMBOL_GPL(alloc_pages_nolock_noprof);
> +
> +static void boost_min_free_kbytes_workfn(struct work_struct *work)
> +{
> + int new_min;
> +
> + /* Cap at 1% of total RAM for safety */
> + unsigned long total_kbytes = totalram_pages() << (PAGE_SHIFT - 10);
> + int max_limit = total_kbytes / 100;
> +
> + /* Exponential increase: double the current value */
> + new_min = min_free_kbytes * 2;
> +
> + if (new_min > max_limit)
> + new_min = max_limit;
> +
> + if (new_min > min_free_kbytes) {
> + min_free_kbytes = new_min;
> + /* Update user_min_free_kbytes so it persists through recalculations */
> + if (new_min > user_min_free_kbytes)
> + user_min_free_kbytes = new_min;
> +
> + setup_per_zone_wmarks();
> + pr_info("Auto-tuning: unexpected atomic failure detected, increasing min_free_kbytes to %d\n", min_free_kbytes);
> + }
> +}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
2026-01-05 6:38 ` Lance Yang
@ 2026-01-05 7:29 ` wujing
2026-01-05 16:47 ` Michal Hocko
0 siblings, 1 reply; 11+ messages in thread
From: wujing @ 2026-01-05 7:29 UTC (permalink / raw)
To: Lance Yang
Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
Qiliang Yuan, wujing
Hi Lance,
Thanks for the suggestion about using watermark_scale_factor instead of
min_free_kbytes. I appreciate the feedback, and I'd like to explain why I
believe min_free_kbytes is the correct knob to tune for this specific problem.
## The Core Issue
The failures we're observing are GFP_ATOMIC, order-0 allocations in interrupt
context (network packet reception). From the logs:
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC)
These allocations:
1. Cannot sleep or wait for memory reclaim
2. Can only use memory below the MIN watermark (the emergency reserve)
3. Fail when even this emergency reserve is exhausted
## Why watermark_scale_factor Won't Help
watermark_scale_factor controls the distance between MIN and LOW watermarks.
It makes kswapd wake up earlier (at LOW instead of closer to MIN), which is
great for preventing memory pressure.
However, for GFP_ATOMIC allocations:
- They don't wait for kswapd
- They only care about the MIN watermark itself
- A larger LOW-MIN gap doesn't increase the atomic reserve
Even if kswapd wakes up 10 seconds earlier due to a higher
watermark_scale_factor, network interrupt bursts happen in milliseconds,
leaving no time for reclaim.
## Why min_free_kbytes Is Necessary
min_free_kbytes directly controls the MIN watermark — the actual memory
reserved for atomic allocations. Increasing it immediately makes more memory
available for GFP_ATOMIC, which is what we need.
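A simplified userspace model of the distinction drawn above (loosely patterned on the kernel's __zone_watermark_ok(); the exact reserve fraction granted to atomic requests is an assumption here):

```c
#include <stdbool.h>

/*
 * Illustrative: a normal allocation must leave free_pages above the
 * MIN watermark, while an atomic (__GFP_HIGH-style) request may dip
 * into roughly half of that reserve, but no further. When even the
 * halved reserve is exhausted, GFP_ATOMIC order-0 requests fail.
 */
static bool order0_alloc_ok(long free_pages, long min_wmark, bool atomic)
{
	long threshold = atomic ? min_wmark / 2 : min_wmark;

	return free_pages - 1 > threshold;	/* taking one page */
}
```

This is why raising the MIN watermark (via min_free_kbytes) changes the outcome for atomic requests, while widening the MIN-LOW gap does not.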
## Alternative: Hybrid Approach?
That said, your point about side effects is valid. One option could be:
1. Increase min_free_kbytes for immediate relief during failures
2. Also increase watermark_scale_factor slightly to make kswapd more aggressive
3. This could reduce the frequency of hitting MIN in the first place
Would this hybrid approach address your concerns?
Thanks again for the thoughtful review!
Best regards,
Wujing
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment
2026-01-04 12:23 [PATCH 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
2026-01-04 12:26 ` [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure wujing
@ 2026-01-05 8:17 ` wujing
2026-01-05 11:59 ` [PATCH v3 0/1] mm/page_alloc: dynamic watermark boosting wujing
[not found] ` <20260105115943.1361645-1-realwujing@qq.com>
3 siblings, 0 replies; 11+ messages in thread
From: wujing @ 2026-01-05 8:17 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Matthew Wilcox, Lance Yang, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
netdev, Qiliang Yuan, wujing
This is v2 of the auto-tuning patch, addressing feedback from Andrew Morton
and Matthew Wilcox.
## Responses to Andrew Morton's feedback:
> "But no attempt to reduce it again after the load spike has gone away."
v2 implements a decay mechanism: min_free_kbytes automatically reduces by 5%
every 5 minutes after being increased. However, it stops at 1.2x the initial
value rather than returning to baseline, ensuring the system "remembers"
previous pressure patterns.
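One way to model that floor in userspace C (illustrative; the function name and integer arithmetic are assumptions, not the v2 patch itself):

```c
/*
 * Illustrative single decay step for the v2 scheme: shrink by 5%,
 * but never below 1.2x the initial (pre-boost) value, so the
 * system retains some memory of past pressure.
 */
static long decay_once(long cur, long initial)
{
	long floor_kb = initial + initial / 5;	/* 1.2x initial */
	long next = cur - cur / 20;		/* -5% */

	return next > floor_kb ? next : floor_kb;
}
```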
> "Probably this should be selectable and tunable via a kernel boot parameter
> or a procfs tunable."
Per Matthew Wilcox's preference to avoid new tunables, v2 implements an
algorithm designed to work automatically without configuration. The parameters
(50% increase, 5% decay, 10s debounce) are chosen to be responsive yet stable.
> "Can I suggest that you engage with [the networking people]? netdev@"
Done - netdev@ is now CC'd on this v2 submission.
## Responses to Matthew Wilcox's feedback:
> "Is doubling too aggressive? Would an increase of, say, 10% or 20% be more
> appropriate?"
v2 uses a 50% increase (compromise between responsiveness and conservatism).
20% felt too slow for burst traffic scenarios based on our observations.
> "Do we have to wait for failure before increasing? Could we schedule the
> increase for when we get to within, say, 10% of the current limit?"
We considered proactive monitoring but concluded it would add overhead and
complexity. The debounce mechanism (10s) ensures we don't thrash while still
being reactive.
> "Hm, how would we do that? Automatically decay by 5%, 300 seconds after
> increasing; then schedule another decay for 300 seconds after that..."
Exactly as you suggested! v2 implements this decay chain. The only addition
is stopping at 1.2x baseline to preserve learning.
> "Ugh, please, no new tunables. Let's just implement an algorithm that works."
Agreed - v2 has zero new tunables.
## Changes in v2:
- Reduced aggressiveness: +50% increase instead of doubling
- Added debounce: Only trigger once per 10 seconds to prevent storms
- Added decay: Automatically reduce by 5% every 5 minutes
- Preserve learning: Decay stops at 1.2x initial value, not baseline
- Engaged networking community (netdev@)
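The debounce listed above can be sketched in userspace C (illustrative; the kernel version would compare jiffies with time_after()):

```c
#include <stdbool.h>

/*
 * Illustrative debounce state: allow at most one trigger per window.
 * 'now_ms' and 'window_ms' share the same unit (milliseconds here);
 * unsigned subtraction handles counter wraparound like jiffies does.
 */
struct debounce {
	unsigned long last_ms;
	unsigned long window_ms;
};

static bool debounce_allow(struct debounce *d, unsigned long now_ms)
{
	if (now_ms - d->last_ms < d->window_ms)
		return false;		/* still inside the window */
	d->last_ms = now_ms;
	return true;
}
```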
Thanks for the thoughtful reviews!
wujing (1):
mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
mm/page_alloc.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
--
2.39.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 0/1] mm/page_alloc: dynamic watermark boosting
2026-01-04 12:23 [PATCH 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
2026-01-04 12:26 ` [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure wujing
2026-01-05 8:17 ` [PATCH v2 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
@ 2026-01-05 11:59 ` wujing
[not found] ` <20260105115943.1361645-1-realwujing@qq.com>
3 siblings, 0 replies; 11+ messages in thread
From: wujing @ 2026-01-05 11:59 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Matthew Wilcox, Lance Yang, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
netdev, Qiliang Yuan, wujing
This is v3 of the auto-tuning patch, addressing feedback from Vlastimil Babka,
Andrew Morton, and Matthew Wilcox.
Major shift in v3:
Following Vlastimil's suggestion, this version abandons the direct modification
of min_free_kbytes. Instead, it leverages the existing watermark_boost
infrastructure. This approach is more idiomatic as it:
- Avoids conflicts with administrative sysctl settings.
- Only affects specific zones experiencing pressure.
- Utilizes standard kswapd logic for natural decay after reclamation.
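For illustration, the incremental-boost idea can be modeled in userspace C (loosely after the kernel's boost_watermark(); the step and cap arithmetic here are simplified assumptions):

```c
/*
 * Illustrative: each trigger raises the zone's boost by a fixed
 * step, capped at a fraction of the high watermark (the kernel
 * caps via watermark_boost_factor; a factor of 0 disables it).
 */
static long boost_step(long cur_boost, long high_wmark,
		       int boost_factor, long step)
{
	long max_boost = high_wmark * boost_factor / 10000;

	if (max_boost == 0)
		return cur_boost;	/* boosting disabled */
	cur_boost += step;
	return cur_boost > max_boost ? max_boost : cur_boost;
}
```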
Responses to Vlastimil Babka's feedback:
> "Were they really packet drops observed? AFAIK the receive is deferred to non-irq
> context if those atomic allocations fail, it shouldn't mean a drop."
In our high-concurrency production environment, we observed that while the
network stack tries to defer processing, persistent GFP_ATOMIC failures
eventually lead to NIC-level drops due to RX buffer exhaustion.
> "As for the implementation I'd rather not be changing min_free_kbytes directly...
> We already have watermark_boost to dynamically change watermarks"
Agreed and implemented in v3.
Changes in v3:
- Replaced min_free_kbytes modification with watermark_boost calls.
- Removed all complex decay/persistence logic from v2, relying on kswapd's
standard behavior.
- Maintained the 10-second debounce mechanism.
- Engaged netdev@ community as requested by Andrew Morton.
Thanks for the thoughtful reviews!
wujing (1):
mm/page_alloc: auto-tune watermarks on atomic allocation failure
mm/page_alloc.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
--
2.39.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 1/1] mm/page_alloc: auto-tune watermarks on atomic allocation failure
[not found] ` <20260105115943.1361645-1-realwujing@qq.com>
@ 2026-01-05 11:59 ` wujing
0 siblings, 0 replies; 11+ messages in thread
From: wujing @ 2026-01-05 11:59 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Matthew Wilcox, Lance Yang, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
netdev, Qiliang Yuan, wujing
Introduce a mechanism to dynamically boost watermarks when critical
atomic allocations (GFP_ATOMIC, order-0) fail. This prevents recurring
network packet drops or other atomic failures by proactively triggering
kswapd to reclaim memory for future atomic requests.
The mechanism utilizes the existing watermark_boost infrastructure. When
an order-0 atomic allocation fails, watermarks are boosted in the
relevant zones, which encourages kswapd to reclaim pages more
aggressively. Boosting is debounced to once every 10 seconds to prevent
adjustment storms during burst traffic.
Testing has shown that this allows the system to recover quickly from
sudden spikes in network traffic that otherwise would cause persistent
allocation failures.
Observed failure logs:
[38535641.026406] node 0: slabs: 941, objs: 54656, free: 0
[38535641.037711] node 1: slabs: 349, objs: 22096, free: 272
[38535641.049025] node 1: slabs: 349, objs: 22096, free: 272
[38535642.795972] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535642.805017] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535642.816311] node 0: slabs: 854, objs: 42320, free: 0
[38535642.823066] node 1: slabs: 400, objs: 25360, free: 294
[38535643.070199] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535643.078861] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535643.089719] node 0: slabs: 841, objs: 41824, free: 0
[38535643.096513] node 1: slabs: 393, objs: 24480, free: 272
[38535643.484149] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535643.492831] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535643.503666] node 0: slabs: 898, objs: 43120, free: 159
[38535643.510140] node 1: slabs: 404, objs: 25424, free: 319
[38535644.699224] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535644.707911] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
[38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
[38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
[38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
[38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
[38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
[38535649.655523] warn_alloc: 59 callbacks suppressed
[38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
[38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
Signed-off-by: wujing <realwujing@qq.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
mm/page_alloc.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c380f063e8b7..a2959fee28d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3975,6 +3975,10 @@ static void warn_alloc_show_mem(gfp_t gfp_mask, nodemask_t *nodemask)
mem_cgroup_show_protected_memory(NULL);
}
+/* Auto-tuning watermarks on atomic allocation failures */
+static unsigned long last_boost_jiffies = 0;
+#define BOOST_DEBOUNCE_MS 10000 /* 10 seconds debounce */
+
void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
{
struct va_format vaf;
@@ -4947,6 +4951,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto retry;
}
fail:
+ /* Auto-tuning: boost watermarks if atomic allocation fails */
+ if ((gfp_mask & GFP_ATOMIC) && order == 0) {
+ unsigned long now = jiffies;
+
+ if (time_after(now, last_boost_jiffies + msecs_to_jiffies(BOOST_DEBOUNCE_MS))) {
+ struct zoneref *z;
+ struct zone *zone;
+
+ last_boost_jiffies = now;
+ for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
+ if (boost_watermark(zone))
+ wakeup_kswapd(zone, gfp_mask, 0, ac->highest_zoneidx);
+ }
+ }
+ }
+
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
--
2.39.5
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure
2026-01-05 7:29 ` wujing
@ 2026-01-05 16:47 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2026-01-05 16:47 UTC (permalink / raw)
To: wujing
Cc: Lance Yang, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
Qiliang Yuan
On Mon 05-01-26 15:29:04, wujing wrote:
> Hi Lance,
>
> Thanks for the suggestion about using watermark_scale_factor instead of
> min_free_kbytes. I appreciate the feedback, and I'd like to explain why I
> believe min_free_kbytes is the correct knob to tune for this specific problem.
>
> ## The Core Issue
>
> The failures we're observing are GFP_ATOMIC, order-0 allocations in interrupt
> context (network packet reception). From the logs:
>
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC)
>
> These allocations:
> 1. Cannot sleep or wait for memory reclaim
> 2. Can only use memory below the MIN watermark (the emergency reserve)
> 3. Fail when even this emergency reserve is exhausted
>
> ## Why watermark_scale_factor Won't Help
>
> watermark_scale_factor controls the distance between MIN and LOW watermarks.
> It makes kswapd wake up earlier (at LOW instead of closer to MIN), which is
> great for preventing memory pressure.
>
> However, for GFP_ATOMIC allocations:
> - They don't wait for kswapd
> - They only care about the MIN watermark itself
> - A larger LOW-MIN gap doesn't increase the atomic reserve
>
> Even if kswapd wakes up 10 seconds earlier due to a higher
> watermark_scale_factor, network interrupt bursts happen in milliseconds,
> leaving no time for reclaim.
The thing is that your approach is not immediate anyway. You are
scheduling deferred work after the allocation has already failed, unless
I am missing something, which is too late as well.
Lance has a good point that updating the scale factor earlier might help
to smooth out the reclaim for the increased demand.
> ## Why min_free_kbytes Is Necessary
>
> min_free_kbytes directly controls the MIN watermark — the actual memory
> reserved for atomic allocations. Increasing it immediately makes more memory
> available for GFP_ATOMIC, which is what we need.
Well, if the memory is hard for kswapd to reclaim, and an earlier wakeup
doesn't help it reclaim any memory, then is it realistic to expect that
direct reclaimers can do better after the min_free_kbytes (and
watermarks) update?
> ## Alternative: Hybrid Approach?
>
> That said, your point about side effects is valid. One option could be:
> 1. Increase min_free_kbytes for immediate relief during failures
> 2. Also increase watermark_scale_factor slightly to make kswapd more aggressive
> 3. This could reduce the frequency of hitting MIN in the first place
>
> Would this hybrid approach address your concerns?
I would much rather start with a simpler approach and only make this
more complicated when it turns out to be insufficient. Scaling
watermark_scale_factor seems like a more natural approach to me. Could
you give it a try, or have you already tried it and found it to be a
worse solution?
--
Michal Hocko
SUSE Labs
Thread overview: 11+ messages
2026-01-04 12:23 [PATCH 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
2026-01-04 12:26 ` [PATCH 1/1] mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure wujing
2026-01-04 18:14 ` Andrew Morton
2026-01-05 2:32 ` Matthew Wilcox
2026-01-05 6:38 ` Lance Yang
2026-01-05 7:29 ` wujing
2026-01-05 16:47 ` Michal Hocko
2026-01-05 8:17 ` [PATCH v2 0/1] mm/page_alloc: dynamic min_free_kbytes adjustment wujing
2026-01-05 11:59 ` [PATCH v3 0/1] mm/page_alloc: dynamic watermark boosting wujing
[not found] ` <20260105115943.1361645-1-realwujing@qq.com>
2026-01-05 11:59 ` [PATCH v3 1/1] mm/page_alloc: auto-tune watermarks on atomic allocation failure wujing