linux-mm.kvack.org archive mirror
* [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
@ 2025-10-14  8:18 Jiayuan Chen
  2025-10-14  9:33 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Jiayuan Chen @ 2025-10-14  8:18 UTC (permalink / raw)
  To: linux-mm
  Cc: Jiayuan Chen, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng,
	Shakeel Butt, Lorenzo Stoakes, linux-kernel

We can set memory.low for cgroups as a soft protection limit. When the
kernel cannot reclaim any pages from other cgroups, it retries reclaim
while ignoring the memory.low protection of the skipped cgroups.

Currently, this retry logic only works in the direct reclaim path, but is
missing from kswapd's asynchronous reclaim. Typically, a cgroup may
contain some cold pages that could be reclaimed even when memory.low is
set.

This change adds retry logic to kswapd: if the first reclaim attempt fails
to reclaim any pages and some cgroups were skipped due to memory.low
protection, kswapd will perform a second reclaim pass ignoring memory.low
restrictions.

This ensures more consistent reclaim behavior between direct reclaim and
kswapd. By allowing kswapd to reclaim more proactively from protected
cgroups under global memory pressure, this optimization can help reduce
the occurrence of direct reclaim, which is more disruptive to application
performance.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 mm/vmscan.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c80fcae7f2a1..231c66fcdfd8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7147,6 +7147,13 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 		goto restart;
 	}
 
+	/* Restart if we skipped the memory low event */
+	if (sc.memcg_low_skipped && !sc.memcg_low_reclaim &&
+	    sc.priority < 1) {
+		sc.memcg_low_reclaim = 1;
+		goto restart;
+	}
+
 	if (!sc.nr_reclaimed)
 		atomic_inc(&pgdat->kswapd_failures);
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-14  8:18 [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd Jiayuan Chen
@ 2025-10-14  9:33 ` Michal Hocko
  2025-10-14 12:56   ` Jiayuan Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2025-10-14  9:33 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

On Tue 14-10-25 16:18:49, Jiayuan Chen wrote:
> We can set memory.low for cgroups as a soft protection limit. When the
> kernel cannot reclaim any pages from other cgroups, it retries reclaim
> while ignoring the memory.low protection of the skipped cgroups.
> 
> Currently, this retry logic only works in direct reclaim path, but is
> missing in the kswapd asynchronous reclaim. Typically, a cgroup may
> contain some cold pages that could be reclaimed even when memory.low is
> set.
> 
> This change adds retry logic to kswapd: if the first reclaim attempt fails
> to reclaim any pages and some cgroups were skipped due to memory.low
> protection, kswapd will perform a second reclaim pass ignoring memory.low
> restrictions.
> 
> This ensures more consistent reclaim behavior between direct reclaim and
> kswapd. By allowing kswapd to reclaim more proactively from protected
> cgroups under global memory pressure, this optimization can help reduce
> the occurrence of direct reclaim, which is more disruptive to application
> performance.

Could you describe the problem you are trying to address in more detail
please? Because your patch is significantly changing the behavior of the
low limit. I would even go as far as to say it breaks its expectations
because low limit should provide a certain level of protection and
your patch would allow kswapd to reclaim from those cgroups much sooner
now. If this is really needed then we need much more detailed
justification and also evaluation how that influences existing users.

> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
>  mm/vmscan.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c80fcae7f2a1..231c66fcdfd8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -7147,6 +7147,13 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
>  		goto restart;
>  	}
>  
> +	/* Restart if we skipped the memory low event */
> +	if (sc.memcg_low_skipped && !sc.memcg_low_reclaim &&
> +	    sc.priority < 1) {
> +		sc.memcg_low_reclaim = 1;
> +		goto restart;
> +	}
> +
>  	if (!sc.nr_reclaimed)
>  		atomic_inc(&pgdat->kswapd_failures);
>  
> -- 
> 2.43.0

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-14  9:33 ` Michal Hocko
@ 2025-10-14 12:56   ` Jiayuan Chen
  2025-10-16 14:49     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Jiayuan Chen @ 2025-10-14 12:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

October 14, 2025 at 17:33, "Michal Hocko" <mhocko@suse.com> wrote:


> 
> On Tue 14-10-25 16:18:49, Jiayuan Chen wrote:
> 
> > 
> > We can set memory.low for cgroups as a soft protection limit. When the
> >  kernel cannot reclaim any pages from other cgroups, it retries reclaim
> >  while ignoring the memory.low protection of the skipped cgroups.
> >  
> >  Currently, this retry logic only works in direct reclaim path, but is
> >  missing in the kswapd asynchronous reclaim. Typically, a cgroup may
> >  contain some cold pages that could be reclaimed even when memory.low is
> >  set.
> >  
> >  This change adds retry logic to kswapd: if the first reclaim attempt fails
> >  to reclaim any pages and some cgroups were skipped due to memory.low
> >  protection, kswapd will perform a second reclaim pass ignoring memory.low
> >  restrictions.
> >  
> >  This ensures more consistent reclaim behavior between direct reclaim and
> >  kswapd. By allowing kswapd to reclaim more proactively from protected
> >  cgroups under global memory pressure, this optimization can help reduce
> >  the occurrence of direct reclaim, which is more disruptive to application
> >  performance.
> > 
> Could you describe the problem you are trying to address in more detail
> please? Because your patch is significantly changing the behavior of the
> low limit. I would even go as far as to say it breaks its expectations
> because low limit should provide a certain level of protection and
> your patch would allow kswapd to reclaim from those cgroups much sooner
> now. If this is really needed then we need much more detailed
> justification and also evaluation how that influences existing users.
> 


Thanks Michal, let me explain the issue I encountered:

1. When kswapd is triggered and there is no reclaimable memory (sc.nr_reclaimed == 0),
the kswapd_failures counter keeps accumulating until it reaches
MAX_RECLAIM_RETRIES. This makes the kswapd thread stop running until direct
memory reclaim is triggered.

2. We observed that kswapd is often woken by watermark_boost rather than by the
actual memory watermarks being insufficient. For boost-triggered reclaim, the
priority can only be raised to DEF_PRIORITY - 2 (a higher priority value means
less aggressive scanning), which makes reclaim much harder than at priority 1.

3. When kswapd finds no reclaimable memory, I think we could try to reclaim some
memory from pods protected by memory.low, similar to how direct reclaim already
has logic to reclaim such protected memory.



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-14 12:56   ` Jiayuan Chen
@ 2025-10-16 14:49     ` Michal Hocko
  2025-10-16 15:10       ` Jiayuan Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2025-10-16 14:49 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

On Tue 14-10-25 12:56:06, Jiayuan Chen wrote:
> October 14, 2025 at 17:33, "Michal Hocko" <mhocko@suse.com> wrote:
> 
> 
> > 
> > On Tue 14-10-25 16:18:49, Jiayuan Chen wrote:
> > 
> > > 
> > > We can set memory.low for cgroups as a soft protection limit. When the
> > >  kernel cannot reclaim any pages from other cgroups, it retries reclaim
> > >  while ignoring the memory.low protection of the skipped cgroups.
> > >  
> > >  Currently, this retry logic only works in direct reclaim path, but is
> > >  missing in the kswapd asynchronous reclaim. Typically, a cgroup may
> > >  contain some cold pages that could be reclaimed even when memory.low is
> > >  set.
> > >  
> > >  This change adds retry logic to kswapd: if the first reclaim attempt fails
> > >  to reclaim any pages and some cgroups were skipped due to memory.low
> > >  protection, kswapd will perform a second reclaim pass ignoring memory.low
> > >  restrictions.
> > >  
> > >  This ensures more consistent reclaim behavior between direct reclaim and
> > >  kswapd. By allowing kswapd to reclaim more proactively from protected
> > >  cgroups under global memory pressure, this optimization can help reduce
> > >  the occurrence of direct reclaim, which is more disruptive to application
> > >  performance.
> > > 
> > Could you describe the problem you are trying to address in more detail
> > please? Because your patch is significantly changing the behavior of the
> > low limit. I would even go as far as to say it breaks its expectations
> > because low limit should provide a certain level of protection and
> > your patch would allow kswapd to reclaim from those cgroups much sooner
> > now. If this is really needed then we need much more detailed
> > justification and also evaluation how that influences existing users.
> > 
> 
> 
> Thanks Michal, let me explain the issue I encountered:
> 
> 1. When kswapd is triggered and there's no reclaimable memory (sc.nr_reclaimed == 0),
> this causes kswapd_failures counter to continuously accumulate until it reaches
> MAX_RECLAIM_RETRIES. This makes the kswapd thread stop running until a direct memory
> reclaim is triggered.

While the definition of low limit is rather vague:
        Best-effort memory protection.  If the memory usage of a
        cgroup is within its effective low boundary, the cgroup's
        memory won't be reclaimed unless there is no reclaimable
        memory available in unprotected cgroups.
        Above the effective low boundary (or
        effective min boundary if it is higher), pages are reclaimed
        proportionally to the overage, reducing reclaim pressure for
        smaller overages.
which doesn't explicitly rule out reclaim from the kswapd context but
historically we relied on the direct reclaim to detect the "no
reclaimable memory" situation as it is much easier to achieve in that
context. Also you do not really explain why backing off kswapd when all
the reclaimable memory is low limit protected is bad.

> 2. We observed a phenomenon where kswapd is triggered by watermark_boost rather
> than by actual memory watermarks being insufficient. For boost-triggered
> reclamation, the maximum priority can only be DEF_PRIORITY - 2, making memory
> reclamation more difficult compared to when priority is 1.

Do I get it right that you would like to break low limits on
watermark_boost reclaim? I am not sure I follow your priority argument.

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-16 14:49     ` Michal Hocko
@ 2025-10-16 15:10       ` Jiayuan Chen
  2025-10-16 18:43         ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Jiayuan Chen @ 2025-10-16 15:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

October 16, 2025 at 22:49, "Michal Hocko" <mhocko@suse.com> wrote:


> >  
> >  Thanks Michal, let me explain the issue I encountered:
> >  
> >  1. When kswapd is triggered and there's no reclaimable memory (sc.nr_reclaimed == 0),
> >  this causes kswapd_failures counter to continuously accumulate until it reaches
> >  MAX_RECLAIM_RETRIES. This makes the kswapd thread stop running until a direct memory
> >  reclaim is triggered.
> > 
> While the definition of low limit is rather vague:
>  Best-effort memory protection. If the memory usage of a
>  cgroup is within its effective low boundary, the cgroup's
>  memory won't be reclaimed unless there is no reclaimable
>  memory available in unprotected cgroups.
>  Above the effective low boundary (or
>  effective min boundary if it is higher), pages are reclaimed
>  proportionally to the overage, reducing reclaim pressure for
>  smaller overages.
> which doesn't explicitly rule out reclaim from the kswapd context but
> historically we relied on the direct reclaim to detect the "no
> reclaimable memory" situation as it is much easier to achieve in that
> context. Also you do not really explain why backing off kswapd when all
> the reclaimable memory is low limit protected is bad.

Thanks for providing this context.


> > 
> > 2. We observed a phenomenon where kswapd is triggered by watermark_boost rather
> >  than by actual memory watermarks being insufficient. For boost-triggered
> >  reclamation, the maximum priority can only be DEF_PRIORITY - 2, making memory
> >  reclamation more difficult compared to when priority is 1.
> > 
> Do I get it right that you would like to break low limits on
> watermark_boost reclaim? I am not sure I follow your priority argument.
> 
> -- 
> Michal Hocko
> SUSE Labs
>

The issue we encountered is that since the watermark_boost parameter is enabled by
default, it causes kswapd to be woken up even when memory watermarks are still relatively
high. Due to rapid consecutive wake-ups, kswapd_failures eventually reaches MAX_RECLAIM_RETRIES,
causing kswapd to stop running, which ultimately triggers direct memory reclaim.

I believe we should choose another approach that avoids breaking the memory.low semantics.
Specifically, in cases where kswapd is woken up due to watermark_boost, we should bypass the
logic that increments kswapd_failures.



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-16 15:10       ` Jiayuan Chen
@ 2025-10-16 18:43         ` Michal Hocko
  2025-10-20 10:11           ` Jiayuan Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2025-10-16 18:43 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

On Thu 16-10-25 15:10:31, Jiayuan Chen wrote:
[...]
> The issue we encountered is that since the watermark_boost parameter is enabled by
> default, it causes kswapd to be woken up even when memory watermarks are still relatively
> high. Due to rapid consecutive wake-ups, kswapd_failures eventually reaches MAX_RECLAIM_RETRIES,
> causing kswapd to stop running, which ultimately triggers direct memory reclaim.
>
> I believe we should choose another approach that avoids breaking the memory.low semantics.
> Specifically, in cases where kswapd is woken up due to watermark_boost, we should bypass the
> logic that increments kswapd_failures.

yes, this seems like an unintended side effect of the implementation. It seems
like a rare problem, as low limits would have to be configured very close
to kswapd watermarks. My assumption has always been that low limits are
not getting very close to watermarks because that makes any reclaim very
hard and configuration rather unstable but you might have a very good
reason to configure the memory protection that way. It would definitely
help to describe your specific setup with rationale so that we can look
into that closer.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-16 18:43         ` Michal Hocko
@ 2025-10-20 10:11           ` Jiayuan Chen
  2025-11-07 13:22             ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Jiayuan Chen @ 2025-10-20 10:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

October 17, 2025 at 02:43, "Michal Hocko" <mhocko@suse.com> wrote:


> 
> On Thu 16-10-25 15:10:31, Jiayuan Chen wrote:
> [...]
> 
> > 
> > The issue we encountered is that since the watermark_boost parameter is enabled by
> >  default, it causes kswapd to be woken up even when memory watermarks are still relatively
> >  high. Due to rapid consecutive wake-ups, kswapd_failures eventually reaches MAX_RECLAIM_RETRIES,
> >  causing kswapd to stop running, which ultimately triggers direct memory reclaim.
> > 
> >  I believe we should choose another approach that avoids breaking the memory.low semantics.
> >  Specifically, in cases where kswapd is woken up due to watermark_boost, we should bypass the
> >  logic that increments kswapd_failures.
> > 
> yes, this seems like unintended side effect of the implementation. Seems
> like a rare problem as low limits would have to be configured very close
> to kswapd watermarks. My assumption has always been that low limits are
> not getting very close to watermarks because that makes any reclaim very
> hard and configuration rather unstable but you might have a very good
> reason to configure the memory protection that way. It would definitely
> help to describe your specific setup with rationale so that we can look
> into that closer.
> -- 
> Michal Hocko
> SUSE Labs
>

Thank you for your response, Michal.

To provide more context about our specific setup:

1. The memory.low values set on host pods are actually quite large,
   some pods are set to 10GB, others to 20GB, etc.
2. Since most pods have memory limits configured, each time kswapd
   is woken up, if a pod's memory usage hasn't exceeded its own
   memory.low, its memory won't be reclaimed.
3. When applications start up, rapidly consume memory, or experience
   network traffic bursts, the kernel reaches steal_suitable_fallback(),
   which sets watermark_boost and subsequently wakes kswapd.
4. In the core logic of kswapd thread (balance_pgdat()), when reclaim is
   triggered by watermark_boost, the maximum priority is 10. Higher priority
   values mean less aggressive LRU scanning, which can result in no pages
   being reclaimed during a single scan cycle:

	if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
		raise_priority = false;

5. This eventually causes pgdat->kswapd_failures to continuously accumulate,
   exceeding MAX_RECLAIM_RETRIES, and consequently kswapd stops working.
   At this point, the system's available memory is still significantly above
   the high watermark—it's inappropriate for kswapd to stop under these
   conditions.

The final observable issue is that a brief period of rapid memory allocation
causes kswapd to stop running, ultimately triggering direct reclaim and
making the applications unresponsive.



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-10-20 10:11           ` Jiayuan Chen
@ 2025-11-07 13:22             ` Michal Hocko
  2025-11-08  0:09               ` Shakeel Butt
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2025-11-07 13:22 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Johannes Weiner, David Hildenbrand, Qi Zheng, Shakeel Butt,
	Lorenzo Stoakes, linux-kernel

Sorry for late reply.

On Mon 20-10-25 10:11:23, Jiayuan Chen wrote:
[...]
> To provide more context about our specific setup:
> 
> 1. The memory.low values set on host pods are actually quite large,
>    some pods are set to 10GB, others to 20GB, etc.
> 2. Since most pods have memory limits configured, each time kswapd
>    is woken up, if a pod's memory usage hasn't exceeded its own
>    memory.low, its memory won't be reclaimed.
> 3. When applications start up, rapidly consume memory, or experience
>    network traffic bursts, the kernel reaches steal_suitable_fallback(),
>    which sets watermark_boost and subsequently wakes kswapd.
> 4. In the core logic of kswapd thread (balance_pgdat()), when reclaim is
>    triggered by watermark_boost, the maximum priority is 10. Higher priority
>    values mean less aggressive LRU scanning, which can result in no pages
>    being reclaimed during a single scan cycle:
> 
> if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
>     raise_priority = false;
> 
> 5. This eventually causes pgdat->kswapd_failures to continuously accumulate,
>    exceeding MAX_RECLAIM_RETRIES, and consequently kswapd stops working.
>    At this point, the system's available memory is still significantly above
>    the high watermark—it's inappropriate for kswapd to stop under these
>    conditions.
> 
> The final observable issue is that a brief period of rapid memory allocation
> causes kswapd to stop running, ultimately triggering direct reclaim and
> making the applications unresponsive.

This to me sounds like something to be addressed in the watermark
boosting code. I do not think we should be breaching the low limit for that
(opportunistic) reclaim.

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
  2025-11-07 13:22             ` Michal Hocko
@ 2025-11-08  0:09               ` Shakeel Butt
  0 siblings, 0 replies; 9+ messages in thread
From: Shakeel Butt @ 2025-11-08  0:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jiayuan Chen, linux-mm, Andrew Morton, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Johannes Weiner, David Hildenbrand,
	Qi Zheng, Lorenzo Stoakes, linux-kernel

On Fri, Nov 07, 2025 at 02:22:14PM +0100, Michal Hocko wrote:
> Sorry for late reply.
> 
> On Mon 20-10-25 10:11:23, Jiayuan Chen wrote:
> [...]
> > To provide more context about our specific setup:
> > 
> > 1. The memory.low values set on host pods are actually quite large,
> >    some pods are set to 10GB, others to 20GB, etc.
> > 2. Since most pods have memory limits configured, each time kswapd
> >    is woken up, if a pod's memory usage hasn't exceeded its own
> >    memory.low, its memory won't be reclaimed.
> > 3. When applications start up, rapidly consume memory, or experience
> >    network traffic bursts, the kernel reaches steal_suitable_fallback(),
> >    which sets watermark_boost and subsequently wakes kswapd.
> > 4. In the core logic of kswapd thread (balance_pgdat()), when reclaim is
> >    triggered by watermark_boost, the maximum priority is 10. Higher priority
> >    values mean less aggressive LRU scanning, which can result in no pages
> >    being reclaimed during a single scan cycle:
> > 
> > if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
> >     raise_priority = false;
> > 
> > 5. This eventually causes pgdat->kswapd_failures to continuously accumulate,
> >    exceeding MAX_RECLAIM_RETRIES, and consequently kswapd stops working.
> >    At this point, the system's available memory is still significantly above
> >    the high watermark—it's inappropriate for kswapd to stop under these
> >    conditions.
> > 
> > The final observable issue is that a brief period of rapid memory allocation
> > causes kswapd to stop running, ultimately triggering direct reclaim and
> > making the applications unresponsive.
> 
> This to me sounds like something to be addressed in the watermark
> boosting code. I do not think we should be breaching low limit for that
> (opportunistic) reclaim.

Jiayuan already posted a v2 with a different approach. We can move the
discussion there.

http://lore.kernel.org/20251024022711.382238-1-jiayuan.chen@linux.dev



end of thread, other threads:[~2025-11-08  0:09 UTC | newest]

Thread overview: 9+ messages
-- links below jump to the message on this page --
2025-10-14  8:18 [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd Jiayuan Chen
2025-10-14  9:33 ` Michal Hocko
2025-10-14 12:56   ` Jiayuan Chen
2025-10-16 14:49     ` Michal Hocko
2025-10-16 15:10       ` Jiayuan Chen
2025-10-16 18:43         ` Michal Hocko
2025-10-20 10:11           ` Jiayuan Chen
2025-11-07 13:22             ` Michal Hocko
2025-11-08  0:09               ` Shakeel Butt
