Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd


From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Michal Hocko" <mhocko@suse.com>
Cc: linux-mm@kvack.org, "Andrew Morton" <akpm@linux-foundation.org>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Qi Zheng" <zhengqi.arch@bytedance.com>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd
Date: Mon, 20 Oct 2025 10:11:23 +0000
Message-ID: <db4d9e73e6a70033da561ed88aef32c1ebe411dd@linux.dev>
In-Reply-To: <aPE84XfToVH4eAbs@tiehlicka>

October 17, 2025 at 02:43, "Michal Hocko" <mhocko@suse.com> wrote:

> 
> On Thu 16-10-25 15:10:31, Jiayuan Chen wrote:
> [...]
> 
> > 
> > The issue we encountered is that since the watermark_boost parameter is enabled by
> >  default, it causes kswapd to be woken up even when memory watermarks are still relatively
> >  high. Due to rapid consecutive wake-ups, kswapd_failures eventually reaches MAX_RECLAIM_RETRIES,
> >  causing kswapd to stop running, which ultimately triggers direct memory reclaim.
> > 
> >  I believe we should choose another approach that avoids breaking the memory.low semantics.
> >  Specifically, in cases where kswapd is woken up due to watermark_boost, we should bypass the
> >  logic that increments kswapd_failures.
> > 
> yes, this seems like unintended side effect of the implementation. Seems
> like a rare problem as low limits would have to be configured very close
> to kswapd watermarks. My assumption has always been that low limits are
> not getting very close to watermarks because that makes any reclaim very
> hard and configuration rather unstable but you might have a very good
> reason to configure the memory protection that way. It would definitely
> help to describe your specific setup with rationale so that we can look
> into that closer.
> -- 
> Michal Hocko
> SUSE Labs
>

Thank you for your response, Michal.

To provide more context about our specific setup:

1. The memory.low values set on host pods are actually quite large:
   some pods are set to 10GB, others to 20GB, and so on.
2. Since most pods have memory limits configured, each time kswapd
   is woken up, if a pod's memory usage hasn't exceeded its own
   memory.low, its memory won't be reclaimed.
3. When applications start up, rapidly consume memory, or experience
   network traffic bursts, the kernel reaches steal_suitable_fallback(),
   which sets watermark_boost and subsequently wakes kswapd.
4. In the core kswapd logic (balance_pgdat()), when reclaim is
   triggered by watermark_boost, the scan priority is not lowered below
   DEF_PRIORITY - 2 (i.e. 10). Higher priority values mean less
   aggressive LRU scanning, so a single scan cycle can end without
   reclaiming any pages:

if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
    raise_priority = false;

5. As a result, pgdat->kswapd_failures keeps accumulating until it
   reaches MAX_RECLAIM_RETRIES, at which point kswapd stops working.
   The system's available memory is still significantly above the high
   watermark at that point, so it is inappropriate for kswapd to stop
   under these conditions.
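
To make steps 4 and 5 concrete, here is a rough user-space model of the
failure accounting. It is only a sketch, not kernel code: the toy_* names
are made up for illustration, the two constants are assumed to match the
kernel's values, and "pages become reclaimable only at or below some
priority" is a deliberate simplification of what memory.low protection
does to the scan.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY        12  /* assumed to match the kernel's value */
#define MAX_RECLAIM_RETRIES 16  /* assumed to match the kernel's value */

/* Toy stand-in for a node; only the failure counter is modelled. */
struct toy_node {
	int kswapd_failures;
};

/*
 * Toy model of one kswapd run.  Pages are assumed to become reclaimable
 * only once the scan priority drops to 'needed_priority' or below.
 * Returns 1 if the run reclaimed something, 0 otherwise.
 */
int toy_kswapd_run(struct toy_node *node, bool boosted, int needed_priority)
{
	/* Boosted runs never scan harder than DEF_PRIORITY - 2. */
	int floor = boosted ? DEF_PRIORITY - 2 : 1;
	int reclaimed = 0;

	assert(needed_priority >= 1 && needed_priority <= DEF_PRIORITY);

	for (int prio = DEF_PRIORITY; prio >= floor; prio--) {
		if (prio <= needed_priority) {
			reclaimed = 1;
			break;
		}
	}

	/* Simplification: any progress clears the counter, none bumps it. */
	if (reclaimed)
		node->kswapd_failures = 0;
	else
		node->kswapd_failures++;

	return reclaimed;
}

/* kswapd is treated as hopeless once the counter reaches the limit. */
bool toy_kswapd_gave_up(const struct toy_node *node)
{
	return node->kswapd_failures >= MAX_RECLAIM_RETRIES;
}
```

In this model, MAX_RECLAIM_RETRIES consecutive boosted runs that would
need a priority below 10 to find anything are enough to make
toy_kswapd_gave_up() return true, while a single non-boosted run with the
same needed_priority still makes progress.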

The final observable issue is that a brief period of rapid memory allocation
causes kswapd to stop running, ultimately triggering direct reclaim and
making the applications unresponsive.
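
For reference, the alternative mentioned in the quoted text above (not
counting boost-triggered runs toward kswapd_failures) could look roughly
like the following in a user-space model. This is only a sketch of the
idea under stated assumptions: sketch_node and sketch_account_run() are
hypothetical names, not existing kernel symbols, and resetting the
counter on progress is a simplification.

```c
#include <stdbool.h>

#define MAX_RECLAIM_RETRIES 16  /* assumed to match the kernel's value */

/* Hypothetical stand-in for the per-node state. */
struct sketch_node {
	int kswapd_failures;
};

/*
 * Hypothetical accounting step at the end of one kswapd run.
 * 'boosted' says whether the run was triggered by watermark_boost.
 */
void sketch_account_run(struct sketch_node *node, unsigned long nr_reclaimed,
			bool boosted)
{
	if (nr_reclaimed) {
		/* Simplification: progress clears the counter. */
		node->kswapd_failures = 0;
		return;
	}

	/* Proposed bypass: a fruitless boosted run is not a failure. */
	if (boosted)
		return;

	node->kswapd_failures++;
}

/* kswapd is treated as hopeless once the counter reaches the limit. */
bool sketch_kswapd_gave_up(const struct sketch_node *node)
{
	return node->kswapd_failures >= MAX_RECLAIM_RETRIES;
}
```

With this accounting, any number of fruitless boost-triggered runs leaves
the counter untouched, and only non-boosted runs that reclaim nothing
move kswapd toward giving up.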
