From: Andrew Morton <akpm@linux-foundation.org>
To: "Jiayuan Chen" <jiayuan.chen@linux.dev>
Cc: "Shakeel Butt" <shakeel.butt@linux.dev>,
linux-mm@kvack.org, "Johannes Weiner" <hannes@cmpxchg.org>,
"David Hildenbrand" <david@redhat.com>,
"Michal Hocko" <mhocko@kernel.org>,
"Qi Zheng" <zhengqi.arch@bytedance.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
Date: Fri, 14 Nov 2025 16:40:37 -0800
Message-ID: <20251114164037.bdd9551dfae425bc52756832@linux-foundation.org>
In-Reply-To: <53de0b3ee0b822418e909db29bfa6513faff9d36@linux.dev>
On Fri, 14 Nov 2025 04:17:40 +0000 "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
> > > Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> > >
> > Please resolve Andrew's comment and add couple of lines on boosted
> > watermark increasing the chances of kswapd failures and the patch only
> > targets that particular scenario, the general solution TBD in the commit
> > message.
> >
> > With that, you can add:
> >
> > Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> >
>
> I see this patch is already in mm-next. I'm not sure how to proceed.
> Perhaps Andrew needs to do a git rebase and then reword the commit message?
A rebase would be needed if the patch had been placed in mm.git's
mm-stable branch. But it's still in the mm-unstable branch where
patches are kept in quilt form and are imported into git for each
mm.git release.
> But regardless, I'll reword the commit message here and please let me know
> how to proceed if possible:
Which is why I do things this way.
Easy peasy, edited, thanks.
From: Jiayuan Chen <jiayuan.chen@linux.dev>
Subject: mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
Date: Fri, 24 Oct 2025 10:27:11 +0800
We have a colocation cluster used for deploying both offline and online
services simultaneously. In this environment, we encountered a
scenario where direct memory reclamation was triggered due to kswapd
not running.
1. When applications start up, rapidly consume memory, or experience
network traffic bursts, the kernel reaches steal_suitable_fallback(),
which sets watermark_boost and subsequently wakes kswapd.
2. In the core logic of the kswapd thread (balance_pgdat()), when reclaim is
triggered by watermark_boost, the maximum priority is 10
(DEF_PRIORITY - 2). Higher priority values mean less aggressive LRU
scanning, which can result in no pages being reclaimed during a single
scan cycle:
    if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
        raise_priority = false;
3. Additionally, many of our pods are configured with memory.low, which
prevents memory reclamation in certain cgroups, further increasing the
chance of failing to reclaim memory.
4. This eventually causes pgdat->kswapd_failures to continuously
accumulate, exceeding MAX_RECLAIM_RETRIES, and consequently kswapd
stops working. At this point, the system's available memory is
still significantly above the high watermark, so it is
inappropriate for kswapd to stop under these conditions.
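To illustrate step 2, here is a minimal user-space sketch, not kernel
code. It assumes the scan target for an LRU list is roughly the list
size shifted right by the priority, which is a simplification of what
get_scan_count() does (the real function applies further adjustments).
model_scan_target() is a hypothetical helper introduced only for this
example.

```c
/* The kernel's default starting reclaim priority. */
#define DEF_PRIORITY 12

/*
 * Hypothetical model, not a kernel function: approximate how many
 * pages of an LRU list are scanned at a given priority. A numerically
 * higher priority shifts away more bits, so fewer pages are scanned.
 */
static unsigned long model_scan_target(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}
```

Under this assumption, an LRU list with fewer than 1024 pages yields a
scan target of zero at priority 10 (DEF_PRIORITY - 2), so a boosted
reclaim pass over many small cgroups can finish without reclaiming
anything.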
The final observable issue is that a brief period of rapid memory
allocation causes kswapd to stop running, ultimately triggering direct
reclaim and making the applications unresponsive.
This problem leading to direct memory reclamation has been a
long-standing issue in our production environment. We initially held
the simple assumption that it was caused by applications allocating
memory too rapidly for kswapd to keep up with reclamation. However,
after we began monitoring kswapd's runtime behavior, we discovered a
different pattern:
kswapd initially exhibits very aggressive activity even when there is
still considerable free memory, but it subsequently stops running
entirely, even as memory levels approach the low watermark.
In summary, both boosted watermarks and memory.low increase the
probability of kswapd operation failures.
This patch specifically addresses the scenario involving boosted
watermarks by not incrementing kswapd_failures when reclamation fails.
A more general solution, potentially addressing memory.low or other
cases, requires further discussion.
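The effect of the change can be sketched with a small user-space
model. This is an illustration under stated assumptions, not the
kernel implementation: struct node_model and the model_*() helpers are
hypothetical, counter resets on successful reclaim are not modelled,
and only MAX_RECLAIM_RETRIES (16) is taken from the kernel.

```c
#include <stdbool.h>

/* Same value as the kernel's MAX_RECLAIM_RETRIES. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for the per-node failure counter. */
struct node_model {
	int kswapd_failures;
};

/*
 * Model the tail of one balance_pgdat() pass. Without the patch, any
 * pass that reclaims nothing counts as a failure. With the patch, a
 * pass that was driven by a boosted watermark is not counted.
 */
static void model_pass(struct node_model *n, unsigned long nr_reclaimed,
		       bool boosted, bool patched)
{
	if (nr_reclaimed)
		return;
	if (patched && boosted)
		return;
	n->kswapd_failures++;
}

/* kswapd treats the node as hopeless once the counter hits the limit. */
static bool model_kswapd_gives_up(const struct node_model *n)
{
	return n->kswapd_failures >= MAX_RECLAIM_RETRIES;
}

/* Run a number of passes that all reclaim nothing; return the counter. */
static int model_failures_after(int passes, bool boosted, bool patched)
{
	struct node_model n = { 0 };

	for (int i = 0; i < passes; i++)
		model_pass(&n, 0, boosted, patched);
	return n.kswapd_failures;
}
```

In this model, 16 consecutive empty boosted passes push the unpatched
counter to the limit, while the patched counter stays at zero.
Non-boosted failures are still counted as before.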
Link: https://lkml.kernel.org/r/53de0b3ee0b822418e909db29bfa6513faff9d36@linux.dev
Link: https://lkml.kernel.org/r/20251024022711.382238-1-jiayuan.chen@linux.dev
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-skip-increasing-kswapd_failures-when-reclaim-was-boosted
+++ a/mm/vmscan.c
@@ -7127,7 +7127,12 @@ restart:
 		goto restart;
 	}
 
-	if (!sc.nr_reclaimed)
+	/*
+	 * If the reclaim was boosted, we might still be far from the
+	 * watermark_high at this point. We need to avoid increasing the
+	 * failure count to prevent the kswapd thread from stopping.
+	 */
+	if (!sc.nr_reclaimed && !boosted)
 		atomic_inc(&pgdat->kswapd_failures);
 
 out:
_