From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: linux-mm@kvack.org, shakeel.butt@linux.dev
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Brendan Jackman <jackmanb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Jiayuan Chen <jiayuan.chen@shopee.com>,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [PATCH v3 0/2] mm/vmscan: mitigate spurious kswapd_failures reset and add tracepoints
Date: Wed, 14 Jan 2026 15:40:34 +0800 [thread overview]
Message-ID: <20260114074049.229935-1-jiayuan.chen@linux.dev> (raw)
== Problem ==
We observed an issue in production on a multi-NUMA system where kswapd
runs endlessly, causing sustained heavy IO READ pressure across the
entire system.
The root cause is that direct reclaim triggered by cgroup memory.high
keeps resetting kswapd_failures to 0, even when the node cannot be
balanced. This prevents kswapd from ever stopping after reaching
MAX_RECLAIM_RETRIES.
```bash
bpftrace -e '
#include <linux/mmzone.h>
#include <linux/shrinker.h>
kprobe:balance_pgdat {
$pgdat = (struct pglist_data *)arg0;
if ($pgdat->kswapd_failures > 0) {
printf("[node %d] [%lu] kswapd end, kswapd_failures %d\n",
$pgdat->node_id, jiffies, $pgdat->kswapd_failures);
}
}
tracepoint:vmscan:mm_vmscan_direct_reclaim_end {
printf("[cpu %d] [%ul] reset kswapd_failures %d \n", cpu, jiffies,
args.nr_reclaimed)
}
'
```
The trace results showed that when kswapd_failures reaches 15, continuous
direct reclaim keeps resetting it to 0. This was accompanied by a flood of
kswapd_failures log entries, and shortly after, we observed massive
refaults occurring.
== Solution ==
Patch 1 fixes the issue by only resetting kswapd_failures when the node
is actually balanced. This introduces pgdat_try_reset_kswapd_failures()
as a wrapper that checks pgdat_balanced() before resetting.
Patch 2 extends the wrapper to track why kswapd_failures was reset,
adding tracepoints for better observability:
- mm_vmscan_reset_kswapd_failures: traces each reset with reason
- mm_vmscan_kswapd_reclaim_fail: traces each kswapd reclaim failure
---
v2 -> v3: https://lore.kernel.org/all/20251226080042.291657-1-jiayuan.chen@linux.dev/
- Add tracepoints for kswapd_failures reset and reclaim failure
- Expand commit message with test results
v1 -> v2: https://lore.kernel.org/all/20251222122022.254268-1-jiayuan.chen@linux.dev/
Jiayuan Chen (2):
mm/vmscan: mitigate spurious kswapd_failures reset from direct reclaim
mm/vmscan: add tracepoint and reason for kswapd_failures reset
include/linux/mmzone.h | 9 +++++++
include/trace/events/vmscan.h | 51 +++++++++++++++++++++++++++++++++++
mm/memory-tiers.c | 2 +-
mm/page_alloc.c | 2 +-
mm/vmscan.c | 33 ++++++++++++++++++++---
5 files changed, 91 insertions(+), 6 deletions(-)
--
2.43.0
next reply other threads:[~2026-01-14 7:41 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-14 7:40 Jiayuan Chen [this message]
2026-01-14 7:40 ` [PATCH v3 1/2] mm/vmscan: mitigate spurious kswapd_failures reset from direct reclaim Jiayuan Chen
2026-01-14 7:40 ` [PATCH v3 2/2] mm/vmscan: add tracepoint and reason for kswapd_failures reset Jiayuan Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260114074049.229935-1-jiayuan.chen@linux.dev \
--to=jiayuan.chen@linux.dev \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=david@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=jiayuan.chen@shopee.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@suse.com \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=yuanchu@google.com \
--cc=zhengqi.arch@bytedance.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox