From: Shakeel Butt <shakeel.butt@linux.dev>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, Jiayuan Chen <jiayuan.chen@shopee.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/vmscan: mitigate spurious kswapd_failures reset from direct reclaim
Date: Mon, 12 Jan 2026 13:29:06 -0800
Message-ID: <langyedbbu7b4zkz5o7yy7m7bdlusoa3zwsjbgrqt2p7ou37qm@fi7rovfl5gfz>
In-Reply-To: <61b4f3ba49016e68e8d6bfe6543150a7de0bac79@linux.dev>
Hi Jiayuan,

Sorry for the late reply. Let me respond in-place below.
On Wed, Jan 07, 2026 at 11:39:36AM +0000, Jiayuan Chen wrote:
[...]
>
> Hi Shakeel,
>
> Thanks for the feedback.
>
> To be honest, the issue is difficult to reproduce because the boundary conditions are quite complex.
> We also haven't deployed this patch in production yet. I discovered the relationship between
> kswapd_failures and direct reclaim through the following bpftrace script:
>
> '''bash
>
> bpftrace -e '
> #include <linux/mmzone.h>
> #include <linux/shrinker.h>
> // Fires on entry to balance_pgdat(); arg0 is the pgdat.
> kprobe:balance_pgdat {
>     $pgdat = (struct pglist_data *)arg0;
>     if ($pgdat->kswapd_failures > 0) {
>         printf("[node %d] [%lu] kswapd end, kswapd_failures %d\n", $pgdat->node_id, jiffies, $pgdat->kswapd_failures);
>     }
> }
> // Fires at the end of every direct reclaim; the value printed is nr_reclaimed.
> tracepoint:vmscan:mm_vmscan_direct_reclaim_end {
>     printf("[cpu %d] [%lu] reset kswapd_failures %d \n", cpu, jiffies, args.nr_reclaimed);
> }
> '
>
> '''
>
> The trace results showed that when kswapd_failures reaches 15, continuous direct reclaim keeps
> resetting it to 0. This was accompanied by a flood of kswapd_failures log entries, and shortly
> after, we observed massive refaults occurring.
> (Note that I can only observe up to 15 in the trace due to a kprobe limitation:
> the kprobe on balance_pgdat fires at function entry, but kswapd_failures is incremented to 16 only
> when balance_pgdat fails to reclaim any pages - at which point kswapd goes to sleep and there's no
> suitable hook point to capture it.)
>
>
> Before I send v3, I'd like to continue the discussion to make sure we're aligned on the approach:
>
> Do you think the bpftrace evidence above is sufficient?
Mainly I want to see whether the patch helps or hurts in the situation
you are seeing in production. Overall, I think Michal and I are on the
same page that the patch is a net positive, but testing in production
would eliminate the concerns completely. Anyway, we can proceed with the
patch, and we can always change it in the future if it does not work.
Please go ahead with v3 with additional explanation.
>
>
> If you and Michal are okay with the current approach, I'll prepare v3 with more detailed comments addressed.
>
> By the way, this tracing limitation makes me wonder: would it be appropriate to add two tracepoints for
> kswapd_failures? One for when kswapd_failures reaches MAX_RECLAIM_RETRIES (16), and another for when it
> gets reset to 0. Currently, the only way to detect this is by polling node_unreclaimable from /proc/zoneinfo,
> but the sampling interval is usually too coarse to catch these events.
Tracepoints are cheap and I am all for more observability. Go ahead and
propose the tracepoints you see fit.
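
For illustration only, the two events discussed above (the counter
reaching MAX_RECLAIM_RETRIES, and a later reset to 0) can be modelled in
user space. This is a rough sketch, not kernel code and not taken from
the patch: struct node_model, the model_* functions and the two event
counters are hypothetical names, and the rules for when the counter
moves are simplifying assumptions based only on the behaviour described
in this thread. The marked lines are where the two proposed tracepoints
would conceptually fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16 /* value quoted in the thread */

/* Hypothetical user-space stand-in for the per-node state. */
struct node_model {
	int kswapd_failures;
	int limit_events; /* times the counter reached the limit */
	int reset_events; /* times a non-zero counter was reset */
};

/*
 * Model of one kswapd run. Assumption: a run that reclaims nothing
 * counts as a failure, and the counter stops at the limit.
 */
void model_kswapd_run(struct node_model *n, unsigned long nr_reclaimed)
{
	assert(n != NULL);
	if (nr_reclaimed || n->kswapd_failures >= MAX_RECLAIM_RETRIES)
		return;
	if (++n->kswapd_failures == MAX_RECLAIM_RETRIES)
		n->limit_events++; /* "limit reached" event would fire here */
}

/*
 * Model of the end of one direct reclaim. Assumption: any progress
 * resets the counter, which is the behaviour seen in the trace.
 */
void model_direct_reclaim_end(struct node_model *n, unsigned long nr_reclaimed)
{
	assert(n != NULL);
	if (!nr_reclaimed || !n->kswapd_failures)
		return;
	n->kswapd_failures = 0;
	n->reset_events++; /* "reset" event would fire here */
}

/* Mirrors the condition that polling /proc/zoneinfo tries to catch. */
bool model_node_unreclaimable(const struct node_model *n)
{
	assert(n != NULL);
	return n->kswapd_failures >= MAX_RECLAIM_RETRIES;
}
```

In this model, sixteen failed kswapd runs followed by one successful
direct reclaim produce exactly one event of each kind, whereas a poller
sampling model_node_unreclaimable() between the two calls could miss
both transitions.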