From: Vlastimil Babka <vbabka@suse.cz>
To: Chanwon Park <flyinrm@gmail.com>,
akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
david@redhat.com, zhengqi.arch@bytedance.com,
shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: re-enable kswapd when memory pressure subsides or demotion is toggled
Date: Tue, 30 Sep 2025 09:43:28 +0200
Message-ID: <877fa6d5-e2c7-41d8-88f7-6ee6ac395fc2@suse.cz>
In-Reply-To: <aL6qGi69jWXfPc4D@pcw-MS-7D22>
On 9/8/25 12:04, Chanwon Park wrote:
> If kswapd fails to reclaim pages from a node MAX_RECLAIM_RETRIES times
> in a row, kswapd on that node gets disabled. That is, the system won't
> wake up kswapd for that node until page reclamation is observed at
> least once. That reclamation is mostly done by direct reclaim, which in
> turn re-enables kswapd.
>
> However, on systems with CXL memory nodes, workloads with high anon page
> usage can disable kswapd indefinitely without triggering direct
> reclaim. This can be reproduced with the following steps:
>
> numa node 0 (32GB memory, 48 CPUs)
> numa node 2~5 (512GB CXL memory, 128GB each)
> (numa node 1 is disabled)
> swap space 8GB
>
> 1) Set /sys/kernel/mm/numa/demotion_enabled to 0.
> 2) Set /proc/sys/kernel/numa_balancing to 0.
> 3) Run a process that allocates and randomly accesses 500GB of anon
> pages.
> 4) Let the process exit normally.
>
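For anyone trying to reproduce this, a hedged C sketch of steps 1) to 4). set_knob() and touch_random_pages() are hypothetical helpers, the 4096-byte page size is an assumption, and writing the real knobs needs root. The size is a parameter, so the same code can be exercised with a few MiB instead of 500GB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write a value to a sysfs/procfs knob. Returns 0 on success, -1 on failure. */
static int set_knob(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)  /* sysfs may only report errors on close */
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Allocate `bytes` of anonymous memory and write to `accesses`
 * pseudo-randomly chosen pages, then free it (step 4). Returns a checksum
 * so the accesses cannot be optimized away, or UINT64_MAX on failure.
 */
static uint64_t touch_random_pages(size_t bytes, size_t accesses, unsigned seed)
{
	const size_t page = 4096;  /* assumed page size */
	size_t npages = bytes / page;
	if (npages == 0)
		return UINT64_MAX;
	unsigned char *buf = malloc(npages * page);
	if (!buf)
		return UINT64_MAX;

	uint64_t sum = 0;
	uint64_t state = seed ? seed : 1;  /* xorshift must not start at 0 */
	for (size_t i = 0; i < accesses; i++) {
		/* xorshift64: cheap, deterministic page selection */
		state ^= state << 13;
		state ^= state >> 7;
		state ^= state << 17;
		size_t p = (size_t)(state % npages);
		buf[p * page] = (unsigned char)i;  /* first write faults the page in */
		sum += buf[p * page];
	}
	free(buf);
	return sum;
}
```

For example, set_knob("/proc/sys/kernel/numa_balancing", "0") covers step 2), the demotion_enabled knob from step 1) is written the same way, and one touch_random_pages() call with the target size covers steps 3) and 4).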
> During 3), free memory on node 0 drops below the low watermark, and
> kswapd runs until swap space is exhausted. kswapd then fails repeatedly
> and gets disabled. Subsequent allocations are served from CXL memory,
> so node 0 never comes under enough memory pressure to trigger direct
> reclaim.
>
> After 4), kswapd on node 0 remains disabled, and tasks running on that
> node are unable to swap. If you turn on NUMA_BALANCING_MEMORY_TIERING
> and demotion now, it won't work properly since kswapd is disabled.
>
> To mitigate this problem, reset kswapd_failures to 0 under the
> following conditions:
>
> a) The ZONE_BELOW_HIGH bit of a zone gets cleared, and the zone belongs
> to a hopeless node that has a fallback memory node.
> b) demotion_enabled is changed from false to true.
>
> Rationale for a):
> A cleared ZONE_BELOW_HIGH bit suggests that the node may be
> reclaimable again. This won't help much if the memory-hungry process
> keeps running without freeing anything, but at least the node will
> return to the reclaimable state when the process exits.
>
> Rationale for b):
> When demotion_enabled is false, kswapd can only reclaim anon pages by
> swapping them out. Once demotion_enabled is turned on, kswapd can also
> reclaim anon pages by demoting them to another node, so the failure
> count accumulated without demotion no longer reflects whether the node
> is reclaimable.
>
> Since a reset of kswapd_failures can be lost to a concurrent,
> non-atomic ++ operation, kswapd_failures is changed from int to
> atomic_t.
>
> Signed-off-by: Chanwon Park <flyinrm@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
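To illustrate the race behind the atomic_t change: a plain ++ may compile to a separate load and store, so a reset that lands in between gets overwritten with the stale value plus one. The sketch below replays that interleaving deterministically and contrasts it with C11 atomics. In the kernel the corresponding operations would be atomic_inc(), atomic_set() and atomic_read() on an atomic_t, though which helpers the patch uses is not shown here; the replay_* functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Plain int: the ++ is split into the load and store a compiler may emit. */
static int replay_plain_increment_vs_reset(int start)
{
	int failures = start;
	int loaded = failures;  /* kswapd: load */
	failures = 0;           /* concurrent reset lands in between */
	failures = loaded + 1;  /* kswapd: store, overwriting the reset */
	return failures;
}

/*
 * Atomic counter: atomic_fetch_add is one read-modify-write, so the reset
 * is ordered either before it (result 1) or after it (result 0), never lost.
 */
static int replay_atomic(int start, int reset_first)
{
	atomic_int failures;

	atomic_init(&failures, start);
	if (reset_first) {
		atomic_store(&failures, 0);
		atomic_fetch_add(&failures, 1);
	} else {
		atomic_fetch_add(&failures, 1);
		atomic_store(&failures, 0);
	}
	return atomic_load(&failures);
}
```

With a start value of 15, the plain version ends at 16, so the node stays hopeless despite the reset, while either atomic ordering leaves the counter at 0 or 1.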