From: Shakeel Butt <shakeel.butt@linux.dev>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: akpm@linux-foundation.org, mhocko@kernel.org, hannes@cmpxchg.org,
roman.gushchin@linux.dev, yosryahmed@google.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] mm: introduce per-node proactive reclaim interface
Date: Wed, 25 Jun 2025 16:10:16 -0700
Message-ID: <zutbi6jjx6rj2beytkp2ihpyxkuvg43ggsglfhimluojko4frf@gacgibzen5k4>
In-Reply-To: <20250623185851.830632-5-dave@stgolabs.net>
On Mon, Jun 23, 2025 at 11:58:51AM -0700, Davidlohr Bueso wrote:
> This adds support for proactive reclaim in general on a NUMA system.
> A per-node interface extends support beyond the memcg-specific
> interface while preserving the current semantics of memory.reclaim:
> it respects LRU aging and does not artificially trigger eviction on
> nodes belonging to non-bottom tiers.
>
> This patch allows userspace to do:
>
> echo "512M swappiness=10" > /sys/devices/system/node/nodeX/reclaim
>
> One of the premises for this is to align semantically as closely as
> possible with memory.reclaim. For a brief period memcg did support a
> nodemask, until 55ab834a86a9 (Revert "mm: add nodes= arg to
> memory.reclaim"), because the semantics around reclaim (eviction) vs
> demotion were not clear, which broke charging expectations.
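For reference, the memcg-level interface being aligned with is driven
the same way; a minimal illustration (the cgroup path and the values
here are made up for the example):

  echo "1G swappiness=10" > /sys/fs/cgroup/workload/memory.reclaim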
>
> With this approach:
>
> 1. Users who do not use memcg can benefit from proactive reclaim.
> The memcg interface is not NUMA aware, and there are use cases that
> focus on NUMA balancing rather than workload memory footprint.
>
> 2. Proactive reclaim on top tiers will trigger demotion, so the
> memory remains byte-addressable. Reclaiming on the bottom nodes will
> trigger eviction to swap (the traditional sense of reclaim). This
> follows the semantics of what is today part of the aging process on
> tiered memory, mirroring what every other form of reclaim does
> (reactive and memcg proactive reclaim). Furthermore, per-node
> proactive reclaim is not as susceptible to the memcg charging
> problem mentioned above.
>
> 3. Unlike the nodes= arg, this interface avoids confusing semantics,
> such as what exactly the user wants when mixing top-tier and
> low-tier nodes in the nodemask. Furthermore, a per-node interface is
> less exposed to "free up memory in my container" use cases, where
> eviction is intended.
>
> 4. Users that *really* want to free up memory can use proactive
> reclaim on nodes known to be on the bottom tiers to force eviction
> in a natural way - higher access latencies are still better than
> swap. If compelled, while there are no guarantees and it is perhaps
> not worth the effort, users could also potentially follow a
> ladder-like approach to eventually free up the memory (sketched
> below). Alternatively, an 'evict' option could perhaps be added to
> the parameters of both memory.reclaim and the per-node interface to
> force this action unconditionally.
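A minimal sketch of that ladder-like sequence on a two-tier system,
using the interface proposed here (node numbers and sizes are
illustrative; node0 is assumed to be top tier, node1 bottom tier):

  # step 1: demote from the top-tier node (memory stays byte-addressable)
  echo "512M" > /sys/devices/system/node/node0/reclaim
  # step 2: reclaim from the bottom-tier node, evicting to swap
  echo "512M" > /sys/devices/system/node/node1/reclaim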
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Overall looks good, but I will try to dig deeper in the next couple of
days (or weeks).
One orthogonal thought: I wonder if we want a unified aging (hotness or
generation or active/inactive) view of jobs/memcgs/system. At the
moment, due to the way the LRUs are implemented, i.e. per-memcg
per-node, we can have different views of these LRUs even for the same
memcg. For example, the hottest pages on a low-tier node might be
colder than the coldest pages on the top tier. I am not sure how to
implement this in a scalable way.