From: Bing Jiao <bingjiao@google.com>
To: Gregory Price <gourry@gourry.net>
Cc: linux-mm@kvack.org, "Waiman Long" <longman@redhat.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Andrew Morton" <akpm@linux-foundation.org>,
"David Hildenbrand" <david@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
"Qi Zheng" <zhengqi.arch@bytedance.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/vmscan: respect mems_effective in demote_folio_list()
Date: Mon, 22 Dec 2025 06:28:11 +0000 [thread overview]
Message-ID: <aUjk-yVw8ddRgZyN@google.com> (raw)
In-Reply-To: <aUfi9gn5HS4u4ShU@gourry-fedora-PF4VCD3F>
On Sun, Dec 21, 2025 at 07:07:18AM -0500, Gregory Price wrote:
>
> I think this patch can be done without as many changes as proposed here.
>
> > -bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid);
> > +void mem_cgroup_node_allowed(struct mem_cgroup *memcg, nodemask_t *nodes);
>
> > -static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
> > +static inline void mem_cgroup_node_allowed(struct mem_cgroup *memcg,
>
> > -int next_demotion_node(int node);
> > +int next_demotion_node(int node, nodemask_t *mask);
>
> > -bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> > +void cpuset_node_allowed(struct cgroup *cgroup, nodemask_t *nodes)
>
> These are some fairly major contract changes, and the names don't make
> much sense as a result.
>
> Would be better to just make something like
>
> /* Filter the given nmask based on cpuset.mems.allowed */
> mem_cgroup_filter_mems_allowed(memg, nmask);
>
> (or some other, better name)
>
> separate of the existing interfaces, and operate on one scratch-mask if
> possible.
>
Hi Gregory, thank you for the review and suggestions.
I have divided these changes into 2 patches based on your suggestions.
Since mem_cgroup_node_allowed() and cpuset_node_allowed() are dangling,
they are removed in v2 2/2.
> > +static int get_demotion_targets(nodemask_t *targets, struct pglist_data *pgdat,
> > + struct mem_cgroup *memcg)
> > +{
> > + nodemask_t allowed_mask;
> > + nodemask_t preferred_mask;
> > + int preferred_node;
> > +
> > + if (!pgdat)
> > + return NUMA_NO_NODE;
> > +
> > + preferred_node = next_demotion_node(pgdat->node_id, &preferred_mask);
> > + if (preferred_node == NUMA_NO_NODE)
> > + return NUMA_NO_NODE;
> > +
> > + node_get_allowed_targets(pgdat, &allowed_mask);
> > + mem_cgroup_node_allowed(memcg, &allowed_mask);
> > + if (nodes_empty(allowed_mask))
> > + return NUMA_NO_NODE;
> > +
> > + if (targets)
> > + nodes_copy(*targets, allowed_mask);
> > +
> > + do {
> > + if (node_isset(preferred_node, allowed_mask))
> > + return preferred_node;
> > +
> > + nodes_and(preferred_mask, preferred_mask, allowed_mask);
> > + if (!nodes_empty(preferred_mask))
> > + return node_random(&preferred_mask);
> > +
> > + /*
> > + * Hop to the next tier of preferred nodes. Even if
> > + * preferred_node is not set in allowed_mask, still can use it
> > + * to query the nest-best demotion nodes.
> > + */
> > + preferred_node = next_demotion_node(preferred_node,
> > + &preferred_mask);
> > + } while (preferred_node != NUMA_NO_NODE);
> > +
>
> What you're implementing here is effectively a new feature - allowing
> demotion to jump nodes rather than just target the next demotion node.
>
> This is nice, but it should be a separate patch proposal (I think Andrew
> said something as much already) - not as part of a fix.
>
Thanks for the suggestion.
I sent a v2 patch series for fixes and backport. This function (jump
node) will be sent in another thread for distinguishing between fixes
and features.
> > + /*
> > + * Should not reach here, as a non-empty allowed_mask ensures
> > + * there must have a target node for demotion.
>
> Does it? What if preferred_node is online when calling
> next_demotion_node(), but then is offline when
> node_get_allowed_targets() is called?
>
>
> > + * Otherwise, it suggests something wrong in node_demotion[]->preferred,
> > + * where the same-tier nodes have different preferred targets.
> > + * E.g., if node 0 identifies both nodes 2 and 3 as preferred targets,
> > + * but nodes 2 and 3 themselves have different preferred nodes.
> > + */
> > + WARN_ON_ONCE(1);
> > + return node_random(&allowed_mask);
>
> Just returning a random allowed node seems like an objectively poor
> result and we should just not demote if we reach this condition. It
> likesly means hotplug was happening and node states changed.
>
> > @@ -1041,10 +1090,10 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> > if (list_empty(demote_folios))
> > return 0;
> >
> > + target_nid = get_demotion_targets(&allowed_mask, pgdat, memcg);
> > if (target_nid == NUMA_NO_NODE)
> > return 0;
> > -
> > - node_get_allowed_targets(pgdat, &allowed_mask);
>
> in the immediate fixup patch, it seems more expedient to just add the
> function i described above
>
> /* Filter the given nmask based on cpuset.mems.allowed */
> mem_cgroup_filter_mems_allowed(memg, nmask);
>
> and then add that immediate after the node_get_allowed_targets() call.
>
> Then come back around afterwards to add the tier/node-skip functionality
> from above in a separate feature patch.
>
> ~Gregory
>
Thanks for the hit. I had never considered hot-swapping before.
> ---
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 670fe9fae5ba..1971a8d9475b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1046,6 +1046,11 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
>
> node_get_allowed_targets(pgdat, &allowed_mask);
>
> + /* Filter based on mems_allowed, fail if the result is empty */
> + mem_cgroup_filter_nodemask(memcg, &allowed_mask);
> + if (nodes_empty(allowed_mask))
> + return 0;
> +
> /* Demotion ignores all cpuset and mempolicy settings */
> migrate_pages(demote_folios, alloc_demote_folio, NULL,
> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
>
>
Thanks for the code. My v2 1/2 is based on your suggestion with
some changes.
Best,
Bing
next prev parent reply other threads:[~2025-12-22 6:28 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-20 6:10 Bing Jiao
2025-12-20 19:20 ` Andrew Morton
2025-12-22 6:16 ` Bing Jiao
2025-12-21 12:07 ` Gregory Price
2025-12-22 6:28 ` Bing Jiao [this message]
2025-12-21 23:36 ` [PATCH v2 0/2] fix demotion targets checks in reclaim/demotion Bing Jiao
2025-12-21 23:36 ` [PATCH v2 1/2] mm/vmscan: respect mems_effective in demote_folio_list() Bing Jiao
2025-12-22 2:38 ` Chen Ridong
2025-12-22 21:56 ` kernel test robot
2025-12-22 22:18 ` kernel test robot
2025-12-21 23:36 ` [PATCH v2 2/2] mm/vmscan: check all allowed targets in can_demote() Bing Jiao
2025-12-22 2:51 ` Chen Ridong
2025-12-22 6:09 ` Bing Jiao
2025-12-22 8:28 ` Chen Ridong
2025-12-23 21:19 ` [PATCH v3] mm/vmscan: fix demotion targets checks in reclaim/demotion Bing Jiao
2025-12-23 21:38 ` Bing Jiao
2025-12-24 1:19 ` Gregory Price
2025-12-26 18:48 ` Bing Jiao
2025-12-24 1:49 ` Chen Ridong
2025-12-26 18:58 ` Bing Jiao
2025-12-26 19:32 ` Waiman Long
2025-12-26 20:24 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUjk-yVw8ddRgZyN@google.com \
--to=bingjiao@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=cgroups@vger.kernel.org \
--cc=david@kernel.org \
--cc=gourry@gourry.net \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@kernel.org \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=yuanchu@google.com \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox