linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <llong@redhat.com>
To: Gregory Price <gourry@gourry.net>, linux-mm@kvack.org
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, hannes@cmpxchg.org, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, tj@kernel.org, mkoutny@suse.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH v4 2/2] vmscan,cgroup: apply mems_effective to reclaim
Date: Mon, 21 Apr 2025 22:02:22 -0400	[thread overview]
Message-ID: <d7568176-6199-488f-b45a-c494c8baec25@redhat.com> (raw)
In-Reply-To: <20250422012616.1883287-3-gourry@gourry.net>

On 4/21/25 9:26 PM, Gregory Price wrote:
> It is possible for a reclaimer to cause demotions of an lruvec belonging
> to a cgroup with cpuset.mems set to exclude some nodes. Attempt to apply
> this limitation based on the lruvec's memcg and prevent demotion.
>
> Notably, this may still allow demotion of shared libraries or any memory
> first instantiated in another cgroup. This means cpusets still cannot
> cannot guarantee complete isolation when demotion is enabled, and the
> docs have been updated to reflect this.
>
> This is useful for isolating workloads on a multi-tenant system from
> certain classes of memory more consistently - with the noted exceptions.
>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
>   .../ABI/testing/sysfs-kernel-mm-numa          | 16 +++++---
>   include/linux/cpuset.h                        |  5 +++
>   include/linux/memcontrol.h                    |  6 +++
>   kernel/cgroup/cpuset.c                        | 26 ++++++++++++
>   mm/memcontrol.c                               |  6 +++
>   mm/vmscan.c                                   | 41 +++++++++++--------
>   6 files changed, 78 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> index 77e559d4ed80..90e375ff54cb 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> @@ -16,9 +16,13 @@ Description:	Enable/disable demoting pages during reclaim
>   		Allowing page migration during reclaim enables these
>   		systems to migrate pages from fast tiers to slow tiers
>   		when the fast tier is under pressure.  This migration
> -		is performed before swap.  It may move data to a NUMA
> -		node that does not fall into the cpuset of the
> -		allocating process which might be construed to violate
> -		the guarantees of cpusets.  This should not be enabled
> -		on systems which need strict cpuset location
> -		guarantees.
> +		is performed before swap if an eligible numa node is
> +		present in cpuset.mems for the cgroup (or if cpuset v1
> +		is being used). If cpusets.mems changes at runtime, it
> +		may move data to a NUMA node that does not fall into the
> +		cpuset of the new cpusets.mems, which might be construed
> +		to violate the guarantees of cpusets.  Shared memory,
> +		such as libraries, owned by another cgroup may still be
> +		demoted and result in memory use on a node not present
> +		in cpusets.mem. This should not be enabled on systems
> +		which need strict cpuset location guarantees.
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 893a4c340d48..5255e3fdbf62 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -171,6 +171,7 @@ static inline void set_mems_allowed(nodemask_t nodemask)
>   	task_unlock(current);
>   }
>   
> +extern bool cpuset_node_allowed(struct cgroup *cgroup, int nid);
>   #else /* !CONFIG_CPUSETS */
>   
>   static inline bool cpusets_enabled(void) { return false; }
> @@ -282,6 +283,10 @@ static inline bool read_mems_allowed_retry(unsigned int seq)
>   	return false;
>   }
>   
> +static inline bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	return true;
> +}
>   #endif /* !CONFIG_CPUSETS */
>   
>   #endif /* _LINUX_CPUSET_H */
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 53364526d877..a6c4e3faf721 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1736,6 +1736,8 @@ static inline void count_objcg_events(struct obj_cgroup *objcg,
>   	rcu_read_unlock();
>   }
>   
> +bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid);
> +
>   #else
>   static inline bool mem_cgroup_kmem_disabled(void)
>   {
> @@ -1793,6 +1795,10 @@ static inline void count_objcg_events(struct obj_cgroup *objcg,
>   {
>   }
>   
> +static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
> +{
> +	return true;
> +}
>   #endif /* CONFIG_MEMCG */
>   
>   #if defined(CONFIG_MEMCG) && defined(CONFIG_ZSWAP)
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index f8e6a9b642cb..c52348bfd5db 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4163,6 +4163,32 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
>   	return allowed;
>   }
>   
> +bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct cpuset *cs;
> +	bool allowed;
> +
> +	/*
> +	 * In v1, mem_cgroup and cpuset are unlikely in the same hierarchy
> +	 * and mems_allowed is likely to be empty even if we could get to it,
> +	 * so return true to avoid taking a global lock on the empty check.
> +	 */
> +	if (!cpuset_v2())
> +		return true;
> +
> +	css = cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys);
> +	if (!css)
> +		return true;
> +
> +	cs = container_of(css, struct cpuset, css);
> +	rcu_read_lock();

Sorry, I missed the fact that cgroup_get_e_css() will take a reference 
to the css and so it won't go away. In that case, rcu_read_lock() isn't 
really needed. However, I do want a comment to say that accessing 
effective_mems should normally requrie taking either a cpuset_mutex or 
callback_lock, but is skipped in this case to avoid taking a global lock 
in the reclaim path at the expense that the result may be inaccurate in 
some rare cases.

Cheers,
Longman

Cheers,
Longman



  reply	other threads:[~2025-04-22  2:02 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22  1:26 [PATCH v4 0/2] vmscan: enforce mems_effective during demotion Gregory Price
2025-04-22  1:26 ` [PATCH v4 1/2] cpuset: rename cpuset_node_allowed to cpuset_current_node_allowed Gregory Price
2025-04-22 17:37   ` Johannes Weiner
2025-04-22  1:26 ` [PATCH v4 2/2] vmscan,cgroup: apply mems_effective to reclaim Gregory Price
2025-04-22  2:02   ` Waiman Long [this message]
2025-04-22  4:07     ` Gregory Price
2025-04-22  4:30   ` [PATCH] cpuset: relax locking on cpuset_node_allowed Gregory Price
2025-04-22  4:41     ` Shakeel Butt
2025-04-22 17:46     ` Johannes Weiner
2025-04-22 19:57     ` Waiman Long
2025-04-22  4:41   ` [PATCH v4 2/2] vmscan,cgroup: apply mems_effective to reclaim Shakeel Butt
2025-04-22 17:41   ` Johannes Weiner
2025-04-24 20:22   ` [PATCH v5 0/2] vmscan: enforce mems_effective during demotion Gregory Price
2025-04-24 20:22     ` [PATCH v5 1/2] cpuset: rename cpuset_node_allowed to cpuset_current_node_allowed Gregory Price
2025-04-24 20:22     ` [PATCH v5 2/2] vmscan,cgroup: apply mems_effective to reclaim Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d7568176-6199-488f-b45a-c494c8baec25@redhat.com \
    --to=llong@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox