* [PATCH] memcg: add hierarchical effective limits for v2
From: Shakeel Butt @ 2025-02-05 22:20 UTC
To: Tejun Heo, Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Muchun Song, Michal Koutný,
linux-mm, cgroups, linux-kernel, Meta kernel team
Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
memory.stat file, which applications can use to get their effective
limit, i.e. the minimum of the limits of the cgroup itself and all of
its ancestors. This is pretty useful in environments where a cgroup
namespace is used and the application does not have access to the full
view of the cgroup hierarchy. Let's expose effective limits for memcg
v2 as well.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
Documentation/admin-guide/cgroup-v2.rst | 24 +++++++++++++
mm/memcontrol.c | 48 +++++++++++++++++++++++++
2 files changed, 72 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cb1b4e759b7e..175e9435ad5c 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1311,6 +1311,14 @@ PAGE_SIZE multiple when read back.
Caller could retry them differently, return into userspace
as -ENOMEM or silently ignore in cases like disk readahead.
+ memory.max.effective
+ A read-only single value file which exists on non-root cgroups.
+
+ The effective limit of the cgroup, i.e. the minimum memory.max
+ of all ancestors including itself. This is useful for environments
+ where a cgroup namespace is being used and the application does
+ not have a full view of the hierarchy.
+
memory.reclaim
A write-only nested-keyed file which exists for all cgroups.
@@ -1726,6 +1734,14 @@ The following nested keys are defined.
Swap usage hard limit. If a cgroup's swap usage reaches this
limit, anonymous memory of the cgroup will not be swapped out.
+ memory.swap.max.effective
+ A read-only single value file which exists on non-root cgroups.
+
+ The effective limit of the cgroup, i.e. the minimum memory.swap.max
+ of all ancestors including itself. This is useful for environments
+ where a cgroup namespace is being used and the application does
+ not have a full view of the hierarchy.
+
memory.swap.events
A read-only flat-keyed file which exists on non-root cgroups.
The following entries are defined. Unless specified
@@ -1766,6 +1782,14 @@ The following nested keys are defined.
limit, it will refuse to take any more stores before existing
entries fault back in or are written out to disk.
+ memory.zswap.max.effective
+ A read-only single value file which exists on non-root cgroups.
+
+ The effective limit of the cgroup, i.e. the minimum memory.zswap.max
+ of all ancestors including itself. This is useful for environments
+ where a cgroup namespace is being used and the application does
+ not have a full view of the hierarchy.
+
memory.zswap.writeback
A read-write single value file. The default value is "1".
Note that this setting is hierarchical, i.e. the writeback would be
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cae1c2e0cc71..8d21c1a44220 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4161,6 +4161,17 @@ static int memory_max_show(struct seq_file *m, void *v)
READ_ONCE(mem_cgroup_from_seq(m)->memory.max));
}
+static int memory_max_effective_show(struct seq_file *m, void *v)
+{
+ unsigned long max = PAGE_COUNTER_MAX;
+ struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+ for (; memcg; memcg = parent_mem_cgroup(memcg))
+ max = min(max, READ_ONCE(memcg->memory.max));
+
+ return seq_puts_memcg_tunable(m, max);
+}
+
static ssize_t memory_max_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
@@ -4438,6 +4449,11 @@ static struct cftype memory_files[] = {
.seq_show = memory_max_show,
.write = memory_max_write,
},
+ {
+ .name = "max.effective",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = memory_max_effective_show,
+ },
{
.name = "events",
.flags = CFTYPE_NOT_ON_ROOT,
@@ -5117,6 +5133,17 @@ static int swap_max_show(struct seq_file *m, void *v)
READ_ONCE(mem_cgroup_from_seq(m)->swap.max));
}
+static int swap_max_effective_show(struct seq_file *m, void *v)
+{
+ unsigned long max = PAGE_COUNTER_MAX;
+ struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+ for (; memcg; memcg = parent_mem_cgroup(memcg))
+ max = min(max, READ_ONCE(memcg->swap.max));
+
+ return seq_puts_memcg_tunable(m, max);
+}
+
static ssize_t swap_max_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
@@ -5166,6 +5193,11 @@ static struct cftype swap_files[] = {
.seq_show = swap_max_show,
.write = swap_max_write,
},
+ {
+ .name = "swap.max.effective",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = swap_max_effective_show,
+ },
{
.name = "swap.peak",
.flags = CFTYPE_NOT_ON_ROOT,
@@ -5308,6 +5340,17 @@ static int zswap_max_show(struct seq_file *m, void *v)
READ_ONCE(mem_cgroup_from_seq(m)->zswap_max));
}
+static int zswap_max_effective_show(struct seq_file *m, void *v)
+{
+ unsigned long max = PAGE_COUNTER_MAX;
+ struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+ for (; memcg; memcg = parent_mem_cgroup(memcg))
+ max = min(max, READ_ONCE(memcg->zswap_max));
+
+ return seq_puts_memcg_tunable(m, max);
+}
+
static ssize_t zswap_max_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
@@ -5362,6 +5405,11 @@ static struct cftype zswap_files[] = {
.seq_show = zswap_max_show,
.write = zswap_max_write,
},
+ {
+ .name = "zswap.max.effective",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = zswap_max_effective_show,
+ },
{
.name = "zswap.writeback",
.seq_show = zswap_writeback_show,
--
2.43.5
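For illustration, this is how an application inside a container could
consume the proposed file; a minimal sketch (the cgroup path is a
made-up example, and a real application would resolve its own cgroup
from /proc/self/cgroup first):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char buf[64];
		FILE *f = fopen("/sys/fs/cgroup/workload/memory.max.effective", "r");

		if (!f || !fgets(buf, sizeof(buf), f))
			return 1;
		fclose(f);

		buf[strcspn(buf, "\n")] = '\0';
		/* Like memory.max, the file prints "max" when no limit applies. */
		if (!strcmp(buf, "max"))
			printf("no effective limit\n");
		else
			printf("effective limit: %s bytes\n", buf);
		return 0;
	}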
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Balbir Singh @ 2025-02-05 22:33 UTC
To: Shakeel Butt, Tejun Heo, Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Muchun Song, Michal Koutný,
linux-mm, cgroups, linux-kernel, Meta kernel team
On 2/6/25 09:20, Shakeel Butt wrote:
> Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
> memory.stat file, which applications can use to get their effective
> limit, i.e. the minimum of the limits of the cgroup itself and all of
> its ancestors. This is pretty useful in environments where a cgroup
> namespace is used and the application does not have access to the full
> view of the cgroup hierarchy. Let's expose effective limits for memcg
> v2 as well.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Even without namespaces, in a hierarchy the application might be
restricted from reading the parent cgroup information (read permission
removed, for example). Otherwise this looks good to me.
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Michal Koutný @ 2025-02-06 15:57 UTC
To: Shakeel Butt
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, linux-mm, cgroups, linux-kernel, Meta kernel team
Hello Shakeel.
On Wed, Feb 05, 2025 at 02:20:29PM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
> memory.stat file, which applications can use to get their effective
> limit, i.e. the minimum of the limits of the cgroup itself and all of
> its ancestors.
I was a fan of the same idea too [1]. The referenced series also tackles
change notifications (to make this complete for apps that really want to
scale based on the actual limit). I ceased to like it when I realized
there can be hierarchies where the effective value cannot be effectively
:) determined [2].
> This is pretty useful in environments where a cgroup namespace is used
> and the application does not have access to the full view of the
> cgroup hierarchy. Let's expose effective limits for memcg v2 as well.
Also, the case for exposing this was never strongly built.
Why isn't PSI enough in your case?
Thanks,
Michal
[1] https://lore.kernel.org/r/20240606152232.20253-1-mkoutny@suse.com
[2] https://lore.kernel.org/r/7chi6d2sdhwdsfihoxqmtmi4lduea3dsgc7xorvonugkm4qz2j@gehs4slutmtg
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Shakeel Butt @ 2025-02-06 19:09 UTC
To: Michal Koutný
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, linux-mm, cgroups, linux-kernel, Meta kernel team
On Thu, Feb 06, 2025 at 04:57:39PM +0100, Michal Koutný wrote:
> Hello Shakeel.
>
> On Wed, Feb 05, 2025 at 02:20:29PM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
> > memory.stat file, which applications can use to get their effective
> > limit, i.e. the minimum of the limits of the cgroup itself and all of
> > its ancestors.
>
> I was a fan of the same idea too [1]. The referenced series also tackles
> change notifications (to make this complete for apps that really want to
> scale based on the actual limit). I ceased to like it when I realized
> there can be hierarchies where the effective value cannot be effectively
> :) determined [2].
>
> > This is pretty useful in environments where a cgroup namespace is used
> > and the application does not have access to the full view of the
> > cgroup hierarchy. Let's expose effective limits for memcg v2 as well.
>
> Also, the case for exposing this was never strongly built.
> Why isn't PSI enough in your case?
>
Hi Michal,
Oh I totally forgot about your series. In my use-case, it is not about
dynamically knowing how much they can expand and adjust themselves but
rather knowing statically upfront what resources they have been given.

More concretely, these are workloads which used to completely occupy a
single machine, though within containers, without limits. These
workloads used to look at machine-level metrics at startup to see how
many resources are available.

Now these workloads are being moved to a multi-tenant environment but
the machine is still partitioned statically between the workloads. So,
these workloads need to know upfront how many resources are allocated
to them, and the way the cgroup hierarchy is set up, that information
is a bit higher up the tree.

I hope this clarifies the motivation behind this change, i.e. the
target is not dynamic load balancing but rather upfront static
knowledge.
thanks,
Shakeel
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: T.J. Mercier @ 2025-02-06 19:37 UTC
To: Shakeel Butt
Cc: Michal Koutný,
Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, linux-mm, cgroups, linux-kernel, Meta kernel team
On Thu, Feb 6, 2025 at 11:09 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Feb 06, 2025 at 04:57:39PM +0100, Michal Koutný wrote:
> > Hello Shakeel.
> >
> > On Wed, Feb 05, 2025 at 02:20:29PM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
> > > memory.stat file, which applications can use to get their effective
> > > limit, i.e. the minimum of the limits of the cgroup itself and all
> > > of its ancestors.
> >
> > I was a fan of the same idea too [1]. The referenced series also tackles
> > change notifications (to make this complete for apps that really want to
> > scale based on the actual limit). I ceased to like it when I realized
> > there can be hierarchies where the effective value cannot be effectively
> > :) determined [2].
> >
> > > This is pretty useful in environments where a cgroup namespace is
> > > used and the application does not have access to the full view of
> > > the cgroup hierarchy. Let's expose effective limits for memcg v2 as
> > > well.
> >
> > Also, the case for exposing this was never strongly built.
> > Why isn't PSI enough in your case?
> >
>
> Hi Michal,
>
> Oh I totally forgot about your series. In my use-case, it is not about
> dynamically knowing how much they can expand and adjust themselves but
> rather knowing statically upfront what resources they have been given.
> More concretely, these are workloads which used to completely occupy a
> single machine, though within containers, without limits. These
> workloads used to look at machine-level metrics at startup to see how
> many resources are available.
>
> Now these workloads are being moved to a multi-tenant environment but
> the machine is still partitioned statically between the workloads. So,
> these workloads need to know upfront how many resources are allocated
> to them, and the way the cgroup hierarchy is set up, that information
> is a bit higher up the tree.
>
> I hope this clarifies the motivation behind this change, i.e. the
> target is not dynamic load balancing but rather upfront static knowledge.
>
> thanks,
> Shakeel
>
We've been thinking of using memcg to both protect (memory.min) and
limit (via memcg OOM) memory-hungry apps (games), while informing such
apps of their upper limit so they know how much they can allocate
before risking being killed. Visibility of the cgroup hierarchy isn't
an issue, but having a single file to read instead of walking up the
tree with multiple reads to calculate an effective limit would be
nice. Partial memcg activation in the hierarchy *is* an issue, but
walking up to the closest ancestor with memcg activated is better than
reading all the way up.
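For reference, a rough userspace sketch of that walk, i.e. of what the
proposed memory.max.effective would compute in a single read (the paths
are illustrative and error handling is minimal):

	#include <limits.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/* Minimum memory.max over the cgroup and all of its ancestors. */
	static unsigned long long effective_max(const char *cgroup)
	{
		char path[PATH_MAX], file[PATH_MAX + 16], buf[64];
		unsigned long long min = ULLONG_MAX, val;
		FILE *f;

		snprintf(path, sizeof(path), "%s", cgroup);
		/* Walk up until the cgroupfs mount point is reached. */
		while (strlen(path) > strlen("/sys/fs/cgroup")) {
			snprintf(file, sizeof(file), "%s/memory.max", path);
			f = fopen(file, "r");
			if (f) {
				if (fgets(buf, sizeof(buf), f) &&
				    strcmp(buf, "max\n")) {
					val = strtoull(buf, NULL, 10);
					if (val < min)
						min = val;
				}
				fclose(f);
			}
			*strrchr(path, '/') = '\0';	/* up one level */
		}
		return min;	/* ULLONG_MAX means no limit anywhere */
	}

	int main(void)
	{
		printf("%llu\n", effective_max("/sys/fs/cgroup/a/b/workload"));
		return 0;
	}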
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Shakeel Butt @ 2025-02-06 22:24 UTC
To: Andrew Morton
Cc: Michal Hocko, Tejun Heo, Johannes Weiner, Roman Gushchin,
Muchun Song, Michal Koutný,
linux-mm, cgroups, linux-kernel, Meta kernel team
Oops, I forgot to CC Andrew.
On Wed, Feb 05, 2025 at 02:20:29PM -0800, Shakeel Butt wrote:
> Memcg-v1 exposes hierarchical_[memory|memsw]_limit counters in its
> memory.stat file, which applications can use to get their effective
> limit, i.e. the minimum of the limits of the cgroup itself and all of
> its ancestors. This is pretty useful in environments where a cgroup
> namespace is used and the application does not have access to the full
> view of the cgroup hierarchy. Let's expose effective limits for memcg
> v2 as well.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> [...]
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Michal Koutný @ 2025-02-10 16:24 UTC
To: Shakeel Butt, T.J. Mercier
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, linux-mm, cgroups, linux-kernel, Meta kernel team
Hello.
On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> Oh I totally forgot about your series. In my use-case, it is not about
> dynamically knowing how much they can expand and adjust themselves but
> rather knowing statically upfront what resources they have been given.
From the memcg PoV, the effective value doesn't tell how much they were
given (because of sharing).
> More concretely, these are workloads which used to completely occupy a
> single machine, though within containers, without limits. These
> workloads used to look at machine-level metrics at startup to see how
> many resources are available.
I've been there but haven't found a convincing mapping of global to
memcg limits.

The issue is that such a value won't guarantee no OOM when staying
below it, because it can (generally) be effectively shared.

(Alas, apps typically don't express their memory needs in units of
PSI. So it boils down to a system-wide monitor like systemd-oomd and
cooperation with it.)
> Now these workloads are being moved to a multi-tenant environment but
> the machine is still partitioned statically between the workloads. So,
> these workloads need to know upfront how many resources are allocated
> to them, and the way the cgroup hierarchy is set up, that information
> is a bit higher up the tree.
FTR, e.g. in systemd setups, this can be partially overcome by the
exposed EffectiveMemoryMax= (the service manager that configures the
resources can also do the ancestry traversal).
Kubernetes has a downward API where generic resource info is shared
into containers, and I recall that lxcfs could mangle procfs memory
info wrt memory limits for legacy apps.

As I think about it, the cgroupns (in)visibility should be resolved by
assigning the proper limit to the namespace's root group's memory.max
(read-only for the contained user) and the traversal...
On Thu, Feb 06, 2025 at 11:37:31AM -0800, "T.J. Mercier" <tjmercier@google.com> wrote:
> but having a single file to read instead of walking up the
> tree with multiple reads to calculate an effective limit would be
> nice.
...in the kernel is nice, but the possible performance gain isn't worth
hiding the shareability of the effective limit.

So I wonder what the current PoV of more MM people is...
Michal
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Shakeel Butt @ 2025-02-10 18:34 UTC
To: Michal Koutný
Cc: T.J. Mercier, Tejun Heo, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
Meta kernel team
On Mon, Feb 10, 2025 at 05:24:17PM +0100, Michal Koutný wrote:
> Hello.
>
> On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > Oh I totally forgot about your series. In my use-case, it is not about
> > dynamically knowing how much they can expand and adjust themselves but
> > rather knowing statically upfront what resources they have been given.
>
> From the memcg PoV, the effective value doesn't tell how much they were
> given (because of sharing).
>
> > More concretely, these are workloads which used to completely occupy a
> > single machine, though within containers, without limits. These
> > workloads used to look at machine-level metrics at startup to see how
> > many resources are available.
>
> I've been there but haven't found a convincing mapping of global to
> memcg limits.
>
> The issue is that such a value won't guarantee no OOM when staying
> below it, because it can (generally) be effectively shared.
>
> (Alas, apps typically don't express their memory needs in units of
> PSI. So it boils down to a system-wide monitor like systemd-oomd and
> cooperation with it.)
>
I think you missed the static partitioning of resources use-case I
mentioned. The issue you are pointing out exists for system-level
metrics as well, i.e. a workload looking at system metrics can't say
how much it is given, but in my specific case the workloads know they
occupy the full machine. Now we want to move such workloads to a
multi-tenant environment, but the resources are still statically
partitioned and not overcommitted, so the effective limit will tell
them how much they are given.
> > Now these workloads are being moved to a multi-tenant environment but
> > the machine is still partitioned statically between the workloads. So,
> > these workloads need to know upfront how many resources are allocated
> > to them, and the way the cgroup hierarchy is set up, that information
> > is a bit higher up the tree.
>
> FTR, e.g. in systemd setups, this can be partially overcome by the
> exposed EffectiveMemoryMax= (the service manager that configures the
> resources can also do the ancestry traversal).
> Kubernetes has a downward API where generic resource info is shared
> into containers, and I recall that lxcfs could mangle procfs memory
> info wrt memory limits for legacy apps.
>
>
> As I think about it, the cgroupns (in)visibility should be resolved by
> assigning the proper limit to the namespace's root group's memory.max
> (read-only for the contained user) and the traversal...
>
I think your point here is: why not have a userspace-based solution? I
think it is possible, but not convenient, and it adds an external
dependency to the workload.
>
> On Thu, Feb 06, 2025 at 11:37:31AM -0800, "T.J. Mercier" <tjmercier@google.com> wrote:
> > but having a single file to read instead of walking up the
> > tree with multiple reads to calculate an effective limit would be
> > nice.
>
> ...in the kernel is nice, but the possible performance gain isn't
> worth hiding the shareability of the effective limit.
>
>
> So I wonder what the current PoV of more MM people is...
Yup, let's see more opinions on this.
Thanks, Michal, for your feedback.
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Johannes Weiner @ 2025-02-10 22:52 UTC
To: Michal Koutný
Cc: Shakeel Butt, T.J. Mercier, Tejun Heo, Michal Hocko,
Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
Meta kernel team
On Mon, Feb 10, 2025 at 05:24:17PM +0100, Michal Koutný wrote:
> Hello.
>
> On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > Oh I totally forgot about your series. In my use-case, it is not about
> > dynamically knowing how much they can expand and adjust themselves but
> > rather knowing statically upfront what resources they have been given.
>
> From the memcg PoV, the effective value doesn't tell how much they were
> given (because of sharing).
It's definitely true that if you have an ancestral limit for several
otherwise unlimited siblings, then interpreting this number as "this
is how much memory I have available" will be completely misleading.
I would also say that sharing a limit with several siblings requires a
certain degree of awareness and cooperation between them. From that
POV, IMO it would be fine to provide a metric with contextual caveats.
The problem is, what do we do with canned, unaware, maybe untrusted
applications? And they don't necessarily know which they are.
It depends heavily on the judgement of the administrator of any given
deployment. Some workloads might be completely untrusted and hard
limited. Another deployment might consider the same workload
reasonably predictable that it's configured only with a failsafe max
limit that is much higher than where the workload is *expected* to
operate. The allotment might happen altogether with min/low
protections and no max limit. Or there could be a combination of
protection slightly below and a limit slightly above the expected
workload size.
It seems basically impossible to write portable code against this
without knowing the intent of the person setting it up.
But how do we communicate intent down to the container? The two broad
options are implicitly or explicitly:
a) Provide a cgroup file that automatically derives intended target
size from how min/low/high/max are set up.
Right now those can be set up super loosely depending on what the
administrator thinks about the application. In order for this to
work, we'd likely have to define an idiomatic way of configuring
the controller. E.g. if you set max by itself, we assume this is
the target size. If you set low, with or without max, then low is
the target size. Or if you set both, target is in between.
I'm not completely convinced this is workable. It might require
settings beyond what's actually needed for the safe containment of
the workload, which carries the risk of excluding something useful.
I don't mean enforced configuration rules, but rather the case
where a configuration is reasonable and effective given the
workload and environment, but now the target file shows nonsense.
b) Provide a cgroup file that is freely configurable by the
administrator with the target size of the container.
This has obvious drawbacks as well. What's the default value? Also,
a lot of setups are dead simple: set a hard limit and expect the
workload to adhere to that, period. Nobody is going to reliably set
another cgroup file that a workload may or may not consume.
The third option is to wash our hands of all of this, provide the
static hierarchy settings to the leaves (like this patch, plus do it
for the other knobs as well) and let userspace figure it out.
Thoughts?
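For concreteness, a back-of-the-envelope sketch of option a)'s
derivation rules as worded above (purely illustrative; in particular
the "in between" policy for low+max is an assumption, not something the
thread settled on):

	#include <stdio.h>

	#define NO_LIMIT (~0UL)	/* stands in for an unset knob */

	/* Derive an intended target size from how the knobs were set up. */
	static unsigned long target_size(unsigned long low, unsigned long max)
	{
		if (low && max != NO_LIMIT)
			return low + (max - low) / 2;	/* both set: in between */
		if (low)
			return low;		/* low set: low is the target */
		if (max != NO_LIMIT)
			return max;		/* max alone: max is the target */
		return NO_LIMIT;		/* nothing set: no stated intent */
	}

	int main(void)
	{
		/* low = 1 GiB, max = 2 GiB -> target 1.5 GiB */
		printf("%lu\n", target_size(1UL << 30, 2UL << 30));
		return 0;
	}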
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Roman Gushchin @ 2025-02-11 4:55 UTC
To: Johannes Weiner
Cc: Michal Koutný,
Shakeel Butt, T.J. Mercier, Tejun Heo, Michal Hocko, Muchun Song,
linux-mm, cgroups, linux-kernel, Meta kernel team
On Mon, Feb 10, 2025 at 05:52:34PM -0500, Johannes Weiner wrote:
> On Mon, Feb 10, 2025 at 05:24:17PM +0100, Michal Koutný wrote:
> > Hello.
> >
> > On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > Oh I totally forgot about your series. In my use-case, it is not about
> > > dynamically knowing how much they can expand and adjust themselves but
> > > rather knowing statically upfront what resources they have been given.
> >
> > From the memcg PoV, the effective value doesn't tell how much they were
> > given (because of sharing).
>
> It's definitely true that if you have an ancestral limit for several
> otherwise unlimited siblings, then interpreting this number as "this
> is how much memory I have available" will be completely misleading.
>
> I would also say that sharing a limit with several siblings requires a
> certain degree of awareness and cooperation between them. From that
> POV, IMO it would be fine to provide a metric with contextual caveats.
>
> The problem is, what do we do with canned, unaware, maybe untrusted
> applications? And they don't necessarily know which they are.
>
> It depends heavily on the judgement of the administrator of any given
> deployment. Some workloads might be completely untrusted and hard
> limited. Another deployment might consider the same workload
> reasonably predictable that it's configured only with a failsafe max
> limit that is much higher than where the workload is *expected* to
> operate. The allotment might happen altogether with min/low
> protections and no max limit. Or there could be a combination of
> protection slightly below and a limit slightly above the expected
> workload size.
>
> It seems basically impossible to write portable code against this
> without knowing the intent of the person setting it up.
>
> But how do we communicate intent down to the container? The two broad
> options are implicitly or explicitly:
>
> a) Provide a cgroup file that automatically derives intended target
> size from how min/low/high/max are set up.
>
> Right now those can be set up super loosely depending on what the
> administrator thinks about the application. In order for this to
> work, we'd likely have to define an idiomatic way of configuring
> the controller. E.g. if you set max by itself, we assume this is
> the target size. If you set low, with or without max, then low is
> the target size. Or if you set both, target is in between.
>
> I'm not completely convinced this is workable.
This sounds like memory.available.

It's hard to implement well, especially taking into account things like
NUMA, memory sharing, estimating how much can be reclaimed, etc.
But at the same time there is value in providing such a metric.
There is a clear use case. And it's even harder to implement this
in userspace.
> b) Provide a cgroup file that is freely configurable by the
> administrator with the target size of the container.
>
> This has obvious drawbacks as well. What's the default value? Also,
> a lot of setups are dead simple: set a hard limit and expect the
> workload to adhere to that, period. Nobody is going to reliably set
> another cgroup file that a workload may or may not consume.
Yeah, this is a weird option.
>
> The third option is to wash our hands of all of this, provide the
> static hierarchy settings to the leaves (like this patch, plus do it
> for the other knobs as well) and let userspace figure it out.
Idk, I see very little value in it. I'm not necessarily opposing this
patchset, just not seeing a lot of value.
Maybe I'm missing something, but somehow it wasn't a problem for many years.
Nothing really changed here.
So maybe someone can come up with a better explanation of a specific problem
we're trying to solve here?
Thanks!
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Shakeel Butt @ 2025-02-12 1:08 UTC
To: Roman Gushchin
Cc: Johannes Weiner, Michal Koutný,
T.J. Mercier, Tejun Heo, Michal Hocko, Muchun Song, linux-mm,
cgroups, linux-kernel, Meta kernel team
On Tue, Feb 11, 2025 at 04:55:33AM +0000, Roman Gushchin wrote:
[...]
>
> Maybe I'm missing something, but somehow it wasn't a problem for many years.
> Nothing really changed here.
>
> So maybe someone can come up with a better explanation of a specific problem
> we're trying to solve here?
The simplest explanation is visibility. Workloads that used to run
solo are being moved to a multi-tenant but non-overcommitted
environment, and they need to know their capacity, which they used to
get from system metrics. Now they have to get it from cgroup limit
files, but the use of cgroup namespaces prevents those workloads from
extracting the needed information.
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Michal Koutný @ 2025-02-17 17:57 UTC
To: Shakeel Butt
Cc: Roman Gushchin, Johannes Weiner, T.J. Mercier, Tejun Heo,
Michal Hocko, Muchun Song, linux-mm, cgroups, linux-kernel,
Meta kernel team
Hello.
On Tue, Feb 11, 2025 at 05:08:03PM -0800, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > So maybe someone can come up with a better explanation of a specific problem
> > we're trying to solve here?
In my experience, another factor is the switch from v1 to v2 (which
propagates slower to downstreams) and applications that rely on
memory.stat:hierarchical_memory_limit. (Funnily enough, the commit
fee7b548e6f2b ("memcg: show real limit under hierarchy mode") introduced
it primarily for debugging purposes (not sizing): an application being
killed with no apparent (immediate) limit breach.)

Roman, you may also remember that this had already popped up about a
year ago [1].
> The simplest explanation is visibility. Workloads that used to run
> solo are being moved to a multi-tenant but non-overcommitted
> environment, and they need to know their capacity, which they used to
> get from system metrics.
> Now they have to get it from cgroup limit files, but the use of
> cgroup namespaces prevents those workloads from extracting the
> needed information.
I remember Shakeel said the limit may be set higher in the hierarchy for
container + siblings but then it's potentially overcommitted, no?
I.e. namespace visibility alone is not the problem. The cgns root's
memory.max is the shared medium between host and guest through which the
memory allowance can be passed -- that actually sounds to me like
Johannes' option b).
(Which leads me to an idea of memory.max.effective that'd only present
the value iff there's no sibling between tightest ancestor..self. If one
looks at nr_tasks, it's partial but correct memory available. Not that
useful due to the partiality.)
Since I was originally a fan of the idea, I'm not a strong opponent of
plain memory.max.effective, especially when Johannes considers the
option of the kernel stepping back here, and it may help some users.
But I'd like to see the original incarnations [2] somehow linked (and
maybe start only with memory.max, as that has some use cases).
Thanks,
Michal
[1] https://lore.kernel.org/all/ZcY7NmjkJMhGz8fP@host1.jankratochvil.net/
[2] https://lore.kernel.org/all/20240606152232.20253-1-mkoutny@suse.com/
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Shakeel Butt @ 2025-02-26 21:13 UTC
To: Michal Koutný
Cc: Roman Gushchin, Johannes Weiner, T.J. Mercier, Tejun Heo,
Michal Hocko, Muchun Song, linux-mm, cgroups, linux-kernel,
Meta kernel team
Sorry for the late response.
On Mon, Feb 17, 2025 at 06:57:46PM +0100, Michal Koutný wrote:
> Hello.
>
[...]
> > The simplest explanation is visibility. Workloads that used to run
> > solo are being moved to a multi-tenant but non-overcommitted
> > environment, and they need to know their capacity, which they used
> > to get from system metrics.
>
> > Now they have to get it from cgroup limit files, but the use of
> > cgroup namespaces prevents those workloads from extracting the
> > needed information.
>
> I remember Shakeel said the limit may be set higher in the hierarchy for
> container + siblings but then it's potentially overcommitted, no?
>
> I.e. namespace visibility alone is not the problem. The cgns root's
> memory.max is the shared medium between host and guest through which the
> memory allowance can be passed -- that actually sounds to me like
> Johannes' option b).
>
> (Which leads me to an idea of memory.max.effective that'd only present
> the value iff there's no sibling between tightest ancestor..self. If one
> looks at nr_tasks, it's partial but correct memory available. Not that
> useful due to the partiality.)
>
> Since I was originally a fan of the idea, I'm not a strong opponent of
> plain memory.max.effective, especially when Johannes considers the
> option of the kernel stepping back here, and it may help some users.
> But I'd like to see the original incarnations [2] somehow linked (and
> maybe start only with memory.max, as that has some use cases).
Yes, I can link [2] with more info added to the commit message.
Johannes, do you want an effective interface for low and min as well,
or for now just keep the current targeted interfaces?
>
> Thanks,
> Michal
>
> [1] https://lore.kernel.org/all/ZcY7NmjkJMhGz8fP@host1.jankratochvil.net/
> [2] https://lore.kernel.org/all/20240606152232.20253-1-mkoutny@suse.com/
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Johannes Weiner @ 2025-02-27 3:51 UTC
To: Shakeel Butt
Cc: Michal Koutný,
Roman Gushchin, T.J. Mercier, Tejun Heo, Michal Hocko,
Muchun Song, linux-mm, cgroups, linux-kernel, Meta kernel team
On Wed, Feb 26, 2025 at 01:13:28PM -0800, Shakeel Butt wrote:
> Sorry for the late response.
>
> On Mon, Feb 17, 2025 at 06:57:46PM +0100, Michal Koutný wrote:
> > Hello.
> >
>
> [...]
>
> > > The simplest explanation is visibility. Workloads that used to run
> > > solo are being moved to a multi-tenant but non-overcommitted
> > > environment, and they need to know their capacity, which they used
> > > to get from system metrics.
> >
> > > Now they have to get it from cgroup limit files, but the use of
> > > cgroup namespaces prevents those workloads from extracting the
> > > needed information.
> >
> > I remember Shakeel said the limit may be set higher in the hierarchy for
> > container + siblings but then it's potentially overcommitted, no?
> >
> > I.e. namespace visibility alone is not the problem. The cgns root's
> > memory.max is the shared medium between host and guest through which the
> > memory allowance can be passed -- that actually sounds to me like
> > Johannes' option b).
> >
> > (Which leads me to an idea of memory.max.effective that'd only present
> > the value iff there's no sibling between tightest ancestor..self. If one
> > looks at nr_tasks, it's partial but correct memory available. Not that
> > useful due to the partiality.)
> >
> > Since I was originally a fan of the idea, I'm not a strong opponent of
> > plain memory.max.effective, especially when Johannes considers the
> > option of the kernel stepping back here, and it may help some users.
> > But I'd like to see the original incarnations [2] somehow linked (and
> > maybe start only with memory.max, as that has some use cases).
>
> Yes, I can link [2] with more info added to the commit message.
>
> Johannes, do you want an effective interface for low and min as well,
> or for now just keep the current targeted interfaces?
I think it would make sense to do min, low, high, max for memory in
one go, as a complete new feature, rather than doing them one by one.
Tejun, what's your take on this, considering other controllers as
well? Does that seem like a reasonable solution to address the "I'm in
a namespace and can't see my configuration" problem?
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Andrew Morton @ 2025-03-17 1:12 UTC
To: Johannes Weiner
Cc: Shakeel Butt, Michal Koutný,
Roman Gushchin, T.J. Mercier, Tejun Heo, Michal Hocko,
Muchun Song, linux-mm, cgroups, linux-kernel, Meta kernel team
On Wed, 26 Feb 2025 22:51:55 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > start only with memory.max, as that has some use cases).
> >
> > Yes, I can link [2] with more info added to the commit message.
> >
> > Johannes, do you want an effective interface for low and min as well,
> > or for now just keep the current targeted interfaces?
>
> I think it would make sense to do min, low, high, max for memory in
> one go, as a complete new feature, rather than doing them one by one.
>
> Tejun, what's your take on this, considering other controllers as
> well? Does that seem like a reasonable solution to address the "I'm in
> a namespace and can't see my configuration" problem?
I guess Tejun missed this.
It seems that more think time is needed on this patch?
* Re: [PATCH] memcg: add hierarchical effective limits for v2
From: Tejun Heo @ 2025-03-17 18:06 UTC
To: Andrew Morton
Cc: Johannes Weiner, Shakeel Butt, Michal Koutný,
Roman Gushchin, T.J. Mercier, Michal Hocko, Muchun Song,
linux-mm, cgroups, linux-kernel, Meta kernel team
Hello,
On Sun, Mar 16, 2025 at 06:12:14PM -0700, Andrew Morton wrote:
> On Wed, 26 Feb 2025 22:51:55 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > > > start only with memory.max, as that has some use cases).
> > >
> > > Yes, I can link [2] with more info added to the commit message.
> > >
> > > Johannes, do you want an effective interface for low and min as well,
> > > or for now just keep the current targeted interfaces?
> >
> > I think it would make sense to do min, low, high, max for memory in
> > one go, as a complete new feature, rather than doing them one by one.
> >
> > Tejun, what's your take on this, considering other controllers as
> > well? Does that seem like a reasonable solution to address the "I'm in
> > a namespace and can't see my configuration" problem?
>
> I guess Tejun missed this.
>
> It seems that more think time is needed on this patch?
Oh yes, I did. My apologies and thanks for the poking.
I'm a bit doubtful that simply compounding the configured values and
presenting them to the nested cgroup would be a good solution here. It
does add more information, but given that the same values can indicate
multiple widely differing situations, I'm unsure how much value they
would provide. Wouldn't it just be providing more numbers to be
confused about?
My intuition is that most applications would want a single number to base
sizing decisions on and I don't know how they could handle the gap between
e.g. low and max without further information on configuration intent.
If someone has to provide the configuration intent anyway, why not just let
them provide the single number that the application would care about - the
intended memory amount for that application or container? We can provide a
dedicated cgroup file for it, or the admin can just set an xattr on the cgroup
directory. Maybe the xattr perm checks can be improved so that it aligns
better with subtree delegations for the latter.
I.e. I feel like .effective's are us trying to do *something* even if
that thing doesn't actually solve the problem. I'm not hard set on this
opinion tho and would really appreciate counter-arguments.
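For illustration, the xattr variant mentioned above might look like
this (the attribute name user.memory_target is made up; recent kernels
accept user.* xattrs on cgroup2 directories, which is what e.g.
systemd-oomd relies on):

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/xattr.h>

	int main(void)
	{
		const char *cg = "/sys/fs/cgroup/workload";	/* illustrative */
		char buf[32];
		ssize_t len;

		/* The admin side would be something like:
		 *   setxattr(cg, "user.memory_target", "8589934592", 10, 0);
		 */
		len = getxattr(cg, "user.memory_target", buf, sizeof(buf) - 1);
		if (len < 0) {
			perror("getxattr");
			return 1;
		}
		buf[len] = '\0';
		printf("intended memory amount: %s bytes\n", buf);
		return 0;
	}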
Thanks.
--
tejun
Thread overview:
2025-02-05 22:20 [PATCH] memcg: add hierarchical effective limits for v2 Shakeel Butt
2025-02-05 22:33 ` Balbir Singh
2025-02-06 15:57 ` Michal Koutný
2025-02-06 19:09 ` Shakeel Butt
2025-02-06 19:37 ` T.J. Mercier
2025-02-10 16:24 ` Michal Koutný
2025-02-10 18:34 ` Shakeel Butt
2025-02-10 22:52 ` Johannes Weiner
2025-02-11 4:55 ` Roman Gushchin
2025-02-12 1:08 ` Shakeel Butt
2025-02-17 17:57 ` Michal Koutný
2025-02-26 21:13 ` Shakeel Butt
2025-02-27 3:51 ` Johannes Weiner
2025-03-17 1:12 ` Andrew Morton
2025-03-17 18:06 ` Tejun Heo
2025-02-06 22:24 ` Shakeel Butt