* [PATCH 0/2] Use kmem_cache for memcg alloc
@ 2025-04-23 8:43 Huan Yang
2025-04-23 8:43 ` [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg Huan Yang
2025-04-23 8:43 ` [PATCH 2/2] mm/memcg: use kmem_cache when alloc memcg pernode info Huan Yang
0 siblings, 2 replies; 9+ messages in thread
From: Huan Yang @ 2025-04-23 8:43 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel
Cc: opensource.kernel, Huan Yang
mem_cgroup_alloc() creates the mem_cgroup struct along with several other
structures, including mem_cgroup_per_node.
On our machine (arm64, 16G RAM, kernel 6.6, 1 NUMA node, memcg v2 with
nokmem,nosocket and cgroup_disable=pressure), run a shell like this:
echo 1 > /sys/kernel/tracing/events/kmem/kmalloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmalloc | grep mem_cgroup
# In other shell terminal
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The ftrace output then looks like this:
# mem_cgroup struct alloc in mem_cgroup_alloc
sh-6312 [000] ..... 58015.698365: kmalloc:
call_site=mem_cgroup_css_alloc+0xd8/0x5b4 ptr=000000003e4c3799
bytes_req=2312 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO
node=-1 accounted=false
# mem_cgroup_per_node alloc in alloc_mem_cgroup_per_node_info
sh-6312 [000] ..... 58015.698389: kmalloc:
call_site=mem_cgroup_css_alloc+0x1d8/0x5b4 ptr=00000000d798700c
bytes_req=2896 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO
node=0 accounted=false
Both of them use kmalloc to allocate the target struct, whose size falls
between 2K and 4K, yet both end up with a 4K slab object_size.
The reason is that generic kmalloc provides a fixed set of pre-defined slab
size classes; on our machine: 64, 128, 192, 256, ..., 2k, 4k, 8k.
So when allocating a mem_cgroup or a mem_cgroup_per_node, the requests of
2312 and 2896 bytes fall between 2K and 4K, and since there is no
intermediate size class (e.g. 3k), both fall back to the 4K slab object_size.
As a result, each memcg wastes 1784 bytes, plus 1200 bytes for its single
node's per-node info, 2984 bytes per memcg in total. With multiple NUMA
nodes the per-node waste grows to 1200 bytes * nr_node_ids (while the
mem_cgroup struct itself grows by 8 bytes * nr_node_ids for the nodeinfo
pointers).
This is a small but avoidable waste.
This patchset adds a kmem_cache for each struct:
patch 1 - mem_cgroup, cache named memcg_cachep
patch 2 - mem_cgroup_per_node, cache named memcg_pn_cachep
The benefit can be observed with this shell:
echo 1 > /sys/kernel/tracing/events/kmem/kmem_cache_alloc/enable
echo 1 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace_pipe | grep kmem_cache_alloc | grep mem_cgroup
# In other shell terminal
echo +memory > /sys/fs/cgroup/cgroup.subtree_control
The output then looks like this:
sh-9827 [000] ..... 289.513598: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=00000000695c1806
bytes_req=2312 bytes_alloc=2368 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
sh-9827 [000] ..... 289.513602: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=000000002989e63a
bytes_req=2896 bytes_alloc=2944 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
That means a mem_cgroup requests 2312 bytes and is given a 2368-byte object,
and a mem_cgroup_per_node requests 2896 bytes and is given a 2944-byte
object. The difference is because each kmem_cache is created with
SLAB_HWCACHE_ALIGN.
Without SLAB_HWCACHE_ALIGN, the output looks like this:
sh-9269 [003] ..... 80.396366: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=000000005b12b475
bytes_req=2312 bytes_alloc=2312 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
sh-9269 [003] ..... 80.396411: kmem_cache_alloc:
call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=00000000f347adc6
bytes_req=2896 bytes_alloc=2896 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
Here bytes_alloc matches the request exactly, but I believe the hardware
cacheline alignment is worth the small overhead, so this patchset enables it
by default. Please correct me if I am wrong.
So how many memcgs show up on a real machine? That depends on how memory
resources are grouped per task. On our phones we use a per-app memcg, so it
depends on how many apps the user has installed and opened, perhaps 10 - 100.
If grouping by uid/pid instead, there can be more, perhaps 400 - 1000 or
beyond.
Huan Yang (2):
mm/memcg: use kmem_cache when alloc memcg
mm/memcg: use kmem_cache when alloc memcg pernode info
mm/memcontrol.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
base-commit: 2c9c612abeb38aab0e87d48496de6fd6daafb00b
--
2.48.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-23 8:43 [PATCH 0/2] Use kmem_cache for memcg alloc Huan Yang
@ 2025-04-23 8:43 ` Huan Yang
2025-04-23 21:59 ` Andrew Morton
2025-04-23 8:43 ` [PATCH 2/2] mm/memcg: use kmem_cache when alloc memcg pernode info Huan Yang
1 sibling, 1 reply; 9+ messages in thread
From: Huan Yang @ 2025-04-23 8:43 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel
Cc: opensource.kernel, Huan Yang
When tracing mem_cgroup_alloc() using trace_kmalloc, we may get this output:
kmalloc: call_site=mem_cgroup_css_alloc+0xd8/0x5b4 ptr=000000003e4c3799
bytes_req=2312 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
accounted=false
With multiple NUMA nodes, the request may be a little above 2312 bytes
(8 bytes * nr_node_ids more). That means each mem_cgroup struct needs only
2312 bytes, but kmalloc hands out 4096: only kmalloc-2k and kmalloc-4k exist
in that range, so the request falls back to kmalloc-4k. This wastes a little
memory.
This patch uses kmem_cache_alloc() to allocate the mem_cgroup struct. Ftrace
then shows:
kmem_cache_alloc: call_site=mem_cgroup_css_alloc+0xbc/0x5d4
ptr=00000000695c1806 bytes_req=2312 bytes_alloc=2368
gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
Compared to the 1784 bytes previously wasted per memcg, this only 'wastes'
56 bytes, due to hardware cacheline alignment.
Signed-off-by: Huan Yang <link@vivo.com>
---
mm/memcontrol.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5e2ea8b8a898..cb32a498e5ae 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -95,6 +95,8 @@ static bool cgroup_memory_nokmem __ro_after_init;
/* BPF memory accounting disabled? */
static bool cgroup_memory_nobpf __ro_after_init;
+static struct kmem_cache *memcg_cachep;
+
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
#endif
@@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
int __maybe_unused i;
long error;
- memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
+ memcg = likely(memcg_cachep) ?
+ kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
+ kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
+ GFP_KERNEL);
if (!memcg)
return ERR_PTR(-ENOMEM);
@@ -5039,6 +5044,7 @@ __setup("cgroup.memory=", cgroup_memory);
static int __init mem_cgroup_init(void)
{
int cpu;
+ unsigned int memcg_size;
/*
* Currently s32 type (can refer to struct batched_lruvec_stat) is
@@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
drain_local_stock);
+ memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
+ memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
+ SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
+
return 0;
}
subsys_initcall(mem_cgroup_init);
--
2.48.1
* [PATCH 2/2] mm/memcg: use kmem_cache when alloc memcg pernode info
2025-04-23 8:43 [PATCH 0/2] Use kmem_cache for memcg alloc Huan Yang
2025-04-23 8:43 ` [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg Huan Yang
@ 2025-04-23 8:43 ` Huan Yang
1 sibling, 0 replies; 9+ messages in thread
From: Huan Yang @ 2025-04-23 8:43 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel
Cc: opensource.kernel, Huan Yang
When tracing the memcg per-node allocation using trace_kmalloc, we may get
this output:
kmalloc: call_site=mem_cgroup_css_alloc+0x1d8/0x5b4 ptr=00000000d798700c
bytes_req=2896 bytes_alloc=4096 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
accounted=false
Each memcg allocates per-node info for every NUMA node. Each
mem_cgroup_per_node needs 2896 bytes, but kmalloc hands out 4096, which
means one memcg wastes 1200 bytes per NUMA node.
This patch uses a kmem_cache when allocating mem_cgroup_per_node for each
NUMA node. Ftrace then shows:
kmem_cache_alloc: call_site=mem_cgroup_css_alloc+0x1b8/0x5d4
ptr=000000002989e63a bytes_req=2896 bytes_alloc=2944
gfp_flags=GFP_KERNEL|__GFP_ZERO node=0 accounted=false
Compared to kmalloc, this only 'wastes' 48 bytes per node, due to hardware
cacheline alignment.
Signed-off-by: Huan Yang <link@vivo.com>
---
mm/memcontrol.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cb32a498e5ae..e8797382aeb4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -96,6 +96,7 @@ static bool cgroup_memory_nokmem __ro_after_init;
static bool cgroup_memory_nobpf __ro_after_init;
static struct kmem_cache *memcg_cachep;
+static struct kmem_cache *memcg_pn_cachep;
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -3601,7 +3602,10 @@ static bool alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
+ pn = likely(memcg_pn_cachep) ?
+ kmem_cache_alloc_node(memcg_pn_cachep,
+ GFP_KERNEL | __GFP_ZERO, node) :
+ kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
if (!pn)
return false;
@@ -5065,6 +5069,9 @@ static int __init mem_cgroup_init(void)
memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
+ memcg_pn_cachep = KMEM_CACHE(mem_cgroup_per_node,
+ SLAB_PANIC | SLAB_HWCACHE_ALIGN);
+
return 0;
}
subsys_initcall(mem_cgroup_init);
--
2.48.1
* Re: [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-23 8:43 ` [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg Huan Yang
@ 2025-04-23 21:59 ` Andrew Morton
2025-04-23 22:13 ` Shakeel Butt
2025-04-24 1:46 ` Huan Yang
0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2025-04-23 21:59 UTC (permalink / raw)
To: Huan Yang
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, cgroups, linux-mm, linux-kernel, opensource.kernel
On Wed, 23 Apr 2025 16:43:04 +0800 Huan Yang <link@vivo.com> wrote:
> @@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
> int __maybe_unused i;
> long error;
>
> - memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
> + memcg = likely(memcg_cachep) ?
> + kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
> + kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
> + GFP_KERNEL);
Why are we testing for memcg_cachep=NULL?
> @@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
> drain_local_stock);
>
> + memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
> + memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
> + SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
If it's because this allocation might have failed then let's not
bother. If an __init-time allocation failed, this kernel is unusable
anyway.
* Re: [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-23 21:59 ` Andrew Morton
@ 2025-04-23 22:13 ` Shakeel Butt
2025-04-24 2:45 ` Huan Yang
2025-04-24 1:46 ` Huan Yang
1 sibling, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2025-04-23 22:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Huan Yang, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, cgroups, linux-mm, linux-kernel, opensource.kernel
On Wed, Apr 23, 2025 at 02:59:12PM -0700, Andrew Morton wrote:
> On Wed, 23 Apr 2025 16:43:04 +0800 Huan Yang <link@vivo.com> wrote:
>
> > @@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
> > int __maybe_unused i;
> > long error;
> >
> > - memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
> > + memcg = likely(memcg_cachep) ?
> > + kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
> > + kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
> > + GFP_KERNEL);
>
> Why are we testing for memcg_cachep=NULL?
>
> > @@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
> > INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
> > drain_local_stock);
> >
> > + memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
> > + memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
> > + SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
>
> If it's because this allocation might have failed then let's not
> bother. If an __init-time allocation failed, this kernel is unusable
> anyway.
+1 to Andrew's point. SLAB_PANIC is used here, so, memcg_cachep can't be
NULL later.
* Re: [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-23 21:59 ` Andrew Morton
2025-04-23 22:13 ` Shakeel Butt
@ 2025-04-24 1:46 ` Huan Yang
1 sibling, 0 replies; 9+ messages in thread
From: Huan Yang @ 2025-04-24 1:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, cgroups, linux-mm, linux-kernel, opensource.kernel
在 2025/4/24 05:59, Andrew Morton 写道:
> On Wed, 23 Apr 2025 16:43:04 +0800 Huan Yang <link@vivo.com> wrote:
>
>> @@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
>> int __maybe_unused i;
>> long error;
>>
>> - memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
>> + memcg = likely(memcg_cachep) ?
>> + kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
>> + kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
>> + GFP_KERNEL);
> Why are we testing for memcg_cachep=NULL?
Because cgroup_init() runs ahead of the initcalls, creating root_mem_cgroup
through the cache would otherwise trigger a NULL pointer access.
So it is tested here. :)
I haven't found a better way to avoid this.
>
>> @@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
>> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
>> drain_local_stock);
>>
>> + memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
>> + memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
>> + SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
> If it's because this allocation might have failed then let's not
> bother. If an __init-time allocation failed, this kernel is unusable
> anyway.
>
>
* Re: [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-23 22:13 ` Shakeel Butt
@ 2025-04-24 2:45 ` Huan Yang
2025-04-24 3:31 ` Shakeel Butt
0 siblings, 1 reply; 9+ messages in thread
From: Huan Yang @ 2025-04-24 2:45 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
cgroups, linux-mm, linux-kernel, opensource.kernel
Hi Shakeel
在 2025/4/24 06:13, Shakeel Butt 写道:
> On Wed, Apr 23, 2025 at 02:59:12PM -0700, Andrew Morton wrote:
>> On Wed, 23 Apr 2025 16:43:04 +0800 Huan Yang <link@vivo.com> wrote:
>>
>>> @@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
>>> int __maybe_unused i;
>>> long error;
>>>
>>> - memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
>>> + memcg = likely(memcg_cachep) ?
>>> + kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
>>> + kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
>>> + GFP_KERNEL);
>> Why are we testing for memcg_cachep=NULL?
>>
>>> @@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
>>> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
>>> drain_local_stock);
>>>
>>> + memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
>>> + memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
>>> + SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
>> If it's because this allocation might have failed then let's not
>> bother. If an __init-time allocation failed, this kernel is unusable
>> anyway.
>>
> +1 to Andrew's point. SLAB_PANIC is used here, so, memcg_cachep can't be
> NULL later.
I see that cgroup_init() (in start_kernel) runs ahead of the initcalls
(which run in rest_init->kernel_init->...->initcall). So using the cache
when creating root_mem_cgroup would trigger a NULL pointer access.
Thanks,
Huan
* Re: [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-24 2:45 ` Huan Yang
@ 2025-04-24 3:31 ` Shakeel Butt
2025-04-24 3:40 ` Huan Yang
0 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2025-04-24 3:31 UTC (permalink / raw)
To: Huan Yang
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, cgroups, linux-mm, linux-kernel, opensource.kernel
On Thu, Apr 24, 2025 at 10:45:05AM +0800, Huan Yang wrote:
> Hi Shakeel
>
> 在 2025/4/24 06:13, Shakeel Butt 写道:
> > On Wed, Apr 23, 2025 at 02:59:12PM -0700, Andrew Morton wrote:
> > > On Wed, 23 Apr 2025 16:43:04 +0800 Huan Yang <link@vivo.com> wrote:
> > >
> > > > @@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
> > > > int __maybe_unused i;
> > > > long error;
> > > > - memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
> > > > + memcg = likely(memcg_cachep) ?
> > > > + kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
> > > > + kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
> > > > + GFP_KERNEL);
> > > Why are we testing for memcg_cachep=NULL?
> > >
> > > > @@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
> > > > INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
> > > > drain_local_stock);
> > > > + memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
> > > > + memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
> > > > + SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
> > > If it's because this allocation might have failed then let's not
> > > bother. If an __init-time allocation failed, this kernel is unusable
> > > anyway.
> > >
> > +1 to Andrew's point. SLAB_PANIC is used here, so, memcg_cachep can't be
> > NULL later.
>
> I see that cgroup_init() (in start_kernel) runs ahead of the initcalls
> (which run in rest_init->kernel_init->...->initcall). So using the cache
> when creating root_mem_cgroup would trigger a NULL pointer access.
So, either create a new function which creates this kmem cache before
cgroup_init() or just call mem_cgroup_init() before cgroup_init()
(similar to cpuset_init()).
* Re: [PATCH 1/2] mm/memcg: use kmem_cache when alloc memcg
2025-04-24 3:31 ` Shakeel Butt
@ 2025-04-24 3:40 ` Huan Yang
0 siblings, 0 replies; 9+ messages in thread
From: Huan Yang @ 2025-04-24 3:40 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, cgroups, linux-mm, linux-kernel, opensource.kernel
Hi Shakeel
在 2025/4/24 11:31, Shakeel Butt 写道:
> On Thu, Apr 24, 2025 at 10:45:05AM +0800, Huan Yang wrote:
>> Hi Shakeel
>>
>> 在 2025/4/24 06:13, Shakeel Butt 写道:
>>> On Wed, Apr 23, 2025 at 02:59:12PM -0700, Andrew Morton wrote:
>>>> On Wed, 23 Apr 2025 16:43:04 +0800 Huan Yang <link@vivo.com> wrote:
>>>>
>>>>> @@ -3652,7 +3654,10 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
>>>>> int __maybe_unused i;
>>>>> long error;
>>>>> - memcg = kzalloc(struct_size(memcg, nodeinfo, nr_node_ids), GFP_KERNEL);
>>>>> + memcg = likely(memcg_cachep) ?
>>>>> + kmem_cache_zalloc(memcg_cachep, GFP_KERNEL) :
>>>>> + kzalloc(struct_size(memcg, nodeinfo, nr_node_ids),
>>>>> + GFP_KERNEL);
>>>> Why are we testing for memcg_cachep=NULL?
>>>>
>>>>> @@ -5055,6 +5061,10 @@ static int __init mem_cgroup_init(void)
>>>>> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
>>>>> drain_local_stock);
>>>>> + memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
>>>>> + memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
>>>>> + SLAB_PANIC | SLAB_HWCACHE_ALIGN, NULL);
>>>> If it's because this allocation might have failed then let's not
>>>> bother. If an __init-time allocation failed, this kernel is unusable
>>>> anyway.
>>>>
>>> +1 to Andrew's point. SLAB_PANIC is used here, so, memcg_cachep can't be
>>> NULL later.
>> I see that cgroup_init() (in start_kernel) runs ahead of the initcalls
>> (which run in rest_init->kernel_init->...->initcall). So using the cache
>> when creating root_mem_cgroup would trigger a NULL pointer access.
> So, either create a new function which creates this kmem cache before
> cgroup_init() or just call mem_cgroup_init() before cgroup_init()
> (similar to cpuset_init()).
Do you mean like this:
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5264d148bdd9..e9a4fc5aabc7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -348,6 +348,7 @@ enum objext_flags {
#ifdef CONFIG_MEMCG
+extern int mem_cgroup_init(void);
static inline bool folio_memcg_kmem(struct folio *folio);
/*
@@ -1059,6 +1060,8 @@ static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
#else /* CONFIG_MEMCG */
+static inline int mem_cgroup_init(void) { return 0; }
+
#define MEM_CGROUP_ID_SHIFT 0
static inline struct mem_cgroup *folio_memcg(struct folio *folio)
diff --git a/init/main.c b/init/main.c
index 6b14e6116a1f..d9c646960fdd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -50,6 +50,7 @@
#include <linux/writeback.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
+#include <linux/memcontrol.h>
#include <linux/cgroup.h>
#include <linux/efi.h>
#include <linux/tick.h>
@@ -1087,6 +1088,7 @@ void start_kernel(void)
nsfs_init();
pidfs_init();
cpuset_init();
+ mem_cgroup_init();
cgroup_init();
taskstats_init_early();
delayacct_init();
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5e2ea8b8a898..1537562c01c2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5029,14 +5029,12 @@ static int __init cgroup_memory(char *s)
__setup("cgroup.memory=", cgroup_memory);
/*
- * subsys_initcall() for memory controller.
- *
* Some parts like memcg_hotplug_cpu_dead() have to be initialized from this
* context because of lock dependencies (cgroup_lock -> cpu hotplug) but
* basically everything that doesn't depend on a specific mem_cgroup structure
* should be initialized from here.
*/
-static int __init mem_cgroup_init(void)
+int __init mem_cgroup_init(void)
{
int cpu;
@@ -5057,7 +5055,6 @@ static int __init mem_cgroup_init(void)
return 0;
}
-subsys_initcall(mem_cgroup_init);
#ifdef CONFIG_SWAP
/**
Anyway, I'll test it. :)
Thanks for your suggestion.
Thank you,
Huan