linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] Use kmem_cache for memcg alloc
From: Huan Yang @ 2025-04-24 12:09 UTC
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Petr Mladek,
	Sebastian Andrzej Siewior, Huan Yang, Francesco Valla,
	Huang Shijie, KP Singh, Paul E. McKenney, Rasmus Villemoes,
	Uladzislau Rezki (Sony),
	Guo Weikang, Raul E Rangel, cgroups, linux-mm, linux-kernel,
	Boqun Feng, Geert Uytterhoeven
  Cc: opensource.kernel

The mem_cgroup_alloc() function creates the mem_cgroup struct and its
associated structures, including mem_cgroup_per_node.
On our test machine (Arm64, 16GB RAM, 6.6 kernel, 1 NUMA node, memcgv2 with
nokmem,nosocket,cgroup_disable=pressure), the memory allocated for these
structures can be observed with the following shell commands:
  # Enable tracing
  echo 1 > /sys/kernel/tracing/events/kmem/kmalloc/enable
  echo 1 > /sys/kernel/tracing/tracing_on
  cat /sys/kernel/tracing/trace_pipe | grep kmalloc | grep mem_cgroup

  # Trigger the allocation (the cgroup subtree must not have memcg enabled yet)
  echo +memory > /sys/fs/cgroup/cgroup.subtree_control

Ftrace Output:
  # mem_cgroup struct allocation
  sh-6312    [000] ..... 58015.698365: kmalloc:
    call_site=mem_cgroup_css_alloc+0xd8/0x5b4
    ptr=000000003e4c3799 bytes_req=2312 bytes_alloc=4096
    gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false

  # mem_cgroup_per_node allocation
  sh-6312    [000] ..... 58015.698389: kmalloc:
    call_site=mem_cgroup_css_alloc+0x1d8/0x5b4
    ptr=00000000d798700c bytes_req=2896 bytes_alloc=4096
    gfp_flags=GFP_KERNEL|__GFP_ZERO node=0 accounted=false

Key Observations:
  1. Both structures are allocated with kmalloc, with requested sizes
     between 2KB and 4KB.
  2. kmalloc rounds each request up to the next pre-defined size
     (64B, 128B,..., 2KB, 4KB, 8KB), so both allocations come from the
     4KB slab.
  3. Memory waste per memcg instance:
      Base struct: 4096 - 2312 = 1784 bytes
      Per-node struct: 4096 - 2896 = 1200 bytes
      Total waste: 2984 bytes (1-node system)
      NUMA scaling: (1200 + 8) * nr_node_ids bytes
So each memcg instance wastes a little memory.
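
The waste figures above can be re-derived with a small shell sketch. This is
only an illustration, not part of the patchset: the helper names
kmalloc_bucket and kmalloc_waste are hypothetical, and the rounding is
simplified to pure powers of two starting at 8 bytes, which is enough for
requests between 2KB and 4KB.

```sh
#!/bin/sh
# Hypothetical helper: round a request up to the next power-of-two
# kmalloc size class. Simplification: starts at 8 bytes and ignores
# any non-power-of-two classes, which do not affect 2KB-4KB requests.
kmalloc_bucket() {
    size=8
    while [ "$size" -lt "$1" ]; do
        size=$((size * 2))
    done
    echo "$size"
}

# Hypothetical helper: bytes left unused inside the bucket.
kmalloc_waste() {
    echo $(( $(kmalloc_bucket "$1") - $1 ))
}

memcg_waste=$(kmalloc_waste 2312)   # bytes_req of struct mem_cgroup
pn_waste=$(kmalloc_waste 2896)      # bytes_req of struct mem_cgroup_per_node

echo "mem_cgroup:          $memcg_waste bytes unused"
echo "mem_cgroup_per_node: $pn_waste bytes unused"
echo "total (1 node):      $((memcg_waste + pn_waste)) bytes unused"
```

With the bytes_req values from the trace, this reproduces 1784, 1200 and
2984 bytes.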

This patchset introduces dedicated kmem_caches:
  Patch1 - mem_cgroup kmem_cache - memcg_cachep
  Patch2 - mem_cgroup_per_node kmem_cache - memcg_pn_cachep

The benefits of this change can be observed with the following tracing
commands:
  # Enable tracing
  echo 1 > /sys/kernel/tracing/events/kmem/kmem_cache_alloc/enable
  echo 1 > /sys/kernel/tracing/tracing_on
  cat /sys/kernel/tracing/trace_pipe | grep kmem_cache_alloc | grep mem_cgroup
  # In another terminal:
  echo +memory > /sys/fs/cgroup/cgroup.subtree_control

The output might now look like this:

  # mem_cgroup struct allocation
  sh-9827     [000] .....   289.513598: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=00000000695c1806
    bytes_req=2312 bytes_alloc=2368 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
    accounted=false
  # mem_cgroup_per_node allocation
  sh-9827     [000] .....   289.513602: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=000000002989e63a
    bytes_req=2896 bytes_alloc=2944 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
    accounted=false

This indicates that the `mem_cgroup` struct now requests 2312 bytes
and is allocated 2368 bytes, while `mem_cgroup_per_node` requests 2896 bytes
and is allocated 2944 bytes.
The slight increase in allocated size is due to `SLAB_HWCACHE_ALIGN` in the
`kmem_cache`.
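
As a rough cross-check, the aligned sizes are consistent with rounding each
request up to a multiple of 64 bytes. The 64-byte cache line is an assumption
inferred from the trace values, not something measured or stated above, and
align_up is a hypothetical helper. The sketch compares per-object sizes only
and ignores how objects are packed into slab pages.

```sh
#!/bin/sh
# Assumption: 64-byte cache line, inferred from the trace output.
CACHE_LINE=64

# Hypothetical helper: round size ($1) up to the next multiple of align ($2).
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

memcg_alloc=$(align_up 2312 "$CACHE_LINE")   # struct mem_cgroup
pn_alloc=$(align_up 2896 "$CACHE_LINE")      # struct mem_cgroup_per_node

before=$((4096 + 4096))                # two 4KB kmalloc objects
after=$((memcg_alloc + pn_alloc))      # two dedicated kmem_cache objects

echo "mem_cgroup:          $memcg_alloc bytes"
echo "mem_cgroup_per_node: $pn_alloc bytes"
echo "per memcg (1 node):  $before -> $after bytes, saving $((before - after))"
```

Under that assumption the sketch gives 2368 and 2944 bytes, matching
bytes_alloc in the trace, and a per-object total of 5312 bytes instead of
8192 on a 1-node system.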

Without `SLAB_HWCACHE_ALIGN`, the allocation might appear as:

  # mem_cgroup struct allocation
  sh-9269     [003] .....    80.396366: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0xbc/0x5d4 ptr=000000005b12b475
    bytes_req=2312 bytes_alloc=2312 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1
    accounted=false

  # mem_cgroup_per_node allocation
  sh-9269     [003] .....    80.396411: kmem_cache_alloc:
    call_site=mem_cgroup_css_alloc+0x1b8/0x5d4 ptr=00000000f347adc6
    bytes_req=2896 bytes_alloc=2896 gfp_flags=GFP_KERNEL|__GFP_ZERO node=0
    accounted=false

Although `bytes_alloc` then matches `bytes_req`, this patchset uses
`SLAB_HWCACHE_ALIGN` by default, as it is generally considered better
for performance. Please let me know if there are any issues or if I've
misunderstood anything.

Patch3 - introduce the mem_cgroup_early_init() function to pre-allocate
         essential resources before cgroup_init() creates the
         root_mem_cgroup. Currently it creates memcg_cachep and
         memcg_pn_cachep, which keeps the allocation of these structs clean.

ChangeLog:

 v1 -> v2:
   Patch1-2: minor commit message changes.
   Patch3: add mem_cgroup_init_early to help "memcg" prepare resources
           before cgroup_init().

v1: https://lore.kernel.org/all/20250423084306.65706-1-link@vivo.com/

Huan Yang (3):
  mm/memcg: use kmem_cache when alloc memcg
  mm/memcg: use kmem_cache when alloc memcg pernode info
  mm/memcg: introduce mem_cgroup_early_init

 include/linux/memcontrol.h |  5 +++++
 init/main.c                |  2 ++
 mm/memcontrol.c            | 29 +++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 2 deletions(-)


base-commit: 2c9c612abeb38aab0e87d48496de6fd6daafb00b
--
2.48.1


