linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space
@ 2025-12-22 11:08 Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 1/8] mm/slab: use unsigned long for orig_size to ensure proper metadata align Harry Yoo
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

RFC V3: https://lore.kernel.org/linux-mm/20251027122847.320924-1-harry.yoo@oracle.com

I believe I addressed all comments in RFC V3 (except handling lazy
allocation of slabobj_exts, which I would prefer to do as future work).
Please let me know if I missed your comments.

If no major drawbacks or concerns come up, I would like to push this
forward for the 7.0 merge window after some review & testing.

Have a wonderful end of the year!

RFC V3 -> V4:
- Rebased onto the latest slab/for-next, dropped RFC
- The metadata alignment (after orig_size) fix is now included as patch 1
  of this series
- Patch 2: Document that use_freeptr_offset can be used for caches with a
  constructor (Suren, Vlastimil)
- Patch 6: use get/put_slab_obj_exts() instead of
  metadata_access_enable/disable (Suren)
- Patch 7: Change !mem_cgroup_disabled() check to memcg_kmem_online()
  (Andrey Ryabinin)
- Added Reviewed-by, Suggested-by tags, thanks!

When CONFIG_MEMCG and CONFIG_MEM_ALLOC_PROFILING are enabled,
the kernel allocates two pointers per object: one for the memory cgroup
(obj_cgroup) to which it belongs, and another for the code location
that requested the allocation.

In two special cases, this overhead can be eliminated by allocating
slabobj_ext metadata from unused space within a slab:

  Case 1. The "leftover" space after the last slab object is larger than
          the size of an array of slabobj_ext.

  Case 2. The per-object alignment padding is larger than
          sizeof(struct slabobj_ext).

For these two cases, one or two pointers can be saved per slab object.
Examples: ext4 inode cache (case 1) and xfs inode cache (case 2).
That's approximately 0.7-0.8% (memcg) or 1.5-1.6% (memcg + mem profiling)
of the total inode cache size.
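
For a rough sense of scale (illustrative arithmetic, assuming ~1 KiB inode
objects and one 8-byte slabobj_ext field per enabled subsystem):

   8 bytes per object / ~1024 bytes ~= 0.8%   (memcg only)
  16 bytes per object / ~1024 bytes ~= 1.6%   (memcg + mem profiling)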

Implementing case 2 is not straightforward, because the existing code
assumes that slab->obj_exts is an array of slabobj_ext, whereas case 2
breaks that assumption.

As suggested by Vlastimil, abstract access to individual slabobj_ext
metadata via a new helper named slab_obj_ext():

static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
                                               unsigned long obj_exts,
                                               unsigned int index)
{
        return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
} 

In the normal case (including case 1), slab->obj_exts points to an array
of slabobj_ext, and the stride is sizeof(struct slabobj_ext).

In case 2, the stride is s->size and
slab->obj_exts = slab_address(slab) + s->red_left_pad + (offset of slabobj_ext)

With this approach, the memcg charging fastpath doesn't need to care about
how the slabobj_ext metadata is stored.
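
As a condensed sketch of what the series ends up doing (illustrative only;
see patches 5, 7 and 8 for the actual code):

/* Normal case (including case 1): a dense slabobj_ext array. */
slab->obj_exts = (unsigned long)obj_exts_array; /* kcalloc'd or slab leftover */
slab_set_stride(slab, sizeof(struct slabobj_ext));

/* Case 2: slabobj_ext embedded in each object's alignment padding. */
slab->obj_exts = (unsigned long)slab_address(slab) + s->red_left_pad +
                 (offset of slabobj_ext within the object);
slab_set_stride(slab, s->size);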

Harry Yoo (8):
  mm/slab: use unsigned long for orig_size to ensure proper metadata
    align
  mm/slab: allow specifying free pointer offset when using constructor
  ext4: specify the free pointer offset for ext4_inode_cache
  mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
  mm/slab: use stride to access slabobj_ext
  mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
  mm/slab: save memory by allocating slabobj_ext array from leftover
  mm/slab: place slabobj_ext metadata in unused space within s->size

 fs/ext4/super.c      |  20 ++-
 include/linux/slab.h |  39 +++--
 mm/memcontrol.c      |  31 +++-
 mm/slab.h            | 120 ++++++++++++++-
 mm/slab_common.c     |   8 +-
 mm/slub.c            | 345 +++++++++++++++++++++++++++++++++++--------
 6 files changed, 466 insertions(+), 97 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 1/8] mm/slab: use unsigned long for orig_size to ensure proper metadata align
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 2/8] mm/slab: allow specifying free pointer offset when using constructor Harry Yoo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li,
	stable

When both KASAN and SLAB_STORE_USER are enabled, accesses to
struct kasan_alloc_meta fields can be misaligned on 64-bit architectures.
This occurs because orig_size is currently defined as unsigned int,
which only guarantees 4-byte alignment. When struct kasan_alloc_meta is
placed after orig_size, it may end up at a 4-byte boundary rather than
the required 8-byte boundary on 64-bit systems.

Note that 64-bit architectures without HAVE_EFFICIENT_UNALIGNED_ACCESS
are assumed to require 64-bit accesses to be 64-bit aligned.
See HAVE_64BIT_ALIGNED_ACCESS and commit adab66b71abf ("Revert:
"ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") for more details.

Change orig_size from unsigned int to unsigned long to ensure proper
alignment for any subsequent metadata. This should not waste additional
memory because kmalloc objects are already aligned to at least
ARCH_KMALLOC_MINALIGN.
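
For reference, the kmalloc debug metadata placed after the object looks
roughly like this (illustrative; the exact layout depends on the cache's
debug flags):

  object / free pointer          up to get_info_end(s)
  2 x struct track               (SLAB_STORE_USER)
  orig_size                      4 bytes as unsigned int, 8 as unsigned long
  struct kasan_alloc_meta        can land on a 4-byte boundary when orig_size
                                 is only 4 bytes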

Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/slub.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ad71f01571f0..1c747435a6ab 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -857,7 +857,7 @@ static inline bool slab_update_freelist(struct kmem_cache *s, struct slab *slab,
  * request size in the meta data area, for better debug and sanity check.
  */
 static inline void set_orig_size(struct kmem_cache *s,
-				void *object, unsigned int orig_size)
+				void *object, unsigned long orig_size)
 {
 	void *p = kasan_reset_tag(object);
 
@@ -867,10 +867,10 @@ static inline void set_orig_size(struct kmem_cache *s,
 	p += get_info_end(s);
 	p += sizeof(struct track) * 2;
 
-	*(unsigned int *)p = orig_size;
+	*(unsigned long *)p = orig_size;
 }
 
-static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
+static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
 {
 	void *p = kasan_reset_tag(object);
 
@@ -883,7 +883,7 @@ static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
 	p += get_info_end(s);
 	p += sizeof(struct track) * 2;
 
-	return *(unsigned int *)p;
+	return *(unsigned long *)p;
 }
 
 #ifdef CONFIG_SLUB_DEBUG
@@ -1198,7 +1198,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
 		off += 2 * sizeof(struct track);
 
 	if (slub_debug_orig_size(s))
-		off += sizeof(unsigned int);
+		off += sizeof(unsigned long);
 
 	off += kasan_metadata_size(s, false);
 
@@ -1394,7 +1394,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
 		off += 2 * sizeof(struct track);
 
 		if (s->flags & SLAB_KMALLOC)
-			off += sizeof(unsigned int);
+			off += sizeof(unsigned long);
 	}
 
 	off += kasan_metadata_size(s, false);
@@ -7949,7 +7949,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 
 		/* Save the original kmalloc request size */
 		if (flags & SLAB_KMALLOC)
-			size += sizeof(unsigned int);
+			size += sizeof(unsigned long);
 	}
 #endif
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 2/8] mm/slab: allow specifying free pointer offset when using constructor
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 1/8] mm/slab: use unsigned long for orig_size to ensure proper metadata align Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 3/8] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

When a slab cache has a constructor, the free pointer is placed after the
object because certain fields must not be overwritten even after the
object is freed.

However, some fields that the constructor does not initialize can safely
be overwritten after free. Allow specifying the free pointer offset within
the object, reducing the overall object size when some fields can be reused
for the free pointer.

Adjust the documentation accordingly.
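
A minimal usage sketch (hypothetical structure and field names; patch 3
converts ext4_inode_cache as a real-world example):

struct kmem_cache_args args = {
        .ctor = my_ctor,
        .use_freeptr_offset = true,
        /* my_field is assumed not to be initialized by my_ctor */
        .freeptr_offset = offsetof(struct my_obj, my_field),
};

cache = kmem_cache_create("my_cache", sizeof(struct my_obj), &args, 0);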

Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 include/linux/slab.h | 30 ++++++++++++++++--------------
 mm/slab_common.c     |  2 +-
 mm/slub.c            |  6 ++++--
 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 2482992248dc..4554c04a9bd7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -299,24 +299,26 @@ struct kmem_cache_args {
 	unsigned int usersize;
 	/**
 	 * @freeptr_offset: Custom offset for the free pointer
-	 * in &SLAB_TYPESAFE_BY_RCU caches
+	 * in caches with &SLAB_TYPESAFE_BY_RCU or @ctor
 	 *
-	 * By default &SLAB_TYPESAFE_BY_RCU caches place the free pointer
-	 * outside of the object. This might cause the object to grow in size.
-	 * Cache creators that have a reason to avoid this can specify a custom
-	 * free pointer offset in their struct where the free pointer will be
-	 * placed.
+	 * By default, &SLAB_TYPESAFE_BY_RCU and @ctor caches place the free
+	 * pointer outside of the object. This might cause the object to grow
+	 * in size. Cache creators that have a reason to avoid this can specify
+	 * a custom free pointer offset in their data structure where the free
+	 * pointer will be placed.
 	 *
-	 * Note that placing the free pointer inside the object requires the
-	 * caller to ensure that no fields are invalidated that are required to
-	 * guard against object recycling (See &SLAB_TYPESAFE_BY_RCU for
-	 * details).
+	 * For caches with &SLAB_TYPESAFE_BY_RCU, the caller must ensure that
+	 * the free pointer does not overlay fields required to guard against
+	 * object recycling (See &SLAB_TYPESAFE_BY_RCU for details).
 	 *
-	 * Using %0 as a value for @freeptr_offset is valid. If @freeptr_offset
-	 * is specified, %use_freeptr_offset must be set %true.
+	 * For caches with @ctor, the caller must ensure that the free pointer
+	 * does not overlay fields initialized by the constructor.
+	 *
+	 * Currently, only caches with &SLAB_TYPESAFE_BY_RCU or @ctor
+	 * may specify @freeptr_offset.
 	 *
-	 * Note that @ctor currently isn't supported with custom free pointers
-	 * as a @ctor requires an external free pointer.
+	 * Using %0 as a value for @freeptr_offset is valid. If @freeptr_offset
+	 * is specified, @use_freeptr_offset must be set %true.
 	 */
 	unsigned int freeptr_offset;
 	/**
diff --git a/mm/slab_common.c b/mm/slab_common.c
index eed7ea556cb1..c4cf9ed2ec92 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -231,7 +231,7 @@ static struct kmem_cache *create_cache(const char *name,
 	err = -EINVAL;
 	if (args->use_freeptr_offset &&
 	    (args->freeptr_offset >= object_size ||
-	     !(flags & SLAB_TYPESAFE_BY_RCU) ||
+	     (!(flags & SLAB_TYPESAFE_BY_RCU) && !args->ctor) ||
 	     !IS_ALIGNED(args->freeptr_offset, __alignof__(freeptr_t))))
 		goto out;
 
diff --git a/mm/slub.c b/mm/slub.c
index 1c747435a6ab..0e32f6420a8a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7907,7 +7907,8 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 	s->inuse = size;
 
 	if (((flags & SLAB_TYPESAFE_BY_RCU) && !args->use_freeptr_offset) ||
-	    (flags & SLAB_POISON) || s->ctor ||
+	    (flags & SLAB_POISON) ||
+	    (s->ctor && !args->use_freeptr_offset) ||
 	    ((flags & SLAB_RED_ZONE) &&
 	     (s->object_size < sizeof(void *) || slub_debug_orig_size(s)))) {
 		/*
@@ -7928,7 +7929,8 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 		 */
 		s->offset = size;
 		size += sizeof(void *);
-	} else if ((flags & SLAB_TYPESAFE_BY_RCU) && args->use_freeptr_offset) {
+	} else if (((flags & SLAB_TYPESAFE_BY_RCU) || s->ctor) &&
+			args->use_freeptr_offset) {
 		s->offset = args->freeptr_offset;
 	} else {
 		/*
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 3/8] ext4: specify the free pointer offset for ext4_inode_cache
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 1/8] mm/slab: use unsigned long for orig_size to ensure proper metadata align Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 2/8] mm/slab: allow specifying free pointer offset when using constructor Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

Convert ext4_inode_cache to use the kmem_cache_args interface and
specify a free pointer offset.

Since ext4_inode_cache uses a constructor, the free pointer would normally
be placed after the object to avoid overwriting fields initialized by the
constructor. However, some fields, such as ->i_flags, are not touched by
the constructor and can safely be repurposed for the free pointer.

Specify the free pointer offset at i_flags to reduce the object size.

Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 fs/ext4/super.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 87205660c5d0..42580643a466 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1491,12 +1491,20 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
-				sizeof(struct ext4_inode_info), 0,
-				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
-				offsetof(struct ext4_inode_info, i_data),
-				sizeof_field(struct ext4_inode_info, i_data),
-				init_once);
+	struct kmem_cache_args args = {
+		.align = 0,
+		.useroffset = offsetof(struct ext4_inode_info, i_data),
+		.usersize = sizeof_field(struct ext4_inode_info, i_data),
+		.use_freeptr_offset = true,
+		.freeptr_offset = offsetof(struct ext4_inode_info, i_flags),
+		.ctor = init_once,
+	};
+
+	ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
+				sizeof(struct ext4_inode_info),
+				&args,
+				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT);
+
 	if (ext4_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
                   ` (2 preceding siblings ...)
  2025-12-22 11:08 ` [PATCH V4 3/8] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-22 23:36   ` kernel test robot
  2025-12-23  0:08   ` kernel test robot
  2025-12-22 11:08 ` [PATCH V4 5/8] mm/slab: use stride to access slabobj_ext Harry Yoo
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

Currently, the slab allocator assumes that slab->obj_exts is a pointer
to an array of struct slabobj_ext objects. However, to support storage
methods where struct slabobj_ext is embedded within objects, the slab
allocator should not make this assumption. Instead of directly
dereferencing the slabobj_exts array, abstract access to
struct slabobj_ext via helper functions.

Introduce a new API for slabobj_ext metadata access:

  slab_obj_ext(slab, obj_exts, index) - returns the pointer to
  struct slabobj_ext element at the given index.

Directly dereferencing the return value of slab_obj_exts() is no longer
allowed. Instead, slab_obj_ext() must always be used to access
individual struct slabobj_ext objects.

Convert all users to use these APIs.
No functional changes intended.
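
For example (condensed from the memcg hunks below), what used to be a
direct array access:

        slab_obj_exts(slab)[off].objcg = objcg;

becomes:

        obj_exts = slab_obj_exts(slab);
        slab_obj_ext(slab, obj_exts, off)->objcg = objcg;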

Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/memcontrol.c | 23 +++++++++++++++-------
 mm/slab.h       | 43 +++++++++++++++++++++++++++++++++++------
 mm/slub.c       | 51 ++++++++++++++++++++++++++++---------------------
 3 files changed, 82 insertions(+), 35 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index be810c1fbfc3..fd9105a953b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2596,7 +2596,8 @@ struct mem_cgroup *mem_cgroup_from_obj_slab(struct slab *slab, void *p)
 	 * Memcg membership data for each individual object is saved in
 	 * slab->obj_exts.
 	 */
-	struct slabobj_ext *obj_exts;
+	unsigned long obj_exts;
+	struct slabobj_ext *obj_ext;
 	unsigned int off;
 
 	obj_exts = slab_obj_exts(slab);
@@ -2604,8 +2605,9 @@ struct mem_cgroup *mem_cgroup_from_obj_slab(struct slab *slab, void *p)
 		return NULL;
 
 	off = obj_to_index(slab->slab_cache, slab, p);
-	if (obj_exts[off].objcg)
-		return obj_cgroup_memcg(obj_exts[off].objcg);
+	obj_ext = slab_obj_ext(slab, obj_exts, off);
+	if (obj_ext->objcg)
+		return obj_cgroup_memcg(obj_ext->objcg);
 
 	return NULL;
 }
@@ -3191,6 +3193,9 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 	}
 
 	for (i = 0; i < size; i++) {
+		unsigned long obj_exts;
+		struct slabobj_ext *obj_ext;
+
 		slab = virt_to_slab(p[i]);
 
 		if (!slab_obj_exts(slab) &&
@@ -3213,29 +3218,33 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 					slab_pgdat(slab), cache_vmstat_idx(s)))
 			return false;
 
+		obj_exts = slab_obj_exts(slab);
 		off = obj_to_index(s, slab, p[i]);
+		obj_ext = slab_obj_ext(slab, obj_exts, off);
 		obj_cgroup_get(objcg);
-		slab_obj_exts(slab)[off].objcg = objcg;
+		obj_ext->objcg = objcg;
 	}
 
 	return true;
 }
 
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
-			    void **p, int objects, struct slabobj_ext *obj_exts)
+			    void **p, int objects, unsigned long obj_exts)
 {
 	size_t obj_size = obj_full_size(s);
 
 	for (int i = 0; i < objects; i++) {
 		struct obj_cgroup *objcg;
+		struct slabobj_ext *obj_ext;
 		unsigned int off;
 
 		off = obj_to_index(s, slab, p[i]);
-		objcg = obj_exts[off].objcg;
+		obj_ext = slab_obj_ext(slab, obj_exts, off);
+		objcg = obj_ext->objcg;
 		if (!objcg)
 			continue;
 
-		obj_exts[off].objcg = NULL;
+		obj_ext->objcg = NULL;
 		refill_obj_stock(objcg, obj_size, true, -obj_size,
 				 slab_pgdat(slab), cache_vmstat_idx(s));
 		obj_cgroup_put(objcg);
diff --git a/mm/slab.h b/mm/slab.h
index e767aa7e91b0..5c75ef3d1823 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -509,10 +509,12 @@ static inline bool slab_in_kunit_test(void) { return false; }
  * associated with a slab.
  * @slab: a pointer to the slab struct
  *
- * Returns a pointer to the object extension vector associated with the slab,
- * or NULL if no such vector has been associated yet.
+ * Returns the address of the object extension vector associated with the slab,
+ * or zero if no such vector has been associated yet.
+ * Do not dereference the return value directly; use slab_obj_ext() to access
+ * its elements.
  */
-static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+static inline unsigned long slab_obj_exts(struct slab *slab)
 {
 	unsigned long obj_exts = READ_ONCE(slab->obj_exts);
 
@@ -525,7 +527,30 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
 		       obj_exts != OBJEXTS_ALLOC_FAIL, slab_page(slab));
 	VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
 #endif
-	return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
+
+	return obj_exts & ~OBJEXTS_FLAGS_MASK;
+}
+
+/*
+ * slab_obj_ext - get the pointer to the slab object extension metadata
+ * associated with an object in a slab.
+ * @slab: a pointer to the slab struct
+ * @obj_exts: a pointer to the object extension vector
+ * @index: an index of the object
+ *
+ * Returns a pointer to the object extension associated with the object.
+ */
+static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
+					       unsigned long obj_exts,
+					       unsigned int index)
+{
+	struct slabobj_ext *obj_ext;
+
+	VM_WARN_ON_ONCE(!slab_obj_exts(slab));
+	VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
+
+	obj_ext = (struct slabobj_ext *)obj_exts;
+	return &obj_ext[index];
 }
 
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -533,7 +558,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 
 #else /* CONFIG_SLAB_OBJ_EXT */
 
-static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+static inline unsigned long slab_obj_exts(struct slab *slab)
+{
+	return false;
+}
+
+static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
+					       unsigned int index)
 {
 	return NULL;
 }
@@ -550,7 +581,7 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
 bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 				  gfp_t flags, size_t size, void **p);
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
-			    void **p, int objects, struct slabobj_ext *obj_exts);
+			    void **p, int objects, unsigned long obj_exts);
 #endif
 
 void kvfree_rcu_cb(struct rcu_head *head);
diff --git a/mm/slub.c b/mm/slub.c
index 0e32f6420a8a..84bd4f23dc4a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2042,7 +2042,7 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
 
 static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
 {
-	struct slabobj_ext *slab_exts;
+	unsigned long slab_exts;
 	struct slab *obj_exts_slab;
 
 	obj_exts_slab = virt_to_slab(obj_exts);
@@ -2050,13 +2050,15 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
 	if (slab_exts) {
 		unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
 						 obj_exts_slab, obj_exts);
+		struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
+						       slab_exts, offs);
 
-		if (unlikely(is_codetag_empty(&slab_exts[offs].ref)))
+		if (unlikely(is_codetag_empty(ext->ref)))
 			return;
 
 		/* codetag should be NULL here */
-		WARN_ON(slab_exts[offs].ref.ct);
-		set_codetag_empty(&slab_exts[offs].ref);
+		WARN_ON(ext->ref.ct);
+		set_codetag_empty(&ext->ref);
 	}
 }
 
@@ -2176,7 +2178,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 
 static inline void free_slab_obj_exts(struct slab *slab)
 {
-	struct slabobj_ext *obj_exts;
+	unsigned long obj_exts;
 
 	obj_exts = slab_obj_exts(slab);
 	if (!obj_exts) {
@@ -2196,11 +2198,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
 	 * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
 	 * the extension for obj_exts is expected to be NULL.
 	 */
-	mark_objexts_empty(obj_exts);
+	mark_objexts_empty((struct slabobj_ext *)obj_exts);
 	if (unlikely(READ_ONCE(slab->obj_exts) & OBJEXTS_NOSPIN_ALLOC))
-		kfree_nolock(obj_exts);
+		kfree_nolock((void *)obj_exts);
 	else
-		kfree(obj_exts);
+		kfree((void *)obj_exts);
 	slab->obj_exts = 0;
 }
 
@@ -2225,26 +2227,29 @@ static inline void free_slab_obj_exts(struct slab *slab)
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
 static inline struct slabobj_ext *
-prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+prepare_slab_obj_ext_hook(struct kmem_cache *s, gfp_t flags, void *p)
 {
 	struct slab *slab;
+	unsigned long obj_exts;
 
 	slab = virt_to_slab(p);
-	if (!slab_obj_exts(slab) &&
+	obj_exts = slab_obj_exts(slab);
+	if (!obj_exts &&
 	    alloc_slab_obj_exts(slab, s, flags, false)) {
 		pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
 			     __func__, s->name);
 		return NULL;
 	}
 
-	return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+	obj_exts = slab_obj_exts(slab);
+	return slab_obj_ext(slab, obj_exts, obj_to_index(s, slab, p));
 }
 
 /* Should be called only if mem_alloc_profiling_enabled() */
 static noinline void
 __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 {
-	struct slabobj_ext *obj_exts;
+	struct slabobj_ext *obj_ext;
 
 	if (!object)
 		return;
@@ -2255,14 +2260,14 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 	if (flags & __GFP_NO_OBJ_EXT)
 		return;
 
-	obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
+	obj_ext = prepare_slab_obj_ext_hook(s, flags, object);
 	/*
 	 * Currently obj_exts is used only for allocation profiling.
 	 * If other users appear then mem_alloc_profiling_enabled()
 	 * check should be added before alloc_tag_add().
 	 */
-	if (likely(obj_exts))
-		alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+	if (likely(obj_ext))
+		alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
 	else
 		alloc_tag_set_inaccurate(current->alloc_tag);
 }
@@ -2279,8 +2284,8 @@ static noinline void
 __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 			       int objects)
 {
-	struct slabobj_ext *obj_exts;
 	int i;
+	unsigned long obj_exts;
 
 	/* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
 	if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
@@ -2293,7 +2298,7 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
 	for (i = 0; i < objects; i++) {
 		unsigned int off = obj_to_index(s, slab, p[i]);
 
-		alloc_tag_sub(&obj_exts[off].ref, s->size);
+		alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
 	}
 }
 
@@ -2352,7 +2357,7 @@ static __fastpath_inline
 void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 			  int objects)
 {
-	struct slabobj_ext *obj_exts;
+	unsigned long obj_exts;
 
 	if (!memcg_kmem_online())
 		return;
@@ -2367,7 +2372,8 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 static __fastpath_inline
 bool memcg_slab_post_charge(void *p, gfp_t flags)
 {
-	struct slabobj_ext *slab_exts;
+	unsigned long obj_exts;
+	struct slabobj_ext *obj_ext;
 	struct kmem_cache *s;
 	struct page *page;
 	struct slab *slab;
@@ -2408,10 +2414,11 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
 		return true;
 
 	/* Ignore already charged objects. */
-	slab_exts = slab_obj_exts(slab);
-	if (slab_exts) {
+	obj_exts = slab_obj_exts(slab);
+	if (obj_exts) {
 		off = obj_to_index(s, slab, p);
-		if (unlikely(slab_exts[off].objcg))
+		obj_ext = slab_obj_ext(slab, obj_exts, off);
+		if (unlikely(obj_ext->objcg))
 			return true;
 	}
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 5/8] mm/slab: use stride to access slabobj_ext
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
                   ` (3 preceding siblings ...)
  2025-12-22 11:08 ` [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 6/8] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

Use a configurable stride value when accessing slab object extension
metadata instead of assuming a fixed sizeof(struct slabobj_ext).

Store the stride value in free bits of the slab->counters field. This
allows for flexibility in cases where the extension is embedded within
the slab objects.

Since these free bits exist only on 64-bit, any future optimization that
needs to change the stride value cannot be enabled on 32-bit architectures.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/slab.h | 37 +++++++++++++++++++++++++++++++++----
 mm/slub.c |  2 ++
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 5c75ef3d1823..38967ec663d1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -55,6 +55,14 @@ struct freelist_counters {
 					 * that the slab was corrupted
 					 */
 					unsigned frozen:1;
+#ifdef CONFIG_64BIT
+					/*
+					 * Some optimizations use free bits in 'counters' field
+					 * to save memory. In case ->stride field is not available,
+					 * such optimizations are disabled.
+					 */
+					unsigned short stride;
+#endif
 				};
 			};
 		};
@@ -531,6 +539,26 @@ static inline unsigned long slab_obj_exts(struct slab *slab)
 	return obj_exts & ~OBJEXTS_FLAGS_MASK;
 }
 
+#ifdef CONFIG_64BIT
+static inline void slab_set_stride(struct slab *slab, unsigned short stride)
+{
+	slab->stride = stride;
+}
+static inline unsigned short slab_get_stride(struct slab *slab)
+{
+	return slab->stride;
+}
+#else
+static inline void slab_set_stride(struct slab *slab, unsigned short stride)
+{
+	VM_WARN_ON_ONCE(stride != sizeof(struct slabobj_ext));
+}
+static inline unsigned short slab_get_stride(struct slab *slab)
+{
+	return sizeof(struct slabobj_ext);
+}
+#endif
+
 /*
  * slab_obj_ext - get the pointer to the slab object extension metadata
  * associated with an object in a slab.
@@ -544,13 +572,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
 					       unsigned long obj_exts,
 					       unsigned int index)
 {
-	struct slabobj_ext *obj_ext;
-
 	VM_WARN_ON_ONCE(!slab_obj_exts(slab));
 	VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
 
-	obj_ext = (struct slabobj_ext *)obj_exts;
-	return &obj_ext[index];
+	return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
 }
 
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -569,6 +594,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
 	return NULL;
 }
 
+static inline void slab_set_stride(struct slab *slab, unsigned int stride) { }
+static inline unsigned int slab_get_stride(struct slab *slab) { return 0; }
+
+
 #endif /* CONFIG_SLAB_OBJ_EXT */
 
 static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
diff --git a/mm/slub.c b/mm/slub.c
index 84bd4f23dc4a..8ac60a17d988 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2147,6 +2147,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 retry:
 	old_exts = READ_ONCE(slab->obj_exts);
 	handle_failed_objexts_alloc(old_exts, vec, objects);
+	slab_set_stride(slab, sizeof(struct slabobj_ext));
+
 	if (new_slab) {
 		/*
 		 * If the slab is brand new and nobody can yet access its
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 6/8] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
                   ` (4 preceding siblings ...)
  2025-12-22 11:08 ` [PATCH V4 5/8] mm/slab: use stride to access slabobj_ext Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
  2025-12-22 11:08 ` [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
  7 siblings, 0 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

In the near future, slabobj_ext may reside outside the allocated slab
object range within a slab, which could be reported as an out-of-bounds
access by KASAN.

As suggested by Andrey Konovalov [1], explicitly disable KASAN and KMSAN
checks when accessing slabobj_ext within slab allocator, memory profiling,
and memory cgroup code. While an alternative approach could be to unpoison
slabobj_ext, out-of-bounds accesses outside the slab allocator are
generally more common.

Move the metadata_access_enable()/disable() helpers to mm/slab.h so that
they can be used outside mm/slub.c. However, as suggested by Suren
Baghdasaryan [2], instead of calling them directly from mm code (which is
more prone to errors), change users to access slabobj_ext via get/put
APIs:

  - Users should call get_slab_obj_exts() before accessing slabobj_ext
    metadata. From now on, accessing it outside the section covered by
    get_slab_obj_exts() ~ put_slab_obj_exts() is illegal.
    This ensures that accesses to slabobj_ext metadata won't be reported
    as access violations.

  - If slab_obj_exts() returns zero, the caller should not call
    get_slab_obj_exts()/put_slab_obj_exts(). Otherwise, every
    get_slab_obj_exts() must be paired with put_slab_obj_exts().

Call kasan_reset_tag() in slab_obj_ext() before returning the address to
prevent SW or HW tag-based KASAN from reporting false positives.

Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/linux-mm/CA+fCnZezoWn40BaS3cgmCeLwjT+5AndzcQLc=wH3BjMCu6_YCw@mail.gmail.com [1]
Link: https://lore.kernel.org/linux-mm/CAJuCfpG=Lb4WhYuPkSpdNO4Ehtjm1YcEEK0OM=3g9i=LxmpHSQ@mail.gmail.com [2]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/memcontrol.c | 12 +++++++--
 mm/slab.h       | 54 +++++++++++++++++++++++++++++++++++---
 mm/slub.c       | 69 ++++++++++++++++++++++++-------------------------
 3 files changed, 95 insertions(+), 40 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd9105a953b0..50ca00122571 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2604,10 +2604,16 @@ struct mem_cgroup *mem_cgroup_from_obj_slab(struct slab *slab, void *p)
 	if (!obj_exts)
 		return NULL;
 
+	get_slab_obj_exts(obj_exts);
 	off = obj_to_index(slab->slab_cache, slab, p);
 	obj_ext = slab_obj_ext(slab, obj_exts, off);
-	if (obj_ext->objcg)
-		return obj_cgroup_memcg(obj_ext->objcg);
+	if (obj_ext->objcg) {
+		struct obj_cgroup *objcg = obj_ext->objcg;
+
+		put_slab_obj_exts(obj_exts);
+		return obj_cgroup_memcg(objcg);
+	}
+	put_slab_obj_exts(obj_exts);
 
 	return NULL;
 }
@@ -3219,10 +3225,12 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 			return false;
 
 		obj_exts = slab_obj_exts(slab);
+		get_slab_obj_exts(obj_exts);
 		off = obj_to_index(s, slab, p[i]);
 		obj_ext = slab_obj_ext(slab, obj_exts, off);
 		obj_cgroup_get(objcg);
 		obj_ext->objcg = objcg;
+		put_slab_obj_exts(obj_exts);
 	}
 
 	return true;
diff --git a/mm/slab.h b/mm/slab.h
index 38967ec663d1..ba67d6059032 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -510,6 +510,24 @@ bool slab_in_kunit_test(void);
 static inline bool slab_in_kunit_test(void) { return false; }
 #endif
 
+/*
+ * slub is about to manipulate internal object metadata.  This memory lies
+ * outside the range of the allocated object, so accessing it would normally
+ * be reported by kasan as a bounds error.  metadata_access_enable() is used
+ * to tell kasan that these accesses are OK.
+ */
+static inline void metadata_access_enable(void)
+{
+	kasan_disable_current();
+	kmsan_disable_current();
+}
+
+static inline void metadata_access_disable(void)
+{
+	kmsan_enable_current();
+	kasan_enable_current();
+}
+
 #ifdef CONFIG_SLAB_OBJ_EXT
 
 /*
@@ -519,8 +537,22 @@ static inline bool slab_in_kunit_test(void) { return false; }
  *
  * Returns the address of the object extension vector associated with the slab,
  * or zero if no such vector has been associated yet.
- * Do not dereference the return value directly; use slab_obj_ext() to access
- * its elements.
+ * Do not dereference the return value directly; use get/put_slab_obj_exts()
+ * pair and slab_obj_ext() to access individual elements.
+ *
+ * Example usage:
+ *
+ * obj_exts = slab_obj_exts(slab);
+ * if (obj_exts) {
+ *         get_slab_obj_exts(obj_exts);
+ *         obj_ext = slab_obj_ext(slab, obj_exts, obj_to_index(s, slab, obj));
+ *         // do something with obj_ext
+ *         put_slab_obj_exts(obj_exts);
+ * }
+ *
+ * Note that the get/put semantics does not involve reference counting.
+ * Instead, it updates kasan/kmsan depth so that accesses to slabobj_ext
+ * won't be reported as access violations.
  */
 static inline unsigned long slab_obj_exts(struct slab *slab)
 {
@@ -539,6 +571,17 @@ static inline unsigned long slab_obj_exts(struct slab *slab)
 	return obj_exts & ~OBJEXTS_FLAGS_MASK;
 }
 
+static inline void get_slab_obj_exts(unsigned long obj_exts)
+{
+	VM_WARN_ON_ONCE(!obj_exts);
+	metadata_access_enable();
+}
+
+static inline void put_slab_obj_exts(unsigned long obj_exts)
+{
+	metadata_access_disable();
+}
+
 #ifdef CONFIG_64BIT
 static inline void slab_set_stride(struct slab *slab, unsigned short stride)
 {
@@ -567,15 +610,20 @@ static inline unsigned short slab_get_stride(struct slab *slab)
  * @index: an index of the object
  *
  * Returns a pointer to the object extension associated with the object.
+ * Must be called within a section covered by get/put_slab_obj_exts().
  */
 static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
 					       unsigned long obj_exts,
 					       unsigned int index)
 {
+	struct slabobj_ext *obj_ext;
+
 	VM_WARN_ON_ONCE(!slab_obj_exts(slab));
 	VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
 
-	return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
+	obj_ext = (struct slabobj_ext *)(obj_exts +
+					 slab_get_stride(slab) * index);
+	return kasan_reset_tag(obj_ext);
 }
 
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
diff --git a/mm/slub.c b/mm/slub.c
index 8ac60a17d988..39c381cc1b2c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -975,24 +975,6 @@ static slab_flags_t slub_debug;
 static const char *slub_debug_string __ro_after_init;
 static int disable_higher_order_debug;
 
-/*
- * slub is about to manipulate internal object metadata.  This memory lies
- * outside the range of the allocated object, so accessing it would normally
- * be reported by kasan as a bounds error.  metadata_access_enable() is used
- * to tell kasan that these accesses are OK.
- */
-static inline void metadata_access_enable(void)
-{
-	kasan_disable_current();
-	kmsan_disable_current();
-}
-
-static inline void metadata_access_disable(void)
-{
-	kmsan_enable_current();
-	kasan_enable_current();
-}
-
 /*
  * Object debugging
  */
@@ -2042,23 +2024,27 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
 
 static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
 {
-	unsigned long slab_exts;
 	struct slab *obj_exts_slab;
+	unsigned long slab_exts;
 
 	obj_exts_slab = virt_to_slab(obj_exts);
 	slab_exts = slab_obj_exts(obj_exts_slab);
 	if (slab_exts) {
+		get_slab_obj_exts(slab_exts);
 		unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
 						 obj_exts_slab, obj_exts);
 		struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
 						       slab_exts, offs);
 
-		if (unlikely(is_codetag_empty(ext->ref)))
+		if (unlikely(is_codetag_empty(ext->ref))) {
+			put_slab_obj_exts(slab_exts);
 			return;
+		}
 
 		/* codetag should be NULL here */
 		WARN_ON(ext->ref.ct);
 		set_codetag_empty(&ext->ref);
+		put_slab_obj_exts(slab_exts);
 	}
 }
 
@@ -2228,30 +2214,28 @@ static inline void free_slab_obj_exts(struct slab *slab)
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
-static inline struct slabobj_ext *
-prepare_slab_obj_ext_hook(struct kmem_cache *s, gfp_t flags, void *p)
+static inline unsigned long
+prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
+			   gfp_t flags, void *p)
 {
-	struct slab *slab;
-	unsigned long obj_exts;
-
-	slab = virt_to_slab(p);
-	obj_exts = slab_obj_exts(slab);
-	if (!obj_exts &&
+	if (!slab_obj_exts(slab) &&
 	    alloc_slab_obj_exts(slab, s, flags, false)) {
 		pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
 			     __func__, s->name);
-		return NULL;
+		return 0;
 	}
 
-	obj_exts = slab_obj_exts(slab);
-	return slab_obj_ext(slab, obj_exts, obj_to_index(s, slab, p));
+	return slab_obj_exts(slab);
 }
 
+
 /* Should be called only if mem_alloc_profiling_enabled() */
 static noinline void
 __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 {
+	unsigned long obj_exts;
 	struct slabobj_ext *obj_ext;
+	struct slab *slab;
 
 	if (!object)
 		return;
@@ -2262,16 +2246,23 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 	if (flags & __GFP_NO_OBJ_EXT)
 		return;
 
-	obj_ext = prepare_slab_obj_ext_hook(s, flags, object);
+	slab = virt_to_slab(object);
+	obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, object);
 	/*
 	 * Currently obj_exts is used only for allocation profiling.
 	 * If other users appear then mem_alloc_profiling_enabled()
 	 * check should be added before alloc_tag_add().
 	 */
-	if (likely(obj_ext))
+	if (obj_exts) {
+		unsigned int obj_idx = obj_to_index(s, slab, object);
+
+		get_slab_obj_exts(obj_exts);
+		obj_ext = slab_obj_ext(slab, obj_exts, obj_idx);
 		alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
-	else
+		put_slab_obj_exts(obj_exts);
+	} else {
 		alloc_tag_set_inaccurate(current->alloc_tag);
+	}
 }
 
 static inline void
@@ -2297,11 +2288,13 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
 	if (!obj_exts)
 		return;
 
+	get_slab_obj_exts(obj_exts);
 	for (i = 0; i < objects; i++) {
 		unsigned int off = obj_to_index(s, slab, p[i]);
 
 		alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
 	}
+	put_slab_obj_exts(obj_exts);
 }
 
 static inline void
@@ -2368,7 +2361,9 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 	if (likely(!obj_exts))
 		return;
 
+	get_slab_obj_exts(obj_exts);
 	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
+	put_slab_obj_exts(obj_exts);
 }
 
 static __fastpath_inline
@@ -2418,10 +2413,14 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
 	/* Ignore already charged objects. */
 	obj_exts = slab_obj_exts(slab);
 	if (obj_exts) {
+		get_slab_obj_exts(obj_exts);
 		off = obj_to_index(s, slab, p);
 		obj_ext = slab_obj_ext(slab, obj_exts, off);
-		if (unlikely(obj_ext->objcg))
+		if (unlikely(obj_ext->objcg)) {
+			put_slab_obj_exts(obj_exts);
 			return true;
+		}
+		put_slab_obj_exts(obj_exts);
 	}
 
 	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
                   ` (5 preceding siblings ...)
  2025-12-22 11:08 ` [PATCH V4 6/8] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-23  1:40   ` kernel test robot
  2025-12-23 15:08   ` Hao Li
  2025-12-22 11:08 ` [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
  7 siblings, 2 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

The leftover space in a slab is always smaller than s->size, and
kmem caches for large objects that are not power-of-two sizes tend to have
a greater amount of leftover space per slab. In some cases, the leftover
space is larger than the size of the slabobj_ext array for the slab.

An excellent example of such a cache is ext4_inode_cache. On my system,
the object size is 1144, with a preferred order of 3, 28 objects per slab,
and 736 bytes of leftover space per slab.

Since the size of the slabobj_ext array is only 224 bytes (w/o mem
profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
fits within the leftover space.
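
The arithmetic for that example (assuming 4 KiB pages, so an order-3 slab
is 32768 bytes):

  objects:         28 * 1144       = 32032 bytes
  leftover:        32768 - 32032   =   736 bytes
  obj_exts array:  28 * 8  = 224 bytes (w/o mem profiling)
                   28 * 16 = 448 bytes (w/ mem profiling), both <= 736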

Allocate the slabobj_exts array from this unused space instead of using
kcalloc() when the space is large enough. The array is allocated from
unused space only when creating new slabs; if alloc_slab_obj_exts() is
called after slab creation, the unused space is not used, because
implementing such lazy allocation requires more expensive synchronization.

The implementation and evaluation of lazy allocation from unused space
is left as future work. As pointed out by Vlastimil Babka [1], it could be
beneficial when a slab cache is created without SLAB_ACCOUNT but some of
the allocations from the cache use __GFP_ACCOUNT; xarray does that, for
example.

To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
array only when either of them is enabled.

[ MEMCG=y, MEM_ALLOC_PROFILING=n ]

Before patch (creating ~2.64M directories on ext4):
  Slab:            4747880 kB
  SReclaimable:    4169652 kB
  SUnreclaim:       578228 kB

After patch (creating ~2.64M directories on ext4):
  Slab:            4724020 kB
  SReclaimable:    4169188 kB
  SUnreclaim:       554832 kB (-22.84 MiB)

Enjoy the memory savings!

Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz [1]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 151 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 39c381cc1b2c..3fc3d2ca42e7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
 	return *(unsigned long *)p;
 }
 
+#ifdef CONFIG_SLAB_OBJ_EXT
+
+/*
+ * Check if memory cgroup or memory allocation profiling is enabled.
+ * If enabled, SLUB tries to reduce memory overhead of accounting
+ * slab objects. If neither is enabled when this function is called,
+ * the optimization is simply skipped to avoid affecting caches that do not
+ * need slabobj_ext metadata.
+ *
+ * However, this may disable optimization when memory cgroup or memory
+ * allocation profiling is used, but slabs are created too early
+ * even before those subsystems are initialized.
+ */
+static inline bool need_slab_obj_exts(struct kmem_cache *s)
+{
+	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
+		return true;
+
+	if (mem_alloc_profiling_enabled())
+		return true;
+
+	return false;
+}
+
+static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
+{
+	return sizeof(struct slabobj_ext) * slab->objects;
+}
+
+static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
+						    struct slab *slab)
+{
+	unsigned long objext_offset;
+
+	objext_offset = s->red_left_pad + s->size * slab->objects;
+	objext_offset = ALIGN(objext_offset, sizeof(struct slabobj_ext));
+	return objext_offset;
+}
+
+static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
+						     struct slab *slab)
+{
+	unsigned long objext_offset = obj_exts_offset_in_slab(s, slab);
+	unsigned long objext_size = obj_exts_size_in_slab(slab);
+
+	return objext_offset + objext_size <= slab_size(slab);
+}
+
+static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
+{
+	unsigned long expected;
+	unsigned long obj_exts;
+
+	obj_exts = slab_obj_exts(slab);
+	if (!obj_exts)
+		return false;
+
+	if (!obj_exts_fit_within_slab_leftover(s, slab))
+		return false;
+
+	expected = (unsigned long)slab_address(slab);
+	expected += obj_exts_offset_in_slab(s, slab);
+	return obj_exts == expected;
+}
+#else
+static inline bool need_slab_obj_exts(struct kmem_cache *s)
+{
+	return false;
+}
+
+static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
+{
+	return 0;
+}
+
+static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
+						    struct slab *slab)
+{
+	return 0;
+}
+
+static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
+						     struct slab *slab)
+{
+	return false;
+}
+
+static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_SLUB_DEBUG
 
 /*
@@ -1405,7 +1498,15 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
 	start = slab_address(slab);
 	length = slab_size(slab);
 	end = start + length;
-	remainder = length % s->size;
+
+	if (obj_exts_in_slab(s, slab)) {
+		remainder = length;
+		remainder -= obj_exts_offset_in_slab(s, slab);
+		remainder -= obj_exts_size_in_slab(slab);
+	} else {
+		remainder = length % s->size;
+	}
+
 	if (!remainder)
 		return;
 
@@ -2179,6 +2280,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
 		return;
 	}
 
+	if (obj_exts_in_slab(slab->slab_cache, slab)) {
+		slab->obj_exts = 0;
+		return;
+	}
+
 	/*
 	 * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
 	 * corresponding extension will be NULL. alloc_tag_sub() will throw a
@@ -2194,6 +2300,35 @@ static inline void free_slab_obj_exts(struct slab *slab)
 	slab->obj_exts = 0;
 }
 
+/*
+ * Try to allocate slabobj_ext array from unused space.
+ * This function must be called on a freshly allocated slab to prevent
+ * concurrency problems.
+ */
+static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
+{
+	void *addr;
+	unsigned long obj_exts;
+
+	if (!need_slab_obj_exts(s))
+		return;
+
+	if (obj_exts_fit_within_slab_leftover(s, slab)) {
+		addr = slab_address(slab) + obj_exts_offset_in_slab(s, slab);
+		addr = kasan_reset_tag(addr);
+		obj_exts = (unsigned long)addr;
+
+		get_slab_obj_exts(obj_exts);
+		memset(addr, 0, obj_exts_size_in_slab(slab));
+		put_slab_obj_exts(obj_exts);
+
+		if (IS_ENABLED(CONFIG_MEMCG))
+			obj_exts |= MEMCG_DATA_OBJEXTS;
+		slab->obj_exts = obj_exts;
+		slab_set_stride(slab, sizeof(struct slabobj_ext));
+	}
+}
+
 #else /* CONFIG_SLAB_OBJ_EXT */
 
 static inline void init_slab_obj_exts(struct slab *slab)
@@ -2210,6 +2345,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
 {
 }
 
+static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
+						       struct slab *slab)
+{
+}
+
 #endif /* CONFIG_SLAB_OBJ_EXT */
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -3206,7 +3346,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
 static __always_inline void account_slab(struct slab *slab, int order,
 					 struct kmem_cache *s, gfp_t gfp)
 {
-	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
+	if (memcg_kmem_online() &&
+			(s->flags & SLAB_ACCOUNT) &&
+			!slab_obj_exts(slab))
 		alloc_slab_obj_exts(slab, s, gfp, true);
 
 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
@@ -3270,9 +3412,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	slab->objects = oo_objects(oo);
 	slab->inuse = 0;
 	slab->frozen = 0;
-	init_slab_obj_exts(slab);
-
-	account_slab(slab, oo_order(oo), s, flags);
 
 	slab->slab_cache = s;
 
@@ -3281,6 +3420,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	start = slab_address(slab);
 
 	setup_slab_debug(s, slab, start);
+	init_slab_obj_exts(slab);
+	/*
+	 * Poison the slab before initializing the slabobj_ext array
+	 * to prevent the array from being overwritten.
+	 */
+	alloc_slab_obj_exts_early(s, slab);
+	account_slab(slab, oo_order(oo), s, flags);
 
 	shuffle = shuffle_freelist(s, slab);
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
  2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
                   ` (6 preceding siblings ...)
  2025-12-22 11:08 ` [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
@ 2025-12-22 11:08 ` Harry Yoo
  2025-12-24  5:33   ` Hao Li
  7 siblings, 1 reply; 27+ messages in thread
From: Harry Yoo @ 2025-12-22 11:08 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

When a cache has a high s->align value and s->object_size is not aligned
to it, each object ends up with some unused space because of the alignment
padding. If this padding is large enough, it can be used to store the
slabobj_ext metadata instead of being wasted.

On my system, this happens with caches like kmem_cache, mm_struct, pid,
task_struct, sighand_cache, xfs_inode, and others.

To place the slabobj_ext metadata within each object, the existing
slab_obj_ext() logic can still be used by setting:

  - slab->obj_exts = slab_address(slab) + s->red_left_pad +
                     (slabobj_ext offset)
  - stride = s->size

slab_obj_ext() doesn't need to know where the metadata is stored,
so this method works without adding extra overhead to it.

A good example benefiting from this optimization is xfs_inode
(object_size: 992, align: 64); see the rough arithmetic below. To measure
the memory savings, roughly 2.64 million directories were created on XFS.
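
Rough per-object arithmetic for that case (illustrative, assuming no debug
metadata, so s->size is simply object_size rounded up to the alignment):

  s->size:          ALIGN(992, 64)  = 1024 bytes
  padding/object:   1024 - 992      =   32 bytes
  slabobj_ext:      8 or 16 bytes  <=   32, so it fits within the padding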

[ MEMCG=y, MEM_ALLOC_PROFILING=n ]

Before patch (creating ~2.64M directories on xfs):
  Slab:            5175976 kB
  SReclaimable:    3837524 kB
  SUnreclaim:      1338452 kB

After patch (creating ~2.64M directories on xfs):
  Slab:            5152912 kB
  SReclaimable:    3838568 kB
  SUnreclaim:      1314344 kB (-23.54 MiB)

Enjoy the memory savings!

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 include/linux/slab.h |  9 ++++++
 mm/slab_common.c     |  6 ++--
 mm/slub.c            | 73 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4554c04a9bd7..da512d9ab1a0 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -59,6 +59,9 @@ enum _slab_flag_bits {
 	_SLAB_CMPXCHG_DOUBLE,
 #ifdef CONFIG_SLAB_OBJ_EXT
 	_SLAB_NO_OBJ_EXT,
+#endif
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+	_SLAB_OBJ_EXT_IN_OBJ,
 #endif
 	_SLAB_FLAGS_LAST_BIT
 };
@@ -244,6 +247,12 @@ enum _slab_flag_bits {
 #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
 #endif
 
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
+#else
+#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_UNUSED
+#endif
+
 /*
  * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
  *
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c4cf9ed2ec92..f0a6db20d7ea 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
 struct kmem_cache *kmem_cache;
 
 /*
- * Set of flags that will prevent slab merging
+ * Set of flags that will prevent slab merging.
+ * Any flag that adds per-object metadata should be included,
+ * since slab merging can update s->inuse that affects the metadata layout.
  */
 #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
 		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
-		SLAB_FAILSLAB | SLAB_NO_MERGE)
+		SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
diff --git a/mm/slub.c b/mm/slub.c
index 3fc3d2ca42e7..78f0087c8e48 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -977,6 +977,39 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
 {
 	return false;
 }
+
+#endif
+
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+static bool obj_exts_in_object(struct kmem_cache *s)
+{
+	return s->flags & SLAB_OBJ_EXT_IN_OBJ;
+}
+
+static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
+{
+	unsigned int offset = get_info_end(s);
+
+	if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
+		offset += sizeof(struct track) * 2;
+
+	if (slub_debug_orig_size(s))
+		offset += sizeof(unsigned long);
+
+	offset += kasan_metadata_size(s, false);
+
+	return offset;
+}
+#else
+static inline bool obj_exts_in_object(struct kmem_cache *s)
+{
+	return false;
+}
+
+static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
+{
+	return 0;
+}
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
@@ -1277,6 +1310,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
 
 	off += kasan_metadata_size(s, false);
 
+	if (obj_exts_in_object(s))
+		off += sizeof(struct slabobj_ext);
+
 	if (off != size_from_object(s))
 		/* Beginning of the filler is the free pointer */
 		print_section(KERN_ERR, "Padding  ", p + off,
@@ -1446,7 +1482,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
  * 	A. Free pointer (if we cannot overwrite object on free)
  * 	B. Tracking data for SLAB_STORE_USER
  *	C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
- *	D. Padding to reach required alignment boundary or at minimum
+ *	D. KASAN alloc metadata (KASAN enabled)
+ *	E. struct slabobj_ext to store accounting metadata
+ *	   (SLAB_OBJ_EXT_IN_OBJ enabled)
+ *	F. Padding to reach required alignment boundary or at minimum
  * 		one word if debugging is on to be able to detect writes
  * 		before the word boundary.
  *
@@ -1474,6 +1513,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
 
 	off += kasan_metadata_size(s, false);
 
+	if (obj_exts_in_object(s))
+		off += sizeof(struct slabobj_ext);
+
 	if (size_from_object(s) == off)
 		return 1;
 
@@ -2280,7 +2322,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
 		return;
 	}
 
-	if (obj_exts_in_slab(slab->slab_cache, slab)) {
+	if (obj_exts_in_slab(slab->slab_cache, slab) ||
+			obj_exts_in_object(slab->slab_cache)) {
 		slab->obj_exts = 0;
 		return;
 	}
@@ -2326,6 +2369,23 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 			obj_exts |= MEMCG_DATA_OBJEXTS;
 		slab->obj_exts = obj_exts;
 		slab_set_stride(slab, sizeof(struct slabobj_ext));
+	} else if (obj_exts_in_object(s)) {
+		unsigned int offset = obj_exts_offset_in_object(s);
+
+		obj_exts = (unsigned long)slab_address(slab);
+		obj_exts += s->red_left_pad;
+		obj_exts += obj_exts_offset_in_object(s);
+
+		get_slab_obj_exts(obj_exts);
+		for_each_object(addr, s, slab_address(slab), slab->objects)
+			memset(kasan_reset_tag(addr) + offset, 0,
+			       sizeof(struct slabobj_ext));
+		put_slab_obj_exts(obj_exts);
+
+		if (IS_ENABLED(CONFIG_MEMCG))
+			obj_exts |= MEMCG_DATA_OBJEXTS;
+		slab->obj_exts = obj_exts;
+		slab_set_stride(slab, s->size);
 	}
 }
 
@@ -8023,6 +8083,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 {
 	slab_flags_t flags = s->flags;
 	unsigned int size = s->object_size;
+	unsigned int aligned_size;
 	unsigned int order;
 
 	/*
@@ -8132,7 +8193,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 	 * offset 0. In order to align the objects we have to simply size
 	 * each object to conform to the alignment.
 	 */
-	size = ALIGN(size, s->align);
+	aligned_size = ALIGN(size, s->align);
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+	if (aligned_size - size >= sizeof(struct slabobj_ext))
+		s->flags |= SLAB_OBJ_EXT_IN_OBJ;
+#endif
+	size = aligned_size;
+
 	s->size = size;
 	s->reciprocal_size = reciprocal_value(size);
 	order = calculate_order(size);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
  2025-12-22 11:08 ` [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
@ 2025-12-22 23:36   ` kernel test robot
  2025-12-23  0:08   ` kernel test robot
  1 sibling, 0 replies; 27+ messages in thread
From: kernel test robot @ 2025-12-22 23:36 UTC (permalink / raw)
  To: Harry Yoo, akpm, vbabka
  Cc: llvm, oe-kbuild-all, andreyknvl, cl, dvyukov, glider, hannes,
	linux-mm, mhocko, muchun.song, rientjes, roman.gushchin,
	ryabinin.a.a, shakeel.butt, surenb, vincenzo.frascino,
	yeoreum.yun, harry.yoo, tytso, adilger.kernel, linux-ext4,
	linux-kernel, cgroups, hao.li

Hi Harry,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Harry-Yoo/mm-slab-use-unsigned-long-for-orig_size-to-ensure-proper-metadata-align/20251222-191144
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251222110843.980347-5-harry.yoo%40oracle.com
patch subject: [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
config: x86_64-buildonly-randconfig-001-20251223 (https://download.01.org/0day-ci/archive/20251223/202512230850.bBE4pAZ5-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251223/202512230850.bBE4pAZ5-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512230850.bBE4pAZ5-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/slub.c:2056:33: error: passing 'union codetag_ref' to parameter of incompatible type 'union codetag_ref *'; take the address with &
    2056 |                 if (unlikely(is_codetag_empty(ext->ref)))
         |                                               ^~~~~~~~
         |                                               &
   include/linux/compiler.h:47:41: note: expanded from macro 'unlikely'
      47 | #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
         |                                           ^
   include/linux/compiler.h:32:34: note: expanded from macro '__branch_check__'
      32 |                         ______r = __builtin_expect(!!(x), expect);      \
         |                                                       ^
   include/linux/alloc_tag.h:52:56: note: passing argument to parameter 'ref' here
      52 | static inline bool is_codetag_empty(union codetag_ref *ref)
         |                                                        ^
>> mm/slub.c:2056:33: error: passing 'union codetag_ref' to parameter of incompatible type 'union codetag_ref *'; take the address with &
    2056 |                 if (unlikely(is_codetag_empty(ext->ref)))
         |                                               ^~~~~~~~
         |                                               &
   include/linux/compiler.h:47:68: note: expanded from macro 'unlikely'
      47 | #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
         |                                                                      ^
   include/linux/compiler.h:34:19: note: expanded from macro '__branch_check__'
      34 |                                              expect, is_constant);      \
         |                                                      ^~~~~~~~~~~
   include/linux/alloc_tag.h:52:56: note: passing argument to parameter 'ref' here
      52 | static inline bool is_codetag_empty(union codetag_ref *ref)
         |                                                        ^
   2 errors generated.


vim +2056 mm/slub.c

  2042	
  2043	static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
  2044	{
  2045		unsigned long slab_exts;
  2046		struct slab *obj_exts_slab;
  2047	
  2048		obj_exts_slab = virt_to_slab(obj_exts);
  2049		slab_exts = slab_obj_exts(obj_exts_slab);
  2050		if (slab_exts) {
  2051			unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
  2052							 obj_exts_slab, obj_exts);
  2053			struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
  2054							       slab_exts, offs);
  2055	
> 2056			if (unlikely(is_codetag_empty(ext->ref)))
  2057				return;
  2058	
  2059			/* codetag should be NULL here */
  2060			WARN_ON(ext->ref.ct);
  2061			set_codetag_empty(&ext->ref);
  2062		}
  2063	}
  2064	
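
The diagnostic already points at the likely fix: is_codetag_empty() takes a
union codetag_ref *, so the new slab_obj_ext()-based caller presumably just
needs the address of the member, e.g.:

	if (unlikely(is_codetag_empty(&ext->ref)))
		return;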

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
  2025-12-22 11:08 ` [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
  2025-12-22 23:36   ` kernel test robot
@ 2025-12-23  0:08   ` kernel test robot
  1 sibling, 0 replies; 27+ messages in thread
From: kernel test robot @ 2025-12-23  0:08 UTC (permalink / raw)
  To: Harry Yoo, akpm, vbabka
  Cc: oe-kbuild-all, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, harry.yoo,
	tytso, adilger.kernel, linux-ext4, linux-kernel, cgroups, hao.li

Hi Harry,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Harry-Yoo/mm-slab-use-unsigned-long-for-orig_size-to-ensure-proper-metadata-align/20251222-191144
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251222110843.980347-5-harry.yoo%40oracle.com
patch subject: [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
config: parisc-randconfig-002-20251223 (https://download.01.org/0day-ci/archive/20251223/202512230727.ktAJv4eA-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251223/202512230727.ktAJv4eA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512230727.ktAJv4eA-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5,
                    from arch/parisc/include/asm/bug.h:97,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from mm/slub.c:13:
   mm/slub.c: In function 'mark_objexts_empty':
>> mm/slub.c:2056:36: error: incompatible type for argument 1 of 'is_codetag_empty'
      if (unlikely(is_codetag_empty(ext->ref)))
                                    ~~~^~~~~
   include/linux/compiler.h:77:42: note: in definition of macro 'unlikely'
    # define unlikely(x) __builtin_expect(!!(x), 0)
                                             ^
   In file included from include/linux/workqueue.h:9,
                    from include/linux/mm_types.h:19,
                    from include/linux/mmzone.h:22,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from mm/slub.c:13:
   include/linux/alloc_tag.h:52:56: note: expected 'union codetag_ref *' but argument is of type 'union codetag_ref'
    static inline bool is_codetag_empty(union codetag_ref *ref)
                                        ~~~~~~~~~~~~~~~~~~~^~~

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for CAN_DEV
   Depends on [n]: NETDEVICES [=n] && CAN [=y]
   Selected by [y]:
   - CAN [=y] && NET [=y]


vim +/is_codetag_empty +2056 mm/slub.c

  2042	
  2043	static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
  2044	{
  2045		unsigned long slab_exts;
  2046		struct slab *obj_exts_slab;
  2047	
  2048		obj_exts_slab = virt_to_slab(obj_exts);
  2049		slab_exts = slab_obj_exts(obj_exts_slab);
  2050		if (slab_exts) {
  2051			unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
  2052							 obj_exts_slab, obj_exts);
  2053			struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
  2054							       slab_exts, offs);
  2055	
> 2056			if (unlikely(is_codetag_empty(ext->ref)))
  2057				return;
  2058	
  2059			/* codetag should be NULL here */
  2060			WARN_ON(ext->ref.ct);
  2061			set_codetag_empty(&ext->ref);
  2062		}
  2063	}
  2064	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-22 11:08 ` [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
@ 2025-12-23  1:40   ` kernel test robot
  2025-12-23 15:08   ` Hao Li
  1 sibling, 0 replies; 27+ messages in thread
From: kernel test robot @ 2025-12-23  1:40 UTC (permalink / raw)
  To: Harry Yoo, akpm, vbabka
  Cc: llvm, oe-kbuild-all, andreyknvl, cl, dvyukov, glider, hannes,
	linux-mm, mhocko, muchun.song, rientjes, roman.gushchin,
	ryabinin.a.a, shakeel.butt, surenb, vincenzo.frascino,
	yeoreum.yun, harry.yoo, tytso, adilger.kernel, linux-ext4,
	linux-kernel, cgroups, hao.li

Hi Harry,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Harry-Yoo/mm-slab-use-unsigned-long-for-orig_size-to-ensure-proper-metadata-align/20251222-191144
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251222110843.980347-8-harry.yoo%40oracle.com
patch subject: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
config: x86_64-buildonly-randconfig-001-20251223 (https://download.01.org/0day-ci/archive/20251223/202512231042.EEBUajQY-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251223/202512231042.EEBUajQY-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512231042.EEBUajQY-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/slub.c:2140:33: error: passing 'union codetag_ref' to parameter of incompatible type 'union codetag_ref *'; take the address with &
    2140 |                 if (unlikely(is_codetag_empty(ext->ref))) {
         |                                               ^~~~~~~~
         |                                               &
   include/linux/compiler.h:47:41: note: expanded from macro 'unlikely'
      47 | #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
         |                                           ^
   include/linux/compiler.h:32:34: note: expanded from macro '__branch_check__'
      32 |                         ______r = __builtin_expect(!!(x), expect);      \
         |                                                       ^
   include/linux/alloc_tag.h:52:56: note: passing argument to parameter 'ref' here
      52 | static inline bool is_codetag_empty(union codetag_ref *ref)
         |                                                        ^
   mm/slub.c:2140:33: error: passing 'union codetag_ref' to parameter of incompatible type 'union codetag_ref *'; take the address with &
    2140 |                 if (unlikely(is_codetag_empty(ext->ref))) {
         |                                               ^~~~~~~~
         |                                               &
   include/linux/compiler.h:47:68: note: expanded from macro 'unlikely'
      47 | #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
         |                                                                      ^
   include/linux/compiler.h:34:19: note: expanded from macro '__branch_check__'
      34 |                                              expect, is_constant);      \
         |                                                      ^~~~~~~~~~~
   include/linux/alloc_tag.h:52:56: note: passing argument to parameter 'ref' here
      52 | static inline bool is_codetag_empty(union codetag_ref *ref)
         |                                                        ^
>> mm/slub.c:2326:16: error: use of undeclared identifier 'MEMCG_DATA_OBJEXTS'
    2326 |                         obj_exts |= MEMCG_DATA_OBJEXTS;
         |                                     ^
   3 errors generated.


vim +/MEMCG_DATA_OBJEXTS +2326 mm/slub.c

  2302	
  2303	/*
  2304	 * Try to allocate slabobj_ext array from unused space.
  2305	 * This function must be called on a freshly allocated slab to prevent
  2306	 * concurrency problems.
  2307	 */
  2308	static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
  2309	{
  2310		void *addr;
  2311		unsigned long obj_exts;
  2312	
  2313		if (!need_slab_obj_exts(s))
  2314			return;
  2315	
  2316		if (obj_exts_fit_within_slab_leftover(s, slab)) {
  2317			addr = slab_address(slab) + obj_exts_offset_in_slab(s, slab);
  2318			addr = kasan_reset_tag(addr);
  2319			obj_exts = (unsigned long)addr;
  2320	
  2321			get_slab_obj_exts(obj_exts);
  2322			memset(addr, 0, obj_exts_size_in_slab(slab));
  2323			put_slab_obj_exts(obj_exts);
  2324	
  2325			if (IS_ENABLED(CONFIG_MEMCG))
> 2326				obj_exts |= MEMCG_DATA_OBJEXTS;
  2327			slab->obj_exts = obj_exts;
  2328			slab_set_stride(slab, sizeof(struct slabobj_ext));
  2329		}
  2330	}
  2331	
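
This one is presumably a CONFIG_MEMCG=n randconfig (SLAB_OBJ_EXT pulled in by
memory allocation profiling), where MEMCG_DATA_OBJEXTS is not defined at all,
so IS_ENABLED() cannot keep the identifier away from the compiler. A minimal
sketch of one possible shape of a fix, assuming no !MEMCG definition of the
flag is added:

#ifdef CONFIG_MEMCG
		obj_exts |= MEMCG_DATA_OBJEXTS;
#endif
		slab->obj_exts = obj_exts;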

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-22 11:08 ` [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
  2025-12-23  1:40   ` kernel test robot
@ 2025-12-23 15:08   ` Hao Li
  2025-12-23 15:31     ` Harry Yoo
  1 sibling, 1 reply; 27+ messages in thread
From: Hao Li @ 2025-12-23 15:08 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> The leftover space in a slab is always smaller than s->size, and
> kmem caches for large objects that are not power-of-two sizes tend to have
> a greater amount of leftover space per slab. In some cases, the leftover
> space is larger than the size of the slabobj_ext array for the slab.
> 
> An excellent example of such a cache is ext4_inode_cache. On my system,
> the object size is 1144, with a preferred order of 3, 28 objects per slab,
> and 736 bytes of leftover space per slab.
> 
> Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> fits within the leftover space.
> 
> Allocate the slabobj_exts array from this unused space instead of using
> kcalloc() when it is large enough. The array is allocated from unused
> space only when creating new slabs, and it doesn't try to utilize unused
> space if alloc_slab_obj_exts() is called after slab creation because
> implementing lazy allocation involves more expensive synchronization.
> 
> The implementation and evaluation of lazy allocation from unused space
> is left as future-work. As pointed by Vlastimil Babka [1], it could be
> beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> some of the allocations from the cache use __GFP_ACCOUNT. For example,
> xarray does that.
> 
> To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> array only when either of them is enabled.
> 
> [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> 
> Before patch (creating ~2.64M directories on ext4):
>   Slab:            4747880 kB
>   SReclaimable:    4169652 kB
>   SUnreclaim:       578228 kB
> 
> After patch (creating ~2.64M directories on ext4):
>   Slab:            4724020 kB
>   SReclaimable:    4169188 kB
>   SUnreclaim:       554832 kB (-22.84 MiB)
> 
> Enjoy the memory savings!
> 
> Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz [1]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 151 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 39c381cc1b2c..3fc3d2ca42e7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
>  	return *(unsigned long *)p;
>  }
>  
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +
> +/*
> + * Check if memory cgroup or memory allocation profiling is enabled.
> + * If enabled, SLUB tries to reduce memory overhead of accounting
> + * slab objects. If neither is enabled when this function is called,
> + * the optimization is simply skipped to avoid affecting caches that do not
> + * need slabobj_ext metadata.
> + *
> + * However, this may disable optimization when memory cgroup or memory
> + * allocation profiling is used, but slabs are created too early
> + * even before those subsystems are initialized.
> + */
> +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> +{
> +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> +		return true;
> +
> +	if (mem_alloc_profiling_enabled())
> +		return true;
> +
> +	return false;
> +}
> +
> +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> +{
> +	return sizeof(struct slabobj_ext) * slab->objects;
> +}
> +
> +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> +						    struct slab *slab)
> +{
> +	unsigned long objext_offset;
> +
> +	objext_offset = s->red_left_pad + s->size * slab->objects;

Hi Harry,

As s->size already includes s->red_left_pad, do we still need
s->red_left_pad here?

> +	objext_offset = ALIGN(objext_offset, sizeof(struct slabobj_ext));
> +	return objext_offset;
> +}
> +
> +static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
> +						     struct slab *slab)
> +{
> +	unsigned long objext_offset = obj_exts_offset_in_slab(s, slab);
> +	unsigned long objext_size = obj_exts_size_in_slab(slab);
> +
> +	return objext_offset + objext_size <= slab_size(slab);
> +}
> +
> +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> +{
> +	unsigned long expected;
> +	unsigned long obj_exts;
> +
> +	obj_exts = slab_obj_exts(slab);
> +	if (!obj_exts)
> +		return false;
> +
> +	if (!obj_exts_fit_within_slab_leftover(s, slab))
> +		return false;
> +
> +	expected = (unsigned long)slab_address(slab);
> +	expected += obj_exts_offset_in_slab(s, slab);
> +	return obj_exts == expected;
> +}
> +#else
> +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> +{
> +	return false;
> +}
> +
> +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> +{
> +	return 0;
> +}
> +
> +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> +						    struct slab *slab)
> +{
> +	return 0;
> +}
> +
> +static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
> +						     struct slab *slab)
> +{
> +	return false;
> +}
> +
> +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifdef CONFIG_SLUB_DEBUG
>  
>  /*
> @@ -1405,7 +1498,15 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
>  	start = slab_address(slab);
>  	length = slab_size(slab);
>  	end = start + length;
> -	remainder = length % s->size;
> +
> +	if (obj_exts_in_slab(s, slab)) {
> +		remainder = length;
> +		remainder -= obj_exts_offset_in_slab(s, slab);
> +		remainder -= obj_exts_size_in_slab(slab);
> +	} else {
> +		remainder = length % s->size;
> +	}
> +
>  	if (!remainder)
>  		return;
>  
> @@ -2179,6 +2280,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
>  		return;
>  	}
>  
> +	if (obj_exts_in_slab(slab->slab_cache, slab)) {
> +		slab->obj_exts = 0;
> +		return;
> +	}
> +
>  	/*
>  	 * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
>  	 * corresponding extension will be NULL. alloc_tag_sub() will throw a
> @@ -2194,6 +2300,35 @@ static inline void free_slab_obj_exts(struct slab *slab)
>  	slab->obj_exts = 0;
>  }
>  
> +/*
> + * Try to allocate slabobj_ext array from unused space.
> + * This function must be called on a freshly allocated slab to prevent
> + * concurrency problems.
> + */
> +static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> +{
> +	void *addr;
> +	unsigned long obj_exts;
> +
> +	if (!need_slab_obj_exts(s))
> +		return;
> +
> +	if (obj_exts_fit_within_slab_leftover(s, slab)) {
> +		addr = slab_address(slab) + obj_exts_offset_in_slab(s, slab);
> +		addr = kasan_reset_tag(addr);
> +		obj_exts = (unsigned long)addr;
> +
> +		get_slab_obj_exts(obj_exts);
> +		memset(addr, 0, obj_exts_size_in_slab(slab));
> +		put_slab_obj_exts(obj_exts);
> +
> +		if (IS_ENABLED(CONFIG_MEMCG))
> +			obj_exts |= MEMCG_DATA_OBJEXTS;
> +		slab->obj_exts = obj_exts;
> +		slab_set_stride(slab, sizeof(struct slabobj_ext));
> +	}
> +}
> +
>  #else /* CONFIG_SLAB_OBJ_EXT */
>  
>  static inline void init_slab_obj_exts(struct slab *slab)
> @@ -2210,6 +2345,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
>  {
>  }
>  
> +static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
> +						       struct slab *slab)
> +{
> +}
> +
>  #endif /* CONFIG_SLAB_OBJ_EXT */
>  
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
> @@ -3206,7 +3346,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
>  static __always_inline void account_slab(struct slab *slab, int order,
>  					 struct kmem_cache *s, gfp_t gfp)
>  {
> -	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> +	if (memcg_kmem_online() &&
> +			(s->flags & SLAB_ACCOUNT) &&
> +			!slab_obj_exts(slab))
>  		alloc_slab_obj_exts(slab, s, gfp, true);
>  
>  	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> @@ -3270,9 +3412,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	slab->objects = oo_objects(oo);
>  	slab->inuse = 0;
>  	slab->frozen = 0;
> -	init_slab_obj_exts(slab);
> -
> -	account_slab(slab, oo_order(oo), s, flags);
>  
>  	slab->slab_cache = s;
>  
> @@ -3281,6 +3420,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	start = slab_address(slab);
>  
>  	setup_slab_debug(s, slab, start);
> +	init_slab_obj_exts(slab);
> +	/*
> +	 * Poison the slab before initializing the slabobj_ext array
> +	 * to prevent the array from being overwritten.
> +	 */
> +	alloc_slab_obj_exts_early(s, slab);
> +	account_slab(slab, oo_order(oo), s, flags);
>  
>  	shuffle = shuffle_freelist(s, slab);
>  
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-23 15:08   ` Hao Li
@ 2025-12-23 15:31     ` Harry Yoo
  2025-12-23 16:08       ` Hao Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Yoo @ 2025-12-23 15:31 UTC (permalink / raw)
  To: Hao Li
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > The leftover space in a slab is always smaller than s->size, and
> > kmem caches for large objects that are not power-of-two sizes tend to have
> > a greater amount of leftover space per slab. In some cases, the leftover
> > space is larger than the size of the slabobj_ext array for the slab.
> > 
> > An excellent example of such a cache is ext4_inode_cache. On my system,
> > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > and 736 bytes of leftover space per slab.
> > 
> > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > fits within the leftover space.
> > 
> > Allocate the slabobj_exts array from this unused space instead of using
> > kcalloc() when it is large enough. The array is allocated from unused
> > space only when creating new slabs, and it doesn't try to utilize unused
> > space if alloc_slab_obj_exts() is called after slab creation because
> > implementing lazy allocation involves more expensive synchronization.
> > 
> > The implementation and evaluation of lazy allocation from unused space
> > is left as future-work. As pointed by Vlastimil Babka [1], it could be
> > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > xarray does that.
> > 
> > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > array only when either of them is enabled.
> > 
> > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > 
> > Before patch (creating ~2.64M directories on ext4):
> >   Slab:            4747880 kB
> >   SReclaimable:    4169652 kB
> >   SUnreclaim:       578228 kB
> > 
> > After patch (creating ~2.64M directories on ext4):
> >   Slab:            4724020 kB
> >   SReclaimable:    4169188 kB
> >   SUnreclaim:       554832 kB (-22.84 MiB)
> > 
> > Enjoy the memory savings!
> > 
> > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 151 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> >  	return *(unsigned long *)p;
> >  }
> >  
> > +#ifdef CONFIG_SLAB_OBJ_EXT
> > +
> > +/*
> > + * Check if memory cgroup or memory allocation profiling is enabled.
> > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > + * slab objects. If neither is enabled when this function is called,
> > + * the optimization is simply skipped to avoid affecting caches that do not
> > + * need slabobj_ext metadata.
> > + *
> > + * However, this may disable optimization when memory cgroup or memory
> > + * allocation profiling is used, but slabs are created too early
> > + * even before those subsystems are initialized.
> > + */
> > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > +{
> > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > +		return true;
> > +
> > +	if (mem_alloc_profiling_enabled())
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > +{
> > +	return sizeof(struct slabobj_ext) * slab->objects;
> > +}
> > +
> > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > +						    struct slab *slab)
> > +{
> > +	unsigned long objext_offset;
> > +
> > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> 
> Hi Harry,

Hi Hao, thanks for the review!
Hope you're doing well.

> As s->size already includes s->red_left_pad

Great question. It's true that s->size includes s->red_left_pad,
but we also have a redzone right before the first object:

  [ redzone ] [ obj 1 | redzone ] [ obj 2| redzone ] [ ... ]

So we have (slab->objects + 1) red zones and so

> do we still need s->red_left_pad here?

I think this is still needed.

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-23 15:31     ` Harry Yoo
@ 2025-12-23 16:08       ` Hao Li
  2025-12-23 16:25         ` Harry Yoo
  0 siblings, 1 reply; 27+ messages in thread
From: Hao Li @ 2025-12-23 16:08 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > The leftover space in a slab is always smaller than s->size, and
> > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > a greater amount of leftover space per slab. In some cases, the leftover
> > > space is larger than the size of the slabobj_ext array for the slab.
> > > 
> > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > and 736 bytes of leftover space per slab.
> > > 
> > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > fits within the leftover space.
> > > 
> > > Allocate the slabobj_exts array from this unused space instead of using
> > > kcalloc() when it is large enough. The array is allocated from unused
> > > space only when creating new slabs, and it doesn't try to utilize unused
> > > space if alloc_slab_obj_exts() is called after slab creation because
> > > implementing lazy allocation involves more expensive synchronization.
> > > 
> > > The implementation and evaluation of lazy allocation from unused space
> > > is left as future-work. As pointed by Vlastimil Babka [1], it could be
> > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > xarray does that.
> > > 
> > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > array only when either of them is enabled.
> > > 
> > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > 
> > > Before patch (creating ~2.64M directories on ext4):
> > >   Slab:            4747880 kB
> > >   SReclaimable:    4169652 kB
> > >   SUnreclaim:       578228 kB
> > > 
> > > After patch (creating ~2.64M directories on ext4):
> > >   Slab:            4724020 kB
> > >   SReclaimable:    4169188 kB
> > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > 
> > > Enjoy the memory savings!
> > > 
> > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
> > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > >  	return *(unsigned long *)p;
> > >  }
> > >  
> > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > +
> > > +/*
> > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > + * slab objects. If neither is enabled when this function is called,
> > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > + * need slabobj_ext metadata.
> > > + *
> > > + * However, this may disable optimization when memory cgroup or memory
> > > + * allocation profiling is used, but slabs are created too early
> > > + * even before those subsystems are initialized.
> > > + */
> > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > +{
> > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > +		return true;
> > > +
> > > +	if (mem_alloc_profiling_enabled())
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > +{
> > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > +}
> > > +
> > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > +						    struct slab *slab)
> > > +{
> > > +	unsigned long objext_offset;
> > > +
> > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > 
> > Hi Harry,
> 
> Hi Hao, thanks for the review!
> Hope you're doing well.

Thanks Harry. Hope you are too!

> 
> > As s->size already includes s->red_left_pad
> 
> Great question. It's true that s->size includes s->red_left_pad,
> but we also have a redzone right before the first object:
> 
>   [ redzone ] [ obj 1 | redzone ] [ obj 2| redzone ] [ ... ]
> 
> So we have (slab->objects + 1) red zones and so

I have a follow-up question regarding the redzones. Unless I'm missing
some detail, it seems the left redzone should apply to each object as
well. If so, I would expect the memory layout to be:

[left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]

In `calculate_sizes()`, I see:

if ((flags & SLAB_RED_ZONE) && size == s->object_size)
    size += sizeof(void *);
...
...
if (flags & SLAB_RED_ZONE) {
    size += s->red_left_pad;
}

Could you please confirm whether my understanding is correct, or point
out what I'm missing?

> 
> > > do we still need s->red_left_pad here?
> 
> I think this is still needed.
> 
> -- 
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-23 16:08       ` Hao Li
@ 2025-12-23 16:25         ` Harry Yoo
  2025-12-24  3:18           ` Hao Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Yoo @ 2025-12-23 16:25 UTC (permalink / raw)
  To: Hao Li
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 12:08:36AM +0800, Hao Li wrote:
> On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> > On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > > The leftover space in a slab is always smaller than s->size, and
> > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > space is larger than the size of the slabobj_ext array for the slab.
> > > > 
> > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > and 736 bytes of leftover space per slab.
> > > > 
> > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > fits within the leftover space.
> > > > 
> > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > kcalloc() when it is large enough. The array is allocated from unused
> > > > space only when creating new slabs, and it doesn't try to utilize unused
> > > > space if alloc_slab_obj_exts() is called after slab creation because
> > > > implementing lazy allocation involves more expensive synchronization.
> > > > 
> > > > The implementation and evaluation of lazy allocation from unused space
> > > > is left as future-work. As pointed by Vlastimil Babka [1], it could be
> > > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > > xarray does that.
> > > > 
> > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > > array only when either of them is enabled.
> > > > 
> > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > > 
> > > > Before patch (creating ~2.64M directories on ext4):
> > > >   Slab:            4747880 kB
> > > >   SReclaimable:    4169652 kB
> > > >   SUnreclaim:       578228 kB
> > > > 
> > > > After patch (creating ~2.64M directories on ext4):
> > > >   Slab:            4724020 kB
> > > >   SReclaimable:    4169188 kB
> > > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > > 
> > > > Enjoy the memory savings!
> > > > 
> > > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > ---
> > > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > > >  	return *(unsigned long *)p;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > > +
> > > > +/*
> > > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > > + * slab objects. If neither is enabled when this function is called,
> > > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > > + * need slabobj_ext metadata.
> > > > + *
> > > > + * However, this may disable optimization when memory cgroup or memory
> > > > + * allocation profiling is used, but slabs are created too early
> > > > + * even before those subsystems are initialized.
> > > > + */
> > > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > > +{
> > > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > +		return true;
> > > > +
> > > > +	if (mem_alloc_profiling_enabled())
> > > > +		return true;
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > > +{
> > > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > > +}
> > > > +
> > > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > > +						    struct slab *slab)
> > > > +{
> > > > +	unsigned long objext_offset;
> > > > +
> > > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > > 
> > > Hi Harry,
> > 
> > Hi Hao, thanks for the review!
> > Hope you're doing well.
> 
> Thanks Harry. Hope you are too!
> 
> > 
> > > As s->size already includes s->red_left_pad
> > 
> > Great question. It's true that s->size includes s->red_left_pad,
> > but we also have a redzone right before the first object:
> > 
> >   [ redzone ] [ obj 1 | redzone ] [ obj 2| redzone ] [ ... ]
> > 
> > So we have (slab->objects + 1) red zones and so
> 
> I have a follow-up question regarding the redzones. Unless I'm missing
> some detail, it seems the left redzone should apply to each object as
> well. If so, I would expect the memory layout to be:
> 
> [left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]
> 
> In `calculate_sizes()`, I see:
> 
> if ((flags & SLAB_RED_ZONE) && size == s->object_size)
>     size += sizeof(void *);

Yes, this is the right redzone,

> ...
> ...
> if (flags & SLAB_RED_ZONE) {
>     size += s->red_left_pad;
> }

This is the left red zone.
Both of them are included in the size...

Oh god, I was confused, thanks for the correction!

> Could you please confirm whether my understanding is correct, or point
> out what I'm missing?

I think your understanding is correct.

Hmm, perhaps we should update the "Object layout:" comment above
check_pad_bytes() to avoid future confusion?

> > > do we still need s->red_left_pad here?
> > 
> > I think this is still needed.
> > 
> > -- 
> > Cheers,
> > Harry / Hyeonggon

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-23 16:25         ` Harry Yoo
@ 2025-12-24  3:18           ` Hao Li
  2025-12-24  5:53             ` Harry Yoo
  0 siblings, 1 reply; 27+ messages in thread
From: Hao Li @ 2025-12-24  3:18 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 01:25:01AM +0900, Harry Yoo wrote:
> On Wed, Dec 24, 2025 at 12:08:36AM +0800, Hao Li wrote:
> > On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> > > On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > > > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > > > The leftover space in a slab is always smaller than s->size, and
> > > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > > space is larger than the size of the slabobj_ext array for the slab.
> > > > > 
> > > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > > and 736 bytes of leftover space per slab.
> > > > > 
> > > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > > fits within the leftover space.
> > > > > 
> > > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > > kcalloc() when it is large enough. The array is allocated from unused
> > > > > space only when creating new slabs, and it doesn't try to utilize unused
> > > > > space if alloc_slab_obj_exts() is called after slab creation because
> > > > > implementing lazy allocation involves more expensive synchronization.
> > > > > 
> > > > > The implementation and evaluation of lazy allocation from unused space
> > > > > is left as future-work. As pointed by Vlastimil Babka [1], it could be
> > > > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > > > xarray does that.
> > > > > 
> > > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > > > array only when either of them is enabled.
> > > > > 
> > > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > > > 
> > > > > Before patch (creating ~2.64M directories on ext4):
> > > > >   Slab:            4747880 kB
> > > > >   SReclaimable:    4169652 kB
> > > > >   SUnreclaim:       578228 kB
> > > > > 
> > > > > After patch (creating ~2.64M directories on ext4):
> > > > >   Slab:            4724020 kB
> > > > >   SReclaimable:    4169188 kB
> > > > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > > > 
> > > > > Enjoy the memory savings!
> > > > > 
> > > > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > ---
> > > > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > > > --- a/mm/slub.c
> > > > > +++ b/mm/slub.c
> > > > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > > > >  	return *(unsigned long *)p;
> > > > >  }
> > > > >  
> > > > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > > > +
> > > > > +/*
> > > > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > > > + * slab objects. If neither is enabled when this function is called,
> > > > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > > > + * need slabobj_ext metadata.
> > > > > + *
> > > > > + * However, this may disable optimization when memory cgroup or memory
> > > > > + * allocation profiling is used, but slabs are created too early
> > > > > + * even before those subsystems are initialized.
> > > > > + */
> > > > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > > > +{
> > > > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > > +		return true;
> > > > > +
> > > > > +	if (mem_alloc_profiling_enabled())
> > > > > +		return true;
> > > > > +
> > > > > +	return false;
> > > > > +}
> > > > > +
> > > > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > > > +{
> > > > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > > > +}
> > > > > +
> > > > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > > > +						    struct slab *slab)
> > > > > +{
> > > > > +	unsigned long objext_offset;
> > > > > +
> > > > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > > > 
> > > > Hi Harry,
> > > 
> > > Hi Hao, thanks for the review!
> > > Hope you're doing well.
> > 
> > Thanks Harry. Hope you are too!
> > 
> > > 
> > > > As s->size already includes s->red_left_pad
> > > 
> > > Great question. It's true that s->size includes s->red_left_pad,
> > > but we also have a redzone right before the first object:
> > > 
> > >   [ redzone ] [ obj 1 | redzone ] [ obj 2| redzone ] [ ... ]
> > > 
> > > So we have (slab->objects + 1) red zones and so
> > 
> > I have a follow-up question regarding the redzones. Unless I'm missing
> > some detail, it seems the left redzone should apply to each object as
> > well. If so, I would expect the memory layout to be:
> > 
> > [left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]
> > 
> > In `calculate_sizes()`, I see:
> > 
> > if ((flags & SLAB_RED_ZONE) && size == s->object_size)
> >     size += sizeof(void *);
> 
> Yes, this is the right redzone,
> 
> > ...
> > ...
> > if (flags & SLAB_RED_ZONE) {
> >     size += s->red_left_pad;
> > }
> 
> This is the left red zone.
> Both of them are included in the size...
> 
> Oh god, I was confused, thanks for the correction!

Glad it helped!

> 
> > Could you please confirm whether my understanding is correct, or point
> > out what I'm missing?
> 
> I think your understanding is correct.
> 
> Hmm, perhaps we should update the "Object layout:" comment above
> check_pad_bytes() to avoid future confusion?

Yes, exactly. That’s a good idea. Also, I feel the layout description in
the check_pad_bytes() comment isn’t very intuitive and can be a bit hard
to follow. I think it might be clearer if we explicitly list out each
field. What do you think about that?
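
Just to make the suggestion concrete, something along these lines (field
order taken from the A..F list in patch 8 plus the left redzone we discussed;
the wording is only a sketch):

	/*
	 * With SLAB_RED_ZONE, each s->size slot looks like:
	 *
	 *	left redzone (s->red_left_pad)
	 *	object (s->object_size)
	 *	right redzone
	 *	free pointer (if it cannot overwrite the object on free)
	 *	SLAB_STORE_USER tracking data (2 x struct track)
	 *	original request size (kmalloc, SLAB_STORE_USER)
	 *	KASAN alloc metadata
	 *	struct slabobj_ext (SLAB_OBJ_EXT_IN_OBJ)
	 *	padding up to the alignment boundary
	 */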

> 
> > > > do we still need s->red_left_pad here?
> > > 
> > > I think this is still needed.
> > > 
> > > -- 
> > > Cheers,
> > > Harry / Hyeonggon
> 
> -- 
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
  2025-12-22 11:08 ` [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
@ 2025-12-24  5:33   ` Hao Li
  2025-12-24  6:38     ` Harry Yoo
  0 siblings, 1 reply; 27+ messages in thread
From: Hao Li @ 2025-12-24  5:33 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Mon, Dec 22, 2025 at 08:08:43PM +0900, Harry Yoo wrote:
> When a cache has a high s->align value and s->object_size is not aligned
> to it, each object ends up with some unused space because of alignment.
> If this wasted space is big enough, we can use it to store the
> slabobj_ext metadata instead of wasting it.
> 
> On my system, this happens with caches like kmem_cache, mm_struct, pid,
> task_struct, sighand_cache, xfs_inode, and others.
> 
> To place the slabobj_ext metadata within each object, the existing
> slab_obj_ext() logic can still be used by setting:
> 
>   - slab->obj_exts = slab_address(slab) + s->red_left_pad +
>                      (slabobj_ext offset)
>   - stride = s->size
> 
> slab_obj_ext() doesn't need to know where the metadata is stored,
> so this method works without adding extra overhead to slab_obj_ext().
> 
> A good example benefiting from this optimization is xfs_inode
> (object_size: 992, align: 64). To measure the memory savings, about
> 2.64 million directories were created on XFS.
> 
> [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> 
> Before patch (creating ~2.64M directories on xfs):
>   Slab:            5175976 kB
>   SReclaimable:    3837524 kB
>   SUnreclaim:      1338452 kB
> 
> After patch (creating ~2.64M directories on xfs):
>   Slab:            5152912 kB
>   SReclaimable:    3838568 kB
>   SUnreclaim:      1314344 kB (-23.54 MiB)
> 
> Enjoy the memory savings!
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>  include/linux/slab.h |  9 ++++++
>  mm/slab_common.c     |  6 ++--
>  mm/slub.c            | 73 ++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 83 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 4554c04a9bd7..da512d9ab1a0 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -59,6 +59,9 @@ enum _slab_flag_bits {
>  	_SLAB_CMPXCHG_DOUBLE,
>  #ifdef CONFIG_SLAB_OBJ_EXT
>  	_SLAB_NO_OBJ_EXT,
> +#endif
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +	_SLAB_OBJ_EXT_IN_OBJ,
>  #endif
>  	_SLAB_FLAGS_LAST_BIT
>  };
> @@ -244,6 +247,12 @@ enum _slab_flag_bits {
>  #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
>  #endif
>  
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
> +#else
> +#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_UNUSED
> +#endif
> +
>  /*
>   * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
>   *
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index c4cf9ed2ec92..f0a6db20d7ea 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
>  struct kmem_cache *kmem_cache;
>  
>  /*
> - * Set of flags that will prevent slab merging
> + * Set of flags that will prevent slab merging.
> + * Any flag that adds per-object metadata should be included,
> + * since slab merging can update s->inuse that affects the metadata layout.
>   */
>  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
>  		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> -		SLAB_FAILSLAB | SLAB_NO_MERGE)
> +		SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
>  
>  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
>  			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> diff --git a/mm/slub.c b/mm/slub.c
> index 3fc3d2ca42e7..78f0087c8e48 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -977,6 +977,39 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
>  {
>  	return false;
>  }
> +
> +#endif
> +
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +static bool obj_exts_in_object(struct kmem_cache *s)
> +{
> +	return s->flags & SLAB_OBJ_EXT_IN_OBJ;
> +}
> +
> +static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> +{
> +	unsigned int offset = get_info_end(s);
> +
> +	if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
> +		offset += sizeof(struct track) * 2;
> +
> +	if (slub_debug_orig_size(s))
> +		offset += sizeof(unsigned long);
> +
> +	offset += kasan_metadata_size(s, false);
> +
> +	return offset;
> +}
> +#else
> +static inline bool obj_exts_in_object(struct kmem_cache *s)
> +{
> +	return false;
> +}
> +
> +static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> +{
> +	return 0;
> +}
>  #endif
>  
>  #ifdef CONFIG_SLUB_DEBUG
> @@ -1277,6 +1310,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
>  
>  	off += kasan_metadata_size(s, false);
>  
> +	if (obj_exts_in_object(s))
> +		off += sizeof(struct slabobj_ext);
> +
>  	if (off != size_from_object(s))
>  		/* Beginning of the filler is the free pointer */
>  		print_section(KERN_ERR, "Padding  ", p + off,
> @@ -1446,7 +1482,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
>   * 	A. Free pointer (if we cannot overwrite object on free)
>   * 	B. Tracking data for SLAB_STORE_USER
>   *	C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> - *	D. Padding to reach required alignment boundary or at minimum
> + *	D. KASAN alloc metadata (KASAN enabled)
> + *	E. struct slabobj_ext to store accounting metadata
> + *	   (SLAB_OBJ_EXT_IN_OBJ enabled)
> + *	F. Padding to reach required alignment boundary or at minimum
>   * 		one word if debugging is on to be able to detect writes
>   * 		before the word boundary.
>   *
> @@ -1474,6 +1513,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
>  
>  	off += kasan_metadata_size(s, false);
>  
> +	if (obj_exts_in_object(s))
> +		off += sizeof(struct slabobj_ext);
> +
>  	if (size_from_object(s) == off)
>  		return 1;
>  
> @@ -2280,7 +2322,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
>  		return;
>  	}
>  
> -	if (obj_exts_in_slab(slab->slab_cache, slab)) {
> +	if (obj_exts_in_slab(slab->slab_cache, slab) ||
> +			obj_exts_in_object(slab->slab_cache)) {
>  		slab->obj_exts = 0;
>  		return;
>  	}
> @@ -2326,6 +2369,23 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
>  			obj_exts |= MEMCG_DATA_OBJEXTS;
>  		slab->obj_exts = obj_exts;
>  		slab_set_stride(slab, sizeof(struct slabobj_ext));
> +	} else if (obj_exts_in_object(s)) {
> +		unsigned int offset = obj_exts_offset_in_object(s);
> +
> +		obj_exts = (unsigned long)slab_address(slab);
> +		obj_exts += s->red_left_pad;
> +		obj_exts += obj_exts_offset_in_object(s);

Hi, Harry

It looks like this could just be simplified to obj_exts += offset, right?
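
For anyone following along, here is roughly what the resulting setup looks
like, with illustrative xfs_inode-like numbers (object_size 992, align 64,
no debug flags, free pointer kept inside the object, MEMCG=y,
MEM_ALLOC_PROFILING=n, so sizeof(struct slabobj_ext) is a single pointer):

	offset   = obj_exts_offset_in_object(s);  /* end of in-object metadata */
	obj_exts = (unsigned long)slab_address(slab) + s->red_left_pad + offset;
	/*
	 * s->size == ALIGN(992, 64) == 1024 here, so the 32-byte alignment
	 * gap easily holds the 8-byte slabobj_ext, and with
	 * slab_set_stride(slab, s->size), slab_obj_ext(slab, obj_exts, i)
	 * steps 1024 bytes per object, straight into each object's gap.
	 * The numbers are only meant to be plausible, not exact.
	 */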

> +
> +		get_slab_obj_exts(obj_exts);
> +		for_each_object(addr, s, slab_address(slab), slab->objects)
> +			memset(kasan_reset_tag(addr) + offset, 0,
> +			       sizeof(struct slabobj_ext));
> +		put_slab_obj_exts(obj_exts);
> +
> +		if (IS_ENABLED(CONFIG_MEMCG))
> +			obj_exts |= MEMCG_DATA_OBJEXTS;
> +		slab->obj_exts = obj_exts;
> +		slab_set_stride(slab, s->size);
>  	}
>  }
>  
> @@ -8023,6 +8083,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
>  {
>  	slab_flags_t flags = s->flags;
>  	unsigned int size = s->object_size;
> +	unsigned int aligned_size;
>  	unsigned int order;
>  
>  	/*
> @@ -8132,7 +8193,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
>  	 * offset 0. In order to align the objects we have to simply size
>  	 * each object to conform to the alignment.
>  	 */
> -	size = ALIGN(size, s->align);
> +	aligned_size = ALIGN(size, s->align);
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +	if (aligned_size - size >= sizeof(struct slabobj_ext))
> +		s->flags |= SLAB_OBJ_EXT_IN_OBJ;
> +#endif
> +	size = aligned_size;
> +

One more thought: in calculate_sizes() we add some extra padding when
SLAB_RED_ZONE is enabled:

if (flags & SLAB_RED_ZONE) {
	/*
	 * Add some empty padding so that we can catch
	 * overwrites from earlier objects rather than let
	 * tracking information or the free pointer be
	 * corrupted if a user writes before the start
	 * of the object.
	 */
	size += sizeof(void *);
	...
}


From what I understand, this additional padding ends up being placed
after the KASAN allocation metadata.
Since it’s only "extra" padding (i.e., it doesn’t seem strictly required
for the layout), and your patch would reuse this area — together with
the final padding introduced by `size = ALIGN(size, s->align);` — for
objext, it seems like this padding may no longer provide much benefit.

Do you think it would make sense to remove this extra padding
altogether?

-- 
Thanks,
Hao
>  	s->size = size;
>  	s->reciprocal_size = reciprocal_value(size);
>  	order = calculate_order(size);
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-24  3:18           ` Hao Li
@ 2025-12-24  5:53             ` Harry Yoo
  2025-12-24  6:05               ` Hao Li
  2025-12-24 12:51               ` [PATCH] slub: clarify object field layout comments Hao Li
  0 siblings, 2 replies; 27+ messages in thread
From: Harry Yoo @ 2025-12-24  5:53 UTC (permalink / raw)
  To: Hao Li
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 11:18:56AM +0800, Hao Li wrote:
> On Wed, Dec 24, 2025 at 01:25:01AM +0900, Harry Yoo wrote:
> > On Wed, Dec 24, 2025 at 12:08:36AM +0800, Hao Li wrote:
> > > On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> > > > On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > > > > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > > > > The leftover space in a slab is always smaller than s->size, and
> > > > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > > > space is larger than the size of the slabobj_ext array for the slab.
> > > > > > 
> > > > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > > > and 736 bytes of leftover space per slab.
> > > > > > 
> > > > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > > > fits within the leftover space.
> > > > > > 
> > > > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > > > kcalloc() when it is large enough. The array is allocated from unused
> > > > > > space only when creating new slabs, and it doesn't try to utilize unused
> > > > > > space if alloc_slab_obj_exts() is called after slab creation because
> > > > > > implementing lazy allocation involves more expensive synchronization.
> > > > > > 
> > > > > > The implementation and evaluation of lazy allocation from unused space
> > > > > > is left as future work. As pointed out by Vlastimil Babka [1], it could be
> > > > > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > > > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > > > > xarray does that.
> > > > > > 
> > > > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > > > > array only when either of them is enabled.
> > > > > > 
> > > > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > > > > 
> > > > > > Before patch (creating ~2.64M directories on ext4):
> > > > > >   Slab:            4747880 kB
> > > > > >   SReclaimable:    4169652 kB
> > > > > >   SUnreclaim:       578228 kB
> > > > > > 
> > > > > > After patch (creating ~2.64M directories on ext4):
> > > > > >   Slab:            4724020 kB
> > > > > >   SReclaimable:    4169188 kB
> > > > > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > > > > 
> > > > > > Enjoy the memory savings!
> > > > > > 
> > > > > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > > ---
> > > > > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > > > > --- a/mm/slub.c
> > > > > > +++ b/mm/slub.c
> > > > > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > > > > >  	return *(unsigned long *)p;
> > > > > >  }
> > > > > >  
> > > > > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > > > > +
> > > > > > +/*
> > > > > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > > > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > > > > + * slab objects. If neither is enabled when this function is called,
> > > > > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > > > > + * need slabobj_ext metadata.
> > > > > > + *
> > > > > > + * However, this may disable optimization when memory cgroup or memory
> > > > > > + * allocation profiling is used, but slabs are created too early
> > > > > > + * even before those subsystems are initialized.
> > > > > > + */
> > > > > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > > > > +{
> > > > > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > > > +		return true;
> > > > > > +
> > > > > > +	if (mem_alloc_profiling_enabled())
> > > > > > +		return true;
> > > > > > +
> > > > > > +	return false;
> > > > > > +}
> > > > > > +
> > > > > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > > > > +{
> > > > > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > > > > +}
> > > > > > +
> > > > > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > > > > +						    struct slab *slab)
> > > > > > +{
> > > > > > +	unsigned long objext_offset;
> > > > > > +
> > > > > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > > > > 
> > > > > Hi Harry,
> > > > 
> > > > Hi Hao, thanks for the review!
> > > > Hope you're doing well.
> > > 
> > > Thanks Harry. Hope you are too!
> > > 
> > > > 
> > > > > As s->size already includes s->red_left_pad
> > > > 
> > > > Great question. It's true that s->size includes s->red_left_pad,
> > > > but we have also a redzone right before the first object:
> > > > 
> > > >   [ redzone ] [ obj 1 | redzone ] [ obj 2| redzone ] [ ... ]
> > > > 
> > > > So we have (slab->objects + 1) red zones and so
> > > 
> > > I have a follow-up question regarding the redzones. Unless I'm missing
> > > some detail, it seems the left redzone should apply to each object as
> > > well. If so, I would expect the memory layout to be:
> > > 
> > > [left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]
> > > 
> > > In `calculate_sizes()`, I see:
> > > 
> > > if ((flags & SLAB_RED_ZONE) && size == s->object_size)
> > >     size += sizeof(void *);
> > 
> > Yes, this is the right redzone,
> > 
> > > ...
> > > ...
> > > if (flags & SLAB_RED_ZONE) {
> > >     size += s->red_left_pad;
> > > }
> > 
> > This is the left red zone.
> > Both of them are included in the size...
> > 
> > Oh god, I was confused, thanks for the correction!
> 
> Glad it helped!
> 
> > > Could you please confirm whether my understanding is correct, or point
> > > out what I'm missing?
> > 
> > I think your understanding is correct.
> > 
> > Hmm, perhaps we should update the "Object layout:" comment above
> > check_pad_bytes() to avoid future confusion?
> 
> Yes, exactly. That’s a good idea.
>
> Also, I feel the layout description in the check_pad_bytes() comment
> isn’t very intuitive and can be a bit hard to follow. I think it might be
> clearer if we explicitly list out each field. What do you think about that?

Yeah it's confusing, but from your description
I'm not sure what the end result would look like.

Could you please do a patch that does it? (and also add the left redzone
to the object layout comment, if you are willing to!)

As long as it makes it more understandable/intuitive,
it'd be nice to have!

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
  2025-12-24  5:53             ` Harry Yoo
@ 2025-12-24  6:05               ` Hao Li
  2025-12-24 12:51               ` [PATCH] slub: clarify object field layout comments Hao Li
  1 sibling, 0 replies; 27+ messages in thread
From: Hao Li @ 2025-12-24  6:05 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 02:53:26PM +0900, Harry Yoo wrote:
> On Wed, Dec 24, 2025 at 11:18:56AM +0800, Hao Li wrote:
> > On Wed, Dec 24, 2025 at 01:25:01AM +0900, Harry Yoo wrote:
> > > On Wed, Dec 24, 2025 at 12:08:36AM +0800, Hao Li wrote:
> > > > On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> > > > > On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > > > > > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > > > > > The leftover space in a slab is always smaller than s->size, and
> > > > > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > > > > space is larger than the size of the slabobj_ext array for the slab.
> > > > > > > 
> > > > > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > > > > and 736 bytes of leftover space per slab.
> > > > > > > 
> > > > > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > > > > fits within the leftover space.
> > > > > > > 
> > > > > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > > > > kcalloc() when it is large enough. The array is allocated from unused
> > > > > > > space only when creating new slabs, and it doesn't try to utilize unused
> > > > > > > space if alloc_slab_obj_exts() is called after slab creation because
> > > > > > > implementing lazy allocation involves more expensive synchronization.
> > > > > > > 
> > > > > > > The implementation and evaluation of lazy allocation from unused space
> > > > > > > is left as future work. As pointed out by Vlastimil Babka [1], it could be
> > > > > > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > > > > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > > > > > xarray does that.
> > > > > > > 
> > > > > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > > > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > > > > > array only when either of them is enabled.
> > > > > > > 
> > > > > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > > > > > 
> > > > > > > Before patch (creating ~2.64M directories on ext4):
> > > > > > >   Slab:            4747880 kB
> > > > > > >   SReclaimable:    4169652 kB
> > > > > > >   SUnreclaim:       578228 kB
> > > > > > > 
> > > > > > > After patch (creating ~2.64M directories on ext4):
> > > > > > >   Slab:            4724020 kB
> > > > > > >   SReclaimable:    4169188 kB
> > > > > > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > > > > > 
> > > > > > > Enjoy the memory savings!
> > > > > > > 
> > > > > > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > > > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > > > ---
> > > > > > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > > > > > --- a/mm/slub.c
> > > > > > > +++ b/mm/slub.c
> > > > > > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > > > > > >  	return *(unsigned long *)p;
> > > > > > >  }
> > > > > > >  
> > > > > > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > > > > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > > > > > + * slab objects. If neither is enabled when this function is called,
> > > > > > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > > > > > + * need slabobj_ext metadata.
> > > > > > > + *
> > > > > > > + * However, this may disable optimization when memory cgroup or memory
> > > > > > > + * allocation profiling is used, but slabs are created too early
> > > > > > > + * even before those subsystems are initialized.
> > > > > > > + */
> > > > > > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > > > > > +{
> > > > > > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > > > > +		return true;
> > > > > > > +
> > > > > > > +	if (mem_alloc_profiling_enabled())
> > > > > > > +		return true;
> > > > > > > +
> > > > > > > +	return false;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > > > > > +{
> > > > > > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > > > > > +						    struct slab *slab)
> > > > > > > +{
> > > > > > > +	unsigned long objext_offset;
> > > > > > > +
> > > > > > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > > > > > 
> > > > > > Hi Harry,
> > > > > 
> > > > > Hi Hao, thanks for the review!
> > > > > Hope you're doing well.
> > > > 
> > > > Thanks Harry. Hope you are too!
> > > > 
> > > > > 
> > > > > > As s->size already includes s->red_left_pad
> > > > > 
> > > > > Great question. It's true that s->size includes s->red_left_pad,
> > > > > but we have also a redzone right before the first object:
> > > > > 
> > > > >   [ redzone ] [ obj 1 | redzone ] [ obj 2| redzone ] [ ... ]
> > > > > 
> > > > > So we have (slab->objects + 1) red zones and so
> > > > 
> > > > I have a follow-up question regarding the redzones. Unless I'm missing
> > > > some detail, it seems the left redzone should apply to each object as
> > > > well. If so, I would expect the memory layout to be:
> > > > 
> > > > [left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]
> > > > 
> > > > In `calculate_sizes()`, I see:
> > > > 
> > > > if ((flags & SLAB_RED_ZONE) && size == s->object_size)
> > > >     size += sizeof(void *);
> > > 
> > > Yes, this is the right redzone,
> > > 
> > > > ...
> > > > ...
> > > > if (flags & SLAB_RED_ZONE) {
> > > >     size += s->red_left_pad;
> > > > }
> > > 
> > > This is the left red zone.
> > > Both of them are included in the size...
> > > 
> > > Oh god, I was confused, thanks for the correction!
> > 
> > Glad it helped!
> > 
> > > > Could you please confirm whether my understanding is correct, or point
> > > > out what I'm missing?
> > > 
> > > I think your understanding is correct.
> > > 
> > > Hmm, perhaps we should update the "Object layout:" comment above
> > > check_pad_bytes() to avoid future confusion?
> > 
> > Yes, exactly. That’s a good idea.
> >
> > Also, I feel the layout description in the check_pad_bytes() comment
> > isn’t very intuitive and can be a bit hard to follow. I think it might be
> > clearer if we explicitly list out each field. What do you think about that?
> 
> Yeah it's confusing, but from your description
> I'm not sure what the end result would look like.
> 
> Could you please do a patch that does it? (and also add the left redzone
> to the object layout comment, if you are willing to!)

Sure — I'd be happy to!

> 
> As long as it makes it more understandable/intuitive,
> it'd be nice to have!

I'll send a patch for review soon.

-- 
Thanks,
Hao
> 
> -- 
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
  2025-12-24  5:33   ` Hao Li
@ 2025-12-24  6:38     ` Harry Yoo
  2025-12-24 12:43       ` Hao Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Yoo @ 2025-12-24  6:38 UTC (permalink / raw)
  To: Hao Li
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 01:33:59PM +0800, Hao Li wrote:
> On Mon, Dec 22, 2025 at 08:08:43PM +0900, Harry Yoo wrote:
> > When a cache has high s->align value and s->object_size is not aligned
> > to it, each object ends up with some unused space because of alignment.
> > If this wasted space is big enough, we can use it to store the
> > slabobj_ext metadata instead of wasting it.
> > 
> > On my system, this happens with caches like kmem_cache, mm_struct, pid,
> > task_struct, sighand_cache, xfs_inode, and others.
> > 
> > To place the slabobj_ext metadata within each object, the existing
> > slab_obj_ext() logic can still be used by setting:
> > 
> >   - slab->obj_exts = slab_address(slab) + s->red_left_pad +
> >                      (slabobj_ext offset)
> >   - stride = s->size
> > 
> > slab_obj_ext() doesn't need to know where the metadata is stored,
> > so this method works without adding extra overhead to slab_obj_ext().
> > 
> > A good example benefiting from this optimization is xfs_inode
> > (object_size: 992, align: 64). To measure memory savings, 2 million
> > files were created on XFS.
> > 
> > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > 
> > Before patch (creating ~2.64M directories on xfs):
> >   Slab:            5175976 kB
> >   SReclaimable:    3837524 kB
> >   SUnreclaim:      1338452 kB
> > 
> > After patch (creating ~2.64M directories on xfs):
> >   Slab:            5152912 kB
> >   SReclaimable:    3838568 kB
> >   SUnreclaim:      1314344 kB (-23.54 MiB)
> > 
> > Enjoy the memory savings!
> > 
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> >  include/linux/slab.h |  9 ++++++
> >  mm/slab_common.c     |  6 ++--
> >  mm/slub.c            | 73 ++++++++++++++++++++++++++++++++++++++++++--
> >  3 files changed, 83 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 4554c04a9bd7..da512d9ab1a0 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -59,6 +59,9 @@ enum _slab_flag_bits {
> >  	_SLAB_CMPXCHG_DOUBLE,
> >  #ifdef CONFIG_SLAB_OBJ_EXT
> >  	_SLAB_NO_OBJ_EXT,
> > +#endif
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > +	_SLAB_OBJ_EXT_IN_OBJ,
> >  #endif
> >  	_SLAB_FLAGS_LAST_BIT
> >  };
> > @@ -244,6 +247,12 @@ enum _slab_flag_bits {
> >  #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
> >  #endif
> >  
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > +#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
> > +#else
> > +#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_UNUSED
> > +#endif
> > +
> >  /*
> >   * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
> >   *
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index c4cf9ed2ec92..f0a6db20d7ea 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
> >  struct kmem_cache *kmem_cache;
> >  
> >  /*
> > - * Set of flags that will prevent slab merging
> > + * Set of flags that will prevent slab merging.
> > + * Any flag that adds per-object metadata should be included,
> > + * since slab merging can update s->inuse that affects the metadata layout.
> >   */
> >  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> >  		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> > -		SLAB_FAILSLAB | SLAB_NO_MERGE)
> > +		SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
> >  
> >  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
> >  			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 3fc3d2ca42e7..78f0087c8e48 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -977,6 +977,39 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> >  {
> >  	return false;
> >  }
> > +
> > +#endif
> > +
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > +static bool obj_exts_in_object(struct kmem_cache *s)
> > +{
> > +	return s->flags & SLAB_OBJ_EXT_IN_OBJ;
> > +}
> > +
> > +static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> > +{
> > +	unsigned int offset = get_info_end(s);
> > +
> > +	if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
> > +		offset += sizeof(struct track) * 2;
> > +
> > +	if (slub_debug_orig_size(s))
> > +		offset += sizeof(unsigned long);
> > +
> > +	offset += kasan_metadata_size(s, false);
> > +
> > +	return offset;
> > +}
> > +#else
> > +static inline bool obj_exts_in_object(struct kmem_cache *s)
> > +{
> > +	return false;
> > +}
> > +
> > +static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> > +{
> > +	return 0;
> > +}
> >  #endif
> >  
> >  #ifdef CONFIG_SLUB_DEBUG
> > @@ -1277,6 +1310,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
> >  
> >  	off += kasan_metadata_size(s, false);
> >  
> > +	if (obj_exts_in_object(s))
> > +		off += sizeof(struct slabobj_ext);
> > +
> >  	if (off != size_from_object(s))
> >  		/* Beginning of the filler is the free pointer */
> >  		print_section(KERN_ERR, "Padding  ", p + off,
> > @@ -1446,7 +1482,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> >   * 	A. Free pointer (if we cannot overwrite object on free)
> >   * 	B. Tracking data for SLAB_STORE_USER
> >   *	C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> > - *	D. Padding to reach required alignment boundary or at minimum
> > + *	D. KASAN alloc metadata (KASAN enabled)
> > + *	E. struct slabobj_ext to store accounting metadata
> > + *	   (SLAB_OBJ_EXT_IN_OBJ enabled)
> > + *	F. Padding to reach required alignment boundary or at minimum
> >   * 		one word if debugging is on to be able to detect writes
> >   * 		before the word boundary.
> >   *
> > @@ -1474,6 +1513,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> >  
> >  	off += kasan_metadata_size(s, false);
> >  
> > +	if (obj_exts_in_object(s))
> > +		off += sizeof(struct slabobj_ext);
> > +
> >  	if (size_from_object(s) == off)
> >  		return 1;
> >  
> > @@ -2280,7 +2322,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
> >  		return;
> >  	}
> >  
> > -	if (obj_exts_in_slab(slab->slab_cache, slab)) {
> > +	if (obj_exts_in_slab(slab->slab_cache, slab) ||
> > +			obj_exts_in_object(slab->slab_cache)) {
> >  		slab->obj_exts = 0;
> >  		return;
> >  	}
> > @@ -2326,6 +2369,23 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> >  			obj_exts |= MEMCG_DATA_OBJEXTS;
> >  		slab->obj_exts = obj_exts;
> >  		slab_set_stride(slab, sizeof(struct slabobj_ext));
> > +	} else if (obj_exts_in_object(s)) {
> > +		unsigned int offset = obj_exts_offset_in_object(s);
> > +
> > +		obj_exts = (unsigned long)slab_address(slab);
> > +		obj_exts += s->red_left_pad;
> > +		obj_exts += obj_exts_offset_in_object(s);
> 
> Hi, Harry
> 
> It looks like this could just be simplified to obj_exts += offset, right?

Right! Will do in v5.

> > +
> > +		get_slab_obj_exts(obj_exts);
> > +		for_each_object(addr, s, slab_address(slab), slab->objects)
> > +			memset(kasan_reset_tag(addr) + offset, 0,
> > +			       sizeof(struct slabobj_ext));
> > +		put_slab_obj_exts(obj_exts);
> > +
> > +		if (IS_ENABLED(CONFIG_MEMCG))
> > +			obj_exts |= MEMCG_DATA_OBJEXTS;
> > +		slab->obj_exts = obj_exts;
> > +		slab_set_stride(slab, s->size);
> >  	}
> >  }
> >  
> > @@ -8023,6 +8083,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> >  {
> >  	slab_flags_t flags = s->flags;
> >  	unsigned int size = s->object_size;
> > +	unsigned int aligned_size;
> >  	unsigned int order;
> >  
> >  	/*
> > @@ -8132,7 +8193,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> >  	 * offset 0. In order to align the objects we have to simply size
> >  	 * each object to conform to the alignment.
> >  	 */
> > -	size = ALIGN(size, s->align);
> > +	aligned_size = ALIGN(size, s->align);
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > +	if (aligned_size - size >= sizeof(struct slabobj_ext))
> > +		s->flags |= SLAB_OBJ_EXT_IN_OBJ;
> > +#endif
> > +	size = aligned_size;
> > +
> 
> One more thought: in calculate_sizes() we add some extra padding when
> SLAB_RED_ZONE is enabled:
> 
> if (flags & SLAB_RED_ZONE) {
> 	/*
> 	 * Add some empty padding so that we can catch
> 	 * overwrites from earlier objects rather than let
> 	 * tracking information or the free pointer be
> 	 * corrupted if a user writes before the start
> 	 * of the object.
> 	 */
> 	size += sizeof(void *);
> 	...
> }
> 
> 
> From what I understand, this additional padding ends up being placed
> after the KASAN allocation metadata.

Right.

> Since it’s only "extra" padding (i.e., it doesn’t seem strictly required
> for the layout), and your patch would reuse this area — together with
> the final padding introduced by `size = ALIGN(size, s->align);`

Very good point!
Nah, it wasn't intentional to reuse the extra padding.

> for objext, it seems like this padding may no longer provide much benefit.
> Do you think it would make sense to remove this extra padding
> altogether?

I think when debugging flags are enabled it'd still be useful to have,
I'll try to keep the padding area after obj_ext (so that overwrites from
the previous object won't overwrite the metadata).
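
Something along these lines, perhaps (just a sketch of the idea, not the
actual V5 change):

	/*
	 * Only place slabobj_ext in the alignment gap when, with red zoning
	 * on, the gap can also hold one extra poisoned word after it.
	 */
	if (aligned_size - size >= sizeof(struct slabobj_ext) +
			((flags & SLAB_RED_ZONE) ? sizeof(void *) : 0))
		s->flags |= SLAB_OBJ_EXT_IN_OBJ;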

Thanks a lot!

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
  2025-12-24  6:38     ` Harry Yoo
@ 2025-12-24 12:43       ` Hao Li
  2025-12-30  4:59         ` Harry Yoo
  0 siblings, 1 reply; 27+ messages in thread
From: Hao Li @ 2025-12-24 12:43 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 03:38:57PM +0900, Harry Yoo wrote:
> On Wed, Dec 24, 2025 at 01:33:59PM +0800, Hao Li wrote:
> > On Mon, Dec 22, 2025 at 08:08:43PM +0900, Harry Yoo wrote:
> > > When a cache has high s->align value and s->object_size is not aligned
> > > to it, each object ends up with some unused space because of alignment.
> > > If this wasted space is big enough, we can use it to store the
> > > slabobj_ext metadata instead of wasting it.
> > > 
> > > On my system, this happens with caches like kmem_cache, mm_struct, pid,
> > > task_struct, sighand_cache, xfs_inode, and others.
> > > 
> > > To place the slabobj_ext metadata within each object, the existing
> > > slab_obj_ext() logic can still be used by setting:
> > > 
> > >   - slab->obj_exts = slab_address(slab) + s->red_left_pad +
> > >                      (slabobj_ext offset)
> > >   - stride = s->size
> > > 
> > > slab_obj_ext() doesn't need to know where the metadata is stored,
> > > so this method works without adding extra overhead to slab_obj_ext().
> > > 
> > > A good example benefiting from this optimization is xfs_inode
> > > (object_size: 992, align: 64). To measure memory savings, 2 million
> > > files were created on XFS.
> > > 
> > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > 
> > > Before patch (creating ~2.64M directories on xfs):
> > >   Slab:            5175976 kB
> > >   SReclaimable:    3837524 kB
> > >   SUnreclaim:      1338452 kB
> > > 
> > > After patch (creating ~2.64M directories on xfs):
> > >   Slab:            5152912 kB
> > >   SReclaimable:    3838568 kB
> > >   SUnreclaim:      1314344 kB (-23.54 MiB)
> > > 
> > > Enjoy the memory savings!
> > > 
> > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
> > >  include/linux/slab.h |  9 ++++++
> > >  mm/slab_common.c     |  6 ++--
> > >  mm/slub.c            | 73 ++++++++++++++++++++++++++++++++++++++++++--
> > >  3 files changed, 83 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > > index 4554c04a9bd7..da512d9ab1a0 100644
> > > --- a/include/linux/slab.h
> > > +++ b/include/linux/slab.h
> > > @@ -59,6 +59,9 @@ enum _slab_flag_bits {
> > >  	_SLAB_CMPXCHG_DOUBLE,
> > >  #ifdef CONFIG_SLAB_OBJ_EXT
> > >  	_SLAB_NO_OBJ_EXT,
> > > +#endif
> > > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > > +	_SLAB_OBJ_EXT_IN_OBJ,
> > >  #endif
> > >  	_SLAB_FLAGS_LAST_BIT
> > >  };
> > > @@ -244,6 +247,12 @@ enum _slab_flag_bits {
> > >  #define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
> > >  #endif
> > >  
> > > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > > +#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
> > > +#else
> > > +#define SLAB_OBJ_EXT_IN_OBJ	__SLAB_FLAG_UNUSED
> > > +#endif
> > > +
> > >  /*
> > >   * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
> > >   *
> > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > index c4cf9ed2ec92..f0a6db20d7ea 100644
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
> > >  struct kmem_cache *kmem_cache;
> > >  
> > >  /*
> > > - * Set of flags that will prevent slab merging
> > > + * Set of flags that will prevent slab merging.
> > > + * Any flag that adds per-object metadata should be included,
> > > + * since slab merging can update s->inuse that affects the metadata layout.
> > >   */
> > >  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> > >  		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> > > -		SLAB_FAILSLAB | SLAB_NO_MERGE)
> > > +		SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
> > >  
> > >  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
> > >  			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 3fc3d2ca42e7..78f0087c8e48 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -977,6 +977,39 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> > >  {
> > >  	return false;
> > >  }
> > > +
> > > +#endif
> > > +
> > > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > > +static bool obj_exts_in_object(struct kmem_cache *s)
> > > +{
> > > +	return s->flags & SLAB_OBJ_EXT_IN_OBJ;
> > > +}
> > > +
> > > +static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> > > +{
> > > +	unsigned int offset = get_info_end(s);
> > > +
> > > +	if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
> > > +		offset += sizeof(struct track) * 2;
> > > +
> > > +	if (slub_debug_orig_size(s))
> > > +		offset += sizeof(unsigned long);
> > > +
> > > +	offset += kasan_metadata_size(s, false);
> > > +
> > > +	return offset;
> > > +}
> > > +#else
> > > +static inline bool obj_exts_in_object(struct kmem_cache *s)
> > > +{
> > > +	return false;
> > > +}
> > > +
> > > +static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> > > +{
> > > +	return 0;
> > > +}
> > >  #endif
> > >  
> > >  #ifdef CONFIG_SLUB_DEBUG
> > > @@ -1277,6 +1310,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
> > >  
> > >  	off += kasan_metadata_size(s, false);
> > >  
> > > +	if (obj_exts_in_object(s))
> > > +		off += sizeof(struct slabobj_ext);
> > > +
> > >  	if (off != size_from_object(s))
> > >  		/* Beginning of the filler is the free pointer */
> > >  		print_section(KERN_ERR, "Padding  ", p + off,
> > > @@ -1446,7 +1482,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> > >   * 	A. Free pointer (if we cannot overwrite object on free)
> > >   * 	B. Tracking data for SLAB_STORE_USER
> > >   *	C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> > > - *	D. Padding to reach required alignment boundary or at minimum
> > > + *	D. KASAN alloc metadata (KASAN enabled)
> > > + *	E. struct slabobj_ext to store accounting metadata
> > > + *	   (SLAB_OBJ_EXT_IN_OBJ enabled)
> > > + *	F. Padding to reach required alignment boundary or at minimum
> > >   * 		one word if debugging is on to be able to detect writes
> > >   * 		before the word boundary.
> > >   *
> > > @@ -1474,6 +1513,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> > >  
> > >  	off += kasan_metadata_size(s, false);
> > >  
> > > +	if (obj_exts_in_object(s))
> > > +		off += sizeof(struct slabobj_ext);
> > > +
> > >  	if (size_from_object(s) == off)
> > >  		return 1;
> > >  
> > > @@ -2280,7 +2322,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
> > >  		return;
> > >  	}
> > >  
> > > -	if (obj_exts_in_slab(slab->slab_cache, slab)) {
> > > +	if (obj_exts_in_slab(slab->slab_cache, slab) ||
> > > +			obj_exts_in_object(slab->slab_cache)) {
> > >  		slab->obj_exts = 0;
> > >  		return;
> > >  	}
> > > @@ -2326,6 +2369,23 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> > >  			obj_exts |= MEMCG_DATA_OBJEXTS;
> > >  		slab->obj_exts = obj_exts;
> > >  		slab_set_stride(slab, sizeof(struct slabobj_ext));
> > > +	} else if (obj_exts_in_object(s)) {
> > > +		unsigned int offset = obj_exts_offset_in_object(s);
> > > +
> > > +		obj_exts = (unsigned long)slab_address(slab);
> > > +		obj_exts += s->red_left_pad;
> > > +		obj_exts += obj_exts_offset_in_object(s);
> > 
> > Hi, Harry
> > 
> > It looks like this could just be simplified to obj_exts += offset, right?
> 
> Right! Will do in v5.
> 
> > > +
> > > +		get_slab_obj_exts(obj_exts);
> > > +		for_each_object(addr, s, slab_address(slab), slab->objects)
> > > +			memset(kasan_reset_tag(addr) + offset, 0,
> > > +			       sizeof(struct slabobj_ext));
> > > +		put_slab_obj_exts(obj_exts);
> > > +
> > > +		if (IS_ENABLED(CONFIG_MEMCG))
> > > +			obj_exts |= MEMCG_DATA_OBJEXTS;
> > > +		slab->obj_exts = obj_exts;
> > > +		slab_set_stride(slab, s->size);
> > >  	}
> > >  }
> > >  
> > > @@ -8023,6 +8083,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> > >  {
> > >  	slab_flags_t flags = s->flags;
> > >  	unsigned int size = s->object_size;
> > > +	unsigned int aligned_size;
> > >  	unsigned int order;
> > >  
> > >  	/*
> > > @@ -8132,7 +8193,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> > >  	 * offset 0. In order to align the objects we have to simply size
> > >  	 * each object to conform to the alignment.
> > >  	 */
> > > -	size = ALIGN(size, s->align);
> > > +	aligned_size = ALIGN(size, s->align);
> > > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > > +	if (aligned_size - size >= sizeof(struct slabobj_ext))
> > > +		s->flags |= SLAB_OBJ_EXT_IN_OBJ;
> > > +#endif
> > > +	size = aligned_size;
> > > +
> > 
> > One more thought: in calculate_sizes() we add some extra padding when
> > SLAB_RED_ZONE is enabled:
> > 
> > if (flags & SLAB_RED_ZONE) {
> > 	/*
> > 	 * Add some empty padding so that we can catch
> > 	 * overwrites from earlier objects rather than let
> > 	 * tracking information or the free pointer be
> > 	 * corrupted if a user writes before the start
> > 	 * of the object.
> > 	 */
> > 	size += sizeof(void *);
> > 	...
> > }
> > 
> > 
> > From what I understand, this additional padding ends up being placed
> > after the KASAN allocation metadata.
> 
> Right.
> 
> > Since it’s only "extra" padding (i.e., it doesn’t seem strictly required
> > for the layout), and your patch would reuse this area — together with
> > the final padding introduced by `size = ALIGN(size, s->align);`
> 
> Very good point!
> Nah, it wasn't intentional to reuse the extra padding.
> 
> > for objext, it seems like this padding may no longer provide much benefit.
> > Do you think it would make sense to remove this extra padding
> > altogether?
> 
> I think when debugging flags are enabled it'd still be useful to have,

Absolutely — I’m with you on this.

After thinking about it again, I agree it’s better to keep it.

Without that mandatory extra word, we could end up with "no trailing
padding at all" in cases where ALIGN(size, s->align) doesn’t actually
add any bytes.
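
A quick illustration with made-up numbers: if the object plus its metadata
come to 1000 bytes and s->align is just sizeof(void *), ALIGN(1000, 8)
adds nothing, so only the mandatory extra word leaves any POISON_INUSE
bytes between this object's metadata and the next object.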

> I'll try to keep the padding area after obj_ext (so that overwrites from
> the previous object won't overwrite the metadata).

Agree — we should make sure there is at least sizeof(void *) of extra
space after obj_exts when SLAB_RED_ZONE is enabled, so POISON_INUSE has
somewhere to go.

> 
> Thanks a lot!

Happy to help.

-- 
Thanks,
Hao
> 
> -- 
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] slub: clarify object field layout comments
  2025-12-24  5:53             ` Harry Yoo
  2025-12-24  6:05               ` Hao Li
@ 2025-12-24 12:51               ` Hao Li
  2025-12-29  7:07                 ` Harry Yoo
  1 sibling, 1 reply; 27+ messages in thread
From: Hao Li @ 2025-12-24 12:51 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

The comments above check_pad_bytes() document the field layout of a
single object. Rewrite them to improve clarity and precision.

Also update an outdated comment in calculate_sizes().

Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Hao Li <hao.li@linux.dev>
---
Hi Harry, this patch adds more detailed object layout documentation. Let
me know if you have any comments.

 mm/slub.c | 92 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 53 insertions(+), 39 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a94c64f56504..138e9d13540d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1211,44 +1211,58 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
 }
 
 /*
- * Object layout:
- *
- * object address
- * 	Bytes of the object to be managed.
- * 	If the freepointer may overlay the object then the free
- *	pointer is at the middle of the object.
- *
- * 	Poisoning uses 0x6b (POISON_FREE) and the last byte is
- * 	0xa5 (POISON_END)
- *
- * object + s->object_size
- * 	Padding to reach word boundary. This is also used for Redzoning.
- * 	Padding is extended by another word if Redzoning is enabled and
- * 	object_size == inuse.
- *
- * 	We fill with 0xbb (SLUB_RED_INACTIVE) for inactive objects and with
- * 	0xcc (SLUB_RED_ACTIVE) for objects in use.
- *
- * object + s->inuse
- * 	Meta data starts here.
- *
- * 	A. Free pointer (if we cannot overwrite object on free)
- * 	B. Tracking data for SLAB_STORE_USER
- *	C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
- *	D. Padding to reach required alignment boundary or at minimum
- * 		one word if debugging is on to be able to detect writes
- * 		before the word boundary.
- *
- *	Padding is done using 0x5a (POISON_INUSE)
- *
- * object + s->size
- * 	Nothing is used beyond s->size.
- *
- * If slabcaches are merged then the object_size and inuse boundaries are mostly
- * ignored. And therefore no slab options that rely on these boundaries
+ * Object field layout:
+ *
+ * [Left redzone padding] (if SLAB_RED_ZONE)
+ *   - Field size: s->red_left_pad
+ *   - Filled with 0xbb (SLUB_RED_INACTIVE) for inactive objects and
+ *     0xcc (SLUB_RED_ACTIVE) for objects in use when SLAB_RED_ZONE.
+ *
+ * [Object bytes]
+ *   - Field size: s->object_size
+ *   - Object payload bytes.
+ *   - If the freepointer may overlap the object, it is stored inside
+ *     the object (typically near the middle).
+ *   - Poisoning uses 0x6b (POISON_FREE) and the last byte is
+ *     0xa5 (POISON_END) when __OBJECT_POISON is enabled.
+ *
+ * [Word-align padding] (right redzone when SLAB_RED_ZONE is set)
+ *   - Field size: s->inuse - s->object_size
+ *   - If redzoning is enabled and ALIGN(size, sizeof(void *)) adds no
+ *     padding, explicitly extend by one word so the right redzone is
+ *     non-empty.
+ *   - Filled with 0xbb (SLUB_RED_INACTIVE) for inactive objects and
+ *     0xcc (SLUB_RED_ACTIVE) for objects in use when SLAB_RED_ZONE.
+ *
+ * [Metadata starts at object + s->inuse]
+ *   - A. freelist pointer (if freeptr_outside_object)
+ *   - B. alloc tracking (SLAB_STORE_USER)
+ *   - C. free tracking (SLAB_STORE_USER)
+ *   - D. original request size (SLAB_KMALLOC && SLAB_STORE_USER)
+ *   - E. KASAN metadata (if enabled)
+ *
+ * [Mandatory padding] (if CONFIG_SLUB_DEBUG && SLAB_RED_ZONE)
+ *   - One mandatory debug word to guarantee a minimum poisoned gap
+ *     between metadata and the next object, independent of alignment.
+ *   - Filled with 0x5a (POISON_INUSE) when SLAB_POISON is set.
+ * [Final alignment padding]
+ *   - Any bytes added by ALIGN(size, s->align) to reach s->size.
+ *   - Filled with 0x5a (POISON_INUSE) when SLAB_POISON is set.
+ *
+ * Notes:
+ * - Redzones are filled by init_object() with SLUB_RED_ACTIVE/INACTIVE.
+ * - Object contents are poisoned with POISON_FREE/END when __OBJECT_POISON.
+ * - The trailing padding is pre-filled with POISON_INUSE by
+ *   setup_slab_debug() when SLAB_POISON is set, and is validated by
+ *   check_pad_bytes().
+ * - The first object pointer is slab_address(slab) +
+ *   (s->red_left_pad if redzoning); subsequent objects are reached by
+ *   adding s->size each time.
+ *
+ * If slabcaches are merged then the object_size and inuse boundaries are
+ * mostly ignored. Therefore no slab options that rely on these boundaries
  * may be used with merged slabcaches.
  */
-
 static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
 {
 	unsigned long off = get_info_end(s);	/* The end of info */
@@ -7103,9 +7117,9 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 
 
 	/*
-	 * If we are Redzoning then check if there is some space between the
-	 * end of the object and the free pointer. If not then add an
-	 * additional word to have some bytes to store Redzone information.
+	 * If we are Redzoning and there is no space between the end of the
+	 * object and the following fields, add one word so the right Redzone
+	 * is non-empty.
 	 */
 	if ((flags & SLAB_RED_ZONE) && size == s->object_size)
 		size += sizeof(void *);
-- 
2.50.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] slub: clarify object field layout comments
  2025-12-24 12:51               ` [PATCH] slub: clarify object field layout comments Hao Li
@ 2025-12-29  7:07                 ` Harry Yoo
  2025-12-29 11:56                   ` Hao Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Yoo @ 2025-12-29  7:07 UTC (permalink / raw)
  To: Hao Li
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 08:51:14PM +0800, Hao Li wrote:
> The comments above check_pad_bytes() document the field layout of a
> single object. Rewrite them to improve clarity and precision.
> 
> Also update an outdated comment in calculate_sizes().
> 
> Suggested-by: Harry Yoo <harry.yoo@oracle.com>
> Signed-off-by: Hao Li <hao.li@linux.dev>
> ---
> Hi Harry, this patch adds more detailed object layout documentation. Let
> me know if you have any comments.

Hi Hao, thanks for improving it!
It looks much clearer now.

few nits below.

> + * Object field layout:
> + *
> + * [Left redzone padding] (if SLAB_RED_ZONE)
> + *   - Field size: s->red_left_pad
> + *   - Filled with 0xbb (SLUB_RED_INACTIVE) for inactive objects and
> + *     0xcc (SLUB_RED_ACTIVE) for objects in use when SLAB_RED_ZONE.

nit: although it becomes clear after reading the Notes: section,
I would like to make it clear that object address starts here (after
the left redzone) and the left redzone is right before each object.

> + * [Object bytes]
> + *   - Field size: s->object_size
> + *   - Object payload bytes.
> + *   - If the freepointer may overlap the object, it is stored inside
> + *     the object (typically near the middle).
> + *   - Poisoning uses 0x6b (POISON_FREE) and the last byte is
> + *     0xa5 (POISON_END) when __OBJECT_POISON is enabled.
> + *
> + * [Word-align padding] (right redzone when SLAB_RED_ZONE is set)
> + *   - Field size: s->inuse - s->object_size
> + *   - If redzoning is enabled and ALIGN(size, sizeof(void *)) adds no
> + *     padding, explicitly extend by one word so the right redzone is
> + *     non-empty.
> + *   - Filled with 0xbb (SLUB_RED_INACTIVE) for inactive objects and
> + *     0xcc (SLUB_RED_ACTIVE) for objects in use when SLAB_RED_ZONE.
> + *
> + * [Metadata starts at object + s->inuse]
> + *   - A. freelist pointer (if freeptr_outside_object)
> + *   - B. alloc tracking (SLAB_STORE_USER)
> + *   - C. free tracking (SLAB_STORE_USER)
> + *   - D. original request size (SLAB_KMALLOC && SLAB_STORE_USER)
> + *   - E. KASAN metadata (if enabled)
> + *
> + * [Mandatory padding] (if CONFIG_SLUB_DEBUG && SLAB_RED_ZONE)
> + *   - One mandatory debug word to guarantee a minimum poisoned gap
> + *     between metadata and the next object, independent of alignment.
> + *   - Filled with 0x5a (POISON_INUSE) when SLAB_POISON is set.
>
> + * [Final alignment padding]
> + *   - Any bytes added by ALIGN(size, s->align) to reach s->size.
> + *   - Filled with 0x5a (POISON_INUSE) when SLAB_POISON is set.
> + *
> + * Notes:
> + * - Redzones are filled by init_object() with SLUB_RED_ACTIVE/INACTIVE.
> + * - Object contents are poisoned with POISON_FREE/END when __OBJECT_POISON.
> + * - The trailing padding is pre-filled with POISON_INUSE by
> + *   setup_slab_debug() when SLAB_POISON is set, and is validated by
> + *   check_pad_bytes().
> + * - The first object pointer is slab_address(slab) +
> + *   (s->red_left_pad if redzoning); subsequent objects are reached by
> + *   adding s->size each time.
> + *
> + * If slabcaches are merged then the object_size and inuse boundaries are
> + * mostly ignored. Therefore no slab options that rely on these boundaries
>   * may be used with merged slabcaches.

For the last paragraph, perhaps it'll be clearer to say:

  "If a slab cache flag relies on specific metadata to exist at a fixed
   offset, the flag must be included in SLAB_NEVER_MERGE to prevent
   merging. Otherwise, the cache would misbehave as s->object_size and
   s->inuse are adjusted during cache merging"
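
SLAB_OBJ_EXT_IN_OBJ from patch 8/8 is a handy concrete example for that
paragraph: it pins struct slabobj_ext at a fixed offset inside each object,
which is exactly why the patch also adds it to SLAB_NEVER_MERGE.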

Otherwise looks great to me, so please feel free to add:
Acked-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] slub: clarify object field layout comments
  2025-12-29  7:07                 ` Harry Yoo
@ 2025-12-29 11:56                   ` Hao Li
  0 siblings, 0 replies; 27+ messages in thread
From: Hao Li @ 2025-12-29 11:56 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Mon, Dec 29, 2025 at 04:07:54PM +0900, Harry Yoo wrote:
> On Wed, Dec 24, 2025 at 08:51:14PM +0800, Hao Li wrote:
> > The comments above check_pad_bytes() document the field layout of a
> > single object. Rewrite them to improve clarity and precision.
> > 
> > Also update an outdated comment in calculate_sizes().
> > 
> > Suggested-by: Harry Yoo <harry.yoo@oracle.com>
> > Signed-off-by: Hao Li <hao.li@linux.dev>
> > ---
> > Hi Harry, this patch adds more detailed object layout documentation. Let
> > me know if you have any comments.
> 
> Hi Hao, thanks for improving it!
> It looks much clearer now.

Hi Harry,

Thanks for the review and the Acked-by!

> 
> few nits below.
> 
> > + * Object field layout:
> > + *
> > + * [Left redzone padding] (if SLAB_RED_ZONE)
> > + *   - Field size: s->red_left_pad
> > + *   - Filled with 0xbb (SLUB_RED_INACTIVE) for inactive objects and
> > + *     0xcc (SLUB_RED_ACTIVE) for objects in use when SLAB_RED_ZONE.
> 
> nit: although it becomes clear after reading the Notes: section,
> I would like to make it clear that object address starts here (after
> the left redzone) and the left redzone is right before each object.

Good point. I’ll make this explicit in v2.
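
Maybe something along the lines of (wording not final):

 * [Left redzone padding] (if SLAB_RED_ZONE)
 *   - Lies immediately before each object; the object address points
 *     just past it.

so the reader sees right away where the object address starts.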

> 
> > + * [Object bytes]
> > + *   - Field size: s->object_size
> > + *   - Object payload bytes.
> > + *   - If the freepointer may overlap the object, it is stored inside
> > + *     the object (typically near the middle).
> > + *   - Poisoning uses 0x6b (POISON_FREE) and the last byte is
> > + *     0xa5 (POISON_END) when __OBJECT_POISON is enabled.
> > + *
> > + * [Word-align padding] (right redzone when SLAB_RED_ZONE is set)
> > + *   - Field size: s->inuse - s->object_size
> > + *   - If redzoning is enabled and ALIGN(size, sizeof(void *)) adds no
> > + *     padding, explicitly extend by one word so the right redzone is
> > + *     non-empty.
> > + *   - Filled with 0xbb (SLUB_RED_INACTIVE) for inactive objects and
> > + *     0xcc (SLUB_RED_ACTIVE) for objects in use when SLAB_RED_ZONE.
> > + *
> > + * [Metadata starts at object + s->inuse]
> > + *   - A. freelist pointer (if freeptr_outside_object)
> > + *   - B. alloc tracking (SLAB_STORE_USER)
> > + *   - C. free tracking (SLAB_STORE_USER)
> > + *   - D. original request size (SLAB_KMALLOC && SLAB_STORE_USER)
> > + *   - E. KASAN metadata (if enabled)
> > + *
> > + * [Mandatory padding] (if CONFIG_SLUB_DEBUG && SLAB_RED_ZONE)
> > + *   - One mandatory debug word to guarantee a minimum poisoned gap
> > + *     between metadata and the next object, independent of alignment.
> > + *   - Filled with 0x5a (POISON_INUSE) when SLAB_POISON is set.
> >
> > + * [Final alignment padding]
> > + *   - Any bytes added by ALIGN(size, s->align) to reach s->size.
> > + *   - Filled with 0x5a (POISON_INUSE) when SLAB_POISON is set.
> > + *
> > + * Notes:
> > + * - Redzones are filled by init_object() with SLUB_RED_ACTIVE/INACTIVE.
> > + * - Object contents are poisoned with POISON_FREE/END when __OBJECT_POISON.
> > + * - The trailing padding is pre-filled with POISON_INUSE by
> > + *   setup_slab_debug() when SLAB_POISON is set, and is validated by
> > + *   check_pad_bytes().
> > + * - The first object pointer is slab_address(slab) +
> > + *   (s->red_left_pad if redzoning); subsequent objects are reached by
> > + *   adding s->size each time.
> > + *
> > + * If slabcaches are merged then the object_size and inuse boundaries are
> > + * mostly ignored. Therefore no slab options that rely on these boundaries
> >   * may be used with merged slabcaches.
> 
> For the last paragraph, perhaps it'll be clearer to say:
> 
>   "If a slab cache flag relies on specific metadata to exist at a fixed
>    offset, the flag must be included in SLAB_NEVER_MERGE to prevent
>    merging. Otherwise, the cache would misbehave as s->object_size and
>    s->inuse are adjusted during cache merging"

Agreed. I’ll reword that paragraph along the lines of your suggestion to
emphasize the fixed-offset metadata requirement.
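
While rewording it, this is the simplified picture I keep in my head
(an illustrative sketch only, with made-up variable names, not the
actual slub code):

    /* illustrative only: where each region of one object starts;      */
    /* red_left_pad is 0 when SLAB_RED_ZONE is not set                 */
    object   = slab_address(slab) + s->red_left_pad; /* first object   */
    payload  = object;                    /* s->object_size bytes       */
    redzone  = object + s->object_size;   /* up to object + s->inuse    */
    metadata = object + s->inuse;         /* freeptr/track/orig_size/.. */
    next_obj = object + s->size;          /* next object pointer        */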

> 
> Otherwise looks great to me, so please feel free to add:
> Acked-by: Harry Yoo <harry.yoo@oracle.com>

I'll include this Acked-by in v2. Thanks!

-- 
Thanks
Hao
> 
> -- 
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
  2025-12-24 12:43       ` Hao Li
@ 2025-12-30  4:59         ` Harry Yoo
  2025-12-30  8:54           ` Hao Li
  0 siblings, 1 reply; 27+ messages in thread
From: Harry Yoo @ 2025-12-30  4:59 UTC (permalink / raw)
  To: Hao Li
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Wed, Dec 24, 2025 at 08:43:17PM +0800, Hao Li wrote:
> On Wed, Dec 24, 2025 at 03:38:57PM +0900, Harry Yoo wrote:
> > On Wed, Dec 24, 2025 at 01:33:59PM +0800, Hao Li wrote:
> > > One more thought: in calculate_sizes() we add some extra padding when
> > > SLAB_RED_ZONE is enabled:
> > > 
> > > if (flags & SLAB_RED_ZONE) {
> > > 	/*
> > > 	 * Add some empty padding so that we can catch
> > > 	 * overwrites from earlier objects rather than let
> > > 	 * tracking information or the free pointer be
> > > 	 * corrupted if a user writes before the start
> > > 	 * of the object.
> > > 	 */
> > > 	size += sizeof(void *);
> > > 	...
> > > }
> > > 
> > > 
> > > From what I understand, this additional padding ends up being placed
> > > after the KASAN allocation metadata.
> > 
> > Right.
> > 
> > > Since it’s only "extra" padding (i.e., it doesn’t seem strictly required
> > > for the layout), and your patch would reuse this area — together with
> > > the final padding introduced by `size = ALIGN(size, s->align);`
> > 
> > Very good point!
> > Nah, it wasn't intentional to reuse the extra padding.

Waaaait, now I'm looking into it again to write V5...

Reusing that space for slabobj_ext may reduce (or even remove) the final
padding, but not the mandatory padding, because the mandatory padding is
already included in the size before `aligned_size = ALIGN(size, s->align)`.
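
With made-up numbers, just to illustrate the arithmetic (ignoring the
left-redzone bookkeeping elided as "..." in the snippet quoted above):

    /* hypothetical cache, s->align == 64 */
    size  = 168;                 /* object + right redzone + metadata  */
    size += sizeof(void *);      /* the mandatory word -> 176          */
    aligned_size = ALIGN(size, s->align);   /* ALIGN(176, 64) == 192   */
    /* final padding == aligned_size - size == 16 bytes; only those    */
    /* 16 bytes are reusable, the word at offsets 168..175 is not      */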

> > > for objext, it seems like this padding may no longer provide much benefit.
> > > Do you think it would make sense to remove this extra padding
> > > altogether?
> > 
> > I think when debugging flags are enabled it'd still be useful to have,
> 
> Absolutely — I’m with you on this.
> 
> After thinking about it again, I agree it’s better to keep it.
> 
> Without that mandatory extra word, we could end up with "no trailing
> padding at all" in cases where ALIGN(size, s->align) doesn’t actually
> add any bytes.
> 
> > I'll try to keep the padding area after obj_ext (so that overwrites from
> > the previous object won't overwrite the metadata).
> 
> Agree — we should make sure there is at least sizeof(void *) of extra
> space after obj_exts when SLAB_RED_ZONE is enabled, so POISON_INUSE has
> somewhere to go.

I think V4 of the patchset is already doing that, no?

The mandatory padding exists after obj_ext if SLAB_RED_ZONE is enabled
and the final padding may or may not exist. check_pad_bytes() already knows
that the padding(s) exist after obj_ext.

By the way, thanks for fixing the comment once again,
it's easier to think about the layout now.

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
  2025-12-30  4:59         ` Harry Yoo
@ 2025-12-30  8:54           ` Hao Li
  0 siblings, 0 replies; 27+ messages in thread
From: Hao Li @ 2025-12-30  8:54 UTC (permalink / raw)
  To: Harry Yoo
  Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
	mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
	shakeel.butt, surenb, vincenzo.frascino, yeoreum.yun, tytso,
	adilger.kernel, linux-ext4, linux-kernel, cgroups

On Tue, Dec 30, 2025 at 01:59:57PM +0900, Harry Yoo wrote:
> On Wed, Dec 24, 2025 at 08:43:17PM +0800, Hao Li wrote:
> > On Wed, Dec 24, 2025 at 03:38:57PM +0900, Harry Yoo wrote:
> > > On Wed, Dec 24, 2025 at 01:33:59PM +0800, Hao Li wrote:
> > > > One more thought: in calculate_sizes() we add some extra padding when
> > > > SLAB_RED_ZONE is enabled:
> > > > 
> > > > if (flags & SLAB_RED_ZONE) {
> > > > 	/*
> > > > 	 * Add some empty padding so that we can catch
> > > > 	 * overwrites from earlier objects rather than let
> > > > 	 * tracking information or the free pointer be
> > > > 	 * corrupted if a user writes before the start
> > > > 	 * of the object.
> > > > 	 */
> > > > 	size += sizeof(void *);
> > > > 	...
> > > > }
> > > > 
> > > > 
> > > > From what I understand, this additional padding ends up being placed
> > > > after the KASAN allocation metadata.
> > > 
> > > Right.
> > > 
> > > > Since it’s only "extra" padding (i.e., it doesn’t seem strictly required
> > > > for the layout), and your patch would reuse this area — together with
> > > > the final padding introduced by `size = ALIGN(size, s->align);`
> > > 
> > > Very good point!
> > > Nah, it wasn't intentional to reuse the extra padding.
> 
> Waaaait, now I'm looking into it again to write V5...
> 
> Reusing that space for slabobj_ext may reduce (or even remove) the final
> padding, but not the mandatory padding, because the mandatory padding is
> already included in the size before `aligned_size = ALIGN(size, s->align)`.

Ah, right - I double-checked as well. `aligned_size - size` is exactly the
space reserved for the final padding, so slabobj_ext won't eat into the
mandatory padding.
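
For the opposite case (made-up numbers again): when `size` is already a
multiple of s->align after the mandatory word, ALIGN() adds nothing, so
there is no final padding and no per-object room for slabobj_ext to
reuse; only the mandatory word is left as trailing padding:

    /* hypothetical cache, s->align == 8 */
    size  = 184;                 /* object + right redzone + metadata  */
    size += sizeof(void *);      /* the mandatory word -> 192          */
    aligned_size = ALIGN(size, s->align);   /* ALIGN(192, 8) == 192    */
    /* aligned_size - size == 0: nothing for slabobj_ext to reuse here */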

> 
> > > > for objext, it seems like this padding may no longer provide much benefit.
> > > > Do you think it would make sense to remove this extra padding
> > > > altogether?
> > > 
> > > I think when debugging flags are enabled it'd still be useful to have,
> > 
> > Absolutely — I’m with you on this.
> > 
> > After thinking about it again, I agree it’s better to keep it.
> > 
> > Without that mandatory extra word, we could end up with "no trailing
> > padding at all" in cases where ALIGN(size, s->align) doesn’t actually
> > add any bytes.
> > 
> > > I'll try to keep the padding area after obj_ext (so that overwrites from
> > > the previous object won't overwrite the metadata).
> > 
> > Agree — we should make sure there is at least sizeof(void *) of extra
> > space after obj_exts when SLAB_RED_ZONE is enabled, so POISON_INUSE has
> > somewhere to go.
> 
> I think V4 of the patchset is already doing that, no?
> 
> The mandatory padding exists after obj_ext if SLAB_RED_ZONE is enabled
> and the final padding may or may not exist. check_pad_bytes() already knows
> that the padding(s) exist after obj_ext.

Yes, you are right, V4 already does this — I just hadn't noticed it earlier...

> 
> By the way, thanks for fixing the comment once again,
> it's easier to think about the layout now.

Glad it helped. The object layout is really subtle — missing even a
small detail was enough to throw us off. Good to finally have it all
straightened out.

-- 
Thanks,
Hao


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-12-30  8:54 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unsed slab space Harry Yoo
2025-12-22 11:08 ` [PATCH V4 1/8] mm/slab: use unsigned long for orig_size to ensure proper metadata align Harry Yoo
2025-12-22 11:08 ` [PATCH V4 2/8] mm/slab: allow specifying free pointer offset when using constructor Harry Yoo
2025-12-22 11:08 ` [PATCH V4 3/8] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
2025-12-22 11:08 ` [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
2025-12-22 23:36   ` kernel test robot
2025-12-23  0:08   ` kernel test robot
2025-12-22 11:08 ` [PATCH V4 5/8] mm/slab: use stride to access slabobj_ext Harry Yoo
2025-12-22 11:08 ` [PATCH V4 6/8] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
2025-12-22 11:08 ` [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
2025-12-23  1:40   ` kernel test robot
2025-12-23 15:08   ` Hao Li
2025-12-23 15:31     ` Harry Yoo
2025-12-23 16:08       ` Hao Li
2025-12-23 16:25         ` Harry Yoo
2025-12-24  3:18           ` Hao Li
2025-12-24  5:53             ` Harry Yoo
2025-12-24  6:05               ` Hao Li
2025-12-24 12:51               ` [PATCH] slub: clarify object field layout comments Hao Li
2025-12-29  7:07                 ` Harry Yoo
2025-12-29 11:56                   ` Hao Li
2025-12-22 11:08 ` [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
2025-12-24  5:33   ` Hao Li
2025-12-24  6:38     ` Harry Yoo
2025-12-24 12:43       ` Hao Li
2025-12-30  4:59         ` Harry Yoo
2025-12-30  8:54           ` Hao Li
