From: Harry Yoo <harry.yoo@oracle.com>
To: Hao Li <hao.li@linux.dev>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, andreyknvl@gmail.com,
cl@gentwo.org, dvyukov@google.com, glider@google.com,
hannes@cmpxchg.org, linux-mm@kvack.org, mhocko@kernel.org,
muchun.song@linux.dev, rientjes@google.com,
roman.gushchin@linux.dev, ryabinin.a.a@gmail.com,
shakeel.butt@linux.dev, surenb@google.com,
vincenzo.frascino@arm.com, yeoreum.yun@arm.com, tytso@mit.edu,
adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size
Date: Wed, 24 Dec 2025 15:38:57 +0900 [thread overview]
Message-ID: <aUuKgRlI60Hw3-Et@hyeyoo> (raw)
In-Reply-To: <l2xww4mysued3fjc2jzzy6cjrq5guygsxesmfqrhv2laxigpaq@ghj7xitfq7fh>
On Wed, Dec 24, 2025 at 01:33:59PM +0800, Hao Li wrote:
> On Mon, Dec 22, 2025 at 08:08:43PM +0900, Harry Yoo wrote:
> > When a cache has high s->align value and s->object_size is not aligned
> > to it, each object ends up with some unused space because of alignment.
> > If this wasted space is big enough, we can use it to store the
> > slabobj_ext metadata instead of wasting it.
> >
> > On my system, this happens with caches like kmem_cache, mm_struct, pid,
> > task_struct, sighand_cache, xfs_inode, and others.
> >
> > To place the slabobj_ext metadata within each object, the existing
> > slab_obj_ext() logic can still be used by setting:
> >
> > - slab->obj_exts = slab_address(slab) + s->red_left_zone +
> > (slabobj_ext offset)
> > - stride = s->size
> >
> > slab_obj_ext() doesn't need know where the metadata is stored,
> > so this method works without adding extra overhead to slab_obj_ext().
> >
> > A good example benefiting from this optimization is xfs_inode
> > (object_size: 992, align: 64). To measure memory savings, 2 millions of
> > files were created on XFS.
> >
> > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> >
> > Before patch (creating ~2.64M directories on xfs):
> > Slab: 5175976 kB
> > SReclaimable: 3837524 kB
> > SUnreclaim: 1338452 kB
> >
> > After patch (creating ~2.64M directories on xfs):
> > Slab: 5152912 kB
> > SReclaimable: 3838568 kB
> > SUnreclaim: 1314344 kB (-23.54 MiB)
> >
> > Enjoy the memory savings!
> >
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> > include/linux/slab.h | 9 ++++++
> > mm/slab_common.c | 6 ++--
> > mm/slub.c | 73 ++++++++++++++++++++++++++++++++++++++++++--
> > 3 files changed, 83 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 4554c04a9bd7..da512d9ab1a0 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -59,6 +59,9 @@ enum _slab_flag_bits {
> > _SLAB_CMPXCHG_DOUBLE,
> > #ifdef CONFIG_SLAB_OBJ_EXT
> > _SLAB_NO_OBJ_EXT,
> > +#endif
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > + _SLAB_OBJ_EXT_IN_OBJ,
> > #endif
> > _SLAB_FLAGS_LAST_BIT
> > };
> > @@ -244,6 +247,12 @@ enum _slab_flag_bits {
> > #define SLAB_NO_OBJ_EXT __SLAB_FLAG_UNUSED
> > #endif
> >
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > +#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
> > +#else
> > +#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_UNUSED
> > +#endif
> > +
> > /*
> > * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
> > *
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index c4cf9ed2ec92..f0a6db20d7ea 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
> > struct kmem_cache *kmem_cache;
> >
> > /*
> > - * Set of flags that will prevent slab merging
> > + * Set of flags that will prevent slab merging.
> > + * Any flag that adds per-object metadata should be included,
> > + * since slab merging can update s->inuse that affects the metadata layout.
> > */
> > #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> > SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> > - SLAB_FAILSLAB | SLAB_NO_MERGE)
> > + SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
> >
> > #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
> > SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 3fc3d2ca42e7..78f0087c8e48 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -977,6 +977,39 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> > {
> > return false;
> > }
> > +
> > +#endif
> > +
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > +static bool obj_exts_in_object(struct kmem_cache *s)
> > +{
> > + return s->flags & SLAB_OBJ_EXT_IN_OBJ;
> > +}
> > +
> > +static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> > +{
> > + unsigned int offset = get_info_end(s);
> > +
> > + if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
> > + offset += sizeof(struct track) * 2;
> > +
> > + if (slub_debug_orig_size(s))
> > + offset += sizeof(unsigned long);
> > +
> > + offset += kasan_metadata_size(s, false);
> > +
> > + return offset;
> > +}
> > +#else
> > +static inline bool obj_exts_in_object(struct kmem_cache *s)
> > +{
> > + return false;
> > +}
> > +
> > +static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> > +{
> > + return 0;
> > +}
> > #endif
> >
> > #ifdef CONFIG_SLUB_DEBUG
> > @@ -1277,6 +1310,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
> >
> > off += kasan_metadata_size(s, false);
> >
> > + if (obj_exts_in_object(s))
> > + off += sizeof(struct slabobj_ext);
> > +
> > if (off != size_from_object(s))
> > /* Beginning of the filler is the free pointer */
> > print_section(KERN_ERR, "Padding ", p + off,
> > @@ -1446,7 +1482,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> > * A. Free pointer (if we cannot overwrite object on free)
> > * B. Tracking data for SLAB_STORE_USER
> > * C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> > - * D. Padding to reach required alignment boundary or at minimum
> > + * D. KASAN alloc metadata (KASAN enabled)
> > + * E. struct slabobj_ext to store accounting metadata
> > + * (SLAB_OBJ_EXT_IN_OBJ enabled)
> > + * F. Padding to reach required alignment boundary or at minimum
> > * one word if debugging is on to be able to detect writes
> > * before the word boundary.
> > *
> > @@ -1474,6 +1513,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> >
> > off += kasan_metadata_size(s, false);
> >
> > + if (obj_exts_in_object(s))
> > + off += sizeof(struct slabobj_ext);
> > +
> > if (size_from_object(s) == off)
> > return 1;
> >
> > @@ -2280,7 +2322,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
> > return;
> > }
> >
> > - if (obj_exts_in_slab(slab->slab_cache, slab)) {
> > + if (obj_exts_in_slab(slab->slab_cache, slab) ||
> > + obj_exts_in_object(slab->slab_cache)) {
> > slab->obj_exts = 0;
> > return;
> > }
> > @@ -2326,6 +2369,23 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> > obj_exts |= MEMCG_DATA_OBJEXTS;
> > slab->obj_exts = obj_exts;
> > slab_set_stride(slab, sizeof(struct slabobj_ext));
> > + } else if (obj_exts_in_object(s)) {
> > + unsigned int offset = obj_exts_offset_in_object(s);
> > +
> > + obj_exts = (unsigned long)slab_address(slab);
> > + obj_exts += s->red_left_pad;
> > + obj_exts += obj_exts_offset_in_object(s);
>
> Hi, Harry
>
> It looks like this could just be simplified to obj_exts += offset, right?
Right! Will do in v5.
> > +
> > + get_slab_obj_exts(obj_exts);
> > + for_each_object(addr, s, slab_address(slab), slab->objects)
> > + memset(kasan_reset_tag(addr) + offset, 0,
> > + sizeof(struct slabobj_ext));
> > + put_slab_obj_exts(obj_exts);
> > +
> > + if (IS_ENABLED(CONFIG_MEMCG))
> > + obj_exts |= MEMCG_DATA_OBJEXTS;
> > + slab->obj_exts = obj_exts;
> > + slab_set_stride(slab, s->size);
> > }
> > }
> >
> > @@ -8023,6 +8083,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> > {
> > slab_flags_t flags = s->flags;
> > unsigned int size = s->object_size;
> > + unsigned int aligned_size;
> > unsigned int order;
> >
> > /*
> > @@ -8132,7 +8193,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> > * offset 0. In order to align the objects we have to simply size
> > * each object to conform to the alignment.
> > */
> > - size = ALIGN(size, s->align);
> > + aligned_size = ALIGN(size, s->align);
> > +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> > + if (aligned_size - size >= sizeof(struct slabobj_ext))
> > + s->flags |= SLAB_OBJ_EXT_IN_OBJ;
> > +#endif
> > + size = aligned_size;
> > +
>
> One more thought: in calculate_sizes() we add some extra padding when
> SLAB_RED_ZONE is enabled:
>
> if (flags & SLAB_RED_ZONE) {
> /*
> * Add some empty padding so that we can catch
> * overwrites from earlier objects rather than let
> * tracking information or the free pointer be
> * corrupted if a user writes before the start
> * of the object.
> */
> size += sizeof(void *);
> ...
> }
>
>
> From what I understand, this additional padding ends up being placed
> after the KASAN allocation metadata.
Right.
> Since it’s only "extra" padding (i.e., it doesn’t seem strictly required
> for the layout), and your patch would reuse this area — together with
> the final padding introduced by `size = ALIGN(size, s->align);`
Very good point!
Nah, it wasn't intentional to reuse the extra padding.
> for objext, it seems like this padding may no longer provide much benefit.
> Do you think it would make sense to remove this extra padding
> altogether?
I think when debugging flags are enabled it'd still be useful to have,
I'll try to keep the padding area after obj_ext (so that overwrites from
the previous object won't overwrite the metadata).
Thanks a lot!
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-12-24 6:39 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-22 11:08 [PATCH V4 0/8] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unsed slab space Harry Yoo
2025-12-22 11:08 ` [PATCH V4 1/8] mm/slab: use unsigned long for orig_size to ensure proper metadata align Harry Yoo
2025-12-22 11:08 ` [PATCH V4 2/8] mm/slab: allow specifying free pointer offset when using constructor Harry Yoo
2025-12-22 11:08 ` [PATCH V4 3/8] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
2025-12-22 11:08 ` [PATCH V4 4/8] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
2025-12-22 23:36 ` kernel test robot
2025-12-23 0:08 ` kernel test robot
2025-12-22 11:08 ` [PATCH V4 5/8] mm/slab: use stride to access slabobj_ext Harry Yoo
2025-12-22 11:08 ` [PATCH V4 6/8] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
2025-12-22 11:08 ` [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
2025-12-23 1:40 ` kernel test robot
2025-12-23 15:08 ` Hao Li
2025-12-23 15:31 ` Harry Yoo
2025-12-23 16:08 ` Hao Li
2025-12-23 16:25 ` Harry Yoo
2025-12-24 3:18 ` Hao Li
2025-12-24 5:53 ` Harry Yoo
2025-12-24 6:05 ` Hao Li
2025-12-24 12:51 ` [PATCH] slub: clarify object field layout comments Hao Li
2025-12-29 7:07 ` Harry Yoo
2025-12-29 11:56 ` Hao Li
2025-12-22 11:08 ` [PATCH V4 8/8] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
2025-12-24 5:33 ` Hao Li
2025-12-24 6:38 ` Harry Yoo [this message]
2025-12-24 12:43 ` Hao Li
2025-12-30 4:59 ` Harry Yoo
2025-12-30 8:54 ` Hao Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUuKgRlI60Hw3-Et@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=cl@gentwo.org \
--cc=dvyukov@google.com \
--cc=glider@google.com \
--cc=hannes@cmpxchg.org \
--cc=hao.li@linux.dev \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=ryabinin.a.a@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=tytso@mit.edu \
--cc=vbabka@suse.cz \
--cc=vincenzo.frascino@arm.com \
--cc=yeoreum.yun@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox