From: Hao Li <hao.li@linux.dev>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, andreyknvl@gmail.com,
cl@gentwo.org, dvyukov@google.com, glider@google.com,
hannes@cmpxchg.org, linux-mm@kvack.org, mhocko@kernel.org,
muchun.song@linux.dev, rientjes@google.com,
roman.gushchin@linux.dev, ryabinin.a.a@gmail.com,
shakeel.butt@linux.dev, surenb@google.com,
vincenzo.frascino@arm.com, yeoreum.yun@arm.com, tytso@mit.edu,
adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
Date: Wed, 24 Dec 2025 14:05:33 +0800
Message-ID: <nhf2pxhoap3rwrzsgvu6h66bk6wilutt54sepb7brxrjv45sql@bkb5hopiq7bv>
In-Reply-To: <aUt_1uDe05diks7b@hyeyoo>

On Wed, Dec 24, 2025 at 02:53:26PM +0900, Harry Yoo wrote:
> On Wed, Dec 24, 2025 at 11:18:56AM +0800, Hao Li wrote:
> > On Wed, Dec 24, 2025 at 01:25:01AM +0900, Harry Yoo wrote:
> > > On Wed, Dec 24, 2025 at 12:08:36AM +0800, Hao Li wrote:
> > > > On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> > > > > On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > > > > > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > > > > > The leftover space in a slab is always smaller than s->size, and
> > > > > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > > > > space is larger than the size of the slabobj_ext array for the slab.
> > > > > > >
> > > > > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > > > > and 736 bytes of leftover space per slab.
> > > > > > >
> > > > > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > > > > fits within the leftover space.
> > > > > > >
> > > > > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > > > > kcalloc() when it is large enough. The array is allocated from unused
> > > > > > > space only when creating new slabs, and it doesn't try to utilize unused
> > > > > > > space if alloc_slab_obj_exts() is called after slab creation because
> > > > > > > implementing lazy allocation involves more expensive synchronization.
> > > > > > >
> > > > > > > > The implementation and evaluation of lazy allocation from unused space
> > > > > > > > is left as future work. As pointed out by Vlastimil Babka [1], it could be
> > > > > > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > > > > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > > > > > xarray does that.
> > > > > > >
> > > > > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > > > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > > > > > array only when either of them is enabled.
> > > > > > >
> > > > > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > > > > >
> > > > > > > Before patch (creating ~2.64M directories on ext4):
> > > > > > > Slab: 4747880 kB
> > > > > > > SReclaimable: 4169652 kB
> > > > > > > SUnreclaim: 578228 kB
> > > > > > >
> > > > > > > After patch (creating ~2.64M directories on ext4):
> > > > > > > Slab: 4724020 kB
> > > > > > > SReclaimable: 4169188 kB
> > > > > > > SUnreclaim: 554832 kB (-22.84 MiB)
> > > > > > >
> > > > > > > Enjoy the memory savings!
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > > > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > > > ---
> > > > > > > mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > > > 1 file changed, 151 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > > > > > --- a/mm/slub.c
> > > > > > > +++ b/mm/slub.c
> > > > > > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > > > > > > return *(unsigned long *)p;
> > > > > > > }
> > > > > > >
> > > > > > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > > > > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > > > > > + * slab objects. If neither is enabled when this function is called,
> > > > > > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > > > > > + * need slabobj_ext metadata.
> > > > > > > + *
> > > > > > > + * However, this may disable optimization when memory cgroup or memory
> > > > > > > + * allocation profiling is used, but slabs are created too early
> > > > > > > + * even before those subsystems are initialized.
> > > > > > > + */
> > > > > > > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > > > > > > +{
> > > > > > > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > > > > > +		return true;
> > > > > > > > +
> > > > > > > > +	if (mem_alloc_profiling_enabled())
> > > > > > > > +		return true;
> > > > > > > > +
> > > > > > > > +	return false;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > > > > > > +{
> > > > > > > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > > > > > > +						     struct slab *slab)
> > > > > > > > +{
> > > > > > > > +	unsigned long objext_offset;
> > > > > > > > +
> > > > > > > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > > > > >
> > > > > > Hi Harry,
> > > > >
> > > > > Hi Hao, thanks for the review!
> > > > > Hope you're doing well.
> > > >
> > > > Thanks Harry. Hope you are too!
> > > >
> > > > >
> > > > > > As s->size already includes s->red_left_pad
> > > > >
> > > > > Great question. It's true that s->size includes s->red_left_pad,
> > > > > > but we also have a redzone right before the first object:
> > > > > >
> > > > > > [ redzone ] [ obj 1 | redzone ] [ obj 2 | redzone ] [ ... ]
> > > > >
> > > > > So we have (slab->objects + 1) red zones and so
> > > >
> > > > I have a follow-up question regarding the redzones. Unless I'm missing
> > > > some detail, it seems the left redzone should apply to each object as
> > > > well. If so, I would expect the memory layout to be:
> > > >
> > > > [left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]
> > > >
> > > > In `calculate_sizes()`, I see:
> > > >
> > > > > 	if ((flags & SLAB_RED_ZONE) && size == s->object_size)
> > > > > 		size += sizeof(void *);
> > >
> > > Yes, this is the right redzone,
> > >
> > > > ...
> > > > ...
> > > > > 	if (flags & SLAB_RED_ZONE) {
> > > > > 		size += s->red_left_pad;
> > > > > 	}
> > >
> > > This is the left red zone.
> > > Both of them are included in the size...
> > >
> > > Oh god, I was confused, thanks for the correction!
> >
> > Glad it helped!
> >
> > > > Could you please confirm whether my understanding is correct, or point
> > > > out what I'm missing?
> > >
> > > I think your understanding is correct.
> > >
> > > Hmm, perhaps we should update the "Object layout:" comment above
> > > check_pad_bytes() to avoid future confusion?
> >
> > Yes, exactly. That’s a good idea.
> >
> > Also, I feel the layout description in the check_pad_bytes() comment
> > isn’t very intuitive and can be a bit hard to follow. I think it might be
> > clearer if we explicitly list out each field. What do you think about that?
>
> Yeah it's confusing, but from your description
> I'm not sure what the end result would look like.
>
> Could you please send a patch that does it? (and also add the left redzone
> to the object layout comment, if you are willing to!)
Sure — I'd be happy to!
>
> As long as it makes it more understandable/intuitive,
> it'd be nice to have!
I'll send a patch for review soon.
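
For reference, here is a minimal user-space sketch of the per-object layout as understood in this thread, with SLAB_RED_ZONE enabled. It is a simplified model and not the kernel's code: `struct redzone_layout` and the helpers are hypothetical names, and the free pointer, tracking data, alignment padding and other metadata that calculate_sizes() also accounts for are left out. The point it illustrates is that the left and right redzones are both counted inside s->size, so the space after the last object starts at s->size * slab->objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one cache with SLAB_RED_ZONE enabled. */
struct redzone_layout {
	size_t red_left_pad;	/* left redzone bytes, counted inside size */
	size_t object_size;	/* payload bytes */
	size_t size;		/* full per-object stride, i.e. s->size */
};

/* Build the stride: [left redzone | object | right redzone]. */
static struct redzone_layout make_layout(size_t object_size,
					 size_t red_left_pad,
					 size_t right_redzone)
{
	struct redzone_layout l = {
		.red_left_pad = red_left_pad,
		.object_size = object_size,
		.size = red_left_pad + object_size + right_redzone,
	};
	return l;
}

/* Offset of object i's payload from the slab base. */
static size_t object_offset(const struct redzone_layout *l, size_t i)
{
	/* The stride must at least cover the left redzone and the payload. */
	assert(l->size >= l->red_left_pad + l->object_size);
	return i * l->size + l->red_left_pad;
}

/* First byte past the last object's stride, where leftover space begins. */
static size_t leftover_offset(const struct redzone_layout *l, size_t objects)
{
	return objects * l->size;
}
```

In this model, the right redzone of the last object ends exactly at leftover_offset(), with no extra red_left_pad term.
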
--
Thanks,
Hao
>
> --
> Cheers,
> Harry / Hyeonggon