linux-mm.kvack.org archive mirror
From: Harry Yoo <harry.yoo@oracle.com>
To: Hao Li <hao.li@linux.dev>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, andreyknvl@gmail.com,
	cl@gentwo.org, dvyukov@google.com, glider@google.com,
	hannes@cmpxchg.org, linux-mm@kvack.org, mhocko@kernel.org,
	muchun.song@linux.dev, rientjes@google.com,
	roman.gushchin@linux.dev, ryabinin.a.a@gmail.com,
	shakeel.butt@linux.dev, surenb@google.com,
	vincenzo.frascino@arm.com, yeoreum.yun@arm.com, tytso@mit.edu,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext array from leftover
Date: Wed, 24 Dec 2025 14:53:26 +0900
Message-ID: <aUt_1uDe05diks7b@hyeyoo>
In-Reply-To: <c6owr44jdncf7q5zqgq4wn4pm57ai4cd3upauwmwszopuddf5g@52mkqbe2m27j>

On Wed, Dec 24, 2025 at 11:18:56AM +0800, Hao Li wrote:
> On Wed, Dec 24, 2025 at 01:25:01AM +0900, Harry Yoo wrote:
> > On Wed, Dec 24, 2025 at 12:08:36AM +0800, Hao Li wrote:
> > > On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> > > > On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > > > > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > > > > The leftover space in a slab is always smaller than s->size, and
> > > > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > > > space is larger than the size of the slabobj_ext array for the slab.
> > > > > > 
> > > > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > > > and 736 bytes of leftover space per slab.
> > > > > > 
> > > > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > > > fits within the leftover space.
> > > > > > 
> > > > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > > > kcalloc() when it is large enough. The array is allocated from unused
> > > > > > space only when creating new slabs, and it doesn't try to utilize unused
> > > > > > space if alloc_slab_obj_exts() is called after slab creation because
> > > > > > implementing lazy allocation involves more expensive synchronization.
> > > > > > 
> > > > > > > The implementation and evaluation of lazy allocation from unused space
> > > > > > > is left as future work. As pointed out by Vlastimil Babka [1], it could be
> > > > > > beneficial when a slab cache without SLAB_ACCOUNT can be created, and
> > > > > > some of the allocations from the cache use __GFP_ACCOUNT. For example,
> > > > > > xarray does that.
> > > > > > 
> > > > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > > > > array only when either of them is enabled.
> > > > > > 
> > > > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > > > > 
> > > > > > Before patch (creating ~2.64M directories on ext4):
> > > > > >   Slab:            4747880 kB
> > > > > >   SReclaimable:    4169652 kB
> > > > > >   SUnreclaim:       578228 kB
> > > > > > 
> > > > > > After patch (creating ~2.64M directories on ext4):
> > > > > >   Slab:            4724020 kB
> > > > > >   SReclaimable:    4169188 kB
> > > > > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > > > > 
> > > > > > Enjoy the memory savings!
> > > > > > 
> > > > > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz
> > > > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > > ---
> > > > > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > > > > --- a/mm/slub.c
> > > > > > +++ b/mm/slub.c
> > > > > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > > > > >  	return *(unsigned long *)p;
> > > > > >  }
> > > > > >  
> > > > > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > > > > +
> > > > > > +/*
> > > > > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > > > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > > > > + * slab objects. If neither is enabled when this function is called,
> > > > > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > > > > + * need slabobj_ext metadata.
> > > > > > + *
> > > > > > + * However, this may disable optimization when memory cgroup or memory
> > > > > > + * allocation profiling is used, but slabs are created too early
> > > > > > + * even before those subsystems are initialized.
> > > > > > + */
> > > > > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > > > > +{
> > > > > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > > > +		return true;
> > > > > > +
> > > > > > +	if (mem_alloc_profiling_enabled())
> > > > > > +		return true;
> > > > > > +
> > > > > > +	return false;
> > > > > > +}
> > > > > > +
> > > > > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > > > > +{
> > > > > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > > > > +}
> > > > > > +
> > > > > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > > > > +						    struct slab *slab)
> > > > > > +{
> > > > > > +	unsigned long objext_offset;
> > > > > > +
> > > > > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > > > > 
> > > > > Hi Harry,
> > > > 
> > > > Hi Hao, thanks for the review!
> > > > Hope you're doing well.
> > > 
> > > Thanks Harry. Hope you are too!
> > > 
> > > > 
> > > > > As s->size already includes s->red_left_pad
> > > > 
> > > > Great question. It's true that s->size includes s->red_left_pad,
> > > > but we also have a redzone right before the first object:
> > > > 
> > > >   [ redzone ] [ obj 1 | redzone ] [ obj 2 | redzone ] [ ... ]
> > > > 
> > > > So we have (slab->objects + 1) red zones and so
> > > 
> > > I have a follow-up question regarding the redzones. Unless I'm missing
> > > some detail, it seems the left redzone should apply to each object as
> > > well. If so, I would expect the memory layout to be:
> > > 
> > > [left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]
> > > 
> > > In `calculate_sizes()`, I see:
> > > 
> > > if ((flags & SLAB_RED_ZONE) && size == s->object_size)
> > >     size += sizeof(void *);
> > 
> > Yes, this is the right redzone.
> > 
> > > ...
> > > ...
> > > if (flags & SLAB_RED_ZONE) {
> > >     size += s->red_left_pad;
> > > }
> > 
> > This is the left red zone.
> > Both of them are included in the size...
> > 
> > Oh god, I was confused, thanks for the correction!
> 
> Glad it helped!
> 
> > > Could you please confirm whether my understanding is correct, or point
> > > out what I'm missing?
> > 
> > I think your understanding is correct.
> > 
> > Hmm, perhaps we should update the "Object layout:" comment above
> > check_pad_bytes() to avoid future confusion?
> 
> Yes, exactly. That’s a good idea.
>
> Also, I feel the layout description in the check_pad_bytes() comment
> isn’t very intuitive and can be a bit hard to follow. I think it might be
> clearer if we explicitly list out each field. What do you think about that?

Yeah it's confusing, but from your description
I'm not sure what the end result would look like.

Could you please send a patch that does that (and also adds the left
redzone to the object layout comment, if you are willing to)?

As long as it makes it more understandable/intuitive,
it'd be nice to have!

-- 
Cheers,
Harry / Hyeonggon

