From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vasily Averin <vvs@virtuozzo.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Pekka Enberg <penberg@kernel.org>, Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
kernel@openvz.org
Subject: Re: slabinfo shows incorrect active_objs ???
Date: Mon, 28 Feb 2022 10:28:53 +0000
Message-ID: <Yhyj5QY+2Vmk9jO8@ip-172-31-19-208.ap-northeast-1.compute.internal>
In-Reply-To: <YhyicnXySs2syMfg@ip-172-31-19-208.ap-northeast-1.compute.internal>
On Mon, Feb 28, 2022 at 10:22:42AM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 09:17:27AM +0300, Vasily Averin wrote:
> > On 25.02.2022 07:37, Vasily Averin wrote:
> > > On 25.02.2022 03:08, Roman Gushchin wrote:
> > > >
> > > > > On Feb 24, 2022, at 5:17 AM, Vasily Averin <vvs@virtuozzo.com> wrote:
> > > > >
> > > > > On 22.02.2022 19:32, Shakeel Butt wrote:
> > > > > > If you are just interested in the stats, you can use SLAB for your experiments.
> > > > >
> > > > > > Unfortunately memcg_slabinfo.py does not support SLAB right now.
> > > > >
> > > > > > On 23.02.2022 20:31, Vlastimil Babka wrote:
> > > > > > > On 2/23/22 04:45, Hyeonggon Yoo wrote:
> > > > > > > On Wed, Feb 23, 2022 at 01:32:36AM +0100, Vlastimil Babka wrote:
> > > > > > > > Hm it would be easier just to disable merging when the precise counters are
> > > > > > > > enabled. Assume it would be a config option (possibly boot-time option with
> > > > > > > > static keys) anyway so those who don't need them can avoid the overhead.
> > > > > > >
> > > > > > > Is it possible to accurately account objects in SLUB? I think it's not
> > > > > > > easy because a CPU can free objects to remote cpu's partial slabs using
> > > > > > > cmpxchg_double()...
> > > > > > AFAIU Roman's idea would be that each alloc/free would simply inc/dec an
> > > > > > object counter that's disconnected from physical handling of particular sl*b
> > > > > > implementation. It would provide exact count of objects from the perspective
> > > > > > of slab users.
> > > > > > I assume for reduced overhead the counters would be implemented in a percpu
> > > > > > fashion as e.g. vmstats. Slabinfo gathering would thus have to e.g. sum up
> > > > > > those percpu counters.
> > > > >
> > > > > > I like this idea too and I'm going to spend some time on its implementation.
> > > >
> > > > Sounds good!
> > > >
> > > > Unfortunately it’s quite tricky: the problem is that there is potentially a large and dynamic set of cgroups and also a large and dynamic set of slab caches. Given the performance considerations, it’s also unlikely that we can avoid using percpu variables.
> > > > So we come to the (nr_slab_caches * nr_cgroups * nr_cpus) number of “objects”. If we create them proactively, we’re likely wasting a lot of memory. Creating them on demand is tricky too (especially without losing some accounting accuracy).
> > >
> > > I was talking about global (i.e. non-memcg) precise slab counters only.
> > > I expect it can be done under a new config option and/or static key, and, if present, the counters can be used in /proc/slabinfo output.
> > >
> > > At present I'm still going to extract memcg counters via your memcg_slabinfo script.
> >
> > I'm not sure I'll be able to debug this patch properly, so I decided to submit it as is.
> > I hope it can be useful.
> >
> > In general it works and /proc/slabinfo shows reasonable numbers,
> > however, in some cases they differ from crash's "kmem -s" output by either +1 or -1.
> > Obviously I missed something.
> >
> > ---[cut here]---
> > [PATCH RFC] slub: precise in-use counter for /proc/slabinfo output
> >
> > Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> > ---
> > include/linux/slub_def.h | 3 +++
> > init/Kconfig | 7 +++++++
> > mm/slub.c | 20 +++++++++++++++++++-
> > 3 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > index 33c5c0e3bd8d..d22e18dfe905 100644
> > --- a/include/linux/slub_def.h
> > +++ b/include/linux/slub_def.h
> > @@ -56,6 +56,9 @@ struct kmem_cache_cpu {
> > #ifdef CONFIG_SLUB_STATS
> > unsigned stat[NR_SLUB_STAT_ITEMS];
> > #endif
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > + unsigned inuse; /* Precise in-use counter */
> > +#endif
> > };
> > #ifdef CONFIG_SLUB_CPU_PARTIAL
> > diff --git a/init/Kconfig b/init/Kconfig
> > index e9119bf54b1f..5c57bdbb8938 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1995,6 +1995,13 @@ config SLUB_CPU_PARTIAL
> > which requires the taking of locks that may cause latency spikes.
> > Typically one would choose no for a realtime system.
> > +config SLUB_PRECISE_INUSE
> > + default n
> > + depends on SLUB && SMP
> > + bool "SLUB precise in-use counter"
> > + help
> > + Per cpu in-use counter shows precise statistic in slabinfo.
> > +
> > config MMAP_ALLOW_UNINITIALIZED
> > bool "Allow mmapped anonymous memory to be uninitialized"
> > depends on EXPERT && !MMU
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 261474092e43..90750cae0af9 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3228,6 +3228,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> > out:
> > slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > + raw_cpu_inc(s->cpu_slab->inuse);
> > +#endif
> > return object;
> > }
> > @@ -3506,8 +3509,12 @@ static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
> > * With KASAN enabled slab_free_freelist_hook modifies the freelist
> > * to remove objects, whose reuse must be delayed.
> > */
> > - if (slab_free_freelist_hook(s, &head, &tail, &cnt))
> > + if (slab_free_freelist_hook(s, &head, &tail, &cnt)) {
> > do_slab_free(s, slab, head, tail, cnt, addr);
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > + raw_cpu_sub(s->cpu_slab->inuse, cnt);
> > +#endif
> > + }
> > }
> > #ifdef CONFIG_KASAN_GENERIC
> > @@ -6253,6 +6260,17 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
> > nr_free += count_partial(n, count_free);
> > }
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > + {
> > + unsigned int cpu, nr_inuse = 0;
> > +
> > + for_each_possible_cpu(cpu)
> > + nr_inuse += per_cpu_ptr((s)->cpu_slab, cpu)->inuse;
> > +
> > + if (nr_inuse <= nr_objs)
> > + nr_free = nr_objs - nr_inuse;
> > + }
> > +#endif
> > sinfo->active_objs = nr_objs - nr_free;
> > sinfo->num_objs = nr_objs;
> > sinfo->active_slabs = nr_slabs;
>
> Hi Vasily, thank you for this patch.
> This looks nice, but I see things we can improve:
>
> 1) With raw_cpu_{inc,sub}(), s->cpu_slab->inuse will be racy if the kernel
> can be preempted. SLUB does not disable preemption/interrupts at all in the fastpath.
>
> And yeah, we can accept being racy to some degree, but the count will drift
> further and further the longer the system is up. So I think an atomic integer
> is the right choice if correctness is important?
>
> 2) This code is not aware of cpu partials. There is a list of slabs for
> each kmem_cache_cpu. You can iterate over them by:
>
And while replying to this, I realized again ... we need to consider disabling
preemption when freeing to a remote cpu's partials if CONFIG_SLUB_PRECISE_INUSE=y.
Hmm, do we need another approach?
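
To make the remote-free case a bit more concrete, here is a rough userspace
sketch of the per-cpu counter scheme being discussed. It is not kernel code:
NR_CPUS, struct cache_counters and the account_*() helpers are made-up names
for illustration only. It only shows that an object allocated on one CPU and
freed on another still gives the right total when the counters are summed,
because unsigned wraparound cancels out. As far as I understand, what can
break the total is an update that is lost when a non-atomic read-modify-write
is interrupted by preemption or migration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for the per-cpu 'inuse' field added by the patch. */
struct cache_counters {
	unsigned int inuse[NR_CPUS];
};

/* Account one allocation on the CPU that performed it. */
static void account_alloc(struct cache_counters *c, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	c->inuse[cpu]++;
}

/* Account cnt frees on the freeing CPU, which may not be the allocating one. */
static void account_free(struct cache_counters *c, unsigned int cpu,
			 unsigned int cnt)
{
	assert(cpu < NR_CPUS);
	c->inuse[cpu] -= cnt;	/* may wrap below zero on this CPU */
}

/* Sum all per-cpu counters; per-cpu wraparound cancels in the total. */
static unsigned int total_inuse(const struct cache_counters *c)
{
	unsigned int cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->inuse[cpu];
	return sum;
}
```

So a single per-cpu value is meaningless on its own; only the sum over all
CPUs says anything, and only if no individual update was lost.
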
> kmem_cache_cpu->partial->next->next->next->... and so on until it reaches NULL.
>
> So we need to count cpu partials' inuse too.
> Then we need per-slab counters... I think we can use struct slab's
> __unused field for this?
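
A minimal sketch of that walk, in plain userspace C. struct slab_sketch is a
hypothetical, trimmed-down stand-in and not the kernel's struct slab layout;
its inuse member plays the role of the per-slab counter suggested above.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct slab; only what the walk needs. */
struct slab_sketch {
	struct slab_sketch *next;	/* next slab on the cpu partial list */
	unsigned int inuse;		/* assumed per-slab in-use counter */
};

/* Follow ->next until NULL and add up each slab's in-use count. */
static unsigned int count_cpu_partial_inuse(const struct slab_sketch *partial)
{
	unsigned int sum = 0;

	for (; partial != NULL; partial = partial->next)
		sum += partial->inuse;
	return sum;
}
```

In the kernel the list can change under the reader, so a real version would
also need whatever protection the cpu partial list requires; the sketch
ignores that.
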
>
> Thanks :)
>
> --
> Thank you, You are awesome!
> Hyeonggon :-)
--
Thank you, You are awesome!
Hyeonggon :-)