Date: Mon, 28 Feb 2022 12:09:38 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vasily Averin
Cc: Roman Gushchin, Vlastimil Babka, Christoph Lameter, David Rientjes,
    Joonsoo Kim, Pekka Enberg, Linux MM, Andrew Morton, kernel@openvz.org
Subject: Re: slabinfo shows incorrect active_objs ???
References: <4BC89091-F314-4785-BCBB-189CE42B0192@linux.dev>
    <1c73adc1-f780-56ac-4c67-490670a27951@virtuozzo.com>
    <2a7d3c8a-ad92-0ffe-4374-f0bb7e029a74@virtuozzo.com>
In-Reply-To: <2a7d3c8a-ad92-0ffe-4374-f0bb7e029a74@virtuozzo.com>

On Mon, Feb 28, 2022 at 09:17:27AM +0300, Vasily Averin wrote:
> On 25.02.2022 07:37, Vasily Averin wrote:
> > On 25.02.2022 03:08, Roman Gushchin wrote:
> > >
> > > > On Feb 24, 2022, at 5:17 AM, Vasily Averin wrote:
> > > >
> > > > On 22.02.2022 19:32, Shakeel Butt wrote:
> > > > > If you are just interested in the stats, you can use SLAB for your experiments.
> > > >
> > > > Unfortunately memcg_slabinfo.py does not support SLAB right now.
> > > >
> > > > > On 23.02.2022 20:31, Vlastimil Babka wrote:
> > > > > > On 2/23/22 04:45, Hyeonggon Yoo wrote:
> > > > > > On Wed, Feb 23, 2022 at 01:32:36AM +0100, Vlastimil Babka wrote:
> > > > > > > Hm it would be easier just to disable merging when the precise counters are
> > > > > > > enabled. Assume it would be a config option (possibly boot-time option with
> > > > > > > static keys) anyway so those who don't need them can avoid the overhead.
> > > > > >
> > > > > > Is it possible to accurately account objects in SLUB? I think it's not
> > > > > > easy because a CPU can free objects to remote cpu's partial slabs using
> > > > > > cmpxchg_double()...
> > > > > AFAIU Roman's idea would be that each alloc/free would simply inc/dec an
> > > > > object counter that's disconnected from physical handling of particular sl*b
> > > > > implementation. It would provide exact count of objects from the perspective
> > > > > of slab users.
> > > > > I assume for reduced overhead the counters would be implemented in a percpu
> > > > > fashion as e.g. vmstats. Slabinfo gathering would thus have to e.g. sum up
> > > > > those percpu counters.
> > > >
> > > > I like this idea too and I'm going to spend some time on its implementation.
> > >
> > > Sounds good!
> > >
> > > Unfortunately it’s quite tricky: the problem is that there is potentially a
> > > large and dynamic set of cgroups and also a large and dynamic set of slab
> > > caches. Given the performance considerations, it’s also unlikely that we can
> > > avoid using percpu variables.
> > > So we come to the (nr_slab_caches * nr_cgroups * nr_cpus) number of
> > > “objects”. If we create them proactively, we’re likely wasting a lot of
> > > memory. Creating them on demand is tricky too (especially without losing
> > > some accounting accuracy).
> >
> > I was talking about global (i.e. non-memcg) precise slab counters only.
> > I expect it can be done under a new config option and/or static key, and
> > if present, they can be used in /proc/slabinfo output.
> >
> > At present I'm still going to extract memcg counters via your memcg_slabinfo script.
>
> I'm not sure I'll be able to debug this patch properly and decided to submit it as is.
> I hope it can be useful.
>
> In general it works and /proc/slabinfo shows reasonable numbers,
> however in some cases they differ from crash's "kmem -s" output by either +1 or -1.
> Obviously I missed something.
>

Oh, sorry for the noise. You implemented what Roman said.
So s->cpu_slab->inuse is just a per-cpu counter covering every object of
the cache, not only the cpu slab. Please ignore my last feedback.

Anyway, I think the +1 or -1 difference is due to a race?
What was your preemption model?

> ---[cut here]---
> [PATCH RFC] slub: precise in-use counter for /proc/slabinfo output
>
> Signed-off-by: Vasily Averin
> ---
>  include/linux/slub_def.h |  3 +++
>  init/Kconfig             |  7 +++++++
>  mm/slub.c                | 20 +++++++++++++++++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 33c5c0e3bd8d..d22e18dfe905 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -56,6 +56,9 @@ struct kmem_cache_cpu {
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +	unsigned inuse;		/* Precise in-use counter */
> +#endif
>  };
>  
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..5c57bdbb8938 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1995,6 +1995,13 @@ config SLUB_CPU_PARTIAL
>  	  which requires the taking of locks that may cause latency spikes.
>  	  Typically one would choose no for a realtime system.
>  
> +config SLUB_PRECISE_INUSE
> +	default n
> +	depends on SLUB && SMP
> +	bool "SLUB precise in-use counter"
> +	help
> +	  Per cpu in-use counter shows precise statistic in slabinfo.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU
> diff --git a/mm/slub.c b/mm/slub.c
> index 261474092e43..90750cae0af9 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3228,6 +3228,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>  out:
>  	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
>  
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +	raw_cpu_inc(s->cpu_slab->inuse);
> +#endif
>  	return object;
>  }
>  
> @@ -3506,8 +3509,12 @@ static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
>  	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
>  	 * to remove objects, whose reuse must be delayed.
>  	 */
> -	if (slab_free_freelist_hook(s, &head, &tail, &cnt))
> +	if (slab_free_freelist_hook(s, &head, &tail, &cnt)) {
>  		do_slab_free(s, slab, head, tail, cnt, addr);
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +		raw_cpu_sub(s->cpu_slab->inuse, cnt);
> +#endif
> +	}
>  }
>  
>  #ifdef CONFIG_KASAN_GENERIC
> @@ -6253,6 +6260,17 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
>  		nr_free += count_partial(n, count_free);
>  	}
>  
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +	{
> +		unsigned int cpu, nr_inuse = 0;
> +
> +		for_each_possible_cpu(cpu)
> +			nr_inuse += per_cpu_ptr((s)->cpu_slab, cpu)->inuse;
> +
> +		if (nr_inuse <= nr_objs)
> +			nr_free = nr_objs - nr_inuse;
> +	}
> +#endif
>  	sinfo->active_objs = nr_objs - nr_free;
>  	sinfo->num_objs = nr_objs;
>  	sinfo->active_slabs = nr_slabs;
> -- 
> 2.25.1

-- 
Thank you, You are awesome!
Hyeonggon :-)
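
P.S. For illustration only, not part of the patch: a small user-space
sketch of the counting scheme discussed above, assuming I read it correctly.
It models the per-cpu "inuse" field as a plain array and shows that an
object allocated on one CPU and freed on another makes an individual
unsigned counter wrap, while the sum over all CPUs is still the right
count. NR_CPUS, struct cache_model and the model_*() helpers are
hypothetical names invented for this sketch; they are not kernel APIs, and
the sketch does not model preemption or concurrent updates at all.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* User-space stand-in for the per-cpu "inuse" field added by the patch. */
struct cache_model {
	unsigned int inuse[NR_CPUS];
};

/* Models raw_cpu_inc(s->cpu_slab->inuse) on the allocating CPU. */
static void model_alloc(struct cache_model *c, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	c->inuse[cpu]++;
}

/* Models raw_cpu_sub(s->cpu_slab->inuse, cnt) on the freeing CPU. */
static void model_free(struct cache_model *c, unsigned int cpu,
		       unsigned int cnt)
{
	assert(cpu < NR_CPUS);
	c->inuse[cpu] -= cnt;	/* may wrap below zero; the sum stays valid */
}

/* Models the summing loop in get_slabinfo(). */
static unsigned int model_sum_inuse(const struct cache_model *c)
{
	unsigned int cpu, nr_inuse = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		nr_inuse += c->inuse[cpu];
	return nr_inuse;
}

/*
 * Models the fallback in get_slabinfo(): the precise count replaces
 * nr_free only when it is plausible (nr_inuse <= nr_objs).
 */
static unsigned int model_active_objs(const struct cache_model *c,
				      unsigned int nr_objs,
				      unsigned int nr_free)
{
	unsigned int nr_inuse = model_sum_inuse(c);

	if (nr_inuse <= nr_objs)
		nr_free = nr_objs - nr_inuse;
	return nr_objs - nr_free;
}
```

Unsigned arithmetic is modular, so the per-cpu values only have to be
summed in the same width for the wraparound to cancel out. A sum taken
while other CPUs are still updating their counters is a snapshot and may
be momentarily off, which the sketch deliberately leaves out.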