From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Mon, 28 Feb 2022 10:28:53 +0000
To: Vasily Averin
Cc: Roman Gushchin, Vlastimil Babka, Christoph Lameter, David Rientjes,
    Joonsoo Kim, Pekka Enberg, Linux MM, Andrew Morton, kernel@openvz.org
Subject: Re: slabinfo shows incorrect active_objs ???

On Mon, Feb 28, 2022 at 10:22:42AM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 09:17:27AM +0300, Vasily Averin wrote:
> > On 25.02.2022 07:37, Vasily Averin wrote:
> > > On 25.02.2022 03:08, Roman Gushchin wrote:
> > > >
> > > > > On Feb 24, 2022, at 5:17 AM, Vasily Averin wrote:
> > > > >
> > > > > On 22.02.2022 19:32, Shakeel Butt wrote:
> > > > > > If you are just interested in the stats, you can use SLAB for your experiments.
> > > > >
> > > > > Unfortunately memcg_slabinfo.py does not support SLAB right now.
> > > > >
> > > > > > On 23.02.2022 20:31, Vlastimil Babka wrote:
> > > > > > > On 2/23/22 04:45, Hyeonggon Yoo wrote:
> > > > > > > On Wed, Feb 23, 2022 at 01:32:36AM +0100, Vlastimil Babka wrote:
> > > > > > > > Hm it would be easier just to disable merging when the precise counters are
> > > > > > > > enabled. Assume it would be a config option (possibly boot-time option with
> > > > > > > > static keys) anyway so those who don't need them can avoid the overhead.
> > > > > > >
> > > > > > > Is it possible to accurately account objects in SLUB? I think it's not
> > > > > > > easy because a CPU can free objects to remote cpu's partial slabs using
> > > > > > > cmpxchg_double()...
> > > > > > AFAIU Roman's idea would be that each alloc/free would simply inc/dec an
> > > > > > object counter that's disconnected from physical handling of particular sl*b
> > > > > > implementation. It would provide exact count of objects from the perspective
> > > > > > of slab users.
> > > > > > I assume for reduced overhead the counters would be implemented in a percpu
> > > > > > fashion as e.g. vmstats. Slabinfo gathering would thus have to e.g. sum up
> > > > > > those percpu counters.
> > > > >
> > > > > I like this idea too and I'm going to spend some time on its implementation.
> > > >
> > > > Sounds good!
> > > >
> > > > Unfortunately it’s quite tricky: the problem is that there is potentially a large and dynamic set of cgroups and also large and dynamic set of slab caches. Given the performance considerations, we’re also unlikely to avoid using percpu variables.
> > > > So we come to the (nr_slab_caches * nr_cgroups * nr_cpus) number of “objects”. If we create them proactively, we’re likely wasting a lot of memory. Creating them on demand is tricky too (especially without losing some accounting accuracy).
> > >
> > > I was talking about global (i.e. non-memcg) precise slab counters only.
> > > I expect it can be done under a new config option and/or static key, and if present use them in /proc/slabinfo output.
> > >
> > > At present I'm still going to extract memcg counters via your memcg_slabinfo script.
> >
> > I'm not sure I'll be able to debug this patch properly and decided to submit it as is.
> > I hope it can be useful.
> >
> > In general it works and /proc/slabinfo shows reasonable numbers,
> > however in some cases they differ from crash's "kmem -s" output, either +1 or -1.
> > Obviously I missed something.
> >
> > ---[cut here]---
> > [PATCH RFC] slub: precise in-use counter for /proc/slabinfo output
> >
> > Signed-off-by: Vasily Averin
> > ---
> >  include/linux/slub_def.h |  3 +++
> >  init/Kconfig             |  7 +++++++
> >  mm/slub.c                | 20 +++++++++++++++++++-
> >  3 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > index 33c5c0e3bd8d..d22e18dfe905 100644
> > --- a/include/linux/slub_def.h
> > +++ b/include/linux/slub_def.h
> > @@ -56,6 +56,9 @@ struct kmem_cache_cpu {
> >  #ifdef CONFIG_SLUB_STATS
> >  	unsigned stat[NR_SLUB_STAT_ITEMS];
> >  #endif
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > +	unsigned inuse;		/* Precise in-use counter */
> > +#endif
> >  };
> >
> >  #ifdef CONFIG_SLUB_CPU_PARTIAL
> > diff --git a/init/Kconfig b/init/Kconfig
> > index e9119bf54b1f..5c57bdbb8938 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1995,6 +1995,13 @@ config SLUB_CPU_PARTIAL
> >  	  which requires the taking of locks that may cause latency spikes.
> >  	  Typically one would choose no for a realtime system.
> >
> > +config SLUB_PRECISE_INUSE
> > +	default n
> > +	depends on SLUB && SMP
> > +	bool "SLUB precise in-use counter"
> > +	help
> > +	  Per cpu in-use counter shows precise statistic in slabinfo.
> > +
> >  config MMAP_ALLOW_UNINITIALIZED
> >  	bool "Allow mmapped anonymous memory to be uninitialized"
> >  	depends on EXPERT && !MMU
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 261474092e43..90750cae0af9 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3228,6 +3228,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> >  out:
> >  	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
> >
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > +	raw_cpu_inc(s->cpu_slab->inuse);
> > +#endif
> >  	return object;
> >  }
> >
> > @@ -3506,8 +3509,12 @@ static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
> >  	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
> >  	 * to remove objects, whose reuse must be delayed.
> >  	 */
> > -	if (slab_free_freelist_hook(s, &head, &tail, &cnt))
> > +	if (slab_free_freelist_hook(s, &head, &tail, &cnt)) {
> >  		do_slab_free(s, slab, head, tail, cnt, addr);
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > +		raw_cpu_sub(s->cpu_slab->inuse, cnt);
> > +#endif
> > +	}
> >  }
> >
> >  #ifdef CONFIG_KASAN_GENERIC
> > @@ -6253,6 +6260,17 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
> >  		nr_free += count_partial(n, count_free);
> >  	}
> >
> > +#ifdef CONFIG_SLUB_PRECISE_INUSE
> > +	{
> > +		unsigned int cpu, nr_inuse = 0;
> > +
> > +		for_each_possible_cpu(cpu)
> > +			nr_inuse += per_cpu_ptr((s)->cpu_slab, cpu)->inuse;
> > +
> > +		if (nr_inuse <= nr_objs)
> > +			nr_free = nr_objs - nr_inuse;
> > +	}
> > +#endif
> >  	sinfo->active_objs = nr_objs - nr_free;
> >  	sinfo->num_objs = nr_objs;
> >  	sinfo->active_slabs = nr_slabs;
>
> Hi Vasily, thank you for this patch.
> This looks nice, but I see things we can improve:
>
> 1) With raw_cpu_{inc,sub}(), s->cpu_slab->inuse will be racy if the kernel
> can be preempted. SLUB does not disable preemption/interrupts at all in the fastpath.
>
> And yeah, we can accept being racy to some degree, but the count will become
> more and more incorrect the longer the system is up.
> So I think an atomic integer
> is the right choice if correctness is important?
>
> 2) This code is not aware of cpu partials. There is a list of slabs for
> each kmem_cache_cpu. You can iterate them by:
>

And replying to this, I realized again that we need to consider disabling
preemption when freeing to a remote cpu's partials if
CONFIG_SLUB_PRECISE_INUSE=y. Hmm, do we need another approach?

> kmem_cache_cpu->partial->next->next->next->... and so on until it reaches NULL.
>
> So we need to count cpu partials' inuse too.
> Then we need per-slab counters... I think we can use struct slab's
> __unused field for this?
>
> Thanks :)
>
> --
> Thank you, You are awesome!
> Hyeonggon :-)

--
Thank you, You are awesome!
Hyeonggon :-)
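
To make point 1) and the summing loop in get_slabinfo() a bit more concrete, here is a minimal userspace model, not kernel code. All names in it (MODEL_NR_CPUS, struct model_cache, the model_*() helpers) are hypothetical and exist only for this illustration. It assumes the counter update is a separate load and store, which is what a generic raw_cpu_inc() fallback may amount to; an architecture that does the update in a single instruction would not hit the second case in this exact form.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2	/* hypothetical: tiny fixed CPU count for the model */

/* Hypothetical stand-in for the per-cpu "inuse" field added by the RFC. */
struct model_cache {
	unsigned int inuse[MODEL_NR_CPUS];
};

/* Same shape as the loop added to get_slabinfo(): sum every CPU's counter. */
static unsigned int model_sum_inuse(const struct model_cache *c)
{
	unsigned int cpu, sum = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->inuse[cpu];
	return sum;
}

/*
 * Case 1: objects allocated on CPU 0 and freed on CPU 1.
 * CPU 1's counter wraps below zero, but unsigned arithmetic is modular,
 * so the sum over all CPUs is still exact as long as no update is lost.
 */
static unsigned int model_remote_free(void)
{
	struct model_cache c = { { 0 } };

	c.inuse[0] += 3;	/* three allocations accounted on CPU 0 */
	c.inuse[1] -= 3;	/* the same three objects freed on CPU 1 */
	return model_sum_inuse(&c);
}

/*
 * Case 2: a non-atomic read-modify-write interrupted in the middle.
 * Task A loads the counter, another update to the same counter happens
 * before A stores, and A then writes back a stale value: one increment
 * is lost for good.
 */
static unsigned int model_lost_update(void)
{
	struct model_cache c = { { 0 } };
	unsigned int tmp_a;

	tmp_a = c.inuse[0];	/* task A: load, then gets preempted */
	c.inuse[0] += 1;	/* task B: complete increment in between */
	c.inuse[0] = tmp_a + 1;	/* task A: store stale value + 1 */
	return model_sum_inuse(&c);
}

/* Two allocations happened in case 2, yet only one is counted. */
static void model_self_check(void)
{
	assert(model_remote_free() == 0);
	assert(model_lost_update() == 1);
}
```

The first case is why frees landing on a different CPU are not a problem for the sum by themselves. The second case is the kind of drift described in 1): each lost update shifts the total by one and nothing ever corrects it. Whether this_cpu_inc()/this_cpu_sub() or an atomic counter is the better fix for the fastpath is left open here.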