From: Suren Baghdasaryan
Date: Mon, 27 Jan 2025 11:38:32 -0800
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key in slab allocator
To: Vlastimil Babka, Steven Rostedt
Cc: akpm@linux-foundation.org, Peter Zijlstra, kent.overstreet@linux.dev, yuzhao@google.com, minchan@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, 00107082@163.com, quic_zhenhuah@quicinc.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20250126070206.381302-1-surenb@google.com> <20250126070206.381302-2-surenb@google.com>

On Sun, Jan 26, 2025 at 8:47 AM Vlastimil Babka wrote:
>
> On 1/26/25 08:02, Suren Baghdasaryan wrote:
> > When a sizable code section is protected by a disabled static key, that
> > code gets into the instruction cache even though it's not executed and
> > consumes the cache, increasing cache misses. This can be remedied by
> > moving such code into a separate uninlined function. The improvement

Sorry, I missed adding Steven Rostedt to the CC list since his advice
was instrumental in finding the way to optimize the static key
performance in this patch. Added now.

>
> Weird, I thought the static_branch_likely/unlikely/maybe was already
> handling this by the unlikely case being a jump to a block away from the
> fast-path stream of instructions, thus making it less likely to get cached.
> AFAIU even plain likely()/unlikely() should do this, along with branch
> prediction hints.

This was indeed an unexpected overhead when I measured it on Android.
Cache pollution was my understanding of the cause of this high overhead
after Steven suggested uninlining the protected code; he has done
something similar in the tracing subsystem. But maybe I misunderstood
the real reason. Steven, could you please verify whether my
understanding of the cause of the high overhead is correct here? Maybe
there is something else at play that I missed?

> > however comes at the expense of the configuration when this static key
> > gets enabled since there is now an additional function call.
> > The default state of the mem_alloc_profiling_key is controlled by
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT. Apply this optimization
> > only if CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n, improving the
> > performance of the default configuration.
> > When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y the functions
> > are inlined and performance does not change.
> >
> > On a Pixel6 phone, slab allocation profiling overhead measured with
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n and profiling disabled:
> >
> >             baseline       modified
> > Big         3.31%          0.17%
> > Medium      3.79%          0.57%
> > Little      6.68%          1.28%
>
> What does big/medium/little mean here? But indeed not nice overhead for
> disabled static key.

Big/Medium/Little refers to the CPU core sizes on my ARM64-based
Android phone.

> > When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n and memory allocation
> > profiling gets enabled, the difference in performance before and after
> > this change stays within noise levels.
> >
> > On x86 this patch does not make a noticeable difference because the overhead
> > with mem_alloc_profiling_key disabled is much lower (under 1%) to start
> > with, so any improvement is less visible and hard to distinguish from the
> > noise.
>
> That would be in line with my understanding above. Does the arm64 compiler
> not do it as well as x86 (could maybe be found out by disassembling), or does
> the Pixel6 CPU somehow cache these out-of-line blocks more aggressively so
> that only a function call stops it?
I'll disassemble the code and see what it looks like.

>
> > Signed-off-by: Suren Baghdasaryan
>
> Kinda sad that despite the static key we have to control a lot by the
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT in addition.

I agree. If there is a better way to fix this regression I'm open to
changes. Let's wait for Steven to confirm my understanding before
proceeding.
Thanks,
Suren.

>
> > ---
> >  include/linux/alloc_tag.h |  6 +++++
> >  mm/slub.c                 | 46 ++++++++++++++++++++++++---------------
> >  2 files changed, 34 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> > index a946e0203e6d..c5de2a0c1780 100644
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -116,6 +116,12 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> >  DECLARE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
> >                          mem_alloc_profiling_key);
> >
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> > +#define inline_if_mem_alloc_prof inline
> > +#else
> > +#define inline_if_mem_alloc_prof noinline
> > +#endif
> > +
> >  static inline bool mem_alloc_profiling_enabled(void)
> >  {
> >          return static_branch_maybe(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 996691c137eb..3107d43dfddc 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2000,7 +2000,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> >          return 0;
> >  }
> >
> > -static inline void free_slab_obj_exts(struct slab *slab)
> > +static inline_if_mem_alloc_prof void free_slab_obj_exts(struct slab *slab)
> >  {
> >          struct slabobj_ext *obj_exts;
> >
> > @@ -2077,33 +2077,35 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> >          return slab_obj_exts(slab) + obj_to_index(s, slab, p);
> >  }
> >
> > -static inline void
> > -alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> > +static inline_if_mem_alloc_prof void
> > +__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> >  {
> > -        if (need_slab_obj_ext()) {
> > -                struct slabobj_ext *obj_exts;
> > +        struct slabobj_ext *obj_exts;
> >
> > -                obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
> > -                /*
> > -                 * Currently obj_exts is used only for allocation profiling.
> > -                 * If other users appear then mem_alloc_profiling_enabled()
> > -                 * check should be added before alloc_tag_add().
> > -                 */
> > -                if (likely(obj_exts))
> > -                        alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> > -        }
> > +        obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
> > +        /*
> > +         * Currently obj_exts is used only for allocation profiling.
> > +         * If other users appear then mem_alloc_profiling_enabled()
> > +         * check should be added before alloc_tag_add().
> > +         */
> > +        if (likely(obj_exts))
> > +                alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> >  }
> >
> >  static inline void
> > -alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> > +alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> > +{
> > +        if (need_slab_obj_ext())
> > +                __alloc_tagging_slab_alloc_hook(s, object, flags);
> > +}
> > +
> > +static inline_if_mem_alloc_prof void
> > +__alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> >                               int objects)
> >  {
> >          struct slabobj_ext *obj_exts;
> >          int i;
> >
> > -        if (!mem_alloc_profiling_enabled())
> > -                return;
> > -
> >          /* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
> >          if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
> >                  return;
> > @@ -2119,6 +2121,14 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> >          }
> >  }
> >
> > +static inline void
> > +alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> > +                             int objects)
> > +{
> > +        if (mem_alloc_profiling_enabled())
> > +                __alloc_tagging_slab_free_hook(s, slab, p, objects);
> > +}
> > +
> >  #else /* CONFIG_MEM_ALLOC_PROFILING */
> >
> >  static inline void
>
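[Editorial note for readers following the thread: the pattern being discussed
can be sketched outside the kernel roughly as follows. This is a hypothetical
userspace analogue, not the kernel implementation: a plain bool and
__attribute__((noinline)) stand in for the static key and for
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT, and all names are made up.]

/* Userspace sketch of the inline_if_mem_alloc_prof idea: the rarely-taken
 * slow path is moved into a noinline function so its instructions stay out
 * of the caller's hot code, at the cost of an extra call when the feature
 * is enabled.
 */
#include <stdbool.h>
#include <stdio.h>

/* Set to 1 to mimic "enabled by default" (keep the slow path inline). */
#define PROFILING_ENABLED_BY_DEFAULT 0

#if PROFILING_ENABLED_BY_DEFAULT
#define inline_if_prof inline
#else
#define inline_if_prof __attribute__((noinline))
#endif

static bool profiling_enabled;   /* stand-in for the static key */

static inline_if_prof void profiling_slow_path(size_t size)
{
        /* bookkeeping that only matters when profiling is on */
        printf("accounting %zu bytes\n", size);
}

static inline void alloc_hook(size_t size)
{
        /* fast path: a single predictable branch when profiling is off */
        if (profiling_enabled)
                profiling_slow_path(size);
}

int main(void)
{
        alloc_hook(64);          /* no-op while profiling is disabled */
        profiling_enabled = true;
        alloc_hook(128);         /* now takes the out-of-line slow path */
        return 0;
}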