Date: Thu, 7 Dec 2023 11:32:12 +0900
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka
Cc: David Rientjes, Christoph Lameter, Pekka Enberg, Joonsoo Kim,
 Andrew Morton, Roman Gushchin, Andrey Ryabinin, Alexander Potapenko,
 Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Marco Elver,
 Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song, Kees Cook,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kasan-dev@googlegroups.com, cgroups@vger.kernel.org,
 linux-hardening@vger.kernel.org
Subject: Re: [PATCH v2 20/21] mm/slub: optimize alloc fastpath code layout
References: <20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz>
 <20231120-slab-remove-slab-v2-20-9c9c70177183@suse.cz>
In-Reply-To: <20231120-slab-remove-slab-v2-20-9c9c70177183@suse.cz>

On Mon, Nov 20, 2023 at 07:34:31PM +0100, Vlastimil Babka wrote:
> With allocation fastpaths no longer divided between two .c files, we
> have better inlining, however checking the disassembly of
> kmem_cache_alloc() reveals we can do better to make the fastpaths
> smaller and move the less common situations out of line or to separate
> functions, to reduce instruction cache pressure.
>
> - split memcg pre/post alloc hooks to inlined checks that use likely()
>   to assume there will be no objcg handling necessary, and non-inline
>   functions doing the actual handling
>
> - add some more likely/unlikely() to pre/post alloc hooks to indicate
>   which scenarios should be out of line
>
> - change gfp_allowed_mask handling in slab_post_alloc_hook() so the
>   code can be optimized away when kasan/kmsan/kmemleak is configured out
>
> bloat-o-meter shows:
> add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
> Function                                     old     new   delta
> __memcg_slab_post_alloc_hook                   -     461    +461
> kmem_cache_alloc_bulk                        775     791     +16
> __pfx_should_failslab.constprop                -      16     +16
> __pfx___memcg_slab_post_alloc_hook             -      16     +16
> should_failslab.constprop                      -      12     +12
> __pfx_memcg_slab_post_alloc_hook              16       -     -16
> kmem_cache_alloc_lru                        1295    1023    -272
> kmem_cache_alloc_node                       1118     817    -301
> kmem_cache_alloc                            1076     772    -304
> kmalloc_node_trace                          1149     838    -311
> kmalloc_trace                               1102     789    -313
> __kmalloc_node_track_caller                 1393    1080    -313
> __kmalloc_node                              1397    1082    -315
> __kmalloc                                   1374    1059    -315
> memcg_slab_post_alloc_hook                   464       -    -464
>
> Note that gcc still decided to inline __memcg_slab_pre_alloc_hook(), but the
> code is out of line. Forcing noinline did not improve the results. As a
> result the fastpaths are shorter and overall code size is reduced.
>
> Signed-off-by: Vlastimil Babka
> ---
>  mm/slub.c | 89 ++++++++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 54 insertions(+), 35 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 5683f1d02e4f..77d259f3d592 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1866,25 +1866,17 @@ static inline size_t obj_full_size(struct kmem_cache *s)
>  /*
>   * Returns false if the allocation should fail.
>   */
> -static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> -					     struct list_lru *lru,
> -					     struct obj_cgroup **objcgp,
> -					     size_t objects, gfp_t flags)
> +static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> +					struct list_lru *lru,
> +					struct obj_cgroup **objcgp,
> +					size_t objects, gfp_t flags)
>  {
> -	struct obj_cgroup *objcg;
> -
> -	if (!memcg_kmem_online())
> -		return true;
> -
> -	if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
> -		return true;
> -
>  	/*
>  	 * The obtained objcg pointer is safe to use within the current scope,
>  	 * defined by current task or set_active_memcg() pair.
>  	 * obj_cgroup_get() is used to get a permanent reference.
>  	 */
> -	objcg = current_obj_cgroup();
> +	struct obj_cgroup *objcg = current_obj_cgroup();
>  	if (!objcg)
>  		return true;
>
> @@ -1907,17 +1899,34 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
>  	return true;
>  }
>
> -static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> -					      struct obj_cgroup *objcg,
> -					      gfp_t flags, size_t size,
> -					      void **p)
> +/*
> + * Returns false if the allocation should fail.
> + */
> +static __fastpath_inline
> +bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> +			       struct obj_cgroup **objcgp, size_t objects,
> +			       gfp_t flags)
> +{
> +	if (!memcg_kmem_online())
> +		return true;
> +
> +	if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
> +		return true;
> +
> +	return likely(__memcg_slab_pre_alloc_hook(s, lru, objcgp, objects,
> +						  flags));
> +}
> +
> +static void __memcg_slab_post_alloc_hook(struct kmem_cache *s,
> +					 struct obj_cgroup *objcg,
> +					 gfp_t flags, size_t size,
> +					 void **p)
>  {
>  	struct slab *slab;
>  	unsigned long off;
>  	size_t i;
>
> -	if (!memcg_kmem_online() || !objcg)
> -		return;
> +	flags &= gfp_allowed_mask;
>
>  	for (i = 0; i < size; i++) {
>  		if (likely(p[i])) {
> @@ -1940,6 +1949,16 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
>  	}
>  }
>
> +static __fastpath_inline
> +void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> +				gfp_t flags, size_t size, void **p)
> +{
> +	if (likely(!memcg_kmem_online() || !objcg))
> +		return;
> +
> +	return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> +}
> +
>  static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>  					void **p, int objects)
>  {
> @@ -3709,34 +3728,34 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
>  }
>  ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
>
> -static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> -						     struct list_lru *lru,
> -						     struct obj_cgroup **objcgp,
> -						     size_t size, gfp_t flags)
> +static __fastpath_inline
> +struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> +				       struct list_lru *lru,
> +				       struct obj_cgroup **objcgp,
> +				       size_t size, gfp_t flags)
>  {
>  	flags &= gfp_allowed_mask;
>
>  	might_alloc(flags);
>
> -	if (should_failslab(s, flags))
> +	if (unlikely(should_failslab(s, flags)))
>  		return NULL;
>
> -	if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
> +	if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)))
>  		return NULL;
>
>  	return s;
>  }
>
> -static inline void slab_post_alloc_hook(struct kmem_cache *s,
> -					struct obj_cgroup *objcg, gfp_t flags,
> -					size_t size, void **p, bool init,
> -					unsigned int orig_size)
> +static __fastpath_inline
> +void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> +			  gfp_t flags, size_t size, void **p, bool init,
> +			  unsigned int orig_size)
>  {
>  	unsigned int zero_size = s->object_size;
>  	bool kasan_init = init;
>  	size_t i;
> -
> -	flags &= gfp_allowed_mask;
> +	gfp_t init_flags = flags & gfp_allowed_mask;
>
>  	/*
>  	 * For kmalloc object, the allocated memory size(object_size) is likely
> @@ -3769,13 +3788,13 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
>  	 * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
>  	 */
>  	for (i = 0; i < size; i++) {
> -		p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
> +		p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
>  		if (p[i] && init && (!kasan_init ||
>  				     !kasan_has_integrated_init()))
>  			memset(p[i], 0, zero_size);
>  		kmemleak_alloc_recursive(p[i], s->object_size, 1,
> -					 s->flags, flags);
> -		kmsan_slab_alloc(s, p[i], flags);
> +					 s->flags, init_flags);
> +		kmsan_slab_alloc(s, p[i], init_flags);
>  	}
>
>  	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> @@ -3799,7 +3818,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
>  	bool init = false;
>
>  	s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
> -	if (!s)
> +	if (unlikely(!s))
>  		return NULL;
>
>  	object = kfence_alloc(s, orig_size, gfpflags);
>
> --

Looks good to me,
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

> 2.42.1
>
>
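
For anyone reading along without the kernel tree at hand, the split described
in the changelog (a small inlined check that uses likely(), plus a non-inline
function doing the actual handling) can be sketched in plain userspace C. This
is only an illustration under stated assumptions, not the mm/slub.c code: the
demo_* names, the DEMO_ACCOUNT flag and the 16-object limit are all made up
here, and likely()/unlikely() are re-defined on top of GCC/Clang's
__builtin_expect() so the file builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's branch-hint macros (GCC/Clang). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag, loosely playing the role of __GFP_ACCOUNT. */
#define DEMO_ACCOUNT 0x1u

/* Counts slow-path entries so the split can be observed from outside. */
static unsigned long demo_slow_calls;

/*
 * Rare path: kept out of line so its body is not duplicated into every
 * caller of the inline wrapper. Returns false if the pretend charge fails.
 */
static __attribute__((noinline)) bool demo_pre_alloc_hook_slow(size_t objects)
{
	demo_slow_calls++;
	return objects <= 16;	/* arbitrary limit, for the demo only */
}

/*
 * Common path: a cheap inlined test. Only "accounted" allocations pay for
 * the call into the out-of-line helper.
 */
static inline bool demo_pre_alloc_hook(unsigned int flags, size_t objects)
{
	if (likely(!(flags & DEMO_ACCOUNT)))
		return true;

	return likely(demo_pre_alloc_hook_slow(objects));
}

/* Small usage example: exercises both paths and checks the call count. */
void demo_self_check(void)
{
	demo_slow_calls = 0;

	/* Unaccounted allocation: the helper is never entered. */
	assert(demo_pre_alloc_hook(0, 1000));
	assert(demo_slow_calls == 0);

	/* Accounted allocations: the helper decides. */
	assert(demo_pre_alloc_hook(DEMO_ACCOUNT, 4));
	assert(!demo_pre_alloc_hook(DEMO_ACCOUNT, 32));
	assert(demo_slow_calls == 2);
}
```

Calling demo_self_check() from a test harness exercises both branches. The
hints do not change what the code computes; they only tell the compiler which
branch to lay out as the straight-line path, which is the effect the
bloat-o-meter numbers in the changelog are measuring.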