From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 29 Aug 2024 10:03:56 -0700
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
To: Kees Cook
Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
    David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org,
    "GONG, Ruiqi", Jann Horn, Matteo Rizzo, jvoisin, Xiu Jianfeng,
    linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
In-Reply-To: <20240809073309.2134488-5-kees@kernel.org>
References: <20240809072532.work.266-kees@kernel.org> <20240809073309.2134488-5-kees@kernel.org>

On Fri, Aug 9, 2024 at 12:33 AM Kees Cook wrote:
>
> Use separate per-call-site kmem_cache or kmem_buckets. These are
> allocated on demand to avoid wasting memory for unused caches.
>
> A few caches need to be allocated very early to support allocating the
> caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> GFP_ATOMIC allocations are currently left to be allocated from
> KMALLOC_NORMAL.
>
> With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
>
> Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> text that compares the features.
>
> Improvements needed:
> - Retain call site gfp flags in alloc_tag meta field to:
>   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
>     be allocated on demand unless we want them to be GFP_ATOMIC
>     themselves...)

I'm currently working on a feature to identify allocations with
__GFP_ACCOUNT known at compile time (similar to how you handle the
size in the previous patch). Might be something you can reuse/extend.
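Roughly what I have in mind, mirroring how ALLOC_META_INIT() already
captures the size (untested sketch; the extra _gfp parameter and the
.accounted field are invented for illustration, not part of this
series):

  #define ALLOC_META_INIT(_size, _gfp)  {                                \
          .sized = (__builtin_constant_p(_size) ? (_size) : SIZE_MAX),   \
          .accounted = (__builtin_constant_p(_gfp) &&                    \
                        ((_gfp) & __GFP_ACCOUNT)),                       \
  }

That would record at tag-init time which sites are always accounted,
the same way .sized already records which sites are fixed-size.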
> - Separate MEMCG allocations as well

Do you mean allocations with __GFP_ACCOUNT or something else?

> - Allocate individual caches within kmem_buckets on demand to
>   further reduce memory usage overhead.
>
> Signed-off-by: Kees Cook
> ---
> Cc: Suren Baghdasaryan
> Cc: Kent Overstreet
> Cc: Vlastimil Babka
> Cc: Christoph Lameter
> Cc: Pekka Enberg
> Cc: David Rientjes
> Cc: Joonsoo Kim
> Cc: Andrew Morton
> Cc: Roman Gushchin
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> ---
>  include/linux/alloc_tag.h |   8 +++
>  lib/alloc_tag.c           | 121 +++++++++++++++++++++++++++++++++++---
>  mm/Kconfig                |  19 +++++-
>  mm/slab_common.c          |   1 +
>  mm/slub.c                 |  31 +++++++++-
>  5 files changed, 170 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index f5d8c5849b82..c95628f9b049 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -24,6 +24,7 @@ struct alloc_tag_counters {
>  struct alloc_meta {
>         /* 0 means non-slab, SIZE_MAX means dynamic, and everything else is fixed-size. */
>         size_t sized;
> +       void *cache;

I see now where that meta.cache in the previous patch came from...
That part should be moved here.

>  };
>  #define ALLOC_META_INIT(_size)  {                                      \
>                 .sized = (__builtin_constant_p(_size) ? (_size) : SIZE_MAX), \
> @@ -216,6 +217,13 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
>
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>
> +#ifdef CONFIG_SLAB_PER_SITE
> +void alloc_tag_early_walk(void);
> +void alloc_tag_site_init(struct codetag *ct, bool ondemand);
> +#else
> +static inline void alloc_tag_early_walk(void) {}
> +#endif
> +
>  #define alloc_hooks_tag(_tag, _do_alloc)                               \
>  ({                                                                     \
>         struct alloc_tag * __maybe_unused _old = alloc_tag_save(_tag);  \
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 6d2cb72bf269..e8a66a7c4a6b 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -157,6 +157,89 @@ static void __init procfs_init(void)
>         proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
>  }
>
> +#ifdef CONFIG_SLAB_PER_SITE
> +static bool ondemand_ready;
> +
> +void alloc_tag_site_init(struct codetag *ct, bool ondemand)
> +{
> +       struct alloc_tag *tag = ct_to_alloc_tag(ct);
> +       char *name;
> +       void *p, *old;
> +
> +       /* Only handle kmalloc allocations. */
> +       if (!tag->meta.sized)
> +               return;
> +
> +       /* Must be ready for on-demand allocations. */
> +       if (ondemand && !ondemand_ready)
> +               return;
> +
> +       old = READ_ONCE(tag->meta.cache);
> +       /* Already allocated? */
> +       if (old)
> +               return;
> +
> +       if (tag->meta.sized < SIZE_MAX) {
> +               /* Fixed-size allocations. */
> +               name = kasprintf(GFP_KERNEL, "f:%zu:%s:%d", tag->meta.sized, ct->function, ct->lineno);
> +               if (WARN_ON_ONCE(!name))
> +                       return;
> +               /*
> +                * As with KMALLOC_NORMAL, the entire allocation needs to be
> +                * open to usercopy access. :(
> +                */
> +               p = kmem_cache_create_usercopy(name, tag->meta.sized, 0,
> +                                              SLAB_NO_MERGE, 0, tag->meta.sized,
> +                                              NULL);
> +       } else {
> +               /* Dynamically-sized allocations. */
> +               name = kasprintf(GFP_KERNEL, "d:%s:%d", ct->function, ct->lineno);
> +               if (WARN_ON_ONCE(!name))
> +                       return;
> +               p = kmem_buckets_create(name, SLAB_NO_MERGE, 0, UINT_MAX, NULL);
> +       }
> +       if (p) {
> +               if (unlikely(!try_cmpxchg(&tag->meta.cache, &old, p))) {
> +                       /* We lost the allocation race; clean up. */
> +                       if (tag->meta.sized < SIZE_MAX)
> +                               kmem_cache_destroy(p);
> +                       else
> +                               kmem_buckets_destroy(p);
> +               }
> +       }
> +       kfree(name);
> +}
> +
> +static void alloc_tag_site_init_early(struct codetag *ct)
> +{
> +       /* Explicitly initialize the caches needed to initialize caches. */
> +       if (strcmp(ct->function, "kstrdup") == 0 ||
> +           strcmp(ct->function, "kvasprintf") == 0 ||
> +           strcmp(ct->function, "pcpu_mem_zalloc") == 0)

I hope we can find a better way to distinguish these allocations.
Maybe have a specialized hook for them, like alloc_hooks_early(),
which sets a bit inside ct->flags to distinguish them?
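Something along these lines is what I mean (untested sketch;
CODETAG_FLAG_EARLY_SLAB and alloc_hooks_early() are invented names):

  /* include/linux/codetag.h: reserve a bit for "needed before slab is up". */
  #define CODETAG_FLAG_EARLY_SLAB        BIT(0)

  /* lib/alloc_tag.c: match on the flag instead of on function names. */
  static void alloc_tag_site_init_early(struct codetag *ct)
  {
          if (ct->flags & CODETAG_FLAG_EARLY_SLAB)
                  alloc_tag_site_init(ct, false);
  }

The few early call sites (kstrdup(), kvasprintf(), pcpu_mem_zalloc())
would then go through an alloc_hooks_early() wrapper that sets this
bit in the tag's static initializer, instead of being listed by name
here.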
> +               alloc_tag_site_init(ct, false);
> +
> +       /* TODO: pre-allocate GFP_ATOMIC caches here. */

You could pre-allocate GFP_ATOMIC caches during
alloc_tag_module_load(), but only when the gfp flags are known at
compile time, I think. I guess for the dynamic case choose_slab()
will fall back to kmalloc_slab()?

> +}
> +#endif
> +
> +static void alloc_tag_module_load(struct codetag_type *cttype,
> +                                 struct codetag_module *cmod)
> +{
> +#ifdef CONFIG_SLAB_PER_SITE
> +       struct codetag_iterator iter;
> +       struct codetag *ct;
> +
> +       iter = codetag_get_ct_iter(cttype);
> +       for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
> +               if (iter.cmod != cmod)
> +                       continue;
> +
> +               /* TODO: pre-allocate GFP_ATOMIC caches here. */
> +               //alloc_tag_site_init(ct, false);
> +       }
> +#endif
> +}
> +
>  static bool alloc_tag_module_unload(struct codetag_type *cttype,
>                                     struct codetag_module *cmod)
>  {
> @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
>
>                 if (WARN(counter.bytes,
>                          "%s:%u module %s func:%s has %llu allocated at module unload",
> -                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> +                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
>                         module_unused = false;
> +               }
> +#ifdef CONFIG_SLAB_PER_SITE
> +               else if (tag->meta.sized) {
> +                       /* Remove the allocated caches, if possible. */
> +                       void *p = READ_ONCE(tag->meta.cache);
> +
> +                       WRITE_ONCE(tag->meta.cache, NULL);

I'm guessing you are not using try_cmpxchg() the same way you did in
alloc_tag_site_init() because a race with any other user is
impossible at module unload time? If so, a comment mentioning that
would be good.
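Something as short as this above the WRITE_ONCE() would do (wording is
only a suggestion):

  /*
   * No try_cmpxchg() needed: nothing else can race with us on this
   * tag at module unload time.
   */
  WRITE_ONCE(tag->meta.cache, NULL);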
> +                       if (tag->meta.sized < SIZE_MAX)
> +                               kmem_cache_destroy(p);
> +                       else
> +                               kmem_buckets_destroy(p);
> +               }
> +#endif
>         }
>
>         return module_unused;
> @@ -260,15 +356,16 @@ static void __init sysctl_init(void)
>  static inline void sysctl_init(void) {}
>  #endif /* CONFIG_SYSCTL */
>
> +static const struct codetag_type_desc alloc_tag_desc = {
> +       .section        = "alloc_tags",
> +       .tag_size       = sizeof(struct alloc_tag),
> +       .module_load    = alloc_tag_module_load,
> +       .module_unload  = alloc_tag_module_unload,
> +};
> +
>  static int __init alloc_tag_init(void)
>  {
> -       const struct codetag_type_desc desc = {
> -               .section        = "alloc_tags",
> -               .tag_size       = sizeof(struct alloc_tag),
> -               .module_unload  = alloc_tag_module_unload,
> -       };
> -
> -       alloc_tag_cttype = codetag_register_type(&desc);
> +       alloc_tag_cttype = codetag_register_type(&alloc_tag_desc);
>         if (IS_ERR(alloc_tag_cttype))
>                 return PTR_ERR(alloc_tag_cttype);
>
> @@ -278,3 +375,11 @@ static int __init alloc_tag_init(void)
>         return 0;
>  }
>  module_init(alloc_tag_init);
> +
> +#ifdef CONFIG_SLAB_PER_SITE
> +void alloc_tag_early_walk(void)
> +{
> +       codetag_early_walk(&alloc_tag_desc, alloc_tag_site_init_early);
> +       ondemand_ready = true;
> +}
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 855c63c3270d..4f01cb6dd32e 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -302,7 +302,20 @@ config SLAB_PER_SITE
>         default SLAB_FREELIST_HARDENED
>         select SLAB_BUCKETS
>         help
> -         Track sizes of kmalloc() call sites.
> +         As a defense against shared-cache "type confusion" use-after-free
> +         attacks, every kmalloc()-family call allocates from a separate
> +         kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> +         will no longer be able to groom malicious objects via similarly
> +         sized allocations that share the same cache as the target object.
> +
> +         This increases the "at rest" kmalloc slab memory usage by
> +         roughly 5x (around 7MiB), and adds the potential for greater
> +         long-term memory fragmentation. However, some workloads
> +         actually see performance improvements when single allocation
> +         sites are hot.

I hope you provide the performance and overhead data in the cover
letter when you post v1.

> +
> +         For a similar defense, see CONFIG_RANDOM_KMALLOC_CACHES, which
> +         has less memory usage overhead, but is probabilistic.
>
>  config SLUB_STATS
>         default n
> @@ -331,6 +344,7 @@ config SLUB_CPU_PARTIAL
>  config RANDOM_KMALLOC_CACHES
>         default n
>         depends on !SLUB_TINY
> +       depends on !SLAB_PER_SITE
>         bool "Randomize slab caches for normal kmalloc"
>         help
>           A hardening feature that creates multiple copies of slab caches for
> @@ -345,6 +359,9 @@ config RANDOM_KMALLOC_CACHES
>           limited degree of memory and CPU overhead that relates to hardware and
>           system workload.
>
> +         For a similar defense, see CONFIG_SLAB_PER_SITE, which is
> +         deterministic, but has greater memory usage overhead.
> +
>  endmenu # Slab allocator options
>
>  config SHUFFLE_PAGE_ALLOCATOR
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index fc698cba0ebe..09506bfa972c 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1040,6 +1040,7 @@ void __init create_kmalloc_caches(void)
>         kmem_buckets_cache = kmem_cache_create("kmalloc_buckets",
>                                                sizeof(kmem_buckets),
>                                                0, SLAB_NO_MERGE, NULL);
> +       alloc_tag_early_walk();
>  }
>
>  /**
> diff --git a/mm/slub.c b/mm/slub.c
> index 3520acaf9afa..d14102c4b4d7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4135,6 +4135,35 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
>  }
>  EXPORT_SYMBOL(__kmalloc_large_node_noprof);
>
> +static __always_inline
> +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> +                              unsigned long caller)
> +{
> +#ifdef CONFIG_SLAB_PER_SITE
> +       struct alloc_tag *tag = current->alloc_tag;
> +
> +       if (!b && tag && tag->meta.sized &&
> +           kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> +           (flags & GFP_ATOMIC) != GFP_ATOMIC) {

What if the allocation is GFP_ATOMIC, but a previous allocation from
the same location (same tag) happened without GFP_ATOMIC and
tag->meta.cache was already allocated? Why not use that existing
cache? Same if tag->meta.cache was pre-allocated. (A rough sketch of
what I mean is at the end of this mail.)

> +               void *p = READ_ONCE(tag->meta.cache);
> +
> +               if (!p && slab_state >= UP) {
> +                       alloc_tag_site_init(&tag->ct, true);
> +                       p = READ_ONCE(tag->meta.cache);
> +               }
> +
> +               if (tag->meta.sized < SIZE_MAX) {
> +                       if (p)
> +                               return p;
> +                       /* Otherwise continue with default buckets. */
> +               } else {
> +                       b = p;
> +               }
> +       }
> +#endif
> +       return kmalloc_slab(size, b, flags, caller);
> +}
> +
>  static __always_inline
>  void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>                         unsigned long caller)
> @@ -4152,7 +4181,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>         if (unlikely(!size))
>                 return ZERO_SIZE_PTR;
>
> -       s = kmalloc_slab(size, b, flags, caller);
> +       s = choose_slab(size, b, flags, caller);
>
>         ret = slab_alloc_node(s, NULL, flags, node, caller, size);
>         ret = kasan_kmalloc(s, ret, size, flags);
> --
> 2.34.1
>
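On the GFP_ATOMIC check in choose_slab() above: what I have in mind is
roughly the following (untested sketch). Only the on-demand cache
creation needs to avoid atomic context; an already existing or
pre-allocated cache could be used for atomic allocations too:

  static __always_inline
  struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
                                 unsigned long caller)
  {
  #ifdef CONFIG_SLAB_PER_SITE
          struct alloc_tag *tag = current->alloc_tag;

          if (!b && tag && tag->meta.sized &&
              kmalloc_type(flags, caller) == KMALLOC_NORMAL) {
                  void *p = READ_ONCE(tag->meta.cache);

                  /*
                   * Creating the cache can sleep, so only do the
                   * on-demand creation for non-atomic allocations;
                   * a cache that already exists can be used either way.
                   */
                  if (!p && slab_state >= UP &&
                      (flags & GFP_ATOMIC) != GFP_ATOMIC) {
                          alloc_tag_site_init(&tag->ct, true);
                          p = READ_ONCE(tag->meta.cache);
                  }

                  if (tag->meta.sized < SIZE_MAX) {
                          if (p)
                                  return p;
                          /* Otherwise continue with default buckets. */
                  } else {
                          b = p;
                  }
          }
  #endif
          return kmalloc_slab(size, b, flags, caller);
  }

That keeps the "no cache creation in atomic context" property of your
version, but a GFP_ATOMIC kmalloc() from a site whose cache was already
created (or pre-allocated) would still get the per-site cache.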