Date: Wed, 11 Sep 2024 15:30:40 -0700
From: Kees Cook <kees@kernel.org>
To: Suren Baghdasaryan
Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org,
	"GONG, Ruiqi", Jann Horn, Matteo Rizzo, jvoisin, Xiu Jianfeng,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
Message-ID: <202409111523.AEAEE48@keescook>
References: <20240809072532.work.266-kees@kernel.org>
 <20240809073309.2134488-5-kees@kernel.org>
In-Reply-To: 
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> On Fri, Aug 9, 2024 at 12:33 AM Kees Cook wrote:
> >
> > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > allocated on demand to avoid wasting memory for unused caches.
> >
> > A few caches need to be allocated very early to support allocating the
> > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > GFP_ATOMIC allocations are currently left to be allocated from
> > KMALLOC_NORMAL.
> >
> > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> >
> > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig
> > help text that compares the features.
> >
> > Improvements needed:
> > - Retain call site gfp flags in alloc_tag meta field to:
> >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> >     be allocated on demand unless we want them to be GFP_ATOMIC
> >     themselves...)
> 
> I'm currently working on a feature to identify allocations with
> __GFP_ACCOUNT known at compile time (similar to how you handle the
> size in the previous patch). Might be something you can reuse/extend.

Great, yes! I'd love to check it out.

> > - Separate MEMCG allocations as well
> 
> Do you mean allocations with __GFP_ACCOUNT or something else?

I do, yes.

> > +static void alloc_tag_site_init_early(struct codetag *ct)
> > +{
> > +	/* Explicitly initialize the caches needed to initialize caches. */
> > +	if (strcmp(ct->function, "kstrdup") == 0 ||
> > +	    strcmp(ct->function, "kvasprintf") == 0 ||
> > +	    strcmp(ct->function, "pcpu_mem_zalloc") == 0)
> 
> I hope we can find a better way to distinguish these allocations.
> Maybe have a specialized hook for them, like alloc_hooks_early() which
> sets a bit inside ct->flags to distinguish them?
That might be possible. I'll see how that ends up looking. I don't want
to fragment the alloc_hooks_... variants even further.

> > +		alloc_tag_site_init(ct, false);
> > +
> > +	/* TODO: pre-allocate GFP_ATOMIC caches here. */
> 
> You could pre-allocate GFP_ATOMIC caches during
> alloc_tag_module_load() only if gfp_flags are known at compile time, I
> think. I guess for the dynamic case choose_slab() will fall back to
> kmalloc_slab()?

Right, yes. I'd do it like the size checking: if we know at compile
time, we can depend on it; otherwise it's a run-time fallback.

> > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> >
> >  		if (WARN(counter.bytes,
> >  			 "%s:%u module %s func:%s has %llu allocated at module unload",
> > -			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > +			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> >  			module_unused = false;
> > +		}
> > +#ifdef CONFIG_SLAB_PER_SITE
> > +		else if (tag->meta.sized) {
> > +			/* Remove the allocated caches, if possible. */
> > +			void *p = READ_ONCE(tag->meta.cache);
> > +
> > +			WRITE_ONCE(tag->meta.cache, NULL);
> 
> I'm guessing you are not using try_cmpxchg() the same way you did in
> alloc_tag_site_init() because a race with any other user is impossible
> at module unload time? If so, a comment mentioning that would be good.

Correct. It should not be possible. But yes, I will add a comment.

> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 855c63c3270d..4f01cb6dd32e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> >  	default SLAB_FREELIST_HARDENED
> >  	select SLAB_BUCKETS
> >  	help
> > -	  Track sizes of kmalloc() call sites.
> > +	  As a defense against shared-cache "type confusion" use-after-free
> > +	  attacks, every kmalloc()-family call allocates from a separate
> > +	  kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > +	  will no longer be able to groom malicious objects via similarly
> > +	  sized allocations that share the same cache as the target object.
> > +
> > +	  This increases the "at rest" kmalloc slab memory usage by
> > +	  roughly 5x (around 7MiB), and adds the potential for greater
> > +	  long-term memory fragmentation. However, some workloads
> > +	  actually see performance improvements when single allocation
> > +	  sites are hot.
> 
> I hope you provide the performance and overhead data in the cover
> letter when you post v1.

That's my plan. It's always odd choosing workloads, but we do seem to
have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
there anything in particular you'd want to see?

> > +static __always_inline
> > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > +			       unsigned long caller)
> > +{
> > +#ifdef CONFIG_SLAB_PER_SITE
> > +	struct alloc_tag *tag = current->alloc_tag;
> > +
> > +	if (!b && tag && tag->meta.sized &&
> > +	    kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > +	    (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> 
> What if the allocation is GFP_ATOMIC but a previous allocation from the
> same location (same tag) happened without GFP_ATOMIC and
> tag->meta.cache was allocated? Why not use that existing cache?
> Same if tag->meta.cache was pre-allocated.

Maybe I was being too conservative in my understanding -- I thought
that I couldn't use those caches on the chance that they might already
be full? Or is that always the risk, and GFP_ATOMIC deals with that?
If it would be considered safe to attempt the allocation from the
existing cache, then yeah, I can adjust this check.

Thanks for looking these over!

-Kees

-- 
Kees Cook