From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 12 Sep 2024 08:58:48 -0700
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
To: Kees Cook
Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org, "GONG, Ruiqi", Jann Horn, Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
In-Reply-To: <202409111523.AEAEE48@keescook>
References: <20240809072532.work.266-kees@kernel.org> <20240809073309.2134488-5-kees@kernel.org> <202409111523.AEAEE48@keescook>
On Wed, Sep 11, 2024 at 3:30 PM Kees Cook wrote:
>
> On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> > On Fri, Aug 9, 2024 at 12:33 AM Kees Cook wrote:
> > >
> > > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > > allocated on demand to avoid wasting memory for unused caches.
> > >
> > > A few caches need to be allocated very early to support allocating the
> > > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > > GFP_ATOMIC allocations are currently left to be allocated from
> > > KMALLOC_NORMAL.
> > >
> > > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> > >
> > > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> > > text that compares the features.
> > >
> > > Improvements needed:
> > > - Retain call site gfp flags in alloc_tag meta field to:
> > >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> > >     be allocated on demand unless we want them to be GFP_ATOMIC
> > >     themselves...)
> >
> > I'm currently working on a feature to identify allocations with
> > __GFP_ACCOUNT known at compile time (similar to how you handle the
> > size in the previous patch). Might be something you can reuse/extend.
>
> Great, yes! I'd love to check it out.
>
> > > - Separate MEMCG allocations as well
> >
> > Do you mean allocations with __GFP_ACCOUNT or something else?
>
> I do, yes.
>
> > > +static void alloc_tag_site_init_early(struct codetag *ct)
> > > +{
> > > +	/* Explicitly initialize the caches needed to initialize caches. */
> > > +	if (strcmp(ct->function, "kstrdup") == 0 ||
> > > +	    strcmp(ct->function, "kvasprintf") == 0 ||
> > > +	    strcmp(ct->function, "pcpu_mem_zalloc") == 0)
> >
> > I hope we can find a better way to distinguish these allocations.
> > Maybe have a specialized hook for them, like alloc_hooks_early() which
> > sets a bit inside ct->flags to distinguish them?
>
> That might be possible. I'll see how that ends up looking. I don't want
> to even further fragment the alloc_hooks_... variants.
>
> > > +		alloc_tag_site_init(ct, false);
> > > +
> > > +	/* TODO: pre-allocate GFP_ATOMIC caches here. */
> >
> > You could pre-allocate GFP_ATOMIC caches during
> > alloc_tag_module_load() only if gfp_flags are known at compile time I
> > think.
> > I guess for the dynamic case choose_slab() will fall back to
> > kmalloc_slab()?
>
> Right, yes. I'd do it like the size checking: if we know at compile
> time, we can depend on it, otherwise it's a run-time fallback.
>
> > > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> > >
> > >  		if (WARN(counter.bytes,
> > >  			 "%s:%u module %s func:%s has %llu allocated at module unload",
> > > -			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > > +			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> > >  			module_unused = false;
> > > +		}
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +		else if (tag->meta.sized) {
> > > +			/* Remove the allocated caches, if possible. */
> > > +			void *p = READ_ONCE(tag->meta.cache);
> > > +
> > > +			WRITE_ONCE(tag->meta.cache, NULL);
> >
> > I'm guessing you are not using try_cmpxchg() the same way you did in
> > alloc_tag_site_init() because a race with any other user is impossible
> > at module unload time? If so, a comment mentioning that would be
> > good.
>
> Correct. It should not be possible. But yes, I will add a comment.
>
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 855c63c3270d..4f01cb6dd32e 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> > >  	default SLAB_FREELIST_HARDENED
> > >  	select SLAB_BUCKETS
> > >  	help
> > > -	  Track sizes of kmalloc() call sites.
> > > +	  As a defense against shared-cache "type confusion" use-after-free
> > > +	  attacks, every kmalloc()-family call allocates from a separate
> > > +	  kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > > +	  will no longer be able to groom malicious objects via similarly
> > > +	  sized allocations that share the same cache as the target object.
> > > +
> > > +	  This increases the "at rest" kmalloc slab memory usage by
> > > +	  roughly 5x (around 7MiB), and adds the potential for greater
> > > +	  long-term memory fragmentation. However, some workloads
> > > +	  actually see performance improvements when single allocation
> > > +	  sites are hot.
> >
> > I hope you provide the performance and overhead data in the cover
> > letter when you post v1.
>
> That's my plan. It's always odd choosing workloads, but we do seem to
> have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
> there anything in particular you'd want to see?

I have a stress test implemented as a loadable module to benchmark slab
and page allocation times (just a tight loop and timing it). I can clean
it up a bit and share with you.

> > > +static __always_inline
> > > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > > +			       unsigned long caller)
> > > +{
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +	struct alloc_tag *tag = current->alloc_tag;
> > > +
> > > +	if (!b && tag && tag->meta.sized &&
> > > +	    kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > > +	    (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> >
> > What if an allocation is GFP_ATOMIC but a previous allocation from the
> > same location (same tag) happened without GFP_ATOMIC and
> > tag->meta.cache was allocated. Why not use that existing cache?
> > Same if the tag->meta.cache was pre-allocated.
>
> Maybe I was being too conservative in my understanding -- I thought that
> I couldn't use those caches on the chance that they may already be full?
> Or is that always the risk, and GFP_ATOMIC deals with that? If it would
> be considered safe to attempt the allocation from the existing cache, then
> yeah, I can adjust this check.

Well, you fall back to kmalloc_slab() which also might be full. So, how
would using an existing cache be different?

> Thanks for looking these over!
>
> -Kees
>
> --
> Kees Cook