From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 12 Sep 2024 08:58:48 -0700
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches
To: Kees Cook
Cc: Vlastimil Babka, Kent Overstreet, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org, "GONG, Ruiqi", Jann Horn, Matteo Rizzo, jvoisin, Xiu Jianfeng, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
In-Reply-To: <202409111523.AEAEE48@keescook>
References: <20240809072532.work.266-kees@kernel.org> <20240809073309.2134488-5-kees@kernel.org> <202409111523.AEAEE48@keescook>
On Wed, Sep 11, 2024 at 3:30 PM Kees Cook wrote:
>
> On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> > On Fri, Aug 9, 2024 at 12:33 AM Kees Cook wrote:
> > >
> > > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > > allocated on demand to avoid wasting memory for unused caches.
> > >
> > > A few caches need to be allocated very early to support allocating the
> > > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > > GFP_ATOMIC allocations are currently left to be allocated from
> > > KMALLOC_NORMAL.
> > >
> > > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> > >
> > > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> > > text that compares the features.
> > >
> > > Improvements needed:
> > > - Retain call site gfp flags in alloc_tag meta field to:
> > >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> > >     be allocated on demand unless we want them to be GFP_ATOMIC
> > >     themselves...)
> >
> > I'm currently working on a feature to identify allocations with
> > __GFP_ACCOUNT known at compile time (similar to how you handle the
> > size in the previous patch). Might be something you can reuse/extend.
>
> Great, yes! I'd love to check it out.
>
> > > - Separate MEMCG allocations as well
> >
> > Do you mean allocations with __GFP_ACCOUNT or something else?
>
> I do, yes.
>
> > > +static void alloc_tag_site_init_early(struct codetag *ct)
> > > +{
> > > +	/* Explicitly initialize the caches needed to initialize caches. */
> > > +	if (strcmp(ct->function, "kstrdup") == 0 ||
> > > +	    strcmp(ct->function, "kvasprintf") == 0 ||
> > > +	    strcmp(ct->function, "pcpu_mem_zalloc") == 0)
> >
> > I hope we can find a better way to distinguish these allocations.
> > Maybe have a specialized hook for them, like alloc_hooks_early() which
> > sets a bit inside ct->flags to distinguish them?
>
> That might be possible. I'll see how that ends up looking. I don't want
> to even further fragment the alloc_hooks_... variants.
>
> > > +		alloc_tag_site_init(ct, false);
> > > +
> > > +	/* TODO: pre-allocate GFP_ATOMIC caches here. */
> >
> > You could pre-allocate GFP_ATOMIC caches during
> > alloc_tag_module_load() only if gfp_flags are known at compile time I
> > think.
> > I guess for the dynamic case choose_slab() will fall back to
> > kmalloc_slab()?
>
> Right, yes. I'd do it like the size checking: if we know at compile
> time, we can depend on it, otherwise it's a run-time fallback.
>
> > > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> > >
> > >  		if (WARN(counter.bytes,
> > >  			 "%s:%u module %s func:%s has %llu allocated at module unload",
> > > -			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > > +			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> > >  			module_unused = false;
> > > +		}
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +		else if (tag->meta.sized) {
> > > +			/* Remove the allocated caches, if possible. */
> > > +			void *p = READ_ONCE(tag->meta.cache);
> > > +
> > > +			WRITE_ONCE(tag->meta.cache, NULL);
> >
> > I'm guessing you are not using try_cmpxchg() the same way you did in
> > alloc_tag_site_init() because a race with any other user is impossible
> > at module unload time? If so, a comment mentioning that would be
> > good.
>
> Correct. It should not be possible. But yes, I will add a comment.
>
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 855c63c3270d..4f01cb6dd32e 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> > >  	default SLAB_FREELIST_HARDENED
> > >  	select SLAB_BUCKETS
> > >  	help
> > > -	  Track sizes of kmalloc() call sites.
> > > +	  As a defense against shared-cache "type confusion" use-after-free
> > > +	  attacks, every kmalloc()-family call allocates from a separate
> > > +	  kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > > +	  will no longer be able to groom malicious objects via similarly
> > > +	  sized allocations that share the same cache as the target object.
> > > +
> > > +	  This increases the "at rest" kmalloc slab memory usage by
> > > +	  roughly 5x (around 7MiB), and adds the potential for greater
> > > +	  long-term memory fragmentation. However, some workloads
> > > +	  actually see performance improvements when single allocation
> > > +	  sites are hot.
> >
> > I hope you provide the performance and overhead data in the cover
> > letter when you post v1.
>
> That's my plan. It's always odd choosing workloads, but we do seem to
> have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
> there anything in particular you'd want to see?

I have a stress test implemented as a loadable module to benchmark slab
and page allocation times (just a tight loop and timing it). I can clean
it up a bit and share with you.

> > > +static __always_inline
> > > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > > +			       unsigned long caller)
> > > +{
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +	struct alloc_tag *tag = current->alloc_tag;
> > > +
> > > +	if (!b && tag && tag->meta.sized &&
> > > +	    kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > > +	    (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> >
> > What if an allocation is GFP_ATOMIC but a previous allocation from the
> > same location (same tag) happened without GFP_ATOMIC and
> > tag->meta.cache was allocated. Why not use that existing cache?
> > Same if the tag->meta.cache was pre-allocated.
>
> Maybe I was being too conservative in my understanding -- I thought that
> I couldn't use those caches on the chance that they may already be full?
> Or is that always the risk, and GFP_ATOMIC deals with that? If it would
> be considered safe to attempt the allocation from the existing cache, then
> yeah, I can adjust this check.

Well, you fall back to kmalloc_slab() which also might be full. So, how
would using an existing cache be different?

> Thanks for looking these over!
>
> -Kees
>
> --
> Kees Cook