Date: Mon, 11 Sep 2023 15:12:13 -0700
From: Kees Cook
To: jvoisin
Cc: gongruiqi@huaweicloud.com, 42.hyeyoo@gmail.com, akpm@linux-foundation.org,
	aleksander.lobakin@intel.com, cl@linux.com, dennis@kernel.org,
	dvyukov@google.com, elver@google.com, glider@google.com,
	gongruiqi1@huawei.com, iamjoonsoo.kim@lge.com, jannh@google.com,
	jmorris@namei.org, linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, paul@paul-moore.com,
	pedro.falcato@gmail.com, penberg@kernel.org, rientjes@google.com,
	roman.gushchin@linux.dev, serge@hallyn.com, tj@kernel.org,
	vbabka@suse.cz, wangweiyang2@huawei.com, xiujianfeng@huawei.com,
	laurentsimon@google.com
Subject: Re: [PATCH v5] Randomized slab caches for kmalloc()
Message-ID: <202309111428.6F36672F57@keescook>
References: <20230714064422.3305234-1-gongruiqi@huaweicloud.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:

On Mon, Sep 11, 2023 at 11:18:15PM +0200, jvoisin wrote:
> I wrote a small blogpost[1] about this series, and was told[2] that it
> would be interesting to share it on this thread, so here it is, copied
> verbatim:

Thanks for posting!

> Ruiqi Gong and Xiu Jianfeng got their
> [Randomized slab caches for kmalloc()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe)
> patch series merged upstream, and I've had enough discussions about it
> to warrant summarising them into a small blogpost.
>
> The main idea is to have multiple slab caches, and pick one at random
> based on the address of the code calling `kmalloc()` and a per-boot
> seed, to make heap-spraying harder.
> It's a great idea, but comes with some shortcomings for now:
>
> - Objects being allocated via wrappers around `kmalloc()`, like
>   `sock_kmalloc`, `f2fs_kmalloc`, `aligned_kmalloc`, … will end up in
>   the same slab cache.

I'd love to see some way to "unwrap" these kinds of allocators. Right
now we try to manually mark them so the debugging options can figure
out what did the allocation, but it's not complete by any means.

I'd kind of like to see a common front end that specified some set of
"do stuff" routines. e.g. to replace devm_kmalloc(), we could have:

void *alloc(size_t usable, gfp_t flags,
	    ssize_t (*prepare)(size_t usable, gfp_t *flags, void *ctx),
	    void *(*finish)(size_t usable, gfp_t flags, void *ctx,
			    void *allocated),
	    void *ctx);

/* Work out the full allocation size and adjust the gfp flags. */
ssize_t devm_prep(size_t usable, gfp_t *flags, void *ctx)
{
	ssize_t tot_size;

	if (unlikely(check_add_overflow(sizeof(struct devres), usable,
					&tot_size)))
		return -ENOMEM;
	tot_size = kmalloc_size_roundup(tot_size);
	*flags |= __GFP_ZERO;

	return tot_size;
}

/* Set up the devres node and hand back the usable region. */
void *devm_finish(size_t usable, gfp_t flags, void *ctx, void *allocated)
{
	struct devres *dr = allocated;
	struct device *dev = ctx;

	INIT_LIST_HEAD(&dr->node.entry);
	dr->node.release = devm_kmalloc_release;
	set_node_dbginfo(&dr->node, "devm_kzalloc_release", usable);
	devres_add(dev, dr->data);
	return dr->data;
}

#define devm_kmalloc(dev, size, gfp) \
	alloc(size, gfp, devm_prep, devm_finish, dev)

And now there's no wrapper any more, just a routine to get the actual
size, and a routine to set up the memory and return the "usable"
pointer.
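
For the simpler wrappers the prepare/finish pair gets pretty small. As
a rough sketch on top of the hypothetical alloc() above (the foo_*
names here are made up for illustration, not an existing kernel API), a
wrapper that only reserves a small header in front of the caller's
buffer would collapse to:

struct foo_hdr {
	u32 magic;	/* hypothetical per-allocation header */
};

/* Grow the request to make room for the header; flags are untouched. */
static ssize_t foo_prep(size_t usable, gfp_t *flags, void *ctx)
{
	ssize_t tot_size;

	if (unlikely(check_add_overflow(sizeof(struct foo_hdr), usable,
					&tot_size)))
		return -ENOMEM;
	return tot_size;
}

/* Fill in the header and return the usable region that follows it. */
static void *foo_finish(size_t usable, gfp_t flags, void *ctx,
			void *allocated)
{
	struct foo_hdr *hdr = allocated;

	hdr->magic = 0xf00;
	return hdr + 1;
}

#define foo_kmalloc(size, gfp) \
	alloc(size, gfp, foo_prep, foo_finish, NULL)

Since foo_kmalloc() is now just a macro around the common alloc(), the
caller's code address (rather than the wrapper's) would be what the
randomized cache selection keys off.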

> - The slabs need to be pinned, otherwise an attacker could
>   [feng-shui](https://en.wikipedia.org/wiki/Heap_feng_shui) their way
>   into having the whole slab freed, garbage-collected, and have a slab
>   for another type allocated at the same VA.
>   [Jann Horn](https://thejh.net/) and
>   [Matteo Rizzo](https://infosec.exchange/@nspace) have a
>   [nice set of patches](https://github.com/torvalds/linux/compare/master...thejh:linux:slub-virtual-upstream),
>   discussed a bit in [this Project Zero blogpost](https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html),
>   for a feature called [`SLAB_VIRTUAL`](https://github.com/torvalds/linux/commit/f3afd3a2152353be355b90f5fd4367adbf6a955e),
>   implementing precisely this.

I'm hoping this will get posted to LKML soon.

> - There are 16 slabs by default, so one chance out of 16 to end up in
>   the same slab cache as the target.

Future work can make this more deterministic.

> - There are no guard pages between caches, so inter-cache overflows
>   are possible.

This may be addressed by SLAB_VIRTUAL.

> - As pointed out by
>   [andreyknvl](https://twitter.com/andreyknvl/status/1700267669336080678)
>   and [minipli](https://infosec.exchange/@minipli/111045336853055793),
>   fewer allocations hitting a given cache means less noise, so it
>   might even help with some heap feng-shui.

That may be true, but I suspect it'll be mitigated by the overall
reduction in shared caches.

> - minipli also pointed out that "randomized caches still freely mix
>   kernel allocations with user controlled ones (`xattr`, `keyctl`,
>   `msg_msg`, …). So even though merging is disabled for these caches,
>   i.e. no direct overlap with `cred_jar` etc., other object types can
>   still be targeted (`struct pipe_buffer`, BPF maps, its verifier
>   state objects, …). It's just a matter of probing which allocation
>   index the targeted object falls into.", but I considered this out of
>   scope, since it's much more involved; albeit something like
>   [`CONFIG_KMALLOC_SPLIT_VARSIZE`](https://github.com/thejh/linux/blob/slub-virtual/MITIGATION_README)
>   wouldn't significantly increase complexity.

Now that we have a mechanism to easily deal with "many kmalloc
buckets", I think we can easily start carving out specific
variable-sized caches (like msg_msg). Basically doing a manual
type-based separation.

So, yeah, we're in a better place than we were before, and better
positioned to continue to make improvements here. I think an easy win
would be doing this last one: separate out the user-controlled
variable-sized caches and give them their own distinct buckets outside
of the 16 random ones. Can you give that a try and send patches?

-Kees

-- 
Kees Cook