From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 3 May 2024 12:06:38 -0700
From: Kees Cook
To: jvoisin
Cc: Matteo Rizzo, Vlastimil Babka, Andrew Morton, Christoph Lameter,
 Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, "GONG, Ruiqi", Xiu Jianfeng,
 Suren Baghdasaryan, Kent Overstreet, Jann Horn, Thomas Graf, Herbert Xu,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-hardening@vger.kernel.org
Subject: Re: [PATCH v3 0/6] slab: Introduce dedicated bucket allocator
Message-ID: <202405031126.CEAB079A1A@keescook>
References: <20240424213019.make.366-kees@kernel.org>
 <202404280921.A7683D511@keescook>
 <28478de8-3028-48f2-b887-56149b6e324a@dustri.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <28478de8-3028-48f2-b887-56149b6e324a@dustri.org>

On Fri, May 03, 2024 at 03:39:28PM +0200, jvoisin wrote:
> On 4/28/24 19:02, Kees Cook wrote:
> > On Sun, Apr 28, 2024 at 01:02:36PM +0200, jvoisin wrote:
> >> On 4/24/24 23:40, Kees Cook wrote:
> >>> [...]
> >>> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> >>> against these kinds of "type confusion" attacks, including for fixed
> >>> same-size heap objects, we can create a complementary deterministic
> >>> defense for dynamically sized allocations that are directly user
> >>> controlled. Addressing these cases is limited in scope, so isolating these
> >>> kinds of interfaces will not become an unbounded game of whack-a-mole. For
> >>> example, many pass through memdup_user(), making isolation there very
> >>> effective.
> >>
> >> What does "Addressing these cases is limited in scope, so isolating
> >> these kinds of interfaces will not become an unbounded game of
> >> whack-a-mole." mean exactly?
> >
> > The number of cases where there is a user/kernel API for size-controlled
> > allocations is limited. They don't get added very often, and most are
> > (correctly) using memdup_user() as the basis of their allocations. This
> > means we have a relatively well defined set of criteria for finding
> > places where this is needed, and most newly added interfaces will use
> > the existing (memdup_user()) infrastructure that will already be covered.
>
> A simple CodeQL query returns 266 of them:
> https://lookerstudio.google.com/reporting/68b02863-4f5c-4d85-b3c1-992af89c855c/page/n92nD?params=%7B%22df3%22:%22include%25EE%2580%25803%25EE%2580%2580T%22%7D

These aren't filtered for being long-lived, nor filtered for userspace
reachability, nor filtered for userspace size and content
controllability. Take, for example, cros_ec_get_panicinfo(): the size is
controlled by a device, the allocation doesn't last beyond the function,
and the function itself is part of device probing.

> Is this number realistic and coherent with your results/own analysis?

No, I think it's 1, possibly 2, orders of magnitude too high.
Thank you for the link, though: we can see what the absolute upper bound
is with it, but that's not an accurate count of cases that would need to
explicitly use this bucket API. But even if it were, 300 instances is
still small: we converted more VLAs than that, more switch statement
fallthroughs than that, and fixed more array bounds cases than that.

And, again, while this series does close a bunch of methods today, it's
a _prerequisite_ for doing per-call-site allocation isolation, which
obviates the need for doing per-site analysis. (We can and will still
do such analysis, though, since there's a benefit to it for folks that
can't tolerate the expected per-site memory overhead.)

> [...]
> >>> Memory allocation pinning[2] is still needed to plug the Use-After-Free
> >>> cross-allocator weakness, but that is an existing and separate issue
> >>> which is complementary to this improvement. Development continues for
> >>> that feature via the SLAB_VIRTUAL[3] series (which could also provide
> >>> guard pages -- another complementary improvement).
> >>>
> >>> Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> >>> Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> >>> Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
> >>
> >> To be honest, I think this series is close to useless without allocation
> >> pinning. And even with pinning, it's still routinely bypassed in the
> >> KernelCTF
> >> (https://github.com/google/security-research/tree/master/pocs/linux/kernelctf).
> >
> > Sure, I can understand why you might think that, but I disagree. This
> > adds the building blocks we need for better allocation isolation
> > control, and stops existing (and similar) attacks today.
> >
> > But yes, given attackers with sufficient control over the entire system,
> > all mitigations get weaker.
> > We can't fall into the trap of "perfect
> > security"; real-world experience shows that incremental improvements
> > like this can strongly impact the difficulty of mounting attacks. Not
> > all flaws are created equal; not everything is exploitable to the same
> > degree.
>
> It's not about "perfect security", but about wisely spending the
> complexity/review/performance/churn/… budgets in my opinion.

Sure, that's an appropriate analysis to make, and it's one I've done. I
think this series is well within those budgets: it abstracts the
"bucket" system into a distinct object that we've needed to have
extracted for other things, it's a pretty trivial review (since the
abstraction makes the other patches very straightforward), using the new
API is a nearly trivial drop-in replacement, and we immediately close
several glaring exploit techniques, which has real-world impact. This
is, IMO, a total slam dunk of a series.

> >> Do you have some particular exploits in mind that would be completely
> >> mitigated by your series?
> >
> > I link to like a dozen in the last two patches. :P
> >
> > This series immediately closes 3 well used exploit methodologies.
> > Attackers exploiting new flaws that could have used the killed methods
> > must now choose methods that have greater complexity, and this drives
> > them towards cross-allocator attacks. Robust exploits there are more
> > costly to develop as we narrow the scope of methods.
>
> You linked exploits that were making use of the two structures that you
> isolated; making them use different structures would likely mean a
> couple of hours.

I think you underestimate what it would take to provide such a flexible
replacement.
As I noted earlier, the techniques have several requirements:

- reachable from userspace
- long-lived allocation
- userspace controllable size
- userspace controllable contents

I'm not saying there aren't other interfaces that provide this, but it's
not common, and each (like these) will have their own quirks and
limitations. (e.g. the msg_msg exploit can't use the start of the
allocation since the contents aren't controllable, and has a minimum
size for the same reason.)

This series kills the 3 techniques with _2_ changes. 2 of the techniques
depend on the same internal (memdup_user()) that gets protected, which
implies that it will cover other things both now and in the future.

> I was more interested in exploits that are effectively killed; as I'm
> still not convinced that elastic structures are rare, and that manually
> isolating them one by one is attainable/sustainable/…

I don't agree with your rarity analysis, but it doesn't matter, because
we will be taking the next step and providing per-call-site isolation
using this abstraction.

> But if you have some proper analysis in this direction, then yes, I
> completely agree that isolating all of them is a great idea.

I don't need to perform a complete reachability analysis for all UAPI
because I can point to just memdup_user(): it is the recommended way to
get long-lived data from userspace. It has been and will be used by
interfaces that meet all 4 criteria for the exploit technique.

Converting other APIs to it or using the bucket allocation API can
happen over time as those are identified. This is standard operating
procedure for incremental improvements in Linux.

> > Bad analogy: we're locking the doors of a house. Yes, some windows may
> > still be unlocked, but now they'll need a ladder. And it doesn't make
> > sense to lock the windows if we didn't lock the doors first.
> > This is what I mean by complementary defenses, and comes back to what
> > I mentioned earlier: "perfect security" is a myth, but incremental
> > security works.
> >
> >> Moreover, I'm not aware of any ongoing development of the SLAB_VIRTUAL
> >> series: the last sign of life on its thread is from 7 months ago.
> >
> > Yeah, I know, but sometimes other things get in the way. Matteo assures
> > me it's still coming.
> >
> > Since you're interested in seeing SLAB_VIRTUAL land, please join the
> > development efforts. Reach out to Matteo (you, he, and I all work for
> > the same company) and see where you can assist. Surely this can be
> > something you can contribute to while "on the clock"?
>
> I left Google a couple of weeks ago unfortunately,

Ah! Bummer; I didn't see that happen. :(

> and I won't touch
> anything with email-based development for less than a Google salary :D

LOL. Yes, I can understand that. :)

I do want to say, though, that objections carry a lot more weight when
counter-proposal patches are provided. "This is the way." :P

-Kees

--
Kees Cook