From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EFFC2C4345F
	for <linux-mm@archiver.kernel.org>; Fri,  3 May 2024 13:39:39 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 7F6716B0092; Fri,  3 May 2024 09:39:39 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 7A63F6B0095; Fri,  3 May 2024 09:39:39 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 647116B0096; Fri,  3 May 2024 09:39:39 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16])
	by kanga.kvack.org (Postfix) with ESMTP id 3FB956B0092
	for <linux-mm@kvack.org>; Fri,  3 May 2024 09:39:39 -0400 (EDT)
Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay04.hostedemail.com (Postfix) with ESMTP id F3DFD1A10E8
	for <linux-mm@kvack.org>; Fri,  3 May 2024 13:39:37 +0000 (UTC)
X-FDA: 82077192036.02.6E36441
Received: from out-175.mta1.migadu.com (out-175.mta1.migadu.com [95.215.58.175])
	by imf19.hostedemail.com (Postfix) with ESMTP id 79D011A001D
	for <linux-mm@kvack.org>; Fri,  3 May 2024 13:39:35 +0000 (UTC)
Authentication-Results: imf19.hostedemail.com;
	dkim=pass header.d=dustri.org header.s=key1 header.b=mCNUcPGK;
	dmarc=pass (policy=quarantine) header.from=dustri.org;
	spf=pass (imf19.hostedemail.com: domain of julien.voisin@dustri.org designates 95.215.58.175 as permitted sender) smtp.mailfrom=julien.voisin@dustri.org
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1714743576;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=+RwTdEzW/FB2ApzZxuuCNuwAx1O+uSLKzF1SQsO0uE8=;
	b=klVBnCHh/2O3CgEKX8hsths8tVsOJf2QwHe7At5J2xJVMiMEoWIleDPclrGj1+dhJ5LruP
	AOTTQIWNWHqjTM5cx9Ay8mrr2PVKgIIM/8nF9pG+jbDvFon/mUHS5EiKgHrl6x+QJ8KrNQ
	JF0AGKh2Thin85yzeg3C42wq1X44q+Y=
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1714743576; a=rsa-sha256;
	cv=none;
	b=0bxnHdQihko+xFc9FkdWANQeBEbuP0dubf6VKhI5f7Hw1azXesFFLeppqSQnlnqhVdKD71
	Bm+DISjr6H9XUtUKbfMsoxaSrSdiZDkFO0yagxrJybA4LiUQsW6S5JLneeZi1P6pzcyptU
	E0zjJvYLbnfN4ae9SP6llISc7xnH0wo=
ARC-Authentication-Results: i=1;
	imf19.hostedemail.com;
	dkim=pass header.d=dustri.org header.s=key1 header.b=mCNUcPGK;
	dmarc=pass (policy=quarantine) header.from=dustri.org;
	spf=pass (imf19.hostedemail.com: domain of julien.voisin@dustri.org designates 95.215.58.175 as permitted sender) smtp.mailfrom=julien.voisin@dustri.org
Message-ID: <28478de8-3028-48f2-b887-56149b6e324a@dustri.org>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dustri.org; s=key1;
	t=1714743573;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:autocrypt:autocrypt;
	bh=+RwTdEzW/FB2ApzZxuuCNuwAx1O+uSLKzF1SQsO0uE8=;
	b=mCNUcPGKoh9tQSDW19snBXOgRCBUIw4ATq52+/dYMeqf6z6MwYKeFaBMNQffZiNuqCe9wW
	ycq754rqQCUB2MdFlAXWflQyB7TmoiUh1rfulR+de//9XAeKQrsVvUIoe+u4Hn0MiBs8PR
	zn+FC/PsB893kexCoiOSx6/Ihk4/8h92BtuElX7XEp3S4TbtmGRcODawD63986cel8cPuA
	gjdcYOSLptoz1cNvgqgmt7A6UqjBaL5s+7/1x1y9iiKMXzfFmECDJ2m7WZjHl9Ne4jXUVb
	bQiQV2JD6Cdy5fYCxbGJFtF7hQsuLv7Za7NLFc7XLXgDHMnxQVTHYiMbMQJKrQ==
Date: Fri, 3 May 2024 15:39:28 +0200
MIME-Version: 1.0
Subject: Re: [PATCH v3 0/6] slab: Introduce dedicated bucket allocator
To: Kees Cook <keescook@chromium.org>, Matteo Rizzo <matteorizzo@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
 Andrew Morton <akpm@linux-foundation.org>, Christoph Lameter <cl@linux.com>,
 Pekka Enberg <penberg@kernel.org>, David Rientjes <rientjes@google.com>,
 Joonsoo Kim <iamjoonsoo.kim@lge.com>,
 Roman Gushchin <roman.gushchin@linux.dev>,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, "GONG, Ruiqi"
 <gongruiqi@huaweicloud.com>, Xiu Jianfeng <xiujianfeng@huawei.com>,
 Suren Baghdasaryan <surenb@google.com>,
 Kent Overstreet <kent.overstreet@linux.dev>, Jann Horn <jannh@google.com>,
 Thomas Graf <tgraf@suug.ch>, Herbert Xu <herbert@gondor.apana.org.au>,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-hardening@vger.kernel.org
References: <20240424213019.make.366-kees@kernel.org>
 <d0a65407-d3ae-46d5-800f-415ce7efcf22@dustri.org>
 <202404280921.A7683D511@keescook>
Content-Language: en-US
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: jvoisin <julien.voisin@dustri.org>
Autocrypt: addr=julien.voisin@dustri.org; keydata=
 xsFNBFWzxaABEACu3G1fwzHtrhHuotgvZ69zA4YqF9vYfx7hoYrjnKzP5pTiOZ2US6AJj1qE
 W1WlN6cHnqzzqoXotVu/MPuPrbadL21jRnJWurrkktpcqK4BaCZ5S0lOQ3ck40LysidexhI6
 ZZi6jhBZzuzxs2Mi9aIPIxDekXAWQBybs4m27E4MNmJkIshVnDTMQ4ToGQxzwPj+VurpQVPh
 WGMCPwlUbVkN/w6N/lLC088ePpESh5E0vFE+BQc66ZpRn+cXTlaqjQnwRtWuEBoqJSn2MXAn
 wODEj4H5HvQjgFyRfmOHHMTEHOg4yyc84SmIv8YJlTbVX7VnMGUJF43SA4PFtXFypBkQ481u
 w10XdBPYwD/i0q3QnzzRiIsrlQJUCkGFxyIpcDNRnf3ApjT4+QuEaw98tKvgRzCozFx2D94w
 sSFz858vZrdYj4pt/VYw8JeoPDiWwuzPVvgpJmQlL8aCRnAhLIv9O+fySvXcGh1WEvtUgkNn
 1WjU2M00BYnPNFBEeGMRWkxuVwV1o+WKNJfwg2UcDghSkJGBCPCAiC2fDlfyk3njjLjxZHP/
 mYNwUkxTlQolzknJZ5wg7vbE6r4rfQX4gTi3mNzYtqUAb17GIczOARZK7qdSapObrXPFGgX3
 Bd4FZJEaIq3p5xWcWS8fcMveoYO7m9cyaSkSQxAPrPZE3hDF1QARAQABzTJKdWxpZW4gKGp2
 b2lzaW4pIFZvaXNpbiA8anVsaWVuLnZvaXNpbkBkdXN0cmkub3JnPsLBlwQTAQoAQQIbAwUL
 CQgHAwUVCgkICwUWAgMBAAIeAQIXgAIZARYhBJ/N7p4aOB8xHqYqdATQQegXGQHMBQJfDWXp
 BQkSV5eAAAoJEATQQegXGQHMKrwQAI8gOcx3qRk7T5qBgg9rlk3WDaJWcmw1Dq2VnjKrEVLh
 vxvwK/CjiaH4g6oUiGNeDVBoozjzKM/umHL7SoBjhHiayEu33ziAjLWxiVGbHVmHmfXkZdQz
 CEBSI1ZR8HF88tFCCOCtK7Nc+1yohmTnfnrIIEXMpSvAgdFilwnjYbaNe+aQ9MJMo+k7J144
 h+BzN5EW19zVwOidUdD0HxKpCYz6D34etnYIpv8Qa0KBzOPTtO1QYr6A7MfQPiRVlIOA543g
 h9bi9SQhCBsOZU1NOVQUZ3/ktj8qlUTVlOhGKYaPvJJ0X9va02rzL7zxYcVZgQic2dTLGYW/
 GGHVseegnxWB/7V49Yf4ZljQvjK2B1COmahZ2UYN+fzqXO0NhpSLX4SDKDnvM/3X2TYWx1MS
 fY8x4IURA633TTW9QZzflqIYk4aO44/8MDiuaxLwt+e6d8EN8ECaAoVFPCq1dWTjCJ4XhSlb
 6eV8trCpLZfkVviuRD7xPtZU1sViVSj/O9naQ2HuUq0+LuYBmI25BEpq2rkgVKS++sYgUtxO
 IP5WoQJeNNnS+8e15VRdb77WxRe6+05JNu48wZI2OcW/MiyFs+cGtoDC5mSpVuJTmpPumP7A
 hjlxy4e5YlQn6coqQcuNL1DC/vUFwO1/cnh5dqk0x5JfHL1/XFWYjsVNjuJj/vIQzsFNBFWz
 xaABEAC/p5ESSIlC6qVJnxfhtIpappjkHmFjMHWmFrB05KnmtGB/InGH0e5y2OVaKz0RErLd
 f2CAzU5zb9cyLPnqHpE7SaqtPBmahTBX7nVzIFrbjLpU/XPHaWrHa6M1ifyu1y2msXe5U1ln
 oOVjJXTVsyoNAt8wzf73I4St2+pY7kQBlv5AUTssa4T22hZs3BImcd4OsLpct2aIGd3NGofN
 ksiLB3ZiE/vKJkXWIbx9/hm8nuKlQuHGo+sHho8T+QQcc+YCo66BYBznzD+yEv/UALjgHWU/
 PXw3RVM8kqQ3WlmWsYKqQYgkaA2cVPrkbLlxiHg28Y4deu6oZR4oSovXjJk4jj3m/UckaN0f
 c47BG1VwKVHxjg/c8hy1elunhJv0Vf2eLA6pc0UfAcpSkJZNkOLjFZ9YROHdiKiUE4pEej4/
 o3WE76TIX58aURuouVAVwe14sIED3QLoO+4wczTZsOX/jcOg2D2qPquby5taOAM6yPP/v7fy
 TAG9UYdxq1L9/wKwhH1pmagkTmLu7k5XzgQ/6rrR4NJPRRMETrtqDFJNb2UxhRlnl/Cavkt6
 5BK7D0QJ9n9phFWC2oTIaMd5suFZK3I71UdeTaBOlrqmqLzuBVhzQeAK2vaJI1c6IzqjGRlx
 PEm6BuHfRWaf+LLj4Z7wrupWwAxLjHgPUCL2Chk2ZwARAQABwsF8BBgBCgAmAhsMFiEEn83u
 nho4HzEepip0BNBB6BcZAcwFAl8NaJcFCRK/pHcACgkQBNBB6BcZAcxUhg//fmeZNMlB7NPJ
 bT4dLsnSTCRAl1zqCxqowPyG4ux79qiG73KW/vLT1EUQTm4ANyl5Mwyf+3ssfzxl/Flp7i93
 57rENZRMWj80JluU8w68sUrxKlTNZfrukHttoNPmTh9TTuvP0yQXysJyy0p6VvdBT5euf2Iw
 LMERoaln4h2VuhLSL80VcJfou0TVl9Aq47HerwTPXQdC4Rm/bYrdDdZhEJMrEQuDP6eLIjmC
 4vI51LwnPcXABan3WudfEaxdpI9acwcCy9XQ32vIjhxV9D3fx0dsfo6PDXFdKEY9q+bfOjUt
 GyqZWRtqe/EWM8T1w4H4svpGpTh2mB8Du/1jNy5CiSgLiDySd6Gz8vP0rqFGYuLN1fCBNpd4
 PzF29dPO8xJ++K5pVi+pXpKzIfW9f2ZL0fabrsKP1Rht+q+3ldgGSvgw3v2aFffvEuRmodiY
 Vkby7UMuABQGlgE89z+cffBRhelgNzoVs/PtIuWb/y5BgOBGD9zUn4Z2FjB5eby230qkP1uQ
 +vyunBj6QnANa7qBxycL+xXbW8HBksArQ/HX+OZs7hagrP0qGMnjmCzsblv0wixghgvQTkpg
 61RTH34ieLUkzE0oFkrqJyNZcoH0wStdP/9zwK1Av0cZcFcvlLdIL956v4IpZozW1ScG7OJw
 766VTOg4l2PTPCnIdNFy1Os=
In-Reply-To: <202404280921.A7683D511@keescook>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Migadu-Flow: FLOW_OUT
X-Rspamd-Server: rspam08
X-Rspamd-Queue-Id: 79D011A001D
X-Stat-Signature: i9xk17jtq59e98otrfimmta7q1u74bqr
X-Rspam-User: 
X-HE-Tag: 1714743575-34848
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On 4/28/24 19:02, Kees Cook wrote:
> On Sun, Apr 28, 2024 at 01:02:36PM +0200, jvoisin wrote:
>> On 4/24/24 23:40, Kees Cook wrote:
>>> Hi,
>>>
>>> Series change history:
>>>
>>>  v3:
>>>   - clarify rationale and purpose in commit log
>>>   - rebase to -next (CONFIG_CODE_TAGGING)
>>>   - simplify calling styles and split out bucket plumbing more cleanly
>>>   - consolidate kmem_buckets_*() family introduction patches
>>>  v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@kernel.org/
>>>  v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@kernel.org/
>>>
>>> For the cover letter, I'm repeating commit log for patch 4 here, which has
>>> additional clarifications and rationale since v2:
>>>
>>>     Dedicated caches are available for fixed size allocations via
>>>     kmem_cache_alloc(), but for dynamically sized allocations there is only
>>>     the global kmalloc API's set of buckets available. This means it isn't
>>>     possible to separate specific sets of dynamically sized allocations into
>>>     a separate collection of caches.
>>>     
>>>     This leads to a use-after-free exploitation weakness in the Linux
>>>     kernel since many heap memory spraying/grooming attacks depend on using
>>>     userspace-controllable dynamically sized allocations to collide with
>>>     fixed size allocations that end up in same cache.
>>>     
>>>     While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
>>>     against these kinds of "type confusion" attacks, including for fixed
>>>     same-size heap objects, we can create a complementary deterministic
>>>     defense for dynamically sized allocations that are directly user
>>>     controlled. Addressing these cases is limited in scope, so isolation these
>>>     kinds of interfaces will not become an unbounded game of whack-a-mole. For
>>>     example, pass through memdup_user(), making isolation there very
>>>     effective.
>>
>> What does "Addressing these cases is limited in scope, so isolation
>> these kinds of interfaces will not become an unbounded game of
>> whack-a-mole." mean exactly?
> 
> The number of cases where there is a user/kernel API for size-controlled
> allocations is limited. They don't get added very often, and most are
> (correctly) using kmemdup_user() as the basis of their allocations. This
> means we have a relatively well defined set of criteria for finding
> places where this is needed, and most newly added interfaces will use
> the existing (kmemdup_user()) infrastructure that will already be covered.

A simple CodeQL query returns 266 of them:
https://lookerstudio.google.com/reporting/68b02863-4f5c-4d85-b3c1-992af89c855c/page/n92nD?params=%7B%22df3%22:%22include%25EE%2580%25803%25EE%2580%2580T%22%7D

Is this number realistic, and is it consistent with your own results/analysis?

> 
>>>     In order to isolate user-controllable sized allocations from system
>>>     allocations, introduce kmem_buckets_create(), which behaves like
>>>     kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like
>>>     kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for
>>>     where caller tracking is needed. Introduce kmem_buckets_valloc() for
>>>     cases where vmalloc callback is needed.
>>>     
>>>     Allows for confining allocations to a dedicated set of sized caches
>>>     (which have the same layout as the kmalloc caches).
>>>     
>>>     This can also be used in the future to extend codetag allocation
>>>     annotations to implement per-caller allocation cache isolation[1] even
>>>     for dynamic allocations.
>> Having per-caller allocation cache isolation looks like something that
>> has already been done in
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe
>> albeit in a randomized way. Why not piggy-back on the infra added by
>> this patch, instead of adding a new API?
> 
> It's not sufficient because it is a static set of buckets. It cannot be
> adjusted dynamically (which is not a problem kmem_buckets_create() has).
> I had asked[1], in an earlier version of CONFIG_RANDOM_KMALLOC_CACHES, for
> exactly the API that is provided in this series, because that would be
> much more flexible.
> 
> And for systems that use allocation profiling, the next step
> would be to provide per-call-site isolation (which would supersede
> CONFIG_RANDOM_KMALLOC_CACHES, which we'd keep for the non-alloc-prof
> cases).
> 
>>>     Memory allocation pinning[2] is still needed to plug the Use-After-Free
>>>     cross-allocator weakness, but that is an existing and separate issue
>>>     which is complementary to this improvement. Development continues for
>>>     that feature via the SLAB_VIRTUAL[3] series (which could also provide
>>>     guard pages -- another complementary improvement).
>>>     
>>>     Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
>>>     Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
>>>     Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
>>
>> To be honest, I think this series is close to useless without allocation
>> pinning. And even with pinning, it's still routinely bypassed in the
>> KernelCTF
>> (https://github.com/google/security-research/tree/master/pocs/linux/kernelctf).
> 
> Sure, I can understand why you might think that, but I disagree. This
> adds the building blocks we need for better allocation isolation
> control, and stops existing (and similar) attacks today.
> 
> But yes, given attackers with sufficient control over the entire system,
> all mitigations get weaker. We can't fall into the trap of "perfect
> security"; real-world experience shows that incremental improvements
> like this can strongly impact the difficulty of mounting attacks. Not
> all flaws are created equal; not everything is exploitable to the same
> degree.

In my opinion, it's not about "perfect security", but about spending the
complexity/review/performance/churn/… budgets wisely.

>> Do you have some particular exploits in mind that would be completely
>> mitigated by your series?
> 
> I link to like a dozen in the last two patches. :P
> 
> This series immediately closes 3 well used exploit methodologies.
> Attackers exploiting new flaws that could have used the killed methods
> must now choose methods that have greater complexity, and this drives
> them towards cross-allocator attacks. Robust exploits there are more
> costly to develop as we narrow the scope of methods.

You linked exploits that were making use of the two structures that you
isolated; adapting them to use different structures would likely take a
couple of hours.

I was more interested in exploits that are effectively killed, as I'm
still not convinced that elastic structures are rare, or that manually
isolating them one by one is attainable/sustainable/…

But if you have some proper analysis in this direction, then yes, I
completely agree that isolating all of them is a great idea.
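
As a rough illustration of the point under discussion, here is a toy
userspace model, not kernel code. Every name in it (bucket_set,
toy_alloc(), toy_free(), spray_reclaims_victim()) is made up for the
example, and it assumes power-of-two size classes with LIFO reuse of
freed slots. It only shows the general idea: a sprayer allocating from a
dedicated set of buckets cannot reclaim a slot freed in the global set,
whereas any elastic structure left in the global set still can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model; NOT kernel code. All names are hypothetical. */

#define NR_CLASSES 8            /* size classes: 8, 16, ..., 1024 bytes */
#define MAX_SIZE   1024

/* One free list per size class, reused in LIFO order. */
struct free_slot { struct free_slot *next; };

struct bucket_set {
	struct free_slot *free_list[NR_CLASSES];
};

/* Map a request size to the smallest power-of-two class that fits it. */
static int size_class(size_t size)
{
	int idx = 0;
	size_t cap = 8;

	assert(size > 0 && size <= MAX_SIZE);
	while (cap < size) {
		cap <<= 1;
		idx++;
	}
	return idx;
}

/* Allocate from one bucket set: reuse a freed slot of that set if any. */
static void *toy_alloc(struct bucket_set *set, size_t size)
{
	int idx = size_class(size);
	struct free_slot *slot = set->free_list[idx];
	void *fresh;

	if (slot) {
		set->free_list[idx] = slot->next;
		return slot;
	}
	fresh = malloc((size_t)8 << idx);
	assert(fresh != NULL);
	return fresh;
}

/* Return a slot to the free list of the set it came from. */
static void toy_free(struct bucket_set *set, void *ptr, size_t size)
{
	int idx = size_class(size);
	struct free_slot *slot = ptr;

	slot->next = set->free_list[idx];
	set->free_list[idx] = slot;
}

/*
 * Free a "victim" object from the global set, then try to reclaim its
 * slot with a same-sized allocation from `sprayer`. Returns 1 when the
 * sprayer receives the victim's old address (the overlap a
 * use-after-free exploit needs), 0 otherwise.
 */
static int spray_reclaims_victim(struct bucket_set *global,
				 struct bucket_set *sprayer, size_t size)
{
	void *victim = toy_alloc(global, size);
	void *spray;
	int overlap;

	toy_free(global, victim, size);   /* a dangling pointer would remain */
	spray = toy_alloc(sprayer, size);
	overlap = (spray == victim);

	/* Clean up: pop the victim slot back off if nobody reclaimed it. */
	if (!overlap)
		free(toy_alloc(global, size));
	free(spray);
	return overlap;
}
```

With two zero-initialised sets, spray_reclaims_victim(&global, &global, 64)
returns 1 (a sprayer still sharing the global buckets reclaims the slot),
while spray_reclaims_victim(&global, &dedicated, 64) returns 0. The model
deliberately ignores cross-cache (page-level) reuse, which is the part
allocation pinning is meant to address.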

> 
> Bad analogy: we're locking the doors of a house. Yes, some windows may
> still be unlocked, but now they'll need a ladder. And it doesn't make
> sense to lock the windows if we didn't lock the doors first. This is
> what I mean by complementary defenses, and comes back to what I mentioned
> earlier: "perfect security" is a myth, but incremental security works.
> 
>> Moreover, I'm not aware of any ongoing development of the SLAB_VIRTUAL
>> series: the last sign of life on its thread is from 7 months ago.
> 
> Yeah, I know, but sometimes other things get in the way. Matteo assures
> me it's still coming.
> 
> Since you're interested in seeing SLAB_VIRTUAL land, please join the
> development efforts. Reach out to Matteo (you, he, and I all work for
> the same company) and see where you can assist. Surely this can be
> something you can contribute to while "on the clock"?

I left Google a couple of weeks ago unfortunately, and I won't touch
anything with email-based development for less than a Google salary :D

> 
>>> After the core implementation are 2 patches that cover the most heavily
>>> abused "repeat offenders" used in exploits. Repeating those details here:
>>>
>>>     The msg subsystem is a common target for exploiting[1][2][3][4][5][6]
>>>     use-after-free type confusion flaws in the kernel for both read and
>>>     write primitives. Avoid having a user-controlled size cache share the
>>>     global kmalloc allocator by using a separate set of kmalloc buckets.
>>>     
>>>     Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
>>>     Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
>>>     Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
>>>     Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
>>>     Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
>>>     Link: https://zplin.me/papers/ELOISE.pdf [6]
>>>     Link: https://syst3mfailure.io/wall-of-perdition/ [7]
>>>
>>>     Both memdup_user() and vmemdup_user() handle allocations that are
>>>     regularly used for exploiting use-after-free type confusion flaws in
>>>     the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
>>>     respectively).
>>>     
>>>     Since both are designed for contents coming from userspace, it allows
>>>     for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
>>>     buckets so these allocations do not share caches with the global kmalloc
>>>     buckets.
>>>     
>>>     Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
>>>     Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
>>>     Link: https://etenal.me/archives/1336 [3]
>>>     Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]
>>
>> What's the performance impact of this series? Did you run some benchmarks?
> 
> I wasn't able to measure any performance impact at all. It does add a
> small bit of memory overhead, but it's on the order of a dozen pages
> used for the 2 extra sets of buckets. (E.g. it's well below the overhead
> introduced by CONFIG_RANDOM_KMALLOC_CACHES, which adds 16 extra sets
> of buckets.)

Nice!