Date: Wed, 25 Sep 2024 20:46:09 +0100
From: Matthew Wilcox
To: linux-mm@kvack.org
Cc: Kees Cook, Jann Horn
Subject: [DESIGN] Hardening page allocator against type confusion

Kees and I had a fun discussion at Plumbers.  We're trying to harden
against type confusion, where we think we have a pointer to one thing,
but it turns out to be a pointer to a different thing.  There are
various ways this can be harmful, which Kees has laid out before when
adding slab buckets.  For example, see https://lwn.net/Articles/978976/

Not all allocations come from slab, though.
If we free a slab object and the slab it was in gets freed back to the
page allocator, it can turn into almost anything else _quickly_.  The
page allocator fronts the buddy allocator with a stack of recently-freed
pages (called PCP, not to be confused with percpu memory), so if the
attacker can arrange for a page table allocation to come in soon after
a slab free, that allocation is very likely to get the memory they
have access to.

My proposal is that we resolve this "type confusion" by having separate
PCP lists for different types of pages.  We'll need to have this for
memdescs anyway, so this is just shifting some of the work left.

We'd reduce the exploitability of type confusion by using a per-CPU,
per-type stack of recently used pages.  To turn a slab page into a page
table page, the attacker would have to cause a dozen slabs to be freed
on this CPU, pushing this one into the buddy allocator.  Then they'd
have to cause the allocating task to empty its stack of page table
pages, causing the attackable slab to be pulled from the buddy.

It's still possible, but it's harder.  Harder enough?  I don't know,
hence this email.  We can get into the API design (and then the
implementation design) if we have agreement that this is the right
approach to be taking.
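
To make the per-type stack idea concrete, here is a toy userspace model
of one CPU.  It is only a sketch and not kernel code: toy_allocator,
toy_alloc(), toy_free() and the two scenario helpers are hypothetical
names, PCP_DEPTH is set to 12 purely to match the "dozen" above, the
buddy allocator is reduced to a LIFO array, and a full stack spills its
oldest page rather than freeing a batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical page types; the real set would come out of the API design. */
enum page_type { PT_SLAB, PT_PGTABLE, PT_NR };

#define PCP_DEPTH 12	/* assumption: "a dozen" pages per stack */
#define NPAGES    64	/* size of the toy machine */

struct pcp_stack {
	int pfn[PCP_DEPTH];
	int count;
};

struct toy_allocator {
	bool typed;			/* false: one shared stack for all types */
	struct pcp_stack pcp[PT_NR];
	int buddy[NPAGES];		/* stand-in for the buddy allocator (LIFO) */
	int buddy_count;
};

static void toy_init(struct toy_allocator *a, bool typed)
{
	memset(a, 0, sizeof(*a));
	a->typed = typed;
	/* Fill the buddy so that page 0 is handed out first. */
	for (int i = 0; i < NPAGES; i++)
		a->buddy[a->buddy_count++] = NPAGES - 1 - i;
}

static struct pcp_stack *pcp_for(struct toy_allocator *a, enum page_type t)
{
	return &a->pcp[a->typed ? t : 0];
}

/* Pop from this type's stack; fall back to the buddy when it is empty. */
static int toy_alloc(struct toy_allocator *a, enum page_type t)
{
	struct pcp_stack *s = pcp_for(a, t);

	if (s->count > 0)
		return s->pfn[--s->count];
	if (a->buddy_count > 0)
		return a->buddy[--a->buddy_count];
	return -1;
}

/* Push onto this type's stack; a full stack spills its oldest page. */
static void toy_free(struct toy_allocator *a, enum page_type t, int pfn)
{
	struct pcp_stack *s = pcp_for(a, t);

	if (s->count == PCP_DEPTH) {
		a->buddy[a->buddy_count++] = s->pfn[0];
		memmove(&s->pfn[0], &s->pfn[1],
			(PCP_DEPTH - 1) * sizeof(s->pfn[0]));
		s->count--;
	}
	s->pfn[s->count++] = pfn;
}

/* Does a page table allocation right after a slab free get the slab page? */
bool immediate_reuse(bool typed)
{
	struct toy_allocator a;
	int victim;

	toy_init(&a, typed);
	victim = toy_alloc(&a, PT_SLAB);
	toy_free(&a, PT_SLAB, victim);
	return toy_alloc(&a, PT_PGTABLE) == victim;
}

/*
 * Typed stacks: free the victim, then extra_frees more slab pages.  The
 * page table stack is empty here, so the allocation goes to the buddy.
 * extra_frees must be smaller than NPAGES.
 */
bool reuse_after_flush(int extra_frees)
{
	struct toy_allocator a;
	int extra[NPAGES];
	int victim;

	toy_init(&a, true);
	victim = toy_alloc(&a, PT_SLAB);
	for (int i = 0; i < extra_frees; i++)
		extra[i] = toy_alloc(&a, PT_SLAB);
	toy_free(&a, PT_SLAB, victim);
	for (int i = 0; i < extra_frees; i++)
		toy_free(&a, PT_SLAB, extra[i]);
	return toy_alloc(&a, PT_PGTABLE) == victim;
}
```

In this model immediate_reuse(false) is true and immediate_reuse(true)
is false: with a shared stack the freed slab page comes straight back as
a page table, with typed stacks it does not.  reuse_after_flush(11) is
false and reuse_after_flush(12) is true, which is the "still possible,
but harder" case: a dozen further slab frees push the victim into the
buddy, and an empty page table stack then pulls it out.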