From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pedro Falcato
Date: Tue, 17 Oct 2023 16:29:58 +0100
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
To: Matthew Wilcox
Cc: jeffxu@chromium.org, akpm@linux-foundation.org, keescook@chromium.org,
 sroettger@google.com, jeffxu@google.com, jorgelo@chromium.org,
 groeck@chromium.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
 surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com,
 aneesh.kumar@linux.ibm.com, axelrasmussen@google.com, ben@decadent.org.uk,
 catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk,
 ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com, corbet@lwn.net,
 wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com,
 torvalds@linux-foundation.org, lstoakes@gmail.com, mawupeng1@huawei.com,
 linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com,
 peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io,
 vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com,
 zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org,
 linux-hardening@vger.kernel.org
References: <20231016143828.647848-1-jeffxu@chromium.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Mon, Oct 16, 2023 at 4:18 PM Matthew Wilcox wrote:
>
> On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@chromium.org wrote:
> > Modern CPUs
support memory permissions such as RW and NX bits. Linux has
> > supported NX since the release of kernel version 2.6.8 in August 2004 [1].
>
> This seems like a confusing way to introduce the subject. Here, you're
> talking about page permissions, whereas (as far as I can tell), mseal() is
> about making _virtual_ addresses immutable, for some value of immutable.
>
> > Memory sealing additionally protects the mapping itself against
> > modifications. This is useful to mitigate memory corruption issues where
> > a corrupted pointer is passed to a memory management syscall. For example,
> > such an attacker primitive can break control-flow integrity guarantees
> > since read-only memory that is supposed to be trusted can become writable
> > or .text pages can get remapped. Memory sealing can automatically be
> > applied by the runtime loader to seal .text and .rodata pages and
> > applications can additionally seal security critical data at runtime.
> > A similar feature already exists in the XNU kernel with the
> > VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> > Also, Chrome wants to adopt this feature for their CFI work [2] and this
> > patchset has been designed to be compatible with the Chrome use case.
>
> This [2] seems very generic and wide-ranging, not helpful. [5] was more
> useful to understand what you're trying to do.
>
> > The new mseal() is an architecture independent syscall, and with
> > following signature:
> >
> > mseal(void addr, size_t len, unsigned int types, unsigned int flags)
> >
> > addr/len: memory range. Must be continuous/allocated memory, or else
> > mseal() will fail and no VMA is updated. For details on acceptable
> > arguments, please refer to comments in mseal.c. Those are also fully
> > covered by the selftest.
>
> Mmm. So when you say "continuous/allocated" what you really mean is
> "Must have contiguous VMAs" rather than "All pages in this range must
> be populated", yes?
>
> > types: bit mask to specify which syscall to seal, currently they are:
> > MM_SEAL_MSEAL 0x1
> > MM_SEAL_MPROTECT 0x2
> > MM_SEAL_MUNMAP 0x4
> > MM_SEAL_MMAP 0x8
> > MM_SEAL_MREMAP 0x10
>
> I don't understand why we want this level of granularity. The OpenBSD
> and XNU examples just say "This must be immutable*". For values of
> immutable that allow downgrading access (eg RW to RO or RX to RO),
> but not upgrading access (RW->RX, RO->*, RX->RW).
>
> > Each bit represents sealing for one specific syscall type, e.g.
> > MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> > is that the API is extendable, i.e. when needed, the sealing can be
> > extended to madvise, mlock, etc. Backward compatibility is also easy.
>
> Honestly, it feels too flexible. Why not just two flags to mprotect()
> -- PROT_IMMUTABLE and PROT_DOWNGRADABLE. I can see a use for that --
> maybe for some things we want to be able to downgrade and for other
> things, we don't.

I think it's worth pointing out that this suggestion (with PROT_*)
could easily integrate with mmap() and as such allow for one-shot
mmap() + mseal().
If we consider the common case as 'addr = mmap(...); mseal(addr);', it
definitely sounds like a performance win as we halve the number of
syscalls for a sealed mapping. And if we trivially look at e.g. OpenBSD
ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
like common patterns.

-- 
Pedro