From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20231016143828.647848-1-jeffxu@chromium.org>
From: Jeff Xu
Date: Tue, 17 Oct 2023 14:33:25 -0700
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
To: Pedro Falcato
Cc: Matthew Wilcox, jeffxu@chromium.org, akpm@linux-foundation.org,
 keescook@chromium.org, sroettger@google.com, jorgelo@chromium.org,
 groeck@chromium.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
 surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com,
 aneesh.kumar@linux.ibm.com, axelrasmussen@google.com, ben@decadent.org.uk,
 catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk,
 ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com, corbet@lwn.net,
 wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com,
 torvalds@linux-foundation.org, lstoakes@gmail.com, mawupeng1@huawei.com,
 linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com,
 peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io,
 vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com,
 zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org,
 linux-hardening@vger.kernel.org

On Tue, Oct 17, 2023 at 8:30 AM Pedro Falcato wrote:
>
> On Mon, Oct 16, 2023 at 4:18 PM Matthew Wilcox wrote:
> >
> > On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@chromium.org wrote:
> > > Modern CPUs support memory permissions such as RW and NX bits.
> > > Linux has supported NX since the release of kernel version 2.6.8 in
> > > August 2004 [1].
> >
> > This seems like a confusing way to introduce the subject. Here, you're
> > talking about page permissions, whereas (as far as I can tell), mseal() is
> > about making _virtual_ addresses immutable, for some value of immutable.
> >
> > > Memory sealing additionally protects the mapping itself against
> > > modifications. This is useful to mitigate memory corruption issues where
> > > a corrupted pointer is passed to a memory management syscall. For example,
> > > such an attacker primitive can break control-flow integrity guarantees
> > > since read-only memory that is supposed to be trusted can become writable
> > > or .text pages can get remapped. Memory sealing can automatically be
> > > applied by the runtime loader to seal .text and .rodata pages and
> > > applications can additionally seal security critical data at runtime.
> > > A similar feature already exists in the XNU kernel with the
> > > VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> > > Also, Chrome wants to adopt this feature for their CFI work [2] and this
> > > patchset has been designed to be compatible with the Chrome use case.
> >
> > This [2] seems very generic and wide-ranging, not helpful. [5] was more
> > useful to understand what you're trying to do.
> >
> > > The new mseal() is an architecture independent syscall, and with
> > > following signature:
> > >
> > > mseal(void addr, size_t len, unsigned int types, unsigned int flags)
> > >
> > > addr/len: memory range. Must be continuous/allocated memory, or else
> > > mseal() will fail and no VMA is updated. For details on acceptable
> > > arguments, please refer to comments in mseal.c. Those are also fully
> > > covered by the selftest.
> >
> > Mmm.
> > So when you say "continuous/allocated" what you really mean is
> > "Must have contiguous VMAs" rather than "All pages in this range must
> > be populated", yes?
> >
> > > types: bit mask to specify which syscall to seal, currently they are:
> > > MM_SEAL_MSEAL 0x1
> > > MM_SEAL_MPROTECT 0x2
> > > MM_SEAL_MUNMAP 0x4
> > > MM_SEAL_MMAP 0x8
> > > MM_SEAL_MREMAP 0x10
> >
> > I don't understand why we want this level of granularity. The OpenBSD
> > and XNU examples just say "This must be immutable*". For values of
> > immutable that allow downgrading access (eg RW to RO or RX to RO),
> > but not upgrading access (RW->RX, RO->*, RX->RW).
> >
> > > Each bit represents sealing for one specific syscall type, e.g.
> > > MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> > > is that the API is extendable, i.e. when needed, the sealing can be
> > > extended to madvise, mlock, etc. Backward compatibility is also easy.
> >
> > Honestly, it feels too flexible. Why not just two flags to mprotect()
> > -- PROT_IMMUTABLE and PROT_DOWNGRADABLE. I can see a use for that --
> > maybe for some things we want to be able to downgrade and for other
> > things, we don't.
>
> I think it's worth pointing out that this suggestion (with PROT_*)
> could easily integrate with mmap() and as such allow for one-shot
> mmap() + mseal().
> If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> definitely sounds like a performance win as we halve the number of
> syscalls for a sealed mapping. And if we trivially look at e.g OpenBSD
> ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> like common patterns.
>

Yes. mmap() can support sealing as well, so that memory is allocated as
immutable from the beginning.
This is orthogonal to mseal() though.
In the case of ld.so, iiuc, memory can first be allocated as W and later
changed to RO, for example during symbol resolution.
The important point is that the application can decide what type of
sealing it wants, and when to apply it. There needs to be an API for
that; it can be mseal() or mprotect2() or mimmutable(), the naming is
not important to me.

mprotect() in Linux has the following signature:
int mprotect(void addr[.len], size_t len, int prot);
The prot bitmasks are all taken here.
I have not checked the prot field in mmap(); there might be bits left.
Even if not, we could have mmap2(), so that is not an issue.

Thanks
-Jeff

> --
> Pedro
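
For illustration, the ld.so case mentioned above (memory first mapped
writable, then switched to read-only once its contents are final) could
be sketched roughly as below. This is not code from the patchset: the
function name late_readonly_demo() and the "resolved" payload are made
up for the example, and no sealing call is made because the API shape is
still being discussed in this thread. It only uses mmap(), mprotect()
and munmap().

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a page writable, fill it, then make it
 * read-only. Returns 0 on success, -1 on any failure.
 */
int late_readonly_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		return -1;
	size_t len = (size_t)page;

	/* Step 1: anonymous private mapping, writable while it is populated. */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Step 2: stand-in for the loader's fixups (e.g. resolved symbols). */
	strcpy(p, "resolved");

	/* Step 3: drop write permission once the contents are final. */
	if (mprotect(p, len, PROT_READ) != 0) {
		munmap(p, len);
		return -1;
	}

	/*
	 * A seal that blocks mprotect() would only be usable from this point
	 * on; applied at mmap() time it would have rejected step 3. The
	 * sealing call itself is deliberately left out of this sketch.
	 */

	/* The page is still readable after the permission change. */
	int ok = (strcmp(p, "resolved") == 0);

	munmap(p, len);
	return ok ? 0 : -1;
}
```

The point of the sketch is only the ordering: the permission change
happens after the mapping is created, so the application (or loader)
needs a way to apply the seal later than mmap().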