From: Jeff Xu
Date: Thu, 1 Feb 2024 19:20:08 -0800
Subject: Re: [PATCH v8 0/4] Introduce mseal
To: Linus Torvalds
Cc: Theo de Raadt, Jeff Xu, "Liam R. Howlett", Jonathan Corbet,
    akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com,
    sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org,
    usama.anjum@collabora.com, rdunlap@infradead.org, jorgelo@chromium.org,
    groeck@chromium.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
    pedro.falcato@gmail.com, dave.hansen@intel.com,
    linux-hardening@vger.kernel.org
References: <20240131175027.3287009-1-jeffxu@chromium.org>
    <20240131193411.opisg5yoyxkwoyil@revolver>
    <20240201204512.ht3e33yj77kkxi4q@revolver>
    <58408.1706828083@cvs.openbsd.org>

On Thu, Feb 1, 2024 at 3:15 PM Linus Torvalds wrote:
>
> On Thu, 1 Feb 2024 at 14:54, Theo de Raadt wrote:
> >
> > Linus, you are in for a shock when the proposal doesn't work for glibc
> > and all the applications!
>
> Heh. I've enjoyed seeing your argumentative style that made you so
> famous back in the days. Maybe it's always been there, but I haven't
> seen the BSD people in so long that I'd forgotten all about it.
>
> That said, famously argumentative or not, I think Theo is right, and I
> do think the MAP_SEALABLE bit is nonsensical.
>
> If somebody wants to mseal() a memory region, why would they need to
> express that ahead of time?
>
I like to look at things from the point of view of average Linux
userspace developers. They might not have the same level of expertise as
the other folks on this email list, or they might not have the time and
mileage for those details.

To me, the most important thing is to deliver a feature that's easy to
use and works well. I don't want users to mess things up, so if I'm the
one giving them the tools, I'm going to make sure they have all the
information they need and that there are safeguards in place.

E.g., consider the following use case:
1> Security-sensitive data is allocated from the heap, using malloc, by
   software component A, and filled with information.
2> Software component B then uses mprotect to change it to RO, and seals
   it using mseal().

Yes, we could choose to allow it. But there are complications:
1> Is this the right pattern? Why doesn't component A seal it already if
   they think it is important?
2> Why the heap? Why not mmap() a new memory mapping for that security
   data?
3> free() will not respect whether the memory is sealed or not.
   How would a new developer know they probably should never free
   sealed memory?
4> brk-shrink will never be able to get past the VMA that gets split out
   by mseal(); there are memory footprint implications for the process.
5> What if the security-sensitive data happens to be in the first VMA or
   last VMA of the heap? Will sealing the first/last VMA cause any issue
   there, since they might carry important VMA flags? (I don't know
   enough about brk.)
6> If we ever support sealing the heap in its entirety (make it not
   executable), and still want to support other brk behaviors, such as
   shrink/grow, would that conflict with the current mseal(), if we
   allow it on the heap from the beginning?

With questions like these lacking clear answers, I think it is premature
to let developers start using mseal() on the heap.

And even if we had all the answers for the heap, how about the stack, or
other types of virtual memory? Again, I don't have enough knowledge to
put together a complete list of what shouldn't be sealed. Theo's input
is that there are none I should worry about. However, it is clearly not
none to me: besides the heap mentioned above, there is also aio/shm.

So MAP_SEALABLE is a conservative approach to limit the scope to
*** two known use cases *** that I want to work on (libc and chrome)
and to give the time needed to answer those questions. It is like a
claim: only mappings marked with MAP_SEALABLE support sealing at this
point in time.

And MAP_SEALABLE is reversible, e.g. a sysctl could be added to make all
memory sealable in the future, or we could obsolete it entirely when the
time comes; for an application that already passes MAP_SEALABLE, the flag
can then be treated as a no-op. However, if all memory were allowed to be
sealable from the beginning, reversing that decision would be hard.

If, after those considerations, you still don't prefer MAP_SEALABLE,
then I have the following options for you to choose from:

1. MAP_NOT_SEALABLE in mmap(). I will use it for the heap/aio/shm case.
   This basically says Linux does not officially support sealing on
   those; until we support them, we discourage sealing those mappings.

2. Make MAP_NOT_SEALABLE a kernel-only flag, so applications won't be
   able to use it.

3. Open for all, and list as many details as possible in the
   documentation. If we choose this route, I would like to have more
   discussion on the heap/stack; at least the Linux developers will
   learn from those discussions.

> So the part I think is sane is the mseal() system call itself, in that
> it allows *potential* future expansion of the semantics.
>
> But hopefully said future expansion isn't even needed, and all users
> want the base experience, which is why I think PROT_SEAL (both to mmap
> and to mprotect) makes sense as an alternative form.
>
> So yes, to my mind
>
>     mprotect(addr, len, PROT_READ);
>     mseal(addr, len, 0);
>
> should basically give identical results to
>
>     mprotect(addr, len, PROT_READ | PROT_SEAL);
>
> and using PROT_SEAL at mmap() time is similarly the same obvious
> notion of "map this, and then seal that mapping".
>
> The reason for having "mseal()" as a separate call at all from the
> PROT_SEAL bit is that it does allow possible future expansion (while
> PROT_SEAL is just a single bit, and it won't change semantics) but
> also so that you can do whatever prep-work in stages if you want to,
> and then just go "now we seal it all".
>

To clarify: do you mean to have the following?

    mmap(PROT_READ|PROT_SEAL)
    mseal(addr,len,0)
    mprotect(addr,len,PROT_READ|PROT_SEAL) ?

I have to think about the mprotect() case.

For mmap(PROT_READ|PROT_SEAL), I might have a use case already:

fs/binfmt_elf.c
    if (current->personality & MMAP_PAGE_ZERO) {
            /* Why this, you ask???  Well SVr4 maps page 0 as read-only,
               and some applications "depend" upon this behavior.
               Since we do not have the power to recompile these, we
               emulate the SVr4 behavior. Sigh. */
            error = vm_mmap(NULL, 0, PAGE_SIZE,
                            PROT_READ | PROT_EXEC,    <-- add PROT_SEAL
                            MAP_FIXED | MAP_PRIVATE, 0);
    }

I don't see the benefit of an RWX page 0, which might make a NULL
pointer error executable for some code.

Best Regards,
-Jeff

> Linus
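
P.S. For readers who want to try the two-step sequence quoted above
(mprotect() to read-only, then mseal()), here is a minimal userspace
sketch. It is not code from the patch series. It assumes a kernel that
provides the mseal() syscall and headers that define __NR_mseal; without
them the helper just reports ENOSYS. The helper name protect_then_seal()
is hypothetical, and PROT_SEAL is not used because it exists only as a
proposal in this thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): make
 * [addr, addr + len) read-only, then seal it with mseal().
 * Returns 0 on success, -1 with errno set on failure.
 */
static int protect_then_seal(void *addr, size_t len)
{
	/* Step 1: drop write access before sealing. */
	if (mprotect(addr, len, PROT_READ) != 0)
		return -1;

#ifdef __NR_mseal
	/*
	 * Step 2: seal the range. No libc wrapper is assumed, so the raw
	 * syscall is used; the flags argument is 0 as in the quoted example.
	 */
	return (int)syscall(__NR_mseal, addr, len, 0UL);
#else
	/* Assumption: headers without __NR_mseal mean mseal() is unavailable. */
	errno = ENOSYS;
	return -1;
#endif
}
```

If the call succeeds, a later mprotect() or munmap() on the same range
should fail with EPERM. Under the v8 semantics discussed in this thread,
the mapping would additionally have to be created with MAP_SEALABLE.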