From: Jeff Xu
Date: Wed, 2 Aug 2023 13:45:21 -0700
Subject: Re: [RFC PATCH 0/3] memfd: cleanups for vm.memfd_noexec
To: Aleksa Sarai
Cc: Jeff Xu, Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp,
 Luis Chamberlain, YueHaibing, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-hardening@vger.kernel.org
References: <20230713143406.14342-1-cyphar@cyphar.com>
 <20230801.032503-medium.noises.extinct.omen-CStYZUqcNLCS@cyphar.com>
In-Reply-To: <20230801.032503-medium.noises.extinct.omen-CStYZUqcNLCS@cyphar.com>

> > > > > * vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
> > > > >   because it will make it far too difficult to ever migrate. Instead it
> > > > >   should imply MFD_EXEC.
> > > > >
> > > > Though the purpose of memfd_noexec=2 is not to help with migration -
> > > > but to disable creation of executable memfd for the current system/pid
> > > > namespace.
> > > > During the migration, vm.memfd_noexec=1 helps overwriting for
> > > > unmigrated user code as a temporary measure.
> > >
> > > My point is that the current behaviour for =2 means that nobody other
> > > than *maybe* ChromeOS will ever be able to use it because it requires
> > > auditing every program on the system. In fact, it's possible even
> > > ChromeOS will run into issues given that one of the arguments made for
> > > the nosymfollow mount option was that auditing all of ChromeOS to
> > > replace every open with RESOLVE_NO_SYMLINKS would be too much effort[1]
> > > (which I agreed with). Maybe this is less of an issue with
> > > memfd_create(2) (which is much newer than open(2)) but it still seems
> > > like a lot of busy work when the =1 behaviour is entirely sane even in
> > > the strict threat model that =2 is trying to protect against.
> > >
> > It can also be a container (that has all memfd_create calls migrated
> > to the new API)
>
> If ChromeOS would struggle to rewrite all of the libraries they use,
> containers are in even worse shape -- most container users don't have a
> complete list of every package installed in a container, let alone the
> ability to audit whether they pass a (no-op) flag to memfd_create(2) in
> every codepath.
>
> > One option I considered previously was "=2" would do overwrite+block,
> > and "=3" just block. But then I worry that applications won't have
> > motivation to ever change their existing code, the setting will
> > forever stay at "=2", making "=3" even more impossible to ever be used
> > system-wide.
>
> What is the downside of overwriting? Backwards-compatibility is a very
> important part of Linux -- being able to use old programs without having
> to modify them is incredibly important. Yes, this behaviour is opt-in --
> but I don't see the point of making opting in more difficult than
> necessary.
> Surely overwrite+block provides the security guarantee you
> need from the threat model -- otherwise nobody will be able to use block
> because you never know if one library will call memfd_create()
> "incorrectly" without the new flags.
> >
> > > If you want to block syscalls that don't explicitly pass NOEXEC_SEAL,
> > > there are several tools for doing this (both seccomp and LSM hooks).
> > >
> > > [1]: https://lore.kernel.org/linux-fsdevel/20200131212021.GA108613@google.com/
> > >
> > > > Additional functionality/features should be implemented through
> > > > security hook and LSM, not sysctl, I think.
> > >
> > > This issue with =2 cannot be fixed in an LSM. (On the other hand, you
> > > could implement either =2 behaviour with an LSM using =1, and the
> > > current strict =2 behaviour could be implemented purely with seccomp.)
> > >
> > By migration, I mean a system that is not fully migrated, such a
> > system should just use "=0" or "=1". Additional features can be
> > implemented in SELinux/Landlock/other LSM by a motivated dev. e.g. if
> > a system wants to limit executable memfd to specific programs or fully
> > disable it.
> > "=2" is for a system/container that is fully migrated, in that case,
> > SELinux/Landlock/LSM can do the same, but sysctl provides a convenient
> > alternative.
> > Yes, seccomp provides a similar mechanism. Indeed, combining "=1" and
> > seccomp (block MFD_EXEC), it will overwrite + block X mfd, which is
> > essentially what you want, iiuc. However, I do not wish to have this
> > implemented in kernel, due to the thinking that I want kernel to get
> > out of business of "overwriting" eventually.
>
> See my above comments -- "overwriting" is perfectly acceptable to me.
> There's also no way to "get out of the business of overwriting" -- Linux
> has strict backwards compatibility requirements.
>

I agree: if we weigh only the short-term goal of letting user-space
applications do the minimum, then a 4-state sysctl (or 2 sysctls, one
controlling overwrite and one enabling/disabling executable memfd) will
do.

But with that approach, I'm afraid that in one version of the future
(say, in 20 years), most applications will still use memfd_create with
the old API style, not setting the NX bit.

The current approach might seem less convenient, but I hope it offers a
bit of incentive for applications to migrate their code towards the new
API, explicitly setting the NX bit. I understand this hope is
questionable, and we might still end up in the same place in 20 years,
but at least I tried :-).

I will leave this decision to the maintainers when you supply patches
for that, and I wouldn't feel bad either way; there are valid reasons on
both sides.

To supplement, there are two other ways to get what you want:
1> Use seccomp to block MFD_EXEC, and leave the setting at 1.
2> Implement the blocking using a security hook and LSM, which imo is
   probably the most common way to deal with this type of request
   (block something).

I admit those two ways will be less convenient than just having the
sysctl do everything, from the user space's perspective.

Thanks
-Jeff
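
For readers following the thread, the behaviours being compared can be
sketched as a small model. This is only an illustration of the semantics
as described in this discussion, not kernel code: the policy names and
the resolve() helper are hypothetical, policy 3 ("overwrite+block") does
not exist as a sysctl value, and the model only says whether a call would
be rejected, not which errno it would return. The two flag values are
assumed to match the Linux UAPI header <linux/memfd.h>.

```python
# Flag bits, assumed to match <linux/memfd.h>.
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010

# Hypothetical policy names for this sketch. 0-2 follow the sysctl values
# discussed in the thread; 3 is the "overwrite+block" variant, which is
# only a proposal here.
EXEC_BY_DEFAULT = 0      # "=0": old-style calls get MFD_EXEC
NOEXEC_BY_DEFAULT = 1    # "=1": old-style calls are overwritten to noexec
BLOCK = 2                # "=2" (strict): only explicit MFD_NOEXEC_SEAL passes
OVERWRITE_AND_BLOCK = 3  # overwrite old-style calls, refuse MFD_EXEC


def resolve(policy, flags):
    """Return the effective flags, or None if the call would be rejected."""
    wants_exec = bool(flags & MFD_EXEC)
    wants_noexec = bool(flags & MFD_NOEXEC_SEAL)
    if wants_exec and wants_noexec:
        return None  # contradictory request
    old_style = not (wants_exec or wants_noexec)

    if policy == EXEC_BY_DEFAULT:
        return flags | MFD_EXEC if old_style else flags
    if policy == NOEXEC_BY_DEFAULT:
        return flags | MFD_NOEXEC_SEAL if old_style else flags
    if policy == BLOCK:
        # Old-style callers fail here, which is the migration concern.
        return flags if wants_noexec else None
    if policy == OVERWRITE_AND_BLOCK:
        if wants_exec:
            return None
        return flags | MFD_NOEXEC_SEAL
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # Show how an old-style call (no exec-related flag) fares under each policy.
    for policy in (EXEC_BY_DEFAULT, NOEXEC_BY_DEFAULT, BLOCK, OVERWRITE_AND_BLOCK):
        print(policy, resolve(policy, 0))
```

In this model, option 1> above ("=1" plus a seccomp filter that refuses
MFD_EXEC) gives the same results as OVERWRITE_AND_BLOCK: old-style calls
are rewritten to non-executable and explicit MFD_EXEC requests fail.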