From: Jeff Xu
Date: Mon, 21 Aug 2023 12:04:57 -0700
Subject: Re: [PATCH v2 0/5] memfd: cleanups for vm.memfd_noexec
To: Aleksa Sarai
Cc: Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp, Christian Brauner,
    Dominique Martinet, stable@vger.kernel.org, linux-api@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org
In-Reply-To: <20230819.022033-joyful.ward.quirky.defender-lpHlCTglJUSs@cyphar.com>

On Fri, Aug 18, 2023 at 7:50 PM Aleksa Sarai wrote:
>
> On 2023-08-15, Jeff Xu wrote:
> > On Mon, Aug 14, 2023 at 1:41 AM Aleksa Sarai wrote:
> > >
> > > The most critical issue with vm.memfd_noexec=2 (the fact that passing
> > > MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
> > > tree[2], but there are still some outstanding issues that need to be
> > > addressed:
> > >
> > >  * vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
> > >    because it will make it far too difficult to ever migrate. Instead it
> > >    should imply MFD_EXEC.
> > >
> > >  * The dmesg warnings are pr_warn_once(), which on most systems means
> > >    that they will be used up by systemd or some other boot process and
> > >    userspace developers will never see it.
> > >
> > >    - For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
> > >      rate-limited message to the kernel log is necessary to tell
> > >      userspace that they should add the new flags.
> > >
> > >      Arguably the most ideal way to deal with the spam concern[3,4]
> > >      while still prompting userspace to switch to the new flags would be
> > >      to only log the warning once per task or something similar.
> > >      However, adding something to task_struct for tracking this would be
> > >      needless bloat for a single pr_warn_ratelimited().
> > >
> > >      So just switch to pr_info_ratelimited() to avoid spamming the log
> > >      with something that isn't a real warning. There's lots of
> > >      info-level stuff in dmesg, it seems really unlikely that this
> > >      should be an actual problem. Most programs are already switching to
> > >      the new flags anyway.
> > >
> > >    - For the vm.memfd_noexec=2 case, we need to log a warning for every
> > >      failure because otherwise userspace will have no idea why their
> > >      previously working program started returning -EACCES (previously
> > >      -EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.
> > >
> > >  * The ratcheting mechanism for vm.memfd_noexec makes it incredibly
> > >    unappealing for most users to enable the sysctl because enabling it
> > >    on &init_pid_ns means you need a system reboot to unset it. Given the
> > >    actual security threat being protected against, CAP_SYS_ADMIN users
> > >    being restricted in this way makes little sense.
> > >
> > >    The argument for this ratcheting by the original author was that it
> > >    allows you to have a hierarchical setting that cannot be unset by
> > >    child pidnses, but this is not accurate -- changing the parent
> > >    pidns's vm.memfd_noexec setting to be more restrictive didn't affect
> > >    children.
> > >
> > That is not exactly what I said though.
>
> Sorry, I probably should've phrased this as "one of the main arguments".
> In the last discussion thread we had in the v1 of this patch, it was my
> impression that this was the primary sticking point.
>
> > From ChromeOS's position, allowing downgrade is less secure, and this
> > setting was designed to be set at startup/reboot time from the very
> > beginning, e.g. via the kernel command line or as part of the
> > container runtime environment (passed down to the sandboxed container).
>
> If this had been implemented as a cmdline flag, it would be completely
> reasonable that you need to reboot to change it. However, it was

You might already know that sysctls can be set on the kernel command
line (e.g. sysctl.vm.memfd_noexec=2), thanks to Vlastimil Babka from
SUSE. [1]

[1] https://lore.kernel.org/lkml/20200325120345.12946-1-vbabka@suse.cz/

> implemented as a sysctl and the behaviour of sysctls is that admins can
> (generally) change them after they've been set -- even for
> security-related sysctls such as the fs.protected_* sysctls. The only
> counter-example I know of is the YAMA one, and if I'm being honest I think
> that behaviour is also weird.
>
> > I understand your viewpoint: from another distribution's point of view,
> > the original design might be too restrictive, so if the kernel wants
> > to weigh more on ease of admin, I'm OK with your approach.
> > Though it is less secure for ChromeOS - i.e. we do try to prevent
> > arbitrary code execution as much as possible, even for CAP_SYS_ADMIN.
> > And with this change, it is less secure and one more possibility for
> > us to consider.
>
> FWIW I still think the threat model where a &init_user_ns-privileged
> CAP_SYS_ADMIN process can be tricked into writing a sysctl should be
> protected against by memfd_create(MFD_EXEC) doesn't really make sense
> for the vast majority of systems (if any).
>
I agree that other distributions might not care much about
CAP_SYS_ADMIN running arbitrary code on the host, similar to traditional
Unix in this respect. ChromeOS has some unique security features.

> If ChromeOS really wants the old vm.memfd_noexec=2 behaviour to be
> enforced, this can be done with a very simple seccomp filter. If applied
> to pid1, this would also not be possible to unset without a reboot.
>
In practice, the host and a process can have different values for
vm.memfd_noexec, so it can't easily be implemented through seccomp.
Seccomp also generally requires no_new_privs to be set, and there are
implications if we set it on pid 1 and apply it to all its children.

> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
>

Thanks
Best regards,
-Jeff
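
The quoted cover letter says userspace should add the new flags to its
memfd_create(2) calls. A minimal sketch of what that could look like from a
program's side, in Python for brevity. The helper names memfd_flags() and
create_memfd() are hypothetical, the flag values are copied from
include/uapi/linux/memfd.h, and the EINVAL fallback assumes that kernels
predating the flags reject them as unknown:

```python
import errno
import os

# Flag values from include/uapi/linux/memfd.h, defined here because the
# os module may not export the two newer ones.
MFD_CLOEXEC = 0x0001
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010


def memfd_flags(executable: bool) -> int:
    """Return flags that always state the exec intent explicitly."""
    return MFD_CLOEXEC | (MFD_EXEC if executable else MFD_NOEXEC_SEAL)


def create_memfd(name: str, executable: bool = False) -> int:
    """Create a memfd with an explicit exec flag (hypothetical helper).

    Assumption: a kernel without MFD_EXEC/MFD_NOEXEC_SEAL fails with
    EINVAL, in which case we retry with the old-style flags and get the
    kernel's default behaviour. Any other error (such as the -EACCES
    mentioned in the thread) is passed on to the caller.
    """
    try:
        return os.memfd_create(name, memfd_flags(executable))
    except OSError as err:
        if err.errno != errno.EINVAL:
            raise
        return os.memfd_create(name, MFD_CLOEXEC)
```

Calling create_memfd("scratch") asks for a non-executable memfd. After the
fallback path the result is whatever the older kernel does by default, so a
caller that depends on the exec seal should verify it (for instance with
F_GET_SEALS) rather than assume it.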