Date: Wed, 28 Jun 2023 20:42:41 +0900
From: Dominique Martinet
To: jeffxu@chromium.org
Cc: skhan@linuxfoundation.org, keescook@chromium.org,
 akpm@linux-foundation.org, dmitry.torokhov@gmail.com,
 dverkamp@chromium.org, hughd@google.com, jeffxu@google.com,
 jorgelo@chromium.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
 linux-hardening@vger.kernel.org, linux-security-module@vger.kernel.org,
 kernel test robot
Subject: Re: [PATCH v8 3/5] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
References: <20221215001205.51969-1-jeffxu@google.com>
 <20221215001205.51969-4-jeffxu@google.com>
In-Reply-To: <20221215001205.51969-4-jeffxu@google.com>

jeffxu@chromium.org wrote on Thu, Dec 15, 2022 at 12:12:03AM +0000:
> From: Jeff Xu
>
> The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to
> set executable bit at creation time (memfd_create).
>
> When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
> (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
> be executable (mode: 0777) after creation.
>
> when MFD_EXEC flag is set, memfd is created with executable bit
> (mode:0777), this is the same as the old behavior of memfd_create.
>
> The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>    MFD_EXEC was set.
> 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
>    MFD_NOEXEC_SEAL was set.
> 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.

So, erm, I'm a bit late to the party but I was just looking at a way of
blocking memfd_create+exec in a container and this sounded perfect: my
reading is that this is a security feature meant to be set for
container's namespaces that'd totally disable something like
memfd_create followed by fexecve (because we don't want weird binaries
coming from who knows where to be executed on a shiny secure system),
but... is this actually supposed to work?
(see below)

> [...]
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -263,12 +264,14 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
>  #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
>  #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
>
> -#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
> +#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | MFD_NOEXEC_SEAL | MFD_EXEC)
>
>  SYSCALL_DEFINE2(memfd_create,
>  		const char __user *, uname,
>  		unsigned int, flags)
>  {
> +	char comm[TASK_COMM_LEN];
> +	struct pid_namespace *ns;
>  	unsigned int *file_seals;
>  	struct file *file;
>  	int fd, error;
> @@ -285,6 +288,39 @@ SYSCALL_DEFINE2(memfd_create,
>  		return -EINVAL;
>  	}
>
> +	/* Invalid if both EXEC and NOEXEC_SEAL are set.*/
> +	if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC_SEAL))
> +		return -EINVAL;
> +
> +	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
> +		[code that checks the sysctl]

If flags already has either MFD_EXEC or MFD_NOEXEC_SEAL, you don't check
the sysctl at all.
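
For illustration, the flag handling as described in the quoted commit
message and hunk can be modelled in userspace roughly as below. This is
only a sketch, not kernel code: model_effective_flags() and
MODEL_REJECTED are made-up names, the per-value behaviour comes from the
commit message rather than from the elided sysctl code, and the model
returns a plain -1 where the real syscall would return an errno.

```c
#include <stdio.h>

/* Flag values as defined in the uapi header <linux/memfd.h>. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

/* Hypothetical marker for "memfd_create() would be rejected". */
#define MODEL_REJECTED (-1L)

/*
 * Hypothetical userspace model of the quoted logic.
 * Returns the flags memfd_create() would end up using,
 * or MODEL_REJECTED if the call would fail.
 */
static long model_effective_flags(unsigned int flags, int sysctl_val)
{
	/* Both flags at once are invalid. */
	if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC_SEAL))
		return MODEL_REJECTED;

	/* The sysctl is only consulted when neither flag was passed. */
	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
		switch (sysctl_val) {
		case 0:
			flags |= MFD_EXEC;
			break;
		case 1:
			flags |= MFD_NOEXEC_SEAL;
			break;
		default:
			return MODEL_REJECTED;
		}
	}

	return (long)flags;
}
```

In this model, model_effective_flags(MFD_EXEC, 2) returns MFD_EXEC: an
explicit MFD_EXEC goes through whatever the sysctl value is.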
This can be verified easily:
-----
$ cat > memfd_exec.c <<'EOF'
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

int main()
{
	/* explicitly ask for an executable memfd */
	int fd = memfd_create("script", MFD_EXEC);
	if (fd == -1)
		perror("memfd");

	char prog[] = "#!/bin/sh\necho Ran script\n";
	if (write(fd, prog, sizeof(prog)-1) != sizeof(prog)-1)
		perror("write");

	char *const argv[] = { "script", NULL };
	char *const envp[] = { NULL };
	/* only returns on failure */
	fexecve(fd, argv, envp);
	perror("fexecve");
}
EOF
$ gcc -o memfd_exec memfd_exec.c
$ ./memfd_exec
Ran script
$ sysctl vm.memfd_noexec
vm.memfd_noexec = 2
-----

(as opposed to failing hard on memfd_create if the flag is unset with
sysctl=2, and failing on fexecve with the flag unset and sysctl=1)

What am I missing?


BTW I find the current behaviour rather hard to use: setting this to 2
should still set NOEXEC by default in my opinion, and just refuse
anything that explicitly requested EXEC.

Sure there's a warn_once that memfd_create was used without seal, but
right now on my system it's "used up" 5 seconds after boot by systemd:
[    5.854378] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=1 'systemd'

And anyway, older kernels will barf up EINVAL when calling memfd_create
with MFD_NOEXEC_SEAL, so even if userspace wants to adapt it'll need to
try calling memfd_create with the flag once and retry on EINVAL, which
let's face it is going to take a while to happen.
(Also, the flag has been added to glibc, but not in any release yet)

Making calls default to noexec AND refuse exec does what you want
(forbid use of exec in an app that wasn't in a namespace that allows
exec) while allowing apps that require it to work; that sounds better
to me than making all applications that haven't taken the pain of
adding the new flag fail.
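
For reference, the try-once-and-retry-on-EINVAL dance mentioned above
could look roughly like the sketch below. memfd_create_noexec() is a
hypothetical helper name, not an existing API, and the fallback #define
assumes the flag value from the uapi header for libcs that do not ship
it yet.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>

/* Not in every libc release yet; value from the uapi header. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

/*
 * Hypothetical helper: ask for a non-executable, sealed memfd first,
 * and retry without the flag if the kernel rejects it with EINVAL
 * (which is what a kernel that does not know the flag does).
 */
static int memfd_create_noexec(const char *name, unsigned int flags)
{
	int fd = memfd_create(name, flags | MFD_NOEXEC_SEAL);

	if (fd == -1 && errno == EINVAL)
		fd = memfd_create(name, flags);

	return fd;
}
```

EINVAL is also returned for other invalid arguments, in which case the
retry simply fails the same way. On a kernel without the flag the memfd
from the retry is neither non-executable nor sealed, so this only helps
compatibility, not enforcement.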

Well, I guess an app that did require exec without setting the flag will
fail in a weird place instead of failing at memfd_create and having a
chance to fall back, so it's not like it doesn't make any sense; I don't
have such strong feelings about this if the sysctl works, but for my use
case I'm more likely to want to take a chance at memfd_create not
needing exec than having the flag set. Perhaps a third value if I cared
enough...

-- 
Dominique Martinet | Asmadeus