From: Jeff Xu
Date: Tue, 15 Aug 2023 22:13:18 -0700
Subject: Re: [PATCH v2 4/5] memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
To: Aleksa Sarai
Cc: Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp, Christian Brauner, Dominique Martinet, stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e10ba6@cyphar.com>
References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> <20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e10ba6@cyphar.com>

On Mon, Aug 14, 2023 at 1:41 AM Aleksa Sarai wrote:
>
> This sysctl has the very unusual behaviour of not allowing any user (even
> CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you
> were to set this sysctl to a more restrictive option in the host pidns
> you would need to reboot your machine in order to reset it.
>
> The justification given in [1] is that this is a security feature and
> thus it should not be possible to disable.
> Aside from the fact that we
> have plenty of security-related sysctls that can be disabled after being
> enabled (fs.protected_symlinks for instance), the protection provided by
> the sysctl is to stop users from being able to create a binary and then
> execute it. A user with CAP_SYS_ADMIN can trivially do this without
> memfd_create(2):
>
>   % cat mount-memfd.c
>   #include
>   #include
>   #include
>   #include
>   #include
>   #include
>
>   #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
>
>   int main(void)
>   {
>       int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
>       assert(fsfd >= 0);
>       assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2));
>
>       int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
>       assert(dfd >= 0);
>
>       int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0782);
>       assert(execfd >= 0);
>       assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
>       assert(!close(execfd));
>
>       char *execpath = NULL;
>       char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
>       execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
>       assert(execfd >= 0);
>       assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
>       assert(!execve(execpath, argv, envp));
>   }
>   % ./mount-memfd
>   this file was executed from this totally private tmpfs: /proc/self/fd/5
>   %
>
> Given that it is possible for CAP_SYS_ADMIN users to create executable
> binaries without memfd_create(2) and without touching the host
> filesystem (not to mention the many other things a CAP_SYS_ADMIN process
> would be able to do that would be equivalent or worse), it seems strange
> to cause a fair amount of headache to admins when there doesn't appear
> to be an actual security benefit to blocking this. There appear to be
> concerns about confused-deputy-esque attacks[2] but a confused deputy that
> can write to arbitrary sysctls is a bigger security issue than
> executable memfds.
>

Something to point out: the demo code might be enough to prove your case
on other distributions, but on ChromeOS you can't run this code. The
executables in ChromeOS all come from known sources and are verified at
boot. If an attacker could run this code on ChromeOS, the attacker would
already have acquired arbitrary code execution by other means. At that
point the attacker no longer needs to create or find an executable
memfd; they already have the vehicle. You can't use an example of an
attacker already running arbitrary code to prove that disabling
downgrading is useless.

I agree that it is a big problem if an attacker can already modify a
sysctl. Assuming this can happen by controlling the arguments passed
into sysctl, the attacker might not have full arbitrary code execution
yet at that time; that is the reason the original design is so
restrictive.

Best regards,
-Jeff
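
For readers following the thread, the two write rules being debated can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel's code: the function names, the error codes, and the reading of "hierarchy" as "a namespace may not be less restrictive than its parent" are all assumptions made for illustration. The levels 0, 1 and 2 stand for the documented vm.memfd_noexec values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: highest vm.memfd_noexec level (levels are 0, 1, 2). */
enum { NOEXEC_LEVEL_MAX = 2 };

/*
 * Ratchet rule (assumed model of the original design): a write may keep
 * or raise the current level, but never lower it, whoever the writer is.
 */
int ratchet_write(int *level, int requested)
{
	assert(level != NULL);
	if (requested < 0 || requested > NOEXEC_LEVEL_MAX)
		return -EINVAL;
	if (requested < *level)
		return -EPERM;	/* lowering is refused */
	*level = requested;
	return 0;
}

/*
 * Hierarchy rule (assumed reading of the proposal): a namespace may pick
 * any level that is at least as restrictive as its parent's level, so
 * lowering is allowed down to the parent's level.
 */
int hierarchy_write(int *level, int parent_level, int requested)
{
	assert(level != NULL);
	if (requested < 0 || requested > NOEXEC_LEVEL_MAX)
		return -EINVAL;
	if (requested < parent_level)
		return -EPERM;	/* cannot be laxer than the parent */
	*level = requested;
	return 0;
}
```

In this model, a level of 2 written under the ratchet rule can never be lowered again, which corresponds to the reboot requirement described in the quoted text. Under the hierarchy rule the same level can be lowered later, but never below what the parent allows.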