From: Jeff Xu
Date: Tue, 28 May 2024 10:56:12 -0700
Subject: Re: [PATCH v1] memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`
To: Aleksa Sarai, Jeff Xu
Cc: David Rheinsberg, Barnabás Pőcze, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    dmitry.torokhov@gmail.com, Daniel Verkamp, hughd@google.com,
    jorgelo@chromium.org, skhan@linuxfoundation.org, Kees Cook
In-Reply-To: <20240524.160158-custard.odds.smutty.cuff-caukvmB4EWP9@cyphar.com>

Hi Aleksa,

On Fri, May 24, 2024 at 9:12 AM Aleksa Sarai wrote:
>
> On 2024-05-23, Jeff Xu wrote:
> > Regarding vm.memfd_noexec, on another topic.
> > I think in addition to vm.memfd_noexec = 1 and 2, there still could
> > be another state: 3
> >
> > =0. Do nothing.
> > =1. This will add MFD_NOEXEC_SEAL if application didn't set MFD_EXEC or
> > MFD_NOEXEC_SEAL (to help with the migration)
> > =2: This will reject all calls without MFD_NOEXEC_SEAL (the whole
> > system doesn't allow executable memfd)
> > =3: Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or
> > else it will be rejected.
> >
> > 3 is useful because it lets applications choose what to use, and
> > forces applications to migrate to new semantics (this is what 2 did
> > before 9876cfe8).
> > The caveat is 3 is less restrictive than 2, so must document it clearly.
>
> As discussed at the time, "you must use this flag" is not a useful
> setting for a general purpose operating system because it explicitly
> disables backwards compatibility (breaking any application that was
> written in the past 10 years!) for no reason other than "new is better".
>
Are you referring to the ratcheting in the sysctl in my original patch,
or is this something else?
I do not disagree with your change of "removing the ratcheting" from
the admin point of view.

> As I suggested when we fixed the semantics of vm.memfd_noexec, if you
> really want to block a particular flag from not being set, seccomp lets
> you do this incredibly easily without acting as a footgun for admins.

seccomp can do this, but it requires more work for the container, e.g.
the container needs to allow-list all the syscalls. I'm trying to point
out that seccomp might not cover all use cases. "Ratcheting" in
vm.memfd_noexec is lightweight and can be applied to the sandbox of
the container in advance, but since admins don't like ratcheting in the
sysctl, maybe prctl or an LSM is the way to implement such a restriction.

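For reference, the four states listed in the quoted proposal can be
written down as a small Python model. This is only a sketch under stated
assumptions, not kernel code: state 3 exists only as a proposal in this
thread, the function name apply_memfd_noexec and the use of
PermissionError for a rejected call are made up for illustration, and
state 2 follows the wording of the list literally (a call that passes
neither flag is rejected), which may not match how the kernel treats
such calls. The two flag values are the ones from
include/uapi/linux/memfd.h.

```python
# Flag values as defined in include/uapi/linux/memfd.h.
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010


def apply_memfd_noexec(level: int, flags: int) -> int:
    """Return the effective memfd_create() flags, or raise on rejection.

    Hypothetical model of the states described in the thread; level 3
    is a proposal only and is not implemented anywhere.
    """
    has_exec = bool(flags & MFD_EXEC)
    has_noexec = bool(flags & MFD_NOEXEC_SEAL)
    if has_exec and has_noexec:
        raise ValueError("MFD_EXEC and MFD_NOEXEC_SEAL are mutually exclusive")

    if level == 0:
        # "Do nothing": the flags are passed through unchanged.
        return flags
    if level == 1:
        # Migration aid: default to MFD_NOEXEC_SEAL when neither flag is set.
        if not has_exec and not has_noexec:
            return flags | MFD_NOEXEC_SEAL
        return flags
    if level == 2:
        # Literal reading of the list: every call without
        # MFD_NOEXEC_SEAL is rejected (assumption, see lead-in).
        if not has_noexec:
            raise PermissionError("executable memfd not allowed")
        return flags
    if level == 3:
        # Proposed: the caller must choose one of the two flags explicitly.
        if not has_exec and not has_noexec:
            raise PermissionError("MFD_EXEC or MFD_NOEXEC_SEAL required")
        return flags
    raise ValueError(f"unknown vm.memfd_noexec level: {level}")
```

In this model, level 3 accepts a call that sets MFD_EXEC while level 2
rejects it, which is the sense in which 3 is less restrictive than 2.
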
> Yes, vm.memfd_noexec can break programs that use executable memfds, but
> that is the point of the sysctl -- making vm.memfd_noexec break programs
> that don't use executable memfds (they are only guilty of being written
> before mid-2023) is not useful.
>
> In addition, making 3 less restrictive than 2 would make the original
> restriction mechanism useless. A malicious process could raise the
> setting to 3 and disable the "protection" (as discussed before, I really
> don't understand the threat model here, but making it possible to
> disable easily is pretty clearly).
> You could change the policy, but now
> you're adding more complexity for a feature that IMO doesn't really make
> sense in the first place.
>
The reason for 3 is to help with migration (not a threat model), e.g.
a container can force every app that runs in it to migrate its
memfd_create calls to use either MFD_EXEC or MFD_NOEXEC_SEAL.
But I understand what you mean: with the current code, adding 3 would
cause more confusion around vm.memfd_noexec. Perhaps a new sysctl or
prctl is the way to go if the app wants to force migration.

In hindsight, two sysctls would have worked better: the first deals
with migration, the second enforces MFD_NOEXEC_SEAL.

Thanks
-Jeff

> > -Jeff
> >
> > > Reviewed-by: David Rheinsberg
> > >
> > > Thanks
> > > David
>
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
>