From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F5BFC001DF for ; Thu, 3 Aug 2023 00:06:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C9625280200; Wed, 2 Aug 2023 20:06:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C46272801EB; Wed, 2 Aug 2023 20:06:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B0E08280200; Wed, 2 Aug 2023 20:06:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id A1CC02801EB for ; Wed, 2 Aug 2023 20:06:32 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 4AF1DB2793 for ; Thu, 3 Aug 2023 00:06:32 +0000 (UTC) X-FDA: 81080851824.03.F99AD8B Received: from mail-ot1-f42.google.com (mail-ot1-f42.google.com [209.85.210.42]) by imf20.hostedemail.com (Postfix) with ESMTP id 71C861C0028 for ; Thu, 3 Aug 2023 00:06:30 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=chromium.org header.s=google header.b=Q43jhR9F; dmarc=pass (policy=none) header.from=chromium.org; spf=pass (imf20.hostedemail.com: domain of jeffxu@chromium.org designates 209.85.210.42 as permitted sender) smtp.mailfrom=jeffxu@chromium.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691021190; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=C5bV8lUVCZT9FHeRYVjRrJMMVO8kdf2AYYmYfyHoLRc=; b=QFUXosOBKva2JT9LAcXz8YWnkz2stdk4hwO82axp5G/9FHFa3sjj1fklk80DldZv5RKVsf trw5TbDKzolGcJ6LJR6WdKMO0tMIYqTcaX0iSBXItr0Aikzl4dkD2sEgevkVOWBZ5Zre79 XeVTnz87KSCuLUmfhdzmCPceJ2ZKlTQ= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=chromium.org header.s=google header.b=Q43jhR9F; dmarc=pass (policy=none) header.from=chromium.org; spf=pass (imf20.hostedemail.com: domain of jeffxu@chromium.org designates 209.85.210.42 as permitted sender) smtp.mailfrom=jeffxu@chromium.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691021190; a=rsa-sha256; cv=none; b=yoP8xVEhFwJRdpyIEaoQL+208cureicKdVJPuhUzv5b7lK0+Wnovpi0VLbHPuZ1m2yGVtR GA8AaoNK4P7TZ0uUyKUwP3BxykuN+0HNPsMTD5KD4zlIyfi5+3u4C70ofd+YqkmBGwxf1q XuW8kQSRhltKa4hVUPCcm3VfwlanFJs= Received: by mail-ot1-f42.google.com with SMTP id 46e09a7af769-6bc9de53ee2so390270a34.2 for ; Wed, 02 Aug 2023 17:06:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1691021189; x=1691625989; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=C5bV8lUVCZT9FHeRYVjRrJMMVO8kdf2AYYmYfyHoLRc=; b=Q43jhR9FPq4V4QC4bkK60Vp0Apj3fPeT3tJVRErdVrxxqmGP7j8AlgpeIN4vns2sU0 tqdIeChLPqzv02ps+kk8oeDO/zymqpAk8uWAdU66q36Oa8hGAF/n+zUkjCAgs8Ohl25w LP5y+j88CgWQDq42U0pMLOE8HrtCd30+O9//8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691021189; x=1691625989; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=C5bV8lUVCZT9FHeRYVjRrJMMVO8kdf2AYYmYfyHoLRc=; b=R3vkyeY0wHSijlnNIvHxQBBWq8Mz8gVQeJn84vB9oueV4xt8LfEs4xMuGcUwLddfPh z/YpbjUFbuTDdSKNdplGku6HJcdpuds8h0GmP59YjhyzWf7yLw4TT8kar74JMcMAcXs4 qGc+QUt92It0xXdmKx6Z6MpwjJrKd0D0N1aEDrbhbA/qY6AdOMZ0cACSE8D6j+j0Iaid FeXnT9OIsjn9pXWDkYat9TbCKkE0Na5mcQwTkAxN3LEmFIpE90KDXU9VE5TBnddA+2JW gBnVlwH6qo1d50pK8uHKOeprXbjTReWKJ+yEsUzkq+Tll2+HtZ5T+5K1GSJZzkho9wRi JBDQ== X-Gm-Message-State: ABy/qLb1W3rqnNvg5IJKplHcUP72xdAb3wqw+PVhFFbC10/6ApjqlH+G WfvARoM2eKNtpGu/++lGtznNi9ubTkCjMP6uTDLWyw== X-Google-Smtp-Source: APBJJlHULcxpHCqbXXv011nCfdvIzK7LvTT08X/UtG0LktiOwZGcV63sf1mW3qYe23ClG3POXFIKjL7Zxpyd3M8n7zw= X-Received: by 2002:a05:6871:1c6:b0:1ba:53ed:18c9 with SMTP id q6-20020a05687101c600b001ba53ed18c9mr18741812oad.37.1691021189390; Wed, 02 Aug 2023 17:06:29 -0700 (PDT) MIME-Version: 1.0 References: <20230714114753.170814-1-david@readahead.eu> In-Reply-To: From: Jeff Xu Date: Wed, 2 Aug 2023 17:06:17 -0700 Message-ID: Subject: Re: [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC To: David Rheinsberg Cc: Jeff Xu , linux-kernel@vger.kernel.org, Andrew Morton , Kees Cook , Daniel Verkamp , linux-mm@kvack.org, Peter Xu , linux-hardening@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 71C861C0028 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: ejak7ab5i7c8nr75xsx8ccj9o73a7fyh X-HE-Tag: 1691021190-258457 X-HE-Meta: U2FsdGVkX1+2LqeQrpDg6lq2hQjdvgnYPAts9fe55TGyK8eRFzFbHpSY+JyYp0STvuzKdkE/ghiRF8IAHALyv8jk/PztH1Z+7l90fIG6D3vL3zkiUTOr1T42AaW7GIhN1KYXenUwYZXjYHJomzeRBEz8x1rCwRsCfr+txL18E8ehjrlLzyF6ERpjigmu6nk7suhrymSFFUmaZbG4MFRBK1izzvXLODEUq0GzB31eQWqN1hAylUGTX4Z3nv6e7mpHdk+CoqHSutXkCqfC3aqAKLQc17WD6lgyd/EJWfCu18JoyPBEIkCUU21Rz/WlRNeSqp0l1dXj+9eSs+0y2+fdXpHrTQPvFNDDBePiR8OACBHBac/V3E2lKRp/DH26ieeLAlqgppBhw/WANu3P4QXdUUTOhS+5AzdMqg6J5IQ83Q3aRtcHRQl7AxgzV4T+JP0OSvmut1OP7FtLEG20xJacMFpaRx7cY0/wA7C7jUPlxrVwT3JQlCsLmb5erGkKT3RlwreeBOQOmWoyFu47ON2G+MHdlksvaTc3h8VkuhWgFM6bdfBTYtz0Ctk+JfBCkO0CJHHSvCJr+pmbaGCK5KffRMhxYtt/h7oc0yL3rg3zpNudM+MW4Gu3F8mq/fibOB/ZtvowzbpJ1RuTHtQwMYjxWdLiPHj4VJczJrd+kUmQzLxS/YqzPcoeMqG8MUFadC8Mb985sub423ED0L4xtf3n8Aatgf3nBpawgDsNJ/rspFinkrnp6s75mNlVow8YtwHFqDwk0P+718w1Fey5u8biVRAwF0FQbEq1JNMUDBfHZM9S1xe1EOiPgbpDKNskk54Ig4DUuvyP/Ps9pmk9ROa4niLN2PvuXhbQbHNhSXKWi6mTo66xavdfdeag0JFngnBlyq9/2Bywh7hu2TVuufelB2JUxkBpKxhT5NoTSybN+y1bWwtknbYu1XXW37sJyKTybYK3HpSvtOTEtUpYVPX 8eErT46X upJMRrOAIwfgRRQlN9fIgTG+qGOy8nef4XpIJ9iVEijyJ4il4J6VItJb6OyAg9nCD9kTyx6F1P52tMRMWTb44uW5mjm6ZWbqwMiO6mZL2zS7OxZ88mG2HBC8TJb5cMj45KJDLoWW/XRI6urPHLrJrov7N5kC/EO/jORjmxZPf0R5atQJNKJY+b3bibqZmIMk1rsqHQmPJIGw37z38G+gSpfLMUNFqTL+oMk/3SRpykwQOWfeMikCO5AOtzqbsVYHlyeHlaUA4A3We9ydmNTTKtr/YD3V1TBF/AOe/TYE/hPjHndk420hvR3DSYxnV31hE1C5Mj2Ufuv5wYELQfWq0lLT/ag== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000017, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Aug 2, 2023 at 12:58=E2=80=AFAM David Rheinsberg wrote: > >> I think I didn't get my point across. Imagine an application that does= *NOT* use sealing, but uses memfds. This application shares memfds with un= trusted clients, and does this in a safe way (SIGBUS protected). Everything= works fine, unless someone decides to enable `vm.memfd_noexec=3D2`. Sudden= ly, the memfd will have sealing enabled *without* the application ever requ= esting this. Now any untrusted client that got the memfd can add seals to t= he memfd, even though the creator of the memfd did not enable sealing. This= client can now seal WRITES on the memfd, even though it really should not = be able to do that. > >> > >> (This is not an hypothetical setup, we have such setups for data shari= ng already) > > > > Thanks, this helps me understand your point better. > > > > I'm not convinced that sysctl needs to consider the threat model of > > "someone" changing and breaking an application. If we follow that > > threat model, there are a lot of other sysctls to worry about. > > > > Also, in the system that you described, if memfd is handled to an > > untrusted process, not only "sealing" can cause damage, but also > > chmod, arbitrary rw, imo the right approach is to harden the process > > or mechanism of passing the memfd. > > No. The model I describe is carefully designed to hand out file-descripto= rs to inodes that the clients have *no* access to. They cannot run fchmod(2= ), unlink(2), etc. All they can do is operate on the open file. And all acc= ess to this shared file is properly guarded against possible damage the oth= er concurrent clients can do. The entire model is already hardened against = malicious actors. > > With the new sysctl, a new attack-vector is introduced, which was not pos= sible before. > > I was *explicitly* told to add `MFD_ALLOW_SEALING` for that exact reason = when introducing memfd_create(2). So I am a bit baffled why it is now ok to= enable sealing behind the users back. > > I agree that the new sysctl is a root-only option. But I fail to see *why= * it implies `MFD_ALLOW_SEALING`? This behavior is not documented nor is it= explained in the original commit-messages, nor mentioned *anywhere*. > > >> Thus, setting the security-option `memfd_noexec` *breaks* applications= , because it enables sealing. If `MFD_NOEXEC_SEAL` would *not* imply `MFD_A= LLOW_SEALING`, this would not be an issue. IOW, why does =C2=B4MFD_NOEXEC_S= EAL` clear `F_SEAL_SEAL` even if `MFD_ALLOW_SEALING` is not set? > >> > > > > If MFD_NOEXEC_SEAL is not desired, then it should not be used to > > overwrite memfd_create() in this system. > > > > For the question of why the sysctl adding a seal without application > > setting it , the rationale here is, as summary of previous/this > > emails: > > I still think we are not talking about the same thing. I completely under= stand why you add the seal! I am just questioning why you *CLEAR* `F_SEAL_S= EAL`? That is, why do you enable `MFD_ALLOW_SEALING` without the user reque= sting it? You could just set `F_SEAL_EXEC` without clearing `F_SEAL_SEAL`. = And then require `MFD_ALLOW_SEALING` on top to clear `F_SEAL_SEAL`. > Ah, I apologize. I didn't read it carefully enough and misunderstood you, thanks for clarification. The reason that F_SEAL_SEAL is cleared, is that MFD_NOEXEC_SEAL implies MFD_ALLOW_SEALING, and it seems to be reasonable that application might want to use sealing e.g I image application write the content to memfd then adding F_SEAL_WRITE. Your point is that MFD_ALLOW_SEALING should not be implied by MFD_NOEXEC_SEAL. An application should still explicitly set MFD_ALLOW_SEALING. To me, MFD_NOEXEC_SEAL, the _SEAL part implies to allow sealing, but of course, this might not be so clear to anyone other than me :-) , documentation is indeed necessary. And with the context you described, now I think your approach is better: 1> application set MFD_NOEXEC_SEAL, with MFD_ALLOW_SEALING F_SEAL_EXEC is set, F_SEAL_SEAL is clear. 2> Application set MFD_NOEXEC_SEAL, without MFD_ALLOW_SEALING F_SEAL_EXEC and F_SEAL_SEAL are set. > [...] > >> The downside of `MFD_NOEXEC` is that it might be picked over `MFD_NOEX= EC_SEAL` by uneducated users, thus reducing security. But right now, the al= ternative is that existing code picks `MFD_EXEC` instead and never clears t= he executable bit, because it is a hassle to do so. > >> > > > > Yes. This is the downside I was thinking about. > > > > I lean to believe the kernel API shouldn't be feature rich, it could > > be simple, optimized towards the majority of user cases, and ideally, > > is self-explained without devs to look through documentation. For > > example, if I had to choose one to implement between MFD_NOEXEC and > > MFD_NOEXEC_SEAL, I would choose MFD_NOEXEC_SEAL because it should be > > what most users care about. > > Well, if we were to go back, we would make MFD_NOEXEC(_SEAL) the default = and just add `MFD_EXEC` :) > > >> Or is there another reason *not* to include `MFD_NOEXEC`? I am not sur= e I understand fully why you fight it so vehemently? > >> > > > > I wouldn't add it myself, I hope to convince you not to :-). > > If you still think it is beneficial to add MFD_NOEXEC (saving one > > chmod call and making it easy to use), I wouldn't feel bad about that. > > I would suggest going with documentation to help devs to choose > > between those two, i.e. recommend MFD_NOEXEC_SEAL in most cases. > > Any application that cannot use `F_SEAL_EXEC` (e.g., because its peers ve= rify for historic reasons that the seal is not set) now has to do an extra = dance to get the "safer" behavior, rather than getting the "safer" behavior= by default. That is, we make it easier to get the unsafe behavior than to = get the safe behavior (in this particular scenario). > Without `MFD_NOEXEC`, it is easier to end up with a 0777 memfd than not. = I want the application that desires `S_IXUSR` to jump through hoops, not th= e application that does *not* require it. > I see your points now, i.e. the "disallow sealing entirely" is at least as important as "not able to chmod to add X". I think the reasonable mid-ground is perhaps adding MFD_NOEXEC support, with some documentation to help dev to choose between MFD_NOEXEC and MFD_NOEXEC_SEAL Would you like to update your patch to the last version on Andrew's branch, adding selftest, and perhaps help for documentation ? Thanks! -Jeff