From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: David Hildenbrand
Cc: Andy Lutomirski, Linus Torvalds, David Laight,
 Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, "H. Peter Anvin", Al Viro, Alexey Dobriyan,
 Steven Rostedt, "Peter Zijlstra (Intel)", Arnaldo Carvalho de Melo,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Petr Mladek,
 Sergey Senozhatsky, Andy Shevchenko, Rasmus Villemoes, Kees Cook,
 Greg Ungerer, Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
 Vincenzo Frascino, Chinwen Chang, Michel Lespinasse, Catalin Marinas,
 "Matthew Wilcox (Oracle)", Huang Ying, Jann Horn, Feng Tang,
 Kevin Brodsky, Michael Ellerman, Shawn Anastasio, Steven Price,
 Nicholas Piggin, Christian Brauner, Jens Axboe,
 Gabriel Krisman Bertazi, Peter Xu, Suren Baghdasaryan, Shakeel Butt,
 Marco Elver, Daniel Jordan, Nicolas Viennot, Thomas Cedeno,
 Collin Fijalkovich, Michal Hocko, Miklos Szeredi, Chengguang Xu,
 Christian König, "linux-unionfs@vger.kernel.org", Linux API,
 the arch/x86 maintainers, linux-fsdevel@vger.kernel.org, Linux-MM,
 Florian Weimer, Michael Kerrisk
References: <20210812084348.6521-1-david@redhat.com>
 <87o8a2d0wf.fsf@disp2133>
 <60db2e61-6b00-44fa-b718-e4361fcc238c@www.fastmail.com>
 <87lf56bllc.fsf@disp2133>
 <87eeay8pqx.fsf@disp2133>
 <5b0d7c1e73ca43ef9ce6665fec6c4d7e@AcuMS.aculab.com>
 <87h7ft2j68.fsf@disp2133>
 <0ed69079-9e13-a0f4-776c-1f24faa9daec@redhat.com>
Date: Thu, 26 Aug 2021 17:13:52 -0500
In-Reply-To: <0ed69079-9e13-a0f4-776c-1f24faa9daec@redhat.com> (David Hildenbrand's message of "Thu, 26 Aug 2021 23:47:07 +0200")
Message-ID: <87mtp3g8gv.fsf@disp2133>
Content-Type: text/plain; charset=utf-8
Subject: Re: [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE

David Hildenbrand writes:

> On 26.08.21 19:48, Andy Lutomirski wrote:
>> On Fri, Aug 13, 2021, at 5:54 PM, Linus Torvalds wrote:
>>> On Fri, Aug 13, 2021 at 2:49 PM Andy Lutomirski wrote:
>>>>
>>>> I’ll bite.  How about we attack this in the opposite direction: remove the deny write mechanism entirely.
>>>
>>> I think that would be ok, except I can see somebody relying on it.
>>>
>>> It's broken, it's stupid, but we've done that ETXTBUSY for a _loong_ time.
>>
>> Someone off-list just pointed something out to me, and I think we should push harder to remove ETXTBSY.
>> Specifically, we've all been focused on open() failing with ETXTBSY, and it's easy to make fun of anyone opening a running program for write when they should be unlinking and replacing it.
>>
>> Alas, Linux's implementation of deny_write_access() is correct^Wabsurd, and deny_write_access() *also* returns ETXTBSY if the file is open for write.  So, in a multithreaded program, one thread does:
>>
>> fd = open("some exefile", O_RDWR | O_CREAT | O_CLOEXEC);
>> write(fd, some stuff);
>>
>> <--- problem is here
>>
>> close(fd);
>> execve("some exefile");
>>
>> Another thread does:
>>
>> fork();
>> execve("something else");
>>
>> In between fork and execve, there's another copy of the open file description, and i_writecount is held, and the execve() fails.  Whoops.  See, for example:
>>
>> https://github.com/golang/go/issues/22315
>>
>> I propose we get rid of deny_write_access() completely to solve this.
>>
>> Getting rid of i_writecount itself seems a bit harder, since a handful of filesystems use it for clever reasons.
>>
>> (OFD locks seem like they might have the same problem.  Maybe we should have a clone() flag to unshare the file table and close close-on-exec things?)
>>
>
> It's not like this issue is new (^2017) or relevant in practice. So no
> need to hurry IMHO. One step at a time: it might make perfect sense to
> remove ETXTBSY, but we have to be careful to not break other user
> space that actually cares about the current behavior in practice.

It is an old enough issue that I agree there is no need to hurry.

I also ran into this issue not too long ago when I refactored the
usermode_driver code.  My challenge was that, because the code was not
running in userspace, the delayed fput was not happening in my kernel
thread.  That meant that writing the file, then closing the file, then
execing the file consistently reported -ETXTBSY.
The kernel code wound up doing:

	/* Flush delayed fput so exec can open the file read-only */
	flush_delayed_fput();
	task_work_run();

As I read the code, the delay for userspace file descriptors is always
done with task_work_add, so userspace should not hit that kind of
silliness, and should be able to actually close the file descriptor
before the exec.

On the flip side, I don't know how anything can depend upon getting an
-ETXTBSY.  So I don't think there is any real risk of breaking userspace
if we remove it.

Eric