From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF0AFD70DEB for ; Thu, 28 Nov 2024 18:28:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 81D7C6B008C; Thu, 28 Nov 2024 13:28:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7CB836B0092; Thu, 28 Nov 2024 13:28:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 66BFB6B0093; Thu, 28 Nov 2024 13:28:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 4B0796B008C for ; Thu, 28 Nov 2024 13:28:25 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id D02DF1205A5 for ; Thu, 28 Nov 2024 18:28:24 +0000 (UTC) X-FDA: 82836338886.02.50B841C Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com [209.85.208.54]) by imf11.hostedemail.com (Postfix) with ESMTP id EDDC640013 for ; Thu, 28 Nov 2024 18:28:14 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=pbg8orJE; spf=pass (imf11.hostedemail.com: domain of jannh@google.com designates 209.85.208.54 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732818499; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tv8OCKFpQ5lMoZqDLhvzgUyfFZsdPxYXsLEcWpy9h0c=; b=JjC/xOMAwdrQJX9IZFFhKQM/mH8XqDiKLe3f/OET/4Ivcf3pORBbhMGFrkfox+cEG839Qo hvzmgFstiE/qPDE+b0CTGLPgUEXr/WZW9WIxhXvT0pSUlxW9zuFsZnCo9ei6lz0Zo9wffL kzbhBYvWkaON2Dhq0lWjZ6b+aNc+OfY= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=pbg8orJE; spf=pass (imf11.hostedemail.com: domain of jannh@google.com designates 209.85.208.54 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732818499; a=rsa-sha256; cv=none; b=pQui2AyihMIcJ9TjPh0jEffQIpMzRu1cTqP0rZtwAMfzQ9dRN1c/qoUNsQ0othGVu8z1ib m7Cwzv14o856G0NR8fe05hp48qxy0xbpyt77zav4VW56RJOVX/xWpVzuo/X8cDX8gPIzlU jydFPLywmAAzvJV6PUX6QIekDLQdiNc= Received: by mail-ed1-f54.google.com with SMTP id 4fb4d7f45d1cf-5d0aab1aafcso1016a12.1 for ; Thu, 28 Nov 2024 10:28:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1732818501; x=1733423301; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=tv8OCKFpQ5lMoZqDLhvzgUyfFZsdPxYXsLEcWpy9h0c=; b=pbg8orJEO7Cp9LFDwnnu/snHY2vvHU3EG+ift9gDLoelLE20nlpRRwNZp0QDTtS3al zvjFgMtK84UwlCZuQekXhGUPzlYWRTH3yRa2q9N0YI2dTOqD1gilTFRoV6+PNZn52aYQ Y8nMe7KTxj4T/uaXJZHNesosvHq2VhxHbEJOybSiurRewXgMOGLg7Vc3Lsk1T9Iff1dv NMcXudlVnWx+r/hPsshvvck/MQ2edZE12oksSUc27a8xa6RvtSt9ZzbxBU7SHOLvgZOP dFWDkiv/BlOG0kwy22zZSDdKK5DZHEDJu1ZLteOLCg89WkYCm84T0aoSYZTEr/6WAi61 fdjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732818501; x=1733423301; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tv8OCKFpQ5lMoZqDLhvzgUyfFZsdPxYXsLEcWpy9h0c=; b=nPJnqQpn0s3/Utkx8o/7EWeW+QfeoBG4acXI8WbVPyVh+j2N2Y2GVO6k+SYcyYoQja +M3njJfhIHnqSU19UuRjqKiNwtZJX8BAndfrbO5vuCNG9awnIC8wvfYzZZmqeffQFw7B 1A0YArhUgxIQRShWFE2VxRBYFJrW3PpDTBQkOLiN2+yIavsvpyb1WLVhKY7CirSxzRSQ XaHG4rKl/u+jjTSAcJsNiafmPzhr33kV35zqrFCVceS+nTy95yVzPNN9E1nj+sxU4zuH 2+UDDzRyUFE+QZeilLkraxuhZ7HuF99rv1JSntlr+eRiWhZF7Ld1kDXHckyCjqxW9X+o 1f+A== X-Forwarded-Encrypted: i=1; AJvYcCVXSY9kSe6oIds+vrug23ODHNOay67EWIBhi/MQF2F5OqYIf65zTZU36LhjQfmGnrY2JY5HIMiAIQ==@kvack.org X-Gm-Message-State: AOJu0YzCS6MRzbvFVD3WyCxwZgECqMD9SvzjspA8NQ3fxohwQSbfkrc+ W1YRR5gQXYh8vWWQ5IbA3oojauYuSNHC7s4HNSOu48X31qKLmygpiXM2oF12YMrjD4cozbfUShQ xS8Okk3Oz95EMHAm2++jXNqBEOiLzius0Cm4p X-Gm-Gg: ASbGncvoIrp32rTROSGgX+9vN4J4AFYaJSjNxIDRnLAaqhKVUjhMd8CL8uAkMr+Z4F0 XmLs3C1wjTjQ9yCO/DOMUCM7ugPFC6hFSBtjKmFrA0kjvNsMAHZTLcPLAi1U= X-Google-Smtp-Source: AGHT+IF4elDJ5oQsFPW4V2HQgCdcZ4+fbW09x2aFjr1nn8V0wIL/8OB/nG4BdD5OkcuJB/VhQgqqTieDfTBuz8J4w7Q= X-Received: by 2002:a50:ed8f:0:b0:5d0:acf3:e3a6 with SMTP id 4fb4d7f45d1cf-5d0acf3e47amr752a12.1.1732818500889; Thu, 28 Nov 2024 10:28:20 -0800 (PST) MIME-Version: 1.0 References: <99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com> <03678794-2e09-4b93-aacb-90ca6ab36682@lucifer.local> In-Reply-To: <03678794-2e09-4b93-aacb-90ca6ab36682@lucifer.local> From: Jann Horn Date: Thu, 28 Nov 2024 19:27:44 +0100 Message-ID: Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd mappings read-only To: Lorenzo Stoakes Cc: Julian Orth , Andrew Morton , "Liam R . Howlett" , Vlastimil Babka , Shuah Khan , Linus Torvalds , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: EDDC640013 X-Stat-Signature: 6h94t3q5fup3eyha1ne9xmqfm5fsoqm8 X-Rspam-User: X-Rspamd-Server: rspam11 X-HE-Tag: 1732818494-365968 X-HE-Meta: U2FsdGVkX189NX1LREptSpQ6Hmk1zWABW9JNGrC1O44R/WBwVnnL5sfCtWRbS13cGVTu0+PhV7kgMfbGBW+wYqdFC0K3quIdNMknX+qIafd4GPLrJ9rvFwB2E1RTE+UT7dR/b9AYbi4aExgN0CMlkDRTDrLxCOWY6zHl14HoPMx/qgZhg8GYOnPQ1LWiv3EI8mGlCnycdEVejXNjceHlxkacalkedfvyhGyG1e0wURvPQIxqB4DHc+h8d8qu++fYpRB44MrG0bAZOzf63b4XHZcS6wtRC1uEbyzp1RzMJ6yIz0VdMU/vk2H9eqkkJKxK5Vpmd6Ulz7xiaa3kCc+2oGC4UC6I6VPu9sXhxz01pmse0IbOsu80xwVxmsvTTYn4lsE32g2URp/9Y+W3H++Vf5KLJDG5B9EqNf7zulhe2vAeUtRY9gBAuchmfNJkHuuEjvXOMEOoddBGjrOHDsoxIjC8f+SZSLfauN/LdM/+sptiGiFANk9W6nRLp3iUVgzvNY7wjff95QPKwvrM1QPM4H2JzW9eE1VE235wEZcDPNgbOUeQVJsBDxweUwOZhbLYkL8MK4wUTYUQcRe5p2r5n1HwPdV/d2sVNXL6W2jA+N4PBNWi0JFdB7dSruisrbMbGOaL12rTSEt5RTPrU/J+O2GNPrROyc8/XuIYpwLcCrpM4eJ2wO6sYM6F/yGBtpUQzWQCjpJKIlg/Fh/aixNal17IXVDNy9yW+oqOb8tiZnXW8aFUPCuLiQpbj74nVHkNb47QaWjZ+GUkb+R6SW7OIK1KlQJQnA/PRlK0sVRmVjhoYHWm901S9ue4nQPqnUFsQUNU/WPBBbpkt/qCzdbRILLLazCaQAPJoAqo5B/T5ItJ7JDIAe/i2Wp0D5LSoPUlYTT/MdlJH4WRaMwSqqF8HqU6cTUh5DFynq6gn/GKV+KEvymvSXNjQQA3hE23rLiDtX0pxjWcV5JDs9l0IJT Wf4LURvT SKaa0XUytRs8Hx+T4BU3cxREeKTS4MuMxxQVMVux/jDZC7WrGLmlRc6uZwjbTY3vwKeNSf2YoRRve6fxKM16ZaTLuvdtb4BBocgNTY3u/JLBnH2jxURqXC8W7mxskpcuc57yAGZCms70K8mMUE3fKeloppH+uF1sdFWN4VERp24kcAOsqc5tZWUJwGZeTgDofcngl6/H3Uzumdnwr6hVf9GFMU7Qt/NhclamgJ6oHEhYtrtRdkTORVF8nOypBi3I3ZW/hbaWHu0LCTvYPDBLxT82hkJQzEhXjIWOv+9M0oC08MttrCb0s/M7/Ju8ric+RHCloqGWVEMuCcFo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Nov 28, 2024 at 7:21=E2=80=AFPM Lorenzo Stoakes wrote: > On Thu, Nov 28, 2024 at 06:58:21PM +0100, Julian Orth wrote: > > (Re-sending the message below since I forgot to reply-all) > > > > On Thu, Nov 28, 2024 at 6:46=E2=80=AFPM Jann Horn wr= ote: > > > > > > On Thu, Nov 28, 2024 at 4:06=E2=80=AFPM Lorenzo Stoakes > > > wrote: > > > > In commit 158978945f31 ("mm: perform the mapping_map_writable() che= ck after > > > > call_mmap()") (and preceding changes in the same series) it became = possible > > > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only. > > > > > > > > This was previously unnecessarily disallowed, despite the man page > > > > documentation indicating that it would be, thereby limiting the use= fulness > > > > of F_SEAL_WRITE logic. > > > > > > > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_= WRITE > > > > seal (one which disallows future writes to the memfd) to also be us= ed for > > > > F_SEAL_WRITE. > > > > > > > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE fo= r a > > > > read-only mapping to disallow mprotect() from overriding the seal -= an > > > > operation performed by seal_check_write(), invoked from shmem_mmap(= ), the > > > > f_op->mmap() hook used by shmem mappings. > > > > > > > > By extending this to F_SEAL_WRITE and critically - checking > > > > mapping_map_writable() to determine if we may map the memfd AFTER w= e invoke > > > > shmem_mmap() - the desired logic becomes possible. This is because > > > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we = will > > > > have cleared. > > > > > > > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path > > > > behaviour") unintentionally undid this logic by moving the > > > > mapping_map_writable() check before the shmem_mmap() hook is invoke= d, > > > > thereby regressing this change. > > > > > > > > We reinstate this functionality by moving the check out of shmem_mm= ap() and > > > > instead performing it in do_mmap() at the point at which VMA flags = are > > > > being determined, which seems in any case to be a more appropriate = place in > > > > which to make this determination. > > > > > > > > In order to achieve this we rework memfd seal logic to allow us acc= ess to > > > > this information using existing logic and eliminate the clearing of > > > > VM_MAYWRITE from seal_check_write() which we are performing in do_m= map() > > > > instead. > > > > > > If we already check is_readonly_sealed() and strip VM_MAYWRITE in > > > do_mmap(), without holding any kind of lock or counter on the file > > > yet, then this check is clearly racy somehow, right? I think we have = a > > > race where we intermittently reject shared-readonly mmap() calls? > > > > Apropos race, some time ago I reported a way to get a mutable mapping > > for a write-sealed memfd via a race: > > > > https://bugzilla.kernel.org/show_bug.cgi?id=3D219106 > > Kind of hard to read rust code, but it looks like you're intentionally > trying to race sealing on the assumption it's atomic when it's not? That > doesn't seem like a bug? > > The intent of sealing memfds is you establish the memfd buffer, then seal > it and _only then_ expose it elsewhere. > > I may be missing something here, however. AFAIU memfds are supposed to also guarantee *to the recipient* of the shared memfd that the memory inside it won't change anymore, so that the recipient can parse data out of this shared memory buffer without having to worry about the data concurrently changing. udmabuf_create() looks like it indeed breaks that assumption by first calling check_memfd_seals() and then doing udmabuf_pin_folios() without any lock that prevents a seal being added in between those. That's also why we have memfd_wait_for_pins(), which ensures that folios in the memfd don't have elevated refcounts when a F_SEAL_WRITE seal is added. (I believe that's one of the major differences in usecases of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE: F_SEAL_FUTURE_WRITE is enough for cases where only the creator of the memfd wants to prevent other tasks from writing into it, while F_SEAL_WRITE is suitable for cases where the creator and recipient of the memfd want mutual protection.)