From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A29C2D70DEB for ; Thu, 28 Nov 2024 18:19:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D708B6B0083; Thu, 28 Nov 2024 13:19:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D200F6B0085; Thu, 28 Nov 2024 13:19:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BE93C6B0088; Thu, 28 Nov 2024 13:19:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id A07BA6B0083 for ; Thu, 28 Nov 2024 13:19:01 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 2E9ECA064F for ; Thu, 28 Nov 2024 18:19:01 +0000 (UTC) X-FDA: 82836315030.21.D041D67 Received: from mail-ed1-f45.google.com (mail-ed1-f45.google.com [209.85.208.45]) by imf08.hostedemail.com (Postfix) with ESMTP id 7194A16000C for ; Thu, 28 Nov 2024 18:18:54 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=YAYbuLRo; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of jannh@google.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732817932; a=rsa-sha256; cv=none; b=pkbt5+TL5XzVhCilkYtothEsSST0zMWHi46ku7QonJpZRKxNyGK2QHFLE7HJHXmUfro094 PMDfcCW66/5AI2Qys2bNCYqsTxvZTypOhWBrOHyN+NoI5CoHsxGJKCohbymhm9xwSO9ZrC GQlPJfoHcu57WuP9ydrm72wYZyJNoS4= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=YAYbuLRo; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of jannh@google.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732817932; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jBQKGMceh2eim8+DXFzAVIWaviXIbhui6CqSR6Gz0ns=; b=lG1/f8SwoabBGNQRd/3fDyJu1KOl7zs6eG19H24RhQFzrSuQ2WfO0ZNKHs4hE+VTRu3iUZ 4Nm5IsEYrBCLNWzXsgWHG3Dgw80/55EzhjOfJxKK0wUtXq4mf9rONJNqF12iS6DZ8YqP/4 wRIurqvlToAa/ZUNCY6H0ubyHTldY9E= Received: by mail-ed1-f45.google.com with SMTP id 4fb4d7f45d1cf-5cfb81a0af9so9030a12.0 for ; Thu, 28 Nov 2024 10:18:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1732817938; x=1733422738; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jBQKGMceh2eim8+DXFzAVIWaviXIbhui6CqSR6Gz0ns=; b=YAYbuLRo3ng+zjJF6JOqQBMwVZb30wHLHTKFG76V4ox5Br3qkYZLKWr6EDTnhoTjp8 H0Vuo0YELWTKihjIr8HTqSdUbuoLQkQXoWMEy2fYej0IWOs2Mh39pXHiwLQSsJ6wTvb8 idhjDXXBsupSaUzAXHwHH7GSLVgXvH7SznnSLa0/cIQLbCOUDZv+tB4+/TtAU60m6oKQ gM+z4AGxnzcIR/5rz/0LlCKNBg/tmguH5frZe81szvktF+SNx/AxZIZ5UEJvI0uk8lyW VXZ6pCvJ+JS4wyxz5nwpsO29bNcKyJoQNsmeYajzFbtSdhw0YFdeHT2sp8tSTd0f4W5i FdFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732817938; x=1733422738; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jBQKGMceh2eim8+DXFzAVIWaviXIbhui6CqSR6Gz0ns=; b=f8a5XqEw80cB3bPC7YW4/djoX2FtaEVsFe27StYEclOa12G60o6MaQ7IvtzEQgipQC dCcy59ua6obqeK++ry1v3/VPDx0ntyG45ouDaBTagHAi2rjB5W6v4qpWqux27d9r3FH/ rZIkjNHp+LD0IMK8ISpQE8Th8yy70fiKz6cXSrnwxsSc5ozpd+ojBA02h2BqBBhoVqrJ MAiWjthxWwuQkgcOyLKO3bs3ESSaaTJzn7VoM3RixPltwlqrb1zqIKvku755yZVlHGke WpyQrz8UjVlLX2Stb1GJNtf8BRu4MKSJamOp35GkwxlwwECFBDNPeeHqyQ0F0ZXhfs5r W9hQ== X-Forwarded-Encrypted: i=1; AJvYcCUN5bwZGbLDDniyKePBtfMWPw1WFWTsOgjiLcCNvvIF20ChN39vdBknAbhhrKVL1/l+55VOFmFXng==@kvack.org X-Gm-Message-State: AOJu0YzWJXuiBjLRnugWoqDxIM7hv2HiffRH/GGBSO35rEhgcYzTMLk5 pL2VgTOI7IlLB0PaeTlQdx1JPVwP/qx/NW4J2m83/3tRqq/qvJQ3yPJSukrDnrJRGLFbOCYWDiY 9NRxlKBgfltdmaMlahXihcy8PAwMSjJPFpZFO X-Gm-Gg: ASbGncsFgj8M1LFs9u+hDeijx2YN/tdHTIJ7EyT71IrmgeUcm+0hBIcVL8LM6JoM9rL ZjlJLdOojYkub8vwuZ54C/Kpgqqrtj7VoEL6kZVMx5toR3djY8wGid70Hwlc= X-Google-Smtp-Source: AGHT+IEDndyfEI17lLdZ9F2PAmODz4L955CZSUO/X2tz83vpI1YhVZbQrg1GudU5w1kIHL5tz118RYG5Aa1xsq8ESrE= X-Received: by 2002:a50:ed8f:0:b0:5d0:acf3:e3a6 with SMTP id 4fb4d7f45d1cf-5d0acf3e47amr497a12.1.1732817937353; Thu, 28 Nov 2024 10:18:57 -0800 (PST) MIME-Version: 1.0 References: <99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com> <8219cfd3-f488-492c-8d4c-26e9f0169e8e@lucifer.local> In-Reply-To: <8219cfd3-f488-492c-8d4c-26e9f0169e8e@lucifer.local> From: Jann Horn Date: Thu, 28 Nov 2024 19:18:20 +0100 Message-ID: Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd mappings read-only To: Lorenzo Stoakes Cc: Julian Orth , Andrew Morton , "Liam R . Howlett" , Vlastimil Babka , Shuah Khan , Linus Torvalds , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: zi84wokm4pqbf39mugeyazdem1d1y7fe X-Rspam-User: X-Rspamd-Queue-Id: 7194A16000C X-Rspamd-Server: rspam08 X-HE-Tag: 1732817934-335332 X-HE-Meta: U2FsdGVkX19j6uPLCstuqPGS8BzMzoR+NqAcLjOC4tD7ExiVmrqLwk8ESBr2JyTgcq7fZBaM5U4v78+v50WPbKbdHqIRVXlPEV4bbl8x+u5HArcblzepv58D8y8YR/S3uJTdIeBvRkkuYZGxf9y/H+moIOK6su+uSgEBtREvKECHCxtwwzL15jwdFbLdwvKu9wPDCSfVi6+cXPLmSqfoQQTzEwOFl4vOhKuvjGLL8NBXrVlH0SXYGXEksLIIpgk7JEEDRV+NtJhQkShwrce5VEDKSVf+g7S9XQmy+vPOz0o0ooFTbHBONu1pjtJCCxgfnnHdRVVHVx+eOoox7N6bXwNvAZe8KvyvyMdFHONGZNeLcqrRIOlA/C5sFk8le8Cogsl4q1/YeipGQmeL1MBF0pWcIxLzsi8Jfrl9/7a3mbFqT8cW6/t5WVbPj673Ii54+7h06ws30xGjFP2bBHS57I9e9CygBD94OR5M5R0eI/jNOuSypoXtrevGTogHLmYVxpp7z7UxH53WXJMnVhMQi1eo96FaZcTH7TjvM2x7kWchfURefnF7rd3xdMlAaSilCHvhrIrwBq9x1+Z27LGUxxvtvDnGDLAqp43G4yDvBuBHdubipMYVWrPCLgdhcEFuHnkHLePJCDOpSzdLM/ZshWnh2QrrvxaqvD56nwajZbn+DxU3RpDQqh6EkmSXpFCDHhAPCTYwJcUJ6dY0dmb6VO6F/DotVSpxf5ZaW5hHyxNIyYEFZeQohKXCBXEVTDSjni/PpMJ5ACQ8WYfRU1p/t/e2VWkTOhZCOh0Q7MbZ39qSyRB/S8VAJJzW37MdSJ0tTvc4CxWxGnmXY3gBlz/qfQdKWSTX1A3BGi6l29CAsxf9eIn7LR1x0OgmzhvfnIke12zmBa/3j21/V7/Vm5CPtmfuc8y1IgYoFXojThcwR/l3m8KthdtPlE9DIAzybtzAbr5RyK2tRcHbpdlbWY8 1ToxmRrQ 2FbDV1FjxFU+n6TwkAIQbW7rTRgzKyEBs725cpYYg8EdJMzOOOpkoYx/jQfQ8hD46Vt+tszXwRIIX2/VOfNmJuMuJN/XXXyBVp6Zp8TmAyAjUMamNNkZaADMQvFEcz1ZGK2G95+XPFmTM1PAHf7tlkv4IRkxyFFhKYm2G3VaniIoZkb9/IJKC4vmJsm1aWIF7NS9ic4nGCS+mCLDqxXEVSk/SICJPk3jfL26LgnuJE/8to5jcu4ZBCbkrw17z9c5I9CIjDDhf5OZvam/YOJvNWFSPU6ezXbo1OsUBXMRAtKMpxjOoBUuFdtVxOE1quZpf1K4ZcnTDW0a8+CI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Nov 28, 2024 at 7:05=E2=80=AFPM Lorenzo Stoakes wrote: > On Thu, Nov 28, 2024 at 06:45:46PM +0100, Jann Horn wrote: > > On Thu, Nov 28, 2024 at 4:06=E2=80=AFPM Lorenzo Stoakes > > wrote: > > > In commit 158978945f31 ("mm: perform the mapping_map_writable() check= after > > > call_mmap()") (and preceding changes in the same series) it became po= ssible > > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only. > > > > > > This was previously unnecessarily disallowed, despite the man page > > > documentation indicating that it would be, thereby limiting the usefu= lness > > > of F_SEAL_WRITE logic. > > > > > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WR= ITE > > > seal (one which disallows future writes to the memfd) to also be used= for > > > F_SEAL_WRITE. > > > > > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for = a > > > read-only mapping to disallow mprotect() from overriding the seal - a= n > > > operation performed by seal_check_write(), invoked from shmem_mmap(),= the > > > f_op->mmap() hook used by shmem mappings. > > > > > > By extending this to F_SEAL_WRITE and critically - checking > > > mapping_map_writable() to determine if we may map the memfd AFTER we = invoke > > > shmem_mmap() - the desired logic becomes possible. This is because > > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we wi= ll > > > have cleared. > > > > > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path > > > behaviour") unintentionally undid this logic by moving the > > > mapping_map_writable() check before the shmem_mmap() hook is invoked, > > > thereby regressing this change. > > > > > > We reinstate this functionality by moving the check out of shmem_mmap= () and > > > instead performing it in do_mmap() at the point at which VMA flags ar= e > > > being determined, which seems in any case to be a more appropriate pl= ace in > > > which to make this determination. > > > > > > In order to achieve this we rework memfd seal logic to allow us acces= s to > > > this information using existing logic and eliminate the clearing of > > > VM_MAYWRITE from seal_check_write() which we are performing in do_mma= p() > > > instead. > > > > If we already check is_readonly_sealed() and strip VM_MAYWRITE in > > do_mmap(), without holding any kind of lock or counter on the file > > yet, then this check is clearly racy somehow, right? I think we have a > > race where we intermittently reject shared-readonly mmap() calls? > > > > Like: > > > > process 1: calls mmap(PROT_READ, MAP_PRIVATE), checks is_readonly_seale= d() > > process 2: adds a F_SEAL_WRITE seal > > process 1: enters mmap_region(), is_shared_maywrite() is true, > > mapping_map_writable() fails > > I don't think this matters? Firstly these would have to be threads unless= you > are going out of your way to transmit the memfd incompletely set up via a= socket > or something, and then you'd have to be doing it on the assumption that i= t > wouldn't race? Ah, I guess that's true. > The whole purpose of this change is to _allow read-only mapping *at all*_= . Not > to avoid silly races that are the product of somebody doing stupid things= . > > > > > But even if we fix that, the same scenario would result in > > F_SEAL_WRITE randomly failing depending on the ordering, so it's not > > like we can actually do anything particularly sensible if these two > > operations race. Taking a step back, read-only shared mappings of > > F_SEAL_WRITE-sealed files are just kind of a bad idea because if > > someone first creates a read-only shared mapping and *then* tries to > > apply F_SEAL_WRITE, that won't work because the existing mapping will > > be VM_MAYWRITE. > > I don't think so? > > If they try to do that, attempting to apply the seal will fail as write w= ill be > disallowed. So there's no risk of overriding the seal. > > The idea is you establish a buffer, write into it, unmap, write-seal, and= now > you can mmap() it PROT_READ. > > Obviously it's not sensible (or really probably sensibly feasible) to try= to > find every VMA that has it opened VM_READ | VM_MAYWRITE and clear the > VM_MAYWRITE, so instead we simply disallow that scenario. > > But it's totally reasonable to be able to mmap() it readonly afterwards. > > > > > And the manpage is just misleading on interaction with shared mappings > > in general, it says "Using the F_ADD_SEALS operation to set the > > F_SEAL_WRITE seal fails with EBUSY if any writable, shared mapping > > exists" when actually, it more or less fails if any shared mapping > > exists at all. > > No, it's when any writable mapping exists after my changes :) but people > might not be quite aware of the distinction between VM_MAYWRITE and > VM_WRITE. To clarify, do you read "writable" as "VM_MAYWRITE|VM_SHARED"? > > @Julian Orth: Did you report this regression because this change > > caused issues with existing userspace code? > > > > > Reported-by: Julian Orth > > > Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=3DU= 7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/ > > > Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path beh= aviour") > > > Signed-off-by: Lorenzo Stoakes > > In any case, we are not discussing my original patch in 6.6 that permitte= d > this behaviour, whether you agree or disagree it was sensible, we have > regressed user-visible behaviour, this change restores it. Hm, yeah, you're right.