From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B60AD70DE9 for ; Thu, 28 Nov 2024 17:46:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 22F616B0083; Thu, 28 Nov 2024 12:46:27 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1DFA56B0085; Thu, 28 Nov 2024 12:46:27 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A84A6B0088; Thu, 28 Nov 2024 12:46:27 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id E05D96B0083 for ; Thu, 28 Nov 2024 12:46:26 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 887791405AD for ; Thu, 28 Nov 2024 17:46:26 +0000 (UTC) X-FDA: 82836233004.05.61D0137 Received: from mail-ed1-f42.google.com (mail-ed1-f42.google.com [209.85.208.42]) by imf03.hostedemail.com (Postfix) with ESMTP id 68E6B20013 for ; Thu, 28 Nov 2024 17:46:21 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=RwIzM8Tb; spf=pass (imf03.hostedemail.com: domain of jannh@google.com designates 209.85.208.42 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732815979; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xfMtrktkr9Zu3Sugjz2kzAdSEEF9EQmcCqe5fFm09fQ=; b=ShvYpxWbhxB1xZjJGY+s6wOagXfhugmYvtgopAv/Ny8YwKJ4v34TnAtnFozgEWieUBLJzX QEGsHy+leegw2pmvyviz9u5PKJXn7HXiWVdsn74HiOFGi422a0qh+xSnWKWaLZk5hXuhWM B4ZkTkKioJ2l+C8j6qOrtgOUqM5c7oA= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=RwIzM8Tb; spf=pass (imf03.hostedemail.com: domain of jannh@google.com designates 209.85.208.42 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732815979; a=rsa-sha256; cv=none; b=E9YJiyk6eqaaRDSoSJg3sJ4+iqnRAyqBzY3sqHc5nCy8LR5vtjABB9Snfh3QLAMKUQMfAU rpmPIFjLg5NbNvsXww5lxRX0tOsuFOrkCAw8xfw5q0oMnz1h1xJS9pIAs3r4EaYo4YT9p8 zwLn7Cn0u8FkzoP04Axj+i4TnYL4X4A= Received: by mail-ed1-f42.google.com with SMTP id 4fb4d7f45d1cf-5cfc264b8b6so9098a12.0 for ; Thu, 28 Nov 2024 09:46:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1732815983; x=1733420783; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=xfMtrktkr9Zu3Sugjz2kzAdSEEF9EQmcCqe5fFm09fQ=; b=RwIzM8TbF9mqgXMoi27xnkI6VFuohca8Tp3rGwbHbVbLyh8DVAGj4htDKgqBgQX+e6 QOgRKbb3AVkVwvnXL+Abbov19f4kvcYZo5lKYSh64GZOUzytGCsq+xnmp7D5fogufWG4 NMLeDcRbyFpTZIDsrcSWoil1RLVp5KVA0ThJaPdps5zhiZA/1Px9TDqlO4gPQ9qct9dz AUrJiYpN6uC4kqL5sVxN/BP0uApCtRtfoEU/vKV9NQyTmd1lnQ1V/F3ln1ydkBh/P7En K7+CsAL0IyTHRi0o3Vokx+CVn3YrorUY7xxQofTncKBo9To5tJz3Vg6locFGJBzDvpxl lHKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732815983; x=1733420783; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xfMtrktkr9Zu3Sugjz2kzAdSEEF9EQmcCqe5fFm09fQ=; b=h6FQXvfSg4jkPMtxqC0kP+b/qPo1DVYLeB9JbaUT3Oa3nJ4umWiB0MTa88FRmMPtyK mV0zHnTIAYmYdr0HOHs1+DcvsZ/luJ9vEZfj+y/8nn9BizEWFkeUbFenlDon++/Fr2kc jsbCgpmB1oXgdRnXmc/iCdCOuF+r5kj4I5NX1+Kgo+6QLRlUV2K9nOJO0ql3qXhCRLm1 MPe7Z+kX6M6w+pI3JSqpxeR6bwQD3eW12IWcB7/+JzsrkQIdMiOo8K2OxTwlTGPq0U19 mybpw5kEVU+nyjkogGoyMn7CKjhr0GsVJYS+1QzGYL+v2+7nsMUklbdW3af1PKdOaoDe sdKA== X-Forwarded-Encrypted: i=1; AJvYcCV04vIF7IIK2Lui3BbukreC6RpZNBqurlRyCWFOjVYGvlxlfS6RMraLB0Se8jmNuFQqQ64W9z3WuQ==@kvack.org X-Gm-Message-State: AOJu0YwVJ+41pSnmiQcc8q8WtHptGT2LVz6gRQdIQQzupY8uEQJeBRh+ jLY2Y7whlA49xjrC53jYu1Q3vwLns4GUv073CkdU83eSTzppk6T87XOnBMGPpYkNGaohI+2CcnF 8hVb9+iOiUISNq8PvlkHqwPiIVnFKlBOZ2SuF X-Gm-Gg: ASbGncsuL9+lA/UYts5IcIcgmhxBXIqiK0cPqHHCdd8gJfwMEiwIVBRPgJlOfOr+lp5 CZcv4JDShwjvfILH24Tw3JFCw5sgBog2DPri+UW1a1WskUuWBAVVy54/IKbE= X-Google-Smtp-Source: AGHT+IFE4f8J+2OyfXEtsnJk8KQJKomP7u9NV3Lgu3wZ2xI/5CcK+/hsYP/OfXOTFHkLmSTMWEKADTKc0Ow3bC+lz64= X-Received: by 2002:a50:8e13:0:b0:5d0:9990:ac8b with SMTP id 4fb4d7f45d1cf-5d09990ad93mr52309a12.2.1732815982650; Thu, 28 Nov 2024 09:46:22 -0800 (PST) MIME-Version: 1.0 References: <99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com> In-Reply-To: <99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com> From: Jann Horn Date: Thu, 28 Nov 2024 18:45:46 +0100 Message-ID: Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd mappings read-only To: Lorenzo Stoakes , Julian Orth Cc: Andrew Morton , "Liam R . Howlett" , Vlastimil Babka , Shuah Khan , Linus Torvalds , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 68E6B20013 X-Rspamd-Server: rspam12 X-Stat-Signature: qn13oismxceeo5czibrotp9uk5rbnx91 X-Rspam-User: X-HE-Tag: 1732815981-74053 X-HE-Meta: U2FsdGVkX1/lqNJ3QHc68tWvHtDb1lEOKqvwZvQbpKr3Ui0sydesyA5fHfjN6wiesq9Lu++d5NNmMvk6BmavegWs6LaiqCdo42j6/HXzqxPh2xBu6ENm5OmO02QckdSoq7LqJa3ZVtGkoV5FVR8kO5QZFKCNg+zCVka+YggTKLegBs2PQ0tt1n0eJN++p+x//bcszDeZ6c4svOBayY8bb+Oag967mO39xZAvGKP7ht6cDWePLAg0PrNzynqei9it4XlmkRZQkyeVIAF7nvfZnesMhE+HKIVdI0M3ZcguScqrrES3Ad/kfigWJTQtbiccQ1grBGKihr24Npe4ooNL0oZPXkXLa6faSqqO1SR/My8ItFRFxIE3Ae0KEITPXdXbnvBycU0TVN3KnA6IVP+EGoUJBxVDWL0y5i5UGBqy+Mvf5TzAwA52n0Ukddv5jCk3051YdgV2u0corW4EQsRf3bKKcKeoJvYd6CpxZJsOSxwhGUOWV1F6GpPOMRTwK9LwlE6ci2KigGuLlHpqVJFJwPa2NmUMSUnxlZ0j148H75E5IteSdqyRbYqMj3ow4auKfyRCttnz3wCP07F1U9WDpjOzyCtgaH3fGmShg4Ltohc+Hd5l7MNfe8xwXXObtEWfVTh0UjLxjrTWaJ9i0EyCqpG0XqT6Sx4BIxO+U2Z858Z96YpkY7f3q5JYajm2j5SriUvOy/awW5pNQ6FrpfFXgA4zuGGjjNWwncJ/GolxSAYJbRohQ7CiTVPgBLB+rTN7dohpnqXbBr+KY/uDCIVz0z3VHIr1fDknftfrSFiL3Qj+qyuMvbScNWIF6gV/GvyTxDrl4r0ArqS6uiQdbmIFSdShTdJ8ZRqBPo6PwpHgk6xSQw38XtxQtoMgYdqfuenwIo5VV+k80nO+nXSIiHZWc4QtW/xZhR6qqvBZ6SkKvCdSS3w0KQVOnGnrXH+D4cnScpjRo6IvmLbadrhPPF4 iv8qE/Cx N2Magh6fI7sPR2E7KLNOj8TQ53/xz4OHK3gBlEWlLpRpjoFQLUBKVKza6xYNByVxKdmdV4IxR1IY1xrOzNtfusQ5Dyrc81wqe9eozBVL0ftvDFz9x3UsSFkaTY7AKNw4F7mTKs+/IbE6V4bmfega2pejlPqI9Ra06+oPUyIwyoLx/87lSsB+j3FOxKWl2O8mK6u5xVj+j/6h9sIOlPF9Al60Y8xKtjex/7gtXTR1PEFHT86rtDlr5E+hykQM2nLlvm//BtoV+OCy9N3Q9rslpw9K3oG2240YSze5jkdv8MsrW8vLcBDPgBDNyllW1NJNFoIYcRw0Sfo0GSK4mPO5B5igX4Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Nov 28, 2024 at 4:06=E2=80=AFPM Lorenzo Stoakes wrote: > In commit 158978945f31 ("mm: perform the mapping_map_writable() check aft= er > call_mmap()") (and preceding changes in the same series) it became possib= le > to mmap() F_SEAL_WRITE sealed memfd mappings read-only. > > This was previously unnecessarily disallowed, despite the man page > documentation indicating that it would be, thereby limiting the usefulnes= s > of F_SEAL_WRITE logic. > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE > seal (one which disallows future writes to the memfd) to also be used for > F_SEAL_WRITE. > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a > read-only mapping to disallow mprotect() from overriding the seal - an > operation performed by seal_check_write(), invoked from shmem_mmap(), the > f_op->mmap() hook used by shmem mappings. > > By extending this to F_SEAL_WRITE and critically - checking > mapping_map_writable() to determine if we may map the memfd AFTER we invo= ke > shmem_mmap() - the desired logic becomes possible. This is because > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will > have cleared. > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path > behaviour") unintentionally undid this logic by moving the > mapping_map_writable() check before the shmem_mmap() hook is invoked, > thereby regressing this change. > > We reinstate this functionality by moving the check out of shmem_mmap() a= nd > instead performing it in do_mmap() at the point at which VMA flags are > being determined, which seems in any case to be a more appropriate place = in > which to make this determination. > > In order to achieve this we rework memfd seal logic to allow us access to > this information using existing logic and eliminate the clearing of > VM_MAYWRITE from seal_check_write() which we are performing in do_mmap() > instead. If we already check is_readonly_sealed() and strip VM_MAYWRITE in do_mmap(), without holding any kind of lock or counter on the file yet, then this check is clearly racy somehow, right? I think we have a race where we intermittently reject shared-readonly mmap() calls? Like: process 1: calls mmap(PROT_READ, MAP_PRIVATE), checks is_readonly_sealed() process 2: adds a F_SEAL_WRITE seal process 1: enters mmap_region(), is_shared_maywrite() is true, mapping_map_writable() fails But even if we fix that, the same scenario would result in F_SEAL_WRITE randomly failing depending on the ordering, so it's not like we can actually do anything particularly sensible if these two operations race. Taking a step back, read-only shared mappings of F_SEAL_WRITE-sealed files are just kind of a bad idea because if someone first creates a read-only shared mapping and *then* tries to apply F_SEAL_WRITE, that won't work because the existing mapping will be VM_MAYWRITE. And the manpage is just misleading on interaction with shared mappings in general, it says "Using the F_ADD_SEALS operation to set the F_SEAL_WRITE seal fails with EBUSY if any writable, shared mapping exists" when actually, it more or less fails if any shared mapping exists at all. @Julian Orth: Did you report this regression because this change caused issues with existing userspace code? > Reported-by: Julian Orth > Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=3DU7s4L= _Jem5x3AaPZxrYpQ@mail.gmail.com/ > Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behavio= ur") > Signed-off-by: Lorenzo Stoakes