From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAE15D70DE8 for ; Thu, 28 Nov 2024 18:27:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 424E96B0083; Thu, 28 Nov 2024 13:27:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3D3CF6B0085; Thu, 28 Nov 2024 13:27:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 274066B0088; Thu, 28 Nov 2024 13:27:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 089196B0083 for ; Thu, 28 Nov 2024 13:27:16 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 8FD001C6165 for ; Thu, 28 Nov 2024 18:27:15 +0000 (UTC) X-FDA: 82836335778.21.B0A24D7 Received: from mail-yw1-f178.google.com (mail-yw1-f178.google.com [209.85.128.178]) by imf04.hostedemail.com (Postfix) with ESMTP id 4EDDA40003 for ; Thu, 28 Nov 2024 18:27:05 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="ZFKS/4sb"; spf=pass (imf04.hostedemail.com: domain of ju.orth@gmail.com designates 209.85.128.178 as permitted sender) smtp.mailfrom=ju.orth@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732818425; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Xr2wcVj7SaU2JW4+8FivLZoOgTtjpwjka3vtiBZxC6g=; b=G+gdhanspDDESQrIWIP8DWfM5PZFmsNstkyIBjyoI8S3HMywBpx1Xf/t33PnW+4twk99hG ClO/nxWFlYnzofpzLMS/4bmxPoOMjZAtUXV3/w3gvgLbHa1VSKSHz3UVX+bN+lX51OITfT Tu2+FiG4KM9myTYjN7E26DAl/uuerpA= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="ZFKS/4sb"; spf=pass (imf04.hostedemail.com: domain of ju.orth@gmail.com designates 209.85.128.178 as permitted sender) smtp.mailfrom=ju.orth@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732818425; a=rsa-sha256; cv=none; b=B61qmpCoG44a76PMx7FyFoK25nrsIwAbYkdYjiXEN35lbt9ACzuuf4yEt8X+EaJaxelbBZ RAKhJDZvwcMw+c9ipJ3Ott9d2LB5y3RkxoXM8FGN1vokLMsd54IggzW/8DgJA41UQzIkJh 5OdSDfmvhRbCUMRMPs/oR5dGRR4y2Wg= Received: by mail-yw1-f178.google.com with SMTP id 00721157ae682-6ef4084df4cso10771947b3.1 for ; Thu, 28 Nov 2024 10:27:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1732818433; x=1733423233; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Xr2wcVj7SaU2JW4+8FivLZoOgTtjpwjka3vtiBZxC6g=; b=ZFKS/4sbwZSSTn3QGD5HFoit/xFlITj/Lxll3dUA8CSpwcp3yg1zjghIfWFiT4sUZn dDAInBmdSe20TzHRwf0pso8vIntUw9RXLdy3bHrUjmT75pccJIZ1gv9w1J8xa5eXJfAn ic4rvM3Lu/cN/EVwgh2UWlT5N3763QMEAHarxuN1x9/uWRpbOICGuuk83ra0GvzLq1LE DvTWavfkZbMMWRwDRcjjnKcCCPmCCDq8jexVg/T0Al4NyXmMu4oFTxyQJEj3YZU+cbvp qLksMpqxiT5whhL0OWszvXfI7MiXP9rBXTYjZ+j4U7JuE8ZmmrK64pACWG6jO+bE8TVX L3uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732818433; x=1733423233; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Xr2wcVj7SaU2JW4+8FivLZoOgTtjpwjka3vtiBZxC6g=; b=FTjdao1iw9RsQtixGMUIIHxfd8v53/6tESBXt20KBt1Uq3yO1IaDnxLKmItAV+/Tug uqevVqhJoLfzUFbZ3M0vED6b0Vtz3oDXGZY5g0Hvj6ZKCXJ1rlQ9tGukCuqwvLFEFWBf eiMTjxnn6CKBlHzAFtf0S1nFVWtnkBkR0Ze9dZ0UsBd2Pl7DFL0dLSZeP9bzkpA/IYuG tWU+4yi150yCEIGcVlBGE7LNSBJf11OkPxz4cU3zvHYhNWZm11VlnWbRUdnQZ45K6H7m F2L2RYXboFkoF5vHeHJdzwQWDZIgEFzlzOhfkOswmeLEZT1647JyGaMj7YZPt+3aK14q nunA== X-Forwarded-Encrypted: i=1; AJvYcCVVH4bgdVwPrmn6x293AyWM8kkQ8xrYe0rqKi5BaQ/fXYbeTB4Dn3VJd+sLS3eheMomw6/I7gePpQ==@kvack.org X-Gm-Message-State: AOJu0YzPYhPKtNm6ZBtpDVrPxULOCB3TlSnuimMC1Al/W3hztjqVDd55 tsgGs+Xkgs0yb4bKh7MsCevRzljCUOi2YYMZ4H/sz2+/plyCsTUsQD+V5mbfg+SEOwtv4L5RgT+ xcXgWFThbv6y+EwTZcupZD/cOhkM= X-Gm-Gg: ASbGncuZk6aMqS5VEvAVFuWwdPr/OKpTchWJIpe+at+EwU+1iVKTUW1gWKboBfcVh3h AxcWeN+ZUKrsVlILK5MayQHnEGV3Vsftyp1cfo5c0 X-Google-Smtp-Source: AGHT+IH5navH/hx/UmIJ7Jn9h1O9wNGtyaLnCSk0q13jWCma7ZDVYjqdUYNzrcTvbe1Nct2UWvwIS4TGMzJAYblSi84= X-Received: by 2002:a05:690c:2506:b0:6ee:baf5:a4e8 with SMTP id 00721157ae682-6ef3729d86bmr92951457b3.39.1732818432814; Thu, 28 Nov 2024 10:27:12 -0800 (PST) MIME-Version: 1.0 References: <99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com> <03678794-2e09-4b93-aacb-90ca6ab36682@lucifer.local> In-Reply-To: <03678794-2e09-4b93-aacb-90ca6ab36682@lucifer.local> From: Julian Orth Date: Thu, 28 Nov 2024 19:27:02 +0100 Message-ID: Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd mappings read-only To: Lorenzo Stoakes Cc: Jann Horn , Andrew Morton , "Liam R . Howlett" , Vlastimil Babka , Shuah Khan , Linus Torvalds , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 4EDDA40003 X-Rspam-User: X-Stat-Signature: bx7gqwwatr161qqd8d1ejod7ez94y86z X-HE-Tag: 1732818425-6885 X-HE-Meta: U2FsdGVkX1/N9LbQ7LMpxF9M8Hsbx8NhqiLROaf9s1fjENC0E25jC13EkVCBW8IQx04lJRJUYBfH94QS4WDuBoUWmj+hULfxFAubnypPXhn8aZep9TY8LwB0opEGiBhxQljDZv0+/P5GyFksyPyz25s4+RPc3mRWx3aHBRib2+c1GHp3Pd1WjAE1ROflr22gGLf3hBhr/XDuwL2epf1M0ZmZABmCxA4dZVey8IJdUUXNqM44r54sbuHxwwGwiqE1SgOb5WlPe734cbIFwj3jNHXQi037eFOJ+ZAe2uuYwo7vvoFQ4GfHl287eh17JQ3FySJ0GGRcmEYsHX0hMCfPCsIu2Cq7JiJMbaEbGfgWHt2WtUpRwrP2mh83ETiKUYiXCDRmXlk1Bmcbx2VlOU/9DIysxS4seJgePDd9+yE0Ta70tAmQs6Tx76LY2Srg1+vdNMVNIC8tKR2HG/V+RVY0XsxuaW5eVuQO40K6Zmi3HHcFQn2QhdbSMsgUpnZLKKu/ldCxLZlWobNJAdeJqQTQcCvDttIaVXMDpeHXZ8YW7DYqIjxlstmXChH4356OuwhXYHxQcQCVfRuUA9PovryVFYIYvTg/qok0lZ3zFlhd1siIrjY+wKWBUGcFBewBClDNJ5kRsP6QkhWtuJHmjkdb+vGt/mlForS174kGWxhaPZUoBV8ncTTv1ax/7E+1pZRPEt7SEuuHZD1kqNn1Bpp/pkHYuJXe8UX9yLkdwatB4+yyRpe16euncB0aEiXRt/dAZTcv9kCgl/JPGQf5C7URO/SgWNVGoaz7JmtGyV7bLCqVNzcsnINreMdhwKXw88K/jAbRzLFjAZx4bGHYfjY5NYwHXjAbrGMXmUDTMe6uHywJZ5FS0KGKzbzqSSGwtdWuyOIuWcOd6a0Kc9sbiAMdPSJBSSKkizHkQCBtYMFkKMTlEAMaBA+tlRm/ztNNuBiXqnnhirGxh21pwOs0UCN njHUyv0W zyPwy2UOKLwseZtUsHomTR6bpDDEX9dZiUnBnJaM8PqqwBv6fWvMXkyDgvBHiy1NccFpcy3wA3jBDEuYg40SkULwHR9EqHeXe/JKTXDa9iYnPhCDo47D1CnucjFif2nn28pINdG/dciCpOxK2bpaI8xljJVLrBxew9yMmGUxsWvVk+5/ibuZe2jbWTnbjtG7EdjYw4f3OXubyMho/lvVuoF8ox6RIawVmIPUSLpIcHaix9pCEfx4IE4a2c+IskxqQNYOs6Xl2fxIbo+pnZcrt3iiL6uZRUxr50+e7Sn2y6UQDzkyIk1m2rLjHUchhz3aNK53Zq32ldOMdnK1GW+Ri3dHKJAIT2NWIp5ZfotdiMGYcA54UGt9lZZ3ZU433Y6/1PoMEXW+KHUd2Ae/ZraYmH0eCcj/ozt2mPz4S X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Nov 28, 2024 at 7:20=E2=80=AFPM Lorenzo Stoakes wrote: > > On Thu, Nov 28, 2024 at 06:58:21PM +0100, Julian Orth wrote: > > (Re-sending the message below since I forgot to reply-all) > > > > On Thu, Nov 28, 2024 at 6:46=E2=80=AFPM Jann Horn wr= ote: > > > > > > On Thu, Nov 28, 2024 at 4:06=E2=80=AFPM Lorenzo Stoakes > > > wrote: > > > > In commit 158978945f31 ("mm: perform the mapping_map_writable() che= ck after > > > > call_mmap()") (and preceding changes in the same series) it became = possible > > > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only. > > > > > > > > This was previously unnecessarily disallowed, despite the man page > > > > documentation indicating that it would be, thereby limiting the use= fulness > > > > of F_SEAL_WRITE logic. > > > > > > > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_= WRITE > > > > seal (one which disallows future writes to the memfd) to also be us= ed for > > > > F_SEAL_WRITE. > > > > > > > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE fo= r a > > > > read-only mapping to disallow mprotect() from overriding the seal -= an > > > > operation performed by seal_check_write(), invoked from shmem_mmap(= ), the > > > > f_op->mmap() hook used by shmem mappings. > > > > > > > > By extending this to F_SEAL_WRITE and critically - checking > > > > mapping_map_writable() to determine if we may map the memfd AFTER w= e invoke > > > > shmem_mmap() - the desired logic becomes possible. This is because > > > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we = will > > > > have cleared. > > > > > > > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path > > > > behaviour") unintentionally undid this logic by moving the > > > > mapping_map_writable() check before the shmem_mmap() hook is invoke= d, > > > > thereby regressing this change. > > > > > > > > We reinstate this functionality by moving the check out of shmem_mm= ap() and > > > > instead performing it in do_mmap() at the point at which VMA flags = are > > > > being determined, which seems in any case to be a more appropriate = place in > > > > which to make this determination. > > > > > > > > In order to achieve this we rework memfd seal logic to allow us acc= ess to > > > > this information using existing logic and eliminate the clearing of > > > > VM_MAYWRITE from seal_check_write() which we are performing in do_m= map() > > > > instead. > > > > > > If we already check is_readonly_sealed() and strip VM_MAYWRITE in > > > do_mmap(), without holding any kind of lock or counter on the file > > > yet, then this check is clearly racy somehow, right? I think we have = a > > > race where we intermittently reject shared-readonly mmap() calls? > > > > Apropos race, some time ago I reported a way to get a mutable mapping > > for a write-sealed memfd via a race: > > > > https://bugzilla.kernel.org/show_bug.cgi?id=3D219106 > > Kind of hard to read rust code, but it looks like you're intentionally > trying to race sealing on the assumption it's atomic when it's not? That > doesn't seem like a bug? > > The intent of sealing memfds is you establish the memfd buffer, then seal > it and _only then_ expose it elsewhere. > > I may be missing something here, however. systemd allows the client to send journal logs via a file descriptor. If this file descriptor is a write-sealed, truncate-sealed memfd, systemd uses an optimized code path that relies on the contents of the file being immutable. I could not find any actual way to exploit this nor could I find any other open-source code that relies on write-seals being accurate. But maybe there is some code out there that could be exploited this way. E.g. if it used the contents of the file as an array index, then you could easily get a TOC/TOU exploit with this. > > > > > > > > > Like: > > > > > > process 1: calls mmap(PROT_READ, MAP_PRIVATE), checks is_readonly_sea= led() > > > process 2: adds a F_SEAL_WRITE seal > > > process 1: enters mmap_region(), is_shared_maywrite() is true, > > > mapping_map_writable() fails > > > > > > But even if we fix that, the same scenario would result in > > > F_SEAL_WRITE randomly failing depending on the ordering, so it's not > > > like we can actually do anything particularly sensible if these two > > > operations race. Taking a step back, read-only shared mappings of > > > F_SEAL_WRITE-sealed files are just kind of a bad idea because if > > > someone first creates a read-only shared mapping and *then* tries to > > > apply F_SEAL_WRITE, that won't work because the existing mapping will > > > be VM_MAYWRITE. > > > > > > And the manpage is just misleading on interaction with shared mapping= s > > > in general, it says "Using the F_ADD_SEALS operation to set the > > > F_SEAL_WRITE seal fails with EBUSY if any writable, shared mapping > > > exists" when actually, it more or less fails if any shared mapping > > > exists at all. > > > > > > @Julian Orth: Did you report this regression because this change > > > caused issues with existing userspace code? > > > > I noticed this because it broke one of my testcases. It would also > > affect production code but making that code work on pre-6.6 kernels is > > probably a good idea anyway. > > Thanks for having that test case! I have added a test here to ensure we d= o > not regress this again. > > This was a new feature introduced in 6.6, there is no reason to backport = it > to any earlier kernels if this is what you mean :) > > It's more a convenience thing like 'hm I can read() this but I can > mmap-read this even though the man page says I can'. > > > > > > > > > > Reported-by: Julian Orth > > > > Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv= =3DU7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/ > > > > Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path b= ehaviour") > > > > Signed-off-by: Lorenzo Stoakes