From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1B0AD70DE8 for ; Thu, 28 Nov 2024 17:58:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5713D6B0085; Thu, 28 Nov 2024 12:58:35 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5209D6B0089; Thu, 28 Nov 2024 12:58:35 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3E7BD6B008C; Thu, 28 Nov 2024 12:58:35 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 1CA356B0085 for ; Thu, 28 Nov 2024 12:58:35 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id C2CE64056E for ; Thu, 28 Nov 2024 17:58:34 +0000 (UTC) X-FDA: 82836263412.07.C4DF986 Received: from mail-yw1-f182.google.com (mail-yw1-f182.google.com [209.85.128.182]) by imf28.hostedemail.com (Postfix) with ESMTP id A9774C0020 for ; Thu, 28 Nov 2024 17:58:22 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=CNoznLa8; spf=pass (imf28.hostedemail.com: domain of ju.orth@gmail.com designates 209.85.128.182 as permitted sender) smtp.mailfrom=ju.orth@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732816704; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=T6RLN8HoPiYT3cM8ia7YAAZtHbp+Vo69fflIdtAi/MA=; b=pav3tWGqXYtRiO+65J0k6gLiNEK3Ft033ckWe8F4zXXENR5rBNak5CJ0RSGYueau8Asia3 wIwn+A3ueqdIGtzqSJldzJ197CauNGLMjm9oK9uKcLfNPzKAr4eU/cmeaDiPH85lWR0Mfv +6WMRuxG0srNv/uiBy/by7NpV4KY9Vk= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=CNoznLa8; spf=pass (imf28.hostedemail.com: domain of ju.orth@gmail.com designates 209.85.128.182 as permitted sender) smtp.mailfrom=ju.orth@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732816704; a=rsa-sha256; cv=none; b=oxiwlmtPgs3hAq4r+6BuxKt9G+jYmuNaNsHI02ITAKYetKTgb7lnWT6WyNkezFfuKmTFmD Y25UlR8i4Gq1Pa/EopO9CYcFdQkmiHhlto3lGAp0mNOz1SP4vfD94O3O7wvSEBUau4aNRm 1upxV2LSKbRPqGEmlX5GAVs2K4ZXYOU= Received: by mail-yw1-f182.google.com with SMTP id 00721157ae682-6eebae6f013so12327297b3.0 for ; Thu, 28 Nov 2024 09:58:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1732816712; x=1733421512; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=T6RLN8HoPiYT3cM8ia7YAAZtHbp+Vo69fflIdtAi/MA=; b=CNoznLa81eAG28+mZpyLeOzhT3Ydiu5dn1spMID6oAz2aMyYH5l76XEyQAL6IV1VqY MCn73cRZL/TqtRMa3eGHVNX3sgfCdblo4z/pK7Q5gZnUzxyiepDM3ogmpdBnwImXUx48 kNqBxX7o6Ih+BDLKB8BMySyMdr4tQDzCxrdG1q4iMJQKjmrfsOu6jbgKqc15elq9x4Cr dKgskwd+hvLEkkYFilgF4t+u+VFqS0dozzUnvdqgjGzwiJCicqfdDCGKmyHloXB0zm/G n0tdOGyHlTnWCDbF2E8SEhrrO/YUKD8SZM3gEZu4ExfnUTvlTfYIcctqsiSJGDH16fXu HY6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732816712; x=1733421512; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=T6RLN8HoPiYT3cM8ia7YAAZtHbp+Vo69fflIdtAi/MA=; b=qmA1vaDzt/fPH+yey3Vbb6tw3QPNEGbKXlW+WifXNuj4So/DNNrcuQDfeYVj5fXzLd Y+MtAW+cKBwyO5Xm6m9yZnimVJ2dbXvFFeOYMwUdUqjysppkVCDgjN+tRWwGvDE4h5Mf bO0c5ZzVwYxnPCe21MJu9iFO8vTcjbau6Kf5tPifDKsoh92H7FyvjXxpClL0EKutK0sb hSP+Ytr51u+tuS43cYTJSxTCv2nyQKB2Lgz15YvbrtjVu7C88j21dplmCD1dUkP9x6yR QbdTb5MHC+er6KcxDOeZYJKJvgtzRtGo8GqKdM5j4akwOloUQpf8FS5JNygu1JJn0WHl GUvg== X-Forwarded-Encrypted: i=1; AJvYcCUIHso+PKUI8A41plWkysLm0S2DpFsguNXNTaQuy9bOe+cmiHVe4nTFo1lV8x28D5LgZ7KeSsi3Fw==@kvack.org X-Gm-Message-State: AOJu0YxuTMHQtRy/tes+Q5LkVpDhNKJvbMmxxxGSgqhB3WQQrugJHVWF o7m/UgnEOjjav11FLIfNmEowbe+3yPgak0XHXW6DF3QAQClNkF4LoKJnpLVJGcYivdrCbahhfHs D58I7ytUMpDG/WGd037qE9LBZsYQ= X-Gm-Gg: ASbGncvZGhRlkh4gl1zdzprqWmmPiS19WK+IZSI9sCqtlpGz6W8WIxUkKgC3QWq6Xfz VO8LEn41lMpjJflEIsplivTF7BcdI9W4FGvaT7M24 X-Google-Smtp-Source: AGHT+IGqtQMx2ADzvHfeedR0IXoeNkzKLsvmqGsHgy19ZsJLIAZxEgv4vqJSui9pa4nfYEe/P+jKNPMf/jUw8eirlt8= X-Received: by 2002:a05:690c:6e0d:b0:6ea:c928:771 with SMTP id 00721157ae682-6ef37274fadmr112213167b3.28.1732816712033; Thu, 28 Nov 2024 09:58:32 -0800 (PST) MIME-Version: 1.0 References: <99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com> In-Reply-To: From: Julian Orth Date: Thu, 28 Nov 2024 18:58:21 +0100 Message-ID: Subject: Re: [PATCH 1/2] mm: reinstate ability to map write-sealed memfd mappings read-only To: Jann Horn Cc: Lorenzo Stoakes , Andrew Morton , "Liam R . Howlett" , Vlastimil Babka , Shuah Khan , Linus Torvalds , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: A9774C0020 X-Rspam-User: X-Stat-Signature: 3qiza31ibxkfofb1zs4wo8otmoi4etrz X-HE-Tag: 1732816702-906470 X-HE-Meta: U2FsdGVkX18UV1qQnHijKbs5tK/7hkQmKzHwNWkmwHI6cVfYfQavztxCYJPpMOfMxoyM5n4415n/r7h/lfRFkBdIJIYyoiNVL07E8nl+BK7DtKc72757Ey369wd7N9ei5uAyvMJ3D5xAB50TxsbvVZlpFiCA6zFyWhYDzscK+rlfj7qjW8QsCZtv3rDauqIFGyKLzHE96/9sxARfwbEkG91B5xDQ8t05dXO0tECmVVaEvfoJzQ3BRRsDXpeiUofZMm+OeMF1WFZxwDgKE7rmPhpQYJHxbUIYCNhmbroMGKOwF9y8X8C10+ALnD7+zIE3JYwF5aCxMAwdctmL/FAk/kEWqsCl4PEuZ9lql71EM4UPnMGi8RdpHqvmhQw/C1czVPnHEKhQ6lrxGDqRO3a1vsVWI1MgytrnF/Srtt9D5UKMXrHSyMVvY5NAmidnncRkQ5pr5pZnjHMV7c++j2fmzRJr0dlZA3u32oB49JCg7YuWkt0jT5aOtjxq06hVe6/uFQY5zyAejw5akoXf2o6Jfr+7foZqzjEFgp1UOqprhFMpf/QFGEStRw6M19Mso7dc6/eGQ5y/cS+0tzMD6gpTeQZwvyNkIz+N9sXmvn7rC3CoP+XxVz/SNNBJdHfYOWmg7RcHfZtEKstbhtvjKf9AD8hPYxKtnfdxU1bklIawir7eGTYeGrQN6jiuYkJOyjBhQBAyRa9Ad78xl0DHgot6b/x/h60wEsy7Sc/FudSdVzUMv+TL/t3bPvwCXW1E/lb246IiVe7xquutpSSt+2i+mX81T+OFmPgMyJBi8C9XvUm0Wtwe6e2UKyt/X8ARmfdmZirgDcDhxdteWw4zpegoP/8LEuyZJUkYidQ9CgCKnWz7CLi/USwLU5gaXdeiI5+YqNMegYLYsektKfLvonhQulFu4RMSRfPThs+xHEFMnso8WqzKXwmCQLGZ0ZQzn38iI4VEAMOybDS6DvVPMLH xK+5X8jK wTNMdslIh+VHztX1EkIwOCp1VeRAUNfkWRu+7TYk4citugvl2L0z0oLK2bJ7koNWiC2171Yx9WdbrcQ93FpV1wVe6feTtSCgU1qqzVAK7AtdKjDBr2cpQ6xnaNQ6y0sA0lhcOin9dJdNvZ/TMpWdxH5/Lkau83i37OH7jOqV7D7+qTByWwhV8dXPqGvFOfTIQa3w0GBpCz3FADb//QKXttdtg9jWN/FlIGrjUakR73RrOfZw1RNJPMGwqOMD2A91prde7dvqlfZz/JRzzySF4WD3dy487A+ajLKXQr20dT4K/EIRIvjB2hinBapzyR4Nedgn66D+n4gSE7RlIwgav9zN5FrYSukVY1UXHY9IFksi+KG0BT2JlgMEaK7AB/c89DE0wu6cnDRQb+4rnuN14RM8XS5YIIT1PKv0A/c/1CXm8Aecx0AAGYyTvG2gtuAQc+OpI X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: (Re-sending the message below since I forgot to reply-all) On Thu, Nov 28, 2024 at 6:46=E2=80=AFPM Jann Horn wrote: > > On Thu, Nov 28, 2024 at 4:06=E2=80=AFPM Lorenzo Stoakes > wrote: > > In commit 158978945f31 ("mm: perform the mapping_map_writable() check a= fter > > call_mmap()") (and preceding changes in the same series) it became poss= ible > > to mmap() F_SEAL_WRITE sealed memfd mappings read-only. > > > > This was previously unnecessarily disallowed, despite the man page > > documentation indicating that it would be, thereby limiting the usefuln= ess > > of F_SEAL_WRITE logic. > > > > We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRIT= E > > seal (one which disallows future writes to the memfd) to also be used f= or > > F_SEAL_WRITE. > > > > For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a > > read-only mapping to disallow mprotect() from overriding the seal - an > > operation performed by seal_check_write(), invoked from shmem_mmap(), t= he > > f_op->mmap() hook used by shmem mappings. > > > > By extending this to F_SEAL_WRITE and critically - checking > > mapping_map_writable() to determine if we may map the memfd AFTER we in= voke > > shmem_mmap() - the desired logic becomes possible. This is because > > mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will > > have cleared. > > > > Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path > > behaviour") unintentionally undid this logic by moving the > > mapping_map_writable() check before the shmem_mmap() hook is invoked, > > thereby regressing this change. > > > > We reinstate this functionality by moving the check out of shmem_mmap()= and > > instead performing it in do_mmap() at the point at which VMA flags are > > being determined, which seems in any case to be a more appropriate plac= e in > > which to make this determination. > > > > In order to achieve this we rework memfd seal logic to allow us access = to > > this information using existing logic and eliminate the clearing of > > VM_MAYWRITE from seal_check_write() which we are performing in do_mmap(= ) > > instead. > > If we already check is_readonly_sealed() and strip VM_MAYWRITE in > do_mmap(), without holding any kind of lock or counter on the file > yet, then this check is clearly racy somehow, right? I think we have a > race where we intermittently reject shared-readonly mmap() calls? Apropos race, some time ago I reported a way to get a mutable mapping for a write-sealed memfd via a race: https://bugzilla.kernel.org/show_bug.cgi?id=3D219106 > > Like: > > process 1: calls mmap(PROT_READ, MAP_PRIVATE), checks is_readonly_sealed(= ) > process 2: adds a F_SEAL_WRITE seal > process 1: enters mmap_region(), is_shared_maywrite() is true, > mapping_map_writable() fails > > But even if we fix that, the same scenario would result in > F_SEAL_WRITE randomly failing depending on the ordering, so it's not > like we can actually do anything particularly sensible if these two > operations race. Taking a step back, read-only shared mappings of > F_SEAL_WRITE-sealed files are just kind of a bad idea because if > someone first creates a read-only shared mapping and *then* tries to > apply F_SEAL_WRITE, that won't work because the existing mapping will > be VM_MAYWRITE. > > And the manpage is just misleading on interaction with shared mappings > in general, it says "Using the F_ADD_SEALS operation to set the > F_SEAL_WRITE seal fails with EBUSY if any writable, shared mapping > exists" when actually, it more or less fails if any shared mapping > exists at all. > > @Julian Orth: Did you report this regression because this change > caused issues with existing userspace code? I noticed this because it broke one of my testcases. It would also affect production code but making that code work on pre-6.6 kernels is probably a good idea anyway. > > > Reported-by: Julian Orth > > Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=3DU7s= 4L_Jem5x3AaPZxrYpQ@mail.gmail.com/ > > Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behav= iour") > > Signed-off-by: Lorenzo Stoakes