Date: Mon, 5 May 2025 15:37:39 +0200
From: Christian Brauner
To: Lorenzo Stoakes
Cc: Jan Kara, Andrew Morton, David Hildenbrand, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
    Jann Horn, Pedro Falcato, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro,
    Matthew Wilcox
Subject: Re: [RFC PATCH v2 0/3] eliminate mmap() retry merge, add .mmap_prepare hook
Message-ID: <20250505-frisur-stempeln-9b66d6115726@brauner>
References: <0776ce6e-ed62-4eaa-a71a-a082dafbdb99@lucifer.local>
In-Reply-To: <0776ce6e-ed62-4eaa-a71a-a082dafbdb99@lucifer.local>

On Fri, May 02, 2025 at 01:59:49PM +0100, Lorenzo Stoakes wrote:
> On Fri, May 02, 2025 at 02:20:38PM +0200, Jan Kara wrote:
> > On Thu 01-05-25 18:25:26, Lorenzo Stoakes wrote:
> > > During the mmap() of a file-backed mapping, we invoke the underlying driver
> > > file's mmap() callback in order to perform driver/file system
> > > initialisation of the underlying VMA.
> > >
> > > This has been a source of issues in the past, including a significant
> > > security concern relating to unwinding of error state discovered by Jann
> > > Horn, as fixed in commit 5de195060b2e ("mm: resolve faulty mmap_region()
> > > error path behaviour") which performed the recent, significant, rework of
> > > mmap() as a whole.
> > >
> > > However, we have had a fly in the ointment remain - drivers have a great
> > > deal of freedom in the .mmap() hook to manipulate VMA state (as well as
> > > page table state).
> > >
> > > This can be problematic, as we can no longer reason sensibly about VMA
> > > state once the call is complete (the ability to do - anything - here does
> > > rather interfere with that).
> > >
> > > In addition, callers may choose to do odd or unusual things which might
> > > interfere with subsequent steps in the mmap() process, and it may do so and
> > > then raise an error, requiring very careful unwinding of state about which
> > > we can make no assumptions.
> > >
> > > Rather than providing such an open-ended interface, this series provides an
> > > alternative, far more restrictive one - we expose a whitelist of fields
> > > which can be adjusted by the driver, along with immutable state upon which
> > > the driver can make such decisions:
> > >
> > > struct vm_area_desc {
> > > 	/* Immutable state. */
> > > 	struct mm_struct *mm;
> > > 	unsigned long start;
> > > 	unsigned long end;
> > >
> > > 	/* Mutable fields. Populated with initial state. */
> > > 	pgoff_t pgoff;
> > > 	struct file *file;
> > > 	vm_flags_t vm_flags;
> > > 	pgprot_t page_prot;
> > >
> > > 	/* Write-only fields. */
> > > 	const struct vm_operations_struct *vm_ops;
> > > 	void *private_data;
> > > };
> > >
> > > The mmap logic then updates the state used to either merge with a VMA or
> > > establish a new VMA based upon this logic.
> > >
> > > This is achieved via new file hook .mmap_prepare(), which is, importantly,
> > > invoked very early on in the mmap() process.
> > >
> > > If an error arises, we can very simply abort the operation with very little
> > > unwinding of state required.
> >
> > Looks sensible. So is there a plan to transform existing .mmap hooks to
> > .mmap_prepare hooks? I agree that for most filesystems this should be just
> > easy 1:1 replacement and AFAIU this would be prefered?
>
> Thanks!
>
> Yeah the intent is to convert _all_ callers away from .mmap() so we can
> lock down what drivers are doing and be able to (relatively) safely make
> assumptions about what's going on in mmap logic.
>
> As David points out, we may need to add new callbacks to account for other

The plural is a little worrying, let's please aim to minimize the number of
new methods we need for this.
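
For readers following the thread, here is a rough userspace sketch of what a
driver-side hook against the quoted struct vm_area_desc might look like. It is
a mock-up, not kernel code: the typedefs only stand in for the kernel types,
the hook signature (taking a single struct vm_area_desc pointer and returning
0 or a negative errno) is an assumption, and example_mmap_prepare(),
example_vm_ops and EXAMPLE_MAX_LEN are hypothetical names that do not come
from the series.

```c
/*
 * Userspace mock-up of the quoted struct vm_area_desc. The typedefs
 * below are stand-ins for the kernel types and exist only so that
 * this sketch compiles on its own.
 */
#include <errno.h>
#include <stddef.h>

typedef unsigned long pgoff_t;
typedef unsigned long vm_flags_t;
typedef struct { unsigned long pgprot; } pgprot_t;

struct mm_struct;				/* opaque here */
struct file;					/* opaque here */
struct vm_operations_struct { int unused; };	/* placeholder body */

struct vm_area_desc {
	/* Immutable state. */
	struct mm_struct *mm;
	unsigned long start;
	unsigned long end;

	/* Mutable fields. Populated with initial state. */
	pgoff_t pgoff;
	struct file *file;
	vm_flags_t vm_flags;
	pgprot_t page_prot;

	/* Write-only fields. */
	const struct vm_operations_struct *vm_ops;
	void *private_data;
};

/* Hypothetical limit for this example driver: 1 MiB per mapping. */
#define EXAMPLE_MAX_LEN (1UL << 20)

/* Hypothetical vm_ops table the example driver would install. */
static const struct vm_operations_struct example_vm_ops = { 0 };

/*
 * Hypothetical hook. The signature is assumed, not taken from the
 * series. It reads only the immutable range, writes only a
 * whitelisted field, and rejects the request before changing
 * anything, so a failure leaves nothing to unwind.
 */
static int example_mmap_prepare(struct vm_area_desc *desc)
{
	if (desc->end - desc->start > EXAMPLE_MAX_LEN)
		return -EINVAL;

	desc->vm_ops = &example_vm_ops;
	return 0;
}
```

The validation comes before the only write, which is the property the cover
letter is after: on error the descriptor is untouched and the caller can simply
abort.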