From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4F8DC3ABDA for ; Wed, 14 May 2025 10:01:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A41C66B0126; Wed, 14 May 2025 06:01:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9EEC06B0127; Wed, 14 May 2025 06:01:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 88FF66B0128; Wed, 14 May 2025 06:01:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 6816D6B0126 for ; Wed, 14 May 2025 06:01:55 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 896F81CD01B for ; Wed, 14 May 2025 10:01:56 +0000 (UTC) X-FDA: 83441072232.30.FF263F4 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by imf18.hostedemail.com (Postfix) with ESMTP id 475971C000F for ; Wed, 14 May 2025 10:01:54 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=OMhGCGQJ; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="Fby0+Cd/"; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=OMhGCGQJ; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="Fby0+Cd/"; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf18.hostedemail.com: domain of pfalcato@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=pfalcato@suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747216914; a=rsa-sha256; cv=none; b=Sx4NTAhVA3K+YMzV5xJTv3PWnFrKvs4ZiSvPT3xMvtdV9kFOfYO/rTta51Fnp0fuxXVqV0 lkd5ioRvYcpvZmtPNNoNox4p9Zuq5KNVWbSQSodtc11E+Pfw6LOOb1y5sVzsjqWvo8mBey AIpuQAxj5I/XJ+Y0UWQ/hhdBrQVBwIk= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=OMhGCGQJ; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="Fby0+Cd/"; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=OMhGCGQJ; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="Fby0+Cd/"; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf18.hostedemail.com: domain of pfalcato@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=pfalcato@suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747216914; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0qFvpH2MgN8teidabV80M0BZ5ot+a2R9QqLMuMJOrW4=; b=8W3L2JIChiipO9taojpiANwICoHyHk38Lqf/IQT6ZCzu2glfhfYFZ89ChdkoP18XaSZ/8T QTmfnb3anwyv0HjcuJmoYiw+vCpwpnDWOC9Wl7kifrXotFLhyw6egKoAzZaZIU9v+gLKzh rlVsB/gqebbjbSZgqW0x0Ti/k2Sv8Vs= Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 7CC43211D0; Wed, 14 May 2025 10:01:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1747216912; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0qFvpH2MgN8teidabV80M0BZ5ot+a2R9QqLMuMJOrW4=; b=OMhGCGQJey9EaGuP4MAxOm1U6wITxW42sRXkpw+vhjFteV1yKucf35eQc0IqkNmlQP6N1T J//7jilJzTlY8/ZEOvhuC07prHFMXw2olb+a/cAKS+swhzm7rU/InP1O9gWRJ/RhDXdMsd t/OhITkHjXg1a0rh1kVv+h61jYyHnrw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1747216912; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0qFvpH2MgN8teidabV80M0BZ5ot+a2R9QqLMuMJOrW4=; b=Fby0+Cd/bt2q98+8gHe/IaIYZPAXdZV/i2QMGngR2mg9vvawgdAlMEDGvHUnCwl5Gw6SnR S1UoZ5ZWAaSeX6BA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1747216912; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0qFvpH2MgN8teidabV80M0BZ5ot+a2R9QqLMuMJOrW4=; b=OMhGCGQJey9EaGuP4MAxOm1U6wITxW42sRXkpw+vhjFteV1yKucf35eQc0IqkNmlQP6N1T J//7jilJzTlY8/ZEOvhuC07prHFMXw2olb+a/cAKS+swhzm7rU/InP1O9gWRJ/RhDXdMsd t/OhITkHjXg1a0rh1kVv+h61jYyHnrw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1747216912; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0qFvpH2MgN8teidabV80M0BZ5ot+a2R9QqLMuMJOrW4=; b=Fby0+Cd/bt2q98+8gHe/IaIYZPAXdZV/i2QMGngR2mg9vvawgdAlMEDGvHUnCwl5Gw6SnR S1UoZ5ZWAaSeX6BA== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 7F8B613306; Wed, 14 May 2025 10:01:51 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id B/sAHA9qJGjtNgAAD6G6ig (envelope-from ); Wed, 14 May 2025 10:01:51 +0000 Date: Wed, 14 May 2025 11:01:49 +0100 From: Pedro Falcato To: Lorenzo Stoakes Cc: Andrew Morton , David Hildenbrand , "Liam R . Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Jann Horn , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox Subject: Re: [PATCH v2 1/3] mm: introduce new .mmap_prepare() file callback Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 475971C000F X-Rspam-User: X-Stat-Signature: fbqppyc373emkgjw9seim68xqogimtpu X-HE-Tag: 1747216914-878716 X-HE-Meta: U2FsdGVkX1+akNDxJDNvWZKJpG6SdN6QkFg5CSLcuaN0EtDCmOotbyp/e+lqgGxgnoFT++mUEcw7d9DYsuTonTbOPJARD32M02iRSBykfZKCuwFeI3o5R8D/kSl8GQT9PzGtsP843E0XujkhaGWPsOLGRy3mEKgFxABuPr5YIr9MrouB76e64YLXy5bCiIhn3Iecr8acKziH/sD5cNxz+p5HyalYr1TG+bNL7WLA/TJDsh/+vun3RFAfE9AHsQCFXuG/3qV2b3o6FjgxCwB4AWSAOKKa29pLnhO4i/I5jmh0hm4P41Z//EKu1Zc4q4SWKZjMQ7g34IQmYzHQ3Gflqvq1WpvaNSqb5KnxeYZH6Qha4eTSxM64+3jaMowBpbvbSTMsBxyYqw3G0s20B6y3w2Hj/XxJjdf9rUYM8K57MXU+hkA0f+GpdHrCmssci/WEzNWkWMQPlCeZ9hRFn7KRGuu7femyIbp6WPo7NjNn5yxjfTV3bobzwaGEQCib0fpLkUuEW/PJZMdGuhZQ9q51gCOZGXDMPbEixj4Z0kYBOeDaDzYl/BCFG8F0v0P/V6tK5m6AprCppjIKHddrBVhbQcTfAkxxejaZenDsxe6/5PHj3zVKHJDLdIWTgEXTr4Xp5BXBvyQi+P3lRQUvqoyfSNfH6rH7P59pvXqFw7p5ALixl8+HdQIhxDHoucwQmJvdcksSx4P+B8xJdZmQhIaXVgRv1UglWh1LTaag+03ifnWe6rTJb8KzBfEJfIlEzQhsy2C2pexlPKSKBROCbe12s0V2P0CL/PdaJXClzIYmyNIqWja3z+nxzsdBiGPcCZMH6bBOVQEi3DAUPy1HhnDc7jlEWjhq3IeHQt/CLMqI3ftvxDKtZAnvUTG9ghxLQUa3ESNCoDXllkfqhiqjDT43bS0K3+rI3Cnz0cCm1ALqxN94v/i7f9xNVa5fPaSeqnzCAqwjRyd/oxMdMibVWsK re4Bufn3 +nhB2v7nDWVmhRxffGUApEntPXNxWDNWaejHhn5S3NMF83uYIRWY24k4OE2JgYwf9cOTvDQlWPjHqhTxYDmJ+xsPkry6YEwOTKwCaIJPMBJs6ZU8FiU/ksWLG0gYDHj/MXKljmfDkcskb+JayR74v7fABknpFMqobLuz+IZu0q+SBIBcZhgpG00+yoKoLyPFkeQ+Goul6eTTOWhoO5SbtyztBMnc2SjGOuT+0DPQKm3SqV2L3eYLhbJc4LyAfX+8gD+dqT35fgNX4+tH/RdKfYB5DIg/+IfbwqFyfhOebT1LgwYkeCxKrb1R7xV5VYux1b7GrBmngTKmoMv7Xo/rx5Ey5hjEI0/f7+2zXEiJ1fM3ZgrERf/Dsy8231RL1bX0sc9HV0Vz5G3N8a15Yg8ZtPJXuPnunaE8obK09 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, May 14, 2025 at 10:12:49AM +0100, Lorenzo Stoakes wrote: > On Wed, May 14, 2025 at 10:04:06AM +0100, Pedro Falcato wrote: > > On Fri, May 09, 2025 at 01:13:34PM +0100, Lorenzo Stoakes wrote: > > > Provide a means by which drivers can specify which fields of those > > > permitted to be changed should be altered to prior to mmap()'ing a > > > range (which may either result from a merge or from mapping an entirely new > > > VMA). > > > > > > Doing so is substantially safer than the existing .mmap() calback which > > > provides unrestricted access to the part-constructed VMA and permits > > > drivers and file systems to do 'creative' things which makes it hard to > > > reason about the state of the VMA after the function returns. > > > > > > The existing .mmap() callback's freedom has caused a great deal of issues, > > > especially in error handling, as unwinding the mmap() state has proven to > > > be non-trivial and caused significant issues in the past, for instance > > > those addressed in commit 5de195060b2e ("mm: resolve faulty mmap_region() > > > error path behaviour"). > > > > > > It also necessitates a second attempt at merge once the .mmap() callback > > > has completed, which has caused issues in the past, is awkward, adds > > > overhead and is difficult to reason about. > > > > > > The .mmap_prepare() callback eliminates this requirement, as we can update > > > fields prior to even attempting the first merge. It is safer, as we heavily > > > restrict what can actually be modified, and being invoked very early in the > > > mmap() process, error handling can be performed safely with very little > > > unwinding of state required. > > > > > > The .mmap_prepare() and deprecated .mmap() callbacks are mutually > > > exclusive, so we permit only one to be invoked at a time. > > > > > > Update vma userland test stubs to account for changes. > > > > > > Signed-off-by: Lorenzo Stoakes > > > > Reviewed-by: Pedro Falcato > > > > Neat idea, thanks. This should also help out with the insane proliferation of > > vm_flags_set in ->mmap() callbacks all over. Hopefully. [snip] > > > > 2) Possibly add a ->mmap_finish()? With a fully constructed vma at that point. > > So things like remap_pfn_range can still be used by drivers' mmap() > > implementation. > > Thanks for raising the remap_pfn_range() case! Yes this is definitely a > thing. > > However this proposed callback would totally undermine the purpose of the > change - the idea is to never give a vma because if we do so we lose all of > the advantages here and may as well just leave the mmap in place for > this... > Yes, good point. > However I do think we'll need a new callback at some point (previously > discussed in thread). > > We could perhaps provide the option to _explicitly_ remap for instance. I > would want it to be heavily locked down as to what can happen and to happen > as early as possible. > I think we can simply combine various ideas here. Like: struct vm_area_desc_private { struct vm_area_desc desc; struct vm_area_struct *vma; }; Then, for this "mmap_finish" callback and associated infra: int (*mmap_finish)(struct vm_area_desc *desc); int mmap_remap_pfn_range(struct vm_area_desc *desc, /*...*/) { struct vm_area_desc_private *private = container_of(desc, struct vm_area_desc_private, desc); return remap_pfn_range(private->vma, /*...*/); } int random_driver_mmap_finish(struct vm_area_desc *desc) { return mmap_remap_pfn_range(desc, desc->start, some_pfn, some_size, desc->page_prot); } I think something of the sort would be quite less prone to abuse, and we could take the time to then even polish up the interface (e.g maybe it would be nicer if mmap_remap_pfn_range took a vma offset, and not a start address). Anyway, just brainstorming. This idea came to mind, I think it's quite interesting. > This is something we can iterate on, as trying to find the ideal scheme > immediately will just lead to inaction, the big advantage with the approach > here is we can be iterative. > > We provide this, use it in a scenario which allows us to eliminate merge > retry, and can take it from there :) Yep, totally. > > So indeed, watch this space basically... I will be highly proactive on this > stuff moving forward. > > > > > 1) is particularly important so our VFS and driver friends know this is supposed > > to be The Way Forward. > > I think probably the answer is for me to fully update the document to be > bang up to date, right? But that would obviously work best as a follow up > patch :) > You love your big projects :p I had the impression the docs were more or less up to date? The VFS people do update it somewhat diligently. And for mm we only have ->mmap, ->get_unmapped_area, and now ->mmap_prepare. And the descriptions are ATM quite useless, just "called by the mmap(2) system call". -- Pedro