Date: Mon, 9 Jun 2025 12:35:40 +0100
From: Pedro Falcato
To: Lorenzo Stoakes
Cc: Andrew Morton, Alexander Viro, Christian Brauner, Jan Kara,
	"Liam R . Howlett", Vlastimil Babka, Jann Horn,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] mm: add mmap_prepare() compatibility layer for nested file systems
References: <20250609092413.45435-1-lorenzo.stoakes@oracle.com>
In-Reply-To: <20250609092413.45435-1-lorenzo.stoakes@oracle.com>

On Mon, Jun 09, 2025 at 10:24:13AM +0100, Lorenzo Stoakes wrote:
> Nested file systems, that is those which invoke call_mmap() within their
> own f_op->mmap() handlers, may encounter underlying file systems which
> provide the f_op->mmap_prepare() hook introduced by commit
> c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
>
> We have a chicken-and-egg scenario here - until all file systems are
> converted to using .mmap_prepare(), we cannot convert these nested
> handlers, as we can't call f_op->mmap from an .mmap_prepare() hook.
>
> So we have to do it the other way round - invoke the .mmap_prepare() hook
> from an .mmap() one.
>
> In order to do so, we need to convert VMA state into a struct vm_area_desc
> descriptor, invoking the underlying file system's f_op->mmap_prepare()
> callback passing a pointer to this, and then setting VMA state accordingly
> and safely.
>
> This patch achieves this via the compat_vma_mmap_prepare() function, which
> we invoke from call_mmap() if f_op->mmap_prepare() is specified in the
> passed in file pointer.
>
> We place the fundamental logic into mm/vma.c where VMA manipulation
> belongs. We also update the VMA userland tests to accommodate the changes.
>
> The compat_vma_mmap_prepare() function and its associated machinery is
> temporary, and will be removed once the conversion of file systems is
> complete.
>

Thanks, this is annoying but looks mostly cromulent!

> Signed-off-by: Lorenzo Stoakes
> Reported-by: Jann Horn
> Closes: https://lore.kernel.org/linux-mm/CAG48ez04yOEVx1ekzOChARDDBZzAKwet8PEoPM4Ln3_rk91AzQ@mail.gmail.com/
> Fixes: c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
> ---
>  include/linux/fs.h               |  6 +++--
>  mm/mmap.c                        | 39 +++++++++++++++++++++++++++
>  mm/vma.c                         | 46 +++++++++++++++++++++++++++++++-
>  mm/vma.h                         |  4 +++
>  tools/testing/vma/vma_internal.h | 16 +++++++++++
>  5 files changed, 108 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 05abdabe9db7..8fe41a2b7527 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2274,10 +2274,12 @@ static inline bool file_has_valid_mmap_hooks(struct file *file)
>  	return true;
>  }
>
> +int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma);
> +
>  static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
>  {
> -	if (WARN_ON_ONCE(file->f_op->mmap_prepare))
> -		return -EINVAL;
> +	if (file->f_op->mmap_prepare)
> +		return compat_vma_mmap_prepare(file, vma);
>
>  	return file->f_op->mmap(file, vma);
>  }
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 09c563c95112..0755cb5d89d1 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1891,3 +1891,42 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
>  	vm_unacct_memory(charge);
>  	goto loop_out;
>  }
> +
> +/**
> + * compat_vma_mmap_prepare() - Apply the file's .mmap_prepare() hook to an
> + * existing VMA
> + * @file: The file which possesses an f_op->mmap_prepare() hook
> + * @vma: The VMA to apply the .mmap_prepare() hook to.
> + *
> + * Ordinarily, .mmap_prepare() is invoked directly upon mmap(). However, certain
> + * 'wrapper' file systems invoke a nested mmap hook of an underlying file.
> + *
> + * Until all filesystems are converted to use .mmap_prepare(), we must be
> + * conservative and continue to invoke these 'wrapper' filesystems using the
> + * deprecated .mmap() hook.
> + *
> + * However we have a problem if the underlying file system possesses an
> + * .mmap_prepare() hook, as we are in a different context when we invoke the
> + * .mmap() hook, already having a VMA to deal with.
> + *
> + * compat_vma_mmap_prepare() is a compatibility function that takes VMA state,
> + * establishes a struct vm_area_desc descriptor, passes to the underlying
> + * .mmap_prepare() hook and applies any changes performed by it.
> + *
> + * Once the conversion of filesystems is complete this function will no longer
> + * be required and will be removed.
> + *
> + * Returns: 0 on success or error.
> + */
> +int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct vm_area_desc desc;
> +	int err;
> +
> +	err = file->f_op->mmap_prepare(vma_to_desc(vma, &desc));
> +	if (err)
> +		return err;
> +	set_vma_from_desc(vma, &desc);
> +
> +	return 0;
> +}
> diff --git a/mm/vma.c b/mm/vma.c
> index 01b1d26d87b4..d771750f8f76 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -3153,7 +3153,6 @@ int __vm_munmap(unsigned long start, size_t len, bool unlock)
>  	return ret;
>  }
>
> -
>  /* Insert vm structure into process list sorted by address
>   * and into the inode's i_mmap tree. If vm_file is non-NULL
>   * then i_mmap_rwsem is taken here.
> @@ -3195,3 +3194,48 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
>
>  	return 0;
>  }
> +
> +/*
> + * Temporary helper functions for file systems which wrap an invocation of
> + * f_op->mmap() but which might have an underlying file system which implements
> + * f_op->mmap_prepare().
> + */
> +
> +struct vm_area_desc *vma_to_desc(struct vm_area_struct *vma,
> +		struct vm_area_desc *desc)
> +{
> +	desc->mm = vma->vm_mm;
> +	desc->start = vma->vm_start;
> +	desc->end = vma->vm_end;
> +
> +	desc->pgoff = vma->vm_pgoff;
> +	desc->file = vma->vm_file;
> +	desc->vm_flags = vma->vm_flags;
> +	desc->page_prot = vma->vm_page_prot;
> +
> +	desc->vm_ops = NULL;
> +	desc->private_data = NULL;
> +
> +	return desc;
> +}
> +
> +void set_vma_from_desc(struct vm_area_struct *vma, struct vm_area_desc *desc)
> +{
> +	/*
> +	 * Since we're invoking .mmap_prepare() despite having a partially
> +	 * established VMA, we must take care to handle setting fields
> +	 * correctly.
> +	 */
> +
> +	/* Mutable fields. Populated with initial state. */
> +	vma->vm_pgoff = desc->pgoff;
> +	if (vma->vm_file != desc->file)
> +		vma_set_file(vma, desc->file);
> +	if (vma->vm_flags != desc->vm_flags)
> +		vm_flags_set(vma, desc->vm_flags);

I think we don't need vm_flags_set() in this case, since the VMA isn't
exposed yet. __vm_flags_mod() should work just fine. Of course this isn't a
big deal, but I would like it if we restricted vm_flags_set() to core mm and
conceptually attached things.

In any case, with or without that addressed:

Reviewed-by: Pedro Falcato

-- 
Pedro
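
For readers following the thread, the round trip discussed above (VMA state copied into a descriptor, handed to the hook, then copied back only on success) can be sketched in plain userland C. This is only an illustrative model, not kernel code: struct vma, struct vma_desc, prepare_fn, DEMO_FLAG, demo_prepare() and failing_prepare() are all hypothetical stand-ins, and the way the ops and private data are applied back is an assumption, since the quoted hunk ends before that part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; NOT the kernel's definitions. */
typedef unsigned long vm_flags_t;

struct vm_ops { int unused; };

struct vma {
	unsigned long start, end, pgoff;
	vm_flags_t flags;
	const struct vm_ops *ops;
	void *private_data;
};

struct vma_desc {
	unsigned long start, end;	/* the hook is expected to leave these alone */
	unsigned long pgoff;
	vm_flags_t flags;
	const struct vm_ops *ops;
	void *private_data;
};

typedef int (*prepare_fn)(struct vma_desc *desc);

/* Snapshot the VMA into a descriptor; hook-owned fields start out empty. */
static struct vma_desc *vma_to_desc(const struct vma *vma, struct vma_desc *desc)
{
	desc->start = vma->start;
	desc->end = vma->end;
	desc->pgoff = vma->pgoff;
	desc->flags = vma->flags;
	desc->ops = NULL;
	desc->private_data = NULL;
	return desc;
}

/*
 * Copy back what the hook may have changed. The flags are written with a
 * plain store, on the assumption that nothing else can see the VMA yet.
 * Applying ops/private_data unconditionally is an assumption of this sketch.
 */
static void set_vma_from_desc(struct vma *vma, const struct vma_desc *desc)
{
	vma->pgoff = desc->pgoff;
	vma->flags = desc->flags;
	vma->ops = desc->ops;
	vma->private_data = desc->private_data;
}

/* Run a prepare-style hook against an already existing VMA. */
static int compat_mmap_prepare(struct vma *vma, prepare_fn prepare)
{
	struct vma_desc desc;
	int err;

	assert(prepare != NULL);
	err = prepare(vma_to_desc(vma, &desc));
	if (err)
		return err;	/* the VMA is left untouched on failure */
	set_vma_from_desc(vma, &desc);
	return 0;
}

/* Two hypothetical hooks: one that succeeds, one that fails. */
#define DEMO_FLAG 0x4000UL
static const struct vm_ops demo_ops = { 0 };

static int demo_prepare(struct vma_desc *desc)
{
	desc->flags |= DEMO_FLAG;
	desc->ops = &demo_ops;
	return 0;
}

static int failing_prepare(struct vma_desc *desc)
{
	desc->flags = 0;	/* scribbles on the descriptor, then fails */
	return -22;
}
```

Because the hook only ever sees the descriptor, a failing hook cannot leave the VMA half-modified in this model; whether the real helpers need the locking variant of the flag setter is exactly the question raised in the review comment.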