linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Jinjiang Tu <tujinjiang@huawei.com>
Cc: miklos@szeredi.hu, amir73il@gmail.com, akpm@linux-foundation.org,
	vbabka@suse.cz, jannh@google.com, linux-mm@kvack.org,
	linux-unionfs@vger.kernel.org, wangkefeng.wang@huawei.com,
	sunnanyong@huawei.com, yi.zhang@huawei.com,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH -next] ovl: respect underlying filesystem's get_unmapped_area()
Date: Fri, 6 Dec 2024 09:25:19 +0000	[thread overview]
Message-ID: <a39ef271-dc1b-4f61-ba01-dde5b127bef2@lucifer.local> (raw)
In-Reply-To: <041dcc1a-0630-27b9-661b-8c64a3775426@huawei.com>

To be clear - I'm not accepting the export of __get_unmapped_area() so if
you depend on this for this approach, you can't take this approach.

It's an internal implementation detail. That you choose to make your
filesystem possibly a module doesn't mean that mm is required to export
internal impl details to you. Sorry.

To rescind this would require a very strong argument, you have not provided
it.

On Fri, Dec 06, 2024 at 11:35:08AM +0800, Jinjiang Tu wrote:
>
> 在 2024/12/5 23:04, Lorenzo Stoakes 写道:
> > + Matthew for large folio aspect
> >
> > On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote:
> > > During our tests in containers, there is a read-only file (i.e., shared
> > > libraies) in the overlayfs filesystem, and the underlying filesystem is
> > > ext4, which supports large folio. We mmap the file with PROT_READ prot,
> > > and then call madvise(MADV_COLLAPSE) for it. However, the madvise call
> > > fails and returns EINVAL.
> > >
> > > The reason is that the mapping address isn't aligned to PMD size. Since
> > > overlayfs doesn't support large folio, __get_unmapped_area() doesn't call
> > > thp_get_unmapped_area() to get a THP aligned address.
> > >
> > > To fix it, call get_unmapped_area() with the realfile.
> > Isn't the correct solution to get overlayfs to support large folios?
> >
> > > Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should
> > > export get_unmapped_area().
> > Yeah, not in favour of this at all. This is an internal implementation
> > detail. It seems like you're trying to hack your way into avoiding
> > providing support for large folios and to hand it off to the underlying
> > file system.
> >
> > Again, why don't you just support large folios in overlayfs?
> >
> > Literally no other file system or driver appears to make use of this
> > directly in this manner.
> >
> > And there's absolutely no way this should be exported non-GPL as if it were
> > unavoidable core functionality that everyone needs. Only you seem to...
> >
> > > Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> > > ---
> > >   fs/overlayfs/file.c | 20 ++++++++++++++++++++
> > >   mm/mmap.c           |  1 +
> > >   2 files changed, 21 insertions(+)
> > >
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index 969b458100fe..d0dcf675ebe8 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -653,6 +653,25 @@ static int ovl_flush(struct file *file, fl_owner_t id)
> > >   	return err;
> > >   }
> > >
> > > +static unsigned long ovl_get_unmapped_area(struct file *file,
> > > +		unsigned long addr, unsigned long len, unsigned long pgoff,
> > > +		unsigned long flags)
> > > +{
> > > +	struct file *realfile;
> > > +	const struct cred *old_cred;
> > > +	unsigned long ret;
> > > +
> > > +	realfile = ovl_real_file(file);
> > > +	if (IS_ERR(realfile))
> > > +		return PTR_ERR(realfile);
> > > +
> > > +	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > > +	ret = get_unmapped_area(realfile, addr, len, pgoff, flags);
> > > +	ovl_revert_creds(old_cred);
> > Why are you overriding credentials, then reinstating them here? That
> > seems... iffy? I knew nothing about overlayfs so this may just be a
> > misunderstanding...
>
> I refer to other file operations in overlayfs (i.e., ovl_fallocate, backing_file_mmap).
> Since get_unmapped_area() has security related operations (e.g., security_mmap_addr()),
> We should call it with the cred of the underlying file.
>
> >
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >   const struct file_operations ovl_file_operations = {
> > >   	.open		= ovl_open,
> > >   	.release	= ovl_release,
> > > @@ -661,6 +680,7 @@ const struct file_operations ovl_file_operations = {
> > >   	.write_iter	= ovl_write_iter,
> > >   	.fsync		= ovl_fsync,
> > >   	.mmap		= ovl_mmap,
> > > +	.get_unmapped_area = ovl_get_unmapped_area,
> > >   	.fallocate	= ovl_fallocate,
> > >   	.fadvise	= ovl_fadvise,
> > >   	.flush		= ovl_flush,
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 16f8e8be01f8..60eb1ff7c9a8 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -913,6 +913,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > >   	error = security_mmap_addr(addr);
> > >   	return error ? error : addr;
> > >   }
> > > +EXPORT_SYMBOL(__get_unmapped_area);
> > We'll need a VERY good reason to export this internal implementation
> > detail, and if that were provided we'd need a VERY good reason for it not
> > to be GPL.
> >
> > This just seems to be a cheap way of invoking (),
> > maybe, if it is being used by the underlying file system.
>
> But the underlying file system may not support large folio. In this case,
> the mmap address doesn't need to be aligned with THP size.

But it'd not cause any problems to just do that anyway right? I don't think
many people think 'oh no I have a PMD aligned mapping now what will I do'?

Again - the right solution here is to handle large folios in overlayfs as
far as I can tell.

In any case as per the above, we're just not exporting
__get_unmapped_area(), sorry.

>
> >
> > And again... why not just add large folio support? We can't just take a
> > hack here.
> >
> > >   unsigned long
> > >   mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> > > --
> > > 2.34.1
> > >


  reply	other threads:[~2024-12-06  9:25 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-05 14:30 Jinjiang Tu
2024-12-05 15:04 ` Lorenzo Stoakes
2024-12-05 15:12   ` Amir Goldstein
2024-12-05 15:24     ` Lorenzo Stoakes
2024-12-06  3:35       ` Jinjiang Tu
2024-12-06 13:52         ` Matthew Wilcox
2024-12-10 17:36       ` Jann Horn
2024-12-06  3:35   ` Jinjiang Tu
2024-12-06  9:25     ` Lorenzo Stoakes [this message]
2024-12-06 10:45       ` Kefeng Wang
2024-12-06 11:53         ` Lorenzo Stoakes
2024-12-10  7:09           ` Kefeng Wang
2024-12-06 12:58         ` Amir Goldstein
2024-12-10  7:19           ` Kefeng Wang
2024-12-11  9:43             ` Amir Goldstein
2024-12-11 15:01               ` Kefeng Wang
2024-12-13  1:51                 ` Jinjiang Tu
2024-12-13  4:25               ` Matthew Wilcox
2024-12-13  7:49                 ` Kefeng Wang
2024-12-13 14:00                   ` Matthew Wilcox
2024-12-14  7:58                     ` Kefeng Wang
2024-12-05 21:21 ` kernel test robot
2024-12-06 13:58 ` Matthew Wilcox
2024-12-09  6:43   ` Jinjiang Tu
2024-12-09 13:33     ` Matthew Wilcox
2024-12-10  7:13       ` Jinjiang Tu
2024-12-10 17:58         ` Matthew Wilcox
2024-12-11  2:21           ` Zhang Yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a39ef271-dc1b-4f61-ba01-dde5b127bef2@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=jannh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=sunnanyong@huawei.com \
    --cc=tujinjiang@huawei.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox