From: Amir Goldstein <amir73il@gmail.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>,
Jerome Glisse <jglisse@redhat.com>, Jan Kara <jack@suse.cz>
Cc: lsf-pc@lists.linux-foundation.org,
Al Viro <viro@zeniv.linux.org.uk>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Dave Chinner <david@fromorbit.com>,
Matthew Wilcox <willy@infradead.org>, Chris Mason <clm@fb.com>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>
Subject: Re: [LSF/MM TOPIC] Sharing file backed pages
Date: Fri, 25 Jan 2019 10:39:20 +0200
Message-ID: <CAOQ4uxgfkzWsh+=gKGL4YGiBGLYvhcOCy13X5L2ycVdghYhrOA@mail.gmail.com>
In-Reply-To: <20190124103906.iwbttyrf6lddieou@kshutemo-mobl1>
On Thu, Jan 24, 2019 at 12:39 PM Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> > On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > > up the subject of sharing pages between cloned files and the general vibe
> > > in the room was that it could be done.
> > >
> > > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > > that Matthew Wilcox was "working on that problem".
> > >
> > > I have started working on a new overlayfs address space implementation
> > > that could also benefit from being able to share pages even for filesystems
> > > that do not support clones (for copy up anticipation state).
> > >
> > > To simplify the problem, we can start with sharing only uptodate clean
> > > pages that map the same offset in respective files. While the same offset
> > > requirement somewhat limits the use cases that benefit from shared file
> > > pages, there is still a vast majority of use cases (i.e. clone full
> > > image), where sharing pages of similar offset will bring a lot of
> > > benefit.
> > >
> > > At first glance, this requires dropping the assumption that for an
> > > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > > Is there really such an assumption in common vfs/mm code? and what will
> > > it take to drop it?
> >
> > There definitely is such an assumption. Take for example page reclaim as one
> > such place that will be non-trivial to deal with. You need to remove the
> > page from page cache of all inodes that contain it without having any file
> > context whatsoever. So you will need to create some way for this page->page
> > caches mapping to happen.
>
> We have it solved for anon pages, where we need to find all VMAs the page
> might be mapped to. I think we should look into adopting the anon_vma
> approach[1] for files too. At first look, the problem space looks very
> similar.
>

Yes, there are many similarities, and we should definitely adopt the existing
solutions for shared anon pages. There are also differences, and we need
to make sure we cover them in the design.

For example, reclaiming a multiply shared page may prove to be more
expensive than reclaiming a non-shared page. Depending on how the page
ended up being shared (perhaps by KSM or by a special copy_file_range()
mode on an fs that doesn't support clone_file_range), the next time the
instances of the shared page are faulted in, they may not be shared anymore
and may consume more cache space.

I'd also like to discuss what control the filesystem gets over
unsharing a page. Will the fs have a say before a page is COWed? In which
order of VMAs?

I think most people currently view the shared pages concept as symmetric for
all VMAs that share the page, but for overlayfs, a "master-slave" or "stacked"
model might be a better fit, so that, for example, the "master" can make a call
to notify the "slave" about a page being dirty instead of breaking the sharing.

Jerome,

Do you think we will have time to cover these issues in the joint session?
Perhaps we should tentatively plan for a filesystem track session for
filesystem follow-up issues?

Some issues I can think of are:
- What control the filesystem gets over the new functionality (see above)
- Common code to help sharing pages, i.e. for generic vfs interfaces
  like clone/dedupe/copy_range
- Can/should blockdev pages (of the same block) be shared with file
  pages of the filesystem on that blockdev by common mpage_ helpers?
- A common use case is that filesystem images are cloned and loop mounted.
  How can we propagate the knowledge that file data on loop mounted
  filesystems originates from the same underlying block through the
  loop device? (*)

(*) The loop device is just a simple example, but the same can apply to other
storage stacks as well, where the block layer has dedupe.

Thanks,
Amir.