linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	agruenba@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC] Bypass filesystems for reading cached pages
Date: Mon, 22 Jun 2020 10:32:15 +1000	[thread overview]
Message-ID: <20200622003215.GC2040@dread.disaster.area> (raw)
In-Reply-To: <20200619155036.GZ8681@bombadil.infradead.org>

On Fri, Jun 19, 2020 at 08:50:36AM -0700, Matthew Wilcox wrote:
> 
> This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> The advantage of this patch is that we can avoid taking any filesystem
> lock, as long as the pages being accessed are in the cache (and we don't
> need to readahead any pages into the cache).  We also avoid an indirect
> function call in these cases.

What does this micro-optimisation actually gain us except for more
complexity in the IO path?

i.e. if a filesystem lock has such massive overhead that it slows
down the cached readahead path in production workloads, then that's
something the filesystem needs to address, not unconditionally
bypass the filesystem before the IO gets anywhere near it.

> This could go horribly wrong if filesystems rely on doing work in their
> ->read_iter implementation (eg checking i_size after acquiring their
> lock) instead of keeping the page cache uptodate.  On the other hand,
> the ->map_pages() method is already called without locks, so filesystems
> should already be prepared for this.

Oh, gawd, we have *yet another* unlocked page cache read path that
can race with invalidations during fallocate() operations?

/me goes and looks at filemap_map_pages()

Yup, filemap_map_pages() is only safe against invalidations beyond
EOF (i.e. truncate) and can still race with invalidations within
EOF. So, yes, I'm right in that this path is not safe to run without
filesystem locking to serialise the IO against fallocate()...

Darrick, it looks like we need to wrap filemap_map_pages() with the
XFS_MMAPLOCK_SHARED like we do for all the other page fault paths
that can call into the IO path.

> Arguably we could do something similar for writes.  I'm a little more
> scared of that patch since filesystems are more likely to want to do
> things to keep their fies in sync for writes.

Please, no.  We can have uptodate cached pages over holes, unwritten
extents, shared extents, etc but they all require filesystem level
serialisation and space/block allocation work *before* we copy data
into the page. i.e. if allocation/space reservation fails, we need
to abort before changing data.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


  parent reply	other threads:[~2020-06-22  0:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-19 15:50 Matthew Wilcox
2020-06-19 19:06 ` Chaitanya Kulkarni
2020-06-19 20:12   ` Matthew Wilcox
2020-06-19 21:25     ` Chaitanya Kulkarni
2020-06-20  6:19 ` Amir Goldstein
2020-06-20 19:15   ` Matthew Wilcox
2020-06-21  6:00     ` Amir Goldstein
2020-06-22  1:02     ` Dave Chinner
2020-06-22  0:32 ` Dave Chinner [this message]
2020-06-22 14:35   ` Andreas Gruenbacher
2020-06-22 18:13     ` Matthew Wilcox
2020-06-24 12:35       ` Andreas Gruenbacher
2020-07-02 15:16         ` Andreas Gruenbacher
2020-07-02 17:30           ` Matthew Wilcox
2020-06-23  0:52     ` Dave Chinner
2020-06-23  7:41       ` Andreas Gruenbacher
2020-06-22 19:18   ` Matthew Wilcox
2020-06-23  2:35     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200622003215.GC2040@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=agruenba@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox