linux-mm.kvack.org archive mirror
From: Amir Goldstein <amir73il@gmail.com>
To: Shyam Prasad N <nspmangalore@gmail.com>
Cc: lsf-pc@lists.linux-foundation.org,
	 linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-mm@kvack.org, brauner@kernel.org,
	 Matthew Wilcox <willy@infradead.org>,
	David Howells <dhowells@redhat.com>,
	 Jeff Layton <jlayton@redhat.com>,
	Steve French <smfrench@gmail.com>,
	trondmy@kernel.org,  Shyam Prasad N <sprasad@microsoft.com>
Subject: Re: [LSF/MM/BPF TOPIC] Predictive readahead of dentries
Date: Wed, 15 Jan 2025 15:21:52 +0100
Message-ID: <CAOQ4uxhBWV3DfqaE=reuPjh8w92wwujA6Abj=Gt0YvapR4m_1g@mail.gmail.com>
In-Reply-To: <CANT5p=rxOnq_jtnOpMTKA+ycKYzkJyjESbAkE5zj20h4XtE0Ew@mail.gmail.com>

On Wed, Jan 15, 2025 at 12:27 PM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> On Tue, Jan 14, 2025 at 6:55 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Jan 14, 2025 at 4:38 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> > >
> > > The Linux kernel performs buffered reads and writes through the page
> > > cache, where filesystem reads and writes are offloaded to the VM/MM
> > > layer. The VM layer performs predictive readahead by optionally asking
> > > the filesystem to asynchronously read more data than was requested.
> > >
> > > The VFS layer maintains a dentry cache which is populated when
> > > dentries are accessed (either during readdir/getdents or during
> > > lookup). The dentries within a directory effectively form the address
> > > space of the directory, which is read sequentially during getdents.
> > > For network filesystems, dentries are also looked up during
> > > revalidation.
> > >
> > > During sequential getdents, it makes sense to perform readahead
> > > similar to file reads. Even for revalidations and dentry lookups,
> > > heuristics could be maintained to detect whether the lookups within
> > > the directory are sequential. With this, the dentry cache for a
> > > directory can be pre-populated before the dentries are accessed,
> > > improving performance. Network filesystems could benefit even more,
> > > because this avoids costly round trips to the server.
> > >
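To make the "heuristics" above concrete, here is a minimal sketch of one possible sequential-lookup detector, loosely modeled on how file readahead grows its window while an access pattern stays sequential. Everything in it (struct dir_ra_state, dir_ra_access(), RA_TRIGGER and the window sizes) is hypothetical and not part of any kernel, and it assumes each lookup can be mapped to the entry's index in readdir order, which is itself an assumption.

```c
/* HYPOTHETICAL per-directory readahead state; not an existing kernel structure. */
struct dir_ra_state {
    long last_pos;      /* readdir-order index of the previous lookup, -1 if none */
    unsigned streak;    /* number of consecutive sequential lookups */
    unsigned window;    /* current readahead window, in entries */
};

#define RA_TRIGGER    4    /* assumed: sequential hits before prefetching starts */
#define RA_MIN_WINDOW 8    /* assumed initial window */
#define RA_MAX_WINDOW 256  /* assumed cap on the window */

void dir_ra_init(struct dir_ra_state *ra)
{
    ra->last_pos = -1;
    ra->streak = 0;
    ra->window = 0;
}

/*
 * Record a lookup of the entry at readdir-order index pos.
 * Returns how many entries after pos to prefetch (0 = no readahead).
 */
unsigned dir_ra_access(struct dir_ra_state *ra, long pos)
{
    if (ra->last_pos >= 0 && pos == ra->last_pos + 1) {
        ra->streak++;
    } else {
        /* Non-sequential access: reset, much like file readahead after a seek. */
        ra->streak = 0;
        ra->window = 0;
    }
    ra->last_pos = pos;

    if (ra->streak < RA_TRIGGER)
        return 0;

    /* Start small and double the window while the pattern holds. */
    if (ra->window == 0)
        ra->window = RA_MIN_WINDOW;
    else if (ra->window < RA_MAX_WINDOW)
        ra->window *= 2;
    return ra->window;
}
```

A caller would prefetch only the entries inside the returned window that are not already cached, so a growing window does not mean fetching the same entries again.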
> >
> > I believe you are referring to READDIRPLUS, which is quite common
> > for network protocols and also supported by FUSE.
> This discussion is not entirely about readdirplus, but readdirplus is
> definitely part of it.
> I'm suggesting issuing the next set of readdir() calls in advance, so
> that the data needed to serve them is already in the cache.
> I'm also suggesting proactively issuing a readdir to avoid sequential
> revalidation of each dentry, or a readdirplus to avoid a stat of each
> inode corresponding to these dentries.
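For reference, the per-entry pattern that readdirplus avoids looks roughly like this from user space: one readdir() stream (backed by getdents) for the names, plus one stat per entry when the listing is "ls -l"-style. This sketch uses only standard POSIX calls; walk_dir() is an illustrative name, and counting successful fstatat() calls merely stands in for counting round trips on a network filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Walk a directory the way "ls" (long_format == 0) or "ls -l"
 * (long_format != 0) would. Returns the number of entries seen
 * (excluding "." and ".."), or -1 if the directory cannot be opened.
 * *nstat receives how many per-entry stat calls succeeded.
 */
long walk_dir(const char *path, int long_format, long *nstat)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    long nentries = 0;
    *nstat = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {   /* names come from getdents */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        nentries++;
        if (long_format) {
            struct stat st;
            /* One lookup + getattr per entry: the sequential pattern that
             * readdirplus (or predictive readahead) would batch up. */
            if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
                (*nstat)++;
        }
    }
    closedir(dir);
    return nentries;
}
```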

Well, if readdirplus is implemented, then "readaheadplus" could be
implemented with async io_uring readdirplus commands, right?
The io_uring command would have to know to chain each following
readdirplus command with the offset returned in the previous
readdirplus response, but that should be doable, I think?
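The chaining described above relies on directory offsets being resumable cookies: each entry's d_off says where the next entry starts, so request N+1 can be issued with the cookie taken from the end of request N. The sketch below shows that contract synchronously with getdents64(2) and lseek(2). It does not use io_uring and does not assume that any io_uring getdents or readdirplus opcode exists; read_batch() is a hypothetical helper, and the code assumes a 64-bit off_t.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Record layout returned by getdents64(2); glibc does not export this struct. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;     /* opaque cookie: where the *next* entry starts */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;
    char           d_name[];
};

/*
 * Read one batch of entries starting at directory cookie `start`.
 * Returns the number of entries in the batch, 0 at end of directory,
 * or -1 on error. On success, *next is the d_off of the last entry,
 * i.e. the cookie at which the following batch must begin; it is left
 * untouched when 0 or -1 is returned.
 */
long read_batch(int fd, int64_t start, int64_t *next)
{
    _Alignas(uint64_t) char buf[4096];

    if (lseek(fd, (off_t)start, SEEK_SET) == (off_t)-1)
        return -1;
    long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
    if (nread <= 0)
        return nread;

    long count = 0;
    for (long pos = 0; pos < nread; ) {
        struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
        *next = d->d_off;     /* chain point for the next request */
        count++;
        pos += d->d_reclen;
    }
    return count;
}
```

Usage: start with a cookie of 0 and keep feeding *next back in until read_batch() returns 0. An asynchronous version would issue batch N+1 as soon as the completion for batch N supplied its cookie.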

> >
> > Unlike network protocols, FUSE decides, based on server configuration
> > and heuristics, whether to use readdirplus ("fuse_use_readdirplus").
> > Specifically, in readdirplus_auto mode, FUSE starts with readdirplus, but
> > if nothing has called lookup on the directory inode by the time of the
> > next getdents call, it stops using readdirplus.
> >
> > I personally ran into this problem: I would like the application,
> > which knows whether it is doing "ls" or "ls -l", to control whether a
> > specific getdents() uses FUSE readdirplus, because in situations where
> > "ls -l" is not needed, this can avoid a lot of unneeded IO.
> >
> > I do not know if implementing readdirplus (i.e. populating the inode and
> > dentry) makes sense for disk filesystems, but if we do it at the VFS level,
> > there has to be an API to control, or at least opt out of, readdirplus, like
> > with readahead.
> That would be a great knob to have for network filesystems. We have to
> rely on heuristics today to predict which of these patterns the
> workload is using.
>
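Roughly, the readdirplus_auto behavior described above amounts to a test-and-clear flag: a lookup in the directory marks it, the next getdents consumes the mark, and a listing that starts at offset 0 always uses readdirplus. The following is a simplified user-space model of that behavior, plus the application opt-out knob being asked for; struct rdplus_dir, its fields and both functions are hypothetical names, not the FUSE client's.

```c
#include <stdbool.h>

/* HYPOTHETICAL model of per-directory readdirplus state. */
struct rdplus_dir {
    bool server_supports;  /* connection negotiated readdirplus */
    bool auto_mode;        /* readdirplus_auto-style heuristic enabled */
    bool lookup_seen;      /* set by lookup, consumed by the next getdents */
    bool app_opt_out;      /* hypothetical application opt-out knob */
};

/* Called when something looks up a name in this directory. */
void rdplus_note_lookup(struct rdplus_dir *d)
{
    d->lookup_seen = true;
}

/* Decide whether the getdents batch starting at pos uses readdirplus. */
bool rdplus_should_use(struct rdplus_dir *d, long pos)
{
    if (!d->server_supports || d->app_opt_out)
        return false;
    if (!d->auto_mode)
        return true;
    if (d->lookup_seen) {
        d->lookup_seen = false;  /* test-and-clear: one lookup buys one batch */
        return true;
    }
    return pos == 0;             /* a fresh listing starts with readdirplus */
}
```

In "ls -l", the per-entry lookups keep re-arming the flag so readdirplus stays on; in plain "ls", no lookups arrive and readdirplus is dropped after the first batch. The opt-out check comes first so the application can override the heuristic entirely.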

It seems like the demand has existed for a long time.
The man page for posix_fadvise(2) says:
"Programs can use posix_fadvise() to announce an intention to access file data
 in a specific pattern in the future, thus allowing the kernel to perform
 appropriate optimizations."

I do not read this as limited to non-directory files, and indeed fadvise() can
be called on directories, but others could argue that this is an API abuse.
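For comparison, this is what calling the existing posix_fadvise() on a directory looks like. It is a sketch: whether the advice has any effect on a directory depends on the filesystem, and advise_dir_sequential() is an illustrative name. Note that posix_fadvise() returns an error number directly rather than setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a directory and announce sequential access with posix_fadvise().
 * Returns 0 on success, otherwise an error number (from open() via
 * errno, or returned directly by posix_fadvise()).
 */
int advise_dir_sequential(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return errno;  /* e.g. ENOENT, or ENOTDIR for a non-directory */

    /* offset 0, len 0: the advice covers the whole "file" */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    close(fd);
    return err;
}
```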

Mind sending a patch for POSIX_FADV_{NO,}READDIRPLUS?
Make sure it fails with -ENOTDIR on non-directories, and be ready to face the
inevitable bikeshedding ;)

Thanks,
Amir.

