From: Kiryl Shutsemau <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Linux-MM <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: Optimizing small reads
Date: Mon, 6 Oct 2025 19:04:26 +0100 [thread overview]
Message-ID: <5zq4qlllkr7zlif3dohwuraa7rukykkuu6khifumnwoltcijfc@po27djfyqbka> (raw)
In-Reply-To: <CAHk-=wi4Cma0HL2DVLWRrvte5NDpcb2A6VZNwUc0riBr2=7TXw@mail.gmail.com>
On Mon, Oct 06, 2025 at 08:50:55AM -0700, Linus Torvalds wrote:
> On Mon, 6 Oct 2025 at 04:45, Kiryl Shutsemau <kirill@shutemov.name> wrote:
> >
> > Below is my take on this. Lightly tested.
>
> Thanks.
>
> That looked even simpler than what I thought it would be, although I
> worry a bit about the effect on page_cache_delete() now being much
> more expensive with that spinlock.
>
> And the spinlock actually looks unnecessary, since page_cache_delete()
> already relies on being serialized by holding the i_pages lock.
>
> So I think you can just change the seqcount_spinlock_t to a plain
> seqcount_t with no locking at all, and document that external locking.
That is not a spinlock. It is a lockdep annotation saying that we expect
the spinlock to be held there for the seqcount write to be valid.
It is a NOP with lockdep disabled.
pidfs does the same: pidmap_lock_seq is tied to pidmap_lock. As an
example of the code flow: free_pid() takes pidmap_lock and calls
pidfs_remove_pid(), which takes pidmap_lock_seq.
It also takes care of disabling preemption if needed.
Unless I totally misunderstood how it works... :P
> > - Do we want a bounded retry on read_seqcount_retry()?
> > Maybe upto 3 iterations?
>
> No, I don't think it ever triggers, and I really like how this looks.
>
> And I'd go even further, and change that first
>
> seq = read_seqcount_begin(&mapping->i_pages_delete_seqcnt);
>
> into a
>
> 	if (!raw_seqcount_try_begin(&mapping->i_pages_delete_seqcnt, seq))
> 		return 0;
>
> so that you don't even wait for any existing case.
Ack.
> That you could even do *outside* the RCU section, but I'm not sure
> that buys us anything.
>
> *If* somebody ever hits it we can revisit, but I really think the
> whole point of this fast-path is to just deal with the common case
> quickly.
>
> There are going to be other things that are much more common and much
> more realistic, like "this is the first read, so I need to set the
> accessed bit".
>
> > - HIGHMEM support is trivial with memcpy_from_file_folio();
>
> Good call. I didn't even want to think about it, and obviously never did.
>
> > - I opted for the late partial-read check. It would be nice to allow
> >   reads across the PAGE_SIZE boundary as long as they are in the same folio
>
> Sure,
>
> When I wrote that patch, I actually worried more about the negative
> overhead of it not hitting at all, so I tried very hard to minimize
> the cases where we look up a folio speculatively only to then decide
> we can't use it.
Consider it warming up CPU cache for slow path :P
> But as long as that
>
> if (iov_iter_count(iter) <= sizeof(area)) {
>
> is there to protect the really basic rule, I guess it's not a huge deal.
>
> > - Move i_size check after uptodate check. It seems to be required
> > according to the comment in filemap_read(). But I cannot say I
> > understand i_size implications here.
>
> I forget too, and it might be voodoo programming.
>
> > - The size of the area is 256 bytes. I wonder if we want the fast read
> >   to work on full-page chunks. Can we dedicate a page per CPU for this?
> >   I expect it to cover substantially more cases.
>
> I guess a percpu page would be good, but I really liked using the
> buffer we already ended up having for that page array.
>
> Maybe worth playing around with.
With a page-sized buffer we might consider serving larger reads in the
same manner, with a loop around filemap_fast_read().
--
Kiryl Shutsemau / Kirill A. Shutemov