From: Kiryl Shutsemau <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Linux-MM <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: Optimizing small reads
Date: Mon, 6 Oct 2025 19:04:26 +0100 [thread overview]
Message-ID: <5zq4qlllkr7zlif3dohwuraa7rukykkuu6khifumnwoltcijfc@po27djfyqbka> (raw)
In-Reply-To: <CAHk-=wi4Cma0HL2DVLWRrvte5NDpcb2A6VZNwUc0riBr2=7TXw@mail.gmail.com>
On Mon, Oct 06, 2025 at 08:50:55AM -0700, Linus Torvalds wrote:
> On Mon, 6 Oct 2025 at 04:45, Kiryl Shutsemau <kirill@shutemov.name> wrote:
> >
> > Below is my take on this. Lightly tested.
>
> Thanks.
>
> That looked even simpler than what I thought it would be, although I
> worry a bit about the effect on page_cache_delete() now being much
> more expensive with that spinlock.
>
> And the spinlock actually looks unnecessary, since page_cache_delete()
> already relies on being serialized by holding the i_pages lock.
>
> So I think you can just change the seqcount_spinlock_t to a plain
> seqcount_t with no locking at all, and document that external locking.
That is not a spinlock. It is a lockdep annotation saying that we expect
the spinlock to be held there for the seqcount write to be valid.
It is a NOP with lockdep disabled.
pidfs does the same: pidmap_lock_seq is tied to pidmap_lock. As an
example of the code flow: free_pid() takes pidmap_lock and calls
pidfs_remove_pid(), which takes pidmap_lock_seq.
It also takes care of disabling preemption if needed.
Unless I totally misunderstood how it works... :P
> > - Do we want a bounded retry on read_seqcount_retry()?
> > Maybe upto 3 iterations?
>
> No, I don't think it ever triggers, and I really like how this looks.
>
> And I'd go even further, and change that first
>
> seq = read_seqcount_begin(&mapping->i_pages_delete_seqcnt);
>
> into a
>
> 	if (!raw_seqcount_try_begin(&mapping->i_pages_delete_seqcnt, seq))
> 		return 0;
>
> so that you don't even wait for any existing case.
Ack.
> That you could even do *outside* the RCU section, but I'm not sure
> that buys us anything.
>
> *If* somebody ever hits it we can revisit, but I really think the
> whole point of this fast-path is to just deal with the common case
> quickly.
>
> There are going to be other things that are much more common and much
> more realistic, like "this is the first read, so I need to set the
> accessed bit".
>
> > - HIGHMEM support is trivial with memcpy_from_file_folio();
>
> Good call. I didn't even want to think about it, and obviously never did.
>
> > - I opted for the late partial-read check. It would be nice to allow
> >   reads across the PAGE_SIZE boundary as long as they are in the same folio
>
> Sure,
>
> When I wrote that patch, I actually worried more about the negative
> overhead of it not hitting at all, so I tried very hard to minimize
> the cases where we look up a folio speculatively only to then decide
> we can't use it.
Consider it warming up CPU cache for slow path :P
> But as long as that
>
> if (iov_iter_count(iter) <= sizeof(area)) {
>
> is there to protect the really basic rule, I guess it's not a huge deal.
>
> > - Move i_size check after uptodate check. It seems to be required
> > according to the comment in filemap_read(). But I cannot say I
> > understand i_size implications here.
>
> I forget too, and it might be voodoo programming.
>
> > - The size of the area is 256 bytes. I wonder if we want the fast read
> >   to work on full-page chunks. Can we dedicate a page per CPU for this?
> >   I expect it to cover substantially more cases.
>
> I guess a percpu page would be good, but I really liked using the
> buffer we already ended up having for that page array.
>
> Maybe worth playing around with.
With a page-sized buffer we might consider serving larger reads in the
same manner, with a loop around filemap_fast_read().
--
Kiryl Shutsemau / Kirill A. Shutemov