Re: Optimizing small reads

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 Linux-MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Optimizing small reads
Date: Tue, 7 Oct 2025 14:47:37 -0700	[thread overview]
Message-ID: <CAHk-=wjDvkQ9H9kEv-wWKTzdBsnCWpwgnvkaknv4rjSdLErG0g@mail.gmail.com> (raw)
In-Reply-To: <5zq4qlllkr7zlif3dohwuraa7rukykkuu6khifumnwoltcijfc@po27djfyqbka>

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

On Mon, 6 Oct 2025 at 11:04, Kiryl Shutsemau <kirill@shutemov.name> wrote:
>
> On Mon, Oct 06, 2025 at 08:50:55AM -0700, Linus Torvalds wrote:
> > On Mon, 6 Oct 2025 at 04:45, Kiryl Shutsemau <kirill@shutemov.name> wrote:
> > >
> > >  - Size of area is 256 bytes. I wounder if we want to get the fast read
> > >    to work on full page chunks. Can we dedicate a page per CPU for this?
> > >    I expect it to cover substantially more cases.
> >
> > I guess a percpu page would be good, but I really liked using the
> > buffer we already ended up having for that page array.
> >
> > Maybe worth playing around with.
>
> With page size buffer we might consider serving larger reads in the same
> manner with loop around filemap_fast_read().

Actually, thinking much more about this, I definitely do *not* think
that doing a separate page buffer is a good idea. I think it will only
cause the "fast path" to pollute more of the L1 cache, and slow things
down.

The on-stack buffer, in contrast, will pretty much either already be
in the cache, or would be brought into it regardless. So there's no
extra cache footprint from using it.

But the "loop around filemap_fast_read()" might be a fine idea even
with "just" the 256 byte buffer.

Something ENTIRELY UNTESTED like this, perhaps? This is relative to your patch.

Note: this also ends up doing it all with pagefaults disabled, becasue
that way we can do the looping without dropping and re-taking the RCU
lock. I'm not sure that is sensible, but whatever. Considering that I
didn't test this AT ALL, please just consider this a "wild handwaving"
patch.

           Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1876 bytes --]

 mm/filemap.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 60a7b9275741..541273388512 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2697,7 +2697,7 @@ static void filemap_end_dropbehind_read(struct folio *folio)
 	}
 }
 
-static inline unsigned long filemap_fast_read(struct address_space *mapping,
+static unsigned long filemap_fast_read(struct address_space *mapping,
 					      loff_t pos, char *buffer,
 					      size_t size)
 {
@@ -2792,20 +2792,38 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 	 * any compiler initialization would be pointless since this
 	 * can fill it will garbage.
 	 */
-	if (iov_iter_count(iter) <= sizeof(area)) {
+	if (iov_iter_count(iter) <= PAGE_SIZE) {
 		size_t count = iov_iter_count(iter);
+		size_t fast_read = 0;
 
 		/* Let's see if we can just do the read under RCU */
 		rcu_read_lock();
-		count = filemap_fast_read(mapping, iocb->ki_pos, area.buffer, count);
+		pagefault_disable();
+		do {
+			size_t copied = min(count, sizeof(area));
+
+			copied = filemap_fast_read(mapping, iocb->ki_pos, area.buffer, copied);
+			if (!copied)
+				break;
+			copied = copy_to_iter(area.buffer, copied, iter);
+			if (!copied)
+				break;
+			fast_read += copied;
+			iocb->ki_pos += copied;
+			already_read += copied;
+			count -= copied;
+		} while (count);
+		pagefault_enable();
 		rcu_read_unlock();
-		if (count) {
-			size_t copied = copy_to_iter(area.buffer, count, iter);
-			if (unlikely(!copied))
-				return already_read ? already_read : -EFAULT;
-			ra->prev_pos = iocb->ki_pos += copied;
+
+		if (fast_read) {
+			ra->prev_pos += fast_read;
+			already_read += fast_read;
 			file_accessed(filp);
-			return copied + already_read;
+
+			/* All done? */
+			if (!count)
+				return already_read;
 		}
 	}

next prev parent reply	other threads:[~2025-10-07 21:48 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-03  2:18 Linus Torvalds
2025-10-03  3:32 ` Luis Chamberlain
2025-10-15 21:31   ` Swarna Prabhu
2025-10-03  9:55 ` Kiryl Shutsemau
2025-10-03 16:18   ` Linus Torvalds
2025-10-03 16:40     ` Linus Torvalds
2025-10-03 17:23       ` Kiryl Shutsemau
2025-10-03 17:49         ` Linus Torvalds
2025-10-06 11:44           ` Kiryl Shutsemau
2025-10-06 15:50             ` Linus Torvalds
2025-10-06 18:04               ` Kiryl Shutsemau
2025-10-06 18:14                 ` Linus Torvalds
2025-10-07 21:47                 ` Linus Torvalds [this message]
2025-10-07 22:35                   ` Linus Torvalds
2025-10-07 22:54                     ` Linus Torvalds
2025-10-07 23:30                       ` Linus Torvalds
2025-10-08 14:54                         ` Kiryl Shutsemau
2025-10-08 16:27                           ` Linus Torvalds
2025-10-08 17:03                             ` Linus Torvalds
2025-10-09 16:22                               ` Kiryl Shutsemau
2025-10-09 17:29                                 ` Linus Torvalds
2025-10-10 10:10                                   ` Kiryl Shutsemau
2025-10-10 17:51                                     ` Linus Torvalds
2025-10-13 15:35                                       ` Kiryl Shutsemau
2025-10-13 15:39                                         ` Kiryl Shutsemau
2025-10-13 16:19                                           ` Linus Torvalds
2025-10-14 12:58                                             ` Kiryl Shutsemau
2025-10-14 16:41                                               ` Linus Torvalds
2025-10-13 16:06                                         ` Linus Torvalds
2025-10-13 17:26                                         ` Theodore Ts'o
2025-10-14  3:20                                           ` Theodore Ts'o
2025-10-08 10:28                       ` Kiryl Shutsemau
2025-10-08 16:24                         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjDvkQ9H9kEv-wWKTzdBsnCWpwgnvkaknv4rjSdLErG0g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox