linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kiryl Shutsemau <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	 Luis Chamberlain <mcgrof@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Optimizing small reads
Date: Wed, 8 Oct 2025 11:28:42 +0100	[thread overview]
Message-ID: <bbagpeesjg73emxpwkxnvaepcn5hjrsrabaamtth2m26khhppa@7hpwl2mk3mlc> (raw)
In-Reply-To: <CAHk-=wgzXWxG=PCmi_NQ6Z50_EXAL9vGHQSGMNAVkK4ooqOLiA@mail.gmail.com>

On Tue, Oct 07, 2025 at 03:54:19PM -0700, Linus Torvalds wrote:
> On Tue, 7 Oct 2025 at 15:35, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But I think I'll try to boot it next. Wish me luck.
> 
> Bah. It boots - after you fix the stupid double increment of 'already_copied'.
> 
> I didn't remove the update inside the loop when I made it update it
> after the loop.
> 
> So here's the slightly fixed patch that actually does boot - and that
> I'm running right now. But I wouldn't call it exactly "tested".
> 
> Caveat patchor.

My take on the same is below.

The biggest difference is that I drop RCU lock between iterations. But
as you said, not sure if it is sensible. It allows page faults.

Other thing that I noticed that we don't do flush_dcache_folio() in fast
path. I bypassed fast path if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.

>  mm/filemap.c | 33 +++++++++++++++++++++++++--------
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 60a7b9275741..ba11f018ca6b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2792,20 +2792,37 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>  	 * any compiler initialization would be pointless since this
>  	 * can fill it will garbage.
>  	 */
> -	if (iov_iter_count(iter) <= sizeof(area)) {
> +	if (iov_iter_count(iter) <= PAGE_SIZE) {

PAGE_SIZE is somewhat arbitrary here. We might want to see if we can do
full length (or until the first failure). But holding RCU read lock whole
time might not be a good idea in this case.

>  		size_t count = iov_iter_count(iter);
> +		size_t fast_read = 0;
>  
>  		/* Let's see if we can just do the read under RCU */
>  		rcu_read_lock();
> -		count = filemap_fast_read(mapping, iocb->ki_pos, area.buffer, count);
> +		pagefault_disable();
> +		do {
> +			size_t copied = min(count, sizeof(area));
> +
> +			copied = filemap_fast_read(mapping, iocb->ki_pos, area.buffer, copied);
> +			if (!copied)
> +				break;

filemap_fast_read() will only read short on EOF. So if it reads short we
don't need additional iterations.

> +			copied = copy_to_iter(area.buffer, copied, iter);
> +			if (!copied)
> +				break;
> +			fast_read += copied;
> +			iocb->ki_pos += copied;
> +			count -= copied;
> +		} while (count);
> +		pagefault_enable();
>  		rcu_read_unlock();
> -		if (count) {
> -			size_t copied = copy_to_iter(area.buffer, count, iter);
> -			if (unlikely(!copied))
> -				return already_read ? already_read : -EFAULT;
> -			ra->prev_pos = iocb->ki_pos += copied;
> +
> +		if (fast_read) {
> +			ra->prev_pos += fast_read;
> +			already_read += fast_read;
>  			file_accessed(filp);
> -			return copied + already_read;
> +
> +			/* All done? */
> +			if (!count)
> +				return already_read;
>  		}
>  	}
>  

diff --git a/mm/filemap.c b/mm/filemap.c
index d9fda3c3ae2c..6b9627cf47af 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2752,29 +2752,48 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes - iocb->ki_pos);
 
+	/* Don't bother with flush_dcache_folio() */
+	if (ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE)
+		goto slowpath;
+
 	/*
 	 * Try a quick lockless read into the 'area' union. Note that
 	 * this union is intentionally marked "__uninitialized", because
 	 * any compiler initialization would be pointless since this
 	 * can fill it will garbage.
 	 */
-	if (iov_iter_count(iter) <= sizeof(area)) {
-		size_t count = iov_iter_count(iter);
+	do {
+		size_t to_read, read, copied;
+
+		to_read = min(iov_iter_count(iter), sizeof(area));
 
 		/* Let's see if we can just do the read under RCU */
 		rcu_read_lock();
-		count = filemap_fast_read(mapping, iocb->ki_pos, area.buffer, count);
+		read = filemap_fast_read(mapping, iocb->ki_pos, area.buffer, to_read);
 		rcu_read_unlock();
-		if (count) {
-			size_t copied = copy_to_iter(area.buffer, count, iter);
-			if (unlikely(!copied))
-				return already_read ? already_read : -EFAULT;
-			ra->prev_pos = iocb->ki_pos += copied;
-			file_accessed(filp);
-			return copied + already_read;
-		}
-	}
 
+		if (!read)
+			break;
+
+		copied = copy_to_iter(area.buffer, read, iter);
+
+		already_read += copied;
+		iocb->ki_pos += copied;
+		last_pos = iocb->ki_pos;
+
+		if (copied < read) {
+			error = -EFAULT;
+			goto out;
+		}
+
+		/* filemap_fast_read() only reads short at EOF: Stop. */
+		if (read != to_read)
+			goto out;
+	} while (iov_iter_count(iter));
+
+	if (!iov_iter_count(iter))
+		goto out;
+slowpath:
 	/*
 	 * This actually properly initializes the fbatch for the slow case
 	 */
@@ -2865,7 +2884,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		}
 		folio_batch_init(&area.fbatch);
 	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
-
+out:
 	file_accessed(filp);
 	ra->prev_pos = last_pos;
 	return already_read ? already_read : error;
-- 
  Kiryl Shutsemau / Kirill A. Shutemov


  parent reply	other threads:[~2025-10-08 10:28 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-03  2:18 Linus Torvalds
2025-10-03  3:32 ` Luis Chamberlain
2025-10-15 21:31   ` Swarna Prabhu
2025-10-03  9:55 ` Kiryl Shutsemau
2025-10-03 16:18   ` Linus Torvalds
2025-10-03 16:40     ` Linus Torvalds
2025-10-03 17:23       ` Kiryl Shutsemau
2025-10-03 17:49         ` Linus Torvalds
2025-10-06 11:44           ` Kiryl Shutsemau
2025-10-06 15:50             ` Linus Torvalds
2025-10-06 18:04               ` Kiryl Shutsemau
2025-10-06 18:14                 ` Linus Torvalds
2025-10-07 21:47                 ` Linus Torvalds
2025-10-07 22:35                   ` Linus Torvalds
2025-10-07 22:54                     ` Linus Torvalds
2025-10-07 23:30                       ` Linus Torvalds
2025-10-08 14:54                         ` Kiryl Shutsemau
2025-10-08 16:27                           ` Linus Torvalds
2025-10-08 17:03                             ` Linus Torvalds
2025-10-09 16:22                               ` Kiryl Shutsemau
2025-10-09 17:29                                 ` Linus Torvalds
2025-10-10 10:10                                   ` Kiryl Shutsemau
2025-10-10 17:51                                     ` Linus Torvalds
2025-10-13 15:35                                       ` Kiryl Shutsemau
2025-10-13 15:39                                         ` Kiryl Shutsemau
2025-10-13 16:19                                           ` Linus Torvalds
2025-10-14 12:58                                             ` Kiryl Shutsemau
2025-10-14 16:41                                               ` Linus Torvalds
2025-10-13 16:06                                         ` Linus Torvalds
2025-10-13 17:26                                         ` Theodore Ts'o
2025-10-14  3:20                                           ` Theodore Ts'o
2025-10-08 10:28                       ` Kiryl Shutsemau [this message]
2025-10-08 16:24                         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bbagpeesjg73emxpwkxnvaepcn5hjrsrabaamtth2m26khhppa@7hpwl2mk3mlc \
    --to=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox