linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Kiryl Shutsemau <kirill@shutemov.name>,
	David Hildenbrand <david@redhat.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org,  linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,  Kiryl Shutsemau <kas@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH] mm/filemap: Implement fast short reads
Date: Tue, 21 Oct 2025 05:47:01 -1000
Message-ID: <CAHk-=wh62OxWsL+msmks7=VdBJHz7HvRYoPDckkAEAwsgrmjew@mail.gmail.com>
In-Reply-To: <20251019215328.3b529dc78222787226bd4ffe@linux-foundation.org>

On Sun, 19 Oct 2025 at 18:53, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Is there really no way to copy the dang thing straight out to
> userspace, skip the bouncing?

Sadly, no.

It's trivial to copy to user space in a RCU-protected region: just
disable page faults and it all works fine.

In fact, it works so fine that everything boots and it all looks
beautiful in profiles etc - ask me how I know.

But it's still wrong. The problem is that *after* you've copied things
away from the page cache, you need to check that the page cache
contents are still valid.

And it's not a problem to do that and just say "don't count the bytes
I just copied, and we'll copy over them later".

But while 99.999% of the time we *will* copy over them later, it's not
actually guaranteed. What might happen is that after we've filled in
user space with the optimistically copied data, we figure out that the
page cache is no longer valid, and we go to the slow case, and two
problems may have happened:

 (a) the file got truncated in the meantime, and we just filled in
stale data (possibly zeroes) in a user space buffer, and we're
returning a smaller length than what we filled out.

Will user space care? Not realistically, no. But it's wrong, and some
user space *might* be using the buffer as a ring-buffer or something,
and assume that if we return 5 bytes from "read()", the subsequent
bytes are still valid from (previous) ring buffer fills.
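
To make (a) concrete, here is a minimal single-threaded user-space
sketch. It is not kernel code: "struct fake_page", "racing_truncate()",
"slow_read()", "read_direct()" and "read_bounced()" are all hypothetical
names invented for this illustration, and the race is simulated by calling
the truncate by hand between the copy and the validation. It assumes a
simple change counter stands in for whatever check tells the reader the
page cache contents are no longer valid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 16

/* Hypothetical stand-in for a cached file page; not a kernel structure. */
struct fake_page {
    unsigned seq;           /* bumped on every change to size or contents */
    size_t size;            /* number of valid bytes in data[] */
    char data[PAGE_BYTES];
};

/* Simulates a truncate to 5 bytes racing with the reader. */
void racing_truncate(struct fake_page *p)
{
    p->size = 5;
    memset(p->data + 5, 0, PAGE_BYTES - 5);
    p->seq++;
}

/* Slow path: stands in for a properly serialized read of current contents. */
size_t slow_read(const struct fake_page *p, char *dst, size_t len)
{
    size_t n = len < p->size ? len : p->size;

    memcpy(dst, p->data, n);
    return n;
}

/* Optimistic copy straight into the caller's buffer, validated afterwards. */
size_t read_direct(struct fake_page *p, char *dst, size_t len)
{
    unsigned seq = p->seq;
    size_t n = len < p->size ? len : p->size;

    memcpy(dst, p->data, n);        /* caller's buffer is modified here */
    racing_truncate(p);             /* simulated race, before validation */
    if (p->seq == seq)
        return n;
    return slow_read(p, dst, len);  /* short read; bytes past it stay clobbered */
}

/* Same race, but the optimistic copy lands in a private bounce buffer. */
size_t read_bounced(struct fake_page *p, char *dst, size_t len)
{
    char bounce[PAGE_BYTES];
    unsigned seq = p->seq;
    size_t n = len < p->size ? len : p->size;

    assert(n <= sizeof(bounce));
    memcpy(bounce, p->data, n);
    racing_truncate(p);             /* simulated race, before validation */
    if (p->seq == seq) {
        memcpy(dst, bounce, n);     /* publish only validated data */
        return n;
    }
    return slow_read(p, dst, len);  /* caller's buffer untouched past the result */
}
```

With a 16-byte destination pre-filled by an earlier ring-buffer pass, both
functions return 5 in this simulation, but only "read_direct()" has
overwritten bytes 5..15 of the destination.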

But if we decide to ignore that issue (possibly with some "open()"
time flag to say "give me optimistic short reads, and I won't care"),
we still have

 (b) the folio we copied from might have been released and re-used for
something else

and this is fatal. We might have optimistically copied things that are
now security-sensitive and even if we return a short read - or
overwrite it - later, user space should never have seen that data.
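
A similar user-space sketch for (b), again with the race simulated by
hand. "struct fake_slot", "racing_reuse()", "copy_direct()" and
"copy_bounced()" are hypothetical names made up for this example, and the
owner generation counter is an assumed stand-in for the real validity
check. The point it tries to show is only the ordering: the foreign data
reaches the destination before validation has a chance to fail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_BYTES 8

/* Hypothetical stand-in for memory that can be freed and reused. */
struct fake_slot {
    unsigned owner_gen;     /* changes each time the slot gets a new owner */
    char data[SLOT_BYTES];
};

/* Simulates the slot being released and reused for unrelated, sensitive data. */
void racing_reuse(struct fake_slot *s)
{
    memcpy(s->data, "SECRET!!", SLOT_BYTES);
    s->owner_gen++;
}

/* Returns bytes copied, or 0 when validation fails and the caller must retry. */
size_t copy_direct(struct fake_slot *s, char *dst)
{
    unsigned gen = s->owner_gen;        /* lookup: slot still holds file data */

    assert(dst != NULL);
    racing_reuse(s);                    /* reuse lands between lookup and copy */
    memcpy(dst, s->data, SLOT_BYTES);   /* copies the new owner's data */
    return s->owner_gen == gen ? SLOT_BYTES : 0;
}

/* Same race, but nothing reaches dst until validation has passed. */
size_t copy_bounced(struct fake_slot *s, char *dst)
{
    char bounce[SLOT_BYTES];
    unsigned gen = s->owner_gen;

    assert(dst != NULL);
    racing_reuse(s);                    /* reuse lands between lookup and copy */
    memcpy(bounce, s->data, SLOT_BYTES);
    if (s->owner_gen != gen)
        return 0;                       /* foreign data stayed in the private buffer */
    memcpy(dst, bounce, SLOT_BYTES);
    return SLOT_BYTES;
}
```

Both calls report failure (0) in this simulation, but after
"copy_direct()" the destination already holds the new owner's bytes, while
after "copy_bounced()" it is unchanged.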

This (b) thing is solvable too, but requires that page cache releases
always be RCU-delayed, and they aren't.

So both are "solvable", but they are very big and very separate solutions.

               Linus


