From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Kiryl Shutsemau <kirill@shutemov.name>,
David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Yang Shi <shy828301@gmail.com>,
Dave Chinner <david@fromorbit.com>,
Suren Baghdasaryan <surenb@google.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/filemap: Implement fast short reads
Date: Mon, 27 Oct 2025 08:50:53 -0700 [thread overview]
Message-ID: <CAHk-=wiWmTpQwz5FZ_=At_Tw+Nm_5Fcy-9is_jXCMo9T0mshZQ@mail.gmail.com> (raw)
In-Reply-To: <b8e56515-3903-068c-e4bd-fc0ca5c30d94@google.com>
On Mon, 27 Oct 2025 at 03:49, Hugh Dickins <hughd@google.com> wrote:
>
> This makes a fundamental change to speculative page cache assumptions.
Yes, but I'm a bit surprised by people who find that scary.
The page cache does *much* more scary things elsewhere, particularly
the whole folio_try_get() dance (in filemap_get_entry() and other
places).
I suspect you ignore that just because it's been that way forever, so
you're comfortable with it.
I'd argue that that is much *much* more subtle because it means that
somebody may be incrementing the page count of a page that has already
been re-allocated by somebody else.
Talk about cognitive load: that code makes you think that "hey, the
tryget means that if it has been released, we don't get a ref to it",
because that's how many of our *other* speculative RCU accesses do in
fact work.
But that's not how the page cache works, exactly because freeing isn't
actually RCU-delayed.
So while the code visually follows the exact same pattern as some
other "look up speculatively under RCU, skip if it's not there any
more", it actually does exactly the same thing as the "copy data under
RCU, then check later if it was ok". Except it does "increment
refcount under RCU, then check later if it was actually valid".
That said, I wonder if we might not consider making page cache freeing
be RCU-delayed. This has come up before (exactly *because* of that
"folio_try_get()").
Because while I am pretty sure that filemap_get_entry() is fine (and a
number of other core users), I'm not convinced that some of the other
users of folio_try_get() are necessarily aware of just how subtle that
thing is.
Anyway, I'm certainly not going to push that patch very hard.
But I do think that a "3x performance improvement on a case that is
known to be an issue for at least one real-world customer" shouldn't
be called "a niche case". I've seen *way* more niche than that.
(I do think RCU-freeing folios would potentially be an interesting
thing to look into, but I don't think the patch under discussion is
necessarily the reason to do so).
Linus
next prev parent reply other threads:[~2025-10-27 15:51 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-17 14:15 Kiryl Shutsemau
2025-10-18 2:38 ` kernel test robot
2025-10-18 3:54 ` kernel test robot
2025-10-18 4:46 ` kernel test robot
2025-10-18 17:56 ` Linus Torvalds
2025-10-20 11:03 ` Kiryl Shutsemau
2025-10-20 4:53 ` Andrew Morton
2025-10-20 11:33 ` Kiryl Shutsemau
2025-10-21 15:50 ` Linus Torvalds
2025-10-21 23:39 ` Dave Chinner
2025-10-22 4:25 ` Linus Torvalds
2025-10-22 8:00 ` Dave Chinner
2025-10-22 15:31 ` Linus Torvalds
2025-10-23 7:50 ` Dave Chinner
2025-10-23 9:37 ` Jan Kara
2025-10-21 15:47 ` Linus Torvalds
2025-10-22 7:08 ` Pedro Falcato
2025-10-22 7:13 ` Linus Torvalds
2025-10-22 7:38 ` Pedro Falcato
2025-10-22 10:00 ` Kiryl Shutsemau
2025-10-22 17:28 ` David Hildenbrand
2025-10-23 10:31 ` Kiryl Shutsemau
2025-10-23 10:54 ` David Hildenbrand
2025-10-23 11:09 ` Kiryl Shutsemau
2025-10-23 12:08 ` David Hildenbrand
2025-10-23 11:10 ` David Hildenbrand
2025-10-23 11:11 ` David Hildenbrand
2025-10-23 11:40 ` Kiryl Shutsemau
2025-10-23 11:49 ` David Hildenbrand
2025-10-23 12:41 ` Kiryl Shutsemau
2025-10-23 17:42 ` Yang Shi
2025-10-27 10:49 ` Hugh Dickins
2025-10-27 15:50 ` Linus Torvalds [this message]
2025-10-27 16:06 ` David Hildenbrand
2025-10-27 16:48 ` Linus Torvalds
2025-10-27 16:53 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wiWmTpQwz5FZ_=At_Tw+Nm_5Fcy-9is_jXCMo9T0mshZQ@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=david@fromorbit.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=shy828301@gmail.com \
--cc=surenb@google.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox