From: Linus Torvalds <torvalds@linux-foundation.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Matt Whitlock <kernel@mattwhitlock.name>,
David Howells <dhowells@redhat.com>,
netdev@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@kvack.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
Date: Thu, 29 Jun 2023 11:53:13 -0700 [thread overview]
Message-ID: <CAHk-=wjixHw6n_R5TQWW1r0a+GgFAPGw21KMj6obkzr3qXXbYA@mail.gmail.com> (raw)
In-Reply-To: <ZJ3OoCcSxZzzgUur@casper.infradead.org>
On Thu, 29 Jun 2023 at 11:34, Matthew Wilcox <willy@infradead.org> wrote:
>
> I think David muddied the waters by talking about vmsplice(). The
> problem encountered is with splice() from the page cache. Reading
> the documentation,
>
> splice() moves data between two file descriptors without copying be‐
> tween kernel address space and user address space. It transfers up to
> len bytes of data from the file descriptor fd_in to the file descriptor
> fd_out, where one of the file descriptors must refer to a pipe.
Well, the original intent really always was that it's about zero-copy.
So I do think that the answer to your test-program is that yes, it
really traditionally *should* output "new".
A splice from a file acts like a scatter-gather mmap() in the kernel.
It's the original intent, and it's the whole reason it's noticeably
faster than doing a write.
Now, do I then agree that splice() has turned out to be a nasty morass
of problems? Yes.
And I even agree that while I actually *think* that your test program
should output "new" (because that is the whole point of the exercise),
it also means that people who use splice() need to *understand* that,
and it's much too easy to get things wrong if you don't understand
that the whole point of splice is to act as a kind of ad-hoc in-kernel
mmap thing.
And to make matters worse, for mmap() we actually do have some
coherency helpers. For splice(), the page ref stays around.
It's kind of like GUP and page pinning - another area where we have
had lots of problems and lots of nasty semantics and complications
with other VM operations over the years.
So I really *really* don't want to complicate splice() even more to
give it some new semantics that it has never ever really had, because
people didn't understand it and used it wrong.
Quite the reverse. I'd be willing to *simplify* splice() by just
saying "it was all a mistake", and just turning it into wrappers
around read/write. But those patches would have to be radical
simplifications, not adding yet more crud on top of the pain that is
splice().
Because it will hurt performance. And I'm ok with that as long as it
comes with huge simplifications. What I'm *not* ok with is "I mis-used
splice, now I want splice to act differently, so let's make it even
more complicated".
Linus
next prev parent reply other threads:[~2023-06-29 18:53 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-29 15:54 David Howells
2023-06-29 15:54 ` [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns David Howells
2023-07-19 10:17 ` Miklos Szeredi
2023-07-19 17:59 ` Matt Whitlock
2023-07-19 19:35 ` Miklos Szeredi
2023-07-19 19:44 ` Matthew Wilcox
2023-07-19 19:56 ` Miklos Szeredi
2023-07-19 20:04 ` Matthew Wilcox
2023-07-19 20:16 ` Linus Torvalds
2023-07-19 21:02 ` Matt Whitlock
2023-07-19 23:20 ` Linus Torvalds
2023-07-19 23:41 ` Matt Whitlock
2023-07-20 0:00 ` Linus Torvalds
2023-07-19 23:48 ` Linus Torvalds
2023-07-24 9:44 ` David Howells
2023-07-24 13:55 ` Miklos Szeredi
2023-07-24 16:15 ` David Howells
2023-06-29 15:54 ` [RFC PATCH 2/4] splice: Make vmsplice() steal or copy David Howells
2023-06-30 13:44 ` Simon Horman
2023-06-30 15:29 ` David Howells
2023-06-30 17:32 ` Simon Horman
2023-06-29 15:54 ` [RFC PATCH 3/4] splice: Remove some now-unused bits David Howells
2023-06-29 15:54 ` [RFC PATCH 4/4] splice: Record some statistics David Howells
2023-06-29 17:56 ` [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe Linus Torvalds
2023-06-29 18:05 ` Matt Whitlock
2023-06-29 18:19 ` Linus Torvalds
2023-06-29 18:34 ` Matthew Wilcox
2023-06-29 18:53 ` Linus Torvalds [this message]
2023-06-30 16:50 ` David Howells
2023-06-29 18:42 ` Linus Torvalds
2023-06-29 18:16 ` Matt Whitlock
2023-06-30 0:01 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wjixHw6n_R5TQWW1r0a+GgFAPGw21KMj6obkzr3qXXbYA@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=kernel@mattwhitlock.name \
--cc=linux-fsdevel@kvack.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox