linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Hillf Danton <hdanton@sina.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v20 03/32] splice: Make direct_read_splice() limit to eof where appropriate
Date: Fri, 19 May 2023 10:37:59 -0700
Message-ID: <CAHk-=wjDq5_wLWrapzFiJ3ZNn6aGFWeMJpAj5q+4z-Ok8DD9dA@mail.gmail.com>
In-Reply-To: <1845768.1684514823@warthog.procyon.org.uk>

On Fri, May 19, 2023 at 9:48 AM David Howells <dhowells@redhat.com> wrote:
>
> This is just an optimisation to cut down the amount of bufferage allocated

So the thing is, it's actually very very wrong for some files.

Now, admittedly, those files have other issues too, and it's a design
mistake to begin with, but look at a number of files in /proc.

In particular, look at the regular files that have a size of '0'. It's
quite common indeed. Things like

    /proc/cpuinfo
    /proc/stat
    ...

you can find a ton of them with

    find /proc -type f -size 0

Is it horribly wrong and bad? Yes. I hate it. It means that some
really basic user space tools refuse to work on them, and the tools
are 100% right - this is a kernel misfeature. Trying to do things like

    less -S /proc/cpuinfo

may or may not work depending on your version of 'less', for example,
because it's entirely reasonable to do something like

    fd = open(..);
    if (!fstat(fd, &st))
         len = st.st_size;

and limit your reads to the size of the file - exactly like your patch does.

Except it fails horribly on those broken /proc files.
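
To make the difference concrete, here is a minimal userspace sketch of the
two strategies described above. The helper names read_up_to_st_size() and
read_until_eof() are made up for illustration and do not come from any tool
or from the patch. On an ordinary regular file both return the same byte
count; on a /proc file that reports st_size == 0, the first is expected to
return 0 while the second still sees the data.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: trust fstat() and never read past st_size.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_up_to_st_size(int fd)
{
    struct stat st;
    char buf[4096];
    off_t left;
    ssize_t total = 0;

    if (fstat(fd, &st) != 0)
        return -1;

    left = st.st_size;          /* 0 for many /proc files */
    while (left > 0) {
        size_t want = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);
        ssize_t n = read(fd, buf, want);

        if (n < 0)
            return -1;
        if (n == 0)             /* EOF earlier than st_size claimed */
            break;
        total += n;
        left -= n;
    }
    return total;
}

/*
 * Hypothetical helper: ignore st_size and read until read() returns 0.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_until_eof(int fd)
{
    char buf[4096];
    ssize_t total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    return n < 0 ? -1 : total;
}
```

Calling both helpers on a freshly opened /proc/cpuinfo (reopening or
seeking back to 0 in between) is one way to see the mismatch on a given
system; the exact byte counts depend on the machine.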

I hate it, and I blame myself for the above horror, but it's pretty
much unfixable. We could make them look like named pipes or something,
but that's really ugly and probably would break other things anyway.
And we simply don't know the size ahead of time.

Now, *most* things work, because they just do the whole "read until
EOF". In fact, my current version of 'less' has no problem at all
doing the above thing, and gives the "expected" output.

Also, honestly, I really don't think that it's necessarily a good idea
to splice /proc files, but we actually do have splice wired up to
these because people asked for it:

    fe33850ff798 ("proc: wire up generic_file_splice_read for iter ops")
    4bd6a7353ee1 ("sysctl: Convert to iter interfaces")

so I suspect those things do exist.

> I could just drop it and leave it to userspace for now as the filesystem/block
> layer will stop anyway if it hits the EOF.  Christoph would prefer that I call
> direct_splice_read() from generic_file_splice_read() in all O_DIRECT cases, if
> that's fine with you.

I guess that's fine, and for O_DIRECT itself it might even make sense
to do the size test. That said, I doubt it matters: if you use
O_DIRECT on a small file, you only have yourself to blame for doing
something stupid.

And if it isn't a small file, then who cares about some small EOF-time
optimization? Nobody.

So I would suggest not doing that optimization at all, because as-is,
it's either pointless or actively broken.

That said, I would *not* hate some kind of special FMODE_SIZELIMIT
flag that allows filesystems to opt in to "limit reads to size".

We already have flags like that: FMODE_UNSIGNED_OFFSET and
'sb->s_maxbytes' are both basically variations on that same theme, and
having another flag to say "limit reads to i_size" wouldn't be wrong.

It's only wrong when it is done mindlessly with S_ISREG().
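
As a rough illustration of the opt-in idea, the following is a userspace
model, not kernel code. FMODE_SIZELIMIT does not exist in the kernel as far
as this message says; it is only a suggestion here. Everything in the sketch
(MODEL_FMODE_SIZELIMIT, struct model_file, model_clamp_read()) is a
hypothetical name invented for the example. The point it shows is that the
clamp to i_size only happens when the file has opted in, so files with a
bogus size of 0 are left alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the suggested FMODE_SIZELIMIT. */
#define MODEL_FMODE_SIZELIMIT 0x1u

/* Hypothetical stand-in for the bits of a file that matter here. */
struct model_file {
    unsigned int f_mode;   /* mode flags, including the opt-in bit */
    uint64_t     i_size;   /* size the filesystem reports */
};

/*
 * Return how many bytes a read of 'len' bytes at offset 'pos' may ask for.
 * Files that did not opt in get the full request, so a reported size of 0
 * does not truncate the read; read-until-EOF semantics still apply.
 */
static size_t model_clamp_read(const struct model_file *f,
                               uint64_t pos, size_t len)
{
    if (!(f->f_mode & MODEL_FMODE_SIZELIMIT))
        return len;                         /* no opt-in: do not clamp */
    if (pos >= f->i_size)
        return 0;                           /* at or past the reported EOF */
    if (len > f->i_size - pos)
        return (size_t)(f->i_size - pos);   /* trim to the reported EOF */
    return len;
}
```

With this shape, a /proc-style file (i_size 0, flag clear) still gets its
full read length, while a filesystem that sets the flag gets requests
trimmed to the end of the file.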

             Linus

