linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Shaurya Rane <ssrane_b23@ee.vjti.ac.in>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Meta kernel team <kernel-team@meta.com>,
	bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	syzbot+09b7d050e4806540153d@syzkaller.appspotmail.com,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH bpf v2] lib/buildid: use __kernel_read() for sleepable context
Date: Mon, 22 Dec 2025 05:31:50 +0000	[thread overview]
Message-ID: <aUjXxSAD2-c4Ivy1@casper.infradead.org> (raw)
In-Reply-To: <64muytpsnwmjcnc5szbz4gfnh2owgorsfdl5zmomtykptfry4s@tuajoyqmulqc>

On Thu, Dec 18, 2025 at 09:58:43PM -0800, Shakeel Butt wrote:
> On Fri, Dec 19, 2025 at 04:07:51AM +0000, Matthew Wilcox wrote:
> > > I am assuming that __kernel_read() can return less data than the
> > > requested. Is that assumption incorrect?
> > 
> > I think it can, but I don't think a second call will get any more data.
> > For example, it could hit EOF.  What led you to think that calling it in
> > a loop was the right approach?
> 
> I am kind of following the convention of a userspace application doing
> read() syscall i.e. repeatedly call read() until you hit an error or EOF
> in which case 0 will be returned or you successfully read the amount of
> data you want. I am handling negative error and 0 and for 0, I am
> returning -EIO as that would be unexpected end of an ELF file.

Oh, you sweet summer child.  I hope Father Christmas leaves you an
extra special present in your stocking this week!

While it would be lovely to believe that userspace does that kind of loop,
it just doesn't.  That's why mounting NFS filesystems with the "intr"
option (before I made it a no-op) was dangerous -- userspace just isn't
prepared for short reads.  I mean, we're lucky if userspace even checks
for errors, let alone does this kind of advanced "oh, we got fewer bytes
than we wanted, keep trying" scheme.

A filesystem's ->read_iter() implementation can stop short of reading
all bytes requested if:

 - We hit EIO.  No amount of asking will return more bytes, the data's
   just not there any more.
 - We hit EOF.  There's no more data to read.
 - We're unable to access the buffer.  That's only possible for user
   buffers, not kernel buffers.
 - We receive a fatal signal.  I suppose there is the tiniest chance
   that the I/O completes while we're processing the "we returned early"
   loop, but in practice, the user has asked that we die now, and even
   trying again is rude.  Just die as quickly as we can.

I can't think of any other cases.  It's just not allowed to return
short reads to userspace (other than EIO/EOF), and that drives all
the behaviour.

> Anyways the question is if __kernel_read() returns less amount of data
> than requested, should we return error instead of retrying? I looked
> couple of callers of __kernel_read() & kernel_read(). Some are erroring
> out if received data is less than requested (e.g. big_key_read()) and
> some are calling in the loop (e.g. kernel_read_file()).

kernel_read_file() is wrong.  Thanks for reporting it; I'll send a patch.


  parent reply	other threads:[~2025-12-22  5:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-18 20:55 Shakeel Butt
2025-12-18 21:21 ` Andrew Morton
2025-12-18 23:55 ` Matthew Wilcox
2025-12-19  0:16   ` Shakeel Butt
2025-12-19  4:07     ` Matthew Wilcox
2025-12-19  5:58       ` Shakeel Butt
2025-12-19 17:42         ` Andrii Nakryiko
2025-12-22  5:33           ` Matthew Wilcox
2025-12-22  5:31         ` Matthew Wilcox [this message]
2025-12-22 19:41           ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aUjXxSAD2-c4Ivy1@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=kernel-team@meta.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shakeel.butt@linux.dev \
    --cc=ssrane_b23@ee.vjti.ac.in \
    --cc=syzbot+09b7d050e4806540153d@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox