* [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
@ 2025-11-14 19:37 ssrane_b23
2025-11-14 20:44 ` Matthew Wilcox
0 siblings, 1 reply; 22+ messages in thread
From: ssrane_b23 @ 2025-11-14 19:37 UTC (permalink / raw)
To: willy
Cc: akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, Shaurya Rane,
syzbot+09b7d050e4806540153d
From: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
When read_cache_folio() is called with a NULL filler function on a
mapping that does not implement read_folio, a NULL pointer
dereference occurs in filemap_read_folio().
The crash occurs when:
build_id_parse() is called on a VMA backed by a file from a
filesystem that does not implement ->read_folio() (e.g. procfs,
sysfs, or other virtual filesystems).
read_cache_folio() is called with filler = NULL.
do_read_cache_folio() assigns filler = mapping->a_ops->read_folio,
which is still NULL.
filemap_read_folio() calls filler(), causing a NULL pointer
dereference.
The fix is to add a NULL check after the fallback assignment and return
-EIO. Callers handle this error safely.
Reported-by: syzbot+09b7d050e4806540153d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=09b7d050e4806540153d
Fixes: ad41251c290d ("lib/buildid: implement sleepable build_id_parse() API")
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
---
mm/filemap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 13f0259d993c..f700fe931d61 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3980,6 +3980,8 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
if (!filler)
filler = mapping->a_ops->read_folio;
+ if (!filler)
+ return ERR_PTR(-EIO);
repeat:
folio = filemap_get_folio(mapping, index);
if (IS_ERR(folio)) {
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-14 19:37 [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio() ssrane_b23
@ 2025-11-14 20:44 ` Matthew Wilcox
2025-11-16 5:42 ` [PATCH v2] " ssrane_b23
2025-11-16 5:43 ` [PATCH] " SHAURYA RANE
0 siblings, 2 replies; 22+ messages in thread
From: Matthew Wilcox @ 2025-11-14 20:44 UTC (permalink / raw)
To: ssrane_b23
Cc: akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Sat, Nov 15, 2025 at 01:07:29AM +0530, ssrane_b23@ee.vjti.ac.in wrote:
> When read_cache_folio() is called with a NULL filler function on a
> mapping that does not implement read_folio, a NULL pointer
> dereference occurs in filemap_read_folio().
>
> The crash occurs when:
>
> build_id_parse() is called on a VMA backed by a file from a
> filesystem that does not implement ->read_folio() (e.g. procfs,
> sysfs, or other virtual filesystems).
Not a fan of this approach, to be honest. This should be caught at
a higher level. In __build_id_parse(), there's already a check:
/* only works for page backed storage */
if (!vma->vm_file)
return -EINVAL;
which is funny because the comment is correct, but the code is not.
I suspect the right answer is to add right after it:
+ if (vma->vm_file->f_mapping->a_ops == &empty_aops)
+ return -EINVAL;
Want to test that out?
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-14 20:44 ` Matthew Wilcox
@ 2025-11-16 5:42 ` ssrane_b23
2025-11-16 5:43 ` [PATCH] " SHAURYA RANE
1 sibling, 0 replies; 22+ messages in thread
From: ssrane_b23 @ 2025-11-16 5:42 UTC (permalink / raw)
To: willy
Cc: akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d,
ssrane_b23
From: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
When read_cache_folio() is called with a NULL filler function on a mapping
that doesn't implement read_folio, a NULL pointer dereference occurs in
filemap_read_folio().
The crash occurs when:
1. build_id_parse() is called on a VMA backed by a file from a filesystem
that doesn't implement address_space_operations.read_folio (e.g., procfs,
sysfs, or other virtual filesystems)
2. read_cache_folio() is called with filler=NULL
3. do_read_cache_folio() sets filler = mapping->a_ops->read_folio (still NULL)
4. filemap_read_folio() calls filler() causing a NULL pointer dereference
The fix is to add a NULL check after the fallback assignment and return
-EIO, which is handled gracefully by the callers.
Reported-by: syzbot+09b7d050e4806540153d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=09b7d050e4806540153d
Fixes: ad41251c290d ("lib/buildid: implement sleepable build_id_parse() API")
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
---
Changes in v2:
- Add early read_folio check in __build_id_parse() for unsupported filesystems.
-Add defensive !filler check in do_read_cache_folio() to prevent NULL dereference.
---
lib/buildid.c | 3 +++
mm/filemap.c | 2 ++
2 files changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index c4b0f376fb34..3136cc92010f 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -298,6 +298,9 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
/* only works for page backed storage */
if (!vma->vm_file)
return -EINVAL;
+ /* check if filesystem supports page cache operations */
+ if (!vma->vm_file->f_mapping->a_ops->read_folio)
+ return -EINVAL;
freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
diff --git a/mm/filemap.c b/mm/filemap.c
index 13f0259d993c..f700fe931d61 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3980,6 +3980,8 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
if (!filler)
filler = mapping->a_ops->read_folio;
+ if (!filler)
+ return ERR_PTR(-EIO);
repeat:
folio = filemap_get_folio(mapping, index);
if (IS_ERR(folio)) {
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-14 20:44 ` Matthew Wilcox
2025-11-16 5:42 ` [PATCH v2] " ssrane_b23
@ 2025-11-16 5:43 ` SHAURYA RANE
2025-11-16 22:32 ` Matthew Wilcox
1 sibling, 1 reply; 22+ messages in thread
From: SHAURYA RANE @ 2025-11-16 5:43 UTC (permalink / raw)
To: Matthew Wilcox
Cc: akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Sat, Nov 15, 2025 at 2:14 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Nov 15, 2025 at 01:07:29AM +0530, ssrane_b23@ee.vjti.ac.in wrote:
> > When read_cache_folio() is called with a NULL filler function on a
> > mapping that does not implement read_folio, a NULL pointer
> > dereference occurs in filemap_read_folio().
> >
> > The crash occurs when:
> >
> > build_id_parse() is called on a VMA backed by a file from a
> > filesystem that does not implement ->read_folio() (e.g. procfs,
> > sysfs, or other virtual filesystems).
>
> Not a fan of this approach, to be honest. This should be caught at
> a higher level. In __build_id_parse(), there's already a check:
>
> /* only works for page backed storage */
> if (!vma->vm_file)
> return -EINVAL;
>
> which is funny because the comment is correct, but the code is not.
> I suspect the right answer is to add right after it:
>
> + if (vma->vm_file->f_mapping->a_ops == &empty_aops)
> + return -EINVAL;
>
> Want to test that out?
Thanks for the suggestion.
Checking for
a_ops == &empty_aops
is not enough. Certain filesystems for example XFS with DAX use
their own a_ops table (not empty_aops) but still do not implement
->read_folio(). In those cases read_cache_folio() still ends up with
filler = NULL and filemap_read_folio(NULL) crashes.
Since build_id_parse() only works for true page-backed mappings, I think the
most reliable fix is to fail even earlier in _build_id_parse() before
we even reach the filemap path:
if (!vma->vm_file->f_mapping->a_ops->read_folio)
return -EINVAL;
This catches XFS+DAX and any other filesystem that lacks ->read_folio,
and it fails fast at the correct layer rather than deeper in mm/filemap.
I think this is the right approach. I’ll send a v2 shortly.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-16 5:43 ` [PATCH] " SHAURYA RANE
@ 2025-11-16 22:32 ` Matthew Wilcox
2025-11-17 14:10 ` Shaurya Rane
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Matthew Wilcox @ 2025-11-16 22:32 UTC (permalink / raw)
To: SHAURYA RANE
Cc: akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
First, some process things ;-)
1. Thank you for working on this. Andrii has been ignoring it since
August, which is bad. So thank you for picking it up.
2. Sending a v2 while we're having a discussion is generally a bad idea.
It's fine to send a patch as a reply, but going as far as a v2 isn't
necessary. If conversation has died down, then a v2 is definitely
warranted, but you and I are still having a discussion ;-)
3. When you do send a v2 (or, now that you've sent a v2, send a v3),
do it as a new thread rather then in reply to the v1 thread. That plays
better with the tooling we have like b4 which will pull in all patches
in a thread.
With that over with, on to the fun technical stuff.
On Sun, Nov 16, 2025 at 11:13:42AM +0530, SHAURYA RANE wrote:
> On Sat, Nov 15, 2025 at 2:14 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sat, Nov 15, 2025 at 01:07:29AM +0530, ssrane_b23@ee.vjti.ac.in wrote:
> > > When read_cache_folio() is called with a NULL filler function on a
> > > mapping that does not implement read_folio, a NULL pointer
> > > dereference occurs in filemap_read_folio().
> > >
> > > The crash occurs when:
> > >
> > > build_id_parse() is called on a VMA backed by a file from a
> > > filesystem that does not implement ->read_folio() (e.g. procfs,
> > > sysfs, or other virtual filesystems).
> >
> > Not a fan of this approach, to be honest. This should be caught at
> > a higher level. In __build_id_parse(), there's already a check:
> >
> > /* only works for page backed storage */
> > if (!vma->vm_file)
> > return -EINVAL;
> >
> > which is funny because the comment is correct, but the code is not.
> > I suspect the right answer is to add right after it:
> >
> > + if (vma->vm_file->f_mapping->a_ops == &empty_aops)
> > + return -EINVAL;
> >
> > Want to test that out?
> Thanks for the suggestion.
> Checking for
> a_ops == &empty_aops
> is not enough. Certain filesystems for example XFS with DAX use
> their own a_ops table (not empty_aops) but still do not implement
> ->read_folio(). In those cases read_cache_folio() still ends up with
> filler = NULL and filemap_read_folio(NULL) crashes.
Ah, right. I had assumed that the only problem was synthetic
filesystems like sysfs and procfs which can't have buildids because
buildids only exist in executables. And neither procfs nor sysfs
contain executables.
But DAX is different. You can absolutely put executables on a DAX
filesystem. So we shouldn't filter out DAX here. And we definitely
shouldn't *silently* fail for DAX. Otherwise nobody will ever realise
that the buildid people just couldn't be bothered to make DAX work.
I don't think it's necessarily all that hard to make buildid work
for DAX. It's probably something like:
if (IS_DAX(file_inode(file)))
kernel_read(file, buf, count, &pos);
but that's just off the top of my head.
I really don't want the check for filler being NULL in read_cache_folio().
I want it to crash noisily if callers are doing something stupid.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-16 22:32 ` Matthew Wilcox
@ 2025-11-17 14:10 ` Shaurya Rane
2025-11-17 18:42 ` Andrii Nakryiko
2025-11-17 16:41 ` Darrick J. Wong
2025-11-18 5:05 ` Christoph Hellwig
2 siblings, 1 reply; 22+ messages in thread
From: Shaurya Rane @ 2025-11-17 14:10 UTC (permalink / raw)
To: Matthew Wilcox
Cc: akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Sun, Nov 16, 2025 at 10:32:12PM +0000, Matthew Wilcox wrote:
> First, some process things ;-)
>
> 1. Thank you for working on this. Andrii has been ignoring it since
> August, which is bad. So thank you for picking it up.
>
> 2. Sending a v2 while we're having a discussion is generally a bad idea.
> It's fine to send a patch as a reply, but going as far as a v2 isn't
> necessary. If conversation has died down, then a v2 is definitely
> warranted, but you and I are still having a discussion ;-)
>
> 3. When you do send a v2 (or, now that you've sent a v2, send a v3),
> do it as a new thread rather then in reply to the v1 thread. That plays
> better with the tooling we have like b4 which will pull in all patches
> in a thread.
>
Apologies for the process errors regarding the v2 submission. I appreciate the guidance on the workflow and threading; I will ensure the next version is sent as a clean, new thread once we have agreed on the technical solution.
> With that over with, on to the fun technical stuff.
>
> On Sun, Nov 16, 2025 at 11:13:42AM +0530, SHAURYA RANE wrote:
> > On Sat, Nov 15, 2025 at 2:14 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Sat, Nov 15, 2025 at 01:07:29AM +0530, ssrane_b23@ee.vjti.ac.in wrote:
> > > > When read_cache_folio() is called with a NULL filler function on a
> > > > mapping that does not implement read_folio, a NULL pointer
> > > > dereference occurs in filemap_read_folio().
> > > >
> > > > The crash occurs when:
> > > >
> > > > build_id_parse() is called on a VMA backed by a file from a
> > > > filesystem that does not implement ->read_folio() (e.g. procfs,
> > > > sysfs, or other virtual filesystems).
> > >
> > > Not a fan of this approach, to be honest. This should be caught at
> > > a higher level. In __build_id_parse(), there's already a check:
> > >
> > > /* only works for page backed storage */
> > > if (!vma->vm_file)
> > > return -EINVAL;
> > >
> > > which is funny because the comment is correct, but the code is not.
> > > I suspect the right answer is to add right after it:
> > >
> > > + if (vma->vm_file->f_mapping->a_ops == &empty_aops)
> > > + return -EINVAL;
> > >
> > > Want to test that out?
> > Thanks for the suggestion.
> > Checking for
> > a_ops == &empty_aops
> > is not enough. Certain filesystems for example XFS with DAX use
> > their own a_ops table (not empty_aops) but still do not implement
> > ->read_folio(). In those cases read_cache_folio() still ends up with
> > filler = NULL and filemap_read_folio(NULL) crashes.
>
> Ah, right. I had assumed that the only problem was synthetic
> filesystems like sysfs and procfs which can't have buildids because
> buildids only exist in executables. And neither procfs nor sysfs
> contain executables.
>
> But DAX is different. You can absolutely put executables on a DAX
> filesystem. So we shouldn't filter out DAX here. And we definitely
> shouldn't *silently* fail for DAX. Otherwise nobody will ever realise
> that the buildid people just couldn't be bothered to make DAX work.
>
> I don't think it's necessarily all that hard to make buildid work
> for DAX. It's probably something like:
>
> if (IS_DAX(file_inode(file)))
> kernel_read(file, buf, count, &pos);
>
> but that's just off the top of my head.
>
>
I agree that DAX needs proper support rather than silent filtering.
However, investigating the actual syzbot reproducer revealed that the issue extends beyond just DAX. The crash is actually triggering on tmpfs (shmem).I verified via debug logging that the crashing VMA is backed by `shmem_aops`. Looking at `mm/shmem.c`, tmpfs legitimately lacks a `.read_folio` implementation by design.
It seems there are several "real" filesystems that can contain executables/libraries but lack `.read_folio`:
1. tmpfs/shmem
2. OverlayFS (delegates I/O)
3. DAX filesystems
Given that this affects multiple filesystem types, handling them all correctly via `kernel_read` might be a larger scope than fixing the immediate crash. I worry about missing edge cases in tmpfs or OverlayFS if we try to implement the fallback immediately in this patch.
> I really don't want the check for filler being NULL in read_cache_folio().
> I want it to crash noisily if callers are doing something stupid.
I propose the following approach for v3. It avoids the silent failure you are concerned about, but prevents the kernel panic:
1. Silent reject for `empty_aops` (procfs/sysfs), as they legitimately can't contain build IDs.
2. Loud warning + Error for other cases (DAX, tmpfs, OverlayFS).
The code would look like this:
/* pseudo-filesystems */
if (vma->vm_file->f_mapping->a_ops == &empty_aops)
return -EINVAL;
/* Real filesystems missing read_folio (DAX, tmpfs, OverlayFS, etc.) */
if (!vma->vm_file->f_mapping->a_ops->read_folio) {
/*
* TODO: Implement kernel_read() fallback for DAX/tmpfs.
* For now, fail loudly so we know what we are missing.
*/
pr_warn_once("build_id_parse: filesystem %s lacks read_folio support\n",
vma->vm_file->f_path.dentry->d_sb->s_type->name);
return -EOPNOTSUPP;
}
This highlights exactly which filesystems are missing support in the logs without crashing the machine
Thanks,
Shaurya
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-16 22:32 ` Matthew Wilcox
2025-11-17 14:10 ` Shaurya Rane
@ 2025-11-17 16:41 ` Darrick J. Wong
2025-11-17 18:03 ` Matthew Wilcox
2025-11-18 5:05 ` Christoph Hellwig
2 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2025-11-17 16:41 UTC (permalink / raw)
To: Matthew Wilcox
Cc: SHAURYA RANE, akpm, shakeel.butt, eddyz87, andrii, ast,
linux-fsdevel, linux-mm, linux-kernel, linux-kernel-mentees,
skhan, david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Sun, Nov 16, 2025 at 10:32:12PM +0000, Matthew Wilcox wrote:
> First, some process things ;-)
>
> 1. Thank you for working on this. Andrii has been ignoring it since
> August, which is bad. So thank you for picking it up.
>
> 2. Sending a v2 while we're having a discussion is generally a bad idea.
> It's fine to send a patch as a reply, but going as far as a v2 isn't
> necessary. If conversation has died down, then a v2 is definitely
> warranted, but you and I are still having a discussion ;-)
>
> 3. When you do send a v2 (or, now that you've sent a v2, send a v3),
> do it as a new thread rather then in reply to the v1 thread. That plays
> better with the tooling we have like b4 which will pull in all patches
> in a thread.
>
> With that over with, on to the fun technical stuff.
>
> On Sun, Nov 16, 2025 at 11:13:42AM +0530, SHAURYA RANE wrote:
> > On Sat, Nov 15, 2025 at 2:14 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Sat, Nov 15, 2025 at 01:07:29AM +0530, ssrane_b23@ee.vjti.ac.in wrote:
> > > > When read_cache_folio() is called with a NULL filler function on a
> > > > mapping that does not implement read_folio, a NULL pointer
> > > > dereference occurs in filemap_read_folio().
> > > >
> > > > The crash occurs when:
> > > >
> > > > build_id_parse() is called on a VMA backed by a file from a
> > > > filesystem that does not implement ->read_folio() (e.g. procfs,
> > > > sysfs, or other virtual filesystems).
> > >
> > > Not a fan of this approach, to be honest. This should be caught at
> > > a higher level. In __build_id_parse(), there's already a check:
> > >
> > > /* only works for page backed storage */
> > > if (!vma->vm_file)
> > > return -EINVAL;
> > >
> > > which is funny because the comment is correct, but the code is not.
> > > I suspect the right answer is to add right after it:
> > >
> > > + if (vma->vm_file->f_mapping->a_ops == &empty_aops)
> > > + return -EINVAL;
> > >
> > > Want to test that out?
> > Thanks for the suggestion.
> > Checking for
> > a_ops == &empty_aops
> > is not enough. Certain filesystems for example XFS with DAX use
> > their own a_ops table (not empty_aops) but still do not implement
> > ->read_folio(). In those cases read_cache_folio() still ends up with
> > filler = NULL and filemap_read_folio(NULL) crashes.
>
> Ah, right. I had assumed that the only problem was synthetic
> filesystems like sysfs and procfs which can't have buildids because
> buildids only exist in executables. And neither procfs nor sysfs
> contain executables.
>
> But DAX is different. You can absolutely put executables on a DAX
> filesystem. So we shouldn't filter out DAX here. And we definitely
> shouldn't *silently* fail for DAX. Otherwise nobody will ever realise
> that the buildid people just couldn't be bothered to make DAX work.
>
> I don't think it's necessarily all that hard to make buildid work
> for DAX. It's probably something like:
>
> if (IS_DAX(file_inode(file)))
> kernel_read(file, buf, count, &pos);
>
> but that's just off the top of my head.
I wondered why this whole thing opencodes kernel_read, but then I
noticed zero fstests for it and decid*******************************
*****.
--D
>
> I really don't want the check for filler being NULL in read_cache_folio().
> I want it to crash noisily if callers are doing something stupid.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-17 16:41 ` Darrick J. Wong
@ 2025-11-17 18:03 ` Matthew Wilcox
2025-11-17 18:45 ` Andrii Nakryiko
0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2025-11-17 18:03 UTC (permalink / raw)
To: Darrick J. Wong
Cc: SHAURYA RANE, akpm, shakeel.butt, eddyz87, andrii, ast,
linux-fsdevel, linux-mm, linux-kernel, linux-kernel-mentees,
skhan, david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Mon, Nov 17, 2025 at 08:41:55AM -0800, Darrick J. Wong wrote:
> I wondered why this whole thing opencodes kernel_read, but then I
> noticed zero fstests for it and decid*******************************
> *****.
I wondered the same thing! And the answer is that it's special BPF
stuff:
/* if sleeping is allowed, wait for the page, if necessary */
if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio))) {
filemap_invalidate_lock_shared(r->file->f_mapping);
r->folio = read_cache_folio(r->file->f_mapping, file_off >> PAGE_SHIFT,
NULL, r->file);
filemap_invalidate_unlock_shared(r->file->f_mapping);
}
if 'may_fault' (a misnomer since it really means "may sleep"), then we
essentially do kernel_read().
Now, maybe the right thing to do here is rip out almost all of
lib/buildid.c and replace it with an iocb with IOCB_NOWAIT set (or not).
I was hesitant to suggest this earlier as it's a bit of a big ask of
someone who was just trying to submit a one-line change. But now that
"it's also shmem" has entered the picture, I'm leaning more towards this
approach anyway.
Looking at it though, it's a bit weird that we don't have a
kiocb_read(). It feels like __kernel_read() needs to be split into
half like:
diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770..a3bf962836a7 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -503,14 +503,29 @@ static int warn_unsupported(struct file *file, const char *op)
return -EINVAL;
}
-ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
+ssize_t kiocb_read(struct kiocb *iocb, void *buf, size_t count)
{
+ struct file *file = iocb->ki_filp;
struct kvec iov = {
.iov_base = buf,
.iov_len = min_t(size_t, count, MAX_RW_COUNT),
};
- struct kiocb kiocb;
struct iov_iter iter;
+ int ret;
+
+ iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
+ ret = file->f_op->read_iter(iocb, &iter);
+ if (ret > 0) {
+ fsnotify_access(file);
+ add_rchar(current, ret);
+ }
+ inc_syscr(current);
+ return ret;
+}
+
+ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
+{
+ struct kiocb kiocb;
ssize_t ret;
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
@@ -526,15 +541,9 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
init_sync_kiocb(&kiocb, file);
kiocb.ki_pos = pos ? *pos : 0;
- iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
- ret = file->f_op->read_iter(&kiocb, &iter);
- if (ret > 0) {
- if (pos)
- *pos = kiocb.ki_pos;
- fsnotify_access(file);
- add_rchar(current, ret);
- }
- inc_syscr(current);
+ ret = kiocb_read(&kiocb, buf, count);
+ if (pos && ret > 0)
+ *pos = kiocb.ki_pos;
return ret;
}
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-17 14:10 ` Shaurya Rane
@ 2025-11-17 18:42 ` Andrii Nakryiko
0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2025-11-17 18:42 UTC (permalink / raw)
To: Shaurya Rane
Cc: Matthew Wilcox, akpm, shakeel.butt, eddyz87, andrii, ast,
linux-fsdevel, linux-mm, linux-kernel, linux-kernel-mentees,
skhan, david.hunter.linux, khalid, syzbot+09b7d050e4806540153d,
bpf
+ bpf@
On Mon, Nov 17, 2025 at 6:10 AM Shaurya Rane <ssrane_b23@ee.vjti.ac.in> wrote:
>
> On Sun, Nov 16, 2025 at 10:32:12PM +0000, Matthew Wilcox wrote:
> > First, some process things ;-)
> >
> > 1. Thank you for working on this. Andrii has been ignoring it since
> > August, which is bad. So thank you for picking it up.
It is bad, I'm sorry for this. I was surprised to read this, though,
as I was not aware of any bug related to build ID parsing code, so I
went looking at syzbot history of the issue. August timeframe you are
referring to implies those "Monthly fs report" emails, which
unfortunately I didn't receive as I'm not subscribed to linux-fsdevel,
but I do see that there was earlier report back in April, which I did
get in my inbox, apparently. So I'm sorry again for dropping the ball.
Please feel free to ping me or BPF mailing list next time when you see
something not being addressed in a timely manner.
> >
> > 2. Sending a v2 while we're having a discussion is generally a bad idea.
> > It's fine to send a patch as a reply, but going as far as a v2 isn't
> > necessary. If conversation has died down, then a v2 is definitely
> > warranted, but you and I are still having a discussion ;-)
> >
> > 3. When you do send a v2 (or, now that you've sent a v2, send a v3),
> > do it as a new thread rather then in reply to the v1 thread. That plays
> > better with the tooling we have like b4 which will pull in all patches
> > in a thread.
> >
> Apologies for the process errors regarding the v2 submission. I appreciate the guidance on the workflow and threading; I will ensure the next version is sent as a clean, new thread once we have agreed on the technical solution.
> > With that over with, on to the fun technical stuff.
> >
> > On Sun, Nov 16, 2025 at 11:13:42AM +0530, SHAURYA RANE wrote:
> > > On Sat, Nov 15, 2025 at 2:14 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Sat, Nov 15, 2025 at 01:07:29AM +0530, ssrane_b23@ee.vjti.ac.in wrote:
> > > > > When read_cache_folio() is called with a NULL filler function on a
> > > > > mapping that does not implement read_folio, a NULL pointer
> > > > > dereference occurs in filemap_read_folio().
> > > > >
> > > > > The crash occurs when:
> > > > >
> > > > > build_id_parse() is called on a VMA backed by a file from a
> > > > > filesystem that does not implement ->read_folio() (e.g. procfs,
> > > > > sysfs, or other virtual filesystems).
> > > >
> > > > Not a fan of this approach, to be honest. This should be caught at
> > > > a higher level. In __build_id_parse(), there's already a check:
> > > >
> > > > /* only works for page backed storage */
> > > > if (!vma->vm_file)
> > > > return -EINVAL;
> > > >
> > > > which is funny because the comment is correct, but the code is not.
> > > > I suspect the right answer is to add right after it:
> > > >
> > > > + if (vma->vm_file->f_mapping->a_ops == &empty_aops)
> > > > + return -EINVAL;
> > > >
> > > > Want to test that out?
> > > Thanks for the suggestion.
> > > Checking for
> > > a_ops == &empty_aops
> > > is not enough. Certain filesystems for example XFS with DAX use
> > > their own a_ops table (not empty_aops) but still do not implement
> > > ->read_folio(). In those cases read_cache_folio() still ends up with
> > > filler = NULL and filemap_read_folio(NULL) crashes.
> >
> > Ah, right. I had assumed that the only problem was synthetic
> > filesystems like sysfs and procfs which can't have buildids because
> > buildids only exist in executables. And neither procfs nor sysfs
> > contain executables.
> >
> > But DAX is different. You can absolutely put executables on a DAX
> > filesystem. So we shouldn't filter out DAX here. And we definitely
> > shouldn't *silently* fail for DAX. Otherwise nobody will ever realise
> > that the buildid people just couldn't be bothered to make DAX work.
> >
> > I don't think it's necessarily all that hard to make buildid work
> > for DAX. It's probably something like:
> >
> > if (IS_DAX(file_inode(file)))
> > kernel_read(file, buf, count, &pos);
> >
> > but that's just off the top of my head.
> >
> >
> I agree that DAX needs proper support rather than silent filtering.
> However, investigating the actual syzbot reproducer revealed that the issue extends beyond just DAX. The crash is actually triggering on tmpfs (shmem).I verified via debug logging that the crashing VMA is backed by `shmem_aops`. Looking at `mm/shmem.c`, tmpfs legitimately lacks a `.read_folio` implementation by design.
> It seems there are several "real" filesystems that can contain executables/libraries but lack `.read_folio`:
> 1. tmpfs/shmem
> 2. OverlayFS (delegates I/O)
> 3. DAX filesystems
> Given that this affects multiple filesystem types, handling them all correctly via `kernel_read` might be a larger scope than fixing the immediate crash. I worry about missing edge cases in tmpfs or OverlayFS if we try to implement the fallback immediately in this patch.
> > I really don't want the check for filler being NULL in read_cache_folio().
> > I want it to crash noisily if callers are doing something stupid.
> I propose the following approach for v3. It avoids the silent failure you are concerned about, but prevents the kernel panic:
>
> 1. Silent reject for `empty_aops` (procfs/sysfs), as they legitimately can't contain build IDs.
> 2. Loud warning + Error for other cases (DAX, tmpfs, OverlayFS).
>
Tbh, it seems a bit fragile to have to hard-code such file
system-specific logic in higher-level build ID fetching logic, where
all we really ask for from filemap_get_folio() + read_cache_folio()
combo is to give us requested piece of file or let us know (without
crashing) that this was not possible.
But if there is no way to abstract this away, then I think Shaurya
proposed with failing known-not-supported cases and warning on
unexpected ones would be a reasonable solution, I suppose. I see that
Matthew is discussing generalizing kernel_read, so maybe that will be
a better solution, let's see.
> The code would look like this:
>
> /* pseudo-filesystems */
> if (vma->vm_file->f_mapping->a_ops == &empty_aops)
> return -EINVAL;
>
> /* Real filesystems missing read_folio (DAX, tmpfs, OverlayFS, etc.) */
> if (!vma->vm_file->f_mapping->a_ops->read_folio) {
> /*
> * TODO: Implement kernel_read() fallback for DAX/tmpfs.
> * For now, fail loudly so we know what we are missing.
> */
> pr_warn_once("build_id_parse: filesystem %s lacks read_folio support\n",
> vma->vm_file->f_path.dentry->d_sb->s_type->name);
> return -EOPNOTSUPP;
> }
>
> This highlights exactly which filesystems are missing support in the logs without crashing the machine
> Thanks,
> Shaurya
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-17 18:03 ` Matthew Wilcox
@ 2025-11-17 18:45 ` Andrii Nakryiko
2025-11-18 13:03 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2025-11-17 18:45 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Darrick J. Wong, SHAURYA RANE, akpm, shakeel.butt, eddyz87,
andrii, ast, linux-fsdevel, linux-mm, linux-kernel,
linux-kernel-mentees, skhan, david.hunter.linux, khalid,
syzbot+09b7d050e4806540153d, bpf
+ bpf@
On Mon, Nov 17, 2025 at 10:03 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Nov 17, 2025 at 08:41:55AM -0800, Darrick J. Wong wrote:
> > I wondered why this whole thing opencodes kernel_read, but then I
> > noticed zero fstests for it and decid*******************************
> > *****.
>
> I wondered the same thing! And the answer is that it's special BPF
> stuff:
>
> /* if sleeping is allowed, wait for the page, if necessary */
> if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio))) {
> filemap_invalidate_lock_shared(r->file->f_mapping);
> r->folio = read_cache_folio(r->file->f_mapping, file_off >> PAGE_SHIFT,
> NULL, r->file);
> filemap_invalidate_unlock_shared(r->file->f_mapping);
> }
>
> if 'may_fault' (a misnomer since it really means "may sleep"), then we
> essentially do kernel_read().
>
> Now, maybe the right thing to do here is rip out almost all of
> lib/buildid.c and replace it with an iocb with IOCB_NOWAIT set (or not).
> I was hesitant to suggest this earlier as it's a bit of a big ask of
> someone who was just trying to submit a one-line change. But now that
> "it's also shmem" has entered the picture, I'm leaning more towards this
> approach anyway.
As I replied on another email, ideally we'd have some low-level file
reading interface where we wouldn't have to know about secretmem, or
XFS+DAX, or whatever other unusual combination of conditions where
exposed internal APIs like filemap_get_folio() + read_cache_folio()
can crash.
The only real limitation is that we'd like to be able to control
whether we are ok sleeping or not, as this code can be called from
pretty much anywhere BPF might run, which includes NMI context.
Would this kiocb_read() approach work under those circumstances?
>
> Looking at it though, it's a bit weird that we don't have a
> kiocb_read(). It feels like __kernel_read() needs to be split into
> half like:
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 833bae068770..a3bf962836a7 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -503,14 +503,29 @@ static int warn_unsupported(struct file *file, const char *op)
> return -EINVAL;
> }
>
> -ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
> +ssize_t kiocb_read(struct kiocb *iocb, void *buf, size_t count)
> {
> + struct file *file = iocb->ki_filp;
> struct kvec iov = {
> .iov_base = buf,
> .iov_len = min_t(size_t, count, MAX_RW_COUNT),
> };
> - struct kiocb kiocb;
> struct iov_iter iter;
> + int ret;
> +
> + iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
> + ret = file->f_op->read_iter(iocb, &iter);
> + if (ret > 0) {
> + fsnotify_access(file);
> + add_rchar(current, ret);
> + }
> + inc_syscr(current);
> + return ret;
> +}
> +
> +ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
> +{
> + struct kiocb kiocb;
> ssize_t ret;
>
> if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
> @@ -526,15 +541,9 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
>
> init_sync_kiocb(&kiocb, file);
> kiocb.ki_pos = pos ? *pos : 0;
> - iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
> - ret = file->f_op->read_iter(&kiocb, &iter);
> - if (ret > 0) {
> - if (pos)
> - *pos = kiocb.ki_pos;
> - fsnotify_access(file);
> - add_rchar(current, ret);
> - }
> - inc_syscr(current);
> + ret = kiocb_read(&kiocb, buf, count);
> + if (pos && ret > 0)
> + *pos = kiocb.ki_pos;
> return ret;
> }
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-16 22:32 ` Matthew Wilcox
2025-11-17 14:10 ` Shaurya Rane
2025-11-17 16:41 ` Darrick J. Wong
@ 2025-11-18 5:05 ` Christoph Hellwig
2025-11-18 12:51 ` Matthew Wilcox
2 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2025-11-18 5:05 UTC (permalink / raw)
To: Matthew Wilcox
Cc: SHAURYA RANE, akpm, shakeel.butt, eddyz87, andrii, ast,
linux-fsdevel, linux-mm, linux-kernel, linux-kernel-mentees,
skhan, david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Sun, Nov 16, 2025 at 10:32:12PM +0000, Matthew Wilcox wrote:
> I don't think it's necessarily all that hard to make buildid work
> for DAX. It's probably something like:
>
> if (IS_DAX(file_inode(file)))
> kernel_read(file, buf, count, &pos);
>
> but that's just off the top of my head.
The code should just unconditionally use kernel_read(). Relying
on ->read_folio to just work is only something file system code and
library code called by the file systems can assume.
Something reading ELF headers has no bunsiness poking into this layer.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 5:05 ` Christoph Hellwig
@ 2025-11-18 12:51 ` Matthew Wilcox
2025-11-18 12:56 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2025-11-18 12:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: SHAURYA RANE, akpm, shakeel.butt, eddyz87, andrii, ast,
linux-fsdevel, linux-mm, linux-kernel, linux-kernel-mentees,
skhan, david.hunter.linux, khalid, syzbot+09b7d050e4806540153d
On Mon, Nov 17, 2025 at 09:05:17PM -0800, Christoph Hellwig wrote:
> On Sun, Nov 16, 2025 at 10:32:12PM +0000, Matthew Wilcox wrote:
> > I don't think it's necessarily all that hard to make buildid work
> > for DAX. It's probably something like:
> >
> > if (IS_DAX(file_inode(file)))
> > kernel_read(file, buf, count, &pos);
> >
> > but that's just off the top of my head.
>
> The code should just unconditionally use kernel_read(). Relying
> on ->read_folio to just work is only something file system code and
> library code called by the file systems can assume.
>
> Something reading ELF headers has no bunsiness poking into this layer.
Please read the rest of the thread; this code can be called in contexts
that can't block. That was why I proposed the kiocb_read() refactoring
that I would expect you to have an opinion on.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 12:51 ` Matthew Wilcox
@ 2025-11-18 12:56 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2025-11-18 12:56 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, SHAURYA RANE, akpm, shakeel.butt, eddyz87,
andrii, ast, linux-fsdevel, linux-mm, linux-kernel,
linux-kernel-mentees, skhan, david.hunter.linux, khalid,
syzbot+09b7d050e4806540153d
On Tue, Nov 18, 2025 at 12:51:50PM +0000, Matthew Wilcox wrote:
> Please read the rest of the thread; this code can be called in contexts
> that can't block. That was why I proposed the kiocb_read() refactoring
> that I would expect you to have an opinion on.
Doing file reads from context that can't block is just broken. Let's
just kill the code before it causes more harm.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-17 18:45 ` Andrii Nakryiko
@ 2025-11-18 13:03 ` Christoph Hellwig
2025-11-18 15:37 ` Matthew Wilcox
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2025-11-18 13:03 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Matthew Wilcox, Darrick J. Wong, SHAURYA RANE, akpm,
shakeel.butt, eddyz87, andrii, ast, linux-fsdevel, linux-mm,
linux-kernel, linux-kernel-mentees, skhan, david.hunter.linux,
khalid, syzbot+09b7d050e4806540153d, bpf
On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> As I replied on another email, ideally we'd have some low-level file
> reading interface where we wouldn't have to know about secretmem, or
> XFS+DAX, or whatever other unusual combination of conditions where
> exposed internal APIs like filemap_get_folio() + read_cache_folio()
> can crash.
The problem is that you did something totally insane and it kinda works
most of the time. But bpf or any other file system consumer has
absolutely not business poking into the page cache to start with.
And I'm really pissed off that you wrote and merged this code without
ever bothering to talk to a FS or MM person who have immediately told
you so. Let's just rip out this buildid junk for now and restart
because the problem isn't actually that easy.
>
> The only real limitation is that we'd like to be able to control
> whether we are ok sleeping or not, as this code can be called from
> pretty much anywhere BPF might run, which includes NMI context.
>
> Would this kiocb_read() approach work under those circumstances?
No. IOCB_NOWAIT is just a hint to avoid blocking function calls.
It is not guarantee and a guarantee is basically impossible.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 13:03 ` Christoph Hellwig
@ 2025-11-18 15:37 ` Matthew Wilcox
2025-11-18 16:12 ` Darrick J. Wong
2025-11-18 19:27 ` Andrii Nakryiko
0 siblings, 2 replies; 22+ messages in thread
From: Matthew Wilcox @ 2025-11-18 15:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrii Nakryiko, Darrick J. Wong, SHAURYA RANE, akpm,
shakeel.butt, eddyz87, andrii, ast, linux-fsdevel, linux-mm,
linux-kernel, linux-kernel-mentees, skhan, david.hunter.linux,
khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 05:03:24AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> > As I replied on another email, ideally we'd have some low-level file
> > reading interface where we wouldn't have to know about secretmem, or
> > XFS+DAX, or whatever other unusual combination of conditions where
> > exposed internal APIs like filemap_get_folio() + read_cache_folio()
> > can crash.
>
> The problem is that you did something totally insane and it kinda works
> most of the time.
... on 64-bit systems. The HIGHMEM handling is screwed up too.
> But bpf or any other file system consumer has
> absolutely not business poking into the page cache to start with.
Agreed.
> And I'm really pissed off that you wrote and merged this code without
> ever bothering to talk to a FS or MM person who have immediately told
> you so. Let's just rip out this buildid junk for now and restart
> because the problem isn't actually that easy.
Oh, they did talk to fs & mm people originally and were told NO, so they
sneaked it in through the BPF tree.
https://lore.kernel.org/all/20230316170149.4106586-1-jolsa@kernel.org/
> > The only real limitation is that we'd like to be able to control
> > whether we are ok sleeping or not, as this code can be called from
> > pretty much anywhere BPF might run, which includes NMI context.
> >
> > Would this kiocb_read() approach work under those circumstances?
>
> No. IOCB_NOWAIT is just a hint to avoid blocking function calls.
> It is not guarantee and a guarantee is basically impossible.
I'm not sure I'd go that far -- I think we're pretty good about not
sleeping when IOCB_NOWAIT is specified and any remaining places can
be fixed up.
But I am inclined to rip out the buildid code, just because the
authors have been so rude.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 15:37 ` Matthew Wilcox
@ 2025-11-18 16:12 ` Darrick J. Wong
2025-11-18 19:38 ` Andrii Nakryiko
2025-11-18 19:27 ` Andrii Nakryiko
1 sibling, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2025-11-18 16:12 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Andrii Nakryiko, SHAURYA RANE, akpm,
shakeel.butt, eddyz87, andrii, ast, linux-fsdevel, linux-mm,
linux-kernel, linux-kernel-mentees, skhan, david.hunter.linux,
khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 03:37:09PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 18, 2025 at 05:03:24AM -0800, Christoph Hellwig wrote:
> > On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> > > As I replied on another email, ideally we'd have some low-level file
> > > reading interface where we wouldn't have to know about secretmem, or
> > > XFS+DAX, or whatever other unusual combination of conditions where
> > > exposed internal APIs like filemap_get_folio() + read_cache_folio()
> > > can crash.
> >
> > The problem is that you did something totally insane and it kinda works
> > most of the time.
>
> ... on 64-bit systems. The HIGHMEM handling is screwed up too.
>
> > But bpf or any other file system consumer has
> > absolutely not business poking into the page cache to start with.
>
> Agreed.
>
> > And I'm really pissed off that you wrote and merged this code without
> > ever bothering to talk to a FS or MM person who have immediately told
> > you so. Let's just rip out this buildid junk for now and restart
> > because the problem isn't actually that easy.
>
> Oh, they did talk to fs & mm people originally and were told NO, so they
> sneaked it in through the BPF tree.
>
> https://lore.kernel.org/all/20230316170149.4106586-1-jolsa@kernel.org/
>
> > > The only real limitation is that we'd like to be able to control
> > > whether we are ok sleeping or not, as this code can be called from
> > > pretty much anywhere BPF might run, which includes NMI context.
> > >
> > > Would this kiocb_read() approach work under those circumstances?
> >
> > No. IOCB_NOWAIT is just a hint to avoid blocking function calls.
> > It is not guarantee and a guarantee is basically impossible.
>
> I'm not sure I'd go that far -- I think we're pretty good about not
> sleeping when IOCB_NOWAIT is specified and any remaining places can
> be fixed up.
>
> But I am inclined to rip out the buildid code, just because the
> authors have been so rude.
Which fstest actually checks the functionality of the buildid code?
I don't find any, which means none of the fs people have a good signal
for breakage in this, um, novel file I/O path.
--D
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 15:37 ` Matthew Wilcox
2025-11-18 16:12 ` Darrick J. Wong
@ 2025-11-18 19:27 ` Andrii Nakryiko
2025-11-19 5:50 ` Christoph Hellwig
1 sibling, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2025-11-18 19:27 UTC (permalink / raw)
To: Matthew Wilcox, Linus Torvalds
Cc: Christoph Hellwig, Darrick J. Wong, SHAURYA RANE, akpm,
shakeel.butt, eddyz87, andrii, ast, linux-fsdevel, linux-mm,
linux-kernel, linux-kernel-mentees, skhan, david.hunter.linux,
khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 7:37 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 18, 2025 at 05:03:24AM -0800, Christoph Hellwig wrote:
> > On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> > > As I replied on another email, ideally we'd have some low-level file
> > > reading interface where we wouldn't have to know about secretmem, or
> > > XFS+DAX, or whatever other unusual combination of conditions where
> > > exposed internal APIs like filemap_get_folio() + read_cache_folio()
> > > can crash.
> >
> > The problem is that you did something totally insane and it kinda works
> > most of the time.
>
> ... on 64-bit systems. The HIGHMEM handling is screwed up too.
>
> > But bpf or any other file system consumer has
> > absolutely not business poking into the page cache to start with.
>
> Agreed.
Then please help make it better, give us interfaces you think are
appropriate. People do use this functionality in production, it's
important and we are not going to drop it. In non-sleepable mode it's
best-effort, if the requested part of the file is paged in, we'll
successfully read data (such as ELF's build ID), and if not, we'll
report that to the BPF program as -EFAULT. In sleepable mode, we'll
wait for that part of the file to be paged in before proceeding.
PROCMAP_QUERY ioctl() is always in sleepable mode, so it will wait for
file data to be read.
If you don't like the implementation, please help improve it, don't
just request dropping it "because BPF folks" or anything like that.
>
> > And I'm really pissed off that you wrote and merged this code without
> > ever bothering to talk to a FS or MM person who have immediately told
> > you so. Let's just rip out this buildid junk for now and restart
> > because the problem isn't actually that easy.
>
> Oh, they did talk to fs & mm people originally and were told NO, so they
> sneaked it in through the BPF tree.
This patch set was never landed and has *NO* relation to the
lib/buildid.c stuff we are discussing. There was no sneaking anything
in. The patch set in question that added folio-based reading logic was
developed in the open with both mm and fsdevel in CC. Matthew himself
looked at it, he NAKed page-based initial implementation but suggested
folio-based one ([0]). Shakeel did review this (the patch set went
through 10 revisions, plenty of time to object).
[0] https://lore.kernel.org/bpf/ZrOStYOrlFr21jRc@casper.infradead.org/
>
> https://lore.kernel.org/all/20230316170149.4106586-1-jolsa@kernel.org/
>
> > > The only real limitation is that we'd like to be able to control
> > > whether we are ok sleeping or not, as this code can be called from
> > > pretty much anywhere BPF might run, which includes NMI context.
> > >
> > > Would this kiocb_read() approach work under those circumstances?
> >
> > No. IOCB_NOWAIT is just a hint to avoid blocking function calls.
> > It is not guarantee and a guarantee is basically impossible.
>
> I'm not sure I'd go that far -- I think we're pretty good about not
> sleeping when IOCB_NOWAIT is specified and any remaining places can
> be fixed up.
>
> But I am inclined to rip out the buildid code, just because the
> authors have been so rude.
Can you please elaborate on "authors have been so rude" a bit more?
Besides that link to an absolutely unrelated patch set?..
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 16:12 ` Darrick J. Wong
@ 2025-11-18 19:38 ` Andrii Nakryiko
2025-11-19 5:52 ` Christoph Hellwig
2025-11-19 6:29 ` Darrick J. Wong
0 siblings, 2 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2025-11-18 19:38 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Matthew Wilcox, Christoph Hellwig, SHAURYA RANE, akpm,
shakeel.butt, eddyz87, andrii, ast, linux-fsdevel, linux-mm,
linux-kernel, linux-kernel-mentees, skhan, david.hunter.linux,
khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 8:12 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Nov 18, 2025 at 03:37:09PM +0000, Matthew Wilcox wrote:
> > On Tue, Nov 18, 2025 at 05:03:24AM -0800, Christoph Hellwig wrote:
> > > On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> > > > As I replied on another email, ideally we'd have some low-level file
> > > > reading interface where we wouldn't have to know about secretmem, or
> > > > XFS+DAX, or whatever other unusual combination of conditions where
> > > > exposed internal APIs like filemap_get_folio() + read_cache_folio()
> > > > can crash.
> > >
> > > The problem is that you did something totally insane and it kinda works
> > > most of the time.
> >
> > ... on 64-bit systems. The HIGHMEM handling is screwed up too.
> >
> > > But bpf or any other file system consumer has
> > > absolutely not business poking into the page cache to start with.
> >
> > Agreed.
> >
> > > And I'm really pissed off that you wrote and merged this code without
> > > ever bothering to talk to a FS or MM person who have immediately told
> > > you so. Let's just rip out this buildid junk for now and restart
> > > because the problem isn't actually that easy.
> >
> > Oh, they did talk to fs & mm people originally and were told NO, so they
> > sneaked it in through the BPF tree.
> >
> > https://lore.kernel.org/all/20230316170149.4106586-1-jolsa@kernel.org/
> >
> > > > The only real limitation is that we'd like to be able to control
> > > > whether we are ok sleeping or not, as this code can be called from
> > > > pretty much anywhere BPF might run, which includes NMI context.
> > > >
> > > > Would this kiocb_read() approach work under those circumstances?
> > >
> > > No. IOCB_NOWAIT is just a hint to avoid blocking function calls.
> > > It is not guarantee and a guarantee is basically impossible.
> >
> > I'm not sure I'd go that far -- I think we're pretty good about not
> > sleeping when IOCB_NOWAIT is specified and any remaining places can
> > be fixed up.
> >
> > But I am inclined to rip out the buildid code, just because the
> > authors have been so rude.
>
> Which fstest actually checks the functionality of the buildid code?
> I don't find any, which means none of the fs people have a good signal
> for breakage in this, um, novel file I/O path.
We have plenty of build ID tests in BPF selftest that validate this
functionality:
- tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
- tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
- tools/testing/selftests/bpf/prog_tests/build_id.c
This functionality is exposed to BPF (and PROCMAP_QUERY, which has its
own mm selftests), so that's where we test this. So we'll know at the
very least when trees merge that something is broken.
>
> --D
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 19:27 ` Andrii Nakryiko
@ 2025-11-19 5:50 ` Christoph Hellwig
2025-11-19 17:12 ` Andrii Nakryiko
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2025-11-19 5:50 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Matthew Wilcox, Linus Torvalds, Christoph Hellwig,
Darrick J. Wong, SHAURYA RANE, akpm, shakeel.butt, eddyz87,
andrii, ast, linux-fsdevel, linux-mm, linux-kernel,
linux-kernel-mentees, skhan, david.hunter.linux, khalid,
syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 11:27:47AM -0800, Andrii Nakryiko wrote:
> Then please help make it better, give us interfaces you think are
> appropriate. People do use this functionality in production, it's
> important and we are not going to drop it. In non-sleepable mode it's
> best-effort, if the requested part of the file is paged in, we'll
> successfully read data (such as ELF's build ID), and if not, we'll
> report that to the BPF program as -EFAULT. In sleepable mode, we'll
> wait for that part of the file to be paged in before proceeding.
> PROCMAP_QUERY ioctl() is always in sleepable mode, so it will wait for
> file data to be read.
That's pretty demanding: "If you don't give me the interface that I want
I'll just poke into internals and do broken shit" isn't really the
best way to make friends and win influence.,
> If you don't like the implementation, please help improve it, don't
> just request dropping it "because BPF folks" or anything like that.
Again, you're trying to put a lot of work you should have done on
others. Everyone here is pretty helpful guiding when asking for help,
but being asked at gunpoint to cleanup the mess your created is not
going to get everyone drop their work and jump onto your project.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 19:38 ` Andrii Nakryiko
@ 2025-11-19 5:52 ` Christoph Hellwig
2025-11-19 6:29 ` Darrick J. Wong
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2025-11-19 5:52 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Darrick J. Wong, Matthew Wilcox, Christoph Hellwig, SHAURYA RANE,
akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 11:38:36AM -0800, Andrii Nakryiko wrote:
> We have plenty of build ID tests in BPF selftest that validate this
> functionality:
And none of them is run on a wide variety of file systems, presumably
because they are only run by BFP developers.
IFF we allow for magic file access methods it needs to be part of
regular file system testing.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-18 19:38 ` Andrii Nakryiko
2025-11-19 5:52 ` Christoph Hellwig
@ 2025-11-19 6:29 ` Darrick J. Wong
1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2025-11-19 6:29 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Matthew Wilcox, Christoph Hellwig, SHAURYA RANE, akpm,
shakeel.butt, eddyz87, andrii, ast, linux-fsdevel, linux-mm,
linux-kernel, linux-kernel-mentees, skhan, david.hunter.linux,
khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 11:38:36AM -0800, Andrii Nakryiko wrote:
> On Tue, Nov 18, 2025 at 8:12 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Nov 18, 2025 at 03:37:09PM +0000, Matthew Wilcox wrote:
> > > On Tue, Nov 18, 2025 at 05:03:24AM -0800, Christoph Hellwig wrote:
> > > > On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> > > > > As I replied on another email, ideally we'd have some low-level file
> > > > > reading interface where we wouldn't have to know about secretmem, or
> > > > > XFS+DAX, or whatever other unusual combination of conditions where
> > > > > exposed internal APIs like filemap_get_folio() + read_cache_folio()
> > > > > can crash.
> > > >
> > > > The problem is that you did something totally insane and it kinda works
> > > > most of the time.
> > >
> > > ... on 64-bit systems. The HIGHMEM handling is screwed up too.
> > >
> > > > But bpf or any other file system consumer has
> > > > absolutely not business poking into the page cache to start with.
> > >
> > > Agreed.
> > >
> > > > And I'm really pissed off that you wrote and merged this code without
> > > > ever bothering to talk to a FS or MM person who have immediately told
> > > > you so. Let's just rip out this buildid junk for now and restart
> > > > because the problem isn't actually that easy.
> > >
> > > Oh, they did talk to fs & mm people originally and were told NO, so they
> > > sneaked it in through the BPF tree.
> > >
> > > https://lore.kernel.org/all/20230316170149.4106586-1-jolsa@kernel.org/
> > >
> > > > > The only real limitation is that we'd like to be able to control
> > > > > whether we are ok sleeping or not, as this code can be called from
> > > > > pretty much anywhere BPF might run, which includes NMI context.
> > > > >
> > > > > Would this kiocb_read() approach work under those circumstances?
> > > >
> > > > No. IOCB_NOWAIT is just a hint to avoid blocking function calls.
> > > > It is not guarantee and a guarantee is basically impossible.
> > >
> > > I'm not sure I'd go that far -- I think we're pretty good about not
> > > sleeping when IOCB_NOWAIT is specified and any remaining places can
> > > be fixed up.
> > >
> > > But I am inclined to rip out the buildid code, just because the
> > > authors have been so rude.
> >
> > Which fstest actually checks the functionality of the buildid code?
> > I don't find any, which means none of the fs people have a good signal
> > for breakage in this, um, novel file I/O path.
>
> We have plenty of build ID tests in BPF selftest that validate this
> functionality:
>
> - tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
> - tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
> - tools/testing/selftests/bpf/prog_tests/build_id.c
>
> This functionality is exposed to BPF (and PROCMAP_QUERY, which has its
> own mm selftests), so that's where we test this. So we'll know at the
> very least when trees merge that something is broken.
Only if you're testing the buildid functionality with all known file I/O
paths implemented by all filesystems. Or you could add a new testcase
to fstests and we'd do all that *for* you.
--D
> >
> > --D
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio()
2025-11-19 5:50 ` Christoph Hellwig
@ 2025-11-19 17:12 ` Andrii Nakryiko
0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2025-11-19 17:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Matthew Wilcox, Linus Torvalds, Darrick J. Wong, SHAURYA RANE,
akpm, shakeel.butt, eddyz87, andrii, ast, linux-fsdevel,
linux-mm, linux-kernel, linux-kernel-mentees, skhan,
david.hunter.linux, khalid, syzbot+09b7d050e4806540153d, bpf
On Tue, Nov 18, 2025 at 9:50 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Nov 18, 2025 at 11:27:47AM -0800, Andrii Nakryiko wrote:
> > Then please help make it better, give us interfaces you think are
> > appropriate. People do use this functionality in production, it's
> > important and we are not going to drop it. In non-sleepable mode it's
> > best-effort, if the requested part of the file is paged in, we'll
> > successfully read data (such as ELF's build ID), and if not, we'll
> > report that to the BPF program as -EFAULT. In sleepable mode, we'll
> > wait for that part of the file to be paged in before proceeding.
> > PROCMAP_QUERY ioctl() is always in sleepable mode, so it will wait for
> > file data to be read.
>
> That's pretty demanding: "If you don't give me the interface that I want
> I'll just poke into internals and do broken shit" isn't really the
> best way to make friends and win influence.,
Did you read the second part of my reply? The functionality in
question ([0]) was developed in the open, over multiple revisions,
with both mm and fsdevel mailing list CC'ed. Matthew Wilcox did look
at this, provided feedback and suggestion to use filemap_get_folio() +
read_cache_folio(), which I did incorporate.
[0] https://lore.kernel.org/bpf/20240829174232.3133883-1-andrii@kernel.org/
>
> > If you don't like the implementation, please help improve it, don't
> > just request dropping it "because BPF folks" or anything like that.
>
> Again, you're trying to put a lot of work you should have done on
> others. Everyone here is pretty helpful guiding when asking for help,
> but being asked at gunpoint to cleanup the mess your created is not
> going to get everyone drop their work and jump onto your project.
Gunpoint, really?.. Am I not asking for help to improve the code? This
functionality is being used, and we can't "just rip it out" as you
propose. Let's fix it instead.
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-11-19 17:12 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-14 19:37 [PATCH] mm/filemap: fix NULL pointer dereference in do_read_cache_folio() ssrane_b23
2025-11-14 20:44 ` Matthew Wilcox
2025-11-16 5:42 ` [PATCH v2] " ssrane_b23
2025-11-16 5:43 ` [PATCH] " SHAURYA RANE
2025-11-16 22:32 ` Matthew Wilcox
2025-11-17 14:10 ` Shaurya Rane
2025-11-17 18:42 ` Andrii Nakryiko
2025-11-17 16:41 ` Darrick J. Wong
2025-11-17 18:03 ` Matthew Wilcox
2025-11-17 18:45 ` Andrii Nakryiko
2025-11-18 13:03 ` Christoph Hellwig
2025-11-18 15:37 ` Matthew Wilcox
2025-11-18 16:12 ` Darrick J. Wong
2025-11-18 19:38 ` Andrii Nakryiko
2025-11-19 5:52 ` Christoph Hellwig
2025-11-19 6:29 ` Darrick J. Wong
2025-11-18 19:27 ` Andrii Nakryiko
2025-11-19 5:50 ` Christoph Hellwig
2025-11-19 17:12 ` Andrii Nakryiko
2025-11-18 5:05 ` Christoph Hellwig
2025-11-18 12:51 ` Matthew Wilcox
2025-11-18 12:56 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox