* [PATCH] mm/readahead: read min folio constraints under invalidate lock
@ 2025-12-15 14:19 Jinchao Wang
2025-12-15 14:22 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Jinchao Wang @ 2025-12-15 14:19 UTC (permalink / raw)
To: Matthew Wilcox (Oracle),
Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel
Cc: stable, Jinchao Wang, syzbot+4d3cc33ef7a77041efa6,
syzbot+fdba5cca73fee92c69d6
page_cache_ra_order() and page_cache_ra_unbounded() read mapping minimum folio
constraints before taking the invalidate lock, allowing concurrent changes to
violate page cache invariants.
Move the lookups under filemap_invalidate_lock_shared() to ensure readahead
allocations respect the mapping constraints.
Fixes: 47dd67532303 ("block/bdev: lift block size restrictions to 64k")
Reported-by: syzbot+4d3cc33ef7a77041efa6@syzkaller.appspotmail.com
Reported-by: syzbot+fdba5cca73fee92c69d6@syzkaller.appspotmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/readahead.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index b415c9969176..74acd6c4f87c 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -214,7 +214,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	unsigned long index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 	unsigned long mark = ULONG_MAX, i = 0;
-	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
+	unsigned int min_nrpages;
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -232,6 +232,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 				      lookahead_size);
 	filemap_invalidate_lock_shared(mapping);
 	index = mapping_align_index(mapping, index);
+	min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	/*
 	 * As iterator `i` is aligned to min_nrpages, round_up the
@@ -467,7 +468,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	struct address_space *mapping = ractl->mapping;
 	pgoff_t start = readahead_index(ractl);
 	pgoff_t index = start;
-	unsigned int min_order = mapping_min_folio_order(mapping);
+	unsigned int min_order;
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
@@ -485,13 +486,16 @@ void page_cache_ra_order(struct readahead_control *ractl,
 
 	new_order = min(mapping_max_folio_order(mapping), new_order);
 	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
-	new_order = max(new_order, min_order);
 
 	ra->order = new_order;
 
 	/* See comment in page_cache_ra_unbounded() */
 	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
+
+	min_order = mapping_min_folio_order(mapping);
+	new_order = max(new_order, min_order);
+
 	/*
 	 * If the new_order is greater than min_order and index is
 	 * already aligned to new_order, then this will be noop as index
--
2.43.0
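For readers following the patch, the order selection in page_cache_ra_order() after this change can be modelled as a pure function of the values it reads. This is a user-space sketch, not kernel code: `ilog2_ul()` and `ra_model_order()` are hypothetical helpers, `min_order_snapshot` stands for the value read after filemap_invalidate_lock_shared(), and the ra->order update and the allocation loop are left out. Because the minimum-order clamp is applied last, the value read under the lock takes precedence over the max-order and window caps.

```c
#include <assert.h>

/* Hypothetical helper: floor(log2(x)) for x > 0, like the kernel's ilog2(). */
static unsigned int ilog2_ul(unsigned long x)
{
	unsigned int r = 0;

	assert(x > 0);
	while (x >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical model of the order selection in page_cache_ra_order() after
 * the patch: cap by the mapping's maximum order and by the readahead window,
 * then raise to the minimum order. min_order_snapshot is assumed to have been
 * read while holding the invalidate lock (shared); this model cannot check it.
 */
static unsigned int ra_model_order(unsigned int new_order, unsigned long ra_size,
				   unsigned int min_order_snapshot,
				   unsigned int max_order)
{
	unsigned int window_order = ilog2_ul(ra_size);

	if (new_order > max_order)
		new_order = max_order;
	if (new_order > window_order)
		new_order = window_order;
	if (new_order < min_order_snapshot)
		new_order = min_order_snapshot;
	return new_order;
}
```

For example, with a 4-page window (ilog2 of 2) and a minimum order of 4, the model returns 4: the mapping constraint wins even though it exceeds the window.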
* Re: [PATCH] mm/readahead: read min folio constraints under invalidate lock
From: Matthew Wilcox @ 2025-12-15 14:22 UTC (permalink / raw)
To: Jinchao Wang
Cc: Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel, stable,
syzbot+4d3cc33ef7a77041efa6, syzbot+fdba5cca73fee92c69d6
On Mon, Dec 15, 2025 at 10:19:00PM +0800, Jinchao Wang wrote:
> page_cache_ra_order() and page_cache_ra_unbounded() read mapping minimum folio
> constraints before taking the invalidate lock, allowing concurrent changes to
> violate page cache invariants.
>
> Move the lookups under filemap_invalidate_lock_shared() to ensure readahead
> allocations respect the mapping constraints.
Why are the mapping folio size constraints being changed? They're
supposed to be set at inode instantiation and then never changed.
* Re: [PATCH] mm/readahead: read min folio constraints under invalidate lock
From: Jinchao Wang @ 2025-12-16 1:37 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel, stable,
syzbot+4d3cc33ef7a77041efa6, syzbot+fdba5cca73fee92c69d6
On Mon, Dec 15, 2025 at 02:22:23PM +0000, Matthew Wilcox wrote:
> Why are the mapping folio size constraints being changed? They're
> supposed to be set at inode instantiation and then never changed.
They can change after instantiation for block devices. In the syzbot repro:
blkdev_ioctl() -> blkdev_bszset() -> set_blocksize() ->
mapping_set_folio_min_order()
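The user-space side of that syzbot path is a single BLKBSZSET ioctl on an open block device. The wrapper below is a hypothetical illustration (`change_bdev_block_size()` is not an existing API). It is expected to need CAP_SYS_ADMIN and a real block device to have any effect, and on its own it does not reproduce the race, which also requires readahead running concurrently on the same device.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKBSZSET */

/*
 * Hypothetical wrapper: ask the kernel to change the soft block size of the
 * block device at `path`. Per the call chain quoted in this message, the
 * kernel handles it in blkdev_ioctl() -> blkdev_bszset() -> set_blocksize().
 * Returns 0 on success or a negative errno value.
 */
static int change_bdev_block_size(const char *path, int new_size)
{
	int fd, ret = 0;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	/* BLKBSZSET takes a pointer to an int holding the new size. */
	if (ioctl(fd, BLKBSZSET, &new_size) < 0)
		ret = -errno;	/* capture errno before close() can clobber it */
	close(fd);
	return ret;
}
```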
* Re: [PATCH] mm/readahead: read min folio constraints under invalidate lock
From: Matthew Wilcox @ 2025-12-16 2:42 UTC (permalink / raw)
To: Jinchao Wang
Cc: Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel, stable,
syzbot+4d3cc33ef7a77041efa6, syzbot+fdba5cca73fee92c69d6
On Tue, Dec 16, 2025 at 09:37:51AM +0800, Jinchao Wang wrote:
> They can change after instantiation for block devices. In the syzbot repro:
> blkdev_ioctl() -> blkdev_bszset() -> set_blocksize() ->
> mapping_set_folio_min_order()
Oh, this is just syzbot doing stupid things. We should probably make
blkdev_bszset() fail if somebody else has an fd open.
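Matthew's suggestion could look roughly like the model below. It is a hypothetical sketch, not a proposed patch: `struct model_bdev`, `model_bszset()`, `model_order_for()` and the `open_count` field are invented for illustration, size validation is simplified to "power of two, at least 512 bytes", and a real implementation would need a reliable way to tell the caller's own open apart from other openers.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a block device's page-cache state. */
struct model_bdev {
	unsigned int open_count;	/* open file descriptors, caller included */
	unsigned int block_size;	/* soft block size in bytes */
	unsigned int min_order;		/* stand-in for the AS_FOLIO min-order bits */
};

/* Smallest order such that (page_size << order) >= size. */
static unsigned int model_order_for(unsigned int size, unsigned int page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

/*
 * Hypothetical sketch of the suggested policy: refuse to change the block
 * size (and with it the minimum folio order) while anyone besides the
 * caller has the device open, so no other user can observe the change.
 */
static int model_bszset(struct model_bdev *bdev, unsigned int size,
			unsigned int page_size)
{
	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

	if (size < 512 || (size & (size - 1)) != 0)
		return -EINVAL;		/* simplified validation */
	if (bdev->open_count > 1)
		return -EBUSY;		/* somebody else has an fd open */
	bdev->block_size = size;
	bdev->min_order = model_order_for(size, page_size);
	return 0;
}
```

With a second opener present the call fails with -EBUSY and leaves `min_order` untouched; once only the caller holds the device, a 16 KiB block size on 4 KiB pages yields order 2.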
* Re: [PATCH] mm/readahead: read min folio constraints under invalidate lock
From: Jinchao Wang @ 2025-12-16 3:12 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel, stable,
syzbot+4d3cc33ef7a77041efa6, syzbot+fdba5cca73fee92c69d6
On Tue, Dec 16, 2025 at 02:42:06AM +0000, Matthew Wilcox wrote:
> Oh, this is just syzbot doing stupid things. We should probably make
> blkdev_bszset() fail if somebody else has an fd open.
Thanks, that makes sense.
Tightening blkdev_bszset() would avoid the race entirely.
This change is meant as a defensive fix to prevent BUGs.
* Re: [PATCH] mm/readahead: read min folio constraints under invalidate lock
From: Matthew Wilcox @ 2025-12-16 3:53 UTC (permalink / raw)
To: Jinchao Wang
Cc: Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel, stable,
syzbot+4d3cc33ef7a77041efa6, syzbot+fdba5cca73fee92c69d6
On Tue, Dec 16, 2025 at 11:12:21AM +0800, Jinchao Wang wrote:
> Thanks, that makes sense.
> Tightening blkdev_bszset() would avoid the race entirely.
> This change is meant as a defensive fix to prevent BUGs.
Yes, but the point is that there's a lot of code which relies on
the AS_FOLIO bits not changing in the middle. Syzbot found one of them,
but there are others.
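To make "not changing in the middle" concrete, the deterministic model below lays out folios using a minimum order read once up front, then checks their start indices against a larger order that a concurrent writer installed in the meantime. `model_misaligned_folios()` is a hypothetical helper, and it models only the alignment invariant, not the other assumptions callers make about the AS_FOLIO bits.

```c
#include <assert.h>

/*
 * Hypothetical model: lay out folios covering pages [start, start + nr)
 * with a step of 2^snapshot_order pages (the minimum order read once, up
 * front), then count how many folio start indices are not aligned to
 * 2^current_order (the minimum order a concurrent writer installed in the
 * meantime). start is assumed to be aligned to 2^snapshot_order.
 */
static unsigned int model_misaligned_folios(unsigned long start, unsigned long nr,
					    unsigned int snapshot_order,
					    unsigned int current_order)
{
	unsigned long step, mask;
	unsigned int bad = 0;

	assert(snapshot_order < 8 * sizeof(unsigned long));
	assert(current_order < 8 * sizeof(unsigned long));
	step = 1UL << snapshot_order;
	mask = (1UL << current_order) - 1;

	for (unsigned long i = 0; i < nr; i += step)
		if (((start + i) & mask) != 0)
			bad++;
	return bad;
}
```

With a snapshot of order 0 and a current order of 2, six of the eight single-page folios laid out for indices 0 to 7 start at an index that is not a multiple of 4; with matching orders there are none.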
* Re: [PATCH] mm/readahead: read min folio constraints under invalidate lock
From: Jinchao Wang @ 2025-12-18 4:03 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Christian Brauner, Hannes Reinecke,
Luis Chamberlain, linux-fsdevel, linux-mm, linux-kernel, stable,
syzbot+4d3cc33ef7a77041efa6, syzbot+fdba5cca73fee92c69d6
On Tue, Dec 16, 2025 at 03:53:17AM +0000, Matthew Wilcox wrote:
> Yes, but the point is that there's a lot of code which relies on
> the AS_FOLIO bits not changing in the middle. Syzbot found one of them,
> but there are others.
I've been thinking about this more, and I wanted to share another
perspective if that's okay.
Rather than tracking down every place that might change AS_FOLIO bits
(like blkdev_bszset() and potentially others), what if we make the
page cache layer itself robust against such changes?
The invalidate_lock was introduced for exactly this kind of protection
(commit 730633f0b7f9: "mm: Protect operations adding pages to page
cache with invalidate_lock"). This way, the page cache doesn't need
to rely on assumptions about what upper layers might do.
The readahead functions already hold filemap_invalidate_lock_shared(),
so moving the constraint reads under the lock adds no overhead. It
would protect against AS_FOLIO changes regardless of their source.
I think this separates concerns nicely: upper layers can change
constraints through the invalidate_lock protocol, and page cache
operations are automatically safe. But I'd really value your thoughts
on this approach - you have much more experience with these tradeoffs
than I do.
Thanks again for taking the time to discuss this.