* [PATCH 0/32] fs: Move metadata bh tracking from address_space
@ 2026-03-03 10:33 Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
` (33 more replies)
0 siblings, 34 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Hello,
this patch series cleans up the mess that has accumulated over the years in
metadata buffer_head tracking for inodes, moves the tracking into a dedicated
structure in the filesystem-private part of the inode (so that we don't use
private_list, private_data, and private_lock in struct address_space), and also
moves a couple of other users of private_data and private_list so that these
can be removed from struct address_space, saving 3 longs in struct inode for
99% of inodes. I would like to get rid of private_lock in struct address_space
as well; however, the locking changes for buffer_heads are non-trivial there
and the patch series is long enough as is. So let's leave that for another
time.
The patches have survived some testing with fstests and ltp; however, I didn't
test the AFFS, HUGETLBFS, and KVM guest_memfd changes, so help with testing
those would be very welcome. Thanks.
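For readers unfamiliar with the pattern, the idea of keeping the tracking in
the filesystem-private part of the inode can be pictured roughly as follows.
This is a userspace sketch only, not code from the series: the names
`struct metadata_bh_tracker`, `struct examplefs_inode_info` and
`EXAMPLEFS_I()` are hypothetical, and the tracker is reduced to a counter.

```c
#include <stddef.h>

/* Stand-in for the generic VFS inode; it carries no bh tracking fields. */
struct inode {
	unsigned long i_ino;
};

/* Hypothetical per-inode tracker for metadata buffers (a real one would
 * hold a lock-protected list rather than a counter). */
struct metadata_bh_tracker {
	int nr_tracked;
};

/* Hypothetical filesystem-private inode. Only filesystems that embed the
 * tracker pay for its memory. */
struct examplefs_inode_info {
	struct metadata_bh_tracker bhs;
	struct inode vfs_inode;
};

/* Userspace equivalent of the kernel's container_of() idiom: go from the
 * generic inode back to the filesystem-private structure wrapping it. */
static inline struct examplefs_inode_info *EXAMPLEFS_I(struct inode *inode)
{
	return (struct examplefs_inode_info *)
		((char *)inode - offsetof(struct examplefs_inode_info, vfs_inode));
}
```

Given a `struct inode *` that is known to be embedded in a
`struct examplefs_inode_info`, `EXAMPLEFS_I(inode)->bhs` reaches the tracker
without any field in the generic inode or its address_space.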
block/bdev.c | 1
fs/affs/affs.h | 2
fs/affs/dir.c | 1
fs/affs/file.c | 1
fs/affs/inode.c | 2
fs/affs/super.c | 6
fs/affs/symlink.c | 1
fs/aio.c | 78 +++++++-
fs/bfs/bfs.h | 2
fs/bfs/dir.c | 1
fs/bfs/file.c | 4
fs/bfs/inode.c | 9 +
fs/buffer.c | 387 +++++++++++++++++---------------------------
fs/ext2/ext2.h | 2
fs/ext2/file.c | 1
fs/ext2/inode.c | 3
fs/ext2/namei.c | 2
fs/ext2/super.c | 6
fs/ext2/symlink.c | 2
fs/ext4/ext4.h | 4
fs/ext4/file.c | 1
fs/ext4/inode.c | 9 -
fs/ext4/namei.c | 2
fs/ext4/super.c | 9 -
fs/ext4/symlink.c | 3
fs/fat/fat.h | 2
fs/fat/file.c | 1
fs/fat/inode.c | 16 +
fs/fat/namei_msdos.c | 1
fs/fat/namei_vfat.c | 1
fs/gfs2/glock.c | 1
fs/hugetlbfs/inode.c | 10 -
fs/inode.c | 24 +-
fs/minix/file.c | 1
fs/minix/inode.c | 10 +
fs/minix/minix.h | 2
fs/minix/namei.c | 1
fs/ntfs3/file.c | 3
fs/ocfs2/dlmglue.c | 1
fs/ocfs2/namei.c | 3
fs/udf/file.c | 1
fs/udf/inode.c | 2
fs/udf/namei.c | 1
fs/udf/super.c | 6
fs/udf/symlink.c | 1
fs/udf/udf_i.h | 1
fs/udf/udfdecl.h | 1
include/linux/buffer_head.h | 6
include/linux/fs.h | 11 -
include/linux/hugetlb.h | 1
mm/hugetlb.c | 10 -
virt/kvm/guest_memfd.c | 12 -
52 files changed, 360 insertions(+), 309 deletions(-)
Honza
* [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
` (32 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
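Illustrative sketch, not part of the patch: the ordering that the .evict
handlers in this series follow (write out dirty metadata buffers for inodes
that stay on disk, then detach everything) can be modelled in userspace as
below. All `model_*` names are hypothetical stand-ins; the real work is done
by sync_mapping_buffers() and invalidate_inode_buffers().

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_BH 4

struct model_bh {
	bool dirty;	/* needs writing before it may be dropped */
	bool attached;	/* still tracked by the inode */
};

struct model_inode {
	unsigned int nlink;
	struct model_bh bhs[MODEL_MAX_BH];
	unsigned int writes;	/* number of buffers written out */
};

/* Stand-in for sync_mapping_buffers(): write every dirty tracked buffer. */
static void model_sync_buffers(struct model_inode *inode)
{
	for (size_t i = 0; i < MODEL_MAX_BH; i++) {
		struct model_bh *bh = &inode->bhs[i];

		if (bh->attached && bh->dirty) {
			bh->dirty = false;
			inode->writes++;
		}
	}
}

/* Stand-in for invalidate_inode_buffers(): detach all tracked buffers. */
static void model_invalidate_buffers(struct model_inode *inode)
{
	for (size_t i = 0; i < MODEL_MAX_BH; i++)
		inode->bhs[i].attached = false;
}

/* Evict: a deleted inode (nlink == 0) needs no writeout, a live one does. */
static void model_evict(struct model_inode *inode)
{
	if (inode->nlink)
		model_sync_buffers(inode);
	model_invalidate_buffers(inode);
}
```

In this model, skipping the sync step for a live inode would detach a dirty
buffer without counting a write, which corresponds to the case the handlers
have to avoid once reclaim no longer looks at the buffers.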
fs/fat/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 3cc5fb01afa1..ce88602b0d57 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -657,8 +657,10 @@ static void fat_evict_inode(struct inode *inode)
if (!inode->i_nlink) {
inode->i_size = 0;
fat_truncate_blocks(inode, 0);
- } else
+ } else {
+ sync_mapping_buffers(inode->i_mapping);
fat_free_eofblocks(inode);
+ }
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
` (31 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/udf/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 7fae8002344a..739b190ca4e9 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -154,6 +154,8 @@ void udf_evict_inode(struct inode *inode)
}
}
truncate_inode_pages_final(&inode->i_data);
+ if (!want_delete)
+ sync_mapping_buffers(&inode->i_data);
invalidate_inode_buffers(inode);
clear_inode(inode);
kfree(iinfo->i_data);
--
2.51.0
* [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
` (30 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/minix/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 99541c6a5bbf..ab7c06efb139 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -48,6 +48,8 @@ static void minix_evict_inode(struct inode *inode)
if (!inode->i_nlink) {
inode->i_size = 0;
minix_truncate(inode);
+ } else {
+ sync_mapping_buffers(&inode->i_data);
}
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (2 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
` (29 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext2/inode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index dbfe9098a124..fb91c61aa6d6 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -94,8 +94,9 @@ void ext2_evict_inode(struct inode * inode)
if (inode->i_blocks)
ext2_truncate_blocks(inode, 0);
ext2_xattr_delete_inode(inode);
+ } else {
+ sync_mapping_buffers(&inode->i_data);
}
-
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (3 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 06/32] ext4: Use inode_has_buffers() Jan Kara
` (28 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 4 +++-
fs/ext4/super.c | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 396dc3a5d16b..c2692b9c7123 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -185,7 +185,9 @@ void ext4_evict_inode(struct inode *inode)
ext4_evict_ea_inode(inode);
if (inode->i_nlink) {
truncate_inode_pages_final(&inode->i_data);
-
+ /* Avoid mballoc special inode which has no proper iops */
+ if (!EXT4_SB(inode->i_sb)->s_journal)
+ sync_mapping_buffers(&inode->i_data);
goto no_delete;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 43f680c750ae..ea827b0ecc8d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1524,7 +1524,8 @@ static void destroy_inodecache(void)
void ext4_clear_inode(struct inode *inode)
{
ext4_fc_del(inode);
- invalidate_inode_buffers(inode);
+ if (!EXT4_SB(inode->i_sb)->s_journal)
+ invalidate_inode_buffers(inode);
clear_inode(inode);
ext4_discard_preallocations(inode);
ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
--
2.51.0
* [PATCH 06/32] ext4: Use inode_has_buffers()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (4 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
` (27 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of checking i_private_list directly, use the appropriate wrapper
inode_has_buffers(). Also delete a stale comment.
Signed-off-by: Jan Kara <jack@suse.cz>
---
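Illustrative sketch, not part of the patch: the value of going through a
wrapper is that callers stop depending on where the list lives. A userspace
model, with a minimal re-implementation of the list helpers and hypothetical
`model_*` names, might look like this:

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void model_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool model_list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void model_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Simplified stand-ins for struct address_space and struct inode. */
struct model_mapping {
	struct list_head i_private_list;
};

struct model_inode {
	struct model_mapping i_data;
};

/* Callers ask the question; only this helper knows where the list is. */
static bool model_inode_has_buffers(const struct model_inode *inode)
{
	return !model_list_empty(&inode->i_data.i_private_list);
}
```

If the list later moves to a different structure, only the body of the helper
has to change, not its callers.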
fs/buffer.c | 1 +
fs/ext4/inode.c | 5 +----
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 22b43642ba57..1bc0f22f3cc2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -524,6 +524,7 @@ int inode_has_buffers(struct inode *inode)
{
return !list_empty(&inode->i_data.i_private_list);
}
+EXPORT_SYMBOL_GPL(inode_has_buffers);
/*
* osync is designed to support O_SYNC io. It waits synchronously for
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2692b9c7123..6f892abef003 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1422,9 +1422,6 @@ static int write_end_fn(handle_t *handle, struct inode *inode,
/*
* We need to pick up the new inode size which generic_commit_write gave us
* `iocb` can be NULL - eg, when called from page_symlink().
- *
- * ext4 never places buffers on inode->i_mapping->i_private_list. metadata
- * buffers are managed internally.
*/
static int ext4_write_end(const struct kiocb *iocb,
struct address_space *mapping,
@@ -3439,7 +3436,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
/* Any metadata buffers to write? */
- if (!list_empty(&inode->i_mapping->i_private_list))
+ if (inode_has_buffers(inode))
return true;
return inode_state_read_once(inode) & I_DIRTY_DATASYNC;
}
--
2.51.0
* [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (5 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 06/32] ext4: Use inode_has_buffers() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
` (26 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/bfs/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 9da02f5cb6cd..e0e50a9dbe9c 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -187,6 +187,8 @@ static void bfs_evict_inode(struct inode *inode)
dprintf("ino=%08lx\n", ino);
truncate_inode_pages_final(&inode->i_data);
+ if (inode->i_nlink)
+ sync_mapping_buffers(&inode->i_data);
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (6 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
` (25 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only a few filesystems use generic metadata buffer head tracking, yet
everybody pays the overhead. Once this tracking is removed from the inode
reclaim code, .evict will start to see inodes with metadata buffers attached,
so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/affs/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/affs/inode.c b/fs/affs/inode.c
index 0bfc7d151dcd..84afa862f220 100644
--- a/fs/affs/inode.c
+++ b/fs/affs/inode.c
@@ -267,6 +267,8 @@ affs_evict_inode(struct inode *inode)
if (!inode->i_nlink) {
inode->i_size = 0;
affs_truncate(inode);
+ } else {
+ sync_mapping_buffers(&inode->i_data);
}
invalidate_inode_buffers(inode);
--
2.51.0
* [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (7 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking Jan Kara
` (24 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
There are only a few filesystems that use generic tracking of inode
metadata buffer heads. As such it is mostly pointless to verify such
attached buffer heads during inode reclaim. Drop the handling from
inode_lru_isolate().
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 29 -----------------------------
fs/inode.c | 21 +++++++++------------
include/linux/buffer_head.h | 3 ---
3 files changed, 9 insertions(+), 44 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1bc0f22f3cc2..bd48644e1bf8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -878,35 +878,6 @@ void invalidate_inode_buffers(struct inode *inode)
}
EXPORT_SYMBOL(invalidate_inode_buffers);
-/*
- * Remove any clean buffers from the inode's buffer list. This is called
- * when we're trying to free the inode itself. Those buffers can pin it.
- *
- * Returns true if all buffers were removed.
- */
-int remove_inode_buffers(struct inode *inode)
-{
- int ret = 1;
-
- if (inode_has_buffers(inode)) {
- struct address_space *mapping = &inode->i_data;
- struct list_head *list = &mapping->i_private_list;
- struct address_space *buffer_mapping = mapping->i_private_data;
-
- spin_lock(&buffer_mapping->i_private_lock);
- while (!list_empty(list)) {
- struct buffer_head *bh = BH_ENTRY(list->next);
- if (buffer_dirty(bh)) {
- ret = 0;
- break;
- }
- __remove_assoc_queue(bh);
- }
- spin_unlock(&buffer_mapping->i_private_lock);
- }
- return ret;
-}
-
/*
* Create the appropriate buffers when given a folio for data area and
* the size of each buffer.. Use the bh->b_this_page linked list to
diff --git a/fs/inode.c b/fs/inode.c
index cc12b68e021b..4f98a5f04bbd 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -17,7 +17,6 @@
#include <linux/fsverity.h>
#include <linux/mount.h>
#include <linux/posix_acl.h>
-#include <linux/buffer_head.h> /* for inode_has_buffers */
#include <linux/ratelimit.h>
#include <linux/list_lru.h>
#include <linux/iversion.h>
@@ -367,7 +366,6 @@ struct inode *alloc_inode(struct super_block *sb)
void __destroy_inode(struct inode *inode)
{
- BUG_ON(inode_has_buffers(inode));
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);
@@ -994,19 +992,18 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
* page cache in order to free up struct inodes: lowmem might
* be under pressure before the cache inside the highmem zone.
*/
- if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) {
+ if (!mapping_empty(&inode->i_data)) {
+ unsigned long reap;
+
inode_pin_lru_isolating(inode);
spin_unlock(&inode->i_lock);
spin_unlock(&lru->lock);
- if (remove_inode_buffers(inode)) {
- unsigned long reap;
- reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
- if (current_is_kswapd())
- __count_vm_events(KSWAPD_INODESTEAL, reap);
- else
- __count_vm_events(PGINODESTEAL, reap);
- mm_account_reclaimed_pages(reap);
- }
+ reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
+ if (current_is_kswapd())
+ __count_vm_events(KSWAPD_INODESTEAL, reap);
+ else
+ __count_vm_events(PGINODESTEAL, reap);
+ mm_account_reclaimed_pages(reap);
inode_unpin_lru_isolating(inode);
return LRU_RETRY;
}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e..631bf971efc0 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -517,7 +517,6 @@ void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
int inode_has_buffers(struct inode *inode);
void invalidate_inode_buffers(struct inode *inode);
-int remove_inode_buffers(struct inode *inode);
int sync_mapping_buffers(struct address_space *mapping);
void invalidate_bh_lrus(void);
void invalidate_bh_lrus_cpu(void);
@@ -528,9 +527,7 @@ extern int buffer_heads_over_limit;
static inline void buffer_init(void) {}
static inline bool try_to_free_buffers(struct folio *folio) { return true; }
-static inline int inode_has_buffers(struct inode *inode) { return 0; }
static inline void invalidate_inode_buffers(struct inode *inode) {}
-static inline int remove_inode_buffers(struct inode *inode) { return 1; }
static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
static inline void invalidate_bh_lrus(void) {}
static inline void invalidate_bh_lrus_cpu(void) {}
--
2.51.0
* [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (8 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
` (23 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
All filesystems using generic metadata bh tracking use the bdev mapping
as the backing for these bhs. Stop using i_private_data for it and get to
the bdev mapping directly.
Signed-off-by: Jan Kara <jack@suse.cz>
---
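Illustrative sketch, not part of the patch: instead of caching the backing
mapping in a per-mapping pointer, it is derived from the inode's superblock
each time. The structures below are heavily simplified userspace stand-ins
that keep only the fields named in the diff; `model_buffer_mapping()` is a
hypothetical helper name.

```c
struct address_space;

/* Simplified stand-ins, reduced to the fields used in the pointer chain. */
struct block_device {
	struct address_space *bd_mapping;
};

struct super_block {
	struct block_device *s_bdev;
};

struct inode {
	struct super_block *i_sb;
};

struct address_space {
	struct inode *host;
};

/* Derive the mapping backing the metadata buffers. This assumes the
 * filesystem sits on a single block device, i.e. s_bdev is set. */
static struct address_space *model_buffer_mapping(struct address_space *mapping)
{
	return mapping->host->i_sb->s_bdev->bd_mapping;
}
```

Every inode of the same superblock resolves to the same backing mapping, so
nothing has to be recorded per inode when the first buffer is attached.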
fs/buffer.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index bd48644e1bf8..c85ccfb1a4ec 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -574,9 +574,10 @@ static int osync_buffers_list(spinlock_t *lock, struct list_head *list)
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct address_space *buffer_mapping = mapping->i_private_data;
+ struct address_space *buffer_mapping =
+ mapping->host->i_sb->s_bdev->bd_mapping;
- if (buffer_mapping == NULL || list_empty(&mapping->i_private_list))
+ if (list_empty(&mapping->i_private_list))
return 0;
return fsync_buffers_list(&buffer_mapping->i_private_lock,
@@ -679,11 +680,6 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
struct address_space *buffer_mapping = bh->b_folio->mapping;
mark_buffer_dirty(bh);
- if (!mapping->i_private_data) {
- mapping->i_private_data = buffer_mapping;
- } else {
- BUG_ON(mapping->i_private_data != buffer_mapping);
- }
if (!bh->b_assoc_map) {
spin_lock(&buffer_mapping->i_private_lock);
list_move_tail(&bh->b_assoc_buffers,
@@ -868,7 +864,8 @@ void invalidate_inode_buffers(struct inode *inode)
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
struct list_head *list = &mapping->i_private_list;
- struct address_space *buffer_mapping = mapping->i_private_data;
+ struct address_space *buffer_mapping =
+ mapping->host->i_sb->s_bdev->bd_mapping;
spin_lock(&buffer_mapping->i_private_lock);
while (!list_empty(list))
--
2.51.0
* [PATCH 11/32] gfs2: Don't zero i_private_data
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (9 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 12:32 ` Andreas Gruenbacher
2026-03-03 10:34 ` [PATCH 12/32] hugetlbfs: Stop using i_private_data Jan Kara
` (22 subsequent siblings)
33 siblings, 1 reply; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Andreas Gruenbacher, gfs2
The zeroing is the only use within gfs2 so it is pointless.
CC: Andreas Gruenbacher <agruenba@redhat.com>
CC: gfs2@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/gfs2/glock.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 2acbabccc8ad..b8a144d3a73b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1149,7 +1149,6 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
mapping->flags = 0;
gfp_mask = mapping_gfp_mask(sdp->sd_inode->i_mapping);
mapping_set_gfp_mask(mapping, gfp_mask);
- mapping->i_private_data = NULL;
mapping->writeback_index = 0;
}
--
2.51.0
* [PATCH 12/32] hugetlbfs: Stop using i_private_data
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (10 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 13/32] aio: Stop using i_private_data and i_private_lock Jan Kara
` (21 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of using i_private_data for the resv_map pointer, add the pointer
to the hugetlbfs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/hugetlbfs/inode.c | 10 ++--------
include/linux/hugetlb.h | 1 +
mm/hugetlb.c | 10 +---------
3 files changed, 4 insertions(+), 17 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..0496f2e6d177 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -622,13 +622,7 @@ static void hugetlbfs_evict_inode(struct inode *inode)
trace_hugetlbfs_evict_inode(inode);
remove_inode_hugepages(inode, 0, LLONG_MAX);
- /*
- * Get the resv_map from the address space embedded in the inode.
- * This is the address space which points to any resv_map allocated
- * at inode creation time. If this is a device special inode,
- * i_mapping may not point to the original address space.
- */
- resv_map = (struct resv_map *)(&inode->i_data)->i_private_data;
+ resv_map = HUGETLBFS_I(inode)->resv_map;
/* Only regular and link inodes have associated reserve maps */
if (resv_map)
resv_map_release(&resv_map->refs);
@@ -950,7 +944,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
&hugetlbfs_i_mmap_rwsem_key);
inode->i_mapping->a_ops = &hugetlbfs_aops;
simple_inode_init_ts(inode);
- inode->i_mapping->i_private_data = resv_map;
+ info->resv_map = resv_map;
info->seals = F_SEAL_SEAL;
switch (mode & S_IFMT) {
default:
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..fc5462fe943f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -518,6 +518,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
struct hugetlbfs_inode_info {
struct inode vfs_inode;
+ struct resv_map *resv_map;
unsigned int seals;
};
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0beb6e22bc26..7ab5c724a711 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1157,15 +1157,7 @@ void resv_map_release(struct kref *ref)
static inline struct resv_map *inode_resv_map(struct inode *inode)
{
- /*
- * At inode evict time, i_mapping may not point to the original
- * address space within the inode. This original address space
- * contains the pointer to the resv_map. So, always use the
- * address space embedded within the inode.
- * The VERY common case is inode->mapping == &inode->i_data but,
- * this may not be true for device special inodes.
- */
- return (struct resv_map *)(&inode->i_data)->i_private_data;
+ return HUGETLBFS_I(inode)->resv_map;
}
static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
--
2.51.0
* [PATCH 13/32] aio: Stop using i_private_data and i_private_lock
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (11 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 12/32] hugetlbfs: Stop using i_private_data Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 14/32] fs: Remove i_private_data Jan Kara
` (20 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of using i_private_data and i_private_lock, just create aio
inodes with the necessary fields.
Signed-off-by: Jan Kara <jack@suse.cz>
---
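Illustrative sketch, not part of the patch: the per-inode lock serializes
ring-file teardown against page migration, so migration either sees a valid
context or sees NULL and backs off. The userspace model below uses a pthread
mutex in place of the spinlock; all `model_*` names are hypothetical, and the
-EINVAL return for a torn-down context is an assumption of this sketch.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct model_kioctx {
	int id;
};

/* Stand-in for the aio-private inode: a lock plus the context pointer. */
struct model_aio_inode {
	pthread_mutex_t migrate_lock;
	struct model_kioctx *ctx;
};

static void model_aio_inode_init(struct model_aio_inode *ai,
				 struct model_kioctx *ctx)
{
	pthread_mutex_init(&ai->migrate_lock, NULL);
	ai->ctx = ctx;
}

/* Teardown: clear the pointer under the lock so migration cannot use it. */
static void model_put_ring_file(struct model_aio_inode *ai)
{
	pthread_mutex_lock(&ai->migrate_lock);
	ai->ctx = NULL;
	pthread_mutex_unlock(&ai->migrate_lock);
}

/* Migration: look at the context only while holding the lock. */
static int model_migrate(struct model_aio_inode *ai)
{
	int rc = 0;

	pthread_mutex_lock(&ai->migrate_lock);
	if (!ai->ctx)
		rc = -EINVAL;	/* context already torn down */
	pthread_mutex_unlock(&ai->migrate_lock);
	return rc;
}
```

Because both paths take the same lock, a migration that observes a non-NULL
context knows that teardown has not yet cleared it.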
fs/aio.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 66 insertions(+), 12 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index a07bdd1aaaa6..ba9b9fa2446b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -218,6 +218,17 @@ struct aio_kiocb {
struct eventfd_ctx *ki_eventfd;
};
+struct aio_inode_info {
+ struct inode vfs_inode;
+ spinlock_t migrate_lock;
+ struct kioctx *ctx;
+};
+
+static inline struct aio_inode_info *AIO_I(struct inode *inode)
+{
+ return container_of(inode, struct aio_inode_info, vfs_inode);
+}
+
/*------ sysctl variables----*/
static DEFINE_SPINLOCK(aio_nr_lock);
static unsigned long aio_nr; /* current system wide number of aio requests */
@@ -251,6 +262,7 @@ static void __init aio_sysctl_init(void)
static struct kmem_cache *kiocb_cachep;
static struct kmem_cache *kioctx_cachep;
+static struct kmem_cache *aio_inode_cachep;
static struct vfsmount *aio_mnt;
@@ -261,11 +273,12 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
{
struct file *file;
struct inode *inode = alloc_anon_inode(aio_mnt->mnt_sb);
+
if (IS_ERR(inode))
return ERR_CAST(inode);
inode->i_mapping->a_ops = &aio_ctx_aops;
- inode->i_mapping->i_private_data = ctx;
+ AIO_I(inode)->ctx = ctx;
inode->i_size = PAGE_SIZE * nr_pages;
file = alloc_file_pseudo(inode, aio_mnt, "[aio]",
@@ -275,14 +288,49 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
return file;
}
+static struct inode *aio_alloc_inode(struct super_block *sb)
+{
+ struct aio_inode_info *ai;
+
+ ai = alloc_inode_sb(sb, aio_inode_cachep, GFP_KERNEL);
+ if (!ai)
+ return NULL;
+ ai->ctx = NULL;
+
+ return &ai->vfs_inode;
+}
+
+static void aio_free_inode(struct inode *inode)
+{
+ kmem_cache_free(aio_inode_cachep, AIO_I(inode));
+}
+
+static const struct super_operations aio_super_operations = {
+ .alloc_inode = aio_alloc_inode,
+ .free_inode = aio_free_inode,
+ .statfs = simple_statfs,
+};
+
static int aio_init_fs_context(struct fs_context *fc)
{
- if (!init_pseudo(fc, AIO_RING_MAGIC))
+ struct pseudo_fs_context *pfc;
+
+ pfc = init_pseudo(fc, AIO_RING_MAGIC);
+ if (!pfc)
return -ENOMEM;
fc->s_iflags |= SB_I_NOEXEC;
+ pfc->ops = &aio_super_operations;
return 0;
}
+static void init_once(void *obj)
+{
+ struct aio_inode_info *ai = obj;
+
+ inode_init_once(&ai->vfs_inode);
+ spin_lock_init(&ai->migrate_lock);
+}
+
/* aio_setup
* Creates the slab caches used by the aio routines, panic on
* failure as this is done early during the boot sequence.
@@ -294,6 +342,11 @@ static int __init aio_setup(void)
.init_fs_context = aio_init_fs_context,
.kill_sb = kill_anon_super,
};
+
+ aio_inode_cachep = kmem_cache_create("aio_inode_cache",
+ sizeof(struct aio_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT),
+ init_once);
aio_mnt = kern_mount(&aio_fs);
if (IS_ERR(aio_mnt))
panic("Failed to create aio fs mount.");
@@ -308,17 +361,17 @@ __initcall(aio_setup);
static void put_aio_ring_file(struct kioctx *ctx)
{
struct file *aio_ring_file = ctx->aio_ring_file;
- struct address_space *i_mapping;
if (aio_ring_file) {
- truncate_setsize(file_inode(aio_ring_file), 0);
+ struct inode *inode = file_inode(aio_ring_file);
+
+ truncate_setsize(inode, 0);
/* Prevent further access to the kioctx from migratepages */
- i_mapping = aio_ring_file->f_mapping;
- spin_lock(&i_mapping->i_private_lock);
- i_mapping->i_private_data = NULL;
+ spin_lock(&AIO_I(inode)->migrate_lock);
+ AIO_I(inode)->ctx = NULL;
ctx->aio_ring_file = NULL;
- spin_unlock(&i_mapping->i_private_lock);
+ spin_unlock(&AIO_I(inode)->migrate_lock);
fput(aio_ring_file);
}
@@ -408,13 +461,14 @@ static int aio_migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, enum migrate_mode mode)
{
struct kioctx *ctx;
+ struct aio_inode_info *ai = AIO_I(mapping->host);
unsigned long flags;
pgoff_t idx;
int rc = 0;
- /* mapping->i_private_lock here protects against the kioctx teardown. */
- spin_lock(&mapping->i_private_lock);
- ctx = mapping->i_private_data;
+ /* ai->migrate_lock here protects against the kioctx teardown. */
+ spin_lock(&ai->migrate_lock);
+ ctx = ai->ctx;
if (!ctx) {
rc = -EINVAL;
goto out;
@@ -467,7 +521,7 @@ static int aio_migrate_folio(struct address_space *mapping, struct folio *dst,
out_unlock:
mutex_unlock(&ctx->ring_lock);
out:
- spin_unlock(&mapping->i_private_lock);
+ spin_unlock(&ai->migrate_lock);
return rc;
}
#else
--
2.51.0
* [PATCH 14/32] fs: Remove i_private_data
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (12 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 13/32] aio: Stop using i_private_data and i_private_lock Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 15/32] fs: Drop osync_buffers_list() Jan Kara
` (19 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Nobody is using it anymore.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/inode.c | 1 -
include/linux/fs.h | 2 --
2 files changed, 3 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 4f98a5f04bbd..d5774e627a9c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -283,7 +283,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
atomic_set(&mapping->nr_thps, 0);
#endif
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
- mapping->i_private_data = NULL;
mapping->writeback_index = 0;
init_rwsem(&mapping->invalidate_lock);
lockdep_set_class_and_name(&mapping->invalidate_lock,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25e..10b96eb5391d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -465,7 +465,6 @@ extern const struct address_space_operations empty_aops;
* @wb_err: The most recent error which has occurred.
* @i_private_lock: For use by the owner of the address_space.
* @i_private_list: For use by the owner of the address_space.
- * @i_private_data: For use by the owner of the address_space.
*/
struct address_space {
struct inode *host;
@@ -486,7 +485,6 @@ struct address_space {
spinlock_t i_private_lock;
struct list_head i_private_list;
struct rw_semaphore i_mmap_rwsem;
- void * i_private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
* On most architectures that alignment is already the case; but
--
2.51.0
* [PATCH 15/32] fs: Drop osync_buffers_list()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (13 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 14/32] fs: Remove i_private_data Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
` (18 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
The function only waits for already locked buffers in the list of
metadata bhs. fsync_buffers_list() has just waited for all outstanding
IO on buffers so this isn't adding anything useful. The comment in front
of fsync_buffers_list() mentions concerns about buffers being moved out
from the tmp list back to the mapping's i_private_list but these days
mark_buffer_dirty_inode() doesn't touch buffers with b_assoc_map set so
that cannot happen. Just delete the stale code.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 43 ++-----------------------------------------
1 file changed, 2 insertions(+), 41 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index c85ccfb1a4ec..1c0e7c81a38b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -526,41 +526,6 @@ int inode_has_buffers(struct inode *inode)
}
EXPORT_SYMBOL_GPL(inode_has_buffers);
-/*
- * osync is designed to support O_SYNC io. It waits synchronously for
- * all already-submitted IO to complete, but does not queue any new
- * writes to the disk.
- *
- * To do O_SYNC writes, just queue the buffer writes with write_dirty_buffer
- * as you dirty the buffers, and then use osync_inode_buffers to wait for
- * completion. Any other dirty buffers which are not yet queued for
- * write will not be flushed to disk by the osync.
- */
-static int osync_buffers_list(spinlock_t *lock, struct list_head *list)
-{
- struct buffer_head *bh;
- struct list_head *p;
- int err = 0;
-
- spin_lock(lock);
-repeat:
- list_for_each_prev(p, list) {
- bh = BH_ENTRY(p);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(lock);
- wait_on_buffer(bh);
- if (!buffer_uptodate(bh))
- err = -EIO;
- brelse(bh);
- spin_lock(lock);
- goto repeat;
- }
- }
- spin_unlock(lock);
- return err;
-}
-
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
@@ -777,7 +742,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
{
struct buffer_head *bh;
struct address_space *mapping;
- int err = 0, err2;
+ int err = 0;
struct blk_plug plug;
LIST_HEAD(tmp);
@@ -844,11 +809,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
}
spin_unlock(lock);
- err2 = osync_buffers_list(lock, list);
- if (err)
- return err;
- else
- return err2;
+ return err;
}
/*
--
2.51.0
* [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (14 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 15/32] fs: Drop osync_buffers_list() Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct Jan Kara
` (17 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
There's only a single caller of fsync_buffers_list() so untangle the
code a bit by folding fsync_buffers_list() into sync_mapping_buffers().
Also merge the comments and update them to reflect the current state of
the code.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 180 +++++++++++++++++++++++-----------------------------
1 file changed, 80 insertions(+), 100 deletions(-)
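
A minimal userspace sketch of the two-stage scheme described in the merged
comment, not the kernel code itself. It assumes a single thread, replaces the
linked lists with a per-buffer tag, and simulates I/O with flags; struct buf,
submit_write(), wait_on() and sync_bufs() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Which list a buffer currently sits on (stand-in for list linkage). */
enum where { NOWHERE, ON_MAPPING, ON_TMP };

struct buf {
	bool dirty;	/* needs writing */
	bool locked;	/* I/O in flight */
	bool io_error;	/* simulated failure, noticed when waited on */
	enum where list;
};

/* Simulated I/O: submitting clears dirty, waiting clears locked. */
static void submit_write(struct buf *b) { b->dirty = false; b->locked = true; }
static void wait_on(struct buf *b) { b->locked = false; }

static int sync_bufs(struct buf *bufs, size_t n)
{
	int err = 0;
	size_t i;

	/* Stage 1: move dirty or in-flight buffers to tmp, queueing writes. */
	for (i = 0; i < n; i++) {
		struct buf *b = &bufs[i];

		if (b->list != ON_MAPPING)
			continue;
		b->list = NOWHERE;
		if (b->dirty || b->locked) {
			b->list = ON_TMP;
			if (b->dirty)
				submit_write(b);
		}
	}

	/* Stage 2: wait only for what stage 1 collected. */
	for (i = 0; i < n; i++) {
		struct buf *b = &bufs[i];

		if (b->list != ON_TMP)
			continue;
		/* A buffer redirtied in the meantime goes back to the mapping. */
		b->list = b->dirty ? ON_MAPPING : NOWHERE;
		wait_on(b);
		if (b->io_error)
			err = -EIO;
	}
	return err;
}
```

Because stage 2 walks only the buffers collected in stage 1, buffers dirtied
after the first pass cannot prolong the call, which is the "conflicting
pressures" point made in the comment.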
diff --git a/fs/buffer.c b/fs/buffer.c
index 1c0e7c81a38b..18012afb8289 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -54,7 +54,6 @@
#include "internal.h"
-static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
enum rw_hint hint, struct writeback_control *wbc);
@@ -531,22 +530,96 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
* @mapping: the mapping which wants those buffers written
*
* Starts I/O against the buffers at mapping->i_private_list, and waits upon
- * that I/O.
+ * that I/O. Basically, this is a convenience function for fsync(). @mapping
+ * is a file or directory which needs those buffers to be written for a
+ * successful fsync().
*
- * Basically, this is a convenience function for fsync().
- * @mapping is a file or directory which needs those buffers to be written for
- * a successful fsync().
+ * We have conflicting pressures: we want to make sure that all
+ * initially dirty buffers get waited on, but that any subsequently
+ * dirtied buffers don't. After all, we don't want fsync to last
+ * forever if somebody is actively writing to the file.
+ *
+ * Do this in two main stages: first we copy dirty buffers to a
+ * temporary inode list, queueing the writes as we go. Then we clean
+ * up, waiting for those writes to complete. mark_buffer_dirty_inode()
+ * doesn't touch b_assoc_buffers list if b_assoc_map is not NULL so we
+ * are sure the buffer stays on our list until IO completes (at which point
+ * it can be reaped).
*/
int sync_mapping_buffers(struct address_space *mapping)
{
struct address_space *buffer_mapping =
mapping->host->i_sb->s_bdev->bd_mapping;
+ struct buffer_head *bh;
+ int err = 0;
+ struct blk_plug plug;
+ LIST_HEAD(tmp);
if (list_empty(&mapping->i_private_list))
return 0;
- return fsync_buffers_list(&buffer_mapping->i_private_lock,
- &mapping->i_private_list);
+ blk_start_plug(&plug);
+
+ spin_lock(&buffer_mapping->i_private_lock);
+ while (!list_empty(&mapping->i_private_list)) {
+ bh = BH_ENTRY(list->next);
+ WARN_ON_ONCE(bh->b_assoc_map != mapping);
+ __remove_assoc_queue(bh);
+ /* Avoid race with mark_buffer_dirty_inode() which does
+ * a lockless check and we rely on seeing the dirty bit */
+ smp_mb();
+ if (buffer_dirty(bh) || buffer_locked(bh)) {
+ list_add(&bh->b_assoc_buffers, &tmp);
+ bh->b_assoc_map = mapping;
+ if (buffer_dirty(bh)) {
+ get_bh(bh);
+ spin_unlock(&buffer_mapping->i_private_lock);
+ /*
+ * Ensure any pending I/O completes so that
+ * write_dirty_buffer() actually writes the
+ * current contents - it is a noop if I/O is
+ * still in flight on potentially older
+ * contents.
+ */
+ write_dirty_buffer(bh, REQ_SYNC);
+
+ /*
+ * Kick off IO for the previous mapping. Note
+ * that we will not run the very last mapping,
+ * wait_on_buffer() will do that for us
+ * through sync_buffer().
+ */
+ brelse(bh);
+ spin_lock(&buffer_mapping->i_private_lock);
+ }
+ }
+ }
+
+ spin_unlock(&buffer_mapping->i_private_lock);
+ blk_finish_plug(&plug);
+ spin_lock(&buffer_mapping->i_private_lock);
+
+ while (!list_empty(&tmp)) {
+ bh = BH_ENTRY(tmp.prev);
+ get_bh(bh);
+ __remove_assoc_queue(bh);
+ /* Avoid race with mark_buffer_dirty_inode() which does
+ * a lockless check and we rely on seeing the dirty bit */
+ smp_mb();
+ if (buffer_dirty(bh)) {
+ list_add(&bh->b_assoc_buffers,
+ &mapping->i_private_list);
+ bh->b_assoc_map = mapping;
+ }
+ spin_unlock(&buffer_mapping->i_private_lock);
+ wait_on_buffer(bh);
+ if (!buffer_uptodate(bh))
+ err = -EIO;
+ brelse(bh);
+ spin_lock(&buffer_mapping->i_private_lock);
+ }
+ spin_unlock(&buffer_mapping->i_private_lock);
+ return err;
}
EXPORT_SYMBOL(sync_mapping_buffers);
@@ -719,99 +792,6 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
}
EXPORT_SYMBOL(block_dirty_folio);
-/*
- * Write out and wait upon a list of buffers.
- *
- * We have conflicting pressures: we want to make sure that all
- * initially dirty buffers get waited on, but that any subsequently
- * dirtied buffers don't. After all, we don't want fsync to last
- * forever if somebody is actively writing to the file.
- *
- * Do this in two main stages: first we copy dirty buffers to a
- * temporary inode list, queueing the writes as we go. Then we clean
- * up, waiting for those writes to complete.
- *
- * During this second stage, any subsequent updates to the file may end
- * up refiling the buffer on the original inode's dirty list again, so
- * there is a chance we will end up with a buffer queued for write but
- * not yet completed on that list. So, as a final cleanup we go through
- * the osync code to catch these locked, dirty buffers without requeuing
- * any newly dirty buffers for write.
- */
-static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
-{
- struct buffer_head *bh;
- struct address_space *mapping;
- int err = 0;
- struct blk_plug plug;
- LIST_HEAD(tmp);
-
- blk_start_plug(&plug);
-
- spin_lock(lock);
- while (!list_empty(list)) {
- bh = BH_ENTRY(list->next);
- mapping = bh->b_assoc_map;
- __remove_assoc_queue(bh);
- /* Avoid race with mark_buffer_dirty_inode() which does
- * a lockless check and we rely on seeing the dirty bit */
- smp_mb();
- if (buffer_dirty(bh) || buffer_locked(bh)) {
- list_add(&bh->b_assoc_buffers, &tmp);
- bh->b_assoc_map = mapping;
- if (buffer_dirty(bh)) {
- get_bh(bh);
- spin_unlock(lock);
- /*
- * Ensure any pending I/O completes so that
- * write_dirty_buffer() actually writes the
- * current contents - it is a noop if I/O is
- * still in flight on potentially older
- * contents.
- */
- write_dirty_buffer(bh, REQ_SYNC);
-
- /*
- * Kick off IO for the previous mapping. Note
- * that we will not run the very last mapping,
- * wait_on_buffer() will do that for us
- * through sync_buffer().
- */
- brelse(bh);
- spin_lock(lock);
- }
- }
- }
-
- spin_unlock(lock);
- blk_finish_plug(&plug);
- spin_lock(lock);
-
- while (!list_empty(&tmp)) {
- bh = BH_ENTRY(tmp.prev);
- get_bh(bh);
- mapping = bh->b_assoc_map;
- __remove_assoc_queue(bh);
- /* Avoid race with mark_buffer_dirty_inode() which does
- * a lockless check and we rely on seeing the dirty bit */
- smp_mb();
- if (buffer_dirty(bh)) {
- list_add(&bh->b_assoc_buffers,
- &mapping->i_private_list);
- bh->b_assoc_map = mapping;
- }
- spin_unlock(lock);
- wait_on_buffer(bh);
- if (!buffer_uptodate(bh))
- err = -EIO;
- brelse(bh);
- spin_lock(lock);
- }
-
- spin_unlock(lock);
- return err;
-}
-
/*
* Invalidate any and all dirty buffers on a given inode. We are
* probably unmounting the fs, but that doesn't mean we have already
--
2.51.0
* [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (15 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs Jan Kara
` (16 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of tracking metadata bhs for a mapping using i_private_list and
i_private_lock we create a dedicated mapping_metadata_bhs struct for it.
So far this struct is embedded in address_space but that will be
switched to per-fs private inode parts later in the series. This also
changes the locking from the bdev mapping's i_private_lock to a lock
embedded in mapping_metadata_bhs, which untangles the locking for
maintaining lists of metadata bhs from the i_private_lock locking for
looking up / reclaiming the bdev's buffer heads. The locking in
remove_assoc_queue() gets more complex due to this but overall this
looks like a reasonable tradeoff.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 138 +++++++++++++++++++++------------------------
fs/inode.c | 2 +
include/linux/fs.h | 7 +++
3 files changed, 74 insertions(+), 73 deletions(-)
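
The lock-then-recheck loop in remove_assoc_queue() can be illustrated with a
small userspace sketch. This is only an approximation under stated
assumptions: struct tracker stands in for mapping_metadata_bhs, struct item
for buffer_head, and the owner field for b_assoc_map; all helper names are
hypothetical. There is no RCU here, so the sketch simply assumes a tracker
outlives every item that may point to it, and it uses a C11 atomic_flag as a
stand-in spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct tracker {			/* stand-in for mapping_metadata_bhs */
	atomic_flag lock;
	int nr;				/* number of attached items */
};

struct item {				/* stand-in for buffer_head */
	_Atomic(struct tracker *) owner;	/* stand-in for b_assoc_map */
};

static void tracker_init(struct tracker *t)
{
	atomic_flag_clear(&t->lock);
	t->nr = 0;
}

static void item_init(struct item *it)
{
	atomic_init(&it->owner, NULL);
}

static void tracker_lock(struct tracker *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void tracker_unlock(struct tracker *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void attach(struct item *it, struct tracker *t)
{
	tracker_lock(t);
	t->nr++;
	atomic_store(&it->owner, t);
	tracker_unlock(t);
}

static void detach(struct item *it)
{
	struct tracker *t;

	/* Loop until the item is observed without an owner. */
	while ((t = atomic_load(&it->owner)) != NULL) {
		tracker_lock(t);
		/* Recheck under the lock: the item may have moved meanwhile. */
		if (atomic_load(&it->owner) == t) {
			t->nr--;
			atomic_store(&it->owner, NULL);
		}
		tracker_unlock(t);
	}
}
```

The owner is read without the lock, so the lock that gets taken may already
be the wrong one; the recheck under the lock detects this and the loop
retries against the new owner.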
diff --git a/fs/buffer.c b/fs/buffer.c
index 18012afb8289..d39ae6581c26 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -469,30 +469,13 @@ EXPORT_SYMBOL(mark_buffer_async_write);
*
* The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
* inode_has_buffers() and invalidate_inode_buffers() are provided for the
- * management of a list of dependent buffers at ->i_mapping->i_private_list.
- *
- * Locking is a little subtle: try_to_free_buffers() will remove buffers
- * from their controlling inode's queue when they are being freed. But
- * try_to_free_buffers() will be operating against the *blockdev* mapping
- * at the time, not against the S_ISREG file which depends on those buffers.
- * So the locking for i_private_list is via the i_private_lock in the address_space
- * which backs the buffers. Which is different from the address_space
- * against which the buffers are listed. So for a particular address_space,
- * mapping->i_private_lock does *not* protect mapping->i_private_list! In fact,
- * mapping->i_private_list will always be protected by the backing blockdev's
- * ->i_private_lock.
- *
- * Which introduces a requirement: all buffers on an address_space's
- * ->i_private_list must be from the same address_space: the blockdev's.
- *
- * address_spaces which do not place buffers at ->i_private_list via these
- * utility functions are free to use i_private_lock and i_private_list for
- * whatever they want. The only requirement is that list_empty(i_private_list)
- * be true at clear_inode() time.
- *
- * FIXME: clear_inode should not call invalidate_inode_buffers(). The
- * filesystems should do that. invalidate_inode_buffers() should just go
- * BUG_ON(!list_empty).
+ * management of a list of dependent buffers in mapping_metadata_bhs struct.
+ *
+ * The locking is a little subtle: The list of buffer heads is protected by
+ * the lock in mapping_metadata_bhs so functions coming from bdev mapping
+ * (such as try_to_free_buffers()) need to safely get to mapping_metadata_bhs
+ * using RCU, grab the lock, verify we didn't race with somebody detaching the
+ * bh / moving it to different inode and only then proceeding.
*
* FIXME: mark_buffer_dirty_inode() is a data-plane operation. It should
* take an address_space, not an inode. And it should be called
@@ -509,19 +492,45 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* b_inode back.
*/
-/*
- * The buffer's backing address_space's i_private_lock must be held
- */
-static void __remove_assoc_queue(struct buffer_head *bh)
+static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
+ struct buffer_head *bh)
{
+ lockdep_assert_held(&mmb->lock);
list_del_init(&bh->b_assoc_buffers);
WARN_ON(!bh->b_assoc_map);
bh->b_assoc_map = NULL;
}
+static void remove_assoc_queue(struct buffer_head *bh)
+{
+ struct address_space *mapping;
+ struct mapping_metadata_bhs *mmb;
+
+ /*
+ * The locking dance is ugly here. We need to acquire lock
+ * protecting metadata bh list while possibly racing with bh
+ * being removed from the list or moved to a different one. We
+ * use RCU to pin mapping_metadata_bhs in memory to
+ * opportunistically acquire the lock and then recheck the bh
+ * didn't move under us.
+ */
+ while (bh->b_assoc_map) {
+ rcu_read_lock();
+ mapping = READ_ONCE(bh->b_assoc_map);
+ if (mapping) {
+ mmb = &mapping->i_metadata_bhs;
+ spin_lock(&mmb->lock);
+ if (bh->b_assoc_map == mapping)
+ __remove_assoc_queue(mmb, bh);
+ spin_unlock(&mmb->lock);
+ }
+ rcu_read_unlock();
+ }
+}
+
int inode_has_buffers(struct inode *inode)
{
- return !list_empty(&inode->i_data.i_private_list);
+ return !list_empty(&inode->i_data.i_metadata_bhs.list);
}
EXPORT_SYMBOL_GPL(inode_has_buffers);
@@ -529,7 +538,7 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
*
- * Starts I/O against the buffers at mapping->i_private_list, and waits upon
+ * Starts I/O against the buffers at mapping->i_metadata_bhs and waits upon
* that I/O. Basically, this is a convenience function for fsync(). @mapping
* is a file or directory which needs those buffers to be written for a
* successful fsync().
@@ -548,23 +557,22 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct address_space *buffer_mapping =
- mapping->host->i_sb->s_bdev->bd_mapping;
+ struct mapping_metadata_bhs *mmb = &mapping->i_metadata_bhs;
struct buffer_head *bh;
int err = 0;
struct blk_plug plug;
LIST_HEAD(tmp);
- if (list_empty(&mapping->i_private_list))
+ if (list_empty(&mmb->list))
return 0;
blk_start_plug(&plug);
- spin_lock(&buffer_mapping->i_private_lock);
- while (!list_empty(&mapping->i_private_list)) {
- bh = BH_ENTRY(list->next);
+ spin_lock(&mmb->lock);
+ while (!list_empty(&mmb->list)) {
+ bh = BH_ENTRY(mmb->list.next);
WARN_ON_ONCE(bh->b_assoc_map != mapping);
- __remove_assoc_queue(bh);
+ __remove_assoc_queue(mmb, bh);
/* Avoid race with mark_buffer_dirty_inode() which does
* a lockless check and we rely on seeing the dirty bit */
smp_mb();
@@ -573,7 +581,7 @@ int sync_mapping_buffers(struct address_space *mapping)
bh->b_assoc_map = mapping;
if (buffer_dirty(bh)) {
get_bh(bh);
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
/*
* Ensure any pending I/O completes so that
* write_dirty_buffer() actually writes the
@@ -590,35 +598,34 @@ int sync_mapping_buffers(struct address_space *mapping)
* through sync_buffer().
*/
brelse(bh);
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mmb->lock);
}
}
}
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
blk_finish_plug(&plug);
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mmb->lock);
while (!list_empty(&tmp)) {
bh = BH_ENTRY(tmp.prev);
get_bh(bh);
- __remove_assoc_queue(bh);
+ __remove_assoc_queue(mmb, bh);
/* Avoid race with mark_buffer_dirty_inode() which does
* a lockless check and we rely on seeing the dirty bit */
smp_mb();
if (buffer_dirty(bh)) {
- list_add(&bh->b_assoc_buffers,
- &mapping->i_private_list);
+ list_add(&bh->b_assoc_buffers, &mmb->list);
bh->b_assoc_map = mapping;
}
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
wait_on_buffer(bh);
if (!buffer_uptodate(bh))
err = -EIO;
brelse(bh);
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mmb->lock);
}
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
return err;
}
EXPORT_SYMBOL(sync_mapping_buffers);
@@ -715,15 +722,14 @@ void write_boundary_block(struct block_device *bdev,
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
struct address_space *mapping = inode->i_mapping;
- struct address_space *buffer_mapping = bh->b_folio->mapping;
mark_buffer_dirty(bh);
if (!bh->b_assoc_map) {
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mapping->i_metadata_bhs.lock);
list_move_tail(&bh->b_assoc_buffers,
- &mapping->i_private_list);
+ &mapping->i_metadata_bhs.list);
bh->b_assoc_map = mapping;
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mapping->i_metadata_bhs.lock);
}
}
EXPORT_SYMBOL(mark_buffer_dirty_inode);
@@ -796,22 +802,16 @@ EXPORT_SYMBOL(block_dirty_folio);
* Invalidate any and all dirty buffers on a given inode. We are
* probably unmounting the fs, but that doesn't mean we have already
* done a sync(). Just drop the buffers from the inode list.
- *
- * NOTE: we take the inode's blockdev's mapping's i_private_lock. Which
- * assumes that all the buffers are against the blockdev.
*/
void invalidate_inode_buffers(struct inode *inode)
{
if (inode_has_buffers(inode)) {
- struct address_space *mapping = &inode->i_data;
- struct list_head *list = &mapping->i_private_list;
- struct address_space *buffer_mapping =
- mapping->host->i_sb->s_bdev->bd_mapping;
-
- spin_lock(&buffer_mapping->i_private_lock);
- while (!list_empty(list))
- __remove_assoc_queue(BH_ENTRY(list->next));
- spin_unlock(&buffer_mapping->i_private_lock);
+ struct mapping_metadata_bhs *mmb = &inode->i_data.i_metadata_bhs;
+
+ spin_lock(&mmb->lock);
+ while (!list_empty(&mmb->list))
+ __remove_assoc_queue(mmb, BH_ENTRY(mmb->list.next));
+ spin_unlock(&mmb->lock);
}
}
EXPORT_SYMBOL(invalidate_inode_buffers);
@@ -1155,14 +1155,7 @@ EXPORT_SYMBOL(__brelse);
void __bforget(struct buffer_head *bh)
{
clear_buffer_dirty(bh);
- if (bh->b_assoc_map) {
- struct address_space *buffer_mapping = bh->b_folio->mapping;
-
- spin_lock(&buffer_mapping->i_private_lock);
- list_del_init(&bh->b_assoc_buffers);
- bh->b_assoc_map = NULL;
- spin_unlock(&buffer_mapping->i_private_lock);
- }
+ remove_assoc_queue(bh);
__brelse(bh);
}
EXPORT_SYMBOL(__bforget);
@@ -2810,8 +2803,7 @@ drop_buffers(struct folio *folio, struct buffer_head **buffers_to_free)
do {
struct buffer_head *next = bh->b_this_page;
- if (bh->b_assoc_map)
- __remove_assoc_queue(bh);
+ remove_assoc_queue(bh);
bh = next;
} while (bh != head);
*buffers_to_free = head;
diff --git a/fs/inode.c b/fs/inode.c
index d5774e627a9c..393f586d050a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -483,6 +483,8 @@ static void __address_space_init_once(struct address_space *mapping)
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->i_private_list);
spin_lock_init(&mapping->i_private_lock);
+ spin_lock_init(&mapping->i_metadata_bhs.lock);
+ INIT_LIST_HEAD(&mapping->i_metadata_bhs.list);
mapping->i_mmap = RB_ROOT_CACHED;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 10b96eb5391d..64771a55adc5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -445,6 +445,12 @@ struct address_space_operations {
extern const struct address_space_operations empty_aops;
+/* Structure for tracking metadata buffer heads associated with the mapping */
+struct mapping_metadata_bhs {
+ spinlock_t lock; /* Lock protecting bh list */
+ struct list_head list; /* The list of bhs (b_assoc_buffers) */
+};
+
/**
* struct address_space - Contents of a cacheable, mappable object.
* @host: Owner, either the inode or the block_device.
@@ -484,6 +490,7 @@ struct address_space {
errseq_t wb_err;
spinlock_t i_private_lock;
struct list_head i_private_list;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct rw_semaphore i_mmap_rwsem;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
--
2.51.0
* [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (16 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-04 12:48 ` Christian Brauner
2026-03-03 10:34 ` [PATCH 19/32] ntfs3: Drop pointless sync_mapping_buffers() call Jan Kara
` (15 subsequent siblings)
33 siblings, 1 reply; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
When we move mapping_metadata_bhs to the fs-private part of an inode,
the generic code will need a way to get to this struct from the generic
struct inode. Add an inode operation for this, similar to the operation
for grabbing offset_ctx.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 35 +++++++++++++++++++++++++----------
include/linux/buffer_head.h | 1 +
include/linux/fs.h | 1 +
3 files changed, 27 insertions(+), 10 deletions(-)
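
As a rough userspace illustration of the lookup this patch adds, the sketch
below shows an optional operation with a fallback to the embedded struct. The
types are heavily simplified stand-ins, and demo_inode_info, demo_iops,
plain_iops and demo_get_metadata_bhs() are hypothetical names, not taken from
any filesystem in the series. For brevity the fallback uses i_data directly,
whereas the patch goes through inode->i_mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; field sets are reduced to what the sketch needs. */
struct mapping_metadata_bhs { int nr_tracked; };

struct inode;
struct inode_operations {
	struct mapping_metadata_bhs *(*get_metadata_bhs)(struct inode *inode);
};

struct address_space { struct mapping_metadata_bhs i_metadata_bhs; };

struct inode {
	const struct inode_operations *i_op;
	struct address_space i_data;
};

/* Hypothetical fs-private inode that embeds the tracking struct. */
struct demo_inode_info {
	struct mapping_metadata_bhs mmb;
	struct inode vfs_inode;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct mapping_metadata_bhs *demo_get_metadata_bhs(struct inode *inode)
{
	return &container_of(inode, struct demo_inode_info, vfs_inode)->mmb;
}

/* One filesystem provides the operation, the other leaves it NULL. */
static const struct inode_operations demo_iops = {
	.get_metadata_bhs = demo_get_metadata_bhs,
};
static const struct inode_operations plain_iops = { NULL };

/* Prefer the fs-provided struct, else fall back to the embedded one. */
static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
{
	if (inode->i_op->get_metadata_bhs)
		return inode->i_op->get_metadata_bhs(inode);
	return &inode->i_data.i_metadata_bhs;
}
```

With the fallback in place, filesystems can be converted one at a time:
an inode whose i_op leaves the hook unset keeps using the struct embedded in
its address_space.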
diff --git a/fs/buffer.c b/fs/buffer.c
index d39ae6581c26..d7a1d72302da 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -492,6 +492,20 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* b_inode back.
*/
+void mmb_init(struct mapping_metadata_bhs *mmb)
+{
+ spin_lock_init(&mmb->lock);
+ INIT_LIST_HEAD(&mmb->list);
+}
+EXPORT_SYMBOL(mmb_init);
+
+static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
+{
+ if (inode->i_op->get_metadata_bhs)
+ return inode->i_op->get_metadata_bhs(inode);
+ return &inode->i_mapping->i_metadata_bhs;
+}
+
static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
struct buffer_head *bh)
{
@@ -518,7 +532,7 @@ static void remove_assoc_queue(struct buffer_head *bh)
rcu_read_lock();
mapping = READ_ONCE(bh->b_assoc_map);
if (mapping) {
- mmb = &mapping->i_metadata_bhs;
+ mmb = inode_get_metadata_bhs(mapping->host);
spin_lock(&mmb->lock);
if (bh->b_assoc_map == mapping)
__remove_assoc_queue(mmb, bh);
@@ -557,7 +571,8 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct mapping_metadata_bhs *mmb = &mapping->i_metadata_bhs;
+ struct mapping_metadata_bhs *mmb =
+ inode_get_metadata_bhs(mapping->host);
struct buffer_head *bh;
int err = 0;
struct blk_plug plug;
@@ -721,15 +736,15 @@ void write_boundary_block(struct block_device *bdev,
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
- struct address_space *mapping = inode->i_mapping;
-
mark_buffer_dirty(bh);
if (!bh->b_assoc_map) {
- spin_lock(&mapping->i_metadata_bhs.lock);
- list_move_tail(&bh->b_assoc_buffers,
- &mapping->i_metadata_bhs.list);
- bh->b_assoc_map = mapping;
- spin_unlock(&mapping->i_metadata_bhs.lock);
+ struct mapping_metadata_bhs *mmb;
+
+ mmb = inode_get_metadata_bhs(inode);
+ spin_lock(&mmb->lock);
+ list_move_tail(&bh->b_assoc_buffers, &mmb->list);
+ bh->b_assoc_map = inode->i_mapping;
+ spin_unlock(&mmb->lock);
}
}
EXPORT_SYMBOL(mark_buffer_dirty_inode);
@@ -806,7 +821,7 @@ EXPORT_SYMBOL(block_dirty_folio);
void invalidate_inode_buffers(struct inode *inode)
{
if (inode_has_buffers(inode)) {
- struct mapping_metadata_bhs *mmb = &inode->i_data.i_metadata_bhs;
+ struct mapping_metadata_bhs *mmb = inode_get_metadata_bhs(inode);
spin_lock(&mmb->lock);
while (!list_empty(&mmb->list))
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 631bf971efc0..623ee66d41a8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -515,6 +515,7 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio);
void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
+void mmb_init(struct mapping_metadata_bhs *mmb);
int inode_has_buffers(struct inode *inode);
void invalidate_inode_buffers(struct inode *inode);
int sync_mapping_buffers(struct address_space *mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 64771a55adc5..b4d9be1fefa4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2046,6 +2046,7 @@ struct inode_operations {
struct dentry *dentry, struct file_kattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa);
struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
+ struct mapping_metadata_bhs *(*get_metadata_bhs)(struct inode *inode);
} ____cacheline_aligned;
/* Did the driver provide valid mmap hook configuration? */
--
2.51.0
* [PATCH 19/32] ntfs3: Drop pointless sync_mapping_buffers() call
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Konstantin Komarov, ntfs3
ntfs3 never calls mark_buffer_dirty_inode() and thus its metadata
buffers list is always empty. Drop the pointless sync_mapping_buffers()
call.
CC: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
CC: ntfs3@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
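To illustrate why the dropped call was a no-op, here is a minimal userspace
sketch (not kernel code). The names list_init(), list_is_empty(),
list_add_to_tail(), struct model_mmb, its writes_issued counter and
model_sync_buffers() are all made up for this example; locking and real I/O
are left out. It only models the early return on an empty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_to_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Simplified model of a metadata buffer list; the spinlock is omitted. */
struct model_mmb {
	struct list_head list;
	int writes_issued;	/* hypothetical counter standing in for I/O */
};

/*
 * Model of a sync over the list: bail out early when nothing is tracked,
 * otherwise detach every entry and count one "write" for it.
 */
static int model_sync_buffers(struct model_mmb *mmb)
{
	assert(mmb != NULL);

	if (list_is_empty(&mmb->list))
		return 0;

	while (!list_is_empty(&mmb->list)) {
		struct list_head *bh = mmb->list.next;

		bh->prev->next = bh->next;
		bh->next->prev = bh->prev;
		list_init(bh);
		mmb->writes_issued++;
	}
	return 0;
}
```

A filesystem that never adds anything to the list always takes the early
return, so in this model the sync call does no work for it.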
fs/ntfs3/file.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 7eecf1e01f74..570c92fa7ee7 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -387,9 +387,6 @@ static int ntfs_extend(struct inode *inode, loff_t pos, size_t count,
int err2;
err = filemap_fdatawrite_range(mapping, pos, end - 1);
- err2 = sync_mapping_buffers(mapping);
- if (!err)
- err = err2;
err2 = write_inode_now(inode, 1);
if (!err)
err = err2;
--
2.51.0
* [PATCH 20/32] ocfs2: Drop pointless sync_mapping_buffers() calls
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Joel Becker, Joseph Qi, ocfs2-devel
ocfs2 never calls mark_buffer_dirty_inode() and thus its metadata
buffers list is always empty. Drop the pointless sync_mapping_buffers()
calls.
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: ocfs2-devel@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ocfs2/dlmglue.c | 1 -
fs/ocfs2/namei.c | 3 ---
2 files changed, 4 deletions(-)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index bd2ddb7d841d..7283bb2c5a31 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3971,7 +3971,6 @@ static int ocfs2_data_convert_worker(struct ocfs2_lock_res *lockres,
mlog(ML_ERROR, "Could not sync inode %llu for downconvert!",
(unsigned long long)OCFS2_I(inode)->ip_blkno);
}
- sync_mapping_buffers(mapping);
if (blocking == DLM_LOCK_EX) {
truncate_inode_pages(mapping, 0);
} else {
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 268b79339a51..1277666c77cd 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -1683,9 +1683,6 @@ static int ocfs2_rename(struct mnt_idmap *idmap,
if (rename_lock)
ocfs2_rename_unlock(osb);
- if (new_inode)
- sync_mapping_buffers(old_inode->i_mapping);
-
iput(new_inode);
ocfs2_free_dir_lookup_result(&target_lookup_res);
--
2.51.0
* [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Jens Axboe, linux-block
Nobody calls mark_buffer_dirty_inode() with the internal bdev inode and
it doesn't make sense for the internal bdev inode to have any metadata
buffer heads. Just drop the pointless invalidate_inode_buffers() call.
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-block@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
block/bdev.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/bdev.c b/block/bdev.c
index ed022f8c48c7..ad1660b6b324 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -420,7 +420,6 @@ static void init_once(void *data)
static void bdev_evict_inode(struct inode *inode)
{
truncate_inode_pages_final(&inode->i_data);
- invalidate_inode_buffers(inode); /* is it needed here? */
clear_inode(inode);
}
--
2.51.0
* [PATCH 22/32] fs: Switch inode_has_buffers() to take mapping_metadata_bhs
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
inode_has_buffers() is trivial and is also used internally, so it is
pointless to look up the mapping_metadata_bhs on each invocation. Let
the function take a mapping_metadata_bhs struct instead and rename it
to mmb_has_buffers().
Signed-off-by: Jan Kara <jack@suse.cz>
---
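A rough userspace sketch of the calling convention this change leads to (not
kernel code). All model_* names, the nr_buffers field and the lookups counter
are made up for illustration, and the list and spinlock are reduced to a plain
integer. The caller fetches the struct once and passes it to the predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for mapping_metadata_bhs: just a buffer count. */
struct model_mmb {
	int nr_buffers;
};

struct model_inode {
	struct model_mmb mmb;
	int lookups;		/* hypothetical: counts lookups of the mmb */
};

/* Stand-in for the per-inode lookup helper. */
static struct model_mmb *model_get_metadata_bhs(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->lookups++;
	return &inode->mmb;
}

/* The predicate takes the struct directly, so it needs no lookup itself. */
static bool model_mmb_has_buffers(const struct model_mmb *mmb)
{
	return mmb->nr_buffers != 0;
}

/* One lookup up front, then the same pointer is reused for all work. */
static void model_invalidate_inode_buffers(struct model_inode *inode)
{
	struct model_mmb *mmb = model_get_metadata_bhs(inode);

	if (model_mmb_has_buffers(mmb))
		mmb->nr_buffers = 0;
}
```

In this model an invalidation costs exactly one lookup regardless of how
often the predicate is evaluated afterwards.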
fs/buffer.c | 14 +++++++-------
fs/ext4/inode.c | 2 +-
include/linux/buffer_head.h | 2 +-
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index d7a1d72302da..096a8d9e3280 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -468,7 +468,7 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* written back and waited upon before fsync() returns.
*
* The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
- * inode_has_buffers() and invalidate_inode_buffers() are provided for the
+ * mmb_has_buffers() and invalidate_inode_buffers() are provided for the
* management of a list of dependent buffers in mapping_metadata_bhs struct.
*
* The locking is a little subtle: The list of buffer heads is protected by
@@ -542,11 +542,11 @@ static void remove_assoc_queue(struct buffer_head *bh)
}
}
-int inode_has_buffers(struct inode *inode)
+bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
- return !list_empty(&inode->i_data.i_metadata_bhs.list);
+ return !list_empty(&mmb->list);
}
-EXPORT_SYMBOL_GPL(inode_has_buffers);
+EXPORT_SYMBOL_GPL(mmb_has_buffers);
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
@@ -578,7 +578,7 @@ int sync_mapping_buffers(struct address_space *mapping)
struct blk_plug plug;
LIST_HEAD(tmp);
- if (list_empty(&mmb->list))
+ if (!mmb_has_buffers(mmb))
return 0;
blk_start_plug(&plug);
@@ -820,9 +820,9 @@ EXPORT_SYMBOL(block_dirty_folio);
*/
void invalidate_inode_buffers(struct inode *inode)
{
- if (inode_has_buffers(inode)) {
- struct mapping_metadata_bhs *mmb = inode_get_metadata_bhs(inode);
+ struct mapping_metadata_bhs *mmb = inode_get_metadata_bhs(inode);
+ if (mmb_has_buffers(mmb)) {
spin_lock(&mmb->lock);
while (!list_empty(&mmb->list))
__remove_assoc_queue(mmb, BH_ENTRY(mmb->list.next));
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6f892abef003..011cb2eb16a2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3436,7 +3436,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
/* Any metadata buffers to write? */
- if (inode_has_buffers(inode))
+ if (mmb_has_buffers(&inode->i_mapping->i_metadata_bhs))
return true;
return inode_state_read_once(inode) & I_DIRTY_DATASYNC;
}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 623ee66d41a8..ebbd73c45e63 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -516,7 +516,7 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio);
void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
void mmb_init(struct mapping_metadata_bhs *mmb);
-int inode_has_buffers(struct inode *inode);
+bool mmb_has_buffers(struct mapping_metadata_bhs *mmb);
void invalidate_inode_buffers(struct inode *inode);
int sync_mapping_buffers(struct address_space *mapping);
void invalidate_bh_lrus(void);
--
2.51.0
* [PATCH 23/32] ext2: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
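Track metadata bhs for an inode in the fs-private part of the inode:
embed struct mapping_metadata_bhs in ext2_inode_info, initialize it in
ext2_alloc_inode(), and provide the ->get_metadata_bhs operation in the
ext2 inode_operations tables.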
Signed-off-by: Jan Kara <jack@suse.cz>
---
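The pattern used by this and the following per-filesystem patches can be
sketched in userspace as below (not kernel code). All model_* names are made
up for illustration. Returning NULL from the dispatcher when the hook is
missing is an assumption of this sketch only; it is not taken from the
series.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for mapping_metadata_bhs. */
struct model_mmb {
	int nr_buffers;
};

struct model_inode;

/* Stand-in for inode_operations with only the new hook. */
struct model_inode_operations {
	struct model_mmb *(*get_metadata_bhs)(struct model_inode *inode);
};

/* Generic inode: knows nothing about where the mmb lives. */
struct model_inode {
	const struct model_inode_operations *i_op;
};

/* Filesystem-private inode embedding both the mmb and the generic inode. */
struct model_fs_inode_info {
	struct model_mmb i_metadata_bhs;
	struct model_inode vfs_inode;
};

/* Userspace equivalent of the container_of() idiom. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The filesystem's hook: step from the generic inode to its private part. */
static struct model_mmb *model_fs_get_metadata_bhs(struct model_inode *inode)
{
	struct model_fs_inode_info *info;

	info = model_container_of(inode, struct model_fs_inode_info, vfs_inode);
	return &info->i_metadata_bhs;
}

static const struct model_inode_operations model_fs_iops = {
	.get_metadata_bhs = model_fs_get_metadata_bhs,
};

/* Generic dispatcher; the NULL fallback is an assumption of this sketch. */
static struct model_mmb *model_inode_get_metadata_bhs(struct model_inode *inode)
{
	assert(inode != NULL);
	if (!inode->i_op || !inode->i_op->get_metadata_bhs)
		return NULL;
	return inode->i_op->get_metadata_bhs(inode);
}
```

Generic code only ever sees the hook, so each filesystem is free to place the
struct wherever it likes in its private inode.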
fs/ext2/ext2.h | 2 ++
fs/ext2/file.c | 1 +
fs/ext2/namei.c | 2 ++
fs/ext2/super.c | 6 ++++++
fs/ext2/symlink.c | 2 ++
5 files changed, 13 insertions(+)
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 5e0c6c5fcb6c..2b6593ba107f 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -676,6 +676,7 @@ struct ext2_inode_info {
#ifdef CONFIG_QUOTA
struct dquot __rcu *i_dquot[MAXQUOTAS];
#endif
+ struct mapping_metadata_bhs i_metadata_bhs;
};
/*
@@ -766,6 +767,7 @@ void ext2_msg(struct super_block *, const char *, const char *, ...);
extern void ext2_update_dynamic_rev (struct super_block *sb);
extern void ext2_sync_super(struct super_block *sb, struct ext2_super_block *es,
int wait);
+struct mapping_metadata_bhs *ext2_get_metadata_bhs(struct inode *inode);
/*
* Inodes and files operations
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index ebe356a38b18..2dbf3e7c2e9c 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -338,4 +338,5 @@ const struct inode_operations ext2_file_inode_operations = {
.fiemap = ext2_fiemap,
.fileattr_get = ext2_fileattr_get,
.fileattr_set = ext2_fileattr_set,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index bde617a66cec..70c94adce837 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -422,6 +422,7 @@ const struct inode_operations ext2_dir_inode_operations = {
.tmpfile = ext2_tmpfile,
.fileattr_get = ext2_fileattr_get,
.fileattr_set = ext2_fileattr_set,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
const struct inode_operations ext2_special_inode_operations = {
@@ -430,4 +431,5 @@ const struct inode_operations ext2_special_inode_operations = {
.setattr = ext2_setattr,
.get_inode_acl = ext2_get_acl,
.set_acl = ext2_set_acl,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 603f2641fe10..503c25cae27c 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -215,6 +215,7 @@ static struct inode *ext2_alloc_inode(struct super_block *sb)
#ifdef CONFIG_QUOTA
memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
#endif
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -259,6 +260,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(ext2_inode_cachep);
}
+struct mapping_metadata_bhs *ext2_get_metadata_bhs(struct inode *inode)
+{
+ return &EXT2_I(inode)->i_metadata_bhs;
+}
+
static int ext2_show_options(struct seq_file *seq, struct dentry *root)
{
struct super_block *sb = root->d_sb;
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 948d3a441403..c82a15d28772 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -26,6 +26,7 @@ const struct inode_operations ext2_symlink_inode_operations = {
.getattr = ext2_getattr,
.setattr = ext2_setattr,
.listxattr = ext2_listxattr,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
const struct inode_operations ext2_fast_symlink_inode_operations = {
@@ -33,4 +34,5 @@ const struct inode_operations ext2_fast_symlink_inode_operations = {
.getattr = ext2_getattr,
.setattr = ext2_setattr,
.listxattr = ext2_listxattr,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
--
2.51.0
* [PATCH 24/32] affs: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
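Track metadata bhs for an inode in the fs-private part of the inode:
embed struct mapping_metadata_bhs in affs_inode_info, initialize it in
affs_alloc_inode(), and provide the ->get_metadata_bhs operation in the
affs inode_operations tables.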
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/affs/affs.h | 2 ++
fs/affs/dir.c | 1 +
fs/affs/file.c | 1 +
fs/affs/super.c | 6 ++++++
fs/affs/symlink.c | 1 +
5 files changed, 11 insertions(+)
diff --git a/fs/affs/affs.h b/fs/affs/affs.h
index ac4e9a02910b..a1eb400e1018 100644
--- a/fs/affs/affs.h
+++ b/fs/affs/affs.h
@@ -44,6 +44,7 @@ struct affs_inode_info {
struct mutex i_link_lock; /* Protects internal inode access. */
struct mutex i_ext_lock; /* Protects internal inode access. */
#define i_hash_lock i_ext_lock
+ struct mapping_metadata_bhs i_metadata_bhs;
u32 i_blkcnt; /* block count */
u32 i_extcnt; /* extended block count */
u32 *i_lc; /* linear cache of extended blocks */
@@ -151,6 +152,7 @@ extern bool affs_nofilenametruncate(const struct dentry *dentry);
extern int affs_check_name(const unsigned char *name, int len,
bool notruncate);
extern int affs_copy_name(unsigned char *bstr, struct dentry *dentry);
+struct mapping_metadata_bhs *affs_get_metadata_bhs(struct inode *inode);
/* bitmap. c */
diff --git a/fs/affs/dir.c b/fs/affs/dir.c
index 5c8d83387a39..6b0314c84972 100644
--- a/fs/affs/dir.c
+++ b/fs/affs/dir.c
@@ -72,6 +72,7 @@ const struct inode_operations affs_dir_inode_operations = {
.rmdir = affs_rmdir,
.rename = affs_rename2,
.setattr = affs_notify_change,
+ .get_metadata_bhs = affs_get_metadata_bhs,
};
static int
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 6c9258359ddb..4dbd9351eea0 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -1014,4 +1014,5 @@ const struct file_operations affs_file_operations = {
const struct inode_operations affs_file_inode_operations = {
.setattr = affs_notify_change,
+ .get_metadata_bhs = affs_get_metadata_bhs,
};
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 8451647f3fea..dff272df0636 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -108,6 +108,7 @@ static struct inode *affs_alloc_inode(struct super_block *sb)
i->i_lc = NULL;
i->i_ext_bh = NULL;
i->i_pa_cnt = 0;
+ mmb_init(&i->i_metadata_bhs);
return &i->vfs_inode;
}
@@ -147,6 +148,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(affs_inode_cachep);
}
+struct mapping_metadata_bhs *affs_get_metadata_bhs(struct inode *inode)
+{
+ return &AFFS_I(inode)->i_metadata_bhs;
+}
+
static const struct super_operations affs_sops = {
.alloc_inode = affs_alloc_inode,
.free_inode = affs_free_inode,
diff --git a/fs/affs/symlink.c b/fs/affs/symlink.c
index 094aec8d17b8..68fa091bd377 100644
--- a/fs/affs/symlink.c
+++ b/fs/affs/symlink.c
@@ -72,4 +72,5 @@ const struct address_space_operations affs_symlink_aops = {
const struct inode_operations affs_symlink_inode_operations = {
.get_link = page_get_link,
.setattr = affs_notify_change,
+ .get_metadata_bhs = affs_get_metadata_bhs,
};
--
2.51.0
* [PATCH 25/32] bfs: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
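Track metadata bhs for an inode in the fs-private part of the inode:
embed struct mapping_metadata_bhs in bfs_inode_info, initialize it in
bfs_alloc_inode(), and provide the ->get_metadata_bhs operation in
bfs_dir_inops and in the previously empty bfs_file_inops.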
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/bfs/bfs.h | 2 ++
fs/bfs/dir.c | 1 +
fs/bfs/file.c | 4 +++-
fs/bfs/inode.c | 7 +++++++
4 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/bfs/bfs.h b/fs/bfs/bfs.h
index 606f9378b2f0..5fadb6e860f1 100644
--- a/fs/bfs/bfs.h
+++ b/fs/bfs/bfs.h
@@ -35,6 +35,7 @@ struct bfs_inode_info {
unsigned long i_dsk_ino; /* inode number from the disk, can be 0 */
unsigned long i_sblock;
unsigned long i_eblock;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct inode vfs_inode;
};
@@ -55,6 +56,7 @@ static inline struct bfs_inode_info *BFS_I(struct inode *inode)
/* inode.c */
extern struct inode *bfs_iget(struct super_block *sb, unsigned long ino);
extern void bfs_dump_imap(const char *, struct super_block *);
+struct mapping_metadata_bhs *bfs_get_metadata_bhs(struct inode *inode);
/* file.c */
extern const struct inode_operations bfs_file_inops;
diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index c375e22c4c0c..30529f476582 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -262,6 +262,7 @@ const struct inode_operations bfs_dir_inops = {
.link = bfs_link,
.unlink = bfs_unlink,
.rename = bfs_rename,
+ .get_metadata_bhs = bfs_get_metadata_bhs,
};
static int bfs_add_entry(struct inode *dir, const struct qstr *child, int ino)
diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index d33d6bde992b..335ab07e37fe 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -200,4 +200,6 @@ const struct address_space_operations bfs_aops = {
.bmap = bfs_bmap,
};
-const struct inode_operations bfs_file_inops;
+const struct inode_operations bfs_file_inops = {
+ .get_metadata_bhs = bfs_get_metadata_bhs,
+};
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index e0e50a9dbe9c..f1a392394a23 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -259,6 +259,8 @@ static struct inode *bfs_alloc_inode(struct super_block *sb)
bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
+ mmb_init(&bi->i_metadata_bhs);
+
return &bi->vfs_inode;
}
@@ -296,6 +298,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(bfs_inode_cachep);
}
+struct mapping_metadata_bhs *bfs_get_metadata_bhs(struct inode *inode)
+{
+ return &BFS_I(inode)->i_metadata_bhs;
+}
+
static const struct super_operations bfs_sops = {
.alloc_inode = bfs_alloc_inode,
.free_inode = bfs_free_inode,
--
2.51.0
* [PATCH 26/32] fat: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
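Track metadata bhs for an inode in the fs-private part of the inode:
embed struct mapping_metadata_bhs in msdos_inode_info, initialize it in
fat_alloc_inode(), and provide the ->get_metadata_bhs operation in the
fat, msdos and vfat inode_operations tables. The internal FAT inode
created in fat_fill_super() gets a dedicated fat_table_inode_operations
so that the operation is available for it as well.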
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fat/fat.h | 2 ++
fs/fat/file.c | 1 +
fs/fat/inode.c | 12 ++++++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
5 files changed, 17 insertions(+)
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 0d269dba897b..2b2f6ad32f24 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -130,6 +130,7 @@ struct msdos_inode_info {
struct hlist_node i_dir_hash; /* hash by i_logstart */
struct rw_semaphore truncate_lock; /* protect bmap against truncate */
struct timespec64 i_crtime; /* File creation (birth) time */
+ struct mapping_metadata_bhs i_metadata_bhs;
struct inode vfs_inode;
};
@@ -424,6 +425,7 @@ extern int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de);
extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
struct inode *i2);
+struct mapping_metadata_bhs *fat_get_metadata_bhs(struct inode *inode);
extern const struct fs_parameter_spec fat_param_spec[];
int fat_init_fs_context(struct fs_context *fc, bool is_vfat);
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 124d9c5431c8..da21636d3874 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -574,4 +574,5 @@ const struct inode_operations fat_file_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
+ .get_metadata_bhs = fat_get_metadata_bhs,
};
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index ce88602b0d57..8561b8be5ca2 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -763,6 +763,7 @@ static struct inode *fat_alloc_inode(struct super_block *sb)
ei->i_pos = 0;
ei->i_crtime.tv_sec = 0;
ei->i_crtime.tv_nsec = 0;
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -807,6 +808,12 @@ static void __exit fat_destroy_inodecache(void)
kmem_cache_destroy(fat_inode_cachep);
}
+struct mapping_metadata_bhs *fat_get_metadata_bhs(struct inode *inode)
+{
+ return &MSDOS_I(inode)->i_metadata_bhs;
+}
+EXPORT_SYMBOL_GPL(fat_get_metadata_bhs);
+
int fat_reconfigure(struct fs_context *fc)
{
bool new_rdonly;
@@ -1531,6 +1538,10 @@ static int fat_read_static_bpb(struct super_block *sb,
return error;
}
+static const struct inode_operations fat_table_inode_operations = {
+ .get_metadata_bhs = fat_get_metadata_bhs,
+};
+
/*
* Read the super block of an MS-DOS FS.
*/
@@ -1806,6 +1817,7 @@ int fat_fill_super(struct super_block *sb, struct fs_context *fc,
fat_inode = new_inode(sb);
if (!fat_inode)
goto out_fail;
+ fat_inode->i_op = &fat_table_inode_operations;
sbi->fat_inode = fat_inode;
fsinfo_inode = new_inode(sb);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 048c103b506a..1526b8910d51 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -643,6 +643,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
+ .get_metadata_bhs = fat_get_metadata_bhs,
};
static void setup(struct super_block *sb)
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 87dcdd86272b..ca5e0e9822a6 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1186,6 +1186,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
+ .get_metadata_bhs = fat_get_metadata_bhs,
};
static void setup(struct super_block *sb)
--
2.51.0
* [PATCH 27/32] udf: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
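Track metadata bhs for an inode in the fs-private part of the inode:
embed struct mapping_metadata_bhs in udf_inode_info, initialize it in
udf_alloc_inode(), and provide the ->get_metadata_bhs operation in the
udf inode_operations tables.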
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/udf/file.c | 1 +
fs/udf/namei.c | 1 +
fs/udf/super.c | 6 ++++++
fs/udf/symlink.c | 1 +
fs/udf/udf_i.h | 1 +
fs/udf/udfdecl.h | 1 +
6 files changed, 11 insertions(+)
diff --git a/fs/udf/file.c b/fs/udf/file.c
index 32ae7cfd72c5..8d51313173f3 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -251,4 +251,5 @@ static int udf_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
const struct inode_operations udf_file_inode_operations = {
.setattr = udf_setattr,
+ .get_metadata_bhs = udf_get_metadata_bhs,
};
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 5f2e9a892bff..ef9eadb96f4e 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1025,4 +1025,5 @@ const struct inode_operations udf_dir_inode_operations = {
.mknod = udf_mknod,
.rename = udf_rename,
.tmpfile = udf_tmpfile,
+ .get_metadata_bhs = udf_get_metadata_bhs,
};
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 27f463fd1d89..eb62972c9fda 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -166,6 +166,7 @@ static struct inode *udf_alloc_inode(struct super_block *sb)
ei->cached_extent.lstart = -1;
spin_lock_init(&ei->i_extent_cache_lock);
inode_set_iversion(&ei->vfs_inode, 1);
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -205,6 +206,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(udf_inode_cachep);
}
+struct mapping_metadata_bhs *udf_get_metadata_bhs(struct inode *inode)
+{
+ return &UDF_I(inode)->i_metadata_bhs;
+}
+
/* Superblock operations */
static const struct super_operations udf_sb_ops = {
.alloc_inode = udf_alloc_inode,
diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
index fe03745d09b1..56c860a10b91 100644
--- a/fs/udf/symlink.c
+++ b/fs/udf/symlink.c
@@ -168,4 +168,5 @@ const struct address_space_operations udf_symlink_aops = {
const struct inode_operations udf_symlink_inode_operations = {
.get_link = page_get_link,
.getattr = udf_symlink_getattr,
+ .get_metadata_bhs = udf_get_metadata_bhs,
};
diff --git a/fs/udf/udf_i.h b/fs/udf/udf_i.h
index 312b7c9ef10e..fdaa88c49c2b 100644
--- a/fs/udf/udf_i.h
+++ b/fs/udf/udf_i.h
@@ -50,6 +50,7 @@ struct udf_inode_info {
struct kernel_lb_addr i_locStreamdir;
__u64 i_lenStreams;
struct rw_semaphore i_data_sem;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct udf_ext_cache cached_extent;
/* Spinlock for protecting extent cache */
spinlock_t i_extent_cache_lock;
diff --git a/fs/udf/udfdecl.h b/fs/udf/udfdecl.h
index d159f20d61e8..db2b92217bf5 100644
--- a/fs/udf/udfdecl.h
+++ b/fs/udf/udfdecl.h
@@ -126,6 +126,7 @@ static inline void udf_updated_lvid(struct super_block *sb)
extern u64 lvid_get_unique_id(struct super_block *sb);
struct inode *udf_find_metadata_inode_efe(struct super_block *sb,
u32 meta_file_loc, u32 partition_num);
+struct mapping_metadata_bhs *udf_get_metadata_bhs(struct inode *inode);
/* namei.c */
static inline unsigned int udf_dir_entry_len(struct fileIdentDesc *cfi)
--
2.51.0
* [PATCH 28/32] minix: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
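Track metadata bhs for an inode in the fs-private part of the inode:
embed struct mapping_metadata_bhs in minix_inode_info, initialize it in
minix_alloc_inode(), and provide the ->get_metadata_bhs operation in the
minix inode_operations tables.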
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/minix/file.c | 1 +
fs/minix/inode.c | 8 ++++++++
fs/minix/minix.h | 2 ++
fs/minix/namei.c | 1 +
4 files changed, 12 insertions(+)
diff --git a/fs/minix/file.c b/fs/minix/file.c
index dca7ac71f049..b3abe380634a 100644
--- a/fs/minix/file.c
+++ b/fs/minix/file.c
@@ -50,4 +50,5 @@ static int minix_setattr(struct mnt_idmap *idmap,
const struct inode_operations minix_file_inode_operations = {
.setattr = minix_setattr,
.getattr = minix_getattr,
+ .get_metadata_bhs = minix_get_metadata_bhs,
};
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index ab7c06efb139..20abbe21a632 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -85,6 +85,8 @@ static struct inode *minix_alloc_inode(struct super_block *sb)
ei = alloc_inode_sb(sb, minix_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
+ mmb_init(&ei->i_metadata_bhs);
+
return &ei->vfs_inode;
}
@@ -122,6 +124,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(minix_inode_cachep);
}
+struct mapping_metadata_bhs *minix_get_metadata_bhs(struct inode *inode)
+{
+ return &minix_i(inode)->i_metadata_bhs;
+}
+
static const struct super_operations minix_sops = {
.alloc_inode = minix_alloc_inode,
.free_inode = minix_free_in_core_inode,
@@ -502,6 +509,7 @@ static const struct address_space_operations minix_aops = {
static const struct inode_operations minix_symlink_inode_operations = {
.get_link = page_get_link,
.getattr = minix_getattr,
+ .get_metadata_bhs = minix_get_metadata_bhs,
};
void minix_set_inode(struct inode *inode, dev_t rdev)
diff --git a/fs/minix/minix.h b/fs/minix/minix.h
index 7e1f652f16d3..38981a30ac99 100644
--- a/fs/minix/minix.h
+++ b/fs/minix/minix.h
@@ -19,6 +19,7 @@ struct minix_inode_info {
__u16 i1_data[16];
__u32 i2_data[16];
} u;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct inode vfs_inode;
};
@@ -57,6 +58,7 @@ unsigned long minix_count_free_blocks(struct super_block *sb);
int minix_getattr(struct mnt_idmap *, const struct path *,
struct kstat *, u32, unsigned int);
int minix_prepare_chunk(struct folio *folio, loff_t pos, unsigned len);
+struct mapping_metadata_bhs *minix_get_metadata_bhs(struct inode *inode);
extern void V1_minix_truncate(struct inode *);
extern void V2_minix_truncate(struct inode *);
diff --git a/fs/minix/namei.c b/fs/minix/namei.c
index 263e4ba8b1c8..e31e84a677eb 100644
--- a/fs/minix/namei.c
+++ b/fs/minix/namei.c
@@ -288,4 +288,5 @@ const struct inode_operations minix_dir_inode_operations = {
.rename = minix_rename,
.getattr = minix_getattr,
.tmpfile = minix_tmpfile,
+ .get_metadata_bhs = minix_get_metadata_bhs,
};
--
2.51.0
* [PATCH 29/32] ext4: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode. We
need the tracking only in nojournal mode, so this is somewhat wasteful.
We could relatively easily make the mapping_metadata_bhs struct
dynamically allocated, similarly to how we treat jbd2_inode, but let's
leave that for an ext4-specific series once the dust settles a bit.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 4 +++-
fs/ext4/file.c | 1 +
fs/ext4/inode.c | 2 +-
fs/ext4/namei.c | 2 ++
fs/ext4/super.c | 6 ++++++
fs/ext4/symlink.c | 3 +++
6 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 293f698b7042..a829e5da67af 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1121,6 +1121,7 @@ struct ext4_inode_info {
struct rw_semaphore i_data_sem;
struct inode vfs_inode;
struct jbd2_inode *jinode;
+ struct mapping_metadata_bhs i_metadata_bhs;
/*
* File creation time. Its function is same as that of
@@ -3203,8 +3204,9 @@ extern void ext4_mark_group_bitmap_corrupted(struct super_block *sb,
unsigned int flags);
extern unsigned int ext4_num_base_meta_blocks(struct super_block *sb,
ext4_group_t block_group);
-extern void print_daily_error_info(struct timer_list *t);
+struct mapping_metadata_bhs *ext4_get_metadata_bhs(struct inode *inode);
+extern void print_daily_error_info(struct timer_list *t);
extern __printf(7, 8)
void __ext4_error(struct super_block *, const char *, unsigned int, bool,
int, __u64, const char *, ...);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index f1dc5ce791a7..3d433f50524b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -987,5 +987,6 @@ const struct inode_operations ext4_file_inode_operations = {
.fiemap = ext4_fiemap,
.fileattr_get = ext4_fileattr_get,
.fileattr_set = ext4_fileattr_set,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 011cb2eb16a2..eead6c5c2366 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3436,7 +3436,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
/* Any metadata buffers to write? */
- if (mmb_has_buffers(&inode->i_mapping->i_metadata_bhs))
+ if (mmb_has_buffers(&EXT4_I(inode)->i_metadata_bhs))
return true;
return inode_state_read_once(inode) & I_DIRTY_DATASYNC;
}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c4b5e252af0e..4d2cae140b71 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -4228,6 +4228,7 @@ const struct inode_operations ext4_dir_inode_operations = {
.fiemap = ext4_fiemap,
.fileattr_get = ext4_fileattr_get,
.fileattr_set = ext4_fileattr_set,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
const struct inode_operations ext4_special_inode_operations = {
@@ -4236,4 +4237,5 @@ const struct inode_operations ext4_special_inode_operations = {
.listxattr = ext4_listxattr,
.get_inode_acl = ext4_get_acl,
.set_acl = ext4_set_acl,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ea827b0ecc8d..4b9eb86b03e2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1428,6 +1428,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
ext4_fc_init_inode(&ei->vfs_inode);
spin_lock_init(&ei->i_fc_lock);
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -1521,6 +1522,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(ext4_inode_cachep);
}
+struct mapping_metadata_bhs *ext4_get_metadata_bhs(struct inode *inode)
+{
+ return &EXT4_I(inode)->i_metadata_bhs;
+}
+
void ext4_clear_inode(struct inode *inode)
{
ext4_fc_del(inode);
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 645240cc0229..53ec8daf4cae 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -119,6 +119,7 @@ const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
const struct inode_operations ext4_symlink_inode_operations = {
@@ -126,6 +127,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_getattr,
.listxattr = ext4_listxattr,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
const struct inode_operations ext4_fast_symlink_inode_operations = {
@@ -133,4 +135,5 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_getattr,
.listxattr = ext4_listxattr,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
--
2.51.0
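For readers following along, the embedding pattern used in this patch can be
modelled in userspace roughly as follows. This is only a sketch under stated
assumptions: struct demo_inode_info, DEMO_I(), demo_get_metadata_bhs() and
demo_selfcheck() are hypothetical names, and the structures are cut-down
stand-ins rather than the kernel definitions (the real mapping_metadata_bhs
also carries a lock).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the kernel structures have many more fields. */
struct list_head { struct list_head *next, *prev; };
struct mapping_metadata_bhs { struct list_head list; };
struct inode { unsigned long i_ino; };

/* Hypothetical fs-private inode, shaped like ext4_inode_info in the patch. */
struct demo_inode_info {
	struct mapping_metadata_bhs i_metadata_bhs;
	struct inode vfs_inode;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct demo_inode_info *DEMO_I(struct inode *inode)
{
	return container_of(inode, struct demo_inode_info, vfs_inode);
}

/* Counterpart of the .get_metadata_bhs callback wired up in the patch. */
static struct mapping_metadata_bhs *demo_get_metadata_bhs(struct inode *inode)
{
	return &DEMO_I(inode)->i_metadata_bhs;
}

/* Returns 1 when the callback hands back the embedded tracking struct. */
int demo_selfcheck(void)
{
	struct demo_inode_info di = { .vfs_inode = { .i_ino = 12 } };

	assert(DEMO_I(&di.vfs_inode) == &di);
	return demo_get_metadata_bhs(&di.vfs_inode) == &di.i_metadata_bhs;
}
```

Generic code only ever sees the struct inode pointer, so the tracking
structure costs nothing for filesystems that do not embed it.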
* [PATCH 30/32] vfs: Drop mapping_metadata_bhs from address space
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (28 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 29/32] ext4: " Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 31/32] kvm: Use private inode list instead of i_private_list Jan Kara
` (3 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Nobody uses mapping_metadata_bhs in struct address_space anymore. Just
remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 16 ++++++++++------
fs/inode.c | 2 --
include/linux/fs.h | 1 -
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 096a8d9e3280..02176e0acfe1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -501,9 +501,13 @@ EXPORT_SYMBOL(mmb_init);
static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
{
+ /*
+ * We can get called for various half-initialized or bad inodes so
+ * verify .get_metadata_bhs callback exists.
+ */
if (inode->i_op->get_metadata_bhs)
return inode->i_op->get_metadata_bhs(inode);
- return &inode->i_mapping->i_metadata_bhs;
+ return NULL;
}
static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
@@ -544,7 +548,7 @@ static void remove_assoc_queue(struct buffer_head *bh)
bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
- return !list_empty(&mmb->list);
+ return mmb && !list_empty(&mmb->list);
}
EXPORT_SYMBOL_GPL(mmb_has_buffers);
@@ -552,10 +556,10 @@ EXPORT_SYMBOL_GPL(mmb_has_buffers);
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
*
- * Starts I/O against the buffers at mapping->i_metadata_bhs and waits upon
- * that I/O. Basically, this is a convenience function for fsync(). @mapping
- * is a file or directory which needs those buffers to be written for a
- * successful fsync().
+ * Starts I/O against the buffers tracked in mapping_metadata_bhs for the
+ * mapping and waits upon that I/O. Basically, this is a convenience function
+ * for fsync(). @mapping is a file or directory which needs those buffers to
+ * be written for a successful fsync().
*
* We have conflicting pressures: we want to make sure that all
* initially dirty buffers get waited on, but that any subsequently
diff --git a/fs/inode.c b/fs/inode.c
index 393f586d050a..d5774e627a9c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -483,8 +483,6 @@ static void __address_space_init_once(struct address_space *mapping)
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->i_private_list);
spin_lock_init(&mapping->i_private_lock);
- spin_lock_init(&mapping->i_metadata_bhs.lock);
- INIT_LIST_HEAD(&mapping->i_metadata_bhs.list);
mapping->i_mmap = RB_ROOT_CACHED;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4d9be1fefa4..1611d8ce4b66 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -490,7 +490,6 @@ struct address_space {
errseq_t wb_err;
spinlock_t i_private_lock;
struct list_head i_private_list;
- struct mapping_metadata_bhs i_metadata_bhs;
struct rw_semaphore i_mmap_rwsem;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
--
2.51.0
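The NULL-tolerant lookup introduced by this patch can be illustrated with a
small userspace model. This is a sketch, not the kernel code: the list helpers
are simplified re-implementations, the mmb member placed directly in struct
inode is a hypothetical stand-in for fs-private storage, and
demo_get_metadata_bhs() and demo_null_safe() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

struct mapping_metadata_bhs { struct list_head list; }; /* lock omitted */
struct inode;
struct inode_operations {
	struct mapping_metadata_bhs *(*get_metadata_bhs)(struct inode *inode);
};
struct inode {
	const struct inode_operations *i_op;
	struct mapping_metadata_bhs mmb; /* hypothetical fs-private storage */
};

static struct mapping_metadata_bhs *demo_get_metadata_bhs(struct inode *inode)
{
	return &inode->mmb;
}

/* No callback means no tracking: return NULL instead of a fallback. */
static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
{
	if (inode->i_op->get_metadata_bhs)
		return inode->i_op->get_metadata_bhs(inode);
	return NULL;
}

/* Callers may pass the NULL result straight through. */
static bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
	return mmb && !list_empty(&mmb->list);
}

/* Returns 1 once a buffer is queued on the inode that has the callback. */
int demo_null_safe(void)
{
	static const struct inode_operations no_ops = { NULL };
	static const struct inode_operations fs_ops = { demo_get_metadata_bhs };
	struct inode bad = { .i_op = &no_ops };
	struct inode good = { .i_op = &fs_ops };
	struct list_head bh_link;

	init_list_head(&good.mmb.list);
	assert(!mmb_has_buffers(inode_get_metadata_bhs(&bad)));  /* NULL */
	assert(!mmb_has_buffers(inode_get_metadata_bhs(&good))); /* empty */
	list_add(&bh_link, &good.mmb.list);
	return mmb_has_buffers(inode_get_metadata_bhs(&good));
}
```

Note that only mmb_has_buffers() is shown tolerating NULL here; any other
consumer of inode_get_metadata_bhs() would need its own check.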
* [PATCH 31/32] kvm: Use private inode list instead of i_private_list
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (29 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 30/32] vfs: Drop mapping_metadata_bhs from address space Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 32/32] fs: Drop i_private_list from address_space Jan Kara
` (2 subsequent siblings)
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of using mapping->i_private_list, use a list in the private part of
the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
virt/kvm/guest_memfd.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 017d84a7adf3..6d36a7827870 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -30,6 +30,7 @@ struct gmem_file {
struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;
+ struct list_head gem_file_list;
u64 flags;
};
@@ -39,8 +40,8 @@ static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
return container_of(inode, struct gmem_inode, vfs_inode);
}
-#define kvm_gmem_for_each_file(f, mapping) \
- list_for_each_entry(f, &(mapping)->i_private_list, entry)
+#define kvm_gmem_for_each_file(f, inode) \
+ list_for_each_entry(f, &GMEM_I(inode)->gem_file_list, entry)
/**
* folio_file_pfn - like folio_file_page, but return a pfn.
@@ -202,7 +203,7 @@ static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
attr_filter = kvm_gmem_get_invalidate_filter(inode);
- kvm_gmem_for_each_file(f, inode->i_mapping)
+ kvm_gmem_for_each_file(f, inode)
__kvm_gmem_invalidate_begin(f, start, end, attr_filter);
}
@@ -223,7 +224,7 @@ static void kvm_gmem_invalidate_end(struct inode *inode, pgoff_t start,
{
struct gmem_file *f;
- kvm_gmem_for_each_file(f, inode->i_mapping)
+ kvm_gmem_for_each_file(f, inode)
__kvm_gmem_invalidate_end(f, start, end);
}
@@ -609,7 +610,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
kvm_get_kvm(kvm);
f->kvm = kvm;
xa_init(&f->bindings);
- list_add(&f->entry, &inode->i_mapping->i_private_list);
+ list_add(&f->entry, &GMEM_I(inode)->gem_file_list);
fd_install(fd, file);
return fd;
@@ -945,6 +946,7 @@ static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)
mpol_shared_policy_init(&gi->policy, NULL);
gi->flags = 0;
+ INIT_LIST_HEAD(&gi->gem_file_list);
return &gi->vfs_inode;
}
--
2.51.0
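The list move in this patch follows the usual intrusive-list idiom. A minimal
userspace sketch, assuming simplified list helpers and hypothetical names
(struct demo_gmem_inode, struct demo_gmem_file, DEMO_GMEM_I(), file_list), and
with the iteration written out by hand rather than via list_for_each_entry():

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

struct inode { unsigned long i_ino; };

/* Hypothetical private inode: the list head lives here, not in the mapping. */
struct demo_gmem_inode {
	struct inode vfs_inode;
	struct list_head file_list;
};

struct demo_gmem_file {
	int id;
	struct list_head entry;
};

static struct demo_gmem_inode *DEMO_GMEM_I(struct inode *inode)
{
	return container_of(inode, struct demo_gmem_inode, vfs_inode);
}

static void demo_add_file(struct inode *inode, struct demo_gmem_file *f)
{
	list_add(&f->entry, &DEMO_GMEM_I(inode)->file_list);
}

/* Walk every file attached to the inode and sum the ids. */
static int demo_sum_ids(struct inode *inode)
{
	struct list_head *head = &DEMO_GMEM_I(inode)->file_list;
	struct list_head *pos;
	int sum = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		sum += container_of(pos, struct demo_gmem_file, entry)->id;
	return sum;
}

/* Attaches two files with ids 1 and 2 and returns the sum of the ids. */
int demo_gmem_selfcheck(void)
{
	struct demo_gmem_inode gi = { .vfs_inode = { .i_ino = 1 } };
	struct demo_gmem_file a = { .id = 1 }, b = { .id = 2 };

	init_list_head(&gi.file_list);
	demo_add_file(&gi.vfs_inode, &a);
	demo_add_file(&gi.vfs_inode, &b);
	return demo_sum_ids(&gi.vfs_inode);
}
```

As in the patch, the list head must be initialised when the private inode is
allocated, since nothing in the generic inode setup touches it any more.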
* [PATCH 32/32] fs: Drop i_private_list from address_space
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (30 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 31/32] kvm: Use private inode list instead of i_private_list Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 23:35 ` [syzbot ci] Re: fs: Move metadata bh tracking " syzbot ci
2026-03-04 12:32 ` [PATCH 0/32] " Christian Brauner
33 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Nobody is using i_private_list anymore. Remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/inode.c | 2 --
include/linux/fs.h | 2 --
2 files changed, 4 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index d5774e627a9c..a8f019078fab 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -481,7 +481,6 @@ static void __address_space_init_once(struct address_space *mapping)
{
xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
init_rwsem(&mapping->i_mmap_rwsem);
- INIT_LIST_HEAD(&mapping->i_private_list);
spin_lock_init(&mapping->i_private_lock);
mapping->i_mmap = RB_ROOT_CACHED;
}
@@ -795,7 +794,6 @@ void clear_inode(struct inode *inode)
* nor even WARN_ON(!mapping_empty).
*/
xa_unlock_irq(&inode->i_data.i_pages);
- BUG_ON(!list_empty(&inode->i_data.i_private_list));
BUG_ON(!(inode_state_read_once(inode) & I_FREEING));
BUG_ON(inode_state_read_once(inode) & I_CLEAR);
BUG_ON(!list_empty(&inode->i_wb_list));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1611d8ce4b66..adad21e31cfc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -470,7 +470,6 @@ struct mapping_metadata_bhs {
* @flags: Error bits and flags (AS_*).
* @wb_err: The most recent error which has occurred.
* @i_private_lock: For use by the owner of the address_space.
- * @i_private_list: For use by the owner of the address_space.
*/
struct address_space {
struct inode *host;
@@ -489,7 +488,6 @@ struct address_space {
unsigned long flags;
errseq_t wb_err;
spinlock_t i_private_lock;
- struct list_head i_private_list;
struct rw_semaphore i_mmap_rwsem;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
--
2.51.0
* Re: [PATCH 11/32] gfs2: Don't zero i_private_data
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
@ 2026-03-03 12:32 ` Andreas Gruenbacher
2026-03-04 10:39 ` Jan Kara
0 siblings, 1 reply; 42+ messages in thread
From: Andreas Gruenbacher @ 2026-03-03 12:32 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, gfs2
Jan,
On Tue, Mar 3, 2026 at 11:34 AM Jan Kara <jack@suse.cz> wrote:
> The zeroing is the only use within gfs2 so it is pointless.
"Remove the explicit zeroing of mapping->i_private_data since this
field is no longer used."
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Thanks,
Andreas
* Re: [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
2026-03-03 10:34 ` [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call Jan Kara
@ 2026-03-03 14:03 ` Christoph Hellwig
2026-03-04 10:30 ` Jan Kara
2026-03-03 14:09 ` Christoph Hellwig
1 sibling, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2026-03-03 14:03 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jens Axboe, linux-block
> diff --git a/block/bdev.c b/block/bdev.c
> index ed022f8c48c7..ad1660b6b324 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -420,7 +420,6 @@ static void init_once(void *data)
> static void bdev_evict_inode(struct inode *inode)
> {
> truncate_inode_pages_final(&inode->i_data);
> - invalidate_inode_buffers(inode); /* is it needed here? */
> clear_inode(inode);
> }
With this, bdev_evict_inode can go away as it is equivalent to the
default action when no ->evict_inode is provided.
* Re: [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
2026-03-03 10:34 ` [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call Jan Kara
2026-03-03 14:03 ` Christoph Hellwig
@ 2026-03-03 14:09 ` Christoph Hellwig
2026-03-04 10:36 ` Jan Kara
1 sibling, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2026-03-03 14:09 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jens Axboe, linux-block
FYI, linux-block only got this patch, which is totally messed up.
Please always send all patches to every list and person, otherwise
you fill people's inboxes with unreviewable junk.
* [syzbot ci] Re: fs: Move metadata bh tracking from address_space
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (31 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 32/32] fs: Drop i_private_list from address_space Jan Kara
@ 2026-03-03 23:35 ` syzbot ci
2026-03-04 12:32 ` [PATCH 0/32] " Christian Brauner
33 siblings, 0 replies; 42+ messages in thread
From: syzbot ci @ 2026-03-03 23:35 UTC (permalink / raw)
To: agruenba, aivazian.tigran, almaz.alexandrovich, axboe, bcrl,
brauner, david, dsterba, gfs2, hirofumi, jack, jlbec, joseph.qi,
linux-aio, linux-block, linux-ext4, linux-fsdevel, linux-mm,
muchun.song, ntfs3, ocfs2-devel, osalvador, tytso, viro
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] fs: Move metadata bh tracking from address_space
https://lore.kernel.org/all/20260303101717.27224-1-jack@suse.cz
* [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode()
* [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode()
* [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode()
* [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode()
* [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode()
* [PATCH 06/32] ext4: Use inode_has_buffers()
* [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode()
* [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode()
* [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate()
* [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking
* [PATCH 11/32] gfs2: Don't zero i_private_data
* [PATCH 12/32] hugetlbfs: Stop using i_private_data
* [PATCH 13/32] aio: Stop using i_private_data and i_private_lock
* [PATCH 14/32] fs: Remove i_private_data
* [PATCH 15/32] fs: Drop osync_buffers_list()
* [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers()
* [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct
* [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs
* [PATCH 19/32] ntfs3: Drop pointless sync_mapping_buffers() call
* [PATCH 20/32] ocfs2: Drop pointless sync_mapping_buffers() calls
* [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
* [PATCH 22/32] fs: Switch inode_has_buffers() to take mapping_metadata_bhs
* [PATCH 23/32] ext2: Track metadata bhs in fs-private inode part
* [PATCH 24/32] affs: Track metadata bhs in fs-private inode part
* [PATCH 25/32] bfs: Track metadata bhs in fs-private inode part
* [PATCH 26/32] fat: Track metadata bhs in fs-private inode part
* [PATCH 27/32] udf: Track metadata bhs in fs-private inode part
* [PATCH 28/32] minix: Track metadata bhs in fs-private inode part
* [PATCH 29/32] ext4: Track metadata bhs in fs-private inode part
* [PATCH 30/32] vfs: Drop mapping_metadata_bhs from address space
* [PATCH 31/32] kvm: Use private inode list instead of i_private_list
* [PATCH 32/32] fs: Drop i_private_list from address_space
and found the following issues:
* BUG: spinlock bad magic in region_del
* KASAN: slab-use-after-free Read in region_del
* general protection fault in mark_buffer_dirty_inode
Full report is available here:
https://ci.syzbot.org/series/3cf14b16-7f50-44ce-9f95-8ac4b86cf294
***
BUG: spinlock bad magic in region_del
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: f50c6ce7bf30099042dac755fbd1e97da456f5ec
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/e716ec88-6c00-48e7-868d-3f4cb3999d4b/config
syz repro: https://ci.syzbot.org/findings/0d1bc933-ce69-432e-a2d5-b2411fe4cfec/syz_repro
BUG: spinlock bad magic on CPU#0, syz.0.151/6273
lock: 0xffff8881165dc808, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 UID: 0 PID: 6273 Comm: syz.0.151 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
spin_bug kernel/locking/spinlock_debug.c:78 [inline]
debug_spin_lock_before kernel/locking/spinlock_debug.c:86 [inline]
do_raw_spin_lock+0x1e5/0x2f0 kernel/locking/spinlock_debug.c:115
spin_lock include/linux/spinlock.h:341 [inline]
region_del+0xbe/0x950 mm/hugetlb.c:863
hugetlb_unreserve_pages+0xfa/0x230 mm/hugetlb.c:6757
remove_inode_hugepages+0x1036/0x11a0 fs/hugetlbfs/inode.c:613
hugetlbfs_evict_inode+0xaf/0x260 fs/hugetlbfs/inode.c:623
evict+0x61e/0xb10 fs/inode.c:841
__dentry_kill+0x1a2/0x5e0 fs/dcache.c:670
finish_dput+0xc9/0x480 fs/dcache.c:879
do_one_tree fs/dcache.c:1657 [inline]
shrink_dcache_for_umount+0xe1/0x1f0 fs/dcache.c:1671
generic_shutdown_super+0x6f/0x2d0 fs/super.c:624
kill_anon_super+0x3b/0x70 fs/super.c:1292
deactivate_locked_super+0xbc/0x130 fs/super.c:476
cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
task_work_run+0x1d9/0x270 kernel/task_work.c:233
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x69b/0x2320 kernel/exit.c:971
do_group_exit+0x21b/0x2d0 kernel/exit.c:1112
get_signal+0x1284/0x1330 kernel/signal.c:3034
arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6e0f19c799
Code: Unable to access opcode bytes at 0x7f6e0f19c76f.
RSP: 002b:00007f6e101360e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f6e0f415fa8 RCX: 00007f6e0f19c799
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f6e0f415fa8
RBP: 00007f6e0f415fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6e0f416038 R14: 00007fff1de1a520 R15: 00007fff1de1a608
</TASK>
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 UID: 0 PID: 6273 Comm: syz.0.151 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:region_del+0x108/0x950 mm/hugetlb.c:864
Code: 24 20 49 29 c4 4c 03 23 48 89 03 48 8b 5c 24 40 4c 39 eb 0f 84 64 05 00 00 e8 74 c0 9c ff 4c 89 64 24 10 49 89 df 49 c1 ef 03 <41> 80 3c 2f 00 74 08 48 89 df e8 b9 d8 06 00 48 8b 03 48 89 44 24
RSP: 0018:ffffc90003b17330 EFLAGS: 00010246
RAX: a69e65823ec40000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90003b172a0
RBP: dffffc0000000000 R08: 0000000000000003 R09: 0000000000000004
R10: dffffc0000000000 R11: fffff52000762e54 R12: 0000000000000000
R13: ffff8881165dc848 R14: 1ffff11022cbb909 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88818de67000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc23744ea7c CR3: 000000000e54c000 CR4: 00000000000006f0
Call Trace:
<TASK>
hugetlb_unreserve_pages+0xfa/0x230 mm/hugetlb.c:6757
remove_inode_hugepages+0x1036/0x11a0 fs/hugetlbfs/inode.c:613
hugetlbfs_evict_inode+0xaf/0x260 fs/hugetlbfs/inode.c:623
evict+0x61e/0xb10 fs/inode.c:841
__dentry_kill+0x1a2/0x5e0 fs/dcache.c:670
finish_dput+0xc9/0x480 fs/dcache.c:879
do_one_tree fs/dcache.c:1657 [inline]
shrink_dcache_for_umount+0xe1/0x1f0 fs/dcache.c:1671
generic_shutdown_super+0x6f/0x2d0 fs/super.c:624
kill_anon_super+0x3b/0x70 fs/super.c:1292
deactivate_locked_super+0xbc/0x130 fs/super.c:476
cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
task_work_run+0x1d9/0x270 kernel/task_work.c:233
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x69b/0x2320 kernel/exit.c:971
do_group_exit+0x21b/0x2d0 kernel/exit.c:1112
get_signal+0x1284/0x1330 kernel/signal.c:3034
arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6e0f19c799
Code: Unable to access opcode bytes at 0x7f6e0f19c76f.
RSP: 002b:00007f6e101360e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f6e0f415fa8 RCX: 00007f6e0f19c799
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f6e0f415fa8
RBP: 00007f6e0f415fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6e0f416038 R14: 00007fff1de1a520 R15: 00007fff1de1a608
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:region_del+0x108/0x950 mm/hugetlb.c:864
Code: 24 20 49 29 c4 4c 03 23 48 89 03 48 8b 5c 24 40 4c 39 eb 0f 84 64 05 00 00 e8 74 c0 9c ff 4c 89 64 24 10 49 89 df 49 c1 ef 03 <41> 80 3c 2f 00 74 08 48 89 df e8 b9 d8 06 00 48 8b 03 48 89 44 24
RSP: 0018:ffffc90003b17330 EFLAGS: 00010246
RAX: a69e65823ec40000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90003b172a0
RBP: dffffc0000000000 R08: 0000000000000003 R09: 0000000000000004
R10: dffffc0000000000 R11: fffff52000762e54 R12: 0000000000000000
R13: ffff8881165dc848 R14: 1ffff11022cbb909 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88818de67000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc23744ea7c CR3: 000000000e54c000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: 24 20 and $0x20,%al
2: 49 29 c4 sub %rax,%r12
5: 4c 03 23 add (%rbx),%r12
8: 48 89 03 mov %rax,(%rbx)
b: 48 8b 5c 24 40 mov 0x40(%rsp),%rbx
10: 4c 39 eb cmp %r13,%rbx
13: 0f 84 64 05 00 00 je 0x57d
19: e8 74 c0 9c ff call 0xff9cc092
1e: 4c 89 64 24 10 mov %r12,0x10(%rsp)
23: 49 89 df mov %rbx,%r15
26: 49 c1 ef 03 shr $0x3,%r15
* 2a: 41 80 3c 2f 00 cmpb $0x0,(%r15,%rbp,1) <-- trapping instruction
2f: 74 08 je 0x39
31: 48 89 df mov %rbx,%rdi
34: e8 b9 d8 06 00 call 0x6d8f2
39: 48 8b 03 mov (%rbx),%rax
3c: 48 rex.W
3d: 89 .byte 0x89
3e: 44 rex.R
3f: 24 .byte 0x24
***
KASAN: slab-use-after-free Read in region_del
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: f50c6ce7bf30099042dac755fbd1e97da456f5ec
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/e716ec88-6c00-48e7-868d-3f4cb3999d4b/config
syz repro: https://ci.syzbot.org/findings/df3f89db-a2df-4664-973c-472164179e0a/syz_repro
==================================================================
BUG: KASAN: slab-use-after-free in __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
BUG: KASAN: slab-use-after-free in _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
Read of size 1 at addr ffff888114425020 by task syz.2.313/6592
CPU: 0 UID: 0 PID: 6592 Comm: syz.2.313 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xba/0x230 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
__kasan_check_byte+0x2a/0x40 mm/kasan/common.c:574
kasan_check_byte include/linux/kasan.h:402 [inline]
lock_acquire+0x79/0x2e0 kernel/locking/lockdep.c:5842
__raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:341 [inline]
region_del+0xbe/0x950 mm/hugetlb.c:863
hugetlb_unreserve_pages+0xfa/0x230 mm/hugetlb.c:6757
remove_inode_hugepages+0x1036/0x11a0 fs/hugetlbfs/inode.c:613
hugetlbfs_evict_inode+0xaf/0x260 fs/hugetlbfs/inode.c:623
evict+0x61e/0xb10 fs/inode.c:841
__dentry_kill+0x1a2/0x5e0 fs/dcache.c:670
finish_dput+0xc9/0x480 fs/dcache.c:879
do_one_tree fs/dcache.c:1657 [inline]
shrink_dcache_for_umount+0xe1/0x1f0 fs/dcache.c:1671
generic_shutdown_super+0x6f/0x2d0 fs/super.c:624
kill_anon_super+0x3b/0x70 fs/super.c:1292
deactivate_locked_super+0xbc/0x130 fs/super.c:476
cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
task_work_run+0x1d9/0x270 kernel/task_work.c:233
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x69b/0x2320 kernel/exit.c:971
do_group_exit+0x21b/0x2d0 kernel/exit.c:1112
get_signal+0x1284/0x1330 kernel/signal.c:3034
arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6b41f9c799
Code: Unable to access opcode bytes at 0x7f6b41f9c76f.
RSP: 002b:00007f6b42db90e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f6b42215fa8 RCX: 00007f6b41f9c799
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f6b42215fa8
RBP: 00007f6b42215fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6b42216038 R14: 00007ffd7b00f490 R15: 00007ffd7b00f578
</TASK>
Allocated by task 6005:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
kasan_kmalloc include/linux/kasan.h:263 [inline]
__kmalloc_cache_noprof+0x31c/0x660 mm/slub.c:5339
kmalloc_noprof include/linux/slab.h:962 [inline]
resv_map_alloc+0x51/0x2c0 mm/hugetlb.c:1108
hugetlbfs_get_inode+0x5d/0x680 fs/hugetlbfs/inode.c:932
hugetlbfs_mknod fs/hugetlbfs/inode.c:987 [inline]
hugetlbfs_create+0x59/0xf0 fs/hugetlbfs/inode.c:1009
lookup_open fs/namei.c:4483 [inline]
open_last_lookups fs/namei.c:4583 [inline]
path_openat+0x1395/0x3860 fs/namei.c:4827
do_file_open+0x23e/0x4a0 fs/namei.c:4859
do_sys_openat2+0x113/0x200 fs/open.c:1366
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1450 [inline]
__se_sys_creat fs/open.c:1444 [inline]
__x64_sys_creat+0x8f/0xc0 fs/open.c:1444
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 6005:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2687 [inline]
slab_free mm/slub.c:6124 [inline]
kfree+0x1c1/0x630 mm/slub.c:6442
hugetlbfs_evict_inode+0xe1/0x260 fs/hugetlbfs/inode.c:628
evict+0x61e/0xb10 fs/inode.c:841
__dentry_kill+0x1a2/0x5e0 fs/dcache.c:670
shrink_kill+0xa9/0x2c0 fs/dcache.c:1147
shrink_dentry_list+0x2e0/0x5e0 fs/dcache.c:1174
shrink_dcache_tree+0xcf/0x310 fs/dcache.c:-1
do_one_tree fs/dcache.c:1654 [inline]
shrink_dcache_for_umount+0xa8/0x1f0 fs/dcache.c:1671
generic_shutdown_super+0x6f/0x2d0 fs/super.c:624
kill_anon_super+0x3b/0x70 fs/super.c:1292
deactivate_locked_super+0xbc/0x130 fs/super.c:476
cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
task_work_run+0x1d9/0x270 kernel/task_work.c:233
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x69b/0x2320 kernel/exit.c:971
do_group_exit+0x21b/0x2d0 kernel/exit.c:1112
get_signal+0x1284/0x1330 kernel/signal.c:3034
arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff888114425000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 32 bytes inside of
freed 512-byte region [ffff888114425000, ffff888114425200)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888114424000 pfn:0x114424
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ff00000000240(workingset|head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000240 ffff888100041c80 ffffea00044b8a10 ffffea0004539010
raw: ffff888114424000 0000000000100009 00000000f5000000 0000000000000000
head: 017ff00000000240 ffff888100041c80 ffffea00044b8a10 ffffea0004539010
head: ffff888114424000 0000000000100009 00000000f5000000 0000000000000000
head: 017ff00000000002 ffffea0004510901 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000004
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5267, tgid 5267 (udevd), ts 28927219244, free_ts 28922963584
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x231/0x280 mm/page_alloc.c:1889
prep_new_page mm/page_alloc.c:1897 [inline]
get_page_from_freelist+0x24dc/0x2580 mm/page_alloc.c:3962
__alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5250
alloc_slab_page mm/slub.c:3255 [inline]
allocate_slab+0x77/0x660 mm/slub.c:3444
new_slab mm/slub.c:3502 [inline]
refill_objects+0x331/0x3c0 mm/slub.c:7134
refill_sheaf mm/slub.c:2804 [inline]
__pcs_replace_empty_main+0x2b9/0x620 mm/slub.c:4578
alloc_from_pcs mm/slub.c:4681 [inline]
slab_alloc_node mm/slub.c:4815 [inline]
__kmalloc_cache_noprof+0x392/0x660 mm/slub.c:5334
kmalloc_noprof include/linux/slab.h:962 [inline]
kzalloc_noprof include/linux/slab.h:1200 [inline]
kernfs_fop_open+0x397/0xca0 fs/kernfs/file.c:641
do_dentry_open+0x785/0x14e0 fs/open.c:949
vfs_open+0x3b/0x340 fs/open.c:1081
do_open fs/namei.c:4671 [inline]
path_openat+0x2e08/0x3860 fs/namei.c:4830
do_file_open+0x23e/0x4a0 fs/namei.c:4859
do_sys_openat2+0x113/0x200 fs/open.c:1366
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1383
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5265 tgid 5265 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
__free_pages_prepare mm/page_alloc.c:1433 [inline]
__free_frozen_pages+0xc2b/0xdb0 mm/page_alloc.c:2978
__slab_free+0x263/0x2b0 mm/slub.c:5532
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4501 [inline]
slab_alloc_node mm/slub.c:4830 [inline]
kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4837
lsm_inode_alloc security/security.c:228 [inline]
security_inode_alloc+0x39/0x310 security/security.c:1189
inode_init_always_gfp+0x9c8/0xda0 fs/inode.c:305
inode_init_always include/linux/fs.h:2925 [inline]
alloc_inode+0x82/0x1b0 fs/inode.c:352
iget_locked+0x131/0x6a0 fs/inode.c:1474
kernfs_get_inode+0x4f/0x780 fs/kernfs/inode.c:253
kernfs_iop_lookup+0x1fe/0x320 fs/kernfs/dir.c:1241
__lookup_slow+0x2b7/0x410 fs/namei.c:1916
lookup_slow+0x53/0x70 fs/namei.c:1933
walk_component fs/namei.c:2279 [inline]
lookup_last fs/namei.c:2780 [inline]
path_lookupat+0x3f5/0x8c0 fs/namei.c:2804
filename_lookup+0x256/0x5d0 fs/namei.c:2833
Memory state around the buggy address:
 ffff888114424f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888114424f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888114425000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff888114425080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888114425100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
***
general protection fault in mark_buffer_dirty_inode
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: f50c6ce7bf30099042dac755fbd1e97da456f5ec
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/e716ec88-6c00-48e7-868d-3f4cb3999d4b/config
C repro: https://ci.syzbot.org/findings/670a21ca-1447-4fda-909b-5098c9c0cdd9/c_repro
syz repro: https://ci.syzbot.org/findings/670a21ca-1447-4fda-909b-5098c9c0cdd9/syz_repro
EXT4-fs (loop0): mounted filesystem 76b65be2-f6da-4727-8c75-0525a5b65a09 r/w without journal. Quota mode: none.
ext4 filesystem being mounted at /0/mnt supports timestamps until 2038-01-19 (0x7fffffff)
fscrypt: AES-256-CBC-CTS using implementation "cts(cbc(ecb(aes-lib)))"
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 1 UID: 0 PID: 5946 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:kasan_byte_accessible+0x12/0x30 mm/kasan/generic.c:210
Code: 79 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 40 d6 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 e9 40 6a 80 09 cc 66 66 66 66 66 66 2e
RSP: 0018:ffffc90003c9f380 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffffffff8bafae9e RCX: 0000000080000002
RDX: 0000000000000000 RSI: ffffffff8bafae9e RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: dffffc0000000000 R11: fffffbfff2023057 R12: 0000000000000000
R13: 0000000000000018 R14: 0000000000000018 R15: 0000000000000001
FS: 0000555590824500(0000) GS:ffff8882a9467000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e763fff CR3: 000000016fa5e000 CR4: 00000000000006f0
Call Trace:
<TASK>
__kasan_check_byte+0x12/0x40 mm/kasan/common.c:573
kasan_check_byte include/linux/kasan.h:402 [inline]
lock_acquire+0x79/0x2e0 kernel/locking/lockdep.c:5842
__raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:341 [inline]
mark_buffer_dirty_inode+0xe3/0x2f0 fs/buffer.c:748
__ext4_handle_dirty_metadata+0x27a/0x810 fs/ext4/ext4_jbd2.c:393
ext4_xattr_block_set+0x24ff/0x2ad0 fs/ext4/xattr.c:2168
ext4_xattr_set_handle+0xe34/0x14c0 fs/ext4/xattr.c:2457
ext4_set_context+0x233/0x560 fs/ext4/crypto.c:166
fscrypt_set_context+0x397/0x460 fs/crypto/policy.c:791
__ext4_new_inode+0x3158/0x3d20 fs/ext4/ialloc.c:1314
ext4_symlink+0x3ac/0xb90 fs/ext4/namei.c:3386
vfs_symlink+0x195/0x340 fs/namei.c:5615
filename_symlinkat+0x1cd/0x410 fs/namei.c:5640
__do_sys_symlink fs/namei.c:5667 [inline]
__se_sys_symlink+0x4d/0x2b0 fs/namei.c:5663
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe222b9c799
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdf34afb88 EFLAGS: 00000246 ORIG_RAX: 0000000000000058
RAX: ffffffffffffffda RBX: 00007fe222e15fa0 RCX: 00007fe222b9c799
RDX: 0000000000000000 RSI: 00002000000000c0 RDI: 0000200000000080
RBP: 00007fe222c32bd9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe222e15fac R14: 00007fe222e15fa0 R15: 00007fe222e15fa0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kasan_byte_accessible+0x12/0x30 mm/kasan/generic.c:210
Code: 79 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 40 d6 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 e9 40 6a 80 09 cc 66 66 66 66 66 66 2e
RSP: 0018:ffffc90003c9f380 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffffffff8bafae9e RCX: 0000000080000002
RDX: 0000000000000000 RSI: ffffffff8bafae9e RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: dffffc0000000000 R11: fffffbfff2023057 R12: 0000000000000000
R13: 0000000000000018 R14: 0000000000000018 R15: 0000000000000001
FS: 0000555590824500(0000) GS:ffff8882a9467000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e763fff CR3: 000000016fa5e000 CR4: 00000000000006f0
----------------
Code disassembly (best guess), 4 bytes skipped:
0: 0f 1f 40 00 nopl 0x0(%rax)
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 0f 1f 40 d6 nopl -0x2a(%rax)
18: 48 c1 ef 03 shr $0x3,%rdi
1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
23: fc ff df
* 26: 0f b6 04 07 movzbl (%rdi,%rax,1),%eax <-- trapping instruction
2a: 3c 08 cmp $0x8,%al
2c: 0f 92 c0 setb %al
2f: e9 40 6a 80 09 jmp 0x9806a74
34: cc int3
35: 66 data16
36: 66 data16
37: 66 data16
38: 66 data16
39: 66 data16
3a: 66 data16
3b: 2e cs
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
2026-03-03 14:03 ` Christoph Hellwig
@ 2026-03-04 10:30 ` Jan Kara
0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-04 10:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel, Christian Brauner, Al Viro, linux-ext4,
Ted Tso, Tigran A. Aivazian, David Sterba, OGAWA Hirofumi,
Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-aio, Benjamin LaHaise, Jens Axboe, linux-block
On Tue 03-03-26 06:03:57, Christoph Hellwig wrote:
> > diff --git a/block/bdev.c b/block/bdev.c
> > index ed022f8c48c7..ad1660b6b324 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -420,7 +420,6 @@ static void init_once(void *data)
> >  static void bdev_evict_inode(struct inode *inode)
> >  {
> >  	truncate_inode_pages_final(&inode->i_data);
> > -	invalidate_inode_buffers(inode); /* is it needed here? */
> >  	clear_inode(inode);
> >  }
>
> With this, bdev_evict_inode can go away as it is equivalent to the
> default action when no ->evict_inode is provided.
Good point. I'll remove bdev_evict_inode(). Thanks!
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
2026-03-03 14:09 ` Christoph Hellwig
@ 2026-03-04 10:36 ` Jan Kara
0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-04 10:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel, Christian Brauner, Al Viro, linux-ext4,
Ted Tso, Tigran A. Aivazian, David Sterba, OGAWA Hirofumi,
Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-aio, Benjamin LaHaise, Jens Axboe, linux-block
On Tue 03-03-26 06:09:03, Christoph Hellwig wrote:
> FYI, linux-block only got this patch which is totally messed up.
> Please always send all patches to every list and person, otherwise
> you fill people's inboxes with unreviewable junk.
Well, I've CCed on the whole series everybody who was non-trivially
impacted. But there are a couple of these trivial "remove effectively dead
code" patches which stand on their own, and a lot of people actually prefer
to only get individual patches in such cases. So I don't plan on changing
that, but I guess I could have CCed linux-block on the whole series as
buffer_heads are tangentially related to the block layer anyway.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 11/32] gfs2: Don't zero i_private_data
2026-03-03 12:32 ` Andreas Gruenbacher
@ 2026-03-04 10:39 ` Jan Kara
0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2026-03-04 10:39 UTC (permalink / raw)
To: Andreas Gruenbacher
Cc: Jan Kara, linux-fsdevel, Christian Brauner, Al Viro, linux-ext4,
Ted Tso, Tigran A. Aivazian, David Sterba, OGAWA Hirofumi,
Muchun Song, Oscar Salvador, David Hildenbrand, linux-mm,
linux-aio, Benjamin LaHaise, gfs2
On Tue 03-03-26 13:32:31, Andreas Gruenbacher wrote:
> Jan,
>
> On Tue, Mar 3, 2026 at 11:34 AM Jan Kara <jack@suse.cz> wrote:
> > The zeroing is the only use within gfs2 so it is pointless.
>
> "Remove the explicit zeroing of mapping->i_private_data since this
> field is no longer used."
>
> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Thanks for the review. I've updated the changelog.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 0/32] fs: Move metadata bh tracking from address_space
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (32 preceding siblings ...)
2026-03-03 23:35 ` [syzbot ci] Re: fs: Move metadata bh tracking " syzbot ci
@ 2026-03-04 12:32 ` Christian Brauner
33 siblings, 0 replies; 42+ messages in thread
From: Christian Brauner @ 2026-03-04 12:32 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Al Viro, linux-ext4, Ted Tso, Tigran A. Aivazian,
David Sterba, OGAWA Hirofumi, Muchun Song, Oscar Salvador,
David Hildenbrand, linux-mm, linux-aio, Benjamin LaHaise
On Tue, Mar 03, 2026 at 11:33:49AM +0100, Jan Kara wrote:
> Hello,
>
> this patch series cleans up the mess that has accumulated over the years in
> metadata buffer_head tracking for inodes, moves the tracking into a dedicated
> structure in the filesystem-private part of the inode (so that we don't use
> private_list, private_data, and private_lock in struct address_space), and also
> moves a couple of other users of private_data and private_list so these are removed
> from struct address_space, saving 3 longs in struct inode for 99% of inodes. I
Yes! I love it.
* Re: [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs
2026-03-03 10:34 ` [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs Jan Kara
@ 2026-03-04 12:48 ` Christian Brauner
0 siblings, 0 replies; 42+ messages in thread
From: Christian Brauner @ 2026-03-04 12:48 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Al Viro, linux-ext4, Ted Tso, Tigran A. Aivazian,
David Sterba, OGAWA Hirofumi, Muchun Song, Oscar Salvador,
David Hildenbrand, linux-mm, linux-aio, Benjamin LaHaise
On Tue, Mar 03, 2026 at 11:34:07AM +0100, Jan Kara wrote:
> When we move mapping_metadata_bhs to the fs-private part of an inode, the
> generic code will need a way to get to this struct from the generic struct
> inode. Add an inode operation for this, similar to the operation for grabbing
> offset_ctx.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
Yeah, it's a good enough trade-off, I think.
Thread overview: 42+ messages
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 06/32] ext4: Use inode_has_buffers() Jan Kara
2026-03-03 10:33 ` [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
2026-03-03 10:33 ` [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking Jan Kara
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
2026-03-03 12:32 ` Andreas Gruenbacher
2026-03-04 10:39 ` Jan Kara
2026-03-03 10:34 ` [PATCH 12/32] hugetlbfs: Stop using i_private_data Jan Kara
2026-03-03 10:34 ` [PATCH 13/32] aio: Stop using i_private_data and i_private_lock Jan Kara
2026-03-03 10:34 ` [PATCH 14/32] fs: Remove i_private_data Jan Kara
2026-03-03 10:34 ` [PATCH 15/32] fs: Drop osync_buffers_list() Jan Kara
2026-03-03 10:34 ` [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
2026-03-03 10:34 ` [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct Jan Kara
2026-03-03 10:34 ` [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs Jan Kara
2026-03-04 12:48 ` Christian Brauner
2026-03-03 10:34 ` [PATCH 19/32] ntfs3: Drop pointless sync_mapping_buffers() call Jan Kara
2026-03-03 10:34 ` [PATCH 20/32] ocfs2: Drop pointless sync_mapping_buffers() calls Jan Kara
2026-03-03 10:34 ` [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call Jan Kara
2026-03-03 14:03 ` Christoph Hellwig
2026-03-04 10:30 ` Jan Kara
2026-03-03 14:09 ` Christoph Hellwig
2026-03-04 10:36 ` Jan Kara
2026-03-03 10:34 ` [PATCH 22/32] fs: Switch inode_has_buffers() to take mapping_metadata_bhs Jan Kara
2026-03-03 10:34 ` [PATCH 23/32] ext2: Track metadata bhs in fs-private inode part Jan Kara
2026-03-03 10:34 ` [PATCH 24/32] affs: " Jan Kara
2026-03-03 10:34 ` [PATCH 25/32] bfs: " Jan Kara
2026-03-03 10:34 ` [PATCH 26/32] fat: " Jan Kara
2026-03-03 10:34 ` [PATCH 27/32] udf: " Jan Kara
2026-03-03 10:34 ` [PATCH 28/32] minix: " Jan Kara
2026-03-03 10:34 ` [PATCH 29/32] ext4: " Jan Kara
2026-03-03 10:34 ` [PATCH 30/32] vfs: Drop mapping_metadata_bhs from address space Jan Kara
2026-03-03 10:34 ` [PATCH 31/32] kvm: Use private inode list instead of i_private_list Jan Kara
2026-03-03 10:34 ` [PATCH 32/32] fs: Drop i_private_list from address_space Jan Kara
2026-03-03 23:35 ` [syzbot ci] Re: fs: Move metadata bh tracking " syzbot ci
2026-03-04 12:32 ` [PATCH 0/32] " Christian Brauner