* [PATCH 0/32] fs: Move metadata bh tracking from address_space
@ 2026-03-03 10:33 Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
` (31 more replies)
0 siblings, 32 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Hello,
this patch series cleans up the mess that has accumulated over the years in
metadata buffer_head tracking for inodes, moves the tracking into a dedicated
structure in the filesystem-private part of the inode (so that we don't use
private_list, private_data, and private_lock in struct address_space), and also
moves a couple of other users of private_data and private_list so that these
can be removed from struct address_space, saving 3 longs in struct inode for
99% of inodes. I would like to get rid of private_lock in struct address_space
as well; however, the locking changes for buffer_heads are non-trivial there
and the patch series is long enough as is. So let's leave that for another time.
The patches have survived some testing with fstests and ltp; however, I didn't
test the AFFS, HUGETLBFS, and KVM guest_memfd changes, so help with testing
those would be very welcome. Thanks.
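To make the "3 longs" figure concrete, here is a minimal userspace sketch (not
kernel code) of moving the tracking fields out of the shared mapping structure.
The struct names mapping_old, mapping_new and fs_inode_info, and the field
assoc_buffers, are hypothetical; only private_list and private_data come from
the text above.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (two pointers). */
struct list_head {
	struct list_head *next, *prev;
};

/* Before: every mapping carries the tracking fields, used or not. */
struct mapping_old {
	void *host;
	struct list_head private_list;	/* 2 pointer-sized words */
	void *private_data;		/* 1 pointer-sized word */
};

/* After: the generic mapping no longer has them. */
struct mapping_new {
	void *host;
};

/* Hypothetical filesystem-private inode: only users of the tracking pay. */
struct fs_inode_info {
	struct mapping_new mapping;
	struct list_head assoc_buffers;
};

/* Bytes saved per mapping for filesystems that do not track buffers. */
static size_t bytes_saved_per_mapping(void)
{
	return sizeof(struct mapping_old) - sizeof(struct mapping_new);
}
```

On Linux a pointer is the size of a long, so the difference corresponds to the
3 longs mentioned above.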
block/bdev.c | 1
fs/affs/affs.h | 2
fs/affs/dir.c | 1
fs/affs/file.c | 1
fs/affs/inode.c | 2
fs/affs/super.c | 6
fs/affs/symlink.c | 1
fs/aio.c | 78 +++++++-
fs/bfs/bfs.h | 2
fs/bfs/dir.c | 1
fs/bfs/file.c | 4
fs/bfs/inode.c | 9 +
fs/buffer.c | 387 +++++++++++++++++---------------------------
fs/ext2/ext2.h | 2
fs/ext2/file.c | 1
fs/ext2/inode.c | 3
fs/ext2/namei.c | 2
fs/ext2/super.c | 6
fs/ext2/symlink.c | 2
fs/ext4/ext4.h | 4
fs/ext4/file.c | 1
fs/ext4/inode.c | 9 -
fs/ext4/namei.c | 2
fs/ext4/super.c | 9 -
fs/ext4/symlink.c | 3
fs/fat/fat.h | 2
fs/fat/file.c | 1
fs/fat/inode.c | 16 +
fs/fat/namei_msdos.c | 1
fs/fat/namei_vfat.c | 1
fs/gfs2/glock.c | 1
fs/hugetlbfs/inode.c | 10 -
fs/inode.c | 24 +-
fs/minix/file.c | 1
fs/minix/inode.c | 10 +
fs/minix/minix.h | 2
fs/minix/namei.c | 1
fs/ntfs3/file.c | 3
fs/ocfs2/dlmglue.c | 1
fs/ocfs2/namei.c | 3
fs/udf/file.c | 1
fs/udf/inode.c | 2
fs/udf/namei.c | 1
fs/udf/super.c | 6
fs/udf/symlink.c | 1
fs/udf/udf_i.h | 1
fs/udf/udfdecl.h | 1
include/linux/buffer_head.h | 6
include/linux/fs.h | 11 -
include/linux/hugetlb.h | 1
mm/hugetlb.c | 10 -
virt/kvm/guest_memfd.c | 12 -
52 files changed, 360 insertions(+), 309 deletions(-)
Honza
* [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
` (30 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fat/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 3cc5fb01afa1..ce88602b0d57 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -657,8 +657,10 @@ static void fat_evict_inode(struct inode *inode)
if (!inode->i_nlink) {
inode->i_size = 0;
fat_truncate_blocks(inode, 0);
- } else
+ } else {
+ sync_mapping_buffers(inode->i_mapping);
fat_free_eofblocks(inode);
+ }
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
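The eviction ordering in this patch can be modelled in userspace as follows.
This is only a sketch under stated assumptions: the toy_* types and functions
are hypothetical stand-ins for sync_mapping_buffers() and
invalidate_inode_buffers(), and truncation of a deleted inode is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a metadata buffer associated with an inode. */
struct toy_buffer {
	bool dirty;
	bool attached;
};

struct toy_inode {
	unsigned int nlink;		/* 0 means the inode is being deleted */
	struct toy_buffer *bufs;
	size_t nr_bufs;
	size_t writes;			/* buffers written out so far */
};

/* Stand-in for sync_mapping_buffers(): write out every dirty buffer. */
static void toy_sync_buffers(struct toy_inode *inode)
{
	for (size_t i = 0; i < inode->nr_bufs; i++) {
		if (inode->bufs[i].attached && inode->bufs[i].dirty) {
			inode->bufs[i].dirty = false;
			inode->writes++;
		}
	}
}

/* Stand-in for invalidate_inode_buffers(): detach without writing. */
static void toy_invalidate_buffers(struct toy_inode *inode)
{
	for (size_t i = 0; i < inode->nr_bufs; i++)
		inode->bufs[i].attached = false;
}

/* Sync only if the inode stays on disk, then always detach the buffers. */
static void toy_evict_inode(struct toy_inode *inode)
{
	if (inode->nlink)
		toy_sync_buffers(inode);
	toy_invalidate_buffers(inode);
}
```

The point of the ordering is that a still-linked inode must not lose dirty
metadata when its buffers are detached, while a deleted inode has nothing
worth writing.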
* [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
` (29 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/udf/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 7fae8002344a..739b190ca4e9 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -154,6 +154,8 @@ void udf_evict_inode(struct inode *inode)
}
}
truncate_inode_pages_final(&inode->i_data);
+ if (!want_delete)
+ sync_mapping_buffers(&inode->i_data);
invalidate_inode_buffers(inode);
clear_inode(inode);
kfree(iinfo->i_data);
--
2.51.0
* [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-03 10:33 ` [PATCH 01/32] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
2026-03-03 10:33 ` [PATCH 02/32] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
` (28 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/minix/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 99541c6a5bbf..ab7c06efb139 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -48,6 +48,8 @@ static void minix_evict_inode(struct inode *inode)
if (!inode->i_nlink) {
inode->i_size = 0;
minix_truncate(inode);
+ } else {
+ sync_mapping_buffers(&inode->i_data);
}
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (2 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 03/32] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
` (27 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext2/inode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index dbfe9098a124..fb91c61aa6d6 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -94,8 +94,9 @@ void ext2_evict_inode(struct inode * inode)
if (inode->i_blocks)
ext2_truncate_blocks(inode, 0);
ext2_xattr_delete_inode(inode);
+ } else {
+ sync_mapping_buffers(&inode->i_data);
}
-
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (3 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 04/32] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 06/32] ext4: Use inode_has_buffers() Jan Kara
` (26 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 4 +++-
fs/ext4/super.c | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 396dc3a5d16b..c2692b9c7123 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -185,7 +185,9 @@ void ext4_evict_inode(struct inode *inode)
ext4_evict_ea_inode(inode);
if (inode->i_nlink) {
truncate_inode_pages_final(&inode->i_data);
-
+ /* Avoid mballoc special inode which has no proper iops */
+ if (!EXT4_SB(inode->i_sb)->s_journal)
+ sync_mapping_buffers(&inode->i_data);
goto no_delete;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 43f680c750ae..ea827b0ecc8d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1524,7 +1524,8 @@ static void destroy_inodecache(void)
void ext4_clear_inode(struct inode *inode)
{
ext4_fc_del(inode);
- invalidate_inode_buffers(inode);
+ if (!EXT4_SB(inode->i_sb)->s_journal)
+ invalidate_inode_buffers(inode);
clear_inode(inode);
ext4_discard_preallocations(inode);
ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
--
2.51.0
* [PATCH 06/32] ext4: Use inode_has_buffers()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (4 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 05/32] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
` (25 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of checking i_private_list directly, use the appropriate wrapper
inode_has_buffers(). Also delete a stale comment.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 1 +
fs/ext4/inode.c | 5 +----
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 22b43642ba57..1bc0f22f3cc2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -524,6 +524,7 @@ int inode_has_buffers(struct inode *inode)
{
return !list_empty(&inode->i_data.i_private_list);
}
+EXPORT_SYMBOL_GPL(inode_has_buffers);
/*
* osync is designed to support O_SYNC io. It waits synchronously for
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2692b9c7123..6f892abef003 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1422,9 +1422,6 @@ static int write_end_fn(handle_t *handle, struct inode *inode,
/*
* We need to pick up the new inode size which generic_commit_write gave us
* `iocb` can be NULL - eg, when called from page_symlink().
- *
- * ext4 never places buffers on inode->i_mapping->i_private_list. metadata
- * buffers are managed internally.
*/
static int ext4_write_end(const struct kiocb *iocb,
struct address_space *mapping,
@@ -3439,7 +3436,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
/* Any metadata buffers to write? */
- if (!list_empty(&inode->i_mapping->i_private_list))
+ if (inode_has_buffers(inode))
return true;
return inode_state_read_once(inode) & I_DIRTY_DATASYNC;
}
--
2.51.0
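Why the wrapper matters can be shown with a small userspace sketch: callers
that go through an accessor keep working when the list later moves to another
structure. The names toy_inode, assoc_buffers and the toy_* helpers are
hypothetical; only the idea of wrapping a list-emptiness check comes from the
patch.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void toy_init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool toy_list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void toy_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical inode that keeps its associated buffers on a list. */
struct toy_inode {
	struct list_head assoc_buffers;
};

/* The single place that knows where the list lives. */
static bool toy_inode_has_buffers(const struct toy_inode *inode)
{
	return !toy_list_empty(&inode->assoc_buffers);
}
```

If the list moves, only toy_inode_has_buffers() changes; a caller such as a
datasync-dirty check stays untouched.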
* [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (5 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 06/32] ext4: Use inode_has_buffers() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
` (24 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/bfs/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 9da02f5cb6cd..e0e50a9dbe9c 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -187,6 +187,8 @@ static void bfs_evict_inode(struct inode *inode)
dprintf("ino=%08lx\n", ino);
truncate_inode_pages_final(&inode->i_data);
+ if (inode->i_nlink)
+ sync_mapping_buffers(&inode->i_data);
invalidate_inode_buffers(inode);
clear_inode(inode);
--
2.51.0
* [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (6 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 07/32] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
` (23 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Only very few filesystems use generic metadata buffer head tracking, yet
everybody is paying the overhead. Once we remove the handling of this
tracking from the inode reclaim code, .evict will start to see inodes with
metadata buffers attached, so write them out and prune them there.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/affs/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/affs/inode.c b/fs/affs/inode.c
index 0bfc7d151dcd..84afa862f220 100644
--- a/fs/affs/inode.c
+++ b/fs/affs/inode.c
@@ -267,6 +267,8 @@ affs_evict_inode(struct inode *inode)
if (!inode->i_nlink) {
inode->i_size = 0;
affs_truncate(inode);
+ } else {
+ sync_mapping_buffers(&inode->i_data);
}
invalidate_inode_buffers(inode);
--
2.51.0
* [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (7 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 08/32] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:33 ` [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking Jan Kara
` (22 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
There are only a few filesystems that use generic tracking of inode
metadata buffer heads. As such it is mostly pointless to verify such
attached buffer heads during inode reclaim. Drop the handling from
inode_lru_isolate().
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 29 -----------------------------
fs/inode.c | 21 +++++++++------------
include/linux/buffer_head.h | 3 ---
3 files changed, 9 insertions(+), 44 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1bc0f22f3cc2..bd48644e1bf8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -878,35 +878,6 @@ void invalidate_inode_buffers(struct inode *inode)
}
EXPORT_SYMBOL(invalidate_inode_buffers);
-/*
- * Remove any clean buffers from the inode's buffer list. This is called
- * when we're trying to free the inode itself. Those buffers can pin it.
- *
- * Returns true if all buffers were removed.
- */
-int remove_inode_buffers(struct inode *inode)
-{
- int ret = 1;
-
- if (inode_has_buffers(inode)) {
- struct address_space *mapping = &inode->i_data;
- struct list_head *list = &mapping->i_private_list;
- struct address_space *buffer_mapping = mapping->i_private_data;
-
- spin_lock(&buffer_mapping->i_private_lock);
- while (!list_empty(list)) {
- struct buffer_head *bh = BH_ENTRY(list->next);
- if (buffer_dirty(bh)) {
- ret = 0;
- break;
- }
- __remove_assoc_queue(bh);
- }
- spin_unlock(&buffer_mapping->i_private_lock);
- }
- return ret;
-}
-
/*
* Create the appropriate buffers when given a folio for data area and
* the size of each buffer.. Use the bh->b_this_page linked list to
diff --git a/fs/inode.c b/fs/inode.c
index cc12b68e021b..4f98a5f04bbd 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -17,7 +17,6 @@
#include <linux/fsverity.h>
#include <linux/mount.h>
#include <linux/posix_acl.h>
-#include <linux/buffer_head.h> /* for inode_has_buffers */
#include <linux/ratelimit.h>
#include <linux/list_lru.h>
#include <linux/iversion.h>
@@ -367,7 +366,6 @@ struct inode *alloc_inode(struct super_block *sb)
void __destroy_inode(struct inode *inode)
{
- BUG_ON(inode_has_buffers(inode));
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);
@@ -994,19 +992,18 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
* page cache in order to free up struct inodes: lowmem might
* be under pressure before the cache inside the highmem zone.
*/
- if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) {
+ if (!mapping_empty(&inode->i_data)) {
+ unsigned long reap;
+
inode_pin_lru_isolating(inode);
spin_unlock(&inode->i_lock);
spin_unlock(&lru->lock);
- if (remove_inode_buffers(inode)) {
- unsigned long reap;
- reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
- if (current_is_kswapd())
- __count_vm_events(KSWAPD_INODESTEAL, reap);
- else
- __count_vm_events(PGINODESTEAL, reap);
- mm_account_reclaimed_pages(reap);
- }
+ reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
+ if (current_is_kswapd())
+ __count_vm_events(KSWAPD_INODESTEAL, reap);
+ else
+ __count_vm_events(PGINODESTEAL, reap);
+ mm_account_reclaimed_pages(reap);
inode_unpin_lru_isolating(inode);
return LRU_RETRY;
}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e..631bf971efc0 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -517,7 +517,6 @@ void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
int inode_has_buffers(struct inode *inode);
void invalidate_inode_buffers(struct inode *inode);
-int remove_inode_buffers(struct inode *inode);
int sync_mapping_buffers(struct address_space *mapping);
void invalidate_bh_lrus(void);
void invalidate_bh_lrus_cpu(void);
@@ -528,9 +527,7 @@ extern int buffer_heads_over_limit;
static inline void buffer_init(void) {}
static inline bool try_to_free_buffers(struct folio *folio) { return true; }
-static inline int inode_has_buffers(struct inode *inode) { return 0; }
static inline void invalidate_inode_buffers(struct inode *inode) {}
-static inline int remove_inode_buffers(struct inode *inode) { return 1; }
static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
static inline void invalidate_bh_lrus(void) {}
static inline void invalidate_bh_lrus_cpu(void) {}
--
2.51.0
* [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (8 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 09/32] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
@ 2026-03-03 10:33 ` Jan Kara
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
` (21 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:33 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
All filesystems using generic metadata bh tracking use the bdev mapping
as the backing for these bhs. Stop using i_private_data for it and get to
the bdev mapping directly.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index bd48644e1bf8..c85ccfb1a4ec 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -574,9 +574,10 @@ static int osync_buffers_list(spinlock_t *lock, struct list_head *list)
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct address_space *buffer_mapping = mapping->i_private_data;
+ struct address_space *buffer_mapping =
+ mapping->host->i_sb->s_bdev->bd_mapping;
- if (buffer_mapping == NULL || list_empty(&mapping->i_private_list))
+ if (list_empty(&mapping->i_private_list))
return 0;
return fsync_buffers_list(&buffer_mapping->i_private_lock,
@@ -679,11 +680,6 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
struct address_space *buffer_mapping = bh->b_folio->mapping;
mark_buffer_dirty(bh);
- if (!mapping->i_private_data) {
- mapping->i_private_data = buffer_mapping;
- } else {
- BUG_ON(mapping->i_private_data != buffer_mapping);
- }
if (!bh->b_assoc_map) {
spin_lock(&buffer_mapping->i_private_lock);
list_move_tail(&bh->b_assoc_buffers,
@@ -868,7 +864,8 @@ void invalidate_inode_buffers(struct inode *inode)
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
struct list_head *list = &mapping->i_private_list;
- struct address_space *buffer_mapping = mapping->i_private_data;
+ struct address_space *buffer_mapping =
+ mapping->host->i_sb->s_bdev->bd_mapping;
spin_lock(&buffer_mapping->i_private_lock);
while (!list_empty(list))
--
2.51.0
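The lookup that replaces the cached i_private_data pointer can be sketched in
userspace like this. The toy_* structures are hypothetical and contain only
the fields needed to follow the mapping->host->i_sb->s_bdev->bd_mapping chain
used in the patch.

```c
#include <stddef.h>

struct toy_inode;

/* Only the field needed for the walk: the inode owning this mapping. */
struct toy_mapping {
	struct toy_inode *host;
};

struct toy_bdev {
	struct toy_mapping *bd_mapping;	/* mapping backing the block device */
};

struct toy_sb {
	struct toy_bdev *s_bdev;
};

struct toy_inode {
	struct toy_sb *i_sb;
	struct toy_mapping i_data;
};

/*
 * Derive the block device mapping from the inode's superblock instead of
 * caching it in a per-mapping pointer.
 */
static struct toy_mapping *toy_buffer_mapping(struct toy_mapping *mapping)
{
	return mapping->host->i_sb->s_bdev->bd_mapping;
}
```

The trade is a few dependent loads at sync or invalidate time in exchange for
one pointer less in every mapping.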
* [PATCH 11/32] gfs2: Don't zero i_private_data
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (9 preceding siblings ...)
2026-03-03 10:33 ` [PATCH 10/32] fs: Stop using i_private_data for metadata bh tracking Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 12:32 ` Andreas Gruenbacher
2026-03-03 10:34 ` [PATCH 12/32] hugetlbfs: Stop using i_private_data Jan Kara
` (20 subsequent siblings)
31 siblings, 1 reply; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Andreas Gruenbacher, gfs2
The zeroing is the only use of i_private_data within gfs2, so it is pointless.
CC: Andreas Gruenbacher <agruenba@redhat.com>
CC: gfs2@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/gfs2/glock.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 2acbabccc8ad..b8a144d3a73b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1149,7 +1149,6 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
mapping->flags = 0;
gfp_mask = mapping_gfp_mask(sdp->sd_inode->i_mapping);
mapping_set_gfp_mask(mapping, gfp_mask);
- mapping->i_private_data = NULL;
mapping->writeback_index = 0;
}
--
2.51.0
* [PATCH 12/32] hugetlbfs: Stop using i_private_data
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (10 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 13/32] aio: Stop using i_private_data and i_private_lock Jan Kara
` (19 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of using i_private_data for the resv_map pointer, add the pointer
to the hugetlbfs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/hugetlbfs/inode.c | 10 ++--------
include/linux/hugetlb.h | 1 +
mm/hugetlb.c | 10 +---------
3 files changed, 4 insertions(+), 17 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..0496f2e6d177 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -622,13 +622,7 @@ static void hugetlbfs_evict_inode(struct inode *inode)
trace_hugetlbfs_evict_inode(inode);
remove_inode_hugepages(inode, 0, LLONG_MAX);
- /*
- * Get the resv_map from the address space embedded in the inode.
- * This is the address space which points to any resv_map allocated
- * at inode creation time. If this is a device special inode,
- * i_mapping may not point to the original address space.
- */
- resv_map = (struct resv_map *)(&inode->i_data)->i_private_data;
+ resv_map = HUGETLBFS_I(inode)->resv_map;
/* Only regular and link inodes have associated reserve maps */
if (resv_map)
resv_map_release(&resv_map->refs);
@@ -950,7 +944,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
&hugetlbfs_i_mmap_rwsem_key);
inode->i_mapping->a_ops = &hugetlbfs_aops;
simple_inode_init_ts(inode);
- inode->i_mapping->i_private_data = resv_map;
+ info->resv_map = resv_map;
info->seals = F_SEAL_SEAL;
switch (mode & S_IFMT) {
default:
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..fc5462fe943f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -518,6 +518,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
struct hugetlbfs_inode_info {
struct inode vfs_inode;
+ struct resv_map *resv_map;
unsigned int seals;
};
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0beb6e22bc26..7ab5c724a711 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1157,15 +1157,7 @@ void resv_map_release(struct kref *ref)
static inline struct resv_map *inode_resv_map(struct inode *inode)
{
- /*
- * At inode evict time, i_mapping may not point to the original
- * address space within the inode. This original address space
- * contains the pointer to the resv_map. So, always use the
- * address space embedded within the inode.
- * The VERY common case is inode->mapping == &inode->i_data but,
- * this may not be true for device special inodes.
- */
- return (struct resv_map *)(&inode->i_data)->i_private_data;
+ return HUGETLBFS_I(inode)->resv_map;
}
static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
--
2.51.0
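The accessor pattern behind HUGETLBFS_I() can be illustrated with a userspace
sketch of container_of. The toy_* types and the toy_container_of macro are
hypothetical simplifications; the layout mirrors the hugetlbfs_inode_info
change in the diff above.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), minus type checks. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_inode {
	unsigned long i_ino;
};

struct toy_resv_map {
	int refs;
};

/* Filesystem-private inode: the generic inode is embedded as a member. */
struct toy_hugetlbfs_inode_info {
	struct toy_inode vfs_inode;
	struct toy_resv_map *resv_map;
	unsigned int seals;
};

/* Recover the private structure from a pointer to the embedded inode. */
static struct toy_hugetlbfs_inode_info *TOY_HUGETLBFS_I(struct toy_inode *inode)
{
	return toy_container_of(inode, struct toy_hugetlbfs_inode_info,
				vfs_inode);
}

/* The pointer now lives in the private part, not in the mapping. */
static struct toy_resv_map *toy_inode_resv_map(struct toy_inode *inode)
{
	return TOY_HUGETLBFS_I(inode)->resv_map;
}
```

Because the pointer is reached from the inode itself, the question of whether
i_mapping still points at the embedded address space no longer arises.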
* [PATCH 13/32] aio: Stop using i_private_data and i_private_lock
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (11 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 12/32] hugetlbfs: Stop using i_private_data Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 14/32] fs: Remove i_private_data Jan Kara
` (18 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of using i_private_data and i_private_lock, create aio inodes
with the necessary fields of their own.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/aio.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 66 insertions(+), 12 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index a07bdd1aaaa6..ba9b9fa2446b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -218,6 +218,17 @@ struct aio_kiocb {
struct eventfd_ctx *ki_eventfd;
};
+struct aio_inode_info {
+ struct inode vfs_inode;
+ spinlock_t migrate_lock;
+ struct kioctx *ctx;
+};
+
+static inline struct aio_inode_info *AIO_I(struct inode *inode)
+{
+ return container_of(inode, struct aio_inode_info, vfs_inode);
+}
+
/*------ sysctl variables----*/
static DEFINE_SPINLOCK(aio_nr_lock);
static unsigned long aio_nr; /* current system wide number of aio requests */
@@ -251,6 +262,7 @@ static void __init aio_sysctl_init(void)
static struct kmem_cache *kiocb_cachep;
static struct kmem_cache *kioctx_cachep;
+static struct kmem_cache *aio_inode_cachep;
static struct vfsmount *aio_mnt;
@@ -261,11 +273,12 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
{
struct file *file;
struct inode *inode = alloc_anon_inode(aio_mnt->mnt_sb);
+
if (IS_ERR(inode))
return ERR_CAST(inode);
inode->i_mapping->a_ops = &aio_ctx_aops;
- inode->i_mapping->i_private_data = ctx;
+ AIO_I(inode)->ctx = ctx;
inode->i_size = PAGE_SIZE * nr_pages;
file = alloc_file_pseudo(inode, aio_mnt, "[aio]",
@@ -275,14 +288,49 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
return file;
}
+static struct inode *aio_alloc_inode(struct super_block *sb)
+{
+ struct aio_inode_info *ai;
+
+ ai = alloc_inode_sb(sb, aio_inode_cachep, GFP_KERNEL);
+ if (!ai)
+ return NULL;
+ ai->ctx = NULL;
+
+ return &ai->vfs_inode;
+}
+
+static void aio_free_inode(struct inode *inode)
+{
+ kmem_cache_free(aio_inode_cachep, AIO_I(inode));
+}
+
+static const struct super_operations aio_super_operations = {
+ .alloc_inode = aio_alloc_inode,
+ .free_inode = aio_free_inode,
+ .statfs = simple_statfs,
+};
+
static int aio_init_fs_context(struct fs_context *fc)
{
- if (!init_pseudo(fc, AIO_RING_MAGIC))
+ struct pseudo_fs_context *pfc;
+
+ pfc = init_pseudo(fc, AIO_RING_MAGIC);
+ if (!pfc)
return -ENOMEM;
fc->s_iflags |= SB_I_NOEXEC;
+ pfc->ops = &aio_super_operations;
return 0;
}
+static void init_once(void *obj)
+{
+ struct aio_inode_info *ai = obj;
+
+ inode_init_once(&ai->vfs_inode);
+ spin_lock_init(&ai->migrate_lock);
+}
+
/* aio_setup
* Creates the slab caches used by the aio routines, panic on
* failure as this is done early during the boot sequence.
@@ -294,6 +342,11 @@ static int __init aio_setup(void)
.init_fs_context = aio_init_fs_context,
.kill_sb = kill_anon_super,
};
+
+ aio_inode_cachep = kmem_cache_create("aio_inode_cache",
+ sizeof(struct aio_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT),
+ init_once);
aio_mnt = kern_mount(&aio_fs);
if (IS_ERR(aio_mnt))
panic("Failed to create aio fs mount.");
@@ -308,17 +361,17 @@ __initcall(aio_setup);
static void put_aio_ring_file(struct kioctx *ctx)
{
struct file *aio_ring_file = ctx->aio_ring_file;
- struct address_space *i_mapping;
if (aio_ring_file) {
- truncate_setsize(file_inode(aio_ring_file), 0);
+ struct inode *inode = file_inode(aio_ring_file);
+
+ truncate_setsize(inode, 0);
/* Prevent further access to the kioctx from migratepages */
- i_mapping = aio_ring_file->f_mapping;
- spin_lock(&i_mapping->i_private_lock);
- i_mapping->i_private_data = NULL;
+ spin_lock(&AIO_I(inode)->migrate_lock);
+ AIO_I(inode)->ctx = NULL;
ctx->aio_ring_file = NULL;
- spin_unlock(&i_mapping->i_private_lock);
+ spin_unlock(&AIO_I(inode)->migrate_lock);
fput(aio_ring_file);
}
@@ -408,13 +461,14 @@ static int aio_migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, enum migrate_mode mode)
{
struct kioctx *ctx;
+ struct aio_inode_info *ai = AIO_I(mapping->host);
unsigned long flags;
pgoff_t idx;
int rc = 0;
- /* mapping->i_private_lock here protects against the kioctx teardown. */
- spin_lock(&mapping->i_private_lock);
- ctx = mapping->i_private_data;
+ /* ai->migrate_lock here protects against the kioctx teardown. */
+ spin_lock(&ai->migrate_lock);
+ ctx = ai->ctx;
if (!ctx) {
rc = -EINVAL;
goto out;
@@ -467,7 +521,7 @@ static int aio_migrate_folio(struct address_space *mapping, struct folio *dst,
out_unlock:
mutex_unlock(&ctx->ring_lock);
out:
- spin_unlock(&mapping->i_private_lock);
+ spin_unlock(&ai->migrate_lock);
return rc;
}
#else
--
2.51.0
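
For readers less familiar with the embedding pattern this patch relies on,
here is a minimal userspace sketch of how AIO_I() recovers the containing
structure from a pointer to the embedded inode. The struct layouts are
simplified stand-ins, container_of() is re-implemented with offsetof(), and
aio_attach_ctx() is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct inode; the real one is much larger. */
struct inode {
	unsigned long i_ino;
};

/* Stand-in for struct kioctx; only its identity matters here. */
struct kioctx {
	int id;
};

/* Userspace re-implementation of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mirrors the shape of aio_inode_info from the patch (lock omitted). */
struct aio_inode_info {
	struct inode vfs_inode;
	struct kioctx *ctx;
};

static inline struct aio_inode_info *AIO_I(struct inode *inode)
{
	return container_of(inode, struct aio_inode_info, vfs_inode);
}

/* Hypothetical helper: stores ctx the way aio_private_file() does. */
static void aio_attach_ctx(struct inode *inode, struct kioctx *ctx)
{
	assert(inode != NULL);
	AIO_I(inode)->ctx = ctx;
}
```

The conversion is only valid for inodes that were really allocated as part
of a struct aio_inode_info, which is what the ->alloc_inode hook in the patch
guarantees for the aio pseudo filesystem.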
* [PATCH 14/32] fs: Remove i_private_data
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (12 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 13/32] aio: Stop using i_private_data and i_private_lock Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 15/32] fs: Drop osync_buffers_list() Jan Kara
` (17 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Nobody is using it anymore.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/inode.c | 1 -
include/linux/fs.h | 2 --
2 files changed, 3 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 4f98a5f04bbd..d5774e627a9c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -283,7 +283,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
atomic_set(&mapping->nr_thps, 0);
#endif
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
- mapping->i_private_data = NULL;
mapping->writeback_index = 0;
init_rwsem(&mapping->invalidate_lock);
lockdep_set_class_and_name(&mapping->invalidate_lock,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25e..10b96eb5391d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -465,7 +465,6 @@ extern const struct address_space_operations empty_aops;
* @wb_err: The most recent error which has occurred.
* @i_private_lock: For use by the owner of the address_space.
* @i_private_list: For use by the owner of the address_space.
- * @i_private_data: For use by the owner of the address_space.
*/
struct address_space {
struct inode *host;
@@ -486,7 +485,6 @@ struct address_space {
spinlock_t i_private_lock;
struct list_head i_private_list;
struct rw_semaphore i_mmap_rwsem;
- void * i_private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
* On most architectures that alignment is already the case; but
--
2.51.0
* [PATCH 15/32] fs: Drop osync_buffers_list()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (13 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 14/32] fs: Remove i_private_data Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
` (16 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
The function only waits for already locked buffers in the list of
metadata bhs. fsync_buffers_list() has just waited for all outstanding
IO on buffers so this isn't adding anything useful. The comment in front
of fsync_buffers_list() mentions concerns about buffers being moved out
from the tmp list back to the mapping's i_private_list, but these days
mark_buffer_dirty_inode() doesn't touch buffers with b_assoc_map set so
that cannot happen. Just delete the stale code.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 43 ++-----------------------------------------
1 file changed, 2 insertions(+), 41 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index c85ccfb1a4ec..1c0e7c81a38b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -526,41 +526,6 @@ int inode_has_buffers(struct inode *inode)
}
EXPORT_SYMBOL_GPL(inode_has_buffers);
-/*
- * osync is designed to support O_SYNC io. It waits synchronously for
- * all already-submitted IO to complete, but does not queue any new
- * writes to the disk.
- *
- * To do O_SYNC writes, just queue the buffer writes with write_dirty_buffer
- * as you dirty the buffers, and then use osync_inode_buffers to wait for
- * completion. Any other dirty buffers which are not yet queued for
- * write will not be flushed to disk by the osync.
- */
-static int osync_buffers_list(spinlock_t *lock, struct list_head *list)
-{
- struct buffer_head *bh;
- struct list_head *p;
- int err = 0;
-
- spin_lock(lock);
-repeat:
- list_for_each_prev(p, list) {
- bh = BH_ENTRY(p);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(lock);
- wait_on_buffer(bh);
- if (!buffer_uptodate(bh))
- err = -EIO;
- brelse(bh);
- spin_lock(lock);
- goto repeat;
- }
- }
- spin_unlock(lock);
- return err;
-}
-
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
@@ -777,7 +742,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
{
struct buffer_head *bh;
struct address_space *mapping;
- int err = 0, err2;
+ int err = 0;
struct blk_plug plug;
LIST_HEAD(tmp);
@@ -844,11 +809,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
}
spin_unlock(lock);
- err2 = osync_buffers_list(lock, list);
- if (err)
- return err;
- else
- return err2;
+ return err;
}
/*
--
2.51.0
* [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers()
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (14 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 15/32] fs: Drop osync_buffers_list() Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct Jan Kara
` (15 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
There is only a single caller of fsync_buffers_list(), so untangle the
code a bit by folding fsync_buffers_list() into sync_mapping_buffers().
Also merge the comments and update them to reflect the current state of
the code.
Signed-off-by: Jan Kara <jack@suse.cz>
---
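
The two-stage scheme described in the merged comment can be sketched in
userspace roughly as follows. This is a single-threaded toy model, not the
kernel code: struct buf, submit_write(), wait_on_buf() and sync_list() are
hypothetical names, locking is omitted, and "I/O" is just a flag flip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add(struct list_head *e, struct list_head *h)
{
	e->next = h->next;
	e->prev = h;
	h->next->prev = e;
	h->next = e;
}

/* Toy buffer: "dirty" needs writing, "locked" means I/O is in flight. */
struct buf {
	struct list_head assoc;	/* must stay the first member, see BUF() */
	bool dirty, locked, written;
};

#define BUF(p) ((struct buf *)(p))	/* valid because assoc is first */

/* Model of submitting a write: dirty -> locked. */
static void submit_write(struct buf *b) { b->dirty = false; b->locked = true; }

/* Model of waiting for I/O completion: locked -> written. */
static void wait_on_buf(struct buf *b)
{
	if (b->locked) {
		b->locked = false;
		b->written = true;
	}
}

/* Returns the number of buffers that were waited on. */
static int sync_list(struct list_head *list)
{
	struct list_head tmp;
	int waited = 0;

	list_init(&tmp);
	/* Stage 1: move dirty or in-flight buffers to tmp, queueing writes. */
	while (!list_empty(list)) {
		struct buf *b = BUF(list->next);

		list_del_init(&b->assoc);
		if (b->dirty || b->locked) {
			list_add(&b->assoc, &tmp);
			if (b->dirty)
				submit_write(b);
		}
	}
	/* Stage 2: wait only for what stage 1 collected. */
	while (!list_empty(&tmp)) {
		struct buf *b = BUF(tmp.prev);

		list_del_init(&b->assoc);
		if (b->dirty)	/* redirtied meanwhile: keep for the next sync */
			list_add(&b->assoc, list);
		wait_on_buf(b);
		waited++;
	}
	return waited;
}
```

Because stage 2 only walks the private tmp list, buffers dirtied after
stage 1 started are never waited on, which is what keeps the sync bounded
while somebody is actively writing.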
fs/buffer.c | 180 +++++++++++++++++++++++-----------------------------
1 file changed, 80 insertions(+), 100 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1c0e7c81a38b..18012afb8289 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -54,7 +54,6 @@
#include "internal.h"
-static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
enum rw_hint hint, struct writeback_control *wbc);
@@ -531,22 +530,96 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
* @mapping: the mapping which wants those buffers written
*
* Starts I/O against the buffers at mapping->i_private_list, and waits upon
- * that I/O.
+ * that I/O. Basically, this is a convenience function for fsync(). @mapping
+ * is a file or directory which needs those buffers to be written for a
+ * successful fsync().
*
- * Basically, this is a convenience function for fsync().
- * @mapping is a file or directory which needs those buffers to be written for
- * a successful fsync().
+ * We have conflicting pressures: we want to make sure that all
+ * initially dirty buffers get waited on, but that any subsequently
+ * dirtied buffers don't. After all, we don't want fsync to last
+ * forever if somebody is actively writing to the file.
+ *
+ * Do this in two main stages: first we copy dirty buffers to a
+ * temporary inode list, queueing the writes as we go. Then we clean
+ * up, waiting for those writes to complete. mark_buffer_dirty_inode()
+ * doesn't touch b_assoc_buffers list if b_assoc_map is not NULL so we
+ * are sure the buffer stays on our list until IO completes (at which point
+ * it can be reaped).
*/
int sync_mapping_buffers(struct address_space *mapping)
{
struct address_space *buffer_mapping =
mapping->host->i_sb->s_bdev->bd_mapping;
+ struct buffer_head *bh;
+ int err = 0;
+ struct blk_plug plug;
+ LIST_HEAD(tmp);
if (list_empty(&mapping->i_private_list))
return 0;
- return fsync_buffers_list(&buffer_mapping->i_private_lock,
- &mapping->i_private_list);
+ blk_start_plug(&plug);
+
+ spin_lock(&buffer_mapping->i_private_lock);
+ while (!list_empty(&mapping->i_private_list)) {
+ bh = BH_ENTRY(list->next);
+ WARN_ON_ONCE(bh->b_assoc_map != mapping);
+ __remove_assoc_queue(bh);
+ /* Avoid race with mark_buffer_dirty_inode() which does
+ * a lockless check and we rely on seeing the dirty bit */
+ smp_mb();
+ if (buffer_dirty(bh) || buffer_locked(bh)) {
+ list_add(&bh->b_assoc_buffers, &tmp);
+ bh->b_assoc_map = mapping;
+ if (buffer_dirty(bh)) {
+ get_bh(bh);
+ spin_unlock(&buffer_mapping->i_private_lock);
+ /*
+ * Ensure any pending I/O completes so that
+ * write_dirty_buffer() actually writes the
+ * current contents - it is a noop if I/O is
+ * still in flight on potentially older
+ * contents.
+ */
+ write_dirty_buffer(bh, REQ_SYNC);
+
+ /*
+ * Kick off IO for the previous mapping. Note
+ * that we will not run the very last mapping,
+ * wait_on_buffer() will do that for us
+ * through sync_buffer().
+ */
+ brelse(bh);
+ spin_lock(&buffer_mapping->i_private_lock);
+ }
+ }
+ }
+
+ spin_unlock(&buffer_mapping->i_private_lock);
+ blk_finish_plug(&plug);
+ spin_lock(&buffer_mapping->i_private_lock);
+
+ while (!list_empty(&tmp)) {
+ bh = BH_ENTRY(tmp.prev);
+ get_bh(bh);
+ __remove_assoc_queue(bh);
+ /* Avoid race with mark_buffer_dirty_inode() which does
+ * a lockless check and we rely on seeing the dirty bit */
+ smp_mb();
+ if (buffer_dirty(bh)) {
+ list_add(&bh->b_assoc_buffers,
+ &mapping->i_private_list);
+ bh->b_assoc_map = mapping;
+ }
+ spin_unlock(&buffer_mapping->i_private_lock);
+ wait_on_buffer(bh);
+ if (!buffer_uptodate(bh))
+ err = -EIO;
+ brelse(bh);
+ spin_lock(&buffer_mapping->i_private_lock);
+ }
+ spin_unlock(&buffer_mapping->i_private_lock);
+ return err;
}
EXPORT_SYMBOL(sync_mapping_buffers);
@@ -719,99 +792,6 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
}
EXPORT_SYMBOL(block_dirty_folio);
-/*
- * Write out and wait upon a list of buffers.
- *
- * We have conflicting pressures: we want to make sure that all
- * initially dirty buffers get waited on, but that any subsequently
- * dirtied buffers don't. After all, we don't want fsync to last
- * forever if somebody is actively writing to the file.
- *
- * Do this in two main stages: first we copy dirty buffers to a
- * temporary inode list, queueing the writes as we go. Then we clean
- * up, waiting for those writes to complete.
- *
- * During this second stage, any subsequent updates to the file may end
- * up refiling the buffer on the original inode's dirty list again, so
- * there is a chance we will end up with a buffer queued for write but
- * not yet completed on that list. So, as a final cleanup we go through
- * the osync code to catch these locked, dirty buffers without requeuing
- * any newly dirty buffers for write.
- */
-static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
-{
- struct buffer_head *bh;
- struct address_space *mapping;
- int err = 0;
- struct blk_plug plug;
- LIST_HEAD(tmp);
-
- blk_start_plug(&plug);
-
- spin_lock(lock);
- while (!list_empty(list)) {
- bh = BH_ENTRY(list->next);
- mapping = bh->b_assoc_map;
- __remove_assoc_queue(bh);
- /* Avoid race with mark_buffer_dirty_inode() which does
- * a lockless check and we rely on seeing the dirty bit */
- smp_mb();
- if (buffer_dirty(bh) || buffer_locked(bh)) {
- list_add(&bh->b_assoc_buffers, &tmp);
- bh->b_assoc_map = mapping;
- if (buffer_dirty(bh)) {
- get_bh(bh);
- spin_unlock(lock);
- /*
- * Ensure any pending I/O completes so that
- * write_dirty_buffer() actually writes the
- * current contents - it is a noop if I/O is
- * still in flight on potentially older
- * contents.
- */
- write_dirty_buffer(bh, REQ_SYNC);
-
- /*
- * Kick off IO for the previous mapping. Note
- * that we will not run the very last mapping,
- * wait_on_buffer() will do that for us
- * through sync_buffer().
- */
- brelse(bh);
- spin_lock(lock);
- }
- }
- }
-
- spin_unlock(lock);
- blk_finish_plug(&plug);
- spin_lock(lock);
-
- while (!list_empty(&tmp)) {
- bh = BH_ENTRY(tmp.prev);
- get_bh(bh);
- mapping = bh->b_assoc_map;
- __remove_assoc_queue(bh);
- /* Avoid race with mark_buffer_dirty_inode() which does
- * a lockless check and we rely on seeing the dirty bit */
- smp_mb();
- if (buffer_dirty(bh)) {
- list_add(&bh->b_assoc_buffers,
- &mapping->i_private_list);
- bh->b_assoc_map = mapping;
- }
- spin_unlock(lock);
- wait_on_buffer(bh);
- if (!buffer_uptodate(bh))
- err = -EIO;
- brelse(bh);
- spin_lock(lock);
- }
-
- spin_unlock(lock);
- return err;
-}
-
/*
* Invalidate any and all dirty buffers on a given inode. We are
* probably unmounting the fs, but that doesn't mean we have already
--
2.51.0
* [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (15 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 16/32] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs Jan Kara
` (14 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of tracking metadata bhs for a mapping using i_private_list and
i_private_lock, create a dedicated mapping_metadata_bhs struct for it.
So far this struct is embedded in address_space, but that will be
switched to the per-fs private inode parts later in the series. This
also changes the locking from the bdev mapping's i_private_lock to a
lock embedded in mapping_metadata_bhs, untangling the i_private_lock
locking for maintaining lists of metadata bhs from the locking for
looking up / reclaiming the bdev's buffer heads. The locking in
remove_assoc_queue() gets more complex due to this, but overall this
looks like a reasonable tradeoff.
Signed-off-by: Jan Kara <jack@suse.cz>
---
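
The lock-then-recheck loop used by remove_assoc_queue() can be illustrated
with a small userspace sketch. Everything here is a stand-in: struct owner
plays the role of the mapping with its mapping_metadata_bhs, struct item
plays the role of a buffer_head, and the spinlock is a toy built on C11
atomics. The sketch assumes owners outlive the call; the RCU pinning that
the patch uses for that purpose is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Tiny spinlock built on C11 atomics; stands in for spinlock_t. */
struct spinlock {
	atomic_flag held;
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical owner: a lock plus a count of attached items. */
struct owner {
	struct spinlock lock;
	int nr_items;
};

/* Hypothetical item; "owner" plays the role of bh->b_assoc_map. */
struct item {
	_Atomic(struct owner *) owner;
};

/* Attach an unowned item to an owner under the owner's lock. */
static void item_attach(struct item *it, struct owner *o)
{
	spin_lock(&o->lock);
	atomic_store(&it->owner, o);
	o->nr_items++;
	spin_unlock(&o->lock);
}

/*
 * Detach the item from whichever owner currently holds it.
 * Assumption: owners stay valid for the duration of this call.
 */
static void item_detach(struct item *it)
{
	struct owner *o;

	while ((o = atomic_load(&it->owner)) != NULL) {
		spin_lock(&o->lock);
		/* Recheck under the lock: the item may have moved meanwhile. */
		if (atomic_load(&it->owner) == o) {
			atomic_store(&it->owner, NULL);
			o->nr_items--;
		}
		spin_unlock(&o->lock);
	}
}
```

If the recheck fails, the loop simply samples the owner again, so the
function returns only once the item is observed without an owner.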
fs/buffer.c | 138 +++++++++++++++++++++------------------------
fs/inode.c | 2 +
include/linux/fs.h | 7 +++
3 files changed, 74 insertions(+), 73 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 18012afb8289..d39ae6581c26 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -469,30 +469,13 @@ EXPORT_SYMBOL(mark_buffer_async_write);
*
* The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
* inode_has_buffers() and invalidate_inode_buffers() are provided for the
- * management of a list of dependent buffers at ->i_mapping->i_private_list.
- *
- * Locking is a little subtle: try_to_free_buffers() will remove buffers
- * from their controlling inode's queue when they are being freed. But
- * try_to_free_buffers() will be operating against the *blockdev* mapping
- * at the time, not against the S_ISREG file which depends on those buffers.
- * So the locking for i_private_list is via the i_private_lock in the address_space
- * which backs the buffers. Which is different from the address_space
- * against which the buffers are listed. So for a particular address_space,
- * mapping->i_private_lock does *not* protect mapping->i_private_list! In fact,
- * mapping->i_private_list will always be protected by the backing blockdev's
- * ->i_private_lock.
- *
- * Which introduces a requirement: all buffers on an address_space's
- * ->i_private_list must be from the same address_space: the blockdev's.
- *
- * address_spaces which do not place buffers at ->i_private_list via these
- * utility functions are free to use i_private_lock and i_private_list for
- * whatever they want. The only requirement is that list_empty(i_private_list)
- * be true at clear_inode() time.
- *
- * FIXME: clear_inode should not call invalidate_inode_buffers(). The
- * filesystems should do that. invalidate_inode_buffers() should just go
- * BUG_ON(!list_empty).
+ * management of a list of dependent buffers in mapping_metadata_bhs struct.
+ *
+ * The locking is a little subtle: The list of buffer heads is protected by
+ * the lock in mapping_metadata_bhs so functions coming from bdev mapping
+ * (such as try_to_free_buffers()) need to safely get to mapping_metadata_bhs
+ * using RCU, grab the lock, verify we didn't race with somebody detaching the
+ * bh / moving it to a different inode and only then proceed.
*
* FIXME: mark_buffer_dirty_inode() is a data-plane operation. It should
* take an address_space, not an inode. And it should be called
@@ -509,19 +492,45 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* b_inode back.
*/
-/*
- * The buffer's backing address_space's i_private_lock must be held
- */
-static void __remove_assoc_queue(struct buffer_head *bh)
+static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
+ struct buffer_head *bh)
{
+ lockdep_assert_held(&mmb->lock);
list_del_init(&bh->b_assoc_buffers);
WARN_ON(!bh->b_assoc_map);
bh->b_assoc_map = NULL;
}
+static void remove_assoc_queue(struct buffer_head *bh)
+{
+ struct address_space *mapping;
+ struct mapping_metadata_bhs *mmb;
+
+ /*
+ * The locking dance is ugly here. We need to acquire lock
+ * protecting metadata bh list while possibly racing with bh
+ * being removed from the list or moved to a different one. We
+ * use RCU to pin mapping_metadata_bhs in memory to
+ * opportunistically acquire the lock and then recheck the bh
+ * didn't move under us.
+ */
+ while (bh->b_assoc_map) {
+ rcu_read_lock();
+ mapping = READ_ONCE(bh->b_assoc_map);
+ if (mapping) {
+ mmb = &mapping->i_metadata_bhs;
+ spin_lock(&mmb->lock);
+ if (bh->b_assoc_map == mapping)
+ __remove_assoc_queue(mmb, bh);
+ spin_unlock(&mmb->lock);
+ }
+ rcu_read_unlock();
+ }
+}
+
int inode_has_buffers(struct inode *inode)
{
- return !list_empty(&inode->i_data.i_private_list);
+ return !list_empty(&inode->i_data.i_metadata_bhs.list);
}
EXPORT_SYMBOL_GPL(inode_has_buffers);
@@ -529,7 +538,7 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
*
- * Starts I/O against the buffers at mapping->i_private_list, and waits upon
+ * Starts I/O against the buffers at mapping->i_metadata_bhs and waits upon
* that I/O. Basically, this is a convenience function for fsync(). @mapping
* is a file or directory which needs those buffers to be written for a
* successful fsync().
@@ -548,23 +557,22 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct address_space *buffer_mapping =
- mapping->host->i_sb->s_bdev->bd_mapping;
+ struct mapping_metadata_bhs *mmb = &mapping->i_metadata_bhs;
struct buffer_head *bh;
int err = 0;
struct blk_plug plug;
LIST_HEAD(tmp);
- if (list_empty(&mapping->i_private_list))
+ if (list_empty(&mmb->list))
return 0;
blk_start_plug(&plug);
- spin_lock(&buffer_mapping->i_private_lock);
- while (!list_empty(&mapping->i_private_list)) {
- bh = BH_ENTRY(list->next);
+ spin_lock(&mmb->lock);
+ while (!list_empty(&mmb->list)) {
+ bh = BH_ENTRY(mmb->list.next);
WARN_ON_ONCE(bh->b_assoc_map != mapping);
- __remove_assoc_queue(bh);
+ __remove_assoc_queue(mmb, bh);
/* Avoid race with mark_buffer_dirty_inode() which does
* a lockless check and we rely on seeing the dirty bit */
smp_mb();
@@ -573,7 +581,7 @@ int sync_mapping_buffers(struct address_space *mapping)
bh->b_assoc_map = mapping;
if (buffer_dirty(bh)) {
get_bh(bh);
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
/*
* Ensure any pending I/O completes so that
* write_dirty_buffer() actually writes the
@@ -590,35 +598,34 @@ int sync_mapping_buffers(struct address_space *mapping)
* through sync_buffer().
*/
brelse(bh);
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mmb->lock);
}
}
}
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
blk_finish_plug(&plug);
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mmb->lock);
while (!list_empty(&tmp)) {
bh = BH_ENTRY(tmp.prev);
get_bh(bh);
- __remove_assoc_queue(bh);
+ __remove_assoc_queue(mmb, bh);
/* Avoid race with mark_buffer_dirty_inode() which does
* a lockless check and we rely on seeing the dirty bit */
smp_mb();
if (buffer_dirty(bh)) {
- list_add(&bh->b_assoc_buffers,
- &mapping->i_private_list);
+ list_add(&bh->b_assoc_buffers, &mmb->list);
bh->b_assoc_map = mapping;
}
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
wait_on_buffer(bh);
if (!buffer_uptodate(bh))
err = -EIO;
brelse(bh);
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mmb->lock);
}
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mmb->lock);
return err;
}
EXPORT_SYMBOL(sync_mapping_buffers);
@@ -715,15 +722,14 @@ void write_boundary_block(struct block_device *bdev,
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
struct address_space *mapping = inode->i_mapping;
- struct address_space *buffer_mapping = bh->b_folio->mapping;
mark_buffer_dirty(bh);
if (!bh->b_assoc_map) {
- spin_lock(&buffer_mapping->i_private_lock);
+ spin_lock(&mapping->i_metadata_bhs.lock);
list_move_tail(&bh->b_assoc_buffers,
- &mapping->i_private_list);
+ &mapping->i_metadata_bhs.list);
bh->b_assoc_map = mapping;
- spin_unlock(&buffer_mapping->i_private_lock);
+ spin_unlock(&mapping->i_metadata_bhs.lock);
}
}
EXPORT_SYMBOL(mark_buffer_dirty_inode);
@@ -796,22 +802,16 @@ EXPORT_SYMBOL(block_dirty_folio);
* Invalidate any and all dirty buffers on a given inode. We are
* probably unmounting the fs, but that doesn't mean we have already
* done a sync(). Just drop the buffers from the inode list.
- *
- * NOTE: we take the inode's blockdev's mapping's i_private_lock. Which
- * assumes that all the buffers are against the blockdev.
*/
void invalidate_inode_buffers(struct inode *inode)
{
if (inode_has_buffers(inode)) {
- struct address_space *mapping = &inode->i_data;
- struct list_head *list = &mapping->i_private_list;
- struct address_space *buffer_mapping =
- mapping->host->i_sb->s_bdev->bd_mapping;
-
- spin_lock(&buffer_mapping->i_private_lock);
- while (!list_empty(list))
- __remove_assoc_queue(BH_ENTRY(list->next));
- spin_unlock(&buffer_mapping->i_private_lock);
+ struct mapping_metadata_bhs *mmb = &inode->i_data.i_metadata_bhs;
+
+ spin_lock(&mmb->lock);
+ while (!list_empty(&mmb->list))
+ __remove_assoc_queue(mmb, BH_ENTRY(mmb->list.next));
+ spin_unlock(&mmb->lock);
}
}
EXPORT_SYMBOL(invalidate_inode_buffers);
@@ -1155,14 +1155,7 @@ EXPORT_SYMBOL(__brelse);
void __bforget(struct buffer_head *bh)
{
clear_buffer_dirty(bh);
- if (bh->b_assoc_map) {
- struct address_space *buffer_mapping = bh->b_folio->mapping;
-
- spin_lock(&buffer_mapping->i_private_lock);
- list_del_init(&bh->b_assoc_buffers);
- bh->b_assoc_map = NULL;
- spin_unlock(&buffer_mapping->i_private_lock);
- }
+ remove_assoc_queue(bh);
__brelse(bh);
}
EXPORT_SYMBOL(__bforget);
@@ -2810,8 +2803,7 @@ drop_buffers(struct folio *folio, struct buffer_head **buffers_to_free)
do {
struct buffer_head *next = bh->b_this_page;
- if (bh->b_assoc_map)
- __remove_assoc_queue(bh);
+ remove_assoc_queue(bh);
bh = next;
} while (bh != head);
*buffers_to_free = head;
diff --git a/fs/inode.c b/fs/inode.c
index d5774e627a9c..393f586d050a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -483,6 +483,8 @@ static void __address_space_init_once(struct address_space *mapping)
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->i_private_list);
spin_lock_init(&mapping->i_private_lock);
+ spin_lock_init(&mapping->i_metadata_bhs.lock);
+ INIT_LIST_HEAD(&mapping->i_metadata_bhs.list);
mapping->i_mmap = RB_ROOT_CACHED;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 10b96eb5391d..64771a55adc5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -445,6 +445,12 @@ struct address_space_operations {
extern const struct address_space_operations empty_aops;
+/* Structure for tracking metadata buffer heads associated with the mapping */
+struct mapping_metadata_bhs {
+ spinlock_t lock; /* Lock protecting bh list */
+ struct list_head list; /* The list of bhs (b_assoc_buffers) */
+};
+
/**
* struct address_space - Contents of a cacheable, mappable object.
* @host: Owner, either the inode or the block_device.
@@ -484,6 +490,7 @@ struct address_space {
errseq_t wb_err;
spinlock_t i_private_lock;
struct list_head i_private_list;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct rw_semaphore i_mmap_rwsem;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
--
2.51.0
* [PATCH 18/32] fs: Provide operation for fetching mapping_metadata_bhs
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (16 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 17/32] fs: Move metadata bhs tracking to a separate struct Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 19/32] ntfs3: Drop pointless sync_mapping_buffers() call Jan Kara
` (13 subsequent siblings)
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
When we move mapping_metadata_bhs to the fs-private part of an inode,
the generic code will need a way to get to this struct from the generic
struct inode. Add an inode operation for this, similar to the operation
for grabbing offset_ctx.
Signed-off-by: Jan Kara <jack@suse.cz>
---
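
As a rough illustration of how a filesystem might wire up the new
operation, here is a userspace sketch of the optional-hook-with-fallback
lookup. The types are heavily simplified stand-ins, and "myfs" with its
struct myfs_inode, MYFS_I() and myfs_get_metadata_bhs() is a hypothetical
filesystem, not one converted in this series.

```c
#include <assert.h>
#include <stddef.h>

struct mmb { int nr; };		/* stand-in for mapping_metadata_bhs */
struct inode;

struct inode_operations {
	/* Optional: NULL means "use the copy embedded in the inode". */
	struct mmb *(*get_metadata_bhs)(struct inode *inode);
};

struct inode {
	const struct inode_operations *i_op;
	struct mmb embedded;	/* stand-in for i_mapping->i_metadata_bhs */
};

/* Hypothetical filesystem keeping the tracking struct in its own inode. */
struct myfs_inode {
	struct mmb mmb;
	struct inode vfs_inode;
};

#define MYFS_I(ptr) \
	((struct myfs_inode *)((char *)(ptr) - \
			       offsetof(struct myfs_inode, vfs_inode)))

static struct mmb *myfs_get_metadata_bhs(struct inode *inode)
{
	return &MYFS_I(inode)->mmb;
}

static const struct inode_operations myfs_iops = {
	.get_metadata_bhs = myfs_get_metadata_bhs,
};

/* Operations table of a filesystem that does not provide the hook. */
static const struct inode_operations plain_iops = {
	.get_metadata_bhs = NULL,
};

/* Same shape as inode_get_metadata_bhs() in the patch. */
static struct mmb *inode_get_metadata_bhs(struct inode *inode)
{
	if (inode->i_op->get_metadata_bhs)
		return inode->i_op->get_metadata_bhs(inode);
	return &inode->embedded;
}
```

The fallback lets filesystems be converted one at a time: those without the
hook keep using the embedded copy until it is removed.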
fs/buffer.c | 35 +++++++++++++++++++++++++----------
include/linux/buffer_head.h | 1 +
include/linux/fs.h | 1 +
3 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index d39ae6581c26..d7a1d72302da 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -492,6 +492,20 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* b_inode back.
*/
+void mmb_init(struct mapping_metadata_bhs *mmb)
+{
+ spin_lock_init(&mmb->lock);
+ INIT_LIST_HEAD(&mmb->list);
+}
+EXPORT_SYMBOL(mmb_init);
+
+static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
+{
+ if (inode->i_op->get_metadata_bhs)
+ return inode->i_op->get_metadata_bhs(inode);
+ return &inode->i_mapping->i_metadata_bhs;
+}
+
static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
struct buffer_head *bh)
{
@@ -518,7 +532,7 @@ static void remove_assoc_queue(struct buffer_head *bh)
rcu_read_lock();
mapping = READ_ONCE(bh->b_assoc_map);
if (mapping) {
- mmb = &mapping->i_metadata_bhs;
+ mmb = inode_get_metadata_bhs(mapping->host);
spin_lock(&mmb->lock);
if (bh->b_assoc_map == mapping)
__remove_assoc_queue(mmb, bh);
@@ -557,7 +571,8 @@ EXPORT_SYMBOL_GPL(inode_has_buffers);
*/
int sync_mapping_buffers(struct address_space *mapping)
{
- struct mapping_metadata_bhs *mmb = &mapping->i_metadata_bhs;
+ struct mapping_metadata_bhs *mmb =
+ inode_get_metadata_bhs(mapping->host);
struct buffer_head *bh;
int err = 0;
struct blk_plug plug;
@@ -721,15 +736,15 @@ void write_boundary_block(struct block_device *bdev,
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
- struct address_space *mapping = inode->i_mapping;
-
mark_buffer_dirty(bh);
if (!bh->b_assoc_map) {
- spin_lock(&mapping->i_metadata_bhs.lock);
- list_move_tail(&bh->b_assoc_buffers,
- &mapping->i_metadata_bhs.list);
- bh->b_assoc_map = mapping;
- spin_unlock(&mapping->i_metadata_bhs.lock);
+ struct mapping_metadata_bhs *mmb;
+
+ mmb = inode_get_metadata_bhs(inode);
+ spin_lock(&mmb->lock);
+ list_move_tail(&bh->b_assoc_buffers, &mmb->list);
+ bh->b_assoc_map = inode->i_mapping;
+ spin_unlock(&mmb->lock);
}
}
EXPORT_SYMBOL(mark_buffer_dirty_inode);
@@ -806,7 +821,7 @@ EXPORT_SYMBOL(block_dirty_folio);
void invalidate_inode_buffers(struct inode *inode)
{
if (inode_has_buffers(inode)) {
- struct mapping_metadata_bhs *mmb = &inode->i_data.i_metadata_bhs;
+ struct mapping_metadata_bhs *mmb = inode_get_metadata_bhs(inode);
spin_lock(&mmb->lock);
while (!list_empty(&mmb->list))
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 631bf971efc0..623ee66d41a8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -515,6 +515,7 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio);
void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
+void mmb_init(struct mapping_metadata_bhs *mmb);
int inode_has_buffers(struct inode *inode);
void invalidate_inode_buffers(struct inode *inode);
int sync_mapping_buffers(struct address_space *mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 64771a55adc5..b4d9be1fefa4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2046,6 +2046,7 @@ struct inode_operations {
struct dentry *dentry, struct file_kattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa);
struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
+ struct mapping_metadata_bhs *(*get_metadata_bhs)(struct inode *inode);
} ____cacheline_aligned;
/* Did the driver provide valid mmap hook configuration? */
--
2.51.0
* [PATCH 19/32] ntfs3: Drop pointless sync_mapping_buffers() call
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Konstantin Komarov, ntfs3
ntfs3 never calls mark_buffer_dirty_inode() and thus its metadata
buffers list is always empty. Drop the pointless sync_mapping_buffers()
call.
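The reasoning above can be sketched as a small userspace model in C. This is not kernel code: `struct bh_queue`, `struct fake_bh` and the `model_*` functions are hypothetical stand-ins for mapping_metadata_bhs, buffer_head, mark_buffer_dirty_inode() and sync_mapping_buffers(), and all locking is omitted. The sketch assumes that the mark-dirty path is the only way a buffer gets queued, so a filesystem that never takes that path always syncs an empty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical model of mapping_metadata_bhs (list only, no lock). */
struct bh_queue {
	struct list_node list;
};

/* Hypothetical model of a buffer head that can be queued on the list. */
struct fake_bh {
	struct list_node assoc;
	bool queued;
	bool dirty;
};

static void bh_queue_init(struct bh_queue *q)
{
	q->list.next = q->list.prev = &q->list;
}

/* Models mark_buffer_dirty_inode(): the only path that queues a buffer. */
static void model_mark_dirty_inode(struct bh_queue *q, struct fake_bh *bh)
{
	bh->dirty = true;
	if (bh->queued)
		return;
	/* Append at the tail of the circular list. */
	bh->assoc.prev = q->list.prev;
	bh->assoc.next = &q->list;
	q->list.prev->next = &bh->assoc;
	q->list.prev = &bh->assoc;
	bh->queued = true;
}

/* Models sync_mapping_buffers(): returns the number of buffers written. */
static int model_sync_buffers(struct bh_queue *q)
{
	int written = 0;

	while (q->list.next != &q->list) {
		struct list_node *node = q->list.next;
		struct fake_bh *bh = (struct fake_bh *)
			((char *)node - offsetof(struct fake_bh, assoc));

		assert(bh->queued);
		/* Unlink the buffer and pretend it was written out. */
		node->prev->next = node->next;
		node->next->prev = node->prev;
		bh->queued = false;
		bh->dirty = false;
		written++;
	}
	return written;
}
```

In this model, calling model_sync_buffers() on a freshly initialized queue returns 0 without touching anything, which is the situation the message describes for ntfs3.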
CC: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
CC: ntfs3@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ntfs3/file.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 7eecf1e01f74..570c92fa7ee7 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -387,9 +387,6 @@ static int ntfs_extend(struct inode *inode, loff_t pos, size_t count,
int err2;
err = filemap_fdatawrite_range(mapping, pos, end - 1);
- err2 = sync_mapping_buffers(mapping);
- if (!err)
- err = err2;
err2 = write_inode_now(inode, 1);
if (!err)
err = err2;
--
2.51.0
* [PATCH 20/32] ocfs2: Drop pointless sync_mapping_buffers() calls
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Joel Becker, Joseph Qi, ocfs2-devel
ocfs2 never calls mark_buffer_dirty_inode() and thus its metadata
buffers list is always empty. Drop the pointless sync_mapping_buffers()
calls.
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: ocfs2-devel@lists.linux.dev
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ocfs2/dlmglue.c | 1 -
fs/ocfs2/namei.c | 3 ---
2 files changed, 4 deletions(-)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index bd2ddb7d841d..7283bb2c5a31 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3971,7 +3971,6 @@ static int ocfs2_data_convert_worker(struct ocfs2_lock_res *lockres,
mlog(ML_ERROR, "Could not sync inode %llu for downconvert!",
(unsigned long long)OCFS2_I(inode)->ip_blkno);
}
- sync_mapping_buffers(mapping);
if (blocking == DLM_LOCK_EX) {
truncate_inode_pages(mapping, 0);
} else {
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 268b79339a51..1277666c77cd 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -1683,9 +1683,6 @@ static int ocfs2_rename(struct mnt_idmap *idmap,
if (rename_lock)
ocfs2_rename_unlock(osb);
- if (new_inode)
- sync_mapping_buffers(old_inode->i_mapping);
-
iput(new_inode);
ocfs2_free_dir_lookup_result(&target_lookup_res);
--
2.51.0
* [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara, Jens Axboe, linux-block
Nobody calls mark_buffer_dirty_inode() with the internal bdev inode, and
it doesn't make sense for the internal bdev inode to have any metadata
buffer heads. Just drop the pointless invalidate_inode_buffers() call.
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-block@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
block/bdev.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/bdev.c b/block/bdev.c
index ed022f8c48c7..ad1660b6b324 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -420,7 +420,6 @@ static void init_once(void *data)
static void bdev_evict_inode(struct inode *inode)
{
truncate_inode_pages_final(&inode->i_data);
- invalidate_inode_buffers(inode); /* is it needed here? */
clear_inode(inode);
}
--
2.51.0
* [PATCH 22/32] fs: Switch inode_has_buffers() to take mapping_metadata_bhs
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
inode_has_buffers() is trivial and is also used internally, so it is
pointless to look up mapping_metadata_bhs on each invocation. Let the
function take the mapping_metadata_bhs struct directly and rename it to
mmb_has_buffers().
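A rough userspace model of the calling pattern this enables: the caller resolves the tracking struct once and then passes it to the emptiness check. All names here (`struct tracked_bhs`, `struct demo_inode`, `demo_inode_get_mmb()` and so on) are hypothetical stand-ins, the `lookups` counter exists only to make the single lookup visible, and locking is left out.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct dl_node {
	struct dl_node *next, *prev;
};

/* Hypothetical model of mapping_metadata_bhs (list only, no lock). */
struct tracked_bhs {
	struct dl_node list;
};

/* Hypothetical inode model; 'lookups' counts accessor calls. */
struct demo_inode {
	struct tracked_bhs mmb;
	int lookups;
};

static void tracked_bhs_init(struct tracked_bhs *mmb)
{
	mmb->list.next = mmb->list.prev = &mmb->list;
}

/* Queue a node at the tail of the tracking list. */
static void tracked_bhs_add(struct tracked_bhs *mmb, struct dl_node *n)
{
	n->prev = mmb->list.prev;
	n->next = &mmb->list;
	mmb->list.prev->next = n;
	mmb->list.prev = n;
}

/* Stand-in for inode_get_metadata_bhs(): one lookup per call. */
static struct tracked_bhs *demo_inode_get_mmb(struct demo_inode *inode)
{
	inode->lookups++;
	return &inode->mmb;
}

/* Models mmb_has_buffers(): operates on the struct, not on the inode. */
static bool tracked_bhs_has_buffers(const struct tracked_bhs *mmb)
{
	return mmb->list.next != &mmb->list;
}

/* Models invalidate_inode_buffers(): look up once, then reuse 'mmb'. */
static int demo_invalidate_buffers(struct demo_inode *inode)
{
	struct tracked_bhs *mmb = demo_inode_get_mmb(inode);
	int dropped = 0;

	if (tracked_bhs_has_buffers(mmb)) {
		while (mmb->list.next != &mmb->list) {
			struct dl_node *n = mmb->list.next;

			/* Unlink and leave the node self-linked. */
			n->prev->next = n->next;
			n->next->prev = n->prev;
			n->next = n->prev = n;
			dropped++;
		}
	}
	return dropped;
}
```

Had the check taken the inode instead, the model would need a second demo_inode_get_mmb() call inside it, which is the redundancy the message argues against.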
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 14 +++++++-------
fs/ext4/inode.c | 2 +-
include/linux/buffer_head.h | 2 +-
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index d7a1d72302da..096a8d9e3280 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -468,7 +468,7 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* written back and waited upon before fsync() returns.
*
* The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
- * inode_has_buffers() and invalidate_inode_buffers() are provided for the
+ * mmb_has_buffers() and invalidate_inode_buffers() are provided for the
* management of a list of dependent buffers in mapping_metadata_bhs struct.
*
* The locking is a little subtle: The list of buffer heads is protected by
@@ -542,11 +542,11 @@ static void remove_assoc_queue(struct buffer_head *bh)
}
}
-int inode_has_buffers(struct inode *inode)
+bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
- return !list_empty(&inode->i_data.i_metadata_bhs.list);
+ return !list_empty(&mmb->list);
}
-EXPORT_SYMBOL_GPL(inode_has_buffers);
+EXPORT_SYMBOL_GPL(mmb_has_buffers);
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
@@ -578,7 +578,7 @@ int sync_mapping_buffers(struct address_space *mapping)
struct blk_plug plug;
LIST_HEAD(tmp);
- if (list_empty(&mmb->list))
+ if (!mmb_has_buffers(mmb))
return 0;
blk_start_plug(&plug);
@@ -820,9 +820,9 @@ EXPORT_SYMBOL(block_dirty_folio);
*/
void invalidate_inode_buffers(struct inode *inode)
{
- if (inode_has_buffers(inode)) {
- struct mapping_metadata_bhs *mmb = inode_get_metadata_bhs(inode);
+ struct mapping_metadata_bhs *mmb = inode_get_metadata_bhs(inode);
+ if (mmb_has_buffers(mmb)) {
spin_lock(&mmb->lock);
while (!list_empty(&mmb->list))
__remove_assoc_queue(mmb, BH_ENTRY(mmb->list.next));
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6f892abef003..011cb2eb16a2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3436,7 +3436,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
/* Any metadata buffers to write? */
- if (inode_has_buffers(inode))
+ if (mmb_has_buffers(&inode->i_mapping->i_metadata_bhs))
return true;
return inode_state_read_once(inode) & I_DIRTY_DATASYNC;
}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 623ee66d41a8..ebbd73c45e63 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -516,7 +516,7 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio);
void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
void mmb_init(struct mapping_metadata_bhs *mmb);
-int inode_has_buffers(struct inode *inode);
+bool mmb_has_buffers(struct mapping_metadata_bhs *mmb);
void invalidate_inode_buffers(struct inode *inode);
int sync_mapping_buffers(struct address_space *mapping);
void invalidate_bh_lrus(void);
--
2.51.0
* [PATCH 23/32] ext2: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode.
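The pattern used by this and the following per-filesystem patches can be modelled in userspace C roughly as follows. The types and names (`struct vfs_inode_model`, `struct demo_fs_inode_info`, `DEMO_FS_I()`, `model_inode_get_metadata_bhs()`) are hypothetical, and returning NULL when no hook is set is an assumption of this sketch, not something the series states about inode_get_metadata_bhs().

```c
#include <stddef.h>

struct vfs_inode_model;

/* Hypothetical, simplified stand-in for mapping_metadata_bhs. */
struct mmb_model {
	int nr_buffers;
};

/* Only the hook relevant here, modelled on inode_operations. */
struct iops_model {
	struct mmb_model *(*get_metadata_bhs)(struct vfs_inode_model *inode);
};

/* Generic inode: knows nothing about where the tracking struct lives. */
struct vfs_inode_model {
	const struct iops_model *i_op;
};

/* Filesystem-private inode embedding both the mmb and the VFS inode. */
struct demo_fs_inode_info {
	unsigned int flags;
	struct mmb_model i_metadata_bhs;
	struct vfs_inode_model vfs_inode;
};

/* container_of()-style conversion from VFS inode to private inode. */
static struct demo_fs_inode_info *DEMO_FS_I(struct vfs_inode_model *inode)
{
	return (struct demo_fs_inode_info *)
		((char *)inode - offsetof(struct demo_fs_inode_info, vfs_inode));
}

/* Per-filesystem accessor, analogous to ext2_get_metadata_bhs(). */
static struct mmb_model *demo_fs_get_metadata_bhs(struct vfs_inode_model *inode)
{
	return &DEMO_FS_I(inode)->i_metadata_bhs;
}

static const struct iops_model demo_fs_file_iops = {
	.get_metadata_bhs = demo_fs_get_metadata_bhs,
};

/*
 * Generic dispatch through the hook. The NULL fallback for inodes
 * without a hook is an assumption made for this sketch only.
 */
static struct mmb_model *
model_inode_get_metadata_bhs(struct vfs_inode_model *inode)
{
	if (!inode->i_op || !inode->i_op->get_metadata_bhs)
		return NULL;
	return inode->i_op->get_metadata_bhs(inode);
}
```

The model also shows why every inode_operations table of a filesystem needs the hook: generic code can reach the embedded struct only through `i_op`.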
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext2/ext2.h | 2 ++
fs/ext2/file.c | 1 +
fs/ext2/namei.c | 2 ++
fs/ext2/super.c | 6 ++++++
fs/ext2/symlink.c | 2 ++
5 files changed, 13 insertions(+)
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 5e0c6c5fcb6c..2b6593ba107f 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -676,6 +676,7 @@ struct ext2_inode_info {
#ifdef CONFIG_QUOTA
struct dquot __rcu *i_dquot[MAXQUOTAS];
#endif
+ struct mapping_metadata_bhs i_metadata_bhs;
};
/*
@@ -766,6 +767,7 @@ void ext2_msg(struct super_block *, const char *, const char *, ...);
extern void ext2_update_dynamic_rev (struct super_block *sb);
extern void ext2_sync_super(struct super_block *sb, struct ext2_super_block *es,
int wait);
+struct mapping_metadata_bhs *ext2_get_metadata_bhs(struct inode *inode);
/*
* Inodes and files operations
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index ebe356a38b18..2dbf3e7c2e9c 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -338,4 +338,5 @@ const struct inode_operations ext2_file_inode_operations = {
.fiemap = ext2_fiemap,
.fileattr_get = ext2_fileattr_get,
.fileattr_set = ext2_fileattr_set,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index bde617a66cec..70c94adce837 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -422,6 +422,7 @@ const struct inode_operations ext2_dir_inode_operations = {
.tmpfile = ext2_tmpfile,
.fileattr_get = ext2_fileattr_get,
.fileattr_set = ext2_fileattr_set,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
const struct inode_operations ext2_special_inode_operations = {
@@ -430,4 +431,5 @@ const struct inode_operations ext2_special_inode_operations = {
.setattr = ext2_setattr,
.get_inode_acl = ext2_get_acl,
.set_acl = ext2_set_acl,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 603f2641fe10..503c25cae27c 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -215,6 +215,7 @@ static struct inode *ext2_alloc_inode(struct super_block *sb)
#ifdef CONFIG_QUOTA
memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
#endif
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -259,6 +260,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(ext2_inode_cachep);
}
+struct mapping_metadata_bhs *ext2_get_metadata_bhs(struct inode *inode)
+{
+ return &EXT2_I(inode)->i_metadata_bhs;
+}
+
static int ext2_show_options(struct seq_file *seq, struct dentry *root)
{
struct super_block *sb = root->d_sb;
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 948d3a441403..c82a15d28772 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -26,6 +26,7 @@ const struct inode_operations ext2_symlink_inode_operations = {
.getattr = ext2_getattr,
.setattr = ext2_setattr,
.listxattr = ext2_listxattr,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
const struct inode_operations ext2_fast_symlink_inode_operations = {
@@ -33,4 +34,5 @@ const struct inode_operations ext2_fast_symlink_inode_operations = {
.getattr = ext2_getattr,
.setattr = ext2_setattr,
.listxattr = ext2_listxattr,
+ .get_metadata_bhs = ext2_get_metadata_bhs,
};
--
2.51.0
* [PATCH 24/32] affs: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/affs/affs.h | 2 ++
fs/affs/dir.c | 1 +
fs/affs/file.c | 1 +
fs/affs/super.c | 6 ++++++
fs/affs/symlink.c | 1 +
5 files changed, 11 insertions(+)
diff --git a/fs/affs/affs.h b/fs/affs/affs.h
index ac4e9a02910b..a1eb400e1018 100644
--- a/fs/affs/affs.h
+++ b/fs/affs/affs.h
@@ -44,6 +44,7 @@ struct affs_inode_info {
struct mutex i_link_lock; /* Protects internal inode access. */
struct mutex i_ext_lock; /* Protects internal inode access. */
#define i_hash_lock i_ext_lock
+ struct mapping_metadata_bhs i_metadata_bhs;
u32 i_blkcnt; /* block count */
u32 i_extcnt; /* extended block count */
u32 *i_lc; /* linear cache of extended blocks */
@@ -151,6 +152,7 @@ extern bool affs_nofilenametruncate(const struct dentry *dentry);
extern int affs_check_name(const unsigned char *name, int len,
bool notruncate);
extern int affs_copy_name(unsigned char *bstr, struct dentry *dentry);
+struct mapping_metadata_bhs *affs_get_metadata_bhs(struct inode *inode);
/* bitmap. c */
diff --git a/fs/affs/dir.c b/fs/affs/dir.c
index 5c8d83387a39..6b0314c84972 100644
--- a/fs/affs/dir.c
+++ b/fs/affs/dir.c
@@ -72,6 +72,7 @@ const struct inode_operations affs_dir_inode_operations = {
.rmdir = affs_rmdir,
.rename = affs_rename2,
.setattr = affs_notify_change,
+ .get_metadata_bhs = affs_get_metadata_bhs,
};
static int
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 6c9258359ddb..4dbd9351eea0 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -1014,4 +1014,5 @@ const struct file_operations affs_file_operations = {
const struct inode_operations affs_file_inode_operations = {
.setattr = affs_notify_change,
+ .get_metadata_bhs = affs_get_metadata_bhs,
};
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 8451647f3fea..dff272df0636 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -108,6 +108,7 @@ static struct inode *affs_alloc_inode(struct super_block *sb)
i->i_lc = NULL;
i->i_ext_bh = NULL;
i->i_pa_cnt = 0;
+ mmb_init(&i->i_metadata_bhs);
return &i->vfs_inode;
}
@@ -147,6 +148,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(affs_inode_cachep);
}
+struct mapping_metadata_bhs *affs_get_metadata_bhs(struct inode *inode)
+{
+ return &AFFS_I(inode)->i_metadata_bhs;
+}
+
static const struct super_operations affs_sops = {
.alloc_inode = affs_alloc_inode,
.free_inode = affs_free_inode,
diff --git a/fs/affs/symlink.c b/fs/affs/symlink.c
index 094aec8d17b8..68fa091bd377 100644
--- a/fs/affs/symlink.c
+++ b/fs/affs/symlink.c
@@ -72,4 +72,5 @@ const struct address_space_operations affs_symlink_aops = {
const struct inode_operations affs_symlink_inode_operations = {
.get_link = page_get_link,
.setattr = affs_notify_change,
+ .get_metadata_bhs = affs_get_metadata_bhs,
};
--
2.51.0
* [PATCH 25/32] bfs: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/bfs/bfs.h | 2 ++
fs/bfs/dir.c | 1 +
fs/bfs/file.c | 4 +++-
fs/bfs/inode.c | 7 +++++++
4 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/bfs/bfs.h b/fs/bfs/bfs.h
index 606f9378b2f0..5fadb6e860f1 100644
--- a/fs/bfs/bfs.h
+++ b/fs/bfs/bfs.h
@@ -35,6 +35,7 @@ struct bfs_inode_info {
unsigned long i_dsk_ino; /* inode number from the disk, can be 0 */
unsigned long i_sblock;
unsigned long i_eblock;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct inode vfs_inode;
};
@@ -55,6 +56,7 @@ static inline struct bfs_inode_info *BFS_I(struct inode *inode)
/* inode.c */
extern struct inode *bfs_iget(struct super_block *sb, unsigned long ino);
extern void bfs_dump_imap(const char *, struct super_block *);
+struct mapping_metadata_bhs *bfs_get_metadata_bhs(struct inode *inode);
/* file.c */
extern const struct inode_operations bfs_file_inops;
diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index c375e22c4c0c..30529f476582 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -262,6 +262,7 @@ const struct inode_operations bfs_dir_inops = {
.link = bfs_link,
.unlink = bfs_unlink,
.rename = bfs_rename,
+ .get_metadata_bhs = bfs_get_metadata_bhs,
};
static int bfs_add_entry(struct inode *dir, const struct qstr *child, int ino)
diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index d33d6bde992b..335ab07e37fe 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -200,4 +200,6 @@ const struct address_space_operations bfs_aops = {
.bmap = bfs_bmap,
};
-const struct inode_operations bfs_file_inops;
+const struct inode_operations bfs_file_inops = {
+ .get_metadata_bhs = bfs_get_metadata_bhs,
+};
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index e0e50a9dbe9c..f1a392394a23 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -259,6 +259,8 @@ static struct inode *bfs_alloc_inode(struct super_block *sb)
bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
+ mmb_init(&bi->i_metadata_bhs);
+
return &bi->vfs_inode;
}
@@ -296,6 +298,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(bfs_inode_cachep);
}
+struct mapping_metadata_bhs *bfs_get_metadata_bhs(struct inode *inode)
+{
+ return &BFS_I(inode)->i_metadata_bhs;
+}
+
static const struct super_operations bfs_sops = {
.alloc_inode = bfs_alloc_inode,
.free_inode = bfs_free_inode,
--
2.51.0
* [PATCH 26/32] fat: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fat/fat.h | 2 ++
fs/fat/file.c | 1 +
fs/fat/inode.c | 12 ++++++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
5 files changed, 17 insertions(+)
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 0d269dba897b..2b2f6ad32f24 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -130,6 +130,7 @@ struct msdos_inode_info {
struct hlist_node i_dir_hash; /* hash by i_logstart */
struct rw_semaphore truncate_lock; /* protect bmap against truncate */
struct timespec64 i_crtime; /* File creation (birth) time */
+ struct mapping_metadata_bhs i_metadata_bhs;
struct inode vfs_inode;
};
@@ -424,6 +425,7 @@ extern int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de);
extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
struct inode *i2);
+struct mapping_metadata_bhs *fat_get_metadata_bhs(struct inode *inode);
extern const struct fs_parameter_spec fat_param_spec[];
int fat_init_fs_context(struct fs_context *fc, bool is_vfat);
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 124d9c5431c8..da21636d3874 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -574,4 +574,5 @@ const struct inode_operations fat_file_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
+ .get_metadata_bhs = fat_get_metadata_bhs,
};
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index ce88602b0d57..8561b8be5ca2 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -763,6 +763,7 @@ static struct inode *fat_alloc_inode(struct super_block *sb)
ei->i_pos = 0;
ei->i_crtime.tv_sec = 0;
ei->i_crtime.tv_nsec = 0;
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -807,6 +808,12 @@ static void __exit fat_destroy_inodecache(void)
kmem_cache_destroy(fat_inode_cachep);
}
+struct mapping_metadata_bhs *fat_get_metadata_bhs(struct inode *inode)
+{
+ return &MSDOS_I(inode)->i_metadata_bhs;
+}
+EXPORT_SYMBOL_GPL(fat_get_metadata_bhs);
+
int fat_reconfigure(struct fs_context *fc)
{
bool new_rdonly;
@@ -1531,6 +1538,10 @@ static int fat_read_static_bpb(struct super_block *sb,
return error;
}
+static const struct inode_operations fat_table_inode_operations = {
+ .get_metadata_bhs = fat_get_metadata_bhs,
+};
+
/*
* Read the super block of an MS-DOS FS.
*/
@@ -1806,6 +1817,7 @@ int fat_fill_super(struct super_block *sb, struct fs_context *fc,
fat_inode = new_inode(sb);
if (!fat_inode)
goto out_fail;
+ fat_inode->i_op = &fat_table_inode_operations;
sbi->fat_inode = fat_inode;
fsinfo_inode = new_inode(sb);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 048c103b506a..1526b8910d51 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -643,6 +643,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
+ .get_metadata_bhs = fat_get_metadata_bhs,
};
static void setup(struct super_block *sb)
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 87dcdd86272b..ca5e0e9822a6 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1186,6 +1186,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
.update_time = fat_update_time,
+ .get_metadata_bhs = fat_get_metadata_bhs,
};
static void setup(struct super_block *sb)
--
2.51.0
* [PATCH 27/32] udf: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/udf/file.c | 1 +
fs/udf/namei.c | 1 +
fs/udf/super.c | 6 ++++++
fs/udf/symlink.c | 1 +
fs/udf/udf_i.h | 1 +
fs/udf/udfdecl.h | 1 +
6 files changed, 11 insertions(+)
diff --git a/fs/udf/file.c b/fs/udf/file.c
index 32ae7cfd72c5..8d51313173f3 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -251,4 +251,5 @@ static int udf_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
const struct inode_operations udf_file_inode_operations = {
.setattr = udf_setattr,
+ .get_metadata_bhs = udf_get_metadata_bhs,
};
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 5f2e9a892bff..ef9eadb96f4e 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1025,4 +1025,5 @@ const struct inode_operations udf_dir_inode_operations = {
.mknod = udf_mknod,
.rename = udf_rename,
.tmpfile = udf_tmpfile,
+ .get_metadata_bhs = udf_get_metadata_bhs,
};
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 27f463fd1d89..eb62972c9fda 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -166,6 +166,7 @@ static struct inode *udf_alloc_inode(struct super_block *sb)
ei->cached_extent.lstart = -1;
spin_lock_init(&ei->i_extent_cache_lock);
inode_set_iversion(&ei->vfs_inode, 1);
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -205,6 +206,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(udf_inode_cachep);
}
+struct mapping_metadata_bhs *udf_get_metadata_bhs(struct inode *inode)
+{
+ return &UDF_I(inode)->i_metadata_bhs;
+}
+
/* Superblock operations */
static const struct super_operations udf_sb_ops = {
.alloc_inode = udf_alloc_inode,
diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
index fe03745d09b1..56c860a10b91 100644
--- a/fs/udf/symlink.c
+++ b/fs/udf/symlink.c
@@ -168,4 +168,5 @@ const struct address_space_operations udf_symlink_aops = {
const struct inode_operations udf_symlink_inode_operations = {
.get_link = page_get_link,
.getattr = udf_symlink_getattr,
+ .get_metadata_bhs = udf_get_metadata_bhs,
};
diff --git a/fs/udf/udf_i.h b/fs/udf/udf_i.h
index 312b7c9ef10e..fdaa88c49c2b 100644
--- a/fs/udf/udf_i.h
+++ b/fs/udf/udf_i.h
@@ -50,6 +50,7 @@ struct udf_inode_info {
struct kernel_lb_addr i_locStreamdir;
__u64 i_lenStreams;
struct rw_semaphore i_data_sem;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct udf_ext_cache cached_extent;
/* Spinlock for protecting extent cache */
spinlock_t i_extent_cache_lock;
diff --git a/fs/udf/udfdecl.h b/fs/udf/udfdecl.h
index d159f20d61e8..db2b92217bf5 100644
--- a/fs/udf/udfdecl.h
+++ b/fs/udf/udfdecl.h
@@ -126,6 +126,7 @@ static inline void udf_updated_lvid(struct super_block *sb)
extern u64 lvid_get_unique_id(struct super_block *sb);
struct inode *udf_find_metadata_inode_efe(struct super_block *sb,
u32 meta_file_loc, u32 partition_num);
+struct mapping_metadata_bhs *udf_get_metadata_bhs(struct inode *inode);
/* namei.c */
static inline unsigned int udf_dir_entry_len(struct fileIdentDesc *cfi)
--
2.51.0
* [PATCH 28/32] minix: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/minix/file.c | 1 +
fs/minix/inode.c | 8 ++++++++
fs/minix/minix.h | 2 ++
fs/minix/namei.c | 1 +
4 files changed, 12 insertions(+)
diff --git a/fs/minix/file.c b/fs/minix/file.c
index dca7ac71f049..b3abe380634a 100644
--- a/fs/minix/file.c
+++ b/fs/minix/file.c
@@ -50,4 +50,5 @@ static int minix_setattr(struct mnt_idmap *idmap,
const struct inode_operations minix_file_inode_operations = {
.setattr = minix_setattr,
.getattr = minix_getattr,
+ .get_metadata_bhs = minix_get_metadata_bhs,
};
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index ab7c06efb139..20abbe21a632 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -85,6 +85,8 @@ static struct inode *minix_alloc_inode(struct super_block *sb)
ei = alloc_inode_sb(sb, minix_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
+ mmb_init(&ei->i_metadata_bhs);
+
return &ei->vfs_inode;
}
@@ -122,6 +124,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(minix_inode_cachep);
}
+struct mapping_metadata_bhs *minix_get_metadata_bhs(struct inode *inode)
+{
+ return &minix_i(inode)->i_metadata_bhs;
+}
+
static const struct super_operations minix_sops = {
.alloc_inode = minix_alloc_inode,
.free_inode = minix_free_in_core_inode,
@@ -502,6 +509,7 @@ static const struct address_space_operations minix_aops = {
static const struct inode_operations minix_symlink_inode_operations = {
.get_link = page_get_link,
.getattr = minix_getattr,
+ .get_metadata_bhs = minix_get_metadata_bhs,
};
void minix_set_inode(struct inode *inode, dev_t rdev)
diff --git a/fs/minix/minix.h b/fs/minix/minix.h
index 7e1f652f16d3..38981a30ac99 100644
--- a/fs/minix/minix.h
+++ b/fs/minix/minix.h
@@ -19,6 +19,7 @@ struct minix_inode_info {
__u16 i1_data[16];
__u32 i2_data[16];
} u;
+ struct mapping_metadata_bhs i_metadata_bhs;
struct inode vfs_inode;
};
@@ -57,6 +58,7 @@ unsigned long minix_count_free_blocks(struct super_block *sb);
int minix_getattr(struct mnt_idmap *, const struct path *,
struct kstat *, u32, unsigned int);
int minix_prepare_chunk(struct folio *folio, loff_t pos, unsigned len);
+struct mapping_metadata_bhs *minix_get_metadata_bhs(struct inode *inode);
extern void V1_minix_truncate(struct inode *);
extern void V2_minix_truncate(struct inode *);
diff --git a/fs/minix/namei.c b/fs/minix/namei.c
index 263e4ba8b1c8..e31e84a677eb 100644
--- a/fs/minix/namei.c
+++ b/fs/minix/namei.c
@@ -288,4 +288,5 @@ const struct inode_operations minix_dir_inode_operations = {
.rename = minix_rename,
.getattr = minix_getattr,
.tmpfile = minix_tmpfile,
+ .get_metadata_bhs = minix_get_metadata_bhs,
};
--
2.51.0
* [PATCH 29/32] ext4: Track metadata bhs in fs-private inode part
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Track metadata bhs for an inode in the fs-private part of the inode. We
need the tracking only for nojournal mode, so this is somewhat wasteful.
We could relatively easily make the mapping_metadata_bhs struct
dynamically allocated, similar to how we treat jbd2_inode, but let's leave
that for an ext4-specific series once the dust settles a bit.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 4 +++-
fs/ext4/file.c | 1 +
fs/ext4/inode.c | 2 +-
fs/ext4/namei.c | 2 ++
fs/ext4/super.c | 6 ++++++
fs/ext4/symlink.c | 3 +++
6 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 293f698b7042..a829e5da67af 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1121,6 +1121,7 @@ struct ext4_inode_info {
struct rw_semaphore i_data_sem;
struct inode vfs_inode;
struct jbd2_inode *jinode;
+ struct mapping_metadata_bhs i_metadata_bhs;
/*
* File creation time. Its function is same as that of
@@ -3203,8 +3204,9 @@ extern void ext4_mark_group_bitmap_corrupted(struct super_block *sb,
unsigned int flags);
extern unsigned int ext4_num_base_meta_blocks(struct super_block *sb,
ext4_group_t block_group);
-extern void print_daily_error_info(struct timer_list *t);
+struct mapping_metadata_bhs *ext4_get_metadata_bhs(struct inode *inode);
+extern void print_daily_error_info(struct timer_list *t);
extern __printf(7, 8)
void __ext4_error(struct super_block *, const char *, unsigned int, bool,
int, __u64, const char *, ...);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index f1dc5ce791a7..3d433f50524b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -987,5 +987,6 @@ const struct inode_operations ext4_file_inode_operations = {
.fiemap = ext4_fiemap,
.fileattr_get = ext4_fileattr_get,
.fileattr_set = ext4_fileattr_set,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 011cb2eb16a2..eead6c5c2366 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3436,7 +3436,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
/* Any metadata buffers to write? */
- if (mmb_has_buffers(&inode->i_mapping->i_metadata_bhs))
+ if (mmb_has_buffers(&EXT4_I(inode)->i_metadata_bhs))
return true;
return inode_state_read_once(inode) & I_DIRTY_DATASYNC;
}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c4b5e252af0e..4d2cae140b71 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -4228,6 +4228,7 @@ const struct inode_operations ext4_dir_inode_operations = {
.fiemap = ext4_fiemap,
.fileattr_get = ext4_fileattr_get,
.fileattr_set = ext4_fileattr_set,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
const struct inode_operations ext4_special_inode_operations = {
@@ -4236,4 +4237,5 @@ const struct inode_operations ext4_special_inode_operations = {
.listxattr = ext4_listxattr,
.get_inode_acl = ext4_get_acl,
.set_acl = ext4_set_acl,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ea827b0ecc8d..4b9eb86b03e2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1428,6 +1428,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
ext4_fc_init_inode(&ei->vfs_inode);
spin_lock_init(&ei->i_fc_lock);
+ mmb_init(&ei->i_metadata_bhs);
return &ei->vfs_inode;
}
@@ -1521,6 +1522,11 @@ static void destroy_inodecache(void)
kmem_cache_destroy(ext4_inode_cachep);
}
+struct mapping_metadata_bhs *ext4_get_metadata_bhs(struct inode *inode)
+{
+ return &EXT4_I(inode)->i_metadata_bhs;
+}
+
void ext4_clear_inode(struct inode *inode)
{
ext4_fc_del(inode);
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 645240cc0229..53ec8daf4cae 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -119,6 +119,7 @@ const struct inode_operations ext4_encrypted_symlink_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_encrypted_symlink_getattr,
.listxattr = ext4_listxattr,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
const struct inode_operations ext4_symlink_inode_operations = {
@@ -126,6 +127,7 @@ const struct inode_operations ext4_symlink_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_getattr,
.listxattr = ext4_listxattr,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
const struct inode_operations ext4_fast_symlink_inode_operations = {
@@ -133,4 +135,5 @@ const struct inode_operations ext4_fast_symlink_inode_operations = {
.setattr = ext4_setattr,
.getattr = ext4_getattr,
.listxattr = ext4_listxattr,
+ .get_metadata_bhs = ext4_get_metadata_bhs,
};
--
2.51.0
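
The per-filesystem conversion in this patch follows one pattern: embed the
tracking struct in the fs-private inode, initialize it when the inode is
allocated, and hand it out through the .get_metadata_bhs inode operation.
A minimal userspace sketch of that pattern follows. The types are simplified
stand-ins rather than the kernel definitions (the lock in
mapping_metadata_bhs is omitted), and "demofs" with all of its demofs_*
names is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the real kernel definitions. */
struct list_head { struct list_head *next, *prev; };

struct mapping_metadata_bhs {
	struct list_head list;	/* buffers associated with the inode */
};

struct inode;
struct inode_operations {
	struct mapping_metadata_bhs *(*get_metadata_bhs)(struct inode *inode);
};
struct inode { const struct inode_operations *i_op; };

/* Hypothetical filesystem: tracking lives in the fs-private inode. */
struct demofs_inode_info {
	struct mapping_metadata_bhs i_metadata_bhs;
	struct inode vfs_inode;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map the generic inode back to the structure that embeds it. */
static struct demofs_inode_info *DEMOFS_I(struct inode *inode)
{
	return container_of(inode, struct demofs_inode_info, vfs_inode);
}

/* Stand-in for mmb_init(): start with an empty list. */
static void mmb_init(struct mapping_metadata_bhs *mmb)
{
	mmb->list.next = mmb->list.prev = &mmb->list;
}

static struct mapping_metadata_bhs *demofs_get_metadata_bhs(struct inode *inode)
{
	return &DEMOFS_I(inode)->i_metadata_bhs;
}

static const struct inode_operations demofs_inode_operations = {
	.get_metadata_bhs = demofs_get_metadata_bhs,
};

/* Plays the role of the ->alloc_inode hook in this sketch. */
static void demofs_init_inode(struct demofs_inode_info *di)
{
	assert(di != NULL);
	mmb_init(&di->i_metadata_bhs);
	di->vfs_inode.i_op = &demofs_inode_operations;
}
```

Generic code only ever sees struct inode and the callback, so it does not
need to know where a given filesystem keeps the tracking structure.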
* [PATCH 30/32] vfs: Drop mapping_metadata_bhs from address space
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (28 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 29/32] ext4: " Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 31/32] kvm: Use private inode list instead of i_private_list Jan Kara
2026-03-03 10:34 ` [PATCH 32/32] fs: Drop i_private_list from address_space Jan Kara
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Nobody uses mapping_metadata_bhs in struct address_space anymore. Just
remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 16 ++++++++++------
fs/inode.c | 2 --
include/linux/fs.h | 1 -
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 096a8d9e3280..02176e0acfe1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -501,9 +501,13 @@ EXPORT_SYMBOL(mmb_init);
static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
{
+ /*
+ * We can get called for various half-initialized or bad inodes so
+ * verify .get_metadata_bhs callback exists.
+ */
if (inode->i_op->get_metadata_bhs)
return inode->i_op->get_metadata_bhs(inode);
- return &inode->i_mapping->i_metadata_bhs;
+ return NULL;
}
static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
@@ -544,7 +548,7 @@ static void remove_assoc_queue(struct buffer_head *bh)
bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
- return !list_empty(&mmb->list);
+ return mmb && !list_empty(&mmb->list);
}
EXPORT_SYMBOL_GPL(mmb_has_buffers);
@@ -552,10 +556,10 @@ EXPORT_SYMBOL_GPL(mmb_has_buffers);
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
*
- * Starts I/O against the buffers at mapping->i_metadata_bhs and waits upon
- * that I/O. Basically, this is a convenience function for fsync(). @mapping
- * is a file or directory which needs those buffers to be written for a
- * successful fsync().
+ * Starts I/O against the buffers tracked in mapping_metadata_bhs for the
+ * mapping and waits upon that I/O. Basically, this is a convenience function
+ * for fsync(). @mapping is a file or directory which needs those buffers to
+ * be written for a successful fsync().
*
* We have conflicting pressures: we want to make sure that all
* initially dirty buffers get waited on, but that any subsequently
diff --git a/fs/inode.c b/fs/inode.c
index 393f586d050a..d5774e627a9c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -483,8 +483,6 @@ static void __address_space_init_once(struct address_space *mapping)
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->i_private_list);
spin_lock_init(&mapping->i_private_lock);
- spin_lock_init(&mapping->i_metadata_bhs.lock);
- INIT_LIST_HEAD(&mapping->i_metadata_bhs.list);
mapping->i_mmap = RB_ROOT_CACHED;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4d9be1fefa4..1611d8ce4b66 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -490,7 +490,6 @@ struct address_space {
errseq_t wb_err;
spinlock_t i_private_lock;
struct list_head i_private_list;
- struct mapping_metadata_bhs i_metadata_bhs;
struct rw_semaphore i_mmap_rwsem;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
--
2.51.0
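
With the fallback to the address_space gone, the lookup can return NULL for
inodes that have no .get_metadata_bhs callback, and mmb_has_buffers() has to
tolerate that. The userspace sketch below models just this behavior. The
types are simplified stand-ins, and inode_has_tracked_buffers() is a
hypothetical caller added for illustration, not a function from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the real kernel definitions. */
struct list_head { struct list_head *next, *prev; };
struct mapping_metadata_bhs { struct list_head list; };

struct inode;
struct inode_operations {
	struct mapping_metadata_bhs *(*get_metadata_bhs)(struct inode *inode);
};
struct inode { const struct inode_operations *i_op; };

/* No callback means the inode has no metadata bh tracking at all. */
static struct mapping_metadata_bhs *inode_get_metadata_bhs(struct inode *inode)
{
	assert(inode != NULL && inode->i_op != NULL);
	if (inode->i_op->get_metadata_bhs)
		return inode->i_op->get_metadata_bhs(inode);
	return NULL;
}

/* A NULL tracking struct is treated the same as an empty list. */
static bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
	return mmb && mmb->list.next != &mmb->list;
}

/* Hypothetical caller: safe for inodes without the callback. */
static bool inode_has_tracked_buffers(struct inode *inode)
{
	return mmb_has_buffers(inode_get_metadata_bhs(inode));
}
```

Putting the NULL check inside mmb_has_buffers() keeps callers from having to
distinguish "no tracking" from "tracking with nothing queued".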
* [PATCH 31/32] kvm: Use private inode list instead of i_private_list
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (29 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 30/32] vfs: Drop mapping_metadata_bhs from address space Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
2026-03-03 10:34 ` [PATCH 32/32] fs: Drop i_private_list from address_space Jan Kara
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Instead of using mapping->i_private_list, use a list in the private part of
the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
---
virt/kvm/guest_memfd.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 017d84a7adf3..6d36a7827870 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -30,6 +30,7 @@ struct gmem_file {
struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;
+ struct list_head gem_file_list;
u64 flags;
};
@@ -39,8 +40,8 @@ static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
return container_of(inode, struct gmem_inode, vfs_inode);
}
-#define kvm_gmem_for_each_file(f, mapping) \
- list_for_each_entry(f, &(mapping)->i_private_list, entry)
+#define kvm_gmem_for_each_file(f, inode) \
+ list_for_each_entry(f, &GMEM_I(inode)->gem_file_list, entry)
/**
* folio_file_pfn - like folio_file_page, but return a pfn.
@@ -202,7 +203,7 @@ static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
attr_filter = kvm_gmem_get_invalidate_filter(inode);
- kvm_gmem_for_each_file(f, inode->i_mapping)
+ kvm_gmem_for_each_file(f, inode)
__kvm_gmem_invalidate_begin(f, start, end, attr_filter);
}
@@ -223,7 +224,7 @@ static void kvm_gmem_invalidate_end(struct inode *inode, pgoff_t start,
{
struct gmem_file *f;
- kvm_gmem_for_each_file(f, inode->i_mapping)
+ kvm_gmem_for_each_file(f, inode)
__kvm_gmem_invalidate_end(f, start, end);
}
@@ -609,7 +610,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
kvm_get_kvm(kvm);
f->kvm = kvm;
xa_init(&f->bindings);
- list_add(&f->entry, &inode->i_mapping->i_private_list);
+ list_add(&f->entry, &GMEM_I(inode)->gem_file_list);
fd_install(fd, file);
return fd;
@@ -945,6 +946,7 @@ static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)
mpol_shared_policy_init(&gi->policy, NULL);
gi->flags = 0;
+ INIT_LIST_HEAD(&gi->gem_file_list);
return &gi->vfs_inode;
}
--
2.51.0
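
The change above moves the list head from the address_space into the
container that embeds the inode, and the iteration macro then takes the
inode instead of the mapping. A userspace sketch of that arrangement
follows. The list helpers are minimal re-implementations, and every demo_*
and DEMO_* name is hypothetical rather than taken from guest_memfd.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list, modeled loosely on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	assert(entry != NULL && head != NULL);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct inode { unsigned long i_ino; };	/* stand-in, not the kernel struct */

struct demo_file {
	int id;
	struct list_head entry;		/* links the file into its inode's list */
};

/* Private inode: the list head lives here, not in the mapping. */
struct demo_inode {
	struct inode vfs_inode;
	struct list_head file_list;
};

static struct demo_inode *DEMO_I(struct inode *inode)
{
	return container_of(inode, struct demo_inode, vfs_inode);
}

/* Walk every demo_file attached to the inode. */
#define demo_for_each_file(f, inode) \
	for (struct list_head *p_ = DEMO_I(inode)->file_list.next; \
	     p_ != &DEMO_I(inode)->file_list && \
	     ((f) = container_of(p_, struct demo_file, entry), 1); \
	     p_ = p_->next)

static int demo_sum_file_ids(struct inode *inode)
{
	struct demo_file *f;
	int sum = 0;

	demo_for_each_file(f, inode)
		sum += f->id;
	return sum;
}
```

As in the patch, the list head must be initialized when the private inode is
set up, before any file is added to it.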
* [PATCH 32/32] fs: Drop i_private_list from address_space
2026-03-03 10:33 [PATCH 0/32] fs: Move metadata bh tracking from address_space Jan Kara
` (30 preceding siblings ...)
2026-03-03 10:34 ` [PATCH 31/32] kvm: Use private inode list instead of i_private_list Jan Kara
@ 2026-03-03 10:34 ` Jan Kara
31 siblings, 0 replies; 36+ messages in thread
From: Jan Kara @ 2026-03-03 10:34 UTC (permalink / raw)
To: linux-fsdevel
Cc: Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jan Kara
Nobody is using i_private_list anymore. Remove it.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/inode.c | 2 --
include/linux/fs.h | 2 --
2 files changed, 4 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index d5774e627a9c..a8f019078fab 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -481,7 +481,6 @@ static void __address_space_init_once(struct address_space *mapping)
{
xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
init_rwsem(&mapping->i_mmap_rwsem);
- INIT_LIST_HEAD(&mapping->i_private_list);
spin_lock_init(&mapping->i_private_lock);
mapping->i_mmap = RB_ROOT_CACHED;
}
@@ -795,7 +794,6 @@ void clear_inode(struct inode *inode)
* nor even WARN_ON(!mapping_empty).
*/
xa_unlock_irq(&inode->i_data.i_pages);
- BUG_ON(!list_empty(&inode->i_data.i_private_list));
BUG_ON(!(inode_state_read_once(inode) & I_FREEING));
BUG_ON(inode_state_read_once(inode) & I_CLEAR);
BUG_ON(!list_empty(&inode->i_wb_list));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1611d8ce4b66..adad21e31cfc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -470,7 +470,6 @@ struct mapping_metadata_bhs {
* @flags: Error bits and flags (AS_*).
* @wb_err: The most recent error which has occurred.
* @i_private_lock: For use by the owner of the address_space.
- * @i_private_list: For use by the owner of the address_space.
*/
struct address_space {
struct inode *host;
@@ -489,7 +488,6 @@ struct address_space {
unsigned long flags;
errseq_t wb_err;
spinlock_t i_private_lock;
- struct list_head i_private_list;
struct rw_semaphore i_mmap_rwsem;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
--
2.51.0
* Re: [PATCH 11/32] gfs2: Don't zero i_private_data
2026-03-03 10:34 ` [PATCH 11/32] gfs2: Don't zero i_private_data Jan Kara
@ 2026-03-03 12:32 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2026-03-03 12:32 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, gfs2
Jan,
On Tue, Mar 3, 2026 at 11:34 AM Jan Kara <jack@suse.cz> wrote:
> The zeroing is the only use within gfs2 so it is pointless.
"Remove the explicit zeroing of mapping->i_private_data since this
field is no longer used."
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Thanks,
Andreas
* Re: [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
2026-03-03 10:34 ` [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call Jan Kara
@ 2026-03-03 14:03 ` Christoph Hellwig
2026-03-03 14:09 ` Christoph Hellwig
1 sibling, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-03 14:03 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jens Axboe, linux-block
> diff --git a/block/bdev.c b/block/bdev.c
> index ed022f8c48c7..ad1660b6b324 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -420,7 +420,6 @@ static void init_once(void *data)
> static void bdev_evict_inode(struct inode *inode)
> {
> truncate_inode_pages_final(&inode->i_data);
> - invalidate_inode_buffers(inode); /* is it needed here? */
> clear_inode(inode);
> }
With this, bdev_evict_inode can go away as it is equivalent to the
default action when no ->evict_inode is provided.
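
The equivalence mentioned here can be modeled in userspace. The sketch below
assumes the generic eviction path calls ->evict_inode when the filesystem
provides one and otherwise truncates the page cache and clears the inode.
All demo_* names are hypothetical stand-ins that only count calls. The s_op
pointer is placed directly in the inode for brevity, whereas the kernel
reaches it through the superblock.

```c
#include <assert.h>
#include <stddef.h>

struct inode;
struct super_operations {
	void (*evict_inode)(struct inode *inode);
};

/* Stand-in inode that only records what was done to it. */
struct inode {
	const struct super_operations *s_op;
	int pages_truncated;
	int cleared;
};

/* Hypothetical stand-ins for the page cache truncation and clear steps. */
static void demo_truncate_pages_final(struct inode *inode)
{
	inode->pages_truncated++;
}

static void demo_clear_inode(struct inode *inode)
{
	inode->cleared++;
}

/* What bdev_evict_inode() amounts to once the buffer call is dropped. */
static void demo_bdev_evict_inode(struct inode *inode)
{
	demo_truncate_pages_final(inode);
	demo_clear_inode(inode);
}

/* Assumed shape of the generic path: callback if present, else default. */
static void demo_evict(struct inode *inode)
{
	assert(inode != NULL && inode->s_op != NULL);
	if (inode->s_op->evict_inode) {
		inode->s_op->evict_inode(inode);
	} else {
		demo_truncate_pages_final(inode);
		demo_clear_inode(inode);
	}
}
```

Under that assumption, an inode evicted through demo_bdev_evict_inode() and
one evicted with no callback end up in the same state, so the callback adds
nothing.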
* Re: [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call
2026-03-03 10:34 ` [PATCH 21/32] bdev: Drop pointless invalidate_mapping_buffers() call Jan Kara
2026-03-03 14:03 ` Christoph Hellwig
@ 2026-03-03 14:09 ` Christoph Hellwig
1 sibling, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-03 14:09 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Christian Brauner, Al Viro, linux-ext4, Ted Tso,
Tigran A. Aivazian, David Sterba, OGAWA Hirofumi, Muchun Song,
Oscar Salvador, David Hildenbrand, linux-mm, linux-aio,
Benjamin LaHaise, Jens Axboe, linux-block
FYI, linux-block only got this patch, which is totally messed up.
Please always send all patches to every list and person, otherwise
you fill people's inboxes with unreviewable junk.