linux-mm.kvack.org archive mirror
* [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking
@ 2025-04-18  1:59 Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 1/7] fs/buffer: split locking for pagecache lookups Davidlohr Bueso
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Hello,

Changes from v1: rebased on top of vfs.fixes (Christian).

This is a respin of the series[0] to address the sleep in atomic scenarios for
noref migration with large folios, introduced in:

      3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes")

The main difference is that it removes the first patch and moves the fix (reducing
the i_private_lock critical region in the migration path) to the final patch, which
also introduces the new BH_Migrate flag. It also simplifies the locking scheme in
patch 1 to avoid folio trylocking in the atomic lookup cases. So essentially blocking
users will take the folio lock and hence wait for migration, and otherwise nonblocking
callers will bail the lookup if a noref migration is on-going. Blocking callers
will also benefit from potential performance gains by reducing contention on the
spinlock for bdev mappings.

Applies against latest vfs.fixes. Please consider for Linus' tree.

Patch 1: carves a path for callers that can block to take the folio lock.
Patch 2: adds sleeping flavors to pagecache lookups, no users.
Patches 3-6: converts to the new call, where possible.
Patch 7: does the actual sleep in atomic fix.

Thanks!

[0] https://lore.kernel.org/all/20250410014945.2140781-1-mcgrof@kernel.org/

Davidlohr Bueso (7):
  fs/buffer: split locking for pagecache lookups
  fs/buffer: introduce sleeping flavors for pagecache lookups
  fs/buffer: use sleeping version of __find_get_block()
  fs/ocfs2: use sleeping version of __find_get_block()
  fs/jbd2: use sleeping version of __find_get_block()
  fs/ext4: use sleeping version of sb_find_get_block()
  mm/migrate: fix sleep in atomic for large folios and buffer heads

 fs/buffer.c                 | 73 +++++++++++++++++++++++++++----------
 fs/ext4/ialloc.c            |  3 +-
 fs/ext4/mballoc.c           |  3 +-
 fs/jbd2/revoke.c            | 15 +++++---
 fs/ocfs2/journal.c          |  2 +-
 include/linux/buffer_head.h |  9 +++++
 mm/migrate.c                |  8 ++--
 7 files changed, 82 insertions(+), 31 deletions(-)

--
2.39.5




* [PATCH 1/7] fs/buffer: split locking for pagecache lookups
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 2/7] fs/buffer: introduce sleeping flavors " Davidlohr Bueso
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Callers of __find_get_block() may or may not allow for blocking
semantics, and it is currently assumed that they do not. Lay out
two paths based on this: the private_lock scheme will continue
to be used for atomic contexts, while blocking callers take the
folio lock instead, which protects the buffers against, for
example, migration and try_to_free_buffers().

Per the "hack idea" comment, the latter can alleviate contention
on the private_lock for bdev mappings. For reasons of determinism
and to avoid making bugs hard to reproduce, trylocking is not
attempted.

No change in semantics. All lookup users still take the spinlock.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/buffer.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index c7abb4a029dc..f8fcffdbe5d9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -176,18 +176,8 @@ void end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 }
 EXPORT_SYMBOL(end_buffer_write_sync);
 
-/*
- * Various filesystems appear to want __find_get_block to be non-blocking.
- * But it's the page lock which protects the buffers.  To get around this,
- * we get exclusion from try_to_free_buffers with the blockdev mapping's
- * i_private_lock.
- *
- * Hack idea: for the blockdev mapping, i_private_lock contention
- * may be quite high.  This code could TryLock the page, and if that
- * succeeds, there is no need to take i_private_lock.
- */
 static struct buffer_head *
-__find_get_block_slow(struct block_device *bdev, sector_t block)
+__find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
 {
 	struct address_space *bd_mapping = bdev->bd_mapping;
 	const int blkbits = bd_mapping->host->i_blkbits;
@@ -204,7 +194,16 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 	if (IS_ERR(folio))
 		goto out;
 
-	spin_lock(&bd_mapping->i_private_lock);
+	/*
+	 * Folio lock protects the buffers. Callers that cannot block
+	 * will fallback to serializing vs try_to_free_buffers() via
+	 * the i_private_lock.
+	 */
+	if (atomic)
+		spin_lock(&bd_mapping->i_private_lock);
+	else
+		folio_lock(folio);
+
 	head = folio_buffers(folio);
 	if (!head)
 		goto out_unlock;
@@ -236,7 +235,10 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 		       1 << blkbits);
 	}
 out_unlock:
-	spin_unlock(&bd_mapping->i_private_lock);
+	if (atomic)
+		spin_unlock(&bd_mapping->i_private_lock);
+	else
+		folio_unlock(folio);
 	folio_put(folio);
 out:
 	return ret;
@@ -1388,14 +1390,15 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
  * it in the LRU and mark it as accessed.  If it is not present then return
  * NULL
  */
-struct buffer_head *
-__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
+static struct buffer_head *
+find_get_block_common(struct block_device *bdev, sector_t block,
+			unsigned size, bool atomic)
 {
 	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
 
 	if (bh == NULL) {
 		/* __find_get_block_slow will mark the page accessed */
-		bh = __find_get_block_slow(bdev, block);
+		bh = __find_get_block_slow(bdev, block, atomic);
 		if (bh)
 			bh_lru_install(bh);
 	} else
@@ -1403,6 +1406,12 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
 
 	return bh;
 }
+
+struct buffer_head *
+__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
+{
+	return find_get_block_common(bdev, block, size, true);
+}
 EXPORT_SYMBOL(__find_get_block);
 
 /**
-- 
2.39.5




* [PATCH 2/7] fs/buffer: introduce sleeping flavors for pagecache lookups
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 1/7] fs/buffer: split locking for pagecache lookups Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 3/7] fs/buffer: use sleeping version of __find_get_block() Davidlohr Bueso
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Add __find_get_block_nonatomic() and sb_find_get_block_nonatomic(),
to which users will be converted where it is safe to do so. These
versions will take the folio lock instead of the mapping's private_lock.
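
As an illustration, a sleepable caller is expected to end up with
something along these lines (a minimal sketch only; the surrounding
context and variables are hypothetical, just the helper itself comes
from this patch):

        /* Sketch: pagecache lookup from a context that may sleep. */
        struct buffer_head *bh;

        might_sleep();
        bh = __find_get_block_nonatomic(bdev, block, size);
        if (bh) {
                /* buffer was cached; use it, then drop the reference */
                brelse(bh);
        }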

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/buffer.c                 | 9 +++++++++
 include/linux/buffer_head.h | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index f8fcffdbe5d9..5b1d74c818e9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1414,6 +1414,15 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
 }
 EXPORT_SYMBOL(__find_get_block);
 
+/* same as __find_get_block() but allows sleeping contexts */
+struct buffer_head *
+__find_get_block_nonatomic(struct block_device *bdev, sector_t block,
+			   unsigned size)
+{
+	return find_get_block_common(bdev, block, size, false);
+}
+EXPORT_SYMBOL(__find_get_block_nonatomic);
+
 /**
  * bdev_getblk - Get a buffer_head in a block device's buffer cache.
  * @bdev: The block device.
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index f0a4ad7839b6..c791aa9a08da 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -222,6 +222,8 @@ void __wait_on_buffer(struct buffer_head *);
 wait_queue_head_t *bh_waitq_head(struct buffer_head *bh);
 struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block,
 			unsigned size);
+struct buffer_head *__find_get_block_nonatomic(struct block_device *bdev,
+			sector_t block, unsigned size);
 struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
 		unsigned size, gfp_t gfp);
 void __brelse(struct buffer_head *);
@@ -397,6 +399,12 @@ sb_find_get_block(struct super_block *sb, sector_t block)
 	return __find_get_block(sb->s_bdev, block, sb->s_blocksize);
 }
 
+static inline struct buffer_head *
+sb_find_get_block_nonatomic(struct super_block *sb, sector_t block)
+{
+	return __find_get_block_nonatomic(sb->s_bdev, block, sb->s_blocksize);
+}
+
 static inline void
 map_bh(struct buffer_head *bh, struct super_block *sb, sector_t block)
 {
-- 
2.39.5




* [PATCH 3/7] fs/buffer: use sleeping version of __find_get_block()
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 1/7] fs/buffer: split locking for pagecache lookups Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 2/7] fs/buffer: introduce sleeping flavors " Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 4/7] fs/ocfs2: " Davidlohr Bueso
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Convert to the new nonatomic flavor to benefit from potential
performance gains and to keep the expected semantics with respect
to future changes in migration.

Convert write_boundary_block(), which already takes the buffer
lock, as well as bdev_getblk(), depending on the respective gfp
flags. There are no changes in semantics.
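
For instance (an illustrative sketch, not a hunk from this patch), the
gfp mask a caller already passes to bdev_getblk() is what now selects
the lookup flavor:

        /*
         * Sketch: a mask that allows blocking (e.g. GFP_KERNEL) goes
         * through the sleeping lookup and may take the folio lock; a
         * non-blocking mask (e.g. GFP_NOWAIT) keeps the i_private_lock
         * based lookup.
         */
        struct buffer_head *bh_sleepable, *bh_atomic;

        bh_sleepable = bdev_getblk(bdev, block, size, GFP_KERNEL);
        bh_atomic    = bdev_getblk(bdev, block, size, GFP_NOWAIT);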

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/buffer.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 5b1d74c818e9..f8c9e5eb4685 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -658,7 +658,9 @@ EXPORT_SYMBOL(generic_buffers_fsync);
 void write_boundary_block(struct block_device *bdev,
 			sector_t bblock, unsigned blocksize)
 {
-	struct buffer_head *bh = __find_get_block(bdev, bblock + 1, blocksize);
+	struct buffer_head *bh;
+
+	bh = __find_get_block_nonatomic(bdev, bblock + 1, blocksize);
 	if (bh) {
 		if (buffer_dirty(bh))
 			write_dirty_buffer(bh, 0);
@@ -1440,7 +1442,12 @@ EXPORT_SYMBOL(__find_get_block_nonatomic);
 struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
 		unsigned size, gfp_t gfp)
 {
-	struct buffer_head *bh = __find_get_block(bdev, block, size);
+	struct buffer_head *bh;
+
+	if (gfpflags_allow_blocking(gfp))
+		bh = __find_get_block_nonatomic(bdev, block, size);
+	else
+		bh = __find_get_block(bdev, block, size);
 
 	might_alloc(gfp);
 	if (bh)
-- 
2.39.5




* [PATCH 4/7] fs/ocfs2: use sleeping version of __find_get_block()
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2025-04-18  1:59 ` [PATCH 3/7] fs/buffer: use sleeping version of __find_get_block() Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 5/7] fs/jbd2: " Davidlohr Bueso
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

This is a path that allows for blocking as it does IO. Convert
to the new nonatomic flavor to benefit from potential performance
gains and to keep the expected semantics with respect to future
changes in migration.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/ocfs2/journal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index f1b4b3e611cb..c7a9729dc9d0 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1249,7 +1249,7 @@ static int ocfs2_force_read_journal(struct inode *inode)
 		}
 
 		for (i = 0; i < p_blocks; i++, p_blkno++) {
-			bh = __find_get_block(osb->sb->s_bdev, p_blkno,
+			bh = __find_get_block_nonatomic(osb->sb->s_bdev, p_blkno,
 					osb->sb->s_blocksize);
 			/* block not cached. */
 			if (!bh)
-- 
2.39.5




* [PATCH 5/7] fs/jbd2: use sleeping version of __find_get_block()
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2025-04-18  1:59 ` [PATCH 4/7] fs/ocfs2: " Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 6/7] fs/ext4: use sleeping version of sb_find_get_block() Davidlohr Bueso
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Convert to the new nonatomic flavor to benefit from potential
performance gains and to keep the expected semantics with respect
to future changes in migration.

- jbd2_journal_revoke(): can sleep (has a might_sleep() at the beginning).

- jbd2_journal_cancel_revoke(): only used from do_get_write_access() and
    do_get_create_access(), which do sleep. So it can sleep.

- jbd2_clear_buffer_revoked_flags(): only called from journal commit code,
    which sleeps. So it can sleep.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/jbd2/revoke.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 0cf0fddbee81..1467f6790747 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -345,7 +345,8 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 	bh = bh_in;
 
 	if (!bh) {
-		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh = __find_get_block_nonatomic(bdev, blocknr,
+						journal->j_blocksize);
 		if (bh)
 			BUFFER_TRACE(bh, "found on hash");
 	}
@@ -355,7 +356,8 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 
 		/* If there is a different buffer_head lying around in
 		 * memory anywhere... */
-		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh2 = __find_get_block_nonatomic(bdev, blocknr,
+						 journal->j_blocksize);
 		if (bh2) {
 			/* ... and it has RevokeValid status... */
 			if (bh2 != bh && buffer_revokevalid(bh2))
@@ -464,7 +466,8 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
 	 * state machine will get very upset later on. */
 	if (need_cancel) {
 		struct buffer_head *bh2;
-		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
+		bh2 = __find_get_block_nonatomic(bh->b_bdev, bh->b_blocknr,
+						 bh->b_size);
 		if (bh2) {
 			if (bh2 != bh)
 				clear_buffer_revoked(bh2);
@@ -492,9 +495,9 @@ void jbd2_clear_buffer_revoked_flags(journal_t *journal)
 			struct jbd2_revoke_record_s *record;
 			struct buffer_head *bh;
 			record = (struct jbd2_revoke_record_s *)list_entry;
-			bh = __find_get_block(journal->j_fs_dev,
-					      record->blocknr,
-					      journal->j_blocksize);
+			bh = __find_get_block_nonatomic(journal->j_fs_dev,
+							record->blocknr,
+							journal->j_blocksize);
 			if (bh) {
 				clear_buffer_revoked(bh);
 				__brelse(bh);
-- 
2.39.5




* [PATCH 6/7] fs/ext4: use sleeping version of sb_find_get_block()
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2025-04-18  1:59 ` [PATCH 5/7] fs/jbd2: " Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-18  1:59 ` [PATCH 7/7] mm/migrate: fix sleep in atomic for large folios and buffer heads Davidlohr Bueso
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Enable ext4_free_blocks() to use it; that function has a cond_resched()
to begin with. Convert to the new nonatomic flavor to benefit from
potential performance gains and to keep the expected semantics with
respect to future changes in migration.

Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/ext4/mballoc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f88424c28194..1e98c5be4e0a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6642,7 +6642,8 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 		for (i = 0; i < count; i++) {
 			cond_resched();
 			if (is_metadata)
-				bh = sb_find_get_block(inode->i_sb, block + i);
+				bh = sb_find_get_block_nonatomic(inode->i_sb,
+								 block + i);
 			ext4_forget(handle, is_metadata, inode, bh, block + i);
 		}
 	}
-- 
2.39.5




* [PATCH 7/7] mm/migrate: fix sleep in atomic for large folios and buffer heads
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
                   ` (5 preceding siblings ...)
  2025-04-18  1:59 ` [PATCH 6/7] fs/ext4: use sleeping version of sb_find_get_block() Davidlohr Bueso
@ 2025-04-18  1:59 ` Davidlohr Bueso
  2025-04-21 21:06 ` [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Luis Chamberlain
  2025-04-22  7:51 ` Christian Brauner
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-18  1:59 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso, kernel test robot, syzbot+f3c6fda1297c748a7076

The large folio + buffer head noref migration scenarios are
being naughty and blocking while holding a spinlock.

Because the pagecache lookup path now takes the folio lock, it
serializes against the migration path, so the two can wait for
each other. For the atomic private_lock case, a new BH_Migrate
flag is introduced which enables the lookup to bail.

This allows the private_lock critical region on the migration
path to be reduced to what it was before ebdf4de5642fb6
("mm: migrate: fix reference check race between
__find_get_block() and migration"), i.e. covering only the
count checks.

The scope is always noref migration.
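
Condensed, the serialization introduced across this series looks
roughly like this (a sketch of the logic, not literal kernel code):

        /* noref migration path (folio and buffers already locked) */
        test_and_set_bit_lock(BH_Migrate, &head->b_state);
        spin_lock(&mapping->i_private_lock);    /* only covers the ref checks */
        /* ... check buffer head reference counts ... */
        spin_unlock(&mapping->i_private_lock);
        /* ... migrate the folio and its buffer heads ... */
        clear_bit_unlock(BH_Migrate, &head->b_state);

        /* atomic lookup (__find_get_block()), under i_private_lock */
        if (test_bit_acquire(BH_Migrate, &head->b_state))
                return NULL;    /* bail, caller treats it as not cached */

        /* blocking lookup (__find_get_block_nonatomic()) */
        folio_lock(folio);      /* waits for the noref migration to finish */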

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/oe-lkp/202503101536.27099c77-lkp@intel.com
Fixes: 3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes")
Reviewed-by: Jan Kara <jack@suse.cz>
Co-developed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/buffer.c                 | 12 +++++++++++-
 fs/ext4/ialloc.c            |  3 ++-
 include/linux/buffer_head.h |  1 +
 mm/migrate.c                |  8 +++++---
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index f8c9e5eb4685..7be23ff20b27 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -207,6 +207,15 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
 	head = folio_buffers(folio);
 	if (!head)
 		goto out_unlock;
+	/*
+	 * Upon a noref migration, the folio lock serializes here;
+	 * otherwise bail.
+	 */
+	if (test_bit_acquire(BH_Migrate, &head->b_state)) {
+		WARN_ON(!atomic);
+		goto out_unlock;
+	}
+
 	bh = head;
 	do {
 		if (!buffer_mapped(bh))
@@ -1390,7 +1399,8 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 /*
  * Perform a pagecache lookup for the matching buffer.  If it's there, refresh
  * it in the LRU and mark it as accessed.  If it is not present then return
- * NULL
+ * NULL. Atomic context callers may also return NULL if the buffer is being
+ * migrated; similarly the page is not marked accessed either.
  */
 static struct buffer_head *
 find_get_block_common(struct block_device *bdev, sector_t block,
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 38bc8d74f4cc..e7ecc7c8a729 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -691,7 +691,8 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
 	if (!bh || !buffer_uptodate(bh))
 		/*
 		 * If the block is not in the buffer cache, then it
-		 * must have been written out.
+		 * must have been written out, or, most unlikely, is
+		 * being migrated - false failure should be OK here.
 		 */
 		goto out;
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c791aa9a08da..0029ff880e27 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -34,6 +34,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Migrate,     /* Buffer is being migrated (norefs) */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..676d9cfc7059 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -845,9 +845,11 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		return -EAGAIN;
 
 	if (check_refs) {
-		bool busy;
+		bool busy, migrating;
 		bool invalidated = false;
 
+		migrating = test_and_set_bit_lock(BH_Migrate, &head->b_state);
+		VM_WARN_ON_ONCE(migrating);
 recheck_buffers:
 		busy = false;
 		spin_lock(&mapping->i_private_lock);
@@ -859,12 +861,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 			}
 			bh = bh->b_this_page;
 		} while (bh != head);
+		spin_unlock(&mapping->i_private_lock);
 		if (busy) {
 			if (invalidated) {
 				rc = -EAGAIN;
 				goto unlock_buffers;
 			}
-			spin_unlock(&mapping->i_private_lock);
 			invalidate_bh_lrus();
 			invalidated = true;
 			goto recheck_buffers;
@@ -883,7 +885,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 
 unlock_buffers:
 	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
+		clear_bit_unlock(BH_Migrate, &head->b_state);
 	bh = head;
 	do {
 		unlock_buffer(bh);
-- 
2.39.5




* Re: [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
                   ` (6 preceding siblings ...)
  2025-04-18  1:59 ` [PATCH 7/7] mm/migrate: fix sleep in atomic for large folios and buffer heads Davidlohr Bueso
@ 2025-04-21 21:06 ` Luis Chamberlain
  2025-04-22  7:51 ` Christian Brauner
  8 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2025-04-21 21:06 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: jack, tytso, adilger.kernel, brauner, willy, hare, djwong,
	linux-ext4, linux-fsdevel, linux-mm

On Thu, Apr 17, 2025 at 06:59:14PM -0700, Davidlohr Bueso wrote:
> Hello,
> 
> Changes from v1: rebased on top of vfs.fixes (Christian).
> 
> This is a respin of the series[0] to address the sleep in atomic scenarios for
> noref migration with large folios, introduced in:
> 
>       3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes")
> 
> The main difference is that it removes the first patch and moves the fix (reducing
> the i_private_lock critical region in the migration path) to the final patch, which
> also introduces the new BH_Migrate flag. It also simplifies the locking scheme in
> patch 1 to avoid folio trylocking in the atomic lookup cases. So essentially blocking
> users will take the folio lock and hence wait for migration, and otherwise nonblocking
> callers will bail the lookup if a noref migration is on-going. Blocking callers
> will also benefit from potential performance gains by reducing contention on the
> spinlock for bdev mappings.
> 
> Applies against latest vfs.fixes. Please consider for Linus' tree.
> 
> Patch 1: carves a path for callers that can block to take the folio lock.
> Patch 2: adds sleeping flavors to pagecache lookups, no users.
> Patches 3-6: converts to the new call, where possible.
> Patch 7: does the actual sleep in atomic fix.
> 
> Thanks!

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops@lists.linux.dev # [0] [1]
Link: https://kdevops.org/ext4/v6.15-rc2.html # [0]
Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1]

  Luis



* Re: [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking
  2025-04-18  1:59 [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Davidlohr Bueso
                   ` (7 preceding siblings ...)
  2025-04-21 21:06 ` [PATCH v2 0/7] fs/buffer: split pagecache lookups into atomic or blocking Luis Chamberlain
@ 2025-04-22  7:51 ` Christian Brauner
  8 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-04-22  7:51 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, Davidlohr Bueso
  Cc: Christian Brauner, mcgrof, willy, hare, djwong, linux-ext4,
	linux-fsdevel, linux-mm

On Thu, 17 Apr 2025 18:59:14 -0700, Davidlohr Bueso wrote:
> Changes from v1: rebased on top of vfs.fixes (Christian).
> 
> This is a respin of the series[0] to address the sleep in atomic scenarios for
> noref migration with large folios, introduced in:
> 
>       3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes")
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/7] fs/buffer: split locking for pagecache lookups
      https://git.kernel.org/vfs/vfs/c/86e6a02d527d
[2/7] fs/buffer: introduce sleeping flavors for pagecache lookups
      https://git.kernel.org/vfs/vfs/c/b3a1fdf3516f
[3/7] fs/buffer: use sleeping version of __find_get_block()
      https://git.kernel.org/vfs/vfs/c/5852088c19f2
[4/7] fs/ocfs2: use sleeping version of __find_get_block()
      https://git.kernel.org/vfs/vfs/c/2732f3e4752a
[5/7] fs/jbd2: use sleeping version of __find_get_block()
      https://git.kernel.org/vfs/vfs/c/ce1fb0552481
[6/7] fs/ext4: use sleeping version of sb_find_get_block()
      https://git.kernel.org/vfs/vfs/c/9461e25786b0
[7/7] mm/migrate: fix sleep in atomic for large folios and buffer heads
      https://git.kernel.org/vfs/vfs/c/6676209df9cf



* Re: [PATCH 5/7] fs/jbd2: use sleeping version of __find_get_block()
  2025-04-15 23:16 ` [PATCH 5/7] fs/jbd2: use sleeping version of __find_get_block() Davidlohr Bueso
@ 2025-04-16  9:38   ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2025-04-16  9:38 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: jack, tytso, adilger.kernel, brauner, mcgrof, willy, hare,
	djwong, linux-ext4, linux-fsdevel, linux-mm

On Tue 15-04-25 16:16:33, Davidlohr Bueso wrote:
> Convert to the new nonatomic flavor to benefit from potential
> performance benefits and adapt in the future vs migration such
> that semantics are kept.
> 
> - jbd2_journal_revoke(): can sleep (has might_sleep() in the beginning)
> 
> - jbd2_journal_cancel_revoke(): only used from do_get_write_access() and
>     do_get_create_access() which do sleep. So can sleep.
> 
> - jbd2_clear_buffer_revoked_flags() - only called from journal commit code
>     which sleeps. So can sleep.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/jbd2/revoke.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
> index 0cf0fddbee81..1467f6790747 100644
> --- a/fs/jbd2/revoke.c
> +++ b/fs/jbd2/revoke.c
> @@ -345,7 +345,8 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
>  	bh = bh_in;
>  
>  	if (!bh) {
> -		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
> +		bh = __find_get_block_nonatomic(bdev, blocknr,
> +						journal->j_blocksize);
>  		if (bh)
>  			BUFFER_TRACE(bh, "found on hash");
>  	}
> @@ -355,7 +356,8 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
>  
>  		/* If there is a different buffer_head lying around in
>  		 * memory anywhere... */
> -		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
> +		bh2 = __find_get_block_nonatomic(bdev, blocknr,
> +						 journal->j_blocksize);
>  		if (bh2) {
>  			/* ... and it has RevokeValid status... */
>  			if (bh2 != bh && buffer_revokevalid(bh2))
> @@ -464,7 +466,8 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
>  	 * state machine will get very upset later on. */
>  	if (need_cancel) {
>  		struct buffer_head *bh2;
> -		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
> +		bh2 = __find_get_block_nonatomic(bh->b_bdev, bh->b_blocknr,
> +						 bh->b_size);
>  		if (bh2) {
>  			if (bh2 != bh)
>  				clear_buffer_revoked(bh2);
> @@ -492,9 +495,9 @@ void jbd2_clear_buffer_revoked_flags(journal_t *journal)
>  			struct jbd2_revoke_record_s *record;
>  			struct buffer_head *bh;
>  			record = (struct jbd2_revoke_record_s *)list_entry;
> -			bh = __find_get_block(journal->j_fs_dev,
> -					      record->blocknr,
> -					      journal->j_blocksize);
> +			bh = __find_get_block_nonatomic(journal->j_fs_dev,
> +							record->blocknr,
> +							journal->j_blocksize);
>  			if (bh) {
>  				clear_buffer_revoked(bh);
>  				__brelse(bh);
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* [PATCH 5/7] fs/jbd2: use sleeping version of __find_get_block()
  2025-04-15 23:16 [PATCH -next " Davidlohr Bueso
@ 2025-04-15 23:16 ` Davidlohr Bueso
  2025-04-16  9:38   ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2025-04-15 23:16 UTC (permalink / raw)
  To: jack, tytso, adilger.kernel, brauner
  Cc: mcgrof, willy, hare, djwong, linux-ext4, linux-fsdevel, linux-mm,
	Davidlohr Bueso

Convert to the new nonatomic flavor to benefit from potential
performance benefits and adapt in the future vs migration such
that semantics are kept.

- jbd2_journal_revoke(): can sleep (has might_sleep() in the beginning)

- jbd2_journal_cancel_revoke(): only used from do_get_write_access() and
    do_get_create_access() which do sleep. So can sleep.

- jbd2_clear_buffer_revoked_flags() - only called from journal commit code
    which sleeps. So can sleep.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/jbd2/revoke.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 0cf0fddbee81..1467f6790747 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -345,7 +345,8 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 	bh = bh_in;
 
 	if (!bh) {
-		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh = __find_get_block_nonatomic(bdev, blocknr,
+						journal->j_blocksize);
 		if (bh)
 			BUFFER_TRACE(bh, "found on hash");
 	}
@@ -355,7 +356,8 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 
 		/* If there is a different buffer_head lying around in
 		 * memory anywhere... */
-		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh2 = __find_get_block_nonatomic(bdev, blocknr,
+						 journal->j_blocksize);
 		if (bh2) {
 			/* ... and it has RevokeValid status... */
 			if (bh2 != bh && buffer_revokevalid(bh2))
@@ -464,7 +466,8 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
 	 * state machine will get very upset later on. */
 	if (need_cancel) {
 		struct buffer_head *bh2;
-		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
+		bh2 = __find_get_block_nonatomic(bh->b_bdev, bh->b_blocknr,
+						 bh->b_size);
 		if (bh2) {
 			if (bh2 != bh)
 				clear_buffer_revoked(bh2);
@@ -492,9 +495,9 @@ void jbd2_clear_buffer_revoked_flags(journal_t *journal)
 			struct jbd2_revoke_record_s *record;
 			struct buffer_head *bh;
 			record = (struct jbd2_revoke_record_s *)list_entry;
-			bh = __find_get_block(journal->j_fs_dev,
-					      record->blocknr,
-					      journal->j_blocksize);
+			bh = __find_get_block_nonatomic(journal->j_fs_dev,
+							record->blocknr,
+							journal->j_blocksize);
 			if (bh) {
 				clear_buffer_revoked(bh);
 				__brelse(bh);
-- 
2.39.5



