Re: [PATCH 2/3] fs/buffer: avoid races with folio migrations on __find_get_block_slow()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Davidlohr Bueso <dave@stgolabs.net>
To: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	brauner@kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, riel@surriel.com,
	willy@infradead.org, hannes@cmpxchg.org, oliver.sang@intel.com,
	david@redhat.com, axboe@kernel.dk, hare@suse.de,
	david@fromorbit.com, djwong@kernel.org, ritesh.list@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
	da.gomez@samsung.com
Subject: Re: [PATCH 2/3] fs/buffer: avoid races with folio migrations on __find_get_block_slow()
Date: Tue, 1 Apr 2025 14:49:51 -0700	[thread overview]
Message-ID: <20250401214951.kikcrmu5k3q6qmcr@offworld> (raw)
In-Reply-To: <lj6o73q6nev776uvy7potqrn5gmgtm4o2cev7dloedwasxcsmn@uanvqp3sm35p>

On Tue, 01 Apr 2025, Jan Kara wrote:

>I find this problematic. It fixes the race with migration, alright
>(although IMO we should have a comment very well explaining the interplay
>of folio lock and mapping->private_lock to make this work - probably in
>buffer_migrate_folio_norefs() - and reference it from here), but there are
>places which expect that if __find_get_block() doesn't return anything,
>this block is not cached in the buffer cache. And your change breaks this
>assumption. Look for example at write_boundary_block(), that will fail to
>write the block it should write if it races with someone locking the folio
>after your changes. Similarly the code tracking state of deleted metadata
>blocks in fs/jbd2/revoke.c will fail to properly update buffer's state if
>__find_get_block() suddently starts returning NULL although the buffer is
>present in cache.

Yeah - one thing I was thinking about, _iff_ failing lookups (__find_get_block()
returning nil) during migration is in fact permitted, was adding a BH_migrate
flag and serialize vs __buffer_migrate_folio() entirely. Semantically there
are no users, and none are added during this window, but as a consequence I
suppose one thread could see the page not cached, act upon that, then see it
cached once the migration is done and get confused(?). So I don't see a problem
here for write_boundary_block() specifically, but I'm probably overlooking others.

Now, if bailing on the lookup is not an option, meaning it must wait for the
migration to complete, I'm not sure large folios will ever be compatible with
the "Various filesystems appear to want __find_get_block to be non-blocking."
comment.

So the below could be tucked in for norefs only (because this is about the addr
space i_private_lock), but this also shortens the hold time; if that matters
at all, of course, vs changing the migration semantics.

Thanks,
Davidlohr

---8<----------------------------
diff --git a/fs/buffer.c b/fs/buffer.c
index cc8452f60251..f585339ae2e4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -208,6 +208,14 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
	head = folio_buffers(folio);
	if (!head)
		goto out_unlock;
+
+	bh = head;
+	do {
+		if (test_bit(BH_migrate, &bh->b_state))
+			goto out_unlock;
+		bh = bh->b_this_page;
+	} while (bh != head);
+
	bh = head;
	do {
		if (!buffer_mapped(bh))
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 932139c5d46f..e956a1509a05 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -34,6 +34,7 @@ enum bh_state_bits {
	BH_Meta,	/* Buffer contains metadata */
	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_migrate,     /* Buffer is being migrated */

	BH_PrivateStart,/* not a state bit, but the first bit available
			 * for private allocation by other entities
diff --git a/mm/migrate.c b/mm/migrate.c
index a073eb6c5009..0ffa8b478fd3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -846,6 +846,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
	if (!buffer_migrate_lock_buffers(head, mode))
		return -EAGAIN;

+	bh = head;
+	do {
+		set_bit(BH_migrate, &bh->b_state);
+		bh = bh->b_this_page;
+	} while (bh != head);
+
	if (check_refs) {
		bool busy;
		bool invalidated = false;
@@ -861,12 +867,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
			}
			bh = bh->b_this_page;
		} while (bh != head);
+		spin_unlock(&mapping->i_private_lock);
		if (busy) {
			if (invalidated) {
				rc = -EAGAIN;
				goto unlock_buffers;
			}
-			spin_unlock(&mapping->i_private_lock);
			invalidate_bh_lrus();
			invalidated = true;
			goto recheck_buffers;
@@ -884,10 +890,9 @@ static int __buffer_migrate_folio(struct address_space *mapping,
	} while (bh != head);

  unlock_buffers:
-	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
	bh = head;
	do {
+		clear_bit(BH_migrate, &bh->b_state)
		unlock_buffer(bh);
		bh = bh->b_this_page;
	} while (bh != head);

next prev parent reply	other threads:[~2025-04-01 21:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-30  6:47 [PATCH 0/3] mm: move migration work around to buffer-heads Luis Chamberlain
2025-03-30  6:47 ` [PATCH 1/3] mm/migrate: add might_sleep() on __migrate_folio() Luis Chamberlain
2025-03-30 12:04   ` Matthew Wilcox
2025-03-31  6:28     ` Luis Chamberlain
2025-04-01 22:53     ` Davidlohr Bueso
2025-03-30  6:47 ` [PATCH 2/3] fs/buffer: avoid races with folio migrations on __find_get_block_slow() Luis Chamberlain
2025-03-31 19:58   ` Luis Chamberlain
2025-04-02 23:11     ` Luis Chamberlain
2025-04-04 15:55       ` Luis Chamberlain
2025-04-01 10:57   ` Jan Kara
2025-04-01 21:49     ` Davidlohr Bueso [this message]
2025-04-02  1:58       ` Matthew Wilcox
2025-04-03  2:04         ` Luis Chamberlain
2025-04-03 13:43           ` Jan Kara
2025-04-03 16:11             ` Theodore Ts'o
2025-04-03  1:02     ` Luis Chamberlain
2025-03-30  6:47 ` [PATCH 3/3] mm/migrate: avoid atomic context on buffer_migrate_folio_norefs() migration Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250401214951.kikcrmu5k3q6qmcr@offworld \
    --to=dave@stgolabs.net \
    --cc=adilger.kernel@dilger.ca \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=da.gomez@samsung.com \
    --cc=david@fromorbit.com \
    --cc=david@redhat.com \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hannes@cmpxchg.org \
    --cc=hare@suse.de \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=oliver.sang@intel.com \
    --cc=p.raghav@samsung.com \
    --cc=riel@surriel.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox