From: Jan Kara <jack@suse.cz>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
brauner@kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca,
linux-ext4@vger.kernel.org, riel@surriel.com,
hannes@cmpxchg.org, oliver.sang@intel.com, david@redhat.com,
axboe@kernel.dk, hare@suse.de, david@fromorbit.com,
djwong@kernel.org, ritesh.list@gmail.com,
linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
da.gomez@samsung.com
Subject: Re: [PATCH 2/3] fs/buffer: avoid races with folio migrations on __find_get_block_slow()
Date: Thu, 3 Apr 2025 15:43:12 +0200 [thread overview]
Message-ID: <2jrcw4mtwcophanqmi2y74ffgf247m6ap44u3gedpylsjl3bz6@yueuwkmcwm66> (raw)
In-Reply-To: <Z-3spxNHYe_CbLgP@bombadil.infradead.org>
On Wed 02-04-25 19:04:23, Luis Chamberlain wrote:
> On Wed, Apr 02, 2025 at 02:58:28AM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 01, 2025 at 02:49:51PM -0700, Davidlohr Bueso wrote:
> > > So the below could be tucked in for norefs only (because this is about the addr
> > > space i_private_lock), but this also shortens the hold time; if that matters
> > > at all, of course, vs changing the migration semantics.
> >
> > I like this approach a lot better. One wrinkle is that it doesn't seem
> > that we need to set the BH_Migrate bit on every buffer; we could define
> > that it's only set on the head BH, right?
>
> Yes, we are also only doing this for block devices, and for migration
> purposes. Even though a bit from one buffer may be desirable it makes
> no sense to allow for that in case migration is taking place. So indeed
> we have no need to add the flag for all buffers.
>
> I think the remaining question is what users of __find_get_block_slow()
> can really block, and well I've started trying to determine that with
> coccinelle [0], its gonna take some more time.
>
> Perhaps its easier to ask, why would a block device mapping want to
> allow __find_get_block_slow() to not block?
So I've audited all callers of __find_get_block_slow() (there aren't that
many) and most of them are actually fine with sleeping these days.
Analysis:
__find_get_block_slow() is only used from __find_get_block().
__find_get_block() is used from:
write_boundary_block() - locks the buffer so can sleep
bdev_getblk() - allocates buffers with 'gfp' mask. We use GFP_NOWAIT mask
from some places (generally doing readahead). For callers where
!gfpflags_allow_blocking() we should bail rather than block on
migration. Callers are currently fine with this, we should probably
document that bdev_getblk() with restrictive gfp mask may fail even if
bh is present - or perhaps make this even more explicit in the API by
providing bdev_try_getblk() and make bdev_getblk() assert gfp mask
allows sleeping.
__getblk_slow() - only called from bdev_getblk(). Probably should fold
there.
ocfs2_force_read_journal() - allows sleeping as it does IO
jbd2_journal_revoke() - can sleep (has might_sleep() in the beginning)
jbd2_journal_cancel_revoke() - only used from do_get_write_access() and
do_get_create_access() which do sleep. So can sleep.
jbd2_clear_buffer_revoked_flags() - only called from journal commit code
which sleeps. So can sleep.
The last user is sb_find_get_block() which is used from:
hpfs_prefetch_sectors() - prefers bail rather than blocking
fat_dir_readahead() - prefers bail rather than blocking
exfat_dir_readahead() - prefers bail rather than blocking
ext4_free_blocks() - can sleep
ext4_getblk() - depending on EXT4_GET_BLOCKS_CACHED_NOWAIT flag either can
sleep or must bail (and is fine with it) rather than sleeping
fs/ext4/ialloc.c:recently_deleted() - this one is the most problematic
place. It must bail rather than sleeping (called under a spinlock) but
it depends on the fact that if bh is not returned, then the data has been
written out and evicted from memory. Luckily, the usage of
recently_deleted() is mostly an optimization to reduce damage in case
of crash so rare false failure should be OK. Ted, what is your opinion?
And this is actually all. So it seems that if we give possibility to
callers to tell whether they want to bail or wait for migration, things
should work out fine.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2025-04-03 13:43 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-30 6:47 [PATCH 0/3] mm: move migration work around to buffer-heads Luis Chamberlain
2025-03-30 6:47 ` [PATCH 1/3] mm/migrate: add might_sleep() on __migrate_folio() Luis Chamberlain
2025-03-30 12:04 ` Matthew Wilcox
2025-03-31 6:28 ` Luis Chamberlain
2025-04-01 22:53 ` Davidlohr Bueso
2025-03-30 6:47 ` [PATCH 2/3] fs/buffer: avoid races with folio migrations on __find_get_block_slow() Luis Chamberlain
2025-03-31 19:58 ` Luis Chamberlain
2025-04-02 23:11 ` Luis Chamberlain
2025-04-04 15:55 ` Luis Chamberlain
2025-04-01 10:57 ` Jan Kara
2025-04-01 21:49 ` Davidlohr Bueso
2025-04-02 1:58 ` Matthew Wilcox
2025-04-03 2:04 ` Luis Chamberlain
2025-04-03 13:43 ` Jan Kara [this message]
2025-04-03 16:11 ` Theodore Ts'o
2025-04-03 1:02 ` Luis Chamberlain
2025-03-30 6:47 ` [PATCH 3/3] mm/migrate: avoid atomic context on buffer_migrate_folio_norefs() migration Luis Chamberlain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2jrcw4mtwcophanqmi2y74ffgf247m6ap44u3gedpylsjl3bz6@yueuwkmcwm66 \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=da.gomez@samsung.com \
--cc=david@fromorbit.com \
--cc=david@redhat.com \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hannes@cmpxchg.org \
--cc=hare@suse.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mcgrof@kernel.org \
--cc=oliver.sang@intel.com \
--cc=p.raghav@samsung.com \
--cc=riel@surriel.com \
--cc=ritesh.list@gmail.com \
--cc=tytso@mit.edu \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox