From: Jan Kara <jack@suse.cz>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jan Kara <jack@suse.cz>,
dave@stgolabs.net, brauner@kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
riel@surriel.com, willy@infradead.org, hannes@cmpxchg.org,
oliver.sang@intel.com, david@redhat.com, axboe@kernel.dk,
hare@suse.de, david@fromorbit.com, djwong@kernel.org,
ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org,
linux-block@vger.kernel.org, linux-mm@kvack.org,
gost.dev@samsung.com, p.raghav@samsung.com,
da.gomez@samsung.com,
syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 1/8] migrate: fix skipping metadata buffer heads on migration
Date: Wed, 23 Apr 2025 19:09:28 +0200 [thread overview]
Message-ID: <jwciumjkfwwjeoklsi6ubcspcjswkz5s5gtttzpjqft6dtb7sp@c4ae6y5pix5w> (raw)
In-Reply-To: <Z__hthNd2nj9QjrM@bombadil.infradead.org>
[-- Attachment #1: Type: text/plain, Size: 1484 bytes --]
On Wed 16-04-25 09:58:30, Luis Chamberlain wrote:
> On Tue, Apr 15, 2025 at 06:28:55PM +0200, Jan Kara wrote:
> > > So I tried:
> > >
> > > root@e1-ext4-2k /var/lib/xfstests # fsck /dev/loop5 -y 2>&1 > log
> > > e2fsck 1.47.2 (1-Jan-2025)
> > > root@e1-ext4-2k /var/lib/xfstests # wc -l log
> > > 16411 log
> >
> > Can you share the log please?
>
> Sure, here you go:
>
> https://github.com/linux-kdevops/20250416-ext4-jbd2-bh-migrate-corruption
>
> The last trace-0004.txt is a fresh one with Davidlohr's patches
> applied. It has trace-0004-fsck.txt.
Thanks for the data! I was staring at them for some time and at this point
I'm leaning towards a conclusion that this is actually not a case of
metadata corruption but rather a bug in ext4 transaction credit computation
that is completely independent of page migration.
Based on the e2fsck log you've provided the only damage in the filesystem
is from the aborted transaction handle in the middle of extent tree growth.
So nothing points to a lost metadata write or anything like that. And the
credit reservation for page writeback is indeed somewhat racy - we reserve
number of transaction credits based on current tree depth. However by the
time we get to ext4_ext_map_blocks() another process could have modified
the extent tree so we may need to modify more blocks than we originally
expected and reserved credits for.
Can you give attached patch a try please?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
[-- Attachment #2: 0001-ext4-Fix-calculation-of-credits-for-extent-tree-modi.patch --]
[-- Type: text/x-patch, Size: 1981 bytes --]
From 4c53fb9f4b9b3eb4a579f69b7adcb6524d55629c Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 23 Apr 2025 18:10:54 +0200
Subject: [PATCH] ext4: Fix calculation of credits for extent tree modification
Luis and David are reporting that after running generic/750 test for 90+
hours on 2k ext4 filesystem, they are able to trigger a warning in
jbd2_journal_dirty_metadata() complaining that there are not enough
credits in the running transaction started in ext4_do_writepages().
Indeed the code in ext4_do_writepages() is racy and the extent tree can
change between the time we compute credits necessary for extent tree
computation and the time we actually modify the extent tree. Thus it may
happen that the number of credits actually needed is higher. Modify
ext4_ext_index_trans_blocks() to count with the worst case of maximum
tree depth.
Link: https://lore.kernel.org/all/20250415013641.f2ppw6wov4kn4wq2@offworld
Reported-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/extents.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c616a16a9f36..43286632e650 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2396,18 +2396,19 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode, int nrblocks,
int ext4_ext_index_trans_blocks(struct inode *inode, int extents)
{
int index;
- int depth;
/* If we are converting the inline data, only one is needed here. */
if (ext4_has_inline_data(inode))
return 1;
- depth = ext_depth(inode);
-
+ /*
+ * Extent tree can change between the time we estimate credits and
+ * the time we actually modify the tree. Assume the worst case.
+ */
if (extents <= 1)
- index = depth * 2;
+ index = EXT4_MAX_EXTENT_DEPTH * 2;
else
- index = depth * 3;
+ index = EXT4_MAX_EXTENT_DEPTH * 3;
return index;
}
--
2.43.0
next prev parent reply other threads:[~2025-04-23 17:09 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-10 1:49 [PATCH v2 0/8] mm: enhance migration work around on noref buffer-heads Luis Chamberlain
2025-04-10 1:49 ` [PATCH v2 1/8] migrate: fix skipping metadata buffer heads on migration Luis Chamberlain
2025-04-10 3:18 ` Matthew Wilcox
2025-04-10 12:05 ` Jan Kara
2025-04-14 21:09 ` Luis Chamberlain
2025-04-14 22:19 ` Luis Chamberlain
2025-04-15 9:05 ` Christian Brauner
2025-04-15 15:47 ` Luis Chamberlain
2025-04-15 16:23 ` Jan Kara
2025-04-15 21:06 ` Luis Chamberlain
2025-04-16 2:02 ` Davidlohr Bueso
2025-04-15 11:17 ` Jan Kara
2025-04-15 11:23 ` Jan Kara
2025-04-15 16:18 ` Luis Chamberlain
2025-04-15 16:28 ` Jan Kara
2025-04-16 16:58 ` Luis Chamberlain
2025-04-23 17:09 ` Jan Kara [this message]
2025-04-23 20:30 ` Luis Chamberlain
2025-04-25 22:51 ` Luis Chamberlain
2025-04-28 23:08 ` Luis Chamberlain
2025-04-29 9:32 ` Jan Kara
2025-04-15 1:36 ` Davidlohr Bueso
2025-04-15 11:25 ` Jan Kara
2025-04-15 18:14 ` Davidlohr Bueso
2025-04-10 1:49 ` [PATCH v2 2/8] fs/buffer: try to use folio lock for pagecache lookups Luis Chamberlain
2025-04-10 14:38 ` Jan Kara
2025-04-10 17:38 ` Davidlohr Bueso
2025-04-10 1:49 ` [PATCH v2 3/8] fs/buffer: introduce __find_get_block_nonatomic() Luis Chamberlain
2025-04-10 1:49 ` [PATCH v2 4/8] fs/ocfs2: use sleeping version of __find_get_block() Luis Chamberlain
2025-04-10 1:49 ` [PATCH v2 5/8] fs/jbd2: " Luis Chamberlain
2025-04-10 1:49 ` [PATCH v2 6/8] fs/ext4: " Luis Chamberlain
2025-04-10 13:36 ` Jan Kara
2025-04-10 17:32 ` Davidlohr Bueso
2025-04-10 1:49 ` [PATCH v2 7/8] mm/migrate: enable noref migration for jbd2 Luis Chamberlain
2025-04-10 13:40 ` Jan Kara
2025-04-10 17:30 ` Davidlohr Bueso
2025-04-14 12:12 ` Jan Kara
2025-04-10 1:49 ` [PATCH v2 8/8] mm: add migration buffer-head debugfs interface Luis Chamberlain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=jwciumjkfwwjeoklsi6ubcspcjswkz5s5gtttzpjqft6dtb7sp@c4ae6y5pix5w \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=da.gomez@samsung.com \
--cc=dave@stgolabs.net \
--cc=david@fromorbit.com \
--cc=david@redhat.com \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hannes@cmpxchg.org \
--cc=hare@suse.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mcgrof@kernel.org \
--cc=oliver.sang@intel.com \
--cc=p.raghav@samsung.com \
--cc=riel@surriel.com \
--cc=ritesh.list@gmail.com \
--cc=syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com \
--cc=tytso@mit.edu \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox