From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 29 Apr 2025 11:32:41 +0200
From: Jan Kara
To: Luis Chamberlain
Cc: Jan Kara, dave@stgolabs.net, brauner@kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	riel@surriel.com, willy@infradead.org, hannes@cmpxchg.org,
	oliver.sang@intel.com, david@redhat.com, axboe@kernel.dk,
	hare@suse.de, david@fromorbit.com, djwong@kernel.org,
	ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com,
	syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 1/8] migrate: fix skipping metadata buffer heads on migration
References: <20250410014945.2140781-1-mcgrof@kernel.org>
	<20250410014945.2140781-2-mcgrof@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon 28-04-25 16:08:52, Luis Chamberlain wrote:
> On Fri, Apr 25, 2025 at 03:51:55PM -0700, Luis Chamberlain wrote:
> > On Wed, Apr 23, 2025 at 01:30:29PM -0700, Luis Chamberlain wrote:
> > > On Wed, Apr 23, 2025 at 07:09:28PM +0200, Jan Kara wrote:
> > > > On Wed 16-04-25 09:58:30, Luis Chamberlain wrote:
> > > > > On Tue, Apr 15, 2025 at 06:28:55PM +0200, Jan Kara wrote:
> > > > > > > So I tried:
> > > > > > >
> > > > > > > root@e1-ext4-2k /var/lib/xfstests # fsck /dev/loop5 -y 2>&1 > log
> > > > > > > e2fsck 1.47.2 (1-Jan-2025)
> > > > > > > root@e1-ext4-2k /var/lib/xfstests # wc -l log
> > > > > > > 16411 log
> > > > > >
> > > > > > Can you share the log please?
> > > > >
> > > > > Sure, here you go:
> > > > >
> > > > > https://github.com/linux-kdevops/20250416-ext4-jbd2-bh-migrate-corruption
> > > > >
> > > > > The last trace-0004.txt is a fresh one with Davidlohr's patches
> > > > > applied. It has trace-0004-fsck.txt.
> > > >
> > > > Thanks for the data! I was staring at them for some time and at this point
> > > > I'm leaning towards a conclusion that this is actually not a case of
> > > > metadata corruption but rather a bug in ext4 transaction credit computation
> > > > that is completely independent of page migration.
> > > >
> > > > Based on the e2fsck log you've provided the only damage in the filesystem
> > > > is from the aborted transaction handle in the middle of extent tree growth.
> > > > So nothing points to a lost metadata write or anything like that. And the
> > > > credit reservation for page writeback is indeed somewhat racy - we reserve
> > > > number of transaction credits based on current tree depth. However by the
> > > > time we get to ext4_ext_map_blocks() another process could have modified
> > > > the extent tree so we may need to modify more blocks than we originally
> > > > expected and reserved credits for.
> > > >
> > > > Can you give attached patch a try please?
> > > >
> > > > Honza
> > > > --
> > > > Jan Kara
> > > > SUSE Labs, CR
> > >
> > > > From 4c53fb9f4b9b3eb4a579f69b7adcb6524d55629c Mon Sep 17 00:00:00 2001
> > > > From: Jan Kara
> > > > Date: Wed, 23 Apr 2025 18:10:54 +0200
> > > > Subject: [PATCH] ext4: Fix calculation of credits for extent tree modification
> > > >
> > > > Luis and David are reporting that after running generic/750 test for 90+
> > > > hours on 2k ext4 filesystem, they are able to trigger a warning in
> > > > jbd2_journal_dirty_metadata() complaining that there are not enough
> > > > credits in the running transaction started in ext4_do_writepages().
> > > >
> > > > Indeed the code in ext4_do_writepages() is racy and the extent tree can
> > > > change between the time we compute credits necessary for extent tree
> > > > computation and the time we actually modify the extent tree. Thus it may
> > > > happen that the number of credits actually needed is higher. Modify
> > > > ext4_ext_index_trans_blocks() to count with the worst case of maximum
> > > > tree depth.
> > > >
> > > > Link: https://lore.kernel.org/all/20250415013641.f2ppw6wov4kn4wq2@offworld
> > > > Reported-by: Davidlohr Bueso
> > > > Reported-by: Luis Chamberlain
> > > > Signed-off-by: Jan Kara
> > >
> > > I kicked off tests! Let's see after ~ 90 hours!
> >
> > Tested-by: kdevops@lists.linux.dev
> >
> > I have run the test over 3 separate guests and each one has tested this
> > for over 48 hours. There is no ext4 fs corruption reported, all is
> > good, so I do believe this fixes the issue. One of the guests was on
> > Linus's tree which didn't yet have Davidlohr's fixes for folio migration.
> > And so I believe this patch should have a stable tag fix so stable gets it.
>
> Jan, my testing has passed 120 hours now on multiple guests. This is
> certainly a fixed bug with your patch.

Thanks for testing! I'll do an official submission of the fix.

								Honza
--
Jan Kara
SUSE Labs, CR
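
For readers following the thread, the race described in the quoted commit message can be modelled with a small sketch. This is a toy model only, not ext4 code: the function names, the cost formula (two blocks per index level plus the leaf) and the MAX_EXTENT_DEPTH value of 5 are all assumptions made for illustration. It only shows why a reservation sized from the depth seen at handle start can fall short once another process has grown the tree, and why sizing for the maximum depth cannot.

```python
# Toy model only: names and formulas are illustrative assumptions,
# not the ext4 implementation.

MAX_EXTENT_DEPTH = 5  # assumed upper bound on extent tree depth


def credits_for_depth(depth: int) -> int:
    """Blocks that may be dirtied if an insert splits every level.

    Assumed cost model: one modified block plus one newly allocated
    block per index level, plus the leaf itself.
    """
    return 2 * depth + 1


def reserve_by_current_depth(depth_at_reservation: int) -> int:
    """Racy estimate: trusts the depth seen when the handle is started."""
    return credits_for_depth(depth_at_reservation)


def reserve_worst_case() -> int:
    """Conservative estimate: always assumes the maximum tree depth."""
    return credits_for_depth(MAX_EXTENT_DEPTH)


def has_enough_credits(reserved: int, depth_at_modification: int) -> bool:
    """True if the reservation covers the tree as it is at modify time."""
    return reserved >= credits_for_depth(depth_at_modification)


if __name__ == "__main__":
    # Writeback samples a depth-1 tree; another process grows it to
    # depth 2 before the extent tree is actually modified.
    racy = reserve_by_current_depth(1)
    safe = reserve_worst_case()
    print(has_enough_credits(racy, 2))  # False: reservation too small
    print(has_enough_credits(safe, 2))  # True
```

The trade-off in this model is that the worst-case reservation asks for more credits than most writes will use, in exchange for never running short after a concurrent tree change.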