Date: Tue, 15 Apr 2025 18:23:54 +0200
From: Jan Kara
To: Luis Chamberlain
Cc: Christian Brauner, Jan Kara, tytso@mit.edu, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, riel@surriel.com, dave@stgolabs.net,
	willy@infradead.org, hannes@cmpxchg.org, oliver.sang@intel.com,
	david@redhat.com, axboe@kernel.dk, hare@suse.de, david@fromorbit.com,
	djwong@kernel.org, ritesh.list@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
	da.gomez@samsung.com,
	syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 1/8] migrate: fix skipping metadata buffer heads on migration
References: <20250410014945.2140781-1-mcgrof@kernel.org>
	<20250410014945.2140781-2-mcgrof@kernel.org>
	<20250415-freihalten-tausend-a9791b9c3a03@brauner>

On Tue 15-04-25 08:47:51, Luis Chamberlain wrote:
> On Tue, Apr 15, 2025 at 11:05:38AM +0200, Christian Brauner wrote:
> > On Mon, Apr 14, 2025 at 03:19:33PM -0700, Luis Chamberlain wrote:
> > > On Mon, Apr 14, 2025 at 02:09:46PM -0700, Luis Chamberlain wrote:
> > > > On Thu, Apr 10, 2025 at 02:05:38PM +0200, Jan Kara wrote:
> > > > > > @@ -859,12 +862,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
> > > > > >  		}
> > > > > >  		bh = bh->b_this_page;
> > > > > >  	} while (bh != head);
> > > > > > +	spin_unlock(&mapping->i_private_lock);
> > > > >
> > > > > No, you've just broken all simple filesystems (like ext2) with this patch.
> > > > > You can reduce the spinlock critical section only after providing
> > > > > alternative way to protect them from migration. So this should probably
> > > > > happen at the end of the series.
> > > >
> > > > So you're OK with this spin lock move with the other series in place?
> > > >
> > > > And so we punt the hard-to-reproduce corruption issue as future work
> > > > to do? Because the other alternative for now is to just disable
> > > > migration for jbd2:
> > > >
> > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > index 1dc09ed5d403..ef1c3ef68877 100644
> > > > --- a/fs/ext4/inode.c
> > > > +++ b/fs/ext4/inode.c
> > > > @@ -3631,7 +3631,6 @@ static const struct address_space_operations ext4_journalled_aops = {
> > > >  	.bmap = ext4_bmap,
> > > >  	.invalidate_folio = ext4_journalled_invalidate_folio,
> > > >  	.release_folio = ext4_release_folio,
> > > > -	.migrate_folio = buffer_migrate_folio_norefs,
> > > >  	.is_partially_uptodate = block_is_partially_uptodate,
> > > >  	.error_remove_folio = generic_error_remove_folio,
> > > >  	.swap_activate = ext4_iomap_swap_activate,
> > > >
> > > BTW I ask because.. are your expectations that the next v3 series also
> > > be a target for Linus tree as part of a fix for this spinlock
> > > replacement?
> >
> > Since this is fixing potential filesystem corruption I will upstream
> > whatever we need to do to fix this. Ideally we have a minimal fix to
> > upstream now and a comprehensive fix and cleanup for v6.16.
>
> Despite our efforts we don't yet have an agreement on how to fix the
> ext4 corruption, because Jan noted the buffer_meta() check in this patch
> is too broad and would affect other filesystems (I have yet to
> understand how, but will review).
>
> And so while we have agreement we can remove the spin lock to fix the
> sleeping while atomic incurred by large folios for buffer heads by this
> patch series, the removal of the spin lock would happen at the end of
> this series.
>
> And so the ext4 corruption is an existing issue as-is today, it's
> separate from the spin lock removal goal to fix the sleeping while
> atomic..

I agree. Ext4 corruption problems are separate from sleeping in atomic
issues.

> However this series might be quite big for an rc2 or rc3 fix for that spin
> lock removal issue. It should bring in substantial performance benefits
> though, so it might be worthy to consider. We can re-run tests with the
> adjustment to remove the spin lock until the last patch in this series.
>
> The alternative is to revert the spin lock addition commit for Linus'
> tree, ie commit ebdf4de5642fb6 ("mm: migrate: fix reference check race
> between __find_get_block() and migration") and note that it in fact does
> not fix the ext4 corruption as we've noted, and in fact causes an issue
> with sleeping while atomic with support for large folios for buffer
> heads. If we do that then we punt this series for the next development
> window, and it would just not have the spin lock removal on the last
> patch.

Well, the commit ebdf4de5642fb6 is 6 years old. At that time there were no
large folios (in fact there were no folios at all ;)) in the page cache and
it does work quite well (I haven't seen a corruption report from real users
since then). So I don't like removing that commit because it makes a
"reproducible with a heavy stress test" problem become a "reproduced by
real world workloads" problem.

If you are looking for a fast way to fix up the sleep-in-atomic issues, then
I'd suggest just disabling large folios for the block device page cache.
That is the new functionality that actually triggered all these
investigations and sleep-in-atomic reports. And once this patch set gets
merged, we can re-enable large folios in the block device page cache again.

								Honza
-- 
Jan Kara
SUSE Labs, CR