Date: Tue, 15 Apr 2025 14:06:20 -0700
From: Luis Chamberlain
To: Jan Kara
Cc: Christian Brauner, tytso@mit.edu, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, riel@surriel.com, dave@stgolabs.net,
	willy@infradead.org, hannes@cmpxchg.org, oliver.sang@intel.com,
	david@redhat.com, axboe@kernel.dk, hare@suse.de, david@fromorbit.com,
	djwong@kernel.org, ritesh.list@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
	da.gomez@samsung.com,
	syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 1/8] migrate: fix skipping metadata buffer heads on migration
References: <20250410014945.2140781-1-mcgrof@kernel.org>
	<20250410014945.2140781-2-mcgrof@kernel.org>
	<20250415-freihalten-tausend-a9791b9c3a03@brauner>

On Tue, Apr 15, 2025 at 06:23:54PM +0200, Jan Kara wrote:
> On Tue 15-04-25 08:47:51, Luis Chamberlain wrote:
> > On Tue, Apr 15, 2025 at 11:05:38AM +0200, Christian Brauner wrote:
> > > On Mon, Apr 14, 2025 at 03:19:33PM -0700, Luis Chamberlain wrote:
> > > > On Mon, Apr 14, 2025 at 02:09:46PM -0700, Luis Chamberlain wrote:
> > > > > On Thu, Apr 10, 2025 at 02:05:38PM +0200, Jan Kara wrote:
> > > > > > > @@ -859,12 +862,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
> > > > > > >  			}
> > > > > > >  			bh = bh->b_this_page;
> > > > > > >  		} while (bh != head);
> > > > > > > +		spin_unlock(&mapping->i_private_lock);
> > > > > >
> > > > > > No, you've just broken all simple filesystems (like ext2) with this patch.
> > > > > > You can reduce the spinlock critical section only after providing
> > > > > > alternative way to protect them from migration. So this should probably
> > > > > > happen at the end of the series.
> > > > >
> > > > > So you're OK with this spin lock move with the other series in place?
> > > > >
> > > > > And so we punt the hard-to-reproduce corruption issue as future work
> > > > > to do?
> > > > > Because the other alternative for now is to just disable
> > > > > migration for jbd2:
> > > > >
> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > index 1dc09ed5d403..ef1c3ef68877 100644
> > > > > --- a/fs/ext4/inode.c
> > > > > +++ b/fs/ext4/inode.c
> > > > > @@ -3631,7 +3631,6 @@ static const struct address_space_operations ext4_journalled_aops = {
> > > > >  	.bmap = ext4_bmap,
> > > > >  	.invalidate_folio = ext4_journalled_invalidate_folio,
> > > > >  	.release_folio = ext4_release_folio,
> > > > > -	.migrate_folio = buffer_migrate_folio_norefs,
> > > > >  	.is_partially_uptodate = block_is_partially_uptodate,
> > > > >  	.error_remove_folio = generic_error_remove_folio,
> > > > >  	.swap_activate = ext4_iomap_swap_activate,
> > > >
> > > > BTW I ask because.. are your expectations that the next v3 series also
> > > > be a target for Linus tree as part of a fix for this spinlock
> > > > replacement?
> > >
> > > Since this is fixing potential filesystem corruption I will upstream
> > > whatever we need to do to fix this. Ideally we have a minimal fix to
> > > upstream now and a comprehensive fix and cleanup for v6.16.
> >
> > Despite our efforts we don't yet have an agreement on how to fix the
> > ext4 corruption, because Jan noted the buffer_meta() check in this patch
> > is too broad and would affect other filesystems (I have yet to
> > understand how, but will review).
> >
> > And so while we have agreement we can remove the spin lock to fix the
> > sleeping while atomic incurred by large folios for buffer heads by this
> > patch series, the removal of the spin lock would happen at the end of
> > this series.
> >
> > And so the ext4 corruption is an existing issue as-is today, it's
> > separate from the spin lock removal goal to fix the sleeping while
> > atomic.
>
> I agree. Ext4 corruption problems are separate from sleeping in atomic
> issues.

Glad that's clear.

> > However this series might be quite big for an rc2 or rc3 fix for that spin
> > lock removal issue. It should bring in substantial performance benefits
> > though, so it might be worthy to consider. We can re-run tests with the
> > adjustment to remove the spin lock until the last patch in this series.
> >
> > The alternative is to revert the spin lock addition commit for Linus'
> > tree, i.e. commit ebdf4de5642fb6 ("mm: migrate: fix reference check race
> > between __find_get_block() and migration") and note that it in fact does
> > not fix the ext4 corruption as we've noted, and in fact causes an issue
> > with sleeping while atomic with support for large folios for buffer
> > heads. If we do that then we punt this series for the next development
> > window, and it would just not have the spin lock removal on the last
> > patch.
>
> Well, the commit ebdf4de5642fb6 is 6 years old. At that time there were no
> large folios (in fact there were no folios at all ;)) in the page cache and
> it does work quite well (I didn't see a corruption report from real users
> since then).

It is still a workaround.

> So I don't like removing that commit because it makes a
> "reproducible with a heavy stress test" problem become a "reproduced by
> real world workloads" problem.

So how about just patches 2 and 8 in this series, with the spin lock removal
happening in the last patch for Linus' tree?

  Luis