From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0F3FC3DA63 for ; Thu, 18 Jul 2024 16:02:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 29D926B0096; Thu, 18 Jul 2024 12:02:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 24D406B0098; Thu, 18 Jul 2024 12:02:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 13C456B0099; Thu, 18 Jul 2024 12:02:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id E85956B0096 for ; Thu, 18 Jul 2024 12:02:15 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 7BAE11A100D for ; Thu, 18 Jul 2024 16:02:15 +0000 (UTC) X-FDA: 82353340230.14.0DD5131 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by imf23.hostedemail.com (Postfix) with ESMTP id 15EE0140020 for ; Thu, 18 Jul 2024 16:02:08 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=iI0Cu8PP; spf=pass (imf23.hostedemail.com: domain of djwong@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=djwong@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721318497; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1ZmhrFnmYFml6fzh2yc+okaqhPJMVjk+2LMi7DNRpTg=; b=iRxYWd0rZdW8dZBCYorhNn3E85WoeMwQU3UU/gWZ+O1QRAbloexsgjij5XcmUruPmRamFz /1HXx0tIu8Ks/NsqT0cJ4qtuVGUfGVTLWR+XV+iOhLIgDFowl0LdXwDeivte0SkrdipWiF eYoCqVhHVldZ0zQ13t9x5l1xXxDRzt8= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=iI0Cu8PP; spf=pass (imf23.hostedemail.com: domain of djwong@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=djwong@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721318497; a=rsa-sha256; cv=none; b=a8XhlUlyC0yYLWkxxCW3tBuGJDZ2V9Yo0ygZhm57wCHqzVbcwNfYGRhkwNkW39r6iMa019 8wVRITQNCkwyTNuhfOtg82iSazFf7BVyfZmwZMgLb6Rk+6dBzmGkWJN+u3rpqzD2V/Vtki pK5S/qWZowXHEoiBjEtksQ3nDhzz0aQ= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sin.source.kernel.org (Postfix) with ESMTP id A338FCE1A1F; Thu, 18 Jul 2024 16:02:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D0050C116B1; Thu, 18 Jul 2024 16:02:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1721318523; bh=chaMx9s0OPNI5Z83DMujo8XKQemJrFuGl7D3JBJIH/Y=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=iI0Cu8PPtEXbTYKZQp8/M7xkyPuUFZoXYWEk5RZil8xgg3YSv1wLP6shLL7c52bAP oygIa4yxoGc3jYydhbfvDcZWqQTLg87J2nfdHBf0nWue9vwvwdZ0YRkD0eJw8jLEMO ypNOKyxRyfhkiduJVs3KKd8qVDeHgq8Jif8YuV+eNZbpcx8wB8TisWdwAyE3mFasm2 lF+V2gvDjg0PjBE0K2UpkGW7MW9xa6Ea38F1iM1Gsf5TKdPKmZ+Dwd6L4++7S3dukK amwfHiexgKzRYQe4CKZGaXx6LuscSa7cpxAVKok2E5tcfXY0KHR3kCv5Au4rsok5jk 2sDi+Ly+E1QYA== Date: Thu, 18 Jul 2024 09:02:02 -0700 From: "Darrick J. Wong" To: Josef Bacik Cc: Brian Foster , linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH RFC 0/4] iomap: zero dirty folios over unwritten mappings on zero range Message-ID: <20240718160202.GL612460@frogsfrogsfrogs> References: <20240718130212.23905-1-bfoster@redhat.com> <20240718153613.GC2099026@perftesting> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240718153613.GC2099026@perftesting> X-Stat-Signature: fn9bn93w7fu9xokwib9zkuupcqznuxg3 X-Rspam-User: X-Rspamd-Queue-Id: 15EE0140020 X-Rspamd-Server: rspam02 X-HE-Tag: 1721318528-125054 X-HE-Meta: U2FsdGVkX1+7S1359UA1mgXAhCqfUvJOVpKYdek8qA76NnlyZzgDOIpSnx0FTBqSQ3i+mep6Cnf7fhr/mb80MuVaoocTJskpG1kJwXSoS12hal0P/G6rH5ZvBv35mTLaazingLE44iPpN0Uz/g3kgtQMRp89hRZ2gmZuG9hcqJMSiDMAOq5R5gRpNWZjxhOqJwxfIXztGMk4H2NDWVji9UR009+ks3Ypr+jjPsnBM3z3hW2NcKl0xfqEow/bQ0/Usvt/205ug2B6M3BQnjISosfdmVc6b2wN81EI4OpqLNkRwBsj111ZZnnQzhzLQKcbyg/Kqj/gThz4IedR9AnMbuCmIyFn6p3MIb2Z4XDNeX/5W5+ec+e0M5l5Gw37mGbIonVC69WDp97ShgLBbSK9GyZ9M828dLZ7Ghooo7wWBEf81vaLm4upX1xYT8GrGXQGlFoTwl9pcmYUh0CQuH2bu4gmqVa9nT6IppZEaSymn7h8dnAGbmF6LIOntkKyv8uyhSAD+a9f8mYOlAoBTMLkftYjrkYZOMfh59NEFL2aoFAlU/Lis+rk3XE3fVJQIn30K6nYRSs+vgYSyCoIyLxgNIFLAdRKjngXwjVzA/TtEE25eccomSgyAFtP7HwuspcBUuH3SJ6QPK1ICfPmK2k70+kYqHm0rDz1AabSvNkMdNLS860G9PUBiDrqAOZC4CCibyyNQJ/1FqWJV/o4QwX59mbGqPC7nJm2ss4FVJmFhjLIRbT73RRl6h57SfR4Mm2F59ygTXaO26+EBkvoKkadc9joaRQeH30+RiA1CZrm5isBeWj1pzSCjH7iI2qRSwnZ9c3RkT6PRY2oAzORkaQ573FWF/UEwhk5Je0+WGKVNOAr/SqYBSkPE9Ujdfg6+c5Guiyd/Ypi28C0pdSq9FSmy1NTGT47jHd66xoEhC6G006qjLbA48zcFuGxbdvZkIsfbw7bW5SFlyNpQR9rMEN AZIhP84S 7rP5Dw+A3u8jKgTXrx6DZHYH9G/Ps6U8yOv+diK2otjec/DGB/7MJYDw9H2eHMr+LjiE80FERp5UIFZdG30kM3Pf24xwFQV+G++XnF8zWdlq6E8fkPfrfR3EWubSwATsCrf98+DK8spdW5toxsnGvoWn8M3dOhpGMNXXlxgyt6zBLJK3TXyySNV+Qdm2G9XVVMf5A/8PCrOAy48j7GQzIoOj78A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jul 18, 2024 at 11:36:13AM -0400, Josef Bacik wrote: > On Thu, Jul 18, 2024 at 09:02:08AM -0400, Brian Foster wrote: > > Hi all, > > > > This is a stab at fixing the iomap zero range problem where it doesn't > > correctly handle the case of an unwritten mapping with dirty pagecache. > > The gist is that we scan the mapping for dirty cache, zero any > > already-dirty folios via buffered writes as normal, but then otherwise > > skip clean ranges once we have a chance to validate those ranges against > > races with writeback or reclaim. > > > > This is somewhat simplistic in terms of how it scans, but that is > > intentional based on the existing use cases for zero range. From poking > > around a bit, my current sense is that there isn't any user of zero > > range that would ever expect to see more than a single dirty folio. Most > > callers either straddle the EOF folio or flush in higher level code for > > presumably (fs) context specific reasons. If somebody has an example to > > the contrary, please let me know because I'd love to be able to use it > > for testing. > > > > The caveat to this approach is that it only works for filesystems that > > implement folio_ops->iomap_valid(), which is currently just XFS. GFS2 > > doesn't use ->iomap_valid() and does call zero range, but AFAICT it > > doesn't actually export unwritten mappings so I suspect this is not a > > problem. My understanding is that ext4 iomap support is in progress, but > > I've not yet dug into what that looks like (though I suspect similar to > > XFS). The concern is mainly that this leaves a landmine for fs that > > might grow support for unwritten mappings && zero range but not > > ->iomap_valid(). We'd likely never know zero range was broken for such > > fs until stale data exposure problems start to materialize. > > > > I considered adding a fallback to just add a flush at the top of > > iomap_zero_range() so at least all future users would be correct, but I > > wanted to gate that on the absence of ->iomap_valid() and folio_ops > > isn't provided until iomap_begin() time. I suppose another way around > > that could be to add a flags param to iomap_zero_range() where the > > caller could explicitly opt out of a flush, but that's still kind of > > ugly. I dunno, maybe better than nothing..? Or move ->iomap_valid to the iomap ops structure. It's a mapping predicate, and has nothing to do with folios. > > So IMO, this raises the question of whether this is just unnecessarily > > overcomplicated. The KISS principle implies that it would also be > > perfectly fine to do a conditional "flush and stale" in zero range > > whenever we see the combination of an unwritten mapping and dirty > > pagecache (the latter checked before or during ->iomap_begin()). That's > > simple to implement and AFAICT would work/perform adequately and > > generically for all filesystems. I have one or two prototypes of this > > sort of thing if folks want to see it as an alternative. I wouldn't mind seeing such a prototype. Start by hoisting the filemap_write_and_wait_range call to iomap, then adjust it only to do that if there's dirty pagecache + unwritten mappings? Then get more complicated from there, and we can decide if we want the increasing levels of trickiness. > I think this is the better approach, otherwise there's another behavior that's > gated behind having a callback that other filesystems may not know about and > thus have a gap. I think filesystems currently only need to supply an ->iomap_valid function for pagecache operations because those are the only ones where we have to maintain consistency between something that isn't locked when we get the mapping, and the mapping not being locked when we lock that first thing. I suspect they also only need to supply it if they support unwritten extents. >From what I can tell, the rest (e.g. directio/FIEMAP) don't care because callers get to manage concurrency. *But* in general it makes sense to me that any iomap operation ought to be able to revalidate a mapping at any time. > Additionally do you have a test for this stale data exposure? I think no matter > what the solution it would be good to have a test for this so that we can make > sure we're all doing the correct thing with zero range. Thanks, I was also curious about this. IIRC we have some tests for the validiting checking itself, but I don't recall if there's a specific regression test for the eofblock clearing. --D > Josef >