Date: Tue, 15 Nov 2022 15:53:43 -0800
From: "Darrick J. Wong"
Wong" To: Dave Chinner Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 3/9] xfs: punching delalloc extents on write failure is racy Message-ID: References: <20221115013043.360610-1-david@fromorbit.com> <20221115013043.360610-4-david@fromorbit.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20221115013043.360610-4-david@fromorbit.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1668556425; a=rsa-sha256; cv=none; b=H/yqNj1NQg+i1SNZfuIJkdsi0HRgivNB08w+QsRXh9wR+voFdalZ03KEE0Xx7jUzc7EtaE yQ8FdMefDWZFHaK13UyhqnP8o8B18CKSAWjd+vRHhSCQW7c9rOal6hrp6TOqhxGuLzldgq uJtisvKHdLtMjjbnUCpcUzg7KbV33Ok= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=GYwkLPL4; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf07.hostedemail.com: domain of djwong@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=djwong@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1668556425; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=M0PqGplGNX8OPGiy6mynqwQgy+nUAV9f/RE8V4I5Ebs=; b=Zd24pIcx6MkJKp2RSIZOABCYowlM0MNC60mDCQgtqbmP6fUOxmjXmhlMnjuOyQrGjS3msf 8vNy0FeGHfjDRnlmXagA2F3GZgk17U67kwgozhD9YtT1RhqeVdA/IMVd82c8ZU3UAR2+SZ uCjVJsNyToZoz5RdOB4X2iKE4lPxOz4= X-Stat-Signature: i65iromrp8ifc9ecd7yu76aznh9ew8so X-Rspamd-Queue-Id: EDFD640008 X-Rspam-User: Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=GYwkLPL4; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf07.hostedemail.com: domain of djwong@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=djwong@kernel.org X-Rspamd-Server: rspam09 X-HE-Tag: 1668556424-16407 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Nov 15, 2022 at 12:30:37PM +1100, Dave Chinner wrote: > From: Dave Chinner > > xfs_buffered_write_iomap_end() has a comment about the safety of > punching delalloc extents based holding the IOLOCK_EXCL. This > comment is wrong, and punching delalloc extents is not race free. > > When we punch out a delalloc extent after a write failure in > xfs_buffered_write_iomap_end(), we punch out the page cache with > truncate_pagecache_range() before we punch out the delalloc extents. > At this point, we only hold the IOLOCK_EXCL, so there is nothing > stopping mmap() write faults racing with this cleanup operation, > reinstantiating a folio over the range we are about to punch and > hence requiring the delalloc extent to be kept. > > If this race condition is hit, we can end up with a dirty page in > the page cache that has no delalloc extent or space reservation > backing it. This leads to bad things happening at writeback time. > > To avoid this race condition, we need the page cache truncation to > be atomic w.r.t. the extent manipulation. We can do this by holding > the mapping->invalidate_lock exclusively across this operation - > this will prevent new pages from being inserted into the page cache > whilst we are removing the pages and the backing extent and space > reservation. 
> 
> Taking the mapping->invalidate_lock exclusively in the buffered
> write IO path is safe - it naturally nests inside the IOLOCK (see
> truncate and fallocate paths). iomap_zero_range() can be called from
> under the mapping->invalidate_lock (from the truncate path via
> either xfs_zero_eof() or xfs_truncate_page()), but iomap_zero_iter()
> will not instantiate new delalloc pages (because it skips holes) and
> hence will not ever need to punch out delalloc extents on failure.
> 
> Fix the locking issue, and clean up the code logic a little to avoid
> unnecessary work if we didn't allocate the delalloc extent or wrote
> the entire region we allocated.
> 
> Signed-off-by: Dave Chinner

Didn't I already tag this...?

Reviewed-by: Darrick J. Wong

--D

> ---
>  fs/xfs/xfs_iomap.c | 41 +++++++++++++++++++++++------------------
>  1 file changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 5cea069a38b4..a2e45ea1b0cb 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1147,6 +1147,10 @@ xfs_buffered_write_iomap_end(
>  		written = 0;
>  	}
>  
> +	/* If we didn't reserve the blocks, we're not allowed to punch them. */
> +	if (!(iomap->flags & IOMAP_F_NEW))
> +		return 0;
> +
>  	/*
>  	 * start_fsb refers to the first unused block after a short write. If
>  	 * nothing was written, round offset down to point at the first block in
> @@ -1158,27 +1162,28 @@ xfs_buffered_write_iomap_end(
>  	start_fsb = XFS_B_TO_FSB(mp, offset + written);
>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
>  
> +	/* Nothing to do if we've written the entire delalloc extent */
> +	if (start_fsb >= end_fsb)
> +		return 0;
> +
>  	/*
> -	 * Trim delalloc blocks if they were allocated by this write and we
> -	 * didn't manage to write the whole range.
> -	 *
> -	 * We don't need to care about racing delalloc as we hold i_mutex
> -	 * across the reserve/allocate/unreserve calls. If there are delalloc
> -	 * blocks in the range, they are ours.
> +	 * Lock the mapping to avoid races with page faults re-instantiating
> +	 * folios and dirtying them via ->page_mkwrite between the page cache
> +	 * truncation and the delalloc extent removal. Failing to do this can
> +	 * leave dirty pages with no space reservation in the cache.
>  	 */
> -	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
> -		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> -				XFS_FSB_TO_B(mp, end_fsb) - 1);
> -
> -		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> -					end_fsb - start_fsb);
> -		if (error && !xfs_is_shutdown(mp)) {
> -			xfs_alert(mp, "%s: unable to clean up ino %lld",
> -				__func__, ip->i_ino);
> -			return error;
> -		}
> +	filemap_invalidate_lock(inode->i_mapping);
> +	truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> +			XFS_FSB_TO_B(mp, end_fsb) - 1);
> +
> +	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> +				end_fsb - start_fsb);
> +	filemap_invalidate_unlock(inode->i_mapping);
> +	if (error && !xfs_is_shutdown(mp)) {
> +		xfs_alert(mp, "%s: unable to clean up ino %lld",
> +			__func__, ip->i_ino);
> +		return error;
>  	}
> -
>  	return 0;
>  }
> 
> -- 
> 2.37.2
> 
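
For reference, a minimal sketch of the ordering the patch establishes, with the
XFS specifics abstracted away. This is not the patch itself: punch_delalloc_blocks()
and cleanup_short_write() are hypothetical placeholders, not real kernel helpers.

/*
 * Sketch only: clean up a failed buffered write without racing with page
 * faults.  punch_delalloc_blocks() stands in for the filesystem's own
 * reservation cleanup (xfs_bmap_punch_delalloc_range() in the patch above).
 */
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical fs-specific hook that drops the unused delalloc reservation. */
int punch_delalloc_blocks(struct inode *inode, loff_t start, loff_t end);

static int cleanup_short_write(struct inode *inode, loff_t start, loff_t end)
{
	int error;

	/*
	 * Take invalidate_lock exclusively.  Fault paths take it shared
	 * before instantiating folios, so nothing can fault a folio back in
	 * (and dirty it via ->page_mkwrite) between the page cache
	 * truncation and the delalloc punch below.
	 */
	filemap_invalidate_lock(inode->i_mapping);
	truncate_pagecache_range(inode, start, end - 1);
	error = punch_delalloc_blocks(inode, start, end);
	filemap_invalidate_unlock(inode->i_mapping);

	return error;
}

The fault side is the shared counterpart: filemap_fault() takes
filemap_invalidate_lock_shared() before adding a new folio to the page cache,
which is why holding the lock exclusively here is enough to keep folios out of
the range being punched.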