From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B018C43219 for ; Tue, 15 Nov 2022 01:30:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id ADDD26B0072; Mon, 14 Nov 2022 20:30:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A2C268E0002; Mon, 14 Nov 2022 20:30:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 740E98E0001; Mon, 14 Nov 2022 20:30:50 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 5D8556B0072 for ; Mon, 14 Nov 2022 20:30:50 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 2D659A05AA for ; Tue, 15 Nov 2022 01:30:50 +0000 (UTC) X-FDA: 80133947460.06.EF9BE92 Received: from mail-pg1-f169.google.com (mail-pg1-f169.google.com [209.85.215.169]) by imf04.hostedemail.com (Postfix) with ESMTP id BF69C40008 for ; Tue, 15 Nov 2022 01:30:49 +0000 (UTC) Received: by mail-pg1-f169.google.com with SMTP id q1so11865145pgl.11 for ; Mon, 14 Nov 2022 17:30:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=UpYORrtHPEFkdqKAbT3ik18Ay5BZZZbilDStDtNjXS8=; b=co43QgtTK4koY81+x7Mo6GXWhA8qafRWtVduLzA3bTAJl8WxeaRPbiJZNKYpZvtv98 BBfXbZtdI9El//DFgPBA3E6qxVtEWYfRyAadPz9RFZwsBQoXz26JkDUb5MtjjcvgnuU6 H/6MiKVkvnQSQKTvlNghrZ3Ag66qduR2ABeHfYCn0Fwdu88F4MOQpTC5TrSqp+lNGouq Rr5jP3OZu6Nfy3I0yadxa2fjIda663NaefuTlLmr5xVjGTCWQubF0+38igI0nlXOkxbY TMnuc3qwv9lmbKuaa+duJvVX8XPUiSWifr3CeWjxTrQ5A0LZYVpfELEUPY5QvTEi+zLN Bvzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UpYORrtHPEFkdqKAbT3ik18Ay5BZZZbilDStDtNjXS8=; b=pZUCABvictliq3tONnjMitpPyn4u4ba+rQl0eH6DunvMdA+ahRZT48wgK5bL0v8S8n EHQNzVDn02TOZZRwxH/3Y4pcYQseJ10kFSKAUKdLLt17Lnue2YHJo8JoJ5GHlVA7LAtM +ou7FZfpEVlYtr+hOZUPbmR9TyrW50lqpJzQxdYjjMliCYtjxRkwUNWoNUrmrej6YR7p 8KTJDjp3ZE6aEzVnMUkd/yBNUa9QvRaIRLgUR0oUiHuYGp8YXkjK2UuWQXUEAPiixIk9 zplE/da7FWaRYCzYod5UODds9++NE+3S8LI+WVirIDByyNaxVU5jha+ZkRw8wTiKpB4N TH+A== X-Gm-Message-State: ANoB5pkzAIjSqL2qYFIj4obUT7tNBrzL0vPmCR4NULIcPb8g0gde6l/0 6KLu3c6Hy7bgNs0bErRCyqdaHY5fD/397A== X-Google-Smtp-Source: AA0mqf7cXllmyDaiPWcvaKRz/fwiRS4L+bWOPOhO9q5Y16NL3MgbOh3d+5jMiP3wDCY0eNsTVcQnmQ== X-Received: by 2002:a63:3601:0:b0:46e:fa6d:497 with SMTP id d1-20020a633601000000b0046efa6d0497mr13560226pga.60.1668475848708; Mon, 14 Nov 2022 17:30:48 -0800 (PST) Received: from dread.disaster.area (pa49-181-106-210.pa.nsw.optusnet.com.au. [49.181.106.210]) by smtp.gmail.com with ESMTPSA id n12-20020a170902e54c00b00186f608c543sm8304788plf.304.2022.11.14.17.30.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 14 Nov 2022 17:30:47 -0800 (PST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1oukmj-00EKFw-B3; Tue, 15 Nov 2022 12:30:45 +1100 Received: from dave by discord.disaster.area with local (Exim 4.96) (envelope-from ) id 1oukmj-001VpM-10; Tue, 15 Nov 2022 12:30:45 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 3/9] xfs: punching delalloc extents on write failure is racy Date: Tue, 15 Nov 2022 12:30:37 +1100 Message-Id: <20221115013043.360610-4-david@fromorbit.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20221115013043.360610-1-david@fromorbit.com> References: <20221115013043.360610-1-david@fromorbit.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=fromorbit-com.20210112.gappssmtp.com header.s=20210112 header.b=co43QgtT; spf=none (imf04.hostedemail.com: domain of david@fromorbit.com has no SPF policy when checking 209.85.215.169) smtp.mailfrom=david@fromorbit.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1668475849; a=rsa-sha256; cv=none; b=Bs6FKXOz4iwUQhPc9VY561Y7bMq1tGbLzloxwKqYR1GTDrua2zr+AfZBdWHGee1S+vt8mN d80hKzUgSaacM/J1a0XOb2Vd6xFgXfvZmPk6xxz0ufftNy0AYN4nCac/dEg9p64M4HiTlV L9YVlhCFBuD8oOKQJ2ip74cmhx0Uot8= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1668475849; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UpYORrtHPEFkdqKAbT3ik18Ay5BZZZbilDStDtNjXS8=; b=6RO3h/fs02aW2W21czjYRS+wAhRoEYF3Z/fzuNGxt3siB3wr1KsxF0eDigzedYUx5LvtIj VtGToyJKm4DylS7trbJVxZ4kHnnNJlnNTAjZ7efnHyyus/w8bKeRdjbdu55sw319otr8Jy etTK4Lxgc6RElRzEpir2C7DWUlwEE28= X-Rspam-User: Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=fromorbit-com.20210112.gappssmtp.com header.s=20210112 header.b=co43QgtT; spf=none (imf04.hostedemail.com: domain of david@fromorbit.com has no SPF policy when checking 209.85.215.169) smtp.mailfrom=david@fromorbit.com; dmarc=none X-Stat-Signature: 9zywaj444z54nt8arafxfhupwf4wxbah X-Rspamd-Queue-Id: BF69C40008 X-Rspamd-Server: rspam09 X-HE-Tag: 1668475849-65683 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Dave Chinner xfs_buffered_write_iomap_end() has a comment about the safety of punching delalloc extents based holding the IOLOCK_EXCL. This comment is wrong, and punching delalloc extents is not race free. When we punch out a delalloc extent after a write failure in xfs_buffered_write_iomap_end(), we punch out the page cache with truncate_pagecache_range() before we punch out the delalloc extents. At this point, we only hold the IOLOCK_EXCL, so there is nothing stopping mmap() write faults racing with this cleanup operation, reinstantiating a folio over the range we are about to punch and hence requiring the delalloc extent to be kept. If this race condition is hit, we can end up with a dirty page in the page cache that has no delalloc extent or space reservation backing it. This leads to bad things happening at writeback time. To avoid this race condition, we need the page cache truncation to be atomic w.r.t. the extent manipulation. We can do this by holding the mapping->invalidate_lock exclusively across this operation - this will prevent new pages from being inserted into the page cache whilst we are removing the pages and the backing extent and space reservation. Taking the mapping->invalidate_lock exclusively in the buffered write IO path is safe - it naturally nests inside the IOLOCK (see truncate and fallocate paths). iomap_zero_range() can be called from under the mapping->invalidate_lock (from the truncate path via either xfs_zero_eof() or xfs_truncate_page(), but iomap_zero_iter() will not instantiate new delalloc pages (because it skips holes) and hence will not ever need to punch out delalloc extents on failure. Fix the locking issue, and clean up the code logic a little to avoid unnecessary work if we didn't allocate the delalloc extent or wrote the entire region we allocated. Signed-off-by: Dave Chinner --- fs/xfs/xfs_iomap.c | 41 +++++++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 18 deletions(-) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 5cea069a38b4..a2e45ea1b0cb 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1147,6 +1147,10 @@ xfs_buffered_write_iomap_end( written = 0; } + /* If we didn't reserve the blocks, we're not allowed to punch them. */ + if (!(iomap->flags & IOMAP_F_NEW)) + return 0; + /* * start_fsb refers to the first unused block after a short write. If * nothing was written, round offset down to point at the first block in @@ -1158,27 +1162,28 @@ xfs_buffered_write_iomap_end( start_fsb = XFS_B_TO_FSB(mp, offset + written); end_fsb = XFS_B_TO_FSB(mp, offset + length); + /* Nothing to do if we've written the entire delalloc extent */ + if (start_fsb >= end_fsb) + return 0; + /* - * Trim delalloc blocks if they were allocated by this write and we - * didn't manage to write the whole range. - * - * We don't need to care about racing delalloc as we hold i_mutex - * across the reserve/allocate/unreserve calls. If there are delalloc - * blocks in the range, they are ours. + * Lock the mapping to avoid races with page faults re-instantiating + * folios and dirtying them via ->page_mkwrite between the page cache + * truncation and the delalloc extent removal. Failing to do this can + * leave dirty pages with no space reservation in the cache. */ - if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) { - truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb), - XFS_FSB_TO_B(mp, end_fsb) - 1); - - error = xfs_bmap_punch_delalloc_range(ip, start_fsb, - end_fsb - start_fsb); - if (error && !xfs_is_shutdown(mp)) { - xfs_alert(mp, "%s: unable to clean up ino %lld", - __func__, ip->i_ino); - return error; - } + filemap_invalidate_lock(inode->i_mapping); + truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb), + XFS_FSB_TO_B(mp, end_fsb) - 1); + + error = xfs_bmap_punch_delalloc_range(ip, start_fsb, + end_fsb - start_fsb); + filemap_invalidate_unlock(inode->i_mapping); + if (error && !xfs_is_shutdown(mp)) { + xfs_alert(mp, "%s: unable to clean up ino %lld", + __func__, ip->i_ino); + return error; } - return 0; } -- 2.37.2