From: Christoph Hellwig
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Waiman Long, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Will Deacon, Andrew Morton, linux-ext4@vger.kernel.org, cluster-devel@redhat.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 10/12] xfs: hold i_rwsem until AIO completes
Date: Tue, 14 Jan 2020 17:12:23 +0100
Message-Id: <20200114161225.309792-11-hch@lst.de>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200114161225.309792-1-hch@lst.de>
References: <20200114161225.309792-1-hch@lst.de>
MIME-Version: 1.0

Switch xfs from the magic i_dio_count scheme to just hold i_rwsem until
the actual I/O has completed to reduce the locking complexity and avoid
nasty bugs due to missing inode_dio_wait calls.

Signed-off-by: Christoph Hellwig
---
 fs/xfs/scrub/bmap.c    |  1 -
 fs/xfs/xfs_bmap_util.c |  3 ---
 fs/xfs/xfs_file.c      | 47 +++++++++++++-----------------------------
 fs/xfs/xfs_icache.c    |  3 +--
 fs/xfs/xfs_ioctl.c     |  1 -
 fs/xfs/xfs_iops.c      |  5 -----
 fs/xfs/xfs_reflink.c   |  2 --
 7 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index fa6ea6407992..d3e4068d3189 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -45,7 +45,6 @@ xchk_setup_inode_bmap(
 	 */
 	if (S_ISREG(VFS_I(sc->ip)->i_mode) &&
 	    sc->sm->sm_type == XFS_SCRUB_TYPE_BMBTD) {
-		inode_dio_wait(VFS_I(sc->ip));
 		error = filemap_write_and_wait(VFS_I(sc->ip)->i_mapping);
 		if (error)
 			goto out;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index e62fb5216341..a454f481107e 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -674,9 +674,6 @@ xfs_free_eofblocks(
 		if (error)
 			return error;
 
-		/* wait on dio to ensure i_size has settled */
-		inode_dio_wait(VFS_I(ip));
-
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0,
 				&tp);
 		if (error) {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0cc843a4a163..d0ee7d2932e4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -193,9 +193,11 @@ xfs_file_dio_aio_read(
 	} else {
 		xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	}
-	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, 0);
-	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
+	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL,
+			IOMAP_DIO_RWSEM_SHARED);
+	if (ret != -EIOCBQUEUED)
+		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 	return ret;
 }
 
@@ -341,15 +343,6 @@ xfs_file_aio_write_checks(
 				xfs_ilock(ip, *iolock);
 				iov_iter_reexpand(from, count);
 			}
-			/*
-			 * We now have an IO submission barrier in place, but
-			 * AIO can do EOF updates during IO completion and hence
-			 * we now need to wait for all of them to drain. Non-AIO
-			 * DIO will have drained before we are given the
-			 * XFS_IOLOCK_EXCL, and so for most cases this wait is a
-			 * no-op.
-			 */
-			inode_dio_wait(inode);
 			drained_dio = true;
 			goto restart;
 		}
@@ -469,13 +462,7 @@ static const struct iomap_dio_ops xfs_dio_write_ops = {
  * needs to do sub-block zeroing and that requires serialisation against other
  * direct IOs to the same block. In this case we need to serialise the
  * submission of the unaligned IOs so that we don't get racing block zeroing in
- * the dio layer. To avoid the problem with aio, we also need to wait for
- * outstanding IOs to complete so that unwritten extent conversion is completed
- * before we try to map the overlapping block. This is currently implemented by
- * hitting it with a big hammer (i.e. inode_dio_wait()).
- *
- * Returns with locks held indicated by @iolock and errors indicated by
- * negative return values.
+ * the dio layer.
  */
 STATIC ssize_t
 xfs_file_dio_aio_write(
@@ -546,18 +533,21 @@ xfs_file_dio_aio_write(
 	 * xfs_file_aio_write_checks() for other reasons.
 	 */
 	if (unaligned_io) {
-		inode_dio_wait(inode);
-		dio_flags = IOMAP_DIO_SYNCHRONOUS;
-	} else if (iolock == XFS_IOLOCK_EXCL) {
-		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
-		iolock = XFS_IOLOCK_SHARED;
+		dio_flags = IOMAP_DIO_RWSEM_EXCL | IOMAP_DIO_SYNCHRONOUS;
+	} else {
+		if (iolock == XFS_IOLOCK_EXCL) {
+			xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+			iolock = XFS_IOLOCK_SHARED;
+		}
+		dio_flags = IOMAP_DIO_RWSEM_SHARED;
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
 	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
 			   &xfs_dio_write_ops, dio_flags);
 out:
-	xfs_iunlock(ip, iolock);
+	if (ret != -EIOCBQUEUED)
+		xfs_iunlock(ip, iolock);
 
 	/*
 	 * No fallback to buffered IO on errors for XFS, direct IO will either
@@ -819,15 +809,6 @@ xfs_file_fallocate(
 	if (error)
 		goto out_unlock;
 
-	/*
-	 * Must wait for all AIO to complete before we continue as AIO can
-	 * change the file size on completion without holding any locks we
-	 * currently hold. We must do this first because AIO can update both
-	 * the on disk and in memory inode sizes, and the operations that follow
-	 * require the in-memory size to be fully up-to-date.
-	 */
-	inode_dio_wait(inode);
-
 	/*
 	 * Now AIO and DIO has drained we flush and (if necessary) invalidate
 	 * the cached range over the first operation we are about to run.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8dc2e5414276..9e6f32fd32f5 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1720,8 +1720,7 @@ xfs_prep_free_cowblocks(
 	 */
 	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
 	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
-	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
-	    atomic_read(&VFS_I(ip)->i_dio_count))
+	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK))
 		return false;
 
 	return true;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 7b35d62ede9f..331453f2c4be 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -548,7 +548,6 @@ xfs_ioc_space(
 	error = xfs_break_layouts(inode, &iolock, BREAK_UNMAP);
 	if (error)
 		goto out_unlock;
-	inode_dio_wait(inode);
 
 	switch (bf->l_whence) {
 	case 0: /*SEEK_SET*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8afe69ca188b..700edeccc6bf 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -893,11 +893,6 @@ xfs_setattr_size(
 	if (error)
 		return error;
 
-	/*
-	 * Wait for all direct I/O to complete.
-	 */
-	inode_dio_wait(inode);
-
 	/*
 	 * File data changes must be complete before we start the transaction to
 	 * modify the inode. This needs to be done before joining the inode to
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index de451235c4ee..f775e60ca6f7 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1525,8 +1525,6 @@ xfs_reflink_unshare(
 
 	trace_xfs_reflink_unshare(ip, offset, len);
 
-	inode_dio_wait(inode);
-
 	error = iomap_file_unshare(inode, offset, len,
 			&xfs_buffered_write_iomap_ops);
 	if (error)
-- 
2.24.1