From: Christoph Hellwig
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Waiman Long, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Will Deacon, Andrew Morton, linux-ext4@vger.kernel.org,
	cluster-devel@redhat.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 07/12] iomap: allow holding i_rwsem until aio completion
Date: Tue, 14 Jan 2020 17:12:20 +0100
Message-Id: <20200114161225.309792-8-hch@lst.de>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200114161225.309792-1-hch@lst.de>
References: <20200114161225.309792-1-hch@lst.de>
MIME-Version: 1.0

The direct I/O code currently uses a hand-crafted i_dio_count that needs
to be incremented under i_rwsem and is then decremented when the I/O
completes.  That scheme means file system code needs to be very careful
to wait for i_dio_count to reach zero under i_rwsem in various places,
and those waits are very cumbersome to get rid of.  It also means we
can't get the effect of an exclusive i_rwsem for actually asynchronous
I/O, forcing pointless synchronous execution of sub-blocksize writes.

Replace the i_dio_count scheme with holding i_rwsem over the duration of
the whole I/O.  While this introduces a non-owner unlock that isn't nice
to RT workloads, the open-coded locking primitive using i_dio_count isn't
any better.

Signed-off-by: Christoph Hellwig
---
 fs/iomap/direct-io.c  | 44 +++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  2 ++
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index e706329d71a0..0113ac33b0a0 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -70,7 +70,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
 		dio->submit.cookie = submit_bio(bio);
 }
 
-static ssize_t iomap_dio_complete(struct iomap_dio *dio)
+static ssize_t iomap_dio_complete(struct iomap_dio *dio, bool unlock)
 {
 	const struct iomap_dio_ops *dops = dio->dops;
 	struct kiocb *iocb = dio->iocb;
@@ -112,6 +112,13 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 			dio_warn_stale_pagecache(iocb->ki_filp);
 	}
 
+	if (unlock) {
+		if (dio->flags & IOMAP_DIO_RWSEM_EXCL)
+			up_write(&inode->i_rwsem);
+		else if (dio->flags & IOMAP_DIO_RWSEM_SHARED)
+			up_read(&inode->i_rwsem);
+	}
+
 	/*
 	 * If this is a DSYNC write, make sure we push it to stable storage now
 	 * that we've written data.
@@ -129,8 +136,22 @@ static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
 	struct kiocb *iocb = dio->iocb;
+	struct inode *inode = file_inode(iocb->ki_filp);
 
-	iocb->ki_complete(iocb, iomap_dio_complete(dio), 0);
+	/*
+	 * XXX: For reads this code is directly called from bio ->end_io, which
+	 * often is hard or softirq context.  In that case lockdep records the
+	 * below as lock acquisitions from irq context and causes warnings.
+	 */
+	if (dio->flags & IOMAP_DIO_RWSEM_EXCL) {
+		rwsem_acquire(&inode->i_rwsem.dep_map, 0, 0, _THIS_IP_);
+		if (IS_ENABLED(CONFIG_RWSEM_SPIN_ON_OWNER))
+			atomic_long_set(&inode->i_rwsem.owner, (long)current);
+	} else if (dio->flags & IOMAP_DIO_RWSEM_SHARED) {
+		rwsem_acquire_read(&inode->i_rwsem.dep_map, 0, 0, _THIS_IP_);
+	}
+
+	iocb->ki_complete(iocb, iomap_dio_complete(dio, true), 0);
 }
 
 /*
@@ -430,7 +451,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->i_size = i_size_read(inode);
 	dio->dops = dops;
 	dio->error = 0;
-	dio->flags = 0;
+	dio->flags = dio_flags;
 
 	dio->submit.iter = iter;
 	dio->submit.waiter = current;
@@ -551,8 +572,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->wait_for_completion = wait_for_completion;
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!wait_for_completion)
-			return -EIOCBQUEUED;
-
+			goto async_completion;
 		for (;;) {
 			set_current_state(TASK_UNINTERRUPTIBLE);
 			if (!READ_ONCE(dio->submit.waiter))
@@ -567,10 +587,22 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
-	return iomap_dio_complete(dio);
+	return iomap_dio_complete(dio, false);
 
 out_free_dio:
 	kfree(dio);
 	return ret;
+
+async_completion:
+	/*
+	 * We are returning to userspace now, but i_rwsem is still held until
+	 * the I/O completion comes back.
+	 */
+	if (dio_flags & (IOMAP_DIO_RWSEM_EXCL | IOMAP_DIO_RWSEM_SHARED))
+		rwsem_release(&inode->i_rwsem.dep_map, _THIS_IP_);
+	if ((dio_flags & IOMAP_DIO_RWSEM_EXCL) &&
+	    IS_ENABLED(CONFIG_RWSEM_SPIN_ON_OWNER))
+		atomic_long_set(&inode->i_rwsem.owner, RWSEM_OWNER_UNKNOWN);
+	return -EIOCBQUEUED;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 3faeb8fd0961..f259bb979d7f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -249,6 +249,8 @@ int iomap_writepages(struct address_space *mapping,
 #define IOMAP_DIO_UNWRITTEN	(1 << 0)	/* covers unwritten extent(s) */
 #define IOMAP_DIO_COW		(1 << 1)	/* covers COW extent(s) */
 #define IOMAP_DIO_SYNCHRONOUS	(1 << 2)	/* no async completion */
+#define IOMAP_DIO_RWSEM_EXCL	(1 << 3)	/* holds exclusive i_rwsem */
+#define IOMAP_DIO_RWSEM_SHARED	(1 << 4)	/* holds shared i_rwsem */
 
 struct iomap_dio_ops {
 	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
-- 
2.24.1