From: "Matthew Wilcox (Oracle)"
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-mm@kvack.org,
	v9fs-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org,
	linux-afs@lists.infradead.org, ceph-devel@vger.kernel.org,
	linux-cifs@vger.kernel.org, ecryptfs@vger.kernel.org,
	linux-um@lists.infradead.org, linux-mtd@lists.infradead.org,
	Richard Weinberger
Subject: [PATCH 01/13] mm: Add AOP_UPDATED_PAGE return value
Date: Thu, 17 Sep 2020 16:10:38 +0100
Message-Id: <20200917151050.5363-2-willy@infradead.org>
X-Mailer: git-send-email 2.21.3
In-Reply-To: <20200917151050.5363-1-willy@infradead.org>
References: <20200917151050.5363-1-willy@infradead.org>
MIME-Version: 1.0

Allow synchronous ->readpage implementations to execute more efficiently
by skipping the re-locking of the page.

Signed-off-by: Matthew Wilcox (Oracle)
---
 Documentation/filesystems/locking.rst |  7 ++++---
 Documentation/filesystems/vfs.rst     | 21 ++++++++++++++-------
 include/linux/fs.h                    |  5 +++++
 mm/filemap.c                          | 12 ++++++++++--
 4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 64f94a18d97e..06a7a8bf2362 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -269,7 +269,7 @@ locking rules:
 ops                     PageLocked(page)         i_rwsem
 ======================  ======================== =========
 writepage:              yes, unlocks (see below)
-readpage:               yes, unlocks
+readpage:               yes, may unlock
 writepages:
 set_page_dirty          no
 readahead:              yes, unlocks
@@ -294,8 +294,9 @@ swap_deactivate: no
 ->write_begin(), ->write_end() and ->readpage() may be called from
 the request handler (/dev/loop).
 
-->readpage() unlocks the page, either synchronously or via I/O
-completion.
+->readpage() may return AOP_UPDATED_PAGE if the page is now Uptodate
+or 0 if the page will be unlocked asynchronously by I/O completion.
+If it returns -errno, it should unlock the page.
 
 ->readahead() unlocks the pages that I/O is attempted on like ->readpage().
 
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ca52c82e5bb5..16248c299aaa 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -643,7 +643,7 @@ set_page_dirty to write data into the address_space, and writepage and
 writepages to writeback data to storage.
 
 Adding and removing pages to/from an address_space is protected by the
-inode's i_mutex.
+inode's i_rwsem held exclusively.
 
 When data is written to a page, the PG_Dirty flag should be set. It
 typically remains set until writepage asks for it to be written. This
@@ -757,12 +757,19 @@ cache in your filesystem. The following members are defined:
 
 ``readpage``
         called by the VM to read a page from backing store. The page
-        will be Locked when readpage is called, and should be unlocked
-        and marked uptodate once the read completes. If ->readpage
-        discovers that it needs to unlock the page for some reason, it
-        can do so, and then return AOP_TRUNCATED_PAGE. In this case,
-        the page will be relocated, relocked and if that all succeeds,
-        ->readpage will be called again.
+        will be Locked and !Uptodate when readpage is called. Ideally,
+        the filesystem will bring the page Uptodate and return
+        AOP_UPDATED_PAGE. If the filesystem encounters an error, it
+        should unlock the page and return a negative errno without marking
+        the page Uptodate. It does not need to mark the page as Error.
+        If the filesystem returns 0, this means the page will be unlocked
+        asynchronously by I/O completion. The VFS will wait for the
+        page to be unlocked, so there is no advantage to executing this
+        operation asynchronously.
+
+        The filesystem can also return AOP_TRUNCATED_PAGE to indicate
+        that it had to unlock the page to avoid a deadlock. The caller
+        will re-check the page cache and call ->readpage again.
 
 ``writepages``
         called by the VM to write out pages associated with the
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e019ea2f1347..6fc650050d20 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -273,6 +273,10 @@ struct iattr {
  *                      reference, it should drop it before retrying. Returned
  *                      by readpage().
  *
+ * @AOP_UPDATED_PAGE: The readpage method has brought the page Uptodate
+ * without releasing the page lock. This is suitable for synchronous
+ * implementations of readpage.
+ *
  * address_space_operation functions return these large constants to indicate
  * special semantics to the caller. These are much larger than the bytes in a
  * page to allow for functions that return the number of bytes operated on in a
@@ -282,6 +286,7 @@ struct iattr {
 enum positive_aop_returns {
         AOP_WRITEPAGE_ACTIVATE  = 0x80000,
         AOP_TRUNCATED_PAGE      = 0x80001,
+        AOP_UPDATED_PAGE        = 0x80002,
 };
 
 #define AOP_FLAG_CONT_EXPAND    0x0001 /* called from cont_expand */
diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..131a2aaa1537 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2254,8 +2254,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
                  * PG_error will be set again if readpage fails.
                  */
                 ClearPageError(page);
-                /* Start the actual read. The read will unlock the page. */
+                /* Start the actual read. The read may unlock the page. */
                 error = mapping->a_ops->readpage(filp, page);
+                if (error == AOP_UPDATED_PAGE)
+                        goto page_ok;
 
                 if (unlikely(error)) {
                         if (error == AOP_TRUNCATED_PAGE) {
@@ -2619,7 +2621,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
          */
         if (unlikely(!PageUptodate(page)))
                 goto page_not_uptodate;
-
+page_ok:
         /*
          * We've made it this far and we had to drop our mmap_lock, now is the
          * time to return to the upper layer and have it re-find the vma and
@@ -2654,6 +2656,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
         ClearPageError(page);
         fpin = maybe_unlock_mmap_for_io(vmf, fpin);
         error = mapping->a_ops->readpage(file, page);
+        if (error == AOP_UPDATED_PAGE)
+                goto page_ok;
         if (!error) {
                 wait_on_page_locked(page);
                 if (!PageUptodate(page))
@@ -2867,6 +2871,10 @@ static struct page *do_read_cache_page(struct address_space *mapping,
                         err = filler(data, page);
                 else
                         err = mapping->a_ops->readpage(data, page);
+                if (err == AOP_UPDATED_PAGE) {
+                        unlock_page(page);
+                        goto out;
+                }
 
                 if (err < 0) {
                         put_page(page);
-- 
2.28.0
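
For anyone trying to follow the return-value protocol described in the
vfs.rst hunk, a small userspace model may help. This is only a sketch
under stated assumptions: struct model_page, model_sync_readpage() and
model_read_page() are hypothetical names invented for illustration and
are not kernel code. Only the AOP_* values come from the patch. The
caller is loosely modelled on the do_read_cache_page() hunk, which
unlocks the page itself when it sees AOP_UPDATED_PAGE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Values copied from the include/linux/fs.h hunk in this patch. */
enum positive_aop_returns {
	AOP_WRITEPAGE_ACTIVATE	= 0x80000,
	AOP_TRUNCATED_PAGE	= 0x80001,
	AOP_UPDATED_PAGE	= 0x80002,
};

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page: just the two flags we care about. */
struct model_page {
	bool locked;
	bool uptodate;
	char data[MODEL_PAGE_SIZE];
};

/*
 * Hypothetical synchronous ->readpage. On success it marks the page
 * Uptodate and returns AOP_UPDATED_PAGE with the page still locked.
 * On error it unlocks the page, leaves it !Uptodate and returns -errno.
 */
static int model_sync_readpage(struct model_page *page,
			       const char *src, size_t len)
{
	/* The documented entry state is Locked and !Uptodate. */
	assert(page->locked && !page->uptodate);

	if (len > MODEL_PAGE_SIZE) {
		page->locked = false;
		return -EIO;
	}

	memset(page->data, 0, MODEL_PAGE_SIZE);
	memcpy(page->data, src, len);
	page->uptodate = true;
	return AOP_UPDATED_PAGE;
}

/*
 * Hypothetical caller. It locks the page, calls the read method, and
 * drops the lock itself when AOP_UPDATED_PAGE comes back, so the page
 * is never unlocked and then re-locked just to check Uptodate.
 */
static int model_read_page(struct model_page *page,
			   const char *src, size_t len)
{
	int err;

	page->locked = true;
	page->uptodate = false;

	err = model_sync_readpage(page, src, len);
	if (err == AOP_UPDATED_PAGE) {
		page->locked = false;	/* stands in for unlock_page() */
		return 0;
	}
	return err;	/* -errno: the read method already unlocked */
}
```

Calling model_read_page() with a buffer that fits returns 0 and leaves
the page unlocked and Uptodate. An oversized length returns -EIO with
the page unlocked and not Uptodate. The asynchronous (return 0) and
AOP_TRUNCATED_PAGE cases are left out to keep the model short.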