From: Matthew Wilcox
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Cc: "Matthew Wilcox (Oracle)", jlayton@kernel.org, hch@infradead.org
Subject: [RFC 0/8] Replacing the readpages a_op
Date: Mon, 13 Jan 2020 07:37:38 -0800
Message-Id: <20200113153746.26654-1-willy@infradead.org>

From: "Matthew Wilcox (Oracle)"

I think everybody hates the readpages API.  The fundamental problem with
it is that it passes the pages to be read on a doubly linked list, using
the ->lru list in the struct page.
That means the filesystems have to do the work of calling
add_to_page_cache{,_lru,_locked}, and handling failures (because
another task is also accessing that chunk of the file, and so it fails).

This is an attempt to add a ->readahead op to replace ->readpages.  I've
converted two users, iomap/xfs and cifs.  The cifs conversion is lacking
fscache support, and that's just because I didn't want to do that work;
I don't believe there's anything fundamental to it.  But I wanted to do
iomap because it is The Infrastructure Of The Future and cifs because it
is the sole remaining user of add_to_page_cache_locked(), which enables
the last two patches in the series.  By the way, that gives CIFS access
to the workingset shadow infrastructure, which it had to ignore before
because it couldn't put pages onto the lru list at the right time.

The fundamental question is, how do we indicate to the implementation of
->readahead what pages to operate on?  I've gone with passing a pagevec.
This has the obvious advantage that it's a data structure that already
exists and is used within filemap for batches of pages.  I had to add a
bit of new infrastructure to support iterating over the pages in the
pagevec, but with that done, it's quite nice.

I think the biggest problem is that the size of the pagevec is limited
to 15 pages (60kB).  So that'll mean that if the readahead window bumps
all the way up to 256kB, we may end up making 5 BIOs (and merging them)
instead of one.  I'd kind of like to be able to allocate variable length
pagevecs while allowing regular pagevecs to be allocated on the stack,
but I can't figure out a way to do that.  eg this doesn't work:

-	struct page *pages[PAGEVEC_SIZE];
+	union {
+		struct page *pages[PAGEVEC_SIZE];
+		struct page *_pages[];
+	}

and if we just allocate them, useful and wonderful tools are going to
point out when pages[16] is accessed that we've overstepped the end of
the array.
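
A quick sanity check of the batch arithmetic above, as a userspace C
sketch rather than kernel code.  It assumes 4kB pages (implied by
15 pages = 60kB).  window_pages(), batches_needed() and MODEL_PAGE_SIZE
are hypothetical names used only for illustration, and PAGEVEC_SIZE is
redefined locally with the value quoted in the text:

```c
#include <assert.h>
#include <stddef.h>

/* Value quoted in the text: one pagevec holds at most 15 pages. */
#define PAGEVEC_SIZE 15
/* Assumption: 4kB pages, implied by "15 pages (60kB)". */
#define MODEL_PAGE_SIZE 4096u

/* Hypothetical helper: pages needed to cover a window of window_bytes. */
size_t window_pages(size_t window_bytes, size_t page_size)
{
	assert(page_size > 0);
	/* Round up so a partial trailing page still counts as a page. */
	return (window_bytes + page_size - 1) / page_size;
}

/* Hypothetical helper: fixed-size batches needed to carry nr_pages. */
size_t batches_needed(size_t nr_pages, size_t batch_size)
{
	assert(batch_size > 0);
	/* Round up so a partially filled last batch is counted. */
	return (nr_pages + batch_size - 1) / batch_size;
}
```

Under those assumptions a 256kB window is 64 pages, which needs 5
batches of at most 15 pages each, matching the 5 BIOs mentioned above.
A variable length pagevec would carry the same window in one batch.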

I have considered alternatives to the pagevec like just having the
->readahead implementation look up the pages in the i_pages XArray
directly.  That didn't work out too well.

Anyway, I want to draw your attention to the diffstat below.  Net 91 lines
deleted, and that's with adding all the infrastructure for ->readahead
and getting rid of none of the infrastructure for ->readpages.  There's
probably a good couple of hundred lines of code to be deleted there.

Matthew Wilcox (Oracle) (8):
  pagevec: Add an iterator
  mm: Fix the return type of __do_page_cache_readahead
  mm: Use a pagevec for readahead
  mm/fs: Add a_ops->readahead
  iomap,xfs: Convert from readpages to readahead
  cifs: Convert from readpages to readahead
  mm: Remove add_to_page_cache_locked
  mm: Unify all add_to_page_cache variants

 Documentation/filesystems/locking.rst |   8 +-
 Documentation/filesystems/vfs.rst     |   9 ++
 fs/cifs/file.c                        | 125 ++++-------------------
 fs/iomap/buffered-io.c                |  60 +++--------
 fs/iomap/trace.h                      |  18 ++--
 fs/xfs/xfs_aops.c                     |  12 +--
 include/linux/fs.h                    |   3 +
 include/linux/iomap.h                 |   4 +-
 include/linux/pagemap.h               |  23 +----
 include/linux/pagevec.h               |  20 ++++
 mm/filemap.c                          |  72 ++++---------
 mm/internal.h                         |   2 +-
 mm/readahead.c                        | 141 +++++++++++++++++---------
 13 files changed, 203 insertions(+), 294 deletions(-)

--
2.24.1