From: Matthew Wilcox <willy@infradead.org>
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-kernel@vger.kernel.org
Subject: [PATCH v2 24/25] mm: Add large page readahead
Date: Tue, 11 Feb 2020 20:18:44 -0800
Message-Id: <20200212041845.25879-25-willy@infradead.org>
In-Reply-To: <20200212041845.25879-1-willy@infradead.org>
References: <20200212041845.25879-1-willy@infradead.org>
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the filesystem supports large pages, allocate larger pages in the
readahead code when it seems worth doing.  The heuristic for choosing
larger page sizes will surely need some tuning, but this aggressive
ramp-up seems good for testing.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---

(Three standalone userspace sketches of the ramp-up heuristic, the
window tiling, and the readahead-marker placement follow after the
patch.)

 mm/readahead.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 93 insertions(+), 5 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 29ca25c8f01e..b582f09aa7e3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -406,13 +406,96 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
+static inline int ra_alloc_page(struct address_space *mapping, pgoff_t offset,
+		pgoff_t mark, unsigned int order, gfp_t gfp)
+{
+	int err;
+	struct page *page = __page_cache_alloc_order(gfp, order);
+
+	if (!page)
+		return -ENOMEM;
+	if (mark - offset < (1UL << order))
+		SetPageReadahead(page);
+	err = add_to_page_cache_lru(page, mapping, offset, gfp);
+	if (err)
+		put_page(page);
+	return err;
+}
+
+#define PMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
+
+static unsigned long page_cache_readahead_order(struct address_space *mapping,
+		struct file_ra_state *ra, struct file *file, unsigned int order)
+{
+	struct readahead_control rac = {
+		.mapping = mapping,
+		.file = file,
+		.start = ra->start,
+		.nr_pages = 0,
+	};
+	unsigned int old_order = order;
+	pgoff_t offset = ra->start;
+	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+	pgoff_t mark = offset + ra->size - ra->async_size;
+	int err = 0;
+	gfp_t gfp = readahead_gfp_mask(mapping);
+
+	limit = min(limit, offset + ra->size - 1);
+
+	/* Grow page size up to PMD size */
+	if (order < PMD_ORDER) {
+		order += 2;
+		if (order > PMD_ORDER)
+			order = PMD_ORDER;
+		while ((1 << order) > ra->size)
+			order--;
+	}
+
+	/* If size is somehow misaligned, fill with order-0 pages */
+	while (!err && offset & ((1UL << old_order) - 1)) {
+		err = ra_alloc_page(mapping, offset++, mark, 0, gfp);
+		if (!err)
+			rac.nr_pages++;
+	}
+
+	while (!err && offset & ((1UL << order) - 1)) {
+		err = ra_alloc_page(mapping, offset, mark, old_order, gfp);
+		if (!err)
+			rac.nr_pages += 1UL << old_order;
+		offset += 1UL << old_order;
+	}
+
+	while (!err && offset <= limit) {
+		err = ra_alloc_page(mapping, offset, mark, order, gfp);
+		if (!err)
+			rac.nr_pages += 1UL << order;
+		offset += 1UL << order;
+	}
+
+	if (offset > limit) {
+		ra->size += offset - limit - 1;
+		ra->async_size += offset - limit - 1;
+	}
+
+	read_pages(&rac, NULL);
+
+	/*
+	 * If there were already pages in the page cache, then we may have
+	 * left some gaps.  Let the regular readahead code take care of this
+	 * situation.
+	 */
+	if (err)
+		return ra_submit(ra, mapping, file);
+	return 0;
+}
+
 /*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static unsigned long
 ondemand_readahead(struct address_space *mapping,
 		struct file_ra_state *ra, struct file *filp,
-		bool hit_readahead_marker, pgoff_t offset,
+		struct page *page, pgoff_t offset,
 		unsigned long req_size)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
@@ -451,7 +534,7 @@ ondemand_readahead(struct address_space *mapping,
 	 * Query the pagecache for async_size, which normally equals to
 	 * readahead size. Ramp it up and use it as the new readahead size.
 	 */
-	if (hit_readahead_marker) {
+	if (page) {
 		pgoff_t start;
 
 		rcu_read_lock();
@@ -520,7 +603,12 @@ ondemand_readahead(struct address_space *mapping,
 		}
 	}
 
-	return ra_submit(ra, mapping, filp);
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) || !page ||
+	    !(mapping->host->i_sb->s_type->fs_flags & FS_LARGE_PAGES))
+		return ra_submit(ra, mapping, filp);
+
+	return page_cache_readahead_order(mapping, ra, filp,
+			compound_order(page));
 }
 
 /**
@@ -555,7 +643,7 @@ void page_cache_sync_readahead(struct address_space *mapping,
 	}
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
+	ondemand_readahead(mapping, ra, filp, NULL, offset, req_size);
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
@@ -602,7 +690,7 @@ page_cache_async_readahead(struct address_space *mapping,
 		return;
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+	ondemand_readahead(mapping, ra, filp, page, offset, req_size);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);

-- 
2.25.0
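
To make the "aggressive ramp-up" from the commit message easy to poke at
outside the kernel, here is a minimal userspace sketch.  It is not kernel
code: PMD_ORDER is hardcoded to 9 on the assumption of x86-64 with 4KiB
base pages and 2MiB PMD mappings (the patch derives it from
PMD_SHIFT - PAGE_SHIFT), and ra_size stands in for ra->size.

/* Standalone sketch of the order ramp-up in page_cache_readahead_order().
 * Assumption: PMD_ORDER == 9 (x86-64, 4KiB pages, 2MiB PMD). */
#include <stdio.h>

#define PMD_ORDER 9

static unsigned int ramp_up(unsigned int order, unsigned long ra_size)
{
	/* Grow by two orders per round, cap at PMD_ORDER, and never
	 * allocate more than the current window can hold. */
	if (order < PMD_ORDER) {
		order += 2;
		if (order > PMD_ORDER)
			order = PMD_ORDER;
		while ((1UL << order) > ra_size)
			order--;
	}
	return order;
}

int main(void)
{
	unsigned long ra_size = 512;	/* a 2MiB window of 4KiB pages */
	unsigned int order = 0;
	int round;

	/* Prints the sequence 0, 2, 4, 6, 8, 9: order-0 reaches a full
	 * PMD-sized allocation within five readahead rounds. */
	for (round = 0; round < 6; round++) {
		printf("round %d: order %u (%lu pages)\n",
		       round, order, 1UL << order);
		order = ramp_up(order, ra_size);
	}
	return 0;
}

With a smaller window (say ra_size = 32) the inner while loop clamps the
order back down, so the ramp never outruns the window size that the
ondemand code chose.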
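
The three allocation loops tile the window in up to three runs: order-0
pages until the offset is aligned for the previous order, previous-order
pages until it is aligned for the new order, then new-order pages up to
(and possibly past) the limit.  A userspace simulation, with made-up
offset/limit/order values, shows the shape:

/* Simulation of the tiling loops in page_cache_readahead_order().
 * The offsets and orders below are invented for illustration. */
#include <stdio.h>

int main(void)
{
	unsigned long offset = 3;	/* window starts misaligned */
	unsigned long limit = 40;	/* last page index to cover */
	unsigned int old_order = 2;	/* order used for the previous window */
	unsigned int order = 4;		/* order after the ramp-up */

	/* 1. order-0 pages until offset is aligned to the old order */
	while (offset & ((1UL << old_order) - 1))
		printf("order-0 page at %lu\n", offset++);

	/* 2. old-order pages until offset is aligned to the new order */
	for (; offset & ((1UL << order) - 1); offset += 1UL << old_order)
		printf("order-%u page at %lu\n", old_order, offset);

	/* 3. new-order pages up to (and possibly past) the limit */
	for (; offset <= limit; offset += 1UL << order)
		printf("order-%u page at %lu\n", order, offset);

	/* The last large page may overshoot; the patch grows ra->size and
	 * ra->async_size by the overshoot so the window stays consistent. */
	if (offset > limit)
		printf("window extended by %lu pages\n", offset - limit - 1);

	return 0;
}

For offset 3 and limit 40 this prints one order-0 page, three order-2
pages, two order-4 pages and a 7-page extension, matching the alignment
rules in the patch.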
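
Finally, ra_alloc_page() changes the readahead-marker test to
mark - offset < (1UL << order), so that with large pages the
PageReadahead flag lands on whichever compound page covers the async
trigger index.  A sketch, where the mark and order values are again
invented for illustration:

/* Sketch of the marker test in ra_alloc_page(); values are illustrative. */
#include <stdio.h>

static int covers_mark(unsigned long offset, unsigned long mark,
		       unsigned int order)
{
	/* True when mark falls inside [offset, offset + 2^order).  Because
	 * the arithmetic is unsigned, mark - offset wraps to a huge value
	 * once offset passes mark, so later pages are never flagged. */
	return mark - offset < (1UL << order);
}

int main(void)
{
	unsigned long mark = 36;	/* async readahead trigger index */
	unsigned int order = 4;		/* 16-page allocations */
	unsigned long offset;

	/* Only the page at offset 32 (covering indices 32..47) is flagged. */
	for (offset = 0; offset <= 48; offset += 1UL << order)
		printf("page at %2lu: %s\n", offset,
		       covers_mark(offset, mark, order) ?
		       "PageReadahead" : "-");
	return 0;
}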