From: Matthew Wilcox
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Are THPs the right model for the pagecache?
Date: Fri, 13 Nov 2020 04:46:52 +0000
Message-ID: <20201113044652.GD17076@casper.infradead.org>

When I started working on using larger pages in the page cache, I was
thinking about calling them large pages or lpages. As I worked my way
through the code, I switched to simply adopting the transparent huge
page terminology that is used by anonymous memory and shmem. I just
changed the definition so that a THP is a page of arbitrary order.

But now I'm wondering if that expediency has brought me to the right
place. To enable THP, you have to select CONFIG_TRANSPARENT_HUGEPAGE,
which is only available on architectures which support using larger TLB
entries to map PMD-sized pages. Fair enough, since that was the original
definition, but the point of supporting larger page sizes in the page
cache is to reduce software overhead. Why shouldn't Alpha or m68k use
large pages in the page cache, even if they can't use them in their
TLBs?

I'm also thinking about the number of asserts about
PageHead/PageTail/PageCompound and the repeated invocations of
compound_head(). If we had a different type for large pages, we could
use the compiler to assert these things instead of putting in runtime
asserts.

IOWs, something like this:

	struct lpage {
		struct page subpages[4];
	};

	static inline struct lpage *page_lpage(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		if (unlikely(head & 1))
			return (struct lpage *)(head - 1);
		return (struct lpage *)page;
	}

We can then work our way through the code, distinguishing between
functions which really want to get an lpage (i.e. ones which currently
assert that they see only a PageHead) and functions which want to get a
particular subpage.

Some functions are going to need to be split. E.g. pagecache_get_page()
currently takes an FGP_HEAD flag which determines whether it returns a
head page or the subpage for the index. FGP_HEAD will have to go away
in favour of having separate pagecache_get_subpage() and
pagecache_get_lpage(). Or preferably, all callers of
pagecache_get_page() get converted to use lpages and they can call
find_subpage() all by themselves, if they need it.

Feels like a lot of work, but it can be done gradually. My fear with
the current code is that filesystem writers who want to convert to
supporting THPs are not going to understand which interfaces expect a
THP and which expect a subpage. For example, vmf->page (in the mkwrite
handler) is a subpage.
But the page passed to ->readpage is a THP. I don't think
we're going to be able to switch either of those any time soon, so
distinguishing them with a type seems only fair to fs authors.

See, for example, Darrick's reasonable question here:
https://lore.kernel.org/linux-fsdevel/20201014161216.GE9832@magnolia/

I'm not volunteering to do any of this in time for the next merge
window! I have lots of patches to get approved by various maintainers
in the next two weeks!
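
To make the type distinction above a bit more concrete, here is a minimal
userspace sketch of how it might look. It is only an illustration: the
struct page here is a one-field stand-in for the kernel's, READ_ONCE() and
unlikely() are dropped, and lpage_init() and lpage_subpage() are
hypothetical helpers invented for this example, not existing or proposed
kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's struct page (assumption: only the
 * compound_head field matters for this sketch). Bit 0 set means "this is
 * a tail page"; the remaining bits point at the head page.
 */
struct page {
	uintptr_t compound_head;
};

/* As in the proposal: an lpage always starts at a head page. */
struct lpage {
	struct page subpages[4];
};

/* Same logic as page_lpage() above, minus READ_ONCE()/unlikely(). */
static inline struct lpage *page_lpage(struct page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct lpage *)(head - 1);
	return (struct lpage *)page;
}

/* Hypothetical helper: make subpages 1..nr-1 tails of subpage 0. */
static inline void lpage_init(struct lpage *lp, size_t nr)
{
	lp->subpages[0].compound_head = 0;
	for (size_t i = 1; i < nr; i++)
		lp->subpages[i].compound_head = (uintptr_t)lp | 1;
}

/*
 * Hypothetical helper: a function that wants a whole large page takes a
 * struct lpage *, so it needs no runtime PageHead assertion. Passing a
 * struct page * here is an incompatible pointer type that the compiler
 * can diagnose.
 */
static inline struct page *lpage_subpage(struct lpage *lp, size_t i)
{
	return &lp->subpages[i];
}
```

With these definitions, a caller holding any subpage converts once with
page_lpage() and from then on the type itself records that it has the
head, which is the property the runtime asserts check today.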