Date: Thu, 27 Jan 2022 21:57:05 +0000
From: Matthew Wilcox <willy@infradead.org>
To: linux-mm@kvack.org
Subject: A two-bit folio_mapcount

As promised, here's a half-baked proposal for making folio_mapcount()
significantly cheaper at the cost of making it less precise. I
appreciate that folio_mapcount() is not upstream yet, so take a look
at total_mapcount() if you want to understand what I'm talking about.

For a 2MB folio on a 4k architecture, you have to check 512 cachelines
to determine how many times a folio is mapped. That's 32kB of memory,
which is a good chunk of your L1 cache. The problem is that every PTE
mapping increments the ->mapcount of each individual page (and the
number of PMD mappings is stored separately). To find out how many
times the entire folio is mapped, you've got to look at each
constituent page.
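
Roughly, that walk looks something like this (a simplified sketch in
the spirit of total_mapcount(), not the upstream code; the folio
helpers are only used to express the loop, and the file-page /
DoubleMap corrections are left out):

/*
 * Simplified sketch of today's per-page walk; not the real
 * total_mapcount().  Each page's _mapcount starts at -1, hence
 * the "+ 1".
 */
static int sketch_total_mapcount(struct folio *folio)
{
	long i, nr = folio_nr_pages(folio);		/* 512 for 2MB / 4k */
	int ret = compound_mapcount(&folio->page);	/* PMD mappings, stored once */

	for (i = 0; i < nr; i++)			/* one cacheline per page */
		ret += atomic_read(&folio_page(folio, i)->_mapcount) + 1;

	return ret;
}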
Added to that, each increment of any of those ->mapcounts bumps the
refcount on the head page. That's a lot of atomic ops, and we've had
problems where the page refcount has been attacked, resulting in
overflow.

I would like to start counting folio mapcounts in a more Discworld
Troll manner: Zero, One, Two, Many. That limits the total number of
refcount increments to 3. Once you reach "Many", you've essentially
lost count, and you need to walk the interval tree to figure out
exactly how many mappings there are (this means we can no longer use
mapcount to decide to stop walking the rmap, but I think that's OK?).
You can decrement from Two to One and from One to Zero, but you can't
decrement from Many to Two. If you walk the rmap and discover there
are fewer than Many mappings, you can set mapcount to Two, One or
Zero (adjusting the page refcount at the same time). A rough sketch
of the increment/decrement paths is at the end of this mail.

The mapcount would also no longer count the number of individual PTE
or PMD mappings. Instead, it would be the number of VMAs which
contain at least one page table reference to this folio.

One advantage of this scheme is that it makes something like 30 bits
available in struct page; I'm sure we'll be able to think of some
good uses for them. PageDoubleMap also goes away (because we no
longer care whether the folio is mapped with PMDs or PTEs).

So ... what's going to be made catastrophically slower by this
scheme? Maybe something involving anonymous pages? Those tend to be
my blind spot.
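
As promised above, here's the rough, hypothetical sketch of the
Zero/One/Two/Many bookkeeping. The field name _mapcount_2bit, the
helper names and FOLIO_MAPCOUNT_MANY are all made up for
illustration; this is the shape of the idea, not a real patch:

#define FOLIO_MAPCOUNT_MANY	3	/* two bits: Zero, One, Two, Many */

/* A VMA gains its first page-table reference to this folio. */
static void sketch_folio_add_vma_mapping(struct folio *folio)
{
	int old = atomic_read(&folio->_mapcount_2bit);	/* hypothetical field */

	do {
		if (old == FOLIO_MAPCOUNT_MANY)
			return;		/* saturated: count is lost, no refcount bump */
	} while (!atomic_try_cmpxchg(&folio->_mapcount_2bit, &old, old + 1));

	folio_get(folio);		/* at most three increments, ever */
}

/* A VMA drops its last page-table reference to this folio. */
static void sketch_folio_remove_vma_mapping(struct folio *folio)
{
	int old = atomic_read(&folio->_mapcount_2bit);

	/* Callers hold a mapping, so old is at least One here. */
	do {
		if (old == FOLIO_MAPCOUNT_MANY)
			return;		/* only an rmap walk may lower Many */
	} while (!atomic_try_cmpxchg(&folio->_mapcount_2bit, &old, old - 1));

	folio_put(folio);
}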