Date: Wed, 8 Feb 2023 19:54:52 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Zi Yan
Cc: linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel,
	David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount
In-Reply-To: <7DCA075B-1E43-47B1-9402-66C54513D52E@nvidia.com>

On Wed, Feb 08, 2023 at 02:36:41PM -0500, Zi Yan wrote:
> On 7 Feb 2023, at 11:51, Matthew Wilcox wrote:
> > On Tue, Feb 07, 2023 at 11:23:31AM -0500, Zi Yan wrote:
> >> On 24 Jan 2023, at 13:13, Matthew Wilcox wrote:
> >>
> >>> Once we get to the part of the folio journey where we have
> >>> one-pointer-per-page, we can't afford to maintain per-page state.
> >>> Currently we maintain a per-page mapcount, and that will have to go.
> >>> We can maintain extra state for a multi-page folio, but it has to be a
> >>> constant amount of extra state no matter how many pages are in the folio.
> >>>
> >>> My proposal is that we maintain a single mapcount per folio, and its
> >>> definition is the number of (vma, page table) tuples which have a
> >>> reference to any pages in this folio.
> >>
> >> How about having two, full_folio_mapcount and partial_folio_mapcount?
> >> If partial_folio_mapcount is 0, we can have a fast path without doing
> >> anything at page level.
> >
> > A fast path for what?  I don't understand your vision; can you spell it
> > out for me?  My current proposal is here:
>
> A fast code path for only handling folios as a whole. For cases that
> subpages are mapped from a folio, traversing through subpages might be
> needed and will be slow. A code separation might be cleaner and makes
> folio as a whole handling quicker.

To be clear, in this proposal, there is no subpage mapcount.  I've got
my eye on one struct folio per allocation, so there will be no more
tail pages.  The proposal has one mapcount, and that's it.  I'd be open
to saying "OK, we need two mapcounts", but not to anything that needs
to scale per number of pages in the folio.
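To make the shape of that concrete, here is a minimal sketch of the
state the proposal implies.  The *_sketch names are made up for
illustration; this is not the kernel's current struct folio or
folio_mapcount():

#include <linux/atomic.h>

/*
 * Sketch only: one counter for the whole folio, no per-page mapcounts
 * and no tail pages.  It counts (vma, page table) tuples that map at
 * least one page of this folio, starting from 0.
 */
struct folio_sketch {
	/* ... flags, mapping, index, refcount, ... */
	atomic_t mapcount;
};

/* "How many mappings does this folio have?" is a single atomic_read(),
 * with no per-page state to visit and no rmap walk. */
static inline int folio_sketch_mapcount(const struct folio_sketch *folio)
{
	return atomic_read(&folio->mapcount);
}

static inline bool folio_sketch_mapped(const struct folio_sketch *folio)
{
	return folio_sketch_mapcount(folio) > 0;
}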
> For your proposal, "How many VMAs have one-or-more pages of this folio mapped"
> should be the responsibility of rmap. We could add a counter to rmap
> instead. It seems that you are mixing page table mapping with virtual
> address space (VMA) mapping together.

rmap tells you how many VMAs cover this folio.  It doesn't tell you how
many of those VMAs have actually got any pages from it mapped.  It's
also rather slower than a simple atomic_read(), so I think you'll have
an uphill battle trying to convince people to use rmap for this purpose.
I'm not sure what you mean by "add a counter to rmap"?  One count per
mapped page in the vma?

> > https://lore.kernel.org/linux-mm/Y+FkV4fBxHlp6FTH@casper.infradead.org/
> >
> > The three questions we need to be able to answer (in my current
> > understanding) are laid out here:
> >
> > https://lore.kernel.org/linux-mm/Y+HblAN5bM1uYD2f@casper.infradead.org/
>
> I think we probably need to clarify the definition of "map" in your
> questions. Does it mean mapped by page tables or VMAs? When a page
> is mapped into a VMA, it can be mapped by one or more page table entries,
> but not the other way around, right? Or is shared page table entry merged
> now so that more than one VMAs can use a single page table entry to map
> a folio?

Mapped by page tables, just like today.  It'd be quite the change to
figure out the mapcount of a page newly brought into the page cache;
we'd have to do an rmap walk to see how many mapcounts to give it.  I
don't think this is a great idea.

As far as I know, shared page tables are only supported by hugetlbfs,
and I prefer to stick cheese in my ears and pretend they don't exist.

To be absolutely concrete about this, my proposal is:

Folio brought into page cache has mapcount 0 (whether or not there are
any VMAs that cover it).

When we take a page fault on one of the pages in it, its mapcount
increases from 0 to 1.

When we take another page fault on a page in it, we do a pvmw (a
page_vma_mapped_walk) to determine if any pages from this folio are
already mapped by this VMA; we see that there is one and we do not
increment the mapcount.

We partially munmap() so that we need to unmap one of the pages.  We
remove it from the page tables and call page_remove_rmap().  That does
another pvmw and sees there's still a page in this folio mapped by this
VMA, so it does not decrement the mapcount.

We truncate() the file smaller than the position of the folio, which
causes us to unmap the rest of the folio.  The pvmw walk detects no more
pages from this folio mapped and we decrement the mapcount.

Clear enough?
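Purely as an illustration of that sequence (not a patch), and reusing
the folio_sketch above, the increment/decrement discipline might look
roughly like this.  folio_sketch_mapped_in_vma() is a made-up stand-in
for a pvmw over the folio restricted to one VMA; locking and the
(vma, page table) tuple vs per-VMA distinction are glossed over:

struct vma_sketch;	/* stands in for struct vm_area_struct */

/*
 * Stand-in for a pvmw restricted to @vma: returns true if any page of
 * @folio is currently mapped by a page table entry belonging to @vma.
 */
bool folio_sketch_mapped_in_vma(struct folio_sketch *folio,
				struct vma_sketch *vma);

/*
 * Fault path, called *before* the new PTE is installed.  If no page of
 * the folio was mapped by this VMA yet, this is a new mapping and the
 * folio mapcount goes up; otherwise it was already counted.
 */
void folio_sketch_add_rmap(struct folio_sketch *folio,
			   struct vma_sketch *vma)
{
	if (!folio_sketch_mapped_in_vma(folio, vma))
		atomic_inc(&folio->mapcount);
}

/*
 * Unmap path (partial munmap(), truncate()), called *after* the PTE
 * has been cleared.  Only when the last page of the folio mapped by
 * this VMA has gone does the mapcount drop.
 */
void folio_sketch_remove_rmap(struct folio_sketch *folio,
			      struct vma_sketch *vma)
{
	if (!folio_sketch_mapped_in_vma(folio, vma))
		atomic_dec(&folio->mapcount);
}

The ordering is the point: the add-side check runs before the new PTE
goes in and the remove-side check runs after the old one is cleared, so
in both cases the pvmw is answering "does any other mapping from this
VMA remain?".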