Date: Mon, 6 Feb 2023 20:34:31 +0000
From: Matthew Wilcox <willy@infradead.org>
To: linux-mm@kvack.org
Cc: Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand,
	"Yin, Fengwei"
Subject: Re: Folio mapcount

On Tue, Jan 24, 2023 at 06:13:21PM +0000, Matthew Wilcox wrote:
> Once we get to the part of the folio journey where we have
> one-pointer-per-page, we can't afford to maintain per-page state.
> Currently we maintain a per-page mapcount, and that will have to go.
> We can maintain extra state for a multi-page folio, but it has to be a
> constant amount of extra state no matter how many pages are in the folio.
>
> My proposal is that we maintain a single mapcount per folio, and its
> definition is the number of (vma, page table) tuples which have a
> reference to any pages in this folio.

I've been thinking about this a lot more, and I have changed my mind.
It works fine to answer the question "Is any page in this folio
mapped?", but it's now hard to answer the question "I have it mapped;
does anybody else?"

That question is asked, for example, in
madvise_cold_or_pageout_pte_range().  With this definition, if the
mapcount is 1, it's definitely mapped only by us.  If it's more than 2,
it's definitely mapped by somebody else (*).  If it's 2, maybe we have
the folio mapped twice (e.g. a folio misaligned with respect to the
page table boundary straddles two page tables, contributing two tuples
from a single VMA), and maybe we have it mapped once and somebody else
has it mapped once, so we have to consult the rmap to find out.  Not
fun times.

(*) If we support folios larger than PMD size, then the answer is more
complex, since a single VMA can then account for more than two
(vma, page table) tuples.

I now think the mapcount has to be defined as "How many VMAs have
one-or-more pages of this folio mapped".
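Under that per-VMA definition, the consumer side becomes cheap again.
As a sketch (the helper name is invented for illustration, and it
conservatively treats a second VMA in our own mm as "somebody else"):

/*
 * Sketch only, not existing kernel API: answer "I have this folio
 * mapped; does anybody else?" when _mapcount counts mapping VMAs.
 * Keeps today's convention that _mapcount is biased by -1, so a
 * stored value of 0 means exactly one VMA maps the folio.
 */
static inline bool folio_mapped_by_other_vma(struct folio *folio)
{
	/* The caller holds one of the mappings; any higher count
	 * means at least one other VMA maps a page of this folio. */
	return atomic_read(&folio->_mapcount) > 0;
}

The interesting part is maintaining that count on the mapping side.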
That means that our future folio_add_file_rmap_range() looks a bit like
this:

{
	bool add_mapcount = true;

	if (nr < folio_nr_pages(folio))
		add_mapcount = !folio_has_ptes(folio, vma);
	if (add_mapcount)
		atomic_inc(&folio->_mapcount);

	__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
	if (nr == HPAGE_PMD_NR)
		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
				NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr);
	mlock_vma_folio(folio, vma, nr == HPAGE_PMD_NR);
}

bool folio_mapped_in_vma(struct folio *folio, struct vm_area_struct *vma)
{
	unsigned long address = vma_address(&folio->page, vma);
	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);

	if (!page_vma_mapped_walk(&pvmw))
		return false;
	page_vma_mapped_walk_done(&pvmw);
	return true;
}

... some details to be fixed here; particularly this will currently
deadlock on the PTL, so we'd need not only to exclude the current PMD
from being examined, but also to avoid a deadly embrace between two
threads (do we currently have a locking order defined for page table
locks at the same height of the tree?)
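For symmetry, the unmap path would need the mirror-image check before
decrementing.  A sketch under the same assumptions (the function name
is invented, folio_has_ptes() is the same to-be-written helper as
above, and the PMD-mapped statistics are omitted for brevity):

/*
 * Sketch only: drop the per-VMA mapcount when the last page of the
 * folio in this VMA goes away.  Assumes the PTEs being unmapped have
 * already been cleared, as in today's page_remove_rmap() callers, so
 * folio_has_ptes() sees only the mappings that remain.
 */
void folio_remove_file_rmap_range(struct folio *folio, int nr,
		struct vm_area_struct *vma)
{
	bool drop_mapcount = true;

	if (nr < folio_nr_pages(folio))
		drop_mapcount = !folio_has_ptes(folio, vma);
	if (drop_mapcount)
		atomic_dec(&folio->_mapcount);
	__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, -nr);
}

It has the same PTL problem as the add side, of course.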