From: Yang Shi
Date: Tue, 24 Jan 2023 10:35:48 -0800
Subject: Re: Folio mapcount
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand, "Yin, Fengwei"

On Tue, Jan 24, 2023 at 10:13 AM Matthew Wilcox wrote:
>
> Once we get to the part of the folio journey where we have
> one-pointer-per-page, we can't afford to maintain per-page state.
> Currently we maintain a per-page mapcount, and that will have to go.
> We can maintain extra state for a multi-page folio, but it has to be a
> constant amount of extra state no matter how many pages are in the folio.
>
> My proposal is that we maintain a single mapcount per folio, and its
> definition is the number of (vma, page table) tuples which have a
> reference to any pages in this folio.
>
> I think there's a good performance win and simplification to be had
> here, so I think it's worth doing for 6.4.
>
> Examples
> --------
>
> In the simple and common case where every page in a folio is mapped
> once by a single vma and single page table, mapcount would be 1 [1].
> If the folio is mapped across a page table boundary by a single VMA,
> after we take a page fault on it in one page table, it gets a mapcount
> of 1.  After taking a page fault on it in the other page table, its
> mapcount increases to 2.
>
> For a PMD-sized THP naturally aligned, mapcount is 1.
> Splitting the PMD into PTEs would not change the mapcount; the folio
> remains order-9 but it still has a reference from only one page table
> (a different page table, but still just one).
>
> Implementation sketch
> ---------------------
>
> When we take a page fault, we can/should map every page in the folio
> that fits in this VMA and this page table.  We do this at present in
> filemap_map_pages() by looping over each page in the folio and calling
> do_set_pte() on each.  We should have a:
>
>   do_set_pte_range(vmf, folio, addr, first_page, n);
>
> and then change the API to page_add_new_anon_rmap() / page_add_file_rmap()
> to pass in (folio, first, n) instead of page.  That gives us one call to
> page_add_*_rmap() per (vma, page table) tuple.
>
> In try_to_unmap_one(), page_vma_mapped_walk() currently calls us for
> each pfn.  We'll want a function like
>   page_vma_mapped_walk_skip_to_end_of_ptable()
> in order to persuade it to only call us once or twice if the folio
> is mapped across a page table boundary.
>
> Concerns
> --------
>
> We'll have to be careful to always zap all the PTEs for a given (vma,
> pt) tuple at the same time, otherwise mapcount will get out of sync
> (e.g. map three pages, unmap two; we shouldn't decrement the mapcount,
> but I don't think we can know that).  But does this ever happen?  I think
> we always unmap the entire folio, like in try_to_unmap_one().

Off the top of my head, MADV_DONTNEED may unmap a folio partially while
leaving it unsplit until some later point, for example under memory
pressure. munmap() should be able to unmap a folio partially as well.

>
> I haven't got my head around SetPageAnonExclusive() yet.  I think it can
> be a per-folio bit, but handling a folio split across two page tables
> may be tricky.
>
> Notes
> -----
>
> [1] Ignoring the bias by -1 to let us detect transitions that we care
> about more efficiently; I'm talking about the value returned from
> page_mapcount(), not the value stored in page->_mapcount.
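
To make the partial-unmap case concrete, here is a small userspace C
model. It is only a sketch, not kernel code: every toy_* name and the
geometry (4 KiB pages, 512 PTEs per page table) are assumptions made for
illustration. toy_page_tables_spanned() counts how many page tables a
single VMA needs to map a range, which is what that VMA would contribute
to the folio mapcount under the proposed definition. toy_map() and
toy_unmap() keep a hypothetical per-(vma, page table) PTE counter, which
is exactly the state the proposal does not have; it is there only to show
when the mapcount would have to change.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (x86-64-like): 4 KiB pages, 512 PTEs per page table. */
#define TOY_PAGE_SHIFT   12
#define TOY_PTRS_PER_PTE 512UL
#define TOY_PT_SPAN      (TOY_PTRS_PER_PTE << TOY_PAGE_SHIFT) /* 2 MiB */

/*
 * Number of page tables one VMA needs in order to map nr_pages pages
 * starting at virtual address addr.
 */
static unsigned long toy_page_tables_spanned(uint64_t addr,
                                             unsigned long nr_pages)
{
    uint64_t first, last;

    assert(nr_pages > 0);
    first = addr / TOY_PT_SPAN;
    last = (addr + ((uint64_t)nr_pages << TOY_PAGE_SHIFT) - 1) / TOY_PT_SPAN;
    return (unsigned long)(last - first + 1);
}

/* Toy folio: a single mapcount, as in the proposal. */
struct toy_folio {
    long mapcount;
};

/*
 * Hypothetical bookkeeping for one (vma, page table) tuple: how many PTEs
 * in that page table currently point into the folio.
 */
struct toy_pt_mapping {
    unsigned long nr_ptes;
};

/* Map n more pages of the folio through this (vma, page table) tuple. */
static void toy_map(struct toy_folio *f, struct toy_pt_mapping *m,
                    unsigned long n)
{
    if (n == 0)
        return;
    if (m->nr_ptes == 0)
        f->mapcount++;      /* first reference from this tuple */
    m->nr_ptes += n;
}

/* Zap n PTEs of the folio from this (vma, page table) tuple. */
static void toy_unmap(struct toy_folio *f, struct toy_pt_mapping *m,
                      unsigned long n)
{
    assert(n <= m->nr_ptes);
    if (n == 0)
        return;
    m->nr_ptes -= n;
    if (m->nr_ptes == 0)
        f->mapcount--;      /* last reference from this tuple is gone */
}
```

In this model a PMD-aligned 512-page folio at 0x200000 spans one page
table, while four pages starting at 0x3ff000 span two. Mapping three
pages and then unmapping two (the MADV_DONTNEED / munmap() case) leaves
the mapcount at 1 until the last PTE goes away; without something like
nr_ptes, the unmap path cannot tell those two situations apart.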