Date: Tue, 7 Feb 2023 17:39:07 -0500
From: Peter Xu
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount

On Mon, Feb 06, 2023 at 08:34:31PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 24, 2023 at 06:13:21PM +0000, Matthew Wilcox wrote:
> > Once we get to the part of the folio journey where we have
> > one-pointer-per-page, we can't afford to maintain per-page state.
> > Currently we maintain a per-page mapcount, and that will have to go.
> > We can maintain extra state for a multi-page folio, but it has to be a
> > constant amount of extra state no matter how many pages are in the folio.
> >
> > My proposal is that we maintain a single mapcount per folio, and its
> > definition is the number of (vma, page table) tuples which have a
> > reference to any pages in this folio.
>
> I've been thinking about this a lot more, and I have changed my
> mind. It works fine to answer the question "Is any page in this
> folio mapped", but it's now hard to answer the question "I have it
> mapped, does anybody else?" That question is asked, for example,
> in madvise_cold_or_pageout_pte_range().

I'm curious whether it's still acceptable if it's only wrong in rare
cases. IMHO the question is when it would go severely wrong if the
mapcount should be exactly 1 (the folio is privately owned by one vma)
but we report 2.

In this MADV_COLD/MADV_PAGEOUT case we'll skip deactivating or paging
out some pages even though we could, but is that a deal breaker (if
the benefit of the change can be proven worthwhile)?
Especially since this only happens when a folio is mapped unaligned.
Is unaligned mapping of a folio common? Are there any other use cases
that could go worse than this one? (E.g., IIUC an occasional
superfluous CoW seems fine.)

OTOH...

>
> With this definition, if the mapcount is 1, it's definitely only mapped
> by us. If it's more than 2, it's definitely mapped by somebody else (*).
> If it's 2, maybe we have the folio mapped twice, and maybe we have it
> mapped once and somebody else has it mapped once, so we have to consult
> the rmap to find out. Not fun times.
>
> (*) If we support folios larger than PMD size, then the answer is more
> complex.
>
> I now think the mapcount has to be defined as "How many VMAs have
> one-or-more pages of this folio mapped".
>
> That means that our future folio_add_file_rmap_range() looks a bit
> like this:
>
> {
> 	bool add_mapcount = true;
>
> 	if (nr < folio_nr_pages(folio))
> 		add_mapcount = !folio_has_ptes(folio, vma);
> 	if (add_mapcount)
> 		atomic_inc(&folio->_mapcount);
>
> 	__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
> 	if (nr == HPAGE_PMD_NR)
> 		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> 				NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr);
>
> 	mlock_vma_folio(folio, vma, nr == HPAGE_PMD_NR);
> }
>
> bool folio_mapped_in_vma(struct folio *folio, struct vm_area_struct *vma)
> {
> 	unsigned long address = vma_address(&folio->page, vma);
> 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
>
> 	if (!page_vma_mapped_walk(&pvmw))
> 		return false;
> 	page_vma_mapped_walk_done(&pvmw);
> 	return true;
> }
>
> ... some details to be fixed here; particularly this will currently
> deadlock on the PTL, so we'd need not only to exclude the current
> PMD from being examined, but also avoid a deadly embrace between
> two threads (do we currently have a locking order defined for
> page table locks at the same height of the tree?)

... it starts to sound scary if it needs to take more than one pgtable
lock.

Thanks,

-- 
Peter Xu