Date: Thu, 16 Dec 2021 15:54:36 +0000
From: Matthew Wilcox
To: Jason Gunthorpe
Cc: "Kirill A. Shutemov", linux-mm@kvack.org, Hugh Dickins, David Hildenbrand, Mike Kravetz
Subject: Re: folio mapcount
References: <20211216093737.7w2fv7p7j2rrx5r6@box.shutemov.name> <20211216151917.GK6467@ziepe.ca>
In-Reply-To: <20211216151917.GK6467@ziepe.ca>

On Thu, Dec 16, 2021 at 11:19:17AM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 16, 2021 at 01:56:57PM +0000, Matthew Wilcox wrote:
> > p = mmap(x, 2MB, PROT_READ|PROT_WRITE, ...): THP allocated
> > mprotect(p, 4KB, PROT_READ): THP split.
> >
> > And in that case, I would say the THP now has mapcount of 2 because
> > there are 2 VMAs mapping it.
>
> At least today mapcount is only loosely connected to VMAs. It really
> counts the number of PUD/PTEs that point at parts of the memory.

Careful.  Currently, you need to distinguish between total_mapcount(),
page_trans_huge_mapcount() and page_mapcount().  Take a look at
__page_mapcount() to be sure you really know what the mapcount "really"
counts today ...
(also I'm going to assume that when you said PUD you really mean PMD
throughout)

> If, under the PTL, you observe a mapcount of 1 then you know that the
> PUD/PTE you have under lock is the ONLY PUD/PTE that refers to this
> page and will remain so while the lock is held.
>
> So, today the above ends up with a mapcount of 1 and when we take a
> COW fault we can re-use the page.
>
> If the above ends up with a mapcount of 2 then COW will copy not
> re-use, which will cause unexpected data corruption in all those
> annoying side cases.

As I understood David's presentation yesterday, we actually have
data corruption issues in all the annoying side cases with THPs
in current upstream, so that's no worse than we have now.  But let's
see if we can avoid them.

It feels like what we want from a COW perspective is a count of the
number of MMs mapping a page, not the number of VMAs, PTEs or PMDs
mapping the page.  Right?

So here's a corner case ...

p = mmap(x, 2MB, PROT_READ|PROT_WRITE, ...): THP allocated
mremap(p + 128K, 128K, 128K, MREMAP_MAYMOVE | MREMAP_FIXED, p + 2MB):
PMD split

Should mapcount be 1 or 2 at this point?  Does the answer change if it's
an anonymous or file page?  (I think it might; perhaps if you do this
to an anon THP, we split the anon THP and each page is now mapped once,
whereas if you do it to a file page, the page cache does not split the
page, but it does count the extra mapping.  also the answer may be
different between MAP_PRIVATE and MAP_SHARED)

> The actual value of mapcount doesn't matter, the main issue is if it
> is 1 or not, and if 1 the determination needs to be reliable and race
> free.
>
> Putting the mapcount on the head page is sort of an optimization 'all
> tail pages share the same mapcount' and the doublemap thing is about
> exiting that optimization with minimal locking.
>
> There is other stuff going on too that can possibly do other things
> but this seems to be the primary difficult task.
>
> (IIRC anyhow, from when I looked at this with David a few months ago)
>
> > I'm just trying to learn enough to make sensible suggestions for
> > simplification. As yesterday's call proved, there are all kinds of
> > corner cases when messing with mapcount and refcount.
>
> It would be amazing to get some better idea!
>
> Jason
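
For anyone following along, the three candidate meanings of "mapcount"
in this thread (per page-table entry, per VMA, per MM) can be compared
with a toy userspace model of the mprotect() example quoted at the top.
This is only a sketch and not kernel code: struct toy_mapping, the
toy_* functions and the integer ids are all made up for illustration,
and it assumes 4KB base pages, so that a fully split 2MB THP is mapped
by 512 PTEs. It says nothing about what total_mapcount() or
page_mapcount() actually return.

```c
#include <stddef.h>

/* Assumption: 2MB THP / 4KB base pages = 512 PTEs after a full PMD split. */
#define TOY_PTES_PER_THP 512

/* Hypothetical record of one page-table entry that points into the THP. */
struct toy_mapping {
    int mm_id;   /* made-up id of the address space (MM) */
    int vma_id;  /* made-up id of the VMA inside that MM */
};

/* Candidate 1: count every page-table entry that points at the THP. */
size_t toy_count_entries(const struct toy_mapping *m, size_t n)
{
    (void)m;
    return n;
}

/* Count distinct MMs, or distinct (MM, VMA) pairs when per_vma is nonzero. */
static size_t toy_count_distinct(const struct toy_mapping *m, size_t n,
                                 int per_vma)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        /* Has an earlier entry already been counted for this key? */
        for (size_t j = 0; j < i && !seen; j++)
            seen = m[j].mm_id == m[i].mm_id &&
                   (!per_vma || m[j].vma_id == m[i].vma_id);
        if (!seen)
            distinct++;
    }
    return distinct;
}

/* Candidate 2: count the VMAs that map some part of the THP. */
size_t toy_count_vmas(const struct toy_mapping *m, size_t n)
{
    return toy_count_distinct(m, n, 1);
}

/* Candidate 3: count the MMs that map some part of the THP. */
size_t toy_count_mms(const struct toy_mapping *m, size_t n)
{
    return toy_count_distinct(m, n, 0);
}

/*
 * Model of "mmap 2MB, then mprotect the first 4KB": one MM, the first
 * PTE in a read-only VMA (id 0), the other 511 in a read-write VMA (id 1).
 */
void toy_fill_mprotect_example(struct toy_mapping m[TOY_PTES_PER_THP])
{
    for (size_t i = 0; i < TOY_PTES_PER_THP; i++) {
        m[i].mm_id = 0;
        m[i].vma_id = (i == 0) ? 0 : 1;
    }
}
```

Filling an array with toy_fill_mprotect_example() and passing it to the
three counters gives 512 per entry, 2 per VMA and 1 per MM. Only the
per-MM count stays at 1 here, which is the property the COW re-use
check discussed above cares about.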