Date: Thu, 16 Dec 2021 13:01:10 -0400
From: Jason Gunthorpe
To: Matthew Wilcox
Cc: "Kirill A. Shutemov", linux-mm@kvack.org, Hugh Dickins, David Hildenbrand, Mike Kravetz
Subject: Re: folio mapcount
Message-ID: <20211216170110.GL6467@ziepe.ca>
References: <20211216093737.7w2fv7p7j2rrx5r6@box.shutemov.name> <20211216151917.GK6467@ziepe.ca>

On Thu, Dec 16, 2021 at 03:54:36PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 16, 2021 at 11:19:17AM -0400, Jason Gunthorpe wrote:
> > On Thu, Dec 16, 2021 at 01:56:57PM +0000, Matthew Wilcox wrote:
> > > p = mmap(x, 2MB, PROT_READ|PROT_WRITE, ...): THP allocated
> > > mprotect(p, 4KB, PROT_READ): THP split.
> > >
> > > And in that case, I would say the THP now has mapcount of 2 because
> > > there are 2 VMAs mapping it.
> >
> > At least today mapcount is only loosely connected to VMAs. It really
> > counts the number of PUD/PTEs that point at parts of the memory.
>
> Careful. Currently, you need to distinguish between total_mapcount(),
> page_trans_huge_mapcount() and page_mapcount(). Take a look at
> __page_mapcount() to be sure you really know what the mapcount "really"
> counts today ...

Right, I was mostly trying to describe one of the difficult problems
all this different stuff is trying to solve.

> > If the above ends up with a mapcount of 2 then COW will copy not
> > re-use, which will cause unexpected data corruption in all those
> > annoying side cases.
>
> As I understood David's presentation yesterday, we actually have
> data corruption issues in all the annoying side cases with THPs
> in current upstream, so that's no worse than we have now. But let's
> see if we can avoid them.

Possibly, I'm not sure :)

> It feels like what we want from a COW perspective is a count of the
> number of MMs mapping a page, not the number of VMAs, PTEs or PMDs
> mapping the page. Right?

Interesting..

For COW the interesting question is whether wp_page_reuse() happens in
do_wp_page(), and it looks like mapcount is only used to make that
decision for anonymous pages.

So, at least for COW's use of mapcount, we can focus entirely on anon
pages?

For anon pages.. At least the number of VMAs pointing to anon memory
is a limit on mapcount - I assume there is some way we can copy and
double-map anonymous VMAs into the same mm?

Still, if we can know the number of VMAs is 1 then we are safe to
allow wp_page_reuse().

However, it needs to be more exact than that: if num VMAs > 1 we then
have to query at a per-page granularity.

Though, it seems interesting: if we knew how many anonymous VMAs
pointed at an anonymous page (at a 4k granularity), would that replace
mapcount for COW?

I wonder if we could somehow know that only 1 VMA is pointing at the
pages as the normal fast path, and if COW encounters more than 1 VMA
it does some more expensive calculation?

> p = mmap(x, 2MB, PROT_READ|PROT_WRITE, ...): THP allocated
> mremap(p + 128K, 128K, 128K, MREMAP_MAYMOVE | MREMAP_FIXED, p + 2MB):
> PMD split
> Should mapcount be 1 or 2 at this point?

If I read this right it should be 1 because each 4k page is pointed to
by only 1 PTE/PMD. mremap moves, not copies..

Jason
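
For reference, the "mremap moves, not copies" point can be checked from
user space. The following is only a minimal sketch and not part of the
original discussion: the function name mremap_moves_not_copies() and
the 4MB reservation (so that p + 2MB is an address the process owns)
are assumptions made for illustration. It does not check whether a THP
actually backs the region, since that depends on alignment and system
configuration, and it says nothing about the kernel-internal mapcount.
It only shows that after the mremap() the data is reachable at the new
address while the old range is no longer mapped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION (2UL << 20)   /* 2MB, the size used in the quoted example */
#define CHUNK  (128UL << 10) /* 128K, the range being moved */

/*
 * Hypothetical helper: returns 0 if the chunk was moved (data visible at
 * the new address, old range unmapped), or a negative value identifying
 * the step that failed.
 */
int mremap_moves_not_copies(void)
{
	unsigned char vec[CHUNK / 4096]; /* one byte per page, sized for 4K pages */
	char *p, *dst, *moved;

	/* The vec sizing above assumes pages are at least 4K. */
	assert(sysconf(_SC_PAGESIZE) >= 4096);

	/* Reserve 4MB so that p + 2MB is a destination we own. */
	p = mmap(NULL, 2 * REGION, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	dst = p + REGION;

	/* Fault in the source pages with a recognisable pattern. */
	memset(p + CHUNK, 0x5a, CHUNK);

	moved = mremap(p + CHUNK, CHUNK, CHUNK,
		       MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (moved != dst)
		return -2;

	/* The data is reachable through the new address ... */
	if (moved[0] != 0x5a || moved[CHUNK - 1] != 0x5a)
		return -3;

	/* ... and the old range is gone: mincore() reports ENOMEM for it. */
	if (mincore(p + CHUNK, CHUNK, vec) == 0 || errno != ENOMEM)
		return -4;

	munmap(p, 2 * REGION);
	return 0;
}
```

A return value of 0 is consistent with each 4k page still being
reachable through exactly one user address after the move.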