Message-ID: <54338c9c-0985-04a8-5d96-8dd3b15f5709@redhat.com>
Date: Thu, 16 Dec 2021 17:45:04 +0100
From: David Hildenbrand
Organization: Red Hat
To: Matthew Wilcox, Jason Gunthorpe
Cc: "Kirill A. Shutemov", linux-mm@kvack.org, Hugh Dickins, Mike Kravetz
Subject: Re: folio mapcount

On 16.12.21 16:54, Matthew Wilcox wrote:
> On Thu, Dec 16, 2021 at 11:19:17AM -0400, Jason Gunthorpe wrote:
>> On Thu, Dec 16, 2021 at 01:56:57PM +0000, Matthew Wilcox wrote:
>>> p = mmap(x, 2MB, PROT_READ|PROT_WRITE, ...): THP allocated
>>> mprotect(p, 4KB, PROT_READ): THP split.
>>>
>>> And in that case, I would say the THP now has mapcount of 2 because
>>> there are 2 VMAs mapping it.
>>
>> At least today mapcount is only loosely connected to VMAs. It really
>> counts the number of PUD/PTEs that point at parts of the memory.
>
> Careful.
> Currently, you need to distinguish between total_mapcount(),
> page_trans_huge_mapcount() and page_mapcount(). Take a look at
> __page_mapcount() to be sure you really know what the mapcount "really"
> counts today ...

Yes, and the documentation above page_trans_huge_mapcount() tries to
bring some clarity. Tries :)

>
> (also I'm going to assume that when you said PUD you really mean
> PMD throughout)
>
>> If, under the PTL, you observe a mapcount of 1 then you know that the
>> PUD/PTE you have under lock is the ONLY PUD/PTE that refers to this
>> page and will remain so while the lock is held.
>>
>> So, today the above ends up with a mapcount of 1 and when we take a
>> COW fault we can re-use the page.
>>
>> If the above ends up with a mapcount of 2 then COW will copy not
>> re-use, which will cause unexpected data corruption in all those
>> annoying side cases.
>
> As I understood David's presentation yesterday, we actually have
> data corruption issues in all the annoying side cases with THPs
> in current upstream, so that's no worse than we have now. But let's
> see if we can avoid them.

Right, because the refcount is even more shaky ...

>
> It feels like what we want from a COW perspective is a count of the
> number of MMs mapping a page, not the number of VMAs, PTEs or PMDs
> mapping the page. Right?
>
> So here's a corner case ...
>
> p = mmap(x, 2MB, PROT_READ|PROT_WRITE, ...): THP allocated
> mremap(p + 128K, 128K, 128K, MREMAP_MAYMOVE | MREMAP_FIXED, p + 2MB):
> PMD split
>

(busy preparing and testing related patches, so I only skimmed over the
discussion)

Whenever we have to go through an internal munmap (mmap, munmap,
mremap), we split the PMD and map the remainder using PTEs. We place
the huge page on the deferred split queue, where the actual compound
page will get split ("THP split").

In move_page_tables() we perform the split_huge_pmd() as well, which I
think would trigger in your example.

For anon pages, IIRC, there is no way to get more than one mapping per
process for a single base page. "Sharing" as in "shared anonymous pages"
only applies between processes, not VMAs. An anon base page can only
ever be mapped once into a given process, but it can be mapped shared
into multiple processes.

Quoting the documentation above page_trans_huge_mapcount():

"The function returns the highest mapcount any one of the subpages
has. If the return value is one, even if different processes are
mapping different subpages of the transparent hugepage, they can all
reuse it, because each process is reusing a different subpage."

So if you see "at least one subpage is mapped more than once" and the
page is anon shared, you have to split the PMD and trigger unsharing
for exactly that subpage.

But it is indeed confusing ...

> Should mapcount be 1 or 2 at this point? Does the answer change if it's

The PMD was split. Each subpage is mapped exactly once, so
page_trans_huge_mapcount() is supposed to return 1 because there is no
sharing. (Famous last words)

-- 
Thanks,

David / dhildenb