From: David Hildenbrand
Organization: Red Hat
To: Mike Kravetz, linux-mm@kvack.org
Cc: Naoya Horiguchi, David Rientjes, Michal Hocko, Matthew Wilcox, Peter Xu, James Houghton, Muchun Song
Subject: Re: A mapcount riddle
Date: Wed, 25 Jan 2023 10:09:45 +0100
Message-ID: <2281795d-5931-5189-ef2e-c589e55e43a3@redhat.com>

On 24.01.23 21:56, Mike Kravetz wrote:
> Q How can a page be mapped into multiple processes and have a
>   mapcount of 1?
>
> A It is a hugetlb page referenced by a shared PMD.
>
> I was looking to expose some basic information about PMD sharing via
> /proc/smaps.  After adding the code, I started a couple processes
> sharing a large hugetlb mapping that would result in the use of
> shared PMDs.
> When I looked at the output of /proc/smaps, I saw
> my new metric counting the number of shared PMDs.  However, what
> stood out was that the entire mapping was listed as Private_Hugetlb.
> WTH???  It certainly was shared!  The routine smaps_hugetlb_range
> decides between Private_Hugetlb and Shared_Hugetlb with this code:
>
>     if (page) {
>         int mapcount = page_mapcount(page);
>
>         if (mapcount >= 2)
>             mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
>         else
>             mss->private_hugetlb += huge_page_size(hstate_vma(vma));
>     }
>
> After spending some time looking for issues in the page_mapcount code,
> I came to the realization that the mapcount of hugetlb pages only
> referenced by a shared PMD would be 1 no matter how many processes had
> mapped the page.  When a page is first faulted, the mapcount is set to 1.
> When faulted in other processes, the shared PMD is added to the page
> table of the other processes.  No increase of mapcount will occur.
>
> At first thought this seems bad.  However, I believe this has been the
> behavior since hugetlb PMD sharing was introduced in 2006 and I am
> unaware of any reported issues.  I did an audit of code looking at
> mapcount.  In addition to the above issue with smaps, there appears
> to be an issue with 'migrate_pages' where shared pages could be migrated
> without appropriate privilege.
>
>     /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
>     if (flags & (MPOL_MF_MOVE_ALL) ||
>         (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
>         if (isolate_hugetlb(page, qp->pagelist) &&
>             (flags & MPOL_MF_STRICT))
>             /*
>              * Failed to isolate page but allow migrating pages
>              * which have been queued.
>              */
>             ret = 1;
>     }
>
> I will prepare fixes for both of these.  However, I wanted to ask if
> anyone has ideas about other potential issues with this?
>
> Since COW is mostly relevant to private mappings, shared PMDs generally
> do not apply.  Nothing stood out in a quick audit of code.

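To make the accounting described above concrete, here is a minimal
userspace sketch. It is not kernel code: the types and helpers (toy_page,
toy_pmd_page, toy_first_fault, toy_share_pmd, toy_smaps_is_shared) are
made up for illustration, and only the "mapcount >= 2" test mirrors the
quoted smaps snippet. It assumes the mapcount is set once when the page is
first faulted into the PMD page, and that attaching another process to that
PMD page touches no per-page counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct page or page table types. */
struct toy_page {
	int mapcount;		/* page table entries mapping this page */
};

struct toy_pmd_page {
	struct toy_page *entry;	/* the one huge page mapped through this PMD page */
	int nr_sharers;		/* processes whose page tables point at this PMD page */
};

/* First fault: the page is installed in the PMD page, mapcount becomes 1. */
static void toy_first_fault(struct toy_pmd_page *pmd, struct toy_page *page)
{
	assert(pmd->entry == NULL);
	pmd->entry = page;
	pmd->nr_sharers = 1;
	page->mapcount = 1;
}

/*
 * Another process attaches to the already populated PMD page.
 * Only the page table gains a user; the page's mapcount is untouched.
 */
static void toy_share_pmd(struct toy_pmd_page *pmd)
{
	assert(pmd->nr_sharers > 0);
	pmd->nr_sharers++;
}

/* Same decision as the quoted smaps code: shared iff mapcount >= 2. */
static bool toy_smaps_is_shared(const struct toy_page *page)
{
	return page->mapcount >= 2;
}
```

In this model, one toy_first_fault() followed by two toy_share_pmd() calls
leaves three processes mapping the page while mapcount is still 1, so
toy_smaps_is_shared() reports it as private.
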
Yes, we shouldn't have to worry about anon pages in shared PMDs.

The observed mapcount weirdness is one of the reasons why I suggested, for
PTE-table sharing (a new RFC was posted some time ago, but I have had no
time to look into it), treating sharing of the page table only as a
mechanism to deduplicate page table memory, without changing the semantics
of the pages mapped in there. That is: if the page is logically mapped into
two page table structures, the refcount and the mapcount would be 2 instead
of 1.

Of course, that implies some additional sharing-aware map/unmap logic,
because the refcount and the mapcount have to be adjusted accordingly.

But PTE-table sharing has to take proper care of private mappings as well;
that's more what I was concerned about.

-- 
Thanks,

David / dhildenb
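
As a rough illustration of the sharing-aware accounting suggested above,
here is a userspace sketch. It is not taken from any posted patch or RFC:
every name in it (toy_page, toy_table, toy_attach_sharer and so on) is
hypothetical, and the refcount here only counts mapping references,
whereas a real page holds other references too. The assumption is that
each process attached to a shared page table counts as one logical mapping
of every page mapped in that table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PTRS_PER_TABLE 4

/* Toy stand-ins, NOT the kernel's struct page or page table types. */
struct toy_page {
	int refcount;	/* mapping references only, in this model */
	int mapcount;	/* logical mappings of the page */
};

struct toy_table {
	struct toy_page *entries[TOY_PTRS_PER_TABLE];
	int nr_sharers;	/* processes attached to this page table */
};

/* A fresh table belongs to exactly one process and maps nothing. */
static void toy_table_init(struct toy_table *t)
{
	for (size_t i = 0; i < TOY_PTRS_PER_TABLE; i++)
		t->entries[i] = NULL;
	t->nr_sharers = 1;
}

/* Map a page: it becomes logically mapped once per attached process. */
static void toy_map_page(struct toy_table *t, size_t idx, struct toy_page *page)
{
	assert(idx < TOY_PTRS_PER_TABLE && t->entries[idx] == NULL);
	t->entries[idx] = page;
	page->refcount += t->nr_sharers;
	page->mapcount += t->nr_sharers;
}

/* Sharing-aware attach: every page already mapped gains one mapping. */
static void toy_attach_sharer(struct toy_table *t)
{
	t->nr_sharers++;
	for (size_t i = 0; i < TOY_PTRS_PER_TABLE; i++) {
		if (t->entries[i]) {
			t->entries[i]->refcount++;
			t->entries[i]->mapcount++;
		}
	}
}

/* Sharing-aware detach: every mapped page loses one mapping. */
static void toy_detach_sharer(struct toy_table *t)
{
	assert(t->nr_sharers > 0);
	t->nr_sharers--;
	for (size_t i = 0; i < TOY_PTRS_PER_TABLE; i++) {
		if (t->entries[i]) {
			t->entries[i]->refcount--;
			t->entries[i]->mapcount--;
		}
	}
}
```

With this accounting, a page mapped through a table that two processes
share ends up with a mapcount of 2, at the cost of walking the table on
every attach and detach.
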