Date: Tue, 24 Jan 2023 18:00:36 -0500
From: Peter Xu
To: Mike Kravetz
Cc: linux-mm@kvack.org, Naoya Horiguchi, David Rientjes, Michal Hocko,
    Matthew Wilcox, David Hildenbrand, James Houghton, Muchun Song
Subject: Re: A mapcount riddle

On Tue, Jan 24, 2023 at 12:56:24PM -0800, Mike Kravetz wrote:
> Q How can a page be mapped into multiple processes and have a
>   mapcount of 1?
>
> A It is a hugetlb page referenced by a shared PMD.
>
> I was looking to expose some basic information about PMD sharing via
> /proc/smaps.  After adding the code, I started a couple processes
> sharing a large hugetlb mapping that would result in the use of
> shared PMDs.  When I looked at the output of /proc/smaps, I saw
> my new metric counting the number of shared PMDs.  However, what
> stood out was that the entire mapping was listed as Private_Hugetlb.
> WTH???  It certainly was shared!  The routine smaps_hugetlb_range
> decides between Private_Hugetlb and Shared_Hugetlb with this code:
>
> 	if (page) {
> 		int mapcount = page_mapcount(page);
>
> 		if (mapcount >= 2)
> 			mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
> 		else
> 			mss->private_hugetlb += huge_page_size(hstate_vma(vma));
> 	}

This is definitely unfortunate.

>
> After spending some time looking for issues in the page_mapcount code,
> I came to the realization that the mapcount of hugetlb pages only
> referenced by a shared PMD would be 1 no matter how many processes had
> mapped the page.  When a page is first faulted, the mapcount is set to 1.
> When faulted in other processes, the shared PMD is added to the page
> table of the other processes.  No increase of mapcount will occur.
>
> At first thought this seems bad.  However, I believe this has been the
> behavior since hugetlb PMD sharing was introduced in 2006 and I am
> unaware of any reported issues.  I did an audit of code looking at
> mapcount.  In addition to the above issue with smaps, there appears
> to be an issue with 'migrate_pages' where shared pages could be migrated
> without appropriate privilege.
>
> 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> 	if (flags & (MPOL_MF_MOVE_ALL) ||
> 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> 		if (isolate_hugetlb(page, qp->pagelist) &&
> 			(flags & MPOL_MF_STRICT))
> 			/*
> 			 * Failed to isolate page but allow migrating pages
> 			 * which have been queued.
> 			 */
> 			ret = 1;
> 	}
>
> I will prepare fixes for both of these.  However, I wanted to ask if
> anyone has ideas about other potential issues with this?

This made me wonder whether this should be checked earlier, before the
sharing happens.  E.g., when trying to share a PMD, would it make sense to
check the VMA mempolicy first?

Then the question is: if PMD sharing only happens between VMAs that have
the same memory policy, would the mapcount == 1 check above be acceptable
even when the page is shared by multiple processes?

Besides, I'm also curious about the planned fixes for the two issues
mentioned.

Thanks,

>
> Since COW is mostly relevant to private mappings, shared PMDs generally
> do not apply.  Nothing stood out in a quick audit of code.
> --
> Mike Kravetz
>

-- 
Peter Xu
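
P.S. For anyone following the thread, the accounting described above can be sketched as a toy userspace model. This is only an illustration under stated assumptions: struct toy_page, struct toy_pmd_table and every toy_* function are hypothetical names made up here, not kernel data structures or APIs, and the "PMD-aware" test is just one conceivable way to look at it, not the planned fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT the kernel's data structures. */
struct toy_page {
	int mapcount;		/* bumped only when a mapping is installed */
};

struct toy_pmd_table {
	int sharers;		/* processes whose page tables link this PMD page */
	struct toy_page *page;	/* the single hugetlb page this toy PMD maps */
};

/* First fault: install the mapping in the PMD page, mapcount becomes 1. */
static void toy_first_fault(struct toy_pmd_table *pmd, struct toy_page *page)
{
	pmd->page = page;
	pmd->sharers = 1;
	page->mapcount = 1;
}

/* Another process links the existing PMD page; mapcount is left untouched. */
static void toy_share_pmd(struct toy_pmd_table *pmd)
{
	pmd->sharers++;
}

/* Mirrors the quoted smaps test: mapcount >= 2 is reported as shared. */
static bool toy_looks_shared(const struct toy_page *page)
{
	return page->mapcount >= 2;
}

/* Hypothetical alternative: a PMD page with several sharers also counts. */
static bool toy_looks_shared_pmd_aware(const struct toy_pmd_table *pmd)
{
	return pmd->page->mapcount >= 2 || pmd->sharers >= 2;
}

/* Scenario from the thread: process A faults, B and C attach the shared PMD. */
static void toy_demo(void)
{
	struct toy_page page = { 0 };
	struct toy_pmd_table pmd = { 0 };

	toy_first_fault(&pmd, &page);
	toy_share_pmd(&pmd);
	toy_share_pmd(&pmd);

	assert(pmd.sharers == 3);
	assert(page.mapcount == 1);		 /* still 1 with three users */
	assert(!toy_looks_shared(&page));	 /* would show as Private_Hugetlb */
	assert(toy_looks_shared_pmd_aware(&pmd));
}
```

Calling toy_demo() from a main() exercises the scenario: three processes use the page, the mapcount stays at 1, and a test based on mapcount alone classifies the page as private.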