From: Yang Shi
Date: Tue, 24 Jan 2023 15:29:53 -0800
Subject: Re: A mapcount riddle
To: Peter Xu
Cc: Mike Kravetz, linux-mm@kvack.org, Naoya Horiguchi, David Rientjes,
 Michal Hocko, Matthew Wilcox, David Hildenbrand, James Houghton,
 Muchun Song

On Tue, Jan 24, 2023 at 3:00 PM Peter Xu wrote:
>
> On Tue, Jan 24, 2023 at 12:56:24PM -0800, Mike Kravetz wrote:
> > Q How can a page be mapped into multiple processes and have a
> >   mapcount of 1?
> >
> > A It is a hugetlb page referenced by a shared PMD.
> >
> > I was looking to expose some basic information about PMD sharing via
> > /proc/smaps.
> > After adding the code, I started a couple processes
> > sharing a large hugetlb mapping that would result in the use of
> > shared PMDs. When I looked at the output of /proc/smaps, I saw
> > my new metric counting the number of shared PMDs. However, what
> > stood out was that the entire mapping was listed as Private_Hugetlb.
> > WTH??? It certainly was shared! The routine smaps_hugetlb_range
> > decides between Private_Hugetlb and Shared_Hugetlb with this code:
> >
> >         if (page) {
> >                 int mapcount = page_mapcount(page);
> >
> >                 if (mapcount >= 2)
> >                         mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
> >                 else
> >                         mss->private_hugetlb += huge_page_size(hstate_vma(vma));
> >         }
>
> This is definitely unfortunate..
>
> >
> > After spending some time looking for issues in the page_mapcount code,
> > I came to the realization that the mapcount of hugetlb pages only
> > referenced by a shared PMD would be 1 no matter how many processes had
> > mapped the page. When a page is first faulted, the mapcount is set to 1.
> > When faulted in other processes, the shared PMD is added to the page
> > table of the other processes. No increase of mapcount will occur.
> >
> > At first thought this seems bad. However, I believe this has been the
> > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > unaware of any reported issues. I did an audit of code looking at
> > mapcount. In addition to the above issue with smaps, there appears
> > to be an issue with 'migrate_pages' where shared pages could be migrated
> > without appropriate privilege.
> >
> >         /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> >         if (flags & (MPOL_MF_MOVE_ALL) ||
> >             (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> >                 if (isolate_hugetlb(page, qp->pagelist) &&
> >                         (flags & MPOL_MF_STRICT))
> >                         /*
> >                          * Failed to isolate page but allow migrating pages
> >                          * which have been queued.
> >                          */
> >                         ret = 1;
> >         }
> >
> > I will prepare fixes for both of these.
> > However, I wanted to ask if
> > anyone has ideas about other potential issues with this?
>
> This reminded me whether things should be checked already before this
> happens. E.g. when trying to share pmd, whether it makes sense to check
> vma mempolicy before doing so?
>
> Then the question is if pmd sharing only happens with the vma that shares
> the same memory policy, whether above mapcount==1 check would be acceptable
> even if it's shared by multiple processes.

I don't think so. One process might change its policy, for example by
binding to another node, and that would then migrate the hugepage
because the mapcount wrongly says it is unshared. The example code Mike
pasted above actually comes from mbind, if I remember correctly.

I'm wondering whether we could use the refcount instead of the mapcount
to determine whether a hugetlb page is shared, assuming refcounting for
hugetlb pages behaves similarly to base pages (incremented when the page
is mapped by a new process or pinned). If the page is pinned (for
example, by GUP) we can't migrate it either.

>
> Besides, I'm also curious on the planned fix too regarding the two issues
> mentioned.
>
> Thanks,
>
> >
> > Since COW is mostly relevant to private mappings, shared PMDs generally
> > do not apply. Nothing stood out in a quick audit of code.
> > --
> > Mike Kravetz
> >
>
> --
> Peter Xu
>
>
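
To make the accounting described in the quoted text concrete, here is a
minimal userspace sketch in C. It is a toy model, not kernel code: every
type and function name (toy_page, toy_pmd_table, toy_fault_in, and so on)
is hypothetical, and the "sharers" counter is only an assumed stand-in for
whatever the kernel uses to track how many processes reference a shared
PMD table. It shows why a mapcount-only test reports "private" once a
second process attaches to an existing table, and how a test that also
looks at the number of sharers would report "shared".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: none of these types or helpers are kernel APIs. */
struct toy_page {
    int mapcount;               /* number of table entries mapping the page */
};

struct toy_pmd_table {
    int sharers;                /* processes whose page tables use this table */
    struct toy_page *page;      /* the one huge page this toy table maps */
};

/* First fault: a new entry is installed, so the page's mapcount goes up. */
void toy_fault_in(struct toy_pmd_table *t, struct toy_page *p)
{
    assert(t->page == NULL);    /* the toy table holds a single entry */
    t->page = p;
    p->mapcount++;
}

/* A second process attaches to the existing table: no new entry is
 * installed, so the page's mapcount is left untouched. */
void toy_share_pmd(struct toy_pmd_table *t)
{
    t->sharers++;
}

/* Check in the spirit of the quoted code: mapcount only. */
bool toy_shared_by_mapcount(const struct toy_page *p)
{
    return p->mapcount >= 2;
}

/* Assumed alternative: also treat the page as shared when the table
 * that maps it has more than one sharer. */
bool toy_shared_with_pmd(const struct toy_pmd_table *t)
{
    return t->page != NULL &&
           (t->page->mapcount >= 2 || t->sharers >= 2);
}

/* Walk through the two-process scenario and print both verdicts. */
void toy_demo(void)
{
    struct toy_page page = { .mapcount = 0 };
    struct toy_pmd_table table = { .sharers = 1, .page = NULL };

    toy_fault_in(&table, &page);    /* process A faults the page */
    toy_share_pmd(&table);          /* process B shares A's PMD table */

    printf("mapcount=%d sharers=%d\n", page.mapcount, table.sharers);
    printf("mapcount-only: %s\n",
           toy_shared_by_mapcount(&page) ? "shared" : "private");
    printf("with sharers:  %s\n",
           toy_shared_with_pmd(&table) ? "shared" : "private");
}
```

Calling toy_demo() from a main() leaves the mapcount at 1 with 2 sharers,
so the mapcount-only check says "private" and the sharer-aware check says
"shared". Whether the real fixes should key off a sharer count, the
refcount, or something else is exactly the open question in this thread.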