From: Yang Shi
Date: Thu, 26 Jan 2023 10:22:44 -0800
Subject: Re: A mapcount riddle
To: David Hildenbrand
Cc: James Houghton, Peter Xu, Mike Kravetz, linux-mm@kvack.org, Naoya Horiguchi, David Rientjes, Michal Hocko, Matthew Wilcox, Muchun Song
References: <394ed0b3-273d-6071-d71d-ddcef6ea72e9@redhat.com>
In-Reply-To: <394ed0b3-273d-6071-d71d-ddcef6ea72e9@redhat.com>

On Thu, Jan 26, 2023 at 1:15 AM David Hildenbrand wrote:
>
> On 25.01.23 17:22, James Houghton wrote:
> > On Wed, Jan 25, 2023 at 7:54 AM Peter Xu wrote:
> >>
> >> On Wed, Jan 25, 2023 at 07:26:49AM -0800, James Houghton wrote:
> >>>> At first thought this seems bad. However, I believe this has been the
> >>>> behavior since hugetlb PMD sharing was introduced in 2006 and I am
> >>>> unaware of any reported issues.
> >>>> I did an audit of code looking at
> >>>> mapcount. In addition to the above issue with smaps, there appears
> >>>> to be an issue with 'migrate_pages' where shared pages could be migrated
> >>>> without appropriate privilege.
> >>>>
> >>>> /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> >>>> if (flags & (MPOL_MF_MOVE_ALL) ||
> >>>>     (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> >>>>         if (isolate_hugetlb(page, qp->pagelist) &&
> >>>>             (flags & MPOL_MF_STRICT))
> >>>>                 /*
> >>>>                  * Failed to isolate page but allow migrating pages
> >>>>                  * which have been queued.
> >>>>                  */
> >>>>                 ret = 1;
> >>>> }
> >>>
> >>> This isn't the exact same problem you're fixing Mike, but I want to
> >>> point out a related problem.
> >>>
> >>> This is the generic-mm-equivalent of the hugetlb code above:
> >>>
> >>> static int migrate_page_add(struct page *page, struct list_head
> >>> *pagelist, unsigned long flags)
> >>> {
> >>>         struct page *head = compound_head(page);
> >>>         /*
> >>>          * Avoid migrating a page that is shared with others.
> >>>          */
> >>>         if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
> >>>                 if (!isolate_lru_page(head)) {
> >>>                         list_add_tail(&head->lru, pagelist);
> >>>                         mod_node_page_state(page_pgdat(head),
> >>>                                 NR_ISOLATED_ANON + page_is_file_lru(head),
> >>>                                 thp_nr_pages(head));
> >>> ...
> >>> }
> >>>
> >>> If you have a partially PTE-mapped THP, page_mapcount(head) will not
> >>> accurately determine if a page is mapped in multiple VMAs or not (it
> >>> only tells you how many times the head page is mapped).
> >>>
> >>> For example...
> >>> 1) You could have the THP PMD-mapped in one VMA, and then one tail
> >>> page of the THP can be mapped in another. page_mapcount(head) will be
> >>> 1.
> >>> 2) You could have two VMAs map two separate tail pages of the THP, in
> >>> which case page_mapcount(head) will be 0.
> >>>
> >>> I bring this up because we have the same problem with HugeTLB
> >>> high-granularity mapping.
> >>
> >> Maybe a better match here is total_mapcount() rather than page_mapcount()
> >> (despite the overheads on the sub-page loop)?
>
> IIRC, total_mapcount() would also not be what we want: for a PTE-mapped
> THP it would be e.g. 512 instead of one. [unless I am confused again
> about mapcounts]
>
> See my other comment, I believe this is supposed to be a guesstimate
> whether "this page is shared". And we use the first subpage to make a
> guess here ...

I dug into the git and review history. It looks like this code was
added when THP migration support was introduced, and IIRC it was just
copied from the base page case.

I agree it is a heuristic for whether the page is shared, but reading
only the head page's mapcount is not correct AFAICT. total_mapcount()
should be used instead. It can't tell an unshared PTE-mapped THP
(multiple subpages mapped by PTEs) apart from a shared one, but it
would still filter out shared pages as expected.

>
> Of course, we could try harder, by looking at > 1 subpage, to test if
> any of these subpages has a mapcount > 1. But it still wouldn't be
> accurate ....

That seems like overkill TBH.

>
> --
> Thanks,
>
> David / dhildenb
>
>