Date: Sun, 4 Aug 2024 11:06:06 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Jiang,
 Rik van Riel, Dave Hansen, Michael Ellerman,
 linuxppc-dev@lists.ozlabs.org, Matthew Wilcox, Rick P Edgecombe,
 Oscar Salvador, Mel Gorman, Andrew Morton, Borislav Petkov,
 Christophe Leroy, Huang Ying, "Kirill A . Shutemov",
 "Aneesh Kumar K . V", Dan Williams, Thomas Gleixner, Hugh Dickins,
 x86@kernel.org, Nicholas Piggin, Vlastimil Babka, Ingo Molnar,
 Alex Thorlton
Subject: Re: [PATCH v3 2/8] mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES
References: <20240715192142.3241557-1-peterx@redhat.com>
 <20240715192142.3241557-3-peterx@redhat.com>

On Wed, Jul 31, 2024 at 02:18:26PM +0200, David Hildenbrand wrote:
> On 15.07.24 21:21, Peter Xu wrote:
> > In 2013, commit 72403b4a0fbd ("mm: numa: return the number of base pages
> > altered by protection changes") introduced "numa_huge_pte_updates" vmstat
> > entry, trying to capture how many huge ptes (in reality, PMD thps at that
> > time) are marked by NUMA balancing.
> >
> > This patch proposes to remove it for some reasons.
> >
> > Firstly, the name is misleading. We can have more than one way to have a
> > "huge pte" at least nowadays, and that's also the major goal of this patch,
> > where it paves way for PUD handling in change protection code paths.
> >
> > PUDs are coming not only for dax (which has already came and yet broken..),
> > but also for pfnmaps and hugetlb pages. The name will simply stop making
> > sense when PUD will start to be involved in mprotect() world.
> >
> > It'll also make it not reasonable either if we boost the counter for both
> > pmd/puds. In short, current accounting won't be right when PUD comes, so
> > the scheme was only suitable at that point in time where PUD wasn't even
> > possible.
> >
> > Secondly, the accounting was simply not right from the start as long as it
> > was also affected by other call sites besides NUMA.
> > mprotect() is one,
> > while userfaultfd-wp also leverages change protection path to modify
> > pgtables. If it wants to do right it needs to check the caller but it
> > never did; at least mprotect() should be there even in 2013.
> >
> > It gives me the impression that nobody is seriously using this field, and
> > it's also impossible to be serious.
>
> It's weird and the implementation is ugly. The intention really was to only
> consider MM_CP_PROT_NUMA, but that apparently is not the case.
>
> hugetlb/mprotect/... should have never been accounted.
>
> [...]
>
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 73d791d1caad..53656227f70d 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1313,7 +1313,6 @@ const char * const vmstat_text[] = {
> >   #ifdef CONFIG_NUMA_BALANCING
> >   	"numa_pte_updates",
> > -	"numa_huge_pte_updates",
> >   	"numa_hint_faults",
> >   	"numa_hint_faults_local",
> >   	"numa_pages_migrated",
>
> It's a user-visible update. I assume most tools should be prepared for this
> stat missing (just like handling !CONFIG_NUMA_BALANCING).
>
> Apparently it's documented [1][2] for some distros:

Yes, and AFAIU, [2] is a document explaining an issue related to NUMA
balancing, and I strongly suspect [2] simply borrowed from [1] here; even
the parameters are listed in the same order.

>
> "The amount of transparent huge pages that were marked for NUMA hinting
> faults. In combination with numa_pte_updates the total address space that
> was marked can be calculated."
>
> And now I realize that change_prot_numa() would account these PMD updates as
> well in numa_pte_updates and I am confused about the SUSE documentation: "In
> combination with numa_pte_updates" doesn't really apply, right?
>
> At this point I don't know what's right or wrong.

Me neither, even without PUD involvement.
Talking about numa_pte_updates, hugetlb_change_protection() returns the
"number of huge ptes", so one 2M hugetlb page is accounted once; whereas for
generic THP (change_protection_range()) it's HPAGE_PMD_NR. It would make
more sense to me if everything stuck with PAGE_SIZE units. So all these
counters look a bit confusing.

>
> If we'd want to fix it instead, the right thing to do would be doing the
> accounting only with MM_CP_PROT_NUMA. But then, numa_pte_updates is also
> wrongly updated I believe :(

Right. So far I don't have a reason to change the numa_pte_updates
semantics, but the problem here is that numa_huge_pte_updates becomes
ambiguous as soon as PUDs are involved.

In general, I don't know how I should treat this counter in the PUD path,
even though NUMA isn't involved in dax yet; it may soon be involved if we
move on to using this same path for hugetlb, or when 1G THP becomes possible
(with Yu Zhao's TAO?).

Another thing I could do is drop this patch and ignore NUMA_HUGE_PTE_UPDATES
in PUD dax processing for now. It'll work for this series, but it'll still
be a problem later. I figured maybe we should simply drop it now.

Thanks,

>
>
> [1] https://documentation.suse.com/de-de/sles/12-SP5/html/SLES-all/cha-tuning-numactl.html
> [2] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2749259_1.html
>
> --
> Cheers,
>
> David / dhildenb
>

-- 
Peter Xu
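
For illustration only, the unit mismatch described above (one count per
huge pte on the hugetlb path, versus a whole huge page's worth of base pages
per mapping on the generic THP path) can be modelled with a tiny userspace
sketch. This is not kernel code. It assumes 4K base pages and 2M huge pages,
and every sketch_* / SKETCH_* name is hypothetical, made up for this example.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages and 2 MiB huge pages (typical x86-64). */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_HPAGE_SHIFT  21
#define SKETCH_HPAGE_NR     (1UL << (SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT))

/* Generic THP path, as described above: each huge mapping is counted
 * as the number of base pages it covers. */
static inline unsigned long sketch_thp_units(unsigned long nr_huge)
{
	return nr_huge * SKETCH_HPAGE_NR;
}

/* hugetlb path, as described above: each huge pte is counted once. */
static inline unsigned long sketch_hugetlb_units(unsigned long nr_huge)
{
	return nr_huge;
}

/* What either path would report if the unit were always PAGE_SIZE. */
static inline unsigned long sketch_base_pages(unsigned long nr_huge)
{
	return nr_huge << (SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT);
}

/* Sanity check of the assumed geometry: 2 MiB / 4 KiB == 512. */
static inline void sketch_self_check(void)
{
	assert(SKETCH_HPAGE_NR == 512);
}
```

Under these assumptions, the same 2M of address space contributes 512 to
the counter on one path and 1 on the other, which is the kind of
inconsistency a single PAGE_SIZE unit would avoid.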