From: Ryan Roberts
Date: Fri, 12 Jan 2024 10:00:51 +0000
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
To: John Hubbard, Andrew Morton, Zenghui Yu, Matthew Wilcox,
 David Hildenbrand, Kefeng Wang, Zi Yan, Barry Song <21cnbao@gmail.com>,
 Alistair Popple, William Kucharski
Cc: linux-mm@kvack.org, Barry Song
Message-ID: <22905bf7-570f-41a9-8dd0-b8a250c97de3@arm.com>
References: <20240110173203.3419437-1-ryan.roberts@arm.com>
 <33341ca8-1354-4f3f-b377-0b7d04da48d0@nvidia.com>
 <43230798-af22-4f59-b37c-8257bae32af8@arm.com>

On 11/01/2024 18:17, John Hubbard wrote:
> On 1/11/24 03:54, Ryan Roberts wrote:
> ...
>> I'm not sure exactly what you are asking. The "cont" counters are counting
>> blocks of contiguous, naturally aligned physical memory, which are also mapped
>> contiguously and aligned. So a smaller --cont would always include all the
>> memory captured in a larger --cont. In this case, it's all the *file-backed*
>> memory (as highlighted in the label name) so nothing to do with (m)THP. But where
>> you have THP, --cont doesn't care what the underlying THP size is as long as its
>> requirements are met, so PMD-sized THPs would be included in e.g.
>> *anon*-cont-aligned-128kB.
>>
>> Note that the "--cont" counters don't directly count memory that is PTE-mapped
>> with the contiguous bit set in the page table; they just count memory that meets
>> the alignment, size and mapping requirements.
>> On arm64 systems with the contpte series, the contiguous bit would be used
>> here, but it's not a part of what's getting measured.
>>
>
> The "cont" and "naturally aligned" terms are difficult here, even though
> I'm familiar with the implementation. But putting on my systems
> monitoring hat, these terms are not helping people as much as I'd like,
> because:
>
> a) "Contiguous" is not really a unique situation, so measuring large pages
>    that are "contiguous" is confusing. All folios are contiguous, and
>    anything a pte points to is contiguous as well. So --cont really
>    throws off the user/reader.
>
> b) "Naturally aligned" is also tricky, because "natural" is not explained.
>    Here it means NAPOT (naturally aligned power of two, I saw that in the
>    riscv docs).
>
> After spending a day or two exploring running systems with this, I'd
> like to suggest:
>
> 1) measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
> of information: mTHP is configured as expected, and is helping or not,
> etc.

There is a difference between how a THP is mapped (PTE vs PMD) and its size. A
PMD-sized THP can still be mapped with PTEs. So I'd rather not completely filter
out PMD-sized THPs, if that's your suggestion. But we could make a distinction
between THPs mapped by PTE and those mapped by PMD; the kernel interface doesn't
directly give us this, but we can infer it from the AnonHugePages and *PmdMapped
stats in smaps.

>
> 2) Not having to list out all the mTHP sizes would be nice. Instead,
> just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
> unless the user specifies sizes.

This is exactly what the tool already does. Perhaps you haven't fully understood
the counters that it outputs? You *always* get the following counters (although
note the tool *hides* all counters whose value is 0 by default - show them with
--inc-empty).
This example is for a system with 4K base pages:

# thpmaps --pid 1 --summary --inc-empty
anon-thp-aligned-16kB:
anon-thp-aligned-32kB:
anon-thp-aligned-64kB:
anon-thp-aligned-128kB:
anon-thp-aligned-256kB:
anon-thp-aligned-512kB:
anon-thp-aligned-1024kB:
anon-thp-aligned-2048kB:
anon-thp-unaligned-16kB:
anon-thp-unaligned-32kB:
anon-thp-unaligned-64kB:
anon-thp-unaligned-128kB:
anon-thp-unaligned-256kB:
anon-thp-unaligned-512kB:
anon-thp-unaligned-1024kB:
anon-thp-unaligned-2048kB:
anon-thp-partial:
file-thp-aligned-16kB:
file-thp-aligned-32kB:
file-thp-aligned-64kB:
file-thp-aligned-128kB:
file-thp-aligned-256kB:
file-thp-aligned-512kB:
file-thp-aligned-1024kB:
file-thp-aligned-2048kB:
file-thp-unaligned-16kB:
file-thp-unaligned-32kB:
file-thp-unaligned-64kB:
file-thp-unaligned-128kB:
file-thp-unaligned-256kB:
file-thp-unaligned-512kB:
file-thp-unaligned-1024kB:
file-thp-unaligned-2048kB:
file-thp-partial:

So you have counters for every supported THP size in the system - they will be
different for a 64K base page system.

anon vs file: hopefully obvious

aligned vs unaligned: In both cases the THP is mapped fully and contiguously. In
the aligned cases it is mapped so that it is naturally aligned. So a 16K THP is
mapped into VA space on a 16K boundary, a 32K THP on a 32K boundary, etc.

partial: Parts of THPs that are partially mapped into VA space.

Note this does not draw a distinction between PMD-mapped and PTE-mapped THPs.
But a THP can only be PMD-mapped if it is both PMD-aligned and PMD-sized. So
only 2 counters can include PMD-mappings: anon-thp-aligned-2048kB and
file-thp-aligned-2048kB. We can filter that out by subtracting the relevant
smaps counters from them. I could add a --ignore-pmd-mapped flag to do that? Or
I could rename all the existing counters to include "pte" and introduce 2 new
counters: anon-thp-aligned-pmd-2048kB and file-thp-aligned-pmd-2048kB?

The --cont option will add *additional* special counters, if specified.
The idea here is to provide a view on what percentage of memory is getting
contpte-mapped. So if you provide "--cont 64K" it will give you a counter showing
how much memory is in 64K, naturally aligned blocks (actually 2 counters; file
and anon). Those blocks can come from fully mapped and aligned 64K THPs. But they
can also come from bigger THPs - for example, if a 128K THP is aligned on a 64K
boundary (but not a 128K boundary), then it will provide 2 64K cont blocks, but
it will be counted as unaligned in anon-thp-unaligned-128kB. Or if a 2M THP is
partially mapped so that only its first 1M is mapped and aligned on a 64K
boundary, then it will be counted in the *-thp-partial counter and would add 1M
to the *-cont-aligned-64kB counter.

Sorry if I've labored the point here. But I think the only thing the tool doesn't
already do that you are asking for is to differentiate PTE- vs PMD- mappings?

>
> ...
>>>>                          (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>>                          /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>>                          of --pid and --cgroup must be provided.
>>>
>>> Maybe we could add "--global" to that list. That would look, in order,
>>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>>
>> I think actually it might be better just to make global the default when neither
>> --pid nor --cgroup are provided? And in this case, I'll just grab all the pids
>> from /proc rather than traverse the cgroup hierarchy, that way it will work on
>> systems without cgroups. Does that work for you?
>
> Yes! That was my initial idea, in fact, and after over-thinking it for
> a while, it turned into the above. haha :)

OK great - implemented for v3.

>
>
> thanks,
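
To make the counter and --cont accounting described above concrete, here is a
minimal Python sketch. It is not the thpmaps implementation: the function names
(thp_counter, cont_bytes), the (vfn, pfn, nr_pages) model of a mapped run and
the 4K base page size are all assumptions made purely for illustration. It
reproduces the two examples given above: a 128K THP mapped on a 64K (but not
128K) boundary, and a 2M THP with only its first 1M mapped on a 64K boundary.

```python
PAGE_SIZE = 4096  # assumption: 4K base pages, as in the example output above


def thp_counter(kind, vfn, thp_pages, mapped_pages):
    """Name the counter a single THP mapping would land in (hypothetical helper).

    kind is "anon" or "file"; vfn is the virtual page number where the mapping
    starts; thp_pages is the THP size in pages; mapped_pages is how many of
    those pages are actually mapped.
    """
    if mapped_pages < thp_pages:
        return f"{kind}-thp-partial"
    size_kb = thp_pages * PAGE_SIZE // 1024
    # A fully mapped THP is "aligned" when its VA is a multiple of its size.
    align = "aligned" if vfn % thp_pages == 0 else "unaligned"
    return f"{kind}-thp-{align}-{size_kb}kB"


def cont_bytes(vfn, pfn, nr_pages, cont_size):
    """Bytes of a mapped run that sit in naturally aligned cont_size blocks.

    The run maps nr_pages virtually contiguous pages starting at virtual page
    number vfn onto physically contiguous frames starting at pfn. A block is
    counted only if it lies fully inside the run and both its virtual and
    physical start are multiples of cont_size.
    """
    n = cont_size // PAGE_SIZE  # pages per block
    first = -vfn % n            # offset of the first virtually aligned page
    total = 0
    for i in range(first, nr_pages - n + 1, n):
        if (pfn + i) % n == 0:
            total += cont_size
    return total


if __name__ == "__main__":
    K = 1024
    # 128K THP (32 pages) mapped at a 64K boundary that is not a 128K boundary.
    print(thp_counter("anon", 16, 32, 32), cont_bytes(16, 64, 32, 64 * K))
    # 2M THP (512 pages) with only its first 1M (256 pages) mapped.
    print(thp_counter("anon", 16, 512, 256), cont_bytes(16, 512, 256, 64 * K))
```

Under these assumptions the first case is named anon-thp-unaligned-128kB and
contributes two 64K blocks, while the second is named anon-thp-partial and
contributes 1M to the 64K cont total, matching the description above.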