Date: Thu, 11 Jan 2024 17:32:39 +0000
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
From: Ryan Roberts
To: John Hubbard, Andrew Morton, Zenghui Yu, Matthew Wilcox, David Hildenbrand,
    Kefeng Wang, Zi Yan, Barry Song <21cnbao@gmail.com>, Alistair Popple,
    William Kucharski
Cc: linux-mm@kvack.org, Barry Song
References: <20240110173203.3419437-1-ryan.roberts@arm.com>
    <33341ca8-1354-4f3f-b377-0b7d04da48d0@nvidia.com>
    <43230798-af22-4f59-b37c-8257bae32af8@arm.com>
In-Reply-To: <43230798-af22-4f59-b37c-8257bae32af8@arm.com>

On 11/01/2024 11:54, Ryan Roberts wrote:
> On 10/01/2024 23:21, John Hubbard wrote:
>> On 1/10/24 09:32, Ryan Roberts wrote:
>> ...
>>> options:
>>>   -h, --help           show this help message and exit
>>>   --pid pid            Process id of the target process. Exactly one
>>>                        of --pid and --cgroup must be provided.
>>>   --cgroup path        Path to the target cgroup in sysfs. Iterates
>>>                        over every pid in the cgroup and its children.
>>>                        Get global stats by passing in the root cgroup
>>
>> Hi Ryan,
>>
>> Yes, this version is fairly effective at getting global stats now.
>>
>> I've got some proposed minor tweaks below, and a few questions. Let me
>> start with the questions:
>>
>> 1) When I run this on an older 6.4.8-based kernel:
>>
>>     # ./thpmaps --cgroup /sys/fs/cgroup --cont 128K --cont 512K --cont 1M \
>>             --cont 2M --cont 512M --summary
>>
>> , I get this output:
>>
>> file-thp-aligned-524288kB:      36175872 kB (95%)
>> file-thp-partial:                 856640 kB ( 2%)
>> file-cont-aligned-128kB:        37032320 kB (97%)
>> file-cont-aligned-512kB:        36597760 kB (96%)
>> file-cont-aligned-1024kB:       36597760 kB (96%)
>> file-cont-aligned-2048kB:       36595712 kB (96%)
>> file-cont-aligned-524288kB:     36175872 kB (95%)
>>
>>
>> Is it true that the above is basically "normal" 512MB THP in action?
>
> No: the "file" part of the counter name means it is file (not anon). So this is
> not mTHP, which would always be anon (e.g. "anon-cont-aligned-128kB"). Based on
> your follow-up mail, I would guess this is mostly hugetlb memory rather than
> actual page cache memory, but they are both getting lumped into those "file" labels.
>
>> And all of the "cont" entries are just that way because we can't
>> really tell mTHP/cont apart from normal THP?
>
> I'm not sure exactly what you are asking. The "cont" counters are counting
> blocks of contiguous, naturally aligned physical memory, which are also mapped
> contiguously and aligned. So a smaller --cont would always include all the
> memory captured in a larger --cont. In this case, it's all the *file-backed*
> memory (as highlighted in the label name) so nothing to do with (m)THP. But where
> you have THP, --cont doesn't care what the underlying THP size is as long as its
> requirements are met, so PMD-sized THPs would be included in e.g.
> *anon*-cont-aligned-128kB.
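A minimal Python sketch of the "cont" rule described above follows. It is not the thpmaps code: it assumes the caller has already built a dict mapping each present virtual page number to its physical frame number (for instance by decoding /proc/<pid>/pagemap), and cont_aligned_bytes() is a hypothetical helper. A block is counted only if its virtual start is aligned to the block size, its first PFN is aligned to the same size, and every page in it is present and physically contiguous.

```python
PAGE_SIZE = 4096  # assumption: 4K base pages; pass page_size=65536 for a 64K kernel


def cont_aligned_bytes(mapping, cont_size, page_size=PAGE_SIZE):
    """Return the bytes held in naturally aligned, physically contiguous blocks.

    mapping:   dict of virtual page number -> physical frame number (present pages only)
    cont_size: block size in bytes; must be a power-of-two multiple of page_size
    """
    nr = cont_size // page_size
    if nr < 1 or cont_size % page_size or nr & (nr - 1):
        raise ValueError("cont_size must be a power-of-two multiple of page_size")

    total = 0
    # Candidate blocks: the aligned virtual block containing each present page.
    for start in sorted({vpn - vpn % nr for vpn in mapping}):
        first_pfn = mapping.get(start)
        # The physical start must be aligned to the block size too.
        if first_pfn is None or first_pfn % nr:
            continue
        # Every page must be present and map to the next physical frame.
        if all(mapping.get(start + i) == first_pfn + i for i in range(nr)):
            total += nr * page_size
    return total


if __name__ == "__main__":
    m = {vpn: 1024 + vpn for vpn in range(16)}      # one aligned 64K block
    m.update({16 + i: 2000 + i for i in range(4)})  # plus one aligned 16K block
    print(cont_aligned_bytes(m, 64 * 1024))  # 65536
    print(cont_aligned_bytes(m, 16 * 1024))  # 81920
```

Because every naturally aligned sub-block of a qualifying block also qualifies, the total for a smaller cont_size can never be lower than the total for a larger one, which is why a smaller --cont always includes the memory counted by a larger one.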
>
> Note that the "--cont" counters don't directly count memory that is PTE-mapped
> with the contiguous bit set in the page table; they just count memory that meets
> the alignment, size and mapping requirements. On arm64 systems with the contpte
> series, the contiguous bit would be used here, but it's not part of what's
> getting measured.
>
>>
>> 2) On an mTHP kernel with the latest patchsets (arm64, 64K page size), I
>> *think* I cannot turn off mTHP. I'm still teasing apart how much of this
>> is an instrumentation error, and how much is a measurement problem (with
>> the test suite). And maybe I'm wrong entirely. But the "never" option
>> doesn't seem to have an effect. Unless the latest version of the testsuite
>> is doing something new, sigh.
>>
>> $ for f in $(find /sys/kernel/mm/transparent_hugepage/ -name enabled); do echo "$f: $(cat $f)"; done
>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/enabled: always madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-262144kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-32768kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-16384kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-524288kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-8192kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-65536kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-131072kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-4096kB/enabled: always inherit madvise [never]
>>
>> Any quick thoughts? Don't waste any time on this, it's probably
>> operator error. Just in case, though.
>
> As per your email, you're looking at hugetlb memory (as per counter label).
>
> I have all the information to create a hugetlb-specific set of counters, so it's
> not lumped in with page cache memory. You would then have counter sets of
> "anon", "file" and "htlb". Would that be useful?

Or I could just filter out hugetlb memory so it doesn't appear in this tool at
all? That would be easier implementation-wise, and probably more in line with
the original intention of the tool (it's called thpmaps, after all).

>
>>
>>
>>>                        (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>                        /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>                        of --pid and --cgroup must be provided.
>>
>> Maybe we could add "--global" to that list. That would look, in order,
>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>
> I think actually it might be better just to make global the default when neither
> --pid nor --cgroup is provided? And in this case, I'll just grab all the pids
> from /proc rather than traverse the cgroup hierarchy; that way it will work on
> systems without cgroups. Does that work for you?
>
>>
>> It's nicer than failing out. And it's also directly useful. I would be
>> running my above command like this, instead:
>>
>> # ./thpmaps --global --cont 128K --cont 512K --cont 1M \
>>             --cont 2M --cont 512M --summary
>>
>> thanks,
>
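The two proposals above (defaulting to a global scan of every pid under /proc when neither --pid nor --cgroup is given, and leaving hugetlb memory out of the tool) could combine roughly as in the following Python sketch. It is not taken from thpmaps, and all function names are hypothetical. It assumes hugetlb VMAs can be recognised by the "ht" flag on the "VmFlags:" line of /proc/<pid>/smaps, as listed in the kernel's proc documentation, and it simply skips processes that exit or cannot be read during the scan.

```python
import os


def all_pids(proc_root="/proc"):
    """Global mode: treat every numeric directory under proc_root as a pid."""
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())


def _parse_range(token):
    """Parse an smaps VMA header token such as '00400000-00600000'."""
    lo, sep, hi = token.partition("-")
    if not sep:
        return None
    try:
        return int(lo, 16), int(hi, 16)
    except ValueError:
        return None


def non_hugetlb_vmas(smaps_text):
    """Return (start, end) for each VMA whose VmFlags line lacks 'ht'."""
    vmas = []
    current = None
    for line in smaps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        rng = _parse_range(fields[0])
        if rng is not None:
            # New VMA header: "start-end perms offset dev inode [path]".
            current = [rng[0], rng[1], False]
            vmas.append(current)
        elif fields[0] == "VmFlags:" and current is not None:
            # 'ht' marks an area backed by hugetlb pages.
            current[2] = "ht" in fields[1:]
    return [(start, end) for start, end, is_hugetlb in vmas if not is_hugetlb]


def global_non_hugetlb_vmas(proc_root="/proc"):
    """Map each readable pid to its non-hugetlb VMAs."""
    result = {}
    for pid in all_pids(proc_root):
        path = os.path.join(proc_root, str(pid), "smaps")
        try:
            with open(path) as f:
                result[pid] = non_hugetlb_vmas(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited, or we lack permission to read it: skip it.
            continue
    return result


if __name__ == "__main__":
    for pid, vmas in global_non_hugetlb_vmas().items():
        print(pid, len(vmas))
```

Filtering at VMA granularity keeps the check cheap; a per-page alternative would be to test the HUGE bit in /proc/kpageflags for each mapped frame.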