From: Ryan Roberts
To: John Hubbard, Andrew Morton, Zenghui Yu, Matthew Wilcox, David Hildenbrand, Kefeng Wang, Zi Yan, Barry Song <21cnbao@gmail.com>, Alistair Popple, William Kucharski
Cc: linux-mm@kvack.org, Barry Song
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
Date: Thu, 11 Jan 2024 11:54:11 +0000
Message-ID: <43230798-af22-4f59-b37c-8257bae32af8@arm.com>
In-Reply-To: <33341ca8-1354-4f3f-b377-0b7d04da48d0@nvidia.com>
References: <20240110173203.3419437-1-ryan.roberts@arm.com> <33341ca8-1354-4f3f-b377-0b7d04da48d0@nvidia.com>

On 10/01/2024 23:21, John Hubbard wrote:
> On 1/10/24 09:32, Ryan Roberts wrote:
> ...
>> options:
>>   -h, --help           show this help message and exit
>>   --pid pid            Process id of the target process. Exactly one
>>                        of --pid and --cgroup must be provided.
>>   --cgroup path        Path to the target cgroup in sysfs. Iterates
>>                        over every pid in the cgroup and its children.
>>                        Get global stats by passing in the root cgroup
>
> Hi Ryan,
>
> Yes, this version is fairly effective at getting global stats now.
>
> I've got some proposed minor tweaks below, and a few questions. Let me
> start with the questions:
>
> 1) When I run this on an older 6.4.8-based kernel:
>
>     # ./thpmaps --cgroup /sys/fs/cgroup  --cont 128K --cont 512K --cont 1M \
>             --cont 2M --cont 512M --summary
>
> , I get this output:
>
> file-thp-aligned-524288kB:      36175872 kB (95%)
> file-thp-partial:                 856640 kB ( 2%)
> file-cont-aligned-128kB:        37032320 kB (97%)
> file-cont-aligned-512kB:        36597760 kB (96%)
> file-cont-aligned-1024kB:       36597760 kB (96%)
> file-cont-aligned-2048kB:       36595712 kB (96%)
> file-cont-aligned-524288kB:     36175872 kB (95%)
>
>
> Is it true that the above is basically "normal" 512MB THP in action?

No: the "file" part of the counter name means it is file-backed (not anon) memory. So this is not mTHP, which would always be anon (e.g. "anon-cont-aligned-128kB"). Based on your follow-up mail, I would guess this is mostly hugetlb memory rather than actual page cache memory, but both are getting lumped into the "file" counters.

> And all of the "cont" entries are just that way because we can't
> really tell mTHP/cont apart from normal THP?

I'm not sure exactly what you are asking. The "cont" counters count blocks of contiguous, naturally aligned physical memory that are also mapped contiguously and aligned. So a smaller --cont will always include all the memory captured by a larger --cont. In this case, it's all *file-backed* memory (as highlighted in the label name), so it has nothing to do with (m)THP. But where you do have THP, --cont doesn't care what the underlying THP size is as long as its requirements are met, so PMD-sized THPs would be included in e.g. *anon*-cont-aligned-128kB.

Note that the "--cont" counters don't directly count memory that is PTE-mapped with the contiguous bit set in the page table; they just count memory that meets the alignment, size and mapping requirements.
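To make that counting rule concrete, here is a simplified model in Python. It is only a sketch, not the code in thpmaps: the function name cont_aligned_bytes, the 4K base page size and the representation of a process's present mappings as a {virtual page number: PFN} dict are all assumptions for illustration. A block counts only if every page in it is mapped, it is naturally aligned in both virtual and physical address space, and the PFNs are consecutive.

```python
# Simplified model of the "--cont" counting rule; NOT thpmaps' actual code.
# Assumptions (hypothetical, for illustration only): 4K base pages, and a
# process's present mappings given as {virtual page number: PFN}.
PAGE_SIZE = 4096


def cont_aligned_bytes(mapping, block_size, page_size=PAGE_SIZE):
    """Bytes covered by blocks of block_size that are fully mapped,
    naturally aligned in both virtual and physical address space, and
    physically contiguous."""
    npages = block_size // page_size
    total = 0
    # Every candidate block starts at a naturally aligned virtual page number.
    for vstart in sorted({vpn - vpn % npages for vpn in mapping}):
        pfn0 = mapping.get(vstart)
        if pfn0 is None or pfn0 % npages:
            continue  # first page unmapped, or physically misaligned
        # All pages must be present and map to consecutive PFNs.
        if all(mapping.get(vstart + i) == pfn0 + i for i in range(npages)):
            total += block_size
    return total


if __name__ == "__main__":
    # One 128K region: vpn 0x100.. mapped to pfn 0x2000.. (both 128K-aligned).
    region = {0x100 + i: 0x2000 + i for i in range(32)}
    for kb in (64, 128, 512):
        print(f"cont-aligned-{kb}kB: {cont_aligned_bytes(region, kb * 1024) // 1024} kB")
```

With that region, the 64K and 128K counters both report 128 kB and the 512K counter reports 0 kB: any block that satisfies a larger size is made of smaller blocks that also satisfy the rule, which is why a smaller --cont always includes what a larger one captures.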
On arm64 systems with the contpte series, the contiguous bit would be used for such memory, but it's not part of what's getting measured.

>
> 2) On an mTHP kernel with the latest patchsets (arm64, 64K page size), I
> *think* I cannot turn off mTHP. I'm still teasing apart how much of this
> is an instrumentation error, and how much is a measurement problem (with
> the test suite). And maybe I'm wrong entirely. But the "never" option
> doesn't seem to have an effect. Unless the latest version of the testsuite
> is doing something new, sigh.
>
> $ for f in $(find /sys/kernel/mm/transparent_hugepage/ -name enabled); do echo "$f: $(cat $f)"; done
> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/enabled: always madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-262144kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-32768kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-16384kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-524288kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-8192kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-65536kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-131072kB/enabled: always inherit madvise [never]
> /sys/kernel/mm/transparent_hugepage/hugepages-4096kB/enabled: always inherit madvise [never]
>
> Any quick thoughts? Don't waste any time on this, it's probably
> operator error. Just in case, though.
As per your email, you're looking at hugetlb memory (as per the counter label). I have all the information needed to create a hugetlb-specific set of counters, so that it's not lumped in with page cache memory. You would then have counter sets of "anon", "file" and "htlb". Would that be useful?

>
>
>>                        (e.g. /sys/fs/cgroup for cgroup-v2 or
>>                        /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>                        of --pid and --cgroup must be provided.
>
> Maybe we could add "--global" to that list. That would look, in order,
> inside cgroups2 and cgroups, for a list of pids, and then run as if
> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.

I think it might actually be better just to make global the default when neither --pid nor --cgroup is provided? In that case, I'll just grab all the pids from /proc rather than traverse the cgroup hierarchy; that way it will work on systems without cgroups. Does that work for you?

>
> It's nicer than failing out. And it's also directly useful. I would be
> running my above command like this, instead:
>
> # ./thpmaps --global  --cont 128K --cont 512K --cont 1M \
>             --cont 2M --cont 512M --summary
>
> thanks,
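For the proposal above of making global the default when neither --pid nor --cgroup is given, a rough Python sketch of the target selection could look like the following. It is not the actual patch: the function names are hypothetical, the argparse setup covers only these two options, and walking each cgroup directory's cgroup.procs file is an assumed way of implementing "every pid in the cgroup and its children".

```python
# Hypothetical sketch of "global by default" target selection; the function
# names are illustrative only and are not part of thpmaps.
import argparse
import os


def pids_from_proc_names(names):
    """Keep only the numeric /proc entries, which are the visible pids."""
    return sorted(int(name) for name in names if name.isdigit())


def list_global_pids(proc_root="/proc"):
    """Global mode: every pid in /proc; works without cgroups."""
    return pids_from_proc_names(os.listdir(proc_root))


def cgroup_pids(path):
    """Every pid in the cgroup at path and in all of its child cgroups
    (assumption: read from each directory's cgroup.procs file)."""
    pids = set()
    for dirpath, _dirnames, filenames in os.walk(path):
        if "cgroup.procs" in filenames:
            with open(os.path.join(dirpath, "cgroup.procs")) as f:
                pids.update(int(line) for line in f if line.strip())
    return sorted(pids)


def parse_target(argv):
    """Return ("pid", n), ("cgroup", path) or ("global", None)."""
    parser = argparse.ArgumentParser(prog="thpmaps")
    # At most one of --pid/--cgroup may be given; neither means global.
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument("--pid", type=int)
    group.add_argument("--cgroup")
    args = parser.parse_args(argv)
    if args.pid is not None:
        return ("pid", args.pid)
    if args.cgroup is not None:
        return ("cgroup", args.cgroup)
    return ("global", None)


def target_pids(argv):
    """Resolve the command line to the list of pids to scan."""
    kind, value = parse_target(argv)
    if kind == "pid":
        return [value]
    if kind == "cgroup":
        return cgroup_pids(value)
    return list_global_pids()
```

Using a non-required mutually exclusive group keeps the existing "at most one of --pid and --cgroup" check while letting the empty case fall through to /proc, so no extra --global option is needed.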