Message-ID: <82dcfef7-7323-482e-8a27-98530570688e@arm.com>
Date: Tue, 16 Jan 2024 08:53:37 +0000
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
To: John Hubbard, Andrew Morton, Zenghui Yu, Matthew Wilcox,
 David Hildenbrand, Kefeng Wang, Zi Yan, Barry Song <21cnbao@gmail.com>,
 Alistair Popple, William Kucharski
Cc: linux-mm@kvack.org, Barry Song
References: <20240110173203.3419437-1-ryan.roberts@arm.com>
 <33341ca8-1354-4f3f-b377-0b7d04da48d0@nvidia.com>
 <43230798-af22-4f59-b37c-8257bae32af8@arm.com>
 <22905bf7-570f-41a9-8dd0-b8a250c97de3@arm.com>
 <0f5b9444-fd79-49f0-b9d8-f5e04c044696@nvidia.com>
 <64f4fc88-b591-4a76-9a9f-3971225d0fa7@arm.com>
 <9acb1684-7c5a-41c4-9a23-edad73e55585@arm.com>
From: Ryan Roberts

On 15/01/2024 21:30, John Hubbard wrote:
> On 1/15/24 07:56, Ryan Roberts wrote:
> ...
>>> But yes, let me work up some improved documentation and send it out for your
>>> review. The reason it's a bit terse at the moment is that I'm using Python's
>>> ArgumentParser for the documentation, and it removes all line breaks from the
>>> description which makes it hard to format longer form docs. Anyway, that's a bad
>>> excuse for bad docs so I'll figure out a solution.
>>
>> Here is my proposed documentation.
>> If you could take a look and let me know if it makes sense, then I'll
>> modify the tool to conform:
>>
>
> Looks great. One typo fix and a note, below.
>
>> --8<--
>>
>> $ ./thpmaps --help
>>
>> usage: thpmaps [-h] [--pid pid | --cgroup path] [--rollup] [--cont size[KMG]]
>>                 [--inc-smaps] [--inc-empty] [--periodic sleep_ms]
>>
>> Prints information about how transparent huge pages are mapped, either
>> system-wide, or for a specified process or cgroup.
>>
>> A default set of statistics is always generated for THP mappings. However, it is
>
> The way this is done is sufficiently interesting to the sysadmin to say a
> few words about it. Something along these lines, approximately:
>
>  -----
>  When run without options, cgroups v1 or v2 (depending on what is active
>  on the system) is used in order to get a listing of all user space pids.
>  That pid list is passed into the core script, as if the user had provided
>  "--pids pid1 pid2 ...".
>  -----

Agree with the sentiment; I'll add something similar. Although, I'm no longer
using cgroups to get all the pids - I'm grabbing them from /proc.

--8<--
When run with --pid, the user explicitly specifies the set of pids to scan,
e.g. "--pid 10 [--pid 134 ...]". When run with --cgroup, the user passes either
a v1 or v2 cgroup and all pids that belong to the cgroup subtree are scanned.
When run with neither --pid nor --cgroup, the full set of pids on the system is
gathered from /proc and scanned as if the user had provided
"--pid 1 --pid 2 ...".
--8<--

>
> This reminds me that maybe a --pids option is helpful, what do you think?

How about I allow --pid to be specified multiple times? That will make the
parsing easier (and be consistent with the way it works for --cont):

--pid 1 --pid 2 --pid 3 ...

>
>
>> also possible to generate additional statistics for "contiguous block mappings"
>> where the block size is user-defined.
>>
>> Statistics are maintained independently for anonymous and file-backed
>> (pagecache) memory and are shown both in kB and as a percentage of either total
>> anonymous or total file-backed memory as appropriate.
>>
>> THP Statistics
>> --------------
>>
>> Statistics are always generated for fully- and contiguously-mapped THPs whose
>> mapping address is aligned to their size, for each <size> supported by the
>> system. Separate counters describe THPs mapped by PTE vs those mapped by PMD.
>> (Although note a THP can only be mapped by PMD if it is PMD-sized):
>>
>> - anon-thp-pte-aligned-<size>kB
>> - file-thp-pte-aligned-<size>kB
>> - anon-thp-pmd-aligned-<size>kB
>> - file-thp-pmd-aligned-<size>kB
>>
>> Similarly, statistics are always generated for fully- and contiguously-mapped
>> THPs whose mapping address is *not* aligned to their size, for each <size>
>> supported by the system. Due to the unaligned mapping, it is impossible to map
>> by PMD, so there are only PTE counters for this case:
>>
>> - anon-thp-pte-unaligned-<size>kB
>> - file-thp-pte-unaligned-<size>kB
>>
>> Statistics are also always generated for mapped pages that belong to a THP but
>> where the THP is *not* fully- and contiguously-mapped. These "partial"
>> mappings are all counted in the same counter regardless of the size of the THP
>> that is partially mapped:
>>
>> - anon-thp-pte-partial
>> - file-thp-pte-partial
>>
>> Contiguous Block Statistics
>> ---------------------------
>>
>> An optional, additional set of statistics is generated for every contiguous
>> block size specified with `--cont <size>`. These statistics show how much memory
>> is mapped in contiguous blocks of <size> and also aligned to <size>. A given
>> contiguous block must all belong to the same THP, but there is no requirement
>> for it to be the *whole* THP.
>> Separate counters describe contiguous blocks
>> mapped by PTE vs those mapped by PMD:
>>
>> - anon-cont-pte-aligned-<size>kB
>> - file-cont-pte-aligned-<size>kB
>> - anon-cont-pmd-aligned-<size>kB
>> - file-cont-pmd-aligned-<size>kB
>>
>> As an example, if montiroing 64K contiguous blocks (--cont 64K), there are a
>
> typo: "monitoring"
>
>> number of sources that could provide such blocks: a fully- and
>> contiguously-mapped 64K THP that is aligned to a 64K boundary would provide
>> 1 block. A fully- and contiguously-mapped 128K THP that is aligned to at least
>> a 64K boundary would provide 2 blocks. Or a 128K THP that maps its first 100K,
>> but contiguously and starting at a 64K boundary would provide 1 block. A fully-
>> and contiguously-mapped 2M THP would provide 32 blocks. There are many other
>> possible permutations.
>>
>> optional arguments:
>>    -h, --help           show this help message and exit
>>    --pid pid            Process id of the target process. --pid and --cgroup are
>>                         mutually exclusive. If neither are provided, all
>>                         processes are scanned to provide system-wide information.
>>    --cgroup path        Path to the target cgroup in sysfs. Iterates over every
>>                         pid in the cgroup and its children. --pid and --cgroup
>>                         are mutually exclusive. If neither are provided, all
>>                         processes are scanned to provide system-wide information.
>>    --rollup             Sum the per-vma statistics to provide a summary over the
>>                         whole system, process or cgroup.
>>    --cont size[KMG]     Adds stats for memory that is mapped in contiguous blocks
>>                         of <size> and also aligned to <size>. May be issued
>>                         multiple times to track multiple sized blocks. Useful to
>>                         infer e.g. arm64 contpte and hpa mappings. Size must be a
>>                         power-of-2 number of pages.
>>    --inc-smaps          Include all numerical, additive /proc/<pid>/smaps stats
>>                         in the output.
>>    --inc-empty          Show all statistics including those whose value is 0.
>>    --periodic sleep_ms  Run in a loop, polling every sleep_ms milliseconds.
>>
>> Requires root privilege to access pagemap and kpageflags.
>>
>> --8<--
>
> It's all looking much more understandable now, very nice.

Great - thanks for the review. I'll get this straightened out and post later
today.

>
> thanks,
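
For illustration only, here is a minimal sketch of how the option handling
discussed above could look. The thread confirms that ArgumentParser is used,
that --pid and --cgroup are mutually exclusive, that --pid and --cont may be
repeated, and that pids are gathered from /proc when neither target is given.
Everything else is an assumption: the helper names (parse_size, build_parser,
all_pids, aligned_blocks) are hypothetical and not taken from the patch, and
aligned_blocks ignores the "same THP" requirement to keep the 64K arithmetic
from the help text easy to check.

```python
import argparse
import os
import re

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # bytes per base page on this machine


def parse_size(text, page_size=PAGE_SIZE):
    """Hypothetical helper: turn 'size[KMG]' into bytes, enforcing the
    documented rule that the size is a power-of-2 number of pages."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text)
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid size: {text!r}")
    shift = {"": 0, "K": 10, "M": 20, "G": 30}[match.group(2)]
    size = int(match.group(1)) << shift
    pages, remainder = divmod(size, page_size)
    if remainder or pages == 0 or pages & (pages - 1):
        raise argparse.ArgumentTypeError(
            f"{text!r} is not a power-of-2 number of pages")
    return size


def build_parser():
    """Assumed declaration of the options; only a subset is shown."""
    parser = argparse.ArgumentParser(prog="thpmaps")
    target = parser.add_mutually_exclusive_group()
    # action="append" lets the option repeat: --pid 1 --pid 2 --pid 3 ...
    target.add_argument("--pid", metavar="pid", type=int, action="append")
    target.add_argument("--cgroup", metavar="path")
    parser.add_argument("--cont", metavar="size[KMG]", type=parse_size,
                        action="append", default=[])
    return parser


def all_pids(proc="/proc"):
    """Every numeric entry under /proc names a process."""
    return sorted(int(name) for name in os.listdir(proc) if name.isdigit())


def aligned_blocks(start, length, block):
    """Count blocks of `block` bytes, aligned to `block`, that lie wholly
    inside the contiguously mapped range [start, start + length)."""
    first = -(-start // block) * block  # round start up to a block boundary
    return max(0, (start + length - first) // block)
```

With this sketch, "--pid 1 --pid 2" parses to the list [1, 2], combining --pid
with --cgroup is rejected by the parser, and a caller would fall back to
all_pids() when neither is given. aligned_blocks(0, 100 << 10, 64 << 10)
corresponds to the "128K THP that maps its first 100K" case in the help text.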