From: Ryan Roberts <ryan.roberts@arm.com>
To: John Hubbard <jhubbard@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Zenghui Yu <yuzenghui@huawei.com>,
Matthew Wilcox <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>, Zi Yan <ziy@nvidia.com>,
Barry Song <21cnbao@gmail.com>,
Alistair Popple <apopple@nvidia.com>,
William Kucharski <william.kucharski@oracle.com>
Cc: linux-mm@kvack.org, Barry Song <v-songbaohua@oppo.com>
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
Date: Tue, 16 Jan 2024 08:53:37 +0000
Message-ID: <82dcfef7-7323-482e-8a27-98530570688e@arm.com>
In-Reply-To: <f23e6f32-397e-4821-a0fe-f2a0bb6e2fe0@nvidia.com>
On 15/01/2024 21:30, John Hubbard wrote:
> On 1/15/24 07:56, Ryan Roberts wrote:
> ...
>>> But yes, let me work up some improved documentation and send it out for your
>>> review. The reason it's a bit terse at the moment is that I'm using Python's
>>> ArgumentParser for the documentation, and it removes all line breaks from the
>>> description which makes it hard to format longer form docs. Anyway, that's a bad
>>> excuse for bad docs so I'll figure out a solution.
>>
>> Here is my proposed documentation. If you could take a look and let me know if
>> it makes sense, then I'll modify the tool to conform:
>>
>
> Looks great. One typo fix and a note, below.
>
>> --8<--
>>
>> $ ./thpmaps --help
>>
>> usage: thpmaps [-h] [--pid pid | --cgroup path] [--rollup] [--cont size[KMG]]
>>                [--inc-smaps] [--inc-empty] [--periodic sleep_ms]
>>
>> Prints information about how transparent huge pages are mapped, either system-
>> wide, or for a specified process or cgroup.
>>
>> A default set of statistics is always generated for THP mappings. However, it is
>
> The way this is done is sufficiently interesting to the sysadmin to say a
> few words about it. Something along these lines, approximately:
>
> -----
> When run without options, cgroups v1 or v2 (depending on what is active
> on the system) is used in order to get a listing of all user space pids.
> That pid list is passed into the core script, as if the user had provided
> "--pids pid1 pid2 ...".
> -----
Agree with the sentiment; I'll add something similar. Although, I'm no longer
using cgroups to get all the pids - I'm grabbing them from /proc.
--8<--
When run with --pid, the user explicitly specifies the set of pids to scan, e.g.
"--pid 10 [--pid 134 ...]". When run with --cgroup, the user passes either a v1
or v2 cgroup and all pids that belong to the cgroup subtree are scanned. When
run with neither --pid nor --cgroup, the full set of pids on the system is
gathered from /proc and scanned as if the user had provided "--pid 1 --pid 2 ...".
--8<--
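For illustration, the pid gathering described above might look roughly like the following Python sketch. The function names are hypothetical and not taken from the thpmaps script. It assumes, as holds for both cgroup v1 and v2, that every cgroup directory contains a cgroup.procs file listing one pid per line.

```python
import os

def numeric_names(names):
    """Keep only directory names that look like pids (all digits)."""
    return sorted(int(n) for n in names if n.isdigit())

def all_pids(proc_root="/proc"):
    """System-wide scan: every numeric directory under /proc is a pid."""
    return numeric_names(os.listdir(proc_root))

def parse_cgroup_procs(text):
    """Parse the contents of a cgroup.procs file: one pid per line."""
    return {int(line) for line in text.splitlines() if line.strip()}

def cgroup_pids(cgroup_path):
    """Collect pids from a cgroup and every descendant cgroup.

    Both v1 and v2 hierarchies provide cgroup.procs in each cgroup
    directory, so walking the subtree works for either version.
    """
    pids = set()
    for dirpath, _dirs, files in os.walk(cgroup_path):
        if "cgroup.procs" in files:
            with open(os.path.join(dirpath, "cgroup.procs")) as f:
                pids |= parse_cgroup_procs(f.read())
    return sorted(pids)
```

Splitting the parsing out of the filesystem walk keeps the pid-list logic testable without a real /proc or cgroup mount.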
>
> This reminds me that maybe a --pids option would be helpful; what do you think?
How about I allow --pid to be specified multiple times? That will make the
parsing easier (and be consistent with the way it works for --cont):
--pid 1 --pid 2 --pid 3 ...
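With Python's argparse, a repeatable --pid can be collected with action="append", and its exclusion with --cgroup expressed as a mutually exclusive group. A minimal sketch, not the script's actual parser:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="thpmaps")
    group = parser.add_mutually_exclusive_group()
    # action="append" accumulates each occurrence of --pid into a list.
    group.add_argument("--pid", metavar="pid", type=int, action="append")
    group.add_argument("--cgroup", metavar="path")
    return parser

args = build_parser().parse_args(["--pid", "1", "--pid", "2", "--pid", "3"])
print(args.pid)  # [1, 2, 3]
```

Repeating --pid does not trip the mutual-exclusion check, because argparse only treats the *other* members of the group as conflicts; combining --pid with --cgroup is still rejected.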
>
>
>> also possible to generate additional statistics for "contiguous block mappings"
>> where the block size is user-defined.
>>
>> Statistics are maintained independently for anonymous and file-backed
>> (pagecache) memory and are shown both in kB and as a percentage of either total
>> anonymous or total file-backed memory as appropriate.
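The kB-plus-percentage presentation amounts to dividing each counter by the relevant total. A hypothetical formatting helper, assuming the totals are already known in kB; the column layout is illustrative only:

```python
def format_stat(name, value_kb, total_kb):
    """Format a counter as kB and as a percentage of total_kb.

    total_kb should be total anonymous memory for anon-* counters and
    total file-backed memory for file-* counters (assumed inputs).
    """
    # Guard against an empty total (e.g. a process with no file mappings).
    pct = 100.0 * value_kb / total_kb if total_kb else 0.0
    return f"{name:<32} {value_kb:>12} kB {pct:>6.2f}%"

print(format_stat("anon-thp-pte-aligned-64kB", 512, 2048))
```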
>>
>> THP Statistics
>> --------------
>>
>> Statistics are always generated for fully- and contiguously-mapped THPs whose
>> mapping address is aligned to their size, for each <size> supported by the
>> system. Separate counters describe THPs mapped by PTE vs those mapped by PMD
>> (note that a THP can only be mapped by PMD if it is PMD-sized):
>>
>> - anon-thp-pte-aligned-<size>kB
>> - file-thp-pte-aligned-<size>kB
>> - anon-thp-pmd-aligned-<size>kB
>> - file-thp-pmd-aligned-<size>kB
>>
>> Similarly, statistics are always generated for fully- and contiguously-mapped
>> THPs whose mapping address is *not* aligned to their size, for each <size>
>> supported by the system. Due to the unaligned mapping, it is impossible to map
>> by PMD, so there are only PTE counters for this case:
>>
>> - anon-thp-pte-unaligned-<size>kB
>> - file-thp-pte-unaligned-<size>kB
>>
>> Statistics are also always generated for mapped pages that belong to a THP but
>> where the THP is *not* fully and contiguously mapped. These "partial"
>> mappings are all counted in the same counter regardless of the size of the THP
>> that is partially mapped:
>>
>> - anon-thp-pte-partial
>> - file-thp-pte-partial
>>
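The rules above for choosing a THP counter can be summarised as a small classification function. This is a sketch under stated assumptions: 4K base pages, and a hypothetical per-THP input listing the virtual address of each page in folio order (None where a page is not mapped). The real tool presumably derives this from pagemap and kpageflags, which the help text says it reads.

```python
PAGE_SIZE = 4096  # assumption: 4K base pages

def classify_thp(kind, vaddrs, pmd_mapped=False):
    """Return (counter_name, mapped_kB) for one THP as seen in one mapping.

    kind:       "anon" or "file"
    vaddrs:     hypothetical input, virtual address of each page of the THP
                in folio order, or None for pages not mapped here
    pmd_mapped: True if the whole THP is mapped by a single PMD entry
    """
    nr = len(vaddrs)
    thp_bytes = nr * PAGE_SIZE
    size_kb = thp_bytes // 1024
    mapped = [v for v in vaddrs if v is not None]
    mapped_kb = len(mapped) * PAGE_SIZE // 1024
    first = vaddrs[0]
    fully_contig = len(mapped) == nr and all(
        v == first + i * PAGE_SIZE for i, v in enumerate(vaddrs))
    if not fully_contig:
        # Partially mapped (or not mapped in order): one shared counter,
        # regardless of the THP's size.
        return f"{kind}-thp-pte-partial", mapped_kb
    if pmd_mapped:
        # Only possible for a PMD-sized THP at a PMD-aligned address.
        return f"{kind}-thp-pmd-aligned-{size_kb}kB", mapped_kb
    state = "aligned" if first % thp_bytes == 0 else "unaligned"
    return f"{kind}-thp-pte-{state}-{size_kb}kB", mapped_kb
```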
>> Contiguous Block Statistics
>> ---------------------------
>>
>> An optional, additional set of statistics is generated for every contiguous
>> block size specified with `--cont <size>`. These statistics show how much memory
>> is mapped in contiguous blocks of <size> and also aligned to <size>. A given
>> contiguous block must all belong to the same THP, but there is no requirement
>> for it to be the *whole* THP. Separate counters describe contiguous blocks
>> mapped by PTE vs those mapped by PMD:
>>
>> - anon-cont-pte-aligned-<size>kB
>> - file-cont-pte-aligned-<size>kB
>> - anon-cont-pmd-aligned-<size>kB
>> - file-cont-pmd-aligned-<size>kB
>>
>> As an example, if montiroing 64K contiguous blocks (--cont 64K), there are a
>
> typo: "monitoring"
>
>> number of sources that could provide such blocks: a fully- and contiguously-
>> mapped 64K THP that is aligned to a 64K boundary would provide 1 block. A fully-
>> and contiguously-mapped 128K THP that is aligned to at least a 64K boundary
>> would provide 2 blocks. Or a 128K THP that maps only its first 100K, contiguously
>> and starting at a 64K boundary, would provide 1 block. A fully- and contiguously-
>> mapped 2M THP would provide 32 blocks. There are many other possible
>> permutations.
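The block counts in this example follow from rounding the start of a contiguous run up, and its end down, to a block boundary. A sketch that considers virtual addresses only, assuming the caller has already split runs so that each lies within a single THP and is mapped contiguously (the function name is hypothetical):

```python
def count_cont_blocks(start, end, block):
    """Count block-sized, block-aligned chunks that fit inside [start, end).

    start/end: virtual addresses bounding a run within one THP that is
               mapped contiguously (assumption: caller guarantees this).
    block:     the --cont size in bytes.
    """
    first = -(-start // block) * block  # round start up to a block boundary
    last = (end // block) * block       # round end down to a block boundary
    return max(0, (last - first) // block)
```

With a 64K block, a 64K THP at a 64K boundary yields 1, a 128K THP at a 64K boundary yields 2, the first 100K of a 128K THP starting at a 64K boundary yields 1, and a 2M THP at a 2M boundary yields 32, matching the examples in the text.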
>>
>> optional arguments:
>>   -h, --help            show this help message and exit
>>   --pid pid             Process id of the target process. --pid and --cgroup are
>>                         mutually exclusive. If neither is provided, all
>>                         processes are scanned to provide system-wide information.
>>   --cgroup path         Path to the target cgroup in sysfs. Iterates over every
>>                         pid in the cgroup and its children. --pid and --cgroup
>>                         are mutually exclusive. If neither is provided, all
>>                         processes are scanned to provide system-wide information.
>>   --rollup              Sum the per-vma statistics to provide a summary over the
>>                         whole system, process or cgroup.
>>   --cont size[KMG]      Adds stats for memory that is mapped in contiguous blocks
>>                         of <size> and also aligned to <size>. May be issued
>>                         multiple times to track multiple sized blocks. Useful to
>>                         infer e.g. arm64 contpte and hpa mappings. Size must be a
>>                         power-of-2 number of pages.
>>   --inc-smaps           Include all numerical, additive /proc/<pid>/smaps stats
>>                         in the output.
>>   --inc-empty           Show all statistics including those whose value is 0.
>>   --periodic sleep_ms   Run in a loop, polling every sleep_ms milliseconds.
>>
>> Requires root privilege to access pagemap and kpageflags.
>>
>> --8<--
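One possible argparse type function for --cont, parsing size[KMG] and enforcing the power-of-2-pages rule, is sketched below. It assumes a fixed 4K base page for determinism; a real tool would want the system's page size (e.g. os.sysconf("SC_PAGE_SIZE")), since arm64 may also run with 16K or 64K pages. The names cont_size and UNITS are hypothetical.

```python
import argparse

PAGE_SIZE = 4096  # assumption: 4K base pages (see lead-in)
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def cont_size(arg):
    """argparse type for --cont: convert '64K', '2M', etc. to bytes."""
    text = arg.strip().upper()
    mult = 1
    if text and text[-1] in UNITS:
        mult = UNITS[text[-1]]
        text = text[:-1]
    try:
        size = int(text) * mult
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid size: {arg!r}") from None
    nr_pages = size // PAGE_SIZE
    # Reject sizes that are not a whole, power-of-2 number of pages.
    if size % PAGE_SIZE or nr_pages <= 0 or nr_pages & (nr_pages - 1):
        raise argparse.ArgumentTypeError(
            f"{arg!r} is not a power-of-2 number of pages")
    return size

parser = argparse.ArgumentParser(prog="thpmaps")
parser.add_argument("--cont", metavar="size[KMG]", type=cont_size,
                    action="append", default=[])
args = parser.parse_args(["--cont", "64K", "--cont", "2M"])
print(args.cont)  # [65536, 2097152]
```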
>
> It's all looking much more understandable now, very nice.
Great - thanks for the review. I'll get this straightened out and post later today.
>
> thanks,