linux-mm.kvack.org archive mirror
From: Ryan Roberts <ryan.roberts@arm.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>, Zi Yan <ziy@nvidia.com>,
	Barry Song <21cnbao@gmail.com>,
	Alistair Popple <apopple@nvidia.com>,
	William Kucharski <william.kucharski@oracle.com>
Cc: linux-mm@kvack.org, Barry Song <v-songbaohua@oppo.com>
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
Date: Fri, 12 Jan 2024 10:00:51 +0000
Message-ID: <22905bf7-570f-41a9-8dd0-b8a250c97de3@arm.com>
In-Reply-To: <ebe47d0f-35e3-4c49-8128-8e705b57adbe@nvidia.com>

On 11/01/2024 18:17, John Hubbard wrote:
> On 1/11/24 03:54, Ryan Roberts wrote:
> ...
>> I'm not sure exactly what you are asking. The "cont" counters are counting
>> blocks of contiguous, naturally aligned physical memory, which are also mapped
>> contiguously and aligned. So a smaller --cont would always include all the
>> memory captured in a larger --cont. In this case, it's all the *file-backed*
>> memory (as highlighted in the label name) so nothing to do with (m)THP. But where
>> you have THP, --cont doesn't care what the underlying THP size is as long as its
>> requirements are met, so PMD-sized THPs would be included in e.g.
>> *anon*-cont-aligned-128kB.
>>
>> Note that the "--cont" counters don't directly count memory that is PTE-mapped
>> with the contiguous bit set in the page table; it just counts memory that meets
>> the alignment, size and mapping requirements. On arm64 systems with the contpte
>> series, the contiguous bit would be used here, but it's not a part of what's
>> getting measured.
>>
> 
> The "cont" and "naturally aligned" terms are difficult here, even though
> I'm familiar with the implementation. But putting on my systems
> monitoring hat, these terms are not helping people as much as I'd like,
> because:
> 
> a) "Contiguous" is not really a unique situation, so measuring large pages
>    that are "contiguous" is confusing. All folios are contiguous, and
>    anything a pte points to is contiguous as well. So --cont really
>    throws off the user/reader.
> 
> b) "Naturally aligned" is also tricky. Because "natural" is not explained.
> Here it means NAPOT (naturally aligned power of two, I saw that in the
> riscv docs).
> 
> After spending a day or two exploring running systems with this, I'd
> like to suggest:
> 
> 1) measure "native PMD THPs" vs. pte-mapped mTHPs. This provides a lot
> of information: mTHP is configured as expected, and is helping or not,
> etc.

There is a difference between how a THP is mapped (PTE vs PMD) and its size. A
PMD-sized THP can still be mapped with PTEs. So I'd rather not completely filter
out PMD-sized THPs, if that's your suggestion. But we could make a distinction
between THPs mapped by PTE and those mapped by PMD; the kernel interface doesn't
directly give us this, but we can infer it from the AnonHugePages and *PmdMapped
stats in smaps.
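
A minimal sketch of that inference, assuming the relevant smaps fields are
AnonHugePages, ShmemPmdMapped and FilePmdMapped (all reported in kB per VMA).
The function name is hypothetical and this is not the script's actual code:

```python
import re

# Fields in /proc/<pid>/smaps that report PMD-mapped THP memory, in kB.
# Assumption: these three are the "AnonHugePages and *PmdMapped" stats.
PMD_FIELDS = ("AnonHugePages", "ShmemPmdMapped", "FilePmdMapped")

def pmd_mapped_kb(smaps_text):
    """Sum each PMD-mapped field across all VMAs in an smaps dump."""
    totals = dict.fromkeys(PMD_FIELDS, 0)
    for line in smaps_text.splitlines():
        m = re.match(r"(\w+):\s+(\d+) kB", line)
        if m and m.group(1) in totals:
            totals[m.group(1)] += int(m.group(2))
    return totals
```

It would be fed the contents of /proc/<pid>/smaps, e.g.
pmd_mapped_kb(open("/proc/1/smaps").read()).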

> 
> 2) Not having to list out all the mTHP sizes would be nice. Instead,
> just use the possible sizes from /sys/kernel/mm/transparent_hugepage/* ,
> unless the user specifies sizes.

This is exactly what the tool already does. Perhaps you haven't fully understood
the counters that it outputs?

You *always* get the following counters (although note the tool *hides* all
counters whose value is 0 by default - show them with --inc-empty). This example
is for a system with 4K base pages:

# thpmaps --pid 1 --summary --inc-empty

anon-thp-aligned-16kB:
anon-thp-aligned-32kB:
anon-thp-aligned-64kB:
anon-thp-aligned-128kB:
anon-thp-aligned-256kB:
anon-thp-aligned-512kB:
anon-thp-aligned-1024kB:
anon-thp-aligned-2048kB:
anon-thp-unaligned-16kB:
anon-thp-unaligned-32kB:
anon-thp-unaligned-64kB:
anon-thp-unaligned-128kB:
anon-thp-unaligned-256kB:
anon-thp-unaligned-512kB:
anon-thp-unaligned-1024kB:
anon-thp-unaligned-2048kB:
anon-thp-partial:
file-thp-aligned-16kB:
file-thp-aligned-32kB:
file-thp-aligned-64kB:
file-thp-aligned-128kB:
file-thp-aligned-256kB:
file-thp-aligned-512kB:
file-thp-aligned-1024kB:
file-thp-aligned-2048kB:
file-thp-unaligned-16kB:
file-thp-unaligned-32kB:
file-thp-unaligned-64kB:
file-thp-unaligned-128kB:
file-thp-unaligned-256kB:
file-thp-unaligned-512kB:
file-thp-unaligned-1024kB:
file-thp-unaligned-2048kB:
file-thp-partial:

So you have counters for every supported THP size in the system - they will be
different for a 64K base page system.
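
To illustrate how the counter set follows from the base page size, here is a
rough sketch. It assumes the supported sizes are every power of two from 4
pages (order 2) up to the PMD size; the function and its parameters are
hypothetical and the real tool may derive the sizes differently:

```python
def thp_counter_names(base_page_kb=4, pmd_order=9):
    """Build the counter names for one system configuration.

    Assumption: supported THP sizes are base_page << order for every
    order from 2 up to and including pmd_order.
    """
    sizes_kb = [base_page_kb << order for order in range(2, pmd_order + 1)]
    names = []
    for kind in ("anon", "file"):
        for align in ("aligned", "unaligned"):
            names += [f"{kind}-thp-{align}-{kb}kB" for kb in sizes_kb]
        # One catch-all counter per kind for partially mapped THPs.
        names.append(f"{kind}-thp-partial")
    return names
```

With the defaults (4K base pages, PMD order 9) this reproduces the list above,
from anon-thp-aligned-16kB through file-thp-partial.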

anon vs file: anonymous vs file-backed memory; hopefully obvious

aligned vs unaligned: In both cases the THP is mapped fully and contiguously. In
the aligned cases it is mapped so that it is naturally aligned. So a 16K THP is
mapped into VA space on a 16K boundary, a 32K THP on a 32K boundary, etc.

partial: Parts of THPs that are partially mapped into VA space.
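
The three categories can be sketched as a single check. This is a
simplification that assumes the mapped part of the THP is one contiguous VA
range; the function name and arguments are hypothetical:

```python
def classify_thp(vaddr, thp_size, mapped_size):
    """Classify one THP mapping as 'aligned', 'unaligned' or 'partial'.

    vaddr:       virtual address where the mapping starts
    thp_size:    size of the THP in bytes (a power of two)
    mapped_size: number of bytes of the THP mapped at vaddr
    """
    if mapped_size < thp_size:
        return "partial"
    # Fully mapped: naturally aligned means vaddr is a multiple of the size.
    return "aligned" if vaddr % thp_size == 0 else "unaligned"
```

For example, a fully mapped 16K THP at VA 0x4000 is aligned, while the same
THP at VA 0x5000 is unaligned.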

Note this does not draw a distinction between PMD-mapped and PTE-mapped THPs.
But a THP can only be PMD-mapped if it is both PMD-aligned and PMD-sized. So
only 2 counters can include PMD-mappings; anon-thp-aligned-2048kB and
file-thp-aligned-2048kB. We can filter that out by subtracting the relevant
smaps counters from them. I could add a --ignore-pmd-mapped flag to do that? Or
I could rename all the existing counters to include "pte" and introduce 2 new
counters: anon-thp-aligned-pmd-2048kB and file-thp-aligned-pmd-2048kB?
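
A rough sketch of the subtraction, assuming the PMD-mapped totals have already
been obtained from smaps. For brevity it keeps the existing counter names for
the PTE-mapped remainder rather than renaming them to include "pte"; the
function and dict layout are hypothetical:

```python
def split_pmd_counters(counters, pmd_mapped_kb, pmd_size_kb=2048):
    """Split PMD-mapped memory out of the PMD-sized aligned counters.

    counters:      counter name -> kB, as produced by the tool
    pmd_mapped_kb: {"anon": kB, "file": kB} derived from smaps (assumed input)
    """
    out = dict(counters)
    for kind, pmd_kb in pmd_mapped_kb.items():
        key = f"{kind}-thp-aligned-{pmd_size_kb}kB"
        total = out.get(key, 0)
        out[f"{kind}-thp-aligned-pmd-{pmd_size_kb}kB"] = pmd_kb
        # What remains in the original counter is PTE-mapped only.
        out[key] = total - pmd_kb
    return out
```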

The --cont option will add *additional* special counters, if specified. The idea
here is to provide a view on what percentage of memory is getting
contpte-mapped. So if you provide "--cont 64K" it will give you a counter
showing how much memory is in 64K, naturally aligned blocks (actually 2
counters; file and anon). Those blocks can come from fully mapped and aligned
64K THPs. But they can also come from bigger THPs - for example, if a 128K THP
is aligned on a 64K boundary (but not a 128K boundary), then it will provide 2
64K cont blocks, but it will be counted as unaligned in
anon-thp-unaligned-128kB. Or if a 2M THP is partially mapped so that only its
first 1M is mapped and aligned on a 64K boundary, then it will be counted in the
*-thp-partial counter and would add 1M to the *-cont-aligned-64kB counter.
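
Both examples can be checked with a small sketch. It assumes a single range
that is contiguous in both VA and PA, and counts the bytes that fall in blocks
naturally aligned in both; the function name and arguments are hypothetical
and this is not how the script itself walks the page tables:

```python
def cont_bytes(va_start, pa_start, length, cont_size):
    """Bytes of [va_start, va_start + length) lying in cont_size blocks
    that are naturally aligned in both virtual and physical address space.

    Assumption: the range is mapped linearly, so PA = pa_start + (VA - va_start).
    """
    # VA and PA must share the same offset within a block, otherwise
    # no block can be aligned in both address spaces.
    if (pa_start - va_start) % cont_size:
        return 0
    first = -(-va_start // cont_size) * cont_size        # round start up
    last = (va_start + length) // cont_size * cont_size  # round end down
    return max(0, last - first)
```

A 128K THP mapped at a VA that is 64K-aligned but not 128K-aligned yields two
64K blocks (128K in total), and the first 1M of a 2M THP mapped on a 64K
boundary yields 1M.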


Sorry if I've labored the point here. But I think the only thing the tool
doesn't already do that you are asking for is to differentiate PTE- vs PMD-
mappings?

> 
> ...
>                          (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>>                          /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>>                          of --pid and --cgroup must be provided.
>>>
>>> Maybe we could add "--global" to that list. That would look, in order,
>>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
>>
>> I think actually it might be better just to make global the default when neither
>> --pid nor --cgroup is provided? And in this case, I'll just grab all the pids
>> from /proc rather than traverse the cgroup hierarchy, that way it will work on
>> systems without cgroups. Does that work for you?
> 
> Yes! That was my initial idea, in fact, and after over-thinking it for
> a while, it turned into the above. haha :)

OK great - implemented for v3.
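
For illustration only (this is a sketch, not the v3 code), grabbing all pids
from /proc amounts to keeping the purely numeric directory entries. The helper
names are hypothetical:

```python
import os

def pids_from_entries(names):
    """Keep only the purely numeric /proc entries, which name processes."""
    return sorted(int(n) for n in names if n.isdecimal())

def all_pids(proc_root="/proc"):
    """List every pid visible under proc_root; works without cgroups."""
    return pids_from_entries(os.listdir(proc_root))
```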

> 
> 
> thanks,


