linux-mm.kvack.org archive mirror
From: Ryan Roberts <ryan.roberts@arm.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>, Zi Yan <ziy@nvidia.com>,
	Barry Song <21cnbao@gmail.com>,
	Alistair Popple <apopple@nvidia.com>,
	William Kucharski <william.kucharski@oracle.com>
Cc: linux-mm@kvack.org, Barry Song <v-songbaohua@oppo.com>
Subject: Re: [PATCH v2] tools/mm: Add thpmaps script to dump THP usage info
Date: Thu, 11 Jan 2024 17:32:39 +0000
Message-ID: <de8ce74a-11ff-4be9-b5d4-2af1ebdd03d0@arm.com>
In-Reply-To: <43230798-af22-4f59-b37c-8257bae32af8@arm.com>

On 11/01/2024 11:54, Ryan Roberts wrote:
> On 10/01/2024 23:21, John Hubbard wrote:
>> On 1/10/24 09:32, Ryan Roberts wrote:
>> ...
>>> options:
>>>    -h, --help           show this help message and exit
>>>    --pid pid            Process id of the target process. Exactly one
>>>                         of --pid and --cgroup must be provided.
>>>    --cgroup path        Path to the target cgroup in sysfs. Iterates
>>>                         over every pid in the cgroup and its children.
>>>                         Get global stats by passing in the root cgroup
>>
>> Hi Ryan,
>>
>> Yes, this version is fairly effective at getting global stats now.
>>
>> I've got some proposed minor tweaks below, and a few questions. Let me
>> start with the questions:
>>
>> 1) When I run this on an older 6.4.8-based kernel:
>>
>>     # ./thpmaps --cgroup /sys/fs/cgroup  --cont 128K --cont 512K --cont 1M \
>>             --cont 2M --cont 512M --summary
>>
>> , I get this output:
>>
>> file-thp-aligned-524288kB:      36175872 kB (95%)
>> file-thp-partial:                 856640 kB ( 2%)
>> file-cont-aligned-128kB:        37032320 kB (97%)
>> file-cont-aligned-512kB:        36597760 kB (96%)
>> file-cont-aligned-1024kB:       36597760 kB (96%)
>> file-cont-aligned-2048kB:       36595712 kB (96%)
>> file-cont-aligned-524288kB:     36175872 kB (95%)
>>
>>
>> Is it true that the above is basically "normal" 512MB THP in action?
> 
> No: the "file" part of the counter name means it is file (not anon). So this is
> not mTHP, which would always be anon (e.g. "anon-cont-aligned-128kB"). Based on
> your follow-up mail, I would guess this is mostly hugetlb memory rather than
> actual page cache memory, but they are both getting lumped into those "file" labels.
> 
>> And all of the "cont" entries are just that way because we can't
>> really tell mTHP/cont apart from normal THP?
> 
> I'm not sure exactly what you are asking. The "cont" counters are counting
> blocks of contiguous, naturally aligned physical memory, which are also mapped
> contiguously and aligned. So a smaller --cont would always include all the
> memory captured in a larger --cont. In this case, it's all the *file-backed*
> memory (as highlighted in the label name) so nothing to do with (m)THP. But where
> you have THP, --cont doesn't care what the underlying THP size is as long as its
> requirements are met, so PMD-sized THPs would be included in e.g.
> *anon*-cont-aligned-128kB.
> 
> Note that the "--cont" counters don't directly count memory that is PTE-mapped
> with the contiguous bit set in the page table; they just count memory that meets
> the alignment, size and mapping requirements. On arm64 systems with the contpte
> series, the contiguous bit would be used here, but it's not part of what's
> getting measured.
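
To illustrate the counting rule described above, here is a minimal Python
sketch. It is not the thpmaps implementation: the function name and the
vpn_to_pfn dictionary are assumptions made for the example (a real tool would
get the virtual-to-physical mapping from /proc/<pid>/pagemap). A block is
counted only if it is naturally aligned both virtually and physically and
every page in it is mapped at the matching offset.

```python
def count_cont_aligned_pages(vpn_to_pfn, block_pages):
    """Return how many pages sit in contiguous, naturally aligned blocks.

    vpn_to_pfn:  dict of virtual page number -> physical frame number
                 (hypothetical input, for illustration only).
    block_pages: block size in pages; must be a power of two.
    """
    if block_pages <= 0 or block_pages & (block_pages - 1):
        raise ValueError("block_pages must be a power of two")

    counted = 0
    for vpn, pfn in vpn_to_pfn.items():
        # A block can only start where both the virtual and the physical
        # page numbers are naturally aligned to the block size.
        if vpn % block_pages or pfn % block_pages:
            continue
        # Every page of the block must be mapped, at the matching offset.
        if all(vpn_to_pfn.get(vpn + i) == pfn + i for i in range(block_pages)):
            counted += block_pages
    return counted


# 8 pages that are contiguous and aligned, followed by 4 pages that are
# physically contiguous but start at an unaligned frame.
mapping = {vpn: 16 + vpn for vpn in range(8)}
mapping.update({8 + i: 101 + i for i in range(4)})

small = count_cont_aligned_pages(mapping, 2)
large = count_cont_aligned_pages(mapping, 8)
```

With this rule, any block that qualifies at a larger size is made up of blocks
that also qualify at every smaller size, which is why a smaller --cont value
always covers at least the memory reported for a larger one.
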
> 
>>
>> 2) On an mTHP kernel with the latest patchsets (arm64, 64K page size), I
>> *think* I cannot turn off mTHP. I'm still teasing apart how much of this
>> is an instrumentation error, and how much is a measurement problem (with
>> the test suite). And maybe I'm wrong entirely. But the "never" option
>> doesn't seem to have an effect. Unless the latest version of the testsuite
>> is doing something new, sigh.
>>
>> $ for f in $(find /sys/kernel/mm/transparent_hugepage/ -name enabled); do echo "$f: $(cat $f)"; done
>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/enabled: always madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-262144kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-32768kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-16384kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-524288kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-8192kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-65536kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-131072kB/enabled: always inherit madvise [never]
>> /sys/kernel/mm/transparent_hugepage/hugepages-4096kB/enabled: always inherit madvise [never]
>>
>> Any quick thoughts? Don't waste any time on this, it's probably
>> operator error. Just in case, though.
> 
> As per your email, you're looking at hugetlb memory (as per counter label).
> 
> I have all the information to create a hugetlb-specific set of counters, so it's
> not lumped in with page cache memory. You would then have counter sets of
> "anon", "file" and "htlb". Would that be useful?

Or I could just filter out hugetlb memory so it doesn't appear in this tool at
all? That would be easier implementation-wise, and probably more in line with
the original intention of the tool (it's called thpmaps, after all).
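
As a rough illustration of the two options (a separate "htlb" counter set
versus filtering hugetlb out entirely), here is a minimal Python sketch that
works on /proc/<pid>/smaps text. It is not the thpmaps code. The function
names and the drop_hugetlb parameter are made up for the example, and telling
anon from file by the VMA's inode is a simplifying assumption (a per-page
check would be more accurate). The "ht" entry in the VmFlags line is the
kernel's documented marker for a hugetlb area.

```python
def classify_vma(header, vmflags):
    """Classify one smaps entry as 'htlb', 'file' or 'anon'.

    header:  first line of the entry, e.g.
             '55d0a000-55d0b000 rw-p 00000000 00:00 0 [heap]'
    vmflags: the entry's 'VmFlags:' line.
    """
    flags = vmflags.split(":", 1)[1].split()
    if "ht" in flags:
        return "htlb"
    # Fields: range, perms, offset, dev, inode, optional path.
    inode = int(header.split(None, 5)[4])
    # Assumption for this sketch: a non-zero inode means file-backed.
    return "file" if inode != 0 else "anon"


def tally(entries, drop_hugetlb=False):
    """Sum sizes (kB) per class; optionally skip hugetlb VMAs altogether."""
    totals = {}
    for header, vmflags, size_kb in entries:
        kind = classify_vma(header, vmflags)
        if drop_hugetlb and kind == "htlb":
            continue
        totals[kind] = totals.get(kind, 0) + size_kb
    return totals


entries = [
    ("55d0a000-55d0b000 rw-p 00000000 00:00 0 [heap]",
     "VmFlags: rd wr mr mw me ac", 4),
    ("7f10a000-7f10c000 r-xp 00000000 08:01 1234 /usr/lib/libfoo.so",
     "VmFlags: rd ex mr mw me", 8),
    ("7f2000000000-7f2020000000 rw-s 00000000 00:0f 99 /dev/hugepages/buf",
     "VmFlags: rd wr sh mr mw me ms de ht", 524288),
]

separate = tally(entries)                     # three counter sets
filtered = tally(entries, drop_hugetlb=True)  # hugetlb not reported at all
```

Checking for hugetlb before looking at the inode matters here, because a
hugetlbfs mapping has a non-zero inode and would otherwise land in "file".
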

> 
>>
>>
>>>                         (e.g. /sys/fs/cgroup for cgroup-v2 or
>>>                         /sys/fs/cgroup/pids for cgroup-v1). Exactly one
>>>                         of --pid and --cgroup must be provided.
>>
>> Maybe we could add "--global" to that list. That would look, in order,
>> inside cgroups2 and cgroups, for a list of pids, and then run as if
>> --cgroup /sys/fs/cgroup or --cgroup /sys/fs/cgroup/pids were specified.
> 
> I think actually it might be better just to make global the default when neither
> --pid nor --cgroup is provided? And in this case, I'll just grab all the pids
> from /proc rather than traverse the cgroup hierarchy; that way it will work on
> systems without cgroups. Does that work for you?
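
Grabbing every pid from /proc could look something like the minimal Python
sketch below. The function name and the proc_root parameter are assumptions
for the example (the parameter only exists so the function can be pointed at
a test directory); this is not taken from the script.

```python
import os


def all_pids(proc_root="/proc"):
    """Return the sorted pids that have a directory under proc_root."""
    # Process directories are the entries whose names are purely numeric;
    # everything else (self, sys, meminfo, ...) is skipped.
    return sorted(
        int(name) for name in os.listdir(proc_root) if name.isdecimal()
    )
```

A caller iterating over the result should expect some pids to have exited by
the time their files are opened, and skip those rather than fail.
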
> 
>>
>> It's nicer than failing out. And it's also directly useful. I would be
>> running my above command like this, instead:
>>
>> # ./thpmaps --global  --cont 128K --cont 512K --cont 1M \
>>             --cont 2M --cont 512M --summary
>>
>> thanks,
> 


