From: Ryan Roberts <ryan.roberts@arm.com>
To: Barry Song <21cnbao@gmail.com>, David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
cerasuolodomenico@gmail.com, chrisl@kernel.org,
kasong@tencent.com, peterx@redhat.com, surenb@google.com,
v-songbaohua@oppo.com, willy@infradead.org,
yosryahmed@google.com, yuzhao@google.com
Subject: Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters
Date: Fri, 5 Apr 2024 08:18:54 +0100 [thread overview]
Message-ID: <3e5e87b1-b26b-4a93-ab56-9f23bd56a02a@arm.com> (raw)
In-Reply-To: <CAGsJ_4yPJd25O314zBWzB9ZFdsTMHx29gePatiE=drtA4jyjXw@mail.gmail.com>
On 05/04/2024 05:01, Barry Song wrote:
> On Fri, Apr 5, 2024 at 3:57 PM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Fri, Apr 5, 2024 at 4:31 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 04.04.24 09:21, Ryan Roberts wrote:
>>>> On 03/04/2024 22:00, Barry Song wrote:
>>>>> On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>
>>>>>> On 03/04/2024 09:22, David Hildenbrand wrote:
>>>>>>> On 03.04.24 05:55, Barry Song wrote:
>>>>>>>> From: Barry Song <v-songbaohua@oppo.com>
>>>>>>>>
>>>>>>>> Profiling a system blindly with mTHP has become challenging due
>>>>>>>> to the lack of visibility into its operations. Presenting the
>>>>>>>> success rate of mTHP allocations appears to be pressing need.
>>>>>>>>
>>>>>>>> Recently, I've been experiencing significant difficulty debugging
>>>>>>>> performance improvements and regressions without these figures.
>>>>>>>> It's crucial for us to understand the true effectiveness of
>>>>>>>> mTHP in real-world scenarios, especially in systems with
>>>>>>>> fragmented memory.
>>>>>>>>
>>>>>>>> This patch sets up the framework for per-order mTHP counters,
>>>>>>>> starting with the introduction of anon_alloc_success and
>>>>>>>> anon_alloc_fail counters. Incorporating additional counters
>>>>>>>> should now be straightforward as well.
>>>>>>>>
>>>>>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>>>>>>> ---
>>>>>>>> -v3:
>>>>>>>> * save some memory as order-0 and order-1 can't be THP, Ryan;
>>>>>>>> * rename to anon_alloc as right now we only support anon to address
>>>>>>>> David's comment;
>>>>>>>> * drop a redundant "else", Ryan
>>>>>>>>
>>>>>>>> include/linux/huge_mm.h | 18 ++++++++++++++
>>>>>>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>> mm/memory.c | 2 ++
>>>>>>>> 3 files changed, 74 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>>>> index e896ca4760f6..5e9af6be9537 100644
>>>>>>>> --- a/include/linux/huge_mm.h
>>>>>>>> +++ b/include/linux/huge_mm.h
>>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr;
>>>>>>>> * (which is a limitation of the THP implementation).
>>>>>>>> */
>>>>>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
>>>>>>>> +#define THP_MIN_ORDER 2
>>>>>>>> /*
>>>>>>>> * Mask of all large folio orders supported for file THP.
>>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct
>>>>>>>> vm_area_struct *vma,
>>>>>>>> enforce_sysfs, orders);
>>>>>>>> }
>>>>>>>> +enum thp_event_item {
>>>>>>>> + THP_ANON_ALLOC_SUCCESS,
>>>>>>>> + THP_ANON_ALLOC_FAIL,
>>>>>>>> + NR_THP_EVENT_ITEMS
>>>>>>>> +};
>>>>>>>
>>>>>>> Maybe use a prefix that resembles matches the enum name and is "obviously"
>>>>>>> different to the ones in vm_event_item.h, like
>>>>>>>
>>>>>>> enum thp_event {
>>>>>>> THP_EVENT_ANON_ALLOC_SUCCESS,
>>>>>>> THP_EVENT_ANON_ALLOC_FAIL,
>>>>>>> __THP_EVENT_COUNT,
>>>>>>> };
>>>>>>
>>>>>> FWIW, I'd personally replace "event" with "stat". For me "event" only ever
>>>>>> increments, but "stat" can increment and decrement. An event is a type of stat.
>>>>>>
>>>>>> You are only adding events for now, but we have identified a need for inc/dec
>>>>>> stats that will be added in future.
>>>>>
>>>>> What about the below?
>>>>>
>>>>> enum thp_stat {
>>
>> It seems we still need to use enum thp_stat_item rather than thp_stat.
>> This follows
>> enum zone_stat_item
>> enum numa_stat_item
>> enum node_stat_item
>>
>> And most importantly, the below looks much better
>>
>> enum thp_stat_item {
>> THP_STAT_ANON_ALLOC,
>> THP_STAT_ANON_ALLOC_FALLBACK,
>> __THP_STAT_COUNT
>> };
>>
>> struct thp_state {
>> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
>> };
>>
>> DECLARE_PER_CPU(struct thp_state, thp_states);
>>
>> than
>>
>> enum thp_stat {
>> THP_STAT_ANON_ALLOC,
>> THP_STAT_ANON_ALLOC_FALLBACK,
>> __THP_STAT_COUNT
>> };
>>
>> struct thp_state {
>> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
>> };
>>
>>>>> THP_EVENT_ANON_ALLOC,
>>>>> THP_EVENT_ANON_ALLOC_FALLBACK,
>>>>> THP_EVENT_SWPOUT,
>>>>> THP_EVENT_SWPOUT_FALLBACK,
>>>>> ...
>>>>> THP_NR_ANON_PAGES,
>>>>> THP_NR_FILE_PAGES,
>>>>
>>>> I find this ambiguous; Is it the number of THPs or the number of base pages?
>>>>
>>>> I think David made the point about incorporating the enum name into the labels
>>>> too, so that there can be no namespace confusion. How about:
>>>>
>>>> <enum>_<type>_<name>
>>>>
>>>> So:
>>>>
>>>> THP_STAT_EV_ANON_ALLOC
>>>> THP_STAT_EV_ANON_ALLOC_FALLBACK
>>>> THP_STAT_EV_ANON_PARTIAL
>>>> THP_STAT_EV_SWPOUT
>>>> THP_STAT_EV_SWPOUT_FALLBACK
>>>> ...
>>>> THP_STAT_NR_ANON
>>>> THP_STAT_NR_FILE
>>>> ...
>>>> __THP_STAT_COUNT
>>>
>>> I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive enough.
>>
>> ok.
>
> Hi David, Ryan,
>
> I've named everything as follows. Please let me know if you have any further
> suggestions before I send the updated version :-)
Please treat all my comments below as optional - I'm just stating my preference!
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e896ca4760f6..cc13fa14aa32 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> enforce_sysfs, orders);
> }
>
> +enum thp_stat_item {
> + THP_STAT_ANON_ALLOC,
> + THP_STAT_ANON_ALLOC_FALLBACK,
> + __THP_STAT_COUNT
> +};
> +
> +struct thp_state {
> + unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
Why using "state" here? I think "stats" is more appropriate? (as in short for
"statistics") i.e. `struct thp_stats` and `unsigned long stats[][]`.
> +};
> +
> +DECLARE_PER_CPU(struct thp_state, thp_states);
> +
> +static inline void count_thp_state(int order, enum thp_stat_item item)
> +{
> + this_cpu_inc(thp_states.state[order][item]);
> +}
> +
> #define transparent_hugepage_use_zero_page() \
> (transparent_hugepage_flags & \
> (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9d4b2fbf6872..e704b4408181 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = {
> .sysfs_ops = &kobj_sysfs_ops,
> };
>
> +DEFINE_PER_CPU(struct thp_state, thp_states) = {{{0}}};
Does this need to be explicitly zeroed? Won't that be the default initial state,
just like for regular globals? Perhaps PER_CPU is special?
> +
> +static unsigned long sum_thp_states(int order, enum thp_stat_item item)
Again, I'd call it sum_thp_stats().
> +{
> + unsigned long sum = 0;
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + struct thp_state *this = &per_cpu(thp_states, cpu);
> +
> + sum += this->state[order][item];
> + }
> +
> + return sum;
> +}
> +
> +#define THP_STATE_ATTR(_name, _index) \
And THP_STATS_ATTR(); they are going to live in the "stats" directory after all.
> +static ssize_t _name##_show(struct kobject *kobj, \
> + struct kobj_attribute *attr, char *buf) \
> +{ \
> + int order = to_thpsize(kobj)->order; \
> + \
> + return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index)); \
> +} \
> +static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> +
> +THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC);
> +THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK);
>
>>
>>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>
> Thanks
> Barry
next prev parent reply other threads:[~2024-04-05 7:19 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-03 3:55 Barry Song
2024-04-03 8:22 ` David Hildenbrand
2024-04-03 11:48 ` Ryan Roberts
2024-04-03 12:00 ` David Hildenbrand
2024-04-03 21:00 ` Barry Song
2024-04-04 7:21 ` Ryan Roberts
2024-04-04 10:52 ` Barry Song
[not found] ` <30392471-71f9-4eb1-8855-d9c12499346f@redhat.com>
2024-04-05 2:57 ` Barry Song
2024-04-05 4:01 ` Barry Song
2024-04-05 6:29 ` David Hildenbrand
2024-04-05 7:21 ` Ryan Roberts
2024-04-05 9:04 ` David Hildenbrand
2024-04-05 9:24 ` Barry Song
2024-04-05 10:15 ` David Hildenbrand
2024-04-05 10:51 ` Barry Song
2024-04-05 7:18 ` Ryan Roberts [this message]
2024-04-05 9:08 ` Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3e5e87b1-b26b-4a93-ab56-9f23bd56a02a@arm.com \
--to=ryan.roberts@arm.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cerasuolodomenico@gmail.com \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=kasong@tencent.com \
--cc=linux-mm@kvack.org \
--cc=peterx@redhat.com \
--cc=surenb@google.com \
--cc=v-songbaohua@oppo.com \
--cc=willy@infradead.org \
--cc=yosryahmed@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox