From: Barry Song <21cnbao@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
cerasuolodomenico@gmail.com, chrisl@kernel.org,
kasong@tencent.com, peterx@redhat.com, surenb@google.com,
v-songbaohua@oppo.com, willy@infradead.org,
yosryahmed@google.com, yuzhao@google.com
Subject: Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters
Date: Fri, 5 Apr 2024 22:08:16 +1300
Message-ID: <CAGsJ_4x5K3B0dc2mhSaC8XAZRjiMss1QVfBa5kw8_TfRDOt8AA@mail.gmail.com>
In-Reply-To: <3e5e87b1-b26b-4a93-ab56-9f23bd56a02a@arm.com>
On Fri, Apr 5, 2024 at 8:18 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 05/04/2024 05:01, Barry Song wrote:
> > On Fri, Apr 5, 2024 at 3:57 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> On Fri, Apr 5, 2024 at 4:31 AM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 04.04.24 09:21, Ryan Roberts wrote:
> >>>> On 03/04/2024 22:00, Barry Song wrote:
> >>>>> On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>>>
> >>>>>> On 03/04/2024 09:22, David Hildenbrand wrote:
> >>>>>>> On 03.04.24 05:55, Barry Song wrote:
> >>>>>>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>>>>>
> >>>>>>>> Profiling a system blindly with mTHP has become challenging due
> >>>>>>>> to the lack of visibility into its operations. Presenting the
> >>>>>>>> success rate of mTHP allocations appears to be a pressing need.
> >>>>>>>>
> >>>>>>>> Recently, I've been experiencing significant difficulty debugging
> >>>>>>>> performance improvements and regressions without these figures.
> >>>>>>>> It's crucial for us to understand the true effectiveness of
> >>>>>>>> mTHP in real-world scenarios, especially in systems with
> >>>>>>>> fragmented memory.
> >>>>>>>>
> >>>>>>>> This patch sets up the framework for per-order mTHP counters,
> >>>>>>>> starting with the introduction of anon_alloc_success and
> >>>>>>>> anon_alloc_fail counters. Incorporating additional counters
> >>>>>>>> should now be straightforward as well.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>>>>>>> ---
> >>>>>>>> -v3:
> >>>>>>>> * save some memory as order-0 and order-1 can't be THP, Ryan;
> >>>>>>>> * rename to anon_alloc as right now we only support anon to address
> >>>>>>>> David's comment;
> >>>>>>>> * drop a redundant "else", Ryan
> >>>>>>>>
> >>>>>>>> include/linux/huge_mm.h | 18 ++++++++++++++
> >>>>>>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++++
> >>>>>>>> mm/memory.c | 2 ++
> >>>>>>>> 3 files changed, 74 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>>>>>>> index e896ca4760f6..5e9af6be9537 100644
> >>>>>>>> --- a/include/linux/huge_mm.h
> >>>>>>>> +++ b/include/linux/huge_mm.h
> >>>>>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr;
> >>>>>>>> * (which is a limitation of the THP implementation).
> >>>>>>>> */
> >>>>>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
> >>>>>>>> +#define THP_MIN_ORDER 2
> >>>>>>>> /*
> >>>>>>>> * Mask of all large folio orders supported for file THP.
> >>>>>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct
> >>>>>>>> vm_area_struct *vma,
> >>>>>>>> enforce_sysfs, orders);
> >>>>>>>> }
> >>>>>>>> +enum thp_event_item {
> >>>>>>>> + THP_ANON_ALLOC_SUCCESS,
> >>>>>>>> + THP_ANON_ALLOC_FAIL,
> >>>>>>>> + NR_THP_EVENT_ITEMS
> >>>>>>>> +};
> >>>>>>>
> >>>>>>> Maybe use a prefix that matches the enum name and is "obviously"
> >>>>>>> different to the ones in vm_event_item.h, like
> >>>>>>>
> >>>>>>> enum thp_event {
> >>>>>>> THP_EVENT_ANON_ALLOC_SUCCESS,
> >>>>>>> THP_EVENT_ANON_ALLOC_FAIL,
> >>>>>>> __THP_EVENT_COUNT,
> >>>>>>> };
> >>>>>>
> >>>>>> FWIW, I'd personally replace "event" with "stat". For me "event" only ever
> >>>>>> increments, but "stat" can increment and decrement. An event is a type of stat.
> >>>>>>
> >>>>>> You are only adding events for now, but we have identified a need for inc/dec
> >>>>>> stats that will be added in future.
> >>>>>
> >>>>> What about the below?
> >>>>>
> >>>>> enum thp_stat {
> >>
> >> It seems we still need to use enum thp_stat_item rather than thp_stat.
> >> This follows
> >> enum zone_stat_item
> >> enum numa_stat_item
> >> enum node_stat_item
> >>
> >> And most importantly, the below looks much better
> >>
> >> enum thp_stat_item {
> >> THP_STAT_ANON_ALLOC,
> >> THP_STAT_ANON_ALLOC_FALLBACK,
> >> __THP_STAT_COUNT
> >> };
> >>
> >> struct thp_state {
> >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
> >> };
> >>
> >> DECLARE_PER_CPU(struct thp_state, thp_states);
> >>
> >> than
> >>
> >> enum thp_stat {
> >> THP_STAT_ANON_ALLOC,
> >> THP_STAT_ANON_ALLOC_FALLBACK,
> >> __THP_STAT_COUNT
> >> };
> >>
> >> struct thp_state {
> >> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
> >> };
> >>
> >>>>> THP_EVENT_ANON_ALLOC,
> >>>>> THP_EVENT_ANON_ALLOC_FALLBACK,
> >>>>> THP_EVENT_SWPOUT,
> >>>>> THP_EVENT_SWPOUT_FALLBACK,
> >>>>> ...
> >>>>> THP_NR_ANON_PAGES,
> >>>>> THP_NR_FILE_PAGES,
> >>>>
> >>>> I find this ambiguous; Is it the number of THPs or the number of base pages?
> >>>>
> >>>> I think David made the point about incorporating the enum name into the labels
> >>>> too, so that there can be no namespace confusion. How about:
> >>>>
> >>>> <enum>_<type>_<name>
> >>>>
> >>>> So:
> >>>>
> >>>> THP_STAT_EV_ANON_ALLOC
> >>>> THP_STAT_EV_ANON_ALLOC_FALLBACK
> >>>> THP_STAT_EV_ANON_PARTIAL
> >>>> THP_STAT_EV_SWPOUT
> >>>> THP_STAT_EV_SWPOUT_FALLBACK
> >>>> ...
> >>>> THP_STAT_NR_ANON
> >>>> THP_STAT_NR_FILE
> >>>> ...
> >>>> __THP_STAT_COUNT
> >>>
> >>> I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive enough.
> >>
> >> ok.
> >
> > Hi David, Ryan,
> >
> > I've named everything as follows. Please let me know if you have any further
> > suggestions before I send the updated version :-)
>
> Please treat all my comments below as optional - I'm just stating my preference!
>
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index e896ca4760f6..cc13fa14aa32 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct
> > vm_area_struct *vma,
> > enforce_sysfs, orders);
> > }
> >
> > +enum thp_stat_item {
> > + THP_STAT_ANON_ALLOC,
> > + THP_STAT_ANON_ALLOC_FALLBACK,
> > + __THP_STAT_COUNT
> > +};
> > +
> > +struct thp_state {
> > + unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
>
> Why using "state" here? I think "stats" is more appropriate? (as in short for
> "statistics") i.e. `struct thp_stats` and `unsigned long stats[][]`.
Sounds good.
>
> > +};
> > +
> > +DECLARE_PER_CPU(struct thp_state, thp_states);
> > +
> > +static inline void count_thp_state(int order, enum thp_stat_item item)
> > +{
> > + this_cpu_inc(thp_states.state[order][item]);
> > +}
> > +
> > #define transparent_hugepage_use_zero_page() \
> > (transparent_hugepage_flags & \
> > (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 9d4b2fbf6872..e704b4408181 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = {
> > .sysfs_ops = &kobj_sysfs_ops,
> > };
> >
> > +DEFINE_PER_CPU(struct thp_state, thp_states) = {{{0}}};
>
> Does this need to be explicitly zeroed? Won't that be the default initial state,
> just like for regular globals? Perhaps PER_CPU is special?
I really don't know. Uninitialized data is usually placed in the .bss
section, which the kernel clears with memset. Uninitialized per-cpu data,
however, still lives in the .data..percpu section, and I can't find where
a memset is done for that area, since the section can also contain
initialized per-cpu data. Do compilers zero-fill uninitialized per-cpu
data automatically, so that the kernel simply sees it as data initialized
to 0?
Anyway, I copied the code from vmstat:
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
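On the zeroing question, a minimal userspace sketch may help. It assumes
GCC or Clang on an ELF target and relies on the C rule that objects with
static storage duration start out as zero when they have no initializer,
even when a section attribute keeps them out of .bss. The section name,
the array dimensions and the demo_* names are all made up for
illustration, and the sketch does not model how the kernel sets up its
per-cpu areas.

```c
#include <stddef.h>

/* Hypothetical dimensions, chosen only for this sketch. */
#define DEMO_ORDERS 10
#define DEMO_ITEMS  2

/*
 * On ELF targets, place the array in a named data section to mimic a
 * variable that does not land in .bss. The section name is made up.
 * On other targets the attribute is dropped and the array is a plain
 * static object.
 */
#if defined(__ELF__) && (defined(__GNUC__) || defined(__clang__))
#define DEMO_SECTION __attribute__((section(".data.demo_percpu")))
#else
#define DEMO_SECTION
#endif

/* No initializer: static storage duration still implies all zeros. */
static unsigned long demo_stats[DEMO_ORDERS][DEMO_ITEMS] DEMO_SECTION;

/* Return 1 if every element is zero, 0 otherwise. */
int demo_stats_all_zero(void)
{
	for (size_t i = 0; i < DEMO_ORDERS; i++)
		for (size_t j = 0; j < DEMO_ITEMS; j++)
			if (demo_stats[i][j] != 0)
				return 0;
	return 1;
}
```

If the same holds for .data..percpu, the explicit initializer would be
redundant but harmless. That part is an assumption and has not been
checked against the per-cpu setup code.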
>
> > +
> > +static unsigned long sum_thp_states(int order, enum thp_stat_item item)
>
> Again, I'd call it sum_thp_stats().
Sounds good.
>
> > +{
> > + unsigned long sum = 0;
> > + int cpu;
> > +
> > + for_each_online_cpu(cpu) {
> > + struct thp_state *this = &per_cpu(thp_states, cpu);
> > +
> > + sum += this->state[order][item];
> > + }
> > +
> > + return sum;
> > +}
> > +
> > +#define THP_STATE_ATTR(_name, _index) \
>
> And THP_STATS_ATTR(); they are going to live in the "stats" directory after all.
Sounds good.
>
> > +static ssize_t _name##_show(struct kobject *kobj, \
> > + struct kobj_attribute *attr, char *buf) \
> > +{ \
> > + int order = to_thpsize(kobj)->order; \
> > + \
> > + return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index)); \
> > +} \
> > +static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> > +
> > +THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC);
> > +THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK);
> >
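For readers who want to try the counting scheme from the diff above
outside the kernel, here is a rough userspace sketch. It is not the
patch itself: a plain array indexed by a simulated CPU number stands in
for the per-cpu variable, an explicit cpu argument stands in for
this_cpu_inc(), and a simple loop stands in for for_each_online_cpu().
All demo_* and DEMO_* names and the sizes are hypothetical.

```c
#include <stdio.h>

/* Hypothetical sizes; the patch uses PMD_ORDER and the real CPU set. */
#define DEMO_NR_CPUS   4
#define DEMO_MAX_ORDER 9

enum demo_stat_item {
	DEMO_STAT_ANON_ALLOC,
	DEMO_STAT_ANON_ALLOC_FALLBACK,
	DEMO_STAT_COUNT		/* sentinel: sizes the last array dimension */
};

struct demo_stats {
	unsigned long stats[DEMO_MAX_ORDER + 1][DEMO_STAT_COUNT];
};

/* One slot per simulated CPU; stands in for a per-cpu variable. */
static struct demo_stats demo_cpu_stats[DEMO_NR_CPUS];

/* Stand-in for this_cpu_inc(): the caller names the CPU explicitly. */
void demo_count_stat(int cpu, int order, enum demo_stat_item item)
{
	demo_cpu_stats[cpu].stats[order][item]++;
}

/* Walk every simulated CPU and add up one counter for one order. */
unsigned long demo_sum_stats(int order, enum demo_stat_item item)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += demo_cpu_stats[cpu].stats[order][item];
	return sum;
}

/* Generate one "show" function per counter via token pasting. */
#define DEMO_STATS_ATTR(_name, _index)					\
int _name##_show(int order, char *buf, size_t len)			\
{									\
	return snprintf(buf, len, "%lu\n",				\
			demo_sum_stats(order, _index));			\
}

DEMO_STATS_ATTR(anon_alloc, DEMO_STAT_ANON_ALLOC)
DEMO_STATS_ATTR(anon_alloc_fallback, DEMO_STAT_ANON_ALLOC_FALLBACK)
```

Each increment touches only the slot of the CPU doing the counting, and
the cost of combining the slots is paid on the read side, which in the
patch is the sysfs show path.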
> >>
> >>>
> >>> --
> >>> Cheers,
> >>>
> >>> David / dhildenb
> >
> > Thanks
> > Barry
>