From: Barry Song <21cnbao@gmail.com>
To: David Hildenbrand <david@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
cerasuolodomenico@gmail.com, chrisl@kernel.org,
kasong@tencent.com, peterx@redhat.com, surenb@google.com,
v-songbaohua@oppo.com, willy@infradead.org,
yosryahmed@google.com, yuzhao@google.com
Subject: Re: [PATCH v3] mm: add per-order mTHP alloc_success and alloc_fail counters
Date: Fri, 5 Apr 2024 17:01:05 +1300
Message-ID: <CAGsJ_4yPJd25O314zBWzB9ZFdsTMHx29gePatiE=drtA4jyjXw@mail.gmail.com>
In-Reply-To: <CAGsJ_4zAUkN_cpVzdV-94zPqEAhh+gW9rYY2S8V8e24_OdjJaA@mail.gmail.com>
On Fri, Apr 5, 2024 at 3:57 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, Apr 5, 2024 at 4:31 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 04.04.24 09:21, Ryan Roberts wrote:
> > > On 03/04/2024 22:00, Barry Song wrote:
> > >> On Thu, Apr 4, 2024 at 12:48 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> > >>>
> > >>> On 03/04/2024 09:22, David Hildenbrand wrote:
> > >>>> On 03.04.24 05:55, Barry Song wrote:
> > >>>>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>>>
> > >>>>> Profiling a system blindly with mTHP has become challenging due
> > >>>>> to the lack of visibility into its operations. Presenting the
> > >>>>> success rate of mTHP allocations appears to be a pressing need.
> > >>>>>
> > >>>>> Recently, I've been experiencing significant difficulty debugging
> > >>>>> performance improvements and regressions without these figures.
> > >>>>> It's crucial for us to understand the true effectiveness of
> > >>>>> mTHP in real-world scenarios, especially in systems with
> > >>>>> fragmented memory.
> > >>>>>
> > >>>>> This patch sets up the framework for per-order mTHP counters,
> > >>>>> starting with the introduction of anon_alloc_success and
> > >>>>> anon_alloc_fail counters. Incorporating additional counters
> > >>>>> should now be straightforward as well.
> > >>>>>
> > >>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>>>> ---
> > >>>>> -v3:
> > >>>>> * save some memory as order-0 and order-1 can't be THP, Ryan;
> > >>>>> * rename to anon_alloc as right now we only support anon to address
> > >>>>> David's comment;
> > >>>>> * drop a redundant "else", Ryan
> > >>>>>
> > >>>>> include/linux/huge_mm.h | 18 ++++++++++++++
> > >>>>> mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++++
> > >>>>> mm/memory.c | 2 ++
> > >>>>> 3 files changed, 74 insertions(+)
> > >>>>>
> > >>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > >>>>> index e896ca4760f6..5e9af6be9537 100644
> > >>>>> --- a/include/linux/huge_mm.h
> > >>>>> +++ b/include/linux/huge_mm.h
> > >>>>> @@ -70,6 +70,7 @@ extern struct kobj_attribute shmem_enabled_attr;
> > >>>>> * (which is a limitation of the THP implementation).
> > >>>>> */
> > >>>>> #define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
> > >>>>> +#define THP_MIN_ORDER 2
> > >>>>> /*
> > >>>>> * Mask of all large folio orders supported for file THP.
> > >>>>> @@ -264,6 +265,23 @@ unsigned long thp_vma_allowable_orders(struct
> > >>>>> vm_area_struct *vma,
> > >>>>> enforce_sysfs, orders);
> > >>>>> }
> > >>>>> +enum thp_event_item {
> > >>>>> + THP_ANON_ALLOC_SUCCESS,
> > >>>>> + THP_ANON_ALLOC_FAIL,
> > >>>>> + NR_THP_EVENT_ITEMS
> > >>>>> +};
> > >>>>
> > >>>> Maybe use a prefix that matches the enum name and is "obviously"
> > >>>> different to the ones in vm_event_item.h, like
> > >>>>
> > >>>> enum thp_event {
> > >>>> THP_EVENT_ANON_ALLOC_SUCCESS,
> > >>>> THP_EVENT_ANON_ALLOC_FAIL,
> > >>>> __THP_EVENT_COUNT,
> > >>>> };
> > >>>
> > >>> FWIW, I'd personally replace "event" with "stat". For me "event" only ever
> > >>> increments, but "stat" can increment and decrement. An event is a type of stat.
> > >>>
> > >>> You are only adding events for now, but we have identified a need for inc/dec
> > >>> stats that will be added in future.
> > >>
> > >> What about the below?
> > >>
> > >> enum thp_stat {
>
> It seems we still need to use enum thp_stat_item rather than thp_stat.
> This follows
> enum zone_stat_item
> enum numa_stat_item
> enum node_stat_item
>
> And most importantly, the below looks much better
>
> enum thp_stat_item {
> THP_STAT_ANON_ALLOC,
> THP_STAT_ANON_ALLOC_FALLBACK,
> __THP_STAT_COUNT
> };
>
> struct thp_state {
> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
> };
>
> DECLARE_PER_CPU(struct thp_state, thp_states);
>
> than
>
> enum thp_stat {
> THP_STAT_ANON_ALLOC,
> THP_STAT_ANON_ALLOC_FALLBACK,
> __THP_STAT_COUNT
> };
>
> struct thp_state {
> unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
> };
>
> > >> THP_EVENT_ANON_ALLOC,
> > >> THP_EVENT_ANON_ALLOC_FALLBACK,
> > >> THP_EVENT_SWPOUT,
> > >> THP_EVENT_SWPOUT_FALLBACK,
> > >> ...
> > >> THP_NR_ANON_PAGES,
> > >> THP_NR_FILE_PAGES,
> > >
> > > I find this ambiguous; Is it the number of THPs or the number of base pages?
> > >
> > > I think David made the point about incorporating the enum name into the labels
> > > too, so that there can be no namespace confusion. How about:
> > >
> > > <enum>_<type>_<name>
> > >
> > > So:
> > >
> > > THP_STAT_EV_ANON_ALLOC
> > > THP_STAT_EV_ANON_ALLOC_FALLBACK
> > > THP_STAT_EV_ANON_PARTIAL
> > > THP_STAT_EV_SWPOUT
> > > THP_STAT_EV_SWPOUT_FALLBACK
> > > ...
> > > THP_STAT_NR_ANON
> > > THP_STAT_NR_FILE
> > > ...
> > > __THP_STAT_COUNT
> >
> > I'd even drop the "EV". "NR_ANON" vs "ANON_ALLOC" etc. is expressive enough.
>
> ok.
Hi David, Ryan,
I've named everything as follows. Please let me know if you have any further
suggestions before I send the updated version :-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e896ca4760f6..cc13fa14aa32 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct
vm_area_struct *vma,
enforce_sysfs, orders);
}
+enum thp_stat_item {
+ THP_STAT_ANON_ALLOC,
+ THP_STAT_ANON_ALLOC_FALLBACK,
+ __THP_STAT_COUNT
+};
+
+struct thp_state {
+ unsigned long state[PMD_ORDER + 1][__THP_STAT_COUNT];
+};
+
+DECLARE_PER_CPU(struct thp_state, thp_states);
+
+static inline void count_thp_state(int order, enum thp_stat_item item)
+{
+ this_cpu_inc(thp_states.state[order][item]);
+}
+
#define transparent_hugepage_use_zero_page() \
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9d4b2fbf6872..e704b4408181 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -526,6 +526,46 @@ static const struct kobj_type thpsize_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
};
+DEFINE_PER_CPU(struct thp_state, thp_states) = {{{0}}};
+
+static unsigned long sum_thp_states(int order, enum thp_stat_item item)
+{
+ unsigned long sum = 0;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ struct thp_state *this = &per_cpu(thp_states, cpu);
+
+ sum += this->state[order][item];
+ }
+
+ return sum;
+}
+
+#define THP_STATE_ATTR(_name, _index) \
+static ssize_t _name##_show(struct kobject *kobj, \
+ struct kobj_attribute *attr, char *buf) \
+{ \
+ int order = to_thpsize(kobj)->order; \
+ \
+ return sysfs_emit(buf, "%lu\n", sum_thp_states(order, _index)); \
+} \
+static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+THP_STATE_ATTR(anon_alloc, THP_STAT_ANON_ALLOC);
+THP_STATE_ATTR(anon_alloc_fallback, THP_STAT_ANON_ALLOC_FALLBACK);
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
Thanks
Barry