From: Nico Pache <npache@redhat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-doc@vger.kernel.org, david@redhat.com,
ziy@nvidia.com, baolin.wang@linux.alibaba.com,
Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com,
corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, akpm@linux-foundation.org,
baohua@kernel.org, willy@infradead.org, peterx@redhat.com,
wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
sunnanyong@huawei.com, vishal.moola@gmail.com,
thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
anshuman.khandual@arm.com, catalin.marinas@arm.com,
tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com,
jack@suse.cz, cl@gentwo.org, jglisse@google.com,
surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org,
rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org,
hughd@google.com, richard.weiyang@gmail.com,
lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org,
jannh@google.com, pfalcato@suse.de
Subject: Re: [PATCH v12 mm-new 09/15] khugepaged: add per-order mTHP collapse failure statistics
Date: Fri, 7 Nov 2025 10:14:10 -0700 [thread overview]
Message-ID: <CAA1CXcDT19rV_08pVP7CLuUZiVHW_1rSOv2oMXUHyRxh5sGCcA@mail.gmail.com> (raw)
In-Reply-To: <ffcf2c28-d0ae-4a45-8693-10fb4dff8479@lucifer.local>
On Thu, Nov 6, 2025 at 11:47 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Oct 22, 2025 at 12:37:11PM -0600, Nico Pache wrote:
> > Add three new mTHP statistics to track collapse failures for different
> > orders when encountering swap PTEs, excessive none PTEs, and shared PTEs:
> >
> > - collapse_exceed_swap_pte: Increment when mTHP collapse fails due to swap
> > PTEs
> >
> > - collapse_exceed_none_pte: Counts when mTHP collapse fails due to
> > exceeding the none PTE threshold for the given order
> >
> > - collapse_exceed_shared_pte: Counts when mTHP collapse fails due to shared
> > PTEs
> >
> > These statistics complement the existing THP_SCAN_EXCEED_* events by
> > providing per-order granularity for mTHP collapse attempts. The stats are
> > exposed via sysfs under
> > `/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each
> > supported hugepage size.
> >
> > As we currently don't support collapsing mTHPs that contain a swap or
> > shared entry, those statistics keep track of how often we are
> > encountering failed mTHP collapses due to these restrictions.
> >
> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > Documentation/admin-guide/mm/transhuge.rst | 23 ++++++++++++++++++++++
> > include/linux/huge_mm.h | 3 +++
> > mm/huge_memory.c | 7 +++++++
> > mm/khugepaged.c | 16 ++++++++++++---
> > 4 files changed, 46 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 13269a0074d4..7c71cda8aea1 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -709,6 +709,29 @@ nr_anon_partially_mapped
> > an anonymous THP as "partially mapped" and count it here, even though it
> > is not actually partially mapped anymore.
> >
> > +collapse_exceed_none_pte
> > + The number of anonymous mTHP pte ranges where the number of none PTEs
>
> Ranges? Is the count per-mTHP folio? Or per PTE entry? Let's clarify.
I don't know the proper terminology. What we have here is a range of
PTEs that is being considered for mTHP folio collapse; however, it is
not yet an mTHP folio, which is why I hesitated to call it that.
Given this counter is per mTHP size, I think the proper way to say this would be:
The number of collapse attempts that failed due to exceeding the
max_ptes_none threshold.
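For illustration, here is a small user-space sketch of how the per-order
scaling of max_ptes_none might work. The shift-based scaling and the
constants here are assumptions based on this discussion, not the exact
kernel code:

```c
#include <assert.h>

#define HPAGE_PMD_ORDER 9            /* 512 PTEs per PMD on x86-64 */
#define KHUGEPAGED_MAX_PTES_NONE 511 /* default: one less than a full PMD */

/*
 * Hypothetical helper: scale the PMD-level max_ptes_none down by the
 * difference in order, so a smaller collapse region tolerates
 * proportionally fewer empty (none) PTEs.
 */
static int scaled_max_ptes_none(int order)
{
	return KHUGEPAGED_MAX_PTES_NONE >> (HPAGE_PMD_ORDER - order);
}
```

With the default of 511, an order-6 (64-PTE) collapse would allow at
most 63 none PTEs under this scheme, and an order-4 collapse at most 15.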
>
> > + exceeded the max_ptes_none threshold. For mTHP collapse, khugepaged
> > + checks a PMD region and tracks which PTEs are present. It then tries
> > + to collapse to the largest enabled mTHP size. The allowed number of empty
>
> Well and then tries to collapse to the next and etc. right? So maybe worth
> mentioning?
>
> > + PTEs is the max_ptes_none threshold scaled by the collapse order. This
>
> I think this needs clarification, scaled how? Also obviously with the proposed
> new approach we will need to correct this to reflect the 511/0 situation.
>
> > + counter records the number of times a collapse attempt was skipped for
> > + this reason, and khugepaged moved on to try the next available mTHP size.
>
> OK you mention the moving on here, so for each attempted mTHP size which
> exceeds max_ptes_none we increment this stat correct? Probably worth
> clarifying that.
>
> > +
> > +collapse_exceed_swap_pte
> > + The number of anonymous mTHP pte ranges which contain at least one swap
> > + PTE. Currently khugepaged does not support collapsing mTHP regions
> > + that contain a swap PTE. This counter can be used to monitor the
> > + number of khugepaged mTHP collapses that failed due to the presence
> > + of a swap PTE.
>
> OK so as soon as we encounter a swap PTE we abort and this counts each instance
> of that?
>
> I guess worth spelling that out? Given we don't support it, surely the opening
> description should be 'The number of anonymous mTHP PTE ranges which were unable
> to be collapsed due to containing one or more swap PTEs'.
>
> > +
> > +collapse_exceed_shared_pte
> > + The number of anonymous mTHP pte ranges which contain at least one shared
> > + PTE. Currently khugepaged does not support collapsing mTHP pte ranges
> > + that contain a shared PTE. This counter can be used to monitor the
> > + number of khugepaged mTHP collapses that failed due to the presence
> > + of a shared PTE.
>
> Same comments as above.
>
> > +
> > As the system ages, allocating huge pages may be expensive as the
> > system uses memory compaction to copy data around memory to free a
> > huge page for use. There are some counters in ``/proc/vmstat`` to help
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 3d29624c4f3f..4b2773235041 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -144,6 +144,9 @@ enum mthp_stat_item {
> > MTHP_STAT_SPLIT_DEFERRED,
> > MTHP_STAT_NR_ANON,
> > MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
> > + MTHP_STAT_COLLAPSE_EXCEED_SWAP,
> > + MTHP_STAT_COLLAPSE_EXCEED_NONE,
> > + MTHP_STAT_COLLAPSE_EXCEED_SHARED,
> > __MTHP_STAT_COUNT
> > };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 0063d1ba926e..7335b92969d6 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -638,6 +638,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> > DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> > DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
> > DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
> > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
> > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> > +
> >
> > static struct attribute *anon_stats_attrs[] = {
> > &anon_fault_alloc_attr.attr,
> > @@ -654,6 +658,9 @@ static struct attribute *anon_stats_attrs[] = {
> > &split_deferred_attr.attr,
> > &nr_anon_attr.attr,
> > &nr_anon_partially_mapped_attr.attr,
> > + &collapse_exceed_swap_pte_attr.attr,
> > + &collapse_exceed_none_pte_attr.attr,
> > + &collapse_exceed_shared_pte_attr.attr,
> > NULL,
> > };
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d741af15e18c..053202141ea3 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -592,7 +592,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > continue;
> > } else {
> > result = SCAN_EXCEED_NONE_PTE;
> > - count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > + if (order == HPAGE_PMD_ORDER)
> > + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
> > goto out;
> > }
> > }
> > @@ -622,10 +624,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > * shared may cause a future higher order collapse on a
> > * rescan of the same range.
> > */
> > - if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
> > - shared > khugepaged_max_ptes_shared)) {
> > + if (order != HPAGE_PMD_ORDER) {
>
Thanks for the review! I'll go clean these up for the next version
> A little nit/idea in general for series - since we do this order !=
> HPAGE_PMD_ORDER check all over, maybe have a predicate function like:
>
> static bool is_mthp_order(unsigned int order)
> {
> return order != HPAGE_PMD_ORDER;
> }
sure!
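A self-contained sketch of the suggested predicate, with the constant
stubbed for user-space testing (the kernel's HPAGE_PMD_ORDER is
arch-dependent; 9 is assumed here for x86-64):

```c
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9 /* assumed: 512 PTEs per PMD */

/*
 * Predicate wrapping the repeated "order != HPAGE_PMD_ORDER" check:
 * any collapse order below the PMD order is an mTHP collapse.
 */
static bool is_mthp_order(unsigned int order)
{
	return order != HPAGE_PMD_ORDER;
}
```

The repeated open-coded comparisons in __collapse_huge_page_isolate()
and friends could then read as `if (is_mthp_order(order))`, which makes
the intent explicit at each call site.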
>
> > + result = SCAN_EXCEED_SHARED_PTE;
> > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> > + goto out;
> > + }
> > +
> > + if (cc->is_khugepaged &&
> > + shared > khugepaged_max_ptes_shared) {
> > result = SCAN_EXCEED_SHARED_PTE;
> > count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
>
> OK I _think_ I mentioned this in a previous revision so forgive me for being
> repetitious but we also count PMD orders here?
>
> But in the MTHP_STAT_COLLAPSE_EXCEED_NONE and MTP_STAT_COLLAPSE_EXCEED_SWAP
> cases we don't? Why's that?
Hmm I could have sworn I fixed that... perhaps I reintroduced the
missing stat update when I had to rebase/undo the cleanup series by
Lance. I will fix this.
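For the fix, the consistent pattern discussed above could be simulated
like this (counter names and the simulation itself are illustrative, not
the kernel implementation): the global THP_SCAN_EXCEED_* vm_event fires
only for PMD-order collapses, while the per-order mTHP stat fires for
every order, in all three exceed cases:

```c
#include <assert.h>

enum { HPAGE_PMD_ORDER = 9 };

static long vm_exceed_shared;                        /* stand-in for the vm_event */
static long mthp_exceed_shared[HPAGE_PMD_ORDER + 1]; /* stand-in per-order stat */

/*
 * Count an exceed-shared failure consistently: the legacy global event
 * only for PMD-order attempts, the per-order stat unconditionally.
 */
static void count_exceed_shared(int order)
{
	if (order == HPAGE_PMD_ORDER)
		vm_exceed_shared++;
	mthp_exceed_shared[order]++;
}
```

Applying the same shape to the EXCEED_NONE and EXCEED_SWAP paths would
remove the asymmetry Lorenzo points out.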
Cheers.
-- Nico
>
>
> > goto out;
> > }
> > }
> > @@ -1073,6 +1082,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
> > * range.
> > */
> > if (order != HPAGE_PMD_ORDER) {
> > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> > pte_unmap(pte);
> > mmap_read_unlock(mm);
> > result = SCAN_EXCEED_SWAP_PTE;
> > --
> > 2.51.0
> >
>
> Thanks, Lorenzo
>