From: Lorenzo Stoakes <ljs@kernel.org>
To: Nico Pache <npache@redhat.com>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
aarcange@redhat.com, akpm@linux-foundation.org,
anshuman.khandual@arm.com, apopple@nvidia.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, byungchul@sk.com,
catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net,
dave.hansen@linux.intel.com, david@kernel.org, dev.jain@arm.com,
gourry@gourry.net, hannes@cmpxchg.org, hughd@google.com,
jack@suse.cz, jackmanb@google.com, jannh@google.com,
jglisse@google.com, joshua.hahnjy@gmail.com, kas@kernel.org,
lance.yang@linux.dev, Liam.Howlett@oracle.com,
lorenzo.stoakes@oracle.com, mathieu.desnoyers@efficios.com,
matthew.brost@intel.com, mhiramat@kernel.org, mhocko@suse.com,
peterx@redhat.com, pfalcato@suse.de, rakie.kim@sk.com,
raquini@redhat.com, rdunlap@infradead.org,
richard.weiyang@gmail.com, rientjes@google.com,
rostedt@goodmis.org, rppt@kernel.org, ryan.roberts@arm.com,
shivankg@amd.com, sunnanyong@huawei.com, surenb@google.com,
thomas.hellstrom@linux.intel.com, tiwai@suse.de,
usamaarif642@gmail.com, vbabka@suse.cz, vishal.moola@gmail.com,
wangkefeng.wang@huawei.com, will@kernel.org, willy@infradead.org,
yang@os.amperecomputing.com, ying.huang@linux.alibaba.com,
ziy@nvidia.com, zokeefe@google.com
Subject: Re: [PATCH mm-unstable v15 07/13] mm/khugepaged: add per-order mTHP collapse failure statistics
Date: Thu, 16 Apr 2026 08:21:07 +0100
Message-ID: <aeCFqEHaeO8dD11M@lucifer>
In-Reply-To: <CAA1CXcCS9gWySN1oQzEYpALfURxBwt58us9tkAbNPnHOKmLd5g@mail.gmail.com>
Ack on all below due to lower bandwidth :P
None of this is really major, so don't let any of it block the respin!
Cheers, Lorenzo
On Sun, Apr 12, 2026 at 08:48:29PM -0600, Nico Pache wrote:
> On Tue, Mar 17, 2026 at 11:05 AM Lorenzo Stoakes (Oracle)
> <ljs@kernel.org> wrote:
> >
> > On Wed, Feb 25, 2026 at 08:25:04PM -0700, Nico Pache wrote:
> > > Add three new mTHP statistics to track collapse failures for different
> > > orders when encountering swap PTEs, excessive none PTEs, and shared PTEs:
> > >
> > > - collapse_exceed_swap_pte: Increment when mTHP collapse fails due to swap
> > > PTEs
> > >
> > > - collapse_exceed_none_pte: Counts when mTHP collapse fails due to
> > > exceeding the none PTE threshold for the given order
> > >
> > > - collapse_exceed_shared_pte: Counts when mTHP collapse fails due to shared
> > > PTEs
> > >
> > > These statistics complement the existing THP_SCAN_EXCEED_* events by
> > > providing per-order granularity for mTHP collapse attempts. The stats are
> > > exposed via sysfs under
> > > `/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each
> > > supported hugepage size.
> > >
> > > As we currently don't support collapsing mTHPs that contain a swap or
> > > shared entry, those statistics keep track of how often we are
> > > encountering failed mTHP collapses due to these restrictions.
> > >
> > > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > Signed-off-by: Nico Pache <npache@redhat.com>
> > > ---
> > > Documentation/admin-guide/mm/transhuge.rst | 24 ++++++++++++++++++++++
> > > include/linux/huge_mm.h | 3 +++
> > > mm/huge_memory.c | 7 +++++++
> > > mm/khugepaged.c | 16 ++++++++++++---
> > > 4 files changed, 47 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > > index c51932e6275d..eebb1f6bbc6c 100644
> > > --- a/Documentation/admin-guide/mm/transhuge.rst
> > > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > > @@ -714,6 +714,30 @@ nr_anon_partially_mapped
> > > an anonymous THP as "partially mapped" and count it here, even though it
> > > is not actually partially mapped anymore.
> > >
> > > +collapse_exceed_none_pte
> > > + The number of collapse attempts that failed due to exceeding the
> > > + max_ptes_none threshold. For mTHP collapse, Currently only max_ptes_none
> > > + values of 0 and (HPAGE_PMD_NR - 1) are supported. Any other value will
> > > + emit a warning and no mTHP collapse will be attempted. khugepaged will
> >
> > It's weird to document this here but not elsewhere in the document? I mean I
> > made this comment on the documentation patch also.
>
> I can add some more documentation but TBH I don't really know where or
> what else to put. I checked a few of these other per-mTHP stats, and
> none are referenced elsewhere. If anything, these 3 additions are the
> best documented ones.
>
> >
> > Not sure if I missed you adding it to another bit of the docs? :)
> >
> > > + try to collapse to the largest enabled (m)THP size; if it fails, it will
> > > + try the next lower enabled mTHP size. This counter records the number of
> > > + times a collapse attempt was skipped for exceeding the max_ptes_none
> > > + threshold, and khugepaged will move on to the next available mTHP size.
> > > +
> > > +collapse_exceed_swap_pte
> > > + The number of anonymous mTHP PTE ranges which were unable to collapse due
> > > + to containing at least one swap PTE. Currently khugepaged does not
> > > + support collapsing mTHP regions that contain a swap PTE. This counter can
> > > + be used to monitor the number of khugepaged mTHP collapses that failed
> > > + due to the presence of a swap PTE.
> > > +
> > > +collapse_exceed_shared_pte
> > > + The number of anonymous mTHP PTE ranges which were unable to collapse due
> > > + to containing at least one shared PTE. Currently khugepaged does not
> > > + support collapsing mTHP PTE ranges that contain a shared PTE. This
> > > + counter can be used to monitor the number of khugepaged mTHP collapses
> > > + that failed due to the presence of a shared PTE.
> >
> > All of these talk about 'ranges' that could be of any size. Are these useful
> > metrics? Counting a bunch of failures and not knowing if they are 256 KB
> > failures or 16 KB failures or whatever is maybe not so useful information?
>
> These are per-mTHP size statistics. If you look at the surrounding
> examples and docs this all makes more sense.
>
> >
> > Also, from the code, aren't you treating PMD events the same as mTHP ones from
> > the point of view of these counters? Maybe worth documenting that?
>
> IIUC, yes, but that is true of all of these:
>
> ```
> In /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats, There are
> also individual counters for each huge page size, which can be utilized to
> monitor the system's effectiveness in providing huge pages for usage. Each
> counter has its own corresponding file.
> ```
>
> >
> > > +
> > > As the system ages, allocating huge pages may be expensive as the
> > > system uses memory compaction to copy data around memory to free a
> > > huge page for use. There are some counters in ``/proc/vmstat`` to help
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 9941fc6d7bd8..e8777bb2347d 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -144,6 +144,9 @@ enum mthp_stat_item {
> > > MTHP_STAT_SPLIT_DEFERRED,
> > > MTHP_STAT_NR_ANON,
> > > MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
> > > + MTHP_STAT_COLLAPSE_EXCEED_SWAP,
> > > + MTHP_STAT_COLLAPSE_EXCEED_NONE,
> > > + MTHP_STAT_COLLAPSE_EXCEED_SHARED,
> > > __MTHP_STAT_COUNT
> > > };
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 228f35e962b9..1049a207a257 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -642,6 +642,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> > > DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> > > DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
> > > DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
> > > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> > > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
> > > +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> >
> > Is there a reason there's such a difference between the names and the actual
> > enum names?
>
> Good point, I didn't think about that. I can update those as long as
> they don't conflict with something else (I forget why I named them
> like this).
>
> >
> > > +
> > >
> > > static struct attribute *anon_stats_attrs[] = {
> > > &anon_fault_alloc_attr.attr,
> > > @@ -658,6 +662,9 @@ static struct attribute *anon_stats_attrs[] = {
> > > &split_deferred_attr.attr,
> > > &nr_anon_attr.attr,
> > > &nr_anon_partially_mapped_attr.attr,
> > > + &collapse_exceed_swap_pte_attr.attr,
> > > + &collapse_exceed_none_pte_attr.attr,
> > > + &collapse_exceed_shared_pte_attr.attr,
> > > NULL,
> > > };
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index c739f26dd61e..a6cf90e09e4a 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -595,7 +595,9 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > > continue;
> > > } else {
> > > result = SCAN_EXCEED_NONE_PTE;
> > > - count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > > + if (is_pmd_order(order))
> > > + count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
> >
> > It's a bit gross to have separate stats for both THP and mTHP but maybe
> > unavoidable from a legacy standpoint.
>
> I agree but that's how it currently is. Perhaps we can add this to the
> TODO list for THP work.
>
> >
> > Why are we dropping the _PTE suffix?
>
> I follow the convention that the other mTHP stats follow for example
> (MTHP_STAT_SPLIT_DEFERRED)
>
> >
> > > goto out;
> > > }
> > > }
> > > @@ -631,10 +633,17 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > > * shared may cause a future higher order collapse on a
> > > * rescan of the same range.
> > > */
> > > - if (!is_pmd_order(order) || (cc->is_khugepaged &&
> > > - shared > khugepaged_max_ptes_shared)) {
> >
> > OK losing track here :) as the series sadly doesn't currently apply so I can't
> > browse the file as is.
> >
> > In the code I'm looking at, there's also a ++shared here that I guess another
> > patch removed?
> >
> > Is this in the folio_maybe_mapped_shared() branch?
>
> yes the counting is now done at the top of that branch.
>
> >
> > > + if (!is_pmd_order(order)) {
> > > + result = SCAN_EXCEED_SHARED_PTE;
> > > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> > > + goto out;
> > > + }
> > > +
> > > + if (cc->is_khugepaged &&
> > > + shared > khugepaged_max_ptes_shared) {
> > > result = SCAN_EXCEED_SHARED_PTE;
> > > count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> > > goto out;
> >
> > Anyway I'm a bit lost on this logic until a respin but this looks like a LOT of
> > code duplication. I see David alluded to a refactoring so maybe what he suggests
> > will help (not had a chance to check what it is specifically :P)
>
> Yep :) should look cleaner in the next one. Although it's quite a bit
> of refactoring. I'll be praying that I got it right on the first go,
> and that I put all the other pieces in the desired spot.
>
> >
> > > }
> > > }
> > > @@ -1081,6 +1090,7 @@ static enum scan_result __collapse_huge_page_swapin(struct mm_struct *mm,
> > > * range.
> > > */
> > > if (!is_pmd_order(order)) {
> > > + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> >
> > Hmm I thought we were incrementing mthp stats for pmd sized also?
>
> Yes, we are supposed to. I've already refactored and it looks fine
> there... perhaps I missed this one in this version!
>
> Cheers,
>
> -- Nico
>
> >
> > > pte_unmap(pte);
> > > mmap_read_unlock(mm);
> > > result = SCAN_EXCEED_SWAP_PTE;
> > > --
> > > 2.53.0
> > >
> >
> > Cheers, Lorenzo
> >
>
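To make the counting convention discussed above concrete, here is a minimal
userspace sketch, not the kernel code. It models the behaviour described in
the thread as the intended one: the per-order counter is bumped for every
order, PMD order included, while the legacy THP_SCAN_EXCEED_*-style event is
bumped only for PMD order. All model_* names are hypothetical, and the PMD
order of 9 assumes 4 KiB base pages with a 2 MiB PMD.

```c
#include <stdbool.h>

/* Assumption for illustration: 4 KiB base pages, 2 MiB PMD => order 9. */
#define MODEL_PMD_ORDER 9

/* Hypothetical stand-ins for the three failure reasons in the patch. */
enum model_stat {
	MODEL_EXCEED_SWAP,
	MODEL_EXCEED_NONE,
	MODEL_EXCEED_SHARED,
	MODEL_STAT_COUNT
};

/* Models the legacy, system-wide vm events (PMD order only). */
static unsigned long model_vm_events[MODEL_STAT_COUNT];

/* Models the per-order stats, one row per order up to PMD order. */
static unsigned long model_mthp_stats[MODEL_PMD_ORDER + 1][MODEL_STAT_COUNT];

static bool model_is_pmd_order(unsigned int order)
{
	return order == MODEL_PMD_ORDER;
}

/*
 * Record one failed collapse attempt at the given order.
 * The per-order counter is always incremented; the legacy event is
 * incremented only when the attempt was PMD sized. Orders above PMD
 * order are ignored by this model.
 */
static void model_count_collapse_failure(unsigned int order,
					 enum model_stat item)
{
	if (order > MODEL_PMD_ORDER)
		return;

	if (model_is_pmd_order(order))
		model_vm_events[item]++;
	model_mthp_stats[order][item]++;
}
```

With this shape a single helper covers all three failure reasons, so each
call site needs only one call instead of a count_vm_event()/count_mthp_stat()
pair. Whether the respin takes this form is not something the thread
confirms.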