linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Nico Pache <npache@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-doc@vger.kernel.org, david@redhat.com,
	ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com,
	corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, akpm@linux-foundation.org,
	baohua@kernel.org, willy@infradead.org, peterx@redhat.com,
	wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
	sunnanyong@huawei.com, vishal.moola@gmail.com,
	thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
	kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
	anshuman.khandual@arm.com, catalin.marinas@arm.com,
	tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com,
	jack@suse.cz, cl@gentwo.org, jglisse@google.com,
	surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org,
	rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org,
	hughd@google.com, richard.weiyang@gmail.com,
	lance.yang@linux.dev, vbabka@suse.cz, rppt@kernel.org,
	jannh@google.com, pfalcato@suse.de
Subject: Re: [PATCH v12 mm-new 06/15] khugepaged: introduce collapse_max_ptes_none helper function
Date: Mon, 27 Oct 2025 17:53:47 +0000	[thread overview]
Message-ID: <5f8c69c1-d07b-4957-b671-b37fccf729f1@lucifer.local> (raw)
In-Reply-To: <20251022183717.70829-7-npache@redhat.com>

On Wed, Oct 22, 2025 at 12:37:08PM -0600, Nico Pache wrote:
> The current mechanism for determining mTHP collapse scales the
> khugepaged_max_ptes_none value based on the target order. This
> introduces an undesirable feedback loop, or "creep", when max_ptes_none
> is set to a value greater than HPAGE_PMD_NR / 2.
>
> With this configuration, a successful collapse to order N will populate
> enough pages to satisfy the collapse condition on order N+1 on the next
> scan. This leads to unnecessary work and memory churn.
>
> To fix this issue introduce a helper function that caps the max_ptes_none
> to HPAGE_PMD_NR / 2 - 1 (255 on 4k page size). The function also scales
> the max_ptes_none number by the (PMD_ORDER - target collapse order).
>
> The limits can be ignored by passing full_scan=true, this is useful for
> madvise_collapse (which ignores limits), or in the case of
> collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
> collapse is available.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 4ccebf5dda97..286c3a7afdee 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -459,6 +459,39 @@ void __khugepaged_enter(struct mm_struct *mm)
>  		wake_up_interruptible(&khugepaged_wait);
>  }
>
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
> + * @order: The folio order being collapsed to
> + * @full_scan: Whether this is a full scan (ignore limits)
> + *
> + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
> + * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
> + *
> + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
> + * khugepaged_max_ptes_none value.
> + *
> + * For mTHP collapses, scale down the max_ptes_none proportionally to the folio
> + * order, but caps it at HPAGE_PMD_NR/2-1 to prevent a collapse feedback loop.
> + *
> + * Return: Maximum number of empty PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
> +{
> +	unsigned int max_ptes_none;
> +
> +	/* ignore max_ptes_none limits */
> +	if (full_scan)
> +		return HPAGE_PMD_NR - 1;
> +
> +	if (order == HPAGE_PMD_ORDER)
> +		return khugepaged_max_ptes_none;
> +
> +	max_ptes_none = min(khugepaged_max_ptes_none, HPAGE_PMD_NR/2 - 1);

I mean not to beat a dead horse re: v11 commentary, but I thought we were going
to implement David's idea re: the new 'eagerness' tunable, and again we're now just
implementing the capping at HPAGE_PMD_NR/2 - 1 thing again?

I'm still really quite uncomfortable with us silently capping this value.

If we're putting forward theoretical ideas that are to be later built upon, this
series should be an RFC.

But if we really intend to silently ignore user input the problem is that then
becomes established uAPI.

I think it's _sensible_ to avoid this mTHP escalation problem, but the issue is
visibility I think.

I think people are going to find it odd that you set it to something, but then
get something else.

As an alternative we could have a new sysfs field:

/sys/kernel/mm/transparent_hugepage/khugepaged/max_mthp_ptes_none

That shows the cap clearly.

In fact, it could be read-only... and just expose it to the user. That reduces
complexity.

We can then bring in eagerness later and have the same situation of
max_ptes_none being a parameter that exists (plus this additional read-only
parameter).

> +
> +	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
> +
> +}
> +
>  void khugepaged_enter_vma(struct vm_area_struct *vma,
>  			  vm_flags_t vm_flags)
>  {
> @@ -546,7 +579,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	pte_t *_pte;
>  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>  	const unsigned long nr_pages = 1UL << order;
> -	int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> +	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
>
>  	for (_pte = pte; _pte < pte + nr_pages;
>  	     _pte++, addr += PAGE_SIZE) {
> --
> 2.51.0
>


  reply	other threads:[~2025-10-27 17:54 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-22 18:37 [PATCH v12 mm-new 00/15] khugepaged: mTHP support Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 01/15] khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
2025-11-08  1:42   ` Wei Yang
2025-10-22 18:37 ` [PATCH v12 mm-new 02/15] introduce collapse_single_pmd to unify khugepaged and madvise_collapse Nico Pache
2025-10-27  9:00   ` Lance Yang
2025-10-27 15:44   ` Lorenzo Stoakes
2025-11-08  1:44   ` Wei Yang
2025-10-22 18:37 ` [PATCH v12 mm-new 03/15] khugepaged: generalize hugepage_vma_revalidate for mTHP support Nico Pache
2025-10-27  9:02   ` Lance Yang
2025-11-08  1:54   ` Wei Yang
2025-10-22 18:37 ` [PATCH v12 mm-new 04/15] khugepaged: generalize alloc_charge_folio() Nico Pache
2025-10-27  9:05   ` Lance Yang
2025-11-08  2:34   ` Wei Yang
2025-10-22 18:37 ` [PATCH v12 mm-new 05/15] khugepaged: generalize __collapse_huge_page_* for mTHP support Nico Pache
2025-10-27  9:17   ` Lance Yang
2025-10-27 16:00   ` Lorenzo Stoakes
2025-11-10 13:20     ` Nico Pache
2025-11-08  3:01   ` Wei Yang
2025-10-22 18:37 ` [PATCH v12 mm-new 06/15] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
2025-10-27 17:53   ` Lorenzo Stoakes [this message]
2025-10-28 10:09     ` Baolin Wang
2025-10-28 13:57       ` Nico Pache
2025-10-28 17:07       ` Lorenzo Stoakes
2025-10-28 17:56         ` David Hildenbrand
2025-10-28 18:09           ` Lorenzo Stoakes
2025-10-28 18:17             ` David Hildenbrand
2025-10-28 18:41               ` Lorenzo Stoakes
2025-10-29 15:04                 ` David Hildenbrand
2025-10-29 18:41                   ` Lorenzo Stoakes
2025-10-29 21:10                     ` Nico Pache
2025-10-30 18:03                       ` Lorenzo Stoakes
2025-10-29 20:45                   ` Nico Pache
2025-10-28 13:36     ` Nico Pache
2025-10-28 14:15       ` David Hildenbrand
2025-10-28 17:29         ` Lorenzo Stoakes
2025-10-28 17:36           ` Lorenzo Stoakes
2025-10-28 18:08           ` David Hildenbrand
2025-10-28 18:59             ` Lorenzo Stoakes
2025-10-28 19:08               ` Lorenzo Stoakes
2025-10-29  2:09               ` Baolin Wang
2025-10-29  2:49                 ` Nico Pache
2025-10-29 18:55                 ` Lorenzo Stoakes
2025-10-29 21:14                   ` Nico Pache
2025-10-30  1:15                     ` Baolin Wang
2025-10-29  2:47               ` Nico Pache
2025-10-29 18:58                 ` Lorenzo Stoakes
2025-10-29 21:23                   ` Nico Pache
2025-10-30 10:15                     ` Lorenzo Stoakes
2025-10-31 11:12               ` David Hildenbrand
2025-10-28 16:57       ` Lorenzo Stoakes
2025-10-28 17:49         ` David Hildenbrand
2025-10-28 17:59           ` Lorenzo Stoakes
2025-10-22 18:37 ` [PATCH v12 mm-new 07/15] khugepaged: generalize collapse_huge_page for mTHP collapse Nico Pache
2025-10-27  3:25   ` Baolin Wang
2025-11-06 18:14   ` Lorenzo Stoakes
2025-11-07  3:09     ` Dev Jain
2025-11-07  9:18       ` Lorenzo Stoakes
2025-11-07 19:33     ` Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 08/15] khugepaged: skip collapsing mTHP to smaller orders Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 09/15] khugepaged: add per-order mTHP collapse failure statistics Nico Pache
2025-11-06 18:45   ` Lorenzo Stoakes
2025-11-07 17:14     ` Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 10/15] khugepaged: improve tracepoints for mTHP orders Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 11/15] khugepaged: introduce collapse_allowable_orders helper function Nico Pache
2025-11-06 18:49   ` Lorenzo Stoakes
2025-11-07 18:01     ` Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 12/15] khugepaged: Introduce mTHP collapse support Nico Pache
2025-10-27  6:28   ` Baolin Wang
2025-11-09  2:08   ` Wei Yang
2025-11-11 21:56     ` Nico Pache
2025-11-19 11:53   ` Lorenzo Stoakes
2025-11-19 12:08     ` Lorenzo Stoakes
2025-11-20 22:32     ` Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 13/15] khugepaged: avoid unnecessary mTHP collapse attempts Nico Pache
2025-11-09  2:40   ` Wei Yang
2025-11-17 18:16     ` Nico Pache
2025-11-18  2:00       ` Wei Yang
2025-11-19 12:05   ` Lorenzo Stoakes
2025-11-26 23:16     ` Nico Pache
2025-11-26 23:29     ` Nico Pache
2025-10-22 18:37 ` [PATCH v12 mm-new 14/15] khugepaged: run khugepaged for all orders Nico Pache
2025-11-19 12:13   ` Lorenzo Stoakes
2025-11-20  6:37     ` Baolin Wang
2025-10-22 18:37 ` [PATCH v12 mm-new 15/15] Documentation: mm: update the admin guide for mTHP collapse Nico Pache
2025-10-22 19:52   ` Christoph Lameter (Ampere)
2025-10-22 20:22     ` David Hildenbrand
2025-10-23  8:00       ` Lorenzo Stoakes
2025-10-23  8:44         ` Pedro Falcato
2025-10-24 13:54           ` Zach O'Keefe
2025-10-23 23:41       ` Christoph Lameter (Ampere)
2025-10-22 20:13 ` [PATCH v12 mm-new 00/15] khugepaged: mTHP support Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f8c69c1-d07b-4957-b671-b37fccf729f1@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@gentwo.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jglisse@google.com \
    --cc=kas@kernel.org \
    --cc=lance.yang@linux.dev \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=raquini@redhat.com \
    --cc=rdunlap@infradead.org \
    --cc=richard.weiyang@gmail.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=sunnanyong@huawei.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tiwai@suse.de \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=vishal.moola@gmail.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=ziy@nvidia.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox