linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	hannes@cmpxchg.org, usamaarif642@gmail.com,
	gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	ameryhung@gmail.com, rientjes@google.com, corbet@lwn.net,
	21cnbao@gmail.com, shakeel.butt@linux.dev, bpf@vger.kernel.org,
	linux-mm@kvack.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v7 mm-new 04/10] mm: thp: enable THP allocation exclusively through khugepaged
Date: Thu, 11 Sep 2025 16:58:35 +0100	[thread overview]
Message-ID: <a2c122f5-ab6a-4242-9db8-e5175d5b27b3@lucifer.local> (raw)
In-Reply-To: <20250910024447.64788-5-laoar.shao@gmail.com>

On Wed, Sep 10, 2025 at 10:44:41AM +0800, Yafang Shao wrote:
> Currently, THP allocation cannot be restricted to khugepaged alone while
> being disabled in the page fault path. This limitation exists because
> disabling THP allocation during page faults also prevents the execution of
> khugepaged_enter_vma() in that path.

This is quite confusing, I see what you mean - you want to be able to disable
page fault THP but not khugepaged THP _at the point of possibly faulting in a
THP aligned VMA_.

It seems this patch makes khugepaged_enter_vma() unconditional for an anonymous
VMA, rather than depending on the return value specified by
thp_vma_allowable_order().

So I think a clearer explanation is:

	khugepaged_enter_vma() ultimately invokes any attached BPF function with
	the TVA_KHUGEPAGED flag set when determining whether or not to enable
	khugepaged THP for a freshly faulted in VMA.

	Currently, on fault, we invoke this in do_huge_pmd_anonymous_page(), as
	invoked by create_huge_pmd() and only when we have already checked to
	see if an allowable TVA_PAGEFAULT order is specified.

	Since we might want to disallow THP on fault-in but allow it via
	khugepaged, we move things around so we always attempt to enter
	khugepaged upon fault.

Having said all this, I'm very confused.

Why are we doing this?

We only enable khugepaged _early_ when we know we're faulting in a huge PMD
here.

I guess we do this because, if we are allowed to do the pagefault, maybe
something changed that might have previously disallowed khugepaged to run for
the mm.

But now we're just checking unconditionally for... no reason?

if BPF disables page fault but not khugepaged, then surely the mm would already
be under be khugepaged if it could be?

It's sort of immaterial if we get a pmd_none() that is not-faultable for
whatever reason but BPF might say is khugepaged'able, because it'd have already
set this.

This is because if we just map a new VMA, we already let khugepaged have it via
khugepaged_enter_vma() in __mmap_new_vma() and in the merge paths.

I mean maybe I'm missing something here :)

>
> With the introduction of BPF, we can now implement THP policies based on
> different TVA types. This patch adjusts the logic to support this new
> capability.
>
> While we could also extend prtcl() to utilize this new policy, such a

Typo: prtcl -> prctl

> change would require a uAPI modification.

Hm, in what respect? PR_SET_THP_DISABLE?

>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  mm/huge_memory.c |  1 -
>  mm/memory.c      | 13 ++++++++-----
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 523153d21a41..1e9e7b32e2cf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1346,7 +1346,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	ret = vmf_anon_prepare(vmf);
>  	if (ret)
>  		return ret;
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
>  			!mm_forbids_zeropage(vma->vm_mm) &&
> diff --git a/mm/memory.c b/mm/memory.c
> index d8819cac7930..d0609dc1e371 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6289,11 +6289,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  	if (pud_trans_unstable(vmf.pud))
>  		goto retry_pud;
>
> -	if (pmd_none(*vmf.pmd) &&
> -	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> -		ret = create_huge_pmd(&vmf);
> -		if (!(ret & VM_FAULT_FALLBACK))
> -			return ret;
> +	if (pmd_none(*vmf.pmd)) {
> +		if (vma_is_anonymous(vma))
> +			khugepaged_enter_vma(vma, vm_flags);
> +		if (thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> +			ret = create_huge_pmd(&vmf);
> +			if (!(ret & VM_FAULT_FALLBACK))
> +				return ret;
> +		}
>  	} else {
>  		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
>
> --
> 2.47.3
>


  parent reply	other threads:[~2025-09-11 15:59 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10  2:44 [PATCH v7 mm-new 0/9] mm, bpf: BPF based THP order selection Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 01/10] mm: thp: remove disabled task from khugepaged_mm_slot Yafang Shao
2025-09-10  5:11   ` Lance Yang
2025-09-10  6:17     ` Yafang Shao
2025-09-10  7:21   ` Lance Yang
2025-09-10 17:27   ` kernel test robot
2025-09-11  2:12     ` Lance Yang
2025-09-11  2:28       ` Zi Yan
2025-09-11  2:35         ` Yafang Shao
2025-09-11  2:38         ` Lance Yang
2025-09-11 13:47         ` Lorenzo Stoakes
2025-09-14  2:48           ` Yafang Shao
2025-09-11 13:43   ` Lorenzo Stoakes
2025-09-14  2:47     ` Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 02/10] mm: thp: add support for BPF based THP order selection Yafang Shao
2025-09-10 12:42   ` Lance Yang
2025-09-10 12:54     ` Lance Yang
2025-09-10 13:56       ` Lance Yang
2025-09-11  2:48         ` Yafang Shao
2025-09-11  3:04           ` Lance Yang
2025-09-11 14:45         ` Lorenzo Stoakes
2025-09-11 14:02     ` Lorenzo Stoakes
2025-09-11 14:42       ` Lance Yang
2025-09-11 14:58         ` Lorenzo Stoakes
2025-09-12  7:58           ` Yafang Shao
2025-09-12 12:04             ` Lorenzo Stoakes
2025-09-11 14:33   ` Lorenzo Stoakes
2025-09-12  8:28     ` Yafang Shao
2025-09-12 11:53       ` Lorenzo Stoakes
2025-09-14  2:22         ` Yafang Shao
2025-09-11 14:51   ` Lorenzo Stoakes
2025-09-12  8:03     ` Yafang Shao
2025-09-12 12:00       ` Lorenzo Stoakes
2025-09-25 10:05   ` Lance Yang
2025-09-25 11:38     ` Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 03/10] mm: thp: decouple THP allocation between swap and page fault paths Yafang Shao
2025-09-11 14:55   ` Lorenzo Stoakes
2025-09-12  7:20     ` Yafang Shao
2025-09-12 12:04       ` Lorenzo Stoakes
2025-09-10  2:44 ` [PATCH v7 mm-new 04/10] mm: thp: enable THP allocation exclusively through khugepaged Yafang Shao
2025-09-11 15:53   ` Lance Yang
2025-09-12  6:21     ` Yafang Shao
2025-09-11 15:58   ` Lorenzo Stoakes [this message]
2025-09-12  6:17     ` Yafang Shao
2025-09-12 13:48       ` Lorenzo Stoakes
2025-09-14  2:19         ` Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 05/10] bpf: mark mm->owner as __safe_rcu_or_null Yafang Shao
2025-09-11 16:04   ` Lorenzo Stoakes
2025-09-10  2:44 ` [PATCH v7 mm-new 06/10] bpf: mark vma->vm_mm as __safe_trusted_or_null Yafang Shao
2025-09-11 17:08   ` Lorenzo Stoakes
2025-09-11 17:30   ` Liam R. Howlett
2025-09-11 17:44     ` Lorenzo Stoakes
2025-09-12  3:56       ` Yafang Shao
2025-09-12  3:50     ` Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 07/10] selftests/bpf: add a simple BPF based THP policy Yafang Shao
2025-09-10 20:44   ` Alexei Starovoitov
2025-09-11  2:31     ` Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 08/10] selftests/bpf: add test case to update " Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 09/10] selftests/bpf: add test cases for invalid thp_adjust usage Yafang Shao
2025-09-10  2:44 ` [PATCH v7 mm-new 10/10] Documentation: add BPF-based THP policy management Yafang Shao
2025-09-10 11:11 ` [PATCH v7 mm-new 0/9] mm, bpf: BPF based THP order selection Lance Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2c122f5-ab6a-4242-9db8-e5175d5b27b3@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=21cnbao@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=ameryhung@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bpf@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=gutierrez.asier@huawei-partners.com \
    --cc=hannes@cmpxchg.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npache@redhat.com \
    --cc=rientjes@google.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox