From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
jannh@google.com, gavinguo@igalia.com,
baolin.wang@linux.alibaba.com, ziy@nvidia.com,
linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Mon, 9 Feb 2026 17:08:16 +0000 [thread overview]
Message-ID: <fbd6c31f-7f35-4986-86e3-76bf8963433d@lucifer.local> (raw)
In-Reply-To: <20260205033113.30724-1-richard.weiyang@gmail.com>
On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
> split_huge_pmd_locked()") return false unconditionally after
> split_huge_pmd_locked() which may fail early during try_to_migrate() for
> shared thp. This will lead to unexpected folio split failure.
I think this could be put more clearly. 'When splitting a PMD THP migration
entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when
TTU_SPLIT_HUGE_PMD is specified.' or something like that.
>
> One way to reproduce:
>
> Create an anonymous thp range and fork 512 children, so we have a
> thp shared mapped in 513 processes. Then trigger folio split with
> /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
> order 0.
I think you should explain the issue before the repro. This is just confusing
things. Mention the repro _afterwards_.
>
> Without the above commit, we can successfully split to order 0.
> With the above commit, the folio is still a large folio.
>
> The reason is the above commit return false after split pmd
This sentence doesn't really make sense. Returns false where? And under what
circumstances?
I'm having to look through 60fbb14396d5 to understand this which isn't a good
sign.
'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration
entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
unmap_folio()), exit the walk and return false unconditionally'.
> unconditionally in the first process and break try_to_migrate().
>
> On memory pressure or failure, we would try to reclaim unused memory or
> limit bad memory after folio split. If failed to split it, we will leave
Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
something like that.
> some more memory unusable than expected.
'We will leave some more memory unusable than expected' is super unclear.
You mean we will fail to migrate THP entries at the PTE level?
Can we say this instead please?
>
> The tricky thing in above reproduce method is current debugfs interface
> leverage function split_huge_pages_pid(), which will iterate the whole
> pmd range and do folio split on each base page address. This means it
> will try 512 times, and each time split one pmd from pmd mapped to pte
> mapped thp. If there are less than 512 shared mapped process,
> the folio is still split successfully at last. But in real world, we
> usually try it for once.
This whole sentence could be dropped I think I don't think it adds anything.
And you're really confusing the issue by dwelling on this I think.
You need to restart the walk in this case in order for the PTEs to be correctly
handled right?
Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
the bit that did this change?
Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
mention the commit msg.
>
> This patch fixes this by restart page_vma_mapped_walk() after
> split_huge_pmd_locked(). We cannot simply return "true" to fix the
> problem, as that would affect another case:
I mean how would it fix the problem to incorrectly have it return true when the
walk had not in fact completed?
I'm not sure why you're dwelling on this idea in the commit msg?
> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
> leave the folio mapped through PTEs; we would return "true" from
> try_to_migrate_one() in that case as well. While that is mostly
> harmless, we could end up walking the rmap, wasting some cycles.
I mean I think we can just drop this whole paragraph no?
You might think I'm being picky about the commit msg here, but as is I find it
pretty much incomprehensible and that's not helpful if we have to go back and
read this in future.
>
> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
> Acked-by: David Hildenbrand (arm) <david@kernel.org>
> Cc: Gavin Guo <gavinguo@igalia.com>
> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: <stable@vger.kernel.org>
>
> ---
> v3:
> * gather RB
> * adjust the commit log and comment per David
Clearly not enough :)
> * add userspace-visible runtime effect in change log
Which one was that?
> v2:
> * restart page_vma_mapped_walk() after split_huge_pmd_locked()
> ---
> mm/rmap.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 618df3385c8b..1041a64b8e6b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> __maybe_unused pmd_t pmdval;
>
> if (flags & TTU_SPLIT_HUGE_PMD) {
> + /*
> + * split_huge_pmd_locked() might leave the
> + * folio mapped through PTEs. Retry the walk
> + * so we can detect this scenario and properly
> + * abort the walk.
> + */
This comment is a lot clearer than the commit msg :)
> split_huge_pmd_locked(vma, pvmw.address,
> pvmw.pmd, true);
> - ret = false;
> - page_vma_mapped_walk_done(&pvmw);
> - break;
> + flags &= ~TTU_SPLIT_HUGE_PMD;
> + page_vma_mapped_walk_restart(&pvmw);
> + continue;
This logic does lok reasonable.
> }
> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> pmdval = pmdp_get(pvmw.pmd);
> --
> 2.34.1
>
Cheers, Lorenzo
next prev parent reply other threads:[~2026-02-09 17:08 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes [this message]
2026-02-10 3:23 ` Wei Yang
2026-02-13 13:20 ` Wei Yang
2026-02-22 0:50 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fbd6c31f-7f35-4986-86e3-76bf8963433d@lucifer.local \
--to=lorenzo.stoakes@oracle.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@kernel.org \
--cc=gavinguo@igalia.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=lance.yang@linux.dev \
--cc=linux-mm@kvack.org \
--cc=richard.weiyang@gmail.com \
--cc=riel@surriel.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox