From: Wei Yang <richard.weiyang@gmail.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Wei Yang <richard.weiyang@gmail.com>,
akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
jannh@google.com, gavinguo@igalia.com,
baolin.wang@linux.alibaba.com, ziy@nvidia.com,
linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Tue, 10 Feb 2026 03:23:04 +0000 [thread overview]
Message-ID: <20260210032304.j4k5izweewouabqb@master> (raw)
In-Reply-To: <fbd6c31f-7f35-4986-86e3-76bf8963433d@lucifer.local>
On Mon, Feb 09, 2026 at 05:08:16PM +0000, Lorenzo Stoakes wrote:
>On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
>> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
>> split_huge_pmd_locked()") return false unconditionally after
>> split_huge_pmd_locked() which may fail early during try_to_migrate() for
>> shared thp. This will lead to unexpected folio split failure.
>
>I think this could be put more clearly. 'When splitting a PMD THP migration
>entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when
split_huge_pmd_locked() could split a PMD THP migration entry, but here we
expect a PMD THP normal entry.
>TTU_SPLIT_HUGE_PMD is specified.' or something like that.
>
>>
>> One way to reproduce:
>>
>> Create an anonymous thp range and fork 512 children, so we have a
>> thp shared mapped in 513 processes. Then trigger folio split with
>> /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
>> order 0.
>
>I think you should explain the issue before the repro. This is just confusing
>things. Mention the repro _afterwards_.
>
OK, will move afterwards.
>>
>> Without the above commit, we can successfully split to order 0.
>> With the above commit, the folio is still a large folio.
>>
>> The reason is the above commit return false after split pmd
>
>This sentence doesn't really make sense. Returns false where? And under what
>circumstances?
>
>I'm having to look through 60fbb14396d5 to understand this which isn't a good
>sign.
>
>'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration
I am afraid the original intention of commit 60fbb14396d5 is not just for
migration entry.
>entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
>unmap_folio()), exit the walk and return false unconditionally'.
>
>> unconditionally in the first process and break try_to_migrate().
>>
>> On memory pressure or failure, we would try to reclaim unused memory or
>> limit bad memory after folio split. If failed to split it, we will leave
>
>Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
>something like that.
>
What I want to mean is in memory_failure() we use try_to_split_thp_page() and
the PG_has_hwpoisoned bit is only set in the after-split folio contains
@split_at.
>> some more memory unusable than expected.
>
>'We will leave some more memory unusable than expected' is super unclear.
>
>You mean we will fail to migrate THP entries at the PTE level?
>
No.
Hmm... I would like to clarify before continue.
This fix is not to fix migration case. This is to fix folio split for a shared
mapped PMD THP. Current folio split leverage migration entry during split
anonymous folio. So the action here is not to migrate it.
I am a little lost here.
>Can we say this instead please?
>
>>
>> The tricky thing in above reproduce method is current debugfs interface
>> leverage function split_huge_pages_pid(), which will iterate the whole
>> pmd range and do folio split on each base page address. This means it
>> will try 512 times, and each time split one pmd from pmd mapped to pte
>> mapped thp. If there are less than 512 shared mapped process,
>> the folio is still split successfully at last. But in real world, we
>> usually try it for once.
>
>This whole sentence could be dropped I think I don't think it adds anything.
>
>And you're really confusing the issue by dwelling on this I think.
>
>You need to restart the walk in this case in order for the PTEs to be correctly
>handled right?
>
>Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
>the bit that did this change?
>
>Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
>that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
>mention the commit msg.
>
>
>>
>> This patch fixes this by restart page_vma_mapped_walk() after
>> split_huge_pmd_locked(). We cannot simply return "true" to fix the
>> problem, as that would affect another case:
>
>I mean how would it fix the problem to incorrectly have it return true when the
>walk had not in fact completed?
>
>I'm not sure why you're dwelling on this idea in the commit msg?
>
>> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
>> leave the folio mapped through PTEs; we would return "true" from
>> try_to_migrate_one() in that case as well. While that is mostly
>> harmless, we could end up walking the rmap, wasting some cycles.
>
>I mean I think we can just drop this whole paragraph no?
>
>You might think I'm being picky about the commit msg here, but as is I find it
>pretty much incomprehensible and that's not helpful if we have to go back and
>read this in future.
>
Never mind.
A clearer and comprehensive change log is helpful for all. And my English is
not native language, so your suggestion helps a lot.
>>
>> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Tested-by: Lance Yang <lance.yang@linux.dev>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
>> Cc: Gavin Guo <gavinguo@igalia.com>
>> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Lance Yang <lance.yang@linux.dev>
>> Cc: <stable@vger.kernel.org>
>>
>> ---
>> v3:
>> * gather RB
>> * adjust the commit log and comment per David
>
>Clearly not enough :)
>
>> * add userspace-visible runtime effect in change log
>
>Which one was that?
>
>> v2:
>> * restart page_vma_mapped_walk() after split_huge_pmd_locked()
>> ---
>> mm/rmap.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 618df3385c8b..1041a64b8e6b 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>> __maybe_unused pmd_t pmdval;
>>
>> if (flags & TTU_SPLIT_HUGE_PMD) {
>> + /*
>> + * split_huge_pmd_locked() might leave the
>> + * folio mapped through PTEs. Retry the walk
>> + * so we can detect this scenario and properly
>> + * abort the walk.
>> + */
>
>This comment is a lot clearer than the commit msg :)
>
>> split_huge_pmd_locked(vma, pvmw.address,
>> pvmw.pmd, true);
>> - ret = false;
>> - page_vma_mapped_walk_done(&pvmw);
>> - break;
>> + flags &= ~TTU_SPLIT_HUGE_PMD;
>> + page_vma_mapped_walk_restart(&pvmw);
>> + continue;
>
>This logic does lok reasonable.
>
>> }
>> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>> pmdval = pmdp_get(pvmw.pmd);
>> --
>> 2.34.1
>>
>
>Cheers, Lorenzo
--
Wei Yang
Help you, Help me
next prev parent reply other threads:[~2026-02-10 3:23 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes
2026-02-10 3:23 ` Wei Yang [this message]
2026-02-13 13:20 ` Wei Yang
2026-02-22 0:50 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260210032304.j4k5izweewouabqb@master \
--to=richard.weiyang@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@kernel.org \
--cc=gavinguo@igalia.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=lance.yang@linux.dev \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox