From: Wei Yang <richard.weiyang@gmail.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
jannh@google.com, gavinguo@igalia.com,
baolin.wang@linux.alibaba.com, ziy@nvidia.com,
linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Fri, 13 Feb 2026 13:20:27 +0000 [thread overview]
Message-ID: <20260213132027.wm75sh6trz7n24kd@master> (raw)
In-Reply-To: <20260210032304.j4k5izweewouabqb@master>
On Tue, Feb 10, 2026 at 03:23:04AM +0000, Wei Yang wrote:
>On Mon, Feb 09, 2026 at 05:08:16PM +0000, Lorenzo Stoakes wrote:
>>On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
>>> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
>>> split_huge_pmd_locked()") return false unconditionally after
>>> split_huge_pmd_locked() which may fail early during try_to_migrate() for
>>> shared thp. This will lead to unexpected folio split failure.
>>
>>I think this could be put more clearly. 'When splitting a PMD THP migration
>>entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when
>
>split_huge_pmd_locked() could split a PMD THP migration entry, but here we
>expect a PMD THP normal entry.
>
>>TTU_SPLIT_HUGE_PMD is specified.' or something like that.
>>
>>>
>>> One way to reproduce:
>>>
>>> Create an anonymous thp range and fork 512 children, so we have a
>>> thp shared mapped in 513 processes. Then trigger folio split with
>>> /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
>>> order 0.
>>
>>I think you should explain the issue before the repro. This is just confusing
>>things. Mention the repro _afterwards_.
>>
>
>OK, will move afterwards.
>
>>>
>>> Without the above commit, we can successfully split to order 0.
>>> With the above commit, the folio is still a large folio.
>>>
>>> The reason is the above commit return false after split pmd
>>
>>This sentence doesn't really make sense. Returns false where? And under what
>>circumstances?
>>
>>I'm having to look through 60fbb14396d5 to understand this which isn't a good
>>sign.
>>
>>'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration
>
>I am afraid the original intention of commit 60fbb14396d5 is not just for
>migration entry.
>
>>entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
>>unmap_folio()), exit the walk and return false unconditionally'.
>>
>>> unconditionally in the first process and break try_to_migrate().
>>>
>>> On memory pressure or failure, we would try to reclaim unused memory or
>>> limit bad memory after folio split. If failed to split it, we will leave
>>
>>Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
>>something like that.
>>
>
>What I want to mean is in memory_failure() we use try_to_split_thp_page() and
>the PG_has_hwpoisoned bit is only set in the after-split folio contains
>@split_at.
>
>>> some more memory unusable than expected.
>>
>>'We will leave some more memory unusable than expected' is super unclear.
>>
>>You mean we will fail to migrate THP entries at the PTE level?
>>
>
>No.
>
>Hmm... I would like to clarify before continue.
>
>This fix is not to fix migration case. This is to fix folio split for a shared
>mapped PMD THP. Current folio split leverage migration entry during split
>anonymous folio. So the action here is not to migrate it.
>
>I am a little lost here.
>
>>Can we say this instead please?
>>
Hi, Lorenzo
I am not sure understand you correctly. If not, please let me know.
>>>
>>> The tricky thing in above reproduce method is current debugfs interface
>>> leverage function split_huge_pages_pid(), which will iterate the whole
>>> pmd range and do folio split on each base page address. This means it
>>> will try 512 times, and each time split one pmd from pmd mapped to pte
>>> mapped thp. If there are less than 512 shared mapped process,
>>> the folio is still split successfully at last. But in real world, we
>>> usually try it for once.
>>
>>This whole sentence could be dropped I think I don't think it adds anything.
>>
>>And you're really confusing the issue by dwelling on this I think.
>>
It is intended to explain why the reproduce method should fork 512 child. In
case it is not helpful, I will drop it.
>>You need to restart the walk in this case in order for the PTEs to be correctly
>>handled right?
>>
>>Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
>>the bit that did this change?
Commit 60fbb14396d5 removed some duplicated check covered by
page_vma_mapped_walk(), so just reverting it may not good?
You mean a sentence like above is preferred in commit msg?
>>
>>Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
>>that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
>>mention the commit msg.
>>
Currently there are two core users of TTU_SPLIT_HUGE_PMD:
* try_to_unmap_one()
* try_to_migrate_one()
And another two indirect user by calling try_to_unmap():
* try_folio_split_or_unmap()
* shrink_folio_list()
try_to_unmap_one() doesn't fail early, so only try_to_migrate_one() is
affected.
So you prefer some description like above to be added in commit msg?
>>
>>>
>>> This patch fixes this by restart page_vma_mapped_walk() after
>>> split_huge_pmd_locked(). We cannot simply return "true" to fix the
>>> problem, as that would affect another case:
>>
>>I mean how would it fix the problem to incorrectly have it return true when the
>>walk had not in fact completed?
>>
>>I'm not sure why you're dwelling on this idea in the commit msg?
>>
>>> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
>>> leave the folio mapped through PTEs; we would return "true" from
>>> try_to_migrate_one() in that case as well. While that is mostly
>>> harmless, we could end up walking the rmap, wasting some cycles.
>>
>>I mean I think we can just drop this whole paragraph no?
>>
I had an original explanation in [1], which is not clear.
Then David proposed this version in [2], which looks good to me. So I took it
in v3.
If this is not necessary, I am ok to drop it.
[1]: http://lkml.kernel.org/r/20260204004219.6524-1-richard.weiyang@gmail.com
[2]: http://lkml.kernel.org/r/df86ccfd-68a5-416e-81cc-02858e395b70@kernel.org
>>You might think I'm being picky about the commit msg here, but as is I find it
>>pretty much incomprehensible and that's not helpful if we have to go back and
>>read this in future.
>>
>
>Never mind.
>
>A clearer and comprehensive change log is helpful for all. And my English is
>not native language, so your suggestion helps a lot.
>
>>>
>>> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Tested-by: Lance Yang <lance.yang@linux.dev>
>>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
>>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
>>> Cc: Gavin Guo <gavinguo@igalia.com>
>>> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Lance Yang <lance.yang@linux.dev>
>>> Cc: <stable@vger.kernel.org>
>>>
>>> ---
>>> v3:
>>> * gather RB
>>> * adjust the commit log and comment per David
>>
>>Clearly not enough :)
>>
>>> * add userspace-visible runtime effect in change log
>>
>>Which one was that?
>>
>>> v2:
>>> * restart page_vma_mapped_walk() after split_huge_pmd_locked()
>>> ---
>>> mm/rmap.c | 12 +++++++++---
>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 618df3385c8b..1041a64b8e6b 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>> __maybe_unused pmd_t pmdval;
>>>
>>> if (flags & TTU_SPLIT_HUGE_PMD) {
>>> + /*
>>> + * split_huge_pmd_locked() might leave the
>>> + * folio mapped through PTEs. Retry the walk
>>> + * so we can detect this scenario and properly
>>> + * abort the walk.
>>> + */
>>
>>This comment is a lot clearer than the commit msg :)
>>
>>> split_huge_pmd_locked(vma, pvmw.address,
>>> pvmw.pmd, true);
>>> - ret = false;
>>> - page_vma_mapped_walk_done(&pvmw);
>>> - break;
>>> + flags &= ~TTU_SPLIT_HUGE_PMD;
>>> + page_vma_mapped_walk_restart(&pvmw);
>>> + continue;
>>
>>This logic does lok reasonable.
>>
>>> }
>>> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>> pmdval = pmdp_get(pvmw.pmd);
>>> --
>>> 2.34.1
>>>
>>
>>Cheers, Lorenzo
>
>--
>Wei Yang
>Help you, Help me
--
Wei Yang
Help you, Help me
next prev parent reply other threads:[~2026-02-13 13:20 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes
2026-02-10 3:23 ` Wei Yang
2026-02-13 13:20 ` Wei Yang [this message]
2026-02-22 0:50 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260213132027.wm75sh6trz7n24kd@master \
--to=richard.weiyang@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@kernel.org \
--cc=gavinguo@igalia.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=lance.yang@linux.dev \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox