From: Wei Yang <richard.weiyang@gmail.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
jannh@google.com, gavinguo@igalia.com,
baolin.wang@linux.alibaba.com, ziy@nvidia.com,
linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Sun, 22 Feb 2026 00:50:18 +0000 [thread overview]
Message-ID: <20260222005018.r4xum26tfxgnnvys@master> (raw)
In-Reply-To: <20260213132027.wm75sh6trz7n24kd@master>
On Fri, Feb 13, 2026 at 01:20:27PM +0000, Wei Yang wrote:
>On Tue, Feb 10, 2026 at 03:23:04AM +0000, Wei Yang wrote:
>>On Mon, Feb 09, 2026 at 05:08:16PM +0000, Lorenzo Stoakes wrote:
>>>On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
>>>> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
>>>> split_huge_pmd_locked()") return false unconditionally after
>>>> split_huge_pmd_locked() which may fail early during try_to_migrate() for
>>>> shared thp. This will lead to unexpected folio split failure.
>>>
>>>I think this could be put more clearly. 'When splitting a PMD THP migration
>>>entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when
>>
>>split_huge_pmd_locked() could split a PMD THP migration entry, but here we
>>expect a PMD THP normal entry.
>>
>>>TTU_SPLIT_HUGE_PMD is specified.' or something like that.
>>>
>>>>
>>>> One way to reproduce:
>>>>
>>>> Create an anonymous thp range and fork 512 children, so we have a
>>>> thp shared mapped in 513 processes. Then trigger folio split with
>>>> /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
>>>> order 0.
>>>
>>>I think you should explain the issue before the repro. This is just confusing
>>>things. Mention the repro _afterwards_.
>>>
>>
>>OK, will move afterwards.
>>
>>>>
>>>> Without the above commit, we can successfully split to order 0.
>>>> With the above commit, the folio is still a large folio.
>>>>
>>>> The reason is the above commit return false after split pmd
>>>
>>>This sentence doesn't really make sense. Returns false where? And under what
>>>circumstances?
>>>
>>>I'm having to look through 60fbb14396d5 to understand this which isn't a good
>>>sign.
>>>
>>>'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration
>>
>>I am afraid the original intention of commit 60fbb14396d5 is not just for
>>migration entry.
>>
>>>entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
>>>unmap_folio()), exit the walk and return false unconditionally'.
>>>
>>>> unconditionally in the first process and break try_to_migrate().
>>>>
>>>> On memory pressure or failure, we would try to reclaim unused memory or
>>>> limit bad memory after folio split. If failed to split it, we will leave
>>>
>>>Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
>>>something like that.
>>>
>>
>>What I want to mean is in memory_failure() we use try_to_split_thp_page() and
>>the PG_has_hwpoisoned bit is only set in the after-split folio contains
>>@split_at.
>>
>>>> some more memory unusable than expected.
>>>
>>>'We will leave some more memory unusable than expected' is super unclear.
>>>
>>>You mean we will fail to migrate THP entries at the PTE level?
>>>
>>
>>No.
>>
>>Hmm... I would like to clarify before continue.
>>
>>This fix is not to fix migration case. This is to fix folio split for a shared
>>mapped PMD THP. Current folio split leverage migration entry during split
>>anonymous folio. So the action here is not to migrate it.
>>
>>I am a little lost here.
>>
>>>Can we say this instead please?
>>>
>
>Hi, Lorenzo
>
>I am not sure understand you correctly. If not, please let me know.
>
>>>>
>>>> The tricky thing in above reproduce method is current debugfs interface
>>>> leverage function split_huge_pages_pid(), which will iterate the whole
>>>> pmd range and do folio split on each base page address. This means it
>>>> will try 512 times, and each time split one pmd from pmd mapped to pte
>>>> mapped thp. If there are less than 512 shared mapped process,
>>>> the folio is still split successfully at last. But in real world, we
>>>> usually try it for once.
>>>
>>>This whole sentence could be dropped I think I don't think it adds anything.
>>>
>>>And you're really confusing the issue by dwelling on this I think.
>>>
>
>It is intended to explain why the reproduce method should fork 512 child. In
>case it is not helpful, I will drop it.
>
>>>You need to restart the walk in this case in order for the PTEs to be correctly
>>>handled right?
>>>
>>>Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
>>>the bit that did this change?
>
>Commit 60fbb14396d5 removed some duplicated check covered by
>page_vma_mapped_walk(), so just reverting it may not good?
>
>You mean a sentence like above is preferred in commit msg?
>
>>>
>>>Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
>>>that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
>>>mention the commit msg.
>>>
>
>Currently there are two core users of TTU_SPLIT_HUGE_PMD:
>
> * try_to_unmap_one()
> * try_to_migrate_one()
>
>And another two indirect user by calling try_to_unmap():
>
> * try_folio_split_or_unmap()
> * shrink_folio_list()
>
>try_to_unmap_one() doesn't fail early, so only try_to_migrate_one() is
>affected.
>
>So you prefer some description like above to be added in commit msg?
>
>>>
>>>>
>>>> This patch fixes this by restart page_vma_mapped_walk() after
>>>> split_huge_pmd_locked(). We cannot simply return "true" to fix the
>>>> problem, as that would affect another case:
>>>
>>>I mean how would it fix the problem to incorrectly have it return true when the
>>>walk had not in fact completed?
>>>
>>>I'm not sure why you're dwelling on this idea in the commit msg?
>>>
>>>> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
>>>> leave the folio mapped through PTEs; we would return "true" from
>>>> try_to_migrate_one() in that case as well. While that is mostly
>>>> harmless, we could end up walking the rmap, wasting some cycles.
>>>
>>>I mean I think we can just drop this whole paragraph no?
>>>
>
>I had an original explanation in [1], which is not clear.
>Then David proposed this version in [2], which looks good to me. So I took it
>in v3.
>
>If this is not necessary, I am ok to drop it.
>
>[1]: http://lkml.kernel.org/r/20260204004219.6524-1-richard.weiyang@gmail.com
>[2]: http://lkml.kernel.org/r/df86ccfd-68a5-416e-81cc-02858e395b70@kernel.org
>
Hi, Lorenzo
I am not certain on how you prefer the commit msg, would you mind taking a
look at my question when you have time slot? So I could prepare next version.
Thanks a lot.
>>>You might think I'm being picky about the commit msg here, but as is I find it
>>>pretty much incomprehensible and that's not helpful if we have to go back and
>>>read this in future.
>>>
>>
>>Never mind.
>>
>>A clearer and comprehensive change log is helpful for all. And my English is
>>not native language, so your suggestion helps a lot.
>>
>>>>
>>>> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>> Tested-by: Lance Yang <lance.yang@linux.dev>
>>>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>>> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
>>>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
>>>> Cc: Gavin Guo <gavinguo@igalia.com>
>>>> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Cc: Lance Yang <lance.yang@linux.dev>
>>>> Cc: <stable@vger.kernel.org>
>>>>
>>>> ---
>>>> v3:
>>>> * gather RB
>>>> * adjust the commit log and comment per David
>>>
>>>Clearly not enough :)
>>>
>>>> * add userspace-visible runtime effect in change log
>>>
>>>Which one was that?
>>>
>>>> v2:
>>>> * restart page_vma_mapped_walk() after split_huge_pmd_locked()
>>>> ---
>>>> mm/rmap.c | 12 +++++++++---
>>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 618df3385c8b..1041a64b8e6b 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>> __maybe_unused pmd_t pmdval;
>>>>
>>>> if (flags & TTU_SPLIT_HUGE_PMD) {
>>>> + /*
>>>> + * split_huge_pmd_locked() might leave the
>>>> + * folio mapped through PTEs. Retry the walk
>>>> + * so we can detect this scenario and properly
>>>> + * abort the walk.
>>>> + */
>>>
>>>This comment is a lot clearer than the commit msg :)
>>>
>>>> split_huge_pmd_locked(vma, pvmw.address,
>>>> pvmw.pmd, true);
>>>> - ret = false;
>>>> - page_vma_mapped_walk_done(&pvmw);
>>>> - break;
>>>> + flags &= ~TTU_SPLIT_HUGE_PMD;
>>>> + page_vma_mapped_walk_restart(&pvmw);
>>>> + continue;
>>>
>>>This logic does lok reasonable.
>>>
>>>> }
>>>> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>> pmdval = pmdp_get(pvmw.pmd);
>>>> --
>>>> 2.34.1
>>>>
>>>
>>>Cheers, Lorenzo
>>
>>--
>>Wei Yang
>>Help you, Help me
>
>--
>Wei Yang
>Help you, Help me
--
Wei Yang
Help you, Help me
prev parent reply other threads:[~2026-02-22 0:50 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-05 3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes
2026-02-10 3:23 ` Wei Yang
2026-02-13 13:20 ` Wei Yang
2026-02-22 0:50 ` Wei Yang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260222005018.r4xum26tfxgnnvys@master \
--to=richard.weiyang@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@kernel.org \
--cc=gavinguo@igalia.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=lance.yang@linux.dev \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox