linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Wei Yang <richard.weiyang@gmail.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
	jannh@google.com, gavinguo@igalia.com,
	baolin.wang@linux.alibaba.com, ziy@nvidia.com,
	linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
	stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Fri, 13 Feb 2026 13:20:27 +0000	[thread overview]
Message-ID: <20260213132027.wm75sh6trz7n24kd@master> (raw)
In-Reply-To: <20260210032304.j4k5izweewouabqb@master>

On Tue, Feb 10, 2026 at 03:23:04AM +0000, Wei Yang wrote:
>On Mon, Feb 09, 2026 at 05:08:16PM +0000, Lorenzo Stoakes wrote:
>>On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
>>> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
>>> split_huge_pmd_locked()") return false unconditionally after
>>> split_huge_pmd_locked() which may fail early during try_to_migrate() for
>>> shared thp. This will lead to unexpected folio split failure.
>>
>>I think this could be put more clearly. 'When splitting a PMD THP migration
>>entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when
>
>split_huge_pmd_locked() could split a PMD THP migration entry, but here we
>expect a PMD THP normal entry.
>
>>TTU_SPLIT_HUGE_PMD is specified.' or something like that.
>>
>>>
>>> One way to reproduce:
>>>
>>>     Create an anonymous thp range and fork 512 children, so we have a
>>>     thp shared mapped in 513 processes. Then trigger folio split with
>>>     /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
>>>     order 0.
>>
>>I think you should explain the issue before the repro. This is just confusing
>>things. Mention the repro _afterwards_.
>>
>
>OK, will move afterwards.
>
>>>
>>> Without the above commit, we can successfully split to order 0.
>>> With the above commit, the folio is still a large folio.
>>>
>>> The reason is the above commit return false after split pmd
>>
>>This sentence doesn't really make sense. Returns false where? And under what
>>circumstances?
>>
>>I'm having to look through 60fbb14396d5 to understand this which isn't a good
>>sign.
>>
>>'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration
>
>I am afraid the original intention of commit 60fbb14396d5 is not just for
>migration entry.
>
>>entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
>>unmap_folio()), exit the walk and return false unconditionally'.
>>
>>> unconditionally in the first process and break try_to_migrate().
>>>
>>> On memory pressure or failure, we would try to reclaim unused memory or
>>> limit bad memory after folio split. If failed to split it, we will leave
>>
>>Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
>>something like that.
>>
>
>What I want to mean is in memory_failure() we use try_to_split_thp_page() and
>the PG_has_hwpoisoned bit is only set in the after-split folio contains
>@split_at.
>
>>> some more memory unusable than expected.
>>
>>'We will leave some more memory unusable than expected' is super unclear.
>>
>>You mean we will fail to migrate THP entries at the PTE level?
>>
>
>No. 
>
>Hmm... I would like to clarify before continue.
>
>This fix is not to fix migration case. This is to fix folio split for a shared
>mapped PMD THP. Current folio split leverage migration entry during split
>anonymous folio. So the action here is not to migrate it.
>
>I am a little lost here.
>
>>Can we say this instead please?
>>

Hi, Lorenzo

I am not sure understand you correctly. If not, please let me know.

>>>
>>> The tricky thing in above reproduce method is current debugfs interface
>>> leverage function split_huge_pages_pid(), which will iterate the whole
>>> pmd range and do folio split on each base page address. This means it
>>> will try 512 times, and each time split one pmd from pmd mapped to pte
>>> mapped thp. If there are less than 512 shared mapped process,
>>> the folio is still split successfully at last. But in real world, we
>>> usually try it for once.
>>
>>This whole sentence could be dropped I think I don't think it adds anything.
>>
>>And you're really confusing the issue by dwelling on this I think.
>>

It is intended to explain why the reproduce method should fork 512 child. In
case it is not helpful, I will drop it.

>>You need to restart the walk in this case in order for the PTEs to be correctly
>>handled right?
>>
>>Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
>>the bit that did this change?

Commit 60fbb14396d5 removed some duplicated check covered by
page_vma_mapped_walk(), so just reverting it may not good?

You mean a sentence like above is preferred in commit msg?

>>
>>Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
>>that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
>>mention the commit msg.
>>

Currently there are two core users of TTU_SPLIT_HUGE_PMD:

  * try_to_unmap_one()
  * try_to_migrate_one()

And another two indirect user by calling try_to_unmap():

  * try_folio_split_or_unmap()
  * shrink_folio_list()

try_to_unmap_one() doesn't fail early, so only try_to_migrate_one() is
affected.

So you prefer some description like above to be added in commit msg?

>>
>>>
>>> This patch fixes this by restart page_vma_mapped_walk() after
>>> split_huge_pmd_locked(). We cannot simply return "true" to fix the
>>> problem, as that would affect another case:
>>
>>I mean how would it fix the problem to incorrectly have it return true when the
>>walk had not in fact completed?
>>
>>I'm not sure why you're dwelling on this idea in the commit msg?
>>
>>> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
>>> leave the folio mapped through PTEs; we would return "true" from
>>> try_to_migrate_one() in that case as well. While that is mostly
>>> harmless, we could end up walking the rmap, wasting some cycles.
>>
>>I mean I think we can just drop this whole paragraph no?
>>

I had an original explanation in [1], which is not clear.
Then David proposed this version in [2], which looks good to me. So I took it
in v3.

If this is not necessary, I am ok to drop it.

[1]: http://lkml.kernel.org/r/20260204004219.6524-1-richard.weiyang@gmail.com
[2]: http://lkml.kernel.org/r/df86ccfd-68a5-416e-81cc-02858e395b70@kernel.org

>>You might think I'm being picky about the commit msg here, but as is I find it
>>pretty much incomprehensible and that's not helpful if we have to go back and
>>read this in future.
>>
>
>Never mind.
>
>A clearer and comprehensive change log is helpful for all. And my English is
>not native language, so your suggestion helps a lot.
>
>>>
>>> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Tested-by: Lance Yang <lance.yang@linux.dev>
>>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
>>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
>>> Cc: Gavin Guo <gavinguo@igalia.com>
>>> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Lance Yang <lance.yang@linux.dev>
>>> Cc: <stable@vger.kernel.org>
>>>
>>> ---
>>> v3:
>>>   * gather RB
>>>   * adjust the commit log and comment per David
>>
>>Clearly not enough :)
>>
>>>   * add userspace-visible runtime effect in change log
>>
>>Which one was that?
>>
>>> v2:
>>>   * restart page_vma_mapped_walk() after split_huge_pmd_locked()
>>> ---
>>>  mm/rmap.c | 12 +++++++++---
>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 618df3385c8b..1041a64b8e6b 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>  			__maybe_unused pmd_t pmdval;
>>>
>>>  			if (flags & TTU_SPLIT_HUGE_PMD) {
>>> +				/*
>>> +				 * split_huge_pmd_locked() might leave the
>>> +				 * folio mapped through PTEs. Retry the walk
>>> +				 * so we can detect this scenario and properly
>>> +				 * abort the walk.
>>> +				 */
>>
>>This comment is a lot clearer than the commit msg :)
>>
>>>  				split_huge_pmd_locked(vma, pvmw.address,
>>>  						      pvmw.pmd, true);
>>> -				ret = false;
>>> -				page_vma_mapped_walk_done(&pvmw);
>>> -				break;
>>> +				flags &= ~TTU_SPLIT_HUGE_PMD;
>>> +				page_vma_mapped_walk_restart(&pvmw);
>>> +				continue;
>>
>>This logic does lok reasonable.
>>
>>>  			}
>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>  			pmdval = pmdp_get(pvmw.pmd);
>>> --
>>> 2.34.1
>>>
>>
>>Cheers, Lorenzo
>
>-- 
>Wei Yang
>Help you, Help me

-- 
Wei Yang
Help you, Help me


  reply	other threads:[~2026-02-13 13:20 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05  3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes
2026-02-10  3:23   ` Wei Yang
2026-02-13 13:20     ` Wei Yang [this message]
2026-02-22  0:50       ` Wei Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260213132027.wm75sh6trz7n24kd@master \
    --to=richard.weiyang@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=gavinguo@igalia.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox