Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Wei Yang <richard.weiyang@gmail.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
	jannh@google.com, gavinguo@igalia.com,
	baolin.wang@linux.alibaba.com, ziy@nvidia.com,
	linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
	stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Tue, 10 Feb 2026 03:23:04 +0000	[thread overview]
Message-ID: <20260210032304.j4k5izweewouabqb@master> (raw)
In-Reply-To: <fbd6c31f-7f35-4986-86e3-76bf8963433d@lucifer.local>

On Mon, Feb 09, 2026 at 05:08:16PM +0000, Lorenzo Stoakes wrote:
>On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
>> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
>> split_huge_pmd_locked()") return false unconditionally after
>> split_huge_pmd_locked() which may fail early during try_to_migrate() for
>> shared thp. This will lead to unexpected folio split failure.
>
>I think this could be put more clearly. 'When splitting a PMD THP migration
>entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when

split_huge_pmd_locked() could split a PMD THP migration entry, but here we
expect a PMD THP normal entry.

>TTU_SPLIT_HUGE_PMD is specified.' or something like that.
>
>>
>> One way to reproduce:
>>
>>     Create an anonymous thp range and fork 512 children, so we have a
>>     thp shared mapped in 513 processes. Then trigger folio split with
>>     /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
>>     order 0.
>
>I think you should explain the issue before the repro. This is just confusing
>things. Mention the repro _afterwards_.
>

OK, will move afterwards.

>>
>> Without the above commit, we can successfully split to order 0.
>> With the above commit, the folio is still a large folio.
>>
>> The reason is the above commit return false after split pmd
>
>This sentence doesn't really make sense. Returns false where? And under what
>circumstances?
>
>I'm having to look through 60fbb14396d5 to understand this which isn't a good
>sign.
>
>'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration

I am afraid the original intention of commit 60fbb14396d5 is not just for
migration entry.

>entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
>unmap_folio()), exit the walk and return false unconditionally'.
>
>> unconditionally in the first process and break try_to_migrate().
>>
>> On memory pressure or failure, we would try to reclaim unused memory or
>> limit bad memory after folio split. If failed to split it, we will leave
>
>Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
>something like that.
>

What I want to mean is in memory_failure() we use try_to_split_thp_page() and
the PG_has_hwpoisoned bit is only set in the after-split folio contains
@split_at.

>> some more memory unusable than expected.
>
>'We will leave some more memory unusable than expected' is super unclear.
>
>You mean we will fail to migrate THP entries at the PTE level?
>

No. 

Hmm... I would like to clarify before continue.

This fix is not to fix migration case. This is to fix folio split for a shared
mapped PMD THP. Current folio split leverage migration entry during split
anonymous folio. So the action here is not to migrate it.

I am a little lost here.

>Can we say this instead please?
>
>>
>> The tricky thing in above reproduce method is current debugfs interface
>> leverage function split_huge_pages_pid(), which will iterate the whole
>> pmd range and do folio split on each base page address. This means it
>> will try 512 times, and each time split one pmd from pmd mapped to pte
>> mapped thp. If there are less than 512 shared mapped process,
>> the folio is still split successfully at last. But in real world, we
>> usually try it for once.
>
>This whole sentence could be dropped I think I don't think it adds anything.
>
>And you're really confusing the issue by dwelling on this I think.
>
>You need to restart the walk in this case in order for the PTEs to be correctly
>handled right?
>
>Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
>the bit that did this change?
>
>Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
>that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
>mention the commit msg.
>
>
>>
>> This patch fixes this by restart page_vma_mapped_walk() after
>> split_huge_pmd_locked(). We cannot simply return "true" to fix the
>> problem, as that would affect another case:
>
>I mean how would it fix the problem to incorrectly have it return true when the
>walk had not in fact completed?
>
>I'm not sure why you're dwelling on this idea in the commit msg?
>
>> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
>> leave the folio mapped through PTEs; we would return "true" from
>> try_to_migrate_one() in that case as well. While that is mostly
>> harmless, we could end up walking the rmap, wasting some cycles.
>
>I mean I think we can just drop this whole paragraph no?
>
>You might think I'm being picky about the commit msg here, but as is I find it
>pretty much incomprehensible and that's not helpful if we have to go back and
>read this in future.
>

Never mind.

A clearer and comprehensive change log is helpful for all. And my English is
not native language, so your suggestion helps a lot.

>>
>> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Tested-by: Lance Yang <lance.yang@linux.dev>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
>> Cc: Gavin Guo <gavinguo@igalia.com>
>> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Lance Yang <lance.yang@linux.dev>
>> Cc: <stable@vger.kernel.org>
>>
>> ---
>> v3:
>>   * gather RB
>>   * adjust the commit log and comment per David
>
>Clearly not enough :)
>
>>   * add userspace-visible runtime effect in change log
>
>Which one was that?
>
>> v2:
>>   * restart page_vma_mapped_walk() after split_huge_pmd_locked()
>> ---
>>  mm/rmap.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 618df3385c8b..1041a64b8e6b 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>  			__maybe_unused pmd_t pmdval;
>>
>>  			if (flags & TTU_SPLIT_HUGE_PMD) {
>> +				/*
>> +				 * split_huge_pmd_locked() might leave the
>> +				 * folio mapped through PTEs. Retry the walk
>> +				 * so we can detect this scenario and properly
>> +				 * abort the walk.
>> +				 */
>
>This comment is a lot clearer than the commit msg :)
>
>>  				split_huge_pmd_locked(vma, pvmw.address,
>>  						      pvmw.pmd, true);
>> -				ret = false;
>> -				page_vma_mapped_walk_done(&pvmw);
>> -				break;
>> +				flags &= ~TTU_SPLIT_HUGE_PMD;
>> +				page_vma_mapped_walk_restart(&pvmw);
>> +				continue;
>
>This logic does lok reasonable.
>
>>  			}
>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>  			pmdval = pmdp_get(pvmw.pmd);
>> --
>> 2.34.1
>>
>
>Cheers, Lorenzo

-- 
Wei Yang
Help you, Help me

next prev parent reply	other threads:[~2026-02-10  3:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05  3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes
2026-02-10  3:23   ` Wei Yang [this message]
2026-02-13 13:20     ` Wei Yang
2026-02-22  0:50       ` Wei Yang
2026-03-03 10:03         ` Lorenzo Stoakes
2026-03-03 10:12       ` Lorenzo Stoakes
2026-03-04  1:08         ` Wei Yang
2026-03-04 17:02           ` Lorenzo Stoakes (Oracle)
2026-03-03 10:08     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260210032304.j4k5izweewouabqb@master \
    --to=richard.weiyang@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=gavinguo@igalia.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox