linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, david@kernel.org, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
	jannh@google.com, gavinguo@igalia.com,
	baolin.wang@linux.alibaba.com, ziy@nvidia.com,
	linux-mm@kvack.org, Lance Yang <lance.yang@linux.dev>,
	stable@vger.kernel.org
Subject: Re: [Patch v3] mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared thp
Date: Tue, 3 Mar 2026 10:12:35 +0000	[thread overview]
Message-ID: <c820d6a1-4283-46a5-a639-04f6849e4e73@lucifer.local> (raw)
In-Reply-To: <20260213132027.wm75sh6trz7n24kd@master>

On Fri, Feb 13, 2026 at 01:20:27PM +0000, Wei Yang wrote:
> On Tue, Feb 10, 2026 at 03:23:04AM +0000, Wei Yang wrote:
> >On Mon, Feb 09, 2026 at 05:08:16PM +0000, Lorenzo Stoakes wrote:
> >>On Thu, Feb 05, 2026 at 03:31:13AM +0000, Wei Yang wrote:
> >>> Commit 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and
> >>> split_huge_pmd_locked()") return false unconditionally after
> >>> split_huge_pmd_locked() which may fail early during try_to_migrate() for
> >>> shared thp. This will lead to unexpected folio split failure.
> >>
> >>I think this could be put more clearly. 'When splitting a PMD THP migration
> >>entry in try_to_migrate_one() in a rmap walk invoked by try_to_migrate() when
> >
> >split_huge_pmd_locked() could split a PMD THP migration entry, but here we
> >expect a PMD THP normal entry.
> >
> >>TTU_SPLIT_HUGE_PMD is specified.' or something like that.
> >>
> >>>
> >>> One way to reproduce:
> >>>
> >>>     Create an anonymous thp range and fork 512 children, so we have a
> >>>     thp shared mapped in 513 processes. Then trigger folio split with
> >>>     /sys/kernel/debug/split_huge_pages debugfs to split the thp folio to
> >>>     order 0.
> >>
> >>I think you should explain the issue before the repro. This is just confusing
> >>things. Mention the repro _afterwards_.
> >>
> >
> >OK, will move afterwards.
> >
> >>>
> >>> Without the above commit, we can successfully split to order 0.
> >>> With the above commit, the folio is still a large folio.
> >>>
> >>> The reason is the above commit return false after split pmd
> >>
> >>This sentence doesn't really make sense. Returns false where? And under what
> >>circumstances?
> >>
> >>I'm having to look through 60fbb14396d5 to understand this which isn't a good
> >>sign.
> >>
> >>'This patch adjusted try_to_migrate_one() to, when a PMD-mapped THP migration
> >
> >I am afraid the original intention of commit 60fbb14396d5 is not just for
> >migration entry.
> >
> >>entry is found, and TTU_SPLIT_HUGE_PMD is specified (for example, via
> >>unmap_folio()), exit the walk and return false unconditionally'.
> >>
> >>> unconditionally in the first process and break try_to_migrate().
> >>>
> >>> On memory pressure or failure, we would try to reclaim unused memory or
> >>> limit bad memory after folio split. If failed to split it, we will leave
> >>
> >>Limit bad memory? What does that mean? Also should be If '_we_' or '_it_' or
> >>something like that.
> >>
> >
> >What I want to mean is in memory_failure() we use try_to_split_thp_page() and
> >the PG_has_hwpoisoned bit is only set in the after-split folio contains
> >@split_at.

I mean is this the case you're asserting in your repro or is it the only one in
which the issue can arise?

You should make this clear with reference to the actual functions where this
happens in the commit msg.

> >
> >>> some more memory unusable than expected.
> >>
> >>'We will leave some more memory unusable than expected' is super unclear.
> >>
> >>You mean we will fail to migrate THP entries at the PTE level?
> >>
> >
> >No.
> >
> >Hmm... I would like to clarify before continue.
> >
> >This fix is not to fix migration case. This is to fix folio split for a shared
> >mapped PMD THP. Current folio split leverage migration entry during split
> >anonymous folio. So the action here is not to migrate it.
> >
> >I am a little lost here.
> >
> >>Can we say this instead please?
> >>
>
> Hi, Lorenzo
>
> I am not sure understand you correctly. If not, please let me know.
>
> >>>
> >>> The tricky thing in above reproduce method is current debugfs interface
> >>> leverage function split_huge_pages_pid(), which will iterate the whole
> >>> pmd range and do folio split on each base page address. This means it
> >>> will try 512 times, and each time split one pmd from pmd mapped to pte
> >>> mapped thp. If there are less than 512 shared mapped process,
> >>> the folio is still split successfully at last. But in real world, we
> >>> usually try it for once.
> >>
> >>This whole sentence could be dropped I think I don't think it adds anything.
> >>
> >>And you're really confusing the issue by dwelling on this I think.
> >>
>
> It is intended to explain why the reproduce method should fork 512 child. In
> case it is not helpful, I will drop it.

Yeah it's not too helpful I don't think. You could say 'forking many children'
or something.

>
> >>You need to restart the walk in this case in order for the PTEs to be correctly
> >>handled right?
> >>
> >>Can you explain why we can't just essentially revert 60fbb14396d5? Or at least
> >>the bit that did this change?
>
> Commit 60fbb14396d5 removed some duplicated check covered by
> page_vma_mapped_walk(), so just reverting it may not good?
>
> You mean a sentence like above is preferred in commit msg?

I mean you need to explain why you're not just reverting it, saying why in
the commit msg would be helpful yes, thanks!

>
> >>
> >>Also is unmap_folio() the only caller with TTU_SPLIT_HUGE_PMD as the comment
> >>that was deleted by 60fbb14396d5 implied? Or are there others? If it is, please
> >>mention the commit msg.
> >>
>
> Currently there are two core users of TTU_SPLIT_HUGE_PMD:
>
>   * try_to_unmap_one()
>   * try_to_migrate_one()
>
> And another two indirect user by calling try_to_unmap():
>
>   * try_folio_split_or_unmap()
>   * shrink_folio_list()
>
> try_to_unmap_one() doesn't fail early, so only try_to_migrate_one() is
> affected.
>
> So you prefer some description like above to be added in commit msg?

Yes please! Thanks.

>
> >>
> >>>
> >>> This patch fixes this by restart page_vma_mapped_walk() after
> >>> split_huge_pmd_locked(). We cannot simply return "true" to fix the
> >>> problem, as that would affect another case:
> >>
> >>I mean how would it fix the problem to incorrectly have it return true when the
> >>walk had not in fact completed?
> >>
> >>I'm not sure why you're dwelling on this idea in the commit msg?
> >>
> >>> split_huge_pmd_locked()->folio_try_share_anon_rmap_pmd() can failed and
> >>> leave the folio mapped through PTEs; we would return "true" from
> >>> try_to_migrate_one() in that case as well. While that is mostly
> >>> harmless, we could end up walking the rmap, wasting some cycles.
> >>
> >>I mean I think we can just drop this whole paragraph no?
> >>
>
> I had an original explanation in [1], which is not clear.
> Then David proposed this version in [2], which looks good to me. So I took it
> in v3.
>
> If this is not necessary, I am ok to drop it.

Hmm :P well I don't want to contradict David, his suggestions are usually
excellent, but I think that paragraph needs rework at the very least. It's
useful to mention functions explicitly, I think something like:

'when invoking folio_try_share_anon_rmap_pmd() from split_huge_pmd_locked(), the
latter can fail and leave a large folio mapped using PTEs, in which case we
ought to return true from try_to_migrate_one(). This might result in unnecesary
walking of the rmap but is relatively harmless'

Might work better?

>
> [1]: http://lkml.kernel.org/r/20260204004219.6524-1-richard.weiyang@gmail.com
> [2]: http://lkml.kernel.org/r/df86ccfd-68a5-416e-81cc-02858e395b70@kernel.org
>
> >>You might think I'm being picky about the commit msg here, but as is I find it
> >>pretty much incomprehensible and that's not helpful if we have to go back and
> >>read this in future.
> >>
> >
> >Never mind.
> >
> >A clearer and comprehensive change log is helpful for all. And my English is
> >not native language, so your suggestion helps a lot.
> >
> >>>
> >>> Fixes: 60fbb14396d5 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
> >>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>> Reviewed-by: Zi Yan <ziy@nvidia.com>
> >>> Tested-by: Lance Yang <lance.yang@linux.dev>
> >>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> >>> Reviewed-by: Gavin Guo <gavinguo@igalia.com>
> >>> Acked-by: David Hildenbrand (arm) <david@kernel.org>
> >>> Cc: Gavin Guo <gavinguo@igalia.com>
> >>> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
> >>> Cc: Zi Yan <ziy@nvidia.com>
> >>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>> Cc: Lance Yang <lance.yang@linux.dev>
> >>> Cc: <stable@vger.kernel.org>
> >>>
> >>> ---
> >>> v3:
> >>>   * gather RB
> >>>   * adjust the commit log and comment per David
> >>
> >>Clearly not enough :)
> >>
> >>>   * add userspace-visible runtime effect in change log
> >>
> >>Which one was that?
> >>
> >>> v2:
> >>>   * restart page_vma_mapped_walk() after split_huge_pmd_locked()
> >>> ---
> >>>  mm/rmap.c | 12 +++++++++---
> >>>  1 file changed, 9 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index 618df3385c8b..1041a64b8e6b 100644
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -2446,11 +2446,17 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> >>>  			__maybe_unused pmd_t pmdval;
> >>>
> >>>  			if (flags & TTU_SPLIT_HUGE_PMD) {
> >>> +				/*
> >>> +				 * split_huge_pmd_locked() might leave the
> >>> +				 * folio mapped through PTEs. Retry the walk
> >>> +				 * so we can detect this scenario and properly
> >>> +				 * abort the walk.
> >>> +				 */
> >>
> >>This comment is a lot clearer than the commit msg :)
> >>
> >>>  				split_huge_pmd_locked(vma, pvmw.address,
> >>>  						      pvmw.pmd, true);
> >>> -				ret = false;
> >>> -				page_vma_mapped_walk_done(&pvmw);
> >>> -				break;
> >>> +				flags &= ~TTU_SPLIT_HUGE_PMD;
> >>> +				page_vma_mapped_walk_restart(&pvmw);
> >>> +				continue;
> >>
> >>This logic does lok reasonable.
> >>
> >>>  			}
> >>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> >>>  			pmdval = pmdp_get(pvmw.pmd);
> >>> --
> >>> 2.34.1
> >>>
> >>
> >>Cheers, Lorenzo
> >
> >--
> >Wei Yang
> >Help you, Help me
>
> --
> Wei Yang
> Help you, Help me

Cheers, Lorenzo


  parent reply	other threads:[~2026-03-03 10:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05  3:31 Wei Yang
2026-02-09 17:08 ` Lorenzo Stoakes
2026-02-10  3:23   ` Wei Yang
2026-02-13 13:20     ` Wei Yang
2026-02-22  0:50       ` Wei Yang
2026-03-03 10:03         ` Lorenzo Stoakes
2026-03-03 10:12       ` Lorenzo Stoakes [this message]
2026-03-03 10:08     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c820d6a1-4283-46a5-a639-04f6849e4e73@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=gavinguo@igalia.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-mm@kvack.org \
    --cc=richard.weiyang@gmail.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox