From: Harry Yoo <harry.yoo@oracle.com>
To: Jane Chu <jane.chu@oracle.com>
Cc: osalvador@suse.de, liushixin2@huawei.com, muchun.song@linux.dev,
david@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, jannh@google.com
Subject: Re: [PATCH] mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
Date: Wed, 10 Sep 2025 10:14:50 +0900 [thread overview]
Message-ID: <aMDRAr1UP1Ix6zaY@hyeyoo> (raw)
In-Reply-To: <20250909184357.569259-1-jane.chu@oracle.com>
On Tue, Sep 09, 2025 at 12:43:57PM -0600, Jane Chu wrote:
> commit 59d9094df3d79 introduced ->pt_share_count dedicated to
nit: the format should be:
commit <sha1> ("summary")
?
> hugetlb PMD share count tracking, but omitted fixing
> copy_hugetlb_page_range(), leaving the function relying on
> page_count() for tracking that no longer works.
>
> When lazy page table copy for hugetlb is disabled (commit bcd51a3c679d),
same here.
> fork()'ing with hugetlb PMD sharing quickly lockup -
>
> [ 239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
> [ 239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
> [ 239.446631] Call Trace:
> [ 239.446633] <TASK>
> [ 239.446636] _raw_spin_lock+0x3f/0x60
> [ 239.446639] copy_hugetlb_page_range+0x258/0xb50
> [ 239.446645] copy_page_range+0x22b/0x2c0
> [ 239.446651] dup_mmap+0x3e2/0x770
> [ 239.446654] dup_mm.constprop.0+0x5e/0x230
> [ 239.446657] copy_process+0xd17/0x1760
> [ 239.446660] kernel_clone+0xc0/0x3e0
> [ 239.446661] __do_sys_clone+0x65/0xa0
> [ 239.446664] do_syscall_64+0x82/0x930
> [ 239.446668] ? count_memcg_events+0xd2/0x190
> [ 239.446671] ? syscall_trace_enter+0x14e/0x1f0
> [ 239.446676] ? syscall_exit_work+0x118/0x150
> [ 239.446677] ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
> [ 239.446681] ? clear_bhb_loop+0x30/0x80
> [ 239.446684] ? clear_bhb_loop+0x30/0x80
> [ 239.446686] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> There are two options to resolve the potential latent issue:
> 1. remove the PMD sharing awareness from copy_hugetlb_page_range(),
> 2. fix it.
> This patch opts for the second option.
>
> Fixes: 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
> count")
nit: we don't add newline even when Fixes: tag is line is over 75 characters?
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
The change in general looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
+Cc Jann Horn who backported the commit 59d9094df3d79.
Elaborating a little bit why it doesn't need to be backported:
TL;DR: One needs to backport commit 3aa4ed8040e15 to pre-v6.0 kernels,
or revert commit bcd51a3c679d to trigger this.
commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared count")
is introduced in 6.13 and backported to 5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y.
As lazy page table copy is enabled in v6.0 by commit bcd51a3c679d
("hugetlb: lazy page table copies in fork()"), the bug is not triggered
because shared hugetlb VMAs go through lazy page table copy logic
(vma_needs_copy() returns false) or they can't share page tables
(uffd_disable_huge_pmd_share() returns false).
They shouldn't have anon_vma, VM_PFNMAP, VM_MIXEDMAP and UFFD-WP VMA
cannot share page tables - so either vma_needs_copy() return false, or
page tables cannot be shared.
And before commit 3aa4ed8040e15 ("mm/hugetlb: make detecting shared pte
more reliable") introduced in v6.1, copy_hugetlb_page_range() doesn't check
refcount to determine whether the page table is shared.
> mm/hugetlb.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b4c718..8ca5b4f7805f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5594,18 +5594,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> break;
> }
>
> - /*
> - * If the pagetables are shared don't copy or take references.
> - *
> - * dst_pte == src_pte is the common case of src/dest sharing.
> - * However, src could have 'unshared' and dst shares with
> - * another vma. So page_count of ptep page is checked instead
> - * to reliably determine whether pte is shared.
> - */
> - if (page_count(virt_to_page(dst_pte)) > 1) {
> +#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
> + /* If the pagetables are shared don't copy or take references. */
> + if (ptdesc_pmd_pts_count(virt_to_ptdesc(dst_pte)) > 0) {
> addr |= last_addr_mask;
> continue;
> }
> +#endif
>
> dst_ptl = huge_pte_lock(h, dst, dst_pte);
> src_ptl = huge_pte_lockptr(h, src, src_pte);
> --
> 2.43.5
>
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-09-10 1:15 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-09 18:43 Jane Chu
2025-09-10 1:14 ` Harry Yoo [this message]
2025-09-10 19:23 ` jane.chu
2025-09-10 6:45 ` David Hildenbrand
2025-09-11 19:54 ` jane.chu
2025-09-12 7:31 ` David Hildenbrand
2025-09-15 16:51 ` jane.chu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aMDRAr1UP1Ix6zaY@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=jane.chu@oracle.com \
--cc=jannh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liushixin2@huawei.com \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox