From: David Hildenbrand <david@redhat.com>
To: Barry Song <21cnbao@gmail.com>,
akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Barry Song <v-songbaohua@oppo.com>,
"Lai, Yi" <yi1.lai@linux.intel.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Lokesh Gidra <lokeshgidra@google.com>,
Tangquan Zheng <zhengtangquan@oppo.com>,
Lance Yang <ioworker0@gmail.com>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>
Subject: Re: [PATCH] mm: Fix the race between collapse and PT_RECLAIM under per-vma lock
Date: Tue, 5 Aug 2025 10:02:46 +0200
Message-ID: <2e95f6a0-7376-47f0-841d-8f442890149a@redhat.com>
In-Reply-To: <20250805035447.7958-1-21cnbao@gmail.com>
On 05.08.25 05:54, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> The check_pmd_still_valid() call during collapse is currently only
> protected by the mmap_lock in write mode, which was sufficient when
> pt_reclaim always ran under mmap_lock in read mode. However, since
> madvise_dontneed can now execute under a per-VMA lock, this assumption
> is no longer valid. As a result, a race condition can occur between
> collapse and PT_RECLAIM, potentially leading to a kernel panic.
>
> [ 38.151897] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] SMP KASI
> [ 38.153519] KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
> [ 38.154605] CPU: 0 UID: 0 PID: 721 Comm: repro Not tainted 6.16.0-next-20250801-next-2025080 #1 PREEMPT(voluntary)
> [ 38.155929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org4
> [ 38.157418] RIP: 0010:kasan_byte_accessible+0x15/0x30
> [ 38.158125] Code: 03 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 b8 00 00 00 00 00 fc0
> [ 38.160461] RSP: 0018:ffff88800feef678 EFLAGS: 00010286
> [ 38.161220] RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 1ffffffff0dde60c
> [ 38.162232] RDX: 0000000000000000 RSI: ffffffff85da1e18 RDI: dffffc0000000003
> [ 38.163176] RBP: ffff88800feef698 R08: 0000000000000001 R09: 0000000000000000
> [ 38.164195] R10: 0000000000000000 R11: ffff888016a8ba58 R12: 0000000000000018
> [ 38.165189] R13: 0000000000000018 R14: ffffffff85da1e18 R15: 0000000000000000
> [ 38.166100] FS: 0000000000000000(0000) GS:ffff8880e3b40000(0000) knlGS:0000000000000000
> [ 38.167137] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 38.167891] CR2: 00007f97fadfe504 CR3: 0000000007088005 CR4: 0000000000770ef0
> [ 38.168812] PKRU: 55555554
> [ 38.169275] Call Trace:
> [ 38.169647] <TASK>
> [ 38.169975] ? __kasan_check_byte+0x19/0x50
> [ 38.170581] lock_acquire+0xea/0x310
> [ 38.171083] ? rcu_is_watching+0x19/0xc0
> [ 38.171615] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 38.172343] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [ 38.173130] _raw_spin_lock+0x38/0x50
> [ 38.173707] ? __pte_offset_map_lock+0x1a2/0x3c0
> [ 38.174390] __pte_offset_map_lock+0x1a2/0x3c0
> [ 38.174987] ? __pfx___pte_offset_map_lock+0x10/0x10
> [ 38.175724] ? __pfx_pud_val+0x10/0x10
> [ 38.176308] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
> [ 38.177183] unmap_page_range+0xb60/0x43e0
> [ 38.177824] ? __pfx_unmap_page_range+0x10/0x10
> [ 38.178485] ? mas_next_slot+0x133a/0x1a50
> [ 38.179079] unmap_single_vma.constprop.0+0x15b/0x250
> [ 38.179830] unmap_vmas+0x1fa/0x460
> [ 38.180373] ? __pfx_unmap_vmas+0x10/0x10
> [ 38.180994] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 38.181877] exit_mmap+0x1a2/0xb40
> [ 38.182396] ? lock_release+0x14f/0x2c0
> [ 38.182929] ? __pfx_exit_mmap+0x10/0x10
> [ 38.183474] ? __pfx___mutex_unlock_slowpath+0x10/0x10
> [ 38.184188] ? mutex_unlock+0x16/0x20
> [ 38.184704] mmput+0x132/0x370
> [ 38.185208] do_exit+0x7e7/0x28c0
> [ 38.185682] ? __this_cpu_preempt_check+0x21/0x30
> [ 38.186328] ? do_group_exit+0x1d8/0x2c0
> [ 38.186873] ? __pfx_do_exit+0x10/0x10
> [ 38.187401] ? __this_cpu_preempt_check+0x21/0x30
> [ 38.188036] ? _raw_spin_unlock_irq+0x2c/0x60
> [ 38.188634] ? lockdep_hardirqs_on+0x89/0x110
> [ 38.189313] do_group_exit+0xe4/0x2c0
> [ 38.189831] __x64_sys_exit_group+0x4d/0x60
> [ 38.190413] x64_sys_call+0x2174/0x2180
> [ 38.190935] do_syscall_64+0x6d/0x2e0
> [ 38.191449] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> This patch moves the vma_start_write() call to precede
> check_pmd_still_valid(), ensuring that the check is also properly
> protected by the per-VMA lock.
>
> Fixes: a6fde7add78d ("mm: use per_vma lock for MADV_DONTNEED")
> Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com>
> Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com>
> Closes: https://lore.kernel.org/all/aJAFrYfyzGpbm+0m@ly-workstation/
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Qi Zheng <zhengqi.arch@bytedance.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jann Horn <jannh@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Lokesh Gidra <lokeshgidra@google.com>
> Cc: Tangquan Zheng <zhengtangquan@oppo.com>
> Cc: Lance Yang <ioworker0@gmail.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Cc: Nico Pache <npache@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
> mm/khugepaged.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 374a6a5193a7..6b40bdfd224c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1172,11 +1172,11 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>  	if (result != SCAN_SUCCEED)
>  		goto out_up_write;
>  	/* check if the pmd is still valid */
> +	vma_start_write(vma);
>  	result = check_pmd_still_valid(mm, address, pmd);
>  	if (result != SCAN_SUCCEED)
>  		goto out_up_write;
>  
> -	vma_start_write(vma);
>  	anon_vma_lock_write(vma->anon_vma);
>  
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
LGTM. I was wondering whether we should just place it next to the
mmap_write_lock(), on the assumption that hugepage_vma_revalidate()
will commonly not fail.

So personally, I would move it further up.
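
Purely as an illustration of why the write lock has to be taken before
the check (wherever exactly we end up placing it), here is a toy,
single-threaded userspace model. It is not kernel code: struct model_vma,
model_reclaim() and the two collapse_*() helpers are made-up names, and
the "concurrent" reclaimer is simply called at the point where it could
interleave. It mirrors the description in the changelog above and assumes
that a per-VMA lock reader backs off once the writer holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not a kernel structure. */
struct model_vma {
	bool write_locked;	/* stands in for the per-VMA write lock */
	bool pmd_valid;		/* stands in for "PMD still maps the PTE table" */
};

/*
 * Models a per-VMA-lock reader that reclaims the page table.
 * It backs off when the writer already holds the lock.
 * Returns true if the page table was reclaimed.
 */
static bool model_reclaim(struct model_vma *vma)
{
	if (vma->write_locked)
		return false;
	vma->pmd_valid = false;
	return true;
}

/*
 * Old ordering: check first, lock afterwards. The reclaimer is injected
 * into the window between the two steps.
 * Returns true if collapse would continue based on a stale check.
 */
static bool collapse_check_then_lock(struct model_vma *vma)
{
	assert(!vma->write_locked);

	bool checked_ok = vma->pmd_valid;	/* check without the lock */
	model_reclaim(vma);			/* reclaimer slips in here */
	vma->write_locked = true;		/* lock taken too late */

	bool stale = checked_ok && !vma->pmd_valid;

	vma->write_locked = false;
	return stale;
}

/*
 * Patched ordering: lock first, then check. The reclaimer is injected
 * at the same point relative to the check and is now refused.
 */
static bool collapse_lock_then_check(struct model_vma *vma)
{
	assert(!vma->write_locked);

	vma->write_locked = true;		/* lock before the check */
	bool checked_ok = vma->pmd_valid;
	model_reclaim(vma);			/* backs off: lock is held */

	bool stale = checked_ok && !vma->pmd_valid;

	vma->write_locked = false;
	return stale;
}
```

Starting from an unlocked model_vma with pmd_valid set, the first helper
reports a stale check and leaves pmd_valid cleared, while the second one
reports no stale check and leaves pmd_valid untouched.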
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb