From: Gavin Guo
To: Byungchul Park
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev,
 osalvador@suse.de, akpm@linux-foundation.org, mike.kravetz@oracle.com,
 kernel-dev@igalia.com, stable@vger.kernel.org, Hugh Dickins,
 Florent Revest, Gavin Shan, kernel_team@skhynix.com
Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table
Date: Wed, 14 May 2025 16:10:12 +0800
Message-ID: <075ae729-1d4a-4f12-a2ba-b4f508e5d0a1@igalia.com>
In-Reply-To: <20250514064729.GA17622@system.software.com>
References: <20250513093448.592150-1-gavinguo@igalia.com> <20250514064729.GA17622@system.software.com>

Hi Byungchul,

On 5/14/25 14:47, Byungchul Park wrote:
> On Tue, May 13, 2025 at 05:34:48PM +0800, Gavin Guo wrote:
>> The patch fixes a deadlock which can be triggered by an internal
>> syzkaller [1] reproducer and captured by bpftrace script [2] and its log
>
> Hi,
>
> I'm trying to reproduce using the test program [1]. But not yet
> produced. I see a lot of segfaults while running [1]. I guess
> something goes wrong. Is there any prerequisite condition to reproduce
> it? Lemme know if any. Or can you try DEPT15 with your config and
> environment by the following steps:
>
>    1. Apply the patchset on v6.15-rc6.
>       https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com
>    2. Turn on CONFIG_DEPT.
>    3. Run test program reproducing the deadlock.
>    4. Check dmesg to see if dept reported the dependency.
>
> Byungchul

I have enabled the patchset and successfully reproduced the bug. It
seems that there is no warning or error log related to the lock. Did I
miss anything?
This is the console log:
https://drive.google.com/file/d/1dxWNiO71qE-H-e5NMPqj7W-aW5CkGSSF/view?usp=sharing

>
>> [3] in this scenario:
>>
>> Process 1                              Process 2
>> ---                                    ---
>> hugetlb_fault
>>   mutex_lock(B) // take B
>>   filemap_lock_hugetlb_folio
>>     filemap_lock_folio
>>       __filemap_get_folio
>>         folio_lock(A) // take A
>>   hugetlb_wp
>>     mutex_unlock(B) // release B
>>     ...                                hugetlb_fault
>>     ...                                  mutex_lock(B) // take B
>>                                          filemap_lock_hugetlb_folio
>>                                            filemap_lock_folio
>>                                              __filemap_get_folio
>>                                                folio_lock(A) // blocked
>>     unmap_ref_private
>>     ...
>>     mutex_lock(B) // retake and blocked
>>
>> This is an ABBA deadlock involving two locks:
>> - Lock A: pagecache_folio lock
>> - Lock B: hugetlb_fault_mutex_table lock
>>
>> The deadlock occurs between two processes as follows:
>> 1. The first process (let’s call it Process 1) is handling a
>> copy-on-write (COW) operation on a hugepage via hugetlb_wp. Due to
>> insufficient reserved hugetlb pages, Process 1, owner of the reserved
>> hugetlb page, attempts to unmap a hugepage owned by another process
>> (non-owner) to satisfy the reservation. Before unmapping, Process 1
>> acquires lock B (hugetlb_fault_mutex_table lock) and then lock A
>> (pagecache_folio lock). To proceed with the unmap, it releases Lock B
>> but retains Lock A. After the unmap, Process 1 tries to reacquire Lock
>> B. However, at this point, Lock B has already been acquired by another
>> process.
>>
>> 2. The second process (Process 2) enters the hugetlb_fault handler
>> during the unmap operation. It successfully acquires Lock B
>> (hugetlb_fault_mutex_table lock) that was just released by Process 1,
>> but then attempts to acquire Lock A (pagecache_folio lock), which is
>> still held by Process 1.
>>
>> As a result, Process 1 (holding Lock A) is blocked waiting for Lock B
>> (held by Process 2), while Process 2 (holding Lock B) is blocked waiting
>> for Lock A (held by Process 1), constructing an ABBA deadlock scenario.
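
For anyone following along without the reproducer, the interleaving
quoted above can be replayed in a small user-space model. This is only a
sketch under my own assumptions: the locks are plain owner fields rather
than real kernel primitives, and every name in it (model_lock, try_take,
release_lock, replay_interleaving) is made up for illustration and does
not exist in mm/hugetlb.c.

```c
#include <stdbool.h>

/* Hypothetical owner ids for the two faulting processes. */
enum { NOBODY = 0, P1 = 1, P2 = 2 };

/* A lock is modelled as nothing more than "who holds it". */
struct model_lock {
	int owner;
};

/* Take the lock if it is free; return false if the caller would block. */
static bool try_take(struct model_lock *l, int who)
{
	if (l->owner != NOBODY)
		return false;
	l->owner = who;
	return true;
}

/* Release the lock, but only if the caller actually holds it. */
static void release_lock(struct model_lock *l, int who)
{
	if (l->owner == who)
		l->owner = NOBODY;
}

/*
 * Replay the quoted interleaving step by step.
 * A models the pagecache_folio lock, B the hugetlb_fault_mutex_table lock.
 * Returns true when both processes end up waiting on a lock the other
 * one holds, i.e. the ABBA cycle.
 */
bool replay_interleaving(bool drop_a_before_unmap)
{
	struct model_lock A = { NOBODY };
	struct model_lock B = { NOBODY };
	bool p1_blocked, p2_blocked;

	try_take(&B, P1);			/* P1: mutex_lock(B) */
	try_take(&A, P1);			/* P1: folio_lock(A) */
	if (drop_a_before_unmap)
		release_lock(&A, P1);		/* idea of the patch: drop A first */
	release_lock(&B, P1);			/* P1: mutex_unlock(B) before unmap */

	try_take(&B, P2);			/* P2: mutex_lock(B) */
	p2_blocked = !try_take(&A, P2);		/* P2: folio_lock(A) */
	p1_blocked = !try_take(&B, P1);		/* P1: retake B after unmap */

	return p1_blocked && p2_blocked &&
	       A.owner == P1 && B.owner == P2;
}
```

In this model, replay_interleaving(false) ends with each process waiting
on the lock the other holds, while replay_interleaving(true) leaves only
Process 1 waiting on B, which Process 2 is free to release once its
fault completes.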
>>
>> The solution here is to unlock the pagecache_folio and provide the
>> pagecache_folio_unlocked variable to the caller to have the visibility
>> over the pagecache_folio status for subsequent handling.
>>
>> The error message:
>> INFO: task repro_20250402_:13229 blocked for more than 64 seconds.
>>       Not tainted 6.15.0-rc3+ #24
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:repro_20250402_ state:D stack:25856 pid:13229 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00004006
>> Call Trace:
>>
>>  __schedule+0x1755/0x4f50
>>  schedule+0x158/0x330
>>  schedule_preempt_disabled+0x15/0x30
>>  __mutex_lock+0x75f/0xeb0
>>  hugetlb_wp+0xf88/0x3440
>>  hugetlb_fault+0x14c8/0x2c30
>>  trace_clock_x86_tsc+0x20/0x20
>>  do_user_addr_fault+0x61d/0x1490
>>  exc_page_fault+0x64/0x100
>>  asm_exc_page_fault+0x26/0x30
>> RIP: 0010:__put_user_4+0xd/0x20
>>  copy_process+0x1f4a/0x3d60
>>  kernel_clone+0x210/0x8f0
>>  __x64_sys_clone+0x18d/0x1f0
>>  do_syscall_64+0x6a/0x120
>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x41b26d
>>
>> INFO: task repro_20250402_:13229 is blocked on a mutex likely owned by task repro_20250402_:13250.
>> task:repro_20250402_ state:D stack:28288 pid:13250 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00000006
>> Call Trace:
>>
>>  __schedule+0x1755/0x4f50
>>  schedule+0x158/0x330
>>  io_schedule+0x92/0x110
>>  folio_wait_bit_common+0x69a/0xba0
>>  __filemap_get_folio+0x154/0xb70
>>  hugetlb_fault+0xa50/0x2c30
>>  trace_clock_x86_tsc+0x20/0x20
>>  do_user_addr_fault+0xace/0x1490
>>  exc_page_fault+0x64/0x100
>>  asm_exc_page_fault+0x26/0x30
>> RIP: 0033:0x402619
>>
>> INFO: task repro_20250402_:13250 blocked for more than 65 seconds.
>>       Not tainted 6.15.0-rc3+ #24
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:repro_20250402_ state:D stack:28288 pid:13250 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00000006
>> Call Trace:
>>
>>  __schedule+0x1755/0x4f50
>>  schedule+0x158/0x330
>>  io_schedule+0x92/0x110
>>  folio_wait_bit_common+0x69a/0xba0
>>  __filemap_get_folio+0x154/0xb70
>>  hugetlb_fault+0xa50/0x2c30
>>  trace_clock_x86_tsc+0x20/0x20
>>  do_user_addr_fault+0xace/0x1490
>>  exc_page_fault+0x64/0x100
>>  asm_exc_page_fault+0x26/0x30
>> RIP: 0033:0x402619
>>
>>
>> Showing all locks held in the system:
>> 1 lock held by khungtaskd/35:
>>  #0: ffffffff879a7440 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x30/0x180
>> 2 locks held by repro_20250402_/13229:
>>  #0: ffff888017d801e0 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x37/0x300
>>  #1: ffff888000fec848 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_wp+0xf88/0x3440
>> 3 locks held by repro_20250402_/13250:
>>  #0: ffff8880177f3d08 (vm_lock){++++}-{0:0}, at: do_user_addr_fault+0x41b/0x1490
>>  #1: ffff888000fec848 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_fault+0x3b8/0x2c30
>>  #2: ffff8880129500e8 (&resv_map->rw_sema){++++}-{4:4}, at: hugetlb_fault+0x494/0x2c30
>>
>> Link: https://drive.google.com/file/d/1DVRnIW-vSayU5J1re9Ct_br3jJQU6Vpb/view?usp=drive_link [1]
>> Link: https://github.com/bboymimi/bpftracer/blob/master/scripts/hugetlb_lock_debug.bt [2]
>> Link: https://drive.google.com/file/d/1bWq2-8o-BJAuhoHWX7zAhI6ggfhVzQUI/view?usp=sharing [3]
>> Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
>> Cc:
>> Cc: Hugh Dickins
>> Cc: Florent Revest
>> Cc: Gavin Shan
>> Signed-off-by: Gavin Guo
>> ---
>>  mm/hugetlb.c | 33 ++++++++++++++++++++++++++++-----
>>  1 file changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index e3e6ac991b9c..ad54a74aa563 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6115,7 +6115,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
>>    * Keep the pte_same checks anyway to make transition from the mutex easier.
>>    */
>>  static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
>> -		       struct vm_fault *vmf)
>> +		       struct vm_fault *vmf,
>> +		       bool *pagecache_folio_unlocked)
>>  {
>>  	struct vm_area_struct *vma = vmf->vma;
>>  	struct mm_struct *mm = vma->vm_mm;
>> @@ -6212,6 +6213,22 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
>>  			u32 hash;
>>
>>  			folio_put(old_folio);
>> +			/*
>> +			 * The pagecache_folio needs to be unlocked to avoid
>> +			 * deadlock and we won't re-lock it in hugetlb_wp(). The
>> +			 * pagecache_folio could be truncated after being
>> +			 * unlocked. So its state should not be relied
>> +			 * subsequently.
>> +			 *
>> +			 * Setting *pagecache_folio_unlocked to true allows the
>> +			 * caller to handle any necessary logic related to the
>> +			 * folio's unlocked state.
>> +			 */
>> +			if (pagecache_folio) {
>> +				folio_unlock(pagecache_folio);
>> +				if (pagecache_folio_unlocked)
>> +					*pagecache_folio_unlocked = true;
>> +			}
>>  			/*
>>  			 * Drop hugetlb_fault_mutex and vma_lock before
>>  			 * unmapping. unmapping needs to hold vma_lock
>> @@ -6566,7 +6583,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>>  	hugetlb_count_add(pages_per_huge_page(h), mm);
>>  	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
>>  		/* Optimization, do the COW without a second fault */
>> -		ret = hugetlb_wp(folio, vmf);
>> +		ret = hugetlb_wp(folio, vmf, NULL);
>>  	}
>>
>>  	spin_unlock(vmf->ptl);
>> @@ -6638,6 +6655,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	struct hstate *h = hstate_vma(vma);
>>  	struct address_space *mapping;
>>  	int need_wait_lock = 0;
>> +	bool pagecache_folio_unlocked = false;
>>  	struct vm_fault vmf = {
>>  		.vma = vma,
>>  		.address = address & huge_page_mask(h),
>> @@ -6792,7 +6810,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>
>>  	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
>>  		if (!huge_pte_write(vmf.orig_pte)) {
>> -			ret = hugetlb_wp(pagecache_folio, &vmf);
>> +			ret = hugetlb_wp(pagecache_folio, &vmf,
>> +					 &pagecache_folio_unlocked);
>>  			goto out_put_page;
>>  		} else if (likely(flags & FAULT_FLAG_WRITE)) {
>>  			vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
>> @@ -6809,10 +6828,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>  out_ptl:
>>  	spin_unlock(vmf.ptl);
>>
>> -	if (pagecache_folio) {
>> +	/*
>> +	 * If the pagecache_folio is unlocked in hugetlb_wp(), we skip
>> +	 * folio_unlock() here.
>> +	 */
>> +	if (pagecache_folio && !pagecache_folio_unlocked)
>>  		folio_unlock(pagecache_folio);
>> +	if (pagecache_folio)
>>  		folio_put(pagecache_folio);
>> -	}
>>  out_mutex:
>>  	hugetlb_vma_unlock_read(vma);
>>
>>
>> base-commit: d76bb1ebb5587f66b0f8b8099bfbb44722bc08b3
>> --
>> 2.43.0
>>
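
To make the hand-off in the last hunk easier to follow, here is a
stripped-down user-space sketch of the same idea: the callee may drop a
lock that the caller took and reports that through an optional bool
out-parameter, and the caller then skips its own unlock but still drops
its reference. Everything here is an assumption made for illustration.
model_folio, model_folio_unlock, model_wp and model_fault are invented
names, and the bad_unlocks counter exists only so that a double unlock
would be visible in the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: a lock bit and a reference count. */
struct model_folio {
	bool locked;
	int refcount;
	int bad_unlocks;	/* unlock calls made while not locked */
};

static void model_folio_unlock(struct model_folio *f)
{
	if (!f->locked)
		f->bad_unlocks++;	/* would be a double unlock */
	f->locked = false;
}

/*
 * Callee: when it has to unmap, it drops the folio lock itself and, if
 * the caller supplied a flag, records that the lock is gone.
 */
static void model_wp(struct model_folio *f, bool must_unmap, bool *unlocked)
{
	if (must_unmap && f) {
		model_folio_unlock(f);
		if (unlocked)
			*unlocked = true;
	}
}

/*
 * Caller: takes the lock and a reference, calls the callee, then
 * unlocks only if the callee has not already done so. The reference is
 * dropped in both cases.
 */
void model_fault(struct model_folio *f, bool must_unmap)
{
	bool unlocked = false;

	f->locked = true;
	f->refcount++;

	model_wp(f, must_unmap, &unlocked);

	if (!unlocked)
		model_folio_unlock(f);
	f->refcount--;
}
```

In both the must_unmap and the ordinary case the model ends with the
folio unlocked, the reference count back where it started, and
bad_unlocks still at zero.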