From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E15FBC54E71 for ; Tue, 20 May 2025 19:53:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 340756B0099; Tue, 20 May 2025 15:53:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 318386B009E; Tue, 20 May 2025 15:53:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 255D16B009F; Tue, 20 May 2025 15:53:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id F16456B0099 for ; Tue, 20 May 2025 15:53:13 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 572FB1A10F2 for ; Tue, 20 May 2025 19:53:13 +0000 (UTC) X-FDA: 83464335066.23.D4E599E Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by imf08.hostedemail.com (Postfix) with ESMTP id 4874C16000B for ; Tue, 20 May 2025 19:53:11 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=QvCktGuM; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=sL68f2ji; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=QvCktGuM; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=sL68f2ji; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf08.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=osalvador@suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747770791; a=rsa-sha256; cv=none; b=dgCNAg/nMG6pwV5Y07tudXCViURh2XkP/sIhyB8xgQF40eSiiDNUWgVM8IkNmrbFja7QGK Hcp80t0TZjbd4ZMdIrBu13XHkRlD2+tIXfymL20h1HjMwt7DJpUPioccurabYGtZM3Lhfk EhqeJ91xq4FmZWl9ZhCRu5vIi8g7fYY= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=QvCktGuM; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=sL68f2ji; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=QvCktGuM; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=sL68f2ji; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf08.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=osalvador@suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747770791; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=S34FQjO/spo0/EF5quOdXn1uELyrC8/pQ7dPMdPUKXs=; b=pbbOy0NJ+joMn1Hb2zE5sCiPnfJCRrOZPE1pX98s4CfC7hzwvHNZn7KoR0miRTfFucjZln R5e/e/vRivALBlEavh+KA5kXec6XExGsVzMJFVIXEEXfmlOXaYZA1r1OVpMZbUxFzptpT/ Sr8RHnJJIMJ7ufFryxr9Yej13wlAVZ0= Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id A55B02225F; Tue, 20 May 2025 19:53:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1747770789; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=S34FQjO/spo0/EF5quOdXn1uELyrC8/pQ7dPMdPUKXs=; b=QvCktGuMZWVjCcKAF6DQr/6K/g9d0kj8S88bvzkl9x+3GLmyUCJWBeakmlcbrgg6qgPJ7n NejSFqWc6b+GsRzQh6T6rcK7XR4sI8EDwFCNJaErMjb9JP2dhqOPC9JC1YI9HYYB+qFyzY jfxbpoGfWg8jz3yExwONwb5p+7uzWb8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1747770789; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=S34FQjO/spo0/EF5quOdXn1uELyrC8/pQ7dPMdPUKXs=; b=sL68f2jiZoxPUe0MpvYNAOsabqTjxTgbFBzg8X4oCHV/MW8maD8NyEdzlpOlGr+xOvEfpk h040HrHFg4xf3FCA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1747770789; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=S34FQjO/spo0/EF5quOdXn1uELyrC8/pQ7dPMdPUKXs=; b=QvCktGuMZWVjCcKAF6DQr/6K/g9d0kj8S88bvzkl9x+3GLmyUCJWBeakmlcbrgg6qgPJ7n NejSFqWc6b+GsRzQh6T6rcK7XR4sI8EDwFCNJaErMjb9JP2dhqOPC9JC1YI9HYYB+qFyzY jfxbpoGfWg8jz3yExwONwb5p+7uzWb8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1747770789; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=S34FQjO/spo0/EF5quOdXn1uELyrC8/pQ7dPMdPUKXs=; b=sL68f2jiZoxPUe0MpvYNAOsabqTjxTgbFBzg8X4oCHV/MW8maD8NyEdzlpOlGr+xOvEfpk h040HrHFg4xf3FCA== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 77B7B13A3E; Tue, 20 May 2025 19:53:09 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id eI5+HKXdLGgYdQAAD6G6ig (envelope-from ); Tue, 20 May 2025 19:53:09 +0000 Date: Tue, 20 May 2025 21:53:00 +0200 From: Oscar Salvador To: Gavin Guo Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, akpm@linux-foundation.org, mike.kravetz@oracle.com, kernel-dev@igalia.com, stable@vger.kernel.org, Hugh Dickins , Florent Revest , Gavin Shan Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table Message-ID: References: <20250513093448.592150-1-gavinguo@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250513093448.592150-1-gavinguo@igalia.com> X-Rspam-User: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 4874C16000B X-Stat-Signature: akken7asym81ss945gzhc5rwuhaiqwm9 X-HE-Tag: 1747770791-218945 X-HE-Meta: U2FsdGVkX19WJtCoDPeEyqSRv6eRwlaFt67hZnJ6hG+WTr27RfBtRvSmSfKh7472p5zj1Dm76VJuvkzFuxY5aryHSdy1wWFxwqyUdHI3Q4UgOAkTocyCCaAjxkPFgvrKKdFMNLbkhrfCLWNL1x0bDtYjkacon46fF5zL62/4AntAKCMQQ01Etr9YAYOKOBangq8nKmwfqXaeZFjvWg6+obgPB2VgQPt934UWSaxqz1MaKs1gqvZoMWUC/AXJ7TM8yqQJC9/UeKveWWFb7cT607zMYtMEyjOobinKTQLTZZtyAe3Emw/l4xQXpWGWrdtg3egg6fu9W4EVqoDiSyqlT4sT3n4lUwfLqHkGQnJTO+bu56QjbGlEwfcNvWtozXn2FX0QPaQ3hs+Ja/Emdya80pwemKLe7DltYZ0eI82EJ7fuwkMIIQO1YY+EcQ2MgK4AH6u5hB1bNU4uRXUmcfmaUEep3NTQeZ3/HzPi7LlVqNvQYogF6l82btWcAVNoncxxrC5dTojHQ5abxFjas2Ij51C4qVlMPX1X1dZfgLxRXDr370TnMIN4GleBUVHBVnr1OVRhNECFu+MnoX/e8K9Mhts9r7ty/A+CEj6FQNYAb9V64blRnQkcZZFl+WdhqVJ/LZO1jzXQDLo0/T1vCrwwsXP8BQhGfUaYYdFE+OzYDYQ56H54BgSEJJCMdJZJDZnfvFSpbCtJUdTG8QLPCDC6c5XTVWbndNZxZcu89GUJYnWmM3C6s9dT8bz5b57V7wEcaYgYsQXQimWIetRQwhGCFbTL1JMseUH9si+krAPMXXwuIAsewr2q1a/5Y/7har2fd/prDijnZkDoCbBSWZy/Yz457QYJIVsIPVyv3WAoMIhBLjsjG/awdGVgd0bx++Ji2od8sLSmOjJjZR8C+XFytnDkAhY2gb3Y5TudYQZMhr67W2vLCOlw685XyjkaB4ttLipUMfFkukEixeEPBrz kgw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, May 13, 2025 at 05:34:48PM +0800, Gavin Guo wrote: > The patch fixes a deadlock which can be triggered by an internal > syzkaller [1] reproducer and captured by bpftrace script [2] and its log > [3] in this scenario: > > Process 1 Process 2 > --- --- > hugetlb_fault > mutex_lock(B) // take B > filemap_lock_hugetlb_folio > filemap_lock_folio > __filemap_get_folio > folio_lock(A) // take A > hugetlb_wp > mutex_unlock(B) // release B > ... hugetlb_fault > ... mutex_lock(B) // take B > filemap_lock_hugetlb_folio > filemap_lock_folio > __filemap_get_folio > folio_lock(A) // blocked > unmap_ref_private > ... > mutex_lock(B) // retake and blocked > ... > Signed-off-by: Gavin Guo I think this is more convoluted that it needs to be? hugetlb_wp() is called from hugetlb_no_page() and hugetlb_fault(). hugetlb_no_page() locks and unlocks the lock itself, which leaves us with hugetlb_fault(). hugetlb_fault() always passed the folio locked to hugetlb_wp(), and the latter only unlocks it when we have a cow from owner happening and we cannot satisfy the allocation. So, should not checking whether the folio is still locked after returning enough? What speaks against: diff --git a/mm/hugetlb.c b/mm/hugetlb.c index bd8971388236..23b57c5689a4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6228,6 +6228,12 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, u32 hash; folio_put(old_folio); + /* + * The pagecache_folio needs to be unlocked to avoid + * deadlock when the child unmaps the folio. + */ + if (pagecache_folio) + folio_unlock(pagecache_folio); /* * Drop hugetlb_fault_mutex and vma_lock before * unmapping. unmapping needs to hold vma_lock @@ -6825,7 +6831,12 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, spin_unlock(vmf.ptl); if (pagecache_folio) { - folio_unlock(pagecache_folio); + /* + * hugetlb_wp() might have already unlocked pagecache_folio, so + * skip it if that is the case. + */ + if (folio_test_locked(pagecache_folio)) + folio_unlock(pagecache_folio); folio_put(pagecache_folio); } out_mutex: > --- > mm/hugetlb.c | 33 ++++++++++++++++++++++++++++----- > 1 file changed, 28 insertions(+), 5 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index e3e6ac991b9c..ad54a74aa563 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6115,7 +6115,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma, > * Keep the pte_same checks anyway to make transition from the mutex easier. > */ > static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > - struct vm_fault *vmf) > + struct vm_fault *vmf, > + bool *pagecache_folio_unlocked) > { > struct vm_area_struct *vma = vmf->vma; > struct mm_struct *mm = vma->vm_mm; > @@ -6212,6 +6213,22 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > u32 hash; > > folio_put(old_folio); > + /* > + * The pagecache_folio needs to be unlocked to avoid > + * deadlock and we won't re-lock it in hugetlb_wp(). The > + * pagecache_folio could be truncated after being > + * unlocked. So its state should not be relied > + * subsequently. > + * > + * Setting *pagecache_folio_unlocked to true allows the > + * caller to handle any necessary logic related to the > + * folio's unlocked state. > + */ > + if (pagecache_folio) { > + folio_unlock(pagecache_folio); > + if (pagecache_folio_unlocked) > + *pagecache_folio_unlocked = true; > + } > /* > * Drop hugetlb_fault_mutex and vma_lock before > * unmapping. unmapping needs to hold vma_lock > @@ -6566,7 +6583,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping, > hugetlb_count_add(pages_per_huge_page(h), mm); > if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { > /* Optimization, do the COW without a second fault */ > - ret = hugetlb_wp(folio, vmf); > + ret = hugetlb_wp(folio, vmf, NULL); > } > > spin_unlock(vmf->ptl); > @@ -6638,6 +6655,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > struct hstate *h = hstate_vma(vma); > struct address_space *mapping; > int need_wait_lock = 0; > + bool pagecache_folio_unlocked = false; > struct vm_fault vmf = { > .vma = vma, > .address = address & huge_page_mask(h), > @@ -6792,7 +6810,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) { > if (!huge_pte_write(vmf.orig_pte)) { > - ret = hugetlb_wp(pagecache_folio, &vmf); > + ret = hugetlb_wp(pagecache_folio, &vmf, > + &pagecache_folio_unlocked); > goto out_put_page; > } else if (likely(flags & FAULT_FLAG_WRITE)) { > vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte); > @@ -6809,10 +6828,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > out_ptl: > spin_unlock(vmf.ptl); > > - if (pagecache_folio) { > + /* > + * If the pagecache_folio is unlocked in hugetlb_wp(), we skip > + * folio_unlock() here. > + */ > + if (pagecache_folio && !pagecache_folio_unlocked) > folio_unlock(pagecache_folio); > + if (pagecache_folio) > folio_put(pagecache_folio); > - } > out_mutex: > hugetlb_vma_unlock_read(vma); > > > base-commit: d76bb1ebb5587f66b0f8b8099bfbb44722bc08b3 > -- > 2.43.0 > > -- Oscar Salvador SUSE Labs