From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 553E9C54E65 for ; Wed, 21 May 2025 11:12:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E423F6B00A7; Wed, 21 May 2025 07:12:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E1A5F6B00A8; Wed, 21 May 2025 07:12:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D56896B00A9; Wed, 21 May 2025 07:12:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id B6F896B00A7 for ; Wed, 21 May 2025 07:12:42 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 653EA1D3544 for ; Wed, 21 May 2025 11:12:42 +0000 (UTC) X-FDA: 83466652164.04.EAC999B Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by imf29.hostedemail.com (Postfix) with ESMTP id 453C4120008 for ; Wed, 21 May 2025 11:12:39 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=igalia.com header.s=20170329 header.b=C4sdW6bp; spf=pass (imf29.hostedemail.com: domain of gavinguo@igalia.com designates 213.97.179.56 as permitted sender) smtp.mailfrom=gavinguo@igalia.com; dmarc=pass (policy=none) header.from=igalia.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747825960; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tSZLmeHxhklcvtBKwZ8YgYV6AoGGikz4OO4w/hxhhKs=; b=KRyEGgnX9y/vglbUULkllAiOZeD8CBEDhxD7AZ5RUZJ383ebQWRN8UV6WzQ5JuVPpTH7FV bcGmhYQUGEsNwuUlyc2fRPseCVcOfN3LbSHyYQr3ym59O4/QqiA/UkoGJRcpO40GPc40sp eemEx0WXkJPmXrmD2PHjjIQrYot709s= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747825960; a=rsa-sha256; cv=none; b=6A4RafZVXPfdelrK2L8icfi/SXgTzqghaYeKtY4lE0cAo4njLAmdRBy7GaZVHqwxnp983O cN/ljiBYaYjog2Equ3IHuOVmQLFLYPEew6rLfWGpw43pFHGcpF6U5GTTo/XI0jerkmijlL GnduLtB3MQPpRXF3pY8N6k0XyXZuHz0= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=igalia.com header.s=20170329 header.b=C4sdW6bp; spf=pass (imf29.hostedemail.com: domain of gavinguo@igalia.com designates 213.97.179.56 as permitted sender) smtp.mailfrom=gavinguo@igalia.com; dmarc=pass (policy=none) header.from=igalia.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=tSZLmeHxhklcvtBKwZ8YgYV6AoGGikz4OO4w/hxhhKs=; b=C4sdW6bp4/QGVyGGEHLe5UvnEB QB3dunN1B6Sj9DdlBjAbe0Di5DDvo7TPLEZ43BqcE0TLorcKaYOggAxsNV0zYpSg38Ye3us0+TTA5 sl85HOxLT+GO4lA5ZW2BBVXcKxdvqGZjjfS44jEJ4MokB+8zECQlVSL4jSHJDVRGzy6HeJNvuDe+h /yTJHaTzBJlGfR7gZ0Kx2E9h9XRs+sSjB9ydZuTBdn3rhLQM78L5y90oZeV7OEpaGEsDz/0iQXgZ4 RJJ4ul6PCpwxSMHQ/agxerOIW6SxWytb7OVdH7rCO1N2O6hV0UfZaGMX0jTIUmn7LCEnXq0MlU1la 7Al9j6YQ==; Received: from 27-52-98-155.adsl.fetnet.net ([27.52.98.155] helo=[192.168.238.43]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128) (Exim) id 1uHhNA-00BAFP-Rp; Wed, 21 May 2025 13:12:33 +0200 Message-ID: <3b167142-7c1a-4d6b-b41e-8c067ab737df@igalia.com> Date: Wed, 21 May 2025 19:12:21 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table To: Oscar Salvador Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, akpm@linux-foundation.org, mike.kravetz@oracle.com, kernel-dev@igalia.com, stable@vger.kernel.org, Hugh Dickins , Florent Revest , Gavin Shan References: <20250513093448.592150-1-gavinguo@igalia.com> Content-Language: en-US From: Gavin Guo In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: u899m17g6rjj7eejojikcmeedd5fg6rr X-Rspamd-Queue-Id: 453C4120008 X-Rspam-User: X-Rspamd-Server: rspam02 X-HE-Tag: 1747825959-240029 X-HE-Meta: U2FsdGVkX1/+RvXrwRVgbK0zx3pO85SUixwyBPyeppIyZu2hDiyN664GlL/5funAf4HXSQJVHiUUGnKdCYUQJH8zOSTK4IfymQH/KJ6NJPwKIRVYTPoObXmYgUNC1iJKhOBONefAMYsAMEvtOar2OSNQjEpl/62jXYe6Bb8j4rtZzYXJZ/TBU0X5sVrB+C7AkMXzLOEhO8+9KDQadDnBxHYlRTI3hmX//WM8X2cSKwHsw6gYirtYjqF91Mc4F5/Qbf5FQh3XSaWuK3uMJbSDkazx0LKpTczfXrJCWg3ZSvWemenu0YKTR5V6UrZUDtSntqopW1aUIf+Kr/mFAKChviv779UfwzBcUnu3532990bfMzzgda/l1COGsGi1joOb4Yi3RaiQBM8tuCxEINuA1p6eWyYsn8QVGWxNp98qPWJBqb4FZwZgM0/1PDpC9jKcq3F90PyGxwiaf8nVvFxaQfVKLux39+wxfaBKy9Tpf7t1BF6lsTqmgWdT1VFfXFAnnjvUyoJTsdHTsHrWjBdss/upkOFm+l+4QT4Ol3jB27JVdjpRPod5Jdbuy5d2LO1gDh9L+9RkbH/1UajWGq5D3DInUYJtKqnzl4vqMQn/KfZlSIcBixGcfDB7Woz5JBNIfA+Drd9N4mEY9CEz0lvbXysSU+APRyKX+1rmixhUZscOx4jFn9rghP+TxS1YqCztCVik92bYX9ZZpf3J0mm43ZcyhW0xEg7T27kgHUqgE3kjtqD8iLXZQcyk9sHy/5hq5LIKUKqBC13Drg2n2SQrvmpvUNp+QLhRI1hPzS6ESEzmZjRt3vBp2u+TOPHF9ydF0i8Be/3Mt6k45dH20KGC+gNiPARsr7SFCiiMswhryxhdm0d9M7bQavu9SyHDZkOhTUc7lQkEFJmCy7/M9t238Xfho9XfJHM5rm4CmbWVOwH/Q6ukkm9iwYtIm6wlWjFYWnWMgRytsAu5zXF+oFO ypUPq0uk TtWM05qY3rPP9l5Cr/LxouhwoaBvanepvd6JFNWwMHmwirisj/WCVIxY18/Vg/J0oYNeuF4HVDKHnJGSxwJc3CpCAAc5bWmqx2WHUlu22u9j1AHMZd/y446MOxrUga4sRcZvLnLRxLfntNr0/6hA4dNyb2rUApAwor04WeBUEUjcESkpECUt1vuB3zvNOHFUP8RmR5idoPTurQRlKNSQ4MjItwdemKNhLuCxFfh/s69tcWE2wNUrIr6ZQZ5Pp2lxLlbdy9NfhufDBu5XYF/Vh8+NzMdWn0wwt4GVhrzd4StZEpNBzymNOlY2INWM3RREBLfAQme2tuXs5n3KvFAifSUtCh1JASLA8IUhg2arA4nuXU+eLsp+xKIh9Yw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 5/21/25 03:53, Oscar Salvador wrote: > On Tue, May 13, 2025 at 05:34:48PM +0800, Gavin Guo wrote: >> The patch fixes a deadlock which can be triggered by an internal >> syzkaller [1] reproducer and captured by bpftrace script [2] and its log >> [3] in this scenario: >> >> Process 1 Process 2 >> --- --- >> hugetlb_fault >> mutex_lock(B) // take B >> filemap_lock_hugetlb_folio >> filemap_lock_folio >> __filemap_get_folio >> folio_lock(A) // take A >> hugetlb_wp >> mutex_unlock(B) // release B >> ... hugetlb_fault >> ... mutex_lock(B) // take B >> filemap_lock_hugetlb_folio >> filemap_lock_folio >> __filemap_get_folio >> folio_lock(A) // blocked >> unmap_ref_private >> ... >> mutex_lock(B) // retake and blocked >> > ... >> Signed-off-by: Gavin Guo > > I think this is more convoluted that it needs to be? > > hugetlb_wp() is called from hugetlb_no_page() and hugetlb_fault(). > hugetlb_no_page() locks and unlocks the lock itself, which leaves us > with hugetlb_fault(). > > hugetlb_fault() always passed the folio locked to hugetlb_wp(), and the > latter only unlocks it when we have a cow from owner happening and we > cannot satisfy the allocation. > So, should not checking whether the folio is still locked after > returning enough? > What speaks against: > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index bd8971388236..23b57c5689a4 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6228,6 +6228,12 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > u32 hash; > > folio_put(old_folio); > + /* > + * The pagecache_folio needs to be unlocked to avoid > + * deadlock when the child unmaps the folio. > + */ > + if (pagecache_folio) > + folio_unlock(pagecache_folio); > /* > * Drop hugetlb_fault_mutex and vma_lock before > * unmapping. unmapping needs to hold vma_lock > @@ -6825,7 +6831,12 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > spin_unlock(vmf.ptl); > > if (pagecache_folio) { > - folio_unlock(pagecache_folio); > + /* > + * hugetlb_wp() might have already unlocked pagecache_folio, so > + * skip it if that is the case. > + */ > + if (folio_test_locked(pagecache_folio)) > + folio_unlock(pagecache_folio); > folio_put(pagecache_folio); > } > out_mutex: Really appreciate your review. Good catch! This is elegant. The patch is under testing and I'll send out the v2 patch soon. > >> --- >> mm/hugetlb.c | 33 ++++++++++++++++++++++++++++----- >> 1 file changed, 28 insertions(+), 5 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index e3e6ac991b9c..ad54a74aa563 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -6115,7 +6115,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma, >> * Keep the pte_same checks anyway to make transition from the mutex easier. >> */ >> static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, >> - struct vm_fault *vmf) >> + struct vm_fault *vmf, >> + bool *pagecache_folio_unlocked) >> { >> struct vm_area_struct *vma = vmf->vma; >> struct mm_struct *mm = vma->vm_mm; >> @@ -6212,6 +6213,22 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, >> u32 hash; >> >> folio_put(old_folio); >> + /* >> + * The pagecache_folio needs to be unlocked to avoid >> + * deadlock and we won't re-lock it in hugetlb_wp(). The >> + * pagecache_folio could be truncated after being >> + * unlocked. So its state should not be relied >> + * subsequently. >> + * >> + * Setting *pagecache_folio_unlocked to true allows the >> + * caller to handle any necessary logic related to the >> + * folio's unlocked state. >> + */ >> + if (pagecache_folio) { >> + folio_unlock(pagecache_folio); >> + if (pagecache_folio_unlocked) >> + *pagecache_folio_unlocked = true; >> + } >> /* >> * Drop hugetlb_fault_mutex and vma_lock before >> * unmapping. unmapping needs to hold vma_lock >> @@ -6566,7 +6583,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping, >> hugetlb_count_add(pages_per_huge_page(h), mm); >> if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { >> /* Optimization, do the COW without a second fault */ >> - ret = hugetlb_wp(folio, vmf); >> + ret = hugetlb_wp(folio, vmf, NULL); >> } >> >> spin_unlock(vmf->ptl); >> @@ -6638,6 +6655,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> struct hstate *h = hstate_vma(vma); >> struct address_space *mapping; >> int need_wait_lock = 0; >> + bool pagecache_folio_unlocked = false; >> struct vm_fault vmf = { >> .vma = vma, >> .address = address & huge_page_mask(h), >> @@ -6792,7 +6810,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> >> if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) { >> if (!huge_pte_write(vmf.orig_pte)) { >> - ret = hugetlb_wp(pagecache_folio, &vmf); >> + ret = hugetlb_wp(pagecache_folio, &vmf, >> + &pagecache_folio_unlocked); >> goto out_put_page; >> } else if (likely(flags & FAULT_FLAG_WRITE)) { >> vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte); >> @@ -6809,10 +6828,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> out_ptl: >> spin_unlock(vmf.ptl); >> >> - if (pagecache_folio) { >> + /* >> + * If the pagecache_folio is unlocked in hugetlb_wp(), we skip >> + * folio_unlock() here. >> + */ >> + if (pagecache_folio && !pagecache_folio_unlocked) >> folio_unlock(pagecache_folio); >> + if (pagecache_folio) >> folio_put(pagecache_folio); >> - } >> out_mutex: >> hugetlb_vma_unlock_read(vma); >> >> >> base-commit: d76bb1ebb5587f66b0f8b8099bfbb44722bc08b3 >> -- >> 2.43.0 >> >> >