Date: Wed, 28 May 2025 11:03:53 -0400
From: Peter Xu <peterx@redhat.com>
To: Oscar Salvador
Cc: Gavin Guo, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    muchun.song@linux.dev, akpm@linux-foundation.org, mike.kravetz@oracle.com,
    kernel-dev@igalia.com, stable@vger.kernel.org, Hugh Dickins,
    Florent Revest, Gavin Shan, David Hildenbrand
Subject: Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and
 hugetlb_fault_mutex_table
References: <20250528023326.3499204-1-gavinguo@igalia.com>

On Wed, May 28, 2025 at 11:27:46AM +0200, Oscar Salvador wrote:
> On Wed, May 28, 2025 at 10:33:26AM +0800, Gavin Guo wrote:
> > There is an ABBA deadlock scenario between hugetlb_fault() and
> > hugetlb_wp() on the pagecache folio's lock and the hugetlb global
> > mutex, which is reproducible with syzkaller [1]. As the stack traces
> > below reveal, process-1 tries to take the hugetlb global mutex (A3)
> > while holding the pagecache folio's lock. Process-2 holds the hugetlb
> > global mutex and tries to take the pagecache folio's lock.
> >
> > Process-1                                 Process-2
> > =========                                 =========
> > hugetlb_fault
> >   mutex_lock (A1)
> >   filemap_lock_hugetlb_folio (B1)
> >   hugetlb_wp
> >     alloc_hugetlb_folio  #error
> >       mutex_unlock (A2)
> >                                           hugetlb_fault
> >                                             mutex_lock (A4)
> >                                             filemap_lock_hugetlb_folio (B4)
> >       unmap_ref_private
> >         mutex_lock (A3)
> >
> > Fix it by releasing the pagecache folio's lock at (A2) in process-1,
> > so that the lock is available to process-2 at (B4) and the deadlock is
> > avoided. In process-1, a new variable is added to track whether the
> > pagecache folio's lock has been released by its callee hugetlb_wp(),
> > to avoid a double unlock in hugetlb_fault(). Similar changes are
> > applied to hugetlb_no_page().
> >
> > Link: https://drive.google.com/file/d/1DVRnIW-vSayU5J1re9Ct_br3jJQU6Vpb/view?usp=drive_link [1]
> > Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
> > Cc:
> > Cc: Hugh Dickins
> > Cc: Florent Revest
> > Reviewed-by: Gavin Shan
> > Signed-off-by: Gavin Guo
> ...
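
(Aside, for anyone skimming the trace above: the ABBA ordering reduces to
the classic two-mutex inversion.  Below is a minimal stand-alone user-space
sketch of mine -- pthread mutexes standing in for the hugetlb fault mutex
"A" and the folio lock "B"; all names and the sleep-based timing are
illustrative only, not the kernel paths.  Built with "cc -pthread", it
wedges at A3/B4 exactly like the trace.)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t fault_mutex = PTHREAD_MUTEX_INITIALIZER;	/* "A" */
static pthread_mutex_t folio_lock  = PTHREAD_MUTEX_INITIALIZER;	/* "B" */

static void *process1(void *arg)
{
	pthread_mutex_lock(&fault_mutex);	/* A1 */
	pthread_mutex_lock(&folio_lock);	/* B1 */
	pthread_mutex_unlock(&fault_mutex);	/* A2: drop A, keep B held */
	sleep(1);				/* let process-2 take A */
	pthread_mutex_lock(&fault_mutex);	/* A3: blocks, process-2 holds A */
	printf("process-1: never reached\n");
	return NULL;
}

static void *process2(void *arg)
{
	usleep(100000);				/* make sure A1..A2 happen first */
	pthread_mutex_lock(&fault_mutex);	/* A4 */
	pthread_mutex_lock(&folio_lock);	/* B4: blocks, process-1 holds B */
	printf("process-2: never reached\n");
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, process1, NULL);
	pthread_create(&t2, NULL, process2, NULL);
	pthread_join(t1, NULL);			/* hangs: ABBA deadlock */
	pthread_join(t2, NULL);
	return 0;
}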
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6a3cf7935c14..560b9b35262a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6137,7 +6137,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
> >   * Keep the pte_same checks anyway to make transition from the mutex easier.
> >   */
> >  static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
> > -		struct vm_fault *vmf)
> > +		struct vm_fault *vmf,
> > +		bool *pagecache_folio_locked)
> >  {
> >  	struct vm_area_struct *vma = vmf->vma;
> >  	struct mm_struct *mm = vma->vm_mm;
> > @@ -6234,6 +6235,18 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
> >  		u32 hash;
> >
> >  		folio_put(old_folio);
> > +		/*
> > +		 * The pagecache_folio has to be unlocked to avoid
> > +		 * deadlock and we won't re-lock it in hugetlb_wp(). The
> > +		 * pagecache_folio could be truncated after being
> > +		 * unlocked. So its state should not be reliable
> > +		 * subsequently.
> > +		 */
> > +		if (pagecache_folio) {
> > +			folio_unlock(pagecache_folio);
> > +			if (pagecache_folio_locked)
> > +				*pagecache_folio_locked = false;
> > +		}
>
> I am having a problem with this patch, as I think it keeps carrying on an
> assumption that is not true.
>
> I was discussing this matter yesterday with Peter Xu (CCed now), who also
> has some experience in this field.
>
> What exactly does the pagecache_folio lock protect us against when
> pagecache_folio != old_folio?
>
> There are two cases here:
>
> 1) pagecache_folio == old_folio (original page in the pagecache)
> 2) pagecache_folio != old_folio (original page has already been mapped
>    privately and CoWed; old_folio contains the new folio)
>
> For case 1), we need to hold the lock because we are copying old_folio
> to the new one in hugetlb_wp(). That is clear.

So I'm not 100% sure we need the folio lock even for the copy; IIUC a
refcount would be enough?

> But for case 2), unless I am missing something, we do not really need the
> pagecache_folio lock at all, do we? (only old_folio's)
> The only reason pagecache_folio gets looked up in the pagecache is to check
> whether the current task has mapped and faulted in the file privately, which
> means that a reservation has been consumed (a new folio was allocated).
> That is what the whole dance about "old_folio != pagecache_folio &&
> HPAGE_RESV_OWNER" in hugetlb_wp() is about.
>
> And the original mapping cannot really go away from under us either, as
> remove_inode_hugepages() needs to take the mutex in order to evict it,
> which would be the only reason counters like resv_huge_pages (adjusted in
> remove_inode_hugepages()->hugetlb_unreserve_pages()) would
> interfere with alloc_hugetlb_folio() from hugetlb_wp().
>
> So, again, unless I am missing something, there is no need for the
> pagecache_folio lock when pagecache_folio != old_folio, let alone the
> need to hold it throughout hugetlb_wp().

I think we could just look up the cache and unlock it right away; a rough
sketch follows below.

> So, the current situation (prior to this patch) is already misleading
> for case 2).
>
> And comments like:
>
> 	/*
> 	 * The pagecache_folio has to be unlocked to avoid
> 	 * deadlock and we won't re-lock it in hugetlb_wp(). The
> 	 * pagecache_folio could be truncated after being
> 	 * unlocked. So its state should not be reliable
> 	 * subsequently.
> 	 */
>
> keep carrying on the assumption that we need the lock.
>
> Now, if the above is true, I would much rather see this reworked (I have
> some ideas I discussed with Peter yesterday) than keep it as is.
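
To be concrete about "look up the cache and unlock it right away":
something along these lines in hugetlb_fault() is roughly what I have in
mind.  This is an untested sketch of the idea only, not a real patch -- it
assumes the surrounding variables of the current hugetlb_fault(), leans on
the fault mutex (still held at this point) to keep remove_inode_hugepages()
and the reservation counters away, and on hugetlb_wp() only ever comparing
the pointer against old_folio for the HPAGE_RESV_OWNER case:

	pagecache_folio = filemap_lock_hugetlb_folio(h, mapping, vmf.pgoff);
	if (IS_ERR(pagecache_folio)) {
		pagecache_folio = NULL;
	} else {
		/*
		 * Pin the lookup result but give up the folio lock
		 * immediately, so nothing reached from hugetlb_wp() can
		 * deadlock on it.  The reference is still dropped via
		 * folio_put() at the end of hugetlb_fault() as today.
		 */
		folio_unlock(pagecache_folio);
	}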
Yes, just to reply in public: I also am not aware of why the folio lock is
needed, considering hugetlb has the fault mutex. I'm not sure whether we
should rely more on the fault mutex, but that doesn't sound like an
immediate concern.

It may depend on whether my understanding above was correct.. and only if
so, maybe we could avoid locking the folio completely.

Thanks,

>
> Let me also CC David, who tends to have a good overview of this.
>
> --
> Oscar Salvador
> SUSE Labs

--
Peter Xu