From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C888C5AD49 for ; Wed, 28 May 2025 21:34:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 66B2A6B008C; Wed, 28 May 2025 17:34:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 643D16B0092; Wed, 28 May 2025 17:34:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 55A2E6B0093; Wed, 28 May 2025 17:34:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3AAF86B008C for ; Wed, 28 May 2025 17:34:20 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id D692BEB305 for ; Wed, 28 May 2025 21:34:19 +0000 (UTC) X-FDA: 83493620238.06.3A38E12 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by imf14.hostedemail.com (Postfix) with ESMTP id 972F310000F for ; Wed, 28 May 2025 21:34:17 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=LddGgKyd; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="/TK6c2b5"; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=LmKz5+RU; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=5cQ+DouM; spf=pass (imf14.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748468057; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Zn5dDTOpfN67/OlDdxZPFl4vcSl3rIJLnpSo3fCnCZc=; b=fdbuZsO4Q/sLS+NHqm7gyeydqWtgmylLqbOS5qq1OXeZ+d+QzD4XE2/m+x/uq13Cn+fAKC J+qo0j0EoyIMsyLRc3ldDr70PwcWwIponr0omyd0TZN/Ah2AGePyYSKFJ3kfd1ltYIHEFC Yjk/CmM+YeC7F7s6Ddj4YwpnaLxmH+8= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=LddGgKyd; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="/TK6c2b5"; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=LmKz5+RU; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=5cQ+DouM; spf=pass (imf14.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.130 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748468057; a=rsa-sha256; cv=none; b=m+zKPv7KXZSzIhFhRmnjjPUJrZlfRtN9V/3M5L8+VFC3L3S8X+kXQK0HS0kaP+uqrdObKY SXC10GTz8bq0vDAY08zgDJMQqQ4z+v5PIk9Tf59bULc4n+4FQmGdGd3dx89eBi8jdYbhy5 PRbbPhSzLpCRYY0K0ntnZgcNgvAaENg= Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id BB3C52116C; Wed, 28 May 2025 21:34:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1748468056; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Zn5dDTOpfN67/OlDdxZPFl4vcSl3rIJLnpSo3fCnCZc=; b=LddGgKydabcDjkgkjiRsNwHsRbjjffoLMKK9oY2XmEFKbs0Wfn3VyJUkY2YdJqUy7Pmu+m +S6TtlCq0W7tpHQrlm7xpil/F/5UQPMPCtPaxhldpGPmLr85Xg2Bs/F/cuBzhyEgvldik8 8qSUSEqC1TWjmt9yhrl4EN2i6Cng3iA= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1748468056; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Zn5dDTOpfN67/OlDdxZPFl4vcSl3rIJLnpSo3fCnCZc=; b=/TK6c2b5toZAlnPRukLl8xJl9eOOVPjeQxalhgsADmGO8L4nCoP+OZMkaYCfZfOp3Yp3Ru IcZRM2qRQ6wC/4AQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1748468055; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Zn5dDTOpfN67/OlDdxZPFl4vcSl3rIJLnpSo3fCnCZc=; b=LmKz5+RU3Roi2VY/TI1FP70B7nHiLENRqHDTR5xw6N63UClfa0iPJgrYX6hxmw9MZeQVtW sosiyniCDQFlCZ9cNKe9Ssjuv1NGQJFhsvAYjZnjpJTKk35QQOgrSj9oXpAEOEUV9rzWEL MknOtrZ7aofUsO6ojrHycrwtBIjFmn0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1748468055; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Zn5dDTOpfN67/OlDdxZPFl4vcSl3rIJLnpSo3fCnCZc=; b=5cQ+DouMgAZWRywQPBt4Vt9W4cYvV9iv/sXoaKDQ99F9S6CZDZbevAYYE7r2kHVvAI+NKe oUSbQzUMqCkhznAw== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 0B442136E3; Wed, 28 May 2025 21:34:14 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 3xyUO1aBN2hGQgAAD6G6ig (envelope-from ); Wed, 28 May 2025 21:34:14 +0000 Date: Wed, 28 May 2025 23:34:09 +0200 From: Oscar Salvador To: David Hildenbrand Cc: Peter Xu , Gavin Guo , linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, akpm@linux-foundation.org, mike.kravetz@oracle.com, kernel-dev@igalia.com, stable@vger.kernel.org, Hugh Dickins , Florent Revest , Gavin Shan Subject: Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table Message-ID: References: <20250528023326.3499204-1-gavinguo@igalia.com> <629bb87e-c493-4069-866c-20e02c14ddcc@redhat.com> <4cc7e0bb-f247-419d-bf6f-07dc5e88c9c1@redhat.com> <04ecf2e3-651a-47c9-9f30-d31423e2c9d7@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <04ecf2e3-651a-47c9-9f30-d31423e2c9d7@redhat.com> X-Rspamd-Action: no action X-Rspam-User: X-Stat-Signature: 8g6m93mfjf3qbufronx9hdq61dtz6nje X-Rspamd-Queue-Id: 972F310000F X-Rspamd-Server: rspam11 X-HE-Tag: 1748468057-464487 X-HE-Meta: U2FsdGVkX18dUVaO0MpUHkO2xE3IXcUTAghB3cgITyJNy6P9mM123tFO+n6D0S2mGLej8edoStSNQZ/Kcgfxadw0jT1fnb+l5QMRCXhC+tyBD+VftPmp0EnnpYL2gm4GEtrmzRJIKDcHc8/EYeW4K+yNnr8QzYyQlXkdbI3bGNbXr9Chi+Eu5tS0uSKTKpsGoNUC9Nwtf0QHCC7R3RvDbs/tlYDVZGJFCfgKSbiizQeJGSQSsDJy8nLdoS+ZrGeBPeae7Oc0Qr55aynSZ4/CDHWkHfCKPY616MeYuaTGKlc8SclGo0XaKdBInWPZ81TKA5jx6qlXRvfFc8rjYcqZyO3pESKVvYmVrFI2buYVRu2IthV18gjBRPwQ0ZnwCCxMkB/KBCrWsqfI8rmg8hOJl5n/afBhduFxPsGA8nShYvGUVeiHLn20wBoNIOM7xI3vaGKaUvXmAydDYjmAr9Lv0OxypfY+t8/ZTD4YxM5qnWHj3ndikon+YRLVOEXamClrcg3UqvncQJjcuQcv4LXm2ZQeaQLX66evomMXqp1CBQbGEM2mhc0gpBv8qHZHfR56UlcN9qBmQ/fmjgK+svlxoGz+R6tnDtIG7f9p5w2OF+jJm0SavDr3nY26xBHU4TdJo1ZQ3kQYZbixj2Bn7CyZ3SFisRKH9LNhUDXtDsz5oeLmPtNt01Z3CrwqJzhzucAfvwNW2RUb1KZC1Pukfuw+cp1rx7tipd6Hu+Fo9fYy2wDxCUDxic1WbwIwNac1W9aDcJ7EA0AikkKZO2yMyCVIM/JJvGXM66LopY8DMppUG0alRSjZk53GNiiCkzeuVodzFZnztphHEqH1UQzSWbU+/RL7TGToC45ztkM3jQ/Gc0ciUIQUvQ9KGP/7f+GOcZ8LDz31igi5aSGm/RVoPvHGxtGxulL4NgqgSJA0ueGYHAbmMUHRJgCREEZJg8EnnYSy5FVCnwqKjQXDwB1KF7T yScc96rG CzsXgECk+Swmdemcl4W8yUp4aDq5y8J8sOwm4HweYmGHZEMv4QraPz3V981JBZZaFqEKkGWpRXA6Ab38OjD2rdKEA0LlQcxAl15e8WnimC7srgPvU7MqMxowPavURnFAqPFOHRejxBrrxmnkm3aYdjKllml06HixjJAszz2yoZ7Jo5v0xTXx5q9bT0b/gP8akVbg7ST16RHCYz5c8D9X4snXHli9nlZ8w8A7lENzGRc/YiLTH2I6rydESPReBFNUC29DuxbEU4zofaL44oTnmOtaeLHzdbsEtm7/0zUX2fKSBCWqkXmgWdzHK1vbmR/05pZpwOdwgmTzCz3rGwMDiiD+jF9klVhSd6UQ27h+CtnMKEkjhemdoZJdHnA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, May 28, 2025 at 10:26:04PM +0200, David Hildenbrand wrote: > Digging a bit: > > commit 56c9cfb13c9b6516017eea4e8cbe22ea02e07ee6 > Author: Naoya Horiguchi > Date: Fri Sep 10 13:23:04 2010 +0900 > > hugetlb, rmap: fix confusing page locking in hugetlb_cow() > The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow() > looks confusing and is buggy. Originally this trylock_page() was > intended to make sure that old_page is locked even when old_page != > pagecache_page, because then only pagecache_page is locked. > > Added the comment > > + /* > + * hugetlb_cow() requires page locks of pte_page(entry) and > + * pagecache_page, so here we need take the former one > + * when page != pagecache_page or !pagecache_page. > + * Note that locking order is always pagecache_page -> page, > + * so no worry about deadlock. > + */ > > > And > > commit 0fe6e20b9c4c53b3e97096ee73a0857f60aad43f > Author: Naoya Horiguchi > Date: Fri May 28 09:29:16 2010 +0900 > > hugetlb, rmap: add reverse mapping for hugepage > This patch adds reverse mapping feature for hugepage by introducing > mapcount for shared/private-mapped hugepage and anon_vma for > private-mapped hugepage. > While hugepage is not currently swappable, reverse mapping can be useful > for memory error handler. > Without this patch, memory error handler cannot identify processes > using the bad hugepage nor unmap it from them. That is: > - for shared hugepage: > we can collect processes using a hugepage through pagecache, > but can not unmap the hugepage because of the lack of mapcount. > - for privately mapped hugepage: > we can neither collect processes nor unmap the hugepage. > This patch solves these problems. > This patch include the bug fix given by commit 23be7468e8, so reverts it. > > Added the real locking magic. Yes, I have been checking "hugetlb, rmap: add reverse mapping for hugepage", which added locking the now-so-called 'old_folio' in case hugetlbfs_pagecache_page() didn't return anything. Because in hugetlb_wp, this was added: @@ -2286,8 +2299,11 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, retry_avoidcopy: /* If no-one else is actually using this page, avoid the copy * and just make the page writable */ - avoidcopy = (page_count(old_page) == 1); + avoidcopy = (page_mapcount(old_page) == 1); if (avoidcopy) { + if (!trylock_page(old_page)) + if (PageAnon(old_page)) + page_move_anon_rmap(old_page, vma, address); So, as you mentioned, it was done to keep the rmap stable as I guess rmap test test the PageLock. > Not that much changed regarding locking until COW support was added in > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > Author: David Gibson > Date: Fri Jan 6 00:10:44 2006 -0800 > > [PATCH] Hugetlb: Copy on Write support > Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be > supported. This helps us to safely use hugetlb pages in many more > applications. The patch makes the following changes. If needed, I also have > it broken out according to the following paragraphs. > > > Confusing. > > Locking the *old_folio* when calling hugetlb_wp() makes sense when it is > an anon folio because we might want to call folio_move_anon_rmap() to adjust the rmap root. Yes, this is clear. > Locking the pagecache folio when calling hugetlb_wp() if old_folio is an anon folio ... > does not make sense to me. I think this one is also clear. > Locking the pagecache folio when calling hugetlb_wp if old_folio is a pageache folio ... > also doesn't quite make sense for me. > Again, we don't take the lock for ordinary pages, so what's special about hugetlb for the last > case (reservations, I assume?). So, this case is when pagecache_folio == old_folio. I guess we are talking about resv_maps? But I think we cannot interfere there. For the reserves to be modified the page has to go away. Now, I have been checking this one too: commit 04f2cbe35699d22dbf428373682ead85ca1240f5 Author: Mel Gorman Date: Wed Jul 23 21:27:25 2008 -0700 hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed And I think it is interesting. That one added this chunk in hugetlb_fault(): @@ -1126,8 +1283,15 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, spin_lock(&mm->page_table_lock); /* Check for a racing update before calling hugetlb_cow */ if (likely(pte_same(entry, huge_ptep_get(ptep)))) - if (write_access && !pte_write(entry)) - ret = hugetlb_cow(mm, vma, address, ptep, entry); + if (write_access && !pte_write(entry)) { + struct page *page; + page = hugetlbfs_pagecache_page(vma, address); + ret = hugetlb_cow(mm, vma, address, ptep, entry, page); + if (page) { + unlock_page(page); + put_page(page); + } + } So, it finds and lock the page in the pagecache, and calls hugetlb_cow. hugetlb_fault() takes hugetlb_instantiation_mutex, and there is a comment saying: /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. */ mutex_lock(&hugetlb_instantiation_mutex); But it does not say anything about truncation. Actually, checking the truncation code from back then, truncate_hugepages() (and none of its callers) take the hugetlb_instantiation_mutex, as it is done today (e.g: current remove_inode_hugepages() code). Back then, truncate_hugepages() relied only in lock_page(): static void truncate_hugepages(struct inode *inode, loff_t lstart) { ... ... lock_page(page); truncate_huge_page(page); unlock_page(page); } While today, remove_inode_hugepages() takes the mutex, and also the lock. And then zaps the page and does its thing with resv_maps. So I think that we should not even need the lock for hugetlb_wp when pagecache_folio == old_folio (pagecache), because the mutex already protects us from the page to go away, right (e.g: truncated)? Besides we hold a reference on that page since filemap_lock_hugetlb_folio() locks the page and increases its refcount. All in all, I am leaning towards not being needed, but it's getting late here.. -- Oscar Salvador SUSE Labs