From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 893E7C678DA for ; Tue, 10 Jun 2025 14:13:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0D0AB6B007B; Tue, 10 Jun 2025 10:13:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A6B66B008A; Tue, 10 Jun 2025 10:13:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED6F36B008C; Tue, 10 Jun 2025 10:13:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id CAB186B007B for ; Tue, 10 Jun 2025 10:13:56 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 5D2B01604D3 for ; Tue, 10 Jun 2025 14:13:56 +0000 (UTC) X-FDA: 83539684872.18.A615D8D Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) by imf20.hostedemail.com (Postfix) with ESMTP id EEF611C0009 for ; Tue, 10 Jun 2025 14:13:53 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=UIYK7eHB; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="w0XR/JIU"; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=C9hvL1E+; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=Gm7jUAe6; spf=pass (imf20.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.131 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749564834; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1MQSXXafvP/LLujHfyD/l7INR/OL5srFwLiZqmlLeww=; b=IZO62JXKpmZ+EAxJzk0yyVEGEZDmycgP1wFeWlSopoEPydSQsZ6TgKdxtno1Ry+qnp1+Xc RzS82++WCIUqwrOilUnYkLoq2uDJnjLjRMsVVnka0QyCAaMVV10CiIBjN0ICzzyJ7J2wgF BuMEwVhHHxS7wcJBVpUk8RXgNhBDTJs= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=UIYK7eHB; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="w0XR/JIU"; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=C9hvL1E+; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=Gm7jUAe6; spf=pass (imf20.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.131 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749564834; a=rsa-sha256; cv=none; b=itBO/YvbqThRhdjuPtimu6IZ+41Pfp/yxlAERwMr9ld2bov2/ylqenVes/dCbm1laNYxVa qMgODMtlWSQM6gDWlmnfWgcCPNExvRJgfcyJySFcTvZJHVoql98WpXKZN+emX6MmE9CJ8G vs6/BDWMo6Jm8j5cOKUFFzdkx2j5PhA= Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id EEEB11F76B; Tue, 10 Jun 2025 14:13:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1749564832; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1MQSXXafvP/LLujHfyD/l7INR/OL5srFwLiZqmlLeww=; b=UIYK7eHB0AF9rpEztfwaWmAqGLJ/lMc1jm8teffNbo5hIr+hpjhM/Xq7p3bkqsZJ5pePRL BGod60gQpIKzmVkNGySgGqls4ea9KjQTZw4ccdh7TT1/7KGhFX12y2YtnzUnD0bgWUFV7K PZFoCHmvSYs7j85Wt959OYiyVLCBfxw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1749564832; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1MQSXXafvP/LLujHfyD/l7INR/OL5srFwLiZqmlLeww=; b=w0XR/JIUTDWdcy8kCxYLuuLdNBG+ekiJ8+EL2keuBdlgJiWFOYZGJByacAeRzlIRO9US/M m+1QSDHZtdG+irBw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1749564831; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1MQSXXafvP/LLujHfyD/l7INR/OL5srFwLiZqmlLeww=; b=C9hvL1E+2HIk8OagG9rvR6rflGt2gg3GY7VeJAkv/DU0QFmGG78fJD0q8niZtv1GCK9QVV wa9xxkzx4VDzE+CJcMcWGiGhUXgxbzimDnDDdk3CTvqv5EI0FrEYHug7CI8YqEvzrod0Y3 vJQjWnZCzd0dGoL0iHyBH+0lBNB5UlQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1749564831; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1MQSXXafvP/LLujHfyD/l7INR/OL5srFwLiZqmlLeww=; b=Gm7jUAe6YI8zWx0kaLtiFebWapGhfZfXSscjgRwmHFKTFmEhyslcbNk3tihMs/fZLx/Y+T bTves1OV2m3xb7CQ== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 77D57139E2; Tue, 10 Jun 2025 14:13:51 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id efZnGp89SGiwNQAAD6G6ig (envelope-from ); Tue, 10 Jun 2025 14:13:51 +0000 Date: Tue, 10 Jun 2025 16:13:45 +0200 From: Oscar Salvador To: Peter Xu Cc: Andrew Morton , Muchun Song , David Hildenbrand , James Houghton , Gavin Guo , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp Message-ID: References: <20250602141610.173698-1-osalvador@suse.de> <20250602141610.173698-2-osalvador@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Action: no action X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: EEF611C0009 X-Stat-Signature: mg8ern5ainxd5fq8h7opb9gkg5ze88gy X-Rspam-User: X-HE-Tag: 1749564833-811661 X-HE-Meta: U2FsdGVkX19wApYOqtcHH7X6gYZDsK8Q8Qmlgo3TBWiRTDgjCTWGN+3Hn5nP8bLm5hRL06Vs9+sh7SVmvzW3Hv1sGVcwo0v6Z0AdROU9onCmRP8B/MRQ8xAlOfAAqPLM8+zK9MEgtI6Yk3yvFO05H7KT+veJjO9+l4MPkrl+fACOqEYbDRM+HphWLXDHwNva6KRLsXjh2FP5D4LsXOFV8bRvbsjz7zZFp/UZBo+gIc76+OlKZq754wQHCzvOCVVWTtJFWJkXq01hTK4zK0P35lYP250rdJt9i9ymwKab5PLo4h6CZjqMPZ6WKyYsyaP07tz3MeDknN7Wu6uIoQ3wZP3fwvaEpTBEtAurF5shpRLqF3p7Jr+KH0DvxMYaSpPM+5/AX9AakLWhepjl7f7CGQRTyxcJieN6zQHBpaDP+8hbrb2avo8TTQn0JN7jXq3t7W7aeNksAKxPaMlz0VPFXHCafgPt5LH+VqzJ64qrvrzzC++dw+ROhNbssnCvRmL53J1XkKSmnT/zs7IRiQa87ttzmhbwvZHyOTrRT74lafKBmD8cwFgJWD0vVTFloY0YWAOmq56W68yEmRgMAUR/BpUQ184VmRixDsHeCdvsyonyh6EuiVbzhLvy1D0/nm2KnOaDH0Uaflhvk4xO9M7SWIovSxzYF3v9yA2pvedrNyJnzPj81SJr2K6lGz3gRFjrj9hdQ5Nc46d4WrJzL8MVUYG05Ue5+BNQoMdDADmNMOYF/8ZjkZvs1oWMbAYi9DQPgyLz19KG3i8VIJCQyyDoWkshah1T9laLbjtAQaKJ4X4WSC1lPv0GhJ7rKeunSVWG3I8o6Ly0dkAjS2ojp2VpdaKQG5kzMz8kx9HtzDn+AFGQKAKvienEfoLlR6j/PW5Vfy1KbELg1YlnEkBe2rIQyD+K8AiVN5MEbWI52SrI3VWbflFMjRdxj2uwA5X7Y2JKRWxqqqV+klHEmyXJqPj RH/35XCB 9/a+AK+xL3a+woKxA+YwfKNAnBgqtvzWrjiQZ6jtC/0mUN3nbn+O5FOoFLsG6yjzIcrvaquqdUzgiEltOLL55zdvw1/hujxuDWIA0i3VJ2xhC6DhX7AI6mSI+69+U4A8IIqAvG12F9pFOpQy6c+71jMzLQP46FaMYIZFtpvWmk9ebW8D30QSU6RSMAK0y6uwJLaSWN6zN5qy79pCiqr1IUQ4b4f9SKxk02U/iCF5hYp0iZAuASyudB+Z0Jnr2Wpo/vt733NDnqSjKatn2vgaaBNW2z25JUEUK6lWXLf0w9fJK0kIoslUCel87wOzfu1WAqIIJVyL75Lxe8HY= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jun 03, 2025 at 02:31:15PM -0400, Peter Xu wrote: > > There're actually three places this patch touched, the 3rd one is > > hugetlb_no_page(), in which case I also think we should lock it, not only > > because file folios normally does it (see do_fault(), for example), but > > also that's exactly what James mentioned I believe on possible race of > > !uptodate hugetlb folio being injected by UFFDIO_CONTINUE, along the lines: > > > > folio = alloc_hugetlb_folio(vma, vmf->address, false); > > ... > > folio_zero_user(folio, vmf->real_address); > > __folio_mark_uptodate(folio); > > I think I was wrong here at least partly.. So what this patch changed is > only the lookup of the no_page path, hence what I said here doesn't apply. > This patch also mentioned in the commit message on why James's concern was > ok - the fault mutex was held. Yes I agree. Actually even without fault > mutex, the folio is only injected into page cache after mark uptodate.. so > it looks fine even without the mutex. > > Though it's still true there're three paths to be discussed, which should > include no_page, and it's still needed to be discussed when any of us like > to remove folio lock even in the lookup path. > > For example, I'm not sure whether it's always thread safe to do > folio_test_hwpoison() when without it, even with the fault mutex. > > Sorry for the confusion, Oscar. So, I'm having another go at this. I started with replacing the check for cow-on-private-mappings [1]. I'm still to figure out how to approach the other locks though. The thing is, we only need the lock in two cases: 1) folio is the original folio in the pagecache: We need to lock it to copy it over to another page (do_fault->do_cow_page locks the folio via filemap_fault). Although, the truth be told, we even lock the folio on read-fault in do_read_fault. Maybe that is just because __do_fault() ends up calling filemap_fault (so it doesn't differentiate?). But that is not a problem, just a note. So it is ok for hugetlb_no_page() to also take the lock when read-faulting. We need to take the lock here. 2) folio is anonymous: We need to take the lock when we want to check if we can re-use the folio in hugetlb_wp. Normal faulting path does it in wp_can_reuse_anon_folio(). Now, the scope for them is different. Normal faulting path only locks the anonymous folio when calling wp_can_reuse_anon_folio() (so, it isn't hold during the copy), while we lock folios from the pagecache in filemap_fault() and keep the lock until we have a) mapped it into our pagetables (read-fault) b) or we have copied it over to another page. Representing this difference of timespan in hugetlb is rather tricky as is. Maybe it could be done if we refactor hugetlb_{fault,no_page,wp} to look more alike to the non-hugetlb faulting path. E.g: hugetlb_fault could directly call hugetlb_wp if it sees that FAULT_FLAG_WRITE is on, instead of doing a first pass to hugetlb_no_page, and leave hugetlb_no_page for only RO stuff. Not put much though on it, but just an idead of how do_fault() handles it. (In an ideal world hugetlb could be re-routed to generic faulting path but we would need to sort out somehow the need to allocate folios, reserves etc. so that is still far future) Anyway, as I said, representing this different of timespan file vs anonymous in hugetlb is tricky, so I think the current state of locks, with [1] applied is fine. hugetlb_fault: - locks the folio mapped in pte_orig hugetlb_no_page: - locks the folio from the pagecache - or allocates a new folio - and inserts it into the pagecache and locks it - and locks it (anonymous) So, hugetlb_wp() already gets the folio locked and doesn't have to bother about anything else. Of course, while I think that locks can be keep as they are, I plan to document them properly so the next poor soul that comes along doesn't have to spend time trying to figure out why we do what we do. Another thing of keeping the locks is that for the future we can potentially rely less on the faulting mutex. [1] https://github.com/leberus/linux/commit/b8e10b854fc3e427fd69d61d065798c7f31ebd55 -- Oscar Salvador SUSE Labs