From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B745C5AD49 for ; Tue, 3 Jun 2025 13:51:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 030066B0466; Tue, 3 Jun 2025 09:51:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F22DF6B0468; Tue, 3 Jun 2025 09:51:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E119A6B0469; Tue, 3 Jun 2025 09:51:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id C2BF96B0466 for ; Tue, 3 Jun 2025 09:51:08 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 8494680A72 for ; Tue, 3 Jun 2025 13:51:08 +0000 (UTC) X-FDA: 83514225816.14.8FC41FB Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) by imf19.hostedemail.com (Postfix) with ESMTP id 3B1A41A0017 for ; Tue, 3 Jun 2025 13:51:05 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="UNL/7H5X"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3lyZ9GBX; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="UNL/7H5X"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3lyZ9GBX; spf=pass (imf19.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.131 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748958666; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=22/+GF2KAA9m9tC5GoNVuilwMTd7DvHg3TACB80pfT0=; b=X8xOvHPXuHQa+qIqSTgT2wc7rX0Bjiz8qoK81qwCDdo5sywqodfEkc4JNTLTM5n0NXA4HD QGsNLNETQPKwKg7ctdA/d4rYEqxTB24hmSiSVklvSGAyGPzOIJxb4ROSboliJY2l8EbRAR DivdcrEdygYmKOFnxQInMMOcUXIbtvE= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="UNL/7H5X"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3lyZ9GBX; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="UNL/7H5X"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3lyZ9GBX; spf=pass (imf19.hostedemail.com: domain of osalvador@suse.de designates 195.135.223.131 as permitted sender) smtp.mailfrom=osalvador@suse.de; dmarc=pass (policy=none) header.from=suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748958666; a=rsa-sha256; cv=none; b=sYXTSf750oNQNX4sQYsgVacdj9Tu8ke8+pDa8bGl0XZhctBYIbfxtOvw4CEUfqKgv+UqdU WUCaog4q/fxpRjIkikSA0G6c/vB5jfmWapbg92Sci0+FRhF92nTYF0hKEu0e49CyJ5dbQP PQ13LtzoV5y87nGh4H9qMIVxnOjhUlk= Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id B254B1F7B5; Tue, 3 Jun 2025 13:51:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1748958664; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=22/+GF2KAA9m9tC5GoNVuilwMTd7DvHg3TACB80pfT0=; b=UNL/7H5XVAAAhLQUPM3Z8QxOmiYuV+wmQVVotxgRrdX881XF8yDy6D9XEwpAviy7cbnC1u u7RfCNXt4IRu/deo5YXFSQLM0pxp2kElXzzL1NqtcmW82nyScmmGvxG1Uv0pZtRl1OOfxm UaOfxuU7WaPTGSbKBoUwdN3D5o+wqeQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1748958664; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=22/+GF2KAA9m9tC5GoNVuilwMTd7DvHg3TACB80pfT0=; b=3lyZ9GBX6PNzaGdenFVblDMD12InDgglN6MVxX8kNSBpJtdUTAvGDoUZO6L5vRDg0A2EL9 IEVdKPRZFCdsByAA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1748958664; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=22/+GF2KAA9m9tC5GoNVuilwMTd7DvHg3TACB80pfT0=; b=UNL/7H5XVAAAhLQUPM3Z8QxOmiYuV+wmQVVotxgRrdX881XF8yDy6D9XEwpAviy7cbnC1u u7RfCNXt4IRu/deo5YXFSQLM0pxp2kElXzzL1NqtcmW82nyScmmGvxG1Uv0pZtRl1OOfxm UaOfxuU7WaPTGSbKBoUwdN3D5o+wqeQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1748958664; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=22/+GF2KAA9m9tC5GoNVuilwMTd7DvHg3TACB80pfT0=; b=3lyZ9GBX6PNzaGdenFVblDMD12InDgglN6MVxX8kNSBpJtdUTAvGDoUZO6L5vRDg0A2EL9 IEVdKPRZFCdsByAA== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 3B6C913AAD; Tue, 3 Jun 2025 13:51:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id ZcyuC8j9Pmj/FgAAD6G6ig (envelope-from ); Tue, 03 Jun 2025 13:51:04 +0000 Date: Tue, 3 Jun 2025 15:50:54 +0200 From: Oscar Salvador To: Peter Xu Cc: Andrew Morton , Muchun Song , David Hildenbrand , James Houghton , Gavin Guo , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp Message-ID: References: <20250602141610.173698-1-osalvador@suse.de> <20250602141610.173698-2-osalvador@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 3B1A41A0017 X-Stat-Signature: 68u89endbywxm8rd6ctie1bbn9n5jmr5 X-Rspam-User: X-Rspamd-Server: rspam04 X-HE-Tag: 1748958665-35446 X-HE-Meta: U2FsdGVkX1/jCRVdPoQSsPX5CEsUB5qluzRNMtZo8k5k46sd77qeI8q84vDMHMcUXby9qxSNvqsfeZAJtn5UDlfQlZt04j2WVApJsNGwsXPToDixOs1moO3k9+SUC7bk0CRV/CEaqFKY9NEp/aFhQ8GVVoKFS7wz82PIhkRgYs7WHzQmfKMQik34jVxo0x0VBE9WV6vHEGBqY8u0LNyKUiejDWzYBJUpykRd3YJnDd1/3NC7qwexA9YnX8a1TR24aEwVF1OYokOm3TNyn8oghFa0WTcrEIWiV0A+xJk7Hte9YhwFrInPuuCa3XqVnaqi646InOaKIV+EXYIQ9GpAYvA1voEs50Cn1UBBj2tyO1245zReZdKkkH6qFNg3BlV4fmG95yIjyfNfXe6tJHGVmN1M49WN4fURg3w5AV8r4vBADjs5t+xmhV/gVdDU+RcegRuwfwrvtkBAEoKsJO3qSNGGpRxz8GqPCp0Hg1jm+sXtChPtKSAnf64DNIGJ53PdRN9gmzzOq1ljEVtmEwPBymYDjGllT1DHX0Vpa8Qb8+SGRwyhQR0k+ZxHMSRdPEuV879xrLQXkZWag98SBdRSWk4DGdPpopXE08dF/I+h0r2C7/6WkwDNicEtPK88hcc3ZOEdyZHjhKy29Y8TeXklUgmw9tm8S7aHh9hZeSA6cEzAhRRuJduZQCkaRUpaOHlUhNRz2XCPv1aYyVCtC0bSR2Dbqbkq9G+rNe4ME3RgKJyRX6b2PJSTYqjdzWiKikFR4fHOdmWM0PbnVSIqglmn4hA/JbFMIIT1UfwM/dujxtd0ZQ9ljJMVW326KpMBhC7EZ/x+RRzEGGLrJ+GlR77Nqerp9o8wR8GFo7WaT0lqqJtiz0iEULomCL7aipzoSVLdqE6I+P/oMmP7sIvZBYV7Q34hTFsGD3voMUuUT14ME6l/c0ANXEihC02KkrjHvKZ2XEL5Buc3b3SRt16kzVo iMOmq2DC T0U0CzkcYOt4v8K+/DJ12cpdLsY/ssC+esCrA5TbneC/4bPCwiYvTsPfzfob7dieENu08u8Rt3N9MzMKvxOTA0ZrhzrmQXwGWsHrcq3drsK5TwZAlpMVZRfXVyr4CYHDf1belnOHZojGIg92Kfsc0UyJoGNudZ1KhPhzN7VYVpLkUn8TQcY+L28uubQ/ewrcULJ1pXnfBWze78LsljwMGhny0hG7po5txZ1PV0qvbGt2YcEF/q8gphnxy6/LQ/GmUgFUKrnmfwjTjwhdEmdwbl1aTp7EZe+Izu1nCLM6Cm6Lt7d2vH/Dz9y+4Yg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jun 02, 2025 at 05:30:19PM -0400, Peter Xu wrote: > Right, and thanks for the git digging as usual. I would agree hugetlb is > more challenge than many other modules on git archaeology. :) > > Even if I mentioned the invalidate_lock, I don't think I thought deeper > than that. I just wished whenever possible we still move hugetlb code > closer to generic code, so if that's the goal we may still want to one day > have a closer look at whether hugetlb can also use invalidate_lock. Maybe > it isn't worthwhile at last: invalidate_lock is currently a rwsem, which > normally at least allows concurrent fault, but that's currently what isn't > allowed in hugetlb anyway.. > > If we start to remove finer grained locks that work will be even harder, > and removing folio lock in this case in fault path also brings hugetlbfs > even further from other file systems. That might be slightly against what > we used to wish to do, which is to make it closer to others. Meanwhile I'm > also not yet sure the benefit of not taking folio lock all across, e.g. I > don't expect perf would change at all even if lock is avoided. We may want > to think about that too when doing so. Ok, I have to confess I was not looking things from this perspective, but when doing so, yes, you are right, we should strive to find replacements wherever we can for not using hugetlb-specific code. I do not know about this case though, not sure what other options do we have when trying to shut concurrent faults while doing other operation. But it is something we should definitely look at. Wrt. to the lock. There were two locks, old_folio (taken in hugetlb_fault) and pagecache_folio one. The thing was not about worry as how much perf we leave on the table because of these locks, as I am pretty sure is next to 0, but my drive was to understand what are protection and why, because as the discussion showed, none of us really had a good idea about it and it turns out that this goes back more than ~20 years ago. Another topic for the lock (old_folio, so the one we copy from), when we compare it to generic code, we do not take the lock there. Looking at do_wp_page(), we do __get__ a reference on the folio we copy from, but not the lock, so AFAIU, the lock seems only to please folio_move_anon_rmap() from hugetlb_wp. Taking a look at do_wp_page()->wp_can_reuse_anon_folio() which also calls folio_move_anon_rmap() in case we can re-use the folio, it only takes the lock before the call to folio_move_anon_rmap(), and then unlocks it. Which, I think, hugetlb should also do. do_wp_page wp_can_reuse_anon_folio ? : yes: folio_lock ; folio_move_anon_rmap ; folio_unlock bail out : no: get a reference on the folio and call wp_page_copy So, this should be the lead that hugetlb follows. As I said, it is not about the performance, and I agree that relying on finer granularity locks is the way to go, but we need to understand where and why, and with the current code from upstream, that is not clear at all. That is why I wanted to reduce the scope of old_folio to what is actually needed, which is the snippet: if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) { if (!PageAnonExclusive(&old_folio->page)) { folio_move_anon_rmap(old_folio, vma); SetPageAnonExclusive(&old_folio->page); } if (likely(!unshare)) set_huge_ptep_maybe_writable(vma, vmf->address, vmf->pte); delayacct_wpcopy_end(); return 0; } I think it is important to 1) reduce it to wrap what actually needs to be within the lock and 2) document why, so no one has to put the gloves and start digging in the history again. > Thanks! I hope that'll also help whatever patch to land sooner, after it > can be verified to fix the issue. So, my plan is: 1) Fix pagecache folio issue in one patch (test for anon, still need to check but it should work) 2) implement the 'filemap_get_hugetlb_folio' thing to get a reference and not lock it 3) reduce scope of old_folio I want to make it clear that while I still want to add filemap_get_hugetlb_folio and stop using the lock version, the reason is not to give more power to the mutex, but to bring it closer to what do_wp_page does. What do you think about it? -- Oscar Salvador SUSE Labs