Date: Tue, 3 Jun 2025 11:46:19 -0400
From: Peter Xu
To: David Hildenbrand
Cc: Oscar Salvador, Andrew Morton, Muchun Song, James Houghton,
 Gavin Guo, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp
References: <20250602141610.173698-1-osalvador@suse.de>
 <20250602141610.173698-2-osalvador@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
On Tue, Jun 03, 2025 at 05:08:55PM +0200, David Hildenbrand wrote:
> On 03.06.25 16:57, Peter Xu wrote:
> > On Tue, Jun 03, 2025 at 03:50:54PM +0200, Oscar Salvador wrote:
> > > On Mon, Jun 02, 2025 at 05:30:19PM -0400, Peter Xu wrote:
> > > > Right, and thanks for the git digging as usual. I would agree hugetlb is
> > > > more challenging than many other modules for git archaeology. :)
> > > >
> > > > Even if I mentioned the invalidate_lock, I don't think I thought deeper
> > > > than that. I just wished that, whenever possible, we still move hugetlb code
> > > > closer to generic code, so if that's the goal we may still want to one day
> > > > have a closer look at whether hugetlb can also use invalidate_lock. Maybe
> > > > it isn't worthwhile in the end: invalidate_lock is currently a rwsem, which
> > > > normally at least allows concurrent faults, but that's currently what isn't
> > > > allowed in hugetlb anyway..
> > > >
> > > > If we start to remove finer-grained locks that work will be even harder,
> > > > and removing the folio lock in this case in the fault path also brings
> > > > hugetlbfs even further from other file systems. That might be slightly
> > > > against what we used to wish to do, which is to make it closer to the
> > > > others. Meanwhile I'm also not yet sure of the benefit of not taking the
> > > > folio lock all across, e.g. I don't expect perf would change at all even
> > > > if the lock is avoided. We may want to think about that too when doing so.
> > >
> > > Ok, I have to confess I was not looking at things from this perspective,
> > > but when doing so, yes, you are right, we should strive to find
> > > replacements wherever we can so we do not use hugetlb-specific code.
> > >
> > > I do not know about this case though; I am not sure what other options we
> > > have when trying to shut out concurrent faults while doing other operations.
> > > But it is something we should definitely look at.
> > >
> > > Wrt. the lock.
> > > There were two locks, the old_folio one (taken in hugetlb_fault) and the
> > > pagecache_folio one.
> >
> > There are actually three places this patch touched; the 3rd one is
> > hugetlb_no_page(), in which case I also think we should lock it, not only
> > because file folios normally do it (see do_fault(), for example), but
> > also that's exactly what James mentioned, I believe, on a possible race of a
> > !uptodate hugetlb folio being injected by UFFDIO_CONTINUE, along the lines of:
> >
> >         folio = alloc_hugetlb_folio(vma, vmf->address, false);
> >         ...
> >         folio_zero_user(folio, vmf->real_address);
> >         __folio_mark_uptodate(folio);
> >
> > > The thing was not about worrying how much perf we leave on the table
> > > because of these locks, as I am pretty sure it is next to 0, but my drive
> > > was to understand what the protections are and why, because as the
> > > discussion showed, none of us really had a good idea about it, and it turns
> > > out that this goes back more than ~20 years.
> > >
> > > Another topic for the lock (old_folio, so the one we copy from): when we
> > > compare it to generic code, we do not take the lock there.
> > > Looking at do_wp_page(), we do __get__ a reference on the folio we copy
> > > from, but not the lock, so AFAIU the lock seems only to please
> >
> > Yes, this is a good point; for the CoW path alone maybe we don't need to
> > lock old_folio.
> >
> > > folio_move_anon_rmap() from hugetlb_wp.
> > >
> > > Taking a look at do_wp_page()->wp_can_reuse_anon_folio(), which also
> > > calls folio_move_anon_rmap() in case we can re-use the folio, it only
> > > takes the lock before the call to folio_move_anon_rmap(), and then
> > > unlocks it.
> >
> > IMHO, do_wp_page() took the folio lock not for folio_move_anon_rmap(), but
> > for checking the swapcache/ksm stuff, which needs to be serialized with the
> > folio lock.
> >
> > So I'm not 100% confident on folio_move_anon_rmap(), but I _think_ it
> > deserves a data_race(), and IIUC it only works not because of the folio lock,
> > but because of how anon_vma is managed as a tree as of now, so that as long
> > as it is a WRITE_ONCE() even a race is benign (because the rmap walker will
> > either see a complete old anon_vma that includes the parent process's
> > anon_vma, or the child's). What really protects the anon_vma should be the
> > anon_vma lock.. That can definitely be a separate topic. I'm not sure whether
> > you'd like to dig this part out, but if you do I'd also be more than happy to
> > know whether my understanding needs correction here.. :)
> >
> > In general, I still agree with you that if the hugetlb CoW path can look
> > closer to do_wp_page then that's great.
>
> As stated elsewhere, the mapcount check + folio_move_anon_rmap need the
> folio lock.

Could you elaborate on what would go wrong if we do folio_move_anon_rmap()
without the folio lock here?

Just to make sure we're on the same page: we already have the pgtable lock
held, and we decided to reuse an anonymous hugetlb page.

Thanks,

-- 
Peter Xu
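
[For illustration, a minimal sketch of the anon-folio reuse pattern being
discussed above, loosely modeled on the do_wp_page()/wp_can_reuse_anon_folio()
flow; the helper name and the reduced set of checks here are simplifying
assumptions, not the kernel's actual code.]

        /*
         * Simplified sketch (assumes <linux/pagemap.h>, <linux/rmap.h>):
         * reuse an anonymous folio for a write fault only while holding the
         * folio lock, so the exclusivity re-check and the anon_vma rebinding
         * via folio_move_anon_rmap() cannot race with another folio locker.
         */
        static bool reuse_anon_folio_sketch(struct folio *folio,
                                            struct vm_area_struct *vma)
        {
                if (!folio_trylock(folio))
                        return false;
                /* Re-check that we are the sole owner under the folio lock. */
                if (folio_ref_count(folio) != 1) {
                        folio_unlock(folio);
                        return false;
                }
                /* Exclusive: rebind the folio to this VMA's anon_vma. */
                folio_move_anon_rmap(folio, vma);
                folio_unlock(folio);
                return true;
        }

The open question in the thread is whether hugetlb_wp() needs the same
folio-lock window around its reuse check, or whether holding the pgtable lock
alone is enough there.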