Date: Thu, 8 Dec 2022 15:28:57 -0500
From: Peter Xu <peterx@redhat.com>
To: John Hubbard
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, David Hildenbrand, Nadav Amit
Subject: Re: [PATCH v2 04/10] mm/hugetlb: Move swap entry handling into vma lock when faulted
References: <20221207203034.650899-1-peterx@redhat.com> <20221207203034.650899-5-peterx@redhat.com> <326789a5-85ba-f13c-389e-fd21d673e3ae@nvidia.com> <86bff55b-d048-1500-cddc-2d53702d7a3b@nvidia.com>
In-Reply-To: <86bff55b-d048-1500-cddc-2d53702d7a3b@nvidia.com>
On Wed, Dec 07, 2022 at 03:05:42PM -0800, John Hubbard wrote:
> On 12/7/22 14:43, Peter Xu wrote:
> > Note that here migration_entry_wait_huge() will release it.
> >
> > Sorry it's definitely not as straightforward, but this is also something I
> > didn't come up with a better solution, because we need the vma lock to
> > protect the spinlock, which is used in deep code path of the migration
> > code.
> >
> > That's also why I added a rich comment above, and there's "The vma lock
> > will be released there" which is just for that.
> >
>
> Yes, OK,
>
> Reviewed-by: John Hubbard
>
> ...and here is some fancy documentation polishing (incremental on top of this
> specific patch) if you would like to fold it in, optional but it makes me
> happier:
>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 49f73677a418..e3bbd4869f68 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5809,6 +5809,10 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx)
>  }
>  #endif
>
> +/*
> + * There are a few special cases in which this function returns while still
> + * holding locks. Those are noted inline.
> + */

This is not true, I think?  It always releases all the locks.

>  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long address, unsigned int flags)
>  {
> @@ -5851,8 +5855,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	/* PTE markers should be handled the same way as none pte */
>  	if (huge_pte_none_mostly(entry))
>  		/*
> -		 * hugetlb_no_page will drop vma lock and hugetlb fault
> -		 * mutex internally, which make us return immediately.
> +		 * hugetlb_no_page() will release both the vma lock and the
> +		 * hugetlb fault mutex, so just return directly from that:

I'm probably not going to touch this part because it's not part of the
patch.  For the rest, I can do that.  I'll also apply the comment changes
elsewhere if I don't speak up - in most cases they all look good to me.

Thanks,

>  		 */
>  		return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
>  				       entry, flags);
> @@ -5869,10 +5873,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (!pte_present(entry)) {
>  		if (unlikely(is_hugetlb_entry_migration(entry))) {
>  			/*
> -			 * Release fault lock first because the vma lock is
> -			 * needed to guard the huge_pte_lockptr() later in
> -			 * migration_entry_wait_huge().  The vma lock will
> -			 * be released there.
> +			 * Release the hugetlb fault lock now, but retain the
> +			 * vma lock, because it is needed to guard the
> +			 * huge_pte_lockptr() later in
> +			 * migration_entry_wait_huge().  The vma lock will be
> +			 * released there.
>  			 */
>  			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  			migration_entry_wait_huge(vma, ptep);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index d14f1f3ab073..a31df628b938 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -333,16 +333,18 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  }
>
>  #ifdef CONFIG_HUGETLB_PAGE
> +
> +/*
> + * The vma read lock must be held upon entry. Holding that lock prevents either
> + * the pte or the ptl from being freed.
> + *
> + * This function will release the vma lock before returning.
> + */
>  void __migration_entry_wait_huge(struct vm_area_struct *vma,
>  				pte_t *ptep, spinlock_t *ptl)
>  {
>  	pte_t pte;
>
> -	/*
> -	 * The vma read lock must be taken, which will be released before
> -	 * the function returns. It makes sure the pgtable page (along
> -	 * with its spin lock) not be freed in parallel.
> -	 */
>  	hugetlb_vma_assert_locked(vma);
>  	spin_lock(ptl);
>
>
> thanks,
> --
> John Hubbard
> NVIDIA
>

-- 
Peter Xu