Date: Fri, 26 Jul 2024 11:36:45 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Muchun Song, Oscar Salvador
Subject: Re: [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer
References: <20240725183955.2268884-1-david@redhat.com> <20240725183955.2268884-2-david@redhat.com>
In-Reply-To: <20240725183955.2268884-2-david@redhat.com>
On Thu, Jul 25, 2024 at 08:39:54PM +0200, David Hildenbrand wrote:
> pte_lockptr() is the only *_lockptr() function that doesn't consume
> what would be expected: it consumes a pmd_t pointer instead of a pte_t
> pointer.
>
> Let's change that. The two callers in pgtable-generic.c are easily
> adjusted. Adjust khugepaged.c:retract_page_tables() to simply do a
> pte_offset_map_nolock() to obtain the lock, even though we won't actually
> be traversing the page table.
>
> This makes the code more similar to the other variants and avoids other
> hacks to make the new pte_lockptr() version happy. pte_lockptr() users
> now reside only in pgtable-generic.c.
>
> Maybe using pte_offset_map_nolock() is the right thing to do because
> the PTE table could have been removed in the meantime? At least it sounds
> more future-proof if we ever have other means of page table reclaim.

I think it can't change, because anyone who wants to race against this
would first need to take the pmd lock (which is already held here)?

I wonder whether an open-coded "ptlock_ptr(page_ptdesc(pmd_page(*pmd)))"
would be nicer here, but only if my understanding is correct; see the
rough sketch below the quoted changelog.

Thanks,

>
> It's not quite clear if holding the PTE table lock is really required:
> what if someone else obtains the lock just after we unlock it? But we'll
> leave that as is for now, maybe there are good reasons.
>
> This is a preparation for adapting hugetlb page table locking logic to
> take the same locks as core-mm page table walkers would.
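To illustrate what I mean: an untested sketch, assuming my reading above
is right that the already-held pmd lock is what keeps the PTE table from
going away. It simply reuses the old pte_lockptr() body directly in
retract_page_tables():

	pml = pmd_lock(mm, pmd);
	/*
	 * Open-coded variant of the old pte_lockptr(): while the pmd
	 * lock is held, the PTE table cannot be freed, so deriving the
	 * PTE spinlock from the pmd's page should be safe without
	 * mapping the table via pte_offset_map_nolock().
	 */
	ptl = ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
	if (ptl != pml)
		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);

That would also avoid the new "unlock_pmd:" label and the !pte error
path below, since there is no pte pointer to obtain or unmap.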
>
> Signed-off-by: David Hildenbrand
> ---
>  include/linux/mm.h   |  7 ++++---
>  mm/khugepaged.c      | 21 +++++++++++++++------
>  mm/pgtable-generic.c |  4 ++--
>  3 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2c6ccf088c7be..0472a5090b180 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2873,9 +2873,10 @@ static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  }
>  #endif /* ALLOC_SPLIT_PTLOCKS */
>
> -static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
> +static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
>  {
> -	return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
> +	/* PTE page tables don't currently exceed a single page. */
> +	return ptlock_ptr(virt_to_ptdesc(pte));
>  }
>
>  static inline bool ptlock_init(struct ptdesc *ptdesc)
> @@ -2898,7 +2899,7 @@ static inline bool ptlock_init(struct ptdesc *ptdesc)
>  /*
>   * We use mm->page_table_lock to guard all pagetable pages of the mm.
>   */
> -static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
> +static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
>  {
>  	return &mm->page_table_lock;
>  }
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cdd1d8655a76b..f3b3db1046155 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1697,12 +1697,13 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  	i_mmap_lock_read(mapping);
>  	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>  		struct mmu_notifier_range range;
> +		bool retracted = false;
>  		struct mm_struct *mm;
>  		unsigned long addr;
>  		pmd_t *pmd, pgt_pmd;
>  		spinlock_t *pml;
>  		spinlock_t *ptl;
> -		bool skipped_uffd = false;
> +		pte_t *pte;
>
>  		/*
>  		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
> @@ -1739,9 +1740,17 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  		mmu_notifier_invalidate_range_start(&range);
>
>  		pml = pmd_lock(mm, pmd);
> -		ptl = pte_lockptr(mm, pmd);
> +
> +		/*
> +		 * No need to check the PTE table content, but we'll grab the
> +		 * PTE table lock while we zap it.
> +		 */
> +		pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
> +		if (!pte)
> +			goto unlock_pmd;
>  		if (ptl != pml)
>  			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> +		pte_unmap(pte);
>
>  		/*
>  		 * Huge page lock is still held, so normally the page table
> @@ -1752,20 +1761,20 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  		 * repeating the anon_vma check protects from one category,
>  		 * and repeating the userfaultfd_wp() check from another.
>  		 */
> -		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
> -			skipped_uffd = true;
> -		} else {
> +		if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
>  			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
>  			pmdp_get_lockless_sync();
> +			retracted = true;
>  		}
>
>  		if (ptl != pml)
>  			spin_unlock(ptl);
> +unlock_pmd:
>  		spin_unlock(pml);
>
>  		mmu_notifier_invalidate_range_end(&range);
>
> -		if (!skipped_uffd) {
> +		if (retracted) {
>  			mm_dec_nr_ptes(mm);
>  			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
>  			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index a78a4adf711ac..13a7705df3f87 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -313,7 +313,7 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
>
>  	pte = __pte_offset_map(pmd, addr, &pmdval);
>  	if (likely(pte))
> -		*ptlp = pte_lockptr(mm, &pmdval);
> +		*ptlp = pte_lockptr(mm, pte);
>  	return pte;
>  }
>
> @@ -371,7 +371,7 @@ pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
>  	pte = __pte_offset_map(pmd, addr, &pmdval);
>  	if (unlikely(!pte))
>  		return pte;
> -	ptl = pte_lockptr(mm, &pmdval);
> +	ptl = pte_lockptr(mm, pte);
>  	spin_lock(ptl);
>  	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>  		*ptlp = ptl;
> --
> 2.45.2
>

-- 
Peter Xu