From: Qi Zheng <zhengqi.arch@bytedance.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
 Muchun Song, Peter Xu, Oscar Salvador
Subject: Re: [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer
Date: Mon, 29 Jul 2024 15:48:09 +0800
Message-ID: <100ecc66-c2ce-4dbb-8600-d782e75ab69c@bytedance.com>
In-Reply-To: <20240725183955.2268884-2-david@redhat.com>
References: <20240725183955.2268884-1-david@redhat.com>
 <20240725183955.2268884-2-david@redhat.com>

On 2024/7/26 02:39, David Hildenbrand wrote:
> pte_lockptr() is the only *_lockptr() function that doesn't consume
> what would be expected: it consumes a pmd_t pointer instead of a pte_t
> pointer.
>
> Let's change that. The two callers in pgtable-generic.c are easily
> adjusted. Adjust khugepaged.c:retract_page_tables() to simply do a
> pte_offset_map_nolock() to obtain the lock, even though we won't actually
> be traversing the page table.
>
> This makes the code more similar to the other variants and avoids other
> hacks to make the new pte_lockptr() version happy. pte_lockptr() users
> now reside only in pgtable-generic.c.
>
> Maybe using pte_offset_map_nolock() is the right thing to do, because
> the PTE table could have been removed in the meantime? At least it sounds
> more future-proof if we ever have other means of page table reclaim.

Agreed, this helps us recheck the pmd entry.

>
> It's not quite clear if holding the PTE table lock is really required:
> what if someone else obtains the lock just after we unlock it? But we'll
> leave that as is for now, maybe there are good reasons.
>
> This is a preparation for adapting hugetlb page table locking logic to
> take the same locks as core-mm page table walkers would.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/mm.h   |  7 ++++---
>  mm/khugepaged.c      | 21 +++++++++++++++------
>  mm/pgtable-generic.c |  4 ++--
>  3 files changed, 21 insertions(+), 11 deletions(-)

Since pte_lockptr() no longer takes a pmd parameter, it would be best to
update the comment above __pte_offset_map_lock() as well:

```
This helps the caller to avoid a later pte_lockptr(mm, *pmd), which
might by that time act on a changed *pmd ...
```

Otherwise:

Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>

>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2c6ccf088c7be..0472a5090b180 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2873,9 +2873,10 @@ static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  }
>  #endif /* ALLOC_SPLIT_PTLOCKS */
>
> -static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
> +static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
>  {
> -	return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
> +	/* PTE page tables don't currently exceed a single page. */
> +	return ptlock_ptr(virt_to_ptdesc(pte));
>  }
>
>  static inline bool ptlock_init(struct ptdesc *ptdesc)
> @@ -2898,7 +2899,7 @@ static inline bool ptlock_init(struct ptdesc *ptdesc)
>  /*
>   * We use mm->page_table_lock to guard all pagetable pages of the mm.
>   */
> -static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
> +static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
>  {
>  	return &mm->page_table_lock;
>  }
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cdd1d8655a76b..f3b3db1046155 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1697,12 +1697,13 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  	i_mmap_lock_read(mapping);
>  	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>  		struct mmu_notifier_range range;
> +		bool retracted = false;
>  		struct mm_struct *mm;
>  		unsigned long addr;
>  		pmd_t *pmd, pgt_pmd;
>  		spinlock_t *pml;
>  		spinlock_t *ptl;
> -		bool skipped_uffd = false;
> +		pte_t *pte;
>
>  		/*
>  		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
> @@ -1739,9 +1740,17 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  		mmu_notifier_invalidate_range_start(&range);
>
>  		pml = pmd_lock(mm, pmd);
> -		ptl = pte_lockptr(mm, pmd);
> +
> +		/*
> +		 * No need to check the PTE table content, but we'll grab the
> +		 * PTE table lock while we zap it.
> +		 */
> +		pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
> +		if (!pte)
> +			goto unlock_pmd;
>  		if (ptl != pml)
>  			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> +		pte_unmap(pte);
>
>  		/*
>  		 * Huge page lock is still held, so normally the page table
> @@ -1752,20 +1761,20 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  		 * repeating the anon_vma check protects from one category,
>  		 * and repeating the userfaultfd_wp() check from another.
>  		 */
> -		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
> -			skipped_uffd = true;
> -		} else {
> +		if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
>  			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
>  			pmdp_get_lockless_sync();
> +			retracted = true;
>  		}
>
>  		if (ptl != pml)
>  			spin_unlock(ptl);
> +unlock_pmd:
>  		spin_unlock(pml);
>
>  		mmu_notifier_invalidate_range_end(&range);
>
> -		if (!skipped_uffd) {
> +		if (retracted) {
>  			mm_dec_nr_ptes(mm);
>  			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
>  			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index a78a4adf711ac..13a7705df3f87 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -313,7 +313,7 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
>
>  	pte = __pte_offset_map(pmd, addr, &pmdval);
>  	if (likely(pte))
> -		*ptlp = pte_lockptr(mm, &pmdval);
> +		*ptlp = pte_lockptr(mm, pte);
>  	return pte;
>  }
>
> @@ -371,7 +371,7 @@ pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
>  	pte = __pte_offset_map(pmd, addr, &pmdval);
>  	if (unlikely(!pte))
>  		return pte;
> -	ptl = pte_lockptr(mm, &pmdval);
> +	ptl = pte_lockptr(mm, pte);
>  	spin_lock(ptl);
>  	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>  		*ptlp = ptl;
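
For readers skimming the thread, a condensed sketch of the locking
pattern the retract_page_tables() hunk adopts after this change. This is
illustrative only, distilled from the diff above: the wrapper name
zap_pte_table_sketch() is made up, while pmd_lock(),
pte_offset_map_nolock(), spin_lock_nested() and pte_unmap() are the
kernel primitives the patch actually uses.

```
#include <linux/mm.h>

/*
 * Illustrative sketch, not part of the patch: take the PMD lock first,
 * then look up the PTE table and its lock via pte_offset_map_nolock(),
 * which rechecks *pmd while mapping the PTE table, so a table freed in
 * the meantime is detected here instead of handing back a stale lock.
 */
static void zap_pte_table_sketch(struct mm_struct *mm, pmd_t *pmd,
				 unsigned long addr)
{
	spinlock_t *pml, *ptl;
	pte_t *pte;

	pml = pmd_lock(mm, pmd);		/* PMD-level lock first */

	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
	if (!pte)
		goto unlock_pmd;		/* PTE table already gone */

	if (ptl != pml)				/* split PTE locks configured */
		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
	pte_unmap(pte);				/* mapping dropped, lock kept */

	/* ... zap the PTE table while holding both locks ... */

	if (ptl != pml)
		spin_unlock(ptl);
unlock_pmd:
	spin_unlock(pml);
}
```

The key point is that the PTE table lock pointer now comes from the
mapped pte itself rather than from a possibly stale *pmd value.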