From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, jannh@google.com, hughd@google.com,
willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
akpm@linux-foundation.org, peterx@redhat.com
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
dave.hansen@linux.intel.com, luto@kernel.org,
peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
zokeefe@google.com, rientjes@google.com,
Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 1/9] mm: khugepaged: recheck pmd state in retract_page_tables()
Date: Thu, 14 Nov 2024 14:59:52 +0800
Message-ID: <e5b321ffc3ebfcc46e53830e917ad246f7d2825f.1731566457.git.zhengqi.arch@bytedance.com>
In-Reply-To: <cover.1731566457.git.zhengqi.arch@bytedance.com>
In retract_page_tables(), the lock on new_folio is still held, so the page
fault path will block on it, which prevents the pte entries from being set
again. So even if the old empty PTE page is concurrently freed and a new
PTE page is filled into the pmd entry, the new page is still empty and can
be removed.

So just refactor retract_page_tables() a little bit and recheck the pmd
state after taking the pmd lock.
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
mm/khugepaged.c | 45 +++++++++++++++++++++++++++++++--------------
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6f8d46d107b4b..99dc995aac110 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -947,17 +947,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
return SCAN_SUCCEED;
}
-static int find_pmd_or_thp_or_none(struct mm_struct *mm,
- unsigned long address,
- pmd_t **pmd)
+static inline int check_pmd_state(pmd_t *pmd)
{
- pmd_t pmde;
+ pmd_t pmde = pmdp_get_lockless(pmd);
- *pmd = mm_find_pmd(mm, address);
- if (!*pmd)
- return SCAN_PMD_NULL;
-
- pmde = pmdp_get_lockless(*pmd);
if (pmd_none(pmde))
return SCAN_PMD_NONE;
if (!pmd_present(pmde))
@@ -971,6 +964,17 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
return SCAN_SUCCEED;
}
+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+ unsigned long address,
+ pmd_t **pmd)
+{
+ *pmd = mm_find_pmd(mm, address);
+ if (!*pmd)
+ return SCAN_PMD_NULL;
+
+ return check_pmd_state(*pmd);
+}
+
static int check_pmd_still_valid(struct mm_struct *mm,
unsigned long address,
pmd_t *pmd)
@@ -1720,7 +1724,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
pmd_t *pmd, pgt_pmd;
spinlock_t *pml;
spinlock_t *ptl;
- bool skipped_uffd = false;
+ bool success = false;
/*
* Check vma->anon_vma to exclude MAP_PRIVATE mappings that
@@ -1757,6 +1761,19 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
mmu_notifier_invalidate_range_start(&range);
pml = pmd_lock(mm, pmd);
+ /*
+ * The lock on new_folio is still held, so the page fault
+ * path will block on it, preventing the pte entries from
+ * being set again. So even if the old empty PTE page is
+ * concurrently freed and a new PTE page is filled into the
+ * pmd entry, the new page is still empty and can be removed.
+ *
+ * So here we only need to recheck whether the pmd entry's
+ * state still meets our requirements, rather than checking
+ * pmd_same() like elsewhere.
+ */
+ if (check_pmd_state(pmd) != SCAN_SUCCEED)
+ goto drop_pml;
ptl = pte_lockptr(mm, pmd);
if (ptl != pml)
spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
@@ -1770,20 +1787,20 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* repeating the anon_vma check protects from one category,
* and repeating the userfaultfd_wp() check from another.
*/
- if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
- skipped_uffd = true;
- } else {
+ if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
pmdp_get_lockless_sync();
+ success = true;
}
if (ptl != pml)
spin_unlock(ptl);
+drop_pml:
spin_unlock(pml);
mmu_notifier_invalidate_range_end(&range);
- if (!skipped_uffd) {
+ if (success) {
mm_dec_nr_ptes(mm);
page_table_check_pte_clear_range(mm, addr, pgt_pmd);
pte_free_defer(mm, pmd_pgtable(pgt_pmd));
--
2.20.1
Thread overview: 28+ messages
2024-11-14 6:59 [PATCH v3 0/9] synchronously scan and reclaim empty user PTE pages Qi Zheng
2024-11-14 6:59 ` Qi Zheng [this message]
2024-11-14 6:59 ` [PATCH v3 2/9] mm: userfaultfd: recheck dst_pmd entry in move_pages_pte() Qi Zheng
2024-11-14 6:59 ` [PATCH v3 3/9] mm: introduce zap_nonpresent_ptes() Qi Zheng
2024-11-14 6:59 ` [PATCH v3 4/9] mm: introduce skip_none_ptes() Qi Zheng
2024-11-14 8:04 ` David Hildenbrand
2024-11-14 9:20 ` Qi Zheng
2024-11-14 12:32 ` David Hildenbrand
2024-11-14 12:51 ` Qi Zheng
2024-11-14 21:19 ` David Hildenbrand
2024-11-15 3:03 ` Qi Zheng
2024-11-15 10:22 ` David Hildenbrand
2024-11-15 14:41 ` Qi Zheng
2024-11-15 14:59 ` David Hildenbrand
2024-11-18 3:35 ` Qi Zheng
2024-11-18 9:29 ` David Hildenbrand
2024-11-18 10:34 ` Qi Zheng
2024-11-18 10:41 ` David Hildenbrand
2024-11-18 10:56 ` Qi Zheng
2024-11-18 10:59 ` David Hildenbrand
2024-11-18 11:13 ` Qi Zheng
2024-11-19 9:55 ` David Hildenbrand
2024-11-19 10:03 ` Qi Zheng
2024-11-14 6:59 ` [PATCH v3 5/9] mm: introduce do_zap_pte_range() Qi Zheng
2024-11-14 6:59 ` [PATCH v3 6/9] mm: make zap_pte_range() handle full within-PMD range Qi Zheng
2024-11-14 6:59 ` [PATCH v3 7/9] mm: pgtable: try to reclaim empty PTE page in madvise(MADV_DONTNEED) Qi Zheng
2024-11-14 6:59 ` [PATCH v3 8/9] x86: mm: free page table pages by RCU instead of semi RCU Qi Zheng
2024-11-14 7:00 ` [PATCH v3 9/9] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64 Qi Zheng