From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, hughd@google.com, willy@infradead.org,
mgorman@suse.de, muchun.song@linux.dev, vbabka@kernel.org,
akpm@linux-foundation.org, zokeefe@google.com,
rientjes@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [RFC PATCH v2 5/7] x86: mm: free page table pages by RCU instead of semi RCU
Date: Mon, 5 Aug 2024 20:55:09 +0800 [thread overview]
Message-ID: <9a3deedc55947030db20a5ef8aca7b2741df2d9d.1722861064.git.zhengqi.arch@bytedance.com> (raw)
In-Reply-To: <cover.1722861064.git.zhengqi.arch@bytedance.com>
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages
will be freed by semi RCU, that is:
- batch table freeing: asynchronous free by RCU
- single table freeing: IPI + synchronous free
In this way, the page table can be lockless traversed by disabling IRQ in
paths such as fast GUP. But this is not enough to free the empty PTE page
table pages in paths other that munmap and exit_mmap path, because IPI
cannot be synchronized with rcu_read_lock() in pte_offset_map{_lock}().
In preparation for supporting empty PTE page table pages reclaimation,
let single table also be freed by RCU like batch table freeing. Then we
can also use pte_offset_map() etc to prevent PTE page from being freed.
Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to free
the page table pages:
- The pt_rcu_head is unioned with pt_list and pmd_huge_pte.
- For pt_list, it is used to manage the PGD page in x86. Fortunately
tlb_remove_table() will not be used for free PGD pages, so it is safe
to use pt_rcu_head.
- For pmd_huge_pte, we will do zap_deposited_table() before freeing the
PMD page, so it is also safe.
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
arch/x86/include/asm/tlb.h | 19 +++++++++++++++++++
arch/x86/kernel/paravirt.c | 7 +++++++
arch/x86/mm/pgtable.c | 10 +++++++++-
mm/mmu_gather.c | 9 ++++++++-
4 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 580636cdc257b..e223b53a8b190 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -34,4 +34,23 @@ static inline void __tlb_remove_table(void *table)
free_page_and_swap_cache(table);
}
+#ifdef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+ struct page *page;
+
+ page = container_of(head, struct page, rcu_head);
+ free_page_and_swap_cache(page);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+ struct page *page;
+
+ page = table;
+ call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+}
+#define __tlb_remove_table_one __tlb_remove_table_one
+#endif /* CONFIG_PT_RECLAIM */
+
#endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5358d43886adc..199b9a3813b4a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -60,10 +60,17 @@ void __init native_pv_lock_init(void)
static_branch_disable(&virt_spin_lock_key);
}
+#ifndef CONFIG_PT_RECLAIM
static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
{
tlb_remove_page(tlb, table);
}
+#else
+static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+ tlb_remove_table(tlb, table);
+}
+#endif
struct static_key paravirt_steal_enabled;
struct static_key paravirt_steal_rq_enabled;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f5931499c2d6b..ea8522289c93d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -19,12 +19,20 @@ EXPORT_SYMBOL(physical_mask);
#endif
#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PT_RECLAIM
static inline
void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
{
tlb_remove_page(tlb, table);
}
-#endif
+#else
+static inline
+void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+ tlb_remove_table(tlb, table);
+}
+#endif /* !CONFIG_PT_RECLAIM */
+#endif /* !CONFIG_PARAVIRT */
gfp_t __userpte_alloc_gfp = GFP_PGTABLE_USER | PGTABLE_HIGHMEM;
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99b3e9408aa0f..d948479ca09e6 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -311,10 +311,17 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
}
}
+#ifndef __tlb_remove_table_one
+static inline void __tlb_remove_table_one(void *table)
+{
+ __tlb_remove_table(table);
+}
+#endif
+
static void tlb_remove_table_one(void *table)
{
tlb_remove_table_sync_one();
- __tlb_remove_table(table);
+ __tlb_remove_table_one(table);
}
static void tlb_table_flush(struct mmu_gather *tlb)
--
2.20.1
next prev parent reply other threads:[~2024-08-05 12:56 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-05 12:55 [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval Qi Zheng
2024-08-05 14:43 ` David Hildenbrand
2024-08-06 2:40 ` Qi Zheng
2024-08-06 14:16 ` David Hildenbrand
[not found] ` <f6c05526-5ac9-4597-9e80-099ea22fa0ae@bytedance.com>
2024-08-09 16:54 ` David Hildenbrand
2024-08-12 6:21 ` Qi Zheng
2024-08-16 8:59 ` David Hildenbrand
2024-08-16 9:21 ` Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 2/7] mm: introduce CONFIG_PT_RECLAIM Qi Zheng
2024-08-06 14:25 ` David Hildenbrand
2024-08-05 12:55 ` [RFC PATCH v2 3/7] mm: pass address information to pmd_install() Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 4/7] mm: pgtable: try to reclaim empty PTE pages in zap_page_range_single() Qi Zheng
2024-08-06 14:40 ` David Hildenbrand
[not found] ` <42942b4d-153e-43e2-bfb1-43db49f87e50@bytedance.com>
2024-08-16 9:22 ` David Hildenbrand
2024-08-16 10:01 ` Qi Zheng
2024-08-16 10:03 ` David Hildenbrand
2024-08-16 10:07 ` Qi Zheng
2024-08-05 12:55 ` Qi Zheng [this message]
2024-08-05 12:55 ` [RFC PATCH v2 6/7] x86: mm: define arch_flush_tlb_before_set_huge_page Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 7/7] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64 Qi Zheng
2024-08-05 13:14 ` [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages Qi Zheng
2024-08-06 3:31 ` Qi Zheng
2024-08-16 2:55 ` Qi Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9a3deedc55947030db20a5ef8aca7b2741df2d9d.1722861064.git.zhengqi.arch@bytedance.com \
--to=zhengqi.arch@bytedance.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=muchun.song@linux.dev \
--cc=rientjes@google.com \
--cc=vbabka@kernel.org \
--cc=willy@infradead.org \
--cc=zokeefe@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox