From: Qi Zheng <zhengqi.arch@bytedance.com>
To: akpm@linux-foundation.org, tglx@linutronix.de,
kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com,
david@redhat.com, jgg@nvidia.com, tj@kernel.org,
dennis@kernel.org, ming.lei@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, songmuchun@bytedance.com,
zhouchengming@bytedance.com,
Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [RFC PATCH 09/18] pte_ref: add pte_tryget() and {__,}pte_put() helper
Date: Fri, 29 Apr 2022 21:35:43 +0800
Message-ID: <20220429133552.33768-10-zhengqi.arch@bytedance.com>
In-Reply-To: <20220429133552.33768-1-zhengqi.arch@bytedance.com>
A user PTE page table page may be freed when the last reference
on its percpu_ref is dropped. So we need to try to get its
percpu_ref before accessing the PTE page, to prevent the page
from being freed during the access.
This patch adds pte_tryget() and {__,}pte_put() helpers to
get and put the percpu_ref of user PTE page table pages.
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
include/linux/pte_ref.h | 23 ++++++++++++++++
mm/pte_ref.c | 58 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 81 insertions(+)
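The tryget/put pattern described in the commit message can be illustrated with a minimal userspace sketch. This is not the kernel's percpu_ref and not part of the patch: struct model_ref and the model_ref_*() functions are hypothetical names, and a single C11 atomic counter stands in for the per-CPU machinery. It only shows the assumed contract that a "tryget" must fail once the count has reached zero, so a page that is being freed cannot be pinned again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference counter of one PTE page. */
struct model_ref {
	atomic_long count;	/* 0 means the page is being (or has been) freed */
};

static void model_ref_init(struct model_ref *ref)
{
	atomic_init(&ref->count, 1);	/* initial reference held by the owner */
}

/* Take a reference only if the count has not already dropped to zero. */
static bool model_ref_tryget(struct model_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool model_ref_put(struct model_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	return old == 1;
}
```

In this model, a caller that gets true from model_ref_tryget() may touch the protected object until it calls model_ref_put(); a caller that gets false must back off, which corresponds to pte_tryget() returning false in the patch below.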
diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h
index d3963a151ca5..bfe620038699 100644
--- a/include/linux/pte_ref.h
+++ b/include/linux/pte_ref.h
@@ -12,6 +12,10 @@
bool pte_ref_init(pgtable_t pte);
void pte_ref_free(pgtable_t pte);
+void free_user_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr);
+bool pte_tryget(struct mm_struct *mm, pmd_t *pmd, unsigned long addr);
+void __pte_put(pgtable_t page);
+void pte_put(pte_t *ptep);
#else /* !CONFIG_FREE_USER_PTE */
@@ -24,6 +28,25 @@ static inline void pte_ref_free(pgtable_t pte)
{
}
+static inline void free_user_pte(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr)
+{
+}
+
+static inline bool pte_tryget(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr)
+{
+ return true;
+}
+
+static inline void __pte_put(pgtable_t page)
+{
+}
+
+static inline void pte_put(pte_t *ptep)
+{
+}
+
#endif /* CONFIG_FREE_USER_PTE */
#endif /* _LINUX_PTE_REF_H */
diff --git a/mm/pte_ref.c b/mm/pte_ref.c
index 52e31be00de4..5b382445561e 100644
--- a/mm/pte_ref.c
+++ b/mm/pte_ref.c
@@ -44,4 +44,62 @@ void pte_ref_free(pgtable_t pte)
kfree(ref);
}
+void free_user_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr) {}
+
+/*
+ * pte_tryget - try to get the pte_ref of the user PTE page table page
+ * @mm: pointer to the target address space
+ * @pmd: pointer to a PMD.
+ * @addr: virtual address associated with pmd.
+ *
+ * Return: true if getting the pte_ref succeeded, false otherwise.
+ *
+ * Before accessing the user PTE page table, we need to hold a refcount to
+ * protect against the concurrent release of the PTE page table.
+ * Getting the refcount fails in the following cases:
+ * - The content mapped in @pmd is not a PTE page
+ * - The pte_ref is zero, so the PTE page may be being reclaimed
+ */
+bool pte_tryget(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
+{
+ bool retval = true;
+ pmd_t pmdval;
+ pgtable_t pte;
+
+ rcu_read_lock();
+ pmdval = READ_ONCE(*pmd);
+ pte = pmd_pgtable(pmdval);
+ if (unlikely(pmd_none(pmdval) || pmd_leaf(pmdval))) {
+ retval = false;
+ } else if (!percpu_ref_tryget(pte->pte_ref)) {
+ rcu_read_unlock();
+ /*
+ * Also do free_user_pte() here to prevent missed reclaim due
+ * to race condition.
+ */
+ free_user_pte(mm, pmd, addr & PMD_MASK);
+ return false;
+ }
+ rcu_read_unlock();
+
+ return retval;
+}
+
+void __pte_put(pgtable_t page)
+{
+ percpu_ref_put(page->pte_ref);
+}
+
+void pte_put(pte_t *ptep)
+{
+ pgtable_t page;
+
+ if (pte_huge(*ptep))
+ return;
+
+ page = pte_to_page(ptep);
+ __pte_put(page);
+}
+EXPORT_SYMBOL(pte_put);
+
#endif /* CONFIG_FREE_USER_PTE */
--
2.20.1