From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
James Houghton <jthoughton@google.com>,
Miaohe Lin <linmiaohe@huawei.com>,
David Hildenbrand <david@redhat.com>,
Muchun Song <songmuchun@bytedance.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Nadav Amit <nadav.amit@gmail.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
peterx@redhat.com, Rik van Riel <riel@surriel.com>
Subject: [PATCH RFC 02/10] mm/hugetlb: Comment huge_pte_offset() for its locking requirements
Date: Sun, 30 Oct 2022 17:29:21 -0400 [thread overview]
Message-ID: <20221030212929.335473-3-peterx@redhat.com> (raw)
In-Reply-To: <20221030212929.335473-1-peterx@redhat.com>
huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
hugetlb address.
Normally, it's always safe to walk the pgtable as long as we're with the
mmap lock held for either read or write, because that guarantees the
pgtable pages will always be valid during the process.
But it's not true for hugetlbfs: hugetlbfs has the pmd sharing feature, it
means that even with mmap lock held, the PUD pgtable page can still go away
from under us if pmd unsharing is possible during the walk.
It's not always the case, e.g.:
(1) If the mapping is private we're not prone to pmd sharing or
unsharing, so it's okay.
(2) If we're with the hugetlb vma lock held for either read/write, it's
okay too because pmd unshare cannot happen at all.
Document all these explicitly for huge_pte_offset(), because it's really
not that obvious. This also tells all the callers on what it needs to
guarantee huge_pte_offset() thread-safety.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/arm64/mm/hugetlbpage.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 35e9a468d13e..0bf930c75d4b 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -329,6 +329,38 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
return ptep;
}
+/*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * NOTE: since this function will walk all the pgtable pages (including not
+ * only high-level pgtable page, but also PUD that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible of its thread safety. One can follow this rule:
+ *
+ * (1) For private mappings: pmd unsharing is not possible, so it'll
+ * always be safe if we're with the mmap sem for either read or write.
+ * This is normally always the case, so IOW we don't need to do
+ * anything special.
+ *
+ * (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged
+ * pgtable page can go away from under us! It can be done by a pmd
+ * unshare with a follow up munmap() on the other process), then we
+ * need either:
+ *
+ * (2.1) hugetlb vma lock read or write held, to make sure pmd unshare
+ * won't happen upon the range (it also makes sure the pte_t we
+ * read is the right and stable one), or,
+ *
+ * (2.2) RCU read lock, to make sure even pmd unsharing happened, the
+ * old shared PUD page won't get freed from under us, so even of
+ * the pteval can be obsolete, at least it's still always safe to
+ * access the pgtable page (e.g., de-referencing pte_t* would not
+ * cause use-after-free).
+ *
+ * PS: from the regard of (2.2), it's the same logic of fast-gup being safe
+ * for generic mm, as long as RCU is used to free any pgtable page.
+ */
pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz)
{
--
2.37.3
next prev parent reply other threads:[~2022-10-30 21:29 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-30 21:29 [PATCH RFC 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Peter Xu
2022-10-30 21:29 ` [PATCH RFC 01/10] mm/hugetlb: Let vma_offset_start() to return start Peter Xu
2022-11-03 15:25 ` Mike Kravetz
2022-10-30 21:29 ` Peter Xu [this message]
2022-11-01 5:46 ` [PATCH RFC 02/10] mm/hugetlb: Comment huge_pte_offset() for its locking requirements Nadav Amit
2022-11-02 20:51 ` Peter Xu
2022-11-03 15:42 ` Mike Kravetz
2022-11-03 18:11 ` Peter Xu
2022-11-03 18:38 ` Mike Kravetz
2022-10-30 21:29 ` [PATCH RFC 03/10] mm/hugetlb: Make hugetlb_vma_maps_page() RCU-safe Peter Xu
2022-10-30 21:29 ` [PATCH RFC 04/10] mm/hugetlb: Make userfaultfd_huge_must_wait() RCU-safe Peter Xu
2022-11-02 18:06 ` James Houghton
2022-11-02 21:17 ` Peter Xu
2022-10-30 21:29 ` [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe Peter Xu
2022-11-06 8:14 ` kernel test robot
2022-11-06 16:41 ` Peter Xu
2022-10-30 21:29 ` [PATCH RFC 06/10] mm/hugetlb: Make page_vma_mapped_walk() RCU-safe Peter Xu
2022-10-30 21:29 ` [PATCH RFC 07/10] mm/hugetlb: Make hugetlb_follow_page_mask() RCU-safe Peter Xu
2022-11-02 18:24 ` James Houghton
2022-11-03 15:50 ` Peter Xu
2022-10-30 21:30 ` [PATCH RFC 08/10] mm/hugetlb: Make follow_hugetlb_page RCU-safe Peter Xu
2022-10-30 21:30 ` [PATCH RFC 09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe Peter Xu
2022-11-02 18:04 ` James Houghton
2022-11-03 15:39 ` Peter Xu
2022-10-30 21:30 ` [PATCH RFC 10/10] mm/hugetlb: Comment at rest huge_pte_offset() places Peter Xu
2022-11-01 5:39 ` Nadav Amit
2022-11-02 21:21 ` Peter Xu
2022-11-04 0:21 ` [PATCH RFC 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Mike Kravetz
2022-11-04 15:02 ` Peter Xu
2022-11-04 15:44 ` Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221030212929.335473-3-peterx@redhat.com \
--to=peterx@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=jthoughton@google.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=nadav.amit@gmail.com \
--cc=riel@surriel.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox