From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, James Houghton <jthoughton@google.com>,
stable@vger.kernel.org, Peter Xu <peterx@redhat.com>,
Oscar Salvador <osalvador@suse.de>,
Muchun Song <muchun.song@linux.dev>,
Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking
Date: Thu, 1 Aug 2024 10:50:18 +0200 [thread overview]
Message-ID: <541f6c23-77ad-4d46-a8ed-fb18c9b635b3@redhat.com> (raw)
In-Reply-To: <20240731122103.382509-1-david@redhat.com>
On 31.07.24 14:21, David Hildenbrand wrote:
> We recently made GUP's common page table walking code to also walk hugetlb
> VMAs without most hugetlb special-casing, preparing for the future of
> having less hugetlb-specific page table walking code in the codebase.
> Turns out that we missed one page table locking detail: page table locking
> for hugetlb folios that are not mapped using a single PMD/PUD.
James, Peter,
the following seems to get the job done. Thoughts?
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8e462205400d..776dc3914d9e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -938,10 +938,40 @@ static inline bool htlb_allow_alloc_fallback(int reason)
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ unsigned long size = huge_page_size(h);
+
+ VM_WARN_ON(size == PAGE_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ *
+ * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
+ * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use the MM PT lock
+ * directly with a PMD hugetlb size, whereby core-mm would call
+ * pmd_lockptr() instead. However, in such configurations split PMD
+ * locks are disabled -- split locks don't make sense on a single
+ * PGDIR page table -- and the end result is the same.
+ */
+ if (size >= P4D_SIZE)
+ return &mm->page_table_lock;
+ else if (size >= PUD_SIZE)
+ return pud_lockptr(mm, (pud_t *) pte);
+ else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
- return &mm->page_table_lock;
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
}
#ifndef hugepages_supported
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a890a1731c14..bd219ac9c026 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2869,6 +2869,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
+ return ptlock_ptr(virt_to_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2893,6 +2900,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
--
2.45.2
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-08-01 8:50 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-31 12:21 David Hildenbrand
2024-07-31 14:54 ` Peter Xu
2024-07-31 16:33 ` David Hildenbrand
2024-08-01 2:03 ` Michael Ellerman
2024-08-01 7:35 ` David Hildenbrand
2024-08-14 17:25 ` Christophe Leroy
2024-08-01 8:50 ` David Hildenbrand [this message]
2024-08-01 13:52 ` Peter Xu
2024-08-01 15:35 ` David Hildenbrand
2024-08-01 16:07 ` Peter Xu
2024-08-01 16:24 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=541f6c23-77ad-4d46-a8ed-fb18c9b635b3@redhat.com \
--to=david@redhat.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=jthoughton@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
--cc=peterx@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox