From: James Houghton <jthoughton@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Mina Almasry <almasrymina@google.com>,
"Zach O'Keefe" <zokeefe@google.com>,
Manish Mishra <manish.mishra@nutanix.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Yang Shi <shy828301@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
James Houghton <jthoughton@google.com>
Subject: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
Date: Thu, 5 Jan 2023 10:18:19 +0000
Message-ID: <20230105101844.1893104-22-jthoughton@google.com>
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
The main change in this commit is to walk_hugetlb_range to support
walking HGM mappings, but all walk_hugetlb_range callers must be updated
to use the new API and take the correct action.
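As a rough illustration of the new walk (a userspace sketch, not kernel
code), the loop below advances by the size of whatever entry it finds
instead of by a fixed hugepage size. The names sketch_pte, sketch_lookup
and sketch_walk are hypothetical stand-ins for struct hugetlb_pte,
hugetlb_full_walk() and walk_hugetlb_range(), and the mapping layout is
made up for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for struct hugetlb_pte: only the shift matters here. */
struct sketch_pte {
	unsigned int shift;	/* log2 of the size this entry maps */
};

static unsigned long sketch_pte_size(const struct sketch_pte *p)
{
	return 1UL << p->shift;
}

/*
 * Hypothetical lookup: pretend the first 2M of the address space is mapped
 * with 4K entries and everything after that with 2M entries.
 */
static void sketch_lookup(struct sketch_pte *p, unsigned long addr)
{
	p->shift = (addr < (1UL << 21)) ? 12 : 21;
}

/* Walk [addr, end) and return how many leaf entries were visited. */
static unsigned long sketch_walk(unsigned long addr, unsigned long end)
{
	unsigned long next, visited = 0;
	struct sketch_pte p;

	assert(addr < end);
	do {
		sketch_lookup(&p, addr);
		visited++;
		/* Step by the size of this entry, clamped to the range end. */
		next = addr + sketch_pte_size(&p);
		if (next > end)
			next = end;
	} while (addr = next, addr != end);

	return visited;
}
```

With this made-up layout, a walk over the first 4M visits the 4K entries
of the first 2M one by one and then a single 2M entry.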
Listing all the changes to the callers:
For s390, we simply ignore HGM PTEs (HGM is not supported on s390 yet).
For smaps, shared_hugetlb (and private_hugetlb, although private
mappings don't support HGM) may no longer be a multiple of the hugepage
size. Each PTE we account for now contributes hugetlb_pte_size(hpte)
instead of the full hugepage size.
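To make the smaps consequence concrete, here is a small userspace sketch
(hypothetical names, not the kernel structures) that accounts each entry
by its own size, the way the hugetlb_pte_size(hpte) change does. The
struct is a made-up subset of mem_size_stats.

```c
/* Hypothetical subset of mem_size_stats. */
struct sketch_stats {
	unsigned long shared_hugetlb;
	unsigned long private_hugetlb;
};

/*
 * Account one leaf entry of size (1 << shift). An entry whose page has a
 * mapcount of 2 or more is counted as shared, otherwise as private.
 */
static void sketch_account(struct sketch_stats *s, unsigned int shift,
			   int mapcount)
{
	unsigned long size = 1UL << shift;

	if (mapcount >= 2)
		s->shared_hugetlb += size;
	else
		s->private_hugetlb += size;
}
```

Three 4K entries of a shared 2M hugepage add up to 12K of
shared_hugetlb, which is not a multiple of 2M.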
For pagemap, we ignore non-leaf PTEs by treating them as if they were
none PTEs. We can only end up with a non-leaf PTE if it has just been
updated from a none PTE.
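The rule shared by pagemap, DAMON and mincore can be sketched as follows.
This is a simplified userspace model: struct sketch_entry and
sketch_counts_as_mapped are hypothetical and only stand in for the
combination of pte_present() and hugetlb_pte_present_leaf().

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a HugeTLB page table entry. */
struct sketch_entry {
	bool present;	/* entry is populated */
	bool leaf;	/* entry maps memory directly, not a lower-level table */
};

/*
 * Report whether the entry should be treated as mapping memory.
 * A present non-leaf entry can only show up if a none entry was split
 * while we were walking, so it is reported like a none entry.
 */
static bool sketch_counts_as_mapped(const struct sketch_entry *e)
{
	if (!e->present)
		return false;
	if (!e->leaf)
		return false;	/* raced with a split: treat as none */
	return true;
}
```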
For show_numa_map, the challenge is that, if any part of a hugepage is
mapped, we have to count that entire page exactly once, as the results
are given in units of hugepages. To support HGM mappings, we keep track
of the last page that we looked at. If the hugepage we are currently
looking at is the same as the last one, then we must be looking at a
hugepage that has been mapped at high granularity, and we've already
accounted for it.
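The counting rule can be sketched in userspace like this. The struct is
loosely modelled on show_numa_map_private, but sketch_numa_private and
sketch_gather are hypothetical names, and an opaque pointer stands in
for the compound head page.

```c
#include <stddef.h>

/* Hypothetical private state, loosely modelled on show_numa_map_private. */
struct sketch_numa_private {
	unsigned long pages;		/* hugepages counted so far */
	const void *last_page;		/* head page of the last hugepage seen */
};

/*
 * Called once per leaf entry. head_page identifies the hugepage the entry
 * belongs to; several consecutive entries share it when the hugepage is
 * mapped at high granularity.
 */
static void sketch_gather(struct sketch_numa_private *priv,
			  const void *head_page)
{
	if (priv->last_page == head_page)
		return;		/* already accounted for this hugepage */

	priv->last_page = head_page;
	priv->pages++;
}
```

Note that this only works because the walk visits all pieces of one
hugepage consecutively; it would double count if entries from different
hugepages were interleaved.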
For DAMON, we treat non-leaf PTEs as if they were blank, for the same
reason as pagemap.
For hwpoison, we proactively update the logic to support the case when
hpte is pointing to a subpage within the poisoned hugepage.
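The comparison added to check_hwpoisoned_entry() in this patch can be
read as a window test. Below is a userspace sketch of that test only;
sketch_pfn_in_window and SKETCH_PAGE_SHIFT are hypothetical names, and
4K base pages are an assumption of the example.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4K base pages for the example */

/*
 * Return true if pfn falls inside the window of base pages that starts at
 * poisoned_pfn and spans one entry of size (1 << shift). A pfn of 0 is
 * treated as "no pfn", as in the original check.
 */
static bool sketch_pfn_in_window(unsigned long pfn, unsigned long poisoned_pfn,
				 unsigned int shift)
{
	unsigned long nr_base_pages = 1UL << (shift - SKETCH_PAGE_SHIFT);

	if (!pfn)
		return false;
	return pfn >= poisoned_pfn && pfn < poisoned_pfn + nr_base_pages;
}
```

For a base-page-sized entry the window is a single pfn, so the test
degenerates to the old exact-match comparison.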
For queue_pages_hugetlb/migration, we do not support HGM-enabled VMAs
for now; the walk returns -EINVAL for them.
For mincore, we ignore non-leaf PTEs for the same reason as pagemap.
For mprotect/prot_none_hugetlb_entry, we retry the walk when we get a
non-leaf PTE.
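The retry pattern used for mprotect (and for hmm_range_fault) amounts to
repeating the whole walk while it reports -EAGAIN. A userspace sketch,
with the hypothetical names sketch_walk_fn, sketch_walk_retry and
sketch_flaky_walk:

```c
#include <errno.h>

/* Hypothetical walk callback type: returns 0, -EAGAIN, or another -errno. */
typedef int (*sketch_walk_fn)(void *ctx);

/* Repeat the whole walk for as long as it reports -EAGAIN. */
static int sketch_walk_retry(sketch_walk_fn walk, void *ctx)
{
	int error;

	do {
		error = walk(ctx);
	} while (error == -EAGAIN);

	return error;
}

/* Example walk that sees a transient non-leaf entry on its first two runs. */
static int sketch_flaky_walk(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) <= 2 ? -EAGAIN : 0;
}
```

The loop is unbounded, so it relies on the race being transient: once
the split that produced the non-leaf PTE has finished, the next walk
descends to the new leaf entries and no longer returns -EAGAIN.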
Signed-off-by: James Houghton <jthoughton@google.com>
---
arch/s390/mm/gmap.c | 20 ++++++++--
fs/proc/task_mmu.c | 83 +++++++++++++++++++++++++++++-----------
include/linux/pagewalk.h | 10 +++--
mm/damon/vaddr.c | 42 +++++++++++++-------
mm/hmm.c | 20 +++++++---
mm/memory-failure.c | 17 ++++----
mm/mempolicy.c | 12 ++++--
mm/mincore.c | 17 ++++++--
mm/mprotect.c | 18 ++++++---
mm/pagewalk.c | 20 +++++-----
10 files changed, 180 insertions(+), 79 deletions(-)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 74e1d873dce0..284466bf4f25 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2626,13 +2626,25 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr,
return 0;
}
-static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
- unsigned long hmask, unsigned long next,
+static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
+ unsigned long addr,
struct mm_walk *walk)
{
- pmd_t *pmd = (pmd_t *)pte;
+ struct hstate *h = hstate_vma(walk->vma);
+ pmd_t *pmd;
unsigned long start, end;
- struct page *page = pmd_page(*pmd);
+ struct page *page;
+
+ if (huge_page_size(h) != hugetlb_pte_size(hpte))
+ /* Ignore high-granularity PTEs. */
+ return 0;
+
+ if (!pte_present(huge_ptep_get(hpte->ptep)))
+ /* Ignore non-present PTEs. */
+ return 0;
+
+ pmd = (pmd_t *)hpte->ptep;
+ page = pmd_page(*pmd);
/*
* The write check makes sure we do not set a key on shared
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 41b5509bde0e..c353cab11eee 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -731,18 +731,28 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
}
#ifdef CONFIG_HUGETLB_PAGE
-static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
+ unsigned long addr,
+ struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
struct vm_area_struct *vma = walk->vma;
struct page *page = NULL;
+ pte_t pte = huge_ptep_get(hpte->ptep);
- if (pte_present(*pte)) {
- page = vm_normal_page(vma, addr, *pte);
- } else if (is_swap_pte(*pte)) {
- swp_entry_t swpent = pte_to_swp_entry(*pte);
+ if (pte_present(pte)) {
+ /* We only care about leaf-level PTEs. */
+ if (!hugetlb_pte_present_leaf(hpte, pte))
+ /*
+ * The only case where hpte is not a leaf is that
+ * it was originally none, but it was split from
+ * under us. It was originally none, so exclude it.
+ */
+ return 0;
+
+ page = vm_normal_page(vma, addr, pte);
+ } else if (is_swap_pte(pte)) {
+ swp_entry_t swpent = pte_to_swp_entry(pte);
if (is_pfn_swap_entry(swpent))
page = pfn_swap_entry_to_page(swpent);
@@ -751,9 +761,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
int mapcount = page_mapcount(page);
if (mapcount >= 2)
- mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
+ mss->shared_hugetlb += hugetlb_pte_size(hpte);
else
- mss->private_hugetlb += huge_page_size(hstate_vma(vma));
+ mss->private_hugetlb += hugetlb_pte_size(hpte);
}
return 0;
}
@@ -1572,22 +1582,31 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
#ifdef CONFIG_HUGETLB_PAGE
/* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
- unsigned long addr, unsigned long end,
+static int pagemap_hugetlb_range(struct hugetlb_pte *hpte,
+ unsigned long addr,
struct mm_walk *walk)
{
struct pagemapread *pm = walk->private;
struct vm_area_struct *vma = walk->vma;
u64 flags = 0, frame = 0;
int err = 0;
- pte_t pte;
+ unsigned long hmask = hugetlb_pte_mask(hpte);
+ unsigned long end = addr + hugetlb_pte_size(hpte);
+ pte_t pte = huge_ptep_get(hpte->ptep);
+ struct page *page;
if (vma->vm_flags & VM_SOFTDIRTY)
flags |= PM_SOFT_DIRTY;
- pte = huge_ptep_get(ptep);
if (pte_present(pte)) {
- struct page *page = pte_page(pte);
+ /*
+ * We raced with this PTE being split, which can only happen if
+ * it was blank before. Treat it as if it were blank.
+ */
+ if (!hugetlb_pte_present_leaf(hpte, pte))
+ return 0;
+
+ page = pte_page(pte);
if (!PageAnon(page))
flags |= PM_FILE;
@@ -1868,10 +1887,16 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
}
#endif
+struct show_numa_map_private {
+ struct numa_maps *md;
+ struct page *last_page;
+};
+
static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
{
- struct numa_maps *md = walk->private;
+ struct show_numa_map_private *priv = walk->private;
+ struct numa_maps *md = priv->md;
struct vm_area_struct *vma = walk->vma;
spinlock_t *ptl;
pte_t *orig_pte;
@@ -1883,6 +1908,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
struct page *page;
page = can_gather_numa_stats_pmd(*pmd, vma, addr);
+ priv->last_page = page;
if (page)
gather_stats(page, md, pmd_dirty(*pmd),
HPAGE_PMD_SIZE/PAGE_SIZE);
@@ -1896,6 +1922,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
do {
struct page *page = can_gather_numa_stats(*pte, vma, addr);
+ priv->last_page = page;
if (!page)
continue;
gather_stats(page, md, pte_dirty(*pte), 1);
@@ -1906,19 +1933,25 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
return 0;
}
#ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end, struct mm_walk *walk)
+static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr,
+ struct mm_walk *walk)
{
- pte_t huge_pte = huge_ptep_get(pte);
+ struct show_numa_map_private *priv = walk->private;
+ pte_t huge_pte = huge_ptep_get(hpte->ptep);
struct numa_maps *md;
struct page *page;
- if (!pte_present(huge_pte))
+ if (!hugetlb_pte_present_leaf(hpte, huge_pte))
+ return 0;
+
+ page = compound_head(pte_page(huge_pte));
+ if (priv->last_page == page)
+ /* we've already accounted for this page */
return 0;
- page = pte_page(huge_pte);
+ priv->last_page = page;
- md = walk->private;
+ md = priv->md;
gather_stats(page, md, pte_dirty(huge_pte), 1);
return 0;
}
@@ -1948,9 +1981,15 @@ static int show_numa_map(struct seq_file *m, void *v)
struct file *file = vma->vm_file;
struct mm_struct *mm = vma->vm_mm;
struct mempolicy *pol;
+
char buffer[64];
int nid;
+ struct show_numa_map_private numa_map_private;
+
+ numa_map_private.md = md;
+ numa_map_private.last_page = NULL;
+
if (!mm)
return 0;
@@ -1980,7 +2019,7 @@ static int show_numa_map(struct seq_file *m, void *v)
seq_puts(m, " huge");
/* mmap_lock is held by m_start */
- walk_page_vma(vma, &show_numa_ops, md);
+ walk_page_vma(vma, &show_numa_ops, &numa_map_private);
if (!md->pages)
goto out;
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 27a6df448ee5..f4bddad615c2 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -3,6 +3,7 @@
#define _LINUX_PAGEWALK_H
#include <linux/mm.h>
+#include <linux/hugetlb.h>
struct mm_walk;
@@ -31,6 +32,10 @@ struct mm_walk;
* ptl after dropping the vma lock, or else revalidate
* those items after re-acquiring the vma lock and before
* accessing them.
+ * In the presence of high-granularity hugetlb entries,
+ * @hugetlb_entry is called only for leaf-level entries
+ * (hstate-level entries are ignored if they are not
+ * leaves).
* @test_walk: caller specific callback function to determine whether
* we walk over the current vma or not. Returning 0 means
* "do page table walk over the current vma", returning
@@ -58,9 +63,8 @@ struct mm_walk_ops {
unsigned long next, struct mm_walk *walk);
int (*pte_hole)(unsigned long addr, unsigned long next,
int depth, struct mm_walk *walk);
- int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long next,
- struct mm_walk *walk);
+ int (*hugetlb_entry)(struct hugetlb_pte *hpte,
+ unsigned long addr, struct mm_walk *walk);
int (*test_walk)(unsigned long addr, unsigned long next,
struct mm_walk *walk);
int (*pre_vma)(unsigned long start, unsigned long end,
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 9d92c5eb3a1f..2383f647f202 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -330,11 +330,12 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr,
}
#ifdef CONFIG_HUGETLB_PAGE
-static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
+static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, pte_t entry,
+ struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long addr)
{
bool referenced = false;
- pte_t entry = huge_ptep_get(pte);
+ pte_t entry = huge_ptep_get(hpte->ptep);
struct folio *folio = pfn_folio(pte_pfn(entry));
folio_get(folio);
@@ -342,12 +343,12 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
if (pte_young(entry)) {
referenced = true;
entry = pte_mkold(entry);
- set_huge_pte_at(mm, addr, pte, entry);
+ set_huge_pte_at(mm, addr, hpte->ptep, entry);
}
#ifdef CONFIG_MMU_NOTIFIER
if (mmu_notifier_clear_young(mm, addr,
- addr + huge_page_size(hstate_vma(vma))))
+ addr + hugetlb_pte_size(hpte)))
referenced = true;
#endif /* CONFIG_MMU_NOTIFIER */
@@ -358,20 +359,26 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
folio_put(folio);
}
-static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end,
+static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte,
+ unsigned long addr,
struct mm_walk *walk)
{
- struct hstate *h = hstate_vma(walk->vma);
spinlock_t *ptl;
pte_t entry;
- ptl = huge_pte_lock(h, walk->mm, pte);
- entry = huge_ptep_get(pte);
+ ptl = hugetlb_pte_lock(hpte);
+ entry = huge_ptep_get(hpte->ptep);
if (!pte_present(entry))
goto out;
- damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr);
+ if (!hugetlb_pte_present_leaf(hpte, entry))
+ /*
+ * We raced with someone splitting a blank PTE. Treat this PTE
+ * as if it were blank.
+ */
+ goto out;
+
+ damon_hugetlb_mkold(hpte, entry, walk->mm, walk->vma, addr);
out:
spin_unlock(ptl);
@@ -484,8 +491,8 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
}
#ifdef CONFIG_HUGETLB_PAGE
-static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end,
+static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte,
+ unsigned long addr,
struct mm_walk *walk)
{
struct damon_young_walk_private *priv = walk->private;
@@ -494,11 +501,18 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
spinlock_t *ptl;
pte_t entry;
- ptl = huge_pte_lock(h, walk->mm, pte);
- entry = huge_ptep_get(pte);
+ ptl = hugetlb_pte_lock(hpte);
+ entry = huge_ptep_get(hpte->ptep);
if (!pte_present(entry))
goto out;
+ if (!hugetlb_pte_present_leaf(hpte, entry))
+ /*
+ * We raced with someone splitting a blank PTE. Treat this PTE
+ * as if it were blank.
+ */
+ goto out;
+
folio = pfn_folio(pte_pfn(entry));
folio_get(folio);
diff --git a/mm/hmm.c b/mm/hmm.c
index 6a151c09de5e..d3e40cfdd4cb 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -468,8 +468,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
#endif
#ifdef CONFIG_HUGETLB_PAGE
-static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
- unsigned long start, unsigned long end,
+static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte,
+ unsigned long start,
struct mm_walk *walk)
{
unsigned long addr = start, i, pfn;
@@ -479,16 +479,24 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
unsigned int required_fault;
unsigned long pfn_req_flags;
unsigned long cpu_flags;
+ unsigned long hmask = hugetlb_pte_mask(hpte);
+ unsigned int order = hpte->shift - PAGE_SHIFT;
+ unsigned long end = start + hugetlb_pte_size(hpte);
spinlock_t *ptl;
pte_t entry;
- ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
- entry = huge_ptep_get(pte);
+ ptl = hugetlb_pte_lock(hpte);
+ entry = huge_ptep_get(hpte->ptep);
+
+ if (!hugetlb_pte_present_leaf(hpte, entry)) {
+ spin_unlock(ptl);
+ return -EAGAIN;
+ }
i = (start - range->start) >> PAGE_SHIFT;
pfn_req_flags = range->hmm_pfns[i];
cpu_flags = pte_to_hmm_pfn_flags(range, entry) |
- hmm_pfn_flags_order(huge_page_order(hstate_vma(vma)));
+ hmm_pfn_flags_order(order);
required_fault =
hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
if (required_fault) {
@@ -605,7 +613,7 @@ int hmm_range_fault(struct hmm_range *range)
* in pfns. All entries < last in the pfn array are set to their
* output, and all >= are still at their input values.
*/
- } while (ret == -EBUSY);
+ } while (ret == -EBUSY || ret == -EAGAIN);
return ret;
}
EXPORT_SYMBOL(hmm_range_fault);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c77a9e37e27e..e7e56298d305 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -641,6 +641,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
unsigned long poisoned_pfn, struct to_kill *tk)
{
unsigned long pfn = 0;
+ unsigned long base_pages_poisoned = (1UL << shift) / PAGE_SIZE;
if (pte_present(pte)) {
pfn = pte_pfn(pte);
@@ -651,7 +652,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
pfn = swp_offset_pfn(swp);
}
- if (!pfn || pfn != poisoned_pfn)
+ if (!pfn || pfn < poisoned_pfn ||
+ pfn >= poisoned_pfn + base_pages_poisoned)
return 0;
set_to_kill(tk, addr, shift);
@@ -717,16 +719,15 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
}
#ifdef CONFIG_HUGETLB_PAGE
-static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+static int hwpoison_hugetlb_range(struct hugetlb_pte *hpte,
+ unsigned long addr,
+ struct mm_walk *walk)
{
struct hwp_walk *hwp = walk->private;
- pte_t pte = huge_ptep_get(ptep);
- struct hstate *h = hstate_vma(walk->vma);
+ pte_t pte = huge_ptep_get(hpte->ptep);
- return check_hwpoisoned_entry(pte, addr, huge_page_shift(h),
- hwp->pfn, &hwp->tk);
+ return check_hwpoisoned_entry(pte, addr & hugetlb_pte_mask(hpte),
+ hpte->shift, hwp->pfn, &hwp->tk);
}
#else
#define hwpoison_hugetlb_range NULL
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d3558248a0f0..e5859ed34e90 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -558,8 +558,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
return addr != end ? -EIO : 0;
}
-static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long end,
+static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
+ unsigned long addr,
struct mm_walk *walk)
{
int ret = 0;
@@ -570,8 +570,12 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
spinlock_t *ptl;
pte_t entry;
- ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
- entry = huge_ptep_get(pte);
+ /* We don't migrate high-granularity HugeTLB mappings for now. */
+ if (hugetlb_hgm_enabled(walk->vma))
+ return -EINVAL;
+
+ ptl = hugetlb_pte_lock(hpte);
+ entry = huge_ptep_get(hpte->ptep);
if (!pte_present(entry))
goto unlock;
page = pte_page(entry);
diff --git a/mm/mincore.c b/mm/mincore.c
index a085a2aeabd8..0894965b3944 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -22,18 +22,29 @@
#include <linux/uaccess.h>
#include "swap.h"
-static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr,
+ struct mm_walk *walk)
{
#ifdef CONFIG_HUGETLB_PAGE
unsigned char present;
+ unsigned long end = addr + hugetlb_pte_size(hpte);
unsigned char *vec = walk->private;
+ pte_t pte = huge_ptep_get(hpte->ptep);
/*
* Hugepages under user process are always in RAM and never
* swapped out, but theoretically it needs to be checked.
*/
- present = pte && !huge_pte_none(huge_ptep_get(pte));
+ present = !huge_pte_none(pte);
+
+ /*
+ * If the pte is present but not a leaf, we raced with someone
+ * splitting it. For someone to have split it, it must have been
+ * huge_pte_none before, so treat it as such.
+ */
+ if (pte_present(pte) && !hugetlb_pte_present_leaf(hpte, pte))
+ present = false;
+
for (; addr != end; vec++, addr += PAGE_SIZE)
*vec = present;
walk->private = vec;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 71358e45a742..62d8c5f7bc92 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -543,12 +543,16 @@ static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
0 : -EACCES;
}
-static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
- unsigned long addr, unsigned long next,
+static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte,
+ unsigned long addr,
struct mm_walk *walk)
{
- return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
- 0 : -EACCES;
+ pte_t pte = huge_ptep_get(hpte->ptep);
+
+ if (!hugetlb_pte_present_leaf(hpte, pte))
+ return -EAGAIN;
+ return pfn_modify_allowed(pte_pfn(pte),
+ *(pgprot_t *)(walk->private)) ? 0 : -EACCES;
}
static int prot_none_test(unsigned long addr, unsigned long next,
@@ -591,8 +595,10 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
(newflags & VM_ACCESS_FLAGS) == 0) {
pgprot_t new_pgprot = vm_get_page_prot(newflags);
- error = walk_page_range(current->mm, start, end,
- &prot_none_walk_ops, &new_pgprot);
+ do {
+ error = walk_page_range(current->mm, start, end,
+ &prot_none_walk_ops, &new_pgprot);
+ } while (error == -EAGAIN);
if (error)
return error;
}
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index cb23f8a15c13..05ce242f8b7e 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -3,6 +3,7 @@
#include <linux/highmem.h>
#include <linux/sched.h>
#include <linux/hugetlb.h>
+#include <linux/minmax.h>
/*
* We want to know the real level where a entry is located ignoring any
@@ -296,20 +297,21 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
struct vm_area_struct *vma = walk->vma;
struct hstate *h = hstate_vma(vma);
unsigned long next;
- unsigned long hmask = huge_page_mask(h);
- unsigned long sz = huge_page_size(h);
- pte_t *pte;
const struct mm_walk_ops *ops = walk->ops;
int err = 0;
+ struct hugetlb_pte hpte;
hugetlb_vma_lock_read(vma);
do {
- next = hugetlb_entry_end(h, addr, end);
- pte = hugetlb_walk(vma, addr & hmask, sz);
- if (pte)
- err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
- else if (ops->pte_hole)
- err = ops->pte_hole(addr, next, -1, walk);
+ if (hugetlb_full_walk(&hpte, vma, addr)) {
+ next = hugetlb_entry_end(h, addr, end);
+ if (ops->pte_hole)
+ err = ops->pte_hole(addr, next, -1, walk);
+ } else {
+ err = ops->hugetlb_entry(
+ &hpte, addr, walk);
+ next = min(addr + hugetlb_pte_size(&hpte), end);
+ }
if (err)
break;
} while (addr = next, addr != end);
--
2.39.0.314.g84b9a713c41-goog