* [PATCH 0/2] mm/hmm: more HMM clean up
@ 2019-07-23 23:30 Ralph Campbell
2019-07-23 23:30 ` [PATCH 1/2] mm/hmm: a few more C style and comment clean ups Ralph Campbell
2019-07-23 23:30 ` [PATCH 2/2] mm/hmm: make full use of walk_page_range() Ralph Campbell
0 siblings, 2 replies; 8+ messages in thread
From: Ralph Campbell @ 2019-07-23 23:30 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, Ralph Campbell
Here are two more patches for things I found to clean up.
I assume this will go into Jason's tree since there will likely be
more HMM changes in this cycle.
Ralph Campbell (2):
mm/hmm: a few more C style and comment clean ups
mm/hmm: make full use of walk_page_range()
mm/hmm.c | 231 +++++++++++++++++++++----------------------------------
1 file changed, 86 insertions(+), 145 deletions(-)
--
2.20.1
* [PATCH 1/2] mm/hmm: a few more C style and comment clean ups
2019-07-23 23:30 [PATCH 0/2] mm/hmm: more HMM clean up Ralph Campbell
@ 2019-07-23 23:30 ` Ralph Campbell
2019-07-23 23:57 ` Jason Gunthorpe
2019-07-23 23:30 ` [PATCH 2/2] mm/hmm: make full use of walk_page_range() Ralph Campbell
1 sibling, 1 reply; 8+ messages in thread
From: Ralph Campbell @ 2019-07-23 23:30 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Ralph Campbell, Jérôme Glisse,
Jason Gunthorpe, Christoph Hellwig
A few more comments and minor programming style clean ups.
There should be no functional changes.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
---
mm/hmm.c | 34 ++++++++++++++++------------------
1 file changed, 16 insertions(+), 18 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index b810a4fa3de9..8271f110c243 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -32,7 +32,7 @@ static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
* hmm_get_or_create - register HMM against an mm (HMM internal)
*
* @mm: mm struct to attach to
- * Returns: returns an HMM object, either by referencing the existing
+ * Return: an HMM object, either by referencing the existing
* (per-process) object, or by creating a new one.
*
* This is not intended to be used directly by device drivers. If mm already
@@ -323,8 +323,8 @@ static int hmm_pfns_bad(unsigned long addr,
}
/*
- * hmm_vma_walk_hole() - handle a range lacking valid pmd or pte(s)
- * @start: range virtual start address (inclusive)
+ * hmm_vma_walk_hole_() - handle a range lacking valid pmd or pte(s)
+ * @addr: range virtual start address (inclusive)
* @end: range virtual end address (exclusive)
* @fault: should we fault or not ?
* @write_fault: write fault ?
@@ -374,9 +374,9 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
/*
* So we not only consider the individual per page request we also
* consider the default flags requested for the range. The API can
- * be use in 2 fashions. The first one where the HMM user coalesce
- * multiple page fault into one request and set flags per pfns for
- * of those faults. The second one where the HMM user want to pre-
+ * be used 2 ways. The first one where the HMM user coalesces
+ * multiple page faults into one request and sets flags per pfn for
+ * those faults. The second one where the HMM user wants to pre-
* fault a range with specific flags. For the latter one it is a
* waste to have the user pre-fill the pfn arrays with a default
* flags value.
@@ -386,7 +386,7 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
/* We aren't ask to do anything ... */
if (!(pfns & range->flags[HMM_PFN_VALID]))
return;
- /* If this is device memory than only fault if explicitly requested */
+ /* If this is device memory then only fault if explicitly requested */
if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
/* Do we fault on device memory ? */
if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
@@ -500,7 +500,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
hmm_vma_walk->last = end;
return 0;
#else
- /* If THP is not enabled then we should never reach that code ! */
+ /* If THP is not enabled then we should never reach this code ! */
return -EINVAL;
#endif
}
@@ -624,13 +624,12 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
pte_t *ptep;
pmd_t pmd;
-
again:
pmd = READ_ONCE(*pmdp);
if (pmd_none(pmd))
return hmm_vma_walk_hole(start, end, walk);
- if (pmd_huge(pmd) && (range->vma->vm_flags & VM_HUGETLB))
+ if (pmd_huge(pmd) && is_vm_hugetlb_page(vma))
return hmm_pfns_bad(start, end, walk);
if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
@@ -655,11 +654,11 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
/*
- * No need to take pmd_lock here, even if some other threads
+ * No need to take pmd_lock here, even if some other thread
* is splitting the huge pmd we will get that event through
* mmu_notifier callback.
*
- * So just read pmd value and check again its a transparent
+ * So just read pmd value and check again it's a transparent
* huge or device mapping one and compute corresponding pfn
* values.
*/
@@ -673,7 +672,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
}
/*
- * We have handled all the valid case above ie either none, migration,
+ * We have handled all the valid cases above ie either none, migration,
* huge or transparent huge. At this point either it is a valid pmd
* entry pointing to pte directory or it is a bad pmd that will not
* recover.
@@ -793,10 +792,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
pte_t entry;
int ret = 0;
- size = 1UL << huge_page_shift(h);
+ size = huge_page_size(h);
mask = size - 1;
if (range->page_shift != PAGE_SHIFT) {
- /* Make sure we are looking at full page. */
+ /* Make sure we are looking at a full page. */
if (start & mask)
return -EINVAL;
if (end < (start + size))
@@ -807,8 +806,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
size = PAGE_SIZE;
}
-
- ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
+ ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
entry = huge_ptep_get(pte);
i = (start - range->start) >> range->page_shift;
@@ -857,7 +855,7 @@ static void hmm_pfns_clear(struct hmm_range *range,
* @start: start virtual address (inclusive)
* @end: end virtual address (exclusive)
* @page_shift: expect page shift for the range
- * Returns 0 on success, -EFAULT if the address space is no longer valid
+ * Return: 0 on success, -EFAULT if the address space is no longer valid
*
* Track updates to the CPU page table see include/linux/hmm.h
*/
--
2.20.1
* [PATCH 2/2] mm/hmm: make full use of walk_page_range()
2019-07-23 23:30 [PATCH 0/2] mm/hmm: more HMM clean up Ralph Campbell
2019-07-23 23:30 ` [PATCH 1/2] mm/hmm: a few more C style and comment clean ups Ralph Campbell
@ 2019-07-23 23:30 ` Ralph Campbell
2019-07-24 6:51 ` Christoph Hellwig
1 sibling, 1 reply; 8+ messages in thread
From: Ralph Campbell @ 2019-07-23 23:30 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Ralph Campbell, Jérôme Glisse,
Jason Gunthorpe, Christoph Hellwig
hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
walk_page_range() in a loop. This is unnecessary duplication since
walk_page_range() calls find_vma() in a loop already.
Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
walk_test() callback function to filter unhandled vmas.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
---
mm/hmm.c | 197 ++++++++++++++++++++-----------------------------------
1 file changed, 70 insertions(+), 127 deletions(-)
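For orientation, the converted hmm_range_snapshot() ends up looking
roughly like this (a condensed sketch of the hunks below, assembled for
readability; see the actual diff for the exact change):

	long hmm_range_snapshot(struct hmm_range *range)
	{
		struct hmm_vma_walk hmm_vma_walk = {};
		struct mm_walk mm_walk = {};
		struct hmm *hmm = range->hmm;
		int ret;

		lockdep_assert_held(&hmm->mm->mmap_sem);

		hmm_vma_walk.range = range;
		hmm_vma_walk.last = range->start;
		mm_walk.private = &hmm_vma_walk;

		mm_walk.mm = hmm->mm;
		mm_walk.pud_entry = hmm_vma_walk_pud;
		mm_walk.pmd_entry = hmm_vma_walk_pmd;
		mm_walk.pte_hole = hmm_vma_walk_hole;
		mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
		/* hmm_vma_walk_test() filters out vmas HMM cannot handle */
		mm_walk.test_walk = hmm_vma_walk_test;

		ret = walk_page_range(range->start, range->end, &mm_walk);
		if (ret)
			return ret;
		return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
	}

hmm_range_fault() is converted the same way, except that it also sets
hmm_vma_walk.fault and hmm_vma_walk.block and keeps the "retry while
the range is valid" loop around walk_page_range().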
diff --git a/mm/hmm.c b/mm/hmm.c
index 8271f110c243..218557451070 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -839,13 +839,40 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
#endif
}
-static void hmm_pfns_clear(struct hmm_range *range,
- uint64_t *pfns,
- unsigned long addr,
- unsigned long end)
+static int hmm_vma_walk_test(unsigned long start,
+ unsigned long end,
+ struct mm_walk *walk)
{
- for (; addr < end; addr += PAGE_SIZE, pfns++)
- *pfns = range->values[HMM_PFN_NONE];
+ const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
+ struct hmm_vma_walk *hmm_vma_walk = walk->private;
+ struct hmm_range *range = hmm_vma_walk->range;
+ struct vm_area_struct *vma = walk->vma;
+
+ /* If range is no longer valid, force retry. */
+ if (!range->valid)
+ return -EBUSY;
+
+ if (vma->vm_flags & device_vma)
+ return -EFAULT;
+
+ if (is_vm_hugetlb_page(vma)) {
+ if (huge_page_shift(hstate_vma(vma)) != range->page_shift &&
+ range->page_shift != PAGE_SHIFT)
+ return -EINVAL;
+ } else {
+ if (range->page_shift != PAGE_SHIFT)
+ return -EINVAL;
+ }
+
+ /*
+ * If vma does not allow read access, then assume that it does not
+ * allow write access, either. HMM does not support architectures
+ * that allow write without read.
+ */
+ if (!(vma->vm_flags & VM_READ))
+ return -EPERM;
+
+ return 0;
}
/*
@@ -949,65 +976,28 @@ EXPORT_SYMBOL(hmm_range_unregister);
*/
long hmm_range_snapshot(struct hmm_range *range)
{
- const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
- unsigned long start = range->start, end;
- struct hmm_vma_walk hmm_vma_walk;
+ unsigned long start = range->start;
+ struct hmm_vma_walk hmm_vma_walk = {};
struct hmm *hmm = range->hmm;
- struct vm_area_struct *vma;
- struct mm_walk mm_walk;
+ struct mm_walk mm_walk = {};
+ int ret;
lockdep_assert_held(&hmm->mm->mmap_sem);
- do {
- /* If range is no longer valid force retry. */
- if (!range->valid)
- return -EBUSY;
- vma = find_vma(hmm->mm, start);
- if (vma == NULL || (vma->vm_flags & device_vma))
- return -EFAULT;
-
- if (is_vm_hugetlb_page(vma)) {
- if (huge_page_shift(hstate_vma(vma)) !=
- range->page_shift &&
- range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- } else {
- if (range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- }
+ hmm_vma_walk.range = range;
+ hmm_vma_walk.last = start;
+ mm_walk.private = &hmm_vma_walk;
- if (!(vma->vm_flags & VM_READ)) {
- /*
- * If vma do not allow read access, then assume that it
- * does not allow write access, either. HMM does not
- * support architecture that allow write without read.
- */
- hmm_pfns_clear(range, range->pfns,
- range->start, range->end);
- return -EPERM;
- }
+ mm_walk.mm = hmm->mm;
+ mm_walk.pud_entry = hmm_vma_walk_pud;
+ mm_walk.pmd_entry = hmm_vma_walk_pmd;
+ mm_walk.pte_hole = hmm_vma_walk_hole;
+ mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+ mm_walk.test_walk = hmm_vma_walk_test;
- range->vma = vma;
- hmm_vma_walk.pgmap = NULL;
- hmm_vma_walk.last = start;
- hmm_vma_walk.fault = false;
- hmm_vma_walk.range = range;
- mm_walk.private = &hmm_vma_walk;
- end = min(range->end, vma->vm_end);
-
- mm_walk.vma = vma;
- mm_walk.mm = vma->vm_mm;
- mm_walk.pte_entry = NULL;
- mm_walk.test_walk = NULL;
- mm_walk.hugetlb_entry = NULL;
- mm_walk.pud_entry = hmm_vma_walk_pud;
- mm_walk.pmd_entry = hmm_vma_walk_pmd;
- mm_walk.pte_hole = hmm_vma_walk_hole;
- mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
-
- walk_page_range(start, end, &mm_walk);
- start = end;
- } while (start < range->end);
+ ret = walk_page_range(start, range->end, &mm_walk);
+ if (ret)
+ return ret;
return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
}
@@ -1043,83 +1033,36 @@ EXPORT_SYMBOL(hmm_range_snapshot);
*/
long hmm_range_fault(struct hmm_range *range, bool block)
{
- const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
- unsigned long start = range->start, end;
- struct hmm_vma_walk hmm_vma_walk;
+ unsigned long start = range->start;
+ struct hmm_vma_walk hmm_vma_walk = {};
struct hmm *hmm = range->hmm;
- struct vm_area_struct *vma;
- struct mm_walk mm_walk;
+ struct mm_walk mm_walk = {};
int ret;
lockdep_assert_held(&hmm->mm->mmap_sem);
- do {
- /* If range is no longer valid force retry. */
- if (!range->valid)
- return -EBUSY;
+ hmm_vma_walk.range = range;
+ hmm_vma_walk.last = start;
+ hmm_vma_walk.fault = true;
+ hmm_vma_walk.block = block;
+ mm_walk.private = &hmm_vma_walk;
- vma = find_vma(hmm->mm, start);
- if (vma == NULL || (vma->vm_flags & device_vma))
- return -EFAULT;
-
- if (is_vm_hugetlb_page(vma)) {
- if (huge_page_shift(hstate_vma(vma)) !=
- range->page_shift &&
- range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- } else {
- if (range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- }
+ mm_walk.mm = hmm->mm;
+ mm_walk.pud_entry = hmm_vma_walk_pud;
+ mm_walk.pmd_entry = hmm_vma_walk_pmd;
+ mm_walk.pte_hole = hmm_vma_walk_hole;
+ mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+ mm_walk.test_walk = hmm_vma_walk_test;
- if (!(vma->vm_flags & VM_READ)) {
- /*
- * If vma do not allow read access, then assume that it
- * does not allow write access, either. HMM does not
- * support architecture that allow write without read.
- */
- hmm_pfns_clear(range, range->pfns,
- range->start, range->end);
- return -EPERM;
- }
+ do {
+ ret = walk_page_range(start, range->end, &mm_walk);
+ start = hmm_vma_walk.last;
- range->vma = vma;
- hmm_vma_walk.pgmap = NULL;
- hmm_vma_walk.last = start;
- hmm_vma_walk.fault = true;
- hmm_vma_walk.block = block;
- hmm_vma_walk.range = range;
- mm_walk.private = &hmm_vma_walk;
- end = min(range->end, vma->vm_end);
-
- mm_walk.vma = vma;
- mm_walk.mm = vma->vm_mm;
- mm_walk.pte_entry = NULL;
- mm_walk.test_walk = NULL;
- mm_walk.hugetlb_entry = NULL;
- mm_walk.pud_entry = hmm_vma_walk_pud;
- mm_walk.pmd_entry = hmm_vma_walk_pmd;
- mm_walk.pte_hole = hmm_vma_walk_hole;
- mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
-
- do {
- ret = walk_page_range(start, end, &mm_walk);
- start = hmm_vma_walk.last;
-
- /* Keep trying while the range is valid. */
- } while (ret == -EBUSY && range->valid);
-
- if (ret) {
- unsigned long i;
-
- i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
- hmm_pfns_clear(range, &range->pfns[i],
- hmm_vma_walk.last, range->end);
- return ret;
- }
- start = end;
+ /* Keep trying while the range is valid. */
+ } while (ret == -EBUSY && range->valid);
- } while (start < range->end);
+ if (ret)
+ return ret;
return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
}
--
2.20.1
* Re: [PATCH 1/2] mm/hmm: a few more C style and comment clean ups
2019-07-23 23:30 ` [PATCH 1/2] mm/hmm: a few more C style and comment clean ups Ralph Campbell
@ 2019-07-23 23:57 ` Jason Gunthorpe
2019-07-24 5:41 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 23:57 UTC (permalink / raw)
To: Ralph Campbell
Cc: linux-mm, linux-kernel, Jérôme Glisse, Christoph Hellwig
On Tue, Jul 23, 2019 at 04:30:15PM -0700, Ralph Campbell wrote:
> - if (pmd_huge(pmd) && (range->vma->vm_flags & VM_HUGETLB))
> + if (pmd_huge(pmd) && is_vm_hugetlb_page(vma))
> return hmm_pfns_bad(start, end, walk);
This one is not a minor cleanup.. I think it should be done in its
own commit, and more completely, maybe based on the below..
If vma is always the same as the first vma, then your hunk above
here is much better than introducing a hugetlb flag as I did below..
Although I don't understand why we have this test when it does seem to
support huge pages, and the commit log suggests hugetlbfs was
deliberately supported. So a comment (or deletion) sure would be nice.
So maybe sequence this into your series?
Jason
From 6ea7cd2565b5b660d22a659b71b62614e66bc345 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@mellanox.com>
Date: Tue, 23 Jul 2019 12:28:32 -0300
Subject: [PATCH] mm/hmm: remove hmm_range vma
This value is only read inside hmm_vma_walk_pmd() and all the callers,
through walk_page_range(), always set the value. The proper place for
per-walk data is in hmm_vma_walk, and since the only usage is a vm_flags
test just precompute and store that.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 7 +++----
include/linux/hmm.h | 1 -
mm/hmm.c | 11 ++++++-----
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a9c5c58d425b3d..4f4bec40b887a6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -495,12 +495,12 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
range->start, range->end,
PAGE_SHIFT);
if (ret) {
- up_read(&range->vma->vm_mm->mmap_sem);
+ up_read(&range->hmm->mm->mmap_sem);
return (int)ret;
}
if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
- up_read(&range->vma->vm_mm->mmap_sem);
+ up_read(&range->hmm->mm->mmap_sem);
return -EBUSY;
}
@@ -508,7 +508,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
if (ret <= 0) {
if (ret == 0)
ret = -EBUSY;
- up_read(&range->vma->vm_mm->mmap_sem);
+ up_read(&range->hmm->mm->mmap_sem);
hmm_range_unregister(range);
return ret;
}
@@ -681,7 +681,6 @@ nouveau_svm_fault(struct nvif_notify *notify)
args.i.p.addr + args.i.p.size, fn - fi);
/* Have HMM fault pages within the fault window to the GPU. */
- range.vma = vma;
range.start = args.i.p.addr;
range.end = args.i.p.addr + args.i.p.size;
range.pfns = args.phys;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 9f32586684c9c3..d4b89f655817cd 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -164,7 +164,6 @@ enum hmm_pfn_value_e {
*/
struct hmm_range {
struct hmm *hmm;
- struct vm_area_struct *vma;
struct list_head list;
unsigned long start;
unsigned long end;
diff --git a/mm/hmm.c b/mm/hmm.c
index 16b6731a34db79..3d8cdfb67a6ab8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -285,8 +285,9 @@ struct hmm_vma_walk {
struct hmm_range *range;
struct dev_pagemap *pgmap;
unsigned long last;
- bool fault;
- bool block;
+ bool fault : 1;
+ bool block : 1;
+ bool hugetlb : 1;
};
static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
@@ -635,7 +636,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
if (pmd_none(pmd))
return hmm_vma_walk_hole(start, end, walk);
- if (pmd_huge(pmd) && (range->vma->vm_flags & VM_HUGETLB))
+ if (pmd_huge(pmd) && hmm_vma_walk->hugetlb)
return hmm_pfns_bad(start, end, walk);
if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
@@ -994,7 +995,7 @@ long hmm_range_snapshot(struct hmm_range *range)
return -EPERM;
}
- range->vma = vma;
+ hmm_vma_walk.hugetlb = vma->vm_flags & VM_HUGETLB;
hmm_vma_walk.pgmap = NULL;
hmm_vma_walk.last = start;
hmm_vma_walk.fault = false;
@@ -1090,7 +1091,7 @@ long hmm_range_fault(struct hmm_range *range, bool block)
return -EPERM;
}
- range->vma = vma;
+ hmm_vma_walk.hugetlb = vma->vm_flags & VM_HUGETLB;
hmm_vma_walk.pgmap = NULL;
hmm_vma_walk.last = start;
hmm_vma_walk.fault = true;
--
2.22.0
* Re: [PATCH 1/2] mm/hmm: a few more C style and comment clean ups
2019-07-23 23:57 ` Jason Gunthorpe
@ 2019-07-24 5:41 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2019-07-24 5:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Ralph Campbell, linux-mm, linux-kernel, Jérôme Glisse,
Christoph Hellwig
On Tue, Jul 23, 2019 at 11:57:52PM +0000, Jason Gunthorpe wrote:
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 16b6731a34db79..3d8cdfb67a6ab8 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -285,8 +285,9 @@ struct hmm_vma_walk {
> struct hmm_range *range;
> struct dev_pagemap *pgmap;
> unsigned long last;
> - bool fault;
> - bool block;
> + bool fault : 1;
> + bool block : 1;
> + bool hugetlb : 1;
I don't think we should even keep these bools around. I have something
like this hiding in a branch, which properly cleans much of this up:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-dma-cleanup
Notably:
http://git.infradead.org/users/hch/misc.git/commitdiff/2abdc0ac8f9f32149246957121ebccbe5c0a729d
http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
This doesn't go all the way yet - the page_walk infrastructure is
built around the idea of doing its own vma lookups, and we should
eventually kill the lookup inside hmm entirely.
* Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()
2019-07-23 23:30 ` [PATCH 2/2] mm/hmm: make full use of walk_page_range() Ralph Campbell
@ 2019-07-24 6:51 ` Christoph Hellwig
2019-07-24 11:53 ` Jason Gunthorpe
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-07-24 6:51 UTC (permalink / raw)
To: Ralph Campbell
Cc: linux-mm, linux-kernel, Jérôme Glisse, Jason Gunthorpe,
Christoph Hellwig
On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> walk_page_range() in a loop. This is unnecessary duplication since
> walk_page_range() calls find_vma() in a loop already.
> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> walk_test() callback function to filter unhandled vmas.
I like the approach a lot!
But we really need to sort out the duplication between hmm_range_fault
and hmm_range_snapshot first, as they are basically the same code. I
have patches here:
http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
That being said we don't really have any users for the snapshot mode
or non-blocking faults, and I don't see any in the immediate pipeline
either. It might actually be a better idea to just kill that stuff off
for now until we have a user, as code without users is by definition
untested and will just bitrot and break.
> + const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
> + struct hmm_vma_walk *hmm_vma_walk = walk->private;
> + struct hmm_range *range = hmm_vma_walk->range;
> + struct vm_area_struct *vma = walk->vma;
> +
> + /* If range is no longer valid, force retry. */
> + if (!range->valid)
> + return -EBUSY;
> +
> + if (vma->vm_flags & device_vma)
Can we just kill off this odd device_vma variable?
if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
and maybe add a comment on why we are skipping them (because they
don't have struct page backing I guess..)
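Something like this, perhaps (just a sketch of the suggestion, not a
tested change; the remaining page_shift and VM_READ checks stay as
posted):

	static int hmm_vma_walk_test(unsigned long start, unsigned long end,
				     struct mm_walk *walk)
	{
		struct hmm_vma_walk *hmm_vma_walk = walk->private;
		struct hmm_range *range = hmm_vma_walk->range;
		struct vm_area_struct *vma = walk->vma;

		/* If range is no longer valid, force retry. */
		if (!range->valid)
			return -EBUSY;

		/*
		 * Skip vmas whose memory is not backed by struct pages
		 * (presumably the reason for this check), since HMM cannot
		 * mirror such mappings.
		 */
		if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
			return -EFAULT;

		/* ... page_shift and VM_READ checks unchanged ... */
		return 0;
	}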
* Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()
2019-07-24 6:51 ` Christoph Hellwig
@ 2019-07-24 11:53 ` Jason Gunthorpe
2019-07-24 19:50 ` Ralph Campbell
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2019-07-24 11:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Ralph Campbell, linux-mm, linux-kernel, Jérôme Glisse
On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> > hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> > walk_page_range() in a loop. This is unnecessary duplication since
> > walk_page_range() calls find_vma() in a loop already.
> > Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> > walk_test() callback function to filter unhandled vmas.
>
> I like the approach a lot!
>
> But we really need to sort out the duplication between hmm_range_fault
> and hmm_range_snapshot first, as they are basically the same code. I
> have patches here:
>
> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
>
> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
Yeah, that is a straightforward improvement, maybe Ralph should grab
these two as part of his series?
> That being said we don't really have any users for the snapshot mode
> or non-blocking faults, and I don't see any in the immediate pipeline
> either.
If this code were production ready I'd use it in ODP right away.
When we first create an ODP MR we'd want to snapshot to pre-load the
NIC tables with something other than page fault, but not fault
anything.
This would be a big performance improvement for ODP.
Jason
* Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()
2019-07-24 11:53 ` Jason Gunthorpe
@ 2019-07-24 19:50 ` Ralph Campbell
0 siblings, 0 replies; 8+ messages in thread
From: Ralph Campbell @ 2019-07-24 19:50 UTC (permalink / raw)
To: Jason Gunthorpe, Christoph Hellwig
Cc: linux-mm, linux-kernel, Jérôme Glisse
On 7/24/19 4:53 AM, Jason Gunthorpe wrote:
> On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
>> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
>>> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
>>> walk_page_range() in a loop. This is unnecessary duplication since
>>> walk_page_range() calls find_vma() in a loop already.
>>> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
>>> walk_test() callback function to filter unhandled vmas.
>>
>> I like the approach a lot!
>>
>> But we really need to sort out the duplication between hmm_range_fault
>> and hmm_range_snapshot first, as they are basically the same code. I
>> have patches here:
>>
>> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
>>
>> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
>
> Yeah, that is a straightforward improvement, maybe Ralph should grab
> these two as part of his series?
Sure, no problem.
I'll add them in v2 when I fix the other issues in the series.
>> That being said we don't really have any users for the snapshot mode
>> or non-blocking faults, and I don't see any in the immediate pipeline
>> either.
>
> If this code was production ready I'd use it in ODP right away.
>
> When we first create a ODP MR we'd want to snapshot to pre-load the
> NIC tables with something other than page fault, but not fault
> anything.
>
> This would be a big performance improvement for ODP.
>
> Jason
>