* [PATCH v2 0/5] mm: cleanups for device-exclusive entries (hmm)
From: David Hildenbrand @ 2025-02-26 13:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Simona Vetter, Alistair Popple,
	Jason Gunthorpe

Based on mm/mm-unstable, which already contains [1].

Some smaller device-exclusive cleanups I have lying around. Tested
using the hmm selftests without surprises.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>

v1 -> v2:
 * Rebased on top of [1]
 * "mm/memory: remove PageAnonExclusive sanity-check in
    restore_exclusive_pte()"
  -> Added
 * "mm/memory: document restore_exclusive_pte()"
  -> adjust/clarify/simplify documentation
 * "mm/mmu_notifier: use MMU_NOTIFY_CLEAR in
    remove_device_exclusive_entry()"
  -> Use MMU_NOTIFY_EXCLUSIVE only for a single purpose such that we
     always have the owner

[1] https://lkml.kernel.org/r/20250210193801.781278-1-david@redhat.com

David Hildenbrand (5):
  lib/test_hmm: make dmirror_atomic_map() consume a single page
  mm/memory: remove PageAnonExclusive sanity-check in
    restore_exclusive_pte()
  mm/memory: pass folio and pte to restore_exclusive_pte()
  mm/memory: document restore_exclusive_pte()
  mm/mmu_notifier: use MMU_NOTIFY_CLEAR in
    remove_device_exclusive_entry()

 include/linux/mmu_notifier.h |  8 ++---
 lib/test_hmm.c               | 32 ++++++-------------
 mm/memory.c                  | 60 ++++++++++++++++++++++++------------
 3 files changed, 55 insertions(+), 45 deletions(-)


base-commit: 598d34afeca6bb10554846cf157a3ded8729516c
-- 
2.48.1




* [PATCH v2 1/5] lib/test_hmm: make dmirror_atomic_map() consume a single page
From: David Hildenbrand @ 2025-02-26 13:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Simona Vetter, Alistair Popple,
	Jason Gunthorpe

The caller now always passes a single page; let's simplify, and return
"0" on success.

Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 lib/test_hmm.c | 32 ++++++++++----------------------
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 155b18cd9f2af..5b144bc5c4ec7 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -707,34 +707,23 @@ static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
 	return 0;
 }
 
-static int dmirror_atomic_map(unsigned long start, unsigned long end,
-			      struct page **pages, struct dmirror *dmirror)
+static int dmirror_atomic_map(unsigned long addr, struct page *page,
+		struct dmirror *dmirror)
 {
-	unsigned long pfn, mapped = 0;
-	int i;
+	void *entry;
 
 	/* Map the migrated pages into the device's page tables. */
 	mutex_lock(&dmirror->mutex);
 
-	for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, i++) {
-		void *entry;
-
-		if (!pages[i])
-			continue;
-
-		entry = pages[i];
-		entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
-		entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
-		if (xa_is_err(entry)) {
-			mutex_unlock(&dmirror->mutex);
-			return xa_err(entry);
-		}
-
-		mapped++;
+	entry = xa_tag_pointer(page, DPT_XA_TAG_ATOMIC);
+	entry = xa_store(&dmirror->pt, addr >> PAGE_SHIFT, entry, GFP_ATOMIC);
+	if (xa_is_err(entry)) {
+		mutex_unlock(&dmirror->mutex);
+		return xa_err(entry);
 	}
 
 	mutex_unlock(&dmirror->mutex);
-	return mapped;
+	return 0;
 }
 
 static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
@@ -804,8 +793,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 			break;
 		}
 
-		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
-		ret = ret == 1 ? 0 : -EBUSY;
+		ret = dmirror_atomic_map(addr, page, dmirror);
 		folio_unlock(folio);
 		folio_put(folio);
 	}
-- 
2.48.1




* [PATCH v2 2/5] mm/memory: remove PageAnonExclusive sanity-check in restore_exclusive_pte()
From: David Hildenbrand @ 2025-02-26 13:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Simona Vetter, Alistair Popple,
	Jason Gunthorpe

In commit b832a354d787 ("mm/memory: page_add_anon_rmap() ->
folio_add_anon_rmap_pte()") we accidentally changed the sanity check to
essentially ignore anonymous folios by misplacing the "!" -- but we
only ever get anonymous folios in restore_exclusive_pte().

However, in the meantime we removed the separate "writable
device-exclusive entries" and always detect if the PTE can be writable
using can_change_pte_writable() -- which also consults
PageAnonExclusive.

So let's just get rid of this sanity check completely.
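
To illustrate (this snippet is not part of the patch), the check that
was presumably intended -- "a writable PTE implies an exclusive
anonymous page" -- versus what the misplaced "!" turned it into:

	/* Presumably intended: trigger if writable but not (anon && exclusive). */
	VM_BUG_ON_FOLIO(pte_write(pte) && !(folio_test_anon(folio) &&
					    PageAnonExclusive(page)), folio);

	/* Accidental: the "!" applies only to folio_test_anon(); as we only
	 * ever see anonymous folios here, this can never trigger. */
	VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
					   PageAnonExclusive(page)), folio);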

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 567b45e5d149e..507045fa719cc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -740,9 +740,6 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 			pte = pte_mkdirty(pte);
 		pte = pte_mkwrite(pte, vma);
 	}
-
-	VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
-					   PageAnonExclusive(page)), folio);
 	set_pte_at(vma->vm_mm, address, ptep, pte);
 
 	/*
-- 
2.48.1




* [PATCH v2 3/5] mm/memory: pass folio and pte to restore_exclusive_pte()
From: David Hildenbrand @ 2025-02-26 13:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Simona Vetter, Alistair Popple,
	Jason Gunthorpe

Let's pass the folio and the pte to restore_exclusive_pte(), so we
can avoid repeated page_folio() and ptep_get(). To do that, pass the
pte to try_restore_exclusive_pte() and operate on the folio in there
as well.

While at it, just avoid the "swp_entry_t entry" variable in
try_restore_exclusive_pte() and add a folio-locked check to
restore_exclusive_pte().

Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 507045fa719cc..2a0b4dd858769 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -719,14 +719,13 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
 #endif
 
 static void restore_exclusive_pte(struct vm_area_struct *vma,
-				  struct page *page, unsigned long address,
-				  pte_t *ptep)
+		struct folio *folio, struct page *page, unsigned long address,
+		pte_t *ptep, pte_t orig_pte)
 {
-	struct folio *folio = page_folio(page);
-	pte_t orig_pte;
 	pte_t pte;
 
-	orig_pte = ptep_get(ptep);
+	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
 	pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
@@ -753,16 +752,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
  * Tries to restore an exclusive pte if the page lock can be acquired without
  * sleeping.
  */
-static int
-try_restore_exclusive_pte(pte_t *src_pte, struct vm_area_struct *vma,
-			unsigned long addr)
+static int try_restore_exclusive_pte(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, pte_t orig_pte)
 {
-	swp_entry_t entry = pte_to_swp_entry(ptep_get(src_pte));
-	struct page *page = pfn_swap_entry_to_page(entry);
+	struct page *page = pfn_swap_entry_to_page(pte_to_swp_entry(orig_pte));
+	struct folio *folio = page_folio(page);
 
-	if (trylock_page(page)) {
-		restore_exclusive_pte(vma, page, addr, src_pte);
-		unlock_page(page);
+	if (folio_trylock(folio)) {
+		restore_exclusive_pte(vma, folio, page, addr, ptep, orig_pte);
+		folio_unlock(folio);
 		return 0;
 	}
 
@@ -868,7 +866,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		 * (ie. COW) mappings.
 		 */
 		VM_BUG_ON(!is_cow_mapping(src_vma->vm_flags));
-		if (try_restore_exclusive_pte(src_pte, src_vma, addr))
+		if (try_restore_exclusive_pte(src_vma, addr, src_pte, orig_pte))
 			return -EBUSY;
 		return -ENOENT;
 	} else if (is_pte_marker_entry(entry)) {
@@ -4030,7 +4028,8 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 				&vmf->ptl);
 	if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
-		restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte);
+		restore_exclusive_pte(vma, folio, vmf->page, vmf->address,
+				      vmf->pte, vmf->orig_pte);
 
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-- 
2.48.1




* [PATCH v2 4/5] mm/memory: document restore_exclusive_pte()
From: David Hildenbrand @ 2025-02-26 13:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Simona Vetter, Alistair Popple,
	Jason Gunthorpe

Let's document how this function is to be used, and why the folio lock
is involved.
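
For orientation, here is a condensed sketch of the protocol the new
documentation describes -- simplified and not part of the patch; see
remove_device_exclusive_entry() in mm/memory.c for the real code, and
note that patch #5 of this series switches the event used here to
MMU_NOTIFY_CLEAR:

	/* 1) Hold the folio lock, so the entry's owner can make progress. */
	if (!folio_trylock(folio))
		return 0;	/* simplified; the real code retries the fault */

	/* 2) Let the driver invalidate its device mappings first. */
	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, mm,
				      start, end, NULL);
	mmu_notifier_invalidate_range_start(&range);

	/* 3) Restore the PTE with the page table (and the folio) locked. */
	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (ptep && pte_same(ptep_get(ptep), orig_pte))
		restore_exclusive_pte(vma, folio, page, address, ptep, orig_pte);
	if (ptep)
		pte_unmap_unlock(ptep, ptl);

	mmu_notifier_invalidate_range_end(&range);
	folio_unlock(folio);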

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 2a0b4dd858769..50a305d7efcb9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -718,6 +718,32 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
 }
 #endif
 
+/**
+ * restore_exclusive_pte - Restore a device-exclusive entry
+ * @vma: VMA covering @address
+ * @folio: the mapped folio
+ * @page: the mapped folio page
+ * @address: the virtual address
+ * @ptep: pte pointer into the locked page table mapping the folio page
+ * @orig_pte: pte value at @ptep
+ *
+ * Restore a device-exclusive non-swap entry to an ordinary present pte.
+ *
+ * The folio and the page table must be locked, and MMU notifiers must have
+ * been called to invalidate any (exclusive) device mappings.
+ *
+ * Locking the folio makes sure that anybody who just converted the pte to
+ * a device-exclusive entry can map it into the device to make forward
+ * progress without others converting it back until the folio is unlocked.
+ *
+ * If the folio lock ever becomes an issue, we can stop relying on the folio
+ * lock; it might make some scenarios with heavy thrashing less likely to
+ * make forward progress, but these scenarios might not be valid use cases.
+ *
+ * Note that the folio lock does not protect against all cases of concurrent
+ * page table modifications (e.g., MADV_DONTNEED, mprotect), so device drivers
+ * must use MMU notifiers to sync against any concurrent changes.
+ */
 static void restore_exclusive_pte(struct vm_area_struct *vma,
 		struct folio *folio, struct page *page, unsigned long address,
 		pte_t *ptep, pte_t orig_pte)
-- 
2.48.1




* [PATCH v2 5/5] mm/mmu_notifier: use MMU_NOTIFY_CLEAR in remove_device_exclusive_entry()
From: David Hildenbrand @ 2025-02-26 13:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Simona Vetter, Alistair Popple,
	Jason Gunthorpe

Let's limit the use of MMU_NOTIFY_EXCLUSIVE to the case where we convert
a present PTE to device-exclusive. For the other case, we can simply
use MMU_NOTIFY_CLEAR, because we really are clearing the
device-exclusive entry first, and then installing the present entry.

Update the documentation of MMU_NOTIFY_EXCLUSIVE to describe the single
use case more thoroughly.

If ever required, we could add a separate MMU_NOTIFY_CLEAR_EXCLUSIVE;
for now using MMU_NOTIFY_CLEAR seems to be sufficient.
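
For illustration, a sketch of how a driver's interval notifier can now
filter reliably (not from this patch; the callback name and
"my_driver_owner" are made up): its own conversions always arrive as
MMU_NOTIFY_EXCLUSIVE carrying the owner passed to
make_device_exclusive(), while the restore path arrives as an ordinary
MMU_NOTIFY_CLEAR:

	static bool my_invalidate(struct mmu_interval_notifier *mni,
				  const struct mmu_notifier_range *range,
				  unsigned long cur_seq)
	{
		/* Skip the event triggered by our own make_device_exclusive(). */
		if (range->event == MMU_NOTIFY_EXCLUSIVE &&
		    range->owner == my_driver_owner)
			return true;

		/*
		 * Everything else -- including MMU_NOTIFY_CLEAR when the entry
		 * is restored -- must invalidate the device mapping. A real
		 * driver must also honor mmu_notifier_range_blockable().
		 */
		mmu_interval_set_seq(mni, cur_seq);
		/* ... tear down device PTEs for [range->start, range->end) ... */
		return true;
	}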

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mmu_notifier.h | 8 ++++----
 mm/memory.c                  | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d4e7146618262..bc2402a45741d 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -43,10 +43,10 @@ struct mmu_interval_notifier;
  * a device driver to possibly ignore the invalidation if the
  * owner field matches the driver's device private pgmap owner.
  *
- * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
- * longer have exclusive access to the page. When sent during creation of an
- * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive(), otherwise the owner will be NULL.
+ * @MMU_NOTIFY_EXCLUSIVE: conversion of a page table entry to device-exclusive.
+ * The owner is initialized to the value provided by the caller of
+ * make_device_exclusive(), such that this caller can filter out these
+ * events.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
diff --git a/mm/memory.c b/mm/memory.c
index 50a305d7efcb9..79acd2d95dcff 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4046,7 +4046,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 		folio_put(folio);
 		return ret;
 	}
-	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
+	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_CLEAR, 0,
 				vma->vm_mm, vmf->address & PAGE_MASK,
 				(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
 	mmu_notifier_invalidate_range_start(&range);
-- 
2.48.1



