* [PATCH v2 0/5] mm: cleanups for device-exclusive entries (hmm)
From: David Hildenbrand @ 2025-02-26 13:22 UTC
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Jérôme Glisse, Simona Vetter, Alistair Popple,
Jason Gunthorpe
Based on mm/mm-unstable, which already contains [1].
Some smaller device-exclusive cleanups I have lying around. Tested
using the hmm selftests without surprises.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
v1 -> v2:
* Rebased on top of [1]
* "mm/memory: remove PageAnonExclusive sanity-check in
restore_exclusive_pte()"
-> Added
* "mm/memory: document restore_exclusive_pte()"
-> adjust/clarify/simplify documentation
* "mm/mmu_notifier: use MMU_NOTIFY_CLEAR in
remove_device_exclusive_entry()"
-> Use MMU_NOTIFY_EXCLUSIVE only for a single purpose such that we
always have the owner
[1] https://lkml.kernel.org/r/20250210193801.781278-1-david@redhat.com
David Hildenbrand (5):
lib/test_hmm: make dmirror_atomic_map() consume a single page
mm/memory: remove PageAnonExclusive sanity-check in
restore_exclusive_pte()
mm/memory: pass folio and pte to restore_exclusive_pte()
mm/memory: document restore_exclusive_pte()
mm/mmu_notifier: use MMU_NOTIFY_CLEAR in
remove_device_exclusive_entry()
include/linux/mmu_notifier.h | 8 ++---
lib/test_hmm.c | 32 ++++++-------------
mm/memory.c | 60 ++++++++++++++++++++++++------------
3 files changed, 55 insertions(+), 45 deletions(-)
base-commit: 598d34afeca6bb10554846cf157a3ded8729516c
--
2.48.1
* [PATCH v2 1/5] lib/test_hmm: make dmirror_atomic_map() consume a single page
From: David Hildenbrand @ 2025-02-26 13:22 UTC
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Jérôme Glisse, Simona Vetter, Alistair Popple,
Jason Gunthorpe
The caller now always passes a single page; let's simplify accordingly
and return 0 on success instead of the number of mapped pages.
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
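As an aside, a minimal sketch of the XArray tagged-pointer helpers this code
relies on (xa_tag_pointer(), xa_pointer_tag() and xa_untag_pointer() are the
existing helpers from include/linux/xarray.h; the wrapper function is made up
purely for illustration):

	/* Encode DPT_XA_TAG_ATOMIC into the low bits of the page pointer. */
	static struct page *dpt_tag_and_untag(struct page *page)
	{
		void *entry = xa_tag_pointer(page, DPT_XA_TAG_ATOMIC);

		/* The tag can later be tested ... */
		if (xa_pointer_tag(entry) != DPT_XA_TAG_ATOMIC)
			return NULL;

		/* ... and stripped again to recover the plain pointer. */
		return xa_untag_pointer(entry);
	}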
lib/test_hmm.c | 32 ++++++++++----------------------
1 file changed, 10 insertions(+), 22 deletions(-)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 155b18cd9f2af..5b144bc5c4ec7 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -707,34 +707,23 @@ static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
return 0;
}
-static int dmirror_atomic_map(unsigned long start, unsigned long end,
- struct page **pages, struct dmirror *dmirror)
+static int dmirror_atomic_map(unsigned long addr, struct page *page,
+ struct dmirror *dmirror)
{
- unsigned long pfn, mapped = 0;
- int i;
+ void *entry;
/* Map the migrated pages into the device's page tables. */
mutex_lock(&dmirror->mutex);
- for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, i++) {
- void *entry;
-
- if (!pages[i])
- continue;
-
- entry = pages[i];
- entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
- entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
- if (xa_is_err(entry)) {
- mutex_unlock(&dmirror->mutex);
- return xa_err(entry);
- }
-
- mapped++;
+ entry = xa_tag_pointer(page, DPT_XA_TAG_ATOMIC);
+ entry = xa_store(&dmirror->pt, addr >> PAGE_SHIFT, entry, GFP_ATOMIC);
+ if (xa_is_err(entry)) {
+ mutex_unlock(&dmirror->mutex);
+ return xa_err(entry);
}
mutex_unlock(&dmirror->mutex);
- return mapped;
+ return 0;
}
static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
@@ -804,8 +793,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,
break;
}
- ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
- ret = ret == 1 ? 0 : -EBUSY;
+ ret = dmirror_atomic_map(addr, page, dmirror);
folio_unlock(folio);
folio_put(folio);
}
--
2.48.1
* [PATCH v2 2/5] mm/memory: remove PageAnonExclusive sanity-check in restore_exclusive_pte()
From: David Hildenbrand @ 2025-02-26 13:22 UTC
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Jérôme Glisse, Simona Vetter, Alistair Popple,
Jason Gunthorpe
In commit b832a354d787 ("mm/memory: page_add_anon_rmap() ->
folio_add_anon_rmap_pte()") we accidentally misplaced the "!" in the
sanity check, so it essentially ignores anonymous folios ... even though
restore_exclusive_pte() only ever sees anonymous folios.
However, in the meantime we removed the separate "writable
device-exclusive entries" and now always determine whether the PTE can be
made writable using can_change_pte_writable() -- which also consults
PageAnonExclusive.
So let's just get rid of this sanity check completely.
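For illustration, the intended check versus the one we ended up with (the
first form is reconstructed from the description above, not taken from the
old code):

	/* Intended: a writable PTE must map an anon folio with PageAnonExclusive set. */
	VM_BUG_ON_FOLIO(pte_write(pte) && !(folio_test_anon(folio) &&
			PageAnonExclusive(page)), folio);

	/* Actual: restore_exclusive_pte() only ever sees anon folios, so
	 * !folio_test_anon(folio) is always false and the check never fires. */
	VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
			PageAnonExclusive(page)), folio);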
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 567b45e5d149e..507045fa719cc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -740,9 +740,6 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
pte = pte_mkdirty(pte);
pte = pte_mkwrite(pte, vma);
}
-
- VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
- PageAnonExclusive(page)), folio);
set_pte_at(vma->vm_mm, address, ptep, pte);
/*
--
2.48.1
* [PATCH v2 3/5] mm/memory: pass folio and pte to restore_exclusive_pte()
From: David Hildenbrand @ 2025-02-26 13:22 UTC
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Jérôme Glisse, Simona Vetter, Alistair Popple,
Jason Gunthorpe
Let's pass the folio and the pte to restore_exclusive_pte(), so we
can avoid repeated page_folio() and ptep_get(). To do that,
pass the pte to try_restore_exclusive_pte() and use a folio in there
already.
While at it, just avoid the "swp_entry_t entry" variable in
try_restore_exclusive_pte() and add a folio-locked check to
restore_exclusive_pte().
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 507045fa719cc..2a0b4dd858769 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -719,14 +719,13 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
#endif
static void restore_exclusive_pte(struct vm_area_struct *vma,
- struct page *page, unsigned long address,
- pte_t *ptep)
+ struct folio *folio, struct page *page, unsigned long address,
+ pte_t *ptep, pte_t orig_pte)
{
- struct folio *folio = page_folio(page);
- pte_t orig_pte;
pte_t pte;
- orig_pte = ptep_get(ptep);
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
if (pte_swp_soft_dirty(orig_pte))
pte = pte_mksoft_dirty(pte);
@@ -753,16 +752,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
* Tries to restore an exclusive pte if the page lock can be acquired without
* sleeping.
*/
-static int
-try_restore_exclusive_pte(pte_t *src_pte, struct vm_area_struct *vma,
- unsigned long addr)
+static int try_restore_exclusive_pte(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, pte_t orig_pte)
{
- swp_entry_t entry = pte_to_swp_entry(ptep_get(src_pte));
- struct page *page = pfn_swap_entry_to_page(entry);
+ struct page *page = pfn_swap_entry_to_page(pte_to_swp_entry(orig_pte));
+ struct folio *folio = page_folio(page);
- if (trylock_page(page)) {
- restore_exclusive_pte(vma, page, addr, src_pte);
- unlock_page(page);
+ if (folio_trylock(folio)) {
+ restore_exclusive_pte(vma, folio, page, addr, ptep, orig_pte);
+ folio_unlock(folio);
return 0;
}
@@ -868,7 +866,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* (ie. COW) mappings.
*/
VM_BUG_ON(!is_cow_mapping(src_vma->vm_flags));
- if (try_restore_exclusive_pte(src_pte, src_vma, addr))
+ if (try_restore_exclusive_pte(src_vma, addr, src_pte, orig_pte))
return -EBUSY;
return -ENOENT;
} else if (is_pte_marker_entry(entry)) {
@@ -4030,7 +4028,8 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
&vmf->ptl);
if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
- restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte);
+ restore_exclusive_pte(vma, folio, vmf->page, vmf->address,
+ vmf->pte, vmf->orig_pte);
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
--
2.48.1
* [PATCH v2 4/5] mm/memory: document restore_exclusive_pte()
From: David Hildenbrand @ 2025-02-26 13:22 UTC
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Jérôme Glisse, Simona Vetter, Alistair Popple,
Jason Gunthorpe
Let's document how this function is to be used, and why the folio lock
is involved.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index 2a0b4dd858769..50a305d7efcb9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -718,6 +718,32 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
}
#endif
+/**
+ * restore_exclusive_pte - Restore a device-exclusive entry
+ * @vma: VMA covering @address
+ * @folio: the mapped folio
+ * @page: the mapped folio page
+ * @address: the virtual address
+ * @ptep: pte pointer into the locked page table mapping the folio page
+ * @orig_pte: pte value at @ptep
+ *
+ * Restore a device-exclusive non-swap entry to an ordinary present pte.
+ *
+ * The folio and the page table must be locked, and MMU notifiers must have
+ * been called to invalidate any (exclusive) device mappings.
+ *
+ * Locking the folio makes sure that anybody who just converted the pte to
+ * a device-exclusive entry can map it into the device to make forward
+ * progress without others converting it back until the folio is unlocked.
+ *
+ * If the folio lock ever becomes an issue, we can stop relying on the folio
+ * lock; it might make some scenarios with heavy thrashing less likely to
+ * make forward progress, but these scenarios might not be valid use cases.
+ *
+ * Note that the folio lock does not protect against all cases of concurrent
+ * page table modifications (e.g., MADV_DONTNEED, mprotect), so device drivers
+ * must use MMU notifiers to sync against any concurrent changes.
+ */
static void restore_exclusive_pte(struct vm_area_struct *vma,
struct folio *folio, struct page *page, unsigned long address,
pte_t *ptep, pte_t orig_pte)
--
2.48.1
* [PATCH v2 5/5] mm/mmu_notifier: use MMU_NOTIFY_CLEAR in remove_device_exclusive_entry()
From: David Hildenbrand @ 2025-02-26 13:22 UTC
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton,
Jérôme Glisse, Simona Vetter, Alistair Popple,
Jason Gunthorpe
Let's limit the use of MMU_NOTIFY_EXCLUSIVE to the case where we convert
a present PTE to a device-exclusive entry. For the reverse case (restoring
a present PTE) we can simply use MMU_NOTIFY_CLEAR, because what we really
do is clear the device-exclusive entry first and then install the present
entry.
Update the documentation of MMU_NOTIFY_EXCLUSIVE to describe the single
remaining use case more thoroughly.
If ever required, we could add a separate MMU_NOTIFY_CLEAR_EXCLUSIVE;
for now using MMU_NOTIFY_CLEAR seems to be sufficient.
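To illustrate what this means on the driver side, a hypothetical
interval-notifier callback (my_dev, my_mni_to_dev() and my_dev_unmap_range()
are made-up names; the mmu_notifier fields and helpers are the existing
ones):

	static bool my_dev_invalidate(struct mmu_interval_notifier *mni,
				      const struct mmu_notifier_range *range,
				      unsigned long cur_seq)
	{
		struct my_dev *dev = my_mni_to_dev(mni);

		/* Our own conversion to device-exclusive: nothing to tear down. */
		if (range->event == MMU_NOTIFY_EXCLUSIVE && range->owner == dev)
			return true;

		if (!mmu_notifier_range_blockable(range))
			return false;

		mmu_interval_set_seq(mni, cur_seq);
		/* Restoring the entry now arrives as a plain MMU_NOTIFY_CLEAR. */
		my_dev_unmap_range(dev, range->start, range->end);
		return true;
	}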
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mmu_notifier.h | 8 ++++----
mm/memory.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d4e7146618262..bc2402a45741d 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -43,10 +43,10 @@ struct mmu_interval_notifier;
* a device driver to possibly ignore the invalidation if the
* owner field matches the driver's device private pgmap owner.
*
- * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
- * longer have exclusive access to the page. When sent during creation of an
- * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive(), otherwise the owner will be NULL.
+ * @MMU_NOTIFY_EXCLUSIVE: conversion of a page table entry to device-exclusive.
+ * The owner is initialized to the value provided by the caller of
+ * make_device_exclusive(), such that this caller can filter out these
+ * events.
*/
enum mmu_notifier_event {
MMU_NOTIFY_UNMAP = 0,
diff --git a/mm/memory.c b/mm/memory.c
index 50a305d7efcb9..79acd2d95dcff 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4046,7 +4046,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
folio_put(folio);
return ret;
}
- mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
+ mmu_notifier_range_init_owner(&range, MMU_NOTIFY_CLEAR, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
mmu_notifier_invalidate_range_start(&range);
--
2.48.1