linux-mm.kvack.org archive mirror
* [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu

Since the v5 patchset was posted, some changes have taken place in the
linux-next tree which make it impossible to apply that patchset cleanly.
The most important of these changes are the split of mm/mmap.c into
mm/mmap.c and mm/vma.c and the split of mm/internal.h into mm/internal.h
and mm/vma.h. Also, arch_unmap() has already been removed from mm/mmap.c
in next-20240816, so there is no longer any need to take care of that.

When testing this with `stress-ng --vm-segv 1`, dmesg shows errors like

 [   T3753] coredump: 3753(stress-ng-vm-se): over coredump resource limit, skipping core dump
 [   T3753] coredump: 3753(stress-ng-vm-se): coredump has not been created, error -7
 [   T3754] coredump: 3754(stress-ng-vm-se): over coredump resource limit, skipping core dump
 [   T3754] coredump: 3754(stress-ng-vm-se): coredump has not been created, error -7
 [...]

This is most likely not a problem with the patchset itself but rather
the result of the more verbose coredump logging introduced in the
following patch:
https://lore.kernel.org/all/20240718182743.1959160-3-romank@linux.microsoft.com/

Changes since v4:
 - rebase on next-20240816
 - some functions which were static in the original v5 patchset
   are now non-static, as they are used in both mmap.c and vma.c
   (an alternative approach would have been to keep them as
   "static inline" and move them into vma.h, at least for those
   used only in mmap.c and vma.c); a minimal sketch of both
   options follows below
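
As an illustration only (this is not code from the kernel tree), here is
a minimal sketch of the two options, using the hypothetical names
shared.h and shared_helper() in place of the real header and helpers:

 /* shared.h -- hypothetical header standing in for mm/vma.h */
 #ifndef SHARED_H
 #define SHARED_H

 /*
  * Option taken in this rebase: declare the helper here and define it
  * non-static in exactly one .c file (standing in for mm/vma.c), so
  * both mmap.c and vma.c call the same out-of-line definition.
  */
 int shared_helper(int x);

 /*
  * Alternative mentioned above: keep the helper "static inline" in the
  * header, so every file that includes it compiles its own copy and no
  * symbol needs to be exported.
  */
 static inline int shared_helper_inline(int x)
 {
 	return x * 2;
 }

 #endif /* SHARED_H */

The first form keeps a single out-of-line definition that both callers
link against; the second gives each including file its own inlined copy.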

Bert Karwatzki

Liam R. Howlett (19):
  mm/mmap: Correctly position vma_iterator in __split_vma()
  mm/mmap: Introduce abort_munmap_vmas()
  mm/mmap: Introduce vmi_complete_munmap_vmas()
  mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap()
  mm/mmap: Introduce vma_munmap_struct for use in munmap operations
  mm/mmap: Change munmap to use vma_munmap_struct() for accounting and
    surrounding vmas
  mm/mmap: Extract validate_mm() from vma_complete()
  mm/mmap: Inline munmap operation in mmap_region()
  mm/mmap: Expand mmap_region() munmap call
  mm/mmap: Support vma == NULL in init_vma_munmap()
  mm/mmap: Reposition vma iterator in mmap_region()
  mm/mmap: Track start and end of munmap in vma_munmap_struct
  mm/mmap: Clean up unmap_region() argument list
  mm/mmap: Avoid zeroing vma tree in mmap_region()
  mm/mmap: Use PHYS_PFN in mmap_region()
  mm/mmap: Use vms accounted pages in mmap_region()
  mm/mmap: Move can_modify_mm() check down the stack
  ipc/shm, mm: Drop do_vma_munmap()
  mm/mmap: Move may_expand_vm() check in mmap_region()

 include/linux/mm.h |   6 +-
 ipc/shm.c          |   8 +-
 mm/mmap.c          | 146 ++++++++---------
 mm/vma.c           | 397 +++++++++++++++++++++++++++++++--------------
 mm/vma.h           |  61 ++++++-
 5 files changed, 403 insertions(+), 215 deletions(-)

--
2.45.2




* [PATCH v5.1 01/19] mm/mmap: Correctly position vma_iterator in __split_vma()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu

The vma iterator may be left pointing to the newly created vma.  This
happens when inserting the new vma at the end of the old vma
(!new_below).

The incorrect position in the vma iterator is not exposed currently
since the vma iterator is repositioned in the munmap path and is not
reused in any of the other paths.

This has limited impact in the current code, but is required for future
changes.

Fixes: b2b3b886738f ("mm: don't use __vma_adjust() in __split_vma()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 mm/vma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/vma.c b/mm/vma.c
index 84965f2cd580..02e4f7cf489f 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -246,6 +246,9 @@ static int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	/* Success. */
 	if (new_below)
 		vma_next(vmi);
+	else
+		vma_prev(vmi);
+
 	return 0;

 out_free_mpol:
--
2.45.2




* [PATCH v5.1 02/19] mm/mmap: Introduce abort_munmap_vmas()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Lorenzo Stoakes

Extract the cleanup of failed munmap() operations from
do_vmi_align_munmap().  This simplifies later patches in the series.

It is worth noting that the mas_for_each() loop now has a different
upper limit.  This should not change the number of vmas visited for
reattaching to the main vma tree (mm_mt), as all vmas are reattached in
both scenarios.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/vma.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 02e4f7cf489f..83260a3142b3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -668,6 +668,22 @@ void vma_complete(struct vma_prepare *vp,
 	validate_mm(mm);
 }

+/*
+ * abort_munmap_vmas - Undo any munmap work and free resources
+ *
+ * Reattach any detached vmas and free up the maple tree used to track the vmas.
+ */
+static inline void abort_munmap_vmas(struct ma_state *mas_detach)
+{
+	struct vm_area_struct *vma;
+
+	mas_set(mas_detach, 0);
+	mas_for_each(mas_detach, vma, ULONG_MAX)
+		vma_mark_detached(vma, false);
+
+	__mt_destroy(mas_detach->tree);
+}
+
 /*
  * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
  * @vmi: The vma iterator
@@ -822,11 +838,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 userfaultfd_error:
 munmap_gather_failed:
 end_split_failed:
-	mas_set(&mas_detach, 0);
-	mas_for_each(&mas_detach, next, end)
-		vma_mark_detached(next, false);
-
-	__mt_destroy(&mt_detach);
+	abort_munmap_vmas(&mas_detach);
 start_split_failed:
 map_count_exceeded:
 	validate_mm(mm);
--
2.45.2




* [PATCH v5.1 03/19] mm/mmap: Introduce vmi_complete_munmap_vmas()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Lorenzo Stoakes

Extract all the operations that need to be completed after the vma
maple tree is updated during a munmap() operation.  Extracting this
makes the later patch in the series easier to understand.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/vma.c | 81 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 26 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 83260a3142b3..d331ba575019 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -684,6 +684,58 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
 	__mt_destroy(mas_detach->tree);
 }

+/*
+ * vmi_complete_munmap_vmas() - Finish the munmap() operation
+ * @vmi: The vma iterator
+ * @vma: The first vma to be munmapped
+ * @mm: The mm struct
+ * @start: The start address
+ * @end: The end address
+ * @unlock: Unlock the mm or not
+ * @mas_detach: them maple state of the detached vma maple tree
+ * @locked_vm: The locked_vm count in the detached vmas
+ *
+ * This function updates the mm_struct, unmaps the region, frees the resources
+ * used for the munmap() and may downgrade the lock - if requested.  Everything
+ * needed to be done once the vma maple tree is updated.
+ */
+static void
+vmi_complete_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
+		struct mm_struct *mm, unsigned long start,
+		unsigned long end, bool unlock, struct ma_state *mas_detach,
+		unsigned long locked_vm)
+{
+	struct vm_area_struct *prev, *next;
+	int count;
+
+	count = mas_detach->index + 1;
+	mm->map_count -= count;
+	mm->locked_vm -= locked_vm;
+	if (unlock)
+		mmap_write_downgrade(mm);
+
+	prev = vma_iter_prev_range(vmi);
+	next = vma_next(vmi);
+	if (next)
+		vma_iter_prev_range(vmi);
+
+	/*
+	 * We can free page tables without write-locking mmap_lock because VMAs
+	 * were isolated before we downgraded mmap_lock.
+	 */
+	mas_set(mas_detach, 1);
+	unmap_region(mm, mas_detach, vma, prev, next, start, end, count,
+		     !unlock);
+	/* Statistics and freeing VMAs */
+	mas_set(mas_detach, 0);
+	remove_mt(mm, mas_detach);
+	validate_mm(mm);
+	if (unlock)
+		mmap_read_unlock(mm);
+
+	__mt_destroy(mas_detach->tree);
+}
+
 /*
  * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
  * @vmi: The vma iterator
@@ -703,7 +755,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		    struct mm_struct *mm, unsigned long start,
 		    unsigned long end, struct list_head *uf, bool unlock)
 {
-	struct vm_area_struct *prev, *next = NULL;
+	struct vm_area_struct *next = NULL;
 	struct maple_tree mt_detach;
 	int count = 0;
 	int error = -ENOMEM;
@@ -807,31 +859,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto clear_tree_failed;

 	/* Point of no return */
-	mm->locked_vm -= locked_vm;
-	mm->map_count -= count;
-	if (unlock)
-		mmap_write_downgrade(mm);
-
-	prev = vma_iter_prev_range(vmi);
-	next = vma_next(vmi);
-	if (next)
-		vma_iter_prev_range(vmi);
-
-	/*
-	 * We can free page tables without write-locking mmap_lock because VMAs
-	 * were isolated before we downgraded mmap_lock.
-	 */
-	mas_set(&mas_detach, 1);
-	unmap_region(mm, &mas_detach, vma, prev, next, start, end, count,
-		     !unlock);
-	/* Statistics and freeing VMAs */
-	mas_set(&mas_detach, 0);
-	remove_mt(mm, &mas_detach);
-	validate_mm(mm);
-	if (unlock)
-		mmap_read_unlock(mm);
-
-	__mt_destroy(&mt_detach);
+	vmi_complete_munmap_vmas(vmi, vma, mm, start, end, unlock, &mas_detach,
+				 locked_vm);
 	return 0;

 clear_tree_failed:
--
2.45.2




* [PATCH v5.1 04/19] mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Lorenzo Stoakes

Create vmi_gather_munmap_vmas() to handle the gathering of vmas into a
detached maple tree for removal later.  Part of the gathering is the
splitting of vmas that span the boundary.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c | 80 ++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 22 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index d331ba575019..a980837eefd7 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -737,32 +737,30 @@ vmi_complete_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
 }

 /*
- * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
+ * vmi_gather_munmap_vmas() - Put all VMAs within a range into a maple tree
+ * for removal at a later date.  Handles splitting first and last if necessary
+ * and marking the vmas as isolated.
+ *
  * @vmi: The vma iterator
  * @vma: The starting vm_area_struct
  * @mm: The mm_struct
  * @start: The aligned start address to munmap.
  * @end: The aligned end address to munmap.
  * @uf: The userfaultfd list_head
- * @unlock: Set to true to drop the mmap_lock.  unlocking only happens on
- * success.
+ * @mas_detach: The maple state tracking the detached tree
+ * @locked_vm: a pointer to store the VM_LOCKED pages count.
  *
- * Return: 0 on success and drops the lock if so directed, error and leaves the
- * lock held otherwise.
+ * Return: 0 on success
  */
-int
-do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
+static int
+vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		    struct mm_struct *mm, unsigned long start,
-		    unsigned long end, struct list_head *uf, bool unlock)
+		    unsigned long end, struct list_head *uf,
+		    struct ma_state *mas_detach, unsigned long *locked_vm)
 {
 	struct vm_area_struct *next = NULL;
-	struct maple_tree mt_detach;
 	int count = 0;
 	int error = -ENOMEM;
-	unsigned long locked_vm = 0;
-	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
-	mt_on_stack(mt_detach);

 	/*
 	 * If we need to split any vma, do it now to save pain later.
@@ -801,13 +799,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 				goto end_split_failed;
 		}
 		vma_start_write(next);
-		mas_set(&mas_detach, count);
-		error = mas_store_gfp(&mas_detach, next, GFP_KERNEL);
+		mas_set(mas_detach, count++);
+		error = mas_store_gfp(mas_detach, next, GFP_KERNEL);
 		if (error)
 			goto munmap_gather_failed;
 		vma_mark_detached(next, true);
 		if (next->vm_flags & VM_LOCKED)
-			locked_vm += vma_pages(next);
+			*locked_vm += vma_pages(next);

 		count++;
 		if (unlikely(uf)) {
@@ -834,7 +832,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	/* Make sure no VMAs are about to be lost. */
 	{
-		MA_STATE(test, &mt_detach, 0, 0);
+		MA_STATE(test, mas_detach->tree, 0, 0);
 		struct vm_area_struct *vma_mas, *vma_test;
 		int test_count = 0;

@@ -854,6 +852,48 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	while (vma_iter_addr(vmi) > start)
 		vma_iter_prev_range(vmi);

+	return 0;
+
+userfaultfd_error:
+munmap_gather_failed:
+end_split_failed:
+	abort_munmap_vmas(mas_detach);
+start_split_failed:
+map_count_exceeded:
+	return error;
+}
+
+/*
+ * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
+ * @vmi: The vma iterator
+ * @vma: The starting vm_area_struct
+ * @mm: The mm_struct
+ * @start: The aligned start address to munmap.
+ * @end: The aligned end address to munmap.
+ * @uf: The userfaultfd list_head
+ * @unlock: Set to true to drop the mmap_lock.  unlocking only happens on
+ * success.
+ *
+ * Return: 0 on success and drops the lock if so directed, error and leaves the
+ * lock held otherwise.
+ */
+int
+do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
+		    struct mm_struct *mm, unsigned long start,
+		    unsigned long end, struct list_head *uf, bool unlock)
+{
+	struct maple_tree mt_detach;
+	MA_STATE(mas_detach, &mt_detach, 0, 0);
+	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
+	mt_on_stack(mt_detach);
+	int error;
+	unsigned long locked_vm = 0;
+
+	error = vmi_gather_munmap_vmas(vmi, vma, mm, start, end, uf,
+				       &mas_detach, &locked_vm);
+	if (error)
+		goto gather_failed;
+
 	error = vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL);
 	if (error)
 		goto clear_tree_failed;
@@ -864,12 +904,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	return 0;

 clear_tree_failed:
-userfaultfd_error:
-munmap_gather_failed:
-end_split_failed:
 	abort_munmap_vmas(&mas_detach);
-start_split_failed:
-map_count_exceeded:
+gather_failed:
 	validate_mm(mm);
 	return error;
 }
--
2.45.2




* [PATCH v5.1 05/19] mm/mmap: Introduce vma_munmap_struct for use in munmap operations
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Lorenzo Stoakes

Use a structure to pass along all the necessary information and counters
involved in removing vmas from the mm_struct.

Update vmi_ function names to vms_ to indicate the first argument
type change.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c | 140 +++++++++++++++++++++++++++++--------------------------
 mm/vma.h |  16 +++++++
 2 files changed, 90 insertions(+), 66 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index a980837eefd7..9495230df3c3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -365,6 +365,32 @@ static void __vma_link_file(struct vm_area_struct *vma,
 	flush_dcache_mmap_unlock(mapping);
 }

+/*
+ * init_vma_munmap() - Initializer wrapper for vma_munmap_struct
+ * @vms: The vma munmap struct
+ * @vmi: The vma iterator
+ * @vma: The first vm_area_struct to munmap
+ * @start: The aligned start address to munmap
+ * @end: The aligned end address to munmap
+ * @uf: The userfaultfd list_head
+ * @unlock: Unlock after the operation.  Only unlocked on success
+ */
+static inline void init_vma_munmap(struct vma_munmap_struct *vms,
+		struct vma_iterator *vmi, struct vm_area_struct *vma,
+		unsigned long start, unsigned long end, struct list_head *uf,
+		bool unlock)
+{
+	vms->vmi = vmi;
+	vms->vma = vma;
+	vms->mm = vma->vm_mm;
+	vms->start = start;
+	vms->end = end;
+	vms->unlock = unlock;
+	vms->uf = uf;
+	vms->vma_count = 0;
+	vms->nr_pages = vms->locked_vm = 0;
+}
+
 /*
  * vma_prepare() - Helper function for handling locking VMAs prior to altering
  * @vp: The initialized vma_prepare struct
@@ -685,81 +711,63 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
 }

 /*
- * vmi_complete_munmap_vmas() - Finish the munmap() operation
- * @vmi: The vma iterator
- * @vma: The first vma to be munmapped
- * @mm: The mm struct
- * @start: The start address
- * @end: The end address
- * @unlock: Unlock the mm or not
- * @mas_detach: them maple state of the detached vma maple tree
- * @locked_vm: The locked_vm count in the detached vmas
+ * vms_complete_munmap_vmas() - Finish the munmap() operation
+ * @vms: The vma munmap struct
+ * @mas_detach: The maple state of the detached vmas
  *
- * This function updates the mm_struct, unmaps the region, frees the resources
+ * This updates the mm_struct, unmaps the region, frees the resources
  * used for the munmap() and may downgrade the lock - if requested.  Everything
  * needed to be done once the vma maple tree is updated.
  */
-static void
-vmi_complete_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
-		struct mm_struct *mm, unsigned long start,
-		unsigned long end, bool unlock, struct ma_state *mas_detach,
-		unsigned long locked_vm)
+
+static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
+		struct ma_state *mas_detach)
 {
 	struct vm_area_struct *prev, *next;
-	int count;
+	struct mm_struct *mm;

-	count = mas_detach->index + 1;
-	mm->map_count -= count;
-	mm->locked_vm -= locked_vm;
-	if (unlock)
+	mm = vms->mm;
+	mm->map_count -= vms->vma_count;
+	mm->locked_vm -= vms->locked_vm;
+	if (vms->unlock)
 		mmap_write_downgrade(mm);

-	prev = vma_iter_prev_range(vmi);
-	next = vma_next(vmi);
+	prev = vma_iter_prev_range(vms->vmi);
+	next = vma_next(vms->vmi);
 	if (next)
-		vma_iter_prev_range(vmi);
+		vma_iter_prev_range(vms->vmi);

 	/*
 	 * We can free page tables without write-locking mmap_lock because VMAs
 	 * were isolated before we downgraded mmap_lock.
 	 */
 	mas_set(mas_detach, 1);
-	unmap_region(mm, mas_detach, vma, prev, next, start, end, count,
-		     !unlock);
+	unmap_region(mm, mas_detach, vms->vma, prev, next, vms->start, vms->end,
+		     vms->vma_count, !vms->unlock);
 	/* Statistics and freeing VMAs */
 	mas_set(mas_detach, 0);
 	remove_mt(mm, mas_detach);
 	validate_mm(mm);
-	if (unlock)
+	if (vms->unlock)
 		mmap_read_unlock(mm);

 	__mt_destroy(mas_detach->tree);
 }

 /*
- * vmi_gather_munmap_vmas() - Put all VMAs within a range into a maple tree
+ * vms_gather_munmap_vmas() - Put all VMAs within a range into a maple tree
  * for removal at a later date.  Handles splitting first and last if necessary
  * and marking the vmas as isolated.
  *
- * @vmi: The vma iterator
- * @vma: The starting vm_area_struct
- * @mm: The mm_struct
- * @start: The aligned start address to munmap.
- * @end: The aligned end address to munmap.
- * @uf: The userfaultfd list_head
+ * @vms: The vma munmap struct
  * @mas_detach: The maple state tracking the detached tree
- * @locked_vm: a pointer to store the VM_LOCKED pages count.
  *
  * Return: 0 on success
  */
-static int
-vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
-		    struct mm_struct *mm, unsigned long start,
-		    unsigned long end, struct list_head *uf,
-		    struct ma_state *mas_detach, unsigned long *locked_vm)
+static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
+		struct ma_state *mas_detach)
 {
 	struct vm_area_struct *next = NULL;
-	int count = 0;
 	int error = -ENOMEM;

 	/*
@@ -771,17 +779,18 @@ vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	 */

 	/* Does it split the first one? */
-	if (start > vma->vm_start) {
+	if (vms->start > vms->vma->vm_start) {

 		/*
 		 * Make sure that map_count on return from munmap() will
 		 * not exceed its limit; but let map_count go just above
 		 * its limit temporarily, to help free resources as expected.
 		 */
-		if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
+		if (vms->end < vms->vma->vm_end &&
+		    vms->mm->map_count >= sysctl_max_map_count)
 			goto map_count_exceeded;

-		error = __split_vma(vmi, vma, start, 1);
+		error = __split_vma(vms->vmi, vms->vma, vms->start, 1);
 		if (error)
 			goto start_split_failed;
 	}
@@ -790,25 +799,24 @@ vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	 * Detach a range of VMAs from the mm. Using next as a temp variable as
 	 * it is always overwritten.
 	 */
-	next = vma;
+	next = vms->vma;
 	do {
 		/* Does it split the end? */
-		if (next->vm_end > end) {
-			error = __split_vma(vmi, next, end, 0);
+		if (next->vm_end > vms->end) {
+			error = __split_vma(vms->vmi, next, vms->end, 0);
 			if (error)
 				goto end_split_failed;
 		}
 		vma_start_write(next);
-		mas_set(mas_detach, count++);
+		mas_set(mas_detach, vms->vma_count++);
 		error = mas_store_gfp(mas_detach, next, GFP_KERNEL);
 		if (error)
 			goto munmap_gather_failed;
 		vma_mark_detached(next, true);
 		if (next->vm_flags & VM_LOCKED)
-			*locked_vm += vma_pages(next);
+			vms->locked_vm += vma_pages(next);

-		count++;
-		if (unlikely(uf)) {
+		if (unlikely(vms->uf)) {
 			/*
 			 * If userfaultfd_unmap_prep returns an error the vmas
 			 * will remain split, but userland will get a
@@ -818,16 +826,17 @@ vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			 * split, despite we could. This is unlikely enough
 			 * failure that it's not worth optimizing it for.
 			 */
-			error = userfaultfd_unmap_prep(next, start, end, uf);
+			error = userfaultfd_unmap_prep(next, vms->start,
+						       vms->end, vms->uf);

 			if (error)
 				goto userfaultfd_error;
 		}
 #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
-		BUG_ON(next->vm_start < start);
-		BUG_ON(next->vm_start > end);
+		BUG_ON(next->vm_start < vms->start);
+		BUG_ON(next->vm_start > vms->end);
 #endif
-	} for_each_vma_range(*vmi, next, end);
+	} for_each_vma_range(*(vms->vmi), next, vms->end);

 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	/* Make sure no VMAs are about to be lost. */
@@ -836,21 +845,21 @@ vmi_gather_munmap_vmas(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		struct vm_area_struct *vma_mas, *vma_test;
 		int test_count = 0;

-		vma_iter_set(vmi, start);
+		vma_iter_set(vms->vmi, vms->start);
 		rcu_read_lock();
-		vma_test = mas_find(&test, count - 1);
-		for_each_vma_range(*vmi, vma_mas, end) {
+		vma_test = mas_find(&test, vms->vma_count - 1);
+		for_each_vma_range(*(vms->vmi), vma_mas, vms->end) {
 			BUG_ON(vma_mas != vma_test);
 			test_count++;
-			vma_test = mas_next(&test, count - 1);
+			vma_test = mas_next(&test, vms->vma_count - 1);
 		}
 		rcu_read_unlock();
-		BUG_ON(count != test_count);
+		BUG_ON(vms->vma_count != test_count);
 	}
 #endif

-	while (vma_iter_addr(vmi) > start)
-		vma_iter_prev_range(vmi);
+	while (vma_iter_addr(vms->vmi) > vms->start)
+		vma_iter_prev_range(vms->vmi);

 	return 0;

@@ -886,11 +895,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	MA_STATE(mas_detach, &mt_detach, 0, 0);
 	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
 	mt_on_stack(mt_detach);
+	struct vma_munmap_struct vms;
 	int error;
-	unsigned long locked_vm = 0;

-	error = vmi_gather_munmap_vmas(vmi, vma, mm, start, end, uf,
-				       &mas_detach, &locked_vm);
+	init_vma_munmap(&vms, vmi, vma, start, end, uf, unlock);
+	error = vms_gather_munmap_vmas(&vms, &mas_detach);
 	if (error)
 		goto gather_failed;

@@ -899,8 +908,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto clear_tree_failed;

 	/* Point of no return */
-	vmi_complete_munmap_vmas(vmi, vma, mm, start, end, unlock, &mas_detach,
-				 locked_vm);
+	vms_complete_munmap_vmas(&vms, &mas_detach);
 	return 0;

 clear_tree_failed:
diff --git a/mm/vma.h b/mm/vma.h
index 6efdf1768a0a..f65c739cbd00 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -21,6 +21,22 @@ struct vma_prepare {
 	struct vm_area_struct *remove2;
 };

+/*
+ * vma munmap operation
+ */
+struct vma_munmap_struct {
+	struct vma_iterator *vmi;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;	/* The first vma to munmap */
+	struct list_head *uf;		/* Userfaultfd list_head */
+	unsigned long start;		/* Aligned start addr (inclusive) */
+	unsigned long end;		/* Aligned end addr (exclusive) */
+	int vma_count;			/* Number of vmas that will be removed */
+	unsigned long nr_pages;		/* Number of pages being removed */
+	unsigned long locked_vm;	/* Number of locked pages */
+	bool unlock;			/* Unlock after the munmap */
+};
+
 struct unlink_vma_file_batch {
 	int count;
 	struct vm_area_struct *vmas[8];
--
2.45.2




* [PATCH v5.1 06/19] mm/mmap: Change munmap to use vma_munmap_struct() for accounting and surrounding vmas
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Lorenzo Stoakes

Clean up the code by changing the munmap operation to use a structure
for the accounting and munmap variables.

Since remove_mt() is only called in one location and its contents will
be reduced to almost nothing, the remains of the function can be added
to vms_complete_munmap_vmas().

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/vma.c | 79 ++++++++++++++++++++++++++++----------------------------
 mm/vma.h |  6 +++++
 2 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 9495230df3c3..816736c4f82e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -273,30 +273,6 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	return __split_vma(vmi, vma, addr, new_below);
 }

-/*
- * Ok - we have the memory areas we should free on a maple tree so release them,
- * and do the vma updates.
- *
- * Called with the mm semaphore held.
- */
-static inline void remove_mt(struct mm_struct *mm, struct ma_state *mas)
-{
-	unsigned long nr_accounted = 0;
-	struct vm_area_struct *vma;
-
-	/* Update high watermark before we lower total_vm */
-	update_hiwater_vm(mm);
-	mas_for_each(mas, vma, ULONG_MAX) {
-		long nrpages = vma_pages(vma);
-
-		if (vma->vm_flags & VM_ACCOUNT)
-			nr_accounted += nrpages;
-		vm_stat_account(mm, vma->vm_flags, -nrpages);
-		remove_vma(vma, false);
-	}
-	vm_unacct_memory(nr_accounted);
-}
-
 /*
  * init_vma_prep() - Initializer wrapper for vma_prepare struct
  * @vp: The vma_prepare struct
@@ -388,7 +364,8 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms,
 	vms->unlock = unlock;
 	vms->uf = uf;
 	vms->vma_count = 0;
-	vms->nr_pages = vms->locked_vm = 0;
+	vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
+	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
 }

 /*
@@ -715,7 +692,7 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
  * @vms: The vma munmap struct
  * @mas_detach: The maple state of the detached vmas
  *
- * This updates the mm_struct, unmaps the region, frees the resources
+ * This function updates the mm_struct, unmaps the region, frees the resources
  * used for the munmap() and may downgrade the lock - if requested.  Everything
  * needed to be done once the vma maple tree is updated.
  */
@@ -723,7 +700,7 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
 static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 		struct ma_state *mas_detach)
 {
-	struct vm_area_struct *prev, *next;
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;

 	mm = vms->mm;
@@ -732,21 +709,26 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	if (vms->unlock)
 		mmap_write_downgrade(mm);

-	prev = vma_iter_prev_range(vms->vmi);
-	next = vma_next(vms->vmi);
-	if (next)
-		vma_iter_prev_range(vms->vmi);
-
 	/*
 	 * We can free page tables without write-locking mmap_lock because VMAs
 	 * were isolated before we downgraded mmap_lock.
 	 */
 	mas_set(mas_detach, 1);
-	unmap_region(mm, mas_detach, vms->vma, prev, next, vms->start, vms->end,
-		     vms->vma_count, !vms->unlock);
-	/* Statistics and freeing VMAs */
+	unmap_region(mm, mas_detach, vms->vma, vms->prev, vms->next,
+		     vms->start, vms->end, vms->vma_count, !vms->unlock);
+	/* Update high watermark before we lower total_vm */
+	update_hiwater_vm(mm);
+	/* Stat accounting */
+	WRITE_ONCE(mm->total_vm, READ_ONCE(mm->total_vm) - vms->nr_pages);
+	mm->exec_vm -= vms->exec_vm;
+	mm->stack_vm -= vms->stack_vm;
+	mm->data_vm -= vms->data_vm;
+	/* Remove and clean up vmas */
 	mas_set(mas_detach, 0);
-	remove_mt(mm, mas_detach);
+	mas_for_each(mas_detach, vma, ULONG_MAX)
+		remove_vma(vma, false);
+
+	vm_unacct_memory(vms->nr_accounted);
 	validate_mm(mm);
 	if (vms->unlock)
 		mmap_read_unlock(mm);
@@ -794,13 +776,14 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 		if (error)
 			goto start_split_failed;
 	}
+	vms->prev = vma_prev(vms->vmi);

 	/*
 	 * Detach a range of VMAs from the mm. Using next as a temp variable as
 	 * it is always overwritten.
 	 */
-	next = vms->vma;
-	do {
+	for_each_vma_range(*(vms->vmi), next, vms->end) {
+		long nrpages;
 		/* Does it split the end? */
 		if (next->vm_end > vms->end) {
 			error = __split_vma(vms->vmi, next, vms->end, 0);
@@ -813,6 +796,22 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 		if (error)
 			goto munmap_gather_failed;
 		vma_mark_detached(next, true);
+		nrpages = vma_pages(next);
+
+		vms->nr_pages += nrpages;
+		if (next->vm_flags & VM_LOCKED)
+			vms->locked_vm += nrpages;
+
+		if (next->vm_flags & VM_ACCOUNT)
+			vms->nr_accounted += nrpages;
+
+		if (is_exec_mapping(next->vm_flags))
+			vms->exec_vm += nrpages;
+		else if (is_stack_mapping(next->vm_flags))
+			vms->stack_vm += nrpages;
+		else if (is_data_mapping(next->vm_flags))
+			vms->data_vm += nrpages;
+
 		if (next->vm_flags & VM_LOCKED)
 			vms->locked_vm += vma_pages(next);

@@ -836,7 +835,9 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 		BUG_ON(next->vm_start < vms->start);
 		BUG_ON(next->vm_start > vms->end);
 #endif
-	} for_each_vma_range(*(vms->vmi), next, vms->end);
+	}
+
+	vms->next = vma_next(vms->vmi);

 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	/* Make sure no VMAs are about to be lost. */
diff --git a/mm/vma.h b/mm/vma.h
index f65c739cbd00..7ba0d71b50ca 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -28,12 +28,18 @@ struct vma_munmap_struct {
 	struct vma_iterator *vmi;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;	/* The first vma to munmap */
+	struct vm_area_struct *prev;    /* vma before the munmap area */
+	struct vm_area_struct *next;	/* vma after the munmap area */
 	struct list_head *uf;		/* Userfaultfd list_head */
 	unsigned long start;		/* Aligned start addr (inclusive) */
 	unsigned long end;		/* Aligned end addr (exclusive) */
 	int vma_count;			/* Number of vmas that will be removed */
 	unsigned long nr_pages;		/* Number of pages being removed */
 	unsigned long locked_vm;	/* Number of locked pages */
+	unsigned long nr_accounted;	/* Number of VM_ACCOUNT pages */
+	unsigned long exec_vm;
+	unsigned long stack_vm;
+	unsigned long data_vm;
 	bool unlock;			/* Unlock after the munmap */
 };

--
2.45.2




* [PATCH v5.1 07/19] mm/mmap: Extract validate_mm() from vma_complete()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Lorenzo Stoakes

vma_complete() will need to be called at a time when it is unsafe to
call validate_mm().  Move the validate_mm() call out to all callers now
so that only one location needs to be modified in the next change.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c | 1 +
 mm/vma.c  | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1da0e8325b33..6aed0b769cc4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1805,6 +1805,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		vma_iter_store(vmi, vma);

 		vma_complete(&vp, vmi, mm);
+		validate_mm(mm);
 		khugepaged_enter_vma(vma, flags);
 		goto out;
 	}
diff --git a/mm/vma.c b/mm/vma.c
index 816736c4f82e..e4677364082b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -242,6 +242,7 @@ static int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,

 	/* vma_complete stores the new vma */
 	vma_complete(&vp, vmi, vma->vm_mm);
+	validate_mm(vma->vm_mm);

 	/* Success. */
 	if (new_below)
@@ -548,6 +549,7 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	vma_iter_store(vmi, vma);

 	vma_complete(&vp, vmi, vma->vm_mm);
+	validate_mm(vma->vm_mm);
 	return 0;

 nomem:
@@ -589,6 +591,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	vma_iter_clear(vmi);
 	vma_set_range(vma, start, end, pgoff);
 	vma_complete(&vp, vmi, vma->vm_mm);
+	validate_mm(vma->vm_mm);
 	return 0;
 }

@@ -668,7 +671,6 @@ void vma_complete(struct vma_prepare *vp,
 	}
 	if (vp->insert && vp->file)
 		uprobe_mmap(vp->insert);
-	validate_mm(mm);
 }

 /*
@@ -1196,6 +1198,7 @@ static struct vm_area_struct
 	}

 	vma_complete(&vp, vmi, mm);
+	validate_mm(mm);
 	khugepaged_enter_vma(res, vm_flags);
 	return res;

--
2.45.2




* [PATCH v5.1 08/19] mm/mmap: Inline munmap operation in mmap_region()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett,
	Lorenzo Stoakes

mmap_region() is already passed a sanitized addr and len, so change the
do_vmi_munmap() call to do_vmi_align_munmap() and inline the other
checks.

The inlining of the function and checks is an intermediate step in the
series so future patches are easier to follow.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 6aed0b769cc4..6ed27cf12217 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1388,12 +1388,21 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 			return -ENOMEM;
 	}

-	/* Unmap any existing mapping in the area */
-	error = do_vmi_munmap(&vmi, mm, addr, len, uf, false);
-	if (error == -EPERM)
-		return error;
-	else if (error)
-		return -ENOMEM;
+
+	if (unlikely(!can_modify_mm(mm, addr, end)))
+		return -EPERM;
+
+	 /* arch_unmap() might do unmaps itself.  */
+	arch_unmap(mm, addr, end);
+
+	/* Find the first overlapping VMA */
+	vma = vma_find(&vmi, end);
+	if (vma) {
+		/* Unmap any existing mapping in the area */
+		if (do_vmi_align_munmap(&vmi, vma, mm, addr, end, uf, false))
+			return -ENOMEM;
+		vma = NULL;
+	}

 	/*
 	 * Private writable mapping: check memory availability
--
2.45.2




* [PATCH v5.1 09/19] mm/mmap: Expand mmap_region() munmap call
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett,
	Lorenzo Stoakes

Open code the do_vmi_align_munmap() call so that it can be broken up
later in the series.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 6ed27cf12217..56d0aaff8478 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1366,6 +1366,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct vm_area_struct *next, *prev, *merge;
 	pgoff_t pglen = len >> PAGE_SHIFT;
 	unsigned long charged = 0;
+	struct vma_munmap_struct vms;
+	struct ma_state mas_detach;
+	struct maple_tree mt_detach;
 	unsigned long end = addr + len;
 	unsigned long merge_start = addr, merge_end = end;
 	bool writable_file_mapping = false;
@@ -1398,10 +1401,27 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	/* Find the first overlapping VMA */
 	vma = vma_find(&vmi, end);
 	if (vma) {
-		/* Unmap any existing mapping in the area */
-		if (do_vmi_align_munmap(&vmi, vma, mm, addr, end, uf, false))
+		mt_init_flags(&mt_detach, vmi.mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
+		mt_on_stack(mt_detach);
+		mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
+		init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
+		/* Prepare to unmap any existing mapping in the area */
+		if (vms_gather_munmap_vmas(&vms, &mas_detach))
+			return -ENOMEM;
+
+		/* Remove any existing mappings from the vma tree */
+		if (vma_iter_clear_gfp(&vmi, addr, end, GFP_KERNEL))
 			return -ENOMEM;
+
+		/* Unmap any existing mapping in the area */
+		vms_complete_munmap_vmas(&vms, &mas_detach);
+		next = vms.next;
+		prev = vms.prev;
+		vma_prev(&vmi);
 		vma = NULL;
+	} else {
+		next = vma_next(&vmi);
+		prev = vma_prev(&vmi);
 	}

 	/*
@@ -1414,8 +1434,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vm_flags |= VM_ACCOUNT;
 	}

-	next = vma_next(&vmi);
-	prev = vma_prev(&vmi);
 	if (vm_flags & VM_SPECIAL) {
 		if (prev)
 			vma_iter_next_range(&vmi);
--
2.45.2




* [PATCH v5.1 10/19] mm/mmap: Support vma == NULL in init_vma_munmap()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett

Adding support for a NULL vma means init_vma_munmap() can always be
used to initialize the vma_munmap_struct, which makes the later call to
vms_complete_munmap_vmas() less error-prone.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
 mm/mmap.c |  5 +----
 mm/vma.c  | 26 ++++++++++++++++----------
 mm/vma.h  | 14 ++++++++++++++
 3 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 56d0aaff8478..16fd14b243f9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1395,16 +1395,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	if (unlikely(!can_modify_mm(mm, addr, end)))
 		return -EPERM;

-	 /* arch_unmap() might do unmaps itself.  */
-	arch_unmap(mm, addr, end);
-
 	/* Find the first overlapping VMA */
 	vma = vma_find(&vmi, end);
+	init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
 	if (vma) {
 		mt_init_flags(&mt_detach, vmi.mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
 		mt_on_stack(mt_detach);
 		mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
-		init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
 		/* Prepare to unmap any existing mapping in the area */
 		if (vms_gather_munmap_vmas(&vms, &mas_detach))
 			return -ENOMEM;
diff --git a/mm/vma.c b/mm/vma.c
index e4677364082b..a5ca42b7161b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -352,16 +352,22 @@ static void __vma_link_file(struct vm_area_struct *vma,
  * @uf: The userfaultfd list_head
  * @unlock: Unlock after the operation.  Only unlocked on success
  */
-static inline void init_vma_munmap(struct vma_munmap_struct *vms,
+void
+init_vma_munmap(struct vma_munmap_struct *vms,
 		struct vma_iterator *vmi, struct vm_area_struct *vma,
-		unsigned long start, unsigned long end, struct list_head *uf,
-		bool unlock)
+		unsigned long start, unsigned long end,
+		struct list_head *uf, bool unlock)
 {
 	vms->vmi = vmi;
 	vms->vma = vma;
-	vms->mm = vma->vm_mm;
-	vms->start = start;
-	vms->end = end;
+	if (vma) {
+		vms->mm = vma->vm_mm;
+		vms->start = start;
+		vms->end = end;
+	} else {
+		vms->mm = NULL;
+		vms->start = vms->end = 0;
+	}
 	vms->unlock = unlock;
 	vms->uf = uf;
 	vms->vma_count = 0;
@@ -699,8 +705,8 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
  * needed to be done once the vma maple tree is updated.
  */

-static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
-		struct ma_state *mas_detach)
+void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
+			      struct ma_state *mas_detach)
 {
 	struct vm_area_struct *vma;
 	struct mm_struct *mm;
@@ -748,8 +754,8 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
  *
  * Return: 0 on success
  */
-static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
-		struct ma_state *mas_detach)
+int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
+			   struct ma_state *mas_detach)
 {
 	struct vm_area_struct *next = NULL;
 	int error = -ENOMEM;
diff --git a/mm/vma.h b/mm/vma.h
index 7ba0d71b50ca..8b2401f93c74 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -63,6 +63,12 @@ void anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma);
 /* Required for do_brk_flags(). */
 void vma_prepare(struct vma_prepare *vp);

+/* Required for mmap_region() */
+void init_vma_munmap(struct vma_munmap_struct *vms,
+		     struct vma_iterator *vmi, struct vm_area_struct *vma,
+		     unsigned long start, unsigned long end,
+		     struct list_head *uf, bool unlock);
+
 /* Required for do_brk_flags(). */
 void init_vma_prep(struct vma_prepare *vp,
 		   struct vm_area_struct *vma);
@@ -78,6 +84,14 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	       unsigned long start, unsigned long end, pgoff_t pgoff);

+/* Required for mmap_region() */
+void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
+			      struct ma_state *mas_detach);
+
+/* Required for mmap_region() */
+int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
+			   struct ma_state *mas_detach);
+
 int
 do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		    struct mm_struct *mm, unsigned long start,
--
2.45.2




* [PATCH v5.1 11/19] mm/mmap: Reposition vma iterator in mmap_region()
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett,
	Lorenzo Stoakes

Instead of moving (or leaving) the vma iterator pointing at the previous
vma, leave it pointing at the insert location.  Pointing the vma
iterator at the insert location allows for a cleaner walk of the vma
tree for MAP_FIXED and the no-expansion cases.

The vma_prev() call in the case of merging the previous vma is
equivalent to vma_iter_prev_range(), since the vma iterator will be
pointing to the location just before the previous vma.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 16fd14b243f9..6720b55b47ed 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1404,21 +1404,22 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
 		/* Prepare to unmap any existing mapping in the area */
 		if (vms_gather_munmap_vmas(&vms, &mas_detach))
-			return -ENOMEM;
+			goto gather_failed;

 		/* Remove any existing mappings from the vma tree */
 		if (vma_iter_clear_gfp(&vmi, addr, end, GFP_KERNEL))
-			return -ENOMEM;
+			goto clear_tree_failed;

 		/* Unmap any existing mapping in the area */
 		vms_complete_munmap_vmas(&vms, &mas_detach);
 		next = vms.next;
 		prev = vms.prev;
-		vma_prev(&vmi);
 		vma = NULL;
 	} else {
 		next = vma_next(&vmi);
 		prev = vma_prev(&vmi);
+		if (prev)
+			vma_iter_next_range(&vmi);
 	}

 	/*
@@ -1431,11 +1432,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vm_flags |= VM_ACCOUNT;
 	}

-	if (vm_flags & VM_SPECIAL) {
-		if (prev)
-			vma_iter_next_range(&vmi);
+	if (vm_flags & VM_SPECIAL)
 		goto cannot_expand;
-	}

 	/* Attempt to expand an old mapping */
 	/* Check next */
@@ -1456,19 +1454,21 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		merge_start = prev->vm_start;
 		vma = prev;
 		vm_pgoff = prev->vm_pgoff;
-	} else if (prev) {
-		vma_iter_next_range(&vmi);
+		vma_prev(&vmi); /* Equivalent to going to the previous range */
 	}

-	/* Actually expand, if possible */
-	if (vma &&
-	    !vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
-		khugepaged_enter_vma(vma, vm_flags);
-		goto expanded;
+	if (vma) {
+		/* Actually expand, if possible */
+		if (!vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
+			khugepaged_enter_vma(vma, vm_flags);
+			goto expanded;
+		}
+
+		/* If the expand fails, then reposition the vma iterator */
+		if (unlikely(vma == prev))
+			vma_iter_set(&vmi, addr);
 	}

-	if (vma == prev)
-		vma_iter_set(&vmi, addr);
 cannot_expand:

 	/*
@@ -1629,6 +1629,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vm_unacct_memory(charged);
 	validate_mm(mm);
 	return error;
+
+clear_tree_failed:
+	abort_munmap_vmas(&mas_detach);
+gather_failed:
+	validate_mm(mm);
+	return -ENOMEM;
 }

 static int __vm_munmap(unsigned long start, size_t len, bool unlock)
--
2.45.2




* [PATCH v5.1 12/19] mm/mmap: Track start and end of munmap in vma_munmap_struct
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett,
	Lorenzo Stoakes

Set the start and end addresses for the munmap when the prev and next
vmas are gathered.  This is needed to avoid incorrect addresses being
used in vms_complete_munmap_vmas() if the prev/next vmas are expanded.

Add a new helper, vms_complete_pte_clear(), which is needed later and
avoids growing the argument list of unmap_region() beyond the nine
arguments it already has.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c | 38 ++++++++++++++++++++++++++++++--------
 mm/vma.h |  5 +++++
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index a5ca42b7161b..e106d412c4c3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -373,6 +373,8 @@ init_vma_munmap(struct vma_munmap_struct *vms,
 	vms->vma_count = 0;
 	vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
 	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
+	vms->unmap_start = FIRST_USER_ADDRESS;
+	vms->unmap_end = USER_PGTABLES_CEILING;
 }

 /*
@@ -684,7 +686,7 @@ void vma_complete(struct vma_prepare *vp,
  *
  * Reattach any detached vmas and free up the maple tree used to track the vmas.
  */
-static inline void abort_munmap_vmas(struct ma_state *mas_detach)
+void abort_munmap_vmas(struct ma_state *mas_detach)
 {
 	struct vm_area_struct *vma;

@@ -695,6 +697,28 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
 	__mt_destroy(mas_detach->tree);
 }

+static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
+		struct ma_state *mas_detach, bool mm_wr_locked)
+{
+	struct mmu_gather tlb;
+
+	/*
+	 * We can free page tables without write-locking mmap_lock because VMAs
+	 * were isolated before we downgraded mmap_lock.
+	 */
+	mas_set(mas_detach, 1);
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, vms->mm);
+	update_hiwater_rss(vms->mm);
+	unmap_vmas(&tlb, mas_detach, vms->vma, vms->start, vms->end,
+		   vms->vma_count, mm_wr_locked);
+	mas_set(mas_detach, 1);
+	/* start and end may be different if there is no prev or next vma. */
+	free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start,
+		      vms->unmap_end, mm_wr_locked);
+	tlb_finish_mmu(&tlb);
+}
+
 /*
  * vms_complete_munmap_vmas() - Finish the munmap() operation
  * @vms: The vma munmap struct
@@ -717,13 +741,7 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	if (vms->unlock)
 		mmap_write_downgrade(mm);

-	/*
-	 * We can free page tables without write-locking mmap_lock because VMAs
-	 * were isolated before we downgraded mmap_lock.
-	 */
-	mas_set(mas_detach, 1);
-	unmap_region(mm, mas_detach, vms->vma, vms->prev, vms->next,
-		     vms->start, vms->end, vms->vma_count, !vms->unlock);
+	vms_complete_pte_clear(vms, mas_detach, !vms->unlock);
 	/* Update high watermark before we lower total_vm */
 	update_hiwater_vm(mm);
 	/* Stat accounting */
@@ -785,6 +803,8 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 			goto start_split_failed;
 	}
 	vms->prev = vma_prev(vms->vmi);
+	if (vms->prev)
+		vms->unmap_start = vms->prev->vm_end;

 	/*
 	 * Detach a range of VMAs from the mm. Using next as a temp variable as
@@ -846,6 +866,8 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 	}

 	vms->next = vma_next(vms->vmi);
+	if (vms->next)
+		vms->unmap_end = vms->next->vm_start;

 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	/* Make sure no VMAs are about to be lost. */
diff --git a/mm/vma.h b/mm/vma.h
index 8b2401f93c74..b857e7dc4bfe 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -33,6 +33,8 @@ struct vma_munmap_struct {
 	struct list_head *uf;		/* Userfaultfd list_head */
 	unsigned long start;		/* Aligned start addr (inclusive) */
 	unsigned long end;		/* Aligned end addr (exclusive) */
+	unsigned long unmap_start;	/* Unmap PTE start */
+	unsigned long unmap_end;	/* Unmap PTE end */
 	int vma_count;			/* Number of vmas that will be removed */
 	unsigned long nr_pages;		/* Number of pages being removed */
 	unsigned long locked_vm;	/* Number of locked pages */
@@ -54,6 +56,9 @@ void validate_mm(struct mm_struct *mm);
 #define validate_mm(mm) do { } while (0)
 #endif

+/* Required for mmap_region() */
+void abort_munmap_vmas(struct ma_state *mas_detach);
+
 /* Required for expand_downwards(). */
 void anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma);

--
2.45.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5.1 13/19] mm/mmap: Clean up unmap_region() argument list
  2024-08-16 11:13 [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Bert Karwatzki
                   ` (11 preceding siblings ...)
  2024-08-16 11:13 ` [PATCH v5.1 12/19] mm/mmap: Track start and end of munmap in vma_munmap_struct Bert Karwatzki
@ 2024-08-16 11:13 ` Bert Karwatzki
  2024-08-16 11:13 ` [PATCH v5.1 14/19] mm/mmap: Avoid zeroing vma tree in mmap_region() Bert Karwatzki
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett,
	Lorenzo Stoakes

With the only caller to unmap_region() being the error path of
mmap_region(), the argument list can be significantly reduced.

There is also no need to forward declare the static function any
longer.
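
For reference, the single call site in mmap_region() collapses roughly from

	unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
		     vma->vm_end, vma->vm_end, true);

to

	unmap_region(&vmi.mas, vma, prev, next);

with the mm, the address range, and the mm_wr_locked state now derived from
vma inside the helper.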

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c |  3 +--
 mm/vma.c  | 17 ++++++++---------
 mm/vma.h  |  6 ++----
 3 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 6720b55b47ed..1f0b8dc5a089 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1617,8 +1617,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

 		vma_iter_set(&vmi, vma->vm_end);
 		/* Undo any partial mapping done by a device driver. */
-		unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
-			     vma->vm_end, vma->vm_end, true);
+		unmap_region(&vmi.mas, vma, prev, next);
 	}
 	if (writable_file_mapping)
 		mapping_unmap_writable(file->f_mapping);
diff --git a/mm/vma.c b/mm/vma.c
index e106d412c4c3..0244320d76ab 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -155,22 +155,21 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable)
  *
  * Called with the mm semaphore held.
  */
-void unmap_region(struct mm_struct *mm, struct ma_state *mas,
-		struct vm_area_struct *vma, struct vm_area_struct *prev,
-		struct vm_area_struct *next, unsigned long start,
-		unsigned long end, unsigned long tree_end, bool mm_wr_locked)
+void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
+		struct vm_area_struct *prev, struct vm_area_struct *next)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather tlb;
-	unsigned long mt_start = mas->index;

 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
-	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
-	mas_set(mas, mt_start);
+	unmap_vmas(&tlb, mas, vma, vma->vm_start, vma->vm_end, vma->vm_end,
+		   /* mm_wr_locked = */ true);
+	mas_set(mas, vma->vm_end);
 	free_pgtables(&tlb, mas, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-				 next ? next->vm_start : USER_PGTABLES_CEILING,
-				 mm_wr_locked);
+		      next ? next->vm_start : USER_PGTABLES_CEILING,
+		      /* mm_wr_locked = */ true);
 	tlb_finish_mmu(&tlb);
 }

diff --git a/mm/vma.h b/mm/vma.h
index b857e7dc4bfe..2a2ca489e622 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -108,10 +108,8 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,

 void remove_vma(struct vm_area_struct *vma, bool unreachable);

-void unmap_region(struct mm_struct *mm, struct ma_state *mas,
-		struct vm_area_struct *vma, struct vm_area_struct *prev,
-		struct vm_area_struct *next, unsigned long start,
-		unsigned long end, unsigned long tree_end, bool mm_wr_locked);
+void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
+		struct vm_area_struct *prev, struct vm_area_struct *next);

 /* Required by mmap_region(). */
 bool
--
2.45.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5.1 14/19] mm/mmap: Avoid zeroing vma tree in mmap_region()
  2024-08-16 11:13 [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Bert Karwatzki
                   ` (12 preceding siblings ...)
  2024-08-16 11:13 ` [PATCH v5.1 13/19] mm/mmap: Clean up unmap_region() argument list Bert Karwatzki
@ 2024-08-16 11:13 ` Bert Karwatzki
  2024-08-16 11:13 ` [PATCH v5.1 15/19] mm/mmap: Use PHYS_PFN " Bert Karwatzki
  2024-08-16 11:33 ` [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Lorenzo Stoakes
  15 siblings, 0 replies; 18+ messages in thread
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett

Instead of zeroing the vma tree and then overwriting the area, let the
area be overwritten and then clean up the gathered vmas using
vms_complete_munmap_vmas().

If a driver is mapping over an existing vma, then clear the ptes before
the call_mmap() invocation.  If the vma has a vm_ops->close(), then call
the close() function.  This is done using the vms_clear_ptes() and
vms_close_vmas() helpers.  This has the side effect of needing to call
open() on the vmas if the mmap_region() fails later on.

Temporarily keep track of the number of pages that will be removed and
reduce the charged amount.
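
Condensed from the diff below, the accounting now subtracts what the old
mappings had already charged:

	charged = len >> PAGE_SHIFT;
	charged -= nr_accounted;	/* already charged by the vmas being replaced */
	if (security_vm_enough_memory_mm(mm, charged))
		goto abort_munmap;
	vms.nr_accounted = 0;		/* do not un-account it again at completion */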

This commit drops the validate_mm() call in the vma_expand() function.
Dropping the validation is necessary because it would fail: the mm
map_count is incorrect during a vma expansion, prior to the cleanup
performed by vms_complete_munmap_vmas().

Clean up the error handling of vms_gather_munmap_vmas() by calling the
verification (validate_mm()) within the function itself.

Note that before this change, a MAP_FIXED mapping could fail and leave a
gap in the vma tree.  With this change, a MAP_FIXED failure will not
create a gap: it will leave *a* vma in the area (which may have been
split) and attempt to restore the vmas to an operational state
(re-attached and vm_ops->open()'ed if they were vm_ops->close()'d).
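
A condensed sketch of the resulting MAP_FIXED path, taken from the diff
below (error handling and the unrelated parts of mmap_region() elided):

	/* gather the old vmas but leave them in the tree for now */
	if (vms_gather_munmap_vmas(&vms, &mas_detach))
		return -ENOMEM;
	...
	if (file) {
		/* call_mmap() may install PTEs, so clear the old ones first */
		vms_clear_ptes(&vms, &mas_detach, /* mm_wr_locked = */ true);
		vms_close_vmas(&vms, &mas_detach);
		error = call_mmap(file, vma);
		...
	}
	...
	/* success: finish tearing down the gathered vmas */
	if (vms.nr_pages)
		vms_complete_munmap_vmas(&vms, &mas_detach);
	...
	/* failure: re-attach the vmas and ->open() anything ->close()'d */
	if (vms.nr_pages)
		abort_munmap_vmas(&mas_detach, vms.closed);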

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
 mm/mmap.c | 60 +++++++++++++++++++++++++------------------------------
 mm/vma.c  | 59 ++++++++++++++++++++++++++++++++++++++++++------------
 mm/vma.h  | 16 ++++++++++++---
 3 files changed, 86 insertions(+), 49 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1f0b8dc5a089..e152b6caaf9c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1373,24 +1373,19 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long merge_start = addr, merge_end = end;
 	bool writable_file_mapping = false;
 	pgoff_t vm_pgoff;
-	int error;
+	int error = -ENOMEM;
 	VMA_ITERATOR(vmi, mm, addr);
+	unsigned long nr_pages, nr_accounted;

-	/* Check against address space limit. */
-	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
-		unsigned long nr_pages;
-
-		/*
-		 * MAP_FIXED may remove pages of mappings that intersects with
-		 * requested mapping. Account for the pages it would unmap.
-		 */
-		nr_pages = count_vma_pages_range(mm, addr, end);
-
-		if (!may_expand_vm(mm, vm_flags,
-					(len >> PAGE_SHIFT) - nr_pages))
-			return -ENOMEM;
-	}
+	nr_pages = count_vma_pages_range(mm, addr, end, &nr_accounted);

+	/*
+	 * Check against address space limit.
+	 * MAP_FIXED may remove pages of mappings that intersects with requested
+	 * mapping. Account for the pages it would unmap.
+	 */
+	if (!may_expand_vm(mm, vm_flags, (len >> PAGE_SHIFT) - nr_pages))
+		return -ENOMEM;

 	if (unlikely(!can_modify_mm(mm, addr, end)))
 		return -EPERM;
@@ -1404,14 +1399,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
 		/* Prepare to unmap any existing mapping in the area */
 		if (vms_gather_munmap_vmas(&vms, &mas_detach))
-			goto gather_failed;
-
-		/* Remove any existing mappings from the vma tree */
-		if (vma_iter_clear_gfp(&vmi, addr, end, GFP_KERNEL))
-			goto clear_tree_failed;
+			return -ENOMEM;

-		/* Unmap any existing mapping in the area */
-		vms_complete_munmap_vmas(&vms, &mas_detach);
 		next = vms.next;
 		prev = vms.prev;
 		vma = NULL;
@@ -1427,8 +1416,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 */
 	if (accountable_mapping(file, vm_flags)) {
 		charged = len >> PAGE_SHIFT;
+		charged -= nr_accounted;
 		if (security_vm_enough_memory_mm(mm, charged))
-			return -ENOMEM;
+			goto abort_munmap;
+		vms.nr_accounted = 0;
 		vm_flags |= VM_ACCOUNT;
 	}

@@ -1477,10 +1468,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * not unmapped, but the maps are removed from the list.
 	 */
 	vma = vm_area_alloc(mm);
-	if (!vma) {
-		error = -ENOMEM;
+	if (!vma)
 		goto unacct_error;
-	}

 	vma_iter_config(&vmi, addr, end);
 	vma_set_range(vma, addr, end, pgoff);
@@ -1489,6 +1478,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

 	if (file) {
 		vma->vm_file = get_file(file);
+		/* call_mmap() may map PTE, so ensure there are no existing PTEs */
+		vms_clear_ptes(&vms, &mas_detach, /* mm_wr_locked = */ true);
+		vms_close_vmas(&vms, &mas_detach);
 		error = call_mmap(file, vma);
 		if (error)
 			goto unmap_and_free_vma;
@@ -1579,6 +1571,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 expanded:
 	perf_event_mmap(vma);

+	/* Unmap any existing mapping in the area */
+	if (vms.nr_pages)
+		vms_complete_munmap_vmas(&vms, &mas_detach);
+
 	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
 	if (vm_flags & VM_LOCKED) {
 		if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
@@ -1626,14 +1622,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 unacct_error:
 	if (charged)
 		vm_unacct_memory(charged);
-	validate_mm(mm);
-	return error;

-clear_tree_failed:
-	abort_munmap_vmas(&mas_detach);
-gather_failed:
+abort_munmap:
+	if (vms.nr_pages)
+		abort_munmap_vmas(&mas_detach, vms.closed);
 	validate_mm(mm);
-	return -ENOMEM;
+	return error;
 }

 static int __vm_munmap(unsigned long start, size_t len, bool unlock)
@@ -1972,7 +1966,7 @@ void exit_mmap(struct mm_struct *mm)
 	do {
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += vma_pages(vma);
-		remove_vma(vma, true);
+		remove_vma(vma, /* unreachable = */ true, /* closed = */ false);
 		count++;
 		cond_resched();
 		vma = vma_next(&vmi);
diff --git a/mm/vma.c b/mm/vma.c
index 0244320d76ab..86757443a7a2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -136,10 +136,10 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 /*
  * Close a vm structure and free it.
  */
-void remove_vma(struct vm_area_struct *vma, bool unreachable)
+void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed)
 {
 	might_sleep();
-	if (vma->vm_ops && vma->vm_ops->close)
+	if (!closed && vma->vm_ops && vma->vm_ops->close)
 		vma->vm_ops->close(vma);
 	if (vma->vm_file)
 		fput(vma->vm_file);
@@ -374,6 +374,8 @@ init_vma_munmap(struct vma_munmap_struct *vms,
 	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
 	vms->unmap_start = FIRST_USER_ADDRESS;
 	vms->unmap_end = USER_PGTABLES_CEILING;
+	vms->clear_ptes = false;	/* No PTEs to clear yet */
+	vms->closed = false;
 }

 /*
@@ -556,7 +558,6 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	vma_iter_store(vmi, vma);

 	vma_complete(&vp, vmi, vma->vm_mm);
-	validate_mm(vma->vm_mm);
 	return 0;

 nomem:
@@ -685,22 +686,30 @@ void vma_complete(struct vma_prepare *vp,
  *
  * Reattach any detached vmas and free up the maple tree used to track the vmas.
  */
-void abort_munmap_vmas(struct ma_state *mas_detach)
+void abort_munmap_vmas(struct ma_state *mas_detach, bool closed)
 {
 	struct vm_area_struct *vma;

 	mas_set(mas_detach, 0);
-	mas_for_each(mas_detach, vma, ULONG_MAX)
+	mas_for_each(mas_detach, vma, ULONG_MAX) {
+		if (closed && vma->vm_ops && vma->vm_ops->close &&
+		    vma->vm_ops->open)
+			vma->vm_ops->open(vma);
+
 		vma_mark_detached(vma, false);
+	}

 	__mt_destroy(mas_detach->tree);
 }

-static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
-		struct ma_state *mas_detach, bool mm_wr_locked)
+void vms_clear_ptes(struct vma_munmap_struct *vms,
+		    struct ma_state *mas_detach, bool mm_wr_locked)
 {
 	struct mmu_gather tlb;

+	if (!vms->clear_ptes)
+		return;
+
 	/*
 	 * We can free page tables without write-locking mmap_lock because VMAs
 	 * were isolated before we downgraded mmap_lock.
@@ -716,6 +725,22 @@ static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
 	free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start,
 		      vms->unmap_end, mm_wr_locked);
 	tlb_finish_mmu(&tlb);
+	vms->clear_ptes = false;
+}
+
+void vms_close_vmas(struct vma_munmap_struct *vms, struct ma_state *mas_detach)
+{
+	struct vm_area_struct *vma;
+
+	if (!vms->vma_count)
+		return;
+
+	mas_set(mas_detach, 0);
+	mas_for_each(mas_detach, vma, ULONG_MAX)
+		if (vma->vm_ops && vma->vm_ops->close)
+			vma->vm_ops->close(vma);
+
+	vms->closed = true;
 }

 /*
@@ -740,7 +765,7 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	if (vms->unlock)
 		mmap_write_downgrade(mm);

-	vms_complete_pte_clear(vms, mas_detach, !vms->unlock);
+	vms_clear_ptes(vms, mas_detach, !vms->unlock);
 	/* Update high watermark before we lower total_vm */
 	update_hiwater_vm(mm);
 	/* Stat accounting */
@@ -751,7 +776,7 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	/* Remove and clean up vmas */
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		remove_vma(vma, false);
+		remove_vma(vma, /* unreachable = */ false, vms->closed);

 	vm_unacct_memory(vms->nr_accounted);
 	validate_mm(mm);
@@ -891,14 +916,18 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 	while (vma_iter_addr(vms->vmi) > vms->start)
 		vma_iter_prev_range(vms->vmi);

+	/* There are now PTEs that need to be cleared */
+	vms->clear_ptes = true;
+
 	return 0;

 userfaultfd_error:
 munmap_gather_failed:
 end_split_failed:
-	abort_munmap_vmas(mas_detach);
+	abort_munmap_vmas(mas_detach, /* closed = */ false);
 start_split_failed:
 map_count_exceeded:
+	validate_mm(vms->mm);
 	return error;
 }

@@ -942,9 +971,9 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	return 0;

 clear_tree_failed:
-	abort_munmap_vmas(&mas_detach);
-gather_failed:
+	abort_munmap_vmas(&mas_detach, /* closed = */ false);
 	validate_mm(mm);
+gather_failed:
 	return error;
 }

@@ -1667,17 +1696,21 @@ bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 }

 unsigned long count_vma_pages_range(struct mm_struct *mm,
-				    unsigned long addr, unsigned long end)
+				    unsigned long addr, unsigned long end,
+				    unsigned long *nr_accounted)
 {
 	VMA_ITERATOR(vmi, mm, addr);
 	struct vm_area_struct *vma;
 	unsigned long nr_pages = 0;

+	*nr_accounted = 0;
 	for_each_vma_range(vmi, vma, end) {
 		unsigned long vm_start = max(addr, vma->vm_start);
 		unsigned long vm_end = min(end, vma->vm_end);

 		nr_pages += PHYS_PFN(vm_end - vm_start);
+		if (vma->vm_flags & VM_ACCOUNT)
+			*nr_accounted += PHYS_PFN(vm_end - vm_start);
 	}

 	return nr_pages;
diff --git a/mm/vma.h b/mm/vma.h
index 2a2ca489e622..73297dc7fa28 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -43,6 +43,8 @@ struct vma_munmap_struct {
 	unsigned long stack_vm;
 	unsigned long data_vm;
 	bool unlock;			/* Unlock after the munmap */
+	bool clear_ptes;		/* If there are outstanding PTE to be cleared */
+	bool closed;			/* vma->vm_ops->close() called already */
 };

 struct unlink_vma_file_batch {
@@ -57,7 +59,7 @@ void validate_mm(struct mm_struct *mm);
 #endif

 /* Required for mmap_region() */
-void abort_munmap_vmas(struct ma_state *mas_detach);
+void abort_munmap_vmas(struct ma_state *mas_detach, bool closed);

 /* Required for expand_downwards(). */
 void anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma);
@@ -97,6 +99,13 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 			   struct ma_state *mas_detach);

+/* Required for mmap_region() */
+void vms_clear_ptes(struct vma_munmap_struct *vms,
+		    struct ma_state *mas_detach, bool mm_wr_locked);
+
+/* Required for mmap_region() */
+void vms_close_vmas(struct vma_munmap_struct *vms, struct ma_state *mas_detach);
+
 int
 do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		    struct mm_struct *mm, unsigned long start,
@@ -106,7 +115,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		  unsigned long start, size_t len, struct list_head *uf,
 		  bool unlock);

-void remove_vma(struct vm_area_struct *vma, bool unreachable);
+void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);

 void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct vm_area_struct *next);
@@ -220,7 +229,8 @@ bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);
 int mm_take_all_locks(struct mm_struct *mm);
 void mm_drop_all_locks(struct mm_struct *mm);
 unsigned long count_vma_pages_range(struct mm_struct *mm,
-				    unsigned long addr, unsigned long end);
+				    unsigned long addr, unsigned long end,
+				    unsigned long *nr_accounted);

 static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma)
 {
--
2.45.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5.1 15/19] mm/mmap: Use PHYS_PFN in mmap_region()
  2024-08-16 11:13 [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Bert Karwatzki
                   ` (13 preceding siblings ...)
  2024-08-16 11:13 ` [PATCH v5.1 14/19] mm/mmap: Avoid zeroing vma tree in mmap_region() Bert Karwatzki
@ 2024-08-16 11:13 ` Bert Karwatzki
  2024-08-16 11:33 ` [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Lorenzo Stoakes
  15 siblings, 0 replies; 18+ messages in thread
From: Bert Karwatzki @ 2024-08-16 11:13 UTC (permalink / raw)
  To: Liam R . Howlett
  Cc: Bert Karwatzki, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu, Liam R . Howlett,
	Lorenzo Stoakes

Instead of shifting the length by PAGE_SHIFT, use PHYS_PFN.  Also use the
existing local variable everywhere instead of only some of the time.
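
For reference, PHYS_PFN() is (at the time of writing) defined in
include/linux/pfn.h as the same shift, so this is purely a readability
change:

	#define PHYS_PFN(x)	((unsigned long)((x) >> PAGE_SHIFT))

	pgoff_t pglen = PHYS_PFN(len);	/* equivalent to len >> PAGE_SHIFT */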

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/mmap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index e152b6caaf9c..57fb1c82a852 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1364,7 +1364,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = NULL;
 	struct vm_area_struct *next, *prev, *merge;
-	pgoff_t pglen = len >> PAGE_SHIFT;
+	pgoff_t pglen = PHYS_PFN(len);
 	unsigned long charged = 0;
 	struct vma_munmap_struct vms;
 	struct ma_state mas_detach;
@@ -1384,7 +1384,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * MAP_FIXED may remove pages of mappings that intersects with requested
 	 * mapping. Account for the pages it would unmap.
 	 */
-	if (!may_expand_vm(mm, vm_flags, (len >> PAGE_SHIFT) - nr_pages))
+	if (!may_expand_vm(mm, vm_flags, pglen - nr_pages))
 		return -ENOMEM;

 	if (unlikely(!can_modify_mm(mm, addr, end)))
@@ -1415,7 +1415,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	 * Private writable mapping: check memory availability
 	 */
 	if (accountable_mapping(file, vm_flags)) {
-		charged = len >> PAGE_SHIFT;
+		charged = pglen;
 		charged -= nr_accounted;
 		if (security_vm_enough_memory_mm(mm, charged))
 			goto abort_munmap;
@@ -1575,14 +1575,14 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	if (vms.nr_pages)
 		vms_complete_munmap_vmas(&vms, &mas_detach);

-	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
+	vm_stat_account(mm, vm_flags, pglen);
 	if (vm_flags & VM_LOCKED) {
 		if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
 					is_vm_hugetlb_page(vma) ||
 					vma == get_gate_vma(current->mm))
 			vm_flags_clear(vma, VM_LOCKED_MASK);
 		else
-			mm->locked_vm += (len >> PAGE_SHIFT);
+			mm->locked_vm += pglen;
 	}

 	if (file)
--
2.45.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816
  2024-08-16 11:13 [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Bert Karwatzki
                   ` (14 preceding siblings ...)
  2024-08-16 11:13 ` [PATCH v5.1 15/19] mm/mmap: Use PHYS_PFN " Bert Karwatzki
@ 2024-08-16 11:33 ` Lorenzo Stoakes
  2024-08-16 11:55   ` Bert Karwatzki
  15 siblings, 1 reply; 18+ messages in thread
From: Lorenzo Stoakes @ 2024-08-16 11:33 UTC (permalink / raw)
  To: Bert Karwatzki
  Cc: Liam R . Howlett, Suren Baghdasaryan, Vlastimil Babka,
	Lorenzo Stoakes, Matthew Wilcox, sidhartha.kumar,
	Paul E . McKenney, Jiri Olsa, linux-kernel, linux-mm,
	Andrew Morton, Kees Cook, Jeff Xu

On Fri, Aug 16, 2024 at 01:13:41PM GMT, Bert Karwatzki wrote:
> Since the v5 patchset some changes have taken place in the linux-next tree
> which make it impossible to cleanly apply that patchset. The most important
> of these changes is the splitting of the mm/mmap.c into mm/mmap.c and mm/vma.c
> and the splitting of mm/internal.h into mm/internal.h and mm/vma.h. Also
> arch_unmap() has already been removed from mm/mmap.c in next-20240816 so
> there's no need to take care of that.
>
> When testing this with `stress-ng --vm-segv 1` dmesg show errors like
>
>  [   T3753] coredump: 3753(stress-ng-vm-se): over coredump resource limit, skipping core dump
>  [   T3753] coredump: 3753(stress-ng-vm-se): coredump has not been created, error -7
>  [   T3754] coredump: 3754(stress-ng-vm-se): over coredump resource limit, skipping core dump
>  [   T3754] coredump: 3754(stress-ng-vm-se): coredump has not been created, error -7
>  [...]
>
> this is most likely not a problem of the patchset but instead caused by
> a more verbose coredump introduced in the following patch:
> https://lore.kernel.org/all/20240718182743.1959160-3-romank@linux.microsoft.com/
>
> Changes since v4:
>  - rebase on next-20240816
>  - some functions which were static in the original v5 patchset
>    are now non-static as they're used in both mmap.c and vma.c
>    (an alternative approach to this would have been to leave
>    them as "static inline" and put them into vma.h (at least if
>    they're only used in mmap.c and vma.c))
>
> Bert Karwatzki
>
> Liam R. Howlett (19):
>   mm/mmap: Correctly position vma_iterator in __split_vma()
>   mm/mmap: Introduce abort_munmap_vmas()
>   mm/mmap: Introduce vmi_complete_munmap_vmas()
>   mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap()
>   mm/mmap: Introduce vma_munmap_struct for use in munmap operations
>   mm/mmap: Change munmap to use vma_munmap_struct() for accounting and
>     surrounding vmas
>   mm/mmap: Extract validate_mm() from vma_complete()
>   mm/mmap: Inline munmap operation in mmap_region()
>   mm/mmap: Expand mmap_region() munmap call
>   mm/mmap: Support vma == NULL in init_vma_munmap()
>   mm/mmap: Reposition vma iterator in mmap_region()
>   mm/mmap: Track start and end of munmap in vma_munmap_struct
>   mm/mmap: Clean up unmap_region() argument list
>   mm/mmap: Avoid zeroing vma tree in mmap_region()
>   mm/mmap: Use PHYS_PFN in mmap_region()
>   mm/mmap: Use vms accounted pages in mmap_region()
>   mm/mmap: Move can_modify_mm() check down the stack
>   ipc/shm, mm: Drop do_vma_munmap()
>   mm/mmap: Move may_expand_vm() check in mmap_region()
>
>  include/linux/mm.h |   6 +-
>  ipc/shm.c          |   8 +-
>  mm/mmap.c          | 146 ++++++++---------
>  mm/vma.c           | 397 +++++++++++++++++++++++++++++++--------------
>  mm/vma.h           |  61 ++++++-
>  5 files changed, 403 insertions(+), 215 deletions(-)
>
> --
> 2.45.2
>

I appreciate the effort taken in rebasing, but this is quite confusing:
it's as if you've re-sent the series (with tags...) as a proper
submission, which I'm sure isn't your intent.

You've also cc'd Andrew, and everybody else on the thread, as if it were
a submission...

It's probably worth marking the series to make it totally clear you're just
doing this as a favour or for testing purposes (I know you at least changed
the title of the series accordingly). At the very least RFC it...

Also you should mark that this is based on -next, as mm work is based on
mm-unstable (for this series, likely the same).

Liam is working on a rebase/rework of the series in parallel also so we
should definitely be careful to avoid any confusion here.

(Oh and as a side note - please do not send email to my personal address,
all kernel mail should go to my Oracle one please).

To avoid any confusion:

NAK this whole series in this form (+ wait for Liam's next version...)

Nacked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816
  2024-08-16 11:33 ` [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Lorenzo Stoakes
@ 2024-08-16 11:55   ` Bert Karwatzki
  0 siblings, 0 replies; 18+ messages in thread
From: Bert Karwatzki @ 2024-08-16 11:55 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: linux-kernel, linux-mm

Am Freitag, dem 16.08.2024 um 12:33 +0100 schrieb Lorenzo Stoakes:
> On Fri, Aug 16, 2024 at 01:13:41PM GMT, Bert Karwatzki wrote:
> > Since the v5 patchset some changes have taken place in the linux-next tree
> > which make it impossible to cleanly apply that patchset. The most important
> > of these changes is the splitting of the mm/mmap.c into mm/mmap.c and mm/vma.c
> > and the splitting of mm/internal.h into mm/internal.h and mm/vma.h. Also
> > arch_unmap() has already been removed from mm/mmap.c in next-20240816 so
> > there's no need to take care of that.
> >
> > When testing this with `stress-ng --vm-segv 1` dmesg show errors like
> >
> >  [   T3753] coredump: 3753(stress-ng-vm-se): over coredump resource limit, skipping core dump
> >  [   T3753] coredump: 3753(stress-ng-vm-se): coredump has not been created, error -7
> >  [   T3754] coredump: 3754(stress-ng-vm-se): over coredump resource limit, skipping core dump
> >  [   T3754] coredump: 3754(stress-ng-vm-se): coredump has not been created, error -7
> >  [...]
> >
> > this is most likely not a problem of the patchset but instead caused by
> > a more verbose coredump introduced in the following patch:
> > https://lore.kernel.org/all/20240718182743.1959160-3-romank@linux.microsoft.com/
> >
> > Changes since v4:
> >  - rebase on next-20240816
> >  - some functions which were static in the original v5 patchset
> >    are now non-static as they're used in both mmap.c and vma.c
> >    (an alternative approach to this would have been to leave
> >    them as "static inline" and put them into vma.h (at least if
> >    they're only used in mmap.c and vma.c))
> >
> > Bert Karwatzki
> >
> > Liam R. Howlett (19):
> >   mm/mmap: Correctly position vma_iterator in __split_vma()
> >   mm/mmap: Introduce abort_munmap_vmas()
> >   mm/mmap: Introduce vmi_complete_munmap_vmas()
> >   mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap()
> >   mm/mmap: Introduce vma_munmap_struct for use in munmap operations
> >   mm/mmap: Change munmap to use vma_munmap_struct() for accounting and
> >     surrounding vmas
> >   mm/mmap: Extract validate_mm() from vma_complete()
> >   mm/mmap: Inline munmap operation in mmap_region()
> >   mm/mmap: Expand mmap_region() munmap call
> >   mm/mmap: Support vma == NULL in init_vma_munmap()
> >   mm/mmap: Reposition vma iterator in mmap_region()
> >   mm/mmap: Track start and end of munmap in vma_munmap_struct
> >   mm/mmap: Clean up unmap_region() argument list
> >   mm/mmap: Avoid zeroing vma tree in mmap_region()
> >   mm/mmap: Use PHYS_PFN in mmap_region()
> >   mm/mmap: Use vms accounted pages in mmap_region()
> >   mm/mmap: Move can_modify_mm() check down the stack
> >   ipc/shm, mm: Drop do_vma_munmap()
> >   mm/mmap: Move may_expand_vm() check in mmap_region()
> >
> >  include/linux/mm.h |   6 +-
> >  ipc/shm.c          |   8 +-
> >  mm/mmap.c          | 146 ++++++++---------
> >  mm/vma.c           | 397 +++++++++++++++++++++++++++++++--------------
> >  mm/vma.h           |  61 ++++++-
> >  5 files changed, 403 insertions(+), 215 deletions(-)
> >
> > --
> > 2.45.2
> >
>
> I appreciate the effort taken in rebasing, but this is quite confusing:
> it's as if you've re-sent the series (with tags...) as a proper
> submission, which I'm sure isn't your intent.
>
> You've also cc'd Andrew, and everybody else on the thread, as if it were
> a submission...
>
> It's probably worth marking the series to make it totally clear you're just
> doing this as a favour or for testing purposes (I know you at least changed
> the title of the series accordingly). At the very least RFC it...
>
> Also you should mark that this is based on -next, as mm work is based on
> mm-unstable (for this series, likely the same).
>
> Liam is working on a rebase/rework of the series in parallel also so we
> should definitely be careful to avoid any confusion here.
>
> (Oh and as a side note - please do not send email to my personal address,
> all kernel mail should go to my Oracle one please).
>
> To avoid any confusion:
>
> NAK this whole series in this form (+ wait for Liam's next version...)
>
> Nacked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Yes, this was done for testing purposes (linux-next had several NULL pointer
errors (unrelated to the patchset), and as these have been resolved since
next-20240814, I wanted to try the patches again, which I had last tried on
linux-next-20240730).

I should have taken more care to mark this as unofficial testing (and not to
CC everyone).

> (Oh and as a side note - please do not send email to my personal address,
> all kernel mail should go to my Oracle one please).

Ok, I just copied it from the Cc of the old patches (like the other addresses).

Also, patches 16 to 19 of the set are currently stuck, as smtp.web.de
refuses to send them.

Bert Karwatzki




^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread

Thread overview: 18+ messages
2024-08-16 11:13 [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 01/19] mm/mmap: Correctly position vma_iterator in __split_vma() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 02/19] mm/mmap: Introduce abort_munmap_vmas() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 03/19] mm/mmap: Introduce vmi_complete_munmap_vmas() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 04/19] mm/mmap: Extract the gathering of vmas from do_vmi_align_munmap() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 05/19] mm/mmap: Introduce vma_munmap_struct for use in munmap operations Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 06/19] mm/mmap: Change munmap to use vma_munmap_struct() for accounting and surrounding vmas Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 07/19] mm/mmap: Extract validate_mm() from vma_complete() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 08/19] mm/mmap: Inline munmap operation in mmap_region() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 09/19] mm/mmap: Expand mmap_region() munmap call Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 10/19] mm/mmap: Support vma == NULL in init_vma_munmap() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 11/19] mm/mmap: Reposition vma iterator in mmap_region() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 12/19] mm/mmap: Track start and end of munmap in vma_munmap_struct Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 13/19] mm/mmap: Clean up unmap_region() argument list Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 14/19] mm/mmap: Avoid zeroing vma tree in mmap_region() Bert Karwatzki
2024-08-16 11:13 ` [PATCH v5.1 15/19] mm/mmap: Use PHYS_PFN " Bert Karwatzki
2024-08-16 11:33 ` [PATCH v5.1 00/19] Rebase v5 patchset to next-20240816 Lorenzo Stoakes
2024-08-16 11:55   ` Bert Karwatzki
