* [PATCH v3 0/5] fix error handling in mmap_region() and refactor
@ 2024-10-25 12:26 Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 1/5] tools: testing: add additional vma_internal.h stubs Lorenzo Stoakes
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 12:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
Andrew - sorry to be a pain, but please keep the first 4 hotfix patches
("mm: resolve faulty mmap_region() error path behaviour", "mm: refactor
map_deny_write_exec()", "mm: unconditionally close VMAs on error", "mm:
avoid unsafe VMA hook invocation when error arises on mmap hook") in
place, and replace the remaining 4 from the series with these, which are
intended for mm-unstable. Thanks!
The mmap_region() function is somewhat terrifying, with spaghetti-like
control flow and numerous means by which issues can arise, leading to
incomplete state, memory leaks and other unpleasantness.
This series goes to great lengths to simplify how mmap_region() works and
to avoid unwinding errors late on in the process of setting up the VMA for
the new mapping, and equally avoids such operations occurring while the VMA
is in an inconsistent state.
This series builds on the previously submitted hotfix patches (see link to
v2 below), which address the most critical issues around mmap_region(),
and works further to reduce mmap_region()'s complexity and improve its
stability and testability.
This series moves the code to mm/vma.c to render it userland testable, and
refactors and simplifies it into smaller functions that are significantly
more readable.
It additionally avoids attempting a second merge mid-way through allocating
a new VMA, a dubious proposition at best and one that is highly subject to
subtle bugs.
Rather than do this, we simply note that we ought to retry the merge and do
this as a final step.
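For orientation, once the whole series is applied the top-level function
reduces to roughly the following. This is a condensed sketch reconstructed
from the diffs in patches 3 and 5 below (nothing here goes beyond what those
patches contain), so treat it as illustrative rather than authoritative:

unsigned long __mmap_region(struct file *file, unsigned long addr,
		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
		struct list_head *uf)
{
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma = NULL;
	int error;
	VMA_ITERATOR(vmi, mm, addr);
	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);

	/* Gather overlapping VMAs, check limits, account the mapping. */
	error = __mmap_prepare(&map, uf);
	if (error)
		goto abort_munmap;

	/* Attempt to merge with adjacent VMAs... */
	if (map.prev || map.next) {
		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);

		vma = vma_merge_new_range(&vmg);
	}

	/* ...but if we can't, allocate a new VMA. */
	if (!vma) {
		error = __mmap_new_vma(&map, &vma);
		if (error)
			goto unacct_error;
	}

	/* If mmap_file() changed the flags, retry the merge as a final step. */
	if (map.retry_merge) {
		VMG_MMAP_STATE(vmg, &map, vma);

		vma_iter_config(map.vmi, map.addr, map.end);
		vma_merge_existing_range(&vmg);
	}

	/* Complete the unmap of overlapped VMAs, update stats, finalise. */
	__mmap_complete(&map, vma);

	return addr;

	/* Accounting was done by __mmap_prepare(). */
unacct_error:
	if (map.charged)
		vm_unacct_memory(map.charged);
abort_munmap:
	vms_abort_munmap_vmas(&map.vms, &map.mas_detach);
	return error;
}

Each helper returns an error which is unwound in one place here, rather than
via the previous web of goto labels threaded through the whole function.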
v3:
* Separated out this patch series intended for mm-unstable from the hotfixes.
* Grouped all merge state together in __mmap_region() as suggested by Vlastimil.
* Removed unnecessary vma_iter_config() as suggested by Vlastimil.
* Defined VMG_MMAP_STATE() macro so we don't retain VMG state _at all_ and avoid
any strange behaviour that comes from re-using it.
* Fixed typo in MMAP_STATE.
* Added additional patch to remove unnecessary reset state logic from
vma_merge_new_range().
* Moved the merge retry back to before __mmap_complete() - let's try to match
existing behaviour as much as possible, which previously would not have
accounted for a potential clearing of the lock flag.
v2:
* Marked first 4 patches as hotfixes, the rest as not.
* Improved comment in vma_close() as per Vlastimil.
* Updated hole byte count as per Jann.
* Updated comment in map_deny_write_exec() as per Jann.
* Dropped unnecessary vma_iter_free() as per Vlastimil, Liam.
* Corrected vms_abort_munmap_vmas() mistaken assumption about nr_pages as
per Vlastimil.
* Changed order of initial checks in mmap_region() to avoid user-visible
side effects as per Vlastimil, Liam.
* Corrected silly incorrect use of vma field.
* Various style corrections as per Liam.
* Fixed horrid mistake with merge VMA, reworked the logic to avoid that
nonsense altogether.
* Added fields to map state rather than using vmg fields to avoid
confusion and the risk of vmg state changes breaking things.
* Replaced last commit removing merge retry with one that retries the
merge, only sanely.
https://lore.kernel.org/all/cover.1729715266.git.lorenzo.stoakes@oracle.com/
v1:
https://lore.kernel.org/all/cover.1729628198.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
tools: testing: add additional vma_internal.h stubs
mm: isolate mmap internal logic to mm/vma.c
mm: refactor __mmap_region()
mm: remove unnecessary reset state logic on merge new VMA
mm: defer second attempt at merge on mmap()
mm/mmap.c | 234 -----------------
mm/vma.c | 437 ++++++++++++++++++++++++++++++-
mm/vma.h | 97 +------
mm/vma_internal.h | 5 +
tools/testing/vma/vma_internal.h | 115 +++++++-
5 files changed, 546 insertions(+), 342 deletions(-)
--
2.47.0
* [PATCH v3 1/5] tools: testing: add additional vma_internal.h stubs
2024-10-25 12:26 [PATCH v3 0/5] fix error handling in mmap_region() and refactor Lorenzo Stoakes
@ 2024-10-25 12:26 ` Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 2/5] mm: isolate mmap internal logic to mm/vma.c Lorenzo Stoakes
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 12:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
Add some additional vma_internal.h stubs in preparation for __mmap_region()
being moved to mm/vma.c. Without these the move would result in the tests
no longer compiling.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
tools/testing/vma/vma_internal.h | 115 ++++++++++++++++++++++++++++++-
1 file changed, 114 insertions(+), 1 deletion(-)
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index c5b9da034511..e76ff579e1fd 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -44,7 +44,9 @@
#define VM_LOCKED 0x00002000
#define VM_IO 0x00004000
#define VM_DONTEXPAND 0x00040000
+#define VM_LOCKONFAULT 0x00080000
#define VM_ACCOUNT 0x00100000
+#define VM_NORESERVE 0x00200000
#define VM_MIXEDMAP 0x10000000
#define VM_STACK VM_GROWSDOWN
#define VM_SHADOW_STACK VM_NONE
@@ -53,6 +55,14 @@
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
+/* This mask represents all the VMA flag bits used by mlock */
+#define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)
+
+#ifdef CONFIG_64BIT
+/* VM is sealed, in vm_flags */
+#define VM_SEALED _BITUL(63)
+#endif
+
#define FIRST_USER_ADDRESS 0UL
#define USER_PGTABLES_CEILING 0UL
@@ -698,8 +708,9 @@ static inline void tlb_finish_mmu(struct mmu_gather *)
{
}
-static inline void get_file(struct file *)
+static inline struct file *get_file(struct file *f)
{
+ return f;
}
static inline int vma_dup_policy(struct vm_area_struct *, struct vm_area_struct *)
@@ -920,4 +931,106 @@ static inline bool signal_pending(void *)
return false;
}
+static inline bool is_file_hugepages(struct file *)
+{
+ return false;
+}
+
+static inline int security_vm_enough_memory_mm(struct mm_struct *, long)
+{
+ return true;
+}
+
+static inline bool may_expand_vm(struct mm_struct *, vm_flags_t, unsigned long)
+{
+ return true;
+}
+
+static inline void vm_flags_init(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ vma->__vm_flags = flags;
+}
+
+static inline void vm_flags_set(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ vma_start_write(vma);
+ vma->__vm_flags |= flags;
+}
+
+static inline void vm_flags_clear(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ vma_start_write(vma);
+ vma->__vm_flags &= ~flags;
+}
+
+static inline int call_mmap(struct file *, struct vm_area_struct *)
+{
+ return 0;
+}
+
+static inline int shmem_zero_setup(struct vm_area_struct *)
+{
+ return 0;
+}
+
+static inline void vma_set_anonymous(struct vm_area_struct *vma)
+{
+ vma->vm_ops = NULL;
+}
+
+static inline void ksm_add_vma(struct vm_area_struct *)
+{
+}
+
+static inline void perf_event_mmap(struct vm_area_struct *)
+{
+}
+
+static inline bool vma_is_dax(struct vm_area_struct *)
+{
+ return false;
+}
+
+static inline struct vm_area_struct *get_gate_vma(struct mm_struct *)
+{
+ return NULL;
+}
+
+bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);
+
+/* Update vma->vm_page_prot to reflect vma->vm_flags. */
+static inline void vma_set_page_prot(struct vm_area_struct *vma)
+{
+ unsigned long vm_flags = vma->vm_flags;
+ pgprot_t vm_page_prot;
+
+ /* testing: we inline vm_pgprot_modify() to avoid clash with vma.h. */
+ vm_page_prot = pgprot_modify(vma->vm_page_prot, vm_get_page_prot(vm_flags));
+
+ if (vma_wants_writenotify(vma, vm_page_prot)) {
+ vm_flags &= ~VM_SHARED;
+ /* testing: we inline vm_pgprot_modify() to avoid clash with vma.h. */
+ vm_page_prot = pgprot_modify(vm_page_prot, vm_get_page_prot(vm_flags));
+ }
+ /* remove_protection_ptes reads vma->vm_page_prot without mmap_lock */
+ WRITE_ONCE(vma->vm_page_prot, vm_page_prot);
+}
+
+static inline bool arch_validate_flags(unsigned long)
+{
+ return true;
+}
+
+static inline void vma_close(struct vm_area_struct *)
+{
+}
+
+static inline int mmap_file(struct file *, struct vm_area_struct *)
+{
+ return 0;
+}
+
#endif /* __MM_VMA_INTERNAL_H */
--
2.47.0
* [PATCH v3 2/5] mm: isolate mmap internal logic to mm/vma.c
2024-10-25 12:26 [PATCH v3 0/5] fix error handling in mmap_region() and refactor Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 1/5] tools: testing: add additional vma_internal.h stubs Lorenzo Stoakes
@ 2024-10-25 12:26 ` Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 3/5] mm: refactor __mmap_region() Lorenzo Stoakes
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 12:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
In previous commits we effected improvements to the mmap() logic in
mmap_region() and its newly introduced internal implementation function
__mmap_region().
However, as these changes are intended to be backported, we kept the delta
as small as possible and made as few changes as possible to the newly
introduced mm/vma.* files.
Take the opportunity to move this logic to mm/vma.c, which not only isolates
it but also makes it available for later userland testing, which can help
us catch such logic errors far earlier.
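As a rough illustration of what this buys us, the userland harness can
compile the real mm/vma.c against the stubbed vma_internal.h added in the
previous patch and drive the moved logic directly. The sketch below is only
a simplified idea of that arrangement; the include lines and the test
function are assumptions for illustration, not the literal harness code:

/* Simplified idea of a tools/testing/vma test - illustrative only. */
#include "vma_internal.h"	/* the userland stubs from the previous patch */
#include "../../../mm/vma.c"	/* compile the real VMA logic in userspace */

/* Hypothetical test exercising an __mmap_region() error path. */
static bool test_mmap_region_unwind(void)
{
	/*
	 * Build a fake mm/VMA layout via the stubs, invoke the moved logic
	 * and assert that failure paths unwind without leaking state.
	 */
	return true;
}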
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap.c | 234 ---------------------------------
mm/vma.c | 323 +++++++++++++++++++++++++++++++++++++++++++++-
mm/vma.h | 97 +-------------
mm/vma_internal.h | 5 +
4 files changed, 329 insertions(+), 330 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index e686d57ed9f7..0affd1a0687f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -577,22 +577,6 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
}
#endif /* __ARCH_WANT_SYS_OLD_MMAP */
-/*
- * We account for memory if it's a private writeable mapping,
- * not hugepages and VM_NORESERVE wasn't set.
- */
-static inline bool accountable_mapping(struct file *file, vm_flags_t vm_flags)
-{
- /*
- * hugetlb has its own accounting separate from the core VM
- * VM_HUGETLB may not be set yet so we cannot check for that flag.
- */
- if (file && is_file_hugepages(file))
- return false;
-
- return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
-}
-
/**
* unmapped_area() - Find an area between the low_limit and the high_limit with
* the correct alignment and offset, all from @info. Note: current->mm is used
@@ -1361,224 +1345,6 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
-static unsigned long __mmap_region(struct file *file, unsigned long addr,
- unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf)
-{
- struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma = NULL;
- pgoff_t pglen = PHYS_PFN(len);
- unsigned long charged = 0;
- struct vma_munmap_struct vms;
- struct ma_state mas_detach;
- struct maple_tree mt_detach;
- unsigned long end = addr + len;
- int error;
- VMA_ITERATOR(vmi, mm, addr);
- VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
-
- vmg.file = file;
- /* Find the first overlapping VMA */
- vma = vma_find(&vmi, end);
- init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
- if (vma) {
- mt_init_flags(&mt_detach, vmi.mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
- mt_on_stack(mt_detach);
- mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
- /* Prepare to unmap any existing mapping in the area */
- error = vms_gather_munmap_vmas(&vms, &mas_detach);
- if (error)
- goto gather_failed;
-
- vmg.next = vms.next;
- vmg.prev = vms.prev;
- vma = NULL;
- } else {
- vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
- }
-
- /* Check against address space limit. */
- if (!may_expand_vm(mm, vm_flags, pglen - vms.nr_pages)) {
- error = -ENOMEM;
- goto abort_munmap;
- }
-
- /*
- * Private writable mapping: check memory availability
- */
- if (accountable_mapping(file, vm_flags)) {
- charged = pglen;
- charged -= vms.nr_accounted;
- if (charged) {
- error = security_vm_enough_memory_mm(mm, charged);
- if (error)
- goto abort_munmap;
- }
-
- vms.nr_accounted = 0;
- vm_flags |= VM_ACCOUNT;
- vmg.flags = vm_flags;
- }
-
- /*
- * clear PTEs while the vma is still in the tree so that rmap
- * cannot race with the freeing later in the truncate scenario.
- * This is also needed for mmap_file(), which is why vm_ops
- * close function is called.
- */
- vms_clean_up_area(&vms, &mas_detach);
- vma = vma_merge_new_range(&vmg);
- if (vma)
- goto expanded;
- /*
- * Determine the object being mapped and call the appropriate
- * specific mapper. the address has already been validated, but
- * not unmapped, but the maps are removed from the list.
- */
- vma = vm_area_alloc(mm);
- if (!vma) {
- error = -ENOMEM;
- goto unacct_error;
- }
-
- vma_iter_config(&vmi, addr, end);
- vma_set_range(vma, addr, end, pgoff);
- vm_flags_init(vma, vm_flags);
- vma->vm_page_prot = vm_get_page_prot(vm_flags);
-
- if (vma_iter_prealloc(&vmi, vma)) {
- error = -ENOMEM;
- goto free_vma;
- }
-
- if (file) {
- vma->vm_file = get_file(file);
- error = mmap_file(file, vma);
- if (error)
- goto unmap_and_free_file_vma;
-
- /* Drivers cannot alter the address of the VMA. */
- WARN_ON_ONCE(addr != vma->vm_start);
- /*
- * Drivers should not permit writability when previously it was
- * disallowed.
- */
- VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
- !(vm_flags & VM_MAYWRITE) &&
- (vma->vm_flags & VM_MAYWRITE));
-
- vma_iter_config(&vmi, addr, end);
- /*
- * If vm_flags changed after mmap_file(), we should try merge
- * vma again as we may succeed this time.
- */
- if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
- struct vm_area_struct *merge;
-
- vmg.flags = vma->vm_flags;
- /* If this fails, state is reset ready for a reattempt. */
- merge = vma_merge_new_range(&vmg);
-
- if (merge) {
- /*
- * ->mmap() can change vma->vm_file and fput
- * the original file. So fput the vma->vm_file
- * here or we would add an extra fput for file
- * and cause general protection fault
- * ultimately.
- */
- fput(vma->vm_file);
- vm_area_free(vma);
- vma = merge;
- /* Update vm_flags to pick up the change. */
- vm_flags = vma->vm_flags;
- goto file_expanded;
- }
- vma_iter_config(&vmi, addr, end);
- }
-
- vm_flags = vma->vm_flags;
- } else if (vm_flags & VM_SHARED) {
- error = shmem_zero_setup(vma);
- if (error)
- goto free_iter_vma;
- } else {
- vma_set_anonymous(vma);
- }
-
-#ifdef CONFIG_SPARC64
- /* TODO: Fix SPARC ADI! */
- WARN_ON_ONCE(!arch_validate_flags(vm_flags));
-#endif
-
- /* Lock the VMA since it is modified after insertion into VMA tree */
- vma_start_write(vma);
- vma_iter_store(&vmi, vma);
- mm->map_count++;
- vma_link_file(vma);
-
- /*
- * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
- * call covers the non-merge case.
- */
- khugepaged_enter_vma(vma, vma->vm_flags);
-
-file_expanded:
- file = vma->vm_file;
- ksm_add_vma(vma);
-expanded:
- perf_event_mmap(vma);
-
- /* Unmap any existing mapping in the area */
- vms_complete_munmap_vmas(&vms, &mas_detach);
-
- vm_stat_account(mm, vm_flags, pglen);
- if (vm_flags & VM_LOCKED) {
- if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
- is_vm_hugetlb_page(vma) ||
- vma == get_gate_vma(current->mm))
- vm_flags_clear(vma, VM_LOCKED_MASK);
- else
- mm->locked_vm += pglen;
- }
-
- if (file)
- uprobe_mmap(vma);
-
- /*
- * New (or expanded) vma always get soft dirty status.
- * Otherwise user-space soft-dirty page tracker won't
- * be able to distinguish situation when vma area unmapped,
- * then new mapped in-place (which must be aimed as
- * a completely new data area).
- */
- vm_flags_set(vma, VM_SOFTDIRTY);
-
- vma_set_page_prot(vma);
-
- return addr;
-
-unmap_and_free_file_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
-
- vma_iter_set(&vmi, vma->vm_end);
- /* Undo any partial mapping done by a device driver. */
- unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
-free_iter_vma:
- vma_iter_free(&vmi);
-free_vma:
- vm_area_free(vma);
-unacct_error:
- if (charged)
- vm_unacct_memory(charged);
-
-abort_munmap:
- vms_abort_munmap_vmas(&vms, &mas_detach);
-gather_failed:
- return error;
-}
-
unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
diff --git a/mm/vma.c b/mm/vma.c
index bb7cfa2dc282..0a2965be582d 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1103,7 +1103,7 @@ static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
vms->clear_ptes = false;
}
-void vms_clean_up_area(struct vma_munmap_struct *vms,
+static void vms_clean_up_area(struct vma_munmap_struct *vms,
struct ma_state *mas_detach)
{
struct vm_area_struct *vma;
@@ -1126,7 +1126,7 @@ void vms_clean_up_area(struct vma_munmap_struct *vms,
* used for the munmap() and may downgrade the lock - if requested. Everything
* needed to be done once the vma maple tree is updated.
*/
-void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
+static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
struct ma_state *mas_detach)
{
struct vm_area_struct *vma;
@@ -1167,6 +1167,23 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
__mt_destroy(mas_detach->tree);
}
+/*
+ * reattach_vmas() - Undo any munmap work and free resources
+ * @mas_detach: The maple state with the detached maple tree
+ *
+ * Reattach any detached vmas and free up the maple tree used to track the vmas.
+ */
+static void reattach_vmas(struct ma_state *mas_detach)
+{
+ struct vm_area_struct *vma;
+
+ mas_set(mas_detach, 0);
+ mas_for_each(mas_detach, vma, ULONG_MAX)
+ vma_mark_detached(vma, false);
+
+ __mt_destroy(mas_detach->tree);
+}
+
/*
* vms_gather_munmap_vmas() - Put all VMAs within a range into a maple tree
* for removal at a later date. Handles splitting first and last if necessary
@@ -1177,7 +1194,7 @@ void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
*
* Return: 0 on success, error otherwise
*/
-int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
+static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
struct ma_state *mas_detach)
{
struct vm_area_struct *next = NULL;
@@ -1315,6 +1332,39 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
return error;
}
+/*
+ * init_vma_munmap() - Initializer wrapper for vma_munmap_struct
+ * @vms: The vma munmap struct
+ * @vmi: The vma iterator
+ * @vma: The first vm_area_struct to munmap
+ * @start: The aligned start address to munmap
+ * @end: The aligned end address to munmap
+ * @uf: The userfaultfd list_head
+ * @unlock: Unlock after the operation. Only unlocked on success
+ */
+static void init_vma_munmap(struct vma_munmap_struct *vms,
+ struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, struct list_head *uf,
+ bool unlock)
+{
+ vms->vmi = vmi;
+ vms->vma = vma;
+ if (vma) {
+ vms->start = start;
+ vms->end = end;
+ } else {
+ vms->start = vms->end = 0;
+ }
+ vms->unlock = unlock;
+ vms->uf = uf;
+ vms->vma_count = 0;
+ vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
+ vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
+ vms->unmap_start = FIRST_USER_ADDRESS;
+ vms->unmap_end = USER_PGTABLES_CEILING;
+ vms->clear_ptes = false;
+}
+
/*
* do_vmi_align_munmap() - munmap the aligned region from @start to @end.
* @vmi: The vma iterator
@@ -2069,3 +2119,270 @@ void mm_drop_all_locks(struct mm_struct *mm)
mutex_unlock(&mm_all_locks_mutex);
}
+
+/*
+ * We account for memory if it's a private writeable mapping,
+ * not hugepages and VM_NORESERVE wasn't set.
+ */
+static bool accountable_mapping(struct file *file, vm_flags_t vm_flags)
+{
+ /*
+ * hugetlb has its own accounting separate from the core VM
+ * VM_HUGETLB may not be set yet so we cannot check for that flag.
+ */
+ if (file && is_file_hugepages(file))
+ return false;
+
+ return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
+}
+
+/*
+ * vms_abort_munmap_vmas() - Undo as much as possible from an aborted munmap()
+ * operation.
+ * @vms: The vma unmap structure
+ * @mas_detach: The maple state with the detached maple tree
+ *
+ * Reattach any detached vmas, free up the maple tree used to track the vmas.
+ * If that's not possible because the ptes are cleared (and vm_ops->closed() may
+ * have been called), then a NULL is written over the vmas and the vmas are
+ * removed (munmap() completed).
+ */
+static void vms_abort_munmap_vmas(struct vma_munmap_struct *vms,
+ struct ma_state *mas_detach)
+{
+ struct ma_state *mas = &vms->vmi->mas;
+
+ if (!vms->nr_pages)
+ return;
+
+ if (vms->clear_ptes)
+ return reattach_vmas(mas_detach);
+
+ /*
+ * Aborting cannot just call the vm_ops open() because they are often
+ * not symmetrical and state data has been lost. Resort to the old
+ * failure method of leaving a gap where the MAP_FIXED mapping failed.
+ */
+ mas_set_range(mas, vms->start, vms->end - 1);
+ mas_store_gfp(mas, NULL, GFP_KERNEL|__GFP_NOFAIL);
+ /* Clean up the insertion of the unfortunate gap */
+ vms_complete_munmap_vmas(vms, mas_detach);
+}
+
+unsigned long __mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma = NULL;
+ pgoff_t pglen = PHYS_PFN(len);
+ unsigned long charged = 0;
+ struct vma_munmap_struct vms;
+ struct ma_state mas_detach;
+ struct maple_tree mt_detach;
+ unsigned long end = addr + len;
+ int error;
+ VMA_ITERATOR(vmi, mm, addr);
+ VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
+
+ vmg.file = file;
+ /* Find the first overlapping VMA */
+ vma = vma_find(&vmi, end);
+ init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
+ if (vma) {
+ mt_init_flags(&mt_detach, vmi.mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
+ mt_on_stack(mt_detach);
+ mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
+ /* Prepare to unmap any existing mapping in the area */
+ error = vms_gather_munmap_vmas(&vms, &mas_detach);
+ if (error)
+ goto gather_failed;
+
+ vmg.next = vms.next;
+ vmg.prev = vms.prev;
+ vma = NULL;
+ } else {
+ vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
+ }
+
+ /* Check against address space limit. */
+ if (!may_expand_vm(mm, vm_flags, pglen - vms.nr_pages)) {
+ error = -ENOMEM;
+ goto abort_munmap;
+ }
+
+ /*
+ * Private writable mapping: check memory availability
+ */
+ if (accountable_mapping(file, vm_flags)) {
+ charged = pglen;
+ charged -= vms.nr_accounted;
+ if (charged) {
+ error = security_vm_enough_memory_mm(mm, charged);
+ if (error)
+ goto abort_munmap;
+ }
+
+ vms.nr_accounted = 0;
+ vm_flags |= VM_ACCOUNT;
+ vmg.flags = vm_flags;
+ }
+
+ /*
+ * clear PTEs while the vma is still in the tree so that rmap
+ * cannot race with the freeing later in the truncate scenario.
+ * This is also needed for mmap_file(), which is why vm_ops
+ * close function is called.
+ */
+ vms_clean_up_area(&vms, &mas_detach);
+ vma = vma_merge_new_range(&vmg);
+ if (vma)
+ goto expanded;
+ /*
+ * Determine the object being mapped and call the appropriate
+ * specific mapper. the address has already been validated, but
+ * not unmapped, but the maps are removed from the list.
+ */
+ vma = vm_area_alloc(mm);
+ if (!vma) {
+ error = -ENOMEM;
+ goto unacct_error;
+ }
+
+ vma_iter_config(&vmi, addr, end);
+ vma_set_range(vma, addr, end, pgoff);
+ vm_flags_init(vma, vm_flags);
+ vma->vm_page_prot = vm_get_page_prot(vm_flags);
+
+ if (vma_iter_prealloc(&vmi, vma)) {
+ error = -ENOMEM;
+ goto free_vma;
+ }
+
+ if (file) {
+ vma->vm_file = get_file(file);
+ error = mmap_file(file, vma);
+ if (error)
+ goto unmap_and_free_file_vma;
+
+ /* Drivers cannot alter the address of the VMA. */
+ WARN_ON_ONCE(addr != vma->vm_start);
+ /*
+ * Drivers should not permit writability when previously it was
+ * disallowed.
+ */
+ VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
+ !(vm_flags & VM_MAYWRITE) &&
+ (vma->vm_flags & VM_MAYWRITE));
+
+ vma_iter_config(&vmi, addr, end);
+ /*
+ * If vm_flags changed after mmap_file(), we should try merge
+ * vma again as we may succeed this time.
+ */
+ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
+ struct vm_area_struct *merge;
+
+ vmg.flags = vma->vm_flags;
+ /* If this fails, state is reset ready for a reattempt. */
+ merge = vma_merge_new_range(&vmg);
+
+ if (merge) {
+ /*
+ * ->mmap() can change vma->vm_file and fput
+ * the original file. So fput the vma->vm_file
+ * here or we would add an extra fput for file
+ * and cause general protection fault
+ * ultimately.
+ */
+ fput(vma->vm_file);
+ vm_area_free(vma);
+ vma = merge;
+ /* Update vm_flags to pick up the change. */
+ vm_flags = vma->vm_flags;
+ goto file_expanded;
+ }
+ vma_iter_config(&vmi, addr, end);
+ }
+
+ vm_flags = vma->vm_flags;
+ } else if (vm_flags & VM_SHARED) {
+ error = shmem_zero_setup(vma);
+ if (error)
+ goto free_iter_vma;
+ } else {
+ vma_set_anonymous(vma);
+ }
+
+#ifdef CONFIG_SPARC64
+ /* TODO: Fix SPARC ADI! */
+ WARN_ON_ONCE(!arch_validate_flags(vm_flags));
+#endif
+
+ /* Lock the VMA since it is modified after insertion into VMA tree */
+ vma_start_write(vma);
+ vma_iter_store(&vmi, vma);
+ mm->map_count++;
+ vma_link_file(vma);
+
+ /*
+ * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
+ * call covers the non-merge case.
+ */
+ khugepaged_enter_vma(vma, vma->vm_flags);
+
+file_expanded:
+ file = vma->vm_file;
+ ksm_add_vma(vma);
+expanded:
+ perf_event_mmap(vma);
+
+ /* Unmap any existing mapping in the area */
+ vms_complete_munmap_vmas(&vms, &mas_detach);
+
+ vm_stat_account(mm, vm_flags, pglen);
+ if (vm_flags & VM_LOCKED) {
+ if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
+ is_vm_hugetlb_page(vma) ||
+ vma == get_gate_vma(current->mm))
+ vm_flags_clear(vma, VM_LOCKED_MASK);
+ else
+ mm->locked_vm += pglen;
+ }
+
+ if (file)
+ uprobe_mmap(vma);
+
+ /*
+ * New (or expanded) vma always get soft dirty status.
+ * Otherwise user-space soft-dirty page tracker won't
+ * be able to distinguish situation when vma area unmapped,
+ * then new mapped in-place (which must be aimed as
+ * a completely new data area).
+ */
+ vm_flags_set(vma, VM_SOFTDIRTY);
+
+ vma_set_page_prot(vma);
+
+ return addr;
+
+unmap_and_free_file_vma:
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
+
+ vma_iter_set(&vmi, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
+free_iter_vma:
+ vma_iter_free(&vmi);
+free_vma:
+ vm_area_free(vma);
+unacct_error:
+ if (charged)
+ vm_unacct_memory(charged);
+
+abort_munmap:
+ vms_abort_munmap_vmas(&vms, &mas_detach);
+gather_failed:
+ return error;
+}
diff --git a/mm/vma.h b/mm/vma.h
index d58068c0ff2e..388d34748674 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -165,99 +165,6 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
return 0;
}
-#ifdef CONFIG_MMU
-/*
- * init_vma_munmap() - Initializer wrapper for vma_munmap_struct
- * @vms: The vma munmap struct
- * @vmi: The vma iterator
- * @vma: The first vm_area_struct to munmap
- * @start: The aligned start address to munmap
- * @end: The aligned end address to munmap
- * @uf: The userfaultfd list_head
- * @unlock: Unlock after the operation. Only unlocked on success
- */
-static inline void init_vma_munmap(struct vma_munmap_struct *vms,
- struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, struct list_head *uf,
- bool unlock)
-{
- vms->vmi = vmi;
- vms->vma = vma;
- if (vma) {
- vms->start = start;
- vms->end = end;
- } else {
- vms->start = vms->end = 0;
- }
- vms->unlock = unlock;
- vms->uf = uf;
- vms->vma_count = 0;
- vms->nr_pages = vms->locked_vm = vms->nr_accounted = 0;
- vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
- vms->unmap_start = FIRST_USER_ADDRESS;
- vms->unmap_end = USER_PGTABLES_CEILING;
- vms->clear_ptes = false;
-}
-#endif
-
-int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
- struct ma_state *mas_detach);
-
-void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
- struct ma_state *mas_detach);
-
-void vms_clean_up_area(struct vma_munmap_struct *vms,
- struct ma_state *mas_detach);
-
-/*
- * reattach_vmas() - Undo any munmap work and free resources
- * @mas_detach: The maple state with the detached maple tree
- *
- * Reattach any detached vmas and free up the maple tree used to track the vmas.
- */
-static inline void reattach_vmas(struct ma_state *mas_detach)
-{
- struct vm_area_struct *vma;
-
- mas_set(mas_detach, 0);
- mas_for_each(mas_detach, vma, ULONG_MAX)
- vma_mark_detached(vma, false);
-
- __mt_destroy(mas_detach->tree);
-}
-
-/*
- * vms_abort_munmap_vmas() - Undo as much as possible from an aborted munmap()
- * operation.
- * @vms: The vma unmap structure
- * @mas_detach: The maple state with the detached maple tree
- *
- * Reattach any detached vmas, free up the maple tree used to track the vmas.
- * If that's not possible because the ptes are cleared (and vm_ops->closed() may
- * have been called), then a NULL is written over the vmas and the vmas are
- * removed (munmap() completed).
- */
-static inline void vms_abort_munmap_vmas(struct vma_munmap_struct *vms,
- struct ma_state *mas_detach)
-{
- struct ma_state *mas = &vms->vmi->mas;
- if (!vms->nr_pages)
- return;
-
- if (vms->clear_ptes)
- return reattach_vmas(mas_detach);
-
- /*
- * Aborting cannot just call the vm_ops open() because they are often
- * not symmetrical and state data has been lost. Resort to the old
- * failure method of leaving a gap where the MAP_FIXED mapping failed.
- */
- mas_set_range(mas, vms->start, vms->end - 1);
- mas_store_gfp(mas, NULL, GFP_KERNEL|__GFP_NOFAIL);
- /* Clean up the insertion of the unfortunate gap */
- vms_complete_munmap_vmas(vms, mas_detach);
-}
-
int
do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct mm_struct *mm, unsigned long start,
@@ -336,6 +243,10 @@ bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot);
int mm_take_all_locks(struct mm_struct *mm);
void mm_drop_all_locks(struct mm_struct *mm);
+unsigned long __mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf);
+
static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma)
{
/*
diff --git a/mm/vma_internal.h b/mm/vma_internal.h
index b930ab12a587..fc5f172a36bd 100644
--- a/mm/vma_internal.h
+++ b/mm/vma_internal.h
@@ -17,8 +17,10 @@
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/huge_mm.h>
+#include <linux/hugetlb.h>
#include <linux/hugetlb_inline.h>
#include <linux/kernel.h>
+#include <linux/ksm.h>
#include <linux/khugepaged.h>
#include <linux/list.h>
#include <linux/maple_tree.h>
@@ -32,11 +34,14 @@
#include <linux/mmu_context.h>
#include <linux/mutex.h>
#include <linux/pagemap.h>
+#include <linux/perf_event.h>
#include <linux/pfn.h>
#include <linux/rcupdate.h>
#include <linux/rmap.h>
#include <linux/rwsem.h>
#include <linux/sched/signal.h>
+#include <linux/security.h>
+#include <linux/shmem_fs.h>
#include <linux/swap.h>
#include <linux/uprobes.h>
#include <linux/userfaultfd_k.h>
--
2.47.0
* [PATCH v3 3/5] mm: refactor __mmap_region()
2024-10-25 12:26 [PATCH v3 0/5] fix error handling in mmap_region() and refactor Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 1/5] tools: testing: add additional vma_internal.h stubs Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 2/5] mm: isolate mmap internal logic to mm/vma.c Lorenzo Stoakes
@ 2024-10-25 12:26 ` Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 4/5] mm: remove unnecessary reset state logic on merge new VMA Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 5/5] mm: defer second attempt at merge on mmap() Lorenzo Stoakes
4 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 12:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
We have seen bugs and resource leaks arise from the complexity of the
__mmap_region() function. This, together with the generally deeply fragile
error handling logic and the complexity which makes the function difficult
to understand, makes it highly desirable to refactor it into something readable.
Achieve this by separating the function into smaller logical parts which
are easier to understand and follow, and which importantly very
significantly simplify the error handling.
Note that we now call vms_abort_munmap_vmas() in more error paths than we
used to; however, in cases where no abort needs to occur, vms->nr_pages will
be equal to zero and we simply exit this function without doing more than we
would have done previously.
Importantly, the invocation of the driver mmap hook via mmap_file() now has
very simple and obvious handling (this was previously the most problematic
part of the mmap() operation).
Use a generalised stack-based 'mmap state' to thread through values and
also retrieve state as needed.
Also avoid ever relying on vma merge (vmg) state after a merge is
attempted; instead, maintain meaningful state in the mmap state and
establish vmg state as and when required.
This avoids any subtle bugs arising from merge logic mutating this state
and mmap_region() logic later relying upon it.
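Condensed from the diff below, the pattern is that all mapping state lives
in a single on-stack struct mmap_state and a vma_merge_struct is built fresh
from it at each point of use (illustrative excerpt only, error handling elided):

	/* All per-mmap() values live in one on-stack state... */
	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);

	/* ...and a fresh vma_merge_struct is built from it at each use. */
	if (map.prev || map.next) {
		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);

		vma = vma_merge_new_range(&vmg);
	}

Because VMG_MMAP_STATE() constructs the merge state from scratch each time,
nothing in __mmap_region() can accidentally depend on vmg fields mutated by
a previous merge attempt.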
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 410 ++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 270 insertions(+), 140 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 0a2965be582d..b91c947babd6 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -7,6 +7,56 @@
#include "vma_internal.h"
#include "vma.h"
+struct mmap_state {
+ struct mm_struct *mm;
+ struct vma_iterator *vmi;
+
+ unsigned long addr;
+ unsigned long end;
+ pgoff_t pgoff;
+ unsigned long pglen;
+ unsigned long flags;
+ struct file *file;
+
+ unsigned long charged;
+
+ struct vm_area_struct *prev;
+ struct vm_area_struct *next;
+
+ /* Unmapping state. */
+ struct vma_munmap_struct vms;
+ struct ma_state mas_detach;
+ struct maple_tree mt_detach;
+};
+
+#define MMAP_STATE(name, mm_, vmi_, addr_, len_, pgoff_, flags_, file_) \
+ struct mmap_state name = { \
+ .mm = mm_, \
+ .vmi = vmi_, \
+ .addr = addr_, \
+ .end = (addr_) + len, \
+ .pgoff = pgoff_, \
+ .pglen = PHYS_PFN(len_), \
+ .flags = flags_, \
+ .file = file_, \
+ }
+
+#define VMG_MMAP_STATE(name, map_, vma_) \
+ struct vma_merge_struct name = { \
+ .mm = (map_)->mm, \
+ .vmi = (map_)->vmi, \
+ .start = (map_)->addr, \
+ .end = (map_)->end, \
+ .flags = (map_)->flags, \
+ .pgoff = (map_)->pgoff, \
+ .file = (map_)->file, \
+ .prev = (map_)->prev, \
+ .vma = vma_, \
+ .next = (vma_) ? NULL : (map_)->next, \
+ .state = VMA_MERGE_START, \
+ .merge_flags = VMG_FLAG_DEFAULT, \
+ }
+
static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
{
struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
@@ -2169,188 +2219,249 @@ static void vms_abort_munmap_vmas(struct vma_munmap_struct *vms,
vms_complete_munmap_vmas(vms, mas_detach);
}
-unsigned long __mmap_region(struct file *file, unsigned long addr,
- unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf)
+/*
+ * __mmap_prepare() - Prepare to gather any overlapping VMAs that need to be
+ * unmapped once the map operation is completed, check limits, account mapping
+ * and clean up any pre-existing VMAs.
+ *
+ * @map: Mapping state.
+ * @uf: Userfaultfd context list.
+ *
+ * Returns: 0 on success, error code otherwise.
+ */
+static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
{
- struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma = NULL;
- pgoff_t pglen = PHYS_PFN(len);
- unsigned long charged = 0;
- struct vma_munmap_struct vms;
- struct ma_state mas_detach;
- struct maple_tree mt_detach;
- unsigned long end = addr + len;
int error;
- VMA_ITERATOR(vmi, mm, addr);
- VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
-
- vmg.file = file;
- /* Find the first overlapping VMA */
- vma = vma_find(&vmi, end);
- init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
- if (vma) {
- mt_init_flags(&mt_detach, vmi.mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
- mt_on_stack(mt_detach);
- mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
+ struct vma_iterator *vmi = map->vmi;
+ struct vma_munmap_struct *vms = &map->vms;
+
+ /* Find the first overlapping VMA and initialise unmap state. */
+ vms->vma = vma_find(vmi, map->end);
+ init_vma_munmap(vms, vmi, vms->vma, map->addr, map->end, uf,
+ /* unlock = */ false);
+
+ /* OK, we have overlapping VMAs - prepare to unmap them. */
+ if (vms->vma) {
+ mt_init_flags(&map->mt_detach,
+ vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
+ mt_on_stack(map->mt_detach);
+ mas_init(&map->mas_detach, &map->mt_detach, /* addr = */ 0);
/* Prepare to unmap any existing mapping in the area */
- error = vms_gather_munmap_vmas(&vms, &mas_detach);
- if (error)
- goto gather_failed;
+ error = vms_gather_munmap_vmas(vms, &map->mas_detach);
+ if (error) {
+ /* On error VMAs will already have been reattached. */
+ vms->nr_pages = 0;
+ return error;
+ }
- vmg.next = vms.next;
- vmg.prev = vms.prev;
- vma = NULL;
+ map->next = vms->next;
+ map->prev = vms->prev;
} else {
- vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
+ map->next = vma_iter_next_rewind(vmi, &map->prev);
}
/* Check against address space limit. */
- if (!may_expand_vm(mm, vm_flags, pglen - vms.nr_pages)) {
- error = -ENOMEM;
- goto abort_munmap;
- }
+ if (!may_expand_vm(map->mm, map->flags, map->pglen - vms->nr_pages))
+ return -ENOMEM;
- /*
- * Private writable mapping: check memory availability
- */
- if (accountable_mapping(file, vm_flags)) {
- charged = pglen;
- charged -= vms.nr_accounted;
- if (charged) {
- error = security_vm_enough_memory_mm(mm, charged);
+ /* Private writable mapping: check memory availability. */
+ if (accountable_mapping(map->file, map->flags)) {
+ map->charged = map->pglen;
+ map->charged -= vms->nr_accounted;
+ if (map->charged) {
+ error = security_vm_enough_memory_mm(map->mm, map->charged);
if (error)
- goto abort_munmap;
+ return error;
}
- vms.nr_accounted = 0;
- vm_flags |= VM_ACCOUNT;
- vmg.flags = vm_flags;
+ vms->nr_accounted = 0;
+ map->flags |= VM_ACCOUNT;
}
/*
- * clear PTEs while the vma is still in the tree so that rmap
+ * Clear PTEs while the vma is still in the tree so that rmap
* cannot race with the freeing later in the truncate scenario.
* This is also needed for mmap_file(), which is why vm_ops
* close function is called.
*/
- vms_clean_up_area(&vms, &mas_detach);
- vma = vma_merge_new_range(&vmg);
- if (vma)
- goto expanded;
+ vms_clean_up_area(vms, &map->mas_detach);
+
+ return 0;
+}
+
+static int __mmap_new_file_vma(struct mmap_state *map,
+ struct vm_area_struct **vmap, bool *mergedp)
+{
+ struct vma_iterator *vmi = map->vmi;
+ struct vm_area_struct *vma = *vmap;
+ int error;
+
+ vma->vm_file = get_file(map->file);
+ error = mmap_file(vma->vm_file, vma);
+ if (error) {
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
+
+ vma_iter_set(vmi, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(&vmi->mas, vma, map->prev, map->next);
+
+ return error;
+ }
+
+ /* Drivers cannot alter the address of the VMA. */
+ WARN_ON_ONCE(map->addr != vma->vm_start);
+ /*
+ * Drivers should not permit writability when previously it was
+ * disallowed.
+ */
+ VM_WARN_ON_ONCE(map->flags != vma->vm_flags &&
+ !(map->flags & VM_MAYWRITE) &&
+ (vma->vm_flags & VM_MAYWRITE));
+
+ /* mmap_file() might have changed VMA flags. */
+ map->flags = vma->vm_flags;
+
+ vma_iter_config(vmi, map->addr, map->end);
+ /*
+ * If flags changed after mmap_file(), we should try merge
+ * vma again as we may succeed this time.
+ */
+ if (unlikely(map->flags != vma->vm_flags && map->prev)) {
+ struct vm_area_struct *merge;
+ VMG_MMAP_STATE(vmg, map, /* vma = */ NULL);
+
+ merge = vma_merge_new_range(&vmg);
+ if (merge) {
+ /*
+ * ->mmap() can change vma->vm_file and fput
+ * the original file. So fput the vma->vm_file
+ * here or we would add an extra fput for file
+ * and cause general protection fault
+ * ultimately.
+ */
+ fput(vma->vm_file);
+ vm_area_free(vma);
+ vma = merge;
+ *mergedp = true;
+ } else {
+ vma_iter_config(vmi, map->addr, map->end);
+ }
+ }
+
+ *vmap = vma;
+ return 0;
+}
+
+/*
+ * __mmap_new_vma() - Allocate a new VMA for the region, as merging was not
+ * possible.
+ *
+ * An exception to this is if the mapping is file-backed, and the underlying
+ * driver changes the VMA flags, permitting a subsequent merge of the VMA, in
+ * which case the returned VMA is one that was merged on a second attempt.
+ *
+ * @map: Mapping state.
+ * @vmap: Output pointer for the new VMA.
+ *
+ * Returns: Zero on success, or an error.
+ */
+static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
+{
+ struct vma_iterator *vmi = map->vmi;
+ int error = 0;
+ bool merged = false;
+ struct vm_area_struct *vma;
+
/*
* Determine the object being mapped and call the appropriate
* specific mapper. the address has already been validated, but
* not unmapped, but the maps are removed from the list.
*/
- vma = vm_area_alloc(mm);
- if (!vma) {
- error = -ENOMEM;
- goto unacct_error;
- }
+ vma = vm_area_alloc(map->mm);
+ if (!vma)
+ return -ENOMEM;
- vma_iter_config(&vmi, addr, end);
- vma_set_range(vma, addr, end, pgoff);
- vm_flags_init(vma, vm_flags);
- vma->vm_page_prot = vm_get_page_prot(vm_flags);
+ vma_iter_config(vmi, map->addr, map->end);
+ vma_set_range(vma, map->addr, map->end, map->pgoff);
+ vm_flags_init(vma, map->flags);
+ vma->vm_page_prot = vm_get_page_prot(map->flags);
- if (vma_iter_prealloc(&vmi, vma)) {
+ if (vma_iter_prealloc(vmi, vma)) {
error = -ENOMEM;
goto free_vma;
}
- if (file) {
- vma->vm_file = get_file(file);
- error = mmap_file(file, vma);
- if (error)
- goto unmap_and_free_file_vma;
-
- /* Drivers cannot alter the address of the VMA. */
- WARN_ON_ONCE(addr != vma->vm_start);
- /*
- * Drivers should not permit writability when previously it was
- * disallowed.
- */
- VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
- !(vm_flags & VM_MAYWRITE) &&
- (vma->vm_flags & VM_MAYWRITE));
-
- vma_iter_config(&vmi, addr, end);
- /*
- * If vm_flags changed after mmap_file(), we should try merge
- * vma again as we may succeed this time.
- */
- if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
- struct vm_area_struct *merge;
-
- vmg.flags = vma->vm_flags;
- /* If this fails, state is reset ready for a reattempt. */
- merge = vma_merge_new_range(&vmg);
-
- if (merge) {
- /*
- * ->mmap() can change vma->vm_file and fput
- * the original file. So fput the vma->vm_file
- * here or we would add an extra fput for file
- * and cause general protection fault
- * ultimately.
- */
- fput(vma->vm_file);
- vm_area_free(vma);
- vma = merge;
- /* Update vm_flags to pick up the change. */
- vm_flags = vma->vm_flags;
- goto file_expanded;
- }
- vma_iter_config(&vmi, addr, end);
- }
-
- vm_flags = vma->vm_flags;
- } else if (vm_flags & VM_SHARED) {
+ if (map->file)
+ error = __mmap_new_file_vma(map, &vma, &merged);
+ else if (map->flags & VM_SHARED)
error = shmem_zero_setup(vma);
- if (error)
- goto free_iter_vma;
- } else {
+ else
vma_set_anonymous(vma);
- }
+
+ if (error)
+ goto free_iter_vma;
+
+ if (merged)
+ goto file_expanded;
#ifdef CONFIG_SPARC64
/* TODO: Fix SPARC ADI! */
- WARN_ON_ONCE(!arch_validate_flags(vm_flags));
+ WARN_ON_ONCE(!arch_validate_flags(map->flags));
#endif
/* Lock the VMA since it is modified after insertion into VMA tree */
vma_start_write(vma);
- vma_iter_store(&vmi, vma);
- mm->map_count++;
+ vma_iter_store(vmi, vma);
+ map->mm->map_count++;
vma_link_file(vma);
/*
* vma_merge_new_range() calls khugepaged_enter_vma() too, the below
* call covers the non-merge case.
*/
- khugepaged_enter_vma(vma, vma->vm_flags);
+ khugepaged_enter_vma(vma, map->flags);
file_expanded:
- file = vma->vm_file;
ksm_add_vma(vma);
-expanded:
+ *vmap = vma;
+ return 0;
+
+free_iter_vma:
+ vma_iter_free(vmi);
+free_vma:
+ vm_area_free(vma);
+ return error;
+}
+
+/*
+ * __mmap_complete() - Unmap any VMAs we overlap, account memory mapping
+ * statistics, handle locking and finalise the VMA.
+ *
+ * @map: Mapping state.
+ * @vma: Merged or newly allocated VMA for the mmap()'d region.
+ */
+static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
+{
+ struct mm_struct *mm = map->mm;
+ unsigned long vm_flags = vma->vm_flags;
+
perf_event_mmap(vma);
- /* Unmap any existing mapping in the area */
- vms_complete_munmap_vmas(&vms, &mas_detach);
+ /* Unmap any existing mapping in the area. */
+ vms_complete_munmap_vmas(&map->vms, &map->mas_detach);
- vm_stat_account(mm, vm_flags, pglen);
+ vm_stat_account(mm, vma->vm_flags, map->pglen);
if (vm_flags & VM_LOCKED) {
if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
is_vm_hugetlb_page(vma) ||
- vma == get_gate_vma(current->mm))
+ vma == get_gate_vma(mm))
vm_flags_clear(vma, VM_LOCKED_MASK);
else
- mm->locked_vm += pglen;
+ mm->locked_vm += map->pglen;
}
- if (file)
+ if (vma->vm_file)
uprobe_mmap(vma);
/*
@@ -2363,26 +2474,45 @@ unsigned long __mmap_region(struct file *file, unsigned long addr,
vm_flags_set(vma, VM_SOFTDIRTY);
vma_set_page_prot(vma);
+}
- return addr;
+unsigned long __mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma = NULL;
+ int error;
+ VMA_ITERATOR(vmi, mm, addr);
+ MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
-unmap_and_free_file_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
+ error = __mmap_prepare(&map, uf);
+ if (error)
+ goto abort_munmap;
- vma_iter_set(&vmi, vma->vm_end);
- /* Undo any partial mapping done by a device driver. */
- unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
-free_iter_vma:
- vma_iter_free(&vmi);
-free_vma:
- vm_area_free(vma);
-unacct_error:
- if (charged)
- vm_unacct_memory(charged);
+ /* Attempt to merge with adjacent VMAs... */
+ if (map.prev || map.next) {
+ VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
+
+ vma = vma_merge_new_range(&vmg);
+ }
+ /* ...but if we can't, allocate a new VMA. */
+ if (!vma) {
+ error = __mmap_new_vma(&map, &vma);
+ if (error)
+ goto unacct_error;
+ }
+
+ __mmap_complete(&map, vma);
+
+ return addr;
+
+ /* Accounting was done by __mmap_prepare(). */
+unacct_error:
+ if (map.charged)
+ vm_unacct_memory(map.charged);
abort_munmap:
- vms_abort_munmap_vmas(&vms, &mas_detach);
-gather_failed:
+ vms_abort_munmap_vmas(&map.vms, &map.mas_detach);
return error;
}
--
2.47.0
* [PATCH v3 4/5] mm: remove unnecessary reset state logic on merge new VMA
2024-10-25 12:26 [PATCH v3 0/5] fix error handling in mmap_region() and refactor Lorenzo Stoakes
` (2 preceding siblings ...)
2024-10-25 12:26 ` [PATCH v3 3/5] mm: refactor __mmap_region() Lorenzo Stoakes
@ 2024-10-25 12:26 ` Lorenzo Stoakes
2024-10-25 17:35 ` Vlastimil Babka
2024-10-25 12:26 ` [PATCH v3 5/5] mm: defer second attempt at merge on mmap() Lorenzo Stoakes
4 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 12:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
The only place where this was used was in mmap_region(), which we have now
adjusted to not require this to be performed (we reset ourselves in
effect).
It also created a dangerous assumption that VMG state could be safely
reused after a merge, at which point it may have been mutated in unexpected
ways, leading to subtle bugs.
Note that it was discovered by Wei Yang that there was also an error in
this code - we are comparing vmg->vma with prev after setting it to
NULL.
This, however, had no impact, as we previously reset the VMA iterator state
before attempting the merge again, so it was simply wasted effort.
In any case, this patch removes all of this logic and so eliminates that
wasted effort as well.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index b91c947babd6..7c690be67910 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -963,7 +963,6 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
struct vm_area_struct *next = vmg->next;
unsigned long start = vmg->start;
unsigned long end = vmg->end;
- pgoff_t pgoff = vmg->pgoff;
pgoff_t pglen = PHYS_PFN(end - start);
bool can_merge_left, can_merge_right;
bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
@@ -1020,16 +1019,6 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
return vmg->vma;
}
- /* If expansion failed, reset state. Allows us to retry merge later. */
- if (!just_expand) {
- vmg->vma = NULL;
- vmg->start = start;
- vmg->end = end;
- vmg->pgoff = pgoff;
- if (vmg->vma == prev)
- vma_iter_set(vmg->vmi, start);
- }
-
return NULL;
}
--
2.47.0
* [PATCH v3 5/5] mm: defer second attempt at merge on mmap()
2024-10-25 12:26 [PATCH v3 0/5] fix error handling in mmap_region() and refactor Lorenzo Stoakes
` (3 preceding siblings ...)
2024-10-25 12:26 ` [PATCH v3 4/5] mm: remove unnecessary reset state logic on merge new VMA Lorenzo Stoakes
@ 2024-10-25 12:26 ` Lorenzo Stoakes
2024-10-25 17:40 ` Vlastimil Babka
2024-10-26 10:34 ` Lorenzo Stoakes
4 siblings, 2 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 12:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
Rather than trying to merge again when ostensibly allocating a new VMA,
instead defer until the VMA is added and attempt to merge the existing
range.
This way we have no complicated unwinding logic midway through the process
of mapping the VMA.
In addition, this removes the previously implicit limitation that the VMA
could not be the first in the virtual memory address space.
In theory, for this very same reason, we should unconditionally attempt the
merge here; however, this is likely to have a performance impact, so it is
better to avoid it given the unlikely outcome of a merge.
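Concretely, the deferral boils down to recording a flag when mmap_file()
changes the VMA flags and acting on it only once the VMA is in place; both
hunks appear in the diff below and are shown here condensed purely for
illustration:

	/* In __mmap_new_file_vma(): note whether a later retry is worthwhile. */
	map->retry_merge = vma->vm_flags != map->flags &&
			   !(vma->vm_flags & VM_SPECIAL);
	map->flags = vma->vm_flags;

	/* Later, in __mmap_region(), once the VMA has been inserted: */
	if (map.retry_merge) {
		VMG_MMAP_STATE(vmg, &map, vma);

		vma_iter_config(map.vmi, map.addr, map.end);
		vma_merge_existing_range(&vmg);
	}

Since the retry happens against a VMA that is already fully set up and
linked, a failed merge needs no unwinding at all.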
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 55 ++++++++++++++-----------------------------------------
1 file changed, 14 insertions(+), 41 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 7c690be67910..7194f9449c60 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -19,6 +19,7 @@ struct mmap_state {
struct file *file;
unsigned long charged;
+ bool retry_merge;
struct vm_area_struct *prev;
struct vm_area_struct *next;
@@ -2278,8 +2279,9 @@ static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
return 0;
}
+
static int __mmap_new_file_vma(struct mmap_state *map,
- struct vm_area_struct **vmap, bool *mergedp)
+ struct vm_area_struct **vmap)
{
struct vma_iterator *vmi = map->vmi;
struct vm_area_struct *vma = *vmap;
@@ -2308,37 +2310,10 @@ static int __mmap_new_file_vma(struct mmap_state *map,
!(map->flags & VM_MAYWRITE) &&
(vma->vm_flags & VM_MAYWRITE));
- /* mmap_file() might have changed VMA flags. */
+ /* If the flags change (and are mergeable), let's retry later. */
+ map->retry_merge = vma->vm_flags != map->flags && !(vma->vm_flags & VM_SPECIAL);
map->flags = vma->vm_flags;
- vma_iter_config(vmi, map->addr, map->end);
- /*
- * If flags changed after mmap_file(), we should try merge
- * vma again as we may succeed this time.
- */
- if (unlikely(map->flags != vma->vm_flags && map->prev)) {
- struct vm_area_struct *merge;
- VMG_MMAP_STATE(vmg, map, /* vma = */ NULL);
-
- merge = vma_merge_new_range(&vmg);
- if (merge) {
- /*
- * ->mmap() can change vma->vm_file and fput
- * the original file. So fput the vma->vm_file
- * here or we would add an extra fput for file
- * and cause general protection fault
- * ultimately.
- */
- fput(vma->vm_file);
- vm_area_free(vma);
- vma = merge;
- *mergedp = true;
- } else {
- vma_iter_config(vmi, map->addr, map->end);
- }
- }
-
- *vmap = vma;
return 0;
}
@@ -2346,10 +2321,6 @@ static int __mmap_new_file_vma(struct mmap_state *map,
* __mmap_new_vma() - Allocate a new VMA for the region, as merging was not
* possible.
*
- * An exception to this is if the mapping is file-backed, and the underlying
- * driver changes the VMA flags, permitting a subsequent merge of the VMA, in
- * which case the returned VMA is one that was merged on a second attempt.
- *
* @map: Mapping state.
* @vmap: Output pointer for the new VMA.
*
@@ -2359,7 +2330,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
{
struct vma_iterator *vmi = map->vmi;
int error = 0;
- bool merged = false;
struct vm_area_struct *vma;
/*
@@ -2382,7 +2352,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
}
if (map->file)
- error = __mmap_new_file_vma(map, &vma, &merged);
+ error = __mmap_new_file_vma(map, &vma);
else if (map->flags & VM_SHARED)
error = shmem_zero_setup(vma);
else
@@ -2391,9 +2361,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
if (error)
goto free_iter_vma;
- if (merged)
- goto file_expanded;
-
#ifdef CONFIG_SPARC64
/* TODO: Fix SPARC ADI! */
WARN_ON_ONCE(!arch_validate_flags(map->flags));
@@ -2410,8 +2377,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
* call covers the non-merge case.
*/
khugepaged_enter_vma(vma, map->flags);
-
-file_expanded:
ksm_add_vma(vma);
*vmap = vma;
return 0;
@@ -2493,6 +2458,14 @@ unsigned long __mmap_region(struct file *file, unsigned long addr,
goto unacct_error;
}
+ /* If flags changed, we might be able to merge, so try again. */
+ if (map.retry_merge) {
+ VMG_MMAP_STATE(vmg, &map, vma);
+
+ vma_iter_config(map.vmi, map.addr, map.end);
+ vma_merge_existing_range(&vmg);
+ }
+
__mmap_complete(&map, vma);
return addr;
--
2.47.0
* Re: [PATCH v3 4/5] mm: remove unnecessary reset state logic on merge new VMA
2024-10-25 12:26 ` [PATCH v3 4/5] mm: remove unnecessary reset state logic on merge new VMA Lorenzo Stoakes
@ 2024-10-25 17:35 ` Vlastimil Babka
0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2024-10-25 17:35 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Liam R . Howlett, Jann Horn, linux-kernel, linux-mm,
Linus Torvalds, Peter Xu
On 10/25/24 14:26, Lorenzo Stoakes wrote:
> The only place where this was used was in mmap_region(), which we have now
> adjusted to not require this to be performed (we reset ourselves in
> effect).
>
> It also created a dangerous assumption that VMG state could be safely
> reused after a merge, at which point it may have been mutated in unexpected
> ways, leading to subtle bugs.
>
> Note that it was discovered by Wei Yang that there was also an error in
> this code - we are comparing vmg->vma with prev after setting it to
> NULL.
>
> This however had no impact, as we previously reset VMA iterator state
> before attempting merge again, but it was useless effort.
>
> In any case, this patch removes all of the logic so also eliminates this
> wasted effort.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Yeah glad to be rid of this.
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/vma.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index b91c947babd6..7c690be67910 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -963,7 +963,6 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> struct vm_area_struct *next = vmg->next;
> unsigned long start = vmg->start;
> unsigned long end = vmg->end;
> - pgoff_t pgoff = vmg->pgoff;
> pgoff_t pglen = PHYS_PFN(end - start);
> bool can_merge_left, can_merge_right;
> bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
> @@ -1020,16 +1019,6 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> return vmg->vma;
> }
>
> - /* If expansion failed, reset state. Allows us to retry merge later. */
> - if (!just_expand) {
> - vmg->vma = NULL;
> - vmg->start = start;
> - vmg->end = end;
> - vmg->pgoff = pgoff;
> - if (vmg->vma == prev)
> - vma_iter_set(vmg->vmi, start);
> - }
> -
> return NULL;
> }
>
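
For reference, the mistake Wei Yang spotted is easiest to see with the removed
block condensed (illustrative only, not part of the patch):

        vmg->vma = NULL;
        vmg->start = start;
        vmg->end = end;
        vmg->pgoff = pgoff;

        /*
         * vmg->vma has just been set to NULL, so this comparison reduces to
         * "if (!prev)" - the iterator is only repositioned when there is no
         * previous VMA at all. It went unnoticed because callers reset the
         * iterator themselves before retrying the merge.
         */
        if (vmg->vma == prev)
                vma_iter_set(vmg->vmi, start);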
* Re: [PATCH v3 5/5] mm: defer second attempt at merge on mmap()
2024-10-25 12:26 ` [PATCH v3 5/5] mm: defer second attempt at merge on mmap() Lorenzo Stoakes
@ 2024-10-25 17:40 ` Vlastimil Babka
2024-10-25 17:53 ` Lorenzo Stoakes
2024-10-26 10:34 ` Lorenzo Stoakes
1 sibling, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2024-10-25 17:40 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Liam R . Howlett, Jann Horn, linux-kernel, linux-mm,
Linus Torvalds, Peter Xu
On 10/25/24 14:26, Lorenzo Stoakes wrote:
> Rather than trying to merge again when ostensibly allocating a new VMA,
> instead defer until the VMA is added and attempt to merge the existing
> range.
>
> This way we have no complicated unwinding logic midway through the process
> of mapping the VMA.
>
> In addition, this removes the previously implicit limitation that the VMA
> could not be the first in the virtual memory address space.
>
> In theory, for this very same reason, we should unconditionally attempt the
> merge here; however, this is likely to have a performance impact, so it is
> better to avoid it given how unlikely a successful merge is.
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/vma.c | 55 ++++++++++++++-----------------------------------------
> 1 file changed, 14 insertions(+), 41 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 7c690be67910..7194f9449c60 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -19,6 +19,7 @@ struct mmap_state {
> struct file *file;
>
> unsigned long charged;
> + bool retry_merge;
>
> struct vm_area_struct *prev;
> struct vm_area_struct *next;
> @@ -2278,8 +2279,9 @@ static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
> return 0;
> }
>
> +
> static int __mmap_new_file_vma(struct mmap_state *map,
> - struct vm_area_struct **vmap, bool *mergedp)
> + struct vm_area_struct **vmap)
AFAICS **vmap could become *vma now
> {
> struct vma_iterator *vmi = map->vmi;
> struct vm_area_struct *vma = *vmap;
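
To spell out the suggestion (signatures as they appear in this thread; the actual
change lands in the fix-patch further down):

        /*
         * Originally an output parameter, because a successful merge inside
         * the helper could hand back a different VMA (plus the merged flag):
         */
        static int __mmap_new_file_vma(struct mmap_state *map,
                                       struct vm_area_struct **vmap, bool *mergedp);

        /*
         * With the in-helper merge gone, the VMA is only configured, never
         * replaced, so a plain pointer suffices:
         */
        static int __mmap_new_file_vma(struct mmap_state *map,
                                       struct vm_area_struct *vma);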
* Re: [PATCH v3 5/5] mm: defer second attempt at merge on mmap()
2024-10-25 17:40 ` Vlastimil Babka
@ 2024-10-25 17:53 ` Lorenzo Stoakes
0 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-25 17:53 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Liam R . Howlett, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
On Fri, Oct 25, 2024 at 07:40:43PM +0200, Vlastimil Babka wrote:
> On 10/25/24 14:26, Lorenzo Stoakes wrote:
> > Rather than trying to merge again when ostensibly allocating a new VMA,
> > instead defer until the VMA is added and attempt to merge the existing
> > range.
> >
> > This way we have no complicated unwinding logic midway through the process
> > of mapping the VMA.
> >
> > In addition, this removes the previously implicit limitation that the VMA
> > could not be the first in the virtual memory address space.
> >
> > In theory, for this very same reason, we should unconditionally attempt the
> > merge here; however, this is likely to have a performance impact, so it is
> > better to avoid it given how unlikely a successful merge is.
> >
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > mm/vma.c | 55 ++++++++++++++-----------------------------------------
> > 1 file changed, 14 insertions(+), 41 deletions(-)
> >
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 7c690be67910..7194f9449c60 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -19,6 +19,7 @@ struct mmap_state {
> > struct file *file;
> >
> > unsigned long charged;
> > + bool retry_merge;
> >
> > struct vm_area_struct *prev;
> > struct vm_area_struct *next;
> > @@ -2278,8 +2279,9 @@ static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
> > return 0;
> > }
> >
> > +
> > static int __mmap_new_file_vma(struct mmap_state *map,
> > - struct vm_area_struct **vmap, bool *mergedp)
> > + struct vm_area_struct **vmap)
>
> AFAICS **vmap could become *vma now
Ah yeah, good spot - you're right. Will chase this up on a respin or do a
fix-patch next week!
>
> > {
> > struct vma_iterator *vmi = map->vmi;
> > struct vm_area_struct *vma = *vmap;
* Re: [PATCH v3 5/5] mm: defer second attempt at merge on mmap()
2024-10-25 12:26 ` [PATCH v3 5/5] mm: defer second attempt at merge on mmap() Lorenzo Stoakes
2024-10-25 17:40 ` Vlastimil Babka
@ 2024-10-26 10:34 ` Lorenzo Stoakes
1 sibling, 0 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-10-26 10:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-kernel,
linux-mm, Linus Torvalds, Peter Xu
On Fri, Oct 25, 2024 at 01:26:27PM +0100, Lorenzo Stoakes wrote:
> Rather than trying to merge again when ostensibly allocating a new VMA,
> instead defer until the VMA is added and attempt to merge the existing
> range.
>
> This way we have no complicated unwinding logic midway through the process
> of mapping the VMA.
>
> In addition, this removes the previously implicit limitation that the VMA
> could not be the first in the virtual memory address space.
>
> In theory, for this very same reason, we should unconditionally attempt the
> merge here; however, this is likely to have a performance impact, so it is
> better to avoid it given how unlikely a successful merge is.
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Hi Andrew, as per Vlastimil's review, could you squash this fix-patch into
this commit please? Thanks!
----8<----
From 51efd2702933919a605eb9a198adbdcf03da9c2f Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Sat, 26 Oct 2024 11:33:01 +0100
Subject: [PATCH] mm: remove unnecessary indirection
Now that we have removed the merge logic from __mmap_new_file_vma(), we can
simply pass in the VMA pointer directly rather than a pointer to it, as
pointed out by Vlastimil.
---
mm/vma.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index f22ffb772fe1..68138e8c153e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2278,10 +2278,9 @@ static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
static int __mmap_new_file_vma(struct mmap_state *map,
- struct vm_area_struct **vmap)
+ struct vm_area_struct *vma)
{
struct vma_iterator *vmi = map->vmi;
- struct vm_area_struct *vma = *vmap;
int error;
vma->vm_file = get_file(map->file);
@@ -2349,7 +2348,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
}
if (map->file)
- error = __mmap_new_file_vma(map, &vma);
+ error = __mmap_new_file_vma(map, vma);
else if (map->flags & VM_SHARED)
error = shmem_zero_setup(vma);
else
--
2.47.0
Thread overview: 10+ messages
2024-10-25 12:26 [PATCH v3 0/5] fix error handling in mmap_region() and refactor Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 1/5] tools: testing: add additional vma_internal.h stubs Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 2/5] mm: isolate mmap internal logic to mm/vma.c Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 3/5] mm: refactor __mmap_region() Lorenzo Stoakes
2024-10-25 12:26 ` [PATCH v3 4/5] mm: remove unnecessary reset state logic on merge new VMA Lorenzo Stoakes
2024-10-25 17:35 ` Vlastimil Babka
2024-10-25 12:26 ` [PATCH v3 5/5] mm: defer second attempt at merge on mmap() Lorenzo Stoakes
2024-10-25 17:40 ` Vlastimil Babka
2024-10-25 17:53 ` Lorenzo Stoakes
2024-10-26 10:34 ` Lorenzo Stoakes