* [PATCH v3 00/10] mm: remove vma_merge()
@ 2024-08-30 18:10 Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 01/10] tools: improve vma test Makefile Lorenzo Stoakes
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Andrew: This is rebased on v8 of Liam's series [4], so the ordering
between our series should be to merge his first and then mine on top of
that. Thanks!
The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.
It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.
The fact that the function requires a numbering system to identify each
possible eventuality, with references to those numbered cases scattered
throughout the many branches of its implementation so the reader can tell
which one they are looking at, speaks to all of this.
Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.
However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.
Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and myself have additionally gone to lengths to try to
improve things further.
However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while maintaining
confidence in its correct operation.
An attempt was previously made by Vlastimil [0] to eliminate vma_merge(),
however it was rather - brutal - and an astute reader might refer to the
date of that patch for insight as to its intent.
This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.
Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.
By abstracting the actual final commit of changes to a VMA into its own
function, commit_merge(), and writing a wrapper around vma_expand() for the
new VMA cases, vma_merge_new_range(), we can avoid having to use
vma_merge() for these instances altogether.
By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.
Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.
This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the latter
2 being either partial or full merges).
The code is heavily annotated with ASCII diagrams and greatly simplified in
comparison to the existing vma_merge() function.
Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca8252
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).
This refactor means we do not need to guess, so we instead explicitly
disallow a merge only in instances where a VMA with a vm_ops->close() hook
would be deleted (and attempt a smaller merge where this is possible).
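As a rough sketch of the rule (illustrative only - the helper name below is
hypothetical and not necessarily the one used by the series):

/*
 * Illustrative sketch: a VMA may only be removed as part of a merge if it
 * has no vm_ops->close() hook. Where removal would be required but is not
 * permitted, we fall back to a smaller merge that leaves that VMA in place
 * rather than refusing to merge at all.
 */
static bool example_can_remove_vma(struct vm_area_struct *vma)
{
	return !vma->vm_ops || !vma->vm_ops->close;
}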
In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.
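For a flavour of what this looks like at a call site, here is a rough
sketch (not literal kernel code - the function below is hypothetical, but
VMG_STATE(), the field assignments and can_vma_merge_after() mirror the
mmap_region() changes in patch 3):

/* Sketch: describe a proposed mapping of [addr, addr + len) via a vmg. */
static void example_describe_merge(struct mm_struct *mm, struct file *file,
				   unsigned long addr, unsigned long len,
				   unsigned long vm_flags, pgoff_t pgoff)
{
	VMA_ITERATOR(vmi, mm, addr);
	VMG_STATE(vmg, mm, &vmi, addr, addr + len, vm_flags, pgoff);

	vmg.file = file;		/* Backing file, if any. */
	vmg.next = vma_next(&vmi);	/* Candidate VMA to merge with. */
	vmg.prev = vma_prev(&vmi);	/* Candidate VMA to merge into. */

	if (can_vma_merge_after(&vmg)) {
		/* prev is compatible - expand it over [addr, addr + len). */
	}
}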
There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.
The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case
7 with vma_ops->close") was tested and confirmed passing.
[0]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@suse.cz/
[1]:https://lore.kernel.org/all/20240830040101.822209-1-Liam.Howlett@oracle.com/
[2]:https://lore.kernel.org/linux-mm/c0ef6b6a-1c9b-4da2-a180-c8e1c73b1c28@lucifer.local/
[3]:https://lore.kernel.org/all/9dcddc2c-482b-4e12-a409-eee8d902ba26@lucifer.local/
[4]:https://lore.kernel.org/all/20240830040101.822209-1-Liam.Howlett@oracle.com/
v3:
* Rebased on Liam's v8 'Avoid MAP_FIXED gap exposure' series [1].
* Fixed issue with copy_vma() vma iterator positioning as per [2] (formerly
fixed via a fix patch).
* Fixed issue with vma_merge_extend() not correctly obtaining the next VMA as
per [3] (formerly fixed via a fix patch) - Thanks Mark Brown!
* General whitespace fixes.
* Improved comments.
* Added comments for bool params for clarity.
* Removed unnecessary syntactic change in vma_merge().
* Removed unnecessary else from mmap_region().
* Introduced vma_iter_next_rewind(), are_anon_vmas_compatible(),
can_vma_merge_left(), can_vma_merge_right().
* Cleaned up logic in vma_merge_new_range().
* Cleaned up logic in vma_merge_existing_range().
* Eliminated vma_lookup() from all VMA merge code.
* Added vma_merge_extend() regression test + confirmed fails before fix + passes
after.
* Added copy_vma() regression test + confirmed triggers assert before fix +
doesn't after.
* Confirmed _all_ self-tests passing at same rate before/after changes.
* Confirmed no perf impact.
v2:
* Updated tests to function without the vmg change, and moved earlier in
series so we can test against the code _exactly_ as it was previously.
* Added vmg->mm to store mm_struct and avoid hacky container_of() in
vma_merge() prior to refactor. It's logical to thread this through.
* Stopped specifying vmg->vma for vma_merge_new_vma() from the start,
which was previously removed later in the series.
* Improve vma_modify_flags() to be better formatted for a large number of
flags.
* Removed if (vma) { ... } logic in mmap_region() and integrated the
approach from a later commit of putting logic into the if (next &&... )
block. Improved comment about why we are doing this.
* Introduced VMG_STATE() and VMG_VMA_STATE() macros and use these to avoid
duplication of initialisation of vmg state.
* Expanded the commit message for abstracting the policy comparison to
explain the logic.
* Reverted the use of vmg in vma_shrink() and split_vma().
* Reverted the cleanup of __split_vma() int -> bool as at this point fully
irrelevant to series.
* Reinstated incorrectly removed vmg.uffd_ctx assignment in mmap_region().
* Removed a confusing comment about assignment of vmg.end in early version
of mmap_region().
* Renamed vma_merge_new_vma() to vma_merge_new_range() and
vma_merge_modified() to vma_merge_existing_range(). This makes it clearer
what we're attempting to do.
* Stopped setting vmg parameters in do_brk_flags() that we did not set in
the original implementation, i.e. vma parameters for things like
anon_vma, uffd context, etc. which in the original implementation are not
checked in can_vma_merge_after().
* Moved VM_SPECIAL maple tree rewalk out of if (!prev && !next) { ... }
block in vma_merge_new_range() (which was changed to !next anyway). This
should always be done in the VM_SPECIAL case if vmg->prev is specified.
* Updated vma_merge_new_range() to correct the case where prev, next could
be merged individually with the proposed range, however not
together.
* Update vma_merge_new_range() to require that the caller sets prev and
next. This simplifies the logic and avoids unnecessary maple tree walks.
* Updated mmap_region() to update vmg->flags from vma->vm_flags on merge
reattempt.
* Updated callers of vma_merge_new_range() to ensure we always point the
iterator at prev if it exists.
* Added new state field to vmg to allow for errors to be returned.
* Adjusted do_brk_flags() to read vmg->state and handle memory allocation
failures accordingly.
* Do not double-assign VM_SOFTDIRTY in do_brk_flags().
* Separated out move of vma_prepare(), init_vma_prep(), vma_complete(),
can_vma_merge_before(), can_vma_merge_after() functions to separate
commit.
* Adjusted commit_merge() change to initially _only_ have parameters
relevant to vma_expand() to make review easier.
* Reinstated 'vma iterator must be pointing to start' comment in
commit_merge().
* Adjusted commit_merge() again when introducing vma_merge_existing_range()
to accept parameters specific to existing range merges.
* Removed unnecessary abstraction of vmg->end in vma_merge_existing_range()
as only used once.
* Abstract expanded parameter to local variable for clarity in
vma_merge_existing_range().
* Unlink anon_vma objects if VMA pre-allocation fails on commit_merge() in
vma_merge_existing_range() if any were duplicated. This was incorrectly
excluded from the refactor.
* Moved comment from close commit regarding merge_will_delete_both to
previous commit as unchanged behaviour.
* Corrected failure to assign vmg->flags after applying VM_ACCOUNT in
mmap_region() (this had incidentally caused a ~5% regression in
do_brk_flags(), now resolved).
* Added vmi assumptions and asserts in merge functions.
* Added lock asserts in merge functions.
* Added an assert to vma_merge_new_range() to ensure no VMA within
[vmg->start, vmg->end).
* Added additional comments describing why we are moving the iterator to
avoid maple tree re-walks.
* Added new test for the case of prev, next both with vm_ops->close()
adding a new VMA, which should result in prev being expanded but NOT
merged with next.
* Adjusted test code to do a mock version of anon_vma duplication, and
cleanup after itself.
* Adjusted test code to allow vma preallocation to fail so we can test
how we handle this.
* Added a test to assert correct anon_vma duplication behaviour.
* Added a test to assert that preallocation failure results in anon_vma's
being unlinked.
* Corrected vma_expand() assumption - we need vma, next not prev.
* Reinstated removed VM_WARN_ON() around vp.anon_vma state in
commit_merge().
* Rebased over Pedro + Liam's changes.
* Updated test logic to handle current->{mm,pid,comm} fields after rebase
on Liam's changes which use these. Also added stub for pr_warn_once() for
the same reason.
* Adjusted logic fundamentals based on rebase - vma_merge_new_range() now
assumes vmi is pointing at the gap...
https://lore.kernel.org/all/cover.1724441678.git.lorenzo.stoakes@oracle.com/
v1:
https://lore.kernel.org/linux-mm/cover.1722849859.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (10):
tools: improve vma test Makefile
tools: add VMA merge tests
mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
mm: remove duplicated open-coded VMA policy check
mm: abstract vma_expand() to use vma_merge_struct
mm: avoid using vma_merge() for new VMAs
mm: make vma_prepare() and friends static and internal to vma.c
mm: introduce commit_merge(), abstracting final commit of merge
mm: refactor vma_merge() into modify-only vma_merge_existing_range()
mm: rework vm_ops->close() handling on VMA merge
mm/mmap.c | 103 +--
mm/vma.c | 1307 ++++++++++++++++------------
mm/vma.h | 179 ++--
tools/testing/vma/Makefile | 6 +-
tools/testing/vma/vma.c | 1366 +++++++++++++++++++++++++++++-
tools/testing/vma/vma_internal.h | 51 +-
6 files changed, 2316 insertions(+), 696 deletions(-)
--
2.46.0
* [PATCH v3 01/10] tools: improve vma test Makefile
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 02/10] tools: add VMA merge tests Lorenzo Stoakes
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Have vma.o depend explicitly on its source dependencies; previously these
were simply being ignored, as existing object files were considered up to
date. This now correctly re-triggers the build if mm/ source is changed as
well as the local source code.
Also set clean as a phony rule.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
tools/testing/vma/Makefile | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/vma/Makefile b/tools/testing/vma/Makefile
index bfc905d222cf..860fd2311dcc 100644
--- a/tools/testing/vma/Makefile
+++ b/tools/testing/vma/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-or-later
-.PHONY: default
+.PHONY: default clean
default: vma
@@ -9,7 +9,9 @@ include ../shared/shared.mk
OFILES = $(SHARED_OFILES) vma.o maple-shim.o
TARGETS = vma
-vma: $(OFILES) vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
+vma.o: vma.c vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
+
+vma: $(OFILES)
$(CC) $(CFLAGS) -o $@ $(OFILES) $(LDLIBS)
clean:
--
2.46.0
* [PATCH v3 02/10] tools: add VMA merge tests
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 01/10] tools: improve vma test Makefile Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify() Lorenzo Stoakes
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Add a variety of VMA merge unit tests to assert that the behaviour of VMA
merge is correct at an abstract level and VMAs are merged or not merged as
expected.
These are intentionally added _before_ we start refactoring vma_merge() in
order that we can continually assert correctness throughout the rest of the
series.
In order to reduce churn going forward, we backport to the test code the
vma_merge_struct data type which we introduce and use in a future commit,
and add wrappers around the 'merge new VMA' and 'merge existing VMA' cases.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
tools/testing/vma/vma.c | 1282 +++++++++++++++++++++++++++++-
tools/testing/vma/vma_internal.h | 45 +-
2 files changed, 1317 insertions(+), 10 deletions(-)
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 48e033c60d87..71bd30d5da81 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -7,13 +7,43 @@
#include "maple-shared.h"
#include "vma_internal.h"
+/* Include so header guard set. */
+#include "../../../mm/vma.h"
+
+static bool fail_prealloc;
+
+/* Then override vma_iter_prealloc() so we can choose to fail it. */
+#define vma_iter_prealloc(vmi, vma) \
+ (fail_prealloc ? -ENOMEM : mas_preallocate(&(vmi)->mas, (vma), GFP_KERNEL))
+
/*
* Directly import the VMA implementation here. Our vma_internal.h wrapper
* provides userland-equivalent functionality for everything vma.c uses.
*/
#include "../../../mm/vma.c"
+/*
+ * Temporarily forward-ported from a future in which vmg's are used for merging.
+ */
+struct vma_merge_struct {
+ struct mm_struct *mm;
+ struct vma_iterator *vmi;
+ pgoff_t pgoff;
+ struct vm_area_struct *prev;
+ struct vm_area_struct *next; /* Modified by vma_merge(). */
+ struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
+ unsigned long start;
+ unsigned long end;
+ unsigned long flags;
+ struct file *file;
+ struct anon_vma *anon_vma;
+ struct mempolicy *policy;
+ struct vm_userfaultfd_ctx uffd_ctx;
+ struct anon_vma_name *anon_name;
+};
+
const struct vm_operations_struct vma_dummy_vm_ops;
+static struct anon_vma dummy_anon_vma;
#define ASSERT_TRUE(_expr) \
do { \
@@ -28,6 +58,14 @@ const struct vm_operations_struct vma_dummy_vm_ops;
#define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
#define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
+static struct task_struct __current;
+
+struct task_struct *get_current(void)
+{
+ return &__current;
+}
+
+/* Helper function to simply allocate a VMA. */
static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
unsigned long start,
unsigned long end,
@@ -47,22 +85,201 @@ static struct vm_area_struct *alloc_vma(struct mm_struct *mm,
return ret;
}
+/* Helper function to allocate a VMA and link it to the tree. */
+static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end,
+ pgoff_t pgoff,
+ vm_flags_t flags)
+{
+ struct vm_area_struct *vma = alloc_vma(mm, start, end, pgoff, flags);
+
+ if (vma == NULL)
+ return NULL;
+
+ if (vma_link(mm, vma)) {
+ vm_area_free(vma);
+ return NULL;
+ }
+
+ /*
+ * Reset this counter which we use to track whether writes have
+ * begun. Linking to the tree will have caused this to be incremented,
+ * which means we will get a false positive otherwise.
+ */
+ vma->vm_lock_seq = -1;
+
+ return vma;
+}
+
+/* Helper function which provides a wrapper around a merge new VMA operation. */
+static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
+{
+ /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
+ struct vm_area_struct dummy = {
+ .vm_mm = vmg->mm,
+ .vm_flags = vmg->flags,
+ .anon_vma = vmg->anon_vma,
+ .vm_file = vmg->file,
+ };
+
+ /*
+ * For convenience, get the prev and next VMAs, which the new VMA
+ * operation requires.
+ */
+ vmg->next = vma_next(vmg->vmi);
+ vmg->prev = vma_prev(vmg->vmi);
+
+ vma_iter_set(vmg->vmi, vmg->start);
+ return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
+ vmg->end, vmg->pgoff);
+}
+
+/*
+ * Helper function which provides a wrapper around a merge existing VMA
+ * operation.
+ */
+static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
+{
+ /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
+ struct vm_area_struct dummy = {
+ .vm_mm = vmg->mm,
+ .vm_flags = vmg->flags,
+ .anon_vma = vmg->anon_vma,
+ .vm_file = vmg->file,
+ };
+
+ return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
+ vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
+ vmg->anon_name);
+}
+
+/*
+ * Helper function which provides a wrapper around the expansion of an existing
+ * VMA.
+ */
+static int expand_existing(struct vma_merge_struct *vmg)
+{
+ return vma_expand(vmg->vmi, vmg->vma, vmg->start, vmg->end, vmg->pgoff,
+ vmg->next);
+}
+
+/*
+ * Helper function to reset merge state and the associated VMA iterator
+ * to a specified new range.
+ */
+static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
+ unsigned long end, pgoff_t pgoff, vm_flags_t flags)
+{
+ vma_iter_set(vmg->vmi, start);
+
+ vmg->prev = NULL;
+ vmg->next = NULL;
+ vmg->vma = NULL;
+
+ vmg->start = start;
+ vmg->end = end;
+ vmg->pgoff = pgoff;
+ vmg->flags = flags;
+}
+
+/*
+ * Helper function to try to merge a new VMA.
+ *
+ * Update vmg and the iterator for it and try to merge, otherwise allocate a new
+ * VMA, link it to the maple tree and return it.
+ */
+static struct vm_area_struct *try_merge_new_vma(struct mm_struct *mm,
+ struct vma_merge_struct *vmg,
+ unsigned long start, unsigned long end,
+ pgoff_t pgoff, vm_flags_t flags,
+ bool *was_merged)
+{
+ struct vm_area_struct *merged;
+
+ vmg_set_range(vmg, start, end, pgoff, flags);
+
+ merged = merge_new(vmg);
+ if (merged) {
+ *was_merged = true;
+ return merged;
+ }
+
+ *was_merged = false;
+ return alloc_and_link_vma(mm, start, end, pgoff, flags);
+}
+
+/*
+ * Helper function to reset the dummy anon_vma to indicate it has not been
+ * duplicated.
+ */
+static void reset_dummy_anon_vma(void)
+{
+ dummy_anon_vma.was_cloned = false;
+ dummy_anon_vma.was_unlinked = false;
+}
+
+/*
+ * Helper function to remove all VMAs and destroy the maple tree associated with
+ * a virtual address space. Returns a count of VMAs in the tree.
+ */
+static int cleanup_mm(struct mm_struct *mm, struct vma_iterator *vmi)
+{
+ struct vm_area_struct *vma;
+ int count = 0;
+
+ fail_prealloc = false;
+ reset_dummy_anon_vma();
+
+ vma_iter_set(vmi, 0);
+ for_each_vma(*vmi, vma) {
+ vm_area_free(vma);
+ count++;
+ }
+
+ mtree_destroy(&mm->mm_mt);
+ mm->map_count = 0;
+ return count;
+}
+
+/* Helper function to determine if VMA has had vma_start_write() performed. */
+static bool vma_write_started(struct vm_area_struct *vma)
+{
+ int seq = vma->vm_lock_seq;
+
+ /* We reset after each check. */
+ vma->vm_lock_seq = -1;
+
+ /* The vma_start_write() stub simply increments this value. */
+ return seq > -1;
+}
+
+/* Helper function providing a dummy vm_ops->close() method.*/
+static void dummy_close(struct vm_area_struct *)
+{
+}
+
static bool test_simple_merge(void)
{
struct vm_area_struct *vma;
unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
struct mm_struct mm = {};
struct vm_area_struct *vma_left = alloc_vma(&mm, 0, 0x1000, 0, flags);
- struct vm_area_struct *vma_middle = alloc_vma(&mm, 0x1000, 0x2000, 1, flags);
struct vm_area_struct *vma_right = alloc_vma(&mm, 0x2000, 0x3000, 2, flags);
VMA_ITERATOR(vmi, &mm, 0x1000);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ .start = 0x1000,
+ .end = 0x2000,
+ .flags = flags,
+ .pgoff = 1,
+ };
ASSERT_FALSE(vma_link(&mm, vma_left));
- ASSERT_FALSE(vma_link(&mm, vma_middle));
ASSERT_FALSE(vma_link(&mm, vma_right));
- vma = vma_merge_new_vma(&vmi, vma_left, vma_middle, 0x1000,
- 0x2000, 1);
+ vma = merge_new(&vmg);
ASSERT_NE(vma, NULL);
ASSERT_EQ(vma->vm_start, 0);
@@ -142,10 +359,17 @@ static bool test_simple_expand(void)
struct mm_struct mm = {};
struct vm_area_struct *vma = alloc_vma(&mm, 0, 0x1000, 0, flags);
VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .vmi = &vmi,
+ .vma = vma,
+ .start = 0,
+ .end = 0x3000,
+ .pgoff = 0,
+ };
ASSERT_FALSE(vma_link(&mm, vma));
- ASSERT_FALSE(vma_expand(&vmi, vma, 0, 0x3000, 0, NULL));
+ ASSERT_FALSE(expand_existing(&vmg));
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x3000);
@@ -178,6 +402,1042 @@ static bool test_simple_shrink(void)
return true;
}
+static bool test_merge_new(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_a = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_b = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_c = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain_d = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ int count;
+ struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
+ bool merged;
+
+ /*
+ * 0123456789abc
+ * AA B CC
+ */
+ vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
+ ASSERT_NE(vma_a, NULL);
+ /* We give each VMA a single avc so we can test anon_vma duplication. */
+ INIT_LIST_HEAD(&vma_a->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);
+
+ vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
+ ASSERT_NE(vma_b, NULL);
+ INIT_LIST_HEAD(&vma_b->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);
+
+ vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, flags);
+ ASSERT_NE(vma_c, NULL);
+ INIT_LIST_HEAD(&vma_c->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
+
+ /*
+ * NO merge.
+ *
+ * 0123456789abc
+ * AA B ** CC
+ */
+ vma_d = try_merge_new_vma(&mm, &vmg, 0x7000, 0x9000, 7, flags, &merged);
+ ASSERT_NE(vma_d, NULL);
+ INIT_LIST_HEAD(&vma_d->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain_d.same_vma, &vma_d->anon_vma_chain);
+ ASSERT_FALSE(merged);
+ ASSERT_EQ(mm.map_count, 4);
+
+ /*
+ * Merge BOTH sides.
+ *
+ * 0123456789abc
+ * AA*B DD CC
+ */
+ vma_b->anon_vma = &dummy_anon_vma;
+ vma = try_merge_new_vma(&mm, &vmg, 0x2000, 0x3000, 2, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Merge with A, delete B. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x4000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 3);
+
+ /*
+ * Merge to PREVIOUS VMA.
+ *
+ * 0123456789abc
+ * AAAA* DD CC
+ */
+ vma = try_merge_new_vma(&mm, &vmg, 0x4000, 0x5000, 4, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Extend A. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x5000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 3);
+
+ /*
+ * Merge to NEXT VMA.
+ *
+ * 0123456789abc
+ * AAAAA *DD CC
+ */
+ vma_d->anon_vma = &dummy_anon_vma;
+ vma = try_merge_new_vma(&mm, &vmg, 0x6000, 0x7000, 6, flags, &merged);
+ ASSERT_EQ(vma, vma_d);
+ /* Prepend. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0x6000);
+ ASSERT_EQ(vma->vm_end, 0x9000);
+ ASSERT_EQ(vma->vm_pgoff, 6);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 3);
+
+ /*
+ * Merge BOTH sides.
+ *
+ * 0123456789abc
+ * AAAAA*DDD CC
+ */
+ vma = try_merge_new_vma(&mm, &vmg, 0x5000, 0x6000, 5, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Merge with A, delete D. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x9000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /*
+ * Merge to NEXT VMA.
+ *
+ * 0123456789abc
+ * AAAAAAAAA *CC
+ */
+ vma_c->anon_vma = &dummy_anon_vma;
+ vma = try_merge_new_vma(&mm, &vmg, 0xa000, 0xb000, 0xa, flags, &merged);
+ ASSERT_EQ(vma, vma_c);
+ /* Prepend C. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0xa000);
+ ASSERT_EQ(vma->vm_end, 0xc000);
+ ASSERT_EQ(vma->vm_pgoff, 0xa);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /*
+ * Merge BOTH sides.
+ *
+ * 0123456789abc
+ * AAAAAAAAA*CCC
+ */
+ vma = try_merge_new_vma(&mm, &vmg, 0x9000, 0xa000, 0x9, flags, &merged);
+ ASSERT_EQ(vma, vma_a);
+ /* Extend A and delete C. */
+ ASSERT_TRUE(merged);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0xc000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /*
+ * Final state.
+ *
+ * 0123456789abc
+ * AAAAAAAAAAAAA
+ */
+
+ count = 0;
+ vma_iter_set(&vmi, 0);
+ for_each_vma(vmi, vma) {
+ ASSERT_NE(vma, NULL);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0xc000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
+
+ vm_area_free(vma);
+ count++;
+ }
+
+ /* Should only have one VMA left (though freed) after all is done.*/
+ ASSERT_EQ(count, 1);
+
+ mtree_destroy(&mm.mm_mt);
+ return true;
+}
+
+static bool test_vma_merge_special_flags(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ vm_flags_t special_flags[] = { VM_IO, VM_DONTEXPAND, VM_PFNMAP, VM_MIXEDMAP };
+ vm_flags_t all_special_flags = 0;
+ int i;
+ struct vm_area_struct *vma_left, *vma;
+
+ /* Make sure there aren't new VM_SPECIAL flags. */
+ for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
+ all_special_flags |= special_flags[i];
+ }
+ ASSERT_EQ(all_special_flags, VM_SPECIAL);
+
+ /*
+ * 01234
+ * AAA
+ */
+ vma_left = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ ASSERT_NE(vma_left, NULL);
+
+ /* 1. Set up new VMA with special flag that would otherwise merge. */
+
+ /*
+ * 01234
+ * AAA*
+ *
+ * This should merge if not for the VM_SPECIAL flag.
+ */
+ vmg_set_range(&vmg, 0x3000, 0x4000, 3, flags);
+ for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
+ vm_flags_t special_flag = special_flags[i];
+
+ vma_left->__vm_flags = flags | special_flag;
+ vmg.flags = flags | special_flag;
+ vma = merge_new(&vmg);
+ ASSERT_EQ(vma, NULL);
+ }
+
+ /* 2. Modify VMA with special flag that would otherwise merge. */
+
+ /*
+ * 01234
+ * AAAB
+ *
+ * Create a VMA to modify.
+ */
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
+ ASSERT_NE(vma, NULL);
+ vmg.vma = vma;
+
+ for (i = 0; i < ARRAY_SIZE(special_flags); i++) {
+ vm_flags_t special_flag = special_flags[i];
+
+ vma_left->__vm_flags = flags | special_flag;
+ vmg.flags = flags | special_flag;
+ vma = merge_existing(&vmg);
+ ASSERT_EQ(vma, NULL);
+ }
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_vma_merge_with_close(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
+ struct vm_area_struct *vma_next =
+ alloc_and_link_vma(&mm, 0x2000, 0x3000, 2, flags);
+ struct vm_area_struct *vma;
+
+ /*
+ * When we merge VMAs we sometimes have to delete others as part of the
+ * operation.
+ *
+ * Considering the two possible adjacent VMAs to which a VMA can be
+ * merged:
+ *
+ * [ prev ][ vma ][ next ]
+ *
+ * In no case will we need to delete prev. If the operation is
+ * mergeable, then prev will be extended with one or both of vma and
+ * next deleted.
+ *
+ * As a result, during initial mergeability checks, only
+ * can_vma_merge_before() (which implies the VMA being merged with is
+ * 'next' as shown above) bothers to check to see whether the next VMA
+ * has a vm_ops->close() callback that will need to be called when
+ * removed.
+ *
+ * If it does, then we cannot merge as the resources that the close()
+ * operation potentially clears down are tied only to the existing VMA
+ * range and we have no way of extending those to the newly merged one.
+ *
+ * We must consider two scenarios:
+ *
+ * A.
+ *
+ * vm_ops->close: - - !NULL
+ * [ prev ][ vma ][ next ]
+ *
+ * Where prev may or may not be present/mergeable.
+ *
+ * This is picked up by a specific check in can_vma_merge_before().
+ *
+ * B.
+ *
+ * vm_ops->close: - !NULL
+ * [ prev ][ vma ]
+ *
+ * Where prev and vma are present and mergeable.
+ *
+ * This is picked up by a specific check in the modified VMA merge.
+ *
+ * IMPORTANT NOTE: We make the assumption that the following case:
+ *
+ * - !NULL NULL
+ * [ prev ][ vma ][ next ]
+ *
+ * Cannot occur, because vma->vm_ops being the same implies the same
+ * vma->vm_file, and therefore this would mean that next->vm_ops->close
+ * would be set too, and thus scenario A would pick this up.
+ */
+
+ ASSERT_NE(vma_next, NULL);
+
+ /*
+ * SCENARIO A
+ *
+ * 0123
+ * *N
+ */
+
+ /* Make the next VMA have a close() callback. */
+ vma_next->vm_ops = &vm_ops;
+
+ /* Our proposed VMA has characteristics that would otherwise be merged. */
+ vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+
+ /* The next VMA having a close() operator should cause the merge to fail.*/
+ ASSERT_EQ(merge_new(&vmg), NULL);
+
+ /* Now create the VMA so we can merge via modified flags */
+ vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+ vma = alloc_and_link_vma(&mm, 0x1000, 0x2000, 1, flags);
+ vmg.vma = vma;
+
+ /*
+ * The VMA being modified in a way that would otherwise merge should
+ * also fail.
+ */
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ /* SCENARIO B
+ *
+ * 0123
+ * P*
+ *
+ * In order for this scenario to trigger, the VMA currently being
+ * modified must also have a .close().
+ */
+
+ /* Reset VMG state. */
+ vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+ /*
+ * Make next unmergeable, and don't let the scenario A check pick this
+ * up, we want to reproduce scenario B only.
+ */
+ vma_next->vm_ops = NULL;
+ vma_next->__vm_flags &= ~VM_MAYWRITE;
+ /* Allocate prev. */
+ vmg.prev = alloc_and_link_vma(&mm, 0, 0x1000, 0, flags);
+ /* Assign a vm_ops->close() function to VMA explicitly. */
+ vma->vm_ops = &vm_ops;
+ vmg.vma = vma;
+ /* Make sure merge does not occur. */
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_vma_merge_new_with_close(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct vm_area_struct *vma_prev = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
+ struct vm_area_struct *vma_next = alloc_and_link_vma(&mm, 0x5000, 0x7000, 5, flags);
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
+ struct vm_area_struct *vma;
+
+ /*
+ * We should allow the partial merge of a proposed new VMA if the
+ * surrounding VMAs have vm_ops->close() hooks (but are otherwise
+ * compatible), e.g.:
+ *
+ * New VMA
+ * A v-------v B
+ * |-----| |-----|
+ * close close
+ *
+ * Since the rule is to not DELETE a VMA with a close operation, this
+ * should be permitted, only rather than expanding A and deleting B, we
+ * should simply expand A and leave B intact, e.g.:
+ *
+ * New VMA
+ * A B
+ * |------------||-----|
+ * close close
+ */
+
+ /* Have prev and next have a vm_ops->close() hook. */
+ vma_prev->vm_ops = &vm_ops;
+ vma_next->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x2000, 0x5000, 2, flags);
+ vma = merge_new(&vmg);
+ ASSERT_NE(vma, NULL);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x5000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_EQ(vma->vm_ops, &vm_ops);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_merge_existing(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vm_area_struct *vma, *vma_prev, *vma_next;
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+
+ /*
+ * Merge right case - partial span.
+ *
+ * <->
+ * 0123456789
+ * VVVVNNN
+ * ->
+ * 0123456789
+ * VNNNNNN
+ */
+ vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+ vmg.vma = vma;
+ vmg.prev = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vma_next->vm_start, 0x3000);
+ ASSERT_EQ(vma_next->vm_end, 0x9000);
+ ASSERT_EQ(vma_next->vm_pgoff, 3);
+ ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
+ ASSERT_EQ(vma->vm_start, 0x2000);
+ ASSERT_EQ(vma->vm_end, 0x3000);
+ ASSERT_EQ(vma->vm_pgoff, 2);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_TRUE(vma_write_started(vma_next));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /* Clear down and reset. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Merge right case - full span.
+ *
+ * <-->
+ * 0123456789
+ * VVVVNNN
+ * ->
+ * 0123456789
+ * NNNNNNN
+ */
+ vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vma_next->vm_start, 0x2000);
+ ASSERT_EQ(vma_next->vm_end, 0x9000);
+ ASSERT_EQ(vma_next->vm_pgoff, 2);
+ ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma_next));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /* Clear down and reset. We should have deleted vma. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
+
+ /*
+ * Merge left case - partial span.
+ *
+ * <->
+ * 0123456789
+ * PPPVVVV
+ * ->
+ * 0123456789
+ * PPPPPPV
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x6000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_EQ(vma->vm_start, 0x6000);
+ ASSERT_EQ(vma->vm_end, 0x7000);
+ ASSERT_EQ(vma->vm_pgoff, 6);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 2);
+
+ /* Clear down and reset. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Merge left case - full span.
+ *
+ * <-->
+ * 0123456789
+ * PPPVVVV
+ * ->
+ * 0123456789
+ * PPPPPPP
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x7000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /* Clear down and reset. We should have deleted vma. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
+
+ /*
+ * Merge both case.
+ *
+ * <-->
+ * 0123456789
+ * PPPVVVVNNN
+ * ->
+ * 0123456789
+ * PPPPPPPPPP
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+ vma->anon_vma = &dummy_anon_vma;
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x9000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_EQ(mm.map_count, 1);
+
+ /* Clear down and reset. We should have deleted prev and next. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
+
+ /*
+ * Non-merge ranges. The modified VMA merge operation assumes that the
+ * caller always specifies ranges within the input VMA so we need only
+ * examine these cases.
+ *
+ * -
+ * -
+ * -
+ * <->
+ * <>
+ * <>
+ * 0123456789a
+ * PPPVVVVVNNN
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, flags);
+
+ vmg_set_range(&vmg, 0x4000, 0x5000, 4, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x6000, 0x7000, 6, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x4000, 0x7000, 4, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x4000, 0x6000, 4, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
+
+ return true;
+}
+
+static bool test_anon_vma_non_mergeable(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vm_area_struct *vma, *vma_prev, *vma_next;
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain1 = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain2 = {
+ .anon_vma = &dummy_anon_vma,
+ };
+
+ /*
+ * In the case of modified VMA merge, merging both left and right VMAs
+ * but where prev and next have incompatible anon_vma objects, we revert
+ * to a merge of prev and VMA:
+ *
+ * <-->
+ * 0123456789
+ * PPPVVVVNNN
+ * ->
+ * 0123456789
+ * PPPPPPPNNN
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
+
+ /*
+ * Give both prev and next single anon_vma_chain fields, so they will
+ * merge with the NULL vmg->anon_vma.
+ *
+ * However, when prev is compared to next, the merge should fail.
+ */
+
+ INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
+ ASSERT_TRUE(list_is_singular(&vma_prev->anon_vma_chain));
+ vma_prev->anon_vma = &dummy_anon_vma;
+ ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_prev->anon_vma, vma_prev));
+
+ INIT_LIST_HEAD(&vma_next->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
+ ASSERT_TRUE(list_is_singular(&vma_next->anon_vma_chain));
+ vma_next->anon_vma = (struct anon_vma *)2;
+ ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_next->anon_vma, vma_next));
+
+ ASSERT_FALSE(is_mergeable_anon_vma(vma_prev->anon_vma, vma_next->anon_vma, NULL));
+
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x7000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_FALSE(vma_write_started(vma_next));
+
+ /* Clear down and reset. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Now consider the new VMA case. This is equivalent, only adding a new
+ * VMA in a gap between prev and next.
+ *
+ * <-->
+ * 0123456789
+ * PPP****NNN
+ * ->
+ * 0123456789
+ * PPPPPPPNNN
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
+
+ INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
+ vma_prev->anon_vma = (struct anon_vma *)1;
+
+ INIT_LIST_HEAD(&vma_next->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
+ vma_next->anon_vma = (struct anon_vma *)2;
+
+ vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg.prev = vma_prev;
+
+ ASSERT_EQ(merge_new(&vmg), vma_prev);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x7000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+ ASSERT_TRUE(vma_write_started(vma_prev));
+ ASSERT_FALSE(vma_write_started(vma_next));
+
+ /* Final cleanup. */
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ return true;
+}
+
+static bool test_dup_anon_vma(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct anon_vma_chain dummy_anon_vma_chain = {
+ .anon_vma = &dummy_anon_vma,
+ };
+ struct vm_area_struct *vma_prev, *vma_next, *vma;
+
+ reset_dummy_anon_vma();
+
+ /*
+ * Expanding a VMA to delete the next one duplicates next's anon_vma and
+ * assigns it to the expanded VMA.
+ *
+ * This covers new VMA merging, as these operations amount to a VMA
+ * expand.
+ */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next->anon_vma = &dummy_anon_vma;
+
+ vmg_set_range(&vmg, 0, 0x5000, 0, flags);
+ vmg.vma = vma_prev;
+ vmg.next = vma_next;
+
+ ASSERT_EQ(expand_existing(&vmg), 0);
+
+ /* Will have been cloned. */
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ /* Cleanup ready for next run. */
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * next has anon_vma, we assign to prev.
+ *
+ * |<----->|
+ * |-------*********-------|
+ * prev vma next
+ * extend delete delete
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
+
+ /* Initialise avc so mergeability check passes. */
+ INIT_LIST_HEAD(&vma_next->anon_vma_chain);
+ list_add(&dummy_anon_vma_chain.same_vma, &vma_next->anon_vma_chain);
+
+ vma_next->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x8000);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * vma has anon_vma, we assign to prev.
+ *
+ * |<----->|
+ * |-------*********-------|
+ * prev vma next
+ * extend delete delete
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
+
+ vma->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x8000);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * vma has anon_vma, we assign to prev.
+ *
+ * |<----->|
+ * |-------*************
+ * prev vma
+ * extend shrink/delete
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
+
+ vma->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x5000);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_prev->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+
+ /*
+ * vma has anon_vma, we assign to next.
+ *
+ * |<----->|
+ * *************-------|
+ * vma next
+ * shrink/delete extend
+ */
+
+ vma = alloc_and_link_vma(&mm, 0, 0x5000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
+
+ vma->anon_vma = &dummy_anon_vma;
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_next);
+
+ ASSERT_EQ(vma_next->vm_start, 0x3000);
+ ASSERT_EQ(vma_next->vm_end, 0x8000);
+
+ ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(vma_next->anon_vma->was_cloned);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_vmi_prealloc_fail(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vma_merge_struct vmg = {
+ .mm = &mm,
+ .vmi = &vmi,
+ };
+ struct vm_area_struct *vma_prev, *vma;
+
+ /*
+ * We are merging vma into prev, with vma possessing an anon_vma, which
+ * will be duplicated. We cause the vmi preallocation to fail and assert
+ * the duplicated anon_vma is unlinked.
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma->anon_vma = &dummy_anon_vma;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ fail_prealloc = true;
+
+ /* This will cause the merge to fail. */
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+ /* We will already have assigned the anon_vma. */
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ /* And it was both cloned and unlinked. */
+ ASSERT_TRUE(dummy_anon_vma.was_cloned);
+ ASSERT_TRUE(dummy_anon_vma.was_unlinked);
+
+ cleanup_mm(&mm, &vmi); /* Resets fail_prealloc too. */
+
+ /*
+ * We repeat the same operation for expanding a VMA, which is what new
+ * VMA merging ultimately uses too. This asserts that unlinking is
+ * performed in this case too.
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma->anon_vma = &dummy_anon_vma;
+
+ vmg_set_range(&vmg, 0, 0x5000, 3, flags);
+ vmg.vma = vma_prev;
+ vmg.next = vma;
+
+ fail_prealloc = true;
+ ASSERT_EQ(expand_existing(&vmg), -ENOMEM);
+
+ ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
+ ASSERT_TRUE(dummy_anon_vma.was_cloned);
+ ASSERT_TRUE(dummy_anon_vma.was_unlinked);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_merge_extend(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ VMA_ITERATOR(vmi, &mm, 0x1000);
+ struct vm_area_struct *vma;
+
+ vma = alloc_and_link_vma(&mm, 0, 0x1000, 0, flags);
+ alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, flags);
+
+ /*
+ * Extend a VMA into the gap between itself and the following VMA.
+ * This should result in a merge.
+ *
+ * <->
+ * * *
+ *
+ */
+
+ ASSERT_EQ(vma_merge_extend(&vmi, vma, 0x2000), vma);
+ ASSERT_EQ(vma->vm_start, 0);
+ ASSERT_EQ(vma->vm_end, 0x4000);
+ ASSERT_EQ(vma->vm_pgoff, 0);
+ ASSERT_TRUE(vma_write_started(vma));
+ ASSERT_EQ(mm.map_count, 1);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
+static bool test_copy_vma(void)
+{
+ unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+ struct mm_struct mm = {};
+ bool need_locks = false;
+ VMA_ITERATOR(vmi, &mm, 0);
+ struct vm_area_struct *vma, *vma_new, *vma_next;
+
+ /* Move backwards and do not merge. */
+
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_new = copy_vma(&vma, 0, 0x2000, 0, &need_locks);
+
+ ASSERT_NE(vma_new, vma);
+ ASSERT_EQ(vma_new->vm_start, 0);
+ ASSERT_EQ(vma_new->vm_end, 0x2000);
+ ASSERT_EQ(vma_new->vm_pgoff, 0);
+
+ cleanup_mm(&mm, &vmi);
+
+ /* Move a VMA into position next to another and merge the two. */
+
+ vma = alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x6000, 0x8000, 6, flags);
+ vma_new = copy_vma(&vma, 0x4000, 0x2000, 4, &need_locks);
+
+ ASSERT_EQ(vma_new, vma_next);
+
+ cleanup_mm(&mm, &vmi);
+ return true;
+}
+
int main(void)
{
int num_tests = 0, num_fail = 0;
@@ -193,11 +1453,23 @@ int main(void)
} \
} while (0)
+ /* Very simple tests to kick the tyres. */
TEST(simple_merge);
TEST(simple_modify);
TEST(simple_expand);
TEST(simple_shrink);
+ TEST(merge_new);
+ TEST(vma_merge_special_flags);
+ TEST(vma_merge_with_close);
+ TEST(vma_merge_new_with_close);
+ TEST(merge_existing);
+ TEST(anon_vma_non_mergeable);
+ TEST(dup_anon_vma);
+ TEST(vmi_prealloc_fail);
+ TEST(merge_extend);
+ TEST(copy_vma);
+
#undef TEST
printf("%d tests run, %d passed, %d failed.\n",
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 093560e5b2ac..a3c262c6eb73 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -81,8 +81,6 @@
#define AS_MM_ALL_LOCKS 2
-#define current NULL
-
/* We hardcode this for now. */
#define sysctl_max_map_count 0x1000000UL
@@ -92,6 +90,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
typedef unsigned long vm_flags_t;
typedef __bitwise unsigned int vm_fault_t;
+/*
+ * The shared stubs do not implement this, it amounts to an fprintf(STDERR,...)
+ * either way :)
+ */
+#define pr_warn_once pr_err
+
typedef struct refcount_struct {
atomic_t refs;
} refcount_t;
@@ -100,9 +104,30 @@ struct kref {
refcount_t refcount;
};
+/*
+ * Define the task command name length as enum, then it can be visible to
+ * BPF programs.
+ */
+enum {
+ TASK_COMM_LEN = 16,
+};
+
+struct task_struct {
+ char comm[TASK_COMM_LEN];
+ pid_t pid;
+ struct mm_struct *mm;
+};
+
+struct task_struct *get_current(void);
+#define current get_current()
+
struct anon_vma {
struct anon_vma *root;
struct rb_root_cached rb_root;
+
+ /* Test fields. */
+ bool was_cloned;
+ bool was_unlinked;
};
struct anon_vma_chain {
@@ -682,13 +707,21 @@ static inline int vma_dup_policy(struct vm_area_struct *, struct vm_area_struct
return 0;
}
-static inline int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *)
+static inline int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
+ /* For testing purposes. We indicate that an anon_vma has been cloned. */
+ if (src->anon_vma != NULL) {
+ dst->anon_vma = src->anon_vma;
+ dst->anon_vma->was_cloned = true;
+ }
+
return 0;
}
-static inline void vma_start_write(struct vm_area_struct *)
+static inline void vma_start_write(struct vm_area_struct *vma)
{
+ /* Used to indicate to tests that a write operation has begun. */
+ vma->vm_lock_seq++;
}
static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
@@ -759,8 +792,10 @@ static inline void vma_assert_write_locked(struct vm_area_struct *)
{
}
-static inline void unlink_anon_vmas(struct vm_area_struct *)
+static inline void unlink_anon_vmas(struct vm_area_struct *vma)
{
+ /* For testing purposes, indicate that the anon_vma was unlinked. */
+ vma->anon_vma->was_unlinked = true;
}
static inline void anon_vma_unlock_write(struct anon_vma *)
--
2.46.0
* [PATCH v3 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 01/10] tools: improve vma test Makefile Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 02/10] tools: add VMA merge tests Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 04/10] mm: remove duplicated open-coded VMA policy check Lorenzo Stoakes
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Rather than passing around huge numbers of parameters to numerous helper
functions, abstract them into a single struct that we thread through the
operation, the vma_merge_struct ('vmg').
Adjust vma_merge() and vma_modify() to accept this parameter, as well as
predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
vma_modify_...() helper functions.
Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
easy vmg declaration.
We additionally remove the requirement that vma_merge() is passed a VMA
object representing the candidate new VMA. Previously it used this to
obtain the mm_struct, file and anon_vma properties of the proposed range (a
rather confusing state of affairs), which are now provided by the vmg
directly.
We also remove the pgoff calculation previously performed by vma_modify(),
and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
helper.
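For reference, the pgoff calculation amounts to the following (a sketch
reconstructed from the description above; the exact definition in the patch
may differ slightly):

/*
 * Compute the pgoff within @vma which corresponds to virtual address @addr
 * (sketch).
 */
static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
				       unsigned long addr)
{
	return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
}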
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
mm/mmap.c | 76 ++++++++-------
mm/vma.c | 207 ++++++++++++++++++++++++----------------
mm/vma.h | 127 ++++++++++++++----------
tools/testing/vma/vma.c | 43 +--------
4 files changed, 246 insertions(+), 207 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index fc726c4e98be..ca9c6939638b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1373,10 +1373,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long end = addr + len;
unsigned long merge_start = addr, merge_end = end;
bool writable_file_mapping = false;
- pgoff_t vm_pgoff;
int error = -ENOMEM;
VMA_ITERATOR(vmi, mm, addr);
+ VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
+ vmg.file = file;
/* Find the first overlapping VMA */
vma = vma_find(&vmi, end);
init_vma_munmap(&vms, &vmi, vma, addr, end, uf, /* unlock = */ false);
@@ -1389,12 +1390,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (error)
goto gather_failed;
- next = vms.next;
- prev = vms.prev;
+ next = vmg.next = vms.next;
+ prev = vmg.prev = vms.prev;
vma = NULL;
} else {
- next = vma_next(&vmi);
- prev = vma_prev(&vmi);
+ next = vmg.next = vma_next(&vmi);
+ prev = vmg.prev = vma_prev(&vmi);
if (prev)
vma_iter_next_range(&vmi);
}
@@ -1414,6 +1415,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vms.nr_accounted = 0;
vm_flags |= VM_ACCOUNT;
+ vmg.flags = vm_flags;
}
if (vm_flags & VM_SPECIAL)
@@ -1422,28 +1424,31 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Attempt to expand an old mapping */
/* Check next */
if (next && next->vm_start == end && !vma_policy(next) &&
- can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
- NULL_VM_UFFD_CTX, NULL)) {
+ can_vma_merge_before(&vmg)) {
merge_end = next->vm_end;
vma = next;
- vm_pgoff = next->vm_pgoff - pglen;
+ vmg.pgoff = next->vm_pgoff - pglen;
+ /*
+ * We set this here so if we will merge with the previous VMA in
+ * the code below, can_vma_merge_after() ensures anon_vma
+ * compatibility between prev and next.
+ */
+ vmg.anon_vma = vma->anon_vma;
+ vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
}
/* Check prev */
if (prev && prev->vm_end == addr && !vma_policy(prev) &&
- (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
- pgoff, vma->vm_userfaultfd_ctx, NULL) :
- can_vma_merge_after(prev, vm_flags, NULL, file, pgoff,
- NULL_VM_UFFD_CTX, NULL))) {
+ can_vma_merge_after(&vmg)) {
merge_start = prev->vm_start;
vma = prev;
- vm_pgoff = prev->vm_pgoff;
+ vmg.pgoff = prev->vm_pgoff;
vma_prev(&vmi); /* Equivalent to going to the previous range */
}
if (vma) {
/* Actually expand, if possible */
- if (!vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
+ if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}
@@ -1774,26 +1779,29 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
* Expand the existing vma if possible; Note that singular lists do not
* occur after forking, so the expand will only happen on new VMAs.
*/
- if (vma && vma->vm_end == addr && !vma_policy(vma) &&
- can_vma_merge_after(vma, flags, NULL, NULL,
- addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
- vma_iter_config(vmi, vma->vm_start, addr + len);
- if (vma_iter_prealloc(vmi, vma))
- goto unacct_fail;
-
- vma_start_write(vma);
-
- init_vma_prep(&vp, vma);
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
- vma->vm_end = addr + len;
- vm_flags_set(vma, VM_SOFTDIRTY);
- vma_iter_store(vmi, vma);
-
- vma_complete(&vp, vmi, mm);
- validate_mm(mm);
- khugepaged_enter_vma(vma, flags);
- goto out;
+ if (vma && vma->vm_end == addr && !vma_policy(vma)) {
+ VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
+
+ vmg.prev = vma;
+ if (can_vma_merge_after(&vmg)) {
+ vma_iter_config(vmi, vma->vm_start, addr + len);
+ if (vma_iter_prealloc(vmi, vma))
+ goto unacct_fail;
+
+ vma_start_write(vma);
+
+ init_vma_prep(&vp, vma);
+ vma_prepare(&vp);
+ vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
+ vma->vm_end = addr + len;
+ vm_flags_set(vma, VM_SOFTDIRTY);
+ vma_iter_store(vmi, vma);
+
+ vma_complete(&vp, vmi, mm);
+ validate_mm(mm);
+ khugepaged_enter_vma(vma, flags);
+ goto out;
+ }
}
if (vma)
diff --git a/mm/vma.c b/mm/vma.c
index 1736bb237b2c..6be645240f07 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -7,16 +7,18 @@
#include "vma_internal.h"
#include "vma.h"
-/*
- * If the vma has a ->close operation then the driver probably needs to release
- * per-vma resources, so we don't attempt to merge those if the caller indicates
- * the current vma may be removed as part of the merge.
- */
-static inline bool is_mergeable_vma(struct vm_area_struct *vma,
- struct file *file, unsigned long vm_flags,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name, bool may_remove_vma)
+static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
{
+ struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
+ /*
+ * If the vma has a ->close operation then the driver probably needs to
+ * release per-vma resources, so we don't attempt to merge those if the
+ * caller indicates the current vma may be removed as part of the merge,
+ * which is the case if we are attempting to merge the next VMA into
+ * this one.
+ */
+ bool may_remove_vma = merge_next;
+
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
* match the flags but dirty bit -- the caller should mark
@@ -25,15 +27,15 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma,
* the kernel to generate new VMAs when old one could be
* extended instead.
*/
- if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
+ if ((vma->vm_flags ^ vmg->flags) & ~VM_SOFTDIRTY)
return false;
- if (vma->vm_file != file)
+ if (vma->vm_file != vmg->file)
return false;
if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
return false;
- if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
+ if (!is_mergeable_vm_userfaultfd_ctx(vma, vmg->uffd_ctx))
return false;
- if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
+ if (!anon_vma_name_eq(anon_vma_name(vma), vmg->anon_name))
return false;
return true;
}
@@ -94,16 +96,16 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
* We assume the vma may be removed as part of the merge.
*/
bool
-can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+can_vma_merge_before(struct vma_merge_struct *vmg)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) &&
- is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
- if (vma->vm_pgoff == vm_pgoff)
+ pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
+
+ if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
+ is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
+ if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
return true;
}
+
return false;
}
@@ -116,18 +118,11 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
*
* We assume that vma is not removed as part of the merge.
*/
-bool
-can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
- is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
- pgoff_t vm_pglen;
-
- vm_pglen = vma_pages(vma);
- if (vma->vm_pgoff + vm_pglen == vm_pgoff)
+ if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
+ is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
+ if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
return true;
}
return false;
@@ -1017,16 +1012,10 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
* **** is not represented - it will be merged and the vma containing the
* area is returned, or the function will return NULL
*/
-static struct vm_area_struct
-*vma_merge(struct vma_iterator *vmi, struct vm_area_struct *prev,
- struct vm_area_struct *src, unsigned long addr, unsigned long end,
- unsigned long vm_flags, pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
{
- struct mm_struct *mm = src->vm_mm;
- struct anon_vma *anon_vma = src->anon_vma;
- struct file *file = src->vm_file;
+ struct mm_struct *mm = vmg->mm;
+ struct vm_area_struct *prev = vmg->prev;
struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
struct vm_area_struct *anon_dup = NULL;
@@ -1036,16 +1025,18 @@ static struct vm_area_struct
bool merge_prev = false;
bool merge_next = false;
bool vma_expanded = false;
+ unsigned long addr = vmg->start;
+ unsigned long end = vmg->end;
unsigned long vma_start = addr;
unsigned long vma_end = end;
- pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
+ pgoff_t pglen = PHYS_PFN(end - addr);
long adj_start = 0;
/*
* We later require that vma->vm_flags == vm_flags,
* so this tests vma->vm_flags & VM_SPECIAL, too.
*/
- if (vm_flags & VM_SPECIAL)
+ if (vmg->flags & VM_SPECIAL)
return NULL;
/* Does the input range span an existing VMA? (cases 5 - 8) */
@@ -1053,27 +1044,26 @@ static struct vm_area_struct
if (!curr || /* cases 1 - 4 */
end == curr->vm_end) /* cases 6 - 8, adjacent VMA */
- next = vma_lookup(mm, end);
+ next = vmg->next = vma_lookup(mm, end);
else
- next = NULL; /* case 5 */
+ next = vmg->next = NULL; /* case 5 */
if (prev) {
vma_start = prev->vm_start;
vma_pgoff = prev->vm_pgoff;
/* Can we merge the predecessor? */
- if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
- && can_vma_merge_after(prev, vm_flags, anon_vma, file,
- pgoff, vm_userfaultfd_ctx, anon_name)) {
+ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
+ && can_vma_merge_after(vmg)) {
+
merge_prev = true;
- vma_prev(vmi);
+ vma_prev(vmg->vmi);
}
}
/* Can we merge the successor? */
- if (next && mpol_equal(policy, vma_policy(next)) &&
- can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen,
- vm_userfaultfd_ctx, anon_name)) {
+ if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
+ can_vma_merge_before(vmg)) {
merge_next = true;
}
@@ -1164,13 +1154,13 @@ static struct vm_area_struct
vma_expanded = true;
if (vma_expanded) {
- vma_iter_config(vmi, vma_start, vma_end);
+ vma_iter_config(vmg->vmi, vma_start, vma_end);
} else {
- vma_iter_config(vmi, adjust->vm_start + adj_start,
+ vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
adjust->vm_end);
}
- if (vma_iter_prealloc(vmi, vma))
+ if (vma_iter_prealloc(vmg->vmi, vma))
goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
@@ -1182,20 +1172,20 @@ static struct vm_area_struct
vma_set_range(vma, vma_start, vma_end, vma_pgoff);
if (vma_expanded)
- vma_iter_store(vmi, vma);
+ vma_iter_store(vmg->vmi, vma);
if (adj_start) {
adjust->vm_start += adj_start;
adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
if (adj_start < 0) {
WARN_ON(vma_expanded);
- vma_iter_store(vmi, next);
+ vma_iter_store(vmg->vmi, next);
}
}
- vma_complete(&vp, vmi, mm);
+ vma_complete(&vp, vmg->vmi, mm);
validate_mm(mm);
- khugepaged_enter_vma(res, vm_flags);
+ khugepaged_enter_vma(res, vmg->flags);
return res;
prealloc_fail:
@@ -1203,8 +1193,8 @@ static struct vm_area_struct
unlink_anon_vmas(anon_dup);
anon_vma_fail:
- vma_iter_set(vmi, addr);
- vma_iter_load(vmi);
+ vma_iter_set(vmg->vmi, addr);
+ vma_iter_load(vmg->vmi);
return NULL;
}
@@ -1221,32 +1211,27 @@ static struct vm_area_struct
* The function returns either the merged VMA, the original VMA if a split was
* required instead, or an error if the split failed.
*/
-struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
- struct vm_area_struct *prev,
- struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- unsigned long vm_flags,
- struct mempolicy *policy,
- struct vm_userfaultfd_ctx uffd_ctx,
- struct anon_vma_name *anon_name)
+static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
{
- pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+ struct vm_area_struct *vma = vmg->vma;
struct vm_area_struct *merged;
- merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
- pgoff, policy, uffd_ctx, anon_name);
+ /* First, try to merge. */
+ merged = vma_merge(vmg);
if (merged)
return merged;
- if (vma->vm_start < start) {
- int err = split_vma(vmi, vma, start, 1);
+ /* Split any preceding portion of the VMA. */
+ if (vma->vm_start < vmg->start) {
+ int err = split_vma(vmg->vmi, vma, vmg->start, 1);
if (err)
return ERR_PTR(err);
}
- if (vma->vm_end > end) {
- int err = split_vma(vmi, vma, end, 0);
+ /* Split any trailing portion of the VMA. */
+ if (vma->vm_end > vmg->end) {
+ int err = split_vma(vmg->vmi, vma, vmg->end, 0);
if (err)
return ERR_PTR(err);
@@ -1255,6 +1240,65 @@ struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
return vma;
}
+struct vm_area_struct *vma_modify_flags(
+ struct vma_iterator *vmi, struct vm_area_struct *prev,
+ struct vm_area_struct *vma, unsigned long start, unsigned long end,
+ unsigned long new_flags)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.flags = new_flags;
+
+ return vma_modify(&vmg);
+}
+
+struct vm_area_struct
+*vma_modify_flags_name(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start,
+ unsigned long end,
+ unsigned long new_flags,
+ struct anon_vma_name *new_name)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.flags = new_flags;
+ vmg.anon_name = new_name;
+
+ return vma_modify(&vmg);
+}
+
+struct vm_area_struct
+*vma_modify_policy(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct mempolicy *new_pol)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.policy = new_pol;
+
+ return vma_modify(&vmg);
+}
+
+struct vm_area_struct
+*vma_modify_flags_uffd(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags,
+ struct vm_userfaultfd_ctx new_ctx)
+{
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.flags = new_flags;
+ vmg.uffd_ctx = new_ctx;
+
+ return vma_modify(&vmg);
+}
+
/*
* Attempt to merge a newly mapped VMA with those adjacent to it. The caller
* must ensure that [start, end) does not overlap any existing VMA.
@@ -1264,8 +1308,11 @@ struct vm_area_struct
struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff)
{
- return vma_merge(vmi, prev, vma, start, end, vma->vm_flags, pgoff,
- vma_policy(vma), vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
+
+ vmg.pgoff = pgoff;
+
+ return vma_merge(&vmg);
}
/*
@@ -1276,12 +1323,10 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
struct vm_area_struct *vma,
unsigned long delta)
{
- pgoff_t pgoff = vma->vm_pgoff + vma_pages(vma);
+ VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
/* vma is specified as prev, so case 1 or 2 will apply. */
- return vma_merge(vmi, vma, vma, vma->vm_end, vma->vm_end + delta,
- vma->vm_flags, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ return vma_merge(&vmg);
}
void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
diff --git a/mm/vma.h b/mm/vma.h
index 85616faa4490..b1301d2c1c84 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -52,6 +52,59 @@ struct vma_munmap_struct {
unsigned long data_vm;
};
+/* Represents a VMA merge operation. */
+struct vma_merge_struct {
+ struct mm_struct *mm;
+ struct vma_iterator *vmi;
+ pgoff_t pgoff;
+ struct vm_area_struct *prev;
+ struct vm_area_struct *next; /* Modified by vma_merge(). */
+ struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
+ unsigned long start;
+ unsigned long end;
+ unsigned long flags;
+ struct file *file;
+ struct anon_vma *anon_vma;
+ struct mempolicy *policy;
+ struct vm_userfaultfd_ctx uffd_ctx;
+ struct anon_vma_name *anon_name;
+};
+
+/* Assumes addr >= vma->vm_start. */
+static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return vma->vm_pgoff + PHYS_PFN(addr - vma->vm_start);
+}
+
+#define VMG_STATE(name, mm_, vmi_, start_, end_, flags_, pgoff_) \
+ struct vma_merge_struct name = { \
+ .mm = mm_, \
+ .vmi = vmi_, \
+ .start = start_, \
+ .end = end_, \
+ .flags = flags_, \
+ .pgoff = pgoff_, \
+ }
+
+#define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
+ struct vma_merge_struct name = { \
+ .mm = vma_->vm_mm, \
+ .vmi = vmi_, \
+ .prev = prev_, \
+ .next = NULL, \
+ .vma = vma_, \
+ .start = start_, \
+ .end = end_, \
+ .flags = vma_->vm_flags, \
+ .pgoff = vma_pgoff_offset(vma_, start_), \
+ .file = vma_->vm_file, \
+ .anon_vma = vma_->anon_vma, \
+ .policy = vma_policy(vma_), \
+ .uffd_ctx = vma_->vm_userfaultfd_ctx, \
+ .anon_name = anon_vma_name(vma_), \
+ }
+
#ifdef CONFIG_DEBUG_VM_MAPLE_TREE
void validate_mm(struct mm_struct *mm);
#else
@@ -212,80 +265,52 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);
void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct vm_area_struct *next);
-/* Required by mmap_region(). */
-bool
-can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name);
-
-/* Required by mmap_region() and do_brk_flags(). */
-bool
-can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name);
-
-struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
- struct vm_area_struct *prev,
- struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- unsigned long vm_flags,
- struct mempolicy *policy,
- struct vm_userfaultfd_ctx uffd_ctx,
- struct anon_vma_name *anon_name);
+/*
+ * Can we merge the VMA described by vmg into the following VMA vmg->next?
+ *
+ * Required by mmap_region().
+ */
+bool can_vma_merge_before(struct vma_merge_struct *vmg);
+
+/*
+ * Can we merge the VMA described by vmg into the preceding VMA vmg->prev?
+ *
+ * Required by mmap_region() and do_brk_flags().
+ */
+bool can_vma_merge_after(struct vma_merge_struct *vmg);
/* We are about to modify the VMA's flags. */
-static inline struct vm_area_struct
-*vma_modify_flags(struct vma_iterator *vmi,
- struct vm_area_struct *prev,
- struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- unsigned long new_flags)
-{
- return vma_modify(vmi, prev, vma, start, end, new_flags,
- vma_policy(vma), vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
-}
+struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
+ struct vm_area_struct *prev, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags);
/* We are about to modify the VMA's flags and/or anon_name. */
-static inline struct vm_area_struct
+struct vm_area_struct
*vma_modify_flags_name(struct vma_iterator *vmi,
struct vm_area_struct *prev,
struct vm_area_struct *vma,
unsigned long start,
unsigned long end,
unsigned long new_flags,
- struct anon_vma_name *new_name)
-{
- return vma_modify(vmi, prev, vma, start, end, new_flags,
- vma_policy(vma), vma->vm_userfaultfd_ctx, new_name);
-}
+ struct anon_vma_name *new_name);
/* We are about to modify the VMA's memory policy. */
-static inline struct vm_area_struct
+struct vm_area_struct
*vma_modify_policy(struct vma_iterator *vmi,
struct vm_area_struct *prev,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
- struct mempolicy *new_pol)
-{
- return vma_modify(vmi, prev, vma, start, end, vma->vm_flags,
- new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma));
-}
+ struct mempolicy *new_pol);
/* We are about to modify the VMA's flags and/or uffd context. */
-static inline struct vm_area_struct
+struct vm_area_struct
*vma_modify_flags_uffd(struct vma_iterator *vmi,
struct vm_area_struct *prev,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
unsigned long new_flags,
- struct vm_userfaultfd_ctx new_ctx)
-{
- return vma_modify(vmi, prev, vma, start, end, new_flags,
- vma_policy(vma), new_ctx, anon_vma_name(vma));
-}
+ struct vm_userfaultfd_ctx new_ctx);
struct vm_area_struct
*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 71bd30d5da81..7a3f59186464 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -22,26 +22,6 @@ static bool fail_prealloc;
*/
#include "../../../mm/vma.c"
-/*
- * Temporarily forward-ported from a future in which vmg's are used for merging.
- */
-struct vma_merge_struct {
- struct mm_struct *mm;
- struct vma_iterator *vmi;
- pgoff_t pgoff;
- struct vm_area_struct *prev;
- struct vm_area_struct *next; /* Modified by vma_merge(). */
- struct vm_area_struct *vma; /* Either a new VMA or the one being modified. */
- unsigned long start;
- unsigned long end;
- unsigned long flags;
- struct file *file;
- struct anon_vma *anon_vma;
- struct mempolicy *policy;
- struct vm_userfaultfd_ctx uffd_ctx;
- struct anon_vma_name *anon_name;
-};
-
const struct vm_operations_struct vma_dummy_vm_ops;
static struct anon_vma dummy_anon_vma;
@@ -115,14 +95,6 @@ static struct vm_area_struct *alloc_and_link_vma(struct mm_struct *mm,
/* Helper function which provides a wrapper around a merge new VMA operation. */
static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
{
- /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
- struct vm_area_struct dummy = {
- .vm_mm = vmg->mm,
- .vm_flags = vmg->flags,
- .anon_vma = vmg->anon_vma,
- .vm_file = vmg->file,
- };
-
/*
* For convenience, get prev and next VMAs. Which the new VMA operation
* requires.
@@ -131,8 +103,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
vmg->prev = vma_prev(vmg->vmi);
vma_iter_set(vmg->vmi, vmg->start);
- return vma_merge_new_vma(vmg->vmi, vmg->prev, &dummy, vmg->start,
- vmg->end, vmg->pgoff);
+ return vma_merge(vmg);
}
/*
@@ -141,17 +112,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
*/
static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
{
- /* vma_merge() needs a VMA to determine mm, anon_vma, and file. */
- struct vm_area_struct dummy = {
- .vm_mm = vmg->mm,
- .vm_flags = vmg->flags,
- .anon_vma = vmg->anon_vma,
- .vm_file = vmg->file,
- };
-
- return vma_merge(vmg->vmi, vmg->prev, &dummy, vmg->start, vmg->end,
- vmg->flags, vmg->pgoff, vmg->policy, vmg->uffd_ctx,
- vmg->anon_name);
+ return vma_merge(vmg);
}
/*
--
2.46.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 04/10] mm: remove duplicated open-coded VMA policy check
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (2 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 03/10] mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify() Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 05/10] mm: abstract vma_expand() to use vma_merge_struct Lorenzo Stoakes
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Both can_vma_merge_before() and can_vma_merge_after() are invoked after
checking for a compatible VMA NUMA policy, so we can simply move this check
into is_mergeable_vma() and abstract it away altogether.
In mmap_region() we set vmg->policy to NULL, so the policy comparisons
checked in can_vma_merge_before() and can_vma_merge_after() are exactly
equivalent to !vma_policy(vmg.next) and !vma_policy(vmg.prev).
Equally, in do_brk_flags(), vmg->policy is NULL, so the
can_vma_merge_after() check is equivalent to !vma_policy(vma), as we set
vmg.prev to vma.
In vma_merge(), we compare the prev and next policies with vmg->policy
before checking can_vma_merge_after() and can_vma_merge_before()
respectively; this patch causes those comparisons to be performed in
precisely the same way.
This therefore maintains precisely the same logic as before, only now
abstracted into is_mergeable_vma().
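As a sketch of the equivalence relied upon here (assuming the usual
mpol_equal() semantics, in which two NULL policies compare equal and a NULL
policy never equals a non-NULL one):

/*
 * With vmg->policy == NULL, as in mmap_region() and do_brk_flags(), the
 * new check in is_mergeable_vma():
 *
 *	if (!mpol_equal(vmg->policy, vma_policy(vma)))
 *		return false;
 *
 * rejects the merge precisely when vma_policy(vma) != NULL, i.e. it is
 * equivalent to the !vma_policy(...) tests removed from the callers.
 */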
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
mm/mmap.c | 8 +++-----
mm/vma.c | 9 ++++-----
2 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index ca9c6939638b..3af8459e4e88 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1423,8 +1423,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Attempt to expand an old mapping */
/* Check next */
- if (next && next->vm_start == end && !vma_policy(next) &&
- can_vma_merge_before(&vmg)) {
+ if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
merge_end = next->vm_end;
vma = next;
vmg.pgoff = next->vm_pgoff - pglen;
@@ -1438,8 +1437,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}
/* Check prev */
- if (prev && prev->vm_end == addr && !vma_policy(prev) &&
- can_vma_merge_after(&vmg)) {
+ if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
merge_start = prev->vm_start;
vma = prev;
vmg.pgoff = prev->vm_pgoff;
@@ -1779,7 +1777,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
* Expand the existing vma if possible; Note that singular lists do not
* occur after forking, so the expand will only happen on new VMAs.
*/
- if (vma && vma->vm_end == addr && !vma_policy(vma)) {
+ if (vma && vma->vm_end == addr) {
VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
vmg.prev = vma;
diff --git a/mm/vma.c b/mm/vma.c
index 6be645240f07..3284bb778c3d 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -19,6 +19,8 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
*/
bool may_remove_vma = merge_next;
+ if (!mpol_equal(vmg->policy, vma_policy(vma)))
+ return false;
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
* match the flags but dirty bit -- the caller should mark
@@ -1053,17 +1055,14 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
vma_pgoff = prev->vm_pgoff;
/* Can we merge the predecessor? */
- if (addr == prev->vm_end && mpol_equal(vma_policy(prev), vmg->policy)
- && can_vma_merge_after(vmg)) {
-
+ if (addr == prev->vm_end && can_vma_merge_after(vmg)) {
merge_prev = true;
vma_prev(vmg->vmi);
}
}
/* Can we merge the successor? */
- if (next && mpol_equal(vmg->policy, vma_policy(next)) &&
- can_vma_merge_before(vmg)) {
+ if (next && can_vma_merge_before(vmg)) {
merge_next = true;
}
--
2.46.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 05/10] mm: abstract vma_expand() to use vma_merge_struct
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (3 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 04/10] mm: remove duplicated open-coded VMA policy check Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
The purpose of the vmg is to thread merge state through functions and to
avoid egregious parameter lists. We now extend this approach to
vma_expand(), which is used for a number of merge cases.
Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to
use a vmg.
An added purpose of this change is the ability in a future commit to
perform all new VMA range merging using vma_expand().
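By way of a rough sketch (the real call sites are in the diff below; the
variable names here are illustrative only), the calling convention changes
approximately as follows:

	/* Before: positional parameters threaded through each caller. */
	if (vma_expand(&vmi, vma, new_start, new_end, pgoff, next))
		return -ENOMEM;

	/* After: the state is carried by the vmg instead. */
	VMG_STATE(vmg, mm, &vmi, new_start, new_end, vm_flags, pgoff);
	vmg.vma = vma;
	vmg.next = next;
	if (vma_expand(&vmg))
		return -ENOMEM;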
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
mm/mmap.c | 15 ++++++++-------
mm/vma.c | 39 +++++++++++++++++----------------------
mm/vma.h | 5 +----
tools/testing/vma/vma.c | 3 +--
4 files changed, 27 insertions(+), 35 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3af8459e4e88..2b3006efd3fb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1371,7 +1371,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
struct ma_state mas_detach;
struct maple_tree mt_detach;
unsigned long end = addr + len;
- unsigned long merge_start = addr, merge_end = end;
bool writable_file_mapping = false;
int error = -ENOMEM;
VMA_ITERATOR(vmi, mm, addr);
@@ -1424,8 +1423,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Attempt to expand an old mapping */
/* Check next */
if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
- merge_end = next->vm_end;
- vma = next;
+ vmg.end = next->vm_end;
+ vma = vmg.vma = next;
vmg.pgoff = next->vm_pgoff - pglen;
/*
* We set this here so if we will merge with the previous VMA in
@@ -1438,15 +1437,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Check prev */
if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
- merge_start = prev->vm_start;
- vma = prev;
+ vmg.start = prev->vm_start;
+ vma = vmg.vma = prev;
vmg.pgoff = prev->vm_pgoff;
vma_prev(&vmi); /* Equivalent to going to the previous range */
}
if (vma) {
/* Actually expand, if possible */
- if (!vma_expand(&vmi, vma, merge_start, merge_end, vmg.pgoff, next)) {
+ if (!vma_expand(&vmg)) {
khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}
@@ -2320,6 +2319,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
unsigned long new_start = old_start - shift;
unsigned long new_end = old_end - shift;
VMA_ITERATOR(vmi, mm, new_start);
+ VMG_STATE(vmg, mm, &vmi, new_start, old_end, 0, vma->vm_pgoff);
struct vm_area_struct *next;
struct mmu_gather tlb;
@@ -2336,7 +2336,8 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
+ vmg.vma = vma;
+ if (vma_expand(&vmg))
return -ENOMEM;
/*
diff --git a/mm/vma.c b/mm/vma.c
index 3284bb778c3d..d1033dade70e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -467,30 +467,25 @@ void validate_mm(struct mm_struct *mm)
/*
* vma_expand - Expand an existing VMA
*
- * @vmi: The vma iterator
- * @vma: The vma to expand
- * @start: The start of the vma
- * @end: The exclusive end of the vma
- * @pgoff: The page offset of vma
- * @next: The current of next vma.
+ * @vmg: Describes a VMA expansion operation.
*
- * Expand @vma to @start and @end. Can expand off the start and end. Will
- * expand over @next if it's different from @vma and @end == @next->vm_end.
- * Checking if the @vma can expand and merge with @next needs to be handled by
- * the caller.
+ * Expand @vma to vmg->start and vmg->end. Can expand off the start and end.
+ * Will expand over vmg->next if it's different from vmg->vma and vmg->end ==
+ * vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
+ * vmg->next needs to be handled by the caller.
*
* Returns: 0 on success
*/
-int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next)
+int vma_expand(struct vma_merge_struct *vmg)
{
struct vm_area_struct *anon_dup = NULL;
bool remove_next = false;
+ struct vm_area_struct *vma = vmg->vma;
+ struct vm_area_struct *next = vmg->next;
struct vma_prepare vp;
vma_start_write(vma);
- if (next && (vma != next) && (end == next->vm_end)) {
+ if (next && (vma != next) && (vmg->end == next->vm_end)) {
int ret;
remove_next = true;
@@ -503,21 +498,21 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
/* Not merging but overwriting any part of next is not handled. */
VM_WARN_ON(next && !vp.remove &&
- next != vma && end > next->vm_start);
+ next != vma && vmg->end > next->vm_start);
/* Only handles expanding */
- VM_WARN_ON(vma->vm_start < start || vma->vm_end > end);
+ VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
/* Note: vma iterator must be pointing to 'start' */
- vma_iter_config(vmi, start, end);
- if (vma_iter_prealloc(vmi, vma))
+ vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+ if (vma_iter_prealloc(vmg->vmi, vma))
goto nomem;
vma_prepare(&vp);
- vma_adjust_trans_huge(vma, start, end, 0);
- vma_set_range(vma, start, end, pgoff);
- vma_iter_store(vmi, vma);
+ vma_adjust_trans_huge(vma, vmg->start, vmg->end, 0);
+ vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
+ vma_iter_store(vmg->vmi, vma);
- vma_complete(&vp, vmi, vma->vm_mm);
+ vma_complete(&vp, vmg->vmi, vma->vm_mm);
return 0;
nomem:
diff --git a/mm/vma.h b/mm/vma.h
index b1301d2c1c84..c9b49c15f15a 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -128,10 +128,7 @@ void init_vma_prep(struct vma_prepare *vp,
void vma_complete(struct vma_prepare *vp,
struct vma_iterator *vmi, struct mm_struct *mm);
-int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next);
-
+int vma_expand(struct vma_merge_struct *vmg);
int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff);
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 7a3f59186464..f6c4706a861f 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -121,8 +121,7 @@ static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
*/
static int expand_existing(struct vma_merge_struct *vmg)
{
- return vma_expand(vmg->vmi, vmg->vma, vmg->start, vmg->end, vmg->pgoff,
- vmg->next);
+ return vma_expand(vmg);
}
/*
--
2.46.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 06/10] mm: avoid using vma_merge() for new VMAs
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (4 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 05/10] mm: abstract vma_expand() to use vma_merge_struct Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 07/10] mm: make vma_prepare() and friends static and internal to vma.c Lorenzo Stoakes
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
resultant function to vma_merge_new_range() to make clear what the purpose
of this function is - a new VMA is desired in the specified range, and we
wish to see if it is possible to 'merge' surrounding VMAs into this range
rather than having to allocate a new VMA.
Note that this function uses vma_expand() exclusively, so it adopts its
requirement that the iterator point at or before the gap. We add an assert
to this effect.
This is as opposed to vma_merge_existing_range(), which will be introduced
in a subsequent commit, and provide the same functionality for cases in
which we are modifying an existing VMA.
In mmap_region() and do_brk_flags() we open-code scenarios where we prefer
to use vma_expand() rather than invoke a full vma_merge() operation.
Abstract this logic and eliminate all of the open-coding, and also use the
same logic for all cases where we add new VMAs, so that rather than
ultimately using vma_merge() we use vma_expand().
Doing so removes duplication and simplifies VMA merging in all such cases,
laying the groundwork for us to eliminate the merging of new VMAs in
vma_merge() altogether.
Also add the ability for the vmg to track state and to report errors,
allowing us to differentiate a failed merge from an inability to allocate
memory in callers.
This makes it far easier to understand what is happening in these cases,
avoiding confusion and bugs and allowing for future optimisation.
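A minimal sketch of the resulting caller pattern (mirroring the
do_brk_flags() change below):

	if (vma_merge_new_range(&vmg))
		goto out;		/* Merge succeeded. */
	else if (vmg_nomem(&vmg))
		goto unacct_fail;	/* Allocation failure, not merely 'no merge'. */

	/* Otherwise no merge was possible; fall back to allocating a new VMA. */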
Also introduce vma_iter_next_rewind() to allow for retrieval of the next,
and (optionally) the prev VMA, rewinding to the start of the previous gap.
Introduce are_anon_vmas_compatible() to abstract the individual VMA
anon_vma comparison for the case of merging on both sides, where the
anon_vma of the VMA being merged may be compatible with prev and next, but
prev's and next's anon_vmas may not be compatible with each other.
Finally also introduce can_vma_merge_left() / can_vma_merge_right() to
check adjacent VMA compatibility and that they are indeed adjacent.
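Condensed, the two new predicates express roughly the following (a sketch
of the logic added in mm/vma.c below):

	/* Merge left: prev must end exactly where the proposed range starts. */
	can_left = prev && prev->vm_end == vmg->start &&
		   can_vma_merge_after(vmg);

	/*
	 * Merge right: next must start exactly where the proposed range ends.
	 * If we are also merging left, prev's and next's anon_vmas must
	 * additionally be compatible with each other.
	 */
	can_right = next && next->vm_start == vmg->end &&
		    can_vma_merge_before(vmg) &&
		    (!can_left || are_anon_vmas_compatible(prev, next));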
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
---
mm/mmap.c | 92 ++++----------
mm/vma.c | 200 +++++++++++++++++++++++++++----
mm/vma.h | 48 +++++++-
tools/testing/vma/vma.c | 33 ++++-
tools/testing/vma/vma_internal.h | 6 +
5 files changed, 279 insertions(+), 100 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b3006efd3fb..02f7b45c3076 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1364,8 +1364,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
- struct vm_area_struct *next, *prev, *merge;
pgoff_t pglen = PHYS_PFN(len);
+ struct vm_area_struct *merge;
unsigned long charged = 0;
struct vma_munmap_struct vms;
struct ma_state mas_detach;
@@ -1389,14 +1389,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (error)
goto gather_failed;
- next = vmg.next = vms.next;
- prev = vmg.prev = vms.prev;
+ vmg.next = vms.next;
+ vmg.prev = vms.prev;
vma = NULL;
} else {
- next = vmg.next = vma_next(&vmi);
- prev = vmg.prev = vma_prev(&vmi);
- if (prev)
- vma_iter_next_range(&vmi);
+ vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
}
/* Check against address space limit. */
@@ -1417,46 +1414,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vmg.flags = vm_flags;
}
- if (vm_flags & VM_SPECIAL)
- goto cannot_expand;
-
- /* Attempt to expand an old mapping */
- /* Check next */
- if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
- vmg.end = next->vm_end;
- vma = vmg.vma = next;
- vmg.pgoff = next->vm_pgoff - pglen;
- /*
- * We set this here so if we will merge with the previous VMA in
- * the code below, can_vma_merge_after() ensures anon_vma
- * compatibility between prev and next.
- */
- vmg.anon_vma = vma->anon_vma;
- vmg.uffd_ctx = vma->vm_userfaultfd_ctx;
- }
-
- /* Check prev */
- if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
- vmg.start = prev->vm_start;
- vma = vmg.vma = prev;
- vmg.pgoff = prev->vm_pgoff;
- vma_prev(&vmi); /* Equivalent to going to the previous range */
- }
-
- if (vma) {
- /* Actually expand, if possible */
- if (!vma_expand(&vmg)) {
- khugepaged_enter_vma(vma, vm_flags);
- goto expanded;
- }
-
- /* If the expand fails, then reposition the vma iterator */
- if (unlikely(vma == prev))
- vma_iter_set(&vmi, addr);
- }
-
-cannot_expand:
-
+ vma = vma_merge_new_range(&vmg);
+ if (vma)
+ goto expanded;
/*
* Determine the object being mapped and call the appropriate
* specific mapper. the address has already been validated, but
@@ -1503,10 +1463,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* If vm_flags changed after call_mmap(), we should try merge
* vma again as we may succeed this time.
*/
- if (unlikely(vm_flags != vma->vm_flags && prev)) {
- merge = vma_merge_new_vma(&vmi, prev, vma,
- vma->vm_start, vma->vm_end,
- vma->vm_pgoff);
+ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
+ vmg.flags = vma->vm_flags;
+ /* If this fails, state is reset ready for a reattempt. */
+ merge = vma_merge_new_range(&vmg);
+
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -1522,6 +1483,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags = vma->vm_flags;
goto unmap_writable;
}
+ vma_iter_config(&vmi, addr, end);
}
vm_flags = vma->vm_flags;
@@ -1554,7 +1516,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_link_file(vma);
/*
- * vma_merge() calls khugepaged_enter_vma() either, the below
+ * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
* call covers the non-merge case.
*/
khugepaged_enter_vma(vma, vma->vm_flags);
@@ -1609,7 +1571,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_iter_set(&vmi, vma->vm_end);
/* Undo any partial mapping done by a device driver. */
- unmap_region(&vmi.mas, vma, prev, next);
+ unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
}
if (writable_file_mapping)
mapping_unmap_writable(file->f_mapping);
@@ -1756,7 +1718,6 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, unsigned long len, unsigned long flags)
{
struct mm_struct *mm = current->mm;
- struct vma_prepare vp;
/*
* Check against address space limits by the changed size
@@ -1780,25 +1741,12 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
vmg.prev = vma;
- if (can_vma_merge_after(&vmg)) {
- vma_iter_config(vmi, vma->vm_start, addr + len);
- if (vma_iter_prealloc(vmi, vma))
- goto unacct_fail;
-
- vma_start_write(vma);
-
- init_vma_prep(&vp, vma);
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
- vma->vm_end = addr + len;
- vm_flags_set(vma, VM_SOFTDIRTY);
- vma_iter_store(vmi, vma);
-
- vma_complete(&vp, vmi, mm);
- validate_mm(mm);
- khugepaged_enter_vma(vma, flags);
+ vma_iter_next_range(vmi);
+
+ if (vma_merge_new_range(&vmg))
goto out;
- }
+ else if (vmg_nomem(&vmg))
+ goto unacct_fail;
}
if (vma)
diff --git a/mm/vma.c b/mm/vma.c
index d1033dade70e..749c4881fd60 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -55,6 +55,13 @@ static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
return anon_vma1 == anon_vma2;
}
+/* Are the anon_vma's belonging to each VMA compatible with one another? */
+static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1,
+ struct vm_area_struct *vma2)
+{
+ return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL);
+}
+
/*
* init_multi_vma_prep() - Initializer for struct vma_prepare
* @vp: The vma_prepare struct
@@ -130,6 +137,44 @@ bool can_vma_merge_after(struct vma_merge_struct *vmg)
return false;
}
+/*
+ * Can the proposed VMA be merged with the left (previous) VMA taking into
+ * account the start position of the proposed range.
+ */
+static bool can_vma_merge_left(struct vma_merge_struct *vmg)
+
+{
+ return vmg->prev && vmg->prev->vm_end == vmg->start &&
+ can_vma_merge_after(vmg);
+}
+
+/*
+ * Can the proposed VMA be merged with the right (next) VMA taking into
+ * account the end position of the proposed range.
+ *
+ * In addition, if we can merge with the left VMA, ensure that left and right
+ * anon_vma's are also compatible.
+ */
+static bool can_vma_merge_right(struct vma_merge_struct *vmg,
+ bool can_merge_left)
+{
+ if (!vmg->next || vmg->end != vmg->next->vm_start ||
+ !can_vma_merge_before(vmg))
+ return false;
+
+ if (!can_merge_left)
+ return true;
+
+ /*
+ * If we can merge with prev (left) and next (right), indicating that
+ * each VMA's anon_vma is compatible with the proposed anon_vma, this
+ * does not mean prev and next are compatible with EACH OTHER.
+ *
+ * We therefore check this in addition to mergeability to either side.
+ */
+ return are_anon_vmas_compatible(vmg->prev, vmg->next);
+}
+
/*
* Close a vm structure and free it.
*/
@@ -464,6 +509,111 @@ void validate_mm(struct mm_struct *mm)
}
#endif /* CONFIG_DEBUG_VM_MAPLE_TREE */
+/*
+ * vma_merge_new_range - Attempt to merge a new VMA into address space
+ *
+ * @vmg: Describes the VMA we are adding, in the range @vmg->start to @vmg->end
+ * (exclusive), which we try to merge with any adjacent VMAs if possible.
+ *
+ * We are about to add a VMA to the address space starting at @vmg->start and
+ * ending at @vmg->end. There are three different possible scenarios:
+ *
+ * 1. There is a VMA with identical properties immediately adjacent to the
+ * proposed new VMA [@vmg->start, @vmg->end) either before or after it -
+ * EXPAND that VMA:
+ *
+ * Proposed: |-----| or |-----|
+ * Existing: |----| |----|
+ *
+ * 2. There are VMAs with identical properties immediately adjacent to the
+ * proposed new VMA [@vmg->start, @vmg->end) both before AND after it -
+ * EXPAND the former and REMOVE the latter:
+ *
+ * Proposed: |-----|
+ * Existing: |----| |----|
+ *
+ * 3. There are no VMAs immediately adjacent to the proposed new VMA or those
+ * VMAs do not have identical attributes - NO MERGE POSSIBLE.
+ *
+ * In instances where we can merge, this function returns the expanded VMA which
+ * will have its range adjusted accordingly and the underlying maple tree also
+ * adjusted.
+ *
+ * Returns: In instances where no merge was possible, NULL. Otherwise, a pointer
+ * to the VMA we expanded.
+ *
+ * This function adjusts @vmg to provide @vmg->next if not already specified,
+ * and adjusts [@vmg->start, @vmg->end) to span the expanded range.
+ *
+ * ASSUMPTIONS:
+ * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
+ * - The caller must have determined that [@vmg->start, @vmg->end) is empty,
+ other than VMAs that will be unmapped should the operation succeed.
+ * - The caller must have specified the previous vma in @vmg->prev.
+ * - The caller must have specified the next vma in @vmg->next.
+ * - The caller must have positioned the vmi at or before the gap.
+ */
+struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
+{
+ struct vm_area_struct *prev = vmg->prev;
+ struct vm_area_struct *next = vmg->next;
+ unsigned long start = vmg->start;
+ unsigned long end = vmg->end;
+ pgoff_t pgoff = vmg->pgoff;
+ pgoff_t pglen = PHYS_PFN(end - start);
+ bool can_merge_left, can_merge_right;
+
+ mmap_assert_write_locked(vmg->mm);
+ VM_WARN_ON(vmg->vma);
+ /* vmi must point at or before the gap. */
+ VM_WARN_ON(vma_iter_addr(vmg->vmi) > end);
+
+ vmg->state = VMA_MERGE_NOMERGE;
+
+ /* Special VMAs are unmergeable, also if no prev/next. */
+ if ((vmg->flags & VM_SPECIAL) || (!prev && !next))
+ return NULL;
+
+ can_merge_left = can_vma_merge_left(vmg);
+ can_merge_right = can_vma_merge_right(vmg, can_merge_left);
+
+ /* If we can merge with the next VMA, adjust vmg accordingly. */
+ if (can_merge_right) {
+ vmg->end = next->vm_end;
+ vmg->vma = next;
+ vmg->pgoff = next->vm_pgoff - pglen;
+ }
+
+ /* If we can merge with the previous VMA, adjust vmg accordingly. */
+ if (can_merge_left) {
+ vmg->start = prev->vm_start;
+ vmg->vma = prev;
+ vmg->pgoff = prev->vm_pgoff;
+
+ vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
+ }
+
+ /*
+ * Now try to expand adjacent VMA(s). This takes care of removing the
+ * following VMA if we have VMAs on both sides.
+ */
+ if (vmg->vma && !vma_expand(vmg)) {
+ khugepaged_enter_vma(vmg->vma, vmg->flags);
+ vmg->state = VMA_MERGE_SUCCESS;
+ return vmg->vma;
+ }
+
+ /* If expansion failed, reset state. Allows us to retry merge later. */
+ vmg->vma = NULL;
+ vmg->start = start;
+ vmg->end = end;
+ vmg->pgoff = pgoff;
+ if (vmg->vma == prev)
+ vma_iter_set(vmg->vmi, start);
+
+ return NULL;
+}
+
/*
* vma_expand - Expand an existing VMA
*
@@ -474,7 +624,11 @@ void validate_mm(struct mm_struct *mm)
* vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
* vmg->next needs to be handled by the caller.
*
- * Returns: 0 on success
+ * Returns: 0 on success.
+ *
+ * ASSUMPTIONS:
+ * - The caller must hold a WRITE lock on vmg->vma->mm->mmap_lock.
+ * - The caller must have set @vmg->vma and @vmg->next.
*/
int vma_expand(struct vma_merge_struct *vmg)
{
@@ -484,6 +638,8 @@ int vma_expand(struct vma_merge_struct *vmg)
struct vm_area_struct *next = vmg->next;
struct vma_prepare vp;
+ mmap_assert_write_locked(vmg->mm);
+
vma_start_write(vma);
if (next && (vma != next) && (vmg->end == next->vm_end)) {
int ret;
@@ -516,6 +672,7 @@ int vma_expand(struct vma_merge_struct *vmg)
return 0;
nomem:
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
if (anon_dup)
unlink_anon_vmas(anon_dup);
return -ENOMEM;
@@ -1029,6 +1186,8 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
pgoff_t pglen = PHYS_PFN(end - addr);
long adj_start = 0;
+ vmg->state = VMA_MERGE_NOMERGE;
+
/*
* We later require that vma->vm_flags == vm_flags,
* so this tests vma->vm_flags & VM_SPECIAL, too.
@@ -1180,13 +1339,19 @@ static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
vma_complete(&vp, vmg->vmi, mm);
validate_mm(mm);
khugepaged_enter_vma(res, vmg->flags);
+
+ vmg->state = VMA_MERGE_SUCCESS;
return res;
prealloc_fail:
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
if (anon_dup)
unlink_anon_vmas(anon_dup);
anon_vma_fail:
+ if (err == -ENOMEM)
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+
vma_iter_set(vmg->vmi, addr);
vma_iter_load(vmg->vmi);
return NULL;
@@ -1293,22 +1458,6 @@ struct vm_area_struct
return vma_modify(&vmg);
}
-/*
- * Attempt to merge a newly mapped VMA with those adjacent to it. The caller
- * must ensure that [start, end) does not overlap any existing VMA.
- */
-struct vm_area_struct
-*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
- struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff)
-{
- VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
-
- vmg.pgoff = pgoff;
-
- return vma_merge(&vmg);
-}
-
/*
* Expand vma by delta bytes, potentially merging with an immediately adjacent
* VMA with identical properties.
@@ -1319,8 +1468,10 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
{
VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
- /* vma is specified as prev, so case 1 or 2 will apply. */
- return vma_merge(&vmg);
+ vmg.next = vma_iter_next_rewind(vmi, NULL);
+ vmg.vma = NULL; /* We use the VMA to populate VMG fields only. */
+
+ return vma_merge_new_range(&vmg);
}
void unlink_file_vma_batch_init(struct unlink_vma_file_batch *vb)
@@ -1421,9 +1572,10 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
struct vm_area_struct *vma = *vmap;
unsigned long vma_start = vma->vm_start;
struct mm_struct *mm = vma->vm_mm;
- struct vm_area_struct *new_vma, *prev;
+ struct vm_area_struct *new_vma;
bool faulted_in_anon_vma = true;
VMA_ITERATOR(vmi, mm, addr);
+ VMG_VMA_STATE(vmg, &vmi, NULL, vma, addr, addr + len);
/*
* If anonymous vma has not yet been faulted, update new pgoff
@@ -1434,11 +1586,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
faulted_in_anon_vma = false;
}
- new_vma = find_vma_prev(mm, addr, &prev);
+ new_vma = find_vma_prev(mm, addr, &vmg.prev);
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */
- new_vma = vma_merge_new_vma(&vmi, prev, vma, addr, addr + len, pgoff);
+ vmg.vma = NULL; /* New VMA range. */
+ vmg.pgoff = pgoff;
+ vmg.next = vma_iter_next_rewind(&vmi, NULL);
+ new_vma = vma_merge_new_range(&vmg);
+
if (new_vma) {
/*
* Source vma may have been merged into new_vma
diff --git a/mm/vma.h b/mm/vma.h
index c9b49c15f15a..497bb49a318e 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -52,6 +52,13 @@ struct vma_munmap_struct {
unsigned long data_vm;
};
+enum vma_merge_state {
+ VMA_MERGE_START,
+ VMA_MERGE_ERROR_NOMEM,
+ VMA_MERGE_NOMERGE,
+ VMA_MERGE_SUCCESS,
+};
+
/* Represents a VMA merge operation. */
struct vma_merge_struct {
struct mm_struct *mm;
@@ -68,8 +75,14 @@ struct vma_merge_struct {
struct mempolicy *policy;
struct vm_userfaultfd_ctx uffd_ctx;
struct anon_vma_name *anon_name;
+ enum vma_merge_state state;
};
+static inline bool vmg_nomem(struct vma_merge_struct *vmg)
+{
+ return vmg->state == VMA_MERGE_ERROR_NOMEM;
+}
+
/* Assumes addr >= vma->vm_start. */
static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
unsigned long addr)
@@ -85,6 +98,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
.end = end_, \
.flags = flags_, \
.pgoff = pgoff_, \
+ .state = VMA_MERGE_START, \
}
#define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_) \
@@ -103,6 +117,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
.policy = vma_policy(vma_), \
.uffd_ctx = vma_->vm_userfaultfd_ctx, \
.anon_name = anon_vma_name(vma_), \
+ .state = VMA_MERGE_START, \
}
#ifdef CONFIG_DEBUG_VM_MAPLE_TREE
@@ -309,10 +324,7 @@ struct vm_area_struct
unsigned long new_flags,
struct vm_userfaultfd_ctx new_ctx);
-struct vm_area_struct
-*vma_merge_new_vma(struct vma_iterator *vmi, struct vm_area_struct *prev,
- struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff);
+struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg);
struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
struct vm_area_struct *vma,
@@ -505,6 +517,34 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
return mas_prev_range(&vmi->mas, 0);
}
+/*
+ * Retrieve the next VMA and rewind the iterator to end of the previous VMA, or
+ * if no previous VMA, to index 0.
+ */
+static inline
+struct vm_area_struct *vma_iter_next_rewind(struct vma_iterator *vmi,
+ struct vm_area_struct **pprev)
+{
+ struct vm_area_struct *next = vma_next(vmi);
+ struct vm_area_struct *prev = vma_prev(vmi);
+
+ /*
+ * Consider the case where no previous VMA exists. We advance to the
+ * next VMA, skipping any gap, then rewind to the start of the range.
+ *
+ * If we were to unconditionally advance to the next range we'd wind up
+ * at the next VMA again, so we check to ensure there is a previous VMA
+ * to skip over.
+ */
+ if (prev)
+ vma_iter_next_range(vmi);
+
+ if (pprev)
+ *pprev = prev;
+
+ return next;
+}
+
#ifdef CONFIG_64BIT
static inline bool vma_is_sealed(struct vm_area_struct *vma)
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index f6c4706a861f..b7cdafec09af 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -101,9 +101,9 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
*/
vmg->next = vma_next(vmg->vmi);
vmg->prev = vma_prev(vmg->vmi);
+ vma_iter_next_range(vmg->vmi);
- vma_iter_set(vmg->vmi, vmg->start);
- return vma_merge(vmg);
+ return vma_merge_new_range(vmg);
}
/*
@@ -162,10 +162,14 @@ static struct vm_area_struct *try_merge_new_vma(struct mm_struct *mm,
merged = merge_new(vmg);
if (merged) {
*was_merged = true;
+ ASSERT_EQ(vmg->state, VMA_MERGE_SUCCESS);
return merged;
}
*was_merged = false;
+
+ ASSERT_EQ(vmg->state, VMA_MERGE_NOMERGE);
+
return alloc_and_link_vma(mm, start, end, pgoff, flags);
}
@@ -595,6 +599,7 @@ static bool test_vma_merge_special_flags(void)
vmg.flags = flags | special_flag;
vma = merge_new(&vmg);
ASSERT_EQ(vma, NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
}
/* 2. Modify VMA with special flag that would otherwise merge. */
@@ -616,6 +621,7 @@ static bool test_vma_merge_special_flags(void)
vmg.flags = flags | special_flag;
vma = merge_existing(&vmg);
ASSERT_EQ(vma, NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
}
cleanup_mm(&mm, &vmi);
@@ -708,6 +714,7 @@ static bool test_vma_merge_with_close(void)
/* The next VMA having a close() operator should cause the merge to fail.*/
ASSERT_EQ(merge_new(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
/* Now create the VMA so we can merge via modified flags */
vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
@@ -719,6 +726,7 @@ static bool test_vma_merge_with_close(void)
* also fail.
*/
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
/* SCENARIO B
*
@@ -744,6 +752,7 @@ static bool test_vma_merge_with_close(void)
vmg.vma = vma;
/* Make sure merge does not occur. */
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
cleanup_mm(&mm, &vmi);
return true;
@@ -792,6 +801,7 @@ static bool test_vma_merge_new_with_close(void)
vmg_set_range(&vmg, 0x2000, 0x5000, 2, flags);
vma = merge_new(&vmg);
ASSERT_NE(vma, NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma->vm_start, 0);
ASSERT_EQ(vma->vm_end, 0x5000);
ASSERT_EQ(vma->vm_pgoff, 0);
@@ -831,6 +841,7 @@ static bool test_merge_existing(void)
vmg.prev = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x3000);
ASSERT_EQ(vma_next->vm_end, 0x9000);
ASSERT_EQ(vma_next->vm_pgoff, 3);
@@ -861,6 +872,7 @@ static bool test_merge_existing(void)
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x2000);
ASSERT_EQ(vma_next->vm_end, 0x9000);
ASSERT_EQ(vma_next->vm_pgoff, 2);
@@ -889,6 +901,7 @@ static bool test_merge_existing(void)
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x6000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -920,6 +933,7 @@ static bool test_merge_existing(void)
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x7000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -948,6 +962,7 @@ static bool test_merge_existing(void)
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x9000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -981,31 +996,37 @@ static bool test_merge_existing(void)
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x6000, 0x7000, 6, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x4000, 0x7000, 4, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x4000, 0x6000, 4, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
vmg_set_range(&vmg, 0x5000, 0x6000, 5, flags);
vmg.prev = vma;
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
@@ -1071,6 +1092,7 @@ static bool test_anon_vma_non_mergeable(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x7000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -1106,6 +1128,7 @@ static bool test_anon_vma_non_mergeable(void)
vmg.prev = vma_prev;
ASSERT_EQ(merge_new(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x7000);
ASSERT_EQ(vma_prev->vm_pgoff, 0);
@@ -1181,6 +1204,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x8000);
@@ -1209,6 +1233,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x8000);
@@ -1236,6 +1261,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
ASSERT_EQ(vma_prev->vm_end, 0x5000);
@@ -1263,6 +1289,7 @@ static bool test_dup_anon_vma(void)
vmg.vma = vma;
ASSERT_EQ(merge_existing(&vmg), vma_next);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x3000);
ASSERT_EQ(vma_next->vm_end, 0x8000);
@@ -1303,6 +1330,7 @@ static bool test_vmi_prealloc_fail(void)
/* This will cause the merge to fail. */
ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
/* We will already have assigned the anon_vma. */
ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
/* And it was both cloned and unlinked. */
@@ -1327,6 +1355,7 @@ static bool test_vmi_prealloc_fail(void)
fail_prealloc = true;
ASSERT_EQ(expand_existing(&vmg), -ENOMEM);
+ ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
ASSERT_TRUE(dummy_anon_vma.was_cloned);
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index a3c262c6eb73..c5b9da034511 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -740,6 +740,12 @@ static inline void vma_iter_free(struct vma_iterator *vmi)
mas_destroy(&vmi->mas);
}
+static inline
+struct vm_area_struct *vma_iter_next_range(struct vma_iterator *vmi)
+{
+ return mas_next_range(&vmi->mas, ULONG_MAX);
+}
+
static inline void vm_acct_memory(long pages)
{
}
--
2.46.0
* [PATCH v3 07/10] mm: make vma_prepare() and friends static and internal to vma.c
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (5 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 06/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 08/10] mm: introduce commit_merge(), abstracting final commit of merge Lorenzo Stoakes
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Now that we have abstracted merge behaviour for new VMA ranges, we are able
to render vma_prepare(), init_vma_prep(), vma_complete(),
can_vma_merge_before() and can_vma_merge_after() static and internal to
vma.c.
These are internal implementation details of kernel VMA manipulation and
merging mechanisms and thus should not be exposed. This also renders the
functions userland testable.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 318 +++++++++++++++++++++++++++----------------------------
mm/vma.h | 25 -----
2 files changed, 158 insertions(+), 185 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 749c4881fd60..eb4f32705a41 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -104,8 +104,7 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
*
* We assume the vma may be removed as part of the merge.
*/
-bool
-can_vma_merge_before(struct vma_merge_struct *vmg)
+static bool can_vma_merge_before(struct vma_merge_struct *vmg)
{
pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
@@ -127,7 +126,7 @@ can_vma_merge_before(struct vma_merge_struct *vmg)
*
* We assume that vma is not removed as part of the merge.
*/
-bool can_vma_merge_after(struct vma_merge_struct *vmg)
+static bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
@@ -137,6 +136,162 @@ bool can_vma_merge_after(struct vma_merge_struct *vmg)
return false;
}
+static void __vma_link_file(struct vm_area_struct *vma,
+ struct address_space *mapping)
+{
+ if (vma_is_shared_maywrite(vma))
+ mapping_allow_writable(mapping);
+
+ flush_dcache_mmap_lock(mapping);
+ vma_interval_tree_insert(vma, &mapping->i_mmap);
+ flush_dcache_mmap_unlock(mapping);
+}
+
+/*
+ * Requires inode->i_mapping->i_mmap_rwsem
+ */
+static void __remove_shared_vm_struct(struct vm_area_struct *vma,
+ struct address_space *mapping)
+{
+ if (vma_is_shared_maywrite(vma))
+ mapping_unmap_writable(mapping);
+
+ flush_dcache_mmap_lock(mapping);
+ vma_interval_tree_remove(vma, &mapping->i_mmap);
+ flush_dcache_mmap_unlock(mapping);
+}
+
+/*
+ * vma_prepare() - Helper function for handling locking VMAs prior to altering
+ * @vp: The initialized vma_prepare struct
+ */
+static void vma_prepare(struct vma_prepare *vp)
+{
+ if (vp->file) {
+ uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
+
+ if (vp->adj_next)
+ uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
+ vp->adj_next->vm_end);
+
+ i_mmap_lock_write(vp->mapping);
+ if (vp->insert && vp->insert->vm_file) {
+ /*
+ * Put into interval tree now, so instantiated pages
+ * are visible to arm/parisc __flush_dcache_page
+ * throughout; but we cannot insert into address
+ * space until vma start or end is updated.
+ */
+ __vma_link_file(vp->insert,
+ vp->insert->vm_file->f_mapping);
+ }
+ }
+
+ if (vp->anon_vma) {
+ anon_vma_lock_write(vp->anon_vma);
+ anon_vma_interval_tree_pre_update_vma(vp->vma);
+ if (vp->adj_next)
+ anon_vma_interval_tree_pre_update_vma(vp->adj_next);
+ }
+
+ if (vp->file) {
+ flush_dcache_mmap_lock(vp->mapping);
+ vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
+ if (vp->adj_next)
+ vma_interval_tree_remove(vp->adj_next,
+ &vp->mapping->i_mmap);
+ }
+
+}
+
+/*
+ * vma_complete- Helper function for handling the unlocking after altering VMAs,
+ * or for inserting a VMA.
+ *
+ * @vp: The vma_prepare struct
+ * @vmi: The vma iterator
+ * @mm: The mm_struct
+ */
+static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
+ struct mm_struct *mm)
+{
+ if (vp->file) {
+ if (vp->adj_next)
+ vma_interval_tree_insert(vp->adj_next,
+ &vp->mapping->i_mmap);
+ vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
+ flush_dcache_mmap_unlock(vp->mapping);
+ }
+
+ if (vp->remove && vp->file) {
+ __remove_shared_vm_struct(vp->remove, vp->mapping);
+ if (vp->remove2)
+ __remove_shared_vm_struct(vp->remove2, vp->mapping);
+ } else if (vp->insert) {
+ /*
+ * split_vma has split insert from vma, and needs
+ * us to insert it before dropping the locks
+ * (it may either follow vma or precede it).
+ */
+ vma_iter_store(vmi, vp->insert);
+ mm->map_count++;
+ }
+
+ if (vp->anon_vma) {
+ anon_vma_interval_tree_post_update_vma(vp->vma);
+ if (vp->adj_next)
+ anon_vma_interval_tree_post_update_vma(vp->adj_next);
+ anon_vma_unlock_write(vp->anon_vma);
+ }
+
+ if (vp->file) {
+ i_mmap_unlock_write(vp->mapping);
+ uprobe_mmap(vp->vma);
+
+ if (vp->adj_next)
+ uprobe_mmap(vp->adj_next);
+ }
+
+ if (vp->remove) {
+again:
+ vma_mark_detached(vp->remove, true);
+ if (vp->file) {
+ uprobe_munmap(vp->remove, vp->remove->vm_start,
+ vp->remove->vm_end);
+ fput(vp->file);
+ }
+ if (vp->remove->anon_vma)
+ anon_vma_merge(vp->vma, vp->remove);
+ mm->map_count--;
+ mpol_put(vma_policy(vp->remove));
+ if (!vp->remove2)
+ WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
+ vm_area_free(vp->remove);
+
+ /*
+ * In mprotect's case 6 (see comments on vma_merge),
+ * we are removing both mid and next vmas
+ */
+ if (vp->remove2) {
+ vp->remove = vp->remove2;
+ vp->remove2 = NULL;
+ goto again;
+ }
+ }
+ if (vp->insert && vp->file)
+ uprobe_mmap(vp->insert);
+}
+
+/*
+ * init_vma_prep() - Initializer wrapper for vma_prepare struct
+ * @vp: The vma_prepare struct
+ * @vma: The vma that will be altered once locked
+ */
+static void init_vma_prep(struct vma_prepare *vp, struct vm_area_struct *vma)
+{
+ init_multi_vma_prep(vp, vma, NULL, NULL, NULL);
+}
+
/*
* Can the proposed VMA be merged with the left (previous) VMA taking into
* account the start position of the proposed range.
@@ -315,31 +470,6 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
return __split_vma(vmi, vma, addr, new_below);
}
-/*
- * init_vma_prep() - Initializer wrapper for vma_prepare struct
- * @vp: The vma_prepare struct
- * @vma: The vma that will be altered once locked
- */
-void init_vma_prep(struct vma_prepare *vp,
- struct vm_area_struct *vma)
-{
- init_multi_vma_prep(vp, vma, NULL, NULL, NULL);
-}
-
-/*
- * Requires inode->i_mapping->i_mmap_rwsem
- */
-static void __remove_shared_vm_struct(struct vm_area_struct *vma,
- struct address_space *mapping)
-{
- if (vma_is_shared_maywrite(vma))
- mapping_unmap_writable(mapping);
-
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, &mapping->i_mmap);
- flush_dcache_mmap_unlock(mapping);
-}
-
/*
* vma has some anon_vma assigned, and is already inserted on that
* anon_vma's interval trees.
@@ -372,60 +502,6 @@ anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma)
anon_vma_interval_tree_insert(avc, &avc->anon_vma->rb_root);
}
-static void __vma_link_file(struct vm_area_struct *vma,
- struct address_space *mapping)
-{
- if (vma_is_shared_maywrite(vma))
- mapping_allow_writable(mapping);
-
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_insert(vma, &mapping->i_mmap);
- flush_dcache_mmap_unlock(mapping);
-}
-
-/*
- * vma_prepare() - Helper function for handling locking VMAs prior to altering
- * @vp: The initialized vma_prepare struct
- */
-void vma_prepare(struct vma_prepare *vp)
-{
- if (vp->file) {
- uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
-
- if (vp->adj_next)
- uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
- vp->adj_next->vm_end);
-
- i_mmap_lock_write(vp->mapping);
- if (vp->insert && vp->insert->vm_file) {
- /*
- * Put into interval tree now, so instantiated pages
- * are visible to arm/parisc __flush_dcache_page
- * throughout; but we cannot insert into address
- * space until vma start or end is updated.
- */
- __vma_link_file(vp->insert,
- vp->insert->vm_file->f_mapping);
- }
- }
-
- if (vp->anon_vma) {
- anon_vma_lock_write(vp->anon_vma);
- anon_vma_interval_tree_pre_update_vma(vp->vma);
- if (vp->adj_next)
- anon_vma_interval_tree_pre_update_vma(vp->adj_next);
- }
-
- if (vp->file) {
- flush_dcache_mmap_lock(vp->mapping);
- vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
- if (vp->adj_next)
- vma_interval_tree_remove(vp->adj_next,
- &vp->mapping->i_mmap);
- }
-
-}
-
/*
* dup_anon_vma() - Helper function to duplicate anon_vma
* @dst: The destination VMA
@@ -715,84 +791,6 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
return 0;
}
-/*
- * vma_complete- Helper function for handling the unlocking after altering VMAs,
- * or for inserting a VMA.
- *
- * @vp: The vma_prepare struct
- * @vmi: The vma iterator
- * @mm: The mm_struct
- */
-void vma_complete(struct vma_prepare *vp,
- struct vma_iterator *vmi, struct mm_struct *mm)
-{
- if (vp->file) {
- if (vp->adj_next)
- vma_interval_tree_insert(vp->adj_next,
- &vp->mapping->i_mmap);
- vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
- flush_dcache_mmap_unlock(vp->mapping);
- }
-
- if (vp->remove && vp->file) {
- __remove_shared_vm_struct(vp->remove, vp->mapping);
- if (vp->remove2)
- __remove_shared_vm_struct(vp->remove2, vp->mapping);
- } else if (vp->insert) {
- /*
- * split_vma has split insert from vma, and needs
- * us to insert it before dropping the locks
- * (it may either follow vma or precede it).
- */
- vma_iter_store(vmi, vp->insert);
- mm->map_count++;
- }
-
- if (vp->anon_vma) {
- anon_vma_interval_tree_post_update_vma(vp->vma);
- if (vp->adj_next)
- anon_vma_interval_tree_post_update_vma(vp->adj_next);
- anon_vma_unlock_write(vp->anon_vma);
- }
-
- if (vp->file) {
- i_mmap_unlock_write(vp->mapping);
- uprobe_mmap(vp->vma);
-
- if (vp->adj_next)
- uprobe_mmap(vp->adj_next);
- }
-
- if (vp->remove) {
-again:
- vma_mark_detached(vp->remove, true);
- if (vp->file) {
- uprobe_munmap(vp->remove, vp->remove->vm_start,
- vp->remove->vm_end);
- fput(vp->file);
- }
- if (vp->remove->anon_vma)
- anon_vma_merge(vp->vma, vp->remove);
- mm->map_count--;
- mpol_put(vma_policy(vp->remove));
- if (!vp->remove2)
- WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
- vm_area_free(vp->remove);
-
- /*
- * In mprotect's case 6 (see comments on vma_merge),
- * we are removing both mid and next vmas
- */
- if (vp->remove2) {
- vp->remove = vp->remove2;
- vp->remove2 = NULL;
- goto again;
- }
- }
- if (vp->insert && vp->file)
- uprobe_mmap(vp->insert);
-}
-
static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
struct ma_state *mas_detach, bool mm_wr_locked)
{
diff --git a/mm/vma.h b/mm/vma.h
index 497bb49a318e..370d3246f147 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -132,17 +132,6 @@ void anon_vma_interval_tree_pre_update_vma(struct vm_area_struct *vma);
/* Required for expand_downwards(). */
void anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma);
-/* Required for do_brk_flags(). */
-void vma_prepare(struct vma_prepare *vp);
-
-/* Required for do_brk_flags(). */
-void init_vma_prep(struct vma_prepare *vp,
- struct vm_area_struct *vma);
-
-/* Required for do_brk_flags(). */
-void vma_complete(struct vma_prepare *vp,
- struct vma_iterator *vmi, struct mm_struct *mm);
-
int vma_expand(struct vma_merge_struct *vmg);
int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff);
@@ -277,20 +266,6 @@ void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed);
void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct vm_area_struct *next);
-/*
- * Can we merge the VMA described by vmg into the following VMA vmg->next?
- *
- * Required by mmap_region().
- */
-bool can_vma_merge_before(struct vma_merge_struct *vmg);
-
-/*
- * Can we merge the VMA described by vmg into the preceding VMA vmg->prev?
- *
- * Required by mmap_region() and do_brk_flags().
- */
-bool can_vma_merge_after(struct vma_merge_struct *vmg);
-
/* We are about to modify the VMA's flags. */
struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
struct vm_area_struct *prev, struct vm_area_struct *vma,
--
2.46.0
* [PATCH v3 08/10] mm: introduce commit_merge(), abstracting final commit of merge
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (6 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 07/10] mm: make vma_prepare() and friends static and internal to vma.c Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range() Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
Pull the part of vma_expand() which actually commits the merge operation -
that is, which inserts the VMA into the maple tree and sets its
vma->vm_start and vma->vm_end fields - into its own function, commit_merge().
We implement only the parts needed for vma_expand(), which, as a result of
previous work, is now also the means by which new VMA ranges are merged.
The next commit in the series will implement merging of existing ranges,
extending commit_merge() to accommodate this case so that all merges use
this common code.
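For reference, the sequence being factored out of vma_expand() into
commit_merge() is, in condensed form (a sketch only - see the diff below
for the actual implementation):
	init_multi_vma_prep(&vp, vmg->vma, NULL, remove, NULL);
	/* Note: the vma iterator must be pointing to vmg->start. */
	vma_iter_config(vmg->vmi, vmg->start, vmg->end);
	if (vma_iter_prealloc(vmg->vmi, vmg->vma))
		return -ENOMEM;	/* vma_expand() reports this as 'nomem'. */
	vma_prepare(&vp);	/* Take locks, detach from interval trees. */
	vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, 0);
	vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
	vma_iter_store(vmg->vmi, vmg->vma);	/* Write new range to the maple tree. */
	vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm);	/* Unlock, free any removed VMA. */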
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 39 +++++++++++++++++++++++++++------------
1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index eb4f32705a41..566cad2338dd 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -585,6 +585,31 @@ void validate_mm(struct mm_struct *mm)
}
#endif /* CONFIG_DEBUG_VM_MAPLE_TREE */
+/* Actually perform the VMA merge operation. */
+static int commit_merge(struct vma_merge_struct *vmg,
+ struct vm_area_struct *remove)
+{
+ struct vma_prepare vp;
+
+ init_multi_vma_prep(&vp, vmg->vma, NULL, remove, NULL);
+
+ /* Note: vma iterator must be pointing to 'start'. */
+ vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+
+ if (vma_iter_prealloc(vmg->vmi, vmg->vma))
+ return -ENOMEM;
+
+ vma_prepare(&vp);
+ vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, 0);
+ vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
+
+ vma_iter_store(vmg->vmi, vmg->vma);
+
+ vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm);
+
+ return 0;
+}
+
/*
* vma_merge_new_range - Attempt to merge a new VMA into address space
*
@@ -712,7 +737,6 @@ int vma_expand(struct vma_merge_struct *vmg)
bool remove_next = false;
struct vm_area_struct *vma = vmg->vma;
struct vm_area_struct *next = vmg->next;
- struct vma_prepare vp;
mmap_assert_write_locked(vmg->mm);
@@ -727,24 +751,15 @@ int vma_expand(struct vma_merge_struct *vmg)
return ret;
}
- init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
/* Not merging but overwriting any part of next is not handled. */
- VM_WARN_ON(next && !vp.remove &&
+ VM_WARN_ON(next && !remove_next &&
next != vma && vmg->end > next->vm_start);
/* Only handles expanding */
VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
- /* Note: vma iterator must be pointing to 'start' */
- vma_iter_config(vmg->vmi, vmg->start, vmg->end);
- if (vma_iter_prealloc(vmg->vmi, vma))
+ if (commit_merge(vmg, remove_next ? next : NULL))
goto nomem;
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vmg->start, vmg->end, 0);
- vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
- vma_iter_store(vmg->vmi, vma);
-
- vma_complete(&vp, vmg->vmi, vma->vm_mm);
return 0;
nomem:
--
2.46.0
* [PATCH v3 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range()
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (7 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 08/10] mm: introduce commit_merge(), abstracting final commit of merge Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
2024-08-30 18:10 ` [PATCH v3 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
The existing vma_merge() function is no longer required to handle what were
previously referred to as cases 1-3 (i.e. the merging of a new VMA), as this
is now handled by vma_merge_new_range().
Additionally, simplify the convoluted control flow of the original,
maintaining identical logic but expressing it more clearly, and do away with
the complicated set of numbered cases, instead logically examining each
possible outcome - merging of both the previous and subsequent VMAs, merging
of the previous VMA alone, and merging of the subsequent VMA alone.
We now utilise the previously implemented commit_merge() function to share
logic with vma_expand(), de-duplicating code and providing less surface area
for bugs and confusion. In order to do so, we adjust commit_merge() to
accept parameters specific to merging existing ranges.
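As an illustration, a 'merge both' invocation expressed in terms of the
userland test harness (a sketch only, using helpers already present in
tools/testing/vma/vma.c - this is not an addition to the tests) looks like:
	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
	/* Modify the whole of vma, so both prev and next can be merged in. */
	vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
	vmg.prev = vma_prev;
	vmg.vma = vma;
	/* prev is extended; vma and next are deleted as part of the merge. */
	ASSERT_EQ(merge_existing(&vmg), vma_prev);
	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
	ASSERT_EQ(vma_prev->vm_start, 0);
	ASSERT_EQ(vma_prev->vm_end, 0x9000);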
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 508 ++++++++++++++++++++--------------------
tools/testing/vma/vma.c | 9 +-
2 files changed, 264 insertions(+), 253 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 566cad2338dd..393bef832604 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -587,29 +587,278 @@ void validate_mm(struct mm_struct *mm)
/* Actually perform the VMA merge operation. */
static int commit_merge(struct vma_merge_struct *vmg,
- struct vm_area_struct *remove)
+ struct vm_area_struct *adjust,
+ struct vm_area_struct *remove,
+ struct vm_area_struct *remove2,
+ long adj_start,
+ bool expanded)
{
struct vma_prepare vp;
- init_multi_vma_prep(&vp, vmg->vma, NULL, remove, NULL);
+ init_multi_vma_prep(&vp, vmg->vma, adjust, remove, remove2);
- /* Note: vma iterator must be pointing to 'start'. */
- vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+ VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
+ vp.anon_vma != adjust->anon_vma);
+
+ if (expanded) {
+ /* Note: vma iterator must be pointing to 'start'. */
+ vma_iter_config(vmg->vmi, vmg->start, vmg->end);
+ } else {
+ vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
+ adjust->vm_end);
+ }
if (vma_iter_prealloc(vmg->vmi, vmg->vma))
return -ENOMEM;
vma_prepare(&vp);
- vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, 0);
+ vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, adj_start);
vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
- vma_iter_store(vmg->vmi, vmg->vma);
+ if (expanded)
+ vma_iter_store(vmg->vmi, vmg->vma);
+
+ if (adj_start) {
+ adjust->vm_start += adj_start;
+ adjust->vm_pgoff += PHYS_PFN(adj_start);
+ if (adj_start < 0) {
+ WARN_ON(expanded);
+ vma_iter_store(vmg->vmi, adjust);
+ }
+ }
vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm);
return 0;
}
+/*
+ * vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
+ * attributes modified.
+ *
+ * @vmg: Describes the modifications being made to a VMA and associated
+ * metadata.
+ *
+ * When the attributes of a range within a VMA change, then it might be possible
+ * for immediately adjacent VMAs to be merged into that VMA due to having
+ * identical properties.
+ *
+ * This function checks for the existence of any such mergeable VMAs and updates
+ * the maple tree describing the @vmg->vma->vm_mm address space to account for
+ * this, as well as any VMAs shrunk/expanded/deleted as a result of this merge.
+ *
+ * As part of this operation, if a merge occurs, the @vmg object will have its
+ * vma, start, end, and pgoff fields modified to execute the merge. Subsequent
+ * calls to this function should reset these fields.
+ *
+ * Returns: The merged VMA if merge succeeds, or NULL otherwise.
+ *
+ * ASSUMPTIONS:
+ * - The caller must assign the VMA to be modified to @vmg->vma.
+ * - The caller must have set @vmg->prev to the previous VMA, if there is one.
+ * - The caller must not set @vmg->next, as we determine this.
+ * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
+ * - vmi must be positioned within [@vmg->vma->vm_start, @vmg->vma->vm_end).
+ */
+static struct vm_area_struct *vma_merge_existing_range(struct vma_merge_struct *vmg)
+{
+ struct vm_area_struct *vma = vmg->vma;
+ struct vm_area_struct *prev = vmg->prev;
+ struct vm_area_struct *next, *res;
+ struct vm_area_struct *anon_dup = NULL;
+ struct vm_area_struct *adjust = NULL;
+ unsigned long start = vmg->start;
+ unsigned long end = vmg->end;
+ bool left_side = vma && start == vma->vm_start;
+ bool right_side = vma && end == vma->vm_end;
+ int err = 0;
+ long adj_start = 0;
+ bool merge_will_delete_vma, merge_will_delete_next;
+ bool merge_left, merge_right, merge_both;
+ bool expanded;
+
+ mmap_assert_write_locked(vmg->mm);
+ VM_WARN_ON(!vma); /* We are modifying a VMA, so caller must specify. */
+ VM_WARN_ON(vmg->next); /* We set this. */
+ VM_WARN_ON(prev && start <= prev->vm_start);
+ VM_WARN_ON(start >= end);
+ /*
+ * If vma == prev, then we are offset into a VMA. Otherwise, if we are
+ * not, we must span a portion of the VMA.
+ */
+ VM_WARN_ON(vma && ((vma != prev && vmg->start != vma->vm_start) ||
+ vmg->end > vma->vm_end));
+ /* The vmi must be positioned within vmg->vma. */
+ VM_WARN_ON(vma && !(vma_iter_addr(vmg->vmi) >= vma->vm_start &&
+ vma_iter_addr(vmg->vmi) < vma->vm_end));
+
+ vmg->state = VMA_MERGE_NOMERGE;
+
+ /*
+ * If a special mapping or if the range being modified is neither at the
+ * furthermost left or right side of the VMA, then we have no chance of
+ * merging and should abort.
+ */
+ if (vmg->flags & VM_SPECIAL || (!left_side && !right_side))
+ return NULL;
+
+ if (left_side)
+ merge_left = can_vma_merge_left(vmg);
+ else
+ merge_left = false;
+
+ if (right_side) {
+ next = vmg->next = vma_iter_next_range(vmg->vmi);
+ vma_iter_prev_range(vmg->vmi);
+
+ merge_right = can_vma_merge_right(vmg, merge_left);
+ } else {
+ merge_right = false;
+ next = NULL;
+ }
+
+ if (merge_left) /* If merging prev, position iterator there. */
+ vma_prev(vmg->vmi);
+ else if (!merge_right) /* If we have nothing to merge, abort. */
+ return NULL;
+
+ merge_both = merge_left && merge_right;
+ /* If we span the entire VMA, a merge implies it will be deleted. */
+ merge_will_delete_vma = left_side && right_side;
+ /*
+ * If we merge both VMAs, then next is also deleted. This implies
+ * merge_will_delete_vma also.
+ */
+ merge_will_delete_next = merge_both;
+
+ /* No matter what happens, we will be adjusting vma. */
+ vma_start_write(vma);
+
+ if (merge_left)
+ vma_start_write(prev);
+
+ if (merge_right)
+ vma_start_write(next);
+
+ if (merge_both) {
+ /*
+ * |<----->|
+ * |-------*********-------|
+ * prev vma next
+ * extend delete delete
+ */
+
+ vmg->vma = prev;
+ vmg->start = prev->vm_start;
+ vmg->end = next->vm_end;
+ vmg->pgoff = prev->vm_pgoff;
+
+ /*
+ * We already ensured anon_vma compatibility above, so now it's
+ * simply a case of, if prev has no anon_vma object, which of
+ * next or vma contains the anon_vma we must duplicate.
+ */
+ err = dup_anon_vma(prev, next->anon_vma ? next : vma, &anon_dup);
+ } else if (merge_left) {
+ /*
+ * |<----->| OR
+ * |<--------->|
+ * |-------*************
+ * prev vma
+ * extend shrink/delete
+ */
+
+ vmg->vma = prev;
+ vmg->start = prev->vm_start;
+ vmg->pgoff = prev->vm_pgoff;
+
+ if (merge_will_delete_vma) {
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing vma, so it skipped the check for
+ * vm_ops->close, but we are removing vma.
+ */
+ if (vma->vm_ops && vma->vm_ops->close)
+ err = -EINVAL;
+ } else {
+ adjust = vma;
+ adj_start = vmg->end - vma->vm_start;
+ }
+
+ if (!err)
+ err = dup_anon_vma(prev, vma, &anon_dup);
+ } else { /* merge_right */
+ /*
+ * |<----->| OR
+ * |<--------->|
+ * *************-------|
+ * vma next
+ * shrink/delete extend
+ */
+
+ pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
+
+ VM_WARN_ON(!merge_right);
+ /* If we are offset into a VMA, then prev must be vma. */
+ VM_WARN_ON(vmg->start > vma->vm_start && prev && vma != prev);
+
+ if (merge_will_delete_vma) {
+ vmg->vma = next;
+ vmg->end = next->vm_end;
+ vmg->pgoff = next->vm_pgoff - pglen;
+ } else {
+ /*
+ * We shrink vma and expand next.
+ *
+ * IMPORTANT: This is the ONLY case where the final
+ * merged VMA is NOT vmg->vma, but rather vmg->next.
+ */
+
+ vmg->start = vma->vm_start;
+ vmg->end = start;
+ vmg->pgoff = vma->vm_pgoff;
+
+ adjust = next;
+ adj_start = -(vma->vm_end - start);
+ }
+
+ err = dup_anon_vma(next, vma, &anon_dup);
+ }
+
+ if (err)
+ goto abort;
+
+ /*
+ * In nearly all cases, we expand vmg->vma. There is one exception -
+ * merge_right where we partially span the VMA. In this case we shrink
+ * the end of vmg->vma and adjust the start of vmg->next accordingly.
+ */
+ expanded = !merge_right || merge_will_delete_vma;
+
+ if (commit_merge(vmg, adjust,
+ merge_will_delete_vma ? vma : NULL,
+ merge_will_delete_next ? next : NULL,
+ adj_start, expanded)) {
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
+
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+ return NULL;
+ }
+
+ res = merge_left ? prev : next;
+ khugepaged_enter_vma(res, vmg->flags);
+
+ vmg->state = VMA_MERGE_SUCCESS;
+ return res;
+
+abort:
+ vma_iter_set(vmg->vmi, start);
+ vma_iter_load(vmg->vmi);
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+ return NULL;
+}
+
/*
* vma_merge_new_range - Attempt to merge a new VMA into address space
*
@@ -757,7 +1006,7 @@ int vma_expand(struct vma_merge_struct *vmg)
/* Only handles expanding */
VM_WARN_ON(vma->vm_start < vmg->start || vma->vm_end > vmg->end);
- if (commit_merge(vmg, remove_next ? next : NULL))
+ if (commit_merge(vmg, NULL, remove_next ? next : NULL, NULL, 0, true))
goto nomem;
return 0;
@@ -1127,249 +1376,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
}
-/*
- * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
- * figure out whether that can be merged with its predecessor or its
- * successor. Or both (it neatly fills a hole).
- *
- * In most cases - when called for mmap, brk or mremap - [addr,end) is
- * certain not to be mapped by the time vma_merge is called; but when
- * called for mprotect, it is certain to be already mapped (either at
- * an offset within prev, or at the start of next), and the flags of
- * this area are about to be changed to vm_flags - and the no-change
- * case has already been eliminated.
- *
- * The following mprotect cases have to be considered, where **** is
- * the area passed down from mprotect_fixup, never extending beyond one
- * vma, PPPP is the previous vma, CCCC is a concurrent vma that starts
- * at the same address as **** and is of the same or larger span, and
- * NNNN the next vma after ****:
- *
- * **** **** ****
- * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPCCCCCC
- * cannot merge might become might become
- * PPNNNNNNNNNN PPPPPPPPPPCC
- * mmap, brk or case 4 below case 5 below
- * mremap move:
- * **** ****
- * PPPP NNNN PPPPCCCCNNNN
- * might become might become
- * PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or
- * PPPPPPPPNNNN 2 or PPPPPPPPNNNN 7 or
- * PPPPNNNNNNNN 3 PPPPNNNNNNNN 8
- *
- * It is important for case 8 that the vma CCCC overlapping the
- * region **** is never going to extended over NNNN. Instead NNNN must
- * be extended in region **** and CCCC must be removed. This way in
- * all cases where vma_merge succeeds, the moment vma_merge drops the
- * rmap_locks, the properties of the merged vma will be already
- * correct for the whole merged range. Some of those properties like
- * vm_page_prot/vm_flags may be accessed by rmap_walks and they must
- * be correct for the whole merged range immediately after the
- * rmap_locks are released. Otherwise if NNNN would be removed and
- * CCCC would be extended over the NNNN range, remove_migration_ptes
- * or other rmap walkers (if working on addresses beyond the "end"
- * parameter) may establish ptes with the wrong permissions of CCCC
- * instead of the right permissions of NNNN.
- *
- * In the code below:
- * PPPP is represented by *prev
- * CCCC is represented by *curr or not represented at all (NULL)
- * NNNN is represented by *next or not represented at all (NULL)
- * **** is not represented - it will be merged and the vma containing the
- * area is returned, or the function will return NULL
- */
-static struct vm_area_struct *vma_merge(struct vma_merge_struct *vmg)
-{
- struct mm_struct *mm = vmg->mm;
- struct vm_area_struct *prev = vmg->prev;
- struct vm_area_struct *curr, *next, *res;
- struct vm_area_struct *vma, *adjust, *remove, *remove2;
- struct vm_area_struct *anon_dup = NULL;
- struct vma_prepare vp;
- pgoff_t vma_pgoff;
- int err = 0;
- bool merge_prev = false;
- bool merge_next = false;
- bool vma_expanded = false;
- unsigned long addr = vmg->start;
- unsigned long end = vmg->end;
- unsigned long vma_start = addr;
- unsigned long vma_end = end;
- pgoff_t pglen = PHYS_PFN(end - addr);
- long adj_start = 0;
-
- vmg->state = VMA_MERGE_NOMERGE;
-
- /*
- * We later require that vma->vm_flags == vm_flags,
- * so this tests vma->vm_flags & VM_SPECIAL, too.
- */
- if (vmg->flags & VM_SPECIAL)
- return NULL;
-
- /* Does the input range span an existing VMA? (cases 5 - 8) */
- curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
-
- if (!curr || /* cases 1 - 4 */
- end == curr->vm_end) /* cases 6 - 8, adjacent VMA */
- next = vmg->next = vma_lookup(mm, end);
- else
- next = vmg->next = NULL; /* case 5 */
-
- if (prev) {
- vma_start = prev->vm_start;
- vma_pgoff = prev->vm_pgoff;
-
- /* Can we merge the predecessor? */
- if (addr == prev->vm_end && can_vma_merge_after(vmg)) {
- merge_prev = true;
- vma_prev(vmg->vmi);
- }
- }
-
- /* Can we merge the successor? */
- if (next && can_vma_merge_before(vmg)) {
- merge_next = true;
- }
-
- /* Verify some invariant that must be enforced by the caller. */
- VM_WARN_ON(prev && addr <= prev->vm_start);
- VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
- VM_WARN_ON(addr >= end);
-
- if (!merge_prev && !merge_next)
- return NULL; /* Not mergeable. */
-
- if (merge_prev)
- vma_start_write(prev);
-
- res = vma = prev;
- remove = remove2 = adjust = NULL;
-
- /* Can we merge both the predecessor and the successor? */
- if (merge_prev && merge_next &&
- is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
- vma_start_write(next);
- remove = next; /* case 1 */
- vma_end = next->vm_end;
- err = dup_anon_vma(prev, next, &anon_dup);
- if (curr) { /* case 6 */
- vma_start_write(curr);
- remove = curr;
- remove2 = next;
- /*
- * Note that the dup_anon_vma below cannot overwrite err
- * since the first caller would do nothing unless next
- * has an anon_vma.
- */
- if (!next->anon_vma)
- err = dup_anon_vma(prev, curr, &anon_dup);
- }
- } else if (merge_prev) { /* case 2 */
- if (curr) {
- vma_start_write(curr);
- if (end == curr->vm_end) { /* case 7 */
- /*
- * can_vma_merge_after() assumed we would not be
- * removing prev vma, so it skipped the check
- * for vm_ops->close, but we are removing curr
- */
- if (curr->vm_ops && curr->vm_ops->close)
- err = -EINVAL;
- remove = curr;
- } else { /* case 5 */
- adjust = curr;
- adj_start = (end - curr->vm_start);
- }
- if (!err)
- err = dup_anon_vma(prev, curr, &anon_dup);
- }
- } else { /* merge_next */
- vma_start_write(next);
- res = next;
- if (prev && addr < prev->vm_end) { /* case 4 */
- vma_start_write(prev);
- vma_end = addr;
- adjust = next;
- adj_start = -(prev->vm_end - addr);
- err = dup_anon_vma(next, prev, &anon_dup);
- } else {
- /*
- * Note that cases 3 and 8 are the ONLY ones where prev
- * is permitted to be (but is not necessarily) NULL.
- */
- vma = next; /* case 3 */
- vma_start = addr;
- vma_end = next->vm_end;
- vma_pgoff = next->vm_pgoff - pglen;
- if (curr) { /* case 8 */
- vma_pgoff = curr->vm_pgoff;
- vma_start_write(curr);
- remove = curr;
- err = dup_anon_vma(next, curr, &anon_dup);
- }
- }
- }
-
- /* Error in anon_vma clone. */
- if (err)
- goto anon_vma_fail;
-
- if (vma_start < vma->vm_start || vma_end > vma->vm_end)
- vma_expanded = true;
-
- if (vma_expanded) {
- vma_iter_config(vmg->vmi, vma_start, vma_end);
- } else {
- vma_iter_config(vmg->vmi, adjust->vm_start + adj_start,
- adjust->vm_end);
- }
-
- if (vma_iter_prealloc(vmg->vmi, vma))
- goto prealloc_fail;
-
- init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
- VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
- vp.anon_vma != adjust->anon_vma);
-
- vma_prepare(&vp);
- vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start);
- vma_set_range(vma, vma_start, vma_end, vma_pgoff);
-
- if (vma_expanded)
- vma_iter_store(vmg->vmi, vma);
-
- if (adj_start) {
- adjust->vm_start += adj_start;
- adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
- if (adj_start < 0) {
- WARN_ON(vma_expanded);
- vma_iter_store(vmg->vmi, next);
- }
- }
-
- vma_complete(&vp, vmg->vmi, mm);
- validate_mm(mm);
- khugepaged_enter_vma(res, vmg->flags);
-
- vmg->state = VMA_MERGE_SUCCESS;
- return res;
-
-prealloc_fail:
- vmg->state = VMA_MERGE_ERROR_NOMEM;
- if (anon_dup)
- unlink_anon_vmas(anon_dup);
-
-anon_vma_fail:
- if (err == -ENOMEM)
- vmg->state = VMA_MERGE_ERROR_NOMEM;
-
- vma_iter_set(vmg->vmi, addr);
- vma_iter_load(vmg->vmi);
- return NULL;
-}
-
/*
* We are about to modify one or multiple of a VMA's flags, policy, userfaultfd
* context and anonymous VMA name within the range [start, end).
@@ -1389,7 +1395,7 @@ static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
struct vm_area_struct *merged;
/* First, try to merge. */
- merged = vma_merge(vmg);
+ merged = vma_merge_existing_range(vmg);
if (merged)
return merged;
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index b7cdafec09af..25a95d9901ea 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -112,7 +112,7 @@ static struct vm_area_struct *merge_new(struct vma_merge_struct *vmg)
*/
static struct vm_area_struct *merge_existing(struct vma_merge_struct *vmg)
{
- return vma_merge(vmg);
+ return vma_merge_existing_range(vmg);
}
/*
@@ -752,7 +752,12 @@ static bool test_vma_merge_with_close(void)
vmg.vma = vma;
/* Make sure merge does not occur. */
ASSERT_EQ(merge_existing(&vmg), NULL);
- ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+ /*
+ * Initially this is misapprehended as an out of memory report, as the
+ * close() check is handled in the same way as anon_vma duplication
+ * failures, however a subsequent patch resolves this.
+ */
+ ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
cleanup_mm(&mm, &vmi);
return true;
--
2.46.0
* [PATCH v3 10/10] mm: rework vm_ops->close() handling on VMA merge
2024-08-30 18:10 [PATCH v3 00/10] mm: remove vma_merge() Lorenzo Stoakes
` (8 preceding siblings ...)
2024-08-30 18:10 ` [PATCH v3 09/10] mm: refactor vma_merge() into modify-only vma_merge_existing_range() Lorenzo Stoakes
@ 2024-08-30 18:10 ` Lorenzo Stoakes
9 siblings, 0 replies; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-08-30 18:10 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: Liam R . Howlett, Vlastimil Babka, Mark Brown
In commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be
removed in mergeability test") we relaxed the VMA merge rules for VMAs
possessing a vm_ops->close() hook, permitting this operation in instances
where we wouldn't delete the VMA as part of the merge operation.
This was later corrected in commit fc0c8f9089c2 ("mm, mmap: fix vma_merge()
case 7 with vma_ops->close") to account for a subtle case that the previous
commit had not taken into account.
In both instances, we first rely on is_mergeable_vma() to determine whether
we might be dealing with a VMA that could be removed, taking advantage of
the fact that a 'previous' VMA will never be deleted, only VMAs that follow
it.
The second patch corrects the instance where a merge of the previous VMA
into a subsequent one did not correctly check whether the subsequent VMA
had a vm_ops->close() handler.
Both changes prevent merge cases that are actually permissible (for
instance a merge of a VMA into a following VMA with a vm_ops->close(), but
with no previous VMA, which would result in the next VMA being extended,
not deleted).
In addition, both changes fail to consider the case where a VMA that would
otherwise be merged with the previous and next VMA might have
vm_ops->close(), on the assumption that for this to be the case, all three
would have to have the same vma->vm_file to be mergeable and thus the same
vm_ops.
Both changes also operate at 50,000 feet, trying to guess whether a VMA will
be deleted.
As we have majorly refactored the VMA merge operation and de-duplicated
code to the point where we know precisely where deletions will occur, this
patch removes the aforementioned checks altogether and instead explicitly
checks whether a VMA will be deleted.
In cases where a reduced merge is still possible (that is, where we would
merge with both the previous and next VMAs, but the next VMA has a
vm_ops->close hook, meaning we can instead simply merge the previous and
current VMAs), we do so; otherwise the merge is not permitted.
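Condensed to pseudocode (the actual implementation is in the diff below),
the delete-aware policy becomes:
	if (merge_will_delete_vma && !can_merge_remove_vma(vma))
		return NULL;	/* vma itself has a close() hook - abort. */
	if (merge_will_delete_next && !can_merge_remove_vma(next)) {
		/* next has a close() hook - reduce to merging prev and vma. */
		merge_will_delete_next = false;
		merge_right = false;
		merge_both = false;
	}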
We take advantage of our userland testing to assert that this functions
correctly - replacing the previous limited vm_ops->close() tests with tests
for every single case where we delete a VMA.
We also update all testing for both new and modified VMAs to set
vma->vm_ops->close() in every single instance where this would not prevent
the merge, to assert that the presence of the hook never prevents a merge in
those instances.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/vma.c | 57 +++++++++-----
tools/testing/vma/vma.c | 166 +++++++++++++++++++++++++++++++---------
2 files changed, 164 insertions(+), 59 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 393bef832604..8d1686fc8d5a 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -10,14 +10,6 @@
static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
{
struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
- /*
- * If the vma has a ->close operation then the driver probably needs to
- * release per-vma resources, so we don't attempt to merge those if the
- * caller indicates the current vma may be removed as part of the merge,
- * which is the case if we are attempting to merge the next VMA into
- * this one.
- */
- bool may_remove_vma = merge_next;
if (!mpol_equal(vmg->policy, vma_policy(vma)))
return false;
@@ -33,8 +25,6 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
return false;
if (vma->vm_file != vmg->file)
return false;
- if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
- return false;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vmg->uffd_ctx))
return false;
if (!anon_vma_name_eq(anon_vma_name(vma), vmg->anon_name))
@@ -632,6 +622,12 @@ static int commit_merge(struct vma_merge_struct *vmg,
return 0;
}
+/* We can only remove VMAs when merging if they do not have a close hook. */
+static bool can_merge_remove_vma(struct vm_area_struct *vma)
+{
+ return !vma->vm_ops || !vma->vm_ops->close;
+}
+
/*
* vma_merge_existing_range - Attempt to merge VMAs based on a VMA having its
* attributes modified.
@@ -725,12 +721,30 @@ static struct vm_area_struct *vma_merge_existing_range(struct vma_merge_struct *
merge_both = merge_left && merge_right;
/* If we span the entire VMA, a merge implies it will be deleted. */
merge_will_delete_vma = left_side && right_side;
+
+ /*
+ * If we need to remove vma in its entirety but are unable to do so,
+ * we have no sensible recourse but to abort the merge.
+ */
+ if (merge_will_delete_vma && !can_merge_remove_vma(vma))
+ return NULL;
+
/*
* If we merge both VMAs, then next is also deleted. This implies
* merge_will_delete_vma also.
*/
merge_will_delete_next = merge_both;
+ /*
+ * If we cannot delete next, then we can reduce the operation to merging
+ * prev and vma (thereby deleting vma).
+ */
+ if (merge_will_delete_next && !can_merge_remove_vma(next)) {
+ merge_will_delete_next = false;
+ merge_right = false;
+ merge_both = false;
+ }
+
/* No matter what happens, we will be adjusting vma. */
vma_start_write(vma);
@@ -772,21 +786,12 @@ static struct vm_area_struct *vma_merge_existing_range(struct vma_merge_struct *
vmg->start = prev->vm_start;
vmg->pgoff = prev->vm_pgoff;
- if (merge_will_delete_vma) {
- /*
- * can_vma_merge_after() assumed we would not be
- * removing vma, so it skipped the check for
- * vm_ops->close, but we are removing vma.
- */
- if (vma->vm_ops && vma->vm_ops->close)
- err = -EINVAL;
- } else {
+ if (!merge_will_delete_vma) {
adjust = vma;
adj_start = vmg->end - vma->vm_start;
}
- if (!err)
- err = dup_anon_vma(prev, vma, &anon_dup);
+ err = dup_anon_vma(prev, vma, &anon_dup);
} else { /* merge_right */
/*
* |<----->| OR
@@ -940,6 +945,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
vmg->vma = prev;
vmg->pgoff = prev->vm_pgoff;
+ /*
+ * If this merge would result in removal of the next VMA but we
+ * are not permitted to do so, reduce the operation to merging
+ * prev and vma.
+ */
+ if (can_merge_right && !can_merge_remove_vma(next))
+ vmg->end = end;
+
vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
}
@@ -994,6 +1007,8 @@ int vma_expand(struct vma_merge_struct *vmg)
int ret;
remove_next = true;
+ /* This should already have been checked by this point. */
+ VM_WARN_ON(!can_merge_remove_vma(next));
vma_start_write(next);
ret = dup_anon_vma(vma, next, &anon_dup);
if (ret)
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 25a95d9901ea..c53f220eb6cc 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -387,6 +387,9 @@ static bool test_merge_new(void)
struct anon_vma_chain dummy_anon_vma_chain_d = {
.anon_vma = &dummy_anon_vma,
};
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
int count;
struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
bool merged;
@@ -430,6 +433,7 @@ static bool test_merge_new(void)
* 0123456789abc
* AA*B DD CC
*/
+ vma_a->vm_ops = &vm_ops; /* This should have no impact. */
vma_b->anon_vma = &dummy_anon_vma;
vma = try_merge_new_vma(&mm, &vmg, 0x2000, 0x3000, 2, flags, &merged);
ASSERT_EQ(vma, vma_a);
@@ -466,6 +470,7 @@ static bool test_merge_new(void)
* AAAAA *DD CC
*/
vma_d->anon_vma = &dummy_anon_vma;
+ vma_d->vm_ops = &vm_ops; /* This should have no impact. */
vma = try_merge_new_vma(&mm, &vmg, 0x6000, 0x7000, 6, flags, &merged);
ASSERT_EQ(vma, vma_d);
/* Prepend. */
@@ -483,6 +488,7 @@ static bool test_merge_new(void)
* 0123456789abc
* AAAAA*DDD CC
*/
+ vma_d->vm_ops = NULL; /* This would otherwise degrade the merge. */
vma = try_merge_new_vma(&mm, &vmg, 0x5000, 0x6000, 5, flags, &merged);
ASSERT_EQ(vma, vma_a);
/* Merge with A, delete D. */
@@ -640,13 +646,11 @@ static bool test_vma_merge_with_close(void)
const struct vm_operations_struct vm_ops = {
.close = dummy_close,
};
- struct vm_area_struct *vma_next =
- alloc_and_link_vma(&mm, 0x2000, 0x3000, 2, flags);
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma_prev, *vma_next, *vma;
/*
- * When we merge VMAs we sometimes have to delete others as part of the
- * operation.
+ * When merging VMAs we are not permitted to remove any VMA that has a
+ * vm_ops->close() hook.
*
* Considering the two possible adjacent VMAs to which a VMA can be
* merged:
@@ -697,28 +701,52 @@ static bool test_vma_merge_with_close(void)
* would be set too, and thus scenario A would pick this up.
*/
- ASSERT_NE(vma_next, NULL);
-
/*
- * SCENARIO A
+ * The only case of a new VMA merge that results in a VMA being deleted
+ * is one where both the previous and next VMAs are merged - in this
+ * instance the next VMA is deleted, and the previous VMA is extended.
*
- * 0123
- * *N
+ * If we are unable to do so, we reduce the operation to simply
+ * extending the prev VMA and not merging next.
+ *
+ * 0123456789
+ * PPP**NNNN
+ * ->
+ * 0123456789
+ * PPPPPPNNN
*/
- /* Make the next VMA have a close() callback. */
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
vma_next->vm_ops = &vm_ops;
- /* Our proposed VMA has characteristics that would otherwise be merged. */
- vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ ASSERT_EQ(merge_new(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x5000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
- /* The next VMA having a close() operator should cause the merge to fail.*/
- ASSERT_EQ(merge_new(&vmg), NULL);
- ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
- /* Now create the VMA so we can merge via modified flags */
- vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
- vma = alloc_and_link_vma(&mm, 0x1000, 0x2000, 1, flags);
+ /*
+ * When modifying an existing VMA there are further cases where we
+ * delete VMAs.
+ *
+ * <>
+ * 0123456789
+ * PPPVV
+ *
+ * In this instance, if vma has a close hook, the merge simply cannot
+ * proceed.
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
vmg.vma = vma;
/*
@@ -728,38 +756,90 @@ static bool test_vma_merge_with_close(void)
ASSERT_EQ(merge_existing(&vmg), NULL);
ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
- /* SCENARIO B
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * This case is mirrored if merging with next.
*
- * 0123
- * P*
+ * <>
+ * 0123456789
+ * VVNNNN
*
- * In order for this scenario to trigger, the VMA currently being
- * modified must also have a .close().
+ * In this instance, if vma has a close hook, the merge simply cannot
+ * proceed.
*/
- /* Reset VMG state. */
- vmg_set_range(&vmg, 0x1000, 0x2000, 1, flags);
- /*
- * Make next unmergeable, and don't let the scenario A check pick this
- * up, we want to reproduce scenario B only.
- */
- vma_next->vm_ops = NULL;
- vma_next->__vm_flags &= ~VM_MAYWRITE;
- /* Allocate prev. */
- vmg.prev = alloc_and_link_vma(&mm, 0, 0x1000, 0, flags);
- /* Assign a vm_ops->close() function to VMA explicitly. */
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
vma->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
vmg.vma = vma;
- /* Make sure merge does not occur. */
ASSERT_EQ(merge_existing(&vmg), NULL);
/*
* Initially this is misapprehended as an out of memory report, as the
* close() check is handled in the same way as anon_vma duplication
* failures, however a subsequent patch resolves this.
*/
- ASSERT_EQ(vmg.state, VMA_MERGE_ERROR_NOMEM);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
+
+ /*
+ * Finally, we consider two variants of the case where we modify a VMA
+ * to merge with both the previous and next VMAs.
+ *
+ * The first variant is where vma has a close hook. In this instance, no
+ * merge can proceed.
+ *
+ * <>
+ * 0123456789
+ * PPPVVNNNN
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
+ vma->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), NULL);
+ ASSERT_EQ(vmg.state, VMA_MERGE_NOMERGE);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 3);
+
+ /*
+ * The second variant is where next has a close hook. In this instance,
+ * we reduce the operation to a merge between prev and vma.
+ *
+ * <>
+ * 0123456789
+ * PPPVVNNNN
+ * ->
+ * 0123456789
+ * PPPPPNNNN
+ */
+
+ vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+ vma_next = alloc_and_link_vma(&mm, 0x5000, 0x9000, 5, flags);
+ vma_next->vm_ops = &vm_ops;
+
+ vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg.prev = vma_prev;
+ vmg.vma = vma;
+
+ ASSERT_EQ(merge_existing(&vmg), vma_prev);
+ ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
+ ASSERT_EQ(vma_prev->vm_start, 0);
+ ASSERT_EQ(vma_prev->vm_end, 0x5000);
+ ASSERT_EQ(vma_prev->vm_pgoff, 0);
+
+ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
- cleanup_mm(&mm, &vmi);
return true;
}
@@ -828,6 +908,9 @@ static bool test_merge_existing(void)
.mm = &mm,
.vmi = &vmi,
};
+ const struct vm_operations_struct vm_ops = {
+ .close = dummy_close,
+ };
/*
* Merge right case - partial span.
@@ -840,7 +923,9 @@ static bool test_merge_existing(void)
* VNNNNNN
*/
vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
+ vma->vm_ops = &vm_ops; /* This should have no impact. */
vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vma_next->vm_ops = &vm_ops; /* This should have no impact. */
vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
vmg.vma = vma;
vmg.prev = vma;
@@ -873,6 +958,7 @@ static bool test_merge_existing(void)
*/
vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
+ vma_next->vm_ops = &vm_ops; /* This should have no impact. */
vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
vmg.vma = vma;
vma->anon_vma = &dummy_anon_vma;
@@ -899,7 +985,9 @@ static bool test_merge_existing(void)
* PPPPPPV
*/
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
+ vma->vm_ops = &vm_ops; /* This should have no impact. */
vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
vmg.prev = vma_prev;
vmg.vma = vma;
@@ -932,6 +1020,7 @@ static bool test_merge_existing(void)
* PPPPPPP
*/
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
vmg.prev = vma_prev;
@@ -960,6 +1049,7 @@ static bool test_merge_existing(void)
* PPPPPPPPPP
*/
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
+ vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
--
2.46.0