* [PATCH v2 0/3] fix incorrectly disallowed anonymous VMA merges
@ 2025-04-08 9:29 Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 1/3] mm/vma: " Lorenzo Stoakes
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2025-04-08 9:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
David Hildenbrand, Matthew Wilcox, Rik van Riel, Wei Yang,
linux-mm, linux-kernel
It appears that we have been incorrectly rejecting merge cases for 15
years, apparently by mistake.
Imagine a range of anonymous mapped momemory divided into two VMAs like
this, with incompatible protection bits:
RW RWX
unfaulted faulted
|-----------|-----------|
| prev | vma |
|-----------|-----------|
mprotect(RW)
Now imagine mprotect()'ing vma so it is RW. This appears as if it should
merge, it does not.
Neither does this case, again mprotect()'ing vma RW:
RWX RW
faulted unfaulted
|-----------|-----------|
| vma | next |
|-----------|-----------|
mprotect(RW)
Nor:
RW RWX RW
unfaulted faulted unfaulted
|-----------|-----------|-----------|
| prev | vma | next |
|-----------|-----------|-----------|
mprotect(RW)
What's going on here?
In commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process
server scalability issue"), from 2010, Rik von Riel took careful care to
account for these cases - commenting that '[this is] easily overlooked:
when mprotect shifts the boundary, make sure the expanding vma has anon_vma
set if the shrinking vma had, to cover any anon pages imported.'
However, commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs") introduced
a little over a year later, appears to have accidentally disallowed this.
By adjusting the is_mergeable_anon_vma() function to avoid lock contention
across large trees of forked anon_vma's, this commit wrongly assumed the
VMA being checked (the ostensible merge 'target') should be faulted, that
is, have an anon_vma, and thus an anon_vma_chain list established, but only
of length 1.
This appears to have been unintentional, as disallowing empty target VMAs
like this across the board makes no sense.
We already have logic that accounts for this case, the same logic Rik
introduced in 2010, now via dup_anon_vma() (and ultimately
anon_vma_clone()), so there is no problem permitting this.
This series fixes this mistake and also ensures that scalability concerns
remain addressed by explicitly checking that whatever VMA is being merged
has not been forked.
A full set of self tests which reproduce the issue are provided, as well as
updating userland VMA tests to assert this behaviour.
The self tests additionally assert scalability concerns are addressed.
v2:
* Added tags (thanks!)
* Folded small fix-patch to patch 2, returning -errno rather than errno in
vm_util.c.
* Merge conflict fixup in vm_util.c.
* Some comment fixups in the merge tests as per Wei.
* Added vma_merge to run_vmtests.sh usage() function as per Wei.
* Added the new test file to memory mapping section in MAINTAINERS, as
specifically memory mapping-related.
v1:
https://lore.kernel.org/all/cover.1742245056.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (3):
mm/vma: fix incorrectly disallowed anonymous VMA merges
tools/testing: add PROCMAP_QUERY helper functions in mm self tests
tools/testing/selftests: assert that anon merge cases behave as
expected
MAINTAINERS | 1 +
mm/vma.c | 81 ++--
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/merge.c | 455 ++++++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
tools/testing/selftests/mm/vm_util.c | 62 +++
tools/testing/selftests/mm/vm_util.h | 21 +
tools/testing/vma/vma.c | 100 ++---
9 files changed, 656 insertions(+), 70 deletions(-)
create mode 100644 tools/testing/selftests/mm/merge.c
--
2.49.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
2025-04-08 9:29 [PATCH v2 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
@ 2025-04-08 9:29 ` Lorenzo Stoakes
2025-05-12 2:39 ` Harry Yoo
2025-04-08 9:29 ` [PATCH v2 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
2 siblings, 1 reply; 6+ messages in thread
From: Lorenzo Stoakes @ 2025-04-08 9:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
David Hildenbrand, Matthew Wilcox, Rik van Riel, Wei Yang,
linux-mm, linux-kernel
anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm:
change anon_vma linking to fix multi-process server scalability issue").
This patch was introduced in March 2010. As part of this change, careful
attention was made to the instance of mprotect() causing a VMA merge, with one
faulted (i.e. having anon_vma set) and another not:
/*
* Easily overlooked: when mprotect shifts the boundary,
* make sure the expanding vma has anon_vma set if the
* shrinking vma had, to cover any anon pages imported.
*/
In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
anon_vma_clone()).
This case is one of the three configurations of adjacent VMA anon_vma state
that we might encounter on merge (where dst is the VMA which will be merged
into and src the one being merged into dst):
1. dst->anon_vma, src->anon_vma - These must be equal, no-op.
2. dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
3. !dst->anon_vma, src->anon_vma - The case in question here.
In case 3, the instance addressed here - we duplicate the AVC connections
from src and place into dst.
However, in practice, we very often do NOT do this.
This appears to be due to an inadvertent consequence of the change
introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
introduced in May 2011.
This implies that this merge case was functional only for a little over a
year, and has since been broken for ~15 years.
Here, lock scalability concerns lead to us restricting anonymous merges
only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
VMA that is not connected to any parent process's anon_vma.
The mergeability test looks like this:
static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
struct anon_vma *anon_vma2, struct vm_area_struct *vma)
{
if ((!anon_vma1 || !anon_vma2) && (!vma ||
!vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
return true;
return anon_vma1 == anon_vma2;
}
However, we have a problem here - typically the vma passed here is the
destination VMA.
For instance in vma_merge_existing_range() we invoke:
can_vma_merge_left()
-> [ check that there is an immediately adjacent prior VMA ]
-> can_vma_merge_after()
-> is_mergeable_vma() for general attribute check
-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
So if we were considering a target unfaulted 'prev':
unfaulted faulted
|-----------|-----------|
| prev | vma |
|-----------|-----------|
This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
The list_is_singular() check for vma->anon_vma_chain, an empty list on
fault, would cause this merge to _fail_ even though all else indicates a
merge.
Equally a simple merge into a next VMA would hit the same problem:
faulted unfaulted
|-----------|-----------|
| vma | next |
|-----------|-----------|
can_vma_merge_right()
-> [ check that there is an immediately adjacent succeeding VMA ]
-> can_vma_merge_before()
-> is_mergeable_vma() for general attribute check
-> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
For a 3-way merge, we'd also hit the same problem if it was configured like
this for instance:
unfaulted faulted unfaulted
|-----------|-----------|-----------|
| prev | vma | next |
|-----------|-----------|-----------|
As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
next, both of which would fail.
vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
the new VMA would never already be faulted (it is a proposed new range).
Because we already handle each of the aforementioned merge cases, and can
absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
src->anon_vma, there is absolutely no reason to disallow this kind of
merge.
It seems that the intention of this patch is to ensure that, in the
instance of merging unfaulted VMAs with faulted ones, we never wish to do
so with those with multiple AVCs due to the fact that anon_vma lock's are
held across both parent and child anon_vma's (actually, the 'root' parent
anon_vma's lock is used).
In fact, the original commit alludes to this - "find_mergeable_anon_vma()
already considers this case".
In find_mergeable_anon_vma() however, we check the anon_vma which will be
merged from, if it is set, then we check
list_is_singular(vma->anon_vma_chain).
So to match this logic, update is_mergeable_anon_vma() to perform this
scalability check on the VMA whose anon_vma we ultimately merge into.
This matches existing behaviour with forked VMAs, only we no longer wrongly
disallow ALL empty target merges.
So we both allow merge cases and ensure the scalability check is correctly
applied.
We may wish to revisit these lock scalability concerns at a later date and
ensure they are still valid.
Additionally, correct userland VMA tests which were mistakenly not
asserting these cases correctly previously to now correctly assert this,
and to ensure vmg->anon_vma state is always consistent to account for newly
introduced asserts.
Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
mm/vma.c | 81 +++++++++++++++++++++++---------
tools/testing/vma/vma.c | 100 +++++++++++++++++++++-------------------
2 files changed, 111 insertions(+), 70 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 839d12f02c88..8a6c5e835759 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -57,6 +57,22 @@ struct mmap_state {
.state = VMA_MERGE_START, \
}
+/*
+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain
+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge
+ * would mean a wider range of folios sharing the root anon_vma lock, and thus
+ * potential lock contention, we do not wish to encourage merging such that this
+ * scales to a problem.
+ */
+static bool vma_had_uncowed_parents(struct vm_area_struct *vma)
+{
+ /*
+ * The list_is_singular() test is to avoid merging VMA cloned from
+ * parents. This can improve scalability caused by anon_vma lock.
+ */
+ return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain);
+}
+
static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next)
{
struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev;
@@ -82,24 +98,28 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
return true;
}
-static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
- struct anon_vma *anon_vma2, struct vm_area_struct *vma)
+static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next)
{
+ struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev;
+ struct vm_area_struct *src = vmg->middle; /* exisitng merge case. */
+ struct anon_vma *tgt_anon = tgt->anon_vma;
+ struct anon_vma *src_anon = vmg->anon_vma;
+
/*
- * The list_is_singular() test is to avoid merging VMA cloned from
- * parents. This can improve scalability caused by anon_vma lock.
+ * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we
+ * will remove the existing VMA's anon_vma's so there's no scalability
+ * concerns.
*/
- if ((!anon_vma1 || !anon_vma2) && (!vma ||
- list_is_singular(&vma->anon_vma_chain)))
- return true;
- return anon_vma1 == anon_vma2;
-}
+ VM_WARN_ON(src && src_anon != src->anon_vma);
-/* Are the anon_vma's belonging to each VMA compatible with one another? */
-static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1,
- struct vm_area_struct *vma2)
-{
- return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL);
+ /* Case 1 - we will dup_anon_vma() from src into tgt. */
+ if (!tgt_anon && src_anon)
+ return !vma_had_uncowed_parents(src);
+ /* Case 2 - we will simply use tgt's anon_vma. */
+ if (tgt_anon && !src_anon)
+ return !vma_had_uncowed_parents(tgt);
+ /* Case 3 - the anon_vma's are already shared. */
+ return src_anon == tgt_anon;
}
/*
@@ -164,7 +184,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
if (is_mergeable_vma(vmg, /* merge_next = */ true) &&
- is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) {
+ is_mergeable_anon_vma(vmg, /* merge_next = */ true)) {
if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
return true;
}
@@ -184,7 +204,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg)
static bool can_vma_merge_after(struct vma_merge_struct *vmg)
{
if (is_mergeable_vma(vmg, /* merge_next = */ false) &&
- is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) {
+ is_mergeable_anon_vma(vmg, /* merge_next = */ false)) {
if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff)
return true;
}
@@ -400,8 +420,10 @@ static bool can_vma_merge_left(struct vma_merge_struct *vmg)
static bool can_vma_merge_right(struct vma_merge_struct *vmg,
bool can_merge_left)
{
- if (!vmg->next || vmg->end != vmg->next->vm_start ||
- !can_vma_merge_before(vmg))
+ struct vm_area_struct *next = vmg->next;
+ struct vm_area_struct *prev;
+
+ if (!next || vmg->end != next->vm_start || !can_vma_merge_before(vmg))
return false;
if (!can_merge_left)
@@ -414,7 +436,9 @@ static bool can_vma_merge_right(struct vma_merge_struct *vmg,
*
* We therefore check this in addition to mergeability to either side.
*/
- return are_anon_vmas_compatible(vmg->prev, vmg->next);
+ prev = vmg->prev;
+ return !prev->anon_vma || !next->anon_vma ||
+ prev->anon_vma == next->anon_vma;
}
/*
@@ -554,7 +578,9 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
/*
- * dup_anon_vma() - Helper function to duplicate anon_vma
+ * dup_anon_vma() - Helper function to duplicate anon_vma on VMA merge in the
+ * instance that the destination VMA has no anon_vma but the source does.
+ *
* @dst: The destination VMA
* @src: The source VMA
* @dup: Pointer to the destination VMA when successful.
@@ -565,9 +591,18 @@ static int dup_anon_vma(struct vm_area_struct *dst,
struct vm_area_struct *src, struct vm_area_struct **dup)
{
/*
- * Easily overlooked: when mprotect shifts the boundary, make sure the
- * expanding vma has anon_vma set if the shrinking vma had, to cover any
- * anon pages imported.
+ * There are three cases to consider for correctly propagating
+ * anon_vma's on merge.
+ *
+ * The first is trivial - neither VMA has anon_vma, we need not do
+ * anything.
+ *
+ * The second where both have anon_vma is also a no-op, as they must
+ * then be the same, so there is simply nothing to copy.
+ *
+ * Here we cover the third - if the destination VMA has no anon_vma,
+ * that is it is unfaulted, we need to ensure that the newly merged
+ * range is referenced by the anon_vma's of the source.
*/
if (src->anon_vma && !dst->anon_vma) {
int ret;
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 11f761769b5b..7cfd6e31db10 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -185,6 +185,15 @@ static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
vmg->__adjust_next_start = false;
}
+/* Helper function to set both the VMG range and its anon_vma. */
+static void vmg_set_range_anon_vma(struct vma_merge_struct *vmg, unsigned long start,
+ unsigned long end, pgoff_t pgoff, vm_flags_t flags,
+ struct anon_vma *anon_vma)
+{
+ vmg_set_range(vmg, start, end, pgoff, flags);
+ vmg->anon_vma = anon_vma;
+}
+
/*
* Helper function to try to merge a new VMA.
*
@@ -265,6 +274,22 @@ static void dummy_close(struct vm_area_struct *)
{
}
+static void __vma_set_dummy_anon_vma(struct vm_area_struct *vma,
+ struct anon_vma_chain *avc,
+ struct anon_vma *anon_vma)
+{
+ vma->anon_vma = anon_vma;
+ INIT_LIST_HEAD(&vma->anon_vma_chain);
+ list_add(&avc->same_vma, &vma->anon_vma_chain);
+ avc->anon_vma = vma->anon_vma;
+}
+
+static void vma_set_dummy_anon_vma(struct vm_area_struct *vma,
+ struct anon_vma_chain *avc)
+{
+ __vma_set_dummy_anon_vma(vma, avc, &dummy_anon_vma);
+}
+
static bool test_simple_merge(void)
{
struct vm_area_struct *vma;
@@ -953,6 +978,7 @@ static bool test_merge_existing(void)
const struct vm_operations_struct vm_ops = {
.close = dummy_close,
};
+ struct anon_vma_chain avc = {};
/*
* Merge right case - partial span.
@@ -968,10 +994,10 @@ static bool test_merge_existing(void)
vma->vm_ops = &vm_ops; /* This should have no impact. */
vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
vma_next->vm_ops = &vm_ops; /* This should have no impact. */
- vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, flags, &dummy_anon_vma);
vmg.middle = vma;
vmg.prev = vma;
- vma->anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &avc);
ASSERT_EQ(merge_existing(&vmg), vma_next);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x3000);
@@ -1001,9 +1027,9 @@ static bool test_merge_existing(void)
vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, flags);
vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, flags);
vma_next->vm_ops = &vm_ops; /* This should have no impact. */
- vmg_set_range(&vmg, 0x2000, 0x6000, 2, flags);
+ vmg_set_range_anon_vma(&vmg, 0x2000, 0x6000, 2, flags, &dummy_anon_vma);
vmg.middle = vma;
- vma->anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &avc);
ASSERT_EQ(merge_existing(&vmg), vma_next);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_next->vm_start, 0x2000);
@@ -1030,11 +1056,10 @@ static bool test_merge_existing(void)
vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
vma->vm_ops = &vm_ops; /* This should have no impact. */
- vmg_set_range(&vmg, 0x3000, 0x6000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, flags, &dummy_anon_vma);
vmg.prev = vma_prev;
vmg.middle = vma;
- vma->anon_vma = &dummy_anon_vma;
-
+ vma_set_dummy_anon_vma(vma, &avc);
ASSERT_EQ(merge_existing(&vmg), vma_prev);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1064,10 +1089,10 @@ static bool test_merge_existing(void)
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
- vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, &dummy_anon_vma);
vmg.prev = vma_prev;
vmg.middle = vma;
- vma->anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &avc);
ASSERT_EQ(merge_existing(&vmg), vma_prev);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1094,10 +1119,10 @@ static bool test_merge_existing(void)
vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, flags);
vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
- vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, &dummy_anon_vma);
vmg.prev = vma_prev;
vmg.middle = vma;
- vma->anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &avc);
ASSERT_EQ(merge_existing(&vmg), vma_prev);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1180,12 +1205,9 @@ static bool test_anon_vma_non_mergeable(void)
.mm = &mm,
.vmi = &vmi,
};
- struct anon_vma_chain dummy_anon_vma_chain1 = {
- .anon_vma = &dummy_anon_vma,
- };
- struct anon_vma_chain dummy_anon_vma_chain2 = {
- .anon_vma = &dummy_anon_vma,
- };
+ struct anon_vma_chain dummy_anon_vma_chain_1 = {};
+ struct anon_vma_chain dummy_anon_vma_chain_2 = {};
+ struct anon_vma dummy_anon_vma_2;
/*
* In the case of modified VMA merge, merging both left and right VMAs
@@ -1209,24 +1231,11 @@ static bool test_anon_vma_non_mergeable(void)
*
* However, when prev is compared to next, the merge should fail.
*/
-
- INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
- list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
- ASSERT_TRUE(list_is_singular(&vma_prev->anon_vma_chain));
- vma_prev->anon_vma = &dummy_anon_vma;
- ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_prev->anon_vma, vma_prev));
-
- INIT_LIST_HEAD(&vma_next->anon_vma_chain);
- list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
- ASSERT_TRUE(list_is_singular(&vma_next->anon_vma_chain));
- vma_next->anon_vma = (struct anon_vma *)2;
- ASSERT_TRUE(is_mergeable_anon_vma(NULL, vma_next->anon_vma, vma_next));
-
- ASSERT_FALSE(is_mergeable_anon_vma(vma_prev->anon_vma, vma_next->anon_vma, NULL));
-
- vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, NULL);
vmg.prev = vma_prev;
vmg.middle = vma;
+ vma_set_dummy_anon_vma(vma_prev, &dummy_anon_vma_chain_1);
+ __vma_set_dummy_anon_vma(vma_next, &dummy_anon_vma_chain_2, &dummy_anon_vma_2);
ASSERT_EQ(merge_existing(&vmg), vma_prev);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
@@ -1253,17 +1262,12 @@ static bool test_anon_vma_non_mergeable(void)
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, flags);
- INIT_LIST_HEAD(&vma_prev->anon_vma_chain);
- list_add(&dummy_anon_vma_chain1.same_vma, &vma_prev->anon_vma_chain);
- vma_prev->anon_vma = (struct anon_vma *)1;
-
- INIT_LIST_HEAD(&vma_next->anon_vma_chain);
- list_add(&dummy_anon_vma_chain2.same_vma, &vma_next->anon_vma_chain);
- vma_next->anon_vma = (struct anon_vma *)2;
-
- vmg_set_range(&vmg, 0x3000, 0x7000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, flags, NULL);
vmg.prev = vma_prev;
+ vma_set_dummy_anon_vma(vma_prev, &dummy_anon_vma_chain_1);
+ __vma_set_dummy_anon_vma(vma_next, &dummy_anon_vma_chain_2, &dummy_anon_vma_2);
+ vmg.anon_vma = NULL;
ASSERT_EQ(merge_new(&vmg), vma_prev);
ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
ASSERT_EQ(vma_prev->vm_start, 0);
@@ -1363,8 +1367,8 @@ static bool test_dup_anon_vma(void)
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
-
- vma->anon_vma = &dummy_anon_vma;
+ vmg.anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &dummy_anon_vma_chain);
vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
vmg.prev = vma_prev;
vmg.middle = vma;
@@ -1392,7 +1396,7 @@ static bool test_dup_anon_vma(void)
vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, flags);
vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, flags);
- vma->anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &dummy_anon_vma_chain);
vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
vmg.prev = vma_prev;
vmg.middle = vma;
@@ -1420,7 +1424,7 @@ static bool test_dup_anon_vma(void)
vma = alloc_and_link_vma(&mm, 0, 0x5000, 0, flags);
vma_next = alloc_and_link_vma(&mm, 0x5000, 0x8000, 5, flags);
- vma->anon_vma = &dummy_anon_vma;
+ vma_set_dummy_anon_vma(vma, &dummy_anon_vma_chain);
vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
vmg.prev = vma;
vmg.middle = vma;
@@ -1447,6 +1451,7 @@ static bool test_vmi_prealloc_fail(void)
.mm = &mm,
.vmi = &vmi,
};
+ struct anon_vma_chain avc = {};
struct vm_area_struct *vma_prev, *vma;
/*
@@ -1459,9 +1464,10 @@ static bool test_vmi_prealloc_fail(void)
vma = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
vma->anon_vma = &dummy_anon_vma;
- vmg_set_range(&vmg, 0x3000, 0x5000, 3, flags);
+ vmg_set_range_anon_vma(&vmg, 0x3000, 0x5000, 3, flags, &dummy_anon_vma);
vmg.prev = vma_prev;
vmg.middle = vma;
+ vma_set_dummy_anon_vma(vma, &avc);
fail_prealloc = true;
--
2.49.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests
2025-04-08 9:29 [PATCH v2 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 1/3] mm/vma: " Lorenzo Stoakes
@ 2025-04-08 9:29 ` Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
2 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2025-04-08 9:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
David Hildenbrand, Matthew Wilcox, Rik van Riel, Wei Yang,
linux-mm, linux-kernel
The PROCMAP_QUERY ioctl() is very useful - it allows for binary access to
/proc/$pid/[s]maps data and thus convenient lookup of data contained there.
This patch exposes this for convenient use by mm self tests so the state of
VMAs can easily be queried.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
---
tools/testing/selftests/mm/vm_util.c | 62 ++++++++++++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 21 ++++++++++
2 files changed, 83 insertions(+)
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index a36734fb62f3..1357e2d6a7b6 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <string.h>
+#include <errno.h>
#include <fcntl.h>
#include <dirent.h>
#include <inttypes.h>
@@ -424,3 +425,64 @@ bool check_vmflag_io(void *addr)
flags += flaglen;
}
}
+
+/*
+ * Open an fd at /proc/$pid/maps and configure procmap_out ready for
+ * PROCMAP_QUERY query. Returns 0 on success, or an error code otherwise.
+ */
+int open_procmap(pid_t pid, struct procmap_fd *procmap_out)
+{
+ char path[256];
+ int ret = 0;
+
+ memset(procmap_out, '\0', sizeof(*procmap_out));
+ sprintf(path, "/proc/%d/maps", pid);
+ procmap_out->query.size = sizeof(procmap_out->query);
+ procmap_out->fd = open(path, O_RDONLY);
+ if (procmap_out < 0)
+ ret = -errno;
+
+ return ret;
+}
+
+/* Perform PROCMAP_QUERY. Returns 0 on success, or an error code otherwise. */
+int query_procmap(struct procmap_fd *procmap)
+{
+ int ret = 0;
+
+ if (ioctl(procmap->fd, PROCMAP_QUERY, &procmap->query) == -1)
+ ret = -errno;
+
+ return ret;
+}
+
+/*
+ * Try to find the VMA at specified address, returns true if found, false if not
+ * found, and the test is failed if any other error occurs.
+ *
+ * On success, procmap->query is populated with the results.
+ */
+bool find_vma_procmap(struct procmap_fd *procmap, void *address)
+{
+ int err;
+
+ procmap->query.query_flags = 0;
+ procmap->query.query_addr = (unsigned long)address;
+ err = query_procmap(procmap);
+ if (!err)
+ return true;
+
+ if (err != -ENOENT)
+ ksft_exit_fail_msg("%s: Error %d on ioctl(PROCMAP_QUERY)\n",
+ __func__, err);
+ return false;
+}
+
+/*
+ * Close fd used by PROCMAP_QUERY mechanism. Returns 0 on success, or an error
+ * code otherwise.
+ */
+int close_procmap(struct procmap_fd *procmap)
+{
+ return close(procmap->fd);
+}
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 6effafdc4d8a..9211ba640d9c 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -6,6 +6,7 @@
#include <strings.h> /* ffsl() */
#include <unistd.h> /* _SC_PAGESIZE */
#include "../kselftest.h"
+#include <linux/fs.h>
#define BIT_ULL(nr) (1ULL << (nr))
#define PM_SOFT_DIRTY BIT_ULL(55)
@@ -19,6 +20,15 @@
extern unsigned int __page_size;
extern unsigned int __page_shift;
+/*
+ * Represents an open fd and PROCMAP_QUERY state for binary (via ioctl)
+ * /proc/$pid/[s]maps lookup.
+ */
+struct procmap_fd {
+ int fd;
+ struct procmap_query query;
+};
+
static inline unsigned int psize(void)
{
if (!__page_size)
@@ -73,6 +83,17 @@ int uffd_register_with_ioctls(int uffd, void *addr, uint64_t len,
bool miss, bool wp, bool minor, uint64_t *ioctls);
unsigned long get_free_hugepages(void);
bool check_vmflag_io(void *addr);
+int open_procmap(pid_t pid, struct procmap_fd *procmap_out);
+int query_procmap(struct procmap_fd *procmap);
+bool find_vma_procmap(struct procmap_fd *procmap, void *address);
+int close_procmap(struct procmap_fd *procmap);
+
+static inline int open_self_procmap(struct procmap_fd *procmap_out)
+{
+ pid_t pid = getpid();
+
+ return open_procmap(pid, procmap_out);
+}
/*
* On ppc64 this will only work with radix 2M hugepage size
--
2.49.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 3/3] tools/testing/selftests: assert that anon merge cases behave as expected
2025-04-08 9:29 [PATCH v2 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 1/3] mm/vma: " Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
@ 2025-04-08 9:29 ` Lorenzo Stoakes
2 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2025-04-08 9:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Jann Horn, Liam R . Howlett, Suren Baghdasaryan,
David Hildenbrand, Matthew Wilcox, Rik van Riel, Wei Yang,
linux-mm, linux-kernel
Prior to the recently applied commit that permits this merge,
mprotect()'ing a faulted VMA, adjacent to an unfaulted VMA, such that the
two share characteristics would fail to merge due to what appear to be
unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging cloned
VMAs").
Now we have fixed this bug, assert that we can indeed merge anonymous VMAs
this way.
Also assert that forked source/target VMAs are equally
rejected. Previously, all empty target anon merges with one VMA faulted and
the other unfaulted would be rejected incorrectly, now we ensure that
unforked merge, but forked do not.
Additionally, add the new test file to the MEMORY MAPPING section in
MAINTAINERS, as these tests are explicitly memory mapping related.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
MAINTAINERS | 1 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/merge.c | 455 ++++++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
5 files changed, 462 insertions(+)
create mode 100644 tools/testing/selftests/mm/merge.c
diff --git a/MAINTAINERS b/MAINTAINERS
index de97cd54ff24..358819584ec1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15550,6 +15550,7 @@ F: mm/mseal.c
F: mm/vma.c
F: mm/vma.h
F: mm/vma_internal.h
+F: tools/testing/selftests/mm/merge.c
F: tools/testing/vma/
MEMORY TECHNOLOGY DEVICES (MTD)
diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index c5241b193db8..91db34941a14 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -58,3 +58,4 @@ hugetlb_dio
pkey_sighandler_tests_32
pkey_sighandler_tests_64
guard-regions
+merge
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 8270895039d1..ad4d6043a60f 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -98,6 +98,7 @@ TEST_GEN_FILES += hugetlb_madv_vs_map
TEST_GEN_FILES += hugetlb_dio
TEST_GEN_FILES += droppable
TEST_GEN_FILES += guard-regions
+TEST_GEN_FILES += merge
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c
new file mode 100644
index 000000000000..c76646cdf6e6
--- /dev/null
+++ b/tools/testing/selftests/mm/merge.c
@@ -0,0 +1,455 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include "vm_util.h"
+
+FIXTURE(merge)
+{
+ unsigned int page_size;
+ char *carveout;
+ struct procmap_fd procmap;
+};
+
+FIXTURE_SETUP(merge)
+{
+ self->page_size = psize();
+ /* Carve out PROT_NONE region to map over. */
+ self->carveout = mmap(NULL, 12 * self->page_size, PROT_NONE,
+ MAP_ANON | MAP_PRIVATE, -1, 0);
+ ASSERT_NE(self->carveout, MAP_FAILED);
+ /* Setup PROCMAP_QUERY interface. */
+ ASSERT_EQ(open_self_procmap(&self->procmap), 0);
+}
+
+FIXTURE_TEARDOWN(merge)
+{
+ ASSERT_EQ(munmap(self->carveout, 12 * self->page_size), 0);
+ ASSERT_EQ(close_procmap(&self->procmap), 0);
+}
+
+TEST_F(merge, mprotect_unfaulted_left)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ char *ptr;
+
+ /*
+ * Map 10 pages of R/W memory within. MAP_NORESERVE so we don't hit
+ * merge failure due to lack of VM_ACCOUNT flag by mistake.
+ *
+ * |-----------------------|
+ * | unfaulted |
+ * |-----------------------|
+ */
+ ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+ /*
+ * Now make the first 5 pages read-only, splitting the VMA:
+ *
+ * RO RW
+ * |-----------|-----------|
+ * | unfaulted | unfaulted |
+ * |-----------|-----------|
+ */
+ ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
+ /*
+ * Fault in the first of the last 5 pages so it gets an anon_vma and
+ * thus the whole VMA becomes 'faulted':
+ *
+ * RO RW
+ * |-----------|-----------|
+ * | unfaulted | faulted |
+ * |-----------|-----------|
+ */
+ ptr[5 * page_size] = 'x';
+ /*
+ * Now mprotect() the RW region read-only, we should merge (though for
+ * ~15 years we did not! :):
+ *
+ * RO
+ * |-----------------------|
+ * | faulted |
+ * |-----------------------|
+ */
+ ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
+
+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
+}
+
+TEST_F(merge, mprotect_unfaulted_right)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ char *ptr;
+
+ /*
+ * |-----------------------|
+ * | unfaulted |
+ * |-----------------------|
+ */
+ ptr = mmap(&carveout[page_size], 10 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+ /*
+ * Now make the last 5 pages read-only, splitting the VMA:
+ *
+ * RW RO
+ * |-----------|-----------|
+ * | unfaulted | unfaulted |
+ * |-----------|-----------|
+ */
+ ASSERT_EQ(mprotect(&ptr[5 * page_size], 5 * page_size, PROT_READ), 0);
+ /*
+ * Fault in the first of the first 5 pages so it gets an anon_vma and
+ * thus the whole VMA becomes 'faulted':
+ *
+ * RW RO
+ * |-----------|-----------|
+ * | faulted | unfaulted |
+ * |-----------|-----------|
+ */
+ ptr[0] = 'x';
+ /*
+ * Now mprotect() the RW region read-only, we should merge:
+ *
+ * RO
+ * |-----------------------|
+ * | faulted |
+ * |-----------------------|
+ */
+ ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ), 0);
+
+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 10 * page_size);
+}
+
+TEST_F(merge, mprotect_unfaulted_both)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ char *ptr;
+
+ /*
+ * |-----------------------|
+ * | unfaulted |
+ * |-----------------------|
+ */
+ ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+ /*
+ * Now make the first and last 3 pages read-only, splitting the VMA:
+ *
+ * RO RW RO
+ * |-----------|-----------|-----------|
+ * | unfaulted | unfaulted | unfaulted |
+ * |-----------|-----------|-----------|
+ */
+ ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
+ ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
+ /*
+ * Fault in the first of the middle 3 pages so it gets an anon_vma and
+ * thus the whole VMA becomes 'faulted':
+ *
+ * RO RW RO
+ * |-----------|-----------|-----------|
+ * | unfaulted | faulted | unfaulted |
+ * |-----------|-----------|-----------|
+ */
+ ptr[3 * page_size] = 'x';
+ /*
+ * Now mprotect() the RW region read-only, we should merge:
+ *
+ * RO
+ * |-----------------------|
+ * | faulted |
+ * |-----------------------|
+ */
+ ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
+
+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
+}
+
+TEST_F(merge, mprotect_faulted_left_unfaulted_right)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ char *ptr;
+
+ /*
+ * |-----------------------|
+ * | unfaulted |
+ * |-----------------------|
+ */
+ ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+ /*
+ * Now make the last 3 pages read-only, splitting the VMA:
+ *
+ * RW RO
+ * |-----------------------|-----------|
+ * | unfaulted | unfaulted |
+ * |-----------------------|-----------|
+ */
+ ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
+ /*
+ * Fault in the first of the first 6 pages so it gets an anon_vma and
+ * thus the whole VMA becomes 'faulted':
+ *
+ * RW RO
+ * |-----------------------|-----------|
+ * | unfaulted | unfaulted |
+ * |-----------------------|-----------|
+ */
+ ptr[0] = 'x';
+ /*
+ * Now make the first 3 pages read-only, splitting the VMA:
+ *
+ * RO RW RO
+ * |-----------|-----------|-----------|
+ * | faulted | faulted | unfaulted |
+ * |-----------|-----------|-----------|
+ */
+ ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
+ /*
+ * Now mprotect() the RW region read-only, we should merge:
+ *
+ * RO
+ * |-----------------------|
+ * | faulted |
+ * |-----------------------|
+ */
+ ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
+
+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
+}
+
+TEST_F(merge, mprotect_unfaulted_left_faulted_right)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ char *ptr;
+
+ /*
+ * |-----------------------|
+ * | unfaulted |
+ * |-----------------------|
+ */
+ ptr = mmap(&carveout[2 * page_size], 9 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+ /*
+ * Now make the first 3 pages read-only, splitting the VMA:
+ *
+ * RO RW
+ * |-----------|-----------------------|
+ * | unfaulted | unfaulted |
+ * |-----------|-----------------------|
+ */
+ ASSERT_EQ(mprotect(ptr, 3 * page_size, PROT_READ), 0);
+ /*
+ * Fault in the first of the last 6 pages so it gets an anon_vma and
+ * thus the whole VMA becomes 'faulted':
+ *
+ * RO RW
+ * |-----------|-----------------------|
+ * | unfaulted | faulted |
+ * |-----------|-----------------------|
+ */
+ ptr[3 * page_size] = 'x';
+ /*
+ * Now make the last 3 pages read-only, splitting the VMA:
+ *
+ * RO RW RO
+ * |-----------|-----------|-----------|
+ * | unfaulted | faulted | faulted |
+ * |-----------|-----------|-----------|
+ */
+ ASSERT_EQ(mprotect(&ptr[6 * page_size], 3 * page_size, PROT_READ), 0);
+ /*
+ * Now mprotect() the RW region read-only, we should merge:
+ *
+ * RO
+ * |-----------------------|
+ * | faulted |
+ * |-----------------------|
+ */
+ ASSERT_EQ(mprotect(&ptr[3 * page_size], 3 * page_size, PROT_READ), 0);
+
+ /* Assert that the merge succeeded using PROCMAP_QUERY. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 9 * page_size);
+}
+
+TEST_F(merge, forked_target_vma)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ pid_t pid;
+ char *ptr, *ptr2;
+ int i;
+
+ /*
+ * |-----------|
+ * | unfaulted |
+ * |-----------|
+ */
+ ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+
+ /*
+ * Fault in process.
+ *
+ * |-----------|
+ * | faulted |
+ * |-----------|
+ */
+ ptr[0] = 'x';
+
+ pid = fork();
+ ASSERT_NE(pid, -1);
+
+ if (pid != 0) {
+ wait(NULL);
+ return;
+ }
+
+ /* Child process below: */
+
+ /* Reopen for child. */
+ ASSERT_EQ(close_procmap(&self->procmap), 0);
+ ASSERT_EQ(open_self_procmap(&self->procmap), 0);
+
+ /* unCOWing everything does not cause the AVC to go away. */
+ for (i = 0; i < 5 * page_size; i += page_size)
+ ptr[i] = 'x';
+
+ /*
+ * Map in adjacent VMA in child.
+ *
+ * forked
+ * |-----------|-----------|
+ * | faulted | unfaulted |
+ * |-----------|-----------|
+ * ptr ptr2
+ */
+ ptr2 = mmap(&ptr[5 * page_size], 5 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+ ASSERT_NE(ptr2, MAP_FAILED);
+
+ /* Make sure not merged. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 5 * page_size);
+}
+
+TEST_F(merge, forked_source_vma)
+{
+ unsigned int page_size = self->page_size;
+ char *carveout = self->carveout;
+ struct procmap_fd *procmap = &self->procmap;
+ pid_t pid;
+ char *ptr, *ptr2;
+ int i;
+
+ /*
+ * |-----------|------------|
+ * | unfaulted | <unmapped> |
+ * |-----------|------------|
+ */
+ ptr = mmap(&carveout[page_size], 5 * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr, MAP_FAILED);
+
+ /*
+ * Fault in process.
+ *
+ * |-----------|------------|
+ * | faulted | <unmapped> |
+ * |-----------|------------|
+ */
+ ptr[0] = 'x';
+
+ pid = fork();
+ ASSERT_NE(pid, -1);
+
+ if (pid != 0) {
+ wait(NULL);
+ return;
+ }
+
+ /* Child process below: */
+
+ /* Reopen for child. */
+ ASSERT_EQ(close_procmap(&self->procmap), 0);
+ ASSERT_EQ(open_self_procmap(&self->procmap), 0);
+
+ /* unCOWing everything does not cause the AVC to go away. */
+ for (i = 0; i < 5 * page_size; i += page_size)
+ ptr[i] = 'x';
+
+ /*
+ * Map in adjacent VMA in child, ptr2 after ptr, but incompatible.
+ *
+ * forked RW RWX
+ * |-----------|-----------|
+ * | faulted | unfaulted |
+ * |-----------|-----------|
+ * ptr ptr2
+ */
+ ptr2 = mmap(&carveout[6 * page_size], 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC,
+ MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE, -1, 0);
+ ASSERT_NE(ptr2, MAP_FAILED);
+
+ /* Make sure not merged. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
+
+ /*
+ * Now mprotect forked region to RWX so it becomes the source for the
+ * merge to unfaulted region:
+ *
+ * forked RWX RWX
+ * |-----------|-----------|
+ * | faulted | unfaulted |
+ * |-----------|-----------|
+ * ptr ptr2
+ *
+ * This should NOT result in a merge, as ptr was forked.
+ */
+ ASSERT_EQ(mprotect(ptr, 5 * page_size, PROT_READ | PROT_WRITE | PROT_EXEC), 0);
+ /* Again, make sure not merged. */
+ ASSERT_TRUE(find_vma_procmap(procmap, ptr2));
+ ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr2);
+ ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 9aff33b10999..188b125bf1f6 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -79,6 +79,8 @@ separated by spaces:
test prctl(PR_SET_MDWE, ...)
- page_frag
test handling of page fragment allocation and freeing
+- vma_merge
+ test VMA merge cases behave as expected
example: ./run_vmtests.sh -t "hmm mmap ksm"
EOF
@@ -421,6 +423,8 @@ CATEGORY="madv_guard" run_test ./guard-regions
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+CATEGORY="vma_merge" run_test ./merge
+
if [ -x ./memfd_secret ]
then
(echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
--
2.49.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
2025-04-08 9:29 ` [PATCH v2 1/3] mm/vma: " Lorenzo Stoakes
@ 2025-05-12 2:39 ` Harry Yoo
2025-05-12 11:24 ` Lorenzo Stoakes
0 siblings, 1 reply; 6+ messages in thread
From: Harry Yoo @ 2025-05-12 2:39 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
Rik van Riel, Wei Yang, linux-mm, linux-kernel
On Tue, Apr 08, 2025 at 10:29:31AM +0100, Lorenzo Stoakes wrote:
> anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm:
> change anon_vma linking to fix multi-process server scalability issue").
>
> This patch was introduced in March 2010. As part of this change, careful
> attention was made to the instance of mprotect() causing a VMA merge, with one
> faulted (i.e. having anon_vma set) and another not:
>
> /*
> * Easily overlooked: when mprotect shifts the boundary,
> * make sure the expanding vma has anon_vma set if the
> * shrinking vma had, to cover any anon pages imported.
> */
>
> In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
> anon_vma_clone()).
>
> This case is one of the three configurations of adjacent VMA anon_vma state
> that we might encounter on merge (where dst is the VMA which will be merged
> into and src the one being merged into dst):
>
> 1. dst->anon_vma, src->anon_vma - These must be equal, no-op.
> 2. dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
> 3. !dst->anon_vma, src->anon_vma - The case in question here.
>
> In case 3, the instance addressed here - we duplicate the AVC connections
> from src and place into dst.
>
> However, in practice, we very often do NOT do this.
>
> This appears to be due to an inadvertent consequence of the change
> introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
> introduced in May 2011.
>
> This implies that this merge case was functional only for a little over a
> year, and has since been broken for ~15 years.
>
> Here, lock scalability concerns lead to us restricting anonymous merges
> only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
> VMA that is not connected to any parent process's anon_vma.
>
> The mergeability test looks like this:
>
> static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> {
> if ((!anon_vma1 || !anon_vma2) && (!vma ||
> !vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
> return true;
> return anon_vma1 == anon_vma2;
> }
>
> However, we have a problem here - typically the vma passed here is the
> destination VMA.
>
> For instance in vma_merge_existing_range() we invoke:
>
> can_vma_merge_left()
> -> [ check that there is an immediately adjacent prior VMA ]
> -> can_vma_merge_after()
> -> is_mergeable_vma() for general attribute check
> -> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
>
> So if we were considering a target unfaulted 'prev':
>
> unfaulted faulted
> |-----------|-----------|
> | prev | vma |
> |-----------|-----------|
>
> This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
>
> The list_is_singular() check for vma->anon_vma_chain, an empty list on
> fault, would cause this merge to _fail_ even though all else indicates a
> merge.
>
> Equally a simple merge into a next VMA would hit the same problem:
>
> faulted unfaulted
> |-----------|-----------|
> | vma | next |
> |-----------|-----------|
>
> can_vma_merge_right()
> -> [ check that there is an immediately adjacent succeeding VMA ]
> -> can_vma_merge_before()
> -> is_mergeable_vma() for general attribute check
> -> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
>
> For a 3-way merge, we'd also hit the same problem if it was configured like
> this for instance:
>
> unfaulted faulted unfaulted
> |-----------|-----------|-----------|
> | prev | vma | next |
> |-----------|-----------|-----------|
>
> As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
> next, both of which would fail.
>
> vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
> the new VMA would never already be faulted (it is a proposed new range).
>
> Because we already handle each of the aforementioned merge cases, and can
> absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
> src->anon_vma, there is absolutely no reason to disallow this kind of
> merge.
>
> It seems that the intention of this patch is to ensure that, in the
> instance of merging unfaulted VMAs with faulted ones, we never wish to do
> so with those with multiple AVCs due to the fact that anon_vma lock's are
> held across both parent and child anon_vma's (actually, the 'root' parent
> anon_vma's lock is used).
>
> In fact, the original commit alludes to this - "find_mergeable_anon_vma()
> already considers this case".
>
> In find_mergeable_anon_vma() however, we check the anon_vma which will be
> merged from, if it is set, then we check
> list_is_singular(vma->anon_vma_chain).
>
> So to match this logic, update is_mergeable_anon_vma() to perform this
> scalability check on the VMA whose anon_vma we ultimately merge into.
>
> This matches existing behaviour with forked VMAs, only we no longer wrongly
> disallow ALL empty target merges.
>
> So we both allow merge cases and ensure the scalability check is correctly
> applied.
>
> We may wish to revisit these lock scalability concerns at a later date and
> ensure they are still valid.
>
> Additionally, correct userland VMA tests which were mistakenly not
> asserting these cases correctly previously to now correctly assert this,
> and to ensure vmg->anon_vma state is always consistent to account for newly
> introduced asserts.
>
> Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
A little bit late to review, but better late than never :)
This patch makes sense and looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges
2025-05-12 2:39 ` Harry Yoo
@ 2025-05-12 11:24 ` Lorenzo Stoakes
0 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2025-05-12 11:24 UTC (permalink / raw)
To: Harry Yoo
Cc: Andrew Morton, Vlastimil Babka, Jann Horn, Liam R . Howlett,
Suren Baghdasaryan, David Hildenbrand, Matthew Wilcox,
Rik van Riel, Wei Yang, linux-mm, linux-kernel
On Mon, May 12, 2025 at 11:39:16AM +0900, Harry Yoo wrote:
> On Tue, Apr 08, 2025 at 10:29:31AM +0100, Lorenzo Stoakes wrote:
> > anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm:
> > change anon_vma linking to fix multi-process server scalability issue").
> >
> > This patch was introduced in March 2010. As part of this change, careful
> > attention was made to the instance of mprotect() causing a VMA merge, with one
> > faulted (i.e. having anon_vma set) and another not:
> >
> > /*
> > * Easily overlooked: when mprotect shifts the boundary,
> > * make sure the expanding vma has anon_vma set if the
> > * shrinking vma had, to cover any anon pages imported.
> > */
> >
> > In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
> > anon_vma_clone()).
> >
> > This case is one of the three configurations of adjacent VMA anon_vma state
> > that we might encounter on merge (where dst is the VMA which will be merged
> > into and src the one being merged into dst):
> >
> > 1. dst->anon_vma, src->anon_vma - These must be equal, no-op.
> > 2. dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
> > 3. !dst->anon_vma, src->anon_vma - The case in question here.
> >
> > In case 3, the instance addressed here - we duplicate the AVC connections
> > from src and place into dst.
> >
> > However, in practice, we very often do NOT do this.
> >
> > This appears to be due to an inadvertent consequence of the change
> > introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
> > introduced in May 2011.
> >
> > This implies that this merge case was functional only for a little over a
> > year, and has since been broken for ~15 years.
> >
> > Here, lock scalability concerns lead to us restricting anonymous merges
> > only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
> > VMA that is not connected to any parent process's anon_vma.
> >
> > The mergeability test looks like this:
> >
> > static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> > struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> > {
> > if ((!anon_vma1 || !anon_vma2) && (!vma ||
> > !vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
> > return true;
> > return anon_vma1 == anon_vma2;
> > }
> >
> > However, we have a problem here - typically the vma passed here is the
> > destination VMA.
> >
> > For instance in vma_merge_existing_range() we invoke:
> >
> > can_vma_merge_left()
> > -> [ check that there is an immediately adjacent prior VMA ]
> > -> can_vma_merge_after()
> > -> is_mergeable_vma() for general attribute check
> > -> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> >
> > So if we were considering a target unfaulted 'prev':
> >
> > unfaulted faulted
> > |-----------|-----------|
> > | prev | vma |
> > |-----------|-----------|
> >
> > This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> >
> > The list_is_singular() check for vma->anon_vma_chain, an empty list on
> > fault, would cause this merge to _fail_ even though all else indicates a
> > merge.
> >
> > Equally a simple merge into a next VMA would hit the same problem:
> >
> > faulted unfaulted
> > |-----------|-----------|
> > | vma | next |
> > |-----------|-----------|
> >
> > can_vma_merge_right()
> > -> [ check that there is an immediately adjacent succeeding VMA ]
> > -> can_vma_merge_before()
> > -> is_mergeable_vma() for general attribute check
> > -> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
> >
> > For a 3-way merge, we'd also hit the same problem if it was configured like
> > this for instance:
> >
> > unfaulted faulted unfaulted
> > |-----------|-----------|-----------|
> > | prev | vma | next |
> > |-----------|-----------|-----------|
> >
> > As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
> > next, both of which would fail.
> >
> > vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
> > the new VMA would never already be faulted (it is a proposed new range).
> >
> > Because we already handle each of the aforementioned merge cases, and can
> > absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
> > src->anon_vma, there is absolutely no reason to disallow this kind of
> > merge.
> >
> > It seems that the intention of this patch is to ensure that, in the
> > instance of merging unfaulted VMAs with faulted ones, we never wish to do
> > so with those with multiple AVCs due to the fact that anon_vma lock's are
> > held across both parent and child anon_vma's (actually, the 'root' parent
> > anon_vma's lock is used).
> >
> > In fact, the original commit alludes to this - "find_mergeable_anon_vma()
> > already considers this case".
> >
> > In find_mergeable_anon_vma() however, we check the anon_vma which will be
> > merged from, if it is set, then we check
> > list_is_singular(vma->anon_vma_chain).
> >
> > So to match this logic, update is_mergeable_anon_vma() to perform this
> > scalability check on the VMA whose anon_vma we ultimately merge into.
> >
> > This matches existing behaviour with forked VMAs, only we no longer wrongly
> > disallow ALL empty target merges.
> >
> > So we both allow merge cases and ensure the scalability check is correctly
> > applied.
> >
> > We may wish to revisit these lock scalability concerns at a later date and
> > ensure they are still valid.
> >
> > Additionally, correct userland VMA tests which were mistakenly not
> > asserting these cases correctly previously to now correctly assert this,
> > and to ensure vmg->anon_vma state is always consistent to account for newly
> > introduced asserts.
> >
> > Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > ---
>
> A little bit late to review, but better late than never :)
>
> This patch makes sense and looks good to me,
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Thanks! Appreciated :)
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-05-12 11:24 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-08 9:29 [PATCH v2 0/3] fix incorrectly disallowed anonymous VMA merges Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 1/3] mm/vma: " Lorenzo Stoakes
2025-05-12 2:39 ` Harry Yoo
2025-05-12 11:24 ` Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 2/3] tools/testing: add PROCMAP_QUERY helper functions in mm self tests Lorenzo Stoakes
2025-04-08 9:29 ` [PATCH v2 3/3] tools/testing/selftests: assert that anon merge cases behave as expected Lorenzo Stoakes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox