linux-mm.kvack.org archive mirror
* [PATCH hotfix 6.12 0/2] introduce VMA merge mode to improve brk() performance
@ 2024-10-17 14:31 Lorenzo Stoakes
  2024-10-17 14:31 ` [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode and optimise do_brk_flags() Lorenzo Stoakes
  2024-10-17 14:31 ` [PATCH hotfix 6.12 2/2] tools: testing: add expand-only mode VMA test Lorenzo Stoakes
  0 siblings, 2 replies; 5+ messages in thread
From: Lorenzo Stoakes @ 2024-10-17 14:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-mm,
	linux-kernel, Oliver Sang

A ~5% performance regression in aim9.brk_test.ops_per_sec was reported by
the Linux kernel test robot [0].

In the past to satisfy brk() performance we duplicated VMA expansion code
and special-cased do_brk_flags(). This is however horrid and undoes work to
abstract this logic, so in resolving the issue I have endeavoured to avoid
this.

Investigating further, I observed that the use of a vma_iter_next_range()
and vma_prev() pair causes an unnecessary maple tree walk. In addition, we
do work that is simply unnecessary for brk().

Therefore, add a special VMA merge mode, VMG_FLAG_JUST_EXPAND, to avoid
doing any of this. It assumes the VMA iterator is pointing at the previous
VMA and skips logic that brk() does not require.
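
To make the idea concrete, here is a minimal userspace sketch of a
flag-gated merge. It is not the kernel code: every toy_* name is
hypothetical, the iterator is reduced to a counter of simulated
repositionings, and the default-mode caller is assumed to advance to the
gap before the merge steps back to prev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; these are not the kernel's types. */
struct toy_vma {
	unsigned long start, end, flags;
};

enum toy_merge_flags {
	TOY_FLAG_DEFAULT     = 0,
	TOY_FLAG_JUST_EXPAND = 1 << 0,
};

struct toy_merge {
	struct toy_vma *prev, *next;      /* neighbours of the new range */
	unsigned long start, end, flags;  /* the range being added */
	enum toy_merge_flags merge_flags;
	int iter_moves;                   /* simulated iterator repositionings */
};

static bool toy_compatible(const struct toy_vma *vma, unsigned long flags)
{
	return vma && vma->flags == flags;
}

static struct toy_vma *toy_merge_new_range(struct toy_merge *m)
{
	bool just_expand = m->merge_flags & TOY_FLAG_JUST_EXPAND;
	bool left = toy_compatible(m->prev, m->flags) &&
		    m->prev->end == m->start;
	/* Expand-only mode never looks to the right. */
	bool right = !just_expand && toy_compatible(m->next, m->flags) &&
		     m->next->start == m->end;

	assert(m->start < m->end);

	if (left) {
		if (!just_expand)
			m->iter_moves++;  /* models stepping back to prev */
		/* The toy does not unlink next when merging both sides. */
		m->prev->end = right ? m->next->end : m->end;
		return m->prev;
	}
	if (right) {
		m->next->start = m->start;
		return m->next;
	}
	return NULL;
}

/* Models the brk() caller: the iterator starts out positioned at prev. */
static struct toy_vma *toy_brk_expand(struct toy_merge *m)
{
	if (!(m->merge_flags & TOY_FLAG_JUST_EXPAND))
		m->iter_moves++;  /* models advancing to the gap after prev */
	return toy_merge_new_range(m);
}
```

In this model both modes expand prev to the same end address; the only
difference is that the default mode pays for two iterator repositionings
while the expand-only mode pays for none.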

This mostly eliminates the performance regression, reducing it to ~2%, which
is in the realm of noise. In addition, the will-it-scale test brk2, written
to be more representative of real-world brk() usage, shows a modest
performance improvement - which gives me confidence that we are not
meaningfully regressing real workloads here.

This series includes a test asserting that the 'just expand' mode works as
expected.

With many thanks to Oliver Sang for helping with performance testing of
candidate patch sets!

[0]:https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com

Lorenzo Stoakes (2):
  mm/vma: add expand-only VMA merge mode and optimise do_brk_flags()
  tools: testing: add expand-only mode VMA test

 mm/mmap.c               |  3 ++-
 mm/vma.c                | 23 +++++++++++++++--------
 mm/vma.h                | 14 ++++++++++++++
 tools/testing/vma/vma.c | 40 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+), 9 deletions(-)

--
2.46.2



* [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode and optimise do_brk_flags()
  2024-10-17 14:31 [PATCH hotfix 6.12 0/2] introduce VMA merge mode to improve brk() performance Lorenzo Stoakes
@ 2024-10-17 14:31 ` Lorenzo Stoakes
  2024-10-17 17:41   ` Liam R. Howlett
  2024-10-17 14:31 ` [PATCH hotfix 6.12 2/2] tools: testing: add expand-only mode VMA test Lorenzo Stoakes
  1 sibling, 1 reply; 5+ messages in thread
From: Lorenzo Stoakes @ 2024-10-17 14:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-mm,
	linux-kernel, Oliver Sang

We know in advance that do_brk_flags() wants only to perform a VMA
expansion (if the prior VMA is compatible), and that we assume no mergeable
VMA follows it.

These are the semantics of this function prior to the recent rewrite of the
VMA merging logic, however we are now doing more work than necessary -
positioning the VMA iterator at the prior VMA and performing tasks that are
not required.

Add a new field to the vmg struct to permit merge flags and add a new merge
flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
do_brk_flags() use this.

This fixes a reported performance regression in a brk() benchmarking suite.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com
Fixes: cacded5e42b9 ("mm: avoid using vma_merge() for new VMAs")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c |  3 ++-
 mm/vma.c  | 23 +++++++++++++++--------
 mm/vma.h  | 14 ++++++++++++++
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..4a13d9f138f6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1744,7 +1744,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));

 		vmg.prev = vma;
-		vma_iter_next_range(vmi);
+		/* vmi is positioned at prev, which this mode expects. */
+		vmg.merge_flags = VMG_FLAG_JUST_EXPAND;

 		if (vma_merge_new_range(&vmg))
 			goto out;
diff --git a/mm/vma.c b/mm/vma.c
index 4737afcb064c..b21ffec33f8e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -917,6 +917,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	pgoff_t pgoff = vmg->pgoff;
 	pgoff_t pglen = PHYS_PFN(end - start);
 	bool can_merge_left, can_merge_right;
+	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;

 	mmap_assert_write_locked(vmg->mm);
 	VM_WARN_ON(vmg->vma);
@@ -930,7 +931,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 		return NULL;

 	can_merge_left = can_vma_merge_left(vmg);
-	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
+	can_merge_right = !just_expand && can_vma_merge_right(vmg, can_merge_left);

 	/* If we can merge with the next VMA, adjust vmg accordingly. */
 	if (can_merge_right) {
@@ -953,7 +954,11 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 		if (can_merge_right && !can_merge_remove_vma(next))
 			vmg->end = end;

-		vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
+		/* In expand-only case we are already positioned at prev. */
+		if (!just_expand) {
+			/* Equivalent to going to the previous range. */
+			vma_prev(vmg->vmi);
+		}
 	}

 	/*
@@ -967,12 +972,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	}

 	/* If expansion failed, reset state. Allows us to retry merge later. */
-	vmg->vma = NULL;
-	vmg->start = start;
-	vmg->end = end;
-	vmg->pgoff = pgoff;
-	if (vmg->vma == prev)
-		vma_iter_set(vmg->vmi, start);
+	if (!just_expand) {
+		vmg->vma = NULL;
+		vmg->start = start;
+		vmg->end = end;
+		vmg->pgoff = pgoff;
+		if (vmg->vma == prev)
+			vma_iter_set(vmg->vmi, start);
+	}

 	return NULL;
 }
diff --git a/mm/vma.h b/mm/vma.h
index 819f994cf727..8c6ecc0dfbf6 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -59,6 +59,17 @@ enum vma_merge_state {
 	VMA_MERGE_SUCCESS,
 };

+enum vma_merge_flags {
+	VMG_FLAG_DEFAULT = 0,
+	/*
+	 * If we can expand, simply do so. We know there is nothing to merge to
+	 * the right. Does not reset state upon failure to merge. The VMA
+	 * iterator is assumed to be positioned at the previous VMA, rather than
+	 * at the gap.
+	 */
+	VMG_FLAG_JUST_EXPAND = 1 << 0,
+};
+
 /* Represents a VMA merge operation. */
 struct vma_merge_struct {
 	struct mm_struct *mm;
@@ -75,6 +86,7 @@ struct vma_merge_struct {
 	struct mempolicy *policy;
 	struct vm_userfaultfd_ctx uffd_ctx;
 	struct anon_vma_name *anon_name;
+	enum vma_merge_flags merge_flags;
 	enum vma_merge_state state;
 };

@@ -99,6 +111,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
 		.flags = flags_,					\
 		.pgoff = pgoff_,					\
 		.state = VMA_MERGE_START,				\
+		.merge_flags = VMG_FLAG_DEFAULT,			\
 	}

 #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_)	\
@@ -118,6 +131,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
 		.uffd_ctx = vma_->vm_userfaultfd_ctx,		\
 		.anon_name = anon_vma_name(vma_),		\
 		.state = VMA_MERGE_START,			\
+		.merge_flags = VMG_FLAG_DEFAULT,		\
 	}

 #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
--
2.46.2



* [PATCH hotfix 6.12 2/2] tools: testing: add expand-only mode VMA test
  2024-10-17 14:31 [PATCH hotfix 6.12 0/2] introduce VMA merge mode to improve brk() performance Lorenzo Stoakes
  2024-10-17 14:31 ` [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode and optimise do_brk_flags() Lorenzo Stoakes
@ 2024-10-17 14:31 ` Lorenzo Stoakes
  1 sibling, 0 replies; 5+ messages in thread
From: Lorenzo Stoakes @ 2024-10-17 14:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Liam R . Howlett, Vlastimil Babka, Jann Horn, linux-mm,
	linux-kernel, Oliver Sang

Add a test to assert that VMG_FLAG_JUST_EXPAND functions as expected - that
is, when the VMA iterator is positioned at the previous VMA and no VMAs
follow it, we observe an expansion with all state as expected.

Explicitly place an earlier VMA that would cause this test to fail if the
mode were not enabled (as the merge would traverse to the previous-previous
VMA).
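
The role of the extra VMA can be illustrated with a small userspace
sketch. This is a toy model and not the test harness: the toy_* names are
hypothetical, iterator positions are array indices, and an index equal to
the element count is assumed to stand for the gap after the last VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start, end;
};

struct toy_iter {
	struct toy_vma *vmas;  /* sorted, non-overlapping */
	size_t count;
	size_t pos;            /* index pointed at; count means trailing gap */
};

/* Step back one entry if possible; loosely models vma_prev(). */
static void toy_iter_prev(struct toy_iter *it)
{
	if (it->pos > 0)
		it->pos--;
}

/*
 * Extend whichever VMA the iterator ends up at. In expand-only mode the
 * iterator is trusted to be at prev already. Otherwise the code assumes it
 * sits in the gap after prev and steps back once.
 */
static struct toy_vma *toy_expand(struct toy_iter *it, unsigned long new_end,
				  bool just_expand)
{
	if (!just_expand)
		toy_iter_prev(it);

	assert(it->pos < it->count);
	it->vmas[it->pos].end = new_end;
	return &it->vmas[it->pos];
}
```

With the iterator at prev, expand-only mode extends prev and leaves the
position alone. Running the default path from the same position steps back
onto the earlier VMA instead, which is the failure the extra VMA is there
to expose.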

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 tools/testing/vma/vma.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index c53f220eb6cc..b33b47342d41 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -1522,6 +1522,45 @@ static bool test_copy_vma(void)
 	return true;
 }

+static bool test_expand_only_mode(void)
+{
+	unsigned long flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+	struct mm_struct mm = {};
+	VMA_ITERATOR(vmi, &mm, 0);
+	struct vm_area_struct *vma_prev, *vma;
+	VMG_STATE(vmg, &mm, &vmi, 0x5000, 0x9000, flags, 5);
+
+	/*
+	 * Place a VMA prior to the one we're expanding so we assert that we do
+	 * not erroneously try to traverse to the previous VMA even though we
+	 * have, through the use of VMG_FLAG_JUST_EXPAND, indicated we do not
+	 * need to do so.
+	 */
+	alloc_and_link_vma(&mm, 0, 0x2000, 0, flags);
+
+	/*
+	 * We will be positioned at the prev VMA, but looking to expand to
+	 * 0x9000.
+	 */
+	vma_iter_set(&vmi, 0x3000);
+	vma_prev = alloc_and_link_vma(&mm, 0x3000, 0x5000, 3, flags);
+	vmg.prev = vma_prev;
+	vmg.merge_flags = VMG_FLAG_JUST_EXPAND;
+
+	vma = vma_merge_new_range(&vmg);
+	ASSERT_NE(vma, NULL);
+	ASSERT_EQ(vma, vma_prev);
+	ASSERT_EQ(vmg.state, VMA_MERGE_SUCCESS);
+	ASSERT_EQ(vma->vm_start, 0x3000);
+	ASSERT_EQ(vma->vm_end, 0x9000);
+	ASSERT_EQ(vma->vm_pgoff, 3);
+	ASSERT_TRUE(vma_write_started(vma));
+	ASSERT_EQ(vma_iter_addr(&vmi), 0x3000);
+
+	cleanup_mm(&mm, &vmi);
+	return true;
+}
+
 int main(void)
 {
 	int num_tests = 0, num_fail = 0;
@@ -1553,6 +1592,7 @@ int main(void)
 	TEST(vmi_prealloc_fail);
 	TEST(merge_extend);
 	TEST(copy_vma);
+	TEST(expand_only_mode);

 #undef TEST

--
2.46.2



* Re: [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode and optimise do_brk_flags()
  2024-10-17 14:31 ` [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode and optimise do_brk_flags() Lorenzo Stoakes
@ 2024-10-17 17:41   ` Liam R. Howlett
  2024-10-17 17:46     ` Lorenzo Stoakes
  0 siblings, 1 reply; 5+ messages in thread
From: Liam R. Howlett @ 2024-10-17 17:41 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Vlastimil Babka, Jann Horn, linux-mm,
	linux-kernel, Oliver Sang

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [241017 10:31]:
> We know in advance that do_brk_flags() wants only to perform a VMA
> expansion (if the prior VMA is compatible), and that we assume no mergeable
> VMA follows it.
> 
> These are the semantics of this function prior to the recent rewrite of the
> VMA merging logic, however we are now doing more work than necessary -
> positioning the VMA iterator at the prior VMA and performing tasks that are
> not required.
> 
> Add a new field to the vmg struct to permit merge flags and add a new merge
> flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
> do_brk_flags() use this.

Funny, I was thinking we could do this for relocate_vma_down() as well,
but that's expanding in the wrong direction so we'd have to add
VMG_FLAG_JUST_EXPAND_DOWN, or the like.  I'm not sure it's worth doing
since the expand down happens a lot less often.

> 
> This fixes a reported performance regression in a brk() benchmarking suite.
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com
> Fixes: cacded5e42b9 ("mm: avoid using vma_merge() for new VMAs")
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

> ---
>  mm/mmap.c |  3 ++-
>  mm/vma.c  | 23 +++++++++++++++--------
>  mm/vma.h  | 14 ++++++++++++++
>  3 files changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..4a13d9f138f6 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1744,7 +1744,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> 
>  		vmg.prev = vma;
> -		vma_iter_next_range(vmi);
> +		/* vmi is positioned at prev, which this mode expects. */
> +		vmg.merge_flags = VMG_FLAG_JUST_EXPAND;
> 
>  		if (vma_merge_new_range(&vmg))
>  			goto out;
> diff --git a/mm/vma.c b/mm/vma.c
> index 4737afcb064c..b21ffec33f8e 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -917,6 +917,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  	pgoff_t pgoff = vmg->pgoff;
>  	pgoff_t pglen = PHYS_PFN(end - start);
>  	bool can_merge_left, can_merge_right;
> +	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
> 
>  	mmap_assert_write_locked(vmg->mm);

Could we detect the wrong location by ensuring that the vma iterator is
positioned at prev?

	VM_WARN_ON(just_expand && prev && prev->vm_end != vma_iter_end(vmg->vmi));

or, since vma_iter_addr is used above already..

	VM_WARN_ON(just_expand && prev && prev->vm_start != vma_iter_addr(vmg->vmi));


Does it make sense to just expand without a prev?  Should that be
checked separately?

Anyways, I think it's safer to keep these checks out of the regression
fix, but maybe better to add them later?
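
As a rough userspace sketch of the kind of check discussed here (not a
proposed kernel change): the toy_* names are hypothetical, the iterator is
reduced to the address it reports, and treating a missing prev as its own
misuse case is an assumption, since that question is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
	unsigned long start, end;
};

enum toy_pos_check {
	TOY_POS_OK,         /* nothing to complain about */
	TOY_POS_NO_PREV,    /* expand-only requested without a prev VMA */
	TOY_POS_MISPLACED,  /* iterator does not point at prev */
};

/*
 * Validate the expand-only precondition: the iterator address must match
 * the start of prev. Outside expand-only mode nothing is checked.
 */
static enum toy_pos_check toy_check_just_expand(bool just_expand,
						const struct toy_vma *prev,
						unsigned long iter_addr)
{
	assert(!prev || prev->start < prev->end);

	if (!just_expand)
		return TOY_POS_OK;
	if (!prev)
		return TOY_POS_NO_PREV;
	if (prev->start != iter_addr)
		return TOY_POS_MISPLACED;
	return TOY_POS_OK;
}
```

Returning distinct codes keeps the "no prev" case separate from the
"wrong position" case, so each could be warned about independently.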


>  	VM_WARN_ON(vmg->vma);
> @@ -930,7 +931,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  		return NULL;
> 
>  	can_merge_left = can_vma_merge_left(vmg);
> -	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
> +	can_merge_right = !just_expand && can_vma_merge_right(vmg, can_merge_left);
> 
>  	/* If we can merge with the next VMA, adjust vmg accordingly. */
>  	if (can_merge_right) {
> @@ -953,7 +954,11 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  		if (can_merge_right && !can_merge_remove_vma(next))
>  			vmg->end = end;
> 
> -		vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
> +		/* In expand-only case we are already positioned at prev. */
> +		if (!just_expand) {
> +			/* Equivalent to going to the previous range. */
> +			vma_prev(vmg->vmi);
> +		}
>  	}
> 
>  	/*
> @@ -967,12 +972,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  	}
> 
>  	/* If expansion failed, reset state. Allows us to retry merge later. */
> -	vmg->vma = NULL;
> -	vmg->start = start;
> -	vmg->end = end;
> -	vmg->pgoff = pgoff;
> -	if (vmg->vma == prev)
> -		vma_iter_set(vmg->vmi, start);
> +	if (!just_expand) {
> +		vmg->vma = NULL;
> +		vmg->start = start;
> +		vmg->end = end;
> +		vmg->pgoff = pgoff;
> +		if (vmg->vma == prev)
> +			vma_iter_set(vmg->vmi, start);
> +	}
> 
>  	return NULL;
>  }
> diff --git a/mm/vma.h b/mm/vma.h
> index 819f994cf727..8c6ecc0dfbf6 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -59,6 +59,17 @@ enum vma_merge_state {
>  	VMA_MERGE_SUCCESS,
>  };
> 
> +enum vma_merge_flags {
> +	VMG_FLAG_DEFAULT = 0,
> +	/*
> +	 * If we can expand, simply do so. We know there is nothing to merge to
> +	 * the right. Does not reset state upon failure to merge. The VMA
> +	 * iterator is assumed to be positioned at the previous VMA, rather than
> +	 * at the gap.
> +	 */
> +	VMG_FLAG_JUST_EXPAND = 1 << 0,
> +};
> +
>  /* Represents a VMA merge operation. */
>  struct vma_merge_struct {
>  	struct mm_struct *mm;
> @@ -75,6 +86,7 @@ struct vma_merge_struct {
>  	struct mempolicy *policy;
>  	struct vm_userfaultfd_ctx uffd_ctx;
>  	struct anon_vma_name *anon_name;
> +	enum vma_merge_flags merge_flags;
>  	enum vma_merge_state state;
>  };
> 
> @@ -99,6 +111,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
>  		.flags = flags_,					\
>  		.pgoff = pgoff_,					\
>  		.state = VMA_MERGE_START,				\
> +		.merge_flags = VMG_FLAG_DEFAULT,			\
>  	}
> 
>  #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_)	\
> @@ -118,6 +131,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
>  		.uffd_ctx = vma_->vm_userfaultfd_ctx,		\
>  		.anon_name = anon_vma_name(vma_),		\
>  		.state = VMA_MERGE_START,			\
> +		.merge_flags = VMG_FLAG_DEFAULT,		\
>  	}
> 
>  #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> --
> 2.46.2



* Re: [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode and optimise do_brk_flags()
  2024-10-17 17:41   ` Liam R. Howlett
@ 2024-10-17 17:46     ` Lorenzo Stoakes
  0 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Stoakes @ 2024-10-17 17:46 UTC (permalink / raw)
  To: Liam R. Howlett, Andrew Morton, Vlastimil Babka, Jann Horn,
	linux-mm, linux-kernel, Oliver Sang

On Thu, Oct 17, 2024 at 01:41:15PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [241017 10:31]:
> > We know in advance that do_brk_flags() wants only to perform a VMA
> > expansion (if the prior VMA is compatible), and that we assume no mergeable
> > VMA follows it.
> >
> > These are the semantics of this function prior to the recent rewrite of the
> > VMA merging logic, however we are now doing more work than necessary -
> > positioning the VMA iterator at the prior VMA and performing tasks that are
> > not required.
> >
> > Add a new field to the vmg struct to permit merge flags and add a new merge
> > flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
> > do_brk_flags() use this.
>
> Funny, I was thinking we could do this for relocate_vma_down() as well,
> but that's expanding in the wrong direction so we'd have to add
> VMG_FLAG_JUST_EXPAND_DOWN, or the like.  I'm not sure it's worth doing
> since the expand down happens a lot less often.

Yeah I think we have to carefully profile these and go case-by-case. My
initial series to fix this made things worse :P as usual in perf, developer
instinct (in this case mine) is often miles off.

>
> >
> > This fixes a reported performance regression in a brk() benchmarking suite.
> >
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com
> > Fixes: cacded5e42b9 ("mm: avoid using vma_merge() for new VMAs")
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
>
> > ---
> >  mm/mmap.c |  3 ++-
> >  mm/vma.c  | 23 +++++++++++++++--------
> >  mm/vma.h  | 14 ++++++++++++++
> >  3 files changed, 31 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..4a13d9f138f6 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1744,7 +1744,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> >
> >  		vmg.prev = vma;
> > -		vma_iter_next_range(vmi);
> > +		/* vmi is positioned at prev, which this mode expects. */
> > +		vmg.merge_flags = VMG_FLAG_JUST_EXPAND;
> >
> >  		if (vma_merge_new_range(&vmg))
> >  			goto out;
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 4737afcb064c..b21ffec33f8e 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -917,6 +917,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  	pgoff_t pgoff = vmg->pgoff;
> >  	pgoff_t pglen = PHYS_PFN(end - start);
> >  	bool can_merge_left, can_merge_right;
> > +	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
> >
> >  	mmap_assert_write_locked(vmg->mm);
>
> Could we detect the wrong location by ensuring that the vma iterator is
> positioned at prev?
>
> 	VM_WARN_ON(just_expand && prev && prev->vm_end != vma_iter_end(vmg->vmi));
>
> or, since vma_iter_addr is used above already..
>
> 	VM_WARN_ON(just_expand && prev && prev->vm_start != vma_iter_addr(vmg->vmi));
>
>
> Does it make sense to just expand without a prev?  Should that be
> checked separately?
>
> Anyways, I think it's safer to keep these checks out of the regression
> fix, but maybe better to add them later?

Yeah I do think it'd be worth adding some specific ones. But yeah perhaps
let's add those later!

>
>
> >  	VM_WARN_ON(vmg->vma);
> > @@ -930,7 +931,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  		return NULL;
> >
> >  	can_merge_left = can_vma_merge_left(vmg);
> > -	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
> > +	can_merge_right = !just_expand && can_vma_merge_right(vmg, can_merge_left);
> >
> >  	/* If we can merge with the next VMA, adjust vmg accordingly. */
> >  	if (can_merge_right) {
> > @@ -953,7 +954,11 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  		if (can_merge_right && !can_merge_remove_vma(next))
> >  			vmg->end = end;
> >
> > -		vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
> > +		/* In expand-only case we are already positioned at prev. */
> > +		if (!just_expand) {
> > +			/* Equivalent to going to the previous range. */
> > +			vma_prev(vmg->vmi);
> > +		}
> >  	}
> >
> >  	/*
> > @@ -967,12 +972,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  	}
> >
> >  	/* If expansion failed, reset state. Allows us to retry merge later. */
> > -	vmg->vma = NULL;
> > -	vmg->start = start;
> > -	vmg->end = end;
> > -	vmg->pgoff = pgoff;
> > -	if (vmg->vma == prev)
> > -		vma_iter_set(vmg->vmi, start);
> > +	if (!just_expand) {
> > +		vmg->vma = NULL;
> > +		vmg->start = start;
> > +		vmg->end = end;
> > +		vmg->pgoff = pgoff;
> > +		if (vmg->vma == prev)
> > +			vma_iter_set(vmg->vmi, start);
> > +	}
> >
> >  	return NULL;
> >  }
> > diff --git a/mm/vma.h b/mm/vma.h
> > index 819f994cf727..8c6ecc0dfbf6 100644
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -59,6 +59,17 @@ enum vma_merge_state {
> >  	VMA_MERGE_SUCCESS,
> >  };
> >
> > +enum vma_merge_flags {
> > +	VMG_FLAG_DEFAULT = 0,
> > +	/*
> > +	 * If we can expand, simply do so. We know there is nothing to merge to
> > +	 * the right. Does not reset state upon failure to merge. The VMA
> > +	 * iterator is assumed to be positioned at the previous VMA, rather than
> > +	 * at the gap.
> > +	 */
> > +	VMG_FLAG_JUST_EXPAND = 1 << 0,
> > +};
> > +
> >  /* Represents a VMA merge operation. */
> >  struct vma_merge_struct {
> >  	struct mm_struct *mm;
> > @@ -75,6 +86,7 @@ struct vma_merge_struct {
> >  	struct mempolicy *policy;
> >  	struct vm_userfaultfd_ctx uffd_ctx;
> >  	struct anon_vma_name *anon_name;
> > +	enum vma_merge_flags merge_flags;
> >  	enum vma_merge_state state;
> >  };
> >
> > @@ -99,6 +111,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> >  		.flags = flags_,					\
> >  		.pgoff = pgoff_,					\
> >  		.state = VMA_MERGE_START,				\
> > +		.merge_flags = VMG_FLAG_DEFAULT,			\
> >  	}
> >
> >  #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_)	\
> > @@ -118,6 +131,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> >  		.uffd_ctx = vma_->vm_userfaultfd_ctx,		\
> >  		.anon_name = anon_vma_name(vma_),		\
> >  		.state = VMA_MERGE_START,			\
> > +		.merge_flags = VMG_FLAG_DEFAULT,		\
> >  	}
> >
> >  #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> > --
> > 2.46.2


