linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH 07/10] mm: avoid using vma_merge() for new VMAs
Date: Thu, 8 Aug 2024 19:02:09 +0100	[thread overview]
Message-ID: <dec849e1-cfa5-4a1b-820b-8dc2ee5a8bdc@lucifer.local> (raw)
In-Reply-To: <28774b9d-b74e-4028-acef-5d4f09a5d36a@suse.cz>

On Thu, Aug 08, 2024 at 06:45:43PM GMT, Vlastimil Babka wrote:
> On 8/5/24 14:13, Lorenzo Stoakes wrote:
> > In mmap_region() and do_brk_flags() we open code scenarios where we prefer
> > to use vma_expand() rather than invoke a full vma_merge() operation.
> >
> > Abstract this logic and eliminate all of the open-coding, and also use the
> > same logic for all cases where we add new VMAs to, rather than ultimately
> > use vma_merge(), rather use vma_expand().
> >
> > We implement this by replacing vma_merge_new_vma() with this newly
> > abstracted logic.
> >
> > Doing so removes duplication and simplifies VMA merging in all such cases,
> > laying the ground for us to eliminate the merging of new VMAs in
> > vma_merge() altogether.
> >
> > This makes it far easier to understand what is happening in these cases
> > avoiding confusion, bugs and allowing for future optimisation.
> >
> > As a result of this change we are also able to make vma_prepare(),
> > init_vma_prep(), vma_complete(), can_vma_merge_before() and
> > can_vma_merge_after() static and internal to vma.c.
>
> That's really great, but it would be even better if these code moves could
> be a separate patch as it would make reviewing so much easier. But with git
> diff's --color-moved to the rescue, let me try...

Will separate out on respin.

>
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  mm/mmap.c                        |  79 ++---
> >  mm/vma.c                         | 482 +++++++++++++++++++------------
> >  mm/vma.h                         |  51 +---
> >  tools/testing/vma/vma_internal.h |   6 +
> >  4 files changed, 324 insertions(+), 294 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f6593a81f73d..c03f50f46396 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1363,8 +1363,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  {
> >  	struct mm_struct *mm = current->mm;
> >  	struct vm_area_struct *vma = NULL;
> > -	struct vm_area_struct *next, *prev, *merge;
> > -	pgoff_t pglen = len >> PAGE_SHIFT;
> > +	struct vm_area_struct *merge;
> >  	unsigned long charged = 0;
> >  	unsigned long end = addr + len;
> >  	bool writable_file_mapping = false;
> > @@ -1411,44 +1410,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  		vm_flags |= VM_ACCOUNT;
> >  	}
> >
> > -	next = vmg.next = vma_next(&vmi);
> > -	prev = vmg.prev = vma_prev(&vmi);
> > -	if (vm_flags & VM_SPECIAL) {
> > -		if (prev)
> > -			vma_iter_next_range(&vmi);
> > -		goto cannot_expand;
> > -	}
> > -
> > -	/* Attempt to expand an old mapping */
> > -	/* Check next */
> > -	if (next && next->vm_start == end && can_vma_merge_before(&vmg)) {
> > -		/* We can adjust this as can_vma_merge_after() doesn't touch */
> > -		vmg.end = next->vm_end;
> > -		vma = vmg.vma = next;
> > -		vmg.pgoff = next->vm_pgoff - pglen;
> > -
> > -		/* We may merge our NULL anon_vma with non-NULL in next. */
> > -		vmg.anon_vma = vma->anon_vma;
> > -	}
> > -
> > -	/* Check prev */
> > -	if (prev && prev->vm_end == addr && can_vma_merge_after(&vmg)) {
> > -		vmg.start = prev->vm_start;
> > -		vma = vmg.vma = prev;
> > -		vmg.pgoff = prev->vm_pgoff;
> > -	} else if (prev) {
> > -		vma_iter_next_range(&vmi);
> > -	}
> > -
> > -	/* Actually expand, if possible */
> > -	if (vma && !vma_expand(&vmg)) {
> > -		khugepaged_enter_vma(vma, vm_flags);
> > +	vma = vma_merge_new_vma(&vmg);
> > +	if (vma)
> >  		goto expanded;
> > -	}
> > -
> > -	if (vma == prev)
> > -		vma_iter_set(&vmi, addr);
> > -cannot_expand:
> >
> >  	/*
> >  	 * Determine the object being mapped and call the appropriate
> > @@ -1493,10 +1457,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  		 * If vm_flags changed after call_mmap(), we should try merge
> >  		 * vma again as we may succeed this time.
> >  		 */
> > -		if (unlikely(vm_flags != vma->vm_flags && prev)) {
> > -			merge = vma_merge_new_vma_wrapper(&vmi, prev, vma,
> > -							  vma->vm_start, vma->vm_end,
> > -							  vma->vm_pgoff);
> > +		if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
> > +			merge = vma_merge_new_vma(&vmg);
>
> Can this even succeed if we don't update vmg->vm_flags? Previously the
> wrapper would take them from vma.

You're right... ugh. Will fix.

This is yet another example of how having this _not_ be under test is
problematic, as that'd have picked this up.

I will try to move at least VMA merge invocation logic over in a later
series.

>
> > +
> >  			if (merge) {
> >  				/*
> >  				 * ->mmap() can change vma->vm_file and fput
>
> <snip>
>
> > +/*
> > + * vma_merge_new_vma - Attempt to merge a new VMA into address space
> > + *
> > + * @vmg: Describes the VMA we are adding, in the range @vmg->start to @vmg->end
> > + *       (exclusive), which we try to merge with any adjacent VMAs if possible.
> > + *
> > + * We are about to add a VMA to the address space starting at @vmg->start and
> > + * ending at @vmg->end. There are three different possible scenarios:
> > + *
> > + * 1. There is a VMA with identical properties immediately adjacent to the
> > + *    proposed new VMA [@vmg->start, @vmg->end) either before or after it -
> > + *    EXPAND that VMA:
> > + *
> > + * Proposed:       |-----|  or  |-----|
> > + * Existing:  |----|                  |----|
> > + *
> > + * 2. There are VMAs with identical properties immediately adjacent to the
> > + *    proposed new VMA [@vmg->start, @vmg->end) both before AND after it -
> > + *    EXPAND the former and REMOVE the latter:
> > + *
> > + * Proposed:       |-----|
> > + * Existing:  |----|     |----|
> > + *
> > + * 3. There are no VMAs immediately adjacent to the proposed new VMA or those
> > + *    VMAs do not have identical attributes - NO MERGE POSSIBLE.
> > + *
> > + * In instances where we can merge, this function returns the expanded VMA which
> > + * will have its range adjusted accordingly and the underlying maple tree also
> > + * adjusted.
> > + *
> > + * Returns: In instances where no merge was possible, NULL. Otherwise, a pointer
> > + *          to the VMA we expanded.
> > + *
> > + * This function also adjusts @vmg to provide @vmg->prev and @vmg->next if
> > + * neither already specified, and adjusts [@vmg->start, @vmg->end) to span the
> > + * expanded range.
> > + *
> > + * ASSUMPTIONS:
> > + * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
> > + * - The caller must have determined that [@vmg->start, @vmg->end) is empty.
>
> Should we be paranoid and assert something?

This will have a performance impact, if we do that we'll want something like
an #ifdef CONFIG_DEBUG_VM around that.

>
> > + */
> > +struct vm_area_struct *vma_merge_new_vma(struct vma_merge_struct *vmg)
> > +{
> > +	bool is_special = vmg->flags & VM_SPECIAL;
> > +	struct vm_area_struct *prev = vmg->prev;
> > +	struct vm_area_struct *next = vmg->next;
> > +	unsigned long start = vmg->start;
> > +	unsigned long end = vmg->end;
> > +	pgoff_t pgoff = vmg->pgoff;
> > +	pgoff_t pglen = PHYS_PFN(end - start);
> > +
> > +	VM_WARN_ON(vmg->vma);
> > +
> > +	if (!prev && !next) {
> > +		/*
> > +		 * Since the caller must have determined that the requested
> > +		 * range is empty, vmg->vmi will be left pointing at the VMA
> > +		 * immediately prior.
> > +		 */
>
> OK that's perhaps not that obvious, as it seems copy_vma() is doing some
> special dance to ensure this. Should we add it to the ASSUMPTIONS and assert
> it, or is there a maple tree operation we can do to ensure it, ideally if
> it's very cheap if the iterator is already set the way we want it to be?
>

To be fair this is something that was previously assumed, and I just added
a comment.

Will add to assumptions, and again I think any assert should be done in
such a way that under non-CONFIG_DEBUG_VM nothing happens, maybe
VM_WARN_ON()?

Will try to come up with something.

> > +		next = vmg->next = vma_next(vmg->vmi);
> > +		prev = vmg->prev = vma_prev(vmg->vmi);
> > +
> > +		/* Avoid maple tree re-walk. */
> > +		if (is_special && prev)
> > +			vma_iter_next_range(vmg->vmi);
>
> I wish I knew what this did but seems it's the same as the old code did so
> hopefully that's fine.

I think point is that we are about to exit, so we'd be left pointing at
prev. But since we're exiting in just a second, we want to be pointing at
the next vma which will become the prev of the next merge attempt.

Liam can maybe elucidate further.

>
> > +	}
> > +
> > +	/* If special mapping or no adjacent VMAs, nothing to merge. */
> > +	if (is_special || (!prev && !next))
> > +		return NULL;
> > +
> > +	/* If we can merge with the following VMA, adjust vmg accordingly. */
> > +	if (next && next->vm_start == end && can_vma_merge_before(vmg)) {
> > +		/*
> > +		 * We can adjust this here as can_vma_merge_after() doesn't
> > +		 * touch vmg->end.
> > +		 */
> > +		vmg->end = next->vm_end;
> > +		vmg->vma = next;
> > +		vmg->pgoff = next->vm_pgoff - pglen;
> > +
> > +		vmg->anon_vma = next->anon_vma;
> > +	}
> > +
> > +	/* If we can merge with the previous VMA, adjust vmg accordingly. */
> > +	if (prev && prev->vm_end == start && can_vma_merge_after(vmg)) {
> > +		vmg->start = prev->vm_start;
> > +		vmg->vma = prev;
> > +		vmg->pgoff = prev->vm_pgoff;
> > +	} else if (prev) {
> > +		vma_iter_next_range(vmg->vmi);
> > +	}
>
> Sigh... ditto.
>

(Liam can correct me) I think this is just setting up the vmi similar to
the other case such that if expansion fails we can positioned correctly for
the next merge attempt.

Yes it's fiddly, maybe needs a comment...


  reply	other threads:[~2024-08-08 18:02 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05 12:13 [PATCH 00/10] mm: remove vma_merge() Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 01/10] tools: improve vma test Makefile Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 02/10] mm: introduce vma_merge_struct and abstract merge parameters Lorenzo Stoakes
2024-08-06 12:47   ` Petr Tesařík
2024-08-06 13:43     ` Lorenzo Stoakes
2024-08-06 14:06       ` Petr Tesařík
2024-08-06 14:20         ` Lorenzo Stoakes
2024-08-06 14:32           ` Petr Tesařík
2024-08-08 12:49   ` Vlastimil Babka
2024-08-08 17:18     ` Lorenzo Stoakes
2024-08-08 20:07   ` Liam R. Howlett
2024-08-09 10:11     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 03/10] mm: abstract duplicated policy comparison Lorenzo Stoakes
2024-08-06 12:50   ` Petr Tesařík
2024-08-05 12:13 ` [PATCH 04/10] mm: abstract parameters for vma_expand/shrink() Lorenzo Stoakes
2024-08-06 12:54   ` Petr Tesařík
     [not found]   ` <f12608ec-9c40-4977-a5a6-479f86b44e80@kernel.org>
2024-08-08 15:45     ` Lorenzo Stoakes
2024-08-08 20:20       ` Liam R. Howlett
2024-08-09 10:18         ` Lorenzo Stoakes
2024-08-14 13:53       ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 05/10] mm: abstract vma_merge_new_vma() to use vma_merge_struct Lorenzo Stoakes
     [not found]   ` <82b802e0-94fd-4cca-ad8f-ea2d85bcae64@kernel.org>
2024-08-08 15:52     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 06/10] tools: add VMA merge tests Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 07/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
2024-08-06 13:04   ` Petr Tesařík
2024-08-06 13:44     ` Lorenzo Stoakes
2024-08-08 16:45   ` Vlastimil Babka
2024-08-08 18:02     ` Lorenzo Stoakes [this message]
2024-08-08 18:34       ` Liam R. Howlett
2024-08-08 19:06         ` Liam R. Howlett
2024-08-09 10:14           ` Lorenzo Stoakes
2024-08-09 15:23   ` Liam R. Howlett
2024-08-09 17:20     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 08/10] mm: introduce commit_merge(), abstracting merge operation Lorenzo Stoakes
2024-08-06 13:41   ` Petr Tesařík
2024-08-06 13:48     ` Lorenzo Stoakes
2024-08-06 14:13       ` Petr Tesařík
2024-08-06 14:30         ` Lorenzo Stoakes
2024-08-06 14:39           ` Petr Tesařík
2024-08-09 10:15   ` Vlastimil Babka
2024-08-09 10:53     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 09/10] mm: refactor vma_merge() into modify-only vma_merge_modified() Lorenzo Stoakes
2024-08-06 13:42   ` Petr Tesařík
2024-08-06 13:52     ` Lorenzo Stoakes
2024-08-09 13:44   ` Vlastimil Babka
2024-08-09 13:57     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
2024-08-06 13:55   ` Petr Tesařík
2024-08-06 14:08     ` Lorenzo Stoakes
2024-08-06 14:21       ` Petr Tesařík
2024-08-06 14:42         ` Lorenzo Stoakes
2024-08-09 14:25   ` Vlastimil Babka
2024-08-09 14:37     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dec849e1-cfa5-4a1b-820b-8dc2ee5a8bdc@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox