linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 07/10] mm: avoid using vma_merge() for new VMAs
Date: Thu, 8 Aug 2024 14:34:29 -0400	[thread overview]
Message-ID: <jlie2tieccx5ulr3b77dpvw4fupeocsu4ftiuwtjkd663qkjzl@vc3imtfqgemi> (raw)
In-Reply-To: <dec849e1-cfa5-4a1b-820b-8dc2ee5a8bdc@lucifer.local>

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240808 14:02]:
> On Thu, Aug 08, 2024 at 06:45:43PM GMT, Vlastimil Babka wrote:
> > On 8/5/24 14:13, Lorenzo Stoakes wrote:
> > > In mmap_region() and do_brk_flags() we open code scenarios where we prefer
> > > to use vma_expand() rather than invoke a full vma_merge() operation.
> > >
> > > Abstract this logic and eliminate all of the open-coding, and also use the
> > > same logic for all cases where we add new VMAs to, rather than ultimately
> > > use vma_merge(), rather use vma_expand().
> > >
> > > We implement this by replacing vma_merge_new_vma() with this newly
> > > abstracted logic.
> > >
> > > Doing so removes duplication and simplifies VMA merging in all such cases,
> > > laying the ground for us to eliminate the merging of new VMAs in
> > > vma_merge() altogether.
> > >
> > > This makes it far easier to understand what is happening in these cases
> > > avoiding confusion, bugs and allowing for future optimisation.
> > >
> > > As a result of this change we are also able to make vma_prepare(),
> > > init_vma_prep(), vma_complete(), can_vma_merge_before() and
> > > can_vma_merge_after() static and internal to vma.c.
> >
> > That's really great, but it would be even better if these code moves could
> > be a separate patch as it would make reviewing so much easier. But with git
> > diff's --color-moved to the rescue, let me try...
> 
> Will separate out on respin.
> 
> >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > ---
> > >  mm/mmap.c                        |  79 ++---
> > >  mm/vma.c                         | 482 +++++++++++++++++++------------
> > >  mm/vma.h                         |  51 +---
> > >  tools/testing/vma/vma_internal.h |   6 +
> > >  4 files changed, 324 insertions(+), 294 deletions(-)

...
> > > + */
> > > +struct vm_area_struct *vma_merge_new_vma(struct vma_merge_struct *vmg)
> > > +{
> > > +	bool is_special = vmg->flags & VM_SPECIAL;
> > > +	struct vm_area_struct *prev = vmg->prev;
> > > +	struct vm_area_struct *next = vmg->next;
> > > +	unsigned long start = vmg->start;
> > > +	unsigned long end = vmg->end;
> > > +	pgoff_t pgoff = vmg->pgoff;
> > > +	pgoff_t pglen = PHYS_PFN(end - start);
> > > +
> > > +	VM_WARN_ON(vmg->vma);
> > > +
> > > +	if (!prev && !next) {
> > > +		/*
> > > +		 * Since the caller must have determined that the requested
> > > +		 * range is empty, vmg->vmi will be left pointing at the VMA
> > > +		 * immediately prior.
> > > +		 */
> >
> > OK that's perhaps not that obvious, as it seems copy_vma() is doing some
> > special dance to ensure this. Should we add it to the ASSUMPTIONS and assert
> > it, or is there a maple tree operation we can do to ensure it, ideally if
> > it's very cheap if the iterator is already set the way we want it to be?
> >
> 
> To be fair this is something that was previously assumed, and I just added
> a comment.
> 
> Will add to assumptions, and again I think any assert should be done in
> such a way that under non-CONFIG_DEBUG_VM nothing happens, maybe
> VM_WARN_ON()?
> 
> Will try to come up with something.
> 
> > > +		next = vmg->next = vma_next(vmg->vmi);
> > > +		prev = vmg->prev = vma_prev(vmg->vmi);
> > > +
> > > +		/* Avoid maple tree re-walk. */
> > > +		if (is_special && prev)
> > > +			vma_iter_next_range(vmg->vmi);
> >
> > I wish I knew what this did but seems it's the same as the old code did so
> > hopefully that's fine.
> 
> I think point is that we are about to exit, so we'd be left pointing at
> prev. But since we're exiting in just a second, we want to be pointing at
> the next vma which will become the prev of the next merge attempt.
> 
> Liam can maybe elucidate further.

What you have to remember is that the vma iterator (vmg->vmi above),
contains (or, basically is) a maple state (usually written as mas).  We
keep state of the maple tree walker so that we don't have to keep
re-walking to find the same thing.  We move around the tree with this
maple state because going prev/next is faster from leaves (almost always
just the next thing in the nodes array of pointers).

We use the maple state to write as well, so the maple state needs to
point to the correct location in the tree for a write.

The maple tree is a range-based tree, so each entry exists for a span of
values.  A write happens at the lowest index and can overwrite
subsequent values.  This means that the maple state needs to point to
the range containing the lowest index for the write (if it's pointing to
a node - it could walk from the top).

A side effect of writing to the lowest index is that we need to point to
the previous vma if we are going to 'expand' the vma.  The range is
essentially going to be from prev->start to "whatever we are expanding
over".

In the old code, the vm_flags & VM_SPECIAL code meant there was no way
an expansion was going to happen, but we've moved the maple state to the
wrong location for a write of a new vma - so this vma_iter_next_range()
just moves it back.  Then we "goto cannot_expand".

> 
> >
> > > +	}
> > > +
> > > +	/* If special mapping or no adjacent VMAs, nothing to merge. */
> > > +	if (is_special || (!prev && !next))
> > > +		return NULL;
> > > +
> > > +	/* If we can merge with the following VMA, adjust vmg accordingly. */
> > > +	if (next && next->vm_start == end && can_vma_merge_before(vmg)) {
> > > +		/*
> > > +		 * We can adjust this here as can_vma_merge_after() doesn't
> > > +		 * touch vmg->end.
> > > +		 */
> > > +		vmg->end = next->vm_end;
> > > +		vmg->vma = next;
> > > +		vmg->pgoff = next->vm_pgoff - pglen;
> > > +
> > > +		vmg->anon_vma = next->anon_vma;
> > > +	}
> > > +
> > > +	/* If we can merge with the previous VMA, adjust vmg accordingly. */
> > > +	if (prev && prev->vm_end == start && can_vma_merge_after(vmg)) {
> > > +		vmg->start = prev->vm_start;
> > > +		vmg->vma = prev;
> > > +		vmg->pgoff = prev->vm_pgoff;
> > > +	} else if (prev) {
> > > +		vma_iter_next_range(vmg->vmi);
> > > +	}
> >
> > Sigh... ditto.
> >
> 
> (Liam can correct me) I think this is just setting up the vmi similar to
> the other case such that if expansion fails we can positioned correctly for
> the next merge attempt.
> 
> Yes it's fiddly, maybe needs a comment...

Yes, ditto.



  reply	other threads:[~2024-08-08 18:34 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05 12:13 [PATCH 00/10] mm: remove vma_merge() Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 01/10] tools: improve vma test Makefile Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 02/10] mm: introduce vma_merge_struct and abstract merge parameters Lorenzo Stoakes
2024-08-06 12:47   ` Petr Tesařík
2024-08-06 13:43     ` Lorenzo Stoakes
2024-08-06 14:06       ` Petr Tesařík
2024-08-06 14:20         ` Lorenzo Stoakes
2024-08-06 14:32           ` Petr Tesařík
2024-08-08 12:49   ` Vlastimil Babka
2024-08-08 17:18     ` Lorenzo Stoakes
2024-08-08 20:07   ` Liam R. Howlett
2024-08-09 10:11     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 03/10] mm: abstract duplicated policy comparison Lorenzo Stoakes
2024-08-06 12:50   ` Petr Tesařík
2024-08-05 12:13 ` [PATCH 04/10] mm: abstract parameters for vma_expand/shrink() Lorenzo Stoakes
2024-08-06 12:54   ` Petr Tesařík
     [not found]   ` <f12608ec-9c40-4977-a5a6-479f86b44e80@kernel.org>
2024-08-08 15:45     ` Lorenzo Stoakes
2024-08-08 20:20       ` Liam R. Howlett
2024-08-09 10:18         ` Lorenzo Stoakes
2024-08-14 13:53       ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 05/10] mm: abstract vma_merge_new_vma() to use vma_merge_struct Lorenzo Stoakes
     [not found]   ` <82b802e0-94fd-4cca-ad8f-ea2d85bcae64@kernel.org>
2024-08-08 15:52     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 06/10] tools: add VMA merge tests Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 07/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
2024-08-06 13:04   ` Petr Tesařík
2024-08-06 13:44     ` Lorenzo Stoakes
2024-08-08 16:45   ` Vlastimil Babka
2024-08-08 18:02     ` Lorenzo Stoakes
2024-08-08 18:34       ` Liam R. Howlett [this message]
2024-08-08 19:06         ` Liam R. Howlett
2024-08-09 10:14           ` Lorenzo Stoakes
2024-08-09 15:23   ` Liam R. Howlett
2024-08-09 17:20     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 08/10] mm: introduce commit_merge(), abstracting merge operation Lorenzo Stoakes
2024-08-06 13:41   ` Petr Tesařík
2024-08-06 13:48     ` Lorenzo Stoakes
2024-08-06 14:13       ` Petr Tesařík
2024-08-06 14:30         ` Lorenzo Stoakes
2024-08-06 14:39           ` Petr Tesařík
2024-08-09 10:15   ` Vlastimil Babka
2024-08-09 10:53     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 09/10] mm: refactor vma_merge() into modify-only vma_merge_modified() Lorenzo Stoakes
2024-08-06 13:42   ` Petr Tesařík
2024-08-06 13:52     ` Lorenzo Stoakes
2024-08-09 13:44   ` Vlastimil Babka
2024-08-09 13:57     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
2024-08-06 13:55   ` Petr Tesařík
2024-08-06 14:08     ` Lorenzo Stoakes
2024-08-06 14:21       ` Petr Tesařík
2024-08-06 14:42         ` Lorenzo Stoakes
2024-08-09 14:25   ` Vlastimil Babka
2024-08-09 14:37     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jlie2tieccx5ulr3b77dpvw4fupeocsu4ftiuwtjkd663qkjzl@vc3imtfqgemi \
    --to=liam.howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox