linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH 09/10] mm: refactor vma_merge() into modify-only vma_merge_modified()
Date: Fri, 9 Aug 2024 14:57:13 +0100	[thread overview]
Message-ID: <f87b18e5-9caa-472f-9fe7-2ba30e4b46cd@lucifer.local> (raw)
In-Reply-To: <8c490115-59fe-4e87-ac07-ec7dd6a3ccd3@suse.cz>

On Fri, Aug 09, 2024 at 03:44:00PM GMT, Vlastimil Babka wrote:
> On 8/5/24 14:13, Lorenzo Stoakes wrote:
> > The existing vma_merge() function is no longer required to handle what were
> > previously referred to as cases 1-3 (i.e. the merging of a new VMA), as
> > this is now handled by vma_merge_new_vma().
> >
> > Additionally, we simplify the convoluted control flow of the original,
> > maintaining identical logic only expressed more clearly and doing away with
> > a complicated set of cases, rather logically examining each possible
> > outcome - merging of both the previous and subsequent VMA, merging of the
> > previous VMA and merging of the subsequent VMA alone.
> >
> > We now utilise the previously implemented commit_merge() function to share
> > logic with vma_expand() deduplicating code and providing less surface area
> > for bugs and confusion.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  mm/vma.c | 474 +++++++++++++++++++++++++++----------------------------
> >  mm/vma.h |   6 -
> >  2 files changed, 232 insertions(+), 248 deletions(-)
> >
> > diff --git a/mm/vma.c b/mm/vma.c
> > index b7e3c64d5d68..c55ae035f5d6 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -569,8 +569,7 @@ static int commit_merge(struct vma_merge_struct *vmg,
> >  			struct vm_area_struct *adjust,
> >  			struct vm_area_struct *remove,
> >  			struct vm_area_struct *remove2,
> > -			long adj_start,
> > -			bool expanded)
> > +			long adj_start, bool expanded)
> >  {
> >  	struct vma_prepare vp;
> >
> > @@ -607,6 +606,236 @@ static int commit_merge(struct vma_merge_struct *vmg,
> >  	return 0;
> >  }
> >
> > +/*
> > + * vma_merge_modified - Attempt to merge VMAs based on a VMA having its
> > + * attributes modified.
> > + *
> > + * @vmg: Describes the modifications being made to a VMA and associated
> > + *       metadata.
> > + *
> > + * When the attributes of a range within a VMA change, then it might be possible
> > + * for immediately adjacent VMAs to be merged into that VMA due to having
> > + * identical properties.
> > + *
> > + * This function checks for the existence of any such mergeable VMAs and updates
> > + * the maple tree describing the @vmg->vma->vm_mm address space to account for
> > + * this, as well as any VMAs shrunk/expanded/deleted as a result of this merge.
> > + *
> > + * As part of this operation, if a merge occurs, the @vmg object will have its
> > + * vma, start, end, and pgoff fields modified to execute the merge. Subsequent
> > + * calls to this function should reset these fields.
> > + *
> > + * Returns: The merged VMA if merge succeeds, or NULL otherwise.
> > + *
> > + * ASSUMPTIONS:
> > + * - The caller must assign the VMA to be modifed to vmg->vma.
> > + * - The caller must have set vmg->prev to the previous VMA, if there is one.
> > + * - The caller does not need to set vmg->next, as we determine this.
> > + * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
>
> Also there's again some assumption about vmi? :)

Yeah I will add.

>
> > + */
> > +static struct vm_area_struct *vma_merge_modified(struct vma_merge_struct *vmg)
> > +{
> > +	struct vm_area_struct *vma = vmg->vma;
> > +	struct vm_area_struct *prev = vmg->prev;
> > +	struct vm_area_struct *next, *res;
> > +	struct vm_area_struct *anon_dup = NULL;
> > +	struct vm_area_struct *adjust = NULL;
> > +	unsigned long start = vmg->start;
> > +	unsigned long end = vmg->end;
> > +	bool left_side = vma && start == vma->vm_start;
> > +	bool right_side = vma && end == vma->vm_end;
> > +	bool merge_will_delete_vma, merge_will_delete_next;
> > +	bool merge_left, merge_right;
> > +	bool merge_both = false;
> > +	int err = 0;
> > +	long adj_start = 0;
> > +
> > +	VM_WARN_ON(!vma); /* We are modifying a VMA, so caller must specify. */
> > +	VM_WARN_ON(vmg->next); /* We set this. */
> > +	VM_WARN_ON(prev && start <= prev->vm_start);
> > +	VM_WARN_ON(start >= end);
> > +	/*
> > +	 * If vma == prev, then we are offset into a VMA. Otherwise, if we are
> > +	 * not, we must span a portion of the VMA.
> > +	 */
> > +	VM_WARN_ON(vma && ((vma != prev && vmg->start != vma->vm_start) ||
> > +			   vmg->end > vma->vm_end));
> > +
> > +	/*
> > +	 * If a special mapping or neither at the furthermost left or right side
> > +	 * of the VMA, then we have no chance of merging and should abort.
> > +	 *
> > +	 * We later require that vma->vm_flags == vm_flags, so this tests
> > +	 * vma->vm_flags & VM_SPECIAL, too.
> > +	 */
> > +	if (vmg->flags & VM_SPECIAL || (!left_side && !right_side))
> > +		return NULL;
> > +
> > +	if (left_side && prev && prev->vm_end == start && can_vma_merge_after(vmg)) {
> > +		merge_left = true;
> > +		vma_prev(vmg->vmi);
> > +	} else {
> > +		merge_left = false;
> > +	}
> > +
> > +	if (right_side) {
> > +		next = vmg->next = vma_lookup(vma->vm_mm, end);
> > +
> > +		/*
> > +		 * We can merge right if there is a subsequent VMA, if it is
> > +		 * immediately adjacent, and if it is compatible with vma.
> > +		 */
> > +		merge_right = next && end == next->vm_start &&
> > +			can_vma_merge_before(vmg);
> > +
> > +		/*
> > +		 * We can only merge both if the anonymous VMA of the previous
> > +		 * VMA is compatible with the anonymous VMA of the subsequent
> > +		 * VMA.
> > +		 *
> > +		 * Otherwise, we default to merging only the left.
> > +		 */
> > +		if (merge_left && merge_right)
> > +			merge_right = merge_both =
> > +				is_mergeable_anon_vma(prev->anon_vma,
> > +						      next->anon_vma, NULL);
> > +	} else {
> > +		merge_right = false;
> > +		next = NULL;
> > +	}
> > +
> > +	/* If we have nothing to merge, abort. */
> > +	if (!merge_left && !merge_right)
> > +		return NULL;
> > +
> > +	/* If we span the entire VMA, a merge implies it will be deleted. */
> > +	merge_will_delete_vma = left_side && right_side;
> > +	/* If we merge both VMAs, then next is also deleted. */
> > +	merge_will_delete_next = merge_both;
> > +
> > +	/* No matter what happens, we will be adjusting vma. */
> > +	vma_start_write(vma);
> > +
> > +	if (merge_left)
> > +		vma_start_write(prev);
> > +
> > +	if (merge_right)
> > +		vma_start_write(next);
> > +
> > +	if (merge_both) {
> > +		/*
> > +		 *         |<----->|
> > +		 * |-------*********-------|
> > +		 *   prev     vma     next
> > +		 *  extend   delete  delete
> > +		 */
> > +
> > +		vmg->vma = prev;
> > +		vmg->start = prev->vm_start;
> > +		vmg->end = next->vm_end;
> > +		vmg->pgoff = prev->vm_pgoff;
> > +
> > +		/*
> > +		 * We already ensured anon_vma compatibility above, so now it's
> > +		 * simply a case of, if prev has no anon_vma object, which of
> > +		 * next or vma contains the anon_vma we must duplicate.
> > +		 */
> > +		err = dup_anon_vma(prev, next->anon_vma ? next : vma, &anon_dup);
> > +	} else if (merge_left) {
> > +		/*
> > +		 *         |<----->| OR
> > +		 *         |<--------->|
> > +		 * |-------*************
> > +		 *   prev       vma
> > +		 *  extend shrink/delete
> > +		 */
> > +
> > +		unsigned long end = vmg->end;
>
> Nit: This is only used once below, thus could be used directly?

Yeah this is probably a holdover from a previous (maybe buggy before I
fixed it) version of this code which used it more than once.

Will fix.

>
> > +
> > +		vmg->vma = prev;
> > +		vmg->start = prev->vm_start;
> > +		vmg->pgoff = prev->vm_pgoff;
> > +
> > +		if (merge_will_delete_vma) {
> > +			/*
> > +			 * can_vma_merge_after() assumed we would not be
> > +			 * removing vma, so it skipped the check for
> > +			 * vm_ops->close, but we are removing vma.
> > +			 */
> > +			if (vma->vm_ops && vma->vm_ops->close)
> > +				err = -EINVAL;
> > +		} else {
> > +			adjust = vma;
> > +			adj_start = end - vma->vm_start;
> > +		}
> > +
> > +		if (!err)
> > +			err = dup_anon_vma(prev, vma, &anon_dup);
> > +	} else { /* merge_right */
> > +		/*
> > +		 *     |<----->| OR
> > +		 * |<--------->|
> > +		 * *************-------|
> > +		 *      vma       next
> > +		 * shrink/delete extend
> > +		 */
> > +
> > +		pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
> > +
> > +		VM_WARN_ON(!merge_right);
> > +		/* If we are offset into a VMA, then prev must be vma. */
> > +		VM_WARN_ON(vmg->start > vma->vm_start && prev && vma != prev);
> > +
> > +		if (merge_will_delete_vma) {
> > +			vmg->vma = next;
> > +			vmg->end = next->vm_end;
> > +			vmg->pgoff = next->vm_pgoff - pglen;
> > +		} else {
> > +			/*
> > +			 * We shrink vma and expand next.
> > +			 *
> > +			 * IMPORTANT: This is the ONLY case where the final
> > +			 * merged VMA is NOT vmg->vma, but rather vmg->next.
> > +			 */
> > +
> > +			vmg->start = vma->vm_start;
> > +			vmg->end = start;
> > +			vmg->pgoff = vma->vm_pgoff;
> > +
> > +			adjust = next;
> > +			adj_start = -(vma->vm_end - start);
> > +		}
> > +
> > +		err = dup_anon_vma(next, vma, &anon_dup);
> > +	}
> > +
> > +	if (err)
> > +		goto abort;
> > +
> > +	if (commit_merge(vmg, adjust,
> > +			 merge_will_delete_vma ? vma : NULL,
> > +			 merge_will_delete_next ? next : NULL,
> > +			 adj_start,
> > +			 /*
> > +			  * In nearly all cases, we expand vmg->vma. There is
> > +			  * one exception - merge_right where we partially span
> > +			  * the VMA. In this case we shrink the end of vmg->vma
> > +			  * and adjust the start of vmg->next accordingly.
> > +			  */
> > +			 !merge_right || merge_will_delete_vma))
> > +		return NULL;
>
> If this fails, you need to unlink_anon_vma() ? The old code did.

You're right, good spot this is a subtle one...

Will fix and I'll add a test for this too. The preallocate would have to
fail, but we can simulate that now...

>
>
> > +	res = merge_left ? prev : next;
> > +	khugepaged_enter_vma(res, vmg->flags);
> > +
> > +	return res;
> > +
> > +abort:
> > +	vma_iter_set(vmg->vmi, start);
> > +	vma_iter_load(vmg->vmi);
> > +	return NULL;
> > +}
> > +
> >  /*
> >   * vma_merge_new_vma - Attempt to merge a new VMA into address space
> >   *
> > @@ -1022,245 +1251,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> >  	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> >  }
> >
> > -/*
> > - * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
> > - * figure out whether that can be merged with its predecessor or its
> > - * successor.  Or both (it neatly fills a hole).
> > - *
> > - * In most cases - when called for mmap, brk or mremap - [addr,end) is
> > - * certain not to be mapped by the time vma_merge is called; but when
> > - * called for mprotect, it is certain to be already mapped (either at
> > - * an offset within prev, or at the start of next), and the flags of
> > - * this area are about to be changed to vm_flags - and the no-change
> > - * case has already been eliminated.
> > - *
> > - * The following mprotect cases have to be considered, where **** is
> > - * the area passed down from mprotect_fixup, never extending beyond one
> > - * vma, PPPP is the previous vma, CCCC is a concurrent vma that starts
> > - * at the same address as **** and is of the same or larger span, and
> > - * NNNN the next vma after ****:
> > - *
> > - *     ****             ****                   ****
> > - *    PPPPPPNNNNNN    PPPPPPNNNNNN       PPPPPPCCCCCC
> > - *    cannot merge    might become       might become
> > - *                    PPNNNNNNNNNN       PPPPPPPPPPCC
> > - *    mmap, brk or    case 4 below       case 5 below
> > - *    mremap move:
> > - *                        ****               ****
> > - *                    PPPP    NNNN       PPPPCCCCNNNN
> > - *                    might become       might become
> > - *                    PPPPPPPPPPPP 1 or  PPPPPPPPPPPP 6 or
> > - *                    PPPPPPPPNNNN 2 or  PPPPPPPPNNNN 7 or
> > - *                    PPPPNNNNNNNN 3     PPPPNNNNNNNN 8
> > - *
> > - * It is important for case 8 that the vma CCCC overlapping the
> > - * region **** is never going to extended over NNNN. Instead NNNN must
> > - * be extended in region **** and CCCC must be removed. This way in
> > - * all cases where vma_merge succeeds, the moment vma_merge drops the
> > - * rmap_locks, the properties of the merged vma will be already
> > - * correct for the whole merged range. Some of those properties like
> > - * vm_page_prot/vm_flags may be accessed by rmap_walks and they must
> > - * be correct for the whole merged range immediately after the
> > - * rmap_locks are released. Otherwise if NNNN would be removed and
> > - * CCCC would be extended over the NNNN range, remove_migration_ptes
> > - * or other rmap walkers (if working on addresses beyond the "end"
> > - * parameter) may establish ptes with the wrong permissions of CCCC
> > - * instead of the right permissions of NNNN.
> > - *
> > - * In the code below:
> > - * PPPP is represented by *prev
> > - * CCCC is represented by *curr or not represented at all (NULL)
> > - * NNNN is represented by *next or not represented at all (NULL)
> > - * **** is not represented - it will be merged and the vma containing the
> > - *      area is returned, or the function will return NULL
> > - */
>
> RIP our precious diagrams.
>

I always disliked these... and I can say so because I was involved in
changing them so it's self-criticism too :)

Very much representing the overwrought complexity of trying to do
everything in one function.


  reply	other threads:[~2024-08-09 13:57 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05 12:13 [PATCH 00/10] mm: remove vma_merge() Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 01/10] tools: improve vma test Makefile Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 02/10] mm: introduce vma_merge_struct and abstract merge parameters Lorenzo Stoakes
2024-08-06 12:47   ` Petr Tesařík
2024-08-06 13:43     ` Lorenzo Stoakes
2024-08-06 14:06       ` Petr Tesařík
2024-08-06 14:20         ` Lorenzo Stoakes
2024-08-06 14:32           ` Petr Tesařík
2024-08-08 12:49   ` Vlastimil Babka
2024-08-08 17:18     ` Lorenzo Stoakes
2024-08-08 20:07   ` Liam R. Howlett
2024-08-09 10:11     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 03/10] mm: abstract duplicated policy comparison Lorenzo Stoakes
2024-08-06 12:50   ` Petr Tesařík
2024-08-05 12:13 ` [PATCH 04/10] mm: abstract parameters for vma_expand/shrink() Lorenzo Stoakes
2024-08-06 12:54   ` Petr Tesařík
     [not found]   ` <f12608ec-9c40-4977-a5a6-479f86b44e80@kernel.org>
2024-08-08 15:45     ` Lorenzo Stoakes
2024-08-08 20:20       ` Liam R. Howlett
2024-08-09 10:18         ` Lorenzo Stoakes
2024-08-14 13:53       ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 05/10] mm: abstract vma_merge_new_vma() to use vma_merge_struct Lorenzo Stoakes
     [not found]   ` <82b802e0-94fd-4cca-ad8f-ea2d85bcae64@kernel.org>
2024-08-08 15:52     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 06/10] tools: add VMA merge tests Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 07/10] mm: avoid using vma_merge() for new VMAs Lorenzo Stoakes
2024-08-06 13:04   ` Petr Tesařík
2024-08-06 13:44     ` Lorenzo Stoakes
2024-08-08 16:45   ` Vlastimil Babka
2024-08-08 18:02     ` Lorenzo Stoakes
2024-08-08 18:34       ` Liam R. Howlett
2024-08-08 19:06         ` Liam R. Howlett
2024-08-09 10:14           ` Lorenzo Stoakes
2024-08-09 15:23   ` Liam R. Howlett
2024-08-09 17:20     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 08/10] mm: introduce commit_merge(), abstracting merge operation Lorenzo Stoakes
2024-08-06 13:41   ` Petr Tesařík
2024-08-06 13:48     ` Lorenzo Stoakes
2024-08-06 14:13       ` Petr Tesařík
2024-08-06 14:30         ` Lorenzo Stoakes
2024-08-06 14:39           ` Petr Tesařík
2024-08-09 10:15   ` Vlastimil Babka
2024-08-09 10:53     ` Lorenzo Stoakes
2024-08-05 12:13 ` [PATCH 09/10] mm: refactor vma_merge() into modify-only vma_merge_modified() Lorenzo Stoakes
2024-08-06 13:42   ` Petr Tesařík
2024-08-06 13:52     ` Lorenzo Stoakes
2024-08-09 13:44   ` Vlastimil Babka
2024-08-09 13:57     ` Lorenzo Stoakes [this message]
2024-08-05 12:13 ` [PATCH 10/10] mm: rework vm_ops->close() handling on VMA merge Lorenzo Stoakes
2024-08-06 13:55   ` Petr Tesařík
2024-08-06 14:08     ` Lorenzo Stoakes
2024-08-06 14:21       ` Petr Tesařík
2024-08-06 14:42         ` Lorenzo Stoakes
2024-08-09 14:25   ` Vlastimil Babka
2024-08-09 14:37     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f87b18e5-9caa-472f-9fe7-2ba30e4b46cd@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox