linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Jann Horn <jannh@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Harry Yoo <harry.yoo@oracle.com>,
	Yosry Ahmed <yosry.ahmed@linux.dev>
Subject: Re: [PATCH v2 5/7] mm/mremap: complete refactor of move_vma()
Date: Mon, 10 Mar 2025 15:38:17 +0000	[thread overview]
Message-ID: <06e4a72b-d599-4ba0-8cfe-69050b955768@lucifer.local> (raw)
In-Reply-To: <99922cca-ed8e-4996-8833-a89264783b28@suse.cz>

On Mon, Mar 10, 2025 at 04:11:59PM +0100, Vlastimil Babka wrote:
> On 3/6/25 11:34, Lorenzo Stoakes wrote:
> > We invoke ksm_madvise() with an intentionally dummy flags field, so no
> > need to pass around.
> >
> > Additionally, the code tries to be 'clever' with account_start,
> > account_end, using these to both check that vma->vm_start != 0 and that we
> > ought to account the newly split portion of VMA post-move, either before
> > or after it.
> >
> > We need to do this because we intentionally removed VM_ACCOUNT on the VMA
> > prior to unmapping, so we don't erroneously unaccount memory (we have
> > already calculated the correct amount to account and accounted it, any
> > subsequent subtraction will be incorrect).
> >
> > This patch significantly expands the comment (from 2002!) about
> > 'concealing' the flag to make it abundantly clear what's going on, as well
> > as adding and expanding a number of other comments also.
> >
> > We can remove account_start, account_end by instead tracking when we
> > account (i.e.  vma->vm_flags has the VM_ACCOUNT flag set, and this is not
> > an MREMAP_DONTUNMAP operation), and figuring out when to reinstate the
> > VM_ACCOUNT flag on prior/subsequent VMAs separately.
> >
> > We additionally break the function into logical pieces and attack the very
> > confusing error handling logic (where, for instance, new_addr is set to
> > err).
> >
> > After this change the code is considerably more readable and easy to
> > manipulate.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

>
> Some nits below:
>
> > +/*
> > + * Unmap source VMA for VMA move, turning it from a copy to a move, being
> > + * careful to ensure we do not underflow memory account while doing so if an
> > + * accountable move.
> > + *
> > + * This is best effort, if we fail to unmap then we simply try
>
> this looks like an unfinished sentence?

Damn yeah, will fix. Will wait for rest of your review before either
fix-patching or respinning.

>
> > @@ -1007,51 +1157,15 @@ static unsigned long move_vma(struct vma_remap_struct *vrm)
> >  	 */
> >  	hiwater_vm = mm->hiwater_vm;
>
> This...
>
> > -	/* Tell pfnmap has moved from this vma */
> > -	if (unlikely(vma->vm_flags & VM_PFNMAP))
> > -		untrack_pfn_clear(vma);
> > -
> > -	if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP))) {
> > -		/* We always clear VM_LOCKED[ONFAULT] on the old vma */
> > -		vm_flags_clear(vma, VM_LOCKED_MASK);
> > -
> > -		/*
> > -		 * anon_vma links of the old vma is no longer needed after its page
> > -		 * table has been moved.
> > -		 */
> > -		if (new_vma != vma && vma->vm_start == old_addr &&
> > -			vma->vm_end == (old_addr + old_len))
> > -			unlink_anon_vmas(vma);
> > -
> > -		/* Because we won't unmap we don't need to touch locked_vm */
> > -		vrm_stat_account(vrm, new_len);
> > -		return new_addr;
> > -	}
> > -
> > -	vrm_stat_account(vrm, new_len);
> > -
> > -	vma_iter_init(&vmi, mm, old_addr);
> > -	if (do_vmi_munmap(&vmi, mm, old_addr, old_len, vrm->uf_unmap, false) < 0) {
> > -		/* OOM: unable to split vma, just get accounts right */
> > -		if (vm_flags & VM_ACCOUNT && !(vrm->flags & MREMAP_DONTUNMAP))
> > -			vm_acct_memory(old_len >> PAGE_SHIFT);
> > -		account_start = account_end = false;
> > -	}
> > +	vrm_stat_account(vrm, vrm->new_len);
> > +	if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP)))
> > +		dontunmap_complete(vrm, new_vma);
> > +	else
> > +		unmap_source_vma(vrm);
> >
> >  	mm->hiwater_vm = hiwater_vm;
>
> ... and this AFAICS only applies to the unmap_source_vma() case. And from
> the comment it seems only to the "undo destination vma" variant. BTW this
> also means the unmap_source_vma() name is rather misleading?

I agree it's only meaningful in that case, I am not sure what you mean
about the undo destination vma case?

We have 3 situations in which we call unmap_source_vma():

1. normal, no error move.
2. error has occured, so we set the source to destination and vice-versa.
3. MREMAP_DONTUNMAP, error has occured, so we set the source to destination
   and vice-versa.

I _hate_ this error handling where we reverse the source/destination, it's
so confusing, but since we are doing that, I still think unmap_source_vma()
is meaningful.

unmap_old_vma() is possible too, but I think less clear...

The beter fix I think is to find a less awful way to do the error handling.

>
> So I think the whole handling of that hiwater_vm could move to
> unmap_source_vma(). It should not matter if we take the snapshot of
> hiwater_vm only after vrm_stat_account() bumps the total_vm. Nobody else can
> be racing with us to permanently turn that total_vm to a hiwater_vm.
>
> (this is probably a potential cleanup for a followup-patch anyway)

Yup agreed, but let's do it as a follow up as this is how it is in the
original code.

Will add to the whiteboard TODO...

>
> >
> > -	/* Restore VM_ACCOUNT if one or two pieces of vma left */
> > -	if (account_start) {
> > -		vma = vma_prev(&vmi);
> > -		vm_flags_set(vma, VM_ACCOUNT);
> > -	}
> > -
> > -	if (account_end) {
> > -		vma = vma_next(&vmi);
> > -		vm_flags_set(vma, VM_ACCOUNT);
> > -	}
> > -
> > -	return new_addr;
> > +	return err ? (unsigned long)err : vrm->new_addr;
> >  }
> >
> >  /*
> > --
> > 2.48.1
>


  reply	other threads:[~2025-03-10 15:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06 10:33 [PATCH v2 0/7] refactor mremap and fix bug Lorenzo Stoakes
2025-03-06 10:33 ` [PATCH v2 1/7] mm/mremap: correctly handle partial mremap() of VMA starting at 0 Lorenzo Stoakes
2025-03-06 10:33 ` [PATCH v2 2/7] mm/mremap: refactor mremap() system call implementation Lorenzo Stoakes
2025-03-06 13:56   ` Vlastimil Babka
2025-03-06 10:33 ` [PATCH v2 3/7] mm/mremap: introduce and use vma_remap_struct threaded state Lorenzo Stoakes
2025-03-06 15:26   ` Vlastimil Babka
2025-03-10 10:19   ` Lorenzo Stoakes
2025-03-10 18:02     ` Lorenzo Stoakes
2025-03-06 10:34 ` [PATCH v2 4/7] mm/mremap: initial refactor of move_vma() Lorenzo Stoakes
2025-03-10 14:11   ` Vlastimil Babka
2025-03-10 14:28     ` Lorenzo Stoakes
2025-03-06 10:34 ` [PATCH v2 5/7] mm/mremap: complete " Lorenzo Stoakes
2025-03-10 15:11   ` Vlastimil Babka
2025-03-10 15:38     ` Lorenzo Stoakes [this message]
2025-03-06 10:34 ` [PATCH v2 6/7] mm/mremap: refactor move_page_tables(), abstracting state Lorenzo Stoakes
2025-03-10 17:05   ` Vlastimil Babka
2025-03-10 18:09     ` Lorenzo Stoakes
2025-03-06 10:34 ` [PATCH v2 7/7] mm/mremap: thread state through move page table operation Lorenzo Stoakes
2025-03-10 17:52   ` Vlastimil Babka
2025-03-10 18:13     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=06e4a72b-d599-4ba0-8cfe-69050b955768@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=vbabka@suse.cz \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox