linux-mm.kvack.org archive mirror
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	sidhartha.kumar@oracle.com,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Bert Karwatzki <spasswolf@web.de>, Jiri Olsa <olsajiri@gmail.com>,
	linux-kernel@vger.kernel.org, Kees Cook <kees@kernel.org>
Subject: Re: [PATCH v4 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
Date: Thu, 11 Jul 2024 12:07:52 -0400
Message-ID: <qjt7mjnwp3mwkk3jbvzickmhlutlgjwvpuy3z4hihkxjt4skbc@qoqxppownvxl>
In-Reply-To: <6e29f050-89a1-4a7d-bba9-fc49c04292fb@lucifer.local>

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240711 11:25]:
> On Wed, Jul 10, 2024 at 03:22:43PM GMT, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > Instead of zeroing the vma tree and then overwriting the area, let the
> > area be overwritten and then clean up the gathered vmas using
> > vms_complete_munmap_vmas().
> >
> > If a driver is mapping over an existing vma, then clear the ptes before
> > the call_mmap() invocation.  This is done using the vms_clear_ptes()
> > helper.
> >
> > Temporarily keep track of the number of pages that will be removed and
> > reduce the charged amount.
> >
> > This also drops the validate_mm() call in the vma_expand() function.
> > It is necessary to drop the validate as it would fail since the mm
> > map_count would be incorrect during a vma expansion, prior to the
> > cleanup from vms_complete_munmap_vmas().
> >
> > Clean up the error handling of vms_gather_munmap_vmas() by calling
> > the verification within the function.
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > ---
> >  mm/internal.h |  1 +
> >  mm/mmap.c     | 80 +++++++++++++++++++++++++++------------------------
> >  2 files changed, 44 insertions(+), 37 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 11e90c6e5a3e..dd4eede1be0f 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1503,6 +1503,7 @@ struct vma_munmap_struct {
> >  	unsigned long stack_vm;
> >  	unsigned long data_vm;
> >  	bool unlock;			/* Unlock after the munmap */
> > +	bool clear_ptes;		/* If there are outstanding PTE to be cleared */
> >  };
> >
> >  void __meminit __init_single_page(struct page *page, unsigned long pfn,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 870c2d04ad6b..58cf42e22bfe 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -401,17 +401,21 @@ anon_vma_interval_tree_post_update_vma(struct vm_area_struct *vma)
> >  }
> >
> >  static unsigned long count_vma_pages_range(struct mm_struct *mm,
> > -		unsigned long addr, unsigned long end)
> > +		unsigned long addr, unsigned long end,
> > +		unsigned long *nr_accounted)
> >  {
> >  	VMA_ITERATOR(vmi, mm, addr);
> >  	struct vm_area_struct *vma;
> >  	unsigned long nr_pages = 0;
> >
> > +	*nr_accounted = 0;
> >  	for_each_vma_range(vmi, vma, end) {
> >  		unsigned long vm_start = max(addr, vma->vm_start);
> >  		unsigned long vm_end = min(end, vma->vm_end);
> >
> >  		nr_pages += PHYS_PFN(vm_end - vm_start);
> > +		if (vma->vm_flags & VM_ACCOUNT)
> > +			*nr_accounted += PHYS_PFN(vm_end - vm_start);
> >  	}
> >
> >  	return nr_pages;
> > @@ -524,6 +528,7 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms,
> >  	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
> >  	vms->unmap_start = FIRST_USER_ADDRESS;
> >  	vms->unmap_end = USER_PGTABLES_CEILING;
> > +	vms->clear_ptes = false;	/* No PTEs to clear yet */
> >  }
> >
> >  /*
> > @@ -732,7 +737,6 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  	vma_iter_store(vmi, vma);
> >
> >  	vma_complete(&vp, vmi, vma->vm_mm);
> > -	validate_mm(vma->vm_mm);
> >  	return 0;
> >
> >  nomem:
> > @@ -2606,11 +2610,14 @@ static inline void abort_munmap_vmas(struct ma_state *mas_detach)
> >  }
> >
> >
> > -static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
> > +static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
> >  		struct ma_state *mas_detach, bool mm_wr_locked)
> >  {
> >  	struct mmu_gather tlb;
> >
> > +	if (!vms->clear_ptes) /* Nothing to do */
> > +		return;
> > +
> >  	/*
> >  	 * We can free page tables without write-locking mmap_lock because VMAs
> >  	 * were isolated before we downgraded mmap_lock.
> > @@ -2624,6 +2631,7 @@ static void vms_complete_pte_clear(struct vma_munmap_struct *vms,
> >  	/* start and end may be different if there is no prev or next vma. */
> >  	free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start, vms->unmap_end, mm_wr_locked);
> >  	tlb_finish_mmu(&tlb);
> > +	vms->clear_ptes = false;
> >  }
> >
> >  /*
> > @@ -2647,7 +2655,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
> >  	if (vms->unlock)
> >  		mmap_write_downgrade(mm);
> >
> > -	vms_complete_pte_clear(vms, mas_detach, !vms->unlock);
> > +	vms_clear_ptes(vms, mas_detach, !vms->unlock);
> >  	/* Update high watermark before we lower total_vm */
> >  	update_hiwater_vm(mm);
> >  	/* Stat accounting */
> > @@ -2799,6 +2807,9 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> >  	while (vma_iter_addr(vms->vmi) > vms->start)
> >  		vma_iter_prev_range(vms->vmi);
> >
> > +	/* There are now PTEs that need to be cleared */
> > +	vms->clear_ptes = true;
> > +
> >  	return 0;
> >
> >  userfaultfd_error:
> > @@ -2807,6 +2818,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> >  	abort_munmap_vmas(mas_detach);
> >  start_split_failed:
> >  map_count_exceeded:
> > +	validate_mm(vms->mm);
> 
> I'm guessing here we know it's safe to validate?

Verification in the gather state is always safe: we haven't changed the
tree or a vma yet.
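
To illustrate why validating on the gather error path is safe, here is a
rough userspace sketch. It is not kernel code: toy_tree, toy_gather,
toy_validate() and toy_gather_ranges() are all made-up names, and the
"tree" is just a sorted array. The point it tries to show is that a
gather step which only records what would be removed cannot leave the
structure half-modified, so checking invariants after a failed gather
always passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct toy_range { unsigned long start, end; };	/* [start, end) */

struct toy_tree {
	struct toy_range r[8];
	size_t count;		/* loosely plays the role of a map count */
};

struct toy_gather {
	size_t idx[8];		/* indices of ranges that would be removed */
	size_t nr;
};

/* Invariant check: ranges are non-empty, sorted and non-overlapping. */
static bool toy_validate(const struct toy_tree *t)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->r[i].start >= t->r[i].end)
			return false;
		if (i && t->r[i - 1].end > t->r[i].start)
			return false;
	}
	return true;
}

/*
 * Gather phase: only records which ranges overlap [start, end).
 * The tree is const here, so a failure cannot leave it half-modified
 * and validating on the error path is always safe.
 */
static int toy_gather_ranges(const struct toy_tree *t, struct toy_gather *g,
			     unsigned long start, unsigned long end,
			     size_t limit)
{
	g->nr = 0;
	for (size_t i = 0; i < t->count; i++) {
		if (t->r[i].end <= start || t->r[i].start >= end)
			continue;	/* no overlap with the request */
		if (g->nr == limit)
			goto gather_failed;
		g->idx[g->nr++] = i;
	}
	return 0;

gather_failed:
	g->nr = 0;
	assert(toy_validate(t));	/* nothing was changed, so this holds */
	return -1;
}
```

Calling toy_gather_ranges() with a limit smaller than the number of
overlapping ranges takes the gather_failed path, and the tree still
validates afterwards because it was never written to.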

> 
> >  	return error;
> >  }
> >
> > @@ -2851,8 +2863,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >
> >  clear_tree_failed:
> >  	abort_munmap_vmas(&mas_detach);
> > -gather_failed:
> >  	validate_mm(mm);
> 
> Additionally I imagine the gathering failing results in the tree being unable to
> be validated?

It is safe, but if the validation is done here then it doesn't need to
be done above.
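
The label ordering in the hunk above can be sketched in userspace like
this. Everything here is hypothetical (toy_validate(), toy_gather(),
toy_clear_tree() and toy_unmap() are invented for the example); it only
models the idea that once the gather step validates on its own failure
path, the caller's gather_failed label can sit below the validate call
so that only the later failure validates in the caller.

```c
#include <assert.h>

/* Hypothetical counter and helpers for illustration; not kernel code. */
static int validate_calls;

static void toy_validate(void)
{
	validate_calls++;
}

/* The gather step validates on its own failure path. */
static int toy_gather(int fail)
{
	if (fail) {
		toy_validate();
		return -1;
	}
	return 0;
}

static int toy_clear_tree(int fail)
{
	return fail ? -1 : 0;
}

static int toy_unmap(int gather_fails, int clear_fails)
{
	int error;

	error = toy_gather(gather_fails);
	if (error)
		goto gather_failed;

	error = toy_clear_tree(clear_fails);
	if (error)
		goto clear_tree_failed;

	return 0;

clear_tree_failed:
	toy_validate();		/* only this path still validates here */
gather_failed:			/* gather already validated; just return */
	assert(error != 0);	/* the labels are only reached on failure */
	return error;
}
```

With this ordering, a gather failure results in exactly one validate
call (inside the gather helper) and a clear-tree failure results in
exactly one validate call (in the caller), rather than two on one path
and one on the other.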

> 
> > +gather_failed:
> >  	return error;
> >  }
> >
> > @@ -2940,24 +2952,19 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  	unsigned long merge_start = addr, merge_end = end;
> >  	bool writable_file_mapping = false;
> >  	pgoff_t vm_pgoff;
> > -	int error;
> > +	int error = -ENOMEM;
> >  	VMA_ITERATOR(vmi, mm, addr);
> > +	unsigned long nr_pages, nr_accounted;
> >
> > -	/* Check against address space limit. */
> > -	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
> > -		unsigned long nr_pages;
> > -
> > -		/*
> > -		 * MAP_FIXED may remove pages of mappings that intersects with
> > -		 * requested mapping. Account for the pages it would unmap.
> > -		 */
> > -		nr_pages = count_vma_pages_range(mm, addr, end);
> > -
> > -		if (!may_expand_vm(mm, vm_flags,
> > -					(len >> PAGE_SHIFT) - nr_pages))
> > -			return -ENOMEM;
> > -	}
> > +	nr_pages = count_vma_pages_range(mm, addr, end, &nr_accounted);
> >
> > +	/*
> > +	 * Check against address space limit.
> > +	 * MAP_FIXED may remove pages of mappings that intersects with requested
> > +	 * mapping. Account for the pages it would unmap.
> > +	 */
> > +	if (!may_expand_vm(mm, vm_flags, (len >> PAGE_SHIFT) - nr_pages))
> > +		return -ENOMEM;
> >
> >  	if (unlikely(!can_modify_mm(mm, addr, end)))
> >  		return -EPERM;
> > @@ -2974,18 +2981,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  		mas_init(&mas_detach, &mt_detach, /* addr = */ 0);
> >  		/* Prepare to unmap any existing mapping in the area */
> >  		if (vms_gather_munmap_vmas(&vms, &mas_detach))
> > -			goto gather_failed;
> > -
> > -		/* Remove any existing mappings from the vma tree */
> > -		if (vma_iter_clear_gfp(&vmi, addr, end, GFP_KERNEL))
> > -			goto clear_tree_failed;
> > +			return -ENOMEM;
> >
> > -		/* Unmap any existing mapping in the area */
> > -		vms_complete_munmap_vmas(&vms, &mas_detach);
> >  		next = vms.next;
> >  		prev = vms.prev;
> >  		vma = NULL;
> >  	} else {
> > +		/* Minimal setup of vms */
> 
> Nit, but is this valid now we use the init function unconditionally?

Yes, that needs to be dropped, thanks.

> 
> >  		next = vma_next(&vmi);
> >  		prev = vma_prev(&vmi);
> >  		if (prev)
> > @@ -2997,8 +2999,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  	 */
> >  	if (accountable_mapping(file, vm_flags)) {
> >  		charged = len >> PAGE_SHIFT;
> > +		charged -= nr_accounted;
> >  		if (security_vm_enough_memory_mm(mm, charged))
> > -			return -ENOMEM;
> > +			goto abort_munmap;
> > +		vms.nr_accounted = 0;
> >  		vm_flags |= VM_ACCOUNT;
> >  	}
> >
> > @@ -3047,10 +3051,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  	 * not unmapped, but the maps are removed from the list.
> >  	 */
> >  	vma = vm_area_alloc(mm);
> > -	if (!vma) {
> > -		error = -ENOMEM;
> > +	if (!vma)
> >  		goto unacct_error;
> > -	}
> >
> >  	vma_iter_config(&vmi, addr, end);
> >  	vma_set_range(vma, addr, end, pgoff);
> > @@ -3059,6 +3061,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >
> >  	if (file) {
> >  		vma->vm_file = get_file(file);
> > +		/* call_mmap() may map PTE, so ensure there are no existing PTEs */
> > +		vms_clear_ptes(&vms, &mas_detach, true);
> >  		error = call_mmap(file, vma);
> >  		if (error)
> >  			goto unmap_and_free_vma;
> > @@ -3149,6 +3153,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  expanded:
> >  	perf_event_mmap(vma);
> >
> > +	/* Unmap any existing mapping in the area */
> > +	if (vms.nr_pages)
> > +		vms_complete_munmap_vmas(&vms, &mas_detach);
> > +
> >  	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
> >  	if (vm_flags & VM_LOCKED) {
> >  		if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
> > @@ -3196,14 +3204,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  unacct_error:
> >  	if (charged)
> >  		vm_unacct_memory(charged);
> > -	validate_mm(mm);
> > -	return error;
> >
> > -clear_tree_failed:
> > -	abort_munmap_vmas(&mas_detach);
> > -gather_failed:
> > +abort_munmap:
> > +	if (vms.nr_pages)
> > +		abort_munmap_vmas(&mas_detach);
> >  	validate_mm(mm);
> > -	return -ENOMEM;
> > +	return error;
> >  }
> >
> >  static int __vm_munmap(unsigned long start, size_t len, bool unlock)
> > --
> > 2.43.0
> >
> 
> Other than nits/queries, LGTM:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>


