linux-mm.kvack.org archive mirror
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	sidhartha.kumar@oracle.com,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Bert Karwatzki <spasswolf@web.de>, Jiri Olsa <olsajiri@gmail.com>,
	linux-kernel@vger.kernel.org, Kees Cook <kees@kernel.org>
Subject: Re: [PATCH v3 13/16] mm/mmap: Avoid zeroing vma tree in mmap_region()
Date: Tue, 9 Jul 2024 14:43:42 -0400
Message-ID: <q2yygxyl2gtoy67fosh2slb3ufxzr5kx4dwhjs23cajpsmouod@luw4kdisi5yu>
In-Reply-To: <4c9b4ec9-88b3-4ca8-8358-734463533078@lucifer.local>

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240709 10:27]:
> On Mon, Jul 08, 2024 at 03:10:10PM GMT, Liam R. Howlett wrote:
> > * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [240708 08:18]:
> > > On Thu, Jul 04, 2024 at 02:27:15PM GMT, Liam R. Howlett wrote:
> > > > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> > > >
> > > > Instead of zeroing the vma tree and then overwriting the area, let the
> > > > area be overwritten and then clean up the gathered vmas using
> > > > vms_complete_munmap_vmas().
> > > >
> > > > In the case of a driver mapping over existing vmas, the PTEs are cleared
> > > > using the helper vms_complete_pte_clear().
> > > >
> > > > Temporarily keep track of the number of pages that will be removed and
> > > > reduce the charged amount.
> > > >
> > > > This also drops the validate_mm() call in the vma_expand() function.
> > > > It is necessary to drop the validate as it would fail since the mm
> > > > map_count would be incorrect during a vma expansion, prior to the
> > > > cleanup from vms_complete_munmap_vmas().
> > > >
> > > > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > > > ---
> > > >  mm/internal.h |  1 +
> > > >  mm/mmap.c     | 61 ++++++++++++++++++++++++++++++---------------------
> > > >  2 files changed, 37 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/mm/internal.h b/mm/internal.h
> > > > index 4c9f06669cc4..fae4a1bba732 100644
> > > > --- a/mm/internal.h
> > > > +++ b/mm/internal.h
> > > > @@ -1503,6 +1503,7 @@ struct vma_munmap_struct {
> > > >  	unsigned long stack_vm;
> > > >  	unsigned long data_vm;
> > > >  	bool unlock;			/* Unlock after the munmap */
> > > > +	bool cleared_ptes;		/* If the PTE are cleared already */
> > > >  };
> > > >
> > > >  void __meminit __init_single_page(struct page *page, unsigned long pfn,
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 5d458c5f080e..0c334eeae8cd 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
...
> > > > @@ -522,6 +526,7 @@ static inline void init_vma_munmap(struct vma_munmap_struct *vms,
> > > >  	vms->exec_vm = vms->stack_vm = vms->data_vm = 0;
> > > >  	vms->unmap_start = FIRST_USER_ADDRESS;
> > > >  	vms->unmap_end = USER_PGTABLES_CEILING;
> > > > +	vms->cleared_ptes = false;
> > > >  }
> > > >
> > > >  /*
> > > > @@ -730,7 +735,6 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >  	vma_iter_store(vmi, vma);
> > > >
> > > >  	vma_complete(&vp, vmi, vma->vm_mm);
> > > > -	validate_mm(vma->vm_mm);
> > >
> > > Since we're dropping this here, do we need to add it back somewhere
> > > where we are confident the state will be consistent?
> >
> > The vma_expand() function is used in two places - one is in the mmap.c
> > file which can no longer validate the mm until the munmap is complete.
> > The other is in fs/exec.c which cannot call the validate_mm().  So
> > to add this call back, I'd have to add a wrapper to vma_expand() to call
> > the validate_mm() function for debug builds.
> >
> > Really all this code in fs/exec.c doesn't belong there so we don't need
> > to do an extra function wrapper just to call validate_mm(). And you have
> > a patch to do that which is out for review!
> 
> Indeed :) perhaps we should add back to the wrapper?
> 
...

> > >
> > > > +	if (!may_expand_vm(mm, vm_flags, (len >> PAGE_SHIFT) - nr_pages))
> > > > +		return -ENOMEM;
> > > >
> > > >  	if (unlikely(!can_modify_mm(mm, addr, end)))
> > > >  		return -EPERM;
> > > > @@ -2971,14 +2974,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > >  		if (vms_gather_munmap_vmas(&vms, &mas_detach))
> > > >  			return -ENOMEM;
> > > >
> > > > -		if (vma_iter_clear_gfp(&vmi, addr, end, GFP_KERNEL))
> > > > -			return -ENOMEM;
> > > > -
> > > > -		vms_complete_munmap_vmas(&vms, &mas_detach);
> > > >  		next = vms.next;
> > > >  		prev = vms.prev;
> > > >  		vma = NULL;
> > > >  	} else {
> > > > +		/* Minimal setup of vms */
> > > > +		vms.nr_pages = 0;
> > >
...

> > > > @@ -3052,6 +3053,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > >
> > > >  	if (file) {
> > > >  		vma->vm_file = get_file(file);
> > > > +		/* call_mmap() map PTE, so ensure there are no existing PTEs */
...
> > > > +		if (vms.nr_pages)
> > > > +			vms_complete_pte_clear(&vms, &mas_detach, true);
> > > >  		error = call_mmap(file, vma);
> > > >  		if (error)
> > > >  			goto unmap_and_free_vma;
> > > > @@ -3142,6 +3146,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> > > >  expanded:
> > > >  	perf_event_mmap(vma);
> > > >
> > > > +	if (vms.nr_pages)
> > > > +		vms_complete_munmap_vmas(&vms, &mas_detach);
> > > > +
> > >
> > > Hang on, if we already did this in the if (file) branch above, might we end
> > > up calling this twice? I didn't see vms.nr_pages get set to zero or
> > > decremented anywhere (unless I missed it)?
> >
> > No, we called the new helper vms_complete_pte_clear(), which will avoid
> > clearing the ptes by the added flag vms->cleared_ptes in the second
> > call.
> >
> > Above, I modified vms_complete_pte_clear() to check vms->cleared_ptes
> > prior to clearing the ptes, so it will only be cleared if it needs
> > clearing.
> >
> > I debated moving this nr_pages check within vms_complete_munmap_vmas(),
> > but that would add an unnecessary check to the munmap() path.  Avoiding
> > both checks seemed too much code (yet another static inline, or such).
> > I also wanted to keep the sanity of nr_pages checking to a single
> > function - as you highlighted it could be a path to insanity.
> >
> > Considering I'll switch this to a VMS_INIT(), I think that I could pass
> > it through and do the logic within the static inline at the expense of
> > the munmap() having a few extra instructions (but no cache hits, so not
> > a really big deal).
> 
> Yeah it's a bit confusing that the rest of vms_complete_munmap_vmas() is
> potentially run twice even if the vms_complete_pte_clear() exits early due
> to vms->cleared_ptes being set.

vms_complete_munmap_vmas() is never run twice; it is only ever run once.
vms_complete_pte_clear() is called from vms_complete_munmap_vmas(), but
it does nothing if cleared_ptes == true.  That flag is initialized to
false and set at the end of vms_complete_pte_clear() itself.
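
A minimal userspace sketch of that guard may make the flow easier to
follow.  It is not the kernel code: struct vms_sketch, the *_sketch()
functions and the two counters are hypothetical stand-ins, and the real
PTE clearing and vma cleanup are reduced to counter increments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct vma_munmap_struct. */
struct vms_sketch {
	unsigned long nr_pages;	/* pages covered by the gathered vmas */
	bool cleared_ptes;	/* set once the PTEs have been cleared */
	int pte_clear_count;	/* illustration only: times PTEs were cleared */
	int complete_count;	/* illustration only: times cleanup ran */
};

static void vms_sketch_init(struct vms_sketch *vms, unsigned long nr_pages)
{
	vms->nr_pages = nr_pages;
	vms->cleared_ptes = false;	/* initialized as false */
	vms->pte_clear_count = 0;
	vms->complete_count = 0;
}

/* Models vms_complete_pte_clear(): a no-op once the flag is set. */
static void vms_sketch_pte_clear(struct vms_sketch *vms)
{
	if (vms->cleared_ptes)
		return;

	vms->pte_clear_count++;		/* stands in for the real PTE clearing */
	vms->cleared_ptes = true;	/* set at the end of the clear */
}

/* Models vms_complete_munmap_vmas(): always calls the PTE helper. */
static void vms_sketch_complete(struct vms_sketch *vms)
{
	vms_sketch_pte_clear(vms);
	assert(vms->cleared_ptes);	/* PTEs are gone by this point */
	vms->complete_count++;		/* stands in for the rest of the cleanup */
}

/* Models the two call sites in mmap_region() from this patch. */
static void mmap_region_sketch(struct vms_sketch *vms, bool has_file)
{
	/* Driver mapping: clear existing PTEs before call_mmap(). */
	if (has_file && vms->nr_pages)
		vms_sketch_pte_clear(vms);

	/* Later, at the "expanded" label: finish the munmap exactly once. */
	if (vms->nr_pages)
		vms_sketch_complete(vms);
}
```

With a file-backed mapping over existing vmas, the helper is entered
twice but the clearing work happens once, and the completion step runs
once.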

Hopefully this becomes more obvious with the change to an INIT_VMS()
paradigm.  I think I'll change the name of vms_complete_pte_clear() in
an attempt to make this more obvious as well (remove the _complete,
probably).

Thanks,
Liam


