* [PATCH 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
@ 2023-09-27 5:24 ` Matthew Wilcox (Oracle)
2023-09-27 5:25 ` [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-09-27 5:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
Drop the VMA lock instead of the mmap_lock if that's the one which
is held.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/filemap.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 9481ffaf24e6..a598872d62cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3104,7 +3104,7 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
/*
* NOTE! This will make us return with VM_FAULT_RETRY, but with
- * the mmap_lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
+ * the fault lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
* is supposed to work. We have way too many special cases..
*/
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
@@ -3114,13 +3114,14 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
if (vmf->flags & FAULT_FLAG_KILLABLE) {
if (__folio_lock_killable(folio)) {
/*
- * We didn't have the right flags to drop the mmap_lock,
- * but all fault_handlers only check for fatal signals
- * if we return VM_FAULT_RETRY, so we need to drop the
- * mmap_lock here and return 0 if we don't have a fpin.
+ * We didn't have the right flags to drop the
+ * fault lock, but all fault_handlers only check
+ * for fatal signals if we return VM_FAULT_RETRY,
+ * so we need to drop the fault lock here and
+ * return 0 if we don't have a fpin.
*/
if (*fpin == NULL)
- mmap_read_unlock(vmf->vma->vm_mm);
+ release_fault_lock(vmf);
return 0;
}
} else
--
2.40.1
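[Editor's note: release_fault_lock() is the fault-lock-aware unlock helper this patch switches to. The sketch below is a minimal userspace model of its dispatch, not kernel code: the struct, the flag value, and the booleans standing in for vma_end_read()/mmap_read_unlock() are all illustrative stubs.]

```c
#include <stdbool.h>

#define FAULT_FLAG_VMA_LOCK 0x1000u	/* illustrative value, not the kernel's */

struct vm_fault { unsigned int flags; };

/* Record which unlock the model performed. */
static bool dropped_vma_lock;
static bool dropped_mmap_lock;

/* Model of release_fault_lock(): drop whichever lock this fault holds. */
static void release_fault_lock(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		dropped_vma_lock = true;	/* stands in for vma_end_read(vmf->vma) */
	else
		dropped_mmap_lock = true;	/* stands in for mmap_read_unlock(vmf->vma->vm_mm) */
}
```

With a helper shaped like this, lock_folio_maybe_drop_mmap() no longer needs to know which of the two locks its caller took.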
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
2023-09-27 5:24 ` [PATCH 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
@ 2023-09-27 5:25 ` Matthew Wilcox (Oracle)
2023-09-27 22:38 ` Suren Baghdasaryan
2023-09-27 5:25 ` [PATCH 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-09-27 5:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
It is usually safe to call wp_page_copy() under the VMA lock. The only
unsafe situation is when no anon_vma has been allocated for this VMA,
and we have to look at adjacent VMAs to determine if their anon_vma can
be shared. Since this happens only for the first COW of a page in this
VMA, the majority of calls to wp_page_copy() do not need to fall back
to the mmap_sem.
Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
will return RETRY if we currently hold the VMA lock and need to allocate
an anon_vma. This lets us drop the check in do_wp_page().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 97f860d6cd2a..cff78c496728 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
count_vm_event(PGREUSE);
}
+static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+
+ if (likely(vma->anon_vma))
+ return 0;
+ if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+ vma_end_read(vma);
+ return VM_FAULT_RETRY;
+ }
+ if (__anon_vma_prepare(vma))
+ return VM_FAULT_OOM;
+ return 0;
+}
+
/*
* Handle the case of a page which we actually need to copy to a new page,
* either due to COW or unsharing.
@@ -3069,27 +3084,29 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
pte_t entry;
int page_copied = 0;
struct mmu_notifier_range range;
- int ret;
+ vm_fault_t ret;
delayacct_wpcopy_start();
if (vmf->page)
old_folio = page_folio(vmf->page);
- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
+ ret = vmf_anon_prepare(vmf);
+ if (unlikely(ret))
+ goto out;
if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
if (!new_folio)
goto oom;
} else {
+ int err;
new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
vmf->address, false);
if (!new_folio)
goto oom;
- ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
- if (ret) {
+ err = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
+ if (err) {
/*
* COW failed, if the fault was solved by other,
* it's fine. If not, userspace would re-fault on
@@ -3102,7 +3119,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
folio_put(old_folio);
delayacct_wpcopy_end();
- return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
+ return err == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
}
kmsan_copy_page_meta(&new_folio->page, vmf->page);
}
@@ -3212,11 +3229,13 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
oom_free_new:
folio_put(new_folio);
oom:
+ ret = VM_FAULT_OOM;
+out:
if (old_folio)
folio_put(old_folio);
delayacct_wpcopy_end();
- return VM_FAULT_OOM;
+ return ret;
}
/**
@@ -3458,12 +3477,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
return 0;
}
copy:
- if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma->anon_vma) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
- }
-
/*
* Ok, we need to copy. Oh, well..
*/
--
2.40.1
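[Editor's note: vmf_anon_prepare() makes a three-way decision: fast path when an anon_vma already exists, RETRY when one must be allocated while only the VMA lock is held, OOM otherwise. A userspace model of that decision, using stub types and an illustrative flag value rather than kernel API:]

```c
#include <stdbool.h>

#define FAULT_FLAG_VMA_LOCK 0x1000u	/* illustrative value */

enum fault_ret { RET_OK, RET_RETRY, RET_OOM };

struct vm_area_struct { bool has_anon_vma; };
struct vm_fault { struct vm_area_struct *vma; unsigned int flags; };

/* Flip to true to model __anon_vma_prepare() failing. */
static bool anon_vma_alloc_fails;

/* Model of vmf_anon_prepare(): only fall back to a retry when an
 * anon_vma must actually be allocated under just the VMA lock.
 * (The real helper also drops the VMA lock before returning RETRY.) */
static enum fault_ret vmf_anon_prepare(struct vm_fault *vmf)
{
	if (vmf->vma->has_anon_vma)
		return RET_OK;			/* common case: nothing to do */
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		return RET_RETRY;		/* caller retries under mmap_lock */
	if (anon_vma_alloc_fails)
		return RET_OOM;
	vmf->vma->has_anon_vma = true;		/* __anon_vma_prepare() succeeded */
	return RET_OK;
}
```

Only the first COW in a VMA takes the RETRY path; every later fault hits the fast path at the top.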
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock
2023-09-27 5:25 ` [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
@ 2023-09-27 22:38 ` Suren Baghdasaryan
2023-09-28 5:18 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-27 22:38 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> It is usually safe to call wp_page_copy() under the VMA lock. The only
> unsafe situation is when no anon_vma has been allocated for this VMA,
> and we have to look at adjacent VMAs to determine if their anon_vma can
> be shared. Since this happens only for the first COW of a page in this
> VMA, the majority of calls to wp_page_copy() do not need to fall back
> to the mmap_sem.
>
> Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
> will return RETRY if we currently hold the VMA lock and need to allocate
> an anon_vma. This lets us drop the check in do_wp_page().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/memory.c | 39 ++++++++++++++++++++++++++-------------
> 1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 97f860d6cd2a..cff78c496728 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> count_vm_event(PGREUSE);
> }
>
> +static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> +
> + if (likely(vma->anon_vma))
> + return 0;
> + if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
I don't think the above condition will happen today because
lock_vma_under_rcu() returns NULL and do_page_fault() falls back to
taking mmap_lock when !vma->anon_vma
(https://elixir.bootlin.com/linux/v6.6-rc3/source/mm/memory.c#L5428).
We would need to narrow down that check in lock_vma_under_rcu() to
make this work here.
> + vma_end_read(vma);
> + return VM_FAULT_RETRY;
> + }
> + if (__anon_vma_prepare(vma))
> + return VM_FAULT_OOM;
> + return 0;
> +}
> +
> /*
> * Handle the case of a page which we actually need to copy to a new page,
> * either due to COW or unsharing.
> @@ -3069,27 +3084,29 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> pte_t entry;
> int page_copied = 0;
> struct mmu_notifier_range range;
> - int ret;
> + vm_fault_t ret;
>
> delayacct_wpcopy_start();
>
> if (vmf->page)
> old_folio = page_folio(vmf->page);
> - if (unlikely(anon_vma_prepare(vma)))
> - goto oom;
> + ret = vmf_anon_prepare(vmf);
> + if (unlikely(ret))
> + goto out;
>
> if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
> new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
> if (!new_folio)
> goto oom;
> } else {
> + int err;
> new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
> vmf->address, false);
> if (!new_folio)
> goto oom;
>
> - ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
> - if (ret) {
> + err = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
> + if (err) {
> /*
> * COW failed, if the fault was solved by other,
> * it's fine. If not, userspace would re-fault on
> @@ -3102,7 +3119,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> folio_put(old_folio);
>
> delayacct_wpcopy_end();
> - return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
> + return err == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
> }
> kmsan_copy_page_meta(&new_folio->page, vmf->page);
> }
> @@ -3212,11 +3229,13 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> oom_free_new:
> folio_put(new_folio);
> oom:
> + ret = VM_FAULT_OOM;
> +out:
> if (old_folio)
> folio_put(old_folio);
>
> delayacct_wpcopy_end();
> - return VM_FAULT_OOM;
> + return ret;
> }
>
> /**
> @@ -3458,12 +3477,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> return 0;
> }
> copy:
> - if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma->anon_vma) {
> - pte_unmap_unlock(vmf->pte, vmf->ptl);
> - vma_end_read(vmf->vma);
> - return VM_FAULT_RETRY;
> - }
> -
> /*
> * Ok, we need to copy. Oh, well..
> */
> --
> 2.40.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock
2023-09-27 22:38 ` Suren Baghdasaryan
@ 2023-09-28 5:18 ` Matthew Wilcox
2023-09-28 14:57 ` Suren Baghdasaryan
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-09-28 5:18 UTC (permalink / raw)
To: Suren Baghdasaryan; +Cc: Andrew Morton, linux-mm
On Wed, Sep 27, 2023 at 03:38:38PM -0700, Suren Baghdasaryan wrote:
> On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> > It is usually safe to call wp_page_copy() under the VMA lock. The only
> > unsafe situation is when no anon_vma has been allocated for this VMA,
> > and we have to look at adjacent VMAs to determine if their anon_vma can
> > be shared. Since this happens only for the first COW of a page in this
> > VMA, the majority of calls to wp_page_copy() do not need to fall back
> > to the mmap_sem.
> >
> > Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
> > will return RETRY if we currently hold the VMA lock and need to allocate
> > an anon_vma. This lets us drop the check in do_wp_page().
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > mm/memory.c | 39 ++++++++++++++++++++++++++-------------
> > 1 file changed, 26 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 97f860d6cd2a..cff78c496728 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> > count_vm_event(PGREUSE);
> > }
> >
> > +static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> > +{
> > + struct vm_area_struct *vma = vmf->vma;
> > +
> > + if (likely(vma->anon_vma))
> > + return 0;
> > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
>
> I don't think the above condition will happen today because
> lock_vma_under_rcu() returns NULL and do_page_fault() falls back to
> taking mmap_lock when !vma->anon_vma
> (https://elixir.bootlin.com/linux/v6.6-rc3/source/mm/memory.c#L5428).
> We would need to narrow down that check in lock_vma_under_rcu() to
> make this work here.
That's only for anon VMAs. For file-backed VMAs, we can get here ...
handle_pte_fault()
if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!pte_write(entry))
return do_wp_page(vmf);
ie we have a MAP_PRIVATE of a file, first take a read-fault on it,
then write to it. That causes us to allocate an anon page in this
file-backed VMA, so we need an anon_vma to exist.
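[Editor's note: the read-fault-then-write sequence described above can be reproduced from userspace. The demo below only shows the triggering access pattern, not the kernel internals; the temp-file path and contents are arbitrary.]

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* MAP_PRIVATE a file, read first, then write: the write is the first
 * COW in this file-backed VMA, which is what needs an anon_vma. */
static char private_cow_demo(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";
	int fd = mkstemp(path);
	if (fd < 0 || write(fd, "hello", 5) != 5)
		return 0;
	unlink(path);	/* the open fd keeps the file alive */

	char *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping survives the close */
	if (p == MAP_FAILED)
		return 0;

	char first = p[0];	/* read fault: page mapped read-only from the page cache */
	(void)first;		/* first == 'h' */
	p[0] = 'H';		/* write fault: do_wp_page() -> wp_page_copy() */

	char result = p[0];
	munmap(p, 5);
	return result;		/* 'H' in the mapping; the file itself keeps "hello" */
}
```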
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock
2023-09-28 5:18 ` Matthew Wilcox
@ 2023-09-28 14:57 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-28 14:57 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Andrew Morton, linux-mm
On Wed, Sep 27, 2023 at 10:18 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Sep 27, 2023 at 03:38:38PM -0700, Suren Baghdasaryan wrote:
> > On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> > > It is usually safe to call wp_page_copy() under the VMA lock. The only
> > > unsafe situation is when no anon_vma has been allocated for this VMA,
> > > and we have to look at adjacent VMAs to determine if their anon_vma can
> > > be shared. Since this happens only for the first COW of a page in this
> > > VMA, the majority of calls to wp_page_copy() do not need to fall back
> > > to the mmap_sem.
> > >
> > > Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
> > > will return RETRY if we currently hold the VMA lock and need to allocate
> > > an anon_vma. This lets us drop the check in do_wp_page().
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > > mm/memory.c | 39 ++++++++++++++++++++++++++-------------
> > > 1 file changed, 26 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 97f860d6cd2a..cff78c496728 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> > > count_vm_event(PGREUSE);
> > > }
> > >
> > > +static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> > > +{
> > > + struct vm_area_struct *vma = vmf->vma;
> > > +
> > > + if (likely(vma->anon_vma))
> > > + return 0;
> > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> >
> > I don't think the above condition will happen today because
> > lock_vma_under_rcu() returns NULL and do_page_fault() falls back to
> > taking mmap_lock when !vma->anon_vma
> > (https://elixir.bootlin.com/linux/v6.6-rc3/source/mm/memory.c#L5428).
> > We would need to narrow down that check in lock_vma_under_rcu() to
> > make this work here.
>
> That's only for anon VMAs. For file-backed VMAs, we can get here ...
>
> handle_pte_fault()
> if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
> if (!pte_write(entry))
> return do_wp_page(vmf);
>
> ie we have a MAP_PRIVATE of a file, first take a read-fault on it,
> then write to it. That causes us to allocate an anon page in this
> file-backed VMA, so we need an anon_vma to exist.
Oh, sorry, I completely missed that this check was only for anon VMAs.
Now it makes sense. Thanks!
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/6] mm: Handle shared faults under the VMA lock
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
2023-09-27 5:24 ` [PATCH 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
2023-09-27 5:25 ` [PATCH 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
@ 2023-09-27 5:25 ` Matthew Wilcox (Oracle)
2023-09-28 0:46 ` Suren Baghdasaryan
2023-09-27 5:25 ` [PATCH 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-09-27 5:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
There are many implementations of ->fault and some of them depend on
mmap_lock being held. All vm_ops that implement ->map_pages() end
up calling filemap_fault(), which I have audited to be sure it does
not rely on mmap_lock. So (for now) key off ->map_pages existing
as being a flag to indicate that it's safe to call ->fault while
only holding the vma lock.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index cff78c496728..0f3da4889230 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
count_vm_event(PGREUSE);
}
+/*
+ * We could add a bitflag somewhere, but for now, we know that all
+ * vm_ops that have a ->map_pages have been audited and don't need
+ * the mmap_lock to be held.
+ */
+static inline vm_fault_t vmf_maybe_unlock_vma(const struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+
+ if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
+ return 0;
+ vma_end_read(vma);
+ return VM_FAULT_RETRY;
+}
+
static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
@@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
vm_fault_t ret, tmp;
struct folio *folio;
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
+ ret = vmf_maybe_unlock_vma(vmf);
+ if (ret)
+ return ret;
ret = __do_fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
--
2.40.1
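[Editor's note: the "->map_pages implies ->fault is VMA-lock safe" keying above reduces to a two-condition test. A userspace model with stub types (the flag value and the dummy callback are illustrative, not kernel API):]

```c
#include <stdbool.h>
#include <stddef.h>

#define FAULT_FLAG_VMA_LOCK 0x1000u	/* illustrative value */

struct vm_operations_struct { void (*map_pages)(void); };
struct vm_area_struct { const struct vm_operations_struct *vm_ops; };
struct vm_fault { struct vm_area_struct *vma; unsigned int flags; };

static void dummy_map_pages(void) {}	/* stands in for filemap_map_pages() */

/* Model of the patch's helper: ->fault may be called if ->map_pages
 * exists (those vm_ops were audited) or if we hold the mmap_lock anyway;
 * otherwise the caller must drop the VMA lock and retry. */
static bool can_call_fault(const struct vm_fault *vmf)
{
	return vmf->vma->vm_ops->map_pages != NULL ||
	       !(vmf->flags & FAULT_FLAG_VMA_LOCK);
}
```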
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/6] mm: Handle shared faults under the VMA lock
2023-09-27 5:25 ` [PATCH 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
@ 2023-09-28 0:46 ` Suren Baghdasaryan
2023-09-28 1:02 ` Suren Baghdasaryan
0 siblings, 1 reply; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-28 0:46 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> There are many implementations of ->fault and some of them depend on
> mmap_lock being held. All vm_ops that implement ->map_pages() end
> up calling filemap_fault(), which I have audited to be sure it does
> not rely on mmap_lock. So (for now) key off ->map_pages existing
> as being a flag to indicate that it's safe to call ->fault while
> only holding the vma lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/memory.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index cff78c496728..0f3da4889230 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> count_vm_event(PGREUSE);
> }
>
> +/*
> + * We could add a bitflag somewhere, but for now, we know that all
> + * vm_ops that have a ->map_pages have been audited and don't need
> + * the mmap_lock to be held.
> + */
> +static inline vm_fault_t vmf_maybe_unlock_vma(const struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> +
> + if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> + return 0;
> + vma_end_read(vma);
> + return VM_FAULT_RETRY;
> +}
> +
> static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> @@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
> vm_fault_t ret, tmp;
> struct folio *folio;
>
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> - vma_end_read(vma);
> - return VM_FAULT_RETRY;
> - }
> + ret = vmf_maybe_unlock_vma(vmf);
The name of this new function in this context does not seem
appropriate to me. The logic of this check was that we can't rely on
VMA lock since it might not be sufficient, so we have to retry with
mmap_lock instead. With this change it seems like we intentionally try
to unlock the VMA here. IMHO this would be more understandable:
static inline bool is_vma_lock_sufficient(struct vm_area_struct *vma) {
return vma->vm_ops->map_pages != NULL;
}
static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
{
...
if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !is_vma_lock_sufficient(vma)) {
vma_end_read(vma);
return VM_FAULT_RETRY;
}
> + if (ret)
> + return ret;
>
> ret = __do_fault(vmf);
> if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> --
> 2.40.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/6] mm: Handle shared faults under the VMA lock
2023-09-28 0:46 ` Suren Baghdasaryan
@ 2023-09-28 1:02 ` Suren Baghdasaryan
2023-09-28 5:33 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-28 1:02 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Wed, Sep 27, 2023 at 5:46 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > There are many implementations of ->fault and some of them depend on
> > mmap_lock being held. All vm_ops that implement ->map_pages() end
> > up calling filemap_fault(), which I have audited to be sure it does
> > not rely on mmap_lock. So (for now) key off ->map_pages existing
> > as being a flag to indicate that it's safe to call ->fault while
> > only holding the vma lock.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > mm/memory.c | 22 ++++++++++++++++++----
> > 1 file changed, 18 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index cff78c496728..0f3da4889230 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> > count_vm_event(PGREUSE);
> > }
> >
> > +/*
> > + * We could add a bitflag somewhere, but for now, we know that all
> > + * vm_ops that have a ->map_pages have been audited and don't need
> > + * the mmap_lock to be held.
> > + */
> > +static inline vm_fault_t vmf_maybe_unlock_vma(const struct vm_fault *vmf)
> > +{
> > + struct vm_area_struct *vma = vmf->vma;
> > +
> > + if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> > + return 0;
> > + vma_end_read(vma);
> > + return VM_FAULT_RETRY;
> > +}
> > +
> > static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> > {
> > struct vm_area_struct *vma = vmf->vma;
> > @@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
> > vm_fault_t ret, tmp;
> > struct folio *folio;
> >
> > - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > - vma_end_read(vma);
> > - return VM_FAULT_RETRY;
> > - }
> > + ret = vmf_maybe_unlock_vma(vmf);
>
> The name of this new function in this context does not seem
> appropriate to me. The logic of this check was that we can't rely on
> VMA lock since it might not be sufficient, so we have to retry with
> mmap_lock instead. With this change it seems like we intentionally try
> to unlock the VMA here. IMHO this would be more understandable:
>
> static inline bool is_vma_lock_sufficient(struct vm_area_struct *vma) {
> return vma->vm_ops->map_pages != NULL;
> }
>
> static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> {
> ...
> if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !is_vma_lock_sufficient(vma)) {
> vma_end_read(vma);
> return VM_FAULT_RETRY;
> }
Same comment for the rest of the patches where vmf_maybe_unlock_vma()
is being used. It would be great to have this logic coded in one
function like you do but I could not find an appropriate name that
would convey that "we want to check if the current lock is sufficient
and if not then we will drop it and retry". Maybe you or someone else
can think of a good name for it?
>
> > + if (ret)
> > + return ret;
> >
> > ret = __do_fault(vmf);
> > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> > --
> > 2.40.1
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/6] mm: Handle shared faults under the VMA lock
2023-09-28 1:02 ` Suren Baghdasaryan
@ 2023-09-28 5:33 ` Matthew Wilcox
2023-09-28 15:02 ` Suren Baghdasaryan
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2023-09-28 5:33 UTC (permalink / raw)
To: Suren Baghdasaryan; +Cc: Andrew Morton, linux-mm
On Wed, Sep 27, 2023 at 06:02:47PM -0700, Suren Baghdasaryan wrote:
> On Wed, Sep 27, 2023 at 5:46 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> > >
> > > There are many implementations of ->fault and some of them depend on
> > > mmap_lock being held. All vm_ops that implement ->map_pages() end
> > > up calling filemap_fault(), which I have audited to be sure it does
> > > not rely on mmap_lock. So (for now) key off ->map_pages existing
> > > as being a flag to indicate that it's safe to call ->fault while
> > > only holding the vma lock.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > > mm/memory.c | 22 ++++++++++++++++++----
> > > 1 file changed, 18 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index cff78c496728..0f3da4889230 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> > > count_vm_event(PGREUSE);
> > > }
> > >
> > > +/*
> > > + * We could add a bitflag somewhere, but for now, we know that all
> > > + * vm_ops that have a ->map_pages have been audited and don't need
> > > + * the mmap_lock to be held.
> > > + */
> > > +static inline vm_fault_t vmf_maybe_unlock_vma(const struct vm_fault *vmf)
> > > +{
> > > + struct vm_area_struct *vma = vmf->vma;
> > > +
> > > + if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> > > + return 0;
> > > + vma_end_read(vma);
> > > + return VM_FAULT_RETRY;
> > > +}
> > > +
> > > static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> > > {
> > > struct vm_area_struct *vma = vmf->vma;
> > > @@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
> > > vm_fault_t ret, tmp;
> > > struct folio *folio;
> > >
> > > - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > > - vma_end_read(vma);
> > > - return VM_FAULT_RETRY;
> > > - }
> > > + ret = vmf_maybe_unlock_vma(vmf);
> >
> > The name of this new function in this context does not seem
> > appropriate to me. The logic of this check was that we can't rely on
> > VMA lock since it might not be sufficient, so we have to retry with
> > mmap_lock instead. With this change it seems like we intentionally try
> > to unlock the VMA here. IMHO this would be more understandable:
> >
> > static inline bool is_vma_lock_sufficient(struct vm_area_struct *vma) {
> > return vma->vm_ops->map_pages != NULL;
> > }
Originally I called this function vma_needs_mmap_lock() (with the
opposite polarity). But I disliked the duplication of code ...
> Same comment for the rest of the patches where vmf_maybe_unlock_vma()
> is being used. It would be great to have this logic coded in one
> function like you do but I could not find an appropriate name that
> would convey that "we want to check if the current lock is sufficient
> and if not then we will drop it and retry". Maybe you or someone else
> can think of a good name for it?
Maybe
+static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+
+ if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
+ return 0;
+ vma_end_read(vma);
+ return VM_FAULT_RETRY;
+}
I'm having trouble coming up with a name that doesn't imply it's a bool
predicate.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/6] mm: Handle shared faults under the VMA lock
2023-09-28 5:33 ` Matthew Wilcox
@ 2023-09-28 15:02 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-28 15:02 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Andrew Morton, linux-mm
On Wed, Sep 27, 2023 at 10:33 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Sep 27, 2023 at 06:02:47PM -0700, Suren Baghdasaryan wrote:
> > On Wed, Sep 27, 2023 at 5:46 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
> > > <willy@infradead.org> wrote:
> > > >
> > > > There are many implementations of ->fault and some of them depend on
> > > > mmap_lock being held. All vm_ops that implement ->map_pages() end
> > > > up calling filemap_fault(), which I have audited to be sure it does
> > > > not rely on mmap_lock. So (for now) key off ->map_pages existing
> > > > as being a flag to indicate that it's safe to call ->fault while
> > > > only holding the vma lock.
> > > >
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > ---
> > > > mm/memory.c | 22 ++++++++++++++++++----
> > > > 1 file changed, 18 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index cff78c496728..0f3da4889230 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> > > > count_vm_event(PGREUSE);
> > > > }
> > > >
> > > > +/*
> > > > + * We could add a bitflag somewhere, but for now, we know that all
> > > > + * vm_ops that have a ->map_pages have been audited and don't need
> > > > + * the mmap_lock to be held.
> > > > + */
> > > > +static inline vm_fault_t vmf_maybe_unlock_vma(const struct vm_fault *vmf)
> > > > +{
> > > > + struct vm_area_struct *vma = vmf->vma;
> > > > +
> > > > + if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> > > > + return 0;
> > > > + vma_end_read(vma);
> > > > + return VM_FAULT_RETRY;
> > > > +}
> > > > +
> > > > static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> > > > {
> > > > struct vm_area_struct *vma = vmf->vma;
> > > > @@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
> > > > vm_fault_t ret, tmp;
> > > > struct folio *folio;
> > > >
> > > > - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > > > - vma_end_read(vma);
> > > > - return VM_FAULT_RETRY;
> > > > - }
> > > > + ret = vmf_maybe_unlock_vma(vmf);
> > >
> > > The name of this new function in this context does not seem
> > > appropriate to me. The logic of this check was that we can't rely on
> > > VMA lock since it might not be sufficient, so we have to retry with
> > > mmap_lock instead. With this change it seems like we intentionally try
> > > to unlock the VMA here. IMHO this would be more understandable:
> > >
> > > static inline bool is_vma_lock_sufficient(struct vm_area_struct *vma) {
> > > return vma->vm_ops->map_pages != NULL;
> > > }
>
> Originally I called this function vma_needs_mmap_lock() (with the
> opposite polarity). But I disliked the duplication of code ...
>
> > Same comment for the rest of the patches where vmf_maybe_unlock_vma()
> > is being used. It would be great to have this logic coded in one
> > function like you do but I could not find an appropriate name that
> > would convey that "we want to check if the current lock is sufficient
> > and if not then we will drop it and retry". Maybe you or someone else
> > can think of a good name for it?
>
> Maybe
>
> +static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> +
> + if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> + return 0;
> + vma_end_read(vma);
> + return VM_FAULT_RETRY;
> +}
>
> I'm having trouble coming up with a name that doesn't imply it's a bool
> predicate.
Ok, I think it's better. At least it does not imply that our intention
is to unlock the VMA.
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/6] mm: Handle COW faults under the VMA lock
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (2 preceding siblings ...)
2023-09-27 5:25 ` [PATCH 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
@ 2023-09-27 5:25 ` Matthew Wilcox (Oracle)
2023-09-28 0:53 ` Suren Baghdasaryan
2023-09-27 5:25 ` [PATCH 5/6] mm: Handle read " Matthew Wilcox (Oracle)
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-09-27 5:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
If the page is not currently present in the page tables, we need to call
the page fault handler to find out which page we're supposed to COW,
so we need to both check that there is already an anon_vma and that the
fault handler doesn't need the mmap_lock.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 0f3da4889230..02231a9394ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4639,13 +4639,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret;
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
-
- if (unlikely(anon_vma_prepare(vma)))
- return VM_FAULT_OOM;
+ ret = vmf_maybe_unlock_vma(vmf);
+ if (!ret)
+ ret = vmf_anon_prepare(vmf);
+ if (ret)
+ return ret;
vmf->cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
if (!vmf->cow_page)
--
2.40.1
* Re: [PATCH 4/6] mm: Handle COW faults under the VMA lock
2023-09-27 5:25 ` [PATCH 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
@ 2023-09-28 0:53 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-28 0:53 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> If the page is not currently present in the page tables, we need to call
> the page fault handler to find out which page we're supposed to COW,
> so we need to both check that there is already an anon_vma and that the
> fault handler doesn't need the mmap_lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/memory.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 0f3da4889230..02231a9394ed 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4639,13 +4639,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
> struct vm_area_struct *vma = vmf->vma;
> vm_fault_t ret;
>
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> - vma_end_read(vma);
> - return VM_FAULT_RETRY;
> - }
> -
> - if (unlikely(anon_vma_prepare(vma)))
> - return VM_FAULT_OOM;
> + ret = vmf_maybe_unlock_vma(vmf);
> + if (!ret)
> + ret = vmf_anon_prepare(vmf);
Same comment as in [PATCH 2/6]. If we reach here with
FAULT_FLAG_VMA_LOCK then vma->anon_vma will not be NULL unless we
change lock_vma_under_rcu().
> + if (ret)
> + return ret;
>
> vmf->cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> if (!vmf->cow_page)
> --
> 2.40.1
>
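[The control flow in the hunk above — run the lock check first, then
anon_vma preparation, and return the first non-zero fault code — can be
sketched in user space as follows. The fault-code values and helper
names are stand-ins, not the kernel's definitions.]

```c
#include <assert.h>

/* Stand-in fault codes; not the kernel's actual values. */
#define VM_FAULT_RETRY 0x0400
#define VM_FAULT_OOM   0x0001

/* Test knobs standing in for the two checks in do_cow_fault(); in the
 * kernel these are the lock-sufficiency helper and vmf_anon_prepare(). */
static unsigned int lock_check_result;
static unsigned int anon_prepare_result;

static unsigned int check_lock(void)   { return lock_check_result; }
static unsigned int prepare_anon(void) { return anon_prepare_result; }

/* Returns 0 on success, or the fault code of the first failing step.
 * prepare_anon() only runs if check_lock() succeeded. */
static unsigned int cow_fault_prologue(void)
{
	unsigned int ret = check_lock();

	if (!ret)
		ret = prepare_anon();
	return ret;	/* caller returns early if non-zero */
}
```

The point of the pattern is that the second check is skipped when the
first one already demands a retry, and a single `if (ret) return ret;`
serves both.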
* [PATCH 5/6] mm: Handle read faults under the VMA lock
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (3 preceding siblings ...)
2023-09-27 5:25 ` [PATCH 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
@ 2023-09-27 5:25 ` Matthew Wilcox (Oracle)
2023-09-27 5:25 ` [PATCH 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
2023-09-27 22:18 ` [PATCH 0/6] Handle more faults " Suren Baghdasaryan
6 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-09-27 5:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
Most file-backed faults are already handled through ->map_pages(),
but if we need to do I/O we'll come this way. Since filemap_fault()
is now safe to be called under the VMA lock, we can handle these faults
under the VMA lock now.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 02231a9394ed..5f92126dc527 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4617,10 +4617,9 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
return ret;
}
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
- }
+ ret = vmf_maybe_unlock_vma(vmf);
+ if (ret)
+ return ret;
ret = __do_fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
--
2.40.1
* [PATCH 6/6] mm: Handle write faults to RO pages under the VMA lock
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (4 preceding siblings ...)
2023-09-27 5:25 ` [PATCH 5/6] mm: Handle read " Matthew Wilcox (Oracle)
@ 2023-09-27 5:25 ` Matthew Wilcox (Oracle)
2023-09-27 22:18 ` [PATCH 0/6] Handle more faults " Suren Baghdasaryan
6 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-09-27 5:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
I think this is a pretty rare occurrence, but for consistency handle
these faults the same way that we handle other faults with the VMA
lock held.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 5f92126dc527..45ffa0a527ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3301,10 +3301,9 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
vm_fault_t ret;
pte_unmap_unlock(vmf->pte, vmf->ptl);
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
- }
+ ret = vmf_maybe_unlock_vma(vmf);
+ if (ret)
+ return ret;
vmf->flags |= FAULT_FLAG_MKWRITE;
ret = vma->vm_ops->pfn_mkwrite(vmf);
@@ -3328,10 +3327,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
vm_fault_t tmp;
pte_unmap_unlock(vmf->pte, vmf->ptl);
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+ tmp = vmf_maybe_unlock_vma(vmf);
+ if (tmp) {
folio_put(folio);
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
+ return tmp;
}
tmp = do_page_mkwrite(vmf, folio);
--
2.40.1
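[One subtlety in the wp_page_shared() hunk above is ordering: the folio
reference taken earlier in the fault must be dropped before the retry
code is propagated. A small user-space model of that cleanup-on-retry
pattern, with all names and values as stand-ins:]

```c
#include <assert.h>

#define VM_FAULT_RETRY 0x0400	/* stand-in value */

/* Stand-in for a folio with a simple reference count. */
struct folio {
	int refcount;
};

static void folio_put(struct folio *folio)
{
	folio->refcount--;
}

/* Model of the wp_page_shared() prologue: if the lock helper says we
 * must retry, drop our folio reference before returning its code. */
static unsigned int wp_shared_prologue(struct folio *folio,
				       unsigned int lock_result)
{
	unsigned int tmp = lock_result;	/* models the lock-check helper */

	if (tmp) {
		folio_put(folio);	/* don't leak the reference on retry */
		return tmp;
	}
	return 0;	/* proceed to do_page_mkwrite() */
}
```

On the success path the reference stays held for the rest of the fault;
only the early-return path pairs the put with the retry.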
* Re: [PATCH 0/6] Handle more faults under the VMA lock
2023-09-27 5:24 [PATCH 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (5 preceding siblings ...)
2023-09-27 5:25 ` [PATCH 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
@ 2023-09-27 22:18 ` Suren Baghdasaryan
6 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-09-27 22:18 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> At this point, we're handling the majority of file-backed page faults
> under the VMA lock, using the ->map_pages entry point. This patch set
> attempts to expand that for the following situations:
>
> - We have to do a read. This could be because we've hit the point in
> the readahead window where we need to kick off the next readahead,
> or because the page is simply not present in cache.
> - We're handling a write fault. Most applications don't do I/O by writes
> to shared mmaps for very good reasons, but some do, and it'd be nice
> to not make that slow unnecessarily.
> - We're doing a COW of a private mapping (both PTE already present
> and PTE not-present). These are two different codepaths and I handle
> both of them in this patch set.
>
> There is no support in this patch set for drivers to mark themselves
> as being VMA lock friendly; they could implement the ->map_pages
> vm_operation, but if they do, they would be the first. This is probably
> something we want to change at some point in the future, and I've marked
> where to make that change in the code.
>
> Suren, would you mind dropping this into your benchmarking setup and
> seeing if these relatively minor additions make a difference?
Definitely. Thanks for doing this! I'll report once I have some
numbers but first I need to review the patches to understand which
benchmarks could show the benefit.
>
> Matthew Wilcox (Oracle) (6):
> mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
> mm: Call wp_page_copy() under the VMA lock
> mm: Handle shared faults under the VMA lock
> mm: Handle COW faults under the VMA lock
> mm: Handle read faults under the VMA lock
> mm: Handle write faults to RO pages under the VMA lock
>
> mm/filemap.c | 13 ++++----
> mm/memory.c | 93 ++++++++++++++++++++++++++++++++--------------------
> 2 files changed, 65 insertions(+), 41 deletions(-)
>
> --
> 2.40.1
>