linux-mm.kvack.org archive mirror
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: Lokesh Gidra <lokeshgidra@google.com>
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	selinux@vger.kernel.org, surenb@google.com,
	kernel-team@android.com, aarcange@redhat.com, peterx@redhat.com,
	david@redhat.com, axelrasmussen@google.com, bgeffon@google.com,
	willy@infradead.org, jannh@google.com, kaleshsingh@google.com,
	ngeoffray@google.com, timmurray@google.com, rppt@kernel.org
Subject: Re: [PATCH v2 2/3] userfaultfd: protect mmap_changing with rw_sem in userfaulfd_ctx
Date: Mon, 29 Jan 2024 22:46:27 -0500
Message-ID: <20240130034627.4aupq27mksswisqg@revolver>
In-Reply-To: <CA+EESO6XiPfbUBgU3FukGvi_NG5XpAQxWKu7vg534t=rtWmGXg@mail.gmail.com>

* Lokesh Gidra <lokeshgidra@google.com> [240129 17:35]:
> On Mon, Jan 29, 2024 at 1:00 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Lokesh Gidra <lokeshgidra@google.com> [240129 14:35]:
> > > Increments and loads to mmap_changing are always in mmap_lock
> > > critical section.
> >
> > Read or write?
> >
> It's write-mode when incrementing (except in case of
> userfaultfd_remove() where it's done in read-mode) and loads are in
> mmap_lock (read-mode). I'll clarify this in the next version.
> >
> > > This ensures that if userspace requests event
> > > notification for non-cooperative operations (e.g. mremap), userfaultfd
> > > operations don't occur concurrently.
> > >
> > > This can be achieved by using a separate read-write semaphore in
> > > userfaultfd_ctx such that increments are done in write-mode and loads
> > > in read-mode, thereby eliminating the dependency on mmap_lock for this
> > > purpose.
> > >
> > > This is a preparatory step before we replace mmap_lock usage with
> > > per-vma locks in fill/move ioctls.
> > >
> > > Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
> > > ---
> > >  fs/userfaultfd.c              | 40 ++++++++++++----------
> > >  include/linux/userfaultfd_k.h | 31 ++++++++++--------
> > >  mm/userfaultfd.c              | 62 ++++++++++++++++++++---------------
> > >  3 files changed, 75 insertions(+), 58 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index 58331b83d648..c00a021bcce4 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -685,12 +685,15 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
> > >               ctx->flags = octx->flags;
> > >               ctx->features = octx->features;
> > >               ctx->released = false;
> > > +             init_rwsem(&ctx->map_changing_lock);
> > >               atomic_set(&ctx->mmap_changing, 0);
> > >               ctx->mm = vma->vm_mm;
> > >               mmgrab(ctx->mm);
> > >
> > >               userfaultfd_ctx_get(octx);
> > > +             down_write(&octx->map_changing_lock);
> > >               atomic_inc(&octx->mmap_changing);
> > > +             up_write(&octx->map_changing_lock);

On init, I don't think taking the lock is strictly necessary - unless
there is a way to access it before this increment?  Not that it would
cost much.

> >
> > This can potentially hold up your writer as the readers execute.  I
> > think this will change your priority (ie: priority inversion)?
> 
> Priority inversion, if any, is already happening due to mmap_lock, no?
> Also, I thought rw_semaphore implementation is fair, so the writer
> will eventually get the lock right? Please correct me if I'm wrong.

You are correct.  A waiting writer will stop any new readers, but readers
already in the critical section must finish before the writer can proceed.

> 
> At this patch: there can't be any readers as they need to acquire
> mmap_lock in read-mode first. While writers, at the point of
> incrementing mmap_changing, already hold mmap_lock in write-mode.
> 
> With per-vma locks, the same synchronization that mmap_lock achieved
> around mmap_changing, will be achieved by ctx->map_changing_lock.

The inversion I was thinking of was that the writer cannot complete the
write until the reader is done failing because the atomic_inc has
happened..?  I saw the writer as the priority since readers cannot
complete within the write, but I read it wrong.  I think the readers are
fine if they happen before, during, or after a write.  The work is thrown
out if the reader happens during the transition between those states,
which is detected through the atomic.  This makes sense now.

> >
> > You could use the first bit of the atomic_inc as indication of a write.
> > So if the mmap_changing is even, then there are no writers.  If it
> > didn't change and it's even then you know no modification has happened
> > (or it overflowed and hit the same number which would be rare, but
> > maybe okay?).
> 
> This is already achievable, right? If mmap_changing is >0 then we know
> there are writers. The problem is that we want writers (like mremap
> operations) to block as long as there is a userfaultfd operation (also
> reader of mmap_changing) going on. Please note that I'm inferring this
> from current implementation.
> 
> AFAIU, mmap_changing isn't required for correctness, because all
> operations are happening under the right mode of mmap_lock. It's used
> to ensure that while a non-cooperative operation is happening, if the
> user has asked to be notified, then no other userfaultfd operations
> should take place until the user gets the event notification.

I think it is needed: mmap_changing is read before the mmap_lock is
taken, then compared after the mmap_lock is taken (both read mode) to
ensure nothing has changed.

...

> > > @@ -783,7 +788,9 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
> > >               return true;
> > >
> > >       userfaultfd_ctx_get(ctx);
> > > +     down_write(&ctx->map_changing_lock);
> > >       atomic_inc(&ctx->mmap_changing);
> > > +     up_write(&ctx->map_changing_lock);
> > >       mmap_read_unlock(mm);
> > >
> > >       msg_init(&ewq.msg);

If this happens in read mode, then why are you waiting for the readers
to leave?  Can't you just increment the atomic?  It's fine happening in
read mode today, so it should be fine with this new rwsem.

Thanks,
Liam

...

