linux-mm.kvack.org archive mirror
From: Suren Baghdasaryan <surenb@google.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	willy@infradead.org, david@kernel.org,  ziy@nvidia.com,
	matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	 rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	 ying.huang@linux.alibaba.com, apopple@nvidia.com,
	 baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com,  ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org,  lance.yang@linux.dev, vbabka@suse.cz,
	jannh@google.com, rppt@kernel.org,  mhocko@suse.com,
	pfalcato@suse.de, kees@kernel.org, maddy@linux.ibm.com,
	 npiggin@gmail.com, mpe@ellerman.id.au, chleroy@kernel.org,
	 borntraeger@linux.ibm.com, frankja@linux.ibm.com,
	imbrenda@linux.ibm.com,  hca@linux.ibm.com, gor@linux.ibm.com,
	agordeev@linux.ibm.com,  svens@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, linux-mm@kvack.org,
	 linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	 linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v6 0/6] Use killable vma write locking in most places
Date: Tue, 31 Mar 2026 08:06:04 -0700	[thread overview]
Message-ID: <CAJuCfpHmUop-UzxM9QN7OiAnXUzj461+9ErgtXrtve_bR0_NjA@mail.gmail.com> (raw)
In-Reply-To: <f128b795-5577-40f9-8dae-936be4253642@lucifer.local>

On Tue, Mar 31, 2026 at 2:51 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Fri, Mar 27, 2026 at 04:12:26PM -0700, Andrew Morton wrote:
> > On Fri, 27 Mar 2026 13:54:51 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > Now that we have vma_start_write_killable() we can replace most of the
> > > vma_start_write() calls with it, improving reaction time to the kill
> > > signal.
> > >
> > > There are several places which are left untouched by this patchset:
> > >
> > > 1. free_pgtables(), because the function should free page tables even
> > > if a fatal signal is pending.
> > >
> > > 2. userfaultfd code, where some paths calling vma_start_write() can
> > > handle EINTR and some can't without deeper code refactoring.
> > >
> > > 3. mpol_rebind_mm(), which is used by the cpuset controller for
> > > migrations and operates on a remote mm. Incomplete operations here
> > > would result in an inconsistent cgroup state.
> > >
> > > 4. vm_flags_{set|mod|clear}, which require refactoring that moves
> > > vma_start_write() out of these functions and replaces it with
> > > vma_assert_write_locked(); callers of these functions would then
> > > lock the vma themselves using vma_start_write_killable() whenever
> > > possible.
> >
> > Updated, thanks.
>
> Andrew - sorry I think we need to yank this and defer to next cycle,
> there's too many functional changes here.
>
> (There was not really any way for me to predict this would happen ahead of
> time, unfortunately.)

Ok, no objections from me. I'll post v7 removing the part Lorenzo
objects to and you can pick it up whenever you deem appropriate.

>
> >
> > > Changes since v5 [1]:
> > > - Added Reviewed-by for unchanged patches, per Lorenzo Stoakes
> > >
> > > Patch#2:
> > > - Fixed locked_vm counter if mlock_vma_pages_range() fails in
> > > mlock_fixup(), per Sashiko
> > > - Avoid VMA re-locking in madvise_update_vma(), mprotect_fixup() and
> > > mseal_apply() when vma_modify_XXX creates a new VMA, as it will already
> > > be locked. This prevents an incomplete operation if a signal arrives
> > > after a successful vma_modify_XXX has modified the vma tree,
> > > per Sashiko
>
> Prevents the possibility of an incomplete operation? But
> vma_start_write_killable() checks to see if you're _already_ write locked
> and would make the operation a no-op? So how is this even a delta?
>
> It's a brave new world, arguing with sashiko via a submitter... :)

Yeah, this is not really a problem, but I thought I would change it
to make it apparent that the extra vma_start_write_killable() call is
not even made.

>
> > > - Removed obsolete comment in madvise_update_vma() and mprotect_fixup()
> > >
> > > Patch#4:
> > > - Added clarifying comment for vma_start_write_killable() when locking a
> > > detached VMA
> > > - Override VMA_MERGE_NOMERGE in vma_expand() to prevent callers from
> > > falling back to a new VMA allocation, per Sashiko
> > > - Added a note in the changelog about temporary workaround of using
> > > ENOMEM to propagate the error in vma_merge_existing_range() and
> > > vma_expand()
> > >
> > > Patch#5:
> > > - Added fatal_signal_pending() check in do_mbind() to detect
> > > queue_pages_range() failures due to a pending fatal signal, per Sashiko
> >
> > Changes since v5:
> >
> >
> >  mm/madvise.c   |   15 ++++++++++-----
> >  mm/mempolicy.c |    9 ++++++++-
> >  mm/mlock.c     |    2 ++
> >  mm/mprotect.c  |   26 ++++++++++++++++----------
> >  mm/mseal.c     |   27 +++++++++++++++++++--------
> >  mm/vma.c       |   20 ++++++++++++++++++--
> >  6 files changed, 73 insertions(+), 26 deletions(-)
> >
> > --- a/mm/madvise.c~b
> > +++ a/mm/madvise.c
> > @@ -172,11 +172,16 @@ static int madvise_update_vma(vm_flags_t
> >       if (IS_ERR(vma))
> >               return PTR_ERR(vma);
> >
> > -     madv_behavior->vma = vma;
> > -
> > -     /* vm_flags is protected by the mmap_lock held in write mode. */
> > -     if (vma_start_write_killable(vma))
> > -             return -EINTR;
> > +     /*
> > +      * If a new vma was created during vma_modify_XXX, the resulting
> > +      * vma is already locked. Skip re-locking new vma in this case.
> > +      */
> > +     if (vma == madv_behavior->vma) {
> > +             if (vma_start_write_killable(vma))
> > +                     return -EINTR;
> > +     } else {
> > +             madv_behavior->vma = vma;
> > +     }
> >
> >       vma->flags = new_vma_flags;
> >       if (set_new_anon_name)
> > --- a/mm/mempolicy.c~b
> > +++ a/mm/mempolicy.c
> > @@ -1546,7 +1546,14 @@ static long do_mbind(unsigned long start
> >                       flags | MPOL_MF_INVERT | MPOL_MF_WRLOCK, &pagelist);
> >
> >       if (nr_failed < 0) {
> > -             err = nr_failed;
> > +             /*
> > +              * queue_pages_range() might override the original error with -EFAULT.
> > +              * Confirm that fatal signals are still treated correctly.
> > +              */
> > +             if (fatal_signal_pending(current))
> > +                     err = -EINTR;
> > +             else
> > +                     err = nr_failed;
> >               nr_failed = 0;
> >       } else {
> >               vma_iter_init(&vmi, mm, start);
> > --- a/mm/mlock.c~b
> > +++ a/mm/mlock.c
> > @@ -518,6 +518,8 @@ static int mlock_fixup(struct vma_iterat
> >               vma->flags = new_vma_flags;
> >       } else {
> >               ret = mlock_vma_pages_range(vma, start, end, &new_vma_flags);
> > +             if (ret)
> > +                     mm->locked_vm -= nr_pages;
> >       }
> >  out:
> >       *prev = vma;
> > --- a/mm/mprotect.c~b
> > +++ a/mm/mprotect.c
> > @@ -716,6 +716,7 @@ mprotect_fixup(struct vma_iterator *vmi,
> >       const vma_flags_t old_vma_flags = READ_ONCE(vma->flags);
> >       vma_flags_t new_vma_flags = legacy_to_vma_flags(newflags);
> >       long nrpages = (end - start) >> PAGE_SHIFT;
> > +     struct vm_area_struct *new_vma;
> >       unsigned int mm_cp_flags = 0;
> >       unsigned long charged = 0;
> >       int error;
> > @@ -772,21 +773,26 @@ mprotect_fixup(struct vma_iterator *vmi,
> >               vma_flags_clear(&new_vma_flags, VMA_ACCOUNT_BIT);
> >       }
> >
> > -     vma = vma_modify_flags(vmi, *pprev, vma, start, end, &new_vma_flags);
> > -     if (IS_ERR(vma)) {
> > -             error = PTR_ERR(vma);
> > +     new_vma = vma_modify_flags(vmi, *pprev, vma, start, end,
> > +                                &new_vma_flags);
> > +     if (IS_ERR(new_vma)) {
> > +             error = PTR_ERR(new_vma);
> >               goto fail;
> >       }
> >
> > -     *pprev = vma;
> > -
> >       /*
> > -      * vm_flags and vm_page_prot are protected by the mmap_lock
> > -      * held in write mode.
> > +      * If a new vma was created during vma_modify_flags, the resulting
> > +      * vma is already locked. Skip re-locking new vma in this case.
> >        */
> > -     error = vma_start_write_killable(vma);
> > -     if (error)
> > -             goto fail;
> > +     if (new_vma == vma) {
> > +             error = vma_start_write_killable(vma);
> > +             if (error)
> > +                     goto fail;
> > +     } else {
> > +             vma = new_vma;
> > +     }
> > +
> > +     *pprev = vma;
> >
> >       vma_flags_reset_once(vma, &new_vma_flags);
> >       if (vma_wants_manual_pte_write_upgrade(vma))
> > --- a/mm/mseal.c~b
> > +++ a/mm/mseal.c
> > @@ -70,17 +70,28 @@ static int mseal_apply(struct mm_struct
> >
> >               if (!vma_test(vma, VMA_SEALED_BIT)) {
> >                       vma_flags_t vma_flags = vma->flags;
> > -                     int err;
> > +                     struct vm_area_struct *new_vma;
> >
> >                       vma_flags_set(&vma_flags, VMA_SEALED_BIT);
> >
> > -                     vma = vma_modify_flags(&vmi, prev, vma, curr_start,
> > -                                            curr_end, &vma_flags);
> > -                     if (IS_ERR(vma))
> > -                             return PTR_ERR(vma);
> > -                     err = vma_start_write_killable(vma);
> > -                     if (err)
> > -                             return err;
> > +                     new_vma = vma_modify_flags(&vmi, prev, vma, curr_start,
> > +                                                curr_end, &vma_flags);
> > +                     if (IS_ERR(new_vma))
> > +                             return PTR_ERR(new_vma);
> > +
> > +                     /*
> > +                      * If a new vma was created during vma_modify_flags,
> > +                      * the resulting vma is already locked.
> > +                      * Skip re-locking new vma in this case.
> > +                      */
> > +                     if (new_vma == vma) {
> > +                             int err = vma_start_write_killable(vma);
> > +                             if (err)
> > +                                     return err;
> > +                     } else {
> > +                             vma = new_vma;
> > +                     }
> > +
> >                       vma_set_flags(vma, VMA_SEALED_BIT);
> >               }
> >
> > --- a/mm/vma.c~b
> > +++ a/mm/vma.c
> > @@ -531,6 +531,10 @@ __split_vma(struct vma_iterator *vmi, st
> >       err = vma_start_write_killable(vma);
> >       if (err)
> >               goto out_free_vma;
> > +     /*
> > +      * Locking a new detached VMA will always succeed but it's just a
> > +      * detail of the current implementation, so handle it all the same.
> > +      */
> >       err = vma_start_write_killable(new);
> >       if (err)
> >               goto out_free_vma;
> > @@ -1197,8 +1201,14 @@ int vma_expand(struct vma_merge_struct *
> >
> >       mmap_assert_write_locked(vmg->mm);
> >       err = vma_start_write_killable(target);
> > -     if (err)
> > +     if (err) {
> > +             /*
> > +              * Override VMA_MERGE_NOMERGE to prevent callers from
> > +              * falling back to a new VMA allocation.
> > +              */
> > +             vmg->state = VMA_MERGE_ERROR_NOMEM;
> >               return err;
> > +     }
> >
> >       target_sticky = vma_flags_and_mask(&target->flags, VMA_STICKY_FLAGS);
> >
> > @@ -1231,8 +1241,14 @@ int vma_expand(struct vma_merge_struct *
> >                * is pending.
> >                */
> >               err = vma_start_write_killable(next);
> > -             if (err)
> > +             if (err) {
> > +                     /*
> > +                      * Override VMA_MERGE_NOMERGE to prevent callers from
> > +                      * falling back to a new VMA allocation.
> > +                      */
> > +                     vmg->state = VMA_MERGE_ERROR_NOMEM;
> >                       return err;
> > +             }
> >               err = dup_anon_vma(target, next, &anon_dup);
> >               if (err)
> >                       return err;
> > _
> >

