From: Suren Baghdasaryan <surenb@google.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Suren Baghdasaryan <surenb@google.com>,
akpm@linux-foundation.org, willy@infradead.org,
david@kernel.org, ziy@nvidia.com, matthew.brost@intel.com,
joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
gourry@gourry.net, ying.huang@linux.alibaba.com,
apopple@nvidia.com, lorenzo.stoakes@oracle.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, vbabka@suse.cz, jannh@google.com,
rppt@kernel.org, mhocko@suse.com, pfalcato@suse.de,
kees@kernel.org, maddy@linux.ibm.com, npiggin@gmail.com,
mpe@ellerman.id.au, chleroy@kernel.org,
borntraeger@linux.ibm.com, frankja@linux.ibm.com,
imbrenda@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
agordeev@linux.ibm.com, svens@linux.ibm.com,
gerald.schaefer@linux.ibm.com, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Subject: Re: [PATCH v2 2/3] mm: replace vma_start_write() with vma_start_write_killable()
Date: Tue, 17 Feb 2026 13:02:48 -0800 [thread overview]
Message-ID: <CAJuCfpGViU4dDaLtPR8U0C+=FXO=1TuU-hT3fypNQO3LGOjbcA@mail.gmail.com> (raw)
In-Reply-To: <dtdfrko7uqif6flc4mefnlar7wnmrbyswfu7bvb2ar24gkeejo@ypzhmyklbeh7>
On Tue, Feb 17, 2026 at 11:19 AM Liam R. Howlett
<Liam.Howlett@oracle.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [260217 11:33]:
> > Now that we have vma_start_write_killable() we can replace most of the
> > vma_start_write() calls with it, improving reaction time to the kill
> > signal.
> >
> > There are several places which are left untouched by this patch:
> >
> > 1. free_pgtables() because function should free page tables even if a
> > fatal signal is pending.
> >
> > 2. process_vma_walk_lock(), which requires changes in its callers and
> > will be handled in the next patch.
> >
> > 3. userfaultd code, where some paths calling vma_start_write() can
> > handle EINTR and some can't without a deeper code refactoring.
> >
> > 4. vm_flags_{set|mod|clear} require refactoring that involves moving
> > vma_start_write() out of these functions and replacing it with
> > vma_assert_write_locked(), then callers of these functions should
> > lock the vma themselves using vma_start_write_killable() whenever
> > possible.
> >
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> # powerpc
> > ---
> > arch/powerpc/kvm/book3s_hv_uvmem.c | 5 +-
> > include/linux/mempolicy.h | 5 +-
> > mm/khugepaged.c | 5 +-
> > mm/madvise.c | 4 +-
> > mm/memory.c | 2 +
> > mm/mempolicy.c | 23 ++++++--
> > mm/mlock.c | 20 +++++--
> > mm/mprotect.c | 4 +-
> > mm/mremap.c | 4 +-
> > mm/vma.c | 93 +++++++++++++++++++++---------
> > mm/vma_exec.c | 6 +-
> > 11 files changed, 123 insertions(+), 48 deletions(-)
> >
>
> ...
>
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
>
> ...
>
> >
> > static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
> > @@ -1785,9 +1793,15 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> > return -EINVAL;
> > if (end == start)
> > return 0;
> > - mmap_write_lock(mm);
> > + if (mmap_write_lock_killable(mm))
> > + return -EINTR;
> > prev = vma_prev(&vmi);
> > for_each_vma_range(vmi, vma, end) {
> > + if (vma_start_write_killable(vma)) {
> > + err = -EINTR;
> > + break;
> > + }
> > +
> > /*
> > * If any vma in the range got policy other than MPOL_BIND
> > * or MPOL_PREFERRED_MANY we return error. We don't reset
> > @@ -1808,7 +1822,6 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> > break;
> > }
> >
> > - vma_start_write(vma);
>
> Moving this vma_start_write() up means we will lock all vmas in the
> range regardless of if they are going to change. Was that your
> intention?
No, I missed that this would result in unnecessary locks.
>
> It might be better to move the locking to later in the loop, prior to
> the mpol_dup(), but after skipping other vmas?
Yes, that's the right place for it. Will move.
>
> > new->home_node = home_node;
> > err = mbind_range(&vmi, vma, &prev, start, end, new);
>
> ...
>
> > diff --git a/mm/vma.c b/mm/vma.c
> > index bb4d0326fecb..1d21351282cf 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
>
> ...
>
> > @@ -2532,6 +2556,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> > goto free_vma;
> > }
> >
> > + /* Lock the VMA since it is modified after insertion into VMA tree */
> > + error = vma_start_write_killable(vma);
> > + if (error)
> > + goto free_iter_vma;
> > +
> > if (map->file)
> > error = __mmap_new_file_vma(map, vma);
> > else if (map->vm_flags & VM_SHARED)
> > @@ -2552,8 +2581,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> > WARN_ON_ONCE(!arch_validate_flags(map->vm_flags));
> > #endif
> >
> > - /* Lock the VMA since it is modified after insertion into VMA tree */
> > - vma_start_write(vma);
> > vma_iter_store_new(vmi, vma);
> > map->mm->map_count++;
> > vma_link_file(vma, map->hold_file_rmap_lock);
>
> This is a bit of a nit on the placement..
>
> Prior to this change, the write lock on the vma was taken next to where
> it was inserted in the tree. Now the lock is taken between vma iterator
> preallocations and part of the vma setup.
>
> Would it make sense to put it closer to the vma allocation itself? I
> think all that's needed to be set is the mm struct for the locking to
> work?
I guess locking the vma before vma_iter_prealloc() would save us
unnecessary alloc/free in case of a pending fatal signal. I'll move
the lock right after vm_area_alloc() so that the entire vma setup is
done on a locked vma.
>
>
> ...
>
> > @@ -3089,7 +3120,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>
> Good luck testing this one.
Yeah... Any suggestions for tests I should use?
>
> > struct mm_struct *mm = vma->vm_mm;
> > struct vm_area_struct *next;
> > unsigned long gap_addr;
> > - int error = 0;
> > + int error;
> > VMA_ITERATOR(vmi, mm, vma->vm_start);
> >
> > if (!(vma->vm_flags & VM_GROWSUP))
> > @@ -3126,12 +3157,14 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> >
> > /* We must make sure the anon_vma is allocated. */
> > if (unlikely(anon_vma_prepare(vma))) {
> > - vma_iter_free(&vmi);
> > - return -ENOMEM;
> > + error = -ENOMEM;
> > + goto free;
> > }
> >
> > /* Lock the VMA before expanding to prevent concurrent page faults */
> > - vma_start_write(vma);
> > + error = vma_start_write_killable(vma);
> > + if (error)
> > + goto free;
> > /* We update the anon VMA tree. */
> > anon_vma_lock_write(vma->anon_vma);
> >
> > @@ -3160,6 +3193,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > }
> > }
> > anon_vma_unlock_write(vma->anon_vma);
> > +free:
> > vma_iter_free(&vmi);
> > validate_mm(mm);
> > return error;
>
> Looks okay.
Thanks for the review, Liam! I'll wait a couple of days and post the
v3 with fixes.
>
> ...
>
>
>
next prev parent reply other threads:[~2026-02-17 21:03 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-17 16:32 [PATCH v2 0/3] Use killable vma write locking in most places Suren Baghdasaryan
2026-02-17 16:32 ` [PATCH v2 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
2026-02-17 18:26 ` Liam R. Howlett
2026-02-17 16:32 ` [PATCH v2 2/3] mm: replace vma_start_write() with vma_start_write_killable() Suren Baghdasaryan
2026-02-17 19:19 ` Liam R. Howlett
2026-02-17 21:02 ` Suren Baghdasaryan [this message]
2026-02-18 16:46 ` Liam R. Howlett
2026-02-18 23:40 ` Suren Baghdasaryan
2026-02-17 16:32 ` [PATCH v2 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock() Suren Baghdasaryan
2026-02-17 19:15 ` Heiko Carstens
2026-02-17 20:31 ` Suren Baghdasaryan
2026-02-18 7:10 ` Heiko Carstens
2026-02-18 13:07 ` Matthew Wilcox
2026-02-18 15:52 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJuCfpGViU4dDaLtPR8U0C+=FXO=1TuU-hT3fypNQO3LGOjbcA@mail.gmail.com' \
--to=surenb@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=agordeev@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=borntraeger@linux.ibm.com \
--cc=byungchul@sk.com \
--cc=chleroy@kernel.org \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=frankja@linux.ibm.com \
--cc=gerald.schaefer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=gourry@gourry.net \
--cc=hca@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=jannh@google.com \
--cc=joshua.hahnjy@gmail.com \
--cc=kees@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=lance.yang@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=maddy@linux.ibm.com \
--cc=matthew.brost@intel.com \
--cc=mhocko@suse.com \
--cc=mpe@ellerman.id.au \
--cc=npache@redhat.com \
--cc=npiggin@gmail.com \
--cc=pfalcato@suse.de \
--cc=rakie.kim@sk.com \
--cc=ritesh.list@gmail.com \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=svens@linux.ibm.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=ying.huang@linux.alibaba.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox