* [PATCH v3 0/3] Use killable vma write locking in most places
@ 2026-02-26 7:06 Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Suren Baghdasaryan @ 2026-02-26 7:06 UTC (permalink / raw)
To: akpm
Cc: willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, jannh, rppt, mhocko, pfalcato, kees,
maddy, npiggin, mpe, chleroy, borntraeger, frankja, imbrenda,
hca, gor, agordeev, svens, gerald.schaefer, linux-mm,
linuxppc-dev, kvm, linux-kernel, linux-s390, surenb
Now that we have vma_start_write_killable() we can replace most of the
vma_start_write() calls with it, improving reaction time to the kill
signal.
There are several places which are left untouched by this patchset:
1. free_pgtables(), because the function should free page tables even if
a fatal signal is pending.
2. userfaultfd code, where some paths calling vma_start_write() can
handle EINTR and some can't without deeper refactoring.
3. mpol_rebind_mm(), which is used by the cpuset controller for
migrations and operates on a remote mm. An incomplete operation here
would leave the cgroup in an inconsistent state.
4. vm_flags_{set|mod|clear}, which require refactoring that moves
vma_start_write() out of these functions and replaces it with
vma_assert_write_locked(); callers should then lock the vma themselves
using vma_start_write_killable() whenever possible.
A cleanup patch is added at the beginning to make the later changes more
readable. The second patch contains most of the changes and the last
patch contains the changes associated with process_vma_walk_lock()
error handling.
Changes since v2 [1]:
- rebased over mm-unstable, per Matthew Wilcox;
- removed mpol_rebind_mm() changes since the function operates on a
remote mm and an incomplete operation can leave an unrelated process
in an inconsistent state;
- moved vma_start_write_killable() inside set_mempolicy_home_node() to
avoid locking extra vmas, per Liam R. Howlett;
- moved vma_start_write_killable() inside __mmap_new_vma() to lock the
vma right after its allocation, per Liam R. Howlett;
- introduced VMA_MERGE_ERROR_INTR to add EINTR handling for vma_modify();
- changed do_mbind() error handling to avoid EINTR overrides;
- changed migrate_to_node() error handling to avoid EINTR overrides;
- added EINTR handling in queue_pages_range();
- fixed clear_refs_write() error handling, which the previous version
broke by skipping some of the cleanup logic.
[1] https://lore.kernel.org/all/20260217163250.2326001-1-surenb@google.com/
Suren Baghdasaryan (3):
mm/vma: cleanup error handling path in vma_expand()
mm: replace vma_start_write() with vma_start_write_killable()
mm: use vma_start_write_killable() in process_vma_walk_lock()
arch/powerpc/kvm/book3s_hv_uvmem.c | 5 +-
arch/s390/kvm/kvm-s390.c | 2 +-
fs/proc/task_mmu.c | 5 +-
mm/khugepaged.c | 5 +-
mm/madvise.c | 4 +-
mm/memory.c | 2 +
mm/mempolicy.c | 22 +++--
mm/mlock.c | 21 +++--
mm/mprotect.c | 4 +-
mm/mremap.c | 4 +-
mm/pagewalk.c | 20 +++--
mm/vma.c | 127 ++++++++++++++++++++---------
mm/vma.h | 6 ++
mm/vma_exec.c | 6 +-
14 files changed, 167 insertions(+), 66 deletions(-)
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand()
2026-02-26 7:06 [PATCH v3 0/3] Use killable vma write locking in most places Suren Baghdasaryan
@ 2026-02-26 7:06 ` Suren Baghdasaryan
2026-02-26 16:42 ` Liam R. Howlett
2026-02-26 7:06 ` [PATCH v3 2/3] mm: replace vma_start_write() with vma_start_write_killable() Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock() Suren Baghdasaryan
2 siblings, 1 reply; 9+ messages in thread
From: Suren Baghdasaryan @ 2026-02-26 7:06 UTC (permalink / raw)
To: akpm
Cc: willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, jannh, rppt, mhocko, pfalcato, kees,
maddy, npiggin, mpe, chleroy, borntraeger, frankja, imbrenda,
hca, gor, agordeev, svens, gerald.schaefer, linux-mm,
linuxppc-dev, kvm, linux-kernel, linux-s390, surenb
vma_expand() error handling is a bit confusing with "if (ret) return ret;"
mixed with "if (!ret && ...) ret = ...;". Simplify the code to check
for errors and return immediately after an operation that might fail.
This also makes later changes to this function more readable.
No functional change intended.
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
mm/vma.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index be64f781a3aa..bb4d0326fecb 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1186,12 +1186,16 @@ int vma_expand(struct vma_merge_struct *vmg)
* Note that, by convention, callers ignore OOM for this case, so
* we don't need to account for vmg->give_up_on_mm here.
*/
- if (remove_next)
+ if (remove_next) {
ret = dup_anon_vma(target, next, &anon_dup);
- if (!ret && vmg->copied_from)
+ if (ret)
+ return ret;
+ }
+ if (vmg->copied_from) {
ret = dup_anon_vma(target, vmg->copied_from, &anon_dup);
- if (ret)
- return ret;
+ if (ret)
+ return ret;
+ }
if (remove_next) {
vma_start_write(next);
--
2.53.0.414.gf7e9f6c205-goog
* [PATCH v3 2/3] mm: replace vma_start_write() with vma_start_write_killable()
2026-02-26 7:06 [PATCH v3 0/3] Use killable vma write locking in most places Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
@ 2026-02-26 7:06 ` Suren Baghdasaryan
2026-02-26 17:43 ` Liam R. Howlett
2026-02-26 7:06 ` [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock() Suren Baghdasaryan
2 siblings, 1 reply; 9+ messages in thread
From: Suren Baghdasaryan @ 2026-02-26 7:06 UTC (permalink / raw)
To: akpm
Cc: willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, jannh, rppt, mhocko, pfalcato, kees,
maddy, npiggin, mpe, chleroy, borntraeger, frankja, imbrenda,
hca, gor, agordeev, svens, gerald.schaefer, linux-mm,
linuxppc-dev, kvm, linux-kernel, linux-s390, surenb,
Ritesh Harjani (IBM)
Now that we have vma_start_write_killable() we can replace most of the
vma_start_write() calls with it, improving reaction time to the kill
signal.
There are several places which are left untouched by this patch:
1. free_pgtables(), because the function should free page tables even if
a fatal signal is pending.
2. process_vma_walk_lock(), which requires changes in its callers and
will be handled in the next patch.
3. userfaultfd code, where some paths calling vma_start_write() can
handle EINTR and some can't without deeper refactoring.
4. mpol_rebind_mm(), which is used by the cpuset controller for
migrations and operates on a remote mm. An incomplete operation here
would leave the cgroup in an inconsistent state.
5. vm_flags_{set|mod|clear}, which require refactoring that moves
vma_start_write() out of these functions and replaces it with
vma_assert_write_locked(); callers should then lock the vma themselves
using vma_start_write_killable() whenever possible.
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> # powerpc
---
arch/powerpc/kvm/book3s_hv_uvmem.c | 5 +-
mm/khugepaged.c | 5 +-
mm/madvise.c | 4 +-
mm/memory.c | 2 +
mm/mempolicy.c | 8 ++-
mm/mlock.c | 21 +++++--
mm/mprotect.c | 4 +-
mm/mremap.c | 4 +-
mm/vma.c | 93 +++++++++++++++++++++---------
mm/vma_exec.c | 6 +-
10 files changed, 109 insertions(+), 43 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5fbb95d90e99..0a28b48a46b8 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -410,7 +410,10 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
ret = H_STATE;
break;
}
- vma_start_write(vma);
+ if (vma_start_write_killable(vma)) {
+ ret = H_STATE;
+ break;
+ }
/* Copy vm_flags to avoid partial modifications in ksm_madvise */
vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1dd3cfca610d..6c92e31ee5fb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1141,7 +1141,10 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
if (result != SCAN_SUCCEED)
goto out_up_write;
/* check if the pmd is still valid */
- vma_start_write(vma);
+ if (vma_start_write_killable(vma)) {
+ result = SCAN_FAIL;
+ goto out_up_write;
+ }
result = check_pmd_still_valid(mm, address, pmd);
if (result != SCAN_SUCCEED)
goto out_up_write;
diff --git a/mm/madvise.c b/mm/madvise.c
index c0370d9b4e23..ccdaea6b3b15 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -173,7 +173,9 @@ static int madvise_update_vma(vm_flags_t new_flags,
madv_behavior->vma = vma;
/* vm_flags is protected by the mmap_lock held in write mode. */
- vma_start_write(vma);
+ if (vma_start_write_killable(vma))
+ return -EINTR;
+
vm_flags_reset(vma, new_flags);
if (set_new_anon_name)
return replace_anon_vma_name(vma, anon_name);
diff --git a/mm/memory.c b/mm/memory.c
index 07778814b4a8..691062154cf5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -379,6 +379,8 @@ void free_pgd_range(struct mmu_gather *tlb,
* page tables that should be removed. This can differ from the vma mappings on
* some archs that may have mappings that need to be removed outside the vmas.
* Note that the prev->vm_end and next->vm_start are often used.
+ * We don't use vma_start_write_killable() because page tables should be freed
+ * even if the task is being killed.
*
* The vma_end differs from the pg_end when a dup_mmap() failed and the tree has
* unrelated data to the mm_struct being torn down.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0e5175f1c767..90939f5bde02 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1784,7 +1784,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
return -EINVAL;
if (end == start)
return 0;
- mmap_write_lock(mm);
+ if (mmap_write_lock_killable(mm))
+ return -EINTR;
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
@@ -1801,13 +1802,16 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
err = -EOPNOTSUPP;
break;
}
+ if (vma_start_write_killable(vma)) {
+ err = -EINTR;
+ break;
+ }
new = mpol_dup(old);
if (IS_ERR(new)) {
err = PTR_ERR(new);
break;
}
- vma_start_write(vma);
new->home_node = home_node;
err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
diff --git a/mm/mlock.c b/mm/mlock.c
index 2f699c3497a5..c562c77c3ee0 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -420,7 +420,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
* Called for mlock(), mlock2() and mlockall(), to set @vma VM_LOCKED;
* called for munlock() and munlockall(), to clear VM_LOCKED from @vma.
*/
-static void mlock_vma_pages_range(struct vm_area_struct *vma,
+static int mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end, vm_flags_t newflags)
{
static const struct mm_walk_ops mlock_walk_ops = {
@@ -441,7 +441,9 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
*/
if (newflags & VM_LOCKED)
newflags |= VM_IO;
- vma_start_write(vma);
+ if (vma_start_write_killable(vma))
+ return -EINTR;
+
vm_flags_reset_once(vma, newflags);
lru_add_drain();
@@ -452,6 +454,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
newflags &= ~VM_IO;
vm_flags_reset_once(vma, newflags);
}
+ return 0;
}
/*
@@ -501,10 +504,12 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
*/
if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) {
/* No work to do, and mlocking twice would be wrong */
- vma_start_write(vma);
+ ret = vma_start_write_killable(vma);
+ if (ret)
+ goto out;
vm_flags_reset(vma, newflags);
} else {
- mlock_vma_pages_range(vma, start, end, newflags);
+ ret = mlock_vma_pages_range(vma, start, end, newflags);
}
out:
*prev = vma;
@@ -733,9 +738,13 @@ static int apply_mlockall_flags(int flags)
error = mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
newflags);
- /* Ignore errors, but prev needs fixing up. */
- if (error)
+ /* Ignore errors except EINTR, but prev needs fixing up. */
+ if (error) {
+ if (error == -EINTR)
+ return error;
+
prev = vma;
+ }
cond_resched();
}
out:
diff --git a/mm/mprotect.c b/mm/mprotect.c
index c0571445bef7..49dbb7156936 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -765,7 +765,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
* vm_flags and vm_page_prot are protected by the mmap_lock
* held in write mode.
*/
- vma_start_write(vma);
+ error = vma_start_write_killable(vma);
+ if (error < 0)
+ goto fail;
vm_flags_reset_once(vma, newflags);
if (vma_wants_manual_pte_write_upgrade(vma))
mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
diff --git a/mm/mremap.c b/mm/mremap.c
index 2be876a70cc0..aef1e5f373c7 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1286,7 +1286,9 @@ static unsigned long move_vma(struct vma_remap_struct *vrm)
return -ENOMEM;
/* We don't want racing faults. */
- vma_start_write(vrm->vma);
+ err = vma_start_write_killable(vrm->vma);
+ if (err)
+ return err;
/* Perform copy step. */
err = copy_vma_and_data(vrm, &new_vma);
diff --git a/mm/vma.c b/mm/vma.c
index bb4d0326fecb..9f2664f1d078 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -530,6 +530,13 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (err)
goto out_free_vmi;
+ err = vma_start_write_killable(vma);
+ if (err)
+ goto out_free_mpol;
+ err = vma_start_write_killable(new);
+ if (err)
+ goto out_free_mpol;
+
err = anon_vma_clone(new, vma, VMA_OP_SPLIT);
if (err)
goto out_free_mpol;
@@ -540,9 +547,6 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (new->vm_ops && new->vm_ops->open)
new->vm_ops->open(new);
- vma_start_write(vma);
- vma_start_write(new);
-
init_vma_prep(&vp, vma);
vp.insert = new;
vma_prepare(&vp);
@@ -895,16 +899,22 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
}
/* No matter what happens, we will be adjusting middle. */
- vma_start_write(middle);
+ err = vma_start_write_killable(middle);
+ if (err)
+ goto abort;
if (merge_right) {
- vma_start_write(next);
+ err = vma_start_write_killable(next);
+ if (err)
+ goto abort;
vmg->target = next;
sticky_flags |= (next->vm_flags & VM_STICKY);
}
if (merge_left) {
- vma_start_write(prev);
+ err = vma_start_write_killable(prev);
+ if (err)
+ goto abort;
vmg->target = prev;
sticky_flags |= (prev->vm_flags & VM_STICKY);
}
@@ -1155,10 +1165,12 @@ int vma_expand(struct vma_merge_struct *vmg)
struct vm_area_struct *next = vmg->next;
bool remove_next = false;
vm_flags_t sticky_flags;
- int ret = 0;
+ int ret;
mmap_assert_write_locked(vmg->mm);
- vma_start_write(target);
+ ret = vma_start_write_killable(target);
+ if (ret)
+ return ret;
if (next && target != next && vmg->end == next->vm_end)
remove_next = true;
@@ -1187,6 +1199,9 @@ int vma_expand(struct vma_merge_struct *vmg)
* we don't need to account for vmg->give_up_on_mm here.
*/
if (remove_next) {
+ ret = vma_start_write_killable(next);
+ if (ret)
+ return ret;
ret = dup_anon_vma(target, next, &anon_dup);
if (ret)
return ret;
@@ -1197,10 +1212,8 @@ int vma_expand(struct vma_merge_struct *vmg)
return ret;
}
- if (remove_next) {
- vma_start_write(next);
+ if (remove_next)
vmg->__remove_next = true;
- }
if (commit_merge(vmg))
goto nomem;
@@ -1233,6 +1246,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff)
{
struct vma_prepare vp;
+ int err;
WARN_ON((vma->vm_start != start) && (vma->vm_end != end));
@@ -1244,7 +1258,11 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (vma_iter_prealloc(vmi, NULL))
return -ENOMEM;
- vma_start_write(vma);
+ err = vma_start_write_killable(vma);
+ if (err) {
+ vma_iter_free(vmi);
+ return err;
+ }
init_vma_prep(&vp, vma);
vma_prepare(&vp);
@@ -1434,7 +1452,9 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
if (error)
goto end_split_failed;
}
- vma_start_write(next);
+ error = vma_start_write_killable(next);
+ if (error)
+ goto munmap_gather_failed;
mas_set(mas_detach, vms->vma_count++);
error = mas_store_gfp(mas_detach, next, GFP_KERNEL);
if (error)
@@ -1828,12 +1848,17 @@ static void vma_link_file(struct vm_area_struct *vma, bool hold_rmap_lock)
static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
{
VMA_ITERATOR(vmi, mm, 0);
+ int err;
vma_iter_config(&vmi, vma->vm_start, vma->vm_end);
if (vma_iter_prealloc(&vmi, vma))
return -ENOMEM;
- vma_start_write(vma);
+ err = vma_start_write_killable(vma);
+ if (err) {
+ vma_iter_free(&vmi);
+ return err;
+ }
vma_iter_store_new(&vmi, vma);
vma_link_file(vma, /* hold_rmap_lock= */false);
mm->map_count++;
@@ -2215,9 +2240,8 @@ int mm_take_all_locks(struct mm_struct *mm)
* is reached.
*/
for_each_vma(vmi, vma) {
- if (signal_pending(current))
+ if (signal_pending(current) || vma_start_write_killable(vma))
goto out_unlock;
- vma_start_write(vma);
}
vma_iter_init(&vmi, mm, 0);
@@ -2522,6 +2546,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
if (!vma)
return -ENOMEM;
+ /* Lock the VMA since it is modified after insertion into VMA tree */
+ error = vma_start_write_killable(vma);
+ if (error)
+ goto free_vma;
+
vma_iter_config(vmi, map->addr, map->end);
vma_set_range(vma, map->addr, map->end, map->pgoff);
vm_flags_init(vma, map->vm_flags);
@@ -2552,8 +2581,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
WARN_ON_ONCE(!arch_validate_flags(map->vm_flags));
#endif
- /* Lock the VMA since it is modified after insertion into VMA tree */
- vma_start_write(vma);
vma_iter_store_new(vmi, vma);
map->mm->map_count++;
vma_link_file(vma, map->hold_file_rmap_lock);
@@ -2864,6 +2891,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, unsigned long len, vm_flags_t vm_flags)
{
struct mm_struct *mm = current->mm;
+ int err = -ENOMEM;
/*
* Check against address space limits by the changed size
@@ -2908,7 +2936,10 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_set_range(vma, addr, addr + len, addr >> PAGE_SHIFT);
vm_flags_init(vma, vm_flags);
vma->vm_page_prot = vm_get_page_prot(vm_flags);
- vma_start_write(vma);
+ if (vma_start_write_killable(vma)) {
+ err = -EINTR;
+ goto mas_store_fail;
+ }
if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
goto mas_store_fail;
@@ -2928,7 +2959,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
vm_area_free(vma);
unacct_fail:
vm_unacct_memory(len >> PAGE_SHIFT);
- return -ENOMEM;
+ return err;
}
/**
@@ -3089,7 +3120,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *next;
unsigned long gap_addr;
- int error = 0;
+ int error;
VMA_ITERATOR(vmi, mm, vma->vm_start);
if (!(vma->vm_flags & VM_GROWSUP))
@@ -3126,12 +3157,14 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
/* We must make sure the anon_vma is allocated. */
if (unlikely(anon_vma_prepare(vma))) {
- vma_iter_free(&vmi);
- return -ENOMEM;
+ error = -ENOMEM;
+ goto free;
}
/* Lock the VMA before expanding to prevent concurrent page faults */
- vma_start_write(vma);
+ error = vma_start_write_killable(vma);
+ if (error)
+ goto free;
/* We update the anon VMA tree. */
anon_vma_lock_write(vma->anon_vma);
@@ -3160,6 +3193,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
anon_vma_unlock_write(vma->anon_vma);
+free:
vma_iter_free(&vmi);
validate_mm(mm);
return error;
@@ -3174,7 +3208,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *prev;
- int error = 0;
+ int error;
VMA_ITERATOR(vmi, mm, vma->vm_start);
if (!(vma->vm_flags & VM_GROWSDOWN))
@@ -3205,12 +3239,14 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
/* We must make sure the anon_vma is allocated. */
if (unlikely(anon_vma_prepare(vma))) {
- vma_iter_free(&vmi);
- return -ENOMEM;
+ error = -ENOMEM;
+ goto free;
}
/* Lock the VMA before expanding to prevent concurrent page faults */
- vma_start_write(vma);
+ error = vma_start_write_killable(vma);
+ if (error)
+ goto free;
/* We update the anon VMA tree. */
anon_vma_lock_write(vma->anon_vma);
@@ -3240,6 +3276,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
}
}
anon_vma_unlock_write(vma->anon_vma);
+free:
vma_iter_free(&vmi);
validate_mm(mm);
return error;
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 8134e1afca68..a4addc2a8480 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -40,6 +40,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
struct vm_area_struct *next;
struct mmu_gather tlb;
PAGETABLE_MOVE(pmc, vma, vma, old_start, new_start, length);
+ int err;
BUG_ON(new_start > new_end);
@@ -55,8 +56,9 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
* cover the whole range: [new_start, old_end)
*/
vmg.target = vma;
- if (vma_expand(&vmg))
- return -ENOMEM;
+ err = vma_expand(&vmg);
+ if (err)
+ return err;
/*
* move the page tables downwards, on failure we rely on
--
2.53.0.414.gf7e9f6c205-goog
* [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock()
2026-02-26 7:06 [PATCH v3 0/3] Use killable vma write locking in most places Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 2/3] mm: replace vma_start_write() with vma_start_write_killable() Suren Baghdasaryan
@ 2026-02-26 7:06 ` Suren Baghdasaryan
2026-02-26 18:10 ` Claudio Imbrenda
2 siblings, 1 reply; 9+ messages in thread
From: Suren Baghdasaryan @ 2026-02-26 7:06 UTC (permalink / raw)
To: akpm
Cc: willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, jannh, rppt, mhocko, pfalcato, kees,
maddy, npiggin, mpe, chleroy, borntraeger, frankja, imbrenda,
hca, gor, agordeev, svens, gerald.schaefer, linux-mm,
linuxppc-dev, kvm, linux-kernel, linux-s390, surenb
Replace vma_start_write() with vma_start_write_killable() when
process_vma_walk_lock() is used with the PGWALK_WRLOCK option.
Adjust its direct and indirect users to check for a possible error
and handle it, ensuring EINTR is propagated rather than silently
ignored.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
arch/s390/kvm/kvm-s390.c | 2 +-
fs/proc/task_mmu.c | 5 ++++-
mm/mempolicy.c | 14 +++++++++++---
mm/pagewalk.c | 20 ++++++++++++++------
mm/vma.c | 22 ++++++++++++++--------
mm/vma.h | 6 ++++++
6 files changed, 50 insertions(+), 19 deletions(-)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7a175d86cef0..337e4f7db63a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2948,7 +2948,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
}
/* must be called without kvm->lock */
r = kvm_s390_handle_pv(kvm, &args);
- if (copy_to_user(argp, &args, sizeof(args))) {
+ if (r != -EINTR && copy_to_user(argp, &args, sizeof(args))) {
r = -EFAULT;
break;
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e091931d7ca1..1238a2988eb6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1797,6 +1797,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
struct clear_refs_private cp = {
.type = type,
};
+ int err;
if (mmap_write_lock_killable(mm)) {
count = -EINTR;
@@ -1824,7 +1825,9 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
0, mm, 0, -1UL);
mmu_notifier_invalidate_range_start(&range);
}
- walk_page_range(mm, 0, -1, &clear_refs_walk_ops, &cp);
+ err = walk_page_range(mm, 0, -1, &clear_refs_walk_ops, &cp);
+ if (err < 0)
+ count = err;
if (type == CLEAR_REFS_SOFT_DIRTY) {
mmu_notifier_invalidate_range_end(&range);
flush_tlb_mm(mm);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 90939f5bde02..3c8b3dfc9c56 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -988,6 +988,8 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
err = walk_page_range(mm, start, end, ops, &qp);
+ if (err == -EINTR)
+ return err;
if (!qp.first)
/* whole range in hole */
@@ -1309,9 +1311,14 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
mmap_read_unlock(mm);
+ if (nr_failed == -EINTR)
+ err = nr_failed;
+
if (!list_empty(&pagelist)) {
- err = migrate_pages(&pagelist, alloc_migration_target, NULL,
- (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL, NULL);
+ if (!err)
+ err = migrate_pages(&pagelist, alloc_migration_target,
+ NULL, (unsigned long)&mtc,
+ MIGRATE_SYNC, MR_SYSCALL, NULL);
if (err)
putback_movable_pages(&pagelist);
}
@@ -1611,7 +1618,8 @@ static long do_mbind(unsigned long start, unsigned long len,
MR_MEMPOLICY_MBIND, NULL);
}
- if (nr_failed && (flags & MPOL_MF_STRICT))
+ /* Do not mask EINTR */
+ if ((err != -EINTR) && (nr_failed && (flags & MPOL_MF_STRICT)))
err = -EIO;
if (!list_empty(&pagelist))
putback_movable_pages(&pagelist);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index a94c401ab2cf..dc9f7a7709c6 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -425,14 +425,13 @@ static inline void process_mm_walk_lock(struct mm_struct *mm,
mmap_assert_write_locked(mm);
}
-static inline void process_vma_walk_lock(struct vm_area_struct *vma,
+static inline int process_vma_walk_lock(struct vm_area_struct *vma,
enum page_walk_lock walk_lock)
{
#ifdef CONFIG_PER_VMA_LOCK
switch (walk_lock) {
case PGWALK_WRLOCK:
- vma_start_write(vma);
- break;
+ return vma_start_write_killable(vma);
case PGWALK_WRLOCK_VERIFY:
vma_assert_write_locked(vma);
break;
@@ -444,6 +443,7 @@ static inline void process_vma_walk_lock(struct vm_area_struct *vma,
break;
}
#endif
+ return 0;
}
/*
@@ -487,7 +487,9 @@ int walk_page_range_mm_unsafe(struct mm_struct *mm, unsigned long start,
if (ops->pte_hole)
err = ops->pte_hole(start, next, -1, &walk);
} else { /* inside vma */
- process_vma_walk_lock(vma, ops->walk_lock);
+ err = process_vma_walk_lock(vma, ops->walk_lock);
+ if (err)
+ break;
walk.vma = vma;
next = min(end, vma->vm_end);
vma = find_vma(mm, vma->vm_end);
@@ -704,6 +706,7 @@ int walk_page_range_vma_unsafe(struct vm_area_struct *vma, unsigned long start,
.vma = vma,
.private = private,
};
+ int err;
if (start >= end || !walk.mm)
return -EINVAL;
@@ -711,7 +714,9 @@ int walk_page_range_vma_unsafe(struct vm_area_struct *vma, unsigned long start,
return -EINVAL;
process_mm_walk_lock(walk.mm, ops->walk_lock);
- process_vma_walk_lock(vma, ops->walk_lock);
+ err = process_vma_walk_lock(vma, ops->walk_lock);
+ if (err)
+ return err;
return __walk_page_range(start, end, &walk);
}
@@ -734,6 +739,7 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
.vma = vma,
.private = private,
};
+ int err;
if (!walk.mm)
return -EINVAL;
@@ -741,7 +747,9 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
return -EINVAL;
process_mm_walk_lock(walk.mm, ops->walk_lock);
- process_vma_walk_lock(vma, ops->walk_lock);
+ err = process_vma_walk_lock(vma, ops->walk_lock);
+ if (err)
+ return err;
return __walk_page_range(vma->vm_start, vma->vm_end, &walk);
}
diff --git a/mm/vma.c b/mm/vma.c
index 9f2664f1d078..46bbad6e64a4 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -998,14 +998,18 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (anon_dup)
unlink_anon_vmas(anon_dup);
- /*
- * This means we have failed to clone anon_vma's correctly, but no
- * actual changes to VMAs have occurred, so no harm no foul - if the
- * user doesn't want this reported and instead just wants to give up on
- * the merge, allow it.
- */
- if (!vmg->give_up_on_oom)
- vmg->state = VMA_MERGE_ERROR_NOMEM;
+ if (err == -EINTR) {
+ vmg->state = VMA_MERGE_ERROR_INTR;
+ } else {
+ /*
+ * This means we have failed to clone anon_vma's correctly,
+ * but no actual changes to VMAs have occurred, so no harm no
+ * foul - if the user doesn't want this reported and instead
+ * just wants to give up on the merge, allow it.
+ */
+ if (!vmg->give_up_on_oom)
+ vmg->state = VMA_MERGE_ERROR_NOMEM;
+ }
return NULL;
}
@@ -1681,6 +1685,8 @@ static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
merged = vma_merge_existing_range(vmg);
if (merged)
return merged;
+ if (vmg_intr(vmg))
+ return ERR_PTR(-EINTR);
if (vmg_nomem(vmg))
return ERR_PTR(-ENOMEM);
diff --git a/mm/vma.h b/mm/vma.h
index eba388c61ef4..fe4560f81f4f 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -56,6 +56,7 @@ struct vma_munmap_struct {
enum vma_merge_state {
VMA_MERGE_START,
VMA_MERGE_ERROR_NOMEM,
+ VMA_MERGE_ERROR_INTR,
VMA_MERGE_NOMERGE,
VMA_MERGE_SUCCESS,
};
@@ -226,6 +227,11 @@ static inline bool vmg_nomem(struct vma_merge_struct *vmg)
return vmg->state == VMA_MERGE_ERROR_NOMEM;
}
+static inline bool vmg_intr(struct vma_merge_struct *vmg)
+{
+ return vmg->state == VMA_MERGE_ERROR_INTR;
+}
+
/* Assumes addr >= vma->vm_start. */
static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
unsigned long addr)
--
2.53.0.414.gf7e9f6c205-goog
* Re: [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand()
2026-02-26 7:06 ` [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
@ 2026-02-26 16:42 ` Liam R. Howlett
2026-02-26 17:23 ` Suren Baghdasaryan
0 siblings, 1 reply; 9+ messages in thread
From: Liam R. Howlett @ 2026-02-26 16:42 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, npache, ryan.roberts, dev.jain, baohua, lance.yang,
vbabka, jannh, rppt, mhocko, pfalcato, kees, maddy, npiggin, mpe,
chleroy, borntraeger, frankja, imbrenda, hca, gor, agordeev,
svens, gerald.schaefer, linux-mm, linuxppc-dev, kvm,
linux-kernel, linux-s390
* Suren Baghdasaryan <surenb@google.com> [260226 02:06]:
> vma_expand() error handling is a bit confusing with "if (ret) return ret;"
> mixed with "if (!ret && ...) ret = ...;". Simplify the code to check
> for errors and return immediately after an operation that might fail.
> This also makes later changes to this function more readable.
>
> No functional change intended.
>
> Suggested-by: Jann Horn <jannh@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This looks the same as v2, so I'll try again ;)
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/vma.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index be64f781a3aa..bb4d0326fecb 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -1186,12 +1186,16 @@ int vma_expand(struct vma_merge_struct *vmg)
> * Note that, by convention, callers ignore OOM for this case, so
> * we don't need to account for vmg->give_up_on_mm here.
> */
> - if (remove_next)
> + if (remove_next) {
> ret = dup_anon_vma(target, next, &anon_dup);
> - if (!ret && vmg->copied_from)
> + if (ret)
> + return ret;
> + }
> + if (vmg->copied_from) {
> ret = dup_anon_vma(target, vmg->copied_from, &anon_dup);
> - if (ret)
> - return ret;
> + if (ret)
> + return ret;
> + }
>
> if (remove_next) {
> vma_start_write(next);
> --
> 2.53.0.414.gf7e9f6c205-goog
>
>
* Re: [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand()
2026-02-26 16:42 ` Liam R. Howlett
@ 2026-02-26 17:23 ` Suren Baghdasaryan
0 siblings, 0 replies; 9+ messages in thread
From: Suren Baghdasaryan @ 2026-02-26 17:23 UTC
To: Liam R. Howlett, Suren Baghdasaryan, akpm, willy, david, ziy,
matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
ying.huang, apopple, lorenzo.stoakes, baolin.wang, npache,
ryan.roberts, dev.jain, baohua, lance.yang, vbabka, jannh, rppt,
mhocko, pfalcato, kees, maddy, npiggin, mpe, chleroy,
borntraeger, frankja, imbrenda, hca, gor, agordeev, svens,
gerald.schaefer, linux-mm, linuxppc-dev, kvm, linux-kernel,
linux-s390
On Thu, Feb 26, 2026 at 8:43 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [260226 02:06]:
> > vma_expand() error handling is a bit confusing with "if (ret) return ret;"
> > mixed with "if (!ret && ...) ret = ...;". Simplify the code to check
> > for errors and return immediately after an operation that might fail.
> > This also makes later changes to this function more readable.
> >
> > No functional change intended.
> >
> > Suggested-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> This looks the same as v2, so I'll try again ;)
Sorry, missed adding it. So again, thank you very much!
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
> > ---
> > mm/vma.c | 12 ++++++++----
> > 1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/vma.c b/mm/vma.c
> > index be64f781a3aa..bb4d0326fecb 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -1186,12 +1186,16 @@ int vma_expand(struct vma_merge_struct *vmg)
> > * Note that, by convention, callers ignore OOM for this case, so
> > * we don't need to account for vmg->give_up_on_mm here.
> > */
> > - if (remove_next)
> > + if (remove_next) {
> > ret = dup_anon_vma(target, next, &anon_dup);
> > - if (!ret && vmg->copied_from)
> > + if (ret)
> > + return ret;
> > + }
> > + if (vmg->copied_from) {
> > ret = dup_anon_vma(target, vmg->copied_from, &anon_dup);
> > - if (ret)
> > - return ret;
> > + if (ret)
> > + return ret;
> > + }
> >
> > if (remove_next) {
> > vma_start_write(next);
> > --
> > 2.53.0.414.gf7e9f6c205-goog
> >
> >
>
* Re: [PATCH v3 2/3] mm: replace vma_start_write() with vma_start_write_killable()
2026-02-26 7:06 ` [PATCH v3 2/3] mm: replace vma_start_write() with vma_start_write_killable() Suren Baghdasaryan
@ 2026-02-26 17:43 ` Liam R. Howlett
0 siblings, 0 replies; 9+ messages in thread
From: Liam R. Howlett @ 2026-02-26 17:43 UTC
To: Suren Baghdasaryan
Cc: akpm, willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, npache, ryan.roberts, dev.jain, baohua, lance.yang,
vbabka, jannh, rppt, mhocko, pfalcato, kees, maddy, npiggin, mpe,
chleroy, borntraeger, frankja, imbrenda, hca, gor, agordeev,
svens, gerald.schaefer, linux-mm, linuxppc-dev, kvm,
linux-kernel, linux-s390, Ritesh Harjani (IBM)
* Suren Baghdasaryan <surenb@google.com> [260226 02:06]:
> Now that we have vma_start_write_killable() we can replace most of the
> vma_start_write() calls with it, improving reaction time to the kill
> signal.
>
> There are several places which are left untouched by this patch:
>
> 1. free_pgtables(), because the function should free page tables even
> if a fatal signal is pending.
>
> 2. process_vma_walk_lock(), which requires changes in its callers and
> will be handled in the next patch.
>
> 3. userfaultfd code, where some paths calling vma_start_write() can
> handle EINTR and some can't without a deeper code refactoring.
>
> 4. mpol_rebind_mm(), which is used by the cpuset controller for
> migrations and operates on a remote mm. Incomplete operations here
> would result in an inconsistent cgroup state.
>
> 5. vm_flags_{set|mod|clear} require refactoring that involves moving
> vma_start_write() out of these functions and replacing it with
> vma_assert_write_locked(); callers of these functions would then
> lock the vma themselves using vma_start_write_killable() whenever
> possible.
>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> # powerpc
Some nits below, but lgtm.
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> arch/powerpc/kvm/book3s_hv_uvmem.c | 5 +-
> mm/khugepaged.c | 5 +-
> mm/madvise.c | 4 +-
> mm/memory.c | 2 +
> mm/mempolicy.c | 8 ++-
> mm/mlock.c | 21 +++++--
> mm/mprotect.c | 4 +-
> mm/mremap.c | 4 +-
> mm/vma.c | 93 +++++++++++++++++++++---------
> mm/vma_exec.c | 6 +-
> 10 files changed, 109 insertions(+), 43 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 5fbb95d90e99..0a28b48a46b8 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -410,7 +410,10 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
> ret = H_STATE;
> break;
> }
> - vma_start_write(vma);
> + if (vma_start_write_killable(vma)) {
> + ret = H_STATE;
> + break;
> + }
> /* Copy vm_flags to avoid partial modifications in ksm_madvise */
> vm_flags = vma->vm_flags;
> ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1dd3cfca610d..6c92e31ee5fb 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1141,7 +1141,10 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> if (result != SCAN_SUCCEED)
> goto out_up_write;
> /* check if the pmd is still valid */
> - vma_start_write(vma);
> + if (vma_start_write_killable(vma)) {
> + result = SCAN_FAIL;
> + goto out_up_write;
> + }
> result = check_pmd_still_valid(mm, address, pmd);
> if (result != SCAN_SUCCEED)
> goto out_up_write;
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c0370d9b4e23..ccdaea6b3b15 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -173,7 +173,9 @@ static int madvise_update_vma(vm_flags_t new_flags,
> madv_behavior->vma = vma;
>
> /* vm_flags is protected by the mmap_lock held in write mode. */
> - vma_start_write(vma);
> + if (vma_start_write_killable(vma))
> + return -EINTR;
> +
> vm_flags_reset(vma, new_flags);
> if (set_new_anon_name)
> return replace_anon_vma_name(vma, anon_name);
> diff --git a/mm/memory.c b/mm/memory.c
> index 07778814b4a8..691062154cf5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -379,6 +379,8 @@ void free_pgd_range(struct mmu_gather *tlb,
> * page tables that should be removed. This can differ from the vma mappings on
> * some archs that may have mappings that need to be removed outside the vmas.
> * Note that the prev->vm_end and next->vm_start are often used.
> + * We don't use vma_start_write_killable() because page tables should be freed
> + * even if the task is being killed.
> *
> * The vma_end differs from the pg_end when a dup_mmap() failed and the tree has
> * unrelated data to the mm_struct being torn down.
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 0e5175f1c767..90939f5bde02 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1784,7 +1784,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> return -EINVAL;
> if (end == start)
> return 0;
> - mmap_write_lock(mm);
> + if (mmap_write_lock_killable(mm))
> + return -EINTR;
> prev = vma_prev(&vmi);
> for_each_vma_range(vmi, vma, end) {
> /*
> @@ -1801,13 +1802,16 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
> err = -EOPNOTSUPP;
> break;
> }
> + if (vma_start_write_killable(vma)) {
> + err = -EINTR;
> + break;
> + }
> new = mpol_dup(old);
> if (IS_ERR(new)) {
> err = PTR_ERR(new);
> break;
> }
>
> - vma_start_write(vma);
> new->home_node = home_node;
> err = mbind_range(&vmi, vma, &prev, start, end, new);
> mpol_put(new);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 2f699c3497a5..c562c77c3ee0 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -420,7 +420,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
> * Called for mlock(), mlock2() and mlockall(), to set @vma VM_LOCKED;
> * called for munlock() and munlockall(), to clear VM_LOCKED from @vma.
> */
> -static void mlock_vma_pages_range(struct vm_area_struct *vma,
> +static int mlock_vma_pages_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long end, vm_flags_t newflags)
> {
> static const struct mm_walk_ops mlock_walk_ops = {
> @@ -441,7 +441,9 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
> */
> if (newflags & VM_LOCKED)
> newflags |= VM_IO;
> - vma_start_write(vma);
> + if (vma_start_write_killable(vma))
> + return -EINTR;
> +
> vm_flags_reset_once(vma, newflags);
>
> lru_add_drain();
> @@ -452,6 +454,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
> newflags &= ~VM_IO;
> vm_flags_reset_once(vma, newflags);
> }
> + return 0;
> }
>
> /*
> @@ -501,10 +504,12 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
> */
> if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) {
> /* No work to do, and mlocking twice would be wrong */
> - vma_start_write(vma);
> + ret = vma_start_write_killable(vma);
> + if (ret)
> + goto out;
> vm_flags_reset(vma, newflags);
> } else {
> - mlock_vma_pages_range(vma, start, end, newflags);
> + ret = mlock_vma_pages_range(vma, start, end, newflags);
> }
> out:
> *prev = vma;
> @@ -733,9 +738,13 @@ static int apply_mlockall_flags(int flags)
>
> error = mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
> newflags);
> - /* Ignore errors, but prev needs fixing up. */
> - if (error)
> + /* Ignore errors except EINTR, but prev needs fixing up. */
> + if (error) {
> + if (error == -EINTR)
> + return error;
> +
> prev = vma;
> + }
> cond_resched();
> }
> out:
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index c0571445bef7..49dbb7156936 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -765,7 +765,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
> * vm_flags and vm_page_prot are protected by the mmap_lock
> * held in write mode.
> */
> - vma_start_write(vma);
> + error = vma_start_write_killable(vma);
> + if (error < 0)
> + goto fail;
> vm_flags_reset_once(vma, newflags);
> if (vma_wants_manual_pte_write_upgrade(vma))
> mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 2be876a70cc0..aef1e5f373c7 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -1286,7 +1286,9 @@ static unsigned long move_vma(struct vma_remap_struct *vrm)
> return -ENOMEM;
>
> /* We don't want racing faults. */
> - vma_start_write(vrm->vma);
> + err = vma_start_write_killable(vrm->vma);
> + if (err)
> + return err;
>
> /* Perform copy step. */
> err = copy_vma_and_data(vrm, &new_vma);
> diff --git a/mm/vma.c b/mm/vma.c
> index bb4d0326fecb..9f2664f1d078 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -530,6 +530,13 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> if (err)
> goto out_free_vmi;
>
> + err = vma_start_write_killable(vma);
> + if (err)
> + goto out_free_mpol;
> + err = vma_start_write_killable(new);
> + if (err)
> + goto out_free_mpol;
> +
> err = anon_vma_clone(new, vma, VMA_OP_SPLIT);
> if (err)
> goto out_free_mpol;
> @@ -540,9 +547,6 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> if (new->vm_ops && new->vm_ops->open)
> new->vm_ops->open(new);
>
> - vma_start_write(vma);
> - vma_start_write(new);
> -
> init_vma_prep(&vp, vma);
> vp.insert = new;
> vma_prepare(&vp);
> @@ -895,16 +899,22 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
> }
>
> /* No matter what happens, we will be adjusting middle. */
> - vma_start_write(middle);
> + err = vma_start_write_killable(middle);
> + if (err)
> + goto abort;
>
> if (merge_right) {
> - vma_start_write(next);
> + err = vma_start_write_killable(next);
> + if (err)
> + goto abort;
> vmg->target = next;
> sticky_flags |= (next->vm_flags & VM_STICKY);
> }
>
> if (merge_left) {
> - vma_start_write(prev);
> + err = vma_start_write_killable(prev);
> + if (err)
> + goto abort;
> vmg->target = prev;
> sticky_flags |= (prev->vm_flags & VM_STICKY);
> }
> @@ -1155,10 +1165,12 @@ int vma_expand(struct vma_merge_struct *vmg)
> struct vm_area_struct *next = vmg->next;
> bool remove_next = false;
> vm_flags_t sticky_flags;
> - int ret = 0;
> + int ret;
>
> mmap_assert_write_locked(vmg->mm);
> - vma_start_write(target);
> + ret = vma_start_write_killable(target);
> + if (ret)
> + return ret;
>
> if (next && target != next && vmg->end == next->vm_end)
> remove_next = true;
> @@ -1187,6 +1199,9 @@ int vma_expand(struct vma_merge_struct *vmg)
> * we don't need to account for vmg->give_up_on_mm here.
> */
> if (remove_next) {
> + ret = vma_start_write_killable(next);
> + if (ret)
> + return ret;
> ret = dup_anon_vma(target, next, &anon_dup);
> if (ret)
> return ret;
> @@ -1197,10 +1212,8 @@ int vma_expand(struct vma_merge_struct *vmg)
> return ret;
> }
>
> - if (remove_next) {
> - vma_start_write(next);
> + if (remove_next)
> vmg->__remove_next = true;
> - }
> if (commit_merge(vmg))
> goto nomem;
>
> @@ -1233,6 +1246,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long start, unsigned long end, pgoff_t pgoff)
> {
> struct vma_prepare vp;
> + int err;
>
> WARN_ON((vma->vm_start != start) && (vma->vm_end != end));
>
> @@ -1244,7 +1258,11 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> if (vma_iter_prealloc(vmi, NULL))
> return -ENOMEM;
>
> - vma_start_write(vma);
> + err = vma_start_write_killable(vma);
> + if (err) {
> + vma_iter_free(vmi);
> + return err;
> + }
Could avoid allocating here by reordering the lock, but this is fine.
>
> init_vma_prep(&vp, vma);
> vma_prepare(&vp);
> @@ -1434,7 +1452,9 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> if (error)
> goto end_split_failed;
> }
> - vma_start_write(next);
> + error = vma_start_write_killable(next);
> + if (error)
> + goto munmap_gather_failed;
> mas_set(mas_detach, vms->vma_count++);
> error = mas_store_gfp(mas_detach, next, GFP_KERNEL);
> if (error)
> @@ -1828,12 +1848,17 @@ static void vma_link_file(struct vm_area_struct *vma, bool hold_rmap_lock)
> static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
> {
> VMA_ITERATOR(vmi, mm, 0);
> + int err;
>
> vma_iter_config(&vmi, vma->vm_start, vma->vm_end);
> if (vma_iter_prealloc(&vmi, vma))
> return -ENOMEM;
>
> - vma_start_write(vma);
> + err = vma_start_write_killable(vma);
> + if (err) {
> + vma_iter_free(&vmi);
> + return err;
> + }
Ditto here, ordering would mean no freeing.
> vma_iter_store_new(&vmi, vma);
> vma_link_file(vma, /* hold_rmap_lock= */false);
> mm->map_count++;
> @@ -2215,9 +2240,8 @@ int mm_take_all_locks(struct mm_struct *mm)
> * is reached.
> */
> for_each_vma(vmi, vma) {
> - if (signal_pending(current))
> + if (signal_pending(current) || vma_start_write_killable(vma))
> goto out_unlock;
> - vma_start_write(vma);
> }
>
> vma_iter_init(&vmi, mm, 0);
> @@ -2522,6 +2546,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> if (!vma)
> return -ENOMEM;
>
> + /* Lock the VMA since it is modified after insertion into VMA tree */
> + error = vma_start_write_killable(vma);
> + if (error)
> + goto free_vma;
> +
There's no way this is going to fail, right?
> vma_iter_config(vmi, map->addr, map->end);
> vma_set_range(vma, map->addr, map->end, map->pgoff);
> vm_flags_init(vma, map->vm_flags);
> @@ -2552,8 +2581,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> WARN_ON_ONCE(!arch_validate_flags(map->vm_flags));
> #endif
>
> - /* Lock the VMA since it is modified after insertion into VMA tree */
> - vma_start_write(vma);
> vma_iter_store_new(vmi, vma);
> map->mm->map_count++;
> vma_link_file(vma, map->hold_file_rmap_lock);
> @@ -2864,6 +2891,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long addr, unsigned long len, vm_flags_t vm_flags)
> {
> struct mm_struct *mm = current->mm;
> + int err = -ENOMEM;
>
> /*
> * Check against address space limits by the changed size
> @@ -2908,7 +2936,10 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> vma_set_range(vma, addr, addr + len, addr >> PAGE_SHIFT);
> vm_flags_init(vma, vm_flags);
> vma->vm_page_prot = vm_get_page_prot(vm_flags);
> - vma_start_write(vma);
> + if (vma_start_write_killable(vma)) {
> + err = -EINTR;
> + goto mas_store_fail;
> + }
I'd rather have another label saying write lock failed. Really, this
will never fail though (besides syzbot..)
> if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
> goto mas_store_fail;
>
> @@ -2928,7 +2959,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> vm_area_free(vma);
> unacct_fail:
> vm_unacct_memory(len >> PAGE_SHIFT);
> - return -ENOMEM;
> + return err;
> }
>
> /**
> @@ -3089,7 +3120,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *next;
> unsigned long gap_addr;
> - int error = 0;
> + int error;
> VMA_ITERATOR(vmi, mm, vma->vm_start);
>
> if (!(vma->vm_flags & VM_GROWSUP))
> @@ -3126,12 +3157,14 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>
> /* We must make sure the anon_vma is allocated. */
> if (unlikely(anon_vma_prepare(vma))) {
> - vma_iter_free(&vmi);
> - return -ENOMEM;
> + error = -ENOMEM;
> + goto free;
> }
>
> /* Lock the VMA before expanding to prevent concurrent page faults */
> - vma_start_write(vma);
> + error = vma_start_write_killable(vma);
> + if (error)
> + goto free;
> /* We update the anon VMA tree. */
> anon_vma_lock_write(vma->anon_vma);
>
> @@ -3160,6 +3193,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> }
> }
> anon_vma_unlock_write(vma->anon_vma);
> +free:
> vma_iter_free(&vmi);
> validate_mm(mm);
> return error;
> @@ -3174,7 +3208,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *prev;
> - int error = 0;
> + int error;
> VMA_ITERATOR(vmi, mm, vma->vm_start);
>
> if (!(vma->vm_flags & VM_GROWSDOWN))
> @@ -3205,12 +3239,14 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>
> /* We must make sure the anon_vma is allocated. */
> if (unlikely(anon_vma_prepare(vma))) {
> - vma_iter_free(&vmi);
> - return -ENOMEM;
> + error = -ENOMEM;
> + goto free;
> }
>
> /* Lock the VMA before expanding to prevent concurrent page faults */
> - vma_start_write(vma);
> + error = vma_start_write_killable(vma);
> + if (error)
> + goto free;
> /* We update the anon VMA tree. */
> anon_vma_lock_write(vma->anon_vma);
>
> @@ -3240,6 +3276,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> }
> }
> anon_vma_unlock_write(vma->anon_vma);
> +free:
> vma_iter_free(&vmi);
> validate_mm(mm);
> return error;
> diff --git a/mm/vma_exec.c b/mm/vma_exec.c
> index 8134e1afca68..a4addc2a8480 100644
> --- a/mm/vma_exec.c
> +++ b/mm/vma_exec.c
> @@ -40,6 +40,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> struct vm_area_struct *next;
> struct mmu_gather tlb;
> PAGETABLE_MOVE(pmc, vma, vma, old_start, new_start, length);
> + int err;
>
> BUG_ON(new_start > new_end);
>
> @@ -55,8 +56,9 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> * cover the whole range: [new_start, old_end)
> */
> vmg.target = vma;
> - if (vma_expand(&vmg))
> - return -ENOMEM;
> + err = vma_expand(&vmg);
> + if (err)
> + return err;
>
> /*
> * move the page tables downwards, on failure we rely on
> --
> 2.53.0.414.gf7e9f6c205-goog
>
>
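[Editor's note: for illustration, the reordering Liam suggests for vma_link() (and similarly vma_shrink()) would take the killable lock before the iterator preallocation, so the failure path has nothing to free. This is a sketch only, not part of the posted series:]

```
	vma_iter_config(&vmi, vma->vm_start, vma->vm_end);

	err = vma_start_write_killable(vma);
	if (err)
		return err;

	if (vma_iter_prealloc(&vmi, vma))
		return -ENOMEM;

	vma_iter_store_new(&vmi, vma);
```

One consequence of this ordering is that the VMA stays write-locked for the rest of the mmap write critical section even when the prealloc fails; that is harmless, but slightly wider than the posted ordering's lock window.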
* Re: [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock()
2026-02-26 7:06 ` [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock() Suren Baghdasaryan
@ 2026-02-26 18:10 ` Claudio Imbrenda
2026-02-26 18:24 ` Suren Baghdasaryan
0 siblings, 1 reply; 9+ messages in thread
From: Claudio Imbrenda @ 2026-02-26 18:10 UTC
To: Suren Baghdasaryan
Cc: akpm, willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, jannh, rppt, mhocko, pfalcato, kees,
maddy, npiggin, mpe, chleroy, borntraeger, frankja, hca, gor,
agordeev, svens, gerald.schaefer, linux-mm, linuxppc-dev, kvm,
linux-kernel, linux-s390
On Wed, 25 Feb 2026 23:06:09 -0800
Suren Baghdasaryan <surenb@google.com> wrote:
> Replace vma_start_write() with vma_start_write_killable() when
> process_vma_walk_lock() is used with PGWALK_WRLOCK option.
> Adjust its direct and indirect users to check for a possible error
> and handle it. Ensure users handle EINTR correctly and do not ignore
> it.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> arch/s390/kvm/kvm-s390.c | 2 +-
> fs/proc/task_mmu.c | 5 ++++-
> mm/mempolicy.c | 14 +++++++++++---
> mm/pagewalk.c | 20 ++++++++++++++------
> mm/vma.c | 22 ++++++++++++++--------
> mm/vma.h | 6 ++++++
> 6 files changed, 50 insertions(+), 19 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 7a175d86cef0..337e4f7db63a 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2948,7 +2948,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> }
> /* must be called without kvm->lock */
> r = kvm_s390_handle_pv(kvm, &args);
> - if (copy_to_user(argp, &args, sizeof(args))) {
> + if (r != -EINTR && copy_to_user(argp, &args, sizeof(args))) {
> r = -EFAULT;
> break;
> }
can you very briefly explain how we can end up with -EINTR here?
do I understand correctly that -EINTR is possible here only if the
process is being killed?
[...]
* Re: [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock()
2026-02-26 18:10 ` Claudio Imbrenda
@ 2026-02-26 18:24 ` Suren Baghdasaryan
0 siblings, 0 replies; 9+ messages in thread
From: Suren Baghdasaryan @ 2026-02-26 18:24 UTC
To: Claudio Imbrenda
Cc: akpm, willy, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, lorenzo.stoakes,
baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, jannh, rppt, mhocko, pfalcato, kees,
maddy, npiggin, mpe, chleroy, borntraeger, frankja, hca, gor,
agordeev, svens, gerald.schaefer, linux-mm, linuxppc-dev, kvm,
linux-kernel, linux-s390
On Thu, Feb 26, 2026 at 10:10 AM Claudio Imbrenda
<imbrenda@linux.ibm.com> wrote:
>
> On Wed, 25 Feb 2026 23:06:09 -0800
> Suren Baghdasaryan <surenb@google.com> wrote:
>
> > Replace vma_start_write() with vma_start_write_killable() when
> > process_vma_walk_lock() is used with PGWALK_WRLOCK option.
> > Adjust its direct and indirect users to check for a possible error
> > and handle it. Ensure users handle EINTR correctly and do not ignore
> > it.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > arch/s390/kvm/kvm-s390.c | 2 +-
> > fs/proc/task_mmu.c | 5 ++++-
> > mm/mempolicy.c | 14 +++++++++++---
> > mm/pagewalk.c | 20 ++++++++++++++------
> > mm/vma.c | 22 ++++++++++++++--------
> > mm/vma.h | 6 ++++++
> > 6 files changed, 50 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 7a175d86cef0..337e4f7db63a 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -2948,7 +2948,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> > }
> > /* must be called without kvm->lock */
> > r = kvm_s390_handle_pv(kvm, &args);
> > - if (copy_to_user(argp, &args, sizeof(args))) {
> > + if (r != -EINTR && copy_to_user(argp, &args, sizeof(args))) {
> > r = -EFAULT;
> > break;
> > }
>
> can you very briefly explain how we can end up with -EINTR here?
>
> do I understand correctly that -EINTR is possible here only if the
> process is being killed?
Correct, it would happen if the process has a pending fatal signal
(like SIGKILL) in its signal queue.
>
> [...]
end of thread [~2026-02-26 18:25 UTC]
Thread overview: 9+ messages
2026-02-26 7:06 [PATCH v3 0/3] Use killable vma write locking in most places Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
2026-02-26 16:42 ` Liam R. Howlett
2026-02-26 17:23 ` Suren Baghdasaryan
2026-02-26 7:06 ` [PATCH v3 2/3] mm: replace vma_start_write() with vma_start_write_killable() Suren Baghdasaryan
2026-02-26 17:43 ` Liam R. Howlett
2026-02-26 7:06 ` [PATCH v3 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock() Suren Baghdasaryan
2026-02-26 18:10 ` Claudio Imbrenda
2026-02-26 18:24 ` Suren Baghdasaryan