Re: [PATCH v2 2/3] mm: replace vma_start_write() with vma_start_write_killable()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, willy@infradead.org, david@kernel.org,
	ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	lorenzo.stoakes@oracle.com, baolin.wang@linux.alibaba.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, lance.yang@linux.dev, vbabka@suse.cz,
	jannh@google.com, rppt@kernel.org, mhocko@suse.com,
	pfalcato@suse.de, kees@kernel.org, maddy@linux.ibm.com,
	npiggin@gmail.com, mpe@ellerman.id.au, chleroy@kernel.org,
	borntraeger@linux.ibm.com, frankja@linux.ibm.com,
	imbrenda@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
	agordeev@linux.ibm.com, svens@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, linux-mm@kvack.org,
	linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Subject: Re: [PATCH v2 2/3] mm: replace vma_start_write() with vma_start_write_killable()
Date: Tue, 17 Feb 2026 14:19:04 -0500	[thread overview]
Message-ID: <dtdfrko7uqif6flc4mefnlar7wnmrbyswfu7bvb2ar24gkeejo@ypzhmyklbeh7> (raw)
In-Reply-To: <20260217163250.2326001-3-surenb@google.com>

* Suren Baghdasaryan <surenb@google.com> [260217 11:33]:
> Now that we have vma_start_write_killable() we can replace most of the
> vma_start_write() calls with it, improving reaction time to the kill
> signal.
> 
> There are several places which are left untouched by this patch:
> 
> 1. free_pgtables() because function should free page tables even if a
> fatal signal is pending.
> 
> 2. process_vma_walk_lock(), which requires changes in its callers and
> will be handled in the next patch.
> 
> 3. userfaultd code, where some paths calling vma_start_write() can
> handle EINTR and some can't without a deeper code refactoring.
> 
> 4. vm_flags_{set|mod|clear} require refactoring that involves moving
> vma_start_write() out of these functions and replacing it with
> vma_assert_write_locked(), then callers of these functions should
> lock the vma themselves using vma_start_write_killable() whenever
> possible.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> # powerpc
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c |  5 +-
>  include/linux/mempolicy.h          |  5 +-
>  mm/khugepaged.c                    |  5 +-
>  mm/madvise.c                       |  4 +-
>  mm/memory.c                        |  2 +
>  mm/mempolicy.c                     | 23 ++++++--
>  mm/mlock.c                         | 20 +++++--
>  mm/mprotect.c                      |  4 +-
>  mm/mremap.c                        |  4 +-
>  mm/vma.c                           | 93 +++++++++++++++++++++---------
>  mm/vma_exec.c                      |  6 +-
>  11 files changed, 123 insertions(+), 48 deletions(-)
> 

...

> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c

...

>  
>  static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
> @@ -1785,9 +1793,15 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
>  		return -EINVAL;
>  	if (end == start)
>  		return 0;
> -	mmap_write_lock(mm);
> +	if (mmap_write_lock_killable(mm))
> +		return -EINTR;
>  	prev = vma_prev(&vmi);
>  	for_each_vma_range(vmi, vma, end) {
> +		if (vma_start_write_killable(vma)) {
> +			err = -EINTR;
> +			break;
> +		}
> +
>  		/*
>  		 * If any vma in the range got policy other than MPOL_BIND
>  		 * or MPOL_PREFERRED_MANY we return error. We don't reset
> @@ -1808,7 +1822,6 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
>  			break;
>  		}
>  
> -		vma_start_write(vma);

Moving this vma_start_write() up means we will lock all vmas in the
range regardless of if they are going to change.  Was that your
intention?

It might be better to move the locking to later in the loop, prior to
the mpol_dup(), but after skipping other vmas?

>  		new->home_node = home_node;
>  		err = mbind_range(&vmi, vma, &prev, start, end, new);

...

> diff --git a/mm/vma.c b/mm/vma.c
> index bb4d0326fecb..1d21351282cf 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c

...

> @@ -2532,6 +2556,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>  		goto free_vma;
>  	}
>  
> +	/* Lock the VMA since it is modified after insertion into VMA tree */
> +	error = vma_start_write_killable(vma);
> +	if (error)
> +		goto free_iter_vma;
> +
>  	if (map->file)
>  		error = __mmap_new_file_vma(map, vma);
>  	else if (map->vm_flags & VM_SHARED)
> @@ -2552,8 +2581,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>  	WARN_ON_ONCE(!arch_validate_flags(map->vm_flags));
>  #endif
>  
> -	/* Lock the VMA since it is modified after insertion into VMA tree */
> -	vma_start_write(vma);
>  	vma_iter_store_new(vmi, vma);
>  	map->mm->map_count++;
>  	vma_link_file(vma, map->hold_file_rmap_lock);

This is a bit of a nit on the placement..

Prior to this change, the write lock on the vma was taken next to where
it was inserted in the tree.  Now the lock is taken between vma iterator
preallocations and part of the vma setup.

Would it make sense to put it closer to the vma allocation itself?  I
think all that's needed to be set is the mm struct for the locking to
work?


...

> @@ -3089,7 +3120,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)

Good luck testing this one.

>  	struct mm_struct *mm = vma->vm_mm;
>  	struct vm_area_struct *next;
>  	unsigned long gap_addr;
> -	int error = 0;
> +	int error;
>  	VMA_ITERATOR(vmi, mm, vma->vm_start);
>  
>  	if (!(vma->vm_flags & VM_GROWSUP))
> @@ -3126,12 +3157,14 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  
>  	/* We must make sure the anon_vma is allocated. */
>  	if (unlikely(anon_vma_prepare(vma))) {
> -		vma_iter_free(&vmi);
> -		return -ENOMEM;
> +		error = -ENOMEM;
> +		goto free;
>  	}
>  
>  	/* Lock the VMA before expanding to prevent concurrent page faults */
> -	vma_start_write(vma);
> +	error = vma_start_write_killable(vma);
> +	if (error)
> +		goto free;
>  	/* We update the anon VMA tree. */
>  	anon_vma_lock_write(vma->anon_vma);
>  
> @@ -3160,6 +3193,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> +free:
>  	vma_iter_free(&vmi);
>  	validate_mm(mm);
>  	return error;

Looks okay.

...

next prev parent reply	other threads:[~2026-02-17 19:19 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-17 16:32 [PATCH v2 0/3] Use killable vma write locking in most places Suren Baghdasaryan
2026-02-17 16:32 ` [PATCH v2 1/3] mm/vma: cleanup error handling path in vma_expand() Suren Baghdasaryan
2026-02-17 18:26   ` Liam R. Howlett
2026-02-17 16:32 ` [PATCH v2 2/3] mm: replace vma_start_write() with vma_start_write_killable() Suren Baghdasaryan
2026-02-17 19:19   ` Liam R. Howlett [this message]
2026-02-17 21:02     ` Suren Baghdasaryan
2026-02-18 16:46       ` Liam R. Howlett
2026-02-18 23:40         ` Suren Baghdasaryan
2026-02-17 16:32 ` [PATCH v2 3/3] mm: use vma_start_write_killable() in process_vma_walk_lock() Suren Baghdasaryan
2026-02-17 19:15   ` Heiko Carstens
2026-02-17 20:31     ` Suren Baghdasaryan
2026-02-18  7:10       ` Heiko Carstens
2026-02-18 13:07       ` Matthew Wilcox
2026-02-18 15:52         ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dtdfrko7uqif6flc4mefnlar7wnmrbyswfu7bvb2ar24gkeejo@ypzhmyklbeh7 \
    --to=liam.howlett@oracle.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=byungchul@sk.com \
    --cc=chleroy@kernel.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=frankja@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=gourry@gourry.net \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jannh@google.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kees@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=lance.yang@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=maddy@linux.ibm.com \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=mpe@ellerman.id.au \
    --cc=npache@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pfalcato@suse.de \
    --cc=rakie.kim@sk.com \
    --cc=ritesh.list@gmail.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=surenb@google.com \
    --cc=svens@linux.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox