From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: remove unnecessary page_table_lock on stack expansion
Date: Tue, 5 Nov 2024 10:07:03 -0500
Message-ID: <y7kqvhytmmcxw5kp4pcncr7i2y5rozj2tuqzttqkavlalx3bo5@7jufqeyyntor>
In-Reply-To: <20241101184627.131391-1-lorenzo.stoakes@oracle.com>
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [241101 14:46]:
> Ever since commit 8d7071af8907 ("mm: always expand the stack with the mmap
> write lock held") we have been expanding the stack with the mmap write lock
> held.
>
> This is true in all code paths:
>
> get_arg_page()
> -> expand_downwards()
>
> setup_arg_pages()
> -> expand_stack_locked()
>    -> expand_downwards() / expand_upwards()
>
> lock_mm_and_find_vma()
> -> expand_stack_locked()
>    -> expand_downwards() / expand_upwards()
>
> create_elf_tables()
> -> find_extend_vma_locked()
>    -> expand_stack_locked()
>
> expand_stack()
> -> vma_expand_down()
>    -> expand_downwards()
>
> expand_stack()
> -> vma_expand_up()
>    -> expand_upwards()
>
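The third path here, via lock_mm_and_find_vma(), is the one the generic page
fault handlers take, and it is the one that historically ran with only the
read lock.  A rough sketch of its shape since 8d7071af8907, from my reading
of mm/memory.c - the *_carefully helpers, the trylock-upgrade fast path and
most error handling are elided, so treat the exact calls as approximate:

	/* Simplified sketch only - not the exact upstream code. */
	struct vm_area_struct *lock_mm_and_find_vma_sketch(struct mm_struct *mm,
							    unsigned long addr,
							    struct pt_regs *regs)
	{
		struct vm_area_struct *vma;

		mmap_read_lock(mm);		/* common case: read lock only */
		vma = find_vma(mm, addr);
		if (vma && vma->vm_start <= addr)
			return vma;		/* no expansion needed */

		if (!vma || !(vma->vm_flags & VM_GROWSDOWN)) {
			mmap_read_unlock(mm);
			return NULL;
		}

		/* Stack expansion needed: take the mmap write lock first. */
		mmap_read_unlock(mm);
		mmap_write_lock(mm);
		vma = find_vma(mm, addr);	/* re-lookup after relocking */
		if (!vma || !(vma->vm_flags & VM_GROWSDOWN))
			goto fail;

		if (expand_stack_locked(vma, addr))	/* write lock held here */
			goto fail;

		mmap_write_downgrade(mm);	/* fault continues under read lock */
		return vma;

	fail:
		mmap_write_unlock(mm);
		return NULL;
	}

So by the time expand_downwards()/expand_upwards() run in this path, the mmap
write lock is guaranteed, which is exactly what the new
mmap_assert_write_locked() calls below document.
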
> Each of these acquires the mmap write lock before doing so. Despite this,
> expand_upwards() and expand_downwards() still acquire mm->page_table_lock,
> with comments stating that we hold only a shared mmap lock and that the
> spinlock is therefore necessary.
>
> It is not: we do not have to worry about concurrent VMA expansions, so we
> can simply drop the spinlock and update the comments accordingly.
>
> We do not even need to be concerned with racing page faults, as
> vma_start_write() is invoked in both cases.
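
Right - and to spell out why racing page faults are covered: vma_start_write()
waits out any fault currently holding the per-VMA lock and then marks the VMA
write-locked, after which the lockless fault path refuses the VMA and falls
back to taking the mmap lock, which we hold for write.  A very rough sketch of
the two sides, simplified from include/linux/mm.h of this era - field names
and the already-locked/recheck details are approximate:

	/* Writer side, called with the mmap write lock held (as in this patch): */
	static void vma_start_write_sketch(struct vm_area_struct *vma)
	{
		/* Wait for any fault currently holding the per-VMA read lock... */
		down_write(&vma->vm_lock->lock);
		/* ...then mark the VMA as write-locked for this mmap_lock cycle. */
		WRITE_ONCE(vma->vm_lock_seq, vma->vm_mm->mm_lock_seq);
		up_write(&vma->vm_lock->lock);
	}

	/* Fault side (under RCU, no mmap lock): refuse a write-locked VMA. */
	static bool vma_start_read_sketch(struct vm_area_struct *vma)
	{
		if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
			return false;		/* fall back to the mmap_lock path */

		if (!down_read_trylock(&vma->vm_lock->lock))
			return false;

		/* Re-check after taking the lock in case we raced with the writer. */
		if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) {
			up_read(&vma->vm_lock->lock);
			return false;
		}
		return true;
	}

So with the mmap write lock held and the VMA write-locked, nothing can observe
the vm_start/vm_end update mid-flight, and mm->page_table_lock really is
redundant here.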
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
> mm/mmap.c | 38 ++++++--------------------------------
> 1 file changed, 6 insertions(+), 32 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f904b3bba962..386429f7db5a 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1039,6 +1039,8 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> if (!(vma->vm_flags & VM_GROWSUP))
> return -EFAULT;
>
> + mmap_assert_write_locked(mm);
> +
> /* Guard against exceeding limits of the address space. */
> address &= PAGE_MASK;
> if (address >= (TASK_SIZE & PAGE_MASK))
> @@ -1074,11 +1076,7 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>
> /* Lock the VMA before expanding to prevent concurrent page faults */
> vma_start_write(vma);
> - /*
> - * vma->vm_start/vm_end cannot change under us because the caller
> - * is required to hold the mmap_lock in read mode. We need the
> - * anon_vma lock to serialize against concurrent expand_stacks.
> - */
> + /* We update the anon VMA tree. */
> anon_vma_lock_write(vma->anon_vma);
>
> /* Somebody else might have raced and expanded it already */
> @@ -1092,16 +1090,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> if (vma->vm_pgoff + (size >> PAGE_SHIFT) >= vma->vm_pgoff) {
> error = acct_stack_growth(vma, size, grow);
> if (!error) {
> - /*
> - * We only hold a shared mmap_lock lock here, so
> - * we need to protect against concurrent vma
> - * expansions. anon_vma_lock_write() doesn't
> - * help here, as we don't guarantee that all
> - * growable vmas in a mm share the same root
> - * anon vma. So, we reuse mm->page_table_lock
> - * to guard against concurrent vma expansions.
> - */
> - spin_lock(&mm->page_table_lock);
> if (vma->vm_flags & VM_LOCKED)
> mm->locked_vm += grow;
> vm_stat_account(mm, vma->vm_flags, grow);
> @@ -1110,7 +1098,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> /* Overwrite old entry in mtree. */
> vma_iter_store(&vmi, vma);
> anon_vma_interval_tree_post_update_vma(vma);
> - spin_unlock(&mm->page_table_lock);
>
> perf_event_mmap(vma);
> }
> @@ -1137,6 +1124,8 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> if (!(vma->vm_flags & VM_GROWSDOWN))
> return -EFAULT;
>
> + mmap_assert_write_locked(mm);
> +
> address &= PAGE_MASK;
> if (address < mmap_min_addr || address < FIRST_USER_ADDRESS)
> return -EPERM;
> @@ -1166,11 +1155,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>
> /* Lock the VMA before expanding to prevent concurrent page faults */
> vma_start_write(vma);
> - /*
> - * vma->vm_start/vm_end cannot change under us because the caller
> - * is required to hold the mmap_lock in read mode. We need the
> - * anon_vma lock to serialize against concurrent expand_stacks.
> - */
> + /* We update the anon VMA tree. */
> anon_vma_lock_write(vma->anon_vma);
>
> /* Somebody else might have raced and expanded it already */
> @@ -1184,16 +1169,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> if (grow <= vma->vm_pgoff) {
> error = acct_stack_growth(vma, size, grow);
> if (!error) {
> - /*
> - * We only hold a shared mmap_lock lock here, so
> - * we need to protect against concurrent vma
> - * expansions. anon_vma_lock_write() doesn't
> - * help here, as we don't guarantee that all
> - * growable vmas in a mm share the same root
> - * anon vma. So, we reuse mm->page_table_lock
> - * to guard against concurrent vma expansions.
> - */
> - spin_lock(&mm->page_table_lock);
> if (vma->vm_flags & VM_LOCKED)
> mm->locked_vm += grow;
> vm_stat_account(mm, vma->vm_flags, grow);
> @@ -1203,7 +1178,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> /* Overwrite old entry in mtree. */
> vma_iter_store(&vmi, vma);
> anon_vma_interval_tree_post_update_vma(vma);
> - spin_unlock(&mm->page_table_lock);
>
> perf_event_mmap(vma);
> }
> --
> 2.47.0
>
Thread overview: 5+ messages
2024-11-01 18:46 Lorenzo Stoakes
2024-11-01 18:59 ` Linus Torvalds
2024-11-01 20:34 ` Jann Horn
2024-11-05 9:17 ` Vlastimil Babka
2024-11-05 15:07 ` Liam R. Howlett [this message]