From: Jane Chu <jane.chu@oracle.com>
To: Miaohe Lin <linmiaohe@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, brauner@kernel.org, oleg@redhat.com,
	tandersen@netflix.com, mjguzik@gmail.com, willy@infradead.org,
	kent.overstreet@linux.dev, zhangpeng.00@bytedance.com,
	hca@linux.ibm.com, mike.kravetz@oracle.com,
	muchun.song@linux.dev, thorvald@google.com,
	Liam.Howlett@Oracle.com
Subject: Re: [PATCH] fork: defer linking file vma until vma is fully initialized
Date: Mon, 15 Apr 2024 16:32:28 -0700
Message-ID: <28976a8e-678e-4cfa-8748-e566c9c29053@oracle.com>
In-Reply-To: <20240410091441.3539905-1-linmiaohe@huawei.com>

On 4/10/2024 2:14 AM, Miaohe Lin wrote:

> Thorvald reported a WARNING [1]. The root cause is the following race:
>
>   CPU 1					CPU 2
>   fork					hugetlbfs_fallocate
>    dup_mmap				 hugetlbfs_punch_hole
>     i_mmap_lock_write(mapping);
>     vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
>     i_mmap_unlock_write(mapping);
>     hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
> 					 i_mmap_lock_write(mapping);
>     					 hugetlb_vmdelete_list
> 					  vma_interval_tree_foreach
> 					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
>     tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
> 					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
> 					 i_mmap_unlock_write(mapping);
>
> hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside the
> i_mmap_rwsem lock, while the vma lock can be used at the same time. Fix this
> by deferring linking the file vma until the vma is fully initialized. A vma
> should be fully initialized before it can be used.
>
> Reported-by: Thorvald Natvig <thorvald@google.com>
> Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
> Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   kernel/fork.c | 33 +++++++++++++++++----------------
>   1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 84de5faa8c9a..99076dbe27d8 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   		} else if (anon_vma_fork(tmp, mpnt))
>   			goto fail_nomem_anon_vma_fork;
>   		vm_flags_clear(tmp, VM_LOCKED_MASK);
> +		/*
> +		 * Copy/update hugetlb private vma information.
> +		 */
> +		if (is_vm_hugetlb_page(tmp))
> +			hugetlb_dup_vma_private(tmp);
> +
> +		/*
> +		 * Link the vma into the MT. After using __mt_dup(), memory
> +		 * allocation is not necessary here, so it cannot fail.
> +		 */
> +		vma_iter_bulk_store(&vmi, tmp);
> +
> +		mm->map_count++;
> +
> +		if (tmp->vm_ops && tmp->vm_ops->open)
> +			tmp->vm_ops->open(tmp);
> +
>   		file = tmp->vm_file;
>   		if (file) {
>   			struct address_space *mapping = file->f_mapping;
> @@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   			i_mmap_unlock_write(mapping);
>   		}
>   
> -		/*
> -		 * Copy/update hugetlb private vma information.
> -		 */
> -		if (is_vm_hugetlb_page(tmp))
> -			hugetlb_dup_vma_private(tmp);
> -
> -		/*
> -		 * Link the vma into the MT. After using __mt_dup(), memory
> -		 * allocation is not necessary here, so it cannot fail.
> -		 */
> -		vma_iter_bulk_store(&vmi, tmp);
> -
> -		mm->map_count++;
>   		if (!(tmp->vm_flags & VM_WIPEONFORK))
>   			retval = copy_page_range(tmp, mpnt);
>   
> -		if (tmp->vm_ops && tmp->vm_ops->open)
> -			tmp->vm_ops->open(tmp);
> -
>   		if (retval) {
>   			mpnt = vma_next(&vmi);
>   			goto loop_out;

Looks good.

Reviewed-by: Jane Chu <jane.chu@oracle.com>

-jane
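
For readers following the thread, here is a minimal userspace sketch of the
ordering the patch enforces: finish all private setup of an object first, and
only then link it into a structure that other threads walk under a lock. This
is not kernel code. Every name in it (struct obj, struct registry, obj_init,
registry_publish, registry_count_uninitialized, demo) is hypothetical; the
registry stands in for the i_mmap tree, its mutex for i_mmap_rwsem, and the
per-object lock pointer for the hugetlb vma_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; these are not kernel types. */
struct obj_lock {
	pthread_mutex_t mu;
};

struct obj {
	struct obj_lock *lock;	/* plays the role of the hugetlb vma_lock */
	struct obj *next;	/* link in the shared registry */
};

struct registry {		/* plays the role of the i_mmap tree */
	pthread_mutex_t mu;	/* plays the role of i_mmap_rwsem */
	struct obj *head;
};

/* Step 1: complete all private setup while the object is still unreachable. */
static int obj_init(struct obj *o)
{
	o->next = NULL;
	o->lock = malloc(sizeof(*o->lock));
	if (!o->lock)
		return -1;
	if (pthread_mutex_init(&o->lock->mu, NULL)) {
		free(o->lock);
		o->lock = NULL;
		return -1;
	}
	return 0;
}

/* Step 2: only now make the object visible to walkers of the registry. */
static void registry_publish(struct registry *r, struct obj *o)
{
	/* Publishing an object without its lock is the bug being avoided. */
	assert(o->lock != NULL);
	pthread_mutex_lock(&r->mu);
	o->next = r->head;
	r->head = o;
	pthread_mutex_unlock(&r->mu);
}

/* Walker: takes each per-object lock and counts objects that lack one. */
static int registry_count_uninitialized(struct registry *r)
{
	int bad = 0;

	pthread_mutex_lock(&r->mu);
	for (struct obj *o = r->head; o; o = o->next) {
		if (!o->lock) {
			bad++;
			continue;
		}
		if (pthread_mutex_trylock(&o->lock->mu) == 0)
			pthread_mutex_unlock(&o->lock->mu);
	}
	pthread_mutex_unlock(&r->mu);
	return bad;
}

/* Returns the number of half-initialized objects a walker saw, or -1 on error. */
static int demo(void)
{
	struct registry r;
	struct obj child;
	int bad;

	r.head = NULL;
	if (pthread_mutex_init(&r.mu, NULL))
		return -1;
	if (obj_init(&child)) {
		pthread_mutex_destroy(&r.mu);
		return -1;
	}
	registry_publish(&r, &child);	/* linked only after init completed */
	bad = registry_count_uninitialized(&r);

	pthread_mutex_destroy(&child.lock->mu);
	free(child.lock);
	pthread_mutex_destroy(&r.mu);
	return bad;
}
```

Because the object is linked only after obj_init() has returned, a walker
holding the registry lock can never observe a missing or half-built per-object
lock. The ordering the patch moves away from corresponds to calling
registry_publish() before obj_init().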


