* [PATCH] fork: defer linking file vma until vma is fully initialized
From: Miaohe Lin @ 2024-04-10  9:14 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: akpm, brauner, oleg, tandersen, mjguzik, willy, kent.overstreet,
	zhangpeng.00, linmiaohe, hca, mike.kravetz, muchun.song,
	thorvald, Liam.Howlett, jane.chu

Thorvald reported a WARNING [1]. The root cause is the following race:

 CPU 1					CPU 2
 fork					hugetlbfs_fallocate
  dup_mmap				 hugetlbfs_punch_hole
   i_mmap_lock_write(mapping);
   vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
   i_mmap_unlock_write(mapping);
   hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
					 i_mmap_lock_write(mapping);
   					 hugetlb_vmdelete_list
					  vma_interval_tree_foreach
					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
   tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
					 i_mmap_unlock_write(mapping);

hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
the i_mmap_rwsem lock, while the vma lock can be used at the same time.
Fix this by deferring the linking of the file vma until the vma is fully
initialized. A vma should be fully initialized before it can be used.

Reported-by: Thorvald Natvig <thorvald@google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 kernel/fork.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 84de5faa8c9a..99076dbe27d8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		} else if (anon_vma_fork(tmp, mpnt))
 			goto fail_nomem_anon_vma_fork;
 		vm_flags_clear(tmp, VM_LOCKED_MASK);
+		/*
+		 * Copy/update hugetlb private vma information.
+		 */
+		if (is_vm_hugetlb_page(tmp))
+			hugetlb_dup_vma_private(tmp);
+
+		/*
+		 * Link the vma into the MT. After using __mt_dup(), memory
+		 * allocation is not necessary here, so it cannot fail.
+		 */
+		vma_iter_bulk_store(&vmi, tmp);
+
+		mm->map_count++;
+
+		if (tmp->vm_ops && tmp->vm_ops->open)
+			tmp->vm_ops->open(tmp);
+
 		file = tmp->vm_file;
 		if (file) {
 			struct address_space *mapping = file->f_mapping;
@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 			i_mmap_unlock_write(mapping);
 		}
 
-		/*
-		 * Copy/update hugetlb private vma information.
-		 */
-		if (is_vm_hugetlb_page(tmp))
-			hugetlb_dup_vma_private(tmp);
-
-		/*
-		 * Link the vma into the MT. After using __mt_dup(), memory
-		 * allocation is not necessary here, so it cannot fail.
-		 */
-		vma_iter_bulk_store(&vmi, tmp);
-
-		mm->map_count++;
 		if (!(tmp->vm_flags & VM_WIPEONFORK))
 			retval = copy_page_range(tmp, mpnt);
 
-		if (tmp->vm_ops && tmp->vm_ops->open)
-			tmp->vm_ops->open(tmp);
-
 		if (retval) {
 			mpnt = vma_next(&vmi);
 			goto loop_out;
-- 
2.33.0




* Re: [PATCH] fork: defer linking file vma until vma is fully initialized
From: Andrew Morton @ 2024-04-10 20:21 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: linux-mm, linux-kernel, brauner, oleg, tandersen, mjguzik, willy,
	kent.overstreet, zhangpeng.00, hca, mike.kravetz, muchun.song,
	thorvald, Liam.Howlett, jane.chu

On Wed, 10 Apr 2024 17:14:41 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> Thorvald reported a WARNING [1]. And the root cause is below race:
> 
>  CPU 1					CPU 2
>  fork					hugetlbfs_fallocate
>   dup_mmap				 hugetlbfs_punch_hole
>    i_mmap_lock_write(mapping);
>    vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
>    i_mmap_unlock_write(mapping);
>    hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
> 					 i_mmap_lock_write(mapping);
>    					 hugetlb_vmdelete_list
> 					  vma_interval_tree_foreach
> 					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
>    tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
> 					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
> 					 i_mmap_unlock_write(mapping);
> 
> hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
> i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
> by deferring linking file vma until vma is fully initialized. Those vmas
> should be initialized first before they can be used.

Cool.  I queued this in mm-hotfixes (for 6.8-rcX) and I added a cc:stable.



* Re: [PATCH] fork: defer linking file vma until vma is fully initialized
From: Miaohe Lin @ 2024-04-11  2:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, brauner, oleg, tandersen, mjguzik, willy,
	kent.overstreet, zhangpeng.00, hca, mike.kravetz, muchun.song,
	thorvald, Liam.Howlett, jane.chu

On 2024/4/11 4:21, Andrew Morton wrote:
> On Wed, 10 Apr 2024 17:14:41 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> Thorvald reported a WARNING [1]. And the root cause is below race:
>>
>>  CPU 1					CPU 2
>>  fork					hugetlbfs_fallocate
>>   dup_mmap				 hugetlbfs_punch_hole
>>    i_mmap_lock_write(mapping);
>>    vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
>>    i_mmap_unlock_write(mapping);
>>    hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
>> 					 i_mmap_lock_write(mapping);
>>    					 hugetlb_vmdelete_list
>> 					  vma_interval_tree_foreach
>> 					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
>>    tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
>> 					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
>> 					 i_mmap_unlock_write(mapping);
>>
>> hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
>> i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
>> by deferring linking file vma until vma is fully initialized. Those vmas
>> should be initialized first before they can be used.
> 
> Cool.  I queued this in mm-hotfixes (for 6.8-rcX) and I added a cc:stable.

Thanks for doing this. Any comments or thoughts would be very welcome.




* Re: [PATCH] fork: defer linking file vma until vma is fully initialized
From: Jane Chu @ 2024-04-15 23:32 UTC (permalink / raw)
  To: Miaohe Lin, linux-mm, linux-kernel
  Cc: akpm, brauner, oleg, tandersen, mjguzik, willy, kent.overstreet,
	zhangpeng.00, hca, mike.kravetz, muchun.song, thorvald,
	Liam.Howlett

On 4/10/2024 2:14 AM, Miaohe Lin wrote:

> Thorvald reported a WARNING [1]. And the root cause is below race:
>
>   CPU 1					CPU 2
>   fork					hugetlbfs_fallocate
>    dup_mmap				 hugetlbfs_punch_hole
>     i_mmap_lock_write(mapping);
>     vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
>     i_mmap_unlock_write(mapping);
>     hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
> 					 i_mmap_lock_write(mapping);
>     					 hugetlb_vmdelete_list
> 					  vma_interval_tree_foreach
> 					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
>     tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
> 					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
> 					 i_mmap_unlock_write(mapping);
>
> hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
> i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
> by deferring linking file vma until vma is fully initialized. Those vmas
> should be initialized first before they can be used.
>
> Reported-by: Thorvald Natvig <thorvald@google.com>
> Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
> Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>   kernel/fork.c | 33 +++++++++++++++++----------------
>   1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 84de5faa8c9a..99076dbe27d8 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   		} else if (anon_vma_fork(tmp, mpnt))
>   			goto fail_nomem_anon_vma_fork;
>   		vm_flags_clear(tmp, VM_LOCKED_MASK);
> +		/*
> +		 * Copy/update hugetlb private vma information.
> +		 */
> +		if (is_vm_hugetlb_page(tmp))
> +			hugetlb_dup_vma_private(tmp);
> +
> +		/*
> +		 * Link the vma into the MT. After using __mt_dup(), memory
> +		 * allocation is not necessary here, so it cannot fail.
> +		 */
> +		vma_iter_bulk_store(&vmi, tmp);
> +
> +		mm->map_count++;
> +
> +		if (tmp->vm_ops && tmp->vm_ops->open)
> +			tmp->vm_ops->open(tmp);
> +
>   		file = tmp->vm_file;
>   		if (file) {
>   			struct address_space *mapping = file->f_mapping;
> @@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   			i_mmap_unlock_write(mapping);
>   		}
>   
> -		/*
> -		 * Copy/update hugetlb private vma information.
> -		 */
> -		if (is_vm_hugetlb_page(tmp))
> -			hugetlb_dup_vma_private(tmp);
> -
> -		/*
> -		 * Link the vma into the MT. After using __mt_dup(), memory
> -		 * allocation is not necessary here, so it cannot fail.
> -		 */
> -		vma_iter_bulk_store(&vmi, tmp);
> -
> -		mm->map_count++;
>   		if (!(tmp->vm_flags & VM_WIPEONFORK))
>   			retval = copy_page_range(tmp, mpnt);
>   
> -		if (tmp->vm_ops && tmp->vm_ops->open)
> -			tmp->vm_ops->open(tmp);
> -
>   		if (retval) {
>   			mpnt = vma_next(&vmi);
>   			goto loop_out;

Looks good.

Reviewed-by: Jane Chu <jane.chu@oracle.com>

-jane


