* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-10-11 15:03 [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1 Zi Yan
@ 2024-10-11 18:23 ` Zi Yan
2024-10-16 12:53 ` Vlastimil Babka
` (2 subsequent siblings)
3 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2024-10-11 18:23 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, Zi Yan,
Vlastimil Babka
[-- Attachment #1: Type: text/plain, Size: 3357 bytes --]
+Vlastimil
On 11 Oct 2024, at 11:03, Zi Yan wrote:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated page to be zeroed in
> post_alloc_hook() when init_on_alloc=1.
>
> For order-0 folios, if arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
> the page return from the buddy allocator. So the page is zeroed twice.
> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> define their own vma_alloc_zeroed_movable_folio().
>
> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
> zero the folio again. Fix it by calling folio_zero_user() only if
> init_on_alloc is set. All arch are impacted.
>
> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/highmem.h | 8 +-------
> mm/huge_memory.c | 3 ++-
> mm/internal.h | 6 ++++++
> mm/memory.c | 3 ++-
> 4 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bec9bd715acf..6e452bd8e7e3 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,13 +224,7 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - struct folio *folio;
> -
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> - if (folio)
> - clear_user_highpage(&folio->page, vaddr);
> -
> - return folio;
> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> }
> #endif
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 82f464865570..5dcbea96edb7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
> }
> folio_throttle_swaprate(folio, gfp);
>
> - folio_zero_user(folio, addr);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, addr);
> /*
> * The memory barrier inside __folio_mark_uptodate makes sure that
> * folio_zero_user writes become visible before the set_pmd_at()
> diff --git a/mm/internal.h b/mm/internal.h
> index 906da6280c2d..508f7802dd2b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1233,6 +1233,12 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
> void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t *pmd, bool write);
>
> +static inline bool alloc_zeroed(void)
> +{
> + return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> + &init_on_alloc);
> +}
> +
> enum {
> /* mark page accessed */
> FOLL_TOUCH = 1 << 16,
> diff --git a/mm/memory.c b/mm/memory.c
> index c67359ddb61a..88252f0e06d0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4719,7 +4719,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> goto next;
> }
> folio_throttle_swaprate(folio, gfp);
> - folio_zero_user(folio, vmf->address);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, vmf->address);
> return folio;
> }
> next:
> --
> 2.45.2
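As a side note, the double zeroing this patch removes is easy to model in a few lines of ordinary user-space C. The sketch below is only a toy (toy_alloc_page() and the other names are made up; none of this is kernel code), but it shows why an order-0 anonymous fault used to clear the same page twice when init_on_alloc=1, and why asking the allocator for a zeroed page clears it exactly once:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_GFP_ZERO  0x1u

static bool init_on_alloc = true;  /* models booting with init_on_alloc=1 */
static int zero_passes;            /* how many times the page got cleared */

/* Stand-in for the allocator + post_alloc_hook(): clear the page when the
 * caller asked for __GFP_ZERO or when init_on_alloc is enabled. */
static void *toy_alloc_page(unsigned int gfp)
{
	void *page = malloc(TOY_PAGE_SIZE);

	if (page && ((gfp & TOY_GFP_ZERO) || init_on_alloc)) {
		memset(page, 0, TOY_PAGE_SIZE);
		zero_passes++;
	}
	return page;
}

/* Old order-0 fault path: allocate, then clear again via the
 * clear_user_highpage() stand-in, even if the allocator already did it. */
static void *fault_path_old(void)
{
	void *page = toy_alloc_page(0);

	if (page) {
		memset(page, 0, TOY_PAGE_SIZE);
		zero_passes++;
	}
	return page;
}

/* New order-0 fault path: let the allocator zero the page exactly once. */
static void *fault_path_new(void)
{
	return toy_alloc_page(TOY_GFP_ZERO);
}

int main(void)
{
	zero_passes = 0;
	free(fault_path_old());
	printf("old path with init_on_alloc=1: cleared %d times\n", zero_passes);

	zero_passes = 0;
	free(fault_path_new());
	printf("new path with init_on_alloc=1: cleared %d times\n", zero_passes);
	return 0;
}

Built with any C compiler this prints 2 clears for the old path and 1 for the new one; the real patch gets the same effect with __GFP_ZERO plus the alloc_zeroed() static-branch check instead of a boolean and a flag.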
Best Regards,
Yan, Zi
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
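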
2024-10-11 15:03 [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1 Zi Yan
2024-10-11 18:23 ` Zi Yan
@ 2024-10-16 12:53 ` Vlastimil Babka
2024-10-16 13:30 ` Zi Yan
2024-10-21 12:23 ` David Hildenbrand
2024-12-04 10:41 ` Geert Uytterhoeven
3 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2024-10-16 12:53 UTC (permalink / raw)
To: Zi Yan, linux-mm, Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel
On 10/11/24 17:03, Zi Yan wrote:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated page to be zeroed in
> post_alloc_hook() when init_on_alloc=1.
>
> For order-0 folios, if arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
> the page return from the buddy allocator. So the page is zeroed twice.
> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> define their own vma_alloc_zeroed_movable_folio().
>
> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
> zero the folio again. Fix it by calling folio_zero_user() only if
> init_on_alloc is set. All arch are impacted.
^ not set?
>
> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/highmem.h | 8 +-------
> mm/huge_memory.c | 3 ++-
> mm/internal.h | 6 ++++++
> mm/memory.c | 3 ++-
> 4 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bec9bd715acf..6e452bd8e7e3 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,13 +224,7 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - struct folio *folio;
> -
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> - if (folio)
> - clear_user_highpage(&folio->page, vaddr);
> -
> - return folio;
> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> }
> #endif
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 82f464865570..5dcbea96edb7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
> }
> folio_throttle_swaprate(folio, gfp);
>
> - folio_zero_user(folio, addr);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, addr);
> /*
> * The memory barrier inside __folio_mark_uptodate makes sure that
> * folio_zero_user writes become visible before the set_pmd_at()
> diff --git a/mm/internal.h b/mm/internal.h
> index 906da6280c2d..508f7802dd2b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1233,6 +1233,12 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
> void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t *pmd, bool write);
>
> +static inline bool alloc_zeroed(void)
> +{
> + return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> + &init_on_alloc);
> +}
> +
> enum {
> /* mark page accessed */
> FOLL_TOUCH = 1 << 16,
> diff --git a/mm/memory.c b/mm/memory.c
> index c67359ddb61a..88252f0e06d0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4719,7 +4719,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> goto next;
> }
> folio_throttle_swaprate(folio, gfp);
> - folio_zero_user(folio, vmf->address);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, vmf->address);
> return folio;
> }
> next:
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-10-16 12:53 ` Vlastimil Babka
@ 2024-10-16 13:30 ` Zi Yan
0 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2024-10-16 13:30 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: linux-mm, David Hildenbrand, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]
On 16 Oct 2024, at 8:53, Vlastimil Babka wrote:
> On 10/11/24 17:03, Zi Yan wrote:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated page to be zeroed in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For order-0 folios, if arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
>> the page return from the buddy allocator. So the page is zeroed twice.
>> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
>> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
>> define their own vma_alloc_zeroed_movable_folio().
>>
>> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
>> zero the folio again. Fix it by calling folio_zero_user() only if
>> init_on_alloc is set. All arch are impacted.
>
> ^ not set?
You are right. The sentence should be:
"Fix it by calling folio_zero_user() only if init_on_alloc is not set."
Hi Andrew,
Do you want me to resend this with a fixed commit log?
>
>>
>> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Thanks.
--
Best Regards,
Yan, Zi
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-10-11 15:03 [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1 Zi Yan
2024-10-11 18:23 ` Zi Yan
2024-10-16 12:53 ` Vlastimil Babka
@ 2024-10-21 12:23 ` David Hildenbrand
2024-10-21 14:21 ` Zi Yan
2024-12-04 10:41 ` Geert Uytterhoeven
3 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2024-10-21 12:23 UTC (permalink / raw)
To: Zi Yan, linux-mm, Andrew Morton
Cc: Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel
Am 11.10.24 um 17:03 schrieb Zi Yan:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated page to be zeroed in
> post_alloc_hook() when init_on_alloc=1.
>
> For order-0 folios, if arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
> the page return from the buddy allocator. So the page is zeroed twice.
> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> define their own vma_alloc_zeroed_movable_folio().
>
> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
> zero the folio again. Fix it by calling folio_zero_user() only if
> init_on_alloc is set. All arch are impacted.
>
> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/highmem.h | 8 +-------
> mm/huge_memory.c | 3 ++-
> mm/internal.h | 6 ++++++
> mm/memory.c | 3 ++-
> 4 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bec9bd715acf..6e452bd8e7e3 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,13 +224,7 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - struct folio *folio;
> -
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> - if (folio)
> - clear_user_highpage(&folio->page, vaddr);
> -
> - return folio;
> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> }
> #endif
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 82f464865570..5dcbea96edb7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
> }
> folio_throttle_swaprate(folio, gfp);
>
> - folio_zero_user(folio, addr);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, addr);
It might be reasonable to spell out why we are not using GFP_ZERO somewhere,
something like
/*
* We are not using __GFP_ZERO because folio_zero_user() will make sure that the
* page corresponding to the faulting address will be hot in the cache.
*/
Sth. like that maybe.
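For readers who want the cache-hotness argument spelled out, here is a rough user-space sketch of the idea behind folio_zero_user() (only the ordering idea; the kernel code is more elaborate and uses per-subpage helpers): when a large folio is cleared, the subpage containing the faulting address is cleared last, so its cache lines are the most recently written ones when userspace touches them. A plain __GFP_ZERO clear in the allocator cannot do this, because at allocation time nothing knows which address will fault. The names below are invented for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SUBPAGE_SIZE 4096UL
#define NR_SUBPAGES  512UL        /* a 2 MB "folio" built from 4 KB subpages */

/*
 * Illustrative only: clear every subpage of a large buffer, saving the
 * subpage that contains the faulting offset for last so that it is still
 * warm in the CPU cache when the faulting thread reads or writes it.
 */
static void toy_zero_user(char *folio, size_t fault_off)
{
	size_t target = fault_off / SUBPAGE_SIZE;
	size_t i;

	for (i = 0; i < NR_SUBPAGES; i++) {
		if (i == target)
			continue;	/* leave the "hot" subpage for last */
		memset(folio + i * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
	}
	memset(folio + target * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
}

int main(void)
{
	size_t fault_off = 123 * SUBPAGE_SIZE + 40;	/* pretend fault address */
	char *folio = malloc(NR_SUBPAGES * SUBPAGE_SIZE);

	if (!folio)
		return 1;
	toy_zero_user(folio, fault_off);
	printf("byte at faulting offset after zeroing: %d\n", folio[fault_off]);
	free(folio);
	return 0;
}

With init_on_alloc=1 the extra clear is redundant for correctness, which is why the patch skips folio_zero_user() in that case, trading the cache-hotness benefit for not clearing the whole folio twice.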
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-10-21 12:23 ` David Hildenbrand
@ 2024-10-21 14:21 ` Zi Yan
2024-10-22 14:33 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-10-21 14:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4340 bytes --]
On 21 Oct 2024, at 8:23, David Hildenbrand wrote:
> Am 11.10.24 um 17:03 schrieb Zi Yan:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated page to be zeroed in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For order-0 folios, if arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
>> the page return from the buddy allocator. So the page is zeroed twice.
>> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
>> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
>> define their own vma_alloc_zeroed_movable_folio().
>>
>> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
>> zero the folio again. Fix it by calling folio_zero_user() only if
>> init_on_alloc is set. All arch are impacted.
>>
>> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/highmem.h | 8 +-------
>> mm/huge_memory.c | 3 ++-
>> mm/internal.h | 6 ++++++
>> mm/memory.c | 3 ++-
>> 4 files changed, 11 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index bec9bd715acf..6e452bd8e7e3 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -224,13 +224,7 @@ static inline
>> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> unsigned long vaddr)
>> {
>> - struct folio *folio;
>> -
>> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
>> - if (folio)
>> - clear_user_highpage(&folio->page, vaddr);
>> -
>> - return folio;
>> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
>> }
>> #endif
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 82f464865570..5dcbea96edb7 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
>> }
>> folio_throttle_swaprate(folio, gfp);
>> - folio_zero_user(folio, addr);
>> + if (!alloc_zeroed())
>> + folio_zero_user(folio, addr);
>
>
>
> It might be reasonable to spell out why we are not using GFP_ZERO somewhere, something like
>
> /*
> * We are not using __GFP_ZERO because folio_zero_user() will make sure that the
> * page corresponding to the faulting address will be hot in the cache.
> */
>
> Sth. like that maybe.
I changed the wording a bit to fit the if statement and put the comment in both
call sites. Let me know how it looks. Thanks.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 830d6aa5bf97..b304bb3ffcef 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1180,6 +1180,11 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
}
folio_throttle_swaprate(folio, gfp);
+ /*
+ * When a folio is not zeroed during allocation (__GFP_ZERO not used),
+ * folio_zero_user() is used to make sure that the page corresponding
+ * to the faulting address will be hot in the cache after zeroing.
+ */
if (!alloc_zeroed())
folio_zero_user(folio, addr);
/*
diff --git a/mm/memory.c b/mm/memory.c
index 0f614523b9f4..42c8bb5fcd8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4719,6 +4719,13 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
goto next;
}
folio_throttle_swaprate(folio, gfp);
+ /*
+ * When a folio is not zeroed during allocation
+ * (__GFP_ZERO not used), folio_zero_user() is used
+ * to make sure that the page corresponding to the
+ * faulting address will be hot in the cache after
+ * zeroing.
+ */
if (!alloc_zeroed())
folio_zero_user(folio, vmf->address);
return folio;
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks.
--
Best Regards,
Yan, Zi
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-10-21 14:21 ` Zi Yan
@ 2024-10-22 14:33 ` Zi Yan
0 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2024-10-22 14:33 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand
Cc: linux-mm, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4552 bytes --]
On 21 Oct 2024, at 10:21, Zi Yan wrote:
> On 21 Oct 2024, at 8:23, David Hildenbrand wrote:
>
>> Am 11.10.24 um 17:03 schrieb Zi Yan:
>>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>>> init_on_free=1 boot options") forces allocated page to be zeroed in
>>> post_alloc_hook() when init_on_alloc=1.
>>>
>>> For order-0 folios, if arch does not define
>>> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
>>> the page return from the buddy allocator. So the page is zeroed twice.
>>> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
>>> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
>>> define their own vma_alloc_zeroed_movable_folio().
>>>
>>> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
>>> zero the folio again. Fix it by calling folio_zero_user() only if
>>> init_on_alloc is set. All arch are impacted.
>>>
>>> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>> include/linux/highmem.h | 8 +-------
>>> mm/huge_memory.c | 3 ++-
>>> mm/internal.h | 6 ++++++
>>> mm/memory.c | 3 ++-
>>> 4 files changed, 11 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>>> index bec9bd715acf..6e452bd8e7e3 100644
>>> --- a/include/linux/highmem.h
>>> +++ b/include/linux/highmem.h
>>> @@ -224,13 +224,7 @@ static inline
>>> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>>> unsigned long vaddr)
>>> {
>>> - struct folio *folio;
>>> -
>>> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
>>> - if (folio)
>>> - clear_user_highpage(&folio->page, vaddr);
>>> -
>>> - return folio;
>>> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
>>> }
>>> #endif
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 82f464865570..5dcbea96edb7 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
>>> }
>>> folio_throttle_swaprate(folio, gfp);
>>> - folio_zero_user(folio, addr);
>>> + if (!alloc_zeroed())
>>> + folio_zero_user(folio, addr);
>>
>>
>>
>> It might be reasonable to spell out why we are not using GFP_ZERO somewhere, something like
>>
>> /*
>> * We are not using __GFP_ZERO because folio_zero_user() will make sure that the
>> * page corresponding to the faulting address will be hot in the cache.
>> */
>>
>> Sth. like that maybe.
>
> I changed the wording a bit to fit the if statement and put the comment in both
> call sites. Let me know how it looks. Thanks.
Hi Andrew,
Do you mind folding the changes below into the original patch?
Thanks.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 830d6aa5bf97..b304bb3ffcef 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1180,6 +1180,11 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
}
folio_throttle_swaprate(folio, gfp);
+ /*
+ * When a folio is not zeroed during allocation (__GFP_ZERO not used),
+ * folio_zero_user() is used to make sure that the page corresponding
+ * to the faulting address will be hot in the cache after zeroing.
+ */
if (!alloc_zeroed())
folio_zero_user(folio, addr);
/*
diff --git a/mm/memory.c b/mm/memory.c
index 0f614523b9f4..42c8bb5fcd8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4719,6 +4719,13 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
goto next;
}
folio_throttle_swaprate(folio, gfp);
+ /*
+ * When a folio is not zeroed during allocation
+ * (__GFP_ZERO not used), folio_zero_user() is used
+ * to make sure that the page corresponding to the
+ * faulting address will be hot in the cache after
+ * zeroing.
+ */
if (!alloc_zeroed())
folio_zero_user(folio, vmf->address);
return folio;
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
>
> Thanks.
--
Best Regards,
Yan, Zi
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-10-11 15:03 [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1 Zi Yan
` (2 preceding siblings ...)
2024-10-21 12:23 ` David Hildenbrand
@ 2024-12-04 10:41 ` Geert Uytterhoeven
2024-12-04 12:50 ` Zi Yan
2024-12-04 15:24 ` Zi Yan
3 siblings, 2 replies; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-04 10:41 UTC (permalink / raw)
To: Zi Yan
Cc: linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
Hi Zi,
On Fri, Oct 11, 2024 at 5:13 PM Zi Yan <ziy@nvidia.com> wrote:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated page to be zeroed in
> post_alloc_hook() when init_on_alloc=1.
>
> For order-0 folios, if arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
> the page return from the buddy allocator. So the page is zeroed twice.
> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> define their own vma_alloc_zeroed_movable_folio().
>
> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
> zero the folio again. Fix it by calling folio_zero_user() only if
> init_on_alloc is set. All arch are impacted.
>
> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
Thanks for your patch, which is now commit 5708d96da20b99b4 ("mm:
avoid zeroing user movable page twice with init_on_alloc=1")
in v6.13-rc1.
This is causing a panic when starting userspace on MIPS64 RBTX4927:
Run /sbin/init as init process
process '/lib/systemd/systemd' started with executable stack
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b ]---
or
Run /sbin/init as init process
process '/lib/systemd/systemd' started with executable stack
do_page_fault(): sending SIGSEGV to init for invalid read access
from 00000000583399f8
epc = 0000000077e2b094 in ld-2.19.so[3094,77e28000+22000]
ra = 0000000077e2afcc in ld-2.19.so[2fcc,77e28000+22000]
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b ]---
or
Run /sbin/init as init process
process '/lib/systemd/systemd' started with executable stack
/sbin/inKernel panic - not syncing: Attempted to kill init!
exitcode=0x00007f00
---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00007f00 ]---
it: error while loading shared libraries: libpthread.so.0: object
file has no dynamic section
Reverting the commit (and fixing the trivial conflict) fixes the issue.
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,13 +224,7 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - struct folio *folio;
> -
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> - if (folio)
> - clear_user_highpage(&folio->page, vaddr);
> -
> - return folio;
> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> }
> #endif
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 82f464865570..5dcbea96edb7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
> }
> folio_throttle_swaprate(folio, gfp);
>
> - folio_zero_user(folio, addr);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, addr);
> /*
> * The memory barrier inside __folio_mark_uptodate makes sure that
> * folio_zero_user writes become visible before the set_pmd_at()
> diff --git a/mm/internal.h b/mm/internal.h
> index 906da6280c2d..508f7802dd2b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1233,6 +1233,12 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
> void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t *pmd, bool write);
>
> +static inline bool alloc_zeroed(void)
> +{
> + return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> + &init_on_alloc);
> +}
> +
> enum {
> /* mark page accessed */
> FOLL_TOUCH = 1 << 16,
> diff --git a/mm/memory.c b/mm/memory.c
> index c67359ddb61a..88252f0e06d0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4719,7 +4719,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> goto next;
> }
> folio_throttle_swaprate(folio, gfp);
> - folio_zero_user(folio, vmf->address);
> + if (!alloc_zeroed())
> + folio_zero_user(folio, vmf->address);
> return folio;
> }
> next:
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 10:41 ` Geert Uytterhoeven
@ 2024-12-04 12:50 ` Zi Yan
2024-12-04 12:56 ` Geert Uytterhoeven
2024-12-04 15:24 ` Zi Yan
1 sibling, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 12:50 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 4 Dec 2024, at 5:41, Geert Uytterhoeven wrote:
> Hi Zi,
>
> On Fri, Oct 11, 2024 at 5:13 PM Zi Yan <ziy@nvidia.com> wrote:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated page to be zeroed in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For order-0 folios, if arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
>> the page return from the buddy allocator. So the page is zeroed twice.
>> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
>> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
>> define their own vma_alloc_zeroed_movable_folio().
>>
>> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
>> zero the folio again. Fix it by calling folio_zero_user() only if
>> init_on_alloc is set. All arch are impacted.
>>
>> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Thanks for your patch, which is now commit 5708d96da20b99b4 ("mm:
> avoid zeroing user movable page twice with init_on_alloc=1")
> in v6.13-rc1.
Thank you for reporting the error.
>
> This is causing a panic when starting userspace on MIPS64 RBTX4927:
>
> Run /sbin/init as init process
> process '/lib/systemd/systemd' started with executable stack
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b ]---
>
> or
>
> Run /sbin/init as init process
> process '/lib/systemd/systemd' started with executable stack
> do_page_fault(): sending SIGSEGV to init for invalid read access
> from 00000000583399f8
> epc = 0000000077e2b094 in ld-2.19.so[3094,77e28000+22000]
> ra = 0000000077e2afcc in ld-2.19.so[2fcc,77e28000+22000]
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b ]---
>
> or
>
> Run /sbin/init as init process
> process '/lib/systemd/systemd' started with executable stack
> /sbin/inKernel panic - not syncing: Attempted to kill init!
> exitcode=0x00007f00
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00007f00 ]---
> it: error while loading shared libraries: libpthread.so.0: object
> file has no dynamic section
Do you mind providing the full kernel log for this panic? And your kernel
config as well. I am trying to figure out why changing page zeroing from
twice to once can cause kernel panic.
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 12:50 ` Zi Yan
@ 2024-12-04 12:56 ` Geert Uytterhoeven
0 siblings, 0 replies; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-04 12:56 UTC (permalink / raw)
To: Zi Yan
Cc: linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On Wed, Dec 4, 2024 at 1:50 PM Zi Yan <ziy@nvidia.com> wrote:
> On 4 Dec 2024, at 5:41, Geert Uytterhoeven wrote:
> > On Fri, Oct 11, 2024 at 5:13 PM Zi Yan <ziy@nvidia.com> wrote:
> >> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> >> init_on_free=1 boot options") forces allocated page to be zeroed in
> >> post_alloc_hook() when init_on_alloc=1.
> >>
> >> For order-0 folios, if arch does not define
> >> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
> >> the page return from the buddy allocator. So the page is zeroed twice.
> >> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
> >> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> >> define their own vma_alloc_zeroed_movable_folio().
> >>
> >> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
> >> zero the folio again. Fix it by calling folio_zero_user() only if
> >> init_on_alloc is set. All arch are impacted.
> >>
> >> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >
> > Thanks for your patch, which is now commit 5708d96da20b99b4 ("mm:
> > avoid zeroing user movable page twice with init_on_alloc=1")
> > in v6.13-rc1.
>
> Thank you for reporting the error.
>
> > This is causing a panic when starting userspace on MIPS64 RBTX4927:
> Do you mind providing the full kernel log for this panic? And your kernel
> config as well. I am trying to figure out why changing page zeroing from
> twice to once can cause kernel panic.
Both sent by PM.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 10:41 ` Geert Uytterhoeven
2024-12-04 12:50 ` Zi Yan
@ 2024-12-04 15:24 ` Zi Yan
2024-12-04 15:41 ` Vlastimil Babka
2024-12-05 8:15 ` Geert Uytterhoeven
1 sibling, 2 replies; 32+ messages in thread
From: Zi Yan @ 2024-12-04 15:24 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 4 Dec 2024, at 5:41, Geert Uytterhoeven wrote:
> Hi Zi,
>
> On Fri, Oct 11, 2024 at 5:13 PM Zi Yan <ziy@nvidia.com> wrote:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated page to be zeroed in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For order-0 folios, if arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
>> the page return from the buddy allocator. So the page is zeroed twice.
>> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
>> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
>> define their own vma_alloc_zeroed_movable_folio().
>>
>> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
>> zero the folio again. Fix it by calling folio_zero_user() only if
>> init_on_alloc is set. All arch are impacted.
>>
>> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Thanks for your patch, which is now commit 5708d96da20b99b4 ("mm:
> avoid zeroing user movable page twice with init_on_alloc=1")
> in v6.13-rc1.
>
> This is causing a panic when starting userspace on MIPS64 RBTX4927:
>
> Run /sbin/init as init process
> process '/lib/systemd/systemd' started with executable stack
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b ]---
>
> or
>
> Run /sbin/init as init process
> process '/lib/systemd/systemd' started with executable stack
> do_page_fault(): sending SIGSEGV to init for invalid read access
> from 00000000583399f8
> epc = 0000000077e2b094 in ld-2.19.so[3094,77e28000+22000]
> ra = 0000000077e2afcc in ld-2.19.so[2fcc,77e28000+22000]
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b ]---
>
> or
>
> Run /sbin/init as init process
> process '/lib/systemd/systemd' started with executable stack
> /sbin/inKernel panic - not syncing: Attempted to kill init!
> exitcode=0x00007f00
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00007f00 ]---
> it: error while loading shared libraries: libpthread.so.0: object
> file has no dynamic section
>
> Reverting the commit (and fixing the trivial conflict) fixes the issue.
>
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -224,13 +224,7 @@ static inline
>> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> unsigned long vaddr)
>> {
>> - struct folio *folio;
>> -
>> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
>> - if (folio)
>> - clear_user_highpage(&folio->page, vaddr);
>> -
>> - return folio;
>> + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
>> }
>> #endif
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 82f464865570..5dcbea96edb7 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1176,7 +1176,8 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
>> }
>> folio_throttle_swaprate(folio, gfp);
>>
>> - folio_zero_user(folio, addr);
>> + if (!alloc_zeroed())
>> + folio_zero_user(folio, addr);
>> /*
>> * The memory barrier inside __folio_mark_uptodate makes sure that
>> * folio_zero_user writes become visible before the set_pmd_at()
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 906da6280c2d..508f7802dd2b 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1233,6 +1233,12 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
>> void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
>> pmd_t *pmd, bool write);
>>
>> +static inline bool alloc_zeroed(void)
>> +{
>> + return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
>> + &init_on_alloc);
>> +}
>> +
>> enum {
>> /* mark page accessed */
>> FOLL_TOUCH = 1 << 16,
>> diff --git a/mm/memory.c b/mm/memory.c
>> index c67359ddb61a..88252f0e06d0 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4719,7 +4719,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>> goto next;
>> }
>> folio_throttle_swaprate(folio, gfp);
>> - folio_zero_user(folio, vmf->address);
>> + if (!alloc_zeroed())
>> + folio_zero_user(folio, vmf->address);
>> return folio;
>> }
>> next:
The provided config does not have THP on, so the changes to mm/huge_memory.c
and mm/memory.c do not apply.
Can you try the patch below and see if the machine boots? Thanks.
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 6e452bd8e7e3..bec9bd715acf 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -224,7 +224,13 @@ static inline
struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
unsigned long vaddr)
{
- return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
+ struct folio *folio;
+
+ folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
+ if (folio)
+ clear_user_highpage(&folio->page, vaddr);
+
+ return folio;
}
#endif
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 15:24 ` Zi Yan
@ 2024-12-04 15:41 ` Vlastimil Babka
2024-12-04 16:16 ` Zi Yan
2024-12-05 8:15 ` Geert Uytterhoeven
1 sibling, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2024-12-04 15:41 UTC (permalink / raw)
To: Zi Yan, Geert Uytterhoeven
Cc: linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 12/4/24 16:24, Zi Yan wrote:
> On 4 Dec 2024, at 5:41, Geert Uytterhoeven wrote:
>
> The provided config does not have THP on, so the changes to mm/huge_memory.c
> and mm/memory.c do not apply.
>
> Can you try the patch below and see if the machine boots? Thanks.
Hmm, looks like mips has a rather involved clear_user_page()
in arch/mips/include/asm/page.h.
So maybe the clearing done as part of the page allocator isn't enough here.
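For anyone not familiar with the MIPS side, the rough shape of the problem can be sketched in user space. The constants and toy_* names below are invented (on MIPS the real decision is a pages_do_alias()-style check driven by the probed cache geometry): clear_user_page() clears the page through the kernel mapping and then, if the kernel and user virtual addresses select different cache sets, writes the dirty lines back so the zeroes are actually visible through the user mapping. The page allocator clears pages before any user virtual address exists, so it can never make that second decision.

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of an aliasing, virtually indexed D-cache: 8 KB per way with
 * 4 KB pages, so exactly one index bit sits above the page offset.  The
 * numbers are invented; real kernels derive them from the cache probe.
 */
#define TOY_PAGE_SIZE  4096UL
#define TOY_WAY_SIZE   8192UL
#define TOY_ALIAS_MASK (TOY_WAY_SIZE - TOY_PAGE_SIZE)

/* Same spirit as the MIPS pages_do_alias() check: do two virtual addresses
 * of one physical page land in different cache sets? */
static bool toy_pages_do_alias(unsigned long a, unsigned long b)
{
	return ((a ^ b) & TOY_ALIAS_MASK) != 0;
}

/* Sketch of an aliasing-aware clear: zero through the kernel address, then
 * "flush" only when the user mapping would not see those lines. */
static void toy_clear_user_page(unsigned long kaddr, unsigned long uaddr)
{
	printf("clear page via kernel mapping 0x%lx\n", kaddr);
	if (toy_pages_do_alias(kaddr, uaddr))
		printf("  user mapping 0x%lx aliases -> flush dcache\n", uaddr);
	else
		printf("  user mapping 0x%lx shares the cache set -> no flush\n", uaddr);
}

int main(void)
{
	unsigned long kaddr = 0x80001000UL;	/* kernel mapping (made up) */

	toy_clear_user_page(kaddr, 0x77e2b000UL); /* same index bit: no flush  */
	toy_clear_user_page(kaddr, 0x77e2a000UL); /* different bit: must flush */
	return 0;
}

Nothing like toy_clear_user_page() exists under that name; it only illustrates why a clear done at allocation time, with no user address in hand, cannot replace the arch helper on an aliasing cache.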
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 6e452bd8e7e3..bec9bd715acf 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,7 +224,13 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> + struct folio *folio;
> +
> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> + if (folio)
> + clear_user_highpage(&folio->page, vaddr);
> +
> + return folio;
> }
> #endif
>
>
> Best Regards,
> Yan, Zi
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 15:41 ` Vlastimil Babka
@ 2024-12-04 16:16 ` Zi Yan
2024-12-04 16:29 ` Matthew Wilcox
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 16:16 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Geert Uytterhoeven, linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 4 Dec 2024, at 10:41, Vlastimil Babka wrote:
> On 12/4/24 16:24, Zi Yan wrote:
>> On 4 Dec 2024, at 5:41, Geert Uytterhoeven wrote:
>>
>> The provided config does not have THP on, so the changes to mm/huge_memory.c
>> and mm/memory.c do not apply.
>>
>> Can you try the patch below and see if the machine boots? Thanks.
>
> Hmm looks like mips has some involved clear_user_page()
> in arch/mips/include/asm/page.h
>
> So maybe the clearing done as part of page allocator isn't enough here.
>
Basically, mips needs to flush data cache if kmap address is aliased to
userspace address. This means when mips has THP on, the patch below
is not enough to fix the issue.
In post_alloc_hook(), it does not make sense to pass userspace address
in to determine whether to flush dcache or not.
One way to fix it is to add something like arch_userpage_post_alloc()
to flush dcache if kmap address is aliased to userspace address.
But my questions are that
1) if kmap address will always be the same for two separate kmap_local() calls,
2) how much overheads the additional kmap_local() and kunmap_local() have.
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index 6e452bd8e7e3..bec9bd715acf 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -224,7 +224,13 @@ static inline
>> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> unsigned long vaddr)
>> {
>> - return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
>> + struct folio *folio;
>> +
>> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
>> + if (folio)
>> + clear_user_highpage(&folio->page, vaddr);
>> +
>> + return folio;
>> }
>> #endif
>>
>>
>> Best Regards,
>> Yan, Zi
>>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 16:16 ` Zi Yan
@ 2024-12-04 16:29 ` Matthew Wilcox
2024-12-04 16:58 ` Zi Yan
2024-12-04 17:33 ` Zi Yan
0 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2024-12-04 16:29 UTC (permalink / raw)
To: Zi Yan
Cc: Vlastimil Babka, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
> > So maybe the clearing done as part of page allocator isn't enough here.
> >
> Basically, mips needs to flush data cache if kmap address is aliased to
People use "aliased" in contronym ways. Do you mean "has a
non-congruent alias" or "has a congruent alias"?
> userspace address. This means when mips has THP on, the patch below
> is not enough to fix the issue.
>
> In post_alloc_hook(), it does not make sense to pass userspace address
> in to determine whether to flush dcache or not.
>
> One way to fix it is to add something like arch_userpage_post_alloc()
> to flush dcache if kmap address is aliased to userspace address.
> But my questions are that
> 1) if kmap address will always be the same for two separate kmap_local() calls,
No. It just takes the next address in the stack.
> 2) how much overheads the additional kmap_local() and kunmap_local() have.
That's going to be a per-arch question ...
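The "next address in the stack" behaviour is easy to mimic with a toy model (nothing to do with the real kmap_local machinery in mm/highmem.c, which is per-CPU and considerably smarter): each mapping simply takes the next free slot, so mapping the same page twice with other mappings in between hands back two different virtual addresses, and on an aliasing cache those can fall into different cache sets.

#include <stdio.h>

#define TOY_SLOT_SIZE  4096UL
#define TOY_BASE_VADDR 0xfe000000UL
#define TOY_NR_SLOTS   16

static int kmap_top;	/* stack depth; per-CPU in the real implementation */

/* Toy kmap_local_page(): hand out the next stack slot's virtual address. */
static unsigned long toy_kmap_local(void)
{
	return TOY_BASE_VADDR +
	       (unsigned long)(kmap_top++ % TOY_NR_SLOTS) * TOY_SLOT_SIZE;
}

static void toy_kunmap_local(void)
{
	kmap_top--;
}

int main(void)
{
	unsigned long first, other, second;

	first = toy_kmap_local();	/* e.g. the allocator clearing the page  */
	toy_kunmap_local();

	other = toy_kmap_local();	/* some unrelated mapping in between     */
	second = toy_kmap_local();	/* e.g. the fault path touching it again */

	printf("first mapping : 0x%lx\n", first);
	printf("other mapping : 0x%lx\n", other);
	printf("second mapping: 0x%lx\n", second);
	printf("same virtual address both times? %s\n",
	       first == second ? "yes" : "no");

	toy_kunmap_local();
	toy_kunmap_local();
	return 0;
}

Which is the point of the question above: two kmap_local() mappings of the same page are not guaranteed to alias (or not alias) the user address the same way.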
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 16:29 ` Matthew Wilcox
@ 2024-12-04 16:58 ` Zi Yan
2024-12-05 8:19 ` Geert Uytterhoeven
2024-12-04 17:33 ` Zi Yan
1 sibling, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 16:58 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Vlastimil Babka, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>
>> Basically, mips needs to flush data cache if kmap address is aliased to
>
> People use "aliased" in contronym ways. Do you mean "has a
> non-congruent alias" or "has a congruent alias"?
I mean if kmap address goes into a different cache line than userspace
address, a cache flush is needed to make sure data is visible to
userspace.
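Put differently, with invented cache geometry rather than any particular CPU: only the virtual index bits above the page offset can differ between the two mappings of one physical page, so this kind of aliasing is possible at all only when one cache way is larger than a page, and whether a flush is needed depends on which sets the two addresses happen to select.

#include <stdio.h>

#define TOY_PAGE_SHIFT 12
#define TOY_LINE_SHIFT 5		/* 32-byte lines, invented */
#define TOY_WAY_SIZE   (32UL << 10)	/* 32 KB per way, invented */
#define TOY_NR_SETS    (TOY_WAY_SIZE >> TOY_LINE_SHIFT)

/* Set index a virtually indexed cache would use for a given address. */
static unsigned long toy_cache_set(unsigned long vaddr)
{
	return (vaddr >> TOY_LINE_SHIFT) & (TOY_NR_SETS - 1);
}

int main(void)
{
	/* Two virtual mappings of the same physical page (addresses made up),
	 * looking at the same byte within the page. */
	unsigned long kaddr = 0x80006000UL;
	unsigned long uaddr = 0x77e2b000UL;
	unsigned long off = 0x123;

	printf("way size %lu KB vs page size %lu KB -> aliasing possible\n",
	       TOY_WAY_SIZE >> 10, 1UL << (TOY_PAGE_SHIFT - 10));
	printf("kernel copy of the byte sits in set %lu\n", toy_cache_set(kaddr + off));
	printf("user copy of the byte sits in set %lu\n", toy_cache_set(uaddr + off));
	return 0;
}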
>
>> userspace address. This means when mips has THP on, the patch below
>> is not enough to fix the issue.
>>
>> In post_alloc_hook(), it does not make sense to pass userspace address
>> in to determine whether to flush dcache or not.
>>
>> One way to fix it is to add something like arch_userpage_post_alloc()
>> to flush dcache if kmap address is aliased to userspace address.
>> But my questions are that
>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>
> No. It just takes the next address in the stack.
So this fix will not work, since it is possible that first kmap and second
kmap have different pages_do_alias() return values.
Another way would be to make a special case for mips, like below.
But that looks ugly, let me think about it more.
diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index bc3e3484c1bf..ef3c6f0b9159 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -95,6 +95,19 @@ struct vm_area_struct;
extern void copy_user_highpage(struct page *to, struct page *from,
unsigned long vaddr, struct vm_area_struct *vma);
+struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
+ unsigned long vaddr)
+ {
+ struct folio *folio;
+
+ folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
+ if (folio)
+ clear_user_highpage(&folio->page, vaddr);
+
+ return folio;
+ }
+#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
+
#define __HAVE_ARCH_COPY_USER_HIGHPAGE
/*
diff --git a/mm/internal.h b/mm/internal.h
index cb8d8e8e3ffa..d513fa683aa3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1287,7 +1287,8 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
static inline bool alloc_zeroed(void)
{
- return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+ return !IS_ENABLED(CONFIG_MIPS) &&
+ static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
&init_on_alloc);
}
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 16:58 ` Zi Yan
@ 2024-12-05 8:19 ` Geert Uytterhoeven
2024-12-05 17:32 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-05 8:19 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
Hi Zi,
On Wed, Dec 4, 2024 at 5:58 PM Zi Yan <ziy@nvidia.com> wrote:
> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
> > On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
> >>> So maybe the clearing done as part of page allocator isn't enough here.
> >>>
> >> Basically, mips needs to flush data cache if kmap address is aliased to
> >
> > People use "aliased" in contronym ways. Do you mean "has a
> > non-congruent alias" or "has a congruent alias"?
>
> I mean if kmap address goes into a different cache line than userspace
> address, a cache flush is needed to make sure data is visible to
> userspace.
>
> >
> >> userspace address. This means when mips has THP on, the patch below
> >> is not enough to fix the issue.
> >>
> >> In post_alloc_hook(), it does not make sense to pass userspace address
> >> in to determine whether to flush dcache or not.
> >>
> >> One way to fix it is to add something like arch_userpage_post_alloc()
> >> to flush dcache if kmap address is aliased to userspace address.
> >> But my questions are that
> >> 1) if kmap address will always be the same for two separate kmap_local() calls,
> >
> > No. It just takes the next address in the stack.
>
> So this fix will not work, since it is possible that first kmap and second
> kmap have different pages_do_alias() return values.
>
> Another way would be to make a special case for mips, like below.
> But that looks ugly, let me think about it more.
>
> diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
> index bc3e3484c1bf..ef3c6f0b9159 100644
> --- a/arch/mips/include/asm/page.h
> +++ b/arch/mips/include/asm/page.h
> @@ -95,6 +95,19 @@ struct vm_area_struct;
> extern void copy_user_highpage(struct page *to, struct page *from,
> unsigned long vaddr, struct vm_area_struct *vma);
>
> +struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> + unsigned long vaddr)
> + {
> + struct folio *folio;
> +
> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> + if (folio)
> + clear_user_highpage(&folio->page, vaddr);
> +
> + return folio;
> + }
> +#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
> +
> #define __HAVE_ARCH_COPY_USER_HIGHPAGE
>
> /*
> diff --git a/mm/internal.h b/mm/internal.h
> index cb8d8e8e3ffa..d513fa683aa3 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1287,7 +1287,8 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
>
> static inline bool alloc_zeroed(void)
> {
> - return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> + return !IS_ENABLED(CONFIG_MIPS) &&
> + static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> &init_on_alloc);
> }
After adding a missing static inline, #include <linux/gfp.h>, and still
getting compile failures, I gave up...
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 8:19 ` Geert Uytterhoeven
@ 2024-12-05 17:32 ` Zi Yan
2024-12-06 8:37 ` Geert Uytterhoeven
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-05 17:32 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 5 Dec 2024, at 3:19, Geert Uytterhoeven wrote:
> Hi Zi,
>
> On Wed, Dec 4, 2024 at 5:58 PM Zi Yan <ziy@nvidia.com> wrote:
>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>
>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>
>>> People use "aliased" in contronym ways. Do you mean "has a
>>> non-congruent alias" or "has a congruent alias"?
>>
>> I mean if kmap address goes into a different cache line than userspace
>> address, a cache flush is needed to make sure data is visible to
>> userspace.
>>
>>>
>>>> userspace address. This means when mips has THP on, the patch below
>>>> is not enough to fix the issue.
>>>>
>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>> in to determine whether to flush dcache or not.
>>>>
>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>> to flush dcache if kmap address is aliased to userspace address.
>>>> But my questions are that
>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>
>>> No. It just takes the next address in the stack.
>>
>> So this fix will not work, since it is possible that first kmap and second
>> kmap have different pages_do_alias() return values.
>>
>> Another way would be to make a special case for mips, like below.
>> But that looks ugly, let me think about it more.
>>
>> diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
>> index bc3e3484c1bf..ef3c6f0b9159 100644
>> --- a/arch/mips/include/asm/page.h
>> +++ b/arch/mips/include/asm/page.h
>> @@ -95,6 +95,19 @@ struct vm_area_struct;
>> extern void copy_user_highpage(struct page *to, struct page *from,
>> unsigned long vaddr, struct vm_area_struct *vma);
>>
>> +struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> + unsigned long vaddr)
>> + {
>> + struct folio *folio;
>> +
>> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
>> + if (folio)
>> + clear_user_highpage(&folio->page, vaddr);
>> +
>> + return folio;
>> + }
>> +#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
>> +
>> #define __HAVE_ARCH_COPY_USER_HIGHPAGE
>>
>> /*
>> diff --git a/mm/internal.h b/mm/internal.h
>> index cb8d8e8e3ffa..d513fa683aa3 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1287,7 +1287,8 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
>>
>> static inline bool alloc_zeroed(void)
>> {
>> - return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
>> + return !IS_ENABLED(CONFIG_MIPS) &&
>> + static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
>> &init_on_alloc);
>> }
>
> After adding a missing static inline, #include <linux/gfp.h>, and still
> getting compile failures, I gave up...
Sorry about that.
Can you try the patch below (it compiles locally for mips and x86) to see
if your issue is fixed? Can you please make THP always on in your config,
since THP is also affected by the same issue? The patch you tested only
fixed the non-THP config.
Thanks. I appreciate your help. :)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 6e452bd8e7e3..d9beb8371daa 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -224,7 +224,13 @@ static inline
struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
unsigned long vaddr)
{
- return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
+ struct folio *folio;
+
+ folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
+ if (folio && alloc_need_zeroing())
+ clear_user_highpage(&folio->page, vaddr);
+
+ return folio;
}
#endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c39c4945946c..6ac0308c4380 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4175,6 +4175,23 @@ static inline int do_mseal(unsigned long start, size_t len_in, unsigned long fla
}
#endif
+/*
+ * alloc_need_zeroing checks if a user folio from page allocator needs to be
+ * zeroed or not.
+ */
+static inline bool alloc_need_zeroing(void)
+{
+ /*
+ * for user folios, arch with cache aliasing requires cache flush and
+ * arc sets folio->flags, so always return false to make caller use
+ * clear_user_page()/clear_user_highpage()
+ */
+ return (IS_ENABLED(CONFIG_ARCH_HAS_CPU_CACHE_ALIASING) ||
+ IS_ENABLED(CONFIG_ARC)) ||
+ !static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+ &init_on_alloc);
+}
+
int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ee335d96fc39..107130a5413a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1176,11 +1176,12 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
folio_throttle_swaprate(folio, gfp);
/*
- * When a folio is not zeroed during allocation (__GFP_ZERO not used),
- * folio_zero_user() is used to make sure that the page corresponding
- * to the faulting address will be hot in the cache after zeroing.
+ * When a folio is not zeroed during allocation (__GFP_ZERO not used)
+ * or user folios require special handling, folio_zero_user() is used to
+ * make sure that the page corresponding to the faulting address will be
+ * hot in the cache after zeroing.
*/
- if (!alloc_zeroed())
+ if (alloc_need_zeroing())
folio_zero_user(folio, addr);
/*
* The memory barrier inside __folio_mark_uptodate makes sure that
diff --git a/mm/internal.h b/mm/internal.h
index cb8d8e8e3ffa..3bd08bafad04 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1285,12 +1285,6 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, bool write);
-static inline bool alloc_zeroed(void)
-{
- return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
- &init_on_alloc);
-}
-
/*
* Parses a string with mem suffixes into its order. Useful to parse kernel
* parameters.
diff --git a/mm/memory.c b/mm/memory.c
index 75c2dfd04f72..cf1611791856 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4733,12 +4733,12 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
folio_throttle_swaprate(folio, gfp);
/*
* When a folio is not zeroed during allocation
- * (__GFP_ZERO not used), folio_zero_user() is used
- * to make sure that the page corresponding to the
- * faulting address will be hot in the cache after
- * zeroing.
+ * (__GFP_ZERO not used) or user folios require special
+ * handling, folio_zero_user() is used to make sure
+ * that the page corresponding to the faulting address
+ * will be hot in the cache after zeroing.
*/
- if (!alloc_zeroed())
+ if (alloc_need_zeroing())
folio_zero_user(folio, vmf->address);
return folio;
}
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 17:32 ` Zi Yan
@ 2024-12-06 8:37 ` Geert Uytterhoeven
0 siblings, 0 replies; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-06 8:37 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
Hi Zi,
On Thu, Dec 5, 2024 at 6:33 PM Zi Yan <ziy@nvidia.com> wrote:
> Can you try the patch below (it compiles locally for mips and x86) to see
> if your issue is fixed? Can you please make THP always on in your config,
> since THP is also affected by the same issue? The patch you tested only
> fixed the non-THP config.
Thanks, this works both without THP, and with
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Thanks. I appreciate your help. :)
Thanks, I appreciate your quick responses and solutions!
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,7 +224,13 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> + struct folio *folio;
> +
> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> + if (folio && alloc_need_zeroing())
> + clear_user_highpage(&folio->page, vaddr);
> +
> + return folio;
> }
> #endif
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c39c4945946c..6ac0308c4380 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4175,6 +4175,23 @@ static inline int do_mseal(unsigned long start, size_t len_in, unsigned long fla
> }
> #endif
>
> +/*
> + * alloc_need_zeroing checks if a user folio from page allocator needs to be
> + * zeroed or not.
> + */
> +static inline bool alloc_need_zeroing(void)
> +{
> + /*
> + * for user folios, arch with cache aliasing requires cache flush and
> + * arc sets folio->flags, so always return true to make caller use
> + * clear_user_page()/clear_user_highpage()
> + */
> + return (IS_ENABLED(CONFIG_ARCH_HAS_CPU_CACHE_ALIASING) ||
> + IS_ENABLED(CONFIG_ARC)) ||
> + !static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> + &init_on_alloc);
> +}
> +
> int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
> int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
> int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ee335d96fc39..107130a5413a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1176,11 +1176,12 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
> folio_throttle_swaprate(folio, gfp);
>
> /*
> - * When a folio is not zeroed during allocation (__GFP_ZERO not used),
> - * folio_zero_user() is used to make sure that the page corresponding
> - * to the faulting address will be hot in the cache after zeroing.
> + * When a folio is not zeroed during allocation (__GFP_ZERO not used)
> + * or user folios require special handling, folio_zero_user() is used to
> + * make sure that the page corresponding to the faulting address will be
> + * hot in the cache after zeroing.
> */
> - if (!alloc_zeroed())
> + if (alloc_need_zeroing())
> folio_zero_user(folio, addr);
> /*
> * The memory barrier inside __folio_mark_uptodate makes sure that
> diff --git a/mm/internal.h b/mm/internal.h
> index cb8d8e8e3ffa..3bd08bafad04 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1285,12 +1285,6 @@ void touch_pud(struct vm_area_struct *vma, unsigned long addr,
> void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t *pmd, bool write);
>
> -static inline bool alloc_zeroed(void)
> -{
> - return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> - &init_on_alloc);
> -}
> -
> /*
> * Parses a string with mem suffixes into its order. Useful to parse kernel
> * parameters.
> diff --git a/mm/memory.c b/mm/memory.c
> index 75c2dfd04f72..cf1611791856 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4733,12 +4733,12 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> folio_throttle_swaprate(folio, gfp);
> /*
> * When a folio is not zeroed during allocation
> - * (__GFP_ZERO not used), folio_zero_user() is used
> - * to make sure that the page corresponding to the
> - * faulting address will be hot in the cache after
> - * zeroing.
> + * (__GFP_ZERO not used) or user folios require special
> + * handling, folio_zero_user() is used to make sure
> + * that the page corresponding to the faulting address
> + * will be hot in the cache after zeroing.
> */
> - if (!alloc_zeroed())
> + if (alloc_need_zeroing())
> folio_zero_user(folio, vmf->address);
> return folio;
> }
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 16:29 ` Matthew Wilcox
2024-12-04 16:58 ` Zi Yan
@ 2024-12-04 17:33 ` Zi Yan
2024-12-04 17:46 ` Vlastimil Babka
2024-12-04 18:30 ` Zi Yan
1 sibling, 2 replies; 32+ messages in thread
From: Zi Yan @ 2024-12-04 17:33 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Vlastimil Babka, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>
>> Basically, mips needs to flush data cache if kmap address is aliased to
>
> People use "aliased" in contronym ways. Do you mean "has a
> non-congruent alias" or "has a congruent alias"?
>
>> userspace address. This means when mips has THP on, the patch below
>> is not enough to fix the issue.
>>
>> In post_alloc_hook(), it does not make sense to pass userspace address
>> in to determine whether to flush dcache or not.
>>
>> One way to fix it is to add something like arch_userpage_post_alloc()
>> to flush dcache if kmap address is aliased to userspace address.
>> But my questions are that
>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>
> No. It just takes the next address in the stack.
Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
causing issues before my patch? In the page allocator, the page is zeroed
from one kmap address without flush, then clear_user_highpage() clears
it again with another kmap address with flush. After returning to userspace,
the user application works on the page but when the cache line used by
init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
Am I missing anything? Or all arch with cache aliasing never enables
init_on_alloc?
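Roughly the sequence I have in mind (pseudocode only; kaddr0/kaddr1 stand
for two hypothetical kmap_local addresses, the real code goes through
post_alloc_hook() and vma_alloc_zeroed_movable_folio()):

	/* page allocator, init_on_alloc=1 */
	clear_highpage(page);			/* kmap_local -> kaddr0, zeroes, no dcache flush */

	/* anonymous fault path, before my patch */
	clear_user_highpage(page, vaddr);	/* kmap_local -> kaddr1, zeroes again, flushes kaddr1 only */

	/* userspace then writes its data through the user mapping; any stale
	 * zero-filled lines at kaddr0 would only be evicted later and written
	 * back over the user's data */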
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 17:33 ` Zi Yan
@ 2024-12-04 17:46 ` Vlastimil Babka
2024-12-04 18:13 ` Zi Yan
2024-12-04 18:30 ` Zi Yan
1 sibling, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2024-12-04 17:46 UTC (permalink / raw)
To: Zi Yan, Matthew Wilcox
Cc: Geert Uytterhoeven, linux-mm, Andrew Morton, David Hildenbrand,
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 12/4/24 18:33, Zi Yan wrote:
> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>
>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>
>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>
>> People use "aliased" in contronym ways. Do you mean "has a
>> non-congruent alias" or "has a congruent alias"?
>>
>>> userspace address. This means when mips has THP on, the patch below
>>> is not enough to fix the issue.
>>>
>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>> in to determine whether to flush dcache or not.
>>>
>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>> to flush dcache if kmap address is aliased to userspace address.
>>> But my questions are that
>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>
>> No. It just takes the next address in the stack.
>
> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
> causing issues before my patch? In the page allocator, the page is zeroed
> from one kmap address without flush, then clear_user_highpage() clears
> it again with another kmap address with flush. After returning to userspace,
> the user application works on the page but when the cache line used by
> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
> Am I missing anything? Or all arch with cache aliasing never enables
> init_on_alloc?
Maybe the arch also defines some hooks like arch_kmap_local_post_unmap() ?
As for the fix, could it rely on e.g. __HAVE_ARCH_COPY_USER_HIGHPAGE instead
of CONFIG_MIPS? That affects more arches, I don't know if we broke only mips
or others too.
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 17:46 ` Vlastimil Babka
@ 2024-12-04 18:13 ` Zi Yan
2024-12-04 18:16 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 18:13 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Matthew Wilcox, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 4 Dec 2024, at 12:46, Vlastimil Babka wrote:
> On 12/4/24 18:33, Zi Yan wrote:
>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>
>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>
>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>
>>> People use "aliased" in contronym ways. Do you mean "has a
>>> non-congruent alias" or "has a congruent alias"?
>>>
>>>> userspace address. This means when mips has THP on, the patch below
>>>> is not enough to fix the issue.
>>>>
>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>> in to determine whether to flush dcache or not.
>>>>
>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>> to flush dcache if kmap address is aliased to userspace address.
>>>> But my questions are that
>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>
>>> No. It just takes the next address in the stack.
>>
>> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
>> causing issues before my patch? In the page allocator, the page is zeroed
>> from one kmap address without flush, then clear_user_highpage() clears
>> it again with another kmap address with flush. After returning to userspace,
>> the user application works on the page but when the cache line used by
>> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
>> Am I missing anything? Or all arch with cache aliasing never enables
>> init_on_alloc?
>
> Maybe the arch also defines some hooks like arch_kmap_local_post_unmap() ?
But this does not solve the possible init_on_alloc issue, since init_on_alloc
is done in mm/page_alloc.c without userspace address and has no knowledge of
whether the zeroed page will be used in userspace nor the cache line will
be the same as the userspace page cache line. If dcache is flushed
unconditionally for kmap_local, that could degrade performance.
> As for the fix, could it rely on e.g. __HAVE_ARCH_COPY_USER_HIGHPAGE instead
> of CONFIG_MIPS? That affects more arches, I don't know if we broke only mips
> or others too.
Yes, this is much better, since this issue affects any arch with cache aliasing.
Let me update my fix. Thanks.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 18:13 ` Zi Yan
@ 2024-12-04 18:16 ` Zi Yan
2024-12-04 21:21 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 18:16 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Matthew Wilcox, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 4 Dec 2024, at 13:13, Zi Yan wrote:
> On 4 Dec 2024, at 12:46, Vlastimil Babka wrote:
>
>> On 12/4/24 18:33, Zi Yan wrote:
>>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>>
>>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>>
>>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>>
>>>> People use "aliased" in contronym ways. Do you mean "has a
>>>> non-congruent alias" or "has a congruent alias"?
>>>>
>>>>> userspace address. This means when mips has THP on, the patch below
>>>>> is not enough to fix the issue.
>>>>>
>>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>>> in to determine whether to flush dcache or not.
>>>>>
>>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>>> to flush dcache if kmap address is aliased to userspace address.
>>>>> But my questions are that
>>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>>
>>>> No. It just takes the next address in the stack.
>>>
>>> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
>>> causing issues before my patch? In the page allocator, the page is zeroed
>>> from one kmap address without flush, then clear_user_highpage() clears
>>> it again with another kmap address with flush. After returning to userspace,
>>> the user application works on the page but when the cache line used by
>>> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
>>> Am I missing anything? Or all arch with cache aliasing never enables
>>> init_on_alloc?
>>
>> Maybe the arch also defines some hooks like arch_kmap_local_post_unmap() ?
>
> But this does not solve the possible init_on_alloc issue, since init_on_alloc
> is done in mm/page_alloc.c without userspace address and has no knowledge of
> whether the zeroed page will be used in userspace nor the cache line will
> be the same as the userspace page cache line. If dcache is flushed
> unconditionally for kmap_local, that could degrade performance.
>
>> As for the fix, could it rely on e.g. __HAVE_ARCH_COPY_USER_HIGHPAGE instead
>> of CONFIG_MIPS? That affects more arches, I don't know if we broke only mips
>> or others too.
>
> Yes, this is much better, since this issue affects any arch with cache aliasing.
> Let me update my fix. Thanks.
I notice that arm64 has __HAVE_ARCH_COPY_USER_HIGHPAGE defined, so I will
need to look for an alternative.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 18:16 ` Zi Yan
@ 2024-12-04 21:21 ` Zi Yan
2024-12-04 21:24 ` John Hubbard
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 21:21 UTC (permalink / raw)
To: Vlastimil Babka, Kees Cook, Alexander Potapenko
Cc: Matthew Wilcox, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, linux-kernel, linux-mips
On 4 Dec 2024, at 13:16, Zi Yan wrote:
> On 4 Dec 2024, at 13:13, Zi Yan wrote:
>
>> On 4 Dec 2024, at 12:46, Vlastimil Babka wrote:
>>
>>> On 12/4/24 18:33, Zi Yan wrote:
>>>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>>>
>>>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>>>
>>>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>>>
>>>>> People use "aliased" in contronym ways. Do you mean "has a
>>>>> non-congruent alias" or "has a congruent alias"?
>>>>>
>>>>>> userspace address. This means when mips has THP on, the patch below
>>>>>> is not enough to fix the issue.
>>>>>>
>>>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>>>> in to determine whether to flush dcache or not.
>>>>>>
>>>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>>>> to flush dcache if kmap address is aliased to userspace address.
>>>>>> But my questions are that
>>>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>>>
>>>>> No. It just takes the next address in the stack.
>>>>
>>>> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
>>>> causing issues before my patch? In the page allocator, the page is zeroed
>>>> from one kmap address without flush, then clear_user_highpage() clears
>>>> it again with another kmap address with flush. After returning to userspace,
>>>> the user application works on the page but when the cache line used by
>>>> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
>>>> Am I missing anything? Or all arch with cache aliasing never enables
>>>> init_on_alloc?
>>>
>>> Maybe the arch also defines some hooks like arch_kmap_local_post_unmap() ?
>>
>> But this does not solve the possible init_on_alloc issue, since init_on_alloc
>> is done in mm/page_alloc.c without userspace address and has no knowledge of
>> whether the zeroed page will be used in userspace nor the cache line will
>> be the same as the userspace page cache line. If dcache is flushed
>> unconditionally for kmap_local, that could degrade performance.
>>
>>> As for the fix, could it rely on e.g. __HAVE_ARCH_COPY_USER_HIGHPAGE instead
>>> of CONFIG_MIPS? That affects more arches, I don't know if we broke only mips
>>> or others too.
>>
>> Yes, this is much better, since this issue affects any arch with cache aliasing.
>> Let me update my fix. Thanks.
>
> I notice that arm64 has __HAVE_ARCH_COPY_USER_HIGHPAGE defined, so I will
> need to look for an alternative.
It turns out sh, sparc, arm, xtensa, nios2, m68k, parisc, csky, and powerpc all have cache flush operations in clear_user_page() compared to clear_page() and
arc clears PG_dc_clean bit in addition to clear_page().
So __GFP_ZERO cannot simply replace clear_user_page()/clear_user_highpage().
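The extra work is roughly of this shape (a generic sketch, not any arch's
actual code; arch_post_clear_user_page() is a made-up name for the per-arch
part):

	void clear_user_page(void *kaddr, unsigned long uvaddr, struct page *page)
	{
		clear_page(kaddr);			/* the part __GFP_ZERO already covers */
		arch_post_clear_user_page(kaddr, uvaddr, page);
							/* hypothetical: dcache flush on aliasing
							 * caches, PG_dc_clean handling on arc, ... */
	}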
I can add ARCH_REQUIRE_CLEAR_USER_PAGE for these arch and use it to decide
whether clear_user_page()/clear_user_highpage() needs to be used regardless of
the presence of init_on_alloc.
I also wonder if INIT_ON_ALLOC_DEFAULT_ON works on these arch or not.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 21:21 ` Zi Yan
@ 2024-12-04 21:24 ` John Hubbard
0 siblings, 0 replies; 32+ messages in thread
From: John Hubbard @ 2024-12-04 21:24 UTC (permalink / raw)
To: Zi Yan, Vlastimil Babka, Kees Cook, Alexander Potapenko
Cc: Matthew Wilcox, Geert Uytterhoeven, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, Huang, Ying,
Ryan Roberts, linux-kernel, linux-mips
On 12/4/24 1:21 PM, Zi Yan wrote:
> On 4 Dec 2024, at 13:16, Zi Yan wrote:
>
>> On 4 Dec 2024, at 13:13, Zi Yan wrote:
>>
>>> On 4 Dec 2024, at 12:46, Vlastimil Babka wrote:
>>>
>>>> On 12/4/24 18:33, Zi Yan wrote:
>>>>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>>>>
>>>>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>>>>
>>>>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>>>>
>>>>>> People use "aliased" in contronym ways. Do you mean "has a
>>>>>> non-congruent alias" or "has a congruent alias"?
>>>>>>
>>>>>>> userspace address. This means when mips has THP on, the patch below
>>>>>>> is not enough to fix the issue.
>>>>>>>
>>>>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>>>>> in to determine whether to flush dcache or not.
>>>>>>>
>>>>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>>>>> to flush dcache if kmap address is aliased to userspace address.
>>>>>>> But my questions are that
>>>>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>>>>
>>>>>> No. It just takes the next address in the stack.
>>>>>
>>>>> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
>>>>> causing issues before my patch? In the page allocator, the page is zeroed
>>>>> from one kmap address without flush, then clear_user_highpage() clears
>>>>> it again with another kmap address with flush. After returning to userspace,
>>>>> the user application works on the page but when the cache line used by
>>>>> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
>>>>> Am I missing anything? Or all arch with cache aliasing never enables
>>>>> init_on_alloc?
>>>>
>>>> Maybe the arch also defines some hooks like arch_kmap_local_post_unmap() ?
>>>
>>> But this does not solve the possible init_on_alloc issue, since init_on_alloc
>>> is done in mm/page_alloc.c without userspace address and has no knowledge of
>>> whether the zeroed page will be used in userspace nor the cache line will
>>> be the same as the userspace page cache line. If dcache is flushed
>>> unconditionally for kmap_local, that could degrade performance.
>>>
>>>> As for the fix, could it rely on e.g. __HAVE_ARCH_COPY_USER_HIGHPAGE instead
>>>> of CONFIG_MIPS? That affects more arches, I don't know if we broke only mips
>>>> or others too.
>>>
>>> Yes, this is much better, since this issue affects any arch with cache aliasing.
>>> Let me update my fix. Thanks.
>>
>> I notice that arm64 has __HAVE_ARCH_COPY_USER_HIGHPAGE defined, so I will
>> need to look for an alternative.
>
> It turns out sh, sparc, arm, xtensa, nios2, m68k, parisc, csky, and powerpc all have cache flush operations in clear_user_page() compared to clear_page() and
> arc clears PG_dc_clean bit in addition to clear_page().
>
> So __GFP_ZERO cannot simply replace clear_user_page()/clear_user_highpage().
> I can add ARCH_REQUIRE_CLEAR_USER_PAGE for these arch and use it to decide
> whether clear_user_page()/clear_user_highpage() needs to be used regardless of
> the presence of init_on_alloc.
>
> I also wonder if INIT_ON_ALLOC_DEFAULT_ON works on these arch or not.
>
Well, I've been waiting to point out that if you actually *delete* the
entire INIT_ON_ALLOC feature, you'd be my personal hero. Defense in depth
is nice, but at some point, it crosses a line into the absurd, and I think
we are there. </pause and put on asbestos flame suit> :)
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 17:33 ` Zi Yan
2024-12-04 17:46 ` Vlastimil Babka
@ 2024-12-04 18:30 ` Zi Yan
2024-12-05 8:04 ` Geert Uytterhoeven
1 sibling, 1 reply; 32+ messages in thread
From: Zi Yan @ 2024-12-04 18:30 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 4 Dec 2024, at 12:33, Zi Yan wrote:
> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>
>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>
>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>
>> People use "aliased" in contronym ways. Do you mean "has a
>> non-congruent alias" or "has a congruent alias"?
>>
>>> userspace address. This means when mips has THP on, the patch below
>>> is not enough to fix the issue.
>>>
>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>> in to determine whether to flush dcache or not.
>>>
>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>> to flush dcache if kmap address is aliased to userspace address.
>>> But my questions are that
>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>
>> No. It just takes the next address in the stack.
>
> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
> causing issues before my patch? In the page allocator, the page is zeroed
> from one kmap address without flush, then clear_user_highpage() clears
> it again with another kmap address with flush. After returning to userspace,
> the user application works on the page but when the cache line used by
> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
> Am I missing anything? Or all arch with cache aliasing never enables
> init_on_alloc?
Hi Geert,
Regarding the above concern, have you ever had CONFIG_INIT_ON_ALLOC_DEFAULT_ON
for your MIPS machine and encountered any issue? Or let me know if my reasoning
above is flawed.
To test it, I wonder if you can 1) revert my patch and 2) turn on
CONFIG_INIT_ON_ALLOC_DEFAULT_ON for your MIPS machine and run some applications
to see if any error happens.
Thanks.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 18:30 ` Zi Yan
@ 2024-12-05 8:04 ` Geert Uytterhoeven
2024-12-05 8:10 ` David Hildenbrand
0 siblings, 1 reply; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-05 8:04 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
David Hildenbrand, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
Hi Zi,
On Wed, Dec 4, 2024 at 7:30 PM Zi Yan <ziy@nvidia.com> wrote:
> On 4 Dec 2024, at 12:33, Zi Yan wrote:
> > On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
> >> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
> >>>> So maybe the clearing done as part of page allocator isn't enough here.
> >>>>
> >>> Basically, mips needs to flush data cache if kmap address is aliased to
> >>
> >> People use "aliased" in contronym ways. Do you mean "has a
> >> non-congruent alias" or "has a congruent alias"?
> >>
> >>> userspace address. This means when mips has THP on, the patch below
> >>> is not enough to fix the issue.
> >>>
> >>> In post_alloc_hook(), it does not make sense to pass userspace address
> >>> in to determine whether to flush dcache or not.
> >>>
> >>> One way to fix it is to add something like arch_userpage_post_alloc()
> >>> to flush dcache if kmap address is aliased to userspace address.
> >>> But my questions are that
> >>> 1) if kmap address will always be the same for two separate kmap_local() calls,
> >>
> >> No. It just takes the next address in the stack.
> >
> > Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
> > causing issues before my patch? In the page allocator, the page is zeroed
> > from one kmap address without flush, then clear_user_highpage() clears
> > it again with another kmap address with flush. After returning to userspace,
> > the user application works on the page but when the cache line used by
> > init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
> > Am I missing anything? Or all arch with cache aliasing never enables
> > init_on_alloc?
>
> Hi Geert,
>
> Regarding the above concern, have you ever had CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> for your MIPS machine and encountered any issue? Or let me know if my reasoning
> above is flawed.
>
> To test it, I wonder if you can 1) revert my patch and 2) turn on
> CONFIG_INIT_ON_ALLOC_DEFAULT_ON for your MIPS machine and run some applications
> to see if any error happens.
That seems to work fine...
Kernel log confirms it's enabled:
-mem auto-init: stack:off, heap alloc:off, heap free:off
+mem auto-init: stack:off, heap alloc:on, heap free:off
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 8:04 ` Geert Uytterhoeven
@ 2024-12-05 8:10 ` David Hildenbrand
2024-12-05 16:05 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand @ 2024-12-05 8:10 UTC (permalink / raw)
To: Geert Uytterhoeven, Zi Yan
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 05.12.24 09:04, Geert Uytterhoeven wrote:
> Hi Zi,
>
> On Wed, Dec 4, 2024 at 7:30 PM Zi Yan <ziy@nvidia.com> wrote:
>> On 4 Dec 2024, at 12:33, Zi Yan wrote:
>>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>>
>>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>>
>>>> People use "aliased" in contronym ways. Do you mean "has a
>>>> non-congruent alias" or "has a congruent alias"?
>>>>
>>>>> userspace address. This means when mips has THP on, the patch below
>>>>> is not enough to fix the issue.
>>>>>
>>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>>> in to determine whether to flush dcache or not.
>>>>>
>>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>>> to flush dcache if kmap address is aliased to userspace address.
>>>>> But my questions are that
>>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>>
>>>> No. It just takes the next address in the stack.
>>>
>>> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
>>> causing issues before my patch? In the page allocator, the page is zeroed
>>> from one kmap address without flush, then clear_user_highpage() clears
>>> it again with another kmap address with flush. After returning to userspace,
>>> the user application works on the page but when the cache line used by
>>> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
>>> Am I missing anything? Or all arch with cache aliasing never enables
>>> init_on_alloc?
>>
>> Hi Geert,
>>
>> Regarding the above concern, have you ever had CONFIG_INIT_ON_ALLOC_DEFAULT_ON
>> for your MIPS machine and encountered any issue? Or let me know if my reasoning
>> above is flawed.
>>
>> To test it, I wonder if you can 1) revert my patch and 2) turn on
>> CONFIG_INIT_ON_ALLOC_DEFAULT_ON for your MIPS machine and run some applications
>> to see if any error happens.
>
> That seems to work fine...
>
> Kernel log confirms it's enabled:
> -mem auto-init: stack:off, heap alloc:off, heap free:off
> +mem auto-init: stack:off, heap alloc:on, heap free:off
If I'm not wrong that's expected ... because we'll be double-zeroing
that memory, clearing the cache :)
I guess the question is, how *effective* is
CONFIG_INIT_ON_ALLOC_DEFAULT_ON on systems to prevent exposing un-zeroed
data to userspace, when it doesn't end up doing the flush we really need.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 8:10 ` David Hildenbrand
@ 2024-12-05 16:05 ` Zi Yan
2024-12-05 17:24 ` Vlastimil Babka
2024-12-06 8:03 ` Geert Uytterhoeven
0 siblings, 2 replies; 32+ messages in thread
From: Zi Yan @ 2024-12-05 16:05 UTC (permalink / raw)
To: Geert Uytterhoeven, David Hildenbrand
Cc: Matthew Wilcox, Vlastimil Babka, linux-mm, Andrew Morton,
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
On 5 Dec 2024, at 3:10, David Hildenbrand wrote:
> On 05.12.24 09:04, Geert Uytterhoeven wrote:
>> Hi Zi,
>>
>> On Wed, Dec 4, 2024 at 7:30 PM Zi Yan <ziy@nvidia.com> wrote:
>>> On 4 Dec 2024, at 12:33, Zi Yan wrote:
>>>> On 4 Dec 2024, at 11:29, Matthew Wilcox wrote:
>>>>> On Wed, Dec 04, 2024 at 11:16:51AM -0500, Zi Yan wrote:
>>>>>>> So maybe the clearing done as part of page allocator isn't enough here.
>>>>>>>
>>>>>> Basically, mips needs to flush data cache if kmap address is aliased to
>>>>>
>>>>> People use "aliased" in contronym ways. Do you mean "has a
>>>>> non-congruent alias" or "has a congruent alias"?
>>>>>
>>>>>> userspace address. This means when mips has THP on, the patch below
>>>>>> is not enough to fix the issue.
>>>>>>
>>>>>> In post_alloc_hook(), it does not make sense to pass userspace address
>>>>>> in to determine whether to flush dcache or not.
>>>>>>
>>>>>> One way to fix it is to add something like arch_userpage_post_alloc()
>>>>>> to flush dcache if kmap address is aliased to userspace address.
>>>>>> But my questions are that
>>>>>> 1) if kmap address will always be the same for two separate kmap_local() calls,
>>>>>
>>>>> No. It just takes the next address in the stack.
>>>>
>>>> Hmm, if kmap_local() gives different addresses, wouldn’t init_on_alloc be
>>>> causing issues before my patch? In the page allocator, the page is zeroed
>>>> from one kmap address without flush, then clear_user_highpage() clears
>>>> it again with another kmap address with flush. After returning to userspace,
>>>> the user application works on the page but when the cache line used by
>>>> init_on_alloc is written back (with 0s) at eviction, user data is corrupted.
>>>> Am I missing anything? Or all arch with cache aliasing never enables
>>>> init_on_alloc?
>>>
>>> Hi Geert,
>>>
>>> Regarding the above concern, have you ever had CONFIG_INIT_ON_ALLOC_DEFAULT_ON
>>> for your MIPS machine and encountered any issue? Or let me know if my reasoning
>>> above is flawed.
>>>
>>> To test it, I wonder if you can 1) revert my patch and 2) turn on
>>> CONFIG_INIT_ON_ALLOC_DEFAULT_ON for your MIPS machine and run some applications
>>> to see if any error happens.
>>
>> That seems to work fine...
>>
>> Kernel log confirms it's enabled:
>> -mem auto-init: stack:off, heap alloc:off, heap free:off
>> +mem auto-init: stack:off, heap alloc:on, heap free:off
>
> If I'm not wrong that's expected ... because we'll be double-zeroing that memory, clearing the cache :)
>
> I guess the question is, how *effective* is CONFIG_INIT_ON_ALLOC_DEFAULT_ON on systems to prevent exposing un-zeroed data to userspace, when it doesn't end up doing the flush we really need.
Hi Geert,
Is it possible to run a 32bit kernel with HIGHMEM and
CONFIG_INIT_ON_ALLOC_DEFAULT_ON on the machine (of course with my patch
reverted)? Just to check my reasoning below.
Thanks.
Yes, it should work; I forgot that the actual issue is HIGHMEM plus cache flush, not just that a cache flush is needed after clearing a user page.
For arch which needs to flush cache after clearing user page, with HIGHMEM,
init_on_alloc first clears the page using kmap_addr0 without flushing
the cache, then clear_user_page() clears the page using kmap_addr1
with cache flush. After returning to userspace, the cache lines of
kmap_addr0 will be evicted and written back to RAM eventually, corrupting
user data with 0s, because no one flushes them before returning to userspace.
For a proper fix, I will add ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE and
make mips, sh, sparc, arm, xtensa, nios2, m68k, parisc, csky, arc, and powerpc
select it, then make alloc_zeroed() return false if
ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE is enabled.
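In code, roughly (a sketch only; ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE does not
exist yet, it is the symbol proposed above):

	static inline bool alloc_zeroed(void)
	{
		/*
		 * arch still needs its extra ops after clearing a user page,
		 * so the fault paths must keep going through
		 * clear_user_highpage()/folio_zero_user()
		 */
		if (IS_ENABLED(CONFIG_ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE))
			return false;

		return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
					   &init_on_alloc);
	}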
If my reasoning above is verified to be true, I will send a separate patch
to disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON if HIGHMEM &&
ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 16:05 ` Zi Yan
@ 2024-12-05 17:24 ` Vlastimil Babka
2024-12-05 17:38 ` Zi Yan
2024-12-06 8:03 ` Geert Uytterhoeven
1 sibling, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2024-12-05 17:24 UTC (permalink / raw)
To: Zi Yan, Geert Uytterhoeven, David Hildenbrand
Cc: Matthew Wilcox, linux-mm, Andrew Morton, Miaohe Lin, Kefeng Wang,
John Hubbard, Huang, Ying, Ryan Roberts, Alexander Potapenko,
Kees Cook, linux-kernel, linux-mips
On 12/5/24 17:05, Zi Yan wrote:
> On 5 Dec 2024, at 3:10, David Hildenbrand wrote:
>>>
>>> Kernel log confirms it's enabled:
>>> -mem auto-init: stack:off, heap alloc:off, heap free:off
>>> +mem auto-init: stack:off, heap alloc:on, heap free:off
>>
>> If I'm not wrong that's expected ... because we'll be double-zeroing that memory, clearing the cache :)
>>
>> I guess the question is, how *effective* is CONFIG_INIT_ON_ALLOC_DEFAULT_ON on systems to prevent exposing un-zeroed data to userspace, when it doesn't end up doing the flush we really need.
>
> Hi Geert,
>
> Is it possible to run a 32bit kernel with HIGHMEM and
> CONFIG_INIT_ON_ALLOC_DEFAULT_ON on the machine (of course with my patch
> reverted)? Just to check my reasoning below.
>
> Thanks.
>
>
> Yes, it should work; I forgot that the actual issue is HIGHMEM plus cache flush, not just that a cache flush is needed after clearing a user page.
>
> For arch which needs to flush cache after clearing user page, with HIGHMEM,
> init_on_alloc first clears the page using kmap_addr0 without flushing
> the cache, then clear_user_page() clears the page using kmap_addr1
> with cache flush. After returning to userspace, the cache lines of
> kmap_addr0 will be evicted and written back to RAM eventually, corrupting
> user data with 0s, because no one flushes them before returning to userspace.
>
> For a proper fix, I will add ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE and
> make mips, sh, sparc, arm, xtensa, nios2, m68k, parisc, csky, arc, and powerpc
> select it, then make alloc_zeroed() return false if
> ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE is enabled.
>
> If my reasoning above is verified to be true, I will send a separate patch
> to disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON if HIGHMEM &&
> ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE.
If your reasoning is true, wouldn't any other user of kmap_local_page() of a
highpage on such system also leave the cache unflushed in case the page is
ever reused as a userspace page?
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 17:24 ` Vlastimil Babka
@ 2024-12-05 17:38 ` Zi Yan
0 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2024-12-05 17:38 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Geert Uytterhoeven, David Hildenbrand, Matthew Wilcox, linux-mm,
Andrew Morton, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
On 5 Dec 2024, at 12:24, Vlastimil Babka wrote:
> On 12/5/24 17:05, Zi Yan wrote:
>> On 5 Dec 2024, at 3:10, David Hildenbrand wrote:
>>>>
>>>> Kernel log confirms it's enabled:
>>>> -mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> +mem auto-init: stack:off, heap alloc:on, heap free:off
>>>
>>> If I'm not wrong that's expected ... because we'll be double-zeroing that memory, clearing the cache :)
>>>
>>> I guess the question is, how *effective* is CONFIG_INIT_ON_ALLOC_DEFAULT_ON on systems to prevent exposing un-zeroed data to userspace, when it doesn't end up doing the flush we really need.
>>
>> Hi Geert,
>>
>> Is it possible to run a 32bit kernel with HIGHMEM and
>> CONFIG_INIT_ON_ALLOC_DEFAULT_ON on the machine (of course with my patch
>> reverted)? Just to check my reasoning below.
>>
>> Thanks.
>>
>>
>> Yes, it should work; I forgot that the actual issue is HIGHMEM plus cache flush, not just that a cache flush is needed after clearing a user page.
>>
>> For arch which needs to flush cache after clearing user page, with HIGHMEM,
>> init_on_alloc first clears the page using kmap_addr0 without flushing
>> the cache, then clear_user_page() clears the page using kmap_addr1
>> with cache flush. After returning to userspace, the cache lines of
>> kmap_addr0 will be evicted and written back to RAM eventually, corrupting
>> user data with 0s, because no one flushes them before returning to userspace.
>>
>> For a proper fix, I will add ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE and
>> make mips, sh, sparc, arm, xtensa, nios2, m68k, parisc, csky, arc, and powerpc
>> select it, then make alloc_zeroed() return false if
>> ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE is enabled.
>>
>> If my reasoning above is verified to be true, I will send a separate patch
>> to disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON if HIGHMEM &&
>> ARCH_HAS_OPS_AFTER_CLEAR_USER_PAGE.
>
> If your reasoning is true, wouldn't any other user of kmap_local_page() of a
> highpage on such system also leave the cache unflushed in case the page is
> ever reused as a userspace page?
If the page is written and no cache is flushed, yes. But if the page is read
and the cache lines are clean, no write back will be done.
I wonder in what scenarios the kernel writes to user pages, besides copy_to_user*(),
which should handle the cache flush.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-05 16:05 ` Zi Yan
2024-12-05 17:24 ` Vlastimil Babka
@ 2024-12-06 8:03 ` Geert Uytterhoeven
1 sibling, 0 replies; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-06 8:03 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka, linux-mm,
Andrew Morton, Miaohe Lin, Kefeng Wang, John Hubbard, Huang,
Ying, Ryan Roberts, Alexander Potapenko, Kees Cook, linux-kernel,
linux-mips
Hi Zi,
On Thu, Dec 5, 2024 at 5:05 PM Zi Yan <ziy@nvidia.com> wrote:
> Is it possible to run a 32bit kernel with HIGHMEM and
> CONFIG_INIT_ON_ALLOC_DEFAULT_ON on the machine (of course with my patch
> reverted)? Just to check my reasoning below.
While I can boot 32-bit kernels, CPU_TX49XX and MACH_TX49XX do not
select CPU_SUPPORTS_HIGHMEM and SYS_SUPPORTS_HIGHMEM respectively, and thus
HIGHMEM is not supported.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] mm: avoid zeroing user movable page twice with init_on_alloc=1
2024-12-04 15:24 ` Zi Yan
2024-12-04 15:41 ` Vlastimil Babka
@ 2024-12-05 8:15 ` Geert Uytterhoeven
1 sibling, 0 replies; 32+ messages in thread
From: Geert Uytterhoeven @ 2024-12-05 8:15 UTC (permalink / raw)
To: Zi Yan
Cc: linux-mm, Andrew Morton, David Hildenbrand,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, Ryan Roberts,
Alexander Potapenko, Kees Cook, linux-kernel, linux-mips
Hi Zi,
On Wed, Dec 4, 2024 at 4:24 PM Zi Yan <ziy@nvidia.com> wrote:
> On 4 Dec 2024, at 5:41, Geert Uytterhoeven wrote:
> > On Fri, Oct 11, 2024 at 5:13 PM Zi Yan <ziy@nvidia.com> wrote:
> >> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> >> init_on_free=1 boot options") forces allocated page to be zeroed in
> >> post_alloc_hook() when init_on_alloc=1.
> >>
> >> For order-0 folios, if arch does not define
> >> vma_alloc_zeroed_movable_folio(), the default implementation again zeros
> >> the page return from the buddy allocator. So the page is zeroed twice.
> >> Fix it by passing __GFP_ZERO instead to avoid double page zeroing.
> >> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> >> define their own vma_alloc_zeroed_movable_folio().
> >>
> >> For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to
> >> zero the folio again. Fix it by calling folio_zero_user() only if
> >> init_on_alloc is set. All arch are impacted.
> >>
> >> Added alloc_zeroed() helper to encapsulate the init_on_alloc check.
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >
> > Thanks for your patch, which is now commit 5708d96da20b99b4 ("mm:
> > avoid zeroing user movable page twice with init_on_alloc=1")
> > in v6.13-rc1.
> >
> > This is causing a panic when starting userspace on MIPS64 RBTX4927:
> >
> > Run /sbin/init as init process
> > process '/lib/systemd/systemd' started with executable stack
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > ---[ end Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x0000000b ]---
> >
> > or
> >
> > Run /sbin/init as init process
> > process '/lib/systemd/systemd' started with executable stack
> > do_page_fault(): sending SIGSEGV to init for invalid read access
> > from 00000000583399f8
> > epc = 0000000077e2b094 in ld-2.19.so[3094,77e28000+22000]
> > ra = 0000000077e2afcc in ld-2.19.so[2fcc,77e28000+22000]
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > ---[ end Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x0000000b ]---
> >
> > or
> >
> > Run /sbin/init as init process
> > process '/lib/systemd/systemd' started with executable stack
> > /sbin/inKernel panic - not syncing: Attempted to kill init!
> > exitcode=0x00007f00
> > ---[ end Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x00007f00 ]---
> > it: error while loading shared libraries: libpthread.so.0: object
> > file has no dynamic section
> >
> > Reverting the commit (and fixing the trivial conflict) fixes the issue.
> The provided config does not have THP on, so the changes to mm/huge_memory.c
> and mm/memory.c do not apply.
>
> Can you try the patch below and see if the machine boots? Thanks.
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 6e452bd8e7e3..bec9bd715acf 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -224,7 +224,13 @@ static inline
> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> unsigned long vaddr)
> {
> - return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr);
> + struct folio *folio;
> +
> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
> + if (folio)
> + clear_user_highpage(&folio->page, vaddr);
> +
> + return folio;
> }
> #endif
Thanks, that works!
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 32+ messages in thread