* [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
@ 2024-10-07 18:23 Zi Yan
2024-10-08 8:26 ` David Hildenbrand
2024-10-11 6:57 ` Huang, Ying
0 siblings, 2 replies; 8+ messages in thread
From: Zi Yan @ 2024-10-07 18:23 UTC (permalink / raw)
To: linux-mm, Alexander Potapenko, Kees Cook
Cc: David Hildenbrand, Andrew Morton, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, linux-kernel,
Zi Yan
Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
init_on_free=1 boot options") forces allocated pages to be cleared in
post_alloc_hook() when init_on_alloc=1.

For non-PMD folios, if the arch does not define
vma_alloc_zeroed_movable_folio(), the default implementation clears
the page returned from the buddy allocator again, so the page is cleared
twice. Fix it by passing __GFP_ZERO instead to avoid double page clearing.
At the moment, s390, arm64, x86, alpha and m68k are not impacted since they
define their own vma_alloc_zeroed_movable_folio().

For PMD folios, folio_zero_user() is called to clear the folio again.
Fix it by calling folio_zero_user() only if init_on_alloc is not set.
All architectures are impacted.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/highmem.h | 14 ++------------
mm/huge_memory.c | 4 +++-
2 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 930a591b9b61..4b15224842e1 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
* Return: A folio containing one allocated and zeroed page or NULL if
* we are out of memory.
*/
-static inline
-struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
- unsigned long vaddr)
-{
- struct folio *folio;
-
- folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
- if (folio)
- clear_user_highpage(&folio->page, vaddr);
-
- return folio;
-}
+#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
+ vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
#endif
static inline void clear_highpage(struct page *page)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a7b05f4c2a5e..ff746151896f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
goto release;
}
- folio_zero_user(folio, vmf->address);
+ if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+ &init_on_alloc))
+ folio_zero_user(folio, vmf->address);
/*
* The memory barrier inside __folio_mark_uptodate makes sure that
* folio_zero_user writes become visible before the set_pmd_at()
--
2.45.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-07 18:23 [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1 Zi Yan
@ 2024-10-08 8:26 ` David Hildenbrand
2024-10-08 11:52 ` Zi Yan
2024-10-11 6:57 ` Huang, Ying
1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-10-08 8:26 UTC (permalink / raw)
To: Zi Yan, linux-mm, Alexander Potapenko, Kees Cook
Cc: Andrew Morton, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, linux-kernel
On 07.10.24 20:23, Zi Yan wrote:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated page to be cleared in
> post_alloc_hook() when init_on_alloc=1.
>
> For non PMD folios, if arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation again clears
> the page return from the buddy allocator. So the page is cleared twice.
> Fix it by passing __GFP_ZERO instead to avoid double page clearing.
> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
> define their own vma_alloc_zeroed_movable_folio().
>
> For PMD folios, folio_zero_user() is called to clear the folio again.
> Fix it by calling folio_zero_user() only if init_on_alloc is set.
> All arch are impacted.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> include/linux/highmem.h | 14 ++------------
> mm/huge_memory.c | 4 +++-
> 2 files changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 930a591b9b61..4b15224842e1 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
> * Return: A folio containing one allocated and zeroed page or NULL if
> * we are out of memory.
> */
> -static inline
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> - unsigned long vaddr)
> -{
> - struct folio *folio;
> -
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
> - if (folio)
> - clear_user_highpage(&folio->page, vaddr);
> -
> - return folio;
> -}
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
> #endif
>
> static inline void clear_highpage(struct page *page)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a7b05f4c2a5e..ff746151896f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> goto release;
> }
>
> - folio_zero_user(folio, vmf->address);
> + if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> + &init_on_alloc))
> + folio_zero_user(folio, vmf->address);
> /*
> * The memory barrier inside __folio_mark_uptodate makes sure that
> * folio_zero_user writes become visible before the set_pmd_at()
I remember we discussed that in the past and that we do *not* want to
sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There
is the slight chance that we zero-out when we're not going to use the
allocated folio, but ... that can happen either way even with the
current code?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-08 8:26 ` David Hildenbrand
@ 2024-10-08 11:52 ` Zi Yan
2024-10-08 12:57 ` Vlastimil Babka
0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2024-10-08 11:52 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-mm, Alexander Potapenko, Kees Cook, Andrew Morton,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, linux-kernel
On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
> On 07.10.24 20:23, Zi Yan wrote:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated page to be cleared in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For non PMD folios, if arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation again clears
>> the page return from the buddy allocator. So the page is cleared twice.
>> Fix it by passing __GFP_ZERO instead to avoid double page clearing.
>> At the moment, s390,arm64,x86,alpha,m68k are not impacted since they
>> define their own vma_alloc_zeroed_movable_folio().
>>
>> For PMD folios, folio_zero_user() is called to clear the folio again.
>> Fix it by calling folio_zero_user() only if init_on_alloc is set.
>> All arch are impacted.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> include/linux/highmem.h | 14 ++------------
>> mm/huge_memory.c | 4 +++-
>> 2 files changed, 5 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index 930a591b9b61..4b15224842e1 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>> * Return: A folio containing one allocated and zeroed page or NULL if
>> * we are out of memory.
>> */
>> -static inline
>> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> - unsigned long vaddr)
>> -{
>> - struct folio *folio;
>> -
>> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
>> - if (folio)
>> - clear_user_highpage(&folio->page, vaddr);
>> -
>> - return folio;
>> -}
>> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
>> + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>> #endif
>> static inline void clear_highpage(struct page *page)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a7b05f4c2a5e..ff746151896f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>> goto release;
>> }
>> - folio_zero_user(folio, vmf->address);
>> + if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
>> + &init_on_alloc))
>> + folio_zero_user(folio, vmf->address);
>> /*
>> * The memory barrier inside __folio_mark_uptodate makes sure that
>> * folio_zero_user writes become visible before the set_pmd_at()
>
> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>
> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
folio_zero_user() uses vmf->address to improve cache performance by changing
subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
optimization. To keep it, vmf->address will need to be passed to allocation
code. Maybe that is acceptable?
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-08 11:52 ` Zi Yan
@ 2024-10-08 12:57 ` Vlastimil Babka
2024-10-08 13:06 ` David Hildenbrand
0 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2024-10-08 12:57 UTC (permalink / raw)
To: Zi Yan, David Hildenbrand
Cc: linux-mm, Alexander Potapenko, Kees Cook, Andrew Morton,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, linux-kernel
On 10/8/24 13:52, Zi Yan wrote:
> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>
>>
>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>
>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>
> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
Create some nice inline wrapper for the test and it will look less ugly? :)
> folio_zero_user() uses vmf->address to improve cache performance by changing
> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
> sub-page last when clearing huge page”). If we use GFP_ZERO, we lose this
> optimization. To keep it, vmf->address will need to be passed to allocation
> code. Maybe that is acceptable?
I'd rather not change the page allocation code for this...
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-08 12:57 ` Vlastimil Babka
@ 2024-10-08 13:06 ` David Hildenbrand
2024-10-08 13:46 ` Zi Yan
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-10-08 13:06 UTC (permalink / raw)
To: Vlastimil Babka, Zi Yan
Cc: linux-mm, Alexander Potapenko, Kees Cook, Andrew Morton,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, Huang, Ying, linux-kernel
On 08.10.24 14:57, Vlastimil Babka wrote:
> On 10/8/24 13:52, Zi Yan wrote:
>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>
>>>
>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>
>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>
>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>
> Create some nice inline wrapper for the test and it will look less ugly? :)
>
>> folio_zero_user() uses vmf->address to improve cache performance by changing
>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>> sub-page last when clearing huge page”). If we use GFP_ZERO, we lose this
>> optimization. To keep it, vmf->address will need to be passed to allocation
>> code. Maybe that is acceptable?
>
> I'd rather not change the page allocation code for this...
Although I'm curious if that optimization from 2017 is still valuable :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-08 13:06 ` David Hildenbrand
@ 2024-10-08 13:46 ` Zi Yan
2024-10-11 6:55 ` Huang, Ying
0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2024-10-08 13:46 UTC (permalink / raw)
To: David Hildenbrand, Vlastimil Babka, Huang, Ying
Cc: linux-mm, Alexander Potapenko, Kees Cook, Andrew Morton,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, linux-kernel,
Ryan Roberts
On 8 Oct 2024, at 9:06, David Hildenbrand wrote:
> On 08.10.24 14:57, Vlastimil Babka wrote:
>> On 10/8/24 13:52, Zi Yan wrote:
>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>
>>>>
>>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>
>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>>
>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>
>> Create some nice inline wrapper for the test and it will look less ugly? :)
something like?
static inline bool alloc_zeroed(void)
{
	return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
			&init_on_alloc);
}
I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP.
So both PMD THP and mTHP are zeroed twice on all architectures.
Adding Ryan for mTHP.
>>
>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>> sub-page last when clearing huge page”). If we use GFP_ZERO, we lose this
>>> optimization. To keep it, vmf->address will need to be passed to allocation
>>> code. Maybe that is acceptable?
>>
>> I'd rather not change the page allocation code for this...
>
> Although I'm curious if that optimization from 2017 is still valuable :)
Maybe Ying can give some insight on this.
Do we need some general guidance on who is responsible for zeroing allocated
folios? Should people use GFP_ZERO instead of zeroing by themselves if possible?
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-08 13:46 ` Zi Yan
@ 2024-10-11 6:55 ` Huang, Ying
0 siblings, 0 replies; 8+ messages in thread
From: Huang, Ying @ 2024-10-11 6:55 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand, Vlastimil Babka, linux-mm,
Alexander Potapenko, Kees Cook, Andrew Morton,
Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, linux-kernel,
Ryan Roberts
Zi Yan <ziy@nvidia.com> writes:
> On 8 Oct 2024, at 9:06, David Hildenbrand wrote:
>
>> On 08.10.24 14:57, Vlastimil Babka wrote:
>>> On 10/8/24 13:52, Zi Yan wrote:
>>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>>
>>>>>
>>>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>>
>>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>>>
>>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>>
>>> Create some nice inline wrapper for the test and it will look less ugly? :)
>
> something like?
>
> static inline bool alloc_zeroed()
> {
> return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> &init_on_alloc);
> }
>
>
> I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP.
> So both PMD THP and mTHP are zeroed twice for all arch.
>
> Adding Ryan for mTHP.
>
>>>
>>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>>> sub-page last when clearing huge page”). If we use GFP_ZERO, we lose this
>>>> optimization. To keep it, vmf->address will need to be passed to allocation
>>>> code. Maybe that is acceptable?
>>>
>>> I'd rather not change the page allocation code for this...
>>
>> Although I'm curious if that optimization from 2017 is still valuable :)
>
> Maybe Ying can give some insight on this.
I guess the optimization still applies now. Although the per-core
(per-thread) share of the last level cache has grown, it is still quite
common for it to be smaller than a THP. And since L1/L2 are
significantly smaller still, the optimization increases the likelihood
that the accessed cache line is in L1/L2/LLC.
--
Best Regards,
Huang, Ying
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
2024-10-07 18:23 [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1 Zi Yan
2024-10-08 8:26 ` David Hildenbrand
@ 2024-10-11 6:57 ` Huang, Ying
1 sibling, 0 replies; 8+ messages in thread
From: Huang, Ying @ 2024-10-11 6:57 UTC (permalink / raw)
To: Zi Yan
Cc: linux-mm, Alexander Potapenko, Kees Cook, David Hildenbrand,
Andrew Morton, Matthew Wilcox (Oracle),
Miaohe Lin, Kefeng Wang, John Hubbard, linux-kernel
Zi Yan <ziy@nvidia.com> writes:
[snip]
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 930a591b9b61..4b15224842e1 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
> * Return: A folio containing one allocated and zeroed page or NULL if
> * we are out of memory.
> */
> -static inline
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> - unsigned long vaddr)
> -{
> - struct folio *folio;
> -
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
> - if (folio)
> - clear_user_highpage(&folio->page, vaddr);
> -
> - return folio;
> -}
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
Although it is just one line, I would still prefer an inline function
over a macro here. Not a strong opinion.
> #endif
[snip]
--
Best Regards,
Huang, Ying
^ permalink raw reply [flat|nested] 8+ messages in thread