linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: pgtable: make ptep_clear() non-atomic
@ 2024-11-19  9:07 Qi Zheng
  2024-11-19 15:27 ` Pasha Tatashin
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Qi Zheng @ 2024-11-19  9:07 UTC (permalink / raw)
  To: pasha.tatashin, tongtiangen, jannh, lorenzo.stoakes, david,
	ryan.roberts, peterx, jgg, muchun.song, akpm
  Cc: linux-kernel, linux-mm, Qi Zheng

In the generic ptep_get_and_clear() implementation, it is just a simple
combination of ptep_get() and pte_clear(). But for some architectures
(such as x86 and arm64, etc), the hardware will modify the A/D bits of the
page table entry, so the ptep_get_and_clear() needs to be overwritten
and implemented as an atomic operation to avoid contention, which has a
performance cost.

The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check") adds the ptep_clear() on the x86, and makes it call
ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page
table check feature does not actually care about the A/D bits, so only
ptep_get() + pte_clear() should be called. But considering that the page
table check is a debug option, this should not have much of an impact.

But then the commit de8c8e52836d ("mm: page_table_check: add hooks to
public helpers") changed ptep_clear() to unconditionally call
ptep_get_and_clear(), so that the CONFIG_PAGE_TABLE_CHECK check can be
put into the page table check stubs (in include/linux/page_table_check.h).
This also cause performance loss to the kernel without
CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense.

Currently ptep_clear() is only used in debug code and in khugepaged
collapse paths, which are fairly expensive. So the cost of an extra atomic
RMW operation does not matter. But this may be used for other paths in the
future. After all, for the present pte entry, we need to call ptep_clear()
instead of pte_clear() to ensure that PAGE_TABLE_CHECK works properly.

So to be more precise, just calling ptep_get() and pte_clear() in the
ptep_clear().

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/pgtable.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index adef9d6e9b1ba..e59decd22e1cb 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -533,7 +533,10 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma,
 static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep)
 {
-	ptep_get_and_clear(mm, addr, ptep);
+	pte_t pte = ptep_get(ptep);
+
+	pte_clear(mm, addr, ptep);
+	page_table_check_pte_clear(mm, pte);
 }
 
 #ifdef CONFIG_GUP_GET_PXX_LOW_HIGH
-- 
2.20.1



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: pgtable: make ptep_clear() non-atomic
  2024-11-19  9:07 [PATCH] mm: pgtable: make ptep_clear() non-atomic Qi Zheng
@ 2024-11-19 15:27 ` Pasha Tatashin
  2024-11-19 15:46 ` Jann Horn
  2024-11-20  2:13 ` Muchun Song
  2 siblings, 0 replies; 4+ messages in thread
From: Pasha Tatashin @ 2024-11-19 15:27 UTC (permalink / raw)
  To: Qi Zheng
  Cc: tongtiangen, jannh, lorenzo.stoakes, david, ryan.roberts, peterx,
	jgg, muchun.song, akpm, linux-kernel, linux-mm

On Tue, Nov 19, 2024 at 4:10 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
> In the generic ptep_get_and_clear() implementation, it is just a simple
> combination of ptep_get() and pte_clear(). But for some architectures
> (such as x86 and arm64, etc), the hardware will modify the A/D bits of the
> page table entry, so the ptep_get_and_clear() needs to be overwritten
> and implemented as an atomic operation to avoid contention, which has a
> performance cost.
>
> The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
> check") adds the ptep_clear() on the x86, and makes it call
> ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page
> table check feature does not actually care about the A/D bits, so only
> ptep_get() + pte_clear() should be called. But considering that the page
> table check is a debug option, this should not have much of an impact.
>
> But then the commit de8c8e52836d ("mm: page_table_check: add hooks to
> public helpers") changed ptep_clear() to unconditionally call
> ptep_get_and_clear(), so that the CONFIG_PAGE_TABLE_CHECK check can be
> put into the page table check stubs (in include/linux/page_table_check.h).
> This also cause performance loss to the kernel without
> CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense.
>
> Currently ptep_clear() is only used in debug code and in khugepaged
> collapse paths, which are fairly expensive. So the cost of an extra atomic
> RMW operation does not matter. But this may be used for other paths in the
> future. After all, for the present pte entry, we need to call ptep_clear()
> instead of pte_clear() to ensure that PAGE_TABLE_CHECK works properly.
>
> So to be more precise, just calling ptep_get() and pte_clear() in the
> ptep_clear().
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  include/linux/pgtable.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index adef9d6e9b1ba..e59decd22e1cb 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -533,7 +533,10 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma,
>  static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
>                               pte_t *ptep)
>  {
> -       ptep_get_and_clear(mm, addr, ptep);
> +       pte_t pte = ptep_get(ptep);
> +
> +       pte_clear(mm, addr, ptep);
> +       page_table_check_pte_clear(mm, pte);
>  }

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: pgtable: make ptep_clear() non-atomic
  2024-11-19  9:07 [PATCH] mm: pgtable: make ptep_clear() non-atomic Qi Zheng
  2024-11-19 15:27 ` Pasha Tatashin
@ 2024-11-19 15:46 ` Jann Horn
  2024-11-20  2:13 ` Muchun Song
  2 siblings, 0 replies; 4+ messages in thread
From: Jann Horn @ 2024-11-19 15:46 UTC (permalink / raw)
  To: Qi Zheng
  Cc: pasha.tatashin, tongtiangen, lorenzo.stoakes, david,
	ryan.roberts, peterx, jgg, muchun.song, akpm, linux-kernel,
	linux-mm

On Tue, Nov 19, 2024 at 10:10 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
> In the generic ptep_get_and_clear() implementation, it is just a simple
> combination of ptep_get() and pte_clear(). But for some architectures
> (such as x86 and arm64, etc), the hardware will modify the A/D bits of the
> page table entry, so the ptep_get_and_clear() needs to be overwritten
> and implemented as an atomic operation to avoid contention, which has a
> performance cost.
>
> The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
> check") adds the ptep_clear() on the x86, and makes it call
> ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page
> table check feature does not actually care about the A/D bits, so only
> ptep_get() + pte_clear() should be called. But considering that the page
> table check is a debug option, this should not have much of an impact.
>
> But then the commit de8c8e52836d ("mm: page_table_check: add hooks to
> public helpers") changed ptep_clear() to unconditionally call
> ptep_get_and_clear(), so that the CONFIG_PAGE_TABLE_CHECK check can be
> put into the page table check stubs (in include/linux/page_table_check.h).
> This also cause performance loss to the kernel without
> CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense.
>
> Currently ptep_clear() is only used in debug code and in khugepaged
> collapse paths, which are fairly expensive. So the cost of an extra atomic
> RMW operation does not matter. But this may be used for other paths in the
> future. After all, for the present pte entry, we need to call ptep_clear()
> instead of pte_clear() to ensure that PAGE_TABLE_CHECK works properly.
>
> So to be more precise, just calling ptep_get() and pte_clear() in the
> ptep_clear().
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Yeah, as you mentioned on the other thread, this seems like a
reasonable cleanup.

Reviewed-by: Jann Horn <jannh@google.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: pgtable: make ptep_clear() non-atomic
  2024-11-19  9:07 [PATCH] mm: pgtable: make ptep_clear() non-atomic Qi Zheng
  2024-11-19 15:27 ` Pasha Tatashin
  2024-11-19 15:46 ` Jann Horn
@ 2024-11-20  2:13 ` Muchun Song
  2 siblings, 0 replies; 4+ messages in thread
From: Muchun Song @ 2024-11-20  2:13 UTC (permalink / raw)
  To: Qi Zheng
  Cc: pasha.tatashin, tongtiangen, jannh, lorenzo.stoakes, david,
	ryan.roberts, peterx, jgg, akpm, linux-kernel, linux-mm



> On Nov 19, 2024, at 17:07, Qi Zheng <zhengqi.arch@bytedance.com> wrote:
> 
> In the generic ptep_get_and_clear() implementation, it is just a simple
> combination of ptep_get() and pte_clear(). But for some architectures
> (such as x86 and arm64, etc), the hardware will modify the A/D bits of the
> page table entry, so the ptep_get_and_clear() needs to be overwritten
> and implemented as an atomic operation to avoid contention, which has a
> performance cost.
> 
> The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
> check") adds the ptep_clear() on the x86, and makes it call
> ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page
> table check feature does not actually care about the A/D bits, so only
> ptep_get() + pte_clear() should be called. But considering that the page
> table check is a debug option, this should not have much of an impact.
> 
> But then the commit de8c8e52836d ("mm: page_table_check: add hooks to
> public helpers") changed ptep_clear() to unconditionally call
> ptep_get_and_clear(), so that the CONFIG_PAGE_TABLE_CHECK check can be
> put into the page table check stubs (in include/linux/page_table_check.h).
> This also cause performance loss to the kernel without
> CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense.
> 
> Currently ptep_clear() is only used in debug code and in khugepaged
> collapse paths, which are fairly expensive. So the cost of an extra atomic
> RMW operation does not matter. But this may be used for other paths in the
> future. After all, for the present pte entry, we need to call ptep_clear()
> instead of pte_clear() to ensure that PAGE_TABLE_CHECK works properly.
> 
> So to be more precise, just calling ptep_get() and pte_clear() in the
> ptep_clear().
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-11-20  2:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-19  9:07 [PATCH] mm: pgtable: make ptep_clear() non-atomic Qi Zheng
2024-11-19 15:27 ` Pasha Tatashin
2024-11-19 15:46 ` Jann Horn
2024-11-20  2:13 ` Muchun Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox