* [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL
@ 2026-01-15 7:16 Lance Yang
2026-01-15 10:00 ` Baolin Wang
0 siblings, 1 reply; 7+ messages in thread
From: Lance Yang @ 2026-01-15 7:16 UTC
To: akpm, david
Cc: ioworker0, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
mhocko, ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
linux-mm, linux-kernel, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
which we really don't want to do while holding PTL.
Just move the call to after we release PTL, and drop the macro wrapper
while we're at it.
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
include/linux/pgtable.h | 4 ----
mm/khugepaged.c | 5 +++--
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index eb8aacba3698..fb04ed22052c 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -755,7 +755,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
return pmd;
}
#define pmdp_get_lockless pmdp_get_lockless
-#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
#endif /* CONFIG_PGTABLE_LEVELS > 2 */
#endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
@@ -774,9 +773,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
{
return pmdp_get(pmdp);
}
-static inline void pmdp_get_lockless_sync(void)
-{
-}
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9f790ec34400..0a6cebf880e0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1664,10 +1664,10 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
}
}
pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
- pmdp_get_lockless_sync();
pte_unmap_unlock(start_pte, ptl);
if (ptl != pml)
spin_unlock(pml);
+ tlb_remove_table_sync_one();
mmu_notifier_invalidate_range_end(&range);
@@ -1818,7 +1818,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
*/
if (likely(file_backed_vma_is_retractable(vma))) {
pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
- pmdp_get_lockless_sync();
success = true;
}
@@ -1826,6 +1825,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
spin_unlock(ptl);
drop_pml:
spin_unlock(pml);
+ if (success)
+ tlb_remove_table_sync_one();
mmu_notifier_invalidate_range_end(&range);
--
2.49.0
* Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL
2026-01-15 7:16 [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL Lance Yang
@ 2026-01-15 10:00 ` Baolin Wang
2026-01-15 12:28 ` Lance Yang
0 siblings, 1 reply; 7+ messages in thread
From: Baolin Wang @ 2026-01-15 10:00 UTC
To: Lance Yang, akpm, david
Cc: ioworker0, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
mhocko, ziy, npache, ryan.roberts, dev.jain, baohua, linux-mm,
linux-kernel
Hi Lance,
On 1/15/26 3:16 PM, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
> which we really don't want to do while holding PTL.
Could you add more comments to explain why this is safe for the PAE case?
For the non-PAE case, you added a new tlb_remove_table_sync_one() call.
Why do we need this (what problem does it solve)? Please also add more
comments to explain.
> Just move the call to after we release PTL, and drop the macro wrapper
> while we're at it.
>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> include/linux/pgtable.h | 4 ----
> mm/khugepaged.c | 5 +++--
> 2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index eb8aacba3698..fb04ed22052c 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -755,7 +755,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
> return pmd;
> }
> #define pmdp_get_lockless pmdp_get_lockless
> -#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
> #endif /* CONFIG_PGTABLE_LEVELS > 2 */
> #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
>
> @@ -774,9 +773,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
> {
> return pmdp_get(pmdp);
> }
> -static inline void pmdp_get_lockless_sync(void)
> -{
> -}
> #endif
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 9f790ec34400..0a6cebf880e0 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1664,10 +1664,10 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
> }
> }
> pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
> - pmdp_get_lockless_sync();
> pte_unmap_unlock(start_pte, ptl);
> if (ptl != pml)
> spin_unlock(pml);
> + tlb_remove_table_sync_one();
>
> mmu_notifier_invalidate_range_end(&range);
>
> @@ -1818,7 +1818,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> */
> if (likely(file_backed_vma_is_retractable(vma))) {
> pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
> - pmdp_get_lockless_sync();
> success = true;
> }
>
> @@ -1826,6 +1825,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> spin_unlock(ptl);
> drop_pml:
> spin_unlock(pml);
> + if (success)
> + tlb_remove_table_sync_one();
>
> mmu_notifier_invalidate_range_end(&range);
>
* Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL
2026-01-15 10:00 ` Baolin Wang
@ 2026-01-15 12:28 ` Lance Yang
2026-01-16 1:03 ` Baolin Wang
0 siblings, 1 reply; 7+ messages in thread
From: Lance Yang @ 2026-01-15 12:28 UTC
To: Baolin Wang
Cc: ioworker0, lorenzo.stoakes, david, akpm, Liam.Howlett, vbabka,
rppt, surenb, mhocko, ziy, npache, ryan.roberts, dev.jain,
baohua, linux-mm, linux-kernel
On 2026/1/15 18:00, Baolin Wang wrote:
> Hi Lance,
>
> On 1/15/26 3:16 PM, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
>> which we really don't want to do while holding PTL.
>
> Could you add more comments to explain why this is safe for the PAE case?
Yep, IIUC, it is safe because we've already done pmdp_collapse_flush()
which ensures the PMD change is visible.
pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE)
is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast) complete
before we proceed. It sends IPIs to all CPUs and waits for responses - a CPU
can only respond when it's not between local_irq_save() and
local_irq_restore().
Moving it out from under PTL doesn't change the synchronization semantics,
since lockless readers don't depend on PTL anyway.
>
> For the non-PAE case, you added a new tlb_remove_table_sync_one(), why
> we need this (to solve what problem)? Please also add more comments to
> explain.
Oops, you're right, the original macro was a no-op for non-PAE.
I should just move the macro call out from under PTL, rather than
replacing it with direct tlb_remove_table_sync_one() calls.
Thanks,
Lance
>
>> Just move the call to after we release PTL, and drop the macro wrapper
>> while we're at it.
>>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> include/linux/pgtable.h | 4 ----
>> mm/khugepaged.c | 5 +++--
>> 2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index eb8aacba3698..fb04ed22052c 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -755,7 +755,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
>> return pmd;
>> }
>> #define pmdp_get_lockless pmdp_get_lockless
>> -#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
>> #endif /* CONFIG_PGTABLE_LEVELS > 2 */
>> #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
>> @@ -774,9 +773,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
>> {
>> return pmdp_get(pmdp);
>> }
>> -static inline void pmdp_get_lockless_sync(void)
>> -{
>> -}
>> #endif
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 9f790ec34400..0a6cebf880e0 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1664,10 +1664,10 @@ static enum scan_result
>> try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
>> }
>> }
>> pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
>> - pmdp_get_lockless_sync();
>> pte_unmap_unlock(start_pte, ptl);
>> if (ptl != pml)
>> spin_unlock(pml);
>> + tlb_remove_table_sync_one();
>> mmu_notifier_invalidate_range_end(&range);
>> @@ -1818,7 +1818,6 @@ static void retract_page_tables(struct
>> address_space *mapping, pgoff_t pgoff)
>> */
>> if (likely(file_backed_vma_is_retractable(vma))) {
>> pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
>> - pmdp_get_lockless_sync();
>> success = true;
>> }
>> @@ -1826,6 +1825,8 @@ static void retract_page_tables(struct
>> address_space *mapping, pgoff_t pgoff)
>> spin_unlock(ptl);
>> drop_pml:
>> spin_unlock(pml);
>> + if (success)
>> + tlb_remove_table_sync_one();
>> mmu_notifier_invalidate_range_end(&range);
>
* Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL
2026-01-15 12:28 ` Lance Yang
@ 2026-01-16 1:03 ` Baolin Wang
2026-01-16 1:25 ` Lance Yang
0 siblings, 1 reply; 7+ messages in thread
From: Baolin Wang @ 2026-01-16 1:03 UTC
To: Lance Yang
Cc: ioworker0, lorenzo.stoakes, david, akpm, Liam.Howlett, vbabka,
rppt, surenb, mhocko, ziy, npache, ryan.roberts, dev.jain,
baohua, linux-mm, linux-kernel, hughd
On 1/15/26 8:28 PM, Lance Yang wrote:
>
>
> On 2026/1/15 18:00, Baolin Wang wrote:
>> Hi Lance,
>>
>> On 1/15/26 3:16 PM, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
>>> which we really don't want to do while holding PTL.
>>
>> Could you add more comments to explain why this is safe for the PAE case?
>
> Yep, IIUC, it is safe because we've already done pmdp_collapse_flush()
> which ensures the PMD change is visible.
>
> pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE)
> is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast)
> complete
> before we proceed. It sends IPIs to all CPUs and waits for responses - a
> CPU
> can only respond when it's not between local_irq_save() and
> local_irq_restore().
>
> Moving it out from under PTL doesn't change the synchronization semantics,
> since lockless readers don't depend on PTL anyway.
Cc Hugh, who introduced pmdp_get_lockless_sync(), to double check.
Sounds reasonable to me, please add these comments into the commit
message. Thanks.
>> For the non-PAE case, you added a new tlb_remove_table_sync_one(), why
>> we need this (to solve what problem)? Please also add more comments to
>> explain.
>
> Oops, you're right, the original macro was a no-op for non-PAE.
>
> I should just move the macro call out from under PTL, rather than
> replacing it with direct tlb_remove_table_sync_one() calls.
OK.
>>> Just move the call to after we release PTL, and drop the macro wrapper
>>> while we're at it.
>>>
>>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>> ---
>>> include/linux/pgtable.h | 4 ----
>>> mm/khugepaged.c | 5 +++--
>>> 2 files changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>> index eb8aacba3698..fb04ed22052c 100644
>>> --- a/include/linux/pgtable.h
>>> +++ b/include/linux/pgtable.h
>>> @@ -755,7 +755,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
>>> return pmd;
>>> }
>>> #define pmdp_get_lockless pmdp_get_lockless
>>> -#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
>>> #endif /* CONFIG_PGTABLE_LEVELS > 2 */
>>> #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
>>> @@ -774,9 +773,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
>>> {
>>> return pmdp_get(pmdp);
>>> }
>>> -static inline void pmdp_get_lockless_sync(void)
>>> -{
>>> -}
>>> #endif
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 9f790ec34400..0a6cebf880e0 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1664,10 +1664,10 @@ static enum scan_result
>>> try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
>>> }
>>> }
>>> pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
>>> - pmdp_get_lockless_sync();
>>> pte_unmap_unlock(start_pte, ptl);
>>> if (ptl != pml)
>>> spin_unlock(pml);
>>> + tlb_remove_table_sync_one();
>>> mmu_notifier_invalidate_range_end(&range);
>>> @@ -1818,7 +1818,6 @@ static void retract_page_tables(struct
>>> address_space *mapping, pgoff_t pgoff)
>>> */
>>> if (likely(file_backed_vma_is_retractable(vma))) {
>>> pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
>>> - pmdp_get_lockless_sync();
>>> success = true;
>>> }
>>> @@ -1826,6 +1825,8 @@ static void retract_page_tables(struct
>>> address_space *mapping, pgoff_t pgoff)
>>> spin_unlock(ptl);
>>> drop_pml:
>>> spin_unlock(pml);
>>> + if (success)
>>> + tlb_remove_table_sync_one();
>>> mmu_notifier_invalidate_range_end(&range);
>>
* Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL
2026-01-16 1:03 ` Baolin Wang
@ 2026-01-16 1:25 ` Lance Yang
2026-01-18 8:39 ` [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out Lance Yang
0 siblings, 1 reply; 7+ messages in thread
From: Lance Yang @ 2026-01-16 1:25 UTC
To: Baolin Wang
Cc: ioworker0, lorenzo.stoakes, david, akpm, Liam.Howlett, vbabka,
rppt, surenb, mhocko, ziy, npache, ryan.roberts, dev.jain,
baohua, linux-mm, linux-kernel, hughd
On 2026/1/16 09:03, Baolin Wang wrote:
>
>
> On 1/15/26 8:28 PM, Lance Yang wrote:
>>
>>
>> On 2026/1/15 18:00, Baolin Wang wrote:
>>> Hi Lance,
>>>
>>> On 1/15/26 3:16 PM, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
>>>> which we really don't want to do while holding PTL.
>>>
>>> Could you add more comments to explain why this is safe for the PAE
>>> case?
>>
>> Yep, IIUC, it is safe because we've already done pmdp_collapse_flush()
>> which ensures the PMD change is visible.
>>
>> pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE)
>> is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast)
>> complete
>> before we proceed. It sends IPIs to all CPUs and waits for responses -
>> a CPU
>> can only respond when it's not between local_irq_save() and
>> local_irq_restore().
>>
>> Moving it out from under PTL doesn't change the synchronization
>> semantics,
>> since lockless readers don't depend on PTL anyway.
>
> Cc Hugh who introduced the pmdp_get_lockless_sync(), to double check.
>
> Sounds reasonable to me, please add these comments into the commit
> message. Thanks.
Yes, will do. Thanks!
>
>>> For the non-PAE case, you added a new tlb_remove_table_sync_one(),
>>> why we need this (to solve what problem)? Please also add more
>>> comments to explain.
>>
>> Oops, you're right, the original macro was a no-op for non-PAE.
>>
>> I should just move the macro call out from under PTL, rather than
>> replacing it with direct tlb_remove_table_sync_one() calls.
>
> OK.
Cheers,
Lance
* Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out
2026-01-16 1:25 ` Lance Yang
@ 2026-01-18 8:39 ` Lance Yang
2026-01-20 11:38 ` Lance Yang
0 siblings, 1 reply; 7+ messages in thread
From: Lance Yang @ 2026-01-18 8:39 UTC
To: hughd
Cc: Liam.Howlett, akpm, baohua, baolin.wang, david, dev.jain,
ioworker0, linux-kernel, linux-mm, lorenzo.stoakes, mhocko,
npache, rppt, ryan.roberts, surenb, vbabka, ziy, Lance Yang
Hi Hugh,
Could you check if my understanding is correct?
On PAE, pmdp_get_lockless() reads pmd_low first, then pmd_high. There's a
risk of reading mismatched values if another CPU modifies the PMD between
the two reads.
Commit 146b42e07494[1] introduced local_irq_save() to protect the
split-read, blocking TLB flush IPIs during the operation.
After modifying the PMD, pmdp_get_lockless_sync() sends an IPI to ensure
all ongoing split-reads complete before proceeding with pte_free_defer().
As commit 146b42e07494[1] says:
```
Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(),
used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync()
synonym for tlb_remove_table_sync_one(): to send the necessary interrupt
at the right moment on those configs which do not already send it.
```
And commit 1043173eb5eb[2] says:
```
Follow the pattern in retract_page_tables(); and using pte_free_defer()
removes most of the need for tlb_remove_table_sync_one() here; but call
pmdp_get_lockless_sync() to use it in the PAE case.
```
Regarding moving pmdp_get_lockless_sync() out from under PTL: Since
lockless readers (e.g., GUP-fast, __pte_offset_map()) are protected by
local_irq_save() rather than PTL, pmdp_get_lockless_sync() can be called
outside PTL as long as it's before pte_free_defer().
In contrast, for non-PAE, PMD reads are atomic, so pmdp_get_lockless_sync()
is a no-op.
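To make the split-read concrete, here is a userspace toy model in C (not
kernel code). The names toy_pmd, toy_hook, toy_read_naive(),
toy_read_recheck() and toy_writer_replace() are made up for illustration,
and the hook argument is only a deterministic stand-in for another CPU
running between the two loads. The recheck loop is loosely modelled on the
idea of re-reading the low half, not copied from pmdp_get_lockless():

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a PAE pmd_t: two 32-bit halves, no atomic 64-bit load. */
struct toy_pmd {
	uint32_t low;
	uint32_t high;
};

/* Hypothetical hook: models another CPU running inside the reader's window. */
typedef void (*toy_hook)(volatile struct toy_pmd *pmdp);

/* Naive split read: low first, then high, with no recheck at all. */
static struct toy_pmd toy_read_naive(volatile struct toy_pmd *pmdp,
				     toy_hook other_cpu)
{
	struct toy_pmd pmd;

	pmd.low = pmdp->low;
	if (other_cpu)
		other_cpu(pmdp);	/* writer sneaks in between the loads */
	pmd.high = pmdp->high;
	return pmd;
}

/* Split read that retries whenever the low half changed underneath it. */
static struct toy_pmd toy_read_recheck(volatile struct toy_pmd *pmdp,
				       toy_hook other_cpu)
{
	struct toy_pmd pmd;

	do {
		pmd.low = pmdp->low;
		if (other_cpu) {
			other_cpu(pmdp);	/* writer interferes once only */
			other_cpu = NULL;
		}
		pmd.high = pmdp->high;
	} while (pmd.low != pmdp->low);
	return pmd;
}

/* Toy writer: replaces both halves of the entry with new values. */
static void toy_writer_replace(volatile struct toy_pmd *pmdp)
{
	pmdp->low = 0x2001;
	pmdp->high = 0x2;
}
```

Starting from an entry {0x1001, 0x1} with toy_writer_replace() as the hook,
toy_read_naive() returns low 0x1001 with high 0x2, a pair that was never
installed, while toy_read_recheck() notices the changed low half, retries,
and returns {0x2001, 0x2}.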
[1] https://github.com/torvalds/linux/commit/146b42e07494e45f7c7bcf2cbf7afd1424afd78e
[2] https://github.com/torvalds/linux/commit/1043173eb5eb351a1dba11cca12705075fe74a9e
Thanks,
Lance
On Fri, 16 Jan 2026 09:25:54 +0800, Lance Yang wrote:
>
>
> On 2026/1/16 09:03, Baolin Wang wrote:
> >
> >
> > On 1/15/26 8:28 PM, Lance Yang wrote:
> >>
> >>
> >> On 2026/1/15 18:00, Baolin Wang wrote:
> >>> Hi Lance,
> >>>
> >>> On 1/15/26 3:16 PM, Lance Yang wrote:
> >>>> From: Lance Yang <lance.yang@linux.dev>
> >>>>
> >>>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
> >>>> which we really don't want to do while holding PTL.
> >>>
> >>> Could you add more comments to explain why this is safe for the PAE
> >>> case?
> >>
> >> Yep, IIUC, it is safe because we've already done pmdp_collapse_flush()
> >> which ensures the PMD change is visible.
> >>
> >> pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE)
> >> is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast)
> >> complete
> >> before we proceed. It sends IPIs to all CPUs and waits for responses -
> >> a CPU
> >> can only respond when it's not between local_irq_save() and
> >> local_irq_restore().
> >>
> >> Moving it out from under PTL doesn't change the synchronization
> >> semantics,
> >> since lockless readers don't depend on PTL anyway.
> >
> > Cc Hugh who introduced the pmdp_get_lockless_sync(), to double check.
> >
> > Sounds reasonable to me, please add these comments into the commit
> > message. Thanks.
>
> Yes, will do. Thanks!
>
> >
> >>> For the non-PAE case, you added a new tlb_remove_table_sync_one(),
> >>> why we need this (to solve what problem)? Please also add more
> >>> comments to explain.
> >>
> >> Oops, you're right, the original macro was a no-op for non-PAE.
> >>
> >> I should just move the macro call out from under PTL, rather than
> >> replacing it with direct tlb_remove_table_sync_one() calls.
> >
> > OK.
>
> Cheers,
> Lance
>
* Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out
2026-01-18 8:39 ` [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out Lance Yang
@ 2026-01-20 11:38 ` Lance Yang
0 siblings, 0 replies; 7+ messages in thread
From: Lance Yang @ 2026-01-20 11:38 UTC
To: hughd, baolin.wang
Cc: Liam.Howlett, akpm, baohua, david, dev.jain, ioworker0,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, npache, rppt,
ryan.roberts, surenb, vbabka, ziy, qi.zheng
On 2026/1/18 16:39, Lance Yang wrote:
> Hi Hugh,
>
> Could you check if my understanding is correct?
>
> On PAE, pmdp_get_lockless() reads pmd_low first, then pmd_high. There's a
> risk of reading mismatched values if another CPU modifies the PMD between
> the two reads.
>
> Commit 146b42e07494[1] introduced local_irq_save() to protect the
> split-read, blocking TLB flush IPIs during the operation.
>
> After modifying the PMD, pmdp_get_lockless_sync() sends an IPI to ensure
> all ongoing split-reads complete before proceeding with pte_free_defer().
>
> As commit 146b42e07494[1] says:
>
> ```
> Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(),
> used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync()
> synonym for tlb_remove_table_sync_one(): to send the necessary interrupt
> at the right moment on those configs which do not already send it.
> ```
>
> And commit 1043173eb5eb[2] says:
>
> ```
> Follow the pattern in retract_page_tables(); and using pte_free_defer()
> removes most of the need for tlb_remove_table_sync_one() here; but call
> pmdp_get_lockless_sync() to use it in the PAE case.
> ```
>
> Regarding moving pmdp_get_lockless_sync() out from under PTL: Since
> lockless readers (e.g., GUP-fast, __pte_offset_map()) are protected by
> local_irq_save() rather than PTL, pmdp_get_lockless_sync() can be called
> outside PTL as long as it's before pte_free_defer().
Looking at commit 146b42e07494[1] again, it says pmdp_get_lockless_sync()
should be called "at the right moment". I now realize moving it outside
PTL might not be safe.
If we release PTL before calling pmdp_get_lockless_sync(), another CPU
could set a new PMD while a lockless reader is still in local_irq_save()
reading the old PMD (split-read). I'm not sure if this race is actually
possible, but if it is, it would hit the ABA problem where the reader
gets mismatched pmd_low (old) and pmd_high (new) - the "faint risk"
mentioned in commit 146b42e07494[1].
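A toy userspace model of that interleaving, only as a sketch: whether it
can really happen in the kernel is exactly the open question here. The
names toy_pmd, toy_hook, toy_read_recheck(), toy_install_b() and
toy_install_a2() are hypothetical, the values are arbitrary, and the two
hooks stand in for other CPUs running at fixed points inside the reader's
IRQ-off window:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a PAE pmd_t: two 32-bit halves, no atomic 64-bit load. */
struct toy_pmd {
	uint32_t low;
	uint32_t high;
};

/* Hypothetical hook: models another CPU modifying the entry. */
typedef void (*toy_hook)(volatile struct toy_pmd *pmdp);

/*
 * Split read with a recheck of the low half. Each hook fires at most once:
 * after_low runs between the two loads, after_high runs between the high
 * load and the recheck.
 */
static struct toy_pmd toy_read_recheck(volatile struct toy_pmd *pmdp,
				       toy_hook after_low, toy_hook after_high)
{
	struct toy_pmd pmd;

	do {
		pmd.low = pmdp->low;
		if (after_low) {
			after_low(pmdp);
			after_low = NULL;
		}
		pmd.high = pmdp->high;
		if (after_high) {
			after_high(pmdp);
			after_high = NULL;
		}
	} while (pmd.low != pmdp->low);
	return pmd;
}

/* First update: a different entry B replaces the old one. */
static void toy_install_b(volatile struct toy_pmd *pmdp)
{
	pmdp->low = 0x2001;
	pmdp->high = 0x2;
}

/* Second update: the old low half comes back with yet another high half. */
static void toy_install_a2(volatile struct toy_pmd *pmdp)
{
	pmdp->low = 0x1001;
	pmdp->high = 0x3;
}
```

Starting from {0x1001, 0x1} with both hooks armed, the recheck passes
because the low half is 0x1001 again, and the reader returns
{0x1001, 0x2}, which matches none of the three entries that were ever
installed. If the second update is held back until the reader has left its
window (pass NULL for after_high), the recheck fails once, the read is
retried, and the result is the consistent {0x2001, 0x2}.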
On native x86 PAE, pmdp_collapse_flush() sends IPIs and waits, preventing
this race. But on PV, the hypercall returns immediately, so we need
pmdp_get_lockless_sync() to ensure all IRQ-off readers complete before
releasing PTL.
I should keep it under PTL to be safe.
Sorry for the churn :(
[1] https://github.com/torvalds/linux/commit/146b42e07494e45f7c7bcf2cbf7afd1424afd78e
Thanks,
Lance
>
> In contrast, for non-PAE, PMD reads are atomic, so pmdp_get_lockless_sync()
> is a no-op.
>
> [1] https://github.com/torvalds/linux/commit/146b42e07494e45f7c7bcf2cbf7afd1424afd78e
> [2] https://github.com/torvalds/linux/commit/1043173eb5eb351a1dba11cca12705075fe74a9e
>
>
> Thanks,
> Lance
>
> On Fri, 16 Jan 2026 09:25:54 +0800, Lance Yang wrote:
>>
>>
>> On 2026/1/16 09:03, Baolin Wang wrote:
>>>
>>>
>>> On 1/15/26 8:28 PM, Lance Yang wrote:
>>>>
>>>>
>>>> On 2026/1/15 18:00, Baolin Wang wrote:
>>>>> Hi Lance,
>>>>>
>>>>> On 1/15/26 3:16 PM, Lance Yang wrote:
>>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>>
>>>>>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
>>>>>> which we really don't want to do while holding PTL.
>>>>>
>>>>> Could you add more comments to explain why this is safe for the PAE
>>>>> case?
>>>>
>>>> Yep, IIUC, it is safe because we've already done pmdp_collapse_flush()
>>>> which ensures the PMD change is visible.
>>>>
>>>> pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE)
>>>> is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast)
>>>> complete
>>>> before we proceed. It sends IPIs to all CPUs and waits for responses -
>>>> a CPU
>>>> can only respond when it's not between local_irq_save() and
>>>> local_irq_restore().
>>>>
>>>> Moving it out from under PTL doesn't change the synchronization
>>>> semantics,
>>>> since lockless readers don't depend on PTL anyway.
>>>
>>> Cc Hugh who introduced the pmdp_get_lockless_sync(), to double check.
>>>
>>> Sounds reasonable to me, please add these comments into the commit
>>> message. Thanks.
>>
>> Yes, will do. Thanks!
>>
>>>
>>>>> For the non-PAE case, you added a new tlb_remove_table_sync_one(),
>>>>> why we need this (to solve what problem)? Please also add more
>>>>> comments to explain.
>>>>
>>>> Oops, you're right, the original macro was a no-op for non-PAE.
>>>>
>>>> I should just move the macro call out from under PTL, rather than
>>>> replacing it with direct tlb_remove_table_sync_one() calls.
>>>
>>> OK.
>>
>> Cheers,
>> Lance
>>