linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
@ 2024-04-30 13:31 Ryan Roberts
  2024-04-30 13:55 ` Will Deacon
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-04-30 13:31 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Zi Yan, Aneesh Kumar K.V
  Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel, stable

__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry. It calls pmdp_invalidate()
unconditionally on the pmdp and only determines if it is present or not
based on the returned old pmd.

But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
pmd_present() calls to return true - even for a swap pmd. Therefore any
lockless pgtable walker could see the migration entry pmd in this state
and start interpretting the fields (e.g. pmd_pfn()) as if it were
present, leading to BadThings (TM). GUP-fast appears to be one such
lockless pgtable walker.

While the obvious fix is for core-mm to avoid such calls for non-present
pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
this case either), all other arches that implement pmd_mkinvalid() do it
in such a way that it is robust to being called with a non-present pmd.
So it is simpler and safer to make arm64 robust too. This approach means
we can even add tests to debug_vm_pgtable.c to validate the required
behaviour.

This is a theoretical bug found during code review. I don't have any
test case to trigger it in practice.

Cc: stable@vger.kernel.org
Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---

Hi all,

v1 of this fix [1] took the approach of fixing core-mm to never call
pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
suffers this problem; all other arches are robust. So his suggestion was to
instead make arm64 robust in the same way and add tests to validate it. Despite
my stated reservations in the context of the v1 discussion, having thought on it
for a bit, I now agree with Zi Yan. Hence this post.

Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
remove it from there and have this go in through the arm64 tree? Assuming there
is agreement that this approach is right one.

This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.

[1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/

Thanks,
Ryan


 arch/arm64/include/asm/pgtable.h | 12 +++++--
 mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7d580271a46d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)

 static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 {
-	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
-	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+	/*
+	 * If not valid then either we are already present-invalid or we are
+	 * not-present (i.e. none or swap entry). We must not convert
+	 * not-present to present-invalid. Unbelievably, the core-mm may call
+	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
+	 */
+	if (pmd_valid(pmd)) {
+		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
+		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+	}

 	return pmd;
 }
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 65c19025da3d..7e9c387d06b0 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
 #endif /* CONFIG_HUGETLB_PAGE */

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
+static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
+{
+	unsigned long max_swap_offset;
+	swp_entry_t swp_set, swp_clear, swp_convert;
+	pmd_t pmd_set, pmd_clear;
+
+	/*
+	 * See generic_max_swapfile_size(): probe the maximum offset, then
+	 * create swap entry will all possible bits set and a swap entry will
+	 * all bits clear.
+	 */
+	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
+	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
+	swp_clear = swp_entry(0, 0);
+
+	/* Convert to pmd. */
+	pmd_set = swp_entry_to_pmd(swp_set);
+	pmd_clear = swp_entry_to_pmd(swp_clear);
+
+	/*
+	 * Sanity check that the pmds are not-present, not-huge and swap entry
+	 * is recoverable without corruption.
+	 */
+	WARN_ON(pmd_present(pmd_set));
+	WARN_ON(pmd_trans_huge(pmd_set));
+	swp_convert = pmd_to_swp_entry(pmd_set);
+	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
+	WARN_ON(pmd_present(pmd_clear));
+	WARN_ON(pmd_trans_huge(pmd_clear));
+	swp_convert = pmd_to_swp_entry(pmd_clear);
+	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
+
+	/* Now invalidate the pmd. */
+	pmd_set = pmd_mkinvalid(pmd_set);
+	pmd_clear = pmd_mkinvalid(pmd_clear);
+
+	/*
+	 * Since its a swap pmd, invalidation should effectively be a noop and
+	 * the checks we already did should give the same answer. Check the
+	 * invalidation didn't corrupt any fields.
+	 */
+	WARN_ON(pmd_present(pmd_set));
+	WARN_ON(pmd_trans_huge(pmd_set));
+	swp_convert = pmd_to_swp_entry(pmd_set);
+	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
+	WARN_ON(pmd_present(pmd_clear));
+	WARN_ON(pmd_trans_huge(pmd_clear));
+	swp_convert = pmd_to_swp_entry(pmd_clear);
+	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
+}
+#else
+static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
+#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
 static void __init pmd_thp_tests(struct pgtable_debug_args *args)
 {
 	pmd_t pmd;
@@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
 	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
 	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
 #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
+
+	swp_pmd_mkinvalid_tests(args);
 }

 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
--
2.25.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
@ 2024-04-30 13:55 ` Will Deacon
  2024-04-30 14:04   ` Ryan Roberts
  2024-04-30 15:00 ` Zi Yan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2024-04-30 13:55 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
	linux-kernel, stable

On Tue, Apr 30, 2024 at 02:31:38PM +0100, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
> 
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpretting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
> 
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.
> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
> 
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
> 
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> 
> Hi all,
> 
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
> 
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is right one.
> 
> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
> 
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
> 
> Thanks,
> Ryan
> 
> 
>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..7d580271a46d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
> 
>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>  {
> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> +	/*
> +	 * If not valid then either we are already present-invalid or we are
> +	 * not-present (i.e. none or swap entry). We must not convert
> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
> +	 */
> +	if (pmd_valid(pmd)) {
> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> +	}
> 
>  	return pmd;
>  }

Acked-by: Will Deacon <will@kernel.org>

But it might be worth splitting the tests from the fix to make backporting
easier.

Catalin -- I assume you'll pick this up, but please shout if you want me
to take it instead.

Will


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 13:55 ` Will Deacon
@ 2024-04-30 14:04   ` Ryan Roberts
  2024-04-30 16:23     ` Catalin Marinas
  0 siblings, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-04-30 14:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
	linux-kernel, stable

On 30/04/2024 14:55, Will Deacon wrote:
> On Tue, Apr 30, 2024 at 02:31:38PM +0100, Ryan Roberts wrote:
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd.
>>
>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpretting the fields (e.g. pmd_pfn()) as if it were
>> present, leading to BadThings (TM). GUP-fast appears to be one such
>> lockless pgtable walker.
>>
>> While the obvious fix is for core-mm to avoid such calls for non-present
>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>> this case either), all other arches that implement pmd_mkinvalid() do it
>> in such a way that it is robust to being called with a non-present pmd.
>> So it is simpler and safer to make arm64 robust too. This approach means
>> we can even add tests to debug_vm_pgtable.c to validate the required
>> behaviour.
>>
>> This is a theoretical bug found during code review. I don't have any
>> test case to trigger it in practice.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Hi all,
>>
>> v1 of this fix [1] took the approach of fixing core-mm to never call
>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>> suffers this problem; all other arches are robust. So his suggestion was to
>> instead make arm64 robust in the same way and add tests to validate it. Despite
>> my stated reservations in the context of the v1 discussion, having thought on it
>> for a bit, I now agree with Zi Yan. Hence this post.
>>
>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>> remove it from there and have this go in through the arm64 tree? Assuming there
>> is agreement that this approach is right one.
>>
>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>
>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>
>> Thanks,
>> Ryan
>>
>>
>>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>>  2 files changed, 71 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..7d580271a46d 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>
>>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>  {
>> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> +	/*
>> +	 * If not valid then either we are already present-invalid or we are
>> +	 * not-present (i.e. none or swap entry). We must not convert
>> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
>> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>> +	 */
>> +	if (pmd_valid(pmd)) {
>> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> +	}
>>
>>  	return pmd;
>>  }
> 
> Acked-by: Will Deacon <will@kernel.org>

Thanks

> 
> But it might be worth splitting the tests from the fix to make backporting
> easier.

Yes good point. I'll leave this hanging for today to see if any more comments
come in, and will re-post tomorrow as 2 patches. I assume we need to go fast to
catch 6.9.

> 
> Catalin -- I assume you'll pick this up, but please shout if you want me
> to take it instead.
> 
> Will



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
  2024-04-30 13:55 ` Will Deacon
@ 2024-04-30 15:00 ` Zi Yan
  2024-04-30 17:57 ` Catalin Marinas
  2024-05-01 11:35 ` Ryan Roberts
  3 siblings, 0 replies; 15+ messages in thread
From: Zi Yan @ 2024-04-30 15:00 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Zi Yan, Aneesh Kumar K.V, linux-arm-kernel,
	linux-mm, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

On 30 Apr 2024, at 9:31, Ryan Roberts wrote:

> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
>
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpretting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
>
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.
> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
>
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
>
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>
> Hi all,
>
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
>
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is right one.
>
> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
>
>
>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+), 2 deletions(-)
>
LGTM. Thanks. Feel free to add Reviewed-by: Zi Yan <ziy@nvidia.com>.

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 14:04   ` Ryan Roberts
@ 2024-04-30 16:23     ` Catalin Marinas
  2024-04-30 16:25       ` Ryan Roberts
  0 siblings, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2024-04-30 16:23 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
	linux-kernel, stable

On Tue, Apr 30, 2024 at 03:04:32PM +0100, Ryan Roberts wrote:
> On 30/04/2024 14:55, Will Deacon wrote:
> > But it might be worth splitting the tests from the fix to make backporting
> > easier.
> 
> Yes good point. I'll leave this hanging for today to see if any more comments
> come in, and will re-post tomorrow as 2 patches. I assume we need to go fast to
> catch 6.9.

Yes, I'll pick it up for 6.9. I can drop the tests from the patch (and
their mention in the log) and you can post your tests separately to go
via Andrew's tree.

-- 
Catalin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 16:23     ` Catalin Marinas
@ 2024-04-30 16:25       ` Ryan Roberts
  0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-04-30 16:25 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
	linux-kernel, stable

On 30/04/2024 17:23, Catalin Marinas wrote:
> On Tue, Apr 30, 2024 at 03:04:32PM +0100, Ryan Roberts wrote:
>> On 30/04/2024 14:55, Will Deacon wrote:
>>> But it might be worth splitting the tests from the fix to make backporting
>>> easier.
>>
>> Yes good point. I'll leave this hanging for today to see if any more comments
>> come in, and will re-post tomorrow as 2 patches. I assume we need to go fast to
>> catch 6.9.
> 
> Yes, I'll pick it up for 6.9. I can drop the tests from the patch (and
> their mention in the log) and you can post your tests separately to go
> via Andrew's tree.
> 

Yep that works for me! thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
  2024-04-30 13:55 ` Will Deacon
  2024-04-30 15:00 ` Zi Yan
@ 2024-04-30 17:57 ` Catalin Marinas
  2024-05-01  8:05   ` Ryan Roberts
  2024-05-02 18:00   ` Catalin Marinas
  2024-05-01 11:35 ` Ryan Roberts
  3 siblings, 2 replies; 15+ messages in thread
From: Catalin Marinas @ 2024-04-30 17:57 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, Ryan Roberts
  Cc: linux-arm-kernel, linux-mm, linux-kernel, stable

On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
> 
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpretting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
removed the debug/test code, please send it as a separate patch for
6.10.

[1/1] arm64/mm: pmd_mkinvalid() must handle swap pmds
      https://git.kernel.org/arm64/c/e783331c7720

-- 
Catalin



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 17:57 ` Catalin Marinas
@ 2024-05-01  8:05   ` Ryan Roberts
  2024-05-01 10:04     ` Catalin Marinas
  2024-05-02 18:00   ` Catalin Marinas
  1 sibling, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01  8:05 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Zi Yan, Aneesh Kumar K.V
  Cc: linux-arm-kernel, linux-mm, linux-kernel, stable

On 30/04/2024 18:57, Catalin Marinas wrote:
> On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd.
>>
>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpretting the fields (e.g. pmd_pfn()) as if it were
>> present, leading to BadThings (TM). GUP-fast appears to be one such
>> lockless pgtable walker.
>>
>> [...]
> 
> Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
> removed the debug/test code, please send it as a separate patch for
> 6.10.

Thanks Catalin! I'm guessing this will turn up in today's linux-next, so if I
send the tests today and Andrew puts them straight in mm-unstable (which will
goto linux-next) there is no risk that the tests are there without the fix? Or
do I need to hold off until the fix is in v6.9-rc7?

> 
> [1/1] arm64/mm: pmd_mkinvalid() must handle swap pmds
>       https://git.kernel.org/arm64/c/e783331c7720
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-05-01  8:05   ` Ryan Roberts
@ 2024-05-01 10:04     ` Catalin Marinas
  2024-05-01 10:13       ` Ryan Roberts
  0 siblings, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2024-05-01 10:04 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
	linux-kernel, stable

On Wed, May 01, 2024 at 09:05:17AM +0100, Ryan Roberts wrote:
> On 30/04/2024 18:57, Catalin Marinas wrote:
> > On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
> >> __split_huge_pmd_locked() can be called for a present THP, devmap or
> >> (non-present) migration entry. It calls pmdp_invalidate()
> >> unconditionally on the pmdp and only determines if it is present or not
> >> based on the returned old pmd.
> >>
> >> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> >> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> >> pmd_present() calls to return true - even for a swap pmd. Therefore any
> >> lockless pgtable walker could see the migration entry pmd in this state
> >> and start interpretting the fields (e.g. pmd_pfn()) as if it were
> >> present, leading to BadThings (TM). GUP-fast appears to be one such
> >> lockless pgtable walker.
> >>
> >> [...]
> > 
> > Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
> > removed the debug/test code, please send it as a separate patch for
> > 6.10.
> 
> Thanks Catalin! I'm guessing this will turn up in today's linux-next, so if I
> send the tests today and Andrew puts them straight in mm-unstable (which will
> goto linux-next) there is no risk that the tests are there without the fix? Or
> do I need to hold off until the fix is in v6.9-rc7?

It looks like we don't push for-next/fixes to linux-next, it's
short-lived usually, it ends up upstream quickly. I can send the pull
request later today, should turn up in mainline by tomorrow. You can add
a note to your patch for Andrew that it will fail on arm64 until the fix
ends up upstream. It's a matter of a couple of days anyway.

-- 
Catalin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-05-01 10:04     ` Catalin Marinas
@ 2024-05-01 10:13       ` Ryan Roberts
  0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 10:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
	linux-kernel, stable

On 01/05/2024 11:04, Catalin Marinas wrote:
> On Wed, May 01, 2024 at 09:05:17AM +0100, Ryan Roberts wrote:
>> On 30/04/2024 18:57, Catalin Marinas wrote:
>>> On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>> (non-present) migration entry. It calls pmdp_invalidate()
>>>> unconditionally on the pmdp and only determines if it is present or not
>>>> based on the returned old pmd.
>>>>
>>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>> and start interpretting the fields (e.g. pmd_pfn()) as if it were
>>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>>> lockless pgtable walker.
>>>>
>>>> [...]
>>>
>>> Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
>>> removed the debug/test code, please send it as a separate patch for
>>> 6.10.
>>
>> Thanks Catalin! I'm guessing this will turn up in today's linux-next, so if I
>> send the tests today and Andrew puts them straight in mm-unstable (which will
>> goto linux-next) there is no risk that the tests are there without the fix? Or
>> do I need to hold off until the fix is in v6.9-rc7?
> 
> It looks like we don't push for-next/fixes to linux-next, it's
> short-lived usually, it ends up upstream quickly. I can send the pull
> request later today, should turn up in mainline by tomorrow. You can add
> a note to your patch for Andrew that it will fail on arm64 until the fix
> ends up upstream. It's a matter of a couple of days anyway.

OK, I just didn't want to send it only for our CI to start failing. I'll do as
you suggest.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
                   ` (2 preceding siblings ...)
  2024-04-30 17:57 ` Catalin Marinas
@ 2024-05-01 11:35 ` Ryan Roberts
  2024-05-01 11:38   ` Ryan Roberts
  3 siblings, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 11:35 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Zi Yan, Aneesh Kumar K.V
  Cc: linux-arm-kernel, linux-mm, linux-kernel, stable

Zi Yan, I'm hoping you might have some input on the below...


On 30/04/2024 14:31, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
> 
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpretting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
> 
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.

OK the plot thickens; The tests I wrote to check that pmd_mkinvalid() is safe for swap entries fails on x86_64. See below...

> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
> 
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
> 
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> 
> Hi all,
> 
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
> 
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is right one.
> 
> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
> 
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
> 
> Thanks,
> Ryan
> 
> 
>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..7d580271a46d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
> 
>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>  {
> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> +	/*
> +	 * If not valid then either we are already present-invalid or we are
> +	 * not-present (i.e. none or swap entry). We must not convert
> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
> +	 */
> +	if (pmd_valid(pmd)) {
> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> +	}
> 
>  	return pmd;
>  }
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 65c19025da3d..7e9c387d06b0 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>  #endif /* CONFIG_HUGETLB_PAGE */
> 
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
> +{

Printing various values at different locations in this function for debug:

> +	unsigned long max_swap_offset;
> +	swp_entry_t swp_set, swp_clear, swp_convert;
> +	pmd_t pmd_set, pmd_clear;
> +
> +	/*
> +	 * See generic_max_swapfile_size(): probe the maximum offset, then
> +	 * create swap entry will all possible bits set and a swap entry will
> +	 * all bits clear.
> +	 */
> +	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
> +	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
> +	swp_clear = swp_entry(0, 0);
> +
> +	/* Convert to pmd. */
> +	pmd_set = swp_entry_to_pmd(swp_set);
> +	pmd_clear = swp_entry_to_pmd(swp_clear);

[    0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00

> +
> +	/*
> +	 * Sanity check that the pmds are not-present, not-huge and swap entry
> +	 * is recoverable without corruption.
> +	 */
> +	WARN_ON(pmd_present(pmd_set));
> +	WARN_ON(pmd_trans_huge(pmd_set));
> +	swp_convert = pmd_to_swp_entry(pmd_set);
> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
> +	WARN_ON(pmd_present(pmd_clear));
> +	WARN_ON(pmd_trans_huge(pmd_clear));
> +	swp_convert = pmd_to_swp_entry(pmd_clear);
> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
> +
> +	/* Now invalidate the pmd. */
> +	pmd_set = pmd_mkinvalid(pmd_set);
> +	pmd_clear = pmd_mkinvalid(pmd_clear);

[    0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00

> +
> +	/*
> +	 * Since its a swap pmd, invalidation should effectively be a noop and
> +	 * the checks we already did should give the same answer. Check the
> +	 * invalidation didn't corrupt any fields.
> +	 */
> +	WARN_ON(pmd_present(pmd_set));
> +	WARN_ON(pmd_trans_huge(pmd_set));
> +	swp_convert = pmd_to_swp_entry(pmd_set);

[    0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)

> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
> +	WARN_ON(pmd_present(pmd_clear));
> +	WARN_ON(pmd_trans_huge(pmd_clear));
> +	swp_convert = pmd_to_swp_entry(pmd_clear);

[    0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)

> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));

This line fails on x86_64.

The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.

I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.

So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.

> +}
> +#else
> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
> +
>  static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>  {
>  	pmd_t pmd;
> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>  	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>  	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>  #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
> +
> +	swp_pmd_mkinvalid_tests(args);
>  }
> 
>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> --
> 2.25.1
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-05-01 11:35 ` Ryan Roberts
@ 2024-05-01 11:38   ` Ryan Roberts
  2024-05-01 12:07     ` Zi Yan
  0 siblings, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 11:38 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Zi Yan, Aneesh Kumar K.V, David Hildenbrand
  Cc: linux-arm-kernel, linux-mm, linux-kernel, stable

Pulling in David, who may be able to advise...


On 01/05/2024 12:35, Ryan Roberts wrote:
> Zi Yan, I'm hoping you might have some input on the below...
> 
> 
> On 30/04/2024 14:31, Ryan Roberts wrote:
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd.
>>
>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpretting the fields (e.g. pmd_pfn()) as if it were
>> present, leading to BadThings (TM). GUP-fast appears to be one such
>> lockless pgtable walker.
>>
>> While the obvious fix is for core-mm to avoid such calls for non-present
>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>> this case either), all other arches that implement pmd_mkinvalid() do it
>> in such a way that it is robust to being called with a non-present pmd.
> 
> OK the plot thickens; The tests I wrote to check that pmd_mkinvalid() is safe for swap entries fails on x86_64. See below...
> 
>> So it is simpler and safer to make arm64 robust too. This approach means
>> we can even add tests to debug_vm_pgtable.c to validate the required
>> behaviour.
>>
>> This is a theoretical bug found during code review. I don't have any
>> test case to trigger it in practice.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Hi all,
>>
>> v1 of this fix [1] took the approach of fixing core-mm to never call
>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>> suffers this problem; all other arches are robust. So his suggestion was to
>> instead make arm64 robust in the same way and add tests to validate it. Despite
>> my stated reservations in the context of the v1 discussion, having thought on it
>> for a bit, I now agree with Zi Yan. Hence this post.
>>
>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>> remove it from there and have this go in through the arm64 tree? Assuming there
>> is agreement that this approach is right one.
>>
>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>
>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>
>> Thanks,
>> Ryan
>>
>>
>>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>>  2 files changed, 71 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..7d580271a46d 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>
>>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>  {
>> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> +	/*
>> +	 * If not valid then either we are already present-invalid or we are
>> +	 * not-present (i.e. none or swap entry). We must not convert
>> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
>> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>> +	 */
>> +	if (pmd_valid(pmd)) {
>> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> +	}
>>
>>  	return pmd;
>>  }
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 65c19025da3d..7e9c387d06b0 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>  #endif /* CONFIG_HUGETLB_PAGE */
>>
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>> +{
> 
> Printing various values at different locations in this function for debug:
> 
>> +	unsigned long max_swap_offset;
>> +	swp_entry_t swp_set, swp_clear, swp_convert;
>> +	pmd_t pmd_set, pmd_clear;
>> +
>> +	/*
>> +	 * See generic_max_swapfile_size(): probe the maximum offset, then
>> +	 * create swap entry will all possible bits set and a swap entry will
>> +	 * all bits clear.
>> +	 */
>> +	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>> +	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>> +	swp_clear = swp_entry(0, 0);
>> +
>> +	/* Convert to pmd. */
>> +	pmd_set = swp_entry_to_pmd(swp_set);
>> +	pmd_clear = swp_entry_to_pmd(swp_clear);
> 
> [    0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
> 
>> +
>> +	/*
>> +	 * Sanity check that the pmds are not-present, not-huge and swap entry
>> +	 * is recoverable without corruption.
>> +	 */
>> +	WARN_ON(pmd_present(pmd_set));
>> +	WARN_ON(pmd_trans_huge(pmd_set));
>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>> +	WARN_ON(pmd_present(pmd_clear));
>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>> +
>> +	/* Now invalidate the pmd. */
>> +	pmd_set = pmd_mkinvalid(pmd_set);
>> +	pmd_clear = pmd_mkinvalid(pmd_clear);
> 
> [    0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
> 
>> +
>> +	/*
>> +	 * Since its a swap pmd, invalidation should effectively be a noop and
>> +	 * the checks we already did should give the same answer. Check the
>> +	 * invalidation didn't corrupt any fields.
>> +	 */
>> +	WARN_ON(pmd_present(pmd_set));
>> +	WARN_ON(pmd_trans_huge(pmd_set));
>> +	swp_convert = pmd_to_swp_entry(pmd_set);
> 
> [    0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
> 
>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>> +	WARN_ON(pmd_present(pmd_clear));
>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
> 
> [    0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
> 
>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
> 
> This line fails on x86_64.
> 
> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
> 
> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
> 
> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
> 
>> +}
>> +#else
>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>> +
>>  static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>  {
>>  	pmd_t pmd;
>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>  	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>  	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>  #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>> +
>> +	swp_pmd_mkinvalid_tests(args);
>>  }
>>
>>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> --
>> 2.25.1
>>
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-05-01 11:38   ` Ryan Roberts
@ 2024-05-01 12:07     ` Zi Yan
  2024-05-01 12:58       ` Ryan Roberts
  0 siblings, 1 reply; 15+ messages in thread
From: Zi Yan @ 2024-05-01 12:07 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Aneesh Kumar K.V, David Hildenbrand,
	linux-arm-kernel, linux-mm, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 8872 bytes --]

On 1 May 2024, at 7:38, Ryan Roberts wrote:

> Pulling in David, who may be able to advise...
>
>
> On 01/05/2024 12:35, Ryan Roberts wrote:
>> Zi Yan, I'm hoping you might have some input on the below...
>>
>>
>> On 30/04/2024 14:31, Ryan Roberts wrote:
>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>> (non-present) migration entry. It calls pmdp_invalidate()
>>> unconditionally on the pmdp and only determines if it is present or not
>>> based on the returned old pmd.
>>>
>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>> lockless pgtable walker could see the migration entry pmd in this state
>>> and start interpretting the fields (e.g. pmd_pfn()) as if it were
>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>> lockless pgtable walker.
>>>
>>> While the obvious fix is for core-mm to avoid such calls for non-present
>>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>>> this case either), all other arches that implement pmd_mkinvalid() do it
>>> in such a way that it is robust to being called with a non-present pmd.
>>
>> OK the plot thickens; The tests I wrote to check that pmd_mkinvalid() is safe for swap entries fails on x86_64. See below...
>>
>>> So it is simpler and safer to make arm64 robust too. This approach means
>>> we can even add tests to debug_vm_pgtable.c to validate the required
>>> behaviour.
>>>
>>> This is a theoretical bug found during code review. I don't have any
>>> test case to trigger it in practice.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>
>>> Hi all,
>>>
>>> v1 of this fix [1] took the approach of fixing core-mm to never call
>>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>>> suffers this problem; all other arches are robust. So his suggestion was to
>>> instead make arm64 robust in the same way and add tests to validate it. Despite
>>> my stated reservations in the context of the v1 discussion, having thought on it
>>> for a bit, I now agree with Zi Yan. Hence this post.
>>>
>>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>>> remove it from there and have this go in through the arm64 tree? Assuming there
>>> is agreement that this approach is right one.
>>>
>>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>>>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>>>  2 files changed, 71 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index afdd56d26ad7..7d580271a46d 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>
>>>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>>  {
>>> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>> +	/*
>>> +	 * If not valid then either we are already present-invalid or we are
>>> +	 * not-present (i.e. none or swap entry). We must not convert
>>> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
>>> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>>> +	 */
>>> +	if (pmd_valid(pmd)) {
>>> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>> +	}
>>>
>>>  	return pmd;
>>>  }
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 65c19025da3d..7e9c387d06b0 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>>  #endif /* CONFIG_HUGETLB_PAGE */
>>>
>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>>> +{
>>
>> Printing various values at different locations in this function for debug:
>>
>>> +	unsigned long max_swap_offset;
>>> +	swp_entry_t swp_set, swp_clear, swp_convert;
>>> +	pmd_t pmd_set, pmd_clear;
>>> +
>>> +	/*
>>> +	 * See generic_max_swapfile_size(): probe the maximum offset, then
>>> +	 * create swap entry will all possible bits set and a swap entry will
>>> +	 * all bits clear.
>>> +	 */
>>> +	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>>> +	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>>> +	swp_clear = swp_entry(0, 0);
>>> +
>>> +	/* Convert to pmd. */
>>> +	pmd_set = swp_entry_to_pmd(swp_set);
>>> +	pmd_clear = swp_entry_to_pmd(swp_clear);
>>
>> [    0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>>
>>> +
>>> +	/*
>>> +	 * Sanity check that the pmds are not-present, not-huge and swap entry
>>> +	 * is recoverable without corruption.
>>> +	 */
>>> +	WARN_ON(pmd_present(pmd_set));
>>> +	WARN_ON(pmd_trans_huge(pmd_set));
>>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>> +	WARN_ON(pmd_present(pmd_clear));
>>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>> +
>>> +	/* Now invalidate the pmd. */
>>> +	pmd_set = pmd_mkinvalid(pmd_set);
>>> +	pmd_clear = pmd_mkinvalid(pmd_clear);
>>
>> [    0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>>
>>> +
>>> +	/*
>>> +	 * Since its a swap pmd, invalidation should effectively be a noop and
>>> +	 * the checks we already did should give the same answer. Check the
>>> +	 * invalidation didn't corrupt any fields.
>>> +	 */
>>> +	WARN_ON(pmd_present(pmd_set));
>>> +	WARN_ON(pmd_trans_huge(pmd_set));
>>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>>
>> [    0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>>
>>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>> +	WARN_ON(pmd_present(pmd_clear));
>>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>>
>> [    0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>>
>>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>
>> This line fails on x86_64.
>>
>> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>>
>> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>>
>> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.

If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix
is better. I will dig into the x86 code more to figure out what goes wrong.
Last time, I only checked PAGE_* bits in these pmd|pte_* operations.
Sorry for the misinformation.

>>
>>> +}
>>> +#else
>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>> +
>>>  static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>  {
>>>  	pmd_t pmd;
>>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>  	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>  	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>  #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>>> +
>>> +	swp_pmd_mkinvalid_tests(args);
>>>  }
>>>
>>>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>> --
>>> 2.25.1
>>>
>>


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-05-01 12:07     ` Zi Yan
@ 2024-05-01 12:58       ` Ryan Roberts
  0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 12:58 UTC (permalink / raw)
  To: Zi Yan
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
	Andrew Morton, Aneesh Kumar K.V, David Hildenbrand,
	linux-arm-kernel, linux-mm, linux-kernel, stable

On 01/05/2024 13:07, Zi Yan wrote:
> On 1 May 2024, at 7:38, Ryan Roberts wrote:
> 
>> Pulling in David, who may be able to advise...
>>
>>
>> On 01/05/2024 12:35, Ryan Roberts wrote:
>>> Zi Yan, I'm hoping you might have some input on the below...
>>>
>>>
>>> On 30/04/2024 14:31, Ryan Roberts wrote:
>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>> (non-present) migration entry. It calls pmdp_invalidate()
>>>> unconditionally on the pmdp and only determines if it is present or not
>>>> based on the returned old pmd.
>>>>
>>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>> and start interpretting the fields (e.g. pmd_pfn()) as if it were
>>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>>> lockless pgtable walker.
>>>>
>>>> While the obvious fix is for core-mm to avoid such calls for non-present
>>>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>>>> this case either), all other arches that implement pmd_mkinvalid() do it
>>>> in such a way that it is robust to being called with a non-present pmd.
>>>
>>> OK the plot thickens; The tests I wrote to check that pmd_mkinvalid() is safe for swap entries fails on x86_64. See below...
>>>
>>>> So it is simpler and safer to make arm64 robust too. This approach means
>>>> we can even add tests to debug_vm_pgtable.c to validate the required
>>>> behaviour.
>>>>
>>>> This is a theoretical bug found during code review. I don't have any
>>>> test case to trigger it in practice.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>
>>>> Hi all,
>>>>
>>>> v1 of this fix [1] took the approach of fixing core-mm to never call
>>>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>>>> suffers this problem; all other arches are robust. So his suggestion was to
>>>> instead make arm64 robust in the same way and add tests to validate it. Despite
>>>> my stated reservations in the context of the v1 discussion, having thought on it
>>>> for a bit, I now agree with Zi Yan. Hence this post.
>>>>
>>>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>>>> remove it from there and have this go in through the arm64 tree? Assuming there
>>>> is agreement that this approach is right one.
>>>>
>>>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>>>>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>>>>  2 files changed, 71 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index afdd56d26ad7..7d580271a46d 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>>
>>>>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>>>  {
>>>> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> +	/*
>>>> +	 * If not valid then either we are already present-invalid or we are
>>>> +	 * not-present (i.e. none or swap entry). We must not convert
>>>> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
>>>> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>>>> +	 */
>>>> +	if (pmd_valid(pmd)) {
>>>> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> +	}
>>>>
>>>>  	return pmd;
>>>>  }
>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>> index 65c19025da3d..7e9c387d06b0 100644
>>>> --- a/mm/debug_vm_pgtable.c
>>>> +++ b/mm/debug_vm_pgtable.c
>>>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>>>  #endif /* CONFIG_HUGETLB_PAGE */
>>>>
>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>>>> +{
>>>
>>> Printing various values at different locations in this function for debug:
>>>
>>>> +	unsigned long max_swap_offset;
>>>> +	swp_entry_t swp_set, swp_clear, swp_convert;
>>>> +	pmd_t pmd_set, pmd_clear;
>>>> +
>>>> +	/*
>>>> +	 * See generic_max_swapfile_size(): probe the maximum offset, then
>>>> +	 * create swap entry will all possible bits set and a swap entry will
>>>> +	 * all bits clear.
>>>> +	 */
>>>> +	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>>>> +	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>>>> +	swp_clear = swp_entry(0, 0);
>>>> +
>>>> +	/* Convert to pmd. */
>>>> +	pmd_set = swp_entry_to_pmd(swp_set);
>>>> +	pmd_clear = swp_entry_to_pmd(swp_clear);
>>>
>>> [    0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>>>
>>>> +
>>>> +	/*
>>>> +	 * Sanity check that the pmds are not-present, not-huge and swap entry
>>>> +	 * is recoverable without corruption.
>>>> +	 */
>>>> +	WARN_ON(pmd_present(pmd_set));
>>>> +	WARN_ON(pmd_trans_huge(pmd_set));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>>>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> +	WARN_ON(pmd_present(pmd_clear));
>>>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>>>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>> +
>>>> +	/* Now invalidate the pmd. */
>>>> +	pmd_set = pmd_mkinvalid(pmd_set);
>>>> +	pmd_clear = pmd_mkinvalid(pmd_clear);
>>>
>>> [    0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>>>
>>>> +
>>>> +	/*
>>>> +	 * Since its a swap pmd, invalidation should effectively be a noop and
>>>> +	 * the checks we already did should give the same answer. Check the
>>>> +	 * invalidation didn't corrupt any fields.
>>>> +	 */
>>>> +	WARN_ON(pmd_present(pmd_set));
>>>> +	WARN_ON(pmd_trans_huge(pmd_set));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>>>
>>> [    0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>>>
>>>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> +	WARN_ON(pmd_present(pmd_clear));
>>>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>>>
>>> [    0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests  ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>>>
>>>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>
>>> This line fails on x86_64.
>>>
>>> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>>>
>>> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>>>
>>> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
> 
> If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix
> is better. I will dig into the x86 code more to figure out what goes wrong.
> Last time, I only checked PAGE_* bits in these pmd|pte_* operations.
> Sorry for the misinformation.

No worries, I'll do the amends we originally agreed for the original fix and resend.

> 
>>>
>>>> +}
>>>> +#else
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>>>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>> +
>>>>  static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>>  {
>>>>  	pmd_t pmd;
>>>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>>  	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>>  	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>>  #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>>>> +
>>>> +	swp_pmd_mkinvalid_tests(args);
>>>>  }
>>>>
>>>>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> --
>>>> 2.25.1
>>>>
>>>
> 
> 
> --
> Best Regards,
> Yan, Zi



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
  2024-04-30 17:57 ` Catalin Marinas
  2024-05-01  8:05   ` Ryan Roberts
@ 2024-05-02 18:00   ` Catalin Marinas
  1 sibling, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2024-05-02 18:00 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
	Zi Yan, Aneesh Kumar K.V, Ryan Roberts
  Cc: linux-arm-kernel, linux-mm, linux-kernel, stable

On Tue, Apr 30, 2024 at 06:57:52PM +0100, Catalin Marinas wrote:
> On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
> > __split_huge_pmd_locked() can be called for a present THP, devmap or
> > (non-present) migration entry. It calls pmdp_invalidate()
> > unconditionally on the pmdp and only determines if it is present or not
> > based on the returned old pmd.
> > 
> > But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> > unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> > pmd_present() calls to return true - even for a swap pmd. Therefore any
> > lockless pgtable walker could see the migration entry pmd in this state
> > and start interpretting the fields (e.g. pmd_pfn()) as if it were
> > present, leading to BadThings (TM). GUP-fast appears to be one such
> > lockless pgtable walker.
> > 
> > [...]
> 
> Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
> removed the debug/test code, please send it as a separate patch for
> 6.10.
> 
> [1/1] arm64/mm: pmd_mkinvalid() must handle swap pmds
>       https://git.kernel.org/arm64/c/e783331c7720

Since Andrew merged the generic mm fix, I dropped this patch from the
arm64 for-next/fixes branch.

-- 
Catalin


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-02 18:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
2024-04-30 13:55 ` Will Deacon
2024-04-30 14:04   ` Ryan Roberts
2024-04-30 16:23     ` Catalin Marinas
2024-04-30 16:25       ` Ryan Roberts
2024-04-30 15:00 ` Zi Yan
2024-04-30 17:57 ` Catalin Marinas
2024-05-01  8:05   ` Ryan Roberts
2024-05-01 10:04     ` Catalin Marinas
2024-05-01 10:13       ` Ryan Roberts
2024-05-02 18:00   ` Catalin Marinas
2024-05-01 11:35 ` Ryan Roberts
2024-05-01 11:38   ` Ryan Roberts
2024-05-01 12:07     ` Zi Yan
2024-05-01 12:58       ` Ryan Roberts

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox