* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
@ 2024-04-30 13:55 ` Will Deacon
2024-04-30 14:04 ` Ryan Roberts
2024-04-30 15:00 ` Zi Yan
` (2 subsequent siblings)
3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2024-04-30 13:55 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
linux-kernel, stable
On Tue, Apr 30, 2024 at 02:31:38PM +0100, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
>
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpreting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
>
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.
> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
>
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
>
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>
> Hi all,
>
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
>
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is the right one.
>
> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
>
>
> arch/arm64/include/asm/pgtable.h | 12 +++++--
> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
> 2 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..7d580271a46d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>
> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
> {
> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> + /*
> + * If not valid then either we are already present-invalid or we are
> + * not-present (i.e. none or swap entry). We must not convert
> + * not-present to present-invalid. Unbelievably, the core-mm may call
> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
> + */
> + if (pmd_valid(pmd)) {
> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> + }
>
> return pmd;
> }
Acked-by: Will Deacon <will@kernel.org>
But it might be worth splitting the tests from the fix to make backporting
easier.
Catalin -- I assume you'll pick this up, but please shout if you want me
to take it instead.
Will
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 13:55 ` Will Deacon
@ 2024-04-30 14:04 ` Ryan Roberts
2024-04-30 16:23 ` Catalin Marinas
0 siblings, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-04-30 14:04 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
linux-kernel, stable
On 30/04/2024 14:55, Will Deacon wrote:
> On Tue, Apr 30, 2024 at 02:31:38PM +0100, Ryan Roberts wrote:
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd.
>>
>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>> present, leading to BadThings (TM). GUP-fast appears to be one such
>> lockless pgtable walker.
>>
>> While the obvious fix is for core-mm to avoid such calls for non-present
>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>> this case either), all other arches that implement pmd_mkinvalid() do it
>> in such a way that it is robust to being called with a non-present pmd.
>> So it is simpler and safer to make arm64 robust too. This approach means
>> we can even add tests to debug_vm_pgtable.c to validate the required
>> behaviour.
>>
>> This is a theoretical bug found during code review. I don't have any
>> test case to trigger it in practice.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Hi all,
>>
>> v1 of this fix [1] took the approach of fixing core-mm to never call
>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>> suffers this problem; all other arches are robust. So his suggestion was to
>> instead make arm64 robust in the same way and add tests to validate it. Despite
>> my stated reservations in the context of the v1 discussion, having thought on it
>> for a bit, I now agree with Zi Yan. Hence this post.
>>
>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is the right one.
>>
>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>
>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>
>> Thanks,
>> Ryan
>>
>>
>> arch/arm64/include/asm/pgtable.h | 12 +++++--
>> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
>> 2 files changed, 71 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..7d580271a46d 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>
>> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>> {
>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> + /*
>> + * If not valid then either we are already present-invalid or we are
>> + * not-present (i.e. none or swap entry). We must not convert
>> + * not-present to present-invalid. Unbelievably, the core-mm may call
>> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>> + */
>> + if (pmd_valid(pmd)) {
>> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> + }
>>
>> return pmd;
>> }
>
> Acked-by: Will Deacon <will@kernel.org>
Thanks
>
> But it might be worth splitting the tests from the fix to make backporting
> easier.
Yes good point. I'll leave this hanging for today to see if any more comments
come in, and will re-post tomorrow as 2 patches. I assume we need to go fast to
catch 6.9.
>
> Catalin -- I assume you'll pick this up, but please shout if you want me
> to take it instead.
>
> Will
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 14:04 ` Ryan Roberts
@ 2024-04-30 16:23 ` Catalin Marinas
2024-04-30 16:25 ` Ryan Roberts
0 siblings, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2024-04-30 16:23 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
linux-kernel, stable
On Tue, Apr 30, 2024 at 03:04:32PM +0100, Ryan Roberts wrote:
> On 30/04/2024 14:55, Will Deacon wrote:
> > But it might be worth splitting the tests from the fix to make backporting
> > easier.
>
> Yes good point. I'll leave this hanging for today to see if any more comments
> come in, and will re-post tomorrow as 2 patches. I assume we need to go fast to
> catch 6.9.
Yes, I'll pick it up for 6.9. I can drop the tests from the patch (and
their mention in the log) and you can post your tests separately to go
via Andrew's tree.
--
Catalin
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 16:23 ` Catalin Marinas
@ 2024-04-30 16:25 ` Ryan Roberts
0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-04-30 16:25 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
linux-kernel, stable
On 30/04/2024 17:23, Catalin Marinas wrote:
> On Tue, Apr 30, 2024 at 03:04:32PM +0100, Ryan Roberts wrote:
>> On 30/04/2024 14:55, Will Deacon wrote:
>>> But it might be worth splitting the tests from the fix to make backporting
>>> easier.
>>
>> Yes good point. I'll leave this hanging for today to see if any more comments
>> come in, and will re-post tomorrow as 2 patches. I assume we need to go fast to
>> catch 6.9.
>
> Yes, I'll pick it up for 6.9. I can drop the tests from the patch (and
> their mention in the log) and you can post your tests separately to go
> via Andrew's tree.
>
Yep, that works for me! Thanks.
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
2024-04-30 13:55 ` Will Deacon
@ 2024-04-30 15:00 ` Zi Yan
2024-04-30 17:57 ` Catalin Marinas
2024-05-01 11:35 ` Ryan Roberts
3 siblings, 0 replies; 15+ messages in thread
From: Zi Yan @ 2024-04-30 15:00 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
Andrew Morton, Zi Yan, Aneesh Kumar K.V, linux-arm-kernel,
linux-mm, linux-kernel, stable
On 30 Apr 2024, at 9:31, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
>
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpreting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
>
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.
> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
>
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
>
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>
> Hi all,
>
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
>
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is the right one.
>
> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
>
>
> arch/arm64/include/asm/pgtable.h | 12 +++++--
> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
> 2 files changed, 71 insertions(+), 2 deletions(-)
>
LGTM. Thanks. Feel free to add Reviewed-by: Zi Yan <ziy@nvidia.com>.
--
Best Regards,
Yan, Zi
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
2024-04-30 13:55 ` Will Deacon
2024-04-30 15:00 ` Zi Yan
@ 2024-04-30 17:57 ` Catalin Marinas
2024-05-01 8:05 ` Ryan Roberts
2024-05-02 18:00 ` Catalin Marinas
2024-05-01 11:35 ` Ryan Roberts
3 siblings, 2 replies; 15+ messages in thread
From: Catalin Marinas @ 2024-04-30 17:57 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, Ryan Roberts
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
>
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpreting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
>
> [...]
Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
removed the debug/test code, please send it as a separate patch for
6.10.
[1/1] arm64/mm: pmd_mkinvalid() must handle swap pmds
https://git.kernel.org/arm64/c/e783331c7720
--
Catalin
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 17:57 ` Catalin Marinas
@ 2024-05-01 8:05 ` Ryan Roberts
2024-05-01 10:04 ` Catalin Marinas
2024-05-02 18:00 ` Catalin Marinas
1 sibling, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 8:05 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
Andrew Morton, Zi Yan, Aneesh Kumar K.V
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 30/04/2024 18:57, Catalin Marinas wrote:
> On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd.
>>
>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>> present, leading to BadThings (TM). GUP-fast appears to be one such
>> lockless pgtable walker.
>>
>> [...]
>
> Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
> removed the debug/test code, please send it as a separate patch for
> 6.10.
Thanks Catalin! I'm guessing this will turn up in today's linux-next, so if I
send the tests today and Andrew puts them straight in mm-unstable (which will
go to linux-next) there is no risk that the tests are there without the fix? Or
do I need to hold off until the fix is in v6.9-rc7?
>
> [1/1] arm64/mm: pmd_mkinvalid() must handle swap pmds
> https://git.kernel.org/arm64/c/e783331c7720
>
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-05-01 8:05 ` Ryan Roberts
@ 2024-05-01 10:04 ` Catalin Marinas
2024-05-01 10:13 ` Ryan Roberts
0 siblings, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2024-05-01 10:04 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
linux-kernel, stable
On Wed, May 01, 2024 at 09:05:17AM +0100, Ryan Roberts wrote:
> On 30/04/2024 18:57, Catalin Marinas wrote:
> > On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
> >> __split_huge_pmd_locked() can be called for a present THP, devmap or
> >> (non-present) migration entry. It calls pmdp_invalidate()
> >> unconditionally on the pmdp and only determines if it is present or not
> >> based on the returned old pmd.
> >>
> >> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> >> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> >> pmd_present() calls to return true - even for a swap pmd. Therefore any
> >> lockless pgtable walker could see the migration entry pmd in this state
> >> and start interpreting the fields (e.g. pmd_pfn()) as if it were
> >> present, leading to BadThings (TM). GUP-fast appears to be one such
> >> lockless pgtable walker.
> >>
> >> [...]
> >
> > Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
> > removed the debug/test code, please send it as a separate patch for
> > 6.10.
>
> Thanks Catalin! I'm guessing this will turn up in today's linux-next, so if I
> send the tests today and Andrew puts them straight in mm-unstable (which will
> go to linux-next) there is no risk that the tests are there without the fix? Or
> do I need to hold off until the fix is in v6.9-rc7?
It looks like we don't push for-next/fixes to linux-next, it's
short-lived usually, it ends up upstream quickly. I can send the pull
request later today, should turn up in mainline by tomorrow. You can add
a note to your patch for Andrew that it will fail on arm64 until the fix
ends up upstream. It's a matter of a couple of days anyway.
--
Catalin
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-05-01 10:04 ` Catalin Marinas
@ 2024-05-01 10:13 ` Ryan Roberts
0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 10:13 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, linux-arm-kernel, linux-mm,
linux-kernel, stable
On 01/05/2024 11:04, Catalin Marinas wrote:
> On Wed, May 01, 2024 at 09:05:17AM +0100, Ryan Roberts wrote:
>> On 30/04/2024 18:57, Catalin Marinas wrote:
>>> On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>> (non-present) migration entry. It calls pmdp_invalidate()
>>>> unconditionally on the pmdp and only determines if it is present or not
>>>> based on the returned old pmd.
>>>>
>>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>>> lockless pgtable walker.
>>>>
>>>> [...]
>>>
>>> Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
>>> removed the debug/test code, please send it as a separate patch for
>>> 6.10.
>>
>> Thanks Catalin! I'm guessing this will turn up in today's linux-next, so if I
>> send the tests today and Andrew puts them straight in mm-unstable (which will
>> go to linux-next) there is no risk that the tests are there without the fix? Or
>> do I need to hold off until the fix is in v6.9-rc7?
>
> It looks like we don't push for-next/fixes to linux-next, it's
> short-lived usually, it ends up upstream quickly. I can send the pull
> request later today, should turn up in mainline by tomorrow. You can add
> a note to your patch for Andrew that it will fail on arm64 until the fix
> ends up upstream. It's a matter of a couple of days anyway.
OK, I just didn't want to send it only for our CI to start failing. I'll do as
you suggest.
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 17:57 ` Catalin Marinas
2024-05-01 8:05 ` Ryan Roberts
@ 2024-05-02 18:00 ` Catalin Marinas
1 sibling, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2024-05-02 18:00 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton,
Zi Yan, Aneesh Kumar K.V, Ryan Roberts
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On Tue, Apr 30, 2024 at 06:57:52PM +0100, Catalin Marinas wrote:
> On Tue, 30 Apr 2024 14:31:38 +0100, Ryan Roberts wrote:
> > __split_huge_pmd_locked() can be called for a present THP, devmap or
> > (non-present) migration entry. It calls pmdp_invalidate()
> > unconditionally on the pmdp and only determines if it is present or not
> > based on the returned old pmd.
> >
> > But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> > unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> > pmd_present() calls to return true - even for a swap pmd. Therefore any
> > lockless pgtable walker could see the migration entry pmd in this state
> > and start interpreting the fields (e.g. pmd_pfn()) as if it were
> > present, leading to BadThings (TM). GUP-fast appears to be one such
> > lockless pgtable walker.
> >
> > [...]
>
> Applied to arm64 (for-next/fixes), thanks! It should land in 6.9-rc7. I
> removed the debug/test code, please send it as a separate patch for
> 6.10.
>
> [1/1] arm64/mm: pmd_mkinvalid() must handle swap pmds
> https://git.kernel.org/arm64/c/e783331c7720
Since Andrew merged the generic mm fix, I dropped this patch from the
arm64 for-next/fixes branch.
--
Catalin
* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-04-30 13:31 [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Ryan Roberts
` (2 preceding siblings ...)
2024-04-30 17:57 ` Catalin Marinas
@ 2024-05-01 11:35 ` Ryan Roberts
2024-05-01 11:38 ` Ryan Roberts
3 siblings, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 11:35 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
Andrew Morton, Zi Yan, Aneesh Kumar K.V
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
Zi Yan, I'm hoping you might have some input on the below...
On 30/04/2024 14:31, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
>
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpreting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
>
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.
OK, the plot thickens: the tests I wrote to check that pmd_mkinvalid() is safe for swap entries fail on x86_64. See below...
> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
>
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
>
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>
> Hi all,
>
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
>
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is the right one.
>
> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
>
>
> arch/arm64/include/asm/pgtable.h | 12 +++++--
> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
> 2 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..7d580271a46d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>
> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
> {
> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> + /*
> + * If not valid then either we are already present-invalid or we are
> + * not-present (i.e. none or swap entry). We must not convert
> + * not-present to present-invalid. Unbelievably, the core-mm may call
> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
> + */
> + if (pmd_valid(pmd)) {
> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> + }
>
> return pmd;
> }
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 65c19025da3d..7e9c387d06b0 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
> #endif /* CONFIG_HUGETLB_PAGE */
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
> +{
Printing various values at different locations in this function for debug:
> + unsigned long max_swap_offset;
> + swp_entry_t swp_set, swp_clear, swp_convert;
> + pmd_t pmd_set, pmd_clear;
> +
> + /*
> + * See generic_max_swapfile_size(): probe the maximum offset, then
> + * create a swap entry with all possible bits set and a swap entry with
> + * all bits clear.
> + */
> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
> + swp_clear = swp_entry(0, 0);
> +
> + /* Convert to pmd. */
> + pmd_set = swp_entry_to_pmd(swp_set);
> + pmd_clear = swp_entry_to_pmd(swp_clear);
[ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
> +
> + /*
> + * Sanity check that the pmds are not-present, not-huge and swap entry
> + * is recoverable without corruption.
> + */
> + WARN_ON(pmd_present(pmd_set));
> + WARN_ON(pmd_trans_huge(pmd_set));
> + swp_convert = pmd_to_swp_entry(pmd_set);
> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
> + WARN_ON(pmd_present(pmd_clear));
> + WARN_ON(pmd_trans_huge(pmd_clear));
> + swp_convert = pmd_to_swp_entry(pmd_clear);
> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
> +
> + /* Now invalidate the pmd. */
> + pmd_set = pmd_mkinvalid(pmd_set);
> + pmd_clear = pmd_mkinvalid(pmd_clear);
[ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
> +
> + /*
> + * Since it's a swap pmd, invalidation should effectively be a noop and
> + * the checks we already did should give the same answer. Check the
> + * invalidation didn't corrupt any fields.
> + */
> + WARN_ON(pmd_present(pmd_set));
> + WARN_ON(pmd_trans_huge(pmd_set));
> + swp_convert = pmd_to_swp_entry(pmd_set);
[ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
> + WARN_ON(pmd_present(pmd_clear));
> + WARN_ON(pmd_trans_huge(pmd_clear));
> + swp_convert = pmd_to_swp_entry(pmd_clear);
[ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
This line fails on x86_64.
The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
> +}
> +#else
> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
> +
> static void __init pmd_thp_tests(struct pgtable_debug_args *args)
> {
> pmd_t pmd;
> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
> +
> + swp_pmd_mkinvalid_tests(args);
> }
>
> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> --
> 2.25.1
>
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-05-01 11:35 ` Ryan Roberts
@ 2024-05-01 11:38 ` Ryan Roberts
2024-05-01 12:07 ` Zi Yan
0 siblings, 1 reply; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 11:38 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
Andrew Morton, Zi Yan, Aneesh Kumar K.V, David Hildenbrand
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
Pulling in David, who may be able to advise...
On 01/05/2024 12:35, Ryan Roberts wrote:
> Zi Yan, I'm hoping you might have some input on the below...
>
>
> On 30/04/2024 14:31, Ryan Roberts wrote:
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd.
>>
>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>> present, leading to BadThings (TM). GUP-fast appears to be one such
>> lockless pgtable walker.
>>
>> While the obvious fix is for core-mm to avoid such calls for non-present
>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>> this case either), all other arches that implement pmd_mkinvalid() do it
>> in such a way that it is robust to being called with a non-present pmd.
>
> OK, the plot thickens; the tests I wrote to check that pmd_mkinvalid() is safe for swap entries fail on x86_64. See below...
>
>> So it is simpler and safer to make arm64 robust too. This approach means
>> we can even add tests to debug_vm_pgtable.c to validate the required
>> behaviour.
>>
>> This is a theoretical bug found during code review. I don't have any
>> test case to trigger it in practice.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Hi all,
>>
>> v1 of this fix [1] took the approach of fixing core-mm to never call
>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>> suffers this problem; all other arches are robust. So his suggestion was to
>> instead make arm64 robust in the same way and add tests to validate it. Despite
>> my stated reservations in the context of the v1 discussion, having thought on it
>> for a bit, I now agree with Zi Yan. Hence this post.
>>
>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>> remove it from there and have this go in through the arm64 tree? Assuming there
>> is agreement that this approach is the right one.
>>
>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>
>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>
>> Thanks,
>> Ryan
>>
>>
>> arch/arm64/include/asm/pgtable.h | 12 +++++--
>> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
>> 2 files changed, 71 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..7d580271a46d 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>
>> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>> {
>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> + /*
>> + * If not valid then either we are already present-invalid or we are
>> + * not-present (i.e. none or swap entry). We must not convert
>> + * not-present to present-invalid. Unbelievably, the core-mm may call
>> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>> + */
>> + if (pmd_valid(pmd)) {
>> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>> + }
>>
>> return pmd;
>> }
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 65c19025da3d..7e9c387d06b0 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>> #endif /* CONFIG_HUGETLB_PAGE */
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>> +{
>
> Printing various values at different locations in this function for debug:
>
>> + unsigned long max_swap_offset;
>> + swp_entry_t swp_set, swp_clear, swp_convert;
>> + pmd_t pmd_set, pmd_clear;
>> +
>> + /*
>> + * See generic_max_swapfile_size(): probe the maximum offset, then
>> + * create a swap entry with all possible bits set and a swap entry with
>> + * all bits clear.
>> + */
>> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>> + swp_clear = swp_entry(0, 0);
>> +
>> + /* Convert to pmd. */
>> + pmd_set = swp_entry_to_pmd(swp_set);
>> + pmd_clear = swp_entry_to_pmd(swp_clear);
>
> [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>
>> +
>> + /*
>> + * Sanity check that the pmds are not-present, not-huge and swap entry
>> + * is recoverable without corruption.
>> + */
>> + WARN_ON(pmd_present(pmd_set));
>> + WARN_ON(pmd_trans_huge(pmd_set));
>> + swp_convert = pmd_to_swp_entry(pmd_set);
>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>> + WARN_ON(pmd_present(pmd_clear));
>> + WARN_ON(pmd_trans_huge(pmd_clear));
>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>> +
>> + /* Now invalidate the pmd. */
>> + pmd_set = pmd_mkinvalid(pmd_set);
>> + pmd_clear = pmd_mkinvalid(pmd_clear);
>
> [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>
>> +
>> + /*
>> + * Since it's a swap pmd, invalidation should effectively be a no-op and
>> + * the checks we already did should give the same answer. Check the
>> + * invalidation didn't corrupt any fields.
>> + */
>> + WARN_ON(pmd_present(pmd_set));
>> + WARN_ON(pmd_trans_huge(pmd_set));
>> + swp_convert = pmd_to_swp_entry(pmd_set);
>
> [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>
>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>> + WARN_ON(pmd_present(pmd_clear));
>> + WARN_ON(pmd_trans_huge(pmd_clear));
>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>
> [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>
>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>
> This line fails on x86_64.
>
> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>
> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>
> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
>
>> +}
>> +#else
>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>> +
>> static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>> {
>> pmd_t pmd;
>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>> +
>> + swp_pmd_mkinvalid_tests(args);
>> }
>>
>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> --
>> 2.25.1
>>
>
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-05-01 11:38 ` Ryan Roberts
@ 2024-05-01 12:07 ` Zi Yan
2024-05-01 12:58 ` Ryan Roberts
0 siblings, 1 reply; 15+ messages in thread
From: Zi Yan @ 2024-05-01 12:07 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
Andrew Morton, Aneesh Kumar K.V, David Hildenbrand,
linux-arm-kernel, linux-mm, linux-kernel, stable
[-- Attachment #1: Type: text/plain, Size: 8872 bytes --]
On 1 May 2024, at 7:38, Ryan Roberts wrote:
> Pulling in David, who may be able to advise...
>
>
> On 01/05/2024 12:35, Ryan Roberts wrote:
>> Zi Yan, I'm hoping you might have some input on the below...
>>
>>
>> On 30/04/2024 14:31, Ryan Roberts wrote:
>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>> (non-present) migration entry. It calls pmdp_invalidate()
>>> unconditionally on the pmdp and only determines if it is present or not
>>> based on the returned old pmd.
>>>
>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>> lockless pgtable walker could see the migration entry pmd in this state
>>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>> lockless pgtable walker.
>>>
>>> While the obvious fix is for core-mm to avoid such calls for non-present
>>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>>> this case either), all other arches that implement pmd_mkinvalid() do it
>>> in such a way that it is robust to being called with a non-present pmd.
>>
>> OK, the plot thickens; the tests I wrote to check that pmd_mkinvalid() is safe for swap entries fail on x86_64. See below...
>>
>>> So it is simpler and safer to make arm64 robust too. This approach means
>>> we can even add tests to debug_vm_pgtable.c to validate the required
>>> behaviour.
>>>
>>> This is a theoretical bug found during code review. I don't have any
>>> test case to trigger it in practice.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>
>>> Hi all,
>>>
>>> v1 of this fix [1] took the approach of fixing core-mm to never call
>>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>>> suffers this problem; all other arches are robust. So his suggestion was to
>>> instead make arm64 robust in the same way and add tests to validate it. Despite
>>> my stated reservations in the context of the v1 discussion, having thought on it
>>> for a bit, I now agree with Zi Yan. Hence this post.
>>>
>>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>>> remove it from there and have this go in through the arm64 tree? Assuming there
>>> is agreement that this approach is the right one.
>>>
>>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>> arch/arm64/include/asm/pgtable.h | 12 +++++--
>>> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
>>> 2 files changed, 71 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index afdd56d26ad7..7d580271a46d 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>
>>> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>> {
>>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>> + /*
>>> + * If not valid then either we are already present-invalid or we are
>>> + * not-present (i.e. none or swap entry). We must not convert
>>> + * not-present to present-invalid. Unbelievably, the core-mm may call
>>> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>>> + */
>>> + if (pmd_valid(pmd)) {
>>> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>> + }
>>>
>>> return pmd;
>>> }
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 65c19025da3d..7e9c387d06b0 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>> #endif /* CONFIG_HUGETLB_PAGE */
>>>
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>>> +{
>>
>> Printing various values at different locations in this function for debug:
>>
>>> + unsigned long max_swap_offset;
>>> + swp_entry_t swp_set, swp_clear, swp_convert;
>>> + pmd_t pmd_set, pmd_clear;
>>> +
>>> + /*
>>> + * See generic_max_swapfile_size(): probe the maximum offset, then
>>> + * create a swap entry with all possible bits set and a swap entry with
>>> + * all bits clear.
>>> + */
>>> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>>> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>>> + swp_clear = swp_entry(0, 0);
>>> +
>>> + /* Convert to pmd. */
>>> + pmd_set = swp_entry_to_pmd(swp_set);
>>> + pmd_clear = swp_entry_to_pmd(swp_clear);
>>
>> [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>>
>>> +
>>> + /*
>>> + * Sanity check that the pmds are not-present, not-huge and swap entry
>>> + * is recoverable without corruption.
>>> + */
>>> + WARN_ON(pmd_present(pmd_set));
>>> + WARN_ON(pmd_trans_huge(pmd_set));
>>> + swp_convert = pmd_to_swp_entry(pmd_set);
>>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>> + WARN_ON(pmd_present(pmd_clear));
>>> + WARN_ON(pmd_trans_huge(pmd_clear));
>>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>> +
>>> + /* Now invalidate the pmd. */
>>> + pmd_set = pmd_mkinvalid(pmd_set);
>>> + pmd_clear = pmd_mkinvalid(pmd_clear);
>>
>> [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>>
>>> +
>>> + /*
>>> + * Since it's a swap pmd, invalidation should effectively be a no-op and
>>> + * the checks we already did should give the same answer. Check the
>>> + * invalidation didn't corrupt any fields.
>>> + */
>>> + WARN_ON(pmd_present(pmd_set));
>>> + WARN_ON(pmd_trans_huge(pmd_set));
>>> + swp_convert = pmd_to_swp_entry(pmd_set);
>>
>> [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>>
>>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>> + WARN_ON(pmd_present(pmd_clear));
>>> + WARN_ON(pmd_trans_huge(pmd_clear));
>>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>>
>> [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>>
>>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>
>> This line fails on x86_64.
>>
>> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>>
>> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>>
>> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix
is better. I will dig into the x86 code more to figure out what goes wrong.
Last time, I only checked PAGE_* bits in these pmd|pte_* operations.
Sorry for the misinformation.
>>
>>> +}
>>> +#else
>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>> +
>>> static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>> {
>>> pmd_t pmd;
>>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>>> +
>>> + swp_pmd_mkinvalid_tests(args);
>>> }
>>>
>>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>> --
>>> 2.25.1
>>>
>>
--
Best Regards,
Yan, Zi
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
2024-05-01 12:07 ` Zi Yan
@ 2024-05-01 12:58 ` Ryan Roberts
0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-05-01 12:58 UTC (permalink / raw)
To: Zi Yan
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
Andrew Morton, Aneesh Kumar K.V, David Hildenbrand,
linux-arm-kernel, linux-mm, linux-kernel, stable
On 01/05/2024 13:07, Zi Yan wrote:
> On 1 May 2024, at 7:38, Ryan Roberts wrote:
>
>> Pulling in David, who may be able to advise...
>>
>>
>> On 01/05/2024 12:35, Ryan Roberts wrote:
>>> Zi Yan, I'm hoping you might have some input on the below...
>>>
>>>
>>> On 30/04/2024 14:31, Ryan Roberts wrote:
>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>> (non-present) migration entry. It calls pmdp_invalidate()
>>>> unconditionally on the pmdp and only determines if it is present or not
>>>> based on the returned old pmd.
>>>>
>>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>>> lockless pgtable walker.
>>>>
>>>> While the obvious fix is for core-mm to avoid such calls for non-present
>>>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>>>> this case either), all other arches that implement pmd_mkinvalid() do it
>>>> in such a way that it is robust to being called with a non-present pmd.
>>>
>>> OK, the plot thickens; the tests I wrote to check that pmd_mkinvalid() is safe for swap entries fail on x86_64. See below...
>>>
>>>> So it is simpler and safer to make arm64 robust too. This approach means
>>>> we can even add tests to debug_vm_pgtable.c to validate the required
>>>> behaviour.
>>>>
>>>> This is a theoretical bug found during code review. I don't have any
>>>> test case to trigger it in practice.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>
>>>> Hi all,
>>>>
>>>> v1 of this fix [1] took the approach of fixing core-mm to never call
>>>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>>>> suffers this problem; all other arches are robust. So his suggestion was to
>>>> instead make arm64 robust in the same way and add tests to validate it. Despite
>>>> my stated reservations in the context of the v1 discussion, having thought on it
>>>> for a bit, I now agree with Zi Yan. Hence this post.
>>>>
>>>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>>>> remove it from there and have this go in through the arm64 tree? Assuming there
>>>> is agreement that this approach is the right one.
>>>>
>>>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>> arch/arm64/include/asm/pgtable.h | 12 +++++--
>>>> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
>>>> 2 files changed, 71 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index afdd56d26ad7..7d580271a46d 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>>
>>>> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>>> {
>>>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> + /*
>>>> + * If not valid then either we are already present-invalid or we are
>>>> + * not-present (i.e. none or swap entry). We must not convert
>>>> + * not-present to present-invalid. Unbelievably, the core-mm may call
>>>> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>>>> + */
>>>> + if (pmd_valid(pmd)) {
>>>> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> + }
>>>>
>>>> return pmd;
>>>> }
>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>> index 65c19025da3d..7e9c387d06b0 100644
>>>> --- a/mm/debug_vm_pgtable.c
>>>> +++ b/mm/debug_vm_pgtable.c
>>>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>>> #endif /* CONFIG_HUGETLB_PAGE */
>>>>
>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>>>> +{
>>>
>>> Printing various values at different locations in this function for debug:
>>>
>>>> + unsigned long max_swap_offset;
>>>> + swp_entry_t swp_set, swp_clear, swp_convert;
>>>> + pmd_t pmd_set, pmd_clear;
>>>> +
>>>> + /*
>>>> + * See generic_max_swapfile_size(): probe the maximum offset, then
>>>> + * create a swap entry with all possible bits set and a swap entry with
>>>> + * all bits clear.
>>>> + */
>>>> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>>>> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>>>> + swp_clear = swp_entry(0, 0);
>>>> +
>>>> + /* Convert to pmd. */
>>>> + pmd_set = swp_entry_to_pmd(swp_set);
>>>> + pmd_clear = swp_entry_to_pmd(swp_clear);
>>>
>>> [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>>>
>>>> +
>>>> + /*
>>>> + * Sanity check that the pmds are not-present, not-huge and swap entry
>>>> + * is recoverable without corruption.
>>>> + */
>>>> + WARN_ON(pmd_present(pmd_set));
>>>> + WARN_ON(pmd_trans_huge(pmd_set));
>>>> + swp_convert = pmd_to_swp_entry(pmd_set);
>>>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> + WARN_ON(pmd_present(pmd_clear));
>>>> + WARN_ON(pmd_trans_huge(pmd_clear));
>>>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>>>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>> +
>>>> + /* Now invalidate the pmd. */
>>>> + pmd_set = pmd_mkinvalid(pmd_set);
>>>> + pmd_clear = pmd_mkinvalid(pmd_clear);
>>>
>>> [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>>>
>>>> +
>>>> + /*
>>>> + * Since it's a swap pmd, invalidation should effectively be a no-op and
>>>> + * the checks we already did should give the same answer. Check the
>>>> + * invalidation didn't corrupt any fields.
>>>> + */
>>>> + WARN_ON(pmd_present(pmd_set));
>>>> + WARN_ON(pmd_trans_huge(pmd_set));
>>>> + swp_convert = pmd_to_swp_entry(pmd_set);
>>>
>>> [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>>>
>>>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> + WARN_ON(pmd_present(pmd_clear));
>>>> + WARN_ON(pmd_trans_huge(pmd_clear));
>>>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>>>
>>> [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>>>
>>>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>
>>> This line fails on x86_64.
>>>
>>> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>>>
>>> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>>>
>>> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
>
> If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix
> is better. I will dig into the x86 code more to figure out what goes wrong.
> Last time, I only checked PAGE_* bits in these pmd|pte_* operations.
> Sorry for the misinformation.
No worries, I'll do the amends we originally agreed for the original fix and resend.
>
>>>
>>>> +}
>>>> +#else
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>>>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>> +
>>>> static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>> {
>>>> pmd_t pmd;
>>>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>>>> +
>>>> + swp_pmd_mkinvalid_tests(args);
>>>> }
>>>>
>>>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> --
>>>> 2.25.1
>>>>
>>>
>
>
> --
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 15+ messages in thread