* [PATCH fixes] riscv: mm: Do not call pmd dtor on vmemmap page table teardown
@ 2024-11-20 13:12 Björn Töpel
2024-11-28 12:38 ` Alexandre Ghiti
2024-12-11 22:33 ` patchwork-bot+linux-riscv
0 siblings, 2 replies; 3+ messages in thread
From: Björn Töpel @ 2024-11-20 13:12 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, linux-kernel, linux-mm,
virtualization
From: Björn Töpel <bjorn@rivosinc.com>
The vmemmap page tables, which are used for RV64 with
SPARSEMEM_VMEMMAP, are populated using pmd (page middle directory)
hugetables. However, the pmd allocation does not use the generic
mechanism used by the VMA code (e.g. pmd_alloc()), nor the RISC-V
specific create_pgd_mapping()/alloc_pmd_late(). Instead, the vmemmap
page table code allocates a page and calls vmemmap_set_pmd(). As a
result, the pmd ctor is *not* called, nor would it make sense to do so.
Now, when tearing down a vmemmap page table pmd, the cleanup code
unconditionally, and incorrectly, calls the pmd dtor, which results in
a crash (best case).
This issue was found when running the HMM selftests:
| tools/testing/selftests/mm# ./test_hmm.sh smoke
| ... # when unloading the test_hmm.ko module
| page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10915b
| flags: 0x1000000000000000(node=0|zone=1)
| raw: 1000000000000000 0000000000000000 dead000000000122 0000000000000000
| raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
| page dumped because: VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte)
| ------------[ cut here ]------------
| kernel BUG at include/linux/mm.h:3080!
| Kernel BUG [#1]
| Modules linked in: test_hmm(-) sch_fq_codel fuse drm drm_panel_orientation_quirks backlight dm_mod
| CPU: 1 UID: 0 PID: 514 Comm: modprobe Tainted: G W 6.12.0-00982-gf2a4f1682d07 #2
| Tainted: [W]=WARN
| Hardware name: riscv-virtio qemu/qemu, BIOS 2024.10 10/01/2024
| epc : remove_pgd_mapping+0xbec/0x1070
| ra : remove_pgd_mapping+0xbec/0x1070
| epc : ffffffff80010a68 ra : ffffffff80010a68 sp : ff20000000a73940
| gp : ffffffff827b2d88 tp : ff6000008785da40 t0 : ffffffff80fbce04
| t1 : 0720072007200720 t2 : 706d756420656761 s0 : ff20000000a73a50
| s1 : ff6000008915cff8 a0 : 0000000000000039 a1 : 0000000000000008
| a2 : ff600003fff0de20 a3 : 0000000000000000 a4 : 0000000000000000
| a5 : 0000000000000000 a6 : c0000000ffffefff a7 : ffffffff824469b8
| s2 : ff1c0000022456c0 s3 : ff1ffffffdbfffff s4 : ff6000008915c000
| s5 : ff6000008915c000 s6 : ff6000008915c000 s7 : ff1ffffffdc00000
| s8 : 0000000000000001 s9 : ff1ffffffdc00000 s10: ffffffff819a31f0
| s11: ffffffffffffffff t3 : ffffffff8000c950 t4 : ff60000080244f00
| t5 : ff60000080244000 t6 : ff20000000a73708
| status: 0000000200000120 badaddr: ffffffff80010a68 cause: 0000000000000003
| [<ffffffff80010a68>] remove_pgd_mapping+0xbec/0x1070
| [<ffffffff80fd238e>] vmemmap_free+0x14/0x1e
| [<ffffffff8032e698>] section_deactivate+0x220/0x452
| [<ffffffff8032ef7e>] sparse_remove_section+0x4a/0x58
| [<ffffffff802f8700>] __remove_pages+0x7e/0xba
| [<ffffffff803760d8>] memunmap_pages+0x2bc/0x3fe
| [<ffffffff02a3ca28>] dmirror_device_remove_chunks+0x2ea/0x518 [test_hmm]
| [<ffffffff02a3e026>] hmm_dmirror_exit+0x3e/0x1018 [test_hmm]
| [<ffffffff80102c14>] __riscv_sys_delete_module+0x15a/0x2a6
| [<ffffffff80fd020c>] do_trap_ecall_u+0x1f2/0x266
| [<ffffffff80fde0a2>] _new_vmalloc_restore_context_a0+0xc6/0xd2
| Code: bf51 7597 0184 8593 76a5 854a 4097 0029 80e7 2c00 (9002) 7597
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Fatal exception in interrupt
Add a check to avoid calling the pmd dtor if the calling context is
vmemmap_free().
Fixes: c75a74f4ba19 ("riscv: mm: Add memory hotplugging support")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/mm/init.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 0e8c20adcd98..fc53ce748c80 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1566,7 +1566,7 @@ static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
 	pmd_clear(pmd);
 }

-static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud, bool is_vmemmap)
 {
 	struct page *page = pud_page(*pud);
 	struct ptdesc *ptdesc = page_ptdesc(page);
@@ -1579,7 +1579,8 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 			return;
 	}

-	pagetable_pmd_dtor(ptdesc);
+	if (!is_vmemmap)
+		pagetable_pmd_dtor(ptdesc);
 	if (PageReserved(page))
 		free_reserved_page(page);
 	else
@@ -1703,7 +1704,7 @@ static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, un
 		remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);

 		if (pgtable_l4_enabled)
-			free_pmd_table(pmd_base, pudp);
+			free_pmd_table(pmd_base, pudp, is_vmemmap);
 	}
 }
base-commit: 57f7c7dc78cd09622b12920d92b40c1ce11b234e
--
2.45.2
* Re: [PATCH fixes] riscv: mm: Do not call pmd dtor on vmemmap page table teardown
2024-11-20 13:12 [PATCH fixes] riscv: mm: Do not call pmd dtor on vmemmap page table teardown Björn Töpel
@ 2024-11-28 12:38 ` Alexandre Ghiti
2024-12-11 22:33 ` patchwork-bot+linux-riscv
1 sibling, 0 replies; 3+ messages in thread
From: Alexandre Ghiti @ 2024-11-28 12:38 UTC (permalink / raw)
To: Björn Töpel, Alexandre Ghiti, Albert Ou,
David Hildenbrand, Palmer Dabbelt, Paul Walmsley, linux-riscv,
Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, linux-kernel, linux-mm,
virtualization
Hi Björn,
On 20/11/2024 14:12, Björn Töpel wrote:
> [...]
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* Re: [PATCH fixes] riscv: mm: Do not call pmd dtor on vmemmap page table teardown
2024-11-20 13:12 [PATCH fixes] riscv: mm: Do not call pmd dtor on vmemmap page table teardown Björn Töpel
2024-11-28 12:38 ` Alexandre Ghiti
@ 2024-12-11 22:33 ` patchwork-bot+linux-riscv
1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-12-11 22:33 UTC (permalink / raw)
To: Björn Töpel <bjorn@kernel.org>
Cc: linux-riscv, alexghiti, aou, david, palmer, paul.walmsley,
osalvador, bjorn, abrestic, linux-kernel, linux-mm,
virtualization
Hello:
This patch was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <palmer@rivosinc.com>:
On Wed, 20 Nov 2024 14:12:02 +0100 you wrote:
> From: Björn Töpel <bjorn@rivosinc.com>
>
> The vmemmap page tables, which are used for RV64 with
> SPARSEMEM_VMEMMAP, are populated using pmd (page middle directory)
> hugetables. However, the pmd allocation does not use the generic
> mechanism used by the VMA code (e.g. pmd_alloc()), nor the RISC-V
> specific create_pgd_mapping()/alloc_pmd_late(). Instead, the vmemmap
> page table code allocates a page and calls vmemmap_set_pmd(). As a
> result, the pmd ctor is *not* called, nor would it make sense to do so.
>
> [...]
Here is the summary with links:
- [fixes] riscv: mm: Do not call pmd dtor on vmemmap page table teardown
https://git.kernel.org/riscv/c/21f1b85c8912
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html