* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
[not found] ` <ef144edd-facb-b49e-4482-74f4b42b1ad1@online.de>
@ 2022-08-18 17:14 ` Max Schulze
2022-08-24 15:30 ` Will Deacon
0 siblings, 1 reply; 3+ messages in thread
From: Max Schulze @ 2022-08-18 17:14 UTC (permalink / raw)
To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm
Hello,
> On 15.08.22 16:22, Will Deacon wrote:
>>
>>>
[...]
>>>
>>>
>>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [20:48:46] BUG: Bad page map in process projecta pte:1110111111111111 pmd:800000001c40003
>>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>
>> I hate to say it, but this all looks like memory corruption hitting the
>> page table and possibly the 'struct page' array to me :/
>
> Perhaps a note on the occcurence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? Because when not, I think it would be highly unlikely to be the hardware.
>
> It is not only my program that has the problem, I have seen
>
> [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3 pte:262d2626292a2627 pmd:800000001c01003
>
> and
> [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1 pte:a098a09aa29ea8a4 pmd:800000001c01003
> [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>
>
[..]
I am able to reproduce this on 6.0.0-rc1 .
It looks like vm_normal_page does not recognize the page as being "normal" (?).
(mm/memory.c)
> if (likely(!pte_special(pte)))
> goto check_pfn;
> if (vma->vm_ops && vma->vm_ops->find_special_page)
> return vma->vm_ops->find_special_page(vma, addr);
> if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> return NULL;
> if (is_zero_pfn(pfn))
> return NULL;
> if (pte_devmap(pte))
>[...]
> return NULL;
>
> print_bad_pte(vma, addr, pte, NULL);
What would be helpful to do next? Is the KASAN warning a consequent error or the cause?
[ 18:42:59]
[ 18:44:17] BUG: Bad page map in process projecta pte:212725231f242323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1000000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1000
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184)
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191)
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 18:44:17] dump_stack (lib/dump_stack.c:114)
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12))
[ 18:44:17] vm_normal_page (mm/memory.c:638)
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] Disabling lock debugging due to kernel taint
[ 18:44:17] BUG: Bad page map in process projecta pte:2626262023222323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1001000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1001
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G B C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184)
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191)
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 18:44:17] dump_stack (lib/dump_stack.c:114)
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12))
[ 18:44:17] vm_normal_page (mm/memory.c:638)
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] ==================================================================
[ 18:44:17] BUG: KASAN: wild-memory-access in __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] Read of size 8 at addr 00000096808c8880 by task projecta/1135
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G B C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184)
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191)
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 18:44:17] print_report (mm/kasan/report.c:438)
[ 18:44:17] kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:497)
[ 18:44:17] __asan_load8 (mm/kasan/generic.c:256)
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] ==================================================================
[ 18:44:17] Unable to handle kernel paging request at virtual address 00000096808c8880
[ 18:44:17] Mem abort info:
[ 18:44:17] ESR = 0x0000000096000004
[ 18:44:17] EC = 0x25: DABT (current EL), IL = 32 bits
[ 18:44:17] SET = 0, FnV = 0
[ 18:44:17] EA = 0, S1PTW = 0
[ 18:44:17] FSC = 0x04: level 0 translation fault
[ 18:44:17] Data abort info:
[ 18:44:17] ISV = 0, ISS = 0x00000004
[ 18:44:17] CM = 0, WnR = 0
[ 18:44:17] [00000096808c8880] address between user and kernel address ranges
[ 18:44:17] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 18:44:17] Modules linked in: rtc_pcf85063 regmap_i2c ov9281 rfkill bcm2835_unicam v4l2_dv_timings v4l2_fwnode v3d bcm2835_v4l2(C) v4l2_async bcm2835_codec(C) bcm2835_isp(C) videobuf2_vmalloc rpivid_hevc(C) v4l2_mem2mem drm_shmem_helper bcm2835_mmal_vchiq(C) gpu_sched videobuf2_dma_contig videobuf2_memops i2c_mux_pinctrl videobuf2_v4l2 videobuf2_common raspberrypi_hwmon i2c_mux videodev i2c_brcmstb i2c_bcm2835 vc_sm_cma(C) mc uio_pdrv_genirq nvmem_rmem uio drm fuse drm_panel_orientation_quirks backlight ipv6
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G B C 6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 18:44:17] pc : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] lr : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] sp : ffffffc00d067630
[ 18:44:17] x29: ffffffc00d067630 x28: 0400000000000001 x27: 2626262023222323
[ 18:44:17] x26: 0000007fa1001000 x25: fffffffe010f2ce8 x24: 0000000000000000
[ 18:44:17] x23: fffffffe00000000 x22: 00000096808c8880 x21: 1ffffff801a0cece
[ 18:44:17] x20: 0000000000000000 x19: 00000098808c8880 x18: 0000000000000000
[ 18:44:17] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 18:44:17] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cd81d
[ 18:44:17] x11: 1ffffff8014cd81c x10: ffffffb8014cd81c x9 : dfffffc000000000
[ 18:44:17] x8 : ffffffc00a66c0e7 x7 : 00000047feb327e4 x6 : 0000000000000001
[ 18:44:17] x5 : ffffffc00a66c0e0 x4 : ffffffb8014cd81d x3 : ffffffc0080b68e4
[ 18:44:17] x2 : 0000000000000000 x1 : ffffff804f3e0040 x0 : 0000000000000001
[ 18:44:17] Call trace:
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62)
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330)
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524)
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256)
[ 18:44:17] kernel_clone (kernel/fork.c:2673)
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808)
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775)
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52)
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150)
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207)
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625)
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643)
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581)
[ 18:44:17] Code: d37ae673 8b170276 aa1603e0 940f8ac1 (f8776a60)
All code
========
0: d37ae673 lsl x19, x19, #6
4: 8b170276 add x22, x19, x23
8: aa1603e0 mov x0, x22
c: 940f8ac1 bl 0x3e2b10
10:* f8776a60 ldr x0, [x19, x23] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: f8776a60 ldr x0, [x19, x23]
[ 18:44:17] ---[ end trace 0000000000000000 ]---
[ 18:44:17] note: projecta[1135] exited with preempt_count 2
Thanks,
Max
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
2022-08-18 17:14 ` BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall Max Schulze
@ 2022-08-24 15:30 ` Will Deacon
2022-08-26 8:39 ` Aw: " max.schulze
0 siblings, 1 reply; 3+ messages in thread
From: Will Deacon @ 2022-08-24 15:30 UTC (permalink / raw)
To: Max Schulze; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm
On Thu, Aug 18, 2022 at 07:14:12PM +0200, Max Schulze wrote:
> > On 15.08.22 16:22, Will Deacon wrote:
> >>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>> [20:48:46] BUG: Bad page map in process projecta pte:1110111111111111 pmd:800000001c40003
> >>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
> >>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>
> >
> >> I hate to say it, but this all looks like memory corruption hitting the
> >> page table and possibly the 'struct page' array to me :/
> >
> > Perhaps a note on the occcurence: across devices, the "bad page map"
> > differs at pte, but somehow is mostly consistent at pmd:800000001c40003
> > (though I have seen 800000001c02003 and 800000001c40003). Is this some
> > "magic value"? Because when not, I think it would be highly unlikely to
> > be the hardware.
> >
> > It is not only my program that has the problem, I have seen
> >
> > [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3 pte:262d2626292a2627 pmd:800000001c01003
> >
> > and
> > [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1 pte:a098a09aa29ea8a4 pmd:800000001c01003
> > [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> > [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> >
> >
> [..]
>
> I am able to reproduce this on 6.0.0-rc1 .
> It looks like vm_normal_page does not recognize the page as being "normal" (?).
> (mm/memory.c)
I think the issue is much more fundamental than that; you appear to have
page-table corruption (for example, "pte:262d2626292a2627" and
"pte:1110111111111111" are definitely corrupted) and so anything dealing
with 'struct page' derived from the physical address in the pte is going to
go wonky.
From the logs here, the pmds look ok but these are the pte values I spotted:
0x1110111111111111
0x262d2626292a2627
0xa098a09aa29ea8a4
0x212725231f242323
0x2626262023222323
which don't seem to correspond to any sort of poison, but are possibly
artifacts of repeated patterns with random bits cleared?
Will
^ permalink raw reply [flat|nested] 3+ messages in thread
* Aw: Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
2022-08-24 15:30 ` Will Deacon
@ 2022-08-26 8:39 ` max.schulze
0 siblings, 0 replies; 3+ messages in thread
From: max.schulze @ 2022-08-26 8:39 UTC (permalink / raw)
To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm
>
>
> I think the issue is much more fundamental than that; you appear to have
> page-table corruption (for example, "pte:262d2626292a2627" and
> "pte:1110111111111111" are definitely corrupted) and so anything dealing
> with 'struct page' derived from the physical address in the pte is going to
> go wonky.
>
> From the logs here, the pmds look ok but these are the pte values I
> spotted:
>
> 0x1110111111111111
> 0x262d2626292a2627
> 0xa098a09aa29ea8a4
> 0x212725231f242323
> 0x2626262023222323
>
> which don't seem to correspond to any sort of poison, but are possibly
> artifacts of repeated patterns with random bits cleared?
>
> Will
Very valuable thought. I am streaming camera data, and generally the image captured from my desk is very dark (so these might be 8-bit brightness values from a mono ov9281 sensor).
I then set the camera to generate a test pattern, i.e. it will only produce output like this
00000090: ffff ffff ffff ffff ffff 00ff 00ff 00ff
000000a0: 00ff 00ff 00ff 00ff 00ff 00ff 00ff 00ff
( also ff00 and 0000 ).
And those were the crashes that followed:
[ 54.838582] BUG: Bad page map in process pool-upowerd pte:ff00ffffffffff pmd:800000001801003
[ 54.839187] BUG: Bad page map in process pool-upowerd pte:ff00ff00ff00ff pmd:800000001801003
[ 54.839615] BUG: Bad page map in process pool-upowerd pte:ff00ff00ff00ff pmd:800000001801003
[ 54.845439] BUG: Bad page map in process pool-upowerd pte:ff00ff00ff00ff pmd:800000001801003
[ 49.664066] BUG: Bad page map in process gmain pte:ff00ffffffffff pmd:800000001801003
[ 49.664515] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.664879] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.665245] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.665610] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.665973] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.666336] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.666698] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
[ 49.667061] BUG: Bad page map in process gmain pte:ff00ff00ff00ff pmd:800000001801003
That confirms my suspicion of a dma write-out-of-bounds from a very separate angle (thanks again for the valuable idea).
I have since taken this to https://github.com/raspberrypi/linux/issues/5138 but will update the list once investigation has found a solution.
Thanks,
Max
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-08-26 8:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <d5f3cea4-20e6-f401-f3e7-8cb2f26033dc@online.de>
[not found] ` <20220815142213.GA10448@willie-the-truck>
[not found] ` <ef144edd-facb-b49e-4482-74f4b42b1ad1@online.de>
2022-08-18 17:14 ` BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall Max Schulze
2022-08-24 15:30 ` Will Deacon
2022-08-26 8:39 ` Aw: " max.schulze
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox