On Sat, 17 Dec 2022 22:39:52 +0100 "Erhard F." wrote:
> On Mon, 12 Dec 2022 14:31:35 +1000
> "Nicholas Piggin" wrote:
>
> > Have you run memtest on the system? Are the messages related to a
> > kernel upgrade? This and your KASAN bugs look possibly like random
> > corruption.
>
> Ok, so I went back to kernel 5.4.225 and ran 'memtester 1930M' for a
> few hours completing 5 test loops in a row. Next I ran
> 'stress -m2 --vm-bytes 965M' for a few hours, also without any
> problems. 1930M is the max. memory I can lock on this 2 GB PowerMac G4
> without invoking systemds' OOM killer.
>
> Booting kernel 6.1.0 and running 'stress -m2 --vm-bytes 965M' I almost instantly get:
> [...]
> pagealloc: memory corruption
> 830c4e52: 00 00 00 00 ....
> CPU: 1 PID: 298 Comm: stress Tainted: G T 6.1.0-gentoo-PMacG4 #2
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [f302bb50] [c0d22770] dump_stack_lvl+0x60/0xa4 (unreliable)
> [f302bb70] [c03242bc] __kernel_unpoison_pages+0x21c/0x268
> [f302bbb0] [c02fdd04] get_page_from_freelist+0xf90/0x1234
> [f302bcb0] [c02febd0] __alloc_pages+0x1dc/0x101c
> [f302be00] [c02d2fa8] handle_mm_fault+0x5b8/0x10bc
> [f302bed0] [c002b8c8] ___do_page_fault+0x22c/0x818
> [f302bf10] [c002c108] do_page_fault+0x28/0x6c
> [f302bf30] [c000433c] DataAccess_virt+0x124/0x17c
> --- interrupt: 300 at 0xac3044
> NIP: 00ac3044 LR: 00ac3020 CTR: 00000000
> REGS: f302bf40 TRAP: 0300 Tainted: G T (6.1.0-gentoo-PMacG4)
> MSR: 0000d032 CR: 20882464 XER: 00000000
> DAR: 8b192010 DSISR: 42000000
> GPR00: 00ac3020 affad290 a7ed3740 6b97e010 3c500000 20224462 00000000 009e0264
> GPR08: 1f815000 1f814000 00000000 404a0fca 20882462 00adfff4 00000000 00000000
> GPR16: 00000000 00000002 00000000 0000005a 40802462 80002462 40002462 00ae00a0
> GPR24: ffffffff ffffffff 3c500000 00000000 00000000 6b97e010 00ae7d64 00001000
> NIP [00ac3044] 0xac3044
> LR [00ac3020] 0xac3020
> --- interrupt: 300
> page:a7a2bb6d refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0x310ab
> flags: 0x80000000(zone=2)
> raw: 80000000 00000100 00000122 00000000 00000001 00000000 ffffffff 00000001
> raw: 00000000
> page dumped because: pagealloc: corrupted page details
>
> Running 'memtester 1930M' on kernel 6.1.0 I almost instantly get:
> [...]
> pagealloc: memory corruption
> f4f9be93: 00 00 00 00 ....
> CPU: 1 PID: 295 Comm: memtester Tainted: G T 6.1.0-gentoo-PMacG4 #2
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [f2c7b6c0] [c0d22770] dump_stack_lvl+0x60/0xa4 (unreliable)
> [f2c7b6e0] [c03242bc] __kernel_unpoison_pages+0x21c/0x268
> [f2c7b720] [c02fdd04] get_page_from_freelist+0xf90/0x1234
> [f2c7b820] [c02febd0] __alloc_pages+0x1dc/0x101c
> [f2c7b970] [c02d2fa8] handle_mm_fault+0x5b8/0x10bc
> [f2c7ba40] [c02c61a4] __get_user_pages+0x180/0x3cc
> [f2c7baa0] [c02c7e24] populate_vma_page_range+0x8c/0xe4
> [f2c7bad0] [c02c8088] __mm_populate+0x13c/0x238
> [f2c7bb60] [c02d5658] do_mlock+0x15c/0x38c
> [f2c7bc00] [c0019948] system_call_exception+0x120/0x204
> [f2c7bf30] [c00221ac] ret_from_syscall+0x0/0x2c
> --- interrupt: c00 at 0x6e6af0
> NIP: 006e6af0 LR: 007e11a4 CTR: 00000000
> REGS: f2c7bf40 TRAP: 0c00 Tainted: G T (6.1.0-gentoo-PMacG4)
> MSR: 0000d032 CR: 40002468 XER: 20000000
>
> GPR00: 00000096 afa7bdc0 a7abb2c0 2f067000 789ff010 00000000 00000000 006d23d4
> GPR08: 0000d032 00000008 78a00ff8 4047df2a 4047dbf7 007ffff4 010ef900 00c72438
> GPR16: 00c73b50 00c723a0 789ff010 78a00000 00000000 a7ab42c8 00000000 007d0ea0
> GPR24: 78a00000 fffff000 00000000 00001000 2f066010 00000001 00807de8 007e3870
> NIP [006e6af0] 0x6e6af0
> LR [007e11a4] 0x7e11a4
> --- interrupt: c00
> page:a05bd3e5 refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0x310ab
> flags: 0x80000000(zone=2)
> raw: 80000000 00000100 00000122 00000000 00000001 00000000 ffffffff 00000001
> raw: 00000000
> page dumped because: pagealloc: corrupted page details

On a recent v6.15-rc kernel this memory corruption is now deterministic,
and I need no external tools to provoke it on my G4. It needs
VMAP_STACK=y, CONFIG_HIGHMEM=y and more than one core to show up. In
this case I always get it during boot, after init has started from the
root file system:

[...]
Freeing unused kernel image (initmem) memory: 1204K
Kernel memory protection not selected by kernel config.
Run /sbin/init as init process
  with arguments:
    /sbin/init
  with environment:
    HOME=/
    TERM=linux
random: crng init done
pagealloc: memory corruption
fffdfff0: 00 00 00 00 ....
CPU: 0 UID: 0 PID: 164 Comm: lvm Not tainted 6.15.0-rc4-PMacG4 #51 NONE
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f21a1c30] [c0913b10] dump_stack_lvl+0x70/0xa4 (unreliable)
[f21a1c50] [c0913b64] dump_stack+0x20/0x34
[f21a1c60] [c01ad7a0] __kernel_unpoison_pages+0x178/0x18c
[f21a1ca0] [c019fbd8] post_alloc_hook+0x78/0xb8
[f21a1cc0] [c019fc3c] prep_new_page+0x24/0x5c
[f21a1ce0] [c01a06b0] get_page_from_freelist+0x224/0x6f8
[f21a1d70] [c01a2218] __alloc_frozen_pages_noprof+0x118/0x800
[f21a1e20] [c01a2910] __alloc_pages_noprof+0x10/0x30
[f21a1e30] [c01a2cc8] __folio_alloc_noprof+0x14/0x30
[f21a1e40] [c0178e60] vma_alloc_zeroed_movable_folio.isra.0+0x34/0x8c
[f21a1e60] [c017dd9c] handle_mm_fault+0x380/0x1624
[f21a1ef0] [c003000c] ___do_page_fault+0x37c/0x4dc
[f21a1f30] [c0030394] do_page_fault+0x20/0x38
[f21a1f40] [c0004324] DataAccess_virt+0x11c/0x174
--- interrupt: 300 at 0xa77de974
NIP: a77de974 LR: 004650f4 CTR: 00002cb2
REGS: f21a1f50 TRAP: 0300 Not tainted (6.15.0-rc4-PMacG4)
MSR: 0000d032 CR: 42004460 XER: 20000000
DAR: 007aa000 DSISR: 42000000
GPR00: 0000448e afbebcf0 a798d2e0 006eb158 00000000 00000010 007aa040 00000020
GPR08: ffffffc0 ffffffe0 00000013 a77de804 a77dda94 006afca4 0153b33c 00000000
GPR16: 00000000 015340b0 00000000 00710180 0153b30c 00000001 a799cc20 0043f718
GPR24: a799e010 00000001 006eb158 006e9e10 afbec018 afbecaa5 006af5e4 00a6fa80
NIP [a77de974] 0xa77de974
LR [004650f4] 0x4650f4
--- interrupt: 300
page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x3106d
flags: 0x80000000(zone=2)
raw: 80000000 00000100 00000122 00000000 00000000 00000000 ffffffff 00000000
page dumped because: pagealloc: corrupted page details
[...]

So the situation is as follows:

VMAP_STACK=y and CONFIG_HIGHMEM=y ... memory corruption
VMAP_STACK=y without HIGHMEM ... boots ok
CONFIG_HIGHMEM=y without VMAP_STACK ... boots ok
VMAP_STACK=y and CONFIG_HIGHMEM=y and nr_cpus=1 ... boots ok

I stripped down the kernel .config a lot (see attached dmesg) to rule
out some factors and also enabled KCSAN. However, the memory corruption
always happens before KCSAN reports any races, which only show up later
(see attached dmesg). KCSAN_EARLY_ENABLE=y is set.

So the defining factor besides VMAP_STACK=y is CONFIG_HIGHMEM=y. If I
disable HIGHMEM and let the G4 boot with only 784 MiB RAM in this case,
I don't get any memory corruption whatsoever. Also, CONFIG_HIGHMEM=y
without VMAP_STACK=y works flawlessly. Only the combination of both
fails.

Regards,
Erhard
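
For anyone who wants to check whether their own build falls into the failing case, the four boot results listed above can be written down as a small check. This is only a sketch of what I observed on this one G4, not a statement about the root cause. The function names are made up for illustration, and it assumes the options appear in the .config as CONFIG_VMAP_STACK and CONFIG_HIGHMEM and that the number of CPUs brought up is passed in by hand.

```python
import re


def parse_kconfig(text):
    """Return the set of CONFIG_* symbols that are set to 'y' in .config-style text."""
    enabled = set()
    for line in text.splitlines():
        # Lines such as "# CONFIG_HIGHMEM is not set" do not match and are skipped.
        m = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=y$", line.strip())
        if m:
            enabled.add(m.group(1))
    return enabled


def matches_failing_combination(config_text, nr_cpus):
    """True if the setup matches the one combination that corrupted memory here.

    Hypothetical helper: it encodes only the four observations from this
    report (both options enabled and more than one CPU online).
    """
    enabled = parse_kconfig(config_text)
    return (
        "CONFIG_VMAP_STACK" in enabled
        and "CONFIG_HIGHMEM" in enabled
        and nr_cpus > 1
    )


if __name__ == "__main__":
    sample = "CONFIG_VMAP_STACK=y\nCONFIG_HIGHMEM=y\n"
    print(matches_failing_combination(sample, nr_cpus=2))
    print(matches_failing_combination(sample, nr_cpus=1))
```

The config text can be read from the .config in the build tree; nr_cpus=1 corresponds to booting with that kernel parameter as in the last row of the list.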