From: Suren Baghdasaryan <surenb@google.com>
To: "Holger Hoffstätte" <holger@applied-asynchrony.com>
Cc: Matthew Wilcox <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
akpm@linux-foundation.org, jirislaby@kernel.org,
jacobly.alt@gmail.com, michel@lespinasse.org,
jglisse@google.com, mhocko@suse.com, vbabka@suse.cz,
hannes@cmpxchg.org, mgorman@techsingularity.net,
dave@stgolabs.net, liam.howlett@oracle.com,
peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org,
mingo@redhat.com, will@kernel.org, luto@kernel.org,
songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com,
hughd@google.com, bigeasy@linutronix.de,
kent.overstreet@linux.dev, punit.agrawal@bytedance.com,
lstoakes@gmail.com, peterjung1337@gmail.com,
rientjes@google.com, chriscli@google.com,
axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
rppt@kernel.org, jannh@google.com, shakeelb@google.com,
tatashin@google.com, edumazet@google.com, gthelen@google.com,
linux-mm@kvack.org
Subject: Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed
Date: Tue, 4 Jul 2023 23:46:45 -0700
Message-ID: <CAJuCfpEdF1x95vEFeofnJ3obJhEHq9Q_yj4Vi-9J7W=F8QjVAg@mail.gmail.com>
In-Reply-To: <a7149847-4b53-8ff0-d570-042631a1ce20@applied-asynchrony.com>

On Tue, Jul 4, 2023 at 4:59 PM Holger Hoffstätte
<holger@applied-asynchrony.com> wrote:
>
> On 2023-07-05 00:42, Matthew Wilcox wrote:
> > On Tue, Jul 04, 2023 at 11:34:27PM +0200, Holger Hoffstätte wrote:
> >> I applied the fix and did a clean rebuild. The first attempt to boot resulted in
> >> the following oops, though it kind of continued:
> >
> > It would be helpful to run this through decode_stacktrace.sh
> >
> >> Jul 4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> >> Jul 4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> >> Jul 4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> >> Jul 4 22:35:22 hho kernel: PGD 0 P4D 0
> >> Jul 4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> >> Jul 4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> >> Jul 4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> >> Jul 4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> >> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> >
> > Faulting insn:
> >
> > 0: 4c 8b 70 48 mov 0x48(%rax),%r14
> >
> > and rax is 0xa, which matches up with 0x52 as the faulting address.
> >
> > I'm not sure this is related to the VMA patches. It might be something
> > unrelated that doesn't often come up?
>
> See below for the reveal!
>
> >> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> >> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> >> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> >> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> >> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> >> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> >> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> >> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> >> Jul 4 22:35:22 hho kernel: Call Trace:
> >> Jul 4 22:35:22 hho kernel: <TASK>
> >> Jul 4 22:35:22 hho kernel: ? __die+0x1f/0x60
> >> Jul 4 22:35:22 hho kernel: ? page_fault_oops+0x14d/0x410
> >> Jul 4 22:35:22 hho kernel: ? xa_load+0x82/0xa0
> >> Jul 4 22:35:22 hho kernel: ? exc_page_fault+0x60/0x100
> >> Jul 4 22:35:22 hho kernel: ? asm_exc_page_fault+0x22/0x30
> >> Jul 4 22:35:22 hho kernel: ? wq_worker_comm+0x63/0xc0
> >> Jul 4 22:35:22 hho last message buffered 1 times
> >> Jul 4 22:35:22 hho kernel: proc_task_name+0xa4/0xb0
> >> Jul 4 22:35:22 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100
> >> Jul 4 22:35:22 hho kernel: do_task_stat+0x44b/0xe10
> >> Jul 4 22:35:22 hho kernel: proc_single_show+0x4b/0xa0
> >> Jul 4 22:35:22 hho kernel: seq_read_iter+0xff/0x410
> >> Jul 4 22:35:22 hho kernel: ? generic_fillattr+0x45/0xf0
> >> Jul 4 22:35:22 hho kernel: seq_read+0x93/0xb0
> >> Jul 4 22:35:22 hho kernel: vfs_read+0x9b/0x2c0
> >> Jul 4 22:35:22 hho kernel: ? __do_sys_newfstatat+0x22/0x30
> >> Jul 4 22:35:22 hho kernel: ksys_read+0x53/0xc0
> >> Jul 4 22:35:22 hho kernel: do_syscall_64+0x35/0x80
> >> Jul 4 22:35:22 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >> Jul 4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d
> >> Jul 4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> >> Jul 4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> >> Jul 4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d
> >> Jul 4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004
> >> Jul 4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001
> >> Jul 4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0
> >> Jul 4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0
> >> Jul 4 22:35:22 hho kernel: </TASK>
> >> Jul 4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button
> >> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052
> >> Jul 4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]---
> >> Jul 4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> >> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> >> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> >> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> >> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> >> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> >> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> >> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> >> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> >> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> >> Jul 4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled
> >> Jul 4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> >> Jul 4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> >> Jul 4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> >> Jul 4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>
> >> It then kind of limped along until I rebooted again. This second attempt to boot
> >> died and locked up completely, again during amdgpu initialization, and is on display here:
> >> https://imgur.com/a/3ZE66kh
> >
> > refill_obj_stock() is also somewhat unrelated to VMA stuff. This is
> > all very bizarre.
> >
> >> Finally I just edited mm/Kconfig and set config PER_VMA_LOCK to "def_bool n" to override
> >> any setting in my old config. That made everything work again - it's what I'm using now.
> >
> > Could I ask you to try a few boots with PER_VMA_LOCK set to "n", just
> > to eliminate the possibility that this is a coincidence?
> >
>
> HOLY SMOKES! You are on to something! I wanted to do 10 reboots and didn't expect
> anything to happen since this has been working fine since forever, and I don't boot
> that often since suspend is quite reliable these days. It did 9 without problems and
> then on the 10th reboot it crapped out, again with the xa_load pagefault.

Ok, it sounds like the results of the fix are inconclusive. I guess we
should wait for more testing before concluding whether the fix is
valid.

In the meantime, per Andrew's request, I posted the patchset that
includes both the fix and a proper kill switch for the feature at
https://lore.kernel.org/all/20230705063711.2670599-1-surenb@google.com/.

Thanks,
Suren.

>
> Here's the first trace:
>
> holger>/tmp/linux-6.4.1/scripts/decode_stacktrace.sh /boot/kernel-genkernel-x86_64-6.4.1 < /tmp/kern.log
> Jul 4 22:35:22 hho kernel: [drm] Initialized amdgpu 3.52.0 20150101 for 0000:06:00.0 on minor 0
> Jul 4 22:35:22 hho kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Jul 4 22:35:22 hho kernel: [drm] DSC precompute is not needed.
> Jul 4 22:35:22 hho kernel: Console: switching to colour frame buffer device 240x67
> Jul 4 22:35:22 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> Jul 4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> Jul 4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> Jul 4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> Jul 4 22:35:22 hho kernel: PGD 0 P4D 0
> Jul 4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> Jul 4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> Jul 4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> Jul 4 22:35:22 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> All code
> ========
> 0: 43 2c 20 rex.XB sub $0x20,%al
> 3: 75 1d jne 0x22
> 5: 5b pop %rbx
> 6: 5d pop %rbp
> 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi
> e: 41 5c pop %r12
> 10: 41 5d pop %r13
> 12: 41 5e pop %r14
> 14: e9 7e 6b 8b 00 jmp 0x8b6b97
> 19: 5b pop %rbx
> 1a: 5d pop %rbp
> 1b: 41 5c pop %r12
> 1d: 41 5d pop %r13
> 1f: 41 5e pop %r14
> 21: c3 ret
> 22: 48 89 df mov %rbx,%rdi
> 25: e8 ad 35 00 00 call 0x35d7
> 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction
> 2e: 48 89 c3 mov %rax,%rbx
> 31: 4d 85 f6 test %r14,%r14
> 34: 74 cf je 0x5
> 36: 4c 89 f7 mov %r14,%rdi
> 39: e8 29 b6 8b 00 call 0x8bb667
> 3e: 80 .byte 0x80
> 3f: 7b .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 4c 8b 70 48 mov 0x48(%rax),%r14
> 4: 48 89 c3 mov %rax,%rbx
> 7: 4d 85 f6 test %r14,%r14
> a: 74 cf je 0xffffffffffffffdb
> c: 4c 89 f7 mov %r14,%rdi
> f: e8 29 b6 8b 00 call 0x8bb63d
> 14: 80 .byte 0x80
> 15: 7b .byte 0x7b
> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul 4 22:35:22 hho kernel: Call Trace:
> Jul 4 22:35:22 hho kernel: <TASK>
> Jul 4 22:35:22 hho kernel: ? __die+0x1f/0x60
> Jul 4 22:35:22 hho kernel: ? page_fault_oops+0x14d/0x410
> Jul 4 22:35:22 hho kernel: ? xa_load+0x82/0xa0
> Jul 4 22:35:22 hho kernel: ? exc_page_fault+0x60/0x100
> Jul 4 22:35:22 hho kernel: ? asm_exc_page_fault+0x22/0x30
> Jul 4 22:35:22 hho kernel: ? wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho last message buffered 1 times
> Jul 4 22:35:22 hho kernel: proc_task_name+0xa4/0xb0
> Jul 4 22:35:22 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100
> Jul 4 22:35:22 hho kernel: do_task_stat+0x44b/0xe10
> Jul 4 22:35:22 hho kernel: proc_single_show+0x4b/0xa0
> Jul 4 22:35:22 hho kernel: seq_read_iter+0xff/0x410
> Jul 4 22:35:22 hho kernel: ? generic_fillattr+0x45/0xf0
> Jul 4 22:35:22 hho kernel: seq_read+0x93/0xb0
> Jul 4 22:35:22 hho kernel: vfs_read+0x9b/0x2c0
> Jul 4 22:35:22 hho kernel: ? __do_sys_newfstatat+0x22/0x30
> Jul 4 22:35:22 hho kernel: ksys_read+0x53/0xc0
> Jul 4 22:35:22 hho kernel: do_syscall_64+0x35/0x80
> Jul 4 22:35:22 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
> Jul 4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d
> Jul 4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> All code
> ========
> 0: b9 fe ff ff 48 mov $0x48fffffe,%ecx
> 5: 8d 3d 1a 71 0a 00 lea 0xa711a(%rip),%edi # 0xa7125
> b: 50 push %rax
> c: e8 2c 12 02 00 call 0x2123d
> 11: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 18: 00 00 00
> 1b: 66 90 xchg %ax,%ax
> 1d: 80 3d 81 4c 0e 00 00 cmpb $0x0,0xe4c81(%rip) # 0xe4ca5
> 24: 74 17 je 0x3d
> 26: 31 c0 xor %eax,%eax
> 28: 0f 05 syscall
> 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
> 30: 77 5b ja 0x8d
> 32: c3 ret
> 33: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 3a: 00 00 00
> 3d: 53 push %rbx
> 3e: 48 rex.W
> 3f: 83 .byte 0x83
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
> 6: 77 5b ja 0x63
> 8: c3 ret
> 9: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 10: 00 00 00
> 13: 53 push %rbx
> 14: 48 rex.W
> 15: 83 .byte 0x83
> Jul 4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> Jul 4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d
> Jul 4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004
> Jul 4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001
> Jul 4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0
> Jul 4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0
> Jul 4 22:35:22 hho kernel: </TASK>
> Jul 4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button
> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052
> Jul 4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]---
> Jul 4 22:35:22 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> All code
> ========
> 0: 43 2c 20 rex.XB sub $0x20,%al
> 3: 75 1d jne 0x22
> 5: 5b pop %rbx
> 6: 5d pop %rbp
> 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi
> e: 41 5c pop %r12
> 10: 41 5d pop %r13
> 12: 41 5e pop %r14
> 14: e9 7e 6b 8b 00 jmp 0x8b6b97
> 19: 5b pop %rbx
> 1a: 5d pop %rbp
> 1b: 41 5c pop %r12
> 1d: 41 5d pop %r13
> 1f: 41 5e pop %r14
> 21: c3 ret
> 22: 48 89 df mov %rbx,%rdi
> 25: e8 ad 35 00 00 call 0x35d7
> 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction
> 2e: 48 89 c3 mov %rax,%rbx
> 31: 4d 85 f6 test %r14,%r14
> 34: 74 cf je 0x5
> 36: 4c 89 f7 mov %r14,%rdi
> 39: e8 29 b6 8b 00 call 0x8bb667
> 3e: 80 .byte 0x80
> 3f: 7b .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 4c 8b 70 48 mov 0x48(%rax),%r14
> 4: 48 89 c3 mov %rax,%rbx
> 7: 4d 85 f6 test %r14,%r14
> a: 74 cf je 0xffffffffffffffdb
> c: 4c 89 f7 mov %r14,%rdi
> f: e8 29 b6 8b 00 call 0x8bb63d
> 14: 80 .byte 0x80
> 15: 7b .byte 0x7b
> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul 4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled
> Jul 4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul 4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul 4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul 4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> Here is the second one from the reboot bonanza:
>
> holger>/tmp/linux-6.4.1/scripts/decode_stacktrace.sh /boot/kernel-genkernel-x86_64-6.4.1 < /tmp/kern.log
> Jul 5 01:34:20 hho kernel: [drm] Initialized amdgpu 3.52.0 20150101 for 0000:06:00.0 on minor 0
> Jul 5 01:34:20 hho kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Jul 5 01:34:20 hho kernel: [drm] DSC precompute is not needed.
> Jul 5 01:34:20 hho kernel: Console: switching to colour frame buffer device 240x67
> Jul 5 01:34:20 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> Jul 5 01:34:20 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> Jul 5 01:34:20 hho kernel: #PF: supervisor read access in kernel mode
> Jul 5 01:34:20 hho kernel: #PF: error_code(0x0000) - not-present page
> Jul 5 01:34:20 hho kernel: PGD 0 P4D 0
> Jul 5 01:34:20 hho kernel: Oops: 0000 [#1] SMP
> Jul 5 01:34:20 hho kernel: CPU: 8 PID: 1716 Comm: start-stop-daem Not tainted 6.4.1 #1
> Jul 5 01:34:20 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> Jul 5 01:34:20 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 5 01:34:20 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 2e 59 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 d9 a3 8b 00 80 7b
> All code
> ========
> 0: 43 2c 20 rex.XB sub $0x20,%al
> 3: 75 1d jne 0x22
> 5: 5b pop %rbx
> 6: 5d pop %rbp
> 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi
> e: 41 5c pop %r12
> 10: 41 5d pop %r13
> 12: 41 5e pop %r14
> 14: e9 2e 59 8b 00 jmp 0x8b5947
> 19: 5b pop %rbx
> 1a: 5d pop %rbp
> 1b: 41 5c pop %r12
> 1d: 41 5d pop %r13
> 1f: 41 5e pop %r14
> 21: c3 ret
> 22: 48 89 df mov %rbx,%rdi
> 25: e8 ad 35 00 00 call 0x35d7
> 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction
> 2e: 48 89 c3 mov %rax,%rbx
> 31: 4d 85 f6 test %r14,%r14
> 34: 74 cf je 0x5
> 36: 4c 89 f7 mov %r14,%rdi
> 39: e8 d9 a3 8b 00 call 0x8ba417
> 3e: 80 .byte 0x80
> 3f: 7b .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 4c 8b 70 48 mov 0x48(%rax),%r14
> 4: 48 89 c3 mov %rax,%rbx
> 7: 4d 85 f6 test %r14,%r14
> a: 74 cf je 0xffffffffffffffdb
> c: 4c 89 f7 mov %r14,%rdi
> f: e8 d9 a3 8b 00 call 0x8ba3ed
> 14: 80 .byte 0x80
> 15: 7b .byte 0x7b
> Jul 5 01:34:20 hho kernel: RSP: 0018:ffffc90001027bb8 EFLAGS: 00010202
> Jul 5 01:34:20 hho kernel: RAX: 000000000000000a RBX: ffff888111052640 RCX: 0001020304050608
> Jul 5 01:34:20 hho kernel: RDX: ffff88810490b300 RSI: 7fffffffffffffff RDI: ffff888111052640
> Jul 5 01:34:20 hho kernel: RBP: 000000000000000f R08: ffffc90001027be8 R09: 0000000000000040
> Jul 5 01:34:20 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90001027be8
> Jul 5 01:34:20 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 5 01:34:20 hho kernel: FS: 00007f917809a740(0000) GS:ffff8887ef600000(0000) knlGS:0000000000000000
> Jul 5 01:34:20 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 5 01:34:20 hho kernel: CR2: 0000000000000052 CR3: 0000000107562000 CR4: 0000000000350ee0
> Jul 5 01:34:20 hho kernel: Call Trace:
> Jul 5 01:34:20 hho kernel: <TASK>
> Jul 5 01:34:20 hho kernel: ? __die+0x1f/0x60
> Jul 5 01:34:20 hho kernel: ? page_fault_oops+0x14d/0x410
> Jul 5 01:34:20 hho kernel: ? xa_load+0x82/0xa0
> Jul 5 01:34:20 hho last message buffered 1 times
> Jul 5 01:34:20 hho kernel: ? exc_page_fault+0x60/0x100
> Jul 5 01:34:20 hho kernel: ? asm_exc_page_fault+0x22/0x30
> Jul 5 01:34:20 hho kernel: ? wq_worker_comm+0x63/0xc0
> Jul 5 01:34:20 hho last message buffered 1 times
> Jul 5 01:34:20 hho kernel: proc_task_name+0xa4/0xb0
> Jul 5 01:34:20 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100
> Jul 5 01:34:20 hho kernel: do_task_stat+0x44b/0xe10
> Jul 5 01:34:20 hho kernel: proc_single_show+0x4b/0xa0
> Jul 5 01:34:20 hho kernel: seq_read_iter+0xff/0x410
> Jul 5 01:34:20 hho kernel: ? generic_fillattr+0x45/0xf0
> Jul 5 01:34:20 hho kernel: seq_read+0x93/0xb0
> Jul 5 01:34:20 hho kernel: vfs_read+0x9b/0x2c0
> Jul 5 01:34:20 hho kernel: ? __do_sys_newfstatat+0x22/0x30
> Jul 5 01:34:20 hho kernel: ksys_read+0x53/0xc0
> Jul 5 01:34:20 hho kernel: do_syscall_64+0x35/0x80
> Jul 5 01:34:20 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
> Jul 5 01:34:20 hho kernel: RIP: 0033:0x7f91781d677d
> Jul 5 01:34:20 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> All code
> ========
> 0: b9 fe ff ff 48 mov $0x48fffffe,%ecx
> 5: 8d 3d 1a 71 0a 00 lea 0xa711a(%rip),%edi # 0xa7125
> b: 50 push %rax
> c: e8 2c 12 02 00 call 0x2123d
> 11: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 18: 00 00 00
> 1b: 66 90 xchg %ax,%ax
> 1d: 80 3d 81 4c 0e 00 00 cmpb $0x0,0xe4c81(%rip) # 0xe4ca5
> 24: 74 17 je 0x3d
> 26: 31 c0 xor %eax,%eax
> 28: 0f 05 syscall
> 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
> 30: 77 5b ja 0x8d
> 32: c3 ret
> 33: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 3a: 00 00 00
> 3d: 53 push %rbx
> 3e: 48 rex.W
> 3f: 83 .byte 0x83
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
> 6: 77 5b ja 0x63
> 8: c3 ret
> 9: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 10: 00 00 00
> 13: 53 push %rbx
> 14: 48 rex.W
> 15: 83 .byte 0x83
> Jul 5 01:34:20 hho kernel: RSP: 002b:00007ffe56a8adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> Jul 5 01:34:20 hho kernel: RAX: ffffffffffffffda RBX: 0000559458207b40 RCX: 00007f91781d677d
> Jul 5 01:34:20 hho kernel: RDX: 0000000000000400 RSI: 0000559458209d30 RDI: 0000000000000004
> Jul 5 01:34:20 hho kernel: RBP: 00007ffe56a8ae20 R08: 00007f9178276cb2 R09: 0000000000000001
> Jul 5 01:34:20 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f91782b04a0
> Jul 5 01:34:20 hho kernel: R13: 000055945820a140 R14: 0000000000000a68 R15: 00007f91782afba0
> Jul 5 01:34:20 hho kernel: </TASK>
> Jul 5 01:34:20 hho kernel: Modules linked in: sch_fq_codel bpf_preload mousedev snd_ctl_led iwlmvm snd_hda_codec_realtek amdgpu pkcs8_key_parser snd_hda_codec_generic mac80211 libarc4 drm_ttm_helper snd_hda_codec_hdmi ttm iommu_v2 uvcvideo gpu_sched videobuf2_vmalloc i2c_algo_bit videobuf2_memops snd_hda_intel drm_buddy uvc edac_mce_amd snd_intel_dspcfg crct10dif_pclmul videobuf2_v4l2 drm_suballoc_helper crc32_pclmul lm92 snd_hda_codec drm_display_helper crc32c_intel videodev snd_hwdep ghash_clmulni_intel r8169 drivetemp cec sha512_ssse3 thinkpad_acpi snd_hda_core videobuf2_common psmouse realtek iwlwifi drm_kms_helper rapl ledtrig_audio snd_pcm mc serio_raw snd_rn_pci_acp3x platform_profile syscopyarea wmi_bmof mdio_devres k10temp ipmi_devintf snd_timer snd_acp_config sysfillrect cfg80211 drm ucsi_acpi sysimgblt snd snd_soc_acpi libphy i2c_piix4 ipmi_msghandler snd_pci_acp3x typec_ucsi soundcore rfkill video roles typec battery ac wmi i2c_scmi button
> Jul 5 01:34:20 hho kernel: CR2: 0000000000000052
> Jul 5 01:34:20 hho kernel: ---[ end trace 0000000000000000 ]---
> Jul 5 01:34:20 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 5 01:34:20 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 2e 59 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 d9 a3 8b 00 80 7b
> All code
> ========
> 0: 43 2c 20 rex.XB sub $0x20,%al
> 3: 75 1d jne 0x22
> 5: 5b pop %rbx
> 6: 5d pop %rbp
> 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi
> e: 41 5c pop %r12
> 10: 41 5d pop %r13
> 12: 41 5e pop %r14
> 14: e9 2e 59 8b 00 jmp 0x8b5947
> 19: 5b pop %rbx
> 1a: 5d pop %rbp
> 1b: 41 5c pop %r12
> 1d: 41 5d pop %r13
> 1f: 41 5e pop %r14
> 21: c3 ret
> 22: 48 89 df mov %rbx,%rdi
> 25: e8 ad 35 00 00 call 0x35d7
> 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction
> 2e: 48 89 c3 mov %rax,%rbx
> 31: 4d 85 f6 test %r14,%r14
> 34: 74 cf je 0x5
> 36: 4c 89 f7 mov %r14,%rdi
> 39: e8 d9 a3 8b 00 call 0x8ba417
> 3e: 80 .byte 0x80
> 3f: 7b .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
> 0: 4c 8b 70 48 mov 0x48(%rax),%r14
> 4: 48 89 c3 mov %rax,%rbx
> 7: 4d 85 f6 test %r14,%r14
> a: 74 cf je 0xffffffffffffffdb
> c: 4c 89 f7 mov %r14,%rdi
> f: e8 d9 a3 8b 00 call 0x8ba3ed
> 14: 80 .byte 0x80
> 15: 7b .byte 0x7b
> Jul 5 01:34:20 hho kernel: RSP: 0018:ffffc90001027bb8 EFLAGS: 00010202
> Jul 5 01:34:20 hho kernel: RAX: 000000000000000a RBX: ffff888111052640 RCX: 0001020304050608
> Jul 5 01:34:20 hho kernel: RDX: ffff88810490b300 RSI: 7fffffffffffffff RDI: ffff888111052640
> Jul 5 01:34:20 hho kernel: RBP: 000000000000000f R08: ffffc90001027be8 R09: 0000000000000040
> Jul 5 01:34:20 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90001027be8
> Jul 5 01:34:20 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 5 01:34:20 hho kernel: FS: 00007f917809a740(0000) GS:ffff8887ef600000(0000) knlGS:0000000000000000
> Jul 5 01:34:20 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 5 01:34:20 hho kernel: CR2: 0000000000000052 CR3: 0000000107562000 CR4: 0000000000350ee0
> Jul 5 01:34:20 hho kernel: note: start-stop-daem[1716] exited with irqs disabled
> Jul 5 01:34:20 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul 5 01:34:21 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul 5 01:34:23 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul 5 01:34:23 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> The crashing process was openrc's start-stop-daemon starting acpid, though I think
> both are just the victims here.
>
> Hope this helps!
>
> cheers
> Holger
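
Editorial note: Matthew's observation in the quoted text, that RAX holding 0xa "matches up with 0x52 as the faulting address", follows from the addressing mode of the trapping instruction `mov 0x48(%rax),%r14`, which reads from RAX plus a displacement of 0x48. The minimal sketch below only re-checks that arithmetic using values copied from the oops; the helper name `effective_address` is hypothetical and is not part of any kernel tooling.

```python
# Values are copied from the register dump in the oops above.
# effective_address() is a hypothetical helper for illustration only.

MASK_64 = 0xFFFFFFFFFFFFFFFF


def effective_address(base: int, displacement: int) -> int:
    """Address read by `mov disp(%base),%dst`, wrapped to 64 bits."""
    return (base + displacement) & MASK_64


RAX = 0x000000000000000A   # RAX at the time of the fault
DISP = 0x48                # displacement in `mov 0x48(%rax),%r14`
CR2 = 0x0000000000000052   # faulting address reported in CR2

addr = effective_address(RAX, DISP)
print(hex(addr))
assert addr == CR2
```

Both traces show the same RAX and CR2 values, so the check holds for each of them. In other words, the function called just before the trapping instruction returned the small integer 0xa where a valid pointer was expected.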