On 2025-02-19 03:41, Ryan Roberts wrote:
> On 19/02/2025 02:27, Luiz Capitulino wrote:
>> Hi,
>>
>> I'm getting the crash below with Linus tree commit
>> 2408a807bfc3f738850ef5ad5e3fd59d66168996 on a Ampere Mt. Jade with two sockets
>> (backtrace below).
>
> Thanks for the bug report, I'll take a look this morning, but I'm off work
> tomorrow and Friday so if I can't figure it out before end of day I won't be
> able to look again until Monday, unless someone can pick it up in the meantime.

No rush at all. Please, enjoy your time off :)

> Anyway, is there a specific config you're compiling for? And what about kernel
> command line args?

Config is attached. The kernel command-line is:

"""
ro crashkernel=1G-4G:406M,4G-64G:470M,64G-:726M rd.lvm.lv=cs_ampere-mtjade-altra-03/root rd.lvm.lv=cs_ampere-mtjade-altra-03/swap earlycon=pl011,mmio,0x100002600000
"""

> Is it 100% reproducible for you?

That is a good question. Right now it is (just tried again with latest Linus
tree 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2). But I do have the recollection
that I was able to boot a bad kernel a few times.

Btw, I'll try to bisect again and will also try to update the system's firmware
just in case.

> How much RAM does your system have? (I have 2
> socket Mt. Jade with 512G; I'll try to repro on that).

Mine is 512G, maybe we're lucky and it's the same system.

>> It happens very early during boot. Passing 'nokaslr' in the command-line works
>> around the issue (ie. I can boot and use the system normally). Doesn't seem to
>> happen with 6.13. I tried bisecting it but got nowhere...
>>
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] kernel BUG at arch/arm64/mm/mmu.c:185!
>
> This is:
>
>     /*
>      * After the PTE entry has been populated once, we
>      * only allow updates to the permission attributes.
>      */
>     BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(__ptep_get(ptep))));
>
> So we have a valid -> valid PTE transition where either the PFNs are changing,
> we are trying to change permissions on a contiguous entry, we are trying to
> transition from non-global to global, or we are trying to change other
> explicitly disallowed bits.
>
>> [    0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc3+ #8
>> [    0.000000] pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [    0.000000] pc : alloc_init_cont_pte+0x20c/0x3d0
>> [    0.000000] lr : alloc_init_cont_pte+0x204/0x3d0
>> [    0.000000] sp : ffffb45836ec78b0
>> [    0.000000] x29: ffffb45836ec7940 x28: ffff6fea00000000 x27: 0068000000000f07
>> [    0.000000] x26: ffff6fea00200000 x25: 0000400000000000 x24: ffffffffff433000
>> [    0.000000] x23: dfff800000000000 x22: 0000d01600000000 x21: 0068000000000f07
>> [    0.000000] x20: ffff6fea00000000 x19: ffff6fea00010000 x18: 00000000ae5a3fb1
>> [    0.000000] x17: 0000000000001114 x16: 00000000bfc60000 x15: 0000000000000200
>> [    0.000000] x14: 0000000000000000 x13: 1ffff68b06dd8f1c x12: 00000000f1f1f1f1
>> [    0.000000] x11: ffff768b06dd8f1c x10: ffffb45835a1ca38 x9 : 0000000000000000
>> [    0.000000] x8 : 0000000041b58ab3 x7 : 0000000000000000 x6 : 0000000000000000
>> [    0.000000] x5 : 006840000a861f07 x4 : 000000000000a861 x3 : 000000000000a861
>> [    0.000000] x2 : 006840000a861f03 x1 : 0068400000000f07 x0 : 0000000000000000
>> [    0.000000] Call trace:
>> [    0.000000]  alloc_init_cont_pte+0x20c/0x3d0 (P)
>> [    0.000000]  alloc_init_cont_pmd+0x20c/0x4d0
>> [    0.000000]  alloc_init_pud+0x244/0x400
>> [    0.000000]  create_kpti_ng_temp_pgd+0xf8/0x1c8
>
> This is an alias for __create_pgd_mapping_locked() so I suspect we are actually
> in __map_memblock().
>
>> [    0.000000]  map_mem.constprop.0+0x1d8/0x3b8
>> [    0.000000]  paging_init+0x98/0x330
>> [    0.000000]  setup_arch+0xac/0x170
>> [    0.000000]  start_kernel+0x74/0x3c8
>> [    0.000000]  __primary_switched+0x8c/0xa0
>> [    0.000000] Code: f9400301 97ffff64 72001c1f 54fffe21 (d4210000)
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
>> [    0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
>> exception ]---
>>
>
> So I guess either we are setting a PTE entry into a table for the first time,
> where somehow the table has not been initially cleared (very unlikely) or we are
> trying to update the permissions of an already mapped pte. In that latter case,
> I think we should only be remapping the kernel image portion of the linear map.
>
> I can't see any obvious recent changes in this area. I'll see if I can repro and
> poke around a bit more.

OK, maybe you'll be able to reproduce with the config I'm attaching.
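
For reference, the four cases listed for the BUG_ON() above can be modelled
in a few lines of userspace C. This is only a simplified sketch and not the
kernel's pgattr_change_is_safe(): the real check in arch/arm64/mm/mmu.c
handles more cases that are left out here. The names model_change_is_safe(),
model_self_check() and EXAMPLE_PTE are made up for illustration, the address
mask assumes 4K pages with 48-bit output addresses, and the descriptor value
is invented rather than taken from the register dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Descriptor bits of an arm64 stage 1 page entry. */
#define PTE_VALID  (UINT64_C(1) << 0)
#define PTE_RDONLY (UINT64_C(1) << 7)   /* AP[2] */
#define PTE_NG     (UINT64_C(1) << 11)  /* not-global */
#define PTE_WRITE  (UINT64_C(1) << 51)  /* DBM */
#define PTE_CONT   (UINT64_C(1) << 52)  /* contiguous hint */
#define PTE_PXN    (UINT64_C(1) << 53)
/* Assumption: 4K pages, 48-bit output address. */
#define PTE_ADDR   UINT64_C(0x0000fffffffff000)

/* Hypothetical model of the check; not the kernel function itself. */
bool model_change_is_safe(uint64_t old_pte, uint64_t new_pte)
{
	/* Permission-style bits that a live entry may change. */
	const uint64_t may_change = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;

	/* Creating or tearing down a mapping is always fine. */
	if (!(old_pte & PTE_VALID) || !(new_pte & PTE_VALID))
		return true;
	/* Case 1: the output address (PFN) of a live entry changes. */
	if ((old_pte & PTE_ADDR) != (new_pte & PTE_ADDR))
		return false;
	/* Case 2: any update to a live contiguous entry. */
	if ((old_pte | new_pte) & PTE_CONT)
		return false;
	/* Case 3: non-global -> global. */
	if (old_pte & ~new_pte & PTE_NG)
		return false;
	/* Case 4: any other bit outside the allowed set differs. */
	return ((old_pte ^ new_pte) & ~may_change) == 0;
}

/* Made-up descriptor: valid page, AF, SH=3, PXN|UXN, PA 0x40000000. */
#define EXAMPLE_PTE UINT64_C(0x0060000040000703)

void model_self_check(void)
{
	/* First population of an empty slot. */
	assert(model_change_is_safe(0, EXAMPLE_PTE));
	/* Pure permission change. */
	assert(model_change_is_safe(EXAMPLE_PTE, EXAMPLE_PTE | PTE_RDONLY));
	/* Global -> non-global is allowed in this model. */
	assert(model_change_is_safe(EXAMPLE_PTE, EXAMPLE_PTE | PTE_NG));
	/* Case 1: PFN change. */
	assert(!model_change_is_safe(EXAMPLE_PTE, EXAMPLE_PTE + 0x1000));
	/* Case 2: permission change on a contiguous entry. */
	assert(!model_change_is_safe(EXAMPLE_PTE | PTE_CONT,
				     EXAMPLE_PTE | PTE_CONT | PTE_RDONLY));
	/* Case 3: non-global -> global. */
	assert(!model_change_is_safe(EXAMPLE_PTE | PTE_NG, EXAMPLE_PTE));
	/* Case 4: memory attribute index changes. */
	assert(!model_change_is_safe(EXAMPLE_PTE,
				     EXAMPLE_PTE ^ (UINT64_C(1) << 2)));
}
```

Calling model_self_check() from a small main() exercises each case once. The
first assertion corresponds to populating an empty table slot, which is why
a table that was not cleared beforehand would be enough to trip the check.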