* [Oops] vfree abort in bpf_jit_free with memcg_data value 0xffff
From: Peng Fan @ 2024-06-03 9:10 UTC
To: linux-mm, bpf, daniel, ast, zlim.lnx, cgroups, hannes, mhocko,
roman.gushchin, shakeelb, muchun.song
Hi All,
We are running a 6.6 kernel on the NXP i.MX95 platform and have hit an issue
that is very hard to reproduce. The panic log is at the end. I checked the
registers against the source code:
static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
{
	unsigned long memcg_data = folio->memcg_data;

	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
	VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);

	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}
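memcg_data is 0xffff in register x1, which looks like an invalid value.
Register x0 holds x1 & ~3, i.e. 0xfffc.
The panic happens at PC ffff800080305894, which is 'ldr x0, [x0, #16]'.
That load reads from 0xfffc + 0x10 = 0x1000c, which matches the faulting
virtual address 000000000001000c in the log.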
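To make the register arithmetic easy to check, here is a minimal user-space sketch of what __folio_objcg() plus the faulting load do with a raw memcg_data word. It assumes the 6.6 flag values from include/linux/memcontrol.h (MEMCG_DATA_OBJCGS = bit 0, MEMCG_DATA_KMEM = bit 1, MEMCG_DATA_FLAGS_MASK = 0x3) and takes the 16-byte offset from the 'ldr x0, [x0, #16]' instruction; objcg_memcg_load_addr() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as assumed from the 6.6 include/linux/memcontrol.h. */
#define MEMCG_DATA_OBJCGS     (UINT64_C(1) << 0)
#define MEMCG_DATA_KMEM       (UINT64_C(1) << 1)
#define MEMCG_DATA_FLAGS_MASK (UINT64_C(0x3))

/* Offset taken from the faulting instruction "ldr x0, [x0, #16]". */
#define OBJCG_MEMCG_OFFSET UINT64_C(16)

/*
 * Hypothetical helper: returns true if the MEMCG_DATA_KMEM path would be
 * taken (the "tbnz w1, #1" branch) and stores in *load_addr the address
 * that the following load would read.
 */
static bool objcg_memcg_load_addr(uint64_t memcg_data, uint64_t *load_addr)
{
	/* x0 = x1 & ~3: strip the flag bits to get the obj_cgroup pointer. */
	uint64_t objcg = memcg_data & ~MEMCG_DATA_FLAGS_MASK;

	if (!(memcg_data & MEMCG_DATA_KMEM))
		return false;

	*load_addr = objcg + OBJCG_MEMCG_OFFSET;
	return true;
}
```

For memcg_data = 0xffff this returns true with a load address of 0x1000c, the faulting address in the Oops. 0xffff also has MEMCG_DATA_OBJCGS set, so on a CONFIG_DEBUG_VM build the VM_BUG_ON_FOLIO() checks quoted above would be expected to fire before the load.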
I don't have a good idea how to fix the issue; please share any suggestions
if you have time to take a look.
[ 12.843675] Unable to handle kernel paging request at virtual address 000000000001000c
[ 12.849981] audit: type=1334 audit(1709988536.322:30): prog-id=3 op=UNLOAD
[ 12.857888] Mem abort info:
[ 12.867630] ESR = 0x0000000096000004
[ 12.871368] EC = 0x25: DABT (current EL), IL = 32 bits
[ 12.876675] SET = 0, FnV = 0
[ 12.879732] EA = 0, S1PTW = 0
[ 12.882860] FSC = 0x04: level 0 translation fault
[ 12.887730] Data abort info:
[ 12.890599] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 12.896076] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 12.901120] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 12.906424] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001008de000
[ 12.912854] [000000000001000c] pgd=0000000000000000, p4d=0000000000000000
[ 12.919642] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 12.925900] Modules linked in:
[ 12.928942] CPU: 4 PID: 131 Comm: kworker/4:2 Not tainted 6.6.23-06226-g41e0f501b547-dirty #248
[ 12.937625] Hardware name: NXP i.MX95 19X19 board (DT)
[ 12.942748] Workqueue: events bpf_prog_free_deferred
[ 12.947713] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.954663] pc : vfree+0x114/0x2e0
[ 12.958060] lr : vfree+0x78/0x2e0
[ 12.961362] sp : ffff80008459bd10
[ 12.964664] x29: ffff80008459bd10 x28: 0000000000000000 x27: 0000000000000000
[ 12.969128] watchdog: watchdog0: watchdog did not stop!
[ 12.971788] x26: 0000000000000000 x25: ffff0000808b5a00 x24: ffff000080090805
[ 12.971795] x23: ffff000084bcdc08 x22: 0000000000000000 x21: ffff00008493c6c0
[ 12.971802] x20: fffffc000100005e x19: 0000000000000000 x18: 0000000000000000
[ 12.971808] x17: ffff800084ec1000 x16: ffff00008465f208
[ 12.991063] systemd-shutdown[1]: Using hardware watchdog 'i.MX7ULP watchdog timer', version 0, device /dev/watchdog0
[ 12.991246] x15: 0000000000000000
[ 13.017453] x14: 0000000000000000 x13: ffff80008f001000 x12: ffff000084647a00
[ 13.024577] x11: ffff000080b9d1f8 x10: ffff0000846479d8 x9 : ffff8000803057f8
[ 13.031701] x8 : ffff80008459bcf0 x7 : 0000000000000001 x6 : ffff800082b84d38
[ 13.038825] x5 : 0000000000000000 x4 : 0000000080000000 x3 : ffff80008377d000
[ 13.045949] x2 : 0000000000000001 x1 : 000000000000ffff x0 : 000000000000fffc
[ 13.047210] systemd-shutdown[1]: Watchdog running with a timeout of 1min.
[ 13.053073] Call trace:
[ 13.053076] vfree+0x114/0x2e0
[ 13.053083] bpf_jit_free+0x54/0xb8
[ 13.068804] bpf_prog_free_deferred+0x16c/0x1a0
[ 13.073328] process_one_work+0x148/0x3b8
[ 13.077332] worker_thread+0x32c/0x450
[ 13.081076] kthread+0x11c/0x128
[ 13.084300] ret_from_fork+0x10/0x20
[ 13.087874] Code: a9425bf5 a8c57bfd d50323bf d65f03c0 (f9400800)
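The "Mem abort info" lines are the kernel's decoding of ESR = 0x96000004. A minimal sketch of the same field extraction, assuming the ESR_ELx layout from the Arm Architecture Reference Manual (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for data aborts, DFSC in ISS[5:0] and WnR in ISS[6]); struct esr_fields and decode_esr() are hypothetical names, not kernel API.

```c
#include <stdint.h>

/* Fields of ESR_ELx for a data abort (layout per the Arm ARM). */
struct esr_fields {
	unsigned int ec;   /* exception class, bits [31:26] */
	unsigned int il;   /* instruction length, bit [25]: 1 = 32-bit */
	uint32_t iss;      /* instruction specific syndrome, bits [24:0] */
	unsigned int dfsc; /* data fault status code, ISS[5:0] */
	unsigned int wnr;  /* write-not-read, ISS[6]: 0 = read */
};

/* Hypothetical helper: split a raw ESR value into the fields above. */
static struct esr_fields decode_esr(uint64_t esr)
{
	struct esr_fields f;

	f.ec = (unsigned int)((esr >> 26) & 0x3f);
	f.il = (unsigned int)((esr >> 25) & 0x1);
	f.iss = (uint32_t)(esr & 0x1ffffff);
	f.dfsc = f.iss & 0x3f;
	f.wnr = (f.iss >> 6) & 0x1;
	return f;
}
```

decode_esr(0x96000004) yields EC 0x25, IL 1, ISS 0x4, DFSC 0x04 and WnR 0: a read that hit a level 0 translation fault, matching the log.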
Part of the objdump of vfree():
ffff8000803057f4: 97f8c73d bl ffff8000801374e8 <__rcu_read_lock>
ffff8000803057f8: f9400681 ldr x1, [x20, #8]
ffff8000803057fc: d1000420 sub x0, x1, #0x1
ffff800080305800: f240003f tst x1, #0x1
ffff800080305804: 9a941000 csel x0, x0, x20, ne // ne = any
ffff800080305808: f9401c01 ldr x1, [x0, #56]
ffff80008030580c: 927ef420 and x0, x1, #0xfffffffffffffffc
ffff800080305810: 37080421 tbnz w1, #1, ffff800080305894 <vfree+0x114>
ffff800080305814: b40000e0 cbz x0, ffff800080305830 <vfree+0xb0>
ffff800080305818: d53b4236 mrs x22, daif
ffff80008030581c: d50343df msr daifset, #0x3
ffff800080305820: 12800002 mov w2, #0xffffffff // #-1
ffff800080305824: 528005c1 mov w1, #0x2e // #46
ffff800080305828: 94015eac bl ffff80008035d2d8 <__mod_memcg_state>
ffff80008030582c: d51b4236 msr daif, x22
ffff800080305830: 97f8eafa bl ffff800080140418 <__rcu_read_unlock>
ffff800080305834: aa1403e0 mov x0, x20
ffff800080305838: 52800001 mov w1, #0x0 // #0
ffff80008030583c: 94001847 bl ffff80008030b958 <__free_pages>
ffff800080305840: 11000673 add w19, w19, #0x1
ffff800080305844: b9402ea0 ldr w0, [x21, #44]
ffff800080305848: f94012a1 ldr x1, [x21, #32]
......
ffff80008030588c: d50323bf autiasp
ffff800080305890: d65f03c0 ret
ffff800080305894: f9400800 ldr x0, [x0, #16]
ffff800080305898: 17ffffdf b ffff800080305814 <vfree+0x94>
ffff80008030589c: a90363f7 stp x23, x24, [sp, #48]
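Read against the source, the instructions from vfree+0x78 onward look like an inlined page_folio() lookup followed by the memcg_data load: page->compound_head is loaded from [x20, #8], and if bit 0 is set the folio is head - 1, otherwise it is the page in x20 itself; 'ldr x1, [x0, #56]' then fetches memcg_data and 'tbnz w1, #1' tests MEMCG_DATA_KMEM. A minimal sketch of the folio selection on raw values, assuming x20 holds the struct page pointer (it is also what is later passed to __free_pages()); folio_from_page() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring vfree+0x78..0x84 in the objdump:
 *   ldr  x1, [x20, #8]    head = page->compound_head
 *   sub  x0, x1, #1       candidate = head - 1
 *   tst  x1, #1           is bit 0 (tail-page marker) set?
 *   csel x0, x0, x20, ne  folio = bit0 ? head - 1 : page
 * The result is then used for "ldr x1, [x0, #56]" (memcg_data).
 */
static uint64_t folio_from_page(uint64_t page, uint64_t compound_head)
{
	if (compound_head & 1)
		return compound_head - 1; /* tail page: folio is the head page */
	return page;                    /* not a tail page: use page itself */
}
```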
Thanks
Peng.
* Re: [Oops] vfree abort in bpf_jit_free with memcg_data value 0xffff
From: Roman Gushchin @ 2024-06-04 0:50 UTC
To: Peng Fan
Cc: linux-mm, bpf, daniel, ast, zlim.lnx, cgroups, hannes, mhocko,
shakeelb, muchun.song
On Mon, Jun 03, 2024 at 09:10:43AM +0000, Peng Fan wrote:
> Hi All,
>
> We are running a 6.6 kernel on the NXP i.MX95 platform and have hit an issue
> that is very hard to reproduce. The panic log is at the end. I checked the
> registers against the source code:
Hi!
Do you know by any chance whether the issue is reproducible on newer kernels?
At first glance, I doubt it's a generic memory accounting issue,
otherwise we'd see a lot more instances of it. So my guess is that it's
something related to the BPF JIT code. It seems there have been heavy
changes since 6.6, which is why I'm asking about newer kernels.
Thanks!
* RE: [Oops] vfree abort in bpf_jit_free with memcg_data value 0xffff
From: Peng Fan @ 2024-06-04 2:20 UTC
To: Roman Gushchin
Cc: linux-mm, bpf, daniel, ast, zlim.lnx, cgroups, hannes, mhocko,
shakeelb, muchun.song
Hi Roman,
> Hi!
>
> Do you know by any chance whether the issue is reproducible on newer kernels?
>
> At first glance, I doubt it's a generic memory accounting issue,
> otherwise we'd see a lot more instances of it. So my guess is that it's
> something related to the BPF JIT code. It seems there have been heavy
> changes since 6.6, which is why I'm asking about newer kernels.
I don't have a full test environment with a newer kernel; i.MX95 support
has not landed in the upstream repo yet.
After enabling DEBUG_VM, I got the new warning dump from virt_to_phys below,
so I am wondering whether DMA is corrupting memory. I am now redoing the test
with the DPU disabled to see how it goes.
[ 2.992655] ------------[ cut here ]------------
[ 3.003764] virt_to_phys used for non-linear address: 00000000897eac93 (0xffff800086001000)
[ 3.004944] sysctr_timer_read_write:10024 retry: 1
[ 3.012196] WARNING: CPU: 0 PID: 11 at arch/arm64/mm/physaddr.c:12 __virt_to_phys+0x68/0x98
[ 3.025243] Modules linked in:
[ 3.028312] CPU: 0 PID: 11 Comm: kworker/u12:0 Not tainted 6.6.23-06226-g4986cc3e1b75-dirty #251
[ 3.037098] Hardware name: NXP i.MX95 19X19 board (DT)
[ 3.042239] Workqueue: events_unbound deferred_probe_work_func
[ 3.044953] sysctr_timer_read_write:10024 retry: 1
[ 3.048079] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.059796] pc : __virt_to_phys+0x68/0x98
[ 3.063809] lr : __virt_to_phys+0x68/0x98
[ 3.067839] sp : ffff800082de3990
[ 3.071141] x29: ffff800082de3990 x28: 0000000000000000 x27: 0000000034325258
[ 3.078282] x26: ffff000084748000 x25: ffff0000818ba800 x24: ffff00008471dc00
[ 3.084954] sysctr_timer_read_write:10024 retry: 1
[ 3.085423] x23: 0000000000000000 x22: ffff0000818ba200 x21: ffff00008080bc00
[ 3.097323] x20: ffff0000847345c0 x19: ffff800086001000 x18: 0000000000000006
[ 3.104447] x17: 6666783028203339 x16: 6361653739383030 x15: 303030303030203a
[ 3.111588] x14: 7373657264646120 x13: 2930303031303036 x12: 3830303038666666
[ 3.118712] x11: 6678302820333963 x10: 0000000000000a90 x9 : ffff8000800e04a0
[ 3.120954] sysctr_timer_read_write:10024 retry: 1
[ 3.125836] x8 : ffff0000803d28f0 x7 : 000000006273d88e x6 : 0000000000000400
[ 3.137736] x5 : 00000000410fd050 x4 : 0000000000f0000f x3 : 0000000000200000
[ 3.144894] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000803d1e00
[ 3.152036] Call trace:
[ 3.154489] __virt_to_phys+0x68/0x98
[ 3.158163] drm_fbdev_dma_helper_fb_probe+0x138/0x238
[ 3.163294] __drm_fb_helper_initial_config_and_unlock+0x2b0/0x4c0
[ 3.169012] sysctr_timer_read_write:10024 retry: 1
[ 3.169498] drm_fb_helper_initial_config+0x4c/0x68
[ 3.177000] sysctr_timer_read_write:10024 retry: 1
[ 3.179136] drm_fbdev_dma_client_hotplug+0x8c/0xe0
[ 3.188773] drm_client_register+0x60/0xb0
[ 3.192881] drm_fbdev_dma_setup+0x94/0x148
[ 3.197059] dpu95_probe+0xc4/0x130
[ 3.200577] platform_probe+0x70/0xd0
[ 3.204252] really_probe+0x150/0x2c0
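The warning comes from the linear-map check in arch/arm64/mm/physaddr.c. A minimal sketch of that range test, assuming the 48-bit VA layout reported in the Oops (PAGE_OFFSET = 0xffff000000000000, PAGE_END = 0xffff800000000000) and ignoring pointer tags; the SKETCH_* macros and is_lm_address() are illustrative stand-ins for the kernel's PAGE_OFFSET, PAGE_END and __is_lm_address().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed arm64 bounds for VA_BITS = 48:
 *   PAGE_OFFSET = -(1 << 48), PAGE_END = -(1 << 47).
 * Other VA_BITS configurations use different values.
 */
#define SKETCH_PAGE_OFFSET UINT64_C(0xffff000000000000)
#define SKETCH_PAGE_END    UINT64_C(0xffff800000000000)

/*
 * Same shape as arm64's __is_lm_address(): a single unsigned subtraction
 * and compare checks both bounds, because addresses below PAGE_OFFSET
 * wrap around to a huge value. Pointer tags (__tag_reset) are ignored.
 */
static bool is_lm_address(uint64_t addr)
{
	return (addr - SKETCH_PAGE_OFFSET) <
	       (SKETCH_PAGE_END - SKETCH_PAGE_OFFSET);
}
```

0xffff800086001000 lies above PAGE_END, so under these assumptions it is not a linear-map address and virt_to_phys() on it is expected to warn; an address such as 0xffff0000803d1e00 (x0 in the dump) passes the check.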
Thanks
Peng
* RE: [Oops] vfree abort in bpf_jit_free with memcg_data value 0xffff
From: Peng Fan @ 2024-06-04 14:52 UTC
To: Peng Fan, Roman Gushchin
Cc: linux-mm, bpf, daniel, ast, zlim.lnx, cgroups, hannes, mhocko,
shakeelb, muchun.song
> After enabling DEBUG_VM, I got a new warning dump from virt_to_phys,
> so I am wondering whether DMA is corrupting memory. I am now redoing the test
> with the DPU disabled to see how it goes.
After addressing the virt_to_phys issue, I still see bpf_jit_free trigger
a kernel panic.
Do you have any suggestions on how I could reproduce this issue faster?
Currently I am running a Linux reboot test, which needs several hours or more
to reproduce this issue.
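One possible way to exercise the failing path far more often than a full reboot loop is to load and drop small BPF programs in a tight loop, since each dropped program is freed through bpf_prog_free_deferred() and, when the JIT is on, bpf_jit_free(). This is an untested sketch: it assumes root (or CAP_BPF), net.core.bpf_jit_enable=1, and that the corruption is related to this path at all rather than to DMA as suspected above. churn_bpf_progs() and load_and_close_trivial_prog() are hypothetical names; the code uses only the bpf(2) syscall with BPF_PROG_LOAD from <linux/bpf.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Load a trivial socket filter ("r0 = 0; exit") and close it at once.
 * Dropping the last fd releases the program; the kernel frees it
 * asynchronously (RCU, then a workqueue), which with the JIT enabled
 * should go through bpf_prog_free_deferred() -> bpf_jit_free().
 * Returns 0 on success or a negative errno.
 */
static int load_and_close_trivial_prog(void)
{
	struct bpf_insn insns[2];
	union bpf_attr attr;
	int fd;

	memset(insns, 0, sizeof(insns));
	insns[0].code = BPF_ALU64 | BPF_MOV | BPF_K; /* r0 = 0 */
	insns[0].dst_reg = BPF_REG_0;
	insns[0].imm = 0;
	insns[1].code = BPF_JMP | BPF_EXIT;          /* return r0 */

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (__u64)(unsigned long)insns;
	attr.insn_cnt = 2;
	attr.license = (__u64)(unsigned long)"GPL";

	fd = (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}

/*
 * Hypothetical stress loop: returns the number of completed load/close
 * cycles, or a negative errno if the very first load fails.
 */
long churn_bpf_progs(long iterations)
{
	long done = 0;

	for (long i = 0; i < iterations; i++) {
		int ret = load_and_close_trivial_prog();

		if (ret < 0)
			return done ? done : ret;
		done++;
	}
	return done;
}
```

Call churn_bpf_progs() with a large iteration count from a small main(), ideally alongside the display/DMA load the reboot test normally runs. Because the frees are asynchronous, a short sleep between batches may help keep the backlog of pending frees small.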
Thanks,
Peng.