* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  [not found] ` <aZReMzl-S9KM_snh@nidhogg.toxiclabs.cc>
@ 2026-02-18 11:36 ` Vlastimil Babka
  2026-02-18 21:25   ` Shakeel Butt
  2026-02-22 10:08   ` Venkat Rao Bagalkote
  0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2026-02-18 11:36 UTC
  To: Carlos Maiolino, Venkat Rao Bagalkote, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt
  Cc: linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Harry Yoo, Hao Li

On 2/17/26 13:40, Carlos Maiolino wrote:
> On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>>
>> I am observing below OOPs, while running xfstests generic/428 test case. But
>> I am not able to reproduce this consistently.
>>
>>
>> Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
>> Kernel: 6.19.0-next-20260216
>> Tests: generic/428
>>
>> local.config >>>
>> [xfs_4k]
>> export RECREATE_TEST_DEV=true
>> export TEST_DEV=/dev/loop0
>> export TEST_DIR=/mnt/test
>> export SCRATCH_DEV=/dev/loop1
>> export SCRATCH_MNT=/mnt/scratch
>> export MKFS_OPTIONS="-b size=4096"
>> export FSTYP=xfs
>> export MOUNT_OPTIONS=""-
>>
>>
>>
>> Attached is .config file used.
>>
>>
>> Traces:
>>
>
> /me fixing trace's indentation

CCing memcg and slab folks.
Would be nice to figure out where in drain_obj_stock things got wrong. Any
chance for e.g. ./scripts/faddr2line ?

I wonder if we have either some bogus objext pointer, or maybe the
rcu_free_sheaf() context is new (or previously rare) for memcg and we have
some locking issues being exposed in refill/drain.

>>
>> [ 6054.957411] run fstests generic/428 at 2026-02-16 22:25:57
>> [ 6055.136443] Kernel attempted to read user page (0) - exploit attempt?
>> (uid: 0) >> [ 6055.136474] BUG: Kernel NULL pointer dereference on read at 0x00000000 >> [ 6055.136485] Faulting instruction address: 0xc0000000008aff0c >> [ 6055.136495] Oops: Kernel access of bad area, sig: 11 [#1] >> [ 6055.136505] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries >> [ 6055.136517] Modules linked in: dm_thin_pool dm_persistent_data >> dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop dm_mod nft_fib_inet >> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 >> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack >> nf_defrag_ipv6 nf_defrag_ipv4 bonding ip_set tls nf_tables rfkill sunrpc >> nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 >> nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp >> pseries_wdt [last unloaded: scsi_debug] >> [ 6055.136684] CPU: 19 UID: 0 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: >> G W 6.19.0-next-20260216 #1 PREEMPTLAZY >> [ 6055.136701] Tainted: [W]=WARN >> [ 6055.136708] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 >> 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries >> [ 6055.136719] NIP: c0000000008aff0c LR: c0000000008aff00 CTR: >> c00000000036d5e0 >> [ 6055.136730] REGS: c000000d0dc877c0 TRAP: 0300 Tainted: G W >> (6.19.0-next-20260216) >> [ 6055.136742] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84042802 XER: 20040037 >> [ 6055.136777] CFAR: c000000000862a74 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 >> [ 6055.136777] GPR00: c0000000008aff00 c000000d0dc87a60 c00000000243a500 0000000000000001 >> [ 6055.136777] GPR04: 0000000000000008 0000000000000001 c0000000008aff00 0000000000000001 >> [ 6055.136777] GPR08: a80e000000000000 0000000000000001 0000000000000007 >> a80e000000000000 >> [ 6055.136777] GPR12: c00e00000c46e6d5 c000000d0ddf0b00 c000000019069a00 >> 0000000000000006 >> [ 6055.136777] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980 >> c000000007012f88 >> [ 6055.136777] GPR20: 
c00c0000004d7cec c000000d0d10f008 0000000000000001 >> ffffffffffffff78 >> [ 6055.136777] GPR24: 0000000000000005 c000000d0d58f180 c0000001d0795e00 >> c000000d0d10f01c >> [ 6055.136777] GPR28: c000000d0d10f008 c000000d0d10f010 c0000001d0795e08 >> 0000000000000000 >> [ 6055.136891] NIP [c0000000008aff0c] drain_obj_stock+0x620/0xa48 >> [ 6055.136905] LR [c0000000008aff00] drain_obj_stock+0x614/0xa48 >> [ 6055.136915] Call Trace: >> [ 6055.136919] [c000000d0dc87a60] [c0000000008aff00] drain_obj_stock+0x614/0xa48 (unreliable) >> [ 6055.136933] [c000000d0dc87b10] [c0000000008b27e4] refill_obj_stock+0x104/0x680 >> [ 6055.136945] [c000000d0dc87b90] [c0000000008b9238] __memcg_slab_free_hook+0x238/0x3ec >> [ 6055.136956] [c000000d0dc87c60] [c0000000007f39a0] __rcu_free_sheaf_prepare+0x314/0x3e8 >> [ 6055.136968] [c000000d0dc87d10] [c0000000007fbf0c] rcu_free_sheaf+0x38/0x170 >> [ 6055.136980] [c000000d0dc87d50] [c0000000003344b0] rcu_do_batch+0x2ec/0xfa8 >> [ 6055.136992] [c000000d0dc87e50] [c000000000339948] rcu_core+0x22c/0x48c >> [ 6055.137002] [c000000d0dc87ec0] [c0000000001cfe6c] handle_softirqs+0x1f4/0x74c >> [ 6055.137013] [c000000d0dc87fe0] [c00000000001b0cc] do_softirq_own_stack+0x60/0x7c >> [ 6055.137025] [c000000009717930] [c00000000001b0b8] do_softirq_own_stack+0x4c/0x7c >> [ 6055.137036] [c000000009717960] [c0000000001cf128] __irq_exit_rcu+0x268/0x308 >> [ 6055.137046] [c0000000097179a0] [c0000000001d0ba4] irq_exit+0x20/0x38 >> [ 6055.137056] [c0000000097179c0] [c0000000000315f4] interrupt_async_exit_prepare.constprop.0+0x18/0x2c >> [ 6055.137069] [c0000000097179e0] [c000000000009ffc] decrementer_common_virt+0x28c/0x290 >> [ 6055.137080] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c >> [ 6055.137090] NIP: c00000000012d8f0 LR: c00000000135c3fc CTR: 0000000000000000 >> [ 6055.137097] REGS: c000000009717a10 TRAP: 0900 Tainted: G W (6.19.0-next-20260216) >> [ 6055.137105] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24000804 XER: 
00000037 >> [ 6055.137134] CFAR: 0000000000000000 IRQMASK: 0 >> [ 6055.137134] GPR00: 0000000000000000 c000000009717cb0 c00000000243a500 0000000000000000 >> [ 6055.137134] GPR04: 0000000000000000 800400002fe6fc10 0000000000000000 0000000000000001 >> [ 6055.137134] GPR08: 0000000000000033 0000000000000000 0000000000000090 0000000000000001 >> [ 6055.137134] GPR12: 800400002fe6fc00 c000000d0ddf0b00 0000000000000000 000000002ef01a60 >> [ 6055.137134] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 6055.137134] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 >> [ 6055.137134] GPR24: 0000000000000000 c000000004d7a778 00000581d1a507b8 0000000000000000 >> [ 6055.137134] GPR28: 0000000000000000 0000000000000001 c0000000032b18d8 c0000000032b18e0 >> [ 6055.137229] NIP [c00000000012d8f0] plpar_hcall_norets_notrace+0x18/0x2c >> [ 6055.137238] LR [c00000000135c3fc] cede_processor.isra.0+0x1c/0x30 >> [ 6055.137248] ---- interrupt: 900 >> [ 6055.137253] [c000000009717cb0] [c000000009717cf0] 0xc000000009717cf0 (unreliable) >> [ 6055.137265] [c000000009717d10] [c0000000019af160] dedicated_cede_loop+0x90/0x170 >> [ 6055.137277] [c000000009717d60] [c0000000019aeb10] cpuidle_enter_state+0x394/0x480 >> [ 6055.137288] [c000000009717e00] [c0000000013589ec] cpuidle_enter+0x64/0x9c >> [ 6055.137298] [c000000009717e50] [c000000000284a8c] call_cpuidle+0x7c/0xf8 >> [ 6055.137310] [c000000009717e90] [c000000000290398] cpuidle_idle_call+0x1c4/0x2b4 >> [ 6055.137321] [c000000009717f00] [c0000000002905bc] do_idle+0x134/0x208 >> [ 6055.137330] [c000000009717f50] [c000000000290a0c] cpu_startup_entry+0x60/0x64 >> [ 6055.137341] [c000000009717f80] [c0000000000744b8] start_secondary+0x3fc/0x400 >> [ 6055.137352] [c000000009717fe0] [c00000000000e258] start_secondary_prolog+0x10/0x14 >> [ 6055.137363] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 ebfa0008 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 <7d40f8a8> 7d495050 7d40f9ad 
40c2fff4
>> [ 6055.137400] ---[ end trace 0000000000000000 ]---
>
> Again, nothing here seems to point to a xfs problem.
>
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-18 11:36 ` [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) Vlastimil Babka
@ 2026-02-18 21:25   ` Shakeel Butt
  2026-02-22 10:08   ` Venkat Rao Bagalkote
  1 sibling, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2026-02-18 21:25 UTC
  To: Vlastimil Babka
  Cc: Carlos Maiolino, Venkat Rao Bagalkote, Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Harry Yoo, Hao Li

On Wed, Feb 18, 2026 at 12:36:06PM +0100, Vlastimil Babka wrote:
> On 2/17/26 13:40, Carlos Maiolino wrote:
> > On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
> >> Greetings!!!
> >>
> >> I am observing below OOPs, while running xfstests generic/428 test case. But
> >> I am not able to reproduce this consistently.
> >>
> >>
> >> Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
> >> Kernel: 6.19.0-next-20260216
> >> Tests: generic/428
> >>
> >> local.config >>>
> >> [xfs_4k]
> >> export RECREATE_TEST_DEV=true
> >> export TEST_DEV=/dev/loop0
> >> export TEST_DIR=/mnt/test
> >> export SCRATCH_DEV=/dev/loop1
> >> export SCRATCH_MNT=/mnt/scratch
> >> export MKFS_OPTIONS="-b size=4096"
> >> export FSTYP=xfs
> >> export MOUNT_OPTIONS=""-
> >>
> >>
> >>
> >> Attached is .config file used.
> >>
> >>
> >> Traces:
> >>
> >
> > /me fixing trace's indentation
>
> CCing memcg and slab folks.
> Would be nice to figure out where in drain_obj_stock things got wrong. Any
> chance for e.g. ./scripts/faddr2line ?
>
> I wonder if we have either some bogus objext pointer, or maybe the
> rcu_free_sheaf() context is new (or previously rare) for memcg and we have
> some locking issues being exposed in refill/drain.
>

Yes output of ./scripts/faddr2line would be really helpful. I can't think of
anything that might go wrong in refill/drain.
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-18 11:36 ` [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) Vlastimil Babka
  2026-02-18 21:25   ` Shakeel Butt
@ 2026-02-22 10:08   ` Venkat Rao Bagalkote
  2026-02-22 11:47     ` Harry Yoo
  1 sibling, 1 reply; 7+ messages in thread
From: Venkat Rao Bagalkote @ 2026-02-22 10:08 UTC
  To: Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt
  Cc: linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Harry Yoo, Hao Li

On 18/02/26 5:06 pm, Vlastimil Babka wrote:
> On 2/17/26 13:40, Carlos Maiolino wrote:
>> On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
>>> Greetings!!!
>>>
>>> I am observing below OOPs, while running xfstests generic/428 test case. But
>>> I am not able to reproduce this consistently.
>>>
>>>
>>> Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
>>> Kernel: 6.19.0-next-20260216
>>> Tests: generic/428
>>>
>>> local.config >>>
>>> [xfs_4k]
>>> export RECREATE_TEST_DEV=true
>>> export TEST_DEV=/dev/loop0
>>> export TEST_DIR=/mnt/test
>>> export SCRATCH_DEV=/dev/loop1
>>> export SCRATCH_MNT=/mnt/scratch
>>> export MKFS_OPTIONS="-b size=4096"
>>> export FSTYP=xfs
>>> export MOUNT_OPTIONS=""-
>>>
>>>
>>>
>>> Attached is .config file used.
>>>
>>>
>>> Traces:
>>>
>> /me fixing trace's indentation
> CCing memcg and slab folks.
> Would be nice to figure out where in drain_obj_stock things got wrong. Any
> chance for e.g. ./scripts/faddr2line ?
>
> I wonder if we have either some bogus objext pointer, or maybe the
> rcu_free_sheaf() context is new (or previously rare) for memcg and we have
> some locking issues being exposed in refill/drain.

This issue also got reproduced on mainline repo.

Traces:

[ 8058.036083] Kernel attempted to read user page (0) - exploit attempt?
(uid: 0) [ 8058.036116] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 8058.036127] Faulting instruction address: 0xc0000000008b018c [ 8058.036137] Oops: Kernel access of bad area, sig: 11 [#1] [ 8058.036147] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries [ 8058.036159] Modules linked in: overlay dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set bonding nf_tables tls rfkill sunrpc nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp pseries_wdt [last unloaded: scsi_debug] [ 8058.036339] CPU: 19 UID: 0 PID: 115 Comm: ksoftirqd/19 Kdump: loaded Not tainted 6.19.0+ #1 PREEMPTLAZY [ 8058.036361] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries [ 8058.036379] NIP: c0000000008b018c LR: c0000000008b0180 CTR: c00000000036d680 [ 8058.036395] REGS: c00000000b5976c0 TRAP: 0300 Not tainted (6.19.0+) [ 8058.036411] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 84042002 XER: 20040000 [ 8058.036482] CFAR: c000000000862cf4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 [ 8058.036482] GPR00: c0000000008b0180 c00000000b597960 c00000000243a500 0000000000000001 [ 8058.036482] GPR04: 0000000000000008 0000000000000001 c0000000008b0180 0000000000000001 [ 8058.036482] GPR08: a80e000000000000 0000000000000001 0000000000000007 a80e000000000000 [ 8058.036482] GPR12: c00e00000120f8d5 c000000d0ddf0b00 c000000073567780 0000000000000006 [ 8058.036482] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980 c000000007012f88 [ 8058.036482] GPR20: c00c000001c3daac c000000d0d10f008 0000000000000001 ffffffffffffff78 [ 8058.036482] GPR24: 0000000000000005 c000000d0d58f180 
c00000000cd6f580 c000000d0d10f01c [ 8058.036482] GPR28: c000000d0d10f008 c000000d0d10f010 c00000000cd6f588 0000000000000000 [ 8058.036628] NIP [c0000000008b018c] drain_obj_stock+0x620/0xa48 [ 8058.036646] LR [c0000000008b0180] drain_obj_stock+0x614/0xa48 [ 8058.036659] Call Trace: [ 8058.036665] [c00000000b597960] [c0000000008b0180] drain_obj_stock+0x614/0xa48 (unreliable) [ 8058.036688] [c00000000b597a10] [c0000000008b2a64] refill_obj_stock+0x104/0x680 [ 8058.036715] [c00000000b597a90] [c0000000008b94b8] __memcg_slab_free_hook+0x238/0x3ec [ 8058.036738] [c00000000b597b60] [c0000000007f3c10] __rcu_free_sheaf_prepare+0x314/0x3e8 [ 8058.036763] [c00000000b597c10] [c0000000007fbf70] rcu_free_sheaf_nobarn+0x38/0x78 [ 8058.036788] [c00000000b597c40] [c000000000334550] rcu_do_batch+0x2ec/0xfa8 [ 8058.036812] [c00000000b597d40] [c0000000003399e8] rcu_core+0x22c/0x48c [ 8058.036835] [c00000000b597db0] [c0000000001cfe6c] handle_softirqs+0x1f4/0x74c [ 8058.036862] [c00000000b597ed0] [c0000000001d0458] run_ksoftirqd+0x94/0xb8 [ 8058.036885] [c00000000b597f00] [c00000000022a130] smpboot_thread_fn+0x450/0x648 [ 8058.036912] [c00000000b597f80] [c000000000218408] kthread+0x244/0x28c [ 8058.036927] [c00000000b597fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18 [ 8058.036943] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 ebfa0008 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 <7d40f8a8> 7d495050 7d40f9ad 40c2fff4 [ 8058.037000] ---[ end trace 0000000000000000 ]--- And below is the corresponding o/p from faddr2line. 
drain_obj_stock+0x620/0xa48:
arch_atomic64_sub_return_relaxed at arch/powerpc/include/asm/atomic.h:272
(inlined by) raw_atomic64_sub_return at include/linux/atomic/atomic-arch-fallback.h:2917
(inlined by) raw_atomic64_sub_and_test at include/linux/atomic/atomic-arch-fallback.h:4386
(inlined by) raw_atomic_long_sub_and_test at include/linux/atomic/atomic-long.h:1551
(inlined by) atomic_long_sub_and_test at include/linux/atomic/atomic-instrumented.h:4522
(inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
(inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
(inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
(inlined by) drain_obj_stock at mm/memcontrol.c:3059

drain_obj_stock+0x614/0xa48:
instrument_atomic_read_write at include/linux/instrumented.h:112
(inlined by) atomic_long_sub_and_test at include/linux/atomic/atomic-instrumented.h:4521
(inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
(inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
(inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
(inlined by) drain_obj_stock at mm/memcontrol.c:3059

refill_obj_stock+0x104/0x680:
__preempt_count_add at include/asm-generic/preempt.h:54
(inlined by) __rcu_read_lock at include/linux/rcupdate.h:103
(inlined by) rcu_read_lock at include/linux/rcupdate.h:848
(inlined by) percpu_ref_get_many at include/linux/percpu-refcount.h:202
(inlined by) percpu_ref_get at include/linux/percpu-refcount.h:222
(inlined by) obj_cgroup_get at include/linux/memcontrol.h:782
(inlined by) refill_obj_stock at mm/memcontrol.c:3099

__memcg_slab_free_hook+0x238/0x3ec:
__preempt_count_add at include/asm-generic/preempt.h:54
(inlined by) __rcu_read_lock at include/linux/rcupdate.h:103
(inlined by) rcu_read_lock at include/linux/rcupdate.h:848
(inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:330
(inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
(inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
(inlined by) __memcg_slab_free_hook at mm/memcontrol.c:3284

__rcu_free_sheaf_prepare+0x314/0x3e8:
memcg_slab_free_hook at mm/slub.c:2486
(inlined by) __rcu_free_sheaf_prepare at mm/slub.c:2914

rcu_free_sheaf_nobarn+0x38/0x78:
sheaf_flush_unused at mm/slub.c:2893
(inlined by) rcu_free_sheaf_nobarn at mm/slub.c:2941

rcu_do_batch+0x2ec/0xfa8:
rcu_do_batch at kernel/rcu/tree.c:2617

rcu_core+0x22c/0x48c:
rcu_core at kernel/rcu/tree.c:2871

handle_softirqs+0x1f4/0x74c:
handle_softirqs at kernel/softirq.c:622

run_ksoftirqd+0x94/0xb8:
arch_local_irq_enable at arch/powerpc/include/asm/hw_irq.h:201
(inlined by) ksoftirqd_run_end at kernel/softirq.c:479
(inlined by) run_ksoftirqd at kernel/softirq.c:1064
(inlined by) run_ksoftirqd at kernel/softirq.c:1055

smpboot_thread_fn+0x450/0x648:
smpboot_thread_fn at kernel/smpboot.c:160 (discriminator 3)

kthread+0x244/0x28c:
kthread at kernel/kthread.c:467

start_kernel_thread+0x14/0x18:
start_kernel_thread at arch/powerpc/kernel/interrupt_64.S:771

Regards,

Venkat.

>
>>> [ 6054.957411] run fstests generic/428 at 2026-02-16 22:25:57
>>> [ 6055.136443] Kernel attempted to read user page (0) - exploit attempt?
>>> (uid: 0) >>> [ 6055.136474] BUG: Kernel NULL pointer dereference on read at 0x00000000 >>> [ 6055.136485] Faulting instruction address: 0xc0000000008aff0c >>> [ 6055.136495] Oops: Kernel access of bad area, sig: 11 [#1] >>> [ 6055.136505] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries >>> [ 6055.136517] Modules linked in: dm_thin_pool dm_persistent_data >>> dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop dm_mod nft_fib_inet >>> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 >>> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack >>> nf_defrag_ipv6 nf_defrag_ipv4 bonding ip_set tls nf_tables rfkill sunrpc >>> nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 >>> nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp >>> pseries_wdt [last unloaded: scsi_debug] >>> [ 6055.136684] CPU: 19 UID: 0 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: >>> G W 6.19.0-next-20260216 #1 PREEMPTLAZY >>> [ 6055.136701] Tainted: [W]=WARN >>> [ 6055.136708] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 >>> 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries >>> [ 6055.136719] NIP: c0000000008aff0c LR: c0000000008aff00 CTR: >>> c00000000036d5e0 >>> [ 6055.136730] REGS: c000000d0dc877c0 TRAP: 0300 Tainted: G W >>> (6.19.0-next-20260216) >>> [ 6055.136742] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84042802 XER: 20040037 >>> [ 6055.136777] CFAR: c000000000862a74 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 >>> [ 6055.136777] GPR00: c0000000008aff00 c000000d0dc87a60 c00000000243a500 0000000000000001 >>> [ 6055.136777] GPR04: 0000000000000008 0000000000000001 c0000000008aff00 0000000000000001 >>> [ 6055.136777] GPR08: a80e000000000000 0000000000000001 0000000000000007 >>> a80e000000000000 >>> [ 6055.136777] GPR12: c00e00000c46e6d5 c000000d0ddf0b00 c000000019069a00 >>> 0000000000000006 >>> [ 6055.136777] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980 >>> 
c000000007012f88 >>> [ 6055.136777] GPR20: c00c0000004d7cec c000000d0d10f008 0000000000000001 >>> ffffffffffffff78 >>> [ 6055.136777] GPR24: 0000000000000005 c000000d0d58f180 c0000001d0795e00 >>> c000000d0d10f01c >>> [ 6055.136777] GPR28: c000000d0d10f008 c000000d0d10f010 c0000001d0795e08 >>> 0000000000000000 >>> [ 6055.136891] NIP [c0000000008aff0c] drain_obj_stock+0x620/0xa48 >>> [ 6055.136905] LR [c0000000008aff00] drain_obj_stock+0x614/0xa48 >>> [ 6055.136915] Call Trace: >>> [ 6055.136919] [c000000d0dc87a60] [c0000000008aff00] drain_obj_stock+0x614/0xa48 (unreliable) >>> [ 6055.136933] [c000000d0dc87b10] [c0000000008b27e4] refill_obj_stock+0x104/0x680 >>> [ 6055.136945] [c000000d0dc87b90] [c0000000008b9238] __memcg_slab_free_hook+0x238/0x3ec >>> [ 6055.136956] [c000000d0dc87c60] [c0000000007f39a0] __rcu_free_sheaf_prepare+0x314/0x3e8 >>> [ 6055.136968] [c000000d0dc87d10] [c0000000007fbf0c] rcu_free_sheaf+0x38/0x170 >>> [ 6055.136980] [c000000d0dc87d50] [c0000000003344b0] rcu_do_batch+0x2ec/0xfa8 >>> [ 6055.136992] [c000000d0dc87e50] [c000000000339948] rcu_core+0x22c/0x48c >>> [ 6055.137002] [c000000d0dc87ec0] [c0000000001cfe6c] handle_softirqs+0x1f4/0x74c >>> [ 6055.137013] [c000000d0dc87fe0] [c00000000001b0cc] do_softirq_own_stack+0x60/0x7c >>> [ 6055.137025] [c000000009717930] [c00000000001b0b8] do_softirq_own_stack+0x4c/0x7c >>> [ 6055.137036] [c000000009717960] [c0000000001cf128] __irq_exit_rcu+0x268/0x308 >>> [ 6055.137046] [c0000000097179a0] [c0000000001d0ba4] irq_exit+0x20/0x38 >>> [ 6055.137056] [c0000000097179c0] [c0000000000315f4] interrupt_async_exit_prepare.constprop.0+0x18/0x2c >>> [ 6055.137069] [c0000000097179e0] [c000000000009ffc] decrementer_common_virt+0x28c/0x290 >>> [ 6055.137080] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c >>> [ 6055.137090] NIP: c00000000012d8f0 LR: c00000000135c3fc CTR: 0000000000000000 >>> [ 6055.137097] REGS: c000000009717a10 TRAP: 0900 Tainted: G W (6.19.0-next-20260216) >>> [ 6055.137105] MSR: 
800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24000804 XER: 00000037 >>> [ 6055.137134] CFAR: 0000000000000000 IRQMASK: 0 >>> [ 6055.137134] GPR00: 0000000000000000 c000000009717cb0 c00000000243a500 0000000000000000 >>> [ 6055.137134] GPR04: 0000000000000000 800400002fe6fc10 0000000000000000 0000000000000001 >>> [ 6055.137134] GPR08: 0000000000000033 0000000000000000 0000000000000090 0000000000000001 >>> [ 6055.137134] GPR12: 800400002fe6fc00 c000000d0ddf0b00 0000000000000000 000000002ef01a60 >>> [ 6055.137134] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >>> [ 6055.137134] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 >>> [ 6055.137134] GPR24: 0000000000000000 c000000004d7a778 00000581d1a507b8 0000000000000000 >>> [ 6055.137134] GPR28: 0000000000000000 0000000000000001 c0000000032b18d8 c0000000032b18e0 >>> [ 6055.137229] NIP [c00000000012d8f0] plpar_hcall_norets_notrace+0x18/0x2c >>> [ 6055.137238] LR [c00000000135c3fc] cede_processor.isra.0+0x1c/0x30 >>> [ 6055.137248] ---- interrupt: 900 >>> [ 6055.137253] [c000000009717cb0] [c000000009717cf0] 0xc000000009717cf0 (unreliable) >>> [ 6055.137265] [c000000009717d10] [c0000000019af160] dedicated_cede_loop+0x90/0x170 >>> [ 6055.137277] [c000000009717d60] [c0000000019aeb10] cpuidle_enter_state+0x394/0x480 >>> [ 6055.137288] [c000000009717e00] [c0000000013589ec] cpuidle_enter+0x64/0x9c >>> [ 6055.137298] [c000000009717e50] [c000000000284a8c] call_cpuidle+0x7c/0xf8 >>> [ 6055.137310] [c000000009717e90] [c000000000290398] cpuidle_idle_call+0x1c4/0x2b4 >>> [ 6055.137321] [c000000009717f00] [c0000000002905bc] do_idle+0x134/0x208 >>> [ 6055.137330] [c000000009717f50] [c000000000290a0c] cpu_startup_entry+0x60/0x64 >>> [ 6055.137341] [c000000009717f80] [c0000000000744b8] start_secondary+0x3fc/0x400 >>> [ 6055.137352] [c000000009717fe0] [c00000000000e258] start_secondary_prolog+0x10/0x14 >>> [ 6055.137363] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 
ebfa0008 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 <7d40f8a8> 7d495050 7d40f9ad 40c2fff4
>>> [ 6055.137400] ---[ end trace 0000000000000000 ]---
>> Again, nothing here seems to point to a xfs problem.
>>
>
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-22 10:08 ` Venkat Rao Bagalkote
@ 2026-02-22 11:47   ` Harry Yoo
  2026-02-22 23:36     ` Shakeel Butt
  0 siblings, 1 reply; 7+ messages in thread
From: Harry Yoo @ 2026-02-22 11:47 UTC
  To: Venkat Rao Bagalkote
  Cc: Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li

On Sun, Feb 22, 2026 at 03:38:57PM +0530, Venkat Rao Bagalkote wrote:
>
> On 18/02/26 5:06 pm, Vlastimil Babka wrote:
> > On 2/17/26 13:40, Carlos Maiolino wrote:
> > > On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
> > > > Greetings!!!
> > > >
> > > > I am observing below OOPs, while running xfstests generic/428 test case. But
> > > > I am not able to reproduce this consistently.
> > > >
> > > >
> > > > Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
> > > > Kernel: 6.19.0-next-20260216
> > > > Tests: generic/428
> > > >
> > > > local.config >>>
> > > > [xfs_4k]
> > > > export RECREATE_TEST_DEV=true
> > > > export TEST_DEV=/dev/loop0
> > > > export TEST_DIR=/mnt/test
> > > > export SCRATCH_DEV=/dev/loop1
> > > > export SCRATCH_MNT=/mnt/scratch
> > > > export MKFS_OPTIONS="-b size=4096"
> > > > export FSTYP=xfs
> > > > export MOUNT_OPTIONS=""-
> > > >
> > > >
> > > >
> > > > Attached is .config file used.
> > > >
> > > > Traces:
> > > >
> > > /me fixing trace's indentation
> > CCing memcg and slab folks.
> > Would be nice to figure out where in drain_obj_stock things got wrong. Any
> > chance for e.g. ./scripts/faddr2line ?
> >
> > I wonder if we have either some bogus objext pointer, or maybe the
> > rcu_free_sheaf() context is new (or previously rare) for memcg and we have
> > some locking issues being exposed in refill/drain.
> >
> This issue also got reproduced on mainline repo.
> > Traces: > > [ 8058.036083] Kernel attempted to read user page (0) - exploit attempt? > (uid: 0) > [ 8058.036116] BUG: Kernel NULL pointer dereference on read at 0x00000000 > [ 8058.036127] Faulting instruction address: 0xc0000000008b018c > [ 8058.036137] Oops: Kernel access of bad area, sig: 11 [#1] > [ 8058.036147] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries > [ 8058.036159] Modules linked in: overlay dm_zero dm_thin_pool > dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop > dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set bonding nf_tables tls > rfkill sunrpc nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 > mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth > scsi_transport_srp pseries_wdt [last unloaded: scsi_debug] > [ 8058.036339] CPU: 19 UID: 0 PID: 115 Comm: ksoftirqd/19 Kdump: loaded Not > tainted 6.19.0+ #1 PREEMPTLAZY > [ 8058.036361] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 > 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries > [ 8058.036379] NIP: c0000000008b018c LR: c0000000008b0180 CTR: > c00000000036d680 > [ 8058.036395] REGS: c00000000b5976c0 TRAP: 0300 Not tainted (6.19.0+) > [ 8058.036411] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: > 84042002 XER: 20040000 > [ 8058.036482] CFAR: c000000000862cf4 DAR: 0000000000000000 DSISR: 40000000 > IRQMASK: 0 > [ 8058.036482] GPR00: c0000000008b0180 c00000000b597960 c00000000243a500 > 0000000000000001 > [ 8058.036482] GPR04: 0000000000000008 0000000000000001 c0000000008b0180 > 0000000000000001 > [ 8058.036482] GPR08: a80e000000000000 0000000000000001 0000000000000007 > a80e000000000000 > [ 8058.036482] GPR12: c00e00000120f8d5 c000000d0ddf0b00 c000000073567780 > 0000000000000006 > [ 8058.036482] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980 > 
c000000007012f88 > [ 8058.036482] GPR20: c00c000001c3daac c000000d0d10f008 0000000000000001 > ffffffffffffff78 > [ 8058.036482] GPR24: 0000000000000005 c000000d0d58f180 c00000000cd6f580 > c000000d0d10f01c > [ 8058.036482] GPR28: c000000d0d10f008 c000000d0d10f010 c00000000cd6f588 > 0000000000000000 > [ 8058.036628] NIP [c0000000008b018c] drain_obj_stock+0x620/0xa48 > [ 8058.036646] LR [c0000000008b0180] drain_obj_stock+0x614/0xa48 > [ 8058.036659] Call Trace: > [ 8058.036665] [c00000000b597960] [c0000000008b0180] > drain_obj_stock+0x614/0xa48 (unreliable) > [ 8058.036688] [c00000000b597a10] [c0000000008b2a64] > refill_obj_stock+0x104/0x680 > [ 8058.036715] [c00000000b597a90] [c0000000008b94b8] > __memcg_slab_free_hook+0x238/0x3ec > [ 8058.036738] [c00000000b597b60] [c0000000007f3c10] > __rcu_free_sheaf_prepare+0x314/0x3e8 > [ 8058.036763] [c00000000b597c10] [c0000000007fbf70] > rcu_free_sheaf_nobarn+0x38/0x78 > [ 8058.036788] [c00000000b597c40] [c000000000334550] > rcu_do_batch+0x2ec/0xfa8 > [ 8058.036812] [c00000000b597d40] [c0000000003399e8] rcu_core+0x22c/0x48c > [ 8058.036835] [c00000000b597db0] [c0000000001cfe6c] > handle_softirqs+0x1f4/0x74c > [ 8058.036862] [c00000000b597ed0] [c0000000001d0458] run_ksoftirqd+0x94/0xb8 > [ 8058.036885] [c00000000b597f00] [c00000000022a130] > smpboot_thread_fn+0x450/0x648 > [ 8058.036912] [c00000000b597f80] [c000000000218408] kthread+0x244/0x28c > [ 8058.036927] [c00000000b597fe0] [c00000000000ded8] > start_kernel_thread+0x14/0x18 > [ 8058.036943] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 ebfa0008 > 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 <7d40f8a8> 7d495050 > 7d40f9ad 40c2fff4 > [ 8058.037000] ---[ end trace 0000000000000000 ]--- > > > And below is the corresponding o/p from faddr2line. Thanks! 
> drain_obj_stock+0x620/0xa48:
> arch_atomic64_sub_return_relaxed at arch/powerpc/include/asm/atomic.h:272
> (inlined by) raw_atomic64_sub_return at
> include/linux/atomic/atomic-arch-fallback.h:2917
> (inlined by) raw_atomic64_sub_and_test at
> include/linux/atomic/atomic-arch-fallback.h:4386
> (inlined by) raw_atomic_long_sub_and_test at
> include/linux/atomic/atomic-long.h:1551
> (inlined by) atomic_long_sub_and_test at
> include/linux/atomic/atomic-instrumented.h:4522
> (inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
> (inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
> (inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
> (inlined by) drain_obj_stock at mm/memcontrol.c:3059

It seems it crashed while dereferencing objcg->ref->data->count.
I think that implies that obj_cgroup_release()->percpu_ref_exit()
is already called due to the refcount reaching zero and set
ref->data = NULL.

Wait, was the stock->objcg ever a valid objcg?
I think it should be valid when refilling the obj stock, otherwise
it should have crashed in refill_obj_stock() -> obj_cgroup_get() path
in the first place, rather than crashing when draining.

And that sounds like we're somehow calling obj_cgroup_put() more times
than obj_cgroup_get().

Anyway, this is my theory that it may be due to mis-refcounting of objcgs.

> drain_obj_stock+0x614/0xa48:
> instrument_atomic_read_write at include/linux/instrumented.h:112
> (inlined by) atomic_long_sub_and_test at
> include/linux/atomic/atomic-instrumented.h:4521
> (inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
> (inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
> (inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
> (inlined by) drain_obj_stock at mm/memcontrol.c:3059
> refill_obj_stock+0x104/0x680:

-- 
Cheers,
Harry / Hyeonggon
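[Illustration, not part of the original message] The mis-refcounting theory in the message above can be sketched in userspace C. This is a minimal model under stated assumptions: it assumes, as the analysis suggests, that releasing the reference clears its data pointer, so one put too many finds NULL. The names model_ref, model_ref_get(), model_ref_put() and demo_unbalanced_put() are hypothetical and are not the kernel's percpu_ref API; where the kernel would fault, the model returns -1 instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model, not the kernel's struct percpu_ref. */
struct model_ref_data {
	long count;
};

struct model_ref {
	struct model_ref_data *data;	/* NULL once the ref has been released */
};

/* Start with one reference held by the creator. */
static int model_ref_init(struct model_ref *ref)
{
	ref->data = malloc(sizeof(*ref->data));
	if (!ref->data)
		return -1;
	ref->data->count = 1;
	return 0;
}

/* Take one reference; returns -1 if the ref was already released. */
static int model_ref_get(struct model_ref *ref)
{
	if (!ref->data)
		return -1;
	ref->data->count++;
	return 0;
}

/*
 * Drop one reference.
 * Returns 0 for an ordinary put, 1 when this put released the ref,
 * and -1 for a put after release (the case that would dereference NULL
 * if the check were absent).
 */
static int model_ref_put(struct model_ref *ref)
{
	if (!ref->data)
		return -1;
	if (--ref->data->count > 0)
		return 0;
	free(ref->data);
	ref->data = NULL;
	return 1;
}

/* One extra get, then three puts: the third put is the unbalanced one. */
static int demo_unbalanced_put(void)
{
	struct model_ref ref;
	int first, second;

	if (model_ref_init(&ref))
		return -2;
	if (model_ref_get(&ref))	/* count: 1 -> 2 */
		return -2;
	first = model_ref_put(&ref);	/* count: 2 -> 1 */
	second = model_ref_put(&ref);	/* count: 1 -> 0, released */
	if (first != 0 || second != 1)
		return -2;
	return model_ref_put(&ref);	/* extra put sees data == NULL */
}
```

In this model the extra put is the only call that observes data == NULL, which matches the observation that the get in the refill path did not crash while a later put did. It says nothing about where an extra put would come from in the kernel.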
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) 2026-02-22 11:47 ` Harry Yoo @ 2026-02-22 23:36 ` Shakeel Butt 2026-02-22 23:48 ` Shakeel Butt 0 siblings, 1 reply; 7+ messages in thread From: Shakeel Butt @ 2026-02-22 23:36 UTC (permalink / raw) To: Harry Yoo Cc: Venkat Rao Bagalkote, Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li On Sun, Feb 22, 2026 at 08:47:03PM +0900, Harry Yoo wrote: [...] > > It seems it crashed while dereferencing objcg->ref->data->count. > I think that implies that obj_cgroup_release()->percpu_ref_exit() > is already called due to the refcount reaching zero and set > ref->data = NULL. > > Wait, was the stock->objcg ever a valid objcg? > I think it should be valid when refilling the obj stock, otherwise > it should have crashed in refill_obj_stock() -> obj_cgroup_get() path > in the first place, rather than crashing when draining. > > And that sounds like we're somehow calling obj_cgroup_put() more times > than obj_cgroup_get(). > > Anyway, this is my theory that it may be due to mis-refcounting of objcgs. > I have not looked deeper into recent slub changes (sheafs or obj_exts savings) but one thing looks weird to me: allocate_slab() // for cache with SLAB_OBJ_EXT_IN_OBJ -> alloc_slab_obj_exts_early() -> slab_set_stride(slab, s->size) -> account_slab() -> alloc_slab_obj_exts() -> slab_set_stride(slab, sizeof(struct slabobj_ext)); Unconditional overwrite of stride. Not sure if it is issue or even related to this crash but looks odd. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) 2026-02-22 23:36 ` Shakeel Butt @ 2026-02-22 23:48 ` Shakeel Butt 2026-02-23 2:36 ` Harry Yoo 0 siblings, 1 reply; 7+ messages in thread From: Shakeel Butt @ 2026-02-22 23:48 UTC (permalink / raw) To: Harry Yoo Cc: Venkat Rao Bagalkote, Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li On Sun, Feb 22, 2026 at 03:36:46PM -0800, Shakeel Butt wrote: > On Sun, Feb 22, 2026 at 08:47:03PM +0900, Harry Yoo wrote: > [...] > > > > It seems it crashed while dereferencing objcg->ref->data->count. > > I think that implies that obj_cgroup_release()->percpu_ref_exit() > > is already called due to the refcount reaching zero and set > > ref->data = NULL. > > > > Wait, was the stock->objcg ever a valid objcg? > > I think it should be valid when refilling the obj stock, otherwise > > it should have crashed in refill_obj_stock() -> obj_cgroup_get() path > > in the first place, rather than crashing when draining. > > > > And that sounds like we're somehow calling obj_cgroup_put() more times > > than obj_cgroup_get(). > > > > Anyway, this is my theory that it may be due to mis-refcounting of objcgs. > > > > I have not looked deeper into recent slub changes (sheafs or obj_exts savings) > but one thing looks weird to me: > > allocate_slab() // for cache with SLAB_OBJ_EXT_IN_OBJ > -> alloc_slab_obj_exts_early() > -> slab_set_stride(slab, s->size) > -> account_slab() > -> alloc_slab_obj_exts() > -> slab_set_stride(slab, sizeof(struct slabobj_ext)); > > Unconditional overwrite of stride. Not sure if it is issue or even related to > this crash but looks odd. 
I asked AI to debug this crash report, along with a nudge to look for
stride corruption, and it gave me the following output:

# Stride Corruption Bug Analysis

## Bug Report Context

- **Crash Location**: `drain_obj_stock+0x620/0xa48` in `obj_cgroup_put(old)` at mm/memcontrol.c:3059
- **Root Cause**: `percpu_ref.data` is NULL, meaning `obj_cgroup_release()` already ran
- **Platform**: IBM Power11 (pSeries LPAR, Radix MMU, LE, 64K pages, kernel 6.19.0-next-20260216)
- **Trigger**: xfstests generic/428

## Identified Bug: Unconditional Stride Overwrite

### Location: mm/slub.c lines 2196-2223 (alloc_slab_obj_exts)

```c
retry:
    old_exts = READ_ONCE(slab->obj_exts);
    handle_failed_objexts_alloc(old_exts, vec, objects);
    slab_set_stride(slab, sizeof(struct slabobj_ext));  // BUG: UNCONDITIONALLY SET

    if (new_slab) {
        slab->obj_exts = new_exts;
    } else if (old_exts & ~OBJEXTS_FLAGS_MASK) {
        // obj_exts already exists, BUT stride was already modified above!
        mark_objexts_empty(vec);
        kfree(vec);
        return 0;
    } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
        goto retry;
    }
```

### The Problem

The stride is set to `sizeof(struct slabobj_ext)` **BEFORE** checking if `obj_exts` already
exists. If a slab was created with `SLAB_OBJ_EXT_IN_OBJ` mode (where stride = `s->size`),
and later `alloc_slab_obj_exts` is called for any reason, the stride gets corrupted.

### Stride Modes

There are two stride modes (see alloc_slab_obj_exts_early):

1. **Normal mode**: stride = `sizeof(struct slabobj_ext)` (~16 bytes)
   - obj_exts is a separate array or in slab leftover space

2.
**SLAB_OBJ_EXT_IN_OBJ mode**: stride = `s->size` (object size, e.g., 64-256+ bytes) - obj_ext is embedded within each object at a fixed offset ### Consequences of Wrong Stride When `slab_obj_ext` is later called: ```c obj_ext = (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index); ``` With corrupted stride (16 instead of 256): - **Expected**: `obj_exts + 256 * 5 = obj_exts + 1280` (correct obj_ext for object 5) - **Actual**: `obj_exts + 16 * 5 = obj_exts + 80` (WRONG obj_ext - belongs to object 0!) This causes: 1. Reading wrong object's objcg pointer 2. Releasing wrong objcg reference (`obj_cgroup_put`) 3. Reference underflow on victim objcg 4. Early `obj_cgroup_release()` → `percpu_ref_exit()` → `data = NULL` 5. Stock still caches the objcg pointer 6. Later `drain_obj_stock()` tries to put it → **CRASH** ## Missing Safety Check `slab_obj_ext()` in mm/slab.h has **no bounds checking**: ```c static inline struct slabobj_ext *slab_obj_ext(struct slab *slab, unsigned long obj_exts, unsigned int index) { struct slabobj_ext *obj_ext; VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab)); // MISSING: VM_WARN_ON_ONCE(index >= slab->objects); obj_ext = (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index); return kasan_reset_tag(obj_ext); } ``` ## CRITICAL: Memory Ordering Bug on PowerPC (Likely Root Cause) ### The Problem In `alloc_slab_obj_exts` (mm/slub.c lines 2199-2220), there is **NO memory barrier** between the stride store and the obj_exts visibility via cmpxchg: ```c slab_set_stride(slab, sizeof(struct slabobj_ext)); // Store to stride (line 2199) // NO MEMORY BARRIER HERE! if (new_slab) { slab->obj_exts = new_exts; // Store to obj_exts (line 2207) } else if (...) { } else if (cmpxchg(&slab->obj_exts, ...) != ...) { // Atomic on obj_exts (line 2220) goto retry; } ``` ### Why This Crashes on PowerPC PowerPC has a **weakly-ordered memory model**. Stores can be reordered and may not be immediately visible to other processors. 
The cmpxchg provides a barrier AFTER it executes, but the stride store BEFORE cmpxchg may not be visible when obj_exts becomes visible. **Race Scenario:** 1. CPU A: `slab_set_stride(slab, 16)` (store to stride, in CPU A's store buffer) 2. CPU A: `cmpxchg(&slab->obj_exts, 0, new_exts)` succeeds, obj_exts is now visible 3. CPU B: Sees `obj_exts` is set (from step 2) 4. CPU B: Reads `slab->stride` → **sees OLD value (0 or garbage)** due to reordering! 5. CPU B: `slab_obj_ext` calculates `obj_exts + 0 * index = obj_exts` for ALL indices! 6. **All objects appear to share the same obj_ext at offset 0** ### Consequences - Object 0's objcg is correct - Object 1..N all read object 0's objcg (WRONG!) - When freeing multiple objects, we `obj_cgroup_put` the SAME objcg multiple times - Reference count underflows → early `obj_cgroup_release()` - `percpu_ref_exit()` sets `data = NULL` - Later stock drain tries to put the objcg → **CRASH in `drain_obj_stock`** ### Why This Matches the Bug Report - **Platform**: IBM Power11 (PowerPC) - weakly ordered memory - **Trigger**: xfstests generic/428 - creates high filesystem activity with many allocations - **Crash location**: `drain_obj_stock` → `obj_cgroup_put` with NULL data - **Pattern**: Stock has cached objcg that was prematurely released ### Proposed Fix Add a write memory barrier to ensure stride is visible before obj_exts: ```c slab_set_stride(slab, sizeof(struct slabobj_ext)); smp_wmb(); // Ensure stride is visible before obj_exts if (new_slab) { slab->obj_exts = new_exts; } else if (...) { } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) { goto retry; } ``` And correspondingly, the reader side should use a read barrier: ```c static inline unsigned short slab_get_stride(struct slab *slab) { // Need acquire semantics when reading stride after seeing obj_exts return smp_load_acquire(&slab->stride); } ``` Or use `smp_store_release` / `smp_load_acquire` pairs for proper ordering. 
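
The release/acquire pairing mentioned above can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions, not kernel code: `struct fake_slab`, `fake_publish()` and `fake_obj_ext()` are hypothetical names, `memory_order_release` / `memory_order_acquire` stand in for `smp_store_release()` / `smp_load_acquire()`, and a zero `obj_exts` is taken to mean "not published yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two struct slab fields involved. */
struct fake_slab {
	unsigned short stride;      /* plain field, like slab->stride */
	_Atomic uintptr_t obj_exts; /* published last, like slab->obj_exts */
};

/* Writer: store the stride first, then publish obj_exts with release order. */
void fake_publish(struct fake_slab *slab, uintptr_t exts, unsigned short stride)
{
	assert(exts != 0); /* zero means "not published" in this sketch */
	slab->stride = stride;
	atomic_store_explicit(&slab->obj_exts, exts, memory_order_release);
}

/*
 * Reader: acquire-load obj_exts and read the stride only once it is seen
 * as non-zero. Returns the slot address for index, or 0 if unpublished.
 */
uintptr_t fake_obj_ext(struct fake_slab *slab, unsigned int index)
{
	uintptr_t exts = atomic_load_explicit(&slab->obj_exts,
					      memory_order_acquire);

	if (!exts)
		return 0;
	return exts + (uintptr_t)slab->stride * index;
}
```

A reader that observes the published `obj_exts` through the acquire load is guaranteed to also observe the stride written before the release store; with plain loads and stores on both sides there is no such guarantee on a weakly ordered CPU.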
### Also Applies to alloc_slab_obj_exts_early The same issue exists in `alloc_slab_obj_exts_early` (lines 2290-2291 and 2308-2309): ```c slab->obj_exts = obj_exts; // Store obj_exts slab_set_stride(slab, sizeof(struct slabobj_ext)); // Store stride AFTER! ``` Here the order is **reversed** - obj_exts is set BEFORE stride! This is even worse for memory ordering, as other CPUs could see obj_exts before stride is set. ## Original Theory: Unconditional Stride Overwrite (Kept for reference - less likely to be the root cause on this specific crash) The stride is set to `sizeof(struct slabobj_ext)` **BEFORE** checking if `obj_exts` already exists. However, analysis shows this is protected by the TOCTOU check in callers (`!slab_obj_exts(slab)`). ## Trigger Scenarios 1. **Memory ordering on PowerPC** (MOST LIKELY): Stride not visible when obj_exts becomes visible due to missing memory barriers. 2. **Race between alloc_slab_obj_exts calls**: Two CPUs trying to allocate obj_exts for the same slab simultaneously. 3. **Interaction with RCU free path**: Objects in RCU sheaf being processed when stride is stale/zero. ## Confirmed Code Analysis (CONFIG_64BIT) On 64-bit systems (including IBM Power11), the stride is stored dynamically: **mm/slab.h:562-569**: ```c #ifdef CONFIG_64BIT static inline void slab_set_stride(struct slab *slab, unsigned short stride) { slab->stride = stride; // Plain store - NO memory ordering! } static inline unsigned short slab_get_stride(struct slab *slab) { return slab->stride; // Plain load - NO memory ordering! } ``` **mm/slab.h:533-548** (`slab_obj_exts`): ```c static inline unsigned long slab_obj_exts(struct slab *slab) { unsigned long obj_exts = READ_ONCE(slab->obj_exts); // Only compiler barrier! // ... validation ... return obj_exts & ~OBJEXTS_FLAGS_MASK; } ``` `READ_ONCE` only provides compiler ordering, NOT CPU memory ordering. There's no acquire barrier to ensure the stride read happens after seeing obj_exts. 
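
To make the consequence of a stale or zero stride concrete, here is a small userspace sketch (not kernel code). It reuses the `base + stride * index` arithmetic quoted from `slab_obj_ext()`; `struct fake_ext`, `fake_ext_at()` and `fake_free_all()` are hypothetical names, and a plain `int` counter stands in for the objcg reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct slabobj_ext: one counted owner per object. */
struct fake_ext {
	int *owner_refs; /* points at the owner's reference count */
};

/* Same address arithmetic as the quoted slab_obj_ext(): base + stride * index. */
struct fake_ext *fake_ext_at(void *base, size_t stride, unsigned int index)
{
	assert(base != NULL);
	return (struct fake_ext *)((char *)base + stride * index);
}

/* Drop one reference per object in [0, nr), locating each slot via stride. */
void fake_free_all(void *base, size_t stride, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		(*fake_ext_at(base, stride, i)->owner_refs)--;
}
```

With the correct stride (`sizeof(struct fake_ext)`), freeing three objects drops one reference from each of three owners. With a stride of 0, every index resolves to slot 0, so the first owner loses all three references and its count goes negative while the other two are never touched, which is the over-put pattern described in the race scenario.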
## Complete Fix Using Release/Acquire Semantics ### Fix 1: Reader side - slab_obj_exts (mm/slab.h) Change `READ_ONCE` to `smp_load_acquire`: ```c static inline unsigned long slab_obj_exts(struct slab *slab) { unsigned long obj_exts = smp_load_acquire(&slab->obj_exts); // Acquire barrier // ... validation ... return obj_exts & ~OBJEXTS_FLAGS_MASK; } ``` ### Fix 2: Writer side - alloc_slab_obj_exts (mm/slub.c:2196-2223) Use `smp_store_release` for obj_exts after setting stride: ```c retry: old_exts = READ_ONCE(slab->obj_exts); handle_failed_objexts_alloc(old_exts, vec, objects); if (new_slab) { slab_set_stride(slab, sizeof(struct slabobj_ext)); smp_store_release(&slab->obj_exts, new_exts); // Release barrier } else if (old_exts & ~OBJEXTS_FLAGS_MASK) { mark_objexts_empty(vec); kfree(vec); return 0; } else { slab_set_stride(slab, sizeof(struct slabobj_ext)); // cmpxchg already provides release semantics, but stride must be // visible before cmpxchg. Need explicit barrier: smp_wmb(); if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) goto retry; } ``` ### Fix 3: Writer side - alloc_slab_obj_exts_early (mm/slub.c:2290-2291, 2308-2309) The order is REVERSED here - obj_exts is set BEFORE stride! Fix by using `smp_store_release`: ```c // For normal obj_exts (lines 2290-2291): slab_set_stride(slab, sizeof(struct slabobj_ext)); // Set stride FIRST smp_store_release(&slab->obj_exts, obj_exts); // Then release obj_exts // For SLAB_OBJ_EXT_IN_OBJ (lines 2308-2309): slab_set_stride(slab, s->size); // Set stride FIRST smp_store_release(&slab->obj_exts, obj_exts); // Then release obj_exts ``` ## Why This Fixes the Crash With proper release/acquire ordering: 1. **Writer** (CPU A): Sets stride, then `smp_store_release(&obj_exts, ...)` ensures stride is visible to all CPUs before obj_exts becomes visible 2. 
**Reader** (CPU B): `smp_load_acquire(&obj_exts)` ensures that if obj_exts is seen as set, the subsequent stride read will see the correct value This prevents the race where CPU B sees obj_exts but reads stale/zero stride, which caused all objects to appear to share obj_ext at offset 0, leading to multiple `obj_cgroup_put` calls on the same objcg → reference underflow → crash. ## Additional Safety: Bounds Check in slab_obj_ext Add bounds check to catch any remaining issues: ```c VM_WARN_ON_ONCE(index >= slab->objects); ``` ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) 2026-02-22 23:48 ` Shakeel Butt @ 2026-02-23 2:36 ` Harry Yoo 0 siblings, 0 replies; 7+ messages in thread From: Harry Yoo @ 2026-02-23 2:36 UTC (permalink / raw) To: Shakeel Butt Cc: Venkat Rao Bagalkote, Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li On Sun, Feb 22, 2026 at 03:48:53PM -0800, Shakeel Butt wrote: > On Sun, Feb 22, 2026 at 03:36:46PM -0800, Shakeel Butt wrote: > > On Sun, Feb 22, 2026 at 08:47:03PM +0900, Harry Yoo wrote: > > [...] > > > > > > It seems it crashed while dereferencing objcg->ref->data->count. > > > I think that implies that obj_cgroup_release()->percpu_ref_exit() > > > is already called due to the refcount reaching zero and set > > > ref->data = NULL. > > > > > > Wait, was the stock->objcg ever a valid objcg? > > > I think it should be valid when refilling the obj stock, otherwise > > > it should have crashed in refill_obj_stock() -> obj_cgroup_get() path > > > in the first place, rather than crashing when draining. > > > > > > And that sounds like we're somehow calling obj_cgroup_put() more times > > > than obj_cgroup_get(). > > > > > > Anyway, this is my theory that it may be due to mis-refcounting of objcgs. > > > > > > > I have not looked deeper into recent slub changes (sheafs or obj_exts savings) > > but one thing looks weird to me: > > > > allocate_slab() // for cache with SLAB_OBJ_EXT_IN_OBJ > > -> alloc_slab_obj_exts_early() > > -> slab_set_stride(slab, s->size) > > -> account_slab() > > -> alloc_slab_obj_exts() > > -> slab_set_stride(slab, sizeof(struct slabobj_ext)); > > > > Unconditional overwrite of stride. Not sure if it is issue or even related to > > this crash but looks odd. 
> > I asked AI to debug this crash report along with a nudge towards to look for > stride corruption, it gave me the following output: Thanks! TLDR; I think stride corruption cannot happen because we don't try to allocate obj_exts if we already have it (w/ SLAB_OBJ_EXT_IN_OBJ). But the other analysis saying it's memory ordering issue seems very relevant. > # Stride Corruption Bug Analysis > > ## Bug Report Context > > - **Crash Location**: `drain_obj_stock+0x620/0xa48` in `obj_cgroup_put(old)` at mm/memcontrol.c:3059 > - **Root Cause**: `percpu_ref.data` is NULL, meaning `obj_cgroup_release()` already ran > - **Platform**: IBM Power11 (pSeries LPAR, Radix MMU, LE, 64K pages, kernel 6.19.0-next-20260216) > - **Trigger**: xfstests generic/428 > > ## Identified Bug: Unconditional Stride Overwrite > > ### Location: mm/slub.c lines 2196-2223 (alloc_slab_obj_exts) > > ```c > retry: > old_exts = READ_ONCE(slab->obj_exts); > handle_failed_objexts_alloc(old_exts, vec, objects); > slab_set_stride(slab, sizeof(struct slabobj_ext)); // BUG: UNCONDITIONALLY SET > > if (new_slab) { > slab->obj_exts = new_exts; > } else if (old_exts & ~OBJEXTS_FLAGS_MASK) { > // obj_exts already exists, BUT stride was already modified above! > mark_objexts_empty(vec); > kfree(vec); > return 0; > } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) { > goto retry; > } > ``` > > ### The Problem > > The stride is set to `sizeof(struct slabobj_ext)` **BEFORE** checking if `obj_exts` already > exists. If a slab was created with `SLAB_OBJ_EXT_IN_OBJ` mode (where stride = `s->size`), > and later `alloc_slab_obj_exts` is called for any reason, the stride gets corrupted. I think this part is bogus. It's not a bug because obj_exts saving is applied only during slab allocation and initialization. If it benefits from the obj_exts savings, no need to call alloc_slab_obj_exts() later, thus no corruption. 
If it doesn't benefit from the obj_exts savings, the stride size is always
sizeof(struct slabobj_ext).

> ### Stride Modes
>
> There are two stride modes (see alloc_slab_obj_exts_early):
>
> 1. **Normal mode**: stride = `sizeof(struct slabobj_ext)` (~16 bytes)
>    - obj_exts is a separate array or in slab leftover space
>
> 2. **SLAB_OBJ_EXT_IN_OBJ mode**: stride = `s->size` (object size, e.g., 64-256+ bytes)
>    - obj_ext is embedded within each object at a fixed offset
>
> ### Consequences of Wrong Stride
>
> When `slab_obj_ext` is later called:
> ```c
> obj_ext = (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
> ```
>
> With corrupted stride (16 instead of 256):
> - **Expected**: `obj_exts + 256 * 5 = obj_exts + 1280` (correct obj_ext for object 5)
> - **Actual**: `obj_exts + 16 * 5 = obj_exts + 80` (WRONG obj_ext - belongs to object 0!)

[...snip...]

> ## CRITICAL: Memory Ordering Bug on PowerPC (Likely Root Cause)
>
> ### The Problem
>
> In `alloc_slab_obj_exts` (mm/slub.c lines 2199-2220), there is **NO memory barrier**
> between the stride store and the obj_exts visibility via cmpxchg:

This is actually a good point.

> ```c
> slab_set_stride(slab, sizeof(struct slabobj_ext));  // Store to stride (line 2199)
> // NO MEMORY BARRIER HERE!
> if (new_slab) {
>     slab->obj_exts = new_exts;  // Store to obj_exts (line 2207)
> } else if (...) {
> } else if (cmpxchg(&slab->obj_exts, ...) != ...) {  // Atomic on obj_exts (line 2220)
>     goto retry;
> }
> ```
>
> ### Why This Crashes on PowerPC
>
> PowerPC has a **weakly-ordered memory model**. Stores can be reordered and may not be
> immediately visible to other processors. The cmpxchg provides a barrier AFTER it
> executes, but the stride store BEFORE cmpxchg may not be visible when obj_exts becomes
> visible.
>
> **Race Scenario:**
> 1. CPU A: `slab_set_stride(slab, 16)` (store to stride, in CPU A's store buffer)
> 2. CPU A: `cmpxchg(&slab->obj_exts, 0, new_exts)` succeeds, obj_exts is now visible
> 3. CPU B: Sees `obj_exts` is set (from step 2)
> 4. CPU B: Reads `slab->stride` → **sees OLD value (0 or garbage)** due to reordering!
> 5. CPU B: `slab_obj_ext` calculates `obj_exts + 0 * index = obj_exts` for ALL indices!
> 6. **All objects appear to share the same obj_ext at offset 0**

Yes, that could actually happen, especially when the cache doesn't
specify SLAB_ACCOUNT but allocates objects with __GFP_ACCOUNT set
(e.g. xarray does that).

With sheaves for all, objects can be in different CPUs' sheaves and they
could try to allocate obj_exts and charge objects from the same slab.

> ### Consequences
>
> - Object 0's objcg is correct
> - Object 1..N all read object 0's objcg (WRONG!)
> - When freeing multiple objects, we `obj_cgroup_put` the SAME objcg multiple times

Yes, reading the wrong stride value is the path to mis-refcounting
objcgs :P

> - Reference count underflows → early `obj_cgroup_release()`
> - `percpu_ref_exit()` sets `data = NULL`
> - Later stock drain tries to put the objcg → **CRASH in `drain_obj_stock`**
> ### Why This Matches the Bug Report
>
> - **Platform**: IBM Power11 (PowerPC) - weakly ordered memory
> - **Trigger**: xfstests generic/428 - creates high filesystem activity with many allocations
> - **Crash location**: `drain_obj_stock` → `obj_cgroup_put` with NULL data
> - **Pattern**: Stock has cached objcg that was prematurely released
>
> ### Proposed Fix
>
> Add a write memory barrier to ensure stride is visible before obj_exts:
>
> ```c
> slab_set_stride(slab, sizeof(struct slabobj_ext));
> smp_wmb();  // Ensure stride is visible before obj_exts
>
> if (new_slab) {
>     slab->obj_exts = new_exts;
> } else if (...) {
> } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
>     goto retry;
> }
> ```

I would say this is overkill.
Something like this should suffice:

diff --git a/mm/slub.c b/mm/slub.c
index 865bc050f654..8db93a88bbbf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 retry:
 	old_exts = READ_ONCE(slab->obj_exts);
 	handle_failed_objexts_alloc(old_exts, vec, objects);
-	slab_set_stride(slab, sizeof(struct slabobj_ext));
 
 	if (new_slab) {
 		/*
@@ -2272,6 +2271,12 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 	void *addr;
 	unsigned long obj_exts;
 
+	/*
+	 * Initialize stride unconditionally,
+	 * but override if SLAB_OBJ_EXT_IN_OBJ is set.
+	 */
+	slab_set_stride(slab, sizeof(struct slabobj_ext));
+
 	if (!need_slab_obj_exts(s))
 		return;
 
@@ -2288,7 +2293,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 		obj_exts |= MEMCG_DATA_OBJEXTS;
 #endif
 		slab->obj_exts = obj_exts;
-		slab_set_stride(slab, sizeof(struct slabobj_ext));
 	} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
 		unsigned int offset = obj_exts_offset_in_object(s);
 

There should be enough memory barriers to ensure the stride and obj_exts
are visible to other CPUs before the slab is accessible by other CPUs
(by acquiring/releasing n->list_lock)

> And correspondingly, the reader side should use a read barrier:
> ```c
> static inline unsigned short slab_get_stride(struct slab *slab)
> {
>     // Need acquire semantics when reading stride after seeing obj_exts
>     return smp_load_acquire(&slab->stride);
> }
> ```
>
> Or use `smp_store_release` / `smp_load_acquire` pairs for proper ordering.
>
> ### Also Applies to alloc_slab_obj_exts_early
>
> The same issue exists in `alloc_slab_obj_exts_early` (lines 2290-2291 and 2308-2309):
>
> ```c
> slab->obj_exts = obj_exts;                           // Store obj_exts
> slab_set_stride(slab, sizeof(struct slabobj_ext));   // Store stride AFTER!
> ```
>
> Here the order is **reversed** - obj_exts is set BEFORE stride! This is even worse
> for memory ordering, as other CPUs could see obj_exts before stride is set.

[...]

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply [flat|nested] 7+ messages in thread
Thread overview: 7+ messages
[not found] <ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com>
[not found] ` <aZReMzl-S9KM_snh@nidhogg.toxiclabs.cc>
2026-02-18 11:36 ` [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) Vlastimil Babka
2026-02-18 21:25 ` Shakeel Butt
2026-02-22 10:08 ` Venkat Rao Bagalkote
2026-02-22 11:47 ` Harry Yoo
2026-02-22 23:36 ` Shakeel Butt
2026-02-22 23:48 ` Shakeel Butt
2026-02-23 2:36 ` Harry Yoo