linux-mm.kvack.org archive mirror
* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
       [not found] ` <aZReMzl-S9KM_snh@nidhogg.toxiclabs.cc>
@ 2026-02-18 11:36   ` Vlastimil Babka
  2026-02-18 21:25     ` Shakeel Butt
  2026-02-22 10:08     ` Venkat Rao Bagalkote
  0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2026-02-18 11:36 UTC
  To: Carlos Maiolino, Venkat Rao Bagalkote, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt
  Cc: linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan,
	Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm,
	Harry Yoo, Hao Li

On 2/17/26 13:40, Carlos Maiolino wrote:
> On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>> 
>> I am observing below OOPs, while running xfstests generic/428 test case. But
>> I am not able to reproduce this consistently.
>> 
>> 
>> Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
>> Kernel: 6.19.0-next-20260216
>> Tests: generic/428
>> 
>> local.config >>>
>> [xfs_4k]
>> export RECREATE_TEST_DEV=true
>> export TEST_DEV=/dev/loop0
>> export TEST_DIR=/mnt/test
>> export SCRATCH_DEV=/dev/loop1
>> export SCRATCH_MNT=/mnt/scratch
>> export MKFS_OPTIONS="-b size=4096"
>> export FSTYP=xfs
>> export MOUNT_OPTIONS=""-
>> 
>> 
>> 
>> Attached is .config file used.
>> 
>> 
>> Traces:
>> 
> 
> /me fixing trace's indentation

CCing memcg and slab folks.
Would be nice to figure out where in drain_obj_stock things went wrong. Any
chance of e.g. ./scripts/faddr2line output?

I wonder if we have either some bogus objext pointer, or maybe the
rcu_free_sheaf() context is new (or previously rare) for memcg and we have
some locking issues being exposed in refill/drain.

>> 
>> [ 6054.957411] run fstests generic/428 at 2026-02-16 22:25:57
>> [ 6055.136443] Kernel attempted to read user page (0) - exploit attempt?
>> (uid: 0)
>> [ 6055.136474] BUG: Kernel NULL pointer dereference on read at 0x00000000
>> [ 6055.136485] Faulting instruction address: 0xc0000000008aff0c
>> [ 6055.136495] Oops: Kernel access of bad area, sig: 11 [#1]
>> [ 6055.136505] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
>> [ 6055.136517] Modules linked in: dm_thin_pool dm_persistent_data
>> dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop dm_mod nft_fib_inet
>> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
>> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
>> nf_defrag_ipv6 nf_defrag_ipv4 bonding ip_set tls nf_tables rfkill sunrpc
>> nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2
>> nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp
>> pseries_wdt [last unloaded: scsi_debug]
>> [ 6055.136684] CPU: 19 UID: 0 PID: 0 Comm: swapper/19 Kdump: loaded Tainted:
>> G        W           6.19.0-next-20260216 #1 PREEMPTLAZY
>> [ 6055.136701] Tainted: [W]=WARN
>> [ 6055.136708] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200
>> 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
>> [ 6055.136719] NIP:  c0000000008aff0c LR: c0000000008aff00 CTR:
>> c00000000036d5e0
>> [ 6055.136730] REGS: c000000d0dc877c0 TRAP: 0300   Tainted: G   W           
>> (6.19.0-next-20260216)
>> [ 6055.136742] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 84042802 XER: 20040037
>> [ 6055.136777] CFAR: c000000000862a74 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
>> [ 6055.136777] GPR00: c0000000008aff00 c000000d0dc87a60 c00000000243a500 0000000000000001
>> [ 6055.136777] GPR04: 0000000000000008 0000000000000001 c0000000008aff00 0000000000000001
>> [ 6055.136777] GPR08: a80e000000000000 0000000000000001 0000000000000007
>> a80e000000000000
>> [ 6055.136777] GPR12: c00e00000c46e6d5 c000000d0ddf0b00 c000000019069a00
>> 0000000000000006
>> [ 6055.136777] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980
>> c000000007012f88
>> [ 6055.136777] GPR20: c00c0000004d7cec c000000d0d10f008 0000000000000001
>> ffffffffffffff78
>> [ 6055.136777] GPR24: 0000000000000005 c000000d0d58f180 c0000001d0795e00
>> c000000d0d10f01c
>> [ 6055.136777] GPR28: c000000d0d10f008 c000000d0d10f010 c0000001d0795e08
>> 0000000000000000
>> [ 6055.136891] NIP [c0000000008aff0c] drain_obj_stock+0x620/0xa48
>> [ 6055.136905] LR [c0000000008aff00] drain_obj_stock+0x614/0xa48
>> [ 6055.136915] Call Trace:
>> [ 6055.136919] [c000000d0dc87a60] [c0000000008aff00] drain_obj_stock+0x614/0xa48 (unreliable)
>> [ 6055.136933] [c000000d0dc87b10] [c0000000008b27e4] refill_obj_stock+0x104/0x680
>> [ 6055.136945] [c000000d0dc87b90] [c0000000008b9238] __memcg_slab_free_hook+0x238/0x3ec
>> [ 6055.136956] [c000000d0dc87c60] [c0000000007f39a0] __rcu_free_sheaf_prepare+0x314/0x3e8
>> [ 6055.136968] [c000000d0dc87d10] [c0000000007fbf0c] rcu_free_sheaf+0x38/0x170
>> [ 6055.136980] [c000000d0dc87d50] [c0000000003344b0] rcu_do_batch+0x2ec/0xfa8
>> [ 6055.136992] [c000000d0dc87e50] [c000000000339948] rcu_core+0x22c/0x48c
>> [ 6055.137002] [c000000d0dc87ec0] [c0000000001cfe6c] handle_softirqs+0x1f4/0x74c
>> [ 6055.137013] [c000000d0dc87fe0] [c00000000001b0cc] do_softirq_own_stack+0x60/0x7c
>> [ 6055.137025] [c000000009717930] [c00000000001b0b8] do_softirq_own_stack+0x4c/0x7c
>> [ 6055.137036] [c000000009717960] [c0000000001cf128] __irq_exit_rcu+0x268/0x308
>> [ 6055.137046] [c0000000097179a0] [c0000000001d0ba4] irq_exit+0x20/0x38
>> [ 6055.137056] [c0000000097179c0] [c0000000000315f4] interrupt_async_exit_prepare.constprop.0+0x18/0x2c
>> [ 6055.137069] [c0000000097179e0] [c000000000009ffc] decrementer_common_virt+0x28c/0x290
>> [ 6055.137080] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
>> [ 6055.137090] NIP:  c00000000012d8f0 LR: c00000000135c3fc CTR: 0000000000000000
>> [ 6055.137097] REGS: c000000009717a10 TRAP: 0900   Tainted: G   W            (6.19.0-next-20260216)
>> [ 6055.137105] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000804  XER: 00000037
>> [ 6055.137134] CFAR: 0000000000000000 IRQMASK: 0
>> [ 6055.137134] GPR00: 0000000000000000 c000000009717cb0 c00000000243a500 0000000000000000
>> [ 6055.137134] GPR04: 0000000000000000 800400002fe6fc10 0000000000000000 0000000000000001
>> [ 6055.137134] GPR08: 0000000000000033 0000000000000000 0000000000000090 0000000000000001
>> [ 6055.137134] GPR12: 800400002fe6fc00 c000000d0ddf0b00 0000000000000000 000000002ef01a60
>> [ 6055.137134] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 6055.137134] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
>> [ 6055.137134] GPR24: 0000000000000000 c000000004d7a778 00000581d1a507b8 0000000000000000
>> [ 6055.137134] GPR28: 0000000000000000 0000000000000001 c0000000032b18d8 c0000000032b18e0
>> [ 6055.137229] NIP [c00000000012d8f0] plpar_hcall_norets_notrace+0x18/0x2c
>> [ 6055.137238] LR [c00000000135c3fc] cede_processor.isra.0+0x1c/0x30
>> [ 6055.137248] ---- interrupt: 900
>> [ 6055.137253] [c000000009717cb0] [c000000009717cf0] 0xc000000009717cf0 (unreliable)
>> [ 6055.137265] [c000000009717d10] [c0000000019af160] dedicated_cede_loop+0x90/0x170
>> [ 6055.137277] [c000000009717d60] [c0000000019aeb10] cpuidle_enter_state+0x394/0x480
>> [ 6055.137288] [c000000009717e00] [c0000000013589ec] cpuidle_enter+0x64/0x9c
>> [ 6055.137298] [c000000009717e50] [c000000000284a8c] call_cpuidle+0x7c/0xf8
>> [ 6055.137310] [c000000009717e90] [c000000000290398] cpuidle_idle_call+0x1c4/0x2b4
>> [ 6055.137321] [c000000009717f00] [c0000000002905bc] do_idle+0x134/0x208
>> [ 6055.137330] [c000000009717f50] [c000000000290a0c] cpu_startup_entry+0x60/0x64
>> [ 6055.137341] [c000000009717f80] [c0000000000744b8] start_secondary+0x3fc/0x400
>> [ 6055.137352] [c000000009717fe0] [c00000000000e258] start_secondary_prolog+0x10/0x14
>> [ 6055.137363] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 ebfa0008 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 <7d40f8a8> 7d495050 7d40f9ad 40c2fff4
>> [ 6055.137400] ---[ end trace 0000000000000000 ]---
> 
> Again, nothing here seems to point to a xfs problem.
> 




* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-18 11:36   ` [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) Vlastimil Babka
@ 2026-02-18 21:25     ` Shakeel Butt
  2026-02-22 10:08     ` Venkat Rao Bagalkote
  1 sibling, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2026-02-18 21:25 UTC
  To: Vlastimil Babka
  Cc: Carlos Maiolino, Venkat Rao Bagalkote, Johannes Weiner,
	Michal Hocko, Roman Gushchin, linux-xfs, linux-fsdevel, LKML,
	Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song,
	Cgroups, linux-mm, Harry Yoo, Hao Li

On Wed, Feb 18, 2026 at 12:36:06PM +0100, Vlastimil Babka wrote:
> On 2/17/26 13:40, Carlos Maiolino wrote:
> > On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
> >> Greetings!!!
> >> 
> >> I am observing below OOPs, while running xfstests generic/428 test case. But
> >> I am not able to reproduce this consistently.
> >> 
> >> 
> >> Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
> >> Kernel: 6.19.0-next-20260216
> >> Tests: generic/428
> >> 
> >> local.config >>>
> >> [xfs_4k]
> >> export RECREATE_TEST_DEV=true
> >> export TEST_DEV=/dev/loop0
> >> export TEST_DIR=/mnt/test
> >> export SCRATCH_DEV=/dev/loop1
> >> export SCRATCH_MNT=/mnt/scratch
> >> export MKFS_OPTIONS="-b size=4096"
> >> export FSTYP=xfs
> >> export MOUNT_OPTIONS=""-
> >> 
> >> 
> >> 
> >> Attached is .config file used.
> >> 
> >> 
> >> Traces:
> >> 
> > 
> > /me fixing trace's indentation
> 
> CCing memcg and slab folks.
> Would be nice to figure out where in drain_obj_stock things went wrong. Any
> chance of e.g. ./scripts/faddr2line output?
> 
> I wonder if we have either some bogus objext pointer, or maybe the
> rcu_free_sheaf() context is new (or previously rare) for memcg and we have
> some locking issues being exposed in refill/drain.
> 

Yes output of ./scripts/faddr2line would be really helpful. I can't think of
anything that might go wrong in refill/drain.




* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-18 11:36   ` [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) Vlastimil Babka
  2026-02-18 21:25     ` Shakeel Butt
@ 2026-02-22 10:08     ` Venkat Rao Bagalkote
  2026-02-22 11:47       ` Harry Yoo
  1 sibling, 1 reply; 7+ messages in thread
From: Venkat Rao Bagalkote @ 2026-02-22 10:08 UTC
  To: Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt
  Cc: linux-xfs, linux-fsdevel, LKML, Madhavan Srinivasan,
	Ritesh Harjani, ojaswin, Muchun Song, Cgroups, linux-mm,
	Harry Yoo, Hao Li


On 18/02/26 5:06 pm, Vlastimil Babka wrote:
> On 2/17/26 13:40, Carlos Maiolino wrote:
>> On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
>>> Greetings!!!
>>>
>>> I am observing below OOPs, while running xfstests generic/428 test case. But
>>> I am not able to reproduce this consistently.
>>>
>>>
>>> Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
>>> Kernel: 6.19.0-next-20260216
>>> Tests: generic/428
>>>
>>> local.config >>>
>>> [xfs_4k]
>>> export RECREATE_TEST_DEV=true
>>> export TEST_DEV=/dev/loop0
>>> export TEST_DIR=/mnt/test
>>> export SCRATCH_DEV=/dev/loop1
>>> export SCRATCH_MNT=/mnt/scratch
>>> export MKFS_OPTIONS="-b size=4096"
>>> export FSTYP=xfs
>>> export MOUNT_OPTIONS=""-
>>>
>>>
>>>
>>> Attached is .config file used.
>>>
>>>
>>> Traces:
>>>
>> /me fixing trace's indentation
> CCing memcg and slab folks.
> Would be nice to figure out where in drain_obj_stock things went wrong. Any
> chance of e.g. ./scripts/faddr2line output?
>
> I wonder if we have either some bogus objext pointer, or maybe the
> rcu_free_sheaf() context is new (or previously rare) for memcg and we have
> some locking issues being exposed in refill/drain.


This issue also reproduces on the mainline repo.


Traces:


[ 8058.036083] Kernel attempted to read user page (0) - exploit attempt? 
(uid: 0)
[ 8058.036116] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 8058.036127] Faulting instruction address: 0xc0000000008b018c
[ 8058.036137] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8058.036147] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[ 8058.036159] Modules linked in: overlay dm_zero dm_thin_pool 
dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop 
dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set bonding nf_tables tls 
rfkill sunrpc nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 
mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth 
scsi_transport_srp pseries_wdt [last unloaded: scsi_debug]
[ 8058.036339] CPU: 19 UID: 0 PID: 115 Comm: ksoftirqd/19 Kdump: loaded 
Not tainted 6.19.0+ #1 PREEMPTLAZY
[ 8058.036361] Hardware name: IBM,9080-HEX Power11 (architected) 
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 8058.036379] NIP:  c0000000008b018c LR: c0000000008b0180 CTR: 
c00000000036d680
[ 8058.036395] REGS: c00000000b5976c0 TRAP: 0300   Not tainted (6.19.0+)
[ 8058.036411] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  
CR: 84042002  XER: 20040000
[ 8058.036482] CFAR: c000000000862cf4 DAR: 0000000000000000 DSISR: 
40000000 IRQMASK: 0
[ 8058.036482] GPR00: c0000000008b0180 c00000000b597960 c00000000243a500 
0000000000000001
[ 8058.036482] GPR04: 0000000000000008 0000000000000001 c0000000008b0180 
0000000000000001
[ 8058.036482] GPR08: a80e000000000000 0000000000000001 0000000000000007 
a80e000000000000
[ 8058.036482] GPR12: c00e00000120f8d5 c000000d0ddf0b00 c000000073567780 
0000000000000006
[ 8058.036482] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980 
c000000007012f88
[ 8058.036482] GPR20: c00c000001c3daac c000000d0d10f008 0000000000000001 
ffffffffffffff78
[ 8058.036482] GPR24: 0000000000000005 c000000d0d58f180 c00000000cd6f580 
c000000d0d10f01c
[ 8058.036482] GPR28: c000000d0d10f008 c000000d0d10f010 c00000000cd6f588 
0000000000000000
[ 8058.036628] NIP [c0000000008b018c] drain_obj_stock+0x620/0xa48
[ 8058.036646] LR [c0000000008b0180] drain_obj_stock+0x614/0xa48
[ 8058.036659] Call Trace:
[ 8058.036665] [c00000000b597960] [c0000000008b0180] 
drain_obj_stock+0x614/0xa48 (unreliable)
[ 8058.036688] [c00000000b597a10] [c0000000008b2a64] 
refill_obj_stock+0x104/0x680
[ 8058.036715] [c00000000b597a90] [c0000000008b94b8] 
__memcg_slab_free_hook+0x238/0x3ec
[ 8058.036738] [c00000000b597b60] [c0000000007f3c10] 
__rcu_free_sheaf_prepare+0x314/0x3e8
[ 8058.036763] [c00000000b597c10] [c0000000007fbf70] 
rcu_free_sheaf_nobarn+0x38/0x78
[ 8058.036788] [c00000000b597c40] [c000000000334550] 
rcu_do_batch+0x2ec/0xfa8
[ 8058.036812] [c00000000b597d40] [c0000000003399e8] rcu_core+0x22c/0x48c
[ 8058.036835] [c00000000b597db0] [c0000000001cfe6c] 
handle_softirqs+0x1f4/0x74c
[ 8058.036862] [c00000000b597ed0] [c0000000001d0458] run_ksoftirqd+0x94/0xb8
[ 8058.036885] [c00000000b597f00] [c00000000022a130] 
smpboot_thread_fn+0x450/0x648
[ 8058.036912] [c00000000b597f80] [c000000000218408] kthread+0x244/0x28c
[ 8058.036927] [c00000000b597fe0] [c00000000000ded8] 
start_kernel_thread+0x14/0x18
[ 8058.036943] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 
ebfa0008 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 
<7d40f8a8> 7d495050 7d40f9ad 40c2fff4
[ 8058.037000] ---[ end trace 0000000000000000 ]---


And below is the corresponding output from faddr2line.


drain_obj_stock+0x620/0xa48:
arch_atomic64_sub_return_relaxed at arch/powerpc/include/asm/atomic.h:272
(inlined by) raw_atomic64_sub_return at 
include/linux/atomic/atomic-arch-fallback.h:2917
(inlined by) raw_atomic64_sub_and_test at 
include/linux/atomic/atomic-arch-fallback.h:4386
(inlined by) raw_atomic_long_sub_and_test at 
include/linux/atomic/atomic-long.h:1551
(inlined by) atomic_long_sub_and_test at 
include/linux/atomic/atomic-instrumented.h:4522
(inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
(inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
(inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
(inlined by) drain_obj_stock at mm/memcontrol.c:3059
drain_obj_stock+0x614/0xa48:
instrument_atomic_read_write at include/linux/instrumented.h:112
(inlined by) atomic_long_sub_and_test at 
include/linux/atomic/atomic-instrumented.h:4521
(inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
(inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
(inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
(inlined by) drain_obj_stock at mm/memcontrol.c:3059
refill_obj_stock+0x104/0x680:
__preempt_count_add at include/asm-generic/preempt.h:54
(inlined by) __rcu_read_lock at include/linux/rcupdate.h:103
(inlined by) rcu_read_lock at include/linux/rcupdate.h:848
(inlined by) percpu_ref_get_many at include/linux/percpu-refcount.h:202
(inlined by) percpu_ref_get at include/linux/percpu-refcount.h:222
(inlined by) obj_cgroup_get at include/linux/memcontrol.h:782
(inlined by) refill_obj_stock at mm/memcontrol.c:3099
__memcg_slab_free_hook+0x238/0x3ec:
__preempt_count_add at include/asm-generic/preempt.h:54
(inlined by) __rcu_read_lock at include/linux/rcupdate.h:103
(inlined by) rcu_read_lock at include/linux/rcupdate.h:848
(inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:330
(inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
(inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
(inlined by) __memcg_slab_free_hook at mm/memcontrol.c:3284
__rcu_free_sheaf_prepare+0x314/0x3e8:
memcg_slab_free_hook at mm/slub.c:2486
(inlined by) __rcu_free_sheaf_prepare at mm/slub.c:2914
rcu_free_sheaf_nobarn+0x38/0x78:
sheaf_flush_unused at mm/slub.c:2893
(inlined by) rcu_free_sheaf_nobarn at mm/slub.c:2941
rcu_do_batch+0x2ec/0xfa8:
rcu_do_batch at kernel/rcu/tree.c:2617
rcu_core+0x22c/0x48c:
rcu_core at kernel/rcu/tree.c:2871
handle_softirqs+0x1f4/0x74c:
handle_softirqs at kernel/softirq.c:622
run_ksoftirqd+0x94/0xb8:
arch_local_irq_enable at arch/powerpc/include/asm/hw_irq.h:201
(inlined by) ksoftirqd_run_end at kernel/softirq.c:479
(inlined by) run_ksoftirqd at kernel/softirq.c:1064
(inlined by) run_ksoftirqd at kernel/softirq.c:1055
smpboot_thread_fn+0x450/0x648:
smpboot_thread_fn at kernel/smpboot.c:160 (discriminator 3)
kthread+0x244/0x28c:
kthread at kernel/kthread.c:467
start_kernel_thread+0x14/0x18:
start_kernel_thread at arch/powerpc/kernel/interrupt_64.S:771


Regards,

Venkat.




* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-22 10:08     ` Venkat Rao Bagalkote
@ 2026-02-22 11:47       ` Harry Yoo
  2026-02-22 23:36         ` Shakeel Butt
  0 siblings, 1 reply; 7+ messages in thread
From: Harry Yoo @ 2026-02-22 11:47 UTC
  To: Venkat Rao Bagalkote
  Cc: Vlastimil Babka, Carlos Maiolino, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, linux-xfs, linux-fsdevel, LKML,
	Madhavan Srinivasan, Ritesh Harjani, ojaswin, Muchun Song,
	Cgroups, linux-mm, Hao Li

On Sun, Feb 22, 2026 at 03:38:57PM +0530, Venkat Rao Bagalkote wrote:
> 
> On 18/02/26 5:06 pm, Vlastimil Babka wrote:
> > On 2/17/26 13:40, Carlos Maiolino wrote:
> > > On Tue, Feb 17, 2026 at 04:59:12PM +0530, Venkat Rao Bagalkote wrote:
> > > > Greetings!!!
> > > > 
> > > > I am observing below OOPs, while running xfstests generic/428 test case. But
> > > > I am not able to reproduce this consistently.
> > > > 
> > > > 
> > > > Platform: IBM Power11 (pSeries LPAR), Radix MMU, LE, 64K pages
> > > > Kernel: 6.19.0-next-20260216
> > > > Tests: generic/428
> > > > 
> > > > local.config >>>
> > > > [xfs_4k]
> > > > export RECREATE_TEST_DEV=true
> > > > export TEST_DEV=/dev/loop0
> > > > export TEST_DIR=/mnt/test
> > > > export SCRATCH_DEV=/dev/loop1
> > > > export SCRATCH_MNT=/mnt/scratch
> > > > export MKFS_OPTIONS="-b size=4096"
> > > > export FSTYP=xfs
> > > > export MOUNT_OPTIONS=""-
> > > > 
> > > > 
> > > > 
> > > > Attached is .config file used.
> > > > 
> > > > Traces:
> > > > 
> > > /me fixing trace's indentation
> > CCing memcg and slab folks.
> > Would be nice to figure out where in drain_obj_stock things went wrong. Any
> > chance of e.g. ./scripts/faddr2line output?
> > 
> > I wonder if we have either some bogus objext pointer, or maybe the
> > rcu_free_sheaf() context is new (or previously rare) for memcg and we have
> > some locking issues being exposed in refill/drain.
> 
> 
> This issue also reproduces on the mainline repo.
> 
> Traces:
> 
> [ 8058.036083] Kernel attempted to read user page (0) - exploit attempt?
> (uid: 0)
> [ 8058.036116] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [ 8058.036127] Faulting instruction address: 0xc0000000008b018c
> [ 8058.036137] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 8058.036147] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
> [ 8058.036159] Modules linked in: overlay dm_zero dm_thin_pool
> dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop
> dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set bonding nf_tables tls
> rfkill sunrpc nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16
> mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth
> scsi_transport_srp pseries_wdt [last unloaded: scsi_debug]
> [ 8058.036339] CPU: 19 UID: 0 PID: 115 Comm: ksoftirqd/19 Kdump: loaded Not
> tainted 6.19.0+ #1 PREEMPTLAZY
> [ 8058.036361] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200
> 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
> [ 8058.036379] NIP:  c0000000008b018c LR: c0000000008b0180 CTR:
> c00000000036d680
> [ 8058.036395] REGS: c00000000b5976c0 TRAP: 0300   Not tainted (6.19.0+)
> [ 8058.036411] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
> 84042002  XER: 20040000
> [ 8058.036482] CFAR: c000000000862cf4 DAR: 0000000000000000 DSISR: 40000000
> IRQMASK: 0
> [ 8058.036482] GPR00: c0000000008b0180 c00000000b597960 c00000000243a500
> 0000000000000001
> [ 8058.036482] GPR04: 0000000000000008 0000000000000001 c0000000008b0180
> 0000000000000001
> [ 8058.036482] GPR08: a80e000000000000 0000000000000001 0000000000000007
> a80e000000000000
> [ 8058.036482] GPR12: c00e00000120f8d5 c000000d0ddf0b00 c000000073567780
> 0000000000000006
> [ 8058.036482] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980
> c000000007012f88
> [ 8058.036482] GPR20: c00c000001c3daac c000000d0d10f008 0000000000000001
> ffffffffffffff78
> [ 8058.036482] GPR24: 0000000000000005 c000000d0d58f180 c00000000cd6f580
> c000000d0d10f01c
> [ 8058.036482] GPR28: c000000d0d10f008 c000000d0d10f010 c00000000cd6f588
> 0000000000000000
> [ 8058.036628] NIP [c0000000008b018c] drain_obj_stock+0x620/0xa48
> [ 8058.036646] LR [c0000000008b0180] drain_obj_stock+0x614/0xa48
> [ 8058.036659] Call Trace:
> [ 8058.036665] [c00000000b597960] [c0000000008b0180]
> drain_obj_stock+0x614/0xa48 (unreliable)
> [ 8058.036688] [c00000000b597a10] [c0000000008b2a64]
> refill_obj_stock+0x104/0x680
> [ 8058.036715] [c00000000b597a90] [c0000000008b94b8]
> __memcg_slab_free_hook+0x238/0x3ec
> [ 8058.036738] [c00000000b597b60] [c0000000007f3c10]
> __rcu_free_sheaf_prepare+0x314/0x3e8
> [ 8058.036763] [c00000000b597c10] [c0000000007fbf70]
> rcu_free_sheaf_nobarn+0x38/0x78
> [ 8058.036788] [c00000000b597c40] [c000000000334550]
> rcu_do_batch+0x2ec/0xfa8
> [ 8058.036812] [c00000000b597d40] [c0000000003399e8] rcu_core+0x22c/0x48c
> [ 8058.036835] [c00000000b597db0] [c0000000001cfe6c]
> handle_softirqs+0x1f4/0x74c
> [ 8058.036862] [c00000000b597ed0] [c0000000001d0458] run_ksoftirqd+0x94/0xb8
> [ 8058.036885] [c00000000b597f00] [c00000000022a130]
> smpboot_thread_fn+0x450/0x648
> [ 8058.036912] [c00000000b597f80] [c000000000218408] kthread+0x244/0x28c
> [ 8058.036927] [c00000000b597fe0] [c00000000000ded8]
> start_kernel_thread+0x14/0x18
> [ 8058.036943] Code: 60000000 3bda0008 7fc3f378 4bfb148d 60000000 ebfa0008
> 38800008 7fe3fb78 4bfb2b51 60000000 7c0004ac 39200001 <7d40f8a8> 7d495050
> 7d40f9ad 40c2fff4
> [ 8058.037000] ---[ end trace 0000000000000000 ]---
> 
> 
> And below is the corresponding output from faddr2line.

Thanks!

> drain_obj_stock+0x620/0xa48:
> arch_atomic64_sub_return_relaxed at arch/powerpc/include/asm/atomic.h:272
> (inlined by) raw_atomic64_sub_return at
> include/linux/atomic/atomic-arch-fallback.h:2917
> (inlined by) raw_atomic64_sub_and_test at
> include/linux/atomic/atomic-arch-fallback.h:4386
> (inlined by) raw_atomic_long_sub_and_test at
> include/linux/atomic/atomic-long.h:1551
> (inlined by) atomic_long_sub_and_test at
> include/linux/atomic/atomic-instrumented.h:4522
> (inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
> (inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
> (inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
> (inlined by) drain_obj_stock at mm/memcontrol.c:3059

It seems it crashed while dereferencing objcg->ref->data->count.
I think that implies that obj_cgroup_release()->percpu_ref_exit()
has already been called because the refcount reached zero, and that it
set ref->data = NULL.

Wait, was the stock->objcg ever a valid objcg?
I think it should be valid when refilling the obj stock, otherwise
it should have crashed in refill_obj_stock() -> obj_cgroup_get() path
in the first place, rather than crashing when draining.

And that sounds like we're somehow calling obj_cgroup_put() more times
than obj_cgroup_get().

Anyway, this is my theory that it may be due to mis-refcounting of objcgs.

> drain_obj_stock+0x614/0xa48:
> instrument_atomic_read_write at include/linux/instrumented.h:112
> (inlined by) atomic_long_sub_and_test at
> include/linux/atomic/atomic-instrumented.h:4521
> (inlined by) percpu_ref_put_many at include/linux/percpu-refcount.h:334
> (inlined by) percpu_ref_put at include/linux/percpu-refcount.h:351
> (inlined by) obj_cgroup_put at include/linux/memcontrol.h:794
> (inlined by) drain_obj_stock at mm/memcontrol.c:3059
> refill_obj_stock+0x104/0x680:

-- 
Cheers,
Harry / Hyeonggon



* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-22 11:47       ` Harry Yoo
@ 2026-02-22 23:36         ` Shakeel Butt
  2026-02-22 23:48           ` Shakeel Butt
  0 siblings, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2026-02-22 23:36 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Venkat Rao Bagalkote, Vlastimil Babka, Carlos Maiolino,
	Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs,
	linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani,
	ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li

On Sun, Feb 22, 2026 at 08:47:03PM +0900, Harry Yoo wrote:
[...]
> 
> It seems it crashed while dereferencing objcg->ref->data->count.
> I think that implies that obj_cgroup_release()->percpu_ref_exit()
> is already called due to the refcount reaching zero and set
> ref->data = NULL.
> 
> Wait, was the stock->objcg ever a valid objcg?
> I think it should be valid when refilling the obj stock, otherwise
> it should have crashed in refill_obj_stock() -> obj_cgroup_get() path
> in the first place, rather than crashing when draining.
> 
> And that sounds like we're somehow calling obj_cgroup_put() more times
> than obj_cgroup_get().
> 
> Anyway, this is my theory that it may be due to mis-refcounting of objcgs.
> 

I have not looked deeper into recent slub changes (sheaves or obj_exts savings)
but one thing looks weird to me:

allocate_slab() // for cache with SLAB_OBJ_EXT_IN_OBJ
	-> alloc_slab_obj_exts_early()
		-> slab_set_stride(slab, s->size)
	-> account_slab()
		-> alloc_slab_obj_exts()
			-> slab_set_stride(slab, sizeof(struct slabobj_ext));

Unconditional overwrite of stride. Not sure if it is an issue or even related to
this crash, but it looks odd.



* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-22 23:36         ` Shakeel Butt
@ 2026-02-22 23:48           ` Shakeel Butt
  2026-02-23  2:36             ` Harry Yoo
  0 siblings, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2026-02-22 23:48 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Venkat Rao Bagalkote, Vlastimil Babka, Carlos Maiolino,
	Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs,
	linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani,
	ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li

On Sun, Feb 22, 2026 at 03:36:46PM -0800, Shakeel Butt wrote:
> On Sun, Feb 22, 2026 at 08:47:03PM +0900, Harry Yoo wrote:
> [...]
> > 
> > It seems it crashed while dereferencing objcg->ref->data->count.
> > I think that implies that obj_cgroup_release()->percpu_ref_exit()
> > is already called due to the refcount reaching zero and set
> > ref->data = NULL.
> > 
> > Wait, was the stock->objcg ever a valid objcg?
> > I think it should be valid when refilling the obj stock, otherwise
> > it should have crashed in refill_obj_stock() -> obj_cgroup_get() path
> > in the first place, rather than crashing when draining.
> > 
> > And that sounds like we're somehow calling obj_cgroup_put() more times
> > than obj_cgroup_get().
> > 
> > Anyway, this is my theory that it may be due to mis-refcounting of objcgs.
> > 
> 
> I have not looked deeper into recent slub changes (sheafs or obj_exts savings)
> but one thing looks weird to me:
> 
> allocate_slab() // for cache with SLAB_OBJ_EXT_IN_OBJ
> 	-> alloc_slab_obj_exts_early()
> 		-> slab_set_stride(slab, s->size)
> 	-> account_slab()
> 		-> alloc_slab_obj_exts()
> 			-> slab_set_stride(slab, sizeof(struct slabobj_ext));
> 
> Unconditional overwrite of stride. Not sure if it is issue or even related to
> this crash but looks odd.

I asked AI to debug this crash report, with a nudge to look for stride
corruption, and it gave me the following output:


# Stride Corruption Bug Analysis

## Bug Report Context

- **Crash Location**: `drain_obj_stock+0x620/0xa48` in `obj_cgroup_put(old)` at mm/memcontrol.c:3059
- **Root Cause**: `percpu_ref.data` is NULL, meaning `obj_cgroup_release()` already ran
- **Platform**: IBM Power11 (pSeries LPAR, Radix MMU, LE, 64K pages, kernel 6.19.0-next-20260216)
- **Trigger**: xfstests generic/428

## Identified Bug: Unconditional Stride Overwrite

### Location: mm/slub.c lines 2196-2223 (alloc_slab_obj_exts)

```c
retry:
    old_exts = READ_ONCE(slab->obj_exts);
    handle_failed_objexts_alloc(old_exts, vec, objects);
    slab_set_stride(slab, sizeof(struct slabobj_ext));  // BUG: UNCONDITIONALLY SET

    if (new_slab) {
        slab->obj_exts = new_exts;
    } else if (old_exts & ~OBJEXTS_FLAGS_MASK) {
        // obj_exts already exists, BUT stride was already modified above!
        mark_objexts_empty(vec);
        kfree(vec);
        return 0;
    } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
        goto retry;
    }
```

### The Problem

The stride is set to `sizeof(struct slabobj_ext)` **BEFORE** checking if `obj_exts` already
exists. If a slab was created with `SLAB_OBJ_EXT_IN_OBJ` mode (where stride = `s->size`),
and later `alloc_slab_obj_exts` is called for any reason, the stride gets corrupted.

### Stride Modes

There are two stride modes (see alloc_slab_obj_exts_early):

1. **Normal mode**: stride = `sizeof(struct slabobj_ext)` (~16 bytes)
   - obj_exts is a separate array or in slab leftover space

2. **SLAB_OBJ_EXT_IN_OBJ mode**: stride = `s->size` (object size, e.g., 64-256+ bytes)
   - obj_ext is embedded within each object at a fixed offset

### Consequences of Wrong Stride

When `slab_obj_ext` is later called:
```c
obj_ext = (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
```

With corrupted stride (16 instead of 256):
- **Expected**: `obj_exts + 256 * 5 = obj_exts + 1280` (correct obj_ext for object 5)
- **Actual**: `obj_exts + 16 * 5 = obj_exts + 80` (WRONG: not a multiple of 256, so it is not any object's obj_ext)

This causes:
1. Reading wrong object's objcg pointer
2. Releasing wrong objcg reference (`obj_cgroup_put`)
3. Reference underflow on victim objcg
4. Early `obj_cgroup_release()` → `percpu_ref_exit()` → `data = NULL`
5. Stock still caches the objcg pointer
6. Later `drain_obj_stock()` tries to put it → **CRASH**
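
The arithmetic above can be sketched in userspace C. This is only an illustration, not kernel code: `obj_ext_offset()` and `offset_is_valid_slot()` are hypothetical helpers, and the 256-byte object size is an assumed example value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a kernel function): byte offset of the
 * obj_ext for object `index`, mirroring the stride * index arithmetic. */
static size_t obj_ext_offset(size_t stride, size_t index)
{
    return stride * index;
}

/* Returns 1 if `offset` lands exactly on some object's obj_ext when the
 * real per-object stride is `real_stride`, 0 otherwise. */
static int offset_is_valid_slot(size_t offset, size_t real_stride)
{
    assert(real_stride != 0);
    return offset % real_stride == 0;
}
```

With a real stride of 256, `obj_ext_offset(16, 5)` gives 80, which `offset_is_valid_slot()` rejects, whereas `obj_ext_offset(256, 5)` gives 1280, which it accepts.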

## Missing Safety Check

`slab_obj_ext()` in mm/slab.h has **no bounds checking**:

```c
static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
                                               unsigned long obj_exts,
                                               unsigned int index)
{
    struct slabobj_ext *obj_ext;

    VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
    // MISSING: VM_WARN_ON_ONCE(index >= slab->objects);

    obj_ext = (struct slabobj_ext *)(obj_exts +
                                     slab_get_stride(slab) * index);
    return kasan_reset_tag(obj_ext);
}
```

## CRITICAL: Memory Ordering Bug on PowerPC (Likely Root Cause)

### The Problem

In `alloc_slab_obj_exts` (mm/slub.c lines 2199-2220), there is **NO memory barrier**
between the stride store and the obj_exts visibility via cmpxchg:

```c
slab_set_stride(slab, sizeof(struct slabobj_ext));  // Store to stride (line 2199)
                                                     // NO MEMORY BARRIER HERE!
if (new_slab) {
    slab->obj_exts = new_exts;                       // Store to obj_exts (line 2207)
} else if (...) {
} else if (cmpxchg(&slab->obj_exts, ...) != ...) {   // Atomic on obj_exts (line 2220)
    goto retry;
}
```

### Why This Crashes on PowerPC

PowerPC has a **weakly-ordered memory model**. Stores can be reordered and may not be
immediately visible to other processors. The cmpxchg provides a barrier AFTER it
executes, but the stride store BEFORE cmpxchg may not be visible when obj_exts becomes
visible.

**Race Scenario:**
1. CPU A: `slab_set_stride(slab, 16)` (store to stride, in CPU A's store buffer)
2. CPU A: `cmpxchg(&slab->obj_exts, 0, new_exts)` succeeds, obj_exts is now visible
3. CPU B: Sees `obj_exts` is set (from step 2)
4. CPU B: Reads `slab->stride` → **sees OLD value (0 or garbage)** due to reordering!
5. CPU B: `slab_obj_ext` calculates `obj_exts + 0 * index = obj_exts` for ALL indices!
6. **All objects appear to share the same obj_ext at offset 0**

### Consequences

- Object 0's objcg is correct
- Object 1..N all read object 0's objcg (WRONG!)
- When freeing multiple objects, we `obj_cgroup_put` the SAME objcg multiple times
- Reference count underflows → early `obj_cgroup_release()`
- `percpu_ref_exit()` sets `data = NULL`
- Later stock drain tries to put the objcg → **CRASH in `drain_obj_stock`**
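
A toy userspace model of the chain above, under loud assumptions: `toy_objcg`, `toy_obj_ext` and `charge_then_free()` are made-up names, the stride is counted in array elements rather than bytes, and the model covers only the index arithmetic and the get/put imbalance. It does not reproduce the kernel's actual charge or free paths.

```c
#include <assert.h>
#include <stddef.h>

#define NR_OBJECTS 4

/* Toy stand-in for an objcg: just a reference count. */
struct toy_objcg {
    long refcount;
};

/* Toy stand-in for struct slabobj_ext: one objcg pointer per object. */
struct toy_obj_ext {
    struct toy_objcg *objcg;
};

/* Look up the slot for `index`; stride is in array elements here. */
static struct toy_obj_ext *toy_obj_ext(struct toy_obj_ext *base,
                                       size_t stride, size_t index)
{
    return base + stride * index;
}

/*
 * Charge every object to its own objcg (one get each), then free every
 * object by looking up its slot with `free_stride` (one put each).
 * Returns the final refcount of object 0's objcg.
 */
static long charge_then_free(size_t free_stride)
{
    struct toy_objcg cgs[NR_OBJECTS];
    struct toy_obj_ext exts[NR_OBJECTS];
    size_t i;

    assert(free_stride <= 1); /* keep lookups inside the toy array */

    for (i = 0; i < NR_OBJECTS; i++) {
        cgs[i].refcount = 1;     /* base reference held elsewhere */
        exts[i].objcg = &cgs[i];
        cgs[i].refcount++;       /* get on charge */
    }
    for (i = 0; i < NR_OBJECTS; i++)
        toy_obj_ext(exts, free_stride, i)->objcg->refcount--; /* put on free */

    return cgs[0].refcount;
}
```

With the correct stride (1), object 0's objcg keeps its base reference. With a stride of 0, all four puts land on that one objcg and its count goes negative, which is the kind of underflow described above.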

### Why This Matches the Bug Report

- **Platform**: IBM Power11 (PowerPC) - weakly ordered memory
- **Trigger**: xfstests generic/428 - creates high filesystem activity with many allocations
- **Crash location**: `drain_obj_stock` → `obj_cgroup_put` with NULL data
- **Pattern**: Stock has cached objcg that was prematurely released

### Proposed Fix

Add a write memory barrier to ensure stride is visible before obj_exts:

```c
slab_set_stride(slab, sizeof(struct slabobj_ext));
smp_wmb();  // Ensure stride is visible before obj_exts

if (new_slab) {
    slab->obj_exts = new_exts;
} else if (...) {
} else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
    goto retry;
}
```

And correspondingly, the reader side should use a read barrier:
```c
static inline unsigned short slab_get_stride(struct slab *slab)
{
    // Need acquire semantics when reading stride after seeing obj_exts
    return smp_load_acquire(&slab->stride);
}
```

Or use `smp_store_release` / `smp_load_acquire` pairs for proper ordering.

### Also Applies to alloc_slab_obj_exts_early

The same issue exists in `alloc_slab_obj_exts_early` (lines 2290-2291 and 2308-2309):

```c
slab->obj_exts = obj_exts;                           // Store obj_exts
slab_set_stride(slab, sizeof(struct slabobj_ext));   // Store stride AFTER!
```

Here the order is **reversed** - obj_exts is set BEFORE stride! This is even worse
for memory ordering, as other CPUs could see obj_exts before stride is set.

## Original Theory: Unconditional Stride Overwrite

(Kept for reference - less likely to be the root cause on this specific crash)

The stride is set to `sizeof(struct slabobj_ext)` **BEFORE** checking if `obj_exts`
already exists. However, analysis shows this is protected by the TOCTOU check in
callers (`!slab_obj_exts(slab)`).

## Trigger Scenarios

1. **Memory ordering on PowerPC** (MOST LIKELY): Stride not visible when obj_exts
   becomes visible due to missing memory barriers.

2. **Race between alloc_slab_obj_exts calls**: Two CPUs trying to allocate obj_exts
   for the same slab simultaneously.

3. **Interaction with RCU free path**: Objects in RCU sheaf being processed when
   stride is stale/zero.

## Confirmed Code Analysis (CONFIG_64BIT)

On 64-bit systems (including IBM Power11), the stride is stored dynamically:

**mm/slab.h:562-569**:
```c
#ifdef CONFIG_64BIT
static inline void slab_set_stride(struct slab *slab, unsigned short stride)
{
    slab->stride = stride;  // Plain store - NO memory ordering!
}
static inline unsigned short slab_get_stride(struct slab *slab)
{
    return slab->stride;    // Plain load - NO memory ordering!
}
```

**mm/slab.h:533-548** (`slab_obj_exts`):
```c
static inline unsigned long slab_obj_exts(struct slab *slab)
{
    unsigned long obj_exts = READ_ONCE(slab->obj_exts);  // Only compiler barrier!
    // ... validation ...
    return obj_exts & ~OBJEXTS_FLAGS_MASK;
}
```

`READ_ONCE` only provides compiler ordering, NOT CPU memory ordering. There's no
acquire barrier to ensure the stride read happens after seeing obj_exts.

## Complete Fix Using Release/Acquire Semantics

### Fix 1: Reader side - slab_obj_exts (mm/slab.h)

Change `READ_ONCE` to `smp_load_acquire`:

```c
static inline unsigned long slab_obj_exts(struct slab *slab)
{
    unsigned long obj_exts = smp_load_acquire(&slab->obj_exts);  // Acquire barrier
    // ... validation ...
    return obj_exts & ~OBJEXTS_FLAGS_MASK;
}
```

### Fix 2: Writer side - alloc_slab_obj_exts (mm/slub.c:2196-2223)

Use `smp_store_release` for obj_exts after setting stride:

```c
retry:
    old_exts = READ_ONCE(slab->obj_exts);
    handle_failed_objexts_alloc(old_exts, vec, objects);

    if (new_slab) {
        slab_set_stride(slab, sizeof(struct slabobj_ext));
        smp_store_release(&slab->obj_exts, new_exts);  // Release barrier
    } else if (old_exts & ~OBJEXTS_FLAGS_MASK) {
        mark_objexts_empty(vec);
        kfree(vec);
        return 0;
    } else {
        slab_set_stride(slab, sizeof(struct slabobj_ext));
        // cmpxchg already provides release semantics, but stride must be
        // visible before cmpxchg. Need explicit barrier:
        smp_wmb();
        if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts)
            goto retry;
    }
```

### Fix 3: Writer side - alloc_slab_obj_exts_early (mm/slub.c:2290-2291, 2308-2309)

The order is REVERSED here - obj_exts is set BEFORE stride! Fix by using
`smp_store_release`:

```c
// For normal obj_exts (lines 2290-2291):
slab_set_stride(slab, sizeof(struct slabobj_ext));  // Set stride FIRST
smp_store_release(&slab->obj_exts, obj_exts);       // Then release obj_exts

// For SLAB_OBJ_EXT_IN_OBJ (lines 2308-2309):
slab_set_stride(slab, s->size);                     // Set stride FIRST
smp_store_release(&slab->obj_exts, obj_exts);       // Then release obj_exts
```

## Why This Fixes the Crash

With proper release/acquire ordering:

1. **Writer** (CPU A): Sets stride, then `smp_store_release(&obj_exts, ...)` ensures
   stride is visible to all CPUs before obj_exts becomes visible

2. **Reader** (CPU B): `smp_load_acquire(&obj_exts)` ensures that if obj_exts is
   seen as set, the subsequent stride read will see the correct value

This prevents the race where CPU B sees obj_exts but reads stale/zero stride,
which caused all objects to appear to share obj_ext at offset 0, leading to
multiple `obj_cgroup_put` calls on the same objcg → reference underflow → crash.
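
The writer/reader pairing can be sketched with C11 atomics in userspace. This is a minimal sketch under stated assumptions, not the kernel change itself: `toy_slab`, `toy_publish()` and `toy_consume()` are hypothetical names, and `memory_order_release` / `memory_order_acquire` stand in for `smp_store_release` / `smp_load_acquire`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for the two struct slab fields discussed above. */
struct toy_slab {
    unsigned short stride;       /* plain field, published via obj_exts */
    _Atomic uintptr_t obj_exts;  /* 0 means "not set yet" */
};

/* Writer: store stride first, then publish obj_exts with release
 * semantics so the stride store cannot be observed after it. */
static void toy_publish(struct toy_slab *slab, uintptr_t exts,
                        unsigned short stride)
{
    assert(exts != 0);
    slab->stride = stride;
    atomic_store_explicit(&slab->obj_exts, exts, memory_order_release);
}

/* Reader: acquire-load obj_exts and read stride only if it was set.
 * Returns 1 and fills *stride on success, 0 if obj_exts is not set. */
static int toy_consume(struct toy_slab *slab, unsigned short *stride)
{
    uintptr_t exts = atomic_load_explicit(&slab->obj_exts,
                                          memory_order_acquire);

    if (!exts)
        return 0;
    *stride = slab->stride;
    return 1;
}
```

In a real race the two functions would run on different CPUs. The point of the pairing is that a reader which sees a non-zero `obj_exts` is guaranteed to also see the stride written before it.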

## Additional Safety: Bounds Check in slab_obj_ext

Add bounds check to catch any remaining issues:
```c
VM_WARN_ON_ONCE(index >= slab->objects);
```



* Re: [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path)
  2026-02-22 23:48           ` Shakeel Butt
@ 2026-02-23  2:36             ` Harry Yoo
  0 siblings, 0 replies; 7+ messages in thread
From: Harry Yoo @ 2026-02-23  2:36 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Venkat Rao Bagalkote, Vlastimil Babka, Carlos Maiolino,
	Johannes Weiner, Michal Hocko, Roman Gushchin, linux-xfs,
	linux-fsdevel, LKML, Madhavan Srinivasan, Ritesh Harjani,
	ojaswin, Muchun Song, Cgroups, linux-mm, Hao Li

On Sun, Feb 22, 2026 at 03:48:53PM -0800, Shakeel Butt wrote:
> On Sun, Feb 22, 2026 at 03:36:46PM -0800, Shakeel Butt wrote:
> > On Sun, Feb 22, 2026 at 08:47:03PM +0900, Harry Yoo wrote:
> > [...]
> > > 
> > > It seems it crashed while dereferencing objcg->ref->data->count.
> > > I think that implies that obj_cgroup_release()->percpu_ref_exit()
> > > is already called due to the refcount reaching zero and set
> > > ref->data = NULL.
> > > 
> > > Wait, was the stock->objcg ever a valid objcg?
> > > I think it should be valid when refilling the obj stock, otherwise
> > > it should have crashed in refill_obj_stock() -> obj_cgroup_get() path
> > > in the first place, rather than crashing when draining.
> > > 
> > > And that sounds like we're somehow calling obj_cgroup_put() more times
> > > than obj_cgroup_get().
> > > 
> > > Anyway, this is my theory that it may be due to mis-refcounting of objcgs.
> > > 
> > 
> > I have not looked deeper into recent slub changes (sheafs or obj_exts savings)
> > but one thing looks weird to me:
> > 
> > allocate_slab() // for cache with SLAB_OBJ_EXT_IN_OBJ
> > 	-> alloc_slab_obj_exts_early()
> > 		-> slab_set_stride(slab, s->size)
> > 	-> account_slab()
> > 		-> alloc_slab_obj_exts()
> > 			-> slab_set_stride(slab, sizeof(struct slabobj_ext));
> > 
> > Unconditional overwrite of stride. Not sure if it is issue or even related to
> > this crash but looks odd.
> 
> I asked AI to debug this crash report along with a nudge towards to look for
> stride corruption, it gave me the following output:

Thanks!

TL;DR: I think stride corruption cannot happen because we don't try to
allocate obj_exts if we already have it (w/ SLAB_OBJ_EXT_IN_OBJ).

But the other analysis, saying it's a memory ordering issue, seems
very relevant.

> # Stride Corruption Bug Analysis
> 
> ## Bug Report Context
> 
> - **Crash Location**: `drain_obj_stock+0x620/0xa48` in `obj_cgroup_put(old)` at mm/memcontrol.c:3059
> - **Root Cause**: `percpu_ref.data` is NULL, meaning `obj_cgroup_release()` already ran
> - **Platform**: IBM Power11 (pSeries LPAR, Radix MMU, LE, 64K pages, kernel 6.19.0-next-20260216)
> - **Trigger**: xfstests generic/428
> 
> ## Identified Bug: Unconditional Stride Overwrite
> 
> ### Location: mm/slub.c lines 2196-2223 (alloc_slab_obj_exts)
> 
> ```c
> retry:
>     old_exts = READ_ONCE(slab->obj_exts);
>     handle_failed_objexts_alloc(old_exts, vec, objects);
>     slab_set_stride(slab, sizeof(struct slabobj_ext));  // BUG: UNCONDITIONALLY SET
> 
>     if (new_slab) {
>         slab->obj_exts = new_exts;
>     } else if (old_exts & ~OBJEXTS_FLAGS_MASK) {
>         // obj_exts already exists, BUT stride was already modified above!
>         mark_objexts_empty(vec);
>         kfree(vec);
>         return 0;
>     } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
>         goto retry;
>     }
> ```
> 
> ### The Problem
> 
> The stride is set to `sizeof(struct slabobj_ext)` **BEFORE** checking if `obj_exts` already
> exists. If a slab was created with `SLAB_OBJ_EXT_IN_OBJ` mode (where stride = `s->size`),
> and later `alloc_slab_obj_exts` is called for any reason, the stride gets corrupted.

I think this part is bogus.

It's not a bug because obj_exts saving is applied only during slab
allocation and initialization. If it benefits from the obj_exts savings,
no need to call alloc_slab_obj_exts() later, thus no corruption.

If it doesn't benefit from the obj_exts savings, the stride size is always
sizeof(struct slabobj_ext).

> ### Stride Modes
> 
> There are two stride modes (see alloc_slab_obj_exts_early):
> 
> 1. **Normal mode**: stride = `sizeof(struct slabobj_ext)` (~16 bytes)
>    - obj_exts is a separate array or in slab leftover space
> 
> 2. **SLAB_OBJ_EXT_IN_OBJ mode**: stride = `s->size` (object size, e.g., 64-256+ bytes)
>    - obj_ext is embedded within each object at a fixed offset
> 
> ### Consequences of Wrong Stride
> 
> When `slab_obj_ext` is later called:
> ```c
> obj_ext = (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
> ```
> 
> With corrupted stride (16 instead of 256):
> - **Expected**: `obj_exts + 256 * 5 = obj_exts + 1280` (correct obj_ext for object 5)
> - **Actual**: `obj_exts + 16 * 5 = obj_exts + 80` (WRONG obj_ext - belongs to object 0!)

[...snip...]

> ## CRITICAL: Memory Ordering Bug on PowerPC (Likely Root Cause)
> 
> ### The Problem
> 
> In `alloc_slab_obj_exts` (mm/slub.c lines 2199-2220), there is **NO memory barrier**
> between the stride store and the obj_exts visibility via cmpxchg:

This is actually a good point.
 
> ```c
> slab_set_stride(slab, sizeof(struct slabobj_ext));  // Store to stride (line 2199)
>                                                      // NO MEMORY BARRIER HERE!
> if (new_slab) {
>     slab->obj_exts = new_exts;                       // Store to obj_exts (line 2207)
> } else if (...) {
> } else if (cmpxchg(&slab->obj_exts, ...) != ...) {   // Atomic on obj_exts (line 2220)
>     goto retry;
> }
> ```
>
> ### Why This Crashes on PowerPC
> 
> PowerPC has a **weakly-ordered memory model**. Stores can be reordered and may not be
> immediately visible to other processors. The cmpxchg provides a barrier AFTER it
> executes, but the stride store BEFORE cmpxchg may not be visible when obj_exts becomes
> visible.
> 
> **Race Scenario:**
> 1. CPU A: `slab_set_stride(slab, 16)` (store to stride, in CPU A's store buffer)
> 2. CPU A: `cmpxchg(&slab->obj_exts, 0, new_exts)` succeeds, obj_exts is now visible
> 3. CPU B: Sees `obj_exts` is set (from step 2)
> 4. CPU B: Reads `slab->stride` → **sees OLD value (0 or garbage)** due to reordering!
> 5. CPU B: `slab_obj_ext` calculates `obj_exts + 0 * index = obj_exts` for ALL indices!
> 6. **All objects appear to share the same obj_ext at offset 0**

Yes, that could actually happen, especially when the cache doesn't
specify SLAB_ACCOUNT but allocates objects with __GFP_ACCOUNT set
(e.g. xarray does that).

With sheaves for all, objects can be in different CPUs' sheaves and they
could try to allocate obj_exts and charge objects from the same slab.

> ### Consequences
> 
> - Object 0's objcg is correct
> - Object 1..N all read object 0's objcg (WRONG!)
> - When freeing multiple objects, we `obj_cgroup_put` the SAME objcg multiple times

Yes, reading the wrong stride value is the path to mis-refcounting
objcgs :P

> - Reference count underflows → early `obj_cgroup_release()`
> - `percpu_ref_exit()` sets `data = NULL`
> - Later stock drain tries to put the objcg → **CRASH in `drain_obj_stock`**

> ### Why This Matches the Bug Report
> 
> - **Platform**: IBM Power11 (PowerPC) - weakly ordered memory
> - **Trigger**: xfstests generic/428 - creates high filesystem activity with many allocations
> - **Crash location**: `drain_obj_stock` → `obj_cgroup_put` with NULL data
> - **Pattern**: Stock has cached objcg that was prematurely released
> 
> ### Proposed Fix
> 
> Add a write memory barrier to ensure stride is visible before obj_exts:
> 
> ```c
> slab_set_stride(slab, sizeof(struct slabobj_ext));
> smp_wmb();  // Ensure stride is visible before obj_exts
> 
> if (new_slab) {
>     slab->obj_exts = new_exts;
> } else if (...) {
> } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
>     goto retry;
> }
> ```

I would say this is overkill. Something like this should suffice:

diff --git a/mm/slub.c b/mm/slub.c
index 865bc050f654..8db93a88bbbf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 retry:
 	old_exts = READ_ONCE(slab->obj_exts);
 	handle_failed_objexts_alloc(old_exts, vec, objects);
-	slab_set_stride(slab, sizeof(struct slabobj_ext));

 	if (new_slab) {
 		/*
@@ -2272,6 +2271,12 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 	void *addr;
 	unsigned long obj_exts;

+	/*
+	 * Initialize stride unconditionally,
+	 * but override if SLAB_OBJ_EXT_IN_OBJ is set.
+	 */
+	slab_set_stride(slab, sizeof(struct slabobj_ext));
+
 	if (!need_slab_obj_exts(s))
 		return;

@@ -2288,7 +2293,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 		obj_exts |= MEMCG_DATA_OBJEXTS;
 #endif
 		slab->obj_exts = obj_exts;
-		slab_set_stride(slab, sizeof(struct slabobj_ext));
 	} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
 		unsigned int offset = obj_exts_offset_in_object(s);

There should be enough memory barriers to ensure the stride and obj_exts
are visible to other CPUs before the slab is accessible by other CPUs
(by acquiring/releasing n->list_lock)

> And correspondingly, the reader side should use a read barrier:
> ```c
> static inline unsigned short slab_get_stride(struct slab *slab)
> {
>     // Need acquire semantics when reading stride after seeing obj_exts
>     return smp_load_acquire(&slab->stride);
> }
> ```
> 
> Or use `smp_store_release` / `smp_load_acquire` pairs for proper ordering.
> 
> ### Also Applies to alloc_slab_obj_exts_early
> 
> The same issue exists in `alloc_slab_obj_exts_early` (lines 2290-2291 and 2308-2309):
> 
> ```c
> slab->obj_exts = obj_exts;                           // Store obj_exts
> slab_set_stride(slab, sizeof(struct slabobj_ext));   // Store stride AFTER!
> ```
> 
> Here the order is **reversed** - obj_exts is set BEFORE stride! This is even worse
> for memory ordering, as other CPUs could see obj_exts before stride is set.

[...]

-- 
Cheers,
Harry / Hyeonggon



Thread overview: 7+ messages
     [not found] <ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com>
     [not found] ` <aZReMzl-S9KM_snh@nidhogg.toxiclabs.cc>
2026-02-18 11:36   ` [next-20260216]NULL pointer dereference in drain_obj_stock() (RCU free path) Vlastimil Babka
2026-02-18 21:25     ` Shakeel Butt
2026-02-22 10:08     ` Venkat Rao Bagalkote
2026-02-22 11:47       ` Harry Yoo
2026-02-22 23:36         ` Shakeel Butt
2026-02-22 23:48           ` Shakeel Butt
2026-02-23  2:36             ` Harry Yoo
