* [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
@ 2026-02-23 7:58 Harry Yoo
2026-02-23 11:44 ` Harry Yoo
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Harry Yoo @ 2026-02-23 7:58 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Alexei Starovoitov, Hao Li, Suren Baghdasaryan, Shakeel Butt,
Muchun Song, Johannes Weiner, Michal Hocko, cgroups, linux-mm,
Venkat Rao Bagalkote
When alloc_slab_obj_exts() is called later (instead of during slab
allocation and initialization), slab->stride and slab->obj_exts are
set when the slab is already accessible by multiple CPUs.

The current implementation does not enforce memory ordering between
slab->stride and slab->obj_exts. However, for correctness, slab->stride
must be visible before slab->obj_exts; otherwise, concurrent readers
may observe slab->obj_exts as non-zero while stride is still stale,
leading to incorrect reference counting of object cgroups.

There has been a bug report [1] that showed symptoms of incorrect
reference counting of object cgroups, which could be triggered by
this memory ordering issue.

Fix this by unconditionally initializing slab->stride in
alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.

This ensures stride is set before the slab becomes visible to
other CPUs via the per-node partial slab list (protected by a spinlock
with acquire/release semantics), preventing them from observing
an inconsistent stride value.
Thanks to Shakeel Butt for pointing out this issue [2].
Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
I tested this patch, but I could not confirm that this actually fixes
the issue reported by [1]. It would be nice if Venkat could help
confirm, but perhaps it's challenging to reliably reproduce...

Since this logically makes sense, it is worth fixing anyway.
mm/slub.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 18c30872d196..afa98065d74f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
retry:
old_exts = READ_ONCE(slab->obj_exts);
handle_failed_objexts_alloc(old_exts, vec, objects);
- slab_set_stride(slab, sizeof(struct slabobj_ext));
if (new_slab) {
/*
@@ -2272,6 +2271,9 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
void *addr;
unsigned long obj_exts;
+ /* Initialize stride early to avoid memory ordering issues */
+ slab_set_stride(slab, sizeof(struct slabobj_ext));
+
if (!need_slab_obj_exts(s))
return;
@@ -2288,7 +2290,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
obj_exts |= MEMCG_DATA_OBJEXTS;
#endif
slab->obj_exts = obj_exts;
- slab_set_stride(slab, sizeof(struct slabobj_ext));
} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
unsigned int offset = obj_exts_offset_in_object(s);
--
2.43.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
2026-02-23 7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
@ 2026-02-23 11:44 ` Harry Yoo
2026-02-23 17:04 ` Vlastimil Babka
2026-02-23 20:23 ` Shakeel Butt
2026-02-24 9:04 ` Venkat Rao Bagalkote
2 siblings, 1 reply; 6+ messages in thread
From: Harry Yoo @ 2026-02-23 11:44 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Hao Li,
Suren Baghdasaryan, Shakeel Butt, Muchun Song, Johannes Weiner,
Michal Hocko, cgroups, linux-mm, Venkat Rao Bagalkote
On Mon, Feb 23, 2026 at 04:58:09PM +0900, Harry Yoo wrote:
> When alloc_slab_obj_exts() is called later in time (instead of at slab
> allocation & initialization step), slab->stride and slab->obj_exts are
> set when the slab is already accessible by multiple CPUs.
>
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. However, for correctness, slab->stride
> must be visible before slab->obj_exts, otherwise concurrent readers
> may observe slab->obj_exts as non-zero while stride is still stale,
> leading to incorrect reference counting of object cgroups.
>
> There has been a bug report [1] that showed symptoms of incorrect
> reference counting of object cgroups, which could be triggered by
> this memory ordering issue.
>
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
>
> This ensures stride is set before the slab becomes visible to
> other CPUs via the per-node partial slab list (protected by spinlock
> with acquire/release semantics), preventing them from observing
> inconsistent stride value.
>
> Thanks to Shakeel Butt for pointing out this issue [2].
>
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
Vlastimil, could you please update the changelog when applying this
to the tree? I think this also explains [3] (thanks for raising it
off-list, Vlastimil!):
When alloc_slab_obj_exts() is called later (instead of during slab
allocation and initialization), slab->stride and slab->obj_exts are
updated after the slab is already accessible by multiple CPUs.

The current implementation does not enforce memory ordering between
slab->stride and slab->obj_exts. For correctness, slab->stride must be
visible before slab->obj_exts. Otherwise, concurrent readers may observe
slab->obj_exts as non-zero while stride is still stale.

With stale slab->stride, slab_obj_ext() could return the wrong obj_ext.
This could cause two problems:

- obj_cgroup_put() is called on the wrong objcg, leading to
  a use-after-free due to incorrect reference counting [1] by
  decrementing the reference count more than it was incremented.

- refill_obj_stock() is called on the wrong objcg, leading to
  a page_counter overflow [2] by uncharging more memory than charged.
Fix this by unconditionally initializing slab->stride in
alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the function.

This ensures updates to slab->stride become visible before the slab
can be accessed by other CPUs via the per-node partial slab list
(protected by spinlock with acquire/release semantics).

Thanks to Shakeel Butt for pointing out this issue [3].

Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
Closes: https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com [2]
Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [3]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
2026-02-23 11:44 ` Harry Yoo
@ 2026-02-23 17:04 ` Vlastimil Babka
0 siblings, 0 replies; 6+ messages in thread
From: Vlastimil Babka @ 2026-02-23 17:04 UTC (permalink / raw)
To: Harry Yoo, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Hao Li,
Suren Baghdasaryan, Shakeel Butt, Muchun Song, Johannes Weiner,
Michal Hocko, cgroups, linux-mm, Venkat Rao Bagalkote
On 2/23/26 12:44, Harry Yoo wrote:
> On Mon, Feb 23, 2026 at 04:58:09PM +0900, Harry Yoo wrote:
>> When alloc_slab_obj_exts() is called later in time (instead of at slab
>> allocation & initialization step), slab->stride and slab->obj_exts are
>> set when the slab is already accessible by multiple CPUs.
>>
>> The current implementation does not enforce memory ordering between
>> slab->stride and slab->obj_exts. However, for correctness, slab->stride
>> must be visible before slab->obj_exts, otherwise concurrent readers
>> may observe slab->obj_exts as non-zero while stride is still stale,
>> leading to incorrect reference counting of object cgroups.
>>
>> There has been a bug report [1] that showed symptoms of incorrect
>> reference counting of object cgroups, which could be triggered by
>> this memory ordering issue.
>>
>> Fix this by unconditionally initializing slab->stride in
>> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
>> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
>>
>> This ensures stride is set before the slab becomes visible to
>> other CPUs via the per-node partial slab list (protected by spinlock
>> with acquire/release semantics), preventing them from observing
>> inconsistent stride value.
>>
>> Thanks to Shakeel Butt for pointing out this issue [2].
>>
>> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
>> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
>> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> ---
>
> Vlastimil, could you please update the changelog when applying this
> to the tree? I think this also explains [3] (thanks for raising it
> off-list, Vlastimil!):
Done, thanks! Added to slab/for-next-fixes
> When alloc_slab_obj_exts() is called later (instead of during slab
> allocation and initialization), slab->stride and slab->obj_exts are
> updated after the slab is already accessible by multiple CPUs.
>
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. For correctness, slab->stride must be
> visible before slab->obj_exts. Otherwise, concurrent readers may observe
> slab->obj_exts as non-zero while stride is still stale.
>
> With stale slab->stride, slab_obj_ext() could return the wrong obj_ext.
> This could cause two problems:
>
> - obj_cgroup_put() is called on the wrong objcg, leading to
> a use-after-free due to incorrect reference counting [1] by
> decrementing the reference count more than it was incremented.
>
> - refill_obj_stock() is called on the wrong objcg, leading to
> a page_counter overflow [2] by uncharging more memory than charged.
>
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the function.
>
> This ensures updates to slab->stride become visible before the slab
> can be accessed by other CPUs via the per-node partial slab list
> (protected by spinlock with acquire/release semantics).
>
> Thanks to Shakeel Butt for pointing out this issue [3].
>
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Closes: https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com [2]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [3]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
2026-02-23 7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
2026-02-23 11:44 ` Harry Yoo
@ 2026-02-23 20:23 ` Shakeel Butt
2026-02-24 9:04 ` Venkat Rao Bagalkote
2 siblings, 0 replies; 6+ messages in thread
From: Shakeel Butt @ 2026-02-23 20:23 UTC (permalink / raw)
To: Harry Yoo
Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Alexei Starovoitov, Hao Li,
Suren Baghdasaryan, Muchun Song, Johannes Weiner, Michal Hocko,
cgroups, linux-mm, Venkat Rao Bagalkote
On Mon, Feb 23, 2026 at 04:58:09PM +0900, Harry Yoo wrote:
> When alloc_slab_obj_exts() is called later in time (instead of at slab
> allocation & initialization step), slab->stride and slab->obj_exts are
> set when the slab is already accessible by multiple CPUs.
>
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. However, for correctness, slab->stride
> must be visible before slab->obj_exts, otherwise concurrent readers
> may observe slab->obj_exts as non-zero while stride is still stale,
> leading to incorrect reference counting of object cgroups.
>
> There has been a bug report [1] that showed symptoms of incorrect
> reference counting of object cgroups, which could be triggered by
> this memory ordering issue.
>
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
>
> This ensures stride is set before the slab becomes visible to
> other CPUs via the per-node partial slab list (protected by spinlock
> with acquire/release semantics), preventing them from observing
> inconsistent stride value.
>
> Thanks to Shakeel Butt for pointing out this issue [2].
>
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
2026-02-23 7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
2026-02-23 11:44 ` Harry Yoo
2026-02-23 20:23 ` Shakeel Butt
@ 2026-02-24 9:04 ` Venkat Rao Bagalkote
2026-02-24 11:10 ` Harry Yoo
2 siblings, 1 reply; 6+ messages in thread
From: Venkat Rao Bagalkote @ 2026-02-24 9:04 UTC (permalink / raw)
To: Harry Yoo, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin,
Alexei Starovoitov, Hao Li, Suren Baghdasaryan, Shakeel Butt,
Muchun Song, Johannes Weiner, Michal Hocko, cgroups, linux-mm
On 23/02/26 1:28 pm, Harry Yoo wrote:
> When alloc_slab_obj_exts() is called later in time (instead of at slab
> allocation & initialization step), slab->stride and slab->obj_exts are
> set when the slab is already accessible by multiple CPUs.
>
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. However, for correctness, slab->stride
> must be visible before slab->obj_exts, otherwise concurrent readers
> may observe slab->obj_exts as non-zero while stride is still stale,
> leading to incorrect reference counting of object cgroups.
>
> There has been a bug report [1] that showed symptoms of incorrect
> reference counting of object cgroups, which could be triggered by
> this memory ordering issue.
>
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
>
> This ensures stride is set before the slab becomes visible to
> other CPUs via the per-node partial slab list (protected by spinlock
> with acquire/release semantics), preventing them from observing
> inconsistent stride value.
>
> Thanks to Shakeel Butt for pointing out this issue [2].
>
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>
> I tested this patch, but I could not confirm that this actually fixes
> the issue reported by [1]. It would be nice if Venkat could help
> confirm; but perhaps it's challenging to reliably reproduce...
Thanks for the patch. I ran the complete test suite, and
unfortunately the issue still reproduces.

I applied this patch on the mainline repo for testing.
Traces:
[ 9316.514161] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 9316.514169] Faulting instruction address: 0xc0000000008b2ff4
[ 9316.514176] Oops: Kernel access of bad area, sig: 7 [#1]
[ 9316.514182] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
[ 9316.514189] Modules linked in: overlay dm_zero dm_thin_pool
dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop
dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set bonding nf_tables tls
sunrpc rfkill nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16
mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth
scsi_transport_srp pseries_wdt [last unloaded: scsi_debug]
[ 9316.514295] CPU: 16 UID: 0 PID: 0 Comm: swapper/16 Kdump: loaded
Tainted: G W 7.0.0-rc1+ #1 PREEMPTLAZY
[ 9316.514306] Tainted: [W]=WARN
[ 9316.514311] Hardware name: IBM,9080-HEX Power11 (architected)
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 9316.514318] NIP: c0000000008b2ff4 LR: c0000000008b2fec CTR:
c00000000036d680
[ 9316.514326] REGS: c000000d0dcb7870 TRAP: 0300 Tainted: G W
(7.0.0-rc1+)
[ 9316.514333] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR:
84042802 XER: 20040000
[ 9316.514356] CFAR: c000000000862e94 DAR: 0000000000000000 DSISR:
00080000 IRQMASK: 0
[ 9316.514356] GPR00: c0000000008b2fec c000000d0dcb7b10 c00000000243a500
0000000000000001
[ 9316.514356] GPR04: 0000000000000008 0000000000000001 c0000000008b2fec
0000000000000001
[ 9316.514356] GPR08: a80e000000000000 0000000000000001 0000000000000007
a80e000000000000
[ 9316.514356] GPR12: c00e00000e7b6cd5 c000000d0ddf4700 c000000129a98e00
0000000000000006
[ 9316.514356] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980
c000000007012f88
[ 9316.514356] GPR20: c00c000000021bec c000000d0d07f008 0000000000000001
ffffffffffffff78
[ 9316.514356] GPR24: 0000000000000005 c000000d0d58f180 c0000000032cf000
c000000d0ddf4700
[ 9316.514356] GPR28: 0000000000000088 0000000000000000 c000000129a98e00
c000000d0d07f000
[ 9316.514457] NIP [c0000000008b2ff4] refill_obj_stock+0x5b4/0x680
[ 9316.514467] LR [c0000000008b2fec] refill_obj_stock+0x5ac/0x680
[ 9316.514476] Call Trace:
[ 9316.514481] [c000000d0dcb7b10] [c0000000008b2fec]
refill_obj_stock+0x5ac/0x680 (unreliable)
[ 9316.514494] [c000000d0dcb7b90] [c0000000008b9598]
__memcg_slab_free_hook+0x238/0x3ec
[ 9316.514505] [c000000d0dcb7c60] [c0000000007f3d90]
__rcu_free_sheaf_prepare+0x314/0x3e8
[ 9316.514516] [c000000d0dcb7d10] [c0000000007fc2ec]
rcu_free_sheaf+0x38/0x170
[ 9316.514528] [c000000d0dcb7d50] [c000000000334570]
rcu_do_batch+0x2ec/0xfa8
[ 9316.514538] [c000000d0dcb7e50] [c000000000339a08] rcu_core+0x22c/0x48c
[ 9316.514548] [c000000d0dcb7ec0] [c0000000001cfeac]
handle_softirqs+0x1f4/0x74c
[ 9316.514559] [c000000d0dcb7fe0] [c00000000001b0cc]
do_softirq_own_stack+0x60/0x7c
[ 9316.514570] [c0000000096c7930] [c00000000001b0b8]
do_softirq_own_stack+0x4c/0x7c
[ 9316.514581] [c0000000096c7960] [c0000000001cf168]
__irq_exit_rcu+0x268/0x308
[ 9316.514592] [c0000000096c79a0] [c0000000001d0be4] irq_exit+0x20/0x38
[ 9316.514602] [c0000000096c79c0] [c0000000000315f4]
interrupt_async_exit_prepare.constprop.0+0x18/0x2c
[ 9316.514614] [c0000000096c79e0] [c000000000009ffc]
decrementer_common_virt+0x28c/0x290
[ 9316.514626] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
[ 9316.514635] NIP: c00000000012d9f0 LR: c00000000135c0a8 CTR:
0000000000000000
[ 9316.514642] REGS: c0000000096c7a10 TRAP: 0900 Tainted: G W
(7.0.0-rc1+)
[ 9316.514649] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR: 24000804 XER: 00000000
[ 9316.514678] CFAR: 0000000000000000 IRQMASK: 0
[ 9316.514678] GPR00: 0000000000000000 c0000000096c7cb0 c00000000243a500
0000000000000000
[ 9316.514678] GPR04: 0000000000000000 800400002fe6fc10 0000000000000000
0000000000000001
[ 9316.514678] GPR08: 0000000000000030 0000000000000000 0000000000000090
0000000000000001
[ 9316.514678] GPR12: 800400002fe6fc00 c000000d0ddf4700 0000000000000000
000000002ef01a00
[ 9316.514678] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 9316.514678] GPR20: 0000000000000000 0000000000000000 0000000000000000
0000000000000001
[ 9316.514678] GPR24: 0000000000000000 c000000004d7a760 000008792ad04b82
0000000000000000
[ 9316.514678] GPR28: 0000000000000000 0000000000000001 c0000000032b18d8
c0000000032b18e0
[ 9316.514774] NIP [c00000000012d9f0] plpar_hcall_norets_notrace+0x18/0x2c
[ 9316.514782] LR [c00000000135c0a8] cede_processor.isra.0+0x1c/0x34
[ 9316.514792] ---- interrupt: 900
[ 9316.514797] [c0000000096c7cb0] [c0000000096c7cf0] 0xc0000000096c7cf0
(unreliable)
[ 9316.514808] [c0000000096c7d10] [c0000000019af170]
dedicated_cede_loop+0x90/0x170
[ 9316.514819] [c0000000096c7d60] [c0000000019aeb20]
cpuidle_enter_state+0x394/0x480
[ 9316.514830] [c0000000096c7e00] [c00000000135864c] cpuidle_enter+0x64/0x9c
[ 9316.514840] [c0000000096c7e50] [c000000000284b0c] call_cpuidle+0x7c/0xf8
[ 9316.514852] [c0000000096c7e90] [c0000000002903e8]
cpuidle_idle_call+0x1c4/0x2b4
[ 9316.514862] [c0000000096c7f00] [c00000000029060c] do_idle+0x134/0x208
[ 9316.514872] [c0000000096c7f50] [c000000000290a5c]
cpu_startup_entry+0x60/0x64
[ 9316.514882] [c0000000096c7f80] [c000000000074738]
start_secondary+0x3fc/0x400
[ 9316.514894] [c0000000096c7fe0] [c00000000000e258]
start_secondary_prolog+0x10/0x14
[ 9316.514904] Code: eba962a0 4bfffe40 60000000 387e0008 4bfae7c1
60000000 ebbe0008 38800008 7fa3eb78 4bfafe85 60000000 39200001
<7d40e8a8> 7d495214 7d40e9ad 40c2fff4
[ 9316.514941] ---[ end trace 0000000000000000 ]---
Regards,
Venkat.
>
> Since this logically makes sense, it is worth fixing anyway.
>
> mm/slub.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 18c30872d196..afa98065d74f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> retry:
> old_exts = READ_ONCE(slab->obj_exts);
> handle_failed_objexts_alloc(old_exts, vec, objects);
> - slab_set_stride(slab, sizeof(struct slabobj_ext));
>
> if (new_slab) {
> /*
> @@ -2272,6 +2271,9 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> void *addr;
> unsigned long obj_exts;
>
> + /* Initialize stride early to avoid memory ordering issues */
> + slab_set_stride(slab, sizeof(struct slabobj_ext));
> +
> if (!need_slab_obj_exts(s))
> return;
>
> @@ -2288,7 +2290,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> obj_exts |= MEMCG_DATA_OBJEXTS;
> #endif
> slab->obj_exts = obj_exts;
> - slab_set_stride(slab, sizeof(struct slabobj_ext));
> } else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
> unsigned int offset = obj_exts_offset_in_object(s);
>
* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
2026-02-24 9:04 ` Venkat Rao Bagalkote
@ 2026-02-24 11:10 ` Harry Yoo
0 siblings, 0 replies; 6+ messages in thread
From: Harry Yoo @ 2026-02-24 11:10 UTC (permalink / raw)
To: Venkat Rao Bagalkote
Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Alexei Starovoitov, Hao Li,
Suren Baghdasaryan, Shakeel Butt, Muchun Song, Johannes Weiner,
Michal Hocko, cgroups, linux-mm
On Tue, Feb 24, 2026 at 02:34:41PM +0530, Venkat Rao Bagalkote wrote:
>
> On 23/02/26 1:28 pm, Harry Yoo wrote:
> > When alloc_slab_obj_exts() is called later in time (instead of at slab
> > allocation & initialization step), slab->stride and slab->obj_exts are
> > set when the slab is already accessible by multiple CPUs.
> >
> > The current implementation does not enforce memory ordering between
> > slab->stride and slab->obj_exts. However, for correctness, slab->stride
> > must be visible before slab->obj_exts, otherwise concurrent readers
> > may observe slab->obj_exts as non-zero while stride is still stale,
> > leading to incorrect reference counting of object cgroups.
> >
> > There has been a bug report [1] that showed symptoms of incorrect
> > reference counting of object cgroups, which could be triggered by
> > this memory ordering issue.
> >
> > Fix this by unconditionally initializing slab->stride in
> > alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> > In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
> >
> > This ensures stride is set before the slab becomes visible to
> > other CPUs via the per-node partial slab list (protected by spinlock
> > with acquire/release semantics), preventing them from observing
> > inconsistent stride value.
> >
> > Thanks to Shakeel Butt for pointing out this issue [2].
> >
> > Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> > Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> > Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com
> > Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> >
> > I tested this patch, but I could not confirm that this actually fixes
> > the issue reported by [1]. It would be nice if Venkat could help
> > confirm; but perhaps it's challenging to reliably reproduce...
>
>
> Thanks for the patch. I ran the complete test suite, and unfortunately
> the issue still reproduces.
Oops, thanks for confirming that it still reproduces!
That's really helpful.

Perhaps I should start considering cases where it's not a memory
ordering issue, but let's check one more thing before moving on.

Could you please test whether it still reproduces with the following
patch? If it's still reproducible, it should not be due to the memory
ordering issue between obj_exts and stride.
---8<---
From: Harry Yoo <harry.yoo@oracle.com>
Date: Mon, 23 Feb 2026 16:58:09 +0900
Subject: mm/slab: enforce slab->stride -> slab->obj_exts ordering
I tried to avoid unnecessary memory barriers for efficiency,
but the original bug is still reproducible.

Probably I missed a case where an object is allocated on one CPU
and then freed on a different CPU without involving the spinlock.
I'm not sure whether I failed to cover an edge case or it's caused by
something other than a memory ordering issue.

Anyway, let's find out by introducing heavy memory barriers!
Always ensure that updates to stride are visible before obj_exts.
---
mm/slab.h | 1 +
mm/slub.c | 10 +++++++---
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index 71c7261bf822..aacdd9f4e509 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -565,6 +565,7 @@ static inline void slab_set_stride(struct slab *slab, unsigned short stride)
}
static inline unsigned short slab_get_stride(struct slab *slab)
{
+ smp_rmb();
return slab->stride;
}
#else
diff --git a/mm/slub.c b/mm/slub.c
index 862642c165ed..c7c8b660a994 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
retry:
old_exts = READ_ONCE(slab->obj_exts);
handle_failed_objexts_alloc(old_exts, vec, objects);
- slab_set_stride(slab, sizeof(struct slabobj_ext));
if (new_slab) {
/*
@@ -2272,6 +2271,10 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
void *addr;
unsigned long obj_exts;
+ slab_set_stride(slab, sizeof(struct slabobj_ext));
+ /* pairs with smp_rmb() in slab_get_stride() */
+ smp_wmb();
+
if (!need_slab_obj_exts(s))
return;
@@ -2288,7 +2291,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
obj_exts |= MEMCG_DATA_OBJEXTS;
#endif
slab->obj_exts = obj_exts;
- slab_set_stride(slab, sizeof(struct slabobj_ext));
} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
unsigned int offset = obj_exts_offset_in_object(s);
@@ -2305,8 +2307,10 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
#ifdef CONFIG_MEMCG
obj_exts |= MEMCG_DATA_OBJEXTS;
#endif
- slab->obj_exts = obj_exts;
slab_set_stride(slab, s->size);
+ /* pairs with smp_rmb() in slab_get_stride() */
+ smp_wmb();
+ slab->obj_exts = obj_exts;
}
}
--
2.43.0