linux-mm.kvack.org archive mirror
* [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
@ 2026-02-23  7:58 Harry Yoo
  2026-02-23 11:44 ` Harry Yoo
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Harry Yoo @ 2026-02-23  7:58 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
	Alexei Starovoitov, Hao Li, Suren Baghdasaryan, Shakeel Butt,
	Muchun Song, Johannes Weiner, Michal Hocko, cgroups, linux-mm,
	Venkat Rao Bagalkote

When alloc_slab_obj_exts() is called later (instead of during slab
allocation and initialization), slab->stride and slab->obj_exts are
updated after the slab is already accessible by multiple CPUs.

The current implementation does not enforce memory ordering between
slab->stride and slab->obj_exts. For correctness, however, slab->stride
must be visible before slab->obj_exts; otherwise, concurrent readers
may observe slab->obj_exts as non-zero while stride is still stale,
leading to incorrect reference counting of object cgroups.

There has been a bug report [1] that showed symptoms of incorrect
reference counting of object cgroups, which could be triggered by
this memory ordering issue.

Fix this by unconditionally initializing slab->stride in
alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.

This ensures stride is set before the slab becomes visible to
other CPUs via the per-node partial slab list (protected by a spinlock
with acquire/release semantics), preventing them from observing an
inconsistent stride value.

Thanks to Shakeel Butt for pointing out this issue [2].

Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---

I tested this patch, but I could not confirm that it actually fixes
the issue reported in [1]. It would be nice if Venkat could help
confirm; but perhaps it's challenging to reliably reproduce...

Since this logically makes sense, it would be worth fixing anyway.

 mm/slub.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 18c30872d196..afa98065d74f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 retry:
 	old_exts = READ_ONCE(slab->obj_exts);
 	handle_failed_objexts_alloc(old_exts, vec, objects);
-	slab_set_stride(slab, sizeof(struct slabobj_ext));
 
 	if (new_slab) {
 		/*
@@ -2272,6 +2271,9 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 	void *addr;
 	unsigned long obj_exts;
 
+	/* Initialize stride early to avoid memory ordering issues */
+	slab_set_stride(slab, sizeof(struct slabobj_ext));
+
 	if (!need_slab_obj_exts(s))
 		return;
 
@@ -2288,7 +2290,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 		obj_exts |= MEMCG_DATA_OBJEXTS;
 #endif
 		slab->obj_exts = obj_exts;
-		slab_set_stride(slab, sizeof(struct slabobj_ext));
 	} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
 		unsigned int offset = obj_exts_offset_in_object(s);
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
  2026-02-23  7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
@ 2026-02-23 11:44 ` Harry Yoo
  2026-02-23 17:04   ` Vlastimil Babka
  2026-02-23 20:23 ` Shakeel Butt
  2026-02-24  9:04 ` Venkat Rao Bagalkote
  2 siblings, 1 reply; 6+ messages in thread
From: Harry Yoo @ 2026-02-23 11:44 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Hao Li,
	Suren Baghdasaryan, Shakeel Butt, Muchun Song, Johannes Weiner,
	Michal Hocko, cgroups, linux-mm, Venkat Rao Bagalkote

On Mon, Feb 23, 2026 at 04:58:09PM +0900, Harry Yoo wrote:
> When alloc_slab_obj_exts() is called later in time (instead of at slab
> allocation & initialization step), slab->stride and slab->obj_exts are
> set when the slab is already accessible by multiple CPUs.
> 
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. However, for correctness, slab->stride
> must be visible before slab->obj_exts, otherwise concurrent readers
> may observe slab->obj_exts as non-zero while stride is still stale,
> leading to incorrect reference counting of object cgroups.
> 
> There has been a bug report [1] that showed symptoms of incorrect
> reference counting of object cgroups, which could be triggered by
> this memory ordering issue.
> 
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
> 
> This ensures stride is set before the slab becomes visible to
> other CPUs via the per-node partial slab list (protected by spinlock
> with acquire/release semantics), preventing them from observing
> inconsistent stride value.
> 
> Thanks to Shakeel Butt for pointing out this issue [2].
> 
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---

Vlastimil, could you please update the changelog when applying this
to the tree? I think this also explains [3] (thanks for raising it
off-list, Vlastimil!):

When alloc_slab_obj_exts() is called later (instead of during slab
allocation and initialization), slab->stride and slab->obj_exts are
updated after the slab is already accessible by multiple CPUs.

The current implementation does not enforce memory ordering between
slab->stride and slab->obj_exts. For correctness, slab->stride must be
visible before slab->obj_exts. Otherwise, concurrent readers may observe
slab->obj_exts as non-zero while stride is still stale.

With stale slab->stride, slab_obj_ext() could return the wrong obj_ext.
This could cause two problems:

  - obj_cgroup_put() is called on the wrong objcg, leading to
    a use-after-free due to incorrect reference counting [1] by
    decrementing the reference count more than it was incremented.

  - refill_obj_stock() is called on the wrong objcg, leading to
    a page_counter overflow [2] by uncharging more memory than charged.

Fix this by unconditionally initializing slab->stride in
alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.

This ensures updates to slab->stride become visible before the slab
can be accessed by other CPUs via the per-node partial slab list
(protected by spinlock with acquire/release semantics).

Thanks to Shakeel Butt for pointing out this issue [3].

Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
Closes: https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com [2]
Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [3]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
  2026-02-23 11:44 ` Harry Yoo
@ 2026-02-23 17:04   ` Vlastimil Babka
  0 siblings, 0 replies; 6+ messages in thread
From: Vlastimil Babka @ 2026-02-23 17:04 UTC (permalink / raw)
  To: Harry Yoo, Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Hao Li,
	Suren Baghdasaryan, Shakeel Butt, Muchun Song, Johannes Weiner,
	Michal Hocko, cgroups, linux-mm, Venkat Rao Bagalkote

On 2/23/26 12:44, Harry Yoo wrote:
> On Mon, Feb 23, 2026 at 04:58:09PM +0900, Harry Yoo wrote:
>> When alloc_slab_obj_exts() is called later in time (instead of at slab
>> allocation & initialization step), slab->stride and slab->obj_exts are
>> set when the slab is already accessible by multiple CPUs.
>> 
>> The current implementation does not enforce memory ordering between
>> slab->stride and slab->obj_exts. However, for correctness, slab->stride
>> must be visible before slab->obj_exts, otherwise concurrent readers
>> may observe slab->obj_exts as non-zero while stride is still stale,
>> leading to incorrect reference counting of object cgroups.
>> 
>> There has been a bug report [1] that showed symptoms of incorrect
>> reference counting of object cgroups, which could be triggered by
>> this memory ordering issue.
>> 
>> Fix this by unconditionally initializing slab->stride in
>> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
>> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
>> 
>> This ensures stride is set before the slab becomes visible to
>> other CPUs via the per-node partial slab list (protected by spinlock
>> with acquire/release semantics), preventing them from observing
>> inconsistent stride value.
>> 
>> Thanks to Shakeel Butt for pointing out this issue [2].
>> 
>> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
>> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
>> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> ---
> 
> Vlastimil, could you please update the changelog when applying this
> to the tree? I think this also explains [3] (thanks for raising it
> off-list, Vlastimil!):

Done, thanks! Added to slab/for-next-fixes

> When alloc_slab_obj_exts() is called later (instead of during slab
> allocation and initialization), slab->stride and slab->obj_exts are
> updated after the slab is already accessible by multiple CPUs.
> 
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. For correctness, slab->stride must be
> visible before slab->obj_exts. Otherwise, concurrent readers may observe
> slab->obj_exts as non-zero while stride is still stale.
> 
> With stale slab->stride, slab_obj_ext() could return the wrong obj_ext.
> This could cause two problems:
> 
>   - obj_cgroup_put() is called on the wrong objcg, leading to
>     a use-after-free due to incorrect reference counting [1] by
>     decrementing the reference count more than it was incremented.
> 
>   - refill_obj_stock() is called on the wrong objcg, leading to
>     a page_counter overflow [2] by uncharging more memory than charged.
> 
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the function.
> 
> This ensures updates to slab->stride become visible before the slab
> can be accessed by other CPUs via the per-node partial slab list
> (protected by spinlock with acquire/release semantics).
> 
> Thanks to Shakeel Butt for pointing out this issue [3].
> 
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Closes: https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com [2]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [3]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>







* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
  2026-02-23  7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
  2026-02-23 11:44 ` Harry Yoo
@ 2026-02-23 20:23 ` Shakeel Butt
  2026-02-24  9:04 ` Venkat Rao Bagalkote
  2 siblings, 0 replies; 6+ messages in thread
From: Shakeel Butt @ 2026-02-23 20:23 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, Hao Li,
	Suren Baghdasaryan, Muchun Song, Johannes Weiner, Michal Hocko,
	cgroups, linux-mm, Venkat Rao Bagalkote

On Mon, Feb 23, 2026 at 04:58:09PM +0900, Harry Yoo wrote:
> When alloc_slab_obj_exts() is called later in time (instead of at slab
> allocation & initialization step), slab->stride and slab->obj_exts are
> set when the slab is already accessible by multiple CPUs.
> 
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. However, for correctness, slab->stride
> must be visible before slab->obj_exts, otherwise concurrent readers
> may observe slab->obj_exts as non-zero while stride is still stale,
> leading to incorrect reference counting of object cgroups.
> 
> There has been a bug report [1] that showed symptoms of incorrect
> reference counting of object cgroups, which could be triggered by
> this memory ordering issue.
> 
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
> 
> This ensures stride is set before the slab becomes visible to
> other CPUs via the per-node partial slab list (protected by spinlock
> with acquire/release semantics), preventing them from observing
> inconsistent stride value.
> 
> Thanks to Shakeel Butt for pointing out this issue [2].
> 
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
  2026-02-23  7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
  2026-02-23 11:44 ` Harry Yoo
  2026-02-23 20:23 ` Shakeel Butt
@ 2026-02-24  9:04 ` Venkat Rao Bagalkote
  2026-02-24 11:10   ` Harry Yoo
  2 siblings, 1 reply; 6+ messages in thread
From: Venkat Rao Bagalkote @ 2026-02-24  9:04 UTC (permalink / raw)
  To: Harry Yoo, Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, Hao Li, Suren Baghdasaryan, Shakeel Butt,
	Muchun Song, Johannes Weiner, Michal Hocko, cgroups, linux-mm


On 23/02/26 1:28 pm, Harry Yoo wrote:
> When alloc_slab_obj_exts() is called later in time (instead of at slab
> allocation & initialization step), slab->stride and slab->obj_exts are
> set when the slab is already accessible by multiple CPUs.
>
> The current implementation does not enforce memory ordering between
> slab->stride and slab->obj_exts. However, for correctness, slab->stride
> must be visible before slab->obj_exts, otherwise concurrent readers
> may observe slab->obj_exts as non-zero while stride is still stale,
> leading to incorrect reference counting of object cgroups.
>
> There has been a bug report [1] that showed symptoms of incorrect
> reference counting of object cgroups, which could be triggered by
> this memory ordering issue.
>
> Fix this by unconditionally initializing slab->stride in
> alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
>
> This ensures stride is set before the slab becomes visible to
> other CPUs via the per-node partial slab list (protected by spinlock
> with acquire/release semantics), preventing them from observing
> inconsistent stride value.
>
> Thanks to Shakeel Butt for pointing out this issue [2].
>
> Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1]
> Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [2]
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>
> I tested this patch, but I could not confirm that this actually fixes
> the issue reported by [1]. It would be nice if Venkat could help
> confirm; but perhaps it's challenging to reliably reproduce...


Thanks for the patch. I ran the complete test suite, and
unfortunately the issue is still reproducing.

I applied this patch on mainline repo for testing.

Traces:

[ 9316.514161] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 9316.514169] Faulting instruction address: 0xc0000000008b2ff4
[ 9316.514176] Oops: Kernel access of bad area, sig: 7 [#1]
[ 9316.514182] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[ 9316.514189] Modules linked in: overlay dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs loop dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set bonding nf_tables tls sunrpc rfkill nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp pseries_wdt [last unloaded: scsi_debug]
[ 9316.514295] CPU: 16 UID: 0 PID: 0 Comm: swapper/16 Kdump: loaded Tainted: G        W           7.0.0-rc1+ #1 PREEMPTLAZY
[ 9316.514306] Tainted: [W]=WARN
[ 9316.514311] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 9316.514318] NIP:  c0000000008b2ff4 LR: c0000000008b2fec CTR: c00000000036d680
[ 9316.514326] REGS: c000000d0dcb7870 TRAP: 0300   Tainted: G   W       (7.0.0-rc1+)
[ 9316.514333] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 84042802  XER: 20040000
[ 9316.514356] CFAR: c000000000862e94 DAR: 0000000000000000 DSISR: 00080000 IRQMASK: 0
[ 9316.514356] GPR00: c0000000008b2fec c000000d0dcb7b10 c00000000243a500 0000000000000001
[ 9316.514356] GPR04: 0000000000000008 0000000000000001 c0000000008b2fec 0000000000000001
[ 9316.514356] GPR08: a80e000000000000 0000000000000001 0000000000000007 a80e000000000000
[ 9316.514356] GPR12: c00e00000e7b6cd5 c000000d0ddf4700 c000000129a98e00 0000000000000006
[ 9316.514356] GPR16: c000000007012fa0 c000000007012fa4 c000000005160980 c000000007012f88
[ 9316.514356] GPR20: c00c000000021bec c000000d0d07f008 0000000000000001 ffffffffffffff78
[ 9316.514356] GPR24: 0000000000000005 c000000d0d58f180 c0000000032cf000 c000000d0ddf4700
[ 9316.514356] GPR28: 0000000000000088 0000000000000000 c000000129a98e00 c000000d0d07f000
[ 9316.514457] NIP [c0000000008b2ff4] refill_obj_stock+0x5b4/0x680
[ 9316.514467] LR [c0000000008b2fec] refill_obj_stock+0x5ac/0x680
[ 9316.514476] Call Trace:
[ 9316.514481] [c000000d0dcb7b10] [c0000000008b2fec] refill_obj_stock+0x5ac/0x680 (unreliable)
[ 9316.514494] [c000000d0dcb7b90] [c0000000008b9598] __memcg_slab_free_hook+0x238/0x3ec
[ 9316.514505] [c000000d0dcb7c60] [c0000000007f3d90] __rcu_free_sheaf_prepare+0x314/0x3e8
[ 9316.514516] [c000000d0dcb7d10] [c0000000007fc2ec] rcu_free_sheaf+0x38/0x170
[ 9316.514528] [c000000d0dcb7d50] [c000000000334570] rcu_do_batch+0x2ec/0xfa8
[ 9316.514538] [c000000d0dcb7e50] [c000000000339a08] rcu_core+0x22c/0x48c
[ 9316.514548] [c000000d0dcb7ec0] [c0000000001cfeac] handle_softirqs+0x1f4/0x74c
[ 9316.514559] [c000000d0dcb7fe0] [c00000000001b0cc] do_softirq_own_stack+0x60/0x7c
[ 9316.514570] [c0000000096c7930] [c00000000001b0b8] do_softirq_own_stack+0x4c/0x7c
[ 9316.514581] [c0000000096c7960] [c0000000001cf168] __irq_exit_rcu+0x268/0x308
[ 9316.514592] [c0000000096c79a0] [c0000000001d0be4] irq_exit+0x20/0x38
[ 9316.514602] [c0000000096c79c0] [c0000000000315f4] interrupt_async_exit_prepare.constprop.0+0x18/0x2c
[ 9316.514614] [c0000000096c79e0] [c000000000009ffc] decrementer_common_virt+0x28c/0x290
[ 9316.514626] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
[ 9316.514635] NIP:  c00000000012d9f0 LR: c00000000135c0a8 CTR: 0000000000000000
[ 9316.514642] REGS: c0000000096c7a10 TRAP: 0900   Tainted: G   W       (7.0.0-rc1+)
[ 9316.514649] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000804  XER: 00000000
[ 9316.514678] CFAR: 0000000000000000 IRQMASK: 0
[ 9316.514678] GPR00: 0000000000000000 c0000000096c7cb0 c00000000243a500 0000000000000000
[ 9316.514678] GPR04: 0000000000000000 800400002fe6fc10 0000000000000000 0000000000000001
[ 9316.514678] GPR08: 0000000000000030 0000000000000000 0000000000000090 0000000000000001
[ 9316.514678] GPR12: 800400002fe6fc00 c000000d0ddf4700 0000000000000000 000000002ef01a00
[ 9316.514678] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 9316.514678] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[ 9316.514678] GPR24: 0000000000000000 c000000004d7a760 000008792ad04b82 0000000000000000
[ 9316.514678] GPR28: 0000000000000000 0000000000000001 c0000000032b18d8 c0000000032b18e0
[ 9316.514774] NIP [c00000000012d9f0] plpar_hcall_norets_notrace+0x18/0x2c
[ 9316.514782] LR [c00000000135c0a8] cede_processor.isra.0+0x1c/0x34
[ 9316.514792] ---- interrupt: 900
[ 9316.514797] [c0000000096c7cb0] [c0000000096c7cf0] 0xc0000000096c7cf0 (unreliable)
[ 9316.514808] [c0000000096c7d10] [c0000000019af170] dedicated_cede_loop+0x90/0x170
[ 9316.514819] [c0000000096c7d60] [c0000000019aeb20] cpuidle_enter_state+0x394/0x480
[ 9316.514830] [c0000000096c7e00] [c00000000135864c] cpuidle_enter+0x64/0x9c
[ 9316.514840] [c0000000096c7e50] [c000000000284b0c] call_cpuidle+0x7c/0xf8
[ 9316.514852] [c0000000096c7e90] [c0000000002903e8] cpuidle_idle_call+0x1c4/0x2b4
[ 9316.514862] [c0000000096c7f00] [c00000000029060c] do_idle+0x134/0x208
[ 9316.514872] [c0000000096c7f50] [c000000000290a5c] cpu_startup_entry+0x60/0x64
[ 9316.514882] [c0000000096c7f80] [c000000000074738] start_secondary+0x3fc/0x400
[ 9316.514894] [c0000000096c7fe0] [c00000000000e258] start_secondary_prolog+0x10/0x14
[ 9316.514904] Code: eba962a0 4bfffe40 60000000 387e0008 4bfae7c1 60000000 ebbe0008 38800008 7fa3eb78 4bfafe85 60000000 39200001 <7d40e8a8> 7d495214 7d40e9ad 40c2fff4
[ 9316.514941] ---[ end trace 0000000000000000 ]---


Regards,

Venkat.

>
> Since this logically makes sense, it would be worth fix it anyway.
>
>   mm/slub.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 18c30872d196..afa98065d74f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>   retry:
>   	old_exts = READ_ONCE(slab->obj_exts);
>   	handle_failed_objexts_alloc(old_exts, vec, objects);
> -	slab_set_stride(slab, sizeof(struct slabobj_ext));
>   
>   	if (new_slab) {
>   		/*
> @@ -2272,6 +2271,9 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
>   	void *addr;
>   	unsigned long obj_exts;
>   
> +	/* Initialize stride early to avoid memory ordering issues */
> +	slab_set_stride(slab, sizeof(struct slabobj_ext));
> +
>   	if (!need_slab_obj_exts(s))
>   		return;
>   
> @@ -2288,7 +2290,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
>   		obj_exts |= MEMCG_DATA_OBJEXTS;
>   #endif
>   		slab->obj_exts = obj_exts;
> -		slab_set_stride(slab, sizeof(struct slabobj_ext));
>   	} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
>   		unsigned int offset = obj_exts_offset_in_object(s);
>   



* Re: [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues
  2026-02-24  9:04 ` Venkat Rao Bagalkote
@ 2026-02-24 11:10   ` Harry Yoo
  0 siblings, 0 replies; 6+ messages in thread
From: Harry Yoo @ 2026-02-24 11:10 UTC (permalink / raw)
  To: Venkat Rao Bagalkote
  Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter,
	David Rientjes, Roman Gushchin, Alexei Starovoitov, Hao Li,
	Suren Baghdasaryan, Shakeel Butt, Muchun Song, Johannes Weiner,
	Michal Hocko, cgroups, linux-mm

On Tue, Feb 24, 2026 at 02:34:41PM +0530, Venkat Rao Bagalkote wrote:
> 
> On 23/02/26 1:28 pm, Harry Yoo wrote:
> > When alloc_slab_obj_exts() is called later in time (instead of at slab
> > allocation & initialization step), slab->stride and slab->obj_exts are
> > set when the slab is already accessible by multiple CPUs.
> > 
> > The current implementation does not enforce memory ordering between
> > slab->stride and slab->obj_exts. However, for correctness, slab->stride
> > must be visible before slab->obj_exts, otherwise concurrent readers
> > may observe slab->obj_exts as non-zero while stride is still stale,
> > leading to incorrect reference counting of object cgroups.
> > 
> > There has been a bug report [1] that showed symptoms of incorrect
> > reference counting of object cgroups, which could be triggered by
> > this memory ordering issue.
> > 
> > Fix this by unconditionally initializing slab->stride in
> > alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check.
> > In case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the same function.
> > 
> > This ensures stride is set before the slab becomes visible to
> > other CPUs via the per-node partial slab list (protected by spinlock
> > with acquire/release semantics), preventing them from observing
> > inconsistent stride value.
> > 
> > Thanks to Shakeel Butt for pointing out this issue [2].
> > 
> > Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext")
> > Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> > Closes: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com
> > Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> > 
> > I tested this patch, but I could not confirm that this actually fixes
> > the issue reported by [1]. It would be nice if Venkat could help
> > confirm; but perhaps it's challenging to reliably reproduce...
> 
> 
> Thanks for the patch. I did ran the complete test suite, and unfortunately
> issue is reproducing.

Oops, thanks for confirming that it's still reproducible!
That's really helpful.

Perhaps I should start considering cases where it's not a memory
ordering issue, but let's check one more thing before moving on.
Could you please test whether it still reproduces with the following patch?

If it's still reproducible, it should not be due to the memory ordering
issue between obj_exts and stride.

---8<---
From: Harry Yoo <harry.yoo@oracle.com>
Date: Mon, 23 Feb 2026 16:58:09 +0900
Subject: mm/slab: enforce slab->stride -> slab->obj_exts ordering

I tried to avoid unnecessary memory barriers for efficiency,
but the original bug is still reproducible.

Probably I missed a case where an object is allocated on one CPU
and then freed on a different CPU without involving a spinlock.

I'm not sure whether I failed to cover some edge case or whether it's
caused by something other than a memory ordering issue.

Anyway, let's find out by introducing heavy memory barriers!

Always ensure that updates to stride are visible before obj_exts.

---
 mm/slab.h |  1 +
 mm/slub.c | 10 +++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 71c7261bf822..aacdd9f4e509 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -565,6 +565,7 @@ static inline void slab_set_stride(struct slab *slab, unsigned short stride)
 }
 static inline unsigned short slab_get_stride(struct slab *slab)
 {
+	smp_rmb();
 	return slab->stride;
 }
 #else
diff --git a/mm/slub.c b/mm/slub.c
index 862642c165ed..c7c8b660a994 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 retry:
 	old_exts = READ_ONCE(slab->obj_exts);
 	handle_failed_objexts_alloc(old_exts, vec, objects);
-	slab_set_stride(slab, sizeof(struct slabobj_ext));

 	if (new_slab) {
 		/*
@@ -2272,6 +2271,10 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 	void *addr;
 	unsigned long obj_exts;

+	slab_set_stride(slab, sizeof(struct slabobj_ext));
+	/* pairs with smp_rmb() in slab_get_stride() */
+	smp_wmb();
+
 	if (!need_slab_obj_exts(s))
 		return;

@@ -2288,7 +2291,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 		obj_exts |= MEMCG_DATA_OBJEXTS;
 #endif
 		slab->obj_exts = obj_exts;
-		slab_set_stride(slab, sizeof(struct slabobj_ext));
 	} else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) {
 		unsigned int offset = obj_exts_offset_in_object(s);

@@ -2305,8 +2307,10 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
 #ifdef CONFIG_MEMCG
 		obj_exts |= MEMCG_DATA_OBJEXTS;
 #endif
-		slab->obj_exts = obj_exts;
 		slab_set_stride(slab, s->size);
+		/* pairs with smp_rmb() in slab_get_stride() */
+		smp_wmb();
+		slab->obj_exts = obj_exts;
 	}
 }

--
2.43.0





end of thread, other threads:[~2026-02-24 11:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-23  7:58 [PATCH] mm/slab: initialize slab->stride early to avoid memory ordering issues Harry Yoo
2026-02-23 11:44 ` Harry Yoo
2026-02-23 17:04   ` Vlastimil Babka
2026-02-23 20:23 ` Shakeel Butt
2026-02-24  9:04 ` Venkat Rao Bagalkote
2026-02-24 11:10   ` Harry Yoo
