[PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext


* [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
@ 2025-10-14  9:31 Hao Ge
  2025-10-14 10:27 ` Harry Yoo
  2025-10-14 13:10 ` [syzbot ci] " syzbot ci
  0 siblings, 2 replies; 6+ messages in thread
From: Hao Ge @ 2025-10-14  9:31 UTC (permalink / raw)
  To: Vlastimil Babka, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin,
	Muchun Song, Suren Baghdasaryan
  Cc: Harry Yoo, cgroups, linux-mm, linux-kernel, Hao Ge

From: Hao Ge <gehao@kylinos.cn>

We should not reuse the first bit for OBJEXTS_ALLOC_FAIL, because under
heavy system load the following sequence of events can trigger the
VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:

1. High system pressure may cause objext allocation failure for a slab.
2. When objext allocation fails, slab->obj_exts is set to
   OBJEXTS_ALLOC_FAIL (value 1).
3. Later, this slab may enter the release process.
4. During release of the associated folio, the existing
   VM_BUG_ON_FOLIO check validates folio->memcg_data.
   If the MEMCG_DATA_OBJEXTS bit is unexpectedly
   set here, the bug check gets triggered.
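
The four steps above can be modelled in user space. This is only a
sketch, not kernel code: it assumes that slab->obj_exts and
folio->memcg_data occupy the same word (which is what the report
implies), and the union and both function names are hypothetical
stand-ins introduced for illustration.

```c
#include <stdbool.h>

/* Values as in the pre-patch header quoted in this thread. */
#define MEMCG_DATA_OBJEXTS	(1UL << 0)
#define OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS	/* pre-patch: same bit */

/*
 * Hypothetical stand-in for the word that the slab code calls obj_exts
 * and the folio code calls memcg_data (assumed to share storage).
 */
union slab_folio_word {
	unsigned long obj_exts;
	unsigned long memcg_data;
};

/* Models the condition passed to VM_BUG_ON_FOLIO() on the free path. */
bool free_check_would_fire(unsigned long memcg_data)
{
	return (memcg_data & MEMCG_DATA_OBJEXTS) != 0;
}

/* Steps 1-4: allocation fails, the marker is stored, the slab is freed. */
bool alloc_failure_trips_free_check(void)
{
	union slab_folio_word w = { .obj_exts = 0 };

	w.obj_exts = OBJEXTS_ALLOC_FAIL;		/* step 2 */
	return free_check_would_fire(w.memcg_data);	/* step 4 */
}
```

In this model alloc_failure_trips_free_check() returns true, which is
consistent with the "memcg:1" line in the page dump.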

We have obtained the following logs:
[ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
[ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 7108.343500] memcg:1
[ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
[ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
[ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
[ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
[ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
[ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
[ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
[ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
[ 7108.343601] ------------[ cut here ]------------
[ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
[ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
[ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
[ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
[ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
[ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
[ 7108.360379] sp : ffff8000a2bb7580
[ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
[ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
[ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
[ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
[ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
[ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
[ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
[ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
[ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
[ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
[ 7108.370140] Call trace:
[ 7108.370463]  __free_frozen_pages+0xf18/0x18e8 (P)
[ 7108.371011]  free_frozen_pages+0x1c/0x30
[ 7108.372040]  __free_slab+0xd0/0x250
[ 7108.372471]  free_slab+0x38/0x118
[ 7108.372882]  free_to_partial_list+0x1d4/0x340
[ 7108.373813]  __slab_free+0x24c/0x348
[ 7108.374253]  ___cache_free+0xf0/0x110
[ 7108.374699]  qlist_free_all+0x78/0x130
[ 7108.375156]  kasan_quarantine_reduce+0x114/0x148
[ 7108.375695]  __kasan_slab_alloc+0x7c/0xb0
[ 7108.376668]  kmem_cache_alloc_noprof+0x164/0x5c8
[ 7108.377206]  __alloc_object+0x44/0x1f8
[ 7108.377659]  __create_object+0x34/0xc8
[ 7108.378196]  kmemleak_alloc+0xb8/0xd8
[ 7108.378644]  kmem_cache_alloc_noprof+0x368/0x5c8
[ 7108.379224]  getname_flags.part.0+0xa4/0x610
[ 7108.379733]  getname_flags+0x80/0xd8
[ 7108.380169]  do_sys_openat2+0xb4/0x178
[ 7108.380921]  __arm64_sys_openat+0x134/0x1d0
[ 7108.381952]  invoke_syscall+0xd4/0x258
[ 7108.382408]  el0_svc_common.constprop.0+0xb4/0x240
[ 7108.382965]  do_el0_svc+0x48/0x68
[ 7108.383375]  el0_svc+0x40/0xe0
[ 7108.383757]  el0t_64_sync_handler+0xa0/0xe8
[ 7108.384465]  el0t_64_sync+0x1ac/0x1b0
[ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
[ 7108.386553] SMP: stopping secondary CPUs
[ 7108.389714] Starting crashdump kernel...
[ 7108.390190] Bye!

So introduce __SECOND_OBJEXT_FLAG for objext_flags, move
OBJEXTS_NOSPIN_ALLOC to it, and assign __FIRST_OBJEXT_FLAG to
OBJEXTS_ALLOC_FAIL so that it no longer shares a bit with
MEMCG_DATA_OBJEXTS.

Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
---
 include/linux/memcontrol.h | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 873e510d6f8d..8ea023944fac 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -341,27 +341,23 @@ enum page_memcg_data_flags {
 	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
 };
 
-#define __OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS
 #define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
+#define __SECOND_OBJEXT_FLAG    (__FIRST_OBJEXT_FLAG << 1)
 
 #else /* CONFIG_MEMCG */
 
-#define __OBJEXTS_ALLOC_FAIL	(1UL << 0)
 #define __FIRST_OBJEXT_FLAG	(1UL << 0)
+#define __SECOND_OBJEXT_FLAG	(1UL << 0)
 
 #endif /* CONFIG_MEMCG */
 
 enum objext_flags {
-	/*
-	 * Use bit 0 with zero other bits to signal that slabobj_ext vector
-	 * failed to allocate. The same bit 0 with valid upper bits means
-	 * MEMCG_DATA_OBJEXTS.
-	 */
-	OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
+	/* slabobj_ext vector failed to allocate */
+	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
 	/* slabobj_ext vector allocated with kmalloc_nolock() */
-	OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
+	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
 	/* the next bit after the last actual flag */
-	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
+	__NR_OBJEXTS_FLAGS  = (__SECOND_OBJEXT_FLAG << 1),
 };
 
 #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
-- 
2.25.1
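
For reference, the flag values that result from the hunk above can be
computed outside the kernel. The sketch below is an assumption-laden
model, not part of the patch: the struct and function names are
hypothetical, and the `memcg` argument merely selects which #ifdef
branch of the posted macros is mirrored.

```c
#include <stdbool.h>

/* Hypothetical container for the three values derived from the macros. */
struct objext_layout {
	unsigned long alloc_fail;	/* OBJEXTS_ALLOC_FAIL */
	unsigned long nospin_alloc;	/* OBJEXTS_NOSPIN_ALLOC */
	unsigned long flags_mask;	/* OBJEXTS_FLAGS_MASK */
};

/* Mirrors the post-patch macros exactly as posted. */
struct objext_layout objext_layout_after_patch(bool memcg)
{
	/* __FIRST_OBJEXT_FLAG: __NR_MEMCG_DATA_FLAGS or bit 0 */
	unsigned long first = memcg ? (1UL << 2) : (1UL << 0);
	/* __SECOND_OBJEXT_FLAG: (first << 1) or, as posted, bit 0 again */
	unsigned long second = memcg ? (first << 1) : (1UL << 0);
	struct objext_layout l = {
		.alloc_fail   = first,
		.nospin_alloc = second,
		/* __NR_OBJEXTS_FLAGS - 1 */
		.flags_mask   = (second << 1) - 1,
	};

	return l;
}
```

With CONFIG_MEMCG this gives 0x4, 0x8 and a mask of 0xf. Without
CONFIG_MEMCG both macros expand to (1UL << 0) as posted, so the two
flags evaluate to the same value and the mask is 0x1.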




* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
  2025-10-14  9:31 [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags Hao Ge
@ 2025-10-14 10:27 ` Harry Yoo
  2025-10-14 11:18   ` Hao Ge
  2025-10-14 12:49   ` Vlastimil Babka
  2025-10-14 13:10 ` [syzbot ci] " syzbot ci
  1 sibling, 2 replies; 6+ messages in thread
From: Harry Yoo @ 2025-10-14 10:27 UTC (permalink / raw)
  To: Hao Ge
  Cc: Vlastimil Babka, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin,
	Muchun Song, Suren Baghdasaryan, cgroups, linux-mm, linux-kernel,
	Hao Ge

On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
> From: Hao Ge <gehao@kylinos.cn>
> 
> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
> This is because the following scenarios may be encountered:
> 
> Under heavy system load, certain sequences of events can trigger the

Hi Hao, thanks for catching it!

It's late at night and my brain is tired, so I may be missing something,
but let me leave a comment anyway...

> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:

Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
(folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?

Not clearing a valid folio->memcg_data is considered an error, but freeing a
folio that is marked OBJEXTS_ALLOC_FAIL isn't.
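
A rough user-space sketch of the check proposed above, assuming the
pre-patch value OBJEXTS_ALLOC_FAIL == MEMCG_DATA_OBJEXTS; the helper
name is hypothetical and this is not the actual kernel change:

```c
#include <stdbool.h>

#define MEMCG_DATA_OBJEXTS	(1UL << 0)
#define OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS	/* pre-patch value */

/*
 * Hypothetical helper: true only when memcg_data still carries a real
 * objexts pointer (bit 0 plus upper bits), false for the bare
 * allocation-failure marker or for a cleared word.
 */
bool stale_objexts_on_free(unsigned long memcg_data)
{
	return memcg_data != OBJEXTS_ALLOC_FAIL &&
	       (memcg_data & MEMCG_DATA_OBJEXTS) != 0;
}
```

A bare 0x1 is then tolerated on free, while a value such as 0x1001
(bit 0 set together with upper bits) would still be reported.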

> 1. High system pressure may cause objext allocation failure for a slab.
> 2. When objext allocation fails, slab->obj_exts is set to
>    OBJEXTS_ALLOC_FAIL (value 1).
> 3. Later, this slab may enter the release process.
> 4. During release of the associated folio, the existing
>    VM_BUG_ON_FOLIO check validates folio->memcg_data.
>    If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>    set here, the bug check gets triggered.
>
> We have obtained the following logs:
> [ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
> [ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 7108.343500] memcg:1
> [ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
> [ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
> [ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
> [ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
> [ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
> [ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
> [ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
> [ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
> [ 7108.343601] ------------[ cut here ]------------
> [ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
> [ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
> [ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
> [ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
> [ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
> [ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
> [ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
> [ 7108.360379] sp : ffff8000a2bb7580
> [ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
> [ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
> [ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
> [ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
> [ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
> [ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
> [ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
> [ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
> [ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
> [ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
> [ 7108.370140] Call trace:
> [ 7108.370463]  __free_frozen_pages+0xf18/0x18e8 (P)
> [ 7108.371011]  free_frozen_pages+0x1c/0x30
> [ 7108.372040]  __free_slab+0xd0/0x250
> [ 7108.372471]  free_slab+0x38/0x118
> [ 7108.372882]  free_to_partial_list+0x1d4/0x340
> [ 7108.373813]  __slab_free+0x24c/0x348
> [ 7108.374253]  ___cache_free+0xf0/0x110
> [ 7108.374699]  qlist_free_all+0x78/0x130
> [ 7108.375156]  kasan_quarantine_reduce+0x114/0x148
> [ 7108.375695]  __kasan_slab_alloc+0x7c/0xb0
> [ 7108.376668]  kmem_cache_alloc_noprof+0x164/0x5c8
> [ 7108.377206]  __alloc_object+0x44/0x1f8
> [ 7108.377659]  __create_object+0x34/0xc8
> [ 7108.378196]  kmemleak_alloc+0xb8/0xd8
> [ 7108.378644]  kmem_cache_alloc_noprof+0x368/0x5c8
> [ 7108.379224]  getname_flags.part.0+0xa4/0x610
> [ 7108.379733]  getname_flags+0x80/0xd8
> [ 7108.380169]  do_sys_openat2+0xb4/0x178
> [ 7108.380921]  __arm64_sys_openat+0x134/0x1d0
> [ 7108.381952]  invoke_syscall+0xd4/0x258
> [ 7108.382408]  el0_svc_common.constprop.0+0xb4/0x240
> [ 7108.382965]  do_el0_svc+0x48/0x68
> [ 7108.383375]  el0_svc+0x40/0xe0
> [ 7108.383757]  el0t_64_sync_handler+0xa0/0xe8
> [ 7108.384465]  el0t_64_sync+0x1ac/0x1b0
> [ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
> [ 7108.386553] SMP: stopping secondary CPUs
> [ 7108.389714] Starting crashdump kernel...
> [ 7108.390190] Bye!
> 
> So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
> the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
> is no longer reused.
>
> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")

Hmm, using a new bit was suggested at that time, but wouldn't that
require bumping up the alignment when allocating the slabobj_ext array?
(see alloc_slab_obj_exts())
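
To illustrate the alignment concern with numbers taken from the quoted
hunk (CONFIG_MEMCG case): OBJEXTS_FLAGS_MASK is __NR_OBJEXTS_FLAGS - 1,
so it grows from 0x7 to 0xf with the patch. The macro and function names
below are hypothetical, and the sketch only models the arithmetic, not
alloc_slab_obj_exts() itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_MEMCG_DATA_FLAGS	(1UL << 2)
/* __NR_OBJEXTS_FLAGS - 1 before the patch: (first << 1) - 1 */
#define MASK_BEFORE	((NR_MEMCG_DATA_FLAGS << 1) - 1)
/* __NR_OBJEXTS_FLAGS - 1 after the patch: (second << 1) - 1 */
#define MASK_AFTER	(((NR_MEMCG_DATA_FLAGS << 1) << 1) - 1)

/*
 * A vector address can carry the flags in its low bits only if all
 * bits covered by the mask are zero in the address itself.
 */
bool vec_leaves_flag_bits_free(uintptr_t vec, unsigned long flags_mask)
{
	return (vec & flags_mask) == 0;
}
```

An 8-byte-aligned address such as 0x1008 satisfies the old mask but not
the new one, whereas a 16-byte-aligned address such as 0x1010 satisfies
both.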

And we can still distinguish two cases where

1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
   so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
   thus do not report error, or

2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
   did not clear a valid folio->memcg_data before freeing the folio
   (report error).

without introducing a new bit, right?

> Signed-off-by: Hao Ge <gehao@kylinos.cn>
> ---
>  include/linux/memcontrol.h | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 873e510d6f8d..8ea023944fac 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>  	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
>  };
>  
> -#define __OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS
>  #define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
> +#define __SECOND_OBJEXT_FLAG    (__FIRST_OBJEXT_FLAG << 1)
>  
>  #else /* CONFIG_MEMCG */
>  
> -#define __OBJEXTS_ALLOC_FAIL	(1UL << 0)
>  #define __FIRST_OBJEXT_FLAG	(1UL << 0)
> +#define __SECOND_OBJEXT_FLAG	(1UL << 0)
>  
>  #endif /* CONFIG_MEMCG */
>  
>  enum objext_flags {
> -	/*
> -	 * Use bit 0 with zero other bits to signal that slabobj_ext vector
> -	 * failed to allocate. The same bit 0 with valid upper bits means
> -	 * MEMCG_DATA_OBJEXTS.
> -	 */
> -	OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
> +	/* slabobj_ext vector failed to allocate */
> +	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>  	/* slabobj_ext vector allocated with kmalloc_nolock() */
> -	OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
> +	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>  	/* the next bit after the last actual flag */
> -	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
> +	__NR_OBJEXTS_FLAGS  = (__SECOND_OBJEXT_FLAG << 1),
>  };
>  
>  #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
> -- 
> 2.25.1
> 

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
  2025-10-14 10:27 ` Harry Yoo
@ 2025-10-14 11:18   ` Hao Ge
  2025-10-14 12:49   ` Vlastimil Babka
  1 sibling, 0 replies; 6+ messages in thread
From: Hao Ge @ 2025-10-14 11:18 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, Alexei Starovoitov, Andrew Morton,
	Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin,
	Muchun Song, Suren Baghdasaryan, cgroups, linux-mm, linux-kernel,
	Hao Ge


On 2025/10/14 18:27, Harry Yoo wrote:
> On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
>> From: Hao Ge <gehao@kylinos.cn>
>>
>> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
>> This is because the following scenarios may be encountered:
>>
>> Under heavy system load, certain sequences of events can trigger the
> Hi Hao, thanks for catching it!
>
> It's late at night and my brain is tired so I may be missing something,
> but let me leave comment anyway...
>
>> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
> Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
> (folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?
>
> Not clearing a valid folio->memcg_data is considered an error, but freeing a
> folio that is marked OBJEXTS_ALLOC_FAIL isn't.

Hi Harry

Thank you very much for taking the time to review my patch despite your
busy schedule.

I also did not express myself clearly in the following paragraph.

>> 1. High system pressure may cause objext allocation failure for a slab.
>> 2. When objext allocation fails, slab->obj_exts is set to
>>     OBJEXTS_ALLOC_FAIL (value 1).

The sentence

  "2. When objext allocation fails, slab->obj_exts is set to
   OBJEXTS_ALLOC_FAIL (value 1)."

should instead read:

  "2. When objext allocation fails, slab->obj_exts is set to
   OBJEXTS_ALLOC_FAIL, and OBJEXTS_ALLOC_FAIL is actually equivalent
   to MEMCG_DATA_OBJEXTS."

So the root cause of this issue lies here as well: OBJEXTS_ALLOC_FAIL
and MEMCG_DATA_OBJEXTS are reusing the same bit.

Thanks

Best Regards

Hao


>> 3. Later, this slab may enter the release process.
>> 4. During release of the associated folio, the existing
>>     VM_BUG_ON_FOLIO check validates folio->memcg_data.
>>     If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>>     set here, the bug check gets triggered.
>>
>> We have obtained the following logs:
>> [ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
>> [ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>> [ 7108.343500] memcg:1
>> [ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
>> [ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>> [ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>> [ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>> [ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>> [ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
>> [ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
>> [ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
>> [ 7108.343601] ------------[ cut here ]------------
>> [ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
>> [ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
>> [ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
>> [ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
>> [ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
>> [ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
>> [ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
>> [ 7108.360379] sp : ffff8000a2bb7580
>> [ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
>> [ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
>> [ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
>> [ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
>> [ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
>> [ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
>> [ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
>> [ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
>> [ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
>> [ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
>> [ 7108.370140] Call trace:
>> [ 7108.370463]  __free_frozen_pages+0xf18/0x18e8 (P)
>> [ 7108.371011]  free_frozen_pages+0x1c/0x30
>> [ 7108.372040]  __free_slab+0xd0/0x250
>> [ 7108.372471]  free_slab+0x38/0x118
>> [ 7108.372882]  free_to_partial_list+0x1d4/0x340
>> [ 7108.373813]  __slab_free+0x24c/0x348
>> [ 7108.374253]  ___cache_free+0xf0/0x110
>> [ 7108.374699]  qlist_free_all+0x78/0x130
>> [ 7108.375156]  kasan_quarantine_reduce+0x114/0x148
>> [ 7108.375695]  __kasan_slab_alloc+0x7c/0xb0
>> [ 7108.376668]  kmem_cache_alloc_noprof+0x164/0x5c8
>> [ 7108.377206]  __alloc_object+0x44/0x1f8
>> [ 7108.377659]  __create_object+0x34/0xc8
>> [ 7108.378196]  kmemleak_alloc+0xb8/0xd8
>> [ 7108.378644]  kmem_cache_alloc_noprof+0x368/0x5c8
>> [ 7108.379224]  getname_flags.part.0+0xa4/0x610
>> [ 7108.379733]  getname_flags+0x80/0xd8
>> [ 7108.380169]  do_sys_openat2+0xb4/0x178
>> [ 7108.380921]  __arm64_sys_openat+0x134/0x1d0
>> [ 7108.381952]  invoke_syscall+0xd4/0x258
>> [ 7108.382408]  el0_svc_common.constprop.0+0xb4/0x240
>> [ 7108.382965]  do_el0_svc+0x48/0x68
>> [ 7108.383375]  el0_svc+0x40/0xe0
>> [ 7108.383757]  el0t_64_sync_handler+0xa0/0xe8
>> [ 7108.384465]  el0t_64_sync+0x1ac/0x1b0
>> [ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
>> [ 7108.386553] SMP: stopping secondary CPUs
>> [ 7108.389714] Starting crashdump kernel...
>> [ 7108.390190] Bye!
>>
>> So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
>> the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
>> is no longer reused.
>>
>> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
> Hmm using a new bit was suggested at that time, but that would
> require bumping up the alignment when allocating slabobj_ext array?
> (see alloc_slab_obj_exts())
>
> And we can still distinguish two cases where
>
> 1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
>     so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
>     thus do not report error, or
>
> 2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
>     did not clear a valid folio->memcg_data before freeing the folio
>     (report error).
>
> without introducing a new bit, right?
>
>> Signed-off-by: Hao Ge <gehao@kylinos.cn>
>> ---
>>   include/linux/memcontrol.h | 16 ++++++----------
>>   1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 873e510d6f8d..8ea023944fac 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>>   	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
>>   };
>>   
>> -#define __OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS
>>   #define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
>> +#define __SECOND_OBJEXT_FLAG    (__FIRST_OBJEXT_FLAG << 1)
>>   
>>   #else /* CONFIG_MEMCG */
>>   
>> -#define __OBJEXTS_ALLOC_FAIL	(1UL << 0)
>>   #define __FIRST_OBJEXT_FLAG	(1UL << 0)
>> +#define __SECOND_OBJEXT_FLAG	(1UL << 0)
>>   
>>   #endif /* CONFIG_MEMCG */
>>   
>>   enum objext_flags {
>> -	/*
>> -	 * Use bit 0 with zero other bits to signal that slabobj_ext vector
>> -	 * failed to allocate. The same bit 0 with valid upper bits means
>> -	 * MEMCG_DATA_OBJEXTS.
>> -	 */
>> -	OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
>> +	/* slabobj_ext vector failed to allocate */
>> +	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>>   	/* slabobj_ext vector allocated with kmalloc_nolock() */
>> -	OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
>> +	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>>   	/* the next bit after the last actual flag */
>> -	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
>> +	__NR_OBJEXTS_FLAGS  = (__SECOND_OBJEXT_FLAG << 1),
>>   };
>>   
>>   #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
>> -- 
>> 2.25.1
>>



* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
  2025-10-14 10:27 ` Harry Yoo
  2025-10-14 11:18   ` Hao Ge
@ 2025-10-14 12:49   ` Vlastimil Babka
  2025-10-14 13:18     ` Hao Ge
  1 sibling, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2025-10-14 12:49 UTC (permalink / raw)
  To: Harry Yoo, Hao Ge
  Cc: Alexei Starovoitov, Andrew Morton, Johannes Weiner, Shakeel Butt,
	Michal Hocko, Roman Gushchin, Muchun Song, Suren Baghdasaryan,
	cgroups, linux-mm, linux-kernel, Hao Ge

On 10/14/25 12:27, Harry Yoo wrote:
> On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
>> From: Hao Ge <gehao@kylinos.cn>
>> 
>> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
>> This is because the following scenarios may be encountered:
>> 
>> Under heavy system load, certain sequences of events can trigger the
> 
> Hi Hao, thanks for catching it!
> 
> It's late at night and my brain is tired so I may be missing something,
> but let me leave comment anyway...
> 
>> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
> 
> Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
> (folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?

Yes, we already went in that direction, but it seems we need to expand
it to more places due to 7612833192d5 ("slab: Reuse first bit for
OBJEXTS_ALLOC_FAIL"):

https://lore.kernel.org/all/20250915200918.3855580-2-surenb@google.com/

> Not clearing a valid folio->memcg_data is considered an error, but freeing a
> folio that is marked OBJEXTS_ALLOC_FAIL isn't.
> 
>> 1. High system pressure may cause objext allocation failure for a slab.
>> 2. When objext allocation fails, slab->obj_exts is set to
>>    OBJEXTS_ALLOC_FAIL (value 1).
>> 3. Later, this slab may enter the release process.
>> 4. During release of the associated folio, the existing
>>    VM_BUG_ON_FOLIO check validates folio->memcg_data.
>>    If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>>    set here, the bug check gets triggered.
>>
>> We have obtained the following logs:
>> [ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
>> [ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>> [ 7108.343500] memcg:1
>> [ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
>> [ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>> [ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>> [ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>> [ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>> [ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
>> [ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
>> [ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
>> [ 7108.343601] ------------[ cut here ]------------
>> [ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
>> [ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
>> [ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
>> [ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
>> [ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
>> [ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
>> [ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
>> [ 7108.360379] sp : ffff8000a2bb7580
>> [ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
>> [ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
>> [ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
>> [ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
>> [ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
>> [ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
>> [ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
>> [ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
>> [ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
>> [ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
>> [ 7108.370140] Call trace:
>> [ 7108.370463]  __free_frozen_pages+0xf18/0x18e8 (P)
>> [ 7108.371011]  free_frozen_pages+0x1c/0x30
>> [ 7108.372040]  __free_slab+0xd0/0x250
>> [ 7108.372471]  free_slab+0x38/0x118
>> [ 7108.372882]  free_to_partial_list+0x1d4/0x340
>> [ 7108.373813]  __slab_free+0x24c/0x348
>> [ 7108.374253]  ___cache_free+0xf0/0x110
>> [ 7108.374699]  qlist_free_all+0x78/0x130
>> [ 7108.375156]  kasan_quarantine_reduce+0x114/0x148
>> [ 7108.375695]  __kasan_slab_alloc+0x7c/0xb0
>> [ 7108.376668]  kmem_cache_alloc_noprof+0x164/0x5c8
>> [ 7108.377206]  __alloc_object+0x44/0x1f8
>> [ 7108.377659]  __create_object+0x34/0xc8
>> [ 7108.378196]  kmemleak_alloc+0xb8/0xd8
>> [ 7108.378644]  kmem_cache_alloc_noprof+0x368/0x5c8
>> [ 7108.379224]  getname_flags.part.0+0xa4/0x610
>> [ 7108.379733]  getname_flags+0x80/0xd8
>> [ 7108.380169]  do_sys_openat2+0xb4/0x178
>> [ 7108.380921]  __arm64_sys_openat+0x134/0x1d0
>> [ 7108.381952]  invoke_syscall+0xd4/0x258
>> [ 7108.382408]  el0_svc_common.constprop.0+0xb4/0x240
>> [ 7108.382965]  do_el0_svc+0x48/0x68
>> [ 7108.383375]  el0_svc+0x40/0xe0
>> [ 7108.383757]  el0t_64_sync_handler+0xa0/0xe8
>> [ 7108.384465]  el0t_64_sync+0x1ac/0x1b0
>> [ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
>> [ 7108.386553] SMP: stopping secondary CPUs
>> [ 7108.389714] Starting crashdump kernel...
>> [ 7108.390190] Bye!
>> 
>> So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
>> the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
>> is no longer reused.
>>
>> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
> 
> Hmm using a new bit was suggested at that time, but that would
> require bumping up the alignment when allocating slabobj_ext array?
> (see alloc_slab_obj_exts())
> 
> And we can still distinguish two cases where
> 
> 1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
>    so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
>    thus do not report error, or
> 
> 2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
>    did not clear a valid folio->memcg_data before freeing the folio
>    (report error).
> 
> without introducing a new bit, right?

Agreed.

> 
>> Signed-off-by: Hao Ge <gehao@kylinos.cn>
>> ---
>>  include/linux/memcontrol.h | 16 ++++++----------
>>  1 file changed, 6 insertions(+), 10 deletions(-)
>> 
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 873e510d6f8d..8ea023944fac 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>>  	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
>>  };
>>  
>> -#define __OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS
>>  #define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
>> +#define __SECOND_OBJEXT_FLAG    (__FIRST_OBJEXT_FLAG << 1)
>>  
>>  #else /* CONFIG_MEMCG */
>>  
>> -#define __OBJEXTS_ALLOC_FAIL	(1UL << 0)
>>  #define __FIRST_OBJEXT_FLAG	(1UL << 0)
>> +#define __SECOND_OBJEXT_FLAG	(1UL << 0)
>>  
>>  #endif /* CONFIG_MEMCG */
>>  
>>  enum objext_flags {
>> -	/*
>> -	 * Use bit 0 with zero other bits to signal that slabobj_ext vector
>> -	 * failed to allocate. The same bit 0 with valid upper bits means
>> -	 * MEMCG_DATA_OBJEXTS.
>> -	 */
>> -	OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
>> +	/* slabobj_ext vector failed to allocate */
>> +	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>>  	/* slabobj_ext vector allocated with kmalloc_nolock() */
>> -	OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
>> +	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>>  	/* the next bit after the last actual flag */
>> -	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
>> +	__NR_OBJEXTS_FLAGS  = (__SECOND_OBJEXT_FLAG << 1),
>>  };
>>  
>>  #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
>> -- 
>> 2.25.1
>> 
> 




* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
  2025-10-14 12:49   ` Vlastimil Babka
@ 2025-10-14 13:18     ` Hao Ge
  0 siblings, 0 replies; 6+ messages in thread
From: Hao Ge @ 2025-10-14 13:18 UTC (permalink / raw)
  To: Vlastimil Babka, Harry Yoo
  Cc: Alexei Starovoitov, Andrew Morton, Johannes Weiner, Shakeel Butt,
	Michal Hocko, Roman Gushchin, Muchun Song, Suren Baghdasaryan,
	cgroups, linux-mm, linux-kernel, Hao Ge


On 2025/10/14 20:49, Vlastimil Babka wrote:
> On 10/14/25 12:27, Harry Yoo wrote:
>> On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
>>> From: Hao Ge <gehao@kylinos.cn>
>>>
>>> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
>>> This is because the following scenarios may be encountered:
>>>
>>> Under heavy system load, certain sequences of events can trigger the
>> Hi Hao, thanks for catching it!
>>
>> It's late at night and my brain is tired so I may be missing something,
>> but let me leave comment anyway...
>>
>>> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
>> Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
>> (folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?
> Yes, we already went that direction, but seems we need to expand to more
> places due to 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL"):
>
> https://lore.kernel.org/all/20250915200918.3855580-2-surenb@google.com/
>> Not clearing a valid folio->memcg_data is considered an error, but freeing a
>> folio that is marked OBJEXTS_ALLOC_FAIL isn't.
>>
>>> 1. High system pressure may cause objext allocation failure for a slab.
>>> 2. When objext allocation fails, slab->obj_exts is set to
>>>     OBJEXTS_ALLOC_FAIL (value 1).
>>> 3. Later, this slab may enter the release process.
>>> 4. During release of the associated folio, the existing
>>>     VM_BUG_ON_FOLIO check validates folio->memcg_data.
>>>     If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>>>     set here, the bug check gets triggered.
>>>
>>> We have obtained the following logs:
>>> [ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
>>> [ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>> [ 7108.343500] memcg:1
>>> [ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
>>> [ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>>> [ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>>> [ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>>> [ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>>> [ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
>>> [ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
>>> [ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
>>> [ 7108.343601] ------------[ cut here ]------------
>>> [ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
>>> [ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
>>> [ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
>>> [ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
>>> [ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
>>> [ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
>>> [ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
>>> [ 7108.360379] sp : ffff8000a2bb7580
>>> [ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
>>> [ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
>>> [ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
>>> [ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
>>> [ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
>>> [ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
>>> [ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
>>> [ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
>>> [ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
>>> [ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
>>> [ 7108.370140] Call trace:
>>> [ 7108.370463]  __free_frozen_pages+0xf18/0x18e8 (P)
>>> [ 7108.371011]  free_frozen_pages+0x1c/0x30
>>> [ 7108.372040]  __free_slab+0xd0/0x250
>>> [ 7108.372471]  free_slab+0x38/0x118
>>> [ 7108.372882]  free_to_partial_list+0x1d4/0x340
>>> [ 7108.373813]  __slab_free+0x24c/0x348
>>> [ 7108.374253]  ___cache_free+0xf0/0x110
>>> [ 7108.374699]  qlist_free_all+0x78/0x130
>>> [ 7108.375156]  kasan_quarantine_reduce+0x114/0x148
>>> [ 7108.375695]  __kasan_slab_alloc+0x7c/0xb0
>>> [ 7108.376668]  kmem_cache_alloc_noprof+0x164/0x5c8
>>> [ 7108.377206]  __alloc_object+0x44/0x1f8
>>> [ 7108.377659]  __create_object+0x34/0xc8
>>> [ 7108.378196]  kmemleak_alloc+0xb8/0xd8
>>> [ 7108.378644]  kmem_cache_alloc_noprof+0x368/0x5c8
>>> [ 7108.379224]  getname_flags.part.0+0xa4/0x610
>>> [ 7108.379733]  getname_flags+0x80/0xd8
>>> [ 7108.380169]  do_sys_openat2+0xb4/0x178
>>> [ 7108.380921]  __arm64_sys_openat+0x134/0x1d0
>>> [ 7108.381952]  invoke_syscall+0xd4/0x258
>>> [ 7108.382408]  el0_svc_common.constprop.0+0xb4/0x240
>>> [ 7108.382965]  do_el0_svc+0x48/0x68
>>> [ 7108.383375]  el0_svc+0x40/0xe0
>>> [ 7108.383757]  el0t_64_sync_handler+0xa0/0xe8
>>> [ 7108.384465]  el0t_64_sync+0x1ac/0x1b0
>>> [ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
>>> [ 7108.386553] SMP: stopping secondary CPUs
>>> [ 7108.389714] Starting crashdump kernel...
>>> [ 7108.390190] Bye!
>>>
>>> So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
>>> the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
>>> is no longer reused.
>>>
>>> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
>> Hmm using a new bit was suggested at that time, but that would
>> require bumping up the alignment when allocating slabobj_ext array?
>> (see alloc_slab_obj_exts())

Hi Vlastimil and Harry,

Now I understand the alignment you're referring to. Indeed, my solution
does require bumping up the alignment when allocating the slabobj_ext
array.

syzbot ci has also reported a bug:

https://lore.kernel.org/all/68ee41cb.050a0220.91a22.020b.GAE@google.com/

This is because the slabobj_ext data has been corrupted.


>> And we can still distinguish two cases where
>>
>> 1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
>>     so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
>>     thus do not report error, or
>>
>> 2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
>>     did not clear a valid folio->memcg_data before freeing the folio
>>     (report error).
>>
>> without introducing a new bit, right?
> Agreed.

Okay, now I understand what you mean. I will send out v2 soon.

Thank you for your guidance.


Thanks

Best Regards

Hao

>
>>> Signed-off-by: Hao Ge <gehao@kylinos.cn>
>>> ---
>>>   include/linux/memcontrol.h | 16 ++++++----------
>>>   1 file changed, 6 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>> index 873e510d6f8d..8ea023944fac 100644
>>> --- a/include/linux/memcontrol.h
>>> +++ b/include/linux/memcontrol.h
>>> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>>>   	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
>>>   };
>>>   
>>> -#define __OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS
>>>   #define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
>>> +#define __SECOND_OBJEXT_FLAG    (__FIRST_OBJEXT_FLAG << 1)
>>>   
>>>   #else /* CONFIG_MEMCG */
>>>   
>>> -#define __OBJEXTS_ALLOC_FAIL	(1UL << 0)
>>>   #define __FIRST_OBJEXT_FLAG	(1UL << 0)
>>> +#define __SECOND_OBJEXT_FLAG	(1UL << 0)
>>>   
>>>   #endif /* CONFIG_MEMCG */
>>>   
>>>   enum objext_flags {
>>> -	/*
>>> -	 * Use bit 0 with zero other bits to signal that slabobj_ext vector
>>> -	 * failed to allocate. The same bit 0 with valid upper bits means
>>> -	 * MEMCG_DATA_OBJEXTS.
>>> -	 */
>>> -	OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
>>> +	/* slabobj_ext vector failed to allocate */
>>> +	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>>>   	/* slabobj_ext vector allocated with kmalloc_nolock() */
>>> -	OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
>>> +	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>>>   	/* the next bit after the last actual flag */
>>> -	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
>>> +	__NR_OBJEXTS_FLAGS  = (__SECOND_OBJEXT_FLAG << 1),
>>>   };
>>>   
>>>   #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
>>> -- 
>>> 2.25.1
>>>



* [syzbot ci] Re: slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
  2025-10-14  9:31 [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags Hao Ge
  2025-10-14 10:27 ` Harry Yoo
@ 2025-10-14 13:10 ` syzbot ci
  1 sibling, 0 replies; 6+ messages in thread
From: syzbot ci @ 2025-10-14 13:10 UTC (permalink / raw)
  To: akpm, ast, cgroups, gehao, hannes, hao.ge, harry.yoo,
	linux-kernel, linux-mm, mhocko, muchun.song, roman.gushchin,
	shakeel.butt, surenb, vbabka
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v1] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
https://lore.kernel.org/all/20251014093124.300012-1-hao.ge@linux.dev
* [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags

and found the following issue:
general protection fault in percpu_ref_get_many

Full report is available here:
https://ci.syzbot.org/series/6fd66120-211f-479f-b6a1-35f990da2dc2

***

general protection fault in percpu_ref_get_many

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      0d97f2067c166eb495771fede9f7b73999c67f66
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/74de5bb7-695b-4115-9a4b-ee7d7fd0cca2/config

Oops: general protection fault, probably for non-canonical address 0xdffffc00177780ff: 0000 [#1] SMP KASAN PTI
KASAN: probably user-memory-access in range [0x00000000bbbc07f8-0x00000000bbbc07ff]
CPU: 1 UID: 0 PID: 6155 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:percpu_ref_get_many+0x8d/0x140
Code: 01 48 c7 c7 80 70 78 8b be 65 03 00 00 48 c7 c2 c0 70 78 8b e8 64 2b 6f ff 49 bc 00 00 00 00 00 fc ff df 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 c4 50 f7 ff 49 8b 07 a8 03 75 62
RSP: 0018:ffffc90004df7500 EFLAGS: 00010206
RAX: 00000000177780ff RBX: ffffffff822de139 RCX: 14bab840e71f4400
RDX: 0000000000000000 RSI: ffffffff8bc074c0 RDI: ffffffff8bc07480
RBP: 0000000000000088 R08: 0000000000000000 R09: ffffffff822de139
R10: dffffc0000000000 R11: fffffbfff1f3c1ef R12: dffffc0000000000
R13: ffff88823c63b5c0 R14: 0000000000000001 R15: 00000000bbbc07f8
FS:  0000555570ae6500(0000) GS:ffff8882a9d12000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555562d8c5c8 CR3: 0000000113444000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 refill_obj_stock+0x254/0x850
 __memcg_slab_free_hook+0x123/0x3b0
 kfree+0x3f7/0x6d0
 kobject_uevent_env+0x361/0x8c0
 netdev_queue_update_kobjects+0x346/0x6c0
 netdev_register_kobject+0x258/0x310
 register_netdevice+0x126c/0x1ae0
 __ip_tunnel_create+0x3e7/0x560
 ip_tunnel_init_net+0x2ba/0x800
 ops_init+0x35c/0x5c0
 setup_net+0xfe/0x320
 copy_net_ns+0x34e/0x4e0
 create_new_namespaces+0x3f3/0x720
 unshare_nsproxy_namespaces+0x11c/0x170
 ksys_unshare+0x4c8/0x8c0
 __x64_sys_unshare+0x38/0x50
 do_syscall_64+0xfa/0xfa0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f85c81906c7
Code: 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe57c0ca58 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85c81906c7
RDX: 00007f85c818eec9 RSI: 00007ffe57c0ca20 RDI: 0000000040000000
RBP: 00007ffe57c0cac0 R08: 00007f85c83a69d0 R09: 00007f85c83a69d0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe57c0cac0
R13: 00007ffe57c0cac8 R14: 0000000000000009 R15: 0000000000000000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:percpu_ref_get_many+0x8d/0x140
Code: 01 48 c7 c7 80 70 78 8b be 65 03 00 00 48 c7 c2 c0 70 78 8b e8 64 2b 6f ff 49 bc 00 00 00 00 00 fc ff df 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 c4 50 f7 ff 49 8b 07 a8 03 75 62
RSP: 0018:ffffc90004df7500 EFLAGS: 00010206
RAX: 00000000177780ff RBX: ffffffff822de139 RCX: 14bab840e71f4400
RDX: 0000000000000000 RSI: ffffffff8bc074c0 RDI: ffffffff8bc07480
RBP: 0000000000000088 R08: 0000000000000000 R09: ffffffff822de139
R10: dffffc0000000000 R11: fffffbfff1f3c1ef R12: dffffc0000000000
R13: ffff88823c63b5c0 R14: 0000000000000001 R15: 00000000bbbc07f8
FS:  0000555570ae6500(0000) GS:ffff8882a9d12000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555562d8c5c8 CR3: 0000000113444000 CR4: 00000000000006f0


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.



end of thread, other threads:[~2025-10-14 13:19 UTC | newest]

Thread overview: 6+ messages
2025-10-14  9:31 [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags Hao Ge
2025-10-14 10:27 ` Harry Yoo
2025-10-14 11:18   ` Hao Ge
2025-10-14 12:49   ` Vlastimil Babka
2025-10-14 13:18     ` Hao Ge
2025-10-14 13:10 ` [syzbot ci] " syzbot ci
