* [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
@ 2025-10-14 9:31 Hao Ge
2025-10-14 10:27 ` Harry Yoo
2025-10-14 13:10 ` [syzbot ci] " syzbot ci
0 siblings, 2 replies; 6+ messages in thread
From: Hao Ge @ 2025-10-14 9:31 UTC
To: Vlastimil Babka, Alexei Starovoitov, Andrew Morton,
Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin,
Muchun Song, Suren Baghdasaryan
Cc: Harry Yoo, cgroups, linux-mm, linux-kernel, Hao Ge
From: Hao Ge <gehao@kylinos.cn>
We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
This is because the following scenarios may be encountered:
Under heavy system load, certain sequences of events can trigger the
VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
1. High system pressure may cause objext allocation failure for a slab.
2. When objext allocation fails, slab->obj_exts is set to
OBJEXTS_ALLOC_FAIL (value 1).
3. Later, this slab may enter the release process.
4. During release of the associated folio, the existing
VM_BUG_ON_FOLIO check validates folio->memcg_data.
If the MEMCG_DATA_OBJEXTS bit is unexpectedly
set here, the bug check gets triggered.
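The failure sequence in steps 1-4 can be reduced to a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: `struct fake_slab`, `mark_objext_alloc_failed()` and `free_check_fires()` are hypothetical names. The model also assumes that `slab->obj_exts` and `folio->memcg_data` share one word, which is consistent with the `memcg:1` line in the dump below. The flag values are the pre-patch `CONFIG_MEMCG=y` ones implied by the removed lines of the diff (`OBJEXTS_ALLOC_FAIL` defined as `MEMCG_DATA_OBJEXTS`, which is bit 0).

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch values, CONFIG_MEMCG=y: OBJEXTS_ALLOC_FAIL reuses bit 0. */
#define MEMCG_DATA_OBJEXTS (1UL << 0)
#define OBJEXTS_ALLOC_FAIL MEMCG_DATA_OBJEXTS

/* Hypothetical stand-in for struct slab; only the shared word is modelled. */
struct fake_slab {
	unsigned long obj_exts; /* assumed to alias folio->memcg_data */
};

/* Step 2: record that the slabobj_ext vector could not be allocated. */
static void mark_objext_alloc_failed(struct fake_slab *slab)
{
	slab->obj_exts = OBJEXTS_ALLOC_FAIL;
}

/* Step 4: the condition VM_BUG_ON_FOLIO() tests when the folio is freed. */
static bool free_check_fires(unsigned long memcg_data)
{
	return (memcg_data & MEMCG_DATA_OBJEXTS) != 0;
}
```

Because the failure marker and the `MEMCG_DATA_OBJEXTS` flag are the same bit, the unmodified check cannot tell a failed vector allocation apart from a leaked vector pointer.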
We have obtained the following logs:
[ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
[ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 7108.343500] memcg:1
[ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
[ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
[ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
[ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
[ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
[ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
[ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
[ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
[ 7108.343601] ------------[ cut here ]------------
[ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
[ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
[ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
[ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
[ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
[ 7108.360379] sp : ffff8000a2bb7580
[ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
[ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
[ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
[ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
[ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
[ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
[ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
[ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
[ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
[ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
[ 7108.370140] Call trace:
[ 7108.370463] __free_frozen_pages+0xf18/0x18e8 (P)
[ 7108.371011] free_frozen_pages+0x1c/0x30
[ 7108.372040] __free_slab+0xd0/0x250
[ 7108.372471] free_slab+0x38/0x118
[ 7108.372882] free_to_partial_list+0x1d4/0x340
[ 7108.373813] __slab_free+0x24c/0x348
[ 7108.374253] ___cache_free+0xf0/0x110
[ 7108.374699] qlist_free_all+0x78/0x130
[ 7108.375156] kasan_quarantine_reduce+0x114/0x148
[ 7108.375695] __kasan_slab_alloc+0x7c/0xb0
[ 7108.376668] kmem_cache_alloc_noprof+0x164/0x5c8
[ 7108.377206] __alloc_object+0x44/0x1f8
[ 7108.377659] __create_object+0x34/0xc8
[ 7108.378196] kmemleak_alloc+0xb8/0xd8
[ 7108.378644] kmem_cache_alloc_noprof+0x368/0x5c8
[ 7108.379224] getname_flags.part.0+0xa4/0x610
[ 7108.379733] getname_flags+0x80/0xd8
[ 7108.380169] do_sys_openat2+0xb4/0x178
[ 7108.380921] __arm64_sys_openat+0x134/0x1d0
[ 7108.381952] invoke_syscall+0xd4/0x258
[ 7108.382408] el0_svc_common.constprop.0+0xb4/0x240
[ 7108.382965] do_el0_svc+0x48/0x68
[ 7108.383375] el0_svc+0x40/0xe0
[ 7108.383757] el0t_64_sync_handler+0xa0/0xe8
[ 7108.384465] el0t_64_sync+0x1ac/0x1b0
[ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
[ 7108.386553] SMP: stopping secondary CPUs
[ 7108.389714] Starting crashdump kernel...
[ 7108.390190] Bye!
So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
is no longer reused.
Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
---
include/linux/memcontrol.h | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 873e510d6f8d..8ea023944fac 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -341,27 +341,23 @@ enum page_memcg_data_flags {
__NR_MEMCG_DATA_FLAGS = (1UL << 2),
};
-#define __OBJEXTS_ALLOC_FAIL MEMCG_DATA_OBJEXTS
#define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS
+#define __SECOND_OBJEXT_FLAG (__FIRST_OBJEXT_FLAG << 1)
#else /* CONFIG_MEMCG */
-#define __OBJEXTS_ALLOC_FAIL (1UL << 0)
#define __FIRST_OBJEXT_FLAG (1UL << 0)
+#define __SECOND_OBJEXT_FLAG (1UL << 0)
#endif /* CONFIG_MEMCG */
enum objext_flags {
- /*
- * Use bit 0 with zero other bits to signal that slabobj_ext vector
- * failed to allocate. The same bit 0 with valid upper bits means
- * MEMCG_DATA_OBJEXTS.
- */
- OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
+ /* slabobj_ext vector failed to allocate */
+ OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
/* slabobj_ext vector allocated with kmalloc_nolock() */
- OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
+ OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
/* the next bit after the last actual flag */
- __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
+ __NR_OBJEXTS_FLAGS = (__SECOND_OBJEXT_FLAG << 1),
};
#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
--
2.25.1
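The following sketch reproduces the `CONFIG_MEMCG=y` flag layout from the patch as posted and checks it at compile time. The helpers `objext_vector_alignment()` and `objext_addr_ok()` are hypothetical. They rest on an assumption: the flags are stored in the low bits of the slabobj_ext vector pointer, as `OBJEXTS_FLAGS_MASK` implies, so the vector must be aligned to `OBJEXTS_FLAGS_MASK + 1` bytes. Under that assumption the requirement grows from 8 bytes before the patch (mask `0x7`) to 16 bytes after it (mask `0xf`). In the `!CONFIG_MEMCG` branch of the diff, both `__FIRST_OBJEXT_FLAG` and `__SECOND_OBJEXT_FLAG` expand to `(1UL << 0)`, so the two flags share bit 0 there. That branch is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the CONFIG_MEMCG=y branch of the patched memcontrol.h. */
#define MEMCG_DATA_OBJEXTS    (1UL << 0)
#define __NR_MEMCG_DATA_FLAGS (1UL << 2)

#define __FIRST_OBJEXT_FLAG   __NR_MEMCG_DATA_FLAGS
#define __SECOND_OBJEXT_FLAG  (__FIRST_OBJEXT_FLAG << 1)

enum objext_flags {
	/* slabobj_ext vector failed to allocate: bit 2 */
	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
	/* slabobj_ext vector allocated with kmalloc_nolock(): bit 3 */
	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
	/* the next bit after the last actual flag: bit 4 */
	__NR_OBJEXTS_FLAGS = (__SECOND_OBJEXT_FLAG << 1),
};

#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)

/* After the patch, OBJEXTS_ALLOC_FAIL no longer shares a bit with MEMCG_DATA_OBJEXTS. */
static_assert((OBJEXTS_ALLOC_FAIL & MEMCG_DATA_OBJEXTS) == 0,
	      "OBJEXTS_ALLOC_FAIL overlaps MEMCG_DATA_OBJEXTS");

/* Hypothetical: alignment a vector needs so that no flag bit is set in its address. */
static unsigned long objext_vector_alignment(void)
{
	return OBJEXTS_FLAGS_MASK + 1;
}

/* Hypothetical: can this address carry the flags in its low bits without corruption? */
static int objext_addr_ok(uintptr_t addr)
{
	return (addr & OBJEXTS_FLAGS_MASK) == 0;
}
```

For example, `0x1008` is 8-byte aligned but not 16-byte aligned. Bit 3 of that address is set, so under the new layout it would collide with `OBJEXTS_NOSPIN_ALLOC`.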
* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
2025-10-14 9:31 [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags Hao Ge
@ 2025-10-14 10:27 ` Harry Yoo
2025-10-14 11:18 ` Hao Ge
2025-10-14 12:49 ` Vlastimil Babka
2025-10-14 13:10 ` [syzbot ci] " syzbot ci
1 sibling, 2 replies; 6+ messages in thread
From: Harry Yoo @ 2025-10-14 10:27 UTC
To: Hao Ge
Cc: Vlastimil Babka, Alexei Starovoitov, Andrew Morton,
Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin,
Muchun Song, Suren Baghdasaryan, cgroups, linux-mm, linux-kernel, Hao Ge
On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
> From: Hao Ge <gehao@kylinos.cn>
> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
> This is because the following scenarios may be encountered:
> Under heavy system load, certain sequences of events can trigger the
Hi Hao, thanks for catching it!
It's late at night and my brain is tired so I may be missing something,
but let me leave comment anyway...
> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
(folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?
Not clearing a valid folio->memcg_data is considered an error, but freeing a
folio that is marked OBJEXTS_ALLOC_FAIL isn't.
> 1. High system pressure may cause objext allocation failure for a slab.
> 2. When objext allocation fails, slab->obj_exts is set to
> OBJEXTS_ALLOC_FAIL (value 1).
> 3. Later, this slab may enter the release process.
> 4. During release of the associated folio, the existing
> VM_BUG_ON_FOLIO check validates folio->memcg_data.
> If the MEMCG_DATA_OBJEXTS bit is unexpectedly
> set here, the bug check gets triggered.
> [...]
> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
Hmm using a new bit was suggested at that time, but that would
require bumping up the alignment when allocating slabobj_ext array?
(see alloc_slab_obj_exts())
And we can still distinguish two cases where
1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
thus do not report error, or
2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
did not clear a valid folio->memcg_data before freeing the folio
(report error).
without introducing a new bit, right?
> [...]
--
Cheers,
Harry / Hyeonggon
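Harry's alternative keeps the shared bit and relaxes the check instead. Below is a minimal sketch of his expression using the pre-patch `CONFIG_MEMCG=y` values. `memcg_data_leaked()` is a hypothetical name. The function implements the expression exactly as written in the review, so it reports any value that has bit 0 set, except the bare `OBJEXTS_ALLOC_FAIL` marker.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch values, CONFIG_MEMCG=y: OBJEXTS_ALLOC_FAIL reuses bit 0. */
#define MEMCG_DATA_OBJEXTS (1UL << 0)
#define OBJEXTS_ALLOC_FAIL MEMCG_DATA_OBJEXTS

/*
 * Hypothetical helper modelling the check proposed in review: flag an
 * error when MEMCG_DATA_OBJEXTS is set and the value is anything other
 * than the bare OBJEXTS_ALLOC_FAIL marker (i.e. a vector pointer is present).
 */
static bool memcg_data_leaked(unsigned long memcg_data)
{
	return memcg_data != OBJEXTS_ALLOC_FAIL &&
	       (memcg_data & MEMCG_DATA_OBJEXTS) != 0;
}
```

A value of exactly `OBJEXTS_ALLOC_FAIL` corresponds to case 1) (no error). A pointer with `MEMCG_DATA_OBJEXTS` set, such as `0x1000 | MEMCG_DATA_OBJEXTS`, corresponds to case 2).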
* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
2025-10-14 10:27 ` Harry Yoo
@ 2025-10-14 11:18 ` Hao Ge
2025-10-14 12:49 ` Vlastimil Babka
1 sibling, 0 replies; 6+ messages in thread
From: Hao Ge @ 2025-10-14 11:18 UTC
To: Harry Yoo
Cc: Vlastimil Babka, Alexei Starovoitov, Andrew Morton,
Johannes Weiner, Shakeel Butt, Michal Hocko, Roman Gushchin,
Muchun Song, Suren Baghdasaryan, cgroups, linux-mm, linux-kernel, Hao Ge
On 2025/10/14 18:27, Harry Yoo wrote:
> On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
>> From: Hao Ge <gehao@kylinos.cn>
>> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
>> This is because the following scenarios may be encountered:
>> Under heavy system load, certain sequences of events can trigger the
> Hi Hao, thanks for catching it!
> It's late at night and my brain is tired so I may be missing something,
> but let me leave comment anyway...
>> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
> Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
> (folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?
> Not clearing a valid folio->memcg_data is considered an error, but freeing a
> folio that is marked OBJEXTS_ALLOC_FAIL isn't.
Hi Harry,
Thank you very much for taking the time to review my patch despite your busy schedule.
Also, I did not express myself clearly in the following paragraph.
>> 1. High system pressure may cause objext allocation failure for a slab.
>> 2. When objext allocation fails, slab->obj_exts is set to
>> OBJEXTS_ALLOC_FAIL (value 1).
The sentence "2. When objext allocation fails, slab->obj_exts is set to
OBJEXTS_ALLOC_FAIL (value 1)." should be replaced with:
"2. When objext allocation fails, slab->obj_exts is set to OBJEXTS_ALLOC_FAIL,
and OBJEXTS_ALLOC_FAIL is actually equivalent to MEMCG_DATA_OBJEXTS."
So the root cause of this issue lies here as well: OBJEXTS_ALLOC_FAIL and
MEMCG_DATA_OBJEXTS reuse the same bit.
Thanks
Best Regards
Hao
* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
2025-10-14 10:27 ` Harry Yoo
2025-10-14 11:18 ` Hao Ge
@ 2025-10-14 12:49 ` Vlastimil Babka
2025-10-14 13:18 ` Hao Ge
1 sibling, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2025-10-14 12:49 UTC
To: Harry Yoo, Hao Ge
Cc: Alexei Starovoitov, Andrew Morton, Johannes Weiner, Shakeel Butt,
Michal Hocko, Roman Gushchin, Muchun Song, Suren Baghdasaryan, cgroups,
linux-mm, linux-kernel, Hao Ge
On 10/14/25 12:27, Harry Yoo wrote:
> On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
>> From: Hao Ge <gehao@kylinos.cn>
>> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
>> This is because the following scenarios may be encountered:
>> Under heavy system load, certain sequences of events can trigger the
> Hi Hao, thanks for catching it!
> It's late at night and my brain is tired so I may be missing something,
> but let me leave comment anyway...
>> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
> Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
> (folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?
Yes, we already went that direction, but seems we need to expand to more places
due to 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL"):
https://lore.kernel.org/all/20250915200918.3855580-2-surenb@google.com/
> Not clearing a valid folio->memcg_data is considered an error, but freeing a
> folio that is marked OBJEXTS_ALLOC_FAIL isn't.
>> 1. High system pressure may cause objext allocation failure for a slab.
>> 2. When objext allocation fails, slab->obj_exts is set to
>> OBJEXTS_ALLOC_FAIL (value 1).
>> 3. Later, this slab may enter the release process.
>> 4. During release of the associated folio, the existing
>> VM_BUG_ON_FOLIO check validates folio->memcg_data.
>> If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>> set here, the bug check gets triggered.
>> [...]
>>
>> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
>
> Hmm using a new bit was suggested at that time, but that would
> require bumping up the alignment when allocating slabobj_ext array?
> (see alloc_slab_obj_exts())
>
> And we can still distinguish two cases where
>
> 1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
> so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
> thus do not report error, or
>
> 2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
> did not clear a valid folio->memcg_data before freeing the folio
> (report error).
>
> without introducing a new bit, right?

Agreed.

>
>> Signed-off-by: Hao Ge <gehao@kylinos.cn>
>> ---
>> include/linux/memcontrol.h | 16 ++++++----------
>> 1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 873e510d6f8d..8ea023944fac 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>> __NR_MEMCG_DATA_FLAGS = (1UL << 2),
>> };
>>
>> -#define __OBJEXTS_ALLOC_FAIL MEMCG_DATA_OBJEXTS
>> #define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS
>> +#define __SECOND_OBJEXT_FLAG (__FIRST_OBJEXT_FLAG << 1)
>>
>> #else /* CONFIG_MEMCG */
>>
>> -#define __OBJEXTS_ALLOC_FAIL (1UL << 0)
>> #define __FIRST_OBJEXT_FLAG (1UL << 0)
>> +#define __SECOND_OBJEXT_FLAG (1UL << 0)
>>
>> #endif /* CONFIG_MEMCG */
>>
>> enum objext_flags {
>> - /*
>> - * Use bit 0 with zero other bits to signal that slabobj_ext vector
>> - * failed to allocate. The same bit 0 with valid upper bits means
>> - * MEMCG_DATA_OBJEXTS.
>> - */
>> - OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
>> + /* slabobj_ext vector failed to allocate */
>> + OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>> /* slabobj_ext vector allocated with kmalloc_nolock() */
>> - OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
>> + OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>> /* the next bit after the last actual flag */
>> - __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
>> + __NR_OBJEXTS_FLAGS = (__SECOND_OBJEXT_FLAG << 1),
>> };
>>
>> #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
>> --
>> 2.25.1
>>
>
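Harry's two cases (quoted above) boil down to a single predicate on `folio->memcg_data`. The sketch below is a standalone, hypothetical model, not kernel code: it redefines the two `MEMCG_DATA_*` bits from the quoted diff context, assumes the pre-patch encoding from 7612833192d5 in which `OBJEXTS_ALLOC_FAIL` equals `MEMCG_DATA_OBJEXTS`, and uses the expression Harry proposed in his earlier reply, `(memcg_data != OBJEXTS_ALLOC_FAIL) && (memcg_data & MEMCG_DATA_OBJEXTS)`. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Flag values as shown in the quoted memcontrol.h context (CONFIG_MEMCG=y),
 * redefined here so the sketch compiles outside the kernel.
 */
#define MEMCG_DATA_OBJEXTS	(1UL << 0)
#define MEMCG_DATA_KMEM		(1UL << 1)
/* Pre-patch scheme: bit 0 with no upper bits means "vector alloc failed". */
#define OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS

/* Hypothetical: the condition tested by the VM_BUG_ON_FOLIO in the log. */
static bool current_check_fires(unsigned long memcg_data)
{
	return (memcg_data & MEMCG_DATA_OBJEXTS) != 0;
}

/*
 * Hypothetical: report only case 2), i.e. bit 0 set AND upper bits present
 * (a slabobj_ext vector pointer nobody cleared). A bare OBJEXTS_ALLOC_FAIL
 * marker, case 1), is treated as benign.
 */
static bool stale_objexts_pointer(unsigned long memcg_data)
{
	return (memcg_data & MEMCG_DATA_OBJEXTS) &&
	       memcg_data != OBJEXTS_ALLOC_FAIL;
}
```

With `memcg_data == 1`, the value printed as `memcg:1` in the log, the existing condition fires while the case-aware one does not; a non-zero vector pointer with bit 0 set is still reported.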
* Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
2025-10-14 12:49 ` Vlastimil Babka
@ 2025-10-14 13:18 ` Hao Ge
0 siblings, 0 replies; 6+ messages in thread
From: Hao Ge @ 2025-10-14 13:18 UTC (permalink / raw)
To: Vlastimil Babka, Harry Yoo
Cc: Alexei Starovoitov, Andrew Morton, Johannes Weiner, Shakeel Butt,
Michal Hocko, Roman Gushchin, Muchun Song, Suren Baghdasaryan,
cgroups, linux-mm, linux-kernel, Hao Ge

On 2025/10/14 20:49, Vlastimil Babka wrote:
> On 10/14/25 12:27, Harry Yoo wrote:
>> On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
>>> From: Hao Ge <gehao@kylinos.cn>
>>>
>>> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
>>> This is because the following scenarios may be encountered:
>>>
>>> Under heavy system load, certain sequences of events can trigger the
>> Hi Hao, thanks for catching it!
>>
>> It's late at night and my brain is tired so I may be missing something,
>> but let me leave comment anyway...
>>
>>> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:
>> Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
>> (folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?
> Yes, we already went that direction, but seems we need to expand to more
> places due to 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL"):
>
> https://lore.kernel.org/all/20250915200918.3855580-2-surenb@google.com/
>> Not clearing a valid folio->memcg_data is considered an error, but freeing a
>> folio that is marked OBJEXTS_ALLOC_FAIL isn't.
>>
>>> 1. High system pressure may cause objext allocation failure for a slab.
>>> 2. When objext allocation fails, slab->obj_exts is set to
>>> OBJEXTS_ALLOC_FAIL (value 1).
>>> 3. Later, this slab may enter the release process.
>>> 4. During release of the associated folio, the existing
>>> VM_BUG_ON_FOLIO check validates folio->memcg_data.
>>> If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>>> set here, the bug check gets triggered.
>>>
>>> We have obtained the following logs:
>>> [ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
>>> [ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>> [ 7108.343500] memcg:1
>>> [ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
>>> [ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>>> [ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>>> [ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
>>> [ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
>>> [ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
>>> [ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
>>> [ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
>>> [ 7108.343601] ------------[ cut here ]------------
>>> [ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
>>> [ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>> [ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
>>> [ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
>>> [ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
>>> [ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
>>> [ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
>>> [ 7108.360379] sp : ffff8000a2bb7580
>>> [ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
>>> [ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
>>> [ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
>>> [ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
>>> [ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
>>> [ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
>>> [ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
>>> [ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
>>> [ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
>>> [ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
>>> [ 7108.370140] Call trace:
>>> [ 7108.370463] __free_frozen_pages+0xf18/0x18e8 (P)
>>> [ 7108.371011] free_frozen_pages+0x1c/0x30
>>> [ 7108.372040] __free_slab+0xd0/0x250
>>> [ 7108.372471] free_slab+0x38/0x118
>>> [ 7108.372882] free_to_partial_list+0x1d4/0x340
>>> [ 7108.373813] __slab_free+0x24c/0x348
>>> [ 7108.374253] ___cache_free+0xf0/0x110
>>> [ 7108.374699] qlist_free_all+0x78/0x130
>>> [ 7108.375156] kasan_quarantine_reduce+0x114/0x148
>>> [ 7108.375695] __kasan_slab_alloc+0x7c/0xb0
>>> [ 7108.376668] kmem_cache_alloc_noprof+0x164/0x5c8
>>> [ 7108.377206] __alloc_object+0x44/0x1f8
>>> [ 7108.377659] __create_object+0x34/0xc8
>>> [ 7108.378196] kmemleak_alloc+0xb8/0xd8
>>> [ 7108.378644] kmem_cache_alloc_noprof+0x368/0x5c8
>>> [ 7108.379224] getname_flags.part.0+0xa4/0x610
>>> [ 7108.379733] getname_flags+0x80/0xd8
>>> [ 7108.380169] do_sys_openat2+0xb4/0x178
>>> [ 7108.380921] __arm64_sys_openat+0x134/0x1d0
>>> [ 7108.381952] invoke_syscall+0xd4/0x258
>>> [ 7108.382408] el0_svc_common.constprop.0+0xb4/0x240
>>> [ 7108.382965] do_el0_svc+0x48/0x68
>>> [ 7108.383375] el0_svc+0x40/0xe0
>>> [ 7108.383757] el0t_64_sync_handler+0xa0/0xe8
>>> [ 7108.384465] el0t_64_sync+0x1ac/0x1b0
>>> [ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
>>> [ 7108.386553] SMP: stopping secondary CPUs
>>> [ 7108.389714] Starting crashdump kernel...
>>> [ 7108.390190] Bye!
>>>
>>> So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
>>> the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
>>> is no longer reused.
>>>
>>> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
>> Hmm using a new bit was suggested at that time, but that would
>> require bumping up the alignment when allocating slabobj_ext array?
>> (see alloc_slab_obj_exts())

Hi Vlastimil and Harry

Now I understand the alignment you're referring to, and indeed, my
solution does require bumping up the alignment when allocating the
slabobj_ext array.

And syzbot ci has reported a bug:

https://lore.kernel.org/all/68ee41cb.050a0220.91a22.020b.GAE@google.com/

This is because the data of slabobj_ext has been corrupted.

>> And we can still distinguish two cases where
>>
>> 1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
>> so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
>> thus do not report error, or
>>
>> 2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
>> did not clear a valid folio->memcg_data before freeing the folio
>> (report error).
>>
>> without introducing a new bit, right?
> Agreed.

Okay, now I understand what you mean. I will send out the V2 version soon.

Thank you for your guidance.
Thanks

Best Regards
Hao

>
>>> Signed-off-by: Hao Ge <gehao@kylinos.cn>
>>> ---
>>> include/linux/memcontrol.h | 16 ++++++----------
>>> 1 file changed, 6 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>> index 873e510d6f8d..8ea023944fac 100644
>>> --- a/include/linux/memcontrol.h
>>> +++ b/include/linux/memcontrol.h
>>> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>>> __NR_MEMCG_DATA_FLAGS = (1UL << 2),
>>> };
>>>
>>> -#define __OBJEXTS_ALLOC_FAIL MEMCG_DATA_OBJEXTS
>>> #define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS
>>> +#define __SECOND_OBJEXT_FLAG (__FIRST_OBJEXT_FLAG << 1)
>>>
>>> #else /* CONFIG_MEMCG */
>>>
>>> -#define __OBJEXTS_ALLOC_FAIL (1UL << 0)
>>> #define __FIRST_OBJEXT_FLAG (1UL << 0)
>>> +#define __SECOND_OBJEXT_FLAG (1UL << 0)
>>>
>>> #endif /* CONFIG_MEMCG */
>>>
>>> enum objext_flags {
>>> - /*
>>> - * Use bit 0 with zero other bits to signal that slabobj_ext vector
>>> - * failed to allocate. The same bit 0 with valid upper bits means
>>> - * MEMCG_DATA_OBJEXTS.
>>> - */
>>> - OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
>>> + /* slabobj_ext vector failed to allocate */
>>> + OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>>> /* slabobj_ext vector allocated with kmalloc_nolock() */
>>> - OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
>>> + OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>>> /* the next bit after the last actual flag */
>>> - __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
>>> + __NR_OBJEXTS_FLAGS = (__SECOND_OBJEXT_FLAG << 1),
>>> };
>>>
>>> #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
>>> --
>>> 2.25.1
>>>
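The alignment point can be illustrated with plain arithmetic on a tagged pointer. This is a sketch under stated assumptions, not the code in `alloc_slab_obj_exts()`: it takes the CONFIG_MEMCG=y layout from the quoted diff (`__NR_MEMCG_DATA_FLAGS = 1UL << 2`), computes `OBJEXTS_FLAGS_MASK` before the patch (one objext flag, mask 0x7) and with the v1 patch (two objext flags, mask 0xf), and shows what happens to a vector address that is only 8-byte aligned. The helper names and the example address 0x1008 are hypothetical, and this is only one plausible way the slabobj_ext corruption mentioned above could arise, not a diagnosis of the syzbot report.

```c
#include <assert.h>
#include <stdint.h>

/* CONFIG_MEMCG=y values taken from the quoted memcontrol.h diff. */
#define NR_MEMCG_DATA_FLAGS	((uintptr_t)1 << 2)		/* __NR_MEMCG_DATA_FLAGS */
#define FIRST_OBJEXT_FLAG	NR_MEMCG_DATA_FLAGS		/* bit 2 */
#define SECOND_OBJEXT_FLAG	(FIRST_OBJEXT_FLAG << 1)	/* bit 3, v1 patch */

/* Before the patch: __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG << 1. */
#define OLD_FLAGS_MASK		((FIRST_OBJEXT_FLAG << 1) - 1)	/* 0x7 */
/* With the v1 patch: __NR_OBJEXTS_FLAGS = __SECOND_OBJEXT_FLAG << 1. */
#define NEW_FLAGS_MASK		((SECOND_OBJEXT_FLAG << 1) - 1)	/* 0xf */

/* Minimum alignment a vector needs so no flag bit overlaps its address. */
static uintptr_t required_align(uintptr_t mask)
{
	return mask + 1;
}

/* Hypothetical encoder: only valid when the vector is suitably aligned. */
static uintptr_t tag_vector(uintptr_t vec, uintptr_t flags, uintptr_t mask)
{
	assert((vec & mask) == 0);
	return vec | flags;
}

/* Hypothetical decoder: strip every flag bit to recover the address. */
static uintptr_t untag_vector(uintptr_t tagged, uintptr_t mask)
{
	return tagged & ~mask;
}
```

An 8-byte-aligned address such as 0x1008 round-trips under the old 0x7 mask, but under the 0xf mask its bit 3 reads as `OBJEXTS_NOSPIN_ALLOC` and is stripped on decode, yielding an address 8 bytes too low. Separately, in the quoted diff's `#else /* CONFIG_MEMCG */` branch both `__FIRST_OBJEXT_FLAG` and `__SECOND_OBJEXT_FLAG` are defined as `(1UL << 0)`, so the two flags would share a bit in that configuration.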
* [syzbot ci] Re: slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
2025-10-14 9:31 [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags Hao Ge
2025-10-14 10:27 ` Harry Yoo
@ 2025-10-14 13:10 ` syzbot ci
1 sibling, 0 replies; 6+ messages in thread
From: syzbot ci @ 2025-10-14 13:10 UTC (permalink / raw)
To: akpm, ast, cgroups, gehao, hannes, hao.ge, harry.yoo, linux-kernel,
linux-mm, mhocko, muchun.song, roman.gushchin, shakeel.butt, surenb,
vbabka
Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v1] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags
https://lore.kernel.org/all/20251014093124.300012-1-hao.ge@linux.dev
* [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags

and found the following issue:
general protection fault in percpu_ref_get_many

Full report is available here:
https://ci.syzbot.org/series/6fd66120-211f-479f-b6a1-35f990da2dc2

***

general protection fault in percpu_ref_get_many

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      0d97f2067c166eb495771fede9f7b73999c67f66
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/74de5bb7-695b-4115-9a4b-ee7d7fd0cca2/config

Oops: general protection fault, probably for non-canonical address 0xdffffc00177780ff: 0000 [#1] SMP KASAN PTI
KASAN: probably user-memory-access in range [0x00000000bbbc07f8-0x00000000bbbc07ff]
CPU: 1 UID: 0 PID: 6155 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:percpu_ref_get_many+0x8d/0x140
Code: 01 48 c7 c7 80 70 78 8b be 65 03 00 00 48 c7 c2 c0 70 78 8b e8 64 2b 6f ff 49 bc 00 00 00 00 00 fc ff df 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 c4 50 f7 ff 49 8b 07 a8 03 75 62
RSP: 0018:ffffc90004df7500 EFLAGS: 00010206
RAX: 00000000177780ff RBX: ffffffff822de139 RCX: 14bab840e71f4400
RDX: 0000000000000000 RSI: ffffffff8bc074c0 RDI: ffffffff8bc07480
RBP: 0000000000000088 R08: 0000000000000000 R09: ffffffff822de139
R10: dffffc0000000000 R11: fffffbfff1f3c1ef R12: dffffc0000000000
R13: ffff88823c63b5c0 R14: 0000000000000001 R15: 00000000bbbc07f8
FS:  0000555570ae6500(0000) GS:ffff8882a9d12000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555562d8c5c8 CR3: 0000000113444000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 refill_obj_stock+0x254/0x850
 __memcg_slab_free_hook+0x123/0x3b0
 kfree+0x3f7/0x6d0
 kobject_uevent_env+0x361/0x8c0
 netdev_queue_update_kobjects+0x346/0x6c0
 netdev_register_kobject+0x258/0x310
 register_netdevice+0x126c/0x1ae0
 __ip_tunnel_create+0x3e7/0x560
 ip_tunnel_init_net+0x2ba/0x800
 ops_init+0x35c/0x5c0
 setup_net+0xfe/0x320
 copy_net_ns+0x34e/0x4e0
 create_new_namespaces+0x3f3/0x720
 unshare_nsproxy_namespaces+0x11c/0x170
 ksys_unshare+0x4c8/0x8c0
 __x64_sys_unshare+0x38/0x50
 do_syscall_64+0xfa/0xfa0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f85c81906c7
Code: 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe57c0ca58 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85c81906c7
RDX: 00007f85c818eec9 RSI: 00007ffe57c0ca20 RDI: 0000000040000000
RBP: 00007ffe57c0cac0 R08: 00007f85c83a69d0 R09: 00007f85c83a69d0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe57c0cac0
R13: 00007ffe57c0cac8 R14: 0000000000000009 R15: 0000000000000000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:percpu_ref_get_many+0x8d/0x140
Code: 01 48 c7 c7 80 70 78 8b be 65 03 00 00 48 c7 c2 c0 70 78 8b e8 64 2b 6f ff 49 bc 00 00 00 00 00 fc ff df 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00 74 08 4c 89 ff e8 c4 50 f7 ff 49 8b 07 a8 03 75 62
RSP: 0018:ffffc90004df7500 EFLAGS: 00010206
RAX: 00000000177780ff RBX: ffffffff822de139 RCX: 14bab840e71f4400
RDX: 0000000000000000 RSI: ffffffff8bc074c0 RDI: ffffffff8bc07480
RBP: 0000000000000088 R08: 0000000000000000 R09: ffffffff822de139
R10: dffffc0000000000 R11: fffffbfff1f3c1ef R12: dffffc0000000000
R13: ffff88823c63b5c0 R14: 0000000000000001 R15: 00000000bbbc07f8
FS:  0000555570ae6500(0000) GS:ffff8882a9d12000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555562d8c5c8 CR3: 0000000113444000 CR4: 00000000000006f0

***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
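The KASAN line in the report can be cross-checked against the faulting address. Generic KASAN maps every 8 bytes of memory to one shadow byte at `(addr >> 3) + KASAN_SHADOW_OFFSET`; on x86-64 the default offset is 0xdffffc0000000000, which is also the value in R10 and R12 above. The sketch below is a standalone re-derivation of that mapping, assuming generic KASAN with a scale shift of 3; the function names are hypothetical and are not the kernel's own `kasan_mem_to_shadow()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT	3
/* Assumption: x86-64 default shadow offset (matches R10/R12 in the dump). */
#define SHADOW_OFFSET		0xdffffc0000000000ULL

/* Shadow byte that instrumented code inspects before touching addr. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* First byte of the 8-byte granule described by a shadow address. */
static uint64_t shadow_to_mem(uint64_t shadow)
{
	assert(shadow >= SHADOW_OFFSET);
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}
```

Feeding in R15 (0xbbbc07f8) reproduces RAX (0x177780ff, the shifted value), the non-canonical shadow address 0xdffffc00177780ff from the Oops line, and the [0xbbbc07f8-0xbbbc07ff] range KASAN printed. That value is a small, user-range number rather than a kernel pointer, which is consistent with the remark earlier in the thread that slabobj_ext data was corrupted, although the report alone does not show where the value came from.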
end of thread, other threads:[~2025-10-14 13:19 UTC | newest]

Thread overview: 6+ messages
2025-10-14 9:31 [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags Hao Ge
2025-10-14 10:27 ` Harry Yoo
2025-10-14 11:18 ` Hao Ge
2025-10-14 12:49 ` Vlastimil Babka
2025-10-14 13:18 ` Hao Ge
2025-10-14 13:10 ` [syzbot ci] " syzbot ci