[linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer

linux-mm.kvack.org archive mirror

* [linux-next:master] [slab]  db93cdd664: BUG:kernel_NULL_pointer_dereference,address
@ 2025-09-17  5:01 kernel test robot
  2025-09-17  8:03 ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: kernel test robot @ 2025-09-17  5:01 UTC
  To: Alexei Starovoitov
  Cc: oe-lkp, lkp, Vlastimil Babka, kasan-dev, cgroups, linux-mm, oliver.sang



Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: db93cdd664fa02de9be883dd29343b21d8fc790f ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: boot

config: i386-randconfig-062-20250913
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202509171214.912d5ac-lkp@intel.com


[    7.101117][    T0] BUG: kernel NULL pointer dereference, address: 00000010
[    7.102290][    T0] #PF: supervisor read access in kernel mode
[    7.103219][    T0] #PF: error_code(0x0000) - not-present page
[    7.104161][    T0] *pde = 00000000
[    7.104762][    T0] Thread overran stack, or stack corrupted
[    7.105726][    T0] Oops: Oops: 0000 [#1]
[    7.106410][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G                T   6.17.0-rc3-00014-gdb93cdd664fa #1 NONE  40eff3b43e4f0000b061f2e660abd0b2911f31b1
[    7.108712][    T0] Tainted: [T]=RANDSTRUCT
[    7.109368][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 7.110952][ T0] EIP: kmalloc_nolock_noprof (mm/slub.c:5607) 
[ 7.112838][ T0] Code: 90 90 90 90 90 89 45 bc 0f bd 75 bc 75 05 be ff ff ff ff 46 83 fe 0e 0f 83 b6 01 00 00 6b c7 38 8b 84 b0 b4 79 d0 b2 89 45 ec <8b> 40 10 a9 00 00 01 00 75 1b 8b 0d ec 28 db b3 31 f6 a9 87 04 00
All code
========
   0:	90                   	nop
   1:	90                   	nop
   2:	90                   	nop
   3:	90                   	nop
   4:	90                   	nop
   5:	89 45 bc             	mov    %eax,-0x44(%rbp)
   8:	0f bd 75 bc          	bsr    -0x44(%rbp),%esi
   c:	75 05                	jne    0x13
   e:	be ff ff ff ff       	mov    $0xffffffff,%esi
  13:	46 83 fe 0e          	rex.RX cmp $0xe,%esi
  17:	0f 83 b6 01 00 00    	jae    0x1d3
  1d:	6b c7 38             	imul   $0x38,%edi,%eax
  20:	8b 84 b0 b4 79 d0 b2 	mov    -0x4d2f864c(%rax,%rsi,4),%eax
  27:	89 45 ec             	mov    %eax,-0x14(%rbp)
  2a:*	8b 40 10             	mov    0x10(%rax),%eax		<-- trapping instruction
  2d:	a9 00 00 01 00       	test   $0x10000,%eax
  32:	75 1b                	jne    0x4f
  34:	8b 0d ec 28 db b3    	mov    -0x4c24d714(%rip),%ecx        # 0xffffffffb3db2926
  3a:	31 f6                	xor    %esi,%esi
  3c:	a9                   	.byte 0xa9
  3d:	87 04 00             	xchg   %eax,(%rax,%rax,1)

Code starting with the faulting instruction
===========================================
   0:	8b 40 10             	mov    0x10(%rax),%eax
   3:	a9 00 00 01 00       	test   $0x10000,%eax
   8:	75 1b                	jne    0x25
   a:	8b 0d ec 28 db b3    	mov    -0x4c24d714(%rip),%ecx        # 0xffffffffb3db28fc
  10:	31 f6                	xor    %esi,%esi
  12:	a9                   	.byte 0xa9
  13:	87 04 00             	xchg   %eax,(%rax,%rax,1)
[    7.115899][    T0] EAX: 00000000 EBX: 00000101 ECX: 00000200 EDX: 00000000
[    7.116940][    T0] ESI: 00000009 EDI: 0000000e EBP: b2d07d18 ESP: b2d07cd4
[    7.118013][    T0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210002
[    7.119201][    T0] CR0: 80050033 CR2: 00000010 CR3: 03672000 CR4: 00000090
[    7.120263][    T0] Call Trace:
[    7.120791][    T0] Modules linked in:
[    7.121455][    T0] CR2: 0000000000000010
[    7.122145][    T0] ---[ end trace 0000000000000000 ]---
[ 7.123070][ T0] EIP: kmalloc_nolock_noprof (mm/slub.c:5607) 
[ 7.123973][ T0] Code: 90 90 90 90 90 89 45 bc 0f bd 75 bc 75 05 be ff ff ff ff 46 83 fe 0e 0f 83 b6 01 00 00 6b c7 38 8b 84 b0 b4 79 d0 b2 89 45 ec <8b> 40 10 a9 00 00 01 00 75 1b 8b 0d ec 28 db b3 31 f6 a9 87 04 00


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250917/202509171214.912d5ac-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-17  5:01 [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address kernel test robot
@ 2025-09-17  8:03 ` Vlastimil Babka
  2025-09-17  9:18   ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2025-09-17  8:03 UTC
  To: kernel test robot, Alexei Starovoitov, Harry Yoo
  Cc: oe-lkp, lkp, kasan-dev, cgroups, linux-mm

On 9/17/25 07:01, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
> 
> commit: db93cdd664fa02de9be883dd29343b21d8fc790f ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> in testcase: boot
> 
> config: i386-randconfig-062-20250913
> compiler: clang-20
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202509171214.912d5ac-lkp@intel.com
> 
> 
> [    7.101117][    T0] BUG: kernel NULL pointer dereference, address: 00000010
> [    7.102290][    T0] #PF: supervisor read access in kernel mode
> [    7.103219][    T0] #PF: error_code(0x0000) - not-present page
> [    7.104161][    T0] *pde = 00000000
> [    7.104762][    T0] Thread overran stack, or stack corrupted

Note this.

> [    7.105726][    T0] Oops: Oops: 0000 [#1]
> [    7.106410][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G                T   6.17.0-rc3-00014-gdb93cdd664fa #1 NONE  40eff3b43e4f0000b061f2e660abd0b2911f31b1
> [    7.108712][    T0] Tainted: [T]=RANDSTRUCT
> [    7.109368][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [ 7.110952][ T0] EIP: kmalloc_nolock_noprof (mm/slub.c:5607) 

That's here.
if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
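
As a side note on reading the oops: the faulting address follows from the
trapping instruction and the register dump. A minimal sketch of that
arithmetic, with values copied from the report. The helper names
fault_address() and check_report_values() are made up for illustration and
are not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of "mov disp(%base),%reg": base register plus
 * displacement. Illustrative helper, not a kernel function.
 */
static uint32_t fault_address(uint32_t base, uint32_t disp)
{
	return base + disp;
}

/* Values from the report: EAX: 00000000, displacement 0x10, CR2: 00000010. */
static void check_report_values(void)
{
	const uint32_t eax = 0x00000000;	/* base register at the fault */
	const uint32_t disp = 0x10;		/* from "mov 0x10(...),%eax" */
	const uint32_t cr2 = 0x00000010;	/* faulting address in the log */

	assert(fault_address(eax, disp) == cr2);
}
```

So the pointer loaded just before the trapping instruction (presumably s,
going by the source line) was NULL, and the read at offset 0x10 from it
faulted.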

dmesg already contains the line "SLUB: HWalign=64, Order=0-3, MinObjects=0,
CPUs=1, Nodes=1", so all kmem caches are fully initialized and this doesn't
look like a bootstrap issue. It's probably due to the stack overflow and not
an actual bug on this line.

Because of that it's also unable to print the backtrace. But the only
kmalloc_nolock usage for now is in slub itself, in alloc_slab_obj_exts():

        /* Prevent recursive extension vector allocation */
        gfp |= __GFP_NO_OBJ_EXT;
        if (unlikely(!allow_spin)) {
                size_t sz = objects * sizeof(struct slabobj_ext);

                vec = kmalloc_nolock(sz, __GFP_ZERO, slab_nid(slab));
        } else {
                vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
                                   slab_nid(slab));
        }

Prevent recursive... hm? And we had stack overflow?
Also .config has CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y

So, this?
diff --git a/mm/slub.c b/mm/slub.c
index 837ee037abb5..c4f17ac6e4b6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2092,7 +2092,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 	if (unlikely(!allow_spin)) {
 		size_t sz = objects * sizeof(struct slabobj_ext);
 
-		vec = kmalloc_nolock(sz, __GFP_ZERO, slab_nid(slab));
+		vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
+				     slab_nid(slab));
 	} else {
 		vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
 				   slab_nid(slab));
@@ -5591,7 +5592,8 @@ void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
 	bool can_retry = true;
 	void *ret = ERR_PTR(-EBUSY);
 
-	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO));
+	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
+				      __GFP_NO_OBJ_EXT));
 
 	if (unlikely(!size))
 		return ZERO_SIZE_PTR;
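
To illustrate why the missing flag can end in a stack overflow, here is a
minimal userspace sketch of the recursion. It is a simplified model, not
kernel code: the MODEL_* names and values, model_alloc() and model_depth()
are all hypothetical, and the nesting is capped so the model itself cannot
overrun the stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gfp bits; the values are arbitrary. */
#define MODEL_GFP_ZERO		0x1u
#define MODEL_GFP_NO_OBJ_EXT	0x2u

/* Stop the model after this many nested calls. */
#define MODEL_MAX_DEPTH		64

static int max_depth_seen;

/*
 * Model of an allocation with a profiling hook. Unless the caller passes
 * MODEL_GFP_NO_OBJ_EXT, the hook allocates an extension vector, which is
 * itself an allocation going through the same hook.
 * Returns false if the nesting limit was hit (the "stack overflow" case).
 */
static bool model_alloc(unsigned int flags, unsigned int vec_flags, int depth)
{
	if (depth > max_depth_seen)
		max_depth_seen = depth;
	if (depth >= MODEL_MAX_DEPTH)
		return false;
	if (flags & MODEL_GFP_NO_OBJ_EXT)
		return true;	/* no extension vector wanted: recursion ends */
	/* Allocate the extension vector with the flags under test. */
	return model_alloc(vec_flags, vec_flags, depth + 1);
}

/* Deepest nesting reached when vectors are allocated with vec_flags. */
static int model_depth(unsigned int vec_flags)
{
	max_depth_seen = 0;
	model_alloc(MODEL_GFP_ZERO, vec_flags, 0);
	return max_depth_seen;
}
```

With only MODEL_GFP_ZERO for the vector allocation the model runs into the
nesting cap; adding MODEL_GFP_NO_OBJ_EXT stops it after one level, which
mirrors what the diff above does for the kmalloc_nolock() call.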




* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-17  8:03 ` Vlastimil Babka
@ 2025-09-17  9:18   ` Vlastimil Babka
  2025-09-17 18:38     ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2025-09-17  9:18 UTC
  To: kernel test robot, Alexei Starovoitov, Harry Yoo, Suren Baghdasaryan
  Cc: oe-lkp, lkp, kasan-dev, cgroups, linux-mm

On 9/17/25 10:03, Vlastimil Babka wrote:
> On 9/17/25 07:01, kernel test robot wrote:
>> 
>> 
>> Hello,
>> 
>> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>> 
>> commit: db93cdd664fa02de9be883dd29343b21d8fc790f ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> 
>> in testcase: boot
>> 
>> config: i386-randconfig-062-20250913
>> compiler: clang-20
>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>> 
>> (please refer to attached dmesg/kmsg for entire log/backtrace)

I managed to reproduce this locally and my suggested fix works, so I'm going
to fold it in unless there are objections or better suggestions.

Also I was curious to find out which path is triggered so I've put a
dump_stack() before the kmalloc_nolock call:

[    0.731812][    T0] Call Trace:
[    0.732406][    T0]  __dump_stack+0x18/0x30
[    0.733200][    T0]  dump_stack_lvl+0x32/0x90
[    0.734037][    T0]  dump_stack+0xd/0x20
[    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
[    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
[    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
[    0.737858][    T0]  ? __set_page_owner+0x167/0x280
[    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
[    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
[    0.740687][    T0]  ? __set_page_owner+0x167/0x280
[    0.741604][    T0]  __set_page_owner+0x167/0x280
[    0.742503][    T0]  post_alloc_hook+0x17a/0x200
[    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
[    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
[    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
[    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
[    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
[    0.748358][    T0]  ? lock_acquire+0x8b/0x180
[    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
[    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
[    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
[    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
[    0.753023][    T0]  alloc_slab_page+0xda/0x150
[    0.753879][    T0]  new_slab+0xe1/0x500
[    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
[    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
[    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
[    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
[    0.758446][    T0]  __slab_alloc+0x4e/0x70
[    0.759237][    T0]  ? mm_alloc+0x38/0x80
[    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
[    0.760993][    T0]  ? mm_alloc+0x38/0x80
[    0.761745][    T0]  ? mm_alloc+0x38/0x80
[    0.762506][    T0]  mm_alloc+0x38/0x80
[    0.763260][    T0]  poking_init+0xe/0x80
[    0.764032][    T0]  start_kernel+0x16b/0x470
[    0.764858][    T0]  i386_start_kernel+0xce/0xf0
[    0.765723][    T0]  startup_32_smp+0x151/0x160

And the reason is we still have restricted gfp_allowed_mask at this point:
/* The GFP flags allowed during early boot */
#define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))

It's only lifted to a full allowed mask later in the boot.

That means that, due to "kmalloc_nolock() is not supported on architectures
that don't implement cmpxchg16b", such architectures will no longer get
objexts allocated in early boot. I guess that's not a big deal.

Also any later allocation whose flags for some reason end up without
__GFP_RECLAIM will also lose its objexts. I hope that's also acceptable.
I don't know if we can distinguish a real kmalloc_nolock() scope in
alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
argument through several layers of functions.
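
The effect of the boot mask can be sketched in userspace as follows. This is
a model under stated assumptions, not the kernel's code: the M_* bit values
are hypothetical, and model_allow_spinning() assumes that "may spin" is
decided by whether a reclaim bit survives in the flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the relationships between them matter. */
#define M_GFP_RECLAIM	0x01u
#define M_GFP_IO	0x02u
#define M_GFP_FS	0x04u
#define M_GFP_ZERO	0x08u
#define M_GFP_BITS_MASK	0x0fu
#define M_GFP_KERNEL	(M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

/* Same shape as the GFP_BOOT_MASK definition quoted above. */
#define M_GFP_BOOT_MASK \
	(M_GFP_BITS_MASK & ~(M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS))

/* Assumed test: an allocation may spin only if it carries a reclaim bit. */
static bool model_allow_spinning(unsigned int gfp)
{
	return (gfp & M_GFP_RECLAIM) != 0;
}

/* Flags as seen by the allocator after applying the currently allowed mask. */
static unsigned int model_effective_gfp(unsigned int gfp,
					unsigned int allowed_mask)
{
	return gfp & allowed_mask;
}
```

Under the full mask an ordinary M_GFP_KERNEL request keeps its reclaim bit;
under the boot mask the same request loses it and looks just like a request
that must not spin.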

>> 
>> 
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202509171214.912d5ac-lkp@intel.com
>> 
>> 
>> [    7.101117][    T0] BUG: kernel NULL pointer dereference, address: 00000010
>> [    7.102290][    T0] #PF: supervisor read access in kernel mode
>> [    7.103219][    T0] #PF: error_code(0x0000) - not-present page
>> [    7.104161][    T0] *pde = 00000000
>> [    7.104762][    T0] Thread overran stack, or stack corrupted
> 
> Note this.
> 
>> [    7.105726][    T0] Oops: Oops: 0000 [#1]
>> [    7.106410][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G                T   6.17.0-rc3-00014-gdb93cdd664fa #1 NONE  40eff3b43e4f0000b061f2e660abd0b2911f31b1
>> [    7.108712][    T0] Tainted: [T]=RANDSTRUCT
>> [    7.109368][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>> [ 7.110952][ T0] EIP: kmalloc_nolock_noprof (mm/slub.c:5607) 
> 
> That's here.
> if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
> 
> dmesg already contains line "SLUB: HWalign=64, Order=0-3, MinObjects=0,
> CPUs=1, Nodes=1" so all kmem caches are fully initialized, so doesn't look
> like a bootstrap issue. Probably it's due to the stack overflow and not
> actual bug on this line.
> 
> Because of that it's also unable to print the backtrace. But the only
> kmalloc_nolock usage for now is in slub itself, alloc_slab_obj_exts():
> 
>         /* Prevent recursive extension vector allocation */
>         gfp |= __GFP_NO_OBJ_EXT;
>         if (unlikely(!allow_spin)) {
>                 size_t sz = objects * sizeof(struct slabobj_ext);
> 
>                 vec = kmalloc_nolock(sz, __GFP_ZERO, slab_nid(slab));
>         } else {
>                 vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>                                    slab_nid(slab));
>         }
> 
> Prevent recursive... hm? And we had stack overflow?
> Also .config has CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> 
> So, this?
> diff --git a/mm/slub.c b/mm/slub.c
> index 837ee037abb5..c4f17ac6e4b6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2092,7 +2092,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>  	if (unlikely(!allow_spin)) {
>  		size_t sz = objects * sizeof(struct slabobj_ext);
>  
> -		vec = kmalloc_nolock(sz, __GFP_ZERO, slab_nid(slab));
> +		vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
> +				     slab_nid(slab));
>  	} else {
>  		vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>  				   slab_nid(slab));
> @@ -5591,7 +5592,8 @@ void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
>  	bool can_retry = true;
>  	void *ret = ERR_PTR(-EBUSY);
>  
> -	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO));
> +	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
> +				      __GFP_NO_OBJ_EXT));
>  
>  	if (unlikely(!size))
>  		return ZERO_SIZE_PTR;
> 




* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-17  9:18   ` Vlastimil Babka
@ 2025-09-17 18:38     ` Alexei Starovoitov
  2025-09-18  7:06       ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-09-17 18:38 UTC
  To: Vlastimil Babka
  Cc: kernel test robot, Alexei Starovoitov, Harry Yoo,
	Suren Baghdasaryan, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Wed, Sep 17, 2025 at 2:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/17/25 10:03, Vlastimil Babka wrote:
> > On 9/17/25 07:01, kernel test robot wrote:
> >>
> >>
> >> Hello,
> >>
> >> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
> >>
> >> commit: db93cdd664fa02de9be883dd29343b21d8fc790f ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>
> >> in testcase: boot
> >>
> >> config: i386-randconfig-062-20250913
> >> compiler: clang-20
> >> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >>
> >> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
> Managed to reproduce locally and my suggested fix works so I'm going to fold
> it unless there's objections or better suggestions.

Thanks for the fix. Not sure what I was thinking. __GFP_NO_OBJ_EXT
is obviously needed there.

> Also I was curious to find out which path is triggered so I've put a
> dump_stack() before the kmalloc_nolock call:
>
> [    0.731812][    T0] Call Trace:
> [    0.732406][    T0]  __dump_stack+0x18/0x30
> [    0.733200][    T0]  dump_stack_lvl+0x32/0x90
> [    0.734037][    T0]  dump_stack+0xd/0x20
> [    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
> [    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
> [    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
> [    0.737858][    T0]  ? __set_page_owner+0x167/0x280
> [    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
> [    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
> [    0.740687][    T0]  ? __set_page_owner+0x167/0x280
> [    0.741604][    T0]  __set_page_owner+0x167/0x280
> [    0.742503][    T0]  post_alloc_hook+0x17a/0x200
> [    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
> [    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
> [    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
> [    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
> [    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
> [    0.748358][    T0]  ? lock_acquire+0x8b/0x180
> [    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
> [    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
> [    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
> [    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
> [    0.753023][    T0]  alloc_slab_page+0xda/0x150
> [    0.753879][    T0]  new_slab+0xe1/0x500
> [    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
> [    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
> [    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
> [    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
> [    0.758446][    T0]  __slab_alloc+0x4e/0x70
> [    0.759237][    T0]  ? mm_alloc+0x38/0x80
> [    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
> [    0.760993][    T0]  ? mm_alloc+0x38/0x80
> [    0.761745][    T0]  ? mm_alloc+0x38/0x80
> [    0.762506][    T0]  mm_alloc+0x38/0x80
> [    0.763260][    T0]  poking_init+0xe/0x80
> [    0.764032][    T0]  start_kernel+0x16b/0x470
> [    0.764858][    T0]  i386_start_kernel+0xce/0xf0
> [    0.765723][    T0]  startup_32_smp+0x151/0x160
>
> And the reason is we still have restricted gfp_allowed_mask at this point:
> /* The GFP flags allowed during early boot */
> #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
>
> It's only lifted to a full allowed mask later in the boot.

Ohh. That's interesting.

> That means due to "kmalloc_nolock() is not supported on architectures that
> don't implement cmpxchg16b" such architectures will no longer get objexts
> allocated in early boot. I guess that's not a big deal.
>
> Also any later allocation having its flags screwed for some reason to not
> have __GFP_RECLAIM will also lose its objexts. Hope that's also acceptable.
> I don't know if we can distinguish a real kmalloc_nolock() scope in
> alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
> argument through several layers of functions.

I think it's ok-ish.
Can we add a check to alloc_slab_obj_exts() that sets allow_spin=true
if we're in the boot phase? Like:
if (gfp_allowed_mask != __GFP_BITS_MASK)
        allow_spin = true;
Or is there some cleaner way to detect boot time, e.g. by checking slab_state?
bpf is not active during boot, so nothing should be
calling kmalloc_nolock at that point.



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-17 18:38     ` Alexei Starovoitov
@ 2025-09-18  7:06       ` Vlastimil Babka
  2025-09-18 14:49         ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2025-09-18  7:06 UTC
  To: Alexei Starovoitov
  Cc: kernel test robot, Alexei Starovoitov, Harry Yoo,
	Suren Baghdasaryan, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On 9/17/25 20:38, Alexei Starovoitov wrote:
> On Wed, Sep 17, 2025 at 2:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> Also I was curious to find out which path is triggered so I've put a
>> dump_stack() before the kmalloc_nolock call:
>>
>> [    0.731812][    T0] Call Trace:
>> [    0.732406][    T0]  __dump_stack+0x18/0x30
>> [    0.733200][    T0]  dump_stack_lvl+0x32/0x90
>> [    0.734037][    T0]  dump_stack+0xd/0x20
>> [    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
>> [    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
>> [    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
>> [    0.737858][    T0]  ? __set_page_owner+0x167/0x280
>> [    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
>> [    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
>> [    0.740687][    T0]  ? __set_page_owner+0x167/0x280
>> [    0.741604][    T0]  __set_page_owner+0x167/0x280
>> [    0.742503][    T0]  post_alloc_hook+0x17a/0x200
>> [    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
>> [    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
>> [    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
>> [    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
>> [    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
>> [    0.748358][    T0]  ? lock_acquire+0x8b/0x180
>> [    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
>> [    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
>> [    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
>> [    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
>> [    0.753023][    T0]  alloc_slab_page+0xda/0x150
>> [    0.753879][    T0]  new_slab+0xe1/0x500
>> [    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
>> [    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
>> [    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
>> [    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
>> [    0.758446][    T0]  __slab_alloc+0x4e/0x70
>> [    0.759237][    T0]  ? mm_alloc+0x38/0x80
>> [    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
>> [    0.760993][    T0]  ? mm_alloc+0x38/0x80
>> [    0.761745][    T0]  ? mm_alloc+0x38/0x80
>> [    0.762506][    T0]  mm_alloc+0x38/0x80
>> [    0.763260][    T0]  poking_init+0xe/0x80
>> [    0.764032][    T0]  start_kernel+0x16b/0x470
>> [    0.764858][    T0]  i386_start_kernel+0xce/0xf0
>> [    0.765723][    T0]  startup_32_smp+0x151/0x160
>>
>> And the reason is we still have restricted gfp_allowed_mask at this point:
>> /* The GFP flags allowed during early boot */
>> #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
>>
>> It's only lifted to a full allowed mask later in the boot.
> 
> Ohh. That's interesting.
> 
>> That means due to "kmalloc_nolock() is not supported on architectures that
>> don't implement cmpxchg16b" such architectures will no longer get objexts
>> allocated in early boot. I guess that's not a big deal.
>>
>> Also any later allocation having its flags screwed for some reason to not
>> have __GFP_RECLAIM will also lose its objexts. Hope that's also acceptable.
>> I don't know if we can distinguish a real kmalloc_nolock() scope in
>> alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
>> argument through several layers of functions.
> 
> I think it's ok-ish.
> Can we add a check to alloc_slab_obj_exts() that sets allow_spin=true
> if we're in the boot phase? Like:
> if (gfp_allowed_mask != __GFP_BITS_MASK)
>    allow_spin = true;
> or some cleaner way to detect boot time by checking slab_state ?
> bpf is not active during the boot and nothing should be
> calling kmalloc_nolock.

Checking the gfp_allowed_mask should work. Slab state is already UP so it
won't help, and this is not really about slab state anyway.
But is it worth it? Suren, what do you think?



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-18  7:06       ` Vlastimil Babka
@ 2025-09-18 14:49         ` Suren Baghdasaryan
  2025-09-19  1:39           ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Suren Baghdasaryan @ 2025-09-18 14:49 UTC
  To: Vlastimil Babka
  Cc: Alexei Starovoitov, kernel test robot, Alexei Starovoitov,
	Harry Yoo, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Thu, Sep 18, 2025 at 12:06 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/17/25 20:38, Alexei Starovoitov wrote:
> > On Wed, Sep 17, 2025 at 2:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> Also I was curious to find out which path is triggered so I've put a
> >> dump_stack() before the kmalloc_nolock call:
> >>
> >> [    0.731812][    T0] Call Trace:
> >> [    0.732406][    T0]  __dump_stack+0x18/0x30
> >> [    0.733200][    T0]  dump_stack_lvl+0x32/0x90
> >> [    0.734037][    T0]  dump_stack+0xd/0x20
> >> [    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
> >> [    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
> >> [    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
> >> [    0.737858][    T0]  ? __set_page_owner+0x167/0x280
> >> [    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
> >> [    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
> >> [    0.740687][    T0]  ? __set_page_owner+0x167/0x280
> >> [    0.741604][    T0]  __set_page_owner+0x167/0x280
> >> [    0.742503][    T0]  post_alloc_hook+0x17a/0x200
> >> [    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
> >> [    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
> >> [    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
> >> [    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
> >> [    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
> >> [    0.748358][    T0]  ? lock_acquire+0x8b/0x180
> >> [    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
> >> [    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
> >> [    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
> >> [    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
> >> [    0.753023][    T0]  alloc_slab_page+0xda/0x150
> >> [    0.753879][    T0]  new_slab+0xe1/0x500
> >> [    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
> >> [    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
> >> [    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
> >> [    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
> >> [    0.758446][    T0]  __slab_alloc+0x4e/0x70
> >> [    0.759237][    T0]  ? mm_alloc+0x38/0x80
> >> [    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
> >> [    0.760993][    T0]  ? mm_alloc+0x38/0x80
> >> [    0.761745][    T0]  ? mm_alloc+0x38/0x80
> >> [    0.762506][    T0]  mm_alloc+0x38/0x80
> >> [    0.763260][    T0]  poking_init+0xe/0x80
> >> [    0.764032][    T0]  start_kernel+0x16b/0x470
> >> [    0.764858][    T0]  i386_start_kernel+0xce/0xf0
> >> [    0.765723][    T0]  startup_32_smp+0x151/0x160
> >>
> >> And the reason is we still have restricted gfp_allowed_mask at this point:
> >> /* The GFP flags allowed during early boot */
> >> #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
> >>
> >> It's only lifted to a full allowed mask later in the boot.
> >
> > Ohh. That's interesting.
> >
> >> That means due to "kmalloc_nolock() is not supported on architectures that
> >> don't implement cmpxchg16b" such architectures will no longer get objexts
> >> allocated in early boot. I guess that's not a big deal.
> >>
> >> Also any later allocation having its flags screwed for some reason to not
> >> have __GFP_RECLAIM will also lose its objexts. Hope that's also acceptable.
> >> I don't know if we can distinguish a real kmalloc_nolock() scope in
> >> alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
> >> argument through several layers of functions.
> >
> > I think it's ok-ish.
> > Can we add a check to alloc_slab_obj_exts() that sets allow_spin=true
> > if we're in the boot phase? Like:
> > if (gfp_allowed_mask != __GFP_BITS_MASK)
> >    allow_spin = true;
> > or some cleaner way to detect boot time by checking slab_state ?
> > bpf is not active during the boot and nothing should be
> > calling kmalloc_nolock.
>
> Checking the gfp_allowed_mask should work. Slab state is already UP so won't
> help, and this is not really about slab state anyway.
> But whether worth it... Suren what do you think?

Vlastimil's fix is correct. We definitely need __GFP_NO_OBJ_EXT when
allocating an obj_exts vector, otherwise it will try to recursively
allocate an obj_exts vector for obj_exts allocation.

For the additional __GFP_BITS_MASK check, that sounds good to me as
long as we add a comment on why that is there. Or maybe such a check
deserves to be placed in a separate function similar to
gfpflags_allow_{spinning | blocking}?
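
One possible shape for such a helper, sketched in userspace. Everything here
is an assumption for illustration: the model_* names, the mask values and
the stand-in for gfp_allowed_mask are hypothetical, and this is not a
proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

#define M_GFP_BITS_MASK	0x0fu	/* hypothetical: all valid gfp bits */
#define M_GFP_BOOT_MASK	0x08u	/* hypothetical: reduced early-boot mask */

/* Stand-in for the kernel's gfp_allowed_mask variable. */
static unsigned int model_gfp_allowed_mask = M_GFP_BOOT_MASK;

/* Hypothetical helper: true while the allowed mask is still restricted. */
static bool model_gfp_mask_restricted(void)
{
	return model_gfp_allowed_mask != M_GFP_BITS_MASK;
}

/*
 * During early boot nothing should be calling kmalloc_nolock(), so a
 * missing reclaim bit comes from the boot mask rather than from a nolock
 * caller: treat spinning as allowed in that phase.
 */
static bool model_obj_exts_allow_spin(bool allow_spin)
{
	if (model_gfp_mask_restricted())
		return true;
	return allow_spin;
}
```

Keeping the comparison in a named helper leaves room for the comment that
explains why the check exists, as asked for above.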



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-18 14:49         ` Suren Baghdasaryan
@ 2025-09-19  1:39           ` Alexei Starovoitov
  2025-09-19 15:01             ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-09-19  1:39 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, kernel test robot, Alexei Starovoitov,
	Harry Yoo, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Thu, Sep 18, 2025 at 7:49 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Sep 18, 2025 at 12:06 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 9/17/25 20:38, Alexei Starovoitov wrote:
> > > On Wed, Sep 17, 2025 at 2:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >>
> > >> Also I was curious to find out which path is triggered so I've put a
> > >> dump_stack() before the kmalloc_nolock call:
> > >>
> > >> [    0.731812][    T0] Call Trace:
> > >> [    0.732406][    T0]  __dump_stack+0x18/0x30
> > >> [    0.733200][    T0]  dump_stack_lvl+0x32/0x90
> > >> [    0.734037][    T0]  dump_stack+0xd/0x20
> > >> [    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
> > >> [    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
> > >> [    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
> > >> [    0.737858][    T0]  ? __set_page_owner+0x167/0x280
> > >> [    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
> > >> [    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
> > >> [    0.740687][    T0]  ? __set_page_owner+0x167/0x280
> > >> [    0.741604][    T0]  __set_page_owner+0x167/0x280
> > >> [    0.742503][    T0]  post_alloc_hook+0x17a/0x200
> > >> [    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
> > >> [    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > >> [    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > >> [    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
> > >> [    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
> > >> [    0.748358][    T0]  ? lock_acquire+0x8b/0x180
> > >> [    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
> > >> [    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
> > >> [    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
> > >> [    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > >> [    0.753023][    T0]  alloc_slab_page+0xda/0x150
> > >> [    0.753879][    T0]  new_slab+0xe1/0x500
> > >> [    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > >> [    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
> > >> [    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
> > >> [    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
> > >> [    0.758446][    T0]  __slab_alloc+0x4e/0x70
> > >> [    0.759237][    T0]  ? mm_alloc+0x38/0x80
> > >> [    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
> > >> [    0.760993][    T0]  ? mm_alloc+0x38/0x80
> > >> [    0.761745][    T0]  ? mm_alloc+0x38/0x80
> > >> [    0.762506][    T0]  mm_alloc+0x38/0x80
> > >> [    0.763260][    T0]  poking_init+0xe/0x80
> > >> [    0.764032][    T0]  start_kernel+0x16b/0x470
> > >> [    0.764858][    T0]  i386_start_kernel+0xce/0xf0
> > >> [    0.765723][    T0]  startup_32_smp+0x151/0x160
> > >>
> > >> And the reason is we still have restricted gfp_allowed_mask at this point:
> > >> /* The GFP flags allowed during early boot */
> > >> #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
> > >>
> > >> It's only lifted to a full allowed mask later in the boot.
> > >
> > > Ohh. That's interesting.
> > >
> > >> That means due to "kmalloc_nolock() is not supported on architectures that
> > >> don't implement cmpxchg16b" such architectures will no longer get objexts
> > >> allocated in early boot. I guess that's not a big deal.
> > >>
> > >> Also any later allocation having its flags screwed for some reason to not
> > >> have __GFP_RECLAIM will also lose its objexts. Hope that's also acceptable.
> > >> I don't know if we can distinguish a real kmalloc_nolock() scope in
> > >> alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
> > >> argument through several layers of functions.
> > >
> > > I think it's ok-ish.
> > > Can we add a check to alloc_slab_obj_exts() that sets allow_spin=true
> > > if we're in the boot phase? Like:
> > > if (gfp_allowed_mask != __GFP_BITS_MASK)
> > >    allow_spin = true;
> > > or some cleaner way to detect boot time by checking slab_state ?
> > > bpf is not active during the boot and nothing should be
> > > calling kmalloc_nolock.
> >
> > Checking the gfp_allowed_mask should work. Slab state is already UP so won't
> > help, and this is not really about slab state anyway.
> > But whether it's worth it... Suren, what do you think?
>
> Vlastimil's fix is correct. We definitely need __GFP_NO_OBJ_EXT when
> allocating an obj_exts vector, otherwise it will try to recursively
> allocate an obj_exts vector for obj_exts allocation.
>
> For the additional __GFP_BITS_MASK check, that sounds good to me as
> long as we add a comment on why that is there. Or maybe such a check
> deserves to be placed in a separate function similar to
> gfpflags_allow_{spinning | blocking}?

I would not. I think adding 'boot or not' logic to these two
will muddy the waters and will make the whole slab/page_alloc/memcg
logic and dependencies between them much harder to follow.
I'd either add a comment to alloc_slab_obj_exts() explaining
what may happen or add a 'boot or not' check only there.
imo this is a niche, rare and special case.
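
A minimal userspace sketch of the 'boot or not' check confined to a single decision point, as discussed above. It assumes that spinning is allowed exactly when the reclaim bit survives, and that the caller's flags were already filtered by the allowed mask somewhere upstream. The M_GFP_* values and model_allow_spin() are hypothetical and do not match the kernel's real bit layout or function names.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit layout only; the kernel's real values differ. */
#define M_GFP_RECLAIM	(1u << 0)
#define M_GFP_IO	(1u << 1)
#define M_GFP_FS	(1u << 2)
#define M_GFP_ZERO	(1u << 3)
#define M_GFP_BITS_MASK	(M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS | M_GFP_ZERO)
/* Same shape as GFP_BOOT_MASK: everything except reclaim, IO and FS. */
#define M_GFP_BOOT_MASK	(M_GFP_BITS_MASK & ~(M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS))
#define M_GFP_KERNEL	(M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

/*
 * Models the allow_spin decision for one call site. When boot_check is
 * set, a restricted allowed mask is treated as "still booting" and
 * spinning is forced on, as in the proposed
 * "if (gfp_allowed_mask != __GFP_BITS_MASK) allow_spin = true;".
 */
static bool model_allow_spin(unsigned int gfp, unsigned int allowed_mask,
			     bool boot_check)
{
	bool allow_spin;

	gfp &= allowed_mask;			/* assumed upstream filtering */
	allow_spin = (gfp & M_GFP_RECLAIM) != 0;

	if (boot_check && allowed_mask != M_GFP_BITS_MASK)
		allow_spin = true;

	return allow_spin;
}

/* Quick self-check: early boot with and without the local check. */
static void model_spin_selfcheck(void)
{
	assert(!model_allow_spin(M_GFP_KERNEL, M_GFP_BOOT_MASK, false));
	assert(model_allow_spin(M_GFP_KERNEL, M_GFP_BOOT_MASK, true));
}
```

Because the override lives in one function, the shared helpers keep their plain "does the gfp allow it" meaning, and once the mask is lifted a request without the reclaim bit still reports allow_spin as false.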



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-19  1:39           ` Alexei Starovoitov
@ 2025-09-19 15:01             ` Suren Baghdasaryan
  2025-09-19 18:31               ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Suren Baghdasaryan @ 2025-09-19 15:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, kernel test robot, Alexei Starovoitov,
	Harry Yoo, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Thu, Sep 18, 2025 at 6:39 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Sep 18, 2025 at 7:49 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 12:06 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > On 9/17/25 20:38, Alexei Starovoitov wrote:
> > > > On Wed, Sep 17, 2025 at 2:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >>
> > > >> Also I was curious to find out which path is triggered so I've put a
> > > >> dump_stack() before the kmalloc_nolock call:
> > > >>
> > > >> [    0.731812][    T0] Call Trace:
> > > >> [    0.732406][    T0]  __dump_stack+0x18/0x30
> > > >> [    0.733200][    T0]  dump_stack_lvl+0x32/0x90
> > > >> [    0.734037][    T0]  dump_stack+0xd/0x20
> > > >> [    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
> > > >> [    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
> > > >> [    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
> > > >> [    0.737858][    T0]  ? __set_page_owner+0x167/0x280
> > > >> [    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
> > > >> [    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
> > > >> [    0.740687][    T0]  ? __set_page_owner+0x167/0x280
> > > >> [    0.741604][    T0]  __set_page_owner+0x167/0x280
> > > >> [    0.742503][    T0]  post_alloc_hook+0x17a/0x200
> > > >> [    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
> > > >> [    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > >> [    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > >> [    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
> > > >> [    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
> > > >> [    0.748358][    T0]  ? lock_acquire+0x8b/0x180
> > > >> [    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
> > > >> [    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
> > > >> [    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
> > > >> [    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > >> [    0.753023][    T0]  alloc_slab_page+0xda/0x150
> > > >> [    0.753879][    T0]  new_slab+0xe1/0x500
> > > >> [    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > >> [    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
> > > >> [    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
> > > >> [    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
> > > >> [    0.758446][    T0]  __slab_alloc+0x4e/0x70
> > > >> [    0.759237][    T0]  ? mm_alloc+0x38/0x80
> > > >> [    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
> > > >> [    0.760993][    T0]  ? mm_alloc+0x38/0x80
> > > >> [    0.761745][    T0]  ? mm_alloc+0x38/0x80
> > > >> [    0.762506][    T0]  mm_alloc+0x38/0x80
> > > >> [    0.763260][    T0]  poking_init+0xe/0x80
> > > >> [    0.764032][    T0]  start_kernel+0x16b/0x470
> > > >> [    0.764858][    T0]  i386_start_kernel+0xce/0xf0
> > > >> [    0.765723][    T0]  startup_32_smp+0x151/0x160
> > > >>
> > > >> And the reason is we still have restricted gfp_allowed_mask at this point:
> > > >> /* The GFP flags allowed during early boot */
> > > >> #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
> > > >>
> > > >> It's only lifted to a full allowed mask later in the boot.
> > > >
> > > > Ohh. That's interesting.
> > > >
> > > >> That means due to "kmalloc_nolock() is not supported on architectures that
> > > >> don't implement cmpxchg16b" such architectures will no longer get objexts
> > > >> allocated in early boot. I guess that's not a big deal.
> > > >>
> > > >> Also any later allocation having its flags screwed for some reason to not
> > > >> have __GFP_RECLAIM will also lose its objexts. Hope that's also acceptable.
> > > >> I don't know if we can distinguish a real kmalloc_nolock() scope in
> > > >> alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
> > > >> argument through several layers of functions.
> > > >
> > > > I think it's ok-ish.
> > > > Can we add a check to alloc_slab_obj_exts() that sets allow_spin=true
> > > > if we're in the boot phase? Like:
> > > > if (gfp_allowed_mask != __GFP_BITS_MASK)
> > > >    allow_spin = true;
> > > > or some cleaner way to detect boot time by checking slab_state ?
> > > > bpf is not active during the boot and nothing should be
> > > > calling kmalloc_nolock.
> > >
> > > Checking the gfp_allowed_mask should work. Slab state is already UP so won't
> > > help, and this is not really about slab state anyway.
> > > But whether it's worth it... Suren, what do you think?
> >
> > Vlastimil's fix is correct. We definitely need __GFP_NO_OBJ_EXT when
> > allocating an obj_exts vector, otherwise it will try to recursively
> > allocate an obj_exts vector for obj_exts allocation.
> >
> > For the additional __GFP_BITS_MASK check, that sounds good to me as
> > long as we add a comment on why that is there. Or maybe such a check
> > deserves to be placed in a separate function similar to
> > gfpflags_allow_{spinning | blocking}?
>
> I would not. I think adding 'boot or not' logic to these two
> will muddy the waters and will make the whole slab/page_alloc/memcg
> logic and dependencies between them much harder to follow.
> I'd either add a comment to alloc_slab_obj_exts() explaining
> what may happen or add a 'boot or not' check only there.
> imo this is a niche, rare and special case.

Ok, comment it is then.
Will you be sending a new version, or will Vlastimil be including that
in his fixup?



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-19 15:01             ` Suren Baghdasaryan
@ 2025-09-19 18:31               ` Alexei Starovoitov
  2025-09-26 12:25                 ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-09-19 18:31 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, kernel test robot, Alexei Starovoitov,
	Harry Yoo, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Fri, Sep 19, 2025 at 8:01 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Sep 18, 2025 at 6:39 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 7:49 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Sep 18, 2025 at 12:06 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > On 9/17/25 20:38, Alexei Starovoitov wrote:
> > > > > On Wed, Sep 17, 2025 at 2:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > > >>
> > > > >> Also I was curious to find out which path is triggered so I've put a
> > > > >> dump_stack() before the kmalloc_nolock call:
> > > > >>
> > > > >> [    0.731812][    T0] Call Trace:
> > > > >> [    0.732406][    T0]  __dump_stack+0x18/0x30
> > > > >> [    0.733200][    T0]  dump_stack_lvl+0x32/0x90
> > > > >> [    0.734037][    T0]  dump_stack+0xd/0x20
> > > > >> [    0.734780][    T0]  alloc_slab_obj_exts+0x181/0x1f0
> > > > >> [    0.735862][    T0]  __alloc_tagging_slab_alloc_hook+0xd1/0x330
> > > > >> [    0.736988][    T0]  ? __slab_alloc+0x4e/0x70
> > > > >> [    0.737858][    T0]  ? __set_page_owner+0x167/0x280
> > > > >> [    0.738774][    T0]  __kmalloc_cache_noprof+0x379/0x460
> > > > >> [    0.739756][    T0]  ? depot_fetch_stack+0x164/0x180
> > > > >> [    0.740687][    T0]  ? __set_page_owner+0x167/0x280
> > > > >> [    0.741604][    T0]  __set_page_owner+0x167/0x280
> > > > >> [    0.742503][    T0]  post_alloc_hook+0x17a/0x200
> > > > >> [    0.743404][    T0]  get_page_from_freelist+0x13b3/0x16b0
> > > > >> [    0.744427][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > > >> [    0.745358][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > > >> [    0.746290][    T0]  ? __next_zones_zonelist+0x26/0x60
> > > > >> [    0.747265][    T0]  __alloc_frozen_pages_noprof+0x143/0x1080
> > > > >> [    0.748358][    T0]  ? lock_acquire+0x8b/0x180
> > > > >> [    0.749209][    T0]  ? pcpu_alloc_noprof+0x181/0x800
> > > > >> [    0.750198][    T0]  ? sched_clock_noinstr+0x8/0x10
> > > > >> [    0.751119][    T0]  ? local_clock_noinstr+0x137/0x140
> > > > >> [    0.752089][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > > >> [    0.753023][    T0]  alloc_slab_page+0xda/0x150
> > > > >> [    0.753879][    T0]  new_slab+0xe1/0x500
> > > > >> [    0.754615][    T0]  ? kvm_sched_clock_read+0xd/0x20
> > > > >> [    0.755577][    T0]  ___slab_alloc+0xd79/0x1680
> > > > >> [    0.756469][    T0]  ? pcpu_alloc_noprof+0x538/0x800
> > > > >> [    0.757408][    T0]  ? __mutex_unlock_slowpath+0x195/0x3e0
> > > > >> [    0.758446][    T0]  __slab_alloc+0x4e/0x70
> > > > >> [    0.759237][    T0]  ? mm_alloc+0x38/0x80
> > > > >> [    0.759993][    T0]  kmem_cache_alloc_noprof+0x1db/0x470
> > > > >> [    0.760993][    T0]  ? mm_alloc+0x38/0x80
> > > > >> [    0.761745][    T0]  ? mm_alloc+0x38/0x80
> > > > >> [    0.762506][    T0]  mm_alloc+0x38/0x80
> > > > >> [    0.763260][    T0]  poking_init+0xe/0x80
> > > > >> [    0.764032][    T0]  start_kernel+0x16b/0x470
> > > > >> [    0.764858][    T0]  i386_start_kernel+0xce/0xf0
> > > > >> [    0.765723][    T0]  startup_32_smp+0x151/0x160
> > > > >>
> > > > >> And the reason is we still have restricted gfp_allowed_mask at this point:
> > > > >> /* The GFP flags allowed during early boot */
> > > > >> #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
> > > > >>
> > > > >> It's only lifted to a full allowed mask later in the boot.
> > > > >
> > > > > Ohh. That's interesting.
> > > > >
> > > > >> That means due to "kmalloc_nolock() is not supported on architectures that
> > > > >> don't implement cmpxchg16b" such architectures will no longer get objexts
> > > > >> allocated in early boot. I guess that's not a big deal.
> > > > >>
> > > > >> Also any later allocation having its flags screwed for some reason to not
> > > > >> have __GFP_RECLAIM will also lose its objexts. Hope that's also acceptable.
> > > > >> I don't know if we can distinguish a real kmalloc_nolock() scope in
> > > > >> alloc_slab_obj_exts() without inventing new gfp flags or passing an extra
> > > > >> argument through several layers of functions.
> > > > >
> > > > > I think it's ok-ish.
> > > > > Can we add a check to alloc_slab_obj_exts() that sets allow_spin=true
> > > > > if we're in the boot phase? Like:
> > > > > if (gfp_allowed_mask != __GFP_BITS_MASK)
> > > > >    allow_spin = true;
> > > > > or some cleaner way to detect boot time by checking slab_state ?
> > > > > bpf is not active during the boot and nothing should be
> > > > > calling kmalloc_nolock.
> > > >
> > > > Checking the gfp_allowed_mask should work. Slab state is already UP so won't
> > > > help, and this is not really about slab state anyway.
> > > > But whether it's worth it... Suren, what do you think?
> > >
> > > Vlastimil's fix is correct. We definitely need __GFP_NO_OBJ_EXT when
> > > allocating an obj_exts vector, otherwise it will try to recursively
> > > allocate an obj_exts vector for obj_exts allocation.
> > >
> > > For the additional __GFP_BITS_MASK check, that sounds good to me as
> > > long as we add a comment on why that is there. Or maybe such a check
> > > deserves to be placed in a separate function similar to
> > > gfpflags_allow_{spinning | blocking}?
> >
> > I would not. I think adding 'boot or not' logic to these two
> > will muddy the waters and will make the whole slab/page_alloc/memcg
> > logic and dependencies between them much harder to follow.
> > I'd either add a comment to alloc_slab_obj_exts() explaining
> > what may happen or add a 'boot or not' check only there.
> > imo this is a niche, rare and special case.
>
> Ok, comment it is then.
> Will you be sending a new version, or will Vlastimil be including that
> in his fixup?

Whichever way. I can, but so far Vlastimil's phrasing of comments
has been much better than mine :) So I think he can fold what he prefers.



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-19 18:31               ` Alexei Starovoitov
@ 2025-09-26 12:25                 ` Vlastimil Babka
  2025-09-26 15:30                   ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2025-09-26 12:25 UTC (permalink / raw)
  To: Alexei Starovoitov, Suren Baghdasaryan
  Cc: kernel test robot, Alexei Starovoitov, Harry Yoo, oe-lkp,
	kbuild test robot, kasan-dev, open list:CONTROL GROUP (CGROUP),
	linux-mm

On 9/19/25 20:31, Alexei Starovoitov wrote:
> On Fri, Sep 19, 2025 at 8:01 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> >
>> > I would not. I think adding 'boot or not' logic to these two
>> > will muddy the waters and will make the whole slab/page_alloc/memcg
>> > logic and dependencies between them much harder to follow.
>> > I'd either add a comment to alloc_slab_obj_exts() explaining
>> > what may happen or add a 'boot or not' check only there.
>> > imo this is a niche, rare and special case.
>>
>> Ok, comment it is then.
>> Will you be sending a new version, or will Vlastimil be including that
>> in his fixup?
>
> Whichever way. I can, but so far Vlastimil's phrasing of comments
> has been much better than mine :) So I think he can fold what he prefers.

I'm adding this. Hopefully we'll be able to make sheaves the only percpu
caching layer in SLUB in the (near) future, and then the requirement for
cmpxchg16b for allocations will be gone.

diff --git a/mm/slub.c b/mm/slub.c
index 9f1054f0b9ca..f9f7f3942074 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2089,6 +2089,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
        gfp &= ~OBJCGS_CLEAR_MASK;
        /* Prevent recursive extension vector allocation */
        gfp |= __GFP_NO_OBJ_EXT;
+
+       /*
+        * Note that allow_spin may be false during early boot and its
+        * restricted GFP_BOOT_MASK. Due to kmalloc_nolock() only supporting
+        * architectures with cmpxchg16b, early obj_exts will be missing for
+        * very early allocations on those.
+        */
        if (unlikely(!allow_spin)) {
                size_t sz = objects * sizeof(struct slabobj_ext);
 



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-26 12:25                 ` Vlastimil Babka
@ 2025-09-26 15:30                   ` Alexei Starovoitov
  2025-09-26 15:38                     ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2025-09-26 15:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, kernel test robot, Alexei Starovoitov,
	Harry Yoo, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Fri, Sep 26, 2025 at 1:25 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/19/25 20:31, Alexei Starovoitov wrote:
> > On Fri, Sep 19, 2025 at 8:01 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> >
> >> > I would not. I think adding 'boot or not' logic to these two
> >> > will muddy the waters and will make the whole slab/page_alloc/memcg
> >> > logic and dependencies between them much harder to follow.
> >> > I'd either add a comment to alloc_slab_obj_exts() explaining
> >> > what may happen or add a 'boot or not' check only there.
> >> > imo this is a niche, rare and special case.
> >>
> >> Ok, comment it is then.
> >> Will you be sending a new version, or will Vlastimil be including that
> >> in his fixup?
> >
> > Whichever way. I can, but so far Vlastimil's phrasing of comments
> > has been much better than mine :) So I think he can fold what he prefers.
>
> I'm adding this. Hopefully we'll be able to make sheaves the only percpu
> caching layer in SLUB in the (near) future, and then the requirement for
> cmpxchg16b for allocations will be gone.
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 9f1054f0b9ca..f9f7f3942074 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2089,6 +2089,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>         gfp &= ~OBJCGS_CLEAR_MASK;
>         /* Prevent recursive extension vector allocation */
>         gfp |= __GFP_NO_OBJ_EXT;
> +
> +       /*
> +        * Note that allow_spin may be false during early boot and its
> +        * restricted GFP_BOOT_MASK. Due to kmalloc_nolock() only supporting
> +        * architectures with cmpxchg16b, early obj_exts will be missing for
> +        * very early allocations on those.
> +        */

lgtm. Maybe add a sentence about the future sheaves plan, so it's clear
that there is a path forward and the above won't stay forever.



* Re: [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address
  2025-09-26 15:30                   ` Alexei Starovoitov
@ 2025-09-26 15:38                     ` Suren Baghdasaryan
  0 siblings, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2025-09-26 15:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, kernel test robot, Alexei Starovoitov,
	Harry Yoo, oe-lkp, kbuild test robot, kasan-dev,
	open list:CONTROL GROUP (CGROUP),
	linux-mm

On Fri, Sep 26, 2025 at 8:30 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Sep 26, 2025 at 1:25 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 9/19/25 20:31, Alexei Starovoitov wrote:
> > > On Fri, Sep 19, 2025 at 8:01 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >>
> > >> >
> > >> > I would not. I think adding 'boot or not' logic to these two
> > >> > will muddy the waters and will make the whole slab/page_alloc/memcg
> > >> > logic and dependencies between them much harder to follow.
> > >> > I'd either add a comment to alloc_slab_obj_exts() explaining
> > >> > what may happen or add a 'boot or not' check only there.
> > >> > imo this is a niche, rare and special case.
> > >>
> > >> Ok, comment it is then.
> > >> Will you be sending a new version, or will Vlastimil be including that
> > >> in his fixup?
> > >
> > > Whichever way. I can, but so far Vlastimil's phrasing of comments
> > > has been much better than mine :) So I think he can fold what he prefers.
> >
> > I'm adding this. Hopefully we'll be able to make sheaves the only percpu
> > caching layer in SLUB in the (near) future, and then the requirement for
> > cmpxchg16b for allocations will be gone.
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 9f1054f0b9ca..f9f7f3942074 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2089,6 +2089,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> >         gfp &= ~OBJCGS_CLEAR_MASK;
> >         /* Prevent recursive extension vector allocation */
> >         gfp |= __GFP_NO_OBJ_EXT;
> > +
> > +       /*
> > +        * Note that allow_spin may be false during early boot and its
> > +        * restricted GFP_BOOT_MASK. Due to kmalloc_nolock() only supporting
> > +        * architectures with cmpxchg16b, early obj_exts will be missing for
> > +        * very early allocations on those.
> > +        */
>
> lgtm. Maybe add a sentence about the future sheaves plan, so it's clear
> that there is a path forward and the above won't stay forever.

LGTM as well. Thanks!



Thread overview: 12+ messages
2025-09-17  5:01 [linux-next:master] [slab] db93cdd664: BUG:kernel_NULL_pointer_dereference,address kernel test robot
2025-09-17  8:03 ` Vlastimil Babka
2025-09-17  9:18   ` Vlastimil Babka
2025-09-17 18:38     ` Alexei Starovoitov
2025-09-18  7:06       ` Vlastimil Babka
2025-09-18 14:49         ` Suren Baghdasaryan
2025-09-19  1:39           ` Alexei Starovoitov
2025-09-19 15:01             ` Suren Baghdasaryan
2025-09-19 18:31               ` Alexei Starovoitov
2025-09-26 12:25                 ` Vlastimil Babka
2025-09-26 15:30                   ` Alexei Starovoitov
2025-09-26 15:38                     ` Suren Baghdasaryan
