* [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
@ 2023-09-24 13:50 kernel test robot
[not found] ` <CGME20230924135059epcas1p4c0595d07a7d50da7a877a0af696d9c78@epcms1p8>
0 siblings, 1 reply; 7+ messages in thread
From: kernel test robot @ 2023-09-24 13:50 UTC (permalink / raw)
To: Jaeseon Sim
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Liam R. Howlett, Matthew Wilcox, Peng Zhang, Suren Baghdasaryan,
maple-tree, oliver.sang
Hello,
kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:
commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
in testcase: boot
compiler: gcc-9
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
[ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
[ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[ 113.584246][ T1] preempt_count: 1, expected: 0
[ 113.584613][ T1] RCU nest depth: 0, expected: 0
[ 113.584983][ T1] 1 lock held by swapper/0/1:
[ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
[ 113.586160][ T1] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G TN 6.6.0-rc2-00018-g2041864a22d4 #1
[ 113.586924][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 113.587701][ T1] Call Trace:
[ 113.587949][ T1] <TASK>
[ 113.588172][ T1] dump_stack_lvl (lib/dump_stack.c:107)
[ 113.588540][ T1] dump_stack (lib/dump_stack.c:114)
[ 113.588668][ T1] __might_resched (kernel/sched/core.c:10188)
[ 113.588668][ T1] __might_sleep (kernel/sched/core.c:10117 (discriminator 17))
[ 113.588668][ T1] kmem_cache_alloc (include/linux/kernel.h:112 include/linux/sched/mm.h:306 mm/slab.h:709 mm/slub.c:3460 mm/slub.c:3486 mm/slub.c:3493 mm/slub.c:3502)
[ 113.588668][ T1] ? mas_alloc_nodes (lib/maple_tree.c:160 lib/maple_tree.c:1249)
[ 113.588668][ T1] mas_alloc_nodes (lib/maple_tree.c:160 lib/maple_tree.c:1249)
[ 113.588668][ T1] mas_node_count_gfp (lib/maple_tree.c:1331)
[ 113.588668][ T1] mas_expected_entries (lib/maple_tree.c:5580)
[ 113.588668][ T1] check_forking+0x205/0x5c0
[ 113.588668][ T1] ? check_mas_store_gfp+0x580/0x580
[ 113.588668][ T1] ? mt_destroy_walk (lib/maple_tree.c:5273)
[ 113.588668][ T1] ? mtree_destroy (lib/maple_tree.c:6392)
[ 113.588668][ T1] ? lock_downgrade (kernel/locking/lockdep.c:5761)
[ 113.588668][ T1] ? __raw_spin_lock_init (kernel/locking/spinlock_debug.c:26)
[ 113.588668][ T1] maple_tree_seed (lib/test_maple_tree.c:3584)
[ 113.588668][ T1] ? check_empty_area_window+0x3000/0x3000
[ 113.588668][ T1] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:63 (discriminator 22))
[ 113.588668][ T1] ? write_comp_data (kernel/kcov.c:236)
[ 113.588668][ T1] ? check_empty_area_window+0x3000/0x3000
[ 113.588668][ T1] do_one_initcall (init/main.c:1232)
[ 113.588668][ T1] ? trace_event_raw_event_initcall_level (init/main.c:1223)
[ 113.588668][ T1] ? parameq (kernel/params.c:171)
[ 113.588668][ T1] ? __kasan_kmalloc (mm/kasan/common.c:384)
[ 113.588668][ T1] kernel_init_freeable (init/main.c:1293 init/main.c:1310 init/main.c:1329 init/main.c:1547)
[ 113.588668][ T1] ? rest_init (init/main.c:1429)
[ 113.588668][ T1] kernel_init (init/main.c:1439)
[ 113.588668][ T1] ? rest_init (init/main.c:1429)
[ 113.588668][ T1] ret_from_fork (arch/x86/kernel/process.c:153)
[ 113.588668][ T1] ? rest_init (init/main.c:1429)
[ 113.588668][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
[ 113.588668][ T1] </TASK>
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230924/202309242123.7ebe65b5-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
[not found] ` <CGME20230924135059epcas1p4c0595d07a7d50da7a877a0af696d9c78@epcms1p8>
@ 2023-09-25 12:39 ` Jaeseon Sim
2023-09-25 12:47 ` Peng Zhang
2023-09-26 12:15 ` Jaeseon Sim
1 sibling, 1 reply; 7+ messages in thread
From: Jaeseon Sim @ 2023-09-25 12:39 UTC (permalink / raw)
To: Liam R. Howlett
Cc: kernel test robot, oe-lkp, lkp, Linux Memory Management List,
Andrew Morton, Matthew Wilcox, Peng Zhang, Suren Baghdasaryan,
maple-tree
> Hello,
>
> kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:
>
> commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
>
> in testcase: boot
>
> compiler: gcc-9
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
>
>
> [ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
> [ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> [ 113.584246][ T1] preempt_count: 1, expected: 0
> [ 113.584613][ T1] RCU nest depth: 0, expected: 0
> [ 113.584983][ T1] 1 lock held by swapper/0/1:
> [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
Dear Liam,
mas_expected_entries() in check_forking() tried to sleep while holding a spinlock, and the BUG above was reported.
I think the mas_expected_entries() call in lib/test_maple_tree.c needs to be modified to align with commit 2041864a22d4f.
Do you have any ideas for this, or could you give some guidance?
Thanks
Jaeseon
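To make the failure mode concrete, here is a minimal userspace sketch of it. Everything in it is a hypothetical model (the model_* names are not kernel definitions), and it assumes the preallocation helper ends up in an allocation that is allowed to sleep, as the trace through kmem_cache_alloc() and __might_sleep() suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are real kernel definitions. */
enum model_gfp { MODEL_GFP_NOWAIT, MODEL_GFP_KERNEL };

static int model_preempt_count; /* > 0 means "atomic context" */
static int model_splats;        /* number of invalid-context reports */

/* A spinlock-like lock: holding it makes the context atomic. */
static void model_spin_lock(void)   { model_preempt_count++; }
static void model_spin_unlock(void) { model_preempt_count--; }

/* Mirrors the idea behind might_sleep(): complain if we could block while atomic. */
static void model_might_sleep(void)
{
    if (model_preempt_count > 0) {
        model_splats++;
        fprintf(stderr, "BUG: sleeping function called from invalid context "
                "(preempt_count: %d, expected: 0)\n", model_preempt_count);
    }
}

/* An allocation that may block only when the flag allows it. */
static bool model_alloc(enum model_gfp gfp)
{
    if (gfp == MODEL_GFP_KERNEL)
        model_might_sleep();
    return true;
}

/* Stand-in for a preallocation helper that hard-codes a sleeping allocation. */
static bool model_expected_entries(void)
{
    return model_alloc(MODEL_GFP_KERNEL);
}

/* Stand-in for the test: takes the lock first, then calls the helper. */
static int model_check_forking(void)
{
    model_splats = 0;
    model_spin_lock();
    model_expected_entries();
    model_spin_unlock();
    return model_splats;
}
```

Calling model_check_forking() in this model returns 1, i.e. one invalid-context report, because the sleeping allocation is reached while the lock count is still raised.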
> [ 113.586160][ T1] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G TN 6.6.0-rc2-00018-g2041864a22d4 #1
> [ 113.586924][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 113.587701][ T1] Call Trace:
> [ 113.587949][ T1] <TASK>
> [ 113.588172][ T1] dump_stack_lvl (lib/dump_stack.c:107)
> [ 113.588540][ T1] dump_stack (lib/dump_stack.c:114)
> [ 113.588668][ T1] __might_resched (kernel/sched/core.c:10188)
> [ 113.588668][ T1] __might_sleep (kernel/sched/core.c:10117 (discriminator 17))
> [ 113.588668][ T1] kmem_cache_alloc (include/linux/kernel.h:112 include/linux/sched/mm.h:306 mm/slab.h:709 mm/slub.c:3460 mm/slub.c:3486 mm/slub.c:3493 mm/slub.c:3502)
> [ 113.588668][ T1] ? mas_alloc_nodes (lib/maple_tree.c:160 lib/maple_tree.c:1249)
> [ 113.588668][ T1] mas_alloc_nodes (lib/maple_tree.c:160 lib/maple_tree.c:1249)
> [ 113.588668][ T1] mas_node_count_gfp (lib/maple_tree.c:1331)
> [ 113.588668][ T1] mas_expected_entries (lib/maple_tree.c:5580)
> [ 113.588668][ T1] check_forking+0x205/0x5c0
> [ 113.588668][ T1] ? check_mas_store_gfp+0x580/0x580
> [ 113.588668][ T1] ? mt_destroy_walk (lib/maple_tree.c:5273)
> [ 113.588668][ T1] ? mtree_destroy (lib/maple_tree.c:6392)
> [ 113.588668][ T1] ? lock_downgrade (kernel/locking/lockdep.c:5761)
> [ 113.588668][ T1] ? __raw_spin_lock_init (kernel/locking/spinlock_debug.c:26)
> [ 113.588668][ T1] maple_tree_seed (lib/test_maple_tree.c:3584)
> [ 113.588668][ T1] ? check_empty_area_window+0x3000/0x3000
> [ 113.588668][ T1] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:63 (discriminator 22))
> [ 113.588668][ T1] ? write_comp_data (kernel/kcov.c:236)
> [ 113.588668][ T1] ? check_empty_area_window+0x3000/0x3000
> [ 113.588668][ T1] do_one_initcall (init/main.c:1232)
> [ 113.588668][ T1] ? trace_event_raw_event_initcall_level (init/main.c:1223)
> [ 113.588668][ T1] ? parameq (kernel/params.c:171)
> [ 113.588668][ T1] ? __kasan_kmalloc (mm/kasan/common.c:384)
> [ 113.588668][ T1] kernel_init_freeable (init/main.c:1293 init/main.c:1310 init/main.c:1329 init/main.c:1547)
> [ 113.588668][ T1] ? rest_init (init/main.c:1429)
> [ 113.588668][ T1] kernel_init (init/main.c:1439)
> [ 113.588668][ T1] ? rest_init (init/main.c:1429)
> [ 113.588668][ T1] ret_from_fork (arch/x86/kernel/process.c:153)
> [ 113.588668][ T1] ? rest_init (init/main.c:1429)
> [ 113.588668][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
> [ 113.588668][ T1] </TASK>
>
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20230924/202309242123.7ebe65b5-oliver.sang@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
2023-09-25 12:39 ` Jaeseon Sim
@ 2023-09-25 12:47 ` Peng Zhang
2023-09-25 15:23 ` Liam R. Howlett
0 siblings, 1 reply; 7+ messages in thread
From: Peng Zhang @ 2023-09-25 12:47 UTC (permalink / raw)
To: jason.sim
Cc: Liam R. Howlett, kernel test robot, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Peng Zhang, Suren Baghdasaryan, maple-tree
On 2023/9/25 20:39, Jaeseon Sim wrote:
>> Hello,
>>
>> kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:
>>
>> commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
>>
>> in testcase: boot
>>
>> compiler: gcc-9
>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
>>
>>
>> [ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
>> [ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>> [ 113.584246][ T1] preempt_count: 1, expected: 0
>> [ 113.584613][ T1] RCU nest depth: 0, expected: 0
>> [ 113.584983][ T1] 1 lock held by swapper/0/1:
>> [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
> Dear Liam,
>
> mas_expected_entries() in check_forking() tried to sleep while holding spinlock, and panic occurred.
> I think mas_expected_entries() in lib/test_maple_tree.c need to be modified to align with commit 2041864a22d4f.
> Do you have any idea for it? or Could you give some guide?
This is just a test module. The series I'm working on [1] modifies this
code and will fix this bug.
Thanks.
[1]
https://lore.kernel.org/lkml/20230925035617.84767-1-zhangpeng.00@bytedance.com/
>
> Thanks
> Jaeseon
>
>> [ 113.586160][ T1] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G TN 6.6.0-rc2-00018-g2041864a22d4 #1
>> [ 113.586924][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> [ 113.587701][ T1] Call Trace:
>> [ 113.587949][ T1] <TASK>
>> [ 113.588172][ T1] dump_stack_lvl (lib/dump_stack.c:107)
>> [ 113.588540][ T1] dump_stack (lib/dump_stack.c:114)
>> [ 113.588668][ T1] __might_resched (kernel/sched/core.c:10188)
>> [ 113.588668][ T1] __might_sleep (kernel/sched/core.c:10117 (discriminator 17))
>> [ 113.588668][ T1] kmem_cache_alloc (include/linux/kernel.h:112 include/linux/sched/mm.h:306 mm/slab.h:709 mm/slub.c:3460 mm/slub.c:3486 mm/slub.c:3493 mm/slub.c:3502)
>> [ 113.588668][ T1] ? mas_alloc_nodes (lib/maple_tree.c:160 lib/maple_tree.c:1249)
>> [ 113.588668][ T1] mas_alloc_nodes (lib/maple_tree.c:160 lib/maple_tree.c:1249)
>> [ 113.588668][ T1] mas_node_count_gfp (lib/maple_tree.c:1331)
>> [ 113.588668][ T1] mas_expected_entries (lib/maple_tree.c:5580)
>> [ 113.588668][ T1] check_forking+0x205/0x5c0
>> [ 113.588668][ T1] ? check_mas_store_gfp+0x580/0x580
>> [ 113.588668][ T1] ? mt_destroy_walk (lib/maple_tree.c:5273)
>> [ 113.588668][ T1] ? mtree_destroy (lib/maple_tree.c:6392)
>> [ 113.588668][ T1] ? lock_downgrade (kernel/locking/lockdep.c:5761)
>> [ 113.588668][ T1] ? __raw_spin_lock_init (kernel/locking/spinlock_debug.c:26)
>> [ 113.588668][ T1] maple_tree_seed (lib/test_maple_tree.c:3584)
>> [ 113.588668][ T1] ? check_empty_area_window+0x3000/0x3000
>> [ 113.588668][ T1] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:63 (discriminator 22))
>> [ 113.588668][ T1] ? write_comp_data (kernel/kcov.c:236)
>> [ 113.588668][ T1] ? check_empty_area_window+0x3000/0x3000
>> [ 113.588668][ T1] do_one_initcall (init/main.c:1232)
>> [ 113.588668][ T1] ? trace_event_raw_event_initcall_level (init/main.c:1223)
>> [ 113.588668][ T1] ? parameq (kernel/params.c:171)
>> [ 113.588668][ T1] ? __kasan_kmalloc (mm/kasan/common.c:384)
>> [ 113.588668][ T1] kernel_init_freeable (init/main.c:1293 init/main.c:1310 init/main.c:1329 init/main.c:1547)
>> [ 113.588668][ T1] ? rest_init (init/main.c:1429)
>> [ 113.588668][ T1] kernel_init (init/main.c:1439)
>> [ 113.588668][ T1] ? rest_init (init/main.c:1429)
>> [ 113.588668][ T1] ret_from_fork (arch/x86/kernel/process.c:153)
>> [ 113.588668][ T1] ? rest_init (init/main.c:1429)
>> [ 113.588668][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
>> [ 113.588668][ T1] </TASK>
>>
>>
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/archive/20230924/202309242123.7ebe65b5-oliver.sang@intel.com
>>
>>
>>
>> --
>> 0-DAY CI Kernel Test Service
>> https://github.com/intel/lkp-tests/wiki
>
* Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
2023-09-25 12:47 ` Peng Zhang
@ 2023-09-25 15:23 ` Liam R. Howlett
2023-09-26 2:55 ` Peng Zhang
0 siblings, 1 reply; 7+ messages in thread
From: Liam R. Howlett @ 2023-09-25 15:23 UTC (permalink / raw)
To: Peng Zhang
Cc: jason.sim, kernel test robot, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Suren Baghdasaryan, maple-tree
* Peng Zhang <zhangpeng.00@bytedance.com> [230925 08:47]:
>
>
> On 2023/9/25 20:39, Jaeseon Sim wrote:
> > > Hello,
> > >
> > > kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:
> > >
> > > commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
> > >
> > > in testcase: boot
> > >
> > > compiler: gcc-9
> > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > >
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
> > >
> > >
> > > [ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
> > > [ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> > > [ 113.584246][ T1] preempt_count: 1, expected: 0
> > > [ 113.584613][ T1] RCU nest depth: 0, expected: 0
> > > [ 113.584983][ T1] 1 lock held by swapper/0/1:
> > > [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
> > Dear Liam,
> >
> > mas_expected_entries() in check_forking() tried to sleep while holding spinlock, and panic occurred.
> > I think mas_expected_entries() in lib/test_maple_tree.c need to be modified to align with commit 2041864a22d4f.
> > Do you have any idea for it? or Could you give some guide?
There are two ways we could fix this: one is to pass through the GFP
flag and use different flags in the test module, the other is to move
the testing out of the module and into the userspace tests.
Adding the GFP flag to the interface might be needed in the future, but
there's no need for that now. I was concerned about making too large a
change to the existing code, and this would increase the runtime code
changes, although not by a lot.
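As a rough illustration of the first option (passing the GFP flag through), here is a userspace sketch. The sketch_* names are hypothetical and are not the maple tree interface; the point is only that the caller, which knows whether it holds a spinlock, chooses the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not kernel APIs. */
enum sketch_gfp { SKETCH_GFP_NOWAIT, SKETCH_GFP_KERNEL };

struct sketch_ctx {
    int preempt_count; /* > 0 while a spinlock-like lock is held */
    int violations;    /* sleeping allocations attempted while atomic */
};

/* Allocation that records a violation if it could sleep in atomic context. */
static bool sketch_alloc(struct sketch_ctx *ctx, enum sketch_gfp gfp)
{
    if (gfp == SKETCH_GFP_KERNEL && ctx->preempt_count > 0)
        ctx->violations++;
    return true;
}

/* The helper takes the flag from its caller instead of hard-coding it. */
static bool sketch_expected_entries_gfp(struct sketch_ctx *ctx,
                                        enum sketch_gfp gfp)
{
    return sketch_alloc(ctx, gfp);
}

/* A caller that holds a spinlock-like lock around the helper. */
static int sketch_locked_caller(enum sketch_gfp gfp)
{
    struct sketch_ctx ctx = { 0, 0 };

    ctx.preempt_count++;                    /* lock */
    sketch_expected_entries_gfp(&ctx, gfp);
    ctx.preempt_count--;                    /* unlock */
    return ctx.violations;
}
```

In this model sketch_locked_caller(SKETCH_GFP_NOWAIT) reports no violation, while sketch_locked_caller(SKETCH_GFP_KERNEL) reports one, which is why a test that holds the lock would need to pass a non-sleeping flag.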
I think the best thing would be to move the forking test out of the
module into the userspace testing (tools/testing/radix-tree/maple.c)
> This is just a test module. The work[1] I'm doing modifies this place
> and it will fix this bug.
Thanks Peng. This is a temporary fix for upstream, but is needed for
the LTS kernels as well. I've mentioned your patches to others, so
don't think they aren't noticed - they are eagerly awaited.
Since your patch adds the necessary GFP flag, if the check_forking test
is moved out now, we could move it back in your update (patch 7/9 [1]),
which avoids the GFP_KERNEL flag (thanks!). I think that is worthwhile,
since you already have a lot of userspace tests that use GFP_KERNEL
(4/9 [2]) and it's good to keep as much in the kernel module as
possible.
By the way Peng, I have gotten complaints (I cannot find a reference
quickly) about the test module taking a long time on older CPUs. You are
making things faster, but I just wanted you to be aware of that in case
you add tests in the future that cause complaints :) I still think it
is worth keeping as much as possible in that module: it's a more valid
test scenario, and it still runs from the userspace testing.
...
> Thanks.
>
> [1] https://lore.kernel.org/lkml/20230925035617.84767-1-zhangpeng.00@bytedance.com/
...
Thanks,
Liam
[1] https://lore.kernel.org/lkml/20230925035617.84767-8-zhangpeng.00@bytedance.com/
[2] https://lore.kernel.org/lkml/20230925035617.84767-5-zhangpeng.00@bytedance.com/
* Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
2023-09-25 15:23 ` Liam R. Howlett
@ 2023-09-26 2:55 ` Peng Zhang
0 siblings, 0 replies; 7+ messages in thread
From: Peng Zhang @ 2023-09-26 2:55 UTC (permalink / raw)
To: Liam R. Howlett, Peng Zhang, jason.sim, kernel test robot,
oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Matthew Wilcox, Suren Baghdasaryan, maple-tree
On 2023/9/25 23:23, Liam R. Howlett wrote:
> * Peng Zhang <zhangpeng.00@bytedance.com> [230925 08:47]:
>>
>>
>> On 2023/9/25 20:39, Jaeseon Sim wrote:
>>>> Hello,
>>>>
>>>> kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:
>>>>
>>>> commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>
>>>> [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
>>>>
>>>> in testcase: boot
>>>>
>>>> compiler: gcc-9
>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>>>
>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>>
>>>>
>>>>
>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>> the same patch/commit), kindly add following tags
>>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>>> | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
>>>>
>>>>
>>>> [ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
>>>> [ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>>>> [ 113.584246][ T1] preempt_count: 1, expected: 0
>>>> [ 113.584613][ T1] RCU nest depth: 0, expected: 0
>>>> [ 113.584983][ T1] 1 lock held by swapper/0/1:
>>>> [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
>>> Dear Liam,
>>>
>>> mas_expected_entries() in check_forking() tried to sleep while holding spinlock, and panic occurred.
>>> I think mas_expected_entries() in lib/test_maple_tree.c need to be modified to align with commit 2041864a22d4f.
>>> Do you have any idea for it? or Could you give some guide?
>
> There are two ways we could fix this: one is to pass through the GFP
> flag and use different flags in the test module, the other is to move
> the testing out of the module and into the userspace tests.
Actually, there is a third way to solve this problem: use an external
lock that is allowed to sleep, such as an rw_semaphore.
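A small userspace sketch of why a sleepable external lock helps is below. The toy_* names are hypothetical stand-ins, not the maple tree or rw_semaphore APIs; the only assumption modeled is that a spinlock-like lock makes the context atomic, while a sleepable lock does not.

```c
#include <assert.h>

/* Hypothetical userspace model; names are illustrative, not kernel APIs. */
enum toy_lock_kind { TOY_SPINLOCK, TOY_SLEEPABLE_LOCK };

struct toy_tree {
    enum toy_lock_kind kind; /* which lock protects the tree */
    int preempt_count;       /* > 0 means "atomic context" */
    int violations;          /* sleeping allocations made while atomic */
};

/* Only the spinlock-like lock makes the context atomic. */
static void toy_lock(struct toy_tree *t)
{
    if (t->kind == TOY_SPINLOCK)
        t->preempt_count++;
}

static void toy_unlock(struct toy_tree *t)
{
    if (t->kind == TOY_SPINLOCK)
        t->preempt_count--;
}

/* An allocation that may sleep: only valid when not atomic. */
static void toy_sleeping_alloc(struct toy_tree *t)
{
    if (t->preempt_count > 0)
        t->violations++;
}

/* Preallocate under whichever lock protects the tree. */
static int toy_prealloc_under_lock(enum toy_lock_kind kind)
{
    struct toy_tree t = { kind, 0, 0 };

    toy_lock(&t);
    toy_sleeping_alloc(&t);
    toy_unlock(&t);
    return t.violations;
}
```

Here toy_prealloc_under_lock(TOY_SPINLOCK) records one violation and toy_prealloc_under_lock(TOY_SLEEPABLE_LOCK) records none, so the sleeping allocation can stay as it is when the lock itself is sleepable.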
>
> Adding the GFP flag to the interface might be needed in the future but
> there's no need for that now. I was concerned about too large of a
> change to the existing code, and this would increase the runtime code
> changes - although not a lot.
>
> I think the best thing would be to move the forking test out of the
> module into the userspace testing (tools/testing/radix-tree/maple.c)
>
>> This is just a test module. The work[1] I'm doing modifies this place
>> and it will fix this bug.
>
> Thanks Peng. This is a temporary fix for upstream, but is needed for
> the LTS kernels as well. I've mentioned your patches to others, so
> don't think they aren't noticed - they are eagerly awaited.
>
> Since your patch adds the necessary GFP flag, we could move the
> check_forking test back in your update, (patch 7/9 [1]) which avoids the
> GFP_KERNEL flag (thanks!), if it is moved. I think it's worth while to
> do since you already have a lot of userspace tests as well that uses
> GFP_KERNEL (4/9 [2]) and it's good to keep as much in the kernel module
> as possible.
>
> By the way Peng, I have gotten complaints (I cannot find a reference
> quickly) from older CPUs taking a long time on the test module. You are
> making things faster, but I just wanted you to be aware of that in case
> you add tests in the future that cause complaints :) I still think it
> is worth keeping as much as possible in that module - it's a more valid
> test scenario and it still runs from the userspace testing.
I understand this now, and I will take this into consideration when
adding tests in the future.
>
> ...
>
>> Thanks.
>>
>> [1] https://lore.kernel.org/lkml/20230925035617.84767-1-zhangpeng.00@bytedance.com/
>
> ...
>
> Thanks,
> Liam
>
> [1] https://lore.kernel.org/lkml/20230925035617.84767-8-zhangpeng.00@bytedance.com/
> [2] https://lore.kernel.org/lkml/20230925035617.84767-5-zhangpeng.00@bytedance.com/
>
>
* Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
[not found] ` <CGME20230924135059epcas1p4c0595d07a7d50da7a877a0af696d9c78@epcms1p8>
2023-09-25 12:39 ` Jaeseon Sim
@ 2023-09-26 12:15 ` Jaeseon Sim
2023-09-27 15:29 ` Liam R. Howlett
1 sibling, 1 reply; 7+ messages in thread
From: Jaeseon Sim @ 2023-09-26 12:15 UTC (permalink / raw)
To: Peng Zhang, Liam R. Howlett, kernel test robot, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Suren Baghdasaryan, maple-tree
>>On 2023/9/25 23:23, Liam R. Howlett wrote:
>>> * Peng Zhang <zhangpeng.00@bytedance.com> [230925 08:47]:
>>>>
>>>>
>>>> On 2023/9/25 20:39, Jaeseon Sim wrote:
>>>>>> Hello,
>>>>>>
>>>>>> kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h" on:
>>>>>>
>>>>>> commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>>
>>>>>> [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
>>>>>>
>>>>>> in testcase: boot
>>>>>>
>>>>>> compiler: gcc-9
>>>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>>>>>
>>>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>>>>
>>>>>>
>>>>>>
>>>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>>>> the same patch/commit), kindly add following tags
>>>>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>>>>> | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
>>>>>>
>>>>>>
>>>>>> [ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
>>>>>> [ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>>>>>> [ 113.584246][ T1] preempt_count: 1, expected: 0
>>>>>> [ 113.584613][ T1] RCU nest depth: 0, expected: 0
>>>>>> [ 113.584983][ T1] 1 lock held by swapper/0/1:
>>>>>> [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
>>>>> Dear Liam,
>>>>>
>>>>> mas_expected_entries() in check_forking() tried to sleep while holding spinlock, and panic occurred.
>>>>> I think mas_expected_entries() in lib/test_maple_tree.c need to be modified to align with commit 2041864a22d4f.
>>>>> Do you have any idea for it? or Could you give some guide?
>>>
>>> There are two ways we could fix this: one is to pass through the GFP
>>> flag and use different flags in the test module, the other is to move
>>> the testing out of the module and into the userspace tests.
>>Actually, there is a third method that can be used to solve this
>>problem, which is to use an externally sleepable lock, such as
>>rw_semaphore.
>>>
>>> Adding the GFP flag to the interface might be needed in the future but
>>> there's no need for that now. I was concerned about too large of a
>>> change to the existing code, and this would increase the runtime code
>>> changes - although not a lot.
>>>
>>> I think the best thing would be to move the forking test out of the
>>> module into the userspace testing (tools/testing/radix-tree/maple.c)
>>>
>>>> This is just a test module. The work[1] I'm doing modifies this place
>>>> and it will fix this bug.
>>>
>>> Thanks Peng. This is a temporary fix for upstream, but is needed for
>>> the LTS kernels as well. I've mentioned your patches to others, so
>>> don't think they aren't noticed - they are eagerly awaited.
>>>
>>> Since your patch adds the necessary GFP flag, we could move the
>>> check_forking test back in your update, (patch 7/9 [1]) which avoids the
>>> GFP_KERNEL flag (thanks!), if it is moved. I think it's worth while to
>>> do since you already have a lot of userspace tests as well that uses
>>> GFP_KERNEL (4/9 [2]) and it's good to keep as much in the kernel module
>>> as possible.
>>>
>>> By the way Peng, I have gotten complaints (I cannot find a reference
>>> quickly) from older CPUs taking a long time on the test module. You are
>>> making things faster, but I just wanted you to be aware of that in case
>>> you add tests in the future that cause complaints :) I still think it
>>> is worth keeping as much as possible in that module - it's a more valid
>>> test scenario and it still runs from the userspace testing.
>>I understand this now, and I will take this into consideration when
>>adding tests in the future.
>>>
>>> ...
>>>
>>>> Thanks.
>>>>
>>>> [1] https://lore.kernel.org/lkml/20230925035617.84767-1-zhangpeng.00@bytedance.com/
>>>
>>> ...
>>>
>>> Thanks,
>>> Liam
>>>
>>> [1] https://lore.kernel.org/lkml/20230925035617.84767-8-zhangpeng.00@bytedance.com/
>>> [2] https://lore.kernel.org/lkml/20230925035617.84767-5-zhangpeng.00@bytedance.com/
>>>
>>>
I think it would be better to wait for Peng's revision.
Thanks to all
Jaeseon
* Re: [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
2023-09-26 12:15 ` Jaeseon Sim
@ 2023-09-27 15:29 ` Liam R. Howlett
0 siblings, 0 replies; 7+ messages in thread
From: Liam R. Howlett @ 2023-09-27 15:29 UTC (permalink / raw)
To: Jaeseon Sim
Cc: Peng Zhang, kernel test robot, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Suren Baghdasaryan, maple-tree
* Jaeseon Sim <jason.sim@samsung.com> [230926 08:15]:
...
> >>>>>> commit: 2041864a22d4f4e900d0a3def4985432a21d8e6d ("maple_tree: use mas_node_count_gfp() in mas_expected_entries()")
> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>>>>
> >>>>>> [test failed on linux-next/master 940fcc189c51032dd0282cbee4497542c982ac59]
> >>>>>>
> >>>>>> in testcase: boot
> >>>>>>
> >>>>>> compiler: gcc-9
> >>>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >>>>>>
> >>>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >>>>>> the same patch/commit), kindly add following tags
> >>>>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
> >>>>>> | Closes: https://lore.kernel.org/oe-lkp/202309242123.7ebe65b5-oliver.sang@intel.com
> >>>>>>
> >>>>>>
> >>>>>> [ 113.582828][ T1] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
> >>>>>> [ 113.583602][ T1] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> >>>>>> [ 113.584246][ T1] preempt_count: 1, expected: 0
> >>>>>> [ 113.584613][ T1] RCU nest depth: 0, expected: 0
> >>>>>> [ 113.584983][ T1] 1 lock held by swapper/0/1:
> >>>>>> [ 113.585344][ T1] #0: ffffc9000001fc10 (&mt->ma_lock){+.+.}-{2:2}, at: check_forking+0x1e0/0x5c0
> >>>>> Dear Liam,
> >>>>>
> >>>>> mas_expected_entries() in check_forking() tried to sleep while holding spinlock, and panic occurred.
> >>>>> I think mas_expected_entries() in lib/test_maple_tree.c need to be modified to align with commit 2041864a22d4f.
> >>>>> Do you have any idea for it? or Could you give some guide?
> >>>
> >>> There are two ways we could fix this: one is to pass through the GFP
> >>> flag and use different flags in the test module, the other is to move
> >>> the testing out of the module and into the userspace tests.
> >>Actually, there is a third method that can be used to solve this
> >>problem, which is to use an externally sleepable lock, such as
> >>rw_semaphore.
Oh yes, that's true. There are probably more ways than these three; I
didn't mean to limit the choices to just two.
We should probably also add a test for this to the test suite somehow,
so the best way, at least in the long run, is probably to use the test
module and implement the change as Peng suggested. That way the bots
will report the potential locking issue if we ever change it again.
> >>>
> >>> Adding the GFP flag to the interface might be needed in the future but
> >>> there's no need for that now. I was concerned about too large of a
> >>> change to the existing code, and this would increase the runtime code
> >>> changes - although not a lot.
> >>>
> >>> I think the best thing would be to move the forking test out of the
> >>> module into the userspace testing (tools/testing/radix-tree/maple.c)
> >>>
> >>>> This is just a test module. The work[1] I'm doing modifies this place
> >>>> and it will fix this bug.
> >>>
> >>> Thanks Peng. This is a temporary fix for upstream, but is needed for
> >>> the LTS kernels as well. I've mentioned your patches to others, so
> >>> don't think they aren't noticed - they are eagerly awaited.
> >>>
> >>> Since your patch adds the necessary GFP flag, we could move the
> >>> check_forking test back in your update, (patch 7/9 [1]) which avoids the
> >>> GFP_KERNEL flag (thanks!), if it is moved. I think it's worthwhile to
> >>> do since you already have a lot of userspace tests as well that use
> >>> GFP_KERNEL (4/9 [2]) and it's good to keep as much in the kernel module
> >>> as possible.
> >>>
> >>> By the way Peng, I have gotten complaints (I cannot find a reference
> >>> quickly) from older CPUs taking a long time on the test module. You are
> >>> making things faster, but I just wanted you to be aware of that in case
> >>> you add tests in the future that cause complaints :) I still think it
> >>> is worth keeping as much as possible in that module - it's a more valid
> >>> test scenario and it still runs from the userspace testing.
> >>I understand this now, and I will take this into consideration when
> >>adding tests in the future.
> >>>
> >>> ...
> >>>
> >>>> Thanks.
> >>>>
> >>>> [1] https://lore.kernel.org/lkml/20230925035617.84767-1-zhangpeng.00@bytedance.com/
> >>>
> >>> ...
> >>>
> >>> Thanks,
> >>> Liam
> >>>
> >>> [1] https://lore.kernel.org/lkml/20230925035617.84767-8-zhangpeng.00@bytedance.com/
> >>> [2] https://lore.kernel.org/lkml/20230925035617.84767-5-zhangpeng.00@bytedance.com/
> >>>
> >>>
>
> I think it would be better to wait for Peng's revision..
>
Your patch is a bug fix. What Peng has written is a new feature. The
only reason that the new feature fixes the issue is that I raised it in
the patch review process because you reported the issue in the existing
code. So, thank you for reporting it!

We are not backporting a new feature to an older kernel, so the fix you
submitted is incomplete, but very much necessary. For all we know, there
will be other issues with the feature that will delay it getting to
Linus' branch - and possibly make it miss your internal release window
as well.

We are currently in an RC release of the kernel. This means we can
submit the bug fixes, but we won't be submitting new features. So your
fix will land and be backported all the way to 6.1 before Peng's feature
is added.

Even if the new feature was put into the mm branch today, it would take
a long time for it to be pushed upstream. Even once the new feature has
landed upstream, your fix is still necessary for 6.1 through 6.6.

Between Peng and me, we have provided three ways to work around the test
failure. I like Peng's suggestion the best, but since he will be
replacing the tests, we could make that change in his patchset. For
now, I would be happy with a passing test framework to minimize the
backporting effort.
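
To illustrate why the test tripped the check, and how two of the
workarounds mentioned in this thread avoid it, here is a minimal
userspace model. It is not kernel code, and every name in it (model_ctx,
model_spin_lock, model_down_write, model_alloc, the MODEL_GFP_* values)
is hypothetical. It assumes only the idea from the report above: an
allocation that may sleep gets flagged when preempt_count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_gfp {
	MODEL_GFP_NOWAIT,	/* stands in for a flag that never sleeps */
	MODEL_GFP_KERNEL	/* stands in for a flag that may sleep */
};

struct model_ctx {
	int preempt_count;	/* > 0 models "a spinlock is held" */
	int bug_reports;	/* times the sleep-in-atomic check fired */
};

/* Taking a spinlock makes the context atomic. */
static void model_spin_lock(struct model_ctx *c)
{
	c->preempt_count++;
}

static void model_spin_unlock(struct model_ctx *c)
{
	assert(c->preempt_count > 0);
	c->preempt_count--;
}

/* A sleepable lock (think rw_semaphore) leaves preempt_count alone. */
static void model_down_write(struct model_ctx *c)
{
	(void)c;
}

static void model_up_write(struct model_ctx *c)
{
	(void)c;
}

/*
 * Models an allocation. A request that may sleep is only valid when
 * the context is not atomic; otherwise the model records a report
 * and refuses the request.
 */
static bool model_alloc(struct model_ctx *c, enum model_gfp gfp)
{
	if (gfp == MODEL_GFP_KERNEL && c->preempt_count != 0) {
		c->bug_reports++;
		return false;
	}
	return true;
}
```

In this model, calling model_alloc() with MODEL_GFP_KERNEL between
model_spin_lock() and model_spin_unlock() records a report, which is the
situation check_forking() hit. Passing MODEL_GFP_NOWAIT under the
spinlock, or keeping MODEL_GFP_KERNEL but guarding the tree with
model_down_write(), records none.
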

Do you think you can update the patch to incorporate one of the
suggested fixes for the testing? If not, let me know and I will add it
to the list of tasks, because it *has* to happen.

Thank you,
Liam

Thread overview: 7+ messages
2023-09-24 13:50 [linux-next:master] [maple_tree] 2041864a22: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h kernel test robot
[not found] ` <CGME20230924135059epcas1p4c0595d07a7d50da7a877a0af696d9c78@epcms1p8>
2023-09-25 12:39 ` Jaeseon Sim
2023-09-25 12:47 ` Peng Zhang
2023-09-25 15:23 ` Liam R. Howlett
2023-09-26 2:55 ` Peng Zhang
2023-09-26 12:15 ` Jaeseon Sim
2023-09-27 15:29 ` Liam R. Howlett