[syzbot] [mm?] INFO: task hung in exit

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [syzbot] [mm?] INFO: task hung in exit_mmap
@ 2024-10-10 15:19 syzbot
  2024-10-10 15:28 ` Lorenzo Stoakes
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2024-10-10 15:19 UTC (permalink / raw)
  To: Liam.Howlett, akpm, linux-kernel, linux-mm, lorenzo.stoakes,
	syzkaller-bugs, vbabka

Hello,

syzbot found the following issue on:

HEAD commit:    d3d1556696c1 Merge tag 'mm-hotfixes-stable-2024-10-09-15-4..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10416fd0580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7a3fccdd0bb995
dashboard link: https://syzkaller.appspot.com/bug?extid=39bc767144c55c8db0ea
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0600b551e610/disk-d3d15566.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d59d43ed3976/vmlinux-d3d15566.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e686a3e7e0d6/bzImage-d3d15566.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+39bc767144c55c8db0ea@syzkaller.appspotmail.com

INFO: task syz.3.917:7739 blocked for more than 146 seconds.
      Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.3.917       state:D stack:23808 pid:7739  tgid:7739  ppid:5232   flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5322 [inline]
 __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
 __schedule_loop kernel/sched/core.c:6759 [inline]
 schedule+0x14b/0x320 kernel/sched/core.c:6774
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
 rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
 __down_write_common kernel/locking/rwsem.c:1304 [inline]
 __down_write kernel/locking/rwsem.c:1313 [inline]
 down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
 mmap_write_lock include/linux/mmap_lock.h:106 [inline]
 exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
 __mmput+0x115/0x380 kernel/fork.c:1347
 exit_mm+0x220/0x310 kernel/exit.c:571
 do_exit+0x9b2/0x28e0 kernel/exit.c:926
 do_group_exit+0x207/0x2c0 kernel/exit.c:1088
 __do_sys_exit_group kernel/exit.c:1099 [inline]
 __se_sys_exit_group kernel/exit.c:1097 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4688f7dff9
RSP: 002b:00007ffea64ebf18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4688f7dff9
RDX: 0000000000000064 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00007ffea64ebf6c R08: 00007ffea64ebfff R09: 0000000000028eb6
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000032
R13: 0000000000028eb6 R14: 0000000000028d0c R15: 00007ffea64ebfc0
 </TASK>
INFO: task syz.0.828:7756 blocked for more than 152 seconds.
      Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.0.828       state:D stack:22384 pid:7756  tgid:7755  ppid:7346   flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5322 [inline]
 __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
 __schedule_loop kernel/sched/core.c:6759 [inline]
 schedule+0x14b/0x320 kernel/sched/core.c:6774
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
 rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
 __down_write_common kernel/locking/rwsem.c:1304 [inline]
 __down_write kernel/locking/rwsem.c:1313 [inline]
 down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
 mmap_write_lock include/linux/mmap_lock.h:106 [inline]
 exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
 __mmput+0x115/0x380 kernel/fork.c:1347
 exit_mm+0x220/0x310 kernel/exit.c:571
 do_exit+0x9b2/0x28e0 kernel/exit.c:926
 do_group_exit+0x207/0x2c0 kernel/exit.c:1088
 get_signal+0x16a3/0x1740 kernel/signal.c:2917
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0xc9/0x370 kernel/entry/common.c:218
 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff5c377dff9
RSP: 002b:00007ff5c45800e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007ff5c3935f88 RCX: 00007ff5c377dff9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007ff5c3935f88
RBP: 00007ff5c3935f80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff5c3935f8c
R13: 0000000000000000 R14: 00007ffe0400b7d0 R15: 00007ffe0400b8b8
 </TASK>

Showing all locks held in the system:
1 lock held by pool_workqueue_/3:
 #0: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:329 [inline]
 #0: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x451/0x830 kernel/rcu/tree_exp.h:976
1 lock held by khungtaskd/30:
 #0: ffffffff8e937de0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
 #0: ffffffff8e937de0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
 #0: ffffffff8e937de0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6720
1 lock held by klogd/4662:
 #0: ffff888072fda798 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock include/linux/mmap_lock.h:106 [inline]
 #0: ffff888072fda798 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
1 lock held by dhcpcd/4887:
 #0: ffff888032585718 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock_killable include/linux/mmap_lock.h:122 [inline]
 #0: ffff888032585718 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x17c/0x3d0 mm/util.c:586
2 locks held by getty/4982:
 #0: ffff88814b9860a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
 #1: ffffc90002f062f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6a6/0x1e00 drivers/tty/n_tty.c:2211
3 locks held by kworker/1:5/5270:
3 locks held by kworker/0:5/5300:
 #0: ffff88801ac80948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
 #0: ffff88801ac80948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
 #1: ffffc90004037d00 (xfrm_state_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
 #1: ffffc90004037d00 (xfrm_state_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
 #2: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:329 [inline]
 #2: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x451/0x830 kernel/rcu/tree_exp.h:976
1 lock held by syz.3.917/7739:
 #0: ffff888020fc5718 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock include/linux/mmap_lock.h:106 [inline]
 #0: ffff888020fc5718 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
1 lock held by syz.0.828/7756:
 #0: ffff888062c33a98 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock include/linux/mmap_lock.h:106 [inline]
 #0: ffff888062c33a98 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
3 locks held by kworker/u8:21/7787:
 #0: ffff88801b367148 ((wq_completion)cfg80211){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
 #0: ffff88801b367148 ((wq_completion)cfg80211){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
 #1: ffffc900015b7d00 ((work_completion)(&(&rdev->dfs_update_channels_wk)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
 #1: ffffc900015b7d00 ((work_completion)(&(&rdev->dfs_update_channels_wk)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
 #2: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: cfg80211_dfs_channels_update_work+0xbf/0x610 net/wireless/mlme.c:1021
3 locks held by kworker/u8:30/7796:
 #0: ffff88814b6a6948 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
 #0: ffff88814b6a6948 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
 #1: ffffc90002e9fd00 ((work_completion)(&(&ifa->dad_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
 #1: ffffc90002e9fd00 ((work_completion)(&(&ifa->dad_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
 #2: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_dad_work+0xd0/0x16f0 net/ipv6/addrconf.c:4196
3 locks held by kworker/u8:32/7798:
 #0: ffff88801ac89148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
 #0: ffff88801ac89148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
 #1: ffffc90003ad7d00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
 #1: ffffc90003ad7d00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
 #2: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xe/0x60 net/core/link_watch.c:276
5 locks held by kworker/u8:50/9743:
 #0: ffff88801baeb148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
 #0: ffff88801baeb148 ((wq_completion)netns){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
 #1: ffffc90003f57d00 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
 #1: ffffc90003f57d00 (net_cleanup_work){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
 #2: ffffffff8fcb2dd0 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x16a/0xcc0 net/core/net_namespace.c:580
 #3: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: ieee80211_unregister_hw+0x55/0x2c0 net/mac80211/main.c:1662
 #4: ffff888059fc8768 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: wiphy_lock include/net/cfg80211.h:6014 [inline]
 #4: ffff888059fc8768 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: ieee80211_stop+0x3e9/0x4a0 net/mac80211/iface.c:777
1 lock held by syz.1.2163/11123:

=============================================

NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 30 Comm: khungtaskd Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
 nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:223 [inline]
 watchdog+0xff4/0x1040 kernel/hung_task.c:379
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 11132 Comm: syz-executor Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:preempt_count_sub+0x66/0x170 kernel/sched/core.c:5829
Code: c1 81 e1 ff ff ff 7f 39 d9 7c 27 81 fb fe 00 00 00 77 07 0f b6 c0 85 c0 74 5f 65 8b 05 cb 99 a0 7e f7 db 65 01 1d c2 99 a0 7e <5b> 41 5e c3 cc cc cc cc 90 e8 ec af 4c 03 85 c0 74 3a 48 c7 c0 30
RSP: 0018:ffffc9000335f8c8 EFLAGS: 00000093
RAX: 0000000080000002 RBX: 00000000ffffffff RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: ffffc9000335f970 R08: ffffffff9a5fe1f3 R09: 1ffffffff34bfc3e
R10: dffffc0000000000 R11: fffffbfff34bfc3f R12: dffffc0000000000
R13: 1ffff9200066bf1c R14: dffffc0000000000 R15: 0000000000000046
FS:  0000555588e89500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c310254 CR3: 00000000581ae000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
 _raw_spin_unlock_irqrestore+0xdd/0x140 kernel/locking/spinlock.c:194
 __debug_check_no_obj_freed lib/debugobjects.c:998 [inline]
 debug_check_no_obj_freed+0x561/0x580 lib/debugobjects.c:1019
 slab_free_hook mm/slub.c:2273 [inline]
 slab_free mm/slub.c:4579 [inline]
 kmem_cache_free+0x11f/0x420 mm/slub.c:4681
 __sigqueue_free kernel/signal.c:451 [inline]
 collect_signal kernel/signal.c:594 [inline]
 __dequeue_signal+0x4ac/0x5c0 kernel/signal.c:616
 dequeue_signal+0x1e0/0x680 kernel/signal.c:637
 get_signal+0x604/0x1740 kernel/signal.c:2797
 arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0xc9/0x370 kernel/entry/common.c:218
 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fae9a37c911
Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 3a fc 18 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
RSP: 002b:00007ffcb450afb0 EFLAGS: 00000202
RAX: 0000000000000003 RBX: 0000000000000002 RCX: 00007fae9a37c911
RDX: 0000000000000002 RSI: 00007fae9a3f033b RDI: 00000000ffffff9c
RBP: 00007fae9a3f033b R08: 00000000000000da R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000009 R15: 0000000000000000
 </TASK>
IPVS: wlc: UDP 224.0.0.2:0 - no destination available


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [mm?] INFO: task hung in exit_mmap
  2024-10-10 15:19 [syzbot] [mm?] INFO: task hung in exit_mmap syzbot
@ 2024-10-10 15:28 ` Lorenzo Stoakes
  2024-10-13 13:29   ` Sasha Levin
  0 siblings, 1 reply; 5+ messages in thread
From: Lorenzo Stoakes @ 2024-10-10 15:28 UTC (permalink / raw)
  To: syzbot; +Cc: Liam.Howlett, akpm, linux-kernel, linux-mm, syzkaller-bugs, vbabka

On Thu, Oct 10, 2024 at 08:19:28AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    d3d1556696c1 Merge tag 'mm-hotfixes-stable-2024-10-09-15-4..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10416fd0580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7a3fccdd0bb995
> dashboard link: https://syzkaller.appspot.com/bug?extid=39bc767144c55c8db0ea
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/0600b551e610/disk-d3d15566.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/d59d43ed3976/vmlinux-d3d15566.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/e686a3e7e0d6/bzImage-d3d15566.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+39bc767144c55c8db0ea@syzkaller.appspotmail.com
>
> INFO: task syz.3.917:7739 blocked for more than 146 seconds.
>       Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.3.917       state:D stack:23808 pid:7739  tgid:7739  ppid:5232   flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5322 [inline]
>  __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
>  __schedule_loop kernel/sched/core.c:6759 [inline]
>  schedule+0x14b/0x320 kernel/sched/core.c:6774
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
>  rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
>  __down_write_common kernel/locking/rwsem.c:1304 [inline]
>  __down_write kernel/locking/rwsem.c:1313 [inline]
>  down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
>  mmap_write_lock include/linux/mmap_lock.h:106 [inline]
>  exit_mmap+0x2bd/0xc40 mm/mmap.c:1872

Hmm, task freezing up or system becoming unstable/locked up is reminsecent
of the maple tree bug I fixed in [0], which is still in the unstable hotfix
branch.

This is likely not going to repro as it's quite heisenbug-ish to trigger
and the failures are like this - somewhat disconnected from the cause, so
not sure if there is any case to speed this to Linus's tree.

On the other hand it's a pretty serious problem for stability and likely to
continue to manifest in nasty ways like this.

Can't be 100% sure this is the cause, but seems likely.

[0]:https://lore.kernel.org/linux-mm/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com/

>  __mmput+0x115/0x380 kernel/fork.c:1347
>  exit_mm+0x220/0x310 kernel/exit.c:571
>  do_exit+0x9b2/0x28e0 kernel/exit.c:926
>  do_group_exit+0x207/0x2c0 kernel/exit.c:1088
>  __do_sys_exit_group kernel/exit.c:1099 [inline]
>  __se_sys_exit_group kernel/exit.c:1097 [inline]
>  __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
>  x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f4688f7dff9
> RSP: 002b:00007ffea64ebf18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4688f7dff9
> RDX: 0000000000000064 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 00007ffea64ebf6c R08: 00007ffea64ebfff R09: 0000000000028eb6
> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000032
> R13: 0000000000028eb6 R14: 0000000000028d0c R15: 00007ffea64ebfc0
>  </TASK>
> INFO: task syz.0.828:7756 blocked for more than 152 seconds.
>       Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.0.828       state:D stack:22384 pid:7756  tgid:7755  ppid:7346   flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5322 [inline]
>  __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
>  __schedule_loop kernel/sched/core.c:6759 [inline]
>  schedule+0x14b/0x320 kernel/sched/core.c:6774
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
>  rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
>  __down_write_common kernel/locking/rwsem.c:1304 [inline]
>  __down_write kernel/locking/rwsem.c:1313 [inline]
>  down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
>  mmap_write_lock include/linux/mmap_lock.h:106 [inline]
>  exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
>  __mmput+0x115/0x380 kernel/fork.c:1347
>  exit_mm+0x220/0x310 kernel/exit.c:571
>  do_exit+0x9b2/0x28e0 kernel/exit.c:926
>  do_group_exit+0x207/0x2c0 kernel/exit.c:1088
>  get_signal+0x16a3/0x1740 kernel/signal.c:2917
>  arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:337
>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>  exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>  syscall_exit_to_user_mode+0xc9/0x370 kernel/entry/common.c:218
>  do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7ff5c377dff9
> RSP: 002b:00007ff5c45800e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00007ff5c3935f88 RCX: 00007ff5c377dff9
> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007ff5c3935f88
> RBP: 00007ff5c3935f80 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff5c3935f8c
> R13: 0000000000000000 R14: 00007ffe0400b7d0 R15: 00007ffe0400b8b8
>  </TASK>
>
> Showing all locks held in the system:
> 1 lock held by pool_workqueue_/3:
>  #0: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:329 [inline]
>  #0: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x451/0x830 kernel/rcu/tree_exp.h:976
> 1 lock held by khungtaskd/30:
>  #0: ffffffff8e937de0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
>  #0: ffffffff8e937de0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
>  #0: ffffffff8e937de0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6720
> 1 lock held by klogd/4662:
>  #0: ffff888072fda798 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock include/linux/mmap_lock.h:106 [inline]
>  #0: ffff888072fda798 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
> 1 lock held by dhcpcd/4887:
>  #0: ffff888032585718 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock_killable include/linux/mmap_lock.h:122 [inline]
>  #0: ffff888032585718 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x17c/0x3d0 mm/util.c:586
> 2 locks held by getty/4982:
>  #0: ffff88814b9860a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
>  #1: ffffc90002f062f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6a6/0x1e00 drivers/tty/n_tty.c:2211
> 3 locks held by kworker/1:5/5270:
> 3 locks held by kworker/0:5/5300:
>  #0: ffff88801ac80948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
>  #0: ffff88801ac80948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
>  #1: ffffc90004037d00 (xfrm_state_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
>  #1: ffffc90004037d00 (xfrm_state_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
>  #2: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:329 [inline]
>  #2: ffffffff8e93d378 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x451/0x830 kernel/rcu/tree_exp.h:976
> 1 lock held by syz.3.917/7739:
>  #0: ffff888020fc5718 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock include/linux/mmap_lock.h:106 [inline]
>  #0: ffff888020fc5718 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
> 1 lock held by syz.0.828/7756:
>  #0: ffff888062c33a98 (&mm->mmap_lock){++++}-{3:3}, at: mmap_write_lock include/linux/mmap_lock.h:106 [inline]
>  #0: ffff888062c33a98 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
> 3 locks held by kworker/u8:21/7787:
>  #0: ffff88801b367148 ((wq_completion)cfg80211){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
>  #0: ffff88801b367148 ((wq_completion)cfg80211){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
>  #1: ffffc900015b7d00 ((work_completion)(&(&rdev->dfs_update_channels_wk)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
>  #1: ffffc900015b7d00 ((work_completion)(&(&rdev->dfs_update_channels_wk)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
>  #2: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: cfg80211_dfs_channels_update_work+0xbf/0x610 net/wireless/mlme.c:1021
> 3 locks held by kworker/u8:30/7796:
>  #0: ffff88814b6a6948 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
>  #0: ffff88814b6a6948 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
>  #1: ffffc90002e9fd00 ((work_completion)(&(&ifa->dad_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
>  #1: ffffc90002e9fd00 ((work_completion)(&(&ifa->dad_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
>  #2: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_dad_work+0xd0/0x16f0 net/ipv6/addrconf.c:4196
> 3 locks held by kworker/u8:32/7798:
>  #0: ffff88801ac89148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
>  #0: ffff88801ac89148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
>  #1: ffffc90003ad7d00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
>  #1: ffffc90003ad7d00 ((linkwatch_work).work){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
>  #2: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xe/0x60 net/core/link_watch.c:276
> 5 locks held by kworker/u8:50/9743:
>  #0: ffff88801baeb148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3204 [inline]
>  #0: ffff88801baeb148 ((wq_completion)netns){+.+.}-{0:0}, at: process_scheduled_works+0x93b/0x1850 kernel/workqueue.c:3310
>  #1: ffffc90003f57d00 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3205 [inline]
>  #1: ffffc90003f57d00 (net_cleanup_work){+.+.}-{0:0}, at: process_scheduled_works+0x976/0x1850 kernel/workqueue.c:3310
>  #2: ffffffff8fcb2dd0 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x16a/0xcc0 net/core/net_namespace.c:580
>  #3: ffffffff8fcbf8c8 (rtnl_mutex){+.+.}-{3:3}, at: ieee80211_unregister_hw+0x55/0x2c0 net/mac80211/main.c:1662
>  #4: ffff888059fc8768 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: wiphy_lock include/net/cfg80211.h:6014 [inline]
>  #4: ffff888059fc8768 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: ieee80211_stop+0x3e9/0x4a0 net/mac80211/iface.c:777
> 1 lock held by syz.1.2163/11123:
>
> =============================================
>
> NMI backtrace for cpu 1
> CPU: 1 UID: 0 PID: 30 Comm: khungtaskd Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>  nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
>  nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:162 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:223 [inline]
>  watchdog+0xff4/0x1040 kernel/hung_task.c:379
>  kthread+0x2f0/0x390 kernel/kthread.c:389
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>  </TASK>
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 UID: 0 PID: 11132 Comm: syz-executor Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> RIP: 0010:preempt_count_sub+0x66/0x170 kernel/sched/core.c:5829
> Code: c1 81 e1 ff ff ff 7f 39 d9 7c 27 81 fb fe 00 00 00 77 07 0f b6 c0 85 c0 74 5f 65 8b 05 cb 99 a0 7e f7 db 65 01 1d c2 99 a0 7e <5b> 41 5e c3 cc cc cc cc 90 e8 ec af 4c 03 85 c0 74 3a 48 c7 c0 30
> RSP: 0018:ffffc9000335f8c8 EFLAGS: 00000093
> RAX: 0000000080000002 RBX: 00000000ffffffff RCX: 0000000000000002
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
> RBP: ffffc9000335f970 R08: ffffffff9a5fe1f3 R09: 1ffffffff34bfc3e
> R10: dffffc0000000000 R11: fffffbfff34bfc3f R12: dffffc0000000000
> R13: 1ffff9200066bf1c R14: dffffc0000000000 R15: 0000000000000046
> FS:  0000555588e89500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000110c310254 CR3: 00000000581ae000 CR4: 00000000003526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <NMI>
>  </NMI>
>  <TASK>
>  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
>  _raw_spin_unlock_irqrestore+0xdd/0x140 kernel/locking/spinlock.c:194
>  __debug_check_no_obj_freed lib/debugobjects.c:998 [inline]
>  debug_check_no_obj_freed+0x561/0x580 lib/debugobjects.c:1019
>  slab_free_hook mm/slub.c:2273 [inline]
>  slab_free mm/slub.c:4579 [inline]
>  kmem_cache_free+0x11f/0x420 mm/slub.c:4681
>  __sigqueue_free kernel/signal.c:451 [inline]
>  collect_signal kernel/signal.c:594 [inline]
>  __dequeue_signal+0x4ac/0x5c0 kernel/signal.c:616
>  dequeue_signal+0x1e0/0x680 kernel/signal.c:637
>  get_signal+0x604/0x1740 kernel/signal.c:2797
>  arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:337
>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>  exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>  syscall_exit_to_user_mode+0xc9/0x370 kernel/entry/common.c:218
>  do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fae9a37c911
> Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 3a fc 18 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
> RSP: 002b:00007ffcb450afb0 EFLAGS: 00000202
> RAX: 0000000000000003 RBX: 0000000000000002 RCX: 00007fae9a37c911
> RDX: 0000000000000002 RSI: 00007fae9a3f033b RDI: 00000000ffffff9c
> RBP: 00007fae9a3f033b R08: 00000000000000da R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000003 R14: 0000000000000009 R15: 0000000000000000
>  </TASK>
> IPVS: wlc: UDP 224.0.0.2:0 - no destination available
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [mm?] INFO: task hung in exit_mmap
  2024-10-10 15:28 ` Lorenzo Stoakes
@ 2024-10-13 13:29   ` Sasha Levin
  2024-10-14  2:22     ` Liam R. Howlett
  0 siblings, 1 reply; 5+ messages in thread
From: Sasha Levin @ 2024-10-13 13:29 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: syzbot, Liam.Howlett, akpm, linux-kernel, linux-mm,
	syzkaller-bugs, vbabka

On Thu, Oct 10, 2024 at 04:28:18PM +0100, Lorenzo Stoakes wrote:
>On Thu, Oct 10, 2024 at 08:19:28AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    d3d1556696c1 Merge tag 'mm-hotfixes-stable-2024-10-09-15-4..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10416fd0580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=7a3fccdd0bb995
>> dashboard link: https://syzkaller.appspot.com/bug?extid=39bc767144c55c8db0ea
>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/0600b551e610/disk-d3d15566.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/d59d43ed3976/vmlinux-d3d15566.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/e686a3e7e0d6/bzImage-d3d15566.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+39bc767144c55c8db0ea@syzkaller.appspotmail.com
>>
>> INFO: task syz.3.917:7739 blocked for more than 146 seconds.
>>       Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:syz.3.917       state:D stack:23808 pid:7739  tgid:7739  ppid:5232   flags:0x00004000
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5322 [inline]
>>  __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
>>  __schedule_loop kernel/sched/core.c:6759 [inline]
>>  schedule+0x14b/0x320 kernel/sched/core.c:6774
>>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
>>  rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
>>  __down_write_common kernel/locking/rwsem.c:1304 [inline]
>>  __down_write kernel/locking/rwsem.c:1313 [inline]
>>  down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
>>  mmap_write_lock include/linux/mmap_lock.h:106 [inline]
>>  exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
>
>Hmm, task freezing up or system becoming unstable/locked up is reminsecent
>of the maple tree bug I fixed in [0], which is still in the unstable hotfix
>branch.
>
>This is likely not going to repro as it's quite heisenbug-ish to trigger
>and the failures are like this - somewhat disconnected from the cause, so
>not sure if there is any case to speed this to Linus's tree.
>
>On the other hand it's a pretty serious problem for stability and likely to
>continue to manifest in nasty ways like this.
>
>Can't be 100% sure this is the cause, but seems likely.
>
>[0]:https://lore.kernel.org/linux-mm/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com/

On my Debian build box, running a 6.1 kernel, I've started hitting a
similar issue:

Oct 12 17:24:01 debian kernel: INFO: task sed:3557356 blocked for more than 1208 seconds.
Oct 12 17:24:01 debian kernel:       Not tainted 6.1.0-26-amd64 #1 Debian 6.1.112-1
Oct 12 17:24:01 debian kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:24:01 debian kernel: task:sed             state:D stack:0     pid:3557356 ppid:1      flags:0x00000002
Oct 12 17:24:01 debian kernel: Call Trace:
Oct 12 17:24:01 debian kernel:  <TASK>
Oct 12 17:24:01 debian kernel:  __schedule+0x34d/0x9e0
Oct 12 17:24:01 debian kernel:  schedule+0x5a/0xd0
Oct 12 17:24:01 debian kernel:  rwsem_down_write_slowpath+0x311/0x6d0
Oct 12 17:24:01 debian kernel:  exit_mmap+0xf6/0x2f0
Oct 12 17:24:01 debian kernel:  __mmput+0x3e/0x130
Oct 12 17:24:01 debian kernel:  do_exit+0x2fc/0xaf0
Oct 12 17:24:01 debian kernel:  do_group_exit+0x2d/0x80
Oct 12 17:24:01 debian kernel:  __x64_sys_exit_group+0x14/0x20
Oct 12 17:24:01 debian kernel:  do_syscall_64+0x55/0xb0
Oct 12 17:24:01 debian kernel:  ? do_fault+0x1a4/0x410
Oct 12 17:24:01 debian kernel:  ? __handle_mm_fault+0x660/0xfa0
Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
Oct 12 17:24:01 debian kernel:  ? handle_mm_fault+0xdb/0x2d0
Oct 12 17:24:01 debian kernel:  ? do_user_addr_fault+0x1b0/0x550
Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
Oct 12 17:24:01 debian kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Oct 12 17:24:01 debian kernel: RIP: 0033:0x7f797d75a349
Oct 12 17:24:01 debian kernel: RSP: 002b:00007fff37f0d3c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Oct 12 17:24:01 debian kernel: RAX: ffffffffffffffda RBX: 00007f797d8549e0 RCX: 00007f797d75a349
Oct 12 17:24:01 debian kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Oct 12 17:24:01 debian kernel: RBP: 0000000000000000 R08: fffffffffffffe98 R09: 00007fff37f0d2df
Oct 12 17:24:01 debian kernel: R10: 00007fff37f0d240 R11: 0000000000000246 R12: 00007f797d8549e0
Oct 12 17:24:01 debian kernel: R13: 00007f797d85a2e0 R14: 0000000000000002 R15: 00007f797d85a2c8
Oct 12 17:24:01 debian kernel:  </TASK>

It reproduces fairly easily during a kernel build...

It doesn't sound like the same issue you're pointing out, right Lorenzo?

-- 
Thanks,
Sasha


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [mm?] INFO: task hung in exit_mmap
  2024-10-13 13:29   ` Sasha Levin
@ 2024-10-14  2:22     ` Liam R. Howlett
  2024-10-14  9:04       ` Lorenzo Stoakes
  0 siblings, 1 reply; 5+ messages in thread
From: Liam R. Howlett @ 2024-10-14  2:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Lorenzo Stoakes, syzbot, akpm, linux-kernel, linux-mm,
	syzkaller-bugs, vbabka

* Sasha Levin <sashal@kernel.org> [241013 09:29]:
> On Thu, Oct 10, 2024 at 04:28:18PM +0100, Lorenzo Stoakes wrote:
> > On Thu, Oct 10, 2024 at 08:19:28AM -0700, syzbot wrote:
> > > Hello,
> > > 
> > > syzbot found the following issue on:
> > > 
> > > HEAD commit:    d3d1556696c1 Merge tag 'mm-hotfixes-stable-2024-10-09-15-4..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=10416fd0580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=7a3fccdd0bb995
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=39bc767144c55c8db0ea
> > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > 
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > > 
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/0600b551e610/disk-d3d15566.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/d59d43ed3976/vmlinux-d3d15566.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/e686a3e7e0d6/bzImage-d3d15566.xz
> > > 
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+39bc767144c55c8db0ea@syzkaller.appspotmail.com
> > > 
> > > INFO: task syz.3.917:7739 blocked for more than 146 seconds.
> > >       Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:syz.3.917       state:D stack:23808 pid:7739  tgid:7739  ppid:5232   flags:0x00004000
> > > Call Trace:
> > >  <TASK>
> > >  context_switch kernel/sched/core.c:5322 [inline]
> > >  __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
> > >  __schedule_loop kernel/sched/core.c:6759 [inline]
> > >  schedule+0x14b/0x320 kernel/sched/core.c:6774
> > >  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
> > >  rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
> > >  __down_write_common kernel/locking/rwsem.c:1304 [inline]
> > >  __down_write kernel/locking/rwsem.c:1313 [inline]
> > >  down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
> > >  mmap_write_lock include/linux/mmap_lock.h:106 [inline]
> > >  exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
> > 
> > Hmm, task freezing up or system becoming unstable/locked up is reminsecent
> > of the maple tree bug I fixed in [0], which is still in the unstable hotfix
> > branch.
> > 
> > This is likely not going to repro as it's quite heisenbug-ish to trigger
> > and the failures are like this - somewhat disconnected from the cause, so
> > not sure if there is any case to speed this to Linus's tree.
> > 
> > On the other hand it's a pretty serious problem for stability and likely to
> > continue to manifest in nasty ways like this.
> > 
> > Can't be 100% sure this is the cause, but seems likely.
> > 
> > [0]:https://lore.kernel.org/linux-mm/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com/
> 
> On my Debian build box, running a 6.1 kernel, I've started hitting a
> similar issue:
> 
> Oct 12 17:24:01 debian kernel: INFO: task sed:3557356 blocked for more than 1208 seconds.
> Oct 12 17:24:01 debian kernel:       Not tainted 6.1.0-26-amd64 #1 Debian 6.1.112-1
> Oct 12 17:24:01 debian kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 12 17:24:01 debian kernel: task:sed             state:D stack:0     pid:3557356 ppid:1      flags:0x00000002
> Oct 12 17:24:01 debian kernel: Call Trace:
> Oct 12 17:24:01 debian kernel:  <TASK>
> Oct 12 17:24:01 debian kernel:  __schedule+0x34d/0x9e0
> Oct 12 17:24:01 debian kernel:  schedule+0x5a/0xd0
> Oct 12 17:24:01 debian kernel:  rwsem_down_write_slowpath+0x311/0x6d0
> Oct 12 17:24:01 debian kernel:  exit_mmap+0xf6/0x2f0
> Oct 12 17:24:01 debian kernel:  __mmput+0x3e/0x130
> Oct 12 17:24:01 debian kernel:  do_exit+0x2fc/0xaf0
> Oct 12 17:24:01 debian kernel:  do_group_exit+0x2d/0x80
> Oct 12 17:24:01 debian kernel:  __x64_sys_exit_group+0x14/0x20
> Oct 12 17:24:01 debian kernel:  do_syscall_64+0x55/0xb0
> Oct 12 17:24:01 debian kernel:  ? do_fault+0x1a4/0x410
> Oct 12 17:24:01 debian kernel:  ? __handle_mm_fault+0x660/0xfa0
> Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
> Oct 12 17:24:01 debian kernel:  ? handle_mm_fault+0xdb/0x2d0
> Oct 12 17:24:01 debian kernel:  ? do_user_addr_fault+0x1b0/0x550
> Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
> Oct 12 17:24:01 debian kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> Oct 12 17:24:01 debian kernel: RIP: 0033:0x7f797d75a349
> Oct 12 17:24:01 debian kernel: RSP: 002b:00007fff37f0d3c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> Oct 12 17:24:01 debian kernel: RAX: ffffffffffffffda RBX: 00007f797d8549e0 RCX: 00007f797d75a349
> Oct 12 17:24:01 debian kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> Oct 12 17:24:01 debian kernel: RBP: 0000000000000000 R08: fffffffffffffe98 R09: 00007fff37f0d2df
> Oct 12 17:24:01 debian kernel: R10: 00007fff37f0d240 R11: 0000000000000246 R12: 00007f797d8549e0
> Oct 12 17:24:01 debian kernel: R13: 00007f797d85a2e0 R14: 0000000000000002 R15: 00007f797d85a2c8
> Oct 12 17:24:01 debian kernel:  </TASK>
> 
> It reproduces fairly easily during a kernel build...
> 
> It doesn't sound like the same issue you're pointing out, right Lorenzo?

It could be.  I suspect there has been a change recently that has
made the bug possible - although, I've not put effort into finding out
yet if that is true.  If the bug existed for a long time (probably since
I fixed the live locking issue in 6.4 that was backported), then you
could be hitting it.

It is a single line fix.  If it happens frequently enough, you could try
it - this fix will go through the backporting route once it lands.

Although, I am not sure it has much to do with the maple tree as I don't
think anyone should have the mm to take the mmap write lock.  If we were
stuck in the maple tree somehow, the mm wouldn't reach the exit_mmap()
path - unless I missed something?

If you can dump the running tasks when you hit it, we could get a clue
from the (probably numerous) backtraces?

Thanks,
Liam


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [mm?] INFO: task hung in exit_mmap
  2024-10-14  2:22     ` Liam R. Howlett
@ 2024-10-14  9:04       ` Lorenzo Stoakes
  0 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Stoakes @ 2024-10-14  9:04 UTC (permalink / raw)
  To: Liam R. Howlett, Sasha Levin, syzbot, akpm, linux-kernel,
	linux-mm, syzkaller-bugs, vbabka

On Sun, Oct 13, 2024 at 10:22:11PM -0400, Liam R. Howlett wrote:
> * Sasha Levin <sashal@kernel.org> [241013 09:29]:
> > On Thu, Oct 10, 2024 at 04:28:18PM +0100, Lorenzo Stoakes wrote:
> > > On Thu, Oct 10, 2024 at 08:19:28AM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    d3d1556696c1 Merge tag 'mm-hotfixes-stable-2024-10-09-15-4..
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=10416fd0580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=7a3fccdd0bb995
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=39bc767144c55c8db0ea
> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/0600b551e610/disk-d3d15566.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/d59d43ed3976/vmlinux-d3d15566.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/e686a3e7e0d6/bzImage-d3d15566.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+39bc767144c55c8db0ea@syzkaller.appspotmail.com
> > > >
> > > > INFO: task syz.3.917:7739 blocked for more than 146 seconds.
> > > >       Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > task:syz.3.917       state:D stack:23808 pid:7739  tgid:7739  ppid:5232   flags:0x00004000
> > > > Call Trace:
> > > >  <TASK>
> > > >  context_switch kernel/sched/core.c:5322 [inline]
> > > >  __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
> > > >  __schedule_loop kernel/sched/core.c:6759 [inline]
> > > >  schedule+0x14b/0x320 kernel/sched/core.c:6774
> > > >  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
> > > >  rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
> > > >  __down_write_common kernel/locking/rwsem.c:1304 [inline]
> > > >  __down_write kernel/locking/rwsem.c:1313 [inline]
> > > >  down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
> > > >  mmap_write_lock include/linux/mmap_lock.h:106 [inline]
> > > >  exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
> > >
> > > Hmm, task freezing up or system becoming unstable/locked up is reminsecent
> > > of the maple tree bug I fixed in [0], which is still in the unstable hotfix
> > > branch.
> > >
> > > This is likely not going to repro as it's quite heisenbug-ish to trigger
> > > and the failures are like this - somewhat disconnected from the cause, so
> > > not sure if there is any case to speed this to Linus's tree.
> > >
> > > On the other hand it's a pretty serious problem for stability and likely to
> > > continue to manifest in nasty ways like this.
> > >
> > > Can't be 100% sure this is the cause, but seems likely.
> > >
> > > [0]:https://lore.kernel.org/linux-mm/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com/
> >
> > On my Debian build box, running a 6.1 kernel, I've started hitting a
> > similar issue:
> >
> > Oct 12 17:24:01 debian kernel: INFO: task sed:3557356 blocked for more than 1208 seconds.
> > Oct 12 17:24:01 debian kernel:       Not tainted 6.1.0-26-amd64 #1 Debian 6.1.112-1
> > Oct 12 17:24:01 debian kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Oct 12 17:24:01 debian kernel: task:sed             state:D stack:0     pid:3557356 ppid:1      flags:0x00000002
> > Oct 12 17:24:01 debian kernel: Call Trace:
> > Oct 12 17:24:01 debian kernel:  <TASK>
> > Oct 12 17:24:01 debian kernel:  __schedule+0x34d/0x9e0
> > Oct 12 17:24:01 debian kernel:  schedule+0x5a/0xd0
> > Oct 12 17:24:01 debian kernel:  rwsem_down_write_slowpath+0x311/0x6d0
> > Oct 12 17:24:01 debian kernel:  exit_mmap+0xf6/0x2f0
> > Oct 12 17:24:01 debian kernel:  __mmput+0x3e/0x130
> > Oct 12 17:24:01 debian kernel:  do_exit+0x2fc/0xaf0
> > Oct 12 17:24:01 debian kernel:  do_group_exit+0x2d/0x80
> > Oct 12 17:24:01 debian kernel:  __x64_sys_exit_group+0x14/0x20
> > Oct 12 17:24:01 debian kernel:  do_syscall_64+0x55/0xb0
> > Oct 12 17:24:01 debian kernel:  ? do_fault+0x1a4/0x410
> > Oct 12 17:24:01 debian kernel:  ? __handle_mm_fault+0x660/0xfa0
> > Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
> > Oct 12 17:24:01 debian kernel:  ? handle_mm_fault+0xdb/0x2d0
> > Oct 12 17:24:01 debian kernel:  ? do_user_addr_fault+0x1b0/0x550
> > Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
> > Oct 12 17:24:01 debian kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > Oct 12 17:24:01 debian kernel: RIP: 0033:0x7f797d75a349
> > Oct 12 17:24:01 debian kernel: RSP: 002b:00007fff37f0d3c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > Oct 12 17:24:01 debian kernel: RAX: ffffffffffffffda RBX: 00007f797d8549e0 RCX: 00007f797d75a349
> > Oct 12 17:24:01 debian kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> > Oct 12 17:24:01 debian kernel: RBP: 0000000000000000 R08: fffffffffffffe98 R09: 00007fff37f0d2df
> > Oct 12 17:24:01 debian kernel: R10: 00007fff37f0d240 R11: 0000000000000246 R12: 00007f797d8549e0
> > Oct 12 17:24:01 debian kernel: R13: 00007f797d85a2e0 R14: 0000000000000002 R15: 00007f797d85a2c8
> > Oct 12 17:24:01 debian kernel:  </TASK>
> >
> > It reproduces fairly easily during a kernel build...
> >
> > It doesn't sound like the same issue you're pointing out, right Lorenzo?
>
> It could be.  I suspect there has been a change recently that has
> made the bug possible - although, I've not put effort into finding out
> yet if that is true.  If the bug existed for a long time (probably since
> I fixed the live locking issue in 6.4 that was backported), then you
> could be hitting it.
>
> It is a single line fix.  If it happens frequently enough, you could try
> it - this fix will go through the backporting route once it lands.
>
> Although, I am not sure it has much to do with the maple tree as I don't
> think anyone should have the mm to take the mmap write lock.  If we were
> stuck in the maple tree somehow, the mm wouldn't reach the exit_mmap()
> path - unless I missed something?

I think this is the same bug, as the problem is, once it manifests, the
actual problems it causes are delayed down the line until we hit a
situation that is _caused by_ the bug but somewhat detached from it.

In fact Bert and Mikhail hit the same thing with a process locking up in
this path AND in that a lock is held that really makes no sense.

As Liam says, this is a one-line change [0], could you try taking it and
seeing? It has already been taken by Andrew, but we delay our hotfixes by a
week or two so isn't in -rc yet.

[0]: https://lore.kernel.org/linux-mm/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com/

>
> If you can dump the running tasks when you hit it, we could get a clue
> from the (probably numerous) backtraces?

I suggest first trying [0] :) if that doesn't fix things then we will
definitely want to dig in further.

>
> Thanks,
> Liam

Thanks!


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-10-14  9:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-10 15:19 [syzbot] [mm?] INFO: task hung in exit_mmap syzbot
2024-10-10 15:28 ` Lorenzo Stoakes
2024-10-13 13:29   ` Sasha Levin
2024-10-14  2:22     ` Liam R. Howlett
2024-10-14  9:04       ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox