* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter

From: Hillf Danton @ 2026-01-24 11:31 UTC
To: syzbot
Cc: axboe, linux-block, Lorenzo Stoakes, Suren Baghdasaryan, linux-mm, linux-kernel, syzkaller-bugs

Add Lorenzo and Suren

> Date: Fri, 23 Jan 2026 15:14:36 -0800
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 24d479d26b25 Linux 6.19-rc6
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=100033fa580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41
> dashboard link: https://syzkaller.appspot.com/bug?extid=4e70c8e0a2017b432f7a
> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11451b9a580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1045e852580000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-24d479d2.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/d0f3c47f6869/vmlinux-24d479d2.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/800231513703/bzImage-24d479d2.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com
>
> WARNING: possible circular locking dependency detected
> syzkaller #0 Not tainted
> ------------------------------------------------------
> syz.0.17/6091 is trying to acquire lock:
> ffff8881061287a8 (
> &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: inode_lock_shared include/linux/fs.h:1042 [inline]
> &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: blkdev_read_iter+0x19e/0x500 block/fops.c:855
>
> but task is already holding lock:
> ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (vm_lock){++++}-{0:0}:
>        __vma_enter_locked+0x260/0x770 mm/mmap_lock.c:72
>        __vma_start_write+0x21/0x160 mm/mmap_lock.c:104
>        vma_start_write include/linux/mmap_lock.h:213 [inline]
>        mprotect_fixup+0x4e3/0xb80 mm/mprotect.c:768
>        setup_arg_pages+0x4a2/0xbb0 fs/exec.c:670
>        load_elf_binary+0xb5b/0x4fe0 fs/binfmt_elf.c:1028
>        search_binary_handler fs/exec.c:1669 [inline]
>        exec_binprm fs/exec.c:1701 [inline]
>        bprm_execve fs/exec.c:1753 [inline]
>        bprm_execve+0x8c2/0x1620 fs/exec.c:1729
>        kernel_execve+0x2ef/0x3b0 fs/exec.c:1919
>        try_to_run_init_process init/main.c:1506 [inline]
>        kernel_init+0x14a/0x2b0 init/main.c:1634
>        ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
>
> -> #1 (&mm->mmap_lock){++++}-{4:4}:
>        __might_fault mm/memory.c:7174 [inline]
>        __might_fault+0x113/0x190 mm/memory.c:7168
>        _copy_to_iter+0x1c2/0x1710 lib/iov_iter.c:196
>        copy_page_to_iter lib/iov_iter.c:374 [inline]
>        copy_page_to_iter+0x12a/0x1e0 lib/iov_iter.c:361
>        copy_folio_to_iter include/linux/uio.h:204 [inline]
>        filemap_read+0x6b1/0xe40 mm/filemap.c:2851
>        blkdev_read_iter+0x1ac/0x500 block/fops.c:856
>        new_sync_read fs/read_write.c:491 [inline]
>        vfs_read+0x8bf/0xcf0 fs/read_write.c:572
>        ksys_read+0x12a/0x250 fs/read_write.c:715
>        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>        do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
>        entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
>        check_prev_add kernel/locking/lockdep.c:3165 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3284 [inline]
>        validate_chain kernel/locking/lockdep.c:3908 [inline]
>        __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237
>        lock_acquire kernel/locking/lockdep.c:5868 [inline]
>        lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825
>        down_read+0x9b/0x460 kernel/locking/rwsem.c:1537
>        inode_lock_shared include/linux/fs.h:1042 [inline]
>        blkdev_read_iter+0x19e/0x500 block/fops.c:855
>        __kernel_read+0x3f3/0xbf0 fs/read_write.c:530
>        freader_fetch+0x1d7/0x9d0 lib/buildid.c:100
>        __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297
>        do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733
>        procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813
>        vfs_ioctl fs/ioctl.c:51 [inline]
>        __do_sys_ioctl fs/ioctl.c:597 [inline]
>        __se_sys_ioctl fs/ioctl.c:583 [inline]
>        __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
>        do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>        do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
>        entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#8 --> &mm->mmap_lock --> vm_lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   rlock(vm_lock);
>                                lock(&mm->mmap_lock);
>                                lock(vm_lock);
>   rlock(&sb->s_type->i_mutex_key#8);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz.0.17/6091:
>  #0: ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334
>
> stack backtrace:
> CPU: 2 UID: 0 PID: 6091 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
>  print_circular_bug+0x275/0x340 kernel/locking/lockdep.c:2043
>  check_noncircular+0x146/0x160 kernel/locking/lockdep.c:2175
>  check_prev_add kernel/locking/lockdep.c:3165 [inline]
>  check_prevs_add kernel/locking/lockdep.c:3284 [inline]
>  validate_chain kernel/locking/lockdep.c:3908 [inline]
>  __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237
>  lock_acquire kernel/locking/lockdep.c:5868 [inline]
>  lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825
>  down_read+0x9b/0x460 kernel/locking/rwsem.c:1537
>  inode_lock_shared include/linux/fs.h:1042 [inline]
>  blkdev_read_iter+0x19e/0x500 block/fops.c:855
>  __kernel_read+0x3f3/0xbf0 fs/read_write.c:530
>  freader_fetch+0x1d7/0x9d0 lib/buildid.c:100
>  __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297
>  do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733
>  procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:597 [inline]
>  __se_sys_ioctl fs/ioctl.c:583 [inline]
>  __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7ff1a238f7c9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffebbe538b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007ff1a25e5fa0 RCX: 00007ff1a238f7c9
> RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000004
> RBP: 00007ff1a2413f91 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ff1a25e5fa0 R14: 00007ff1a25e5fa0 R15: 0000000000000003
>  </TASK>
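The line "which lock already depends on the new lock" in the report above is the outcome of a reachability check over lock classes. The following is a deliberately simplified model of that check and assumes nothing about lockdep's real data structures: an edge `held -> next` is recorded whenever `next` is acquired while `held` is held, and a new edge is refused if `next` can already reach `held`. The enum values `I_MUTEX`, `MMAP_LOCK` and `VM_LOCK` are hypothetical stand-ins for `&sb->s_type->i_mutex_key#8`, `&mm->mmap_lock` and `vm_lock`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes mirroring the report's "Chain exists of" line. */
enum lock_class { I_MUTEX, MMAP_LOCK, VM_LOCK, NR_CLASSES };

/* dep[a][b] is true once class b has been acquired while class a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'from' reach 'to' through recorded dependencies? */
static bool reaches(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'next' while holding 'held'". Returns false and records
 * nothing if 'next' already depends on 'held', i.e. the new edge would close
 * a cycle, which is the condition reported as a possible deadlock.
 */
static bool add_dependency(int held, int next)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(next >= 0 && next < NR_CLASSES);
	if (reaches(next, held, seen))
		return false;
	dep[held][next] = true;
	return true;
}
```

Replaying the report's chains in order, `add_dependency(MMAP_LOCK, VM_LOCK)` (#2, exec's mprotect_fixup()) and `add_dependency(I_MUTEX, MMAP_LOCK)` (#1, a user copy that might fault inside blkdev_read_iter()) both succeed, while `add_dependency(VM_LOCK, I_MUTEX)` (#0, PROCMAP_QUERY) returns false because `I_MUTEX -> MMAP_LOCK -> VM_LOCK` already exists, matching the "Chain exists of" line.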
* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter

From: Suren Baghdasaryan @ 2026-01-26 17:20 UTC
To: Hillf Danton
Cc: syzbot, axboe, linux-block, Lorenzo Stoakes, linux-mm, linux-kernel, syzkaller-bugs

On Sat, Jan 24, 2026 at 3:32 AM Hillf Danton <hdanton@sina.com> wrote:
>
> Add Lorenzo and Suren

Thanks!

>
> > Date: Fri, 23 Jan 2026 15:14:36 -0800
> > [...]

It looks like:
#0 is executing the PROCMAP_QUERY ioctl: it read-locks vm_lock and then
calls build_id_parse()->__build_id_parse(...,
may_fault=true)->__kernel_read(), which eventually takes
inode->i_rwsem.
#1 is a read() from the block device: blkdev_read_iter() holds
inode->i_rwsem while filemap_read() copies data to the user buffer,
and __might_fault() asserts that this copy might take mmap_lock for
read.
#2 is load_elf_binary()->mprotect_fixup(), which write-locks both
mmap_lock and vm_lock. I'm guessing it already holds inode->i_rwsem
before write-locking these locks.

Originally I thought the issue was most likely introduced in
d9d1c2d81797 ("fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under
per-vma locks"). But if #2 indeed takes inode->i_rwsem before
write-locking mmap_lock, then the problem should exist even before
that change, when we didn't use vm_lock and relied on mmap_lock...

I'll try to analyze this more before attempting a fix.
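Suren's analysis above comes down to PROCMAP_QUERY performing file I/O, which takes inode->i_rwsem, while it still holds vm_lock (or mmap_lock). A general way to break such an inversion is to pin the object under the outer lock, drop the outer lock, and only then take the inner lock. The userspace sketch below illustrates that pattern under stated assumptions; it is not the fix adopted in this thread, and `struct file_obj`, `struct mapping`, `file_get()`, `file_put()` and `query_build_id()` are hypothetical names, with two pthread rwlocks standing in for mmap_lock/vm_lock and i_rwsem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a struct file plus its inode->i_rwsem. */
struct file_obj {
	atomic_int refcount;
	pthread_rwlock_t file_lock;	/* plays the role of inode->i_rwsem */
	char build_id[21];		/* kept as a string for simplicity */
};

/* Hypothetical stand-in for an mm/VMA whose lock protects the file pointer. */
struct mapping {
	pthread_rwlock_t mm_lock;	/* plays the role of mmap_lock / vm_lock */
	struct file_obj *file;
};

static void file_get(struct file_obj *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

/* Returns the remaining references; a real version would free at zero. */
static int file_put(struct file_obj *f)
{
	return atomic_fetch_sub(&f->refcount, 1) - 1;
}

/*
 * Copy the build ID of the mapped file into 'out'. mm_lock is released
 * before file_lock is taken, so no mm_lock -> file_lock dependency exists.
 */
static int query_build_id(struct mapping *m, char *out, size_t len)
{
	struct file_obj *f;

	assert(len > 0);

	pthread_rwlock_rdlock(&m->mm_lock);
	f = m->file;
	if (f)
		file_get(f);		/* pin the file so it outlives the unlock */
	pthread_rwlock_unlock(&m->mm_lock);

	if (!f)
		return -1;

	/* Only file_lock is held from here on. */
	pthread_rwlock_rdlock(&f->file_lock);
	strncpy(out, f->build_id, len - 1);
	out[len - 1] = '\0';
	pthread_rwlock_unlock(&f->file_lock);

	file_put(f);
	return 0;
}
```

The design choice is that `query_build_id()` never holds `mm_lock` and `file_lock` at the same time, so it cannot contribute an `mm_lock -> file_lock` edge; the extra reference is what keeps the file valid in the window after `mm_lock` is dropped.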
* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter

From: Suren Baghdasaryan @ 2026-01-26 22:33 UTC
To: Hillf Danton
Cc: syzbot, axboe, linux-block, Lorenzo Stoakes, linux-mm, linux-kernel, syzkaller-bugs, Andrii Nakryiko

On Mon, Jan 26, 2026 at 9:20 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> [...]
>
> Originally I thought the issue was most likely introduced in
> d9d1c2d81797 ("fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under
> per-vma locks"). But if #2 indeed takes inode->i_rwsem before
> write-locking mmap_lock, then the problem should exist even before
> that change when we didn't use vm_lock and relied on mmap_lock...
>
> I'll try to analyze this more before attempting a fix.

I was able to reproduce the same issue even after reverting
d9d1c2d81797. The deadlock in this case is simpler and involves
mmap_lock instead of vm_lock (see below). It looks like the race is
between the read() syscall and do_procmap_query(). I'll continue
investigating; in the meantime, CC'ing Andrii.

[ 62.320932][ T9229]
[ 62.321471][ T9229] ======================================================
[ 62.323016][ T9229] WARNING: possible circular locking dependency detected
[ 62.324618][ T9229] 6.19.0-rc6-00001-g40bea6261b2a #42 Not tainted
[ 62.326013][ T9229] ------------------------------------------------------
[ 62.327560][ T9229] hillf/9229 is trying to acquire lock:
[ 62.328821][ T9229] ffff888145b7b5a8 (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: blkdev_read_iter+0x2a7/0x4e0
[ 62.331102][ T9229]
[ 62.331102][ T9229] but task is already holding lock:
[ 62.332722][ T9229] ffff888183a6e540 (&mm->mmap_lock){++++}-{4:4}, at: do_procmap_query+0x39f/0x1050
[ 62.334795][ T9229]
[ 62.334795][ T9229] which lock already depends on the new lock.
[ 62.334795][ T9229]
[ 62.337072][ T9229]
[ 62.337072][ T9229] the existing dependency chain (in reverse order) is:
[ 62.338998][ T9229]
[ 62.338998][ T9229] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[ 62.340646][ T9229]        __might_fault+0xed/0x170
[ 62.341763][ T9229]        _copy_to_iter+0x118/0x1720
[ 62.342913][ T9229]        copy_page_to_iter+0x12d/0x1e0
[ 62.344167][ T9229]        filemap_read+0x720/0x10a0
[ 62.345298][ T9229]        blkdev_read_iter+0x2b5/0x4e0
[ 62.346480][ T9229]        vfs_read+0x7f4/0xae0
[ 62.347518][ T9229]        ksys_read+0x12a/0x250
[ 62.348584][ T9229]        do_syscall_64+0xcb/0xf80
[ 62.349707][ T9229]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 62.351116][ T9229]
[ 62.351116][ T9229] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
[ 62.353012][ T9229]        __lock_acquire+0x1509/0x26d0
[ 62.354213][ T9229]        lock_acquire+0x185/0x340
[ 62.355323][ T9229]        down_read+0x98/0x490
[ 62.356441][ T9229]        blkdev_read_iter+0x2a7/0x4e0
[ 62.357619][ T9229]        __kernel_read+0x39a/0xa90
[ 62.358767][ T9229]        freader_fetch+0x1d5/0xa80
[ 62.359927][ T9229]        __build_id_parse.isra.0+0xea/0x6a0
[ 62.361232][ T9229]        do_procmap_query+0xd75/0x1050
[ 62.362434][ T9229]        procfs_procmap_ioctl+0x7a/0xb0
[ 62.363687][ T9229]        __x64_sys_ioctl+0x18e/0x210
[ 62.364863][ T9229]        do_syscall_64+0xcb/0xf80
[ 62.365977][ T9229]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 62.367394][ T9229]
[ 62.367394][ T9229] other info that might help us debug this:
[ 62.367394][ T9229]
[ 62.369637][ T9229]  Possible unsafe locking scenario:
[ 62.369637][ T9229]
[ 62.371237][ T9229]        CPU0                    CPU1
[ 62.372441][ T9229]        ----                    ----
[ 62.373687][ T9229]   rlock(&mm->mmap_lock);
[ 62.374688][ T9229]                                lock(&sb->s_type->i_mutex_key#8);
[ 62.376444][ T9229]                                lock(&mm->mmap_lock);
[ 62.377956][ T9229]   rlock(&sb->s_type->i_mutex_key#8);
[ 62.379165][ T9229]
[ 62.379165][ T9229]  *** DEADLOCK ***
[ 62.379165][ T9229]
[ 62.380952][ T9229] 1 lock held by hillf/9229:
[ 62.381971][ T9229]  #0: ffff888183a6e540 (&mm->mmap_lock){++++}-{4:4}, at: do_procmap_query+0x39f/0x1050
[ 62.384162][ T9229]
[ 62.384162][ T9229] stack backtrace:
[ 62.385458][ T9229] CPU: 3 UID: 0 PID: 9229 Comm: hillf Not tainted 6.19.0-rc6-00001-g40bea6261b2a #42 PREEMPT(full)
[ 62.385471][ T9229] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[ 62.385477][ T9229] Call Trace:
[ 62.385482][ T9229]  <TASK>
[ 62.385487][ T9229]  dump_stack_lvl+0x100/0x190
[ 62.385505][ T9229]  print_circular_bug.cold+0x185/0x1d5
[ 62.385521][ T9229]  check_noncircular+0x14a/0x170
[ 62.385534][ T9229]  __lock_acquire+0x1509/0x26d0
[ 62.385547][ T9229]  lock_acquire+0x185/0x340
[ 62.385557][ T9229]  ? blkdev_read_iter+0x2a7/0x4e0
[ 62.385569][ T9229]  ? __pfx___might_resched+0x10/0x10
[ 62.385583][ T9229]  down_read+0x98/0x490
[ 62.385593][ T9229]  ? blkdev_read_iter+0x2a7/0x4e0
[ 62.385603][ T9229]  ? __pfx_down_read+0x10/0x10
[ 62.385612][ T9229]  ? lock_acquire+0x185/0x340
[ 62.385622][ T9229]  ? is_bpf_text_address+0x25/0x1a0
[ 62.385634][ T9229]  blkdev_read_iter+0x2a7/0x4e0
[ 62.385645][ T9229]  __kernel_read+0x39a/0xa90
[ 62.385658][ T9229]  ? __pfx___kernel_read+0x10/0x10
[ 62.385671][ T9229]  ? __lock_acquire+0x481/0x26d0
[ 62.385683][ T9229]  freader_fetch+0x1d5/0xa80
[ 62.385697][ T9229]  ? find_held_lock+0x2b/0x80
[ 62.385712][ T9229]  ? __pfx_freader_fetch+0x10/0x10
[ 62.385725][ T9229]  ? __asan_memset+0x27/0x50
[ 62.385737][ T9229]  __build_id_parse.isra.0+0xea/0x6a0
[ 62.385751][ T9229]  ? __pfx___build_id_parse.isra.0+0x10/0x10
[ 62.385766][ T9229]  ? __pfx_find_vma+0x10/0x10
[ 62.385774][ T9229]  ? __might_fault+0x129/0x170
[ 62.385788][ T9229]  do_procmap_query+0xd75/0x1050
[ 62.385798][ T9229]  ? __pfx_do_procmap_query+0x10/0x10
[ 62.385807][ T9229]  ? __sanitizer_cov_trace_switch+0x53/0x90
[ 62.385817][ T9229]  ? do_vfs_ioctl+0x226/0x13b0
[ 62.385828][ T9229]  ? __pfx_do_vfs_ioctl+0x10/0x10
[ 62.385839][ T9229]  ? putname+0xfc/0x1b0
[ 62.385846][ T9229]  ? putname+0x101/0x1b0
[ 62.385857][ T9229]  ? __x64_sys_openat+0x143/0x210
[ 62.385867][ T9229]  procfs_procmap_ioctl+0x7a/0xb0
[ 62.385877][ T9229]  ? __pfx_procfs_procmap_ioctl+0x10/0x10
[ 62.385888][ T9229]  __x64_sys_ioctl+0x18e/0x210
[ 62.385899][ T9229]  do_syscall_64+0xcb/0xf80
[ 62.385913][ T9229]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 62.385923][ T9229] RIP: 0033:0x412209
[ 62.385931][ T9229] Code: c0 79 93 eb d5 48 8d 7c 1d 00 eb 99 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 d8 ff ff ff f7 d8 64 89 01 48
[ 62.385940][ T9229] RSP: 002b:00007fff380d5588 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
[ 62.385950][ T9229] RAX: ffffffffffffffda RBX: 00007fff380d56c8 RCX: 0000000000412209
[ 62.385956][ T9229] RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000004
[ 62.385962][ T9229] RBP: 00007fff380d55a0 R08: 0000000000000000 R09: 00007fff380d5640
[ 62.385968][ T9229] R10: 0000000000000000 R11: 0000000000000217 R12: 00007fff380d56b8
[ 62.385974][ T9229] R13: 0000000000000002 R14: 00000000004a0e40 R15: 0000000000000002
[ 62.385982][ T9229]  </TASK>
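The "Possible unsafe locking scenario" table in the mmap_lock-only splat above can be mimicked in userspace. In this sketch, which is an analogy rather than kernel code, the hypothetical `mm_lock` stands in for `mm->mmap_lock` and `file_lock` for the block device inode's i_rwsem. CPU0 follows the do_procmap_query() order (both locks shared), CPU1 takes the same locks in the opposite order and exclusively, as the table prints them, and timed lock calls replace the would-be infinite wait so that both threads return `ETIMEDOUT` instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins: mm_lock ~ mm->mmap_lock, file_lock ~ inode->i_rwsem. */
static pthread_rwlock_t mm_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t file_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t first_locks_held, second_attempts_done;
static int cpu0_result, cpu1_result;

/* Absolute CLOCK_REALTIME deadline 'ms' milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
	struct timespec ts;

	assert(ms >= 0);
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* CPU0: rlock(mmap_lock), then rlock(i_rwsem), as in do_procmap_query(). */
static void *cpu0(void *unused)
{
	struct timespec ts;

	(void)unused;
	pthread_rwlock_rdlock(&mm_lock);
	pthread_barrier_wait(&first_locks_held);
	ts = deadline_in_ms(100);
	cpu0_result = pthread_rwlock_timedrdlock(&file_lock, &ts);
	if (cpu0_result == 0)
		pthread_rwlock_unlock(&file_lock);
	pthread_barrier_wait(&second_attempts_done);
	pthread_rwlock_unlock(&mm_lock);
	return NULL;
}

/* CPU1: lock(i_rwsem), then lock(mmap_lock), the opposite order. */
static void *cpu1(void *unused)
{
	struct timespec ts;

	(void)unused;
	pthread_rwlock_wrlock(&file_lock);
	pthread_barrier_wait(&first_locks_held);
	ts = deadline_in_ms(100);
	cpu1_result = pthread_rwlock_timedwrlock(&mm_lock, &ts);
	if (cpu1_result == 0)
		pthread_rwlock_unlock(&mm_lock);
	pthread_barrier_wait(&second_attempts_done);
	pthread_rwlock_unlock(&file_lock);
	return NULL;
}

/* Returns 1 if each thread timed out waiting for the lock the other holds. */
static int abba_times_out(void)
{
	pthread_t t0, t1;

	pthread_barrier_init(&first_locks_held, NULL, 2);
	pthread_barrier_init(&second_attempts_done, NULL, 2);
	pthread_create(&t0, NULL, cpu0, NULL);
	pthread_create(&t1, NULL, cpu1, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	pthread_barrier_destroy(&first_locks_held);
	pthread_barrier_destroy(&second_attempts_done);
	return cpu0_result == ETIMEDOUT && cpu1_result == ETIMEDOUT;
}
```

Calling `abba_times_out()` returns 1 after roughly 100 ms; with untimed locks, as in the kernel, neither thread would ever make progress. The first barrier is what makes the outcome deterministic: each thread only attempts its second lock once both first locks are held.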
* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter

From: Suren Baghdasaryan @ 2026-01-27 2:22 UTC
To: Hillf Danton
Cc: syzbot, axboe, linux-block, Lorenzo Stoakes, linux-mm, linux-kernel, syzkaller-bugs, Andrii Nakryiko

On Mon, Jan 26, 2026 at 2:33 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jan 26, 2026 at 9:20 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Sat, Jan 24, 2026 at 3:32 AM Hillf Danton <hdanton@sina.com> wrote:
> > >
> > > Add Lorenzo and Suren
> >
> > Thanks!
> >
> > >
> > > > Date: Fri, 23 Jan 2026 15:14:36 -0800
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 24d479d26b25 Linux 6.19-rc6
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=100033fa580000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=4e70c8e0a2017b432f7a
> > > > compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11451b9a580000
> > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1045e852580000
> > > >
> > > > Downloadable assets:
> > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-24d479d2.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/d0f3c47f6869/vmlinux-24d479d2.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/800231513703/bzImage-24d479d2.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com
> > > >
> > > > WARNING: possible circular locking dependency detected
> > > > syzkaller #0 Not tainted
> > > > ------------------------------------------------------
> > > > syz.0.17/6091 is trying to acquire lock:
> > > > ffff8881061287a8 (
> > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: inode_lock_shared include/linux/fs.h:1042 [inline]
> > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: blkdev_read_iter+0x19e/0x500 block/fops.c:855
> > > >
> > > > but task is already holding lock:
> > > > ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334
> > > >
> > > > which lock already depends on the new lock.
> > > >
> > > >
> > > > the existing dependency chain (in reverse order) is:
> > > >
> > > > -> #2 (vm_lock){++++}-{0:0}:
> > > >        __vma_enter_locked+0x260/0x770 mm/mmap_lock.c:72
> > > >        __vma_start_write+0x21/0x160 mm/mmap_lock.c:104
> > > >        vma_start_write include/linux/mmap_lock.h:213 [inline]
> > > >        mprotect_fixup+0x4e3/0xb80 mm/mprotect.c:768
> > > >        setup_arg_pages+0x4a2/0xbb0 fs/exec.c:670
> > > >        load_elf_binary+0xb5b/0x4fe0 fs/binfmt_elf.c:1028
> > > >        search_binary_handler fs/exec.c:1669 [inline]
> > > >        exec_binprm fs/exec.c:1701 [inline]
> > > >        bprm_execve fs/exec.c:1753 [inline]
> > > >        bprm_execve+0x8c2/0x1620 fs/exec.c:1729
> > > >        kernel_execve+0x2ef/0x3b0 fs/exec.c:1919
> > > >        try_to_run_init_process init/main.c:1506 [inline]
> > > >        kernel_init+0x14a/0x2b0 init/main.c:1634
> > > >        ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
> > > >        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> > > >
> > > > -> #1 (&mm->mmap_lock){++++}-{4:4}:
> > > >        __might_fault mm/memory.c:7174 [inline]
> > > >        __might_fault+0x113/0x190 mm/memory.c:7168
> > > >        _copy_to_iter+0x1c2/0x1710 lib/iov_iter.c:196
> > > >        copy_page_to_iter lib/iov_iter.c:374 [inline]
> > > >        copy_page_to_iter+0x12a/0x1e0 lib/iov_iter.c:361
> > > >        copy_folio_to_iter include/linux/uio.h:204 [inline]
> > > >        filemap_read+0x6b1/0xe40 mm/filemap.c:2851
> > > >        blkdev_read_iter+0x1ac/0x500 block/fops.c:856
> > > new_sync_read fs/read_write.c:491 [inline] > > > > vfs_read+0x8bf/0xcf0 fs/read_write.c:572 > > > > ksys_read+0x12a/0x250 fs/read_write.c:715 > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > It looks like: > > #0 is executing PROCMAP_QUERY ioctl, read-locks vm_lock and then calls > > build_id_parse()->__build_id_parse(..., > > may_fault=true)->__kernel_read() which eventually takes > > inode->i_rwsem. > > #1 is a file-backed page fault which asserts that it might take > > mmap_lock for read.
> > #2 is load_elf_binary()->mprotect_fixup() which write-locks both
> > mmap_lock and vm_lock. I'm guessing it already holds inode->i_rwsem
> > before write-locking these locks.
> >
> > Originally I thought the issue is most likely introduced in
> > d9d1c2d81797 ("fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under
> > per-vma locks"). But if #2 indeed takes inode->i_rwsem before
> > write-locking mmap_lock, then the problem should exist even before
> > that change when we didn't use vm_lock and relied on mmap_lock...
> >
> > I'll try to analyze this more before attempting a fix.
>
> I was able to reproduce the same issue even after reverting
> d9d1c2d81797. The deadlock in this case is simpler and involves
> mmap_lock instead of vm_lock (see below).
> Looks like the race is between the read() syscall and do_procmap_query().
> I'll continue investigating, in the meantime CC'ing Andrii.

So, here is a cleaner version of that report (with d9d1c2d81797 reverted):

-> #1 (&mm->mmap_lock){++++}-{4:4}:
       __might_fault+0xed/0x170
       _copy_to_iter+0x118/0x1720
       copy_page_to_iter+0x12d/0x1e0
       filemap_read+0x720/0x10a0
       blkdev_read_iter+0x2b5/0x4e0
       vfs_read+0x7f4/0xae0
       ksys_read+0x12a/0x250
       do_syscall_64+0xcb/0xf80
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
       __lock_acquire+0x1509/0x26d0
       lock_acquire+0x185/0x340
       down_read+0x98/0x490
       blkdev_read_iter+0x2a7/0x4e0
       __kernel_read+0x39a/0xa90
       freader_fetch+0x1d5/0xa80
       __build_id_parse.isra.0+0xea/0x6a0
       do_procmap_query+0xd75/0x1050
       procfs_procmap_ioctl+0x7a/0xb0
       __x64_sys_ioctl+0x18e/0x210
       do_syscall_64+0xcb/0xf80
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(&mm->mmap_lock);
                               lock(&sb->s_type->i_mutex_key#8);
                               lock(&mm->mmap_lock);
  rlock(&sb->s_type->i_mutex_key#8);

 *** DEADLOCK ***

Both threads are calling blkdev_read_iter(), which uses
inode_lock_shared() to read-lock inode->i_rwsem.
I'm not sure why CPU1 shows lock() instead of rlock(). So both threads read-lock inode->i_rwsem and mmap_lock but in a different order. IIUC, with read-locks this should not deadlock until some other thread write-locks the mmap_lock in between and this becomes a real deadlock: CPU0 CPU1 CPU2 ---- ---- ---- rlock(&mm->mmap_lock); rlock(&sb->s_type->i_mutex_key#8); wlock(&mm->mmap_lock) <-- waiting for CPU0 rlock(&mm->mmap_lock); <-- waiting for CPU1 rlock(&sb->s_type->i_mutex_key#8); <-- waiting for CPU2 I believe in the original report this write-locking thread was the one calling mprotect_fixup(). Per https://docs.kernel.org/mm/process_addrs.html#lock-ordering, inode->i_rwsem should be locked before mm->mmap_lock, so procfs_procmap_ioctl() has to be fixed to follow this lock ordering. One possibility I can think of is to use build_id_parse_nofault() first and if it fails because the required page is not faulted, we do freader_init_from_file(), then drop the mmap/vma lock and execute freader_fetch() outside of these locks to fault in that page. Once that's done, we'll retry the whole operation and this time build_id_parse_nofault() should pass (unless we already evicted that page, which is extremely unlikely and in that case, we'll retry again). I tried a POC with build_id_parse_nofault() but without the whole dance with freader_init_from_file/freader_fetch and the deadlock is gone. Andrii, WDYT? 
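The reports above boil down to a cycle in lockdep's graph of lock classes: an edge A -> B is recorded whenever B is taken while A is held, and a new edge that would close a loop triggers the warning. The C sketch below is a toy model of just that idea, not lockdep's implementation (which also distinguishes read/write acquisitions, IRQ contexts and much more); the enum, the dep[] matrix and record_dependency() are hypothetical names. It replays the chain from the original syzbot report: i_rwsem -> mmap_lock (page fault during read()), mmap_lock -> vm_lock (mprotect_fixup()), then vm_lock -> i_rwsem (build ID read in PROCMAP_QUERY), which is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { I_RWSEM, MMAP_LOCK, VM_LOCK, NR_CLASSES };

/* dep[a][b] is true once class b has been acquired while class a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool reachable(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'acquired' while holding 'held'". Returns false, without
 * recording anything, if 'held' is already reachable from 'acquired': the
 * new edge would close a cycle, which is what lockdep reports as a
 * possible circular locking dependency.
 */
static bool record_dependency(int held, int acquired)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(acquired >= 0 && acquired < NR_CLASSES);
	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}

/* Forget all recorded edges (e.g. between independent scenarios). */
static void reset_dependencies(void)
{
	memset(dep, 0, sizeof(dep));
}
```

For example, after record_dependency(I_RWSEM, MMAP_LOCK) and record_dependency(MMAP_LOCK, VM_LOCK) succeed, record_dependency(VM_LOCK, I_RWSEM) returns false. Reading the build ID only after the VMA/mmap lock has been dropped means that last edge is never recorded in the first place.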
> > [ 62.320932][ T9229] > [ 62.321471][ T9229] ====================================================== > [ 62.323016][ T9229] WARNING: possible circular locking dependency detected > [ 62.324618][ T9229] 6.19.0-rc6-00001-g40bea6261b2a #42 Not tainted > [ 62.326013][ T9229] ------------------------------------------------------ > [ 62.327560][ T9229] hillf/9229 is trying to acquire lock: > [ 62.328821][ T9229] ffff888145b7b5a8 > (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: > blkdev_read_iter+0x2a7/0x4e0 > [ 62.331102][ T9229] > [ 62.331102][ T9229] but task is already holding lock: > [ 62.332722][ T9229] ffff888183a6e540 (&mm->mmap_lock){++++}-{4:4}, > at: do_procmap_query+0x39f/0x1050 > [ 62.334795][ T9229] > [ 62.334795][ T9229] which lock already depends on the new lock. > [ 62.334795][ T9229] > [ 62.337072][ T9229] > [ 62.337072][ T9229] the existing dependency chain (in reverse order) is: > [ 62.338998][ T9229] > [ 62.338998][ T9229] -> #1 (&mm->mmap_lock){++++}-{4:4}: > [ 62.340646][ T9229] __might_fault+0xed/0x170 > [ 62.341763][ T9229] _copy_to_iter+0x118/0x1720 > [ 62.342913][ T9229] copy_page_to_iter+0x12d/0x1e0 > [ 62.344167][ T9229] filemap_read+0x720/0x10a0 > [ 62.345298][ T9229] blkdev_read_iter+0x2b5/0x4e0 > [ 62.346480][ T9229] vfs_read+0x7f4/0xae0 > [ 62.347518][ T9229] ksys_read+0x12a/0x250 > [ 62.348584][ T9229] do_syscall_64+0xcb/0xf80 > [ 62.349707][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > [ 62.351116][ T9229] > [ 62.351116][ T9229] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > [ 62.353012][ T9229] __lock_acquire+0x1509/0x26d0 > [ 62.354213][ T9229] lock_acquire+0x185/0x340 > [ 62.355323][ T9229] down_read+0x98/0x490 > [ 62.356441][ T9229] blkdev_read_iter+0x2a7/0x4e0 > [ 62.357619][ T9229] __kernel_read+0x39a/0xa90 > [ 62.358767][ T9229] freader_fetch+0x1d5/0xa80 > [ 62.359927][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > [ 62.361232][ T9229] do_procmap_query+0xd75/0x1050 > [ 62.362434][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > [ 
62.363687][ T9229] __x64_sys_ioctl+0x18e/0x210 > [ 62.364863][ T9229] do_syscall_64+0xcb/0xf80 > [ 62.365977][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > [ 62.367394][ T9229] > [ 62.367394][ T9229] other info that might help us debug this: > [ 62.367394][ T9229] > [ 62.369637][ T9229] Possible unsafe locking scenario: > [ 62.369637][ T9229] > [ 62.371237][ T9229] CPU0 CPU1 > [ 62.372441][ T9229] ---- ---- > [ 62.373687][ T9229] rlock(&mm->mmap_lock); > [ 62.374688][ T9229] > lock(&sb->s_type->i_mutex_key#8); > [ 62.376444][ T9229] lock(&mm->mmap_lock); > [ 62.377956][ T9229] rlock(&sb->s_type->i_mutex_key#8); > [ 62.379165][ T9229] > [ 62.379165][ T9229] *** DEADLOCK *** > [ 62.379165][ T9229] > [ 62.380952][ T9229] 1 lock held by hillf/9229: > [ 62.381971][ T9229] #0: ffff888183a6e540 > (&mm->mmap_lock){++++}-{4:4}, at: do_procmap_query+0x39f/0x1050 > [ 62.384162][ T9229] > [ 62.384162][ T9229] stack backtrace: > [ 62.385458][ T9229] CPU: 3 UID: 0 PID: 9229 Comm: hillf Not tainted > 6.19.0-rc6-00001-g40bea6261b2a #42 PREEMPT(full) > [ 62.385471][ T9229] Hardware name: QEMU Standard PC (i440FX + PIIX, > 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 > [ 62.385477][ T9229] Call Trace: > [ 62.385482][ T9229] <TASK> > [ 62.385487][ T9229] dump_stack_lvl+0x100/0x190 > [ 62.385505][ T9229] print_circular_bug.cold+0x185/0x1d5 > [ 62.385521][ T9229] check_noncircular+0x14a/0x170 > [ 62.385534][ T9229] __lock_acquire+0x1509/0x26d0 > [ 62.385547][ T9229] lock_acquire+0x185/0x340 > [ 62.385557][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > [ 62.385569][ T9229] ? __pfx___might_resched+0x10/0x10 > [ 62.385583][ T9229] down_read+0x98/0x490 > [ 62.385593][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > [ 62.385603][ T9229] ? __pfx_down_read+0x10/0x10 > [ 62.385612][ T9229] ? lock_acquire+0x185/0x340 > [ 62.385622][ T9229] ? is_bpf_text_address+0x25/0x1a0 > [ 62.385634][ T9229] blkdev_read_iter+0x2a7/0x4e0 > [ 62.385645][ T9229] __kernel_read+0x39a/0xa90 > [ 62.385658][ T9229] ? 
__pfx___kernel_read+0x10/0x10 > [ 62.385671][ T9229] ? __lock_acquire+0x481/0x26d0 > [ 62.385683][ T9229] freader_fetch+0x1d5/0xa80 > [ 62.385697][ T9229] ? find_held_lock+0x2b/0x80 > [ 62.385712][ T9229] ? __pfx_freader_fetch+0x10/0x10 > [ 62.385725][ T9229] ? __asan_memset+0x27/0x50 > [ 62.385737][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > [ 62.385751][ T9229] ? __pfx___build_id_parse.isra.0+0x10/0x10 > [ 62.385766][ T9229] ? __pfx_find_vma+0x10/0x10 > [ 62.385774][ T9229] ? __might_fault+0x129/0x170 > [ 62.385788][ T9229] do_procmap_query+0xd75/0x1050 > [ 62.385798][ T9229] ? __pfx_do_procmap_query+0x10/0x10 > [ 62.385807][ T9229] ? __sanitizer_cov_trace_switch+0x53/0x90 > [ 62.385817][ T9229] ? do_vfs_ioctl+0x226/0x13b0 > [ 62.385828][ T9229] ? __pfx_do_vfs_ioctl+0x10/0x10 > [ 62.385839][ T9229] ? putname+0xfc/0x1b0 > [ 62.385846][ T9229] ? putname+0x101/0x1b0 > [ 62.385857][ T9229] ? __x64_sys_openat+0x143/0x210 > [ 62.385867][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > [ 62.385877][ T9229] ? 
__pfx_procfs_procmap_ioctl+0x10/0x10 > [ 62.385888][ T9229] __x64_sys_ioctl+0x18e/0x210 > [ 62.385899][ T9229] do_syscall_64+0xcb/0xf80 > [ 62.385913][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > [ 62.385923][ T9229] RIP: 0033:0x412209 > [ 62.385931][ T9229] Code: c0 79 93 eb d5 48 8d 7c 1d 00 eb 99 0f 1f > 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b > 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 d8 ff ff ff f7 d8 > 64 89 01 48 > [ 62.385940][ T9229] RSP: 002b:00007fff380d5588 EFLAGS: 00000217 > ORIG_RAX: 0000000000000010 > [ 62.385950][ T9229] RAX: ffffffffffffffda RBX: 00007fff380d56c8 > RCX: 0000000000412209 > [ 62.385956][ T9229] RDX: 0000200000000180 RSI: 00000000c0686611 > RDI: 0000000000000004 > [ 62.385962][ T9229] RBP: 00007fff380d55a0 R08: 0000000000000000 > R09: 00007fff380d5640 > [ 62.385968][ T9229] R10: 0000000000000000 R11: 0000000000000217 > R12: 00007fff380d56b8 > [ 62.385974][ T9229] R13: 0000000000000002 R14: 00000000004a0e40 > R15: 0000000000000002 > [ 62.385982][ T9229] </TASK> > > > > > > > > > other info that might help us debug this: > > > > > > > > Chain exists of: > > > > &sb->s_type->i_mutex_key#8 --> &mm->mmap_lock --> vm_lock > > > > > > > > Possible unsafe locking scenario: > > > > > > > > CPU0 CPU1 > > > > ---- ---- > > > > rlock(vm_lock); > > > > lock(&mm->mmap_lock); > > > > lock(vm_lock); > > > > rlock(&sb->s_type->i_mutex_key#8); > > > > > > > > *** DEADLOCK *** > > > > > > > > 1 lock held by syz.0.17/6091: > > > > #0: ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > stack backtrace: > > > > CPU: 2 UID: 0 PID: 6091 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > > > Call Trace: > > > > <TASK> > > > > __dump_stack lib/dump_stack.c:94 [inline] > > > > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 > > > > 
print_circular_bug+0x275/0x340 kernel/locking/lockdep.c:2043 > > > > check_noncircular+0x146/0x160 kernel/locking/lockdep.c:2175 > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > RIP: 0033:0x7ff1a238f7c9 > > > > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > > > > RSP: 002b:00007ffebbe538b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > > > RAX: ffffffffffffffda RBX: 00007ff1a25e5fa0 RCX: 00007ff1a238f7c9 > > > > RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000004 > > > > RBP: 00007ff1a2413f91 R08: 0000000000000000 R09: 0000000000000000 > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > > R13: 00007ff1a25e5fa0 R14: 00007ff1a25e5fa0 
R15: 0000000000000003 > > > > </TASK>
* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter 2026-01-27 2:22 ` Suren Baghdasaryan @ 2026-01-27 18:51 ` Andrii Nakryiko 2026-01-27 23:52 ` Suren Baghdasaryan 0 siblings, 1 reply; 7+ messages in thread From: Andrii Nakryiko @ 2026-01-27 18:51 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Hillf Danton, syzbot, axboe, linux-block, Lorenzo Stoakes, linux-mm, linux-kernel, syzkaller-bugs On Mon, Jan 26, 2026 at 6:22 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Mon, Jan 26, 2026 at 2:33 PM Suren Baghdasaryan <surenb@google.com> wrote: > > > > On Mon, Jan 26, 2026 at 9:20 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > On Sat, Jan 24, 2026 at 3:32 AM Hillf Danton <hdanton@sina.com> wrote: > > > > > > > > Add Lorenzo and Suren > > > > > > Thanks! > > > > > > > > > > > > Date: Fri, 23 Jan 2026 15:14:36 -0800 > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: 24d479d26b25 Linux 6.19-rc6 > > > > > git tree: upstream > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=100033fa580000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=4e70c8e0a2017b432f7a > > > > > compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11451b9a580000 > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1045e852580000 > > > > > > > > > > Downloadable assets: > > > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-24d479d2.raw.xz > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/d0f3c47f6869/vmlinux-24d479d2.xz > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/800231513703/bzImage-24d479d2.xz > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > 
> > > Reported-by: syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com > > > > > > > > > > WARNING: possible circular locking dependency detected > > > > > syzkaller #0 Not tainted > > > > > ------------------------------------------------------ > > > > > syz.0.17/6091 is trying to acquire lock: > > > > > ffff8881061287a8 ( > > > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > > > > > but task is already holding lock: > > > > > ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > > > which lock already depends on the new lock. > > > > > > > > > > > > > > > the existing dependency chain (in reverse order) is: > > > > > > > > > > -> #2 (vm_lock){++++}-{0:0}: > > > > > __vma_enter_locked+0x260/0x770 mm/mmap_lock.c:72 > > > > > __vma_start_write+0x21/0x160 mm/mmap_lock.c:104 > > > > > vma_start_write include/linux/mmap_lock.h:213 [inline] > > > > > mprotect_fixup+0x4e3/0xb80 mm/mprotect.c:768 > > > > > setup_arg_pages+0x4a2/0xbb0 fs/exec.c:670 > > > > > load_elf_binary+0xb5b/0x4fe0 fs/binfmt_elf.c:1028 > > > > > search_binary_handler fs/exec.c:1669 [inline] > > > > > exec_binprm fs/exec.c:1701 [inline] > > > > > bprm_execve fs/exec.c:1753 [inline] > > > > > bprm_execve+0x8c2/0x1620 fs/exec.c:1729 > > > > > kernel_execve+0x2ef/0x3b0 fs/exec.c:1919 > > > > > try_to_run_init_process init/main.c:1506 [inline] > > > > > kernel_init+0x14a/0x2b0 init/main.c:1634 > > > > > ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 > > > > > > > > > > -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > > > __might_fault mm/memory.c:7174 [inline] > > > > > __might_fault+0x113/0x190 mm/memory.c:7168 > > > > > _copy_to_iter+0x1c2/0x1710 lib/iov_iter.c:196 > > > > > copy_page_to_iter lib/iov_iter.c:374 [inline] > 
> > > > copy_page_to_iter+0x12a/0x1e0 lib/iov_iter.c:361 > > > > > copy_folio_to_iter include/linux/uio.h:204 [inline] > > > > > filemap_read+0x6b1/0xe40 mm/filemap.c:2851 > > > > > blkdev_read_iter+0x1ac/0x500 block/fops.c:856 > > > > > new_sync_read fs/read_write.c:491 [inline] > > > > > vfs_read+0x8bf/0xcf0 fs/read_write.c:572 > > > > > ksys_read+0x12a/0x250 fs/read_write.c:715 > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > > > -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > > > > It looks like: > > > #0 is executing PROCMAP_QUERY ioctl, read-locks vm_lock and then calls > > >
build_id_parse()->__build_id_parse(..., > > > may_fault=true)->__kernel_read() which eventually takes > > > inode->i_rwsem. > > > #1 is a file-backed page fault which asserts that it might take > > > mmap_lock for read. > > > #2 is load_elf_binary()->mprotect_fixup() which write-locks both > > > mmap_lock and vm_lock. I'm guessing it already holds inode->i_rwsem > > > before write-locking these locks. > > > > > > Originally I thought the issue is most likely introduced in > > > d9d1c2d81797 ("fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under > > > per-vma locks"). But if #2 indeed takes inode->i_rwsem before > > > write-locking mmap_lock, then the problem should exist even before > > > that change when we didn't use vm_lock and relied on mmap_lock... > > > > > > I'll try to analyze this more before attempting a fix. > > > > I was able to reproduce the same issue even after reverting > > d9d1c2d81797. The deadlock in this case is simpler and involves > > mmap_lock instead of vm_lock (see below). > > Looks like the race is between the read() syscall and do_procmap_query(). > > I'll continue investigating, in the meantime CC'ing Andrii.
> > So, here is a cleaner version of that report (with d9d1c2d81797 reverted): > > -> #1 (&mm->mmap_lock){++++}-{4:4}: > __might_fault+0xed/0x170 > _copy_to_iter+0x118/0x1720 > copy_page_to_iter+0x12d/0x1e0 > filemap_read+0x720/0x10a0 > blkdev_read_iter+0x2b5/0x4e0 > vfs_read+0x7f4/0xae0 > ksys_read+0x12a/0x250 > do_syscall_64+0xcb/0xf80 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > __lock_acquire+0x1509/0x26d0 > lock_acquire+0x185/0x340 > down_read+0x98/0x490 > blkdev_read_iter+0x2a7/0x4e0 > __kernel_read+0x39a/0xa90 > freader_fetch+0x1d5/0xa80 > __build_id_parse.isra.0+0xea/0x6a0 > do_procmap_query+0xd75/0x1050 > procfs_procmap_ioctl+0x7a/0xb0 > __x64_sys_ioctl+0x18e/0x210 > do_syscall_64+0xcb/0xf80 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > rlock(&mm->mmap_lock); > lock(&sb->s_type->i_mutex_key#8); > lock(&mm->mmap_lock); > rlock(&sb->s_type->i_mutex_key#8); > > *** DEADLOCK *** > > Both threads are calling blkdev_read_iter(), which uses > inode_lock_shared() to read-lock inode->i_rwsem. I'm not sure why CPU1 > shows lock() instead of rlock(). So both threads read-lock > inode->i_rwsem and mmap_lock but in a different order. IIUC, with > read-locks this should not deadlock until some other thread > write-locks the mmap_lock in between and this becomes a real deadlock: > > CPU0 CPU1 CPU2 > ---- ---- ---- > rlock(&mm->mmap_lock); > rlock(&sb->s_type->i_mutex_key#8); > wlock(&mm->mmap_lock) <-- waiting for CPU0 > rlock(&mm->mmap_lock); <-- waiting for CPU1 > rlock(&sb->s_type->i_mutex_key#8); <-- waiting for CPU2 > > I believe in the original report this write-locking thread was the one > calling mprotect_fixup(). 
> > Per https://docs.kernel.org/mm/process_addrs.html#lock-ordering, > inode->i_rwsem should be locked before mm->mmap_lock, so > procfs_procmap_ioctl() has to be fixed to follow this lock ordering. > One possibility I can think of is to use build_id_parse_nofault() > first and if it fails because the required page is not faulted, we do > freader_init_from_file(), then drop the mmap/vma lock and execute > freader_fetch() outside of these locks to fault in that page. Once > that's done, we'll retry the whole operation and this time > build_id_parse_nofault() should pass (unless we already evicted that > page, which is extremely unlikely and in that case, we'll retry > again). > > I tried a POC with build_id_parse_nofault() but without the whole > dance with freader_init_from_file/freader_fetch and the deadlock is > gone. Andrii, WDYT? I don't like it :) Too much complexity, _nofault() variant only makes sense for BPF in non-sleepable contexts. I think this can be fixed simpler and cleaner. We don't need to hold VMA lock while fetching build ID. Build ID works with vma's vm_file, so we can just get its reference, drop vma lock, then fetch build id. Below diff passes our BPF selftests. Might need to think about a bit leaner code changes, but the idea should be clear. Diff below will be butchered by gmail, but you can fetch it at [0]. Do you mind validating that deadlock is gone? Thanks! 
[0] https://git.kernel.org/pub/scm/linux/kernel/git/andrii/bpf-next.git/commit/?h=procmap-query-vma-deadlock-fix&id=7faf95b63a8a7ac6e78b6d90101c94bfa6ecdfd1

Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Jan 27 10:46:04 2026 -0800

    procfs: avoid fetching build ID while holding VMA lock

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 81dfc26bfae8..564bf82e3731 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -656,6 +656,7 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg)
 	struct proc_maps_locking_ctx lock_ctx = { .mm = mm };
 	struct procmap_query karg;
 	struct vm_area_struct *vma;
+	struct file *vm_file = NULL;
 	const char *name = NULL;
 	char build_id_buf[BUILD_ID_SIZE_MAX], *name_buf = NULL;
 	__u64 usize;
@@ -720,6 +721,9 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg)
 		karg.dev_major = MAJOR(inode->i_sb->s_dev);
 		karg.dev_minor = MINOR(inode->i_sb->s_dev);
 		karg.inode = inode->i_ino;
+
+		if (karg.build_id_size)
+			vm_file = get_file(vma->vm_file);
 	} else {
 		karg.vma_offset = 0;
 		karg.dev_major = 0;
@@ -727,21 +731,6 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg)
 		karg.inode = 0;
 	}
 
-	if (karg.build_id_size) {
-		__u32 build_id_sz;
-
-		err = build_id_parse(vma, build_id_buf, &build_id_sz);
-		if (err) {
-			karg.build_id_size = 0;
-		} else {
-			if (karg.build_id_size < build_id_sz) {
-				err = -ENAMETOOLONG;
-				goto out;
-			}
-			karg.build_id_size = build_id_sz;
-		}
-	}
-
 	if (karg.vma_name_size) {
 		size_t name_buf_sz = min_t(size_t, PATH_MAX, karg.vma_name_size);
 		const struct path *path;
@@ -779,6 +768,28 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg)
 	query_vma_teardown(&lock_ctx);
 	mmput(mm);
 
+	if (karg.build_id_size) {
+		__u32 build_id_sz;
+
+		err = -ENOENT;
+		if (vm_file)
+			err = build_id_parse_file(vm_file, build_id_buf, &build_id_sz);
+		if (err) {
+			karg.build_id_size = 0;
+		} else {
+			if (karg.build_id_size < build_id_sz) {
+				err = -ENAMETOOLONG;
+				goto out;
+			}
+			karg.build_id_size = build_id_sz;
+		}
+	}
+
+	if (vm_file) {
+		fput(vm_file);
+		vm_file = NULL;
+	}
+
 	if (karg.vma_name_size && copy_to_user(u64_to_user_ptr(karg.vma_name_addr),
 					       name, karg.vma_name_size)) {
 		kfree(name_buf);
@@ -797,6 +808,8 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg)
 
 out:
 	query_vma_teardown(&lock_ctx);
+	if (vm_file)
+		fput(vm_file);
 	mmput(mm);
 	kfree(name_buf);
 	return err;
diff --git a/include/linux/buildid.h b/include/linux/buildid.h
index 831c1b4b626c..7acc06b22fb7 100644
--- a/include/linux/buildid.h
+++ b/include/linux/buildid.h
@@ -7,7 +7,10 @@
 #define BUILD_ID_SIZE_MAX 20
 
 struct vm_area_struct;
+struct file;
+
 int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size);
+int build_id_parse_file(struct file *file, unsigned char *build_id, __u32 *size);
 int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size);
 int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
 
diff --git a/lib/buildid.c b/lib/buildid.c
index aaf61dfc0919..c0002129d526 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -271,7 +271,7 @@ static int get_build_id_64(struct freader *r, unsigned char *build_id, __u32 *si
 /* enough for Elf64_Ehdr, Elf64_Phdr, and all the smaller requests */
 #define MAX_FREADER_BUF_SZ 64
 
-static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
+static int __build_id_parse(struct file *file, unsigned char *build_id,
 			    __u32 *size, bool may_fault)
 {
 	const Elf32_Ehdr *ehdr;
@@ -279,11 +279,7 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
 	char buf[MAX_FREADER_BUF_SZ];
 	int ret;
 
-	/* only works for page backed storage */
-	if (!vma->vm_file)
-		return -EINVAL;
-
-	freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
+	freader_init_from_file(&r, buf, sizeof(buf), file, may_fault);
 
 	/* fetch first 18 bytes of ELF header for checks */
 	ehdr = freader_fetch(&r, 0, offsetofend(Elf32_Ehdr, e_type));
@@ -324,7 +320,11 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
  */
 int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
 {
-	return __build_id_parse(vma, build_id, size, false /* !may_fault */);
+	/* only works for page backed storage */
+	if (!vma->vm_file)
+		return -EINVAL;
+
+	return __build_id_parse(vma->vm_file, build_id, size, false /* !may_fault */);
 }
 
 /*
@@ -340,7 +340,16 @@ int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id,
  */
 int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
 {
-	return __build_id_parse(vma, build_id, size, true /* may_fault */);
+	/* only works for page backed storage */
+	if (!vma->vm_file)
+		return -EINVAL;
+
+	return __build_id_parse(vma->vm_file, build_id, size, true /* may_fault */);
+}
+
+int build_id_parse_file(struct file *file, unsigned char *build_id, __u32 *size)
+{
+	return __build_id_parse(file, build_id, size, true /* may_fault */);
 }
 
 /**

> > > > > [ 62.320932][ T9229] > > [ 62.321471][ T9229] ====================================================== > > [ 62.323016][ T9229] WARNING: possible circular locking dependency detected > > [ 62.324618][ T9229] 6.19.0-rc6-00001-g40bea6261b2a #42 Not tainted > > [ 62.326013][ T9229] ------------------------------------------------------ > > [ 62.327560][ T9229] hillf/9229 is trying to acquire lock: > > [ 62.328821][ T9229] ffff888145b7b5a8 > > (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: > > blkdev_read_iter+0x2a7/0x4e0 > > [ 62.331102][ T9229] > > [ 62.331102][ T9229] but task is already holding lock: > > [ 62.332722][ T9229] ffff888183a6e540 (&mm->mmap_lock){++++}-{4:4}, > > at: do_procmap_query+0x39f/0x1050 > > [ 62.334795][ T9229] > > [ 62.334795][ T9229] which lock already depends on the new lock.
> > [ 62.334795][ T9229] > > [ 62.337072][ T9229] > > [ 62.337072][ T9229] the existing dependency chain (in reverse order) is: > > [ 62.338998][ T9229] > > [ 62.338998][ T9229] -> #1 (&mm->mmap_lock){++++}-{4:4}: > > [ 62.340646][ T9229] __might_fault+0xed/0x170 > > [ 62.341763][ T9229] _copy_to_iter+0x118/0x1720 > > [ 62.342913][ T9229] copy_page_to_iter+0x12d/0x1e0 > > [ 62.344167][ T9229] filemap_read+0x720/0x10a0 > > [ 62.345298][ T9229] blkdev_read_iter+0x2b5/0x4e0 > > [ 62.346480][ T9229] vfs_read+0x7f4/0xae0 > > [ 62.347518][ T9229] ksys_read+0x12a/0x250 > > [ 62.348584][ T9229] do_syscall_64+0xcb/0xf80 > > [ 62.349707][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > [ 62.351116][ T9229] > > [ 62.351116][ T9229] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > [ 62.353012][ T9229] __lock_acquire+0x1509/0x26d0 > > [ 62.354213][ T9229] lock_acquire+0x185/0x340 > > [ 62.355323][ T9229] down_read+0x98/0x490 > > [ 62.356441][ T9229] blkdev_read_iter+0x2a7/0x4e0 > > [ 62.357619][ T9229] __kernel_read+0x39a/0xa90 > > [ 62.358767][ T9229] freader_fetch+0x1d5/0xa80 > > [ 62.359927][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > > [ 62.361232][ T9229] do_procmap_query+0xd75/0x1050 > > [ 62.362434][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > > [ 62.363687][ T9229] __x64_sys_ioctl+0x18e/0x210 > > [ 62.364863][ T9229] do_syscall_64+0xcb/0xf80 > > [ 62.365977][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > [ 62.367394][ T9229] > > [ 62.367394][ T9229] other info that might help us debug this: > > [ 62.367394][ T9229] > > [ 62.369637][ T9229] Possible unsafe locking scenario: > > [ 62.369637][ T9229] > > [ 62.371237][ T9229] CPU0 CPU1 > > [ 62.372441][ T9229] ---- ---- > > [ 62.373687][ T9229] rlock(&mm->mmap_lock); > > [ 62.374688][ T9229] > > lock(&sb->s_type->i_mutex_key#8); > > [ 62.376444][ T9229] lock(&mm->mmap_lock); > > [ 62.377956][ T9229] rlock(&sb->s_type->i_mutex_key#8); > > [ 62.379165][ T9229] > > [ 62.379165][ T9229] *** DEADLOCK *** > > [ 
62.379165][ T9229] > > [ 62.380952][ T9229] 1 lock held by hillf/9229: > > [ 62.381971][ T9229] #0: ffff888183a6e540 > > (&mm->mmap_lock){++++}-{4:4}, at: do_procmap_query+0x39f/0x1050 > > [ 62.384162][ T9229] > > [ 62.384162][ T9229] stack backtrace: > > [ 62.385458][ T9229] CPU: 3 UID: 0 PID: 9229 Comm: hillf Not tainted > > 6.19.0-rc6-00001-g40bea6261b2a #42 PREEMPT(full) > > [ 62.385471][ T9229] Hardware name: QEMU Standard PC (i440FX + PIIX, > > 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 > > [ 62.385477][ T9229] Call Trace: > > [ 62.385482][ T9229] <TASK> > > [ 62.385487][ T9229] dump_stack_lvl+0x100/0x190 > > [ 62.385505][ T9229] print_circular_bug.cold+0x185/0x1d5 > > [ 62.385521][ T9229] check_noncircular+0x14a/0x170 > > [ 62.385534][ T9229] __lock_acquire+0x1509/0x26d0 > > [ 62.385547][ T9229] lock_acquire+0x185/0x340 > > [ 62.385557][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > > [ 62.385569][ T9229] ? __pfx___might_resched+0x10/0x10 > > [ 62.385583][ T9229] down_read+0x98/0x490 > > [ 62.385593][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > > [ 62.385603][ T9229] ? __pfx_down_read+0x10/0x10 > > [ 62.385612][ T9229] ? lock_acquire+0x185/0x340 > > [ 62.385622][ T9229] ? is_bpf_text_address+0x25/0x1a0 > > [ 62.385634][ T9229] blkdev_read_iter+0x2a7/0x4e0 > > [ 62.385645][ T9229] __kernel_read+0x39a/0xa90 > > [ 62.385658][ T9229] ? __pfx___kernel_read+0x10/0x10 > > [ 62.385671][ T9229] ? __lock_acquire+0x481/0x26d0 > > [ 62.385683][ T9229] freader_fetch+0x1d5/0xa80 > > [ 62.385697][ T9229] ? find_held_lock+0x2b/0x80 > > [ 62.385712][ T9229] ? __pfx_freader_fetch+0x10/0x10 > > [ 62.385725][ T9229] ? __asan_memset+0x27/0x50 > > [ 62.385737][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > > [ 62.385751][ T9229] ? __pfx___build_id_parse.isra.0+0x10/0x10 > > [ 62.385766][ T9229] ? __pfx_find_vma+0x10/0x10 > > [ 62.385774][ T9229] ? __might_fault+0x129/0x170 > > [ 62.385788][ T9229] do_procmap_query+0xd75/0x1050 > > [ 62.385798][ T9229] ? 
__pfx_do_procmap_query+0x10/0x10 > > [ 62.385807][ T9229] ? __sanitizer_cov_trace_switch+0x53/0x90 > > [ 62.385817][ T9229] ? do_vfs_ioctl+0x226/0x13b0 > > [ 62.385828][ T9229] ? __pfx_do_vfs_ioctl+0x10/0x10 > > [ 62.385839][ T9229] ? putname+0xfc/0x1b0 > > [ 62.385846][ T9229] ? putname+0x101/0x1b0 > > [ 62.385857][ T9229] ? __x64_sys_openat+0x143/0x210 > > [ 62.385867][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > > [ 62.385877][ T9229] ? __pfx_procfs_procmap_ioctl+0x10/0x10 > > [ 62.385888][ T9229] __x64_sys_ioctl+0x18e/0x210 > > [ 62.385899][ T9229] do_syscall_64+0xcb/0xf80 > > [ 62.385913][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > [ 62.385923][ T9229] RIP: 0033:0x412209 > > [ 62.385931][ T9229] Code: c0 79 93 eb d5 48 8d 7c 1d 00 eb 99 0f 1f > > 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b > > 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 d8 ff ff ff f7 d8 > > 64 89 01 48 > > [ 62.385940][ T9229] RSP: 002b:00007fff380d5588 EFLAGS: 00000217 > > ORIG_RAX: 0000000000000010 > > [ 62.385950][ T9229] RAX: ffffffffffffffda RBX: 00007fff380d56c8 > > RCX: 0000000000412209 > > [ 62.385956][ T9229] RDX: 0000200000000180 RSI: 00000000c0686611 > > RDI: 0000000000000004 > > [ 62.385962][ T9229] RBP: 00007fff380d55a0 R08: 0000000000000000 > > R09: 00007fff380d5640 > > [ 62.385968][ T9229] R10: 0000000000000000 R11: 0000000000000217 > > R12: 00007fff380d56b8 > > [ 62.385974][ T9229] R13: 0000000000000002 R14: 00000000004a0e40 > > R15: 0000000000000002 > > [ 62.385982][ T9229] </TASK> > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > Chain exists of: > > > > > &sb->s_type->i_mutex_key#8 --> &mm->mmap_lock --> vm_lock > > > > > > > > > > Possible unsafe locking scenario: > > > > > > > > > > CPU0 CPU1 > > > > > ---- ---- > > > > > rlock(vm_lock); > > > > > lock(&mm->mmap_lock); > > > > > lock(vm_lock); > > > > > rlock(&sb->s_type->i_mutex_key#8); > > > > > > > > > > *** DEADLOCK *** > > > > > > > > > > 
1 lock held by syz.0.17/6091: > > > > > #0: ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > > > stack backtrace: > > > > > CPU: 2 UID: 0 PID: 6091 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > > > > Call Trace: > > > > > <TASK> > > > > > __dump_stack lib/dump_stack.c:94 [inline] > > > > > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 > > > > > print_circular_bug+0x275/0x340 kernel/locking/lockdep.c:2043 > > > > > check_noncircular+0x146/0x160 kernel/locking/lockdep.c:2175 > > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > RIP: 0033:0x7ff1a238f7c9 > > > > > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > > > > > RSP: 002b:00007ffebbe538b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > > > > RAX: ffffffffffffffda RBX: 00007ff1a25e5fa0 RCX: 00007ff1a238f7c9 > > > > > RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000004 > > > > > RBP: 00007ff1a2413f91 R08: 0000000000000000 R09: 0000000000000000 > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > > > R13: 00007ff1a25e5fa0 R14: 00007ff1a25e5fa0 R15: 0000000000000003 > > > > > </TASK>
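Lockdep flags this report because taking i_rwsem while vm_lock is held would close a cycle in its lock-class dependency graph ("Chain exists of: &sb->s_type->i_mutex_key#8 --> &mm->mmap_lock --> vm_lock"). A minimal userspace sketch of that check follows, assuming lock classes are identified by hypothetical string names ("i_rwsem", "mmap_lock", "vm_lock") and that an edge A -> B is recorded whenever B is acquired while A is held. It is a toy model for illustration only; real lockdep also tracks read/write modes, IRQ contexts and much more.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of a lock-class dependency graph (not the kernel's lockdep).
 * Node = lock class name; edge[a][b] means "b was taken while a was held". */
#define MAX_CLASSES 8

static const char *classes[MAX_CLASSES];
static int nclasses;
static bool edge[MAX_CLASSES][MAX_CLASSES];

/* Look up a class by name, registering it on first use. */
static int class_id(const char *name)
{
	for (int i = 0; i < nclasses; i++)
		if (strcmp(classes[i], name) == 0)
			return i;
	assert(nclasses < MAX_CLASSES);
	classes[nclasses] = name;
	return nclasses++;
}

/* Depth-first search: is class 'to' reachable from class 'from'? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < nclasses; next++)
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Record "acquire 'next' while holding 'held'". If 'held' is already
 * reachable from 'next', the new edge would close a cycle: report it by
 * returning false and do not record the edge. */
static bool add_dependency(const char *held, const char *next)
{
	int a = class_id(held), b = class_id(next);
	bool seen[MAX_CLASSES] = { false };

	if (reachable(b, a, seen))
		return false;	/* possible circular locking dependency */
	edge[a][b] = true;
	return true;
}
```

Feeding it the three edges from the report in order (i_rwsem -> mmap_lock from read() faulting in filemap_read(), mmap_lock -> vm_lock from mprotect_fixup(), then vm_lock -> i_rwsem from the PROCMAP_QUERY build ID read), the last call returns false, which is the toy equivalent of the "possible circular locking dependency" warning. Taking i_rwsem before vm_lock instead keeps the graph acyclic.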
* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter 2026-01-27 18:51 ` Andrii Nakryiko @ 2026-01-27 23:52 ` Suren Baghdasaryan 2026-01-28 3:41 ` Suren Baghdasaryan 0 siblings, 1 reply; 7+ messages in thread From: Suren Baghdasaryan @ 2026-01-27 23:52 UTC (permalink / raw) To: Andrii Nakryiko Cc: Hillf Danton, syzbot, axboe, linux-block, Lorenzo Stoakes, linux-mm, linux-kernel, syzkaller-bugs On Tue, Jan 27, 2026 at 10:51 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Mon, Jan 26, 2026 at 6:22 PM Suren Baghdasaryan <surenb@google.com> wrote: > > > > On Mon, Jan 26, 2026 at 2:33 PM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > On Mon, Jan 26, 2026 at 9:20 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > > > On Sat, Jan 24, 2026 at 3:32 AM Hillf Danton <hdanton@sina.com> wrote: > > > > > > > > > > Add Lorenzo and Suren > > > > > > > > Thanks! > > > > > > > > > > > > > > > Date: Fri, 23 Jan 2026 15:14:36 -0800 > > > > > > Hello, > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > HEAD commit: 24d479d26b25 Linux 6.19-rc6 > > > > > > git tree: upstream > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=100033fa580000 > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41 > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=4e70c8e0a2017b432f7a > > > > > > compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11451b9a580000 > > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1045e852580000 > > > > > > > > > > > > Downloadable assets: > > > > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-24d479d2.raw.xz > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/d0f3c47f6869/vmlinux-24d479d2.xz > > > > > > kernel image: 
https://storage.googleapis.com/syzbot-assets/800231513703/bzImage-24d479d2.xz > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > Reported-by: syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com > > > > > > > > > > > > WARNING: possible circular locking dependency detected > > > > > > syzkaller #0 Not tainted > > > > > > ------------------------------------------------------ > > > > > > syz.0.17/6091 is trying to acquire lock: > > > > > > ffff8881061287a8 ( > > > > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > > > > > > > but task is already holding lock: > > > > > > ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > > > > > which lock already depends on the new lock. > > > > > > > > > > > > > > > > > > the existing dependency chain (in reverse order) is: > > > > > > > > > > > > -> #2 (vm_lock){++++}-{0:0}: > > > > > > __vma_enter_locked+0x260/0x770 mm/mmap_lock.c:72 > > > > > > __vma_start_write+0x21/0x160 mm/mmap_lock.c:104 > > > > > > vma_start_write include/linux/mmap_lock.h:213 [inline] > > > > > > mprotect_fixup+0x4e3/0xb80 mm/mprotect.c:768 > > > > > > setup_arg_pages+0x4a2/0xbb0 fs/exec.c:670 > > > > > > load_elf_binary+0xb5b/0x4fe0 fs/binfmt_elf.c:1028 > > > > > > search_binary_handler fs/exec.c:1669 [inline] > > > > > > exec_binprm fs/exec.c:1701 [inline] > > > > > > bprm_execve fs/exec.c:1753 [inline] > > > > > > bprm_execve+0x8c2/0x1620 fs/exec.c:1729 > > > > > > kernel_execve+0x2ef/0x3b0 fs/exec.c:1919 > > > > > > try_to_run_init_process init/main.c:1506 [inline] > > > > > > kernel_init+0x14a/0x2b0 init/main.c:1634 > > > > > > ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 > > > > > > > > > > > 
> -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > > > > __might_fault mm/memory.c:7174 [inline] > > > > > > __might_fault+0x113/0x190 mm/memory.c:7168 > > > > > > _copy_to_iter+0x1c2/0x1710 lib/iov_iter.c:196 > > > > > > copy_page_to_iter lib/iov_iter.c:374 [inline] > > > > > > copy_page_to_iter+0x12a/0x1e0 lib/iov_iter.c:361 > > > > > > copy_folio_to_iter include/linux/uio.h:204 [inline] > > > > > > filemap_read+0x6b1/0xe40 mm/filemap.c:2851 > > > > > > blkdev_read_iter+0x1ac/0x500 block/fops.c:856 > > > > > > new_sync_read fs/read_write.c:491 [inline] > > > > > > vfs_read+0x8bf/0xcf0 fs/read_write.c:572 > > > > > > ksys_read+0x12a/0x250 fs/read_write.c:715 > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > > > > > -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > > > 
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > > > > > > > It looks like: > > > > #0 is executing PROCMAP_QUERY ioctl, read-locks vm_lock and then calls > > > > build_id_parse()->__build_id_parse(..., > > > > may_fault=true)->__kernel_read() which eventually takes > > > > inode->i_rwsem. > > > > #1 is a file-backed page fault which asserts that it might take > > > > mmap_lock for read. > > > > #2 is load_elf_binary()->mprotect_fixup() which write-locks both > > > > mmap_lock and vm_lock. I'm guessing it already holds inode->i_rwsem > > > > before write-locking these locks. > > > > > > > > Originally I thought the issue was most likely introduced in > > > > d9d1c2d81797 ("fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under > > > > per-vma locks"). But if #2 indeed takes inode->i_rwsem before > > > > write-locking mmap_lock, then the problem should exist even before > > > > that change when we didn't use vm_lock and relied on mmap_lock... > > > > > > > > I'll try to analyze this more before attempting a fix. > > > > > > I was able to reproduce the same issue even after reverting > > > d9d1c2d81797. The deadlock in this case is simpler and involves > > > mmap_lock instead of vm_lock (see below). > > > Looks like the race is between the read() syscall and do_procmap_query(). > > > I'll continue investigating, in the meantime CC'ing Andrii.
> > > > So, here is a cleaner version of that report (with d9d1c2d81797 reverted): > > > > -> #1 (&mm->mmap_lock){++++}-{4:4}: > > __might_fault+0xed/0x170 > > _copy_to_iter+0x118/0x1720 > > copy_page_to_iter+0x12d/0x1e0 > > filemap_read+0x720/0x10a0 > > blkdev_read_iter+0x2b5/0x4e0 > > vfs_read+0x7f4/0xae0 > > ksys_read+0x12a/0x250 > > do_syscall_64+0xcb/0xf80 > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > __lock_acquire+0x1509/0x26d0 > > lock_acquire+0x185/0x340 > > down_read+0x98/0x490 > > blkdev_read_iter+0x2a7/0x4e0 > > __kernel_read+0x39a/0xa90 > > freader_fetch+0x1d5/0xa80 > > __build_id_parse.isra.0+0xea/0x6a0 > > do_procmap_query+0xd75/0x1050 > > procfs_procmap_ioctl+0x7a/0xb0 > > __x64_sys_ioctl+0x18e/0x210 > > do_syscall_64+0xcb/0xf80 > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > other info that might help us debug this: > > > > Possible unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > rlock(&mm->mmap_lock); > > lock(&sb->s_type->i_mutex_key#8); > > lock(&mm->mmap_lock); > > rlock(&sb->s_type->i_mutex_key#8); > > > > *** DEADLOCK *** > > > > Both threads are calling blkdev_read_iter(), which uses > > inode_lock_shared() to read-lock inode->i_rwsem. I'm not sure why CPU1 > > shows lock() instead of rlock(). So both threads read-lock > > inode->i_rwsem and mmap_lock but in a different order. IIUC, with > > read-locks this should not deadlock until some other thread > > write-locks the mmap_lock in between and this becomes a real deadlock: > > > > CPU0 CPU1 CPU2 > > ---- ---- ---- > > rlock(&mm->mmap_lock); > > rlock(&sb->s_type->i_mutex_key#8); > > wlock(&mm->mmap_lock) <-- waiting for CPU0 > > rlock(&mm->mmap_lock); <-- waiting for CPU1 > > rlock(&sb->s_type->i_mutex_key#8); <-- waiting for CPU2 > > > > I believe in the original report this write-locking thread was the one > > calling mprotect_fixup(). 
> > > > Per https://docs.kernel.org/mm/process_addrs.html#lock-ordering, > > inode->i_rwsem should be locked before mm->mmap_lock, so > > procfs_procmap_ioctl() has to be fixed to follow this lock ordering. > > One possibility I can think of is to use build_id_parse_nofault() > > first and if it fails because the required page is not faulted, we do > > freader_init_from_file(), then drop the mmap/vma lock and execute > > freader_fetch() outside of these locks to fault in that page. Once > > that's done, we'll retry the whole operation and this time > > build_id_parse_nofault() should pass (unless we already evicted that > > page, which is extremely unlikely and in that case, we'll retry > > again). > > > > I tried a POC with build_id_parse_nofault() but without the whole > > dance with freader_init_from_file/freader_fetch and the deadlock is > > gone. Andrii, WDYT? > > I don't like it :) Too much complexity, _nofault() variant only makes > sense for BPF in non-sleepable contexts. I think this can be fixed > simpler and cleaner. We don't need to hold VMA lock while fetching > build ID. Build ID works with vma's vm_file, so we can just get its > reference, drop vma lock, then fetch build id. Below diff passes our > BPF selftests. Might need to think about a bit leaner code changes, > but the idea should be clear. Diff below will be butchered by gmail, > but you can fetch it at [0]. Do you mind validating that deadlock is > gone? Thanks! Sure. I'll test it later today, once I'm home. 
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/andrii/bpf-next.git/commit/?h=procmap-query-vma-deadlock-fix&id=7faf95b63a8a7ac6e78b6d90101c94bfa6ecdfd1 > > Author: Andrii Nakryiko <andrii@kernel.org> > Date: Tue Jan 27 10:46:04 2026 -0800 > > procfs: avoid fetching build ID while holding VMA lock > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 81dfc26bfae8..564bf82e3731 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -656,6 +656,7 @@ static int do_procmap_query(struct mm_struct *mm, > void __user *uarg) > struct proc_maps_locking_ctx lock_ctx = { .mm = mm }; > struct procmap_query karg; > struct vm_area_struct *vma; > + struct file *vm_file = NULL; > const char *name = NULL; > char build_id_buf[BUILD_ID_SIZE_MAX], *name_buf = NULL; > __u64 usize; > @@ -720,6 +721,9 @@ static int do_procmap_query(struct mm_struct *mm, > void __user *uarg) > karg.dev_major = MAJOR(inode->i_sb->s_dev); > karg.dev_minor = MINOR(inode->i_sb->s_dev); > karg.inode = inode->i_ino; > + > + if (karg.build_id_size) > + vm_file = get_file(vma->vm_file); > } else { > karg.vma_offset = 0; > karg.dev_major = 0; > @@ -727,21 +731,6 @@ static int do_procmap_query(struct mm_struct *mm, > void __user *uarg) > karg.inode = 0; > } > > - if (karg.build_id_size) { > - __u32 build_id_sz; > - > - err = build_id_parse(vma, build_id_buf, &build_id_sz); > - if (err) { > - karg.build_id_size = 0; > - } else { > - if (karg.build_id_size < build_id_sz) { > - err = -ENAMETOOLONG; > - goto out; > - } > - karg.build_id_size = build_id_sz; > - } > - } > - > if (karg.vma_name_size) { > size_t name_buf_sz = min_t(size_t, PATH_MAX, > karg.vma_name_size); > const struct path *path; > @@ -779,6 +768,28 @@ static int do_procmap_query(struct mm_struct *mm, > void __user *uarg) > query_vma_teardown(&lock_ctx); > mmput(mm); > > + if (karg.build_id_size) { > + __u32 build_id_sz; > + > + err = -ENOENT; > + if (vm_file) > + err = 
build_id_parse_file(vm_file, > build_id_buf, &build_id_sz); > + if (err) { > + karg.build_id_size = 0; > + } else { > + if (karg.build_id_size < build_id_sz) { > + err = -ENAMETOOLONG; > + goto out; > + } > + karg.build_id_size = build_id_sz; > + } > + } > + > + if (vm_file) { > + fput(vm_file); > + vm_file = NULL; > + } > + > if (karg.vma_name_size && > copy_to_user(u64_to_user_ptr(karg.vma_name_addr), > name, karg.vma_name_size)) { > kfree(name_buf); > @@ -797,6 +808,8 @@ static int do_procmap_query(struct mm_struct *mm, > void __user *uarg) > > out: > query_vma_teardown(&lock_ctx); > + if (vm_file) > + fput(vm_file); > mmput(mm); > kfree(name_buf); > return err; > diff --git a/include/linux/buildid.h b/include/linux/buildid.h > index 831c1b4b626c..7acc06b22fb7 100644 > --- a/include/linux/buildid.h > +++ b/include/linux/buildid.h > @@ -7,7 +7,10 @@ > #define BUILD_ID_SIZE_MAX 20 > > struct vm_area_struct; > +struct file; > + > int build_id_parse(struct vm_area_struct *vma, unsigned char > *build_id, __u32 *size); > +int build_id_parse_file(struct file *file, unsigned char *build_id, > __u32 *size); > int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char > *build_id, __u32 *size); > int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size); > > diff --git a/lib/buildid.c b/lib/buildid.c > index aaf61dfc0919..c0002129d526 100644 > --- a/lib/buildid.c > +++ b/lib/buildid.c > @@ -271,7 +271,7 @@ static int get_build_id_64(struct freader *r, > unsigned char *build_id, __u32 *si > /* enough for Elf64_Ehdr, Elf64_Phdr, and all the smaller requests */ > #define MAX_FREADER_BUF_SZ 64 > > -static int __build_id_parse(struct vm_area_struct *vma, unsigned char > *build_id, > +static int __build_id_parse(struct file *file, unsigned char *build_id, > __u32 *size, bool may_fault) > { > const Elf32_Ehdr *ehdr; > @@ -279,11 +279,7 @@ static int __build_id_parse(struct vm_area_struct > *vma, unsigned char *build_id, > char 
buf[MAX_FREADER_BUF_SZ]; > int ret; > > - /* only works for page backed storage */ > - if (!vma->vm_file) > - return -EINVAL; > - > - freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault); > + freader_init_from_file(&r, buf, sizeof(buf), file, may_fault); > > /* fetch first 18 bytes of ELF header for checks */ > ehdr = freader_fetch(&r, 0, offsetofend(Elf32_Ehdr, e_type)); > @@ -324,7 +320,11 @@ static int __build_id_parse(struct vm_area_struct > *vma, unsigned char *build_id, > */ > int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char > *build_id, __u32 *size) > { > - return __build_id_parse(vma, build_id, size, false /* !may_fault */); > + /* only works for page backed storage */ > + if (!vma->vm_file) > + return -EINVAL; > + > + return __build_id_parse(vma->vm_file, build_id, size, false /* > !may_fault */); > } > > /* > @@ -340,7 +340,16 @@ int build_id_parse_nofault(struct vm_area_struct > *vma, unsigned char *build_id, > */ > int build_id_parse(struct vm_area_struct *vma, unsigned char > *build_id, __u32 *size) > { > - return __build_id_parse(vma, build_id, size, true /* may_fault */); > + /* only works for page backed storage */ > + if (!vma->vm_file) > + return -EINVAL; > + > + return __build_id_parse(vma->vm_file, build_id, size, true /* > may_fault */); > +} > + > +int build_id_parse_file(struct file *file, unsigned char *build_id, > __u32 *size) > +{ > + return __build_id_parse(file, build_id, size, true /* may_fault */); > } > > /** > > > > > > > > > > [ 62.320932][ T9229] > > > [ 62.321471][ T9229] ====================================================== > > > [ 62.323016][ T9229] WARNING: possible circular locking dependency detected > > > [ 62.324618][ T9229] 6.19.0-rc6-00001-g40bea6261b2a #42 Not tainted > > > [ 62.326013][ T9229] ------------------------------------------------------ > > > [ 62.327560][ T9229] hillf/9229 is trying to acquire lock: > > > [ 62.328821][ T9229] ffff888145b7b5a8 > > > 
(&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: > > > blkdev_read_iter+0x2a7/0x4e0 > > > [ 62.331102][ T9229] > > > [ 62.331102][ T9229] but task is already holding lock: > > > [ 62.332722][ T9229] ffff888183a6e540 (&mm->mmap_lock){++++}-{4:4}, > > > at: do_procmap_query+0x39f/0x1050 > > > [ 62.334795][ T9229] > > > [ 62.334795][ T9229] which lock already depends on the new lock. > > > [ 62.334795][ T9229] > > > [ 62.337072][ T9229] > > > [ 62.337072][ T9229] the existing dependency chain (in reverse order) is: > > > [ 62.338998][ T9229] > > > [ 62.338998][ T9229] -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > [ 62.340646][ T9229] __might_fault+0xed/0x170 > > > [ 62.341763][ T9229] _copy_to_iter+0x118/0x1720 > > > [ 62.342913][ T9229] copy_page_to_iter+0x12d/0x1e0 > > > [ 62.344167][ T9229] filemap_read+0x720/0x10a0 > > > [ 62.345298][ T9229] blkdev_read_iter+0x2b5/0x4e0 > > > [ 62.346480][ T9229] vfs_read+0x7f4/0xae0 > > > [ 62.347518][ T9229] ksys_read+0x12a/0x250 > > > [ 62.348584][ T9229] do_syscall_64+0xcb/0xf80 > > > [ 62.349707][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > [ 62.351116][ T9229] > > > [ 62.351116][ T9229] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > [ 62.353012][ T9229] __lock_acquire+0x1509/0x26d0 > > > [ 62.354213][ T9229] lock_acquire+0x185/0x340 > > > [ 62.355323][ T9229] down_read+0x98/0x490 > > > [ 62.356441][ T9229] blkdev_read_iter+0x2a7/0x4e0 > > > [ 62.357619][ T9229] __kernel_read+0x39a/0xa90 > > > [ 62.358767][ T9229] freader_fetch+0x1d5/0xa80 > > > [ 62.359927][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > > > [ 62.361232][ T9229] do_procmap_query+0xd75/0x1050 > > > [ 62.362434][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > > > [ 62.363687][ T9229] __x64_sys_ioctl+0x18e/0x210 > > > [ 62.364863][ T9229] do_syscall_64+0xcb/0xf80 > > > [ 62.365977][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > [ 62.367394][ T9229] > > > [ 62.367394][ T9229] other info that might help us debug this: > > > [ 62.367394][ T9229] > > > [ 
62.369637][ T9229] Possible unsafe locking scenario: > > > [ 62.369637][ T9229] > > > [ 62.371237][ T9229] CPU0 CPU1 > > > [ 62.372441][ T9229] ---- ---- > > > [ 62.373687][ T9229] rlock(&mm->mmap_lock); > > > [ 62.374688][ T9229] > > > lock(&sb->s_type->i_mutex_key#8); > > > [ 62.376444][ T9229] lock(&mm->mmap_lock); > > > [ 62.377956][ T9229] rlock(&sb->s_type->i_mutex_key#8); > > > [ 62.379165][ T9229] > > > [ 62.379165][ T9229] *** DEADLOCK *** > > > [ 62.379165][ T9229] > > > [ 62.380952][ T9229] 1 lock held by hillf/9229: > > > [ 62.381971][ T9229] #0: ffff888183a6e540 > > > (&mm->mmap_lock){++++}-{4:4}, at: do_procmap_query+0x39f/0x1050 > > > [ 62.384162][ T9229] > > > [ 62.384162][ T9229] stack backtrace: > > > [ 62.385458][ T9229] CPU: 3 UID: 0 PID: 9229 Comm: hillf Not tainted > > > 6.19.0-rc6-00001-g40bea6261b2a #42 PREEMPT(full) > > > [ 62.385471][ T9229] Hardware name: QEMU Standard PC (i440FX + PIIX, > > > 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 > > > [ 62.385477][ T9229] Call Trace: > > > [ 62.385482][ T9229] <TASK> > > > [ 62.385487][ T9229] dump_stack_lvl+0x100/0x190 > > > [ 62.385505][ T9229] print_circular_bug.cold+0x185/0x1d5 > > > [ 62.385521][ T9229] check_noncircular+0x14a/0x170 > > > [ 62.385534][ T9229] __lock_acquire+0x1509/0x26d0 > > > [ 62.385547][ T9229] lock_acquire+0x185/0x340 > > > [ 62.385557][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > > > [ 62.385569][ T9229] ? __pfx___might_resched+0x10/0x10 > > > [ 62.385583][ T9229] down_read+0x98/0x490 > > > [ 62.385593][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > > > [ 62.385603][ T9229] ? __pfx_down_read+0x10/0x10 > > > [ 62.385612][ T9229] ? lock_acquire+0x185/0x340 > > > [ 62.385622][ T9229] ? is_bpf_text_address+0x25/0x1a0 > > > [ 62.385634][ T9229] blkdev_read_iter+0x2a7/0x4e0 > > > [ 62.385645][ T9229] __kernel_read+0x39a/0xa90 > > > [ 62.385658][ T9229] ? __pfx___kernel_read+0x10/0x10 > > > [ 62.385671][ T9229] ? 
__lock_acquire+0x481/0x26d0 > > > [ 62.385683][ T9229] freader_fetch+0x1d5/0xa80 > > > [ 62.385697][ T9229] ? find_held_lock+0x2b/0x80 > > > [ 62.385712][ T9229] ? __pfx_freader_fetch+0x10/0x10 > > > [ 62.385725][ T9229] ? __asan_memset+0x27/0x50 > > > [ 62.385737][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > > > [ 62.385751][ T9229] ? __pfx___build_id_parse.isra.0+0x10/0x10 > > > [ 62.385766][ T9229] ? __pfx_find_vma+0x10/0x10 > > > [ 62.385774][ T9229] ? __might_fault+0x129/0x170 > > > [ 62.385788][ T9229] do_procmap_query+0xd75/0x1050 > > > [ 62.385798][ T9229] ? __pfx_do_procmap_query+0x10/0x10 > > > [ 62.385807][ T9229] ? __sanitizer_cov_trace_switch+0x53/0x90 > > > [ 62.385817][ T9229] ? do_vfs_ioctl+0x226/0x13b0 > > > [ 62.385828][ T9229] ? __pfx_do_vfs_ioctl+0x10/0x10 > > > [ 62.385839][ T9229] ? putname+0xfc/0x1b0 > > > [ 62.385846][ T9229] ? putname+0x101/0x1b0 > > > [ 62.385857][ T9229] ? __x64_sys_openat+0x143/0x210 > > > [ 62.385867][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > > > [ 62.385877][ T9229] ? 
__pfx_procfs_procmap_ioctl+0x10/0x10 > > > [ 62.385888][ T9229] __x64_sys_ioctl+0x18e/0x210 > > > [ 62.385899][ T9229] do_syscall_64+0xcb/0xf80 > > > [ 62.385913][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > [ 62.385923][ T9229] RIP: 0033:0x412209 > > > [ 62.385931][ T9229] Code: c0 79 93 eb d5 48 8d 7c 1d 00 eb 99 0f 1f > > > 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b > > > 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 d8 ff ff ff f7 d8 > > > 64 89 01 48 > > > [ 62.385940][ T9229] RSP: 002b:00007fff380d5588 EFLAGS: 00000217 > > > ORIG_RAX: 0000000000000010 > > > [ 62.385950][ T9229] RAX: ffffffffffffffda RBX: 00007fff380d56c8 > > > RCX: 0000000000412209 > > > [ 62.385956][ T9229] RDX: 0000200000000180 RSI: 00000000c0686611 > > > RDI: 0000000000000004 > > > [ 62.385962][ T9229] RBP: 00007fff380d55a0 R08: 0000000000000000 > > > R09: 00007fff380d5640 > > > [ 62.385968][ T9229] R10: 0000000000000000 R11: 0000000000000217 > > > R12: 00007fff380d56b8 > > > [ 62.385974][ T9229] R13: 0000000000000002 R14: 00000000004a0e40 > > > R15: 0000000000000002 > > > [ 62.385982][ T9229] </TASK> > > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > Chain exists of: > > > > > > &sb->s_type->i_mutex_key#8 --> &mm->mmap_lock --> vm_lock > > > > > > > > > > > > Possible unsafe locking scenario: > > > > > > > > > > > > CPU0 CPU1 > > > > > > ---- ---- > > > > > > rlock(vm_lock); > > > > > > lock(&mm->mmap_lock); > > > > > > lock(vm_lock); > > > > > > rlock(&sb->s_type->i_mutex_key#8); > > > > > > > > > > > > *** DEADLOCK *** > > > > > > > > > > > > 1 lock held by syz.0.17/6091: > > > > > > #0: ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > > > > > stack backtrace: > > > > > > CPU: 2 UID: 0 PID: 6091 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > > > > > Call Trace: > > > > > > <TASK> > > > > > > __dump_stack lib/dump_stack.c:94 [inline] > > > > > > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 > > > > > > print_circular_bug+0x275/0x340 kernel/locking/lockdep.c:2043 > > > > > > check_noncircular+0x146/0x160 kernel/locking/lockdep.c:2175 > > > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > RIP: 0033:0x7ff1a238f7c9 > > > > > > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > > > > > > RSP: 002b:00007ffebbe538b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > > > > > RAX: ffffffffffffffda RBX: 
00007ff1a25e5fa0 RCX: 00007ff1a238f7c9 > > > > > > RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000004 > > > > > > RBP: 00007ff1a2413f91 R08: 0000000000000000 R09: 0000000000000000 > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > > > > R13: 00007ff1a25e5fa0 R14: 00007ff1a25e5fa0 R15: 0000000000000003 > > > > > > </TASK>
* Re: [syzbot] [block?] possible deadlock in blkdev_read_iter 2026-01-27 23:52 ` Suren Baghdasaryan @ 2026-01-28 3:41 ` Suren Baghdasaryan 0 siblings, 0 replies; 7+ messages in thread From: Suren Baghdasaryan @ 2026-01-28 3:41 UTC (permalink / raw) To: Andrii Nakryiko Cc: Hillf Danton, syzbot, axboe, linux-block, Lorenzo Stoakes, linux-mm, linux-kernel, syzkaller-bugs On Tue, Jan 27, 2026 at 3:52 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Tue, Jan 27, 2026 at 10:51 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Mon, Jan 26, 2026 at 6:22 PM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > On Mon, Jan 26, 2026 at 2:33 PM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > > > On Mon, Jan 26, 2026 at 9:20 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > > > > > On Sat, Jan 24, 2026 at 3:32 AM Hillf Danton <hdanton@sina.com> wrote: > > > > > > > > > > > > Add Lorenzo and Suren > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > Date: Fri, 23 Jan 2026 15:14:36 -0800 > > > > > > > Hello, > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > HEAD commit: 24d479d26b25 Linux 6.19-rc6 > > > > > > > git tree: upstream > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=100033fa580000 > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41 > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=4e70c8e0a2017b432f7a > > > > > > > compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11451b9a580000 > > > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1045e852580000 > > > > > > > > > > > > > > Downloadable assets: > > > > > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-24d479d2.raw.xz > > > > > > > vmlinux: 
https://storage.googleapis.com/syzbot-assets/d0f3c47f6869/vmlinux-24d479d2.xz > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/800231513703/bzImage-24d479d2.xz > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > Reported-by: syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com > > > > > > > > > > > > > > WARNING: possible circular locking dependency detected > > > > > > > syzkaller #0 Not tainted > > > > > > > ------------------------------------------------------ > > > > > > > syz.0.17/6091 is trying to acquire lock: > > > > > > > ffff8881061287a8 ( > > > > > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > > > &sb->s_type->i_mutex_key#8){++++}-{4:4}, at: blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > > > > > > > > > but task is already holding lock: > > > > > > > ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > > > > > > > which lock already depends on the new lock. 
> > > > > > > > > > > > > > > > > > > > > the existing dependency chain (in reverse order) is: > > > > > > > > > > > > > > -> #2 (vm_lock){++++}-{0:0}: > > > > > > > __vma_enter_locked+0x260/0x770 mm/mmap_lock.c:72 > > > > > > > __vma_start_write+0x21/0x160 mm/mmap_lock.c:104 > > > > > > > vma_start_write include/linux/mmap_lock.h:213 [inline] > > > > > > > mprotect_fixup+0x4e3/0xb80 mm/mprotect.c:768 > > > > > > > setup_arg_pages+0x4a2/0xbb0 fs/exec.c:670 > > > > > > > load_elf_binary+0xb5b/0x4fe0 fs/binfmt_elf.c:1028 > > > > > > > search_binary_handler fs/exec.c:1669 [inline] > > > > > > > exec_binprm fs/exec.c:1701 [inline] > > > > > > > bprm_execve fs/exec.c:1753 [inline] > > > > > > > bprm_execve+0x8c2/0x1620 fs/exec.c:1729 > > > > > > > kernel_execve+0x2ef/0x3b0 fs/exec.c:1919 > > > > > > > try_to_run_init_process init/main.c:1506 [inline] > > > > > > > kernel_init+0x14a/0x2b0 init/main.c:1634 > > > > > > > ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 > > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 > > > > > > > > > > > > > > -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > > > > > __might_fault mm/memory.c:7174 [inline] > > > > > > > __might_fault+0x113/0x190 mm/memory.c:7168 > > > > > > > _copy_to_iter+0x1c2/0x1710 lib/iov_iter.c:196 > > > > > > > copy_page_to_iter lib/iov_iter.c:374 [inline] > > > > > > > copy_page_to_iter+0x12a/0x1e0 lib/iov_iter.c:361 > > > > > > > copy_folio_to_iter include/linux/uio.h:204 [inline] > > > > > > > filemap_read+0x6b1/0xe40 mm/filemap.c:2851 > > > > > > > blkdev_read_iter+0x1ac/0x500 block/fops.c:856 > > > > > > > new_sync_read fs/read_write.c:491 [inline] > > > > > > > vfs_read+0x8bf/0xcf0 fs/read_write.c:572 > > > > > > > ksys_read+0x12a/0x250 fs/read_write.c:715 > > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > > > > > > > -> #0 
(&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > > > > __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > > > > > > > > > > > It looks like: > > > > > #0 is executing PROCMAP_QUERY ioctl, read-locks vm_lock and then calls > > > > > build_id_parse()->__build_id_parse(..., > > > > > may_fault=true)->__kernel_read() which eventually takes > > > > > inode->i_rwsem. > > > > > #1 is a file-backed page fault which asserts that it might take > > > > > mmap_lock for read. > > > > > #2 is load_elf_binary()->mprotect_fixup() which write-locks both > > > > > mmap_lock and vm_lock. I'm guessing it already holds inode->i_rwsem > > > > > before write-locking these locks.
> > > > > > > > > > > Originally I thought the issue is most likely introduced in > > > > > d9d1c2d81797 ("fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under > > > > > per-vma locks"). But if #2 indeed takes inode->i_rwsem before > > > > > write-locking mmap_lock, then the problem should exist even before > > > > > that change when we didn't use vm_lock and relied on mmap_lock... > > > > > > > > > > I'll try to analyze this more before attempting a fix. > > > > > > > > I was able to reproduce the same issue even after reverting > > > > d9d1c2d81797. The deadlock in this case is simpler and involves > > > > mmap_lock instead of vm_lock (see below). > > > > Looks like the race is between the read() syscall and do_procmap_query(). > > > > I'll continue investigating, in the meantime CC'ing Andrii. > > > > > > So, here is a cleaner version of that report (with d9d1c2d81797 reverted): > > > > > > -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > __might_fault+0xed/0x170 > > > _copy_to_iter+0x118/0x1720 > > > copy_page_to_iter+0x12d/0x1e0 > > > filemap_read+0x720/0x10a0 > > > blkdev_read_iter+0x2b5/0x4e0 > > > vfs_read+0x7f4/0xae0 > > > ksys_read+0x12a/0x250 > > > do_syscall_64+0xcb/0xf80 > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > __lock_acquire+0x1509/0x26d0 > > > lock_acquire+0x185/0x340 > > > down_read+0x98/0x490 > > > blkdev_read_iter+0x2a7/0x4e0 > > > __kernel_read+0x39a/0xa90 > > > freader_fetch+0x1d5/0xa80 > > > __build_id_parse.isra.0+0xea/0x6a0 > > > do_procmap_query+0xd75/0x1050 > > > procfs_procmap_ioctl+0x7a/0xb0 > > > __x64_sys_ioctl+0x18e/0x210 > > > do_syscall_64+0xcb/0xf80 > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > other info that might help us debug this: > > > > > > Possible unsafe locking scenario: > > > > > > CPU0 CPU1 > > > ---- ---- > > > rlock(&mm->mmap_lock); > > > lock(&sb->s_type->i_mutex_key#8); > > > lock(&mm->mmap_lock); > > > rlock(&sb->s_type->i_mutex_key#8);
> > > *** DEADLOCK *** > > > > > > Both threads are calling blkdev_read_iter(), which uses > > > inode_lock_shared() to read-lock inode->i_rwsem. I'm not sure why CPU1 > > > shows lock() instead of rlock(). So both threads read-lock > > > inode->i_rwsem and mmap_lock but in a different order. IIUC, with > > > read-locks this should not deadlock until some other thread > > > write-locks the mmap_lock in between and this becomes a real deadlock: > > > > > > CPU0 CPU1 CPU2 > > > ---- ---- ---- > > > rlock(&mm->mmap_lock); > > > rlock(&sb->s_type->i_mutex_key#8); > > > wlock(&mm->mmap_lock) <-- waiting for CPU0 > > > rlock(&mm->mmap_lock); <-- waiting for CPU1 > > > rlock(&sb->s_type->i_mutex_key#8); <-- waiting for CPU2 > > > > > > I believe in the original report this write-locking thread was the one > > > calling mprotect_fixup(). > > > > > > Per https://docs.kernel.org/mm/process_addrs.html#lock-ordering, > > > inode->i_rwsem should be locked before mm->mmap_lock, so > > > procfs_procmap_ioctl() has to be fixed to follow this lock ordering. > > > One possibility I can think of is to use build_id_parse_nofault() > > > first and if it fails because the required page is not faulted, we do > > > freader_init_from_file(), then drop the mmap/vma lock and execute > > > freader_fetch() outside of these locks to fault in that page. Once > > > that's done, we'll retry the whole operation and this time > > > build_id_parse_nofault() should pass (unless we already evicted that > > > page, which is extremely unlikely and in that case, we'll retry > > > again). > > > > > > I tried a POC with build_id_parse_nofault() but without the whole > > > dance with freader_init_from_file/freader_fetch and the deadlock is > > > gone. Andrii, WDYT? > > > > I don't like it :) Too much complexity, _nofault() variant only makes > > sense for BPF in non-sleepable contexts. I think this can be fixed > > simpler and cleaner. We don't need to hold VMA lock while fetching > > build ID. 
Build ID works with vma's vm_file, so we can just get its > > reference, drop vma lock, then fetch build id. Below diff passes our > > BPF selftests. Might need to think about a bit leaner code changes, > > but the idea should be clear. Diff below will be butchered by gmail, > > but you can fetch it at [0]. Do you mind validating that deadlock is > > gone? Thanks! > > Sure. I'll test it later today, once I'm home. With this fix, the issue is not reproducible. Once you post the final version I can retest it again. Thanks, Suren. > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/andrii/bpf-next.git/commit/?h=procmap-query-vma-deadlock-fix&id=7faf95b63a8a7ac6e78b6d90101c94bfa6ecdfd1 > > > > Author: Andrii Nakryiko <andrii@kernel.org> > > Date: Tue Jan 27 10:46:04 2026 -0800 > > > > procfs: avoid fetching build ID while holding VMA lock > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index 81dfc26bfae8..564bf82e3731 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -656,6 +656,7 @@ static int do_procmap_query(struct mm_struct *mm, > > void __user *uarg) > > struct proc_maps_locking_ctx lock_ctx = { .mm = mm }; > > struct procmap_query karg; > > struct vm_area_struct *vma; > > + struct file *vm_file = NULL; > > const char *name = NULL; > > char build_id_buf[BUILD_ID_SIZE_MAX], *name_buf = NULL; > > __u64 usize; > > @@ -720,6 +721,9 @@ static int do_procmap_query(struct mm_struct *mm, > > void __user *uarg) > > karg.dev_major = MAJOR(inode->i_sb->s_dev); > > karg.dev_minor = MINOR(inode->i_sb->s_dev); > > karg.inode = inode->i_ino; > > + > > + if (karg.build_id_size) > > + vm_file = get_file(vma->vm_file); > > } else { > > karg.vma_offset = 0; > > karg.dev_major = 0; > > @@ -727,21 +731,6 @@ static int do_procmap_query(struct mm_struct *mm, > > void __user *uarg) > > karg.inode = 0; > > } > > > > - if (karg.build_id_size) { > > - __u32 build_id_sz; > > - > > - 
err = build_id_parse(vma, build_id_buf, &build_id_sz); > > - if (err) { > > - karg.build_id_size = 0; > > - } else { > > - if (karg.build_id_size < build_id_sz) { > > - err = -ENAMETOOLONG; > > - goto out; > > - } > > - karg.build_id_size = build_id_sz; > > - } > > - } > > - > > if (karg.vma_name_size) { > > size_t name_buf_sz = min_t(size_t, PATH_MAX, > > karg.vma_name_size); > > const struct path *path; > > @@ -779,6 +768,28 @@ static int do_procmap_query(struct mm_struct *mm, > > void __user *uarg) > > query_vma_teardown(&lock_ctx); > > mmput(mm); > > > > + if (karg.build_id_size) { > > + __u32 build_id_sz; > > + > > + err = -ENOENT; > > + if (vm_file) > > + err = build_id_parse_file(vm_file, > > build_id_buf, &build_id_sz); > > + if (err) { > > + karg.build_id_size = 0; > > + } else { > > + if (karg.build_id_size < build_id_sz) { > > + err = -ENAMETOOLONG; > > + goto out; > > + } > > + karg.build_id_size = build_id_sz; > > + } > > + } > > + > > + if (vm_file) { > > + fput(vm_file); > > + vm_file = NULL; > > + } > > + > > if (karg.vma_name_size && > > copy_to_user(u64_to_user_ptr(karg.vma_name_addr), > > name, karg.vma_name_size)) { > > kfree(name_buf); > > @@ -797,6 +808,8 @@ static int do_procmap_query(struct mm_struct *mm, > > void __user *uarg) > > > > out: > > query_vma_teardown(&lock_ctx); > > + if (vm_file) > > + fput(vm_file); > > mmput(mm); > > kfree(name_buf); > > return err; > > diff --git a/include/linux/buildid.h b/include/linux/buildid.h > > index 831c1b4b626c..7acc06b22fb7 100644 > > --- a/include/linux/buildid.h > > +++ b/include/linux/buildid.h > > @@ -7,7 +7,10 @@ > > #define BUILD_ID_SIZE_MAX 20 > > > > struct vm_area_struct; > > +struct file; > > + > > int build_id_parse(struct vm_area_struct *vma, unsigned char > > *build_id, __u32 *size); > > +int build_id_parse_file(struct file *file, unsigned char *build_id, > > __u32 *size); > > int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char > > *build_id, __u32 *size); > > int 
build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size); > > > > diff --git a/lib/buildid.c b/lib/buildid.c > > index aaf61dfc0919..c0002129d526 100644 > > --- a/lib/buildid.c > > +++ b/lib/buildid.c > > @@ -271,7 +271,7 @@ static int get_build_id_64(struct freader *r, > > unsigned char *build_id, __u32 *si > > /* enough for Elf64_Ehdr, Elf64_Phdr, and all the smaller requests */ > > #define MAX_FREADER_BUF_SZ 64 > > > > -static int __build_id_parse(struct vm_area_struct *vma, unsigned char > > *build_id, > > +static int __build_id_parse(struct file *file, unsigned char *build_id, > > __u32 *size, bool may_fault) > > { > > const Elf32_Ehdr *ehdr; > > @@ -279,11 +279,7 @@ static int __build_id_parse(struct vm_area_struct > > *vma, unsigned char *build_id, > > char buf[MAX_FREADER_BUF_SZ]; > > int ret; > > > > - /* only works for page backed storage */ > > - if (!vma->vm_file) > > - return -EINVAL; > > - > > - freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault); > > + freader_init_from_file(&r, buf, sizeof(buf), file, may_fault); > > > > /* fetch first 18 bytes of ELF header for checks */ > > ehdr = freader_fetch(&r, 0, offsetofend(Elf32_Ehdr, e_type)); > > @@ -324,7 +320,11 @@ static int __build_id_parse(struct vm_area_struct > > *vma, unsigned char *build_id, > > */ > > int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char > > *build_id, __u32 *size) > > { > > - return __build_id_parse(vma, build_id, size, false /* !may_fault */); > > + /* only works for page backed storage */ > > + if (!vma->vm_file) > > + return -EINVAL; > > + > > + return __build_id_parse(vma->vm_file, build_id, size, false /* > > !may_fault */); > > } > > > > /* > > @@ -340,7 +340,16 @@ int build_id_parse_nofault(struct vm_area_struct > > *vma, unsigned char *build_id, > > */ > > int build_id_parse(struct vm_area_struct *vma, unsigned char > > *build_id, __u32 *size) > > { > > - return __build_id_parse(vma, build_id, size, true /* may_fault 
*/); > > + /* only works for page backed storage */ > > + if (!vma->vm_file) > > + return -EINVAL; > > + > > + return __build_id_parse(vma->vm_file, build_id, size, true /* > > may_fault */); > > +} > > + > > +int build_id_parse_file(struct file *file, unsigned char *build_id, > > __u32 *size) > > +{ > > + return __build_id_parse(file, build_id, size, true /* may_fault */); > > } > > > > /** > > > > > > > > > > > > > > > [ 62.320932][ T9229] > > > > [ 62.321471][ T9229] ====================================================== > > > > [ 62.323016][ T9229] WARNING: possible circular locking dependency detected > > > > [ 62.324618][ T9229] 6.19.0-rc6-00001-g40bea6261b2a #42 Not tainted > > > > [ 62.326013][ T9229] ------------------------------------------------------ > > > > [ 62.327560][ T9229] hillf/9229 is trying to acquire lock: > > > > [ 62.328821][ T9229] ffff888145b7b5a8 > > > > (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: > > > > blkdev_read_iter+0x2a7/0x4e0 > > > > [ 62.331102][ T9229] > > > > [ 62.331102][ T9229] but task is already holding lock: > > > > [ 62.332722][ T9229] ffff888183a6e540 (&mm->mmap_lock){++++}-{4:4}, > > > > at: do_procmap_query+0x39f/0x1050 > > > > [ 62.334795][ T9229] > > > > [ 62.334795][ T9229] which lock already depends on the new lock. 
> > > > [ 62.334795][ T9229] > > > > [ 62.337072][ T9229] > > > > [ 62.337072][ T9229] the existing dependency chain (in reverse order) is: > > > > [ 62.338998][ T9229] > > > > [ 62.338998][ T9229] -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > > [ 62.340646][ T9229] __might_fault+0xed/0x170 > > > > [ 62.341763][ T9229] _copy_to_iter+0x118/0x1720 > > > > [ 62.342913][ T9229] copy_page_to_iter+0x12d/0x1e0 > > > > [ 62.344167][ T9229] filemap_read+0x720/0x10a0 > > > > [ 62.345298][ T9229] blkdev_read_iter+0x2b5/0x4e0 > > > > [ 62.346480][ T9229] vfs_read+0x7f4/0xae0 > > > > [ 62.347518][ T9229] ksys_read+0x12a/0x250 > > > > [ 62.348584][ T9229] do_syscall_64+0xcb/0xf80 > > > > [ 62.349707][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > [ 62.351116][ T9229] > > > > [ 62.351116][ T9229] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: > > > > [ 62.353012][ T9229] __lock_acquire+0x1509/0x26d0 > > > > [ 62.354213][ T9229] lock_acquire+0x185/0x340 > > > > [ 62.355323][ T9229] down_read+0x98/0x490 > > > > [ 62.356441][ T9229] blkdev_read_iter+0x2a7/0x4e0 > > > > [ 62.357619][ T9229] __kernel_read+0x39a/0xa90 > > > > [ 62.358767][ T9229] freader_fetch+0x1d5/0xa80 > > > > [ 62.359927][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > > > > [ 62.361232][ T9229] do_procmap_query+0xd75/0x1050 > > > > [ 62.362434][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > > > > [ 62.363687][ T9229] __x64_sys_ioctl+0x18e/0x210 > > > > [ 62.364863][ T9229] do_syscall_64+0xcb/0xf80 > > > > [ 62.365977][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > [ 62.367394][ T9229] > > > > [ 62.367394][ T9229] other info that might help us debug this: > > > > [ 62.367394][ T9229] > > > > [ 62.369637][ T9229] Possible unsafe locking scenario: > > > > [ 62.369637][ T9229] > > > > [ 62.371237][ T9229] CPU0 CPU1 > > > > [ 62.372441][ T9229] ---- ---- > > > > [ 62.373687][ T9229] rlock(&mm->mmap_lock); > > > > [ 62.374688][ T9229] > > > > lock(&sb->s_type->i_mutex_key#8); > > > > [ 62.376444][ T9229] 
lock(&mm->mmap_lock); > > > > [ 62.377956][ T9229] rlock(&sb->s_type->i_mutex_key#8); > > > > [ 62.379165][ T9229] > > > > [ 62.379165][ T9229] *** DEADLOCK *** > > > > [ 62.379165][ T9229] > > > > [ 62.380952][ T9229] 1 lock held by hillf/9229: > > > > [ 62.381971][ T9229] #0: ffff888183a6e540 > > > > (&mm->mmap_lock){++++}-{4:4}, at: do_procmap_query+0x39f/0x1050 > > > > [ 62.384162][ T9229] > > > > [ 62.384162][ T9229] stack backtrace: > > > > [ 62.385458][ T9229] CPU: 3 UID: 0 PID: 9229 Comm: hillf Not tainted > > > > 6.19.0-rc6-00001-g40bea6261b2a #42 PREEMPT(full) > > > > [ 62.385471][ T9229] Hardware name: QEMU Standard PC (i440FX + PIIX, > > > > 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 > > > > [ 62.385477][ T9229] Call Trace: > > > > [ 62.385482][ T9229] <TASK> > > > > [ 62.385487][ T9229] dump_stack_lvl+0x100/0x190 > > > > [ 62.385505][ T9229] print_circular_bug.cold+0x185/0x1d5 > > > > [ 62.385521][ T9229] check_noncircular+0x14a/0x170 > > > > [ 62.385534][ T9229] __lock_acquire+0x1509/0x26d0 > > > > [ 62.385547][ T9229] lock_acquire+0x185/0x340 > > > > [ 62.385557][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > > > > [ 62.385569][ T9229] ? __pfx___might_resched+0x10/0x10 > > > > [ 62.385583][ T9229] down_read+0x98/0x490 > > > > [ 62.385593][ T9229] ? blkdev_read_iter+0x2a7/0x4e0 > > > > [ 62.385603][ T9229] ? __pfx_down_read+0x10/0x10 > > > > [ 62.385612][ T9229] ? lock_acquire+0x185/0x340 > > > > [ 62.385622][ T9229] ? is_bpf_text_address+0x25/0x1a0 > > > > [ 62.385634][ T9229] blkdev_read_iter+0x2a7/0x4e0 > > > > [ 62.385645][ T9229] __kernel_read+0x39a/0xa90 > > > > [ 62.385658][ T9229] ? __pfx___kernel_read+0x10/0x10 > > > > [ 62.385671][ T9229] ? __lock_acquire+0x481/0x26d0 > > > > [ 62.385683][ T9229] freader_fetch+0x1d5/0xa80 > > > > [ 62.385697][ T9229] ? find_held_lock+0x2b/0x80 > > > > [ 62.385712][ T9229] ? __pfx_freader_fetch+0x10/0x10 > > > > [ 62.385725][ T9229] ? 
__asan_memset+0x27/0x50 > > > > [ 62.385737][ T9229] __build_id_parse.isra.0+0xea/0x6a0 > > > > [ 62.385751][ T9229] ? __pfx___build_id_parse.isra.0+0x10/0x10 > > > > [ 62.385766][ T9229] ? __pfx_find_vma+0x10/0x10 > > > > [ 62.385774][ T9229] ? __might_fault+0x129/0x170 > > > > [ 62.385788][ T9229] do_procmap_query+0xd75/0x1050 > > > > [ 62.385798][ T9229] ? __pfx_do_procmap_query+0x10/0x10 > > > > [ 62.385807][ T9229] ? __sanitizer_cov_trace_switch+0x53/0x90 > > > > [ 62.385817][ T9229] ? do_vfs_ioctl+0x226/0x13b0 > > > > [ 62.385828][ T9229] ? __pfx_do_vfs_ioctl+0x10/0x10 > > > > [ 62.385839][ T9229] ? putname+0xfc/0x1b0 > > > > [ 62.385846][ T9229] ? putname+0x101/0x1b0 > > > > [ 62.385857][ T9229] ? __x64_sys_openat+0x143/0x210 > > > > [ 62.385867][ T9229] procfs_procmap_ioctl+0x7a/0xb0 > > > > [ 62.385877][ T9229] ? __pfx_procfs_procmap_ioctl+0x10/0x10 > > > > [ 62.385888][ T9229] __x64_sys_ioctl+0x18e/0x210 > > > > [ 62.385899][ T9229] do_syscall_64+0xcb/0xf80 > > > > [ 62.385913][ T9229] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > [ 62.385923][ T9229] RIP: 0033:0x412209 > > > > [ 62.385931][ T9229] Code: c0 79 93 eb d5 48 8d 7c 1d 00 eb 99 0f 1f > > > > 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b > > > > 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 d8 ff ff ff f7 d8 > > > > 64 89 01 48 > > > > [ 62.385940][ T9229] RSP: 002b:00007fff380d5588 EFLAGS: 00000217 > > > > ORIG_RAX: 0000000000000010 > > > > [ 62.385950][ T9229] RAX: ffffffffffffffda RBX: 00007fff380d56c8 > > > > RCX: 0000000000412209 > > > > [ 62.385956][ T9229] RDX: 0000200000000180 RSI: 00000000c0686611 > > > > RDI: 0000000000000004 > > > > [ 62.385962][ T9229] RBP: 00007fff380d55a0 R08: 0000000000000000 > > > > R09: 00007fff380d5640 > > > > [ 62.385968][ T9229] R10: 0000000000000000 R11: 0000000000000217 > > > > R12: 00007fff380d56b8 > > > > [ 62.385974][ T9229] R13: 0000000000000002 R14: 00000000004a0e40 > > > > R15: 0000000000000002 > > > > [ 62.385982][ 
T9229] </TASK> > > > > > > > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > Chain exists of: > > > > > > > &sb->s_type->i_mutex_key#8 --> &mm->mmap_lock --> vm_lock > > > > > > > > > > > > > > Possible unsafe locking scenario: > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > ---- ---- > > > > > > > rlock(vm_lock); > > > > > > > lock(&mm->mmap_lock); > > > > > > > lock(vm_lock); > > > > > > > rlock(&sb->s_type->i_mutex_key#8); > > > > > > > > > > > > > > *** DEADLOCK *** > > > > > > > > > > > > > > 1 lock held by syz.0.17/6091: > > > > > > > #0: ffff888012aa0448 (vm_lock){++++}-{0:0}, at: lock_next_vma+0x10e/0xed0 mm/mmap_lock.c:334 > > > > > > > > > > > > > > stack backtrace: > > > > > > > CPU: 2 UID: 0 PID: 6091 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > > > > > > Call Trace: > > > > > > > <TASK> > > > > > > > __dump_stack lib/dump_stack.c:94 [inline] > > > > > > > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 > > > > > > > print_circular_bug+0x275/0x340 kernel/locking/lockdep.c:2043 > > > > > > > check_noncircular+0x146/0x160 kernel/locking/lockdep.c:2175 > > > > > > > check_prev_add kernel/locking/lockdep.c:3165 [inline] > > > > > > > check_prevs_add kernel/locking/lockdep.c:3284 [inline] > > > > > > > validate_chain kernel/locking/lockdep.c:3908 [inline] > > > > > > > __lock_acquire+0x1669/0x2890 kernel/locking/lockdep.c:5237 > > > > > > > lock_acquire kernel/locking/lockdep.c:5868 [inline] > > > > > > > lock_acquire+0x179/0x330 kernel/locking/lockdep.c:5825 > > > > > > > down_read+0x9b/0x460 kernel/locking/rwsem.c:1537 > > > > > > > inode_lock_shared include/linux/fs.h:1042 [inline] > > > > > > > blkdev_read_iter+0x19e/0x500 block/fops.c:855 > > > > > > > __kernel_read+0x3f3/0xbf0 fs/read_write.c:530 > > > > > > > freader_fetch+0x1d7/0x9d0 lib/buildid.c:100 > > > > > > > 
__build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:297 > > > > > > > do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733 > > > > > > > procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813 > > > > > > > vfs_ioctl fs/ioctl.c:51 [inline] > > > > > > > __do_sys_ioctl fs/ioctl.c:597 [inline] > > > > > > > __se_sys_ioctl fs/ioctl.c:583 [inline] > > > > > > > __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 > > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > > do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 > > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > > RIP: 0033:0x7ff1a238f7c9 > > > > > > > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > > > > > > > RSP: 002b:00007ffebbe538b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > > > > > > RAX: ffffffffffffffda RBX: 00007ff1a25e5fa0 RCX: 00007ff1a238f7c9 > > > > > > > RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000004 > > > > > > > RBP: 00007ff1a2413f91 R08: 0000000000000000 R09: 0000000000000000 > > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > > > > > R13: 00007ff1a25e5fa0 R14: 00007ff1a25e5fa0 R15: 0000000000000003 > > > > > > > </TASK>
end of thread
Thread overview: 7+ messages
[not found] <697400dc.a70a0220.35de72.000a.GAE@google.com>
2026-01-24 11:31 ` [syzbot] [block?] possible deadlock in blkdev_read_iter Hillf Danton
2026-01-26 17:20 ` Suren Baghdasaryan
2026-01-26 22:33 ` Suren Baghdasaryan
2026-01-27 2:22 ` Suren Baghdasaryan
2026-01-27 18:51 ` Andrii Nakryiko
2026-01-27 23:52 ` Suren Baghdasaryan
2026-01-28 3:41 ` Suren Baghdasaryan