* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() @ 2024-07-12 12:17 Bert Karwatzki 2024-07-12 12:17 ` commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613 Bert Karwatzki 0 siblings, 1 reply; 12+ messages in thread From: Bert Karwatzki @ 2024-07-12 12:17 UTC (permalink / raw) To: Pei Li Cc: Bert Karwatzki, Andrew Morton, Shuah Khan, David Hildenbrand, linux-mm, linux-kernel, linux-next, syzbot+35a4414f6e247f515443, syzkaller-bugs, linux-kernel-mentees, senozhatsky This is causing a deadlock for me, too. Since linux-next-20240712 I observe a hang when starting the gui. I bisected this to commit a13252049629 and reverting this commit in linux-next-20240712 fixes the issue for me. I do not have any log messages to prove the dead lock, though, as I compiled everything without CONFIG_LOCKDEP. Bert Karwatzki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613 2024-07-12 12:17 [PATCH] mm: Fix mmap_assert_locked() in follow_pte() Bert Karwatzki @ 2024-07-12 12:17 ` Bert Karwatzki 2024-07-12 15:38 ` Liam R. Howlett 0 siblings, 1 reply; 12+ messages in thread From: Bert Karwatzki @ 2024-07-12 12:17 UTC (permalink / raw) To: Liam R . Howlett Cc: Bert Karwatzki, Andrew Morton, Jiri Olsa, linux-mm, linux-kernel, linux-next I did some experiments on the rss counter bug. The next patch is made for linux-next-20240613 with commit 1c29a32ce65f4cd0f1c0f9 reverted. Then I simply inlined the code of do_vmi_unmap() and do_vmi_align_munmap() into mmap_region(). This version of the code works fine and does not show the rss counter bug. diff --git a/mm/mmap.c b/mm/mmap.c index f95af72ddc9f..0f020c535c83 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -733,7 +733,6 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, vma_iter_store(vmi, vma); vma_complete(&vp, vmi, vma->vm_mm); - validate_mm(vma->vm_mm); return 0; nomem: @@ -2911,6 +2910,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, struct vm_area_struct *next, *prev, *merge; pgoff_t pglen = len >> PAGE_SHIFT; unsigned long charged = 0; + struct vma_munmap_struct vms; + struct ma_state mas_detach; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; bool writable_file_mapping = false; @@ -2933,12 +2934,46 @@ unsigned long mmap_region(struct file *file, unsigned long addr, return -ENOMEM; } - /* Unmap any existing mapping in the area */ - error = do_vmi_munmap(&vmi, mm, addr, len, uf, false); - if (error == -EPERM) - return error; - else if (error) - return -ENOMEM; + /* Find the first overlapping VMA */ + vma = vma_find(&vmi, end); + if (vma) { + struct maple_tree mt_detach; + + /* + * Check if memory is sealed before arch_unmap. + * Prevent unmapping a sealed VMA. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (unlikely(!can_modify_mm(mm, addr, end))) { + return -EPERM; + } + + /* arch_unmap() might do unmaps itself. */ + arch_unmap(mm, addr, end); + + mt_init_flags(&mt_detach, vmi.mas.tree->ma_flags & MT_FLAGS_LOCK_MASK); + mt_on_stack(mt_detach); + mas_init(&mas_detach, &mt_detach, 0); + + init_vma_munmap(&vms, &vmi, vma, addr, end, uf, false); + error = vms_gather_munmap_vmas(&vms, &mas_detach); + if (error) { + validate_mm(mm); + return -ENOMEM; + } + + vma = NULL; + error = vma_iter_clear_gfp(&vmi, addr, end, GFP_KERNEL); + if (error) { + abort_munmap_vmas(&mas_detach); + return -ENOMEM; + } + + /* Point of no return */ + vms_complete_munmap_vmas(&vms, &mas_detach); + } else { + // TODO + } /* * Private writable mapping: check memory availability The next patch now moves the call to vms_complete_munmap_vmas() towards the end of mmap_region(). This code is also free of the rss counter bug. commit a4b24bb18dde627792297455befcc465e45be66d Author: Bert Karwatzki <spasswolf@web.de> Date: Thu Jun 20 17:02:08 2024 +0200 mm: mmap: push back vms_complete_munmap_vmas() In order to to debug the rss counter bug we're going to push back vms_complete_munmap_vmas() in mmap_region. Signed-off-by: Bert Karwatzki <spasswolf@web.de> diff --git a/mm/mmap.c b/mm/mmap.c index 0f020c535c83..4fb9dd2e6d6e 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2970,9 +2970,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr, } /* Point of no return */ - vms_complete_munmap_vmas(&vms, &mas_detach); } else { - // TODO + vms.end = 0; + vms.nr_pages = 0; } /* @@ -3016,6 +3016,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_iter_next_range(&vmi); } + if (vms.end) { + vms_complete_munmap_vmas(&vms, &mas_detach); + vms.end = 0; // avoid double unmap below + } + /* Actually expand, if possible */ if (vma && !vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) { @@ -3026,7 +3031,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, if (vma == prev) vma_iter_set(&vmi, addr); cannot_expand: - + if (vms.end) + vms_complete_munmap_vmas(&vms, &mas_detach); /* * Determine the object being mapped and call the appropriate * specific mapper. the address has already been validated, but The next patch move vms_complete_munmap_vmas() a little further beyond the call to vma_expand(). This code contain the rss counter bug. commit 02d6be2410fa503d008f4cc8dcd1518ca56f8793 Author: Bert Karwatzki <spasswolf@web.de> Date: Thu Jun 20 20:07:13 2024 +0200 mm: mmap: push back vms_complete_munmap_vmas() This commit actually show the rss counter bug, while the previus does not! Signed-off-by: Bert Karwatzki <spasswolf@web.de> diff --git a/mm/mmap.c b/mm/mmap.c index 4fb9dd2e6d6e..c5f4b4b6fb84 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3016,14 +3016,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma_iter_next_range(&vmi); } - if (vms.end) { - vms_complete_munmap_vmas(&vms, &mas_detach); - vms.end = 0; // avoid double unmap below - } - /* Actually expand, if possible */ if (vma && !vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) { + if (vms.end) { + vms_complete_munmap_vmas(&vms, &mas_detach); + } khugepaged_enter_vma(vma, vm_flags); goto expanded; } So there might be some unwanted interaction between vms_complete_munmap_vmas though I've no yet figured out what exactly is happening. Hope this will be helpful in solving the problem. Bert Karwatzki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613 2024-07-12 12:17 ` commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613 Bert Karwatzki @ 2024-07-12 15:38 ` Liam R. Howlett 2024-07-12 16:26 ` Bert Karwatzki 0 siblings, 1 reply; 12+ messages in thread From: Liam R. Howlett @ 2024-07-12 15:38 UTC (permalink / raw) To: Bert Karwatzki Cc: Andrew Morton, Jiri Olsa, linux-mm, linux-kernel, linux-next * Bert Karwatzki <spasswolf@web.de> [240712 08:18]: > I did some experiments on the rss counter bug. The next patch is made for linux-next-20240613 > with commit 1c29a32ce65f4cd0f1c0f9 reverted. Then I simply inlined the code of do_vmi_unmap() > and do_vmi_align_munmap() into mmap_region(). This version of the code works fine and does not > show the rss counter bug. Are you still working with v1 of this patch set? I root-caused the rss counter issue and seg fault to the fact that next or prev may be expanded and I was using the new start/end on munmap() in the completion. This was fixed in subsequent patches. I've sent v4 recently, but will have to a v5 to back off the removal of arch_unmap() for PPC. ... Thanks, Liam ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613 2024-07-12 15:38 ` Liam R. Howlett @ 2024-07-12 16:26 ` Bert Karwatzki 0 siblings, 0 replies; 12+ messages in thread From: Bert Karwatzki @ 2024-07-12 16:26 UTC (permalink / raw) To: Liam R. Howlett, Andrew Morton, Jiri Olsa, linux-mm, linux-kernel, linux-next Am Freitag, dem 12.07.2024 um 11:38 -0400 schrieb Liam R. Howlett: > * Bert Karwatzki <spasswolf@web.de> [240712 08:18]: > > I did some experiments on the rss counter bug. The next patch is made for linux-next-20240613 > > with commit 1c29a32ce65f4cd0f1c0f9 reverted. Then I simply inlined the code of do_vmi_unmap() > > and do_vmi_align_munmap() into mmap_region(). This version of the code works fine and does not > > show the rss counter bug. > > Are you still working with v1 of this patch set? > > I root-caused the rss counter issue and seg fault to the fact that next > or prev may be expanded and I was using the new start/end on munmap() in > the completion. This was fixed in subsequent patches. > > I've sent v4 recently, but will have to a v5 to back off the removal of > arch_unmap() for PPC. > > ... > > Thanks, > Liam Sorry, that was a mistake when using git send-email while working on another bug I accidently sent this old mail. I actually have tested the v3 and v4 patchsets on linux-next up to 2024711 with no error. I havent't tested v4 on linux- 20240712, yet, because next-20240712 has a deadlock issue: https://lore.kernel.org/all/20240712080414.GA47643@google.com/ (while sending a mail to this issue, I accidently send the old mail, too) Bert Karwatzki ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <ZpBhCHsG39wIVnQN@x1n>]
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() [not found] <ZpBhCHsG39wIVnQN@x1n> @ 2024-07-12 13:19 ` David Wang 0 siblings, 0 replies; 12+ messages in thread From: David Wang @ 2024-07-12 13:19 UTC (permalink / raw) To: peterx Cc: akpm, david, linux-kernel-mentees, linux-kernel, linux-mm, peili.dev, skhan, syzbot+35a4414f6e247f515443, syzkaller-bugs Hi, > Ah yes, I had one rfc patch for that, I temporarily put that aside as it > seemed nobody cared except myself.. it's here: > > https://lore.kernel.org/all/20240523223745.395337-2-peterx@redhat.com > > I didn't know it can already cause real trouble. It looks like that patch > should fix this. > > Thanks, > > -- > Peter Xu Just add another user scenario concering this kernel warning. Ever since 6.10-rc1, when I suspend my system via `systemctl suspend`, nvidia gpu driver would trigger a warning: Call Trace: <TASK> ? __warn+0x7c/0x120 ? follow_pte+0x15b/0x170 ? report_bug+0x18d/0x1c0 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? follow_pte+0x15b/0x170 follow_phys+0x3a/0xf0 untrack_pfn+0x53/0x120 unmap_single_vma+0xa6/0xe0 zap_page_range_single+0xe4/0x190 ? _nv002569kms+0x17b/0x210 [nvidia_modeset] ? srso_return_thunk+0x5/0x5f ? kfree+0x257/0x290 unmap_mapping_range+0x10d/0x130 nv_revoke_gpu_mappings_locked+0x43/0x70 [nvidia] nv_set_system_power_state+0x1c9/0x470 [nvidia] nv_procfs_write_suspend+0xd3/0x140 [nvidia] proc_reg_write+0x58/0xa0 ? srso_return_thunk+0x5/0x5f vfs_write+0xf6/0x440 ? __count_memcg_events+0x73/0x110 ? srso_return_thunk+0x5/0x5f ? count_memcg_events.constprop.0+0x1a/0x30 ? srso_return_thunk+0x5/0x5f ? handle_mm_fault+0xa9/0x2d0 ? srso_return_thunk+0x5/0x5f ? preempt_count_add+0x47/0xa0 ksys_write+0x63/0xe0 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f34a3914240 Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 RSP: 002b:00007ffca2aa2688 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f34a3914240 RDX: 0000000000000008 RSI: 000055a02968ed80 RDI: 0000000000000001 RBP: 000055a02968ed80 R08: 00007f34a39eecd0 R09: 00007f34a39eecd0 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000008 R13: 00007f34a39ef760 R14: 0000000000000008 R15: 00007f34a39ea9e0 </TASK> ---[ end trace 0000000000000000 ]--- PM: suspend entry (deep) Considering out-of-tree nature of nvidia gpu driver, and nobody reported this kernel warning before with in-trees, I had almost convinced myself that nvidia driver may need "big" rework to live with those "PTE" changes. So glad to see this thread of discussion/issue/fix now, I have been patching my system manually ever since 6.10-rc1, hope things got fixed soon... FYI David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() @ 2024-07-12 12:43 Bert Karwatzki 0 siblings, 0 replies; 12+ messages in thread From: Bert Karwatzki @ 2024-07-12 12:43 UTC (permalink / raw) To: Pei Li Cc: Bert Karwatzki, Andrew Morton, Shuah Khan, David Hildenbrand, linux-mm, linux-kernel, linux-next, syzbot+35a4414f6e247f515443, syzkaller-bugs, linux-kernel-mentees, senozhatsky diff --git a/mm/memory.c b/mm/memory.c index 282203363177..2f4b4322ec0e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1817,6 +1817,7 @@ static void unmap_single_vma(struct mmu_gather *tlb, { unsigned long start = max(vma->vm_start, start_addr); unsigned long end; + bool mm_read_locked; if (start >= vma->vm_end) return; @@ -1829,11 +1830,11 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (unlikely(vma->vm_flags & VM_PFNMAP)) { if (!mm_wr_locked) - mmap_read_lock(vma->vm_mm); + mm_read_locked = !mmap_read_trylock(vma->vm_mm); untrack_pfn(vma, 0, 0, mm_wr_locked); - if (!mm_wr_locked) + if (!mm_wr_locked && !mm_read_locked) mmap_read_unlock(vma->vm_mm); } This seems to fix the issue without completely removing the locking. Bert Karwatzki ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] mm: Fix mmap_assert_locked() in follow_pte() @ 2024-07-11 5:13 Pei Li 2024-07-11 21:22 ` Andrew Morton 2024-07-11 21:33 ` David Hildenbrand 0 siblings, 2 replies; 12+ messages in thread From: Pei Li @ 2024-07-11 5:13 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, skhan, syzkaller-bugs, linux-kernel-mentees, syzbot+35a4414f6e247f515443, Pei Li This patch fixes this warning by acquiring read lock before entering untrack_pfn() while write lock is not held. syzbot has tested the proposed patch and the reproducer did not trigger any issue. Reported-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=35a4414f6e247f515443 Tested-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com Signed-off-by: Pei Li <peili.dev@gmail.com> --- Syzbot reported the following warning in follow_pte(): WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 rwsem_assert_held include/linux/rwsem.h:195 [inline] WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 mmap_assert_locked include/linux/mmap_lock.h:65 [inline] WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 follow_pte+0x414/0x4c0 mm/memory.c:5980 This is because we are assuming that mm->mmap_lock should be held when entering follow_pte(). This is added in commit c5541ba378e3 (mm: follow_pte() improvements). However, in the following call stack, we are not acquring the lock: follow_phys arch/x86/mm/pat/memtype.c:957 [inline] get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 zap_page_range_single+0x326/0x560 mm/memory.c:1920 In zap_page_range_single(), we passed mm_wr_locked as false, as we do not expect write lock to be held. In the special case where vma->vm_flags is set as VM_PFNMAP, we are hitting untrack_pfn() which eventually calls into follow_phys. This patch fixes this warning by acquiring read lock before entering untrack_pfn() while write lock is not held. syzbot has tested the proposed patch and the reproducer did not trigger any issue: Tested on: commit: 9d9a2f29 Merge tag 'mm-hotfixes-stable-2024-07-10-13-1.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=13be8021980000 kernel config: https://syzkaller.appspot.com/x/.config?x=3456bae478301dc8 dashboard link: https://syzkaller.appspot.com/bug?extid=35a4414f6e247f515443 compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 patch: https://syzkaller.appspot.com/x/patch.diff?x=145e3441980000 Note: testing is done by a robot and is best-effort only. --- mm/memory.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index d10e616d7389..75d7959b835b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1815,9 +1815,16 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (vma->vm_file) uprobe_munmap(vma, start, end); - if (unlikely(vma->vm_flags & VM_PFNMAP)) + if (unlikely(vma->vm_flags & VM_PFNMAP)) { + if (!mm_wr_locked) + mmap_read_lock(vma->vm_mm); + untrack_pfn(vma, 0, 0, mm_wr_locked); + if (!mm_wr_locked) + mmap_read_unlock(vma->vm_mm); + } + if (start != end) { if (unlikely(is_vm_hugetlb_page(vma))) { /* --- base-commit: 734610514cb0234763cc97ddbd235b7981889445 change-id: 20240710-bug12-fecfe4bb0dcb Best regards, -- Pei Li <peili.dev@gmail.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() 2024-07-11 5:13 Pei Li @ 2024-07-11 21:22 ` Andrew Morton 2024-07-11 21:33 ` David Hildenbrand 1 sibling, 0 replies; 12+ messages in thread From: Andrew Morton @ 2024-07-11 21:22 UTC (permalink / raw) To: Pei Li Cc: linux-mm, linux-kernel, skhan, syzkaller-bugs, linux-kernel-mentees, syzbot+35a4414f6e247f515443, David Hildenbrand On Wed, 10 Jul 2024 22:13:17 -0700 Pei Li <peili.dev@gmail.com> wrote: > This patch fixes this warning by acquiring read lock before entering > untrack_pfn() while write lock is not held. > > syzbot has tested the proposed patch and the reproducer did not > trigger any issue. > Thanks. > --- > Syzbot reported the following warning in follow_pte(): > > WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 rwsem_assert_held include/linux/rwsem.h:195 [inline] > WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 mmap_assert_locked include/linux/mmap_lock.h:65 [inline] > WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 follow_pte+0x414/0x4c0 mm/memory.c:5980 > > This is because we are assuming that mm->mmap_lock should be held when > entering follow_pte(). This is added in commit c5541ba378e3 (mm: > follow_pte() improvements). > > However, in the following call stack, we are not acquring the lock: > follow_phys arch/x86/mm/pat/memtype.c:957 [inline] > get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 > untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 > unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 > zap_page_range_single+0x326/0x560 mm/memory.c:1920 > > In zap_page_range_single(), we passed mm_wr_locked as false, as we do > not expect write lock to be held. > In the special case where vma->vm_flags is set as VM_PFNMAP, we are > hitting untrack_pfn() which eventually calls into follow_phys. I included the above (very relevant) info in the changelog. And I added Fixes: c5541ba378e3 ("mm: follow_pte() improvements") and queued the patch for 6.10-rc7. Hopefully David can review it for us. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() 2024-07-11 5:13 Pei Li 2024-07-11 21:22 ` Andrew Morton @ 2024-07-11 21:33 ` David Hildenbrand 2024-07-11 21:45 ` David Hildenbrand 2024-07-12 8:04 ` Sergey Senozhatsky 1 sibling, 2 replies; 12+ messages in thread From: David Hildenbrand @ 2024-07-11 21:33 UTC (permalink / raw) To: Pei Li, Andrew Morton Cc: linux-mm, linux-kernel, skhan, syzkaller-bugs, linux-kernel-mentees, syzbot+35a4414f6e247f515443 On 11.07.24 07:13, Pei Li wrote: > This patch fixes this warning by acquiring read lock before entering > untrack_pfn() while write lock is not held. > > syzbot has tested the proposed patch and the reproducer did not > trigger any issue. > > Reported-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=35a4414f6e247f515443 > Tested-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com > Signed-off-by: Pei Li <peili.dev@gmail.com> > --- > Syzbot reported the following warning in follow_pte(): > > WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 rwsem_assert_held include/linux/rwsem.h:195 [inline] > WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 mmap_assert_locked include/linux/mmap_lock.h:65 [inline] > WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 follow_pte+0x414/0x4c0 mm/memory.c:5980 > > This is because we are assuming that mm->mmap_lock should be held when > entering follow_pte(). This is added in commit c5541ba378e3 (mm: > follow_pte() improvements). > > However, in the following call stack, we are not acquring the lock: > follow_phys arch/x86/mm/pat/memtype.c:957 [inline] > get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 > untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 > unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 > zap_page_range_single+0x326/0x560 mm/memory.c:1920 That implies that unmap_vmas() is called without the mmap lock in read mode, correct? Do we know how this happens? * exit_mmap() holds the mmap lock in read mode * unmap_region is documented to hold the mmap lock in read mode > > In zap_page_range_single(), we passed mm_wr_locked as false, as we do > not expect write lock to be held. > In the special case where vma->vm_flags is set as VM_PFNMAP, we are > hitting untrack_pfn() which eventually calls into follow_phys. > > This patch fixes this warning by acquiring read lock before entering > untrack_pfn() while write lock is not held. > > syzbot has tested the proposed patch and the reproducer did not trigger any issue: > > Tested on: > > commit: 9d9a2f29 Merge tag 'mm-hotfixes-stable-2024-07-10-13-1.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=13be8021980000 > kernel config: https://syzkaller.appspot.com/x/.config?x=3456bae478301dc8 > dashboard link: https://syzkaller.appspot.com/bug?extid=35a4414f6e247f515443 > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > patch: https://syzkaller.appspot.com/x/patch.diff?x=145e3441980000 > > Note: testing is done by a robot and is best-effort only. > --- > mm/memory.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/mm/memory.c b/mm/memory.c > index d10e616d7389..75d7959b835b 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1815,9 +1815,16 @@ static void unmap_single_vma(struct mmu_gather *tlb, > if (vma->vm_file) > uprobe_munmap(vma, start, end); > > - if (unlikely(vma->vm_flags & VM_PFNMAP)) > + if (unlikely(vma->vm_flags & VM_PFNMAP)) { > + if (!mm_wr_locked) > + mmap_read_lock(vma->vm_mm); > + > untrack_pfn(vma, 0, 0, mm_wr_locked); > > + if (!mm_wr_locked) > + mmap_read_unlock(vma->vm_mm); > + } > + > if (start != end) { > if (unlikely(is_vm_hugetlb_page(vma))) { I'm not sure if this is the right fix. I like to understand how we end up without the mmap lock at least in read mode in that path? -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() 2024-07-11 21:33 ` David Hildenbrand @ 2024-07-11 21:45 ` David Hildenbrand 2024-07-11 21:51 ` David Hildenbrand 2024-07-12 8:04 ` Sergey Senozhatsky 1 sibling, 1 reply; 12+ messages in thread From: David Hildenbrand @ 2024-07-11 21:45 UTC (permalink / raw) To: Pei Li, Andrew Morton Cc: linux-mm, linux-kernel, skhan, syzkaller-bugs, linux-kernel-mentees, syzbot+35a4414f6e247f515443, Peter Xu On 11.07.24 23:33, David Hildenbrand wrote: > On 11.07.24 07:13, Pei Li wrote: >> This patch fixes this warning by acquiring read lock before entering >> untrack_pfn() while write lock is not held. >> >> syzbot has tested the proposed patch and the reproducer did not >> trigger any issue. >> >> Reported-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com >> Closes: https://syzkaller.appspot.com/bug?extid=35a4414f6e247f515443 >> Tested-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com >> Signed-off-by: Pei Li <peili.dev@gmail.com> >> --- >> Syzbot reported the following warning in follow_pte(): >> >> WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 rwsem_assert_held include/linux/rwsem.h:195 [inline] >> WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 mmap_assert_locked include/linux/mmap_lock.h:65 [inline] >> WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 follow_pte+0x414/0x4c0 mm/memory.c:5980 >> >> This is because we are assuming that mm->mmap_lock should be held when >> entering follow_pte(). This is added in commit c5541ba378e3 (mm: >> follow_pte() improvements). >> >> However, in the following call stack, we are not acquring the lock: >> follow_phys arch/x86/mm/pat/memtype.c:957 [inline] >> get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 >> untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 >> unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 >> zap_page_range_single+0x326/0x560 mm/memory.c:1920 > > That implies that unmap_vmas() is called without the mmap lock in read > mode, correct? > > Do we know how this happens? > > * exit_mmap() holds the mmap lock in read mode > * unmap_region is documented to hold the mmap lock in read mode I think this is it (missed the call from zap_page_range_single()): follow_phys arch/x86/mm/pat/memtype.c:957 [inline] get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 zap_page_range_single+0x326/0x560 mm/memory.c:1920 unmap_mapping_range_vma mm/memory.c:3684 [inline] unmap_mapping_range_tree mm/memory.c:3701 [inline] unmap_mapping_pages mm/memory.c:3767 [inline] unmap_mapping_range+0x1ee/0x280 mm/memory.c:3804 truncate_pagecache+0x53/0x90 mm/truncate.c:731 simple_setattr+0xf2/0x120 fs/libfs.c:886 notify_change+0xec6/0x11f0 fs/attr.c:499 do_truncate+0x15c/0x220 fs/open.c:65 handle_truncate fs/namei.c:3308 [inline] I think Peter recently questioned whether untrack_pfn() should be even called from the place, but I might misremember things. Fix should work (I suspect we are not violating some locking rules?), PFNMAP should not happen there too often that we really care. If everything fails, we could drop the assert, but I'm hoping we can avoid that. We really want most users of follow_pte() to do the right thing. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() 2024-07-11 21:45 ` David Hildenbrand @ 2024-07-11 21:51 ` David Hildenbrand 0 siblings, 0 replies; 12+ messages in thread From: David Hildenbrand @ 2024-07-11 21:51 UTC (permalink / raw) To: Pei Li, Andrew Morton Cc: linux-mm, linux-kernel, skhan, syzkaller-bugs, linux-kernel-mentees, syzbot+35a4414f6e247f515443, Peter Xu On 11.07.24 23:45, David Hildenbrand wrote: > On 11.07.24 23:33, David Hildenbrand wrote: >> On 11.07.24 07:13, Pei Li wrote: >>> This patch fixes this warning by acquiring read lock before entering >>> untrack_pfn() while write lock is not held. >>> >>> syzbot has tested the proposed patch and the reproducer did not >>> trigger any issue. >>> >>> Reported-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com >>> Closes: https://syzkaller.appspot.com/bug?extid=35a4414f6e247f515443 >>> Tested-by: syzbot+35a4414f6e247f515443@syzkaller.appspotmail.com >>> Signed-off-by: Pei Li <peili.dev@gmail.com> >>> --- >>> Syzbot reported the following warning in follow_pte(): >>> >>> WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 rwsem_assert_held include/linux/rwsem.h:195 [inline] >>> WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 mmap_assert_locked include/linux/mmap_lock.h:65 [inline] >>> WARNING: CPU: 3 PID: 5192 at include/linux/rwsem.h:195 follow_pte+0x414/0x4c0 mm/memory.c:5980 >>> >>> This is because we are assuming that mm->mmap_lock should be held when >>> entering follow_pte(). This is added in commit c5541ba378e3 (mm: >>> follow_pte() improvements). >>> >>> However, in the following call stack, we are not acquring the lock: >>> follow_phys arch/x86/mm/pat/memtype.c:957 [inline] >>> get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 >>> untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 >>> unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 >>> zap_page_range_single+0x326/0x560 mm/memory.c:1920 >> >> That implies that unmap_vmas() is called without the mmap lock in read >> mode, correct? >> >> Do we know how this happens? >> >> * exit_mmap() holds the mmap lock in read mode >> * unmap_region is documented to hold the mmap lock in read mode > > I think this is it (missed the call from zap_page_range_single()): > > follow_phys arch/x86/mm/pat/memtype.c:957 [inline] > get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991 > untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104 > unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819 > zap_page_range_single+0x326/0x560 mm/memory.c:1920 > unmap_mapping_range_vma mm/memory.c:3684 [inline] > unmap_mapping_range_tree mm/memory.c:3701 [inline] > unmap_mapping_pages mm/memory.c:3767 [inline] > unmap_mapping_range+0x1ee/0x280 mm/memory.c:3804 > truncate_pagecache+0x53/0x90 mm/truncate.c:731 > simple_setattr+0xf2/0x120 fs/libfs.c:886 > notify_change+0xec6/0x11f0 fs/attr.c:499 > do_truncate+0x15c/0x220 fs/open.c:65 > handle_truncate fs/namei.c:3308 [inline] > > I think Peter recently questioned whether untrack_pfn() should be even > called from the place, but I might misremember things. > > Fix should work (I suspect we are not violating some locking rules?), > PFNMAP should not happen there too often that we really care. ... thinking again, likely we reach this point with "!mm_wr_locked" and the mmap lock already held in read mode. So I suspect the fix won't work as is. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte() 2024-07-11 21:33 ` David Hildenbrand 2024-07-11 21:45 ` David Hildenbrand @ 2024-07-12 8:04 ` Sergey Senozhatsky 1 sibling, 0 replies; 12+ messages in thread From: Sergey Senozhatsky @ 2024-07-12 8:04 UTC (permalink / raw) To: Pei Li, Andrew Morton, David Hildenbrand Cc: linux-mm, linux-kernel, skhan, syzkaller-bugs, linux-kernel-mentees, syzbot+35a4414f6e247f515443 On (24/07/11 23:33), David Hildenbrand wrote: [..] > > @@ -1815,9 +1815,16 @@ static void unmap_single_vma(struct mmu_gather *tlb, > > if (vma->vm_file) > > uprobe_munmap(vma, start, end); > > - if (unlikely(vma->vm_flags & VM_PFNMAP)) > > + if (unlikely(vma->vm_flags & VM_PFNMAP)) { > > + if (!mm_wr_locked) > > + mmap_read_lock(vma->vm_mm); > > + > > untrack_pfn(vma, 0, 0, mm_wr_locked); > > + if (!mm_wr_locked) > > + mmap_read_unlock(vma->vm_mm); > > + } > > + > > if (start != end) { > > if (unlikely(is_vm_hugetlb_page(vma))) { > > I'm not sure if this is the right fix. I like to understand how we end up > without the mmap lock at least in read mode in that path? I suspect this is causing a deadlock: [ 10.263161] ============================================ [ 10.263165] WARNING: possible recursive locking detected [ 10.263170] 6.10.0-rc7-next-20240712+ #645 Tainted: G N [ 10.263177] -------------------------------------------- [ 10.263179] (direxec)/166 is trying to acquire lock: [ 10.263184] ffff88810b4f0198 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock+0x12/0x40 [ 10.263217] [ 10.263217] but task is already holding lock: [ 10.263219] ffff88810b4f0198 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x9c/0x830 [ 10.263238] [ 10.263238] other info that might help us debug this: [ 10.263241] Possible unsafe locking scenario: [ 10.263241] [ 10.263243] CPU0 [ 10.263245] ---- [ 10.263247] lock(&mm->mmap_lock); [ 10.263252] lock(&mm->mmap_lock); [ 10.263257] [ 10.263257] *** DEADLOCK *** [ 10.263257] [ 10.263259] May be due to missing lock nesting notation [ 10.263259] [ 10.263262] 3 locks held by (direxec)/166: [ 10.263267] #0: ffff88810b4e8548 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x70/0x1110 [ 10.263286] #1: ffff88810b4e85e0 (&sig->exec_update_lock){+.+.}-{3:3}, at: exec_mmap+0x9f/0x510 [ 10.263302] #2: ffff88810b4f0198 (&mm->mmap_lock){++++}-{3:3}, at: exit_mmap+0x9c/0x830 [ 10.263318] [ 10.263318] stack backtrace: [ 10.263329] CPU: 6 UID: 0 PID: 166 Comm: (direxec) Tainted: G N 6.10.0-rc7-next-20240712+ #645 [ 10.263340] Tainted: [N]=TEST [ 10.263349] Call Trace: [ 10.263355] <TASK> [ 10.263360] dump_stack_lvl+0xa3/0xeb [ 10.263375] print_deadlock_bug+0x4d5/0x680 [ 10.263387] __lock_acquire+0x65fb/0x7830 [ 10.263408] ? lock_is_held_type+0xdd/0x150 [ 10.263425] lock_acquire+0x14c/0x3e0 [ 10.263433] ? mmap_read_lock+0x12/0x40 [ 10.263445] ? lock_is_held_type+0xdd/0x150 [ 10.263454] down_read+0x58/0x9a0 [ 10.263461] ? mmap_read_lock+0x12/0x40 [ 10.263476] mmap_read_lock+0x12/0x40 [ 10.263485] unmap_single_vma+0x1bf/0x240 [ 10.263497] unmap_vmas+0x146/0x1c0 [ 10.263511] exit_mmap+0x13d/0x830 [ 10.263533] __mmput+0xc2/0x2c0 [ 10.263556] exec_mmap+0x4cb/0x510 [ 10.263580] begin_new_exec+0xfe6/0x1ba0 [ 10.263612] load_elf_binary+0x797/0x22a0 [ 10.263637] ? load_misc_binary+0x53a/0x930 [ 10.263656] ? lock_release+0x50f/0x830 [ 10.263673] ? bprm_execve+0x6d7/0x1110 [ 10.263693] bprm_execve+0x70d/0x1110 [ 10.263730] do_execveat_common+0x44b/0x600 [ 10.263745] __x64_sys_execve+0x8e/0xa0 [ 10.263754] do_syscall_64+0x71/0x110 [ 10.263764] entry_SYSCALL_64_after_hwframe+0x4b/0x53 ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-07-12 16:26 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-12 12:17 [PATCH] mm: Fix mmap_assert_locked() in follow_pte() Bert Karwatzki
2024-07-12 12:17 ` commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613 Bert Karwatzki
2024-07-12 15:38 ` Liam R. Howlett
2024-07-12 16:26 ` Bert Karwatzki
[not found] <ZpBhCHsG39wIVnQN@x1n>
2024-07-12 13:19 ` [PATCH] mm: Fix mmap_assert_locked() in follow_pte() David Wang
-- strict thread matches above, loose matches on Subject: below --
2024-07-12 12:43 Bert Karwatzki
2024-07-11 5:13 Pei Li
2024-07-11 21:22 ` Andrew Morton
2024-07-11 21:33 ` David Hildenbrand
2024-07-11 21:45 ` David Hildenbrand
2024-07-11 21:51 ` David Hildenbrand
2024-07-12 8:04 ` Sergey Senozhatsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox