* [syzbot] [mm?] kernel BUG in page_table_check_clear
From: syzbot @ 2023-05-05  0:46 UTC
To: akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    825a0714d2b3 Merge tag 'efi-next-for-v6.4' of git://git.ke..
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=12c10d5fc80000
kernel config:  https://syzkaller.appspot.com/x/.config?x=37e26f1fda939e72
dashboard link: https://syzkaller.appspot.com/bug?extid=fcf1a817ceb50935ce99
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15ad4054280000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13f5f594280000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/31a9dc70b22e/disk-825a0714.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/a216970773c6/vmlinux-825a0714.xz
kernel image: https://storage.googleapis.com/syzbot-assets/3da9c8621c64/bzImage-825a0714.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:83!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 5684 Comm: syz-executor989 Not tainted 6.3.0-syzkaller-11733-g825a0714d2b3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
RIP: 0010:page_table_check_clear.part.0+0x323/0x380 mm/page_table_check.c:83
Code: ff 89 de e8 3f 62 a2 ff 85 db 0f 89 7c fe ff ff e8 12 66 a2 ff 0f 0b e8 0b 66 a2 ff 0f 0b e8 04 66 a2 ff 0f 0b e8 fd 65 a2 ff <0f> 0b e8 f6 65 a2 ff 0f 0b 48 c7 c7 00 f7 e6 91 e8 f8 2c f5 ff e9
RSP: 0018:ffffc900044ff838 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888013cb7010 RCX: 0000000000000000
RDX: ffff888029f25940 RSI: ffffffff81e1f843 RDI: 0000000000000005
RBP: ffff888013cb6fd0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 0000000077cb5000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 page_table_check_clear include/linux/mmzone.h:2003 [inline]
 __page_table_check_pte_clear+0x265/0x2b0 mm/page_table_check.c:155
 page_table_check_pte_clear include/linux/page_table_check.h:55 [inline]
 ptep_get_and_clear_full arch/x86/include/asm/pgtable.h:1086 [inline]
 zap_pte_range mm/memory.c:1413 [inline]
 zap_pmd_range mm/memory.c:1563 [inline]
 zap_pud_range mm/memory.c:1592 [inline]
 zap_p4d_range mm/memory.c:1613 [inline]
 unmap_page_range+0x2aa6/0x38e0 mm/memory.c:1634
 unmap_single_vma+0x19a/0x2b0 mm/memory.c:1680
 unmap_vmas+0x234/0x380 mm/memory.c:1719
 exit_mmap+0x190/0x930 mm/mmap.c:3111
 __mmput+0x128/0x4c0 kernel/fork.c:1350
 mmput+0x60/0x70 kernel/fork.c:1372
 exit_mm kernel/exit.c:564 [inline]
 do_exit+0x9d7/0x2960 kernel/exit.c:858
 do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
 __do_sys_exit_group kernel/exit.c:1032 [inline]
 __se_sys_exit_group kernel/exit.c:1030 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1030
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0c2f233679
Code: Unable to access opcode bytes at 0x7f0c2f23364f.
RSP: 002b:00007ffc87dd3aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f0c2f2a83f0 RCX: 00007f0c2f233679
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
R10: 0000000000000011 R11: 0000000000000246 R12: 00007f0c2f2a83f0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:page_table_check_clear.part.0+0x323/0x380 mm/page_table_check.c:83
Code: ff 89 de e8 3f 62 a2 ff 85 db 0f 89 7c fe ff ff e8 12 66 a2 ff 0f 0b e8 0b 66 a2 ff 0f 0b e8 04 66 a2 ff 0f 0b e8 fd 65 a2 ff <0f> 0b e8 f6 65 a2 ff 0f 0b 48 c7 c7 00 f7 e6 91 e8 f8 2c f5 ff e9
RSP: 0018:ffffc900044ff838 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888013cb7010 RCX: 0000000000000000
RDX: ffff888029f25940 RSI: ffffffff81e1f843 RDI: 0000000000000005
RBP: ffff888013cb6fd0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 0000000077cb5000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
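For context, the BUG that fires at mm/page_table_check.c:83 sits in the per-page
counter validation of page_table_check_clear(). In the 6.3-era sources that check
looks roughly like the sketch below (paraphrased, not an exact quote of the file):

        /* inside page_table_check_clear(), for each page being unmapped */
        if (anon) {
                /* an anon page must never also be counted as file-mapped */
                BUG_ON(atomic_read(&ptc->file_map_count));
                BUG_ON(atomic_dec_return(&ptc->anon_map_count) < 0);
        } else {
                /* a file page must never also be counted as anon-mapped */
                BUG_ON(atomic_read(&ptc->anon_map_count));
                BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
        }

The messages below track down how a kmalloc'ed (slab) buffer ends up behind one of
these counters in the first place.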
* usbdev_mmap causes type confusion in page_table_check
From: Ruihan Li @ 2023-05-07 13:58 UTC
To: syzbot+fcf1a817ceb50935ce99
Cc: akpm, linux-kernel, linux-mm, pasha.tatashin, gregkh, linux-usb, syzkaller-bugs, Ruihan Li

Hi all,

Recently, syzbot reported [1] ("kernel BUG in page_table_check_clear"). After
some bisection, I found out that when usbdev_mmap calls remap_pfn_range on
kmalloc'ed memory, it causes type confusion between struct folio and slab in
page_table_check. As I will explain below, it seems that both the USB side
and the MM side need some fixes. So I have cc'd linux-usb and linux-mm here,
as well as their maintainers, to see whether there are any comments on the
proposed fixes.

[1] https://lore.kernel.org/all/000000000000258e5e05fae79fc1@google.com/

To handle mmap requests for /dev/bus/usb/XXX/YYY, usbdev_mmap first allocates
memory by calling usb_alloc_coherent and then maps it into the user space
using remap_pfn_range:

static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
{
        // ...
        mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
                        &dma_handle);
        // ...
        if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
                if (remap_pfn_range(vma, vma->vm_start,
                                virt_to_phys(usbm->mem) >> PAGE_SHIFT,
                                size, vma->vm_page_prot) < 0) {
                        // ...
                }
        }
        // ...
}

Note that in this case, we consider the DMA-unavailable scenario, which, to be
honest, is rare in practice. However, syzbot emulates USB devices using a
module named dummy_hcd, and since these devices are emulated, DMA is not
available at all.

Meanwhile, usb_alloc_coherent calls hcd_buffer_alloc, which uses kmalloc for
memory allocation:

void *hcd_buffer_alloc(
        struct usb_bus          *bus,
        size_t                  size,
        gfp_t                   mem_flags,
        dma_addr_t              *dma
)
{
        // ...
        if (!hcd_uses_dma(hcd)) {
                *dma = ~(dma_addr_t) 0;
                return kmalloc(size, mem_flags);
        }
        // ...
}

However, during the update of the page table to map the kmalloc'd page into
the user space, page_table_check_set attempts to determine whether the page is
anonymous using PageAnon:

static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
                                 unsigned long pfn, unsigned long pgcnt,
                                 bool rw)
{
        // ...
        anon = PageAnon(page);
        for (i = 0; i < pgcnt; i++) {
                // ...
                if (anon) {
                        BUG_ON(atomic_read(&ptc->file_map_count));
                        BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
                } else {
                        BUG_ON(atomic_read(&ptc->anon_map_count));
                        BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
                }
                // ...
        }
        // ...
}

This call to PageAnon is invalid for slab pages because slab reuses the bits
in struct page/folio to store its internal states, and the anonymity bit only
exists in struct page/folio. As a result, the counters are incorrectly updated
and checked in page_table_check_set and page_table_check_clear, leading to the
bug being raised.

Intuitively, I don't think it's reasonable to call remap_pfn_range to map
kmalloc'd pages into the user space. In the past, kmalloc'd pages might have
had alignment issues when certain memory debugging options were enabled.
Although this has been fixed in commit 59bb47985c1d ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)"), it has been shown that
performing such a mapping is still buggy, as demonstrated by the type
confusion in page_table_check. Therefore, I propose adding a new function,
hcd_buffer_alloc_pages, to guarantee the page requirements (i.e., no
intermediate allocators, such as slab/slub). Below is a diff as a quick
example:

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index fbb087b72..514bdc949 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -112,7 +112,7 @@ void hcd_buffer_destroy(struct usb_hcd *hcd)
  * better sharing and to leverage mm/slab.c intelligence.
  */
 
-void *hcd_buffer_alloc(
+void *hcd_buffer_alloc_pages(
         struct usb_bus          *bus,
         size_t                  size,
         gfp_t                   mem_flags,
@@ -126,12 +126,13 @@ void *hcd_buffer_alloc(
                 return NULL;
 
         if (hcd->localmem_pool)
-                return gen_pool_dma_alloc(hcd->localmem_pool, size, dma);
+                return gen_pool_dma_alloc_align(hcd->localmem_pool, size,
+                                dma, PAGE_SIZE);
 
         /* some USB hosts just use PIO */
         if (!hcd_uses_dma(hcd)) {
                 *dma = ~(dma_addr_t) 0;
-                return kmalloc(size, mem_flags);
+                return (void *)__get_free_pages(mem_flags, get_order(size));
         }
 
         for (i = 0; i < HCD_BUFFER_POOLS; i++) {

(There appears to be another issue. In cases where hcd->localmem_pool is
non-null, I suspect that remap_pfn_range should not be used. This is a DMA
scenario, and the DMA handle is properly returned, so dma_mmap_coherent should
be used instead. Am I correct?)

This does not actually fix the type confusion bug in page_table_check_*. It
should be noted that by leveraging /dev/mem, users can map arbitrary physical
memory regions into the user space, which is also achieved through
remap_pfn_range. I'm not 100% certain whether a fix is necessary, as one may
argue that using page table checks (a kernel hardening technique, which means
security is important) and /dev/mem (clearly insecure and potentially
exploitable) together is completely unreasonable.

If a fix is deemed necessary, I think taking the flag VM_PFNMAP into
consideration is a reasonable solution; that is, in page_table_check_*:
 * when the VM_PFNMAP flag is set, all operations and checks on
   file_map_count and anon_map_count counters should be bypassed;
 * when the VM_PFNMAP flag is not set, an additional check to ensure
   folio_test_slab evaluates to false should be performed.

The implementation should be straightforward. However, I noticed that the
page_table_check_* hooks are called in arch/* instead of mm/*, which not only
limits its supported architectures (currently x86_64, arm64, and risc-v) but
also makes it difficult to get the struct vm_area_struct as a parameter, since
the struct vm_area_struct is not passed from mm/* to arch/* when mapping or
unmapping pages. I have looked at d283d422c6c4 ("x86: mm: add x86_64 support
for page table check"), but I don't see a valid reason. Perhaps Pasha can
provide some explanation about this implementation choice?

Thanks,
Ruihan Li
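To make the VM_PFNMAP idea above concrete, a minimal sketch of what the check
could look like if the hooks had access to the VMA is shown below; the vma
parameter and the page_table_check_page() helper name are hypothetical, since
the current hooks receive neither:

        /* hypothetical helper, sketching only the proposed policy */
        static void page_table_check_page(struct vm_area_struct *vma,
                                          struct page *page,
                                          unsigned long pgcnt, bool rw)
        {
                /* for PFN mappings, never interpret "struct page" state */
                if (vma->vm_flags & VM_PFNMAP)
                        return;

                /* outside VM_PFNMAP, a slab page in a user mapping is a bug */
                BUG_ON(folio_test_slab(page_folio(page)));

                /* ... existing anon/file counter checks would follow ... */
        }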
* Re: usbdev_mmap causes type confusion in page_table_check
From: Pasha Tatashin @ 2023-05-08 21:27 UTC
To: Ruihan Li
Cc: syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Sun, May 7, 2023 at 9:58 AM Ruihan Li <lrh2000@pku.edu.cn> wrote:
>
> Hi all,

Hi Ruihan,

Thank you for bisecting, and great analysis of the problem.

> Recently, syzbot reported [1] ("kernel BUG in page_table_check_clear"). After
> some bisection, I found out that when usbdev_mmap calls remap_pfn_range on
> kmalloc'ed memory, it causes type confusion between struct folio and slab in
> page_table_check. As I will explain below, it seems that both the USB side
> and the MM side need some fixes. So I have cc'd linux-usb and linux-mm here,
> as well as their maintainers, to see whether there are any comments on the
> proposed fixes.
>
> [1] https://lore.kernel.org/all/000000000000258e5e05fae79fc1@google.com/
>
> To handle mmap requests for /dev/bus/usb/XXX/YYY, usbdev_mmap first allocates
> memory by calling usb_alloc_coherent and then maps it into the user space
> using remap_pfn_range:
>
> static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
> {
>         // ...
>         mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
>                         &dma_handle);
>         // ...
>         if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
>                 if (remap_pfn_range(vma, vma->vm_start,
>                                 virt_to_phys(usbm->mem) >> PAGE_SHIFT,
>                                 size, vma->vm_page_prot) < 0) {
>                         // ...
>                 }
>         }
>         // ...
> }
>
> Note that in this case, we consider the DMA-unavailable scenario, which, to be
> honest, is rare in practice. However, syzbot emulates USB devices using a
> module named dummy_hcd, and since these devices are emulated, DMA is not
> available at all.
>
> Meanwhile, usb_alloc_coherent calls hcd_buffer_alloc, which uses kmalloc for
> memory allocation:
>
> void *hcd_buffer_alloc(
>         struct usb_bus          *bus,
>         size_t                  size,
>         gfp_t                   mem_flags,
>         dma_addr_t              *dma
> )
> {
>         // ...
>         if (!hcd_uses_dma(hcd)) {
>                 *dma = ~(dma_addr_t) 0;
>                 return kmalloc(size, mem_flags);
>         }
>         // ...
> }
>
> However, during the update of the page table to map the kmalloc'd page into
> the user space, page_table_check_set attempts to determine whether the page is
> anonymous using PageAnon:
>
> static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
>                                  unsigned long pfn, unsigned long pgcnt,
>                                  bool rw)
> {
>         // ...
>         anon = PageAnon(page);
>         for (i = 0; i < pgcnt; i++) {
>                 // ...
>                 if (anon) {
>                         BUG_ON(atomic_read(&ptc->file_map_count));
>                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
>                 } else {
>                         BUG_ON(atomic_read(&ptc->anon_map_count));
>                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
>                 }
>                 // ...
>         }
>         // ...
> }
>
> This call to PageAnon is invalid for slab pages because slab reuses the bits
> in struct page/folio to store its internal states, and the anonymity bit only
> exists in struct page/folio. As a result, the counters are incorrectly updated
> and checked in page_table_check_set and page_table_check_clear, leading to the
> bug being raised.

We should change anon boolean to be:

anon = !PageSlab(page) && PageAnon(page);

This way, we will have a correct way to determine anon pages, and the
rest will fall into file_map_count. The file (non-anon) PTEs are OK to
be double mapped, so there won't be any problems from page_table_check
point of view even if it is a slab page.

> Intuitively, I don't think it's reasonable to call remap_pfn_range to map
> kmalloc'd pages into the user space. In the past, kmalloc'd pages might have
> had alignment issues when certain memory debugging options were enabled.
> Although this has been fixed in commit 59bb47985c1d ("mm, sl[aou]b: guarantee
> natural alignment for kmalloc(power-of-two)"), it has been shown that
> performing such a mapping is still buggy, as demonstrated by the type
> confusion in page_table_check. Therefore, I propose adding a new function,
> hcd_buffer_alloc_pages, to guarantee the page requirements (i.e., no
> intermediate allocators, such as slab/slub). Below is a diff as a quick
> example:
>
> diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
> index fbb087b72..514bdc949 100644
> --- a/drivers/usb/core/buffer.c
> +++ b/drivers/usb/core/buffer.c
> @@ -112,7 +112,7 @@ void hcd_buffer_destroy(struct usb_hcd *hcd)
>   * better sharing and to leverage mm/slab.c intelligence.
>   */
>
> -void *hcd_buffer_alloc(
> +void *hcd_buffer_alloc_pages(
>          struct usb_bus          *bus,
>          size_t                  size,
>          gfp_t                   mem_flags,
> @@ -126,12 +126,13 @@ void *hcd_buffer_alloc(
>                  return NULL;
>
>          if (hcd->localmem_pool)
> -                return gen_pool_dma_alloc(hcd->localmem_pool, size, dma);
> +                return gen_pool_dma_alloc_align(hcd->localmem_pool, size,
> +                                dma, PAGE_SIZE);
>
>          /* some USB hosts just use PIO */
>          if (!hcd_uses_dma(hcd)) {
>                  *dma = ~(dma_addr_t) 0;
> -                return kmalloc(size, mem_flags);
> +                return (void *)__get_free_pages(mem_flags, get_order(size));
>          }
>
>          for (i = 0; i < HCD_BUFFER_POOLS; i++) {
>
> (There appears to be another issue. In cases where hcd->localmem_pool is
> non-null, I suspect that remap_pfn_range should not be used. This is a DMA
> scenario, and the DMA handle is properly returned, so dma_mmap_coherent should
> be used instead. Am I correct?)
>
> This does not actually fix the type confusion bug in page_table_check_*. It
> should be noted that by leveraging /dev/mem, users can map arbitrary physical
> memory regions into the user space, which is also achieved through
> remap_pfn_range. I'm not 100% certain whether a fix is necessary, as one may
> argue that using page table checks (a kernel hardening technique, which means
> security is important) and /dev/mem (clearly insecure and potentially

Yes, the /dev/mem device is a security problem, and would not work with a
hardened kernel.

> exploitable) together is completely unreasonable.
>
> If a fix is deemed necessary, I think taking the flag VM_PFNMAP into
> consideration is a reasonable solution; that is, in page_table_check_*:
>  * when the VM_PFNMAP flag is set, all operations and checks on
>    file_map_count and anon_map_count counters should be bypassed;
>  * when the VM_PFNMAP flag is not set, an additional check to ensure
>    folio_test_slab evaluates to false should be performed.
>
> The implementation should be straightforward. However, I noticed that the
> page_table_check_* hooks are called in arch/* instead of mm/*, which not only
> limits its supported architectures (currently x86_64, arm64, and risc-v) but
> also makes it difficult to get the struct vm_area_struct as a parameter, since
> the struct vm_area_struct is not passed from mm/* to arch/* when mapping or
> unmapping pages. I have looked at d283d422c6c4 ("x86: mm: add x86_64 support
> for page table check"), but I don't see a valid reason. Perhaps Pasha can
> provide some explanation about this implementation choice?

We specifically decided not to use VMA information in order to avoid
relying on MM state (except for limited "struct page" info). The
page_table_check is a state machine, separate from the Linux MM, that
verifies that the user accessible pages are not falsely shared.

Pasha
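Expressed as code, Pasha's suggestion would amount to a one-line change in how
the anon flag is computed in mm/page_table_check.c; the helper wrapper and its
name below are purely illustrative:

        /* illustrative only: classify the page the way the reply proposes */
        static bool page_table_check_page_is_anon(struct page *page)
        {
                /*
                 * A slab page carries no anonymity bit, so never feed it to
                 * PageAnon(); account it together with file-backed pages.
                 */
                return !PageSlab(page) && PageAnon(page);
        }

(The follow-ups below push back on this: they argue that a slab page reaching a
user page table should be treated as a hard error rather than tolerated.)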
* Re: usbdev_mmap causes type confusion in page_table_check
From: Matthew Wilcox @ 2023-05-08 21:36 UTC
To: Pasha Tatashin
Cc: Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Mon, May 08, 2023 at 05:27:10PM -0400, Pasha Tatashin wrote:
> > static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
> >                                  unsigned long pfn, unsigned long pgcnt,
> >                                  bool rw)
> > {
> >         // ...
> >         anon = PageAnon(page);
> >         for (i = 0; i < pgcnt; i++) {
> >                 // ...
> >                 if (anon) {
> >                         BUG_ON(atomic_read(&ptc->file_map_count));
> >                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
> >                 } else {
> >                         BUG_ON(atomic_read(&ptc->anon_map_count));
> >                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
> >                 }
> >                 // ...
> >         }
> >         // ...
> > }
> >
> > This call to PageAnon is invalid for slab pages because slab reuses the bits
> > in struct page/folio to store its internal states, and the anonymity bit only
> > exists in struct page/folio. As a result, the counters are incorrectly updated
> > and checked in page_table_check_set and page_table_check_clear, leading to the
> > bug being raised.
>
> We should change anon boolean to be:
>
> anon = !PageSlab(page) && PageAnon(page);

No. Slab pages are not eligible for mapping into userspace. That's
all. There should be a BUG() for that. And I do mean BUG(), not
"return error to user". Something has gone horribly wrong, and it's
time to crash.
* Re: usbdev_mmap causes type confusion in page_table_check
From: Pasha Tatashin @ 2023-05-08 21:48 UTC
To: Matthew Wilcox
Cc: Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Mon, May 8, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, May 08, 2023 at 05:27:10PM -0400, Pasha Tatashin wrote:
> > > static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
> > >                                  unsigned long pfn, unsigned long pgcnt,
> > >                                  bool rw)
> > > {
> > >         // ...
> > >         anon = PageAnon(page);
> > >         for (i = 0; i < pgcnt; i++) {
> > >                 // ...
> > >                 if (anon) {
> > >                         BUG_ON(atomic_read(&ptc->file_map_count));
> > >                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
> > >                 } else {
> > >                         BUG_ON(atomic_read(&ptc->anon_map_count));
> > >                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
> > >                 }
> > >                 // ...
> > >         }
> > >         // ...
> > > }
> > >
> > > This call to PageAnon is invalid for slab pages because slab reuses the bits
> > > in struct page/folio to store its internal states, and the anonymity bit only
> > > exists in struct page/folio. As a result, the counters are incorrectly updated
> > > and checked in page_table_check_set and page_table_check_clear, leading to the
> > > bug being raised.
> >
> > We should change anon boolean to be:
> >
> > anon = !PageSlab(page) && PageAnon(page);
>
> No. Slab pages are not eligible for mapping into userspace. That's

Sure, I can add BUG_ON(PageSlab(page)); to page_table_check_set.

> all. There should be a BUG() for that. And I do mean BUG(), not
> "return error to user". Something has gone horribly wrong, and it's
> time to crash.

It is just too easy to make slab available via remap_pfn_range(), but
I do not think we want to add BUG() into the remap function, otherwise
we will break devices such as /dev/mem.

Pasha
* Re: usbdev_mmap causes type confusion in page_table_check
From: Matthew Wilcox @ 2023-05-08 21:52 UTC
To: Pasha Tatashin
Cc: Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Mon, May 08, 2023 at 02:48:59PM -0700, Pasha Tatashin wrote:
> On Mon, May 8, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, May 08, 2023 at 05:27:10PM -0400, Pasha Tatashin wrote:
> > > > static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
> > > >                                  unsigned long pfn, unsigned long pgcnt,
> > > >                                  bool rw)
> > > > {
> > > >         // ...
> > > >         anon = PageAnon(page);
> > > >         for (i = 0; i < pgcnt; i++) {
> > > >                 // ...
> > > >                 if (anon) {
> > > >                         BUG_ON(atomic_read(&ptc->file_map_count));
> > > >                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
> > > >                 } else {
> > > >                         BUG_ON(atomic_read(&ptc->anon_map_count));
> > > >                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
> > > >                 }
> > > >                 // ...
> > > >         }
> > > >         // ...
> > > > }
> > > >
> > > > This call to PageAnon is invalid for slab pages because slab reuses the bits
> > > > in struct page/folio to store its internal states, and the anonymity bit only
> > > > exists in struct page/folio. As a result, the counters are incorrectly updated
> > > > and checked in page_table_check_set and page_table_check_clear, leading to the
> > > > bug being raised.
> > >
> > > We should change anon boolean to be:
> > >
> > > anon = !PageSlab(page) && PageAnon(page);
> >
> > No. Slab pages are not eligible for mapping into userspace. That's
>
> Sure, I can add BUG_ON(PageSlab(page)); to page_table_check_set.
>
> > all. There should be a BUG() for that. And I do mean BUG(), not
> > "return error to user". Something has gone horribly wrong, and it's
> > time to crash.
>
> It is just too easy to make slab available via remap_pfn_range(), but
> I do not think we want to add BUG() into the remap function, otherwise
> we will break devices such as /dev/mem.

Slab pages can't be mmaped. Really, no matter what interface you're
using. page->_mapcount is necessarily incremented by mapping to
userspace, and slab uses that space for its own purposes (and has
for decades). It's similar for page tables and other allocations that
use PageType.
* Re: usbdev_mmap causes type confusion in page_table_check
From: Pasha Tatashin @ 2023-05-08 21:55 UTC
To: Matthew Wilcox
Cc: Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Mon, May 8, 2023 at 2:52 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, May 08, 2023 at 02:48:59PM -0700, Pasha Tatashin wrote:
> > On Mon, May 8, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, May 08, 2023 at 05:27:10PM -0400, Pasha Tatashin wrote:
> > > > > static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
> > > > >                                  unsigned long pfn, unsigned long pgcnt,
> > > > >                                  bool rw)
> > > > > {
> > > > >         // ...
> > > > >         anon = PageAnon(page);
> > > > >         for (i = 0; i < pgcnt; i++) {
> > > > >                 // ...
> > > > >                 if (anon) {
> > > > >                         BUG_ON(atomic_read(&ptc->file_map_count));
> > > > >                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
> > > > >                 } else {
> > > > >                         BUG_ON(atomic_read(&ptc->anon_map_count));
> > > > >                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
> > > > >                 }
> > > > >                 // ...
> > > > >         }
> > > > >         // ...
> > > > > }
> > > > >
> > > > > This call to PageAnon is invalid for slab pages because slab reuses the bits
> > > > > in struct page/folio to store its internal states, and the anonymity bit only
> > > > > exists in struct page/folio. As a result, the counters are incorrectly updated
> > > > > and checked in page_table_check_set and page_table_check_clear, leading to the
> > > > > bug being raised.
> > > >
> > > > We should change anon boolean to be:
> > > >
> > > > anon = !PageSlab(page) && PageAnon(page);
> > >
> > > No. Slab pages are not eligible for mapping into userspace. That's
> >
> > Sure, I can add BUG_ON(PageSlab(page)); to page_table_check_set.
> >
> > > all. There should be a BUG() for that. And I do mean BUG(), not
> > > "return error to user". Something has gone horribly wrong, and it's
> > > time to crash.
> >
> > It is just too easy to make slab available via remap_pfn_range(), but
> > I do not think we want to add BUG() into the remap function, otherwise
> > we will break devices such as /dev/mem.
>
> Slab pages can't be mmaped. Really, no matter what interface you're
> using. page->_mapcount is necessarily incremented by mapping to
> userspace, and slab uses that space for its own purposes (and has
> for decades). It's similar for page tables and other allocations that
> use PageType.

Mapping random memory in /dev/mem can cause mapping slab pages into
userspace; the page->_mapcount is not incremented (and other fields
are not accessed) in that case, as we are using a VM_PFNMAP type VMA,
which does not access "struct page".

Pasha
* Re: usbdev_mmap causes type confusion in page_table_check
From: David Hildenbrand @ 2023-05-08 22:46 UTC
To: Pasha Tatashin, Matthew Wilcox
Cc: Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On 08.05.23 23:55, Pasha Tatashin wrote:
> On Mon, May 8, 2023 at 2:52 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Mon, May 08, 2023 at 02:48:59PM -0700, Pasha Tatashin wrote:
>>> On Mon, May 8, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Mon, May 08, 2023 at 05:27:10PM -0400, Pasha Tatashin wrote:
>>>>>> static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
>>>>>>                                  unsigned long pfn, unsigned long pgcnt,
>>>>>>                                  bool rw)
>>>>>> {
>>>>>>         // ...
>>>>>>         anon = PageAnon(page);
>>>>>>         for (i = 0; i < pgcnt; i++) {
>>>>>>                 // ...
>>>>>>                 if (anon) {
>>>>>>                         BUG_ON(atomic_read(&ptc->file_map_count));
>>>>>>                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
>>>>>>                 } else {
>>>>>>                         BUG_ON(atomic_read(&ptc->anon_map_count));
>>>>>>                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
>>>>>>                 }
>>>>>>                 // ...
>>>>>>         }
>>>>>>         // ...
>>>>>> }
>>>>>>
>>>>>> This call to PageAnon is invalid for slab pages because slab reuses the bits
>>>>>> in struct page/folio to store its internal states, and the anonymity bit only
>>>>>> exists in struct page/folio. As a result, the counters are incorrectly updated
>>>>>> and checked in page_table_check_set and page_table_check_clear, leading to the
>>>>>> bug being raised.
>>>>>
>>>>> We should change anon boolean to be:
>>>>>
>>>>> anon = !PageSlab(page) && PageAnon(page);
>>>>
>>>> No. Slab pages are not eligible for mapping into userspace. That's
>>>
>>> Sure, I can add BUG_ON(PageSlab(page)); to page_table_check_set.
>>>
>>>> all. There should be a BUG() for that. And I do mean BUG(), not
>>>> "return error to user". Something has gone horribly wrong, and it's
>>>> time to crash.
>>>
>>> It is just too easy to make slab available via remap_pfn_range(), but
>>> I do not think we want to add BUG() into the remap function, otherwise
>>> we will break devices such as /dev/mem.
>>
>> Slab pages can't be mmaped. Really, no matter what interface you're
>> using. page->_mapcount is necessarily incremented by mapping to
>> userspace, and slab uses that space for its own purposes (and has
>> for decades). It's similar for page tables and other allocations that
>> use PageType.
>
> Mapping random memory in /dev/mem can cause mapping slab pages into
> userspace; the page->_mapcount is not incremented (and other fields
> are not accessed) in that case, as we are using a VM_PFNMAP type VMA,
> which does not access "struct page".

We should be using vm_normal_page() to identify if we should be looking
at the struct page or not, no?

-- 
Thanks,

David / dhildenb
* Re: usbdev_mmap causes type confusion in page_table_check
From: Pasha Tatashin @ 2023-05-08 23:17 UTC
To: David Hildenbrand
Cc: Matthew Wilcox, Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Mon, May 8, 2023 at 3:46 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.05.23 23:55, Pasha Tatashin wrote:
> > On Mon, May 8, 2023 at 2:52 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Mon, May 08, 2023 at 02:48:59PM -0700, Pasha Tatashin wrote:
> >>> On Mon, May 8, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>>>
> >>>> On Mon, May 08, 2023 at 05:27:10PM -0400, Pasha Tatashin wrote:
> >>>>>> static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
> >>>>>>                                  unsigned long pfn, unsigned long pgcnt,
> >>>>>>                                  bool rw)
> >>>>>> {
> >>>>>>         // ...
> >>>>>>         anon = PageAnon(page);
> >>>>>>         for (i = 0; i < pgcnt; i++) {
> >>>>>>                 // ...
> >>>>>>                 if (anon) {
> >>>>>>                         BUG_ON(atomic_read(&ptc->file_map_count));
> >>>>>>                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
> >>>>>>                 } else {
> >>>>>>                         BUG_ON(atomic_read(&ptc->anon_map_count));
> >>>>>>                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
> >>>>>>                 }
> >>>>>>                 // ...
> >>>>>>         }
> >>>>>>         // ...
> >>>>>> }
> >>>>>>
> >>>>>> This call to PageAnon is invalid for slab pages because slab reuses the bits
> >>>>>> in struct page/folio to store its internal states, and the anonymity bit only
> >>>>>> exists in struct page/folio. As a result, the counters are incorrectly updated
> >>>>>> and checked in page_table_check_set and page_table_check_clear, leading to the
> >>>>>> bug being raised.
> >>>>>
> >>>>> We should change anon boolean to be:
> >>>>>
> >>>>> anon = !PageSlab(page) && PageAnon(page);
> >>>>
> >>>> No. Slab pages are not eligible for mapping into userspace. That's
> >>>
> >>> Sure, I can add BUG_ON(PageSlab(page)); to page_table_check_set.
> >>>
> >>>> all. There should be a BUG() for that. And I do mean BUG(), not
> >>>> "return error to user". Something has gone horribly wrong, and it's
> >>>> time to crash.
> >>>
> >>> It is just too easy to make slab available via remap_pfn_range(), but
> >>> I do not think we want to add BUG() into the remap function, otherwise
> >>> we will break devices such as /dev/mem.
> >>
> >> Slab pages can't be mmaped. Really, no matter what interface you're
> >> using. page->_mapcount is necessarily incremented by mapping to
> >> userspace, and slab uses that space for its own purposes (and has
> >> for decades). It's similar for page tables and other allocations that
> >> use PageType.
> >
> > Mapping random memory in /dev/mem can cause mapping slab pages into
> > userspace; the page->_mapcount is not incremented (and other fields
> > are not accessed) in that case, as we are using a VM_PFNMAP type VMA,
> > which does not access "struct page".
>
> We should be using vm_normal_page() to identify if we should be looking
> at the struct page or not, no?

For normal Kernel-MM operations, vm_normal_page() should be used to
get "struct page" based on vma+addr+pte combination, but
page_table_check does not use vma for its operation in order to
strengthen the verification of no invalid page sharing. But, even
vm_normal_page() can cause access to the "struct page" for VM_PFNMAP
if pfn_valid(pfn) is true. So, vm_normal_page() can return a struct
page for a user mapped slab page.

Pasha

> -- 
> Thanks,
>
> David / dhildenb
>
* Re: usbdev_mmap causes type confusion in page_table_check
From: Pasha Tatashin @ 2023-05-08 23:21 UTC
To: David Hildenbrand
Cc: Matthew Wilcox, Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

> For normal Kernel-MM operations, vm_normal_page() should be used to
> get "struct page" based on vma+addr+pte combination, but
> page_table_check does not use vma for its operation in order to
> strengthen the verification of no invalid page sharing. But, even
> vm_normal_page() can cause access to the "struct page" for VM_PFNMAP
> if pfn_valid(pfn) is true. So, vm_normal_page() can return a struct
> page for a user mapped slab page.

Only for !ARCH_HAS_PTE_SPECIAL case, otherwise NULL is returned.

Pasha
* Re: usbdev_mmap causes type confusion in page_table_check
From: David Hildenbrand @ 2023-05-08 23:37 UTC
To: Pasha Tatashin
Cc: Matthew Wilcox, Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On 09.05.23 01:21, Pasha Tatashin wrote:
>> For normal Kernel-MM operations, vm_normal_page() should be used to
>> get "struct page" based on vma+addr+pte combination, but
>> page_table_check does not use vma for its operation in order to
>> strengthen the verification of no invalid page sharing. But, even

I'm not sure if that's the right approach for this case here, though.

>> vm_normal_page() can cause access to the "struct page" for VM_PFNMAP
>> if pfn_valid(pfn) is true. So, vm_normal_page() can return a struct
>> page for a user mapped slab page.
>
> Only for !ARCH_HAS_PTE_SPECIAL case, otherwise NULL is returned.

That would violate VM_PFNMAP semantics, though. I remember that there
was a trick to it.

Assuming we map /dev/mem, what stops a page we mapped and determined to
be !anon to be freed and reused, such that we suddenly have an anon page
mapped?

In that case, we really don't want to look at the "struct page" ever, no?

-- 
Thanks,

David / dhildenb
* Re: usbdev_mmap causes type confusion in page_table_check
From: Pasha Tatashin @ 2023-05-09  0:07 UTC
To: David Hildenbrand
Cc: Matthew Wilcox, Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs

On Mon, May 8, 2023 at 4:37 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.05.23 01:21, Pasha Tatashin wrote:
> >> For normal Kernel-MM operations, vm_normal_page() should be used to
> >> get "struct page" based on vma+addr+pte combination, but
> >> page_table_check does not use vma for its operation in order to
> >> strengthen the verification of no invalid page sharing. But, even
>
> I'm not sure if that's the right approach for this case here, though.
>
> >> vm_normal_page() can cause access to the "struct page" for VM_PFNMAP
> >> if pfn_valid(pfn) is true. So, vm_normal_page() can return a struct
> >> page for a user mapped slab page.
> >
> > Only for !ARCH_HAS_PTE_SPECIAL case, otherwise NULL is returned.
>
> That would violate VM_PFNMAP semantics, though. I remember that there
> was a trick to it.
>
> Assuming we map /dev/mem, what stops a page we mapped and determined to
> be !anon to be freed and reused, such that we suddenly have an anon page
> mapped?
>
> In that case, we really don't want to look at the "struct page" ever, no?

Good point. page_table_check just does not work well with /dev/mem.

I am thinking of adding BUG_ON(PageSlab(page)); and also
"depends on !DEVMEM" for the PAGE_TABLE_CHECK config option.

Pasha
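For concreteness, the first half of that idea is a one-line check early in
page_table_check_set(); the sketch below is illustrative only and elides the
rest of the function. The second half would be a matching "depends on !DEVMEM"
line under the PAGE_TABLE_CHECK entry in mm/Kconfig.debug (assumed location):

        static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
                                         unsigned long pfn, unsigned long pgcnt,
                                         bool rw)
        {
                struct page *page;

                if (!pfn_valid(pfn))
                        return;

                page = pfn_to_page(pfn);
                /* a slab page in a user page table means state is already corrupt */
                BUG_ON(PageSlab(page));

                /* ... existing anon/file counter accounting continues here ... */
        }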
* Re: usbdev_mmap causes type confusion in page_table_check
From: David Hildenbrand @ 2023-05-08 21:37 UTC
To: Pasha Tatashin, Ruihan Li
Cc: syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, gregkh, linux-usb, syzkaller-bugs, Matthew Wilcox, Lorenzo Stoakes, Vlastimil Babka

On 08.05.23 23:27, Pasha Tatashin wrote:
> On Sun, May 7, 2023 at 9:58 AM Ruihan Li <lrh2000@pku.edu.cn> wrote:
>>
>> Hi all,
>
> Hi Ruihan,
>
> Thank you for bisecting, and great analysis of the problem.
>
>> Recently, syzbot reported [1] ("kernel BUG in page_table_check_clear"). After
>> some bisection, I found out that when usbdev_mmap calls remap_pfn_range on
>> kmalloc'ed memory, it causes type confusion between struct folio and slab in
>> page_table_check. As I will explain below, it seems that both the USB side
>> and the MM side need some fixes. So I have cc'd linux-usb and linux-mm here,
>> as well as their maintainers, to see whether there are any comments on the
>> proposed fixes.
>>
>> [1] https://lore.kernel.org/all/000000000000258e5e05fae79fc1@google.com/
>>
>> To handle mmap requests for /dev/bus/usb/XXX/YYY, usbdev_mmap first allocates
>> memory by calling usb_alloc_coherent and then maps it into the user space
>> using remap_pfn_range:
>>
>> static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
>> {
>>         // ...
>>         mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
>>                         &dma_handle);
>>         // ...
>>         if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
>>                 if (remap_pfn_range(vma, vma->vm_start,
>>                                 virt_to_phys(usbm->mem) >> PAGE_SHIFT,
>>                                 size, vma->vm_page_prot) < 0) {
>>                         // ...
>>                 }
>>         }
>>         // ...
>> }
>>
>> Note that in this case, we consider the DMA-unavailable scenario, which, to be
>> honest, is rare in practice. However, syzbot emulates USB devices using a
>> module named dummy_hcd, and since these devices are emulated, DMA is not
>> available at all.
>>
>> Meanwhile, usb_alloc_coherent calls hcd_buffer_alloc, which uses kmalloc for
>> memory allocation:
>>
>> void *hcd_buffer_alloc(
>>         struct usb_bus          *bus,
>>         size_t                  size,
>>         gfp_t                   mem_flags,
>>         dma_addr_t              *dma
>> )
>> {
>>         // ...
>>         if (!hcd_uses_dma(hcd)) {
>>                 *dma = ~(dma_addr_t) 0;
>>                 return kmalloc(size, mem_flags);
>>         }
>>         // ...
>> }
>>
>> However, during the update of the page table to map the kmalloc'd page into
>> the user space, page_table_check_set attempts to determine whether the page is
>> anonymous using PageAnon:
>>
>> static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
>>                                  unsigned long pfn, unsigned long pgcnt,
>>                                  bool rw)
>> {
>>         // ...
>>         anon = PageAnon(page);
>>         for (i = 0; i < pgcnt; i++) {
>>                 // ...
>>                 if (anon) {
>>                         BUG_ON(atomic_read(&ptc->file_map_count));
>>                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
>>                 } else {
>>                         BUG_ON(atomic_read(&ptc->anon_map_count));
>>                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
>>                 }
>>                 // ...
>>         }
>>         // ...
>> }
>>
>> This call to PageAnon is invalid for slab pages because slab reuses the bits
>> in struct page/folio to store its internal states, and the anonymity bit only
>> exists in struct page/folio. As a result, the counters are incorrectly updated
>> and checked in page_table_check_set and page_table_check_clear, leading to the
>> bug being raised.
>
> We should change anon boolean to be:
>
> anon = !PageSlab(page) && PageAnon(page);
>
> This way, we will have a correct way to determine anon pages, and the
> rest will fall into file_map_count. The file (non-anon) PTEs are OK to
> be double mapped, so there won't be any problems from page_table_check
> point of view even if it is a slab page.

I'm surprised that we allow mapping slab pages to user space. I somehow
remember that this is a big no-no.

Do we really want to allow that? Are there many such users or is this
the only one?

If we do support it, we might want to update some PageAnon() checks in
mm/gup.c too to exclude slab pages ...

-- 
Thanks,

David / dhildenb
* Re: usbdev_mmap causes type confusion in page_table_check
From: Christoph Hellwig @ 2023-05-09 13:25 UTC
To: Ruihan Li
Cc: syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, pasha.tatashin, gregkh, linux-usb, syzkaller-bugs

On Sun, May 07, 2023 at 09:58:44PM +0800, Ruihan Li wrote:
> static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
> {
>         // ...
>         mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
>                         &dma_handle);
>         // ...
>         if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
>                 if (remap_pfn_range(vma, vma->vm_start,
>                                 virt_to_phys(usbm->mem) >> PAGE_SHIFT,

usb_alloc_coherent ends up in the DMA coherent allocator (usually,
anyway), and you absolutely must never do a virt_to_phys or virt_to_page
on that return value. This code is buggy as f**k.
* Re: usbdev_mmap causes type confusion in page_table_check
From: Greg KH @ 2023-05-09 14:01 UTC
To: Christoph Hellwig
Cc: Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, pasha.tatashin, linux-usb, syzkaller-bugs

On Tue, May 09, 2023 at 06:25:42AM -0700, Christoph Hellwig wrote:
> On Sun, May 07, 2023 at 09:58:44PM +0800, Ruihan Li wrote:
> > static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> >         // ...
> >         mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
> >                         &dma_handle);
> >         // ...
> >         if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
> >                 if (remap_pfn_range(vma, vma->vm_start,
> >                                 virt_to_phys(usbm->mem) >> PAGE_SHIFT,
>
> usb_alloc_coherent ends up in the DMA coherent allocator (usually,
> anyway), and you absolutely must never do a virt_to_phys or virt_to_page
> on that return value. This code is buggy as f**k.

Odd, you gave it a reviewed-by: in commit a0e710a7def4 ("USB: usbfs: fix
mmap dma mismatch") back in 2020 when it was merged as you said that was
the way to fix this up. :)

Do you have a better way to do it now that is more correct? Did some
DMA changes happen that missed this codepath getting fixed up?

thanks,

greg k-h
* Re: usbdev_mmap causes type confusion in page_table_check
From: Christoph Hellwig @ 2023-05-10 13:17 UTC
To: Greg KH
Cc: Christoph Hellwig, Ruihan Li, syzbot+fcf1a817ceb50935ce99, akpm, linux-kernel, linux-mm, pasha.tatashin, linux-usb, syzkaller-bugs

On Tue, May 09, 2023 at 04:01:02PM +0200, Greg KH wrote:
> > >         mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
> > >                         &dma_handle);
> > >         // ...
> > >         if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
> > >                 if (remap_pfn_range(vma, vma->vm_start,
> > >                                 virt_to_phys(usbm->mem) >> PAGE_SHIFT,
> >
> > usb_alloc_coherent ends up in the DMA coherent allocator (usually,
> > anyway), and you absolutely must never do a virt_to_phys or virt_to_page
> > on that return value. This code is buggy as f**k.
>
> Odd, you gave it a reviewed-by: in commit a0e710a7def4 ("USB: usbfs: fix
> mmap dma mismatch") back in 2020 when it was merged as you said that was
> the way to fix this up. :)
>
> Do you have a better way to do it now that is more correct? Did some
> DMA changes happen that missed this codepath getting fixed up?

Sorry, I should not have shouted as quickly. The code is clearly
guarded by the same conditional that makes it not use the DMA API, so
from the DMA API POV this is actually correct, just ugly.

The fix for the actual remap_file_ranges thing is probably something
like this:

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index fbb087b728dc98..be56eba2558814 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -131,7 +131,7 @@ void *hcd_buffer_alloc(
         /* some USB hosts just use PIO */
         if (!hcd_uses_dma(hcd)) {
                 *dma = ~(dma_addr_t) 0;
-                return kmalloc(size, mem_flags);
+                return (void *)__get_free_pages(get_order(size), mem_flags);
         }
 
         for (i = 0; i < HCD_BUFFER_POOLS; i++) {
@@ -160,7 +160,7 @@ void hcd_buffer_free(
         }
 
         if (!hcd_uses_dma(hcd)) {
-                kfree(addr);
+                free_pages((unsigned long)addr, get_order(size));
                 return;
         }
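As a self-contained sketch of the same page-based approach (note that the
kernel's prototype is __get_free_pages(gfp_mask, order), with the gfp mask as
the first argument), the PIO-only allocation and free paths pair up roughly as
follows; the helper names are hypothetical and error handling is elided:

        #include <linux/gfp.h>

        /* illustrative helpers for the !hcd_uses_dma() (PIO-only) case */
        static void *hcd_pio_buffer_alloc(size_t size, gfp_t mem_flags)
        {
                /* whole pages, so the buffer is safe to hand to remap_pfn_range() */
                return (void *)__get_free_pages(mem_flags, get_order(size));
        }

        static void hcd_pio_buffer_free(void *addr, size_t size)
        {
                if (addr)
                        free_pages((unsigned long)addr, get_order(size));
        }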
End of thread.

Thread overview: 16+ messages
2023-05-05  0:46 [syzbot] [mm?] kernel BUG in page_table_check_clear syzbot
2023-05-07 13:58 ` usbdev_mmap causes type confusion in page_table_check Ruihan Li
2023-05-08 21:27   ` Pasha Tatashin
2023-05-08 21:36     ` Matthew Wilcox
2023-05-08 21:48       ` Pasha Tatashin
2023-05-08 21:52         ` Matthew Wilcox
2023-05-08 21:55           ` Pasha Tatashin
2023-05-08 22:46             ` David Hildenbrand
2023-05-08 23:17               ` Pasha Tatashin
2023-05-08 23:21                 ` Pasha Tatashin
2023-05-08 23:37                   ` David Hildenbrand
2023-05-09  0:07                     ` Pasha Tatashin
2023-05-08 21:37   ` David Hildenbrand
2023-05-09 13:25 ` Christoph Hellwig
2023-05-09 14:01   ` Greg KH
2023-05-10 13:17     ` Christoph Hellwig