* Re: Bug report: vfio over kernel 5.19 - mm area
From: Yishai Hadas @ 2022-06-15 10:52 UTC
To: Alex Williamson, akpm
Cc: Jason Gunthorpe, Maor Gottlieb, kvm, idok, linux-mm

Adding some extra relevant people from the MM area.

On 15/06/2022 13:43, Yishai Hadas wrote:
> Hi All,
>
> Any idea what could cause the breakage below in 5.19? We run QEMU and
> the machine gets stuck immediately.
>
> Running echo l > /proc/sysrq-trigger shows the task below, which
> seems to be stuck.
>
> This basic flow worked fine in 5.18.
>
> [1162.056583] NMI backtrace for cpu 4
> [ 1162.056585] CPU: 4 PID: 1979 Comm: qemu-system-x86 Not tainted 5.19.0-rc1 #747
> [ 1162.056587] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [ 1162.056588] RIP: 0010:pmd_huge+0x0/0x20
> [ 1162.056592] Code: 49 89 44 24 28 48 8b 47 30 49 89 44 24 30 31 c0 41 5c c3 5b b8 01 00 00 00 5d 41 5c c3 cc cc cc cc cc cc cc cc cc cc cc cc cc <0f> 1f 44 00 00 31 c0 48 f7 c7 9f ff ff ff 74 0f 81 e7 81 00 00 00
> [ 1162.056594] RSP: 0018:ffff888146253b38 EFLAGS: 00000202
> [ 1162.056596] RAX: ffff888101461980 RBX: ffff888146253bc0 RCX: 000ffffffffff000
> [ 1162.056597] RDX: ffff88814fa22000 RSI: 00007f9f68231000 RDI: 000000010a6b6067
> [ 1162.056598] RBP: ffff888111b90dc0 R08: 000000000002f424 R09: 0000000000000001
> [ 1162.056599] R10: ffffffff825c2a40 R11: 0000000000000a08 R12: ffff88814fa22a08
> [ 1162.056600] R13: 000000010a6b6067 R14: 0000000000052202 R15: 00007f9f68231000
> [ 1162.056602] FS: 00007f9f6c228c40(0000) GS:ffff88885f900000(0000) knlGS:0000000000000000
> [ 1162.056605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1162.056606] CR2: 00005643994fd0ed CR3: 00000001496da005 CR4: 0000000000372ea0
> [ 1162.056607] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1162.056609] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1162.056610] Call Trace:
> [ 1162.056611] <TASK>
> [ 1162.056611] follow_page_mask+0x196/0x5e0
> [ 1162.056615] __get_user_pages+0x190/0x5d0
> [ 1162.056617] ? flush_workqueue_prep_pwqs+0x110/0x110
> [ 1162.056620] __gup_longterm_locked+0xaf/0x470
> [ 1162.056624] vaddr_get_pfns+0x8e/0x240 [vfio_iommu_type1]
> [ 1162.056628] ? qi_flush_iotlb+0x83/0xa0
> [ 1162.056631] vfio_pin_pages_remote+0x326/0x460 [vfio_iommu_type1]
> [ 1162.056634] vfio_iommu_type1_ioctl+0x421/0x14f0 [vfio_iommu_type1]
> [ 1162.056638] __x64_sys_ioctl+0x3e4/0x8e0
> [ 1162.056641] do_syscall_64+0x3d/0x90
> [ 1162.056644] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1162.056646] RIP: 0033:0x7f9f6d14317b
> [ 1162.056648] Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> [ 1162.056650] RSP: 002b:00007fff4fca15b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 1162.056652] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f9f6d14317b
> [ 1162.056653] RDX: 00007fff4fca1620 RSI: 0000000000003b71 RDI: 000000000000001c
> [ 1162.056654] RBP: 00007fff4fca1650 R08: 0000000000000001 R09: 0000000000000000
> [ 1162.056655] R10: 0000000100000000 R11: 0000000000000246 R12: 0000000000000000
> [ 1162.056656] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 1162.056657] </TASK>
>
> Yishai

* Re: Bug report: vfio over kernel 5.19 - mm area
From: Joao Martins @ 2022-06-15 13:59 UTC
To: Yishai Hadas
Cc: Jason Gunthorpe, Maor Gottlieb, kvm, idok, linux-mm, Alex Williamson, akpm

On 6/15/22 11:52, Yishai Hadas wrote:
> Adding some extra relevant people from the MM area.
>
> On 15/06/2022 13:43, Yishai Hadas wrote:
>> Hi All,
>>
>> Any idea what could cause the breakage below in 5.19? We run QEMU and
>> the machine gets stuck immediately.
>>
>> Running echo l > /proc/sysrq-trigger shows the task below, which
>> seems to be stuck.
>>
>> This basic flow worked fine in 5.18.
>>

Maybe this one:

https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen/

... but I think it's not yet merged for v5.19:

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/log/?h=mm-hotfixes-unstable

* Re: Bug report: vfio over kernel 5.19 - mm area
From: Alex Williamson @ 2022-06-15 14:02 UTC
To: Yishai Hadas
Cc: akpm, Jason Gunthorpe, Maor Gottlieb, kvm, idok, linux-mm

On Wed, 15 Jun 2022 13:52:10 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Adding some extra relevant people from the MM area.
>
> On 15/06/2022 13:43, Yishai Hadas wrote:
> > Hi All,
> >
> > Any idea what could cause the breakage below in 5.19? We run QEMU and
> > the machine gets stuck immediately.
> >
> > Running echo l > /proc/sysrq-trigger shows the task below, which
> > seems to be stuck.
> >
> > This basic flow worked fine in 5.18.

Spent Friday bisecting this and posted this fix:

https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen/

I expect you're hitting the same issue.  Thanks,

Alex

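The linked patch is not quoted in this thread. Yishai's later August message describes the bug as is_zero_pfn() ending up "under the !(..)" again, and the line removed by his diff [1] shows the post-fix form. Assuming the 5.19 regression had that same shape, the difference can be modelled with two booleans. This is a userspace sketch; the struct and function names are hypothetical and it is not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the kernel helpers; not kernel code. */
struct page_model {
	bool zone_movable; /* models is_zone_movable_page(page) */
	bool zero_pfn;     /* models is_zero_pfn(page_to_pfn(page)) */
};

/* Regressed shape: the zero-pfn test sits under the negation, so the
 * shared zero page is reported as NOT pinnable. */
static bool pinnable_regressed(struct page_model p)
{
	return !(p.zone_movable || p.zero_pfn);
}

/* Fixed shape, as in the line removed by diff [1]: refuse ZONE_MOVABLE
 * pages, but always allow the zero pfn. */
static bool pinnable_fixed(struct page_model p)
{
	return !p.zone_movable || p.zero_pfn;
}

/* The two shapes differ exactly when zero_pfn is set. */
static void check_shapes(void)
{
	const struct page_model zero = { .zero_pfn = true };
	const struct page_model movable = { .zone_movable = true };
	const struct page_model normal = { .zone_movable = false };

	assert(!pinnable_regressed(zero) && pinnable_fixed(zero));
	assert(pinnable_regressed(movable) == pinnable_fixed(movable));
	assert(pinnable_regressed(normal) == pinnable_fixed(normal));
}
```

Keeping is_zero_pfn() outside the negation is what turns it into an exception ("pinnable anyway") rather than one more reason to refuse the page.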
* Re: Bug report: vfio over kernel 5.19 - mm area
From: Yi Liu @ 2022-06-15 14:14 UTC
To: Alex Williamson, Yishai Hadas
Cc: akpm, Jason Gunthorpe, Maor Gottlieb, kvm, idok, linux-mm

Hi Alex,

On 2022/6/15 22:02, Alex Williamson wrote:
> On Wed, 15 Jun 2022 13:52:10 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Adding some extra relevant people from the MM area.
>>
>> On 15/06/2022 13:43, Yishai Hadas wrote:
>>> Hi All,
>>>
>>> Any idea what could cause the breakage below in 5.19? We run QEMU and
>>> the machine gets stuck immediately.
>>>
>>> Running echo l > /proc/sysrq-trigger shows the task below, which
>>> seems to be stuck.
>>>
>>> This basic flow worked fine in 5.18.
>
> Spent Friday bisecting this and posted this fix:
>
> https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen/
>
> I expect you're hitting the same issue.  Thanks,

I also hit a hang when calling pin_user_pages_remote() in
vaddr_get_pfns(). The fix in the link resolves it. You may add my
Tested-by to your fix. :-)

--
Regards,
Yi Liu

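The backtrace shows the CPU inside follow_page_mask() under __gup_longterm_locked(), and Yi Liu saw the hang in pin_user_pages_remote(). One plausible reading, which is an assumption and is not confirmed in this thread, is a long-term pin path that keeps retrying because one page never passes the pinnable check. The hypothetical model below shows the idea: a failing movable page can be "migrated" so it passes on the next pass, but the shared zero page cannot, so only the regressed predicate spins. The retry cap exists only so the model terminates; it is not a parameter of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page flags; not the kernel's struct page. */
struct page_model {
	bool zone_movable; /* models is_zone_movable_page(page) */
	bool zero_pfn;     /* models is_zero_pfn(page_to_pfn(page)) */
};

typedef bool (*pinnable_fn)(struct page_model);

/* Regressed shape: the zero-pfn test sits under the negation. */
static bool pinnable_regressed(struct page_model p)
{
	return !(p.zone_movable || p.zero_pfn);
}

/* Fixed shape: only ZONE_MOVABLE pages are refused; the zero pfn is allowed. */
static bool pinnable_fixed(struct page_model p)
{
	return !p.zone_movable || p.zero_pfn;
}

/*
 * Retry until every page passes the pinnable check. A failing page that
 * is not the zero page is "migrated" out of ZONE_MOVABLE; the zero page
 * cannot be. Returns the number of passes used, or -1 once max_passes
 * is exhausted.
 */
static int pin_all_model(struct page_model *pages, size_t n,
			 pinnable_fn pinnable, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		bool all_ok = true;

		for (size_t i = 0; i < n; i++) {
			if (pinnable(pages[i]))
				continue;
			all_ok = false;
			if (!pages[i].zero_pfn)
				pages[i].zone_movable = false; /* "migrate" it */
		}
		if (all_ok)
			return pass;
	}
	return -1;
}

/* A movable page, the zero page and an ordinary page in one range. */
static void check_retry_model(void)
{
	struct page_model a[] = { { true, false }, { false, true }, { false, false } };
	struct page_model b[] = { { true, false }, { false, true }, { false, false } };

	assert(pin_all_model(a, 3, pinnable_regressed, 1000) == -1); /* never converges */
	assert(pin_all_model(b, 3, pinnable_fixed, 1000) == 2);      /* one migration pass */
}
```

In this model the movable page is handled after one extra pass with either predicate; only the zero page distinguishes a finite retry from an endless one.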
* Re: Bug report: vfio over kernel 5.19 - mm area
From: Yishai Hadas @ 2022-06-15 14:22 UTC
To: Yi Liu, Alex Williamson
Cc: akpm, Jason Gunthorpe, Maor Gottlieb, kvm, idok, linux-mm

On 15/06/2022 17:14, Yi Liu wrote:
> Hi Alex,
>
> On 2022/6/15 22:02, Alex Williamson wrote:
>> On Wed, 15 Jun 2022 13:52:10 +0300
>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>
>>> Adding some extra relevant people from the MM area.
>>>
>>> On 15/06/2022 13:43, Yishai Hadas wrote:
>>>> Hi All,
>>>>
>>>> Any idea what could cause the breakage below in 5.19? We run QEMU and
>>>> the machine gets stuck immediately.
>>>>
>>>> Running echo l > /proc/sysrq-trigger shows the task below, which
>>>> seems to be stuck.
>>>>
>>>> This basic flow worked fine in 5.18.
>>
>> Spent Friday bisecting this and posted this fix:
>>
>> https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen/
>>
>> I expect you're hitting the same issue.  Thanks,
>
> I also hit a hang when calling pin_user_pages_remote() in
> vaddr_get_pfns(). The fix in the link resolves it. You may add my
> Tested-by to your fix. :-)

Thanks Alex, it seems to be the same issue; with your fix I no longer
hit the problem.

* Re: Bug report: vfio over kernel 5.19 - mm area
From: Yishai Hadas @ 2022-08-15 15:46 UTC
To: Alex Williamson, alex.sierra
Cc: akpm, Jason Gunthorpe, Maor Gottlieb, kvm, idok, linux-mm

On 15/06/2022 17:02, Alex Williamson wrote:
> On Wed, 15 Jun 2022 13:52:10 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Adding some extra relevant people from the MM area.
>>
>> On 15/06/2022 13:43, Yishai Hadas wrote:
>>> Hi All,
>>>
>>> Any idea what could cause the breakage below in 5.19? We run QEMU and
>>> the machine gets stuck immediately.
>>>
>>> Running echo l > /proc/sysrq-trigger shows the task below, which
>>> seems to be stuck.
>>>
>>> This basic flow worked fine in 5.18.
> Spent Friday bisecting this and posted this fix:
>
> https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen/
>
> I expect you're hitting the same issue.  Thanks,
>
> Alex

Alex,

It seems we hit the same bug again in v6.0-rc1.

The code below [1], from commit [2], moved the is_zero_pfn() check back
under the !(...) negation, which looks buggy.

I would expect the fix below [3].

Alex Sierra,

Can you please review the suggested fix below for your patch and send a
patch for rc2 accordingly?

Yishai

[1] See: https://elixir.bootlin.com/linux/v6.0-rc1/source/include/linux/mm.h#L1549

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a2d01e49253b..64393ed3330a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -28,6 +28,7 @@
 #include <linux/sched.h>
 #include <linux/pgtable.h>
 #include <linux/kasan.h>
+#include <linux/memremap.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -1537,7 +1538,9 @@ static inline bool is_longterm_pinnable_page(struct page *page)
 	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
 		return false;
 #endif
-	return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page));
+	return !(is_device_coherent_page(page) ||
+		 is_zone_movable_page(page) ||
+		 is_zero_pfn(page_to_pfn(page)));
 }

[2] f25cbb7a95a24ff9a2a3bebd308e303942ae6b2c
Author: Alex Sierra <alex.sierra@amd.com>
Date:   Fri Jul 15 10:05:10 2022 -0500

    mm: add zone device coherent type memory support

[3] Expected fix

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3bedc449c14d..b25f9886bd4c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1544,9 +1544,9 @@ static inline bool is_longterm_pinnable_page(struct page *page)
 	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
 		return false;
 #endif
-	return !(is_device_coherent_page(page) ||
-		 is_zone_movable_page(page) ||
-		 is_zero_pfn(page_to_pfn(page)));
+	return !is_device_coherent_page(page) ||
+		!is_zone_movable_page(page) ||
+		is_zero_pfn(page_to_pfn(page));
 }
 #else
 static inline bool is_longterm_pinnable_page(struct page *page)

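The return expressions in [1] and [3] can be compared with a small truth table. This is a userspace sketch with hypothetical names, not kernel code, and it assumes that a device-coherent page (ZONE_DEVICE memory) is never also a ZONE_MOVABLE page. Under that assumption the 6.0-rc1 expression refuses the zero page, as reported here. The proposed expression [3] accepts the zero page, but because `!is_device_coherent_page(page) || !is_zone_movable_page(page)` is true for any page that is not both, it would also accept device-coherent and ordinary movable pages:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the helpers used in diffs [1] and [3]. */
struct page_model {
	bool device_coherent; /* models is_device_coherent_page(page) */
	bool zone_movable;    /* models is_zone_movable_page(page) */
	bool zero_pfn;        /* models is_zero_pfn(page_to_pfn(page)) */
};

/* Return expression from the 6.0-rc1 code in [1]. */
static bool pinnable_rc1(struct page_model p)
{
	return !(p.device_coherent || p.zone_movable || p.zero_pfn);
}

/* Return expression from the proposed fix in [3]. */
static bool pinnable_proposed(struct page_model p)
{
	return !p.device_coherent || !p.zone_movable || p.zero_pfn;
}

/* Example pages; none is both device-coherent and ZONE_MOVABLE. */
static void check_expressions(void)
{
	const struct page_model zero = { .zero_pfn = true };
	const struct page_model coherent = { .device_coherent = true };
	const struct page_model movable = { .zone_movable = true };
	const struct page_model normal = { .device_coherent = false };

	/* rc1 refuses the zero page: the regression reported in this thread. */
	assert(!pinnable_rc1(zero));
	/* [3] accepts the zero page... */
	assert(pinnable_proposed(zero));
	/* ...but also coherent and movable pages, which rc1 refuses. */
	assert(!pinnable_rc1(coherent) && pinnable_proposed(coherent));
	assert(!pinnable_rc1(movable) && pinnable_proposed(movable));
	/* Ordinary pages are pinnable under both. */
	assert(pinnable_rc1(normal) && pinnable_proposed(normal));
}
```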
* Re: Bug report: vfio over kernel 5.19 - mm area
From: Alex Williamson @ 2022-08-15 17:52 UTC
To: Yishai Hadas, idok
Cc: alex.sierra, akpm, Jason Gunthorpe, Maor Gottlieb, kvm, linux-mm

On Mon, 15 Aug 2022 18:46:40 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Alex,
>
> It seems we hit the same bug again in v6.0-rc1.
>
> The code below [1], from commit [2], moved the is_zero_pfn() check back
> under the !(...) negation, which looks buggy.
>
> I would expect the fix below [3].
>
> Alex Sierra,
>
> Can you please review the suggested fix below for your patch and send a
> patch for rc2 accordingly?

https://lore.kernel.org/all/166015037385.760108.16881097713975517242.stgit@omen/

It's in the mm tree; hopefully it'll get pushed in an early rc.  Thanks,

Alex

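The patch linked above is not quoted in this thread. One way to state the rules so that the zero-pfn exception cannot slip under a negation again is to check the cases in order: allow the zero pfn, refuse device-coherent pages, and otherwise refuse only ZONE_MOVABLE pages. This is an assumption about the intended behaviour, written as a userspace model with hypothetical names; it is not a copy of that patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the kernel helpers; not kernel code. */
struct page_model {
	bool device_coherent; /* models is_device_coherent_page(page) */
	bool zone_movable;    /* models is_zone_movable_page(page) */
	bool zero_pfn;        /* models is_zero_pfn(page_to_pfn(page)) */
};

/* Ordered checks: each rule returns early, so no exception sits under a
 * shared negation. */
static bool pinnable_ordered(struct page_model p)
{
	if (p.zero_pfn)
		return true;        /* the zero pfn may always be pinned */
	if (p.device_coherent)
		return false;       /* device-coherent pages are refused */
	return !p.zone_movable; /* otherwise refuse only ZONE_MOVABLE pages */
}

/* The four page kinds discussed in the thread. */
static void check_ordered(void)
{
	assert(pinnable_ordered((struct page_model){ .zero_pfn = true }));
	assert(!pinnable_ordered((struct page_model){ .device_coherent = true }));
	assert(!pinnable_ordered((struct page_model){ .zone_movable = true }));
	assert(pinnable_ordered((struct page_model){ .zone_movable = false }));
}
```

Early returns make each policy decision its own line, which is easier to review than a single compound boolean expression.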
Thread overview: 7+ messages
[not found] <a99ed393-3b17-887f-a1f8-a288da9108a0@nvidia.com>
2022-06-15 10:52 ` Bug report: vfio over kernel 5.19 - mm area Yishai Hadas
2022-06-15 13:59 ` Joao Martins
2022-06-15 14:02 ` Alex Williamson
2022-06-15 14:14 ` Yi Liu
2022-06-15 14:22 ` Yishai Hadas
2022-08-15 15:46 ` Yishai Hadas
2022-08-15 17:52 ` Alex Williamson