* Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
       [not found] <1586138158.v5u7myprlp.none.ref@localhost>
@ 2020-04-06 19:51 ` Alex Xu (Hello71)
  2020-04-06 20:25   ` Thomas Hellström (VMware)
  2020-04-06 21:04   ` Thomas Hellström (VMware)
  0 siblings, 2 replies; 7+ messages in thread
From: Alex Xu (Hello71) @ 2020-04-06 19:51 UTC
To: linux-mm, dri-devel, linux-kernel, thomas_os
Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton, Michal Hocko,
	Matthew Wilcox (Oracle), Kirill A. Shutemov, Ralph Campbell,
	Jérôme Glisse, Christian König, Dan Williams, Roland Scheidegger

Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
start filling dmesg, and then closing programs causes more BUGs and
hangs, and then everything grinds to a halt (can't start more programs,
can't even reboot through systemd).

Using master and reverting that branch up to that point fixes the
problem.

I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
board with IOMMU enabled.
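[Editor's sketch] The two messages quoted in the report are consistency complaints raised when an address space is torn down while some accounting counter is still non-zero. The following is a minimal userspace model of that kind of check, not kernel code: `model_mm`, `counter_kind` and `model_check_mm` are hypothetical names, and the printed text only imitates the strings quoted in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter kinds for this model; not the kernel's own enum. */
enum counter_kind {
	COUNTER_FILE,
	COUNTER_ANON,
	COUNTER_SWAP,
	COUNTER_SHMEM,
	NR_COUNTERS
};

/* Hypothetical stand-in for the per-address-space accounting state. */
struct model_mm {
	long rss[NR_COUNTERS];	/* pages still accounted, per kind */
	long pgtables_bytes;	/* page-table bytes still accounted */
};

/*
 * At teardown every counter should be back at zero. Report each one
 * that is not and return the number of inconsistencies found.
 */
int model_check_mm(const struct model_mm *mm)
{
	int problems = 0;

	assert(mm != NULL);

	for (int i = 0; i < NR_COUNTERS; i++) {
		if (mm->rss[i] != 0) {
			fprintf(stderr,
				"BUG: Bad rss-counter state type:%d val:%ld\n",
				i, mm->rss[i]);
			problems++;
		}
	}

	if (mm->pgtables_bytes != 0) {
		fprintf(stderr,
			"BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
			mm->pgtables_bytes);
		problems++;
	}

	return problems;
}
```

In this model, a fault path that maps or unmaps pages without updating the counters symmetrically would leave a non-zero value behind, and it would only be noticed when the address space is freed. That would be consistent with the report that closing programs produces more of these messages, though the thread itself does not identify the root cause.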
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2020-04-06 19:51 ` Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults Alex Xu (Hello71)
@ 2020-04-06 20:25   ` Thomas Hellström (VMware)
  2020-04-06 21:04   ` Thomas Hellström (VMware)
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Hellström (VMware) @ 2020-04-06 20:25 UTC
To: Alex Xu (Hello71), linux-mm, dri-devel, linux-kernel
Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton, Michal Hocko,
	Matthew Wilcox (Oracle), Kirill A. Shutemov, Ralph Campbell,
	Jérôme Glisse, Christian König, Dan Williams, Roland Scheidegger

On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
> start filling dmesg, and then closing programs causes more BUGs and
> hangs, and then everything grinds to a halt (can't start more programs,
> can't even reboot through systemd).
>
> Using master and reverting that branch up to that point fixes the
> problem.
>
> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
> board with IOMMU enabled.

Hmm. That sounds bad. Could you send a copy of your config?

Meanwhile, I'll prepare a small patch that disables the non-vmwgfx
huge_fault() until we've figured out what's happening.

/Thomas
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2020-04-06 19:51 ` Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults Alex Xu (Hello71)
  2020-04-06 20:25   ` Thomas Hellström (VMware)
@ 2020-04-06 21:04   ` Thomas Hellström (VMware)
  2020-04-07 0:38     ` Alex Xu (Hello71)
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Hellström (VMware) @ 2020-04-06 21:04 UTC
To: Alex Xu (Hello71), linux-mm, dri-devel, linux-kernel
Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton, Michal Hocko,
	Matthew Wilcox (Oracle), Kirill A. Shutemov, Ralph Campbell,
	Jérôme Glisse, Christian König, Dan Williams, Roland Scheidegger

[-- Attachment #1: Type: text/plain, Size: 631 bytes --]

Hi,

On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
> start filling dmesg, and then closing programs causes more BUGs and
> hangs, and then everything grinds to a halt (can't start more programs,
> can't even reboot through systemd).
>
> Using master and reverting that branch up to that point fixes the
> problem.
>
> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
> board with IOMMU enabled.

If you could try the attached patch, that'd be great!

Thanks,

Thomas

[-- Attachment #2: 0001-drm-ttm-Temporarily-disable-the-huge_fault-callback.patch --]
[-- Type: text/x-patch, Size: 2774 bytes --]

From b630b9b4dcc1d01514d97a84cbb7f0cb85333154 Mon Sep 17 00:00:00 2001
From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
Date: Mon, 6 Apr 2020 22:55:13 +0200
Subject: [PATCH] drm/ttm: Temporarily disable the huge_fault() callback

Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 63 ---------------------------------
 1 file changed, 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6ee3b96f0d13..0ad30b112982 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -442,66 +442,6 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/**
- * ttm_pgprot_is_wrprotecting - Is a page protection value write-protecting?
- * @prot: The page protection value
- *
- * Return: true if @prot is write-protecting. false otherwise.
- */
-static bool ttm_pgprot_is_wrprotecting(pgprot_t prot)
-{
-	/*
-	 * This is meant to say "pgprot_wrprotect(prot) == prot" in a generic
-	 * way. Unfortunately there is no generic pgprot_wrprotect.
-	 */
-	return pte_val(pte_wrprotect(__pte(pgprot_val(prot)))) ==
-		pgprot_val(prot);
-}
-
-static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
-				       enum page_entry_size pe_size)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	pgprot_t prot;
-	struct ttm_buffer_object *bo = vma->vm_private_data;
-	vm_fault_t ret;
-	pgoff_t fault_page_size = 0;
-	bool write = vmf->flags & FAULT_FLAG_WRITE;
-
-	switch (pe_size) {
-	case PE_SIZE_PMD:
-		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
-		break;
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-	case PE_SIZE_PUD:
-		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
-		break;
-#endif
-	default:
-		WARN_ON_ONCE(1);
-		return VM_FAULT_FALLBACK;
-	}
-
-	/* Fallback on write dirty-tracking or COW */
-	if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot))
-		return VM_FAULT_FALLBACK;
-
-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
-
-	prot = vm_get_page_prot(vma->vm_flags);
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-		return ret;
-
-	dma_resv_unlock(bo->base.resv);
-
-	return ret;
-}
-#endif
-
 void ttm_bo_vm_open(struct vm_area_struct *vma)
 {
 	struct ttm_buffer_object *bo = vma->vm_private_data;
@@ -604,9 +544,6 @@ static const struct vm_operations_struct ttm_bo_vm_ops = {
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
 	.access = ttm_bo_vm_access,
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	.huge_fault = ttm_bo_vm_huge_fault,
-#endif
 };
 
 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,
-- 
2.21.1
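[Editor's sketch] The removed handler relies on two small ideas: a protection value counts as write-protecting if write-protecting it again changes nothing, and a huge fault covers (huge page size >> PAGE_SHIFT) base pages. The userspace model below illustrates both. The bit layout and the shift values (4 KiB base pages, 2 MiB PMD-level pages) are assumptions chosen for illustration, and every `model_` / `MODEL_` name is hypothetical rather than a kernel identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protection bits; real layouts are architecture-specific. */
#define MODEL_PROT_PRESENT	(UINT64_C(1) << 0)
#define MODEL_PROT_WRITE	(UINT64_C(1) << 1)

/* Assumed sizes for this sketch: 4 KiB base pages, 2 MiB PMD huge pages. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PMD_SHIFT		21

/* Clear the write bit, leaving all other bits untouched. */
static uint64_t model_wrprotect(uint64_t prot)
{
	return prot & ~MODEL_PROT_WRITE;
}

/* A value is write-protecting if write-protecting it is a no-op. */
bool model_prot_is_wrprotecting(uint64_t prot)
{
	return model_wrprotect(prot) == prot;
}

/* Number of base pages covered by one PMD-sized fault. */
unsigned long model_pmd_fault_pages(void)
{
	assert(MODEL_PMD_SHIFT > MODEL_PAGE_SHIFT);
	return (1UL << MODEL_PMD_SHIFT) >> MODEL_PAGE_SHIFT;
}

/*
 * Mirrors the "Fallback on write dirty-tracking or COW" test in the
 * patch: a write fault on a write-protecting mapping is not handled
 * with a huge entry and falls back to the small-page path.
 */
bool model_must_fall_back(bool write_fault, uint64_t vma_prot)
{
	return write_fault && model_prot_is_wrprotecting(vma_prot);
}
```

Under these assumed sizes a PMD-sized fault spans 512 base pages. The idempotence test avoids needing a dedicated "is write-protected" query, which matches the comment in the patch that no generic pgprot_wrprotect exists.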
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2020-04-06 21:04   ` Thomas Hellström (VMware)
@ 2020-04-07 0:38     ` Alex Xu (Hello71)
  2020-04-07 11:26       ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Xu (Hello71) @ 2020-04-07 0:38 UTC
To: dri-devel, linux-kernel, linux-mm, Thomas Hellström (VMware)
Cc: Andrew Morton, Christian König, Dan Williams, Jérôme Glisse,
	Kirill A. Shutemov, linux-graphics-maintainer, Michal Hocko,
	pv-drivers, Ralph Campbell, Roland Scheidegger,
	Matthew Wilcox (Oracle)

Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
> Hi,
>
> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>> start filling dmesg, and then closing programs causes more BUGs and
>> hangs, and then everything grinds to a halt (can't start more programs,
>> can't even reboot through systemd).
>>
>> Using master and reverting that branch up to that point fixes the
>> problem.
>>
>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>> board with IOMMU enabled.
>
> If you could try the attached patch, that'd be great!
>
> Thanks,
>
> Thomas
>

Yeah, that works too. Kernel config sent off-list.

Regards,
Alex.
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2020-04-07 0:38     ` Alex Xu (Hello71)
@ 2020-04-07 11:26       ` Thomas Hellström (VMware)
  2020-04-07 15:36         ` Alex Xu (Hello71)
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Hellström (VMware) @ 2020-04-07 11:26 UTC
To: Alex Xu (Hello71), dri-devel, linux-kernel, linux-mm
Cc: Andrew Morton, Christian König, Dan Williams, Jérôme Glisse,
	Kirill A. Shutemov, linux-graphics-maintainer, Michal Hocko,
	pv-drivers, Ralph Campbell, Roland Scheidegger,
	Matthew Wilcox (Oracle)

On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>> Hi,
>>
>> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>>> start filling dmesg, and then closing programs causes more BUGs and
>>> hangs, and then everything grinds to a halt (can't start more programs,
>>> can't even reboot through systemd).
>>>
>>> Using master and reverting that branch up to that point fixes the
>>> problem.
>>>
>>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>>> board with IOMMU enabled.
>> If you could try the attached patch, that'd be great!
>>
>> Thanks,
>>
>> Thomas
>>
> Yeah, that works too. Kernel config sent off-list.
>
> Regards,
> Alex.

Thanks. Do you want me to add your

Reported-by: and Tested-by: To this patch?

/Thomas
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2020-04-07 11:26       ` Thomas Hellström (VMware)
@ 2020-04-07 15:36         ` Alex Xu (Hello71)
  2020-04-07 19:57           ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Xu (Hello71) @ 2020-04-07 15:36 UTC
To: dri-devel, linux-kernel, linux-mm, Thomas Hellström (VMware)
Cc: Andrew Morton, Christian König, Dan Williams, Jérôme Glisse,
	Kirill A. Shutemov, linux-graphics-maintainer, Michal Hocko,
	pv-drivers, Ralph Campbell, Roland Scheidegger,
	Matthew Wilcox (Oracle)

Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
> On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
>> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>>> Hi,
>>>
>>> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>>>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>>>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>>>> start filling dmesg, and then closing programs causes more BUGs and
>>>> hangs, and then everything grinds to a halt (can't start more programs,
>>>> can't even reboot through systemd).
>>>>
>>>> Using master and reverting that branch up to that point fixes the
>>>> problem.
>>>>
>>>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>>>> board with IOMMU enabled.
>>> If you could try the attached patch, that'd be great!
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>> Yeah, that works too. Kernel config sent off-list.
>>
>> Regards,
>> Alex.
>
> Thanks. Do you want me to add your
>
> Reported-by: and Tested-by: To this patch?
>
> /Thomas
>
>

Sure. Shouldn't we fix it properly though?
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2020-04-07 15:36         ` Alex Xu (Hello71)
@ 2020-04-07 19:57           ` Thomas Hellström (VMware)
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Hellström (VMware) @ 2020-04-07 19:57 UTC
To: Alex Xu (Hello71), dri-devel, linux-kernel, linux-mm
Cc: Andrew Morton, Christian König, Dan Williams, Jérôme Glisse,
	Kirill A. Shutemov, linux-graphics-maintainer, Michal Hocko,
	pv-drivers, Ralph Campbell, Roland Scheidegger,
	Matthew Wilcox (Oracle)

On 4/7/20 5:36 PM, Alex Xu (Hello71) wrote:
> Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
>> On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
>>> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>>>> Hi,
>>>>
>>>> On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
>>>>> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
>>>>> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
>>>>> start filling dmesg, and then closing programs causes more BUGs and
>>>>> hangs, and then everything grinds to a halt (can't start more programs,
>>>>> can't even reboot through systemd).
>>>>>
>>>>> Using master and reverting that branch up to that point fixes the
>>>>> problem.
>>>>>
>>>>> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
>>>>> board with IOMMU enabled.
>>>> If you could try the attached patch, that'd be great!
>>>>
>>>> Thanks,
>>>>
>>>> Thomas
>>>>
>>> Yeah, that works too. Kernel config sent off-list.
>>>
>>> Regards,
>>> Alex.
>> Thanks. Do you want me to add your
>>
>> Reported-by: and Tested-by: To this patch?
>>
>> /Thomas
>>
>>
> Sure. Shouldn't we fix it properly though?

It's still enabled for vmwgfx for which it is reasonably well tested and
where I can't see any such errors. The code we remove with this patch
enables huge page-table entries in some circumstances for other drivers,
but given the problems you're seeing for amdgpu, it's better to enable
this on a per-driver basis after thorough testing. Since I don't have
amdgpu hardware I'm not sure what it's doing differently, and can't
debug the issue properly.

/Thomas
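[Editor's sketch] The per-driver approach described above can be pictured as an operations table whose huge-fault callback is optional: the shared default table leaves it unset, and only a driver that has been tested with huge entries supplies a table that sets it. The following is a userspace model under that assumption. `model_vm_ops`, `model_shared_ops`, `model_optin_ops` and `model_handle_fault` are hypothetical names, and the dispatch logic is a simplification rather than the kernel's actual fault path.

```c
#include <assert.h>
#include <stddef.h>

enum model_fault_result {
	MODEL_FAULT_HANDLED_SMALL,
	MODEL_FAULT_HANDLED_HUGE
};

/* Hypothetical ops table; huge_fault is optional and may be NULL. */
struct model_vm_ops {
	enum model_fault_result (*fault)(void);
	enum model_fault_result (*huge_fault)(void);
};

static enum model_fault_result model_small_fault(void)
{
	return MODEL_FAULT_HANDLED_SMALL;
}

static enum model_fault_result model_huge_fault(void)
{
	return MODEL_FAULT_HANDLED_HUGE;
}

/* Shared default table with no huge_fault, as after the patch. */
const struct model_vm_ops model_shared_ops = {
	.fault = model_small_fault,
	.huge_fault = NULL,
};

/* Table for a driver that explicitly opts in to huge faults. */
const struct model_vm_ops model_optin_ops = {
	.fault = model_small_fault,
	.huge_fault = model_huge_fault,
};

/* Use the huge handler only when the table provides one. */
enum model_fault_result model_handle_fault(const struct model_vm_ops *ops)
{
	assert(ops != NULL && ops->fault != NULL);

	if (ops->huge_fault != NULL)
		return ops->huge_fault();

	return ops->fault();
}
```

With this shape, dropping one initializer from the shared table is enough to turn huge faults off for every driver that uses it, while a driver with its own table is unaffected. That is the trade-off the reply describes: the feature stays available where it has been tested and is withheld elsewhere until the amdgpu problem is understood.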