linux-mm.kvack.org archive mirror
From: "StDenis, Tom" <Tom.StDenis@amd.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: "Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>,
	"Wentland, Harry" <Harry.Wentland@amd.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: After Vega 56/64 GPU hang I unable reboot system
Date: Thu, 20 Dec 2018 14:19:40 +0000
Message-ID: <96c70496-ce62-b162-187c-ff34ebb84ec2@amd.com>
In-Reply-To: <fbdd541c-ce31-9fe0-f1ac-bb9c51bb6526@amd.com>

On 2018-12-20 9:08 a.m., Tom St Denis wrote:
> On 2018-12-20 9:06 a.m., Tom St Denis wrote:
>> On 2018-12-20 6:45 a.m., Mikhail Gavrilov wrote:
>>> On Thu, 20 Dec 2018 at 16:17, StDenis, Tom <Tom.StDenis@amd.com> wrote:
>>>>
>>>> Well yup the kernel is not letting you open the files:
>>>>
>>>>
>>>> As sudo/root you should be able to open these files with umr.  What
>>>> happens if you just open a shell as root and run it?
>>>>
>>>
>>> [root@localhost ~]# touch /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> [root@localhost ~]# cat /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> cat: /sys/kernel/debug/dri/0/amdgpu_ring_gfx: Operation not permitted
>>> [root@localhost ~]# ls -laZ /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> -r--r--r--. 1 root root system_u:object_r:debugfs_t:s0 8204 Dec 20 16:31 /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> [root@localhost ~]# getenforce
>>> Permissive
>>> [root@localhost ~]# /home/mikhail/packaging-work/umr/build/src/app/umr -O verbose,halt_waves -wa
>>> Cannot seek to MMIO address: Bad file descriptor
>>> [ERROR]: Could not open ring debugfs fileSegmentation fault (core dumped)
>>>
>>> I have already tried launching `umr` as the root user, but the kernel
>>> still won't let me open `amdgpu_ring_gfx`.
>>>
>>> Which other kernel options should I check?
>>>
>>> I have also attached my current kernel config to this message.
>>
>> I can replicate this by doing
>>
>> chmod u+s umr
>> sudo ./umr -R gfx[.]
>>
>> You need to remove the u+s bit; you are literally not running umr as root!
> 
> Actually disregard that.  I'm confused at this point.
> 
> I run umr 100s of times a day on my devel box just fine as root.
> 
> Let me fiddle and see if I can sort this out.


Ya I was right.  With a plain build (no u+s bit) I can access the files just fine.

tom@fx8:~/stuff/public/umr/src/app $ stat ./umr
   File: ./umr
   Size: 89204248  	Blocks: 174240     IO Block: 4096   regular file
Device: fd01h/64769d	Inode: 14946407    Links: 1
Access: (0775/-rwxrwxr-x)  Uid: ( 1000/     tom)   Gid: ( 1000/     tom)
Access: 2018-12-20 09:15:03.348320256 -0500
Modify: 2018-12-20 09:05:48.148724423 -0500
Change: 2018-12-20 09:14:43.964948557 -0500
  Birth: -
tom@fx8:~/stuff/public/umr/src/app $ sudo ./umr -R gfx[.]

raven1.gfx.rptr == 768
raven1.gfx.wptr == 768
raven1.gfx.drv_wptr == 768
raven1.gfx.ring[ 737] == 0xffff1000    ...
raven1.gfx.ring[ 738] == 0xffff1000    ...
raven1.gfx.ring[ 739] == 0xffff1000    ...
raven1.gfx.ring[ 740] == 0xffff1000    ...
raven1.gfx.ring[ 741] == 0xffff1000    ...
raven1.gfx.ring[ 742] == 0xffff1000    ...
raven1.gfx.ring[ 743] == 0xffff1000    ...
raven1.gfx.ring[ 744] == 0xffff1000    ...
raven1.gfx.ring[ 745] == 0xffff1000    ...
raven1.gfx.ring[ 746] == 0xffff1000    ...
raven1.gfx.ring[ 747] == 0xffff1000    ...
raven1.gfx.ring[ 748] == 0xffff1000    ...
raven1.gfx.ring[ 749] == 0xffff1000    ...
raven1.gfx.ring[ 750] == 0xffff1000    ...
raven1.gfx.ring[ 751] == 0xffff1000    ...
raven1.gfx.ring[ 752] == 0xffff1000    ...
raven1.gfx.ring[ 753] == 0xffff1000    ...
raven1.gfx.ring[ 754] == 0xffff1000    ...
raven1.gfx.ring[ 755] == 0xffff1000    ...
raven1.gfx.ring[ 756] == 0xffff1000    ...
raven1.gfx.ring[ 757] == 0xffff1000    ...
raven1.gfx.ring[ 758] == 0xffff1000    ...
raven1.gfx.ring[ 759] == 0xffff1000    ...
raven1.gfx.ring[ 760] == 0xffff1000    ...
raven1.gfx.ring[ 761] == 0xffff1000    ...
raven1.gfx.ring[ 762] == 0xffff1000    ...
raven1.gfx.ring[ 763] == 0xffff1000    ...
raven1.gfx.ring[ 764] == 0xffff1000    ...
raven1.gfx.ring[ 765] == 0xffff1000    ...
raven1.gfx.ring[ 766] == 0xffff1000    ...
raven1.gfx.ring[ 767] == 0xffff1000    ...
raven1.gfx.ring[ 768] == 0xc0032200    rwD
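
For reference, the stat output above shows mode 0775, i.e. no setuid bit on
this build.  A minimal sketch for checking the same thing on your own binary,
assuming a POSIX shell.  The helper name has_setuid is made up for
illustration and is not part of umr:

```shell
#!/bin/sh
# Hypothetical helper (not part of umr): report whether a file has the
# setuid bit set, using the POSIX `test -u` primary.
has_setuid() {
    if [ -u "$1" ]; then
        echo "setuid"
    else
        echo "plain"
    fi
}

# Example use:
#   has_setuid ./umr
# If it prints "setuid", the bit can be cleared with:
#   chmod u-s ./umr
```

If it reports "setuid", the u+s experiment quoted above applies to your build
too.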


I did manage to get into a weird shell where I couldn't cat
amdgpu_gca_config from bash, though after a reboot (I had updates pending)
it works fine.

If you can't cat those files then neither can umr.
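
A rough way to test that from a root shell before involving umr at all is
sketched below.  It only tries to read one byte from each file and reports
the result.  The helper name can_read is an assumption for illustration; the
path in the example is the one from the quoted session:

```shell
#!/bin/sh
# Hypothetical helper: try to read one byte from each file given and print
# a per-file result.  dd exits non-zero if the open or the read fails.
can_read() {
    for f in "$@"; do
        if dd if="$f" of=/dev/null bs=1 count=1 2>/dev/null; then
            echo "ok: $f"
        else
            echo "FAILED: $f"
        fi
    done
}

# Example, run from a root shell:
#   can_read /sys/kernel/debug/dri/0/amdgpu_ring_gfx
```

If this prints FAILED as root, the kernel is refusing the access and umr
will hit the same wall.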

So NOTABUG :-)

Tom
