From: "StDenis, Tom" <Tom.StDenis@amd.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: "Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>,
"Wentland, Harry" <Harry.Wentland@amd.com>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>,
"Koenig, Christian" <Christian.Koenig@amd.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: After Vega 56/64 GPU hang I unable reboot system
Date: Thu, 20 Dec 2018 14:19:40 +0000
Message-ID: <96c70496-ce62-b162-187c-ff34ebb84ec2@amd.com>
In-Reply-To: <fbdd541c-ce31-9fe0-f1ac-bb9c51bb6526@amd.com>

On 2018-12-20 9:08 a.m., Tom St Denis wrote:
> On 2018-12-20 9:06 a.m., Tom St Denis wrote:
>> On 2018-12-20 6:45 a.m., Mikhail Gavrilov wrote:
>>> On Thu, 20 Dec 2018 at 16:17, StDenis, Tom <Tom.StDenis@amd.com> wrote:
>>>>
>>>> Well yup the kernel is not letting you open the files:
>>>>
>>>>
>>>> As sudo/root you should be able to open these files with umr. What
>>>> happens if you just open a shell as root and run it?
>>>>
>>>
>>> [root@localhost ~]# touch /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> [root@localhost ~]# cat /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> cat: /sys/kernel/debug/dri/0/amdgpu_ring_gfx: Operation not permitted
>>> [root@localhost ~]# ls -laZ /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> -r--r--r--. 1 root root system_u:object_r:debugfs_t:s0 8204 Dec 20 16:31 /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>>> [root@localhost ~]# getenforce
>>> Permissive
>>> [root@localhost ~]# /home/mikhail/packaging-work/umr/build/src/app/umr -O verbose,halt_waves -wa
>>> Cannot seek to MMIO address: Bad file descriptor
>>> [ERROR]: Could not open ring debugfs file
>>> Segmentation fault (core dumped)
>>>
>>> I have already tried launching `umr` as the root user, but the kernel
>>> still doesn't let it open `amdgpu_ring_gfx`.
>>>
>>> What other kernel options should I check?
>>>
>>> I have also attached my current kernel config to this message.
>>
>> I can replicate this by doing
>>
>> chmod u+s umr
>> sudo ./umr -R gfx[.]
>>
>> You need to remove the u+s bit; you are literally not running umr as root!
>
> Actually disregard that. I'm confused at this point.
>
> I run umr 100s of times a day on my devel box just fine as root.
>
> Let me fiddle and see if I can sort this out.

Ya, I was right. With a plain build I can access the files just fine.

tom@fx8:~/stuff/public/umr/src/app $ stat ./umr
File: ./umr
Size: 89204248 Blocks: 174240 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 14946407 Links: 1
Access: (0775/-rwxrwxr-x) Uid: ( 1000/ tom) Gid: ( 1000/ tom)
Access: 2018-12-20 09:15:03.348320256 -0500
Modify: 2018-12-20 09:05:48.148724423 -0500
Change: 2018-12-20 09:14:43.964948557 -0500
Birth: -
tom@fx8:~/stuff/public/umr/src/app $ sudo ./umr -R gfx[.]
raven1.gfx.rptr == 768
raven1.gfx.wptr == 768
raven1.gfx.drv_wptr == 768
raven1.gfx.ring[ 737] == 0xffff1000 ...
raven1.gfx.ring[ 738] == 0xffff1000 ...
raven1.gfx.ring[ 739] == 0xffff1000 ...
raven1.gfx.ring[ 740] == 0xffff1000 ...
raven1.gfx.ring[ 741] == 0xffff1000 ...
raven1.gfx.ring[ 742] == 0xffff1000 ...
raven1.gfx.ring[ 743] == 0xffff1000 ...
raven1.gfx.ring[ 744] == 0xffff1000 ...
raven1.gfx.ring[ 745] == 0xffff1000 ...
raven1.gfx.ring[ 746] == 0xffff1000 ...
raven1.gfx.ring[ 747] == 0xffff1000 ...
raven1.gfx.ring[ 748] == 0xffff1000 ...
raven1.gfx.ring[ 749] == 0xffff1000 ...
raven1.gfx.ring[ 750] == 0xffff1000 ...
raven1.gfx.ring[ 751] == 0xffff1000 ...
raven1.gfx.ring[ 752] == 0xffff1000 ...
raven1.gfx.ring[ 753] == 0xffff1000 ...
raven1.gfx.ring[ 754] == 0xffff1000 ...
raven1.gfx.ring[ 755] == 0xffff1000 ...
raven1.gfx.ring[ 756] == 0xffff1000 ...
raven1.gfx.ring[ 757] == 0xffff1000 ...
raven1.gfx.ring[ 758] == 0xffff1000 ...
raven1.gfx.ring[ 759] == 0xffff1000 ...
raven1.gfx.ring[ 760] == 0xffff1000 ...
raven1.gfx.ring[ 761] == 0xffff1000 ...
raven1.gfx.ring[ 762] == 0xffff1000 ...
raven1.gfx.ring[ 763] == 0xffff1000 ...
raven1.gfx.ring[ 764] == 0xffff1000 ...
raven1.gfx.ring[ 765] == 0xffff1000 ...
raven1.gfx.ring[ 766] == 0xffff1000 ...
raven1.gfx.ring[ 767] == 0xffff1000 ...
raven1.gfx.ring[ 768] == 0xc0032200 rwD
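
The setuid explanation earlier in the thread can be checked mechanically. The sketch below is a hypothetical helper (`check_setuid` is an illustrative name, not part of umr) and assumes GNU coreutils `stat`. It flags a binary that has the setuid bit set while being owned by a non-root user: exec then switches the effective UID to the file's owner, so even `sudo ./umr` does not run with root privileges and cannot reach the typically root-only /sys/kernel/debug.

```shell
# Hypothetical helper (not part of umr): report whether a binary carries the
# setuid bit while being owned by a non-root user. Such a binary normally runs
# with its owner's effective UID even when launched through sudo.
# Assumes GNU coreutils `stat -c %u` (prints the numeric owner UID).
check_setuid() {
  bin=$1
  # [ -u ] is true when the set-user-ID bit is set on the file.
  if [ -u "$bin" ] && [ "$(stat -c %u -- "$bin")" -ne 0 ]; then
    echo "setuid-nonroot"
  else
    echo "ok"
  fi
}
```

For a plain build like the one above (mode 0775, no s bit), `check_setuid ./umr` prints `ok`; if it prints `setuid-nonroot`, `chmod u-s ./umr` removes the bit.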

I did manage to get into a weird shell where I couldn't cat
amdgpu_gca_config from bash, though after a reboot (I had updates pending)
it works fine.

If you can't cat those files, then neither can umr.
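
Following that rule of thumb, the exact files umr needs can be tested before running it. A minimal sketch, assuming a POSIX shell and a `head` that supports `-c`; `list_unreadable` is a hypothetical name. It takes explicit paths rather than globbing the whole debugfs directory, because some debugfs entries can have side effects when read.

```shell
# Hypothetical helper (not part of umr): try to read each given file the way
# `cat` would and print the paths this process cannot read.
# Missing files are reported too, since opening them also fails.
list_unreadable() {
  for f in "$@"; do
    # head -c 1 opens the file and reads at most one byte; an open() or read()
    # failure such as "Operation not permitted" makes it exit non-zero.
    if ! head -c 1 -- "$f" >/dev/null 2>&1; then
      printf '%s\n' "$f"
    fi
  done
}
```

Run from a root shell, for example `list_unreadable /sys/kernel/debug/dri/0/amdgpu_ring_gfx /sys/kernel/debug/dri/0/amdgpu_gca_config`; any path it prints is one umr will not be able to open either.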
So NOTABUG :-)
Tom