From: "Bridgman, John" <John.Bridgman@amd.com>
To: Daniel Vetter <daniel.vetter@ffwll.ch>,
"Gabbay, Oded" <Oded.Gabbay@amd.com>
Cc: "Jerome Glisse" <j.glisse@gmail.com>,
"Christian König" <deathsimple@vodafone.de>,
"David Airlie" <airlied@linux.ie>,
"Alex Deucher" <alexdeucher@gmail.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Joerg Roedel" <joro@8bytes.org>,
"Lewycky, Andrew" <Andrew.Lewycky@amd.com>,
"Daenzer, Michel" <Michel.Daenzer@amd.com>,
"Goz, Ben" <Ben.Goz@amd.com>,
"Skidanov, Alexey" <Alexey.Skidanov@amd.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>, linux-mm <linux-mm@kvack.org>,
"Sellek, Tom" <Tom.Sellek@amd.com>
Subject: RE: [PATCH v2 00/25] AMDKFD kernel driver
Date: Wed, 23 Jul 2014 13:33:24 +0000
Message-ID: <D89D60253BB73A4E8C62F9FD18A939CA01066B1B@storexdag02.amd.com>
In-Reply-To: <CAKMK7uFtSStEewVivbXAT1VC4t2Y+suTaEmQA4=UptK1UBLSmg@mail.gmail.com>
>-----Original Message-----
>From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch]
>Sent: Wednesday, July 23, 2014 3:06 AM
>To: Gabbay, Oded
>Cc: Jerome Glisse; Christian König; David Airlie; Alex Deucher; Andrew
>Morton; Bridgman, John; Joerg Roedel; Lewycky, Andrew; Daenzer, Michel;
>Goz, Ben; Skidanov, Alexey; linux-kernel@vger.kernel.org; dri-
>devel@lists.freedesktop.org; linux-mm; Sellek, Tom
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>On Wed, Jul 23, 2014 at 8:50 AM, Oded Gabbay <oded.gabbay@amd.com>
>wrote:
>> On 22/07/14 14:15, Daniel Vetter wrote:
>>>
>>> On Tue, Jul 22, 2014 at 12:52:43PM +0300, Oded Gabbay wrote:
>>>>
>>>> On 22/07/14 12:21, Daniel Vetter wrote:
>>>>>
>>>>> On Tue, Jul 22, 2014 at 10:19 AM, Oded Gabbay <oded.gabbay@amd.com>
>>>>> wrote:
>>>>>>>
>>>>>>> Exactly, just prevent userspace from submitting more. And if you
>>>>>>> have misbehaving userspace that submits too much, reset the gpu
>>>>>>> and tell it that you're sorry but won't schedule any more work.
>>>>>>
>>>>>>
>>>>>> I'm not sure how you intend to know if a userspace misbehaves or not.
>>>>>> Can you elaborate ?
>>>>>
>>>>>
>>>>> Well that's mostly policy, currently in i915 we only have a check
>>>>> for hangs, and if userspace hangs a bit too often then we stop it.
>>>>> I guess you can do that with the queue unmapping you've describe in
>>>>> reply to Jerome's mail.
>>>>> -Daniel
>>>>>
>>>> What do you mean by hang ? Like the tdr mechanism in Windows (checks
>>>> if a gpu job takes more than 2 seconds, I think, and if so,
>>>> terminates the job).
>>>
>>>
>>> Essentially yes. But we also have some hw features to kill jobs
>>> quicker, e.g. for media workloads.
>>> -Daniel
>>>
>>
>> Yeah, so this is what I'm talking about when I say that you and Jerome
>> come from a graphics POV and amdkfd come from a compute POV, no
>> offense intended.
>>
>> For compute jobs, we simply can't use this logic to terminate jobs.
>> Graphics are mostly Real-Time while compute jobs can take from a few
>> ms to a few hours!!! And I'm not talking about an entire application
>> runtime but on a single submission of jobs by the userspace app. We
>> have tests with jobs that take between 20-30 minutes to complete. In
>> theory, we can even imagine a compute job which takes 1 or 2 days (on
>> larger APUs).
>>
>> Now, I understand the question of how do we prevent the compute job
>> from monopolizing the GPU, and internally here we have some ideas that
>> we will probably share in the next few days, but my point is that I
>> don't think we can terminate a compute job because it is running for more
>> than x seconds.
>> It is like you would terminate a CPU process which runs more than x
>> seconds.
>>
>> I think this is a *very* important discussion (detecting a misbehaved
>> compute process) and I would like to continue it, but I don't think
>> moving the job submission from userspace control to kernel control
>> will solve this core problem.
>
>Well graphics gets away with cooperative scheduling since usually people
>want to see stuff within a few frames, so we can legitimately kill jobs after a
>fairly short timeout. Imo if you want to allow userspace to submit compute
>jobs that are atomic and take a few minutes to hours with no break-up in
>between and no hw means to preempt then that design is screwed up. We
>really can't tell the core vm that "sorry we will hold onto these gobloads of
>memory you really need now for another few hours". Pinning memory like
>that essentially without a time limit is restricted to root.
Hi Daniel;
I don't really understand the reference to "gobloads of memory". Unlike radeon graphics, the userspace data for HSA applications is maintained in pageable system memory and accessed via the IOMMUv2 (ATC/PRI). The IOMMUv2 driver and the mm subsystem take care of faulting in memory pages as needed; nothing is long-term pinned.
The only pinned memory we are talking about here is the per-queue and per-process data structures in the driver, which are tiny by comparison. Oded provided the "hardware limits" (i.e. an insane number of processes and threads) for context, but real-world limits will be one or two orders of magnitude lower. Agreed that we should have included those limits in the initial code; that would have made the "real world" memory footprint much more visible.
Does that make sense?
>-Daniel
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>+41 (0) 79 365 57 48 - http://blog.ffwll.ch