Date: Sat, 1 Aug 2015 15:08:48 -0400
From: Jerome Glisse
To: Joerg Roedel
Cc: ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [CORE TOPIC] Core Kernel support for Compute-Offload Devices
Message-ID: <20150801190847.GA2704@gmail.com>
In-Reply-To: <20150801155728.GC14980@8bytes.org>

On Sat, Aug 01, 2015 at 05:57:29PM +0200, Joerg Roedel wrote:
> On Fri, Jul 31, 2015 at 12:13:04PM -0400, Jerome Glisse wrote:
> > Hence scheduling here is different: on a GPU it is more about a
> > queue of several thousand threads, and you just move things up and
> > down depending on what needs to be executed first. GPUs also have
> > hardware scheduling that constantly switches between active
> > threads, which is why memory latency is so well hidden on a GPU.
>
> That's why I wrote "batch"-scheduler in the proposal. It's right that
> it does not make sense to schedule out a GPU process, and some devices
> do scheduling in hardware anyway.
>
> But the Linux kernel still needs to decide which jobs are sent to the
> offload device in which order, more like an io-scheduler.
>
> There might be a compute job that only utilizes 60% of the device
> resources, so the in-kernel scheduler could start another job there to
> utilize the other 40%.
>
> I think it's worth a discussion whether some common schedulers (like
> for blk-io) make sense here too.

It is definitely worth a discussion, but I fear that right now there is
little room for the kernel to do anything. Scheduling is done almost
100% in hardware. The idea behind a GPU is that you have 1000 compute
units while the hardware keeps track of 10000 threads, and at any point
in time there is a high probability that 1000 of those 10000 threads
are ready to compute something. So if a job is only using 60% of the
GPU, the remaining 40% is automatically used by the next batch of
threads. This is a simplification: the number of threads the hardware
can keep track of depends on several factors and varies from one model
to the next, even within the same family from the same manufacturer.

Where the kernel does have control is over which command queue (today's
GPUs have several command queues that run concurrently) is allowed to
spawn threads inside the GPU, and over things like which queue gets
priority over another. There are even mechanisms to "divide" the GPU
among queues (you assign a fraction of the GPU's compute units to a
particular queue), though I expect this last one is vanishing. Also
note that many GPU manufacturers are pushing for userspace queues (I
think it is a Microsoft requirement), in which case the kernel has even
less control.

I agree that the blk-io design is probably the closest thing that might
fit.
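To make that comparison concrete, here is a very rough sketch of the
kind of per-queue arbitration the kernel could still do while the
hardware keeps doing the actual thread scheduling. Every struct and
function name below is made up purely for illustration
(queue_has_commands() and submit_to_hw_ring() do not exist anywhere);
it is not a proposal, just the shape of the idea:

#include <linux/list.h>

/*
 * Illustrative only: the kernel arbitrates between command queues
 * (priority, share of the device), the hardware schedules threads.
 */
struct gpu_cmd_queue {
	struct list_head node;
	int prio;		/* higher value = gets to submit more */
	unsigned int budget;	/* submissions left in this round */
};

struct gpu_sched {
	struct list_head queues;	/* all active command queues */
};

/*
 * Weighted round-robin: refill each queue's budget from its priority,
 * then let queues feed the hardware ring until the budget is spent.
 * The hardware then interleaves the resulting threads on its own.
 */
static void gpu_sched_round(struct gpu_sched *sched)
{
	struct gpu_cmd_queue *q;

	list_for_each_entry(q, &sched->queues, node)
		q->budget = q->prio;

	list_for_each_entry(q, &sched->queues, node) {
		while (q->budget && queue_has_commands(q)) {
			submit_to_hw_ring(q);	/* hypothetical */
			q->budget--;
		}
	}
}

The point being that the kernel only decides which queue gets to feed
the hardware and how much, much like blk-io decides which requests hit
the disk, and nothing more fine-grained than that.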
> > I already implemented several versions of it and posted a couple of
> > them for review. You do not want automatic migration because the
> > kernel does not have enough information here.
>
> Some devices might provide that information, see the extended-access
> bit of Intel VT-d.

This would be limited to integrated GPUs, and so far only on one
platform. My point was more that userspace has far more information to
make a good decision here. The userspace program is much more likely to
know which parts of the dataset are going to be repeatedly accessed by
the GPU threads.

Cheers,
Jérôme