On Thu, 2015-07-30 at 15:00 +0200, Joerg Roedel wrote:
> [
>    The topic is highly technical and could be a tech topic. But it also
>    touches multiple subsystems, so I decided to submit it as a core
>    topic.
> ]
>
> Across architectures and vendors there are new devices coming up for
> offloading tasks from the CPUs. Most of these devices are capable of
> operating on user address spaces.
>
> Besides the commonalities there are important differences in the memory
> model these devices offer. Some work only on system RAM, others come
> with their own memory which may or may not be accessible by the CPU.
>
> I'd like to discuss what support we need in the core kernel for these
> devices. A probably incomplete list of open questions:
>
>    (1) Do we need the concept of an off-CPU task in the kernel,
>        together with a common interface to create and manage them,
>        and probably a (collection of) batch scheduler(s) for these
>        tasks?
>
>    (2) Changes in memory management for devices accessing user
>        address spaces:
>
>    (2.1) How can we best support the different memory models
>          these devices support?
>
>    (2.2) How do we handle the off-CPU users of an mm_struct?
>
>    (2.3) How can we attach common state for off-CPU tasks to
>          mm_struct (and what needs to be in there)?

And how do we handle the assignment of Address Space IDs? The AMD
implementation currently allows the PASID space to be managed per-device,
but I understand ARM systems handle the TLB shootdown broadcasts in
hardware and need the PASID that the device sees to be identical to the
ASID on the CPU's MMU? And there are reasons why we might actually want
that model on Intel systems too. I'm working on the Intel SVM support
right now, and looking at a single-PASID-space model (partly because the
PASID tables have to be physically contiguous, and they can be huge!).

>    (3) Does it make sense to implement automatic migration of
>        system memory to device memory (when available) and vice
>        versa?
How do we decide what and when to migrate? This is quite a horrid one,
but perhaps it ties into generic NUMA considerations: if a memory page is
being frequently accessed by something that it's far away from, can we
move it to closer memory? The question is how we handle that. We do have
Extended Accessed bits in the Intel implementation of SVM that let us
know that a given PTE was used from a device, although not *which*
device, in cases where there might be more than one.

>    (4) What features do we require in the hardware to support it
>        with a common interface?
>
> I think it would be great if the kernel would have a common interface
> for these kinds of devices. Currently every vendor develops its own
> interface, with various hacks to work around core code behavior.

Right. For now it's almost all internal on-chip stuff, so it's kind of
tolerable to have vendor-specific implementations. But we are starting to
see PCIe root ports which support the necessary TLP prefixes to support
SVM on discrete devices. And then it'll be really important to have this
working cross-platform.

-- 
dwmw2