linux-mm.kvack.org archive mirror
* [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
@ 2025-01-31 17:53 Wei Huang
  2025-02-03 10:13 ` Jonathan Cameron
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Wei Huang @ 2025-01-31 17:53 UTC (permalink / raw)
  To: lsf-pc
  Cc: linux-mm, Don Dutile, Joel Savitz, Moyes, William, Iyer, Shyam,
	Lynch, Nathan, mel.gorman, santosh.shukla, Suthikulpanit,
	Suravee, shivankg, Michael.Day

Hi All,

I want to propose a talk for the LSFMMBPF conference: Enabling Smart 
Data Stream Accelerator (SDXI) Support for Linux.

SDXI, the Smart Data Accelerator Interface, is an industry standard [1] 
that provides advanced capabilities such as offloading DMA 
operations, operating on user-space addresses, and other 
data processing features. With SDXI integrated into an SoC, DMA 
offloading can be supported across different address spaces. This 
talk focuses on a software design that enables SDXI 
support across multiple software layers in the Linux kernel. The 
interfaces in this design not only facilitate SDXI hardware management 
but also allow kernel-space subsystems and user-space applications to 
directly own and control SDXI hardware under the protection of the IOMMU.

To illustrate the practical applications of SDXI, Red Hat and AMD 
developed a user-space library that leverages the SDXI driver interface, 
demonstrating various use cases, such as memory operation offloading, in 
both bare-metal and virtual environments.

The prototype device driver [2] and user-space library are available for 
testing. We continue to improve both components and 
plan to upstream the device driver soon.

== DISCUSSION ==
At this conference, we plan to discuss the following with the community:

1) Use Cases
* Linux DMA engine
* Kernel task offloading (e.g., bulk copying)
* QoS and kernel perf integration
* New use cases

2) User-Space API Interface
* IOCTL proposal
* Security control
* User-space app integration

3) Virtualization Support
* Progress & current status
* Challenges

== REFERENCES ==
[1] SDXI 1.0 specification, https://www.snia.org/sdxi
[2] SDXI device driver, https://github.com/AMDESE/linux-sdxi

Thanks,
-Wei




* Re: [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
  2025-01-31 17:53 [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux Wei Huang
@ 2025-02-03 10:13 ` Jonathan Cameron
  2025-02-03 14:23   ` Jason Gunthorpe
  2025-02-04  0:52   ` Wei Huang
  2025-02-04  4:19 ` [Lsf-pc] " Dan Williams
  2025-02-04  7:59 ` Christoph Hellwig
  2 siblings, 2 replies; 7+ messages in thread
From: Jonathan Cameron @ 2025-02-03 10:13 UTC (permalink / raw)
  To: Wei Huang
  Cc: lsf-pc, linux-mm, Don Dutile, Joel Savitz, Moyes, William, Iyer,
	Shyam, Lynch, Nathan, mel.gorman, santosh.shukla, Suthikulpanit,
	Suravee, shivankg, Michael.Day, Zhangfei Gao, Zhou Wang,
	Shameer Kolothum, Jason Gunthorpe

On Fri, 31 Jan 2025 11:53:07 -0600
Wei Huang <wei.huang2@amd.com> wrote:

> Hi All,
> 
> I want to propose a talk for the LSFMMBPF conference: Enabling Smart 
> Data Stream Accelerator (SDXI) Support for Linux.
> 
> The smart data stream accelerator (SDXI) is an industry standard [1] 
> that provides various advanced capabilities, such as offloading DMA 
> operations, supporting user-space addresses, and offering other advanced 
> data processing features. With the integration of SDXI into a SoC, DMA 
> offloading can now be supported across different address spaces. This 
> talk focuses on a software design which enables comprehensive SDXI 
> support across multiple software layers in the Linux Kernel. These 
> interfaces not only facilitate SDXI hardware management but also allow 
> kernel space subsystems and user space applications to directly own and 
> control SDXI hardware under the protection of IOMMU.
> 
> To illustrate the practical applications of SDXI, Red Hat and AMD 
> developed a user-space library that leverages the SDXI driver interface, 
> demonstrating various use cases, such as memory operation offloading, in 
> both bare-metal and virtual environments.
> 
> The prototype device driver [2] and user-space library are available for 
> testing. We continue to work on the improvement of both components and 
> plan to upstream the device driver soon.
> 
> == DISCUSSION ==
> At this conference, we plan to discuss with the community on:

Hi Wei, 

Lots of topics and hints at interesting areas, but I'd like to see more
details to understand how this maps to other data moving / reorganizing
accelerators.  Whilst SDXI looks like a good and feature rich spec,
I'm curious what is fundamentally new?  Perhaps it is just the
right time to improve functionality for DMA engines in general.


> 
> 1) Use Cases
> * Linux DMA engine
> * Kernel task offloading (e.g., bulk copying)
> * QoS and kernel perf integration
> * New use cases

All interesting topics across this particular DMA engine and many others.
For new use cases are you planning to bring some, or is this a request
for suggestions?

> 
> 2) User-Space API Interface
> * IOCTL proposal

I'm curious about this aspect and how it compares with previous approaches.
Obviously it brings some new operators and possibly needs to target remote
memory.  However, we have existing support for userspace access to accelerators
for crypto, compression, etc. (and much broader).

We went through a similar process finding a path to support those a few
years ago and ended up with UACCE (drivers/misc/uacce, with lots of stuff
under drivers/crypto). If there is overlap it would be good to figure
out a path that reduces duplication / complexity of interfacing with
the various userspace projects we all care about.  I won't tell the stories
of pain and redesigns it took to get UACCE upstream, but if you are doing
another new thing, good luck! (+CC some folk more familiar and active
in this space than I am).

> * Security control
> * User-space app integration
> 
> 3) Virtualization Support
> * Progress & current status

Good to have some more detail on this in particular.  Is this mostly blocked
on vSVA, IOMMUFD etc progress or is there something new?

> * Challenges
> 
> == REFERENCES ==
> [1] SDXI 1.0 specification, https://www.snia.org/sdxi
> [2] SDXI device driver, https://github.com/AMDESE/linux-sdxi
> 
> Thanks,
> -Wei
> 
Thanks,

Jonathan
> 




* Re: [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
  2025-02-03 10:13 ` Jonathan Cameron
@ 2025-02-03 14:23   ` Jason Gunthorpe
  2025-02-04  0:59     ` Wei Huang
  2025-02-04  0:52   ` Wei Huang
  1 sibling, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2025-02-03 14:23 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Wei Huang, lsf-pc, linux-mm, Don Dutile, Joel Savitz, Moyes,
	William, Iyer, Shyam, Lynch, Nathan, mel.gorman, santosh.shukla,
	Suthikulpanit, Suravee, shivankg, Michael.Day, Zhangfei Gao,
	Zhou Wang, Shameer Kolothum

On Mon, Feb 03, 2025 at 10:13:23AM +0000, Jonathan Cameron wrote:

> Lots of topics and hints at interesting areas, but I'd like to see more
> details to understand how this maps to other data moving / reorganizing
> accelerators.  Whilst SDXI looks like a good and feature rich spec,
> I'm curious what is fundamentally new?  Perhaps it is just the
> right time to improve functionality for DMA engines in general.

It looks quite a lot like Intel's IDXD to me; there seems to be a lot
of overlap.

> > [2] SDXI device driver, https://github.com/AMDESE/linux-sdxi

Sorry, but a never-posted branch based on a v6.6 kernel using a lot of
obsoleted and removed iommu APIs doesn't seem like LSF/MM content to
me. LSF/MM is supposed to be a problem-solving conference; you should
ideally have something in active discussion on the mailing list with
an unresolved problem to talk about.

Jason



* Re: [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
  2025-02-03 10:13 ` Jonathan Cameron
  2025-02-03 14:23   ` Jason Gunthorpe
@ 2025-02-04  0:52   ` Wei Huang
  1 sibling, 0 replies; 7+ messages in thread
From: Wei Huang @ 2025-02-04  0:52 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: lsf-pc, linux-mm, Don Dutile, Joel Savitz, Moyes, William, Iyer,
	Shyam, Lynch, Nathan, mel.gorman, santosh.shukla, Suthikulpanit,
	Suravee, shivankg, Michael.Day, Zhangfei Gao, Zhou Wang,
	Shameer Kolothum, Jason Gunthorpe



On 2/3/25 4:13 AM, Jonathan Cameron wrote:
> On Fri, 31 Jan 2025 11:53:07 -0600
> Wei Huang <wei.huang2@amd.com> wrote:
> 
>> Hi All,
>>
>> I want to propose a talk for the LSFMMBPF conference: Enabling Smart
>> Data Stream Accelerator (SDXI) Support for Linux.
>>
>> The smart data stream accelerator (SDXI) is an industry standard [1]
>> that provides various advanced capabilities, such as offloading DMA
>> operations, supporting user-space addresses, and offering other advanced
>> data processing features. With the integration of SDXI into a SoC, DMA
>> offloading can now be supported across different address spaces. This
>> talk focuses on a software design which enables comprehensive SDXI
>> support across multiple software layers in the Linux Kernel. These
>> interfaces not only facilitate SDXI hardware management but also allow
>> kernel space subsystems and user space applications to directly own and
>> control SDXI hardware under the protection of IOMMU.
>>
>> To illustrate the practical applications of SDXI, Red Hat and AMD
>> developed a user-space library that leverages the SDXI driver interface,
>> demonstrating various use cases, such as memory operation offloading, in
>> both bare-metal and virtual environments.
>>
>> The prototype device driver [2] and user-space library are available for
>> testing. We continue to work on the improvement of both components and
>> plan to upstream the device driver soon.
>>
>> == DISCUSSION ==
>> At this conference, we plan to discuss with the community on:
> 
> Hi Wei,
> 
> Lots of topics and hints at interesting areas, but I'd like to see more
> details to understand how this maps to other data moving / reorganizing
> accelerators.  Whilst SDXI looks like a good and feature rich spec,
> I'm curious what is fundamentally new?  Perhaps it is just the

Compared with existing implementations, I think the following features, 
taken together, are interesting:

* An open industry standard that is architecture agnostic
* Support for various address spaces, including user-mode ones
* Designed with virtualization in mind, easy for passthru and migration
* Easy to extend with future functionality

> right time to improve functionality for DMA engines in general.
> 
> 
>>
>> 1) Use Cases
>> * Linux DMA engine
>> * Kernel task offloading (e.g., bulk copying)
>> * QoS and kernel perf integration
>> * New use cases
> 
> All interesting topics across this particular DMA engine and many others.
> For new use cases are you planning to bring some, or is this a request
> for suggestions?

Both. Some use cases we tested:
* Serving as a DMA engine in Linux
* AutoNUMA offloading
* Memory zeroing for large VM memory initialization
* Batching folio copy operations

We do expect more use cases, and want to solicit ideas from the community.
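
As a rough illustration of the batching idea only, the sketch below 
collects copy requests into a fixed-size batch and flushes them together. 
It is user-space C with hypothetical names (copy_desc, copy_batch, 
batch_add, batch_flush, BATCH_MAX); it is not the SDXI driver or library 
interface, and the flush is performed by the CPU where a real 
implementation would presumably hand the descriptors to an engine in one 
submission:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical batch depth; a real engine would dictate its own limit. */
#define BATCH_MAX 16

/* One copy request: destination, source, length in bytes. */
struct copy_desc {
	void *dst;
	const void *src;
	size_t len;
};

/* A batch of pending copy requests plus a counter of flushes so far. */
struct copy_batch {
	struct copy_desc desc[BATCH_MAX];
	size_t count;
	size_t flushes;
};

/*
 * Execute every pending request and empty the batch. The CPU does the
 * work here; this is the point where an offload engine would be given
 * all descriptors at once (assumption, not taken from the SDXI spec).
 */
void batch_flush(struct copy_batch *b)
{
	assert(b != NULL);
	if (b->count == 0)
		return;
	for (size_t i = 0; i < b->count; i++)
		memcpy(b->desc[i].dst, b->desc[i].src, b->desc[i].len);
	b->count = 0;
	b->flushes++;
}

/* Queue one request, flushing first if the batch is already full. */
void batch_add(struct copy_batch *b, void *dst, const void *src, size_t len)
{
	assert(b != NULL);
	if (b->count == BATCH_MAX)
		batch_flush(b);
	b->desc[b->count++] = (struct copy_desc){
		.dst = dst, .src = src, .len = len,
	};
}
```

A caller would zero-initialize a struct copy_batch, call batch_add() for 
each copy, and call batch_flush() once at the end to drain whatever is 
still queued. Note that a queued copy is not visible in the destination 
until the flush happens.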

> 
>>
>> 2) User-Space API Interface
>> * IOCTL proposal
> 
> I'm curious on this aspect and how it compares with previous approaches.
> Obviously bring some new operators and possibly need to target remote
> memory.  However we have existing support for userspace access to accelerators
> for crypto, compression etc (and much broader)
> 
> We went through a similar process finding a path to support those a few
> years ago and ended up with UACCE. (drivers/misc/uacce lots of stuff
> under drivers crypto). If there is overlap it would be good to figure
> out a path that reduces duplication / complexity of interfacing with
> the various userspace projects we all care about.  I won't tell the stories
> of pain and redesigns it took to get UACCE upstream, but if you are doing
> another new thing, good luck! (+CC some folk more familiar and active
> in this space than I am).

Thanks for the pointer. Right now, as a prototype, we don't take the UACCE 
approach. But we do see that UACCE could be used in this space. This can 
be part of the discussion.

> 
>> * Security control
>> * User-space app integration
>>
>> 3) Virtualization Support
>> * Progress & current status
> 
> Good to have some more detail on this in particular.  Is this mostly blocked
> on vSVA, IOMMUFD etc progress or is there something new?

It is blocked by vIOMMU support in both the kernel and QEMU/KVM. To support 
SVA inside VMs, we have to present a virtual IOMMU to guest VMs. There 
are various ways of implementing a virtual IOMMU, but the AMD hardware 
vIOMMU is expected to perform better than an emulated vIOMMU (there was a 
KVM Forum talk by Suravee and me for reference). We have a prototype 
implementation. Right now Suravee is working on cleaning up and finishing 
the hardware vIOMMU patches for upstream.

> 
>> * Challenges
>>
>> == REFERENCES ==
>> [1] SDXI 1.0 specification, https://www.snia.org/sdxi
>> [2] SDXI device driver, https://github.com/AMDESE/linux-sdxi
>>
>> Thanks,
>> -Wei
>>
> Thanks,
> 
> Jonathan
>>
> 




* Re: [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
  2025-02-03 14:23   ` Jason Gunthorpe
@ 2025-02-04  0:59     ` Wei Huang
  0 siblings, 0 replies; 7+ messages in thread
From: Wei Huang @ 2025-02-04  0:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Jonathan Cameron
  Cc: lsf-pc, linux-mm, Don Dutile, Joel Savitz, Moyes, William, Iyer,
	Shyam, Lynch, Nathan, mel.gorman, santosh.shukla, Suthikulpanit,
	Suravee, shivankg, Michael.Day, Zhangfei Gao, Zhou Wang,
	Shameer Kolothum



On 2/3/25 8:23 AM, Jason Gunthorpe wrote:
> On Mon, Feb 03, 2025 at 10:13:23AM +0000, Jonathan Cameron wrote:
> 
>> Lots of topics and hints at interesting areas, but I'd like to see more
>> details to understand how this maps to other data moving / reorganizing
>> accelerators.  Whilst SDXI looks like a good and feature rich spec,
>> I'm curious what is fundamentally new?  Perhaps it is just the
>> right time to improve functionality for DMA engines in general.
> 
> It looks quite a lot like Intel's IDXD to me, which seems to have a lot
> of overlap.

It does overlap with IDXD in some areas of functionality. The two, 
however, take different design approaches that might be worth 
discussing.

> 
>>> [2] SDXI device driver, https://github.com/AMDESE/linux-sdxi
> 
> Sorry, but a never posted branch based on a v6.6 kernel using a lot of
> obsoleted and removed iommu APIs doesn't seem like LSF/MM content to
> me. LSF/MM is supposed to be a problem solving conference, you should
> ideally have something in active discussion on the mailing list with
> an unresolved problem to talk about.

Internally we have a rebase to kernel 6.12, which can be shared on GitHub 
immediately. Regarding upstream, Nathan Lynch and I are working on 
an SDXI upstream patchset.

> 
> Jason




* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
  2025-01-31 17:53 [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux Wei Huang
  2025-02-03 10:13 ` Jonathan Cameron
@ 2025-02-04  4:19 ` Dan Williams
  2025-02-04  7:59 ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2025-02-04  4:19 UTC (permalink / raw)
  To: Wei Huang via Lsf-pc
  Cc: linux-mm, Don Dutile, Joel Savitz, Moyes, William, Iyer, Shyam,
	Lynch, Nathan, mel.gorman, santosh.shukla, Suthikulpanit,
	Suravee, shivankg, Michael.Day

Wei Huang via Lsf-pc wrote:
> Hi All,
> 
> I want to propose a talk for the LSFMMBPF conference: Enabling Smart 
> Data Stream Accelerator (SDXI) Support for Linux.
> 
> The smart data stream accelerator (SDXI) is an industry standard [1] 
> that provides various advanced capabilities, such as offloading DMA 
> operations, supporting user-space addresses, and offering other advanced 
> data processing features. With the integration of SDXI into a SoC, DMA 
> offloading can now be supported across different address spaces. This 
> talk focuses on a software design which enables comprehensive SDXI 
> support across multiple software layers in the Linux Kernel. These 
> interfaces not only facilitate SDXI hardware management but also allow 
> kernel space subsystems and user space applications to directly own and 
> control SDXI hardware under the protection of IOMMU.
> 
> To illustrate the practical applications of SDXI, Red Hat and AMD 
> developed a user-space library that leverages the SDXI driver interface, 
> demonstrating various use cases, such as memory operation offloading, in 
> both bare-metal and virtual environments.
> 
> The prototype device driver [2] and user-space library are available for 
> testing. We continue to work on the improvement of both components and 
> plan to upstream the device driver soon.
> 
> == DISCUSSION ==
> At this conference, we plan to discuss with the community on:
> 
> 1) Use Cases
> * Linux DMA engine

Is this a use case?

In other words, copy-offload engines have struggled for more than a
decade to impact kernel use cases due to the maintenance burden of split
async / synchronous paths for kernel-buffer-to-kernel-buffer copies. The
Linux dmaengine subsystem mainly stayed relevant due to device-DMA use
cases.
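
A minimal sketch of that split, in user-space C rather than kernel code.
Every name here (copy_maybe_offload, engine_complete, mark_done,
OFFLOAD_MIN_BYTES, the single in-flight slot) is hypothetical and is not
a dmaengine API; the "engine" is simulated with memcpy. The only point
it illustrates is that a caller must handle both a synchronous and an
asynchronous outcome for the same copy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: below this size the CPU copy is assumed to win. */
#define OFFLOAD_MIN_BYTES 4096

/* Hypothetical completion callback for the asynchronous path. */
typedef void (*copy_done_fn)(void *ctx);

/* Stand-in for a hardware descriptor that completes some time later. */
struct pending_copy {
	void *dst;
	const void *src;
	size_t len;
	copy_done_fn done;
	void *ctx;
	int busy;
};

/* A single in-flight slot keeps the sketch small. */
static struct pending_copy slot;

/*
 * Returns 0 if the copy finished synchronously on the CPU (done is never
 * called), or 1 if it was queued and the caller must wait for done(ctx).
 */
int copy_maybe_offload(void *dst, const void *src, size_t len,
		       copy_done_fn done, void *ctx)
{
	assert(dst != NULL && src != NULL);
	if (len < OFFLOAD_MIN_BYTES || slot.busy) {
		memcpy(dst, src, len);	/* synchronous CPU path */
		return 0;
	}
	slot = (struct pending_copy){
		.dst = dst, .src = src, .len = len,
		.done = done, .ctx = ctx, .busy = 1,
	};
	return 1;			/* asynchronous path */
}

/* Simulates the engine finishing the queued copy and signalling it. */
void engine_complete(void)
{
	if (!slot.busy)
		return;
	memcpy(slot.dst, slot.src, slot.len);
	slot.busy = 0;
	if (slot.done)
		slot.done(slot.ctx);
}

/* Example callback: sets the int that ctx points to. */
void mark_done(void *ctx)
{
	*(int *)ctx = 1;
}
```

Even in this toy form, every call site has to branch on the return value
and keep its buffers alive until the callback runs, which is the kind of
duplicated logic that is hard to justify for in-kernel buffer copies.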

> * Kernel task offloading (e.g., bulk copying)
> * QoS and kernel perf integration
> * New use cases

I think for this effort to be successful it needs to focus on one
embarrassingly clear use case where CPU copy hits a scaling wall, not a
gallery of potential use cases.

> 2) User-Space API Interface
> * IOCTL proposal
> * Security control
> * User-space app integration

The best API for copy-offload is no new API, i.e. transparent
acceleration of existing software. For example, io_uring is already
asking applications to rewrite and submit bulk work to the kernel. It
would be lovely if the applications got copy offload for free after
paying the io_uring conversion cost.



* Re: [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux
  2025-01-31 17:53 [LSF/MM/BPF TOPIC] Enabling Smart Data Stream Accelerator Support for Linux Wei Huang
  2025-02-03 10:13 ` Jonathan Cameron
  2025-02-04  4:19 ` [Lsf-pc] " Dan Williams
@ 2025-02-04  7:59 ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2025-02-04  7:59 UTC (permalink / raw)
  To: Wei Huang
  Cc: lsf-pc, linux-mm, Don Dutile, Joel Savitz, Moyes, William, Iyer,
	Shyam, Lynch, Nathan, mel.gorman, santosh.shukla, Suthikulpanit,
	Suravee, shivankg, Michael.Day

This is a lot of mumbling for something that should have had a
driver for the basic dmaengine functionality upstream for a long time.

I'd suggest you spend your time on upstreaming that driver first, then
send actual code proposing anything beyond that to the list, and only
start a discussion if that doesn't get anywhere.


