From: Jason Gunthorpe <jgg@nvidia.com>
To: Linus Walleij <linus.walleij@linaro.org>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
Greg KH <gregkh@linuxfoundation.org>,
Dan Williams <dan.j.williams@intel.com>,
ksummit@lists.linux.dev
Subject: Re: [MAINTAINERS SUMMIT] Between a rock and a hard place, managing expectations...
Date: Thu, 24 Aug 2023 11:19:24 -0300
Message-ID: <ZOdm7Kr/HWrlXiux@nvidia.com>
In-Reply-To: <CACRpkdbt-GxDgGbpETJTjBXz6qH2yLFgTR8BVVU9EU1uj7tJ+Q@mail.gmail.com>
On Thu, Aug 24, 2023 at 10:16:31AM +0200, Linus Walleij wrote:
> On Thu, Aug 24, 2023 at 2:47 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > On Tue, Aug 22, 2023 at 05:29:13PM +0300, Laurent Pinchart wrote:
>
> > > In some (many ?) cases, the lowest effort path is to try and sneak it in
> > > without us noticing rather than "fighting it out" or "designing it out"
> > > among themselves. There are cases where this behaviour is even the
> > > consensus among vendors, as they collectively prefer keeping the design
> > > effort low and get drivers and whole new subsystems upstream without
> > > taking the community interests into account at all.
> >
> > I've begun to have the opinion that the old incentive structure in the
> > industry has fallen apart.
> (...)
> > Now we have single entities that are OEM/(Largest) Customer/OS vendor
> > and sometimes even Chip designer all rolled into one opaque box. They
> > run, and only care about, a completely proprietary stack in userspace.
>
> I have a more optimistic view.
>
> Maybe it depends where you look.
Yes, I didn't say so explicitly, but I'm specifically looking at the
Datacenter/Cloud/Enterprise area - not Android. Arguably Android is
more classical, with Google acting principally as an OS vendor.
> For deeply embedded silicon even in datacenters, companies like
> Red Hat have pushed the vendors to work upstream because they
> don't want to carry any custom patches. Jon Masters has been
> instrumental here, requiring upstream drivers and ACPI support for
> server silicon.
The influence of the OS vendor in this space has declined
considerably. No hyperscale cloud uses Red Hat as the hypervisor
OS. Many now even provide their own in-house preferred OS for the VMs
to use.
This is what I mean; take Google Cloud as an example. Their cloud side
has a proprietary, closed hypervisor environment. They are their own
OEM, manufacturing their own systems. They have their own hypervisor
OS and VM OS that they control. They even make some of their own
chips, and have vendors make customized off-the-shelf chips just for
them.
IMHO there are a number of new and surprising motivations that come
from this consolidation - this is not the familiar dynamic.
It puts Linux in the role of a de-facto standards body without the
tools of a standards body to manage the situation.
> For drivers/accel I was under the impression that since LF is backing
> PyTorch that would be the default userspace, but I don't know how they
> stand with that as it seems CUDA-centric for accelerators, and
> admittedly I don't know what conformance would mean in that case.
> What is even the backend API for an accelerator userspace?
> CUDA and OpenCL?
Yeah, there is a big glue layer between PyTorch and the actual HW.
I feel the industry settled on things like PyTorch as the agreed
interop layer and left the driver layer below it alone. So we have
CUDA/ROCm/OneAPI as completely separate HW stacks leading up to PyTorch.
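To make that layering concrete, here is a minimal sketch from the
application side (plain PyTorch, nothing vendor-specific assumed; the
"xpu" probe only exists on builds that ship torch.xpu). The same
tensor code runs on whichever vendor stack is underneath:

import torch

def pick_device() -> torch.device:
    # CUDA and ROCm builds of PyTorch both surface as "cuda"; Intel's
    # OneAPI stack surfaces as "xpu" where torch.xpu is available.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(1024, 1024, device=device)
y = x @ x  # dispatched to whichever vendor backend sits below
print(device, y.sum().item())

Everything below that device string - compiler, runtime, kernel driver -
is a separate per-vendor stack, and that is the part the industry has
left alone.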
This is not even new; in HPC networking for the last 30 years we've
had MPI as the agreed interop layer, and under MPI are fully parallel
per-device stacks - with varying levels of openness. At least
classical HPC had several well-funded actors who had a strong
incentive to follow open source methodologies.
AI HPC hasn't developed that incentive yet; it is too new and exciting.
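To ground the MPI analogy above, a similarly minimal sketch (assuming
mpi4py and an MPI launcher such as mpirun are installed; the file name
below is just illustrative). The same collective runs unchanged
whether the MPI library underneath is built on InfiniBand, Slingshot,
or plain TCP:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank contributes its rank number; MPI sums them across the job
# using whatever fabric/provider this MPI build was configured with.
total = comm.allreduce(rank, op=MPI.SUM)

if rank == 0:
    print(f"{comm.Get_size()} ranks, sum of ranks = {total}")

Run it with something like 'mpirun -n 4 python allreduce.py'; swapping
the fabric or the MPI implementation underneath needs no change to
this code.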
> For GPIO (admittedly a smaller problem than GPU) we simply made
> a new uAPI to supersede the old one when it didn't work out.
To use Dan's example, if we have to call a mulligan on Confidential
Compute uAPIs it would be a total disaster, as we would be unable to
remove the old ones, ever. Some cloud operator would build their
proprietary software stack on the old API and simply refuse to ever
change. This stuff is huge and super complicated, so the cost of
keeping two in parallel would be impractical.
I think your point of try-and-fail-fast only works if we have the
flexibility to wipe out the failure. If the failure lasts forever, the
cost of failing can become too great.
Jason