From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: ksummit-discuss@lists.linuxfoundation.org
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation
Date: Fri, 09 May 2014 13:31:35 +0200
Message-ID: <3098666.qf3pAh5N1u@avalon>
In-Reply-To: <1399625714.879.9.camel@i7.infradead.org>
On Friday 09 May 2014 09:55:14 David Woodhouse wrote:
> On Thu, 2014-05-08 at 12:56 -0700, James Bottomley wrote:
> > On Thu, 2014-05-08 at 13:37 +0100, David Woodhouse wrote:
> > > I'd like to have a discussion about handling device errors.
> > >
> > > IOMMUs are becoming more common, and we've seen some failure modes where
> > > we just end up with an endless stream of fault reports from a given
> > > device, and the kernel can do nothing else.
> >
> > This is when the addresses being sent by the bus don't have IOTLB
> > entries?
>
> You speak as if you have a software-filled IOTLB. I'd have phrased that
> as "don't have page table entries". But yes, that.
>
> Or they have read-only IOTLB entries, and they're trying to write.
Or they're trying to perform secure access on non-secure IOTLB entries.

I've recently run into IOMMU issues that resulted in endless messages being
printed to the kernel log, exactly as you've mentioned, and found the
error reporting mechanisms to be less than adequate.

The problem is twofold: we first need a mechanism to associate errors with
devices, and then a second mechanism to handle those errors.

I doubt the first mechanism could be made completely generic, but we should
at least be able to implement it in each subsystem's core code. For instance,
in the IOMMU case, we will need to map I/O VAs to struct device, and I don't
want to see that being scattered across individual IOMMU drivers or bus master
drivers. Better locations would be either the IOMMU core or the DMA mapping
implementation.
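To make the I/O VA to device association a bit more concrete, here is a minimal userspace sketch, not kernel code. It assumes the mapping layer records every mapped range together with its owning device, so that a fault handler can look up the owner from the faulting address. The names struct iova_table, iova_table_add() and iova_table_lookup() are hypothetical, and struct device is only a stand-in for the kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct device {
	const char *name;
};

/* One mapped I/O virtual address range and the device that owns it. */
struct iova_range {
	uint64_t start;		/* first I/O VA of the range */
	uint64_t size;		/* length in bytes, must be non-zero */
	struct device *owner;
};

#define MAX_RANGES 16

/* Assumed to live in one central place, not in each driver. */
struct iova_table {
	struct iova_range ranges[MAX_RANGES];
	size_t count;
};

/* Record a mapping. Returns 0 on success, -1 if the table is full. */
static int iova_table_add(struct iova_table *t, uint64_t start,
			  uint64_t size, struct device *owner)
{
	assert(size > 0);

	if (t->count == MAX_RANGES)
		return -1;

	t->ranges[t->count++] = (struct iova_range){ start, size, owner };
	return 0;
}

/* Return the device owning the faulting address, or NULL if none does. */
static struct device *iova_table_lookup(const struct iova_table *t,
					uint64_t addr)
{
	for (size_t i = 0; i < t->count; i++) {
		const struct iova_range *r = &t->ranges[i];

		/* Written this way to avoid overflow in start + size. */
		if (addr >= r->start && addr - r->start < r->size)
			return r->owner;
	}

	return NULL;
}
```

A linear scan over a fixed array is only there to keep the sketch short. A fault on an address outside every recorded range returns NULL here, so a real implementation would need another way to identify the device in that case.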
The second mechanism, error handling, will likely require a mix of generic
code for device isolation and/or reset (when possible) and driver-specific
code for proper recovery. A fast reaction to prevent more faults from being
generated should be coupled with a slower reaction to fix the actual cause of
the problem. I expect the problem to be fatal in most cases, and, for IOMMUs
again, usually caused by a software bug rather than a hardware misbehaviour
(although the latter can of course happen). From an overall system point of
view, preventing the denial of service that follows such errors (caused by
kernel log flooding for instance, or by the IOMMU being unable to serve other
bus masters) could be our first priority.

I'm interested in taking part in this discussion.
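The fast/slow split could look roughly like the following userspace sketch. It assumes a per-device fault counter over a fixed time window: faults are logged until a limit is crossed, then the device is marked isolated and further reports are suppressed until a slower recovery step has run. All names (struct fault_state, fault_report(), fault_recover()) and the FAULT_WINDOW and FAULT_LIMIT values are hypothetical, and the actual isolation and recovery are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the caller should do with a reported fault. */
enum fault_action {
	FAULT_LOG,	/* report the fault normally */
	FAULT_ISOLATE,	/* limit crossed: isolate the device, queue recovery */
	FAULT_SUPPRESS,	/* device already isolated: stay quiet */
};

/* Per-device fault bookkeeping (hypothetical). */
struct fault_state {
	uint64_t window_start;	/* start of the counting window, in ticks */
	unsigned int count;	/* faults seen in the current window */
	bool isolated;		/* fast reaction has been taken */
	bool recovery_pending;	/* slow reaction still has to run */
};

#define FAULT_WINDOW	1000	/* window length in ticks (arbitrary) */
#define FAULT_LIMIT	10	/* faults tolerated per window (arbitrary) */

/* Fast path, called for every fault. 'now' must not go backwards. */
static enum fault_action fault_report(struct fault_state *s, uint64_t now)
{
	assert(now >= s->window_start);

	if (s->isolated)
		return FAULT_SUPPRESS;

	/* Start a new counting window once the old one has expired. */
	if (now - s->window_start >= FAULT_WINDOW) {
		s->window_start = now;
		s->count = 0;
	}

	if (++s->count > FAULT_LIMIT) {
		s->isolated = true;
		s->recovery_pending = true;
		return FAULT_ISOLATE;
	}

	return FAULT_LOG;
}

/*
 * Slow path, run later outside the fault handler. Driver-specific recovery
 * would happen here. Returns true if a pending recovery was handled.
 */
static bool fault_recover(struct fault_state *s, uint64_t now)
{
	if (!s->recovery_pending)
		return false;

	s->recovery_pending = false;
	s->isolated = false;
	s->count = 0;
	s->window_start = now;
	return true;
}
```

With these values the first ten faults in a window are logged, the eleventh triggers isolation, and everything after that is dropped silently until fault_recover() has run, which is the part that bounds the log flooding.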
> And as I said, once we start looking at it I suspect we'll end up
> finding other offences that need to be taken into consideration. Which
> is why I think this warrants a wider discussion rather than the IOMMU
> owners sitting in a darkened room doing it amongst themselves.
>
> > > But I absolutely don't want us to be implementing policies like that in
> > > an individual IOMMU driver; this needs to be handled by generic device
> > > code. Once upon a time I might have said PCI code, but this is actually
> > > relevant for non-PCI devices too.
> >
> > Right, with my PARISC hat on, our IOMMUs sit adjacent to the CPUs. The
> > PCI busses (if we have any) are a couple of layers down.
>
> Even the Intel IOMMU can do mappings (and take faults) for ACPI devices,
> these days.
>
> > > I want the IOMMU to report errors, and let the system do the appropriate
> > > thing. Which requires some discussion about what the "appropriate thing"
> > > can be in various circumstances, and indeed what options are available
> > > to us on various platforms.
> > >
> > > Participants would be those working with IOMMUs on various platforms,
> > > including Jörg Rödel, myself, and hopefully someone with a fairly
> > > intimate knowledge of EEH as used on POWER systems.
>
> I note that Jörg isn't actually on the nominations list. I think he
> should be...
--
Regards,
Laurent Pinchart