On Thu, 2014-05-08 at 12:56 -0700, James Bottomley wrote:
> On Thu, 2014-05-08 at 13:37 +0100, David Woodhouse wrote:
> > I'd like to have a discussion about handling device errors.
> >
> > IOMMUs are becoming more common, and we've seen some failure modes where
> > we just end up with an endless stream of fault reports from a given
> > device, and the kernel can do nothing else.
>
> This is when the addresses being sent by the bus don't have IOTLB
> entries?

You speak as if you have a software-filled IOTLB. I'd have phrased that
as "don't have page table entries". But yes, that. Or they have
read-only IOTLB entries, and they're trying to write.

And as I said, once we start looking at it I suspect we'll end up
finding other offences that need to be taken into consideration. Which
is why I think this warrants a wider discussion rather than the IOMMU
owners sitting in a darkened room doing it amongst themselves.

> > But I absolutely don't want us to be implementing policies like that in
> > an individual IOMMU driver; this needs to be handled by generic device
> > code. Once upon a time I might have said PCI code, but this is actually
> > relevant for non-PCI devices too.
>
> Right, with my PARISC hat on, our IOMMUs sit adjacent to the CPUs. The
> PCI busses (if we have any) are a couple of layers down.

Even the Intel IOMMU can do mappings (and take faults) for ACPI devices,
these days.

> > I want the IOMMU to report errors, and let the system do the appropriate
> > thing. Which requires some discussion about what the "appropriate thing"
> > can be in various circumstances, and indeed what options are available
> > to us on various platforms.
> >
> > Participants would be those working with IOMMUs on various platforms,
> > including Jörg Rödel, myself, and hopefully someone with a fairly
> > intimate knowledge of EEH as used on POWER systems.

I note that Jörg isn't actually on the nominations list. I think he
should be...

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation