I'm hearing a bunch of FUD around NVMe hotplug but precious little in the
way of bug reports! Keith Busch has been doing a stellar job of fixing up
the bugs that he's found, but I have seen precisely zero hotplug bugs
reported to the NVMe mailing list. So put up or shut up.

On 2014-05-09 1:49 PM, "Roland Dreier" wrote:
> On Thu, May 8, 2014 at 5:37 AM, David Woodhouse wrote:
> > I'd like to have a discussion about handling device errors.
> >
> > IOMMUs are becoming more common, and we've seen some failure modes where
> > we just end up with an endless stream of fault reports from a given
> > device, and the kernel can do nothing else.
> >
> > We may have various options for shutting it up -- a PCI function level
> > reset, power cycling the offending device, or maybe just configuring the
> > IOMMU to *ignore* further errors from it, which would at least let the
> > system get on with doing something useful (and if we do, when do we
> > re-enable reporting?).
>
> I think there's a more general problem that's worth talking about
> here. In addition to IOMMU faults, there are lots of other PCI errors
> that can happen, and we have some small number of drivers that have
> been "hardened" to try and recover from these errors. However even
> for these "hardened" drivers it seems pretty easy to hit deadlocks
> when the driver tries to tear down and reinitialize things.
>
> So I wonder if we can do better without proliferating error handling
> tentacles into all sorts of low-level drivers ("did we just read
> 0xffffffff here? how about here? are we in the middle of error
> recovery? how about now?").
>
> One context where this is becoming a real concern is with NVMe drives.
> These are SSDs that (may) look like normal 2.5" drives, but use PCIe
> rather than SATA or SAS to connect to the host. Since they look like
> normal drives, it's natural to put them into hot-pluggable JBODs, but
> it turns out we react much worse to PCIe surprise removal than, say,
> SAS hotplug.
>
> - R.