From: Roland Dreier <roland@kernel.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: "ksummit-discuss@lists.linuxfoundation.org"
<ksummit-discuss@lists.linuxfoundation.org>
Subject: Re: [Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation
Date: Fri, 9 May 2014 10:48:55 -0700
Message-ID: <CAG4TOxNJxWLWSZYW313XATtCpV+SM9WrYFJ-V+0Pf-yryLh67g@mail.gmail.com>
In-Reply-To: <1399552623.17118.22.camel@i7.infradead.org>

On Thu, May 8, 2014 at 5:37 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> I'd like to have a discussion about handling device errors.
>
> IOMMUs are becoming more common, and we've seen some failure modes where
> we just end up with an endless stream of fault reports from a given
> device, and the kernel can do nothing else.
>
> We may have various options for shutting it up — a PCI function level
> reset, power cycling the offending device, or maybe just configuring the
> IOMMU to *ignore* further errors from it, which would at least let the
> system get on with doing something useful (and if we do, when do we
> re-enable reporting?).

I think there's a more general problem that's worth talking about
here. In addition to IOMMU faults, there are lots of other PCI errors
that can happen, and we have a small number of drivers that have
been "hardened" to try to recover from these errors. However, even
for these "hardened" drivers it seems pretty easy to hit deadlocks
when the driver tries to tear down and reinitialize things.

So I wonder if we can do better without proliferating error handling
tentacles into all sorts of low-level drivers ("did we just read
0xffffffff here? how about here? are we in the middle of error
recovery? how about now?").
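
To make that pattern concrete, here is a minimal user-space sketch of the
kind of per-read check meant above. It assumes the usual behaviour that a
read from a removed PCIe device completes as all ones. The names
`fake_dev`, `fake_readl` and `read_status` are hypothetical stand-ins for
illustration, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: 'present' stands in for a live PCIe link. */
struct fake_dev {
	bool present;
	uint32_t regs[4];
};

/* Stand-in for an MMIO read: a missing device reads back as all ones. */
static uint32_t fake_readl(const struct fake_dev *dev, unsigned int idx)
{
	return dev->present ? dev->regs[idx] : 0xffffffffu;
}

/*
 * The check that ends up repeated at every register access:
 * returns 0 and stores the value on success, -1 if the device looks gone.
 * Note that a register which can legitimately hold 0xffffffff makes
 * this test ambiguous.
 */
static int read_status(const struct fake_dev *dev, uint32_t *out)
{
	uint32_t val = fake_readl(dev, 0);

	if (val == 0xffffffffu)
		return -1;
	*out = val;
	return 0;
}
```

Every caller of every register read then needs its own error path, which
is the proliferation being questioned here.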

One context where this is becoming a real concern is with NVMe drives.
These are SSDs that (may) look like normal 2.5" drives, but use PCIe
rather than SATA or SAS to connect to the host. Since they look like
normal drives, it's natural to put them into hot-pluggable JBODs, but
it turns out we react much worse to PCIe surprise removal than, say,
SAS hotplug.

 - R.