Re: [Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Roland Dreier <roland@kernel.org>
Cc: "ksummit-discuss@lists.linuxfoundation.org"
	<ksummit-discuss@lists.linuxfoundation.org>
Subject: Re: [Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation
Date: Wed, 14 May 2014 11:40:46 +1000
Message-ID: <1400031646.17624.215.camel@pasglop>
In-Reply-To: <CAG4TOxNJxWLWSZYW313XATtCpV+SM9WrYFJ-V+0Pf-yryLh67g@mail.gmail.com>

On Fri, 2014-05-09 at 10:48 -0700, Roland Dreier wrote:
> 
> I think there's a more general problem that's worth talking about
> here.  In addition to IOMMU faults, there are lots of other PCI errors
> that can happen, and we have some small number of drivers that have
> been "hardened" to try and recover from these errors.  However even
> for these "hardened" drivers it seems pretty easy to hit deadlocks
> when the driver tries to tear down and reinitialize things.

Right. We hit that every time we test a new round of machines / FW /
distro on Power with EEH. The error paths in the drivers are very badly
tested.

For example, when our HW "isolates" a device, all reads start returning
ff's on MMIOs. Plenty of drivers will have either infinite or
very-long-timeout loops waiting for a bit to clear...

Also, when our HW decides to fence the entire PCI Express controller
(which can happen for example if it took a parity error in an internal
cache), subsequent MMIOs return ff's but also take a long time (hundreds
of microseconds or more).

We had issues where drivers implement timeouts like this:

	for (i = 0; i < 10000; i++) {
		foo = readl(bar);
		if ((foo & my_bit) == 0)
			break;
		udelay(1);
	}

And expect this to be a 10ms timeout ... in fenced situations, it ends
up being a 100ms or 1s timeout (we've seen much longer ones).
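
One way to make such a loop robust is to bound it by elapsed time rather
than by iteration count, and to bail out early on an all-ones read. The
following is only a userspace sketch under stated assumptions: the
simulated clock, sim_readl(), sim_udelay() and poll_bit_clear() are
hypothetical names made up for illustration, not kernel APIs, and a real
driver would use the kernel's own time source and MMIO accessors. It also
assumes that 0xffffffff is never a valid value for the register being
polled, which is not true for every device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated clock and device, standing in for real time and real MMIO. */
static uint64_t sim_now_us;        /* simulated time, in microseconds */
static uint32_t sim_read_cost_us;  /* latency charged for each register read */
static uint32_t sim_reg_value;     /* value the fake register returns */

/* Hypothetical stand-in for readl(): every read costs simulated time. */
static uint32_t sim_readl(void)
{
	sim_now_us += sim_read_cost_us;
	return sim_reg_value;
}

/* Hypothetical stand-in for udelay(). */
static void sim_udelay(uint32_t us)
{
	sim_now_us += us;
}

enum poll_result { POLL_OK, POLL_TIMEOUT, POLL_DEVICE_GONE };

/*
 * Wait for my_bit to clear, for at most roughly timeout_us of elapsed
 * time. The bound is a deadline on the clock, so slow reads shorten the
 * number of iterations instead of stretching the timeout.
 */
static enum poll_result poll_bit_clear(uint32_t my_bit, uint64_t timeout_us)
{
	uint64_t deadline = sim_now_us + timeout_us;

	for (;;) {
		uint32_t foo = sim_readl();

		/* Assumption: all-ones is never a valid value here. */
		if (foo == 0xffffffffu)
			return POLL_DEVICE_GONE;
		if ((foo & my_bit) == 0)
			return POLL_OK;
		if (sim_now_us >= deadline)
			return POLL_TIMEOUT;
		sim_udelay(1);
	}
}
```

With a simulated read latency of 500us and a stuck bit, this returns
POLL_TIMEOUT after about 10ms of simulated time, where the counted loop
above would spin for 10000 * 501us, i.e. about 5 seconds.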

One way to help find/fix these would be a better error injection
capability to "isolate" devices, for example by remapping their
MMIOs to something that returns ff's :-)
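
As a rough illustration of the injection idea, here is a much simpler
variant than remapping the MMIO pages: a software switch in the register
accessors that makes an "isolated" device read back as all-ones. This is a
userspace sketch only; struct fake_dev, dev_readl() and dev_writel() are
hypothetical names, and dropping writes while isolated is an assumption
about how the fault should behave, not something taken from a real
platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device handle used only for this sketch. */
struct fake_dev {
	volatile uint32_t *regs;  /* normal register window */
	bool isolated;            /* set by the test harness to inject the fault */
};

/* Reads return all-ones while the device is marked isolated. */
static uint32_t dev_readl(const struct fake_dev *dev, unsigned int idx)
{
	if (dev->isolated)
		return 0xffffffffu;
	return dev->regs[idx];
}

/* Assumption: writes to an isolated device are silently dropped. */
static void dev_writel(struct fake_dev *dev, unsigned int idx, uint32_t val)
{
	if (dev->isolated)
		return;
	dev->regs[idx] = val;
}
```

A test harness could flip dev->isolated at arbitrary points while a driver
model is running and check that every polling loop still terminates.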

> So I wonder if we can do better without proliferating error handling
> tentacles into all sorts of low-level drivers ("did we just read
> 0xffffffff here?  how about here?  are we in the middle of error
> recovery?  how about now?").

We can't, because ultimately that is what the HW will return when it's
broken, disconnected, has lost its link, or has been isolated by EEH.

> One context where this is becoming a real concern is with NVMe drives.
>  These are SSDs that (may) look like normal 2.5" drives, but use PCIe
> rather than SATA or SAS to connect to the host.  Since they look like
> normal drives, it's natural to put them into hot-pluggable JBODs, but
> it turns out we react much worse to PCIe surprise removal than, say,
> SAS hotplug.

Cheers,
Ben.
