From: David Woodhouse <dwmw2@infradead.org>
To: "ksummit-discuss@lists.linuxfoundation.org"
	<ksummit-discuss@lists.linuxfoundation.org>
Subject: [Ksummit-discuss] [CORE TOPIC] Device error handling / reporting / isolation
Date: Thu, 08 May 2014 13:37:03 +0100
Message-ID: <1399552623.17118.22.camel@i7.infradead.org>

I'd like to have a discussion about handling device errors.

IOMMUs are becoming more common, and we've seen some failure modes where
we just end up with an endless stream of fault reports from a given
device, and the kernel can do nothing else.

We may have various options for shutting it up: a PCI function-level
reset, power cycling the offending device, or maybe just configuring the
IOMMU to *ignore* further errors from it, which would at least let the
system get on with doing something useful (and if we do, when do we
re-enable reporting?).

But I absolutely don't want us to be implementing policies like that in
an individual IOMMU driver; this needs to be handled by generic device
code. Once upon a time I might have said PCI code, but this is actually
relevant for non-PCI devices too.

I want the IOMMU to report errors, and let the system do the appropriate
thing. That requires some discussion about what the "appropriate thing"
can be in various circumstances, and indeed what options are available
to us on various platforms.

Participants would be those working with IOMMUs on various platforms,
including Jörg Rödel, myself, and hopefully someone with a fairly
intimate knowledge of EEH as used on POWER systems.

We probably also want KVM folks to weigh in on how, if at all, they'd
want errors on assigned devices to be reported to guests.

I strongly suspect that once we start looking at it, we'll find other
triggers than "IOMMU faults" for starting to isolate and reset
misbehaving devices. Interrupt storms are perhaps one of them; we've
never been particularly robust against those, either.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation
