* [Ksummit-discuss] [TECH TOPIC] Printk softlockups
@ 2014-05-15 21:14 Jan Kara
2014-05-15 21:44 ` josh
2014-05-15 22:20 ` Jiri Kosina
0 siblings, 2 replies; 3+ messages in thread
From: Jan Kara @ 2014-05-15 21:14 UTC
To: ksummit-discuss
Hello,
for about a year I have been trying to upstream patches which allow booting of
large machines with a serial console attached. The problem is that there are
lots of messages printed during boot (e.g. during device discovery - think
of tens or even hundreds of disks, ...). Currently, console_unlock() prints
messages from the kernel printk buffer to the console for as long as the
buffer is non-empty. When a serial console is attached, printing is slow and
thus other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus that CPU can spend a theoretically
unbounded amount of time (in practice tens of seconds) printing in
console_unlock(), leading to softlockups, RCU stalls, lost interrupts, and
effectively a dead system.
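The loop described above can be modelled in a few lines of userspace C.
This is only a sketch, not kernel code: the function name and its
parameters are made up for the illustration, and "printing" is reduced to
a counter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of a "print until the buffer is empty" loop.
 * Hypothetical names throughout; this is not console_unlock() itself.
 *
 * pending:        messages already in the log buffer
 * arrivals:       messages other CPUs will still append
 * arrive_per_msg: messages appended while one message is being printed
 *
 * Returns how many messages the printing CPU emits before the buffer
 * is finally empty and it can leave the loop.
 */
size_t drain_until_empty(size_t pending, size_t arrivals, size_t arrive_per_msg)
{
	const size_t total = pending + arrivals;
	size_t printed = 0;

	while (pending > 0) {
		/* "Print" one message to the slow console. */
		pending--;
		printed++;

		/* Meanwhile, other CPUs append more messages. */
		size_t n = arrivals < arrive_per_msg ? arrivals : arrive_per_msg;
		arrivals -= n;
		pending += n;

		/* No message is lost or duplicated in this model. */
		assert(printed + pending + arrivals == total);
	}
	return printed;
}
```

With 10 buffered messages and 1000 more arriving at a rate of one per
printed message, the single printing CPU ends up emitting all 1010 of
them before it can return.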
Now over the year I've tried several approaches and the scenario is always
the same - I submit patches, then someone comes, complains he doesn't like
it and possibly suggests another way to do it. So I do it another way,
someone comes and doesn't like it *that* way... Now I've done 8 or so
iterations of the patchset and I'm getting frustrated, I have to say.
In the last iteration Alan Cox suggested [1] that I should implement a
buffering console using the tty layer and stick it on top of the serial
console. So printing would happen only to another buffer, would be fast, and
the problem wouldn't appear. Frankly, I don't see a big advantage of this
approach over simply stopping printing of the kernel log buffer early, and it
seems to me that modifying serial drivers which work in putchar,
poll-until-ready style to work with buffering would be rather complex.
So I would really like as many of the involved people as possible to sit down
in one room and think over what guarantees we want from printk, which
complexity is acceptable, and hopefully we can agree on a way accepted by
all parties to resolve the issue.
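For illustration, "stopping printing early" could look roughly like the
following userspace sketch. The struct, the function name, and the idea
of a fixed per-call budget are assumptions made for the example, and who
prints the leftover messages is deliberately left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of one bounded printing pass. */
struct drain_result {
	size_t printed;   /* messages this CPU wrote to the console */
	size_t left_over; /* messages still buffered when it stopped */
};

/*
 * Print at most `budget` messages, then stop even if the buffer is
 * not empty. Handing the leftovers to someone else (another CPU, a
 * worker, ...) is out of scope for this sketch.
 */
struct drain_result drain_bounded(size_t pending, size_t budget)
{
	struct drain_result r = { 0, pending };

	while (r.left_over > 0 && r.printed < budget) {
		r.left_over--;
		r.printed++;
	}

	/* The time spent printing is bounded by the budget. */
	assert(r.printed <= budget);
	return r;
}
```

The point of the bound is that the time one CPU spends printing no
longer depends on how fast other CPUs append messages.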
People involved in the discussion:
Jan Kara <jack@suse.cz>
Andrew Morton <akpm@linux-foundation.org>
Steven Rostedt <rostedt@goodmis.org>
Alan Cox <gnomes@lxorguk.ukuu.org.uk>
[1] https://lkml.org/lkml/2014/4/22/251, https://lkml.org/lkml/2014/4/23/647
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [Ksummit-discuss] [TECH TOPIC] Printk softlockups
From: josh @ 2014-05-15 21:44 UTC
To: Jan Kara; +Cc: ksummit-discuss
On Thu, May 15, 2014 at 11:14:55PM +0200, Jan Kara wrote:
> for about a year I have been trying to upstream patches which allow booting of
> large machines with serial console attached. The problem is that there are
> lots of messages printed during boot (e.g. during device discovery - think
> of tens or even hundreds of disks, ...). Currently, console_unlock() prints
> messages from kernel printk buffer to console while the buffer is
> non-empty. When serial console is attached, printing is slow and thus other
> CPUs in the system have plenty of time to append new messages to the buffer
> while one CPU is printing. Thus the CPU can spend theoretically unbounded
> amount of time (in practice tens of seconds) doing printing in
> console_unlock() leading to softlockups, RCU stalls, lost interrupts and
> effectively the system dies.
>
> Now over the year I've tried several approaches and the scenario is always
> the same - I submit patches, then someone comes, complains he doesn't like
> it and possibly suggests another way to do it. So I do it another way,
> someone comes and doesn't like it *that* way... Now I've done 8 or so
> iterations of the patchset and I'm getting frustrated I have to say.
My sympathies; that sounds like exactly the right kind of discussion to
have at Kernel Summit.
> In the last iteration Alan Cox suggested [1] that I should implement a
> buffering console using tty layer and stick it on top of serial console. So
> printing would happen only to another buffer, would be fast and the problem
> won't appear. Frankly I don't see a big advantage of this approach to just
> simply stopping printing of kernel log buffer early and it seems to me
> modifying of serial drivers which work in putchar, poll-until-ready style
> to work with buffering would be rather complex.
>
> So I would really like as much involved people as possible to sit down in
> one room and think over what guarantees do we want from printk, which
> complexity is acceptable, and hopefully we can agree on a way accepted by
> all parties to resolve the issue.
>
> People involved in the discussion:
> Jan Kara <jack@suse.cz>
> Andrew Morton <akpm@linux-foundation.org>
> Steven Rostedt <rostedt@goodmis.org>
> Alan Cox <gnomes@lxorguk.ukuu.org.uk>
>
> [1] https://lkml.org/lkml/2014/4/22/251, https://lkml.org/lkml/2014/4/23/647
I'd be interested in this discussion, both for its own sake and because
of the overlap with tinification and embedded.
The kernel seems entirely too chatty by default. Many of our current
messages need to move to a lower-priority loglevel. The default output,
even *without* "quiet" or "loglevel", should only include "what went
wrong", not "what went right". For the rest, you can buffer messages in
userspace; any situation critical enough to make a userspace logging
solution unusable should result in messages critical enough to end up
directly on the serial port. (That's actually a good rule of thumb for
critical messages: "if this goes wrong, might it become impossible to
get at the message via the normal userspace-captured log"?)
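As a small illustration of the loglevel rule: the kernel numbers its log
levels from 0 (most severe) to 7 (debug), and a message is written to
the console only if its level is numerically lower than the console
loglevel threshold. The sketch below models just that comparison in
userspace C; the function name is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric values mirror the kernel's log levels (0 = most severe). */
enum loglevel {
	LOGLEVEL_EMERG,   /* 0 */
	LOGLEVEL_ALERT,   /* 1 */
	LOGLEVEL_CRIT,    /* 2 */
	LOGLEVEL_ERR,     /* 3 */
	LOGLEVEL_WARNING, /* 4 */
	LOGLEVEL_NOTICE,  /* 5 */
	LOGLEVEL_INFO,    /* 6 */
	LOGLEVEL_DEBUG    /* 7 */
};

/*
 * Hypothetical helper: a message goes to the console only if it is
 * more severe than the threshold. Everything else stays in the log
 * buffer, where userspace can still read it.
 */
bool goes_to_console(int msg_level, int console_loglevel)
{
	assert(msg_level >= LOGLEVEL_EMERG && msg_level <= LOGLEVEL_DEBUG);
	return msg_level < console_loglevel;
}
```

With a threshold of 4, an error-level "what went wrong" message still
reaches the serial port, while an info-level "disk detected" message is
only buffered.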
So, I'd be interested in whether we can make the kernel usable for your
scenario *without* extensive fixes to printk. We should also fix the
printk-to-serial path to not suck as much as it does, especially for
non-emergency messages. ("softlockup detected" is the kind
of message that needs to be printed by a routine with a higher priority
than potential softlockups; "disk detected" isn't.)
- Josh Triplett
* Re: [Ksummit-discuss] [TECH TOPIC] Printk softlockups
From: Jiri Kosina @ 2014-05-15 22:20 UTC
To: Jan Kara; +Cc: ksummit-discuss
On Thu, 15 May 2014, Jan Kara wrote:
[ ... snip ... ]
> So I would really like as much involved people as possible to sit down in
> one room and think over what guarantees do we want from printk, which
> complexity is acceptable, and hopefully we can agree on a way accepted by
> all parties to resolve the issue.
>
> People involved in the discussion:
> Jan Kara <jack@suse.cz>
> Andrew Morton <akpm@linux-foundation.org>
> Steven Rostedt <rostedt@goodmis.org>
> Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Yes, this story is indeed frustrating.
What is worse, printk() needs even more surgery so that it really doesn't
lock up the machines 'super-hard' when called from NMI context.
We've spent a non-trivial amount of time fixing this [1]. It might be a
natural followup to the discussion you are proposing, as we are basically
making printk() even more complicated with that patchset ... but for a
good reason as well.
Currently, printk() is able (and we've seen it happening) to just
completely lock up the machine with all processes stuck in NMI context,
which is rather undebuggable, so we had better have it fixed. But yes,
admittedly, it makes the printk() code yet more complex, which might cause
headaches for people who are already afraid of your patchset.
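To make the NMI problem concrete: if an NMI interrupts code that holds
the printk lock and the NMI handler then spins on the same lock, nothing
can ever release it. One generic way to avoid that is to never spin in
NMI context and to stash the message elsewhere when the lock is busy.
The userspace sketch below shows only that general idea; it is not a
description of the patchset in [1], and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

static atomic_flag log_lock = ATOMIC_FLAG_INIT; /* stands in for the printk lock */
static char main_log[256];     /* protected by log_lock */
static char nmi_side_buf[256]; /* hypothetical side buffer (would be per-CPU) */

/* Append msg to buf, truncating if buf is full. */
static void append(char *buf, size_t size, const char *msg)
{
	strncat(buf, msg, size - strlen(buf) - 1);
}

/*
 * Log from "NMI context": try the lock once and never spin.
 * Returns true if the message went to the main log, false if it was
 * parked in the side buffer for a later flush.
 */
bool log_from_nmi(const char *msg)
{
	if (atomic_flag_test_and_set(&log_lock)) {
		/* The lock is held, possibly by the code we interrupted. */
		append(nmi_side_buf, sizeof(nmi_side_buf), msg);
		return false;
	}
	append(main_log, sizeof(main_log), msg);
	atomic_flag_clear(&log_lock);
	return true;
}
```

A real implementation would also need a safe point at which the side
buffer is merged back into the main log, which is part of why such fixes
add complexity.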
[1] https://lkml.org/lkml/2014/5/9/118
--
Jiri Kosina
SUSE Labs