From: Steven Rostedt <rostedt@goodmis.org>
To: Daniel Wang <wonderfly@google.com>
Cc: stable@vger.kernel.org, pmladek@suse.com,
Alexander.Levin@microsoft.com, akpm@linux-foundation.org,
byungchul.park@lge.com, dave.hansen@intel.com,
hannes@cmpxchg.org, jack@suse.cz, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, mathieu.desnoyers@efficios.com,
mgorman@suse.de, mhocko@kernel.org, pavel@ucw.cz,
penguin-kernel@I-love.SAKURA.ne.jp, peterz@infradead.org,
tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz,
xiyou.wangcong@gmail.com, pfeiner@google.com
Subject: Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"
Date: Mon, 1 Oct 2018 15:23:24 -0400
Message-ID: <20181001152324.72a20bea@gandalf.local.home>
In-Reply-To: <20180927194601.207765-1-wonderfly@google.com>
On Thu, 27 Sep 2018 12:46:01 -0700
Daniel Wang <wonderfly@google.com> wrote:
> Prior to this change, the combination of `softlockup_panic=1` and
> `softlockup_all_cpu_backtrace=1` may result in a deadlock when the reboot path
> is trying to grab the console lock that is held by the stack trace printing
> path. What seems to be happening is that while there are multiple CPUs, only one
> of them is tasked with printing the backtraces of all CPUs. On a machine with many
> CPUs and a slow serial console (on Google Compute Engine, for example), the stack
> trace printing routine hits a timeout and the reboot path kicks in. The latter
> then tries to print something else, but can't get the lock because it's still
> held by the earlier printing path. This is easily reproducible on a VM with 16+
> vCPUs on Google Compute Engine, which is a very common scenario.
>
> A quick repro is available at
> https://github.com/wonderfly/printk-deadlock-repro. The system hangs 3 seconds
> into executing repro.sh. Credit for both the deadlock analysis and the repro
> goes to Peter Feiner.
>
> Note that I have read previous discussions on backporting this to stable [1].
> The argument against the backport was that this is a non-trivial fix and
> is supposed to prevent hypothetical soft lockups. What we are hitting is a real
> deadlock, in production, however. Hence this request.
>
> [1] https://lore.kernel.org/lkml/20180409081535.dq7p5bfnpvd3xk3t@pathway.suse.cz/T/#u
>
> Serial console logs leading up to the deadlock. As can be seen the stack trace
> was incomplete because the printing path hit a timeout.
I'm fine with having this backported.
-- Steve