Re: BUG: workqueue lockup (2)

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: dvyukov@google.com
Cc: bot+e38be687a2450270a3b593bacb6b5795a7a74edb@syzkaller.appspotmail.com,
	syzkaller-bugs@googlegroups.com, gregkh@linuxfoundation.org,
	kstewart@linuxfoundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, pombredanne@nexb.com, tglx@linutronix.de
Subject: Re: BUG: workqueue lockup (2)
Date: Wed, 20 Dec 2017 19:55:10 +0900	[thread overview]
Message-ID: <201712201955.BHB30282.tMSFVFFJLQHOOO@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <CACT4Y+YZ6yuZqrjAxHEadW56TVS=x=WQqrfRrvMQ=LHU3+Kd8A@mail.gmail.com>

Dmitry Vyukov wrote:
> On Tue, Dec 19, 2017 at 3:27 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > syzbot wrote:
> >>
> >> syzkaller has found reproducer for the following crash on
> >> f3b5ad89de16f5d42e8ad36fbdf85f705c1ae051
> >
> > "BUG: workqueue lockup" is not a crash.
> 
> Hi Tetsuo,
> 
> What is the proper name for all of these collectively?

I think that things which lead to kernel panic when /proc/sys/kernel/panic_on_oops
was set to 1 are called an "oops" (or a "kerneloops").

Speak of "BUG: workqueue lockup", this is not an "oops". This message was
added by 82607adcf9cdf40f ("workqueue: implement lockup detector"), and
this message does not always indicate a fatal problem. This message can be
printed when the system is really out of CPU and memory. As far as I tested,
I think that workqueue was not able to run on specific CPU due to a soft
lockup bug.

> 
> 
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> >> compiler: gcc (GCC) 7.1.1 20170620
> >> .config is attached
> >> Raw console output is attached.
> >> C reproducer is attached
> >> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> >> for information about syzkaller reproducers
> >>
> >>
> >> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 37s!
> >> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 32s!
> >> Showing busy workqueues and worker pools:
> >> workqueue events: flags=0x0
> >>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> >>      pending: cache_reap
> >> workqueue events_power_efficient: flags=0x80
> >>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> >>      pending: neigh_periodic_work, do_cache_clean
> >> workqueue mm_percpu_wq: flags=0x8
> >>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> >>      pending: vmstat_update
> >> workqueue kblockd: flags=0x18
> >>    pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
> >>      pending: blk_timeout_work
> >
> > You gave up too early. There is no hint for understanding what was going on.
> > While we can observe "BUG: workqueue lockup" under memory pressure, there is
> > no hint like SysRq-t and SysRq-m. Thus, I can't tell something is wrong.
> 
> Do you know how to send them programmatically? I tried to find a way
> several times, but failed. Articles that I've found talk about
> pressing some keys that don't translate directly to us-ascii.

# echo t > /proc/sysrq-trigger
# echo m > /proc/sysrq-trigger

> 
> But you can also run the reproducer. No report can possible provide
> all possible useful information, sometimes debugging boils down to
> manually adding printfs. That's why syzbot aims at providing a
> reproducer as the ultimate source of details. Also since a developer
> needs to test a proposed fix, it's easier to start with the reproducer
> right away.

I don't have information about how to run the reproducer (e.g. how many
CPUs, how much memory, what network configuration is needed).

Also, please explain how to interpret raw.log file. The raw.log in
94eb2c03c9bc75aff2055f70734c@google.com had a lot of code output and kernel
messages but did not contain "BUG: workqueue lockup" message. On the other
hand, the raw.log in 001a113f711a528a3f0560b08e76@google.com has only kernel
messages and contains "BUG: workqueue lockup" message. Why they are
significantly different?

Also, can you add timestamp to all messages?
When each message was printed is a clue for understanding relationship.

> 
> 
> > At least you need to confirm that lockup lasts for a few minutes. Otherwise,
> 
> Is it possible to increase the timeout? How? We could bump it up to 2 minutes.

# echo 120 > /sys/module/workqueue/parameters/watchdog_thresh

But generally, reporting multiple times rather than only once gives me
better clue, for the former would tell me whether situation was changing.

Can you try not to give up as soon as "BUG: workqueue lockup" was printed
for the first time?

> 
> 
> > this might be just overstressing. (According to repro.c , 12 threads are
> > created and soon SEGV follows? According to above message, only 2 CPUs?
> > Triggering SEGV suggests memory was low due to saving coredump?)
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2017-12-20 10:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-03 14:31 syzbot
2017-12-03 14:36 ` Dmitry Vyukov
2017-12-03 14:48   ` Thomas Gleixner
2017-12-04 11:08     ` Tetsuo Handa
2017-12-19 12:25 ` syzbot
2017-12-19 14:27   ` Tetsuo Handa
2017-12-19 14:40     ` Dmitry Vyukov
2017-12-19 16:37       ` Dmitry Vyukov
2017-12-20 10:55         ` Tetsuo Handa [this message]
2017-12-21 10:19           ` Dmitry Vyukov
2017-12-21 10:22           ` Dmitry Vyukov
2017-12-21 11:04           ` Dmitry Vyukov
2017-12-21 13:07             ` Tetsuo Handa
2017-12-28 13:43               ` Dmitry Vyukov
2018-05-12 21:52   ` Eric Biggers
2018-05-13  2:06     ` Tetsuo Handa
2018-05-13  3:32       ` Eric Biggers
2018-05-13 14:29         ` Tetsuo Handa
2018-05-13 14:35           ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201712201955.BHB30282.tMSFVFFJLQHOOO@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=bot+e38be687a2450270a3b593bacb6b5795a7a74edb@syzkaller.appspotmail.com \
    --cc=dvyukov@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kstewart@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pombredanne@nexb.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox