From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 178326B0038 for ; Tue, 19 Dec 2017 11:38:11 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id z1so14840057pfl.9 for ; Tue, 19 Dec 2017 08:38:11 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id be8sor5569765plb.89.2017.12.19.08.38.09 for (Google Transport Security); Tue, 19 Dec 2017 08:38:09 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <94eb2c03c9bc75aff2055f70734c@google.com> <001a113f711a528a3f0560b08e76@google.com> <201712192327.FIJ64026.tMQFOOVFFLHOSJ@I-love.SAKURA.ne.jp> From: Dmitry Vyukov Date: Tue, 19 Dec 2017 17:37:48 +0100 Message-ID: Subject: Re: BUG: workqueue lockup (2) Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Tetsuo Handa Cc: syzbot , syzkaller-bugs@googlegroups.com, Greg Kroah-Hartman , Kate Stewart , LKML , Linux-MM , Philippe Ombredanne , Thomas Gleixner On Tue, Dec 19, 2017 at 3:40 PM, Dmitry Vyukov wrote: > On Tue, Dec 19, 2017 at 3:27 PM, Tetsuo Handa > wrote: >> syzbot wrote: >>> >>> syzkaller has found reproducer for the following crash on >>> f3b5ad89de16f5d42e8ad36fbdf85f705c1ae051 >> >> "BUG: workqueue lockup" is not a crash. > > Hi Tetsuo, > > What is the proper name for all of these collectively? > > >>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master >>> compiler: gcc (GCC) 7.1.1 20170620 >>> .config is attached >>> Raw console output is attached. >>> C reproducer is attached >>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ >>> for information about syzkaller reproducers >>> >>> >>> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 37s! >>> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 32s! >>> Showing busy workqueues and worker pools: >>> workqueue events: flags=0x0 >>> pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 >>> pending: cache_reap >>> workqueue events_power_efficient: flags=0x80 >>> pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 >>> pending: neigh_periodic_work, do_cache_clean >>> workqueue mm_percpu_wq: flags=0x8 >>> pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 >>> pending: vmstat_update >>> workqueue kblockd: flags=0x18 >>> pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256 >>> pending: blk_timeout_work >> >> You gave up too early. There is no hint for understanding what was going on. >> While we can observe "BUG: workqueue lockup" under memory pressure, there is >> no hint like SysRq-t and SysRq-m. Thus, I can't tell something is wrong. > > Do you know how to send them programmatically? I tried to find a way > several times, but failed. Articles that I've found talk about > pressing some keys that don't translate directly to us-ascii. On second though, some oopses automatically dump locks/tasks. Should we do the same for this oops? > But you can also run the reproducer. No report can possible provide > all possible useful information, sometimes debugging boils down to > manually adding printfs. That's why syzbot aims at providing a > reproducer as the ultimate source of details. Also since a developer > needs to test a proposed fix, it's easier to start with the reproducer > right away. > > >> At least you need to confirm that lockup lasts for a few minutes. Otherwise, > > Is it possible to increase the timeout? How? We could bump it up to 2 minutes. > > >> this might be just overstressing. (According to repro.c , 12 threads are >> created and soon SEGV follows? According to above message, only 2 CPUs? >> Triggering SEGV suggests memory was low due to saving coredump?) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org