From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199])
	by kanga.kvack.org (Postfix) with ESMTP id D29DE6B0253
	for ; Fri, 15 Sep 2017 10:24:00 -0400 (EDT)
Received: by mail-wr0-f199.google.com with SMTP id 97so2608513wrb.1
	for ; Fri, 15 Sep 2017 07:24:00 -0700 (PDT)
Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15])
	by mx.google.com with ESMTPS id b11si1490688edb.0.2017.09.15.07.23.59
	for (version=TLS1 cipher=AES128-SHA bits=128/128);
	Fri, 15 Sep 2017 07:23:59 -0700 (PDT)
Date: Fri, 15 Sep 2017 16:23:57 +0200
From: Michal Hocko
Subject: Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
Message-ID: <20170915142357.vpuwiv3gzdjtn2vr@dhcp22.suse.cz>
References: <20170915103957.64r5xln7s6wlu3ro@dhcp22.suse.cz>
	<201709152038.BHF26323.LFOMFHOFOJSVQt@I-love.SAKURA.ne.jp>
	<20170915120020.diakzyzsx73ygnfx@dhcp22.suse.cz>
	<201709152109.AID48261.FtHOFMFQOJVLOS@I-love.SAKURA.ne.jp>
	<20170915121401.eaoncsmahh2stqn2@dhcp22.suse.cz>
	<201709152312.EGB69283.VFQOOtFMOFHJSL@I-love.SAKURA.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201709152312.EGB69283.VFQOOtFMOFHJSL@I-love.SAKURA.ne.jp>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Tetsuo Handa
Cc: yuwang668899@gmail.com, vbabka@suse.cz, mpatocka@redhat.com,
	hannes@cmpxchg.org, mgorman@suse.de, dave.hansen@intel.com,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	chenggang.qcg@alibaba-inc.com, yuwang.yuwang@alibaba-inc.com

On Fri 15-09-17 23:12:24, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 15-09-17 21:09:29, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
> > > > [...]
> > > > > You said "identify _why_ we see the lockup trigerring in the first
> > > > > place" without providing means to identify it. Unless you provide
> > > > > means to identify it (in a form which can be immediately and easily
> > > > > backported to 4.9 kernels; that is, backporting not-yet-accepted
> > > > > printk() offloading patchset is not a choice), this patch cannot be
> > > > > refused.
> > > >
> > > > I fail to see why. It simply workarounds an existing problem elsewhere
> > > > in the kernel without deeper understanding on where the problem is. You
> > > > can add your own instrumentation to debug and describe the problem. This
> > > > is no different to any other kernel bugs...
> > >
> > > Please do show us your patch for that. Normal users cannot afford developing
> > > such instrumentation to debug and describe the problem.
> >
> > Stop this nonsense already! Any kernel bug/lockup needs a debugging
> > which might be non-trivial and it is necessary to understand the real
> > culprit. We do not add random hacks to silence a problem. We aim at
> > fixing it!
>
> Assuming that Wang Yu's trace has
>
>   RIP: 0010:[<...>] [<...>] dump_stack+0x.../0x...
>
> line in the omitted part (like Cong Wang's trace did), I suspect that a thread
> which is holding dump_lock is unable to leave console_unlock() from printk() for
> so long because many other threads are trying to call printk() from warn_alloc()
> while consuming all CPU time.

__dump_stack should be an atomic context AFAIR. But as we already
discussed some time ago this lock is not fair and one function might
bounce for too long.
--
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org