From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI, MIME_QP_LONG_LINE,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57229C4740A for ; Sun, 6 Oct 2019 00:10:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D6777222C8 for ; Sun, 6 Oct 2019 00:10:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lca.pw header.i=@lca.pw header.b="PswHiM/B" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6777222C8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=lca.pw Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4A3EB6B0003; Sat, 5 Oct 2019 20:10:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4548E6B0005; Sat, 5 Oct 2019 20:10:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3699B8E0003; Sat, 5 Oct 2019 20:10:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0218.hostedemail.com [216.40.44.218]) by kanga.kvack.org (Postfix) with ESMTP id 154866B0003 for ; Sat, 5 Oct 2019 20:10:52 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 92B77181AC9AE for ; Sun, 6 Oct 2019 00:10:51 +0000 (UTC) X-FDA: 76011429102.09.board77_5ea8a857a2d43 X-HE-Tag: board77_5ea8a857a2d43 X-Filterd-Recvd-Size: 14714 Received: from mail-qt1-f196.google.com (mail-qt1-f196.google.com [209.85.160.196]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Sun, 6 Oct 2019 00:10:50 +0000 (UTC) Received: by mail-qt1-f196.google.com with SMTP id 3so14096567qta.1 for ; Sat, 05 Oct 2019 17:10:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=content-transfer-encoding:from:mime-version:subject:date:message-id :references:cc:in-reply-to:to; bh=O7SlWFQORVPPvXO0cOpsjQAu19u+WXbrw7Qx6dYKCZ0=; b=PswHiM/BIPsKwmlHPu6vurL2+1cXEWaJVXnLl9ik6XdJsZCcOTSbu5LOBOax9w2CWy E1JEjSMKI6x1AuvH3YnoFYlP40EOABs8RuvbzHwz2mz71nPhmtJCScmQqPG/O624eHSV xxINAA1r/xEK8TPWG0avjLOjQmLFyU/Wsf928Y+fLSAnLpvczXRn0g87qZdBSkLqqgER md8uc2rdoHnSMH4k6YnMUgOq8eWuUWoRRQRmXCWpcpo1xnUbWe8HffyICIcqm0yMYfoQ Bgf82uvdHlyr0xbCZKyGTxwY4gQ/oYG1KHnFv4jCXrOY+1hODuSgMuqzUmzbWxG+Dv4Q nFAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:content-transfer-encoding:from:mime-version :subject:date:message-id:references:cc:in-reply-to:to; bh=O7SlWFQORVPPvXO0cOpsjQAu19u+WXbrw7Qx6dYKCZ0=; b=oOv/sNyzsz55FTS+v0BgsVjBUK2XRIiT6mhiRy85xLSkejHeURqfyUnZ+uq5YUIvDj 4hbe+G7lL1qOFJn8GWg0LBuD35P3iThjK0qGEkjMs1zHctCszIF6iVC74U1MlldsZFgN T/0d+cbnrWlZk0DHNhH1WEQVVnGMNMRSQm4V+w0jTaGvF6grFuhd3Gkpz5+nT7vx42fq DxkA6KTGREwFT8oNUw4SNKFf34y5PVPAg69IaEL1CRVJOdTNlDZ5D8qj+sDUn7R+yaCc H2iGbXpw4oDJu1e+eavXiEwZ8kYMTWD2A1uinwUSFDZnj7VDPAlbg+XZXVGZR+RD+IaV jDfg== X-Gm-Message-State: APjAAAWeo/BI2KCdba6YmYPHKZufd+EcNjR66MRKAcoQIwHrb6jkCwVF oOfDjjtR/+ciLBILdqwU4lQrQg== X-Google-Smtp-Source: APXvYqz6Y7iTzwxr+jYTt0aZIZEOUzVynrwLvA6bm1XEeOySdgdKTTsISWvZvEpSef9PEdv9nZWC/Q== X-Received: by 2002:ac8:458e:: with SMTP id l14mr23567049qtn.153.1570320649619; Sat, 05 Oct 2019 17:10:49 -0700 (PDT) Received: from [192.168.1.183] (pool-71-184-117-43.bstnma.fios.verizon.net. [71.184.117.43]) by smtp.gmail.com with ESMTPSA id q64sm6117759qkb.32.2019.10.05.17.10.48 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 05 Oct 2019 17:10:48 -0700 (PDT) Content-Type: multipart/alternative; boundary=Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781 Content-Transfer-Encoding: 7bit From: Qian Cai Mime-Version: 1.0 (1.0) Subject: Re: [PATCH] mm/page_isolation: fix a deadlock with printk() Date: Sat, 5 Oct 2019 20:10:47 -0400 Message-Id: <49F0AD04-6F61-4A1D-BFD5-E0769EC6F103@lca.pw> References: <20191005162942.b392b9336b860e245106faa2@linux-foundation.org> Cc: mhocko@kernel.org, sergey.senozhatsky.work@gmail.com, pmladek@suse.com, rostedt@goodmis.org, peterz@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org In-Reply-To: <20191005162942.b392b9336b860e245106faa2@linux-foundation.org> To: Andrew Morton X-Mailer: iPhone Mail (17A860) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable > On Oct 5, 2019, at 7:29 PM, Andrew Morton wrot= e: >=20 > =EF=BB=BFOn Fri, 4 Oct 2019 12:42:26 -0400 Qian Cai wrote: >=20 >> It is unsafe to call printk() while zone->lock was held, i.e., >>=20 >> zone->lock --> console_sem >>=20 >> because the console could always allocate some memory in different code >> paths and form locking chains in an opposite order, >>=20 >> console_sem --> * --> zone->lock >>=20 >> As the result, it triggers lockdep splats like below and in [1]. It is >> fine to take zone->lock after has_unmovable_pages() (which has >> dump_stack()) in set_migratetype_isolate(). While at it, remove a >> problematic printk() in __offline_isolated_pages() only for debugging as >> well which will always disable lockdep on debug kernels. >>=20 >> The problem is probably there forever, but neither many developers will >> run memory offline with the lockdep enabled nor admins in the field are >> lucky enough yet to hit a perfect timing which required to trigger a >> real deadlock. In addition, there aren't many places that call printk() >> while zone->lock was held. >>=20 >> WARNING: possible circular locking dependency detected >> ------------------------------------------------------ >> test.sh/1724 is trying to acquire lock: >> 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x >> 01: 328/0xa30 >>=20 >> but task is already holding lock: >> 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_iso >> 01: late_page_range+0x216/0x538 >>=20 >> which lock already depends on the new lock. >>=20 >> the existing dependency chain (in reverse order) is: >>=20 >> -> #2 (&(&zone->lock)->rlock){-.-.}: >> lock_acquire+0x21a/0x468 >> _raw_spin_lock+0x54/0x68 >> get_page_from_freelist+0x8b6/0x2d28 >> __alloc_pages_nodemask+0x246/0x658 >> __get_free_pages+0x34/0x78 >> sclp_init+0x106/0x690 >> sclp_register+0x2e/0x248 >> sclp_rw_init+0x4a/0x70 >> sclp_console_init+0x4a/0x1b8 >> console_init+0x2c8/0x410 >> start_kernel+0x530/0x6a0 >> startup_continue+0x70/0xd0 >=20 > This appears to be the core of our problem? No, that is just one of those many places could form the lock chain.=20 console_lock -> other locks -> zone_lock Another example is, https://lore.kernel.org/lkml/1568823006.5576.178.camel@lca.pw/ It is easier to avoid, zone_lock -> console_lock rather than fixing the opposite. > At initialization time, > the sclp driver registers an inappropriate dependency with lockdep. It > does this by calling into the page allocator while holding sclp_lock.=20 > But we don't *want* to teach lockdep that sclp_lock nests outside > zone->lock. We want the opposite. >=20 > So can we address this class of problem by declaring "thou shalt not > call the page allocator while holding a lock which can be taken on the > prink path?". And then declare sclp to be defective. >=20 >=20 > And I think sclp is kinda buggy-but-lucky anyway: if console output is > directed to sclp device #0 and we're then trying to initialize sclp > device #1 then any printk which happens during that initialization will > deadlock. The driver escapes this by only supporting a single device > system-wide but it's not a model which drivers should generally follow. >=20 > (And if sclp will only ever support a single device system-wide, why > the heck does it need to take sclp_lock() on the device initialization > path??) >=20 >=20 --Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable


On Oct 5, 2019, at 7:29 PM, Andrew Morton <= ;akpm@linux-foundation.org> wrote:

=EF=BB=BFOn Fri,  4 Oct 2019 12:42:= 26 -0400 Qian Cai <cai@lca.pw> wrote:

It is unsafe to call printk() while zone->lock= was held, i.e.,

zone->lock --> con= sole_sem

<= /blockquote>
because the console could always= allocate some memory in different code
paths and form locking chains in an opposite order,

console_sem --> * --> zone->lock

<= blockquote type=3D"cite">As the result, it triggers lockdep splats lik= e below and in [1]. It is
<= span>fine to take zone->lock after has_unmovable_pages() (which has
dump_stack()) in set_migra= tetype_isolate(). While at it, remove a
problematic printk() in __offline_isolated_pages() only f= or debugging as
well w= hich will always disable lockdep on debug kernels.

The problem is probably there forever, but neither many developers= will
run memory offl= ine with the lockdep enabled nor admins in the field are
lucky enough yet to hit a perfect timing= which required to trigger a
real deadlock. In addition, there aren't many places that call print= k()
while zone->lo= ck was held.
<= br>
WARNING: possible circular l= ocking dependency detected
= ------------------------------------------------------
test.sh/1724 is trying to acquire lo= ck:
0000000052059ec0 (= console_owner){-...}, at: console_unlock+0x
01: 328/0xa30

but t= ask is already holding lock:
000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: st= art_iso
01: late_page= _range+0x216/0x538

which lock already dep= ends on the new lock.

the existing depend= ency chain (in reverse order) is:

-> #= 2 (&(&zone->lock)->rlock){-.-.}:
      lock_acquire+= 0x21a/0x468
 &n= bsp;    _raw_spin_lock+0x54/0x68
=
      get_pag= e_from_freelist+0x8b6/0x2d28
      __alloc_pages_nodemask+0x246/0x= 658
  &nbs= p;   __get_free_pages+0x34/0x78
      sclp_init+0x1= 06/0x690
  = ;    sclp_register+0x2e/0x248
      sclp_rw_in= it+0x4a/0x70
 &= nbsp;    sclp_console_init+0x4a/0x1b8
      co= nsole_init+0x2c8/0x410
      start_kernel+0x530/0x6a0
     &n= bsp;startup_continue+0x70/0xd0

This appears to be the core of our problem?
<= br>
No, that is just one of those many places could form the lock c= hain. 

console_lock -> other locks -> zo= ne_lock

Another example is,

https://lore.kernel.org/lkml/1568823006.5576.178.camel@lca.pw/
It is easier to avoid,

zone_lock -= > console_lock

rather than fixing the opposite.<= /div>
 At initiali= zation time,
the sclp driver registers an inappropriate depe= ndency with lockdep.  It
does this by calling into the p= age allocator while holding sclp_lock.
But we don't *want* t= o teach lockdep that sclp_lock nests outside
zone->lock. &= nbsp;We want the opposite.

So can we addres= s this class of problem by declaring "thou shalt not
call th= e page allocator while holding a lock which can be taken on the
prink path?".  And then declare sclp to be defective.


And I think sclp is kinda buggy-but-luc= ky anyway: if console output is
directed to sclp device #0 a= nd we're then trying to initialize sclp
device #1 then any p= rintk which happens during that initialization will
deadlock= .  The driver escapes this by only supporting a single devicesystem-wide but it's not a model which drivers should generally follo= w.

(And if sclp will only ever support a si= ngle device system-wide, why
the heck does it need to take s= clp_lock() on the device initialization
path??)


= --Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781--