From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=IiI3=X7=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI,
	MIME_QP_LONG_LINE,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 57229C4740A
	for <linux-mm@archiver.kernel.org>; Sun,  6 Oct 2019 00:10:53 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id D6777222C8
	for <linux-mm@archiver.kernel.org>; Sun,  6 Oct 2019 00:10:52 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=lca.pw header.i=@lca.pw header.b="PswHiM/B"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6777222C8
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=lca.pw
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 4A3EB6B0003; Sat,  5 Oct 2019 20:10:52 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 4548E6B0005; Sat,  5 Oct 2019 20:10:52 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 3699B8E0003; Sat,  5 Oct 2019 20:10:52 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0218.hostedemail.com [216.40.44.218])
	by kanga.kvack.org (Postfix) with ESMTP id 154866B0003
	for <linux-mm@kvack.org>; Sat,  5 Oct 2019 20:10:52 -0400 (EDT)
Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay05.hostedemail.com (Postfix) with SMTP id 92B77181AC9AE
	for <linux-mm@kvack.org>; Sun,  6 Oct 2019 00:10:51 +0000 (UTC)
X-FDA: 76011429102.09.board77_5ea8a857a2d43
X-HE-Tag: board77_5ea8a857a2d43
X-Filterd-Recvd-Size: 14714
Received: from mail-qt1-f196.google.com (mail-qt1-f196.google.com [209.85.160.196])
	by imf18.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Sun,  6 Oct 2019 00:10:50 +0000 (UTC)
Received: by mail-qt1-f196.google.com with SMTP id 3so14096567qta.1
        for <linux-mm@kvack.org>; Sat, 05 Oct 2019 17:10:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=lca.pw; s=google;
        h=content-transfer-encoding:from:mime-version:subject:date:message-id
         :references:cc:in-reply-to:to;
        bh=O7SlWFQORVPPvXO0cOpsjQAu19u+WXbrw7Qx6dYKCZ0=;
        b=PswHiM/BIPsKwmlHPu6vurL2+1cXEWaJVXnLl9ik6XdJsZCcOTSbu5LOBOax9w2CWy
         E1JEjSMKI6x1AuvH3YnoFYlP40EOABs8RuvbzHwz2mz71nPhmtJCScmQqPG/O624eHSV
         xxINAA1r/xEK8TPWG0avjLOjQmLFyU/Wsf928Y+fLSAnLpvczXRn0g87qZdBSkLqqgER
         md8uc2rdoHnSMH4k6YnMUgOq8eWuUWoRRQRmXCWpcpo1xnUbWe8HffyICIcqm0yMYfoQ
         Bgf82uvdHlyr0xbCZKyGTxwY4gQ/oYG1KHnFv4jCXrOY+1hODuSgMuqzUmzbWxG+Dv4Q
         nFAg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:content-transfer-encoding:from:mime-version
         :subject:date:message-id:references:cc:in-reply-to:to;
        bh=O7SlWFQORVPPvXO0cOpsjQAu19u+WXbrw7Qx6dYKCZ0=;
        b=oOv/sNyzsz55FTS+v0BgsVjBUK2XRIiT6mhiRy85xLSkejHeURqfyUnZ+uq5YUIvDj
         4hbe+G7lL1qOFJn8GWg0LBuD35P3iThjK0qGEkjMs1zHctCszIF6iVC74U1MlldsZFgN
         T/0d+cbnrWlZk0DHNhH1WEQVVnGMNMRSQm4V+w0jTaGvF6grFuhd3Gkpz5+nT7vx42fq
         DxkA6KTGREwFT8oNUw4SNKFf34y5PVPAg69IaEL1CRVJOdTNlDZ5D8qj+sDUn7R+yaCc
         H2iGbXpw4oDJu1e+eavXiEwZ8kYMTWD2A1uinwUSFDZnj7VDPAlbg+XZXVGZR+RD+IaV
         jDfg==
X-Gm-Message-State: APjAAAWeo/BI2KCdba6YmYPHKZufd+EcNjR66MRKAcoQIwHrb6jkCwVF
	oOfDjjtR/+ciLBILdqwU4lQrQg==
X-Google-Smtp-Source: APXvYqz6Y7iTzwxr+jYTt0aZIZEOUzVynrwLvA6bm1XEeOySdgdKTTsISWvZvEpSef9PEdv9nZWC/Q==
X-Received: by 2002:ac8:458e:: with SMTP id l14mr23567049qtn.153.1570320649619;
        Sat, 05 Oct 2019 17:10:49 -0700 (PDT)
Received: from [192.168.1.183] (pool-71-184-117-43.bstnma.fios.verizon.net. [71.184.117.43])
        by smtp.gmail.com with ESMTPSA id q64sm6117759qkb.32.2019.10.05.17.10.48
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Sat, 05 Oct 2019 17:10:48 -0700 (PDT)
Content-Type: multipart/alternative; boundary=Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781
Content-Transfer-Encoding: 7bit
From: Qian Cai <cai@lca.pw>
Mime-Version: 1.0 (1.0)
Subject: Re: [PATCH] mm/page_isolation: fix a deadlock with printk()
Date: Sat, 5 Oct 2019 20:10:47 -0400
Message-Id: <49F0AD04-6F61-4A1D-BFD5-E0769EC6F103@lca.pw>
References: <20191005162942.b392b9336b860e245106faa2@linux-foundation.org>
Cc: mhocko@kernel.org, sergey.senozhatsky.work@gmail.com, pmladek@suse.com,
 rostedt@goodmis.org, peterz@infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20191005162942.b392b9336b860e245106faa2@linux-foundation.org>
To: Andrew Morton <akpm@linux-foundation.org>
X-Mailer: iPhone Mail (17A860)
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


--Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable


> On Oct 5, 2019, at 7:29 PM, Andrew Morton <akpm@linux-foundation.org> wrot=
e:
>=20
> =EF=BB=BFOn Fri,  4 Oct 2019 12:42:26 -0400 Qian Cai <cai@lca.pw> wrote:
>=20
>> It is unsafe to call printk() while zone->lock was held, i.e.,
>>=20
>> zone->lock --> console_sem
>>=20
>> because the console could always allocate some memory in different code
>> paths and form locking chains in an opposite order,
>>=20
>> console_sem --> * --> zone->lock
>>=20
>> As the result, it triggers lockdep splats like below and in [1]. It is
>> fine to take zone->lock after has_unmovable_pages() (which has
>> dump_stack()) in set_migratetype_isolate(). While at it, remove a
>> problematic printk() in __offline_isolated_pages() only for debugging as
>> well which will always disable lockdep on debug kernels.
>>=20
>> The problem is probably there forever, but neither many developers will
>> run memory offline with the lockdep enabled nor admins in the field are
>> lucky enough yet to hit a perfect timing which required to trigger a
>> real deadlock. In addition, there aren't many places that call printk()
>> while zone->lock was held.
>>=20
>> WARNING: possible circular locking dependency detected
>> ------------------------------------------------------
>> test.sh/1724 is trying to acquire lock:
>> 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x
>> 01: 328/0xa30
>>=20
>> but task is already holding lock:
>> 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_iso
>> 01: late_page_range+0x216/0x538
>>=20
>> which lock already depends on the new lock.
>>=20
>> the existing dependency chain (in reverse order) is:
>>=20
>> -> #2 (&(&zone->lock)->rlock){-.-.}:
>>       lock_acquire+0x21a/0x468
>>       _raw_spin_lock+0x54/0x68
>>       get_page_from_freelist+0x8b6/0x2d28
>>       __alloc_pages_nodemask+0x246/0x658
>>       __get_free_pages+0x34/0x78
>>       sclp_init+0x106/0x690
>>       sclp_register+0x2e/0x248
>>       sclp_rw_init+0x4a/0x70
>>       sclp_console_init+0x4a/0x1b8
>>       console_init+0x2c8/0x410
>>       start_kernel+0x530/0x6a0
>>       startup_continue+0x70/0xd0
>=20
> This appears to be the core of our problem?

No, that is just one of those many places could form the lock chain.=20

console_lock -> other locks -> zone_lock

Another example is,

https://lore.kernel.org/lkml/1568823006.5576.178.camel@lca.pw/

It is easier to avoid,

zone_lock -> console_lock

rather than fixing the opposite.

>  At initialization time,
> the sclp driver registers an inappropriate dependency with lockdep.  It
> does this by calling into the page allocator while holding sclp_lock.=20
> But we don't *want* to teach lockdep that sclp_lock nests outside
> zone->lock.  We want the opposite.
>=20
> So can we address this class of problem by declaring "thou shalt not
> call the page allocator while holding a lock which can be taken on the
> prink path?".  And then declare sclp to be defective.
>=20
>=20
> And I think sclp is kinda buggy-but-lucky anyway: if console output is
> directed to sclp device #0 and we're then trying to initialize sclp
> device #1 then any printk which happens during that initialization will
> deadlock.  The driver escapes this by only supporting a single device
> system-wide but it's not a model which drivers should generally follow.
>=20
> (And if sclp will only ever support a single device system-wide, why
> the heck does it need to take sclp_lock() on the device initialization
> path??)
>=20
>=20

--Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781
Content-Type: text/html;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable

<html><head><meta http-equiv=3D"content-type" content=3D"text/html; charset=3D=
utf-8"></head><body dir=3D"auto"><div dir=3D"ltr"><br></div><div dir=3D"ltr"=
><br><blockquote type=3D"cite">On Oct 5, 2019, at 7:29 PM, Andrew Morton &lt=
;akpm@linux-foundation.org&gt; wrote:<br><br></blockquote></div><blockquote t=
ype=3D"cite"><div dir=3D"ltr">=EF=BB=BF<span>On Fri, &nbsp;4 Oct 2019 12:42:=
26 -0400 Qian Cai &lt;cai@lca.pw&gt; wrote:</span><br><span></span><br><bloc=
kquote type=3D"cite"><span>It is unsafe to call printk() while zone-&gt;lock=
 was held, i.e.,</span><br></blockquote><blockquote type=3D"cite"><span></sp=
an><br></blockquote><blockquote type=3D"cite"><span>zone-&gt;lock --&gt; con=
sole_sem</span><br></blockquote><blockquote type=3D"cite"><span></span><br><=
/blockquote><blockquote type=3D"cite"><span>because the console could always=
 allocate some memory in different code</span><br></blockquote><blockquote t=
ype=3D"cite"><span>paths and form locking chains in an opposite order,</span=
><br></blockquote><blockquote type=3D"cite"><span></span><br></blockquote><b=
lockquote type=3D"cite"><span>console_sem --&gt; * --&gt; zone-&gt;lock</spa=
n><br></blockquote><blockquote type=3D"cite"><span></span><br></blockquote><=
blockquote type=3D"cite"><span>As the result, it triggers lockdep splats lik=
e below and in [1]. It is</span><br></blockquote><blockquote type=3D"cite"><=
span>fine to take zone-&gt;lock after has_unmovable_pages() (which has</span=
><br></blockquote><blockquote type=3D"cite"><span>dump_stack()) in set_migra=
tetype_isolate(). While at it, remove a</span><br></blockquote><blockquote t=
ype=3D"cite"><span>problematic printk() in __offline_isolated_pages() only f=
or debugging as</span><br></blockquote><blockquote type=3D"cite"><span>well w=
hich will always disable lockdep on debug kernels.</span><br></blockquote><b=
lockquote type=3D"cite"><span></span><br></blockquote><blockquote type=3D"ci=
te"><span>The problem is probably there forever, but neither many developers=
 will</span><br></blockquote><blockquote type=3D"cite"><span>run memory offl=
ine with the lockdep enabled nor admins in the field are</span><br></blockqu=
ote><blockquote type=3D"cite"><span>lucky enough yet to hit a perfect timing=
 which required to trigger a</span><br></blockquote><blockquote type=3D"cite=
"><span>real deadlock. In addition, there aren't many places that call print=
k()</span><br></blockquote><blockquote type=3D"cite"><span>while zone-&gt;lo=
ck was held.</span><br></blockquote><blockquote type=3D"cite"><span></span><=
br></blockquote><blockquote type=3D"cite"><span>WARNING: possible circular l=
ocking dependency detected</span><br></blockquote><blockquote type=3D"cite">=
<span>------------------------------------------------------</span><br></blo=
ckquote><blockquote type=3D"cite"><span>test.sh/1724 is trying to acquire lo=
ck:</span><br></blockquote><blockquote type=3D"cite"><span>0000000052059ec0 (=
console_owner){-...}, at: console_unlock+0x</span><br></blockquote><blockquo=
te type=3D"cite"><span>01: 328/0xa30</span><br></blockquote><blockquote type=
=3D"cite"><span></span><br></blockquote><blockquote type=3D"cite"><span>but t=
ask is already holding lock:</span><br></blockquote><blockquote type=3D"cite=
"><span>000000006ffd89c8 (&amp;(&amp;zone-&gt;lock)-&gt;rlock){-.-.}, at: st=
art_iso</span><br></blockquote><blockquote type=3D"cite"><span>01: late_page=
_range+0x216/0x538</span><br></blockquote><blockquote type=3D"cite"><span></=
span><br></blockquote><blockquote type=3D"cite"><span>which lock already dep=
ends on the new lock.</span><br></blockquote><blockquote type=3D"cite"><span=
></span><br></blockquote><blockquote type=3D"cite"><span>the existing depend=
ency chain (in reverse order) is:</span><br></blockquote><blockquote type=3D=
"cite"><span></span><br></blockquote><blockquote type=3D"cite"><span>-&gt; #=
2 (&amp;(&amp;zone-&gt;lock)-&gt;rlock){-.-.}:</span><br></blockquote><block=
quote type=3D"cite"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lock_acquire+=
0x21a/0x468</span><br></blockquote><blockquote type=3D"cite"><span> &nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;_raw_spin_lock+0x54/0x68</span><br></blockquote>=
<blockquote type=3D"cite"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;get_pag=
e_from_freelist+0x8b6/0x2d28</span><br></blockquote><blockquote type=3D"cite=
"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;__alloc_pages_nodemask+0x246/0x=
658</span><br></blockquote><blockquote type=3D"cite"><span> &nbsp;&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;__get_free_pages+0x34/0x78</span><br></blockquote><block=
quote type=3D"cite"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sclp_init+0x1=
06/0x690</span><br></blockquote><blockquote type=3D"cite"><span> &nbsp;&nbsp=
;&nbsp;&nbsp;&nbsp;&nbsp;sclp_register+0x2e/0x248</span><br></blockquote><bl=
ockquote type=3D"cite"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sclp_rw_in=
it+0x4a/0x70</span><br></blockquote><blockquote type=3D"cite"><span> &nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sclp_console_init+0x4a/0x1b8</span><br></blockq=
uote><blockquote type=3D"cite"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;co=
nsole_init+0x2c8/0x410</span><br></blockquote><blockquote type=3D"cite"><spa=
n> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;start_kernel+0x530/0x6a0</span><br></=
blockquote><blockquote type=3D"cite"><span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;startup_continue+0x70/0xd0</span><br></blockquote><span></span><br><span=
>This appears to be the core of our problem?</span></div></blockquote><div><=
br></div><div>No, that is just one of those many places could form the lock c=
hain.&nbsp;</div><div><br></div><div>console_lock -&gt; other locks -&gt; zo=
ne_lock</div><div><br></div><div>Another example is,</div><div><br></div><di=
v><a href=3D"https://lore.kernel.org/lkml/1568823006.5576.178.camel@lca.pw/"=
>https://lore.kernel.org/lkml/1568823006.5576.178.camel@lca.pw/</a></div><di=
v><br></div><div>It is easier to avoid,</div><div><br></div><div>zone_lock -=
&gt; console_lock</div><div><br></div><div>rather than fixing the opposite.<=
/div><br><blockquote type=3D"cite"><div dir=3D"ltr"><span> &nbsp;At initiali=
zation time,</span><br><span>the sclp driver registers an inappropriate depe=
ndency with lockdep. &nbsp;It</span><br><span>does this by calling into the p=
age allocator while holding sclp_lock. </span><br><span>But we don't *want* t=
o teach lockdep that sclp_lock nests outside</span><br><span>zone-&gt;lock. &=
nbsp;We want the opposite.</span><br><span></span><br><span>So can we addres=
s this class of problem by declaring "thou shalt not</span><br><span>call th=
e page allocator while holding a lock which can be taken on the</span><br><s=
pan>prink path?". &nbsp;And then declare sclp to be defective.</span><br><sp=
an></span><br><span></span><br><span>And I think sclp is kinda buggy-but-luc=
ky anyway: if console output is</span><br><span>directed to sclp device #0 a=
nd we're then trying to initialize sclp</span><br><span>device #1 then any p=
rintk which happens during that initialization will</span><br><span>deadlock=
. &nbsp;The driver escapes this by only supporting a single device</span><br=
><span>system-wide but it's not a model which drivers should generally follo=
w.</span><br><span></span><br><span>(And if sclp will only ever support a si=
ngle device system-wide, why</span><br><span>the heck does it need to take s=
clp_lock() on the device initialization</span><br><span>path??)</span><br><s=
pan></span><br><span></span><br></div></blockquote></body></html>=

--Apple-Mail-73DB2C7D-92CD-48C3-B05D-71BC82A26781--