From: Pavel Tatashin
Date: Tue, 6 Mar 2018 15:48:26 -0500
Subject: Re: [PATCH] mm: might_sleep warning
In-Reply-To: <20180306123655.957e5b6b20b200505544ea7a@linux-foundation.org>
References: <20180306192022.28289-1-pasha.tatashin@oracle.com> <20180306123655.957e5b6b20b200505544ea7a@linux-foundation.org>
To: Andrew Morton
Cc: Steven Sistare, Daniel Jordan, Masayoshi Mizuma, Michal Hocko, Catalin Marinas, AKASHI Takahiro, Gioh Kim, Heiko Carstens, Yaowei Bai, Wei Yang, Paul Burton, Miles Chen, Vlastimil Babka, Mel Gorman, Johannes Weiner, LKML, linux-mm@kvack.org

[CCed everyone]

Hi Andrew,

I'm afraid we cannot change this spinlock to a mutex, because deferred_grow_zone() might be called from interrupt context if an interrupt thread needs to allocate memory.

Thank you,
Pavel

On Tue, Mar 6, 2018 at 3:36 PM, Andrew Morton wrote:
> On Tue, 6 Mar 2018 14:20:22 -0500 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>
> > The robot reported this issue:
> > https://lkml.org/lkml/2018/2/27/851
> >
> > It was introduced by:
> > mm: initialize pages on demand during boot
> >
> > The problem is caused by changing a static branch value within a spin lock.
> > A spin lock disables preemption, and changing a static branch value takes
> > a mutex lock in its path, and thus may sleep.
> >
> > The fix is to add another boolean variable to avoid the need to change
> > the static branch within the spinlock.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1579,6 +1579,7 @@ static int __init deferred_init_memmap(void *data)
> >   * page_alloc_init_late() soon after smp_init() is complete.
> >   */
> >  static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
> > +static bool deferred_zone_grow __initdata = true;
> >  static DEFINE_STATIC_KEY_TRUE(deferred_pages);
> >
> >  /*
> > @@ -1616,7 +1617,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
> >  	 * Bail if we raced with another thread that disabled on demand
> >  	 * initialization.
> >  	 */
> > -	if (!static_branch_unlikely(&deferred_pages)) {
> > +	if (!static_branch_unlikely(&deferred_pages) || !deferred_zone_grow) {
> >  		spin_unlock_irqrestore(&deferred_zone_grow_lock, flags);
> >  		return false;
> >  	}
> > @@ -1683,10 +1684,15 @@ void __init page_alloc_init_late(void)
> >  	/*
> >  	 * We are about to initialize the rest of deferred pages, permanently
> >  	 * disable on-demand struct page initialization.
> > +	 *
> > +	 * Note: it is prohibited to modify static branches in a non-preemptible
> > +	 * context. Since spin_lock() disables preemption, we must use an
> > +	 * extra boolean, deferred_zone_grow.
> >  	 */
> >  	spin_lock(&deferred_zone_grow_lock);
> > -	static_branch_disable(&deferred_pages);
> > +	deferred_zone_grow = false;
> >  	spin_unlock(&deferred_zone_grow_lock);
> > +	static_branch_disable(&deferred_pages);
> >
> >  	/* There will be num_node_state(N_MEMORY) threads */
> >  	atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
>
> Kinda ugly, but I can see the logic behind the decisions.
>
> Can we instead turn deferred_zone_grow_lock into a mutex?
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org