From: Roman Gushchin
To: Karsten Graul
CC: Shakeel Butt, Vladimir Davydov, David Rientjes, Christoph Lameter,
 Pekka Enberg, Joonsoo Kim, Andrew Morton, linux-mm@kvack.org
Subject: Re: BUG: Crash in __free_slab() using SLAB_TYPESAFE_BY_RCU
Date: Thu, 3 Oct 2019 16:11:54 +0000
Message-ID: <20191003161149.GB13950@castle.DHCP.thefacebook.com>
References: <4a5108b4-5a2f-f83c-e6a8-5e0c9074ac69@linux.ibm.com>
 <20191002194121.GA9033@castle.DHCP.thefacebook.com>
 <20191003033540.GA10017@castle.DHCP.thefacebook.com>

On Thu, Oct 03, 2019 at 10:48:15AM +0200, Karsten Graul wrote:
> Hello, Roman!
> 
> On 03/10/2019 05:35, Roman Gushchin wrote:
> > On Wed, Oct 02, 2019 at 12:41:29PM -0700, Roman Gushchin wrote:
> >> Hello, Karsten!
> >>
> >> Thank you for the report!
> >>
> >> On Wed, Oct 02, 2019 at 04:50:53PM +0200, Karsten Graul wrote:
> >>>
> >>> net/smc is calling proto_register(&smc_proto, 1) with
> >>> smc_proto.slab_flags = SLAB_TYPESAFE_BY_RCU. Right after the last
> >>> SMC socket is destroyed, proto_unregister(&smc_proto) is called,
> >>> which calls kmem_cache_destroy(prot->slab). This results in a
> >>> kernel crash in __free_slab(). Platform is s390x, reproduced on
> >>> kernel 5.4-rc1. The problem was introduced by commit
> >>> fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on
> >>> cgroup removal").
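
For context, the registration/teardown pattern in question looks roughly
like this (a condensed sketch based on net/smc/af_smc.c and
net/core/sock.c, not the exact code; error handling omitted):

static struct proto smc_proto = {
	.name		= "SMC",
	.owner		= THIS_MODULE,
	.obj_size	= sizeof(struct smc_sock),
	.slab_flags	= SLAB_TYPESAFE_BY_RCU,
	/* ... hash/unhash callbacks etc. ... */
};

	/* module init: proto_register(prot, 1) creates prot->slab,
	 * here with SLAB_TYPESAFE_BY_RCU set */
	rc = proto_register(&smc_proto, 1);

	/* ... SMC sockets are created and destroyed ... */

	/* module exit: proto_unregister() ends up calling
	 * kmem_cache_destroy(prot->slab) right after the last
	 * socket has been freed */
	proto_unregister(&smc_proto);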
> >>>
> >>> I added a 'call graph'; below that are the crash log and a (simple)
> >>> patch that works for me, but I don't know if this is the correct way
> >>> to fix it.
> >>>
> >>> (Please keep me on CC of this thread because I do not follow the mm
> >>> mailing list, thank you)
> >>>
> >>>
> >>> kmem_cache_destroy()
> >>> -> shutdown_memcg_caches()
> >>>    -> shutdown_cache()
> >>>       -> __kmem_cache_shutdown() (slub.c)
> >>>          -> free_partial()
> >>>             -> discard_slab()
> >>>                -> free_slab()        -- call to __free_slab() is delayed
> >>>                   -> call_rcu(rcu_free_slab)
> >>>       -> memcg_unlink_cache()
> >>>          -> WRITE_ONCE(s->memcg_params.memcg, NULL);          -- !!!
> >>>       -> list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
> >>>       -> schedule_work(&slab_caches_to_rcu_destroy_work);
> >>>             -> work_fn uses rcu_barrier() to wait for rcu_batch,
> >>>                so work_fn is not further involved here...
> >>> ... rcu grace period ...
> >>> rcu_batch()
> >>> ...
> >>> -> rcu_free_slab() (slub.c)
> >>>    -> __free_slab()
> >>>       -> uncharge_slab_page()
> >>>          -> memcg_uncharge_slab()
> >>>             -> memcg = READ_ONCE(s->memcg_params.memcg);  -- !!! memcg == NULL
> >>>             -> mem_cgroup_lruvec(memcg)
> >>>                -> mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);  -- mz == NULL
> >>>                -> lruvec = &mz->lruvec;                             -- lruvec == NULL
> >>>             -> lruvec->pgdat = pgdat;                               -- *crash*
> >>>
> >>> The crash log:
> >>
> >> Hm, I might be wrong, but it seems that the problem is deeper: __free_slab()
> >> called from the rcu path races with kmem_cache_destroy(), which is supposed
> >> to be called when there are no outstanding allocations (and corresponding
> >> pages). Any charged slab page actually holds a reference to the kmem_cache,
> >> which prevents its destruction (look at s->memcg_params.refcnt), but
> >> kmem_cache_destroy() ignores it.

> I don't see a race between kmem_cache_destroy() and __free_slab().
> kmem_cache_destroy() is already done when __free_slab() is called. Maybe the
> below trace shows you the order of calls on my system: kmem_cache_destroy()
> unlinks the memcg caches, sets up the rcu callbacks, then it starts the
> slab_caches_to_rcu_destroy_workfn() worker, and then kmem_cache_destroy() is
> done.

Right, and this is the problem. The question is when call_rcu() in free_slab()
was called: if it happened before kmem_cache_destroy(), it's clearly a bug
inside the slab allocator. Otherwise it's probably an incorrect API invocation.

> You see that the smc code is getting control again after that.
> The worker starts in between (before the smc_exit trace), but keeps waiting
> on the rcu_barrier(). Ages later (see time difference) the rcu grace period
> ends and rcu_free_slab() is called, and it crashes.
> I hope that helps!
> 
> [  145.539917] kmem_cache_destroy before shutdown_memcg_caches() for 0000000068106f00
> [  145.539929] free_slab call_rcu() for 00000000392c2e00, page is 000003d080e45000
> [  145.539961] memcg_unlink_cache clearing memcg for 00000000392c2e00
> [  145.539970] shutdown_cache adding to slab_caches_to_rcu_destroy queue for work: 00000000392c2e00

Does it mean that call_rcu() has been called after kmem_cache_destroy()?
In this case, do you know who called it?

I'd add an atomic flag to the root kmem_cache, set it at the beginning of
kmem_cache_destroy(), and check it in free_slab(). If set, dump the stacktrace.
Just please make sure you're looking at the root kmem_cache flag, not the
memcg one.
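
Something along these lines; an untested sketch, and the destroy_in_progress
field does not exist upstream, it's made up here for illustration (under
CONFIG_MEMCG_KMEM, s->memcg_params.root_cache is NULL for root caches):

	/* hypothetical new field in struct kmem_cache */
	atomic_t destroy_in_progress;

	/* at the very beginning of kmem_cache_destroy(s): */
	atomic_set(&s->destroy_in_progress, 1);

	/* in free_slab(), before queueing the rcu callback:
	 * resolve the root cache first, then test the flag there */
	struct kmem_cache *root = s->memcg_params.root_cache ?: s;

	if (unlikely(atomic_read(&root->destroy_in_progress)))
		dump_stack();	/* who frees during/after cache destruction? */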

Thank you!

Roman

> 
> [  145.540001] free_slab call_rcu() for 00000000392c2900, page is 000003d080e4a200
> [  145.540031] memcg_unlink_cache clearing memcg for 00000000392c2900
> [  145.540041] shutdown_cache adding to slab_caches_to_rcu_destroy queue for work: 00000000392c2900
> 
> [  145.540066] kmem_cache_destroy after shutdown_memcg_caches() for 0000000068106f00
> 
> [  145.540075] kmem_cache_destroy before final shutdown_cache() for 0000000068106f00
> [  145.540086] free_slab call_rcu() for 0000000068106f00, page is 000003d080e0a800
> [  145.540189] shutdown_cache adding to slab_caches_to_rcu_destroy queue for work: 0000000068106f00
> 
> [  145.540548] kmem_cache_destroy after final shutdown_cache() for 0000000068106f00
>    kmem_cache_destroy is done
> [  145.540573] slab_caches_to_rcu_destroy_workfn before rcu_barrier() in workfunc
>    slab_caches_to_rcu_destroy_workfn started and waits in rcu_barrier() now
> [  145.540619] smc.0698ae: smc_exit before smc_pnet_exit
>    smc module exit code gets back control ...
> [  145.540699] smc.616283: smc_exit before unregister_pernet_subsys
> [  145.619747] rcu_free_slab called for 00000000392c2e00, page is 000003d080e45000
>    much later the rcu callbacks are invoked, and will crash
> 
> >>
> >> If my thoughts are correct, the commit you've mentioned didn't introduce
> >> this issue, it just made it easier to reproduce.
> >>
> >> The proposed fix looks dubious to me: the problem isn't in the memcg
> >> pointer (it's just luck that it crashes on it), and it seems incorrect to
> >> not decrease the slab statistics of the original memory cgroup.
> 
> I was quite sure that my approach is way too simple; it's better when the mm
> experts work on that.
> 
> >>
> >> What we probably need to do instead is to extend flush_memcg_workqueue()
> >> to wait for all outstanding rcu free callbacks. I have to think a bit
> >> about the best way to fix it. How easy is it to reproduce the problem?
> 
> I can reproduce this at will and I am happy to test any fixes you provide.
> 
> > 
> > After a second thought, flush_memcg_workqueue() already contains
> > a rcu_barrier() call, so now my first suspicion is that the last free()
> > call occurs after the kmem_cache_destroy() call. Can you, please, check
> > if that's not the case?
> > 
> 
> In kmem_cache_destroy(), the flush_memcg_workqueue() call is the first one,
> and after that shutdown_memcg_caches() is called, which sets up the rcu
> callbacks.

These are callbacks to destroy kmem_caches, not pages.

> So flush_memcg_workqueue() cannot wait for them. If you follow the 'call
> graph' above using the RCU path in slub.c, you can see when the callbacks
> are set up and why no warning is printed.
> 
> 
> Second thought after I wrote all of the above: when flush_memcg_workqueue()
> already contains an rcu_barrier(), what's the point of delaying the slab
> freeing in the rcu case? All rcu readers should be done by then, so the rcu
> callbacks and the worker are not needed? What am I missing here (I am sure
> I miss something, I am completely new to the mm area)?
> 
> > Thanks!
> > 
> >>
> >>>
> >>> [  349.361168] Unable to handle kernel pointer dereference in virtual kernel address space
> >>
> >> Btw, haven't you noticed anything suspicious in dmesg before this line?
> 
> There is no error or warning line in dmesg before this line. Actually, I
> think that all pages are no longer in use, so no warning is printed. Anyway,
> the slab freeing is delayed in any case when RCU is in use, right?
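
For reference, the deferred-free path in question looks roughly like this,
condensed from 5.3-era mm/slub.c (a sketch; details may differ between
versions):

static void free_slab(struct kmem_cache *s, struct page *page)
{
	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
		/* delay the actual free past an rcu grace period */
		call_rcu(&page->rcu_head, rcu_free_slab);
	else
		__free_slab(s, page);
}

static void rcu_free_slab(struct rcu_head *h)
{
	struct page *page = container_of(h, struct page, rcu_head);

	__free_slab(page->slab_cache, page);
}

And note that rcu_barrier() only waits for callbacks that were already queued
when it is called; the rcu_barrier() in flush_memcg_workqueue() runs before
shutdown_memcg_caches(), so it cannot cover callbacks queued during the
shutdown itself.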
> 
> 
> Karsten
> 
> >>
> >> Thank you!
> >>
> >> Roman
> >>
> >>> [  349.361210] Failing address: 0000000000000000 TEID: 0000000000000483
> >>> [  349.361223] Fault in home space mode while using kernel ASCE.
> >>> [  349.361240] AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
> >>> [  349.361340] Oops: 0004 ilc:3 [#1] PREEMPT SMP
> >>> [  349.361349] Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_de
> >>> [  349.361436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
> >>> [  349.361445] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> >>> [  349.361450] Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
> >>> [  349.361464]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> >>> [  349.361470] Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
> >>> [  349.361475]            0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
> >>> [  349.361481]            0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
> >>> [  349.361486]            000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
> >>> [  349.361500] Krnl Code: 00000000003cada6: e310a1400004   lg      %r1,320(%r10)
> >>> [  349.361500]            00000000003cadac: c0e50046c286   brasl   %r14,ca32b8
> >>> [  349.361500]           #00000000003cadb2: a7f4fe36       brc     15,3caa1e
> >>> [  349.361500]           >00000000003cadb6: e32060800024   stg     %r2,128(%r6)
> >>> [  349.361500]            00000000003cadbc: a7f4fd9e       brc     15,3ca8f8
> >>> [  349.361500]            00000000003cadc0: c0e50046790c   brasl   %r14,c99fd8
> >>> [  349.361500]            00000000003cadc6: a7f4fe2c       brc     15,3caa1e
> >>> [  349.361500]            00000000003cadca: ecb1ffff00d9   aghik   %r11,%r1,-1
> >>> [  349.361619] Call Trace:
> >>> [  349.361627] ([<00000000003cabcc>] __free_slab+0x49c/0x6b0)
> >>> [  349.361634]  [<00000000001f5886>] rcu_core+0x5a6/0x7e0
> >>> [  349.361643]  [<0000000000ca2dea>] __do_softirq+0xf2/0x5c0
> >>> [  349.361652]  [<0000000000152644>] irq_exit+0x104/0x130
> >>> [  349.361659]  [<000000000010d222>] do_IRQ+0x9a/0xf0
> >>> [  349.361667]  [<0000000000ca2344>] ext_int_handler+0x130/0x134
> >>> [  349.361674]  [<0000000000103648>] enabled_wait+0x58/0x128
> >>> [  349.361681] ([<0000000000103634>] enabled_wait+0x44/0x128)
> >>> [  349.361688]  [<0000000000103b00>] arch_cpu_idle+0x40/0x58
> >>> [  349.361695]  [<0000000000ca0544>] default_idle_call+0x3c/0x68
> >>> [  349.361704]  [<000000000018eaa4>] do_idle+0xec/0x1c0
> >>> [  349.361748]  [<000000000018ee0e>] cpu_startup_entry+0x36/0x40
> >>> [  349.361756]  [<000000000122df34>] arch_call_rest_init+0x5c/0x88
> >>> [  349.361761]  [<0000000000000000>] 0x0
> >>> [  349.361765] INFO: lockdep is turned off.
> >>> [  349.361769] Last Breaking-Event-Address:
> >>> [  349.361774]  [<00000000003ca8f4>] __free_slab+0x1c4/0x6b0
> >>> [  349.361781] Kernel panic - not syncing: Fatal exception in interrupt
> >>>
> >>>
> >>> A fix that works for me (RFC):
> >>>
> >>> diff --git a/mm/slab.h b/mm/slab.h
> >>> index a62372d0f271..b19a3f940338 100644
> >>> --- a/mm/slab.h
> >>> +++ b/mm/slab.h
> >>> @@ -328,7 +328,7 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
> >>>
> >>>  	rcu_read_lock();
> >>>  	memcg = READ_ONCE(s->memcg_params.memcg);
> >>> -	if (likely(!mem_cgroup_is_root(memcg))) {
> >>> +	if (likely(memcg && !mem_cgroup_is_root(memcg))) {
> >>>  		lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
> >>>  		mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
> >>>  		memcg_kmem_uncharge_memcg(page, order, memcg);
> >>>
> >>> -- 
> >>> Karsten
> >>>
> >>> (I'm a dude)
> >>>
> >>>
> 