From: Roman Gushchin
To: Karsten Graul
CC: Shakeel Butt, Vladimir Davydov, David Rientjes, Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, "linux-mm@kvack.org"
Subject: Re: BUG: Crash in __free_slab() using SLAB_TYPESAFE_BY_RCU
Date: Thu, 3 Oct 2019 03:35:46 +0000
Message-ID: <20191003033540.GA10017@castle.DHCP.thefacebook.com>
References: <4a5108b4-5a2f-f83c-e6a8-5e0c9074ac69@linux.ibm.com> <20191002194121.GA9033@castle.DHCP.thefacebook.com>
In-Reply-To: <20191002194121.GA9033@castle.DHCP.thefacebook.com>

On Wed, Oct 02, 2019 at 12:41:29PM -0700, Roman Gushchin wrote:
> Hello, Karsten!
>
> Thank you for the report!
>
> On Wed, Oct 02, 2019 at 04:50:53PM +0200, Karsten Graul wrote:
> >
> > net/smc is calling proto_register(&smc_proto, 1) with smc_proto.slab_flags = SLAB_TYPESAFE_BY_RCU.
> > Right after the last SMC socket is destroyed, proto_unregister(&smc_proto) is called, which
> > calls kmem_cache_destroy(prot->slab). This results in a kernel crash in __free_slab().
> > Platform is s390x, reproduced on kernel 5.4-rc1. The problem was introduced by commit
> > fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
> >
> > I added a 'call graph', below of that is the crash log and a (simple) patch that works for me,
> > but I don't know if this is the correct way to fix it.
> >
> > (Please keep me on CC of this thread because I do not follow the mm mailing list, thank you)
> >
> >
> > kmem_cache_destroy()
> >  -> shutdown_memcg_caches()
> >      -> shutdown_cache()
> >          -> __kmem_cache_shutdown()  (slub.c)
> >              -> free_partial()
> >                  -> discard_slab()
> >                      -> free_slab()  -- call to __free_slab() is delayed
> >                          -> call_rcu(rcu_free_slab)
> >          -> memcg_unlink_cache()
> >              -> WRITE_ONCE(s->memcg_params.memcg, NULL);  -- !!!
> >          -> list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
> >          -> schedule_work(&slab_caches_to_rcu_destroy_work);  -> work_fn uses rcu_barrier() to wait for rcu_batch,
> >                                                                  so work_fn is not further involved here...
> > ... rcu grace period ...
> > rcu_batch()
> >  ...
> >  -> rcu_free_slab()  (slub.c)
> >      -> __free_slab()
> >          -> uncharge_slab_page()
> >              -> memcg_uncharge_slab()
> >                  -> memcg = READ_ONCE(s->memcg_params.memcg);  -- !!! memcg == NULL
> >                  -> mem_cgroup_lruvec(memcg)
> >                      -> mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);  -- mz == NULL
> >                      -> lruvec = &mz->lruvec;  -- lruvec == NULL
> >                      -> lruvec->pgdat = pgdat;  -- *crash*
> >
> > The crash log:
>
> Hm, I might be wrong, but it seems that the problem is deeper: __free_slab()
> called from the rcu path races with kmem_cache_destroy(), which is supposed
> to be called when there are no outstanding allocations (and corresponding pages).
> Any charged slab page actually holds a reference to the kmem_cache, which prevents
> its destruction (look at s->memcg_params.refcnt), but kmem_cache_destroy() ignores
> it.
>
> If my thoughts are correct, the commit you've mentioned didn't introduced this
> issue, it just made it easier to reproduce.
>
> The proposed fix looks dubious to me: the problem isn't in the memcg pointer
> (it's just a luck that it crashes on it), and it seems incorrect to not decrease
> the slab statistics of the original memory cgroup.
>
> What we probably need to do instead is to extend flush_memcg_workqueue() to
> wait for all outstanding rcu free callbacks. I have to think a bit what's the best
> way to fix it. How easy is to reproduce the problem?

On second thought, flush_memcg_workqueue() already contains an rcu_barrier()
call, so my first suspicion now is that the last free() call occurs after the
kmem_cache_destroy() call. Could you please check whether that is the case?

Thanks!

>
> >
> > [  349.361168] Unable to handle kernel pointer dereference in virtual kernel address space
>
> Btw, haven't you noticed anything suspicious in dmesg before this line?
>
> Thank you!
>
> Roman
>
> > [  349.361210] Failing address: 0000000000000000 TEID: 0000000000000483
> > [  349.361223] Fault in home space mode while using kernel ASCE.
> > [  349.361240] AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
> > [  349.361340] Oops: 0004 ilc:3 [#1] PREEMPT SMP
> > [  349.361349] Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_de
> > [  349.361436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
> > [  349.361445] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> > [  349.361450] Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
> > [  349.361464]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> > [  349.361470] Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
> > [  349.361475]            0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
> > [  349.361481]            0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
> > [  349.361486]            000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
> > [  349.361500] Krnl Code: 00000000003cada6: e310a1400004  lg     %r1,320(%r10)
> > [  349.361500]            00000000003cadac: c0e50046c286  brasl  %r14,ca32b8
> > [  349.361500]           #00000000003cadb2: a7f4fe36      brc    15,3caa1e
> > [  349.361500]           >00000000003cadb6: e32060800024  stg    %r2,128(%r6)
> > [  349.361500]            00000000003cadbc: a7f4fd9e      brc    15,3ca8f8
> > [  349.361500]            00000000003cadc0: c0e50046790c  brasl  %r14,c99fd8
> > [  349.361500]            00000000003cadc6: a7f4fe2c      brc    15,3caa1e
> > [  349.361500]            00000000003cadca: ecb1ffff00d9  aghik  %r11,%r1,-1
> > [  349.361619] Call Trace:
> > [  349.361627] ([<00000000003cabcc>] __free_slab+0x49c/0x6b0)
> > [  349.361634]  [<00000000001f5886>] rcu_core+0x5a6/0x7e0
> > [  349.361643]  [<0000000000ca2dea>] __do_softirq+0xf2/0x5c0
> > [  349.361652]  [<0000000000152644>] irq_exit+0x104/0x130
> > [  349.361659]  [<000000000010d222>] do_IRQ+0x9a/0xf0
> > [  349.361667]  [<0000000000ca2344>] ext_int_handler+0x130/0x134
> > [  349.361674]  [<0000000000103648>] enabled_wait+0x58/0x128
> > [  349.361681] ([<0000000000103634>] enabled_wait+0x44/0x128)
> > [  349.361688]  [<0000000000103b00>] arch_cpu_idle+0x40/0x58
> > [  349.361695]  [<0000000000ca0544>] default_idle_call+0x3c/0x68
> > [  349.361704]  [<000000000018eaa4>] do_idle+0xec/0x1c0
> > [  349.361748]  [<000000000018ee0e>] cpu_startup_entry+0x36/0x40
> > [  349.361756]  [<000000000122df34>] arch_call_rest_init+0x5c/0x88
> > [  349.361761]  [<0000000000000000>] 0x0
> > [  349.361765] INFO: lockdep is turned off.
> > [  349.361769] Last Breaking-Event-Address:
> > [  349.361774]  [<00000000003ca8f4>] __free_slab+0x1c4/0x6b0
> > [  349.361781] Kernel panic - not syncing: Fatal exception in interrupt
> >
> >
> > A fix that works for me (RFC):
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index a62372d0f271..b19a3f940338 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -328,7 +328,7 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
> >
> >         rcu_read_lock();
> >         memcg = READ_ONCE(s->memcg_params.memcg);
> > -       if (likely(!mem_cgroup_is_root(memcg))) {
> > +       if (likely(memcg && !mem_cgroup_is_root(memcg))) {
> >                 lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
> >                 mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
> >                 memcg_kmem_uncharge_memcg(page, order, memcg);
> >
> > --
> > Karsten
> >
> > (I'm a dude)
> >
> >