From: Shakeel Butt
Date: Fri, 11 Oct 2019 10:34:43 -0700
Subject: Re: [PATCH RESEND] mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
To: Roman Gushchin
Cc: Linux MM, LKML, Kernel Team, Karsten Graul, Vladimir Davydov, David Rientjes, stable@vger.kernel.org

On Thu, Oct 10, 2019 at 9:05 AM Roman Gushchin wrote:
>
> Karsten reported the following panic in __free_slab() happening on a s390x
> machine:
>
> [  349.361168] Unable to handle kernel pointer dereference in virtual kernel address space
> [  349.361210] Failing address: 0000000000000000 TEID: 0000000000000483
> [  349.361223] Fault in home space mode while using kernel ASCE.
> [  349.361240] AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
> [  349.361340] Oops: 0004 ilc:3 [#1] PREEMPT SMP
> [  349.361349] Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT \
> nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle \
> ip6table_raw ip6table_security iptable_at nf_nat
> [  349.361436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
> [  349.361445] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
> [  349.361450] Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
> [  349.361464]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> [  349.361470] Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
> [  349.361475]            0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
> [  349.361481]            0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
> [  349.361486]            000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
> [  349.361500] Krnl Code: 00000000003cada6: e310a1400004        lg      %r1,320(%r10)
> [  349.361500]            00000000003cadac: c0e50046c286        brasl   %r14,ca32b8
> [  349.361500]           #00000000003cadb2: a7f4fe36            brc     15,3caa1e
> [  349.361500]           >00000000003cadb6: e32060800024        stg     %r2,128(%r6)
> [  349.361500]            00000000003cadbc: a7f4fd9e            brc     15,3ca8f8
> [  349.361500]            00000000003cadc0: c0e50046790c        brasl   %r14,c99fd8
> [  349.361500]            00000000003cadc6: a7f4fe2c            brc     15,3caa1e
> [  349.361500]            00000000003cadca: ecb1ffff00d9        aghik   %r11,%r1,-1
> [  349.361619] Call Trace:
> [  349.361627] ([<00000000003cabcc>] __free_slab+0x49c/0x6b0)
> [  349.361634]  [<00000000001f5886>] rcu_core+0x5a6/0x7e0
> [  349.361643]  [<0000000000ca2dea>] __do_softirq+0xf2/0x5c0
> [  349.361652]  [<0000000000152644>] irq_exit+0x104/0x130
> [  349.361659]  [<000000000010d222>] do_IRQ+0x9a/0xf0
> [  349.361667]  [<0000000000ca2344>] ext_int_handler+0x130/0x134
> [  349.361674]  [<0000000000103648>] enabled_wait+0x58/0x128
> [  349.361681] ([<0000000000103634>] enabled_wait+0x44/0x128)
> [  349.361688]  [<0000000000103b00>] arch_cpu_idle+0x40/0x58
> [  349.361695]  [<0000000000ca0544>] default_idle_call+0x3c/0x68
> [  349.361704]  [<000000000018eaa4>] do_idle+0xec/0x1c0
> [  349.361748]  [<000000000018ee0e>] cpu_startup_entry+0x36/0x40
> [  349.361756]  [<000000000122df34>] arch_call_rest_init+0x5c/0x88
> [  349.361761]  [<0000000000000000>] 0x0
> [  349.361765] INFO: lockdep is turned off.
> [  349.361769] Last Breaking-Event-Address:
> [  349.361774]  [<00000000003ca8f4>] __free_slab+0x1c4/0x6b0
> [  349.361781] Kernel panic - not syncing: Fatal exception in interrupt
>
> The kernel panics on an attempt to dereference the NULL memcg pointer.
> When shutdown_cache() is called from the kmem_cache_destroy() context,
> a memcg kmem_cache might have empty slab pages in a partial list,
> which are still charged to the memory cgroup. These pages are released
> by free_partial() at the beginning of shutdown_cache(): either
> directly or by scheduling an RCU-delayed work (if the kmem_cache has
> the SLAB_TYPESAFE_BY_RCU flag). The latter case is when the reported
> panic can happen: memcg_unlink_cache() is called immediately after
> shrinking partial lists, without waiting for scheduled RCU works.
> It sets the kmem_cache->memcg_params.memcg pointer to NULL,
> and the following attempt to dereference it by __free_slab()
> from the RCU work context causes the panic.
>
> To fix the issue, let's postpone the release of the memcg pointer
> to destroy_memcg_params(). It's called from a separate work context
> by slab_caches_to_rcu_destroy_workfn(), which contains a full RCU
> barrier. This guarantees that all scheduled page release RCU works
> will complete before the memcg pointer is zeroed.
>
> Big thanks to Karsten for the perfect report containing all necessary
> information, his help with the analysis of the problem and testing
> of the fix.
>
> Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
> Reported-by: Karsten Graul
> Tested-by: Karsten Graul
> Signed-off-by: Roman Gushchin

Reviewed-by: Shakeel Butt

> Cc: Karsten Graul
> Cc: Shakeel Butt
> Cc: Vladimir Davydov
> Cc: David Rientjes
> Cc: stable@vger.kernel.org
> ---
>  mm/slab_common.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0b94a37da531..8afa188f6e20 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -178,10 +178,13 @@ static int init_memcg_params(struct kmem_cache *s,
>
>  static void destroy_memcg_params(struct kmem_cache *s)
>  {
> -	if (is_root_cache(s))
> +	if (is_root_cache(s)) {
>  		kvfree(rcu_access_pointer(s->memcg_params.memcg_caches));
> -	else
> +	} else {
> +		mem_cgroup_put(s->memcg_params.memcg);
> +		WRITE_ONCE(s->memcg_params.memcg, NULL);
>  		percpu_ref_exit(&s->memcg_params.refcnt);
> +	}
>  }
>
>  static void free_memcg_params(struct rcu_head *rcu)
> @@ -253,8 +256,6 @@ static void memcg_unlink_cache(struct kmem_cache *s)
>  	} else {
>  		list_del(&s->memcg_params.children_node);
>  		list_del(&s->memcg_params.kmem_caches_node);
> -		mem_cgroup_put(s->memcg_params.memcg);
> -		WRITE_ONCE(s->memcg_params.memcg, NULL);
>  	}
>  }
>  #else
> --
> 2.21.0
>
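
For readers following the thread, the ordering problem described in the commit message can be sketched in plain userspace C. This is only a single-threaded model under stated assumptions, not kernel code: struct cache, struct memcg and every function name below are hypothetical stand-ins, the RCU-delayed page releases are modelled as a simple pending counter, and the RCU barrier is modelled as draining that counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects named in the patch. */
struct memcg {
	int refs;		/* models the reference dropped by mem_cgroup_put() */
};

struct cache {
	struct memcg *memcg;	/* models kmem_cache->memcg_params.memcg */
	int pending_pages;	/* slab pages whose release was deferred */
	int null_derefs;	/* times the kernel would have hit a NULL memcg */
	int uncharged;		/* pages released while memcg was still valid */
};

/* Models the deferred page release that needs the memcg pointer. */
static void deferred_free_slab(struct cache *c)
{
	if (c->memcg == NULL) {
		c->null_derefs++;	/* the real kernel would panic here */
		return;
	}
	c->uncharged++;
}

/* Models the RCU barrier: run every deferred release still pending. */
static void run_pending(struct cache *c)
{
	while (c->pending_pages > 0) {
		c->pending_pages--;
		deferred_free_slab(c);
	}
}

/* Models dropping the memcg reference and zeroing the pointer. */
static void release_memcg(struct cache *c)
{
	assert(c->memcg != NULL);
	c->memcg->refs--;
	c->memcg = NULL;
}

/* Pre-patch order: pointer cleared at unlink time, deferred work runs later. */
static int destroy_old_order(struct cache *c)
{
	release_memcg(c);
	run_pending(c);
	return c->null_derefs;
}

/* Patched order: wait for deferred work first, then clear the pointer. */
static int destroy_new_order(struct cache *c)
{
	run_pending(c);
	release_memcg(c);
	return c->null_derefs;
}
```

With three pending pages, destroy_old_order() in this model reports three would-be NULL dereferences, while destroy_new_order() reports none and releases all three pages against a still-valid memcg. That is the guarantee the patch gets by moving the put and the WRITE_ONCE() behind the barrier.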