From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ua0-f198.google.com (mail-ua0-f198.google.com [209.85.217.198])
	by kanga.kvack.org (Postfix) with ESMTP id 2196B6B025F
	for ; Wed, 26 Jul 2017 23:30:52 -0400 (EDT)
Received: by mail-ua0-f198.google.com with SMTP id y23so130914024uah.15
	for ; Wed, 26 Jul 2017 20:30:52 -0700 (PDT)
Received: from mail-ua0-x243.google.com (mail-ua0-x243.google.com. [2607:f8b0:400c:c08::243])
	by mx.google.com with ESMTPS id 131si6438516vkv.130.2017.07.26.20.30.51
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 20:30:51 -0700 (PDT)
Received: by mail-ua0-x243.google.com with SMTP id w45so15620740uac.3
	for ; Wed, 26 Jul 2017 20:30:51 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20170726134451.GR2981@dhcp22.suse.cz>
References: <20170726130742.5976-1-wenwei.tww@gmail.com> <20170726134451.GR2981@dhcp22.suse.cz>
From: Wenwei Tao
Date: Thu, 27 Jul 2017 11:30:50 +0800
Message-ID:
Subject: Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
List-ID:
To: Michal Hocko
Cc: Johannes Weiner , Balbir Singh , kamezawa.hiroyu@jp.fujitsu.com,
	yuwang.yuwang@alibaba-inc.com, cgroups@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, Wenwei Tao

2017-07-26 21:44 GMT+08:00 Michal Hocko :
> On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
>> From: Wenwei Tao
>>
>> By removing the child cgroup while the parent cgroup is
>> under reclaim, we could trigger the following kernel panic
>> on kernel 3.10:
>> ------------------------------------------------------------------------
>> kernel BUG at kernel/cgroup.c:893!
>> invalid opcode: 0000 [#1] SMP
>> CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1
>> Workqueue: cgroup_destroy css_dput_fn
>> task: ffff8817959a5780 ti: ffff8817e8886000 task.ti: ffff8817e8886000
>> RIP: 0010:[] []
>> cgroup_diput+0xc0/0xf0
>> RSP: 0000:ffff8817e8887da0 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffff8817a5dd5d40 RCX: dead000000000200
>> RDX: 0000000000000000 RSI: ffff8817973a6910 RDI: ffff8817f54c2a00
>> RBP: ffff8817e8887dc8 R08: ffff8817a5dd5dd0 R09: df9fb35794b01820
>> R10: df9fb35794b01820 R11: 00007fa95b1efcda R12: ffff8817a5dd5d9c
>> R13: ffff8817f38b3a40 R14: ffff8817973a6910 R15: ffff8817973a6910
>> FS: 0000000000000000(0000) GS:ffff88181f220000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007fa6e6234000 CR3: 000000179f19d000 CR4: 00000000000407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Stack:
>> ffff8817a5dd5d40 ffff8817a5dd5d9c ffff8817f38b3a40 ffff8817973a6910
>> 0000000000000040 ffff8817e8887df8 ffffffff811b37c2 ffff8817fa23c000
>> ffff8817f57dbb80 ffff88181f232ac0 ffff88181f237500 ffff8817e8887e10
>> Call Trace:
>> [] dput+0x1a2/0x2f0
>> [] cgroup_dput.isra.21+0x1c/0x30
>> [] css_dput_fn+0x1d/0x20
>> [] process_one_work+0x17c/0x460
>> [] worker_thread+0x116/0x3b0
>> [] ? manage_workers.isra.25+0x290/0x290
>> [] kthread+0xc0/0xd0
>> [] ? insert_kthread_work+0x40/0x40
>> [] ret_from_fork+0x58/0x90
>> [] ? insert_kthread_work+0x40/0x40
>> Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15
>> 48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b
>> 49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7
>> RIP [] cgroup_diput+0xc0/0xf0
>> RSP
>> ---[ end trace 85eeea5212c44f51 ]---
>> ------------------------------------------------------------------------
>>
>> I think there is a css double put in mem_cgroup_iter.
>> Under reclaim, we call mem_cgroup_iter the first time with
>> prev == NULL and get the last_visited memcg from the per-zone
>> reclaim_iter, then call __mem_cgroup_iter_next to try to get the
>> next alive memcg. __mem_cgroup_iter_next could return NULL if
>> last_visited is already the last one, so we put the last_visited
>> memcg's css and continue to the next iteration of the while loop.
>> This time we might not do css_tryget(&last_visited->css) if the
>> dead_count has changed, but we still do css_put(&last_visited->css).
>> We put it twice, which could trigger the BUG_ON at
>> kernel/cgroup.c:893.
>
> Yes, I guess you are right and I suspect that this has been silently
> fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
> loading and updating"). I think a more appropriate fix would be the
> following. Are you able to reproduce and re-test it?
> ---

Yes, I think this commit fixes the issue. I backported it to the
3.10.107 kernel and can no longer reproduce the problem. I guess this
commit might need to be backported to the 3.10.y stable kernel.

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 437ae2cbe102..0848ec05c12a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1224,6 +1224,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>  			if (last_visited && last_visited != root &&
>  			    !css_tryget(&last_visited->css))
>  				last_visited = NULL;
> +		} else {
> +			last_visited = NULL;
>  		}
>  	}
>
> --
> Michal Hocko
> SUSE Labs
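
For readers following the thread, the double put described above can be
modelled in user space. This is only a rough sketch under stated
assumptions, not the kernel code: struct toy_css, toy_tryget(), toy_put()
and toy_iter_two_passes() are hypothetical names invented for this
illustration, and the loop is reduced to the two passes discussed in the
patch description (dead_count matches on the first pass, has changed on
the second).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a css reference count; not the kernel struct. */
struct toy_css {
	int refcnt;
};

/* Take a reference only if the object is still alive. */
static bool toy_tryget(struct toy_css *css)
{
	if (css->refcnt <= 0)
		return false;
	css->refcnt++;
	return true;
}

/* Drop one reference. */
static void toy_put(struct toy_css *css)
{
	css->refcnt--;
}

/*
 * Model two passes of the while loop as described in the report.
 * Pass 1: dead_count matches, the cached position is loaded and pinned,
 *         the "next" lookup finds nothing, and the pin is dropped.
 * Pass 2: dead_count has changed, so the load-and-tryget block is skipped.
 * Without the reset, the stale pointer from pass 1 is put a second time.
 * Returns the final count; the object starts with one base reference.
 */
int toy_iter_two_passes(bool reset_on_mismatch)
{
	struct toy_css cached = { .refcnt = 1 };
	struct toy_css *last_visited = NULL;
	int iter_dead_count = 0;	/* assumed analogue of iter->last_dead_count */
	int root_dead_count = 0;	/* assumed analogue of root->dead_count */

	for (int pass = 0; pass < 2; pass++) {
		if (root_dead_count == iter_dead_count) {
			last_visited = &cached;
			if (!toy_tryget(last_visited))
				last_visited = NULL;
		} else if (reset_on_mismatch) {
			/* The else branch proposed in the quoted diff. */
			last_visited = NULL;
		}

		/* The "next memcg" lookup returned NULL: nothing to visit. */

		if (last_visited)
			toy_put(last_visited);

		/* A child cgroup is removed between the two passes. */
		root_dead_count++;
	}
	return cached.refcnt;
}
```

In this model, toy_iter_two_passes(false) ends with the base reference
gone, while toy_iter_two_passes(true) leaves it intact, which matches the
reasoning behind resetting last_visited when dead_count does not match.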