Date: Thu, 25 Oct 2012 11:57:19 +0200
From: Michal Hocko
Subject: Re: process hangs on do_exit when oom happens
Message-ID: <20121025095719.GA11105@dhcp22.suse.cz>
To: Qiang Gao
Cc: Balbir Singh, linux-kernel@vger.kernel.org, linux-mmc@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org

On Wed 24-10-12 11:44:17, Qiang Gao wrote:
> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote:
> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >>> >> This process was moved to an RT-priority queue when the global
> >>> >> oom-killer kicked in, to boost the recovery of the system,
> >>> >
> >>> > Who did that? The oom killer doesn't boost the priority (scheduling
> >>> > class) AFAIK.
> >>> >
> >>> >> but it wasn't properly dealt with. I still have no idea where
> >>> >> the problem is.
> >>> >
> >>> > Well, your configuration says that there is no runtime reserved for
> >>> > the group.
> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> >>> > information.
> >>> >
> >> [...]
> >>> Maybe this is not an upstream kernel bug. The CentOS/Red Hat kernel
> >>> would boost the process to RT prio when the process was selected
> >>> by the oom-killer.
> >>
> >> This still looks like your cpu controller is misconfigured, even if the
> >> task is promoted to be realtime.
> >
> >
> > Precisely!
> > You need to have RT bandwidth enabled for RT tasks to run. As a
> > workaround, please give the groups some RT bandwidth and then work
> > out the migration to RT and what the defaults on the distro should be.
> >
> > Balbir
>
>
> see https://patchwork.kernel.org/patch/719411/

The patch surely "fixes" your problem, but the primary fault here is the
misconfigured cpu cgroup. If the RT bandwidth of the group
(cpu.rt_runtime_us) is zero by default, then all realtime processes in
the group are screwed: they get throttled and cannot make progress. The
value should be set to something more reasonable.

I am not familiar with the cpu controller, but it seems that
alloc_rt_sched_group needs some attention. Care to look into it and send
a patch to the cpu controller and cgroup maintainers, please?

-- 
Michal Hocko
SUSE Labs