From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 242696B0071 for ; Thu, 7 Jan 2010 09:50:04 -0500 (EST) Date: Thu, 7 Jan 2010 14:49:49 +0000 From: Mel Gorman Subject: Re: Commit f50de2d38 seems to be breaking my oom killer Message-ID: <20100107144948.GB5342@csn.ul.ie> References: <87a5b0801001070434m7f6b0fd6vfcdf49ab73a06cbb@mail.gmail.com> <20100107135831.GA29564@csn.ul.ie> <87a5b0801001070615p42268d77k66d472eff7a0e9fa@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87a5b0801001070615p42268d77k66d472eff7a0e9fa@mail.gmail.com> Sender: owner-linux-mm@kvack.org To: Will Newton Cc: linux-mm@kvack.org List-ID: On Thu, Jan 07, 2010 at 02:15:42PM +0000, Will Newton wrote: > On Thu, Jan 7, 2010 at 1:58 PM, Mel Gorman wrote: > > On Thu, Jan 07, 2010 at 12:34:54PM +0000, Will Newton wrote: > >> Hi, > >> > >> I'm having some problems on a small embedded box with 24Mb of RAM and > >> no swap. If a process tries to use large amounts of memory and gets > >> OOM killed, with 2.6.32 it's fine, but with 2.6.33-rc2 kswapd gets > >> stuck and the system locks up. > > > > By stuck, do you mean it consumes 100% CPU and never goes to sleep? > > I assume so. The system sems locked up so ctrl-c of the process > doesn't work and I can't get in via telnet. Looking where the pc and > return pointer are going via JTAG leads me to believe it's stuck in > kswapd. > Sounds like kswapd is stuck in an infinite loop and I'm guessing your machine has just one CPU so the task to be killed never gets onto the CPU. > >> The problem appears to have been > >> introduced with f50de2d38. If I change sleeping_prematurely to skip > >> the for_each_populated_zone test then OOM killing operates as > >> expected. I'm guessing it's caused by the new code not allowing kswapd > >> to schedule when it is required to let the killed task exit. Does that > >> sound plausible? > >> > > > > It's conceivable. The expectation was that the cond_resched() in > > balance_pgdat() should have been called at > > > > if (!all_zones_ok) { > > cond_resched(); > > > > But it would appear that if all zones are unreclaimable, all_zones_ok == 1. > > It could be looping there indefinitly never calling schedule because it > > never reaches the points where cond_resched is called. > > > >> I'll try and investigate further into what's going on. > >> > > > > Can you try the following? > > > > ==== CUT HERE ==== > > vmscan: kswapd should notice that all zones are not ok if they are unreclaimble > > > > In the event all zones are unreclaimble, it is possible for kswapd to > > never go to sleep because "all zones are ok even though watermarks are > > not reached". It gets into a situation where cond_reched() is not > > called. > > > > This patch notes that if all zones are unreclaimable then the zones are > > not ok and cond_resched() should be called. > > > > Signed-off-by: Mel Gorman > > This fixes the problem, thanks for the quick response! > > Tested-by: Will Newton > Perfect. Thanks for reporting and testing. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org