From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx174.postini.com [74.125.245.174])
	by kanga.kvack.org (Postfix) with SMTP id 984206B0083
	for ; Wed, 23 May 2012 18:37:20 -0400 (EDT)
Date: Wed, 23 May 2012 15:37:18 -0700
From: Andrew Morton
Subject: Re: [patch v2] mm, oom: normalize oom scores to oom_score_adj scale only for userspace
Message-Id: <20120523153718.b70bb762.akpm@linux-foundation.org>
In-Reply-To:
References: <20120426193551.GA24968@redhat.com>
	<20120503222949.GA13762@redhat.com>
	<20120517145022.a99f41e8.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: David Rientjes
Cc: KOSAKI Motohiro , KAMEZAWA Hiroyuki , Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, 23 May 2012 00:15:03 -0700 (PDT)
David Rientjes wrote:

> The oom_score_adj scale ranges from -1000 to 1000 and represents the
> proportion of memory available to the process at allocation time. This
> means an oom_score_adj value of 300, for example, will bias a process as
> though it was using an extra 30.0% of available memory and a value of
> -350 will discount 35.0% of available memory from its usage.
>
> The oom killer badness heuristic also uses this scale to report the oom
> score for each eligible process in determining the "best" process to
> kill. Thus, it can only differentiate each process's memory usage by
> 0.1% of system RAM.
>
> On large systems, this can end up being a large amount of memory: 256MB
> on 256GB systems, for example.
>
> This can be fixed by having the badness heuristic to use the actual
> memory usage in scoring threads and then normalizing it to the
> oom_score_adj scale for userspace. This results in better comparison
> between eligible threads for kill and no change from the userspace
> perspective.
>
> ...
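
To make the granularity point in the quoted changelog concrete, here is a minimal sketch in userspace C. It is not kernel code: `coarse_score()` is a hypothetical stand-in for scoring directly on the 0..1000 scale, and the 4KB page size and the example page counts are assumptions chosen for illustration.

```c
/*
 * Illustration only, not taken from the patch.
 * coarse_score() is a hypothetical helper that models scoring a task
 * directly on the 0..1000 oom_score_adj scale.
 */
static unsigned long long coarse_score(unsigned long long rss_pages,
				       unsigned long long totalpages)
{
	/* Integer division discards anything finer than 0.1% of RAM. */
	return rss_pages * 1000 / totalpages;
}

/*
 * Comparing raw page counts keeps the full resolution. Returns nonzero
 * if task b uses strictly more memory than task a.
 */
static int uses_more_pages(unsigned long long a_pages,
			   unsigned long long b_pages)
{
	return b_pages > a_pages;
}
```

With 256GB of RAM in 4KB pages (67108864 pages), tasks using 10000000 and 10050000 pages both get a coarse score of 149 even though they differ by roughly 195MB, while the raw page counts still tell them apart.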
>
> @@ -367,12 +354,13 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  		}
>  
>  		points = oom_badness(p, memcg, nodemask, totalpages);
> -		if (points > *ppoints) {
> +		if (points > chosen_points) {
>  			chosen = p;
> -			*ppoints = points;
> +			chosen_points = points;
>  		}
>  	} while_each_thread(g, p);
>  
> +	*ppoints = chosen_points * 1000 / totalpages;
>  	return chosen;
>  }
>  

It's still not obvious that we always avoid the divide-by-zero here. If
there's some weird way of convincing constrained_alloc() to look at an
empty nodemask, or a nodemask which covers only empty nodes then blam.

Now, it's probably the case that this is a can't-happen but that
guarantee would be pretty convoluted and fragile?
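
One way to make the divide-by-zero guarantee local rather than dependent on every caller is to guard the normalization itself. The following is only a sketch under stated assumptions: `normalize_points()` is a hypothetical helper that does not appear in the patch, and reporting a score of 0 when `totalpages` is 0 is an assumed policy, not something decided in this thread.

```c
/*
 * Hypothetical helper, not part of the patch under review.
 * Scales a raw badness value (in pages) to the 0..1000 range used for
 * userspace, and refuses to divide when totalpages is zero.
 */
static unsigned int normalize_points(unsigned long chosen_points,
				     unsigned long totalpages)
{
	/* Assumed policy: an empty set of pages yields a score of 0. */
	if (totalpages == 0)
		return 0;

	/* Widen before multiplying so the product cannot overflow on 32-bit. */
	return (unsigned int)((unsigned long long)chosen_points * 1000 /
			      totalpages);
}
```

In the hunk above, the assignment would then read `*ppoints = normalize_points(chosen_points, totalpages);`, so the safety of the division no longer depends on reasoning about what constrained_alloc() can return.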