From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with SMTP id 116776B005C for ; Thu, 4 Jun 2009 06:23:21 -0400 (EDT) Received: from m2.gw.fujitsu.co.jp ([10.0.50.72]) by fgwmail6.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id n54ANH5Q004955 for (envelope-from kosaki.motohiro@jp.fujitsu.com); Thu, 4 Jun 2009 19:23:19 +0900 Received: from smail (m2 [127.0.0.1]) by outgoing.m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 3D7A145DE62 for ; Thu, 4 Jun 2009 19:23:17 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.fujitsu.co.jp [10.0.50.92]) by m2.gw.fujitsu.co.jp (Postfix) with ESMTP id 0977945DE51 for ; Thu, 4 Jun 2009 19:23:17 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id E63931DB8048 for ; Thu, 4 Jun 2009 19:23:16 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.249.87.104]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id 8D1D01DB803B for ; Thu, 4 Jun 2009 19:23:16 +0900 (JST) From: KOSAKI Motohiro Subject: [PATCH v4] zone_reclaim is always 0 by default Message-Id: <20090604192236.9761.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Date: Thu, 4 Jun 2009 19:23:15 +0900 (JST) Sender: owner-linux-mm@kvack.org To: Christoph Lameter , Rik van Riel , Robin Holt , "Zhang, Yanmin" , Wu Fengguang , linux-ia64@vger.kernel.org, linuxppc-dev@ozlabs.org, LKML , linux-mm , Andrew Morton Cc: kosaki.motohiro@jp.fujitsu.com List-ID: Current linux policy is, zone_reclaim_mode is enabled by default if the machine has large remote node distance. it's because we could assume that large distance mean large server until recently. Unfortunately, recent modern x86 CPU (e.g. Core i7, Opeteron) have P2P transport memory controller. IOW it's seen as NUMA from software view. Some Core i7 machine has large remote node distance. Yanmin reported zone_reclaim_mode=1 cause large apache regression. One Nehalem machine has 12GB memory, but there is always 2GB free although applications accesses lots of files. Eventually we located the root cause as zone_reclaim_mode=1. Actually, zone_reclaim_mode=1 mean "I dislike remote node allocation rather than disk access", it makes performance improvement to HPC workload. but it makes performance degression to desktop, file server and web server. In general, workload depended configration shouldn't put into default settings. However, current code is long standing about two year. Highest POWER and IA64 HPC machine (only) use this setting. Thus, x86 and almost rest architecture change default setting, but Only power and ia64 remain current configuration for backward-compatibility. Signed-off-by: KOSAKI Motohiro Cc: Christoph Lameter Cc: Rik van Riel Cc: Robin Holt Cc: "Zhang, Yanmin" Cc: Wu Fengguang Cc: linux-ia64@vger.kernel.org Cc: linuxppc-dev@ozlabs.org --- arch/powerpc/include/asm/topology.h | 6 ++++++ include/linux/topology.h | 7 +------ 2 files changed, 7 insertions(+), 6 deletions(-) Index: b/include/linux/topology.h =================================================================== --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void); #define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE) #endif #ifndef RECLAIM_DISTANCE -/* - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE - * (in whatever arch specific measurement units returned by node_distance()) - * then switch on zone reclaim on boot. - */ -#define RECLAIM_DISTANCE 20 +#define RECLAIM_DISTANCE INT_MAX #endif #ifndef PENALTY_FOR_NODE_WITH_CPUS #define PENALTY_FOR_NODE_WITH_CPUS (1) Index: b/arch/powerpc/include/asm/topology.h =================================================================== --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -10,6 +10,12 @@ struct device_node; #include +/* + * Distance above which we begin to use zone reclaim + */ +#define RECLAIM_DISTANCE 20 + + static inline int cpu_to_node(int cpu) { return numa_cpu_lookup_table[cpu]; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org