linux-mm.kvack.org archive mirror
From: "Zhang, Yanmin" <yanmin.zhang@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Robin Holt <holt@sgi.com>,
	"Wu, Fengguang" <fengguang.wu@intel.com>
Subject: RE: [PATCH v3] zone_reclaim is always 0 by default
Date: Thu, 21 May 2009 11:27:12 +0800
Message-ID: <4D05DB80B95B23498C72C700BD6C2E0B2F35B8FA@pdsmsx502.ccr.corp.intel.com>
In-Reply-To: <20090521114408.63D0.A69D9226@jp.fujitsu.com>

>>-----Original Message-----
>>From: KOSAKI Motohiro [mailto:kosaki.motohiro@jp.fujitsu.com]
>>Sent: May 21, 2009 10:47
>>To: LKML; linux-mm; Andrew Morton; Rik van Riel; Christoph Lameter; Robin Holt;
>>Zhang, Yanmin; Wu, Fengguang
>>Cc: kosaki.motohiro@jp.fujitsu.com
>>Subject: [PATCH v3] zone_reclaim is always 0 by default
>>
>>
>>Subject: [PATCH v3] zone_reclaim is always 0 by default
>>
>>Current Linux policy enables zone_reclaim_mode by default if the machine has
>>a large remote node distance, because until recently we could assume that a
>>large distance meant a large server.
>>
>>Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have a P2P transport
>>memory controller, IOW the machine is seen as NUMA from the software view.
>>Some Core i7 machines have a large remote node distance.
>>
>>Yanmin reported that zone_reclaim_mode=1 causes a large apache regression.
>>
>>    One Nehalem machine has 12GB memory,
>>    but there is always 2GB free although applications access lots of files.
>>    Eventually we located the root cause as zone_reclaim_mode=1.
>>
>>Actually, zone_reclaim_mode=1 means "I dislike remote node allocation more than
>>disk access". It improves performance for HPC workloads,
>>but it degrades performance for desktops, file servers and web servers.
>>
>>In general, workload-dependent configuration shouldn't be put into the default
>>settings.
>>Plus, the desktop and file/web server ecosystem is much larger than HPC's.
>>
>>Thus, zone_reclaim == 0 is better by default.
[YM] Thanks. I started a series of tests on 2 Nehalem machines with
zone_reclaim_mode=0 (the default is 1 on both machines). I didn't find any
regression with non-disk-I/O (mostly CPU-bound) benchmarks. Disk I/O benchmarks
benefit a little from zone_reclaim_mode=0. Because I now start the fio benchmark
with numactl --interleave=all, the fio improvement is not as large as before.

One thing I need to mention is that my non-disk-I/O tests might not be good
examples for this patch, because every node has far more memory than the tests
need. Only some disk I/O benchmarks place heavy demands on page cache memory,
so they can benefit from zone_reclaim_mode=0.
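
When comparing runs like these, it helps to record which setting a machine is
actually using before each benchmark. The following is only a minimal sketch,
not part of the patch: the helper name read_zone_reclaim_mode() is made up for
illustration, and it assumes the value is exposed through the usual sysctl file
/proc/sys/vm/zone_reclaim_mode.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): read the integer stored in a
 * sysctl-style file such as /proc/sys/vm/zone_reclaim_mode.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 */
static int read_zone_reclaim_mode(const char *path)
{
	FILE *f = fopen(path, "r");
	int mode;

	if (!f)
		return -1;
	/* The file holds a single decimal number followed by a newline. */
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

Calling read_zone_reclaim_mode("/proc/sys/vm/zone_reclaim_mode") at the start
of a test script and logging the result makes it easier to tell afterwards
which runs used 0 and which used 1.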


>>
>>
>>Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>>Cc: Christoph Lameter <cl@linux-foundation.org>
>>Cc: Rik van Riel <riel@redhat.com>
>>Cc: Robin Holt <holt@sgi.com>
>>Tested-by: "Zhang, Yanmin" <yanmin.zhang@intel.com>
>>Acked-by: Wu Fengguang <fengguang.wu@intel.com>
>>---
>> arch/ia64/include/asm/topology.h |    5 -----
>> include/linux/topology.h         |    9 +--------
>> mm/page_alloc.c                  |    7 -------
>> 3 files changed, 1 insertion(+), 20 deletions(-)
>>
>>Index: b/mm/page_alloc.c
>>===================================================================
>>--- a/mm/page_alloc.c
>>+++ b/mm/page_alloc.c
>>@@ -2494,13 +2494,6 @@ static void build_zonelists(pg_data_t *p
>> 		int distance = node_distance(local_node, node);
>>
>> 		/*
>>-		 * If another node is sufficiently far away then it is better
>>-		 * to reclaim pages in a zone before going off node.
>>-		 */
>>-		if (distance > RECLAIM_DISTANCE)
>>-			zone_reclaim_mode = 1;
>>-
>>-		/*
>> 		 * We don't want to pressure a particular node.
>> 		 * So adding penalty to the first node in same
>> 		 * distance group to make it round-robin.
>>Index: b/arch/ia64/include/asm/topology.h
>>===================================================================
>>--- a/arch/ia64/include/asm/topology.h
>>+++ b/arch/ia64/include/asm/topology.h
>>@@ -21,11 +21,6 @@
>> #define PENALTY_FOR_NODE_WITH_CPUS 255
>>
>> /*
>>- * Distance above which we begin to use zone reclaim
>>- */
>>-#define RECLAIM_DISTANCE 15
>>-
>>-/*
>>  * Returns the number of the node containing CPU 'cpu'
>>  */
>> #define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
>>Index: b/include/linux/topology.h
>>===================================================================
>>--- a/include/linux/topology.h
>>+++ b/include/linux/topology.h
>>@@ -53,14 +53,7 @@ int arch_update_cpu_topology(void);
>> #ifndef node_distance
>> #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>> #endif
>>-#ifndef RECLAIM_DISTANCE
>>-/*
>>- * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
>>- * (in whatever arch specific measurement units returned by node_distance())
>>- * then switch on zone reclaim on boot.
>>- */
>>-#define RECLAIM_DISTANCE 20
>>-#endif
>>+
>> #ifndef PENALTY_FOR_NODE_WITH_CPUS
>> #define PENALTY_FOR_NODE_WITH_CPUS	(1)
>> #endif
>>
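
The boot-time check that the quoted patch removes from build_zonelists() can be
modeled in user space roughly as below. This is a sketch under assumptions, not
kernel code: the function name and the flattened distance table are
hypothetical, and only the two thresholds (20 generic, 15 on ia64) come from
the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Thresholds taken from the quoted patch: generic default and ia64 override. */
#define RECLAIM_DISTANCE_GENERIC 20
#define RECLAIM_DISTANCE_IA64    15

/*
 * Hypothetical model of the removed heuristic. `distance` is a flattened
 * nr_nodes x nr_nodes table where distance[from * nr_nodes + to] plays the
 * role of node_distance(from, to).
 *
 * Returns true if any remote node is farther away than `threshold`, which is
 * the condition under which the old code set zone_reclaim_mode = 1.
 */
static bool old_default_enables_zone_reclaim(const int *distance,
					     size_t nr_nodes, int threshold)
{
	for (size_t from = 0; from < nr_nodes; from++)
		for (size_t to = 0; to < nr_nodes; to++)
			if (from != to &&
			    distance[from * nr_nodes + to] > threshold)
				return true;
	return false;
}
```

As an illustration with assumed numbers (not measured on the machines in this
thread), a two-node table of { 10, 21, 21, 10 } exceeds the generic threshold
of 20, so the old default would have switched zone reclaim on, while a remote
distance of exactly 20 would not.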


Thread overview: 8+ messages
2009-05-21  2:47 KOSAKI Motohiro
2009-05-21  3:27 ` Zhang, Yanmin [this message]
2009-05-22 12:26 ` Robin Holt
2009-05-24 13:44   ` KOSAKI Motohiro
2009-05-25 11:41     ` Robin Holt
2009-05-27  8:06       ` KOSAKI Motohiro
2009-05-27  9:50         ` Robin Holt
2009-05-28  4:30           ` KOSAKI Motohiro
