From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Robin Holt <holt@sgi.com>
Cc: kosaki.motohiro@jp.fujitsu.com,
	Christoph Lameter <cl@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
Date: Thu, 21 May 2009 11:44:07 +0900 (JST)
Message-ID: <20090521090549.63B5.A69D9226@jp.fujitsu.com>
In-Reply-To: <20090520140045.GA29447@sgi.com>

> On Tue, May 19, 2009 at 11:53:44AM +0900, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > > Current Linux policy is that zone_reclaim_mode is enabled by default if the machine
> > > > has a large remote node distance. That is because, until recently, we could assume
> > > > that a large distance meant a large server.
> > > > 
> > > > Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have a P2P transport
> > > > memory controller. IOW, they are seen as NUMA from the software's point of view.
> > > > 
> > > > Some Core i7 machines have a large remote node distance, but zone_reclaim doesn't
> > > > fit desktops and small file servers. It causes a performance regression.
> > > > 
> > > > Thus, zone_reclaim == 0 is a better default if the machine is small.
> > > 
> > > What if I had a node 0 with 32GB or 128GB of memory.  In that case,
> > > we would have 3GB for DMA32, 125GB for Normal and then a node 1 with
> > > 128GB.  I would suggest that zone reclaim would perform normally and
> > > be beneficial.
> > > 
> > > You are unfairly classifying this as a size of machine problem when it is
> > > really a problem with the underlying zone reclaim code being triggered
> > > due to imbalanced node/zones, part of which is due to a single node
> > > having multiple zones and those multiple zones setting up the conditions
> > > > for extremely aggressive reclaim.  In other words, you are putting a
> > > bandage in place to hide a problem on your particular hardware.
> > > 
> > > Can RECLAIM_DISTANCE be adjusted so your Ci7 boxes are no longer caught?
> > > Aren't 4 node Ci7 boxes soon to be readily available?  How are your apps
> > > different from my apps in that you are not impacted by node locality?
> > > Are you being too insensitive to node locality?  Conversely am I being
> > > too sensitive?
> > > 
> > > All that said, I would not stop this from going in.  I just think the
> > > > selection criteria are rather random.  I think we know the condition we
> > > > are trying to avoid which is a small Normal zone on one node and a larger
> > > > Normal zone on another causing zone reclaim to be overly aggressive.
> > > > I don't know how to quantify "small" versus "large".  I would suggest
> > > > that a node 0 with 16 or more GB should have zone reclaim on by default
> > > > as well.  Can that be expressed in the selection criteria?
> > 
> > > I posted my opinion in a separate mail. Please see it.
> 
> I don't think you addressed my actual question.  How much of this is
> a result of having a node where 1/4 of the memory is in the 'Normal'
> zone and 3/4 is in the DMA32 zone?  How much is due to the imbalance
> between Node 0 'Normal' and Node 1 'Normal'?  Shouldn't that type of
> sanity check be used for turning on zone reclaim instead of some random
> > number of nodes?

I can't follow your message. Can you post your patch?
Can you explain your sanity check?

For now, I have decided to remove the "nr_online_nodes >= 4" condition.
The Apache regression really makes no sense.
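
To make the candidate defaults discussed in this thread concrete, here is a
rough userspace sketch in C. It is not the kernel code. The function names,
MAX_NODES, and the threshold value 20 are assumptions made for illustration
(the real RECLAIM_DISTANCE is defined per architecture), and the way the
"nr_online_nodes >= 4" condition combines with the distance test is also an
assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8
/* Assumed threshold for this sketch; not taken from any kernel tree. */
#define SKETCH_RECLAIM_DISTANCE 20

/* Largest distance between any two distinct nodes in the matrix. */
static int max_remote_distance(int dist[MAX_NODES][MAX_NODES], size_t nr_nodes)
{
	int max = 0;

	assert(nr_nodes <= MAX_NODES);
	for (size_t i = 0; i < nr_nodes; i++)
		for (size_t j = 0; j < nr_nodes; j++)
			if (i != j && dist[i][j] > max)
				max = dist[i][j];
	return max;
}

/* Policy described above: on by default when some remote node is far away. */
static int default_mode_distance_only(int dist[MAX_NODES][MAX_NODES],
				      size_t nr_nodes)
{
	return max_remote_distance(dist, nr_nodes) > SKETCH_RECLAIM_DISTANCE;
}

/* Hypothetical variant that also requires at least four online nodes. */
static int default_mode_with_node_count(int dist[MAX_NODES][MAX_NODES],
					size_t nr_nodes)
{
	return nr_nodes >= 4 && default_mode_distance_only(dist, nr_nodes);
}

/* Policy named in the Subject line: always 0 by default. */
static int default_mode_always_off(void)
{
	return 0;
}
```

With a two-node matrix whose remote distance is above the assumed threshold,
the first function turns zone reclaim on, while the other two leave it off.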

> Even with 128 nodes and 256 cpus, I _NEVER_ see the
> system swapping out before allocating off node so I can certainly not
> reproduce the situation you are seeing.

Hmm, but I don't think we can assume an HPC workload.


> 
> The imbalance I have seen was when I had two small memory nodes and two
> large memory nodes and then oversubscribed memory.  In that situation,
> I noticed that the apps on the small memory nodes were more frequently
> impacted.  This unfairness made sense to me and seemed perfectly
> reasonable.


Node imbalance is OK. For example, typical Linux init scripts start many daemon
processes on node 0; we can't avoid that, and it doesn't cause strange behavior.

But zone imbalance is bad. I don't want to discuss every item again, but you
can search for the inter-zone reclaim issue instead.



