linux-mm.kvack.org archive mirror
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: kosaki.motohiro@jp.fujitsu.com, "Wu,
	Fengguang" <fengguang.wu@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
Date: Tue, 19 May 2009 13:30:40 +0900 (JST)
Message-ID: <20090519125744.4EC3.A69D9226@jp.fujitsu.com>
In-Reply-To: <4D05DB80B95B23498C72C700BD6C2E0B2EF6E29A@pdsmsx502.ccr.corp.intel.com>

> >>-----Original Message-----
> >>From: KOSAKI Motohiro [mailto:kosaki.motohiro@jp.fujitsu.com]
> >>Sent: May 19, 2009 10:54
> >>To: Wu, Fengguang
> >>Cc: kosaki.motohiro@jp.fujitsu.com; LKML; linux-mm; Andrew Morton; Rik van
> >>Riel; Christoph Lameter; Zhang, Yanmin
> >>Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
> >>
> >>> On Wed, May 13, 2009 at 12:08:12PM +0900, KOSAKI Motohiro wrote:
> >>> > Subject: [PATCH] zone_reclaim_mode is always 0 by default
> >>> >
> >>> > The current Linux policy is: if the machine has a large remote node distance,
> >>> > zone_reclaim_mode is enabled by default because we've been able to assume
> 
> >>
> >>OK, let me explain the zone reclaim design and its performance tendencies.
> >>
> >>First, we can roughly classify the Linux ecosystem as follows:
> >> - HPC
> >> - high-end server
> >> - volume server
> >> - desktop
> >> - embedded
> >>
> >>These categories are distinguished mainly by their typical workloads.
> >>
> >>Second, zone_reclaim means "I dislike remote node access more strongly than
> >>disk access".
> >>It fits HPC workloads very well, because:
> >>  - HPC workloads typically create as many processes (or threads) as there
> >>    are CPUs.
> >>    IOW, the workload typically uses memory equally on each node.
> >>  - HPC workloads are typically CPU-bound jobs, so CPU migration is rare.
> >>  - HPC workloads are typically long-lived (possibly >1 year).
> >>    IOW, a remote node allocation causes _very_ _very_ many remote node accesses.
> >>
> >>But zone_reclaim doesn't fit typical server workloads:
> >>  - Server workloads often use a thread pool, where some threads sleep until
> >>    a request is received.
> >>    IOW, when a thread wakes up, it might move to another CPU.
> >>    Node distance matters little for workloads with weak CPU locality.
> >>
> >>Plus, the disk cache is the whole point of a file server; we shouldn't treat
> >>it as unimportant.
> >>Plus, DB software can consume almost all system memory, and (in general) RDB
> >>data is harder to split equally across nodes than HPC data.
> >>
> >>Desktop workloads are special. Desktop users can run all kinds of workloads
> >>beyond our assumptions, so we shouldn't make any workload assumptions for
> >>desktop users.
> >>However, AFAIK almost all desktop software uses memory as if the machine were UMA.
> >>
> >>We don't need to care about embedded; it is typically UMA.
> >>
> >>
> >>IOW, the benefit of zone reclaim depends on "strong CPU locality",
> >>"CPU-bound workload", and "long-lived threads".
> >>But many workloads don't meet the above requirements. IOW, zone reclaim is a
> >>workload-dependent feature (as Wu said).
> >>
> >>
> >>In general, a workload-dependent feature doesn't make a good default,
> >>since we can't know what workload the end user will run.
> >>
> >>Fortunately (or unfortunately), typical workload and machine size have been
> >>strongly correlated.
> >>Thus, the current default-setting calculation worked well in the past.
> [YM] Your analysis is clear and deep.

Thanks!


> >>Now that no longer holds. What should we do?
> >>Yanmin, we know 99% of Linux people use Intel CPUs, and you are one of the
> >>most dedicated repeat testers on LKML and run many tests.
> [YM] It's very easy to reproduce them on my machines. :) Sometimes that's because the
> issues only exist on machines with lots of CPUs, while other community developers
> have no such environments.
>
> >>May I ask which machines and benchmarks you tested?
> [YM] Usually I run lots of benchmarks against the latest kernel, but this issue
> was first reported by a customer. The customer runs Apache on Nehalem machines
> to serve lots of files. So the issue is an example of a file server workload.

Hmmm.
This report surprises me. I didn't know about this problem. Oh...

Actually, I don't think Apache is merely a file server.
Apache is one of the killer applications on Linux. It runs in a very wide range of organizations.
Do you think large machines don't run Apache? I don't think so.



> BTW, I found that many fio test cases show a big drop after I upgraded the BIOS of one
> Nehalem machine. Checking the vmstat data, I found that almost half of memory is always
> free. It's also related to zone_reclaim_mode, because the new BIOS changes the node
> distance to a large value. I use numactl --interleave=all to work around the problem temporarily.
> 
> I have no HPC environment.

Yeah, that's OK. Christoph and I have one. My worry is that some workload I don't know about will regress.
So, may I assume you ran your benchmarks with both zone reclaim 0 and 1, and you
haven't seen any regression with zone reclaim disabled?
If so, that would encourage me very much.

If disabling zone reclaim mode doesn't cause regressions, I'll push again to
stop enabling zone reclaim mode by default, completely.


> >>If workloads that favor zone_reclaim=0 outnumber those that favor
> >>zone_reclaim=1, we can drop our fears and would prioritize your opinion,
> >>of course.
> So it seems only file servers have the issue currently.





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

Thread overview: 45+ messages
2009-05-13  3:06 [PATCH 0/4] various zone_reclaim cleanup KOSAKI Motohiro
2009-05-13  3:06 ` [PATCH 1/4] vmscan: change the number of the unmapped files in zone reclaim KOSAKI Motohiro
2009-05-13 13:31   ` Rik van Riel
2009-05-14 19:52   ` Christoph Lameter
2009-05-18  3:15   ` Wu Fengguang
2009-05-18  3:35     ` KOSAKI Motohiro
2009-05-18  3:53       ` Wu Fengguang
2009-05-19  1:11         ` KOSAKI Motohiro
2009-05-13  3:06 ` [PATCH 2/4] vmscan: drop PF_SWAPWRITE from zone_reclaim KOSAKI Motohiro
2009-05-13 13:35   ` Rik van Riel
2009-05-14 19:57   ` Christoph Lameter
2009-05-18  3:33   ` Wu Fengguang
2009-05-13  3:07 ` [PATCH 3/4] vmscan: zone_reclaim use may_swap KOSAKI Motohiro
2009-05-13 11:26   ` Johannes Weiner
2009-05-13 14:43   ` Rik van Riel
2009-05-14 19:59   ` Christoph Lameter
2009-05-18  3:35   ` Wu Fengguang
2009-05-13  3:08 ` [PATCH 4/4] zone_reclaim_mode is always 0 by default KOSAKI Motohiro
2009-05-13 14:47   ` Rik van Riel
2009-05-14  8:20     ` KOSAKI Motohiro
2009-05-14 11:48       ` Robin Holt
2009-05-14 12:02         ` KOSAKI Motohiro
2009-05-13 15:22   ` Robin Holt
2009-05-14 20:05     ` Christoph Lameter
2009-05-14 20:23       ` Rik van Riel
2009-05-14 20:31         ` Christoph Lameter
2009-05-15  1:02       ` KOSAKI Motohiro
2009-05-15 10:51         ` Robin Holt
2009-05-19  2:53           ` KOSAKI Motohiro
2009-05-20 14:00             ` Robin Holt
2009-05-21  2:44               ` KOSAKI Motohiro
2009-05-21 13:31                 ` Christoph Lameter
2009-05-21 13:57                   ` Robin Holt
2009-05-24 13:44                   ` KOSAKI Motohiro
2009-05-15 18:01         ` Christoph Lameter
2009-05-18  3:49   ` Wu Fengguang
2009-05-19  1:16     ` Zhang, Yanmin
2009-05-19  2:53     ` KOSAKI Motohiro
2009-05-19  2:57       ` KOSAKI Motohiro
2009-05-19  3:38       ` Zhang, Yanmin
2009-05-19  4:30         ` KOSAKI Motohiro [this message]
2009-05-19  5:06           ` Zhang, Yanmin
2009-05-19  7:09             ` KOSAKI Motohiro
2009-05-19  7:15               ` Zhang, Yanmin
2009-05-18  9:09   ` Wu Fengguang
