From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: kosaki.motohiro@jp.fujitsu.com,
"Wu, Fengguang" <fengguang.wu@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
Date: Tue, 19 May 2009 13:30:40 +0900 (JST)
Message-ID: <20090519125744.4EC3.A69D9226@jp.fujitsu.com>
In-Reply-To: <4D05DB80B95B23498C72C700BD6C2E0B2EF6E29A@pdsmsx502.ccr.corp.intel.com>
> >>-----Original Message-----
> >>From: KOSAKI Motohiro [mailto:kosaki.motohiro@jp.fujitsu.com]
> >>Sent: May 19, 2009 10:54
> >>To: Wu, Fengguang
> >>Cc: kosaki.motohiro@jp.fujitsu.com; LKML; linux-mm; Andrew Morton; Rik van
> >>Riel; Christoph Lameter; Zhang, Yanmin
> >>Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
> >>
> >>> On Wed, May 13, 2009 at 12:08:12PM +0900, KOSAKI Motohiro wrote:
> >>> > Subject: [PATCH] zone_reclaim_mode is always 0 by default
> >>> >
> >>> > Current linux policy is, if the machine has large remote node distance,
> >>> > zone_reclaim_mode is enabled by default because we've be able to assume
>
> >>
> >>OK, let me explain the zone reclaim design and its performance tendencies.
> >>
> >>Firstly, we can roughly classify the Linux ecosystem as:
> >> - HPC
> >> - high-end server
> >> - volume server
> >> - desktop
> >> - embedded
> >>
> >>They are separated mainly by typical workload.
> >>
> >>Secondly, zone_reclaim means "I dislike remote node access much more than
> >>disk access".
> >>It fits HPC workloads very well, because
> >> - an HPC workload typically creates as many processes (or threads) as
> >> there are CPUs.
> >> IOW, the workload typically uses memory equally on each node.
> >> - an HPC workload is typically a CPU-bound job. CPU migration is rare.
> >> - an HPC workload is typically long-lived (possibly >1 year).
> >> IOW, a remote node allocation results in _very_ _very_ many remote node accesses.
> >>
> >>But zone_reclaim doesn't fit typical server workloads.
> >> - a server workload often creates a thread pool, and some threads sleep
> >> until a request is received.
> >> IOW, when a thread wakes up, it might move to another CPU.
> >> The node distance tendency doesn't make sense for a workload with weak
> >> CPU locality.
> >>
> >>Plus, the disk cache is the file server's identity. We shouldn't think it
> >>is unimportant.
> >>Plus, DB software can consume almost all system memory, and (in general)
> >>RDB data is harder to split equally than HPC data.
> >>
> >>The desktop workload is special. Desktop people can run various workloads
> >>beyond our assumptions. So, we shouldn't make any workload assumptions
> >>for desktop people.
> >>However, AFAIK almost all desktop software uses memory as UMA.
> >>
> >>We don't need to care about embedded. It is typically UMA.
> >>
> >>
> >>IOW, the benefit of zone reclaim depends on "strong CPU locality",
> >>"the workload is CPU bound" and "the thread is long-lived".
> >>But many workloads don't fulfill the above requirements. IOW, zone reclaim
> >>is a workload-dependent feature (as Wu said).
> >>
> >>
> >>In general, a workload-dependent feature doesn't fit as a default option.
> >>We can't know what workload the end user runs anyway.
> >>
> >>Fortunately (or unfortunately), typical workload and machine size were
> >>significantly correlated.
> >>Thus, the current default setting calculation worked well in the past.
> [YM] Your analysis is clear and deep.
Thanks!
> >>Now it is broken. What should we do?
> >>Yanmin, we know 99% of Linux people use Intel CPUs, and you are one of
> >>the most persistent testing
> [YM] It's very easy to reproduce them on my machines. :) Sometimes that is
> because the issues only exist on machines with lots of CPUs, and other
> community developers have no such environments.
>
>
> >>guys on LKML, and you have done a lot of testing.
> >>May I ask which machines and benchmarks you tested?
> [YM] Usually I start lots of benchmark testing against the latest kernel, but
> this issue was first reported by a customer. The customer runs apache
> on Nehalem machines to access lots of files. So the issue is an example of a
> file server.
Hmmm.
I'm surprised by this report. I didn't know about this problem. Oh...
Actually, I don't think apache is only a file server.
Apache is one of the killer applications on Linux. It runs in a very wide range of organizations.
Do you think large machines don't run apache? I don't think so.
> BTW, I found many fio test cases have a big drop after I upgraded the BIOS of
> one Nehalem machine. By checking vmstat data, I found almost half of the
> memory is always free. It's also related to zone_reclaim_mode because the new
> BIOS changes the node distance to a large value. I use
> numactl --interleave=all to work around the problem temporarily.
>
> I have no HPC environment.
Yeah, that's OK. Christoph and I have one. My worry is that a workload unknown to me regresses.
So, may I assume you ran your benchmarks with zone reclaim both 0 and 1, and
you haven't seen a regression with zone reclaim disabled?
If so, that encourages me very much.
If disabling zone reclaim mode doesn't cause a regression, I'll push again to
remove the default zone reclaim mode completely.
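
To make the comparison above concrete, here is a minimal Python sketch, not
kernel code. It reads each node's distance row from sysfs, reports whether any
distance exceeds a threshold (the policy described at the top of this thread:
large remote node distance enables zone reclaim by default), and prints the
current value of /proc/sys/vm/zone_reclaim_mode. The function names and the
threshold of 20 are assumptions for illustration only; the value the kernel
really compares against depends on kernel version and architecture.

```python
from pathlib import Path

# ASSUMPTION: illustrative threshold only; the kernel's real value
# depends on kernel version and architecture.
ASSUMED_RECLAIM_DISTANCE = 20


def parse_distance_row(text):
    """Parse one sysfs distance row such as '10 21' into a list of ints."""
    return [int(tok) for tok in text.split()]


def default_zone_reclaim(matrix, threshold=ASSUMED_RECLAIM_DISTANCE):
    """Return 1 if any node-to-node distance exceeds threshold, else 0."""
    return int(any(d > threshold for row in matrix for d in row))


def read_distance_matrix(root="/sys/devices/system/node"):
    """Read every nodeN/distance file under root, ordered by node number."""
    paths = sorted(
        Path(root).glob("node[0-9]*/distance"),
        key=lambda p: int(p.parent.name[len("node"):]),
    )
    return [parse_distance_row(p.read_text()) for p in paths]


if __name__ == "__main__":
    matrix = read_distance_matrix()
    print("distance matrix:", matrix)
    print("distance-based default would be:", default_zone_reclaim(matrix))
    mode = Path("/proc/sys/vm/zone_reclaim_mode")
    if mode.exists():
        print("current zone_reclaim_mode:", mode.read_text().strip())
```

Running a benchmark once with zone_reclaim_mode set to 0 and once with 1, and
recording the output of this script next to each result, would show which
machines the distance-based default actually affects.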
> >>If there are many more workloads that favor zone_reclaim=0 than workloads
> >>that favor zone_reclaim=1,
> >> we can drop our fears and we would prioritize your opinion, of course.
> So it seems only file servers have the issue currently.