From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: kosaki.motohiro@jp.fujitsu.com,
Andrew Morton <akpm@linux-foundation.org>,
riel@redhat.com, cl@linux-foundation.org, fengguang.wu@intel.com,
linuxram@us.ibm.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 0/3] Fix malloc() stall in zone_reclaim() and bring behaviour more in line with expectations V3
Date: Tue, 16 Jun 2009 21:57:53 +0900 (JST)
Message-ID: <20090616202157.99AF.A69D9226@jp.fujitsu.com>
In-Reply-To: <20090615105651.GD23198@csn.ul.ie>
Hi
> > > > vmscan-zone_reclaim-use-may_swap.patch
> > > >
> > >
> > > This is a tricky one. Kosaki, I think this patch is a little dangerous. With
> > > this applied, pages get unmapped whether RECLAIM_SWAP is set or not. This
> > > means that zone_reclaim() now has more work to do when it's enabled and it
> > > incurs a number of minor faults for no reason as a result of trying to avoid
> > > going off-node. I don't believe that is desirable because it would manifest
> > > as high minor fault counts on NUMA and would be difficult to pin down why
> > > that was happening.
> >
> > (cc to Hannes)
> >
> > First, if this patch should be dropped, commit bd2f6199
> > (vmscan: respect higher order in zone_reclaim()) should be too, I think.
> > The combination of lumpy reclaim and !may_unmap is really ineffective.
>
> Whether it's ineffective or not, it's what the user has asked for. They
> want a high-order page found if possible within the limits of
> zone_reclaim_mode. If it fails, they will enter full direct reclaim
> later in the path and try again.
>
> How effective lumpy reclaim is in this case really depends on what the
> system has been used for in the past. It's impossible to know in advance
> how effective lumpy reclaim will be in every case.
In general, a performance discussion needs to consider the typical use case.
Most machines with zone reclaim enabled are not file servers, so the ratio of
unmapped file pages is not very high.
I am pessimistic about the success rate of lumpy reclaim on those servers.
Of course, it doesn't cause an allocation failure; it only falls back to full
direct reclaim. But I don't want strange and unnecessary LRU shuffling. Also, I
don't think it brings any performance improvement.
> > It might isolate neighbor pages, give up on unmapping them, and put the
> > pages back at the tail of the LRU.
> > That means shuffling the LRU list.
> >
> > I don't think it is desirable.
> >
>
> With Kamezawa Hiroyuki's patch that avoids unnecessary shuffles of the LRU
> list due to lumpy reclaim, the situation might be better?
I still think may_unmap is the better choice, but if we use it, I agree that
adding a may_unmap and page_mapped() condition to isolate_pages_global() is a
good choice.
Nice idea.
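To make the idea concrete, here is a minimal userspace sketch of such a filter. It is only a model under stated assumptions: struct model_page, model_page_mapped() and model_isolate_pages() are hypothetical stand-ins, not the kernel's struct page, page_mapped() or isolate_pages_global(), and the real isolation path does considerably more. The point is only that a mapped page is skipped at isolation time when may_unmap is not set, so it is never taken off the LRU and put back at the tail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
    int  mapcount;  /* number of page tables mapping this page */
    bool on_lru;    /* still sitting on the (modelled) LRU list */
};

/* Models page_mapped(): true if at least one page table maps the page. */
static bool model_page_mapped(const struct model_page *page)
{
    return page->mapcount > 0;
}

/*
 * Models the proposed check at isolation time. When may_unmap is false,
 * mapped pages are left where they are instead of being isolated and
 * later put back. Returns the number of pages stored in out[].
 */
static size_t model_isolate_pages(struct model_page *lru, size_t nr,
                                  bool may_unmap, struct model_page **out)
{
    size_t taken = 0;

    for (size_t i = 0; i < nr; i++) {
        struct model_page *page = &lru[i];

        if (!page->on_lru)
            continue;
        if (!may_unmap && model_page_mapped(page))
            continue;   /* skipped: the page keeps its LRU position */

        page->on_lru = false;
        out[taken++] = page;
    }
    return taken;
}
```

With may_unmap false, only the unmapped pages are isolated; with may_unmap true, every page on the modelled list is a candidate.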
> > Second, we learned during the split-LRU discussion that "mapped or not
> > mapped" is not an appropriate criterion for reclaim boosting.
> > So I think being consistent is better. If not considering may_unmap
> > causes a serious performance issue, we need to fix the try_to_free_pages() path too.
> >
>
> I don't understand this paragraph.
>
> If zone_reclaim_mode is set to 1, I don't believe the expected behaviour is
> for pages to be unmapped from page tables. I think it will lead to mysterious
> bug reports of higher numbers of minor faults when running applications on
> NUMA machines in some situations.
AFAIK, 99.9% of users read the documentation, not the actual code, and the
documentation doesn't describe it that way.
I don't think this is the expected behavior.
That's my point.
> > Third, if we consider an MPI program on NUMA, each process frequently
> > accesses only a part of the array data and never touches the rest of the array.
> > So, AFAIK, "rarely accessed" pages really are rare; infrequent access is not a major performance factor.
> >
> > I have one question. Is your "difficulty of pinning down" a major issue?
> >
>
> Yes. If an administrator notices that minor fault rates are higher than
> expected, it's going to be very difficult for them to understand why
> it is happening and why setting reclaim_mode to 0 apparently fixes the
> problem. oprofile for example might just show that a lot of time is being
> spent in the fault paths but not explain why.
I don't quite understand this paragraph. I feel this is only a theoretical issue.
try_to_unmap_one() succeeding means the pte doesn't have the accessed bit set.
That is an obvious sign that the pte can be unmapped.
If we consider an MPI program, pages left untouched for a long time are often never touched again.
Am I missing anything? Or aren't you talking about non-HPC workloads?
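The accessed-bit argument can be sketched as a small userspace model. This is a simplification under stated assumptions: struct model_pte and model_try_to_unmap_one() are hypothetical names, and the real try_to_unmap_one() also handles locked VMAs, migration, dirty state and more. It shows only the behaviour I am relying on: a recently referenced pte is not unmapped, so an unmap succeeds only for a page nobody touched since the bit was last cleared.

```c
#include <stdbool.h>

/* Hypothetical model of one page table entry; not the kernel's pte_t. */
struct model_pte {
    bool present;   /* the mapping exists */
    bool young;     /* accessed bit, set by hardware on each reference */
};

enum model_unmap_result {
    MODEL_UNMAP_FAIL,
    MODEL_UNMAP_SUCCESS
};

/*
 * Models the accessed-bit check on the unmap path: if the pte was
 * referenced recently, clear the bit and refuse to unmap; otherwise
 * tear down the mapping.
 */
static enum model_unmap_result model_try_to_unmap_one(struct model_pte *pte)
{
    if (!pte->present)
        return MODEL_UNMAP_FAIL;    /* nothing to unmap in this model */

    if (pte->young) {
        pte->young = false;         /* test and clear the accessed bit */
        return MODEL_UNMAP_FAIL;    /* recently used: keep the mapping */
    }

    pte->present = false;           /* untouched page: unmap it */
    return MODEL_UNMAP_SUCCESS;
}
```

In this model a page that is referenced between two reclaim passes always survives the second pass, which is why only pages that stay untouched end up taking a minor fault later.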