Re: -mm merge plans -- anti-fragmentation

* Re: -mm merge plans -- anti-fragmentation
@ 2007-07-10 10:20 Mel Gorman
  2007-07-10 11:01 ` KAMEZAWA Hiroyuki
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Mel Gorman @ 2007-07-10 10:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: npiggin, kenchen, jschopp, apw, kamezawa.hiroyu, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

Subject: Re: -mm merge plans for 2.6.23
In-Reply-To: <20070710013152.ef2cd200.akpm@linux-foundation.org>

Hi

On (10/07/07 01:31), Andrew Morton didst pronounce:

> <SNIP>
>
> add-a-bitmap-that-is-used-to-track-flags-affecting-a-block-of-pages.patch
> add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch

The add-__gfp_movable patch here is also needed for the zone movable patches
below. I'm pointing it out because it's possible that grouping pages by
mobility and the zone movable stuff are effectively independent and could
be merged separately.

> split-the-free-lists-for-movable-and-unmovable-allocations.patch
> choose-pages-from-the-per-cpu-list-based-on-migration-type.patch
> add-a-configure-option-to-group-pages-by-mobility.patch
> drain-per-cpu-lists-when-high-order-allocations-fail.patch
> move-free-pages-between-lists-on-steal.patch
> group-short-lived-and-reclaimable-kernel-allocations.patch
> group-high-order-atomic-allocations.patch
> do-not-group-pages-by-mobility-type-on-low-memory-systems.patch
> bias-the-placement-of-kernel-pages-at-lower-pfns.patch
> be-more-agressive-about-stealing-when-migrate_reclaimable-allocations-fallback.patch
> fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2.patch
> bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch
> remove-page_group_by_mobility.patch
> dont-group-high-order-atomic-allocations.patch
> fix-calculation-in-move_freepages_block-for-counting-pages.patch
> breakout-page_order-to-internalh-to-avoid-special-knowledge-of-the-buddy-allocator.patch
> do-not-depend-on-max_order-when-grouping-pages-by-mobility.patch
> print-out-statistics-in-relation-to-fragmentation-avoidance-to-proc-pagetypeinfo.patch
> 
>  Mel's page allocator work.  Might merge this, but I'm still not hearing
>  sufficiently convincing noises from a sufficient number of people over this.
> 

This is a long on-going story. It bounces between people who say it's not a
complete solution and everything should have the 100% ability to defragment
and the people on the other side that say it goes a long way to solving their
problem. I've cc'd some of the parties that have expressed any interest in
the last year.

I want this mainly for reducing restrictions on the sizing of the hugepage
pool. Outside of that in the short-term it has some application for using
higher-order pages with SLUB although the patches to do that probably need
more work. In the longer-term, Kamezawa's memory hot-remove patches are
simpler if these patches are in place. In the past, it was known that
these patches helped the unplugging of 16MB sections on PPC64 which has
an application with DLPARs on that platform. In the slightly longer-term,
there are the memory compaction patches which trigger when there is enough
free memory but it's not contiguous enough.

On a slightly more left of centre tack, these patches *may* help fsblock with
large blocks although I would like to hear Nick's confirming/denying this.
Currently if fsblock wants to work with large blocks, it uses a vmap to map
discontiguous pages so they are virtually contiguous for the filesystem. The
use of VMAP is never cheap, though how much of an overhead in this case is
unknown.  If these patches were in place, fsblock could optimistically allocate
the higher-order page and use it without vmap if it succeeded. If it fails,
it would use vmap as a lower-performance-but-still-works fallback. This
may tie in better with what Christoph is doing with large blocks as well
as it may be a compromise solution between their proposals - I'm not 100%
sure so he's cc'd as well for comment.
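
The optimistic pattern described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not fsblock code: struct
large_block, large_block_alloc(), large_block_free() and contig_alloc_fail()
are hypothetical names, malloc() stands in for the page allocator, and the
array of separately allocated pages stands in for what the kernel would have
to stitch together with vmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* Hypothetical container for one large filesystem block. */
struct large_block {
    void *contig;          /* set when one contiguous allocation succeeded */
    void **pages;          /* fallback: separate pages (vmap territory in the kernel) */
    unsigned int nr_pages;
};

/* Stand-in for a higher-order allocation that is allowed to fail. */
typedef void *(*contig_alloc_fn)(size_t bytes);

/* Simulates a fragmented system where no contiguous range is available. */
void *contig_alloc_fail(size_t bytes)
{
    (void)bytes;
    return NULL;
}

/* Try the contiguous allocation first, fall back to page-by-page. */
int large_block_alloc(struct large_block *lb, unsigned int order,
                      contig_alloc_fn try_contig)
{
    unsigned int i;

    assert(lb != NULL && try_contig != NULL);
    lb->nr_pages = 1u << order;
    lb->pages = NULL;
    lb->contig = try_contig((size_t)lb->nr_pages * PAGE_SIZE);
    if (lb->contig)
        return 0;               /* fast path: no mapping tricks needed */

    lb->pages = calloc(lb->nr_pages, sizeof(*lb->pages));
    if (!lb->pages)
        return -1;
    for (i = 0; i < lb->nr_pages; i++) {
        lb->pages[i] = malloc(PAGE_SIZE);
        if (!lb->pages[i]) {
            while (i--)
                free(lb->pages[i]);
            free(lb->pages);
            lb->pages = NULL;
            return -1;
        }
    }
    return 0;                   /* slow path: works, but costs more to use */
}

/* Assumes try_contig returned memory that free() can release. */
void large_block_free(struct large_block *lb)
{
    unsigned int i;

    if (lb->contig) {
        free(lb->contig);
    } else if (lb->pages) {
        for (i = 0; i < lb->nr_pages; i++)
            free(lb->pages[i]);
        free(lb->pages);
    }
    lb->contig = NULL;
    lb->pages = NULL;
}
```

Passing malloc as try_contig exercises the fast path and passing
contig_alloc_fail exercises the fallback. The caller sees the same
struct large_block either way, which is the point of the compromise.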

The patches have been reviewed heavily recently by Christoph and Andy has
looked through them as well. They've been tested for a long time in -mm so
I would expect them not to regress functionality. I've maintained that having
the 100% ability to defragment will cost too much in terms of performance
and would be blocked by the fact that the device driver model would have to
be updated to never use physical addresses - a massive undertaking. I think
this approach is more pragmatic and working on making more types of memory
(like page tables) migratable is at least piecemeal as opposed to turning
everything on its head.

As has happened in the past, I'm not sure what else I can say here to
convince you.

> create-the-zone_movable-zone.patch
> allow-huge-page-allocations-to-use-gfp_high_movable.patch
> handle-kernelcore=-generic.patch
> 
>  Mel's moveable-zone work.  In a similar situation.  We need to stop whatever
>  we're doing and get down and work out what we're going to do with all this
>  stuff.
> 

Whatever about grouping pages by mobility, I would like to see these go
through. They have a real application for hugetlb pool resizing where the
administrator knows the range of hugepages that will be required but doesn't
want to waste memory when the required number of hugepages is small. I've
cc'd Kenneth Chen as I believe he has run into this problem recently where
I believe partitioning memory would have helped. He'll either confirm or deny.

> lumpy-reclaim-v4.patch

This patch is really what lumpy reclaim is. I believe Peter has looked
at this and was happy enough at the time although he is cc'd here again
in case this has changed. This is mainly useful with either grouping
pages by mobility or the ZONE_MOVABLE stuff. However, at the time the
patch was proposed, there was a feeling that it might help jumbo frame
allocation on e1000's and maybe if fsblock optimistically uses
contiguous pages it would have an application. I would like to see it go
through to see does it help e1000 at least.

There has been little noise here because there is little to say once it
went through its initial review. Testing with anti-fragmentation
patches implies it works. Data on how well it works on its own is
spotty but it will not regress functionality.

> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> 

These two patches are placed a little strangely as they are in relation
to slub using higher orders which comes later in the series. These were
contentious and needed to be revisited. It hasn't happened yet because
without grouping pages by mobility - the point is meaningless anyway. Right
now, these should not be going anywhere.

> slub-exploit-page-mobility-to-increase-allocation-order.patch
> slub-reduce-antifrag-max-order.patch
> 
>  These are slub changes which are dependent on Mel's stuff, and I have a note
>  here that there were reports of page allocation failures with these.  What's
>  up with that?
> 

This is where the
have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch and
only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
patches should be. There were page allocation failure reports without these
patches but Nick felt they were not the correct solution and I tend to agree
with him on this matter. I haven't put a massive amount of thought into it
yet because without grouping pages by mobility, the question is pointless.

>  Maybe I should just drop the 100-odd marginal-looking MM patches?  We're
>  simply not showing compelling reasons for merging them and quite a lot of them
>  are stuck in a 90% complete state.
> 

While I cannot speak for all the patches that might fall into
this category, I would agree with the sentiment for the
have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch and friends
patches.

However, it is totally unclear from my perspective what can be done with
grouping pages by mobility or ZONE_MOVABLE that would aid their stalled
status. Perhaps this thread will nudge it a bit.

> memory-unplug-v7-migration-by-kernel.patch
> memory-unplug-v7-isolate_lru_page-fix.patch
> memory-unplug-v7-memory-hotplug-cleanup.patch
> memory-unplug-v7-page-isolation.patch
> memory-unplug-v7-page-offline.patch
> memory-unplug-v7-ia64-interface.patch
> 
>  These are new, and are dependent on Mel's stuff.  Not for 2.6.23.
> 

Specifically, they depend on grouping pages by mobility for the page
isolation patch. Without grouping pages by mobility, that patch gets
pretty messy. For the operation to succeed at all, it benefits from the
ZONE_MOVABLE patches. Kamezawa is cc'd so he might comment further.

> <SNIP>

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 10:20 -mm merge plans -- anti-fragmentation Mel Gorman
@ 2007-07-10 11:01 ` KAMEZAWA Hiroyuki
  2007-07-10 11:12   ` Mel Gorman
  2007-07-10 11:04 ` Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-10 11:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On Tue, 10 Jul 2007 11:20:43 +0100
mel@skynet.ie (Mel Gorman) wrote:
> > memory-unplug-v7-migration-by-kernel.patch
> > memory-unplug-v7-isolate_lru_page-fix.patch
> > memory-unplug-v7-memory-hotplug-cleanup.patch
> > memory-unplug-v7-page-isolation.patch
> > memory-unplug-v7-page-offline.patch
> > memory-unplug-v7-ia64-interface.patch
> > 
> >  These are new, and are dependent on Mel's stuff.  Not for 2.6.23.
> > 
> 
> Specifically, they depend on grouping pages by mobility for the page
> isolation patch. Without grouping pages by mobility, that patch gets
> pretty messy. For the operation to succeed at all, it benefits from the
> ZONE_MOVABLE patches. Kamezawa is cc'd so he might comment further.
> 

In general, there are 2 purposes for memory unplug.
(1) reduce the amount of memory.
(2) unplug some range of memory.

(1) is a request from people who use a flexible environment, like a virtual machine
or LPAR. (2) is a request from people who want to remove physical DIMM devices.

For (1), the movable page type and page defragmentation work very well. Because the
memory unplug interface removes a section of pages at a time, we need to unplug the
whole section. With page grouping, pages are grouped into chunks, and a MOVABLE-type
chunk can be unplugged very easily.
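
As a rough illustration of why grouping helps here: a section is easy to unplug
when every block in it holds only movable pages. The sketch below assumes a
hypothetical per-block type array. The names block_type, first_pinned_block()
and section_can_unplug() are made up for this example and are not taken from
the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block types, loosely following the grouping in the thread. */
enum block_type { BLOCK_UNMOVABLE, BLOCK_RECLAIMABLE, BLOCK_MOVABLE };

/* Index of the first block that is not MOVABLE, or -1 if there is none. */
long first_pinned_block(const enum block_type *blocks, size_t nr_blocks)
{
    size_t i;

    assert(blocks != NULL);
    for (i = 0; i < nr_blocks; i++)
        if (blocks[i] != BLOCK_MOVABLE)
            return (long)i;
    return -1;
}

/* A section qualifies only when all of its blocks are MOVABLE. */
bool section_can_unplug(const enum block_type *blocks, size_t nr_blocks)
{
    return first_pinned_block(blocks, nr_blocks) == -1;
}
```

Without grouping, unmovable pages may land in any block, so a check like this
would fail for most sections on a long-running system.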

For (2), we need some method for specifying the range we will remove. For doing that,
a ZONE seems to be a good candidate.  Now we use the "kernelcore=" boot option to create
ZONE_MOVABLE by hand. But this is the first step. I know an Intel guy posted
his idea to specify the Hotpluggable-Memory range in SRAT (by firmware). And I think that
other methods may be introduced for node-hotplug.

-Kame


* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 10:20 -mm merge plans -- anti-fragmentation Mel Gorman
  2007-07-10 11:01 ` KAMEZAWA Hiroyuki
@ 2007-07-10 11:04 ` Peter Zijlstra
  2007-07-10 13:24   ` Mel Gorman
  2007-07-10 13:03 ` Nick Piggin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2007-07-10 11:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, kamezawa.hiroyu,
	y-goto, clameter, linux-mm, linux-kernel

On Tue, 2007-07-10 at 11:20 +0100, Mel Gorman wrote:

<snip>

> > lumpy-reclaim-v4.patch
> 
> This patch is really what lumpy reclaim is. I believe Peter has looked
> at this and was happy enough at the time although he is cc'd here again
> in case this has changed. This is mainly useful with either grouping
> pages by mobility or the ZONE_MOVABLE stuff. However, at the time the
> patch was proposed, there was a feeling that it might help jumbo frame
> allocation on e1000's and maybe if fsblock optimistically uses
> contiguous pages it would have an application. I would like to see it go
> through to see does it help e1000 at least.

I'm not seeing how this will help e1000 (and other jumbo drivers). They
typically allocate using GFP_ATOMIC, so in order to satisfy those you'd
need to either have a higher order watermark or do atomic defrag of the
free space.



* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 11:01 ` KAMEZAWA Hiroyuki
@ 2007-07-10 11:12   ` Mel Gorman
  2007-07-10 11:38     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2007-07-10 11:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On (10/07/07 20:01), KAMEZAWA Hiroyuki didst pronounce:
> On Tue, 10 Jul 2007 11:20:43 +0100
> mel@skynet.ie (Mel Gorman) wrote:
> > > memory-unplug-v7-migration-by-kernel.patch
> > > memory-unplug-v7-isolate_lru_page-fix.patch
> > > memory-unplug-v7-memory-hotplug-cleanup.patch
> > > memory-unplug-v7-page-isolation.patch
> > > memory-unplug-v7-page-offline.patch
> > > memory-unplug-v7-ia64-interface.patch
> > > 
> > >  These are new, and are dependent on Mel's stuff.  Not for 2.6.23.
> > > 
> > 
> > Specifically, they depend on grouping pages by mobility for the page
> > isolation patch. Without grouping pages by mobility, that patch gets
> > pretty messy. For the operation to succeed at all, it benefits from the
> > ZONE_MOVABLE patches. Kamezawa is cc'd so he might comment further.
> > 
> 
> In gerneal, there are 2 purpose for memory unplug.
> (1) reduce amount of memory.
> (2) plug some range of memory.
> 
> (1) is request from people who use some flexible environment, like virtual machine,
> LPAR. (2) is request from people who want to remove physical DIMM deivces.
> 
> For (1), page movable type and page defragment works very well. Because memory unplug
> interface allows removing a section of pages, we need to unplug the whole section.
> By page grouping, pages are grouped into chunks and MOVABLE type chunk can be unplugged
> very easily.
> 
> For (2), we need some method for specifing the range we will remove. For doing that,
> ZONE seems to be good candidate.  Now we use "kernelcore=" boot option to create
> ZONE_MOVABLE by hand.

At the risk of putting you on the spot, do you mind saying whether the
grouping pages by mobility and ZONE_MOVABLE patches are going in the
direction you want or should something totally different be done? If
they are going the right direction, is there anything critical that is
missing right now?

> But this is the first step. I know Intel guy posted
> his idea to specify Hotpluggable-Memory range in SRAT (by firmware).

There may be additional work required to make this play nicely with
ZONE_MOVABLE but it shouldn't be anything fundamental.

> And I think that
> other method may be introduced for node-hotplug. 
> 

Same as above really. If the node contains one zone - ZONE_MOVABLE, it
would work for unplugging.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 11:12   ` Mel Gorman
@ 2007-07-10 11:38     ` KAMEZAWA Hiroyuki
  2007-07-10 15:50       ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-10 11:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On Tue, 10 Jul 2007 12:12:02 +0100
mel@skynet.ie (Mel Gorman) wrote:
> > For (2), we need some method for specifing the range we will remove. For doing that,
> > ZONE seems to be good candidate.  Now we use "kernelcore=" boot option to create
> > ZONE_MOVABLE by hand.
> 
> At the risk of putting you on the spot, do you mind saying whether the
> grouping pages by mobility and ZONE_MOVABLE patches are going in the
> direction you want or should something totally different be done? If
> they are going the right direction, is there anything critical that is
> missing right now?
> 
"grouping pages by mobility and ZONE_MOVABLE" things are what I want. And
I want to go with them. But I know some people don't want to increase the
number of zones. That is my concern.
I know ZONE_MOVABLE works well, but there are people who don't want a new zone.
So making ZONE_MOVABLE configurable would be a good thing, as Nick Piggin pointed out.

About my other concerns, see node hotplug (below).

> > But this is the first step. I know Intel guy posted
> > his idea to specify Hotpluggable-Memory range in SRAT (by firmware).
> 
> There may be additional work required to make this play nicely with
> ZONE_MOVABLE but it shouldn't be anything fundamental.
> 
Yes. And I don't know whether his idea about SRAT is accepted in the firmware community.
For now, kernelcore= works well enough for memory hotplug.

> > And I think that
> > other method may be introduced for node-hotplug. 
> > 
> 
> Same as above really. If the node contains one zone - ZONE_MOVABLE, it
> would work for unplugging.
> 
Our concerns with node hotplug are "bootmem", the hash tables, pgdata, memmap, etc.
NUMA initialization (of each arch) includes something complicated.
But this is not directly related to the ZONE_MOVABLE things, I think.
It's a node-hotplug problem.
We are now considering hot-adding nodes after initcalls().

-Kame


* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 10:20 -mm merge plans -- anti-fragmentation Mel Gorman
  2007-07-10 11:01 ` KAMEZAWA Hiroyuki
  2007-07-10 11:04 ` Peter Zijlstra
@ 2007-07-10 13:03 ` Nick Piggin
  2007-07-10 13:55   ` Mel Gorman
  2007-07-10 18:46   ` Christoph Lameter
  2007-07-10 14:29 ` Dave McCracken
  2007-07-12 19:29 ` Andrew Morton
  4 siblings, 2 replies; 30+ messages in thread
From: Nick Piggin @ 2007-07-10 13:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, kenchen, jschopp, apw, kamezawa.hiroyu,
	a.p.zijlstra, y-goto, clameter, linux-mm, linux-kernel

On Tue, Jul 10, 2007 at 11:20:43AM +0100, Mel Gorman wrote:
> > 
> >  Mel's page allocator work.  Might merge this, but I'm still not hearing
> >  sufficiently convincing noises from a sufficient number of people over this.
> > 
> 
> This is a long on-going story. It bounces between people who say it's not a
> complete solution and everything should have the 100% ability to defragment
> and the people on the other side that say it goes a long way to solving their
> problem. I've cc'd some of the parties that have expressed any interest in
> the last year.

And I guess some other people who want to see what problems there are
and what can't be solved between order-0 allocations and reserve zones.

 
> On a slightly more left of centre tact, these patches *may* help fsblock with
> large blocks although I would like to hear Nick's confirming/denying this.
> Currently if fsblock wants to work with large blocks, it uses a vmap to map
> discontiguous pages so they are virtually contiguous for the filesystem. The
> use of VMAP is never cheap, though how much of an overhead in this case is
> unknown.  If these patches were in place, fsblock could optimisically allocate
> the higher-order page and use it without vmap if it succeeded. If it fails,
> it would use vmap as a lower-performance-but-still-works fallback. This
> may tie in better with what Christoph is doing with large blocks as well
> as it may be a compromise solution between their proposals - I'm not 100%
> sure so he's cc'd as well for comment.

Yeah higher order allocations could definitely be helpful for this although
I couldn't guess at the sort of improvements at this stage. And I mean if
there was a simple choice between better (but still not perfect) support
for higher order allocations or not, then of course you would take them.
I am sure there are other places as well where they might make life a bit
easier or performance a bit better.

But given the code involved, it is not just a simple choice, but a
tradeoff. Perhaps I haven't seen or don't realise it, but I'm still not
sure that this tradeoff is a good one. (just my opinion).


> The patches have been reviewed heavily recently by Christoph and Andy has
> looked through them as well. They've been tested for a long time in -mm so
> I would expect they not regress functionality. I've maintained that having
> the 100% ability to defragment will cost too much in terms of performance
> and would be blocked by the fact that the device driver model would have to
> be updated to never use physical addresses - a massive undertaking. I think
> this approach is more pragmatic and working on making more types of memory
> (like page tables) migratable is at least piecemeal as opposed to turning
> everything on it's head.

My comments about defragmentation of the kernel were not exactly what
I believe is the right direction to go (it may be, but I'm really not
in a position to know without having seen or tried to implement it). But
I do think that's what would really be needed in order to really support
higher order allocations the same as order-0.

I realise in your pragmatic approach, you are encouraging users to
put fallbacks in place in case a higher order page cannot be allocated,
but I don't think either higher order pagecache or higher order slubs
have such fallbacks (fsblock or a combination of fsblock and higher
order pagecache could have, but...).
 
> >  These are slub changes which are dependent on Mel's stuff, and I have a note
> >  here that there were reports of page allocation failures with these.  What's
> >  up with that?
> > 
> 
> These is where the
> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch and
> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> patches should be. There were page allocation failure reports without these
> patches but Nick felt they were not the correct solution and I tend to agree
> with him on this matter. I haven't put a massive amount of thought into it
> yet because without grouping pages by mobility, the question is pointless.

Yeah I think that was a hack.



* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 11:04 ` Peter Zijlstra
@ 2007-07-10 13:24   ` Mel Gorman
  0 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2007-07-10 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, kamezawa.hiroyu,
	y-goto, clameter, linux-mm, linux-kernel

On (10/07/07 13:04), Peter Zijlstra didst pronounce:
> On Tue, 2007-07-10 at 11:20 +0100, Mel Gorman wrote:
> 
> <snip>
> 
> > > lumpy-reclaim-v4.patch
> > 
> > This patch is really what lumpy reclaim is. I believe Peter has looked
> > at this and was happy enough at the time although he is cc'd here again
> > in case this has changed. This is mainly useful with either grouping
> > pages by mobility or the ZONE_MOVABLE stuff. However, at the time the
> > patch was proposed, there was a feeling that it might help jumbo frame
> > allocation on e1000's and maybe if fsblock optimistically uses
> > contiguous pages it would have an application. I would like to see it go
> > through to see does it help e1000 at least.
> 
> I'm not seeing how this will help e1000 (and other jumbo drivers). They
> typically allocate using GFP_ATOMIC, so in order to satisfy those you'd
> need to either have a higher order watermark or do atomic defrag of the
> free space.
> 

It does help somewhat indirectly and in an unsatisfactory manner. When the
higher watermarks are breached, the atomic allocation will still succeed
but kswapd will be poked to reclaim at a given order. This is similar to
the problems SLUB hits when it uses high-orders frequently.
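
A much simplified model of that behaviour, written as plain userspace C, might
look like the sketch below. Everything in it is an assumption made for
illustration: struct zone_model and atomic_alloc() are hypothetical names, the
model tracks only a total free-page count (the real watermark check also looks
at how many free pages exist at each order), and it ignores the min watermark
and reserves entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified model of one zone. */
struct zone_model {
    unsigned long free_pages;   /* free base pages in the zone */
    unsigned long wmark_low;    /* below this, background reclaim is wanted */
    bool kswapd_woken;          /* set when the allocator pokes kswapd */
    unsigned int kswapd_order;  /* highest order kswapd was asked to reclaim for */
};

/* Atomic callers cannot wait, so they either succeed now or fail now. */
bool atomic_alloc(struct zone_model *z, unsigned int order)
{
    unsigned long need = 1UL << order;

    assert(z != NULL);
    if (z->free_pages < need)
        return false;

    z->free_pages -= need;      /* the allocation itself still succeeds */

    /* Dropping under the watermark wakes kswapd for this order. */
    if (z->free_pages < z->wmark_low) {
        z->kswapd_woken = true;
        if (order > z->kswapd_order)
            z->kswapd_order = order;
    }
    return true;
}
```

The help is indirect, as said above: the allocation that trips the watermark
gains nothing itself, only later allocations benefit from the reclaim that
kswapd was asked to do.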

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 13:03 ` Nick Piggin
@ 2007-07-10 13:55   ` Mel Gorman
  2007-07-10 18:47     ` Christoph Lameter
  2007-07-10 18:46   ` Christoph Lameter
  1 sibling, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2007-07-10 13:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, kenchen, jschopp, apw, kamezawa.hiroyu,
	a.p.zijlstra, y-goto, clameter, linux-mm, linux-kernel

On (10/07/07 15:03), Nick Piggin didst pronounce:
> On Tue, Jul 10, 2007 at 11:20:43AM +0100, Mel Gorman wrote:
> > > 
> > >  Mel's page allocator work.  Might merge this, but I'm still not hearing
> > >  sufficiently convincing noises from a sufficient number of people over this.
> > > 
> > 
> > This is a long on-going story. It bounces between people who say it's not a
> > complete solution and everything should have the 100% ability to defragment
> > and the people on the other side that say it goes a long way to solving their
> > problem. I've cc'd some of the parties that have expressed any interest in
> > the last year.
> 
> And I guess some other people who want to see what prolbems there are
> and what can't be solved between order-0 allocations and reserve zones.
> 

This is true. The ZONE_MOVABLE stuff is a stab at seeing how far a reserve
zone can get and work to improve order-0 usage is not mutually exclusive to
grouping pages by mobility.

> > On a slightly more left of centre tact, these patches *may* help fsblock with
> > large blocks although I would like to hear Nick's confirming/denying this.
> > Currently if fsblock wants to work with large blocks, it uses a vmap to map
> > discontiguous pages so they are virtually contiguous for the filesystem. The
> > use of VMAP is never cheap, though how much of an overhead in this case is
> > unknown.  If these patches were in place, fsblock could optimisically allocate
> > the higher-order page and use it without vmap if it succeeded. If it fails,
> > it would use vmap as a lower-performance-but-still-works fallback. This
> > may tie in better with what Christoph is doing with large blocks as well
> > as it may be a compromise solution between their proposals - I'm not 100%
> > sure so he's cc'd as well for comment.
> 
> Yeah higher order allocations could definitely be helpful for this although
> I couldn't guess at the sort of impovements at this stage.

Admittedly, neither can I.

> And I mean if
> there was a simple choice between better (but still not perfect) support
> for higher order allocations or not, then of course you would take them.
> I am sure there are other places as well where they might makes life a bit
> easier or performance a bit better.
> 
> But given the code involved, it is not just a simple choice, but a
> tradeoff. Perhaps I haven't seen or don't realise it, but I'm still not
> sure that this tradeoff is a good one. (just my opinion).
> 

Regrettably, the code cannot be made any simpler without making it less
effective at the same time.

In principle, the idea is fairly simple. Classify allocations into a number of
"migrate types" and mark them with GFP flags. Instead of one set of free lists,
have one set per migrate type and always try to satisfy an allocation from a
preferred list. When that cannot be done, rmqueue_fallback() is responsible
for selecting an alternative list in such a way to minimise future mixing
of blocks.
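
The idea can be sketched in a few lines of userspace C. This is a toy model
under stated assumptions, not the code from the patches: the type names, the
three migrate types and in particular the fallback order in the table are
chosen for illustration, and the sketch leaves out what the real
rmqueue_fallback() does to minimise future mixing of blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified set of migrate types. */
enum migrate_type { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE, MT_TYPES };

struct free_block {
    struct free_block *next;
};

/* One free list per migrate type instead of a single shared list. */
struct free_area {
    struct free_block *head[MT_TYPES];
};

/* Assumed fallback order: which other lists to raid, and in what order. */
static const enum migrate_type fallbacks[MT_TYPES][MT_TYPES - 1] = {
    [MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE },
    [MT_RECLAIMABLE] = { MT_UNMOVABLE,   MT_MOVABLE },
    [MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE },
};

/* Put a free block on the list for its migrate type. */
void free_area_add(struct free_area *area, struct free_block *blk,
                   enum migrate_type mt)
{
    assert(mt < MT_TYPES);
    blk->next = area->head[mt];
    area->head[mt] = blk;
}

static struct free_block *free_area_pop(struct free_area *area,
                                        enum migrate_type mt)
{
    struct free_block *blk = area->head[mt];

    if (blk)
        area->head[mt] = blk->next;
    return blk;
}

/* Prefer the caller's own list, use the fallback table only when it is empty. */
struct free_block *alloc_block(struct free_area *area,
                               enum migrate_type preferred,
                               enum migrate_type *served_from)
{
    struct free_block *blk;
    int i;

    assert(preferred < MT_TYPES);
    blk = free_area_pop(area, preferred);
    if (blk) {
        *served_from = preferred;
        return blk;
    }
    for (i = 0; i < MT_TYPES - 1; i++) {
        enum migrate_type mt = fallbacks[preferred][i];

        blk = free_area_pop(area, mt);
        if (blk) {
            *served_from = mt;
            return blk;
        }
    }
    return NULL;
}
```

The served_from output makes the cost visible: every allocation served from a
list other than the preferred one is a point where types start to mix.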

I cannot see a way this code could be made simpler or devise an
alternative mechanism that would achieve the same result and be easier
to understand at the same time. I do not believe any serious alternative
has been proposed or implemented.

> > The patches have been reviewed heavily recently by Christoph and Andy has
> > looked through them as well. They've been tested for a long time in -mm so
> > I would expect they not regress functionality. I've maintained that having
> > the 100% ability to defragment will cost too much in terms of performance
> > and would be blocked by the fact that the device driver model would have to
> > be updated to never use physical addresses - a massive undertaking. I think
> > this approach is more pragmatic and working on making more types of memory
> > (like page tables) migratable is at least piecemeal as opposed to turning
> > everything on it's head.
> 
> My comments about defragmentation of the kernel were not exactly what
> I believe is the right direction to go (it may be, but I'm rally not
> in a position to know without having seen or tried to implement it).

I've taken a look at it a few times. I might be blinded by tunnel vision
but it's always looked like a really serious undertaking. Breaking the 1:1
phys:virt mapping was bad enough and looked like it would have some serious
performance repercussions. Worse though was the requirement to rework drivers
to never use physical addresses and always be prepared to release all memory -
that just seemed like the type of upheaval that would never succeed.

> But
> I do think that's what would really be needed in order to really support
> higher order allocations the same as order-0.
> 

If they had to work 100% of the time at all times, I might agree with
you but the cost of having that sort of 100% guarantee is likely to be
so high as to outweigh any benefits of using high-order pages in the
first place.

> I realise in your pragmatic approach, you are encouraging users to
> put fallbacks in place in case a higher order page cannot be allocated,
> but I don't think either higher order pagecache or higher order slubs
> have such fallbacks (fsblock or a combination of fsblock and higher
> order pagecache could have, but...).
>  

SLUB doesn't have such a fallback right now. Minimally, one alternative
proposal was to force slabs that are involved with IO to use order-0
until it could be addressed fully. This conversation was never fully
resolved because similar to other points, without grouping pages by
mobility or something similar it's pointless.

fsblock in combination with higher order pagecache would have such a
fallback. While that is vapour at the moment, that does not mean that
something like that could not be implemented if grouping pages by mobility
was used. Again, without grouping pages by mobility or some solution that
has similar effects, higher-order pagecache in any guise becomes unworkable.

> > >  These are slub changes which are dependent on Mel's stuff, and I have a note
> > >  here that there were reports of page allocation failures with these.  What's
> > >  up with that?
> > > 
> > 
> > This is where the
> > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch and
> > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> > patches should be. There were page allocation failure reports without these
> > patches but Nick felt they were not the correct solution and I tend to agree
> > with him on this matter. I haven't put a massive amount of thought into it
> > yet because without grouping pages by mobility, the question is pointless.
> 
> Yeah I think that was a hack.
> 

Somewhat agreed. This is one where I want to take the time to consider
alternatives that are reliable and not subject to deadlocking. I do not
believe there is any attempt to push these patches anywhere right now.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 10:20 -mm merge plans -- anti-fragmentation Mel Gorman
                   ` (2 preceding siblings ...)
  2007-07-10 13:03 ` Nick Piggin
@ 2007-07-10 14:29 ` Dave McCracken
  2007-07-10 15:23   ` Nick Piggin
  2007-07-12 19:29 ` Andrew Morton
  4 siblings, 1 reply; 30+ messages in thread
From: Dave McCracken @ 2007-07-10 14:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, kamezawa.hiroyu,
	a.p.zijlstra, y-goto, clameter, linux-mm, linux-kernel

On Tuesday 10 July 2007, Mel Gorman wrote:
> >  Mel's page allocator work.  Might merge this, but I'm still not hearing
> >  sufficiently convincing noises from a sufficient number of people over
> > this.
>
> This is a long on-going story. It bounces between people who say it's not a
> complete solution and everything should have the 100% ability to defragment
> and the people on the other side that say it goes a long way to solving
> their problem. I've cc'd some of the parties that have expressed any
> interest in the last year.

I find myself wondering what "sufficiently convincing noises" are.  I think we 
can all agree that in the current kernel order>0 allocations are a disaster.  
They simply aren't useable once the system fragments.  I think we can also 
all agree that 100% defragmentation is impossible without rewriting the 
kernel to avoid the hard-coded virtual->physical relationship we have now.

With that said, the only remaining question I see is whether we need order>0 
allocations.  If we do, then Mel's patches are clearly the right thing to do.  
They have received a lot of testing (if just by virtue of being in -mm for so 
long), and have shown to greatly increase the availability of order>0 pages.

The sheer list of patches lined up behind this set is strong evidence that 
there are useful features which depend on a working order>0.  When you add in 
the existing code that has to struggle with allocation failures or resort to 
special pools (ie hugetlbfs), I see a clear vote for the need for this patch.

Some object because order>0 will still be able to fail.  I point out that 
order==0 can also fail, though we go to great lengths to prevent it.  Mel's 
patches raise the success rate of order>0 to within a few percent of 
order==0.  All this means is callers will need to decide how to handle the 
infrequent failure.  This should be true no matter what the order.

I strongly vote for merging these patches.  Let's get them in mainline where 
they can do some good.

Dave McCracken

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 14:29 ` Dave McCracken
@ 2007-07-10 15:23   ` Nick Piggin
  2007-07-10 17:11     ` Dave McCracken
  2007-07-10 18:50     ` Christoph Lameter
  0 siblings, 2 replies; 30+ messages in thread
From: Nick Piggin @ 2007-07-10 15:23 UTC (permalink / raw)
  To: Dave McCracken
  Cc: Mel Gorman, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On Tue, Jul 10, 2007 at 09:29:45AM -0500, Dave McCracken wrote:
> On Tuesday 10 July 2007, Mel Gorman wrote:
> > >  Mel's page allocator work.  Might merge this, but I'm still not hearing
> > >  sufficiently convincing noises from a sufficient number of people over
> > > this.
> >
> > This is a long on-going story. It bounces between people who say it's not a
> > complete solution and everything should have the 100% ability to defragment
> > and the people on the other side that say it goes a long way to solving
> > their problem. I've cc'd some of the parties that have expressed any
> > interest in the last year.
> 
> I find myself wondering what "sufficiently convincing noises" are.  I think we 
> can all agree that in the current kernel order>0 allocations are a disaster.  

Are they? For what the kernel currently uses them for, I don't think
the lower order ones are so bad. Now and again we used to get reports
of atomic order 3 allocation failures with e1000 for example, but a
lot of those were before kswapd would properly asynchronously start
reclaim for atomic and higher order allocations. The odd failure
sometimes catches my eye, but nothing I would call a disaster.

Something like the birthday paradox I guess says that you don't actually
need a large proportion of pages free in order to get higher order
pages free. I think it is something like O(log or sqrt total pages), ie.
similar to what we use for our pages_min sizing, isn't it?

> They simply aren't useable once the system fragments.  I think we can also 
> all agree that 100% defragmentation is impossible without rewriting the 
> kernel to avoid the hard-coded virtual->physical relationship we have now.
> 
> With that said, the only remaining question I see is whether we need order>0 
> allocations.  If we do, then Mel's patches are clearly the right thing to do.  
> They have received a lot of testing (if just by virtue of being in -mm for so 
> long), and have shown to greatly increase the availability of order>0 pages.
> 
> The sheer list of patches lined up behind this set is strong evidence that 
> there are useful features which depend on a working order>0.  When you add in 
> the existing code that has to struggle with allocation failures or resort to 
> special pools (ie hugetlbfs), I see a clear vote for the need for this patch.

Really the only patches so far that I think have convincing reasons are
memory unplug and hugepage, and both of those can get a long way by using
a reserve zone (note it isn't entirely reserved, but still available for
things like pagecache). Beyond that, is there a big demand, and do we
want to make this fundamental change in direction in the kernel to
satisfy that demand?

> Some object because order>0 will still be able to fail.  I point out that 
> order==0 can also fail, though we go to great lengths to prevent it.  Mel's 
> patches raise the success rate of order>0 to within a few percent of 
> order==0.  All this means is callers will need to decide how to handle the 
> infrequent failure.  This should be true no matter what the order.

So small ones like order-1 and 2 seem reasonably good right now AFAIKS.
If you perhaps want to say start using order-4 pages for slab or
some other kernel memory allocations, then you can run into the situation
where memory gets fragmented such that you have one sixteenth of your
memory actually used but you can't allocate from any of your slabs because
there are no order-4 pages left. I guess this is a big difference between
order-low failures and order-high.
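
To make that worst case concrete, here is a toy userspace model in Python
(not kernel code; `free_blocks`, `TOTAL` and the page layout are made up
for illustration). It pins one page in every aligned 16-page block, so only
one sixteenth of memory is in use, and then counts how many naturally
aligned free blocks remain at each order:

```python
def free_blocks(used_pages, total_pages, order):
    """Count naturally aligned blocks of 2**order pages with no used page."""
    size = 1 << order
    used = set(used_pages)
    return sum(
        1
        for start in range(0, total_pages - size + 1, size)
        if not any(page in used for page in range(start, start + size))
    )


TOTAL = 1024  # hypothetical zone size in pages

# Adversarial layout: exactly one pinned page per aligned 16-page block.
pinned = list(range(0, TOTAL, 16))

fraction_used = len(pinned) / TOTAL          # one sixteenth of memory
order4_free = free_blocks(pinned, TOTAL, 4)  # whole order-4 blocks left
order3_free = free_blocks(pinned, TOTAL, 3)  # whole order-3 blocks left

print(fraction_used, order4_free, order3_free)
```

In this layout no order-4 block survives even though 15 of every 16 pages
are free, while half of the order-3 blocks are still intact. Real
fragmentation is rarely this adversarial; the sketch only shows that the
free-page count alone says little about high-order availability.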

> 
> I strongly vote for merging these patches.  Let's get them in mainline where 
> they can do some good.
> 
> Dave McCracken

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 11:38     ` KAMEZAWA Hiroyuki
@ 2007-07-10 15:50       ` Mel Gorman
  0 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2007-07-10 15:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, npiggin, kenchen, jschopp, apw, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On (10/07/07 20:38), KAMEZAWA Hiroyuki didst pronounce:
> On Tue, 10 Jul 2007 12:12:02 +0100
> mel@skynet.ie (Mel Gorman) wrote:
> > > For (2), we need some method for specifying the range we will remove. For doing that,
> > > ZONE seems to be good candidate.  Now we use "kernelcore=" boot option to create
> > > ZONE_MOVABLE by hand.
> > 
> > At the risk of putting you on the spot, do you mind saying whether the
> > grouping pages by mobility and ZONE_MOVABLE patches are going in the
> > direction you want or should something totally different be done? If
> > they are going the right direction, is there anything critical that is
> > missing right now?
> > 
> "grouping pages by mobility and ZONE_MOVABLE" things are what I want. And
> I want to go with them. But I know some people don't want to increase #
> of zones. It is my concern. 

I'm not overly keen on increasing the number of zones either but it is a
simpler approach, solves some of the problems and is less intrusive than
grouping pages by mobility so it's a reasonable starting point.

> I know ZONE_MOVABLE works well but there are people who don't want new zone.
> So making ZONE_MOVABLE as configurable will be good thing, as Nick Piggin pointed.
> 

I tested your zone-configurable patch and it appears to work.  Your patch
builds whether ZONE_MOVABLE is available or not and ZONE_MOVABLE is only
available when the config option is set.  It is also considerably cleaner
than the patch I put together for a configurable ZONE_MOVABLE which is too
ugly to live in comparison.

> About my other concerns , see node hotplug (below).
> 
> > > But this is the first step. I know Intel guy posted
> > > his idea to specify Hotpluggable-Memory range in SRAT (by firmware).
> > 
> > There may be additional work required to make this play nicely with
> > ZONE_MOVABLE but it shouldn't be anything fundamental.
> > 
> yes. And I don't know whether his idea about SRAT is accepted in the firmware community or not.
> For now, kernelcore= works enough for memory hotplug.
> 

Sounds good.

> > > And I think that
> > > other method may be introduced for node-hotplug. 
> > > 
> > 
> > Same as above really. If the node contains one zone - ZONE_MOVABLE, it
> > would work for unplugging.
> > 
> Our concern on node hotplug is "bootmem" and hashtable , pgdata, memmap etc....
> NUMA initialization (of each arch) includes something complicated.
> But this is not directly related to ZONE_MOVABLE things I think.
> It's node-hotplug problem.
> We are now considering hot-add nodes after initcalls().
> 

I don't see off-hand how it's so different from normal memory hot-add
but I'll take your word for it. I'll keep an eye out for patches related
to it.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 15:23   ` Nick Piggin
@ 2007-07-10 17:11     ` Dave McCracken
  2007-07-11  2:59       ` Nick Piggin
  2007-07-11  8:55       ` Christoph Hellwig
  2007-07-10 18:50     ` Christoph Lameter
  1 sibling, 2 replies; 30+ messages in thread
From: Dave McCracken @ 2007-07-10 17:11 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mel Gorman, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On Tuesday 10 July 2007, Nick Piggin wrote:
> On Tue, Jul 10, 2007 at 09:29:45AM -0500, Dave McCracken wrote:
> > I find myself wondering what "sufficiently convincing noises" are.  I
> > think we can all agree that in the current kernel order>0 allocations are
> > a disaster.
>
> Are they? For what the kernel currently uses them for, I don't think
> the lower order ones are so bad. Now and again we used to get reports
> of atomic order 3 allocation failures with e1000 for example, but a
> lot of those were before kswapd would properly asynchronously start
> reclaim for atomic and higher order allocations. The odd failure
> sometimes catches my eye, but nothing I would call a disaster.

Ok, maybe disaster is too strong a word.  But any kind of order>0 allocation 
still has to be approached with fear and caution, with a well tested fallback 
in the case of the inevitable failures.  How many driver writers would have 
benefited from using order>0 pages, but turned aside to other less optimal 
solutions due to their unreliability?  We don't know, and probably never 
will.  Those people have moved on and won't revisit that design decision.

> > The sheer list of patches lined up behind this set is strong evidence
> > that there are useful features which depend on a working order>0.  When
> > you add in the existing code that has to struggle with allocation
> > failures or resort to special pools (ie hugetlbfs), I see a clear vote
> > for the need for this patch.
>
> Really the only patches so far that I think have convincing reasons are
> memory unplug and hugepage, and both of those can get a long way by using
> a reserve zone (note it isn't entirely reserved, but still available for
> things like pagecache). Beyond that, is there a big demand, and do we
> want to make this fundamental change in direction in the kernel to
> satisfy that demand?

Yes, these projects have workarounds, because they have to.  But the 
workarounds are painful and often require that the user specify in advance 
what memory they intend to use for this purpose, something users often have 
to learn by trial and error.  Mel's patches would eliminate this barrier to 
use of the features.

I don't see Mel's patches as "a fundamental change in direction".  I think 
you're overstating the case.  I see it as fixing a deficiency in the design 
of the page allocator, and a long overdue fix.

> > Some object because order>0 will still be able to fail.  I point out that
> > order==0 can also fail, though we go to great lengths to prevent it.
> >  Mel's patches raise the success rate of order>0 to within a few percent
> > of order==0.  All this means is callers will need to decide how to handle
> > the infrequent failure.  This should be true no matter what the order.
>
> So small ones like order-1 and 2 seem reasonably good right now AFAIKS.
> If you perhaps want to say start using order-4 pages for slab or
> some other kernel memory allocations, then you can run into the situation
> where memory gets fragmented such that you have one sixteenth of your
> memory actually used but you can't allocate from any of your slabs because
> there are no order-4 pages left. I guess this is a big difference between
> order-low failures and order-high.

In summary, I think I can rephrase your arguments against the patches as
"order>0 allocation pretty much works now for small orders, and people are
living with it".  Is that fairly accurate?  My counter argument is that we
can easily make it work much better and vastly simplify the code that is 
having to work around the lack of it by applying Mel's patches.

Dave McCracken

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 13:03 ` Nick Piggin
  2007-07-10 13:55   ` Mel Gorman
@ 2007-07-10 18:46   ` Christoph Lameter
  2007-07-11  9:48     ` Mel Gorman
  1 sibling, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2007-07-10 18:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mel Gorman, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, linux-mm, linux-kernel

On Tue, 10 Jul 2007, Nick Piggin wrote:

> I realise in your pragmatic approach, you are encouraging users to
> put fallbacks in place in case a higher order page cannot be allocated,
> but I don't think either higher order pagecache or higher order slubs
> have such fallbacks (fsblock or a combination of fsblock and higher
> order pagecache could have, but...).

We have run mm kernels for months now without the need of a fallback. I
thought the purpose of ZONE_MOVABLE was to guarantee that higher order pages
could be reclaimed and thus make the scheme reliable?

The experience so far shows that the approach works reliably. If there are 
issues then they need to be fixed. Putting in workarounds in other places 
such as in fsblock may just be hiding problems if there are any.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 13:55   ` Mel Gorman
@ 2007-07-10 18:47     ` Christoph Lameter
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2007-07-10 18:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, linux-mm, linux-kernel

On Tue, 10 Jul 2007, Mel Gorman wrote:

> > > >  These are slub changes which are dependent on Mel's stuff, and I have a note
> > > >  here that there were reports of page allocation failures with these.  What's
> > > >  up with that?

As far as I know these were resolved by some of Mel's changes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 15:23   ` Nick Piggin
  2007-07-10 17:11     ` Dave McCracken
@ 2007-07-10 18:50     ` Christoph Lameter
  2007-07-11 10:05       ` Mel Gorman
  1 sibling, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2007-07-10 18:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave McCracken, Mel Gorman, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, linux-mm, linux-kernel

On Tue, 10 Jul 2007, Nick Piggin wrote:

> > The sheer list of patches lined up behind this set is strong evidence that 
> > there are useful features which depend on a working order>0.  When you add in 
> > the existing code that has to struggle with allocation failures or resort to 
> > special pools (ie hugetlbfs), I see a clear vote for the need for this patch.
> 
> Really the only patches so far that I think have convincing reasons are
> memory unplug and hugepage, and both of those can get a long way by using
> a reserve zone (note it isn't entirely reserved, but still available for
> things like pagecache). Beyond that, is there a big demand, and do we
> want to make this fundamental change in direction in the kernel to
> satisfy that demand?

SLUB can use it to use large order pages which generate less lock 
contention which is important in SMP systems. Large pages also increase 
the object density in slabs.

> So small ones like order-1 and 2 seem reasonably good right now AFAIKS.

Sorry no. Without the antifrag patches I had failures even with order 1 
and 2 allocs from SLUB.

> If you perhaps want to say start using order-4 pages for slab or
> some other kernel memory allocations, then you can run into the situation
> where memory gets fragmented such that you have one sixteenth of your
> memory actually used but you can't allocate from any of your slabs because
> there are no order-4 pages left. I guess this is a big difference between
> order-low failures and order-high.

The order that is readily reclaimable should be configurable.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 17:11     ` Dave McCracken
@ 2007-07-11  2:59       ` Nick Piggin
  2007-07-11 10:01         ` Mel Gorman
  2007-07-11 13:03         ` Andy Whitcroft
  2007-07-11  8:55       ` Christoph Hellwig
  1 sibling, 2 replies; 30+ messages in thread
From: Nick Piggin @ 2007-07-11  2:59 UTC (permalink / raw)
  To: Dave McCracken
  Cc: Mel Gorman, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On Tue, Jul 10, 2007 at 12:11:45PM -0500, Dave McCracken wrote:
> On Tuesday 10 July 2007, Nick Piggin wrote:
> > On Tue, Jul 10, 2007 at 09:29:45AM -0500, Dave McCracken wrote:
> > > I find myself wondering what "sufficiently convincing noises" are.  I
> > > think we can all agree that in the current kernel order>0 allocations are
> > > a disaster.
> >
> > Are they? For what the kernel currently uses them for, I don't think
> > the lower order ones are so bad. Now and again we used to get reports
> > of atomic order 3 allocation failures with e1000 for example, but a
> > lot of those were before kswapd would properly asynchronously start
> > reclaim for atomic and higher order allocations. The odd failure
> > sometimes catches my eye, but nothing I would call a disaster.
> 
> Ok, maybe disaster is too strong a word.  But any kind of order>0 allocation 
> still has to be approached with fear and caution, with a well tested fallback 
> in the case of the inevitable failures.  How many driver writers would have 
> benefited from using order>0 pages, but turned aside to other less optimal 
> solutions due to their unreliability?  We don't know, and probably never 
> will.  Those people have moved on and won't revisit that design decision.

On the other side of the coin, we can't just merge this in the hope
that some good uses might turn up (IMO).


> > > The sheer list of patches lined up behind this set is strong evidence
> > > that there are useful features which depend on a working order>0.  When
> > > you add in the existing code that has to struggle with allocation
> > > failures or resort to special pools (ie hugetlbfs), I see a clear vote
> > > for the need for this patch.
> >
> > Really the only patches so far that I think have convincing reasons are
> > memory unplug and hugepage, and both of those can get a long way by using
> > a reserve zone (note it isn't entirely reserved, but still available for
> > things like pagecache). Beyond that, is there a big demand, and do we
> > want to make this fundamental change in direction in the kernel to
> > satisfy that demand?
> 
> Yes, these projects have workarounds, because they have to.  But the 
> workarounds are painful and often require that the user specify in advance 
> what memory they intend to use for this purpose, something users often have 
> to learn by trial and error.  Mel's patches would eliminate this barrier to 
> use of the features.
> 
> I don't see Mel's patches as "a fundamental change in direction".  I think 
> you're overstating the case.  I see it as fixing a deficiency in the design 
> of the page allocator, and a long overdue fix.

I would still say that with Mel's patches in, you need to have a fallback
to order-0 because memory can still get fragmented. But no, Mel's patches
are not exactly a fundamental change in direction themselves, but introducing
higher order allocations without fallbacks is a change (OK, order 1 or 2
is used today, and mostly because of the nature of the allocator they're OK
too, but if we're talking about like 64K+ of contiguous pages).


> > > Some object because order>0 will still be able to fail.  I point out that
> > > order==0 can also fail, though we go to great lengths to prevent it.
> > >  Mel's patches raise the success rate of order>0 to within a few percent
> > > of order==0.  All this means is callers will need to decide how to handle
> > > the infrequent failure.  This should be true no matter what the order.
> >
> > So small ones like order-1 and 2 seem reasonably good right now AFAIKS.
> > If you perhaps want to say start using order-4 pages for slab or
> > some other kernel memory allocations, then you can run into the situation
> > where memory gets fragmented such that you have one sixteenth of your
> > memory actually used but you can't allocate from any of your slabs because
> > there are no order-4 pages left. I guess this is a big difference between
> > order-low failures and order-high.
> 
> In summary, I think I can rephrase your arguments against the patches as
> "order>0 allocation pretty much works now for small orders, and people are
> living with it".  Is that fairly accurate?  My counter argument is that we

Well it does work for small orders and if by living with it you mean works
OK, then yes.


> can easily make it work much better and vastly simplify the code that is 
> having to work around the lack of it by applying Mel's patches.

OK we have a lot contained in that statement :)

Make it work much better -- OK, so it should be easy to get the evidence
to justify this, then?

Vastly simplify the code -- so firstly you have to weigh this against the
increased complexity of Mel's patches, and secondly you are saying that we
can abandon fallback code? That's where we're talking about a fundamental
change in direction.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 17:11     ` Dave McCracken
  2007-07-11  2:59       ` Nick Piggin
@ 2007-07-11  8:55       ` Christoph Hellwig
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2007-07-11  8:55 UTC (permalink / raw)
  To: Dave McCracken
  Cc: Nick Piggin, Mel Gorman, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On Tue, Jul 10, 2007 at 12:11:45PM -0500, Dave McCracken wrote:
> Ok, maybe disaster is too strong a word.  But any kind of order>0 allocation 
> still has to be approached with fear and caution, with a well tested fallback 
> in the case of the inevitable failures.  How many driver writers would have 
> benefited from using order>0 pages, but turned aside to other less optimal 
> solutions due to their unreliability?  We don't know, and probably never 
> will.  Those people have moved on and won't revisit that design decision.

If you look at almost any other OS they use high-order pages quite a lot.
At least Solaris, IRIX and UnixWare do.

Also note that once we have a high-order pagecache it gives a nice way
to simply reclaim a high-order page directly :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 18:46   ` Christoph Lameter
@ 2007-07-11  9:48     ` Mel Gorman
  0 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2007-07-11  9:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, linux-mm, linux-kernel

On (10/07/07 11:46), Christoph Lameter didst pronounce:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> > I realise in your pragmatic approach, you are encouraging users to
> > put fallbacks in place in case a higher order page cannot be allocated,
> > but I don't think either higher order pagecache or higher order slubs
> > have such fallbacks (fsblock or a combination of fsblock and higher
> > order pagecache could have, but...).
> 
> We have run mm kernels for months now without the need of a fallback. I
> thought the purpose of ZONE_MOVABLE was to guarantee that higher order pages
> could be reclaimed and thus make the scheme reliable?
> 

That and they would be available within a specified limit. With grouping
pages by mobility, high order pages will be available but how many there
will be is workload dependent. This sort of predictability is
important for hugepages and memory unplug although it's of less
relevance to order-3 and order-4 users.

> The experience so far shows that the approach works reliably. If there are 
> issues then they need to be fixed. Putting in workarounds in other places 
> such as in fsblock may just be hiding problems if there are any.

I think fsblock as it stands would gain from grouping pages by mobility.
It could use high order pages where they were available and fall back to
using the slower vmap approach when they weren't. I don't see why
highorder page cache and fsblock would be mutually exclusive. For that
matter, I don't see why any of these approaches are mutually exclusive
with what Andrea is doing, other than that having more than one way of
skinning a cat in the kernel at the same time might be confusing.
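
As a rough sketch of that fall-back policy, assuming a toy allocator
rather than the real page allocator (`ToyZone`, `alloc_contig`,
`alloc_page` and `alloc_buffer` are hypothetical names, and the "mapped"
result merely stands in for a vmap-style mapping of scattered pages):

```python
class ToyZone:
    """Toy model of a zone: a set of used page frame numbers."""

    def __init__(self, total_pages, used=()):
        self.total = total_pages
        self.used = set(used)

    def alloc_contig(self, order):
        """Return an aligned run of 2**order free pages, or None."""
        size = 1 << order
        for start in range(0, self.total - size + 1, size):
            pages = range(start, start + size)
            if not any(p in self.used for p in pages):
                self.used.update(pages)
                return list(pages)
        return None

    def alloc_page(self):
        """Return the lowest free page, or None if the zone is full."""
        for p in range(self.total):
            if p not in self.used:
                self.used.add(p)
                return p
        return None


def alloc_buffer(zone, order):
    """Prefer one high-order block; otherwise gather order-0 pages."""
    pages = zone.alloc_contig(order)
    if pages is not None:
        return "contiguous", pages
    pages = []
    for _ in range(1 << order):
        p = zone.alloc_page()
        if p is None:
            # Not enough memory at all: give back what was taken.
            zone.used.difference_update(pages)
            return "failed", []
        pages.append(p)
    return "mapped", pages


# One pinned page per 16-page block defeats the order-4 attempt,
# so the caller ends up with 16 scattered order-0 pages instead.
fragmented = ToyZone(64, used=range(0, 64, 16))
kind, pages = alloc_buffer(fragmented, 4)
print(kind, len(pages))
```

The caller gets the fast path whenever a contiguous block exists and only
pays for the slower mapping when fragmentation forces it, which is the
sense in which the two approaches need not be mutually exclusive.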

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-11  2:59       ` Nick Piggin
@ 2007-07-11 10:01         ` Mel Gorman
  2007-07-11 13:03         ` Andy Whitcroft
  1 sibling, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2007-07-11 10:01 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave McCracken, Andrew Morton, kenchen, jschopp, apw,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On (11/07/07 04:59), Nick Piggin didst pronounce:
> On Tue, Jul 10, 2007 at 12:11:45PM -0500, Dave McCracken wrote:
> > On Tuesday 10 July 2007, Nick Piggin wrote:
> > > On Tue, Jul 10, 2007 at 09:29:45AM -0500, Dave McCracken wrote:
> > > > I find myself wondering what "sufficiently convincing noises" are.  I
> > > > think we can all agree that in the current kernel order>0 allocations are
> > > > a disaster.
> > >
> > > Are they? For what the kernel currently uses them for, I don't think
> > > the lower order ones are so bad. Now and again we used to get reports
> > > of atomic order 3 allocation failures with e1000 for example, but a
> > > lot of those were before kswapd would properly asynchronously start
> > > reclaim for atomic and higher order allocations. The odd failure
> > > sometimes catches my eye, but nothing I would call a disaster.
> > 
> > Ok, maybe disaster is too strong a word.  But any kind of order>0 allocation 
> > still has to be approached with fear and caution, with a well tested fallback 
> > in the case of the inevitable failures.  How many driver writers would have 
> > benefited from using order>0 pages, but turned aside to other less optimal 
> > solutions due to their unreliability?  We don't know, and probably never 
> > will.  Those people have moved on and won't revisit that design decision.
> 
> On the other side of the coin, we can't just merge this in the hope
> that some good uses might turn up (IMO).
> 

That is a catch-22. There is no point in people putting any work into
these "good uses" if they know they'll depend on grouping pages by
mobility. After watching the patches being kicked around for so long,
I'd be surprised if people put a lot of effort into implementing things
that depended on them.

Even so, memory unplug has shown up again despite needing these
patches to go through, the SLUB high-order allocation stuff is there
and *potentially* fsblock could avoid using vmap all the time if that
approach was taken.

> 
> > > > The sheer list of patches lined up behind this set is strong evidence
> > > > that there are useful features which depend on a working order>0.  When
> > > > you add in the existing code that has to struggle with allocation
> > > > failures or resort to special pools (ie hugetlbfs), I see a clear vote
> > > > for the need for this patch.
> > >
> > > Really the only patches so far that I think have convincing reasons are
> > > memory unplug and hugepage, and both of those can get a long way by using
> > > a reserve zone (note it isn't entirely reserved, but still available for
> > > things like pagecache). Beyond that, is there a big demand, and do we
> > > want to make this fundamental change in direction in the kernel to
> > > satisfy that demand?
> > 
> > Yes, these projects have workarounds, because they have to.  But the 
> > workarounds are painful and often require that the user specify in advance 
> > what memory they intend to use for this purpose, something users often have 
> > to learn by trial and error.  Mel's patches would eliminate this barrier to 
> > use of the features.
> > 
> > I don't see Mel's patches as "a fundamental change in direction".  I think 
> > you're overstating the case.  I see it as fixing a deficiency in the design 
> > of the page allocator, and a long overdue fix.
> 
> I would still say that with Mel's patches in, you need to have a fallback
> to order-0 because memory can still get fragmented.

I have not disputed this. I know stressed high-order allocations have
worked well in testing to date but I accept that some corner case is
going to exist that will cause a failure and we have to be prepared to
handle it.

> But no Mel's patches
> are not exactly a fundamental change in direction itself, but introducing
> higher order allocations without fallbacks is a change (OK, order 1 or 2
> is used today, and mostly because of the nature of the allocator they're OK
> too, but if we're talking about like 64K+ of contiguous pages).
> 

Then the changes that depend on high-order allocations succeeding, or else
the world ends, need to be checked carefully. Grouping pages by mobility
shouldn't be kicked on the grounds of what a future patch may or may not do
as it is not a fundamental change in direction on its own.  For now,
high-order users must still be prepared to handle fallbacks and we should
track how often those fallbacks are used as it's an indication of when
grouping pages by mobility is not behaving as advertised.
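
As a rough illustration of "handle fallbacks and track how often they are
used", here is a minimal userspace sketch in C. It is not kernel code and is
not taken from any of the patches: alloc_contig(), struct buffer and the two
counters are hypothetical names, and malloc() stands in for the page
allocator. The idea is only that a caller tries one contiguous high-order
block first, falls back to order-0 pages when that fails, and counts both
outcomes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Hypothetical statistics: how often the high-order path worked or not. */
static unsigned long highorder_success;
static unsigned long highorder_fallback;

/* Stand-in allocator: order>0 requests fail when "fragmented" is set. */
static void *alloc_contig(unsigned int order, bool fragmented)
{
	if (fragmented && order > 0)
		return NULL;
	return malloc(PAGE_SIZE << order);
}

struct buffer {
	unsigned int order;     /* order of each chunk actually allocated */
	unsigned int nr_chunks; /* 1 if contiguous, 2^order after fallback */
	void **chunks;
};

static int buffer_alloc(struct buffer *buf, unsigned int order, bool fragmented)
{
	unsigned int i, nr = 1U << order;
	void *p;

	assert(buf != NULL);

	/* First choice: one contiguous block of 2^order pages. */
	p = alloc_contig(order, fragmented);
	if (p) {
		buf->chunks = malloc(sizeof(void *));
		if (!buf->chunks) {
			free(p);
			return -1;
		}
		buf->chunks[0] = p;
		buf->order = order;
		buf->nr_chunks = 1;
		highorder_success++;
		return 0;
	}

	/* Fallback: 2^order separate order-0 pages, and count the event. */
	highorder_fallback++;
	buf->chunks = calloc(nr, sizeof(void *));
	if (!buf->chunks)
		return -1;
	for (i = 0; i < nr; i++) {
		buf->chunks[i] = alloc_contig(0, fragmented);
		if (!buf->chunks[i]) {
			while (i--)
				free(buf->chunks[i]);
			free(buf->chunks);
			return -1;
		}
	}
	buf->order = 0;
	buf->nr_chunks = nr;
	return 0;
}

static void buffer_free(struct buffer *buf)
{
	unsigned int i;

	for (i = 0; i < buf->nr_chunks; i++)
		free(buf->chunks[i]);
	free(buf->chunks);
	buf->chunks = NULL;
	buf->nr_chunks = 0;
}
```

Calling buffer_alloc(&b, 3, true) in this sketch yields eight order-0 chunks
and bumps highorder_fallback; a steadily rising fallback count is the kind of
signal the paragraph above suggests watching.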

In the context of high-order pagecache, I believe it can be made to work
with fsblock nicely, as I've stated elsewhere, by using vmap as a fallback
instead of the first option. SLUB using high-order allocations all the time
needs to be revisited so I would not be keen on pushing it right now because
I have the same concerns as you in mind. When failures happen for memory
unplug, it just means unplug does not occur, which is not world-ending.

> > > > Some object because order>0 will still be able to fail.  I point out that
> > > > order==0 can also fail, though we go to great lengths to prevent it.
> > > >  Mel's patches raise the success rate of order>0 to within a few percent
> > > > of order==0.  All this means is callers will need to decide how to handle
> > > > the infrequent failure.  This should be true no matter what the order.
> > >
> > > So small ones like order-1 and 2 seem reasonably good right now AFAIKS.
> > > If you perhaps want to say start using order-4  pages for slab or
> > > some other kernel memory allocations, then you can run into the situation
> > > where memory gets fragmented such that you have one sixteenth of your
> > > memory actually used but you can't allocate from any of your slabs because
> > > there are no order-4 pages left. I guess this is a big difference between
> > > order-low failures and order-high.
> > 
> > In summary, I think I can rephrase your arguments against the patches as
> > "order>0 allocation pretty much works now for small orders, and people are
> > living with it".  Is that fairly accurate?  My counter argument is that we
> 
> Well it does work for small orders and if by living with it you mean works
> OK, then yes.
> 
> 
> > can easily make it work much better and vastly simplify the code that is 
> > having to work around the lack of it by applying Mel's patches.
> 
> OK we have a lot contained in that statement :)
> 
> Make it work much better -- OK, so it should be easy to get the evidence
> to justify this, then?
> 
> Vastly simplify the code -- so firstly you have to weigh this against the
> increased complexity of Mel's patches, and secondly you are saying that we
> can abandon fallback code? That's where we're talking about a fundamental
> change in direction.
> 

I don't think fallback code should be abandoned. In a few years, when we
know fallbacks never occur in any circumstances, then maybe, but not now.
Someone using high orders needs to be sure there is a good reason for
it to justify dealing with the complexity. Some users like hugepages and
memory unplug are willing to deal with said complexity because they have to.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 18:50     ` Christoph Lameter
@ 2007-07-11 10:05       ` Mel Gorman
  0 siblings, 0 replies; 30+ messages in thread
From: Mel Gorman @ 2007-07-11 10:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Dave McCracken, Andrew Morton, kenchen, jschopp,
	apw, kamezawa.hiroyu, a.p.zijlstra, y-goto, linux-mm,
	linux-kernel

On (10/07/07 11:50), Christoph Lameter didst pronounce:
> On Tue, 10 Jul 2007, Nick Piggin wrote:
> 
> > > The sheer list of patches lined up behind this set is strong evidence that 
> > > there are useful features which depend on a working order>0.  When you add in 
> > > the existing code that has to struggle with allocation failures or resort to 
> > > special pools (ie hugetlbfs), I see a clear vote for the need for this patch.
> > 
> > Really the only patches so far that I think have convincing reasons are
> > memory unplug and hugepage, and both of those can get a long way by using
> > a reserve zone (note it isn't entirely reserved, but still available for
> > things like pagecache). Beyond that, is there a big demand, and do we
> > want to make this fundamental change in direction in the kernel to
> > satisfy that demand?
> 
> SLUB can use it to use large order pages which generate less lock 
> contention which is important in SMP systems. Large pages also increase 
> the object density in slabs.
> 

And this should be a measurable benefit at least.

> > So small ones like order-1 and 2 seem reasonably good right now AFAIKS.
> 
> Sorry no. Without the antifrag patches I had failures even with order 1 
> and 2 allocs from SLUB.
> 

Which doesn't really surprise me. Order-1 and order-2 allocations succeed
today because there are so few users of them. Order-1 is sometimes used
for stacks and by the occasional wireless driver rather than on a regular
basis.

We still need to revisit the watermark handling before pushing this
aspect of SLUB forward but I'd like to see grouping pages by mobility go
forward first so the work is not pie-in-the-sky. I'm sure it can be
handled better than what we do in -mm today but it's good to know we
have a comparison point. The first stab at watermark handling may not be
the best approach but it's held up pretty well so far.

> > If you perhaps want to say start using order-4  pages for slab or
> > some other kernel memory allocations, then you can run into the situation
> > where memory gets fragmented such that you have one sixteenth of your
> > memory actually used but you can't allocate from any of your slabs because
> > there are no order-4 pages left. I guess this is a big difference between
> > order-low failures and order-high.
> 
> The order that is readily reclaimable should be configurable.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-11  2:59       ` Nick Piggin
  2007-07-11 10:01         ` Mel Gorman
@ 2007-07-11 13:03         ` Andy Whitcroft
  1 sibling, 0 replies; 30+ messages in thread
From: Andy Whitcroft @ 2007-07-11 13:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave McCracken, Mel Gorman, Andrew Morton, kenchen, jschopp,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On Wed, Jul 11, 2007 at 04:59:46AM +0200, Nick Piggin wrote:

> > Yes, these projects have workarounds, because they have to.  But the 
> > workarounds are painful and often require that the user specify in advance 
> > what memory they intend to use for this purpose, something users often have 
> > to learn by trial and error.  Mel's patches would eliminate this barrier to 
> > use of the features.
> > 
> > I don't see Mel's patches as "a fundamental change in direction".  I think 
> > you're overstating the case.  I see it as fixing a deficiency in the design 
> > of the page allocator, and a long overdue fix.
> 
> I would still say that with Mel's patches in, you need to have a fallback
> to order-0 because memory can still get fragmented. But no Mel's patches
> are not exactly a fundamental change in direction itself, but introducing
> higher order allocations without fallbacks is a change (OK, order 1 or 2
> is used today, and mostly because of the nature of the allocator they're OK
> too, but if we're talking about like 64K+ of contiguous pages).

However much one improves fragmentation the chances of finding a
higher order page are always going to be lower than those of getting
an order-0; there are fewer of them for a start.  It is pretty much
inevitable that you would want to have a fallback for anything
which is critical for system continuation.  The thrust of the
anti-fragmentation work is not to claim a guaranteed availability but
to expand the range of orders over which we find a high probability
of availability.  As you say elsewhere orders 0-2 pretty much work
with buddy even in the face of random allocation, where intuition
might indicate it should not.  Simplistic reclaim can find us a page.
Indeed you then find that the kernel uses those sizes in preference
to order-0 for simplicity as they can be pretty much relied on, as
is done with the process kernel stacks.  Much of what is proposed
as uses for this work is an extension of this, using bigger pages
where available for performance or simplicity.  SLUB as an example is
making use of the fact that general availability of near zero order
is virtually guaranteed.  Obviously as order increases certainty
decreases and you have to trade off the ramifications of failure
to allocate against the cost of handling that failure.
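
To make the "fewer of them" point concrete, the following is a small
userspace sketch in C, assuming a toy memory of pages tracked in a bool
array. It is not how the buddy allocator keeps its free lists;
count_free_blocks() and fragment_every_16th() are hypothetical helpers that
simply count naturally aligned, entirely free blocks of 2^order pages. The
second helper builds the case Nick describes elsewhere in the thread: one
page in use in every 16-page block, so only one sixteenth of memory is used
yet no order-4 block remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count naturally aligned blocks of 2^order pages in which every page
 * is free. used[i] is true when page i is allocated.
 */
static size_t count_free_blocks(const bool *used, size_t nr_pages,
				unsigned int order)
{
	size_t block, start, i, count = 0;

	assert(order < 8 * sizeof(size_t));
	block = (size_t)1 << order;

	for (start = 0; start + block <= nr_pages; start += block) {
		bool all_free = true;

		for (i = 0; i < block; i++) {
			if (used[start + i]) {
				all_free = false;
				break;
			}
		}
		if (all_free)
			count++;
	}
	return count;
}

/* Mark the first page of every aligned 16-page block as in use. */
static void fragment_every_16th(bool *used, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		used[i] = (i % 16 == 0);
}
```

With 256 pages fragmented this way, 240 pages are still free at order-0, the
count of free aligned blocks drops at each higher order, and at order-4 it
reaches zero.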

Specifically thinking about the pagecache, a fusion of Christoph's
high order pagecache with fsblock's ability to handle discontiguous
pages at higher order sounds like it could be a very powerful
solution to both problems, offering contiguous pages where available
and working regardless where not.

-apw


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-10 10:20 -mm merge plans -- anti-fragmentation Mel Gorman
                   ` (3 preceding siblings ...)
  2007-07-10 14:29 ` Dave McCracken
@ 2007-07-12 19:29 ` Andrew Morton
  2007-07-12 21:32   ` Mel Gorman
  2007-07-13 10:20   ` -mm merge plans -- anti-fragmentation Andy Whitcroft
  4 siblings, 2 replies; 30+ messages in thread
From: Andrew Morton @ 2007-07-12 19:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: npiggin, kenchen, jschopp, apw, kamezawa.hiroyu, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On Tue, 10 Jul 2007 11:20:43 +0100
mel@skynet.ie (Mel Gorman) wrote:

> > create-the-zone_movable-zone.patch
> > allow-huge-page-allocations-to-use-gfp_high_movable.patch
> > handle-kernelcore=-generic.patch
> > 
> >  Mel's moveable-zone work.  In a similar situation.  We need to stop whatever
> >  we're doing and get down and work out what we're going to do with all this
> >  stuff.
> > 
> 
> Whatever about grouping pages by mobility, I would like to see these go
> through. They have a real application for hugetlb pool resizing where the
> administrator knows the range of hugepages that will be required but doesn't
> want to waste memory when the required number of hugepages is small. I've
> cc'd Kenneth Chen as I believe he has run into this problem recently where
> I believe partitioning memory would have helped. He'll either confirm or deny.

Still no decision here, really.

Should we at least go for

add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
create-the-zone_movable-zone.patch
allow-huge-page-allocations-to-use-gfp_high_movable.patch
handle-kernelcore=-generic.patch

in 2.6.23?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-12 19:29 ` Andrew Morton
@ 2007-07-12 21:32   ` Mel Gorman
  2007-07-13 15:56     ` [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE Mel Gorman
  2007-07-13 10:20   ` -mm merge plans -- anti-fragmentation Andy Whitcroft
  1 sibling, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2007-07-12 21:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: npiggin, kenchen, jschopp, apw, kamezawa.hiroyu, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On (12/07/07 12:29), Andrew Morton didst pronounce:
> On Tue, 10 Jul 2007 11:20:43 +0100
> mel@skynet.ie (Mel Gorman) wrote:
> 
> > > create-the-zone_movable-zone.patch
> > > allow-huge-page-allocations-to-use-gfp_high_movable.patch
> > > handle-kernelcore=-generic.patch
> > > 
> > >  Mel's moveable-zone work.  In a similar situation.  We need to stop whatever
> > >  we're doing and get down and work out what we're going to do with all this
> > >  stuff.
> > > 
> > 
> > Whatever about grouping pages by mobility, I would like to see these go
> > through. They have a real application for hugetlb pool resizing where the
> > administrator knows the range of hugepages that will be required but doesn't
> > want to waste memory when the required number of hugepages is small. I've
> > cc'd Kenneth Chen as I believe he has run into this problem recently where
> > I believe partitioning memory would have helped. He'll either confirm or deny.
> 
> Still no decision here, really.
> 
> Should we at least go for
> 
> add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
> create-the-zone_movable-zone.patch
> allow-huge-page-allocations-to-use-gfp_high_movable.patch
> handle-kernelcore=-generic.patch
> 
> in 2.6.23?

Well, yes please from me obviously :) . There is one additional patch
I would like to send on tomorrow and that is providing the movablecore=
switch as well as kernelcore=. This is based on Nick's feedback where he
felt the configuration item might not be ideal in all situations -
Yasunori Goto agreed with him. I've posted a candidate patch and Andy had
a minor problem with it that I will correct.

While I would of course like grouping pages by mobility to go in as well,
I recognise that it probably needs a resubmission to -mm so people can take
another look in the next cycle.

On the positive side, just these patches gain us a few things:

1. A zone that the huge page pool can likely grow into at runtime. On
batch systems between jobs, the next job owner could grow the pool to
the size of ZONE_MOVABLE with reasonable reliability. This means an
administrator can set the zone to be a given size and let users decide
for themselves what size the hugepage pool will be. This gives us a
fairly reliable pool without the downside of wasting memory. Talking
to Kenneth Chen at OLS led me to believe that this would be a useful
feature in real world situations. He's been quiet at the moment so
hopefully this will nudge him into saying something.

2. It does help the memory unplug case to some extent. The page
isolation code in that patchset does depend on grouping pages by
mobility but I could cut down grouping pages by mobility to *just* the
parts they need as a starting point.

3. In contrast to grouping pages by mobility, you know well in advance how
many hugepages are likely to be allocated. The success rates of grouping
pages by mobility on its own are workload-dependent.

4. The zone is lower risk than grouping pages by mobility. It's less
complicated, the complexity is at the side and the code at runtime is the
same as today's.

So it's lower risk than grouping pages by mobility, has predictable behaviour
and helps some cases.  As Nick points out as well, we can see how far we can
get with just this reserve zone without taking the full plunge with grouping
pages by mobility.
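
For a sense of scale on points 1 and 3, here is a minimal userspace sketch in
C that turns a size string of the form nn[KMG] into bytes and derives an
upper bound on the huge page pool. parse_size() only imitates the spirit of
the kernel's memparse() and max_hugepages() is a hypothetical helper; the
huge page sizes used with it are assumptions, since the real size depends on
the architecture. The result is a ceiling only: how many pages can actually
be allocated still depends on reclaim succeeding.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "nn[KMG]" into bytes; an unknown suffix is ignored. */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long val;

	assert(s != NULL);
	val = strtoull(s, &end, 0);

	switch (*end) {
	case 'G':
	case 'g':
		val <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		val <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		val <<= 10;
		break;
	default:
		break;
	}
	return val;
}

/* Upper bound on huge pages that fit in a movable zone of the given size. */
static unsigned long max_hugepages(unsigned long long zone_movable_bytes,
				   unsigned long long hugepage_bytes)
{
	assert(hugepage_bytes != 0);
	return (unsigned long)(zone_movable_bytes / hugepage_bytes);
}
```

Under these assumptions, a 2G movable zone with 2 MiB huge pages bounds the
pool at 1024 pages, which is the "known well in advance" figure an
administrator would size against.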

Hopefully other people will throw their 2 cents in here too.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-12 19:29 ` Andrew Morton
  2007-07-12 21:32   ` Mel Gorman
@ 2007-07-13 10:20   ` Andy Whitcroft
  2007-07-13 16:58     ` Christoph Lameter
  2007-07-13 17:02     ` Nish Aravamudan
  1 sibling, 2 replies; 30+ messages in thread
From: Andy Whitcroft @ 2007-07-13 10:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, npiggin, kenchen, jschopp, kamezawa.hiroyu,
	a.p.zijlstra, y-goto, clameter, linux-mm, linux-kernel

Andrew Morton wrote:
> On Tue, 10 Jul 2007 11:20:43 +0100
> mel@skynet.ie (Mel Gorman) wrote:
> 
>>> create-the-zone_movable-zone.patch
>>> allow-huge-page-allocations-to-use-gfp_high_movable.patch
>>> handle-kernelcore=-generic.patch
>>>
>>>  Mel's moveable-zone work.  In a similar situation.  We need to stop whatever
>>>  we're doing and get down and work out what we're going to do with all this
>>>  stuff.
>>>
>> Whatever about grouping pages by mobility, I would like to see these go
>> through. They have a real application for hugetlb pool resizing where the
>> administrator knows the range of hugepages that will be required but doesn't
>> want to waste memory when the required number of hugepages is small. I've
>> cc'd Kenneth Chen as I believe he has run into this problem recently where
>> I believe partitioning memory would have helped. He'll either confirm or deny.
> 
> Still no decision here, really.
> 
> Should we at least go for
> 
> add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
> create-the-zone_movable-zone.patch
> allow-huge-page-allocations-to-use-gfp_high_movable.patch
> handle-kernelcore=-generic.patch
> 
> in 2.6.23?

These patches are pretty simple and self-contained utilising the
existing zone infrastructure.  They provide a significant degree of
placement control when configured, which gives a lot of the benefits of
grouping-pages-by-mobility.  Merging these would seem like a low-risk
option.

Having a degree of placement control as delivered by ZONE_MOVABLE
greatly increases the effectiveness of lumpy reclaim at higher orders.
These patches plus lumpy would (IMO) provide a good base for further
development.  In particular I would envisage better usability for
hugepage users in terms of simpler configuration.

I would like to see ZONE_MOVABLE and lumpy considered for 2.6.23.

-apw


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE
  2007-07-12 21:32   ` Mel Gorman
@ 2007-07-13 15:56     ` Mel Gorman
  2007-07-14  8:28       ` Nick Piggin
  0 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2007-07-13 15:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: npiggin, kenchen, jschopp, apw, kamezawa.hiroyu, a.p.zijlstra,
	y-goto, clameter, linux-mm, linux-kernel

On (12/07/07 22:32), Mel Gorman didst pronounce:

> > Should we at least go for
> > 
> > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
> > create-the-zone_movable-zone.patch
> > allow-huge-page-allocations-to-use-gfp_high_movable.patch
> > handle-kernelcore=-generic.patch
> > 
> > in 2.6.23?
> 
> Well, yes please from me obviously :) . There is one additional patch
> I would like to send on tomorrow and that is providing the movablecore=

This is the patch. It has been boot-tested on a number of machines and
behaves as expected. Nick, with this in addition, do you have any
objection to the ZONE_MOVABLE patches going through to 2.6.23?

Thanks

=====
This patch adds a new parameter for sizing ZONE_MOVABLE called
movablecore=. While kernelcore= is used to specify the minimum amount of
memory that must be available for all allocation types, movablecore= is
used to specify the minimum amount of memory that is used for migratable
allocations. The amount of memory used for migratable allocations determines
how large the huge page pool could be dynamically resized to at runtime
for example.

How movablecore is actually handled is that the total number of pages in
the system is calculated and a value is set for kernelcore that is

kernelcore == totalpages - movablecore

Both kernelcore= and movablecore= can be safely specified at the same time.
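
The calculation can be sketched in userspace C as below. This mirrors the
logic in the diff that follows but is not the kernel code:
effective_kernelcore() and roundup_pages() are hypothetical names,
MAX_ORDER_NR_PAGES is assumed to be 1024 pages (the real value depends on the
configuration), and the clamp of movablecore to totalpages is an addition of
this sketch to avoid unsigned wrap-around, not something the patch does. All
values are page counts and 0 means "not specified".

```c
#include <assert.h>

/* Assumed block size in pages; the real value is configuration dependent. */
#define MAX_ORDER_NR_PAGES 1024UL

/* Round x up to the next multiple of align. */
static unsigned long roundup_pages(unsigned long x, unsigned long align)
{
	assert(align != 0);
	return ((x + align - 1) / align) * align;
}

/*
 * Return the kernelcore value, in pages, that results from the two
 * parameters. A return of 0 means no ZONE_MOVABLE would be set up.
 */
static unsigned long effective_kernelcore(unsigned long totalpages,
					  unsigned long kernelcore,
					  unsigned long movablecore)
{
	unsigned long corepages;

	if (!movablecore)
		return kernelcore;

	/* Round up so the movable zone is at least as large as requested. */
	movablecore = roundup_pages(movablecore, MAX_ORDER_NR_PAGES);

	/* Sketch-only clamp: keep the subtraction below from wrapping. */
	if (movablecore > totalpages)
		movablecore = totalpages;

	corepages = totalpages - movablecore;

	/* An explicit kernelcore wins if it asks for more kernel memory. */
	return kernelcore > corepages ? kernelcore : corepages;
}
```

For example, with 262144 pages in total (1 GiB of 4 KiB pages) and a
movablecore of 65536 pages, kernelcore becomes 196608 pages; if kernelcore
was also given as 229376 pages, that larger value is kept and the movable
zone ends up smaller than requested.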

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>

---
 Documentation/kernel-parameters.txt |   10 +++++
 mm/page_alloc.c                     |   65 ++++++++++++++++++++++++++++++++----
 2 files changed, 68 insertions(+), 7 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-zonemovable/Documentation/kernel-parameters.txt linux-2.6.22-movablecore/Documentation/kernel-parameters.txt
--- linux-2.6.22-zonemovable/Documentation/kernel-parameters.txt	2007-07-09 11:50:18.000000000 +0100
+++ linux-2.6.22-movablecore/Documentation/kernel-parameters.txt	2007-07-10 11:38:04.000000000 +0100
@@ -850,6 +850,16 @@ and is between 256 and 4096 characters. 
 			use the HighMem zone if it exists, and the Normal
 			zone if it does not.
 
+	movablecore=nn[KMG]	[KNL,IA-32,IA-64,PPC,X86-64] This parameter
+			is similar to kernelcore except it specifies the
+			amount of memory used for migratable allocations.
+			If both kernelcore and movablecore is specified,
+			then kernelcore will be at *least* the specified
+			value but may be more. If movablecore on its own
+			is specified, the administrator must be careful
+			that the amount of memory usable for all allocations
+			is not too small.
+
 	keepinitrd	[HW,ARM]
 
 	kstack=N	[IA-32,X86-64] Print N words from the kernel stack
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-zonemovable/mm/page_alloc.c linux-2.6.22-movablecore/mm/page_alloc.c
--- linux-2.6.22-zonemovable/mm/page_alloc.c	2007-07-09 11:50:18.000000000 +0100
+++ linux-2.6.22-movablecore/mm/page_alloc.c	2007-07-13 10:37:37.000000000 +0100
@@ -137,6 +137,7 @@ static unsigned long __meminitdata dma_r
   unsigned long __initdata node_boundary_end_pfn[MAX_NUMNODES];
 #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
   unsigned long __initdata required_kernelcore;
+  unsigned long __initdata required_movablecore;
   unsigned long __initdata zone_movable_pfn[MAX_NUMNODES];
 
   /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
@@ -2980,6 +2981,18 @@ unsigned long __init find_max_pfn_with_a
 	return max_pfn;
 }
 
+unsigned long __init early_calculate_totalpages(void)
+{
+	int i;
+	unsigned long totalpages = 0;
+
+	for (i = 0; i < nr_nodemap_entries; i++)
+		totalpages += early_node_map[i].end_pfn -
+						early_node_map[i].start_pfn;
+
+	return totalpages;
+}
+
 /*
  * Find the PFN the Movable zone begins in each node. Kernel memory
  * is spread evenly between nodes as long as the nodes have enough
@@ -2993,6 +3006,29 @@ void __init find_zone_movable_pfns_for_n
 	unsigned long kernelcore_node, kernelcore_remaining;
 	int usable_nodes = num_online_nodes();
 
+	/*
+	 * If movablecore was specified, calculate what size of
+	 * kernelcore that corresponds so that memory usable for
+	 * any allocation type is evenly spread. If both kernelcore
+	 * and movablecore are specified, then the value of kernelcore
+	 * will be used for required_kernelcore if it's greater than
+	 * what movablecore would have allowed.
+	 */
+	if (required_movablecore) {
+		unsigned long totalpages = early_calculate_totalpages();
+		unsigned long corepages;
+
+		/*
+		 * Round-up so that ZONE_MOVABLE is at least as large as what
+		 * was requested by the user
+		 */
+		required_movablecore =
+			roundup(required_movablecore, MAX_ORDER_NR_PAGES);
+		corepages = totalpages - required_movablecore;
+
+		required_kernelcore = max(required_kernelcore, corepages);
+	}
+
 	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
 	if (!required_kernelcore)
 		return;
@@ -3173,26 +3209,41 @@ void __init free_area_init_nodes(unsigne
 	}
 }
 
-/*
- * kernelcore=size sets the amount of memory for use for allocations that
- * cannot be reclaimed or migrated.
- */
-static int __init cmdline_parse_kernelcore(char *p)
+static int __init cmdline_parse_core(char *p, unsigned long *core)
 {
 	unsigned long long coremem;
 	if (!p)
 		return -EINVAL;
 
 	coremem = memparse(p, &p);
-	required_kernelcore = coremem >> PAGE_SHIFT;
+	*core = coremem >> PAGE_SHIFT;
 
-	/* Paranoid check that UL is enough for required_kernelcore */
+	/* Paranoid check that UL is enough for the coremem value */
 	WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX);
 
 	return 0;
 }
 
+/*
+ * kernelcore=size sets the amount of memory for use for allocations that
+ * cannot be reclaimed or migrated.
+ */
+static int __init cmdline_parse_kernelcore(char *p)
+{
+	return cmdline_parse_core(p, &required_kernelcore);
+}
+
+/*
+ * movablecore=size sets the amount of memory for use for allocations that
+ * can be reclaimed or migrated.
+ */
+static int __init cmdline_parse_movablecore(char *p)
+{
+	return cmdline_parse_core(p, &required_movablecore);
+}
+
 early_param("kernelcore", cmdline_parse_kernelcore);
+early_param("movablecore", cmdline_parse_movablecore);
 
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-13 10:20   ` -mm merge plans -- anti-fragmentation Andy Whitcroft
@ 2007-07-13 16:58     ` Christoph Lameter
  2007-07-13 17:02     ` Nish Aravamudan
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Lameter @ 2007-07-13 16:58 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Andrew Morton, Mel Gorman, npiggin, kenchen, jschopp,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, linux-mm, linux-kernel

On Fri, 13 Jul 2007, Andy Whitcroft wrote:

> I would like to see ZONE_MOVABLE and lumpy considered for 2.6.23.

Agree. ZONE_MOVABLE is a way to guarantee a reclaimable memory area which 
is beneficial for the antifrag approach (and it will help to get more 
reliable allocations of higher order pages in SLUB if one chooses to 
use these...).


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: -mm merge plans -- anti-fragmentation
  2007-07-13 10:20   ` -mm merge plans -- anti-fragmentation Andy Whitcroft
  2007-07-13 16:58     ` Christoph Lameter
@ 2007-07-13 17:02     ` Nish Aravamudan
  1 sibling, 0 replies; 30+ messages in thread
From: Nish Aravamudan @ 2007-07-13 17:02 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Andrew Morton, Mel Gorman, npiggin, kenchen, jschopp,
	kamezawa.hiroyu, a.p.zijlstra, y-goto, clameter, linux-mm,
	linux-kernel

On 7/13/07, Andy Whitcroft <apw@shadowen.org> wrote:
> Andrew Morton wrote:
> > On Tue, 10 Jul 2007 11:20:43 +0100
> > mel@skynet.ie (Mel Gorman) wrote:
> >
> >>> create-the-zone_movable-zone.patch
> >>> allow-huge-page-allocations-to-use-gfp_high_movable.patch
> >>> handle-kernelcore=-generic.patch
> >>>
> >>>  Mel's moveable-zone work.  In a similar situation.  We need to stop whatever
> >>>  we're doing and get down and work out what we're going to do with all this
> >>>  stuff.
> >>>
> >> Whatever about grouping pages by mobility, I would like to see these go
> >> through. They have a real application for hugetlb pool resizing where the
> >> administrator knows the range of hugepages that will be required but doesn't
> >> want to waste memory when the required number of hugepages is small. I've
> >> cc'd Kenneth Chen as I believe he has run into this problem recently where
> >> I believe partitioning memory would have helped. He'll either confirm or deny.
> >
> > Still no decision here, really.
> >
> > Should we at least go for
> >
> > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
> > create-the-zone_movable-zone.patch
> > allow-huge-page-allocations-to-use-gfp_high_movable.patch
> > handle-kernelcore=-generic.patch
> >
> > in 2.6.23?
>
> These patches are pretty simple and self-contained utilising the
> existing zone infrastructure.  They provide a significant degree of
> placement control when configured, which gives a lot of the benefits of
> grouping-pages-by-mobility.  Merging these would seem like a low-risk
> option.
>
> Having a degree of placement control as delivered by ZONE_MOVABLE
> greatly increases the effectiveness of lumpy reclaim at higher orders.
> These patches plus lumpy would (IMO) provide a good base for further
> development.  In particular I would envisage better usability for
> hugepage users in terms of simpler configuration.

This is also where I (as a libhugetlbfs maintainer/developer) see
these patches being very helpful (for example, see Adam Litke's recent
posting on resizing the hugepage pool dynamically). Making hugepages
"easier" to use -- and in this case that means more likely to
successfully resize the hugepage pool at run-time -- is a good thing.

> I would like to see ZONE_MOVABLE and lumpy considered for 2.6.23.

Ack.

Thanks,
Nish


* Re: [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE
  2007-07-13 15:56     ` [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE Mel Gorman
@ 2007-07-14  8:28       ` Nick Piggin
  2007-07-14 13:02         ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Nick Piggin @ 2007-07-14  8:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, kenchen, jschopp, apw, kamezawa.hiroyu,
	a.p.zijlstra, y-goto, clameter, linux-mm, linux-kernel

On Fri, Jul 13, 2007 at 04:56:10PM +0100, Mel Gorman wrote:
> On (12/07/07 22:32), Mel Gorman didst pronounce:
> 
> > > Should we at least go for
> > > 
> > > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
> > > create-the-zone_movable-zone.patch
> > > allow-huge-page-allocations-to-use-gfp_high_movable.patch
> > > handle-kernelcore=-generic.patch
> > > 
> > > in 2.6.23?
> > 
> > Well, yes please from me obviously :) . There is one additional patch
> > I would like to send on tomorrow and that is providing the movablecore=
> 
> This is the patch. It has been boot-tested on a number of machines and
> behaves as expected. Nick, with this in addition, do you have any
> objection to the ZONE_MOVABLE patches going through to 2.6.23?

What's the status of making it configurable? I haven't seen anything
in -mm for that yet.

But that's not as important as ensuring the concept and user-visible
stuff are in good shape, which I no longer have any problems with. So
yeah I think it would be good to get this in and get people up and
running with it.


* Re: [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE
  2007-07-14  8:28       ` Nick Piggin
@ 2007-07-14 13:02         ` Mel Gorman
  2007-07-15 13:47           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 30+ messages in thread
From: Mel Gorman @ 2007-07-14 13:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, kenchen, jschopp, apw, kamezawa.hiroyu,
	a.p.zijlstra, y-goto, clameter, linux-mm, linux-kernel

On (14/07/07 10:28), Nick Piggin didst pronounce:
> On Fri, Jul 13, 2007 at 04:56:10PM +0100, Mel Gorman wrote:
> > On (12/07/07 22:32), Mel Gorman didst pronounce:
> > 
> > > > Should we at least go for
> > > > 
> > > > add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
> > > > create-the-zone_movable-zone.patch
> > > > allow-huge-page-allocations-to-use-gfp_high_movable.patch
> > > > handle-kernelcore=-generic.patch
> > > > 
> > > > in 2.6.23?
> > > 
> > > Well, yes please from me obviously :) . There is one additional patch
> > > I would like to send on tomorrow and that is providing the movablecore=
> > 
> > This is the patch. It has been boot-tested on a number of machines and
> > behaves as expected. Nick, with this in addition, do you have any
> > objection to the ZONE_MOVABLE patches going through to 2.6.23?
> 
> What's the status of making it configurable? I haven't seen anything
> in -mm for that yet.
> 

I have a patch that makes it configurable but Kamezawa-san posted a very
promising patch about making all zones configurable in a very clever way
which is more general than what I did. He posted it as an RFC[1] and there
was feedback from Andy Whitcroft on how it could be made better so it wouldn't
have been picked up for -mm but something is in the pipeline.

I've tested his patch for zone movable and it worked as advertised, so I
intended to see after the merge window what else could be done with it in
terms of clean-up. I am curious to see if it can also make ZONE_NORMAL
configurable on machines that only have ZONE_DMA, for example.

> But that's not as important as ensuring the concept and user-visible
> stuff are in good shape, which I no longer have any problems with.

Excellent.

> So
> yeah I think it would be good to get this in and get people up and
> running with it.

Thanks Nick.

[1] http://marc.info/?l=linux-mm&m=118405871911268&w=2

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE
  2007-07-14 13:02         ` Mel Gorman
@ 2007-07-15 13:47           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 30+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-15 13:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: npiggin, akpm, kenchen, jschopp, apw, a.p.zijlstra, y-goto,
	clameter, linux-mm, linux-kernel

On Sat, 14 Jul 2007 14:02:08 +0100
mel@skynet.ie (Mel Gorman) wrote:
> > What's the status of making it configurable? I haven't seen anything
> > in -mm for that yet.
> > 
> 
> I have a patch that makes it configurable but Kamezawa-san posted a very
> promising patch about making all zones configurable in a very clever way
> which is more general than what I did. He posted it as an RFC[1] and there
> was feedback from Andy Whitcroft on how it could be made better so it wouldn't
> have been picked up for -mm but something is in the pipeline.
> 
I'll post it against the newest -mm when I can.

> I've tested his patch for zone movable and it worked as advertised, so I
> intended to see after the merge window what else could be done with it in
> terms of clean-up. I am curious to see if it can also make ZONE_NORMAL
> configurable on machines that only have ZONE_DMA, for example.
> 
I think it is an interesting idea. Hmm....

-Kame


End of thread (latest message 2007-07-15 13:47 UTC).

Thread overview: 30+ messages
2007-07-10 10:20 -mm merge plans -- anti-fragmentation Mel Gorman
2007-07-10 11:01 ` KAMEZAWA Hiroyuki
2007-07-10 11:12   ` Mel Gorman
2007-07-10 11:38     ` KAMEZAWA Hiroyuki
2007-07-10 15:50       ` Mel Gorman
2007-07-10 11:04 ` Peter Zijlstra
2007-07-10 13:24   ` Mel Gorman
2007-07-10 13:03 ` Nick Piggin
2007-07-10 13:55   ` Mel Gorman
2007-07-10 18:47     ` Christoph Lameter
2007-07-10 18:46   ` Christoph Lameter
2007-07-11  9:48     ` Mel Gorman
2007-07-10 14:29 ` Dave McCracken
2007-07-10 15:23   ` Nick Piggin
2007-07-10 17:11     ` Dave McCracken
2007-07-11  2:59       ` Nick Piggin
2007-07-11 10:01         ` Mel Gorman
2007-07-11 13:03         ` Andy Whitcroft
2007-07-11  8:55       ` Christoph Hellwig
2007-07-10 18:50     ` Christoph Lameter
2007-07-11 10:05       ` Mel Gorman
2007-07-12 19:29 ` Andrew Morton
2007-07-12 21:32   ` Mel Gorman
2007-07-13 15:56     ` [PATCH] Add a movablecore= parameter for sizing ZONE_MOVABLE Mel Gorman
2007-07-14  8:28       ` Nick Piggin
2007-07-14 13:02         ` Mel Gorman
2007-07-15 13:47           ` KAMEZAWA Hiroyuki
2007-07-13 10:20   ` -mm merge plans -- anti-fragmentation Andy Whitcroft
2007-07-13 16:58     ` Christoph Lameter
2007-07-13 17:02     ` Nish Aravamudan
