* About the free page pool
@ 2002-09-02 19:50 Scott Kaplan
2002-09-02 20:33 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Scott Kaplan @ 2002-09-02 19:50 UTC (permalink / raw)
To: linux-mm
Yet another question as I try to get a clear picture of the nitty-gritty
details of the VM...
How important is it to maintain a list of free pages? That is, how
critical is it that there be some pool of free pages from which the only
bookkeeping required is the removal of that page from the free list?
In contrast, how awful would the following be: Keep no free list, but
instead ensure that some portion of the trailing end of the inactive list
contains clean pages that are ready to be reclaimed. When a free page is
needed, just unmap that clean, inactive page and use *that* as your free
page. Clearly some more bookkeeping is required to unmap the page (assume
that rmap is available to make that a straightforward task) than there
would be simply to remove the page from the free list. However, for every
page on the free list, that unmapping work had to happen previously anyway.
..
(Of course, the above scenario assumes that main memory is full. If there
are unused page frames, then certainly you would consult a list of those
first.)
Are there moments at which pages need to be allocated *so quickly* that
unmapping the page at allocation time is too costly? Or is there some
other reason for maintaining a free list that I'm completely missing?
Also, how large is the free list of pages now? 5% of the main memory
space? A fixed number of page frames?
As always, thanks for the feedback and insights.
Scott
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: About the free page pool
2002-09-02 19:50 About the free page pool Scott Kaplan
@ 2002-09-02 20:33 ` Andrew Morton
2002-09-02 20:50 ` Rik van Riel
2002-09-02 21:58 ` Scott Kaplan
0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2002-09-02 20:33 UTC (permalink / raw)
To: Scott Kaplan; +Cc: linux-mm
Scott Kaplan wrote:
>
> Yet another question as I try to get a clear picture of the nitty-gritty
> details of the VM...
>
> How important is it to maintain a list of free pages? That is, how
> critical is it that there be some pool of free pages from which the only
> bookkeeping required is the removal of that page from the free list.
There are several reasons, all messy.
- We need to be able to allocate pages at interrupt time. Mainly
for networking receive.
Similarly, we sometimes need to be able to allocate pages from under
spinlocks. In a context where we cannot legally take the locks or
perform the functions which page reclaim wants to do.
- We sometimes need to allocate memory from *within* the context of
page reclaim: find a dirty page on the LRU, need to write it out,
need to allocate some memory to start the IO. Where does that
memory come from?
- The kernel frequently needs to perform higher-order allocations:
two or more physically-contiguous pages. The way we agglomerate
0-order pages into higher-order pages is by coalescing them in the
buddy. If _all_ "free" pages are out on an LRU somewhere, we don't
have a higher-order pool to draw from.
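The coalescing step in that last point can be sketched with the classic buddy arithmetic: the buddy of a free block is found by flipping a single bit of its page-frame number, and a successful merge clears that bit. A toy illustration, not the kernel's actual page_alloc code; the helper names are invented:

```c
#include <stddef.h>

/* Toy buddy arithmetic: a free block of 2^order pages starting at
 * page-frame number pfn has exactly one buddy of the same size, found
 * by flipping bit `order` of pfn.  If both halves are free they merge
 * into one block of order+1 whose pfn has that bit cleared. */

static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

static unsigned long merged_pfn(unsigned long pfn, unsigned int order)
{
    return pfn & ~(1UL << order);
}
```

So freeing page 5 finds buddy 4; if 4 is also free, the pair becomes the order-1 block starting at 4, and the same test repeats at order 1. If every "free" page were instead parked out on an LRU, no merging could ever happen and higher-order requests would starve.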
> In contrast, how awful would the following be: Keep no free list, but
> instead ensure that some portion of the trailing end of the inactive list
> contains clean pages that are ready to be reclaimed. When a free page is
> needed, just unmap that clean, inactive page and use *that* as your free
> page. Clearly some more bookkeeping is required to unmap the page (assume
> that rmap is available to make that a straightforward task) than there
> would be simply to remove the page from the free list. However, for every
> page on the free list, that unmapping work had to happen previously anyway.
> ..
It's feasible. It'd take some work. Probably it would best be implemented
via a third list. That list would be protected by an IRQ-safe lock,
so reclaim from interrupt context would be OK. The rmap unmapping code
would need to be interrupt-safe too (probably). That's fairly straightforward,
but has subtleties between SMP and uniprocessor. spin_trylock() doesn't do
anything on UP.
The higher-order page thing seems to be the biggest problem.
> Are there moments at which pages need to be allocated *so quickly* that
> unmapping the page at allocation time is too costly? Or is there some
> other reason for maintaining a free list that I'm completely missing?
Interrupt-time allocations need to have minimum latency. Incremental
latency in the page allocator will add directly to interrupt latency.
> Also, how large is the free list of pages now? 5% of the main memory
> space? A fixed number of page frames?
>
It's a ratio of the zone size, and there are a few thresholds in there,
for hysteresis, for emergency allocations, etc. See free_area_init_core().
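For 2.4-era kernels, the sizing logic in free_area_init_core() works out to roughly the following. This is a simplified sketch; the exact zone_balance_* constants and per-zone adjustments vary by kernel version:

```c
/* Rough sketch of how free_area_init_core() derives the per-zone
 * free-page thresholds: a "mask" of about zone_size/128 pages, clamped
 * to [20, 255], from which the three watermarks are taken.  The exact
 * ratios vary between kernel versions; illustration only. */

struct zone_watermarks { unsigned long min, low, high; };

static struct zone_watermarks compute_watermarks(unsigned long zone_pages)
{
    struct zone_watermarks w;
    unsigned long mask = zone_pages / 128;   /* ~zone_balance_ratio */

    if (mask < 20)  mask = 20;               /* ~zone_balance_min */
    if (mask > 255) mask = 255;              /* ~zone_balance_max */

    w.min  = mask;                           /* emergency reserve        */
    w.low  = mask * 2;                       /* wake background reclaim  */
    w.high = mask * 3;                       /* reclaim stops up here    */
    return w;
}
```

With these constants a 128 MB zone (32768 4k pages) gets pages_high = 765 pages, about 3 MB; the clamping makes the reserve a shrinking fraction of larger zones.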
* Re: About the free page pool
2002-09-02 20:33 ` Andrew Morton
@ 2002-09-02 20:50 ` Rik van Riel
2002-09-02 21:21 ` Andrew Morton
2002-09-02 21:58 ` Scott Kaplan
1 sibling, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2002-09-02 20:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: Scott Kaplan, linux-mm
On Mon, 2 Sep 2002, Andrew Morton wrote:
> > How important is it to maintain a list of free pages? That is, how
> > critical is it that there be some pool of free pages from which the only
> > bookkeeping required is the removal of that page from the free list.
>
> There are several reasons, all messy.
[snip]
> It's feasible. It'd take some work. Probably it would best be implemented
> via a third list. That list would be protected by an IRQ-safe lock,
I don't think we need to bother with the IRQ-safe part.
It's much simpler if we just do:
1) have a normal free list, but have it smaller ...
say, between zone->pages_min and zone->pages_low
2) if the free pages drop below the low water mark,
have either a normal allocator or a kernel thread
refill it to the high water mark, from the clean
pages list
3) have the free+clean target set to something higher,
say zone->pages_high ... we could even tune this
automatically, if we run out of free+clean pages too
often kswapd should probably try to keep more pages
clean
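A rough userspace sketch of steps 1 and 2 above, with counters standing in for the real lists; the field names only mirror the zone watermarks mentioned, and the logic is an illustration, not a patch:

```c
/* Sketch: keep a small free list, and when an allocation drops it
 * below the low water mark, top it back up from the clean-pages list
 * (the reclaimable tail of the inactive list).  Counters stand in for
 * real page lists. */

struct zone_sim {
    unsigned long free_pages;    /* immediately allocatable          */
    unsigned long clean_pages;   /* clean, reclaimable inactive tail */
    unsigned long pages_min, pages_low, pages_high;
};

/* Step 2: refill the free list from the clean list. */
static void refill_free_list(struct zone_sim *z)
{
    while (z->free_pages < z->pages_low && z->clean_pages > 0) {
        z->clean_pages--;        /* unmap + pull off the inactive list */
        z->free_pages++;         /* ...and hand the page to the buddy  */
    }
}

/* Allocate one page, triggering a refill when we dip below the mark. */
static int alloc_page_sim(struct zone_sim *z)
{
    if (z->free_pages == 0)
        return -1;               /* would have to reclaim dirty pages  */
    z->free_pages--;
    if (z->free_pages < z->pages_low)
        refill_free_list(z);
    return 0;
}

/* Tiny scenario: start at the low mark, allocate n pages, report the
 * free count afterwards. */
static unsigned long free_after(unsigned int n_allocs)
{
    struct zone_sim z = { 8, 100, 4, 8, 16 };
    while (n_allocs-- > 0)
        alloc_page_sim(&z);
    return z.free_pages;
}
```

Step 3, auto-tuning the clean target, would live in kswapd and is left out here.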
What do you think, would this work?
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: About the free page pool
2002-09-02 21:21 ` Andrew Morton
@ 2002-09-02 21:15 ` Rik van Riel
0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2002-09-02 21:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: Scott Kaplan, linux-mm
On Mon, 2 Sep 2002, Andrew Morton wrote:
> Well, I'm at a bit of a loss to understand what the objective
> of all this is. Is it so that we can effectively increase the
> cache size, by not "wasting" all that free memory?
This is the main goal, yes. It is worth noting that it also
works in the other direction: we can simply increase the clean
target to something large if we have a high allocation rate,
since cleaning pages earlier doesn't waste memory.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: About the free page pool
2002-09-02 20:50 ` Rik van Riel
@ 2002-09-02 21:21 ` Andrew Morton
2002-09-02 21:15 ` Rik van Riel
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2002-09-02 21:21 UTC (permalink / raw)
To: Rik van Riel; +Cc: Scott Kaplan, linux-mm
Rik van Riel wrote:
>
> On Mon, 2 Sep 2002, Andrew Morton wrote:
>
> > > How important is it to maintain a list of free pages? That is, how
> > > critical is it that there be some pool of free pages from which the only
> > > bookkeeping required is the removal of that page from the free list.
> >
> > There are several reasons, all messy.
>
> [snip]
>
> > It's feasible. It'd take some work. Probably it would best be implemented
> > via a third list. That list would be protected by an IRQ-safe lock,
>
> I don't think we need to bother with the IRQ-safe part.
>
> It's much simpler if we just do:
>
> 1) have a normal free list, but have it smaller ...
> say, between zone->pages_min and zone->pages_low
>
> 2) if the free pages drop below the low water mark,
> have either a normal allocator or a kernel thread
> refill it to the high water mark, from the clean
> pages list
>
> 3) have the free+clean target set to something higher,
> say zone->pages_high ... we could even tune this
> automatically, if we run out of free+clean pages too
> often kswapd should probably try to keep more pages
> clean
>
> What do you think, would this work?
Well, I'm at a bit of a loss to understand what the objective
of all this is. Is it so that we can effectively increase the
cache size, by not "wasting" all that free memory?
* Re: About the free page pool
2002-09-02 20:33 ` Andrew Morton
2002-09-02 20:50 ` Rik van Riel
@ 2002-09-02 21:58 ` Scott Kaplan
2002-09-03 1:11 ` Andrew Morton
2002-09-03 16:46 ` Daniel Phillips
1 sibling, 2 replies; 12+ messages in thread
From: Scott Kaplan @ 2002-09-02 21:58 UTC (permalink / raw)
To: linux-mm
On Monday, September 2, 2002, at 04:33 PM, Andrew Morton wrote:
> Scott Kaplan wrote:
>> How important is it to maintain a list of free pages? That is, how
>> critical is it that there be some pool of free pages from which the only
>> bookkeeping required is the removal of that page from the free list.
>
> There are several reasons, all messy.
>
> - We need to be able to allocate pages at interrupt time. Mainly
> for networking receive.
Okay, this actually seems pretty important, and I suspected that it would
be a critical issue. I suppose interrupts really do need to be as quick
as possible, so doing the reclamation work during non-interrupt times is a
good trade off. That's a sufficient argument for me.
> - We sometimes need to allocate memory from *within* the context of
> page reclaim: find a dirty page on the LRU, need to write it out,
> need to allocate some memory to start the IO. Where does that
> memory come from.
That part could be handled without too much trouble, I believe. If we're
ensuring that some trailing portion of the inactive list is clean and
ready for reclamation, then when the situation above arises, just allocate
space by taking it from the end of the inactive list. There should be no
problem in doing that.
> - The kernel frequently needs to perform higher-order allocations:
> two or more physically-contiguous pages. The way we agglomerate
> 0-order pages into higher-order pages is by coalescing them in the
> buddy. If _all_ "free" pages are out on an LRU somewhere, we don't
> have a higher-order pool to draw from.
What is the current approach to this problem? Does the buddy allocator
interact with the existing VM replacement policy so that, at times, the
page occupying some particular page frame will be evicted not because it's
the LRU page, but rather because its page frame is physically adjacent to
some other free page? In other words, I see the need to allocate
physically contiguous groups of pages, and that the buddy allocator is
used for that purpose, but what influence does the buddy allocator have to
ensure that it can fulfill those higher-order allocations?
> It's a ratio of the zone size, and there are a few thresholds in there,
> for hysteresis, for emergency allocations, etc. See free_area_init_core()
I took a look, and if I'm calculating things correctly, pages_high seems
to be set so that the free list is at most about 0.8% of the total number
of pages in the zone. For larger memories (above about 128 MB), that
percentage decreases. So we're keeping a modest pool of a few hundred
pages -- not too big a deal.
[From a later email:]
> Well, I'm at a bit of a loss to understand what the objective
> of all this is. Is it so that we can effectively increase the
> cache size, by not "wasting" all that free memory?
While I suppose it would be to keep those few hundred pages mapped and
re-usable by the VM system, it would only make a difference in the miss
rate under very tense and unlikely circumstances. A few pages can make a
big difference in the miss rate, but only if those few pages would allow
the replacement policy to *just barely* keep the pages cached for long
enough before they are referenced again.
My goal was a different one: I just wanted some further simplification of
the replacement mechanism. When a free page is allocated, it gets mapped
into some address space and inserted into the active list (right?). If we
wanted the active and inactive lists to remain a constant size (and for
the movement of pages through those lists to be really simple), we could
immediately evict a page from the active list into the inactive list, and
then evict some other page from the inactive list to the free list. If we
did that, though, the use of a free list would be superfluous.
Since the approach I'm describing performs the VM bookkeeping during
allocation (and, thus, potentially, interrupt) time, it would be a poor
choice. Evictions from the active and inactive lists must be performed at
some other time. Doing so is a tad more complicated, and makes the
behavior of the replacement policy harder to model. It seems, however,
that to keep allocation fast, that bit of added complexity is necessary.
Thanks, as always,
Scott
* Re: About the free page pool
2002-09-02 21:58 ` Scott Kaplan
@ 2002-09-03 1:11 ` Andrew Morton
2002-09-03 1:35 ` Rik van Riel
2002-09-03 5:12 ` William Lee Irwin III
2002-09-03 16:46 ` Daniel Phillips
1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2002-09-03 1:11 UTC (permalink / raw)
To: Scott Kaplan; +Cc: linux-mm
Scott Kaplan wrote:
>
> On Monday, September 2, 2002, at 04:33 PM, Andrew Morton wrote:
>
> > Scott Kaplan wrote:
> >> How important is it to maintain a list of free pages? That is, how
> >> critical is it that there be some pool of free pages from which the only
> >> bookkeeping required is the removal of that page from the free list.
> >
> > There are several reasons, all messy.
> >
> > - We need to be able to allocate pages at interrupt time. Mainly
> > for networking receive.
>
> Okay, this actually seems pretty important, and I suspected that it would
> be a critical issue. I suppose interrupts really do need to be as quick
> as possible, so doing the reclamation work during non-interrupt times is a
> good trade off. That's a sufficient argument for me.
>
> > - We sometimes need to allocate memory from *within* the context of
> > page reclaim: find a dirty page on the LRU, need to write it out,
> > need to allocate some memory to start the IO. Where does that
> > memory come from.
>
> That part could be handled without too much trouble, I believe. If we're
> ensuring that some trailing portion of the inactive list is clean and
> ready for reclamation, then when the situation above arises, just allocate
> space by taking it from the end of the inactive list. There should be no
> problem in doing that.
Yes. But there are latency issues as well. We'll have a cpu-local
pool of pages with which to satisfy most of these allocations anyway,
I guess.
> > - The kernel frequently needs to perform higher-order allocations:
> > two or more physically-contiguous pages. The way we agglomerate
> > 0-order pages into higher-order pages is by coalescing them in the
> > buddy. If _all_ "free" pages are out on an LRU somewhere, we don't
> > have a higher-order pool to draw from.
>
> What is the current approach to this problem? Does the buddy allocator
> interact with the existing VM replacement policy so that, at times, the
> page occupying some particular page frame will be evicted not because it's
> the LRU page, but rather because its page frame is physically adjacent to
> some other free page? In other words, I see the need to allocate
> physically contiguous groups of pages, and that the buddy allocator is
> used for that purpose, but what influence does the buddy allocator have to
> ensure that it can fulfill those higher-order allocations?
The current approach is guess-and-giggle. It seems to work out that
there are enough physically contig pages for it to work.
The most important are 1-order allocations (8k, for kernel stacks).
The memory allocator will retry these allocations indefinitely, so
they end up succeeding, somehow.
I think there's a bug in there, actually. If all zones have enough
free memory but there are no 1-order pages available, then the 1-order
allocator tries to run page reclaim, which will say "nope, nothing
needs doing". Eventually, someone else returns some memory and coalescing
happens. It's not a very glorious part of the kernel design.
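The failure mode can be shown concretely: a zone can sit comfortably above its free-page watermark while containing no aligned buddy pair, so no order-1 block exists and a reclaim pass keyed only on the free-page count sees nothing to do. An illustrative userspace check, not kernel code:

```c
/* Toy free-page bitmap check for the situation described above:
 * plenty of order-0 pages free, yet no aligned buddy pair free, so
 * no order-1 (two-page) block can be formed by coalescing. */

static int count_free(const int *free_map, int npages)
{
    int n = 0;
    for (int pfn = 0; pfn < npages; pfn++)
        n += free_map[pfn];
    return n;
}

static int has_free_order1(const int *free_map, int npages)
{
    /* order-1 blocks start on even pfns: pages (0,1), (2,3), ... */
    for (int pfn = 0; pfn + 1 < npages; pfn += 2)
        if (free_map[pfn] && free_map[pfn + 1])
            return 1;
    return 0;
}

/* Worst case: every aligned buddy pair is exactly half-busy. */
static int demo_free_map[8] = { 1, 0, 0, 1, 1, 0, 0, 1 };
```

With that pattern half the zone is free, so the watermark check looks healthy, yet an 8k kernel-stack allocation still cannot be satisfied until someone happens to free the right neighbour.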
> > It's a ratio of the zone size, and there are a few thresholds in there,
> > for hysteresis, for emergency allocations, etc. See free_area_init_core()
>
> I took a look, and if I'm calculating things correctly, pages_high seems
> to be set so that the free list is at most about 0.8% of the total number
> of pages in the zone. For larger memories (above about 128 MB), that
> percentage decreases. So we're keeping a modest pool of a few hundred
> pages -- not too big a deal.
Free memory seems to bottom out at about 2.2M on a 2.5G machine.
Note that the kernel statically allocates about 10M when it boots. This
is basically a bug, and fixing it is a matter of running around shouting
at people. This will happen ;) This is the low-hanging fruit.
> [From a later email:]
> > Well, I'm at a bit of a loss to understand what the objective
> > of all this is. Is it so that we can effectively increase the
> > cache size, by not "wasting" all that free memory?
>
> While I suppose it would be to keep those few hundred pages mapped and
> re-usable by the VM system, it would only make a difference in the miss
> rate under very tense and unlikely circumstances. A few pages can make a
> big difference in the miss rate, but only if those few pages would allow
> the replacement policy to *just barely* keep the pages cached for long
> enough before they are referenced again.
See 10M, above.
> My goal was a different one: I just wanted some further simplification of
> the replacement mechanism. When a free page is allocated, it gets mapped
> into some address space and inserted into the active list (right?).
Inactive, initially. It changes with the vm-of-the-minute though.
> If we
> wanted the active and inactive lists to remain a constant size (and for
> the movement of pages through those lists to be really simple), we could
> immediately evict a page from the active list into the inactive list, and
> then evict some other page from the inactive list to the free list. If we
> did that, though, the use of a free list would be superfluous.
>
> Since the approach I'm describing performs the VM bookkeeping during
> allocation (and, thus, potentially, interrupt) time, it would be a poor
> choice. Evictions from the active and inactive lists must be performed at
> some other time. Doing so is a tad more complicated, and makes the
> behavior of the replacement policy harder to model. It seems, however,
> that to keep allocation fast, that bit of added complexity is necessary.
>
Well, we never evict from the active list - just from the tail of the
inactive list. But yes.
* Re: About the free page pool
2002-09-03 1:11 ` Andrew Morton
@ 2002-09-03 1:35 ` Rik van Riel
2002-09-03 5:12 ` William Lee Irwin III
1 sibling, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2002-09-03 1:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: Scott Kaplan, linux-mm
On Mon, 2 Sep 2002, Andrew Morton wrote:
> The most important are 1-order allocations (8k, for kernel stacks).
> The memory allocator will retry these allocations indefinitely, so
> they end up succeeding, somehow.
>
> I think there's a bug in there, actually. If all zones have enough
> free memory but there are no 1-order pages available, then the 1-order
> allocator tried to run page reclaim, which will say "nope, nothing
> needs doing". Eventually, someone else returns some memory and coalescing
> happens. It's not a very glorious part of the kernel design.
This is fixable with rmap, though. Another old item on my TODO list. ;(
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: About the free page pool
2002-09-03 1:11 ` Andrew Morton
2002-09-03 1:35 ` Rik van Riel
@ 2002-09-03 5:12 ` William Lee Irwin III
2002-09-03 5:43 ` Andrew Morton
1 sibling, 1 reply; 12+ messages in thread
From: William Lee Irwin III @ 2002-09-03 5:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: Scott Kaplan, linux-mm
On Mon, Sep 02, 2002 at 06:11:17PM -0700, Andrew Morton wrote:
> Note that the kernel statically allocates about 10M when it boots. This
> is basically a bug, and fixing it is a matter of running around shouting
> at people. This will happen ;) This is the low-hanging fruit.
Are you referring to boot-time allocations using get_free_pages()
instead of bootmem? Killing those off would be nice, yes. It limits
the size of some hash tables on larger machines where "proportional
to memory" means "bigger than MAX_ORDER". (Changing the algorithms to
not use gargantuan hash tables might also be an interesting exercise
but one I've not got the bandwidth to take on.)
Cheers,
Bill
* Re: About the free page pool
2002-09-03 5:12 ` William Lee Irwin III
@ 2002-09-03 5:43 ` Andrew Morton
2002-09-03 5:43 ` William Lee Irwin III
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2002-09-03 5:43 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Scott Kaplan, linux-mm
William Lee Irwin III wrote:
>
> On Mon, Sep 02, 2002 at 06:11:17PM -0700, Andrew Morton wrote:
> > Note that the kernel statically allocates about 10M when it boots. This
> > is basically a bug, and fixing it is a matter of running around shouting
> > at people. This will happen ;) This is the low-hanging fruit.
>
> Are you referring to boot-time allocations using get_free_pages()
> instead of bootmem? Killing those off would be nice, yes. It limits
> the size of some hash tables on larger machines where "proportional
> to memory" means "bigger than MAX_ORDER". (Changing the algorithms to
> not use gargantuan hash tables might also be an interesting exercise
> but one I've not got the bandwidth to take on.)
Nope. I'm referring to 1.5 megabytes lost to anonymous kmallocs,
two or three megabytes of biovec mempools, etc. And that's with
NR_CPUS=4, and that's excluding all the statically allocated
array[NR_CPUS]s.
* Re: About the free page pool
2002-09-03 5:43 ` Andrew Morton
@ 2002-09-03 5:43 ` William Lee Irwin III
0 siblings, 0 replies; 12+ messages in thread
From: William Lee Irwin III @ 2002-09-03 5:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: Scott Kaplan, linux-mm
William Lee Irwin III wrote:
>> Are you referring to boot-time allocations using get_free_pages()
>> instead of bootmem? Killing those off would be nice, yes. It limits
>> the size of some hash tables on larger machines where "proportional
>> to memory" means "bigger than MAX_ORDER". (Changing the algorithms to
>> not use gargantuan hash tables might also be an interesting exercise
>> but one I've not got the bandwidth to take on.)
On Mon, Sep 02, 2002 at 10:43:04PM -0700, Andrew Morton wrote:
> Nope. I'm referring to 1.5 megabytes lost to anonymous kmallocs,
> two or three megabytes of biovec mempools, etc. And that's with
> NR_CPUS=4, and that's excluding all the statically allocated
> array[NR_CPUS]s.
Slightly different then. I don't know of anyone regularly testing 2.5.x
on 4MB machines, which might need a bit of help on this front if more
memory than they have is flushed down the toilet at boot.
I've got a collection of ancient toasters but the ports aren't booting,
and for reasons far deeper than this. 4MB bochs/x86 laptop? No time. =(
Cheers,
Bill
* Re: About the free page pool
2002-09-02 21:58 ` Scott Kaplan
2002-09-03 1:11 ` Andrew Morton
@ 2002-09-03 16:46 ` Daniel Phillips
1 sibling, 0 replies; 12+ messages in thread
From: Daniel Phillips @ 2002-09-03 16:46 UTC (permalink / raw)
To: Scott Kaplan, linux-mm; +Cc: Andrew Morton
On Monday 02 September 2002 23:58, Scott Kaplan wrote:
> My goal was a different one: I just wanted some further simplification of
> the replacement mechanism.
Simplifying the replacement mechanism has value as an aid to understanding,
or perhaps debugging. There's also a strong case for maintaining a simple
VM design in parallel with the fancy one, as a compilation option.
Occasionally, someone will demonstrate that a far simpler design outperforms
the fancy design du jour, causing considerable embarrassment to the incumbent
designers. It doesn't happen often though. Usually, complexity is added to
the VM for a good reason, and the fancier it gets, the better it works.
Examples of this are division of the lru lists per zone and batching of vm
operations.
At the risk of fueling (ahem) an analogy war, consider the classic
carburetor. As a means of mixing fuel and air for combustion, it's about as
simple as you can get, but you can tweak the design as much as you like and
it will never perform as well as a computer-controlled fuel injection system.
Even with all the recent optimizations lathered on, we are still working with
a very simple underlying design, more like a carburetor than a fuel injection
system. We mainly cross our fingers and hope that the system will magically
solve its own problems. For example, we hope that by making threads do their
own vm scanning they will throttle and balance their memory consumption
properly versus other threads. This strategy has never worked reliably
across a broad range of loads, though after a few years of tweaking, many of
its typical faux pas have been identified and suppressed.
Such bandaid solutions do work for a time. The problem is, the bandaids tend
not to scale very well, either up or down. So each new kernel generation
requires a new set of bandaids, and usually a new team of medics to apply
them. After a while, the bandaids alone add up to more lines of code than
the underlying VM mechanism, and it's time for a paradigm shift. We're
nearly at that point now.
In other words, after 2.6, carburetors will be out and computer-controlled
fuel-injection will be in.
--
Daniel