Re: [PATCH] reapswap for 2.4.5-ac10

linux-mm.kvack.org archive mirror

* Re: [PATCH] reapswap for 2.4.5-ac10
@ 2001-06-06  8:39 Jonathan Morton
  2001-06-06 12:21 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jonathan Morton @ 2001-06-06  8:39 UTC
  To: Marcelo Tosatti, linux-mm

>> > I'm resending the reapswap patch for inclusion into -ac series.
>>
>> Isn't it broken in this state?  Checking page_count, page->buffers and
>> PageSwapCache without the appropriate locks is dangerous.
>
>We hold the pagemap_lru_lock, so there will be no one doing lookups on
>this swap page (get_swapcache_page() locks pagemap_lru_lock).
>
>Am I overlooking something here?

Probably a good idea to hold the individual page's lock anyway.

BTW, does this clear out an area of allocated swap which all processes have
finished using?  For example, if a large process dies and leaves part of
itself in the swapcache, the swap space covered by this is currently not
retrieved until the swapcache is given enough pressure.  I *think* this is
what you're trying to address here, just want to be sure.

This is particularly relevant because I now have code which gives "new"
pages a low initial age, but swapped-in pages get a high initial age.
Since the dead process's swapcache pages have likely been swapped in
shortly before its demise, they get very high ages and a new process
replacing the old one has an uphill struggle to force out the old pagecache
entries.  This hurts the MySQL compilation a lot with 32MB or 48MB physical
- but even without the "high age on swapin" patch, it must surely hurt
performance (albeit to a lesser degree).

*** UPDATE *** : I applied the patch, and it really does help.  Compile
time for MySQL is down to ~6m30s from ~8m30s with 48MB physical, and the
behaviour after the monster file is finished is much improved.  For
reference, the MySQL compile takes ~5min on this box with all 256MB
available.  It's a 1GHz Athlon.

>I've been saying for sometime now that I think only kswapd should do
>the page aging part. If we don't do it this way, heavy VM loads will make
>each memory intensive task age down other processes pages, so we see
>ourselves in an "unmapping/faulting" storm. Imagine what happens to
>interactivity in such a case.

Interesting observation.  Something else though, which kswapd is guilty of
as well: consider a page shared among many processes, eg. part of a
library.  As kswapd scans, the page is aged down for each process that uses
it.  So glibc gets aged down many times more quickly than a non-shared
page, precisely the opposite of what we really want to happen.  With
exponential-decay aging, and multiple processes doing the aging in this
manner, highly important things like glibc get muscled out in very short
order...

Maybe aging up/down needs to be done on a linear page scan, rather than a
per-process scan, and reserve the per-process scan for choosing process
pages to move into the swap arena.
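
A toy model of the effect described above, assuming the down-ageing step is a halving (age /= 2) applied once for every mapping the scanner walks. The function names are hypothetical and this is not kernel code; later replies in this thread question whether 2.4.5 really ages down per mapping.

```c
/*
 * Toy model, not kernel code.  Compares a page's age after one scan pass
 * when the halving step is applied once per mapping (per-process scan)
 * versus once per physical page (linear scan).
 */
static unsigned int age_after_per_process_pass(unsigned int age,
                                               unsigned int nr_mappings)
{
	/* Every process that maps the page exposes it to one more halving. */
	for (unsigned int i = 0; i < nr_mappings; i++)
		age /= 2;
	return age;
}

static unsigned int age_after_linear_pass(unsigned int age)
{
	/* A linear scan visits each physical page exactly once per pass. */
	return age / 2;
}
```

With a starting age of 64, a page mapped by seven processes reaches age 0 in a single per-process pass, while the linear pass leaves it at 32.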

Another point - when a page is earmarked for swapping out (allocated space,
moved into the swapcache area, etc) and is then re-referenced before it is
completely deallocated, it remains in the swapcache and is still allocated
in the swap region.  This seems backwards to me, and appears to be the
reason why "cache bloat" is visible on 2.4.5 systems - it isn't really
cache, but pages which are used by processes yet are still given space they
don't need, on disk.  It also neatly explains the large swap usage of 2.4
systems in general.  I fiddled temporarily with attempting to fix this, but
I couldn't figure out the correct way to deallocate a page from swap and
move it out of swapcache.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06  8:39 [PATCH] reapswap for 2.4.5-ac10 Jonathan Morton
@ 2001-06-06 12:21 ` Andrew Morton
  2001-06-06 12:47   ` Stephen C. Tweedie
  2001-06-06 12:50   ` Jonathan Morton
  2001-06-06 19:18 ` Marcelo Tosatti
  2001-06-11  4:43 ` Joseph A. Knapka
  2 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2001-06-06 12:21 UTC
  To: Jonathan Morton; +Cc: Marcelo Tosatti, linux-mm

Jonathan Morton wrote:
> 
> Interesting observation.  Something else though, which kswapd is guilty of
> as well: consider a page shared among many processes, eg. part of a
> library.  As kswapd scans, the page is aged down for each process that uses
> it.  So glibc gets aged down many times more quickly than a non-shared
> page, precisely the opposite of what we really want to happen.

Perhaps the page should be aged down by (1 / page->count)?

Just scale all the age stuff by 256 or 1000 or whatever and
instead of saying

	page->age -= CONSTANT;

you can use

	page->age -= (CONSTANT * 256) / atomic_read(&page->count);


So the more users, the more slowly it ages.  You get the idea.
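
A minimal sketch of the fixed-point scheme suggested above, assuming ages are stored in units of 1/256. AGE_DOWN_STEP stands in for the unnamed CONSTANT and map_count stands in for atomic_read(&page->count); both names are hypothetical. The clamp at zero is an addition, since the original sketch does not say what happens when the subtraction would go negative.

```c
#include <assert.h>

#define AGE_SCALE     256  /* fixed-point units per whole age step */
#define AGE_DOWN_STEP 1    /* hypothetical stand-in for CONSTANT */

/* Age a page down, more slowly the more users it has. */
static long scaled_age_down(long age, long map_count)
{
	assert(map_count > 0);  /* a page being aged has at least one user */
	age -= (AGE_DOWN_STEP * AGE_SCALE) / map_count;
	return age < 0 ? 0 : age;  /* clamp instead of going negative */
}
```

A page with one user loses 256 units per call, one with four users loses 64, and one with more than 256 users loses nothing at all because the integer division truncates to zero.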

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 12:21 ` Andrew Morton
@ 2001-06-06 12:47   ` Stephen C. Tweedie
  2001-06-06 12:50   ` Jonathan Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Stephen C. Tweedie @ 2001-06-06 12:47 UTC
  To: Andrew Morton; +Cc: Jonathan Morton, Marcelo Tosatti, linux-mm

Hi,

On Wed, Jun 06, 2001 at 10:21:16PM +1000, Andrew Morton wrote:
> Jonathan Morton wrote:
> > 
> > Interesting observation.  Something else though, which kswapd is guilty of
> > as well: consider a page shared among many processes, eg. part of a
> > library.  As kswapd scans, the page is aged down for each process that uses
> > it.  So glibc gets aged down many times more quickly than a non-shared
> > page, precisely the opposite of what we really want to happen.
> 
> Perhaps the page should be aged down by (1 / page->count)?

The problem, of course, is that the referenced bit is not being
maintained at the same rate for all pages: we set it whenever we see a
mapping for it.  So, in fact, glibc can get aged *up* more than other
pages: because it is in multiple VMs, the swap loop has the chance to
rejuvenate the page more often.

We really want the aging done elsewhere.  Ideally, the VM page
scanning should be maintaining the state of the referenced bit on the
page, but the age manipulation should be done in the inactive-refill
loop.  That way the referenced-bit state would be propagated into the
page age at a uniform rate for all pages.  The difficulty is that the
refill-inactive loop and the try_to_swap_out loops proceed at
different rates, so it's not really possible at the moment to
determine all at once whether or not a page has been referenced in any
way since it was last seen.
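
A toy model of the imbalance described above, assuming each mapping found referenced by the swap loop ages the page up by a fixed step, capped at a maximum. AGE_ADV and AGE_MAX are assumed values, not taken from this thread. A page touched through N mappings gains up to N steps per swap-loop pass, while the refill loop can only take it down once per pass.

```c
#define AGE_ADV 3   /* assumed age-up step */
#define AGE_MAX 64  /* assumed ceiling */

/* Age-up applied once per referenced mapping seen in one swap-loop pass. */
static unsigned int age_up_per_mapping(unsigned int age,
                                       unsigned int referenced_mappings)
{
	for (unsigned int i = 0; i < referenced_mappings; i++) {
		age += AGE_ADV;
		if (age > AGE_MAX)
			age = AGE_MAX;
	}
	return age;
}
```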

Remember also that an unreferenced page gets unlinked from the page
tables in try_to_swap_out, so the presence of multiple inactive links
to glibc won't affect the swapper too much --- once those links have
been passed over once, they will be removed and we won't get extra
aging down done in subsequent passes.

--Stephen

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 12:21 ` Andrew Morton
  2001-06-06 12:47   ` Stephen C. Tweedie
@ 2001-06-06 12:50   ` Jonathan Morton
  2001-06-06 13:12     ` Andrew Morton
  1 sibling, 1 reply; 19+ messages in thread
From: Jonathan Morton @ 2001-06-06 12:50 UTC
  To: Andrew Morton; +Cc: Marcelo Tosatti, linux-mm

>> Interesting observation.  Something else though, which kswapd is guilty of
>> as well: consider a page shared among many processes, eg. part of a
>> library.  As kswapd scans, the page is aged down for each process that uses
>> it.  So glibc gets aged down many times more quickly than a non-shared
>> page, precisely the opposite of what we really want to happen.
>
>Perhaps the page should be aged down by (1 / page->count)?
>
>Just scale all the age stuff by 256 or 1000 or whatever and
>instead of saying
>
>	page->age -= CONSTANT;
>
>you can use
>
>	page->age -= (CONSTANT * 256) / atomic_read(&page->count);
>
>
>So the more users, the more slowly it ages.  You get the idea.

However big you make that scaling constant, you'll always find some pages
which have more users than that.  Consider a shell server, and pages
belonging to glibc.  Once the number of users gets that large, the age will
go down by exactly zero, even if it just happens that the page is truly not
in use.
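
The truncation can be checked directly. Assuming the per-pass decrement is computed as floor(step * scale / users) with integer division, it becomes zero as soon as users exceeds step * scale. The helper below is hypothetical; it uses 64-bit arithmetic so that step * scale cannot overflow, which a 32-bit kernel computing step * 2^24 in a 32-bit type would not guarantee once step reaches 256.

```c
#include <stdint.h>

/* Per-pass age decrement under the scaled scheme, rounded down. */
static uint64_t scaled_decrement(uint64_t step, uint64_t scale, uint64_t users)
{
	return (step * scale) / users;  /* caller guarantees users > 0 */
}
```

With scale = 256, a page with 300 users is never aged down. With scale = 2^24, any user count that fits in a 24-bit counter (at most 2^24 - 1, relevant to the SPARC atomic_t limit mentioned later in this thread) still gets a decrement of at least 1.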

BUT, as it turns out, refill_inactive_scan() already does ageing down on a
page-by-page basis, rather than process-by-process.  So I can indeed take
out my little decrement in try_to_swap_out() and just leave the
bail-out-if-age-wasn't-zero code.  The age-up code stays - it's good to
catch accesses as frequently as possible.
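
A model of the try_to_swap_out() behaviour described above, not the actual patch: there is no down-ageing in this path, a young pte ages the page up, and a page whose age is still non-zero is left mapped. The enum, constants and function name are hypothetical.

```c
#include <stdbool.h>

#define AGE_ADV 3   /* assumed age-up step */
#define AGE_MAX 64  /* assumed ceiling */

enum swap_decision { KEEP_MAPPED, TRY_SWAP_OUT };

static enum swap_decision toy_try_to_swap_out(unsigned int *age, bool pte_young)
{
	if (pte_young) {
		/* Catch the access: age up, capped at AGE_MAX. */
		*age = (*age + AGE_ADV > AGE_MAX) ? AGE_MAX : *age + AGE_ADV;
		return KEEP_MAPPED;
	}
	if (*age != 0)
		return KEEP_MAPPED;  /* bail out: age wasn't zero yet */
	return TRY_SWAP_OUT;     /* unreferenced and fully aged down */
}
```

Ageing down is left entirely to refill_inactive_scan(), which visits each page once per pass regardless of how many processes map it.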

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 12:50   ` Jonathan Morton
@ 2001-06-06 13:12     ` Andrew Morton
  2001-06-06 16:44       ` Jonathan Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2001-06-06 13:12 UTC
  To: Jonathan Morton; +Cc: Marcelo Tosatti, linux-mm

Jonathan Morton wrote:
> 
> >So the more users, the more slowly it ages.  You get the idea.
> 
> However big you make that scaling constant, you'll always find some pages
> which have more users than that.

2^24?

> BUT, as it turns out, refill_inactive_scan() already does ageing down on a
> page-by-page basis, rather than process-by-process.

Yes.  page->count needs looking at if you're doing physically-addressed
scanning.  Rik's patch probably does that.

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 13:12     ` Andrew Morton
@ 2001-06-06 16:44       ` Jonathan Morton
  2001-06-06 17:01         ` Andrew Morton
  2001-06-09  7:46         ` Rik van Riel
  0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Morton @ 2001-06-06 16:44 UTC
  To: Andrew Morton; +Cc: Marcelo Tosatti, linux-mm

>> >So the more users, the more slowly it ages.  You get the idea.
>>
>> However big you make that scaling constant, you'll always find some pages
>> which have more users than that.
>
>2^24?

True, you aren't going to find 16 million processes on a box anytime soon.
However, it still doesn't quite appeal to me - it looks too much like a
hack.  What happens if, by some freak, someone does build a machine which
can handle that much?  Consider some future type of machine which is
essentially a Beowulf cluster with a single address space - I imagine NUMA
machines are already approaching this size.

>> BUT, as it turns out, refill_inactive_scan() already does ageing down on a
>> page-by-page basis, rather than process-by-process.
>
>Yes.  page->count needs looking at if you're doing physically-addressed
>scanning.  Rik's patch probably does that.

Explain...

AFAICT, the scanning in refill_inactive_scan() simply looks at a list of
pages, and doesn't really do physical addresses.  The age of a page should
be independent of the number of mappings it has, but dependent instead on
how much it is used (or how long it is not used for).  That code already
exists, and it works.

Also, I just sat down for a few minutes and figured out a very simple way
to get a proper working-set calculation without the fuss...  'course I have
to test it first.

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 16:44       ` Jonathan Morton
@ 2001-06-06 17:01         ` Andrew Morton
  2001-06-06 19:40           ` Jonathan Morton
  2001-06-09  7:46         ` Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2001-06-06 17:01 UTC
  To: Jonathan Morton; +Cc: Marcelo Tosatti, linux-mm

Jonathan Morton wrote:
> 
> >> >So the more users, the more slowly it ages.  You get the idea.
> >>
> >> However big you make that scaling constant, you'll always find some pages
> >> which have more users than that.
> >
> >2^24?
> 
> True, you aren't going to find 16 million processes on a box anytime soon.
> However, it still doesn't quite appeal to me - it looks too much like a
> hack.  What happens if, by some freak, someone does build a machine which
> can handle that much?  Consider some future type of machine which is
> essentially a Beowulf cluster with a single address space - I imagine NUMA
> machines are already approaching this size.

Sure.  SPARC has a 24-bit limit on atomic_t, so it'd better
not get too large :)

> >> BUT, as it turns out, refill_inactive_scan() already does ageing down on a
> >> page-by-page basis, rather than process-by-process.
> >
> >Yes.  page->count needs looking at if you're doing physically-addressed
> >scanning.  Rik's patch probably does that.
> 
> Explain...

Rik has a (big) patch which allows reverse lookups - physical back to
virtual.  So rather than scanning multiply mapped pages many times,
each page is scanned but once, and you can go from the physical
page back to all its users' ptes to see if/when any of them have
touched the page.  I think.  It'll be at http://www.surriel.com/patches/
Search for "pmap".

> AFAICT, the scanning in refill_inactive_scan() simply looks at a list of
> pages, and doesn't really do physical addresses.  The age of a page should
> be independent of the number of mappings it has, but dependent instead on
> how much it is used (or how long it is not used for).  That code already
> exists, and it works.

Well, the page will have different ages wrt all the mms which map it.

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06  8:39 [PATCH] reapswap for 2.4.5-ac10 Jonathan Morton
  2001-06-06 12:21 ` Andrew Morton
@ 2001-06-06 19:18 ` Marcelo Tosatti
  2001-06-06 21:13   ` Jonathan Morton
  2001-06-11  4:43 ` Joseph A. Knapka
  2 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2001-06-06 19:18 UTC
  To: Jonathan Morton; +Cc: linux-mm


On Wed, 6 Jun 2001, Jonathan Morton wrote:

> *** UPDATE *** : I applied the patch, and it really does help.  Compile
> time for MySQL is down to ~6m30s from ~8m30s with 48MB physical, and the
> behaviour after the monster file is finished is much improved.  For
> reference, the MySQL compile takes ~5min on this box with all 256MB
> available.  It's a 1GHz Athlon.

Which patch ? :) 

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 17:01         ` Andrew Morton
@ 2001-06-06 19:40           ` Jonathan Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Morton @ 2001-06-06 19:40 UTC
  To: Andrew Morton; +Cc: Marcelo Tosatti, linux-mm

>> AFAICT, the scanning in refill_inactive_scan() simply looks at a list of
>> pages, and doesn't really do physical addresses.  The age of a page should
>> be independent of the number of mappings it has, but dependent instead on
>> how much it is used (or how long it is not used for).  That code already
>> exists, and it works.
>
>Well, the page will have different ages wrt all the mms which map it.

Hmmm...  I'm obviously still learning the intricacies of how this all fits
together.  I really thought that if you had a struct page*, it pointed to a
unique page and that the ptes of different VMAs were capable of multiply
pointing at said struct page*.  But, isn't that what page->count is for?
So have I grabbed the wrong end of what you're saying, and in fact I had it
right in the first place?

So, if multiple processes are really using a single page, then it makes
sense for the age to skyrocket - you don't wanna swap that page out,
otherwise all the processes that are using it will stall.  If you have a
shared page that isn't being used, you want the age to decay at the same
rate as non-shared pages, though it doesn't particularly matter what that
rate is.

Once this is achieved, the age turns into a reasonable approximation to
working set - as long as we don't force the age down under memory pressure
without allowing other processes to get in on the act.  Ah, that seems to
be what we're doing at the moment...

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 19:18 ` Marcelo Tosatti
@ 2001-06-06 21:13   ` Jonathan Morton
  2001-06-07 14:45     ` John Stoffel
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Morton @ 2001-06-06 21:13 UTC
  To: Marcelo Tosatti; +Cc: linux-mm

>> *** UPDATE *** : I applied the patch, and it really does help.  Compile
>> time for MySQL is down to ~6m30s from ~8m30s with 48MB physical, and the
>> behaviour after the monster file is finished is much improved.  For
>> reference, the MySQL compile takes ~5min on this box with all 256MB
>> available.  It's a 1GHz Athlon.
>
>Which patch ? :)

The one which deals with dead swapcache pages.  I want to apply the one
which actively eats them using kreclaimd, too.

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 21:13   ` Jonathan Morton
@ 2001-06-07 14:45     ` John Stoffel
  2001-06-07 16:45       ` Jonathan Morton
  0 siblings, 1 reply; 19+ messages in thread
From: John Stoffel @ 2001-06-07 14:45 UTC
  To: Jonathan Morton; +Cc: Marcelo Tosatti, linux-mm

Jonathan> The one which deals with dead swapcache pages.  I want to
Jonathan> apply the one which actively eats them using kreclaimd, too.

Why do we need yet another daemon to reap pages/swap/cache from the
system?  

Or am I mis-understanding you here and you'd just be adding some stuff
to kswapd?

John
   John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
	 stoffel@lucent.com - http://www.lucent.com - 978-952-7548

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-07 14:45     ` John Stoffel
@ 2001-06-07 16:45       ` Jonathan Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Morton @ 2001-06-07 16:45 UTC
  To: John Stoffel; +Cc: Marcelo Tosatti, linux-mm

At 3:45 pm +0100 7/6/2001, John Stoffel wrote:
>Jonathan> The one which deals with dead swapcache pages.  I want to
>Jonathan> apply the one which actively eats them using kreclaimd, too.
>
>Why do we need yet another daemon to reap pages/swap/cache from the
>system?
>
>Or am I mis-understanding you here and you'd just be adding some stuff
>to kswapd?

It wasn't me.  :)  kreclaimd already exists (I think it shows up as bdflush
in top), the patch I'm looking at adds swapcache-reclaim duties to it.
It's the same family of code as kswapd.

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 16:44       ` Jonathan Morton
  2001-06-06 17:01         ` Andrew Morton
@ 2001-06-09  7:46         ` Rik van Riel
  1 sibling, 0 replies; 19+ messages in thread
From: Rik van Riel @ 2001-06-09  7:46 UTC
  To: Jonathan Morton; +Cc: Andrew Morton, Marcelo Tosatti, linux-mm

On Wed, 6 Jun 2001, Jonathan Morton wrote:

> >> BUT, as it turns out, refill_inactive_scan() already does ageing down on a
> >> page-by-page basis, rather than process-by-process.
> >
> >Yes.  page->count needs looking at if you're doing physically-addressed
> >scanning.  Rik's patch probably does that.
> 
> Explain...
> 
> AFAICT, the scanning in refill_inactive_scan() simply looks at a list
> of pages, and doesn't really do physical addresses.

http://www.surriel.com/patches/2.4/2.4.5-ac5-pmap

In this patch, for each page it gets from refill_inactive(), the kernel
looks at all the page table entries that map that page, and so does its
page aging on a per-physical-page basis.
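
A rough model of per-physical-page aging through reverse mappings, as described above. The types and the ptes array are hypothetical and do not reflect how the pmap patch actually stores its reverse mappings; the point is that each page is visited once, and every pte mapping it is tested and cleared during that one visit. AGE_ADV and AGE_MAX are assumed values.

```c
#include <stdbool.h>
#include <stddef.h>

#define AGE_ADV 3   /* assumed age-up step */
#define AGE_MAX 64  /* assumed ceiling */

struct toy_pte {
	bool young;  /* models the hardware accessed bit */
};

struct toy_rpage {
	unsigned int age;
	struct toy_pte **ptes;  /* every pte that maps this physical page */
	size_t nr_ptes;
};

/* Visit one physical page and age it once, whatever its mapping count. */
static void age_physical_page(struct toy_rpage *pg)
{
	bool referenced = false;

	/* No early exit: every young bit must be cleared for the next pass. */
	for (size_t i = 0; i < pg->nr_ptes; i++) {
		if (pg->ptes[i]->young) {
			referenced = true;
			pg->ptes[i]->young = false;
		}
	}

	if (referenced)
		pg->age = (pg->age + AGE_ADV > AGE_MAX) ? AGE_MAX : pg->age + AGE_ADV;
	else
		pg->age /= 2;
}
```

That walk over every mapping on every visit is where the extra overhead comes from.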

Of course, this costs us some overhead and I'm not at all
convinced we actually want to use this strategy. It's just
too much fun to code to not try ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06  8:39 [PATCH] reapswap for 2.4.5-ac10 Jonathan Morton
  2001-06-06 12:21 ` Andrew Morton
  2001-06-06 19:18 ` Marcelo Tosatti
@ 2001-06-11  4:43 ` Joseph A. Knapka
  2 siblings, 0 replies; 19+ messages in thread
From: Joseph A. Knapka @ 2001-06-11  4:43 UTC
  To: Jonathan Morton; +Cc: linux-mm

Hi Jonathan,

Jonathan Morton wrote:
> 
> Interesting observation.  Something else though, which kswapd is guilty of
> as well: consider a page shared among many processes, eg. part of a
> library.  As kswapd scans, the page is aged down for each process that uses
> it.  So glibc gets aged down many times more quickly than a non-shared
> page, precisely the opposite of what we really want to happen.  With
> exponential-decay aging, and multiple processes doing the aging in this
> manner, highly important things like glibc get muscled out in very short
> order...

Are you sure about this? The only place pages are
aged down is in refill_inactive_scan(), which scans
the active_list, not process PTEs. Aging *up*, OTOH,
is done on a per-mapping basis, in try_to_swap_out()
(as well as linearly in refill_inactive_scan(), go
figure). This seems to be the rationale for making
age-down a division, and age-up an increment. Of course,
when memory is tight a lot of processes are going to
be waking up kswapd, so all pages are going to age more
quickly in that case, but we're never aging a page down
proportional to the number of processes that have it
mapped.
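
A sketch of the asymmetry described above, assuming ageing down halves the age and ageing up adds a fixed step with a ceiling. The constants 3 and 64 are assumed to match the 2.4-era PAGE_AGE_ADV and PAGE_AGE_MAX; check them against your tree. The helper names are hypothetical.

```c
#define PAGE_AGE_ADV 3   /* assumed 2.4-era value */
#define PAGE_AGE_MAX 64  /* assumed 2.4-era value */

static unsigned int toy_age_up(unsigned int age)
{
	age += PAGE_AGE_ADV;
	return age > PAGE_AGE_MAX ? PAGE_AGE_MAX : age;
}

static unsigned int toy_age_down(unsigned int age)
{
	return age / 2;  /* division, not a fixed decrement */
}

/* How many unreferenced refill passes until the page's age reaches 0. */
static unsigned int passes_until_inactive(unsigned int age)
{
	unsigned int passes = 0;

	while (age > 0) {
		age = toy_age_down(age);
		passes++;
	}
	return passes;
}
```

Even a page at the ceiling is aged to zero in seven unreferenced passes, and a page at age 2 takes two; below 3, halving happens to behave like a decrement of 1.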

> Maybe aging up/down needs to be done on a linear page scan, rather than a
> per-process scan, and reserve the per-process scan for choosing process
> pages to move into the swap arena.

It would seem to make sense to do aging up and down
consistently. The (or a) way to do that is to make
try_to_swap_out() set PG_referenced, rather than age
the page up itself. Then no matter how many times the
page is touched, it will be aged up only once, next
time refill_inactive_scan() sees it.
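
A minimal sketch of that proposal, assuming a per-page flag modelled on PG_referenced and a refill pass that either ages up on the flag or halves otherwise. The struct, function names and constants are hypothetical. Setting the flag is idempotent, so any number of referenced mappings seen between two refill visits produce exactly one age-up.

```c
#include <stdbool.h>

#define AGE_ADV 3   /* assumed age-up step */
#define AGE_MAX 64  /* assumed ceiling */

struct toy_page {
	unsigned int age;
	bool referenced;  /* models PG_referenced */
};

/* pte-scan side: record the reference, do not touch the age. */
static void toy_mark_referenced(struct toy_page *p)
{
	p->referenced = true;
}

/* refill side: fold the flag into the age once per visit. */
static void toy_refill_visit(struct toy_page *p)
{
	if (p->referenced) {
		p->referenced = false;
		p->age += AGE_ADV;
		if (p->age > AGE_MAX)
			p->age = AGE_MAX;
	} else {
		p->age /= 2;
	}
}
```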

On the other hand, what makes sense on a cursory
inspection may not be at all good in practice. I think
the way it works now is intuitively pretty reasonable:
a global downward decay of page->age for all pages,
which processes can counteract by referencing the page
frequently. When age is <3, the exponential decay is
coincidentally linear, so a page mapped by one process
can be kept active by being referenced (and noticed
by try_to_swap_out()) at least once during each of
refill_inactive_scan's trips through the active_list.
A page mapped by two processes has to be referenced
half as often by each, on average, to stay active
(assuming that the swap_out() scan visits all process
PTEs in approximately the same interval that
refill_inactive_scan() visits all the active pages).

-- Joe

-- Joseph A. Knapka
"You know how many remote castles there are along the gorges? You
 can't MOVE for remote castles!" -- Lu Tze re. Uberwald
// Linux MM Documentation in progress:
// http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
* Evolution is an "unproven theory" in the same sense that gravity is. *

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-06 12:31     ` Hugh Dickins
@ 2001-06-06 19:17       ` Marcelo Tosatti
  0 siblings, 0 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2001-06-06 19:17 UTC
  To: Hugh Dickins; +Cc: Stephen C. Tweedie, Alan Cox, André Dahlqvist, linux-mm


On Wed, 6 Jun 2001, Hugh Dickins wrote:

> On Tue, 5 Jun 2001, Marcelo Tosatti wrote:
> > On Tue, 5 Jun 2001, Stephen C. Tweedie wrote:
> > > On Tue, Jun 05, 2001 at 04:48:46PM -0300, Marcelo Tosatti wrote:
> > > > I'm resending the reapswap patch for inclusion into -ac series. 
> > > 
> > > Isn't it broken in this state?  Checking page_count, page->buffers and
> > > PageSwapCache without the appropriate locks is dangerous.
> > 
> > We hold the pagemap_lru_lock, so there will be no one doing lookups on
> > this swap page (get_swapcache_page() locks pagemap_lru_lock).
> > 
> > Am I overlooking something here? 
> 
> mm/shmem.c:shmem_getpage_locked() and mm/swapfile.c:try_to_unuse()
> call delete_from_swap_cache_nolock(), both holding page lock,
> neither holding pagemap_lru_lock.
> 
> Unless you hold the page lock, PageSwapCache(page) and page->index
> are volatile, but to find swap_count(page) you have to rely on both
> of them.  TryLockPage()?

Thanks for the comments. 

I'll post a new patch which uses TryLockPage soon.

Thanks! 

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-05 20:46   ` Marcelo Tosatti
@ 2001-06-06 12:31     ` Hugh Dickins
  2001-06-06 19:17       ` Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2001-06-06 12:31 UTC
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, Alan Cox, André Dahlqvist, linux-mm

On Tue, 5 Jun 2001, Marcelo Tosatti wrote:
> On Tue, 5 Jun 2001, Stephen C. Tweedie wrote:
> > On Tue, Jun 05, 2001 at 04:48:46PM -0300, Marcelo Tosatti wrote:
> > > I'm resending the reapswap patch for inclusion into -ac series. 
> > 
> > Isn't it broken in this state?  Checking page_count, page->buffers and
> > PageSwapCache without the appropriate locks is dangerous.
> 
> We hold the pagemap_lru_lock, so there will be no one doing lookups on
> this swap page (get_swapcache_page() locks pagemap_lru_lock).
> 
> Am I overlooking something here? 

mm/shmem.c:shmem_getpage_locked() and mm/swapfile.c:try_to_unuse()
call delete_from_swap_cache_nolock(), both holding page lock,
neither holding pagemap_lru_lock.

Unless you hold the page lock, PageSwapCache(page) and page->index
are volatile, but to find swap_count(page) you have to rely on both
of them.  TryLockPage()?
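
A userspace model of the check-under-lock pattern suggested above: take the page lock with a try-lock, skip the page if it is busy, and both test and act on the swap-cache state and the count while still holding it. The struct, its fields and the toy_* helpers are hypothetical stand-ins built on C11 atomic_flag; they are not the kernel's TryLockPage()/UnlockPage().

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_page {
	atomic_flag locked;  /* models PG_locked */
	bool swap_cache;     /* models PageSwapCache() */
	bool dirty;          /* models PageDirty() */
	int count;           /* models page_count() */
	bool has_buffers;    /* models page->buffers != NULL */
};

static bool toy_trylock(struct toy_page *p)
{
	/* true if we took the lock, false if someone else holds it */
	return !atomic_flag_test_and_set(&p->locked);
}

static void toy_unlock(struct toy_page *p)
{
	atomic_flag_clear(&p->locked);
}

/* Reap the page if, under the lock, it looks like dead swap cache. */
static bool toy_reap_if_dead(struct toy_page *p)
{
	bool dead;

	if (!toy_trylock(p))
		return false;  /* busy: skip it this pass instead of sleeping */

	/* Only the swap cache (plus any buffers) still holds a reference. */
	dead = p->swap_cache && p->dirty &&
	       (p->count - (p->has_buffers ? 1 : 0)) == 1;
	if (dead)
		p->swap_cache = false;  /* stand-in for removing it from swap cache */

	toy_unlock(p);
	return dead;
}
```

A try-lock presumably fits here because the scan holds pagemap_lru_lock, a spinlock, and so cannot sleep waiting for a page lock.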

Hugh

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-05 19:48 Marcelo Tosatti
@ 2001-06-05 22:14 ` Stephen C. Tweedie
  2001-06-05 20:46   ` Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen C. Tweedie @ 2001-06-05 22:14 UTC
  To: Marcelo Tosatti; +Cc: Alan Cox, André Dahlqvist, linux-mm

Hi,

On Tue, Jun 05, 2001 at 04:48:46PM -0300, Marcelo Tosatti wrote:
 
> I'm resending the reapswap patch for inclusion into -ac series. 

Isn't it broken in this state?  Checking page_count, page->buffers and
PageSwapCache without the appropriate locks is dangerous.  I think you
need the page lock at the very least before making this test.

--Stephen

* Re: [PATCH] reapswap for 2.4.5-ac10
  2001-06-05 22:14 ` Stephen C. Tweedie
@ 2001-06-05 20:46   ` Marcelo Tosatti
  2001-06-06 12:31     ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2001-06-05 20:46 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alan Cox, André Dahlqvist, linux-mm


On Tue, 5 Jun 2001, Stephen C. Tweedie wrote:

> Hi,
> 
> On Tue, Jun 05, 2001 at 04:48:46PM -0300, Marcelo Tosatti wrote:
>  
> > I'm resending the reapswap patch for inclusion into -ac series. 
> 
> Isn't it broken in this state?  Checking page_count, page->buffers and
> PageSwapCache without the appropriate locks is dangerous.

We hold the pagemap_lru_lock, so there will be no one doing lookups on
this swap page (get_swapcache_page() locks pagemap_lru_lock).

Am I overlooking something here? 

* [PATCH] reapswap for 2.4.5-ac10
@ 2001-06-05 19:48 Marcelo Tosatti
  2001-06-05 22:14 ` Stephen C. Tweedie
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2001-06-05 19:48 UTC (permalink / raw)
  To: Alan Cox; +Cc: André Dahlqvist, linux-mm

Hi Alan, 

I'm resending the reapswap patch for inclusion into -ac series. 

--- linux.orig/mm/vmscan.c	Wed May 30 14:51:21 2001
+++ linux/mm/vmscan.c	Wed May 30 16:18:41 2001
@@ -461,6 +461,28 @@
 			continue;
 		}
 
+		/*
+		 * FIXME: this is a hack.
+		 *
+		 * Check for dead swap cache pages and clean
+		 * them as fast as possible, before doing any other checks.
+		 *
+		 * Note: We are guaranteeing that this page will never 
+		 * be touched in the future because a dirty page with no
+		 * other users than the swapcache will never be referenced
+		 * again.
+		 * 
+		 */
+
+		if (PageSwapCache(page) && PageDirty(page) &&
+				(page_count(page) - !!page->buffers) == 1 &&
+				swap_count(page) == 1) { 
+			ClearPageDirty(page);
+			ClearPageReferenced(page);
+			page->age = 0;
+		}
+
+			
 		/* Page is or was in use?  Move it to the active list. */
 		if (PageReferenced(page) || page->age > 0 ||
 				(!page->buffers && page_count(page) > 1) ||
@@ -686,6 +708,21 @@
 			nr_active_pages--;
 			continue;
 		}
+		
+		/*
+		 * FIXME: hack
+		 *
+		 * Special case for dead swap cache pages.
+		 * See comment on page_launder() for more info.
+		 */
+		if (PageSwapCache(page) && PageDirty(page) &&
+				(page_count(page) - !!page->buffers) == 1 &&
+				swap_count(page) == 1) {
+			deactivate_page_nolock(page);
+			nr_deactivated++;
+			continue;
+		}
+
 
 		/* Do aging on the pages. */
 		if (PageTestandClearReferenced(page)) {


end of thread, other threads:[~2001-06-11  4:43 UTC | newest]

Thread overview: 19+ messages
2001-06-06  8:39 [PATCH] reapswap for 2.4.5-ac10 Jonathan Morton
2001-06-06 12:21 ` Andrew Morton
2001-06-06 12:47   ` Stephen C. Tweedie
2001-06-06 12:50   ` Jonathan Morton
2001-06-06 13:12     ` Andrew Morton
2001-06-06 16:44       ` Jonathan Morton
2001-06-06 17:01         ` Andrew Morton
2001-06-06 19:40           ` Jonathan Morton
2001-06-09  7:46         ` Rik van Riel
2001-06-06 19:18 ` Marcelo Tosatti
2001-06-06 21:13   ` Jonathan Morton
2001-06-07 14:45     ` John Stoffel
2001-06-07 16:45       ` Jonathan Morton
2001-06-11  4:43 ` Joseph A. Knapka
  -- strict thread matches above, loose matches on Subject: below --
2001-06-05 19:48 Marcelo Tosatti
2001-06-05 22:14 ` Stephen C. Tweedie
2001-06-05 20:46   ` Marcelo Tosatti
2001-06-06 12:31     ` Hugh Dickins
2001-06-06 19:17       ` Marcelo Tosatti
