linux-mm.kvack.org archive mirror
* [DATAPOINT] pre7-6 will not swap
@ 2000-05-05  8:07 Benjamin Redelings I
  0 siblings, 0 replies; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-05  8:07 UTC (permalink / raw)
  To: linux-mm

Hi,
	I just compiled pre7-6.  It seems more useable than pre7-5.  However,
it basically does not swap.  The first time there is any memory
pressure, it swaps 32 pages (128k), and it never swaps again. 
	In similar circumstances, pre7-4 has gotten up to 30MB swapped.  There
are many unused daemons running in my 64MB RAM.

	I also reverted to
  count = nr_threads / (priority +1)
 	though I didn't check carefully what this did.  Anyway, it doesn't
seem to make a difference.	
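
For reference, a minimal sketch of how the reverted formula scales with
priority. The helper name swap_out_count() and the sample values are
assumptions made for illustration; the message only gives the one-line
formula. The sketch also assumes that priority counts down towards 0 as
memory pressure grows.

```c
#include <assert.h>

/*
 * Hypothetical helper: the value "count" would take for a given number
 * of threads and scan priority, using the formula from the message.
 * Integer division means small thread counts round down to zero.
 */
int swap_out_count(int nr_threads, int priority)
{
	assert(nr_threads >= 0 && priority >= 0);
	return nr_threads / (priority + 1);
}
```

With 70 threads, priority 6 gives a count of 10 and priority 0 gives 70.
With only 5 threads, priority 6 gives 0 because of integer division.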

</datapoint>

-BenRI

UP PPro, IDE, 64MB RAM
-- 
"I want to be in the light, as He is in the Light,
 I want to shine like the stars in the heavens." - DC Talk, "In the
Light"
Benjamin Redelings I      <><     http://www.bol.ucla.edu/~bredelin/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-09  2:33       ` Linus Torvalds
@ 2000-05-09  3:31         ` Rajagopal Ananthanarayanan
  0 siblings, 0 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-09  3:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Quintela Carreira Juan J.,
	Andrea Arcangeli, Benjamin Redelings I, linux-mm

Linus Torvalds wrote:
> 
	[ ... ]
> 
> The "don't page out pages from zones that don't need it" test is a good
> test, but it turns out that it triggers a rather serious problem: the way
> the buffer cache dirty page handling is done is by having shrink_mmap() do
> a "try_to_free_buffers()" on the pages it encounters that have
> "page->buffer" set.
> 
> And doing that is quite important, because without that logic the buffers
> don't get written to disk in a timely manner, nor do already-written
> buffers get refiled to their proper lists. So you end up being "out of
> memory" - not because the machine is really out of memory, but because
> those buffers have a tendency to stick around if they aren't constantly
> looked after by "try_to_free_buffers()".
> 
> So the real fix ended up being to re-order the tests in shrink_mmap() a
> bit, so that try_to_free_buffers() is called even for pages that are on
> a good zone that doesn't need any real balancing..

I'm not entirely sure what effect this has, other than freeing the underlying
buffer_heads. The page itself is still skipped. Anyway, a brief examination
shows that you've changed several things here (in 7-7), so I'll have to spend
some more time on it to get the full picture.

> 
> [ time passes ]
> 
> pre7-7 is there now.
> 
>                 Linus

Unfortunately, my dbench test runs really badly with pre7-7.
Quantitatively, the amount of memory in the "cache" column of vmstat
is higher than before, and write()s start failing.

More later,

-- 
--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-09  1:52     ` Quintela Carreira Juan J.
  2000-05-09  2:28       ` Rajagopal Ananthanarayanan
@ 2000-05-09  2:33       ` Linus Torvalds
  2000-05-09  3:31         ` Rajagopal Ananthanarayanan
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2000-05-09  2:33 UTC (permalink / raw)
  To: Quintela Carreira Juan J.
  Cc: Rajagopal Ananthanarayanan, Andrea Arcangeli,
	Benjamin Redelings I, linux-mm


On 9 May 2000, Quintela Carreira Juan J. wrote:
> Hi Linus, 
>    I have tested two versions of the patch (against vanilla
> pre7-6), the first was to remove the test altogether (I think this is
> from Rajagopal):

I'll make my current pre7-7 available right away, to head off the
discussion.

I found out the real reason for the problem, and it was quite a lot more
subtle than I originally thought.

The "don't page out pages from zones that don't need it" test is a good
test, but it turns out that it triggers a rather serious problem: the way
the buffer cache dirty page handling is done is by having shrink_mmap() do
a "try_to_free_buffers()" on the pages it encounters that have
"page->buffer" set.

And doing that is quite important, because without that logic the buffers
don't get written to disk in a timely manner, nor do already-written
buffers get refiled to their proper lists. So you end up being "out of
memory" - not because the machine is really out of memory, but because
those buffers have a tendency to stick around if they aren't constantly
looked after by "try_to_free_buffers()".

So the real fix ended up being to re-order the tests in shrink_mmap() a
bit, so that try_to_free_buffers() is called even for pages that are on
a good zone that doesn't need any real balancing..
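
As a rough illustration of the reordering described above, here is a
minimal user-space sketch in C. It is not the pre7-7 code: struct
toy_page, its fields, and both scan functions are hypothetical names, and
setting buffers_visited merely stands in for calling
try_to_free_buffers().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; every name here is illustrative. */
struct toy_page {
	int has_buffers;       /* models "page->buffer" being set */
	int zone_needs_pages;  /* 1 if the page's zone still needs balancing */
	int buffers_visited;   /* set when the buffer pass looked at this page */
	int freed;             /* set when the page itself was reclaimed */
};

/* Zone test first: buffers on well-balanced zones are never visited. */
void scan_zone_test_first(struct toy_page *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		if (!p->zone_needs_pages)
			continue;               /* skipped before the buffer pass */
		if (p->has_buffers)
			p->buffers_visited = 1; /* stands in for try_to_free_buffers() */
		p->freed = 1;
	}
}

/* Buffer pass first: every page with buffers is looked after. */
void scan_buffers_first(struct toy_page *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		if (p->has_buffers)
			p->buffers_visited = 1; /* stands in for try_to_free_buffers() */
		if (!p->zone_needs_pages)
			continue;               /* the page itself is still skipped */
		p->freed = 1;
	}
}

/* Counts pages whose buffers were never looked at by a scan. */
size_t count_unvisited_buffers(const struct toy_page *pages, size_t n)
{
	size_t unvisited = 0;

	for (size_t i = 0; i < n; i++)
		if (pages[i].has_buffers && !pages[i].buffers_visited)
			unvisited++;
	return unvisited;
}
```

In this toy model a page with buffers on a well-balanced zone is left
unvisited by the first ordering and visited by the second, while the page
itself is skipped in both cases.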

[ time passes ] 

pre7-7 is there now.

		Linus


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-09  1:52     ` Quintela Carreira Juan J.
@ 2000-05-09  2:28       ` Rajagopal Ananthanarayanan
  2000-05-09  2:33       ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-09  2:28 UTC (permalink / raw)
  To: Quintela Carreira Juan J.
  Cc: Linus Torvalds, Andrea Arcangeli, Benjamin Redelings I, linux-mm

"Quintela Carreira Juan J." wrote:
> 
> >>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
> 
> linus> in vmscan.c, and that seems to be quite well-behaved too (but if somebody
> linus> has the energy to test the two different versions, I'd absolutely love to
> linus> hear results..)
> 
> Hi Linus,
>    I have tested two versions of the patch (against vanilla
> pre7-6), the first was to remove the test altogether (I think this is
> from Rajagopal):
> 
> --- pre7-6/mm/vmscan.c  Fri May  5 23:58:56 2000
> +++ testing/mm/vmscan.c Mon May  8 23:30:52 2000
> @@ -114,8 +114,9 @@
>          * Don't do any of the expensive stuff if
>          * we're not really interested in this zone.
>          */
> -       if (!page->zone->zone_wake_kswapd)
> +/*     if (!page->zone->zone_wake_kswapd)
>                 goto out_unlock;
> +*/
> 

I'm having the same experience. The one thing
that makes things better is not to look at the zone at all
in try_to_swap_out (as Juan points out above).

I'm also trying to see if we can do better in shrink_mmap().
Although my gprof statistics say that we can end up spending
91% of the time skipping pages, I'm not able to come up with
anything simple to make shrink_mmap behave better ... except
one change which makes swapping a lot less aggressive and shrink_mmap
a lot more aggressive: don't skip pages based on the zone's
high water mark if we are trying hard to free pages (my heuristic
was to stop skipping pages if priority in shrink_mmap was 3; YMMV).
I'm not entirely convinced that this is the right thing to do.

In all, I do think that try_to_swap_out shouldn't skip pages
based on zones. We now have evidence from 3 different "workloads"
pointing in this direction: my own dbench test, Juan's test above, and
Benjamin's "gaming" workload.


--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 19:35   ` Linus Torvalds
  2000-05-06  5:35     ` Benjamin Redelings I
@ 2000-05-09  1:52     ` Quintela Carreira Juan J.
  2000-05-09  2:28       ` Rajagopal Ananthanarayanan
  2000-05-09  2:33       ` Linus Torvalds
  1 sibling, 2 replies; 18+ messages in thread
From: Quintela Carreira Juan J. @ 2000-05-09  1:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rajagopal Ananthanarayanan, Andrea Arcangeli,
	Benjamin Redelings I, linux-mm

>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:


linus> in vmscan.c, and that seems to be quite well-behaved too (but if somebody
linus> has the energy to test the two different versions, I'd absolutely love to
linus> hear results..)

Hi Linus, 
   I have tested two versions of the patch (against vanilla
pre7-6), the first was to remove the test altogether (I think this is
from Rajagopal):

--- pre7-6/mm/vmscan.c	Fri May  5 23:58:56 2000
+++ testing/mm/vmscan.c	Mon May  8 23:30:52 2000
@@ -114,8 +114,9 @@
 	 * Don't do any of the expensive stuff if
 	 * we're not really interested in this zone.
 	 */
-	if (!page->zone->zone_wake_kswapd)
+/*	if (!page->zone->zone_wake_kswapd)
 		goto out_unlock;
+*/
 
 	/*
 	 * Ok, it's really dirty. That means that

The second one is Linus's suggestion, changing the test to:

diff -u -urN --exclude=CVS --exclude=*~ --exclude=.#* --exclude=TAGS pre7-6/mm/vmscan.c testing2/mm/vmscan.c
--- pre7-6/mm/vmscan.c	Fri May  5 23:58:56 2000
+++ testing2/mm/vmscan.c	Tue May  9 01:46:08 2000
@@ -114,7 +114,7 @@
 	 * Don't do any of the expensive stuff if
 	 * we're not really interested in this zone.
 	 */
-	if (!page->zone->zone_wake_kswapd)
+	if (page->zone->free_pages > page->zone->pages_high)
 		goto out_unlock;
 
 	/*

and the third one was the classzone-25 patch from Andrea.

The test is one of my tests:
    while (true); do time ./mmap002; done
with the size parameter adjusted to the size of the memory of the
system.

        The results are:

vanilla pre7-6:        Kills *all* my processes after two and a half
                       minutes.

pre7-6 + Rajagopal:    Works quite well; times are stable between 2m20
                       and 3m10 (didn't kill any processes).

pre7-6 + Linus:        Kills all the processes after 3m and a few
                       seconds.

pre7-6 + classzone-25: Between 2m08 and 2m23.

2.2.15:                Between 1m50 and 2m15 (the time is quite stable
                       around 1m50).  It has killed one process in 7
                       so far.

If you need more information, let me know.  As always, comments and
suggestions are welcome.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
@ 2000-05-07 19:30                   ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2000-05-07 19:30 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: riel, Benjamin Redelings I, linux-mm


On Sun, 7 May 2000, Rajagopal Ananthanarayanan wrote:
> 
> In the presence of unreferenced pages in zones with free_pages > pages_high,
> should shrink_mmap ever fail? Current shrink_mmap will
> always skip over the pages of such zones. This in turn
> can lead to swapping.

I think shrink_mmap() should fail for that case: it tells the logic that
calls it that it's time to stop calling shrink_mmap(), and go to vmscan
instead (so that next time we call shrink_mmap, we may in fact find some
pages to free).

If there really are tons of pages with free_pages > pages_high, then we
must have called shrink_mmap() for some other reason, so we're probably
interested in another zone altogether that isn't even a subset of the
"tons of memory" case (because if we had been interested in any class that
has the "lots of free memory" zone as a subset, then the logic in
__alloc_pages() would just have allocated it directly without worrying
about zone balancing at all).

		Linus


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07 17:53               ` Linus Torvalds
@ 2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
  2000-05-07 19:30                   ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-07 19:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: riel, Benjamin Redelings I, linux-mm

Linus Torvalds wrote:
> 

> 
> It can also make the aging less efficient.
> 
> But my real reason for disliking it is that I prefer conceptually simple
> approaches, and that one test just doesn't fit conceptually ;)

Linus & Rik, agreed that the second_scan logic I proposed
earlier was not perfect.

And, I agree that we should make things simpler. One question
about what shrink_mmap is trying to accomplish, conceptually:

In the presence of unreferenced pages in zones with free_pages > pages_high,
should shrink_mmap ever fail? Current shrink_mmap will
always skip over the pages of such zones. This in turn
can lead to swapping.


--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07 17:40             ` Rik van Riel
@ 2000-05-07 17:53               ` Linus Torvalds
  2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2000-05-07 17:53 UTC (permalink / raw)
  To: riel; +Cc: Rajagopal Ananthanarayanan, Benjamin Redelings I, linux-mm


On Sun, 7 May 2000, Rik van Riel wrote:

> On Sat, 6 May 2000, Linus Torvalds wrote:
> 
> >  - looking at "shrink_mmap()", my reaction would not be to add more
> >    complexity to it, but to remove the _one_ special case that looks at
> >    one specific zone:
> > 
> >         /* wrong zone?  not looped too often?    roll again... */
> >         if (page->zone != zone && count)
> >                 goto again;
> > 
> >    I would suggest just removing that test altogether. The page wasn't
> >    from a "wrong zone". It was just a different zone that also needed
> >    balancing.
> 
> The danger in this is that we could "use up" the remaining
> ticks on the count variable in do_try_to_free_pages() and
> end up with a failed rmqueue for the request...

I agree.

However, I think the logic should be
 - kswapd tries to keep all zones reasonably well balanced
 - but kswapd obviously cannot do a perfect job, especially with bursty
   allocations, so:
 - we should at some point start synchronously helping kswapd
 - if somebody has special requirements, they may not always be possible
   to satisfy under all circumstances.

Basically, it boils down to: we should try to do our best, but we cannot
do wonders and we should realize that too.

> Oh, and the return value for shrink_mmap() will still
> indicate success, even if we failed to free a page for
> the zone we intended ... we've already decided for that
> before we get into the loop or not.

You're right. The only downside to the extra test is that it unbalances
the page freeing, and can lead to (for example) not using swap very
efficiently because we're looping too much in shrink_mmap. Which actually
seems to be one of the symptoms right now, but it may of course be due to
something else too.

It can also make the aging less efficient.

But my real reason for disliking it is that I prefer conceptually simple
approaches, and that one test just doesn't fit conceptually ;)

		Linus


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-07  2:23           ` Linus Torvalds
@ 2000-05-07 17:40             ` Rik van Riel
  2000-05-07 17:53               ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2000-05-07 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rajagopal Ananthanarayanan, Benjamin Redelings I, linux-mm

On Sat, 6 May 2000, Linus Torvalds wrote:

> My personal inclination is along the lines of
>  - we never really care about any particular zone. We should make sure
>    that all zones get balanced, and that is what running kswapd will
>    eventually cause. 
>  - things like "shrink_mmap" and "vmscan" should both free any page from
>    any zone that is (a) a good candidate and (b) the zone is not yet
>    well-balanced.

double-nod

>  - looking at "shrink_mmap()", my reaction would not be to add more
>    complexity to it, but to remove the _one_ special case that looks at
>    one specific zone:
> 
>         /* wrong zone?  not looped too often?    roll again... */
>         if (page->zone != zone && count)
>                 goto again;
> 
>    I would suggest just removing that test altogether. The page wasn't
>    from a "wrong zone". It was just a different zone that also needed
>    balancing.

The danger in this is that we could "use up" the remaining
ticks on the count variable in do_try_to_free_pages() and
end up with a failed rmqueue for the request...

Oh, and the return value for shrink_mmap() will still
indicate success, even if we failed to free a page for
the zone we intended ... we've already decided for that
before we get into the loop or not.

But I agree that this test is wrong; it makes shrink_mmap()
loop too often compared to swap_out(), leading to worse page
aging in the swap cache and increased cpu use.

The solution could be to let do_try_to_free_pages() loop
more often than it does now ... increasing our chances
of freeing from the right zone while at the same time
not increasing the amount of work to be done (we need
to do it anyway, so why not do it now and have that
memory allocation succeed?)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  2000-05-06 14:03           ` Benjamin Redelings I
  2000-05-07  0:22           ` Rik van Riel
@ 2000-05-07  2:23           ` Linus Torvalds
  2000-05-07 17:40             ` Rik van Riel
  2 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2000-05-07  2:23 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: riel, Benjamin Redelings I, linux-mm


On Sat, 6 May 2000, Rajagopal Ananthanarayanan wrote:
> 
> I have a hunch. Follow this argument closely. In shrink_mmap we have:
> 
> ------------
> 	if (p_zone->free_pages > p_zone->pages_high)
>                         goto dispose_continue;
> ------
> 
> This page doesn't count against a valid try in shrink_mmap().

[ second-scan logic ]

Ugh.

This may be right, but it also gets my hackles up for being "too
contrived". It shouldn't be this complex.

Either "shrink_mmap()" should care about the zone or it shouldn't. If it
should, then it should just check the particular zone that it was passed
in (ie basically per-zone LRU again). If it shouldn't, then it probably
should just take the LRU as-is.

Also, one thing that keeps me wondering is whether the current
"try_to_free_pages()" is right at all.

Remember: the fundamental operation isn't really "try_to_free_pages()".
Nobody really ever calls that directly. The fundamental operation we
want to have is really just "balance_zones()", and it may be that
by isolating the "zone" we're aiming for early in balance_zones() we've
made a mistake.

My personal inclination is along the lines of
 - we never really care about any particular zone. We should make sure
   that all zones get balanced, and that is what running kswapd will
   eventually cause. 
 - things like "shrink_mmap" and "vmscan" should both free any page from
   any zone that is (a) a good candidate and (b) the zone is not yet
   well-balanced.
 - looking at "shrink_mmap()", my reaction would not be to add more
   complexity to it, but to remove the _one_ special case that looks at
   one specific zone:

        /* wrong zone?  not looped too often?    roll again... */
        if (page->zone != zone && count)
                goto again;

   I would suggest just removing that test altogether. The page wasn't
   from a "wrong zone". It was just a different zone that also needed
   balancing.

That single test stands out as being zone-specific instead of geared
towards the bigger goal of "let's balance the zones". It would also cause
"shrink_mmap()" to =return= failure, even if shrink_mmap() actually ended
up doing real work. Which just seems wrong.

So instead of making that test more complicated and adding a "phase"
counter, why not just remove it? Then "shrink_mmap()" will start failing
only when it _truly_ fails - ie when it no longer can find any pages really
worth freeing.

		Linus "gut instinct" Torvalds


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  2000-05-06 14:03           ` Benjamin Redelings I
@ 2000-05-07  0:22           ` Rik van Riel
  2000-05-07  2:23           ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Rik van Riel @ 2000-05-07  0:22 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: Benjamin Redelings I, Linus Torvalds, linux-mm

On Sat, 6 May 2000, Rajagopal Ananthanarayanan wrote:

> What do you guys think?

I think you may want to take a look at
page_alloc.c::__alloc_pages(), where the kernel balances
between different zones...

- kswapd is woken up when zone->free_pages < zone->pages_low
- kswapd goes to sleep when it has freed enough pages in the
  current zone
- if another zone has a lower memory load, we'll free some
  "extra" pages in that other zone, up to zone->pages_high

This should provide enough balancing between zones...
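
The watermark behaviour listed above can be sketched in user-space C as
follows. This is a simplified model, not the code in
page_alloc.c::__alloc_pages(): struct toy_zone, toy_note_allocation() and
toy_kswapd_pass() are hypothetical names, and clearing the wake flag only
once free_pages exceeds pages_high is an assumption of the sketch.

```c
#include <assert.h>

/* Toy stand-in for a memory zone; every name here is illustrative. */
struct toy_zone {
	long free_pages;
	long pages_low;    /* below this, kswapd should be woken */
	long pages_high;   /* above this, the zone counts as balanced */
	int  wake_kswapd;
};

/* Account for an allocation and flag the zone once it drops below pages_low. */
void toy_note_allocation(struct toy_zone *z, long nr)
{
	assert(nr >= 0 && nr <= z->free_pages);
	z->free_pages -= nr;
	if (z->free_pages < z->pages_low)
		z->wake_kswapd = 1;
}

/*
 * One background pass: free pages one at a time while the zone is
 * flagged, and clear the flag once free_pages exceeds pages_high.
 * Returns the number of pages freed in this pass.
 */
long toy_kswapd_pass(struct toy_zone *z, long reclaimable)
{
	long freed = 0;

	while (z->wake_kswapd && freed < reclaimable) {
		z->free_pages++;
		freed++;
		if (z->free_pages > z->pages_high)
			z->wake_kswapd = 0;
	}
	return freed;
}
```

The gap between pages_low and pages_high gives hysteresis in this model:
a zone that dips just under pages_low is refilled well past it, so the
flag does not flip on every allocation.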

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 21:46       ` Rik van Riel
@ 2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  2000-05-06 14:03           ` Benjamin Redelings I
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-06 22:24 UTC (permalink / raw)
  To: riel; +Cc: Benjamin Redelings I, Linus Torvalds, linux-mm

Rik van Riel wrote:
> 
> On Fri, 5 May 2000, Benjamin Redelings I wrote:
> 
> >       It looks like some processes (my unused daemons) are
> > scanned only once, and then get stuck at the end of some list?
> > Is that a possible explanation? <guessing> Perhaps Rik's moving
> > list-head idea is needed? </guessing>.
> 
> I'm busy implementing Davem's active/inactive list proposal
> to replace the current page/swapcache. I don't know if it'll
> work really well though, so research into other directions
> is very much welcome ;)
> 

Again, my experience with skipping pages whose zones have
(free_pages > pages_high) in try_to_swap_out is similar to
Benjamin's ... the system behaves better than 7-4, but
isn't as good as without any zone skipping.

Once again, I'm back to asking, should we be swapping at all?
Shouldn't shrink_mmap() be finding pages to throw out?

I have a hunch. Follow this argument closely. In shrink_mmap we have:

------------
	if (p_zone->free_pages > p_zone->pages_high)
                        goto dispose_continue;
------

This page doesn't count against a valid try in shrink_mmap().
Soon, we run out of pages to look at, but "count" in shrink_mmap is
still high. So, we go back to scanning the lru list all over again.
If some pages' reference count was flipped in the first loop, good.
If it wasn't, and all that remained was unreferenced pages whose
zones have reached the high water mark, then they won't be victimized,
because the same test above will skip the page again!

Still on the second loop, shrink_mmap will look at other pages,
for instance ones with an I/O in flight, and _those_ pages do tally
against "count" ... so, in essence, we have skipped unreferenced pages
belonging to zones at their high water mark, forever. This is wrong.

My solution is simple. Have a variable, "second_scan" initialized to zero,
at the top of shrink_mmap(). Set "second_scan = 1" at the bottom of the loop
in shrink_mmap:

---------------
	/* wrong zone?  not looped too often?    roll again... */
        if (page->zone != zone && count) {
		second_scan = 1;
                goto again;
	}
-------------

Now the pages_high test will be changed to:

-----------
	 if (p_zone->free_pages > p_zone->pages_high && !second_scan)
                        goto dispose_continue;
-----------

That is, victimize pages in zones with lots of free_pages if, having
scanned once, we didn't find anything.
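
The argument can be checked against a small user-space model. This is
only a sketch under stated assumptions: struct toy_page and toy_shrink()
are hypothetical, the real shrink_mmap() does far more per page, and the
model sets second_scan after one complete pass over the list rather than
in the "wrong zone" branch shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page on the LRU; every name here is illustrative. */
struct toy_page {
	int zone_above_high; /* zone has free_pages > pages_high */
	int busy;            /* e.g. I/O in flight: costs a try, cannot be freed */
	int freed;
};

/*
 * Toy scan: walks the list repeatedly until "count" tries are used up
 * or one page is freed. Returns 1 if a page was freed, 0 otherwise.
 */
int toy_shrink(struct toy_page *lru, size_t n, int count, int use_second_scan)
{
	int second_scan = 0;

	assert(count >= 0);
	while (count > 0) {
		int tries_before = count;

		for (size_t i = 0; i < n && count > 0; i++) {
			struct toy_page *p = &lru[i];

			if (p->freed)
				continue;
			if (p->zone_above_high && !second_scan)
				continue;   /* skipped without using up a try */
			count--;
			if (p->busy)
				continue;   /* a try was spent, nothing freed */
			p->freed = 1;
			return 1;
		}
		if (use_second_scan && !second_scan)
			second_scan = 1;    /* relax the zone test for later passes */
		else if (count == tries_before)
			break;              /* a full pass made no tries; give up */
	}
	return 0;
}
```

With one unreferenced page in a zone above its high water mark and one
busy page, the model never frees anything when use_second_scan is 0, and
frees the first page on the second pass when it is 1.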

If you are worried about unreferenced pages not being looked at in
the second_scan, we can change it to a third_scan.

Now, the final argument: since this page was skipped by shrink_mmap(),
the test in try_to_swap_out that Benjamin, Linus and I have been playing
around with becomes important. Without it, pages in zones with lots of
free memory neither get "shrunk" nor get swapped.

What do you guys think?

--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06  5:35     ` Benjamin Redelings I
@ 2000-05-06 21:46       ` Rik van Riel
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
  0 siblings, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2000-05-06 21:46 UTC (permalink / raw)
  To: Benjamin Redelings I; +Cc: Linus Torvalds, Rajagopal Ananthanarayanan, linux-mm

On Fri, 5 May 2000, Benjamin Redelings I wrote:

> 	It looks like some processes (my unused daemons) are
> scanned only once, and then get stuck at the end of some list?  
> Is that a possible explanation? <guessing> Perhaps Rik's moving
> list-head idea is needed? </guessing>.

I'm busy implementing Davem's active/inactive list proposal
to replace the current page/swapcache. I don't know if it'll
work really well though, so research into other directions
is very much welcome ;)

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 17:12 ` Rajagopal Ananthanarayanan
  2000-05-06  4:25   ` Benjamin Redelings I
@ 2000-05-06 19:35   ` Linus Torvalds
  2000-05-06  5:35     ` Benjamin Redelings I
  2000-05-09  1:52     ` Quintela Carreira Juan J.
  1 sibling, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2000-05-06 19:35 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: Benjamin Redelings I, linux-mm


On Sat, 6 May 2000, Rajagopal Ananthanarayanan wrote:
> 
> Linus has taken in the fix to "old" vs. "young" in shrink_mmap,
> and taken out the aggressive counter change (also in shrink_mmap).
> But apparently another change in try_to_swap_out is causing problems.
> I don't have an analytical evaluation, but empirically, if I remove this
> in try_to_swap_out (mm/vmscan.c), dbench runs ok.

Yes. I was thinking some more about it, and it is using the wrong test. It
must use the same test as the one in page_alloc.c to determine whether
a zone is "interesting" or not - otherwise you get into a situation where
page_alloc.c doesn't want to allocate from a zone because it's not quite
empty enough, but at the same time vmscan doesn't want to free pages from
the zone because it's not quite full enough.

No wonder that if you get to that situation, the allocator starts getting
unhappy and says "no free pages".

> --------------- mm/vmscan.c around line 113 --------------
>         /*
>          * Don't do any of the expensive stuff if
>          * we're not really interested in this zone.
> 	 */
>         if (!page->zone->zone_wake_kswapd)
>                 goto out_unlock;

Make this test be the same as in "__alloc_pages()" in mm/page_alloc.c, and
it should be ok. The test there is:

                /* Are we supposed to free memory? Don't make it worse.. */
                if (!z->zone_wake_kswapd && z->free_pages > z->pages_low) {

and I suspect that we might actually make the vmscan.c test more eager to
swap stuff out: my private source tree says

        /*
         * Don't do any of the expensive stuff if
         * we're not really interested in this zone.
         */
	if (z->free_pages > z->pages_high) 
		goto out_unlock;

in vmscan.c, and that seems to be quite well-behaved too (but if somebody
has the energy to test the two different versions, I'd absolutely love to
hear results..)
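
The mismatch between the two tests can be made concrete with a small
user-space sketch. The three predicates mirror the conditions quoted in
this message; struct toy_zone, the function names, and zone_is_stuck()
are hypothetical, and whether a real zone ever sits in the "stuck" state
depends on kernel code not shown in this thread.

```c
#include <assert.h>

/* Toy stand-in for a memory zone; every name here is illustrative. */
struct toy_zone {
	long free_pages;
	long pages_low;
	long pages_high;
	int  zone_wake_kswapd;
};

/* Allocator-side test from __alloc_pages(): happy to allocate from here. */
int allocator_takes_from(const struct toy_zone *z)
{
	return !z->zone_wake_kswapd && z->free_pages > z->pages_low;
}

/* pre7-6 vmscan test: only work on zones that have the wake flag set. */
int vmscan_frees_from_flag(const struct toy_zone *z)
{
	return z->zone_wake_kswapd != 0;
}

/* Private-tree variant: skip only zones that are above pages_high. */
int vmscan_frees_from_high(const struct toy_zone *z)
{
	return !(z->free_pages > z->pages_high);
}

/* A zone is "stuck" when the allocator avoids it and vmscan skips it. */
int zone_is_stuck(const struct toy_zone *z,
		  int (*vmscan_frees)(const struct toy_zone *))
{
	assert(z && vmscan_frees);
	return !allocator_takes_from(z) && !vmscan_frees(z);
}
```

For a zone with free_pages at or below pages_low and the wake flag
clear, the flag-based test leaves the zone stuck in this model, while the
pages_high variant lets vmscan free from it.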

		Linus


* Re: [DATAPOINT] pre7-6 will not swap
       [not found] <8evk0f$7jote$1@fido.engr.sgi.com>
@ 2000-05-06 17:12 ` Rajagopal Ananthanarayanan
  2000-05-06  4:25   ` Benjamin Redelings I
  2000-05-06 19:35   ` Linus Torvalds
  0 siblings, 2 replies; 18+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-06 17:12 UTC (permalink / raw)
  To: Benjamin Redelings I, torvalds; +Cc: linux-mm

Benjamin Redelings I wrote:
> 
> Hi,
>         I just compiled pre7-6.  It seems more useable than pre7-5.  However,
> it basically does not swap.  The first time there is any memory
> pressure, it swaps 32 pages (128k), and it never swaps again.
>         In similar circumstances, pre7-4 has gotten up to 30Mb swapped.  There
> are many unused daemons running in my 64Mb RAM.
> 
>         I also reverted to
>   count = nr_threads / (priority +1)
>         though I didn't check carefully what this did.  Anyway, it doesn't
> seem to make a difference.
> 


Yes, your observation is a good summary of 7-6 behaviour.
I'm also not seeing good results.  The writes from dbench
start failing; I guess the grab_page_cache in generic_file_write
is returning ENOMEM.

Again, as you say, the system doesn't want to swap after an initial
flurry of activity.

Linus has taken in the fix to "old" vs. "young" in shrink_mmap,
and taken out the aggressive counter change (also in shrink_mmap).
But apparently another change in try_to_swap_out is causing problems.
I don't have an analytical explanation, but empirically, if I remove this
test in try_to_swap_out (mm/vmscan.c), dbench runs ok.

--------------- mm/vmscan.c around line 113 --------------
        /*
         * Don't do any of the expensive stuff if
         * we're not really interested in this zone.
	 */
        if (!page->zone->zone_wake_kswapd)
                goto out_unlock;
----------------------------------------------------------

Benjamin, can you comment this line out and see if it improves things?

Linus, one thing crossed my mind. With the above change swap_out()
will "count" as having tried this process, although the zone may
never need balancing. Aren't the initial system threads at the
beginning of the task_list? If so, do you think their zones may
never need balancing?  ... and hence swap_out in essence gives up early?
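
The hypothesis above can be sketched as a toy model in C. This is not the
kernel's swap_out(); `struct toy_task`, `toy_swap_out()` and the budget
parameter `count` are hypothetical names made up for the illustration. The
only assumption taken from the question is that every task visited uses up
one unit of the budget, whether or not the zone test lets anything be
swapped from it.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per task, in task-list order (illustration only). */
struct toy_task {
    bool zone_needs_balancing;  /* would the zone test allow a swap-out? */
};

/*
 * Walk the list from the head with a fixed budget of attempts.
 * Returns the index of the first task that could be swapped from,
 * or -1 if the budget ran out before one was found.
 */
int toy_swap_out(const struct toy_task *tasks, size_t ntasks, int count)
{
    for (size_t i = 0; i < ntasks && count > 0; i++, count--) {
        if (tasks[i].zone_needs_balancing)
            return (int)i;
        /* Skipped by the zone test, but the attempt is still "counted". */
    }
    return -1;
}
```

With three skipped tasks at the head of the list and a budget of three, the
walk returns -1 without ever reaching a swappable task; with a budget of
four it reaches the fourth entry. Whether the real task list is ordered this
way is exactly the open question.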





--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 22:24         ` Rajagopal Ananthanarayanan
@ 2000-05-06 14:03           ` Benjamin Redelings I
  2000-05-07  0:22           ` Rik van Riel
  2000-05-07  2:23           ` Linus Torvalds
  2 siblings, 0 replies; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-06 14:03 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: riel, Linus Torvalds, linux-mm

> Once again, I'm back to asking, should we be swapping at all?
> Shouldn't shrink_mmap() be finding pages to throw out?
> 

That's a good question.  However, it also misses part of the point.

The reason for the bad performance is not mainly that there is too
little swapout.  The WRONG PAGES are swapped out!  The system spends
most of its I/O bandwidth doing page-ins.

Remember, on my system, the VM swapped out the quake ENGINE, which was
running 100% of the time, in order to keep unused daemons blocking on
select in core.

That is just wrong.  Right?

-benRI
-- 
"I want to be in the light, as He is in the Light,
 I want to shine like the stars in the heavens." - DC Talk, "In the
Light"
Benjamin Redelings I      <><     http://www.bol.ucla.edu/~bredelin/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 19:35   ` Linus Torvalds
@ 2000-05-06  5:35     ` Benjamin Redelings I
  2000-05-06 21:46       ` Rik van Riel
  2000-05-09  1:52     ` Quintela Carreira Juan J.
  1 sibling, 1 reply; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-06  5:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rajagopal Ananthanarayanan, linux-mm

> and I suspect that we might actually make the vmscan.c test more eager to
> swap stuff out: my private source tree says
> 
>         /*
>          * Don't do any of the expensive stuff if
>          * we're not really interested in this zone.
>          */
>         if (z->free_pages > z->pages_high)
>                 goto out_unlock;
> 
> in vmscan.c, and that seems to be quite well-behaved too (but if somebody
> has the energy to test the two different versions, I'd absolutely love to
> hear results..)

Although I would have thought that putting this test in would have no
effect on performance, it actually kills performance.  Since the test
appears very reasonable, I think this means we have a bug elsewhere, and
that removing this reasonable test cures a symptom, but not the bug.

OK, details.
	With Linus's test, the kernel does not want to swap much.  It is a
little better than the previous version of the test, but it swaps much less
than if the test were removed.  One result is that the cache shrinks to low
sizes like 14Mb/64Mb, when there are several unused daemons that could
be swapped out.
	Also, the WRONG PROCESSES are swapped out.  Several large daemons that
were swapped out w/o the test, are now left in core.  Instead, RUNNING
programs are swapped out, like netscape.  Even worse, running xquake and
'tar -xf linux.tar' makes the system non-responsive - the VM continues
paging the quake ENGINE in and out and in and out :P
	It looks like some processes (my unused daemons) are scanned only once,
and then get stuck at the end of some list?  Is that a possible
explanation? <guessing> Perhaps Rik's moving list-head idea is needed?
</guessing>.

carry on,
-benRI
-- 
"I want to be in the light, as He is in the Light,
 I want to shine like the stars in the heavens." - DC Talk, "In the
Light"
Benjamin Redelings I      <><     http://www.bol.ucla.edu/~bredelin/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [DATAPOINT] pre7-6 will not swap
  2000-05-06 17:12 ` Rajagopal Ananthanarayanan
@ 2000-05-06  4:25   ` Benjamin Redelings I
  2000-05-06 19:35   ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Benjamin Redelings I @ 2000-05-06  4:25 UTC (permalink / raw)
  To: Rajagopal Ananthanarayanan; +Cc: torvalds, linux-mm

> --------------- mm/vmscan.c around line 113 --------------
>         /*
>          * Don't do any of the expensive stuff if
>          * we're not really interested in this zone.
>          */
>         if (!page->zone->zone_wake_kswapd)
>                 goto out_unlock;
> ----------------------------------------------------------
> 
> Benjamin, can you comment this line out and see if it improves things?

	OK, reverted this.  I also reverted to "count = nr_threads / (priority
+ 1)", I hope that doesn't cause a problem.
	With the above patch reverted, the system swaps amazingly well, as
opposed to almost never.  It swaps out tasks in the correct order.  It
is also a bit more aggressive than pre7-4, swapping out unused daemons
even when there is lots of cache that presumably could be freed (e.g.
BEFORE I run netscape).  But this seems to be the right decision, given
that that stuff isn't swapped back in later.
	After running lots of processes, I can also say that this kernel does
not have a permanent cache size of 30Mb/64Mb.  It actually decreases
eventually instead of swapping out foreground programs like before.
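
For reference, the reverted expression is plain integer arithmetic, and a
small C sketch shows how it scales. `swap_count()` is a hypothetical helper
name; the sketch assumes `priority` is a small non-negative integer (the
thread does not state its range) and that the division truncates as C
integer division does.

```c
/* count = nr_threads / (priority + 1), as quoted above (integer division). */
int swap_count(int nr_threads, int priority)
{
    return nr_threads / (priority + 1);
}
```

With 60 threads this gives 8 at priority 6, 15 at priority 3 and 60 at
priority 0, so the number of attempts grows as the priority value falls.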


	Does this mean that the zone_wake_kswapd essentially has the wrong
value, so that we don't even balance the zone for which we were called?

-benRI
UP PPro, 64MB RAM, IDE
-- 
"I want to be in the light, as He is in the Light,
 I want to shine like the stars in the heavens." - DC Talk, "In the
Light"
Benjamin Redelings I      <><     http://www.bol.ucla.edu/~bredelin/

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2000-05-09  3:31 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-05-05  8:07 [DATAPOINT] pre7-6 will not swap Benjamin Redelings I
     [not found] <8evk0f$7jote$1@fido.engr.sgi.com>
2000-05-06 17:12 ` Rajagopal Ananthanarayanan
2000-05-06  4:25   ` Benjamin Redelings I
2000-05-06 19:35   ` Linus Torvalds
2000-05-06  5:35     ` Benjamin Redelings I
2000-05-06 21:46       ` Rik van Riel
2000-05-06 22:24         ` Rajagopal Ananthanarayanan
2000-05-06 14:03           ` Benjamin Redelings I
2000-05-07  0:22           ` Rik van Riel
2000-05-07  2:23           ` Linus Torvalds
2000-05-07 17:40             ` Rik van Riel
2000-05-07 17:53               ` Linus Torvalds
2000-05-07 19:13                 ` Rajagopal Ananthanarayanan
2000-05-07 19:30                   ` Linus Torvalds
2000-05-09  1:52     ` Quintela Carreira Juan J.
2000-05-09  2:28       ` Rajagopal Ananthanarayanan
2000-05-09  2:33       ` Linus Torvalds
2000-05-09  3:31         ` Rajagopal Ananthanarayanan
