[PATCH] swapin readahead

linux-mm.kvack.org archive mirror

* [PATCH] swapin readahead
@ 1998-11-26 23:23 Rik van Riel
  1998-12-01 15:13 ` Stephen C. Tweedie
  0 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 1998-11-26 23:23 UTC
  To: Linux MM

Hi,

here is a very first primitive version of a swapin
readahead patch. It seems to give much increased
throughput to swap and the desktop switch time has
decreased noticeably.

The checks are all needed. The first two checks are there
to avoid annoying messages from swap_state.c :)) The third
check is to make sure we always keep at least as much
swapout bandwidth as swapin bandwidth. We need that to keep
the system alive under heavy circumstances.

I am now testing the patch quite heavily (200+ swap IOs/second)
without any errors showing up in my xconsole, so I guess that
means you can have fun too :)

cheers,

Rik -- now completely used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

--- mm/page_alloc.c.orig	Thu Nov 26 11:26:49 1998
+++ mm/page_alloc.c	Thu Nov 26 23:48:42 1998
@@ -370,9 +370,28 @@
 	pte_t * page_table, unsigned long entry, int write_access)
 {
 	unsigned long page;
+	int i;
 	struct page *page_map;
+	unsigned long offset = SWP_OFFSET(entry);
+	struct swap_info_struct *swapdev = SWP_TYPE(entry) + swap_info;
 	
 	page_map = read_swap_cache(entry);
+
+	/*
+	 * Primitive swap readahead code. We simply read the
+	 * next 16 entries in the swap area. The break below
+	 * is needed or else the request queue will explode :)
+	 */
+	for (i = 1; i++ < 16;) {
+		offset++;
+		if (!swapdev->swap_map[offset] || offset >= swapdev->max
+				|| atomic_read(&nr_async_pages) >
+				pager_daemon.swap_cluster / 2)
+			break;
+		read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset),
+0);
+			break;
+	}
 
 	if (pte_val(*page_table) != entry) {
 		if (page_map)
--- mm/page_io.c.orig	Thu Nov 26 11:26:49 1998
+++ mm/page_io.c	Thu Nov 26 11:30:43 1998
@@ -60,7 +60,7 @@
 	}
 
 	/* Don't allow too many pending pages in flight.. */
-	if (atomic_read(&nr_async_pages) > SWAP_CLUSTER_MAX)
+	if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster)
 		wait = 1;
 
 	p = &swap_info[type];

--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org


* Re: [PATCH] swapin readahead
  1998-11-26 23:23 [PATCH] swapin readahead Rik van Riel
@ 1998-12-01 15:13 ` Stephen C. Tweedie
  1998-12-01 15:41   ` Rik van Riel
  1998-12-01 15:51   ` Zlatko Calusic
  0 siblings, 2 replies; 15+ messages in thread
From: Stephen C. Tweedie @ 1998-12-01 15:13 UTC
  To: Rik van Riel; +Cc: Linux MM, Stephen Tweedie

Hi Rik,

In article <Pine.LNX.3.96.981127001214.445A-100000@mirkwood.dummy.home>,
Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:

> here is a very first primitive version of a swapin
> readahead patch. It seems to give much increased
> throughput to swap and the desktop switch time has
> decreased noticeably.

> The checks are all needed. The first two checks are there
> to avoid annoying messages from swap_state.c :)) 

There's a third check needed, I think, which probably accounts for the
swap_duplicate errors people have been noting.  You need to skip pages
which are marked as locked in the swap_lockmap, or the async page read
may block (you might be trying to read in a page which is still being
written to swap).  In this case, by the time you have slept, the swap
entry is not necessarily still in use, so you may end up reading an
unused swap entry.  That would certainly lead to swap_duplicate
warnings, although I think they should be benign.

--Stephen
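
A minimal user-space sketch of the check Stephen describes, for
illustration only. It is not kernel code: struct swap_area,
pick_readahead() and the READAHEAD_MAX window are hypothetical names,
and plain arrays stand in for swap_map and swap_lockmap. The sketch
tests the bound before it touches swap_map, stops at the first unused
entry, and skips locked entries instead of blocking on them.

```c
#include <assert.h>
#include <stddef.h>

#define READAHEAD_MAX 16	/* assumed window size, not from the patch */

/* Hypothetical stand-in for one swap area. */
struct swap_area {
	const unsigned char *swap_map;	/* use count per entry, 0 = unused */
	const unsigned char *lockmap;	/* nonzero = I/O in flight on the entry */
	unsigned long max;		/* number of entries in the area */
};

/*
 * Collect the offsets after 'offset' that are safe to read ahead.
 * Returns how many offsets were stored in 'out', which must have
 * room for READAHEAD_MAX elements.
 */
static size_t pick_readahead(const struct swap_area *sa, unsigned long offset,
			     unsigned long *out)
{
	size_t n = 0;
	int i;

	assert(sa != NULL && out != NULL);
	for (i = 0; i < READAHEAD_MAX; i++) {
		offset++;
		if (offset >= sa->max)		/* bounds check comes first */
			break;
		if (!sa->swap_map[offset])	/* unused entry: stop here */
			break;
		if (sa->lockmap[offset])	/* still being written: skip it */
			continue;
		out[n++] = offset;
	}
	return n;
}
```

In a real kernel the lock state can change between the test and the
read, so this only shows the ordering of the checks, not the locking.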

* Re: [PATCH] swapin readahead
  1998-12-01 15:13 ` Stephen C. Tweedie
@ 1998-12-01 15:41   ` Rik van Riel
  1998-12-01 15:51   ` Zlatko Calusic
  1 sibling, 0 replies; 15+ messages in thread
From: Rik van Riel @ 1998-12-01 15:41 UTC
  To: Stephen C. Tweedie; +Cc: Linux MM

On Tue, 1 Dec 1998, Stephen C. Tweedie wrote:
> In article <Pine.LNX.3.96.981127001214.445A-100000@mirkwood.dummy.home>,
> Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:
> 
> > here is a very first primitive version of a swapin

I am at version 3 now (4 in a few minutes?) but your
message still seems needed...

> > The checks are all needed. The first two checks are there
> > to avoid annoying messages from swap_state.c :)) 
> 
> There's a third check needed, I think, which probably accounts for the
> swap_duplicate errors people have been noting.  You need to skip pages
> which are marked as locked in the swap_lockmap, or the async page read
> may block

OK, I'll add this test and try again.

I also noted something else -- when I free a lot of memory
(80+ MB gimp picture) the system swaps itself to death.
Could that also be because of issues with locked/unlocked
swap_cache pages which in some magical way duplicate themselves
and fill up memory? I have seen 600+ MB of shared memory :(

cheers,

Rik -- now completely used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+


* Re: [PATCH] swapin readahead
  1998-12-01 15:13 ` Stephen C. Tweedie
  1998-12-01 15:41   ` Rik van Riel
@ 1998-12-01 15:51   ` Zlatko Calusic
  1998-12-01 16:42     ` Rik van Riel
  1 sibling, 1 reply; 15+ messages in thread
From: Zlatko Calusic @ 1998-12-01 15:51 UTC
  To: Stephen C. Tweedie; +Cc: Rik van Riel, Linux MM

"Stephen C. Tweedie" <sct@redhat.com> writes:

> Hi Rik,
> 
> In article <Pine.LNX.3.96.981127001214.445A-100000@mirkwood.dummy.home>,
> Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:
> 
> > here is a very first primitive version of a swapin
> > readahead patch. It seems to give much increased
> > throughput to swap and the desktop switch time has
> > decreased noticeably.
> 
> > The checks are all needed. The first two checks are there
> > to avoid annoying messages from swap_state.c :)) 
> 
> There's a third check needed, I think, which probably accounts for the
> swap_duplicate errors people have been noting.  You need to skip pages
> which are marked as locked in the swap_lockmap, or the async page read
> may block (you might be trying to read in a page which is still being
> written to swap).  In this case, by the time you have slept, the swap
> entry is not necessarily still in use, so you may end up reading an
> unused swap entry.  That would certainly lead to swap_duplicate
> warnings, although I think they should be benign.
> 

Those warnings are probably benign, but the patch as a whole has at
least one big engineering problem. Unfortunately, I'm trying to
understand other parts of the MM code, so currently I don't have the
time needed to play with the swapin readahead more.

But, what I observed is that memory gets lost in some strange way. It
is possible that lost pages are in the swap cache, and it looks like
nothing frees them at all.

The problem gets worse as you push MM to its limit.

I don't understand how Rik doesn't notice this, but I'm able to
deadlock the machine in a matter of minutes by running a simple
program that mallocs and reads memory.

Needless to say, performance measurements are postponed until my
machine can stay alive after applying the patch. :)

To help further debugging, I'm appending source of a very simple
program that is one of the tests I like to run to see what happened to
MM recently. :)

Call it hogmem.c, compile it and then run it with two arguments. The
first is how much memory to allocate in MB (make it slightly bigger
than your physical memory, to make the system swap), and the second is
how many times to read the memory (some small number).

For example, I'm using it like hogmem 100 3 (with 64MB of RAM).

After it finishes (that won't happen if you apply the swapin readahead
patch, you've been warned!), it will report memory reading speed in
MB/sec. That is, swapping speed, if your argv[1] was large enough to
make life painful for your disk(s). :)

I'm looking forward to your comments on the subject.

Rik, hopefully this helps you to find a problem with logic in your
patch.

Also, looking at the patch source, it looks like the comment there is
completely misleading, as the for() loop is not doing anything, at
all. The patch can be shortened to do offset++, if() and only ONE
read_swap_cache_async, if I'm understanding it correctly. Sorry, I'm
not including it here, have some other things to do fast.

hogmem.c:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <limits.h>
#include <signal.h>
#include <time.h>
#include <sys/times.h>

#define MB (1024 * 1024)

int nr, intsize, i, t;
clock_t st;
struct tms dummy;

void intr(int intnum)
{
    clock_t et = times(&dummy);
    long hz = sysconf(_SC_CLK_TCK);	/* replaces the obsolete CLK_TCK macro */

    printf("\nMemory speed: %.2f MB/sec\n", (2.0 * t * hz * nr + (double) i * hz * intsize / MB) / (et - st));
    exit(EXIT_SUCCESS);
}

int main(int argc, char **argv)
{
    int max, nr_times, *area, c;

    setbuf(stdout, 0);
    signal(SIGINT, intr);
    signal(SIGTERM, intr);
    intsize = sizeof(int);
    if (argc < 2 || argc > 3) {
	fprintf(stderr, "Usage: hogmem <MB> [times]\n");
	exit(EXIT_FAILURE);
    }
    nr = atoi(argv[1]);
    if (argc == 3)
	nr_times = atoi(argv[2]);
    else
	nr_times = INT_MAX;
    area = malloc(nr * MB);
    if (area == NULL) {
	perror("malloc");
	exit(EXIT_FAILURE);
    }
    max = nr * MB / intsize;
    st = times(&dummy);
    for (c = 0; c < nr_times; c++)
    {
	for (i = 0; i < max; i++)
	    area[i]++;
	t++;
	putchar('.');
    }
    i = 0;
    intr(0);
    /* notreached */
    exit(EXIT_SUCCESS);
}

Regards,
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
	 Suicide is the most sincere form of self criticism.

* Re: [PATCH] swapin readahead
  1998-12-01 15:51   ` Zlatko Calusic
@ 1998-12-01 16:42     ` Rik van Riel
  1998-12-01 17:20       ` Zlatko Calusic
  0 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 1998-12-01 16:42 UTC
  To: Zlatko Calusic; +Cc: Stephen C. Tweedie, Linux MM

On 1 Dec 1998, Zlatko Calusic wrote:
> "Stephen C. Tweedie" <sct@redhat.com> writes:
> > In article <Pine.LNX.3.96.981127001214.445A-100000@mirkwood.dummy.home>,
> > Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:
> > 
> > > here is a very first primitive version of a swapin
> > > readahead patch. It seems to give much increased
> > > throughput to swap and the desktop switch time has
> > > decreased noticeably.
> > 
> > There's a third check needed, I think, which probably accounts for the
> > swap_duplicate errors people have been noting.  You need to skip pages
> > which are marked as locked in the swap_lockmap, or the async page read
> 
> Those warnings are probably benign, but the patch as a whole has at
> least one big engineering problem. Unfortunately, I'm trying to
> understand other parts of the MM code, so currently I don't have the
> time needed to play with the swapin readahead more.
> 
> But, what I observed is that memory gets lost in some strange way. It
> is possible that lost pages are in the swap cache, and it looks like
> nothing frees them at all.

I've observed this problem as well, but I haven't figured
out the cause yet...

> I don't understand how Rik doesn't notice this, but I'm able to
> deadlock the machine in a matter of minutes by running a simple
> program that mallocs and reads memory.

In my experience allocations aren't the big problem but
deallocations. I guess we lose some memory there :(

> Call it hogmem.c, compile it and then run it with two arguments. The
> first is how much memory to allocate in MB (make it slightly bigger
> than your physical memory, to make the system swap), and the second is
> how many times to read the memory (some small number).

> Rik, hopefully this helps you to find a problem with logic in your
> patch.

I'll check it out and report later.

> Also, looking at the patch source, it looks like the comment there is
> completely misleading, as the for() loop is not doing anything, at
> all. The patch can be shortened to do offset++, if() and only ONE
> read_swap_cache_async, if I'm understanding it correctly. Sorry, I'm
> not including it here, have some other things to do fast.

You have to read each entry separately; you want all of
them to have an entry in the swap cache...

[SNIP program]

Hmm, reading and writing huge amounts of memory repeatedly
makes memory disappear and deadlocks the machine... This
means we are losing memory somewhere -- I'll check things
out very carefully...

cheers,

Rik -- now completely used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+


* Re: [PATCH] swapin readahead
  1998-12-01 16:42     ` Rik van Riel
@ 1998-12-01 17:20       ` Zlatko Calusic
  1998-12-01 18:32         ` Rik van Riel
  1998-12-02 17:33         ` Stephen C. Tweedie
  0 siblings, 2 replies; 15+ messages in thread
From: Zlatko Calusic @ 1998-12-01 17:20 UTC
  To: Rik van Riel; +Cc: Stephen C. Tweedie, Linux MM

Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:

> In my experience allocations aren't the big problem but
> deallocations. I guess we lose some memory there :(

Yes. something like that. Since nobody asked pages to swap in (we
decided to swap them in) it looks like nobody frees them. :)
So we should free them somewhere, probably.

> > Also, looking at the patch source, it looks like the comment there is
> > completely misleading, as the for() loop is not doing anything, at
> > all. The patch can be shortened to do offset++, if() and only ONE
> > read_swap_cache_async, if I'm understanding it correctly. Sorry, I'm
> > not including it here, have some other things to do fast.
> 
> You have to read each entry separately; you want all of
> them to have an entry in the swap cache...

+
+	/*
+	 * Primitive swap readahead code. We simply read the
+	 * next 16 entries in the swap area. The break below
+	 * is needed or else the request queue will explode :)
+	 */
+	for (i = 1; i++ < 16;) {
+		offset++;
+		if (!swapdev->swap_map[offset] || offset >= swapdev->max
+				|| atomic_read(&nr_async_pages) >
+				pager_daemon.swap_cluster / 2)
+			break;
+		read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset),
+0);
+			break;
+	}               ^^^^^^

The last break in the for() loop exits the loop after the very first
pass. Why don't you get rid of the loop, then:

	offset++;
	if (swapdev->swap_map[offset] && offset < swapdev->max
			&& atomic_read(&nr_async_pages) <=
			pager_daemon.swap_cluster / 2)
		read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0);

Functionality is exactly the same, and code is much more readable.
Do you see my point now?
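
A small stand-alone sketch that counts iterations makes the point
concrete. count_reads() is a hypothetical helper, not part of the
patch: it reuses the patch's loop header, a simple counter stands in
for the three checks, and an increment stands in for
read_swap_cache_async(). Note that the header "for (i = 1; i++ < 16;)"
runs the body at most 15 times, not 16.

```c
#include <assert.h>

/*
 * Count the readahead requests the loop would issue.
 *   stray_break: nonzero keeps the unconditional break from the patch.
 *   usable:      how many following swap entries pass the checks.
 */
static int count_reads(int stray_break, int usable)
{
	int i, reads = 0;

	assert(usable >= 0);
	for (i = 1; i++ < 16;) {	/* same loop header as the patch */
		if (reads >= usable)	/* stands in for the three checks */
			break;
		reads++;		/* stands in for read_swap_cache_async() */
		if (stray_break)	/* the leftover break */
			break;
	}
	return reads;
}
```

With the stray break the function returns 1 whenever at least one entry
is usable; without it the count is limited by the checks or by the 15
passes of the loop.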

I wish you luck with the swapin readahead. I'm also very interested in
the impact it could make, since my tests revealed that swapping in
adjacent pages from swap is quite a common operation, so in some
workloads it could be a big win (hogmem, for instance, would probably
be much faster :)).

Good luck!
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
	      Black holes are where God divided by zero.

* Re: [PATCH] swapin readahead
  1998-12-01 17:20       ` Zlatko Calusic
@ 1998-12-01 18:32         ` Rik van Riel
  1998-12-02 17:35           ` Stephen C. Tweedie
  1998-12-02 17:33         ` Stephen C. Tweedie
  1 sibling, 1 reply; 15+ messages in thread
From: Rik van Riel @ 1998-12-01 18:32 UTC
  To: Zlatko Calusic; +Cc: Stephen C. Tweedie, Linux MM

On 1 Dec 1998, Zlatko Calusic wrote:
> Rik van Riel <H.H.vanRiel@phys.uu.nl> writes:
> 
> > In my experience allocations aren't the big problem but
> > deallocations. I guess we lose some memory there :(
> 
> Yes. something like that. Since nobody asked pages to swap in (we
> decided to swap them in) it looks like nobody frees them. :)
> So we should free them somewhere, probably.

I took the bet that shrink_mmap() would take care of that, but
apparently not always :(

> > > Also, looking at the patch source, it looks like the comment there is
> > > completely misleading, as the for() loop is not doing anything, at

> +		read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset),
> +0);
> +			break;
> +	}               ^^^^^^
> 
> The last break in the for() loop exits the loop after the very first
> pass. Why don't you get rid of the loop, then:

Whoops, this break was left there from a previous editing
round and is removed now. I completely overlooked that one,
so I guess that means I should go over the code with a comb
now... :)

> I wish you luck with the swapin readahead. I'm also very interested
> in the impact it could make, since my tests revealed that swapping
> in adjacent pages from swap is quite a common operation, so in some
> workloads it could be a big win (hogmem, for instance, would
> probably be much faster :)). 

For the pure readahead cache system we'd only need a 10%
hit rate to increase performance twofold (Rogier Wolff and
I calculated this once on the transfer/seek ratio of disks).

Of course a 10% hit ratio means we swap out 90% of stuff
that'd otherwise stay in memory, so it's not a clear picture
at all.

We probably want to increase the readahead when we satisfy
more than 20% of all page faults (that involve a swap area)
from the cache and decrease it when we go below 10%.

Then, of course, we'd also need to take a weighted average over a minute
or 300 swapins, whichever takes longer. And we need to take
memory and I/O pressure into account so we don't fill up memory
and I/O bandwidth with useless work...
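
The thresholds above can be sketched as a tiny user-space controller.
Everything here is an assumption made for illustration: struct
ra_state, ra_account(), ra_adjust(), the doubling and halving steps and
the RA_MIN/RA_MAX bounds are hypothetical and appear nowhere in the
patch. Only the 20% and 10% thresholds come from the text. The caller
decides when a sampling period ends (for example after a minute or 300
swapins), and memory and I/O pressure are not modelled.

```c
#include <assert.h>

#define RA_MIN  1u	/* hypothetical lower bound on the window */
#define RA_MAX 32u	/* hypothetical upper bound on the window */

struct ra_state {
	unsigned int window;	/* pages to read ahead per swap fault */
	unsigned int faults;	/* swap faults seen in this period */
	unsigned int hits;	/* faults satisfied from the swap cache */
};

/* Record one swap fault and whether the readahead cache satisfied it. */
static void ra_account(struct ra_state *ra, int hit)
{
	assert(ra != 0);
	ra->faults++;
	if (hit)
		ra->hits++;
}

/* End a sampling period: resize the window and reset the counters. */
static unsigned int ra_adjust(struct ra_state *ra)
{
	assert(ra != 0);
	if (ra->faults > 0) {
		if (ra->hits * 5 > ra->faults) {		/* hit rate above 20% */
			ra->window *= 2;
			if (ra->window > RA_MAX)
				ra->window = RA_MAX;
		} else if (ra->hits * 10 < ra->faults) {	/* hit rate below 10% */
			ra->window /= 2;
			if (ra->window < RA_MIN)
				ra->window = RA_MIN;
		}
	}
	ra->faults = 0;
	ra->hits = 0;
	return ra->window;
}
```

Between 10% and 20% the window is left alone, which gives the
controller some hysteresis instead of flapping around one threshold.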

cheers,

Rik -- now completely used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+


* Re: [PATCH] swapin readahead
  1998-12-01 17:20       ` Zlatko Calusic
  1998-12-01 18:32         ` Rik van Riel
@ 1998-12-02 17:33         ` Stephen C. Tweedie
  1998-12-03 14:44           ` Rik van Riel
  1 sibling, 1 reply; 15+ messages in thread
From: Stephen C. Tweedie @ 1998-12-02 17:33 UTC
  To: Zlatko.Calusic; +Cc: Rik van Riel, Stephen C. Tweedie, Linux MM

Hi,

On 01 Dec 1998 18:20:49 +0100, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
said:

> Yes. something like that. Since nobody asked pages to swap in (we
> decided to swap them in) it looks like nobody frees them. :)
> So we should free them somewhere, probably.

I think read_swap_cache_async() should be acting as a lookup on the page
cache, so the page it returns is guaranteed to have an incremented
reference count.  You'll need to free_page() it just after the
read_swap_cache_async() call to get the expected behaviour.

You still need to skip locked swap entries, too.

--Stephen

* Re: [PATCH] swapin readahead
  1998-12-01 18:32         ` Rik van Riel
@ 1998-12-02 17:35           ` Stephen C. Tweedie
  1998-12-02 21:18             ` Zlatko Calusic
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen C. Tweedie @ 1998-12-02 17:35 UTC
  To: Rik van Riel; +Cc: Zlatko Calusic, Stephen C. Tweedie, Linux MM

Hi,

On Tue, 1 Dec 1998 19:32:52 +0100 (CET), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> I took the bet that shrink_mmap() would take care of that, but
> apparently not always :(

shrink_mmap() only gets rid of otherwise unused pages (pages whose count
is one).  After read_swap_cache_async(), the page count will be three:
once for the swap cache, once for the io in progress, once for the
reference returned by read_swap_cache_async().  You need to free that
last reference explicitly after doing the readahead call.  The io
reference will be returned once IO completes, and shrink_mmap() will
take care of the final swap cache reference.

--Stephen
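
The reference counting described above can be traced with a toy
user-space model. This is only a sketch of the bookkeeping: struct
page_model and the model_*() functions are hypothetical stand-ins that
mimic the counts Stephen lists, not the kernel functions themselves.

```c
#include <assert.h>

/* Toy model of one page's reference count. */
struct page_model {
	int count;		/* number of references held */
	int in_swap_cache;	/* nonzero while the swap cache holds the page */
};

/* After the async read starts: swap cache + I/O in progress + caller. */
static void model_read_swap_cache_async(struct page_model *p)
{
	p->count = 3;
	p->in_swap_cache = 1;
}

/* The readahead caller drops the reference it was handed. */
static void model_free_page(struct page_model *p)
{
	assert(p->count > 0);
	p->count--;
}

/* I/O completion returns the I/O reference. */
static void model_io_complete(struct page_model *p)
{
	assert(p->count > 0);
	p->count--;
}

/* Reclaim only succeeds on pages whose sole reference is the cache's. */
static int model_shrink_mmap(struct page_model *p)
{
	if (p->in_swap_cache && p->count == 1) {
		p->count = 0;
		p->in_swap_cache = 0;
		return 1;	/* page reclaimed */
	}
	return 0;		/* page still referenced elsewhere */
}
```

If model_free_page() is never called, the count settles at two after
I/O completes and model_shrink_mmap() can never reclaim the page, which
matches the lost memory reported earlier in the thread.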

* Re: [PATCH] swapin readahead
  1998-12-02 17:35           ` Stephen C. Tweedie
@ 1998-12-02 21:18             ` Zlatko Calusic
  1998-12-03  5:25               ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Zlatko Calusic @ 1998-12-02 21:18 UTC
  To: Stephen C. Tweedie; +Cc: Rik van Riel, Linux MM

"Stephen C. Tweedie" <sct@redhat.com> writes:

> Hi,
> 
> On Tue, 1 Dec 1998 19:32:52 +0100 (CET), Rik van Riel
> <H.H.vanRiel@phys.uu.nl> said:
> 
> > I took the bet that shrink_mmap() would take care of that, but
> > apparently not always :(
> 
> shrink_mmap() only gets rid of otherwise unused pages (pages whose count
> is one).  After read_swap_cache_async(), the page count will be three:
> once for the swap cache, once for the io in progress, once for the
> reference returned by read_swap_cache_async().  You need to free that
> last reference explicitly after doing the readahead call.  The io
> reference will be returned once IO completes, and shrink_mmap() will
> take care of the final swap cache reference.
> 

That is exactly what I had in mind, but didn't have time to
investigate further. Nor courage to say that, without trying first. :)

I've been hacking shrink_mmap() and swap_out() most of the time these last
few days and in fact now completely understand all of their inner workings.
Quite complicated stuff; now I see why it breaks so easily if we
change something around.

Trying 2.1.131-2, I'm mostly satisfied with MM workout, but...

Still, I have a feeling that limit imposed on cache growth is now too
hard, unlike kernels from the 2.1.1[01]? era, that had opposite
problems (excessive cache growth during voluminous I/O operations).

What I wanted to ask is: do you guys share my opinion, and what
changes would you like to see before 2.2 comes out?

Thanks for any opinion.
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
	"640K ought to be enough for anybody." Bill Gates '81

* Re: [PATCH] swapin readahead
  1998-12-02 21:18             ` Zlatko Calusic
@ 1998-12-03  5:25               ` Eric W. Biederman
  1998-12-03  8:55                 ` Zlatko Calusic
  1998-12-03 10:07                 ` Rik van Riel
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 1998-12-03  5:25 UTC
  To: Zlatko.Calusic; +Cc: Stephen C. Tweedie, Rik van Riel, Linux MM

>>>>> "ZC" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

ZC> Trying 2.1.131-2, I'm mostly satisfied with MM workout, but...

ZC> Still, I have a feeling that limit imposed on cache growth is now too
ZC> hard, unlike kernels from the 2.1.1[01]? era, that had opposite
ZC> problems (excessive cache growth during voluminous I/O operations).

My gut reaction is that we need a check in swap_out to see if we have
written out a swap_cluster or some other indication that we have
started all of the disk i/o that is reasonable for now and need to
switch to something else.

This should have the same effect as the switches with the limits on
the swap cache but more autobalancing.  I'm nervous of a kernel that
needs small limits on its disk cache to work correctly.

ZC> What I wanted to ask is: do you guys share my opinion, and what
ZC> changes would you like to see before 2.2 comes out?

One thing worth putting in, probably before 2.2 but definitely
before any swap page readahead is done, is to start using brw_page
for swapfiles.  I don't know about synchronous cases, but in cases where
asynchronous operation is important it improves swapfile performance
immensely.

Eric

* Re: [PATCH] swapin readahead
  1998-12-03  5:25               ` Eric W. Biederman
@ 1998-12-03  8:55                 ` Zlatko Calusic
  1998-12-03 15:39                   ` Eric W. Biederman
  1998-12-03 10:07                 ` Rik van Riel
  1 sibling, 1 reply; 15+ messages in thread
From: Zlatko Calusic @ 1998-12-03  8:55 UTC
  To: Eric W. Biederman; +Cc: Stephen C. Tweedie, Rik van Riel, Linux MM

ebiederm+eric@ccr.net (Eric W. Biederman) writes:

> >>>>> "ZC" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> 
> ZC> Trying 2.1.131-2, I'm mostly satisfied with MM workout, but...
> 
> ZC> Still, I have a feeling that limit imposed on cache growth is now too
> ZC> hard, unlike kernels from the 2.1.1[01]? era, that had opposite
> ZC> problems (excessive cache growth during voluminous I/O operations).
> 
> My gut reaction is that we need a check in swap_out to see if we have
> written out a swap_cluster or some other indication that we have
> started all of the disk i/o that is reasonable for now and need to
> switch to something else.

I tried that approach (Rik has tried also) but only to find that
swapout speed drops. Will investigate further...

> 
> This should have the same effect as the switches with the limits on
> the swap cache but more autobalancing.  I'm nervous of a kernel that
> needs small limits on its disk cache to work correctly.

Yes, that is exactly my point.

I'm glad there is at least one person to share an opinion with. :)

> 
> ZC> What I wanted to ask is: do you guys share my opinion, and what
> ZC> changes would you like to see before 2.2 comes out?
> 
> One thing worth putting in, probably before 2.2 but definitely
> before any swap page readahead is done, is to start using brw_page
> for swapfiles.  I don't know about synchronous cases, but in cases where
> asynchronous operation is important it improves swapfile performance
> immensely.
> 

Speaking about swap files (as opposed to swap partitions), what is the
reason for synchronous operation when swapping to them in the first
place? Races?

-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
       "If you can't make it good, make it LOOK good." B. Gates

* Re: [PATCH] swapin readahead
  1998-12-03  5:25               ` Eric W. Biederman
  1998-12-03  8:55                 ` Zlatko Calusic
@ 1998-12-03 10:07                 ` Rik van Riel
  1 sibling, 0 replies; 15+ messages in thread
From: Rik van Riel @ 1998-12-03 10:07 UTC
  To: Eric W. Biederman; +Cc: Zlatko Calusic, Stephen C. Tweedie, Linux MM

On 2 Dec 1998, Eric W. Biederman wrote:
> >>>>> "ZC" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:
> 
> ZC> Trying 2.1.131-2, I'm mostly satisfied with MM workout, but...
> 
> ZC> Still, I have a feeling that limit imposed on cache growth is now too
> ZC> hard, unlike kernels from the 2.1.1[01]? era, that had opposite
> ZC> problems (excessive cache growth during voluminous I/O operations).
> 
> My gut reaction is that we need a check in swap_out to see if we have
> written out a swap_cluster or some other indication that we have
> started all of the disk i/o that is reasonable for now and need to
> switch to something else.

     if (buffer_over_borrow() || pgcache_over_borrow())
             state = 0;              
     if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster / 2)
             shrink_mmap(i, gfp_mask);

I have this piece of code in my vmscan.c in do_try_to_free_page().

It turns out to give the result we all seem to want. It has the
old balancing code (that works) and makes an extra round through
shrink_mmap() when we have been swapping stuff...

Please try it before dismissing :)

cheers,

Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+


* Re: [PATCH] swapin readahead
  1998-12-02 17:33         ` Stephen C. Tweedie
@ 1998-12-03 14:44           ` Rik van Riel
  0 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 1998-12-03 14:44 UTC
  To: Stephen C. Tweedie; +Cc: Zlatko Calusic, Linux MM

On Wed, 2 Dec 1998, Stephen C. Tweedie wrote:
> On 01 Dec 1998 18:20:49 +0100, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
> said:
> 
> > Yes. something like that. Since nobody asked pages to swap in (we
> > decided to swap them in) it looks like nobody frees them. :)
> > So we should free them somewhere, probably.
> 
> I think read_swap_cache_async() should be acting as a lookup on the page
> cache, so the page it returns is guaranteed to have an incremented
> reference count.  You'll need to free_page() it just after the
> read_swap_cache_async() call to get the expected behaviour.

I have now included the free_page() and things seem to work
out fine. In version 6 of the swapin readahead patch I also
fixed the swap_cache_find_* statistics. We really should make
those available through /proc/sys/vm/swapcache (and writable
so we can test the stats over a certain period of time).

Zlatko's hogmem.c gives pretty decent performance now, but I
guess it could be better by always doing readahead regardless
of whether the page is in memory or not...

OTOH, I have observed swapin rates of 5000+ swaps a second, or
3000 in/out :)

cheers,

Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+


* Re: [PATCH] swapin readahead
  1998-12-03  8:55                 ` Zlatko Calusic
@ 1998-12-03 15:39                   ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 1998-12-03 15:39 UTC
  To: Zlatko.Calusic; +Cc: Stephen C. Tweedie, Rik van Riel, Linux MM

>>>>> "ZC" == Zlatko Calusic <Zlatko.Calusic@CARNet.hr> writes:

ZC> Speaking about swap files (as opposed to swap partitions), what is the
ZC> reason for synchronous operation when swapping to them in the first
ZC> place? Races?

It appears it was implemented that way originally and no one has
changed the code.  It is almost trivial to change to using brw_page.

In my shmfs code I have that change, though I haven't tried it in a while.

Eric

Thread overview: 15+ messages
1998-11-26 23:23 [PATCH] swapin readahead Rik van Riel
1998-12-01 15:13 ` Stephen C. Tweedie
1998-12-01 15:41   ` Rik van Riel
1998-12-01 15:51   ` Zlatko Calusic
1998-12-01 16:42     ` Rik van Riel
1998-12-01 17:20       ` Zlatko Calusic
1998-12-01 18:32         ` Rik van Riel
1998-12-02 17:35           ` Stephen C. Tweedie
1998-12-02 21:18             ` Zlatko Calusic
1998-12-03  5:25               ` Eric W. Biederman
1998-12-03  8:55                 ` Zlatko Calusic
1998-12-03 15:39                   ` Eric W. Biederman
1998-12-03 10:07                 ` Rik van Riel
1998-12-02 17:33         ` Stephen C. Tweedie
1998-12-03 14:44           ` Rik van Riel
