* [PATCH] swapin readahead and fixes
@ 1998-12-03 17:56 Rik van Riel
1998-12-04 11:34 ` Stephen C. Tweedie
1998-12-04 19:25 ` Chris Evans
0 siblings, 2 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-03 17:56 UTC (permalink / raw)
To: Linux MM; +Cc: Linux Kernel
Hi,
here is a patch (against 2.1.130, but vs. 2.1.131 should
be trivial) that improves the swapping performance both
during swapout and swapin and contains a few minor fixes.
The swapout enhancement is in the fact that now kswapd
tries to free memory when it has a few swapout requests
pending in order to avoid a swapout frenzy -- and also
without putting too much pressure on the caches.
The swapin enhancement consists of a simple swapin readahead.
I have extensively tortured this version of the patch and it
should survive the most extreme things now. It is only a
primitive readahead thingy and can probably be improved
quite a lot; that, however, is something to do later when
it is proven stable and the bugfix parts are included in
the kernel.
Future versions of this (and other) patches can be grabbed
from http://linux-patches.rock-projects.com/ or from my
home page... <hint> check out Linux-patches! </hint>
regards,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
--- ./mm/vmscan.c.orig Thu Nov 26 11:26:50 1998
+++ ./mm/vmscan.c Tue Dec 1 07:12:28 1998
@@ -431,6 +431,8 @@
kmem_cache_reap(gfp_mask);
if (buffer_over_borrow() || pgcache_over_borrow())
+ state = 0;
+ if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster / 2)
shrink_mmap(i, gfp_mask);
switch (state) {
--- ./mm/page_io.c.orig Thu Nov 26 11:26:49 1998
+++ ./mm/page_io.c Thu Nov 26 11:30:43 1998
@@ -60,7 +60,7 @@
}
/* Don't allow too many pending pages in flight.. */
- if (atomic_read(&nr_async_pages) > SWAP_CLUSTER_MAX)
+ if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster)
wait = 1;
p = &swap_info[type];
--- ./mm/page_alloc.c.orig Thu Nov 26 11:26:49 1998
+++ ./mm/page_alloc.c Thu Dec 3 15:40:48 1998
@@ -370,9 +370,32 @@
pte_t * page_table, unsigned long entry, int write_access)
{
unsigned long page;
- struct page *page_map;
-
+ int i;
+ struct page *new_page, *page_map = lookup_swap_cache(entry);
+ unsigned long offset = SWP_OFFSET(entry);
+ struct swap_info_struct *swapdev = SWP_TYPE(entry) + swap_info;
+
+ if (!page_map) {
page_map = read_swap_cache(entry);
+
+ /*
+ * Primitive swap readahead code. We simply read the
+ * next 16 entries in the swap area. The break below
+ * is needed or else the request queue will explode :)
+ */
+ for (i = 1; i++ < 16;) {
+ offset++;
+ if (!swapdev->swap_map[offset] || offset >= swapdev->max
+ || nr_free_pages - atomic_read(&nr_async_pages) <
+ (freepages.high + freepages.low)/2)
+ break;
+ if (test_bit(offset, swapdev->swap_lockmap))
+ continue;
+ new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0);
+ if (new_page != NULL)
+ __free_page(new_page);
+ }
+ }
if (pte_val(*page_table) != entry) {
if (page_map)
--- ./mm/swap_state.c.orig Thu Nov 26 11:26:49 1998
+++ ./mm/swap_state.c Thu Dec 3 15:40:34 1998
@@ -258,9 +258,10 @@
* incremented.
*/
-static struct page * lookup_swap_cache(unsigned long entry)
+struct page * lookup_swap_cache(unsigned long entry)
{
struct page *found;
+ swap_cache_find_total++;
while (1) {
found = find_page(&swapper_inode, entry);
@@ -268,8 +269,10 @@
return 0;
if (found->inode != &swapper_inode || !PageSwapCache(found))
goto out_bad;
- if (!PageLocked(found))
+ if (!PageLocked(found)) {
+ swap_cache_find_success++;
return found;
+ }
__free_page(found);
__wait_on_page(found);
}
--- ./include/linux/swap.h.orig Tue Dec 1 07:29:56 1998
+++ ./include/linux/swap.h Tue Dec 1 07:31:03 1998
@@ -90,6 +90,7 @@
extern struct page * read_swap_cache_async(unsigned long, int);
#define read_swap_cache(entry) read_swap_cache_async(entry, 1);
extern int FASTCALL(swap_count(unsigned long));
+extern struct page * lookup_swap_cache(unsigned long);
/*
* Make these inline later once they are working properly.
*/
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
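The readahead loop in the page_alloc.c hunk above can be modelled in user space. The following is only a sketch, not kernel code: `struct swap_area` and `readahead_candidates()` are hypothetical stand-ins for `swap_info_struct` and the loop body, and the free-memory test from the patch is left out. Two details are worth noting: `for (i = 1; i++ < 16;)` runs its body 15 times rather than 16, and the sketch tests the bound before indexing `swap_map`, whereas the posted hunk tests them in the opposite order.

```c
#include <assert.h>
#include <stddef.h>

#define READAHEAD_WINDOW 15  /* the posted loop body runs 15 times */

/* Hypothetical user-space stand-in for the kernel's swap_info_struct. */
struct swap_area {
    const unsigned char *swap_map;  /* use count per slot, 0 = unused */
    const unsigned char *locked;    /* nonzero = I/O in flight on slot */
    size_t max;                     /* number of slots in the area */
};

/*
 * Collect the slots following 'offset' that a linear readahead would
 * request. Stops at the end of the area or at the first unused slot and
 * skips slots that are currently locked. 'out' must have room for
 * READAHEAD_WINDOW entries. Returns the number of slots written.
 */
size_t readahead_candidates(const struct swap_area *area, size_t offset,
                            size_t *out)
{
    size_t n = 0;

    assert(area && out);
    for (int i = 0; i < READAHEAD_WINDOW; i++) {
        offset++;
        /* Bound check first, so swap_map is never indexed past the end. */
        if (offset >= area->max || !area->swap_map[offset])
            break;
        if (area->locked[offset])
            continue;
        out[n++] = offset;
    }
    return n;
}
```

In the kernel each collected slot would be handed to read_swap_cache_async(); here the caller simply gets the list back.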
* Re: [PATCH] swapin readahead and fixes
1998-12-03 17:56 [PATCH] swapin readahead and fixes Rik van Riel
@ 1998-12-04 11:34 ` Stephen C. Tweedie
1998-12-04 14:02 ` Rik van Riel
1998-12-04 19:25 ` Chris Evans
1 sibling, 1 reply; 23+ messages in thread
From: Stephen C. Tweedie @ 1998-12-04 11:34 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linux MM, Linux Kernel, Stephen Tweedie
Hi,
On Thu, 3 Dec 1998 18:56:34 +0100 (CET), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:
> The swapin enhancement consists of a simple swapin readahead.
One odd thing about the readahead: you don't start the readahead until
_after_ you have synchronously read in the first swap page of the
cluster. Surely it is better to do the readahead first, so that you
are submitting one IO to disk, not two?
--Stephen
* Re: [PATCH] swapin readahead and fixes
1998-12-04 11:34 ` Stephen C. Tweedie
@ 1998-12-04 14:02 ` Rik van Riel
1998-12-04 14:34 ` Stephen C. Tweedie
0 siblings, 1 reply; 23+ messages in thread
From: Rik van Riel @ 1998-12-04 14:02 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Linux MM, Linux Kernel
On Fri, 4 Dec 1998, Stephen C. Tweedie wrote:
> On Thu, 3 Dec 1998 18:56:34 +0100 (CET), Rik van Riel
> <H.H.vanRiel@phys.uu.nl> said:
>
> > The swapin enhancement consists of a simple swapin readahead.
>
> One odd thing about the readahead: you don't start the readahead until
> _after_ you have synchronously read in the first swap page of the
> cluster. Surely it is better to do the readahead first, so that you
> are submitting one IO to disk, not two?
This would severely suck when something else would be doing
a run_taskqueue(&tq_disk). It would mean that we'd read
n+1..n+15 before n itself.
OTOH, if the disk is lightly loaded it would be an advantage.
I will try it shortly (but don't know how to measure the
results :)...
cheers,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-04 14:02 ` Rik van Riel
@ 1998-12-04 14:34 ` Stephen C. Tweedie
1998-12-05 9:46 ` Gerard Roudier
1998-12-05 10:47 ` Gerard Roudier
0 siblings, 2 replies; 23+ messages in thread
From: Stephen C. Tweedie @ 1998-12-04 14:34 UTC (permalink / raw)
To: Rik van Riel; +Cc: Stephen C. Tweedie, Linux MM, Linux Kernel
Hi,
On Fri, 4 Dec 1998 15:02:56 +0100 (CET), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:
>> One odd thing about the readahead: you don't start the readahead until
>> _after_ you have synchronously read in the first swap page of the
>> cluster. Surely it is better to do the readahead first, so that you
>> are submitting one IO to disk, not two?
> This would severely suck when something else would be doing
> a run_taskqueue(&tq_disk). It would mean that we'd read
> n+1..n+15 before n itself.
No, not at all. This is already the way we do all readahead
everywhere in the kernel.
The idea is to do readahead for all the data you want, *including* the
bit you are going to need right away. Once that is done, you just
wait for the IO to complete on that first item. In this case, that
means doing a readahead on pages n to n+15 inclusive, and then after
that doing the synchronous read_swap_page on page n. The kernel will
happily find that page in the swap cache, work out that IO is already
in progress and wait for that page to become available.
Even though the buffer IO request layer issues the entire sequential
IO as one IO to the device drivers, the buffers and pages involved in
the data transfer still get unlocked one by one as the IO completes.
After submitting the initial IO you can wait for that first page to
become unlocked without having to wait for the rest of the readahead
IO to finish.
--Stephen
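The point about pages being unlocked one by one can be illustrated with a toy timing model. This is a sketch under stated assumptions: `struct disk_model` and both functions are hypothetical, times are in arbitrary units, and the disk is assumed to complete the pages of one sequential request strictly in order.

```c
#include <assert.h>

/* Hypothetical cost model; all times are in arbitrary units. */
struct disk_model {
    unsigned seek;      /* cost to start one I/O request */
    unsigned transfer;  /* cost to transfer one page */
};

/*
 * Time at which page 'index' (0-based) of a single sequential request
 * becomes unlocked, assuming pages complete one by one in order.
 */
unsigned page_unlock_time(const struct disk_model *d, unsigned index)
{
    assert(d);
    return d->seek + (index + 1) * d->transfer;
}

/* Time at which the last page of a 'pages'-page request is unlocked. */
unsigned cluster_done_time(const struct disk_model *d, unsigned pages)
{
    assert(d && pages > 0);
    return page_unlock_time(d, pages - 1);
}
```

With seek = 10 and transfer = 1, the faulting page (index 0) of a 16-page request is available at time 11, while the whole cluster finishes at time 26, so the faulting process does not pay for the rest of the readahead.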
* Re: [PATCH] swapin readahead and fixes
1998-12-03 17:56 [PATCH] swapin readahead and fixes Rik van Riel
1998-12-04 11:34 ` Stephen C. Tweedie
@ 1998-12-04 19:25 ` Chris Evans
1998-12-04 20:47 ` Rik van Riel
1998-12-07 11:52 ` Rik van Riel
1 sibling, 2 replies; 23+ messages in thread
From: Chris Evans @ 1998-12-04 19:25 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linux MM, Linux Kernel
On Thu, 3 Dec 1998, Rik van Riel wrote:
> Hi,
>
> here is a patch (against 2.1.130, but vs. 2.1.131 should
> be trivial) that improves the swapping performance both
> during swapout and swapin and contains a few minor fixes.
Hi Rik,
I'm very interested in performance for sequential swapping. This occurs in,
for example, scientific applications which must sweep through vast arrays
much larger than physical RAM.
Have you benchmarked booting with low physical RAM, lots of swap and
writing a simple program that allocates 100's of Mb of memory and then
sequentially accesses every page in a big loop?
This is one area in which FreeBSD stomps on us. Theoretically it should be
possible to get swap with readahead pulling pages into RAM at disk speed.
Cheers
Chris
* Re: [PATCH] swapin readahead and fixes
1998-12-04 19:25 ` Chris Evans
@ 1998-12-04 20:47 ` Rik van Riel
1998-12-05 18:59 ` Alan Cox
1998-12-07 11:52 ` Rik van Riel
1 sibling, 1 reply; 23+ messages in thread
From: Rik van Riel @ 1998-12-04 20:47 UTC (permalink / raw)
To: Chris Evans; +Cc: Linux MM, Linux Kernel
On Fri, 4 Dec 1998, Chris Evans wrote:
> On Thu, 3 Dec 1998, Rik van Riel wrote:
>
> > here is a patch (against 2.1.130, but vs. 2.1.131 should
> > be trivial) that improves the swapping performance both
> > during swapout and swapin and contains a few minor fixes.
>
> I'm very interested in performance for sequential swapping. This
> occurs in for example scientific applications which must sweep
> through vast arrays much larger than physical RAM.
>
> This is one area in which FreeBSD stomps on us. Theoretically it
> should be possible to get swap with readahead pulling pages into RAM
> at disk speed.
We're not at that point yet, not at all :(
We probably could put in an algorithm that does that as
well, but the current patch consists mainly of a proof-
of-concept (read really stupid) readahead algorithm :)
The advantage of that algorithm however is that it doesn't
incur any extra disk seeks (only linear readahead inside
the swap area). The way kswapd swaps out things this might
also help with the readahead of tiled data, etc...
I will compile a new patch (against 2.1.130 again, since
2.1.131 contains mostly VM mistakes that I want reversed)
this weekend...
regards,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-04 14:34 ` Stephen C. Tweedie
@ 1998-12-05 9:46 ` Gerard Roudier
1998-12-07 16:50 ` Stephen C. Tweedie
1998-12-05 10:47 ` Gerard Roudier
1 sibling, 1 reply; 23+ messages in thread
From: Gerard Roudier @ 1998-12-05 9:46 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Rik van Riel, Linux MM, Linux Kernel
On Fri, 4 Dec 1998, Stephen C. Tweedie wrote:
> Hi,
>
> On Fri, 4 Dec 1998 15:02:56 +0100 (CET), Rik van Riel
> <H.H.vanRiel@phys.uu.nl> said:
>
> >> One odd thing about the readahead: you don't start the readahead until
> >> _after_ you have synchronously read in the first swap page of the
> >> cluster. Surely it is better to do the readahead first, so that you
> >> are submitting one IO to disk, not two?
>
> > This would severely suck when something else would be doing
> > a run_taskqueue(&tq_disk). It would mean that we'd read
> > n+1..n+15 before n itself.
>
> No, not at all. This is already the way we do all readahead
> everywhere in the kernel.
>
> The idea is to do readahead for all the data you want, *including* the
> bit you are going to need right away. Once that is done, you just
> wait for the IO to complete on that first item.
Indeed.
In the old days, swapping and paging were different things, but they seem
to be confused in Linux.
You may perform read-ahead when you really swap in a process that had been
swapped out. But about paging, you must consider that this mechanism is
not sequential but mostly random in RL. So you just want to read more data
at the same time and near the location that faulted. Reading-ahead is
obviously candidate for this optimization, but reading behind must also be
considered in my opinion.
File read-ahead is based on the fact that data files are often accessed
sequentially by applications, and we have to detect this behaviour prior
to reading ahead large data blocks.
For mmapped files, you may want to allow applications to tell you how
they intend to access data and trust them. But for paging, you just
want to read more data than 1 single page at a time, assuming that
data near the faulted address has a good chance of being accessed by
the application soon.
That's my current opinion on this topic.
Regards,
Gerard.
* Re: [PATCH] swapin readahead and fixes
1998-12-04 14:34 ` Stephen C. Tweedie
1998-12-05 9:46 ` Gerard Roudier
@ 1998-12-05 10:47 ` Gerard Roudier
1 sibling, 0 replies; 23+ messages in thread
From: Gerard Roudier @ 1998-12-05 10:47 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Rik van Riel, Linux MM, Linux Kernel
On Fri, 4 Dec 1998, Stephen C. Tweedie wrote:
> The idea is to do readahead for all the data you want, *including* the
> bit you are going to need right away. Once that is done, you just
> wait for the IO to complete on that first item. In this case, that
> means doing a readahead on pages n to n+15 inclusive, and then after
> that doing the synchronous read_swap_page on page n. The kernel will
> happily find that page in the swap cache, work out that IO is already
> in progress and wait for that page to become available.
>
> Even though the buffer IO request layer issues the entire sequential
> IO as one IO to the device drivers, the buffers and pages involved in
> the data transfer still get unlocked one by one as the IO completes.
> After submitting the initial IO you can wait for that first page to
> become unlocked without having to wait for the rest of the readahead
> IO to finish.
I find my previous reply to your mail very unclear. In fact, my idea is
to implement something like the following:
Inputs:
-------
- faulted_address
- offset_behind offset behind the faulted address up to which we want
to swap-in data.
- offset_ahead offset ahead the faulted address ...
Strategy:
---------
- queue reads of all the pages between (faulted_address - offset_behind)
and (faulted_address + offset_ahead)
- run the tq_disk to actually start the IO.
- wait for the page that contains the faulted_address.
offset_behind and offset_ahead may be constant values or tuned dynamically
at run-time if some clever heuristic is found.
Does the above make sense? (Or is it already implemented this way?)
Regards,
Gerard.
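The first step of the strategy above, working out which pages to queue, can be sketched as follows. This is an illustration only: `struct page_window` and `swapin_window()` are hypothetical names, the inputs are expressed as page indices rather than addresses, and the clamping to the range [0, max_pages) is an assumption not spelled out in the proposal.

```c
#include <assert.h>

/* Hypothetical result type: a contiguous run of pages to read. */
struct page_window {
    unsigned long first;  /* index of the first page to queue */
    unsigned long count;  /* number of pages to queue */
};

/*
 * Compute the run of pages from (fault_page - behind) to
 * (fault_page + ahead), clamped to the valid range [0, max_pages).
 */
struct page_window swapin_window(unsigned long fault_page,
                                 unsigned long behind,
                                 unsigned long ahead,
                                 unsigned long max_pages)
{
    struct page_window w;
    unsigned long last;

    assert(fault_page < max_pages);
    w.first = fault_page > behind ? fault_page - behind : 0;
    /* Compare before adding so fault_page + ahead cannot overflow. */
    if (ahead < max_pages - 1 - fault_page)
        last = fault_page + ahead;
    else
        last = max_pages - 1;
    w.count = last - w.first + 1;
    return w;
}
```

The caller would then queue reads for all pages in the window, start the I/O, and wait only for the page at `fault_page`.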
* Re: [PATCH] swapin readahead and fixes
1998-12-04 20:47 ` Rik van Riel
@ 1998-12-05 18:59 ` Alan Cox
1998-12-05 19:02 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Alan Cox @ 1998-12-05 18:59 UTC (permalink / raw)
To: H.H.vanRiel; +Cc: chris, linux-mm, linux-kernel
> I will compile a new patch (against 2.1.130 again, since
> 2.1.131 contains mostly VM mistakes that I want reversed)
> this weekend...
2.1.131 is materially faster here than any of the variants I've tried. Are
you sure ?
* Re: [PATCH] swapin readahead and fixes
1998-12-05 18:59 ` Alan Cox
@ 1998-12-05 19:02 ` Rik van Riel
1998-12-06 5:20 ` Rik van Riel
1998-12-06 5:23 ` Steve VanDevender
2 siblings, 0 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-05 19:02 UTC (permalink / raw)
To: Alan Cox; +Cc: chris, Linux MM, Linux Kernel
On Sat, 5 Dec 1998, Alan Cox wrote:
> > I will compile a new patch (against 2.1.130 again, since
> > 2.1.131 contains mostly VM mistakes that I want reversed)
> > this weekend...
>
> 2.1.131 is materially faster here than any of the variants I've
> tried. Are you sure ?
Sure it's faster. It just doesn't come near the
auto balancing that could have been (and appears
to be in my tree).
cheers,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-05 18:59 ` Alan Cox
1998-12-05 19:02 ` Rik van Riel
@ 1998-12-06 5:20 ` Rik van Riel
1998-12-06 5:23 ` Steve VanDevender
2 siblings, 0 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-06 5:20 UTC (permalink / raw)
To: Alan Cox; +Cc: chris, Linux MM, Linux Kernel
On Sat, 5 Dec 1998, Alan Cox wrote:
>
> > I will compile a new patch (against 2.1.130 again, since
> > 2.1.131 contains mostly VM mistakes that I want reversed)
> > this weekend...
>
> 2.1.131 is materially faster here than any of the variants I've
> tried. Are you sure ?
Not completely, but please check out my new patch against
2.1.131. It should be faster still without putting too
much of a cap on the cache size.
regards,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-05 18:59 ` Alan Cox
1998-12-05 19:02 ` Rik van Riel
1998-12-06 5:20 ` Rik van Riel
@ 1998-12-06 5:23 ` Steve VanDevender
2 siblings, 0 replies; 23+ messages in thread
From: Steve VanDevender @ 1998-12-06 5:23 UTC (permalink / raw)
To: linux-kernel; +Cc: H.H.vanRiel, chris, linux-mm, Alan Cox
Alan Cox writes:
> > I will compile a new patch (against 2.1.130 again, since
> > 2.1.131 contains mostly VM mistakes that I want reversed)
> > this weekend...
>
> 2.1.131 is materially faster here than any of the variants I've tried. Are
> you sure ?
I find 2.1.131 to be much better than its recent predecessors in
terms of reduced swap activity with the same set of applications
loaded (X, XEmacs, netscape). Pages are staying in memory when
they used to be swapped in and out all the time. Whatever
changed between 2.1.130 and 2.1.131 was hardly any sort of
mistake, as far as I'm concerned.
* Re: [PATCH] swapin readahead and fixes
1998-12-04 19:25 ` Chris Evans
1998-12-04 20:47 ` Rik van Riel
@ 1998-12-07 11:52 ` Rik van Riel
1 sibling, 0 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-07 11:52 UTC (permalink / raw)
To: Chris Evans; +Cc: Linux MM
On Fri, 4 Dec 1998, Chris Evans wrote:
> On Thu, 3 Dec 1998, Rik van Riel wrote:
>
> > here is a patch (against 2.1.130, but vs. 2.1.131 should
> > be trivial) that improves the swapping performance both
> > during swapout and swapin and contains a few minor fixes.
Since Dec 3 a lot changed. There now _is_ a patch against the
2.1.131 with Stephen's apparently excellent shrink_mmap() fix.
> I'm very interested in performance for sequential swapping. This
> occurs in for example scientific applications which must sweep
> through vast arrays much larger than physical RAM.
>
> Have you benchmarked booting with low physical RAM, lots of swap and
> writing a simple program that allocates 100's of Mb of memory and
> then sequentially accesses every page in a big loop?
Yes. Zlatko Calusic made a small and simple program that you
can tell how much memory to use and how many passes it should
make. It simply reads the memory and dirties it. I have achieved
5 MB/s (that's 10 MB/s when you count the fact that you both have
to read _and_ write) on a 200 MB session.
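A test program of the kind described can be sketched in a few lines. This is not Zlatko's program, whose source is not shown in this thread; `sweep_memory()` and its parameters are hypothetical, and the page size is passed in by the caller rather than queried from the system.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate 'bytes' of memory and make 'passes' sequential sweeps over
 * it, reading and then dirtying one byte in every page. Returns the
 * number of page touches performed, or 0 if the allocation failed.
 */
unsigned long sweep_memory(size_t bytes, unsigned passes, size_t page_size)
{
    volatile unsigned char *mem;
    unsigned long touches = 0;

    assert(page_size > 0);
    mem = calloc(1, bytes);
    if (!mem)
        return 0;
    for (unsigned p = 0; p < passes; p++) {
        for (size_t off = 0; off < bytes; off += page_size) {
            unsigned char v = mem[off];         /* read: faults the page in */
            mem[off] = (unsigned char)(v + 1);  /* write: marks it dirty */
            touches++;
        }
    }
    free((void *)mem);
    return touches;
}
```

To exercise swap, `bytes` has to exceed the physical RAM available to the process; timing the call and dividing the bytes swept by the elapsed time gives a throughput figure comparable to the one quoted above.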
> This is one area in which FreeBSD stomps on us. Theoretically it
> should be possible to get swap with readahead pulling pages into RAM
> at disk speed.
I have looked at the FreeBSD code. We can do better than that
and I've worked out quite a nice scheme to do both read-ahead,
read-behind or a combination of the two (depending on the
situation). We also should stop reading in pages that are in
the vicinity but don't belong to the program at hand.
Unfortunately the Linux swapout code doesn't do proper clustering
yet (too much fragmentation within a program's address space) and
none of the above ideas have been converted to code yet. :)
cheers,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-05 9:46 ` Gerard Roudier
@ 1998-12-07 16:50 ` Stephen C. Tweedie
1998-12-08 1:34 ` Billy Harvey
0 siblings, 1 reply; 23+ messages in thread
From: Stephen C. Tweedie @ 1998-12-07 16:50 UTC (permalink / raw)
To: Gerard Roudier; +Cc: Stephen C. Tweedie, Rik van Riel, Linux MM, Linux Kernel
Hi,
On Sat, 5 Dec 1998 10:46:40 +0100 (MET), Gerard Roudier
<groudier@club-internet.fr> said:
> You may perform read-ahead when you really swap in a process that had been
> swapped out. But about paging, you must consider that this mechanism is
> not sequential but mostly random in RL. So you just want to read more data
> at the same time and near the location that faulted. Reading-ahead is
> obviously candidate for this optimization, but reading behind must also be
> considered in my opinion.
Yep: one of the things which has been talked about, and which is on my
list of things to start experimenting with in 2.3, is increasing the
granularity of paging so that we automatically try to read in (say) 16K
at a time when we start paging a binary. Discarding unused pages can
still work on a per-page granularity, so we don't bloat memory in the
long term, but it has the potential to significantly improve loading
times for some binaries.
Of course, there are also a whole number of optimisations we can make
explicitly for sequentially accessed mapped regions, but the granularity
trick should be a pretty cheap way to wring a bit more performance out
of the normal random paging.
--Stephen
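The coarser paging granularity can be pictured as rounding each fault down to a cluster boundary and reading the whole cluster. A minimal sketch, assuming naturally aligned power-of-two clusters (the message does not say whether the 16K unit would be aligned); `cluster_start()` and `cluster_pages()` are hypothetical helpers.

```c
#include <assert.h>

/*
 * Round a faulting address down to the start of its cluster.
 * 'cluster_size' must be a power of two, e.g. 16384 for 16K.
 */
unsigned long cluster_start(unsigned long fault_addr,
                            unsigned long cluster_size)
{
    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    return fault_addr & ~(cluster_size - 1);
}

/* Number of pages that make up one cluster. */
unsigned long cluster_pages(unsigned long cluster_size,
                            unsigned long page_size)
{
    assert(page_size > 0 && cluster_size % page_size == 0);
    return cluster_size / page_size;
}
```

With 4K pages and a 16K cluster, a fault anywhere inside the cluster brings in the same four pages, while eviction can still drop them individually.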
* Re: [PATCH] swapin readahead and fixes
1998-12-07 16:50 ` Stephen C. Tweedie
@ 1998-12-08 1:34 ` Billy Harvey
1998-12-08 2:31 ` Rik van Riel
1998-12-08 12:21 ` Stephen C. Tweedie
0 siblings, 2 replies; 23+ messages in thread
From: Billy Harvey @ 1998-12-08 1:34 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Linux MM, Linux Kernel
Has anyone ever looked at the following concept? In addition to a
swap-in read-ahead, have a swap-out write-ahead. The idea is to use all
the available swap space as a mirror of memory. If a need for real
memory comes up, and a page has been marked as mirrored, then it can be
immediately reused without swapping out. The trick would be in deciding
how to write-ahead without taking significant execution time and disk
access time away from other processes, that is with no impact to active
processes. Now, if that page is needed back into memory, the
current/improved methods of reading in can also be followed. In short,
we have information available to us that allows us to reduce time of
execution. That information is that we have swap space available, and
disk access time available (while nothing else needs it), and can make
use of that time.
Spears solicited.
--
Billy.Harvey@thrillseeker.net
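The reclaim decision behind this idea can be sketched as below. The names are hypothetical, and the `dirty` flag is an added assumption: a mirrored copy is only usable if the page has not been modified since it was written out.

```c
#include <assert.h>

enum reclaim_action { RECLAIM_REUSE_NOW, RECLAIM_WRITE_FIRST };

/* Hypothetical per-page state for the mirroring scheme. */
struct mirrored_page {
    int mirrored;  /* nonzero once a copy has been written to swap */
    int dirty;     /* nonzero if modified after that copy was written */
};

/*
 * Decide what has to happen before the page frame can be reused:
 * a clean mirrored page can be dropped at once, anything else must
 * be written to swap first.
 */
enum reclaim_action reclaim_action_for(const struct mirrored_page *p)
{
    assert(p);
    if (p->mirrored && !p->dirty)
        return RECLAIM_REUSE_NOW;
    return RECLAIM_WRITE_FIRST;
}
```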
* Re: [PATCH] swapin readahead and fixes
1998-12-08 1:34 ` Billy Harvey
@ 1998-12-08 2:31 ` Rik van Riel
1998-12-08 2:51 ` Billy Harvey
` (2 more replies)
1998-12-08 12:21 ` Stephen C. Tweedie
1 sibling, 3 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-08 2:31 UTC (permalink / raw)
To: Billy Harvey; +Cc: Stephen C. Tweedie, Linux MM, Linux Kernel
On Mon, 7 Dec 1998, Billy Harvey wrote:
> Has anyone ever looked at the following concept? In addition to a
> swap-in read-ahead, have a swap-out write-ahead. The idea is to use
> all the available swap space as a mirror of memory.
We do something a bit like this in 2.1.130+. Writing out all
pages to swap will use far too much I/O bandwidth though, so
we will never do that...
> If a need for real memory comes up, and a page has been marked as
> mirrored, then it can be immediately reused without swapping out.
> The trick would be in deciding how to write-ahead without taking
> significant execution time and disk access time away from other
> processes, that is with no impact to active processes.
We will probably want to implement a kind of write-ahead
algorithm for swapout though, but a slightly different one
than you envisioned.
On a swapout, we will scan ahead of where we are (p->swap_address)
and swap out the next number of pages too. We break the loop if:
- the page isn't present or already in swap
- the next two pages were touched since our last scan
- the page isn't allocated
- we reach the end of a SWAP_CLUSTER area in swap space
If we write this way (no more expensive than normal because
we write the stuff in one disk movement) swapin readahead
will be much more effective and performance will increase.
cheers,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
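The scan-ahead loop with its break conditions could look roughly like this. It is a user-space sketch with hypothetical names (`struct vpage`, `writeahead_count()`), not code from any posted patch, and it reads "the next two pages were touched" as "this page and the one after it were both referenced", which is one possible interpretation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page state as seen by the swapout scan. */
struct vpage {
    unsigned char allocated;   /* page is mapped at all */
    unsigned char present;     /* page is in memory */
    unsigned char in_swap;     /* page already has a copy in swap */
    unsigned char referenced;  /* touched since the last scan */
};

/*
 * Starting after 'start', count how many following pages would be
 * written out together with it. 'slots_left' is the number of slots
 * remaining before the end of the current swap cluster.
 */
size_t writeahead_count(const struct vpage *pages, size_t npages,
                        size_t start, size_t slots_left)
{
    size_t n = 0;

    assert(pages && start < npages);
    for (size_t i = start + 1; i < npages && n < slots_left; i++) {
        const struct vpage *p = &pages[i];

        if (!p->allocated || !p->present || p->in_swap)
            break;
        /* Two referenced pages in a row: assume this region is in use. */
        if (p->referenced && i + 1 < npages && pages[i + 1].referenced)
            break;
        n++;
    }
    return n;
}
```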
* Re: [PATCH] swapin readahead and fixes
1998-12-08 2:31 ` Rik van Riel
@ 1998-12-08 2:51 ` Billy Harvey
1998-12-08 3:00 ` Rik van Riel
1998-12-08 12:35 ` Stephen C. Tweedie
1998-12-09 2:41 ` Drago Goricanec
2 siblings, 1 reply; 23+ messages in thread
From: Billy Harvey @ 1998-12-08 2:51 UTC (permalink / raw)
To: Rik van Riel; +Cc: Stephen C. Tweedie, Linux MM, Linux Kernel
Rik van Riel wrote:
>
> On Mon, 7 Dec 1998, Billy Harvey wrote:
>
> > Has anyone ever looked at the following concept? In addition to a
> > swap-in read-ahead, have a swap-out write-ahead. The idea is to use
> > all the available swap space as a mirror of memory.
>
> We do something a bit like this in 2.1.130+. Writing out all
> pages to swap will use far too much I/O bandwidth though, so
> we will never do that...
>
Rik,
That's my point though about not taking I/O time away from other tasks.
Only mirror pages to swap if there's nothing else blocked for I/O - put
any free time to work, and mirror pages if swap memory allows in
anticipation that it may be swapped out later. I suppose a
least-recently-used approach on the pages would have the highest
payback. I realize the CPU may be used a little more, but other than
rc5des it's idle a good bit of the time anyway - perhaps this could be
one step above an idle task.
Billy
--
Billy.Harvey@thrillseeker.net
* Re: [PATCH] swapin readahead and fixes
1998-12-08 2:51 ` Billy Harvey
@ 1998-12-08 3:00 ` Rik van Riel
0 siblings, 0 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-08 3:00 UTC (permalink / raw)
To: Billy Harvey; +Cc: Stephen C. Tweedie, Linux MM, Linux Kernel
On Mon, 7 Dec 1998, Billy Harvey wrote:
> Rik van Riel wrote:
> >
> > On Mon, 7 Dec 1998, Billy Harvey wrote:
> >
> > > Has anyone ever looked at the following concept? In addition to a
> > > swap-in read-ahead, have a swap-out write-ahead. The idea is to use
> > > all the available swap space as a mirror of memory.
> >
> > We do something a bit like this in 2.1.130+. Writing out all
> > pages to swap will use far too much I/O bandwidth though, so
> > we will never do that...
>
> That's my point though about not taking I/O time away from other
> tasks. Only mirror pages to swap if there's nothing else blocked
> for I/O - put any free time to work, and mirror pages if swap memory
> allows in anticipation that it may be swapped out later.
Write-ahead only makes sense when we can cluster the extra
I/O with the operation we were already going to do.
> I suppose a least-recently-used approach on the pages would have the
> highest payback.
LRU would be a very bad strategy since it wastes too much CPU
and it prevents us from writing the blocks to disk in such a
way that it makes swapin readahead efficient.
Remember that disk seek time is about 10 times as expensive
as transfer time. This means that we've got to optimize our
I/O patterns mainly for seek time -- transferring a few
blocks extra in one big I/O sweep isn't really costing us
anything. And once we do that, expensive schemes like LRU
really don't matter any more, do they?
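A rough cost model of this argument, as a sketch only. The constants
simply encode the "about 10 times" rule of thumb above as relative
units; they are assumptions, not measurements, and the function names
are made up for illustration.

```c
#include <assert.h>

/* Assumed relative costs, taken from the rule of thumb in the text. */
#define SEEK_COST      10UL   /* one head movement */
#define TRANSFER_COST   1UL   /* one block transferred */

/* Cost of writing nblocks when every block needs its own seek. */
unsigned long scattered_cost(unsigned long nblocks)
{
    return nblocks * (SEEK_COST + TRANSFER_COST);
}

/* Cost of writing nblocks in one sweep after a single seek. */
unsigned long clustered_cost(unsigned long nblocks)
{
    assert(nblocks > 0);   /* a sweep writes at least one block */
    return SEEK_COST + nblocks * TRANSFER_COST;
}
```

Under these assumed units, one clustered sweep of 16 blocks costs less
than three scattered single-block writes, which is the sense in which
a few extra blocks in the same sweep are nearly free.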
regards,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-08 1:34 ` Billy Harvey
1998-12-08 2:31 ` Rik van Riel
@ 1998-12-08 12:21 ` Stephen C. Tweedie
1 sibling, 0 replies; 23+ messages in thread
From: Stephen C. Tweedie @ 1998-12-08 12:21 UTC (permalink / raw)
To: Billy Harvey; +Cc: Stephen C. Tweedie, Linux MM, Linux Kernel
Hi,
On Mon, 07 Dec 1998 20:34:12 -0500, Billy Harvey
<Billy.Harvey@thrillseeker.net> said:
> Has anyone ever looked at the following concept? In addition to a
> swap-in read-ahead, have a swap-out write-ahead. The idea is to use all
> the available swap space as a mirror of memory.
We already do that. That's what the swap cache is. When kswapd swaps
stuff out, it does so asynchronously, but leaves the data in the swap
cache where it can be picked up again if another process wants the
swap entry back. Most importantly, it lets us do the writing to swap
very rapidly, as we can efficiently stream the updates to disk.
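The idea can be illustrated with a toy lookup table. This is a minimal
sketch, not the kernel's swap cache: the names (toy_swap_cache_add and
friends), the fixed table size, and the use of 0 as an "empty" marker
are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 64            /* hypothetical table size */

/* One remembered page; swap_entry == 0 means "slot empty" in this toy. */
struct toy_page {
    unsigned long swap_entry;
};

static struct toy_page toy_cache[TOY_SLOTS];

/* Record that the page written to swap_entry is still held in memory.
 * Returns 0 on success, -1 if the slot is already occupied. */
int toy_swap_cache_add(unsigned long swap_entry)
{
    struct toy_page *p = &toy_cache[swap_entry % TOY_SLOTS];

    assert(swap_entry != 0);    /* 0 is reserved as the empty marker */
    if (p->swap_entry != 0)
        return -1;
    p->swap_entry = swap_entry;
    return 0;
}

/* Return the in-memory copy for swap_entry, or NULL if the caller
 * would have to read the page back from disk. */
struct toy_page *toy_swap_cache_lookup(unsigned long swap_entry)
{
    struct toy_page *p = &toy_cache[swap_entry % TOY_SLOTS];

    return (swap_entry != 0 && p->swap_entry == swap_entry) ? p : NULL;
}

/* Forget the page once its memory has really been reclaimed. */
void toy_swap_cache_remove(unsigned long swap_entry)
{
    struct toy_page *p = &toy_cache[swap_entry % TOY_SLOTS];

    if (p->swap_entry == swap_entry)
        p->swap_entry = 0;
}
```

A fault on a swapped-out page would try the lookup first and go to
disk only when it returns NULL.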
--Stephen
* Re: [PATCH] swapin readahead and fixes
1998-12-08 2:31 ` Rik van Riel
1998-12-08 2:51 ` Billy Harvey
@ 1998-12-08 12:35 ` Stephen C. Tweedie
1998-12-08 13:51 ` Rik van Riel
1998-12-09 2:41 ` Drago Goricanec
2 siblings, 1 reply; 23+ messages in thread
From: Stephen C. Tweedie @ 1998-12-08 12:35 UTC (permalink / raw)
To: Rik van Riel; +Cc: Billy Harvey, Stephen C. Tweedie, Linux MM, Linux Kernel
Hi,
On Tue, 8 Dec 1998 03:31:25 +0100 (CET), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:
> On a swapout, we will scan ahead of where we are (p->swap_address)
> and swap out the next number of pages too.
Yes, but be aware that for good performance you need to combine this
with a mechanism to ensure swap space does not become fragmented, and
you also need a swap-behind mechanism for sequential accesses (so that
if an application is scanning a data set sequentially, the un-accessed
space behind the current application "cursor" is being removed from
memory just as fast as the stuff about to be accessed is being brought
in).
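A small simulation of the swap-behind idea, as a sketch under stated
assumptions: the array size and the AHEAD/BEHIND window widths are
invented for the example, and a byte array stands in for real page
state.

```c
#include <assert.h>

#define NPAGES 32   /* hypothetical size of the data set, in pages */
#define AHEAD   4   /* pages read in ahead of the cursor (assumed) */
#define BEHIND  2   /* pages kept behind the cursor (assumed) */

static unsigned char resident[NPAGES];

/* Called when a forward scan touches page `cursor`. */
void touch_sequential(int cursor)
{
    int i;

    assert(cursor >= 0 && cursor < NPAGES);

    /* Read-ahead: bring in the cursor page and the next AHEAD pages. */
    for (i = cursor; i <= cursor + AHEAD && i < NPAGES; i++)
        resident[i] = 1;

    /* Swap-behind: release everything more than BEHIND pages back. */
    for (i = 0; i < cursor - BEHIND; i++)
        resident[i] = 0;
}

/* Number of pages currently held in memory. */
int resident_count(void)
{
    int i, n = 0;

    for (i = 0; i < NPAGES; i++)
        n += resident[i];
    return n;
}
```

Because pages leave behind the cursor at the same rate they arrive in
front of it, the resident set never grows past AHEAD + BEHIND + 1
pages during the scan.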
--Stephen
* Re: [PATCH] swapin readahead and fixes
1998-12-08 12:35 ` Stephen C. Tweedie
@ 1998-12-08 13:51 ` Rik van Riel
0 siblings, 0 replies; 23+ messages in thread
From: Rik van Riel @ 1998-12-08 13:51 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Billy Harvey, Linux MM, Linux Kernel
On Tue, 8 Dec 1998, Stephen C. Tweedie wrote:
> On Tue, 8 Dec 1998 03:31:25 +0100 (CET), Rik van Riel
> <H.H.vanRiel@phys.uu.nl> said:
>
> > On a swapout, we will scan ahead of where we are (p->swap_address)
> > and swap out the next number of pages too.
>
> Yes, but be aware that for good performance you need to combine this
> with a mechanism to ensure swap space does not become fragmented,
> and you also need a swap-behind mechanism for sequential accesses
> (so that if an application is scanning a data set sequentially, the
> un-accessed space behind the current application "cursor" is being
> removed from memory just as fast as the stuff about to be accessed
> is being brought in).
And we also want a nice swapout clustering algorithm and
an awful lot of other stuff as well. I think we should
work on that stuff in the 'vacuum' period when 2.2 stabilizes
and 2.3 hasn't split off yet. Then we can merge the changes
in 2.3.very_small so we don't hold up the tree and give
something else the chance to hold it up again and again...
cheers,
Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+
* Re: [PATCH] swapin readahead and fixes
1998-12-08 2:31 ` Rik van Riel
1998-12-08 2:51 ` Billy Harvey
1998-12-08 12:35 ` Stephen C. Tweedie
@ 1998-12-09 2:41 ` Drago Goricanec
1998-12-09 11:58 ` Stephen C. Tweedie
2 siblings, 1 reply; 23+ messages in thread
From: Drago Goricanec @ 1998-12-09 2:41 UTC (permalink / raw)
To: H.H.vanRiel; +Cc: Billy.Harvey, sct, linux-mm, linux-kernel
On Tue, 8 Dec 1998 03:31:25 +0100 (CET), Rik van Riel writes:
> On a swapout, we will scan ahead of where we are (p->swap_address)
> and swap out the next number of pages too. We break the loop if:
> - the page isn't present or already in swap
> - the next two pages were touched since our last scan
> - the page isn't allocated
> - we reach the end of a SWAP_CLUSTER area in swap space
>
> If we write this way (no more expensive than normal because
> we write the stuff in one disk movement) swapin readahead
> will be much more effective and performance will increase.
Except for disk I/O bound processes, where the swapout writeahead
steals some extra time from the disk. I guess this is where having
separate swap and data disks would help.
Looking forward to trying out your patches myself.
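For reference, the break conditions in the quoted list can be sketched
as a scan loop. This is one possible reading, not the patch itself:
struct toy_pte, writeahead_count() and the SWAP_CLUSTER value of 8 are
hypothetical, and "the next two pages were touched" is interpreted
here as "both of the two following pages were referenced".

```c
#include <assert.h>
#include <stddef.h>

#define SWAP_CLUSTER 8   /* hypothetical cluster size, in pages */

/* Toy page state; the field names are made up for this example. */
struct toy_pte {
    int allocated;   /* page exists in the address space */
    int present;     /* page is in memory */
    int in_swap;     /* page already has a copy in swap */
    int touched;     /* referenced since the last scan */
};

/* Count how many pages, starting at `start`, would go out in one
 * sweep if the first of them is written to swap offset `swap_slot`. */
size_t writeahead_count(const struct toy_pte *pte, size_t npages,
                        size_t start, size_t swap_slot)
{
    size_t i, n = 0;

    assert(start < npages);
    for (i = start; i < npages; i++) {
        if (!pte[i].allocated)
            break;                        /* page isn't allocated */
        if (!pte[i].present || pte[i].in_swap)
            break;                        /* not present, or in swap */
        n++;
        if ((swap_slot + n) % SWAP_CLUSTER == 0)
            break;                        /* end of the swap cluster */
        if (i + 2 < npages && pte[i + 1].touched && pte[i + 2].touched)
            break;                        /* next two pages are in use */
    }
    return n;
}
```

The caller would then submit those pages as a single write, so the
extra pages ride along with the head movement that was needed anyway.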
Drago
* Re: [PATCH] swapin readahead and fixes
1998-12-09 2:41 ` Drago Goricanec
@ 1998-12-09 11:58 ` Stephen C. Tweedie
0 siblings, 0 replies; 23+ messages in thread
From: Stephen C. Tweedie @ 1998-12-09 11:58 UTC (permalink / raw)
To: Drago Goricanec; +Cc: H.H.vanRiel, Billy.Harvey, sct, linux-mm, linux-kernel
Hi,
On Wed, 09 Dec 1998 11:41:52 +0900, Drago Goricanec
<drago@king.otsd.ts.fujitsu.co.jp> said:
>> If we write this way (no more expensive than normal because
>> we write the stuff in one disk movement) swapin readahead
>> will be much more effective and performance will increase.
> Except for disk I/O bound processes, where the swapout writeahead
> steals some extra time from the disk.
Not necessarily: having to do extra seeks hurts the throughput MUCH
more than doing a bit more IO when the disk head is already in position.
> I guess this is where having separate swap and data disks would
> help.
That is _always_ a good idea, anyway.
--Stephen
Thread overview: 23+ messages
1998-12-03 17:56 [PATCH] swapin readahead and fixes Rik van Riel
1998-12-04 11:34 ` Stephen C. Tweedie
1998-12-04 14:02 ` Rik van Riel
1998-12-04 14:34 ` Stephen C. Tweedie
1998-12-05 9:46 ` Gerard Roudier
1998-12-07 16:50 ` Stephen C. Tweedie
1998-12-08 1:34 ` Billy Harvey
1998-12-08 2:31 ` Rik van Riel
1998-12-08 2:51 ` Billy Harvey
1998-12-08 3:00 ` Rik van Riel
1998-12-08 12:35 ` Stephen C. Tweedie
1998-12-08 13:51 ` Rik van Riel
1998-12-09 2:41 ` Drago Goricanec
1998-12-09 11:58 ` Stephen C. Tweedie
1998-12-08 12:21 ` Stephen C. Tweedie
1998-12-05 10:47 ` Gerard Roudier
1998-12-04 19:25 ` Chris Evans
1998-12-04 20:47 ` Rik van Riel
1998-12-05 18:59 ` Alan Cox
1998-12-05 19:02 ` Rik van Riel
1998-12-06 5:20 ` Rik van Riel
1998-12-06 5:23 ` Steve VanDevender
1998-12-07 11:52 ` Rik van Riel