* [patch] swap-speedup-2.4.3-A1, massive swapping speedup
@ 2001-04-23 9:20 Ingo Molnar
2001-04-23 15:33 ` Rik van Riel
0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2001-04-23 9:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Linux Kernel List, linux-mm, Marcelo Tosatti,
Rik van Riel, Szabolcs Szakacsits
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2949 bytes --]
background: sometime during the 2.4 cycle swapping performance and the
efficiency of the swapcache dropped significantly. I believe i finally
managed to find the problem that caused poor swapping performance and
annoying 'sudden process stoppage' symptoms.
(to those people who are experiencing swapping problems in 2.4: please try
the attached swap-speedup-2.4.3-A1 patch and let me know whether it works
& helps. The patch is against 2.4.4-pre6 or 2.4.3-ac12 and works just fine
on UP and SMP systems here.)
the problem is in lookup_swap_cache(). Per design it's a read-only,
lightweight function that just looks up the swapcache and reestablishes
the mapping if there is an entry still present. (ie. in most cases this
means that there is a fresh swapout still pending). In reality
lookup_swap_cache() did not work as intended: pages were locked in most of
the cases due to swap-out WRITEs, which caused the find_lock_page() to act
as a synchronization point - it blocked until the writeout finished. (!!!)
This is highly inefficient and undesirable - a lookup cache should not and
must not serialize with writeouts.
the reason why lookup_swap_cache() locks the page is due to a valid race,
but the solution is excessive: it tries to keep the lookup atomic against
destroyers of the page, page_launder() and reclaim_page(). But it does not
really need the page lock for this - what it needs is atomicity against
swapcache updates. The same atomicity can be achieved by taking the LRU
and pagecache locks, the PageSwapCache() check is now embedded in a new
function: __find_get_swapcache_page().
the patch dramatically improves swapping performance in the tests i've
tried: swap-thrashing tests that used to effectively lock the system up
are chugging along just fine now, and the system is still more than
usable. The performance bug basically killed all good effects of the
swapcache. Swap-in latency of swapped-out processes has decreased
significantly, and overall swapping throughput has increased and
stabilized.
I'd really like to ask all MM developers to take some time to lean back
and verify current code to find similar performance bugs, instead of
trying to hack up new functionality to hide symptoms of bad design or bad
implementation. (for example there are some plans to add "avoid thrashing
via process suspension" heuristics, which just work around real problems
like this one. With such code in place i'd probably never have found this
problem.) I believe we have most of the VM functionality in place to have
a world-class VM (most of which is new), what we now need is reliable and
verified behavior, not more complexity.
[ i'd also like to ask for new methods to create 'swap thrashing'; right
now i cannot make my system unusable via excessive swapping. (i'm
currently using the sieve.c memory thrasher from Cesar Eduardo Barros;
this code used to produce the most extreme thrashing load - it works just
fine now.) ]
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 2954 bytes --]
--- linux/mm/filemap.c.orig Mon Apr 23 13:35:06 2001
+++ linux/mm/filemap.c Mon Apr 23 13:37:09 2001
@@ -710,6 +710,34 @@
}
/*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+ unsigned long offset, struct page **hash)
+{
+ struct page *page;
+
+ /*
+ * We need the LRU lock to protect against page_launder().
+ */
+ spin_lock(&pagecache_lock);
+ spin_lock(&pagemap_lru_lock);
+
+ page = __find_page_nolock(mapping, offset, *hash);
+ if (page) {
+ if (PageSwapCache(page))
+ page_cache_get(page);
+ else
+ page = NULL;
+ }
+ spin_unlock(&pagemap_lru_lock);
+ spin_unlock(&pagecache_lock);
+
+ return page;
+}
+
+/*
* Same as the above, but lock the page too, verifying that
* it's still valid once we own it.
*/
--- linux/mm/swap_state.c.orig Mon Apr 23 13:35:06 2001
+++ linux/mm/swap_state.c Mon Apr 23 13:37:01 2001
@@ -163,37 +163,18 @@
/*
* Right now the pagecache is 32-bit only. But it's a 32 bit index. =)
*/
-repeat:
- found = find_lock_page(&swapper_space, entry.val);
+ found = find_get_swapcache_page(&swapper_space, entry.val);
if (!found)
return 0;
- /*
- * Though the "found" page was in the swap cache an instant
- * earlier, it might have been removed by refill_inactive etc.
- * Re search ... Since find_lock_page grabs a reference on
- * the page, it can not be reused for anything else, namely
- * it can not be associated with another swaphandle, so it
- * is enough to check whether the page is still in the scache.
- */
- if (!PageSwapCache(found)) {
- UnlockPage(found);
- page_cache_release(found);
- goto repeat;
- }
+ if (!PageSwapCache(found))
+ BUG();
if (found->mapping != &swapper_space)
- goto out_bad;
+ BUG();
#ifdef SWAP_CACHE_INFO
swap_cache_find_success++;
#endif
- UnlockPage(found);
return found;
}
-
-out_bad:
- printk (KERN_ERR "VM: Found a non-swapper swap page!\n");
- UnlockPage(found);
- page_cache_release(found);
- return 0;
}
/*
--- linux/include/linux/pagemap.h.orig Mon Apr 23 13:35:05 2001
+++ linux/include/linux/pagemap.h Mon Apr 23 13:37:01 2001
@@ -77,7 +77,12 @@
unsigned long index, struct page **hash);
extern void lock_page(struct page *page);
#define find_lock_page(mapping, index) \
- __find_lock_page(mapping, index, page_hash(mapping, index))
+ __find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+ unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+ __find_get_swapcache_page(mapping, index, page_hash(mapping, index))
extern void __add_page_to_hash_queue(struct page * page, struct page **p);
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch] swap-speedup-2.4.3-A1, massive swapping speedup 2001-04-23 9:20 [patch] swap-speedup-2.4.3-A1, massive swapping speedup Ingo Molnar @ 2001-04-23 15:33 ` Rik van Riel 2001-04-23 16:05 ` [patch] swap-speedup-2.4.3-A2 Ingo Molnar 2001-04-23 16:53 ` [patch] swap-speedup-2.4.3-A1, massive swapping speedup Jonathan Morton 0 siblings, 2 replies; 11+ messages in thread From: Rik van Riel @ 2001-04-23 15:33 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Alan Cox, Linux Kernel List, linux-mm, Marcelo Tosatti, Szabolcs Szakacsits On Mon, 23 Apr 2001, Ingo Molnar wrote: > the reason why lookup_swap_cache() locks the page is due to a valid race, > but the solution excessive: it tries to keep the lookup atomic against > destroyers of the page, page_launder() and reclaim_page(). But it does not > really need the page lock for this - what it needs is atomicity against > swapcache updates. The same atomicity can be achieved by taking the LRU > and pagecache locks, the PageSwapCache() check is now embedded in a new > function: __find_get_swapcache_page(). There seems to be one more reason, take a look at the function read_swap_cache_async() in swap_state.c, around line 240: /* * Add it to the swap cache and read its contents. */ lock_page(new_page); add_to_swap_cache(new_page, entry); rw_swap_page(READ, new_page, wait); return new_page; Here we add an "empty" page to the swap cache and use the page lock to protect people from reading this non-up-to-date page. Your patch looks like it could occasionally let a process map in such a page before the IO is done. We really need to fix read_swap_cache_async() and your __find_get_swapcache_page() to make sure that only pages which are up to date can be given to processes. 
> the patch dramatically improves swapping performance in the tests i've
> tried:
> Swap-in latency of swapped-out processes has decreased significantly,
> and overall swapping throughput has increased and stabilized.

Cool, this really sounds like a patch we need, once the correctness
issue is resolved.

> I'd really like to ask all MM developers to take some time to lean back
> and verify current code to find similar performance bugs, instead of
> trying to hack up new functionality to hide symptoms of bad design or bad
> implementation. (for example there are some plans to add "avoid thrashing
> via process suspension" heuristics, which just work around real problems
> like this one. With such code in place i'd probably never have found this
> problem.) I believe we have most of the VM functionality in place to have
> a world-class VM (most of which is new), what we now need is reliable and
> verified behavior, not more complexity.

Agreed, we need to look for all the gotchas in the current code.

OTOH, no matter how fast we make the current functionality, there will
always be some point at which the system is so overloaded that no paging
system can help save it from thrashing. At this point we need load
control ... IMHO it is unacceptable to have Linux fall over under heavy
overload when we could stop this from happening by simple load control
code.

Btw, to test this ... try running 10 copies of gnuchess playing against
itself on a machine with 64 MB of RAM. You'll go under thrashing pretty
quickly. This same situation is possible in real-life scenarios, like a
website getting slashdotted or a mail server getting bombed. We really
want some form of load control to save a machine from thrashing itself
to death in such a situation.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com.br/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux.eu.org/Linux-MM/
* [patch] swap-speedup-2.4.3-A2
2001-04-23 15:33 ` Rik van Riel
@ 2001-04-23 16:05 ` Ingo Molnar
2001-04-23 17:17 ` Linus Torvalds
2001-04-24 5:44 ` [patch] swap-speedup-2.4.3-B3 Ingo Molnar
2001-04-23 16:53 ` [patch] swap-speedup-2.4.3-A1, massive swapping speedup Jonathan Morton
1 sibling, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2001-04-23 16:05 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Alan Cox, Linux Kernel List, linux-mm,
Marcelo Tosatti, Szabolcs Szakacsits

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2120 bytes --]

On Mon, 23 Apr 2001, Rik van Riel wrote:

> There seems to be one more reason, take a look at the function
> read_swap_cache_async() in swap_state.c, around line 240:

you are right - i thought about that issue too and assumed it works like
the pagecache (which first reads the page without hashing it, then tries
to add the result to the pagecache and throws away the page if anyone
else finished it already), but that was incorrect.

i think the swapcache's behavior is actually the more correct one, and
the pagecache should first hash pending reads, then start the IO. If
someone else tries to read the page and the page is not uptodate and
locked, then we should sleep on the page's waitqueue. End-of-page-IO or
PageError should then unlock the page and wake up all waiters. [which
happens already]

several processes trying to read the same page is a legitimate scenario,
and should not result in multiple reads done on the same disk area.
Besides the (albeit small) performance win, IMO this is also a quality
of implementation issue, even if the window is small. (the window might
not be that small in some cases like NFS?)

i've fixed the patch so that it checks Page_Uptodate() within
__find_get_swapcache_page(). The only case i'm not convinced about is
the case of IO errors, but otherwise this should work.

I've also fixed an SMP-only bug in the new __find_get_swapcache_page()
function: it must not run __find_page_nolock() with the LRU lock held.

i've attached swap-speedup-2.4.3-A2 which has these fixes included.

> OTOH, no matter how fast we make the current functionality, there will
> always be some point at which the system is so overloaded that no
> paging system can help save it from thrashing.

agreed - but this should really be the last resort, and i'm still
worried about potentially hiding real performance bugs.

> Btw, to test this ... try running 10 copies of gnuchess playing
> against itself on a machine with 64 MB of RAM. You'll go under
> thrashing pretty quickly.

thanks, i'll try that. (do you have any easy oneliner command line to
start up gnuchess against itself?)

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3316 bytes --]

--- linux/mm/filemap.c.orig	Mon Apr 23 13:35:06 2001
+++ linux/mm/filemap.c	Mon Apr 23 20:55:23 2001
@@ -710,6 +710,43 @@
 }

 /*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+			unsigned long offset, struct page **hash)
+{
+	struct page *page;
+
+	/*
+	 * We need the LRU lock to protect against page_launder().
+	 */
+repeat:
+	spin_lock(&pagecache_lock);
+	page = __find_page_nolock(mapping, offset, *hash);
+	if (page) {
+		spin_lock(&pagemap_lru_lock);
+		if (PageSwapCache(page)) {
+			page_cache_get(page);
+			if (!Page_Uptodate(page) && PageLocked(page)) {
+				spin_unlock(&pagemap_lru_lock);
+				spin_unlock(&pagecache_lock);
+				___wait_on_page(page);
+				page_cache_release(page);
+				goto repeat;
+			}
+			if (!Page_Uptodate(page))
+				printk("hm: page not uptodate in __find_get_swapcache_page()? page flags: %08lx\n", page->flags);
+		} else
+			page = NULL;
+		spin_unlock(&pagemap_lru_lock);
+	}
+	spin_unlock(&pagecache_lock);
+
+	return page;
+}
+
+/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
--- linux/mm/swap_state.c.orig	Mon Apr 23 13:35:06 2001
+++ linux/mm/swap_state.c	Mon Apr 23 13:37:01 2001
@@ -163,37 +163,18 @@
 	/*
	 * Right now the pagecache is 32-bit only. But it's a 32 bit index. =)
	 */
-repeat:
-	found = find_lock_page(&swapper_space, entry.val);
+	found = find_get_swapcache_page(&swapper_space, entry.val);
 	if (!found)
 		return 0;
-	/*
-	 * Though the "found" page was in the swap cache an instant
-	 * earlier, it might have been removed by refill_inactive etc.
-	 * Re search ... Since find_lock_page grabs a reference on
-	 * the page, it can not be reused for anything else, namely
-	 * it can not be associated with another swaphandle, so it
-	 * is enough to check whether the page is still in the scache.
-	 */
-	if (!PageSwapCache(found)) {
-		UnlockPage(found);
-		page_cache_release(found);
-		goto repeat;
-	}
+	if (!PageSwapCache(found))
+		BUG();
 	if (found->mapping != &swapper_space)
-		goto out_bad;
+		BUG();
 #ifdef SWAP_CACHE_INFO
 	swap_cache_find_success++;
 #endif
-	UnlockPage(found);
 	return found;
 }
-
-out_bad:
-	printk (KERN_ERR "VM: Found a non-swapper swap page!\n");
-	UnlockPage(found);
-	page_cache_release(found);
-	return 0;
 }

 /*
--- linux/include/linux/pagemap.h.orig	Mon Apr 23 13:35:05 2001
+++ linux/include/linux/pagemap.h	Mon Apr 23 13:37:01 2001
@@ -77,7 +77,12 @@
		unsigned long index, struct page **hash);
 extern void lock_page(struct page *page);
 #define find_lock_page(mapping, index) \
-	__find_lock_page(mapping, index, page_hash(mapping, index))
+	__find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+			unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+	__find_get_swapcache_page(mapping, index, page_hash(mapping, index))

 extern void __add_page_to_hash_queue(struct page * page, struct page **p);
* Re: [patch] swap-speedup-2.4.3-A2
2001-04-23 16:05 ` [patch] swap-speedup-2.4.3-A2 Ingo Molnar
@ 2001-04-23 17:17 ` Linus Torvalds
2001-04-23 16:54 ` Ingo Molnar
1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2001-04-23 17:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rik van Riel, Alan Cox, Linux Kernel List, linux-mm,
Marcelo Tosatti, Szabolcs Szakacsits

On Mon, 23 Apr 2001, Ingo Molnar wrote:
>
> you are right - i thought about that issue too and assumed it works like
> the pagecache (which first reads the page without hashing it, then tries
> to add the result to the pagecache and throws away the page if anyone else
> finished it already), but that was incorrect.

The above is NOT how the page cache works. Or if some part of the page
cache works that way, then it is a BUG. You must NEVER allow multiple
outstanding reads from the same location - that implies that you're
doing something wrong, and the system is doing too much IO.

The way _all_ parts of the page cache should work is:

 Create new page:
 - look up page. If found, return it
 - allocate new page.
 - look up page again, in case somebody else added it while we
   allocated it.
 - add the page atomically with the lookup if the lookup failed,
   otherwise just free the page without doing anything.
 - return the looked-up / allocated page.

 Return up-to-date page:
 - call the above to get a page cache page.
 - if uptodate, return
 - lock_page()
 - if now uptodate (ie somebody else filled it and held the lock),
   unlock and return.
 - start the IO
 - wait on IO by waiting on the page (modulo other work that you could
   do in the background).
 - if the page is still not up-to-date after we tried to read it, we
   got an IO error. Return error.

The above is how it is always meant to work. The above works for both
new allocations and for old. It works even if an earlier read had failed
(due to wrong permissions for example - think about NFS page caches
where some people may be unable to actually fill a page, so that you
need to re-try on failure).

The above is how the regular read/write paths work (modulo bugs). And
it's also how the swap cache should work.

		Linus
* Re: [patch] swap-speedup-2.4.3-A2
2001-04-23 17:17 ` Linus Torvalds
@ 2001-04-23 16:54 ` Ingo Molnar
0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2001-04-23 16:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Alan Cox, Linux Kernel List, linux-mm,
Marcelo Tosatti, Szabolcs Szakacsits

On Mon, 23 Apr 2001, Linus Torvalds wrote:

> The above is NOT how the page cache works. Or if some part of the page
> cache works that way, then it is a BUG. You must NEVER allow multiple
> outstanding reads from the same location - that implies that you're
> doing something wrong, and the system is doing too much IO.

you are right, the pagecache does it correctly, i've just re-checked
all places.

	Ingo
* [patch] swap-speedup-2.4.3-B3
2001-04-23 16:05 ` [patch] swap-speedup-2.4.3-A2 Ingo Molnar
2001-04-23 17:17 ` Linus Torvalds
@ 2001-04-24 5:44 ` Ingo Molnar
2001-04-24 16:38 ` Linus Torvalds
1 sibling, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2001-04-24 5:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Linux Kernel List, linux-mm, Marcelo Tosatti,
Rik van Riel, Szabolcs Szakacsits

the latest swap-speedup patch can be found at:

	http://people.redhat.com/mingo/swap-speedup/swap-speedup-2.4.3-B3

(the patch is against 2.4.4-pre6 or 2.4.3-ac13.)

-B3 includes Marcelo's patch for another area that blocks unnecessarily
on locked swapcache pages: async swapcache readahead. Marcelo did some
tests which show that this fix brought some nice improvements too.

"make -j32 bzImage" using 128MB mem, 128MB swap, 4 CPUs:

stock 2.4.3-ac13
----------------
real	4m0.678s
user	4m2.870s
sys	0m38.920s

swap-speedup-A2
---------------
real	3m24.190s
user	4m1.070s
sys	0m31.950s

swap-speedup-B3 (A2 + Marcelo's swapin-readahead non-blocking patch)
---------------
real	3m7.410s
user	4m0.940s
sys	0m28.680s

ie. for this kernel compile test:

	swap-speedup-A2 is an 18% speedup relative to stock 2.4.3-ac13
	swap-speedup-B3 is a 28% speedup relative to stock 2.4.3-ac13

and the amount of CPU time spent in the kernel has been reduced
significantly as well. I believe all the correctness and SMP-locking
issues have been taken care of in -B3 as well.

	Ingo
* Re: [patch] swap-speedup-2.4.3-B3
2001-04-24 5:44 ` [patch] swap-speedup-2.4.3-B3 Ingo Molnar
@ 2001-04-24 16:38 ` Linus Torvalds
2001-04-25 2:28 ` Marcelo Tosatti
0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2001-04-24 16:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alan Cox, Linux Kernel List, linux-mm, Marcelo Tosatti,
Rik van Riel, Szabolcs Szakacsits

On Tue, 24 Apr 2001, Ingo Molnar wrote:
>
> the latest swap-speedup patch can be found at:

Please don't add more of those horrible "wait" arguments. Make two
different versions of a function instead. It's going to clean up and
simplify the code, and there really isn't any reason to do what you're
doing.

You should split up the logic differently: if you want to wait for the
page, then DO so:

	page = lookup_swap_cache(..);
	if (page) {
		wait_for_swap_cache_valid(page);
		.. use page ..
	}

Note how much more readable and UNDERSTANDABLE the above is, compared to

	page = lookup_swap_cache(..., 1);
	if (page) {
		...

and note also how splitting up the waiting will

 - simplify the swap cache lookup function, making it faster for
   people who do _NOT_ want to wait.
 - make it easier to statically check the correctness of programs by
   just eye-balling them ("Hey, he's calling 'wait' with the spinlock
   held").
 - make it easier to move the wait around, allowing for more
   concurrency.

Basically, I don't want to mix synchronous and asynchronous interfaces.
Everything should be asynchronous by default, and waiting should be
explicit.

		Linus
* Re: [patch] swap-speedup-2.4.3-B3
2001-04-24 16:38 ` Linus Torvalds
@ 2001-04-25 2:28 ` Marcelo Tosatti
0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2001-04-25 2:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Alan Cox, Linux Kernel List, linux-mm,
Rik van Riel, Szabolcs Szakacsits

On Tue, 24 Apr 2001, Linus Torvalds wrote:

> Basically, I don't want to mix synchronous and asynchronous
> interfaces. Everything should be asynchronous by default, and waiting
> should be explicit.

The following patch changes all swap IO functions to be asynchronous by
default and changes the callers to wait when needed (except
rw_swap_page_base, which will block writers if there are too many
in-flight swap IOs).

Ingo's find_get_swapcache_page() does not wait on locked pages anymore,
which is now up to the callers.

time make -j32 test with a 4 CPU machine, 128M RAM and 128M swap:

pre5
----
228.04user 28.14system 5:16.52elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (525113major+678617minor)pagefaults 0swaps

pre5 + attached patch
---------------------
227.18user 25.49system 3:40.53elapsed 114%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (495387major+644924minor)pagefaults 0swaps

Comments?

diff -Nur linux.orig/include/linux/pagemap.h linux/include/linux/pagemap.h
--- linux.orig/include/linux/pagemap.h	Wed Apr 25 00:51:36 2001
+++ linux/include/linux/pagemap.h	Wed Apr 25 00:53:04 2001
@@ -77,7 +77,12 @@
		unsigned long index, struct page **hash);
 extern void lock_page(struct page *page);
 #define find_lock_page(mapping, index) \
-	__find_lock_page(mapping, index, page_hash(mapping, index))
+	__find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+			unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+	__find_get_swapcache_page(mapping, index, page_hash(mapping, index))

 extern void __add_page_to_hash_queue(struct page * page, struct page **p);
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Wed Apr 25 00:51:36 2001
+++ linux/include/linux/swap.h	Wed Apr 25 00:53:04 2001
@@ -111,8 +111,8 @@
 extern int try_to_free_pages(unsigned int gfp_mask);

 /* linux/mm/page_io.c */
-extern void rw_swap_page(int, struct page *, int);
-extern void rw_swap_page_nolock(int, swp_entry_t, char *, int);
+extern void rw_swap_page(int, struct page *);
+extern void rw_swap_page_nolock(int, swp_entry_t, char *);

 /* linux/mm/page_alloc.c */
@@ -121,8 +121,7 @@
 extern void add_to_swap_cache(struct page *, swp_entry_t);
 extern int swap_check_entry(unsigned long);
 extern struct page * lookup_swap_cache(swp_entry_t);
-extern struct page * read_swap_cache_async(swp_entry_t, int);
-#define read_swap_cache(entry) read_swap_cache_async(entry, 1);
+extern struct page * read_swap_cache_async(swp_entry_t);

 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
diff -Nur linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c	Wed Apr 25 00:51:35 2001
+++ linux/mm/filemap.c	Wed Apr 25 00:53:04 2001
@@ -678,6 +678,34 @@
 }

 /*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+			unsigned long offset, struct page **hash)
+{
+	struct page *page;
+
+	/*
+	 * We need the LRU lock to protect against page_launder().
+	 */
+
+	spin_lock(&pagecache_lock);
+	page = __find_page_nolock(mapping, offset, *hash);
+	if (page) {
+		spin_lock(&pagemap_lru_lock);
+		if (PageSwapCache(page))
+			page_cache_get(page);
+		else
+			page = NULL;
+		spin_unlock(&pagemap_lru_lock);
+	}
+	spin_unlock(&pagecache_lock);
+
+	return page;
+}
+
+/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c	Wed Apr 25 00:51:35 2001
+++ linux/mm/memory.c	Wed Apr 25 00:53:04 2001
@@ -1040,7 +1040,7 @@
			break;
		}
		/* Ok, do the async read-ahead now */
-		new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0);
+		new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset));
		if (new_page != NULL)
			page_cache_release(new_page);
		swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
@@ -1063,13 +1063,13 @@
	if (!page) {
		lock_kernel();
		swapin_readahead(entry);
-		page = read_swap_cache(entry);
+		page = read_swap_cache_async(entry);
		unlock_kernel();
		if (!page) {
			spin_lock(&mm->page_table_lock);
			return -1;
		}
-
+		wait_on_page(page);
		flush_page_to_ram(page);
		flush_icache_page(vma, page);
	}
diff -Nur linux.orig/mm/page_io.c linux/mm/page_io.c
--- linux.orig/mm/page_io.c	Wed Apr 25 00:51:35 2001
+++ linux/mm/page_io.c	Wed Apr 25 00:53:04 2001
@@ -33,7 +33,7 @@
  * that shared pages stay shared while being swapped.
 */

-static int rw_swap_page_base(int rw, swp_entry_t entry, struct page *page, int wait)
+static int rw_swap_page_base(int rw, swp_entry_t entry, struct page *page)
 {
	unsigned long offset;
	int zones[PAGE_SIZE/512];
@@ -41,6 +41,7 @@
	kdev_t dev = 0;
	int block_size;
	struct inode *swapf = 0;
+	int wait = 0;

	/* Don't allow too many pending pages in flight.. */
	if ((rw == WRITE) && atomic_read(&nr_async_pages) >
@@ -104,7 +105,7 @@
 *  - it's marked as being swap-cache
 *  - it's associated with the swap inode
 */
-void rw_swap_page(int rw, struct page *page, int wait)
+void rw_swap_page(int rw, struct page *page)
 {
	swp_entry_t entry;

@@ -116,7 +117,7 @@
		PAGE_BUG(page);
	if (page->mapping != &swapper_space)
		PAGE_BUG(page);
-	if (!rw_swap_page_base(rw, entry, page, wait))
+	if (!rw_swap_page_base(rw, entry, page))
		UnlockPage(page);
 }

@@ -125,7 +126,7 @@
 * Therefore we can't use it.  Later when we can remove the need for the
 * lock map and we can reduce the number of functions exported.
 */
-void rw_swap_page_nolock(int rw, swp_entry_t entry, char *buf, int wait)
+void rw_swap_page_nolock(int rw, swp_entry_t entry, char *buf)
 {
	struct page *page = virt_to_page(buf);

@@ -137,7 +138,8 @@
		PAGE_BUG(page);
	/* needs sync_page to wait I/O completation */
	page->mapping = &swapper_space;
-	if (!rw_swap_page_base(rw, entry, page, wait))
+	if (!rw_swap_page_base(rw, entry, page))
		UnlockPage(page);
+	wait_on_page(page);
	page->mapping = NULL;
 }
diff -Nur linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c	Wed Apr 25 00:51:35 2001
+++ linux/mm/shmem.c	Wed Apr 25 00:53:04 2001
@@ -124,6 +124,7 @@
		*ptr = (swp_entry_t){0};
		freed++;
		if ((page = lookup_swap_cache(entry)) != NULL) {
+			wait_on_page(page);
			delete_from_swap_cache(page);
			page_cache_release(page);
		}
@@ -329,10 +330,11 @@
	spin_unlock (&info->lock);
	lock_kernel();
	swapin_readahead(*entry);
-	page = read_swap_cache(*entry);
+	page = read_swap_cache_async(*entry);
	unlock_kernel();
	if (!page)
		return ERR_PTR(-ENOMEM);
+	wait_on_page(page);
	if (!Page_Uptodate(page)) {
		page_cache_release(page);
		return ERR_PTR(-EIO);
diff -Nur linux.orig/mm/swap_state.c linux/mm/swap_state.c
--- linux.orig/mm/swap_state.c	Wed Apr 25 00:51:35 2001
+++ linux/mm/swap_state.c	Wed Apr 25 00:53:04 2001
@@ -34,7 +34,7 @@
		return 0;

 in_use:
-	rw_swap_page(WRITE, page, 0);
+	rw_swap_page(WRITE, page);
	return 0;
 }

@@ -163,37 +163,18 @@
	/*
	 * Right now the pagecache is 32-bit only. But it's a 32 bit index. =)
	 */
-repeat:
-	found = find_lock_page(&swapper_space, entry.val);
+	found = find_get_swapcache_page(&swapper_space, entry.val);
	if (!found)
		return 0;
-	/*
-	 * Though the "found" page was in the swap cache an instant
-	 * earlier, it might have been removed by refill_inactive etc.
-	 * Re search ... Since find_lock_page grabs a reference on
-	 * the page, it can not be reused for anything else, namely
-	 * it can not be associated with another swaphandle, so it
-	 * is enough to check whether the page is still in the scache.
-	 */
-	if (!PageSwapCache(found)) {
-		UnlockPage(found);
-		page_cache_release(found);
-		goto repeat;
-	}
+	if (!PageSwapCache(found))
+		BUG();
	if (found->mapping != &swapper_space)
-		goto out_bad;
+		BUG();
 #ifdef SWAP_CACHE_INFO
	swap_cache_find_success++;
 #endif
-	UnlockPage(found);
	return found;
 }
-
-out_bad:
-	printk (KERN_ERR "VM: Found a non-swapper swap page!\n");
-	UnlockPage(found);
-	page_cache_release(found);
-	return 0;
 }

 /*
@@ -205,7 +186,7 @@
 * the swap entry is no longer in use.
 */

-struct page * read_swap_cache_async(swp_entry_t entry, int wait)
+struct page * read_swap_cache_async(swp_entry_t entry)
 {
	struct page *found_page = 0, *new_page;
	unsigned long new_page_addr;
@@ -238,7 +219,7 @@
	 */
	lock_page(new_page);
	add_to_swap_cache(new_page, entry);
-	rw_swap_page(READ, new_page, wait);
+	rw_swap_page(READ, new_page);
	return new_page;

 out_free_page:
diff -Nur linux.orig/mm/swapfile.c linux/mm/swapfile.c
--- linux.orig/mm/swapfile.c	Wed Apr 25 00:51:35 2001
+++ linux/mm/swapfile.c	Wed Apr 25 00:53:24 2001
@@ -369,13 +369,15 @@
	/* Get a page for the entry, using the existing
	   swap cache page if there is one.  Otherwise,
	   get a clean page and read the swap into it. */
-	page = read_swap_cache(entry);
+	page = read_swap_cache_async(entry);
	if (!page) {
		swap_free(entry);
		return -ENOMEM;
	}
+	lock_page(page);
	if (PageSwapCache(page))
-		delete_from_swap_cache(page);
+		delete_from_swap_cache_nolock(page);
+	UnlockPage(page);
	read_lock(&tasklist_lock);
	for_each_task(p)
		unuse_process(p->mm, entry, page);
@@ -650,7 +652,7 @@
	}

	lock_page(virt_to_page(swap_header));
-	rw_swap_page_nolock(READ, SWP_ENTRY(type,0), (char *) swap_header, 1);
+	rw_swap_page_nolock(READ, SWP_ENTRY(type,0), (char *) swap_header);

	if (!memcmp("SWAP-SPACE",swap_header->magic.magic,10))
		swap_header_version = 1;
* Re: [patch] swap-speedup-2.4.3-A1, massive swapping speedup
  2001-04-23 15:33 ` Rik van Riel
  2001-04-23 16:05   ` [patch] swap-speedup-2.4.3-A2 Ingo Molnar
@ 2001-04-23 16:53   ` Jonathan Morton
  2001-04-23 17:10     ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Jonathan Morton @ 2001-04-23 16:53 UTC (permalink / raw)
To: Rik van Riel, Ingo Molnar
Cc: Linus Torvalds, Alan Cox, Linux Kernel List, linux-mm,
	Marcelo Tosatti, Szabolcs Szakacsits

>There seems to be one more reason, take a look at the function
>read_swap_cache_async() in swap_state.c, around line 240:
>
>        /*
>         * Add it to the swap cache and read its contents.
>         */
>        lock_page(new_page);
>        add_to_swap_cache(new_page, entry);
>        rw_swap_page(READ, new_page, wait);
>        return new_page;
>
>Here we add an "empty" page to the swap cache and use the
>page lock to protect people from reading this non-up-to-date
>page.

How about reversing the order of the calls - ie. add the page to the
cache only when it's been filled?  That would fix the race.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----
* Re: [patch] swap-speedup-2.4.3-A1, massive swapping speedup
  2001-04-23 16:53 ` [patch] swap-speedup-2.4.3-A1, massive swapping speedup Jonathan Morton
@ 2001-04-23 17:10   ` Linus Torvalds
  2001-04-23 22:13     ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2001-04-23 17:10 UTC (permalink / raw)
To: Jonathan Morton
Cc: Rik van Riel, Ingo Molnar, Alan Cox, Linux Kernel List, linux-mm,
	Marcelo Tosatti, Szabolcs Szakacsits

On Mon, 23 Apr 2001, Jonathan Morton wrote:
> >There seems to be one more reason, take a look at the function
> >read_swap_cache_async() in swap_state.c, around line 240:
> >
> >        /*
> >         * Add it to the swap cache and read its contents.
> >         */
> >        lock_page(new_page);
> >        add_to_swap_cache(new_page, entry);
> >        rw_swap_page(READ, new_page, wait);
> >        return new_page;
> >
> >Here we add an "empty" page to the swap cache and use the
> >page lock to protect people from reading this non-up-to-date
> >page.
>
> How about reversing the order of the calls - ie. add the page to the
> cache only when it's been filled?  That would fix the race.

No. The page cache is used as the IO synchronization point, both for
swapping and for regular IO. You have to add the page in _first_, because
otherwise you may end up doing multiple IO's from different pages.

The proper fix is to do the equivalent of this on all the lookup paths
that want a valid page:

	if (!PageUptodate(page)) {
		lock_page(page);
		if (PageUptodate(page)) {
			unlock_page(page);
			return 0;
		}
		rw_swap_page(page, 0);
		wait_on_page(page);
		if (!PageUptodate(page))
			return -EIO;
	}
	return 0;

This is the whole point of the "uptodate" flag, and for all I know we may
already do all of this (it's certainly the normal setup).

Note how we do NOT block on write-backs in the above: the page will be
up-to-date from the very beginning (it had _better_ be, it's hard to write
back a swap page that isn't up-to-date ;).

The above is how all the file paths work.

		Linus
* Re: [patch] swap-speedup-2.4.3-A1, massive swapping speedup
  2001-04-23 17:10 ` Linus Torvalds
@ 2001-04-23 22:13   ` Marcelo Tosatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2001-04-23 22:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jonathan Morton, Rik van Riel, Ingo Molnar, Alan Cox,
	Linux Kernel List, linux-mm, Szabolcs Szakacsits

On Mon, 23 Apr 2001, Linus Torvalds wrote:

> On Mon, 23 Apr 2001, Jonathan Morton wrote:
> > >There seems to be one more reason, take a look at the function
> > >read_swap_cache_async() in swap_state.c, around line 240:
> > >
> > >        /*
> > >         * Add it to the swap cache and read its contents.
> > >         */
> > >        lock_page(new_page);
> > >        add_to_swap_cache(new_page, entry);
> > >        rw_swap_page(READ, new_page, wait);
> > >        return new_page;
> > >
> > >Here we add an "empty" page to the swap cache and use the
> > >page lock to protect people from reading this non-up-to-date
> > >page.
> >
> > How about reversing the order of the calls - ie. add the page to the
> > cache only when it's been filled?  That would fix the race.
>
> No. The page cache is used as the IO synchronization point, both for
> swapping and for regular IO. You have to add the page in _first_, because
> otherwise you may end up doing multiple IO's from different pages.
>
> The proper fix is to do the equivalent of this on all the lookup paths
> that want a valid page:
>
> 	if (!PageUptodate(page)) {
> 		lock_page(page);
> 		if (PageUptodate(page)) {
> 			unlock_page(page);
> 			return 0;
> 		}
> 		rw_swap_page(page, 0);
> 		wait_on_page(page);
> 		if (!PageUptodate(page))
> 			return -EIO;
> 	}
> 	return 0;
>
> This is the whole point of the "uptodate" flag, and for all I know we may
> already do all of this (it's certainly the normal setup).
>
> Note how we do NOT block on write-backs in the above: the page will be
> up-to-date from the very beginning (it had _better_ be, it's hard to write
> back a swap page that isn't up-to-date ;).
>
> The above is how all the file paths work.

Please don't forget about swapin readahead.

If the page is not up to date and we are doing swapin readahead on it, we
want to fail if the page is already locked (which means it's already under
IO).
end of thread, other threads:[~2001-04-25  2:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-04-23  9:20 [patch] swap-speedup-2.4.3-A1, massive swapping speedup Ingo Molnar
2001-04-23 15:33 ` Rik van Riel
2001-04-23 16:05   ` [patch] swap-speedup-2.4.3-A2 Ingo Molnar
2001-04-23 17:17     ` Linus Torvalds
2001-04-23 16:54       ` Ingo Molnar
2001-04-24  5:44         ` [patch] swap-speedup-2.4.3-B3 Ingo Molnar
2001-04-24 16:38           ` Linus Torvalds
2001-04-25  2:28             ` Marcelo Tosatti
2001-04-23 16:53 ` [patch] swap-speedup-2.4.3-A1, massive swapping speedup Jonathan Morton
2001-04-23 17:10   ` Linus Torvalds
2001-04-23 22:13     ` Marcelo Tosatti