background: sometime during the 2.4 cycle swapping performance and the efficiency of the swapcache dropped significantly.

I believe i finally managed to find the problem that caused poor swapping performance and annoying 'sudden process stoppage' symptoms. (to those people who are experiencing swapping problems in 2.4: please try the attached swap-speedup-2.4.3-A1 patch and let me know whether it works & helps. The patch is against 2.4.4-pre6 or 2.4.3-ac12 and works just fine on UP and SMP systems here.)

the problem is in lookup_swap_cache(). Per design it's a read-only, lightweight function that just looks up the swapcache and reestablishes the mapping if there is an entry still present. (i.e. in most cases this means that there is a fresh swapout still pending.) In reality lookup_swap_cache() did not work as intended: pages were locked in most of the cases due to swap-out WRITEs, which caused find_lock_page() to act as a synchronization point - it blocked until the writeout finished. (!!!) This is highly inefficient and undesirable - a lookup cache should not and must not serialize with writeouts.

the reason why lookup_swap_cache() locks the page is a valid race, but the solution is excessive: it tries to keep the lookup atomic against destroyers of the page, page_launder() and reclaim_page(). But it does not really need the page lock for this - what it needs is atomicity against swapcache updates. The same atomicity can be achieved by taking the LRU and pagecache locks. The PageSwapCache() check is now embedded in a new function: __find_get_swapcache_page().

the patch dramatically improves swapping performance in the tests i've tried: swap-thrashing tests that used to effectively lock the system up are chugging along just fine now, and the system is still more than usable. The performance bug basically killed all good effects of the swapcache.
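To illustrate the locking argument, here is a minimal user-space sketch of the idea. It is not the patch and not kernel code: `toy_page`, `toy_lookup()`, `toy_put()`, `toy_cache_add()`, `toy_cache_remove()` and the single `cache_lock` are all hypothetical names, and the one spinlock merely stands in for the LRU and pagecache locks. The point it shows is that the lookup takes only the cache-wide lock, does the "is it still in the swapcache" check under that lock, grabs a reference, and never looks at the per-page `locked` flag, so a writeout in flight cannot stall it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8

/* Toy page descriptor; every field is a simplified stand-in. */
struct toy_page {
    unsigned long entry;  /* swap entry this page caches */
    bool in_swapcache;    /* stand-in for PageSwapCache() */
    bool locked;          /* stand-in for the page lock held during writeout */
    int refcount;         /* references handed out by toy_lookup() */
};

/* One lock guards the table (assumption: models the LRU + pagecache locks). */
static atomic_flag cache_lock = ATOMIC_FLAG_INIT;
static struct toy_page *cache[CACHE_SLOTS];

static void cache_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&cache_lock, memory_order_acquire)) {
        /* spin until the holder releases the lock */
    }
}

static void cache_lock_release(void)
{
    atomic_flag_clear_explicit(&cache_lock, memory_order_release);
}

/* Insert a page; updaters serialize on cache_lock, not on the page lock. */
void toy_cache_add(struct toy_page *page)
{
    cache_lock_acquire();
    page->in_swapcache = true;
    cache[page->entry % CACHE_SLOTS] = page;
    cache_lock_release();
}

/* Remove a page, as a "destroyer" of the page would. */
void toy_cache_remove(struct toy_page *page)
{
    cache_lock_acquire();
    if (cache[page->entry % CACHE_SLOTS] == page)
        cache[page->entry % CACHE_SLOTS] = NULL;
    page->in_swapcache = false;
    cache_lock_release();
}

/*
 * Lookup: atomic against add/remove because it runs under cache_lock.
 * It deliberately ignores page->locked, so pending I/O does not block it.
 */
struct toy_page *toy_lookup(unsigned long entry)
{
    struct toy_page *page;

    cache_lock_acquire();
    page = cache[entry % CACHE_SLOTS];
    if (page && page->entry == entry && page->in_swapcache)
        page->refcount++;   /* pin the page before dropping the lock */
    else
        page = NULL;
    cache_lock_release();
    return page;
}

/* Drop a reference obtained from toy_lookup(). */
void toy_put(struct toy_page *page)
{
    cache_lock_acquire();
    assert(page->refcount > 0);
    page->refcount--;
    cache_lock_release();
}
```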
Swap-in latency of swapped-out processes has decreased significantly, and overall swapping throughput has increased and stabilized.

I'd really like to ask all MM developers to take some time to lean back and verify current code to find similar performance bugs, instead of trying to hack up new functionality to hide symptoms of bad design or bad implementation. (for example there are some plans to add "avoid thrashing via process suspension" heuristics, which just work around real problems like this one. With such code in place i'd probably never have found this problem.)

I believe we have most of the VM functionality in place to have a world-class VM (most of which is new). What we now need is reliable and verified behavior, not more complexity.

[ i'd also like to ask for new methods to create 'swap thrashing': right now i cannot make my system unusable via excessive swapping. (i'm currently using the sieve.c memory thrasher from Cesar Eduardo Barros. This code used to produce the most extreme thrashing load - it works just fine now.) ]

	Ingo