background: sometime during the 2.4 cycle swapping performance and the
efficiency of the swapcache dropped significantly. I believe i finally
managed to find the problem that caused poor swapping performance and
annoying 'sudden process stoppage' symptoms.

(to those people who are experiencing swapping problems in 2.4: please try
the attached swap-speedup-2.4.3-A1 patch and let me know whether it works
& helps. The patch is against 2.4.4-pre6 or 2.4.3-ac12 and works just fine
on UP and SMP systems here.)

the problem is in lookup_swap_cache(). Per design it's a read-only,
lightweight function that just looks up the swapcache and reestablishes
the mapping if there is an entry still present. (ie. in most cases this
means that there is a fresh swapout still pending). In reality
lookup_swap_cache() did not work as intended: pages were locked in most of
the cases due to swap-out WRITEs, which caused the find_lock_page() to act
as a synchronization point - it blocked until the writeout finished. (!!!)
This is highly inefficient and undesirable - a lookup cache should not and
must not serialize with writeouts.

the reason why lookup_swap_cache() locks the page is due to a valid race,
but the solution excessive: it tries to keep the lookup atomic against
destroyers of the page, page_launder() and reclaim_page(). But it does not
really need the page lock for this - what it needs is atomicity against
swapcache updates. The same atomicity can be achieved by taking the LRU
and pagecache locks, the PageSwapCache() check is now embedded in a new
function: __find_get_swapcache_page().

the patch dramatically improves swapping performance in the tests i've
tried: swap-trashing tests that used to effectively lock the system up,
are chugging along just fine now, and the system is still more than
usable. The performance bug basically killed all good effects of the
swapcache. Swap-in latency of swapped-out processes has decreased
significantly, and overall swapping throughput has increased and
stabilized.

I'd really like to ask all MM developers to take some time to lean back
and verify current code to find similar performance bugs, instead of
trying to hack up new functionality to hide symptoms of bad design or bad
implementation. (for example there are some plans to add "avoid trashing
via process suspension" heuristics, which just work around real problems
like this one. With such code in place i'd probably never have found this
problem.) I believe we have most of the VM functionality in place to have
a world-class VM (most of which is new), what we now need is reliable and
verified behavior, not more complexity.

[ i'd also like to ask for new methods to create 'swap trashing', right
now i cannot make my system unusable via excessive swapping. (i'm
currently using the sieve.c memory trasher from Cesar Eduardo Barros, this
code used to produce the most extreme trashing load - it works just fine
now.) ]

	Ingo