Shaohua,

First bad commit is

commit 51a65898ca3ce5435c809c5046cc949b28690a17
Author: Shaohua Li
Date:   Thu Jan 24 13:13:50 2013 +1100

    swap: add per-partition lock for swapfile

    swap_lock is heavily contended when I test swap to 3 fast SSD (even
    slightly slower than swap to 2 such SSD).  The main contention comes
    from swap_info_get().  This patch tries to fix the gap with adding a
    new per-partition lock.

    Global data like nr_swapfiles, total_swap_pages, least_priority and
    swap_list are still protected by swap_lock.

    nr_swap_pages is an atomic now, it can be changed without swap_lock.
    In theory, it's possible get_swap_page() finds no swap pages but
    actually there are free swap pages.  But sounds not a big problem.

    Accessing partition specific data (like scan_swap_map and so on) is
    only protected by swap_info_struct.lock.

    Changing swap_info_struct.flags need hold swap_lock and
    swap_info_struct.lock, because scan_scan_map() will check it.  read
    the flags is ok with either the locks hold.

    If both swap_lock and swap_info_struct.lock must be hold, we always
    hold the former first to avoid deadlock.

    swap_entry_free() can change swap_list.  To delete that code, we add
    a new highest_priority_index.  Whenever get_swap_page() is called, we
    check it.  If it's valid, we use it.

    It's a pity get_swap_page() still holds swap_lock().  But in
    practice, swap_lock() isn't heavily contended in my test with this
    patch (or I can say there are other much more heavier bottlenecks
    like TLB flush).  And BTW, looks get_swap_page() doesn't really need
    the lock.  We never free swap_info[] and we check SWAP_WRITEOK flag.
    The only risk without the lock is we could swapout to some low
    priority swap, but we can quickly recover after several rounds of
    swap, so sounds not a big deal to me.  But I'd prefer to fix this if
    it's a real problem.

    "swap: make each swap partition have one address_space" improved the
    swapout speed from 1.7G/s to 2G/s.  This patch further improves the
    speed to 2.3G/s, so around 15% improvement.  It's a multi-process
    test, so TLB flush isn't the biggest bottleneck before the patches.

    Signed-off-by: Shaohua Li
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Greg Kroah-Hartman
    Cc: Seth Jennings
    Cc: Konrad Rzeszutek Wilk
    Cc: Xiao Guangrong
    Cc: Dan Magenheimer
    Signed-off-by: Andrew Morton

[ 22.837321] rpc.nfsd (2145) used greatest stack depth: 2704 bytes left
Kernel tests: Boot OK!
[ 24.697006] INFO: trying to register non-static key.
[ 24.698629] the code is fine but needs lockdep annotation.
[ 24.699858] turning off the locking correctness validator.
[ 24.700936] Pid: 2301, comm: swapon Not tainted 3.8.0-rc4-07150-gd1b44b0 #505
[ 24.700936] Call Trace:
[ 24.700936] [] __lock_acquire+0x823/0x8fd
[ 24.700936] [] ? mark_held_locks+0xbe/0xea
[ 24.700936] [] ? mutex_lock_nested+0x2de/0x310
[ 24.700936] [] lock_acquire+0xeb/0x13e
[ 24.700936] [] ? sys_swapon+0x618/0x8d2
[ 24.700936] [] _raw_spin_lock+0x45/0x78
[ 24.700936] [] ? sys_swapon+0x618/0x8d2
[ 24.700936] [] sys_swapon+0x618/0x8d2
[ 24.700936] [] ? retint_swapgs+0x13/0x1b
[ 24.700936] [] system_call_fastpath+0x16/0x1b
[ 24.714377] Adding 307196k swap on /dev/vda. Priority:-1 extents:1 across:307196k
[ 24.753049] Unregister pv shared memory for cpu 1

Thanks,
Fengguang
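
Note on the locking rules quoted above: the commit log says that changing the flags needs both locks, that the global lock is always taken first, and that partition-specific data needs only the per-partition lock. A "non-static key" report from lockdep usually points at a dynamically allocated lock that was taken before being initialized, which may be what happens in sys_swapon() here, though the trace alone does not prove it. The following is only a userspace sketch of those rules using pthread mutexes. Every name in it (struct partition, partition_alloc(), PART_WRITEOK and so on) is hypothetical and stands in for the kernel structures; it is not the kernel code and not a proposed fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the global swap_lock. It is statically
 * allocated, so a static initializer is sufficient. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

#define PART_WRITEOK 0x1u   /* hypothetical analogue of SWAP_WRITEOK */

/* Hypothetical stand-in for swap_info_struct with its own lock. */
struct partition {
    pthread_mutex_t lock;   /* per-partition lock, lives in heap memory */
    unsigned int flags;
    long free_pages;
};

/* Allocate a partition. A lock embedded in dynamically allocated memory
 * has to be initialized explicitly before anyone takes it. */
struct partition *partition_alloc(long pages)
{
    struct partition *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    if (pthread_mutex_init(&p->lock, NULL) != 0) {
        free(p);
        return NULL;
    }
    p->free_pages = pages;
    return p;
}

/* Changing the flags takes both locks, global lock first, so that two
 * threads can never acquire them in opposite order. */
void partition_enable(struct partition *p)
{
    pthread_mutex_lock(&global_lock);
    pthread_mutex_lock(&p->lock);
    p->flags |= PART_WRITEOK;
    pthread_mutex_unlock(&p->lock);
    pthread_mutex_unlock(&global_lock);
}

/* Partition-specific data needs only the per-partition lock. Reading
 * the flags is allowed while holding either lock. */
bool partition_take_page(struct partition *p)
{
    bool ok = false;

    pthread_mutex_lock(&p->lock);
    if ((p->flags & PART_WRITEOK) && p->free_pages > 0) {
        p->free_pages--;
        ok = true;
    }
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Destroy the lock before releasing the memory that contains it. */
void partition_free(struct partition *p)
{
    pthread_mutex_destroy(&p->lock);
    free(p);
}
```

A caller would allocate a partition with partition_alloc(), enable it with partition_enable(), and then call partition_take_page() from any number of threads. In the kernel the analogous initialization call for a spinlock is spin_lock_init(); where exactly it belongs in the swapon path is for the patch author to decide.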