From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH/RFC 1/14] Reclaim Scalability: Convert anon_vma lock to read/write lock
From: Lee Schermerhorn
In-Reply-To: <20070917110234.GF25706@skynet.ie>
References: <20070914205359.6536.98017.sendpatchset@localhost>
	 <20070914205405.6536.37532.sendpatchset@localhost>
	 <20070917110234.GF25706@skynet.ie>
Content-Type: text/plain
Date: Tue, 18 Sep 2007 16:17:21 -0400
Message-Id: <1190146641.5035.80.camel@localhost>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Mel Gorman
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, clameter@sgi.com,
	riel@redhat.com, balbir@linux.vnet.ibm.com, andrea@suse.de,
	a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
List-ID:

On Mon, 2007-09-17 at 12:02 +0100, Mel Gorman wrote:
> On (14/09/07 16:54), Lee Schermerhorn didst pronounce:
> > [PATCH/RFC] 01/14 Reclaim Scalability: Convert anon_vma list lock to a read/write lock
> >
> > Against 2.6.23-rc4-mm1
> >
> > Make the anon_vma list lock a read/write lock.  Heaviest use of this
> > lock is in the page_referenced()/try_to_unmap() calls from vmscan
> > [shrink_page_list()].  These functions can use a read lock to allow
> > some parallelism for different cpus trying to reclaim pages mapped
> > via the same set of vmas.
> >
> > This change should not change the footprint of the anon_vma in the
> > non-debug case.
> >
> > Note: I have seen systems livelock with all cpus in reclaim, down
> > in page_referenced_anon() or try_to_unmap_anon(), spinning on the
> > anon_vma lock.  I have only seen this with the AIM7 benchmark, with
> > workloads of 10s of thousands of tasks.  All of these tasks are
> > children of a single ancestor, so they all share the same anon_vma
> > for each vm area in their respective mm's.  I'm told that Apache
> > can fork thousands of children to handle incoming connections, and
> > I've seen similar livelocks--albeit on the i_mmap_lock [next patch]--
> > running 1000s of Oracle users on a large ia64 platform.
> >
> > With this patch [along with Rik van Riel's split LRU patch] we were
> > able to see the AIM7 workload start swapping, instead of hanging,
> > for the first time.  The same workload DID hang with just Rik's patch,
> > so this patch is apparently useful.
> >
>
> In light of what Peter and Linus said about rw-locks being more expensive
> than spinlocks, we'll need to measure this with some benchmark.  The plus
> side is that this patch can be handled in isolation, because it's either a
> scalability fix or it isn't.  It's worth investigating because you say it
> fixed a real problem: under load, the job was able to complete with
> this patch and live-locked without it.
>
> kernbench is unlikely to show up anything useful here, although it might be
> worth running anyway just in case.  brk_test from aim9 might be useful, as it's
> a micro-benchmark that uses brk(), which is a path affected by this patch.  As
> aim7 is exercising this path, it would be interesting to see whether it shows
> performance differences in the normal, non-stressed case.  Other suggestions?

As Mel predicted, kernel builds don't seem to be affected by this
patch, nor by the i_mmap_lock rw_lock patch.  Below I've included
results for an old ia64 system that I have pretty much exclusive
access to.  I can't get 23-rc4-mm1 or rc6-mm1 to boot on an x86_64
[AMD-based] system right now--still trying to capture a stack trace
[not easy from a remote console :-(].  I don't have access to the
large server with storage for testing Oracle and AIM right now.
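For anyone following along, the shape of the change is small: swap the
lock type and let the reclaim-side list walkers take the lock shared.
The fragments below are just a sketch against the 2.6.23-era rmap
structures--condensed pseudocode of the idea, not the actual patch:

	struct anon_vma {
		rwlock_t lock;		/* was: spinlock_t lock */
		struct list_head head;	/* vmas sharing this anon_vma */
	};

	/* fork/mmap/exit paths still modify the list exclusively: */
	write_lock(&anon_vma->lock);	/* was: spin_lock() */
	list_add_tail(&vma->anon_vma_node, &anon_vma->head);
	write_unlock(&anon_vma->lock);

	/* reclaim [page_referenced_anon()/try_to_unmap_anon()] only
	 * walks the list, so multiple cpus can now overlap here: */
	read_lock(&anon_vma->lock);	/* was: spin_lock() */
	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
		referenced += page_referenced_one(page, vma, &mapcount);
	read_unlock(&anon_vma->lock);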
When I get access to that server back, I will try both of these
patches, both to measure any added overhead and to verify that they
alleviate the problem they're trying to solve.  [I do have evidence
that the anon_vma rw lock improves the situation with AIM7 on a
~21-rcx kernel from earlier this year.]

The times below are the mean [and standard deviation] of 10
consecutive runs, after reboot, of a '-j32' build of the ia64
defconfig.  All times are in seconds.

23-rc4-mm1 - no rmap rw_lock -- i.e., spinlocks:

	            Real       User     System
	mean      101.94    1205.10      92.85
	std dev     0.56       1.04       0.73

23-rc4-mm1 w/ anon_vma rw_lock:

	            Real       User     System
	mean      101.64    1205.36      91.83
	std dev     0.65       0.59       0.67

23-rc4-mm1 w/ i_mmap_lock rw_lock:

	            Real       User     System
	mean      101.70    1204.57      92.20
	std dev     0.51       0.73       0.39

This is a NUMA system, so the differences are more likely the result
of differences in locality--the roll of the dice--than of the lock
types.

More data later, when I get it...

Lee