From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 20 Sep 2007 16:16:16 +0200
From: Andrea Arcangeli
Subject: Re: [PATCH/RFC 2/14] Reclaim Scalability: convert inode i_mmap_lock to reader/writer lock
Message-ID: <20070920141616.GV4608@v2.random>
References: <20070914205359.6536.98017.sendpatchset@localhost> <20070914205412.6536.34898.sendpatchset@localhost> <20070920012441.GQ4608@v2.random> <1190297448.5326.8.camel@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1190297448.5326.8.camel@localhost>
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Lee Schermerhorn
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, mel@csn.ul.ie, clameter@sgi.com, riel@redhat.com, balbir@linux.vnet.ibm.com, a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
List-ID:

On Thu, Sep 20, 2007 at 10:10:48AM -0400, Lee Schermerhorn wrote:
> Actually, the system never went OOM. Didn't get that far. I was trying
> to create an Oracle workload that would put me at the brink of reclaim,
> and then by running some app that would eat page cache, push it over the
> edge. But, I apparently went too far--too many Oracle users for this
> system--and it went into reclaim, got hung up with all cpus spinning on
> the i_mmap_lock in page_referenced_file().
>
> I just got this system back for testing. Soon as I build a 23-rc6-mm1
> kernel for it, I'll retest that with the same workload to demonstrate
> the problem. Then I'll try it with the rw_lock patch to see if that
> helps.

Ok, I guess it's a NUMA scalability issue. All the pages belong to that
file... and they all thrash on the same spinlock. So I doubt the rw_lock
will help much: the thrashing, where most of the time is probably spent,
should be the same. The rw_lock still looks like a good idea for smaller
systems with faster interconnects, like dual-core ;)

> Well, except for the concern about the extra overhead of rw_locks.
> I'm more worried about this for the i_mmap_lock than the anon_vma lock. The
> only time we need to take the anon_vma lock for write is when adding a
> new vma to the list, or removing one [vma_link(), et al]. But, the
> i_mmap_lock is also used to protect the truncate_count, and must be
> taken for write there. I expected that a kernel build might show
> something with all the forks for parallel make, mapping of libc, cc
> executable, ... but nothing.

You mean it's not actually slower? Well, I doubt a few more instructions
count for much these days; the major hit is the cacheline miss, and that
will be the same for an rwlock or a spinlock... (which is why it probably
won't help much on systems with tons of CPUs, where cacheline bouncing
thrashes so badly). Ironically, I think it's more of an optimization for
small SMP systems with lots of RAM than for big SMP/NUMA.
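To illustrate the cacheline point, here is a minimal userspace sketch. It assumes a hypothetical toy_rwlock_t built on C11 <stdatomic.h>; it is not the kernel's rwlock_t or its real implementation. The thing to notice is that even the reader path performs an atomic read-modify-write on the shared lock word, so every reader still has to pull that cache line over in exclusive state, much as spin_lock() would. What the rwlock buys is only that readers (e.g. a long walk in something like page_referenced_file()) no longer wait for each other to finish the critical section.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock word (NOT the kernel's rwlock_t layout):
 * bit 31 set      -> a writer holds the lock
 * low bits        -> number of active readers */
#define TOY_WRITER 0x80000000u

typedef struct {
	atomic_uint word;
} toy_rwlock_t;

static inline void toy_rwlock_init(toy_rwlock_t *l)
{
	atomic_init(&l->word, 0u);
}

/* Reader entry. Even a pure reader does a compare-and-swap on the
 * shared word, i.e. it WRITES the lock's cache line. On a big NUMA box
 * that line still bounces between CPUs exactly as with a spinlock. */
static inline bool toy_read_trylock(toy_rwlock_t *l)
{
	unsigned int old = atomic_load_explicit(&l->word, memory_order_relaxed);

	if (old & TOY_WRITER)
		return false;	/* writer active: readers must wait */
	return atomic_compare_exchange_strong_explicit(&l->word, &old, old + 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static inline void toy_read_lock(toy_rwlock_t *l)
{
	while (!toy_read_trylock(l))
		;	/* spin; a real lock would add a cpu-relax hint */
}

/* Reader exit is another atomic RMW on the same shared word. */
static inline void toy_read_unlock(toy_rwlock_t *l)
{
	atomic_fetch_sub_explicit(&l->word, 1u, memory_order_release);
}

/* Writer entry (e.g. the vma_link() or truncate_count cases in the
 * thread): succeeds only when there are no readers and no writer. */
static inline bool toy_write_trylock(toy_rwlock_t *l)
{
	unsigned int expected = 0u;

	return atomic_compare_exchange_strong_explicit(&l->word, &expected,
						       TOY_WRITER,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static inline void toy_write_lock(toy_rwlock_t *l)
{
	while (!toy_write_trylock(l))
		;	/* spin; writer starvation is not handled here */
}

static inline void toy_write_unlock(toy_rwlock_t *l)
{
	atomic_store_explicit(&l->word, 0u, memory_order_release);
}
```

The design trade-off this sketch shows: concurrent readers no longer exclude each other, which helps when the critical section is long and the line moves cheaply (small SMP), but each acquire and release is still a write to one shared cache line, so on a machine with many CPUs the line transfer itself stays the bottleneck.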