Hi David,

Forgive my selective quoting...

David Chinner wrote:
> Take a large file - say Size = 5x RAM or so - and then start
> N threads running at offset (n / N) * Size, where n = the thread
> number. They each read (Size / N) and so typically don't overlap.
>
> Throughput with increasing numbers of threads on a 24p Altix
> on an XFS filesystem on 2.6.15-rc5 looks like:
>
> Loads    tput
> -----   -------
>   1      789.59
>   2     1191.56
>   4     1724.63
>   8     1213.63
>  16     1057.03
>  32      744.73
>
> Basically, we hit a scaling limitation between 4 and 8 threads. This
> was consistent across I/O sizes from 4KB to 4MB. I took a simple 30s
> PC sample profile:
>
> Percent  Routine
> -------  --------------------------
>   63.62  _write_lock_irqsave
>   15.66  _read_unlock_irq
>
> So _read_unlock_irq looks to be triggered by the mapping->tree_lock.
>
> I think that the write_lock_irqsave() contention is from memory
> reclaim (shrink_list() -> try_to_release_page() -> ->releasepage() ->
> xfs_vm_releasepage() -> try_to_free_buffers() -> clear_page_dirty() ->
> test_clear_page_dirty() -> write_lock_irqsave(&mapping->tree_lock, ...))
> because page cache memory was full of this one file and demand is
> causing its pages to be constantly recycled.

I'd say you're right. tree_lock contention will be coming from a number
of sources. Reclaim, as you say, will be a big one; mpage_readpages
(from readahead) will be another. Then the read lock in find_get_page
in generic_mapping_read will start contending heavily with the writers
and not get much concurrency.

I'm sure lockless (read-side) pagecache will help... not only will it
eliminate read_lock costs, but the reduced read contention should also
decrease write_lock contention and cacheline bouncing.

As well as lockless pagecache, I think we can batch tree_lock
operations in readahead. Would be interesting to see how much this
patch helps.

--
SUSE Labs, Novell Inc.