Date: Fri, 27 Aug 2004 14:20:48 +0900
From: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [Lhms-devel] [RFC] buddy allocator without bitmap [2/4]
In-reply-to: <20040826215927.0af2dee9.akpm@osdl.org>
Message-id: <412EC4B0.1040901@jp.fujitsu.com>
References: <412DD1AA.8080408@jp.fujitsu.com>
 <1093535402.2984.11.camel@nighthawk> <412E6CC3.8060908@jp.fujitsu.com>
 <20040826171840.4a61e80d.akpm@osdl.org> <412E8009.3080508@jp.fujitsu.com>
 <412EBD22.2090508@jp.fujitsu.com> <20040826215927.0af2dee9.akpm@osdl.org>
To: Andrew Morton
Cc: haveblue@us.ibm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 lhms-devel@lists.sourceforge.net, wli@holomorphy.com

Andrew Morton wrote:
> Certainly, executing an atomic op in a tight loop will show a lot of
> difference.  But that doesn't mean that making these operations
> non-atomic makes a significant difference to overall kernel
> performance!

Thanks. The test I ran before posting the patch calls mmap()/munmap()
on regions of 4-16 megabytes. An munmap() of that size triggers many
calls to __free_pages_bulk(), and many pages are coalesced at once.
That means atomic ops in a heavily called tight loop (I call them
three times in the innermost loop...), and my test showed bad
performance ;).

> But whatever - it all adds up.  The microoptimisation is fine - let's
> go that way.

I'd like to add macros to keep the code clear.

>> Result:
>> [root@kanex2 atomic]# nice -10 ./test-atomics
>> score 0 is   64011   note: cache hit,  no atomic
>> score 1 is  543011   note: cache hit,  atomic
>> score 2 is  303901   note: cache hit,  mixture
>> score 3 is  344261   note: cache miss, no atomic
>> score 4 is 1131085   note: cache miss, atomic
>> score 5 is  593443   note: cache miss, mixture
>> score 6 is  118455   note: cache hit,  dependency, no atomic
>> score 7 is  416195   note: cache hit,  dependency, mixture
>>
>> A smaller score is better.
>> Scores 0-2 show set_bit()/__set_bit() performance with a good cache
>> hit rate.
>> Scores 3-5 show set_bit()/__set_bit() performance with a bad cache
>> hit rate.
>> Scores 6-7 show set_bit()/__set_bit() performance with a good cache
>> hit rate but a data dependency between accesses in the tight loop.

> I _think_ the above means atomic ops are 10x more costly, yes?

Yes, when the L2 cache hits, I think.

--
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki
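
A minimal sketch of the kind of stress test described in the message:
repeated mmap()/munmap() of 4-16 MB regions, so that each munmap()
frees many pages at once and drives __free_pages_bulk(). This is a
hypothetical reconstruction, not the original test program; the
iteration count, size stepping, and page-touch stride are illustrative
guesses.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        const size_t min_sz = 4UL << 20;        /* 4 MB  */
        const size_t max_sz = 16UL << 20;       /* 16 MB */

        for (int iter = 0; iter < 1000; iter++) {
                /* step the size through the 4-16 MB range */
                size_t sz = min_sz + (iter % 13) * ((max_sz - min_sz) / 13);

                char *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                /* touch every page so the kernel really allocates it */
                for (size_t off = 0; off < sz; off += 4096)
                        p[off] = 1;

                /* freeing megabytes at once exercises buddy coalescing */
                munmap(p, sz);
        }
        return 0;
}

Freeing a single 16 MB region returns 4096 4 KB pages in one call, so
even a few atomic ops per page in the coalescing loop add up quickly.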
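
The macros themselves are not shown in the message. A hypothetical
sketch of what such wrappers could look like, assuming the goal is to
make explicit at each call site that zone->lock is held and the cheaper
non-atomic bit operations are therefore safe; the buddy_* names are
invented here for illustration and are not from the actual patch.

#include <linux/bitops.h>

/*
 * All buddy-allocator bitmap updates happen under zone->lock, so the
 * non-atomic __set_bit()/__clear_bit() variants are safe here.
 * Wrapping them documents that locking assumption at the call site.
 */
#define buddy_set_bit(nr, addr)         __set_bit((nr), (addr))
#define buddy_clear_bit(nr, addr)       __clear_bit((nr), (addr))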
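
A userspace approximation of the quoted test-atomics benchmark, for
readers who want to reproduce the rough shape of the numbers. It
substitutes GCC's __atomic builtins for the kernel's set_bit() and a
plain OR for __set_bit(), keeps the working set small enough to stay
cache-hot (the "cache hit" cases, scores 0 and 1), and omits the
cache-miss and data-dependency variants; all sizes and loop counts are
illustrative.

#include <stdio.h>
#include <time.h>

#define WORDS (1 << 10)         /* small: stays in cache ("cache hit") */
#define LOOPS (1 << 24)

unsigned long bits[WORDS];      /* external linkage keeps the stores live */

static double run(int atomic)
{
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < LOOPS; i++) {
                unsigned long *w = &bits[i & (WORDS - 1)];
                unsigned long mask = 1UL << (i & 31);

                if (atomic)     /* lock-prefixed RMW, like set_bit() */
                        __atomic_fetch_or(w, mask, __ATOMIC_SEQ_CST);
                else            /* plain RMW, like __set_bit() */
                        *w |= mask;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
        printf("no atomic: %.3fs\n", run(0));
        printf("atomic:    %.3fs\n", run(1));
        return 0;
}

As in the quoted results, a smaller time is better; the atomic version
pays for the lock prefix on every iteration even though the data is
cache-hot.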