From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [Lhms-devel] [RFC] buddy allocator without bitmap [2/4]
From: Dave Hansen
In-Reply-To: <412EBD22.2090508@jp.fujitsu.com>
References: <412DD1AA.8080408@jp.fujitsu.com>
	<1093535402.2984.11.camel@nighthawk>
	<412E6CC3.8060908@jp.fujitsu.com>
	<20040826171840.4a61e80d.akpm@osdl.org>
	<412E8009.3080508@jp.fujitsu.com>
	<412EBD22.2090508@jp.fujitsu.com>
Content-Type: text/plain
Message-Id: <1093583072.2984.463.camel@nighthawk>
Mime-Version: 1.0
Date: Thu, 26 Aug 2004 22:04:32 -0700
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Hiroyuki KAMEZAWA
Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, lhms,
	William Lee Irwin III
List-ID:

On Thu, 2004-08-26 at 21:48, Hiroyuki KAMEZAWA wrote:
> I tested set_bit()/__set_bit() ops, atomic and non-atomic ops, on my Xeon.
> I think this test is not perfect, but shows some aspect of performance of atomic ops.
>
> Program:
> the program touches memory in tight loop, using atomic and non-atomic set_bit().
> memory size is 512k, L2 cache size.
> I attached it in this mail, but it is configured to my Xeon and looks ugly :).
...
> To Dave:
> cost of prefetch() is not here, because I found it is very sensitive to
> what is done in the loop and difficult to measure in this program.
> I found cost of calling prefetch is a bit high, I'll measure whether
> prefetch() in buddy allocator is good or bad again.
>
> I think this result shows I should use non-atomic ops when I can.

I think we all know that locked instructions are going to be slower.
What I wanted to see, though, is how that influences a slightly more
realistic test, run in the context of the kernel. Let's see how much
impact prefetch() and atomic vs. non-atomic ops actually have when
they're used *in* the kernel, on a less contrived, less microbenchmarky
test.
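For reference, the microbenchmark described in the quoted text has roughly
the following shape. This is only a user-space sketch, not the program that
was attached to the original mail: the names plain_set_bit(),
atomic_set_bit() and time_set_bits() are made up here, the GCC/Clang
__atomic_fetch_or builtin is assumed as a stand-in for the kernel's
set_bit(), and the access pattern (one bit per word over a 512k buffer) is a
guess at what "touches memory in tight loop" means.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES     (512 * 1024)  /* 512k, the L2-sized buffer from the quoted mail */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic read-modify-write, in the spirit of the kernel's __set_bit(). */
static inline void plain_set_bit(size_t nr, unsigned long *addr)
{
    addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Atomic variant (a locked instruction on x86), standing in for set_bit(). */
static inline void atomic_set_bit(size_t nr, unsigned long *addr)
{
    __atomic_fetch_or(&addr[nr / BITS_PER_LONG],
                      1UL << (nr % BITS_PER_LONG), __ATOMIC_RELAXED);
}

/*
 * Set one bit in every word of buf, `passes` times over, and return the
 * elapsed wall-clock time in nanoseconds. Pass p sets bit (p % BITS_PER_LONG).
 */
double time_set_bits(unsigned long *buf, size_t nwords, int passes, int use_atomic)
{
    struct timespec t0, t1;

    assert(buf != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < passes; p++) {
        size_t bit = (size_t)p % BITS_PER_LONG;
        for (size_t w = 0; w < nwords; w++) {
            if (use_atomic)
                atomic_set_bit(w * BITS_PER_LONG + bit, buf);
            else
                plain_set_bit(w * BITS_PER_LONG + bit, buf);
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

#ifdef BENCH_MAIN
int main(void)
{
    size_t nwords = BUF_BYTES / sizeof(unsigned long);
    unsigned long *buf = calloc(nwords, sizeof *buf);

    if (!buf)
        return 1;
    for (int use_atomic = 0; use_atomic <= 1; use_atomic++) {
        memset(buf, 0, BUF_BYTES);
        printf("%-10s %.0f ns\n", use_atomic ? "atomic" : "non-atomic",
               time_set_bits(buf, nwords, 64, use_atomic));
    }
    free(buf);
    return 0;
}
#endif
```

Built with something like "gcc -O2 -DBENCH_MAIN bench.c", this prints one
timing per variant. The compiler is free to optimize the plain loop heavily,
so the numbers are rough at best, and a loop like this only measures the
instruction in isolation, which is exactly why a more realistic test is
still needed.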

How about finding some kind of benchmark that will do a bunch of forking
and a bunch of page faulting to cause lots of activity in the allocator?
I'd suggest something like http://ck.kolivas.org/kernbench/ or SDET if
you can get your hands on it. Anybody else have some suggestions?

You're probably right about the atomic ops, but it would still be nice
to have some hard data. As for prefetch(), we could scatter it and
unlikely() all over the kernel, but we only tend to do so when we can
either demonstrate a concrete gain or it is a super-hot path. With
hot-and-cold-pages around, even the allocator functions don't
necessarily count as super-hot.

I'll run kernbench and SDET with and without the atomic ops and
prefetch() on some of my hardware and see what I come up with.

-- Dave