From: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
To: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@osdl.org>,
haveblue@us.ibm.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, lhms-devel@lists.sourceforge.net,
wli@holomorphy.com
Subject: Re: [Lhms-devel] [RFC] buddy allocator without bitmap [2/4]
Date: Fri, 27 Aug 2004 13:48:34 +0900
Message-ID: <412EBD22.2090508@jp.fujitsu.com>
In-Reply-To: <412E8009.3080508@jp.fujitsu.com>
[-- Attachment #1: Type: text/plain, Size: 2681 bytes --]
Hi,
I tested set_bit()/__set_bit(), i.e. atomic and non-atomic bit ops, on my Xeon.
The test is not perfect, but I think it shows some aspects of the performance of atomic ops.
Program:
The program touches memory in a tight loop, using atomic and non-atomic set_bit().
The memory size is 512k, the L2 cache size.
I attach it to this mail, but it is configured for my Xeon and looks ugly :).
My CPU:
from /proc/cpuinfo
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) MP CPU 1.90GHz
stepping : 2
cpu MHz : 1891.582
cache size : 512 KB
CPU : Intel Xeon 1.8GHz
Result:
[root@kanex2 atomic]# nice -10 ./test-atomics
score 0 is 64011 note: cache hit, no atomic
score 1 is 543011 note: cache hit, atomic
score 2 is 303901 note: cache hit, mixture
score 3 is 344261 note: cache miss, no atomic
score 4 is 1131085 note: cache miss, atomic
score 5 is 593443 note: cache miss, mixture
score 6 is 118455 note: cache hit, dependency, noatomic
score 7 is 416195 note: cache hit, dependency, mixture
A smaller score is better. Each score is the TSC cycle count for one timed loop, averaged over 100 tries.
Scores 0-2 show set_bit/__set_bit performance with a good cache hit rate.
Scores 3-5 show set_bit/__set_bit performance with a bad cache hit rate.
Scores 6-7 show set_bit/__set_bit performance with a good cache hit rate,
but with a data dependency between the accesses in the tight loop.
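A rough way to read these scores is to convert them to cycles per bit operation and to slowdown ratios. The sketch below assumes the loop geometry of the attached test-atomics.c (a 512k buffer walked two 128-byte lines at a time, two bit operations per step); the names BUF_BYTES, LINE_BYTES, OPS_PER_RUN, cycles_per_op() and slowdown() are made up for illustration. The figures include loop and rdtsc overhead, so they are only an approximation.

```c
#include <assert.h>

/* Loop geometry assumed from test-atomics.c: 512 KiB buffer, 128-byte
 * lines, two lines per iteration, two bit operations per iteration. */
#define BUF_BYTES   (512 * 1024)
#define LINE_BYTES  128
#define OPS_PER_RUN ((BUF_BYTES / (LINE_BYTES * 2)) * 2)

/* Average TSC cycles per bit operation for one reported score. */
static double cycles_per_op(unsigned long long score)
{
    return (double)score / OPS_PER_RUN;
}

/* How many times slower a score is than a baseline score. */
static double slowdown(unsigned long long score, unsigned long long baseline)
{
    assert(baseline > 0);
    return (double)score / (double)baseline;
}
```

With the numbers above, cycles_per_op(64011) is roughly 16 cycles and cycles_per_op(543011) roughly 133 cycles, and slowdown(543011, 64011) comes out at about 8.5 for the cache-hit case, against about 3.3 for slowdown(1131085, 344261) in the cache-miss case.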
To Dave:
The cost of prefetch() is not measured here, because I found it is very sensitive to
what is done in the loop and is difficult to measure in this program.
I found the cost of calling prefetch() is a bit high, so I'll measure again whether
prefetch() in the buddy allocator is good or bad.
I think this result shows I should use non-atomic ops when I can.
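For illustration, the non-atomic page-flag macros discussed in this thread could look roughly like the sketch below. This is a user-space stand-in, not kernel code: struct page, the PG_private bit number and the nonatomic_*() helpers are assumptions made up for the example, and only the macro names follow the ones mentioned in the thread.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the kernel's struct page and PG_private;
 * the bit number is illustrative only. */
struct page {
    unsigned long flags;
};
#define PG_private 12

/* Non-atomic helpers: a plain read-modify-write, no lock prefix. */
static inline void nonatomic_set_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && (unsigned)nr < sizeof(*addr) * CHAR_BIT);
    *addr |= 1UL << nr;
}

static inline void nonatomic_clear_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && (unsigned)nr < sizeof(*addr) * CHAR_BIT);
    *addr &= ~(1UL << nr);
}

static inline int nonatomic_test_bit(int nr, const unsigned long *addr)
{
    return (int)((*addr >> nr) & 1UL);
}

/* Macro names as used in the thread; bodies are only a sketch. */
#define __SetPagePrivate(page)   nonatomic_set_bit(PG_private, &(page)->flags)
#define __ClearPagePrivate(page) nonatomic_clear_bit(PG_private, &(page)->flags)
#define PagePrivate(page)        nonatomic_test_bit(PG_private, &(page)->flags)
```

Such macros are only safe where nothing else can modify the same flags word concurrently, for example while the caller holds a lock that covers the page.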
Thanks.
Kame
Hiroyuki KAMEZAWA wrote:
>
>
> Okay, I'll do more tests and if I find atomic ops are slow,
> I'll add __XXXPagePrivate() macros.
>
> ps. I usually test code on a Xeon 1.8G x 2 server.
>
> -- Kame
>
> Andrew Morton wrote:
>
>> Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>
>>> In the previous version, I used
>>> SetPagePrivate()/ClearPagePrivate()/PagePrivate().
>>> But these are "atomic" operations and look very slow.
>>> This is why I didn't use these macros in this version.
>>>
>>> My previous version, which used set_bit/test_bit/clear_bit, showed
>>> very bad performance on my test, so I replaced it.
>>
>>
>>
>> That's surprising. But if you do intend to use non-atomic bitops then
>> please add __SetPagePrivate() and __ClearPagePrivate()
>
>
--
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
[-- Attachment #2: test-atomics.c --]
[-- Type: text/plain, Size: 6834 bytes --]
#include <stdio.h>
#include <stdlib.h> /* malloc() */
#include <string.h> /* memset() */
#include <sys/mman.h>
/* Note: this program is written for Xeon */
/*
* Stolen from Linux.
*
*/
#define ADDR (*(volatile long *) addr)
/*
* set_bit - Atomically set a bit in memory
* @nr: the bit to set
* @addr: the address to start counting from
*
* This function is atomic and may not be reordered. See __set_bit()
* if you do not require the atomic guarantees.
*
* Note: there are no guarantees that this function will not be reordered
* on non x86 architectures, so if you are writing portable code,
* make sure not to rely on its reordering guarantees.
*
* Note that @nr may be almost arbitrarily large; this function is not
* restricted to acting on a single-word quantity.
*/
static inline void set_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( "lock ;"
"btsl %1,%0"
:"=m" (ADDR)
:"Ir" (nr));
}
/**
* __set_bit - Set a bit in memory
* @nr: the bit to set
* @addr: the address to start counting from
*
* Unlike set_bit(), this function is non-atomic and may be reordered.
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
static inline void __set_bit(int nr, volatile unsigned long * addr)
{
__asm__(
"btsl %1,%0"
:"=m" (ADDR)
:"Ir" (nr));
}
#define rdtsc(low,high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))
/*
* Test params.
*
*/
#define CACHESIZE (512 * 1024) /* L2 cache size */
#define LCACHESIZE (CACHESIZE/sizeof(long))
#define PAGESIZE 4096
#define LPAGESIZE (PAGESIZE/sizeof(long))
#define MAX_TRY (100)
#define NOCACHEMISS_NOATOMIC 0
#define NOCACHEMISS_ATOMIC 1
#define NOCACHEMISS_MIXTURE 2
#define NOATOMIC 3
#define ATOMIC 4
#define MIXTURE 5
#define NOATOMIC_DEPEND 6
#define MIXTURE_DEPEND 7
#define NR_OPS 8
char message[NR_OPS][64]={
"cache hit, no atomic",
"cache hit, atomic",
"cache hit, mixture",
"cache miss, no atomic",
"cache miss, atomic",
"cache miss, mixture",
"cache hit, dependency, noatomic",
"cache hit, dependency, mixture"
};
#define LINESIZE 128 /* L2 line size */
#define LLINESIZE (LINESIZE/sizeof(long))
/*
* function for preparing cache status
*/
void hot_cache(char *buffer,int size)
{
memset(buffer,0,size);
return;
}
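/*
 * Push the test buffer out of the cache by zeroing a freshly allocated
 * region of the same size. 'buffer' itself is not touched, and the new
 * region is never freed.
 */
void cold_cache(char *buffer,int size)
{
unsigned long *addr;
addr = malloc(size);
memset(addr,0,size);
return;
}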
#define prefetch(addr) \
__asm__ __volatile__ ("prefetcht0 %0":: "m" (addr))
int main(int argc, char *argv[])
{
unsigned long long score[NR_OPS][MAX_TRY];
unsigned long long average_score[NR_OPS];
unsigned long *map, *addr;
struct {
unsigned long low;
unsigned long high;
} start,end;
int try, i, j;
unsigned long long lstart,lend;
map = mmap(NULL, CACHESIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
for(try = 0; try < MAX_TRY; try++) {
/* there is no page fault, cache hit */
hot_cache((char *)map, CACHESIZE);
/* No atomic ops case */
rdtsc(start.low, start.high);
for(addr = map;addr != map + LCACHESIZE; addr += LLINESIZE * 2) {
__set_bit(1,map);
__set_bit(2,map + LLINESIZE);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[NOCACHEMISS_NOATOMIC][try] = lend - lstart;
/* there is no page fault, small cache miss */
hot_cache((char *)map, CACHESIZE);
/* atomic ops case */
rdtsc(start.low, start.high);
for(addr = map;addr != map + LCACHESIZE; addr += LLINESIZE * 2) {
set_bit(1,map);
set_bit(2,map + LLINESIZE);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[NOCACHEMISS_ATOMIC][try] = lend - lstart;
/* there is no page fault, small cache miss */
hot_cache((char *)map, CACHESIZE);
/* mixture case */
rdtsc(start.low, start.high);
for(addr = map;addr != map + LCACHESIZE; addr += LLINESIZE * 2) {
__set_bit(1,map);
set_bit(2,map + LLINESIZE);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[NOCACHEMISS_MIXTURE][try] = lend - lstart;
/* expire cache */
cold_cache((char *)map, CACHESIZE);
/* NOATOMIC case */
rdtsc(start.low, start.high);
for(addr = map; addr != map + LCACHESIZE; addr += LLINESIZE*2){
__set_bit(1,addr);
__set_bit(2,addr + LLINESIZE);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[NOATOMIC][try] = lend - lstart;
/* expire cache */
cold_cache((char *)map, CACHESIZE);
/* ATOMIC_ONLY case */
rdtsc(start.low, start.high);
for(addr = map; addr != map + LCACHESIZE; addr += LLINESIZE * 2){
set_bit(1,addr);
set_bit(2,addr + LLINESIZE);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[ATOMIC][try] = lend - lstart;
/* expire cache */
cold_cache((char *)map, CACHESIZE);
/* MIXTURE case */
rdtsc(start.low, start.high);
for(addr = map; addr != map + LCACHESIZE; addr += LLINESIZE * 2){
__set_bit(1,addr);
set_bit(2,addr + LLINESIZE);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[MIXTURE][try] = lend - lstart;
/* hot cache */
hot_cache((char *)map, CACHESIZE);
/* case with dependency */
rdtsc(start.low, start.high);
for(addr = map; addr != map + LCACHESIZE; addr += LLINESIZE * 2){
__set_bit(1,addr);
__set_bit(2,addr);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[NOATOMIC_DEPEND][try] = lend - lstart;
/* hot cache */
hot_cache((char *)map, CACHESIZE);
/* case with dependency, mixture */
rdtsc(start.low, start.high);
for(addr = map; addr != map + LCACHESIZE; addr += LLINESIZE * 2){
__set_bit(1,addr);
set_bit(2,addr);
}
rdtsc(end.low, end.high);
lstart = (unsigned long long)start.high << 32 | start.low;
lend = (unsigned long long)end.high << 32 | end.low;
score[MIXTURE_DEPEND][try] = lend - lstart;
}
for(j = 0; j < NR_OPS; j++) {
average_score[j] = 0;
for(i = 0; i < try; i++) {
average_score[j] += score[j][i];
}
printf("score %d is %16llu note: %s\n",j,average_score[j]/try,
message[j]);
}
return 0;
}
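The attached program relies on x86 inline assembly. As a hedged sketch only, the same atomic versus non-atomic distinction can be written with the GCC/Clang __atomic builtins, so it also compiles on other architectures. The names BITS_PER_WORD, set_bit_portable(), set_bit_nonatomic() and test_bit_portable() are made up here, and this is not the kernel's implementation. Like the btsl-based versions above, nr may index past the first word.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomic set: one indivisible read-modify-write, sequentially consistent. */
static inline void set_bit_portable(unsigned int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);

    assert(addr != NULL);
    __atomic_fetch_or(&addr[nr / BITS_PER_WORD], mask, __ATOMIC_SEQ_CST);
}

/* Non-atomic set: plain load, OR, store; concurrent updates may be lost. */
static inline void set_bit_nonatomic(unsigned int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);

    assert(addr != NULL);
    addr[nr / BITS_PER_WORD] |= mask;
}

/* Return 1 if bit nr is set, 0 otherwise. */
static inline int test_bit_portable(unsigned int nr, const unsigned long *addr)
{
    assert(addr != NULL);
    return (int)((addr[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}
```

On x86 a compiler will typically emit a lock-prefixed instruction for the atomic variant, which is where the extra cost measured in this mail would come from; the exact instruction chosen depends on the compiler and its flags.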