Hi,
I tested the set_bit()/__set_bit() ops (atomic and non-atomic) on my Xeon.
This test is not perfect, but I think it shows some aspects of the performance of atomic ops.

Program:
The program touches memory in a tight loop, using atomic and non-atomic set_bit().
The memory size is 512k, the same as the L2 cache size.
I attach it to this mail, but it is configured for my Xeon and looks ugly :).
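
For readers without the attachment, here is a minimal userspace sketch of the
kind of loop described above. It is not the attached program: the helper names,
the GCC/Clang __atomic_fetch_or builtin standing in for the kernel's set_bit(),
and the clock() timer are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <time.h>

#define BUF_BYTES     (512 * 1024)   /* working set = L2 size from the mail */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BUF_LONGS     (BUF_BYTES / sizeof(unsigned long))

static unsigned long buf[BUF_LONGS];

/* Non-atomic: plain read-modify-write, in the spirit of __set_bit(). */
static inline void set_bit_nonatomic(size_t nr, unsigned long *addr)
{
    addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Atomic: locked read-modify-write, in the spirit of set_bit(). */
static inline void set_bit_atomic(size_t nr, unsigned long *addr)
{
    __atomic_fetch_or(&addr[nr / BITS_PER_LONG],
                      1UL << (nr % BITS_PER_LONG), __ATOMIC_SEQ_CST);
}

/* Set one bit in every word of the buffer, 'passes' times over.
 * Returns the number of set-bit operations performed. */
static size_t touch_all(int atomic, size_t passes)
{
    size_t ops = 0;
    for (size_t p = 0; p < passes; p++) {
        for (size_t i = 0; i < BUF_LONGS; i++) {
            size_t nr = i * BITS_PER_LONG;
            if (atomic)
                set_bit_atomic(nr, buf);
            else
                set_bit_nonatomic(nr, buf);
            ops++;
        }
    }
    return ops;
}

/* Time the loop with clock(). The mail does not say how its scores were
 * measured, so this timer is an assumption. Returns CPU seconds. */
static double time_passes(int atomic, size_t passes)
{
    clock_t start = clock();
    size_t ops = touch_all(atomic, passes);
    clock_t end = clock();
    assert(ops == passes * BUF_LONGS);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_passes(0, 1000) and time_passes(1, 1000) gives two numbers to
compare. A single pass is too short for clock() to resolve, hence the repeat
count.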


My CPU:
from /proc/cpuinfo
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) MP CPU 1.90GHz
stepping        : 2
cpu MHz         : 1891.582
cache size      : 512 KB

CPU     : Intel Xeon 1.8GHz

Result:
[root@kanex2 atomic]# nice -10 ./test-atomics
score 0 is            64011 note: cache hit, no atomic
score 1 is           543011 note: cache hit, atomic
score 2 is           303901 note: cache hit, mixture
score 3 is           344261 note: cache miss, no atomic
score 4 is          1131085 note: cache miss, atomic
score 5 is           593443 note: cache miss, mixture
score 6 is           118455 note: cache hit, dependency, noatomic
score 7 is           416195 note: cache hit, dependency, mixture
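
As a rough reading aid, dividing the scores above gives the slowdown of the
atomic runs: score 1 / score 0 is about 8.5, and score 4 / score 3 is about
3.3. The small helper below (a hypothetical name, not part of the test
program) just does that division. The unit of the scores is not stated in
this mail, so only the ratio is meaningful.

```c
#include <assert.h>

/* Ratio of an atomic score to the matching non-atomic score.
 * Scores are "smaller is better", so a result above 1.0 means
 * the atomic variant was slower by that factor. */
static double slowdown(double atomic_score, double plain_score)
{
    assert(plain_score > 0.0);
    return atomic_score / plain_score;
}
```

For example, slowdown(543011, 64011) covers the cache-hit case and
slowdown(1131085, 344261) the cache-miss case.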

A smaller score is better.
Scores 0-2 show set_bit/__set_bit performance with a good cache hit rate.
Scores 3-5 show set_bit/__set_bit performance with a bad cache hit rate.
Scores 6-7 show set_bit/__set_bit performance with a good cache hit rate,
but with a data dependency between the accesses in the tight loop.
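
To illustrate what "data dependency" means for scores 6-7, here is a small
sketch. How the attached program builds its dependency is not described in
this mail, so the two functions below are only one assumed way to do it: in
the first, each index comes from the loop counter alone; in the second, the
next index is computed from the word that was just written, so each access
must wait for the previous one.

```c
#include <assert.h>
#include <stddef.h>

#define NWORDS 1024

/* Independent accesses: the index depends only on the loop counter,
 * so the CPU is free to overlap successive read-modify-writes.
 * Returns the number of steps performed. */
static size_t walk_independent(unsigned long *map, size_t steps)
{
    size_t i;
    assert(map != NULL);
    for (i = 0; i < steps; i++)
        map[i % NWORDS] |= 1UL;
    return i;
}

/* Dependent accesses: the next index is derived from the value just
 * stored, so step N+1 cannot start before step N has finished.
 * With a zeroed map each word becomes 1 and the walk advances by one.
 * Returns the index reached after 'steps' steps. */
static size_t walk_dependent(unsigned long *map, size_t steps)
{
    size_t idx = 0;
    assert(map != NULL);
    for (size_t i = 0; i < steps; i++) {
        map[idx] |= 1UL;
        idx = (idx + map[idx]) % NWORDS;
    }
    return idx;
}
```

Both loops do the same amount of work per step. Only the second one
serializes the accesses.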

To Dave:
The cost of prefetch() is not included here, because I found it is very sensitive to
what is done in the loop and is difficult to measure with this program.
I found that the cost of calling prefetch() is a bit high, so I'll measure again whether
prefetch() in the buddy allocator is good or bad.

I think this result shows that I should use non-atomic ops when I can.
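
As a sketch of what "when I can" means: a non-atomic flag update is only safe
if nothing else can modify the same flags word at the same time. The userspace
model below is not the kernel's code. The struct, the bit number and the
demo_* names are all hypothetical stand-ins for struct page, PG_private and
the SetPagePrivate()/__SetPagePrivate() style macros discussed in this thread.

```c
#include <assert.h>

#define DEMO_PG_PRIVATE 11   /* hypothetical bit number for this sketch */

struct demo_page {
    unsigned long flags;
};

/* Atomic variants: safe even if other CPUs update flags concurrently. */
static inline void demo_set_private(struct demo_page *p)
{
    __atomic_fetch_or(&p->flags, 1UL << DEMO_PG_PRIVATE, __ATOMIC_SEQ_CST);
}

static inline void demo_clear_private(struct demo_page *p)
{
    __atomic_fetch_and(&p->flags, ~(1UL << DEMO_PG_PRIVATE), __ATOMIC_SEQ_CST);
}

/* Non-atomic variants: cheaper, but only valid while the caller has
 * exclusive access to p->flags (for example, under a lock). */
static inline void demo_set_private_nonatomic(struct demo_page *p)
{
    p->flags |= 1UL << DEMO_PG_PRIVATE;
}

static inline void demo_clear_private_nonatomic(struct demo_page *p)
{
    p->flags &= ~(1UL << DEMO_PG_PRIVATE);
}

/* Test the flag; returns 1 if set, 0 otherwise. */
static inline int demo_page_private(const struct demo_page *p)
{
    assert(p != 0);
    return (p->flags >> DEMO_PG_PRIVATE) & 1UL;
}
```

Both pairs leave the same bits in flags. The difference is only in cost and
in what the caller has to guarantee.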

Thanks.
Kame

Hiroyuki KAMEZAWA wrote:
> 
> 
> Okay, I'll do more tests, and if I find atomic ops are slow,
> I'll add __XXXPagePrivate() macros.
> 
> ps. I usually test code on a Xeon 1.8G x 2 server.
> 
> -- Kame
> 
> Andrew Morton wrote:
> 
>> Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>
>>> In the previous version, I used
>>> SetPagePrivate()/ClearPagePrivate()/PagePrivate().
>>> But these are "atomic" operations and look very slow.
>>> This is why I didn't use these macros in this version.
>>>
>>> My previous version, which used set_bit/test_bit/clear_bit, showed
>>> very bad performance
>>> on my test, and I replaced it.
>>
>>
>>
>> That's surprising.  But if you do intend to use non-atomic bitops then
>> please add __SetPagePrivate() and __ClearPagePrivate()
> 
> 


-- 
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>