From: Christoph Lameter <clameter@engr.sgi.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
linux-mm@kvack.org
Subject: Re: [RFT][PATCH 0/2] pagefault scalability alternative
Date: Tue, 23 Aug 2005 09:30:16 -0700 (PDT)
Message-ID: <Pine.LNX.4.62.0508230909120.16321@schroedinger.engr.sgi.com>
In-Reply-To: <Pine.LNX.4.61.0508230822300.5224@goblin.wat.veritas.com>
On Tue, 23 Aug 2005, Hugh Dickins wrote:
> > The basic idea is to have a spinlock per page table entry it seems.
> A spinlock per page table, not a spinlock per page table entry.
That's a spinlock per pmd? Calling it "per page table" is a bit confusing,
since "page table" may refer to the whole tree. Could you come up with a
clearer way of referring to these locks, something other than
page_table_lock or ptl?
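To make the granularity concrete, here is a minimal sketch of what "a
spinlock per page table" amounts to: one lock guarding all the ptes
reachable through a single pmd entry, rather than one lock per pte or the
single mm->page_table_lock. This is illustrative only, not code from either
patch set; example_pte_page_lockptr() and the way the lock is stored are
hypothetical.

	/*
	 * Sketch only, not from either patch set: one spinlock per
	 * page-table page, i.e. per pmd entry's worth of ptes.  Where the
	 * lock actually lives (in struct page, in a side table, ...) is
	 * left open; the lookup helper below is a hypothetical placeholder.
	 */
	#include <linux/mm.h>
	#include <linux/spinlock.h>

	/* hypothetical: return the lock for the pte page behind this pmd entry */
	extern spinlock_t *example_pte_page_lockptr(struct mm_struct *mm, pmd_t *pmd);

	/* fault-path usage pattern under such a lock */
	static int example_install_pte(struct mm_struct *mm, pmd_t *pmd,
				       unsigned long addr, pte_t *ptep, pte_t newval)
	{
		spinlock_t *ptl = example_pte_page_lockptr(mm, pmd);

		spin_lock(ptl);			/* serializes only faults on this pte page */
		if (pte_none(*ptep))		/* another thread may have raced us here */
			set_pte_at(mm, addr, ptep, newval);
		spin_unlock(ptl);
		return 0;
	}

The point for the benchmark below is that concurrent faults landing in
different page-table pages of the same mm no longer serialize on one lock.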
> After dealing with the really hard issues (how to get the definitions
> and inlines into the header files without crashing the HIGHPTE build)
> yesterday, I spent several hours ruminating again on that *pmd issue,
> holding off from making a hundred edits; and in the end added just
> an unsigned long cast into the i386 definition of pmd_none. We must
> avoid basing decisions on two mismatched halves; but pmd_present is
> already safe, and now pmd_none also. The remaining races are benign.
>
> What do you think?
Atomicity can be guaranteed to some degree by using the present bit. For an
update, the present bit is first cleared. When the new value is written, the
word of the entry that does not contain the present bit is written first,
which keeps the entry "not present"; the word with the present bit is
written last.

This means that if any p?d entry is found without the present bit set, a
lock must be taken and the entry reread to get a consistent value.
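As an illustration of that ordering, here is a minimal sketch for an entry
split across two words, assuming the present bit sits in the low word
(PAE-style). It is not the i386 implementation; the struct layout, helper
name and barrier choice are assumptions.

	/*
	 * Illustrative only: update protocol for a pte split across two
	 * words, with the present bit assumed to live in the low word.
	 * Not the actual i386 PAE code.
	 */
	#include <asm/system.h>		/* wmb() */

	struct split_pte {
		unsigned long low;	/* contains the present bit */
		unsigned long high;
	};

	static void example_set_split_pte(volatile struct split_pte *ptep,
					  struct split_pte newval)
	{
		ptep->low = 0;			/* 1. clear present: readers see "not present" */
		wmb();
		ptep->high = newval.high;	/* 2. write the word without the present bit */
		wmb();
		ptep->low = newval.low;		/* 3. present bit becomes visible only last */
	}

Writing the word with the present bit last is what makes that bit a reliable
guard for lockless readers.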
Here are the results of the performance test. In summary, they show that the
performance of our two approaches is equivalent. I would prefer your patches
over mine since they have broader scope and may accelerate other aspects of
VM operations.
Note that these tests need to be taken with some caution. Results are
topology dependent, and only one special case (allocating new pages in
do_anon_page) is measured. Results are somewhat skewed if the amount of
memory per task (mem/threads) becomes too small, so that not enough time is
spent in concurrent page faulting.
We only scale well up to 32 processors. Beyond that, performance keeps
dropping, and there is severe contention at 60. This is still better than
seeing the drop at 4 processors (as with 2.6.13), but not all that we are
after. This performance pattern is typical when only the page_table_lock is
addressed.
I tried the delta patches, which increase performance somewhat further, but
I do not get results in the very high range that I saw last year. Either
something is wrong with the delta patches or some other issue limits
performance these days. I still have to figure out what is going on there.
I may know more after I test on a machine with more processors.
Two samples per kernel, each allocating 1, 4, 8 and 16 GB with 1 to 60 processors.
1. 2.6.13-rc6-mm1
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
1 3 1 1 0.06s 2.08s 2.01s 91530.726 91686.487
1 3 2 1 0.04s 2.40s 1.03s 80313.725 148253.347
1 3 4 1 0.04s 2.48s 0.07s 78019.048 247860.666
1 3 8 1 0.04s 2.76s 0.05s 70217.143 336562.559
1 3 16 1 0.07s 4.37s 0.05s 44201.439 332361.815
1 3 32 1 5.94s 10.92s 1.00s 11650.154 180992.401
1 3 60 1 42.57s 21.80s 2.02s 3054.057 89132.235
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
4 3 1 1 0.13s 8.28s 8.04s 93356.125 93399.776
4 3 2 1 0.13s 9.44s 5.01s 82091.023 152346.188
4 3 4 1 0.12s 9.80s 3.00s 79245.466 256976.998
4 3 8 1 0.17s 10.54s 2.00s 73361.194 383107.125
4 3 16 1 0.16s 17.06s 1.09s 45637.883 404563.542
4 3 32 1 4.27s 42.62s 2.06s 16768.273 294151.260
4 3 60 1 40.02s 110.99s 4.04s 5207.607 177074.387
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
8 3 1 1 0.32s 16.84s 17.01s 91637.381 91636.318
8 3 2 1 0.32s 18.80s 10.02s 82228.356 153285.701
8 3 4 1 0.30s 19.45s 6.00s 79630.620 261810.203
8 3 8 1 0.34s 20.94s 4.00s 73885.006 391636.418
8 3 16 1 0.42s 34.06s 3.07s 45600.835 417784.690
8 3 32 1 9.57s 87.58s 5.01s 16188.390 303679.562
8 3 60 1 37.34s 246.24s 7.07s 5546.221 202734.992
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
16 3 1 1 0.64s 40.12s 40.07s 77161.695 77175.960
16 3 2 1 0.64s 38.24s 20.08s 80891.998 151015.426
16 3 4 1 0.67s 38.75s 11.09s 79784.113 263917.598
16 3 8 1 0.62s 41.82s 7.08s 74107.802 399410.789
16 3 16 1 0.61s 67.76s 7.03s 46003.627 429354.596
16 3 32 1 8.76s 173.04s 9.04s 17302.854 333248.692
16 3 60 1 32.76s 466.27s 13.03s 6303.609 235490.831
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
1 3 1 1 0.03s 2.08s 2.01s 92739.623 92765.448
1 3 2 1 0.02s 2.38s 1.03s 81647.841 150542.942
1 3 4 1 0.06s 2.46s 0.07s 77649.289 247254.017
1 3 8 1 0.05s 2.75s 0.05s 70017.094 346483.976
1 3 16 1 0.06s 4.39s 0.06s 44161.725 313310.777
1 3 32 1 9.02s 11.20s 1.02s 9717.675 162578.985
1 3 60 1 28.92s 29.71s 2.01s 3353.254 93278.693
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
4 3 1 1 0.16s 8.20s 8.03s 93935.977 93937.837
4 3 2 1 0.19s 9.33s 5.01s 82539.043 153124.158
4 3 4 1 0.22s 9.70s 3.00s 79213.537 257326.049
4 3 8 1 0.23s 10.48s 2.00s 73361.194 383192.157
4 3 16 1 0.22s 16.97s 1.09s 45722.791 405459.259
4 3 32 1 4.67s 43.56s 2.06s 16301.136 292609.111
4 3 60 1 21.01s 99.12s 4.00s 6546.181 193120.292
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
8 3 1 1 0.28s 16.77s 17.00s 92196.014 92248.241
8 3 2 1 0.36s 18.64s 10.02s 82747.475 154108.957
8 3 4 1 0.36s 19.39s 6.00s 79598.381 261456.810
8 3 8 1 0.31s 20.96s 4.00s 73898.891 392375.888
8 3 16 1 0.37s 34.28s 3.07s 45385.042 416385.059
8 3 32 1 8.75s 88.48s 5.01s 16175.737 303964.057
8 3 60 1 34.85s 213.80s 7.03s 6325.563 213671.451
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
16 3 1 1 0.63s 40.19s 40.08s 77040.752 77044.800
16 3 2 1 0.58s 38.34s 20.08s 80800.575 150916.421
16 3 4 1 0.71s 38.66s 11.09s 79873.248 264287.489
16 3 8 1 0.64s 41.91s 7.08s 73905.836 399511.701
16 3 16 1 0.64s 67.46s 7.03s 46187.350 430267.445
16 3 32 1 8.12s 171.97s 9.04s 17466.563 333665.446
16 3 60 1 28.56s 483.76s 13.03s 6140.067 235670.414
2. 2.6.13-rc1-mm1-hugh
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
1 3 1 1 0.02s 2.12s 2.01s 91530.726 91290.645
1 3 2 1 0.03s 2.40s 1.03s 80842.105 148577.211
1 3 4 1 0.04s 2.50s 0.07s 77161.695 246742.307
1 3 8 1 0.06s 2.74s 0.05s 69917.496 333526.774
1 3 16 1 0.04s 4.38s 0.05s 44321.010 329911.942
1 3 32 1 2.70s 11.10s 0.09s 14242.828 208238.299
1 3 60 1 10.39s 25.69s 1.06s 5448.016 122221.286
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
4 3 1 1 0.20s 8.24s 8.04s 93090.909 93102.689
4 3 2 1 0.17s 9.40s 5.01s 82125.313 152597.761
4 3 4 1 0.18s 9.76s 3.00s 78990.759 256940.071
4 3 8 1 0.14s 10.58s 2.00s 73361.194 383839.118
4 3 16 1 0.18s 17.34s 1.09s 44887.671 400584.413
4 3 32 1 3.03s 44.14s 2.06s 16670.171 296955.678
4 3 60 1 42.77s 124.64s 4.07s 4697.360 164568.728
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
8 3 1 1 0.41s 16.72s 17.01s 91787.115 91811.036
8 3 2 1 0.32s 18.75s 10.02s 82469.799 153767.106
8 3 4 1 0.31s 19.49s 6.00s 79405.493 260233.329
8 3 8 1 0.34s 21.00s 4.00s 73691.154 390162.630
8 3 16 1 0.33s 33.82s 3.07s 46054.814 420596.124
8 3 32 1 6.98s 87.06s 4.09s 16724.767 315990.572
8 3 60 1 39.50s 252.83s 7.06s 5380.182 204361.061
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
16 3 1 1 0.62s 40.28s 40.09s 76897.624 76894.371
16 3 2 1 0.73s 38.20s 20.08s 80775.678 150731.248
16 3 4 1 0.62s 38.86s 11.09s 79670.955 263253.128
16 3 8 1 0.67s 41.89s 7.09s 73891.948 398115.325
16 3 16 1 0.67s 68.00s 7.04s 45802.679 424756.786
16 3 32 1 8.13s 170.75s 9.06s 17584.902 325968.378
16 3 60 1 19.06s 443.08s 12.08s 6806.696 244372.117
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
1 3 1 1 0.04s 2.08s 2.01s 92390.977 92501.417
1 3 2 1 0.04s 2.38s 1.03s 81108.911 149261.811
1 3 4 1 0.04s 2.48s 0.07s 77895.404 245772.564
1 3 8 1 0.04s 2.71s 0.05s 71338.171 339935.008
1 3 16 1 0.08s 4.41s 0.06s 43690.667 321102.290
1 3 32 1 6.30s 10.29s 1.01s 11843.855 176712.623
1 3 60 1 31.78s 24.45s 2.00s 3496.372 95318.257
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
4 3 1 1 0.16s 8.23s 8.03s 93622.857 93678.202
4 3 2 1 0.17s 9.40s 5.01s 82125.313 152957.631
4 3 4 1 0.13s 9.81s 3.00s 79022.508 256991.607
4 3 8 1 0.16s 10.59s 2.00s 73142.857 383723.246
4 3 16 1 0.16s 17.08s 1.09s 45616.705 404165.286
4 3 32 1 4.28s 43.45s 2.07s 16471.850 283376.758
4 3 60 1 55.40s 115.75s 5.00s 4594.718 156683.131
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
8 3 1 1 0.32s 16.76s 17.00s 92044.944 92058.036
8 3 2 1 0.29s 18.68s 10.01s 82887.015 154397.006
8 3 4 1 0.33s 19.41s 6.00s 79662.885 262009.505
8 3 8 1 0.32s 20.91s 3.09s 74079.879 393444.882
8 3 16 1 0.32s 34.22s 3.07s 45521.649 417768.300
8 3 32 1 3.44s 85.73s 4.08s 17636.959 325891.128
8 3 60 1 56.83s 248.51s 8.02s 5150.986 191074.214
Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec
16 3 1 1 0.62s 40.08s 40.07s 77267.833 77269.273
16 3 2 1 0.67s 38.21s 20.08s 80891.998 151088.966
16 3 4 1 0.69s 38.68s 11.09s 79889.476 264299.054
16 3 8 1 0.65s 41.70s 7.08s 74261.756 400677.914
16 3 16 1 0.68s 68.20s 7.04s 45664.383 423956.953
16 3 32 1 4.05s 172.59s 9.03s 17808.292 338026.854
16 3 60 1 49.84s 458.57s 13.09s 6187.311 224887.539