From: Christoph Lameter <clameter@engr.sgi.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>,
torvalds@osdl.org, piggin@yahoo.com.au, linux-mm@kvack.org
Subject: Re: pagefault scalability patches
Date: Thu, 18 Aug 2005 18:33:14 -0700 (PDT) [thread overview]
Message-ID: <Pine.LNX.4.62.0508181822520.2740@schroedinger.engr.sgi.com> (raw)
In-Reply-To: <Pine.LNX.4.61.0508182116110.11409@goblin.wat.veritas.com>
On Thu, 18 Aug 2005, Hugh Dickins wrote:
> There's a lot about atomic pte ops in this thread, but it's a pte
> cmpxchg which do_anonymous_page has to do - if I remember PaulMcK's
> bogroll rightly, cmpxchgs are extra bad news.
Same badness as spin_lock yes but they serialize for extremely small time
periods. So they are better than spinlock.
> Christoph and Nick are keen to go further, deeper into the atomics
> and cmpxchgs, away from the page table lock. Is that sensible when
> we have batch operations like zap_pte_range and copy_pte_range?
I did a batch faulting scheme last year too. See
http://marc.theaimsgroup.com/?l=linux-kernel&m=110488578521535&w=2
> How many architectures have been converted to ATOMIC_TABLE_OPS
> (could we call that ATOMIC_PAGE_TABLE_OPS?): just ia64, x86_64
> and i386. i386 being a joke, since it's only the non-PAE case
> which is converted, yet surely anyone getting into a serious
> number of cpus on i386 will be using PAE?
Right. This is just a start. If it would be in the kernel then other
people will do the work as I have heard repeatedly. Chicken-Egg.
> I may well be to blame for this. Perhaps my hostility has
> discouraged others from doing the work to add to what's there.
> Certainly it was me who advised Christoph to drop the i386 PAE
> support he originally had, since it was too ugly and buggy.
PAE support can be added within the framework provided by these
patches.
> And it was probably my resistance to the per-task rss patch which
> has led him to hold that back for now. I think wisely, that is a
> separate issue. But from what Linus says, it does rather look like
> we can't sensibly go forward with anonymous pte cmpxchging, without
> a matching rss solution.
I am working on getting the bit rot out of my old patches. This is going
to take a few days.
> matter. (There were three places in rmap.c which avoided rss 0 mms,
> but that was a historic necessity: I've deleted those checks from the
> rmap.c waiting in -mm.) Can't we just let them be racy?
Its great that these are gone. I just tried to find them and was happy to
discover they were already gone.
> With the page table lock moved inward, we can then easily choose to
> use a per-pagetable lock, to handle the page fault scalability issue
> without departing far from our existing locking conventions. Indeed,
> I have a working prototype for that, but I don't have equipment to test
> scalability on SGI's scale, and on my 2*HT*Xeons the best results are
> coming from just narrowing the page table lock, not from splitting it.
I have tried that last year too. I thought you helped me see the light on
the futility of that approach?
> I find proceeding in this way easier to understand, and would myself
> prefer Christoph's patches removed from -mm, so we can build the
> narrower page_table_lock solution there, then see what works best
> as a scalability solution on top - per-pagetable locking, or pte
> cmpxchging. But we all find our own ways easier to understand.
Oh no. We have been there before and I fear that if this gets removed then there
will be no progress for a long time like before. Please at least leave the
first patch in which provides an infrastructure for atomic pte operations
that may then be deployed in a variety of ways and be useful for
approaches that Hugh or Nick may come up with.
> You might like me to post my patch for testing (not for merging into
> any tree at this stage): please give me a couple of days to jiggle
> around with it first.
I'd be interested to see if you can really come up with anything that we
have not tried yet.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2005-08-19 1:33 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-08-17 22:17 Andrew Morton
2005-08-17 22:19 ` Christoph Lameter
2005-08-17 22:36 ` Linus Torvalds
2005-08-17 22:51 ` Christoph Lameter
2005-08-17 23:01 ` Linus Torvalds
2005-08-17 23:12 ` Christoph Lameter
2005-08-17 23:23 ` Linus Torvalds
2005-08-17 23:31 ` Christoph Lameter
2005-08-17 23:30 ` Andrew Morton
2005-08-17 23:33 ` Christoph Lameter
2005-08-17 23:44 ` Andrew Morton
2005-08-17 23:52 ` Peter Chubb
2005-08-17 23:58 ` Christoph Lameter
2005-08-18 0:47 ` Andrew Morton
2005-08-18 16:09 ` Christoph Lameter
2005-08-22 2:13 ` Benjamin Herrenschmidt
2005-08-18 0:43 ` Andrew Morton
2005-08-18 16:04 ` Christoph Lameter
2005-08-18 20:16 ` Hugh Dickins
2005-08-19 1:22 ` [PATCH] use mm_counter macros for nr_pte since its also under ptl Christoph Lameter
2005-08-19 3:17 ` Andrew Morton
2005-08-19 3:51 ` Christoph Lameter
2005-08-19 1:33 ` Christoph Lameter [this message]
2005-08-19 3:53 ` [RFC] Concept for delayed counter updates in mm_struct Christoph Lameter
2005-08-19 4:29 ` Andrew Morton
2005-08-19 4:34 ` Andi Kleen
2005-08-19 4:49 ` Linus Torvalds
2005-08-19 16:06 ` Christoph Lameter
2005-08-20 7:33 ` [PATCH] mm_struct counter deltas in task_struct Christoph Lameter
2005-08-20 7:35 ` [PATCH] Use deltas to replace atomic inc Christoph Lameter
2005-08-20 7:58 ` Andrew Morton
2005-08-22 3:32 ` Christoph Lameter
2005-08-22 3:48 ` Linus Torvalds
2005-08-22 4:06 ` Christoph Lameter
2005-08-22 4:18 ` Linus Torvalds
2005-08-22 13:23 ` Christoph Lameter
2005-08-22 14:22 ` Hugh Dickins
2005-08-22 15:24 ` Christoph Lameter
2005-08-22 15:43 ` Andi Kleen
2005-08-22 16:24 ` Christoph Lameter
2005-08-22 20:30 ` [PATCH] mm_struct counter deltas V2 Christoph Lameter
2005-08-22 20:31 ` [PATCH] Use deltas to replace atomic inc V2 Christoph Lameter
2005-08-22 2:09 ` pagefault scalability patches Benjamin Herrenschmidt
2005-08-18 2:00 ` Nick Piggin
2005-08-18 8:38 ` Nick Piggin
2005-08-18 16:17 ` Christoph Lameter
2005-08-22 2:04 ` Benjamin Herrenschmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.62.0508181822520.2740@schroedinger.engr.sgi.com \
--to=clameter@engr.sgi.com \
--cc=akpm@osdl.org \
--cc=hugh@veritas.com \
--cc=linux-mm@kvack.org \
--cc=piggin@yahoo.com.au \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox