From: Christoph Lameter <clameter@engr.sgi.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>,
	torvalds@osdl.org, piggin@yahoo.com.au, linux-mm@kvack.org
Subject: Re: pagefault scalability patches
Date: Thu, 18 Aug 2005 18:33:14 -0700 (PDT)
Message-ID: <Pine.LNX.4.62.0508181822520.2740@schroedinger.engr.sgi.com>
In-Reply-To: <Pine.LNX.4.61.0508182116110.11409@goblin.wat.veritas.com>

On Thu, 18 Aug 2005, Hugh Dickins wrote:

> There's a lot about atomic pte ops in this thread, but it's a pte
> cmpxchg which do_anonymous_page has to do - if I remember PaulMcK's
> bogroll rightly, cmpxchgs are extra bad news.

Same badness as spin_lock, yes, but they serialize only for extremely
short periods of time, so they are better than a spinlock.
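
To illustrate the point, here is a userspace sketch using C11 atomics. It
is not the code from the patches, and pte_slot_t, PTE_NONE and
pte_install_if_none() are made-up names. The only serialization is the
single compare-and-exchange that fills an empty pte; a task that loses the
race just sees the entry already populated and backs off.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one page table entry. */
typedef _Atomic uint64_t pte_slot_t;

/* Assumed encoding: an all-zero entry means "no page mapped yet". */
#define PTE_NONE ((uint64_t)0)

/*
 * Install new_pte only if the slot is still empty.
 * Returns true if this caller installed the entry, false if another
 * thread populated the slot first (the caller would then drop its page).
 */
static bool pte_install_if_none(pte_slot_t *slot, uint64_t new_pte)
{
    uint64_t expected = PTE_NONE;

    /* The cmpxchg is the whole critical section: no lock is held. */
    return atomic_compare_exchange_strong(slot, &expected, new_pte);
}
```

With a spinlock the same update would hold the lock across the check and
the store, and every other fault in the same mm would wait for it.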

> Christoph and Nick are keen to go further, deeper into the atomics
> and cmpxchgs, away from the page table lock.  Is that sensible when
> we have batch operations like zap_pte_range and copy_pte_range?

I did a batch faulting scheme last year too. See 
http://marc.theaimsgroup.com/?l=linux-kernel&m=110488578521535&w=2

> How many architectures have been converted to ATOMIC_TABLE_OPS
> (could we call that ATOMIC_PAGE_TABLE_OPS?): just ia64, x86_64
> and i386.  i386 being a joke, since it's only the non-PAE case
> which is converted, yet surely anyone getting into a serious
> number of cpus on i386 will be using PAE?

Right. This is just a start. Once it is in the kernel, other people will
do the work, as I have heard repeatedly. Chicken and egg.

> I may well be to blame for this.  Perhaps my hostility has
> discouraged others from doing the work to add to what's there.
> Certainly it was me who advised Christoph to drop the i386 PAE
> support he originally had, since it was too ugly and buggy.

PAE support can be added within the framework provided by these 
patches.

> And it was probably my resistance to the per-task rss patch which
> has led him to hold that back for now.  I think wisely, that is a
> separate issue.  But from what Linus says, it does rather look like
> we can't sensibly go forward with anonymous pte cmpxchging, without
> a matching rss solution.

I am working on getting the bit rot out of my old patches. This is going 
to take a few days.
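
For context, a rough userspace sketch of what a per-task rss scheme could
look like, assuming the idea is that each task accumulates a private delta
and folds it into the shared mm counter only occasionally. All names here
(mm_sketch, task_sketch, task_inc_rss, task_flush_rss) are hypothetical,
and this is not the code of the old patches.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the shared per-mm state. */
struct mm_sketch {
    atomic_long rss;            /* shared counter, touched rarely */
};

/* Hypothetical stand-in for per-task state. */
struct task_sketch {
    struct mm_sketch *mm;
    long rss_delta;             /* private to the task, no atomics needed */
};

/* Fault path: only the task-local delta is updated. */
static void task_inc_rss(struct task_sketch *t)
{
    t->rss_delta++;
}

/* Called at some later point to fold the delta into the mm counter. */
static void task_flush_rss(struct task_sketch *t)
{
    if (t->rss_delta != 0) {
        atomic_fetch_add(&t->mm->rss, t->rss_delta);
        t->rss_delta = 0;
    }
}
```

The shared counter is then only approximately current between flushes,
which is the trade-off such a scheme accepts.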

> matter.  (There were three places in rmap.c which avoided rss 0 mms,
> but that was a historic necessity: I've deleted those checks from the
> rmap.c waiting in -mm.)  Can't we just let them be racy?

It's great that these are gone. I just tried to find them and was happy to
discover that they were already gone.

> With the page table lock moved inward, we can then easily choose to
> use a per-pagetable lock, to handle the page fault scalability issue
> without departing far from our existing locking conventions.  Indeed,
> I have a working prototype for that, but I don't have equipment to test
> scalability on SGI's scale, and on my 2*HT*Xeons the best results are
> coming from just narrowing the page table lock, not from splitting it.

I tried that last year too. I thought you helped me see the light on
the futility of that approach?

> I find proceeding in this way easier to understand, and would myself
> prefer Christoph's patches removed from -mm, so we can build the
> narrower page_table_lock solution there, then see what works best
> as a scalability solution on top - per-pagetable locking, or pte
> cmpxchging.  But we all find our own ways easier to understand.

Oh no. We have been there before, and I fear that if this gets removed
there will be no progress for a long time, like before. Please at least
leave in the first patch, which provides an infrastructure for atomic pte
operations that may then be deployed in a variety of ways and be useful
for approaches that Hugh or Nick may come up with.

> You might like me to post my patch for testing (not for merging into
> any tree at this stage): please give me a couple of days to jiggle
> around with it first.

I'd be interested to see if you can really come up with anything that we 
have not tried yet.
