Re: [RFC] another way to speed up fake numa node page_alloc


From: Paul Jackson <pj@sgi.com>
To: David Rientjes <rientjes@cs.washington.edu>
Cc: linux-mm@kvack.org, akpm@osdl.org, nickpiggin@yahoo.com.au,
	ak@suse.de, mbligh@google.com, rohitseth@google.com,
	menage@google.com, clameter@sgi.com
Subject: Re: [RFC] another way to speed up fake numa node page_alloc
Date: Wed, 4 Oct 2006 19:27:14 -0700
Message-ID: <20061004192714.20412e08.pj@sgi.com>
In-Reply-To: <Pine.LNX.4.64N.0610041456480.19080@attu2.cs.washington.edu>

> Isn't this the exact behavior that ordered zonelists are supposed to solve 
> for real NUMA systems?  Has there been an _observed_ case where the cost 
> to scan the zonelists was considered excessive on real NUMA systems?

Well ... the good news is I understood your comments this time.

I guess I should be happy it only took about 3 iterations.

Historically the ordered zonelists addressed the situation where one
almost always found free memory near the front of the ordered zonelist.

Yes, you are correct that I originally didn't think we had a problem
with real numa zonelist scans.

Three days ago, when I introduced this alternative patch that started
this current thread, I changed my position, stating at that time:
>
> There are two reasons I pursued this alternative:
> 
>  1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)
>     have seen real customer loads where the cost to scan the zonelist
>     was a problem, due to many nodes being full of memory before
>     we got to a node we could use.  Or at least, I think we have.
>     This was related to me by another engineer, based on experiences
>     from some time past.  So this is not guaranteed.  Most likely, though.
> 
>     The following approach should help such real numa systems just as
>     much as it helps fake numa systems, or any combination thereof.
>     
>  2) The effort to distinguish fake from real numa, using node_distance,
>     so that we could cache a fake numa node and optimize choosing
>     it over equivalent distance fake nodes, while continuing to
>     properly scan all real nodes in distance order, was going to
>     require a nasty blob of zonelist and node distance munging.
> 
>     The following approach has no new dependency on node distances or
>     zone sorting.

David wrote:
> I was under the impression that there was nothing wrong with the way 
> current real NUMA systems allocate pages.  If not, please point me to the 
> thread that _specifically_ discusses this with _data_ that shows it's 
> inefficient.

See above.  I don't have data, so cannot justify going far out of our
way.

If someone has a better way to skin this fake numa cat, that does not
benefit (or harm) real numa, that would still be worth careful
consideration.

> In fact, when this thread started you recommended as little 
> changes as possible to the code to not interfere with what already works.  

Yes, I did start with that recommendation.  See above.

And see above for my current reasons for pursuing this patch.

Some more things I like about this patch:
 * Conceptually, it is very localized, making no changes to the
   larger code or data structure, just adding a cache of some
   hot data.
 * Further, it makes few assumptions about the larger scheme of
   things.
 * It has no dependencies on zonelist sorting, node distances,
   fake vs real numa nodes or any of that.
 * It makes no discernible difference in the memory placement
   behaviour of a system.

Downside - it's still a linear zonelist scan, and it's a cache bolted on
the side of things, rather than an inherently fast algorithm and data
structure.
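
To make the "cache bolted on the side" idea concrete, here is a
minimal user-space sketch.  It is not the patch itself.  The names
(struct zonelist, full_mask, last_zap, alloc_page_from), the 64 zone
limit and the one second timeout are all assumptions made up for
illustration.  The scan stays linear and in zonelist order; the cache
only remembers which zones were recently seen full so they can be
skipped cheaply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ZONES      64   /* hypothetical limit, fits one 64-bit mask */
#define FULL_CACHE_TTL 1    /* hypothetical: seconds before the cache is zapped */

struct zone {
	long free_pages;
};

struct zonelist {
	struct zone *zones[MAX_ZONES];  /* ordered by allocation preference */
	size_t nr_zones;
	uint64_t full_mask;             /* bit i set: zones[i] recently seen full */
	long last_zap;                  /* time (seconds) the mask was last cleared */
};

/*
 * Take one page from the first usable zone in zonelist order.
 * Returns the index of the zone used, or -1 if every zone is full.
 */
int alloc_page_from(struct zonelist *zl, long now)
{
	assert(zl->nr_zones <= MAX_ZONES);

	/* Periodically forget what we learned, so freed memory is noticed. */
	if (now - zl->last_zap >= FULL_CACHE_TTL) {
		zl->full_mask = 0;
		zl->last_zap = now;
	}

	/* Pass 0 trusts the cache; pass 1 ignores it before giving up. */
	for (int pass = 0; pass < 2; pass++) {
		bool skipped = false;

		for (size_t i = 0; i < zl->nr_zones; i++) {
			uint64_t bit = UINT64_C(1) << i;

			if (pass == 0 && (zl->full_mask & bit)) {
				skipped = true;         /* cached as full, don't touch it */
				continue;
			}
			if (zl->zones[i]->free_pages > 0) {
				zl->zones[i]->free_pages--;
				zl->full_mask &= ~bit;  /* it evidently is not full */
				return (int)i;
			}
			zl->full_mask |= bit;           /* remember this zone was full */
		}
		if (!skipped)
			break;  /* pass 0 already looked at every zone */
	}
	return -1;
}
```

In this sketch a caller builds a zonelist in preference order and calls
alloc_page_from(&zl, now) for each page.  The second pass is there so
that a stale cache entry can delay, but never cause, an allocation
failure.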

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


