From: Paul Jackson <pj@sgi.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Lee.Schermerhorn@hp.com, linux-mm@kvack.org,
akpm@linux-foundation.org, ak@suse.de
Subject: Re: [PATCH] Document Linux Memory Policy
Date: Thu, 31 May 2007 12:25:44 -0700
Message-ID: <20070531122544.fd561de4.pj@sgi.com>
In-Reply-To: <Pine.LNX.4.64.0705301042320.1195@schroedinger.engr.sgi.com>
> They have to since they may be used to change page locations when policies
> are active. There is a libcpuset library that can be used for application
> control of cpusets. I think Paul would disagree with you here.
In the most common usage, a batch scheduler uses cpusets to control
a job's CPU and memory placement, and application code within the job
uses the memory policy calls (mbind, set_mempolicy) and scheduler policy
call (sched_setaffinity) to manage its detailed placement.
In particular, the memory policy calls can only be applied to the
current task, so any larger scope control has to be done by cpusets.
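As an illustration of the application side of that split, here is a
minimal sketch of a task placing itself with sched_setaffinity. The
helper names pin_self_to_cpu and first_allowed_cpu are made up for this
example and belong to no library. The CPU numbers are system-wide ones,
and the kernel will refuse a CPU outside the task's cpuset.

```c
#define _GNU_SOURCE
#include <assert.h>   /* only for callers that want to assert on results */
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling task to a single CPU.
 * A pid of 0 means "the calling task".
 * Returns 0 on success, -1 with errno set on failure.
 */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);
}

/*
 * Hypothetical helper: return the lowest-numbered CPU the calling task
 * is currently allowed to run on, or -1 on error.
 */
int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}
```

A thread in a job would typically call first_allowed_cpu (or pick some
other allowed CPU) and then pin_self_to_cpu on the result. Nothing here
reaches beyond the calling task.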
The cpuset file system, with its traditional file system hierarchy
and permission model, allows as much control as desired to be passed
on to specific applications, and over time, I expect this to happen
more.
However, there will always be a different focus here.
The primary purpose of the memory and scheduler policy mechanisms is to
maximize the efficient usage of available resources by a co-operating
set of tasks - get tasks close to their memory and things like that.
The mind set is "we own the machine - how can we best use it." For
example, tightly coupled MPI jobs will need to place one compute-bound
thread on each processor, ensure that nothing else is actively running
on those processors, and place data close to the task accessing it. The
expectation is that a job's code may have to be modified, perhaps even
radically rewritten with a new algorithm, to optimize processor and
memory usage, as the relative speeds of processor, memory and bus change.
The primary purpose of cpusets is job isolation, ensuring that one job
does not interfere with another, by keeping the jobs on separate cpus
and memory nodes. The mind set is "how can we keep these several jobs
out of each other's hair, minimizing any impact of one job's resource
usage on the runtime of another." The expectation is that jobs must
be controlled externally, without any change to the job's code or even
any expertise in the fine-grained memory or scheduler policy behaviour
of the job.
It may well make sense to document memory policy, for the developers
of large applications that need to use the scheduler or memory policy
routines to manage their multi-threaded, or multiple memory node (NUMA)
placement, -separate- from documenting cpuset placement of jobs on cpus
and memory. It's a quite different audience. In so far as possible,
the cpuset code was designed to enable controlling the placement of
jobs without the developer of those jobs, who might be using the
scheduler and memory placement calls, being aware of cpusets -- it's
just a smaller machine available to their job. Migration should also
be transparent to them -- their machine moved, that's all.
Unfortunately there are a couple of details that leak through:
1) big apps using scheduler and memory policy calls often want to
know how "big" their machine is, which changes under cpusets
from the physical size of the system, and
2) the sched_setaffinity, mbind and set_mempolicy calls take hard
physical CPU and Memory Node numbers, which change under migration
non-transparently.
Therefore I have in libcpuset two kinds of routines:
1) a large powerful set used by heavy weight batch schedulers to
provide sophisticated job placement, and
2) a small simple set used by applications that provide an interface
to sched_setaffinity, mbind and set_mempolicy that is virtualized
to the cpuset, providing cpuset relative CPU and Memory Node
numbering and cpuset relative sizes, safely usable from an
application across a migration to different nodes, without
application awareness.
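To make the idea of cpuset-relative numbering concrete, here is a
minimal sketch. It is not libcpuset's interface; parse_id_list and
rel_to_sys are names invented for this example. It assumes the
application has obtained its cpuset's CPU (or memory node) list as a
string in the "4-7,12" list format that the cpuset file system uses
for its cpus and mems files. Relative number N is then simply the Nth
entry of that list.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: expand a list such as "0-3,8,10-11" into out[],
 * in the order given. Returns the number of entries, or -1 if the list
 * is malformed or needs more than max entries.
 */
int parse_id_list(const char *list, int *out, int max)
{
    int n = 0;

    assert(list != NULL && out != NULL);
    while (*list != '\0' && *list != '\n') {
        char *end;
        long lo = strtol(list, &end, 10);
        long hi = lo;

        if (end == list || lo < 0)
            return -1;
        if (*end == '-') {              /* a range "lo-hi" */
            const char *p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
        }
        if (hi > INT_MAX)
            return -1;
        for (long id = lo; id <= hi; id++) {
            if (n >= max)
                return -1;
            out[n++] = (int)id;
        }
        if (*end == ',')
            end++;
        else if (*end != '\0' && *end != '\n')
            return -1;
        list = end;
    }
    return n;
}

/*
 * Hypothetical helper: map a cpuset-relative number to the system-wide
 * number, or return -1 if rel is outside the cpuset.
 */
int rel_to_sys(const int *ids, int count, int rel)
{
    return (rel >= 0 && rel < count) ? ids[rel] : -1;
}
```

With the list "4-7,12", the cpuset-relative size is 5 and relative
CPU 0 is system CPU 4. If the job is migrated to "8-11,16", relative
CPU 0 becomes system CPU 8, and an application that only ever deals in
relative numbers does not have to change.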
The Linux 2.4 kernel based libcpuset on oss.sgi.com is
ancient and not relevant here. The cpuset mechanism in
Linux 2.6 is a complete redesign from SGI's cpumemset mechanism
for Linux 2.4 kernels.
SGI releases libcpuset under the GPL, though currently I've just
set this up for customers of SGI's software. Someday I hope to get
the current libcpuset up on oss.sgi.com, for all to use.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401