From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, ak@suse.de, mtk-manpages@gmx.net,
clameter@sgi.com, solo@google.com,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
eric.whitney@hp.com
Subject: [PATCH/RFC 2/5] Mem Policy: Use MPOL_PREFERRED for system-wide default policy
Date: Thu, 30 Aug 2007 14:51:07 -0400 [thread overview]
Message-ID: <20070830185107.22619.43577.sendpatchset@localhost> (raw)
In-Reply-To: <20070830185053.22619.96398.sendpatchset@localhost>
PATCH/RFC 2/5 Use MPOL_PREFERRED for system-wide default policy
Against: 2.6.23-rc3-mm1
V1 -> V2:
+ restore BUG()s in switch(policy) default cases -- per
Christoph
+ eliminate unneeded re-init of struct mempolicy policy member
before freeing
Currently, when one specifies MPOL_DEFAULT via a NUMA memory
policy API [set_mempolicy(), mbind() and internal versions],
the kernel simply installs a NULL struct mempolicy pointer in
the appropriate context: task policy, vma policy, or shared
policy. This causes any use of that policy to "fall back" to
the next most specific policy scope. The only use of MPOL_DEFAULT
to mean "local allocation" is in the system default policy.
There is another, "preferred" way to specify local allocation via
the APIs. That is using the MPOL_PREFERRED policy mode with an
empty nodemask. Internally, the empty nodemask gets converted to
a preferred_node id of '-1'. All internal usage of MPOL_PREFERRED
will convert the '-1' to the id of the node local to the cpu
where the allocation occurs.
System default policy, except during boot, is hard-coded to
"local allocation". By using the MPOL_PREFERRED mode with a
negative value of preferred node for system default policy,
MPOL_DEFAULT will never occur in the 'policy' member of a
struct mempolicy. Thus, we can remove all checks for
MPOL_DEFAULT when converting policy to a node id/zonelist in
the allocation paths.
In slab_node() return local node id when policy pointer is NULL.
No need to set a pol value to take the switch default. Replace
switch default with BUG()--i.e., shouldn't happen.
With this patch MPOL_DEFAULT is only used in the APIs, including
internal calls to do_set_mempolicy() and in the display of policy
in /proc/<pid>/numa_maps. It always means "fall back" to the the
next most specific policy scope. This simplifies the description
of memory policies quite a bit, with no visible change in behavior.
This patch updates Documentation to reflect this change.
Tested with set_mempolicy() using numactl with memtoy, and
tested mbind() with memtoy. All seems to work "as expected".
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Documentation/vm/numa_memory_policy.txt | 70 ++++++++++++--------------------
mm/mempolicy.c | 31 ++++++--------
2 files changed, 41 insertions(+), 60 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-08-29 11:43:06.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-08-29 11:44:03.000000000 -0400
@@ -105,9 +105,13 @@ static struct kmem_cache *sn_cache;
policied. */
enum zone_type policy_zone = 0;
+/*
+ * run-time system-wide default policy => local allocation
+ */
struct mempolicy default_policy = {
.refcnt = ATOMIC_INIT(1), /* never free it */
- .policy = MPOL_DEFAULT,
+ .policy = MPOL_PREFERRED,
+ .v = { .preferred_node = -1 },
};
static void mpol_rebind_policy(struct mempolicy *pol,
@@ -180,7 +184,8 @@ static struct mempolicy *mpol_new(int mo
mode, nodes ? nodes_addr(*nodes)[0] : -1);
if (mode == MPOL_DEFAULT)
- return NULL;
+ return NULL; /* simply delete any existing policy */
+
policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
if (!policy)
return ERR_PTR(-ENOMEM);
@@ -493,8 +498,6 @@ static void get_zonemask(struct mempolic
node_set(zone_to_nid(p->v.zonelist->zones[i]),
*nodes);
break;
- case MPOL_DEFAULT:
- break;
case MPOL_INTERLEAVE:
*nodes = p->v.nodes;
break;
@@ -1106,8 +1109,7 @@ static struct mempolicy * get_vma_policy
if (vma->vm_ops && vma->vm_ops->get_policy) {
pol = vma->vm_ops->get_policy(vma, addr);
shared_pol = 1; /* if pol non-NULL, that is */
- } else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ } else if (vma->vm_policy)
pol = vma->vm_policy;
}
if (!pol)
@@ -1136,7 +1138,6 @@ static struct zonelist *zonelist_policy(
return policy->v.zonelist;
/*FALL THROUGH*/
case MPOL_INTERLEAVE: /* should not happen */
- case MPOL_DEFAULT:
nd = numa_node_id();
break;
default:
@@ -1166,9 +1167,10 @@ static unsigned interleave_nodes(struct
*/
unsigned slab_node(struct mempolicy *policy)
{
- int pol = policy ? policy->policy : MPOL_DEFAULT;
+ if (!policy)
+ return numa_node_id();
- switch (pol) {
+ switch (policy->policy) {
case MPOL_INTERLEAVE:
return interleave_nodes(policy);
@@ -1182,10 +1184,10 @@ unsigned slab_node(struct mempolicy *pol
case MPOL_PREFERRED:
if (policy->v.preferred_node >= 0)
return policy->v.preferred_node;
- /* Fall through */
+ return numa_node_id();
default:
- return numa_node_id();
+ BUG();
}
}
@@ -1410,8 +1412,6 @@ int __mpol_equal(struct mempolicy *a, st
if (a->policy != b->policy)
return 0;
switch (a->policy) {
- case MPOL_DEFAULT:
- return 1;
case MPOL_INTERLEAVE:
return nodes_equal(a->v.nodes, b->v.nodes);
case MPOL_PREFERRED:
@@ -1436,7 +1436,6 @@ void __mpol_free(struct mempolicy *p)
return;
if (p->policy == MPOL_BIND)
kfree(p->v.zonelist);
- p->policy = MPOL_DEFAULT;
kmem_cache_free(policy_cache, p);
}
@@ -1603,7 +1602,7 @@ void mpol_shared_policy_init(struct shar
if (policy != MPOL_DEFAULT) {
struct mempolicy *newpol;
- /* Falls back to MPOL_DEFAULT on any error */
+ /* Falls back to NULL policy [MPOL_DEFAULT] on any error */
newpol = mpol_new(policy, policy_nodes);
if (!IS_ERR(newpol)) {
/* Create pseudo-vma that contains just the policy */
@@ -1724,8 +1723,6 @@ static void mpol_rebind_policy(struct me
return;
switch (pol->policy) {
- case MPOL_DEFAULT:
- break;
case MPOL_INTERLEAVE:
nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
pol->v.nodes = tmp;
Index: Linux/Documentation/vm/numa_memory_policy.txt
===================================================================
--- Linux.orig/Documentation/vm/numa_memory_policy.txt 2007-08-29 11:23:56.000000000 -0400
+++ Linux/Documentation/vm/numa_memory_policy.txt 2007-08-29 11:43:10.000000000 -0400
@@ -149,63 +149,47 @@ Components of Memory Policies
Linux memory policy supports the following 4 behavioral modes:
- Default Mode--MPOL_DEFAULT: The behavior specified by this mode is
- context or scope dependent.
+ Default Mode--MPOL_DEFAULT: This mode is only used in the memory
+ policy APIs. Internally, MPOL_DEFAULT is converted to the NULL
+ memory policy in all policy scopes. Any existing non-default policy
+ will simply be removed when MPOL_DEFAULT is specified. As a result,
+ MPOL_DEFAULT means "fall back to the next most specific policy scope."
+
+ For example, a NULL or default task policy will fall back to the
+ system default policy. A NULL or default vma policy will fall
+ back to the task policy.
- As mentioned in the Policy Scope section above, during normal
- system operation, the System Default Policy is hard coded to
- contain the Default mode.
-
- In this context, default mode means "local" allocation--that is
- attempt to allocate the page from the node associated with the cpu
- where the fault occurs. If the "local" node has no memory, or the
- node's memory can be exhausted [no free pages available], local
- allocation will "fallback to"--attempt to allocate pages from--
- "nearby" nodes, in order of increasing "distance".
-
- Implementation detail -- subject to change: "Fallback" uses
- a per node list of sibling nodes--called zonelists--built at
- boot time, or when nodes or memory are added or removed from
- the system [memory hotplug]. These per node zonelist are
- constructed with nodes in order of increasing distance based
- on information provided by the platform firmware.
-
- When a task/process policy or a shared policy contains the Default
- mode, this also means "local allocation", as described above.
-
- In the context of a VMA, Default mode means "fall back to task
- policy"--which may or may not specify Default mode. Thus, Default
- mode can not be counted on to mean local allocation when used
- on a non-shared region of the address space. However, see
- MPOL_PREFERRED below.
-
- The Default mode does not use the optional set of nodes.
+ When specified in one of the memory policy APIs, the Default mode
+ does not use the optional set of nodes.
MPOL_BIND: This mode specifies that memory must come from the
set of nodes specified by the policy.
The memory policy APIs do not specify an order in which the nodes
- will be searched. However, unlike "local allocation", the Bind
- policy does not consider the distance between the nodes. Rather,
- allocations will fallback to the nodes specified by the policy in
- order of numeric node id. Like everything in Linux, this is subject
- to change.
+ will be searched. However, unlike "local allocation" discussed
+ below, the Bind policy does not consider the distance between the
+ nodes. Rather, allocations will fallback to the nodes specified
+ by the policy in order of numeric node id. Like everything in
+ Linux, this is subject to change.
MPOL_PREFERRED: This mode specifies that the allocation should be
attempted from the single node specified in the policy. If that
- allocation fails, the kernel will search other nodes, exactly as
- it would for a local allocation that started at the preferred node
- in increasing distance from the preferred node. "Local" allocation
- policy can be viewed as a Preferred policy that starts at the node
- containing the cpu where the allocation takes place.
+ allocation fails, the kernel will search other nodes, in order of
+ increasing distance from the preferred node based on information
+ provided by the platform firmware.
Internally, the Preferred policy uses a single node--the
preferred_node member of struct mempolicy. A "distinguished
value of this preferred_node, currently '-1', is interpreted
as "the node containing the cpu where the allocation takes
- place"--local allocation. This is the way to specify
- local allocation for a specific range of addresses--i.e. for
- VMA policies.
+ place"--local allocation. "Local" allocation policy can be
+ viewed as a Preferred policy that starts at the node containing
+ the cpu where the allocation takes place.
+
+ As mentioned in the Policy Scope section above, during normal
+ system operation, the System Default Policy is hard coded to
+ specify "local allocation". This policy uses the Preferred
+ policy with the special negative value of preferred_node.
MPOL_INTERLEAVED: This mode specifies that page allocations be
interleaved, on a page granularity, across the nodes specified in
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-08-30 18:51 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-30 18:50 [PATCH/RFC 0/5] Memory Policy Cleanups and Enhancements Lee Schermerhorn
2007-08-30 18:51 ` [PATCH/RFC 1/5] Mem Policy: fix reference counting Lee Schermerhorn
2007-09-11 18:48 ` Mel Gorman
2007-09-11 18:12 ` Lee Schermerhorn
2007-09-13 9:45 ` Mel Gorman
2007-08-30 18:51 ` Lee Schermerhorn [this message]
2007-09-11 18:54 ` [PATCH/RFC 2/5] Mem Policy: Use MPOL_PREFERRED for system-wide default policy Mel Gorman
2007-09-11 18:22 ` Lee Schermerhorn
2007-09-13 9:48 ` Mel Gorman
2007-08-30 18:51 ` [PATCH/RFC 3/5] Mem Policy: MPOL_PREFERRED fixups for "local allocation" Lee Schermerhorn
2007-09-11 18:58 ` Mel Gorman
2007-09-11 18:34 ` Lee Schermerhorn
2007-09-12 22:10 ` Christoph Lameter
2007-09-13 13:51 ` Lee Schermerhorn
2007-09-13 18:18 ` Christoph Lameter
2007-09-13 9:55 ` Mel Gorman
2007-09-12 22:06 ` Christoph Lameter
2007-09-13 13:35 ` Lee Schermerhorn
2007-09-13 18:21 ` Christoph Lameter
2007-08-30 18:51 ` [PATCH/RFC 4/5] Mem Policy: cpuset-independent interleave policy Lee Schermerhorn
2007-09-12 21:20 ` Ethan Solomita
2007-09-12 22:14 ` Christoph Lameter
2007-09-13 13:26 ` Lee Schermerhorn
2007-09-13 17:17 ` Ethan Solomita
2007-09-12 21:59 ` Ethan Solomita
2007-09-13 13:32 ` Lee Schermerhorn
2007-09-13 17:19 ` Ethan Solomita
2007-09-13 18:20 ` Christoph Lameter
2007-10-09 6:15 ` Ethan Solomita
2007-10-09 13:39 ` Lee Schermerhorn
2007-10-09 18:49 ` Christoph Lameter
2007-10-09 19:02 ` Lee Schermerhorn
2007-08-30 18:51 ` [PATCH/RFC 5/5] Mem Policy: add MPOL_F_MEMS_ALLOWED get_mempolicy() flag Lee Schermerhorn
2007-09-11 19:07 ` Mel Gorman
2007-09-11 18:42 ` Lee Schermerhorn
2007-09-12 22:14 ` Christoph Lameter
2007-09-14 20:24 ` [PATCH] " Lee Schermerhorn
2007-09-14 20:27 ` Christoph Lameter
2007-09-11 16:20 ` [PATCH/RFC 0/5] Memory Policy Cleanups and Enhancements Lee Schermerhorn
2007-09-11 19:12 ` Mel Gorman
2007-09-11 18:45 ` Lee Schermerhorn
2007-09-12 22:17 ` Christoph Lameter
2007-09-13 13:57 ` Lee Schermerhorn
2007-09-13 15:31 ` Mel Gorman
2007-09-13 15:01 ` Lee Schermerhorn
2007-09-13 18:55 ` Mel Gorman
2007-09-13 18:19 ` Christoph Lameter
2007-09-13 18:23 ` Mel Gorman
2007-09-13 18:26 ` Christoph Lameter
2007-09-13 21:17 ` Andrew Morton
2007-09-14 2:20 ` Christoph Lameter
2007-09-14 8:53 ` Mel Gorman
2007-09-14 15:06 ` Lee Schermerhorn
2007-09-14 17:46 ` Mel Gorman
2007-09-14 18:41 ` Christoph Lameter
2007-09-16 18:02 ` Mel Gorman
2007-09-17 18:12 ` Christoph Lameter
2007-09-17 18:19 ` Christoph Lameter
2007-09-17 20:14 ` Mel Gorman
2007-09-17 19:16 ` Christoph Lameter
2007-09-17 20:03 ` Mel Gorman
2007-09-14 20:15 ` Lee Schermerhorn
2007-09-16 18:05 ` Mel Gorman
2007-09-16 19:34 ` Andrew Morton
2007-09-16 21:22 ` Mel Gorman
2007-09-17 13:29 ` Lee Schermerhorn
2007-09-17 18:14 ` Christoph Lameter
2007-09-13 15:49 ` Lee Schermerhorn
2007-09-13 18:22 ` Christoph Lameter
2007-09-17 19:00 ` [PATCH] Fix NUMA Memory Policy Reference Counting Lee Schermerhorn
2007-09-17 19:14 ` Christoph Lameter
2007-09-17 19:38 ` Lee Schermerhorn
2007-09-17 19:43 ` Christoph Lameter
2007-09-19 22:03 ` Lee Schermerhorn
2007-09-19 22:23 ` Christoph Lameter
2007-09-18 10:36 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070830185107.22619.43577.sendpatchset@localhost \
--to=lee.schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=eric.whitney@hp.com \
--cc=linux-mm@kvack.org \
--cc=mtk-manpages@gmx.net \
--cc=solo@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox