From: Paul Mundt <lethal@linux-sh.org>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm <linux-mm@kvack.org>,
Christoph Lameter <clameter@sgi.com>,
Nishanth Aravamudan <nacc@us.ibm.com>,
kxr@sgi.com, ak@suse.de, akpm@linux-foundation.org,
Eric Whitney <eric.whitney@hp.com>
Subject: Re: [PATCH/RFC] Allow selected nodes to be excluded from MPOL_INTERLEAVE masks
Date: Wed, 1 Aug 2007 19:16:51 +0900
Message-ID: <20070801101651.GA9113@linux-sh.org>
In-Reply-To: <1185812028.5492.79.camel@localhost>
On Mon, Jul 30, 2007 at 12:13:48PM -0400, Lee Schermerhorn wrote:
> Rationale: some architectures and platforms include nodes with
> memory that, in some cases, should never appear in MPOL_INTERLEAVE
> node masks. For example, the 'sh' architecture contains a small
> amount of SRAM that is local to each cpu. In some applications,
> this memory should be reserved for explicit usage. Another example
> is the pseudo-node on HP ia64 platforms that is already interleaved
> on a cache-line granularity by hardware. Again, in some cases, we
> want to reserve this for explicit usage, as it has bandwidth and
> [average] latency characteristics quite different from the "real"
> nodes.
>
Well, it's not the interleaving itself that's the problem so much as
_when_ we interleave. The problem with the interleave node mask at system
init is that the kernel attempts to spread its data structures across
these nodes as well, which leaves them completely out of memory by the
time we get to userspace. After we've booted, supporting MPOL_INTERLEAVE
is not much of a problem; applications just have to be careful with
their allocations.
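As a rough illustration of the boot-time problem, the following user-space model (not kernel code; the nodemask_bits type and interleave_boot_allocs() helper are hypothetical) spreads allocations round-robin over the nodes in a mask, skipping nodes that are already full. A tiny node included in the mask gets drained to zero; leaving it out of the mask keeps it untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: bit n set means node n is in the mask. */
typedef unsigned long nodemask_bits;

/*
 * Spread `npages` boot-time allocations round-robin over the nodes in
 * `mask`, as an interleave policy would. free_pages[n] is the number of
 * free pages on node n and is decremented in place; a node that has run
 * dry is skipped. Returns the number of pages that could not be placed.
 */
static size_t interleave_boot_allocs(long free_pages[], int nr_nodes,
                                     nodemask_bits mask, size_t npages)
{
    int node = -1;          /* last node used; -1 means "start at node 0" */
    size_t failed = 0;

    while (npages--) {
        bool placed = false;

        /* Try each node at most once, starting after the last one used. */
        for (int tries = 0; tries < nr_nodes; tries++) {
            node = (node + 1) % nr_nodes;
            if (!(mask & (1UL << node)) || free_pages[node] == 0)
                continue;
            free_pages[node]--;
            placed = true;
            break;
        }
        if (!placed)
            failed++;
    }
    return failed;
}
```

For example, with free pages {1000, 1000, 8} and 300 allocations, mask 0x7 leaves node 2 with 0 free pages, while mask 0x3 leaves its 8 pages intact.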
The main thing is keeping the kernel away from these nodes unless it's
been specifically asked to fetch some memory from there. Every page does
count.
The real problem is how we want to deal with the node avoidance mask.
Things presently work quite well in this regard in SLOB, and Christoph's
slub_nodes= patch did something similar:
http://marc.info/?l=linux-mm&m=118127465421877&w=2
http://marc.info/?l=linux-mm&m=118127688911359&w=2
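The "keep the kernel away unless it's been asked" rule can be sketched as a placement helper that honours an avoidance mask for default allocations but lets explicit node requests through. This is an assumption-laden user-space model, not how SLOB or the slub_nodes= patch actually implement it; pick_alloc_node() and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical: bit n set means node n is in the set. */
typedef unsigned long nodemask_bits;

/*
 * Choose a node for a page allocation.
 *  - explicit_node: the caller specifically asked for `node`; honour it
 *    even if it is in `avoid`, provided it has memory.
 *  - otherwise: start at `node` (e.g. the local node) and return the first
 *    node that has memory and is not in `avoid`.
 * Returns -1 if no suitable node exists.
 */
static int pick_alloc_node(int node, bool explicit_node,
                           nodemask_bits has_memory, nodemask_bits avoid,
                           int nr_nodes)
{
    if (explicit_node)
        return (has_memory & (1UL << node)) ? node : -1;

    for (int i = 0; i < nr_nodes; i++) {
        int n = (node + i) % nr_nodes;
        nodemask_bits bit = 1UL << n;

        if ((has_memory & bit) && !(avoid & bit))
            return n;
    }
    return -1;
}
```

With node 2 in the avoidance mask, a default allocation starting at node 2 falls back to node 0, while an explicit request for node 2 still gets it.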
> Note that allocation of fresh hugepages in response to increases
> in /proc/sys/vm/nr_hugepages is a form of interleaving. I would
> like to propose that allocate_fresh_huge_page() use the
> N_INTERLEAVE state as well as MPOL_INTERLEAVE. Then, one can
> explicitly allocate hugepages on the excluded nodes, when needed,
> using Nish Aravamudan's per node huge page sysfs attribute.
> NOT in this patch.
>
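Lee's hugepage proposal amounts to round-robin node selection restricted to the interleave-eligible nodes. A minimal model, assuming a plain bitmask stands in for the node state (next_interleave_node() is hypothetical and is not the actual hugetlb code):

```c
/* Hypothetical: bit n set means node n is interleave-eligible. */
typedef unsigned long nodemask_bits;

/*
 * Return the next node after `prev` (wrapping around) whose bit is set in
 * `interleave_mask`, or -1 if the mask is empty. Pass prev = -1 to start
 * at node 0. Nodes left out of the mask (e.g. a small SRAM node) are never
 * chosen here; they would only get huge pages via an explicit per-node
 * request.
 */
static int next_interleave_node(int prev, nodemask_bits interleave_mask,
                                int nr_nodes)
{
    for (int i = 1; i <= nr_nodes; i++) {
        int node = (prev + i) % nr_nodes;

        if (interleave_mask & (1UL << node))
            return node;
    }
    return -1;
}
```

Starting from -1 with mask 0x3 on a three-node system, successive calls yield 0, 1, 0, 1, and so on, so node 2 is never picked by the round-robin path.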
If we can differentiate between the kernel's own use of MPOL_INTERLEAVE
and explicit MPOL_INTERLEAVE policies requested from userspace via
mbind(), that works fine for my case. However, the mpol_new() change in
this patch prevents small nodes from ever being included in an
MPOL_INTERLEAVE policy, even though it's only the kernel's policy that I
have a problem with.
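The distinction asked for here could look roughly like the following: kernel-originated interleave policies are clipped to the interleave-eligible nodes, while explicit userspace requests are clipped only to nodes with memory (the N_MEMORY behaviour). The from_userspace flag and effective_interleave_mask() are hypothetical, and plain bitmasks stand in for nodemask_t.

```c
#include <stdbool.h>

/* Hypothetical: bit n set means node n is in the set. */
typedef unsigned long nodemask_bits;

/*
 * Compute the node set an interleave policy would actually use.
 * Userspace (e.g. mbind()) requests are clipped to nodes with memory, so a
 * small node can still be named deliberately; kernel-internal policies are
 * clipped to the interleave-eligible nodes. An empty result corresponds to
 * the case where mpol_new() returns -EINVAL.
 */
static nodemask_bits effective_interleave_mask(nodemask_bits requested,
                                               bool from_userspace,
                                               nodemask_bits n_memory,
                                               nodemask_bits n_interleave)
{
    return requested & (from_userspace ? n_memory : n_interleave);
}
```

With N_MEMORY = 0x7 and an interleave-eligible set of 0x3, a userspace request for node 2 alone (0x4) survives as 0x4, while the same request from the kernel collapses to 0 and would be rejected.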
Having said that, I do like the node states, and using them to exclude a
node from the system init interleave nodelist, but on its own this still
won't completely solve the tiny node problems.
> @@ -184,7 +184,7 @@ static struct mempolicy *mpol_new(int mo
>  	case MPOL_INTERLEAVE:
>  		policy->v.nodes = *nodes;
>  		nodes_and(policy->v.nodes, policy->v.nodes,
> -					node_states[N_MEMORY]);
> +					node_states[N_INTERLEAVE]);
>  		if (nodes_weight(policy->v.nodes) == 0) {
>  			kmem_cache_free(policy_cache, policy);
>  			return ERR_PTR(-EINVAL);
Leaving this as node_states[N_MEMORY] combined with the rest of the patch
would work for me, but that sort of changes the scope of the entire patch
;-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Thread overview: 17+ messages
2007-07-27 20:07 Lee Schermerhorn
2007-07-28 6:19 ` KAMEZAWA Hiroyuki
2007-07-30 16:13 ` Lee Schermerhorn
2007-07-30 18:29 ` Christoph Lameter
2007-07-30 20:32 ` Lee Schermerhorn
2007-07-30 21:57 ` Christoph Lameter
2007-08-01 10:16 ` Paul Mundt [this message]
2007-08-01 10:33 ` Andi Kleen
2007-08-01 11:01 ` Paul Mundt
2007-08-01 11:07 ` Andi Kleen
2007-08-01 11:21 ` Paul Mundt
2007-08-01 13:54 ` Lee Schermerhorn
2007-08-02 17:38 ` Mark Gross
2007-08-02 18:46 ` Lee Schermerhorn
2007-08-06 16:42 ` Mark Gross
2007-08-01 13:39 ` Lee Schermerhorn
2007-08-03 7:53 ` Paul Mundt