From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Paul Mundt <lethal@linux-sh.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm <linux-mm@kvack.org>,
Christoph Lameter <clameter@sgi.com>,
Nishanth Aravamudan <nacc@us.ibm.com>,
kxr@sgi.com, ak@suse.de, akpm@linux-foundation.org,
Eric Whitney <eric.whitney@hp.com>
Subject: Re: [PATCH/RFC] Allow selected nodes to be excluded from MPOL_INTERLEAVE masks
Date: Wed, 01 Aug 2007 09:39:18 -0400 [thread overview]
Message-ID: <1185975558.5059.18.camel@localhost> (raw)
In-Reply-To: <20070801101651.GA9113@linux-sh.org>
On Wed, 2007-08-01 at 19:16 +0900, Paul Mundt wrote:
> On Mon, Jul 30, 2007 at 12:13:48PM -0400, Lee Schermerhorn wrote:
> > Rationale: some architectures and platforms include nodes with
> > memory that, in some cases, should never appear in MPOL_INTERLEAVE
> > node masks. For example, the 'sh' architecture contains a small
> > amount of SRAM that is local to each cpu. In some applications,
> > this memory should be reserved for explicit usage. Another example
> > is the pseudo-node on HP ia64 platforms that is already interleaved
> > on a cache-line granularity by hardware. Again, in some cases, we
> > want to reserve this for explicit usage, as it has bandwidth and
> > [average] latency characteristics quite different from the "real"
> > nodes.
> >
> Well, it's not so much the interleave that's the problem so much as
> _when_ we interleave. The problem with the interleave node mask at system
> init is that the kernel attempts to spread out data structures across
> these nodes, which results in us being completely out of memory by the
> time we get to userspace. After we've booted, supporting MPOL_INTERLEAVE
> is not so much of a problem, applications just have to be careful with
> their allocations.
>
> The main thing is keeping the kernel away from these nodes unless it's
> been specifically asked to fetch some memory from there. Every page does
> count.
>
> The real problem is how we want to deal with the node avoidance mask. In
> SLOB things presently work quite well in this regard, Christoph's
> slub_nodes= patch did a similar thing:
>
> http://marc.info/?l=linux-mm&m=118127465421877&w=2
> http://marc.info/?l=linux-mm&m=118127688911359&w=2
>
> > Note that allocation of fresh hugepages in response to increases
> > in /proc/sys/vm/nr_hugepages is a form of interleaving. I would
> > like to propose that allocate_fresh_huge_page() use the
> > N_INTERLEAVE state as well as MPOL_INTERLEAVE. Then, one can
> > explicity allocate hugepages on the excluded nodes, when needed,
> > using Nish Aravamundan's per node huge page sysfs attribute.
> > NOT in this patch.
> >
> If we can differentiate between MPOL_INTERLEAVE from the kernel's point
> of view, and explicit MPOL_INTERLEAVE specifiers via mbind() from
> userspace, that works fine for my case. However, the mpol_new() changes
> in this patch deny small nodes the ability to ever be included in an
> MPOL_INTERLEAVE policy, when it's only the kernel policy that I have a
> problem with.
Ah, but it would only "deny small nodes" if you nominate them in the
boot option. I haven't changed your heuristic in numa_policy_init. So,
it will still eliminate small nodes from the boot time interleave
nodemask, independent of whether or not you specify them in the
no_interleave_nodes list.
Or am I missing your point?
>
> Having said that, I do like the node states and using that to exclude a
> node from the system init interleave nodelist, but this still won't
> completely solve the tiny node problems.
Right, so we should keep your boot time heuristic.
>
> > @@ -184,7 +184,7 @@ static struct mempolicy *mpol_new(int mo
> > case MPOL_INTERLEAVE:
> > policy->v.nodes = *nodes;
> > nodes_and(policy->v.nodes, policy->v.nodes,
> > - node_states[N_MEMORY]);
> > + node_states[N_INTERLEAVE]);
> > if (nodes_weight(policy->v.nodes) == 0) {
> > kmem_cache_free(policy_cache, policy);
> > return ERR_PTR(-EINVAL);
>
> Leaving this as node_states[N_MEMORY] combined with the rest of the patch
> would work for me, but that sort of changes the scope of the entire patch
> ;-)
Yeah, it breaks one of my main reasons for proposing this. I still have
no way to keep user requested interleaving off my "special" hardware
interleaved nodes in the case where we don't want this. I should
mention that I'm assuming that the current "best practice" is to
interleave across "all available nodes" in the applications current
context.
[more follow up to later messages]
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-08-01 13:39 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-27 20:07 Lee Schermerhorn
2007-07-28 6:19 ` KAMEZAWA Hiroyuki
2007-07-30 16:13 ` Lee Schermerhorn
2007-07-30 18:29 ` Christoph Lameter
2007-07-30 20:32 ` Lee Schermerhorn
2007-07-30 21:57 ` Christoph Lameter
2007-08-01 10:16 ` Paul Mundt
2007-08-01 10:33 ` Andi Kleen
2007-08-01 11:01 ` Paul Mundt
2007-08-01 11:07 ` Andi Kleen
2007-08-01 11:21 ` Paul Mundt
2007-08-01 13:54 ` Lee Schermerhorn
2007-08-02 17:38 ` Mark Gross
2007-08-02 18:46 ` Lee Schermerhorn
2007-08-06 16:42 ` Mark Gross
2007-08-01 13:39 ` Lee Schermerhorn [this message]
2007-08-03 7:53 ` Paul Mundt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1185975558.5059.18.camel@localhost \
--to=lee.schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=eric.whitney@hp.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kxr@sgi.com \
--cc=lethal@linux-sh.org \
--cc=linux-mm@kvack.org \
--cc=nacc@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox