From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>, Paul Jackson <pj@sgi.com>,
	akpm@linux-foundation.org, kxr@sgi.com, linux-mm@kvack.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH] Memoryless nodes:  use "node_memory_map" for cpuset mems_allowed validation
Date: Tue, 24 Jul 2007 10:11:03 -0400
Message-ID: <1185286264.5649.23.camel@localhost>
In-Reply-To: <20070723214816.GC6036@us.ibm.com>

On Mon, 2007-07-23 at 14:48 -0700, Nishanth Aravamudan wrote:
> On 23.07.2007 [16:59:52 -0400], Lee Schermerhorn wrote:
> > On Mon, 2007-07-23 at 12:09 -0700, Nishanth Aravamudan wrote: 
> > > On 20.07.2007 [16:49:24 -0400], Lee Schermerhorn wrote:
> > > > This fixes a problem I encountered testing Christoph's memoryless nodes
> > > > series.  Applies atop that series.  Other than this, series holds up
> > > > under what testing I've been able to do this week.
> > > > 
> > > > Memoryless Nodes:  use "node_memory_map" for cpusets mems_allowed validation
> > > > 
> > > > cpusets try to ensure that any node added to a cpuset's 
> > > > mems_allowed is on-line and contains memory.  The assumption
> > > > was that online nodes contained memory.  Thus, it is possible
> > > > to add memoryless nodes to a cpuset and then add tasks to this
> > > > cpuset.  This results in a continuous series of oom-kill and other
> > > > console stack traces and an apparent system hang.
> > > > 
> > > > Change cpusets to use node_states[N_MEMORY] [a.k.a.
> > > > node_memory_map] in place of node_online_map when vetting 
> > > > memories.  Return error if admin attempts to write a non-empty
> > > > mems_allowed node mask containing only memoryless-nodes.
> > > > 
> > > > Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> > > 
> > > Lee, while looking at this change, I think it ends up fixing
> > > cpuset_mems_allowed() to return nodemasks that only include nodes in
> > > node_states[N_MEMORY]. However, cpuset_current_mems_allowed is a
> > > lockless macro which would still be broken. I think it would need to
> > > become a static inline nodes_and() in the CPUSET case and a #define
> > > node_states[N_MEMORY] in the non-CPUSET case?
> > > 
> > > Or perhaps we should adjust cpusets to make it so that the mems_allowed
> > > member only includes nodes that are set in node_states[N_MEMORY]?
> > 
> > 
> > I thought that's what my patch to nodelist_parse() did.  It ensures that
> > current->mems_allowed is correct [contains at least one node with
> > memory, and only nodes with memory] at the time it is installed, but
> > doesn't consider memory hot plug and node off-lining.  Is this
> > [offline/hotplug] your point?
> 
> And every time it is updated, right? (current->mems_allowed).  My concern
> is purely whether I can then directly use cpuset_current_mems_allowed in
> the interleave code for hugetlb.c and it will do the right thing. It
> will work, if the #define is changed for !CPUSETS and if your change
> guarantees current->mems_allowed is always consistent with
> node_states[N_MEMORY].

Other than offlining/hot removal of memory, I think the only place that
current->mems_allowed gets updated is in update_nodelist() [I wrote
nodelist_parse() previously by mistake].  My patch to that function
tries to ensure that current->mems_allowed always contains at least one
node with memory.
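
A minimal userspace sketch of the check described above, not the patch
itself.  Nodes are modeled as bits in a uint64_t instead of a
nodemask_t; the helper name vet_mems_allowed(), the memory_nodes
parameter [standing in for node_states[N_MEMORY]], and the -EINVAL
return value are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* One bit per node; a stand-in for the kernel's nodemask_t. */
typedef uint64_t nodemask;

/*
 * Hypothetical helper: vet a requested mems_allowed mask against the
 * set of nodes that have memory (the role node_states[N_MEMORY] plays).
 * On success, *vetted holds only nodes that have memory.
 */
static int vet_mems_allowed(nodemask requested, nodemask memory_nodes,
			    nodemask *vetted)
{
	nodemask usable = requested & memory_nodes;

	/* A non-empty request naming only memoryless nodes is an error. */
	if (requested != 0 && usable == 0)
		return -EINVAL;	/* error code chosen for illustration */

	*vetted = usable;
	return 0;
}
```

With only node 4 holding memory (memory_nodes == 0x10), a request for
nodes 0-3 is rejected, while a request for nodes 3-4 is trimmed to
node 4.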

If by "gets updated" you're referring to
"cpuset_update_task_memory_state()", the latter calls
"guarantee_online_mems()", which I also patched to use
node_states[N_MEMORY] instead of "node_online_map".  So, I think you can
use current->mems_allowed in the hugetlb code.  Maybe call
"cpuset_update_task_memory_state()" before using it?  However, I think
that will have the effect of escaping the cpuset constraints if all of
the nodes in the current task's mems_allowed have been offlined or hot
removed since this mask was created/updated in update_nodelist().
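
The "escape" effect can be illustrated with a small userspace sketch.
It assumes the fallback works by walking up the cpuset hierarchy until
some ancestor still has a node with memory, and otherwise returning
every node with memory.  The struct cpuset_sketch type, the function
name guarantee_memory_mems(), and the uint64_t masks are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nodemask;	/* one bit per node, stand-in for nodemask_t */

struct cpuset_sketch {
	nodemask mems_allowed;
	const struct cpuset_sketch *parent;	/* NULL for the top cpuset */
};

/*
 * Walk up from cs until some cpuset's mems_allowed intersects the
 * nodes that have memory.  If none does, fall back to every node with
 * memory, which is the point where the cpuset constraint is escaped.
 */
static nodemask guarantee_memory_mems(const struct cpuset_sketch *cs,
				      nodemask memory_nodes)
{
	while (cs != NULL && (cs->mems_allowed & memory_nodes) == 0)
		cs = cs->parent;

	return cs != NULL ? (cs->mems_allowed & memory_nodes) : memory_nodes;
}
```

A child cpuset restricted to nodes 0-1 on a box where only node 4 has
memory ends up with node 4, i.e. memory outside the mask it was given.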

> 
> I think I simply was confused about the full impact of your changes, as
> I don't know cpusets that well. I'm going to try and test a memoryless
> node box I have at work w/ your change, though, and see what happens.

FYI:  I initially tried to test Christoph's memless nodes series with
your rebased hugetlb patches, but the system appeared to hang.  [Might
be related to Ken Chen's recent hugetlb patch?]  I backed off to just
Christoph's series and things seem to run OK.  That's when I noticed
that one could create a cpuset with just memoryless nodes and posted the
subject patch.  I'll get back to testing your patches on my memoryless
nodes system "real soon now".

Meanwhile, as you've pointed out, I missed the "node_online_map" usage
in the header and, I see, in the initialization of the top level cpuset
in cpuset_init_smp().  I'm testing this now.  I'll repost the patch with
these fixes shortly.
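
A rough sketch of the header-side idea, under stated assumptions: when
cpusets are compiled out, the "current mems allowed" macro should
expand to the nodes that have memory rather than to the online map.
The names SKETCH_CPUSETS, sketch_current_mems_allowed, memory_nodes,
and task_mems_allowed are hypothetical stand-ins for the kernel's
config option, macro, node_states[N_MEMORY], and
current->mems_allowed.

```c
#include <stdint.h>

typedef uint64_t nodemask;	/* one bit per node, stand-in for nodemask_t */

/* Stand-ins for kernel state; both are assumptions for this sketch. */
static nodemask memory_nodes = 1u << 4;		/* plays node_states[N_MEMORY] */
static nodemask task_mems_allowed = 1u << 4;	/* plays current->mems_allowed */

#ifdef SKETCH_CPUSETS
/* With cpusets: the task's own mask, already vetted when installed. */
#define sketch_current_mems_allowed (task_mems_allowed)
#else
/* Without cpusets: every node that has memory, not every online node. */
#define sketch_current_mems_allowed (memory_nodes)
#endif
```

Either way, a caller that iterates over the macro's result only ever
sees nodes with memory, provided the task mask was vetted on install.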

For completeness, here's the numactl --hardware output [less the SLIT
info] from my test platform [ia64] in its current config:

available: 5 nodes (0-4)
node 0 size: 0 MB
node 0 free: 0 MB
node 1 size: 0 MB
node 1 free: 0 MB
node 2 size: 0 MB
node 2 free: 0 MB
node 3 size: 0 MB
node 3 free: 0 MB
node 4 size: 8191 MB
node 4 free: 105 MB

Booted with mem=8G to ensure swapping, ...  Free mem is so low because
of the tests I'm running.  It varies between ~40M and ~150M.

> 
> > Seems like that is an issue that exists in the unpatched code as
> > well--i.e., unlike cpuset_mems_allowed(), the lockless, "_current_"
> > version does not vet current->mems_allowed against the
> > node_online_map.  So, all valid nodes in current->mems_allowed could
> > have been off-lined since the mask was installed.  Am I reading this
> > right?
> 
> True -- I honestly don't know. I doubt much of this code has been fully
> audited for full node unplug?

Looks like at least an initial stab has been made...


Lee

