linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
	akpm@linux-foundation.org, Nishanth Aravamudan <nacc@us.ibm.com>,
	David Rientjes <rientjes@google.com>, Adam Litke <agl@us.ibm.com>,
	Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 4/5] hugetlb:  add per node hstate attributes
Date: Thu, 27 Aug 2009 12:52:10 -0400	[thread overview]
Message-ID: <1251391930.4374.89.camel@useless.americas.hpqcorp.net> (raw)
In-Reply-To: <20090827102338.GC21183@csn.ul.ie>

On Thu, 2009-08-27 at 11:23 +0100, Mel Gorman wrote:
> On Wed, Aug 26, 2009 at 02:04:03PM -0400, Lee Schermerhorn wrote:
> > <SNIP>
> > This revised patch also removes the include of hugetlb.h from node.h.
> > 
> > Lee
> > 
> > ---
> > 
> > PATCH 5/6 hugetlb:  register per node hugepages attributes
> > 
> > Against: 2.6.31-rc6-mmotm-090820-1918
> > 
> > V2:  remove dependency on kobject private bitfield.  Search
> >      global hstates then all per node hstates for kobject
> >      match in attribute show/store functions.
> > 
> > V3:  rebase atop the mempolicy-based hugepage alloc/free;
> >      use custom "nodes_allowed" to restrict alloc/free to
> >      a specific node via per node attributes.  Per node
> >      attribute overrides mempolicy.  I.e., mempolicy only
> >      applies to global attributes.
> > 
> > V4:  Fix issues raised by Mel Gorman:
> >      + add !NUMA versions of hugetlb_[un]register_node()
> >      + rename 'hi' to 'i' in kobj_to_node_hstate()
> >      + rename (count, input) to (len, count) in nr_hugepages_store()
> >      + moved per node hugepages_kobj and hstate_kobjs[] from the
> >        struct node [sysdev] to hugetlb.c private arrays.
> >      + changed registration mechanism so that hugetlbfs [a module]
> >        register its attributes registration callbacks with the node
> >        driver, eliminating the dependency between the node driver
> >        and hugetlbfs.  From it's init func, hugetlbfs will register
> >        all on-line nodes' hugepage sysfs attributes along with
> >        hugetlbfs' attributes register/unregister functions.  The
> >        node driver will use these functions to [un]register nodes
> >        with hugetlbfs on node hot-plug.
> >      + replaced hugetlb.c private "nodes_allowed_from_node()" with
> >        generic "alloc_nodemask_of_node()".
> > 
> > This patch adds the per huge page size control/query attributes
> > to the per node sysdevs:
> > 
> > /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
> > 	nr_hugepages       - r/w
> > 	free_huge_pages    - r/o
> > 	surplus_huge_pages - r/o
> > 
> > The patch attempts to re-use/share as much of the existing
> > global hstate attribute initialization and handling, and the
> > "nodes_allowed" constraint processing as possible.
> > Calling set_max_huge_pages() with no node indicates a change to
> > global hstate parameters.  In this case, any non-default task
> > mempolicy will be used to generate the nodes_allowed mask.  A
> > valid node id indicates an update to that node's hstate 
> > parameters, and the count argument specifies the target count
> > for the specified node.  From this info, we compute the target
> > global count for the hstate and construct a nodes_allowed node
> > mask contain only the specified node.
> > 
> > Setting the node specific nr_hugepages via the per node attribute
> > effectively ignores any task mempolicy or cpuset constraints.
> > 
<snip>
> > Index: linux-2.6.31-rc6-mmotm-090820-1918/drivers/base/node.c
> > ===================================================================
> > --- linux-2.6.31-rc6-mmotm-090820-1918.orig/drivers/base/node.c	2009-08-26 12:37:03.000000000 -0400
> > +++ linux-2.6.31-rc6-mmotm-090820-1918/drivers/base/node.c	2009-08-26 13:01:54.000000000 -0400
> > @@ -177,6 +177,31 @@ static ssize_t node_read_distance(struct
> >  }
> >  static SYSDEV_ATTR(distance, S_IRUGO, node_read_distance, NULL);
> >  
> > +/*
> > + * hugetlbfs per node attributes registration interface
> > + */
> > +NODE_REGISTRATION_FUNC __hugetlb_register_node;
> > +NODE_REGISTRATION_FUNC __hugetlb_unregister_node;
> > +
> > +static inline void hugetlb_register_node(struct node *node)
> > +{
> > +	if (__hugetlb_register_node)
> > +		__hugetlb_register_node(node);
> > +}
> > +
> > +static inline void hugetlb_unregister_node(struct node *node)
> > +{
> > +	if (__hugetlb_unregister_node)
> > +		__hugetlb_unregister_node(node);
> > +}
> > +
> > +void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC doregister,
> > +                                  NODE_REGISTRATION_FUNC unregister)
> > +{
> > +	__hugetlb_register_node   = doregister;
> > +	__hugetlb_unregister_node = unregister;
> > +}
> > +
> >  
> 
> I think I get this. Basically, you want to avoid the functions being
> called too early before sysfs is initialised and still work with hotplug
> later. So early in boot, no registeration happens. sysfs and hugetlbfs
> get initialised and at that point, these hooks become active, all nodes
> registered and hotplug later continues to work.
> 
> Is that accurate? Can it get a comment?

Yes, you got it, and yes, I'll add a comment.  I had explained it in the
patch description [V4], but that's not too useful to someone coming
along later...

<snip>

> > @@ -1253,7 +1255,21 @@ static unsigned long set_max_huge_pages(
> >  	if (h->order >= MAX_ORDER)
> >  		return h->max_huge_pages;
> >  
> > -	nodes_allowed = huge_mpol_nodes_allowed();
> > +	if (nid == NO_NODEID_SPECIFIED)
> > +		nodes_allowed = huge_mpol_nodes_allowed();
> > +	else {
> > +		/*
> > +		 * incoming 'count' is for node 'nid' only, so
> > +		 * adjust count to global, but restrict alloc/free
> > +		 * to the specified node.
> > +		 */
> > +		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
> > +		nodes_allowed = alloc_nodemask_of_node(nid);
> 
> alloc_nodemask_of_node() isn't defined anywhere.


Well, that's because the patch that defines it is in a message that I
meant to send before this one.  I see it's in my Drafts folder.  I'll
attach that patch below.  I'm rebasing against the 0827 mmotm, and I'll
resend the rebased series.  However, I wanted to get your opinion of the
nodemask patch below.

<snip>
> >  
> > +#ifdef CONFIG_NUMA
> > +
> > +struct node_hstate {
> > +	struct kobject		*hugepages_kobj;
> > +	struct kobject		*hstate_kobjs[HUGE_MAX_HSTATE];
> > +};
> > +struct node_hstate node_hstates[MAX_NUMNODES];
> > +
> > +static struct attribute *per_node_hstate_attrs[] = {
> > +	&nr_hugepages_attr.attr,
> > +	&free_hugepages_attr.attr,
> > +	&surplus_hugepages_attr.attr,
> > +	NULL,
> > +};
> > +
> > +static struct attribute_group per_node_hstate_attr_group = {
> > +	.attrs = per_node_hstate_attrs,
> > +};
> > +
> > +static struct hstate *kobj_to_node_hstate(struct kobject *kobj, int *nidp)
> > +{
> > +	int nid;
> > +
> > +	for (nid = 0; nid < nr_node_ids; nid++) {
> > +		struct node_hstate *nhs = &node_hstates[nid];
> > +		int i;
> > +		for (i = 0; i < HUGE_MAX_HSTATE; i++)
> > +			if (nhs->hstate_kobjs[i] == kobj) {
> > +				if (nidp)
> > +					*nidp = nid;
> > +				return &hstates[i];
> > +			}
> > +	}
> > +
> > +	BUG();
> > +	return NULL;
> > +}
> 
> Ok, this looks nicer in that the dependencies between hugetlbfs and base
> node support are going the right direction.

Agreed.  I removed that "issue" from the patch description.

<snip>
> > Index: linux-2.6.31-rc6-mmotm-090820-1918/include/linux/node.h
> > ===================================================================
> > --- linux-2.6.31-rc6-mmotm-090820-1918.orig/include/linux/node.h	2009-08-26 12:37:03.000000000 -0400
> > +++ linux-2.6.31-rc6-mmotm-090820-1918/include/linux/node.h	2009-08-26 12:40:19.000000000 -0400
> > @@ -28,6 +28,7 @@ struct node {
> >  
> >  struct memory_block;
> >  extern struct node node_devices[];
> > +typedef  void (*NODE_REGISTRATION_FUNC)(struct node *);
> >  
> >  extern int register_node(struct node *, int, struct node *);
> >  extern void unregister_node(struct node *node);
> > @@ -39,6 +40,8 @@ extern int unregister_cpu_under_node(uns
> >  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> >  						int nid);
> >  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
> > +extern void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC doregister,
> > +                                         NODE_REGISTRATION_FUNC unregister);
> >  #else
> >  static inline int register_one_node(int nid)
> >  {
> > @@ -65,6 +68,9 @@ static inline int unregister_mem_sect_un
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC do,
> > +                                                NODE_REGISTRATION_FUNC un) { }
> 
> "do" is a keyword. This won't compile on !NUMA. needs to be called
> doregister and unregister or basically anything other than "do"

Sorry.  Last minute, obviously untested, addition.  I have built the
reworked code with and without NUMA.

Here's my current "alloc_nodemask_of_node()" patch.  What do you think
about going with this? 

PATCH 4/6 - hugetlb:  introduce alloc_nodemask_of_node()

Against: 2.6.31-rc6-mmotm-090820-1918

Introduce nodemask macro to allocate a nodemask and 
initialize it to contain a single node, using existing
nodemask_of_node() macro.  Coded as a macro to avoid header
dependency hell.

This will be used to construct the huge pages "nodes_allowed"
nodemask for a single node when a persistent huge page
pool page count is modified via a per node sysfs attribute.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/nodemask.h |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Index: linux-2.6.31-rc6-mmotm-090820-1918/include/linux/nodemask.h
===================================================================
--- linux-2.6.31-rc6-mmotm-090820-1918.orig/include/linux/nodemask.h	2009-08-27 09:16:39.000000000 -0400
+++ linux-2.6.31-rc6-mmotm-090820-1918/include/linux/nodemask.h	2009-08-27 09:52:21.000000000 -0400
@@ -245,18 +245,31 @@ static inline int __next_node(int n, con
 	return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
 }
 
+#define init_nodemask_of_nodes(mask, node)				\
+	nodes_clear(*(mask));						\
+	node_set((node), *(mask));
+
 #define nodemask_of_node(node)						\
 ({									\
 	typeof(_unused_nodemask_arg_) m;				\
 	if (sizeof(m) == sizeof(unsigned long)) {			\
 		m.bits[0] = 1UL<<(node);				\
 	} else {							\
-		nodes_clear(m);						\
-		node_set((node), m);					\
+		init_nodemask_of_nodes(&m, (node));			\
 	}								\
 	m;								\
 })
 
+#define alloc_nodemask_of_node(node)					\
+({									\
+	typeof(_unused_nodemask_arg_) *nmp;				\
+	nmp = kmalloc(sizeof(*nmp), GFP_KERNEL);			\
+	if (nmp)							\
+		init_nodemask_of_nodes(nmp, (node));			\
+	nmp;								\
+})
+
+
 #define first_unset_node(mask) __first_unset_node(&(mask))
 static inline int __first_unset_node(const nodemask_t *maskp)
 {


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-08-27 16:52 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-24 19:24 [PATCH 0/5] hugetlb: numa control of persistent huge pages alloc/free Lee Schermerhorn
2009-08-24 19:25 ` [PATCH 1/5] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-08-25  8:10   ` David Rientjes
2009-08-24 19:26 ` [PATCH 2/5] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn
2009-08-25  8:16   ` David Rientjes
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-25 21:59       ` David Rientjes
2009-08-26  9:58       ` Mel Gorman
2009-08-24 19:27 ` [PATCH 3/5] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn
2009-08-25  8:47   ` David Rientjes
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-27 19:40       ` David Rientjes
2009-08-25 10:22   ` Mel Gorman
2009-08-24 19:29 ` [PATCH 4/5] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-08-25 10:19   ` Mel Gorman
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-26 10:11       ` Mel Gorman
2009-08-26 18:02         ` Lee Schermerhorn
2009-08-26 19:47           ` David Rientjes
2009-08-26 20:46             ` Lee Schermerhorn
2009-08-27  9:52               ` Mel Gorman
2009-08-27 19:35               ` David Rientjes
2009-08-28 12:56                 ` Lee Schermerhorn
2009-08-26 18:04         ` Lee Schermerhorn
2009-08-27 10:23           ` Mel Gorman
2009-08-27 16:52             ` Lee Schermerhorn [this message]
2009-08-28 10:09               ` Mel Gorman
2009-08-25 13:35   ` Mel Gorman
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-26 10:12       ` Mel Gorman
2009-08-24 19:30 ` [PATCH 5/5] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1251391930.4374.89.camel@useless.americas.hpqcorp.net \
    --to=lee.schermerhorn@hp.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-numa@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=nacc@us.ibm.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox