From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
akpm@linux-foundation.org, Nishanth Aravamudan <nacc@us.ibm.com>,
David Rientjes <rientjes@google.com>, Adam Litke <agl@us.ibm.com>,
Andy Whitcroft <apw@canonical.com>,
eric.whitney@hp.com
Subject: Re: [PATCH 4/5] hugetlb: add per node hstate attributes
Date: Thu, 27 Aug 2009 12:52:10 -0400 [thread overview]
Message-ID: <1251391930.4374.89.camel@useless.americas.hpqcorp.net> (raw)
In-Reply-To: <20090827102338.GC21183@csn.ul.ie>
On Thu, 2009-08-27 at 11:23 +0100, Mel Gorman wrote:
> On Wed, Aug 26, 2009 at 02:04:03PM -0400, Lee Schermerhorn wrote:
> > <SNIP>
> > This revised patch also removes the include of hugetlb.h from node.h.
> >
> > Lee
> >
> > ---
> >
> > PATCH 5/6 hugetlb: register per node hugepages attributes
> >
> > Against: 2.6.31-rc6-mmotm-090820-1918
> >
> > V2: remove dependency on kobject private bitfield. Search
> > global hstates then all per node hstates for kobject
> > match in attribute show/store functions.
> >
> > V3: rebase atop the mempolicy-based hugepage alloc/free;
> > use custom "nodes_allowed" to restrict alloc/free to
> > a specific node via per node attributes. Per node
> > attribute overrides mempolicy. I.e., mempolicy only
> > applies to global attributes.
> >
> > V4: Fix issues raised by Mel Gorman:
> > + add !NUMA versions of hugetlb_[un]register_node()
> > + rename 'hi' to 'i' in kobj_to_node_hstate()
> > + rename (count, input) to (len, count) in nr_hugepages_store()
> > + moved per node hugepages_kobj and hstate_kobjs[] from the
> > struct node [sysdev] to hugetlb.c private arrays.
> > + changed registration mechanism so that hugetlbfs [a module]
> > register its attributes registration callbacks with the node
> > driver, eliminating the dependency between the node driver
> > and hugetlbfs. From it's init func, hugetlbfs will register
> > all on-line nodes' hugepage sysfs attributes along with
> > hugetlbfs' attributes register/unregister functions. The
> > node driver will use these functions to [un]register nodes
> > with hugetlbfs on node hot-plug.
> > + replaced hugetlb.c private "nodes_allowed_from_node()" with
> > generic "alloc_nodemask_of_node()".
> >
> > This patch adds the per huge page size control/query attributes
> > to the per node sysdevs:
> >
> > /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
> > nr_hugepages - r/w
> > free_huge_pages - r/o
> > surplus_huge_pages - r/o
> >
> > The patch attempts to re-use/share as much of the existing
> > global hstate attribute initialization and handling, and the
> > "nodes_allowed" constraint processing as possible.
> > Calling set_max_huge_pages() with no node indicates a change to
> > global hstate parameters. In this case, any non-default task
> > mempolicy will be used to generate the nodes_allowed mask. A
> > valid node id indicates an update to that node's hstate
> > parameters, and the count argument specifies the target count
> > for the specified node. From this info, we compute the target
> > global count for the hstate and construct a nodes_allowed node
> > mask contain only the specified node.
> >
> > Setting the node specific nr_hugepages via the per node attribute
> > effectively ignores any task mempolicy or cpuset constraints.
> >
<snip>
> > Index: linux-2.6.31-rc6-mmotm-090820-1918/drivers/base/node.c
> > ===================================================================
> > --- linux-2.6.31-rc6-mmotm-090820-1918.orig/drivers/base/node.c 2009-08-26 12:37:03.000000000 -0400
> > +++ linux-2.6.31-rc6-mmotm-090820-1918/drivers/base/node.c 2009-08-26 13:01:54.000000000 -0400
> > @@ -177,6 +177,31 @@ static ssize_t node_read_distance(struct
> > }
> > static SYSDEV_ATTR(distance, S_IRUGO, node_read_distance, NULL);
> >
> > +/*
> > + * hugetlbfs per node attributes registration interface
> > + */
> > +NODE_REGISTRATION_FUNC __hugetlb_register_node;
> > +NODE_REGISTRATION_FUNC __hugetlb_unregister_node;
> > +
> > +static inline void hugetlb_register_node(struct node *node)
> > +{
> > + if (__hugetlb_register_node)
> > + __hugetlb_register_node(node);
> > +}
> > +
> > +static inline void hugetlb_unregister_node(struct node *node)
> > +{
> > + if (__hugetlb_unregister_node)
> > + __hugetlb_unregister_node(node);
> > +}
> > +
> > +void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC doregister,
> > + NODE_REGISTRATION_FUNC unregister)
> > +{
> > + __hugetlb_register_node = doregister;
> > + __hugetlb_unregister_node = unregister;
> > +}
> > +
> >
>
> I think I get this. Basically, you want to avoid the functions being
> called too early before sysfs is initialised and still work with hotplug
> later. So early in boot, no registeration happens. sysfs and hugetlbfs
> get initialised and at that point, these hooks become active, all nodes
> registered and hotplug later continues to work.
>
> Is that accurate? Can it get a comment?
Yes, you got it, and yes, I'll add a comment. I had explained it in the
patch description [V4], but that's not too useful to someone coming
along later...
<snip>
> > @@ -1253,7 +1255,21 @@ static unsigned long set_max_huge_pages(
> > if (h->order >= MAX_ORDER)
> > return h->max_huge_pages;
> >
> > - nodes_allowed = huge_mpol_nodes_allowed();
> > + if (nid == NO_NODEID_SPECIFIED)
> > + nodes_allowed = huge_mpol_nodes_allowed();
> > + else {
> > + /*
> > + * incoming 'count' is for node 'nid' only, so
> > + * adjust count to global, but restrict alloc/free
> > + * to the specified node.
> > + */
> > + count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
> > + nodes_allowed = alloc_nodemask_of_node(nid);
>
> alloc_nodemask_of_node() isn't defined anywhere.
Well, that's because the patch that defines it is in a message that I
meant to send before this one. I see it's in my Drafts folder. I'll
attach that patch below. I'm rebasing against the 0827 mmotm, and I'll
resend the rebased series. However, I wanted to get your opinion of the
nodemask patch below.
<snip>
> >
> > +#ifdef CONFIG_NUMA
> > +
> > +struct node_hstate {
> > + struct kobject *hugepages_kobj;
> > + struct kobject *hstate_kobjs[HUGE_MAX_HSTATE];
> > +};
> > +struct node_hstate node_hstates[MAX_NUMNODES];
> > +
> > +static struct attribute *per_node_hstate_attrs[] = {
> > + &nr_hugepages_attr.attr,
> > + &free_hugepages_attr.attr,
> > + &surplus_hugepages_attr.attr,
> > + NULL,
> > +};
> > +
> > +static struct attribute_group per_node_hstate_attr_group = {
> > + .attrs = per_node_hstate_attrs,
> > +};
> > +
> > +static struct hstate *kobj_to_node_hstate(struct kobject *kobj, int *nidp)
> > +{
> > + int nid;
> > +
> > + for (nid = 0; nid < nr_node_ids; nid++) {
> > + struct node_hstate *nhs = &node_hstates[nid];
> > + int i;
> > + for (i = 0; i < HUGE_MAX_HSTATE; i++)
> > + if (nhs->hstate_kobjs[i] == kobj) {
> > + if (nidp)
> > + *nidp = nid;
> > + return &hstates[i];
> > + }
> > + }
> > +
> > + BUG();
> > + return NULL;
> > +}
>
> Ok, this looks nicer in that the dependencies between hugetlbfs and base
> node support are going the right direction.
Agreed. I removed that "issue" from the patch description.
<snip>
> > Index: linux-2.6.31-rc6-mmotm-090820-1918/include/linux/node.h
> > ===================================================================
> > --- linux-2.6.31-rc6-mmotm-090820-1918.orig/include/linux/node.h 2009-08-26 12:37:03.000000000 -0400
> > +++ linux-2.6.31-rc6-mmotm-090820-1918/include/linux/node.h 2009-08-26 12:40:19.000000000 -0400
> > @@ -28,6 +28,7 @@ struct node {
> >
> > struct memory_block;
> > extern struct node node_devices[];
> > +typedef void (*NODE_REGISTRATION_FUNC)(struct node *);
> >
> > extern int register_node(struct node *, int, struct node *);
> > extern void unregister_node(struct node *node);
> > @@ -39,6 +40,8 @@ extern int unregister_cpu_under_node(uns
> > extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> > int nid);
> > extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
> > +extern void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC doregister,
> > + NODE_REGISTRATION_FUNC unregister);
> > #else
> > static inline int register_one_node(int nid)
> > {
> > @@ -65,6 +68,9 @@ static inline int unregister_mem_sect_un
> > {
> > return 0;
> > }
> > +
> > +static inline void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC do,
> > + NODE_REGISTRATION_FUNC un) { }
>
> "do" is a keyword. This won't compile on !NUMA. needs to be called
> doregister and unregister or basically anything other than "do"
Sorry. Last minute, obviously untested, addition. I have built the
reworked code with and without NUMA.
Here's my current "alloc_nodemask_of_node()" patch. What do you think
about going with this?
PATCH 4/6 - hugetlb: introduce alloc_nodemask_of_node()
Against: 2.6.31-rc6-mmotm-090820-1918
Introduce nodemask macro to allocate a nodemask and
initialize it to contain a single node, using existing
nodemask_of_node() macro. Coded as a macro to avoid header
dependency hell.
This will be used to construct the huge pages "nodes_allowed"
nodemask for a single node when a persistent huge page
pool page count is modified via a per node sysfs attribute.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/nodemask.h | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
Index: linux-2.6.31-rc6-mmotm-090820-1918/include/linux/nodemask.h
===================================================================
--- linux-2.6.31-rc6-mmotm-090820-1918.orig/include/linux/nodemask.h 2009-08-27 09:16:39.000000000 -0400
+++ linux-2.6.31-rc6-mmotm-090820-1918/include/linux/nodemask.h 2009-08-27 09:52:21.000000000 -0400
@@ -245,18 +245,31 @@ static inline int __next_node(int n, con
return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
}
+#define init_nodemask_of_nodes(mask, node) \
+ nodes_clear(*(mask)); \
+ node_set((node), *(mask));
+
#define nodemask_of_node(node) \
({ \
typeof(_unused_nodemask_arg_) m; \
if (sizeof(m) == sizeof(unsigned long)) { \
m.bits[0] = 1UL<<(node); \
} else { \
- nodes_clear(m); \
- node_set((node), m); \
+ init_nodemask_of_nodes(&m, (node)); \
} \
m; \
})
+#define alloc_nodemask_of_node(node) \
+({ \
+ typeof(_unused_nodemask_arg_) *nmp; \
+ nmp = kmalloc(sizeof(*nmp), GFP_KERNEL); \
+ if (nmp) \
+ init_nodemask_of_nodes(nmp, (node)); \
+ nmp; \
+})
+
+
#define first_unset_node(mask) __first_unset_node(&(mask))
static inline int __first_unset_node(const nodemask_t *maskp)
{
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-08-27 16:52 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-24 19:24 [PATCH 0/5] hugetlb: numa control of persistent huge pages alloc/free Lee Schermerhorn
2009-08-24 19:25 ` [PATCH 1/5] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-08-25 8:10 ` David Rientjes
2009-08-24 19:26 ` [PATCH 2/5] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn
2009-08-25 8:16 ` David Rientjes
2009-08-25 20:49 ` Lee Schermerhorn
2009-08-25 21:59 ` David Rientjes
2009-08-26 9:58 ` Mel Gorman
2009-08-24 19:27 ` [PATCH 3/5] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn
2009-08-25 8:47 ` David Rientjes
2009-08-25 20:49 ` Lee Schermerhorn
2009-08-27 19:40 ` David Rientjes
2009-08-25 10:22 ` Mel Gorman
2009-08-24 19:29 ` [PATCH 4/5] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-08-25 10:19 ` Mel Gorman
2009-08-25 20:49 ` Lee Schermerhorn
2009-08-26 10:11 ` Mel Gorman
2009-08-26 18:02 ` Lee Schermerhorn
2009-08-26 19:47 ` David Rientjes
2009-08-26 20:46 ` Lee Schermerhorn
2009-08-27 9:52 ` Mel Gorman
2009-08-27 19:35 ` David Rientjes
2009-08-28 12:56 ` Lee Schermerhorn
2009-08-26 18:04 ` Lee Schermerhorn
2009-08-27 10:23 ` Mel Gorman
2009-08-27 16:52 ` Lee Schermerhorn [this message]
2009-08-28 10:09 ` Mel Gorman
2009-08-25 13:35 ` Mel Gorman
2009-08-25 20:49 ` Lee Schermerhorn
2009-08-26 10:12 ` Mel Gorman
2009-08-24 19:30 ` [PATCH 5/5] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1251391930.4374.89.camel@useless.americas.hpqcorp.net \
--to=lee.schermerhorn@hp.com \
--cc=agl@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=apw@canonical.com \
--cc=eric.whitney@hp.com \
--cc=linux-mm@kvack.org \
--cc=linux-numa@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=nacc@us.ibm.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox