* [PATCH] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
@ 2007-06-07 15:04 Nishanth Aravamudan
2007-06-07 18:11 ` Christoph Lameter
0 siblings, 1 reply; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-07 15:04 UTC (permalink / raw)
To: clameter; +Cc: Lee.Schermerhorn, anton, apw, mel, akpm, linux-mm
While testing my sysfs per-node hugepage allocator
(http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that an
alloc_pages_node(nid, GFP_THISNODE) request would sometimes return a
struct page such that page_to_nid(page) != nid. This was because, on
that particular machine, nodes 0 and 1 are populated and nodes 2 and 3
are not. When a page is requested get_page_from_freelist() relies on
zonelist->zones[0]->zone_pgdat indicating when THISNODE stops. But,
because, say, node 2 has no memory, the first zone_pgdat in the fallback
list points to a different node. Add a comment indicating that THISNODE
may not return pages on THISNODE if the node is unpopulated.
Am working on testing Lee/Anton's patch to add a node_populated_mask and
use that in the hugepage allocator path. But I think this may be a
problem anywhere THISNODE is used and memory is expected to come from
the requested node and nowhere else.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0d2ef0b..ed826e9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -67,6 +67,10 @@ struct vm_area_struct;
__GFP_HIGHMEM)
#ifdef CONFIG_NUMA
+/*
+ * NOTE: if the requested node is unpopulated (no memory), a THISNODE
+ * request can go to other nodes due to the fallback list
+ */
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
#else
#define GFP_THISNODE ((__force gfp_t)0)
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-07 15:04 [PATCH] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated Nishanth Aravamudan
@ 2007-06-07 18:11 ` Christoph Lameter
2007-06-07 22:01 ` [PATCH v2] " Nishanth Aravamudan
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2007-06-07 18:11 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Lee.Schermerhorn, anton, apw, mel, akpm, linux-mm
On Thu, 7 Jun 2007, Nishanth Aravamudan wrote:
> While testing my sysfs per-node hugepage allocator
> (http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that an
> alloc_pages_node(nid, GFP_THISNODE) request would sometimes return a
> struct page such that page_to_nid(page) != nid. This was because, on
> that particular machine, nodes 0 and 1 are populated and nodes 2 and 3
> are not. When a page is requested get_page_from_freelist() relies on
> zonelist->zones[0]->zone_pgdat indicating when THISNODE stops. But,
> because, say, node 2 has no memory, the first zone_pgdat in the fallback
> list points to a different node. Add a comment indicating that THISNODE
> may not return pages on THISNODE if the node is unpopulated.
Hmmm.... Bad semantics are developing as a result of allowing empty nodes
with no zones. This is not correct and can have bad consequences.
As I expected: We may need more hacks to deal with it. Sigh.
> Am working on testing Lee/Anton's patch to add a node_populated_mask and
> use that in the hugepage allocator path. But I think this may be a
> problem anywhere THISNODE is used and memory is expected to come from
> the requested node and nowhere else.
What GFP_THISNODE effectively does now is to require the allocation on the
nearest available zone to the indicated node because it does not allow
access outside of the first encountered pgdat. But the first pgdat is not
the node we selected if the node has no memory.
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 0d2ef0b..ed826e9 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -67,6 +67,10 @@ struct vm_area_struct;
> __GFP_HIGHMEM)
>
> #ifdef CONFIG_NUMA
> +/*
> + * NOTE: if the requested node is unpopulated (no memory), a THISNODE
> + * request can go to other nodes due to the fallback list
Change to
Note: GFP_THISNODE allocates from the first available pgdat (== node
structure) from the zonelist of a given node. The first pgdat may be the
pgdat of another node if the node has no memory on its own.
?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-07 18:11 ` Christoph Lameter
@ 2007-06-07 22:01 ` Nishanth Aravamudan
2007-06-07 22:05 ` Christoph Lameter
2007-06-11 12:49 ` Andy Whitcroft
0 siblings, 2 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-07 22:01 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Lee.Schermerhorn, anton, apw, mel, akpm, linux-mm
On 07.06.2007 [11:11:02 -0700], Christoph Lameter wrote:
> On Thu, 7 Jun 2007, Nishanth Aravamudan wrote:
>
> > While testing my sysfs per-node hugepage allocator
> > (http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that
> > an alloc_pages_node(nid, GFP_THISNODE) request would sometimes
> > return a struct page such that page_to_nid(page) != nid. This was
> > because, on that particular machine, nodes 0 and 1 are populated and
> > nodes 2 and 3 are not. When a page is requested
> > get_page_from_freelist() relies on zonelist->zones[0]->zone_pgdat
> > indicating when THISNODE stops. But, because, say, node 2 has no
> > memory, the first zone_pgdat in the fallback list points to a
> > different node. Add a comment indicating that THISNODE may not
> > return pages on THISNODE if the node is unpopulated.
>
> Hmmm.... Bad semantics are developing as a result of allowing empty
> nodes with no zones. This is not correct and can have bad
> consequences.
I won't argue with you.
> As I expected: We may need more hacks to deal with it. Sigh.
Yes.
> > Am working on testing Lee/Anton's patch to add a node_populated_mask
> > and use that in the hugepage allocator path. But I think this may be
> > a problem anywhere THISNODE is used and memory is expected to come
> > from the requested node and nowhere else.
>
> What GFP_THISNODE effectively does now is to require the allocation on
> the nearest available zone to the indicated node because it does not
> allow access outside of the first encountered pgdat. But the first
> pgdat is not the node we selected if the node has no memory.
Right, that's the fallback list, right? And for unpopulated zones,
zonelist->zone[0]->zone_pgdat can refer to a different node.
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 0d2ef0b..ed826e9 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -67,6 +67,10 @@ struct vm_area_struct;
> > __GFP_HIGHMEM)
> >
> > #ifdef CONFIG_NUMA
> > +/*
> > + * NOTE: if the requested node is unpopulated (no memory), a THISNODE
> > + * request can go to other nodes due to the fallback list
>
> Change to
>
> Note: GFP_THISNODE allocates from the first available pgdat (== node
> structure) from the zonelist of a given node. The first pgdat may be the
> pgdat of another node if the node has no memory on its own.
Changed below.
gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
While testing my sysfs per-node hugepage allocator
(http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that an
alloc_pages_node(nid, GFP_THISNODE) request would sometimes return a
struct page such that page_to_nid(page) != nid. This was because, on
that particular machine, nodes 0 and 1 are populated and nodes 2 and 3
are not. When a page is requested get_page_from_freelist() relies on
zonelist->zones[0]->zone_pgdat indicating when THISNODE stops. But,
because, say, node 2 has no memory, the first zone_pgdat in the fallback
list points to a different node. Add a comment indicating that THISNODE
may not return pages on THISNODE if the node is unpopulated.
Am working on testing Lee/Anton's patch to add a node_populated_mask and
use that in the hugepage allocator path. But I think this may be a
problem anywhere THISNODE is used and memory is expected to come from
the requested node and nowhere else.
Reworked the comment based on feedback from Christoph Lameter.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ed826e9..996cf08 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -68,8 +68,10 @@ struct vm_area_struct;
#ifdef CONFIG_NUMA
/*
- * NOTE: if the requested node is unpopulated (no memory), a THISNODE
- * request can go to other nodes due to the fallback list
+ * NOTE: GFP_THISNODE allocates from the first available pgdat (== node
+ * structure) from the zonelist of the requested node. The first pgdat
+ * may be the pgdat of another node if the requested node has no memory
+ * on its own.
*/
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
#else
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-07 22:01 ` [PATCH v2] " Nishanth Aravamudan
@ 2007-06-07 22:05 ` Christoph Lameter
2007-06-07 22:16 ` Nishanth Aravamudan
2007-06-11 12:49 ` Andy Whitcroft
1 sibling, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2007-06-07 22:05 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Lee.Schermerhorn, anton, apw, mel, akpm, linux-mm
On Thu, 7 Jun 2007, Nishanth Aravamudan wrote:
> /*
> - * NOTE: if the requested node is unpopulated (no memory), a THISNODE
> - * request can go to other nodes due to the fallback list
> + * NOTE: GFP_THISNODE allocates from the first available pgdat (== node
> + * structure) from the zonelist of the requested node. The first pgdat
> + * may be the pgdat of another node if the requested node has no memory
> + * on its own.
> */
> #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
> #else
Acked-by: Christoph Lameter <clameter@sgi.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-07 22:05 ` Christoph Lameter
@ 2007-06-07 22:16 ` Nishanth Aravamudan
0 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-07 22:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Lee.Schermerhorn, anton, apw, mel, akpm, linux-mm
On 07.06.2007 [15:05:56 -0700], Christoph Lameter wrote:
> On Thu, 7 Jun 2007, Nishanth Aravamudan wrote:
>
> > /*
> > - * NOTE: if the requested node is unpopulated (no memory), a THISNODE
> > - * request can go to other nodes due to the fallback list
> > + * NOTE: GFP_THISNODE allocates from the first available pgdat (== node
> > + * structure) from the zonelist of the requested node. The first pgdat
> > + * may be the pgdat of another node if the requested node has no memory
> > + * on its own.
> > */
> > #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
> > #else
>
> Acked-by: Christoph Lameter <clameter@sgi.com>
Thanks, I'm an idiot, though, and didn't reset to stock -linus before
refreshing my patch:
gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
While testing my sysfs per-node hugepage allocator
(http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that an
alloc_pages_node(nid, GFP_THISNODE) request would sometimes return a
struct page such that page_to_nid(page) != nid. This was because, on
that particular machine, nodes 0 and 1 are populated and nodes 2 and 3
are not. When a page is requested get_page_from_freelist() relies on
zonelist->zones[0]->zone_pgdat indicating when THISNODE stops. But,
because, say, node 2 has no memory, the first zone_pgdat in the fallback
list points to a different node. Add a comment indicating that THISNODE
may not return pages on THISNODE if the node is unpopulated.
Am working on testing Lee/Anton's patch to add a node_populated_mask and
use that in the hugepage allocator path. But I think this may be a
problem anywhere THISNODE is used and memory is expected to come from
the requested node and nowhere else.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0d2ef0b..996cf08 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -67,6 +67,12 @@ struct vm_area_struct;
__GFP_HIGHMEM)
#ifdef CONFIG_NUMA
+/*
+ * NOTE: GFP_THISNODE allocates from the first available pgdat (== node
+ * structure) from the zonelist of the requested node. The first pgdat
+ * may be the pgdat of another node if the requested node has no memory
+ * on its own.
+ */
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
#else
#define GFP_THISNODE ((__force gfp_t)0)
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-07 22:01 ` [PATCH v2] " Nishanth Aravamudan
2007-06-07 22:05 ` Christoph Lameter
@ 2007-06-11 12:49 ` Andy Whitcroft
2007-06-11 16:12 ` Christoph Lameter
1 sibling, 1 reply; 17+ messages in thread
From: Andy Whitcroft @ 2007-06-11 12:49 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Christoph Lameter, Lee.Schermerhorn, anton, mel, akpm, linux-mm
Nishanth Aravamudan wrote:
> On 07.06.2007 [11:11:02 -0700], Christoph Lameter wrote:
>> On Thu, 7 Jun 2007, Nishanth Aravamudan wrote:
>>
>>> While testing my sysfs per-node hugepage allocator
>>> (http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that
>>> an alloc_pages_node(nid, GFP_THISNODE) request would sometimes
>>> return a struct page such that page_to_nid(page) != nid. This was
>>> because, on that particular machine, nodes 0 and 1 are populated and
>>> nodes 2 and 3 are not. When a page is requested
>>> get_page_from_freelist() relies on zonelist->zones[0]->zone_pgdat
>>> indicating when THISNODE stops. But, because, say, node 2 has no
>>> memory, the first zone_pgdat in the fallback list points to a
>>> different node. Add a comment indicating that THISNODE may not
>>> return pages on THISNODE if the node is unpopulated.
>> Hmmm.... Bad semantics are developing as a result of allowing empty
>> nodes with no zones. This is not correct and can have bad
>> consequences.
>
> I won't argue with you.
>
>> As I expected: We may need more hacks to deal with it. Sigh.
>
> Yes.
>
>>> Am working on testing Lee/Anton's patch to add a node_populated_mask
>>> and use that in the hugepage allocator path. But I think this may be
>>> a problem anywhere THISNODE is used and memory is expected to come
>>> from the requested node and nowhere else.
>> What GFP_THISNODE effectively does now is to require the allocation on
>> the nearest available zone to the indicated node because it does not
>> allow access outside of the first encountered pgdat. But the first
>> pgdat is not the node we selected if the node has no memory.
>
> Right, that's the fallback list, right? And for unpopulated zones,
> zonelist->zone[0]->zone_pgdat can refer to a different node.
>
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index 0d2ef0b..ed826e9 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -67,6 +67,10 @@ struct vm_area_struct;
>>> __GFP_HIGHMEM)
>>>
>>> #ifdef CONFIG_NUMA
>>> +/*
>>> + * NOTE: if the requested node is unpopulated (no memory), a THISNODE
>>> + * request can go to other nodes due to the fallback list
>> Change to
>>
>> Note: GFP_THISNODE allocates from the first available pgdat (== node
>> structure) from the zonelist of a given node. The first pgdat may be the
>> pgdat of another node if the node has no memory on its own.
>
> Changed below.
>
> gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
>
> While testing my sysfs per-node hugepage allocator
> (http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that an
> alloc_pages_node(nid, GFP_THISNODE) request would sometimes return a
> struct page such that page_to_nid(page) != nid. This was because, on
> that particular machine, nodes 0 and 1 are populated and nodes 2 and 3
> are not. When a page is requested get_page_from_freelist() relies on
> zonelist->zones[0]->zone_pgdat indicating when THISNODE stops. But,
> because, say, node 2 has no memory, the first zone_pgdat in the fallback
> list points to a different node. Add a comment indicating that THISNODE
> may not return pages on THISNODE if the node is unpopulated.
>
> Am working on testing Lee/Anton's patch to add a node_populated_mask and
> use that in the hugepage allocator path. But I think this may be a
> problem anywhere THISNODE is used and memory is expected to come from
> the requested node and nowhere else.
>
> Reworked the comment based on feedback from Christoph Lameter.
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index ed826e9..996cf08 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -68,8 +68,10 @@ struct vm_area_struct;
>
> #ifdef CONFIG_NUMA
> /*
> - * NOTE: if the requested node is unpopulated (no memory), a THISNODE
> - * request can go to other nodes due to the fallback list
> + * NOTE: GFP_THISNODE allocates from the first available pgdat (== node
> + * structure) from the zonelist of the requested node. The first pgdat
> + * may be the pgdat of another node if the requested node has no memory
> + * on its own.
> */
> #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
> #else
>
Its strange behaviour for sure. The users of this in the slab are not
getting what they expected. So its possible they would also want
something similar to what you are proposing for hugetlbfs. I also
wonder if the name should be changed to GFP_NEARTHISNODE or something.
The comment does better describe the current behavior, so it seems
worthy regardless.
Acked-by: Andy Whitcroft <apw@shadowen.org>
-apw
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 12:49 ` Andy Whitcroft
@ 2007-06-11 16:12 ` Christoph Lameter
2007-06-11 16:42 ` Christoph Lameter
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2007-06-11 16:12 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Nishanth Aravamudan, Lee.Schermerhorn, anton, mel, akpm, linux-mm
On Mon, 11 Jun 2007, Andy Whitcroft wrote:
> Its strange behaviour for sure. The users of this in the slab are not
> getting what they expected. So its possible they would also want
> something similar to what you are proposing for hugetlbfs. I also
> wonder if the name should be changed to GFP_NEARTHISNODE or something.
Well maybe we better fix this? I put an effort into using only cachelines
already used for GFP_THISNODE since this is in a very performance
critical path but at that point I was not thinking that we
would have memoryless nodes.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 16:12 ` Christoph Lameter
@ 2007-06-11 16:42 ` Christoph Lameter
2007-06-11 17:12 ` Nishanth Aravamudan
2007-06-11 18:23 ` Lee Schermerhorn
0 siblings, 2 replies; 17+ messages in thread
From: Christoph Lameter @ 2007-06-11 16:42 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Nishanth Aravamudan, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On Mon, 11 Jun 2007, Christoph Lameter wrote:
> Well maybe we better fix this? I put an effort into using only cachelines
> already used for GFP_THISNODE since this is in a very performance
> critical path but at that point I was not thinking that we
> would have memoryless nodes.
Duh. Too bad. The node information is not available in __alloc_pages at
all. The only thing we have to go on is a zonelist. And the first element
of that zonelist must no longer be the node from which we picked up
the zonelist after memoryless nodes come into play.
We could check this for alloc_pages_node() and alloc_pages_current by
putting in some code into the place where we retrive the zonelist based on
the current policy.
And looking at that code I can see some more bad consequences of
memoryless nodes:
1. Interleave to the memoryless node will be redirected to the nearest
node to the memoryless node. This will typically result in the nearest
node getting double the allocations if interleave is set.
So interleave is basically broken. It will no longer spread out the
allocations properly.
2. MPOL_BIND may allow allocations outside of the nodes specified.
It assumes that the first item of the zonelist of each node
is that zone.
So we have a universal assumption in the VM that the first zone of a
zonelist contains the local node. The current way of generating
zonelists for memoryless zones is broken (unsurprisingly since the NUMA
handling was never designed to handle memoryless nodes).
I think we can to fix all these troubles by adding a empty zone as
a first zone in the zonelist if the node has no memory of its own.
Then we need to make sure that we do the right thing of falling back
anytime these empty zones will be encountered.
This will have the effect of
1. GFP_THISNODE will fail since there is no memory in the empty zone.
2. MPOL_BIND will not allocate on nodes outside of the specified set
since there will be an empty zone in the generated zonelist.
3. Interleave will still hit an empty zones and fall back to the next.
We should add detection of memoryless nodes to mempoliy.c to skip
those nodes.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 16:42 ` Christoph Lameter
@ 2007-06-11 17:12 ` Nishanth Aravamudan
2007-06-11 18:29 ` Christoph Lameter
2007-06-11 18:23 ` Lee Schermerhorn
1 sibling, 1 reply; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-11 17:12 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On 11.06.2007 [09:42:14 -0700], Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Christoph Lameter wrote:
>
> > Well maybe we better fix this? I put an effort into using only cachelines
> > already used for GFP_THISNODE since this is in a very performance
> > critical path but at that point I was not thinking that we
> > would have memoryless nodes.
>
> Duh. Too bad. The node information is not available in __alloc_pages at
> all. The only thing we have to go on is a zonelist. And the first element
> of that zonelist must no longer be the node from which we picked up
> the zonelist after memoryless nodes come into play.
>
> We could check this for alloc_pages_node() and alloc_pages_current by
> putting in some code into the place where we retrive the zonelist based on
> the current policy.
>
> And looking at that code I can see some more bad consequences of
> memoryless nodes:
>
> 1. Interleave to the memoryless node will be redirected to the nearest
> node to the memoryless node. This will typically result in the nearest
> node getting double the allocations if interleave is set.
>
> So interleave is basically broken. It will no longer spread out the
> allocations properly.
>
> 2. MPOL_BIND may allow allocations outside of the nodes specified.
> It assumes that the first item of the zonelist of each node
> is that zone.
>
>
> So we have a universal assumption in the VM that the first zone of a
> zonelist contains the local node. The current way of generating
> zonelists for memoryless zones is broken (unsurprisingly since the NUMA
> handling was never designed to handle memoryless nodes).
>
> I think we can to fix all these troubles by adding a empty zone as
> a first zone in the zonelist if the node has no memory of its own.
> Then we need to make sure that we do the right thing of falling back
> anytime these empty zones will be encountered.
>
> This will have the effect of
>
> 1. GFP_THISNODE will fail since there is no memory in the empty zone.
>
> 2. MPOL_BIND will not allocate on nodes outside of the specified set
> since there will be an empty zone in the generated zonelist.
>
> 3. Interleave will still hit an empty zones and fall back to the next.
> We should add detection of memoryless nodes to mempoliy.c to skip
> those nodes.
These are the exact semantics, I expected. so I'll be happy to test/work
on these fixes.
This would also make it unnecessary to add the populated checks in
various places, I think, as THISNODE will mean ONLYTHISNODE (and perhaps
should be renamed in the series).
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 16:42 ` Christoph Lameter
2007-06-11 17:12 ` Nishanth Aravamudan
@ 2007-06-11 18:23 ` Lee Schermerhorn
2007-06-11 18:40 ` Christoph Lameter
1 sibling, 1 reply; 17+ messages in thread
From: Lee Schermerhorn @ 2007-06-11 18:23 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andy Whitcroft, Nishanth Aravamudan, ak, anton, mel, akpm, linux-mm
On Mon, 2007-06-11 at 09:42 -0700, Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Christoph Lameter wrote:
>
> > Well maybe we better fix this? I put an effort into using only cachelines
> > already used for GFP_THISNODE since this is in a very performance
> > critical path but at that point I was not thinking that we
> > would have memoryless nodes.
>
> Duh. Too bad. The node information is not available in __alloc_pages at
> all. The only thing we have to go on is a zonelist. And the first element
> of that zonelist must no longer be the node from which we picked up
> the zonelist after memoryless nodes come into play.
>
> We could check this for alloc_pages_node() and alloc_pages_current by
> putting in some code into the place where we retrive the zonelist based on
> the current policy.
>
> And looking at that code I can see some more bad consequences of
> memoryless nodes:
>
> 1. Interleave to the memoryless node will be redirected to the nearest
> node to the memoryless node. This will typically result in the nearest
> node getting double the allocations if interleave is set.
>
> So interleave is basically broken. It will no longer spread out the
> allocations properly.
Yeah. That's what was happening with the hugepage allocation that Anton
Blanchard started the patch for. I reworked the patch, with some input
from you as I recall, to utilize a "populate node map" that specified
which nodes contain memory in the "policy zone". Nish just reposted
this patch after testing on his platforms with his per node nr_hugepages
sysfs attribute patches.
>
> 2. MPOL_BIND may allow allocations outside of the nodes specified.
> It assumes that the first item of the zonelist of each node
> is that zone.
>
>
> So we have a universal assumption in the VM that the first zone of a
> zonelist contains the local node. The current way of generating
> zonelists for memoryless zones is broken (unsurprisingly since the NUMA
> handling was never designed to handle memoryless nodes).
>
> I think we can to fix all these troubles by adding a empty zone as
> a first zone in the zonelist if the node has no memory of its own.
> Then we need to make sure that we do the right thing of falling back
> anytime these empty zones will be encountered.
As I recall, that was Anton's first attempt. He just left the empty
nodes in the list. Andi asked him not to do that as it apparently
violated some other [unspecified?] assumptions in the policy code.
Perhaps Andi's objection was because the empty node's zones were not
properly initialized for some usages?
>
> This will have the effect of
>
> 1. GFP_THISNODE will fail since there is no memory in the empty zone.
>
> 2. MPOL_BIND will not allocate on nodes outside of the specified set
> since there will be an empty zone in the generated zonelist.
>
> 3. Interleave will still hit an empty zones and fall back to the next.
> We should add detection of memoryless nodes to mempoliy.c to skip
> those nodes.
When the hugepages patch was evolving, I suggested that we might want to
export the "populated map" to applications so that they could ask to
bind to or interleave across only populated nodes. We never pursued
that. Maybe just eliminate nodes that are unpopulated in the "policy
zone" from the node masks for MPOL_BIND and MPOL_INTERLEAVE in the
system calls? Saves checking the populated node set in the allocation
paths. Would need appropriate error return if this resulted in empty
nodemask.
Of course, memory hotplug could result in nodes becoming empty after the
nodemasks are adjusted, so we probably can't avoid checks in the
allocation paths if we want to avoid the bind and interleave issues you
mention above.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 17:12 ` Nishanth Aravamudan
@ 2007-06-11 18:29 ` Christoph Lameter
2007-06-11 18:46 ` Nishanth Aravamudan
2007-06-11 19:36 ` Nishanth Aravamudan
0 siblings, 2 replies; 17+ messages in thread
From: Christoph Lameter @ 2007-06-11 18:29 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
> These are the exact semantics, I expected. so I'll be happy to test/work
> on these fixes.
>
> This would also make it unnecessary to add the populated checks in
> various places, I think, as THISNODE will mean ONLYTHISNODE (and perhaps
> should be renamed in the series).
Here is a draft on how this could work:
It seems that MPOL_BIND already has the fixes needed. Adding empty zones
does not look that easy.
So I added a check interleave_nodes to verify that the selected node has
memory. If not skip it.
This does not work for the address based interleaving for anonymous vmas.
I am not sure what to do there. We could change the calculation of the
node to be based only on nodes with memory and then skip the memoryless
ones. I have only added a comment to describe its brokennes for now.
Then there is the original issue of GFP_THISNODE. I added a check in
alloc_pages_node to verify that the node has memory. If not then fail.
This will fix the alloc_pages_node case but not the alloc_pages() case.
In the alloc_pages() case we do not specify a node. Implicitly it is
understood that we (in the case of no memory policy / cpuset options)
allocate from the nearest node. So it may be argued there that the
GFP_THISNODE behavior of taking the first node from the zonelist is okay.
There is some reason to worry about the performance impact from adding the
check in alloc_pages_node and interleave. If the code is left in
alloc_pages_node then alloc_pages_node should be uninlined and moved to
mempolicy.c
Index: linux-2.6/mm/mempolicy.c
===================================================================
--- linux-2.6.orig/mm/mempolicy.c 2007-06-11 11:13:09.000000000 -0700
+++ linux-2.6/mm/mempolicy.c 2007-06-11 11:19:03.000000000 -0700
@@ -1125,9 +1125,11 @@ static unsigned interleave_nodes(struct
struct task_struct *me = current;
nid = me->il_next;
- next = next_node(nid, policy->v.nodes);
- if (next >= MAX_NUMNODES)
- next = first_node(policy->v.nodes);
+ do {
+ next = next_node(nid, policy->v.nodes);
+ if (next >= MAX_NUMNODES)
+ next = first_node(policy->v.nodes);
+ } while (!NODE_DATA(node)->present_pages);
me->il_next = next;
return nid;
}
@@ -1191,6 +1193,11 @@ static inline unsigned interleave_nid(st
* for huge pages, since vm_pgoff is in units of small
* pages, we need to shift off the always 0 bits to get
* a useful offset.
+ *
+ * For configurations with memoryless nodes this is broken
+ * since the allocation attempts on that node will fall
+ * back to other nodes and thus one neighboring node
+ * will be overallocated from.
*/
BUG_ON(shift < PAGE_SHIFT);
off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h 2007-06-11 11:19:32.000000000 -0700
+++ linux-2.6/include/linux/gfp.h 2007-06-11 11:21:33.000000000 -0700
@@ -134,6 +134,9 @@ static inline struct page *alloc_pages_n
if (nid < 0)
nid = numa_node_id();
+ if ((gfp_mask & __GFP_THISNODE) && !NODE_DATA(nid)->present_pages)
+ return NULL;
+
return __alloc_pages(gfp_mask, order,
NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 18:23 ` Lee Schermerhorn
@ 2007-06-11 18:40 ` Christoph Lameter
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Lameter @ 2007-06-11 18:40 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: Andy Whitcroft, Nishanth Aravamudan, ak, anton, mel, akpm, linux-mm
On Mon, 11 Jun 2007, Lee Schermerhorn wrote:
> When the hugepages patch was evolving, I suggested that we might want to
> export the "populated map" to applications so that they could ask to
> bind to or interleave across only populated nodes. We never pursued
> that. Maybe just eliminate nodes that are unpopulated in the "policy
> zone" from the node masks for MPOL_BIND and MPOL_INTERLEAVE in the
> system calls? Saves checking the populated node set in the allocation
> paths. Would need appropriate error return if this resulted in empty
> nodemask.
That would work for the MPOL_BIND case since it has a zonelist. However,
MPOL_INTERLEAVE does not have a zonelist. I think we need the populated
map for interleave. The hacky way in how I checked for an unpopulated
node in the patch just posted is not that effective.
> Of course, memory hotplug could result in nodes becoming empty after the
> nodemasks are adjusted, so we probably can't avoid checks in the
> allocation paths if we want to avoid the bind and interleave issues you
> mention above.
Right.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 18:29 ` Christoph Lameter
@ 2007-06-11 18:46 ` Nishanth Aravamudan
2007-06-11 18:54 ` Christoph Lameter
2007-06-11 19:36 ` Nishanth Aravamudan
1 sibling, 1 reply; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-11 18:46 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On 11.06.2007 [11:29:14 -0700], Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
>
> > These are the exact semantics, I expected. so I'll be happy to test/work
> > on these fixes.
> >
> > This would also make it unnecessary to add the populated checks in
> > various places, I think, as THISNODE will mean ONLYTHISNODE (and perhaps
> > should be renamed in the series).
>
> Here is a draft on how this could work:
>
> It seems that MPOL_BIND already has the fixes needed. Adding empty zones
> does not look that easy.
> So I added a check interleave_nodes to verify that the selected node has
> memory. If not skip it.
>
> This does not work for the address based interleaving for anonymous vmas.
> I am not sure what to do there. We could change the calculation of the
> node to be based only on nodes with memory and then skip the memoryless
> ones. I have only added a comment to describe its brokennes for now.
>
> Then there is the original issue of GFP_THISNODE. I added a check in
> alloc_pages_node to verify that the node has memory. If not then fail.
> This will fix the alloc_pages_node case but not the alloc_pages() case.
>
> In the alloc_pages() case we do not specify a node. Implicitly it is
> understood that we (in the case of no memory policy / cpuset options)
> allocate from the nearest node. So it may be argued there that the
> GFP_THISNODE behavior of taking the first node from the zonelist is okay.
>
> There is some reason to worry about the performance impact from adding the
> check in alloc_pages_node and interleave. If the code is left in
> alloc_pages_node then alloc_pages_node should be uninlined and moved to
> mempolicy.c
>
> Index: linux-2.6/mm/mempolicy.c
> ===================================================================
> --- linux-2.6.orig/mm/mempolicy.c 2007-06-11 11:13:09.000000000 -0700
> +++ linux-2.6/mm/mempolicy.c 2007-06-11 11:19:03.000000000 -0700
> @@ -1125,9 +1125,11 @@ static unsigned interleave_nodes(struct
> struct task_struct *me = current;
>
> nid = me->il_next;
> - next = next_node(nid, policy->v.nodes);
> - if (next >= MAX_NUMNODES)
> - next = first_node(policy->v.nodes);
> + do {
> + next = next_node(nid, policy->v.nodes);
> + if (next >= MAX_NUMNODES)
> + next = first_node(policy->v.nodes);
> + } while (!NODE_DATA(node)->present_pages);
If something like Lee/Anton's patch were to go in (which, as Lee pointed
out, I refreshed as Patch 1/3 in the series I posted a few days ago),
this would be
while(!node_populated(nid))
> me->il_next = next;
> return nid;
> }
> @@ -1191,6 +1193,11 @@ static inline unsigned interleave_nid(st
> * for huge pages, since vm_pgoff is in units of small
> * pages, we need to shift off the always 0 bits to get
> * a useful offset.
> + *
> + * For configurations with memoryless nodes this is broken
> + * since the allocation attempts on that node will fall
> + * back to other nodes and thus one neighboring node
> + * will be overallocated from.
> */
> BUG_ON(shift < PAGE_SHIFT);
> off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
> Index: linux-2.6/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/gfp.h 2007-06-11 11:19:32.000000000 -0700
> +++ linux-2.6/include/linux/gfp.h 2007-06-11 11:21:33.000000000 -0700
> @@ -134,6 +134,9 @@ static inline struct page *alloc_pages_n
> if (nid < 0)
> nid = numa_node_id();
>
> + if ((gfp_mask & __GFP_THISNODE) && !NODE_DATA(nid)->present_pages)
> + return NULL;
> +
similarly,
if ((gfp_mask & __GFP_THISNODE) && !node_populated(nid))
Presuming I understand everything correctly. Not sure which would be
preferred, or if perhaps node_populated, rather than using a nodemask
should just use NODE_DATA(nid)->present_pages?
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 18:46 ` Nishanth Aravamudan
@ 2007-06-11 18:54 ` Christoph Lameter
0 siblings, 0 replies; 17+ messages in thread
From: Christoph Lameter @ 2007-06-11 18:54 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
> > + do {
> > + next = next_node(nid, policy->v.nodes);
> > + if (next >= MAX_NUMNODES)
> > + next = first_node(policy->v.nodes);
> > + } while (!NODE_DATA(node)->present_pages);
>
> If something like Lee/Anton's patch were to go in (which, as Lee pointed
> out, I refreshed as Patch 1/3 in the series I posted a few days ago),
> this would be
>
> while(!node_populated(nid))
Right. That would be much better.
> Presuming I understand everything correctly. Not sure which would be
> preferred, or if perhaps node_populated, rather than using a nodemask
> should just use NODE_DATA(nid)->present_pages?
I think the node_populate is better. Simple bitmap lookup.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 18:29 ` Christoph Lameter
2007-06-11 18:46 ` Nishanth Aravamudan
@ 2007-06-11 19:36 ` Nishanth Aravamudan
2007-06-11 19:43 ` Christoph Lameter
1 sibling, 1 reply; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-11 19:36 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On 11.06.2007 [11:29:14 -0700], Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
>
> > These are the exact semantics, I expected. so I'll be happy to test/work
> > on these fixes.
> >
> > This would also make it unnecessary to add the populated checks in
> > various places, I think, as THISNODE will mean ONLYTHISNODE (and perhaps
> > should be renamed in the series).
>
> Here is a draft on how this could work:
<snip>
> Index: linux-2.6/mm/mempolicy.c
> ===================================================================
> --- linux-2.6.orig/mm/mempolicy.c 2007-06-11 11:13:09.000000000 -0700
> +++ linux-2.6/mm/mempolicy.c 2007-06-11 11:19:03.000000000 -0700
> @@ -1125,9 +1125,11 @@ static unsigned interleave_nodes(struct
> struct task_struct *me = current;
>
> nid = me->il_next;
> - next = next_node(nid, policy->v.nodes);
> - if (next >= MAX_NUMNODES)
> - next = first_node(policy->v.nodes);
> + do {
> + next = next_node(nid, policy->v.nodes);
> + if (next >= MAX_NUMNODES)
> + next = first_node(policy->v.nodes);
> + } while (!NODE_DATA(node)->present_pages);
> me->il_next = next;
> return nid;
> }
So, I'm splitting up the populated_map patch in two, so that these bits
or the hugetlbfs bits could be put on top of having that nodemask.
*but*, if this change occurs in mempolicy.c, I think we still have a
problem, where me->il_next could be initialized in do_set_mempolicy() to
a memoryless node:
if (new && new->policy == MPOL_INTERLEAVE)
current->il_next = first_node(new->v.nodes);
Since we return nid in mm/mempolicy.c, we've fix the problem for
subsequent intereaves, but not the first one. So should it be:
unsigned nid;
if (new && new->policy == MPOL_INTERLEAVE) {
nid = first_node(new->v.nodes);
while (!node_populated(nid)) {
nid = next_node(nid, new->v.nodes);
if (nid >= MAX_NUMNODES) {
mpol_free(current->mempolicy);
current->mempolicy = NULL;
mpol_set_task_struct_flag();
return -EINVAL;
}
}
}
??
Sorry if I'm way off here, just trying to get it right.
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 19:36 ` Nishanth Aravamudan
@ 2007-06-11 19:43 ` Christoph Lameter
2007-06-11 20:18 ` Nishanth Aravamudan
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2007-06-11 19:43 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
> So, I'm splitting up the populated_map patch in two, so that these bits
> or the hugetlbfs bits could be put on top of having that nodemask.
Well maybe just do a single populate_map patch first. We can easily review
that and get it in. And it will be useful for multiple other patchsets.
> *but*, if this change occurs in mempolicy.c, I think we still have a
> problem, where me->il_next could be initialized in do_set_mempolicy() to
> a memoryless node:
I thought that one misalloc would not be that problematic (hmmmm... unless
its a hugetlb page on smallist NUMA system...)
> if (new && new->policy == MPOL_INTERLEAVE)
> current->il_next = first_node(new->v.nodes);
Hmmmm... We could also switch off the nodes in v.nodes? Then we do not
need any additional checks and the modifications to interleave() are not
necessary?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
2007-06-11 19:43 ` Christoph Lameter
@ 2007-06-11 20:18 ` Nishanth Aravamudan
0 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2007-06-11 20:18 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andy Whitcroft, Lee.Schermerhorn, ak, anton, mel, akpm, linux-mm
On 11.06.2007 [12:43:32 -0700], Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
>
> > So, I'm splitting up the populated_map patch in two, so that these bits
> > or the hugetlbfs bits could be put on top of having that nodemask.
>
> Well maybe just do a single populate_map patch first. We can easily review
> that and get it in. And it will be useful for multiple other patchsets.
Right, sorry, that's what I meant -- I was moving ahead to the other
patches to make sure everything was sensible.
Will send out the populated_map patch ASAP.
> > *but*, if this change occurs in mempolicy.c, I think we still have a
> > problem, where me->il_next could be initialized in do_set_mempolicy() to
> > a memoryless node:
>
> I thought that one misalloc would not be that problematic (hmmmm... unless
> its a hugetlb page on smallist NUMA system...)
Right -- it all depends...
> > if (new && new->policy == MPOL_INTERLEAVE)
> > current->il_next = first_node(new->v.nodes);
>
> Hmmmm... We could also switch off the nodes in v.nodes? Then we do not
> need any additional checks and the modifications to interleave() are
> not necessary?
Ah true, so that would happen at mpol_new() time. Makes sense.
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2007-06-11 20:18 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-07 15:04 [PATCH] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated Nishanth Aravamudan
2007-06-07 18:11 ` Christoph Lameter
2007-06-07 22:01 ` [PATCH v2] " Nishanth Aravamudan
2007-06-07 22:05 ` Christoph Lameter
2007-06-07 22:16 ` Nishanth Aravamudan
2007-06-11 12:49 ` Andy Whitcroft
2007-06-11 16:12 ` Christoph Lameter
2007-06-11 16:42 ` Christoph Lameter
2007-06-11 17:12 ` Nishanth Aravamudan
2007-06-11 18:29 ` Christoph Lameter
2007-06-11 18:46 ` Nishanth Aravamudan
2007-06-11 18:54 ` Christoph Lameter
2007-06-11 19:36 ` Nishanth Aravamudan
2007-06-11 19:43 ` Christoph Lameter
2007-06-11 20:18 ` Nishanth Aravamudan
2007-06-11 18:23 ` Lee Schermerhorn
2007-06-11 18:40 ` Christoph Lameter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox