* [patch] mm: default to node zonelist ordering when nodes have only lowmem
@ 2010-03-25 22:33 David Rientjes
2010-03-26 14:07 ` Mel Gorman
0 siblings, 1 reply; 4+ messages in thread
From: David Rientjes @ 2010-03-25 22:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mel Gorman, KAMEZAWA Hiroyuki, linux-mm
There are two types of zonelist ordering methodologies:
- node order, preferring allocations on a node to stay local to and
- zone order, preferring allocations come from a higher zone to avoid
allocating in lowmem zones even though they may not be local.
The ordering technique used by the kernel is configurable on the command
line, but also has some logic to determine what the default should be.
This logic currently lacks knowledge of systems where a node may only
have lowmem. For such systems, it is necessary to use node order so that
GFP_KERNEL allocations may be satisfied by nodes consisting of only
lowmem.
If zone order is used, GFP_KERNEL allocations to such nodes are actually
allocated on a node with local affinity that includes ZONE_NORMAL.
This change defaults to node zonelist ordering if any node lacks
ZONE_NORMAL.
To force zone order, append 'numa_zonelist_order=zone' to the kernel
command line.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/page_alloc.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2582,7 +2582,7 @@ static int default_zonelist_order(void)
* ZONE_DMA and ZONE_DMA32 can be very small area in the sytem.
* If they are really small and used heavily, the system can fall
* into OOM very easily.
- * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
+ * This function detect ZONE_DMA/DMA32 size and configures zone order.
*/
/* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
low_kmem_size = 0;
@@ -2594,6 +2594,15 @@ static int default_zonelist_order(void)
if (zone_type < ZONE_NORMAL)
low_kmem_size += z->present_pages;
total_size += z->present_pages;
+ } else if (zone_type == ZONE_NORMAL) {
+ /*
+ * If any node has only lowmem, then node order
+ * is preferred to allow kernel allocations
+ * locally; otherwise, they can easily infringe
+ * on other nodes when there is an abundance of
+ * lowmem available to allocate from.
+ */
+ return ZONELIST_ORDER_NODE;
}
}
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
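The two orderings described in the changelog can be illustrated with a small userspace sketch. This is not kernel code: the two-node layout, the `populated` table, the "local node first" distance rule and all helper names are assumptions made up for illustration. It only shows the order in which zones would be tried for an allocation starting on a given node.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the kernel's zone types (lowest first). */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };
static const char *const zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "NORMAL" };

#define NR_NODES 2

/* Hypothetical layout: populated[n][z] is nonzero if node n has pages in zone z. */
static const int populated[NR_NODES][MAX_NR_ZONES] = {
	{ 1, 1, 1 },	/* node 0: DMA, DMA32 and NORMAL */
	{ 0, 1, 0 },	/* node 1: lowmem only (DMA32) */
};

/* Append "node:ZONE" to the space-separated fallback list in out. */
static void append_zone(char *out, size_t len, int node, int z)
{
	size_t used = strlen(out);

	assert(used < len);
	snprintf(out + used, len - used, "%s%d:%s",
		 used ? " " : "", node, zone_names[z]);
}

/* Node order: walk every zone of the nearest node before the next node. */
static void build_node_order(int local, char *out, size_t len)
{
	out[0] = '\0';
	for (int i = 0; i < NR_NODES; i++) {
		int node = (local + i) % NR_NODES;	/* toy distance: local node first */

		for (int z = MAX_NR_ZONES - 1; z >= 0; z--)
			if (populated[node][z])
				append_zone(out, len, node, z);
	}
}

/* Zone order: walk the highest zone on every node before any lower zone. */
static void build_zone_order(int local, char *out, size_t len)
{
	out[0] = '\0';
	for (int z = MAX_NR_ZONES - 1; z >= 0; z--)
		for (int i = 0; i < NR_NODES; i++) {
			int node = (local + i) % NR_NODES;

			if (populated[node][z])
				append_zone(out, len, node, z);
		}
}
```

For the lowmem-only node 1 in this made-up layout, node order yields "1:DMA32 0:NORMAL 0:DMA32 0:DMA", so the local DMA32 zone is tried first. Zone order yields "0:NORMAL 1:DMA32 0:DMA32 0:DMA", so the allocation lands on node 0 while node 1's own memory sits unused, which is the behaviour the changelog describes.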
* Re: [patch] mm: default to node zonelist ordering when nodes have only lowmem
2010-03-25 22:33 [patch] mm: default to node zonelist ordering when nodes have only lowmem David Rientjes
@ 2010-03-26 14:07 ` Mel Gorman
2010-03-26 19:05 ` David Rientjes
0 siblings, 1 reply; 4+ messages in thread
From: Mel Gorman @ 2010-03-26 14:07 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-mm
On Thu, Mar 25, 2010 at 03:33:08PM -0700, David Rientjes wrote:
> There are two types of zonelist ordering methodologies:
>
> - node order, preferring allocations on a node to stay local to and
>
> - zone order, preferring allocations come from a higher zone to avoid
> allocating in lowmem zones even though they may not be local.
>
> The ordering technique used by the kernel is configurable on the command
> line, but also has some logic to determine what the default should be.
>
> This logic currently lacks knowledge of systems where a node may only
> have lowmem. For such systems, it is necessary to use node order so that
> GFP_KERNEL allocations may be satisfied by nodes consisting of only
> lowmem.
>
> If zone order is used, GFP_KERNEL allocations to such nodes are actually
> allocated on a node with local affinity that includes ZONE_NORMAL.
>
> This change defaults to node zonelist ordering if any node lacks
> ZONE_NORMAL.
>
> To force zone order, append 'numa_zonelist_order=zone' to the kernel
> command line.
>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> mm/page_alloc.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2582,7 +2582,7 @@ static int default_zonelist_order(void)
> * ZONE_DMA and ZONE_DMA32 can be very small area in the sytem.
> * If they are really small and used heavily, the system can fall
> * into OOM very easily.
> - * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
> + * This function detect ZONE_DMA/DMA32 size and configures zone order.
> */
Spurious change here but it's not very important.
> /* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
> low_kmem_size = 0;
> @@ -2594,6 +2594,15 @@ static int default_zonelist_order(void)
> if (zone_type < ZONE_NORMAL)
> low_kmem_size += z->present_pages;
> total_size += z->present_pages;
> + } else if (zone_type == ZONE_NORMAL) {
> + /*
What if it was ZONE_DMA32?
> + * If any node has only lowmem, then node order
> + * is preferred to allow kernel allocations
> + * locally; otherwise, they can easily infringe
> + * on other nodes when there is an abundance of
> + * lowmem available to allocate from.
> + */
> + return ZONELIST_ORDER_NODE;
It might be clearer if it was done as a similar check later
if (low_kmem_size &&
total_size > average_size && /* ignore small node */
low_kmem_size > total_size * 70/100)
return ZONELIST_ORDER_NODE;
This is saying if low memory is > 70% of total, then use nodes. To take
yours into account, it'd look something like:
if (low_kmem_size && total_size > average_size) {
if (low_kmem_size == total_size)
return ZONELIST_ORDER_ZONE;
if (low_kmem_size > total_size * 70/100)
return ZONELIST_ORDER_NODE;
}
> }
> }
> }
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
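The existing 70% condition quoted above can be written as a standalone predicate. This is a minimal sketch that only mirrors the quoted expression; the function name and the precondition are assumptions, and how `average_size` is computed is left to the caller because the thread does not show it.

```c
#include <assert.h>

/*
 * Mirrors the quoted check: the node has some lowmem, is larger than the
 * average node, and more than 70% of its pages are lowmem.
 */
static int prefers_node_order(unsigned long low_kmem_size,
			      unsigned long total_size,
			      unsigned long average_size)
{
	/* Assumed precondition: lowmem is a subset of the node's pages. */
	assert(low_kmem_size <= total_size);

	return low_kmem_size &&
	       total_size > average_size &&	/* ignore small node */
	       low_kmem_size > total_size * 70 / 100;
}
```

With integer arithmetic, `total_size * 70 / 100` is evaluated left to right, so a 1000-page node needs strictly more than 700 lowmem pages to qualify. A node at or below `average_size` never qualifies, whatever its lowmem share.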
* Re: [patch] mm: default to node zonelist ordering when nodes have only lowmem
2010-03-26 14:07 ` Mel Gorman
@ 2010-03-26 19:05 ` David Rientjes
2010-03-30 10:03 ` Mel Gorman
0 siblings, 1 reply; 4+ messages in thread
From: David Rientjes @ 2010-03-26 19:05 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-mm
On Fri, 26 Mar 2010, Mel Gorman wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2582,7 +2582,7 @@ static int default_zonelist_order(void)
> > * ZONE_DMA and ZONE_DMA32 can be very small area in the sytem.
> > * If they are really small and used heavily, the system can fall
> > * into OOM very easily.
> > - * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
> > + * This function detect ZONE_DMA/DMA32 size and configures zone order.
> > */
>
> Spurious change here but it's not very important.
>
> > /* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
> > low_kmem_size = 0;
> > @@ -2594,6 +2594,15 @@ static int default_zonelist_order(void)
> > if (zone_type < ZONE_NORMAL)
> > low_kmem_size += z->present_pages;
> > total_size += z->present_pages;
> > + } else if (zone_type == ZONE_NORMAL) {
> > + /*
>
> What if it was ZONE_DMA32?
>
This is part of a zone iteration for each node, so if the node consists of
only ZONE_DMA then it wouldn't have a populated ZONE_NORMAL either and
will return ZONELIST_ORDER_NODE on the next iteration.
> > + * If any node has only lowmem, then node order
> > + * is preferred to allow kernel allocations
> > + * locally; otherwise, they can easily infringe
> > + * on other nodes when there is an abundance of
> > + * lowmem available to allocate from.
> > + */
> > + return ZONELIST_ORDER_NODE;
>
> It might be clearer if it was done as a similar check later
>
> if (low_kmem_size &&
> total_size > average_size && /* ignore small node */
> low_kmem_size > total_size * 70/100)
> return ZONELIST_ORDER_NODE;
>
> This is saying if low memory is > 70% of total, then use nodes. To take
> yours into account, it'd look something like;
>
> if (low_kmem_size && total_size > average_size) {
> if (low_kmem_size == total_size)
> return ZONELIST_ORDER_ZONE;
>
> if (low_kmem_size > total_size * 70/100)
> return ZONELIST_ORDER_NODE;
> }
There's no guarantee that we'd ever detect the node consisting of solely
lowmem here since it may be asymmetrically smaller than the average node
size.
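The concern about small nodes can be made concrete with a toy model. This is a sketch under stated assumptions, not the kernel function: `struct toy_node`, the page counts, the plain-mean definition of the average and the fallback to zone order are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

/* Arbitrary values for this sketch; the kernel defines its own constants. */
enum { ZONELIST_ORDER_NODE = 1, ZONELIST_ORDER_ZONE = 2 };

/* Hypothetical per-node page counts, indexed by zone type. */
struct toy_node {
	unsigned long present_pages[MAX_NR_ZONES];
};

static unsigned long node_total(const struct toy_node *node)
{
	unsigned long total = 0;

	for (int z = 0; z < MAX_NR_ZONES; z++)
		total += node->present_pages[z];
	return total;
}

/* The patch's rule: any node with an unpopulated ZONE_NORMAL selects node order. */
static int order_with_patch(const struct toy_node *nodes, size_t nr)
{
	for (size_t n = 0; n < nr; n++)
		if (nodes[n].present_pages[ZONE_NORMAL] == 0)
			return ZONELIST_ORDER_NODE;
	return ZONELIST_ORDER_ZONE;	/* simplification: later checks omitted */
}

/* The "ignore small node" gate: only nodes above the mean size are examined. */
static int passes_size_gate(const struct toy_node *nodes, size_t nr, size_t which)
{
	unsigned long sum = 0;

	assert(nr > 0 && which < nr);
	for (size_t n = 0; n < nr; n++)
		sum += node_total(&nodes[n]);
	return node_total(&nodes[which]) > sum / nr;	/* plain mean is an assumption */
}

/* Made-up machine: a large mixed node and a small DMA32-only node. */
static const struct toy_node example[] = {
	{ { 4000, 996000, 3000000 } },
	{ { 0, 500000, 0 } },
};
```

In this example the mean node size is 2,250,000 pages, so the 500,000-page lowmem-only node fails the size gate and a check placed behind that gate would never look at it. The per-node test from the patch still sees it and selects node order.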
* Re: [patch] mm: default to node zonelist ordering when nodes have only lowmem
2010-03-26 19:05 ` David Rientjes
@ 2010-03-30 10:03 ` Mel Gorman
0 siblings, 0 replies; 4+ messages in thread
From: Mel Gorman @ 2010-03-30 10:03 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-mm
On Fri, Mar 26, 2010 at 12:05:06PM -0700, David Rientjes wrote:
> On Fri, 26 Mar 2010, Mel Gorman wrote:
>
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2582,7 +2582,7 @@ static int default_zonelist_order(void)
> > > * ZONE_DMA and ZONE_DMA32 can be very small area in the sytem.
> > > * If they are really small and used heavily, the system can fall
> > > * into OOM very easily.
> > > - * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
> > > + * This function detect ZONE_DMA/DMA32 size and configures zone order.
> > > */
> >
> > Spurious change here but it's not very important.
> >
> > > /* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
> > > low_kmem_size = 0;
> > > @@ -2594,6 +2594,15 @@ static int default_zonelist_order(void)
> > > if (zone_type < ZONE_NORMAL)
> > > low_kmem_size += z->present_pages;
> > > total_size += z->present_pages;
> > > + } else if (zone_type == ZONE_NORMAL) {
> > > + /*
> >
> > What if it was ZONE_DMA32?
> >
>
> This is part of a zone iteration for each node, so if the node consists of
> only ZONE_DMA then it wouldn't have a populated ZONE_NORMAL either and
> will return ZONELIST_ORDER_NODE on the next iteration.
>
Yep. Made sense when I wrote out an example.
> > > + * If any node has only lowmem, then node order
> > > + * is preferred to allow kernel allocations
> > > + * locally; otherwise, they can easily infringe
> > > + * on other nodes when there is an abundance of
> > > + * lowmem available to allocate from.
> > > + */
> > > + return ZONELIST_ORDER_NODE;
> >
> > It might be clearer if it was done as a similar check later
> >
> > if (low_kmem_size &&
> > total_size > average_size && /* ignore small node */
> > low_kmem_size > total_size * 70/100)
> > return ZONELIST_ORDER_NODE;
> >
> > This is saying if low memory is > 70% of total, then use nodes. To take
> > yours into account, it'd look something like;
> >
> > if (low_kmem_size && total_size > average_size) {
> > if (low_kmem_size == total_size)
> > return ZONELIST_ORDER_ZONE;
> >
> > if (low_kmem_size > total_size * 70/100)
> > return ZONELIST_ORDER_NODE;
> > }
>
> There's no guarantee that we'd ever detect the node consisting of solely
> lowmem here since it may be asymmetrically smaller than the average node
> size.
>
True. I wasn't sure if it was intentional or not to take even small
nodes into account for this ordering.
If it's intentional, I see no problem with the patch. It seems like a
reasonable default decision to me.
Acked-by: Mel Gorman <mel@csn.ul.ie>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab