* [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
@ 2007-08-02 17:21 Mel Gorman
2007-08-02 19:41 ` Lee Schermerhorn
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Mel Gorman @ 2007-08-02 17:21 UTC (permalink / raw)
To: akpm; +Cc: Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm, linux-kernel
The NUMA layer only supports NUMA policies for the highest zone. When
ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
ZONE_MOVABLE. The result is that policies are only applied to allocations
like anonymous pages and page cache allocated from ZONE_MOVABLE when the
zone is used.
This patch applies policies to the two highest zones when the highest zone
is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
zone, it's always functionally equivalent.
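
To illustrate the effect, a minimal standalone sketch (plain C, not kernel
code; the zone numbering below is only illustrative):

#include <stdio.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE,
                 MAX_NR_ZONES };

static enum zone_type policy_zone;      /* starts at ZONE_DMA (0) */

/* Pre-patch: the numerically highest zone wins, so with kernelcore=
 * policy_zone ends up at ZONE_MOVABLE. */
static void check_highest_zone_old(enum zone_type k)
{
        if (k > policy_zone)
                policy_zone = k;
}

/* Patched: ZONE_MOVABLE is skipped, so policy_zone stays at the highest
 * "real" zone and policies cover the top two zones. */
static void check_highest_zone_new(enum zone_type k)
{
        if (k > policy_zone && k != ZONE_MOVABLE)
                policy_zone = k;
}

int main(void)
{
        enum zone_type k;

        for (k = ZONE_DMA; k < MAX_NR_ZONES; k++)
                check_highest_zone_old(k);
        printf("pre-patch: policy_zone == ZONE_MOVABLE? %d\n",
               policy_zone == ZONE_MOVABLE);    /* prints 1 */

        policy_zone = ZONE_DMA;
        for (k = ZONE_DMA; k < MAX_NR_ZONES; k++)
                check_highest_zone_new(k);
        printf("patched:   policy_zone == ZONE_HIGHMEM? %d\n",
               policy_zone == ZONE_HIGHMEM);    /* prints 1 */
        return 0;
}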
The patch has been tested on a variety of machines both NUMA and non-NUMA
covering x86, x86_64 and ppc64. No abnormal results were seen in kernbench,
tbench, dbench or hackbench. It passes regression tests from the numactl
package with and without kernelcore= once numactl tests are patched to
wait for vmstat counters to update.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/mempolicy.h | 2 +-
include/linux/mmzone.h | 18 ++++++++++++++++++
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 13 +++++++++++++
4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index e147cf5..5bdd656 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -166,7 +166,7 @@ extern enum zone_type policy_zone;
static inline void check_highest_zone(enum zone_type k)
{
- if (k > policy_zone)
+ if (k > policy_zone && k != ZONE_MOVABLE)
policy_zone = k;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3ea68cd..4e56273 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -410,6 +410,24 @@ struct zonelist {
#endif
};
+#ifdef CONFIG_NUMA
+/*
+ * Only custom zonelists like MPOL_BIND need to be filtered as part of
+ * policies. As described in the comment for struct zonelist_cache, these
+ * zonelists will not have a zlcache so zlcache_ptr will not be set. Use
that to determine if the zonelist needs to be filtered or not.
+ */
+static inline int alloc_should_filter_zonelist(struct zonelist *zonelist)
+{
+ return !zonelist->zlcache_ptr;
+}
+#else
+static inline int alloc_should_filter_zonelist(struct zonelist *zonelist)
+{
+ return 0;
+}
+#endif /* CONFIG_NUMA */
+
#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
struct node_active_region {
unsigned long start_pfn;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 71b84b4..172abff 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -149,7 +149,7 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes)
lower zones etc. Avoid empty zones because the memory allocator
doesn't like them. If you implement node hot removal you
have to fix that. */
- k = policy_zone;
+ k = MAX_NR_ZONES - 1;
while (1) {
for_each_node_mask(nd, *nodes) {
struct zone *z = &NODE_DATA(nd)->node_zones[k];
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3da85b8..6427653 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1157,6 +1157,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order,
nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */
int zlc_active = 0; /* set if using zonelist_cache */
int did_zlc_setup = 0; /* just call zlc_setup() one time */
+ enum zone_type highest_zoneidx = -1; /* Gets set for policy zonelists */
zonelist_scan:
/*
@@ -1166,6 +1167,18 @@ zonelist_scan:
z = zonelist->zones;
do {
+ /*
+ * In NUMA, this could be a policy zonelist which contains
+ * zones that may not be allowed by the current gfp_mask.
+ * Check the zone is allowed by the current flags
+ */
+ if (unlikely(alloc_should_filter_zonelist(zonelist))) {
+ if (highest_zoneidx == -1)
+ highest_zoneidx = gfp_zone(gfp_mask);
+ if (zone_idx(*z) > highest_zoneidx)
+ continue;
+ }
+
if (NUMA_BUILD && zlc_active &&
!zlc_zone_worth_trying(zonelist, z, allowednodes))
continue;
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
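
To see what the new check buys, here is a minimal standalone model of the
gfp-limit filtering that the get_page_from_freelist() hunk above performs
(made-up types for illustration, not the kernel's structures):

#include <stddef.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

struct zone {
        enum zone_type idx;
        const char *name;
};

/* Model of scanning an MPOL_BIND-style zonelist: such lists carry no
 * zlcache, and after the mempolicy.c change they are built from
 * MAX_NR_ZONES - 1 down, so they can contain zones above what the
 * caller's gfp_mask allows. Each entry is checked against the limit. */
static struct zone *first_allowed_zone(struct zone **zonelist,
                                       enum zone_type highest_zoneidx)
{
        struct zone **z;

        for (z = zonelist; *z != NULL; z++) {
                if ((*z)->idx > highest_zoneidx)
                        continue;       /* not allowed by this gfp_mask */
                return *z;              /* first zone worth trying */
        }
        return NULL;
}

A GFP_KERNEL-style request (highest_zoneidx == ZONE_NORMAL) would then skip
any ZONE_HIGHMEM or ZONE_MOVABLE entries in the bound zonelist rather than
allocating from them.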
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-02 17:21 [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE Mel Gorman
@ 2007-08-02 19:41 ` Lee Schermerhorn
  2007-08-02 20:45 ` Christoph Lameter
  2007-08-03 22:02 ` Andi Kleen
  2 siblings, 0 replies; 23+ messages in thread
From: Lee Schermerhorn @ 2007-08-02 19:41 UTC (permalink / raw)
To: Mel Gorman; +Cc: akpm, clameter, kamezawa.hiroyu, linux-mm, linux-kernel

On Thu, 2007-08-02 at 18:21 +0100, Mel Gorman wrote:
> The NUMA layer only supports NUMA policies for the highest zone. When
> ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
> ZONE_MOVABLE. The result is that policies are only applied to allocations
> like anonymous pages and page cache allocated from ZONE_MOVABLE when the
> zone is used.
>
> This patch applies policies to the two highest zones when the highest zone
> is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
> zone, it's always functionally equivalent.
>
> The patch has been tested on a variety of machines both NUMA and non-NUMA
> covering x86, x86_64 and ppc64. No abnormal results were seen in kernbench,
> tbench, dbench or hackbench. It passes regression tests from the numactl
> package with and without kernelcore= once numactl tests are patched to
> wait for vmstat counters to update.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

And on an ia64 NUMA platform with some ad hoc, interactive functional
testing with memtoy and an overnight run of a usex job mix. The job mix
included a 32-way kernel build, several povray tracing apps, IO tests,
sequential and random vm fault tests and a number of 'bin' tests that
simulate half a dozen crazed monkeys pounding away at keyboards entering
surprisingly error-free commands. All of these loop until stopped or the
system hangs/crashes--which it didn't...

Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-02 17:21 [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE Mel Gorman
  2007-08-02 19:41 ` Lee Schermerhorn
@ 2007-08-02 20:45 ` Christoph Lameter
  2007-08-06 19:44   ` Andrew Morton
  2007-08-06 21:56   ` Paul Jackson
  2007-08-03 22:02 ` Andi Kleen
  2 siblings, 2 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-08-02 20:45 UTC (permalink / raw)
To: Mel Gorman
Cc: akpm, pj, Lee.Schermerhorn, kamezawa.hiroyu, linux-mm, linux-kernel

On Thu, 2 Aug 2007, Mel Gorman wrote:

> +#ifdef CONFIG_NUMA
> +/*
> + * Only custom zonelists like MPOL_BIND need to be filtered as part of
> + * policies. As described in the comment for struct zonelist_cache, these
> + * zonelists will not have a zlcache so zlcache_ptr will not be set. Use
> + * that to determine if the zonelist needs to be filtered or not.
> + */
> +static inline int alloc_should_filter_zonelist(struct zonelist *zonelist)
> +{
> +        return !zonelist->zlcache_ptr;
> +}

I guess Paul needs to have a look at this one. Otherwise

Acked-by: Christoph Lameter <clameter@sgi.com>

> @@ -1166,6 +1167,18 @@ zonelist_scan:
>          z = zonelist->zones;
>
>          do {
> +                /*
> +                 * In NUMA, this could be a policy zonelist which contains
> +                 * zones that may not be allowed by the current gfp_mask.
> +                 * Check the zone is allowed by the current flags
> +                 */
> +                if (unlikely(alloc_should_filter_zonelist(zonelist))) {
> +                        if (highest_zoneidx == -1)
> +                                highest_zoneidx = gfp_zone(gfp_mask);
> +                        if (zone_idx(*z) > highest_zoneidx)
> +                                continue;
> +                }
> +
>                 if (NUMA_BUILD && zlc_active &&

Hotpath. Sigh.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-02 20:45 ` Christoph Lameter
@ 2007-08-06 19:44 ` Andrew Morton
  2007-08-06 20:13   ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2007-08-06 19:44 UTC (permalink / raw)
To: Christoph Lameter
Cc: Mel Gorman, pj, Lee.Schermerhorn, kamezawa.hiroyu, linux-mm,
    linux-kernel, Paul Mundt

On Thu, 2 Aug 2007 13:45:23 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 2 Aug 2007, Mel Gorman wrote:
>
> > +#ifdef CONFIG_NUMA
> > +/*
> > + * Only custom zonelists like MPOL_BIND need to be filtered as part of
> > + * policies. As described in the comment for struct zonelist_cache, these
> > + * zonelists will not have a zlcache so zlcache_ptr will not be set. Use
> > + * that to determine if the zonelist needs to be filtered or not.
> > + */
> > +static inline int alloc_should_filter_zonelist(struct zonelist *zonelist)
> > +{
> > +        return !zonelist->zlcache_ptr;
> > +}
>
> I guess Paul needs to have a look at this one.

Which Paul?

> Otherwise
>
> Acked-by: Christoph Lameter <clameter@sgi.com>
>
> > @@ -1166,6 +1167,18 @@ zonelist_scan:
> >          z = zonelist->zones;
> >
> >          do {
> > +                /*
> > +                 * In NUMA, this could be a policy zonelist which contains
> > +                 * zones that may not be allowed by the current gfp_mask.
> > +                 * Check the zone is allowed by the current flags
> > +                 */
> > +                if (unlikely(alloc_should_filter_zonelist(zonelist))) {
> > +                        if (highest_zoneidx == -1)
> > +                                highest_zoneidx = gfp_zone(gfp_mask);
> > +                        if (zone_idx(*z) > highest_zoneidx)
> > +                                continue;
> > +                }
> > +
> >                 if (NUMA_BUILD && zlc_active &&
>
> Hotpath. Sigh.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 19:44 ` Andrew Morton
@ 2007-08-06 20:13 ` Christoph Lameter
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-08-06 20:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, pj, Lee.Schermerhorn, kamezawa.hiroyu, linux-mm,
    linux-kernel, Paul Mundt

On Mon, 6 Aug 2007, Andrew Morton wrote:

> Which Paul?

Paul Jackson.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-02 20:45 ` Christoph Lameter
  2007-08-06 19:44 ` Andrew Morton
@ 2007-08-06 21:56 ` Paul Jackson
  1 sibling, 0 replies; 23+ messages in thread
From: Paul Jackson @ 2007-08-06 21:56 UTC (permalink / raw)
To: Christoph Lameter
Cc: mel, akpm, Lee.Schermerhorn, kamezawa.hiroyu, linux-mm, linux-kernel

Christoph wrote, responding to Mel and Andrew:
> > +static inline int alloc_should_filter_zonelist(struct zonelist *zonelist)
> > +{
> > +        return !zonelist->zlcache_ptr;
> > +}
>
> I guess Paul needs to have a look at this one.
...
> > Which Paul?
>
> Paul Jackson.

I'll ack that the above snippet of code, alloc_should_filter_zonelist(),
does, as its comment explains, return true iff it's a custom zonelist
such as from MPOL_BIND.

As to the more interesting issues that this patch raises ... I have no clue.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-02 17:21 [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE Mel Gorman
  2007-08-02 19:41 ` Lee Schermerhorn
  2007-08-02 20:45 ` Christoph Lameter
@ 2007-08-03 22:02 ` Andi Kleen
  2007-08-04  0:23   ` Mel Gorman
  2007-08-06 19:15   ` Andrew Morton
  2 siblings, 2 replies; 23+ messages in thread
From: Andi Kleen @ 2007-08-03 22:02 UTC (permalink / raw)
To: Mel Gorman
Cc: akpm, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm, linux-kernel

On Thursday 02 August 2007 19:21:18 Mel Gorman wrote:
> The NUMA layer only supports NUMA policies for the highest zone. When
> ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
> ZONE_MOVABLE. The result is that policies are only applied to allocations
> like anonymous pages and page cache allocated from ZONE_MOVABLE when the
> zone is used.
>
> This patch applies policies to the two highest zones when the highest zone
> is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
> zone, it's always functionally equivalent.
>
> The patch has been tested on a variety of machines both NUMA and non-NUMA
> covering x86, x86_64 and ppc64. No abnormal results were seen in kernbench,
> tbench, dbench or hackbench. It passes regression tests from the numactl
> package with and without kernelcore= once numactl tests are patched to
> wait for vmstat counters to update.

I must honestly say I really hate the patch. It's a horrible hack and makes
fast paths slower. When I designed mempolicies I especially tried to avoid
things like that, please don't add them through the backdoor now.

-Andi
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-03 22:02 ` Andi Kleen
@ 2007-08-04  0:23 ` Mel Gorman
  2007-08-04  8:51   ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Mel Gorman @ 2007-08-04 0:23 UTC (permalink / raw)
To: Andi Kleen
Cc: akpm, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm, linux-kernel

On (04/08/07 00:02), Andi Kleen didst pronounce:
> On Thursday 02 August 2007 19:21:18 Mel Gorman wrote:
> > The NUMA layer only supports NUMA policies for the highest zone. When
> > ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
> > ZONE_MOVABLE. The result is that policies are only applied to allocations
> > like anonymous pages and page cache allocated from ZONE_MOVABLE when the
> > zone is used.
> >
> > This patch applies policies to the two highest zones when the highest zone
> > is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
> > zone, it's always functionally equivalent.
> >
> > The patch has been tested on a variety of machines both NUMA and non-NUMA
> > covering x86, x86_64 and ppc64. No abnormal results were seen in kernbench,
> > tbench, dbench or hackbench. It passes regression tests from the numactl
> > package with and without kernelcore= once numactl tests are patched to
> > wait for vmstat counters to update.
>
> I must honestly say I really hate the patch. It's a horrible hack and makes
> fast paths slower. When I designed mempolicies I especially tried to avoid
> things like that, please don't add them through the backdoor now.
>

It only affects hot paths in the NUMA case so non-NUMA users will not
care. For NUMA users, I have posted patches that eliminate multiple
zonelists altogether which will reduce cache footprint (something like 7K
per node on x86_64) and make things like MPOL_BIND behave in a consistent
manner. That would cost on CPU but save on cache which would (hopefully)
result in a net gain in most cases.

I would like to go with this patch for now just for policies but for
2.6.23, we could leave it as "policies only apply to ZONE_MOVABLE when it
is used" if you really insisted on it. It's less than ideal though for
sure.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
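
For orientation, a schematic of the data-structure change Mel is describing
(standalone and heavily simplified C; the real pg_data_t carries many more
fields, and the numbers quoted above are his measurements):

#define MAX_NR_ZONES            4   /* e.g. DMA, NORMAL, HIGHMEM, MOVABLE */
#define MAX_ZONES_PER_ZONELIST  64  /* illustrative only */

struct zone;                        /* opaque here */

struct zonelist {
        struct zone *zones[MAX_ZONES_PER_ZONELIST + 1]; /* NULL-terminated */
};

/* Today: one zonelist per possible "highest zone" of a gfp_mask, per node */
struct pglist_data_now {
        struct zonelist node_zonelists[MAX_NR_ZONES];
};

/* Proposed: a single zonelist per node, filtered at allocation time
 * against gfp_zone(gfp_mask); see the prototype patch Mel posts later
 * in the thread */
struct pglist_data_proposed {
        struct zonelist node_zonelist;
};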
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-04  0:23 ` Mel Gorman
@ 2007-08-04  8:51 ` Andi Kleen
  2007-08-04 16:39   ` Mel Gorman
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2007-08-04 8:51 UTC (permalink / raw)
To: Mel Gorman
Cc: akpm, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm, linux-kernel

> It only affects hot paths in the NUMA case so non-NUMA users will not care.

For x86-64 most distribution kernels are NUMA these days.

> For NUMA users, I have posted patches that eliminate multiple zonelists
> altogether which will reduce cache footprint (something like 7K per node on
> x86_64)

How do you get to 7k? We got worst case 3 zones per node (normally less);
that's three pointers per GFP level.

> and make things like MPOL_BIND behave in a consistent manner. That
> would cost on CPU but save on cache which would (hopefully) result in a net
> gain in most cases.

That might be a good tradeoff, but without seeing the patch
the 7k number sounds very dubious.

> I would like to go with this patch for now just for policies but for
> 2.6.23, we could leave it as "policies only apply to ZONE_MOVABLE when it
> is used" if you really insisted on it. It's less than ideal though for
> sure.

Or disable ZONE_MOVABLE. It seems to be clearly not well thought out yet.
Perhaps make it dependent on !CONFIG_NUMA.

-Andi
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-04  8:51 ` Andi Kleen
@ 2007-08-04 16:39 ` Mel Gorman
  0 siblings, 0 replies; 23+ messages in thread
From: Mel Gorman @ 2007-08-04 16:39 UTC (permalink / raw)
To: Andi Kleen
Cc: akpm, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm, linux-kernel

On (04/08/07 10:51), Andi Kleen didst pronounce:
>
> > It only affects hot paths in the NUMA case so non-NUMA users will not care.
>
> For x86-64 most distribution kernels are NUMA these days.
>
> > For NUMA users, I have posted patches that eliminate multiple zonelists
> > altogether which will reduce cache footprint (something like 7K per node on
> > x86_64)
>
> How do you get to 7k? We got worst case 3 zones per node (normally less);
> that's three pointers per GFP level.
>

The zonelists are pretty big. On a 4 node x86_64 machine (elm3b6 from tko),
the size of pg_data_t goes from 13632 bytes to 5824 (almost 8k in fact)
when only one zonelist is used.

> > and make things like MPOL_BIND behave in a consistent manner. That
> > would cost on CPU but save on cache which would (hopefully) result in a net
> > gain in most cases.
>
> That might be a good tradeoff, but without seeing the patch
> the 7k number sounds very dubious.
>

Proof-of-concept patch is below. It's not suitable for merging and I was
getting the policy issue resolved first before spending more time on it.
The patch was a bit too heavy to call a fix for a bug.

> > I would like to go with this patch for now just for policies but for
> > 2.6.23, we could leave it as "policies only apply to ZONE_MOVABLE when it
> > is used" if you really insisted on it. It's less than ideal though for
> > sure.
>
> Or disable ZONE_MOVABLE. It seems to be clearly not well thought out yet.

The zone is disabled by default. When enabled, the policies are only
applied to it which is expected, but not desirable, which is why I wanted
to apply policies to the two highest zones when the highest was
ZONE_MOVABLE.

> Perhaps make it dependent on !CONFIG_NUMA.
>

That would make no sense. The systems that will be using hugepages and
looking to resize their pool will often be NUMA machines and you state
that most x86_64 distros will have NUMA enabled.

This is the prototype patch for removing multiple zonelists altogether.
It would also act as a fix for the policies-only-applying-to-ZONE_MOVABLE
problem. You may note that the filtering takes place in the same place in
__alloc_pages() as with the patch to fix policies, so there is a logical
progression from bug fix now to something with wider usefulness later.

diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index e724b36..4d417c4 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -602,12 +602,15 @@ void show_mem(void)
 		int i, j, k;

 		for (i = 0; i < npmem_ranges; i++) {
+			zl = &NODE_DATA(i)->node_zonelist;
 			for (j = 0; j < MAX_NR_ZONES; j++) {
-				zl = NODE_DATA(i)->node_zonelists + j;
 				printk("Zone list for zone %d on node %d: ", j, i);
-				for (k = 0; zl->zones[k] != NULL; k++)
+				for (k = 0; zl->zones[k] != NULL; k++) {
+					if (should_filter_zone(zl->zones[k], j))
+						continue;
 					printk("[%ld/%s] ", zone_to_nid(zl->zones[k]),
 						zl->zones[k]->name);
+				}
 				printk("\n");
 			}
 		}
diff --git a/drivers/char/sysrq.c b/drivers/char/sysrq.c
index 39cc318..b56d17f 100644
--- a/drivers/char/sysrq.c
+++ b/drivers/char/sysrq.c
@@ -270,7 +270,7 @@ static struct sysrq_key_op sysrq_term_op = {

 static void moom_callback(struct work_struct *ignored)
 {
-	out_of_memory(&NODE_DATA(0)->node_zonelists[ZONE_NORMAL],
+	out_of_memory(&NODE_DATA(0)->node_zonelist,
 			GFP_KERNEL, 0);
 }
diff --git a/fs/buffer.c b/fs/buffer.c
index 0e5ec37..8e9bbef 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -354,7 +354,7 @@ static void free_more_memory(void)
 	yield();

 	for_each_online_pgdat(pgdat) {
-		zones = pgdat->node_zonelists[gfp_zone(GFP_NOFS)].zones;
+		zones = pgdat->node_zonelist.zones;
 		if (*zones)
 			try_to_free_pages(zones, 0, GFP_NOFS);
 	}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index bc68dd9..f2a597e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -116,6 +116,13 @@ static inline enum zone_type gfp_zone(gfp_t flags)
 	return ZONE_NORMAL;
 }

+static inline int should_filter_zone(struct zone *zone, int highest_zoneidx)
+{
+	if (zone_idx(zone) > highest_zoneidx)
+		return 1;
+	return 0;
+}
+
 /*
  * There is only one page-allocator function, and two main namespaces to
  * it. The alloc_page*() variants return 'struct page *' and as such
@@ -151,8 +158,7 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	if (nid < 0)
 		nid = numa_node_id();

-	return __alloc_pages(gfp_mask, order,
-		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+	return __alloc_pages(gfp_mask, order, &NODE_DATA(nid)->node_zonelist);
 }

 #ifdef CONFIG_NUMA
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index e147cf5..83e5256 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -166,7 +166,7 @@ extern enum zone_type policy_zone;

 static inline void check_highest_zone(enum zone_type k)
 {
-	if (k > policy_zone)
+	if (k > policy_zone && k != ZONE_MOVABLE)
 		policy_zone = k;
 }

@@ -258,7 +258,7 @@ static inline void mpol_fix_fork_child_flag(struct task_struct *p)
 static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma,
 		unsigned long addr, gfp_t gfp_flags)
 {
-	return NODE_DATA(0)->node_zonelists + gfp_zone(gfp_flags);
+	return &NODE_DATA(0)->node_zonelist;
 }

 static inline int do_migrate_pages(struct mm_struct *mm,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3ea68cd..d2fe32e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -201,6 +201,7 @@ struct zone {
 	 */
 	unsigned long lowmem_reserve[MAX_NR_ZONES];

+	int zone_idx;
 #ifdef CONFIG_NUMA
 	int node;
 	/*
@@ -437,7 +438,7 @@ extern struct page *mem_map;
 struct bootmem_data;
 typedef struct pglist_data {
 	struct zone node_zones[MAX_NR_ZONES];
-	struct zonelist node_zonelists[MAX_NR_ZONES];
+	struct zonelist node_zonelist;
 	int nr_zones;
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 	struct page *node_mem_map;
@@ -501,7 +502,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 /*
  * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
  */
-#define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
+#define zone_idx(zone)		((zone)->zone_idx)

 static inline int populated_zone(struct zone *zone)
 {
@@ -543,7 +544,7 @@ static inline int is_normal_idx(enum zone_type idx)
 static inline int is_highmem(struct zone *zone)
 {
 #ifdef CONFIG_HIGHMEM
-	int zone_idx = zone - zone->zone_pgdat->node_zones;
+	int zone_idx = zone_idx(zone);
 	return zone_idx == ZONE_HIGHMEM ||
 		(zone_idx == ZONE_MOVABLE && zone_movable_is_highmem());
 #else
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 71b84b4..8b16ca3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -149,7 +149,7 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes)
 	   lower zones etc. Avoid empty zones because the memory allocator
 	   doesn't like them. If you implement node hot removal you
 	   have to fix that. */
-	k = policy_zone;
+	k = MAX_NR_ZONES - 1;
 	while (1) {
 		for_each_node_mask(nd, *nodes) {
 			struct zone *z = &NODE_DATA(nd)->node_zones[k];
@@ -1116,7 +1116,7 @@ static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy)
 		nd = 0;
 		BUG();
 	}
-	return NODE_DATA(nd)->node_zonelists + gfp_zone(gfp);
+	return &NODE_DATA(nd)->node_zonelist;
 }

 /* Do dynamic interleaving for a process */
@@ -1212,7 +1212,7 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
 		unsigned nid;

 		nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
-		return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
+		return &NODE_DATA(nid)->node_zonelist;
 	}
 	return zonelist_policy(GFP_HIGHUSER, pol);
 }
@@ -1226,7 +1226,7 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
 	struct zonelist *zl;
 	struct page *page;

-	zl = NODE_DATA(nid)->node_zonelists + gfp_zone(gfp);
+	zl = &NODE_DATA(nid)->node_zonelist;
 	page = __alloc_pages(gfp, order, zl);
 	if (page && page_zone(page) == zl->zones[0])
 		inc_zone_page_state(page, NUMA_INTERLEAVE_HIT);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f9b82ad..1cca18e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -179,6 +179,7 @@ static inline int constrained_alloc(struct zonelist *zonelist, gfp_t gfp_mask)
 	struct zone **z;
 	nodemask_t nodes;
 	int node;
+	enum zone_type highest_zoneidx = gfp_zone(gfp_mask);

 	nodes_clear(nodes);
 	/* node has memory ? */
@@ -186,11 +187,15 @@ static inline int constrained_alloc(struct zonelist *zonelist, gfp_t gfp_mask)
 		if (NODE_DATA(node)->node_present_pages)
 			node_set(node, nodes);

-	for (z = zonelist->zones; *z; z++)
+	for (z = zonelist->zones; *z; z++) {
+
+		if (should_filter_zone(*z, highest_zoneidx))
+			continue;
 		if (cpuset_zone_allowed_softwall(*z, gfp_mask))
 			node_clear(zone_to_nid(*z), nodes);
 		else
 			return CONSTRAINT_CPUSET;
+	}

 	if (!nodes_empty(nodes))
 		return CONSTRAINT_MEMORY_POLICY;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3da85b8..190994d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1157,6 +1157,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order,
 	nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */
 	int zlc_active = 0;		/* set if using zonelist_cache */
 	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
+	enum zone_type highest_zoneidx = gfp_zone(gfp_mask);

 zonelist_scan:
 	/*
@@ -1166,6 +1167,9 @@ zonelist_scan:
 	z = zonelist->zones;

 	do {
+		if (should_filter_zone(*z, highest_zoneidx))
+			continue;
+
 		if (NUMA_BUILD && zlc_active &&
 			!zlc_zone_worth_trying(zonelist, z, allowednodes))
 				continue;
@@ -1460,11 +1464,11 @@ static unsigned int nr_free_zone_pages(int offset)
 	pg_data_t *pgdat = NODE_DATA(numa_node_id());
 	unsigned int sum = 0;

-	struct zonelist *zonelist = pgdat->node_zonelists + offset;
-	struct zone **zonep = zonelist->zones;
-	struct zone *zone;
+	struct zone **zonep = pgdat->node_zonelist.zones;
+	struct zone *zone = *zonep;

-	for (zone = *zonep++; zone; zone = *zonep++) {
+	for (zone = *zonep++; zone && zone_idx(zone) > offset; zone = *zonep++);
+	for (; zone; zone = *zonep++) {
 		unsigned long size = zone->present_pages;
 		unsigned long high = zone->pages_high;
 		if (size > high)
@@ -1823,17 +1827,14 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
  */
 static void build_zonelists_in_node_order(pg_data_t *pgdat, int node)
 {
-	enum zone_type i;
 	int j;
 	struct zonelist *zonelist;

-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		zonelist = pgdat->node_zonelists + i;
-		for (j = 0; zonelist->zones[j] != NULL; j++)
-			;
-		j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
-		zonelist->zones[j] = NULL;
-	}
+	zonelist = &pgdat->node_zonelist;
+	for (j = 0; zonelist->zones[j] != NULL; j++)
+		;
+	j = build_zonelists_node(NODE_DATA(node), zonelist, j, MAX_NR_ZONES-1);
+	zonelist->zones[j] = NULL;
 }

 /*
@@ -1846,27 +1847,24 @@ static int node_order[MAX_NUMNODES];

 static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes)
 {
-	enum zone_type i;
 	int pos, j, node;
 	int zone_type;		/* needs to be signed */
 	struct zone *z;
 	struct zonelist *zonelist;

-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		zonelist = pgdat->node_zonelists + i;
-		pos = 0;
-		for (zone_type = i; zone_type >= 0; zone_type--) {
-			for (j = 0; j < nr_nodes; j++) {
-				node = node_order[j];
-				z = &NODE_DATA(node)->node_zones[zone_type];
-				if (populated_zone(z)) {
-					zonelist->zones[pos++] = z;
-					check_highest_zone(zone_type);
-				}
+	zonelist = &pgdat->node_zonelist;
+	pos = 0;
+	for (zone_type = MAX_NR_ZONES-1; zone_type >= 0; zone_type--) {
+		for (j = 0; j < nr_nodes; j++) {
+			node = node_order[j];
+			z = &NODE_DATA(node)->node_zones[zone_type];
+			if (populated_zone(z)) {
+				zonelist->zones[pos++] = z;
+				check_highest_zone(zone_type);
 			}
 		}
-		zonelist->zones[pos] = NULL;
 	}
+	zonelist->zones[pos] = NULL;
 }

 static int default_zonelist_order(void)
@@ -1933,17 +1931,14 @@ static void set_zonelist_order(void)
 static void build_zonelists(pg_data_t *pgdat)
 {
 	int j, node, load;
-	enum zone_type i;
 	nodemask_t used_mask;
 	int local_node, prev_node;
 	struct zonelist *zonelist;
 	int order = current_zonelist_order;

-	/* initialize zonelists */
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		zonelist = pgdat->node_zonelists + i;
-		zonelist->zones[0] = NULL;
-	}
+	/* initialize zonelist */
+	zonelist = &pgdat->node_zonelist;
+	zonelist->zones[0] = NULL;

 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
@@ -1997,7 +1992,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
 		struct zonelist_cache *zlc;
 		struct zone **z;

-		zonelist = pgdat->node_zonelists + i;
+		zonelist = &pgdat->node_zonelist;
 		zonelist->zlcache_ptr = zlc = &zonelist->zlcache;
 		bitmap_zero(zlc->fullzones, MAX_ZONES_PER_ZONELIST);
 		for (z = zonelist->zones; *z; z++)
@@ -2016,36 +2011,36 @@ static void set_zonelist_order(void)
 static void build_zonelists(pg_data_t *pgdat)
 {
 	int node, local_node;
-	enum zone_type i,j;
+	enum zone_type j;
+	struct zonelist *zonelist;

 	local_node = pgdat->node_id;
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		struct zonelist *zonelist;

-		zonelist = pgdat->node_zonelists + i;
-
-		j = build_zonelists_node(pgdat, zonelist, 0, i);
-		/*
-		 * Now we build the zonelist so that it contains the zones
-		 * of all the other nodes.
-		 * We don't want to pressure a particular node, so when
-		 * building the zones for node N, we make sure that the
-		 * zones coming right after the local ones are those from
-		 * node N+1 (modulo N)
-		 */
-		for (node = local_node + 1; node < MAX_NUMNODES; node++) {
-			if (!node_online(node))
-				continue;
-			j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
-		}
-		for (node = 0; node < local_node; node++) {
-			if (!node_online(node))
-				continue;
-			j = build_zonelists_node(NODE_DATA(node), zonelist, j, i);
-		}
+	zonelist = &pgdat->node_zonelist;
+	j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES-1);

-		zonelist->zones[j] = NULL;
+	/*
+	 * Now we build the zonelist so that it contains the zones
+	 * of all the other nodes.
+	 * We don't want to pressure a particular node, so when
+	 * building the zones for node N, we make sure that the
+	 * zones coming right after the local ones are those from
+	 * node N+1 (modulo N)
+	 */
+	for (node = local_node + 1; node < MAX_NUMNODES; node++) {
+		if (!node_online(node))
+			continue;
+		j = build_zonelists_node(NODE_DATA(node), zonelist, j,
+							MAX_NR_ZONES-1);
 	}
+	for (node = 0; node < local_node; node++) {
+		if (!node_online(node))
+			continue;
+		j = build_zonelists_node(NODE_DATA(node), zonelist, j,
+							MAX_NR_ZONES-1);
+	}
+
+	zonelist->zones[j] = NULL;
 }

 /* non-NUMA variant of zonelist performance cache - just NULL zlcache_ptr */
@@ -2054,7 +2049,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
 	int i;

 	for (i = 0; i < MAX_NR_ZONES; i++)
-		pgdat->node_zonelists[i].zlcache_ptr = NULL;
+		pgdat->node_zonelist.zlcache_ptr = NULL;
 }

 #endif	/* CONFIG_NUMA */
@@ -2940,6 +2935,7 @@ static void __meminit free_area_init_core(struct pglist_data *pgdat,
 		nr_kernel_pages += realsize;
 		nr_all_pages += realsize;

+		zone->zone_idx = j;
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
 #ifdef CONFIG_NUMA
diff --git a/mm/slab.c b/mm/slab.c
index a684778..558cf96 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3216,12 +3216,12 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
 	struct zone **z;
 	void *obj = NULL;
 	int nid;
+	enum zone_type highest_zoneidx = gfp_zone(flags);

 	if (flags & __GFP_THISNODE)
 		return NULL;

-	zonelist = &NODE_DATA(slab_node(current->mempolicy))
-			->node_zonelists[gfp_zone(flags)];
+	zonelist = &NODE_DATA(slab_node(current->mempolicy))->node_zonelist;
 	local_flags = (flags & GFP_LEVEL_MASK);

 retry:
@@ -3230,6 +3230,9 @@ retry:
 	 * from existing per node queues.
 	 */
 	for (z = zonelist->zones; *z && !obj; z++) {
+		if (should_filter_zone(*z, highest_zoneidx))
+			continue;
+
 		nid = zone_to_nid(*z);

 		if (cpuset_zone_allowed_hardwall(*z, flags) &&
diff --git a/mm/slub.c b/mm/slub.c
index 6c6d74f..eea184b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1276,6 +1276,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 	struct zonelist *zonelist;
 	struct zone **z;
 	struct page *page;
+	enum zone_type highest_zoneidx = gfp_zone(flags);

 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -1298,11 +1299,13 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 	if (!s->defrag_ratio || get_cycles() % 1024 > s->defrag_ratio)
 		return NULL;

-	zonelist = &NODE_DATA(slab_node(current->mempolicy))
-			->node_zonelists[gfp_zone(flags)];
+	zonelist = &NODE_DATA(slab_node(current->mempolicy))->node_zonelist;
 	for (z = zonelist->zones; *z; z++) {
 		struct kmem_cache_node *n;

+		if (should_filter_zone(*z, highest_zoneidx))
+			continue;
+
 		n = get_node(s, zone_to_nid(*z));

 		if (n && cpuset_zone_allowed_hardwall(*z, flags) &&
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d419e10..8672d61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1124,6 +1124,7 @@ unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
 	unsigned long nr_reclaimed = 0;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long lru_pages = 0;
+	enum zone_type highest_zoneidx;
 	int i;
 	struct scan_control sc = {
 		.gfp_mask = gfp_mask,
@@ -1136,9 +1137,14 @@ unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)

 	count_vm_event(ALLOCSTALL);

+	highest_zoneidx = gfp_zone(gfp_mask);
+
 	for (i = 0; zones[i] != NULL; i++) {
 		struct zone *zone = zones[i];

+		if (should_filter_zone(zone, highest_zoneidx))
+			continue;
+
 		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 			continue;

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-03 22:02 ` Andi Kleen
  2007-08-04  0:23   ` Mel Gorman
@ 2007-08-06 19:15 ` Andrew Morton
  2007-08-06 19:18   ` Christoph Lameter
  ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Andrew Morton @ 2007-08-06 19:15 UTC (permalink / raw)
To: Andi Kleen
Cc: Mel Gorman, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

On Sat, 4 Aug 2007 00:02:17 +0200 Andi Kleen <ak@suse.de> wrote:

> On Thursday 02 August 2007 19:21:18 Mel Gorman wrote:
> > The NUMA layer only supports NUMA policies for the highest zone. When
> > ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
> > ZONE_MOVABLE. The result is that policies are only applied to allocations
> > like anonymous pages and page cache allocated from ZONE_MOVABLE when the
> > zone is used.
> >
> > This patch applies policies to the two highest zones when the highest zone
> > is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
> > zone, it's always functionally equivalent.
> >
> > The patch has been tested on a variety of machines both NUMA and non-NUMA
> > covering x86, x86_64 and ppc64. No abnormal results were seen in kernbench,
> > tbench, dbench or hackbench. It passes regression tests from the numactl
> > package with and without kernelcore= once numactl tests are patched to
> > wait for vmstat counters to update.
>
> I must honestly say I really hate the patch. It's a horrible hack and makes
> fast paths slower. When I designed mempolicies I especially tried to avoid
> things like that, please don't add them through the backdoor now.
>

We don't want to be adding horrible hacks and slowness to the core of
__alloc_pages().

So where do we stand on this? We made a mess of NUMA policies, and merging
"grouping pages by mobility" would fix that mess, only we're not sure that
we want to merge those and it's too late for 2.6.23 anyway?

If correct, I would suggest merging the horrible hack for .23 then taking
it out when we merge "grouping pages by mobility". But what if we don't do
that merge?
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 19:15 ` Andrew Morton
@ 2007-08-06 19:18 ` Christoph Lameter
  2007-08-06 20:31 ` Andi Kleen
  2007-08-06 21:48 ` Mel Gorman
  2 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-08-06 19:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, Mel Gorman, Lee.Schermerhorn, kamezawa.hiroyu,
    linux-mm, linux-kernel

On Mon, 6 Aug 2007, Andrew Morton wrote:

> If correct, I would suggest merging the horrible hack for .23 then taking
> it out when we merge "grouping pages by mobility". But what if we don't do
> that merge?

Take the horrible hack for .23. For .24 either merge the mobility or get
the other solution that Mel is working on. That solution would only use a
single zonelist per node and filter on the fly. That may help performance
and also help to make memory policies work better.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 19:15 ` Andrew Morton
  2007-08-06 19:18   ` Christoph Lameter
@ 2007-08-06 20:31 ` Andi Kleen
  2007-08-06 21:55   ` Mel Gorman
  2007-08-06 21:48 ` Mel Gorman
  2 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2007-08-06 20:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

> If correct, I would suggest merging the horrible hack for .23 then taking
> it out when we merge "grouping pages by mobility". But what if we don't do
> that merge?

Or disable ZONE_MOVABLE until it is usable? I don't think we have the
infrastructure to really use it anyways, so it shouldn't make too much
difference in terms of features. And it's not that there is some sort of
deadline around for it.

Or mark it CONFIG_EXPERIMENTAL with a warning that it'll break NUMA. But
disabling is probably better.

Then for .24 or .25 a better solution can be developed. I would prefer
that instead of merging bandaid horrible hacks -- they have a tendency to
stay around.

-Andi
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 20:31 ` Andi Kleen
@ 2007-08-06 21:55 ` Mel Gorman
  2007-08-07  5:12   ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Mel Gorman @ 2007-08-06 21:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Andrew Morton, Lee.Schermerhorn, clameter, kamezawa.hiroyu,
    linux-mm, linux-kernel

On (06/08/07 22:31), Andi Kleen didst pronounce:
>
> > If correct, I would suggest merging the horrible hack for .23 then taking
> > it out when we merge "grouping pages by mobility". But what if we don't do
> > that merge?
>
> Or disable ZONE_MOVABLE until it is usable?

It's usable now. The issue with policies only occurs if the user specifies
kernelcore= or movablecore= on the command-line. Your language suggests
that you believe policies are not applied when ZONE_MOVABLE is configured
at build-time.

> I don't think we have the
> infrastructure to really use it anyways, so it shouldn't make too much
> difference in terms of features. And it's not that there is some sort of
> deadline around for it.
>
> Or mark it CONFIG_EXPERIMENTAL with a warning that it'll break NUMA. But
> disabling is probably better.
>

Saying it breaks NUMA is excessively strong language. It doesn't break
policies in that they still get applied to the highest zone. If kernelcore=
or movablecore= is not specified, the behaviour doesn't change.

> Then for .24 or .25 a better solution can be developed.
>

The better solution in my mind is to always filter the zonelist instead
of applying them only for MPOL_BIND zonelists as the hack does.

> I would prefer that instead of merging bandaid horrible hacks -- they have
> a tendency to stay around.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 21:55 ` Mel Gorman
@ 2007-08-07  5:12 ` Andrew Morton
  2007-08-07 16:55   ` Mel Gorman
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2007-08-07 5:12 UTC (permalink / raw)
To: Mel Gorman
Cc: Andi Kleen, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

On Mon, 6 Aug 2007 22:55:41 +0100 mel@skynet.ie (Mel Gorman) wrote:

> On (06/08/07 22:31), Andi Kleen didst pronounce:
> >
> > > If correct, I would suggest merging the horrible hack for .23 then taking
> > > it out when we merge "grouping pages by mobility". But what if we don't do
> > > that merge?
> >
> > Or disable ZONE_MOVABLE until it is usable?
>
> It's usable now. The issue with policies only occurs if the user specifies
> kernelcore= or movablecore= on the command-line. Your language suggests
> that you believe policies are not applied when ZONE_MOVABLE is configured
> at build-time.

So.. the problem which we're fixing here is only present when someone
uses kernelcore=. This is in fact an argument for _not_ merging the
horrible-hack.

How commonly do we expect people to specify kernelcore=? If "not much" then
it isn't worth adding the __alloc_pages() overhead?

(It's a pretty darn small overhead, I must say)
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-07  5:12 ` Andrew Morton
@ 2007-08-07 16:55 ` Mel Gorman
  2007-08-07 18:14   ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Mel Gorman @ 2007-08-07 16:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

On (06/08/07 22:12), Andrew Morton didst pronounce:
> On Mon, 6 Aug 2007 22:55:41 +0100 mel@skynet.ie (Mel Gorman) wrote:
>
> > On (06/08/07 22:31), Andi Kleen didst pronounce:
> > >
> > > > If correct, I would suggest merging the horrible hack for .23 then taking
> > > > it out when we merge "grouping pages by mobility". But what if we don't do
> > > > that merge?
> > >
> > > Or disable ZONE_MOVABLE until it is usable?
> >
> > It's usable now. The issue with policies only occurs if the user specifies
> > kernelcore= or movablecore= on the command-line. Your language suggests
> > that you believe policies are not applied when ZONE_MOVABLE is configured
> > at build-time.
>
> So.. the problem which we're fixing here is only present when someone
> uses kernelcore=. This is in fact an argument for _not_ merging the
> horrible-hack.
>

It's even more constrained than that. It only applies to the MPOL_BIND
policy when kernelcore= is specified. The other policies work the same
as they ever did.

> How commonly do we expect people to specify kernelcore=? If "not much" then
> it isn't worth adding the __alloc_pages() overhead?
>

For 2.6.23 at least, it'll be "not much". While I'm not keen on leaving
MPOL_BIND as it is for 2.6.23, we can postpone the final decision until
we've bashed the one-zonelist-per-node patches a bit and see whether we
want to do that instead.

> (It's a pretty darn small overhead, I must say)

And it's simpler than the one-zonelist-per-node patches. The current
draft of the patch I'm working on looks something like:

 arch/parisc/mm/init.c     |   10 ++-
 drivers/char/sysrq.c      |    2
 fs/buffer.c               |    2
 include/linux/gfp.h       |    3 -
 include/linux/mempolicy.h |    2
 include/linux/mmzone.h    |   42 ++++++++++++++
 include/linux/swap.h      |    2
 mm/mempolicy.c            |    6 +-
 mm/mmzone.c               |   28 ++++++++++
 mm/oom_kill.c             |    8 +--
 mm/page_alloc.c           |  122 +++++++++++++++++++++-------------------------
 mm/slab.c                 |   11 ++--
 mm/slub.c                 |   11 ++--
 mm/vmscan.c               |   16 +++---
 14 files changed, 164 insertions(+), 101 deletions(-)

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-07 16:55 ` Mel Gorman
@ 2007-08-07 18:14 ` Andrew Morton
  2007-08-07 20:37   ` Christoph Lameter
  2007-08-08 16:49   ` Mel Gorman
  0 siblings, 2 replies; 23+ messages in thread
From: Andrew Morton @ 2007-08-07 18:14 UTC (permalink / raw)
To: Mel Gorman
Cc: Andi Kleen, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

On Tue, 7 Aug 2007 17:55:47 +0100 mel@skynet.ie (Mel Gorman) wrote:

> On (06/08/07 22:12), Andrew Morton didst pronounce:
> > On Mon, 6 Aug 2007 22:55:41 +0100 mel@skynet.ie (Mel Gorman) wrote:
> >
> > > On (06/08/07 22:31), Andi Kleen didst pronounce:
> > > >
> > > > > If correct, I would suggest merging the horrible hack for .23 then taking
> > > > > it out when we merge "grouping pages by mobility". But what if we don't do
> > > > > that merge?
> > > >
> > > > Or disable ZONE_MOVABLE until it is usable?
> > >
> > > It's usable now. The issue with policies only occurs if the user specifies
> > > kernelcore= or movablecore= on the command-line. Your language suggests
> > > that you believe policies are not applied when ZONE_MOVABLE is configured
> > > at build-time.
> >
> > So.. the problem which we're fixing here is only present when someone
> > uses kernelcore=. This is in fact an argument for _not_ merging the
> > horrible-hack.
> >
>
> It's even more constrained than that. It only applies to the MPOL_BIND
> policy when kernelcore= is specified. The other policies work the same
> as they ever did.

so.. should we forget about merging the horrible-hack?
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-07 18:14 ` Andrew Morton
@ 2007-08-07 20:37 ` Christoph Lameter
  2007-08-08 16:49 ` Mel Gorman
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-08-07 20:37 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Andi Kleen, Lee.Schermerhorn, kamezawa.hiroyu,
    linux-mm, linux-kernel

On Tue, 7 Aug 2007, Andrew Morton wrote:

> > It's even more constrained than that. It only applies to the MPOL_BIND
> > policy when kernelcore= is specified. The other policies work the same
> > as they ever did.
>
> so.. should we forget about merging the horrible-hack?

Supporting MPOL_BIND is a basic NUMA feature. This is going to make .23
unusable for us if kernelcore= is used. If we cannot use kernelcore then
NUMA systems cannot use the features that depend on kernelcore.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-07 18:14 ` Andrew Morton
  2007-08-07 20:37   ` Christoph Lameter
@ 2007-08-08 16:49 ` Mel Gorman
  2007-08-08 17:03   ` Christoph Lameter
  1 sibling, 1 reply; 23+ messages in thread
From: Mel Gorman @ 2007-08-08 16:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

On (07/08/07 11:14), Andrew Morton didst pronounce:
> On Tue, 7 Aug 2007 17:55:47 +0100 mel@skynet.ie (Mel Gorman) wrote:
>
> > On (06/08/07 22:12), Andrew Morton didst pronounce:
> > > On Mon, 6 Aug 2007 22:55:41 +0100 mel@skynet.ie (Mel Gorman) wrote:
> > >
> > > > On (06/08/07 22:31), Andi Kleen didst pronounce:
> > > > >
> > > > > > If correct, I would suggest merging the horrible hack for .23 then taking
> > > > > > it out when we merge "grouping pages by mobility". But what if we don't do
> > > > > > that merge?
> > > > >
> > > > > Or disable ZONE_MOVABLE until it is usable?
> > > >
> > > > It's usable now. The issue with policies only occurs if the user specifies
> > > > kernelcore= or movablecore= on the command-line. Your language suggests
> > > > that you believe policies are not applied when ZONE_MOVABLE is configured
> > > > at build-time.
> > >
> > > So.. the problem which we're fixing here is only present when someone
> > > uses kernelcore=. This is in fact an argument for _not_ merging the
> > > horrible-hack.
> > >
> >
> > It's even more constrained than that. It only applies to the MPOL_BIND
> > policy when kernelcore= is specified. The other policies work the same
> > as they ever did.
>
> so.. should we forget about merging the horrible-hack?

Despite a fairly specific case, I'd still like to get the problem fixed
for 2.6.23. I've posted up an alternative fix under the subject "Use one
zonelist per node instead of multiple zonelists v2". It's a more invasive
fix although arguably it's better overall than the hack because it's not
dealing with a specific special case and has a sensible path forward that
ultimately makes policies saner. However, unlike the hack it affects all
callers of the page allocator so let's see what the reaction from
reviewers is before forgetting about the hack altogether.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-08 16:49 ` Mel Gorman
@ 2007-08-08 17:03 ` Christoph Lameter
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-08-08 17:03 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Andi Kleen, Lee.Schermerhorn, kamezawa.hiroyu,
    linux-mm, linux-kernel

On Wed, 8 Aug 2007, Mel Gorman wrote:

> Despite a fairly specific case, I'd still like to get the problem fixed
> for 2.6.23. I've posted up an alternative fix under the subject "Use one
> zonelist per node instead of multiple zonelists v2". It's a more invasive
> fix although arguably it's better overall than the hack because it's not
> dealing with a specific special case and has a sensible path forward that
> ultimately makes policies saner. However, unlike the hack it affects all
> callers of the page allocator so let's see what the reaction from
> reviewers is before forgetting about the hack altogether.

Well, it's more a precursor to things to come than a hack. If the tests
go well for the single zonelist scheme then we may end up doing what the
patch did for the mpol_bind zonelists for all allocations.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 19:15 ` Andrew Morton
  2007-08-06 19:18   ` Christoph Lameter
  2007-08-06 20:31   ` Andi Kleen
@ 2007-08-06 21:48 ` Mel Gorman
  2007-08-06 22:31   ` Christoph Lameter
  2 siblings, 1 reply; 23+ messages in thread
From: Mel Gorman @ 2007-08-06 21:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, Lee.Schermerhorn, clameter, kamezawa.hiroyu, linux-mm,
    linux-kernel

On (06/08/07 12:15), Andrew Morton didst pronounce:
> On Sat, 4 Aug 2007 00:02:17 +0200 Andi Kleen <ak@suse.de> wrote:
>
> > On Thursday 02 August 2007 19:21:18 Mel Gorman wrote:
> > > The NUMA layer only supports NUMA policies for the highest zone. When
> > > ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
> > > ZONE_MOVABLE. The result is that policies are only applied to allocations
> > > like anonymous pages and page cache allocated from ZONE_MOVABLE when the
> > > zone is used.
> > >
> > > This patch applies policies to the two highest zones when the highest zone
> > > is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
> > > zone, it's always functionally equivalent.
> > >
> > > The patch has been tested on a variety of machines both NUMA and non-NUMA
> > > covering x86, x86_64 and ppc64. No abnormal results were seen in kernbench,
> > > tbench, dbench or hackbench. It passes regression tests from the numactl
> > > package with and without kernelcore= once numactl tests are patched to
> > > wait for vmstat counters to update.
> >
> > I must honestly say I really hate the patch. It's a horrible hack and makes
> > fast paths slower. When I designed mempolicies I especially tried to avoid
> > things like that, please don't add them through the backdoor now.
> >
>
> We don't want to be adding horrible hacks and slowness to the core of
> __alloc_pages().
>
> So where do we stand on this? We made a mess of NUMA policies, and merging
> "grouping pages by mobility" would fix that mess, only we're not sure that
> we want to merge those and it's too late for 2.6.23 anyway?
>

Grouping pages by mobility would still apply policies only to ZONE_MOVABLE
when it is configured. What grouping pages by mobility would relieve is
much of the motivation to configure ZONE_MOVABLE at all for hugepages. The
zone has attributes, such as being useful for hot-remove and guaranteeing
how much memory can be allocated as hugepages at runtime, that are not
necessarily provided by grouping pages by mobility.

> If correct, I would suggest merging the horrible hack for .23 then taking
> it out when we merge "grouping pages by mobility". But what if we don't do
> that merge?

The hack aspect of the fix is that it alters the hot-path in the
allocator. However, there is a logical way forward.

There are patches in the works that change zonelists from having multiple
zonelists to having only one zonelist per node that is filtered based
on the allocation flags. The place this filtering happens is the same as
what the "hack" is currently doing. The cost of filtering should be offset
by the reduced size of the node structure, and tests with kernbench,
hackbench and tbench seem to confirm that. This will bring the hack into
line with what we wanted with policies in the first place because things
like MPOL_BIND will try nodes in node-local order instead of node-numeric
order as it does currently.
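
To make the ordering point concrete, a hypothetical example (the node
distances are invented for illustration):

/* Hypothetical 4-node machine, task running on node 2, MPOL_BIND to
 * nodes {0,1,3}:
 *
 *   bind_zonelist() today builds the list in numeric order:
 *       node 0, node 1, node 3
 *
 *   a single filtered zonelist would be tried in node-local order,
 *   nearest node first by NUMA distance, for example:
 *       node 3, node 1, node 0
 *
 * so bound allocations prefer memory close to the allocating CPU. */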
From there, we can eliminate policy_zone altogether by applying policies
to all zones but forcing a situation where MPOL_BIND will always contain
at least one node that GFP_KERNEL allocations can be satisfied from. For
example, if I have a NUMAQ that only has ZONE_NORMAL on node 0 and a
user tries to bind to nodes 2+3, they will really bind to nodes 0,2,3 so
that GFP_KERNEL allocations in that process will not return NULL.
Alternatively, we could have mbind() return a failure if the nodemask
does not include a node that can satisfy GFP_KERNEL allocations. Either
of these options seems more sensible than sometimes applying policies
and other times not applying them.

The worst aspect is that filtering requires many lookups of zone_idx().
We may be able to optimise this by keeping the zone index in the pointer
within the zonelist, as suggested by Christoph Lameter. However, I
haven't tried implementing it yet to see what it looks like in practice.

I'm for merging the hack for 2.6.23 and having one-zonelist-per-node
ready for 2.6.24. If there is much fear that the hack will persist for
too long, I'm OK with applying policies only to ZONE_MOVABLE when
kernelcore= is specified on the command line, as one-zonelist-per-node
can fix the same problem. Ultimately, if we agree on patches to
eliminate policy_zone altogether, the problem becomes moot as it no
longer exists.

--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
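[A hedged sketch of the pointer-tagging idea mentioned above. It assumes
struct zone is aligned so that at least three low bits of its address are
zero, which is enough to hold any index below MAX_NR_ZONES; all helper
and macro names here are hypothetical:]

/*
 * Hypothetical pointer-tagging for zonelist entries: stash the zone
 * index in the low bits of the zone pointer so that filtering does not
 * need a zone_idx() lookup per entry.
 */
#define ZONEREF_IDX_BITS	3
#define ZONEREF_IDX_MASK	((1UL << ZONEREF_IDX_BITS) - 1)

static inline unsigned long encode_zoneref(struct zone *zone)
{
	return (unsigned long)zone | zone_idx(zone);
}

static inline struct zone *zoneref_zone(unsigned long zref)
{
	return (struct zone *)(zref & ~ZONEREF_IDX_MASK);
}

static inline enum zone_type zoneref_zone_idx(unsigned long zref)
{
	return zref & ZONEREF_IDX_MASK;
}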
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 21:48 ` Mel Gorman
@ 2007-08-06 22:31 ` Christoph Lameter
  2007-08-06 22:57 ` Mel Gorman
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2007-08-06 22:31 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Andi Kleen, Lee.Schermerhorn, kamezawa.hiroyu,
    linux-mm, linux-kernel

On Mon, 6 Aug 2007, Mel Gorman wrote:

> > So where do we stand on this? We made a mess of NUMA policies, and
> > merging "grouping pages by mobility" would fix that mess, only we're
> > not sure that we want to merge those and it's too late for 2.6.23
> > anyway?
>
> Grouping pages by mobility would still apply policies only to
> ZONE_MOVABLE when it is configured. What grouping pages by mobility
> would relieve is much of the motivation to configure ZONE_MOVABLE at
> all for hugepages. The zone has attributes such as being useful to

Ultimately ZONE_MOVABLE can be removed. AFAIK ZONE_MOVABLE is a
temporary stepping stone to address concerns about defrag reliability.
Somehow the stepping stone got into .23 without the real thing.

An additional issue with the current ZONE_MOVABLE in .23 is that the
tentative association of ZONE_MOVABLE with HIGHMEM also makes the use of
large pages by SLUB impossible.

> There are patches in the works that change nodes from having multiple
> zonelists to having only one zonelist per node that is filtered based
> on the allocation flags. The place where this filtering happens is the
> same as where the "hack" currently does its work. The cost of
> filtering should be offset by the reduced size of the node structure,
> and tests with kernbench, hackbench and tbench seem to confirm that.
> This will bring the hack into line with what we wanted from policies
> in the first place, because things like MPOL_BIND will try nodes in
> node-local order instead of node-numeric order as it does currently.

I'd like to see that patch.

> From there, we can eliminate policy_zone altogether by applying
> policies to all zones but forcing a situation where MPOL_BIND will
> always contain at least one node that GFP_KERNEL allocations can be
> satisfied from. For example, if I have a NUMAQ that only has
> ZONE_NORMAL on node 0 and a user tries to bind to nodes 2+3, they will
> really bind to nodes 0,2,3 so that GFP_KERNEL allocations in that
> process will not return NULL. Alternatively, we could have mbind()
> return a failure if the nodemask does not include a node that can
> satisfy GFP_KERNEL allocations. Either of these options seems more
> sensible than sometimes applying policies and other times not applying
> them.

We would still need to check which zones are available on which nodes.
Zones that are not available on all nodes would need to be exempt from
policies. Maybe one could define an upper boundary of zones that are
policed? On NUMAQ, zones up to ZONE_NORMAL would be under policy. On
x86_64 this may only include ZONE_DMA. A similar thing would occur on
ia64 with the 4G DMA zone. Maybe policy_zone could become configurable?

> I'm for merging the hack for 2.6.23 and having one-zonelist-per-node
> ready for 2.6.24. If there is much fear that the hack will persist for
> too long,

Why not for .23? It does not seem to be too much code?

> I'm OK with applying policies only to ZONE_MOVABLE when kernelcore= is
> specified on the command line, as one-zonelist-per-node can fix the
> same problem.
> Ultimately, if we agree on patches to eliminate policy_zone
> altogether, the problem becomes moot as it no longer exists.

We cannot have a kernel release with broken mempolicy. We either need
the patch here or the one-zonelist patch for .23.
* Re: [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE
  2007-08-06 22:31 ` Christoph Lameter
@ 2007-08-06 22:57 ` Mel Gorman
  0 siblings, 0 replies; 23+ messages in thread
From: Mel Gorman @ 2007-08-06 22:57 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Andi Kleen, Lee.Schermerhorn, kamezawa.hiroyu,
    linux-mm, linux-kernel

On (06/08/07 15:31), Christoph Lameter didst pronounce:
> On Mon, 6 Aug 2007, Mel Gorman wrote:
>
> > > So where do we stand on this? We made a mess of NUMA policies, and
> > > merging "grouping pages by mobility" would fix that mess, only
> > > we're not sure that we want to merge those and it's too late for
> > > 2.6.23 anyway?
> >
> > Grouping pages by mobility would still apply policies only to
> > ZONE_MOVABLE when it is configured. What grouping pages by mobility
> > would relieve is much of the motivation to configure ZONE_MOVABLE at
> > all for hugepages. The zone has attributes such as being useful to
>
> Ultimately ZONE_MOVABLE can be removed. AFAIK ZONE_MOVABLE is a
> temporary stepping stone to address concerns about defrag reliability.
> Somehow the stepping stone got into .23 without the real thing.
>
> An additional issue with the current ZONE_MOVABLE in .23 is that the
> tentative association of ZONE_MOVABLE with HIGHMEM also makes the use
> of large pages by SLUB impossible.

Pretty much. The use of ZONE_MOVABLE really only applies to hugepages
and potentially memory hot-remove right now.

> > There are patches in the works that change nodes from having
> > multiple zonelists to having only one zonelist per node that is
> > filtered based on the allocation flags. The place where this
> > filtering happens is the same as where the "hack" currently does its
> > work. The cost of filtering should be offset by the reduced size of
> > the node structure, and tests with kernbench, hackbench and tbench
> > seem to confirm that. This will bring the hack into line with what
> > we wanted from policies in the first place, because things like
> > MPOL_BIND will try nodes in node-local order instead of node-numeric
> > order as it does currently.
>
> I'd like to see that patch.

I'll find the time to get it implemented this week. I've been
prioritising anything that looked like a bug recently, so it languished
on the TODO pile.

> > From there, we can eliminate policy_zone altogether by applying
> > policies to all zones but forcing a situation where MPOL_BIND will
> > always contain at least one node that GFP_KERNEL allocations can be
> > satisfied from. For example, if I have a NUMAQ that only has
> > ZONE_NORMAL on node 0 and a user tries to bind to nodes 2+3, they
> > will really bind to nodes 0,2,3 so that GFP_KERNEL allocations in
> > that process will not return NULL. Alternatively, we could have
> > mbind() return a failure if the nodemask does not include a node
> > that can satisfy GFP_KERNEL allocations. Either of these options
> > seems more sensible than sometimes applying policies and other times
> > not applying them.
>
> We would still need to check which zones are available on which nodes.
> Zones that are not available on all nodes would need to be exempt from
> policies. Maybe one could define an upper boundary of zones that are
> policed? On NUMAQ, zones up to ZONE_NORMAL would be under policy. On
> x86_64 this may only include ZONE_DMA. A similar thing would occur on
> ia64 with the 4G DMA zone. Maybe policy_zone could become configurable?
A sensible upper boundary would be whether GFP_KERNEL can be used on
zones within that node or not. gfp_zone(GFP_KERNEL) provides that sort
of information and would resolve to ZONE_NORMAL on NUMAQ, ZONE_DMA on
ia64, etc. Perhaps that would not work out for GFP_DMA32; I'm not 100%
sure at the moment (it's late and this is meant to be a holiday, sue
me :)). This discussion is independent of one-zonelist-per-node, which
is the stepping stone between where we are now and getting rid of
policy_zone altogether.

> > I'm for merging the hack for 2.6.23 and having one-zonelist-per-node
> > ready for 2.6.24. If there is much fear that the hack will persist
> > for too long,
>
> Why not for .23? It does not seem to be too much code?

I'm working under the assumption that if it's not a bug fix, you can't
get it in after the merge window closes. I've seen complaints before
where "bug fixes" were adding features, which one-zonelist-per-node may
be perceived by some people to be. Perhaps the rules will flex for this
patch when it comes out, perhaps not. I made the assumption that the
least invasive bug fix was sensible outside of the merge window, and
that's what this hack is.

> > I'm OK with applying policies only to ZONE_MOVABLE when kernelcore=
> > is specified on the command line, as one-zonelist-per-node can fix
> > the same problem. Ultimately, if we agree on patches to eliminate
> > policy_zone altogether, the problem becomes moot as it no longer
> > exists.
>
> We cannot have a kernel release with broken mempolicy. We either need
> the patch here or the one-zonelist patch for .23.

I'll get a sensible version of one-zonelist ready ASAP. Perhaps we'll
end up just going with that altogether if no performance issues are
evident in testing.

--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
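[A rough illustration of the boundary test Mel describes, as an
mbind()-time check built on gfp_zone(GFP_KERNEL); can_satisfy_gfp_kernel()
is a hypothetical helper, and whether a failing mask should be fixed up
or rejected is the open question above:]

/*
 * Sketch of an mbind()-time sanity check: the nodemask is acceptable
 * only if at least one node has populated memory at or below
 * gfp_zone(GFP_KERNEL), so GFP_KERNEL allocations made under the
 * policy always have an eligible zone to fall back to.
 */
static int can_satisfy_gfp_kernel(nodemask_t *nodes)
{
	enum zone_type k = gfp_zone(GFP_KERNEL);
	enum zone_type i;
	int nd;

	for_each_node_mask(nd, *nodes) {
		for (i = 0; i <= k; i++)
			if (populated_zone(&NODE_DATA(nd)->node_zones[i]))
				return 1;	/* this node can back GFP_KERNEL */
	}
	return 0;
}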
Thread overview: 23+ messages (newest: 2007-08-08 17:03 UTC)
2007-08-02 17:21 [PATCH] Apply memory policies to top two highest zones when highest zone is ZONE_MOVABLE Mel Gorman
2007-08-02 19:41 ` Lee Schermerhorn
2007-08-02 20:45 ` Christoph Lameter
2007-08-06 19:44 ` Andrew Morton
2007-08-06 20:13 ` Christoph Lameter
2007-08-06 21:56 ` Paul Jackson
2007-08-03 22:02 ` Andi Kleen
2007-08-04  0:23 ` Mel Gorman
2007-08-04  8:51 ` Andi Kleen
2007-08-04 16:39 ` Mel Gorman
2007-08-06 19:15 ` Andrew Morton
2007-08-06 19:18 ` Christoph Lameter
2007-08-06 20:31 ` Andi Kleen
2007-08-06 21:55 ` Mel Gorman
2007-08-07  5:12 ` Andrew Morton
2007-08-07 16:55 ` Mel Gorman
2007-08-07 18:14 ` Andrew Morton
2007-08-07 20:37 ` Christoph Lameter
2007-08-08 16:49 ` Mel Gorman
2007-08-08 17:03 ` Christoph Lameter
2007-08-06 21:48 ` Mel Gorman
2007-08-06 22:31 ` Christoph Lameter
2007-08-06 22:57 ` Mel Gorman