* [PATCH]: Free pages from local pcp lists under tight memory conditions
@ 2005-11-23 0:10 Rohit Seth
2005-11-23 5:36 ` Andrew Morton
2005-11-23 19:30 ` Christoph Lameter
0 siblings, 2 replies; 26+ messages in thread
From: Rohit Seth @ 2005-11-23 0:10 UTC (permalink / raw)
To: akpm, torvalds; +Cc: linux-mm, linux-kernel
Andrew, Linus,
[PATCH]: This patch frees pages (pcp->batch from each list at a time) from
the local pcp lists when a higher-order allocation request cannot be
serviced from the global free_list.

This should help fix some of the earlier failures seen with order-1
allocations.

I will send separate patches for:

1- Reducing the pcp lists on remote CPUs
2- Cleaning up the CONFIG_HOTPLUG_CPU code in page_alloc.c to use this code appropriately
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
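
To make the failure mode concrete, here is a small stand-alone illustration
(ordinary user-space C, not kernel code; the frame count and helper names are
invented for the example): order-0 pages parked on a per-cpu list are
invisible to the buddy allocator, so an order-1 request can fail even though
both halves of a buddy pair have already been freed. Returning those pages to
the global free list, which is what reduce_cpu_pcp() below does one batch per
list at a time, lets the order-1 request succeed.

/*
 * Toy model: frames freed to the per-cpu list do not show up in the
 * buddy free map, so the order-1 allocation fails until the list is
 * drained back into it.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_FRAMES 8

static bool buddy_free[NR_FRAMES];      /* global (buddy) free map */
static int pcp[NR_FRAMES], pcp_count;   /* one CPU's per-cpu list  */

static int alloc_order1(void)           /* needs an aligned free pair */
{
        for (int i = 0; i < NR_FRAMES; i += 2)
                if (buddy_free[i] && buddy_free[i + 1]) {
                        buddy_free[i] = buddy_free[i + 1] = false;
                        return i;
                }
        return -1;
}

static void free_to_pcp(int frame)      /* freed pages go to the pcp list */
{
        pcp[pcp_count++] = frame;
}

static int drain_pcp(void)              /* what the patch adds, in spirit */
{
        int drained = pcp_count;

        while (pcp_count)
                buddy_free[pcp[--pcp_count]] = true;
        return drained;
}

int main(void)
{
        /* Every frame is in use; frames 2, 3 and 5 are then freed, but
         * they land on the per-cpu list, not in the buddy free map. */
        free_to_pcp(2);
        free_to_pcp(3);
        free_to_pcp(5);

        printf("order-1 before drain: %d\n", alloc_order1());  /* -1 */
        if (drain_pcp())
                printf("order-1 after drain:  %d\n", alloc_order1());  /* 2 */
        return 0;
}

The toy drains everything in one go; the patch below frees pcp->batch pages
from each list per call instead, a difference that is debated later in the
thread.
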
--- a/mm/page_alloc.c 2005-11-22 07:03:40.000000000 -0800
+++ linux-2.6.15-rc2/mm/page_alloc.c 2005-11-22 07:17:48.000000000 -0800
@@ -827,6 +827,35 @@
return page;
}
+static int
+reduce_cpu_pcp(void )
+{
+ struct zone *zone;
+ unsigned long flags;
+ unsigned int cpu = get_cpu();
+ int i, ret=0;
+
+ local_irq_save(flags);
+ for_each_zone(zone) {
+ struct per_cpu_pageset *pset;
+
+ pset = zone_pcp(zone, cpu);
+ for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
+ struct per_cpu_pages *pcp;
+
+ pcp = &pset->pcp[i];
+ if (pcp->count == 0)
+ continue;
+ pcp->count -= free_pages_bulk(zone, pcp->batch,
+ &pcp->list, 0);
+ ret++;
+ }
+ }
+ local_irq_restore(flags);
+ put_cpu();
+ return ret;
+}
+
/*
* This is the 'heart' of the zoned buddy allocator.
*/
@@ -887,6 +916,7 @@
* Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
* See also cpuset_zone_allowed() comment in kernel/cpuset.c.
*/
+try_again:
page = get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags);
if (page)
goto got_pg;
@@ -911,8 +941,15 @@
}
/* Atomic allocations - we can't balance anything */
- if (!wait)
- goto nopage;
+ if (!wait) {
+ /* Check if there are pages available on pcp lists that can be
+ * moved to global page list to satisfy higher order allocations.
+ */
+ if ((order > 0) && (reduce_cpu_pcp()))
+ goto try_again;
+ else
+ goto nopage;
+ }
rebalance:
cond_resched();
@@ -950,6 +987,14 @@
goto restart;
}
+ if (order > 0)
+ while (reduce_cpu_pcp()) {
+ if (get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags))
+ goto got_pg;
+ }
+ /* FIXME: Add the support for reducing/draining the remote pcps.
+ */
+
/*
* Don't let big-order allocations loop unless the caller explicitly
* requests that. Wait for some write requests to complete then retry.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 0:10 [PATCH]: Free pages from local pcp lists under tight memory conditions Rohit Seth @ 2005-11-23 5:36 ` Andrew Morton 2005-11-23 5:58 ` Andrew Morton ` (3 more replies) 2005-11-23 19:30 ` Christoph Lameter 1 sibling, 4 replies; 26+ messages in thread From: Andrew Morton @ 2005-11-23 5:36 UTC (permalink / raw) To: Rohit Seth; +Cc: torvalds, linux-mm, linux-kernel, Christoph Lameter Rohit Seth <rohit.seth@intel.com> wrote: > > Andrew, Linus, > > [PATCH]: This patch free pages (pcp->batch from each list at a time) from > local pcp lists when a higher order allocation request is not able to > get serviced from global free_list. > > This should help fix some of the earlier failures seen with order 1 allocations. > > I will send separate patches for: > > 1- Reducing the remote cpus pcp > 2- Clean up page_alloc.c for CONFIG_HOTPLUG_CPU to use this code appropiately > > +static int > +reduce_cpu_pcp(void ) > +{ > + struct zone *zone; > + unsigned long flags; > + unsigned int cpu = get_cpu(); > + int i, ret=0; > + > + local_irq_save(flags); > + for_each_zone(zone) { > + struct per_cpu_pageset *pset; > + > + pset = zone_pcp(zone, cpu); > + for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { > + struct per_cpu_pages *pcp; > + > + pcp = &pset->pcp[i]; > + if (pcp->count == 0) > + continue; > + pcp->count -= free_pages_bulk(zone, pcp->batch, > + &pcp->list, 0); > + ret++; > + } > + } > + local_irq_restore(flags); > + put_cpu(); > + return ret; > +} This significantly duplicates the existing drain_local_pages(). > > + if (order > 0) > + while (reduce_cpu_pcp()) { > + if (get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags)) This forgot to assign to local variable `page'! It'll return NULL and will leak memory. The `while' loop worries me for some reason, so I wimped out and just tried the remote drain once. > + goto got_pg; > + } > + /* FIXME: Add the support for reducing/draining the remote pcps. This is easy enough to do. I wanted to call the all-CPU drainer `drain_remote_pages' but that's already taken by some rather poorly-named NUMA thing which also duplicates most of __drain_pages(). This patch is against a random selection of the enormous number of mm/ patches in -mm. I haven't runtime-tested it yet. We need to verify that this patch actually does something useful. 
include/linux/gfp.h | 2 + include/linux/suspend.h | 1 mm/page_alloc.c | 85 ++++++++++++++++++++++++++++++++++++------------ 3 files changed, 66 insertions(+), 22 deletions(-) diff -puN include/linux/gfp.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions include/linux/gfp.h --- devel/include/linux/gfp.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions 2005-11-22 21:32:47.000000000 -0800 +++ devel-akpm/include/linux/gfp.h 2005-11-22 21:32:47.000000000 -0800 @@ -109,6 +109,8 @@ static inline struct page *alloc_pages_n NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask)); } +extern int drain_local_pages(void); + #ifdef CONFIG_NUMA extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order); diff -puN include/linux/suspend.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions include/linux/suspend.h --- devel/include/linux/suspend.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions 2005-11-22 21:32:47.000000000 -0800 +++ devel-akpm/include/linux/suspend.h 2005-11-22 21:32:47.000000000 -0800 @@ -40,7 +40,6 @@ extern dev_t swsusp_resume_device; extern int shrink_mem(void); /* mm/page_alloc.c */ -extern void drain_local_pages(void); extern void mark_free_pages(struct zone *zone); #ifdef CONFIG_PM diff -puN mm/page_alloc.c~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions mm/page_alloc.c --- devel/mm/page_alloc.c~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions 2005-11-22 21:32:47.000000000 -0800 +++ devel-akpm/mm/page_alloc.c 2005-11-22 21:32:47.000000000 -0800 @@ -578,32 +578,71 @@ void drain_remote_pages(void) } #endif -#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) -static void __drain_pages(unsigned int cpu) +/* + * Drain any cpu-local pages into the buddy lists. Must be called under + * local_irq_disable(). + */ +static int __drain_pages(unsigned int cpu) { - unsigned long flags; struct zone *zone; - int i; + int ret = 0; for_each_zone(zone) { struct per_cpu_pageset *pset; + int i; pset = zone_pcp(zone, cpu); for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { struct per_cpu_pages *pcp; pcp = &pset->pcp[i]; - local_irq_save(flags); + if (!pcp->count) + continue; pcp->count -= free_pages_bulk(zone, pcp->count, &pcp->list, 0); - local_irq_restore(flags); + ret++; } } + return ret; } -#endif /* CONFIG_PM || CONFIG_HOTPLUG_CPU */ -#ifdef CONFIG_PM +/* + * Spill all of this CPU's per-cpu pages back into the buddy allocator. + */ +int drain_local_pages(void) +{ + unsigned long flags; + int ret; + + local_irq_save(flags); + ret = __drain_pages(smp_processor_id()); + local_irq_restore(flags); + return ret; +} + +static void drainer(void *p) +{ + atomic_add(drain_local_pages(), p); +} + +/* + * Drain the per-cpu pages on all CPUs. If called from interrupt context we + * can only drain the local CPU's pages, since cross-CPU calls are deadlocky + * from interrupt context. + */ +static int drain_all_local_pages(void) +{ + if (in_interrupt()) { + return drain_local_pages(); + } else { + atomic_t ret = ATOMIC_INIT(0); + + on_each_cpu(drainer, &ret, 0, 1); + return atomic_read(&ret); + } +} +#ifdef CONFIG_PM void mark_free_pages(struct zone *zone) { unsigned long zone_pfn, flags; @@ -629,17 +668,6 @@ void mark_free_pages(struct zone *zone) spin_unlock_irqrestore(&zone->lock, flags); } -/* - * Spill all of this CPU's per-cpu pages back into the buddy allocator. 
- */ -void drain_local_pages(void) -{ - unsigned long flags; - - local_irq_save(flags); - __drain_pages(smp_processor_id()); - local_irq_restore(flags); -} #endif /* CONFIG_PM */ static void zone_statistics(struct zonelist *zonelist, struct zone *z) @@ -913,8 +941,16 @@ nofail_alloc: } /* Atomic allocations - we can't balance anything */ - if (!wait) - goto nopage; + if (!wait) { + /* + * Check if there are pages available on pcp lists that can be + * moved to global page list to satisfy higher order allocations + */ + if (order > 0 && drain_all_local_pages()) + goto restart; + else + goto nopage; + } rebalance: cond_resched(); @@ -952,6 +988,13 @@ rebalance: goto restart; } + if (order > 0 && drain_all_local_pages()) { + page = get_page_from_freelist(gfp_mask, order, zonelist, + alloc_flags); + if (page) + goto got_pg; + } + /* * Don't let big-order allocations loop unless the caller explicitly * requests that. Wait for some write requests to complete then retry. _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 5:36 ` Andrew Morton @ 2005-11-23 5:58 ` Andrew Morton 2005-11-23 18:17 ` Rohit Seth 2005-11-23 6:36 ` Christoph Lameter ` (2 subsequent siblings) 3 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2005-11-23 5:58 UTC (permalink / raw) To: rohit.seth, torvalds, linux-mm, linux-kernel, christoph Andrew Morton <akpm@osdl.org> wrote: > > The `while' loop worries me for some reason, so I wimped out and just tried > the remote drain once. Even the `goto restart' which is in this patch worries me from a livelock POV. Perhaps we should only ever run drain_all_local_pages() once per __alloc_pages() invokation. And perhaps we should run drain_all_local_pages() for GFP_ATOMIC or PF_MEMALLOC attempts too. --- devel/include/linux/gfp.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions 2005-11-22 21:47:33.000000000 -0800 +++ devel-akpm/include/linux/gfp.h 2005-11-22 21:57:22.000000000 -0800 @@ -109,6 +109,8 @@ static inline struct page *alloc_pages_n NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask)); } +extern void drain_local_pages(void); + #ifdef CONFIG_NUMA extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order); diff -puN include/linux/suspend.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions include/linux/suspend.h --- devel/include/linux/suspend.h~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions 2005-11-22 21:47:33.000000000 -0800 +++ devel-akpm/include/linux/suspend.h 2005-11-22 21:47:33.000000000 -0800 @@ -40,7 +40,6 @@ extern dev_t swsusp_resume_device; extern int shrink_mem(void); /* mm/page_alloc.c */ -extern void drain_local_pages(void); extern void mark_free_pages(struct zone *zone); #ifdef CONFIG_PM diff -puN mm/page_alloc.c~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions mm/page_alloc.c --- devel/mm/page_alloc.c~mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions 2005-11-22 21:47:33.000000000 -0800 +++ devel-akpm/mm/page_alloc.c 2005-11-22 21:58:01.000000000 -0800 @@ -578,32 +578,65 @@ void drain_remote_pages(void) } #endif -#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) -static void __drain_pages(unsigned int cpu) +/* + * Drain any cpu-local pages into the buddy lists. Must be called under + * local_irq_disable(). + */ +static int __drain_pages(unsigned int cpu) { - unsigned long flags; struct zone *zone; - int i; + int ret = 0; for_each_zone(zone) { struct per_cpu_pageset *pset; + int i; pset = zone_pcp(zone, cpu); for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { struct per_cpu_pages *pcp; pcp = &pset->pcp[i]; - local_irq_save(flags); + if (!pcp->count) + continue; pcp->count -= free_pages_bulk(zone, pcp->count, &pcp->list, 0); - local_irq_restore(flags); + ret++; } } + return ret; } -#endif /* CONFIG_PM || CONFIG_HOTPLUG_CPU */ -#ifdef CONFIG_PM +/* + * Spill all of this CPU's per-cpu pages back into the buddy allocator. + */ +void drain_local_pages(void) +{ + unsigned long flags; + + local_irq_save(flags); + __drain_pages(smp_processor_id()); + local_irq_restore(flags); +} +static void drainer(void *p) +{ + drain_local_pages(); +} + +/* + * Drain the per-cpu pages on all CPUs. If called from interrupt context we + * can only drain the local CPU's pages, since cross-CPU calls are deadlocky + * from interrupt context. 
+ */ +static void drain_all_local_pages(void) +{ + if (in_interrupt()) + drain_local_pages(); + else + on_each_cpu(drainer, NULL, 0, 1); +} + +#ifdef CONFIG_PM void mark_free_pages(struct zone *zone) { unsigned long zone_pfn, flags; @@ -629,17 +662,6 @@ void mark_free_pages(struct zone *zone) spin_unlock_irqrestore(&zone->lock, flags); } -/* - * Spill all of this CPU's per-cpu pages back into the buddy allocator. - */ -void drain_local_pages(void) -{ - unsigned long flags; - - local_irq_save(flags); - __drain_pages(smp_processor_id()); - local_irq_restore(flags); -} #endif /* CONFIG_PM */ static void zone_statistics(struct zonelist *zonelist, struct zone *z) @@ -889,6 +911,10 @@ restart: if (gfp_mask & __GFP_HIGH) alloc_flags |= ALLOC_DIP_LESS; + if (order > 0 || (!wait && (gfp_mask & __GFP_HIGH)) || + (p->flags & PF_MEMALLOC)) + drain_all_local_pages(); + page = get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags); if (page) goto got_pg; _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 5:58 ` Andrew Morton
@ 2005-11-23 18:17 ` Rohit Seth
0 siblings, 0 replies; 26+ messages in thread
From: Rohit Seth @ 2005-11-23 18:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, linux-mm, linux-kernel, christoph

On Tue, 2005-11-22 at 21:58 -0800, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > The `while' loop worries me for some reason, so I wimped out and just tried
> > the remote drain once.
>
> Even the `goto restart' which is in this patch worries me from a livelock
> POV. Perhaps we should only ever run drain_all_local_pages() once per
> __alloc_pages() invokation.
>
> And perhaps we should run drain_all_local_pages() for GFP_ATOMIC or
> PF_MEMALLOC attempts too.

Good point for PF_MEMALLOC scenario.

-rohit

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 5:36 ` Andrew Morton
2005-11-23 5:58 ` Andrew Morton
@ 2005-11-23 6:36 ` Christoph Lameter
2005-11-23 6:42 ` Christoph Lameter
2005-11-23 17:54 ` Rohit Seth
3 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2005-11-23 6:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rohit Seth, torvalds, linux-mm, linux-kernel

On Tue, 22 Nov 2005, Andrew Morton wrote:

> > [PATCH]: This patch free pages (pcp->batch from each list at a time) from
> > local pcp lists when a higher order allocation request is not able to
> > get serviced from global free_list.
> >
> > This should help fix some of the earlier failures seen with order 1 allocations.
> >
> > I will send separate patches for:
> >
> > 1- Reducing the remote cpus pcp

That is already partially done by drain_remote_pages(). However, that
draining is specific to this processor's remote pagesets in remote zones.

> This significantly duplicates the existing drain_local_pages().

We need to extract __drain_pcp from all these functions and clearly
document how they differ. Seth probably needs to call __drain_pages for
each processor.

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 5:36 ` Andrew Morton
2005-11-23 5:58 ` Andrew Morton
2005-11-23 6:36 ` Christoph Lameter
@ 2005-11-23 6:42 ` Christoph Lameter
2005-11-23 16:35 ` Linus Torvalds
2005-11-23 17:54 ` Rohit Seth
3 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2005-11-23 6:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rohit Seth, torvalds, linux-mm, linux-kernel

On Tue, 22 Nov 2005, Andrew Morton wrote:

> +extern int drain_local_pages(void);

drain_cpu_pcps?

The naming scheme is a bit confusing right now. We drain the pcp
structures, not pages, so maybe switch to pcp and then name each function
so that the functions can be distinguished clearly?

> +static int drain_all_local_pages(void)

drain_all_pcps?

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 6:42 ` Christoph Lameter
@ 2005-11-23 16:35 ` Linus Torvalds
2005-11-23 17:03 ` Christoph Lameter
0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2005-11-23 16:35 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, Rohit Seth, linux-mm, linux-kernel

On Tue, 22 Nov 2005, Christoph Lameter wrote:
> On Tue, 22 Nov 2005, Andrew Morton wrote:
> > > +extern int drain_local_pages(void);
>
> drain_cpu_pcps?

Please no.

If there is something I _hate_ it's bad naming. And "pcps" is a totally
unintelligible name.

Write it out. If a function is so trivial that you can't be bothered to
write out what the name means, that function shouldn't exist at all.
Conversely, if it's worth doing, it's worth writing out a name.

Linus

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 16:35 ` Linus Torvalds
@ 2005-11-23 17:03 ` Christoph Lameter
0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2005-11-23 17:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Rohit Seth, linux-mm, linux-kernel

On Wed, 23 Nov 2005, Linus Torvalds wrote:

> On Tue, 22 Nov 2005, Christoph Lameter wrote:
> > On Tue, 22 Nov 2005, Andrew Morton wrote:
> > > > +extern int drain_local_pages(void);
> > drain_cpu_pcps?
>
> Please no.
>
> If there is something I _hate_ it's bad naming. And "pcps" is a totally
> unintelligible name.
>
> Write it out. If a function is so trivial that you can't be bothered to
> write out what the name means, that function shouldn't exist at all.
> Conversely, if it's worth doing, it's worth writing out a name.

drain_one_cpus_pages_from_per_cpu_pagesets()
drain_one_cpus_remote_pages_from_per_cpu_pagesets()
drain_all_per_cpu_pagesets()

?

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 5:36 ` Andrew Morton ` (2 preceding siblings ...) 2005-11-23 6:42 ` Christoph Lameter @ 2005-11-23 17:54 ` Rohit Seth 2005-11-23 18:06 ` Mel Gorman 3 siblings, 1 reply; 26+ messages in thread From: Rohit Seth @ 2005-11-23 17:54 UTC (permalink / raw) To: Andrew Morton; +Cc: torvalds, linux-mm, linux-kernel, Christoph Lameter On Tue, 2005-11-22 at 21:36 -0800, Andrew Morton wrote: > Rohit Seth <rohit.seth@intel.com> wrote: > > > > Andrew, Linus, > > > > [PATCH]: This patch free pages (pcp->batch from each list at a time) from > > local pcp lists when a higher order allocation request is not able to > > get serviced from global free_list. > > > > This should help fix some of the earlier failures seen with order 1 allocations. > > > > I will send separate patches for: > > > > 1- Reducing the remote cpus pcp > > 2- Clean up page_alloc.c for CONFIG_HOTPLUG_CPU to use this code appropiately > > > > +static int > > +reduce_cpu_pcp(void ) > > > This significantly duplicates the existing drain_local_pages(). Yes. The main change in this new function is I'm only freeing batch number of pages from each pcp rather than draining out all of them (even under a little memory pressure). IMO, we should be more opportunistic here in alloc_pages in moving pages back to global page pool list. Thoughts? As said earlier, I will be cleaning up the existing drain_local_pages in next follow up patch. > > > > > + if (order > 0) > > + while (reduce_cpu_pcp()) { > > + if (get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags)) > > This forgot to assign to local variable `page'! It'll return NULL and will > leak memory. > My bad. Will fix it. > The `while' loop worries me for some reason, so I wimped out and just tried > the remote drain once. > Even after direct reclaim it probably does make sense to see how minimally we can service a higher order request. > > + goto got_pg; > > + } > > + /* FIXME: Add the support for reducing/draining the remote pcps. > > This is easy enough to do. > The couple of options that I wanted to think little more were (before attempting to do this part): 1- Whether use the IPI to get the remote CPUs to free pages from pcp or do it lazily (using work_pending or such). As at this point in execution we can definitely afford to get scheduled out. 2- Do we drain the whole pcp on remote processors or again follow the stepped approach (but may be with a steeper slope). > We need to verify that this patch actually does something useful. > > I'm working on this. Will let you know later today if I can come with some workload easily hitting this additional logic. Thanks, -rohit -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 17:54 ` Rohit Seth @ 2005-11-23 18:06 ` Mel Gorman 2005-11-23 19:41 ` Rohit Seth 2005-11-23 23:26 ` Rohit Seth 0 siblings, 2 replies; 26+ messages in thread From: Mel Gorman @ 2005-11-23 18:06 UTC (permalink / raw) To: Rohit Seth Cc: Andrew Morton, torvalds, linux-mm, linux-kernel, Christoph Lameter On Wed, 23 Nov 2005, Rohit Seth wrote: > On Tue, 2005-11-22 at 21:36 -0800, Andrew Morton wrote: > > Rohit Seth <rohit.seth@intel.com> wrote: > > > > > > Andrew, Linus, > > > > > > [PATCH]: This patch free pages (pcp->batch from each list at a time) from > > > local pcp lists when a higher order allocation request is not able to > > > get serviced from global free_list. > > > > > > This should help fix some of the earlier failures seen with order 1 allocations. > > > > > > I will send separate patches for: > > > > > > 1- Reducing the remote cpus pcp > > > 2- Clean up page_alloc.c for CONFIG_HOTPLUG_CPU to use this code appropiately > > > > > > +static int > > > +reduce_cpu_pcp(void ) > > > > > This significantly duplicates the existing drain_local_pages(). > > Yes. The main change in this new function is I'm only freeing batch > number of pages from each pcp rather than draining out all of them (even > under a little memory pressure). IMO, we should be more opportunistic > here in alloc_pages in moving pages back to global page pool list. > Thoughts? > I doubt you gain a whole lot by releasing them in batches. There is no way to determine if freeing a few will result in contiguous blocks or not and the overhead of been cautious will likely exceed the cost of simply refilling them on the next order-0 allocation. Your worst case is where the buddies you need are in different per-cpu caches. As it's easy to refill a per-cpu cache, it would be easier, clearer and probably faster to just purge the per-cpu cache and have it refilled on the next order-0 allocation. The release-in-batch approach would only be worthwhile if you expect an order-1 allocation to be very rare. > As said earlier, I will be cleaning up the existing drain_local_pages in > next follow up patch. > > > > > > > > > + if (order > 0) > > > + while (reduce_cpu_pcp()) { > > > + if (get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags)) > > > > This forgot to assign to local variable `page'! It'll return NULL and will > > leak memory. > > > My bad. Will fix it. > > > The `while' loop worries me for some reason, so I wimped out and just tried > > the remote drain once. > > > Even after direct reclaim it probably does make sense to see how > minimally we can service a higher order request. > After direct reclaim, things are already very slow. The cost of refilling a per-cpu cache is nowhere near the same as a direct-reclaim. > > > + goto got_pg; > > > + } > > > + /* FIXME: Add the support for reducing/draining the remote pcps. > > > > This is easy enough to do. > > > > The couple of options that I wanted to think little more were (before > attempting to do this part): > > 1- Whether use the IPI to get the remote CPUs to free pages from pcp or > do it lazily (using work_pending or such). As at this point in > execution we can definitely afford to get scheduled out. > In 005_drainpercpu.patch from the last version of the anti-defrag, I used the smp_call_function() and it did not seem to slow up the system. Certainly, by the time it was called, the system was already low on memory and trashing a bit so it just wasn't noticable. 
> 2- Do we drain the whole pcp on remote processors or again follow the > stepped approach (but may be with a steeper slope). > I would say do the same on the remote case as you do locally to keep things consistent. > > > We need to verify that this patch actually does something useful. > > > > > I'm working on this. Will let you know later today if I can come with > some workload easily hitting this additional logic. > I found it hard to generate reliable workloads which hit these sort of situations although a fork-heavy workload with 8k stacks will put pressure on order-1 allocations. You can artifically force high order allocations using vmregress by doing something like this; ./configure --with-linux=/usr/src/linux-yourtree make insmod kernel_src/core/vmregress_core.ko insmod kernel_src/core/buddyinfo.ko insmod kernel_src/test/highalloc.ko echo 1 100 > /proc/vmregress/test_highalloc That will allocate 1 order-1 page every tenth of a second until it has tried 100 allocations. When it completes, the success/failure report can be read by catting /proc/vmregress/test_highalloc. Obviously, this is very artifical. -- Mel Gorman Part-time Phd Student Java Applications Developer University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 18:06 ` Mel Gorman @ 2005-11-23 19:41 ` Rohit Seth 2005-11-24 9:25 ` Mel Gorman 2005-11-23 23:26 ` Rohit Seth 1 sibling, 1 reply; 26+ messages in thread From: Rohit Seth @ 2005-11-23 19:41 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, torvalds, linux-mm, linux-kernel, Christoph Lameter On Wed, 2005-11-23 at 18:06 +0000, Mel Gorman wrote: > On Wed, 23 Nov 2005, Rohit Seth wrote: > > > > I doubt you gain a whole lot by releasing them in batches. There is no way > to determine if freeing a few will result in contiguous blocks or not and > the overhead of been cautious will likely exceed the cost of simply > refilling them on the next order-0 allocation. It depends. If most of the higher order allocations are only order 1 (and may be order 2) then it is possible that we may gain in freeing in batches. > Your worst case is where > the buddies you need are in different per-cpu caches. > That is why we need another patch that tries to allocate physically contiguous pages in each per_cpu_pagelist. Actually this patch used to be there in Andrew's tree for some time (2.6.14) before couple of corner cases came up failing where order 1 allocations were unsuccessful. > As it's easy to refill a per-cpu cache, it would be easier, clearer and > probably faster to just purge the per-cpu cache and have it refilled on > the next order-0 allocation. The release-in-batch approach would only be > worthwhile if you expect an order-1 allocation to be very rare. > Well, my only fear is if this shunting happens too often... > In 005_drainpercpu.patch from the last version of the anti-defrag, I used > the smp_call_function() and it did not seem to slow up the system. > Certainly, by the time it was called, the system was already low on > memory and trashing a bit so it just wasn't noticable. > I agree at this point in alloaction, speed probably does not matter too much. I definitely want to first see for simple workloads how much (and how deep we have to go into deallocations) this extra logic helps. > > 2- Do we drain the whole pcp on remote processors or again follow the > > stepped approach (but may be with a steeper slope). > > > > I would say do the same on the remote case as you do locally to keep > things consistent. > Well, I think in bigger scope these allocations/deallocations will get automatically balanced. > > > > > We need to verify that this patch actually does something useful. > > > > > > > > I'm working on this. Will let you know later today if I can come with > > some workload easily hitting this additional logic. > > > > I found it hard to generate reliable workloads which hit these sort of > situations although a fork-heavy workload with 8k stacks will put pressure > on order-1 allocations. You can artifically force high order allocations > using vmregress by doing something like this; Need something more benign/stupid to kick into this logic. -rohit -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 19:41 ` Rohit Seth @ 2005-11-24 9:25 ` Mel Gorman 0 siblings, 0 replies; 26+ messages in thread From: Mel Gorman @ 2005-11-24 9:25 UTC (permalink / raw) To: Rohit Seth Cc: Andrew Morton, torvalds, linux-mm, linux-kernel, Christoph Lameter On Wed, 23 Nov 2005, Rohit Seth wrote: > On Wed, 2005-11-23 at 18:06 +0000, Mel Gorman wrote: > > On Wed, 23 Nov 2005, Rohit Seth wrote: > > > > > > > I doubt you gain a whole lot by releasing them in batches. There is no way > > to determine if freeing a few will result in contiguous blocks or not and > > the overhead of been cautious will likely exceed the cost of simply > > refilling them on the next order-0 allocation. > > It depends. If most of the higher order allocations are only order 1 > (and may be order 2) then it is possible that we may gain in freeing in > batches. > Possible, but if you are draining, it's just as handy to drain them all and avoid >0 order failures in the near future. > > Your worst case is where > > the buddies you need are in different per-cpu caches. > > > > That is why we need another patch that tries to allocate physically > contiguous pages in each per_cpu_pagelist. That will only delay the problem. Pages end up on the per-cpu lists from either an allocation or a free. While the allocation would make sure the pages were contiguous and on the same list, the frees will not. The test case you are running is a tight loop of one process allocating and freeing order-1 pages. This is probably staying on the one CPU so draining in batches on just the local CPU will appear successful. On long lived loads, it will not be as successful. Draining all the pages on all lists would cover all cases while using the existing code. When I was testing my version of drain-percpu for anti-defrag and order-10 allocations, I found that draining just the local CPU made little difference but draining all of them made a massive difference. This is an extreme case, but it still applies to the smaller orders. > Actually this patch used to > be there in Andrew's tree for some time (2.6.14) before couple of corner > cases came up failing where order 1 allocations were unsuccessful. > > > As it's easy to refill a per-cpu cache, it would be easier, clearer and > > probably faster to just purge the per-cpu cache and have it refilled on > > the next order-0 allocation. The release-in-batch approach would only be > > worthwhile if you expect an order-1 allocation to be very rare. > > > > Well, my only fear is if this shunting happens too often... > Measure it by counting how often you drain the pages and add it to frag_show(). This is a hack obviously and not a permanent solution, but it's the easiest way to find out what's going on. > > In 005_drainpercpu.patch from the last version of the anti-defrag, I used > > the smp_call_function() and it did not seem to slow up the system. > > Certainly, by the time it was called, the system was already low on > > memory and trashing a bit so it just wasn't noticable. > > > > I agree at this point in alloaction, speed probably does not matter too > much. I definitely want to first see for simple workloads how much (and > how deep we have to go into deallocations) this extra logic helps. > > > > 2- Do we drain the whole pcp on remote processors or again follow the > > > stepped approach (but may be with a steeper slope). > > > > > > > I would say do the same on the remote case as you do locally to keep > > things consistent. 
> > > > Well, I think in bigger scope these allocations/deallocations will get > automatically balanced. > Depends on if your workload involves one or more processes. If the load is multiple processes on multiple CPUs, the per-cpu pages will be spread out a lot. > > > > > > > We need to verify that this patch actually does something useful. > > > > > > > > > > > I'm working on this. Will let you know later today if I can come with > > > some workload easily hitting this additional logic. > > > > > > > I found it hard to generate reliable workloads which hit these sort of > > situations although a fork-heavy workload with 8k stacks will put pressure > > on order-1 allocations. You can artifically force high order allocations > > using vmregress by doing something like this; > > Need something more benign/stupid to kick into this logic. > If CIFS still needs high order allocations, you could try -jN kernel compiles over the network filesystem. Network benchmarks running over a loopback device with a large MTU while another load consumes memory might also trigger it. -- Mel Gorman Part-time Phd Student Java Applications Developer University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 18:06 ` Mel Gorman
2005-11-23 19:41 ` Rohit Seth
@ 2005-11-23 23:26 ` Rohit Seth
1 sibling, 0 replies; 26+ messages in thread
From: Rohit Seth @ 2005-11-23 23:26 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman
Cc: torvalds, linux-mm, linux-kernel, Christoph Lameter

On Wed, 23 Nov 2005, Rohit Seth wrote:
>
> On Tue, 2005-11-22 at 21:36 -0800, Andrew Morton wrote:
> > We need to verify that this patch actually does something useful.
> >
> I'm working on this. Will let you know later today if I can come with
> some workload easily hitting this additional logic.

I'm able to trigger the reduce_cpu_pcp (I'll change its name in next
update patch) logic after direct reclaim using a small test case hogging
memory and a bash loop spawning another process 1 at a time using very
little memory.

I added a single printk after the direct reclaim where we reduce the per
cpu pagelist (in my patch) just to get the order and how many iterations
do we need to service the request. order is always 1 (coming from
alloc_thread_info for 8K stack size). This is on i386 with 8K stack size.

	if (order > 0) {
		int i = 0;
		while (reduce_cpu_pcp()) {
			i++;
			page = get_page_from_freelist(gfp_mask, order,
						      zonelist, alloc_flags);
			if (page) {
				printk("Got page %d order iteration %d",
				       order, i);
				goto got_pg;
			}
		}
	}

And got about 30 of those in couple of hours:

[17179885.360000] Got page 1 order iteration 1

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 0:10 [PATCH]: Free pages from local pcp lists under tight memory conditions Rohit Seth
2005-11-23 5:36 ` Andrew Morton
@ 2005-11-23 19:30 ` Christoph Lameter
2005-11-23 19:46 ` Rohit Seth
1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2005-11-23 19:30 UTC (permalink / raw)
To: Rohit Seth; +Cc: akpm, torvalds, linux-mm, linux-kernel

On Tue, 22 Nov 2005, Rohit Seth wrote:

> [PATCH]: This patch free pages (pcp->batch from each list at a time) from
> local pcp lists when a higher order allocation request is not able to
> get serviced from global free_list.

Ummm.. One controversial idea: How about removing the complete pcp
subsystem? Last time we disabled pcps we saw that the effect that it had
was within noise ratio on AIM7. The lru lock taken without pcp is in the
local zone and thus rarely contended.

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 19:30 ` Christoph Lameter
@ 2005-11-23 19:46 ` Rohit Seth
2005-11-23 19:55 ` Andrew Morton
0 siblings, 1 reply; 26+ messages in thread
From: Rohit Seth @ 2005-11-23 19:46 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, torvalds, linux-mm, linux-kernel

On Wed, 2005-11-23 at 11:30 -0800, Christoph Lameter wrote:
> On Tue, 22 Nov 2005, Rohit Seth wrote:
>
> > [PATCH]: This patch free pages (pcp->batch from each list at a time) from
> > local pcp lists when a higher order allocation request is not able to
> > get serviced from global free_list.
>
> Ummm.. One controversial idea: How about removing the complete pcp
> subsystem? Last time we disabled pcps we saw that the effect
> that it had was within noise ratio on AIM7. The lru lock taken without
> pcp is in the local zone and thus rarely contended.

Oh please stop.

This per_cpu_pagelist is one great logic that has got added in
allocator. Besides providing pages without the need to acquire the zone
lock, it is one single main reason the coloring effect is drastically
reduced in 2.6 (over 2.4) based kernels.

-rohit

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 19:46 ` Rohit Seth @ 2005-11-23 19:55 ` Andrew Morton 2005-11-23 21:00 ` Rohit Seth 2005-11-23 22:01 ` Christoph Lameter 0 siblings, 2 replies; 26+ messages in thread From: Andrew Morton @ 2005-11-23 19:55 UTC (permalink / raw) To: Rohit Seth; +Cc: clameter, torvalds, linux-mm, linux-kernel Rohit Seth <rohit.seth@intel.com> wrote: > > On Wed, 2005-11-23 at 11:30 -0800, Christoph Lameter wrote: > > On Tue, 22 Nov 2005, Rohit Seth wrote: > > > > > [PATCH]: This patch free pages (pcp->batch from each list at a time) from > > > local pcp lists when a higher order allocation request is not able to > > > get serviced from global free_list. > > > > Ummm.. One controversial idea: How about removing the complete pcp > > subsystem? Last time we disabled pcps we saw that the effect > > that it had was within noise ratio on AIM7. The lru lock taken without > > pcp is in the local zone and thus rarely contended. > > Oh please stop. > > This per_cpu_pagelist is one great logic that has got added in > allocator. Besides providing pages without the need to acquire the zone > lock, it is one single main reason the coloring effect is drastically > reduced in 2.6 (over 2.4) based kernels. > hm. Before it was merged in 2.5.x, the feature was very marginal from a performance POV in my testing on 4-way. I was able to demonstrate a large (~60%?) speedup in one microbenckmark which consisted of four processes writing 16k to a file and truncating it back to zero again. That gain came from the cache warmth effect, which is the other benefit which these cpu-local pages are supposed to provide. I don't think Martin was able to demonstrate much benefit from the lock contention reduction on 16-way NUMAQ either. So I dithered for months and it was a marginal merge, so it's appropriate to justify the continued presence of the code. We didn't measure for any coloring effects though. In fact, I didn't know that this feature actually provided any benefit in that area. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 19:55 ` Andrew Morton @ 2005-11-23 21:00 ` Rohit Seth 2005-11-23 21:25 ` Christoph Lameter ` (2 more replies) 2005-11-23 22:01 ` Christoph Lameter 1 sibling, 3 replies; 26+ messages in thread From: Rohit Seth @ 2005-11-23 21:00 UTC (permalink / raw) To: Andrew Morton; +Cc: clameter, torvalds, linux-mm, linux-kernel On Wed, 2005-11-23 at 11:55 -0800, Andrew Morton wrote: > Rohit Seth <rohit.seth@intel.com> wrote: > > > > On Wed, 2005-11-23 at 11:30 -0800, Christoph Lameter wrote: > > > On Tue, 22 Nov 2005, Rohit Seth wrote: > > > > > > > [PATCH]: This patch free pages (pcp->batch from each list at a time) from > > > > local pcp lists when a higher order allocation request is not able to > > > > get serviced from global free_list. > > > > > > Ummm.. One controversial idea: How about removing the complete pcp > > > subsystem? Last time we disabled pcps we saw that the effect > > > that it had was within noise ratio on AIM7. The lru lock taken without > > > pcp is in the local zone and thus rarely contended. > > > > Oh please stop. > > > > This per_cpu_pagelist is one great logic that has got added in > > allocator. Besides providing pages without the need to acquire the zone > > lock, it is one single main reason the coloring effect is drastically > > reduced in 2.6 (over 2.4) based kernels. > > > > hm. Before it was merged in 2.5.x, the feature was very marginal from a > performance POV in my testing on 4-way. > One less lock to worry about in the page allocation should help somewhere I would assume. > I was able to demonstrate a large (~60%?) speedup in one microbenckmark > which consisted of four processes writing 16k to a file and truncating it > back to zero again. That gain came from the cache warmth effect, which is > the other benefit which these cpu-local pages are supposed to provide. > That is right. > I don't think Martin was able to demonstrate much benefit from the lock > contention reduction on 16-way NUMAQ either. > > So I dithered for months and it was a marginal merge, so it's appropriate > to justify the continued presence of the code. > May be the limits on the number of pages hanging on the per_cpu_pagelist was (or even now is) too small (for them to give any meaningful gain). May be we should have more physical contiguity in each of these pcps to give better cache spread. > We didn't measure for any coloring effects though. In fact, I didn't know > that this feature actually provided any benefit in that area. I thought Nick et.al came up with some of the constant values like batch size to tackle the page coloring issue specifically. In any case, I think one of the key difference between 2.4 and 2.6 allocators is the pcp list. And even with the minuscule batch and high watermarks this is helping ordinary benchmarks (by reducing the variation from run to run). -rohit -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 21:00 ` Rohit Seth
@ 2005-11-23 21:25 ` Christoph Lameter
2005-11-23 22:29 ` Rohit Seth
0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2005-11-23 21:25 UTC (permalink / raw)
To: Rohit Seth; +Cc: Andrew Morton, torvalds, linux-mm, linux-kernel

On Wed, 23 Nov 2005, Rohit Seth wrote:

> I thought Nick et.al came up with some of the constant values like batch
> size to tackle the page coloring issue specifically. In any case, I
> think one of the key difference between 2.4 and 2.6 allocators is the
> pcp list. And even with the minuscule batch and high watermarks this is
> helping ordinary benchmarks (by reducing the variation from run to run).

Could you share some benchmark results?

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 21:25 ` Christoph Lameter
@ 2005-11-23 22:29 ` Rohit Seth
0 siblings, 0 replies; 26+ messages in thread
From: Rohit Seth @ 2005-11-23 22:29 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, torvalds, linux-mm, linux-kernel

On Wed, 2005-11-23 at 13:25 -0800, Christoph Lameter wrote:
> On Wed, 23 Nov 2005, Rohit Seth wrote:
>
> > I thought Nick et.al came up with some of the constant values like batch
> > size to tackle the page coloring issue specifically. In any case, I
> > think one of the key difference between 2.4 and 2.6 allocators is the
> > pcp list. And even with the minuscule batch and high watermarks this is
> > helping ordinary benchmarks (by reducing the variation from run to run).
>
> Could you share some benchmark results?

Some components of cpu2k on 2.4-based kernels show in excess of 40-50%
variation from run to run. The same variations came down to about 10%
for 2.6-based kernels.

-rohit

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 21:00 ` Rohit Seth 2005-11-23 21:25 ` Christoph Lameter @ 2005-11-23 21:26 ` Andrew Morton 2005-11-23 21:40 ` Rohit Seth 2005-11-24 3:02 ` Paul Jackson 2 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2005-11-23 21:26 UTC (permalink / raw) To: Rohit Seth; +Cc: clameter, torvalds, linux-mm, linux-kernel Rohit Seth <rohit.seth@intel.com> wrote: > > > I don't think Martin was able to demonstrate much benefit from the lock > > contention reduction on 16-way NUMAQ either. > > > > So I dithered for months and it was a marginal merge, so it's appropriate > > to justify the continued presence of the code. > > > > May be the limits on the number of pages hanging on the per_cpu_pagelist > was (or even now is) too small (for them to give any meaningful gain). > May be we should have more physical contiguity in each of these pcps to > give better cache spread. Could be. The initial settings were pretty arbitrary - I assumed that someone would get in and tune them up, but nothing much happened. Perhaps we should expose the thresholds in /proc/sys/vm so they're easier to play with. > > We didn't measure for any coloring effects though. In fact, I didn't know > > that this feature actually provided any benefit in that area. > > I thought Nick et.al came up with some of the constant values like batch > size to tackle the page coloring issue specifically. In any case, I > think one of the key difference between 2.4 and 2.6 allocators is the > pcp list. And even with the minuscule batch and high watermarks this is > helping ordinary benchmarks (by reducing the variation from run to run). OK, useful info, thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 21:26 ` Andrew Morton
@ 2005-11-23 21:40 ` Rohit Seth
0 siblings, 0 replies; 26+ messages in thread
From: Rohit Seth @ 2005-11-23 21:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: clameter, torvalds, linux-mm, linux-kernel

On Wed, 2005-11-23 at 13:26 -0800, Andrew Morton wrote:
> Rohit Seth <rohit.seth@intel.com> wrote:
> >
> > > I don't think Martin was able to demonstrate much benefit from the lock
> > > contention reduction on 16-way NUMAQ either.
> > >
> > > So I dithered for months and it was a marginal merge, so it's appropriate
> > > to justify the continued presence of the code.
> >
> > May be the limits on the number of pages hanging on the per_cpu_pagelist
> > was (or even now is) too small (for them to give any meaningful gain).
> > May be we should have more physical contiguity in each of these pcps to
> > give better cache spread.
>
> Could be. The initial settings were pretty arbitrary - I assumed that
> someone would get in and tune them up, but nothing much happened. Perhaps
> we should expose the thresholds in /proc/sys/vm so they're easier to play
> with.

Most certainly. If I had a patch ready...I would have given you one right
away :-) Though I will work on it...

It surely is unfortunate that we have not dug deeper into this area (in
terms of optimizations).

-rohit

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-23 21:00 ` Rohit Seth 2005-11-23 21:25 ` Christoph Lameter 2005-11-23 21:26 ` Andrew Morton @ 2005-11-24 3:02 ` Paul Jackson 2005-11-29 23:18 ` Rohit Seth 2 siblings, 1 reply; 26+ messages in thread From: Paul Jackson @ 2005-11-24 3:02 UTC (permalink / raw) To: Rohit Seth Cc: akpm, clameter, torvalds, linux-mm, linux-kernel, steiner, nickpiggin Rohit wrote: > I thought Nick et.al came up with some of the constant values like batch > size to tackle the page coloring issue specifically. I think this came about on a linux-ia64 thread started by Jack Steiner: http://www.gelato.unsw.edu.au/archives/linux-ia64/0504/13668.html Subject: per_cpu_pagesets degrades MPI performance From: Jack Steiner <steiner_at_sgi.com> Date: 2005-04-05 05:28:27 Jack reported that per_cpu_pagesets were degrading some MPI benchmarks due to adverse page coloring. Nick responded, recommending a non-power of two batch size. Jack found that this helped nicely. This thread trails off, but seems to be the origins of the 2**n-1 batch size in: mm/page_alloc.c:zone_batchsize() * Clamp the batch to a 2^n - 1 value. Having a power ... batch = (1 << fls(batch + batch/2)) - 1; I don't see here evidence that "per_cpu_pagelist is ... one single main reason the coloring effect is drastically reduced in 2.6 (over 2.4) based kernels." Rather in this case anyway a batch size not a power of two was apparently needed to keep per_cpu_pagesets from hurting performance due to page coloring affects on some workloads. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions 2005-11-24 3:02 ` Paul Jackson @ 2005-11-29 23:18 ` Rohit Seth 2005-12-01 14:44 ` Paul Jackson 0 siblings, 1 reply; 26+ messages in thread From: Rohit Seth @ 2005-11-29 23:18 UTC (permalink / raw) To: Paul Jackson Cc: akpm, clameter, torvalds, linux-mm, linux-kernel, steiner, nickpiggin On Wed, 2005-11-23 at 19:02 -0800, Paul Jackson wrote: > Rohit wrote: > > I thought Nick et.al came up with some of the constant values like batch > > size to tackle the page coloring issue specifically. > > I think this came about on a linux-ia64 thread started by Jack Steiner: > > http://www.gelato.unsw.edu.au/archives/linux-ia64/0504/13668.html > Subject: per_cpu_pagesets degrades MPI performance > From: Jack Steiner <steiner_at_sgi.com> > Date: 2005-04-05 05:28:27 > > Jack reported that per_cpu_pagesets were degrading some MPI benchmarks due > to adverse page coloring. Nick responded, recommending a non-power of two > batch size. Jack found that this helped nicely. This thread trails off, > but seems to be the origins of the 2**n-1 batch size in: > > mm/page_alloc.c:zone_batchsize() > * Clamp the batch to a 2^n - 1 value. Having a power ... > batch = (1 << fls(batch + batch/2)) - 1; > > I don't see here evidence that "per_cpu_pagelist is ... one single main > reason the coloring effect is drastically reduced in 2.6 (over 2.4) > based kernels." Rather in this case anyway a batch size not a power of > two was apparently needed to keep per_cpu_pagesets from hurting > performance due to page coloring affects on some workloads. > Well, the batch size of a list ( + the high mark) are integral part of per_cpu_pagelist infrastructure. Tuning is always required. I don't think though one fixed set of values is fixing all the cases. Can you please comment on the performance delta on the MPI workload because of this change in batch values. And what were the numbers before per_cpu_pagelists were introduced. thanks, -rohit -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-29 23:18 ` Rohit Seth
@ 2005-12-01 14:44 ` Paul Jackson
2005-12-02 0:32 ` Nick Piggin
0 siblings, 1 reply; 26+ messages in thread
From: Paul Jackson @ 2005-12-01 14:44 UTC (permalink / raw)
To: Rohit Seth
Cc: akpm, clameter, torvalds, linux-mm, linux-kernel, steiner, nickpiggin

Rohit wrote:
> Can you please comment on the performance delta on the MPI workload
> because of this change in batch values.

I can't -- all I know is what I read in Jack Steiner's posts
of April 5, 2005, referenced earlier in this thread.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-12-01 14:44 ` Paul Jackson
@ 2005-12-02 0:32 ` Nick Piggin
0 siblings, 0 replies; 26+ messages in thread
From: Nick Piggin @ 2005-12-02 0:32 UTC (permalink / raw)
To: Paul Jackson
Cc: Rohit Seth, akpm, clameter, torvalds, linux-mm, linux-kernel, steiner

Paul Jackson wrote:
> Rohit wrote:
>
>> Can you please comment on the performance delta on the MPI workload
>> because of this change in batch values.
>
> I can't -- all I know is what I read in Jack Steiner's posts
> of April 5, 2005, referenced earlier in this thread.

It was something fairly large. Basically having a power of 2 batch size
meant that 2 concurrent allocators (presumably setting up the working
area) would alternately pull in power of 2 chunks of memory, which caused
each CPU to only get pages of ~half of its cache's possible colours.

The fix is not by any means a single value for all workloads, it simply
avoids powers of 2 batch size.

Note this will have very little effect on single threaded allocators and
will do nothing for cache colouring there, however it is important for
concurrent allocators.

Nick

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

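
A quick way to see the effect Nick describes above (a stand-alone sketch in
plain user-space C, not kernel code; the 32-colour cache and the 64-chunk run
are arbitrary choices for the demo): when two CPUs alternately pull
batch-sized runs of consecutive page frames, a power-of-two batch leaves each
CPU with only half of the possible cache colours, while a 2^n - 1 batch lets
the colours rotate through the full set.

#include <stdio.h>

#define COLOURS 32

/* Two CPUs take turns refilling from a contiguous run of page frames;
 * count how many distinct cache colours (pfn % COLOURS) CPU A ends up
 * with for a given batch size. */
static void count_colours(int batch)
{
        int seen[COLOURS] = { 0 };
        unsigned long pfn = 0;
        int chunk, i, n = 0;

        for (chunk = 0; chunk < 64; chunk++)
                for (i = 0; i < batch; i++, pfn++)
                        if (chunk % 2 == 0)     /* chunks taken by CPU A */
                                seen[pfn % COLOURS] = 1;

        for (i = 0; i < COLOURS; i++)
                n += seen[i];
        printf("batch %2d: CPU A sees %2d of %d colours\n", batch, n, COLOURS);
}

int main(void)
{
        count_colours(16);      /* power of two: 16 of 32 colours */
        count_colours(15);      /* 2^n - 1:      all 32 colours   */
        return 0;
}

This matches the zone_batchsize() clamp quoted earlier in the thread, which
rounds the batch to a 2^n - 1 value.
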
* Re: [PATCH]: Free pages from local pcp lists under tight memory conditions
2005-11-23 19:55 ` Andrew Morton
2005-11-23 21:00 ` Rohit Seth
@ 2005-11-23 22:01 ` Christoph Lameter
1 sibling, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2005-11-23 22:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rohit Seth, torvalds, linux-mm, linux-kernel

On Wed, 23 Nov 2005, Andrew Morton wrote:

> I was able to demonstrate a large (~60%?) speedup in one microbenckmark
> which consisted of four processes writing 16k to a file and truncating it
> back to zero again. That gain came from the cache warmth effect, which is
> the other benefit which these cpu-local pages are supposed to provide.

Maybe we can cut the pcp logic back to only put cache warm pages into a
single per_cpu pcp list for the local processor that contains node local
pages? Return all remote pages and cold pages directly to the buddy lists.

That way we can remove the logic to flush remote pages and remove those
pcp lists for remote nodes that are mostly empty anyways.
