* [PATCH] Periodically drain non local pagesets
@ 2005-06-01 17:48 Christoph Lameter
From: Christoph Lameter @ 2005-06-01 17:48 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, linux-ia64, linux-kernel
The pageset array can tie up a huge amount of memory on large NUMA systems.
For example, on a system with 512 processors and 256 nodes there will be
256*512 pagesets. If each pageset holds only 5 pages, that is 655360 pages.
With a 16K page size on IA64 this means potentially 10 gigabytes of memory
trapped in pagesets. The typical case on smaller systems is much less, but
memory can still be trapped in off-node pagesets. Off-node memory may rarely
be used while local memory is available, so without this patch memory may sit
idle in seldom-used pagesets.
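The arithmetic above can be checked with a small user-space sketch. The function and parameter names are illustrative only and are not kernel identifiers; the model follows the simplification in the text of one pageset per (node, cpu) pair.

```c
#include <stdint.h>

/*
 * Worst-case number of pages held in pagesets, assuming one pageset per
 * (node, cpu) pair and pages_per_set pages cached in each.
 * Hypothetical helper, not part of the patch.
 */
uint64_t trapped_pageset_pages(uint64_t nodes, uint64_t cpus,
			       uint64_t pages_per_set)
{
	return nodes * cpus * pages_per_set;
}

/* The same quantity in bytes for a given page size. */
uint64_t trapped_pageset_bytes(uint64_t nodes, uint64_t cpus,
			       uint64_t pages_per_set, uint64_t page_size)
{
	return trapped_pageset_pages(nodes, cpus, pages_per_set) * page_size;
}
```

With 256 nodes, 512 cpus, 5 pages per pageset and 16384-byte pages this gives 655360 pages, or 10 * 2^30 bytes.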
The slab allocator flushes its per cpu caches every 2 seconds. The following patch
flushes the off node pageset caches in the same way by tying into the slab flush.
The patch also changes /proc/zoneinfo to include the number of
pages currently in each pageset.
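As a rough user-space model of the idea (not the kernel code), draining amounts to walking every zone, skipping the ones on the local node, and handing this cpu's cached pages back to the zone. All types and names below are stand-ins invented for the illustration.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4		/* hypothetical size for the model */

struct model_pcp {
	int count;		/* pages cached on one list */
};

struct model_zone {
	int node_id;		/* node this zone belongs to */
	int free_pages;		/* pages on the zone's own free lists */
	struct model_pcp pcp[MODEL_NR_CPUS][2];	/* hot and cold list per cpu */
};

/*
 * Return the pages that `cpu` has cached in zones outside `local_node`
 * to those zones. Pagesets of local zones are left untouched.
 * Returns the number of pages drained.
 */
int model_drain_remote(struct model_zone *zones, size_t nr_zones,
		       int cpu, int local_node)
{
	int drained = 0;
	size_t z;
	int i;

	for (z = 0; z < nr_zones; z++) {
		struct model_zone *zone = &zones[z];

		if (zone->node_id == local_node)
			continue;

		for (i = 0; i < 2; i++) {
			struct model_pcp *pcp = &zone->pcp[cpu][i];

			zone->free_pages += pcp->count;
			drained += pcp->count;
			pcp->count = 0;
		}
	}
	return drained;
}
```

The model ignores locking and interrupt state; in the kernel these matter because the per-cpu lists are also touched from the allocator paths.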
Patch against 2.6.12-rc5-mm1
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.12-rc5/include/linux/gfp.h
===================================================================
--- linux-2.6.12-rc5.orig/include/linux/gfp.h 2005-05-27 16:39:48.000000000 -0700
+++ linux-2.6.12-rc5/include/linux/gfp.h 2005-06-01 10:40:04.000000000 -0700
@@ -135,5 +135,10 @@ extern void FASTCALL(free_cold_page(stru
#define free_page(addr) free_pages((addr),0)
void page_alloc_init(void);
+#ifdef CONFIG_NUMA
+void drain_remote_pages(void);
+#else
+static inline void drain_remote_pages(void) { }
+#endif
#endif /* __LINUX_GFP_H */
Index: linux-2.6.12-rc5/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5.orig/mm/page_alloc.c 2005-05-27 16:40:20.000000000 -0700
+++ linux-2.6.12-rc5/mm/page_alloc.c 2005-06-01 10:41:25.000000000 -0700
@@ -515,6 +515,36 @@ static int rmqueue_bulk(struct zone *zon
return allocated;
}
+#ifdef CONFIG_NUMA
+/* Called from the slab reaper to drain remote pagesets */
+void drain_remote_pages(void)
+{
+	struct zone *zone;
+	int i;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	for_each_zone(zone) {
+		struct per_cpu_pageset *pset;
+
+		/* Do not drain local pagesets */
+		if (zone == zone_table[numa_node_id()])
+			continue;
+
+		pset = zone->pageset[smp_processor_id()];
+		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
+			struct per_cpu_pages *pcp;
+
+			pcp = &pset->pcp[i];
+			if (pcp->count)
+				pcp->count -= free_pages_bulk(zone, pcp->count,
+						&pcp->list, 0);
+		}
+	}
+	local_irq_restore(flags);
+}
+#endif
+
#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU)
static void __drain_pages(unsigned int cpu)
{
@@ -1385,12 +1415,13 @@ void show_free_areas(void)
 			pageset = zone_pcp(zone, cpu);
 			for (temperature = 0; temperature < 2; temperature++)
-				printk("cpu %d %s: low %d, high %d, batch %d\n",
+				printk("cpu %d %s: low %d, high %d, batch %d used:%d\n",
 					cpu,
 					temperature ? "cold" : "hot",
 					pageset->pcp[temperature].low,
 					pageset->pcp[temperature].high,
-					pageset->pcp[temperature].batch);
+					pageset->pcp[temperature].batch,
+					pageset->pcp[temperature].count);
 		}
 	}
Index: linux-2.6.12-rc5/mm/slab.c
===================================================================
--- linux-2.6.12-rc5.orig/mm/slab.c 2005-05-27 16:41:36.000000000 -0700
+++ linux-2.6.12-rc5/mm/slab.c 2005-06-01 10:22:18.000000000 -0700
@@ -3471,6 +3471,7 @@ next:
 	}
 	check_irq_on();
 	up(&cache_chain_sem);
+	drain_remote_pages();
 	/* Setup the next iteration */
 	schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC + smp_processor_id());
 }
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
* Re: [PATCH] Periodically drain non local pagesets
From: Dave Hansen @ 2005-06-01 18:46 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, linux-mm, ia64 list, Linux Kernel Mailing List
On Wed, 2005-06-01 at 10:48 -0700, Christoph Lameter wrote:
> + struct per_cpu_pageset *pset;
> +
> + /* Do not drain local pagesets */
> + if (zone == zone_table[numa_node_id()])
> + continue;
> +
It's best to avoid using NUMA-specific data structures, even in #ifdef
NUMA code. This particular use is incorrect, as the zone_table[] is not
indexed by numa_node_id(), but rather by a combination of the node
number and the zone number (see NODEZONE()).
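To illustrate the indexing problem, here is a small sketch of how a NODEZONE()-style index is formed. The shift value and macro body are assumptions modelled on the 2.6.12-era definition and should be checked against include/linux/mm.h in the tree in question; the MODEL_ names are invented here.

```c
/*
 * Assumed to mirror the 2.6.12-era NODEZONE(): the node number is shifted
 * left and the zone number occupies the low bits. Values are assumptions.
 */
#define MODEL_ZONES_SHIFT	2
#define MODEL_NODEZONE(node, zone) \
	(((node) << MODEL_ZONES_SHIFT) | (zone))

/* Index of zone `zone_idx` of node `node` in a zone_table-like array. */
int model_zone_table_index(int node, int zone_idx)
{
	return MODEL_NODEZONE(node, zone_idx);
}
```

Under these assumptions, index 1 refers to zone 1 of node 0, while the first zone of node 1 sits at index 4. Indexing the table by a bare node number therefore picks the wrong zone for every node except node 0, and compares against only a single zone even then.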
I'd suggest using something like this:
	if (zone->zone_pgdat->node_id == numa_node_id())
It might be nice to have a zone_node_id() macro that hides this as well.
If that macro is #defined to 0 when !CONFIG_NUMA, the #ifdef around that
function could probably go away.
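A minimal sketch of what such a macro could look like, using stand-in structures so that it compiles outside the kernel. zone_node_id() is the name proposed above and did not exist at the time; MODEL_CONFIG_NUMA and the _model types are made up for this illustration.

```c
/* Remove this define to model a !CONFIG_NUMA build. */
#define MODEL_CONFIG_NUMA 1

/* Stand-ins for the kernel structures, reduced to the fields used here. */
struct pglist_data_model {
	int node_id;
};

struct zone_model {
	struct pglist_data_model *zone_pgdat;
};

#ifdef MODEL_CONFIG_NUMA
#define zone_node_id(zone)	((zone)->zone_pgdat->node_id)
#else
/* Everything lives on node 0; still evaluate the argument once. */
#define zone_node_id(zone)	((void)(zone), 0)
#endif

/* Nonzero if `zone` belongs to node `this_node`. */
int zone_is_local(struct zone_model *zone, int this_node)
{
	return zone_node_id(zone) == this_node;
}
```

In the non-NUMA variant every zone compares as local to node 0, so a loop that skips local zones would do nothing, which is the behaviour the empty inline stub provides today.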
Also, are you sure that you need the local_irq_en/disable()?
-- Dave
* Re: [PATCH] Periodically drain non local pagesets
From: Christoph Lameter @ 2005-06-01 18:59 UTC (permalink / raw)
To: Dave Hansen; +Cc: Andrew Morton, linux-mm, ia64 list, Linux Kernel Mailing List
On Wed, 1 Jun 2005, Dave Hansen wrote:
> Also, are you sure that you need the local_irq_en/disable()?
drain_pages() does the same. We would run into trouble if an
interrupt were to use the page allocator while the lists are being drained.
Fix for the zone comparison:
Index: linux-2.6.12-rc5/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5.orig/mm/page_alloc.c 2005-06-01 10:41:25.000000000 -0700
+++ linux-2.6.12-rc5/mm/page_alloc.c 2005-06-01 11:57:55.000000000 -0700
@@ -528,7 +528,7 @@ void drain_remote_pages(void)
 		struct per_cpu_pageset *pset;
 		/* Do not drain local pagesets */
-		if (zone == zone_table[numa_node_id()])
+		if (zone->zone_pgdat->node_id == numa_node_id())
 			continue;
 		pset = zone->pageset[smp_processor_id()];