* [RFC][PATCH 0/2] Quicklist is slightly problematic.
@ 2008-08-20 11:05 KOSAKI Motohiro
2008-08-20 11:07 ` [RFC][PATCH 1/2] Show quicklist at meminfo KOSAKI Motohiro
` (3 more replies)
0 siblings, 4 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20 11:05 UTC
To: LKML, linux-mm, Andrew Morton, Christoph Lameter
Cc: kosaki.motohiro, tokunaga.keiich
Hi Christoph,
Thank you for explaining your quicklist plan at OLS.
So I have summarized the issues with the quicklist.
If you have a little time, could you please read this mail and the patches?
And, if possible, could you tell me what you think?
--------------------------------------------------------------------
At present, the quicklist keeps some pages on each CPU as a cache
(each CPU can hold up to node_free_pages/16 pages);
it is used as a page table cache.
exit() grows the cache; fork(), on the other hand, consumes it.
So, when Apache-style middleware (one parent, many children) runs,
one CPU handles fork() while the other CPUs do the middleware work and call exit().
At that point, the forking CPU has no page table cache at all,
while all the others hold the maximum cache.
QList_max = (#ofCPUs - 1) x Free / 16
=> QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)
So, how much memory can the quicklist consume in the worst case?
It grows in proportion to the number of CPUs, because the quicklist is a per-CPU cache but the cache size calculation does not take #ofCPUs into account.
The calculation above gives:
Number of CPUs per node            2     4     8    16
==============================  ====  ====  ====  ====
QList_max / (Free + QList_max)  5.8%   16%   30%   48%
Wow! In the worst case, the quicklist can consume about 50% of memory.
Worse, it has no cache-shrinking mechanism at all.
This causes the following problems:
1. End users may mistake it for a memory leak.
=> /proc/meminfo should display the amount of quicklist memory.
2. It can trigger the OOM killer.
=> The amount of quicklist memory shouldn't be proportional to #ofCPUs.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-20 11:05 [RFC][PATCH 0/2] Quicklist is slightly problematic KOSAKI Motohiro
@ 2008-08-20 11:07 ` KOSAKI Motohiro
2008-08-20 18:35 ` Andrew Morton
2008-08-20 11:08 ` [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs KOSAKI Motohiro
` (2 subsequent siblings)
3 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20 11:07 UTC
To: LKML, linux-mm, Andrew Morton, Christoph Lameter, tokunaga.keiich
Cc: kosaki.motohiro

The quicklist can now consume several GB of memory.
If end users cannot see how much memory it uses, they may mistake it for a memory leak.

After this patch is applied, /proc/meminfo shows the following:

% cat /proc/meminfo

MemTotal: 7701504 kB
MemFree: 5159040 kB
Buffers: 112960 kB
Cached: 337536 kB
SwapCached: 0 kB
Active: 218944 kB
Inactive: 350848 kB
Active(anon): 120832 kB
Inactive(anon): 0 kB
Active(file): 98112 kB
Inactive(file): 350848 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2031488 kB
SwapFree: 2031488 kB
Dirty: 320 kB
Writeback: 0 kB
AnonPages: 119488 kB
Mapped: 38528 kB
Slab: 1595712 kB
SReclaimable: 23744 kB
SUnreclaim: 1571968 kB
PageTables: 14336 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5882240 kB
Committed_AS: 356672 kB
VmallocTotal: 17592177655808 kB
VmallocUsed: 29056 kB
VmallocChunk: 17592177626304 kB
Quicklists: 283776 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 262144 kB

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

---
 fs/proc/proc_misc.c       |    6 ++++--
 include/linux/quicklist.h |    7 +++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

Index: b/fs/proc/proc_misc.c
===================================================================
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -202,7 +202,8 @@ static int meminfo_read_proc(char *page,
 		"Committed_AS: %8lu kB\n"
 		"VmallocTotal: %8lu kB\n"
 		"VmallocUsed: %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"VmallocChunk: %8lu kB\n"
+		"Quicklists: %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
@@ -242,7 +243,8 @@ static int meminfo_read_proc(char *page,
 		K(committed),
 		(unsigned long)VMALLOC_TOTAL >> 10,
 		vmi.used >> 10,
-		vmi.largest_chunk >> 10
+		vmi.largest_chunk >> 10,
+		K(quicklist_total_size())
 		);
 
 	len += hugetlb_report_meminfo(page + len);
Index: b/include/linux/quicklist.h
===================================================================
--- a/include/linux/quicklist.h
+++ b/include/linux/quicklist.h
@@ -80,6 +80,13 @@ void quicklist_trim(int nr, void (*dtor)
 
 unsigned long quicklist_total_size(void);
 
+#else
+
+static inline unsigned long quicklist_total_size(void)
+{
+	return 0;
+}
+
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-20 11:07 ` [RFC][PATCH 1/2] Show quicklist at meminfo KOSAKI Motohiro
@ 2008-08-20 18:35 ` Andrew Morton
2008-08-21 7:36 ` KOSAKI Motohiro
0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2008-08-20 18:35 UTC
To: KOSAKI Motohiro; +Cc: linux-kernel, linux-mm, cl, tokunaga.keiich

On Wed, 20 Aug 2008 20:07:06 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Now, Quicklist can spent several GB memory.
> So, if end user can't hou much spent memory, he misunderstand to memory leak happend.
>
>
> after this patch applied, /proc/meminfo output following.
>
> % cat /proc/meminfo
>
> [...]
> Quicklists: 283776 kB
> [...]
>
> ...
>
>  K(committed),
>  (unsigned long)VMALLOC_TOTAL >> 10,
>  vmi.used >> 10,
> - vmi.largest_chunk >> 10
> + vmi.largest_chunk >> 10,
> + K(quicklist_total_size())
>  );

quicklist_total_size() is racy against cpu hotplug. That's OK for
/proc/meminfo purposes (occasional transient inaccuracy?), but will it
crash? Not in the current implementation of per_cpu() afaict, but it
might crash if we ever teach cpu hotunplug to free up the percpu
resources.
I see no cpu hotplug handling in the quicklist code. Do we leak all
the hot-unplugged CPU's pages?
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-20 18:35 ` Andrew Morton
@ 2008-08-21 7:36 ` KOSAKI Motohiro
2008-08-22 1:05 ` KOSAKI Motohiro
0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 7:36 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-mm, cl, tokunaga.keiich

> quicklist_total_size() is racy against cpu hotplug. That's OK for
> /proc/meminfo purposes (occasional transient inaccuracy?), but will it
> crash? Not in the current implementation of per_cpu() afaict, but it
> might crash if we ever teach cpu hotunplug to free up the percpu
> resources.

First, the quicklist code does not handle CPU hotplug at all.
That is a separate quicklist problem.

Next, I don't think it causes a crash, but I haven't tested it yet.
So I'll run a CPU hotplug/unplug test today
and report the result tomorrow.

> I see no cpu hotplug handling in the quicklist code. Do we leak all
> the hot-unplugged CPU's pages?

Yes.

Thanks!
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-21 7:36 ` KOSAKI Motohiro
@ 2008-08-22 1:05 ` KOSAKI Motohiro
2008-08-22 4:28 ` Andrew Morton
0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-22 1:05 UTC
To: KOSAKI Motohiro
Cc: Andrew Morton, linux-kernel, linux-mm, cl, tokunaga.keiich

> > quicklist_total_size() is racy against cpu hotplug. That's OK for
> > /proc/meminfo purposes (occasional transient inaccuracy?), but will it
> > crash? Not in the current implementation of per_cpu() afaict, but it
> > might crash if we ever teach cpu hotunplug to free up the percpu
> > resources.
>
> First, Quicklist doesn't concern to cpu hotplug at all.
> it is another quicklist problem.
>
> Next, I think it doesn't cause crash. but I haven't any test.
> So, I'll test cpu hotplug/unplug testing today.
>
> I'll report result tommorow.

OK.
I ran a continuous CPU hotplug/unplug workload for over 12 hours,
and the system did not crash.

So I believe my patch is safe against CPU unplug.

test method
--------------------------------------------------------------
1. Open 7 terminals and run the following script on each console:

   CPU=cpuXXX; while true; do echo 0 > /sys/devices/system/cpu/$CPU/online; echo 1 > /sys/devices/system/cpu/$CPU/online; done

2. Open another console and run the following command:

   watch -n 1 cat /proc/meminfo
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-22 1:05 ` KOSAKI Motohiro
@ 2008-08-22 4:28 ` Andrew Morton
2008-08-22 13:23 ` Robin Holt
2008-08-23 8:24 ` KOSAKI Motohiro
0 siblings, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2008-08-22 4:28 UTC
To: KOSAKI Motohiro; +Cc: linux-kernel, linux-mm, cl, tokunaga.keiich

On Fri, 22 Aug 2008 10:05:45 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > > quicklist_total_size() is racy against cpu hotplug. That's OK for
> > > /proc/meminfo purposes (occasional transient inaccuracy?), but will it
> > > crash? Not in the current implementation of per_cpu() afaict, but it
> > > might crash if we ever teach cpu hotunplug to free up the percpu
> > > resources.
> >
> > First, Quicklist doesn't concern to cpu hotplug at all.
> > it is another quicklist problem.
> >
> > Next, I think it doesn't cause crash. but I haven't any test.
> > So, I'll test cpu hotplug/unplug testing today.
> >
> > I'll report result tommorow.
>
> OK.
> I ran cpu hotplug/unplug coutinuous workload over 12H.
> then, system crash doesn't happend.
>
> So, I believe my patch is cpu unplug safe.

err, which patch?

I presently have:

mm-show-quicklist-memory-usage-in-proc-meminfo.patch
mm-show-quicklist-memory-usage-in-proc-meminfo-fix.patch
mm-quicklist-shouldnt-be-proportional-to-number-of-cpus.patch
mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix.patch

Is that what you have?

I'll consolidate them into two patches and will append them here. Please check.


From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

At present the quicklist keeps some pages on each CPU as a cache (each
CPU can hold up to node_free_pages/16 pages).

It is used as a page table cache. exit() grows the cache; fork(), on
the other hand, consumes it.

So, when Apache-style middleware (one parent, many children) runs, one
CPU handles fork() while the other CPUs do the middleware work and call
exit().
At that point, the forking CPU has no page table cache at all, while all
the others hold the maximum cache.

QList_max = (#ofCPUs - 1) x Free / 16
=> QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)

So, how much memory can the quicklist consume in the worst case? It
grows in proportion to the number of CPUs, because the quicklist is a
per-CPU cache but the cache size calculation does not take #ofCPUs into
account.

The calculation above gives:

Number of CPUs per node            2     4     8    16
==============================  ====  ====  ====  ====
QList_max / (Free + QList_max)  5.8%   16%   30%   48%


Wow! In the worst case, the quicklist can consume about 50% of memory.
Worse, it has no cache-shrinking mechanism at all. This causes the
following problems:

1. End users may mistake it for a memory leak.
   => /proc/meminfo should display the amount of quicklist memory.

2. It can trigger the OOM killer.
   => The amount of quicklist memory shouldn't be proportional to the number of CPUs.



This patch:

The quicklist can consume several GB of memory. If end users cannot see
how much memory it uses, they may mistake it for a memory leak.

After this patch is applied, /proc/meminfo shows the following:
% cat /proc/meminfo

MemTotal: 7701504 kB
MemFree: 5159040 kB
Buffers: 112960 kB
Cached: 337536 kB
SwapCached: 0 kB
Active: 218944 kB
Inactive: 350848 kB
Active(anon): 120832 kB
Inactive(anon): 0 kB
Active(file): 98112 kB
Inactive(file): 350848 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2031488 kB
SwapFree: 2031488 kB
Dirty: 320 kB
Writeback: 0 kB
AnonPages: 119488 kB
Mapped: 38528 kB
Slab: 1595712 kB
SReclaimable: 23744 kB
SUnreclaim: 1571968 kB
PageTables: 14336 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5882240 kB
Committed_AS: 356672 kB
VmallocTotal: 17592177655808 kB
VmallocUsed: 29056 kB
VmallocChunk: 17592177626304 kB
Quicklists: 283776 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 262144 kB

[akpm@linux-foundation.org: build fix]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/proc_misc.c       |    7 +++++--
 include/linux/quicklist.h |    7 +++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff -puN fs/proc/proc_misc.c~mm-show-quicklist-memory-usage-in-proc-meminfo fs/proc/proc_misc.c
--- a/fs/proc/proc_misc.c~mm-show-quicklist-memory-usage-in-proc-meminfo
+++ a/fs/proc/proc_misc.c
@@ -24,6 +24,7 @@
 #include <linux/tty.h>
 #include <linux/string.h>
 #include <linux/mman.h>
+#include <linux/quicklist.h>
 #include <linux/proc_fs.h>
 #include <linux/ioport.h>
 #include <linux/mm.h>
@@ -189,7 +190,8 @@ static int meminfo_read_proc(char *page,
 		"Committed_AS: %8lu kB\n"
 		"VmallocTotal: %8lu kB\n"
 		"VmallocUsed: %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"VmallocChunk: %8lu kB\n"
+		"Quicklists: %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
@@ -221,7 +223,8 @@ static int meminfo_read_proc(char *page,
 		K(committed),
 		(unsigned long)VMALLOC_TOTAL >> 10,
 		vmi.used >> 10,
-		vmi.largest_chunk >> 10
+		vmi.largest_chunk >> 10,
+		K(quicklist_total_size())
 		);
 
 	len += hugetlb_report_meminfo(page + len);
diff -puN include/linux/quicklist.h~mm-show-quicklist-memory-usage-in-proc-meminfo include/linux/quicklist.h
--- a/include/linux/quicklist.h~mm-show-quicklist-memory-usage-in-proc-meminfo
+++ a/include/linux/quicklist.h
@@ -80,6 +80,13 @@ void quicklist_trim(int nr, void (*dtor)
 
 unsigned long quicklist_total_size(void);
 
+#else
+
+static inline unsigned long quicklist_total_size(void)
+{
+	return 0;
+}
+
 #endif
 
 #endif /* LINUX_QUICKLIST_H */
_


From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

When a test program which does task migration runs, my 8GB box spends
800MB of memory on the quicklist. This is not a memory leak, but it
doesn't seem good.

% cat /proc/meminfo

MemTotal: 7701568 kB
MemFree: 4724672 kB
(snip)
Quicklists: 844800 kB

because

- My machine spec is
  number of numa node: 2
  number of cpus: 8 (4CPU x2 node)
  total mem: 8GB (4GB x2 node)
  free mem: about 5GB

- Maximum quicklist usage is here

  Number of CPUs per node            2     4     8    16
  ==============================  ====  ====  ====  ====
  QList_max / (Free + QList_max)  5.8%   16%   30%   48%

- Then, with 4 CPUs per node, QList_max = (4 - 1) x 4.7GB / 16 ~= 880MB.
  So the quicklist can use about 800MB.

So, if a machine with the following spec runs that program

  CPUs: 64 (8cpu x 8node)
  Mem: 1TB (128GB x8node)

then the quicklist can waste 300GB (= 1TB x 30%). That is far too much.

So I don't like a cache policy that is proportional to the number of CPUs.

My patch changes the number of cached pages
from:
  per-cpu-cache-amount = memory_on_node / 16
to:
  per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.

I think this is reasonable, but even with this patch applied, the
quicklist can still cache a lot of memory on a big machine.

(Even with this patch applied, the quicklist can waste up to 64GB on a
1TB server (= 1TB / 16); is that still too much?)

The test program is below.
--------------------------------------------------------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define BUFFSIZE 512

int max_cpu(void)	/* get the highest logical cpu number from /proc/cpuinfo */
{
	FILE *fd;
	char *ret, buffer[BUFFSIZE];
	int cpu = 1;

	fd = fopen("/proc/cpuinfo", "r");
	if (fd == NULL) {
		perror("fopen(/proc/cpuinfo)");
		exit(EXIT_FAILURE);
	}
	while (1) {
		ret = fgets(buffer, BUFFSIZE, fd);
		if (ret == NULL)
			break;
		if (!strncmp(buffer, "processor", 9))
			cpu = atoi(strchr(buffer, ':') + 2);
	}
	fclose(fd);
	return cpu;
}

void cpu_bind(int cpu)	/* bind current process to one cpu */
{
	cpu_set_t mask;
	int ret;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	ret = sched_setaffinity(0, sizeof(mask), &mask);
	if (ret == -1) {
		perror("sched_setaffinity()");
		exit(EXIT_FAILURE);
	}
	sched_yield();	/* not necessary */
}

#define MMAP_SIZE	(10 * 1024 * 1024)	/* 10 MB */
#define FORK_INTERVAL	1			/* 1 second */

int main(int argc, char *argv[])
{
	int cpu_max, nextcpu;
	long pagesize;
	pid_t pid;

	/* set max number of logical cpu */
	if (argc > 1)
		cpu_max = atoi(argv[1]) - 1;
	else
		cpu_max = max_cpu();

	/* get the page size */
	pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize == -1) {
		perror("sysconf(_SC_PAGESIZE)");
		exit(EXIT_FAILURE);
	}

	/* prepare parent process */
	cpu_bind(0);
	nextcpu = cpu_max;

loop:

	/* select destination cpu for child process by round-robin rule */
	if (++nextcpu > cpu_max)
		nextcpu = 1;

	pid = fork();

	if (pid == 0) {	/* child action */

		char *p;
		int i;

		/* consume page tables (anonymous mappings take fd -1) */
		p = mmap(NULL, MMAP_SIZE, PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap()");
			exit(EXIT_FAILURE);
		}
		i = MMAP_SIZE / pagesize;
		while (i-- > 0) {
			*p = 1;
			p += pagesize;
		}

		/* move to other cpu */
		cpu_bind(nextcpu);
		/*
		printf("a child moved to cpu%d after mmap().\n", nextcpu);
		fflush(stdout);
		*/

		/* back page tables to pgtable_quicklist */
		exit(0);

	} else if (pid > 0) {	/* parent action */

		sleep(FORK_INTERVAL);
		waitpid(pid, NULL, WNOHANG);

	}

	goto loop;
}

[akpm@linux-foundation.org: fix build on sparc64]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/quicklist.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff -puN mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus mm/quicklist.c
--- a/mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus
+++ a/mm/quicklist.c
@@ -26,7 +26,9 @@ DEFINE_PER_CPU(struct quicklist, quickli
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
-	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+	int node = numa_node_id();
+	struct zone *zones = NODE_DATA(node)->node_zones;
+	cpumask_t node_cpumask;
 
 	node_free_pages =
 #ifdef CONFIG_ZONE_DMA
@@ -38,6 +40,10 @@ static unsigned long max_pages(unsigned
 		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
+
+	node_cpumask = node_to_cpumask(node);
+	max /= cpus_weight_nr(node_cpumask);
+
 	return max(max, min_pages);
 }
 
_
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-22 4:28 ` Andrew Morton
@ 2008-08-22 13:23 ` Robin Holt
2008-08-22 13:56 ` Christoph Lameter
2008-08-23 8:24 ` KOSAKI Motohiro
1 sibling, 1 reply; 43+ messages in thread
From: Robin Holt @ 2008-08-22 13:23 UTC
To: Andrew Morton
Cc: KOSAKI Motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich

Christoph,

Could we maybe add a per_cpu off-node quicklist and just always free
that in check_pgt_cache? That would get us back the freeing of off-node
page tables.

Thanks,
Robin

On Thu, Aug 21, 2008 at 09:28:47PM -0700, Andrew Morton wrote:
> On Fri, 22 Aug 2008 10:05:45 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> [...]
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-22 13:23 ` Robin Holt
@ 2008-08-22 13:56 ` Christoph Lameter
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2008-08-22 13:56 UTC
To: Robin Holt
Cc: Andrew Morton, KOSAKI Motohiro, linux-kernel, linux-mm, tokunaga.keiich

Robin Holt wrote:
>
> Could we maybe add a per_cpu off-node quicklist and just always free
> that in check_pgt_cache?  That would get us back the freeing of off-node
> page tables.

Yes, that is what I suggested, and if you check your email from last year
you will find an internal discussion and patches for such an approach.
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-22 4:28 ` Andrew Morton
2008-08-22 13:23 ` Robin Holt
@ 2008-08-23 8:24 ` KOSAKI Motohiro
2008-08-24 5:29 ` Andrew Morton
1 sibling, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-23 8:24 UTC
To: Andrew Morton
Cc: kosaki.motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich

> > OK.
> > I ran a continuous cpu hotplug/unplug workload for over 12 hours,
> > and the system did not crash.
> >
> > So, I believe my patch is cpu unplug safe.
>
> err, which patch?
>
> I presently have:
>
> mm-show-quicklist-memory-usage-in-proc-meminfo.patch
> mm-show-quicklist-memory-usage-in-proc-meminfo-fix.patch
> mm-quicklist-shouldnt-be-proportional-to-number-of-cpus.patch
> mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix.patch
>
> Is that what you have?
>
> I'll consolidate them into two patches and will append them here.  Please check.

Andrew, thank you for your attention.

I tested with

mm-show-quicklist-memory-usage-in-proc-meminfo.patch
mm-show-quicklist-memory-usage-in-proc-meminfo-fix.patch

and

http://marc.info/?l=linux-mm&m=121931317407295&w=2

David has already confirmed that the patch at the above URL compiles on
sparc64, and I tested it.

So, if possible, could you replace the current quicklist-shouldnt-be-proportional
patch with that one?
(Of course, the current -mm patch also works well.)

The same patch is attached below because the web mail interface is a bit ugly.


From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

When a test program that performs task migration runs, my 8GB box spends
800MB of memory on quicklists. This is not a memory leak, but it doesn't
seem good.
% cat /proc/meminfo
MemTotal:      7701568 kB
MemFree:       4724672 kB
(snip)
Quicklists:     844800 kB

This is because:

- My machine's spec is
    number of NUMA nodes: 2
    number of CPUs:       8 (4 CPUs x 2 nodes)
    total memory:         8GB (4GB x 2 nodes)
    free memory:          about 5GB

- The maximum quicklist usage is

    Number of CPUs per node          2     4     8     16
    ==============================  ======================
    QList_max / (Free + QList_max)  5.8%  16%   30%   48%

- Then, 4.7GB x 3/16 ~= 880MB (on each node, 3 of the 4 CPUs can each
  cache up to Free / 16).
  So quicklists can use about 800MB.

So, if a machine with the following spec runs that program:

    CPUs: 64 (8 CPUs x 8 nodes)
    Mem:  1TB (128GB x 8 nodes)

then quicklists can waste 300GB (= 1TB x 30%). That is far too large.

So I don't like cache policies that are proportional to the number of CPUs.

My patch changes the per-CPU cache amount from

    per-cpu-cache-amount = memory_on_node / 16

to

    per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.

I think this is reasonable. But even with this patch applied, quicklists
can still cache a lot of memory on a big machine: up to 64GB on a 1TB
server (= 1TB / 16). Is that still too much?

The test program is below.
--------------------------------------------------------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define BUFFSIZE 512

int max_cpu(void)	/* get max number of logical cpus from /proc/cpuinfo */
{
	FILE *fd;
	char *ret, buffer[BUFFSIZE];
	int cpu = 1;

	fd = fopen("/proc/cpuinfo", "r");
	if (fd == NULL) {
		perror("fopen(/proc/cpuinfo)");
		exit(EXIT_FAILURE);
	}
	while (1) {
		ret = fgets(buffer, BUFFSIZE, fd);
		if (ret == NULL)
			break;
		if (!strncmp(buffer, "processor", 9))
			cpu = atoi(strchr(buffer, ':') + 2);
	}
	fclose(fd);
	return cpu;
}

void cpu_bind(int cpu)	/* bind current process to one cpu */
{
	cpu_set_t mask;
	int ret;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	ret = sched_setaffinity(0, sizeof(mask), &mask);
	if (ret == -1) {
		perror("sched_setaffinity()");
		exit(EXIT_FAILURE);
	}
	sched_yield();	/* not necessary */
}

#define MMAP_SIZE	(10 * 1024 * 1024)	/* 10 MB */
#define FORK_INTERVAL	1	/* 1 second */

int main(int argc, char *argv[])
{
	int cpu_max, nextcpu;
	long pagesize;
	pid_t pid;

	/* set max number of logical cpu */
	if (argc > 1)
		cpu_max = atoi(argv[1]) - 1;
	else
		cpu_max = max_cpu();

	/* get the page size */
	pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize == -1) {
		perror("sysconf(_SC_PAGESIZE)");
		exit(EXIT_FAILURE);
	}

	/* prepare parent process */
	cpu_bind(0);
	nextcpu = cpu_max;

loop:

	/* select destination cpu for child process by round-robin rule */
	if (++nextcpu > cpu_max)
		nextcpu = 1;

	pid = fork();

	if (pid == 0) {	/* child action */

		char *p;
		int i;

		/* consume page tables */
		p = mmap(NULL, MMAP_SIZE, PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap()");
			exit(EXIT_FAILURE);
		}
		i = MMAP_SIZE / pagesize;
		while (i-- > 0) {
			*p = 1;
			p += pagesize;
		}

		/* move to other cpu */
		cpu_bind(nextcpu);
		/*
		printf("a child moved to cpu%d after mmap().\n", nextcpu);
		fflush(stdout);
		*/

		/* back page tables to pgtable_quicklist */
		exit(0);

	} else if (pid > 0) {	/* parent action */

		sleep(FORK_INTERVAL);
		waitpid(pid, NULL, WNOHANG);

	}

	goto loop;
}

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Tested-by: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x]

---
 mm/quicklist.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Index: b/mm/quicklist.c
===================================================================
--- a/mm/quicklist.c
+++ b/mm/quicklist.c
@@ -26,7 +26,10 @@ DEFINE_PER_CPU(struct quicklist, quickli
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
-	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+	int node = numa_node_id();
+	struct zone *zones = NODE_DATA(node)->node_zones;
+	int num_cpus_on_node;
+	node_to_cpumask_ptr(cpumask_on_node, node);
 
 	node_free_pages =
 #ifdef CONFIG_ZONE_DMA
@@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
 		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
+
+	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
+	max /= num_cpus_on_node;
+
 	return max(max, min_pages);
 }
 
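A small user-space sketch shows what the added division does. It is a hypothetical model, not kernel code. model_max_pages() and model_worst_case_node_total() are invented names. node_free_pages is passed in as a plain page count instead of being summed from the zone counters. FRACTION_OF_NODE_MEM is assumed to be 16, matching the "/ 16" in the changelog, and min_pages is whatever floor the caller supplies. The worst-case total follows the changelog's scenario: every CPU on a node except the forking one has filled its quicklist to the cap.

```c
#include <assert.h>

/* Assumed value, matching the "/ 16" in the changelog. */
#define FRACTION_OF_NODE_MEM 16UL

/*
 * Hypothetical model of the per-CPU cap computed by max_pages().
 * per_node_split = 0 models the old code, 1 models this patch.
 */
unsigned long model_max_pages(unsigned long node_free_pages,
			      unsigned long cpus_on_node,
			      unsigned long min_pages,
			      int per_node_split)
{
	unsigned long max = node_free_pages / FRACTION_OF_NODE_MEM;

	assert(cpus_on_node > 0);
	if (per_node_split)
		max /= cpus_on_node;	/* the new division by CPUs on the node */

	return max > min_pages ? max : min_pages;	/* like max(max, min_pages) */
}

/*
 * Worst case from the changelog: one CPU keeps forking and has an empty
 * quicklist, and every other CPU on the node has filled its quicklist.
 */
unsigned long model_worst_case_node_total(unsigned long node_free_pages,
					  unsigned long cpus_on_node,
					  unsigned long min_pages,
					  int per_node_split)
{
	return (cpus_on_node - 1) *
	       model_max_pages(node_free_pages, cpus_on_node, min_pages,
			       per_node_split);
}
```

Take a node with 1,048,576 free pages (4GB of 4KB pages, an assumed figure) and 4 CPUs. The per-CPU cap drops from 65,536 to 16,384 pages. The worst-case node total drops from 196,608 to 49,152 pages, which is below node_free_pages / 16.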
* Re: [RFC][PATCH 1/2] Show quicklist at meminfo
2008-08-23 8:24 ` KOSAKI Motohiro
@ 2008-08-24 5:29 ` Andrew Morton
0 siblings, 0 replies; 43+ messages in thread
From: Andrew Morton @ 2008-08-24 5:29 UTC
To: KOSAKI Motohiro; +Cc: linux-kernel, linux-mm, cl, tokunaga.keiich

On Sat, 23 Aug 2008 17:24:31 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > > OK.
> > > I ran a continuous cpu hotplug/unplug workload for over 12 hours,
> > > and the system did not crash.
> > >
> > > So, I believe my patch is cpu unplug safe.
> >
> > err, which patch?
> >
> > I presently have:
> >
> > mm-show-quicklist-memory-usage-in-proc-meminfo.patch
> > mm-show-quicklist-memory-usage-in-proc-meminfo-fix.patch
> > mm-quicklist-shouldnt-be-proportional-to-number-of-cpus.patch
> > mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix.patch
> >
> > Is that what you have?
> >
> > I'll consolidate them into two patches and will append them here.  Please check.
>
> Andrew, thank you for your attention.
>
> I tested with
>
> mm-show-quicklist-memory-usage-in-proc-meminfo.patch
> mm-show-quicklist-memory-usage-in-proc-meminfo-fix.patch
>
> and
>
> http://marc.info/?l=linux-mm&m=121931317407295&w=2
>
>
> David has already confirmed that the patch at the above URL compiles on
> sparc64, and I tested it.
>
> So, if possible, could you replace the current quicklist-shouldnt-be-proportional
> patch with that one?
> (Of course, the current -mm patch also works well.)
>

OK, there's just too much potential for miscommunication and error here.

Please resend everything as a sequence-numbered, fully-changelogged
signed-off patch series against current mainline.
* [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-20 11:05 [RFC][PATCH 0/2] Quicklist is slighly problematic KOSAKI Motohiro
2008-08-20 11:07 ` [RFC][PATCH 1/2] Show quicklist at meminfo KOSAKI Motohiro
@ 2008-08-20 11:08 ` KOSAKI Motohiro
2008-08-20 15:27 ` Christoph Lameter
2008-08-21 6:46 ` Andrew Morton
2008-08-20 14:10 ` [RFC][PATCH 0/2] Quicklist is slighly problematic Christoph Lameter
2008-08-20 18:31 ` Andrew Morton
3 siblings, 2 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20 11:08 UTC
To: LKML, linux-mm, Andrew Morton, Christoph Lameter, tokunaga.keiich
Cc: kosaki.motohiro

When a test program that performs task migration runs, my 8GB box spends
800MB of memory on quicklists. This is not a memory leak, but it doesn't
seem good.

% cat /proc/meminfo
MemTotal:      7701568 kB
MemFree:       4724672 kB
(snip)
Quicklists:     844800 kB

This is because:

- My machine's spec is
    number of NUMA nodes: 2
    number of CPUs:       8 (4 CPUs x 2 nodes)
    total memory:         8GB (4GB x 2 nodes)
    free memory:          about 5GB

- The maximum quicklist usage is

    Number of CPUs per node          2     4     8     16
    ==============================  ======================
    QList_max / (Free + QList_max)  5.8%  16%   30%   48%

- Then, 4.7GB x 3/16 ~= 880MB (on each node, 3 of the 4 CPUs can each
  cache up to Free / 16).
  So quicklists can use about 800MB.

So, if a machine with the following spec runs that program:

    CPUs: 64 (8 CPUs x 8 nodes)
    Mem:  1TB (128GB x 8 nodes)

then quicklists can waste 300GB (= 1TB x 30%). That is far too large.

So I don't like cache policies that are proportional to the number of CPUs.

My patch changes the per-CPU cache amount from

    per-cpu-cache-amount = memory_on_node / 16

to

    per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.

I think this is reasonable. But even with this patch applied, quicklists
can still cache a lot of memory on a big machine: up to 64GB on a 1TB
server (= 1TB / 16). Is that still too much?

The test program is below.
--------------------------------------------------------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define BUFFSIZE 512

int max_cpu(void)	/* get max number of logical cpus from /proc/cpuinfo */
{
	FILE *fd;
	char *ret, buffer[BUFFSIZE];
	int cpu = 1;

	fd = fopen("/proc/cpuinfo", "r");
	if (fd == NULL) {
		perror("fopen(/proc/cpuinfo)");
		exit(EXIT_FAILURE);
	}
	while (1) {
		ret = fgets(buffer, BUFFSIZE, fd);
		if (ret == NULL)
			break;
		if (!strncmp(buffer, "processor", 9))
			cpu = atoi(strchr(buffer, ':') + 2);
	}
	fclose(fd);
	return cpu;
}

void cpu_bind(int cpu)	/* bind current process to one cpu */
{
	cpu_set_t mask;
	int ret;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	ret = sched_setaffinity(0, sizeof(mask), &mask);
	if (ret == -1) {
		perror("sched_setaffinity()");
		exit(EXIT_FAILURE);
	}
	sched_yield();	/* not necessary */
}

#define MMAP_SIZE	(10 * 1024 * 1024)	/* 10 MB */
#define FORK_INTERVAL	1	/* 1 second */

int main(int argc, char *argv[])
{
	int cpu_max, nextcpu;
	long pagesize;
	pid_t pid;

	/* set max number of logical cpu */
	if (argc > 1)
		cpu_max = atoi(argv[1]) - 1;
	else
		cpu_max = max_cpu();

	/* get the page size */
	pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize == -1) {
		perror("sysconf(_SC_PAGESIZE)");
		exit(EXIT_FAILURE);
	}

	/* prepare parent process */
	cpu_bind(0);
	nextcpu = cpu_max;

loop:

	/* select destination cpu for child process by round-robin rule */
	if (++nextcpu > cpu_max)
		nextcpu = 1;

	pid = fork();

	if (pid == 0) {	/* child action */

		char *p;
		int i;

		/* consume page tables */
		p = mmap(NULL, MMAP_SIZE, PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap()");
			exit(EXIT_FAILURE);
		}
		i = MMAP_SIZE / pagesize;
		while (i-- > 0) {
			*p = 1;
			p += pagesize;
		}

		/* move to other cpu */
		cpu_bind(nextcpu);
		/*
		printf("a child moved to cpu%d after mmap().\n", nextcpu);
		fflush(stdout);
		*/

		/* back page tables to pgtable_quicklist */
		exit(0);

	} else if (pid > 0) {	/* parent action */

		sleep(FORK_INTERVAL);
		waitpid(pid, NULL, WNOHANG);

	}

	goto loop;
}
-----------------------------------------------------------------------------

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

---
 mm/quicklist.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: b/mm/quicklist.c
===================================================================
--- a/mm/quicklist.c
+++ b/mm/quicklist.c
@@ -26,7 +26,9 @@ DEFINE_PER_CPU(struct quicklist, quickli
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
-	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+	int node = numa_node_id();
+	struct zone *zones = NODE_DATA(node)->node_zones;
+	int num_cpus_per_node;
 
 	node_free_pages =
 #ifdef CONFIG_ZONE_DMA
@@ -38,6 +40,10 @@ static unsigned long max_pages(unsigned
 		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
+
+	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
+	max /= num_cpus_per_node;
+
 	return max(max, min_pages);
 }
 
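The percentages in the changelog's table follow from its worst-case formula, QList_max = (n - 1) x Free / 16, where n is the number of CPUs on a node. The resulting ratio is (n - 1) / (16 + n - 1). The C sketch below reproduces the table; the helper names are hypothetical. For comparison, it also computes the same ratio when each per-CPU cap is Free / 16 / n. That works out to (n - 1) / (17n - 1), which stays below 1/17 (about 5.9%; the table shows the n = 2 value truncated as 5.8%) however many CPUs a node has. Free is treated as a constant here, which is a simplification.

```c
#include <assert.h>

/*
 * Worst-case share of memory held by quicklists,
 * QList_max / (Free + QList_max), for n CPUs per node.
 * QList_max is expressed in units of Free.
 */
double qlist_ratio_before(unsigned int n)	/* cap = Free / 16 per CPU */
{
	double qlist;

	assert(n >= 1);
	qlist = (n - 1) / 16.0;
	return qlist / (1.0 + qlist);
}

double qlist_ratio_after(unsigned int n)	/* cap = Free / 16 / n per CPU */
{
	double qlist;

	assert(n >= 1);
	qlist = (n - 1) / (16.0 * n);
	return qlist / (1.0 + qlist);
}
```

For instance, qlist_ratio_before(4) is 3/19, about 0.158 (the table's 16%), and qlist_ratio_before(16) is 15/31, about 0.484. With the patch's cap, qlist_ratio_after(16) is 15/271, about 0.055.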
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-20 11:08 ` [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs KOSAKI Motohiro
@ 2008-08-20 15:27 ` Christoph Lameter
2008-08-21 6:46 ` Andrew Morton
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2008-08-20 15:27 UTC
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, tokunaga.keiich

Looks good.

Acked-by: Christoph Lameter <cl@linux-foundation.org>
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-20 11:08 ` [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs KOSAKI Motohiro
2008-08-20 15:27 ` Christoph Lameter
@ 2008-08-21 6:46 ` Andrew Morton
2008-08-21 7:13 ` David Miller
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2008-08-21 6:46 UTC
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Christoph Lameter, tokunaga.keiich

On Wed, 20 Aug 2008 20:08:13 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));

sparc64 allmodconfig:

mm/quicklist.c: In function `max_pages':
mm/quicklist.c:44: error: invalid lvalue in unary `&'

we seem to have made a spectacular mess of cpumasks lately.
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 6:46 ` Andrew Morton
@ 2008-08-21 7:13 ` David Miller
2008-08-21 7:18 ` KOSAKI Motohiro
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: David Miller @ 2008-08-21 7:13 UTC
To: akpm; +Cc: kosaki.motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich

> On Wed, 20 Aug 2008 20:08:13 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> > +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
>
> sparc64 allmodconfig:
>
> mm/quicklist.c: In function `max_pages':
> mm/quicklist.c:44: error: invalid lvalue in unary `&'
>
> we seem to have made a spectacular mess of cpumasks lately.

It should explode similarly on x86, since it also defines node_to_cpumask()
as an inline function.

IA64 seems to be one of the few platforms to define this as a macro
evaluating to the node-to-cpumask array entry, so it's clear what
platform Motohiro-san did build testing on :-)
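The error comes from taking the address of a function's return value. The error text suggests that cpus_weight_nr() applies & to its argument. That works when node_to_cpumask() is a macro expanding to an array element, which is an lvalue, as David describes for IA64. It fails when node_to_cpumask() is an inline function returning a cpumask_t by value. The user-space sketch below shows both cases. All of its names (mask_t, node_masks, node_to_mask_macro(), node_to_mask_fn(), mask_weight_of()) are made-up stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for cpumask_t: a small fixed-size bitmap. */
typedef struct {
	unsigned long bits[4];
} mask_t;

/* Per-node table, like an architecture's node-to-cpumask array. */
mask_t node_masks[2];

/* Macro flavour: expands to an array element, which is an lvalue. */
#define node_to_mask_macro(node)	(node_masks[(node)])

/* Inline-function flavour: returns a copy, which is not an lvalue. */
static inline mask_t node_to_mask_fn(int node)
{
	return node_masks[node];
}

/* Counts set bits through a pointer, so callers must pass an address. */
int mask_weight(const mask_t *m)
{
	int i, w = 0;

	for (i = 0; i < 4; i++) {
		unsigned long b = m->bits[i];

		while (b) {
			b &= b - 1;	/* clear the lowest set bit */
			w++;
		}
	}
	return w;
}

/* Wrapper that takes the address of its argument. */
#define mask_weight_of(mask)	mask_weight(&(mask))

int weight_via_array(int node)
{
	/* Fine: node_masks[node] is an lvalue, so & can be applied. */
	return mask_weight_of(node_to_mask_macro(node));
}

int weight_via_local_copy(int node)
{
	/*
	 * mask_weight_of(node_to_mask_fn(node)) would not compile, because a
	 * function's return value is not an lvalue. Copying it into a local
	 * variable first, as the sparc64 fix in this thread does, works.
	 */
	mask_t copy = node_to_mask_fn(node);

	return mask_weight_of(copy);
}
```

The local copy costs a full mask's worth of stack. That is the concern raised later in this thread for configurations with very large NR_CPUS.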
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 7:13 ` David Miller
@ 2008-08-21 7:18 ` KOSAKI Motohiro
2008-08-21 7:27 ` Andrew Morton
2008-08-25 18:40 ` Mike Travis
2 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 7:18 UTC
To: David Miller; +Cc: akpm, linux-kernel, linux-mm, cl, tokunaga.keiich

>> sparc64 allmodconfig:
>>
>> mm/quicklist.c: In function `max_pages':
>> mm/quicklist.c:44: error: invalid lvalue in unary `&'
>>
>> we seem to have made a spectacular mess of cpumasks lately.
>
> It should explode similarly on x86, since it also defines node_to_cpumask()
> as an inline function.
>
> IA64 seems to be one of the few platforms to define this as a macro
> evaluating to the node-to-cpumask array entry, so it's clear what
> platform Motohiro-san did build testing on :-)

Thank you for the good advice.

I don't have a sparc64 machine, but I can borrow an x86 machine,
so I'll test on x86 today.
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 7:13 ` David Miller
2008-08-21 7:18 ` KOSAKI Motohiro
@ 2008-08-21 7:27 ` Andrew Morton
2008-08-21 7:31 ` KOSAKI Motohiro
2008-08-21 9:32 ` Peter Zijlstra
2008-08-25 18:40 ` Mike Travis
2 siblings, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2008-08-21 7:27 UTC
To: David Miller; +Cc: kosaki.motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich

On Thu, 21 Aug 2008 00:13:22 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 20 Aug 2008 23:46:15 -0700
>
> > On Wed, 20 Aug 2008 20:08:13 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> >
> > > +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
> >
> > sparc64 allmodconfig:
> >
> > mm/quicklist.c: In function `max_pages':
> > mm/quicklist.c:44: error: invalid lvalue in unary `&'
> >
> > we seem to have made a spectacular mess of cpumasks lately.
>
> It should explode similarly on x86, since it also defines node_to_cpumask()
> as an inline function.
>
> IA64 seems to be one of the few platforms to define this as a macro
> evaluating to the node-to-cpumask array entry, so it's clear what
> platform Motohiro-san did build testing on :-)

Seems to compile OK on x86_32, x86_64, ia64 and powerpc for some reason.
This seems to fix things on sparc64:

--- a/mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix
+++ a/mm/quicklist.c
@@ -28,7 +28,7 @@ static unsigned long max_pages(unsigned
 	unsigned long node_free_pages, max;
 	int node = numa_node_id();
 	struct zone *zones = NODE_DATA(node)->node_zones;
-	int num_cpus_per_node;
+	cpumask_t node_cpumask;
 
 	node_free_pages =
 #ifdef CONFIG_ZONE_DMA
@@ -41,8 +41,8 @@ static unsigned long max_pages(unsigned
 
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
 
-	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
-	max /= num_cpus_per_node;
+	node_cpumask = node_to_cpumask(node);
+	max /= cpus_weight_nr(node_cpumask);
 
 	return max(max, min_pages);
 }
_
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 7:27 ` Andrew Morton
@ 2008-08-21 7:31 ` KOSAKI Motohiro
2008-08-21 9:32 ` Peter Zijlstra
1 sibling, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 7:31 UTC
To: Andrew Morton; +Cc: David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich

>> IA64 seems to be one of the few platforms to define this as a macro
>> evaluating to the node-to-cpumask array entry, so it's clear what
>> platform Motohiro-san did build testing on :-)
>
> Seems to compile OK on x86_32, x86_64, ia64 and powerpc for some reason.
>
> This seems to fix things on sparc64:
>
> --- a/mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix
> +++ a/mm/quicklist.c
> @@ -28,7 +28,7 @@ static unsigned long max_pages(unsigned
>  	unsigned long node_free_pages, max;
>  	int node = numa_node_id();
>  	struct zone *zones = NODE_DATA(node)->node_zones;
> -	int num_cpus_per_node;
> +	cpumask_t node_cpumask;
>
>  	node_free_pages =
>  #ifdef CONFIG_ZONE_DMA
> @@ -41,8 +41,8 @@ static unsigned long max_pages(unsigned
>
>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
>
> -	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
> -	max /= num_cpus_per_node;
> +	node_cpumask = node_to_cpumask(node);
> +	max /= cpus_weight_nr(node_cpumask);
>
>  	return max(max, min_pages);
>  }

Thank you!!
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 7:27 ` Andrew Morton
2008-08-21 7:31 ` KOSAKI Motohiro
@ 2008-08-21 9:32 ` Peter Zijlstra
2008-08-21 10:04 ` KOSAKI Motohiro
2008-08-25 18:44 ` Mike Travis
1 sibling, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2008-08-21 9:32 UTC
To: Andrew Morton
Cc: David Miller, kosaki.motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

On Thu, 2008-08-21 at 00:27 -0700, Andrew Morton wrote:
> On Thu, 21 Aug 2008 00:13:22 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Wed, 20 Aug 2008 23:46:15 -0700
> >
> > > On Wed, 20 Aug 2008 20:08:13 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > >
> > > > +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
> > >
> > > sparc64 allmodconfig:
> > >
> > > mm/quicklist.c: In function `max_pages':
> > > mm/quicklist.c:44: error: invalid lvalue in unary `&'
> > >
> > > we seem to have made a spectacular mess of cpumasks lately.
> >
> > It should explode similarly on x86, since it also defines node_to_cpumask()
> > as an inline function.
> >
> > IA64 seems to be one of the few platforms to define this as a macro
> > evaluating to the node-to-cpumask array entry, so it's clear what
> > platform Motohiro-san did build testing on :-)
>
> Seems to compile OK on x86_32, x86_64, ia64 and powerpc for some reason.
>
> This seems to fix things on sparc64:
>
> --- a/mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix
> +++ a/mm/quicklist.c
> @@ -28,7 +28,7 @@ static unsigned long max_pages(unsigned
>  	unsigned long node_free_pages, max;
>  	int node = numa_node_id();
>  	struct zone *zones = NODE_DATA(node)->node_zones;
> -	int num_cpus_per_node;
> +	cpumask_t node_cpumask;
>
>  	node_free_pages =
>  #ifdef CONFIG_ZONE_DMA
> @@ -41,8 +41,8 @@ static unsigned long max_pages(unsigned
>
>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
>
> -	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
> -	max /= num_cpus_per_node;
> +	node_cpumask = node_to_cpumask(node);
> +	max /= cpus_weight_nr(node_cpumask);
>
>  	return max(max, min_pages);
>  }

humm, I thought we wanted to keep cpumask_t stuff away from our stack -
since on insanely large SGI boxen (/me looks at mike) the thing becomes
512 bytes.
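Peter's 512-byte figure is consistent with cpumask_t being a bitmap with one bit per possible CPU, rounded up to whole longs. The thread does not state NR_CPUS; assuming NR_CPUS = 4096 for those large SGI configurations, the mask is 4096 / 8 = 512 bytes. Andrew's fix places one such copy in max_pages()'s stack frame. Here is a quick sketch of the arithmetic; the helper name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on this platform. */
#define MODEL_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Bytes needed for a bitmap with one bit per possible CPU, rounded up to
 * whole unsigned longs, the way a cpumask-style bitmap is laid out.
 */
size_t model_cpumask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}
```

Under that assumption, the single local copy in max_pages() is half a kilobyte of stack. A 64-CPU configuration needs only 8 bytes.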
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 9:32 ` Peter Zijlstra
@ 2008-08-21 10:04 ` KOSAKI Motohiro
2008-08-21 10:09 ` David Miller
` (2 more replies)
2008-08-25 18:44 ` Mike Travis
1 sibling, 3 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 10:04 UTC
To: Peter Zijlstra
Cc: kosaki.motohiro, Andrew Morton, David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

Hi Peter,

Thank you for pointing that out!

> > @@ -41,8 +41,8 @@ static unsigned long max_pages(unsigned
> >
> >  	max = node_free_pages / FRACTION_OF_NODE_MEM;
> >
> > -	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
> > -	max /= num_cpus_per_node;
> > +	node_cpumask = node_to_cpumask(node);
> > +	max /= cpus_weight_nr(node_cpumask);
> >
> >  	return max(max, min_pages);
> >  }
>
> humm, I thought we wanted to keep cpumask_t stuff away from our stack -
> since on insanely large SGI boxen (/me looks at mike) the thing becomes
> 512 bytes.

Hm, interesting.
I think the following patch addresses your point, right?
But I am worried whether it works on sparc64...
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

---
 mm/quicklist.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Index: b/mm/quicklist.c
===================================================================
--- a/mm/quicklist.c
+++ b/mm/quicklist.c
@@ -26,7 +26,10 @@ DEFINE_PER_CPU(struct quicklist, quickli
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
-	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+	int node = numa_node_id();
+	struct zone *zones = NODE_DATA(node)->node_zones;
+	int num_cpus_on_node;
+	node_to_cpumask_ptr(cpumask_on_node, node);
 
 	node_free_pages =
 #ifdef CONFIG_ZONE_DMA
@@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
 		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
+
+	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
+	max /= num_cpus_on_node;
+
 	return max(max, min_pages);
 }
 
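node_to_cpumask_ptr(v, node) declares a pointer v to the node's CPU mask. In the generic case, as quoted later in the thread, it expands to a local cpumask_t copy plus a pointer to that copy. Callers therefore dereference it, as in cpus_weight_nr(*cpumask_on_node) in the patch above. An architecture could instead point straight into a per-node table and avoid the copy. The second macro below is an assumed illustration of such an override, not a quote of any architecture's code. Every name in the sketch is a hypothetical stand-in.

```c
#include <assert.h>

typedef struct {
	unsigned long bits[4];	/* hypothetical stand-in for cpumask_t */
} mask_t;

mask_t node_mask_table[2];	/* hypothetical per-node table */

static inline mask_t node_to_mask(int node)
{
	return node_mask_table[node];
}

/* Generic form: a local copy named _<v> plus a const pointer v to it. */
#define node_to_mask_ptr_generic(v, node)		\
	mask_t _##v = node_to_mask(node);		\
	const mask_t *v = &_##v

/* Table form (assumed): no copy, just a pointer into the table. */
#define node_to_mask_ptr_table(v, node)			\
	const mask_t *v = &node_mask_table[(node)]

int mask_weight(const mask_t *m)
{
	int i, w = 0;

	for (i = 0; i < 4; i++) {
		unsigned long b = m->bits[i];

		while (b) {
			b &= b - 1;	/* clear the lowest set bit */
			w++;
		}
	}
	return w;
}

int weight_generic(int node)
{
	node_to_mask_ptr_generic(m, node);	/* declares _m and m */
	return mask_weight(m);
}

int weight_table(int node)
{
	node_to_mask_ptr_table(m, node);	/* declares m only */
	return mask_weight(m);
}
```

The callers look the same in both forms; only the expansion differs. Whether the compiler removes the generic form's copy depends on the optimizer, which is what the checkstack.pl run later in the thread is used to check.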
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 10:04 ` KOSAKI Motohiro
@ 2008-08-21 10:09 ` David Miller
2008-08-21 10:13 ` KOSAKI Motohiro
2008-08-21 10:22 ` KOSAKI Motohiro
2008-08-25 18:48 ` Mike Travis
2 siblings, 1 reply; 43+ messages in thread
From: David Miller @ 2008-08-21 10:09 UTC
To: kosaki.motohiro
Cc: peterz, akpm, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

> But I am worried whether it works on sparc64...

It should.
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 10:09 ` David Miller
@ 2008-08-21 10:13 ` KOSAKI Motohiro
2008-08-21 10:26 ` David Miller
0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 10:13 UTC
To: David Miller
Cc: kosaki.motohiro, peterz, akpm, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Thu, 21 Aug 2008 19:04:28 +0900
>
> > But I am worried whether it works on sparc64...
>
> It should.

Could you please confirm it?
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 10:13 ` KOSAKI Motohiro
@ 2008-08-21 10:26 ` David Miller
0 siblings, 0 replies; 43+ messages in thread
From: David Miller @ 2008-08-21 10:26 UTC
To: kosaki.motohiro
Cc: peterz, akpm, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

> > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Date: Thu, 21 Aug 2008 19:04:28 +0900
> >
> > > But I am worried whether it works on sparc64...
> >
> > It should.
>
> Could you please confirm it?

davem@sunset:~/src/GIT/net-2.6$ patch -p1 <diff
patching file mm/quicklist.c
davem@sunset:~/src/GIT/net-2.6$ make mm/quicklist.o
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CC      mm/quicklist.o
davem@sunset:~/src/GIT/net-2.6$
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 10:04 ` KOSAKI Motohiro
2008-08-21 10:09 ` David Miller
@ 2008-08-21 10:22 ` KOSAKI Motohiro
2008-08-21 12:02 ` KOSAKI Motohiro
2008-08-25 18:48 ` Mike Travis
2 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 10:22 UTC
To: KOSAKI Motohiro
Cc: Peter Zijlstra, Andrew Morton, David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

Sorry, the following patch is crap.
Please forget it.

I'll respin it soon.

>
> ---
>  mm/quicklist.c |    9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: b/mm/quicklist.c
> ===================================================================
> --- a/mm/quicklist.c
> +++ b/mm/quicklist.c
> @@ -26,7 +26,10 @@ DEFINE_PER_CPU(struct quicklist, quickli
>  static unsigned long max_pages(unsigned long min_pages)
>  {
>  	unsigned long node_free_pages, max;
> -	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> +	int node = numa_node_id();
> +	struct zone *zones = NODE_DATA(node)->node_zones;
> +	int num_cpus_on_node;
> +	node_to_cpumask_ptr(cpumask_on_node, node);
>
>  	node_free_pages =
>  #ifdef CONFIG_ZONE_DMA
> @@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
>  		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
>
>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
> +
> +	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
> +	max /= num_cpus_on_node;
> +
>  	return max(max, min_pages);
>  }
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 10:22 ` KOSAKI Motohiro
@ 2008-08-21 12:02 ` KOSAKI Motohiro
0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-21 12:02 UTC
To: KOSAKI Motohiro
Cc: Peter Zijlstra, Andrew Morton, David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich, travis

>
> Sorry, following patch is crap.
> please forget it.
>
> I'll respin it soon.

Ah, it's OK after all; it is not crap.

The generic-arch node_to_cpumask_ptr() does create a local cpumask_t variable:

#define node_to_cpumask_ptr(v, node)				\
		cpumask_t _##v = node_to_cpumask(node);		\
		const cpumask_t *v = &_##v

but the gcc optimizer can eliminate it, so it doesn't consume any stack.
checkstack.pl doesn't list any quicklist-related function:

% objdump -d vmlinux | ./scripts/checkstack.pl
0xa000000100647a86 sn2_global_tlb_purge [vmlinux]:	2176
0xa000000100264e86 read_kcore [vmlinux]:	1360
0xa0000001001042a6 crash_save_cpu [vmlinux]:	1152
0xa0000001007869e6 e1000_check_options [vmlinux]:	1152
0xa00000010021b9c6 __mpage_writepage [vmlinux]:	1136
0xa00000010034e9c6 fat_alloc_clusters [vmlinux]:	1136
0xa0000001009c29c6 efi_uart_console_only [vmlinux]:	1136
0xa00000010034afa6 fat_add_entries [vmlinux]:	1088
0xa00000010034d186 fat_free_clusters [vmlinux]:	1088
0xa00000010051f396 tg3_get_estats [vmlinux]:	1072
0xa000000100348f26 fat_alloc_new_dir [vmlinux]:	1040
0xa00000010079df26 cpu_init [vmlinux]:	1040
0xa00000010020fa46 block_read_full_page [vmlinux]:	1024
0xa00000010021c906 do_mpage_readpage [vmlinux]:	1024
0xa000000100016106 kernel_thread [vmlinux]:	976
0xa000000100031486 convert_to_non_syscall [vmlinux]:	928
0xa0000001001d9486 do_sys_poll [vmlinux]:	848
0xa0000001007a6406 sn_cpu_init [vmlinux]:	768
0xa00000010004bc66 find_save_locs [vmlinux]:	752
0xa0000001009faa26 sn_setup [vmlinux]:	656
0xa000000100034326 arch_ptrace [vmlinux]:	624
0xa000000100197be6 shmem_getpage [vmlinux]:	624
0xa000000100119046 cpuset_write_resmask [vmlinux]:	608
0xa0000001001da4c6 do_select [vmlinux]:	592
0xa00000010064dfd0 sn_topology_show [vmlinux]:	592
0xa00000010005b7e6 vm_info [vmlinux]:	544
0xa0000001007a0026 cache_add_dev [vmlinux]:	544
0xa00000010000beb0 sys_clone2 [vmlinux]:	528
0xa00000010000bf30 sys_clone [vmlinux]:	528
0xa00000010000bfb0 ia64_native_switch_to [vmlinux]:	528
0xa00000010000cdd0 ia64_prepare_handle_unaligned [vmlinux]:528
0xa00000010000ce40 unw_init_running [vmlinux]:	528
0xa000000100072810 ia32_clone [vmlinux]:	528
0xa0000001000729f0 sys32_fork [vmlinux]:	528
0xa0000001003089c6 log_do_checkpoint [vmlinux]:	528
0xa00000010031de06 jbd2_log_do_checkpoint [vmlinux]:	528
0xa0000001007aefa6 ia64_fault [vmlinux]:	528
0xa000000100030f66 do_regset_call [vmlinux]:	512
0xa000000100036de6 do_fpregs_set [vmlinux]:	512
0xa000000100073446 do_regset_call [vmlinux]:	512
0xa000000100194246 sys_migrate_pages [vmlinux]:	512
0xa0000001003676a6 sys_semctl [vmlinux]:	512
0xa000000100038286 do_fpregs_get [vmlinux]:	480
0xa000000100200f46 sys_vmsplice [vmlinux]:	480
0xa000000100640490 print_hook [vmlinux]:	480
0xa00000010064ab26 sn_hwperf_get_nearest_node_objdata [vmlinux]:480
0xa000000100797966 sym2_probe [vmlinux]:	480
0xa00000010000ce50 unw_init_running [vmlinux]:	464
0xa000000100015e26 get_wchan [vmlinux]:	464
0xa0000001000177e6 show_stack [vmlinux]:	464
0xa000000100035fa6 ptrace_attach_sync_user_rbs [vmlinux]:464
0xa000000100042786 ia64_handle_unaligned [vmlinux]:	464
0xa00000010009ace6 sched_show_task [vmlinux]:	464
0xa0000001003664a6 sys_semtimedop [vmlinux]:	464
0xa00000010064bec6 sn_hwperf_init [vmlinux]:	464
0xa0000001001043c6 crash_kexec [vmlinux]:	448
0xa000000100217646 __blkdev_get [vmlinux]:	448
0xa000000100672aa6 skb_splice_bits [vmlinux]:	448
0xa0000001007a35a6 fork_idle [vmlinux]:	448
0xa0000001009c95a6 ia64_mca_init [vmlinux]:	448
0xa0000001009ee766 scdrv_init [vmlinux]:	448
0xa000000100128026 relay_file_splice_read [vmlinux]:	432
0xa000000100200346 generic_file_splice_read [vmlinux]:	432
0xa0000001004bf226 node_read_meminfo [vmlinux]:	432
0xa0000001006d54c6 do_ip_setsockopt [vmlinux]:	432
0xa00000010044e1c6 extract_buf [vmlinux]:	416
0xa00000010005ae06 register_info [vmlinux]:	400
0xa0000001005fb3f6 raid6_int32_gen_syndrome [vmlinux]:	400
0xa000000100066ac6 mca_try_to_recover [vmlinux]:	384
0xa000000100262466 meminfo_read_proc [vmlinux]:	384
0xa0000001006605a6 sock_recvmsg [vmlinux]:	368
0xa000000100661106 sock_sendmsg [vmlinux]:	368
0xa000000100664226 sys_sendmsg [vmlinux]:	368
0xa0000001009b8c06 md_run_setup [vmlinux]:	368
0xa000000100160086 unmap_vmas [vmlinux]:	352
0xa00000010025d866 do_task_stat [vmlinux]:	352
0xa00000010077d3a6 ia64_tlb_init [vmlinux]:	352
0xa000000100054196 ia64_mca_printk [vmlinux]:	336
0xa000000100058c46 tr_info [vmlinux]:	336
0xa000000100066356 mca_recovered [vmlinux]:	336
0xa000000100066436 fatal_mca [vmlinux]:	336
0xa0000001006d4026 do_ip_getsockopt [vmlinux]:	336
0xa000000100015a06 cpu_halt [vmlinux]:	320
0xa000000100119ca6 cpuset_attach [vmlinux]:	320
0xa00000010064cab0 sn_hwperf_ioctl [vmlinux]:	320
0xa000000100660786 sys_recvmsg [vmlinux]:	320
0xa0000001006c4666 cleanup_once [vmlinux]:	320
0xa0000001006c5066 inet_getpeer [vmlinux]:	320
0xa0000001000a8e56 warn_slowpath [vmlinux]:	304
0xa000000100117906 update_flag [vmlinux]:	304
0xa0000001001daca6 core_sys_select [vmlinux]:	304
0xa000000100234bc6 compat_core_sys_select [vmlinux]:	304
0xa00000010038e9c6 blk_recount_segments [vmlinux]:	304
0xa000000100487486 scdrv_write [vmlinux]:	304
0xa000000100487e66 scdrv_read [vmlinux]:	304
0xa0000001004890c6 scdrv_event [vmlinux]:	304
0xa0000001005561c6 scsi_reset_provider [vmlinux]:	304
0xa00000010078d266 tg3_get_invariants [vmlinux]:	304

Conclusion: this patch can be queued for upstream, IMHO.

> >
> > ---
> >  mm/quicklist.c |    9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > Index: b/mm/quicklist.c
> > ===================================================================
> > --- a/mm/quicklist.c
> > +++ b/mm/quicklist.c
> > @@ -26,7 +26,10 @@ DEFINE_PER_CPU(struct quicklist, quickli
> >  static unsigned long max_pages(unsigned long min_pages)
> >  {
> >  	unsigned long node_free_pages, max;
> > -	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> > +	int node = numa_node_id();
> > +	struct zone *zones = NODE_DATA(node)->node_zones;
> > +	int num_cpus_on_node;
> > +	node_to_cpumask_ptr(cpumask_on_node, node);
> >
> >  	node_free_pages =
> >  #ifdef CONFIG_ZONE_DMA
> > @@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
> >  		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
> >
> >  	max = node_free_pages / FRACTION_OF_NODE_MEM;
> > +
> > +	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
> > +	max /= num_cpus_on_node;
> > +
> >  	return max(max, min_pages);
> >  }
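The generic fallback quoted above copies the node's mask into a hidden local (_v) and then points v at it, whereas an arch that keeps a static per-node map can point v straight into that map. The following is a user-space model of those two shapes, assuming a small hypothetical demo_cpumask_t and made-up demo_* helpers in place of the real kernel types. It only shows that both flavours hand the same kind of pointer to a weight count; it says nothing about whether gcc removes the copy in a particular build.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, deliberately small configuration for the model. */
#define DEMO_NR_CPUS  256
#define DEMO_NR_NODES 4
#define DEMO_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define DEMO_MASK_LONGS \
	((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Fixed-size bitmap, one bit per possible CPU (like cpumask_t). */
typedef struct {
	unsigned long bits[DEMO_MASK_LONGS];
} demo_cpumask_t;

/* Static per-node masks, playing the role of an arch's node_to_cpumask_map[]. */
static demo_cpumask_t demo_node_map[DEMO_NR_NODES];

/* By-value accessor, like an inline node_to_cpumask(): returns a copy. */
static demo_cpumask_t demo_node_to_cpumask(int node)
{
	assert(node >= 0 && node < DEMO_NR_NODES);
	return demo_node_map[node];
}

/* Generic flavour: copy into a hidden local, then point at the copy. */
#define DEMO_NODE_TO_CPUMASK_PTR_GENERIC(v, node)		\
	demo_cpumask_t _##v = demo_node_to_cpumask(node);	\
	const demo_cpumask_t *v = &_##v

/* Map flavour: point directly into the static map, no copy at all. */
#define DEMO_NODE_TO_CPUMASK_PTR_MAP(v, node)			\
	const demo_cpumask_t *v = &demo_node_map[(node)]

/* Mark a CPU as belonging to a node. */
static void demo_cpu_set(int cpu, int node)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	assert(node >= 0 && node < DEMO_NR_NODES);
	demo_node_map[node].bits[cpu / DEMO_BITS_PER_LONG] |=
		1UL << (cpu % DEMO_BITS_PER_LONG);
}

/* Count set bits, standing in for cpus_weight(). */
static int demo_cpus_weight(const demo_cpumask_t *mask)
{
	int count = 0;

	for (size_t i = 0; i < DEMO_MASK_LONGS; i++) {
		unsigned long w = mask->bits[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			count++;
		}
	}
	return count;
}

static int demo_cpus_on_node_generic(int node)
{
	DEMO_NODE_TO_CPUMASK_PTR_GENERIC(mask, node);
	return demo_cpus_weight(mask);
}

static int demo_cpus_on_node_map(int node)
{
	DEMO_NODE_TO_CPUMASK_PTR_MAP(mask, node);
	return demo_cpus_weight(mask);
}
```

Callers written against the pointer form, as in the patch's cpus_weight_nr(*cpumask_on_node), do not need to know which flavour an arch provides. Only the generic one materialises a full mask copy in the function's frame.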
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 10:04 ` KOSAKI Motohiro
2008-08-21 10:09 ` David Miller
2008-08-21 10:22 ` KOSAKI Motohiro
@ 2008-08-25 18:48 ` Mike Travis
2008-08-25 23:33 ` KOSAKI Motohiro
2 siblings, 1 reply; 43+ messages in thread
From: Mike Travis @ 2008-08-25 18:48 UTC
To: KOSAKI Motohiro
Cc: Peter Zijlstra, Andrew Morton, David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich

KOSAKI Motohiro wrote:
> Hi Peter,
>
> Thank you good point out!
>
>>> @@ -41,8 +41,8 @@ static unsigned long max_pages(unsigned
>>>
>>>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
>>>
>>> -	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
>>> -	max /= num_cpus_per_node;
>>> +	node_cpumask = node_to_cpumask(node);
>>> +	max /= cpus_weight_nr(node_cpumask);
>>>
>>>  	return max(max, min_pages);
>>>  }
>> humm, I thought we wanted to keep cpumask_t stuff away from our stack -
>> since on insanely large SGI boxen (/me looks at mike) the thing becomes
>> 512 bytes.
>
> Hm, interesting.
> I think following patch fill your point, right?
>
> but I worry about it works on sparc64...
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> ---
>  mm/quicklist.c |    9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: b/mm/quicklist.c
> ===================================================================
> --- a/mm/quicklist.c
> +++ b/mm/quicklist.c
> @@ -26,7 +26,10 @@ DEFINE_PER_CPU(struct quicklist, quickli
>  static unsigned long max_pages(unsigned long min_pages)
>  {
>  	unsigned long node_free_pages, max;
> -	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> +	int node = numa_node_id();
> +	struct zone *zones = NODE_DATA(node)->node_zones;
> +	int num_cpus_on_node;
> +	node_to_cpumask_ptr(cpumask_on_node, node);
>
>  	node_free_pages =
>  #ifdef CONFIG_ZONE_DMA
> @@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
>  		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
>
>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
> +
> +	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
> +	max /= num_cpus_on_node;
> +
>  	return max(max, min_pages);
>  }
>
>

Exactly!  And (many thanks to them!) the sparc maintainers have
implemented a similar internal function definition for node_to_cpumask_ptr().

Mike
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-25 18:48 ` Mike Travis
@ 2008-08-25 23:33 ` KOSAKI Motohiro
2008-08-26 20:35 ` Mike Travis
0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-25 23:33 UTC
To: Mike Travis
Cc: kosaki.motohiro, Peter Zijlstra, Andrew Morton, David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich

> > +	int node = numa_node_id();
> > +	struct zone *zones = NODE_DATA(node)->node_zones;
> > +	int num_cpus_on_node;
> > +	node_to_cpumask_ptr(cpumask_on_node, node);
> >
> >  	node_free_pages =
> >  #ifdef CONFIG_ZONE_DMA
> > @@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
> >  		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
> >
> >  	max = node_free_pages / FRACTION_OF_NODE_MEM;
> > +
> > +	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
> > +	max /= num_cpus_on_node;
> > +
> >  	return max(max, min_pages);
>
> Exactly!  And (many thanks to them!) the sparc maintainers have
> implemented a similar internal function definition for node_to_cpumask_ptr().

Can I get your Ack?
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-25 23:33 ` KOSAKI Motohiro
@ 2008-08-26 20:35 ` Mike Travis
0 siblings, 0 replies; 43+ messages in thread
From: Mike Travis @ 2008-08-26 20:35 UTC
To: KOSAKI Motohiro
Cc: Peter Zijlstra, Andrew Morton, David Miller, linux-kernel, linux-mm, cl, tokunaga.keiich

KOSAKI Motohiro wrote:
>>> +	int node = numa_node_id();
>>> +	struct zone *zones = NODE_DATA(node)->node_zones;
>>> +	int num_cpus_on_node;
>>> +	node_to_cpumask_ptr(cpumask_on_node, node);
>>>
>>>  	node_free_pages =
>>>  #ifdef CONFIG_ZONE_DMA
>>> @@ -38,6 +41,10 @@ static unsigned long max_pages(unsigned
>>>  		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
>>>
>>>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
>>> +
>>> +	num_cpus_on_node = cpus_weight_nr(*cpumask_on_node);
>>> +	max /= num_cpus_on_node;
>>> +
>>>  	return max(max, min_pages);
>> Exactly!  And (many thanks to them!) the sparc maintainers have
>> implemented a similar internal function definition for node_to_cpumask_ptr().
>
> Can I think get your Ack?
>

Based on code review, sure.  I'll also give it a try on one of my
test machines as soon as I can.

Mike
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 9:32 ` Peter Zijlstra
2008-08-21 10:04 ` KOSAKI Motohiro
@ 2008-08-25 18:44 ` Mike Travis
1 sibling, 0 replies; 43+ messages in thread
From: Mike Travis @ 2008-08-25 18:44 UTC
To: Peter Zijlstra
Cc: Andrew Morton, David Miller, kosaki.motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich

Peter Zijlstra wrote:
> On Thu, 2008-08-21 at 00:27 -0700, Andrew Morton wrote:
>> On Thu, 21 Aug 2008 00:13:22 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>>
>>> From: Andrew Morton <akpm@linux-foundation.org>
>>> Date: Wed, 20 Aug 2008 23:46:15 -0700
>>>
>>>> On Wed, 20 Aug 2008 20:08:13 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>>>
>>>>> +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
>>>> sparc64 allmodconfig:
>>>>
>>>> mm/quicklist.c: In function `max_pages':
>>>> mm/quicklist.c:44: error: invalid lvalue in unary `&'
>>>>
>>>> we seem to have a made a spectacular mess of cpumasks lately.
>>> It should explode similarly on x86, since it also defines node_to_cpumask()
>>> as an inline function.
>>>
>>> IA64 seems to be one of the few platforms to define this as a macro
>>> evaluating to the node-to-cpumask array entry, so it's clear what
>>> platform Motohiro-san did build testing on :-)
>> Seems to compile OK on x86_32, x86_64, ia64 and powerpc for some reason.
>>
>> This seems to fix things on sparc64:
>>
>> --- a/mm/quicklist.c~mm-quicklist-shouldnt-be-proportional-to-number-of-cpus-fix
>> +++ a/mm/quicklist.c
>> @@ -28,7 +28,7 @@ static unsigned long max_pages(unsigned
>>  	unsigned long node_free_pages, max;
>>  	int node = numa_node_id();
>>  	struct zone *zones = NODE_DATA(node)->node_zones;
>> -	int num_cpus_per_node;
>> +	cpumask_t node_cpumask;
>>
>>  	node_free_pages =
>>  #ifdef CONFIG_ZONE_DMA
>> @@ -41,8 +41,8 @@ static unsigned long max_pages(unsigned
>>
>>  	max = node_free_pages / FRACTION_OF_NODE_MEM;
>>
>> -	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
>> -	max /= num_cpus_per_node;
>> +	node_cpumask = node_to_cpumask(node);
>> +	max /= cpus_weight_nr(node_cpumask);
>>
>>  	return max(max, min_pages);
>>  }
>
> humm, I thought we wanted to keep cpumask_t stuff away from our stack -
> since on insanely large SGI boxen (/me looks at mike) the thing becomes
> 512 bytes.

Yes, thanks for pointing that out!  I did send out an alternate coding that
should keep the cpumask_t off the stack for those arch's that need to worry
about it (using the node_to_cpumask_ptr function).

I should probably devote some time to documenting some of these gotcha's
in one of the Doc.../ files.

Mike
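Peter's 512-byte figure is what a one-bit-per-CPU mask costs when the kernel is built for 4096 possible CPUs (4096 bits / 8 = 512 bytes). The NR_CPUS value of 4096 is an assumption inferred from that figure, and the demo_* names below are hypothetical. The sketch simply measures a bitmap laid out like cpumask_t, which is roughly what a local such as "cpumask_t node_cpumask;" in Andrew's fix adds to max_pages()'s frame on such a config.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed configuration: one bit per possible CPU, 4096 possible CPUs. */
#define DEMO_NR_CPUS 4096
#define DEMO_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Same layout idea as cpumask_t: a fixed-size array of unsigned longs. */
typedef struct {
	unsigned long bits[DEMO_BITS_TO_LONGS(DEMO_NR_CPUS)];
} demo_big_cpumask_t;

/* Bytes occupied by a by-value copy of a mask sized for nr_cpus CPUs. */
static size_t demo_cpumask_bytes(unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	return DEMO_BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}
```

The size does not depend on how many CPUs are actually online, only on the compile-time maximum. That is why the pointer-returning node_to_cpumask_ptr() matters on large configs even for a function as small as max_pages().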
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-21 7:13 ` David Miller
2008-08-21 7:18 ` KOSAKI Motohiro
2008-08-21 7:27 ` Andrew Morton
@ 2008-08-25 18:40 ` Mike Travis
2008-08-25 23:31 ` KOSAKI Motohiro
2 siblings, 1 reply; 43+ messages in thread
From: Mike Travis @ 2008-08-25 18:40 UTC
To: David Miller
Cc: akpm, kosaki.motohiro, linux-kernel, linux-mm, cl, tokunaga.keiich

David Miller wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 20 Aug 2008 23:46:15 -0700
>
>> On Wed, 20 Aug 2008 20:08:13 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>
>>> +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));

I think the more correct usage would be:

	{
		node_to_cpumask_ptr(v, node);
		num_cpus_per_node = cpus_weight_nr(*v);
		max /= num_cpus_per_node;

		return max(max, min_pages);
	}

which should load 'v' with a pointer to the node_to_cpumask_map[node] entry
[and avoid using stack space for the cpumask_t variable for those arch's
that define a node_to_cpumask_map (or similar).]  Otherwise a local cpumask_t
variable '_v' is created to which 'v' is pointing to and thus can be used
directly as an arg to the cpu_xxx ops.

Thanks,
Mike

>> sparc64 allmodconfig:
>>
>> mm/quicklist.c: In function `max_pages':
>> mm/quicklist.c:44: error: invalid lvalue in unary `&'
>>
>> we seem to have a made a spectacular mess of cpumasks lately.
>
> It should explode similarly on x86, since it also defines node_to_cpumask()
> as an inline function.
>
> IA64 seems to be one of the few platforms to define this as a macro
> evaluating to the node-to-cpumask array entry, so it's clear what
> platform Motohiro-san did build testing on :-)
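With Mike's pointer form in place, max_pages() reduces to simple arithmetic: take 1/16 of the node's free pages (FRACTION_OF_NODE_MEM, value taken from the thread), split it across the CPUs on that node, and never go below min_pages. The sketch below is a hypothetical, user-space restatement of just that arithmetic, with the zone lookups and the CPU count replaced by plain parameters; max_pages_sketch is an illustrative name, not a kernel function.

```c
#include <assert.h>

/* Per the thread: each node may cache at most 1/16 of its free pages. */
#define FRACTION_OF_NODE_MEM 16

/*
 * Hypothetical restatement of the patched max_pages() arithmetic.
 * node_free_pages stands in for the summed NR_FREE_PAGES of the node's
 * zones; cpus_on_node stands in for cpus_weight_nr(*cpumask_on_node).
 */
static unsigned long max_pages_sketch(unsigned long node_free_pages,
				      unsigned int cpus_on_node,
				      unsigned long min_pages)
{
	unsigned long max = node_free_pages / FRACTION_OF_NODE_MEM;

	/* The caller runs on this node, so the count includes at least itself. */
	assert(cpus_on_node > 0);
	max /= cpus_on_node;

	/* Equivalent of max(max, min_pages): never trim below the floor. */
	return max > min_pages ? max : min_pages;
}
```

Because each CPU's share is divided by the node's CPU count, the caps of all CPUs on a node together stay at about 1/16 of that node's free pages (plus the min_pages floors), instead of growing linearly with the CPU count.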
* Re: [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs
2008-08-25 18:40 ` Mike Travis
@ 2008-08-25 23:31 ` KOSAKI Motohiro
0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-25 23:31 UTC
To: Mike Travis
Cc: kosaki.motohiro, David Miller, akpm, linux-kernel, linux-mm, cl, tokunaga.keiich

Hi Mike,

> >>> +	num_cpus_per_node = cpus_weight_nr(node_to_cpumask(node));
>
> I think the more correct usage would be:
>
> 	{
> 		node_to_cpumask_ptr(v, node);
> 		num_cpus_per_node = cpus_weight_nr(*v);
> 		max /= num_cpus_per_node;
>
> 		return max(max, min_pages);
> 	}
>
> which should load 'v' with a pointer to the node_to_cpumask_map[node] entry
> [and avoid using stack space for the cpumask_t variable for those arch's
> that define a node_to_cpumask_map (or similar).]  Otherwise a local cpumask_t
> variable '_v' is created to which 'v' is pointing to and thus can be used
> directly as an arg to the cpu_xxx ops.

Thank you for your attention.
Please see my latest patch
(http://marc.info/?l=linux-mm&m=121966459713193&w=2); it does that.
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-20 11:05 [RFC][PATCH 0/2] Quicklist is slighly problematic KOSAKI Motohiro
2008-08-20 11:07 ` [RFC][PATCH 1/2] Show quicklist at meminfo KOSAKI Motohiro
2008-08-20 11:08 ` [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs KOSAKI Motohiro
@ 2008-08-20 14:10 ` Christoph Lameter
2008-08-20 14:49 ` KOSAKI Motohiro
2008-08-21 2:13 ` Robin Holt
2008-08-20 18:31 ` Andrew Morton
3 siblings, 2 replies; 43+ messages in thread
From: Christoph Lameter @ 2008-08-20 14:10 UTC
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, tokunaga.keiich

KOSAKI Motohiro wrote:
> Hi Cristoph,
>
> Thank you for explain your quicklist plan at OLS.
>
> So, I made summary to issue of quicklist.
> if you have a bit time, Could you please read this mail and patches?
> And, if possible, Could you please tell me your feeling?

I believe what I said at the OLS was that quicklists are fundamentally crappy
and should be replaced by something that works (Guess that is what you meant
by "plan"?). Quicklists were generalized from the IA64 arch code.

Good fixup but I would think that some more radical rework is needed.

Maybe some of this needs to vanish into the TLB handling logic?

Then I have thought for awhile that the main reason that quicklists exist are
the performance problems in the page allocator. If you can make the single
page alloc / free pass competitive in performance with quicklists then we
could get rid of all uses.
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-20 14:10 ` [RFC][PATCH 0/2] Quicklist is slighly problematic Christoph Lameter
@ 2008-08-20 14:49 ` KOSAKI Motohiro
2008-08-20 15:26 ` Christoph Lameter
2008-08-21 2:13 ` Robin Holt
1 sibling, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20 14:49 UTC
To: Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton, tokunaga.keiich

Hi

Thank you for the very quick response.

>> Thank you for explain your quicklist plan at OLS.
>>
>> So, I made summary to issue of quicklist.
>> if you have a bit time, Could you please read this mail and patches?
>> And, if possible, Could you please tell me your feeling?
>
> I believe what I said at the OLS was that quicklists are fundamentally crappy
> and should be replaced by something that works (Guess that is what you meant
> by "plan"?). Quicklists were generalized from the IA64 arch code.

Unfortunately, multiple ia64 customers of my company are suffering from
quicklists right now, because quicklists work well for HPC-like applications,
but business server applications behave very differently.

IOW, quicklists work well in the best case, but they don't account for the
worst case.

So, if possible, I'd like to make a short-term fix.
I believe nobody opposes reducing the quicklists; they are definitely too fat.

> Good fixup but I would think that some more radical rework is needed.
> Maybe some of this needs to vanish into the TLB handling logic?

What do you think is wrong with the TLB handling?
Is it a pure performance issue?

> Then I have thought for awhile that the main reason that quicklists exist are
> the performance problems in the page allocator. If you can make the single
> page alloc / free pass competitive in performance with quicklists then we
> could get rid of all uses.

Agreed.

Do you have any page allocator enhancement plan?
Can I help with it?
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-20 14:49 ` KOSAKI Motohiro
@ 2008-08-20 15:26 ` Christoph Lameter
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2008-08-20 15:26 UTC
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, tokunaga.keiich

KOSAKI Motohiro wrote:
> So, if possible, I'd like to make short term solution.
> I believe nobody oppose quicklist reducing. it is defenitly too fat.

Correct.

>> Good fixup but I would think that some more radical rework is needed.
>> Maybe some of this needs to vanish into the TLB handling logic?
>
> What do you think wrong TLB handing?
> pure performance issue?

The generic TLB code could be made to handle the allocation, batching and
freeing of the pages. That would remove the need for quicklists for some uses.

>
> Do you have any page allocator enhancement plan?
> Can I help it?

A simple approach would be to use the queueing method used in quicklists in
the page allocator hotpath. But the devil is in the details ....

There are numerous checks for the type of page that are done by the page
allocator and not for the quicklists. Somehow we need to work around these.
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-20 14:10 ` [RFC][PATCH 0/2] Quicklist is slighly problematic Christoph Lameter
2008-08-20 14:49 ` KOSAKI Motohiro
@ 2008-08-21 2:13 ` Robin Holt
2008-08-21 2:16 ` Robin Holt
2008-08-21 3:08 ` David Miller
1 sibling, 2 replies; 43+ messages in thread
From: Robin Holt @ 2008-08-21 2:13 UTC
To: Christoph Lameter
Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, tokunaga.keiich

On Wed, Aug 20, 2008 at 09:10:47AM -0500, Christoph Lameter wrote:
> KOSAKI Motohiro wrote:
> > Hi Cristoph,
> >
> > Thank you for explain your quicklist plan at OLS.
> >
> > So, I made summary to issue of quicklist.
> > if you have a bit time, Could you please read this mail and patches?
> > And, if possible, Could you please tell me your feeling?
>
> I believe what I said at the OLS was that quicklists are fundamentally crappy
> and should be replaced by something that works (Guess that is what you meant
> by "plan"?). Quicklists were generalized from the IA64 arch code.
>
> Good fixup but I would think that some more radical rework is needed.
>
> Maybe some of this needs to vanish into the TLB handling logic?
>
> Then I have thought for awhile that the main reason that quicklists exist are
> the performance problems in the page allocator. If you can make the single
> page alloc / free pass competitive in performance with quicklists then we
> could get rid of all uses.

It is more than the free/alloc cycle; the quicklist saves us from having
to zero the page.  In a sparsely filled page table, it saves time and
cache footprint.  In a heavily used page table, you end up with a near
wash.

One problem I see is somebody got rid of the node awareness.  We used
to not put pages onto a quicklist when they were being released from a
different node than the cpu is on.  Not sure where that went.  It was
done because of the trap page problem described here.

Thanks,
Robin
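Robin's point is that a page-table page comes back empty, all entries already cleared during teardown, so a cache of such pages can hand them out again without another memset; only pages that come fresh from the allocator need zeroing. The sketch below models that in user space with malloc() standing in for the page allocator. The demo_quicklist structure, its fixed cap DEMO_QL_MAX and the zeroing counter are hypothetical illustrations, not the layout of mm/quicklist.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_QL_MAX    8	/* hypothetical cap on cached pages */

/* Hypothetical per-CPU cache of page-table pages known to be all zero. */
struct demo_quicklist {
	void *pages[DEMO_QL_MAX];
	int nr;
	unsigned long zeroings;	/* how many times a memset was paid for */
};

static void *demo_ql_alloc(struct demo_quicklist *ql)
{
	assert(ql != NULL);

	/* Cached page: still zero from when it was emptied, reuse as-is. */
	if (ql->nr > 0)
		return ql->pages[--ql->nr];

	/* Fresh page: contents are unknown, so it must be cleared. */
	void *p = malloc(DEMO_PAGE_SIZE);
	if (p) {
		memset(p, 0, DEMO_PAGE_SIZE);
		ql->zeroings++;
	}
	return p;
}

/* The caller promises every entry in the page was cleared before freeing. */
static void demo_ql_free(struct demo_quicklist *ql, void *page)
{
	assert(ql != NULL && page != NULL);

	if (ql->nr < DEMO_QL_MAX)
		ql->pages[ql->nr++] = page;
	else
		free(page);	/* over the cap: return it to the allocator */
}
```

The saving is largest when a page table was sparsely used, since clearing a few entries on teardown is cheap compared with zeroing a whole page. That matches Robin's remark that a heavily used page table ends up close to a wash.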
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 2:13 ` Robin Holt
@ 2008-08-21 2:16 ` Robin Holt
2008-08-21 3:08 ` David Miller
1 sibling, 0 replies; 43+ messages in thread
From: Robin Holt @ 2008-08-21 2:16 UTC
To: Christoph Lameter
Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, tokunaga.keiich

On Wed, Aug 20, 2008 at 09:13:32PM -0500, Robin Holt wrote:
> On Wed, Aug 20, 2008 at 09:10:47AM -0500, Christoph Lameter wrote:
> > KOSAKI Motohiro wrote:
> > > Hi Cristoph,
> > >
> > > Thank you for explain your quicklist plan at OLS.
> > >
> > > So, I made summary to issue of quicklist.
> > > if you have a bit time, Could you please read this mail and patches?
> > > And, if possible, Could you please tell me your feeling?
> >
> > I believe what I said at the OLS was that quicklists are fundamentally crappy
> > and should be replaced by something that works (Guess that is what you meant
> > by "plan"?). Quicklists were generalized from the IA64 arch code.
> >
> > Good fixup but I would think that some more radical rework is needed.
> >
> > Maybe some of this needs to vanish into the TLB handling logic?
> >
> > Then I have thought for awhile that the main reason that quicklists exist are
> > the performance problems in the page allocator. If you can make the single
> > page alloc / free pass competitive in performance with quicklists then we
> > could get rid of all uses.
>
> It is more than the free/alloc cycle, the quicklist saves us from
> having to zero the page.  In a sparsely filled page table, it saves time
> and cache footprint.  In a heavily used page table, you end up with a
> near wash.
>
> One problem I see is somebody got rid of the node awareness.  We used
> to not put pages onto a quicklist when they were being released from a
> different node than the cpu is on.  Not sure where that went.  It was
> done because of the trap page problem described here.

Poorly worded.  Here is the code I am referring to:

#ifdef CONFIG_NUMA
	unsigned long nid = page_to_nid(virt_to_page(pgtable_entry));

	if (unlikely(nid != numa_node_id())) {
		free_page((unsigned long)pgtable_entry);
		return;
	}
#endif

Thanks,
Robin
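Robin's snippet looks up the home node of the page being freed and bypasses the quicklist for anything that is not local, so a CPU never parks another node's memory on its own list. A minimal user-space model of that decision follows. It assumes a hypothetical demo_page that records its node directly (the kernel derives it with page_to_nid(virt_to_page(...))), and it returns a flag telling the caller to free the page instead of calling free_page() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor that records its home NUMA node. */
struct demo_page {
	int nid;
	struct demo_page *next;
};

/* Hypothetical per-CPU quicklist, tagged with the node of its CPU. */
struct demo_node_quicklist {
	int nid;
	struct demo_page *head;
	int nr;
};

/*
 * Models the check quoted above: a page from another node goes straight
 * back to the allocator instead of being cached on this CPU's list.
 * Returns true if the page was cached, false if the caller must free it.
 */
static bool demo_ql_free_node_aware(struct demo_node_quicklist *ql,
				    struct demo_page *page)
{
	assert(ql != NULL && page != NULL);

	if (page->nid != ql->nid)
		return false;	/* remote page: free it immediately */

	page->next = ql->head;	/* local page: push onto the list */
	ql->head = page;
	ql->nr++;
	return true;
}
```

Without this check, a remote page can sit on a local list indefinitely, because that list is only trimmed against the local node's free memory. That is the situation Robin describes later in the thread.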
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 2:13 ` Robin Holt
2008-08-21 2:16 ` Robin Holt
@ 2008-08-21 3:08 ` David Miller
2008-08-21 13:10 ` Christoph Lameter
1 sibling, 1 reply; 43+ messages in thread
From: David Miller @ 2008-08-21 3:08 UTC
To: holt; +Cc: cl, kosaki.motohiro, linux-kernel, linux-mm, akpm, tokunaga.keiich

> One problem I see is somebody got rid of the node awareness.  We used
> to not put pages onto a quicklist when they were being released from a
> different node than the cpu is on.  Not sure where that went.  It was
> done because of the trap page problem described here.

NUMA awareness is one of the reasons I keep thinking about dropping
quicklist usage on sparc64.

Using SLAB/SLUB for the page table bits with appropriate constructor
and destructor bits ought to be able to approximate the gains
from avoiding the initialization for cached objects.
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 3:08 ` David Miller
@ 2008-08-21 13:10 ` Christoph Lameter
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2008-08-21 13:10 UTC
To: David Miller
Cc: holt, kosaki.motohiro, linux-kernel, linux-mm, akpm, tokunaga.keiich

David Miller wrote:
> Using SLAB/SLUB for the page table bits with appropriate constructor
> and destructor bits ought to be able to approximate the gains
> from avoiding the initialization for cached objects.

It's a bit strange to use the small object allocator for page-sized
allocations. Plus there is this tie-in with the TLB flushing logic. So I
think this would be cleaner if it were moved into asm-generic/tlb.h or so.
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-20 11:05 [RFC][PATCH 0/2] Quicklist is slighly problematic KOSAKI Motohiro
` (2 preceding siblings ...)
2008-08-20 14:10 ` [RFC][PATCH 0/2] Quicklist is slighly problematic Christoph Lameter
@ 2008-08-20 18:31 ` Andrew Morton
2008-08-21 2:42 ` Robin Holt
3 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2008-08-20 18:31 UTC
To: KOSAKI Motohiro
Cc: linux-kernel, linux-mm, cl, tokunaga.keiich, stable

On Wed, 20 Aug 2008 20:05:51 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi Cristoph,
>
> Thank you for explain your quicklist plan at OLS.
>
> So, I made summary to issue of quicklist.
> if you have a bit time, Could you please read this mail and patches?
> And, if possible, Could you please tell me your feeling?
>
>
> --------------------------------------------------------------------
>
> Now, Quicklist store some page in each CPU as cache.
> (Each CPU has node_free_pages/16 pages)
>
> and it is used for page table cache.
> Then, exit() increase cache, the other hand fork() spent it.
>
> So, if apache type (one parent and many child model) middleware run,
> One CPU process fork(), Other CPU process the middleware work and exit().
>
> At that time, One CPU don't have page table cache at all,
> Others have maximum caches.
>
> QList_max = (#ofCPUs - 1) x Free / 16
> => QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)
>
> So, How much quicklist spent memory at maximum case?
> That is #CPUs proposional because it is per CPU cache but cache amount calculation doesn't use #ofCPUs.
>
> Above calculation mean
>
> Number of CPUs per node           2      4      8      16
> ==============================    ========================
> QList_max / (Free + QList_max)    5.8%   16%    30%    48%
>
>
> Wow! Quicklist can spent about 50% memory at worst case.
> More unfortunately, it doesn't have any cache shrinking mechanism.
> So it cause some wrong thing.
>
> 1. End user misunderstand to memory leak happend.
>    => /proc/meminfo should display amount quicklist
>
> 2. It can cause OOM killer
>    => Amount of quicklists shouldn't be proposional to #ofCPUs.
>

OK, that's a fatal bug and it's present in 2.6.25.x and 2.6.26.x. A
serious issue.

The patches do apply to both stable kernels and I have tagged them for
backporting into them. They're nice and small, but I didn't get a
really solid yes-this-is-what-we-should-do from Christoph?


This (from [patch 2/2]): "(Although its patch applied, quicklist can
waste 64GB on 1TB server (= 1TB / 16), it is still too much??)" is a
bit of a worry. Yes, 64GB is too much! But at least this is now only
a performance issue rather than a stability issue, yes?
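The worst-case arithmetic quoted above can be checked directly. This is a sketch: `QL_FRACTION` is a hypothetical name for the divisor of 16, and `quicklist_node_bound()` assumes patch 2/2 divides each CPU's limit by the number of CPUs on the node, which is what the quoted "1TB / 16 = 64GB" figure implies. Both numbers also assume the whole node is free.

```c
#include <assert.h>

#define QL_FRACTION 16   /* hypothetical name for the "/ 16" divisor */

/*
 * Worst case described in the thread: one CPU forks and drains its
 * quicklist while the other (ncpus - 1) CPUs run exit() and each fill
 * theirs up to Free / 16:
 *   QList_max = (ncpus - 1) * Free / 16
 *   share     = QList_max / (Free + QList_max)
 *             = (ncpus - 1) / (16 + ncpus - 1)
 */
static double quicklist_worst_share(unsigned int ncpus_per_node)
{
    assert(ncpus_per_node >= 1);
    double filled = (double)(ncpus_per_node - 1);
    return filled / (QL_FRACTION + filled);
}

/*
 * Node-wide bound if each CPU's limit is Free / 16 / ncpus (assumed
 * behaviour of patch 2/2): the sum over all CPUs stays near Free / 16
 * no matter how many CPUs the node has.
 */
static unsigned long long quicklist_node_bound(unsigned long long free_bytes,
                                               unsigned int ncpus_per_node)
{
    assert(ncpus_per_node >= 1);
    unsigned long long per_cpu = free_bytes / QL_FRACTION / ncpus_per_node;
    return per_cpu * ncpus_per_node;
}
```

For 2, 4, 8 and 16 CPUs per node, `quicklist_worst_share()` gives about 5.9%, 15.8%, 30.4% and 48.4%, matching the quoted table (which truncates the first entry to 5.8%). `quicklist_node_bound(1ULL << 40, n)` returns exactly 64GB for any power-of-two `n`, i.e. the patched bound no longer grows with the CPU count.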
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-20 18:31 ` Andrew Morton
@ 2008-08-21 2:42 ` Robin Holt
2008-08-21 13:07 ` Christoph Lameter
0 siblings, 1 reply; 43+ messages in thread
From: Robin Holt @ 2008-08-21 2:42 UTC
To: Andrew Morton, KOSAKI Motohiro
Cc: linux-kernel, linux-mm, cl, tokunaga.keiich, stable

> OK, that's a fatal bug and it's present in 2.6.25.x and 2.6.26.x. A
> serious issue.
>
> The patches do apply to both stable kernels and I have tagged them for
> backporting into them. They're nice and small, but I didn't get a
> really solid yes-this-is-what-we-should-do from Christoph?
>
>
> This (from [patch 2/2]): "(Although its patch applied, quicklist can
> waste 64GB on 1TB server (= 1TB / 16), it is still too much??)" is a
> bit of a worry. Yes, 64GB is too much! But at least this is now only
> a performance issue rather than a stability issue, yes?

That 64GB is not quite correct. That assumes all 1TB is free. The
quicklists are trimmed down as the nodes undergo allocations.

The problem I see right now is that page tables allocated on one node
and freed on a CPU on a different node could be placed early enough on
the quicklist that they will not be freed until the other node gets
under memory pressure.

Could you give the following a try? It hasn't even been compiled.
I think this, in addition to your CPUs-per-node change, is the right
thing to do.
Thanks,
Robin

Index: ia64-cleanups/include/linux/quicklist.h
===================================================================
--- ia64-cleanups.orig/include/linux/quicklist.h	2008-08-20 21:35:10.000000000 -0500
+++ ia64-cleanups/include/linux/quicklist.h	2008-08-20 21:38:00.891943270 -0500
@@ -66,6 +66,15 @@ static inline void __quicklist_free(int
 
 static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
 {
+#ifdef CONFIG_NUMA
+	unsigned long nid = page_to_nid(virt_to_page(pp));
+
+	if (unlikely(nid != numa_node_id())) {
+		free_page((unsigned long)pp);
+		return;
+	}
+#endif
+
 	__quicklist_free(nr, dtor, pp, virt_to_page(pp));
 }
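Robin's two observations, that the limit shrinks as a node's free memory is consumed and that pages from one node can sit untouched on a CPU of another node, can be modelled together. This sketch assumes the trim limit is recomputed from the free-page count of the node the CPU runs on and, per the CPUs-per-node change, split among that node's CPUs. `model_numa`, `model_trim_count()` and `QL_FRACTION` are hypothetical names, not code from mm/quicklist.c, and the machine is assumed to have two nodes.

```c
#include <assert.h>

#define QL_FRACTION 16   /* hypothetical name for the "/ 16" divisor */

/* Hypothetical per-node free-page counters for a two-node machine. */
struct model_numa {
    unsigned long free_pages[2];
};

/*
 * Pages a CPU would release from its quicklist on a trim pass. The limit
 * is derived from the CPU's *own* node, so memory pressure on the node
 * that actually owns the cached pages never enters the decision.
 */
static unsigned long model_trim_count(const struct model_numa *m,
                                      int cpu_node,
                                      unsigned long nr_cached,
                                      unsigned int ncpus_on_node)
{
    assert(cpu_node == 0 || cpu_node == 1);
    assert(ncpus_on_node >= 1);
    unsigned long limit = m->free_pages[cpu_node] / QL_FRACTION / ncpus_on_node;
    return nr_cached > limit ? nr_cached - limit : 0;
}
```

For example, with node 0 at zero free pages and node 1 at 2^20, a node-1 CPU holding 1000 pages that belong to node 0 gets `model_trim_count(&m, 1, 1000, 4) == 0`: nothing is released while node 0 starves. Only when node 1 itself drops to 1600 free pages (limit 1600 / 16 / 4 = 25) are 975 pages trimmed. That is the stranding Robin describes, and what his patch tries to avoid by freeing remote pages immediately.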
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 2:42 ` Robin Holt
@ 2008-08-21 13:07 ` Christoph Lameter
2008-08-21 13:14 ` Robin Holt
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2008-08-21 13:07 UTC
To: Robin Holt
Cc: Andrew Morton, KOSAKI Motohiro, linux-kernel, linux-mm, tokunaga.keiich, stable

Robin Holt wrote:
>
> Index: ia64-cleanups/include/linux/quicklist.h
> ===================================================================
> --- ia64-cleanups.orig/include/linux/quicklist.h	2008-08-20 21:35:10.000000000 -0500
> +++ ia64-cleanups/include/linux/quicklist.h	2008-08-20 21:38:00.891943270 -0500
> @@ -66,6 +66,15 @@ static inline void __quicklist_free(int
>
>  static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
>  {
> +#ifdef CONFIG_NUMA
> +	unsigned long nid = page_to_nid(virt_to_page(pp));
> +
> +	if (unlikely(nid != numa_node_id())) {
> +		free_page((unsigned long)pp);
> +		return;
> +	}
> +#endif
> +
>  	__quicklist_free(nr, dtor, pp, virt_to_page(pp));
>  }
>

We removed this code because it frees a page before the TLB flush has been
performed. This code segment was the reason that quicklists were not accepted
for x86.
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 13:07 ` Christoph Lameter
@ 2008-08-21 13:14 ` Robin Holt
2008-08-21 13:18 ` Christoph Lameter
0 siblings, 1 reply; 43+ messages in thread
From: Robin Holt @ 2008-08-21 13:14 UTC
To: Christoph Lameter
Cc: Robin Holt, Andrew Morton, KOSAKI Motohiro, linux-kernel, linux-mm, tokunaga.keiich, stable

On Thu, Aug 21, 2008 at 08:07:43AM -0500, Christoph Lameter wrote:
> Robin Holt wrote:
> >
> > Index: ia64-cleanups/include/linux/quicklist.h
> > ===================================================================
> > --- ia64-cleanups.orig/include/linux/quicklist.h	2008-08-20 21:35:10.000000000 -0500
> > +++ ia64-cleanups/include/linux/quicklist.h	2008-08-20 21:38:00.891943270 -0500
> > @@ -66,6 +66,15 @@ static inline void __quicklist_free(int
> >
> >  static inline void quicklist_free(int nr, void (*dtor)(void *), void *pp)
> >  {
> > +#ifdef CONFIG_NUMA
> > +	unsigned long nid = page_to_nid(virt_to_page(pp));
> > +
> > +	if (unlikely(nid != numa_node_id())) {
> > +		free_page((unsigned long)pp);
> > +		return;
> > +	}
> > +#endif
> > +
> >  	__quicklist_free(nr, dtor, pp, virt_to_page(pp));
> >  }
> >
>
> We removed this code because it frees a page before the TLB flush has been
> performed. This code segment was the reason that quicklists were not accepted
> for x86.

How could we do this. It was a _HUGE_ problem on altix boxes. When you
started a jobs with a large number of MPI ranks, they would all start
from the shepherd process on a single node and the children would
migrate to a different cpu. Unless subsequent jobs used enough memory
to flush those remote quicklists, we would end up with a depleted node
that never reclaimed.

Thanks,
Robin
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 13:14 ` Robin Holt
@ 2008-08-21 13:18 ` Christoph Lameter
2008-08-21 13:45 ` Robin Holt
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2008-08-21 13:18 UTC
To: Robin Holt
Cc: Andrew Morton, KOSAKI Motohiro, linux-kernel, linux-mm, tokunaga.keiich, stable

Robin Holt wrote:

>> We removed this code because it frees a page before the TLB flush has been
>> performed. This code segment was the reason that quicklists were not accepted
>> for x86.
>
> How could we do this. It was a _HUGE_ problem on altix boxes. When you
> started a jobs with a large number of MPI ranks, they would all start
> from the shepherd process on a single node and the children would
> migrate to a different cpu. Unless subsequent jobs used enough memory
> to flush those remote quicklists, we would end up with a depleted node
> that never reclaimed.

Well I tried to get the quicklist stuff resolved at SGI multiple times last
year when the early free before flush was discovered but there did not seem to
be much interest at that point, so we dropped it.

In order to make this work correctly we would need to create a list of remote
pages. These remote pages would then be freed after the TLB flush.
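Christoph's proposed fix keeps remote-node pages away from both the quicklist and the page allocator until the flush has happened. Freeing earlier is unsafe because other CPUs may still be using translations, or in-progress hardware page-table walks, that go through that page; once it is reused, those could read whatever is written there next. The model below is a sketch under stated assumptions: the flush itself is not simulated, `model_tlb_flush_done()` stands for the point after it completes, and all names are hypothetical. The kernel's mmu_gather batching in asm-generic/tlb.h follows the same flush-then-free order for the pages it collects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model objects; none of these names are kernel APIs. */
struct model_page {
    int nid;                  /* node that owns the page */
    bool returned;            /* handed back to the (modelled) page allocator */
    struct model_page *next;
};

struct model_cpu {
    int nid;                        /* node this CPU runs on */
    struct model_page *quicklist;   /* local pages, reused for page tables */
    int nr_cached;
    struct model_page *remote;      /* remote pages waiting for the flush */
    int nr_remote;
};

/* Free path: nothing goes back to the allocator here, before the flush. */
static void model_quicklist_free(struct model_cpu *cpu, struct model_page *pg)
{
    assert(!pg->returned);          /* a page cannot be freed twice */
    if (pg->nid != cpu->nid) {
        pg->next = cpu->remote;     /* defer: remote page, flush pending */
        cpu->remote = pg;
        cpu->nr_remote++;
    } else {
        pg->next = cpu->quicklist;  /* local page: keep for reuse */
        cpu->quicklist = pg;
        cpu->nr_cached++;
    }
}

/*
 * Called once the TLB flush for the batch has completed: only now can no
 * CPU still reach the freed page tables, so remote pages go back to their
 * node. Returns the number of pages released.
 */
static int model_tlb_flush_done(struct model_cpu *cpu)
{
    int released = 0;

    while (cpu->remote != NULL) {
        struct model_page *pg = cpu->remote;
        cpu->remote = pg->next;
        pg->next = NULL;
        pg->returned = true;
        released++;
    }
    cpu->nr_remote = 0;
    return released;
}
```

Local pages stay on the quicklist for reuse, while remote ones only reach `returned = true` inside `model_tlb_flush_done()`. Keeping the remote list per CPU, rather than per flush batch, is a simplification of this model.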
* Re: [RFC][PATCH 0/2] Quicklist is slighly problematic.
2008-08-21 13:18 ` Christoph Lameter
@ 2008-08-21 13:45 ` Robin Holt
0 siblings, 0 replies; 43+ messages in thread
From: Robin Holt @ 2008-08-21 13:45 UTC
To: Christoph Lameter
Cc: Robin Holt, Andrew Morton, KOSAKI Motohiro, linux-kernel, linux-mm, tokunaga.keiich, stable

On Thu, Aug 21, 2008 at 08:18:24AM -0500, Christoph Lameter wrote:
> Robin Holt wrote:
>
> >> We removed this code because it frees a page before the TLB flush has been
> >> performed. This code segment was the reason that quicklists were not accepted
> >> for x86.
> >
> > How could we do this. It was a _HUGE_ problem on altix boxes. When you
> > started a jobs with a large number of MPI ranks, they would all start
> > from the shepherd process on a single node and the children would
> > migrate to a different cpu. Unless subsequent jobs used enough memory
> > to flush those remote quicklists, we would end up with a depleted node
> > that never reclaimed.
>
> Well I tried to get the quicklist stuff resolved at SGI multiple times last
> year when the early free before flush was discovered but there did not seem to
> be much interest at that point, so we dropped it.

Well, now that you dope slap me, I vaguely remember this. I also seem
to recall being very busy with other stuff and convincing myself that
a proper resolution would magically appear. Argh.

Sorry,
Robin
end of thread, other threads:[~2008-08-26 20:35 UTC | newest]

Thread overview: 43+ messages
2008-08-20 11:05 [RFC][PATCH 0/2] Quicklist is slighly problematic KOSAKI Motohiro
2008-08-20 11:07 ` [RFC][PATCH 1/2] Show quicklist at meminfo KOSAKI Motohiro
2008-08-20 18:35 ` Andrew Morton
2008-08-21 7:36 ` KOSAKI Motohiro
2008-08-22 1:05 ` KOSAKI Motohiro
2008-08-22 4:28 ` Andrew Morton
2008-08-22 13:23 ` Robin Holt
2008-08-22 13:56 ` Christoph Lameter
2008-08-23 8:24 ` KOSAKI Motohiro
2008-08-24 5:29 ` Andrew Morton
2008-08-20 11:08 ` [RFC][PATCH 2/2] quicklist shouldn't be proportional to # of CPUs KOSAKI Motohiro
2008-08-20 15:27 ` Christoph Lameter
2008-08-21 6:46 ` Andrew Morton
2008-08-21 7:13 ` David Miller, Andrew Morton
2008-08-21 7:18 ` KOSAKI Motohiro
2008-08-21 7:27 ` Andrew Morton
2008-08-21 7:31 ` KOSAKI Motohiro
2008-08-21 9:32 ` Peter Zijlstra
2008-08-21 10:04 ` KOSAKI Motohiro
2008-08-21 10:09 ` David Miller, KOSAKI Motohiro
2008-08-21 10:13 ` KOSAKI Motohiro
2008-08-21 10:26 ` David Miller, KOSAKI Motohiro
2008-08-21 10:22 ` KOSAKI Motohiro
2008-08-21 12:02 ` KOSAKI Motohiro
2008-08-25 18:48 ` Mike Travis
2008-08-25 23:33 ` KOSAKI Motohiro
2008-08-26 20:35 ` Mike Travis
2008-08-25 18:44 ` Mike Travis
2008-08-25 18:40 ` Mike Travis
2008-08-25 23:31 ` KOSAKI Motohiro
2008-08-20 14:10 ` [RFC][PATCH 0/2] Quicklist is slighly problematic Christoph Lameter
2008-08-20 14:49 ` KOSAKI Motohiro
2008-08-20 15:26 ` Christoph Lameter
2008-08-21 2:13 ` Robin Holt
2008-08-21 2:16 ` Robin Holt
2008-08-21 3:08 ` David Miller, Robin Holt
2008-08-21 13:10 ` Christoph Lameter
2008-08-20 18:31 ` Andrew Morton
2008-08-21 2:42 ` Robin Holt
2008-08-21 13:07 ` Christoph Lameter
2008-08-21 13:14 ` Robin Holt
2008-08-21 13:18 ` Christoph Lameter
2008-08-21 13:45 ` Robin Holt