* [PATCH 0/2] mm: measuring resource demand
@ 2006-07-11 18:29 Peter Zijlstra
  2006-07-11 18:29 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra
  2006-07-11 18:29 ` [PATCH 2/2] mm: refault histogram Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Zijlstra @ 2006-07-11 18:29 UTC
To: linux-mm, linux-kernel; +Cc: Peter Zijlstra, Rik van Riel

Hi,

This patch set implements a refault histogram. This can be used to
effectively measure resource demand, as outlined in Rik's OLS paper
"Measuring Resource Demand on Linux", available at:

  http://people.redhat.com/~riel/riel-OLS2006.pdf

This posting is meant to start a discussion on the topic, with the
ultimate goal of getting something like this into mainline.

Peter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/2] mm: nonresident page tracking 2006-07-11 18:29 [PATCH 0/2] mm: measuring resource demand Peter Zijlstra @ 2006-07-11 18:29 ` Peter Zijlstra [not found] ` <215036450607140155w67df26fan5b2342ead686ce8b@mail.gmail.com> 2006-07-11 18:29 ` [PATCH 2/2] mm: refault histogram Peter Zijlstra 1 sibling, 1 reply; 5+ messages in thread From: Peter Zijlstra @ 2006-07-11 18:29 UTC (permalink / raw) To: linux-mm, linux-kernel; +Cc: Peter Zijlstra, Rik van Riel From: Rik van Riel <riel@redhat.com> Track non-resident pages through a simple hashing scheme. This way the space overhead is limited to 1 u32 per page, or 0.1% space overhead and lookups are one cache miss. Aside from seeing whether or not a page was recently evicted, we can also take a reasonable guess at how many other pages were evicted since this page was evicted. NOTE: bucket space also contributes to the total size of the hash. This way even 64-bit machines with more than 2^32 pages get a fair chance. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> include/linux/nonresident.h | 35 +++++++++ init/main.c | 2 mm/Kconfig | 4 + mm/Makefile | 1 mm/nonresident.c | 167 ++++++++++++++++++++++++++++++++++++++++++++ mm/swap.c | 3 mm/vmscan.c | 3 7 files changed, 215 insertions(+) Index: linux-2.6/mm/nonresident.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/mm/nonresident.c 2006-07-10 19:51:24.000000000 +0200 @@ -0,0 +1,171 @@ +/* + * mm/nonresident.c + * (C) 2004,2005 Red Hat, Inc + * Written by Rik van Riel <riel@redhat.com> + * Released under the GPL, see the file COPYING for details. + * + * Keeps track of whether a non-resident page was recently evicted + * and should be immediately promoted to the active list. This also + * helps automatically tune the inactive target. 
+ * + * The pageout code stores a recently evicted page in this cache + * by calling remember_page(mapping/mm, index/vaddr, generation) + * and can look it up in the cache by calling recently_evicted() + * with the same arguments. + * + * Note that there is no way to invalidate pages after eg. truncate + * or exit, we let the pages fall out of the non-resident set through + * normal replacement. + */ +#include <linux/mm.h> +#include <linux/cache.h> +#include <linux/spinlock.h> +#include <linux/bootmem.h> +#include <linux/hash.h> +#include <linux/prefetch.h> +#include <linux/kernel.h> + +/* Number of non-resident pages per hash bucket. Never smaller than 15. */ +#if (L1_CACHE_BYTES < 64) +#define NR_BUCKET_BYTES 64 +#else +#define NR_BUCKET_BYTES L1_CACHE_BYTES +#endif +#define NUM_NR ((NR_BUCKET_BYTES - sizeof(atomic_t))/sizeof(u32)) + +struct nr_bucket +{ + atomic_t hand; + u32 page[NUM_NR]; +} ____cacheline_aligned; + +/* The non-resident page hash table. */ +static struct nr_bucket * nonres_table; +static unsigned int nonres_shift; +static unsigned int nonres_mask; + +static struct nr_bucket * nr_hash(void * mapping, unsigned long index) +{ + unsigned long bucket; + unsigned long hash; + + hash = hash_ptr(mapping, BITS_PER_LONG); + hash = 37 * hash + hash_long(index, BITS_PER_LONG); + bucket = hash & nonres_mask; + + return nonres_table + bucket; +} + +static u32 nr_cookie(struct address_space * mapping, unsigned long index) +{ + /* + * Different hash magic from bucket selection to insure + * the combined bits extend hash-space. 
+ */ + unsigned long cookie = hash_long(index, BITS_PER_LONG); + cookie = 51 * cookie + hash_ptr(mapping, BITS_PER_LONG); + + if (mapping && mapping->host) { + cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG); + } + + return (u32)(cookie >> (BITS_PER_LONG - 32)); +} + +unsigned long nonresident_get(struct address_space * mapping, unsigned long index) +{ + struct nr_bucket * nr_bucket; + int distance; + u32 wanted; + int i; + + prefetch(mapping->host); + nr_bucket = nr_hash(mapping, index); + + prefetch(nr_bucket); + wanted = nr_cookie(mapping, index); + + for (i = 0; i < NUM_NR; i++) { + if (nr_bucket->page[i] == wanted) { + nr_bucket->page[i] = 0; + /* Return the distance between entry and clock hand. */ + distance = atomic_read(&nr_bucket->hand) + NUM_NR - i; + distance %= NUM_NR; + return (distance << nonres_shift) + (nr_bucket - nonres_table); + } + } + + return ~0UL; +} + +u32 nonresident_put(struct address_space * mapping, unsigned long index) +{ + struct nr_bucket * nr_bucket; + u32 nrpage; + int i; + + prefetch(mapping->host); + nr_bucket = nr_hash(mapping, index); + + prefetchw(nr_bucket); + nrpage = nr_cookie(mapping, index); + + /* Atomically find the next array index. */ + preempt_disable(); +retry: + i = atomic_inc_return(&nr_bucket->hand); + if (unlikely(i >= NUM_NR)) { + if (i == NUM_NR) + atomic_set(&nr_bucket->hand, -1); + goto retry; + } + preempt_enable(); + + /* Statistics may want to know whether the entry was in use. */ + return xchg(&nr_bucket->page[i], nrpage); +} + +unsigned long nonresident_total(void) +{ + return NUM_NR << nonres_shift; +} + +/* + * For interactive workloads, we remember about as many non-resident pages + * as we have actual memory pages. For server workloads with large inter- + * reference distances we could benefit from remembering more. 
+ */ +static __initdata unsigned long nonresident_factor = 1; +void __init nonresident_init(void) +{ + int target; + int i; + + /* + * Calculate the non-resident hash bucket target. Use a power of + * two for the division because alloc_large_system_hash rounds up. + */ + target = nr_all_pages * nonresident_factor; + target /= (sizeof(struct nr_bucket) / sizeof(u32)); + + nonres_table = alloc_large_system_hash("Non-resident page tracking", + sizeof(struct nr_bucket), + target, + 0, + HASH_EARLY | HASH_HIGHMEM, + &nonres_shift, + &nonres_mask, + 0); + + for (i = 0; i < (1 << nonres_shift); i++) + atomic_set(&nonres_table[i].hand, 0); +} + +static int __init set_nonresident_factor(char * str) +{ + if (!str) + return 0; + nonresident_factor = simple_strtoul(str, &str, 0); + return 1; +} +__setup("nonresident_factor=", set_nonresident_factor); Index: linux-2.6/include/linux/nonresident.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/include/linux/nonresident.h 2006-07-10 19:51:27.000000000 +0200 @@ -0,0 +1,35 @@ +#ifndef _LINUX_NONRESIDENT_H_ +#define _LINUX_NONRESIDENT_H_ + +#ifdef __KERNEL__ + +#ifdef CONFIG_MM_NONRESIDENT + +extern void nonresident_init(void); +extern unsigned long nonresident_get(struct address_space *, unsigned long); +extern u32 nonresident_put(struct address_space *, unsigned long); +extern unsigned long nonresident_total(void); + +#else /* CONFIG_MM_NONRESIDENT */ + +static inline void nonresident_init(void) { } +static inline +unsigned long nonresident_get(struct address_space *, unsigned long, int) +{ + return 0; +} + +static inline u32 nonresident_put(struct address_space *, unsigned long) +{ + return 0; +} + +static inline unsigned long nonresident_total(void) +{ + return 0; +} + +#endif /* CONFIG_MM_NONRESIDENT */ + +#endif /* __KERNEL */ +#endif /* _LINUX_NONRESIDENT_H_ */ Index: linux-2.6/init/main.c 
=================================================================== --- linux-2.6.orig/init/main.c 2006-07-10 19:49:02.000000000 +0200 +++ linux-2.6/init/main.c 2006-07-10 19:49:52.000000000 +0200 @@ -49,6 +49,7 @@ #include <linux/buffer_head.h> #include <linux/debug_locks.h> #include <linux/lockdep.h> +#include <linux/nonresident.h> #include <asm/io.h> #include <asm/bugs.h> @@ -544,6 +545,7 @@ asmlinkage void __init start_kernel(void #endif vfs_caches_init_early(); cpuset_init_early(); + nonresident_init(); mem_init(); kmem_cache_init(); setup_per_cpu_pageset(); Index: linux-2.6/mm/Makefile =================================================================== --- linux-2.6.orig/mm/Makefile 2006-07-10 19:49:02.000000000 +0200 +++ linux-2.6/mm/Makefile 2006-07-10 19:49:52.000000000 +0200 @@ -13,6 +13,7 @@ obj-y := bootmem.o filemap.o mempool.o prio_tree.o util.o mmzone.o vmstat.o $(mmu-y) obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o +obj-$(CONFIG_MM_NONRESIDENT) += nonresident.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o obj-$(CONFIG_NUMA) += mempolicy.o obj-$(CONFIG_SPARSEMEM) += sparse.o Index: linux-2.6/mm/swap.c =================================================================== --- linux-2.6.orig/mm/swap.c 2006-07-10 19:49:02.000000000 +0200 +++ linux-2.6/mm/swap.c 2006-07-10 19:49:52.000000000 +0200 @@ -30,6 +30,7 @@ #include <linux/cpu.h> #include <linux/notifier.h> #include <linux/init.h> +#include <linux/nonresident.h> /* How many pages do we try to swap or page in/out together? 
*/ int page_cluster; @@ -346,6 +347,7 @@ void __pagevec_lru_add(struct pagevec *p } BUG_ON(PageLRU(page)); SetPageLRU(page); + nonresident_get(page_mapping(page), page_index(page)); add_page_to_inactive_list(zone, page); } if (zone) @@ -373,6 +375,7 @@ void __pagevec_lru_add_active(struct pag } BUG_ON(PageLRU(page)); SetPageLRU(page); + nonresident_get(page_mapping(page), page_index(page)); BUG_ON(PageActive(page)); SetPageActive(page); add_page_to_active_list(zone, page); Index: linux-2.6/mm/vmscan.c =================================================================== --- linux-2.6.orig/mm/vmscan.c 2006-07-10 19:49:02.000000000 +0200 +++ linux-2.6/mm/vmscan.c 2006-07-10 19:49:52.000000000 +0200 @@ -35,6 +35,7 @@ #include <linux/rwsem.h> #include <linux/delay.h> #include <linux/kthread.h> +#include <linux/nonresident.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -395,6 +396,7 @@ int remove_mapping(struct address_space if (PageSwapCache(page)) { swp_entry_t swap = { .val = page_private(page) }; + nonresident_put(mapping, page_index(page)); __delete_from_swap_cache(page); write_unlock_irq(&mapping->tree_lock); swap_free(swap); @@ -402,6 +404,7 @@ int remove_mapping(struct address_space return 1; } + nonresident_put(mapping, page_index(page)); __remove_from_page_cache(page); write_unlock_irq(&mapping->tree_lock); __put_page(page); Index: linux-2.6/mm/Kconfig =================================================================== --- linux-2.6.orig/mm/Kconfig 2006-07-10 19:49:02.000000000 +0200 +++ linux-2.6/mm/Kconfig 2006-07-10 19:51:24.000000000 +0200 @@ -152,3 +152,7 @@ config RESOURCES_64BIT default 64BIT help This option allows memory and IO resources to be 64 bit. + +config MM_NONRESIDENT + bool "Track nonresident pages" + def_bool y -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
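The bucket and clock-hand scheme in mm/nonresident.c above can be tried out in userspace. What follows is a minimal sketch, not the patch's code: mix() is a stand-in for the kernel's hash_ptr()/hash_long(), the table size (NR_SHIFT) is hypothetical, the sketch_* and nr_*_of names are made up for illustration, and the atomics and preemption handling are left out. It also reserves cookie 0 for "empty slot", which the posted patch does not do.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_NR    15u                 /* slots per bucket (64-byte bucket) */
#define NR_SHIFT  4u                  /* hypothetical: 2^4 buckets */
#define NR_MASK   ((1u << NR_SHIFT) - 1u)

struct nr_bucket {
	unsigned int hand;            /* index of the slot written last */
	uint32_t page[NUM_NR];        /* cookies of evicted pages, 0 = empty */
};

static struct nr_bucket nonres_table[1u << NR_SHIFT];

/* Stand-in for hash_ptr()/hash_long(); this is not the kernel's hash. */
static uint64_t mix(uint64_t x)
{
	x *= 0x9E3779B97F4A7C15ULL;
	return x ^ (x >> 32);
}

/* Bucket selection: one combination of the two hashes. */
size_t nr_bucket_of(uint64_t mapping, uint64_t index)
{
	return (size_t)((37 * mix(mapping) + mix(index)) & NR_MASK);
}

/* Cookie: a different combination, so it adds bits beyond the bucket. */
uint32_t nr_cookie_of(uint64_t mapping, uint64_t index)
{
	uint32_t c = (uint32_t)((51 * mix(index) + mix(mapping)) >> 32);

	return c ? c : 1;             /* sketch only: keep 0 for "empty" */
}

/* Remember an evicted page; returns the cookie that was overwritten. */
uint32_t sketch_put(uint64_t mapping, uint64_t index)
{
	struct nr_bucket *b = &nonres_table[nr_bucket_of(mapping, index)];
	uint32_t old;

	b->hand = (b->hand + 1) % NUM_NR;
	old = b->page[b->hand];
	b->page[b->hand] = nr_cookie_of(mapping, index);
	return old;
}

/* Look up and forget a page; ~0UL means "not in the non-resident set". */
unsigned long sketch_get(uint64_t mapping, uint64_t index)
{
	size_t n = nr_bucket_of(mapping, index);
	struct nr_bucket *b = &nonres_table[n];
	uint32_t wanted = nr_cookie_of(mapping, index);
	unsigned int i;

	for (i = 0; i < NUM_NR; i++) {
		if (b->page[i] == wanted) {
			/* Slots the hand has moved on since this entry. */
			unsigned long distance = (b->hand + NUM_NR - i) % NUM_NR;

			b->page[i] = 0;
			return (distance << NR_SHIFT) + n;
		}
	}
	return ~0UL;
}
```

The returned value scales the in-bucket distance by the number of buckets: if evictions spread evenly over the table, one step of the hand in a single bucket corresponds to roughly 2^NR_SHIFT evictions overall. The "0.1%" figure in the changelog follows from one 4-byte entry per page, which for 4096-byte pages is 4/4096, or about 0.1%.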
[parent not found: <215036450607140155w67df26fan5b2342ead686ce8b@mail.gmail.com>]
* Re: [PATCH 1/2] mm: nonresident page tracking
  [not found] ` <215036450607140155w67df26fan5b2342ead686ce8b@mail.gmail.com>
@ 2006-07-14 14:19 ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2006-07-14 14:19 UTC
To: Feng Jin; +Cc: linux-mm, linux-kernel, Rik van Riel

On Fri, 2006-07-14 at 16:55 +0800, Feng Jin wrote:
> Hi,
>
> I have applied the patch on 2.6.18-rc1-mm1, and when I boot my system,
> a kernel panic occurs. :(
> I have tried to debug it with kdb, but the panic occurs at startup;
> although I have added kdb=early, I still could not debug it.
> The attachment is my config file.

From the fact that the patch doesn't apply cleanly to .18-rc1-mm1, and
that it does boot once I fix up the rejects, I can reach no other
conclusion than that you botched it somehow. This patch was against
mainline from the day of the post.

As for your suggestion of putting #ifdef CONFIG_MM_NONRESIDENT all over
the place: have you seen how the nonresident.h file declares empty stubs
for the functions?

Peter
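The empty-stub pattern referred to in the reply can be shown in isolation. This is a generic sketch: CONFIG_DEMO_TRACKING, demo_lookup() and lookup_and_count() are hypothetical names, not taken from the patch.

```c
/* Header part: real declaration when the feature is built in,
 * a do-nothing inline stub otherwise. */
#ifdef CONFIG_DEMO_TRACKING
extern unsigned long demo_lookup(unsigned long key);
#else
static inline unsigned long demo_lookup(unsigned long key)
{
	(void)key;                    /* parameter is named but unused */
	return 0;
}
#endif

/* Caller: no #ifdef needed here; with the option off, the call
 * compiles down to the stub's constant. */
unsigned long lookup_and_count(unsigned long key)
{
	return demo_lookup(key) + 1;
}
```

Note that the stubs in nonresident.h as posted omit parameter names, and the nonresident_get() stub takes an extra int that the extern declaration does not have, so they would probably need touching up before CONFIG_MM_NONRESIDENT could actually be switched off.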
* [PATCH 2/2] mm: refault histogram
  2006-07-11 18:29 [PATCH 0/2] mm: measuring resource demand Peter Zijlstra
  2006-07-11 18:29 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra
@ 2006-07-11 18:29 ` Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2006-07-11 18:29 UTC
To: linux-mm, linux-kernel; +Cc: Peter Zijlstra, Rik van Riel

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Adds a refault histogram on top of the nonresident code.
Based on ideas and code from Rik van Riel.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

 fs/proc/proc_misc.c |   23 ++++++++++
 mm/Kconfig          |    5 ++
 mm/Makefile         |    1
 mm/nonresident.c    |   15 +++++-
 mm/refault.c        |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 155 insertions(+), 3 deletions(-)

Index: linux-2.6/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_misc.c	2006-07-11 18:06:58.000000000 +0200
+++ linux-2.6/fs/proc/proc_misc.c	2006-07-11 18:07:03.000000000 +0200
@@ -224,6 +224,26 @@ static struct file_operations fragmentat
 	.release	= seq_release,
 };
 
+#ifdef CONFIG_MM_REFAULT
+extern struct seq_operations refault_op;
+static int refault_open(struct inode *inode, struct file *file)
+{
+	(void)inode;
+	return seq_open(file, &refault_op);
+}
+
+extern ssize_t refault_write(struct file *, const char __user *buf,
+			     size_t count, loff_t *);
+
+static struct file_operations refault_file_operations = {
+	.open		= refault_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.write		= refault_write,
+};
+#endif
+
 extern struct seq_operations zoneinfo_op;
 static int zoneinfo_open(struct inode *inode, struct file *file)
 {
@@ -696,6 +716,9 @@ void __init proc_misc_init(void)
 #endif
 #endif
 	create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+#ifdef CONFIG_MM_REFAULT
+	create_seq_entry("refault",S_IRUGO, &refault_file_operations);
+#endif
create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations); create_seq_entry("zoneinfo",S_IRUGO, &proc_zoneinfo_file_operations); create_seq_entry("diskstats", 0, &proc_diskstats_operations); Index: linux-2.6/mm/Kconfig =================================================================== --- linux-2.6.orig/mm/Kconfig 2006-07-11 18:07:03.000000000 +0200 +++ linux-2.6/mm/Kconfig 2006-07-11 18:07:03.000000000 +0200 @@ -156,3 +156,8 @@ config RESOURCES_64BIT config MM_NONRESIDENT bool "Track nonresident pages" def_bool y + +config MM_REFAULT + bool "Refault histogram" + def_bool y + depends on MM_NONRESIDENT Index: linux-2.6/mm/nonresident.c =================================================================== --- linux-2.6.orig/mm/nonresident.c 2006-07-11 18:07:03.000000000 +0200 +++ linux-2.6/mm/nonresident.c 2006-07-11 18:07:03.000000000 +0200 @@ -90,12 +90,21 @@ unsigned long nonresident_get(struct add nr_bucket->page[i] = 0; /* Return the distance between entry and clock hand. */ distance = atomic_read(&nr_bucket->hand) + NUM_NR - i; - distance %= NUM_NR; - return (distance << nonres_shift) + (nr_bucket - nonres_table); + distance = (distance % NUM_NR) << nonres_shift; + distance += (nr_bucket - nonres_table); + goto out; } } - return ~0UL; + distance = ~0UL; +out: +#ifdef CONFIG_MM_REFAULT + { + extern void nonresident_refault(unsigned long); + nonresident_refault(distance); + } +#endif /* CONFIG_MM_REFAULT */ + return distance; } u32 nonresident_put(struct address_space * mapping, unsigned long index) Index: linux-2.6/mm/refault.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/mm/refault.c 2006-07-11 18:07:03.000000000 +0200 @@ -0,0 +1,114 @@ +#include <linux/config.h> +#include <linux/percpu.h> +#include <linux/seq_file.h> +#include <asm/uaccess.h> + +#define BUCKETS 64 + +DEFINE_PER_CPU(unsigned long[BUCKETS+1], refault_histogram); + +extern unsigned long 
nonresident_total(void); + +void nonresident_refault(unsigned long distance) +{ + unsigned long nonres_bucket = nonresident_total() / BUCKETS; + unsigned long bucket_id = distance / nonres_bucket; + + if (bucket_id > BUCKETS) + bucket_id = BUCKETS; + + __get_cpu_var(refault_histogram)[bucket_id]++; +} + +#ifdef CONFIG_PROC_FS + +#include <linux/seq_file.h> + +static void *frag_start(struct seq_file *m, loff_t *pos) +{ + if (*pos < 0 || *pos > BUCKETS) + return NULL; + + m->private = (void *)(unsigned long)*pos; + + return pos; +} + +static void *frag_next(struct seq_file *m, void *arg, loff_t *pos) +{ + if (*pos < BUCKETS) { + (*pos)++; + (unsigned long)m->private++; + return pos; + } + return NULL; +} + +static void frag_stop(struct seq_file *m, void *arg) +{ +} + +unsigned long get_refault_stat(unsigned long index) +{ + unsigned long total = 0; + int cpu; + + for_each_possible_cpu(cpu) { + total += per_cpu(refault_histogram, cpu)[index]; + } + return total; +} + +static int frag_show(struct seq_file *m, void *arg) +{ + unsigned long index = (unsigned long)m->private; + unsigned long nonres_bucket = nonresident_total() / BUCKETS; + unsigned long upper = ((unsigned long)index + 1) * nonres_bucket; + unsigned long lower = (unsigned long)index * nonres_bucket; + unsigned long hits = get_refault_stat(index); + + if (index == 0) + seq_printf(m, " Refault distance Hits\n"); + + if (index < BUCKETS) + seq_printf(m, "%9lu - %9lu %9lu\n", lower, upper, hits); + else + seq_printf(m, " New/Beyond %9lu %9lu\n", lower, hits); + + return 0; +} + +struct seq_operations refault_op = { + .start = frag_start, + .next = frag_next, + .stop = frag_stop, + .show = frag_show, +}; + +static void refault_reset(void) +{ + int cpu; + int bucket_id; + + for_each_possible_cpu(cpu) { + for (bucket_id = 0; bucket_id <= BUCKETS; ++bucket_id) + per_cpu(refault_histogram, cpu)[bucket_id] = 0; + } +} + +ssize_t refault_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) 
+{ + if (count) { + char c; + + if (get_user(c, buf)) + return -EFAULT; + if (c == '0') + refault_reset(); + } + return count; +} + +#endif /* CONFIG_PROCFS */ + Index: linux-2.6/mm/Makefile =================================================================== --- linux-2.6.orig/mm/Makefile 2006-07-11 18:07:03.000000000 +0200 +++ linux-2.6/mm/Makefile 2006-07-11 18:07:03.000000000 +0200 @@ -14,6 +14,7 @@ obj-y := bootmem.o filemap.o mempool.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o obj-$(CONFIG_MM_NONRESIDENT) += nonresident.o +obj-$(CONFIG_MM_REFAULT) += refault.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o obj-$(CONFIG_NUMA) += mempolicy.o obj-$(CONFIG_SPARSEMEM) += sparse.o -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 5+ messages in thread
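The bucketing done by nonresident_refault() in mm/refault.c above can be sketched in userspace as follows. The names record_refault(), refault_hits() and the tracked_total parameter (standing in for nonresident_total()) are illustrative assumptions, and a single global array replaces the per-CPU counters. The guard against a zero bucket width is an addition in this sketch; the posted code divides by nonresident_total() / BUCKETS without such a check.

```c
#include <stddef.h>

#define BUCKETS 64

/* BUCKETS regular bins plus one overflow bin for "new or beyond". */
static unsigned long refault_histogram[BUCKETS + 1];

/*
 * Account one refault. `distance` is the value returned by the
 * non-resident lookup (~0UL when the page was not found), and
 * `tracked_total` is the number of non-resident entries tracked.
 */
void record_refault(unsigned long distance, unsigned long tracked_total)
{
	unsigned long width = tracked_total / BUCKETS;
	unsigned long id;

	if (width == 0)               /* sketch only: avoid dividing by zero */
		width = 1;

	id = distance / width;
	if (id > BUCKETS)             /* clamp misses and far refaults */
		id = BUCKETS;

	refault_histogram[id]++;
}

/* Read back one bin (hypothetical accessor for the example). */
unsigned long refault_hits(size_t id)
{
	return id <= BUCKETS ? refault_histogram[id] : 0;
}
```

With tracked_total = 6400 each bin covers 100 distances, so a distance of 250 lands in bin 2, while both 6400 and ~0UL land in the final "New/Beyond" bin.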
* [PATCH 0/2] refaults
@ 2007-07-26 17:25 Peter Zijlstra
  2007-07-26 17:25 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra, Rik van Riel
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2007-07-26 17:25 UTC
To: linux-kernel, linux-mm, containers; +Cc: akpm, balbir, riel, a.p.zijlstra

Hi,

This is a brush-up of the refault patches, as presented by Rik at last
year's OLS:

  http://people.redhat.com/riel/riel-OLS2006.pdf

When talking to people at OLS this year there was renewed interest in
the concept.
* [PATCH 1/2] mm: nonresident page tracking
  2007-07-26 17:25 [PATCH 0/2] refaults Peter Zijlstra
@ 2007-07-26 17:25 ` Peter Zijlstra, Rik van Riel
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra, Rik van Riel @ 2007-07-26 17:25 UTC
To: linux-kernel, linux-mm, containers; +Cc: akpm, balbir, riel, a.p.zijlstra

[-- Attachment #1: mm-nonresident.patch --]
[-- Type: text/plain, Size: 10853 bytes --]

Track non-resident pages through a simple hashing scheme. This way
the space overhead is limited to 1 u32 per page, or 0.1% space overhead
and lookups are one cache miss.

Aside from seeing whether or not a page was recently evicted, we can
also take a reasonable guess at how many other pages were evicted since
this page was evicted.

TODO: make the entries unsigned long; currently we're limited to
2^32*NUM_NR*PAGE_SIZE bytes of memory. Even though this would end up
being 1008 TB of memory, I suspect the hash function to go crap at
around 4 to 16 TB.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 Documentation/kernel-parameters.txt |    4
 include/linux/nonresident.h         |   35 +++++++
 init/main.c                         |    2
 mm/Kconfig                          |    4
 mm/Makefile                         |    1
 mm/nonresident.c                    |  173 ++++++++++++++++++++++++++++++++++++
 mm/swap.c                           |    3
 mm/vmscan.c                         |    3
 8 files changed, 225 insertions(+)

Index: linux-2.6/mm/nonresident.c
===================================================================
--- /dev/null
+++ linux-2.6/mm/nonresident.c
@@ -0,0 +1,173 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ * + * The pageout code stores a recently evicted page in this cache + * by calling remember_page(mapping/mm, index/vaddr, generation) + * and can look it up in the cache by calling recently_evicted() + * with the same arguments. + * + * Note that there is no way to invalidate pages after eg. truncate + * or exit, we let the pages fall out of the non-resident set through + * normal replacement. + */ +#include <linux/mm.h> +#include <linux/cache.h> +#include <linux/spinlock.h> +#include <linux/bootmem.h> +#include <linux/hash.h> +#include <linux/prefetch.h> +#include <linux/kernel.h> + +/* Number of non-resident pages per hash bucket. Never smaller than 15. */ +#if (L1_CACHE_BYTES < 64) +#define NR_BUCKET_BYTES 64 +#else +#define NR_BUCKET_BYTES L1_CACHE_BYTES +#endif +#define NUM_NR ((NR_BUCKET_BYTES - sizeof(atomic_t))/sizeof(atomic_t)) + +struct nr_bucket +{ + atomic_t hand; + atomic_t cookie[NUM_NR]; +} ____cacheline_aligned; + +/* The non-resident page hash table. */ +static struct nr_bucket *nonres_table; +static unsigned int nonres_shift; +static unsigned int nonres_mask; + +static struct nr_bucket *nr_hash(void *mapping, unsigned long index) +{ + unsigned long bucket; + unsigned long hash; + + hash = hash_ptr(mapping, BITS_PER_LONG); + hash = 37 * hash + hash_long(index, BITS_PER_LONG); + bucket = hash & nonres_mask; + + return nonres_table + bucket; +} + +static u32 nr_cookie(struct address_space *mapping, unsigned long index) +{ + unsigned long cookie; + + cookie = hash_ptr(mapping, BITS_PER_LONG); + cookie = 37 * cookie + hash_long(index, BITS_PER_LONG); + + if (mapping && mapping->host) { + cookie *= 37; + cookie += hash_long(mapping->host->i_ino, BITS_PER_LONG); + } + + return (u32)(cookie >> (BITS_PER_LONG - 32)); +} + +unsigned long +nonresident_get(struct address_space *mapping, unsigned long index) +{ + struct nr_bucket *bucket; + unsigned long distance = ~0UL; + u32 wanted; + int i; + + if (!mapping) + return distance; + + prefetch(mapping->host); 
+ bucket = nr_hash(mapping, index); + + prefetch(bucket); + wanted = nr_cookie(mapping, index); + + for (i = 0; i < NUM_NR; i++) { + if ((u32)atomic_cmpxchg(&bucket->cookie[i], wanted, 0) == wanted) { + /* Return the distance between entry and clock hand. */ + distance = atomic_read(&bucket->hand) + NUM_NR - i; + distance %= NUM_NR; + distance += (bucket - nonres_table); + goto out; + } + } + +out: + return distance; +} + +u32 nonresident_put(struct address_space *mapping, unsigned long index) +{ + struct nr_bucket *bucket; + u32 cookie; + int cur_hand; + int hand; + + prefetch(mapping->host); + bucket = nr_hash(mapping, index); + + prefetchw(bucket); + cookie = nr_cookie(mapping, index); + + /* Atomically find the next array index. */ + do { + cur_hand = atomic_read(&bucket->hand); + hand = cur_hand + 1; + if (unlikely(hand == NUM_NR)) + hand = 0; + } while (atomic_cmpxchg(&bucket->hand, cur_hand, hand) != cur_hand); + + /* Statistics may want to know whether the entry was in use. */ + return atomic_xchg(&bucket->cookie[hand], cookie); +} + +unsigned long nonresident_total(void) +{ + return NUM_NR << nonres_shift; +} + +/* + * For interactive workloads, we remember about as many non-resident pages + * as we have actual memory pages. For server workloads with large inter- + * reference distances we could benefit from remembering more. + */ +static __initdata unsigned long nonresident_factor = 100; +void __init nonresident_init(void) +{ + int target; + int i; + + /* + * Calculate the non-resident hash bucket target. Use a power of + * two for the division because alloc_large_system_hash rounds up. 
+ */ + target = (nr_all_pages * nonresident_factor) / 100; + target /= (sizeof(struct nr_bucket) / sizeof(atomic_t)); + + nonres_table = alloc_large_system_hash("Non-resident page tracking", + sizeof(struct nr_bucket), + target, + 0, + HASH_EARLY, + &nonres_shift, + &nonres_mask, + 0); + + for (i = 0; i < (1 << nonres_shift); i++) + atomic_set(&nonres_table[i].hand, 0); +} + +static int __init set_nonresident_factor(char *str) +{ + if (!str) + return 0; + nonresident_factor = simple_strtoul(str, &str, 0); + return 1; +} +__setup("nonresident_factor=", set_nonresident_factor); Index: linux-2.6/include/linux/nonresident.h =================================================================== --- /dev/null +++ linux-2.6/include/linux/nonresident.h @@ -0,0 +1,35 @@ +#ifndef _LINUX_NONRESIDENT_H_ +#define _LINUX_NONRESIDENT_H_ + +#ifdef __KERNEL__ + +#ifdef CONFIG_MM_NONRESIDENT + +extern void nonresident_init(void); +extern unsigned long nonresident_get(struct address_space *, unsigned long); +extern u32 nonresident_put(struct address_space *, unsigned long); +extern unsigned long nonresident_total(void); + +#else /* CONFIG_MM_NONRESIDENT */ + +static inline void nonresident_init(void) { } +static inline +unsigned long nonresident_get(struct address_space *, unsigned long, int) +{ + return 0; +} + +static inline u32 nonresident_put(struct address_space *, unsigned long) +{ + return 0; +} + +static inline unsigned long nonresident_total(void) +{ + return 0; +} + +#endif /* CONFIG_MM_NONRESIDENT */ + +#endif /* __KERNEL */ +#endif /* _LINUX_NONRESIDENT_H_ */ Index: linux-2.6/init/main.c =================================================================== --- linux-2.6.orig/init/main.c +++ linux-2.6/init/main.c @@ -55,6 +55,7 @@ #include <linux/pid_namespace.h> #include <linux/device.h> #include <linux/kthread.h> +#include <linux/nonresident.h> #include <asm/io.h> #include <asm/bugs.h> @@ -611,6 +612,7 @@ asmlinkage void __init start_kernel(void #endif 
vfs_caches_init_early(); cpuset_init_early(); + nonresident_init(); mem_init(); kmem_cache_init(); setup_per_cpu_pageset(); Index: linux-2.6/mm/Makefile =================================================================== --- linux-2.6.orig/mm/Makefile +++ linux-2.6/mm/Makefile @@ -15,6 +15,7 @@ obj-y := bootmem.o filemap.o mempool.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o +obj-$(CONFIG_MM_NONRESIDENT) += nonresident.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o obj-$(CONFIG_NUMA) += mempolicy.o obj-$(CONFIG_SPARSEMEM) += sparse.o Index: linux-2.6/mm/swap.c =================================================================== --- linux-2.6.orig/mm/swap.c +++ linux-2.6/mm/swap.c @@ -30,6 +30,7 @@ #include <linux/cpu.h> #include <linux/notifier.h> #include <linux/init.h> +#include <linux/nonresident.h> /* How many pages do we try to swap or page in/out together? */ int page_cluster; @@ -365,6 +366,7 @@ void __pagevec_lru_add(struct pagevec *p } VM_BUG_ON(PageLRU(page)); SetPageLRU(page); + nonresident_get(page_mapping(page), page_index(page)); add_page_to_inactive_list(zone, page); } if (zone) @@ -392,6 +394,7 @@ void __pagevec_lru_add_active(struct pag } VM_BUG_ON(PageLRU(page)); SetPageLRU(page); + nonresident_get(page_mapping(page), page_index(page)); VM_BUG_ON(PageActive(page)); SetPageActive(page); add_page_to_active_list(zone, page); Index: linux-2.6/mm/vmscan.c =================================================================== --- linux-2.6.orig/mm/vmscan.c +++ linux-2.6/mm/vmscan.c @@ -37,6 +37,7 @@ #include <linux/delay.h> #include <linux/kthread.h> #include <linux/freezer.h> +#include <linux/nonresident.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -402,6 +403,7 @@ int remove_mapping(struct address_space if (PageSwapCache(page)) { swp_entry_t swap = { .val = page_private(page) }; + nonresident_put(mapping, page_index(page)); __delete_from_swap_cache(page); write_unlock_irq(&mapping->tree_lock); 
swap_free(swap); @@ -409,6 +411,7 @@ int remove_mapping(struct address_space return 1; } + nonresident_put(mapping, page_index(page)); __remove_from_page_cache(page); write_unlock_irq(&mapping->tree_lock); __put_page(page); Index: linux-2.6/mm/Kconfig =================================================================== --- linux-2.6.orig/mm/Kconfig +++ linux-2.6/mm/Kconfig @@ -158,6 +158,10 @@ config RESOURCES_64BIT help This option allows memory and IO resources to be 64 bit. +config MM_NONRESIDENT + bool "Track nonresident pages" + def_bool y + config ZONE_DMA_FLAG int default "0" if !ZONE_DMA Index: linux-2.6/Documentation/kernel-parameters.txt =================================================================== --- linux-2.6.orig/Documentation/kernel-parameters.txt +++ linux-2.6/Documentation/kernel-parameters.txt @@ -1167,6 +1167,10 @@ and is between 256 and 4096 characters. nomce [IA-32] Machine Check Exception + nonresident_factor= [KNL] Scale the size of the nonresident history + The default is 100, which equals the total + memory size. + noreplace-paravirt [IA-32,PV_OPS] Don't patch paravirt_ops noreplace-smp [IA-32,SMP] Don't replace SMP instructions -- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 5+ messages in thread
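The 2007 version of the patch replaces the preempt_disable()/retry loop with compare-and-exchange on the hand and on the cookie slots. A rough userspace equivalent using C11 <stdatomic.h> instead of the kernel's atomic_t API might look like the sketch below; bucket_put() and bucket_take() are hypothetical names, and only a single bucket is modelled, so the hashing and the per-table bucket offset are left out.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_NR 15

struct nr_bucket {
	atomic_int hand;                  /* slot written most recently */
	_Atomic uint32_t cookie[NUM_NR];  /* 0 means "empty" */
};

/* Advance the hand with a CAS loop, then swap the new cookie in.
 * Returns the cookie that previously occupied the slot. */
uint32_t bucket_put(struct nr_bucket *b, uint32_t cookie)
{
	int cur = atomic_load(&b->hand);
	int next;

	do {
		next = cur + 1;
		if (next == NUM_NR)
			next = 0;
		/* On failure `cur` is reloaded with the current hand. */
	} while (!atomic_compare_exchange_weak(&b->hand, &cur, next));

	return atomic_exchange(&b->cookie[next], cookie);
}

/* Claim a matching cookie by swapping it for 0. Returns the number of
 * slots between the entry and the hand, or -1 if not found. */
int bucket_take(struct nr_bucket *b, uint32_t wanted)
{
	int i;

	for (i = 0; i < NUM_NR; i++) {
		uint32_t expected = wanted;

		if (atomic_compare_exchange_strong(&b->cookie[i], &expected, 0))
			return (atomic_load(&b->hand) + NUM_NR - i) % NUM_NR;
	}
	return -1;
}
```

Because the lookup claims the slot with a compare-and-exchange, two concurrent lookups for the same cookie cannot both succeed, which the plain compare-then-store of the 2006 version did not guarantee.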
Thread overview: 5+ messages
2006-07-11 18:29 [PATCH 0/2] mm: measuring resource demand Peter Zijlstra
2006-07-11 18:29 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra
[not found] ` <215036450607140155w67df26fan5b2342ead686ce8b@mail.gmail.com>
2006-07-14 14:19 ` Peter Zijlstra
2006-07-11 18:29 ` [PATCH 2/2] mm: refault histogram Peter Zijlstra
2007-07-26 17:25 [PATCH 0/2] refaults Peter Zijlstra
2007-07-26 17:25 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra, Rik van Riel