linux-mm.kvack.org archive mirror
* [RFC 0/3] non-resident page tracking
@ 2005-08-08 20:14 Rik van Riel
  2005-08-08 20:14 ` [RFC 1/3] " Rik van Riel
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-08 20:14 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

These patches implement non-resident page tracking, which is needed
infrastructure for advanced page replacement algorithms like CART
and CLOCK-Pro.

The patches have been tested, but could use some eyeballs.  In
particular, I do not know if the chosen hash function gives a good
spread between the hash buckets.
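
As a rough way to check that, a userspace harness along these lines
could count per-bucket fills.  This is only a sketch: hash_long() is
transcribed from the 32-bit include/linux/hash.h, and the mapping
addresses, index ranges and table size are all made up.

/* bucketspread.c -- rough check of the nr_hash() spread (userspace).
 * Assumptions: 32-bit-style hash_long() transcribed from
 * include/linux/hash.h; fake kmalloc-style mapping addresses.
 */
#include <stdio.h>

#define GOLDEN_RATIO_PRIME 0x9e370001UL
#define NONRES_SHIFT 10
#define NONRES_MASK ((1UL << NONRES_SHIFT) - 1)

static unsigned long hash_long(unsigned long val, unsigned int bits)
{
	unsigned long hash = val * GOLDEN_RATIO_PRIME;
	return hash >> (32 - bits);
}

int main(void)
{
	static unsigned long count[1UL << NONRES_SHIFT];
	unsigned long mapping, index, hash, min = ~0UL, max = 0;
	unsigned long i, total = 0;

	/* 512 fake mappings, 4096 page indices each */
	for (mapping = 0xc1000000UL; mapping < 0xc1000000UL + 512 * 128;
	     mapping += 128) {
		for (index = 0; index < 4096; index++) {
			hash = hash_long(mapping, 32);
			hash = 37 * hash + hash_long(index, 32);
			count[hash & NONRES_MASK]++;
			total++;
		}
	}
	for (i = 0; i < (1UL << NONRES_SHIFT); i++) {
		if (count[i] < min) min = count[i];
		if (count[i] > max) max = count[i];
	}
	printf("buckets: %lu  min fill: %lu  max fill: %lu  ideal: %lu\n",
	       1UL << NONRES_SHIFT, min, max, total >> NONRES_SHIFT);
	return 0;
}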

Note that these patches are not very useful by themselves; I still
need to implement CLOCK-Pro on top of them.  For more information,
please see the linux-mm wiki:

	http://linux-mm.org/wiki/AdvancedPageReplacement

-- 
All Rights Reversed


* [RFC 1/3] non-resident page tracking
  2005-08-08 20:14 [RFC 0/3] non-resident page tracking Rik van Riel
@ 2005-08-08 20:14 ` Rik van Riel
  2005-08-08 20:26   ` David S. Miller
  2005-08-09 18:25   ` Marcelo Tosatti
  2005-08-08 20:14 ` [RFC 2/3] " Rik van Riel
  2005-08-08 20:14 ` [RFC 3/3] " Rik van Riel
  2 siblings, 2 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-08 20:14 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

[-- Attachment #1: nonresident --]
[-- Type: text/plain, Size: 9508 bytes --]

Track non-resident pages through a simple hashing scheme.  This limits
the space overhead to one u32 (4 bytes) per page, roughly 0.1% of
memory, and a lookup costs a single cache miss.

Aside from seeing whether or not a page was recently evicted, we can
also take a reasonable guess at how many other pages have been evicted
since then.
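
For illustration, here is a toy userspace model of the per-bucket
clock.  This is a sketch only: NUM_NR, the shift and the cookie values
are stand-ins, not what the kernel computes.

/* toy model of the per-bucket clock; illustrative values only */
#include <stdio.h>

#define NUM_NR 7   /* stand-in: slots per bucket */
#define SHIFT  10  /* stand-in: nonres_shift, i.e. 1 << SHIFT buckets */

struct bucket { int hand; unsigned int slot[NUM_NR]; };

static void remember(struct bucket *b, unsigned int cookie)
{
	b->slot[b->hand] = cookie;             /* overwrite oldest entry */
	b->hand = (b->hand + 1) % NUM_NR;
}

static int distance(struct bucket *b, unsigned int cookie)
{
	int i, d;

	for (i = 0; i < NUM_NR; i++) {
		if (b->slot[i] != cookie)
			continue;
		b->slot[i] = 0;                /* one refault per eviction */
		d = (b->hand + NUM_NR - i - 1) % NUM_NR;
		return (d + 1) << SHIFT;       /* scale to a global estimate */
	}
	return -1;                             /* fell out of the history */
}

int main(void)
{
	struct bucket b = { 0, { 0 } };
	unsigned int c;

	for (c = 1; c <= 5; c++)
		remember(&b, c);
	/* cookie 3 was followed by two more evictions (4 and 5),
	 * reported as (2 + 1) << 10 = 3072 pages evicted in between */
	printf("%d\n", distance(&b, 3));
	return 0;
}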

Signed-off-by: Rik van Riel <riel@redhat.com>

Index: linux-2.6.12-vm/include/linux/swap.h
===================================================================
--- linux-2.6.12-vm.orig/include/linux/swap.h
+++ linux-2.6.12-vm/include/linux/swap.h
@@ -153,6 +153,11 @@ extern void out_of_memory(unsigned int _
 /* linux/mm/memory.c */
 extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
 
+/* linux/mm/nonresident.c */
+extern int remember_page(struct address_space *, unsigned long);
+extern int recently_evicted(struct address_space *, unsigned long);
+extern void init_nonresident(void);
+
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
 extern unsigned long totalhigh_pages;
@@ -288,6 +293,11 @@ static inline swp_entry_t get_swap_page(
 #define grab_swap_token()  do { } while(0)
 #define has_swap_token(x) 0
 
+/* linux/mm/nonresident.c */
+#define init_nonresident()	do { } while (0)
+#define remember_page(x,y)	0
+#define recently_evicted(x,y)	-1
+
 #endif /* CONFIG_SWAP */
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
Index: linux-2.6.12-vm/init/main.c
===================================================================
--- linux-2.6.12-vm.orig/init/main.c
+++ linux-2.6.12-vm/init/main.c
@@ -47,6 +47,7 @@
 #include <linux/rmap.h>
 #include <linux/mempolicy.h>
 #include <linux/key.h>
+#include <linux/swap.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -488,6 +489,7 @@ asmlinkage void __init start_kernel(void
 	}
 #endif
 	vfs_caches_init_early();
+	init_nonresident();
 	mem_init();
 	kmem_cache_init();
 	numa_policy_init();
Index: linux-2.6.12-vm/mm/Makefile
===================================================================
--- linux-2.6.12-vm.orig/mm/Makefile
+++ linux-2.6.12-vm/mm/Makefile
@@ -12,7 +12,8 @@ obj-y			:= bootmem.o filemap.o mempool.o
 			   readahead.o slab.o swap.o truncate.o vmscan.o \
 			   prio_tree.o $(mmu-y)
 
-obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o \
+			   nonresident.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
 obj-$(CONFIG_SHMEM) += shmem.o
Index: linux-2.6.12-vm/mm/nonresident.c
===================================================================
--- /dev/null
+++ linux-2.6.12-vm/mm/nonresident.c
@@ -0,0 +1,157 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling remember_page(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling recently_evicted()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate entries after e.g. truncate
+ * or exit; we let them fall out of the non-resident set through
+ * normal replacement.
+ */
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+/* Number of non-resident pages per hash bucket */
+#define NUM_NR ((L1_CACHE_BYTES - sizeof(atomic_t))/sizeof(u32))
+
+struct nr_bucket
+{
+	atomic_t hand;
+	u32 page[NUM_NR];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+	unsigned long bucket;
+	unsigned long hash;
+
+	hash = hash_ptr(mapping, BITS_PER_LONG);
+	hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+	bucket = hash & nonres_mask;
+
+	return nonres_table + bucket;
+}
+
+static u32 nr_cookie(struct address_space * mapping, unsigned long index)
+{
+	unsigned long cookie = hash_ptr(mapping, BITS_PER_LONG);
+	cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+	if (mapping->host) {
+		cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+	}
+
+	return (u32)(cookie >> (BITS_PER_LONG - 32));
+}
+
+int recently_evicted(struct address_space * mapping, unsigned long index)
+{
+	struct nr_bucket * nr_bucket;
+	int distance;
+	u32 wanted;
+	int i;
+
+	prefetch(mapping->host);
+	nr_bucket = nr_hash(mapping, index);
+
+	prefetch(nr_bucket);
+	wanted = nr_cookie(mapping, index);
+
+	for (i = 0; i < NUM_NR; i++) {
+		if (nr_bucket->page[i] == wanted) {
+			nr_bucket->page[i] = 0;
+			/* Return the distance between entry and clock hand. */
+			distance = atomic_read(&nr_bucket->hand) + NUM_NR - i;
+			distance = (distance % NUM_NR) + 1;
+			return distance * (1 << nonres_shift);
+		}
+	}
+
+	return -1;
+}
+
+int remember_page(struct address_space * mapping, unsigned long index)
+{
+	struct nr_bucket * nr_bucket;
+	u32 nrpage;
+	int i;
+
+	prefetch(mapping->host);
+	nr_bucket = nr_hash(mapping, index);
+
+	prefetchw(nr_bucket);
+	nrpage = nr_cookie(mapping, index);
+
+	/* Atomically find the next array index. */
+	preempt_disable();
+  retry:
+	i = atomic_inc_return(&nr_bucket->hand);
+	if (unlikely(i >= NUM_NR)) {
+		if (i == NUM_NR)
+			atomic_set(&nr_bucket->hand, -1);
+		goto retry;
+	}
+	preempt_enable();
+
+	/* Statistics may want to know whether the entry was in use. */
+	return xchg(&nr_bucket->page[i], nrpage);
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages.  For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init init_nonresident(void)
+{
+	unsigned long target;
+	int i;
+
+	/*
+	 * Calculate the non-resident hash bucket target. Use a power of
+	 * two for the division because alloc_large_system_hash rounds up.
+	 */
+	target = nr_all_pages * nonresident_factor;
+	target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+	nonres_table = alloc_large_system_hash("Non-resident page tracking",
+					sizeof(struct nr_bucket),
+					target,
+					0,
+					HASH_EARLY | HASH_HIGHMEM,
+					&nonres_shift,
+					&nonres_mask,
+					0);
+
+	for (i = 0; i < (1 << nonres_shift); i++)
+		atomic_set(&nonres_table[i].hand, 0);
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+	if (!str)
+		return 0;
+	nonresident_factor = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
Index: linux-2.6.12-vm/mm/vmscan.c
===================================================================
--- linux-2.6.12-vm.orig/mm/vmscan.c
+++ linux-2.6.12-vm/mm/vmscan.c
@@ -509,6 +509,7 @@ static int shrink_list(struct list_head 
 #ifdef CONFIG_SWAP
 		if (PageSwapCache(page)) {
 			swp_entry_t swap = { .val = page->private };
+			remember_page(&swapper_space, page->private);
 			__delete_from_swap_cache(page);
 			write_unlock_irq(&mapping->tree_lock);
 			swap_free(swap);
@@ -517,6 +518,7 @@ static int shrink_list(struct list_head 
 		}
 #endif /* CONFIG_SWAP */
 
+		remember_page(page->mapping, page->index);
 		__remove_from_page_cache(page);
 		write_unlock_irq(&mapping->tree_lock);
 		__put_page(page);
Index: linux-2.6.12-vm/mm/filemap.c
===================================================================
--- linux-2.6.12-vm.orig/mm/filemap.c
+++ linux-2.6.12-vm/mm/filemap.c
@@ -400,8 +400,13 @@ int add_to_page_cache_lru(struct page *p
 				pgoff_t offset, int gfp_mask)
 {
 	int ret = add_to_page_cache(page, mapping, offset, gfp_mask);
-	if (ret == 0)
-		lru_cache_add(page);
+	if (ret == 0) {
+		/* Refault?  Start the page out on the active list. */
+		if (recently_evicted(mapping, offset) >= 0)
+			lru_cache_add_active(page);
+		else
+			lru_cache_add(page);
+	}
 	return ret;
 }
 
Index: linux-2.6.12-vm/mm/swap_state.c
===================================================================
--- linux-2.6.12-vm.orig/mm/swap_state.c
+++ linux-2.6.12-vm/mm/swap_state.c
@@ -323,6 +323,7 @@ struct page *read_swap_cache_async(swp_e
 			struct vm_area_struct *vma, unsigned long addr)
 {
 	struct page *found_page, *new_page = NULL;
+	int activate;
 	int err;
 
 	do {
@@ -344,6 +345,8 @@ struct page *read_swap_cache_async(swp_e
 				break;		/* Out of memory */
 		}
 
+		activate = recently_evicted(&swapper_space, entry.val);
+
 		/*
 		 * Associate the page with swap entry in the swap cache.
 		 * May fail (-ENOENT) if swap entry has been freed since
@@ -359,7 +362,10 @@ struct page *read_swap_cache_async(swp_e
 			/*
 			 * Initiate read into locked page and return.
 			 */
-			lru_cache_add_active(new_page);
+			if (activate >= 0)
+				lru_cache_add_active(new_page);
+			else
+				lru_cache_add(new_page);
 			swap_readpage(NULL, new_page);
 			return new_page;
 		}

-- 
All Rights Reversed


* [RFC 2/3] non-resident page tracking
  2005-08-08 20:14 [RFC 0/3] non-resident page tracking Rik van Riel
  2005-08-08 20:14 ` [RFC 1/3] " Rik van Riel
@ 2005-08-08 20:14 ` Rik van Riel
  2005-08-08 20:14 ` [RFC 3/3] " Rik van Riel
  2 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-08 20:14 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

[-- Attachment #1: nonresident-stats --]
[-- Type: text/plain, Size: 4738 bytes --]

Prints a histogram of refault distances in /proc/refaults.  This allows
somebody to estimate how much more memory a memory-starved system would
need to run better.

It can also help with the evaluation of page replacement algorithms,
since the algorithm that would need the least amount of extra memory
to fit a workload can be identified.
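
With 64-byte cachelines (NUM_NR = 15 entries per bucket) and a
hypothetical nonres_shift of 13, the output would look roughly like
the following; all the hit counts here are invented:

     Refault distance          Hits
        0 -      8192          1279
     8192 -     16384           512
    16384 -     24576           230
       ...
   114688 -    122880            18
 New/Beyond    122880         87311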

Signed-off-by: Rik van Riel <riel@redhat.com>

Index: linux-2.6.12-vm/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.12-vm.orig/fs/proc/proc_misc.c
+++ linux-2.6.12-vm/fs/proc/proc_misc.c
@@ -219,6 +219,20 @@ static struct file_operations fragmentat
 	.release	= seq_release,
 };
 
+extern struct seq_operations refaults_op;
+static int refaults_open(struct inode *inode, struct file *file)
+{
+	(void)inode;
+	return seq_open(file, &refaults_op);
+}
+
+static struct file_operations refaults_file_operations = {
+	.open		= refaults_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
 static int version_read_proc(char *page, char **start, off_t off,
 				 int count, int *eof, void *data)
 {
@@ -588,6 +602,7 @@ void __init proc_misc_init(void)
 	create_seq_entry("interrupts", 0, &proc_interrupts_operations);
 	create_seq_entry("slabinfo",S_IWUSR|S_IRUGO,&proc_slabinfo_operations);
 	create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+	create_seq_entry("refaults",S_IRUGO, &refaults_file_operations);
 	create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);
 	create_seq_entry("diskstats", 0, &proc_diskstats_operations);
 #ifdef CONFIG_MODULES
Index: linux-2.6.12-vm/mm/nonresident.c
===================================================================
--- linux-2.6.12-vm.orig/mm/nonresident.c
+++ linux-2.6.12-vm/mm/nonresident.c
@@ -24,6 +24,7 @@
 #include <linux/hash.h>
 #include <linux/prefetch.h>
 #include <linux/kernel.h>
+#include <linux/percpu.h>
 
 /* Number of non-resident pages per hash bucket */
 #define NUM_NR ((L1_CACHE_BYTES - sizeof(atomic_t))/sizeof(u32))
@@ -34,6 +35,9 @@ struct nr_bucket
 	u32 page[NUM_NR];
 } ____cacheline_aligned;
 
+/* Histogram for non-resident refault hits. [NUM_NR] means "not found". */
+DEFINE_PER_CPU(unsigned long[NUM_NR+1], refault_histogram);
+
 /* The non-resident page hash table. */
 static struct nr_bucket * nonres_table;
 static unsigned int nonres_shift;
@@ -81,11 +85,14 @@ int recently_evicted(struct address_spac
 			nr_bucket->page[i] = 0;
 			/* Return the distance between entry and clock hand. */
 			distance = atomic_read(&nr_bucket->hand) + NUM_NR - i;
-			distance = (distance % NUM_NR) + 1;
-			return distance * (1 << nonres_shift);
+			distance = distance % NUM_NR;
+			__get_cpu_var(refault_histogram)[distance]++;
+			return (distance + 1) * (1 << nonres_shift);
 		}
 	}
 
+	/* If this page was evicted, it was longer ago than our history. */
+	__get_cpu_var(refault_histogram)[NUM_NR]++;
 	return -1;
 }
 
@@ -155,3 +162,68 @@ static int __init set_nonresident_factor
 	return 1;
 }
 __setup("nonresident_factor=", set_nonresident_factor);
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static void *frag_start(struct seq_file *m, loff_t *pos)
+{
+	if (*pos < 0 || *pos > NUM_NR)
+		return NULL;
+
+	m->private = (void *)(unsigned long)*pos;
+
+	return pos;
+}
+
+static void *frag_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+	if (*pos < NUM_NR) {
+		(*pos)++;
+		m->private = (void *)((unsigned long)m->private + 1);
+		return pos;
+	}
+	return NULL;
+}
+
+static void frag_stop(struct seq_file *m, void *arg)
+{
+}
+
+static unsigned long get_refault_stat(unsigned long index)
+{
+	unsigned long total = 0;
+	int cpu;
+
+	for_each_cpu(cpu) {
+		total += per_cpu(refault_histogram, cpu)[index];
+	}
+	return total;
+}
+
+static int frag_show(struct seq_file *m, void *arg)
+{
+	unsigned long index = (unsigned long)m->private;
+	unsigned long upper = (index + 1) << nonres_shift;
+	unsigned long lower = index << nonres_shift;
+	unsigned long hits = get_refault_stat(index);
+
+	if (index == 0)
+		seq_printf(m, "     Refault distance          Hits\n");
+
+	if (index < NUM_NR)
+		seq_printf(m, "%9lu - %9lu     %9lu\n", lower, upper, hits);
+	else
+		seq_printf(m, " New/Beyond %9lu     %9lu\n", lower, hits);
+
+	return 0;
+}
+
+struct seq_operations refaults_op = {
+	.start  = frag_start,
+	.next   = frag_next,
+	.stop   = frag_stop,
+	.show   = frag_show,
+};
+#endif /* CONFIG_PROC_FS */

-- 
All Rights Reversed


* [RFC 3/3] non-resident page tracking
  2005-08-08 20:14 [RFC 0/3] non-resident page tracking Rik van Riel
  2005-08-08 20:14 ` [RFC 1/3] " Rik van Riel
  2005-08-08 20:14 ` [RFC 2/3] " Rik van Riel
@ 2005-08-08 20:14 ` Rik van Riel
  2 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-08 20:14 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

[-- Attachment #1: useonce-cleanup --]
[-- Type: text/plain, Size: 5740 bytes --]

Simplify the use-once code.  I have not benchmarked this change yet,
but I expect it to have little impact on most workloads.  It gets rid
of some magic code though, which is nice.
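
In effect a page now gets exactly one free pass: the first reference
after it enters the page cache only clears PG_new, and a second
reference before the next scan is what activates it.  A stubbed
userspace sketch of the new decision (not the kernel functions):

#include <stdio.h>

enum action { RECLAIM, KEEP_INACTIVE, ACTIVATE };

/* referenced: result of page_referenced(); new: the PG_new bit */
static enum action scan(int referenced, int *new)
{
	if (!referenced)
		return RECLAIM;        /* not used since the last scan */
	if (*new) {
		*new = 0;              /* first use: burn the free pass */
		return KEEP_INACTIVE;  /* wait and see if it is used again */
	}
	return ACTIVATE;               /* used at least twice: promote */
}

int main(void)
{
	int new = 1;  /* SetPageNew() in add_to_page_cache() */
	enum action first = scan(1, &new);
	enum action second = scan(1, &new);

	printf("%d %d\n", first, second);  /* 1 2: kept once, then activated */
	return 0;
}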

Signed-off-by: Rik van Riel <riel@surriel.com>

Index: linux-2.6.12-vm/include/linux/page-flags.h
===================================================================
--- linux-2.6.12-vm.orig/include/linux/page-flags.h
+++ linux-2.6.12-vm/include/linux/page-flags.h
@@ -75,7 +75,9 @@
 #define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
 #define PG_reclaim		18	/* To be reclaimed asap */
 #define PG_nosave_free		19	/* Free, should not be written */
+
 #define PG_uncached		20	/* Page has been mapped as uncached */
+#define PG_new			21	/* Newly allocated page */
 
 /*
  * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -306,6 +308,11 @@ extern void __mod_page_state(unsigned of
 #define SetPageUncached(page)	set_bit(PG_uncached, &(page)->flags)
 #define ClearPageUncached(page)	clear_bit(PG_uncached, &(page)->flags)
 
+#define PageNew(page)		test_bit(PG_new, &(page)->flags)
+#define SetPageNew(page)	set_bit(PG_new, &(page)->flags)
+#define ClearPageNew(page)	clear_bit(PG_new, &(page)->flags)
+#define TestClearPageNew(page)	test_and_clear_bit(PG_new, &(page)->flags)
+
 struct page;	/* forward declaration */
 
 int test_clear_page_dirty(struct page *page);
Index: linux-2.6.12-vm/mm/filemap.c
===================================================================
--- linux-2.6.12-vm.orig/mm/filemap.c
+++ linux-2.6.12-vm/mm/filemap.c
@@ -383,6 +383,7 @@ int add_to_page_cache(struct page *page,
 		if (!error) {
 			page_cache_get(page);
 			SetPageLocked(page);
+			SetPageNew(page);
 			page->mapping = mapping;
 			page->index = offset;
 			mapping->nrpages++;
@@ -727,7 +728,6 @@ void do_generic_mapping_read(struct addr
 	unsigned long offset;
 	unsigned long last_index;
 	unsigned long next_index;
-	unsigned long prev_index;
 	loff_t isize;
 	struct page *cached_page;
 	int error;
@@ -736,7 +736,6 @@ void do_generic_mapping_read(struct addr
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
-	prev_index = ra.prev_page;
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -783,13 +782,7 @@ page_ok:
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_page(page);
 
-		/*
-		 * When (part of) the same page is read multiple times
-		 * in succession, only mark it as accessed the first time.
-		 */
-		if (prev_index != index)
-			mark_page_accessed(page);
-		prev_index = index;
+		mark_page_accessed(page);
 
 		/*
 		 * Ok, we have the page, and it's up-to-date, so
Index: linux-2.6.12-vm/mm/shmem.c
===================================================================
--- linux-2.6.12-vm.orig/mm/shmem.c
+++ linux-2.6.12-vm/mm/shmem.c
@@ -1525,11 +1525,8 @@ static void do_shmem_file_read(struct fi
 			 */
 			if (mapping_writably_mapped(mapping))
 				flush_dcache_page(page);
-			/*
-			 * Mark the page accessed if we read the beginning.
-			 */
-			if (!offset)
-				mark_page_accessed(page);
+
+			mark_page_accessed(page);
 		} else
 			page = ZERO_PAGE(0);
 
Index: linux-2.6.12-vm/mm/swap.c
===================================================================
--- linux-2.6.12-vm.orig/mm/swap.c
+++ linux-2.6.12-vm/mm/swap.c
@@ -115,19 +115,11 @@ void fastcall activate_page(struct page 
 
 /*
  * Mark a page as having seen activity.
- *
- * inactive,unreferenced	->	inactive,referenced
- * inactive,referenced		->	active,unreferenced
- * active,unreferenced		->	active,referenced
  */
 void fastcall mark_page_accessed(struct page *page)
 {
-	if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
-		activate_page(page);
-		ClearPageReferenced(page);
-	} else if (!PageReferenced(page)) {
+	if (!PageReferenced(page))
 		SetPageReferenced(page);
-	}
 }
 
 EXPORT_SYMBOL(mark_page_accessed);
@@ -157,6 +149,7 @@ void fastcall lru_cache_add_active(struc
 	if (!pagevec_add(pvec, page))
 		__pagevec_lru_add_active(pvec);
 	put_cpu_var(lru_add_active_pvecs);
+	ClearPageNew(page);
 }
 
 void lru_add_drain(void)
Index: linux-2.6.12-vm/mm/vmscan.c
===================================================================
--- linux-2.6.12-vm.orig/mm/vmscan.c
+++ linux-2.6.12-vm/mm/vmscan.c
@@ -225,27 +225,6 @@ static int shrink_slab(unsigned long sca
 	return 0;
 }
 
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
-	struct address_space *mapping;
-
-	/* Page is in somebody's page tables. */
-	if (page_mapped(page))
-		return 1;
-
-	/* Be more reluctant to reclaim swapcache than pagecache */
-	if (PageSwapCache(page))
-		return 1;
-
-	mapping = page_mapping(page);
-	if (!mapping)
-		return 0;
-
-	/* File is mmap'd by somebody? */
-	return mapping_mapped(mapping);
-}
-
 static inline int is_page_cache_freeable(struct page *page)
 {
 	return page_count(page) - !!PagePrivate(page) == 2;
@@ -398,9 +377,13 @@ static int shrink_list(struct list_head 
 			goto keep_locked;
 
 		referenced = page_referenced(page, 1, sc->priority <= 0);
-		/* In active use or really unfreeable?  Activate it. */
-		if (referenced && page_mapping_inuse(page))
+
+		if (referenced) {
+			/* New page. Wait and see if it gets used again... */
+			if (TestClearPageNew(page))
+				goto keep_locked;
 			goto activate_locked;
+		}
 
 #ifdef CONFIG_SWAP
 		/*

-- 
All Rights Reversed


* Re: [RFC 1/3] non-resident page tracking
  2005-08-08 20:14 ` [RFC 1/3] " Rik van Riel
@ 2005-08-08 20:26   ` David S. Miller, Rik van Riel
  2005-08-08 20:30     ` Rik van Riel
  2005-08-09 18:25   ` Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: David S. Miller @ 2005-08-08 20:26 UTC (permalink / raw)
  To: riel; +Cc: linux-mm, linux-kernel

> @@ -359,7 +362,10 @@ struct page *read_swap_cache_async(swp_e
>  			/*
>  			 * Initiate read into locked page and return.
>  			 */
> -			lru_cache_add_active(new_page);
> +			if (activate >= 0)
> +				lru_cache_add_active(new_page);
> +			else
> +				lru_cache_add(new_page);
>  			swap_readpage(NULL, new_page);
>  			return new_page;

This change is totally unrelated to the rest of the
patch, and is not mentioned in the changelog.  Could
you explain it?


* Re: [RFC 1/3] non-resident page tracking
  2005-08-08 20:26   ` David S. Miller
@ 2005-08-08 20:30     ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-08 20:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-mm, linux-kernel

On Mon, 8 Aug 2005, David S. Miller wrote:

> > @@ -359,7 +362,10 @@ struct page *read_swap_cache_async(swp_e

> > -			lru_cache_add_active(new_page);
> > +			if (activate >= 0)
> > +				lru_cache_add_active(new_page);
> > +			else
> > +				lru_cache_add(new_page);
> 
> This change is totally unrelated to the rest of the
> patch, and is not mentioned in the changelog.  Could
> you explain it?

Oops, you're right.  This is part of the replacement policy in
CLOCK-Pro, ARC, CART, etc., and should have been in a separate
patch.

This is what I get for pulling an all-nighter. ;)

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: [RFC 1/3] non-resident page tracking
  2005-08-08 20:14 ` [RFC 1/3] " Rik van Riel
  2005-08-08 20:26   ` David S. Miller
@ 2005-08-09 18:25   ` Marcelo Tosatti
  2005-08-09 19:15     ` Peter Zijlstra
  2005-08-09 23:52     ` Rik van Riel
  1 sibling, 2 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2005-08-09 18:25 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

Hi Rik,

Two hopefully useful comments:

i) ARC and its variants require additional information about page
replacement (namely whether the page has been reclaimed from the L1 or
L2 lists).

How costly would it be to add this information to the hash table?

ii) From my reading of the patch, the provided "distance" information is
relative to each hash bucket. How is the distance metric useful if
measured per-hash-bucket instead of globally?

PS: Since remember_page() is always called with the zone->lru_lock held,
the preempt_disable/enable pair is unnecessary at the moment... still,
it might be better to leave it there for safety reasons.

On Mon, Aug 08, 2005 at 04:14:17PM -0400, Rik van Riel wrote:
> Track non-resident pages through a simple hashing scheme.  This limits
> the space overhead to one u32 (4 bytes) per page, roughly 0.1% of
> memory, and a lookup costs a single cache miss.
> 
> Aside from seeing whether or not a page was recently evicted, we can
> also take a reasonable guess at how many other pages have been evicted
> since then.

[rest of the patch snipped]


* Re: [RFC 1/3] non-resident page tracking
  2005-08-09 18:25   ` Marcelo Tosatti
@ 2005-08-09 19:15     ` Peter Zijlstra
  2005-08-09 21:13       ` Marcelo Tosatti
  2005-08-09 23:52     ` Rik van Riel
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2005-08-09 19:15 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, linux-mm, Rik van Riel

On Tue, 2005-08-09 at 15:25 -0300, Marcelo Tosatti wrote:
> Hi Rik,
> 
> Two hopefully useful comments:
> 
> i) ARC and its variants require additional information about page
> replacement (namely whether the page has been reclaimed from the L1 or
> L2 lists).
> 
> How costly would it be to add this information to the hash table?
> 
I've been thinking of reserving another word in the cacheline and using
that as a bit-array to keep that information; the only problems with
that would be atomicity of the {bucket,bit} tuple and very large
cachelines where NUM_NR > 32.
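
For instance (a hypothetical layout, mocking up the kernel types, and
ignoring the update-atomicity question; the extra word costs one page
slot per bucket):

typedef struct { volatile int counter; } atomic_t;  /* userspace mock-up */
#define L1_CACHE_BYTES 64

#define NUM_NR ((L1_CACHE_BYTES - sizeof(atomic_t) - sizeof(unsigned int)) \
		/ sizeof(unsigned int))

struct nr_bucket {
	atomic_t hand;
	unsigned int list_bits;        /* bit i set: page[i] came off L2 */
	unsigned int page[NUM_NR];
};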

> ii) From my reading of the patch, the provided "distance" information is
> relative to each hash bucket. How is the distance metric useful if
> measured per-hash-bucket instead of globally?

The assumption is that IFF the hash function has good distribution
properties, the per-bucket distance is a good approximation of
(global distance >> nonres_shift).
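
(A concrete example with made-up numbers: with nonres_shift = 14 there
are 2^14 = 16384 buckets, so a hit found at per-bucket distance 3 is
reported as 3 * 16384 = 49152, i.e. "roughly 49k pages were evicted
since this one".  The estimate is only as good as the assumption that
all buckets receive evictions at the same rate.)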

> 
> PS: Since remember_page() is always called with the zone->lru_lock held,
> the preempt_disable/enable pair is unnecessary at the moment... still,
> it might be better to leave it there for safety reasons.
> 

There being multiple zones, owning zone->lru_lock does not guarantee
exclusive access on the remember_page() path, as the hash table is a
global structure.

-- 
Peter Zijlstra <a.p.zijlstra@chello.nl>



* Re: [RFC 1/3] non-resident page tracking
  2005-08-09 19:15     ` Peter Zijlstra
@ 2005-08-09 21:13       ` Marcelo Tosatti
  2005-08-10  8:40         ` Rik van Riel
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2005-08-09 21:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, linux-mm, Rik van Riel

On Tue, Aug 09, 2005 at 09:15:26PM +0200, Peter Zijlstra wrote:
> On Tue, 2005-08-09 at 15:25 -0300, Marcelo Tosatti wrote:
> > Hi Rik,
> > 
> > Two hopefully useful comments:
> > 
> > i) ARC and its variants require additional information about page
> > replacement (namely whether the page has been reclaimed from the L1 or
> > L2 lists).
> > 
> > How costly would it be to add this information to the hash table?
> > 
> I've been thinking of reserving another word in the cacheline and using
> that as a bit-array to keep that information; the only problems with
> that would be atomicity of the {bucket,bit} tuple and very large
> cachelines where NUM_NR > 32.

The chance for a lookup hit to happen on a hash value which is in a
modified state in a different CPU's cacheline should be pretty small
(depends on the architecture also, but shouldn't be much of an issue I
guess).

Given that, guaranteed validity of the data is not necessary; it is OK
to be incorrect occasionally.

> > ii) From my reading of the patch, the provided "distance" information is
> > relative to each hash bucket. How is the distance metric useful if
> > measured per-hash-bucket instead of globally?
> 
> The assumption is that IFF the hash function has good distribution
> properties, the per-bucket distance is a good approximation of
> (global distance >> nonres_shift).

Well, it does not sound like a "good approximation" to me: the resolution
goes down to L1_CACHE_BYTES/sizeof(u32), which is:

- 8 on a 32-byte cacheline
- 16 on a 64-byte cacheline
- 32 on a 128-byte cacheline

Right?

So the (nice!) refault histogram gets limited to those values?

> > PS: Since remember_page() is always called with the zone->lru_lock held,
> > the preempt_disable/enable pair is unnecessary at the moment... still,
> > it might be better to leave it there for safety reasons.
> > 
> 
> There being multiple zones, owning zone->lru_lock does not guarantee
> exclusive access on the remember_page() path, as the hash table is a
> global structure.

True, but it guarantees disabled preemption. No big deal...


* Re: [RFC 1/3] non-resident page tracking
  2005-08-09 18:25   ` Marcelo Tosatti
  2005-08-09 19:15     ` Peter Zijlstra
@ 2005-08-09 23:52     ` Rik van Riel
  1 sibling, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-09 23:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-mm, linux-kernel

On Tue, 9 Aug 2005, Marcelo Tosatti wrote:

> Two hopefully useful comments:
> 
> i) ARC and its variants require additional information about page
> replacement (namely whether the page has been reclaimed from the L1 or
> L2 lists).
> 
> How costly would it be to add this information to the hash table?

Not at all.  Simply reduce the hash to 31 bits and use the remaining
bit to store that value.
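
Something along these lines, as a hypothetical sketch (the names and
the bit choice are invented; nr_cookie() would shift its result right
by one):

#include <stdint.h>

#define NR_L2_BIT 0x80000000u

/* store: drop one bit of hash, tag the entry with the list it came from */
static uint32_t nr_entry(uint32_t cookie, int from_l2)
{
	return (cookie >> 1) | (from_l2 ? NR_L2_BIT : 0);
}

/* lookup: match on the low 31 bits, recover the tag separately */
static int nr_match(uint32_t slot, uint32_t cookie, int *from_l2)
{
	if ((slot & ~NR_L2_BIT) != (cookie >> 1))
		return 0;
	*from_l2 = !!(slot & NR_L2_BIT);
	return 1;
}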

> ii) From my reading of the patch, the provided "distance" information is
> relative to each hash bucket. How is the distance metric useful if
> measured per-hash-bucket instead of globally?

The idea is that the hash function spreads things around evenly
enough for the different buckets to rotate at roughly the same
speed.

-- 
All Rights Reversed


* Re: [RFC 1/3] non-resident page tracking
  2005-08-09 21:13       ` Marcelo Tosatti
@ 2005-08-10  8:40         ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2005-08-10  8:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Peter Zijlstra, linux-kernel, linux-mm

On Tue, 9 Aug 2005, Marcelo Tosatti wrote:

> Well, it does not sound like a "good approximation" to me: the resolution
> goes down to L1_CACHE_BYTES/sizeof(u32), which is:
> 
> - 8 on a 32-byte cacheline
> - 16 on a 64-byte cacheline
> - 32 on a 128-byte cacheline
> 
> Right?
> 
> So the (nice!) refault histogram gets limited to those values?

I agree that 7 would be too small.  I guess I should limit the
minimum size of the nonresident hash bucket to 15 entries...
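
A minimal way to express that (hypothetical; on 32-byte cachelines the
bucket would then simply span two cachelines):

#define NUM_NR_MIN 15
#define NUM_NR_RAW ((L1_CACHE_BYTES - sizeof(atomic_t)) / sizeof(u32))
#define NUM_NR     (NUM_NR_RAW < NUM_NR_MIN ? NUM_NR_MIN : NUM_NR_RAW)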

-- 
All Rights Reversed


Thread overview: 11+ messages
2005-08-08 20:14 [RFC 0/3] non-resident page tracking Rik van Riel
2005-08-08 20:14 ` [RFC 1/3] " Rik van Riel
2005-08-08 20:26   ` David S. Miller
2005-08-08 20:30     ` Rik van Riel
2005-08-09 18:25   ` Marcelo Tosatti
2005-08-09 19:15     ` Peter Zijlstra
2005-08-09 21:13       ` Marcelo Tosatti
2005-08-10  8:40         ` Rik van Riel
2005-08-09 23:52     ` Rik van Riel
2005-08-08 20:14 ` [RFC 2/3] " Rik van Riel
2005-08-08 20:14 ` [RFC 3/3] " Rik van Riel
