* Zoned CART
@ 2005-08-12 14:37 Peter Zijlstra
2005-08-12 15:42 ` Rahul Iyer
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-12 14:37 UTC (permalink / raw)
To: linux-mm; +Cc: Rik van Riel, Marcelo Tosatti, Rahul Iyer
[-- Attachment #1: Type: text/plain, Size: 2183 bytes --]
Hi All,
I've been thinking on how to implement a zoned CART; and I think I have
found a nice concept.
My ideas are based on the initial cart patch by Rahul and the
non-resident code of Rik.
For a zoned page replacement algorithm we have per zone resident list(s)
and global non-resident list(s). CART specific we would have a T1_i and
T2_i, where 0 <= i <= nr_zones, and global B1 and B2 lists.
Because B1 and B2 are variable size and the B1_i target size q_i is zone
specific we need some tricks. However since |B1| + |B2| = c we could get
away with a single hash_table of c entries if we can manage to balance
the entries within.
I propose to do this by using a 2 hand bucket and using the 2 MSB of the
cookie (per bucket uniqueness; 30 bits of uniqueness should be enough on
a ~64 count bucket). The cookies MSB is used to distinguish B1/B2 and
the MSB-1 is used for the filter bit.
Let us denote the buckets with the subscript j: |B1_j| + |B2_j| = c_j.
Each hand keeps a FIFO for its corresponding type: B1/B2; eg. rotating
H1_j will select the next oldest B1_j page for removal.
We need to balance the per zone values:
T1_i, T2_i, |T1_i|, |T2_i|
p_i, Ns_i, Nl_i
|B1_i|, |B2_i|, q_i
agains the per bucket values:
B1_j, B2_j.
This can be done with two simple modifications to the algorithm:
- explicitly keep |B1_i| and |B2_i| - needed for the p,q targets
- merge the history replacement (lines 6-10) in the replace (lines
36-40) code so that: adding the new MRU page and removing the old LRU
page becomes one action.
This will keep:
|B1_j| |B1| Sum^i(|B1_i|)
-------- ~ ------ = -------------
|B2_j| |B2| Sum^i(|B2_i|)
however it will violate strict FIFO order within the buckets; although I
guess it won't be too bad.
This approach does away with explicitly keeping the FIFO lists for the
non-resident pages and merges them.
Attached is a modification of rik his non-resident code that implements
the buckets described herein.
I shall attempt to merge this code into the Rahuls new cart-patch-2 if
you guys don't see any big problems with the approach, or beat me to it.
Kind regards,
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
[-- Attachment #2: nonresident-pages.patch --]
[-- Type: text/x-patch, Size: 9331 bytes --]
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/include/linux/nonresident.h linux-2.6.13-rc6-cart/include/linux/nonresident.h
--- linux-2.6.13-rc6/include/linux/nonresident.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/include/linux/nonresident.h 2005-08-12 13:55:54.000000000 +0200
@@ -0,0 +1,11 @@
+#ifndef __LINUX_NONRESIDENT_H
+#define __LINUX_NONRESIDENT_H
+
+#define NR_filter 0x01 /* short/long */
+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
+
+#define EVICT_MASK 0x80000000
+#define EVICT_B1 0x00000000
+#define EVICT_B2 0x80000000
+
+#endif /* __LINUX_NONRESIDENT_H */
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/include/linux/swap.h linux-2.6.13-rc6-cart/include/linux/swap.h
--- linux-2.6.13-rc6/include/linux/swap.h 2005-08-08 20:57:50.000000000 +0200
+++ linux-2.6.13-rc6-cart/include/linux/swap.h 2005-08-12 14:00:26.000000000 +0200
@@ -154,6 +154,11 @@
/* linux/mm/memory.c */
extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
+/* linux/mm/nonresident.c */
+extern u32 remember_page(struct address_space *, unsigned long, unsigned int);
+extern unsigned int recently_evicted(struct address_space *, unsigned long);
+extern void init_nonresident(void);
+
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalhigh_pages;
@@ -292,6 +297,11 @@
#define grab_swap_token() do { } while(0)
#define has_swap_token(x) 0
+/* linux/mm/nonresident.c */
+#define init_nonresident() do { } while (0)
+#define remember_page(x,y,z) 0
+#define recently_evicted(x,y) 0
+
#endif /* CONFIG_SWAP */
#endif /* __KERNEL__*/
#endif /* _LINUX_SWAP_H */
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/init/main.c linux-2.6.13-rc6-cart/init/main.c
--- linux-2.6.13-rc6/init/main.c 2005-08-08 20:57:51.000000000 +0200
+++ linux-2.6.13-rc6-cart/init/main.c 2005-08-10 08:33:38.000000000 +0200
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/swap.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -494,6 +495,7 @@
}
#endif
vfs_caches_init_early();
+ init_nonresident();
mem_init();
kmem_cache_init();
setup_per_cpu_pageset();
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/mm/Makefile linux-2.6.13-rc6-cart/mm/Makefile
--- linux-2.6.13-rc6/mm/Makefile 2005-08-08 20:57:52.000000000 +0200
+++ linux-2.6.13-rc6-cart/mm/Makefile 2005-08-10 08:33:39.000000000 +0200
@@ -12,7 +12,8 @@
readahead.o slab.o swap.o truncate.o vmscan.o \
prio_tree.o $(mmu-y)
-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
+ nonresident.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
obj-$(CONFIG_SPARSEMEM) += sparse.o
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/mm/nonresident.c linux-2.6.13-rc6-cart/mm/nonresident.c
--- linux-2.6.13-rc6/mm/nonresident.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/mm/nonresident.c 2005-08-12 14:00:26.000000000 +0200
@@ -0,0 +1,211 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
+ * like algorithms.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling remember_page(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling recently_evicted()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ *
+ *
+ * Modified to work with ARC like algorithms who:
+ * - need to balance two lists; |b1| + |b2| = c,
+ * - keep a flag per non-resident page.
+ *
+ * This is accomplished by extending the buckets to two hands; one
+ * for each list. And modifying the cookie to put two state flags
+ * in its MSBs.
+ *
+ * On insertion time it is specified from which list an entry is to
+ * be reused; then the corresponding hand is rotated until a cookie
+ * of the proper type is encountered (MSB; NR_list).
+ *
+ * Because two hands and clock search are too much for
+ * preempt_disable() the bucket is guarded by a spinlock.
+ */
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+#include <linux/nonresident.h>
+
+#define TARGET_SLOTS 128
+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES);
+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
+#if NR_SLOTS < TARGET_SLOTS / 2
+#warning very small slot size
+#if NR_SLOTS == 0
+#error no room for slots left
+#endif
+#endif
+
+#define FLAGS_BITS 2
+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
+#define FLAGS_MASK (~(((1 << FLAGS_BITS) - 1) << FLAGS_SHIFT))
+
+struct nr_bucket
+{
+ spinlock_t lock;
+ u16 hand[2];
+ u32 slot[NR_SLOTS];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+/* hash the address into a bucket */
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = hash_ptr(mapping, BITS_PER_LONG);
+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+ bucket = hash & nonres_mask;
+
+ return nonres_table + bucket;
+}
+
+/* hash the address, inode and flags into a cookie */
+/* the two msb are flags; where msb-1 is a type flag and msb a period flag */
+static u32 nr_cookie(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ u32 c;
+ unsigned long cookie;
+
+ cookie = hash_ptr(mapping, BITS_PER_LONG);
+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+ if (mapping->host) {
+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+ }
+
+ c = (u32)(cookie >> (BITS_PER_LONG - 32));
+ c = (c & FLAGS_MASK) | flags << FLAGS_SHIFT;
+ return c;
+}
+
+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 wanted;
+ unsigned int r_flags = 0;
+ int i;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
+ wanted = nr_cookie(mapping, index, 0);
+
+ spin_lock(&nr_bucket->lock);
+ for (i = 0; i < NR_SLOTS; ++i) {
+ if (nr_bucket->slot[i] & FLAGS_MASK == wanted) {
+ r_flags = nr_bucket->slot[i] >> FLAGS_SHIFT;
+ r_flags |= EVICT_MASK;
+ nr_bucket->slot[i] = 0;
+ break;
+ }
+ }
+ spin_unlock(&nr_bucket->lock);
+
+ return r_flags;
+}
+
+/* flags:
+ * logical and of the page flags (NR_filter, NR_list) and
+ * an EVICT_ target
+ */
+u32 remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ struct nr_bucket * nr_bucket;
+ u32 cookie;
+ u32 * slot;
+ int i, slots;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
+ cookie = nr_cookie(mapping, index, flags);
+
+ flags &= EVICT_MASK;
+ spin_lock(&nr_bucket->lock);
+again:
+ slots = NR_SLOTS;
+ do {
+ i = ++nr_bucket->hand[!!flags];
+ if (unlikely(i >= NR_SLOTS))
+ i = nr_bucket->hand[!!flags] = 0;
+ slot = &nr_bucket->slot[i];
+ } while (*slot && *slot & EVICT_MASK != flags && --slots);
+ if (unlikely(!slots)) {
+ flags ^= EVICT_MASK;
+ goto again;
+ }
+ xchg(slot, cookie);
+ spin_unlock(&nr_bucket->lock);
+
+ return cookie;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init init_nonresident(void)
+{
+ int target;
+ int i;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++) {
+ spin_lock_init(&nonres_table[i].lock);
+ nonres_table[i].hand[0] = nonres_table[i].hand[1] = 0;
+ for (j = 0; j < NR_SLOTS; ++j)
+ nonres_table[i].slot[j] = 0;
+ }
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 14:37 Zoned CART Peter Zijlstra
@ 2005-08-12 15:42 ` Rahul Iyer
2005-08-12 15:52 ` Peter Zijlstra
2005-08-12 23:08 ` Marcelo Tosatti
2005-08-12 20:21 ` Marcelo Tosatti
2005-08-14 12:58 ` Peter Zijlstra
2 siblings, 2 replies; 22+ messages in thread
From: Rahul Iyer @ 2005-08-12 15:42 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm, Rik van Riel, Marcelo Tosatti
Hi Peter,
I have recently released another patch...
both patches are at http://www.cs.cmu.edu/~412/projects/CART/
Thanks
Rahul
Peter Zijlstra wrote:
>Hi All,
>
>I've been thinking on how to implement a zoned CART; and I think I have
>found a nice concept.
>
>My ideas are based on the initial cart patch by Rahul and the
>non-resident code of Rik.
>
>For a zoned page replacement algorithm we have per zone resident list(s)
>and global non-resident list(s). CART specific we would have a T1_i and
>T2_i, where 0 <= i <= nr_zones, and global B1 and B2 lists.
>
>Because B1 and B2 are variable size and the B1_i target size q_i is zone
>specific we need some tricks. However since |B1| + |B2| = c we could get
>away with a single hash_table of c entries if we can manage to balance
>the entries within.
>
>I propose to do this by using a 2 hand bucket and using the 2 MSB of the
>cookie (per bucket uniqueness; 30 bits of uniqueness should be enough on
>a ~64 count bucket). The cookies MSB is used to distinguish B1/B2 and
>the MSB-1 is used for the filter bit.
>
>Let us denote the buckets with the subscript j: |B1_j| + |B2_j| = c_j.
>Each hand keeps a FIFO for its corresponding type: B1/B2; eg. rotating
>H1_j will select the next oldest B1_j page for removal.
>
>We need to balance the per zone values:
> T1_i, T2_i, |T1_i|, |T2_i|
> p_i, Ns_i, Nl_i
>
> |B1_i|, |B2_i|, q_i
>
>agains the per bucket values:
> B1_j, B2_j.
>
>This can be done with two simple modifications to the algorithm:
> - explicitly keep |B1_i| and |B2_i| - needed for the p,q targets
> - merge the history replacement (lines 6-10) in the replace (lines
> 36-40) code so that: adding the new MRU page and removing the old LRU
> page becomes one action.
>
>This will keep:
>
> |B1_j| |B1| Sum^i(|B1_i|)
>-------- ~ ------ = -------------
> |B2_j| |B2| Sum^i(|B2_i|)
>
>however it will violate strict FIFO order within the buckets; although I
>guess it won't be too bad.
>
>This approach does away with explicitly keeping the FIFO lists for the
>non-resident pages and merges them.
>
>Attached is a modification of rik his non-resident code that implements
>the buckets described herein.
>
>I shall attempt to merge this code into the Rahuls new cart-patch-2 if
>you guys don't see any big problems with the approach, or beat me to it.
>
>Kind regards,
>
>
>
>------------------------------------------------------------------------
>
>diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/include/linux/nonresident.h linux-2.6.13-rc6-cart/include/linux/nonresident.h
>--- linux-2.6.13-rc6/include/linux/nonresident.h 1970-01-01 01:00:00.000000000 +0100
>+++ linux-2.6.13-rc6-cart/include/linux/nonresident.h 2005-08-12 13:55:54.000000000 +0200
>@@ -0,0 +1,11 @@
>+#ifndef __LINUX_NONRESIDENT_H
>+#define __LINUX_NONRESIDENT_H
>+
>+#define NR_filter 0x01 /* short/long */
>+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
>+
>+#define EVICT_MASK 0x80000000
>+#define EVICT_B1 0x00000000
>+#define EVICT_B2 0x80000000
>+
>+#endif /* __LINUX_NONRESIDENT_H */
>diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/include/linux/swap.h linux-2.6.13-rc6-cart/include/linux/swap.h
>--- linux-2.6.13-rc6/include/linux/swap.h 2005-08-08 20:57:50.000000000 +0200
>+++ linux-2.6.13-rc6-cart/include/linux/swap.h 2005-08-12 14:00:26.000000000 +0200
>@@ -154,6 +154,11 @@
> /* linux/mm/memory.c */
> extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
>
>+/* linux/mm/nonresident.c */
>+extern u32 remember_page(struct address_space *, unsigned long, unsigned int);
>+extern unsigned int recently_evicted(struct address_space *, unsigned long);
>+extern void init_nonresident(void);
>+
> /* linux/mm/page_alloc.c */
> extern unsigned long totalram_pages;
> extern unsigned long totalhigh_pages;
>@@ -292,6 +297,11 @@
> #define grab_swap_token() do { } while(0)
> #define has_swap_token(x) 0
>
>+/* linux/mm/nonresident.c */
>+#define init_nonresident() do { } while (0)
>+#define remember_page(x,y,z) 0
>+#define recently_evicted(x,y) 0
>+
> #endif /* CONFIG_SWAP */
> #endif /* __KERNEL__*/
> #endif /* _LINUX_SWAP_H */
>diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/init/main.c linux-2.6.13-rc6-cart/init/main.c
>--- linux-2.6.13-rc6/init/main.c 2005-08-08 20:57:51.000000000 +0200
>+++ linux-2.6.13-rc6-cart/init/main.c 2005-08-10 08:33:38.000000000 +0200
>@@ -47,6 +47,7 @@
> #include <linux/rmap.h>
> #include <linux/mempolicy.h>
> #include <linux/key.h>
>+#include <linux/swap.h>
>
> #include <asm/io.h>
> #include <asm/bugs.h>
>@@ -494,6 +495,7 @@
> }
> #endif
> vfs_caches_init_early();
>+ init_nonresident();
> mem_init();
> kmem_cache_init();
> setup_per_cpu_pageset();
>diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/mm/Makefile linux-2.6.13-rc6-cart/mm/Makefile
>--- linux-2.6.13-rc6/mm/Makefile 2005-08-08 20:57:52.000000000 +0200
>+++ linux-2.6.13-rc6-cart/mm/Makefile 2005-08-10 08:33:39.000000000 +0200
>@@ -12,7 +12,8 @@
> readahead.o slab.o swap.o truncate.o vmscan.o \
> prio_tree.o $(mmu-y)
>
>-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
>+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
>+ nonresident.o
> obj-$(CONFIG_HUGETLBFS) += hugetlb.o
> obj-$(CONFIG_NUMA) += mempolicy.o
> obj-$(CONFIG_SPARSEMEM) += sparse.o
>diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/mm/nonresident.c linux-2.6.13-rc6-cart/mm/nonresident.c
>--- linux-2.6.13-rc6/mm/nonresident.c 1970-01-01 01:00:00.000000000 +0100
>+++ linux-2.6.13-rc6-cart/mm/nonresident.c 2005-08-12 14:00:26.000000000 +0200
>@@ -0,0 +1,211 @@
>+/*
>+ * mm/nonresident.c
>+ * (C) 2004,2005 Red Hat, Inc
>+ * Written by Rik van Riel <riel@redhat.com>
>+ * Released under the GPL, see the file COPYING for details.
>+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
>+ * like algorithms.
>+ *
>+ * Keeps track of whether a non-resident page was recently evicted
>+ * and should be immediately promoted to the active list. This also
>+ * helps automatically tune the inactive target.
>+ *
>+ * The pageout code stores a recently evicted page in this cache
>+ * by calling remember_page(mapping/mm, index/vaddr)
>+ * and can look it up in the cache by calling recently_evicted()
>+ * with the same arguments.
>+ *
>+ * Note that there is no way to invalidate pages after eg. truncate
>+ * or exit, we let the pages fall out of the non-resident set through
>+ * normal replacement.
>+ *
>+ *
>+ * Modified to work with ARC like algorithms who:
>+ * - need to balance two lists; |b1| + |b2| = c,
>+ * - keep a flag per non-resident page.
>+ *
>+ * This is accomplished by extending the buckets to two hands; one
>+ * for each list. And modifying the cookie to put two state flags
>+ * in its MSBs.
>+ *
>+ * On insertion time it is specified from which list an entry is to
>+ * be reused; then the corresponding hand is rotated until a cookie
>+ * of the proper type is encountered (MSB; NR_list).
>+ *
>+ * Because two hands and clock search are too much for
>+ * preempt_disable() the bucket is guarded by a spinlock.
>+ */
>+#include <linux/mm.h>
>+#include <linux/cache.h>
>+#include <linux/spinlock.h>
>+#include <linux/bootmem.h>
>+#include <linux/hash.h>
>+#include <linux/prefetch.h>
>+#include <linux/kernel.h>
>+#include <linux/nonresident.h>
>+
>+#define TARGET_SLOTS 128
>+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES);
>+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
>+#if NR_SLOTS < TARGET_SLOTS / 2
>+#warning very small slot size
>+#if NR_SLOTS == 0
>+#error no room for slots left
>+#endif
>+#endif
>+
>+#define FLAGS_BITS 2
>+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
>+#define FLAGS_MASK (~(((1 << FLAGS_BITS) - 1) << FLAGS_SHIFT))
>+
>+struct nr_bucket
>+{
>+ spinlock_t lock;
>+ u16 hand[2];
>+ u32 slot[NR_SLOTS];
>+} ____cacheline_aligned;
>+
>+/* The non-resident page hash table. */
>+static struct nr_bucket * nonres_table;
>+static unsigned int nonres_shift;
>+static unsigned int nonres_mask;
>+
>+/* hash the address into a bucket */
>+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
>+{
>+ unsigned long bucket;
>+ unsigned long hash;
>+
>+ hash = hash_ptr(mapping, BITS_PER_LONG);
>+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
>+ bucket = hash & nonres_mask;
>+
>+ return nonres_table + bucket;
>+}
>+
>+/* hash the address, inode and flags into a cookie */
>+/* the two msb are flags; where msb-1 is a type flag and msb a period flag */
>+static u32 nr_cookie(struct address_space * mapping, unsigned long index, unsigned int flags)
>+{
>+ u32 c;
>+ unsigned long cookie;
>+
>+ cookie = hash_ptr(mapping, BITS_PER_LONG);
>+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
>+
>+ if (mapping->host) {
>+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
>+ }
>+
>+ c = (u32)(cookie >> (BITS_PER_LONG - 32));
>+ c = (c & FLAGS_MASK) | flags << FLAGS_SHIFT;
>+ return c;
>+}
>+
>+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
>+{
>+ struct nr_bucket * nr_bucket;
>+ u32 wanted;
>+ unsigned int r_flags = 0;
>+ int i;
>+
>+ prefetch(mapping->host);
>+ nr_bucket = nr_hash(mapping, index);
>+
>+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
>+ wanted = nr_cookie(mapping, index, 0);
>+
>+ spin_lock(&nr_bucket->lock);
>+ for (i = 0; i < NR_SLOTS; ++i) {
>+ if (nr_bucket->slot[i] & FLAGS_MASK == wanted) {
>+ r_flags = nr_bucket->slot[i] >> FLAGS_SHIFT;
>+ r_flags |= EVICT_MASK;
>+ nr_bucket->slot[i] = 0;
>+ break;
>+ }
>+ }
>+ spin_unlock(&nr_bucket->lock);
>+
>+ return r_flags;
>+}
>+
>+/* flags:
>+ * logical and of the page flags (NR_filter, NR_list) and
>+ * an EVICT_ target
>+ */
>+u32 remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
>+{
>+ struct nr_bucket * nr_bucket;
>+ u32 cookie;
>+ u32 * slot;
>+ int i, slots;
>+
>+ prefetch(mapping->host);
>+ nr_bucket = nr_hash(mapping, index);
>+
>+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
>+ cookie = nr_cookie(mapping, index, flags);
>+
>+ flags &= EVICT_MASK;
>+ spin_lock(&nr_bucket->lock);
>+again:
>+ slots = NR_SLOTS;
>+ do {
>+ i = ++nr_bucket->hand[!!flags];
>+ if (unlikely(i >= NR_SLOTS))
>+ i = nr_bucket->hand[!!flags] = 0;
>+ slot = &nr_bucket->slot[i];
>+ } while (*slot && *slot & EVICT_MASK != flags && --slots);
>+ if (unlikely(!slots)) {
>+ flags ^= EVICT_MASK;
>+ goto again;
>+ }
>+ xchg(slot, cookie);
>+ spin_unlock(&nr_bucket->lock);
>+
>+ return cookie;
>+}
>+
>+/*
>+ * For interactive workloads, we remember about as many non-resident pages
>+ * as we have actual memory pages. For server workloads with large inter-
>+ * reference distances we could benefit from remembering more.
>+ */
>+static __initdata unsigned long nonresident_factor = 1;
>+void __init init_nonresident(void)
>+{
>+ int target;
>+ int i;
>+
>+ /*
>+ * Calculate the non-resident hash bucket target. Use a power of
>+ * two for the division because alloc_large_system_hash rounds up.
>+ */
>+ target = nr_all_pages * nonresident_factor;
>+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
>+
>+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
>+ sizeof(struct nr_bucket),
>+ target,
>+ 0,
>+ HASH_EARLY | HASH_HIGHMEM,
>+ &nonres_shift,
>+ &nonres_mask,
>+ 0);
>+
>+ for (i = 0; i < (1 << nonres_shift); i++) {
>+ spin_lock_init(&nonres_table[i].lock);
>+ nonres_table[i].hand[0] = nonres_table[i].hand[1] = 0;
>+ for (j = 0; j < NR_SLOTS; ++j)
>+ nonres_table[i].slot[j] = 0;
>+ }
>+}
>+
>+static int __init set_nonresident_factor(char * str)
>+{
>+ if (!str)
>+ return 0;
>+ nonresident_factor = simple_strtoul(str, &str, 0);
>+ return 1;
>+}
>+__setup("nonresident_factor=", set_nonresident_factor);
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 15:42 ` Rahul Iyer
@ 2005-08-12 15:52 ` Peter Zijlstra
2005-08-12 23:08 ` Marcelo Tosatti
1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-12 15:52 UTC (permalink / raw)
To: Rahul Iyer; +Cc: linux-mm, Rik van Riel, Marcelo Tosatti
On Fri, 2005-08-12 at 11:42 -0400, Rahul Iyer wrote:
> Hi Peter,
> I have recently released another patch...
> both patches are at http://www.cs.cmu.edu/~412/projects/CART/
> >I shall attempt to merge this code into the Rahuls new cart-patch-2 if
> >you guys don't see any big problems with the approach, or beat me to it.
> >
Yes I've seen that, that's the one I refered to in the above line.
I still have to read it thorougly though; however it looks as if the
non-resident pages are still per zone. Also you yourself mention OOM
problems with allocating nodes for the lists.
I hope my code solves those problems without affecting the quality of
the algorithm.
Peter Zijlstra
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 14:37 Zoned CART Peter Zijlstra
2005-08-12 15:42 ` Rahul Iyer
@ 2005-08-12 20:21 ` Marcelo Tosatti
2005-08-12 22:28 ` Marcelo Tosatti
2005-08-13 19:03 ` Rahul Iyer
2005-08-14 12:58 ` Peter Zijlstra
2 siblings, 2 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2005-08-12 20:21 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm, Rik van Riel, Rahul Iyer
Hi!
On Fri, Aug 12, 2005 at 04:37:09PM +0200, Peter Zijlstra wrote:
> Hi All,
>
> I've been thinking on how to implement a zoned CART; and I think I have
> found a nice concept.
>
> My ideas are based on the initial cart patch by Rahul and the
> non-resident code of Rik.
>
> For a zoned page replacement algorithm we have per zone resident list(s)
> and global non-resident list(s). CART specific we would have a T1_i and
> T2_i, where 0 <= i <= nr_zones, and global B1 and B2 lists.
>
> Because B1 and B2 are variable size and the B1_i target size q_i is zone
> specific we need some tricks. However since |B1| + |B2| = c we could get
> away with a single hash_table of c entries if we can manage to balance
> the entries within.
>
> I propose to do this by using a 2 hand bucket and using the 2 MSB of the
> cookie (per bucket uniqueness; 30 bits of uniqueness should be enough on
> a ~64 count bucket). The cookies MSB is used to distinguish B1/B2 and
> the MSB-1 is used for the filter bit.
>
> Let us denote the buckets with the subscript j: |B1_j| + |B2_j| = c_j.
> Each hand keeps a FIFO for its corresponding type: B1/B2; eg. rotating
> H1_j will select the next oldest B1_j page for removal.
>
> We need to balance the per zone values:
> T1_i, T2_i, |T1_i|, |T2_i|
> p_i, Ns_i, Nl_i
>
> |B1_i|, |B2_i|, q_i
>
> agains the per bucket values:
> B1_j, B2_j.
>
> This can be done with two simple modifications to the algorithm:
> - explicitly keep |B1_i| and |B2_i| - needed for the p,q targets
> - merge the history replacement (lines 6-10) in the replace (lines
> 36-40) code so that: adding the new MRU page and removing the old LRU
> page becomes one action.
>
> This will keep:
>
> |B1_j| |B1| Sum^i(|B1_i|)
> -------- ~ ------ = -------------
> |B2_j| |B2| Sum^i(|B2_i|)
>
> however it will violate strict FIFO order within the buckets; although I
> guess it won't be too bad.
>
> This approach does away with explicitly keeping the FIFO lists for the
> non-resident pages and merges them.
Looks plausible, while keeping lower overhead of the hash table.
I have no useful technical comments on your idea at the moment, sorry
(hope to have on the future!).
One thing which which we would like to see done is an investigation of the
behaviour of the hashing function under different sets of input.
> Attached is a modification of rik his non-resident code that implements
> the buckets described herein.
>
> I shall attempt to merge this code into the Rahuls new cart-patch-2 if
> you guys don't see any big problems with the approach, or beat me to it.
IMHO the most important thing in trying to adapt ARC's dynamically
adaptable "recency/frequency" mechanism into Linux is to _really_
understand the behaviour of the page reclaiming process under different
workloads and system configurations.
Good question is: For which situations the current strategy is suboptimal
and why.
It certainly suffers from some of the well-studied LRU problems, most
notably the frequency metric is not weighted into the likeness of future
usage. The "likeness" is currently implemented by LRU list order.
There are further complications in an operating system compared to
a plain cache, such as:
- swap allocation
- laundering of dirty pages
- destruction of pagetable mappings for reclaiming purposes
- balancing between pagecache and kernel cache reclamation
- special cases such:
- VM_LOCKED mappings
- locked pages
- pages under writeback - right now these pages are sent up the top
of the LRU stack (either active or inactive) once encountered by
the page reclaiming codepath, meaning that they effectively become
more "recent" then all the pages in their respective stack.
The following ARC experiment uses a radix tree as the backend for
non-resident cache.
In some situations it performs better (increases cache hits, that is)
than the current v2.6 algorithm, sometimes way worse - probably due to
a short period in which active (L2) pages remain in memory when the
inactive target is large, as Rik commented on lkml the other day.
Rahul, do you have any promising performance results of your ARC
implementation?
Note: this one dies horribly with highmem machines, probably due to
atomic allocation of nodes - an improvement would be to
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/include/linux/mm_inline.h linux-2.6.12-arc/include/linux/mm_inline.h
--- linux-2.6.12/include/linux/mm_inline.h 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/include/linux/mm_inline.h 2005-08-09 17:46:22.000000000 -0300
@@ -27,6 +27,12 @@ del_page_from_inactive_list(struct zone
zone->nr_inactive--;
}
+void add_to_inactive_evicted_list(struct zone *zone, struct address_space *mapping, unsigned long index);
+void add_to_active_evicted_list(struct zone *zone, struct address_space *mapping, unsigned long index);
+void add_to_evicted_list(struct zone *zone, unsigned long mapping, unsigned long index, int active);
+
+unsigned long zone_target(struct zone *zone);
+
static inline void
del_page_from_lru(struct zone *zone, struct page *page)
{
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/include/linux/mmzone.h linux-2.6.12-arc/include/linux/mmzone.h
--- linux-2.6.12/include/linux/mmzone.h 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/include/linux/mmzone.h 2005-07-14 07:15:15.000000000 -0300
@@ -137,10 +137,14 @@ struct zone {
spinlock_t lru_lock;
struct list_head active_list;
struct list_head inactive_list;
+ struct list_head evicted_active_list;
+ struct list_head evicted_inactive_list;
unsigned long nr_scan_active;
unsigned long nr_scan_inactive;
unsigned long nr_active;
unsigned long nr_inactive;
+ unsigned long nr_evicted_active;
+ unsigned long nr_evicted_inactive;
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/include/linux/page-flags.h linux-2.6.12-arc/include/linux/page-flags.h
--- linux-2.6.12/include/linux/page-flags.h 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/include/linux/page-flags.h 2005-08-05 00:59:45.000000000 -0300
@@ -76,6 +76,7 @@
#define PG_reclaim 18 /* To be reclaimed asap */
#define PG_nosave_free 19 /* Free, should not be written */
#define PG_uncached 20 /* Page has been mapped as uncached */
+#define PG_referencedtwice 21
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
@@ -132,6 +133,16 @@ struct page_state {
unsigned long pgrotated; /* pages rotated to tail of the LRU */
unsigned long nr_bounce; /* pages for bounce buffers */
+
+ unsigned long active_scan;
+ unsigned long pgscan_active_dma;
+ unsigned long pgscan_active_normal;
+ unsigned long pgscan_active_high;
+
+ unsigned long inactive_scan;
+ unsigned long pgscan_inactive_dma;
+ unsigned long pgscan_inactive_normal;
+ unsigned long pgscan_inactive_high;
};
extern void get_page_state(struct page_state *ret);
@@ -185,6 +196,11 @@ extern void __mod_page_state(unsigned of
#define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags)
#define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)
+#define PageReferencedTwice(page) test_bit(PG_referencedtwice, &(page)->flags)
+#define SetPageReferencedTwice(page) set_bit(PG_referencedtwice, &(page)->flags)
+#define ClearPageReferencedTwice(page) clear_bit(PG_referencedtwice, &(page)->flags)
+#define TestClearPageReferencedTwice(page) test_and_clear_bit(PG_referencedtwice, &(page)->flags)
+
#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags)
#ifndef SetPageUptodate
#define SetPageUptodate(page) set_bit(PG_uptodate, &(page)->flags)
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/lib/radix-tree.c linux-2.6.12-arc/lib/radix-tree.c
--- linux-2.6.12/lib/radix-tree.c 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/lib/radix-tree.c 2005-07-22 06:44:04.000000000 -0300
@@ -413,7 +413,7 @@ out:
}
EXPORT_SYMBOL(radix_tree_tag_clear);
-#ifndef __KERNEL__ /* Only the test harness uses this at present */
+//#ifndef __KERNEL__ /* Only the test harness uses this at present */
/**
* radix_tree_tag_get - get a tag on a radix tree node
* @root: radix tree root
@@ -422,8 +422,8 @@ EXPORT_SYMBOL(radix_tree_tag_clear);
*
* Return the search tag corresponging to @index in the radix tree.
*
- * Returns zero if the tag is unset, or if there is no corresponding item
- * in the tree.
+ * Returns -1 if the tag is unset, or zero if there is no corresponding
+ * item in the tree.
*/
int radix_tree_tag_get(struct radix_tree_root *root,
unsigned long index, int tag)
@@ -457,7 +457,7 @@ int radix_tree_tag_get(struct radix_tree
int ret = tag_get(*slot, tag, offset);
BUG_ON(ret && saw_unset_tag);
- return ret;
+ return ret ? 1 : -1;
}
slot = (struct radix_tree_node **)((*slot)->slots + offset);
shift -= RADIX_TREE_MAP_SHIFT;
@@ -465,7 +465,7 @@ int radix_tree_tag_get(struct radix_tree
}
}
EXPORT_SYMBOL(radix_tree_tag_get);
-#endif
+//#endif
static unsigned int
__lookup(struct radix_tree_root *root, void **results, unsigned long index,
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/evicted.c linux-2.6.12-arc/mm/evicted.c
--- linux-2.6.12/mm/evicted.c 1969-12-31 21:00:00.000000000 -0300
+++ linux-2.6.12-arc/mm/evicted.c 2005-08-10 14:04:25.000000000 -0300
@@ -0,0 +1,301 @@
+#include <linux/mm.h>
+#include <linux/radix-tree.h>
+#include <linux/hash.h>
+#include <linux/module.h>
+#include <linux/mempool.h>
+#include <linux/debugfs.h>
+#include <linux/swap.h>
+#include <linux/kernel.h>
+#include <asm/uaccess.h>
+
+/* overload DIRTY/WRITEBACK radix tags for our own purposes... */
+#define ACTIVE_TAG 0
+#define INACTIVE_TAG 1
+
+struct radix_tree_root evicted_radix = RADIX_TREE_INIT(GFP_ATOMIC);
+
+static mempool_t *evicted_cache;
+
+extern int total_memory;
+
+unsigned long inactive_target;
+
+struct evicted_page {
+ struct list_head evictedlist;
+ unsigned long key;
+};
+
+static void *evicted_pool_alloc(unsigned int __nocast gfp_mask, void *data)
+{
+ void *ptr = kmalloc(sizeof(struct evicted_page), gfp_mask);
+ return ptr;
+}
+
+static void evicted_pool_free(void *element, void *data)
+{
+ kfree(element);
+}
+
+int evicted_misses = 0;
+int active_evicted_hits = 0;
+int inactive_evicted_hits = 0;
+
+unsigned long zone_target(struct zone *zone)
+{
+ return (zone->present_pages * inactive_target) / totalram_pages;
+}
+
+struct dentry *evicted_dentry;
+
+int evicted_open(struct inode *ino, struct file *file)
+{
+ return 0;
+}
+
+ssize_t evicted_write(struct file *file, char __user *buf, size_t size,
+ loff_t *ignored)
+{
+ evicted_misses = 0;
+ active_evicted_hits = 0;
+ inactive_evicted_hits = 0;
+ return 1;
+}
+
+int ran = 0;
+
+ssize_t evicted_read(struct file *file, char __user *buf, size_t size,
+ loff_t *ignored)
+{
+ int len, a_evicted, i_evicted;
+ pg_data_t *pgdat;
+ char src[200];
+
+ len = a_evicted = i_evicted = 0;
+
+ if (ran) {
+ ran = 0;
+ return 0;
+ }
+
+ for_each_pgdat(pgdat) {
+ struct zonelist *zonelist = pgdat->node_zonelists;
+ struct zone **zonep = zonelist->zones;
+ struct zone *zone;
+ for (zone = *zonep++; zone; zone = *zonep++) {
+ a_evicted += zone->nr_evicted_active;
+ i_evicted += zone->nr_evicted_inactive;
+ }
+ }
+
+ sprintf(src, "Misses: %d\n"
+ "Active evicted size: %d\n"
+ "Active evicted hits: %d\n"
+ "Inactive evicted size: %d "
+ "Inactive evicted hits: %d\n"
+ "Global inactive target: %ld\n",
+ evicted_misses, a_evicted, active_evicted_hits,
+ i_evicted, inactive_evicted_hits, inactive_target);
+
+ len = strlen(src);
+
+ if(copy_to_user(buf, src, len))
+ return -EFAULT;
+
+ ran = 1;
+
+ return len;
+}
+
+struct file_operations vmevicted_fops = {
+ .owner = THIS_MODULE,
+ .open = evicted_open,
+ .read = evicted_read,
+ .write = evicted_write,
+};
+
+void __init init_vm_evicted(void)
+{
+ printk(KERN_ERR "init_vm_evicted total_memory:%d\n", total_memory);
+
+ evicted_cache = mempool_create (total_memory, evicted_pool_alloc, evicted_pool_free, NULL);
+
+ if (evicted_cache)
+ printk(KERN_ERR "evicted cache init, using %d Kbytes\n", (total_memory * 2) * sizeof(struct evicted_page));
+ else
+ printk(KERN_ERR "mempool_alloc failure!\n");
+
+ inactive_target = total_memory/2;
+
+ evicted_dentry = debugfs_create_file("vm_evicted", 0644, NULL, NULL, &vmevicted_fops);
+}
+
+struct evicted_page *alloc_evicted_entry(unsigned long index, unsigned long mapping)
+{
+ struct evicted_page *e_page;
+
+ if (!evicted_cache)
+ return NULL;
+
+ e_page = mempool_alloc(evicted_cache, GFP_ATOMIC);
+
+ if (e_page)
+ INIT_LIST_HEAD(&e_page->evictedlist);
+
+ return e_page;
+}
+
+inline unsigned long evict_hash_fn(unsigned long mapping, unsigned long index)
+{
+ unsigned long key;
+
+ /* most significant word of "mapping" is not random */
+ key = hash_long((mapping & 0x0000FFFF) + index, BITS_PER_LONG);
+ key = hash_long(key + index, BITS_PER_LONG);
+
+ return key;
+}
+
+/* remove the LRU page from the "elist" evicted page list */
+void remove_lru_page(struct radix_tree_root *radix, struct list_head *elist)
+{
+ struct evicted_page *e_page;
+ struct list_head *last = elist->prev;
+
+ e_page = list_entry(last, struct evicted_page, evictedlist);
+
+ radix_tree_delete(radix, e_page->key);
+
+ list_del(&e_page->evictedlist);
+ mempool_free(e_page, evicted_cache);
+}
+
+void add_to_inactive_evicted_list(struct zone *zone, unsigned long mapping,
+ unsigned long index)
+{
+ struct list_head *list = &zone->evicted_inactive_list;
+ struct evicted_page *e_page;
+ unsigned long key;
+ int target, above_target, error;
+
+ /* Total amount of history recorded is twice the number of pages cached */
+ target = (zone->present_pages*2) - zone->nr_inactive;
+
+ above_target = zone->nr_evicted_inactive - target;
+
+ while (above_target > 0 && zone->nr_evicted_inactive) {
+ remove_lru_page(&evicted_radix, list);
+ zone->nr_evicted_inactive--;
+ above_target--;
+ }
+
+ e_page = alloc_evicted_entry((unsigned long)mapping, index);
+
+ if (unlikely(!e_page))
+ return;
+
+ list_add(&e_page->evictedlist, list);
+
+ e_page->key = evict_hash_fn((unsigned long) mapping, index);
+
+ error = radix_tree_preload(GFP_ATOMIC|__GFP_NOWARN);
+ if (error == 0) {
+ radix_tree_insert(&evicted_radix, e_page->key, e_page);
+ radix_tree_tag_set(&evicted_radix, e_page->key, INACTIVE_TAG);
+ zone->nr_evicted_inactive++;
+ }
+ radix_tree_preload_end();
+}
+
+void add_to_active_evicted_list(struct zone *zone, unsigned long mapping, unsigned long index)
+{
+ struct list_head *list = &zone->evicted_active_list;
+ struct evicted_page *e_page;
+ unsigned long key;
+ int target, above_target, error;
+
+ /* Total amount of history recorded is twice the number of pages cached */
+ target = (zone->present_pages*2) - zone->nr_active;
+
+ above_target = zone->nr_evicted_active - target;
+
+ while (above_target > 0 && zone->nr_evicted_active) {
+ remove_lru_page(&evicted_radix, list);
+ zone->nr_evicted_active--;
+ above_target--;
+ }
+
+ e_page = alloc_evicted_entry((unsigned long)mapping, index);
+
+ if (!e_page)
+ return;
+
+ list_add(&e_page->evictedlist, list);
+
+ e_page->key = evict_hash_fn((unsigned long) mapping, index);
+
+ error = radix_tree_preload(GFP_ATOMIC|__GFP_NOWARN);
+ if (error == 0) {
+ radix_tree_insert(&evicted_radix, e_page->key, e_page);
+ radix_tree_tag_set(&evicted_radix, e_page->key, ACTIVE_TAG);
+ zone->nr_evicted_active++;
+ }
+ radix_tree_preload_end();
+}
+
+#define ACTIVE_HIT 1
+#define INACTIVE_HIT 2
+
+int evicted_lookup(struct address_space *mapping, unsigned long index)
+{
+ int e_page;
+ unsigned long key = evict_hash_fn((unsigned long) mapping, index);
+
+ e_page = radix_tree_tag_get(&evicted_radix, key, INACTIVE_TAG);
+
+ if (e_page == 1)
+ return INACTIVE_HIT;
+ else if (e_page == -1)
+ return ACTIVE_HIT;
+
+ return 0;
+}
+
+void add_to_evicted_list(struct zone *zone, unsigned long mapping,
+ unsigned long index, int active)
+{
+ if (active)
+ add_to_active_evicted_list(zone, mapping, index);
+ else
+ add_to_inactive_evicted_list(zone, mapping, index);
+}
+
+/* takes care of updating the inactive target */
+void evicted_account(struct address_space *mapping, unsigned long index)
+{
+ unsigned long diff;
+ evicted_misses++;
+
+ switch (evicted_lookup(mapping, index)) {
+ case ACTIVE_HIT:
+/* if (inactive_target > (totalram_pages/2)) {
+ diff = (totalram_pages/2) - inactive_target;
+ inactive_target -= min(diff/128, 32);
+ } else */
+ inactive_target -= 8;
+
+ if ((signed long)inactive_target < 0)
+ inactive_target = 0;
+
+ active_evicted_hits++;
+ break;
+ case INACTIVE_HIT:
+/* if (inactive_target < (totalram_pages/2)) {
+ diff = (totalram_pages/2) - inactive_target;
+ inactive_target += min(diff/128, 32);
+ } else */
+ inactive_target += 8;
+
+ inactive_evicted_hits++;
+ break;
+ }
+}
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/filemap.c linux-2.6.12-arc/mm/filemap.c
--- linux-2.6.12/mm/filemap.c 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/mm/filemap.c 2005-08-05 02:19:14.000000000 -0300
@@ -396,12 +396,18 @@ int add_to_page_cache(struct page *page,
EXPORT_SYMBOL(add_to_page_cache);
+extern int evicted_lookup(struct address_space *mapping, unsigned long index);
+
int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
pgoff_t offset, int gfp_mask)
{
int ret = add_to_page_cache(page, mapping, offset, gfp_mask);
- if (ret == 0)
- lru_cache_add(page);
+ if (ret == 0) {
+ if (evicted_lookup(mapping, offset))
+ lru_cache_add_active(page);
+ else
+ lru_cache_add(page);
+ }
return ret;
}
@@ -493,6 +499,8 @@ void fastcall __lock_page(struct page *p
}
EXPORT_SYMBOL(__lock_page);
+extern void evicted_account(struct address_space *, unsigned long);
+
/*
* a rather lightweight function, finding and getting a reference to a
* hashed page atomically.
@@ -505,6 +513,8 @@ struct page * find_get_page(struct addre
page = radix_tree_lookup(&mapping->page_tree, offset);
if (page)
page_cache_get(page);
+ else
+ evicted_account(mapping, offset);
read_unlock_irq(&mapping->tree_lock);
return page;
}
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/Makefile linux-2.6.12-arc/mm/Makefile
--- linux-2.6.12/mm/Makefile 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/mm/Makefile 2005-07-14 07:15:15.000000000 -0300
@@ -10,7 +10,7 @@ mmu-$(CONFIG_MMU) := fremap.o highmem.o
obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
page_alloc.o page-writeback.o pdflush.o \
readahead.o slab.o swap.o truncate.o vmscan.o \
- prio_tree.o $(mmu-y)
+ prio_tree.o evicted.o $(mmu-y)
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/memory.c linux-2.6.12-arc/mm/memory.c
--- linux-2.6.12/mm/memory.c 2005-08-09 15:05:29.000000000 -0300
+++ linux-2.6.12-arc/mm/memory.c 2005-08-09 17:08:38.000000000 -0300
@@ -1314,7 +1314,7 @@ static int do_wp_page(struct mm_struct *
page_remove_rmap(old_page);
flush_cache_page(vma, address, pfn);
break_cow(vma, new_page, address, page_table);
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
page_add_anon_rmap(new_page, vma, address);
/* Free the old page.. */
@@ -1791,8 +1791,7 @@ do_anonymous_page(struct mm_struct *mm,
entry = maybe_mkwrite(pte_mkdirty(mk_pte(page,
vma->vm_page_prot)),
vma);
- lru_cache_add_active(page);
- SetPageReferenced(page);
+ lru_cache_add(page);
page_add_anon_rmap(page, vma, addr);
}
@@ -1912,7 +1911,7 @@ retry:
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
set_pte_at(mm, address, page_table, entry);
if (anon) {
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
page_add_anon_rmap(new_page, vma, address);
} else
page_add_file_rmap(new_page);
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/mempool.c linux-2.6.12-arc/mm/mempool.c
--- linux-2.6.12/mm/mempool.c 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/mm/mempool.c 2005-08-10 12:13:40.000000000 -0300
@@ -60,10 +60,16 @@ mempool_t * mempool_create(int min_nr, m
if (!pool)
return NULL;
memset(pool, 0, sizeof(*pool));
+
pool->elements = kmalloc(min_nr * sizeof(void *), GFP_KERNEL);
if (!pool->elements) {
- kfree(pool);
- return NULL;
+ printk(KERN_ERR "kmalloc of %d failed, trying vmalloc!\n",
+ min_nr * sizeof(void *));
+ pool->elements = vmalloc(min_nr * sizeof(void *));
+ if (!pool->elements) {
+ kfree(pool);
+ return NULL;
+ }
}
spin_lock_init(&pool->lock);
pool->min_nr = min_nr;
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/page_alloc.c linux-2.6.12-arc/mm/page_alloc.c
--- linux-2.6.12/mm/page_alloc.c 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/mm/page_alloc.c 2005-08-09 17:11:21.000000000 -0300
@@ -378,8 +378,10 @@ void __free_pages_ok(struct page *page,
__put_page(page + i);
#endif
- for (i = 0 ; i < (1 << order) ; ++i)
+ for (i = 0 ; i < (1 << order) ; ++i) {
+ ClearPageActive((struct page *)(page + i));
free_pages_check(__FUNCTION__, page + i);
+ }
list_add(&page->lru, &list);
kernel_map_pages(page, 1<<order, 0);
free_pages_bulk(page_zone(page), 1, &list, order);
@@ -614,6 +616,8 @@ static void fastcall free_hot_cold_page(
inc_page_state(pgfree);
if (PageAnon(page))
page->mapping = NULL;
+ if (PageActive(page))
+ ClearPageActive(page);
free_pages_check(__FUNCTION__, page);
pcp = &zone->pageset[get_cpu()].pcp[cold];
local_irq_save(flags);
@@ -1708,11 +1712,16 @@ static void __init free_area_init_core(s
printk(KERN_DEBUG " %s zone: %lu pages, LIFO batch:%lu\n",
zone_names[j], realsize, batch);
INIT_LIST_HEAD(&zone->active_list);
+ INIT_LIST_HEAD(&zone->evicted_active_list);
INIT_LIST_HEAD(&zone->inactive_list);
+ INIT_LIST_HEAD(&zone->evicted_inactive_list);
zone->nr_scan_active = 0;
zone->nr_scan_inactive = 0;
zone->nr_active = 0;
zone->nr_inactive = 0;
+ zone->nr_evicted_inactive = 0;
+ zone->nr_evicted_active = 0;
+
if (!size)
continue;
@@ -1896,9 +1905,17 @@ static char *vmstat_text[] = {
"kswapd_inodesteal",
"pageoutrun",
"allocstall",
-
"pgrotated",
"nr_bounce",
+
+ "active_scan",
+ "pgscan_active_dma",
+ "pgscan_active_normal",
+ "pgscan_active_high",
+ "inactive_scan",
+ "pgscan_inactive_dma",
+ "pgscan_inactive_normal",
+ "pgscan_inactive_high",
};
static void *vmstat_start(struct seq_file *m, loff_t *pos)
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/swap.c linux-2.6.12-arc/mm/swap.c
--- linux-2.6.12/mm/swap.c 2005-06-17 16:48:29.000000000 -0300
+++ linux-2.6.12-arc/mm/swap.c 2005-08-05 01:03:05.000000000 -0300
@@ -122,8 +122,11 @@ void fastcall activate_page(struct page
*/
void fastcall mark_page_accessed(struct page *page)
{
- if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
+ if (!PageActive(page) && PageReferencedTwice(page) && PageLRU(page)) {
activate_page(page);
+ ClearPageReferencedTwice(page);
+ } else if (!PageReferencedTwice(page) && PageReferenced(page)) {
+ SetPageReferencedTwice(page);
ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
diff -Nur --exclude-from=/home/marcelo/git/exclude --show-c-function linux-2.6.12/mm/vmscan.c linux-2.6.12-arc/mm/vmscan.c
--- linux-2.6.12/mm/vmscan.c 2005-08-10 14:37:14.000000000 -0300
+++ linux-2.6.12-arc/mm/vmscan.c 2005-08-10 14:55:09.000000000 -0300
@@ -79,6 +79,8 @@ struct scan_control {
* In this context, it doesn't matter that we scan the
* whole list at once. */
int swap_cluster_max;
+
+ int nr_to_isolate;
};
/*
@@ -126,7 +128,7 @@ struct shrinker {
* From 0 .. 100. Higher means more swappy.
*/
int vm_swappiness = 60;
-static long total_memory;
+long total_memory;
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
@@ -225,27 +227,6 @@ static int shrink_slab(unsigned long sca
return 0;
}
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
- struct address_space *mapping;
-
- /* Page is in somebody's page tables. */
- if (page_mapped(page))
- return 1;
-
- /* Be more reluctant to reclaim swapcache than pagecache */
- if (PageSwapCache(page))
- return 1;
-
- mapping = page_mapping(page);
- if (!mapping)
- return 0;
-
- /* File is mmap'd by somebody? */
- return mapping_mapped(mapping);
-}
-
static inline int is_page_cache_freeable(struct page *page)
{
return page_count(page) - !!PagePrivate(page) == 2;
@@ -360,15 +341,54 @@ static pageout_t pageout(struct page *pa
return PAGE_CLEAN;
}
+int should_reclaim_mapped(struct zone *zone, struct scan_control *sc)
+{
+ long mapped_ratio;
+ long distress;
+ long swap_tendency;
+ /*
+ * `distress' is a measure of how much trouble we're having reclaiming
+ * pages. 0 -> no problems. 100 -> great trouble.
+ */
+ distress = 100 >> zone->prev_priority;
+
+ /*
+ * The point of this algorithm is to decide when to start reclaiming
+ * mapped memory instead of just pagecache. Work out how much memory
+ * is mapped.
+ */
+ mapped_ratio = (sc->nr_mapped * 100) / total_memory;
+
+ /*
+ * Now decide how much we really want to unmap some pages. The mapped
+ * ratio is downgraded - just because there's a lot of mapped memory
+ * doesn't necessarily mean that page reclaim isn't succeeding.
+ *
+ * The distress ratio is important - we don't want to start going oom.
+ *
+ * A 100% value of vm_swappiness overrides this algorithm altogether.
+ */
+ swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
+
+ /*
+ * Now use this metric to decide whether to reclaim mapped pages
+ */
+ if (swap_tendency >= 100)
+ return 1;
+
+ return 0;
+}
+
/*
* shrink_list adds the number of reclaimed pages to sc->nr_reclaimed
*/
-static int shrink_list(struct list_head *page_list, struct scan_control *sc)
+static int shrink_list(struct list_head *page_list, struct scan_control *sc, struct zone *zone)
{
LIST_HEAD(ret_pages);
struct pagevec freed_pvec;
int pgactivate = 0;
int reclaimed = 0;
+ unsigned long savedmapping, savedindex, active;
cond_resched();
@@ -387,8 +407,6 @@ static int shrink_list(struct list_head
if (TestSetPageLocked(page))
goto keep;
- BUG_ON(PageActive(page));
-
sc->nr_scanned++;
/* Double the slab pressure for mapped and swapcache pages */
if (page_mapped(page) || PageSwapCache(page))
@@ -398,9 +416,18 @@ static int shrink_list(struct list_head
goto keep_locked;
referenced = page_referenced(page, 1, sc->priority <= 0);
- /* In active use or really unfreeable? Activate it. */
- if (referenced && page_mapping_inuse(page))
- goto activate_locked;
+ /* In active use? */
+ if (referenced) {
+ if (PageReferencedTwice(page)) {
+ ClearPageReferencedTwice(page);
+ goto activate_locked;
+ } else
+ SetPageReferencedTwice(page);
+ goto keep_locked;
+ }
+
+ if (page_mapped(page) && !should_reclaim_mapped(zone, sc))
+ goto keep_locked;
#ifdef CONFIG_SWAP
/*
@@ -509,18 +536,28 @@ static int shrink_list(struct list_head
#ifdef CONFIG_SWAP
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page->private };
+ savedmapping = (unsigned long)page->mapping;
+ savedindex = page->private;
+ active = PageActive(page);
__delete_from_swap_cache(page);
write_unlock_irq(&mapping->tree_lock);
swap_free(swap);
__put_page(page); /* The pagecache ref */
+ local_irq_disable();
+ add_to_evicted_list(zone, savedmapping, savedindex, active);
+ local_irq_enable();
goto free_it;
}
#endif /* CONFIG_SWAP */
-
+ savedmapping = (unsigned long)page->mapping;
+ savedindex = page->index;
+ active = PageActive(page);
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
__put_page(page);
-
+ local_irq_disable();
+ add_to_evicted_list(zone, savedmapping, savedindex, active);
+ local_irq_enable();
free_it:
unlock_page(page);
reclaimed++;
@@ -597,7 +634,7 @@ static int isolate_lru_pages(int nr_to_s
/*
* shrink_cache() adds the number of pages reclaimed to sc->nr_reclaimed
*/
-static void shrink_cache(struct zone *zone, struct scan_control *sc)
+static void shrink_cache(struct zone *zone, struct scan_control *sc, struct list_head *from, unsigned long *page_counter)
{
LIST_HEAD(page_list);
struct pagevec pvec;
@@ -613,10 +650,10 @@ static void shrink_cache(struct zone *zo
int nr_scan;
int nr_freed;
- nr_taken = isolate_lru_pages(sc->swap_cluster_max,
- &zone->inactive_list,
+ nr_taken = isolate_lru_pages(sc->nr_to_isolate,
+ from,
&page_list, &nr_scan);
- zone->nr_inactive -= nr_taken;
+ *page_counter -= nr_taken;
zone->pages_scanned += nr_scan;
spin_unlock_irq(&zone->lru_lock);
@@ -628,7 +665,7 @@ static void shrink_cache(struct zone *zo
mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
else
mod_page_state_zone(zone, pgscan_direct, nr_scan);
- nr_freed = shrink_list(&page_list, sc);
+ nr_freed = shrink_list(&page_list, sc, zone);
if (current_is_kswapd())
mod_page_state(kswapd_steal, nr_freed);
mod_page_state_zone(zone, pgsteal, nr_freed);
@@ -676,6 +713,7 @@ done:
* The downside is that we have to touch page->_count against each page.
* But we had to alter page->flags anyway.
*/
+#if 0
static void
refill_inactive_zone(struct zone *zone, struct scan_control *sc)
{
@@ -802,6 +840,10 @@ refill_inactive_zone(struct zone *zone,
mod_page_state_zone(zone, pgrefill, pgscanned);
mod_page_state(pgdeactivate, pgdeactivate);
}
+#endif
+
+#define RECLAIM_BALANCE 16
+DEFINE_PER_CPU(int, act_inact_scan) = RECLAIM_BALANCE;
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
@@ -809,44 +851,65 @@ refill_inactive_zone(struct zone *zone,
static void
shrink_zone(struct zone *zone, struct scan_control *sc)
{
- unsigned long nr_active;
- unsigned long nr_inactive;
+ unsigned long reclaim_saved, reclaimed;
+ int inactive_scan = 0;
+ int scan_protected = 0;
+ int *local_act_inact_scan = &__get_cpu_var(act_inact_scan);
+
+ sc->nr_to_reclaim = (zone->present_pages * sc->swap_cluster_max) /
+ total_memory;
+
+ reclaim_saved = sc->nr_reclaimed;
+
+ sc->nr_to_isolate = sc->nr_to_reclaim;
+
+ if (zone->nr_inactive >= zone_target(zone)) {
+ sc->nr_to_scan = (zone->nr_inactive >> sc->priority) + 1;
+ inc_page_state(inactive_scan);
+ mod_page_state_zone(zone, pgscan_inactive, sc->nr_to_scan);
+ shrink_cache(zone, sc, &zone->inactive_list, &zone->nr_inactive);
+ inactive_scan = 1;
+ *local_act_inact_scan--;
+ } else {
+ sc->nr_to_scan = (zone->nr_active >> sc->priority) + 1;
+ inc_page_state(active_scan);
+ mod_page_state_zone(zone, pgscan_active, sc->nr_to_scan);
+ shrink_cache(zone, sc, &zone->active_list, &zone->nr_active);
+ *local_act_inact_scan++;
+ }
+
+ /*
+ * Scan the "protected" list once in a while if the target
+ * list remains the same for a long period.
+ */
+ if (*local_act_inact_scan >= RECLAIM_BALANCE*2) {
+ scan_protected = 1;
+ inactive_scan = 1;
+ *local_act_inact_scan = RECLAIM_BALANCE;
+ } else if (*local_act_inact_scan <= 0) {
+ scan_protected = 1;
+ inactive_scan = 0;
+ *local_act_inact_scan = RECLAIM_BALANCE;
+ }
- /*
- * Add one to `nr_to_scan' just to make sure that the kernel will
- * slowly sift through the active list.
+ reclaimed = sc->nr_reclaimed - reclaim_saved;
+
+ /*
+ * if no pages have been reclaimed and we're in trouble, ignore
+ * the inactive target.
*/
- zone->nr_scan_active += (zone->nr_active >> sc->priority) + 1;
- nr_active = zone->nr_scan_active;
- if (nr_active >= sc->swap_cluster_max)
- zone->nr_scan_active = 0;
- else
- nr_active = 0;
-
- zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
- nr_inactive = zone->nr_scan_inactive;
- if (nr_inactive >= sc->swap_cluster_max)
- zone->nr_scan_inactive = 0;
- else
- nr_inactive = 0;
-
- sc->nr_to_reclaim = sc->swap_cluster_max;
-
- while (nr_active || nr_inactive) {
- if (nr_active) {
- sc->nr_to_scan = min(nr_active,
- (unsigned long)sc->swap_cluster_max);
- nr_active -= sc->nr_to_scan;
- refill_inactive_zone(zone, sc);
- }
-
- if (nr_inactive) {
- sc->nr_to_scan = min(nr_inactive,
- (unsigned long)sc->swap_cluster_max);
- nr_inactive -= sc->nr_to_scan;
- shrink_cache(zone, sc);
- if (sc->nr_to_reclaim <= 0)
- break;
+ if (!reclaimed && sc->priority < 2)
+ scan_protected = 1;
+
+ if (scan_protected) {
+ sc->nr_to_reclaim = (zone->present_pages * sc->swap_cluster_max) /
+ total_memory;
+ if (inactive_scan) {
+ sc->nr_to_scan = (zone->nr_active >> sc->priority) + 1;
+ shrink_cache(zone, sc, &zone->active_list, &zone->nr_active);
+ } else {
+ sc->nr_to_scan = (zone->nr_inactive >> sc->priority) + 1;
+ shrink_cache(zone, sc, &zone->inactive_list, &zone->nr_inactive);
}
}
@@ -968,6 +1031,7 @@ int try_to_free_pages(struct zone **zone
if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
blk_congestion_wait(WRITE, HZ/10);
}
+ ret = !!total_reclaimed;
out:
for (i = 0; zones[i] != 0; i++) {
struct zone *zone = zones[i];
@@ -1296,6 +1360,8 @@ static int __devinit cpu_callback(struct
}
#endif /* CONFIG_HOTPLUG_CPU */
+extern void init_vm_evicted(void);
+
static int __init kswapd_init(void)
{
pg_data_t *pgdat;
@@ -1305,6 +1371,7 @@ static int __init kswapd_init(void)
= find_task_by_pid(kernel_thread(kswapd, pgdat, CLONE_KERNEL));
total_memory = nr_free_pagecache_pages();
hotcpu_notifier(cpu_callback, 0);
+ init_vm_evicted();
return 0;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 20:21 ` Marcelo Tosatti
@ 2005-08-12 22:28 ` Marcelo Tosatti
2005-08-13 19:03 ` Rahul Iyer
1 sibling, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2005-08-12 22:28 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm, Rik van Riel, Rahul Iyer
> Note: this one dies horribly with highmem machines, probably due to
> atomic allocation of nodes - an improvement would be to
to preallocate memory required for the nodes and have a simple
allocator manage that space, or better use the hashtable.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 15:42 ` Rahul Iyer
2005-08-12 15:52 ` Peter Zijlstra
@ 2005-08-12 23:08 ` Marcelo Tosatti
2005-08-13 19:00 ` Rahul Iyer
2005-08-14 18:31 ` Peter Zijlstra
1 sibling, 2 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2005-08-12 23:08 UTC (permalink / raw)
To: Rahul Iyer; +Cc: Peter Zijlstra, linux-mm, Rik van Riel
On Fri, Aug 12, 2005 at 11:42:17AM -0400, Rahul Iyer wrote:
> Hi Peter,
> I have recently released another patch...
> both patches are at http://www.cs.cmu.edu/~412/projects/CART/
> Thanks
> Rahul
Hi Rahul,
Have some comments on the CART v2 patch
I find it a very interesting idea to split the active list in two!
+#define EVICTED_ACTIVE 1
+#define EVICTED_LONGTERM 2
+#define ACTIVE 3
+#define ACTIVE_LONGTERM 4
You have different definitions using the same bit positions.
Those values should be 1, 2, 4 and 8.
+#define EvictedActive(location) location & EVICTED_ACTIVE
+#define EvictedLongterm(location) location & EVICTED_LONGTERM
+#define Active(location) location & ACTIVE
+#define ActiveLongterm(location) location & ACTIVE_LONGTERM
(location & xxxx) looks nicer.
+struct non_res_list_node {
+ struct list_head list;
+ struct list_head hash;
+ unsigned long mapping;
+ unsigned long offset;
+ unsigned long inode;
+};
+ node->offset = page->index;
+ node->mapping = (unsigned long) page->mapping;
+ node->inode = get_inode_num(page->mapping);
You can compress these tree fields into a single one with a hash function.
+/* The replace function. This function serches the active and longterm
+lists and looks for a candidate for replacement. This function selects
+the candidate and returns the corresponding structpage or returns
+NULL in case no page can be freed. The *where argument is used to
+indicate the parent list of the page so that, in case it cannot be
+written back, it can be placed back on the correct list */
+struct page *replace(struct zone *zone, int *where)
+ list = list->next;
+ while (list !=&zone->active_longterm) {
+ page = list_entry(list, struct page, lru);
+
+ if (!PageReferenced(page))
+ break;
+
+ ClearPageReferenced(page);
+ del_page_from_active_longterm(zone, page);
+ add_page_to_active_list_tail(zone, page);
This sounds odd. If a page is referenced you remove it from the longterm list
"unpromoting" it to the active list? Shouldnt be the other way around?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 23:08 ` Marcelo Tosatti
@ 2005-08-13 19:00 ` Rahul Iyer
2005-08-13 19:08 ` Marcelo Tosatti
2005-08-14 18:31 ` Peter Zijlstra
1 sibling, 1 reply; 22+ messages in thread
From: Rahul Iyer @ 2005-08-13 19:00 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Peter Zijlstra, linux-mm, Rik van Riel
Hi Marcelo,
>I find it a very interesting idea to split the active list in two!
>
>+#define EVICTED_ACTIVE 1
>+#define EVICTED_LONGTERM 2
>+#define ACTIVE 3
>+#define ACTIVE_LONGTERM 4
>
>You have different definitions using the same bit positions.
>Those values should be 1, 2, 4 and 8.
>
>
Agreed! Dumb mistake.
>+#define EvictedActive(location) location & EVICTED_ACTIVE
>+#define EvictedLongterm(location) location & EVICTED_LONGTERM
>+#define Active(location) location & ACTIVE
>+#define ActiveLongterm(location) location & ACTIVE_LONGTERM
>
>(location & xxxx) looks nicer.
>
>
Will do this too...
>+struct non_res_list_node {
>+ struct list_head list;
>+ struct list_head hash;
>+ unsigned long mapping;
>+ unsigned long offset;
>+ unsigned long inode;
>+};
>
>+ node->offset = page->index;
>+ node->mapping = (unsigned long) page->mapping;
>+ node->inode = get_inode_num(page->mapping);
>
>You can compress these tree fields into a single one with a hash function.
>
>
Yes, but then you would not be able to handle hash collisions. Are we
prepared to give up this property?
>+/* The replace function. This function serches the active and longterm
>+lists and looks for a candidate for replacement. This function selects
>+the candidate and returns the corresponding structpage or returns
>+NULL in case no page can be freed. The *where argument is used to
>+indicate the parent list of the page so that, in case it cannot be
>+written back, it can be placed back on the correct list */
>+struct page *replace(struct zone *zone, int *where)
>
>+ list = list->next;
>+ while (list !=&zone->active_longterm) {
>+ page = list_entry(list, struct page, lru);
>+
>+ if (!PageReferenced(page))
>+ break;
>+
>+ ClearPageReferenced(page);
>+ del_page_from_active_longterm(zone, page);
>+ add_page_to_active_list_tail(zone, page);
>
>This sounds odd. If a page is referenced you remove it from the longterm list
>"unpromoting" it to the active list? Shouldnt be the other way around?
>
>
>
I'll re-check this in the CART paper.
Currently I'm out of town, so i'll get this patch in with the
corrections as soon as i get back.
Thanks
Rahul
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 20:21 ` Marcelo Tosatti
2005-08-12 22:28 ` Marcelo Tosatti
@ 2005-08-13 19:03 ` Rahul Iyer
1 sibling, 0 replies; 22+ messages in thread
From: Rahul Iyer @ 2005-08-13 19:03 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Peter Zijlstra, linux-mm, Rik van Riel
>
>Rahul, do you have any promising performance results of your ARC
>implementation?
>
>
>
>
Actually, i haven't run performance tests on it because of the OOM
issue. However, boot times seem to as good as that of the "stock"
kernel's. I agree this is *not* a performance test! :)
-rahul
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-13 19:00 ` Rahul Iyer
@ 2005-08-13 19:08 ` Marcelo Tosatti
2005-08-13 21:30 ` Rik van Riel
0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2005-08-13 19:08 UTC (permalink / raw)
To: Rahul Iyer; +Cc: Peter Zijlstra, linux-mm, Rik van Riel
> >+ node->offset = page->index;
> >+ node->mapping = (unsigned long) page->mapping;
> >+ node->inode = get_inode_num(page->mapping);
> >
> >You can compress these tree fields into a single one with a hash function.
> >
> >
> Yes, but then you would not be able to handle hash collisions. Are we
> prepared to give up this property?
I suppose collisions should be quite rare.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-13 19:08 ` Marcelo Tosatti
@ 2005-08-13 21:30 ` Rik van Riel
0 siblings, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2005-08-13 21:30 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Rahul Iyer, Peter Zijlstra, linux-mm
On Sat, 13 Aug 2005, Marcelo Tosatti wrote:
> > Yes, but then you would not be able to handle hash collisions. Are we
> > prepared to give up this property?
>
> I suppose collisions should be quite rare.
Rare enough to not be a performance issue - and remember
that page replacement algorithms are just about performance.
--
All Rights Reversed
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 14:37 Zoned CART Peter Zijlstra
2005-08-12 15:42 ` Rahul Iyer
2005-08-12 20:21 ` Marcelo Tosatti
@ 2005-08-14 12:58 ` Peter Zijlstra
2005-08-15 21:31 ` Peter Zijlstra
2005-08-26 21:03 ` Peter Zijlstra
2 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-14 12:58 UTC (permalink / raw)
To: linux-mm; +Cc: Rik van Riel, Marcelo Tosatti, Rahul Iyer
[-- Attachment #1: Type: text/plain, Size: 1660 bytes --]
On Fri, 2005-08-12 at 16:37 +0200, Peter Zijlstra wrote:
> This will keep:
>
> |B1_j| |B1| Sum^i(|B1_i|)
> -------- ~ ------ = -------------
> |B2_j| |B2| Sum^i(|B2_i|)
>
> however it will violate strict FIFO order within the buckets; although I
> guess it won't be too bad.
>
Still it bothered me. So I propose the attached code to fix it. What
I've done is to keep 2 proper clocks in the bucket. Each clock is a
single linked cyclic list where links are slot positions bitfield
encoded in the cookie; only 24 bits of uniqueness left.
The code is a bit messy and totaly untested (it should explode since I
wrote it around 2 in the morning) but I think the concept is sound. The
'insert before' and 'remove current' operations are implemented by
swapping slots.
remove current (b from 'abc'):
initial swap(2,3)
1: -> [2],a 1: -> [2],a
* 2: -> [3],b 2: -> [1],c
3: -> [1],c * 3: -> [3],b
3 is now free for use.
insert before (d before b in 'abc')
initial set 4 swap(2,4)
1: -> [2],a 1: -> [2],a 1: -> [2],a
* 2: -> [3],b 2: -> [3],b 2: -> [4],d
3: -> [1],c 3: -> [1],c 3: -> [1],c
4: nil 4: -> [4],d * 4: -> [3],b
leaving us with 'adbc'.
The only thing is that for this to work we have to start the algorith
with filled nonresident clocks. Currently it assumes:
q_i = c_i/2,
|B1_i| = q_i,
|B2_i| = c_i - q_i.
If q=0 etc., would be preferred changing the initial clocks to
|B1_j| = 0 and |B2_j| = c_j
should not pose any problems either.
Ok, now on to putting Rahul code on top of this ;-)
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
[-- Attachment #2: nonresident-pages-2.patch --]
[-- Type: text/x-patch, Size: 10588 bytes --]
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/include/linux/nonresident.h linux-2.6.13-rc6-cart/include/linux/nonresident.h
--- linux-2.6.13-rc6/include/linux/nonresident.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/include/linux/nonresident.h 2005-08-12 13:55:54.000000000 +0200
@@ -0,0 +1,11 @@
+#ifndef __LINUX_NONRESIDENT_H
+#define __LINUX_NONRESIDENT_H
+
+#define NR_filter 0x01 /* short/long */
+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
+
+#define EVICT_MASK 0x80000000
+#define EVICT_B1 0x00000000
+#define EVICT_B2 0x80000000
+
+#endif /* __LINUX_NONRESIDENT_H */
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/include/linux/swap.h linux-2.6.13-rc6-cart/include/linux/swap.h
--- linux-2.6.13-rc6/include/linux/swap.h 2005-08-08 20:57:50.000000000 +0200
+++ linux-2.6.13-rc6-cart/include/linux/swap.h 2005-08-12 14:00:26.000000000 +0200
@@ -154,6 +154,11 @@
/* linux/mm/memory.c */
extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
+/* linux/mm/nonresident.c */
+extern u32 remember_page(struct address_space *, unsigned long, unsigned int);
+extern unsigned int recently_evicted(struct address_space *, unsigned long);
+extern void init_nonresident(void);
+
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalhigh_pages;
@@ -292,6 +297,11 @@
#define grab_swap_token() do { } while(0)
#define has_swap_token(x) 0
+/* linux/mm/nonresident.c */
+#define init_nonresident() do { } while (0)
+#define remember_page(x,y,z) 0
+#define recently_evicted(x,y) 0
+
#endif /* CONFIG_SWAP */
#endif /* __KERNEL__*/
#endif /* _LINUX_SWAP_H */
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/init/main.c linux-2.6.13-rc6-cart/init/main.c
--- linux-2.6.13-rc6/init/main.c 2005-08-08 20:57:51.000000000 +0200
+++ linux-2.6.13-rc6-cart/init/main.c 2005-08-10 08:33:38.000000000 +0200
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/swap.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -494,6 +495,7 @@
}
#endif
vfs_caches_init_early();
+ init_nonresident();
mem_init();
kmem_cache_init();
setup_per_cpu_pageset();
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/mm/Makefile linux-2.6.13-rc6-cart/mm/Makefile
--- linux-2.6.13-rc6/mm/Makefile 2005-08-08 20:57:52.000000000 +0200
+++ linux-2.6.13-rc6-cart/mm/Makefile 2005-08-10 08:33:39.000000000 +0200
@@ -12,7 +12,8 @@
readahead.o slab.o swap.o truncate.o vmscan.o \
prio_tree.o $(mmu-y)
-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
+ nonresident.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
obj-$(CONFIG_SPARSEMEM) += sparse.o
diff -NaurX linux-2.6.13-rc6/Documentation/dontdiff linux-2.6.13-rc6/mm/nonresident.c linux-2.6.13-rc6-cart/mm/nonresident.c
--- linux-2.6.13-rc6/mm/nonresident.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/mm/nonresident.c 2005-08-12 14:00:26.000000000 +0200
@@ -0,0 +1,211 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
+ * like algorithms.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling remember_page(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling recently_evicted()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ *
+ *
+ * Modified to work with ARC like algorithms who:
+ * - need to balance two FIFOs; |b1| + |b2| = c,
+ * - keep a flag per non-resident page.
+ *
+ * The bucket contains two single linked cyclic lists (CLOCKS) and each
+ * clock has a tail hand. By selecting a victim clock upon insertion it
+ * is possible to balance them.
+ *
+ * The slot looks like this:
+ * struct slot_t {
+ * u32 cookie : 24; /* LSB */
+ * u32 index : 6;
+ * u32 filter : 1;
+ * u32 clock : 1; /* MSB */
+ * };
+ *
+ * The bucket is guarded by a spinlock.
+ */
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+#include <linux/nonresident.h>
+
+#define TARGET_SLOTS 64
+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES);
+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
+#if NR_SLOTS < TARGET_SLOTS / 2
+#warning very small slot size
+#if NR_SLOTS <= 0
+#error no room for slots left
+#endif
+#endif
+
+#define BUILD_MASK(bits, shift) (((1 << (bits)) - 1) << (shift))
+
+#define FLAGS_BITS 2
+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
+#define FLAGS_MASK BUILD_MASK(FLAGS_BITS, FLAGS_SHIFT)
+
+#define INDEX_BITS 6 /* ceil(log2(NR_SLOTS)) */
+#define INDEX_SHIFT (FLAGS_SHIFT - INDEX_BITS)
+#define INDEX_MASK BUILD_MASK(INDEX_BITS, INDEX_SHIFT)
+
+#define SET_INDEX(x, idx) ((x) = ((x) & ~INDEX_MASK) | ((idx) << INDEX_SHIFT))
+#define GET_INDEX(x) (((x) & INDEX_MASK) >> INDEX_SHIFT)
+
+struct nr_bucket
+{
+ spinlock_t lock;
+ u16 hand[2];
+ u32 slot[NR_SLOTS];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+/* hash the address into a bucket */
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = hash_ptr(mapping, BITS_PER_LONG);
+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+ bucket = hash & nonres_mask;
+
+ return nonres_table + bucket;
+}
+
+/* hash the address, inode and flags into a cookie */
+/* the two msb are flags; where msb-1 is a type flag and msb a period flag */
+static u32 nr_cookie(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ u32 c;
+ unsigned long cookie;
+
+ cookie = hash_ptr(mapping, BITS_PER_LONG);
+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+ if (mapping->host) {
+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+ }
+
+ c = (u32)(cookie >> (BITS_PER_LONG - 32));
+ c = (c & ~FLAGS_MASK) | ((flags << FLAGS_SHIFT) & FLAGS_MASK);
+ return c;
+}
+
+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 wanted, mask;
+ unsigned int r_flags = 0;
+ int i;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
+ wanted = nr_cookie(mapping, index, 0) & ~INDEX_MASK;
+ mask = ~(FLAGS_MASK | INDEX_MASK);
+
+ spin_lock(&nr_bucket->lock);
+ for (i = 0; i < NR_SLOTS; ++i) {
+ if (nr_bucket->slot[i] & mask == wanted) {
+ r_flags = nr_bucket->slot[i] >> FLAGS_SHIFT;
+ r_flags |= EVICT_MASK; /* set the MSB to mark presence */
+ break;
+ }
+ }
+ spin_unlock(&nr_bucket->lock);
+
+ return r_flags;
+}
+
+/* flags:
+ * logical and of the page flags (NR_filter, NR_list) and
+ * an EVICT_ target
+ */
+u32 remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ struct nr_bucket *nr_bucket;
+ u32 cookie;
+ u32 *slot, *tail;
+ int slot_pos, tail_pos;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
+ cookie = nr_cookie(mapping, index, flags);
+
+ flags &= EVICT_MASK;
+ spin_lock(&nr_bucket->lock);
+
+again:
+ tail_pos = nr_bucket->hand[!!flags]; /* tail slot in removal clock */
+ &tail = &nr_bucket->slot[tail_pos];
+ if (unlikely(*tail & EVICT_MASK != flags)) {
+ flags ^= EVICT_MASK; /* empty clock; take other one */
+ goto again;
+ }
+ /* free slot by swapping tail,tail+1, so that we skip over tail */
+ slot_pos = GET_INDEX(*tail);
+ slot = &nr_bucket->slot[slot_pos];
+ if (likely(tail != slot)) *slot = xchg(tail, *slot);
+
+ SET_INDEX(cookie, slot_pos); /* cookie -> slot */
+
+ flags = cookie & EVICT_MASK; /* insertion chain */
+ tail = &nr_bucket->slot[nr_bucket->hand[!!flags]];
+ cookie = xchg(tail, cookie); /* prev/tail-1 -> cookie/tail -> slot */
+
+ if (likely(tail != slot)) cookie = xchg(slot, cookie);
+ nr_bucket->hand[!!flags] = slot_pos; /* prev -> cookie -> slot/tail */
+
+ spin_unlock(&nr_bucket->lock);
+
+ return cookie;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init init_nonresident(void)
+{
+ int target;
+ int i, j;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++) {
+ spin_lock_init(&nonres_table[i].lock);
+ nonres_table[i].hand[0] = 0;
+ nonres_table[i].hand[1] = NR_SLOTS/2;
+ for (j = 0; j < NR_SLOTS; ++j) {
+ nonres_table[i].slot[j] = (j < NR_SLOTS/2) ? 0 : EVICT_MASK;
+ if (j < NR_SLOTS/2 - 1)
+ SET_INDEX(nonres_table[i].slot[j], j+1);
+ else if (j == NR_SLOTS/2 - 1)
+ SET_INDEX(nonres_table[i].slot[j], 0);
+ else if (j < NR_SLOTS - 1)
+ SET_INDEX(nonres_table[i].slot[j], j+1);
+ else /* j == NR_SLOTS - 1 */
+ SET_INDEX(nonres_table[i].slot[j], NR_SLOTS/2);
+ }
+ }
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-12 23:08 ` Marcelo Tosatti
2005-08-13 19:00 ` Rahul Iyer
@ 2005-08-14 18:31 ` Peter Zijlstra
1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-14 18:31 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Rahul Iyer, linux-mm, Rik van Riel
On Fri, 2005-08-12 at 20:08 -0300, Marcelo Tosatti wrote:
> +/* The replace function. This function serches the active and longterm
> +lists and looks for a candidate for replacement. This function selects
> +the candidate and returns the corresponding structpage or returns
> +NULL in case no page can be freed. The *where argument is used to
> +indicate the parent list of the page so that, in case it cannot be
> +written back, it can be placed back on the correct list */
> +struct page *replace(struct zone *zone, int *where)
>
> + list = list->next;
> + while (list !=&zone->active_longterm) {
> + page = list_entry(list, struct page, lru);
> +
> + if (!PageReferenced(page))
> + break;
> +
> + ClearPageReferenced(page);
> + del_page_from_active_longterm(zone, page);
> + add_page_to_active_list_tail(zone, page);
>
> This sounds odd. If a page is referenced you remove it from the longterm list
> "unpromoting" it to the active list? Shouldnt be the other way around?
This is correct, the longterm list (T2) is essentially a FIFO. All it
does is delay the re-evaluation of the page.
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-14 12:58 ` Peter Zijlstra
@ 2005-08-15 21:31 ` Peter Zijlstra
2005-08-16 19:53 ` Rahul Iyer
2005-08-26 21:03 ` Peter Zijlstra
1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-15 21:31 UTC (permalink / raw)
To: linux-mm; +Cc: Rik van Riel, Marcelo Tosatti, Rahul Iyer
[-- Attachment #1: Type: text/plain, Size: 783 bytes --]
On Sun, 2005-08-14 at 14:58 +0200, Peter Zijlstra wrote:
>
> Ok, now on to putting Rahul code on top of this ;-)
I got UML to boot with this patch. Now for some stress and behavioural
testing.
include/linux/cart.h | 12 ++
include/linux/mm_inline.h | 36 ++++++
include/linux/mmzone.h | 12 +-
include/linux/page-flags.h | 5
include/linux/swap.h | 14 ++
init/main.c | 5
mm/Makefile | 3
mm/cart.c | 175 +++++++++++++++++++++++++++++++
mm/nonresident.c | 251 +++++++++++++++++++++++++++++++++++++++++++++
mm/swap.c | 4
mm/vmscan.c | 43 +++++++
11 files changed, 553 insertions(+), 7 deletions(-)
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
[-- Attachment #2: 2.6.13-rc6-cart-3.patch --]
[-- Type: text/x-patch, Size: 24495 bytes --]
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/cart.h linux-2.6.13-rc6-cart/include/linux/cart.h
--- linux-2.6.13-rc6/include/linux/cart.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/include/linux/cart.h 2005-08-15 17:33:07.000000000 +0200
@@ -0,0 +1,12 @@
+#ifndef __CART_H__
+#define __CART_H__
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+
+extern void cart_init(void);
+extern void update_cart_params(struct page *);
+extern struct page *replace(struct zone *, unsigned int *);
+
+#endif
+
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/mm_inline.h linux-2.6.13-rc6-cart/include/linux/mm_inline.h
--- linux-2.6.13-rc6/include/linux/mm_inline.h 2005-03-02 08:38:33.000000000 +0100
+++ linux-2.6.13-rc6-cart/include/linux/mm_inline.h 2005-08-15 17:33:07.000000000 +0200
@@ -38,3 +38,39 @@ del_page_from_lru(struct zone *zone, str
zone->nr_inactive--;
}
}
+
+static inline void
+add_page_to_active_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->active_list);
+ zone->nr_active++;
+}
+
+static inline void
+del_page_from_active(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ zone->nr_active--;
+}
+
+static inline void
+add_page_to_inactive_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->inactive_list);
+ zone->nr_inactive++;
+}
+
+static inline void
+del_page_from_active_longterm(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ zone->nr_active_longterm--;
+}
+
+static inline void
+add_page_to_active_longterm_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->active_longterm);
+ zone->nr_active_longterm++;
+}
+
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/mmzone.h linux-2.6.13-rc6-cart/include/linux/mmzone.h
--- linux-2.6.13-rc6/include/linux/mmzone.h 2005-08-15 22:37:00.000000000 +0200
+++ linux-2.6.13-rc6-cart/include/linux/mmzone.h 2005-08-15 17:33:07.000000000 +0200
@@ -144,12 +144,20 @@ struct zone {
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
- struct list_head active_list;
+ struct list_head active_list; /* The T1 list of CART */
+ struct list_head active_longterm;/* The T2 list of CART */
struct list_head inactive_list;
unsigned long nr_scan_active;
unsigned long nr_scan_inactive;
- unsigned long nr_active;
+ unsigned long nr_active;
+ unsigned long nr_active_longterm;
unsigned long nr_inactive;
+ unsigned long nr_evicted_active;
+ unsigned long nr_evicted_longterm;
+ unsigned long nr_longterm; /* number of long term pages */
+ unsigned long nr_shortterm; /* number of short term pages */
+ unsigned long cart_p; /* p from the CART paper */
+ unsigned long cart_q; /* q from the cart paper */
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/page-flags.h linux-2.6.13-rc6-cart/include/linux/page-flags.h
--- linux-2.6.13-rc6/include/linux/page-flags.h 2005-08-15 22:37:00.000000000 +0200
+++ linux-2.6.13-rc6-cart/include/linux/page-flags.h 2005-08-15 17:33:07.000000000 +0200
@@ -75,6 +75,7 @@
#define PG_reclaim 17 /* To be reclaimed asap */
#define PG_nosave_free 18 /* Free, should not be written */
#define PG_uncached 19 /* Page has been mapped as uncached */
+#define PG_longterm 20 /* Filter bit for CART see mm/cart.c */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
@@ -305,6 +306,10 @@ extern void __mod_page_state(unsigned lo
#define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags)
#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
+#define PageLongTerm(page) test_bit(PG_longterm, &(page)->flags)
+#define SetLongTerm(page) set_bit(PG_longterm, &(page)->flags)
+#define ClearLongTerm(page) clear_bit(PG_longterm, &(page)->flags)
+
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/swap.h linux-2.6.13-rc6-cart/include/linux/swap.h
--- linux-2.6.13-rc6/include/linux/swap.h 2005-08-15 22:37:00.000000000 +0200
+++ linux-2.6.13-rc6-cart/include/linux/swap.h 2005-08-15 17:33:08.000000000 +0200
@@ -154,6 +154,15 @@ extern void out_of_memory(unsigned int _
/* linux/mm/memory.c */
extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
+/* linux/mm/nonresident.c */
+#define NR_filter 0x01 /* short/long */
+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
+#define NR_evict 0x80000000
+
+extern u32 remember_page(struct address_space *, unsigned long, unsigned int);
+extern unsigned int recently_evicted(struct address_space *, unsigned long);
+extern void init_nonresident(void);
+
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalhigh_pages;
@@ -292,6 +301,11 @@ static inline swp_entry_t get_swap_page(
#define grab_swap_token() do { } while(0)
#define has_swap_token(x) 0
+/* linux/mm/nonresident.c */
+#define init_nonresident() do { } while (0)
+#define remember_page(x,y,z) 0
+#define recently_evicted(x,y) 0
+
#endif /* CONFIG_SWAP */
#endif /* __KERNEL__*/
#endif /* _LINUX_SWAP_H */
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/init/main.c linux-2.6.13-rc6-cart/init/main.c
--- linux-2.6.13-rc6/init/main.c 2005-08-15 22:37:00.000000000 +0200
+++ linux-2.6.13-rc6-cart/init/main.c 2005-08-15 17:36:19.000000000 +0200
@@ -47,12 +47,15 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/swap.h>
#include <asm/io.h>
#include <asm/bugs.h>
#include <asm/setup.h>
#include <asm/sections.h>
+#include <linux/cart.h>
+
/*
* This is one of the first .c files built. Error out early
* if we have compiler trouble..
@@ -494,7 +497,9 @@ asmlinkage void __init start_kernel(void
}
#endif
vfs_caches_init_early();
+ init_nonresident();
mem_init();
+ cart_init();
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/Makefile linux-2.6.13-rc6-cart/mm/Makefile
--- linux-2.6.13-rc6/mm/Makefile 2005-08-15 22:37:01.000000000 +0200
+++ linux-2.6.13-rc6-cart/mm/Makefile 2005-08-15 17:33:08.000000000 +0200
@@ -12,7 +12,8 @@ obj-y := bootmem.o filemap.o mempool.o
readahead.o slab.o swap.o truncate.o vmscan.o \
prio_tree.o $(mmu-y)
-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
+ nonresident.o cart.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
obj-$(CONFIG_SPARSEMEM) += sparse.o
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/cart.c linux-2.6.13-rc6-cart/mm/cart.c
--- linux-2.6.13-rc6/mm/cart.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/mm/cart.c 2005-08-15 22:22:26.000000000 +0200
@@ -0,0 +1,175 @@
+/* Thisfile contains the crux of the CART page replacement algorithm. This implementation however changes a few things form the classic CART scheme. This implementation splits the original active_list of the Linux implementation into two lists, namely active_list and active_longterm. The 'active' pages exist on these two lists. The active_list hopes to capture short term usage, while the active_longterm list hopes to capture long term usage. Whenever a page's state needs to be updated, the update_cart_params() function is called. The refill_incative_zone() function causes the replace() function to be evoked, resulting in the removal of pages from the active lists. Hence, which pages are deemed inactive is determined by the CART algorithm.
+For further details, please refer to the CART paper here - http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf */
+
+#include <linux/cart.h>
+#include <linux/page-flags.h>
+#include <linux/mm_inline.h>
+
+/* Called from init/main.c to initialize the cart parameters */
+void cart_init()
+{
+ pg_data_t *pgdat;
+ struct zone *zone;
+ int i;
+
+ pgdat = pgdat_list;
+
+ do {
+ for (i=0;i<MAX_NR_ZONES;++i) {
+ zone = &pgdat->node_zones[i];
+
+ spin_lock_init(&zone->lru_lock);
+ INIT_LIST_HEAD(&zone->active_list);
+ INIT_LIST_HEAD(&zone->active_longterm);
+ INIT_LIST_HEAD(&zone->inactive_list);
+
+ zone->nr_active = zone->nr_active_longterm = zone->nr_inactive = 0;
+ zone->nr_evicted_active = 0;
+ zone->nr_evicted_longterm = zone->present_pages - zone->pages_high;
+
+ zone->cart_p = zone->cart_q = zone->nr_longterm = zone->nr_shortterm = 0;
+ }
+ } while ((pgdat = pgdat->pgdat_next));
+}
+
+/* The heart of the CART update function. This function is responsible for the movement of pages across the lists */
+void update_cart_params(struct page *page)
+{
+ unsigned int rflags;
+ unsigned long evicted_active;
+ unsigned evicted_longterm;
+ struct zone *zone;
+
+ zone = page_zone(page);
+
+ rflags = recently_evicted(page->mapping, page->index);
+ evicted_active = (!rflags && !(rflags & NR_list));
+ evicted_longterm = (!rflags && (rflags & NR_list));
+
+ if (evicted_active) {
+ zone->cart_p = min(zone->cart_p + max(zone->nr_shortterm/(zone->nr_evicted_active ?: 1UL), 1UL), (zone->present_pages - zone->pages_high));
+
+ ++zone->nr_longterm;
+ SetLongTerm(page);
+ ClearPageReferenced(page);
+ }
+ else if (evicted_longterm) {
+ zone->cart_p = max(zone->cart_p - max(1UL, zone->nr_longterm/(zone->nr_evicted_longterm ?: 1UL)), 0UL);
+
+ ++zone->nr_longterm;
+ ClearPageReferenced(page);
+
+ if (zone->nr_active_longterm + zone->nr_active + zone->nr_evicted_longterm - zone->nr_shortterm >=(zone->present_pages - zone->pages_high)) {
+ zone->cart_q = min(zone->cart_q + 1, 2*(zone->present_pages - zone->pages_high) - zone->nr_active);
+ }
+ }
+ else {
+ ++zone->nr_shortterm;
+ ClearLongTerm(page);
+ }
+
+ add_page_to_active_list(zone, page);
+}
+
+/* The replace function. This function serches the active and longterm lists and looks for a candidate for replacement. This function selects the candidate and returns the corresponding structpage or returns NULL in case no page can be freed. The *where argument is used to indicate the parent list of the page so that, in case it cannot be written back, it can be placed back on the correct list */
+struct page *replace(struct zone *zone, unsigned int *where)
+{
+ struct list_head *list;
+ struct page *page = NULL;
+ int referenced = 0;
+ int debug_count=0;
+ unsigned int flags = 0, rflags;
+
+ list = &zone->active_longterm;
+ list = list->next;
+ while (list !=&zone->active_longterm) {
+ page = list_entry(list, struct page, lru);
+
+ if (!PageReferenced(page))
+ break;
+
+ ClearPageReferenced(page);
+ del_page_from_active_longterm(zone, page);
+ add_page_to_active_tail(zone, page);
+
+ if ((zone->nr_active_longterm + zone->nr_active + zone->nr_evicted_longterm - zone->nr_shortterm) >= (zone->present_pages - zone->pages_high))
+ zone->cart_q = min(zone->cart_q + 1, 2*(zone->present_pages - zone->pages_high) - zone->nr_active);
+
+ list = &zone->active_longterm;
+ list = list->next;
+ debug_count++;
+ }
+
+ debug_count=0;
+ list = &zone->active_list;
+ list = list->next;
+
+ while (list != &zone->active_list) {
+ page = list_entry(list, struct page, lru);
+ referenced = PageReferenced(page);
+
+ if (!PageLongTerm(page) && !referenced)
+ break;
+
+ ClearPageReferenced(page);
+ if (referenced) {
+ del_page_from_active(zone, page);
+ add_page_to_active_tail(zone, page);
+
+ if (zone->nr_active >= min(zone->cart_p+1, zone->nr_evicted_active) && !PageLongTerm(page)) {
+ SetLongTerm(page);
+ --zone->nr_shortterm;
+ ++zone->nr_longterm;
+ }
+ }
+ else {
+ del_page_from_active(zone, page);
+ add_page_to_active_longterm_tail(zone, page);
+
+ zone->cart_q = max(zone->cart_q-1, (zone->present_pages - zone->pages_high) - zone->nr_active);
+ }
+
+ list = &zone->active_list;
+ list = list->next;
+ debug_count++;
+ }
+
+ page = NULL;
+
+ if (zone->nr_active > max(1UL, zone->cart_p)) {
+ if (!list_empty(&zone->active_list)) {
+ page = list_entry(zone->active_list.next, struct page, lru);
+ del_page_from_active(zone, page);
+ --zone->nr_shortterm;
+ ++zone->nr_evicted_active;
+ }
+ }
+ else {
+ if (!list_empty(&zone->active_longterm)) {
+ page = list_entry(zone->active_longterm.next, struct page, lru);
+ del_page_from_active_longterm(zone, page);
+ --zone->nr_longterm;
+ ++zone->nr_evicted_longterm;
+ flags |= NR_list;
+ }
+ }
+
+ if (!page) return NULL;
+ *where = flags | NR_evict;
+ if (PageLongTerm(page)) flags |= NR_filter;
+
+ /* history replacement; always remember, if the page was already remembered
+ * this will move it to the head.
+ * Also assume |B1| + |B2| == c + 1, since |B1_j| + |B2_j| == c_j.
+ */
+ if (zone->nr_evicted_active <= max(0UL, zone->cart_q)) flags |= NR_evict;
+
+ rflags = remember_page(page->mapping, page->index, flags);
+ if (rflags & NR_evict) {
+ if (likely(zone->nr_evicted_longterm)) --zone->nr_evicted_longterm;
+ } else {
+ if (likely(zone->nr_evicted_active)) --zone->nr_evicted_active;
+ }
+
+ return page;
+}
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/nonresident.c linux-2.6.13-rc6-cart/mm/nonresident.c
--- linux-2.6.13-rc6/mm/nonresident.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.13-rc6-cart/mm/nonresident.c 2005-08-15 21:46:17.000000000 +0200
@@ -0,0 +1,251 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
+ * like algorithms.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling remember_page(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling recently_evicted()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ *
+ *
+ * Modified to work with ARC like algorithms who:
+ * - need to balance two FIFOs; |b1| + |b2| = c,
+ * - keep a flag per non-resident page.
+ *
+ * The bucket contains two single linked cyclic lists (CLOCKS) and each
+ * clock has a tail hand. By selecting a victim clock upon insertion it
+ * is possible to balance them.
+ *
+ * The slot looks like this:
+ * struct slot_t {
+ * u32 cookie : 24; // LSB
+ * u32 index : 6;
+ * u32 filter : 1;
+ * u32 clock : 1; // MSB
+ * };
+ *
+ * The bucket is guarded by a spinlock.
+ */
+#include <linux/swap.h>
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+#define TARGET_SLOTS 64
+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES)
+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
+#if 0
+#if NR_SLOTS < (TARGET_SLOTS / 2)
+#warning very small slot size
+#if NR_SLOTS <= 0
+#error no room for slots left
+#endif
+#endif
+#endif
+
+#define BUILD_MASK(bits, shift) (((1 << (bits)) - 1) << (shift))
+
+#define FLAGS_BITS 2
+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
+#define FLAGS_MASK BUILD_MASK(FLAGS_BITS, FLAGS_SHIFT)
+
+#define INDEX_BITS 6 /* ceil(log2(NR_SLOTS)) */
+#define INDEX_SHIFT (FLAGS_SHIFT - INDEX_BITS)
+#define INDEX_MASK BUILD_MASK(INDEX_BITS, INDEX_SHIFT)
+
+#define SET_INDEX(x, idx) ((x) = ((x) & ~INDEX_MASK) | ((idx) << INDEX_SHIFT))
+#define GET_INDEX(x) (((x) & INDEX_MASK) >> INDEX_SHIFT)
+
+struct nr_bucket
+{
+ spinlock_t lock;
+ u16 hand[2];
+ u32 slot[NR_SLOTS];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+/* hash the address into a bucket */
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = hash_ptr(mapping, BITS_PER_LONG);
+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+ bucket = hash & nonres_mask;
+
+ return nonres_table + bucket;
+}
+
+/* hash the address, inode and flags into a cookie */
+/* the two msb are flags; where msb-1 is a type flag and msb a period flag */
+static u32 nr_cookie(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ u32 c;
+ unsigned long cookie;
+
+ cookie = hash_ptr(mapping, BITS_PER_LONG);
+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+ if (mapping->host) {
+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+ }
+
+ c = (u32)(cookie >> (BITS_PER_LONG - 32));
+ c = (c & ~FLAGS_MASK) | ((flags << FLAGS_SHIFT) & FLAGS_MASK);
+ return c;
+}
+
+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 wanted, mask;
+ unsigned int r_flags = 0;
+ int i;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
+ wanted = nr_cookie(mapping, index, 0) & ~INDEX_MASK;
+ mask = ~(FLAGS_MASK | INDEX_MASK);
+
+ spin_lock(&nr_bucket->lock);
+ for (i = 0; i < NR_SLOTS; ++i) {
+ if ((nr_bucket->slot[i] & mask) == wanted) {
+ r_flags = nr_bucket->slot[i] >> FLAGS_SHIFT;
+ r_flags |= NR_evict; /* set the MSB to mark presence */
+ break;
+ }
+ }
+ spin_unlock(&nr_bucket->lock);
+
+ return r_flags;
+}
+
+/* flags:
+ * logical and of the page flags (NR_filter, NR_list) and
+ * an NR_evict target
+ */
+u32 remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ struct nr_bucket *nr_bucket;
+ u32 cookie;
+ u32 *slot, *tail;
+ unsigned int slot_pos, tail_pos;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
+ cookie = nr_cookie(mapping, index, flags);
+
+ flags &= NR_evict; /* removal chain */
+ spin_lock(&nr_bucket->lock);
+
+ /* free a slot */
+again:
+ tail_pos = nr_bucket->hand[!!flags];
+ BUG_ON(tail_pos >= NR_SLOTS);
+ tail = &nr_bucket->slot[tail_pos];
+ if (unlikely((*tail & NR_evict) != flags)) {
+ flags ^= NR_evict; /* empty chain; take other one */
+ goto again;
+ }
+ BUG_ON((*tail & NR_evict) != flags);
+ /* free slot by swapping tail,tail+1, so that we skip over tail */
+ slot_pos = GET_INDEX(*tail);
+ BUG_ON(slot_pos >= NR_SLOTS);
+ slot = &nr_bucket->slot[slot_pos];
+ BUG_ON((*slot & NR_evict) != flags);
+ if (likely(tail != slot)) *slot = xchg(tail, *slot);
+ /* slot: -> [slot], old cookie */
+ BUG_ON(GET_INDEX(*slot) != slot_pos);
+
+ flags = (cookie & NR_evict); /* insertion chain */
+
+ /* place cookie in empty slot */
+ SET_INDEX(cookie, slot_pos); /* -> [slot], cookie */
+ cookie = xchg(slot, cookie); /* slot: -> [slot], cookie */
+
+ /* insert slot before tail; ie. MRU pos */
+ tail_pos = nr_bucket->hand[!!flags];
+ BUG_ON(tail_pos >= NR_SLOTS);
+ tail = &nr_bucket->slot[tail_pos];
+ if (likely((*tail & NR_evict) == flags && tail != slot))
+ *slot = xchg(tail, *slot); /* swap if not empty and not same */
+ nr_bucket->hand[!!flags] = slot_pos;
+
+ spin_unlock(&nr_bucket->lock);
+
+ return cookie;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init init_nonresident(void)
+{
+ int target;
+ int i, j;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++) {
+ spin_lock_init(&nonres_table[i].lock);
+ nonres_table[i].hand[0] = nonres_table[i].hand[1] = 0;
+ for (j = 0; j < NR_SLOTS; ++j) {
+ nonres_table[i].slot[j] = NR_evict;
+ if (j < NR_SLOTS - 1)
+ SET_INDEX(nonres_table[i].slot[j], j+1);
+ else /* j == NR_SLOTS - 1 */
+ SET_INDEX(nonres_table[i].slot[j], 0);
+ }
+ }
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/swap.c linux-2.6.13-rc6-cart/mm/swap.c
--- linux-2.6.13-rc6/mm/swap.c 2005-03-02 08:38:07.000000000 +0100
+++ linux-2.6.13-rc6-cart/mm/swap.c 2005-08-15 17:33:08.000000000 +0200
@@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/init.h>
+#include <linux/cart.h>
/* How many pages do we try to swap or page in/out together? */
int page_cluster;
@@ -107,7 +108,7 @@ void fastcall activate_page(struct page
if (PageLRU(page) && !PageActive(page)) {
del_page_from_inactive_list(zone, page);
SetPageActive(page);
- add_page_to_active_list(zone, page);
+ update_cart_params(page);
inc_page_state(pgactivate);
}
spin_unlock_irq(&zone->lru_lock);
@@ -124,7 +125,6 @@ void fastcall mark_page_accessed(struct
{
if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
activate_page(page);
- ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
}
diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/vmscan.c linux-2.6.13-rc6-cart/mm/vmscan.c
--- linux-2.6.13-rc6/mm/vmscan.c 2005-08-15 22:37:01.000000000 +0200
+++ linux-2.6.13-rc6-cart/mm/vmscan.c 2005-08-15 17:33:08.000000000 +0200
@@ -38,6 +38,7 @@
#include <asm/div64.h>
#include <linux/swapops.h>
+#include <linux/cart.h>
/* possible outcome of pageout() */
typedef enum {
@@ -555,6 +556,44 @@ keep:
return reclaimed;
}
+/* This gets a page from the active_list and active_longterm lists in order to add to the incative list */
+static int get_from_active_lists(int nr_to_scan, struct zone *zone, struct list_head *dst, int *scanned)
+{
+ int nr_taken = 0;
+ struct page *page;
+ int scan = 0;
+ unsigned int flags;
+
+ while (scan++ < nr_to_scan) {
+ flags = 0;
+ page = replace(zone, &flags);
+
+ if (!page) break;
+ BUG_ON(!TestClearPageLRU(page));
+ BUG_ON(!flags);
+
+ if (get_page_testone(page)) {
+ /*
+ * It is being freed elsewhere
+ */
+ __put_page(page);
+ SetPageLRU(page);
+
+ if (!(flags & NR_list))
+ add_page_to_active_tail(zone, page);
+ else
+ add_page_to_active_longterm_tail(zone, page);
+ continue;
+ } else {
+ list_add(&page->lru, dst);
+ nr_taken++;
+ }
+ }
+
+ *scanned = scan;
+ return nr_taken;
+}
+
/*
* zone->lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
@@ -705,10 +744,10 @@ refill_inactive_zone(struct zone *zone,
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
+ pgmoved = get_from_active_lists(nr_pages, zone,
&l_hold, &pgscanned);
zone->pages_scanned += pgscanned;
- zone->nr_active -= pgmoved;
+// zone->nr_active -= pgmoved;
spin_unlock_irq(&zone->lru_lock);
/*
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-15 21:31 ` Peter Zijlstra
@ 2005-08-16 19:53 ` Rahul Iyer
2005-08-16 20:49 ` Christoph Lameter
0 siblings, 1 reply; 22+ messages in thread
From: Rahul Iyer @ 2005-08-16 19:53 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm, Rik van Riel, Marcelo Tosatti
[-- Attachment #1: Type: text/plain, Size: 26683 bytes --]
Hi Peter,
I have changed the patch to incorporate Marcelo's suggestions.
The changes are:
-- Fixed bug with EvictedActive() macros and related constants
-- removed the 3 fields in the non resident nodes to just one hashval field.
The second one should not affect you in any way i think, as you have
used Rik's code for the non resident page management.
Patches:
-- linux-cartv3.patch - patches on the v2 codebase
-- linux-cartv3.vanilla.patch - patches the linux-2.6.12-rc5 kernel
-- linux-cartv3.partial.patch - fixes only the first issue. This should
be good for you Peter.
Thanks
Rahul
Peter Zijlstra wrote:
>On Sun, 2005-08-14 at 14:58 +0200, Peter Zijlstra wrote:
>
>
>
>>Ok, now on to putting Rahul code on top of this ;-)
>>
>>
>
>I got UML to boot with this patch. Now for some stress and behavioural
>testing.
>
> include/linux/cart.h | 12 ++
> include/linux/mm_inline.h | 36 ++++++
> include/linux/mmzone.h | 12 +-
> include/linux/page-flags.h | 5
> include/linux/swap.h | 14 ++
> init/main.c | 5
> mm/Makefile | 3
> mm/cart.c | 175 +++++++++++++++++++++++++++++++
> mm/nonresident.c | 251 +++++++++++++++++++++++++++++++++++++++++++++
> mm/swap.c | 4
> mm/vmscan.c | 43 +++++++
> 11 files changed, 553 insertions(+), 7 deletions(-)
>
>
>
>------------------------------------------------------------------------
>
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/cart.h linux-2.6.13-rc6-cart/include/linux/cart.h
>--- linux-2.6.13-rc6/include/linux/cart.h 1970-01-01 01:00:00.000000000 +0100
>+++ linux-2.6.13-rc6-cart/include/linux/cart.h 2005-08-15 17:33:07.000000000 +0200
>@@ -0,0 +1,12 @@
>+#ifndef __CART_H__
>+#define __CART_H__
>+#include <linux/list.h>
>+#include <linux/mm.h>
>+#include <linux/swap.h>
>+
>+extern void cart_init(void);
>+extern void update_cart_params(struct page *);
>+extern struct page *replace(struct zone *, unsigned int *);
>+
>+#endif
>+
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/mm_inline.h linux-2.6.13-rc6-cart/include/linux/mm_inline.h
>--- linux-2.6.13-rc6/include/linux/mm_inline.h 2005-03-02 08:38:33.000000000 +0100
>+++ linux-2.6.13-rc6-cart/include/linux/mm_inline.h 2005-08-15 17:33:07.000000000 +0200
>@@ -38,3 +38,39 @@ del_page_from_lru(struct zone *zone, str
> zone->nr_inactive--;
> }
> }
>+
>+static inline void
>+add_page_to_active_tail(struct zone *zone, struct page *page)
>+{
>+ list_add_tail(&page->lru, &zone->active_list);
>+ zone->nr_active++;
>+}
>+
>+static inline void
>+del_page_from_active(struct zone *zone, struct page *page)
>+{
>+ list_del(&page->lru);
>+ zone->nr_active--;
>+}
>+
>+static inline void
>+add_page_to_inactive_tail(struct zone *zone, struct page *page)
>+{
>+ list_add_tail(&page->lru, &zone->inactive_list);
>+ zone->nr_inactive++;
>+}
>+
>+static inline void
>+del_page_from_active_longterm(struct zone *zone, struct page *page)
>+{
>+ list_del(&page->lru);
>+ zone->nr_active_longterm--;
>+}
>+
>+static inline void
>+add_page_to_active_longterm_tail(struct zone *zone, struct page *page)
>+{
>+ list_add_tail(&page->lru, &zone->active_longterm);
>+ zone->nr_active_longterm++;
>+}
>+
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/mmzone.h linux-2.6.13-rc6-cart/include/linux/mmzone.h
>--- linux-2.6.13-rc6/include/linux/mmzone.h 2005-08-15 22:37:00.000000000 +0200
>+++ linux-2.6.13-rc6-cart/include/linux/mmzone.h 2005-08-15 17:33:07.000000000 +0200
>@@ -144,12 +144,20 @@ struct zone {
>
> /* Fields commonly accessed by the page reclaim scanner */
> spinlock_t lru_lock;
>- struct list_head active_list;
>+ struct list_head active_list; /* The T1 list of CART */
>+ struct list_head active_longterm;/* The T2 list of CART */
> struct list_head inactive_list;
> unsigned long nr_scan_active;
> unsigned long nr_scan_inactive;
>- unsigned long nr_active;
>+ unsigned long nr_active;
>+ unsigned long nr_active_longterm;
> unsigned long nr_inactive;
>+ unsigned long nr_evicted_active;
>+ unsigned long nr_evicted_longterm;
>+ unsigned long nr_longterm; /* number of long term pages */
>+ unsigned long nr_shortterm; /* number of short term pages */
>+ unsigned long cart_p; /* p from the CART paper */
>+ unsigned long cart_q; /* q from the cart paper */
> unsigned long pages_scanned; /* since last reclaim */
> int all_unreclaimable; /* All pages pinned */
>
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/page-flags.h linux-2.6.13-rc6-cart/include/linux/page-flags.h
>--- linux-2.6.13-rc6/include/linux/page-flags.h 2005-08-15 22:37:00.000000000 +0200
>+++ linux-2.6.13-rc6-cart/include/linux/page-flags.h 2005-08-15 17:33:07.000000000 +0200
>@@ -75,6 +75,7 @@
> #define PG_reclaim 17 /* To be reclaimed asap */
> #define PG_nosave_free 18 /* Free, should not be written */
> #define PG_uncached 19 /* Page has been mapped as uncached */
>+#define PG_longterm 20 /* Filter bit for CART see mm/cart.c */
>
> /*
> * Global page accounting. One instance per CPU. Only unsigned longs are
>@@ -305,6 +306,10 @@ extern void __mod_page_state(unsigned lo
> #define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags)
> #define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
>
>+#define PageLongTerm(page) test_bit(PG_longterm, &(page)->flags)
>+#define SetLongTerm(page) set_bit(PG_longterm, &(page)->flags)
>+#define ClearLongTerm(page) clear_bit(PG_longterm, &(page)->flags)
>+
> struct page; /* forward declaration */
>
> int test_clear_page_dirty(struct page *page);
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/include/linux/swap.h linux-2.6.13-rc6-cart/include/linux/swap.h
>--- linux-2.6.13-rc6/include/linux/swap.h 2005-08-15 22:37:00.000000000 +0200
>+++ linux-2.6.13-rc6-cart/include/linux/swap.h 2005-08-15 17:33:08.000000000 +0200
>@@ -154,6 +154,15 @@ extern void out_of_memory(unsigned int _
> /* linux/mm/memory.c */
> extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
>
>+/* linux/mm/nonresident.c */
>+#define NR_filter 0x01 /* short/long */
>+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
>+#define NR_evict 0x80000000
>+
>+extern u32 remember_page(struct address_space *, unsigned long, unsigned int);
>+extern unsigned int recently_evicted(struct address_space *, unsigned long);
>+extern void init_nonresident(void);
>+
> /* linux/mm/page_alloc.c */
> extern unsigned long totalram_pages;
> extern unsigned long totalhigh_pages;
>@@ -292,6 +301,11 @@ static inline swp_entry_t get_swap_page(
> #define grab_swap_token() do { } while(0)
> #define has_swap_token(x) 0
>
>+/* linux/mm/nonresident.c */
>+#define init_nonresident() do { } while (0)
>+#define remember_page(x,y,z) 0
>+#define recently_evicted(x,y) 0
>+
> #endif /* CONFIG_SWAP */
> #endif /* __KERNEL__*/
> #endif /* _LINUX_SWAP_H */
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/init/main.c linux-2.6.13-rc6-cart/init/main.c
>--- linux-2.6.13-rc6/init/main.c 2005-08-15 22:37:00.000000000 +0200
>+++ linux-2.6.13-rc6-cart/init/main.c 2005-08-15 17:36:19.000000000 +0200
>@@ -47,12 +47,15 @@
> #include <linux/rmap.h>
> #include <linux/mempolicy.h>
> #include <linux/key.h>
>+#include <linux/swap.h>
>
> #include <asm/io.h>
> #include <asm/bugs.h>
> #include <asm/setup.h>
> #include <asm/sections.h>
>
>+#include <linux/cart.h>
>+
> /*
> * This is one of the first .c files built. Error out early
> * if we have compiler trouble..
>@@ -494,7 +497,9 @@ asmlinkage void __init start_kernel(void
> }
> #endif
> vfs_caches_init_early();
>+ init_nonresident();
> mem_init();
>+ cart_init();
> kmem_cache_init();
> setup_per_cpu_pageset();
> numa_policy_init();
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/Makefile linux-2.6.13-rc6-cart/mm/Makefile
>--- linux-2.6.13-rc6/mm/Makefile 2005-08-15 22:37:01.000000000 +0200
>+++ linux-2.6.13-rc6-cart/mm/Makefile 2005-08-15 17:33:08.000000000 +0200
>@@ -12,7 +12,8 @@ obj-y := bootmem.o filemap.o mempool.o
> readahead.o slab.o swap.o truncate.o vmscan.o \
> prio_tree.o $(mmu-y)
>
>-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
>+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
>+ nonresident.o cart.o
> obj-$(CONFIG_HUGETLBFS) += hugetlb.o
> obj-$(CONFIG_NUMA) += mempolicy.o
> obj-$(CONFIG_SPARSEMEM) += sparse.o
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/cart.c linux-2.6.13-rc6-cart/mm/cart.c
>--- linux-2.6.13-rc6/mm/cart.c 1970-01-01 01:00:00.000000000 +0100
>+++ linux-2.6.13-rc6-cart/mm/cart.c 2005-08-15 22:22:26.000000000 +0200
>@@ -0,0 +1,175 @@
>+/* Thisfile contains the crux of the CART page replacement algorithm. This implementation however changes a few things form the classic CART scheme. This implementation splits the original active_list of the Linux implementation into two lists, namely active_list and active_longterm. The 'active' pages exist on these two lists. The active_list hopes to capture short term usage, while the active_longterm list hopes to capture long term usage. Whenever a page's state needs to be updated, the update_cart_params() function is called. The refill_incative_zone() function causes the replace() function to be evoked, resulting in the removal of pages from the active lists. Hence, which pages are deemed inactive is determined by the CART algorithm.
>+For further details, please refer to the CART paper here - http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf */
>+
>+#include <linux/cart.h>
>+#include <linux/page-flags.h>
>+#include <linux/mm_inline.h>
>+
>+/* Called from init/main.c to initialize the cart parameters */
>+void cart_init()
>+{
>+ pg_data_t *pgdat;
>+ struct zone *zone;
>+ int i;
>+
>+ pgdat = pgdat_list;
>+
>+ do {
>+ for (i=0;i<MAX_NR_ZONES;++i) {
>+ zone = &pgdat->node_zones[i];
>+
>+ spin_lock_init(&zone->lru_lock);
>+ INIT_LIST_HEAD(&zone->active_list);
>+ INIT_LIST_HEAD(&zone->active_longterm);
>+ INIT_LIST_HEAD(&zone->inactive_list);
>+
>+ zone->nr_active = zone->nr_active_longterm = zone->nr_inactive = 0;
>+ zone->nr_evicted_active = 0;
>+ zone->nr_evicted_longterm = zone->present_pages - zone->pages_high;
>+
>+ zone->cart_p = zone->cart_q = zone->nr_longterm = zone->nr_shortterm = 0;
>+ }
>+ } while ((pgdat = pgdat->pgdat_next));
>+}
>+
>+/* The heart of the CART update function. This function is responsible for the movement of pages across the lists */
>+void update_cart_params(struct page *page)
>+{
>+ unsigned int rflags;
>+ unsigned long evicted_active;
>+ unsigned evicted_longterm;
>+ struct zone *zone;
>+
>+ zone = page_zone(page);
>+
>+ rflags = recently_evicted(page->mapping, page->index);
>+ evicted_active = (!rflags && !(rflags & NR_list));
>+ evicted_longterm = (!rflags && (rflags & NR_list));
>+
>+ if (evicted_active) {
>+ zone->cart_p = min(zone->cart_p + max(zone->nr_shortterm/(zone->nr_evicted_active ?: 1UL), 1UL), (zone->present_pages - zone->pages_high));
>+
>+ ++zone->nr_longterm;
>+ SetLongTerm(page);
>+ ClearPageReferenced(page);
>+ }
>+ else if (evicted_longterm) {
>+ zone->cart_p = max(zone->cart_p - max(1UL, zone->nr_longterm/(zone->nr_evicted_longterm ?: 1UL)), 0UL);
>+
>+ ++zone->nr_longterm;
>+ ClearPageReferenced(page);
>+
>+ if (zone->nr_active_longterm + zone->nr_active + zone->nr_evicted_longterm - zone->nr_shortterm >=(zone->present_pages - zone->pages_high)) {
>+ zone->cart_q = min(zone->cart_q + 1, 2*(zone->present_pages - zone->pages_high) - zone->nr_active);
>+ }
>+ }
>+ else {
>+ ++zone->nr_shortterm;
>+ ClearLongTerm(page);
>+ }
>+
>+ add_page_to_active_list(zone, page);
>+}
>+
>+/* The replace function. This function serches the active and longterm lists and looks for a candidate for replacement. This function selects the candidate and returns the corresponding structpage or returns NULL in case no page can be freed. The *where argument is used to indicate the parent list of the page so that, in case it cannot be written back, it can be placed back on the correct list */
>+struct page *replace(struct zone *zone, unsigned int *where)
>+{
>+ struct list_head *list;
>+ struct page *page = NULL;
>+ int referenced = 0;
>+ int debug_count=0;
>+ unsigned int flags = 0, rflags;
>+
>+ list = &zone->active_longterm;
>+ list = list->next;
>+ while (list !=&zone->active_longterm) {
>+ page = list_entry(list, struct page, lru);
>+
>+ if (!PageReferenced(page))
>+ break;
>+
>+ ClearPageReferenced(page);
>+ del_page_from_active_longterm(zone, page);
>+ add_page_to_active_tail(zone, page);
>+
>+ if ((zone->nr_active_longterm + zone->nr_active + zone->nr_evicted_longterm - zone->nr_shortterm) >= (zone->present_pages - zone->pages_high))
>+ zone->cart_q = min(zone->cart_q + 1, 2*(zone->present_pages - zone->pages_high) - zone->nr_active);
>+
>+ list = &zone->active_longterm;
>+ list = list->next;
>+ debug_count++;
>+ }
>+
>+ debug_count=0;
>+ list = &zone->active_list;
>+ list = list->next;
>+
>+ while (list != &zone->active_list) {
>+ page = list_entry(list, struct page, lru);
>+ referenced = PageReferenced(page);
>+
>+ if (!PageLongTerm(page) && !referenced)
>+ break;
>+
>+ ClearPageReferenced(page);
>+ if (referenced) {
>+ del_page_from_active(zone, page);
>+ add_page_to_active_tail(zone, page);
>+
>+ if (zone->nr_active >= min(zone->cart_p+1, zone->nr_evicted_active) && !PageLongTerm(page)) {
>+ SetLongTerm(page);
>+ --zone->nr_shortterm;
>+ ++zone->nr_longterm;
>+ }
>+ }
>+ else {
>+ del_page_from_active(zone, page);
>+ add_page_to_active_longterm_tail(zone, page);
>+
>+ zone->cart_q = max(zone->cart_q-1, (zone->present_pages - zone->pages_high) - zone->nr_active);
>+ }
>+
>+ list = &zone->active_list;
>+ list = list->next;
>+ debug_count++;
>+ }
>+
>+ page = NULL;
>+
>+ if (zone->nr_active > max(1UL, zone->cart_p)) {
>+ if (!list_empty(&zone->active_list)) {
>+ page = list_entry(zone->active_list.next, struct page, lru);
>+ del_page_from_active(zone, page);
>+ --zone->nr_shortterm;
>+ ++zone->nr_evicted_active;
>+ }
>+ }
>+ else {
>+ if (!list_empty(&zone->active_longterm)) {
>+ page = list_entry(zone->active_longterm.next, struct page, lru);
>+ del_page_from_active_longterm(zone, page);
>+ --zone->nr_longterm;
>+ ++zone->nr_evicted_longterm;
>+ flags |= NR_list;
>+ }
>+ }
>+
>+ if (!page) return NULL;
>+ *where = flags | NR_evict;
>+ if (PageLongTerm(page)) flags |= NR_filter;
>+
>+ /* history replacement; always remember, if the page was already remembered
>+ * this will move it to the head.
>+ * Also assume |B1| + |B2| == c + 1, since |B1_j| + |B2_j| == c_j.
>+ */
>+ if (zone->nr_evicted_active <= max(0UL, zone->cart_q)) flags |= NR_evict;
>+
>+ rflags = remember_page(page->mapping, page->index, flags);
>+ if (rflags & NR_evict) {
>+ if (likely(zone->nr_evicted_longterm)) --zone->nr_evicted_longterm;
>+ } else {
>+ if (likely(zone->nr_evicted_active)) --zone->nr_evicted_active;
>+ }
>+
>+ return page;
>+}
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/nonresident.c linux-2.6.13-rc6-cart/mm/nonresident.c
>--- linux-2.6.13-rc6/mm/nonresident.c 1970-01-01 01:00:00.000000000 +0100
>+++ linux-2.6.13-rc6-cart/mm/nonresident.c 2005-08-15 21:46:17.000000000 +0200
>@@ -0,0 +1,251 @@
>+/*
>+ * mm/nonresident.c
>+ * (C) 2004,2005 Red Hat, Inc
>+ * Written by Rik van Riel <riel@redhat.com>
>+ * Released under the GPL, see the file COPYING for details.
>+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
>+ * like algorithms.
>+ *
>+ * Keeps track of whether a non-resident page was recently evicted
>+ * and should be immediately promoted to the active list. This also
>+ * helps automatically tune the inactive target.
>+ *
>+ * The pageout code stores a recently evicted page in this cache
>+ * by calling remember_page(mapping/mm, index/vaddr)
>+ * and can look it up in the cache by calling recently_evicted()
>+ * with the same arguments.
>+ *
>+ * Note that there is no way to invalidate pages after eg. truncate
>+ * or exit, we let the pages fall out of the non-resident set through
>+ * normal replacement.
>+ *
>+ *
>+ * Modified to work with ARC like algorithms who:
>+ * - need to balance two FIFOs; |b1| + |b2| = c,
>+ * - keep a flag per non-resident page.
>+ *
>+ * The bucket contains two single linked cyclic lists (CLOCKS) and each
>+ * clock has a tail hand. By selecting a victim clock upon insertion it
>+ * is possible to balance them.
>+ *
>+ * The slot looks like this:
>+ * struct slot_t {
>+ * u32 cookie : 24; // LSB
>+ * u32 index : 6;
>+ * u32 filter : 1;
>+ * u32 clock : 1; // MSB
>+ * };
>+ *
>+ * The bucket is guarded by a spinlock.
>+ */
>+#include <linux/swap.h>
>+#include <linux/mm.h>
>+#include <linux/cache.h>
>+#include <linux/spinlock.h>
>+#include <linux/bootmem.h>
>+#include <linux/hash.h>
>+#include <linux/prefetch.h>
>+#include <linux/kernel.h>
>+
>+#define TARGET_SLOTS 64
>+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES)
>+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
>+#if 0
>+#if NR_SLOTS < (TARGET_SLOTS / 2)
>+#warning very small slot size
>+#if NR_SLOTS <= 0
>+#error no room for slots left
>+#endif
>+#endif
>+#endif
>+
>+#define BUILD_MASK(bits, shift) (((1 << (bits)) - 1) << (shift))
>+
>+#define FLAGS_BITS 2
>+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
>+#define FLAGS_MASK BUILD_MASK(FLAGS_BITS, FLAGS_SHIFT)
>+
>+#define INDEX_BITS 6 /* ceil(log2(NR_SLOTS)) */
>+#define INDEX_SHIFT (FLAGS_SHIFT - INDEX_BITS)
>+#define INDEX_MASK BUILD_MASK(INDEX_BITS, INDEX_SHIFT)
>+
>+#define SET_INDEX(x, idx) ((x) = ((x) & ~INDEX_MASK) | ((idx) << INDEX_SHIFT))
>+#define GET_INDEX(x) (((x) & INDEX_MASK) >> INDEX_SHIFT)
>+
>+struct nr_bucket
>+{
>+ spinlock_t lock;
>+ u16 hand[2];
>+ u32 slot[NR_SLOTS];
>+} ____cacheline_aligned;
>+
>+/* The non-resident page hash table. */
>+static struct nr_bucket * nonres_table;
>+static unsigned int nonres_shift;
>+static unsigned int nonres_mask;
>+
>+/* hash the address into a bucket */
>+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
>+{
>+ unsigned long bucket;
>+ unsigned long hash;
>+
>+ hash = hash_ptr(mapping, BITS_PER_LONG);
>+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
>+ bucket = hash & nonres_mask;
>+
>+ return nonres_table + bucket;
>+}
>+
>+/* hash the address, inode and flags into a cookie */
>+/* the two msb are flags; where msb-1 is a type flag and msb a period flag */
>+static u32 nr_cookie(struct address_space * mapping, unsigned long index, unsigned int flags)
>+{
>+ u32 c;
>+ unsigned long cookie;
>+
>+ cookie = hash_ptr(mapping, BITS_PER_LONG);
>+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
>+
>+ if (mapping->host) {
>+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
>+ }
>+
>+ c = (u32)(cookie >> (BITS_PER_LONG - 32));
>+ c = (c & ~FLAGS_MASK) | ((flags << FLAGS_SHIFT) & FLAGS_MASK);
>+ return c;
>+}
>+
>+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
>+{
>+ struct nr_bucket * nr_bucket;
>+ u32 wanted, mask;
>+ unsigned int r_flags = 0;
>+ int i;
>+
>+ prefetch(mapping->host);
>+ nr_bucket = nr_hash(mapping, index);
>+
>+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
>+ wanted = nr_cookie(mapping, index, 0) & ~INDEX_MASK;
>+ mask = ~(FLAGS_MASK | INDEX_MASK);
>+
>+ spin_lock(&nr_bucket->lock);
>+ for (i = 0; i < NR_SLOTS; ++i) {
>+ if ((nr_bucket->slot[i] & mask) == wanted) {
>+ r_flags = nr_bucket->slot[i] >> FLAGS_SHIFT;
>+ r_flags |= NR_evict; /* set the MSB to mark presence */
>+ break;
>+ }
>+ }
>+ spin_unlock(&nr_bucket->lock);
>+
>+ return r_flags;
>+}
>+
>+/* flags:
>+ * logical and of the page flags (NR_filter, NR_list) and
>+ * an NR_evict target
>+ */
>+u32 remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
>+{
>+ struct nr_bucket *nr_bucket;
>+ u32 cookie;
>+ u32 *slot, *tail;
>+ unsigned int slot_pos, tail_pos;
>+
>+ prefetch(mapping->host);
>+ nr_bucket = nr_hash(mapping, index);
>+
>+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
>+ cookie = nr_cookie(mapping, index, flags);
>+
>+ flags &= NR_evict; /* removal chain */
>+ spin_lock(&nr_bucket->lock);
>+
>+ /* free a slot */
>+again:
>+ tail_pos = nr_bucket->hand[!!flags];
>+ BUG_ON(tail_pos >= NR_SLOTS);
>+ tail = &nr_bucket->slot[tail_pos];
>+ if (unlikely((*tail & NR_evict) != flags)) {
>+ flags ^= NR_evict; /* empty chain; take other one */
>+ goto again;
>+ }
>+ BUG_ON((*tail & NR_evict) != flags);
>+ /* free slot by swapping tail,tail+1, so that we skip over tail */
>+ slot_pos = GET_INDEX(*tail);
>+ BUG_ON(slot_pos >= NR_SLOTS);
>+ slot = &nr_bucket->slot[slot_pos];
>+ BUG_ON((*slot & NR_evict) != flags);
>+ if (likely(tail != slot)) *slot = xchg(tail, *slot);
>+ /* slot: -> [slot], old cookie */
>+ BUG_ON(GET_INDEX(*slot) != slot_pos);
>+
>+ flags = (cookie & NR_evict); /* insertion chain */
>+
>+ /* place cookie in empty slot */
>+ SET_INDEX(cookie, slot_pos); /* -> [slot], cookie */
>+ cookie = xchg(slot, cookie); /* slot: -> [slot], cookie */
>+
>+ /* insert slot before tail; ie. MRU pos */
>+ tail_pos = nr_bucket->hand[!!flags];
>+ BUG_ON(tail_pos >= NR_SLOTS);
>+ tail = &nr_bucket->slot[tail_pos];
>+ if (likely((*tail & NR_evict) == flags && tail != slot))
>+ *slot = xchg(tail, *slot); /* swap if not empty and not same */
>+ nr_bucket->hand[!!flags] = slot_pos;
>+
>+ spin_unlock(&nr_bucket->lock);
>+
>+ return cookie;
>+}
>+
>+/*
>+ * For interactive workloads, we remember about as many non-resident pages
>+ * as we have actual memory pages. For server workloads with large inter-
>+ * reference distances we could benefit from remembering more.
>+ */
>+static __initdata unsigned long nonresident_factor = 1;
>+void __init init_nonresident(void)
>+{
>+ int target;
>+ int i, j;
>+
>+ /*
>+ * Calculate the non-resident hash bucket target. Use a power of
>+ * two for the division because alloc_large_system_hash rounds up.
>+ */
>+ target = nr_all_pages * nonresident_factor;
>+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
>+
>+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
>+ sizeof(struct nr_bucket),
>+ target,
>+ 0,
>+ HASH_EARLY | HASH_HIGHMEM,
>+ &nonres_shift,
>+ &nonres_mask,
>+ 0);
>+
>+ for (i = 0; i < (1 << nonres_shift); i++) {
>+ spin_lock_init(&nonres_table[i].lock);
>+ nonres_table[i].hand[0] = nonres_table[i].hand[1] = 0;
>+ for (j = 0; j < NR_SLOTS; ++j) {
>+ nonres_table[i].slot[j] = NR_evict;
>+ if (j < NR_SLOTS - 1)
>+ SET_INDEX(nonres_table[i].slot[j], j+1);
>+ else /* j == NR_SLOTS - 1 */
>+ SET_INDEX(nonres_table[i].slot[j], 0);
>+ }
>+ }
>+}
>+
>+static int __init set_nonresident_factor(char * str)
>+{
>+ if (!str)
>+ return 0;
>+ nonresident_factor = simple_strtoul(str, &str, 0);
>+ return 1;
>+}
>+__setup("nonresident_factor=", set_nonresident_factor);
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/swap.c linux-2.6.13-rc6-cart/mm/swap.c
>--- linux-2.6.13-rc6/mm/swap.c 2005-03-02 08:38:07.000000000 +0100
>+++ linux-2.6.13-rc6-cart/mm/swap.c 2005-08-15 17:33:08.000000000 +0200
>@@ -30,6 +30,7 @@
> #include <linux/cpu.h>
> #include <linux/notifier.h>
> #include <linux/init.h>
>+#include <linux/cart.h>
>
> /* How many pages do we try to swap or page in/out together? */
> int page_cluster;
>@@ -107,7 +108,7 @@ void fastcall activate_page(struct page
> if (PageLRU(page) && !PageActive(page)) {
> del_page_from_inactive_list(zone, page);
> SetPageActive(page);
>- add_page_to_active_list(zone, page);
>+ update_cart_params(page);
> inc_page_state(pgactivate);
> }
> spin_unlock_irq(&zone->lru_lock);
>@@ -124,7 +125,6 @@ void fastcall mark_page_accessed(struct
> {
> if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
> activate_page(page);
>- ClearPageReferenced(page);
> } else if (!PageReferenced(page)) {
> SetPageReferenced(page);
> }
>diff -NaurpX linux-2.6.13-rc6-cart/Documentation/dontdiff -x arch -x asm-um linux-2.6.13-rc6/mm/vmscan.c linux-2.6.13-rc6-cart/mm/vmscan.c
>--- linux-2.6.13-rc6/mm/vmscan.c 2005-08-15 22:37:01.000000000 +0200
>+++ linux-2.6.13-rc6-cart/mm/vmscan.c 2005-08-15 17:33:08.000000000 +0200
>@@ -38,6 +38,7 @@
> #include <asm/div64.h>
>
> #include <linux/swapops.h>
>+#include <linux/cart.h>
>
> /* possible outcome of pageout() */
> typedef enum {
>@@ -555,6 +556,44 @@ keep:
> return reclaimed;
> }
>
>+/* This gets a page from the active_list and active_longterm lists in order to add to the incative list */
>+static int get_from_active_lists(int nr_to_scan, struct zone *zone, struct list_head *dst, int *scanned)
>+{
>+ int nr_taken = 0;
>+ struct page *page;
>+ int scan = 0;
>+ unsigned int flags;
>+
>+ while (scan++ < nr_to_scan) {
>+ flags = 0;
>+ page = replace(zone, &flags);
>+
>+ if (!page) break;
>+ BUG_ON(!TestClearPageLRU(page));
>+ BUG_ON(!flags);
>+
>+ if (get_page_testone(page)) {
>+ /*
>+ * It is being freed elsewhere
>+ */
>+ __put_page(page);
>+ SetPageLRU(page);
>+
>+ if (!(flags & NR_list))
>+ add_page_to_active_tail(zone, page);
>+ else
>+ add_page_to_active_longterm_tail(zone, page);
>+ continue;
>+ } else {
>+ list_add(&page->lru, dst);
>+ nr_taken++;
>+ }
>+ }
>+
>+ *scanned = scan;
>+ return nr_taken;
>+}
>+
> /*
> * zone->lru_lock is heavily contended. Some of the functions that
> * shrink the lists perform better by taking out a batch of pages
>@@ -705,10 +744,10 @@ refill_inactive_zone(struct zone *zone,
>
> lru_add_drain();
> spin_lock_irq(&zone->lru_lock);
>- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
>+ pgmoved = get_from_active_lists(nr_pages, zone,
> &l_hold, &pgscanned);
> zone->pages_scanned += pgscanned;
>- zone->nr_active -= pgmoved;
>+// zone->nr_active -= pgmoved;
> spin_unlock_irq(&zone->lru_lock);
>
> /*
>
>
[-- Attachment #2: linux-cartv3.patch --]
[-- Type: text/x-patch, Size: 3671 bytes --]
diff -Naur /home/rni/linux-cart.v2/include/linux/cart.h linux-cart.v3/include/linux/cart.h
--- /home/rni/linux-cart.v2/include/linux/cart.h 2005-08-16 13:58:49.000000000 -0400
+++ linux-cart.v3/include/linux/cart.h 2005-08-16 14:01:20.000000000 -0400
@@ -7,20 +7,18 @@
#define EVICTED_ACTIVE 1
#define EVICTED_LONGTERM 2
-#define ACTIVE 3
-#define ACTIVE_LONGTERM 4
+#define ACTIVE 4
+#define ACTIVE_LONGTERM 8
-#define EvictedActive(location) location & EVICTED_ACTIVE
-#define EvictedLongterm(location) location & EVICTED_LONGTERM
-#define Active(location) location & ACTIVE
-#define ActiveLongterm(location) location & ACTIVE_LONGTERM
+#define EvictedActive(location) (location & EVICTED_ACTIVE)
+#define EvictedLongterm(location) (location & EVICTED_LONGTERM)
+#define Active(location) (location & ACTIVE)
+#define ActiveLongterm(location) (location & ACTIVE_LONGTERM)
struct non_res_list_node {
struct list_head list;
struct list_head hash;
- unsigned long mapping;
- unsigned long offset;
- unsigned long inode;
+ unsigned long hashval;
};
extern void cart_init();
diff -Naur /home/rni/linux-cart.v2/include/linux/evicted_hash.h linux-cart.v3/include/linux/evicted_hash.h
--- /home/rni/linux-cart.v2/include/linux/evicted_hash.h 2005-08-16 13:58:50.000000000 -0400
+++ linux-cart.v3/include/linux/evicted_hash.h 2005-08-16 14:06:54.000000000 -0400
@@ -3,7 +3,7 @@
#include <linux/cart.h>
void hashtable_init(struct hashtable *h);
-unsigned long mk_hash(struct page *page);
+unsigned long mk_hash_page(struct page *page);
unsigned long find_in_hashtable(struct hashtable *h, struct page *page);
void add_to_hashtable (struct hashtable *, struct non_res_list_node *);
unsigned long get_inode_num(void *addr);
diff -Naur /home/rni/linux-cart.v2/mm/cart.c linux-cart.v3/mm/cart.c
--- /home/rni/linux-cart.v2/mm/cart.c 2005-08-16 13:58:40.000000000 -0400
+++ linux-cart.v3/mm/cart.c 2005-08-16 14:06:12.000000000 -0400
@@ -67,9 +67,7 @@
printk (KERN_EMERG "Couldn't get a non_res_node!\n");
return;
- node->offset = page->index;
- node->mapping = (unsigned long) page->mapping;
- node->inode = get_inode_num(page->mapping);
+ node->hashval = mk_hash_page(page);
list_add(&node->list, l);
add_to_hashtable(h, node);
diff -Naur /home/rni/linux-cart.v2/mm/evicted_hash.c linux-cart.v3/mm/evicted_hash.c
--- /home/rni/linux-cart.v2/mm/evicted_hash.c 2005-08-16 13:58:40.000000000 -0400
+++ linux-cart.v3/mm/evicted_hash.c 2005-08-16 14:05:29.000000000 -0400
@@ -34,13 +34,13 @@
/* The hashing function... a better one is needed */
static inline unsigned long mk_hash_page(struct page *page)
{
- return (((unsigned long)page->mapping ^ page->index) ^ get_inode_num(page->mapping))%HASHTABLE_SIZE;
+ return ((unsigned long)page->mapping ^ page->index) ^ get_inode_num(page->mapping);
}
/* Hashing for non resident nodes */
static inline unsigned long mk_hash_non_res(struct non_res_list_node *node)
{
- return (node->offset ^ node->mapping ^ node->inode)%HASHTABLE_SIZE;
+ return (node->hashval)%HASHTABLE_SIZE;
}
/* Search in the hash table */
@@ -49,11 +49,12 @@
unsigned long index;
struct non_res_list_node *node;
struct list_head *list;
-
- index = mk_hash_page(page);
+ unsigned long hashval = mk_hash_page(page);
+
+ index = hashval%HASHTABLE_SIZE;
list_for_each_entry(node, &h->buckets[index], hash) {
- if (node->mapping == (unsigned long)page->mapping && node->offset == page->index && node->inode == get_inode_num(page->mapping))
+ if (node->hashval == hashval)
return 1;
}
[-- Attachment #3: linux-cartv3.vanilla.patch --]
[-- Type: text/x-patch, Size: 21865 bytes --]
diff -Naur /home/rni/linux-2.6.12-rc5/include/linux/cart.h linux-cart.v3/include/linux/cart.h
--- /home/rni/linux-2.6.12-rc5/include/linux/cart.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-cart.v3/include/linux/cart.h 2005-08-16 14:01:20.000000000 -0400
@@ -0,0 +1,28 @@
+#ifndef __CART_H__
+#define __CART_H__
+#include <linux/list.h>
+#include <linux/mm.h>
+
+#define MIN_POOL 512
+
+#define EVICTED_ACTIVE 1
+#define EVICTED_LONGTERM 2
+#define ACTIVE 4
+#define ACTIVE_LONGTERM 8
+
+#define EvictedActive(location) (location & EVICTED_ACTIVE)
+#define EvictedLongterm(location) (location & EVICTED_LONGTERM)
+#define Active(location) (location & ACTIVE)
+#define ActiveLongterm(location) (location & ACTIVE_LONGTERM)
+
+struct non_res_list_node {
+ struct list_head list;
+ struct list_head hash;
+ unsigned long hashval;
+};
+
+extern void cart_init();
+void update_cart_params(struct page *);
+struct page *replace(struct zone *, int *);
+#endif
+
diff -Naur /home/rni/linux-2.6.12-rc5/include/linux/evicted_hash.h linux-cart.v3/include/linux/evicted_hash.h
--- /home/rni/linux-2.6.12-rc5/include/linux/evicted_hash.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-cart.v3/include/linux/evicted_hash.h 2005-08-16 14:06:54.000000000 -0400
@@ -0,0 +1,10 @@
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/cart.h>
+
+void hashtable_init(struct hashtable *h);
+unsigned long mk_hash_page(struct page *page);
+unsigned long find_in_hashtable(struct hashtable *h, struct page *page);
+void add_to_hashtable (struct hashtable *, struct non_res_list_node *);
+unsigned long get_inode_num(void *addr);
+
diff -Naur /home/rni/linux-2.6.12-rc5/include/linux/mm_inline.h linux-cart.v3/include/linux/mm_inline.h
--- /home/rni/linux-2.6.12-rc5/include/linux/mm_inline.h 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/include/linux/mm_inline.h 2005-08-16 13:59:51.000000000 -0400
@@ -38,3 +38,32 @@
zone->nr_inactive--;
}
}
+
+static inline void
+add_page_to_active_list_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->active_list);
+ zone->nr_active++;
+}
+
+static inline void
+add_page_to_inactive_list_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->inactive_list);
+ zone->nr_inactive++;
+}
+
+static inline void
+del_page_from_active_longterm(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ zone->nr_active_longterm--;
+}
+
+static inline void
+add_page_to_active_longterm_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->active_longterm);
+ zone->nr_active_longterm++;
+}
+
diff -Naur /home/rni/linux-2.6.12-rc5/include/linux/mmzone.h linux-cart.v3/include/linux/mmzone.h
--- /home/rni/linux-2.6.12-rc5/include/linux/mmzone.h 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/include/linux/mmzone.h 2005-08-16 13:59:50.000000000 -0400
@@ -26,6 +26,18 @@
unsigned long nr_free;
};
+/* A Hashtable for the evicted list entries
+ * This is here, and not in linux/evicted_hash.h
+ * as the hashtable is needed in struct zone, but
+ * the function prototypes for the hash table use
+ * struct page, which is as yet unrecognised here
+ */
+#define HASHTABLE_SIZE 512
+
+struct hashtable {
+ struct list_head buckets[HASHTABLE_SIZE];
+};
+
struct pglist_data;
/*
@@ -135,12 +147,24 @@
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
- struct list_head active_list;
+ struct list_head active_list; /* The T1 list of CART */
+ struct list_head active_longterm;/* The T2 list of CART */
struct list_head inactive_list;
+ struct list_head evicted_active; /* The B1 list of CART */
+ struct list_head evicted_longterm;/*B2 list of CART */
+ struct hashtable evicted_active_hashtable; /*Hash table for evicted active list */
+ struct hashtable evicted_longterm_hashtable; /*Hast table for evicted inactive list */
unsigned long nr_scan_active;
unsigned long nr_scan_inactive;
- unsigned long nr_active;
+ unsigned long nr_active;
+ unsigned long nr_active_longterm;
unsigned long nr_inactive;
+ unsigned long nr_evicted_active;
+ unsigned long nr_evicted_longterm;
+ unsigned long nr_longterm; /* number of long term pages */
+ unsigned long nr_shortterm; /* number of short term pages */
+ unsigned long p; /* p from the CART paper */
+ unsigned long q; /* q from the cart paper */
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
diff -Naur /home/rni/linux-2.6.12-rc5/include/linux/page-flags.h linux-cart.v3/include/linux/page-flags.h
--- /home/rni/linux-2.6.12-rc5/include/linux/page-flags.h 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/include/linux/page-flags.h 2005-08-16 13:59:51.000000000 -0400
@@ -58,7 +58,7 @@
#define PG_dirty 4
#define PG_lru 5
-#define PG_active 6
+#define PG_active 6
#define PG_slab 7 /* slab debug (Suparna wants this) */
#define PG_highmem 8
@@ -76,6 +76,7 @@
#define PG_reclaim 18 /* To be reclaimed asap */
#define PG_nosave_free 19 /* Free, should not be written */
#define PG_uncached 20 /* Page has been mapped as uncached */
+#define PG_longterm 21 /* Filter bit for CART see mm/cart.c */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
@@ -306,6 +307,10 @@
#define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags)
#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
+#define PageLongTerm(page) test_bit(PG_longterm, &(page)->flags)
+#define SetLongTerm(page) set_bit(PG_longterm, &(page)->flags)
+#define ClearLongTerm(page) clear_bit(PG_longterm, &(page)->flags)
+
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
diff -Naur /home/rni/linux-2.6.12-rc5/init/main.c linux-cart.v3/init/main.c
--- /home/rni/linux-2.6.12-rc5/init/main.c 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/init/main.c 2005-08-16 13:59:51.000000000 -0400
@@ -52,6 +52,8 @@
#include <asm/bugs.h>
#include <asm/setup.h>
+#include <linux/cart.h>
+
/*
* This is one of the first .c files built. Error out early
* if we have compiler trouble..
@@ -490,6 +492,7 @@
vfs_caches_init_early();
mem_init();
kmem_cache_init();
+ cart_init();
numa_policy_init();
if (late_time_init)
late_time_init();
diff -Naur /home/rni/linux-2.6.12-rc5/mm/cart.c linux-cart.v3/mm/cart.c
--- /home/rni/linux-2.6.12-rc5/mm/cart.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cart.v3/mm/cart.c 2005-08-16 14:06:12.000000000 -0400
@@ -0,0 +1,282 @@
+/* Thisfile contains the crux of the CART page replacement algorithm. This implementation however changes a few things form the classic CART scheme. This implementation splits the original active_list of the Linux implementation into two lists, namely active_list and active_longterm. The 'active' pages exist on these two lists. The active_list hopes to capture short term usage, while the active_longterm list hopes to capture long term usage. Whenever a page's state needs to be updated, the update_cart_params() function is called. The refill_incative_zone() function causes the replace() function to be evoked, resulting in the removal of pages from the active lists. Hence, which pages are deemed inactive is determined by the CART algorithm.
+For further details, please refer to the CART paper here - http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf */
+
+#include <linux/cart.h>
+#include <linux/mmzone.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/mempool.h>
+#include <linux/page-flags.h>
+#include <linux/evicted_hash.h>
+#include <linux/mm_inline.h>
+
+kmem_cache_t *evicted_node_cache;
+mempool_t *evicted_node_pool;
+
+/* Called from init/main.c to initialize the cart parameters */
+void cart_init()
+{
+ pg_data_t *pgdat;
+ struct zone *zone;
+ int i;
+
+ pgdat = pgdat_list;
+
+ do {
+ for (i=0;i<MAX_NR_ZONES;++i) {
+ zone = &pgdat->node_zones[i];
+
+ spin_lock_init(&zone->lru_lock);
+ INIT_LIST_HEAD(&zone->active_list);
+ INIT_LIST_HEAD(&zone->active_longterm);
+ INIT_LIST_HEAD(&zone->inactive_list);
+ INIT_LIST_HEAD(&zone->evicted_active);
+ INIT_LIST_HEAD(&zone->evicted_longterm);
+
+ hashtable_init(&zone->evicted_active_hashtable);
+ hashtable_init(&zone->evicted_longterm_hashtable);
+
+ zone->nr_active = zone->nr_active_longterm = zone->nr_inactive = zone->nr_evicted_active = zone->nr_evicted_longterm = 0;
+
+ zone->p = zone->q = zone->nr_longterm = zone->nr_shortterm = 0;
+ }
+ } while ((pgdat = pgdat->pgdat_next));
+
+ /* Create a slab for non resident nodes */
+ evicted_node_cache = kmem_cache_create("EvictedPool", sizeof(struct non_res_list_node), 0, 0, NULL, NULL);
+
+ if (!evicted_node_cache)
+ panic("Could not allocate evicted node cache!\n");
+
+ /* Create a mempool of preallocated objects */
+ evicted_node_pool = mempool_create(MIN_POOL, mempool_alloc_slab, mempool_free_slab, evicted_node_cache);
+
+ if (!evicted_node_pool)
+ panic("Could not allocate evicted node_pool!\n");
+
+}
+
+/* Add a node to the evicted list specified */
+void add_to_evicted(struct hashtable *h, struct list_head *l, struct page *page)
+{
+ struct non_res_list_node *node;
+ node = mempool_alloc(evicted_node_pool, GFP_ATOMIC);
+
+ if (!node)
+ /* This is usually bad news :) */
+ printk (KERN_EMERG "Couldn't get a non_res_node!\n");
+ return;
+
+ node->hashval = mk_hash_page(page);
+
+ list_add(&node->list, l);
+ add_to_hashtable(h, node);
+}
+
+/* Delete a node form the tail of an evicted list */
+void del_from_evicted_tail(struct list_head *head)
+{
+ struct list_head *list;
+ struct non_res_list_node *node;
+ if(list_empty(head))
+ return;
+
+ list = head->prev;
+ node = list_entry(list, struct non_res_list_node, list);
+ list_del(list);
+ list_del(&node->hash);
+
+ mempool_free(node, evicted_node_pool);
+}
+
+/* Remove the tail node of the evicted active list */
+void del_from_evicted_active_tail(struct zone *zone)
+{
+ del_from_evicted_tail(&zone->evicted_active);
+}
+
+/* Remove the tail node of the evicted longterm list */
+void del_from_evicted_longterm_tail(struct zone *zone)
+{
+ del_from_evicted_tail(&zone->evicted_longterm);
+}
+
+/* Add a node to the evicted active list */
+void add_to_evicted_active(struct zone *zone, struct page *page)
+{
+ add_to_evicted(&zone->evicted_active_hashtable, &zone->evicted_active, page);
+ ++zone->nr_evicted_active;
+}
+
+/* Add a node to the evicted longterm list */
+void add_to_evicted_longterm(struct zone *zone, struct page *page)
+{
+ add_to_evicted(&zone->evicted_longterm_hashtable, &zone->evicted_longterm, page);
+ ++zone->nr_evicted_longterm;
+}
+
+/* Search for a node in the evicted active list */
+unsigned long find_in_evicted_active(struct page *page)
+{
+ struct zone *zone;
+ zone = page_zone(page);
+ return find_in_hashtable(&zone->evicted_active_hashtable, page);
+}
+
+/* Search for a node in the evicted longterm list */
+unsigned long find_in_evicted_longterm(struct page *page)
+{
+ struct zone *zone;
+ zone = page_zone(page);
+ return find_in_hashtable(&zone->evicted_longterm_hashtable, page);
+}
+
+/* Look to see whether a node is in any evicted list */
+unsigned long find_in_evicted_list(struct page *page)
+{
+ if (find_in_evicted_active(page))
+ return EVICTED_ACTIVE;
+ if (find_in_evicted_longterm(page))
+ return EVICTED_LONGTERM;
+
+ return 0;
+}
+
+/* The heart of the CART update function. This function is responsible for the movement of pages across the lists */
+void update_cart_params(struct page *page)
+{
+ unsigned long location;
+ unsigned long evicted_active;
+ unsigned evicted_longterm;
+ struct zone *zone;
+
+ zone = page_zone(page);
+
+ location = find_in_evicted_list(page);
+ evicted_active = EvictedActive(location);
+ evicted_longterm = EvictedLongterm(location);
+
+ if (evicted_active) {
+ zone->p = min(zone->p + max(zone->nr_shortterm/zone->nr_evicted_active, 1), zone->pages_high);
+
+ ++zone->nr_longterm;
+ SetLongTerm(page);
+ ClearPageReferenced(page);
+ }
+ else if (evicted_longterm) {
+ zone->p = max(zone->p - max(1, zone->nr_longterm/zone->nr_evicted_longterm), 0);
+ ++zone->nr_longterm;
+ ClearPageReferenced(page);
+
+ if (zone->nr_active_longterm + zone->nr_active + zone->nr_evicted_longterm - zone->nr_shortterm >=zone->pages_high) {
+ zone->q = min(zone->q + 1, 2*zone->pages_high - zone->nr_active);
+ }
+ }
+ else {
+ ++zone->nr_shortterm;
+ ClearLongTerm(page);
+ }
+
+ add_page_to_active_list(zone, page);
+}
+
+/* The replace function. This function serches the active and longterm lists and looks for a candidate for replacement. This function selects the candidate and returns the corresponding structpage or returns NULL in case no page can be freed. The *where argument is used to indicate the parent list of the page so that, in case it cannot be written back, it can be placed back on the correct list */
+struct page *replace(struct zone *zone, int *where)
+{
+ struct list_head *list;
+ struct page *page = NULL;
+ int referenced = 0;
+ unsigned long location;
+ int debug_count=0;
+
+ list = &zone->active_longterm;
+ list = list->next;
+ while (list !=&zone->active_longterm) {
+ page = list_entry(list, struct page, lru);
+
+ if (!PageReferenced(page))
+ break;
+
+ ClearPageReferenced(page);
+ del_page_from_active_longterm(zone, page);
+ add_page_to_active_list_tail(zone, page);
+
+ if ((zone->nr_active_longterm + zone->nr_active + zone->nr_evicted_longterm - zone->nr_shortterm) >= zone->pages_high)
+ zone->q = min(zone->q + 1, 2*zone->pages_high - zone->nr_active);
+
+ list = &zone->active_longterm;
+ list = list->next;
+ debug_count++;
+ }
+
+ debug_count=0;
+ list = &zone->active_list;
+ list = list->next;
+
+ while (list != &zone->active_list) {
+ page = list_entry(list, struct page, lru);
+ referenced = PageReferenced(page);
+
+ if (!PageLongTerm(page) && !referenced)
+ break;
+
+ ClearPageReferenced(page);
+ if (referenced) {
+ del_page_from_active_list(zone, page);
+ add_page_to_active_list_tail(zone, page);
+
+ if (zone->nr_active >= min(zone->p+1, zone->nr_evicted_active) && !PageLongTerm(page)) {
+ SetLongTerm(page);
+ --zone->nr_shortterm;
+ ++zone->nr_longterm;
+ }
+ }
+ else {
+ del_page_from_active_list(zone, page);
+ add_page_to_active_longterm_tail(zone, page);
+
+ zone->q = max(zone->q-1, zone->pages_high - zone->nr_active);
+ }
+
+ list = &zone->active_list;
+ list = list->next;
+ debug_count++;
+ }
+
+ page = NULL;
+
+ if (zone->nr_active > max(1, zone->p)) {
+ if (!list_empty(&zone->active_list)) {
+ page = list_entry(zone->active_list.next, struct page, lru);
+ del_page_from_active_list(zone, page);
+ add_to_evicted_active(zone, page);
+ *where = ACTIVE;
+ }
+ }
+ else {
+ if (!list_empty(&zone->active_longterm)) {
+ page = list_entry(zone->active_longterm.next, struct page, lru);
+ del_page_from_active_longterm(zone, page);
+ --zone->nr_longterm;
+ add_to_evicted_longterm(zone, page);
+ *where = ACTIVE_LONGTERM;
+ }
+ }
+
+ if (!page)
+ return NULL;
+
+ if (*where == 0)
+ BUG();
+
+ location = find_in_evicted_list(page);
+
+ if (!location && (zone->nr_evicted_active + zone->nr_evicted_longterm == zone->pages_high+1) && ((zone->nr_evicted_active > max(0, zone->q) ||zone->nr_evicted_longterm == 0))) {
+ del_from_evicted_active_tail(zone);
+ }
+ else if (!location && (zone->nr_evicted_active + zone->nr_evicted_longterm == zone->pages_high + 1))
+ del_from_evicted_longterm_tail(zone);
+
+ return page;
+}
+
diff -Naur /home/rni/linux-2.6.12-rc5/mm/evicted_hash.c linux-cart.v3/mm/evicted_hash.c
--- /home/rni/linux-2.6.12-rc5/mm/evicted_hash.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-cart.v3/mm/evicted_hash.c 2005-08-16 14:05:29.000000000 -0400
@@ -0,0 +1,73 @@
+/* This file contains the functions required to manage the evicted list hash table. The evicted nodes are maintained as a couple of lists, but the hash table is used for speedy lookup */
+
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/cart.h>
+
+/* Initialize the hashtable */
+void hashtable_init(struct hashtable *h)
+{
+ int i;
+
+ for (i=0;i<HASHTABLE_SIZE;++i) {
+ INIT_LIST_HEAD(&h->buckets[i]);
+ }
+}
+
+/* Get the inode number of a page if it is file backed, return 0 if it is anonymous */
+unsigned long get_inode_num(void *addr)
+{
+ struct address_space *p;
+
+ if (!addr)
+ return 0;
+
+ /* If lower bit is set, then it is a anon_vma object */
+ if (((unsigned long)addr) & 0x1)
+ return 0;
+
+ p = (struct address_space *)addr;
+
+ return p->host->i_ino;
+}
+
+/* The hashing function... a better one is needed */
+static inline unsigned long mk_hash_page(struct page *page)
+{
+ return ((unsigned long)page->mapping ^ page->index) ^ get_inode_num(page->mapping);
+}
+
+/* Hashing for non resident nodes */
+static inline unsigned long mk_hash_non_res(struct non_res_list_node *node)
+{
+ return (node->hashval)%HASHTABLE_SIZE;
+}
+
+/* Search in the hash table */
+unsigned long find_in_hashtable(struct hashtable *h, struct page *page)
+{
+ unsigned long index;
+ struct non_res_list_node *node;
+ struct list_head *list;
+ unsigned long hashval = mk_hash_page(page);
+
+ index = hashval%HASHTABLE_SIZE;
+
+ list_for_each_entry(node, &h->buckets[index], hash) {
+ if (node->hashval == hashval)
+ return 1;
+ }
+
+ return 0;
+}
+
+/* Add a node to the hash table */
+void add_to_hashtable(struct hashtable *h, struct non_res_list_node *node)
+{
+ unsigned long index;
+
+ index = mk_hash_non_res(node);
+
+ list_add(&node->hash, &h->buckets[index]);
+}
+
diff -Naur /home/rni/linux-2.6.12-rc5/mm/filemap.c linux-cart.v3/mm/filemap.c
--- /home/rni/linux-2.6.12-rc5/mm/filemap.c 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/mm/filemap.c 2005-08-16 13:59:51.000000000 -0400
@@ -107,7 +107,6 @@
void __remove_from_page_cache(struct page *page)
{
struct address_space *mapping = page->mapping;
-
radix_tree_delete(&mapping->page_tree, page->index);
page->mapping = NULL;
mapping->nrpages--;
diff -Naur /home/rni/linux-2.6.12-rc5/mm/Makefile linux-cart.v3/mm/Makefile
--- /home/rni/linux-2.6.12-rc5/mm/Makefile 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/mm/Makefile 2005-08-16 13:59:51.000000000 -0400
@@ -5,7 +5,7 @@
mmu-y := nommu.o
mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \
mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
- vmalloc.o
+ vmalloc.o cart.o evicted_hash.o
obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
page_alloc.o page-writeback.o pdflush.o \
diff -Naur /home/rni/linux-2.6.12-rc5/mm/swap.c linux-cart.v3/mm/swap.c
--- /home/rni/linux-2.6.12-rc5/mm/swap.c 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/mm/swap.c 2005-08-16 13:59:51.000000000 -0400
@@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/init.h>
+#include <linux/cart.h>
/* How many pages do we try to swap or page in/out together? */
int page_cluster;
@@ -107,7 +108,7 @@
if (PageLRU(page) && !PageActive(page)) {
del_page_from_inactive_list(zone, page);
SetPageActive(page);
- add_page_to_active_list(zone, page);
+ update_cart_params(page);
inc_page_state(pgactivate);
}
spin_unlock_irq(&zone->lru_lock);
@@ -124,7 +125,6 @@
{
if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
activate_page(page);
- ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
}
diff -Naur /home/rni/linux-2.6.12-rc5/mm/vmscan.c linux-cart.v3/mm/vmscan.c
--- /home/rni/linux-2.6.12-rc5/mm/vmscan.c 2005-05-24 23:31:20.000000000 -0400
+++ linux-cart.v3/mm/vmscan.c 2005-08-16 13:59:51.000000000 -0400
@@ -38,6 +38,7 @@
#include <asm/div64.h>
#include <linux/swapops.h>
+#include <linux/cart.h>
/* possible outcome of pageout() */
typedef enum {
@@ -545,6 +546,50 @@
return reclaimed;
}
+/* This gets a page from the active_list and active_longterm lists in order to add to the incative list */
+static int get_from_active_lists(int nr_to_scan, struct zone *zone, struct list_head *dst, int *scanned)
+{
+ int nr_taken = 0;
+ struct page *page;
+ int scan = 0;
+ int location;
+
+ while (scan++ < nr_to_scan) {
+ location = 0;
+ page = replace(zone, &location);
+
+ if (!page)
+ break;
+
+ if (!TestClearPageLRU(page)) {
+ BUG();
+ }
+
+ if (!location)
+ BUG();
+
+ if (get_page_testone(page)) {
+ /*
+ * It is being freed elsewhere
+ */
+ __put_page(page);
+ SetPageLRU(page);
+
+ if (Active(location))
+ add_page_to_active_list_tail(zone, page);
+ else
+ add_page_to_active_longterm_tail(zone, page);
+ continue;
+ } else {
+ list_add(&page->lru, dst);
+ nr_taken++;
+ }
+ }
+
+ *scanned = scan;
+ return nr_taken;
+}
+
/*
* zone->lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
@@ -695,10 +740,10 @@
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
+ pgmoved = get_from_active_lists(nr_pages, zone,
&l_hold, &pgscanned);
zone->pages_scanned += pgscanned;
- zone->nr_active -= pgmoved;
+// zone->nr_active -= pgmoved;
spin_unlock_irq(&zone->lru_lock);
/*
[-- Attachment #4: linux-cartv3.partial.patch --]
[-- Type: text/x-patch, Size: 894 bytes --]
diff -Naur /home/rni/linux-cart.v2/include/linux/cart.h linux-cart.v3/include/linux/cart.h
--- /home/rni/linux-cart.v2/include/linux/cart.h 2005-08-16 13:58:49.000000000 -0400
+++ linux-cart.v3/include/linux/cart.h 2005-08-16 14:01:20.000000000 -0400
@@ -7,20 +7,18 @@
#define EVICTED_ACTIVE 1
#define EVICTED_LONGTERM 2
-#define ACTIVE 3
-#define ACTIVE_LONGTERM 4
+#define ACTIVE 4
+#define ACTIVE_LONGTERM 8
-#define EvictedActive(location) location & EVICTED_ACTIVE
-#define EvictedLongterm(location) location & EVICTED_LONGTERM
-#define Active(location) location & ACTIVE
-#define ActiveLongterm(location) location & ACTIVE_LONGTERM
+#define EvictedActive(location) (location & EVICTED_ACTIVE)
+#define EvictedLongterm(location) (location & EVICTED_LONGTERM)
+#define Active(location) (location & ACTIVE)
+#define ActiveLongterm(location) (location & ACTIVE_LONGTERM)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-16 19:53 ` Rahul Iyer
@ 2005-08-16 20:49 ` Christoph Lameter
2005-08-25 22:39 ` Peter Zijlstra
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2005-08-16 20:49 UTC (permalink / raw)
To: Rahul Iyer; +Cc: Peter Zijlstra, linux-mm, Rik van Riel, Marcelo Tosatti
Hmm. I am a bit concerned about the proliferation of counters in CART
because these may lead to bouncing cachelines.
The paper mentions some relationships between the different values.
If we had a counter for the number of pages resident (nr_rpages)
(|T1|+|T2|) then that counter would gradually approach c and then no
longer change.
Then
|T2| = nr_rpages - |T1|
Similarly if we had a counter for the number of pages on the evicted
list (nr_evicted) then that counter would also gradually approach c and
then stay constant. nr_evicted would only increase if nr_rpages has
already reached c which is another good thing to avoid bouncing
cachelines.
Then also
|B2| = nr_evicted - |B1|
Thus we could reduce the frequency of counter increments on a fully
loaded system (where nr_rpages = c and nr_eviced = c) by
calculating some variables:
#define nr_inactive (nr_rpages - nr_active)
#define nr_evicted_longterm (nr_evicted - nr_evicted_shortterm)
There is also a relationship between |S| and |L| since these attributes
are only used on resident pages.
|L| = nr_rpages - |S|
So
#define nr_longterm (nr_rpages - nr_shortterm)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-16 20:49 ` Christoph Lameter
@ 2005-08-25 22:39 ` Peter Zijlstra
2005-08-26 0:01 ` Christoph Lameter
0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-25 22:39 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Rahul Iyer, linux-mm, Rik van Riel, Marcelo Tosatti
On Tue, 2005-08-16 at 13:49 -0700, Christoph Lameter wrote:
> Hmm. I am a bit concerned about the proliferation of counters in CART
> because these may lead to bouncing cachelines.
>
> The paper mentions some relationships between the different values.
>
> If we had a counter for the number of pages resident (nr_rpages)
> (|T1|+|T2|) then that counter would gradually approach c and then no
> longer change.
>
> Then
>
> |T2| = nr_rpages - |T1|
>
> Similarly if we had a counter for the number of pages on the evicted
> list (nr_evicted) then that counter would also gradually approach c and
> then stay constant. nr_evicted would only increase if nr_rpages has
> already reached c which is another good thing to avoid bouncing
> cachelines.
>
> Then also
>
> |B2| = nr_evicted - |B1|
>
> Thus we could reduce the frequency of counter increments on a fully
> loaded system (where nr_rpages = c and nr_eviced = c) by
> calculating some variables:
>
> #define nr_inactive (nr_rpages - nr_active)
> #define nr_evicted_longterm (nr_evicted - nr_evicted_shortterm)
>
> There is also a relationship between |S| and |L| since these attributes
> are only used on resident pages.
>
> |L| = nr_rpages - |S|
>
> So
>
> #define nr_longterm (nr_rpages - nr_shortterm)
I tried to do this, however I have some problems getting nr_rpages.
#define cart_c(zone) ((zone)->present_pages - (zone)->free_pages - (zone)->nr_inactive)
/* |T2| = c - |T1| */
#define active_longterm(zone) (cart_c(zone) - (zone)->nr_active)
/* |B2| = c - |B1| */
#define evicted_longterm(zone) (cart_c(zone) - (zone)->nr_evicted_active)
/* nl = c - ns */
#define longterm(zone) (cart_c(zone) - (zone)->nr_shortterm)
This is with a rahul's 3 list approach:
active_list <-> T1,
active_longterm <-> T2
inactive_list - used for batch replace; although i'm contemplating
getting rid of the thing.
My trouble is with the definition of cart_c; I seem to over guess c.
(and miscount some, esp. shortterm, but I'm looking into that).
struct zone values:
zone->nr_active: 1645
zone->nr_inactive: 1141
zone->nr_evicted_active: 0
zone->nr_shortterm: 30526
zone->cart_p: 0
zone->cart_q: 88
zone->present_pages: 16384
zone->free_pages: 10546
zone->pages_min: 256
zone->pages_low: 320
zone->pages_high: 384
implicit values:
zone->nr_active_longterm: 3052
zone->nr_evicted_longterm: 4697
zone->nr_longterm: 4294941467
zone->cart_c: 4697
counted values:
zone->nr_active: 1549
zone->nr_shortterm: 1545
zone->nr_longterm: 4
zone->nr_active_longterm: 0
zone->nr_inactive: 1141
here nr_rpages should be:
nr_active + nr_active_longterm =
1549 + 0 = 1549
but my cart_c marco gives me:
present_pages - free_pages - nr_inactive =
16384 - 10546 - 1141 = 4697
where are those 4697 - 1549 = 3148 pages?
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-25 22:39 ` Peter Zijlstra
@ 2005-08-26 0:01 ` Christoph Lameter
2005-08-26 3:59 ` Rahul Iyer
2005-08-26 7:09 ` Peter Zijlstra
0 siblings, 2 replies; 22+ messages in thread
From: Christoph Lameter @ 2005-08-26 0:01 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Rahul Iyer, linux-mm, Rik van Riel, Marcelo Tosatti
On Fri, 26 Aug 2005, Peter Zijlstra wrote:
> This is with a rahul's 3 list approach:
> active_list <-> T1,
> active_longterm <-> T2
longterm == T2? That wont work. longterm (L) is composed of T2 and a
subset of T1.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-26 0:01 ` Christoph Lameter
@ 2005-08-26 3:59 ` Rahul Iyer
2005-08-26 7:09 ` Peter Zijlstra
1 sibling, 0 replies; 22+ messages in thread
From: Rahul Iyer @ 2005-08-26 3:59 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Peter Zijlstra, linux-mm, Rik van Riel, Marcelo Tosatti
Christoph Lameter wrote:
>On Fri, 26 Aug 2005, Peter Zijlstra wrote:
>
>
>
>>This is with a rahul's 3 list approach:
>> active_list <-> T1,
>> active_longterm <-> T2
>>
>>
>
>longterm == T2? That wont work. longterm (L) is composed of T2 and a
>subset of T1.
>
>
>
This is probably named a bit wrong... active_longterm is meant to be the
T2 list, not the list of all longterm pages. The LongTerm bit in the
page flags defines whether the page is longterm or not.
thanks
rahul
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-26 0:01 ` Christoph Lameter
2005-08-26 3:59 ` Rahul Iyer
@ 2005-08-26 7:09 ` Peter Zijlstra
2005-08-26 12:24 ` Rik van Riel
1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-26 7:09 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Rahul Iyer, linux-mm, Rik van Riel, Marcelo Tosatti
On Thu, 2005-08-25 at 17:01 -0700, Christoph Lameter wrote:
> On Fri, 26 Aug 2005, Peter Zijlstra wrote:
>
> > This is with a rahul's 3 list approach:
> > active_list <-> T1,
> > active_longterm <-> T2
>
> longterm == T2? That wont work. longterm (L) is composed of T2 and a
> subset of T1.
As Rahul said, this is a misnomer, the list is actually used as T2,
but my question remains should (on a stock kernel):
zone->present_pages - zone->free_pages =
zone->nr_active + zone->nr_inactive
Or is there some other place the pages can go?
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-26 7:09 ` Peter Zijlstra
@ 2005-08-26 12:24 ` Rik van Riel
0 siblings, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2005-08-26 12:24 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Christoph Lameter, Rahul Iyer, linux-mm, Marcelo Tosatti
On Fri, 26 Aug 2005, Peter Zijlstra wrote:
> zone->present_pages - zone->free_pages =
> zone->nr_active + zone->nr_inactive
>
> Or is there some other place the pages can go?
Slab cache, page tables, ...
--
All Rights Reversed
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Zoned CART
2005-08-14 12:58 ` Peter Zijlstra
2005-08-15 21:31 ` Peter Zijlstra
@ 2005-08-26 21:03 ` Peter Zijlstra
2005-08-27 19:46 ` [RFC][PATCH] " Peter Zijlstra
1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-26 21:03 UTC (permalink / raw)
To: linux-mm; +Cc: Rik van Riel, Marcelo Tosatti, Rahul Iyer
[-- Attachment #1: Type: text/plain, Size: 471 bytes --]
I redid most of the cart code today. I went back to the 2 lists approach
instead of the 3 lists Rahul has.
Attached is my current code. I'm still off on the shortterm (n_s) count
but I'll get there.
Also this thing livelocks the kernel under severe swap pressure,
shrink_cache just doesn't make any progress.
I'll look into these two issues tomorrow after some sleep :-)
Any comments are ofcourse welcome.
Kind regards,
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
[-- Attachment #2: cart-mk2-1.patch --]
[-- Type: text/x-patch, Size: 37959 bytes --]
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -233,6 +233,20 @@ static struct file_operations proc_zonei
.release = seq_release,
};
+extern struct seq_operations cart_op;
+static int cart_open(struct inode *inode, struct file *file)
+{
+ (void)inode;
+ return seq_open(file, &cart_op);
+}
+
+static struct file_operations cart_file_operations = {
+ .open = cart_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
static int version_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
@@ -602,6 +616,7 @@ void __init proc_misc_init(void)
create_seq_entry("interrupts", 0, &proc_interrupts_operations);
create_seq_entry("slabinfo",S_IWUSR|S_IRUGO,&proc_slabinfo_operations);
create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+ create_seq_entry("cart",S_IRUGO, &cart_file_operations);
create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);
create_seq_entry("zoneinfo",S_IRUGO, &proc_zoneinfo_file_operations);
create_seq_entry("diskstats", 0, &proc_diskstats_operations);
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -31,10 +31,28 @@ static inline void
del_page_from_lru(struct zone *zone, struct page *page)
{
list_del(&page->lru);
- if (PageActive(page)) {
- ClearPageActive(page);
+ if (TestClearPageActive(page)) {
zone->nr_active--;
} else {
zone->nr_inactive--;
}
+ if (TestClearPageLongTerm(page)) {
+ /* zone->nr_longterm--; */
+ } else {
+ zone->nr_shortterm--;
+ }
+}
+
+static inline void
+add_page_to_active_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->active_list);
+ zone->nr_active++;
+}
+
+static inline void
+add_page_to_inactive_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->inactive_list);
+ zone->nr_inactive++;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -144,12 +144,16 @@ struct zone {
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
- struct list_head active_list;
- struct list_head inactive_list;
+ struct list_head active_list; /* The T1 list of CART */
+ struct list_head inactive_list; /* The T2 list of CART */
unsigned long nr_scan_active;
unsigned long nr_scan_inactive;
- unsigned long nr_active;
+ unsigned long nr_active;
unsigned long nr_inactive;
+ unsigned long nr_evicted_active;
+ unsigned long nr_shortterm; /* number of short term pages */
+ unsigned long cart_p; /* p from the CART paper */
+ unsigned long cart_q; /* q from the cart paper */
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -76,6 +76,8 @@
#define PG_nosave_free 18 /* Free, should not be written */
#define PG_uncached 19 /* Page has been mapped as uncached */
+#define PG_longterm 20 /* Filter bit for CART see mm/cart.c */
+
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
* allowed.
@@ -305,6 +307,12 @@ extern void __mod_page_state(unsigned lo
#define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags)
#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
+#define PageLongTerm(page) test_bit(PG_longterm, &(page)->flags)
+#define SetPageLongTerm(page) set_bit(PG_longterm, &(page)->flags)
+#define TestSetPageLongTerm(page) test_and_set_bit(PG_longterm, &(page)->flags)
+#define ClearPageLongTerm(page) clear_bit(PG_longterm, &(page)->flags)
+#define TestClearPageLongTerm(page) test_and_clear_bit(PG_longterm, &(page)->flags)
+
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
diff --git a/include/linux/swap.h b/include/linux/swap.h
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -154,6 +154,23 @@ extern void out_of_memory(unsigned int _
/* linux/mm/memory.c */
extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
+/* linux/mm/nonresident.c */
+#define NR_filter 0x01 /* short/long */
+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
+#define NR_evict 0x80000000
+
+extern u32 remember_page(struct address_space *, unsigned long, unsigned int);
+extern unsigned int recently_evicted(struct address_space *, unsigned long);
+extern void init_nonresident(void);
+
+/* linux/mm/cart.c */
+extern void cart_init(void);
+extern void cart_insert(struct zone*, struct page *, int);
+extern struct page *cart_replace(struct zone *, unsigned int *);
+
+#define lru_cache_add(page) cart_insert(page_zone((page)), (page), 0)
+#define add_page_to_cart(zone, page) cart_insert((zone), (page), 1)
+
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalhigh_pages;
@@ -164,9 +181,8 @@ extern unsigned int nr_free_buffer_pages
extern unsigned int nr_free_pagecache_pages(void);
/* linux/mm/swap.c */
-extern void FASTCALL(lru_cache_add(struct page *));
+extern void FASTCALL(lru_cache_add_inactive(struct page *));
extern void FASTCALL(lru_cache_add_active(struct page *));
-extern void FASTCALL(activate_page(struct page *));
extern void FASTCALL(mark_page_accessed(struct page *));
extern void lru_add_drain(void);
extern int rotate_reclaimable_page(struct page *page);
@@ -292,6 +308,11 @@ static inline swp_entry_t get_swap_page(
#define grab_swap_token() do { } while(0)
#define has_swap_token(x) 0
+/* linux/mm/nonresident.c */
+#define init_nonresident() do { } while (0)
+#define remember_page(x,y,z) 0
+#define recently_evicted(x,y) 0
+
#endif /* CONFIG_SWAP */
#endif /* __KERNEL__*/
#endif /* _LINUX_SWAP_H */
diff --git a/init/main.c b/init/main.c
--- a/init/main.c
+++ b/init/main.c
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/swap.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -494,7 +495,9 @@ asmlinkage void __init start_kernel(void
}
#endif
vfs_caches_init_early();
+ init_nonresident();
mem_init();
+ cart_init();
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
diff --git a/mm/Makefile b/mm/Makefile
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -12,7 +12,8 @@ obj-y := bootmem.o filemap.o mempool.o
readahead.o slab.o swap.o truncate.o vmscan.o \
prio_tree.o $(mmu-y)
-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
+ nonresident.o cart.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
obj-$(CONFIG_SPARSEMEM) += sparse.o
diff --git a/mm/cart.c b/mm/cart.c
new file mode 100644
--- /dev/null
+++ b/mm/cart.c
@@ -0,0 +1,287 @@
+/* For further details, please refer to the CART paper here -
+ * http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf
+ *
+ * Modified by Peter Zijlstra to work with the nonresident code I adapted
+ * from Rik van Riel.
+ *
+ * XXX: add page accounting
+ */
+
+#include <linux/swap.h>
+#include <linux/mm.h>
+#include <linux/page-flags.h>
+#include <linux/mm_inline.h>
+
+#define cart_cT(zone) ((zone)->nr_active + (zone)->nr_inactive)
+#define cart_cB(zone) ((zone)->nr_active + (zone)->nr_inactive + (zone)->free_pages)
+
+#define nr_T1(zone) ((zone)->nr_active)
+#define nr_T2(zone) ((zone)->nr_inactive)
+
+#define list_T1(zone) (&(zone)->active_list)
+#define list_T2(zone) (&(zone)->inactive_list)
+
+#define cart_p(zone) ((zone)->cart_p)
+#define cart_q(zone) ((zone)->cart_q)
+
+#define nr_B1(zone) ((zone)->nr_evicted_active)
+#define nr_B2(zone) (cart_cB(zone) - nr_B1(zone))
+
+#define nr_Ns(zone) ((zone)->nr_shortterm)
+#define nr_Nl(zone) (cart_cT(zone) - nr_Ns(zone))
+
+/* Called from init/main.c to initialize the cart parameters */
+void cart_init()
+{
+ struct zone *zone;
+ for_each_zone(zone) {
+ zone->nr_evicted_active = 0;
+ /* zone->nr_evicted_inactive = cart_cB(zone); */
+ zone->nr_shortterm = 0;
+ /* zone->nr_longterm = 0; */
+ zone->cart_p = 0;
+ zone->cart_q = 0;
+ }
+}
+
+static inline void cart_q_inc(struct zone *zone)
+{
+ /* if (|T2| + |B2| + |T1| - ns >= c) q = min(q + 1, 2c - |T1|) */
+ if (nr_T2(zone) + nr_B2(zone) + nr_T1(zone) - nr_Ns(zone) >= cart_cB(zone))
+ cart_q(zone) = min(cart_q(zone) + 1, 2*cart_cB(zone) - nr_T1(zone));
+}
+
+static inline void cart_q_dec(struct zone *zone)
+{
+ /* q = max(q - 1, c - |T1|) */
+ unsigned long target = cart_cB(zone) - nr_T1(zone);
+ if (cart_q(zone) <= target)
+ cart_q(zone) = target;
+ else
+ --cart_q(zone);
+}
+
+void cart_insert(struct zone *zone, struct page *page, int direct)
+{
+ unsigned int rflags;
+ unsigned int on_B1, on_B2;
+
+ rflags = recently_evicted(page_mapping(page), page_index(page));
+ on_B1 = (rflags && !(rflags & NR_list));
+ on_B2 = (rflags && (rflags & NR_list));
+
+ if (on_B1) {
+ /* p = min(p + max(1, ns/|B1|), c) */
+ unsigned long ratio = nr_Ns(zone) / (nr_B1(zone) ?: 1UL);
+ cart_p(zone) += ratio ?: 1UL;
+ if (unlikely(cart_p(zone) > cart_cT(zone)))
+ cart_p(zone) = cart_cT(zone);
+
+ SetPageLongTerm(page);
+ /* ++nr_Nl(zone); */
+ } else if (on_B2) {
+ /* p = max(p - max(1, nl/|B2|), 0) */
+ unsigned long ratio = nr_Nl(zone) / (nr_B2(zone) ?: 1UL);
+ cart_p(zone) -= ratio ?: 1UL;
+ if (unlikely(cart_p(zone) > cart_cT(zone))) /* unsigned; wrap around */
+ cart_p(zone) = 0UL;
+
+ SetPageLongTerm(page);
+ /* NOTE: this function is the only one that uses recently_evicted()
+ * and it does not use the NR_filter flag; we could live without,
+ * for now use as sanity check
+ */
+// BUG_ON(!(rflags & NR_filter)); /* all pages in B2 are longterm */
+
+ /* ++nr_Nl(zone); */
+ cart_q_inc(zone);
+ } else {
+ ClearPageLongTerm(page);
+ ++nr_Ns(zone);
+ }
+
+ ClearPageReferenced(page);
+ if (direct) {
+ SetPageActive(page);
+ add_page_to_active_list(zone, page);
+ BUG_ON(!PageLRU(page));
+ } else lru_cache_add_active(page);
+}
+
+/* This function selects the candidate and returns the corresponding
+ * struct page * or returns NULL in case no page can be freed.
+ * The *where argument is used to indicate the parent list of the page
+ * so that, in case it cannot be written back, it can be placed back on
+ * the correct list
+ */
+struct page *cart_replace(struct zone *zone, unsigned int *where)
+{
+ struct list_head *list;
+ struct page *page = NULL;
+ int referenced;
+ unsigned int flags, rflags;
+
+ while (!list_empty(list_T2(zone))) {
+ page = list_entry(list_T2(zone)->next, struct page, lru);
+
+ if (!TestClearPageReferenced(page))
+ break;
+
+ del_page_from_inactive_list(zone, page);
+ add_page_to_active_tail(zone, page);
+ SetPageActive(page);
+
+ cart_q_inc(zone);
+ }
+
+ while (!list_empty(list_T1(zone))) {
+ page = list_entry(list_T1(zone)->next, struct page, lru);
+ referenced = TestClearPageReferenced(page);
+
+ if (!PageLongTerm(page) && !referenced)
+ break;
+
+ if (referenced) {
+ del_page_from_active_list(zone, page);
+ add_page_to_active_tail(zone, page);
+
+ if (nr_T1(zone) >= min(cart_p(zone) + 1, nr_B1(zone)) &&
+ !PageLongTerm(page)) {
+ SetPageLongTerm(page);
+ --nr_Ns(zone);
+ /* ++nr_Nl(zone); */
+ }
+ } else {
+ del_page_from_active_list(zone, page);
+ add_page_to_inactive_tail(zone, page);
+ ClearPageActive(page);
+
+ cart_q_dec(zone);
+ }
+ }
+
+ page = NULL;
+ if (nr_T1(zone) > max(1UL, cart_p(zone))) {
+ page = list_entry(list_T1(zone)->next, struct page, lru);
+ del_page_from_active_list(zone, page);
+ --nr_Ns(zone);
+ ++nr_B1(zone);
+ flags = PageLongTerm(page) ? NR_filter : 0;
+ } else {
+ if (!list_empty(list_T2(zone))) {
+ page = list_entry(list_T2(zone)->next, struct page, lru);
+ del_page_from_inactive_list(zone, page);
+ /* --nr_Nl(zone); */
+ /* ++nr_B1(zone); */
+ flags = NR_list | NR_filter;
+ }
+ }
+ if (!page) return NULL;
+ *where = flags;
+
+ /* history replacement; always remember, if the page was already remembered
+ * this will move it to the head. XXX: not so; fix this !!
+ *
+ * Assume |B1| + |B2| == c + 1, since |B1_j| + |B2_j| := c_j.
+ * The list_empty check is done on the Bn_j size.
+ */
+ /* |B1| <= max(0, q) */
+ if (nr_B1(zone) <= cart_q(zone)) flags |= NR_evict;
+
+ rflags = remember_page(page_mapping(page), page_index(page), flags);
+ if (rflags & NR_evict) {
+ /* if (likely(nr_B2(zone))) --nr_B2(zone); */
+ } else {
+ if (likely(nr_B1(zone))) --nr_B1(zone);
+ }
+
+ return page;
+}
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static void *stats_start(struct seq_file *m, loff_t *pos)
+{
+ if (*pos < 0 || *pos > 1)
+ return NULL;
+
+ lru_add_drain();
+
+ return pos;
+}
+
+static void *stats_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ return NULL;
+}
+
+static void stats_stop(struct seq_file *m, void *arg)
+{
+}
+
+static int stats_show(struct seq_file *m, void *arg)
+{
+ struct zone *zone;
+ for_each_zone(zone) {
+ spin_lock_irq(&zone->lru_lock);
+ seq_printf(m, "\n\n======> zone: %lu <=====\n", (unsigned long)zone);
+ seq_printf(m, "struct zone values:\n");
+ seq_printf(m, " zone->nr_active: %lu\n", zone->nr_active);
+ seq_printf(m, " zone->nr_inactive: %lu\n", zone->nr_inactive);
+ seq_printf(m, " zone->nr_evicted_active: %lu\n", zone->nr_evicted_active);
+ seq_printf(m, " zone->nr_shortterm: %lu\n", zone->nr_shortterm);
+ seq_printf(m, " zone->cart_p: %lu\n", zone->cart_p);
+ seq_printf(m, " zone->cart_q: %lu\n", zone->cart_q);
+ seq_printf(m, " zone->present_pages: %lu\n", zone->present_pages);
+ seq_printf(m, " zone->free_pages: %lu\n", zone->free_pages);
+ seq_printf(m, " zone->pages_min: %lu\n", zone->pages_min);
+ seq_printf(m, " zone->pages_low: %lu\n", zone->pages_low);
+ seq_printf(m, " zone->pages_high: %lu\n", zone->pages_high);
+
+ seq_printf(m, "\n");
+ seq_printf(m, "implicit values:\n");
+ seq_printf(m, " zone->nr_evicted_longterm: %lu\n", nr_B2(zone));
+ seq_printf(m, " zone->nr_longterm: %lu\n", nr_Nl(zone));
+ seq_printf(m, " zone->cart_c: %lu\n", cart_cT(zone));
+
+ seq_printf(m, "\n");
+ seq_printf(m, "counted values:\n");
+
+ {
+ struct page *page;
+ unsigned long active = 0, shortterm = 0, longterm = 0;
+ list_for_each_entry(page, &zone->active_list, lru) {
+ ++active;
+ if (PageLongTerm(page)) ++longterm;
+ else ++shortterm;
+ }
+ seq_printf(m, " zone->nr_active: %lu\n", active);
+ seq_printf(m, " zone->nr_shortterm: %lu\n", shortterm);
+ seq_printf(m, " zone->nr_longterm: %lu\n", longterm); // XXX: should add zone->inactive
+ }
+
+ {
+ struct page *page;
+ unsigned long inactive = 0;
+ list_for_each_entry(page, &zone->inactive_list, lru) {
+ ++inactive;
+ }
+ seq_printf(m, " zone->nr_inactive: %lu\n", inactive);
+ }
+
+ spin_unlock_irq(&zone->lru_lock);
+ }
+
+ return 0;
+}
+
+struct seq_operations cart_op = {
+ .start = stats_start,
+ .next = stats_next,
+ .stop = stats_stop,
+ .show = stats_show,
+};
+
+#endif /* CONFIG_PROC_FS */
diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -723,7 +723,6 @@ void do_generic_mapping_read(struct addr
unsigned long offset;
unsigned long last_index;
unsigned long next_index;
- unsigned long prev_index;
loff_t isize;
struct page *cached_page;
int error;
@@ -732,7 +731,6 @@ void do_generic_mapping_read(struct addr
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
- prev_index = ra.prev_page;
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
@@ -779,13 +777,7 @@ page_ok:
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
- /*
- * When (part of) the same page is read multiple times
- * in succession, only mark it as accessed the first time.
- */
- if (prev_index != index)
- mark_page_accessed(page);
- prev_index = index;
+ mark_page_accessed(page);
/*
* Ok, we have the page, and it's up-to-date, so
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1304,7 +1304,8 @@ static int do_wp_page(struct mm_struct *
page_remove_rmap(old_page);
flush_cache_page(vma, address, pfn);
break_cow(vma, new_page, address, page_table);
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
+ SetPageReferenced(new_page);
page_add_anon_rmap(new_page, vma, address);
/* Free the old page.. */
@@ -1782,7 +1783,7 @@ do_anonymous_page(struct mm_struct *mm,
entry = maybe_mkwrite(pte_mkdirty(mk_pte(page,
vma->vm_page_prot)),
vma);
- lru_cache_add_active(page);
+ lru_cache_add(page);
SetPageReferenced(page);
page_add_anon_rmap(page, vma, addr);
}
@@ -1903,7 +1904,8 @@ retry:
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
set_pte_at(mm, address, page_table, entry);
if (anon) {
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
+ SetPageReferenced(new_page);
page_add_anon_rmap(new_page, vma, address);
} else
page_add_file_rmap(new_page);
diff --git a/mm/nonresident.c b/mm/nonresident.c
new file mode 100644
--- /dev/null
+++ b/mm/nonresident.c
@@ -0,0 +1,254 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
+ * like algorithms.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling remember_page(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling recently_evicted()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ *
+ *
+ * Modified to work with ARC like algorithms who:
+ * - need to balance two FIFOs; |b1| + |b2| = c,
+ * - keep a flag per non-resident page.
+ *
+ * The bucket contains two single linked cyclic lists (CLOCKS) and each
+ * clock has a tail hand. By selecting a victim clock upon insertion it
+ * is possible to balance them.
+ *
+ * The slot looks like this:
+ * struct slot_t {
+ * u32 cookie : 24; // LSB
+ * u32 index : 6;
+ * u32 filter : 1;
+ * u32 clock : 1; // MSB
+ * };
+ *
+ * The bucket is guarded by a spinlock.
+ */
+#include <linux/swap.h>
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+#define TARGET_SLOTS 64
+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES)
+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
+#if 0
+#if NR_SLOTS < (TARGET_SLOTS / 2)
+#warning very small slot size
+#if NR_SLOTS <= 0
+#error no room for slots left
+#endif
+#endif
+#endif
+
+#define BUILD_MASK(bits, shift) (((1 << (bits)) - 1) << (shift))
+
+#define FLAGS_BITS 2
+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
+#define FLAGS_MASK BUILD_MASK(FLAGS_BITS, FLAGS_SHIFT)
+
+#define INDEX_BITS 6 /* ceil(log2(NR_SLOTS)) */
+#define INDEX_SHIFT (FLAGS_SHIFT - INDEX_BITS)
+#define INDEX_MASK BUILD_MASK(INDEX_BITS, INDEX_SHIFT)
+
+#define SET_INDEX(x, idx) ((x) = ((x) & ~INDEX_MASK) | ((idx) << INDEX_SHIFT))
+#define GET_INDEX(x) (((x) & INDEX_MASK) >> INDEX_SHIFT)
+
+struct nr_bucket
+{
+ spinlock_t lock;
+ u16 hand[2];
+ u32 slot[NR_SLOTS];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+/* hash the address into a bucket */
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = hash_ptr(mapping, BITS_PER_LONG);
+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+ bucket = hash & nonres_mask;
+
+ return nonres_table + bucket;
+}
+
+/* hash the address, inode and flags into a cookie */
+/* the two msb are flags; where msb-1 is a type flag and msb a period flag */
+static u32 nr_cookie(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ u32 c;
+ unsigned long cookie;
+
+ cookie = hash_ptr(mapping, BITS_PER_LONG);
+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+ if (mapping && mapping->host) {
+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+ }
+
+ c = (u32)(cookie >> (BITS_PER_LONG - 32));
+ c = (c & ~FLAGS_MASK) | ((flags << FLAGS_SHIFT) & FLAGS_MASK);
+ return c;
+}
+
+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 wanted, mask;
+ unsigned int r_flags = 0;
+ int i;
+ unsigned long iflags;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
+ wanted = nr_cookie(mapping, index, 0) & ~INDEX_MASK;
+ mask = ~(FLAGS_MASK | INDEX_MASK);
+
+ spin_lock_irqsave(&nr_bucket->lock, iflags);
+ for (i = 0; i < NR_SLOTS; ++i) {
+ if ((nr_bucket->slot[i] & mask) == wanted) {
+ r_flags = nr_bucket->slot[i] >> FLAGS_SHIFT;
+ r_flags |= NR_evict; /* set the MSB to mark presence */
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&nr_bucket->lock, iflags);
+
+ return r_flags;
+}
+
+/* flags:
+ * logical and of the page flags (NR_filter, NR_list) and
+ * an NR_evict target
+ */
+u32 remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ struct nr_bucket *nr_bucket;
+ u32 cookie;
+ u32 *slot, *tail;
+ unsigned int slot_pos, tail_pos;
+ unsigned long iflags;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
+ cookie = nr_cookie(mapping, index, flags);
+
+ flags &= NR_evict; /* removal chain */
+ spin_lock_irqsave(&nr_bucket->lock, iflags);
+
+ /* free a slot */
+again:
+ tail_pos = nr_bucket->hand[!!flags];
+ BUG_ON(tail_pos >= NR_SLOTS);
+ tail = &nr_bucket->slot[tail_pos];
+ if (unlikely((*tail & NR_evict) != flags)) {
+ flags ^= NR_evict; /* empty chain; take other one */
+ goto again;
+ }
+ BUG_ON((*tail & NR_evict) != flags);
+ /* free slot by swapping tail,tail+1, so that we skip over tail */
+ slot_pos = GET_INDEX(*tail);
+ BUG_ON(slot_pos >= NR_SLOTS);
+ slot = &nr_bucket->slot[slot_pos];
+ BUG_ON((*slot & NR_evict) != flags);
+ if (likely(tail != slot)) *slot = xchg(tail, *slot);
+ /* slot: -> [slot], old cookie */
+ BUG_ON(GET_INDEX(*slot) != slot_pos);
+
+ flags = (cookie & NR_evict); /* insertion chain */
+
+ /* place cookie in empty slot */
+ SET_INDEX(cookie, slot_pos); /* -> [slot], cookie */
+ cookie = xchg(slot, cookie); /* slot: -> [slot], cookie */
+
+ /* insert slot before tail; ie. MRU pos */
+ tail_pos = nr_bucket->hand[!!flags];
+ BUG_ON(tail_pos >= NR_SLOTS);
+ tail = &nr_bucket->slot[tail_pos];
+ if (likely((*tail & NR_evict) == flags && tail != slot))
+ *slot = xchg(tail, *slot); /* swap if not empty and not same */
+ nr_bucket->hand[!!flags] = slot_pos;
+
+ spin_unlock_irqrestore(&nr_bucket->lock, iflags);
+
+ return cookie;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init init_nonresident(void)
+{
+ int target;
+ int i, j;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++) {
+ spin_lock_init(&nonres_table[i].lock);
+ nonres_table[i].hand[0] = nonres_table[i].hand[1] = 0;
+ for (j = 0; j < NR_SLOTS; ++j) {
+ nonres_table[i].slot[j] = NR_evict;
+ if (j < NR_SLOTS - 1)
+ SET_INDEX(nonres_table[i].slot[j], j+1);
+ else /* j == NR_SLOTS - 1 */
+ SET_INDEX(nonres_table[i].slot[j], 0);
+ }
+ }
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+
+__setup("nonresident_factor=", set_nonresident_factor);
diff --git a/mm/shmem.c b/mm/shmem.c
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1500,11 +1500,8 @@ static void do_shmem_file_read(struct fi
*/
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
- /*
- * Mark the page accessed if we read the beginning.
- */
- if (!offset)
- mark_page_accessed(page);
+
+ mark_page_accessed(page);
} else
page = ZERO_PAGE(0);
diff --git a/mm/swap.c b/mm/swap.c
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -78,8 +78,8 @@ int rotate_reclaimable_page(struct page
return 1;
if (PageDirty(page))
return 1;
- if (PageActive(page))
- return 1;
+ /* if (PageActive(page)) */
+ /* return 1; */
if (!PageLRU(page))
return 1;
@@ -97,37 +97,12 @@ int rotate_reclaimable_page(struct page
}
/*
- * FIXME: speed this up?
- */
-void fastcall activate_page(struct page *page)
-{
- struct zone *zone = page_zone(page);
-
- spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page) && !PageActive(page)) {
- del_page_from_inactive_list(zone, page);
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- inc_page_state(pgactivate);
- }
- spin_unlock_irq(&zone->lru_lock);
-}
-
-/*
* Mark a page as having seen activity.
- *
- * inactive,unreferenced -> inactive,referenced
- * inactive,referenced -> active,unreferenced
- * active,unreferenced -> active,referenced
*/
void fastcall mark_page_accessed(struct page *page)
{
- if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
- activate_page(page);
- ClearPageReferenced(page);
- } else if (!PageReferenced(page)) {
+ if (!PageReferenced(page))
SetPageReferenced(page);
- }
}
EXPORT_SYMBOL(mark_page_accessed);
@@ -139,7 +114,7 @@ EXPORT_SYMBOL(mark_page_accessed);
static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
-void fastcall lru_cache_add(struct page *page)
+void fastcall lru_cache_add_inactive(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
@@ -303,6 +278,8 @@ void __pagevec_lru_add(struct pagevec *p
}
if (TestSetPageLRU(page))
BUG();
+ if (TestClearPageActive(page))
+ BUG();
add_page_to_inactive_list(zone, page);
}
if (zone)
diff --git a/mm/swap_state.c b/mm/swap_state.c
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -359,7 +359,7 @@ struct page *read_swap_cache_async(swp_e
/*
* Initiate read into locked page and return.
*/
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
swap_readpage(NULL, new_page);
return new_page;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -408,7 +408,7 @@ static void unuse_pte(struct vm_area_str
* Move the page to the active list so it is not
* immediately swapped out again after swapon.
*/
- activate_page(page);
+ SetPageReferenced(page);
}
static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
@@ -508,7 +508,7 @@ static int unuse_mm(struct mm_struct *mm
* Activate page so shrink_cache is unlikely to unmap its
* ptes while lock is dropped, so swapoff can make progress.
*/
- activate_page(page);
+ SetPageReferenced(page);
unlock_page(page);
down_read(&mm->mmap_sem);
lock_page(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -235,27 +235,6 @@ static int shrink_slab(unsigned long sca
return ret;
}
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
- struct address_space *mapping;
-
- /* Page is in somebody's page tables. */
- if (page_mapped(page))
- return 1;
-
- /* Be more reluctant to reclaim swapcache than pagecache */
- if (PageSwapCache(page))
- return 1;
-
- mapping = page_mapping(page);
- if (!mapping)
- return 0;
-
- /* File is mmap'd by somebody? */
- return mapping_mapped(mapping);
-}
-
static inline int is_page_cache_freeable(struct page *page)
{
return page_count(page) - !!PagePrivate(page) == 2;
@@ -397,7 +376,7 @@ static int shrink_list(struct list_head
if (TestSetPageLocked(page))
goto keep;
- BUG_ON(PageActive(page));
+ /* BUG_ON(PageActive(page)); */
sc->nr_scanned++;
/* Double the slab pressure for mapped and swapcache pages */
@@ -408,8 +387,8 @@ static int shrink_list(struct list_head
goto keep_locked;
referenced = page_referenced(page, 1, sc->priority <= 0);
- /* In active use or really unfreeable? Activate it. */
- if (referenced && page_mapping_inuse(page))
+
+ if (referenced)
goto activate_locked;
#ifdef CONFIG_SWAP
@@ -532,6 +511,7 @@ static int shrink_list(struct list_head
__put_page(page);
free_it:
+ ClearPageActive(page);
unlock_page(page);
reclaimed++;
if (!pagevec_add(&freed_pvec, page))
@@ -554,7 +534,7 @@ keep:
sc->nr_reclaimed += reclaimed;
return reclaimed;
}
-
+
/*
* zone->lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
@@ -566,33 +546,36 @@ keep:
* Appropriate locks must be held before calling this function.
*
* @nr_to_scan: The number of pages to look through on the list.
- * @src: The LRU list to pull pages off.
+ * @zone: The zone to get pages from.
* @dst: The temp list to put pages on to.
* @scanned: The number of pages that were scanned.
*
* returns how many pages were moved onto *@dst.
*/
-static int isolate_lru_pages(int nr_to_scan, struct list_head *src,
+static int isolate_lru_pages(int nr_to_scan, struct zone *zone,
struct list_head *dst, int *scanned)
{
int nr_taken = 0;
struct page *page;
int scan = 0;
+ unsigned int flags;
- while (scan++ < nr_to_scan && !list_empty(src)) {
- page = lru_to_page(src);
- prefetchw_prev_lru_page(page, src, flags);
+ while (scan++ < nr_to_scan) {
+ page = cart_replace(zone, &flags);
+ if (!page) break;
if (!TestClearPageLRU(page))
BUG();
- list_del(&page->lru);
if (get_page_testone(page)) {
/*
* It is being freed elsewhere
*/
__put_page(page);
SetPageLRU(page);
- list_add(&page->lru, src);
+ if (!(flags & NR_list))
+ add_page_to_inactive_tail(zone, page);
+ else
+ add_page_to_active_tail(zone, page);
continue;
} else {
list_add(&page->lru, dst);
@@ -624,8 +607,7 @@ static void shrink_cache(struct zone *zo
int nr_freed;
nr_taken = isolate_lru_pages(sc->swap_cluster_max,
- &zone->inactive_list,
- &page_list, &nr_scan);
+ zone, &page_list, &nr_scan);
zone->nr_inactive -= nr_taken;
zone->pages_scanned += nr_scan;
spin_unlock_irq(&zone->lru_lock);
@@ -670,194 +652,34 @@ done:
}
/*
- * This moves pages from the active list to the inactive list.
- *
- * We move them the other way if the page is referenced by one or more
- * processes, from rmap.
- *
- * If the pages are mostly unmapped, the processing is fast and it is
- * appropriate to hold zone->lru_lock across the whole operation. But if
- * the pages are mapped, the processing is slow (page_referenced()) so we
- * should drop zone->lru_lock around each page. It's impossible to balance
- * this, so instead we remove the pages from the LRU while processing them.
- * It is safe to rely on PG_active against the non-LRU pages in here because
- * nobody will play with that bit on a non-LRU page.
- *
- * The downside is that we have to touch page->_count against each page.
- * But we had to alter page->flags anyway.
- */
-static void
-refill_inactive_zone(struct zone *zone, struct scan_control *sc)
-{
- int pgmoved;
- int pgdeactivate = 0;
- int pgscanned;
- int nr_pages = sc->nr_to_scan;
- LIST_HEAD(l_hold); /* The pages which were snipped off */
- LIST_HEAD(l_inactive); /* Pages to go onto the inactive_list */
- LIST_HEAD(l_active); /* Pages to go onto the active_list */
- struct page *page;
- struct pagevec pvec;
- int reclaim_mapped = 0;
- long mapped_ratio;
- long distress;
- long swap_tendency;
-
- lru_add_drain();
- spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
- &l_hold, &pgscanned);
- zone->pages_scanned += pgscanned;
- zone->nr_active -= pgmoved;
- spin_unlock_irq(&zone->lru_lock);
-
- /*
- * `distress' is a measure of how much trouble we're having reclaiming
- * pages. 0 -> no problems. 100 -> great trouble.
- */
- distress = 100 >> zone->prev_priority;
-
- /*
- * The point of this algorithm is to decide when to start reclaiming
- * mapped memory instead of just pagecache. Work out how much memory
- * is mapped.
- */
- mapped_ratio = (sc->nr_mapped * 100) / total_memory;
-
- /*
- * Now decide how much we really want to unmap some pages. The mapped
- * ratio is downgraded - just because there's a lot of mapped memory
- * doesn't necessarily mean that page reclaim isn't succeeding.
- *
- * The distress ratio is important - we don't want to start going oom.
- *
- * A 100% value of vm_swappiness overrides this algorithm altogether.
- */
- swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-
- /*
- * Now use this metric to decide whether to start moving mapped memory
- * onto the inactive list.
- */
- if (swap_tendency >= 100)
- reclaim_mapped = 1;
-
- while (!list_empty(&l_hold)) {
- cond_resched();
- page = lru_to_page(&l_hold);
- list_del(&page->lru);
- if (page_mapped(page)) {
- if (!reclaim_mapped ||
- (total_swap_pages == 0 && PageAnon(page)) ||
- page_referenced(page, 0, sc->priority <= 0)) {
- list_add(&page->lru, &l_active);
- continue;
- }
- }
- list_add(&page->lru, &l_inactive);
- }
-
- pagevec_init(&pvec, 1);
- pgmoved = 0;
- spin_lock_irq(&zone->lru_lock);
- while (!list_empty(&l_inactive)) {
- page = lru_to_page(&l_inactive);
- prefetchw_prev_lru_page(page, &l_inactive, flags);
- if (TestSetPageLRU(page))
- BUG();
- if (!TestClearPageActive(page))
- BUG();
- list_move(&page->lru, &zone->inactive_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
- zone->nr_inactive += pgmoved;
- spin_unlock_irq(&zone->lru_lock);
- pgdeactivate += pgmoved;
- pgmoved = 0;
- if (buffer_heads_over_limit)
- pagevec_strip(&pvec);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- zone->nr_inactive += pgmoved;
- pgdeactivate += pgmoved;
- if (buffer_heads_over_limit) {
- spin_unlock_irq(&zone->lru_lock);
- pagevec_strip(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
-
- pgmoved = 0;
- while (!list_empty(&l_active)) {
- page = lru_to_page(&l_active);
- prefetchw_prev_lru_page(page, &l_active, flags);
- if (TestSetPageLRU(page))
- BUG();
- BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->active_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
- zone->nr_active += pgmoved;
- pgmoved = 0;
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- zone->nr_active += pgmoved;
- spin_unlock_irq(&zone->lru_lock);
- pagevec_release(&pvec);
-
- mod_page_state_zone(zone, pgrefill, pgscanned);
- mod_page_state(pgdeactivate, pgdeactivate);
-}
-
-/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
static void
shrink_zone(struct zone *zone, struct scan_control *sc)
{
unsigned long nr_active;
- unsigned long nr_inactive;
/*
* Add one to `nr_to_scan' just to make sure that the kernel will
* slowly sift through the active list.
*/
- zone->nr_scan_active += (zone->nr_active >> sc->priority) + 1;
+ zone->nr_scan_active += ((zone->nr_active + zone->nr_inactive) >> sc->priority) + 1;
nr_active = zone->nr_scan_active;
if (nr_active >= sc->swap_cluster_max)
zone->nr_scan_active = 0;
else
nr_active = 0;
- zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
- nr_inactive = zone->nr_scan_inactive;
- if (nr_inactive >= sc->swap_cluster_max)
- zone->nr_scan_inactive = 0;
- else
- nr_inactive = 0;
sc->nr_to_reclaim = sc->swap_cluster_max;
- while (nr_active || nr_inactive) {
- if (nr_active) {
- sc->nr_to_scan = min(nr_active,
- (unsigned long)sc->swap_cluster_max);
- nr_active -= sc->nr_to_scan;
- refill_inactive_zone(zone, sc);
- }
-
- if (nr_inactive) {
- sc->nr_to_scan = min(nr_inactive,
- (unsigned long)sc->swap_cluster_max);
- nr_inactive -= sc->nr_to_scan;
- shrink_cache(zone, sc);
- if (sc->nr_to_reclaim <= 0)
- break;
- }
+ while (nr_active) {
+ sc->nr_to_scan = min(nr_active,
+ (unsigned long)sc->swap_cluster_max);
+ nr_active -= sc->nr_to_scan;
+ shrink_cache(zone, sc);
+ if (sc->nr_to_reclaim <= 0)
+ break;
}
throttle_vm_writeout();
^ permalink raw reply [flat|nested] 22+ messages in thread
* [RFC][PATCH] Re: Zoned CART
2005-08-26 21:03 ` Peter Zijlstra
@ 2005-08-27 19:46 ` Peter Zijlstra
0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2005-08-27 19:46 UTC (permalink / raw)
To: linux-mm; +Cc: Rik van Riel, Marcelo Tosatti, Rahul Iyer
[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]
Hi All,
After another day of hard work I feel I have this CART implementation
complete.
It survives a pounding and the stats seem pretty stable.
The things that need more work:
1) the hash function seems pretty lousy
2) __cart_remember() called from shrink_list() needs zone->lru_lock
The whole non-resident code is based on the idea that the hash function
gives an even spread so that:
B1_j B1
------ ~ ----
B2_j B2
However after a pounding the variance in (B1_j - B2_j) as given by the
std. deviation: sqrt(<x^2> - <x>^2) is around 10. And this for a bucket
with 57 slots.
The other issue is that __cart_remember() needs the zone->lru_lock. This
function is called from shrink_list() where the lock is explicitly
avoided, so this seems like an issue. Alternatives would be atomic_t for
zone->nr_q or a per cpu counter delta. Suggestions?
Also I made quite some changes in swap.c and vmscan.c without being an
expert on the code. Did I foul up too bad?
Then ofcourse I need to benchmark, suggestions?
Any comments appreciated.
Kind regards,
Peter Zijlstra
fs/exec.c | 2
fs/proc/proc_misc.c | 30 +++
include/linux/mm_inline.h | 22 ++
include/linux/mmzone.h | 10 -
include/linux/page-flags.h | 8 +
include/linux/swap.h | 33 ++++
init/main.c | 3
mm/Makefile | 3
mm/cart.c | 329 ++++++++++++++++++++++++++++++++++++++++++
mm/filemap.c | 10 -
mm/memory.c | 8 -
mm/nonresident.c | 348 +++++++++++++++++++++++++++++++++++++++++++++
mm/shmem.c | 7
mm/swap.c | 84 +---------
mm/swap_state.c | 2
mm/swapfile.c | 4
mm/vmscan.c | 228 ++---------------------------
17 files changed, 819 insertions(+), 312 deletions(-)
--
Peter Zijlstra <a.p.zijlstra@chello.nl>
[-- Attachment #2: cart-mk2-2.patch --]
[-- Type: text/x-patch, Size: 44195 bytes --]
diff --git a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -331,7 +331,7 @@ void install_arg_page(struct vm_area_str
goto out;
}
inc_mm_counter(mm, rss);
- lru_cache_add_active(page);
+ lru_cache_add(page);
set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
page, vma->vm_page_prot))));
page_add_anon_rmap(page, vma, address);
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -233,6 +233,34 @@ static struct file_operations proc_zonei
.release = seq_release,
};
+extern struct seq_operations cart_op;
+static int cart_open(struct inode *inode, struct file *file)
+{
+ (void)inode;
+ return seq_open(file, &cart_op);
+}
+
+static struct file_operations cart_file_operations = {
+ .open = cart_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+extern struct seq_operations nonresident_op;
+static int nonresident_open(struct inode *inode, struct file *file)
+{
+ (void)inode;
+ return seq_open(file, &nonresident_op);
+}
+
+static struct file_operations nonresident_file_operations = {
+ .open = nonresident_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
static int version_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
@@ -602,6 +630,8 @@ void __init proc_misc_init(void)
create_seq_entry("interrupts", 0, &proc_interrupts_operations);
create_seq_entry("slabinfo",S_IWUSR|S_IRUGO,&proc_slabinfo_operations);
create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+ create_seq_entry("cart",S_IRUGO, &cart_file_operations);
+ create_seq_entry("nonresident",S_IRUGO, &nonresident_file_operations);
create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);
create_seq_entry("zoneinfo",S_IRUGO, &proc_zoneinfo_file_operations);
create_seq_entry("diskstats", 0, &proc_diskstats_operations);
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -31,10 +31,28 @@ static inline void
del_page_from_lru(struct zone *zone, struct page *page)
{
list_del(&page->lru);
- if (PageActive(page)) {
- ClearPageActive(page);
+ if (TestClearPageActive(page)) {
zone->nr_active--;
} else {
zone->nr_inactive--;
}
+ if (TestClearPageLongTerm(page)) {
+ /* zone->nr_longterm--; */
+ } else {
+ zone->nr_shortterm--;
+ }
+}
+
+static inline void
+add_page_to_active_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->active_list);
+ zone->nr_active++;
+}
+
+static inline void
+add_page_to_inactive_tail(struct zone *zone, struct page *page)
+{
+ list_add_tail(&page->lru, &zone->inactive_list);
+ zone->nr_inactive++;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -143,13 +143,17 @@ struct zone {
ZONE_PADDING(_pad1_)
/* Fields commonly accessed by the page reclaim scanner */
- spinlock_t lru_lock;
- struct list_head active_list;
- struct list_head inactive_list;
+ spinlock_t lru_lock;
+ struct list_head active_list; /* The T1 list of CART */
+ struct list_head inactive_list; /* The T2 list of CART */
unsigned long nr_scan_active;
unsigned long nr_scan_inactive;
unsigned long nr_active;
unsigned long nr_inactive;
+ unsigned long nr_evicted_active;
+ unsigned long nr_shortterm; /* number of short term pages */
+ unsigned long nr_p; /* p from the CART paper */
+ unsigned long nr_q; /* q from the cart paper */
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -76,6 +76,8 @@
#define PG_nosave_free 18 /* Free, should not be written */
#define PG_uncached 19 /* Page has been mapped as uncached */
+#define PG_longterm 20 /* Filter bit for CART see mm/cart.c */
+
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
* allowed.
@@ -305,6 +307,12 @@ extern void __mod_page_state(unsigned lo
#define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags)
#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
+#define PageLongTerm(page) test_bit(PG_longterm, &(page)->flags)
+#define SetPageLongTerm(page) set_bit(PG_longterm, &(page)->flags)
+#define TestSetPageLongTerm(page) test_and_set_bit(PG_longterm, &(page)->flags)
+#define ClearPageLongTerm(page) clear_bit(PG_longterm, &(page)->flags)
+#define TestClearPageLongTerm(page) test_and_clear_bit(PG_longterm, &(page)->flags)
+
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
diff --git a/include/linux/swap.h b/include/linux/swap.h
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -7,6 +7,7 @@
#include <linux/mmzone.h>
#include <linux/list.h>
#include <linux/sched.h>
+#include <linux/mm.h>
#include <asm/atomic.h>
#include <asm/page.h>
@@ -154,6 +155,31 @@ extern void out_of_memory(unsigned int _
/* linux/mm/memory.c */
extern void swapin_readahead(swp_entry_t, unsigned long, struct vm_area_struct *);
+/* linux/mm/nonresident.c */
+#define NR_filter 0x01 /* short/long */
+#define NR_list 0x02 /* b1/b2; correlates to PG_active */
+#define NR_evict 0x80000000
+
+extern unsigned int remember_page(struct address_space *, unsigned long, unsigned int);
+extern unsigned int recently_evicted(struct address_space *, unsigned long);
+extern void init_nonresident(void);
+
+/* linux/mm/cart.c */
+extern void cart_init(void);
+extern void __cart_insert(struct zone *, struct page *);
+extern struct page *__cart_replace(struct zone *);
+extern void __cart_reinsert(struct zone *, struct page*);
+extern void __cart_remember(struct zone *, struct page*);
+
+static inline void cart_remember(struct page *page)
+{
+ unsigned long flags;
+ struct zone *zone = page_zone(page);
+ spin_lock_irqsave(&zone->lru_lock, flags);
+ __cart_remember(zone, page);
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+}
+
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalhigh_pages;
@@ -165,8 +191,6 @@ extern unsigned int nr_free_pagecache_pa
/* linux/mm/swap.c */
extern void FASTCALL(lru_cache_add(struct page *));
-extern void FASTCALL(lru_cache_add_active(struct page *));
-extern void FASTCALL(activate_page(struct page *));
extern void FASTCALL(mark_page_accessed(struct page *));
extern void lru_add_drain(void);
extern int rotate_reclaimable_page(struct page *page);
@@ -292,6 +316,11 @@ static inline swp_entry_t get_swap_page(
#define grab_swap_token() do { } while(0)
#define has_swap_token(x) 0
+/* linux/mm/nonresident.c */
+#define init_nonresident() do { } while (0)
+#define remember_page(x,y,z) 0
+#define recently_evicted(x,y) 0
+
#endif /* CONFIG_SWAP */
#endif /* __KERNEL__*/
#endif /* _LINUX_SWAP_H */
diff --git a/init/main.c b/init/main.c
--- a/init/main.c
+++ b/init/main.c
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/swap.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -494,7 +495,9 @@ asmlinkage void __init start_kernel(void
}
#endif
vfs_caches_init_early();
+ init_nonresident();
mem_init();
+ cart_init();
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
diff --git a/mm/Makefile b/mm/Makefile
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -12,7 +12,8 @@ obj-y := bootmem.o filemap.o mempool.o
readahead.o slab.o swap.o truncate.o vmscan.o \
prio_tree.o $(mmu-y)
-obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o \
+ nonresident.o cart.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
obj-$(CONFIG_SPARSEMEM) += sparse.o
diff --git a/mm/cart.c b/mm/cart.c
new file mode 100644
--- /dev/null
+++ b/mm/cart.c
@@ -0,0 +1,329 @@
+/* For further details, please refer to the CART paper here -
+ * http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf
+ *
+ * Modified by Peter Zijlstra to work with the nonresident code I adapted
+ * from Rik van Riel.
+ *
+ * XXX: add page accounting
+ */
+
+#include <linux/swap.h>
+#include <linux/mm.h>
+#include <linux/page-flags.h>
+#include <linux/mm_inline.h>
+#include <linux/rmap.h>
+
+#define cart_cT ((zone)->nr_active + (zone)->nr_inactive)
+#define cart_cB ((zone)->present_pages)
+
+#define size_T1 ((zone)->nr_active)
+#define size_T2 ((zone)->nr_inactive)
+
+#define list_T1 (&(zone)->active_list)
+#define list_T2 (&(zone)->inactive_list)
+
+#define cart_p ((zone)->nr_p)
+#define cart_q ((zone)->nr_q)
+
+#define size_B1 ((zone)->nr_evicted_active)
+#define size_B2 (cart_cB - size_B1)
+
+#define nr_Ns ((zone)->nr_shortterm)
+#define nr_Nl (cart_cT - nr_Ns)
+
+#define T2B(x) (((x) * cart_cB) / (cart_cT + 1))
+#define B2T(x) (((x) * cart_cT) / cart_cB)
+
+/* Called from init/main.c to initialize the cart parameters */
+void cart_init()
+{
+ struct zone *zone;
+ for_each_zone(zone) {
+ zone->nr_evicted_active = 0;
+ /* zone->nr_evicted_inactive = cart_cB; */
+ zone->nr_shortterm = 0;
+ /* zone->nr_longterm = 0; */
+ zone->nr_p = 0;
+ zone->nr_q = 0;
+ }
+}
+
+static inline void cart_q_inc(struct zone *zone)
+{
+ /* if (|T2| + |B2| + |T1| - ns >= c) q = min(q + 1, 2c - |T1|) */
+ if (size_T2 + B2T(size_B2) + size_T1 - nr_Ns >= cart_cT)
+ cart_q = min(cart_q + 1, 2*cart_cB - T2B(size_T1));
+}
+
+static inline void cart_q_dec(struct zone *zone)
+{
+ /* q = max(q - 1, c - |T1|) */
+ unsigned long target = cart_cB - T2B(size_T1);
+ if (cart_q <= target)
+ cart_q = target;
+ else
+ --cart_q;
+}
+
+/*
+ * zone->lru_lock taken
+ */
+void __cart_insert(struct zone *zone, struct page *page)
+{
+ unsigned int rflags;
+ unsigned int on_B1, on_B2;
+
+ rflags = recently_evicted(page_mapping(page), page_index(page));
+ on_B1 = (rflags && !(rflags & NR_list));
+ on_B2 = (rflags && (rflags & NR_list));
+
+ if (on_B1) {
+ /* p = min(p + max(1, ns/|B1|), c) */
+ unsigned long ratio = nr_Ns / (B2T(size_B1) + 1);
+ cart_p += ratio ?: 1UL;
+ if (unlikely(cart_p > cart_cT))
+ cart_p = cart_cT;
+
+ SetPageLongTerm(page);
+ /* ++nr_Nl; */
+ } else if (on_B2) {
+ /* p = max(p - max(1, nl/|B2|), 0) */
+ unsigned long ratio = nr_Nl / (B2T(size_B2) + 1);
+ cart_p -= ratio ?: 1UL;
+ if (unlikely(cart_p > cart_cT)) /* unsigned; wrap around */
+ cart_p = 0UL;
+
+ SetPageLongTerm(page);
+ /* NOTE: this function is the only one that uses recently_evicted()
+ * and it does not use the NR_filter flag; we could live without,
+ * for now use as sanity check
+ */
+ BUG_ON(!(rflags & NR_filter)); /* all pages in B2 are longterm */
+
+ /* ++nr_Nl; */
+ cart_q_inc(zone);
+ } else {
+ ClearPageLongTerm(page);
+ ++nr_Ns;
+ }
+
+ ClearPageReferenced(page);
+ SetPageActive(page);
+ add_page_to_active_list(zone, page);
+ BUG_ON(!PageLRU(page));
+}
+
+/* This function selects the candidate and returns the corresponding
+ * struct page * or returns NULL in case no page can be freed.
+ */
+struct page *__cart_replace(struct zone *zone)
+{
+ struct page *page;
+ int referenced;
+
+ while (!list_empty(list_T2)) {
+ page = list_entry(list_T2->next, struct page, lru);
+
+ if (!page_referenced(page, 0, 0))
+ break;
+
+ del_page_from_inactive_list(zone, page);
+ add_page_to_active_tail(zone, page);
+ SetPageActive(page);
+
+ cart_q_inc(zone);
+ }
+
+ while (!list_empty(list_T1)) {
+ page = list_entry(list_T1->next, struct page, lru);
+ referenced = page_referenced(page, 0, 0);
+
+ if (!PageLongTerm(page) && !referenced)
+ break;
+
+ if (referenced) {
+ del_page_from_active_list(zone, page);
+ add_page_to_active_tail(zone, page);
+
+ /* ( |T1| >= min(p + 1, |B1| ) and ( filter = 'S' ) */
+ if (size_T1 >= min(cart_p + 1, B2T(size_B1)) &&
+ !PageLongTerm(page)) {
+ SetPageLongTerm(page);
+ --nr_Ns;
+ /* ++nr_Nl; */
+ }
+ } else {
+ BUG_ON(!PageLongTerm(page));
+
+ del_page_from_active_list(zone, page);
+ add_page_to_inactive_tail(zone, page);
+ ClearPageActive(page);
+
+ cart_q_dec(zone);
+ }
+ }
+
+ page = NULL;
+ if (size_T1 > max(1UL, cart_p) || list_empty(list_T2)) {
+ if (!list_empty(list_T1)) {
+ page = list_entry(list_T1->next, struct page, lru);
+ del_page_from_active_list(zone, page);
+ BUG_ON(PageLongTerm(page));
+ --nr_Ns;
+ }
+ } else {
+ BUG_ON(list_empty(list_T2));
+ page = list_entry(list_T2->next, struct page, lru);
+ del_page_from_inactive_list(zone, page);
+ /* --nr_Nl; */
+ }
+ if (!page) return NULL;
+
+ return page;
+}
+
+/* re-insert pages that were elected for replacement but somehow didn't make it
+ * treat as referenced to let the relaim path make progress.
+ */
+void __cart_reinsert(struct zone *zone, struct page *page )
+{
+ if (!PageLongTerm(page)) ++nr_Ns;
+
+ if (!PageActive(page)) { /* T2 */
+ SetPageActive(page);
+ add_page_to_active_tail(zone, page);
+
+ cart_q_inc(zone);
+ } else { /* T1 */
+ add_page_to_active_tail(zone, page);
+
+ /* ( |T1| >= min(p + 1, |B1| ) and ( filter = 'S' ) */
+ if (size_T1 >= min(cart_p + 1, B2T(size_B1)) &&
+ !PageLongTerm(page)) {
+ SetPageLongTerm(page);
+ --nr_Ns;
+ /* ++nr_Nl; */
+ }
+ }
+}
+
+/* puts pages on the non-resident lists on swap-out
+ * XXX: lose the reliance on zone->lru_lock !!!
+ */
+void __cart_remember(struct zone *zone, struct page *page)
+{
+ unsigned int rflags;
+ unsigned int flags = 0;
+
+ if (!PageActive(page)) {
+ flags |= NR_list;
+ /* ++size_B2; */
+ } else
+ ++size_B1;
+
+ if (PageLongTerm(page))
+ flags |= NR_filter;
+
+ /* history replacement; always remember, if the page was already remembered
+ * this will move it to the head. XXX: not so; fix this !!
+ *
+ * Assume |B1| + |B2| == c + 1, since |B1_j| + |B2_j| := c_j.
+ * The list_empty check is done on the Bn_j side.
+ */
+ /* |B1| <= max(0, q) */
+ if (size_B1 <= cart_q) flags |= NR_evict;
+
+ rflags = remember_page(page_mapping(page), page_index(page), flags);
+
+ if (rflags & NR_list) {
+ /* if (likely(size_B2)) --size_B2; */
+ } else {
+ if (likely(size_B1)) --size_B1;
+ }
+}
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static void *stats_start(struct seq_file *m, loff_t *pos)
+{
+ if (*pos != 0)
+ return NULL;
+
+ lru_add_drain();
+
+ return pos;
+}
+
+static void *stats_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ return NULL;
+}
+
+static void stats_stop(struct seq_file *m, void *arg)
+{
+}
+
+static int stats_show(struct seq_file *m, void *arg)
+{
+ struct zone *zone;
+ for_each_zone(zone) {
+ spin_lock_irq(&zone->lru_lock);
+ seq_printf(m, "\n\n======> zone: %lu <=====\n", (unsigned long)zone);
+ seq_printf(m, "struct zone values:\n");
+ seq_printf(m, " zone->nr_active: %lu\n", zone->nr_active);
+ seq_printf(m, " zone->nr_inactive: %lu\n", zone->nr_inactive);
+ seq_printf(m, " zone->nr_evicted_active: %lu\n", zone->nr_evicted_active);
+ seq_printf(m, " zone->nr_shortterm: %lu\n", zone->nr_shortterm);
+ seq_printf(m, " zone->cart_p: %lu\n", zone->nr_p);
+ seq_printf(m, " zone->cart_q: %lu\n", zone->nr_q);
+ seq_printf(m, " zone->present_pages: %lu\n", zone->present_pages);
+ seq_printf(m, " zone->free_pages: %lu\n", zone->free_pages);
+ seq_printf(m, " zone->pages_min: %lu\n", zone->pages_min);
+ seq_printf(m, " zone->pages_low: %lu\n", zone->pages_low);
+ seq_printf(m, " zone->pages_high: %lu\n", zone->pages_high);
+
+ seq_printf(m, "\n");
+ seq_printf(m, "implicit values:\n");
+ seq_printf(m, " zone->nr_evicted_longterm: %lu\n", size_B2);
+ seq_printf(m, " zone->nr_longterm: %lu\n", nr_Nl);
+ seq_printf(m, " zone->cart_c: %lu\n", cart_cT);
+
+ seq_printf(m, "\n");
+ seq_printf(m, "counted values:\n");
+
+ {
+ struct page *page;
+ unsigned long active = 0, s1 = 0, l1 = 0;
+ unsigned long inactive = 0, s2 = 0, l2 = 0;
+ list_for_each_entry(page, &zone->active_list, lru) {
+ ++active;
+ if (PageLongTerm(page)) ++l1;
+ else ++s1;
+ }
+ list_for_each_entry(page, &zone->inactive_list, lru) {
+ ++inactive;
+ if (PageLongTerm(page)) ++l2;
+ else ++s2;
+ }
+ seq_printf(m, " zone->nr_active: %lu (%lu, %lu)\n", active, s1, l1);
+ seq_printf(m, " zone->nr_inactive: %lu (%lu, %lu)\n", inactive, s2, l2);
+ seq_printf(m, " zone->nr_shortterm: %lu\n", s1+s2);
+ seq_printf(m, " zone->nr_longterm: %lu\n", l1+l2);
+ }
+
+ spin_unlock_irq(&zone->lru_lock);
+ }
+
+ return 0;
+}
+
+struct seq_operations cart_op = {
+ .start = stats_start,
+ .next = stats_next,
+ .stop = stats_stop,
+ .show = stats_show,
+};
+
+#endif /* CONFIG_PROC_FS */
diff --git a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -723,7 +723,6 @@ void do_generic_mapping_read(struct addr
unsigned long offset;
unsigned long last_index;
unsigned long next_index;
- unsigned long prev_index;
loff_t isize;
struct page *cached_page;
int error;
@@ -732,7 +731,6 @@ void do_generic_mapping_read(struct addr
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
- prev_index = ra.prev_page;
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
@@ -779,13 +777,7 @@ page_ok:
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
- /*
- * When (part of) the same page is read multiple times
- * in succession, only mark it as accessed the first time.
- */
- if (prev_index != index)
- mark_page_accessed(page);
- prev_index = index;
+ mark_page_accessed(page);
/*
* Ok, we have the page, and it's up-to-date, so
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1304,7 +1304,7 @@ static int do_wp_page(struct mm_struct *
page_remove_rmap(old_page);
flush_cache_page(vma, address, pfn);
break_cow(vma, new_page, address, page_table);
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
page_add_anon_rmap(new_page, vma, address);
/* Free the old page.. */
@@ -1782,8 +1782,7 @@ do_anonymous_page(struct mm_struct *mm,
entry = maybe_mkwrite(pte_mkdirty(mk_pte(page,
vma->vm_page_prot)),
vma);
- lru_cache_add_active(page);
- SetPageReferenced(page);
+ lru_cache_add(page);
page_add_anon_rmap(page, vma, addr);
}
@@ -1903,7 +1902,8 @@ retry:
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
set_pte_at(mm, address, page_table, entry);
if (anon) {
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
+ SetPageReferenced(new_page);
page_add_anon_rmap(new_page, vma, address);
} else
page_add_file_rmap(new_page);
diff --git a/mm/nonresident.c b/mm/nonresident.c
new file mode 100644
--- /dev/null
+++ b/mm/nonresident.c
@@ -0,0 +1,348 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
+ * like algorithms.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling remember_page(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling recently_evicted()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ *
+ *
+ * Modified to work with ARC like algorithms who:
+ * - need to balance two FIFOs; |b1| + |b2| = c,
+ * - keep a flag per non-resident page.
+ *
+ * The bucket contains two single linked cyclic lists (CLOCKS) and each
+ * clock has a tail hand. By selecting a victim clock upon insertion it
+ * is possible to balance them.
+ *
+ * The slot looks like this:
+ * struct slot_t {
+ * u32 cookie : 24; // LSB
+ * u32 index : 6;
+ * u32 filter : 1;
+ * u32 clock : 1; // MSB
+ * };
+ *
+ * The bucket is guarded by a spinlock.
+ */
+#include <linux/swap.h>
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+#define TARGET_SLOTS 64
+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES)
+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 2*sizeof(u16)) / sizeof(u32))
+#if 0
+#if NR_SLOTS < (TARGET_SLOTS / 2)
+#warning very small slot size
+#if NR_SLOTS <= 0
+#error no room for slots left
+#endif
+#endif
+#endif
+
+#define BUILD_MASK(bits, shift) (((1 << (bits)) - 1) << (shift))
+
+#define FLAGS_BITS 2
+#define FLAGS_SHIFT (sizeof(u32)*8 - FLAGS_BITS)
+#define FLAGS_MASK BUILD_MASK(FLAGS_BITS, FLAGS_SHIFT)
+
+#define SET_FLAGS(x, flg) ((x) = ((x) & ~FLAGS_MASK) | ((flg) << FLAGS_SHIFT))
+#define GET_FLAGS(x) (((x) & FLAGS_MASK) >> FLAGS_SHIFT)
+
+#define INDEX_BITS 6 /* ceil(log2(NR_SLOTS)) */
+#define INDEX_SHIFT (FLAGS_SHIFT - INDEX_BITS)
+#define INDEX_MASK BUILD_MASK(INDEX_BITS, INDEX_SHIFT)
+
+#define SET_INDEX(x, idx) ((x) = ((x) & ~INDEX_MASK) | ((idx) << INDEX_SHIFT))
+#define GET_INDEX(x) (((x) & INDEX_MASK) >> INDEX_SHIFT)
+
+struct nr_bucket
+{
+ spinlock_t lock;
+ u16 hand[2];
+ u32 slot[NR_SLOTS];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+/* hash the address into a bucket */
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = hash_ptr(mapping, BITS_PER_LONG);
+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+ bucket = hash & nonres_mask;
+
+ return nonres_table + bucket;
+}
+
+/* hash the address and inode into a cookie */
+static u32 nr_cookie(struct address_space * mapping, unsigned long index)
+{
+ unsigned long cookie;
+
+ cookie = hash_ptr(mapping, BITS_PER_LONG);
+ cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+ if (mapping && mapping->host) {
+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+ }
+
+ return (u32)(cookie >> (BITS_PER_LONG - 32));
+}
+
+unsigned int recently_evicted(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 wanted, mask;
+ unsigned int r_flags = 0;
+ int i;
+ unsigned long iflags;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
+ mask = ~(FLAGS_MASK | INDEX_MASK);
+ wanted = nr_cookie(mapping, index) & mask;
+
+ spin_lock_irqsave(&nr_bucket->lock, iflags);
+ for (i = 0; i < NR_SLOTS; ++i) {
+ if ((nr_bucket->slot[i] & mask) == wanted) {
+ r_flags = GET_FLAGS(nr_bucket->slot[i]);
+ r_flags |= NR_evict; /* set the MSB to mark presence */
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&nr_bucket->lock, iflags);
+
+ return r_flags;
+}
+
+/* flags:
+ * logical and of the page flags (NR_filter, NR_list) and
+ * an NR_evict target
+ *
+ * remove current (b from 'abc'):
+ *
+ * initial swap(2,3)
+ *
+ * 1: -> [2],a 1: -> [2],a
+ * * 2: -> [3],b 2: -> [1],c
+ * 3: -> [1],c * 3: -> [3],b
+ *
+ * 3 is now free for use.
+ *
+ *
+ * insert before (d before b in 'abc')
+ *
+ * initial set 4 swap(2,4)
+ *
+ * 1: -> [2],a 1: -> [2],a 1: -> [2],a
+ * * 2: -> [3],b 2: -> [3],b 2: -> [4],d
+ * 3: -> [1],c 3: -> [1],c 3: -> [1],c
+ * 4: nil 4: -> [4],d * 4: -> [3],b
+ *
+ * leaving us with 'adbc'.
+ */
+unsigned int remember_page(struct address_space * mapping, unsigned long index, unsigned int flags)
+{
+ struct nr_bucket *nr_bucket;
+ u32 cookie;
+ u32 *slot, *tail;
+ unsigned int slot_pos, tail_pos;
+ unsigned long iflags;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
+ cookie = nr_cookie(mapping, index);
+ SET_FLAGS(cookie, flags);
+
+ flags &= NR_evict; /* removal chain */
+ spin_lock_irqsave(&nr_bucket->lock, iflags);
+
+ /* free a slot */
+again:
+ tail_pos = nr_bucket->hand[!!flags];
+ BUG_ON(tail_pos >= NR_SLOTS);
+ tail = &nr_bucket->slot[tail_pos];
+ if (unlikely((*tail & NR_evict) != flags)) {
+ flags ^= NR_evict; /* empty chain; take other one */
+ goto again;
+ }
+ BUG_ON((*tail & NR_evict) != flags);
+ /* free slot by swapping tail,tail+1, so that we skip over tail */
+ slot_pos = GET_INDEX(*tail);
+ BUG_ON(slot_pos >= NR_SLOTS);
+ slot = &nr_bucket->slot[slot_pos];
+ BUG_ON((*slot & NR_evict) != flags);
+ if (likely(tail != slot)) *slot = xchg(tail, *slot);
+ /* slot: -> [slot], old cookie */
+ BUG_ON(GET_INDEX(*slot) != slot_pos);
+
+ flags = (cookie & NR_evict); /* insertion chain */
+
+ /* place cookie in empty slot */
+ SET_INDEX(cookie, slot_pos); /* -> [slot], cookie */
+ cookie = xchg(slot, cookie); /* slot: -> [slot], cookie */
+
+ /* insert slot before tail; ie. MRU pos */
+ tail_pos = nr_bucket->hand[!!flags];
+ BUG_ON(tail_pos >= NR_SLOTS);
+ tail = &nr_bucket->slot[tail_pos];
+ if (likely((*tail & NR_evict) == flags && tail != slot))
+ *slot = xchg(tail, *slot); /* swap if not empty and not same */
+ nr_bucket->hand[!!flags] = slot_pos;
+
+ spin_unlock_irqrestore(&nr_bucket->lock, iflags);
+
+ return GET_FLAGS(cookie);
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init init_nonresident(void)
+{
+ int target;
+ int i, j;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++) {
+ spin_lock_init(&nonres_table[i].lock);
+ nonres_table[i].hand[0] = nonres_table[i].hand[1] = 0;
+ for (j = 0; j < NR_SLOTS; ++j) {
+ nonres_table[i].slot[j] = 0;
+ SET_FLAGS(nonres_table[i].slot[j], (NR_list | NR_filter));
+ if (j < NR_SLOTS - 1)
+ SET_INDEX(nonres_table[i].slot[j], j+1);
+ else /* j == NR_SLOTS - 1 */
+ SET_INDEX(nonres_table[i].slot[j], 0);
+ }
+ }
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+
+__setup("nonresident_factor=", set_nonresident_factor);
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static void *stats_start(struct seq_file *m, loff_t *pos)
+{
+ if (*pos < 0 || *pos >= (1 << nonres_shift))
+ return NULL;
+
+ m->private = (unsigned long)*pos;
+
+ return pos;
+}
+
+static void *stats_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ if (*pos < (1 << nonres_shift)-1) {
+ (*pos)++;
+ (unsigned long)m->private++;
+ return pos;
+ }
+ return NULL;
+}
+
+static void stats_stop(struct seq_file *m, void *arg)
+{
+}
+
+static void bucket_stats(struct nr_bucket * nr_bucket, int * b1, int * b2)
+{
+ unsigned int i, b[2] = {0, 0};
+ for (i = 0; i < 2; ++i) {
+ unsigned int j = nr_bucket->hand[i];
+ do
+ {
+ u32 *slot = &nr_bucket->slot[j];
+ if (!!(GET_FLAGS(*slot) & NR_list) != !!i)
+ break;
+
+ j = GET_INDEX(*slot);
+ ++b[i];
+ } while (j != nr_bucket->hand[i]);
+ }
+ *b1=b[0];
+ *b2=b[1];
+}
+
+static int stats_show(struct seq_file *m, void *arg)
+{
+ unsigned int index = (unsigned long)m->private;
+ struct nr_bucket *nr_bucket = &nonres_table[index];
+ unsigned long flags;
+ unsigned int b1, b2;
+
+ spin_lock_irqsave(&nr_bucket->lock, flags);
+ bucket_stats(nr_bucket, &b1, &b2);
+ spin_unlock_irqrestore(&nr_bucket->lock, flags);
+ seq_printf(m, "%d\t%d\t%d\n", b1, b2, b1+b2);
+
+ return 0;
+}
+
+struct seq_operations nonresident_op = {
+ .start = stats_start,
+ .next = stats_next,
+ .stop = stats_stop,
+ .show = stats_show,
+};
+
+#endif /* CONFIG_PROC_FS */
diff --git a/mm/shmem.c b/mm/shmem.c
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1500,11 +1500,8 @@ static void do_shmem_file_read(struct fi
*/
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
- /*
- * Mark the page accessed if we read the beginning.
- */
- if (!offset)
- mark_page_accessed(page);
+
+ mark_page_accessed(page);
} else
page = ZERO_PAGE(0);
diff --git a/mm/swap.c b/mm/swap.c
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -78,16 +78,17 @@ int rotate_reclaimable_page(struct page
return 1;
if (PageDirty(page))
return 1;
- if (PageActive(page))
- return 1;
if (!PageLRU(page))
return 1;
zone = page_zone(page);
spin_lock_irqsave(&zone->lru_lock, flags);
- if (PageLRU(page) && !PageActive(page)) {
+ if (PageLRU(page)) {
list_del(&page->lru);
- list_add_tail(&page->lru, &zone->inactive_list);
+ if (PageActive(page))
+ list_add(&page->lru, &zone->active_list);
+ else
+ list_add(&page->lru, &zone->inactive_list);
inc_page_state(pgrotated);
}
if (!test_clear_page_writeback(page))
@@ -97,37 +98,12 @@ int rotate_reclaimable_page(struct page
}
/*
- * FIXME: speed this up?
- */
-void fastcall activate_page(struct page *page)
-{
- struct zone *zone = page_zone(page);
-
- spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page) && !PageActive(page)) {
- del_page_from_inactive_list(zone, page);
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- inc_page_state(pgactivate);
- }
- spin_unlock_irq(&zone->lru_lock);
-}
-
-/*
* Mark a page as having seen activity.
- *
- * inactive,unreferenced -> inactive,referenced
- * inactive,referenced -> active,unreferenced
- * active,unreferenced -> active,referenced
*/
void fastcall mark_page_accessed(struct page *page)
{
- if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
- activate_page(page);
- ClearPageReferenced(page);
- } else if (!PageReferenced(page)) {
+ if (!PageReferenced(page))
SetPageReferenced(page);
- }
}
EXPORT_SYMBOL(mark_page_accessed);
@@ -137,7 +113,6 @@ EXPORT_SYMBOL(mark_page_accessed);
* @page: the page to add
*/
static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
-static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
void fastcall lru_cache_add(struct page *page)
{
@@ -149,25 +124,12 @@ void fastcall lru_cache_add(struct page
put_cpu_var(lru_add_pvecs);
}
-void fastcall lru_cache_add_active(struct page *page)
-{
- struct pagevec *pvec = &get_cpu_var(lru_add_active_pvecs);
-
- page_cache_get(page);
- if (!pagevec_add(pvec, page))
- __pagevec_lru_add_active(pvec);
- put_cpu_var(lru_add_active_pvecs);
-}
-
void lru_add_drain(void)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
if (pagevec_count(pvec))
__pagevec_lru_add(pvec);
- pvec = &__get_cpu_var(lru_add_active_pvecs);
- if (pagevec_count(pvec))
- __pagevec_lru_add_active(pvec);
put_cpu_var(lru_add_pvecs);
}
@@ -303,7 +265,9 @@ void __pagevec_lru_add(struct pagevec *p
}
if (TestSetPageLRU(page))
BUG();
- add_page_to_inactive_list(zone, page);
+ if (TestClearPageActive(page))
+ BUG();
+ __cart_insert(zone, page);
}
if (zone)
spin_unlock_irq(&zone->lru_lock);
@@ -313,33 +277,6 @@ void __pagevec_lru_add(struct pagevec *p
EXPORT_SYMBOL(__pagevec_lru_add);
-void __pagevec_lru_add_active(struct pagevec *pvec)
-{
- int i;
- struct zone *zone = NULL;
-
- for (i = 0; i < pagevec_count(pvec); i++) {
- struct page *page = pvec->pages[i];
- struct zone *pagezone = page_zone(page);
-
- if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
- }
- if (TestSetPageLRU(page))
- BUG();
- if (TestSetPageActive(page))
- BUG();
- add_page_to_active_list(zone, page);
- }
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- release_pages(pvec->pages, pvec->nr, pvec->cold);
- pagevec_reinit(pvec);
-}
-
/*
* Try to drop buffers from the pages in a pagevec
*/
@@ -421,9 +358,6 @@ static void lru_drain_cache(unsigned int
/* CPU is dead, so no locking needed. */
if (pagevec_count(pvec))
__pagevec_lru_add(pvec);
- pvec = &per_cpu(lru_add_active_pvecs, cpu);
- if (pagevec_count(pvec))
- __pagevec_lru_add_active(pvec);
}
/* Drop the CPU's cached committed space back into the central pool. */
diff --git a/mm/swap_state.c b/mm/swap_state.c
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -359,7 +359,7 @@ struct page *read_swap_cache_async(swp_e
/*
* Initiate read into locked page and return.
*/
- lru_cache_add_active(new_page);
+ lru_cache_add(new_page);
swap_readpage(NULL, new_page);
return new_page;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -408,7 +408,7 @@ static void unuse_pte(struct vm_area_str
* Move the page to the active list so it is not
* immediately swapped out again after swapon.
*/
- activate_page(page);
+ SetPageReferenced(page);
}
static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
@@ -508,7 +508,7 @@ static int unuse_mm(struct mm_struct *mm
* Activate page so shrink_cache is unlikely to unmap its
* ptes while lock is dropped, so swapoff can make progress.
*/
- activate_page(page);
+ SetPageReferenced(page);
unlock_page(page);
down_read(&mm->mmap_sem);
lock_page(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -235,27 +235,6 @@ static int shrink_slab(unsigned long sca
return ret;
}
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
- struct address_space *mapping;
-
- /* Page is in somebody's page tables. */
- if (page_mapped(page))
- return 1;
-
- /* Be more reluctant to reclaim swapcache than pagecache */
- if (PageSwapCache(page))
- return 1;
-
- mapping = page_mapping(page);
- if (!mapping)
- return 0;
-
- /* File is mmap'd by somebody? */
- return mapping_mapped(mapping);
-}
-
static inline int is_page_cache_freeable(struct page *page)
{
return page_count(page) - !!PagePrivate(page) == 2;
@@ -397,8 +376,6 @@ static int shrink_list(struct list_head
if (TestSetPageLocked(page))
goto keep;
- BUG_ON(PageActive(page));
-
sc->nr_scanned++;
/* Double the slab pressure for mapped and swapcache pages */
if (page_mapped(page) || PageSwapCache(page))
@@ -408,8 +385,8 @@ static int shrink_list(struct list_head
goto keep_locked;
referenced = page_referenced(page, 1, sc->priority <= 0);
- /* In active use or really unfreeable? Activate it. */
- if (referenced && page_mapping_inuse(page))
+
+ if (referenced)
goto activate_locked;
#ifdef CONFIG_SWAP
@@ -519,6 +496,7 @@ static int shrink_list(struct list_head
#ifdef CONFIG_SWAP
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page->private };
+ cart_remember(page);
__delete_from_swap_cache(page);
write_unlock_irq(&mapping->tree_lock);
swap_free(swap);
@@ -527,11 +505,13 @@ static int shrink_list(struct list_head
}
#endif /* CONFIG_SWAP */
+ cart_remember(page);
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
__put_page(page);
free_it:
+ ClearPageActive(page);
unlock_page(page);
reclaimed++;
if (!pagevec_add(&freed_pvec, page))
@@ -566,33 +546,32 @@ keep:
* Appropriate locks must be held before calling this function.
*
* @nr_to_scan: The number of pages to look through on the list.
- * @src: The LRU list to pull pages off.
+ * @zone: The zone to get pages from.
* @dst: The temp list to put pages on to.
* @scanned: The number of pages that were scanned.
*
* returns how many pages were moved onto *@dst.
*/
-static int isolate_lru_pages(int nr_to_scan, struct list_head *src,
+static int isolate_lru_pages(int nr_to_scan, struct zone *zone,
struct list_head *dst, int *scanned)
{
int nr_taken = 0;
struct page *page;
int scan = 0;
- while (scan++ < nr_to_scan && !list_empty(src)) {
- page = lru_to_page(src);
- prefetchw_prev_lru_page(page, src, flags);
+ while (scan++ < nr_to_scan) {
+ page = __cart_replace(zone);
+ if (!page) break;
if (!TestClearPageLRU(page))
BUG();
- list_del(&page->lru);
if (get_page_testone(page)) {
/*
* It is being freed elsewhere
*/
__put_page(page);
SetPageLRU(page);
- list_add(&page->lru, src);
+ __cart_reinsert(zone, page);
continue;
} else {
list_add(&page->lru, dst);
@@ -624,9 +603,7 @@ static void shrink_cache(struct zone *zo
int nr_freed;
nr_taken = isolate_lru_pages(sc->swap_cluster_max,
- &zone->inactive_list,
- &page_list, &nr_scan);
- zone->nr_inactive -= nr_taken;
+ zone, &page_list, &nr_scan);
zone->pages_scanned += nr_scan;
spin_unlock_irq(&zone->lru_lock);
@@ -653,10 +630,7 @@ static void shrink_cache(struct zone *zo
if (TestSetPageLRU(page))
BUG();
list_del(&page->lru);
- if (PageActive(page))
- add_page_to_active_list(zone, page);
- else
- add_page_to_inactive_list(zone, page);
+ __cart_reinsert(zone, page);
if (!pagevec_add(&pvec, page)) {
spin_unlock_irq(&zone->lru_lock);
__pagevec_release(&pvec);
@@ -670,194 +644,34 @@ done:
}
/*
- * This moves pages from the active list to the inactive list.
- *
- * We move them the other way if the page is referenced by one or more
- * processes, from rmap.
- *
- * If the pages are mostly unmapped, the processing is fast and it is
- * appropriate to hold zone->lru_lock across the whole operation. But if
- * the pages are mapped, the processing is slow (page_referenced()) so we
- * should drop zone->lru_lock around each page. It's impossible to balance
- * this, so instead we remove the pages from the LRU while processing them.
- * It is safe to rely on PG_active against the non-LRU pages in here because
- * nobody will play with that bit on a non-LRU page.
- *
- * The downside is that we have to touch page->_count against each page.
- * But we had to alter page->flags anyway.
- */
-static void
-refill_inactive_zone(struct zone *zone, struct scan_control *sc)
-{
- int pgmoved;
- int pgdeactivate = 0;
- int pgscanned;
- int nr_pages = sc->nr_to_scan;
- LIST_HEAD(l_hold); /* The pages which were snipped off */
- LIST_HEAD(l_inactive); /* Pages to go onto the inactive_list */
- LIST_HEAD(l_active); /* Pages to go onto the active_list */
- struct page *page;
- struct pagevec pvec;
- int reclaim_mapped = 0;
- long mapped_ratio;
- long distress;
- long swap_tendency;
-
- lru_add_drain();
- spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
- &l_hold, &pgscanned);
- zone->pages_scanned += pgscanned;
- zone->nr_active -= pgmoved;
- spin_unlock_irq(&zone->lru_lock);
-
- /*
- * `distress' is a measure of how much trouble we're having reclaiming
- * pages. 0 -> no problems. 100 -> great trouble.
- */
- distress = 100 >> zone->prev_priority;
-
- /*
- * The point of this algorithm is to decide when to start reclaiming
- * mapped memory instead of just pagecache. Work out how much memory
- * is mapped.
- */
- mapped_ratio = (sc->nr_mapped * 100) / total_memory;
-
- /*
- * Now decide how much we really want to unmap some pages. The mapped
- * ratio is downgraded - just because there's a lot of mapped memory
- * doesn't necessarily mean that page reclaim isn't succeeding.
- *
- * The distress ratio is important - we don't want to start going oom.
- *
- * A 100% value of vm_swappiness overrides this algorithm altogether.
- */
- swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-
- /*
- * Now use this metric to decide whether to start moving mapped memory
- * onto the inactive list.
- */
- if (swap_tendency >= 100)
- reclaim_mapped = 1;
-
- while (!list_empty(&l_hold)) {
- cond_resched();
- page = lru_to_page(&l_hold);
- list_del(&page->lru);
- if (page_mapped(page)) {
- if (!reclaim_mapped ||
- (total_swap_pages == 0 && PageAnon(page)) ||
- page_referenced(page, 0, sc->priority <= 0)) {
- list_add(&page->lru, &l_active);
- continue;
- }
- }
- list_add(&page->lru, &l_inactive);
- }
-
- pagevec_init(&pvec, 1);
- pgmoved = 0;
- spin_lock_irq(&zone->lru_lock);
- while (!list_empty(&l_inactive)) {
- page = lru_to_page(&l_inactive);
- prefetchw_prev_lru_page(page, &l_inactive, flags);
- if (TestSetPageLRU(page))
- BUG();
- if (!TestClearPageActive(page))
- BUG();
- list_move(&page->lru, &zone->inactive_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
- zone->nr_inactive += pgmoved;
- spin_unlock_irq(&zone->lru_lock);
- pgdeactivate += pgmoved;
- pgmoved = 0;
- if (buffer_heads_over_limit)
- pagevec_strip(&pvec);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- zone->nr_inactive += pgmoved;
- pgdeactivate += pgmoved;
- if (buffer_heads_over_limit) {
- spin_unlock_irq(&zone->lru_lock);
- pagevec_strip(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
-
- pgmoved = 0;
- while (!list_empty(&l_active)) {
- page = lru_to_page(&l_active);
- prefetchw_prev_lru_page(page, &l_active, flags);
- if (TestSetPageLRU(page))
- BUG();
- BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->active_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
- zone->nr_active += pgmoved;
- pgmoved = 0;
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- zone->nr_active += pgmoved;
- spin_unlock_irq(&zone->lru_lock);
- pagevec_release(&pvec);
-
- mod_page_state_zone(zone, pgrefill, pgscanned);
- mod_page_state(pgdeactivate, pgdeactivate);
-}
-
-/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
static void
shrink_zone(struct zone *zone, struct scan_control *sc)
{
unsigned long nr_active;
- unsigned long nr_inactive;
/*
* Add one to `nr_to_scan' just to make sure that the kernel will
* slowly sift through the active list.
*/
- zone->nr_scan_active += (zone->nr_active >> sc->priority) + 1;
+ zone->nr_scan_active += ((zone->nr_active + zone->nr_inactive) >> sc->priority) + 1;
nr_active = zone->nr_scan_active;
if (nr_active >= sc->swap_cluster_max)
zone->nr_scan_active = 0;
else
nr_active = 0;
- zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
- nr_inactive = zone->nr_scan_inactive;
- if (nr_inactive >= sc->swap_cluster_max)
- zone->nr_scan_inactive = 0;
- else
- nr_inactive = 0;
sc->nr_to_reclaim = sc->swap_cluster_max;
- while (nr_active || nr_inactive) {
- if (nr_active) {
- sc->nr_to_scan = min(nr_active,
- (unsigned long)sc->swap_cluster_max);
- nr_active -= sc->nr_to_scan;
- refill_inactive_zone(zone, sc);
- }
-
- if (nr_inactive) {
- sc->nr_to_scan = min(nr_inactive,
- (unsigned long)sc->swap_cluster_max);
- nr_inactive -= sc->nr_to_scan;
- shrink_cache(zone, sc);
- if (sc->nr_to_reclaim <= 0)
- break;
- }
+ while (nr_active) {
+ sc->nr_to_scan = min(nr_active,
+ (unsigned long)sc->swap_cluster_max);
+ nr_active -= sc->nr_to_scan;
+ shrink_cache(zone, sc);
+ if (sc->nr_to_reclaim <= 0)
+ break;
}
throttle_vm_writeout();
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2005-08-27 19:46 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-08-12 14:37 Zoned CART Peter Zijlstra
2005-08-12 15:42 ` Rahul Iyer
2005-08-12 15:52 ` Peter Zijlstra
2005-08-12 23:08 ` Marcelo Tosatti
2005-08-13 19:00 ` Rahul Iyer
2005-08-13 19:08 ` Marcelo Tosatti
2005-08-13 21:30 ` Rik van Riel
2005-08-14 18:31 ` Peter Zijlstra
2005-08-12 20:21 ` Marcelo Tosatti
2005-08-12 22:28 ` Marcelo Tosatti
2005-08-13 19:03 ` Rahul Iyer
2005-08-14 12:58 ` Peter Zijlstra
2005-08-15 21:31 ` Peter Zijlstra
2005-08-16 19:53 ` Rahul Iyer
2005-08-16 20:49 ` Christoph Lameter
2005-08-25 22:39 ` Peter Zijlstra
2005-08-26 0:01 ` Christoph Lameter
2005-08-26 3:59 ` Rahul Iyer
2005-08-26 7:09 ` Peter Zijlstra
2005-08-26 12:24 ` Rik van Riel
2005-08-26 21:03 ` Peter Zijlstra
2005-08-27 19:46 ` [RFC][PATCH] " Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox