* [PATCH 1/39] mm: disuse activate_page()
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
@ 2006-07-12 14:37 ` Peter Zijlstra
2006-07-12 14:37 ` [PATCH 2/39] mm: adjust blk_congestion_wait() logic Peter Zijlstra
` (38 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:37 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Get rid of activate_page() callers.
Instead, page activation is achieved through the mark_page_accessed()
interface.
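For reference, mark_page_accessed() currently looks roughly like this
in mm/swap.c (a sketch, not part of this patch). Note the behavioural
difference: unlike a direct activate_page() call, a single access to a
page without PG_referenced set only marks the page referenced, so
activation now takes two observed accesses:

	void fastcall mark_page_accessed(struct page *page)
	{
		if (!PageActive(page) && PageReferenced(page) &&
				PageLRU(page)) {
			/* second access: promote to the active list */
			activate_page(page);
			ClearPageReferenced(page);
		} else if (!PageReferenced(page)) {
			/* first access: just remember it happened */
			SetPageReferenced(page);
		}
	}

activate_page() itself stays around as an internal helper in mm/swap.c;
only its extern declaration is removed below.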
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
include/linux/swap.h | 1 -
mm/swapfile.c | 4 ++--
2 files changed, 2 insertions(+), 3 deletions(-)
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h 2006-07-12 16:07:30.000000000 +0200
+++ linux-2.6/include/linux/swap.h 2006-07-12 16:11:58.000000000 +0200
@@ -165,7 +165,6 @@ extern unsigned int nr_free_pagecache_pa
/* linux/mm/swap.c */
extern void FASTCALL(lru_cache_add(struct page *));
extern void FASTCALL(lru_cache_add_active(struct page *));
-extern void FASTCALL(activate_page(struct page *));
extern void FASTCALL(mark_page_accessed(struct page *));
extern void lru_add_drain(void);
extern int lru_add_drain_all(void);
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/swapfile.c 2006-07-12 16:11:36.000000000 +0200
@@ -496,7 +496,7 @@ static void unuse_pte(struct vm_area_str
* Move the page to the active list so it is not
* immediately swapped out again after swapon.
*/
- activate_page(page);
+ mark_page_accessed(page);
}
static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
@@ -598,7 +598,7 @@ static int unuse_mm(struct mm_struct *mm
* Activate page so shrink_cache is unlikely to unmap its
* ptes while lock is dropped, so swapoff can make progress.
*/
- activate_page(page);
+ mark_page_accessed(page);
unlock_page(page);
down_read(&mm->mmap_sem);
lock_page(page);
* [PATCH 2/39] mm: adjust blk_congestion_wait() logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
2006-07-12 14:37 ` [PATCH 1/39] mm: disuse activate_page() Peter Zijlstra
@ 2006-07-12 14:37 ` Peter Zijlstra
2006-07-12 14:37 ` [PATCH 3/39] mm: pgrep: prepare for page replace framework Peter Zijlstra
` (37 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:37 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
The new page reclaim implementations can require a lot more scanning
in order to find a suitable page. This causes kswapd to constantly hit:
blk_congestion_wait(WRITE, HZ/10);
without there being any submitted IO.
Count the number of pages pageout() actually submits for asynchronous
IO, and only wait for congestion when the last priority level submitted
more than SWAP_CLUSTER_MAX/2 pages.
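Condensed from the diff below, the mechanism is:

	/* shrink_page_list(): count async writes actually submitted */
	case PAGE_SUCCESS:
		if (PageWriteback(page) || PageDirty(page)) {
			writeout++;
			goto keep;
		}
	...
	sc->nr_writeout += writeout;

	/* balance_pgdat(): throttle only when IO was really started */
	if (sc.nr_writeout > SWAP_CLUSTER_MAX/2)
		blk_congestion_wait(WRITE, HZ/10);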
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
mm/vmscan.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:58.000000000 +0200
@@ -61,6 +61,8 @@ struct scan_control {
* In this context, it doesn't matter that we scan the
* whole list at once. */
int swap_cluster_max;
+
+ unsigned long nr_writeout; /* pages for which writeout was started */
};
/*
@@ -407,6 +409,7 @@ static unsigned long shrink_page_list(st
struct pagevec freed_pvec;
int pgactivate = 0;
unsigned long nr_reclaimed = 0;
+ int writeout = 0;
cond_resched();
@@ -488,8 +491,10 @@ static unsigned long shrink_page_list(st
case PAGE_ACTIVATE:
goto activate_locked;
case PAGE_SUCCESS:
- if (PageWriteback(page) || PageDirty(page))
+ if (PageWriteback(page) || PageDirty(page)) {
+ writeout++;
goto keep;
+ }
/*
* A synchronous write - probably a ramdisk. Go
* ahead and try to reclaim the page.
@@ -555,6 +560,7 @@ keep:
if (pagevec_count(&freed_pvec))
__pagevec_release_nonlru(&freed_pvec);
mod_page_state(pgactivate, pgactivate);
+ sc->nr_writeout += writeout;
return nr_reclaimed;
}
@@ -1123,6 +1129,7 @@ scan:
* pages behind kswapd's direction of progress, which would
* cause too much scanning of the lower zones.
*/
+ sc.nr_writeout = 0;
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
int nr_slab;
@@ -1170,7 +1177,7 @@ scan:
* OK, kswapd is getting into trouble. Take a nap, then take
* another pass across the zones.
*/
- if (total_scanned && priority < DEF_PRIORITY - 2)
+ if (sc.nr_writeout > SWAP_CLUSTER_MAX/2)
blk_congestion_wait(WRITE, HZ/10);
/*
* [PATCH 3/39] mm: pgrep: prepare for page replace framework
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
2006-07-12 14:37 ` [PATCH 1/39] mm: disuse activate_page() Peter Zijlstra
2006-07-12 14:37 ` [PATCH 2/39] mm: adjust blk_congestion_wait() logic Peter Zijlstra
@ 2006-07-12 14:37 ` Peter Zijlstra
2006-07-12 14:37 ` [PATCH 4/39] mm: pgrep: convert insertion Peter Zijlstra
` (36 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:37 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Introduce the page replacement policy configuration option and modify
the Makefile. The new mm/useonce.c starts out empty; it is populated by
the next patch.
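The choice block below is the hook the rest of the series plugs its
policies into; a new policy would add an entry of the following shape
(MM_POLICY_FOO and foo.o are made-up placeholders, not part of this
series):

	config MM_POLICY_FOO
		bool "FOO"
		help
		  This option selects the FOO page replacement policy.

plus a matching Makefile line:

	obj-$(CONFIG_MM_POLICY_FOO) += foo.o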
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
mm/Kconfig | 11 +++++++++++
mm/Makefile | 2 ++
mm/useonce.c | 3 +++
3 files changed, 16 insertions(+)
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/Kconfig 2006-07-12 16:11:29.000000000 +0200
@@ -133,6 +133,17 @@ config SPLIT_PTLOCK_CPUS
default "4096" if PARISC && !PA20
default "4"
+choice
+ prompt "Page replacement policy"
+ default MM_POLICY_USEONCE
+
+config MM_POLICY_USEONCE
+ bool "LRU-2Q USE-ONCE"
+ help
+ This option selects the standard multi-queue LRU policy.
+
+endchoice
+
#
# support for page migration
#
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/Makefile 2006-07-12 16:11:29.000000000 +0200
@@ -12,6 +12,8 @@ obj-y := bootmem.o filemap.o mempool.o
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o $(mmu-y)
+obj-$(CONFIG_MM_POLICY_USEONCE) += useonce.o
+
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
Index: linux-2.6/mm/useonce.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:58.000000000 +0200
@@ -0,0 +1,3 @@
+
+
+
* [PATCH 4/39] mm: pgrep: convert insertion
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (2 preceding siblings ...)
2006-07-12 14:37 ` [PATCH 3/39] mm: pgrep: prepare for page replace framework Peter Zijlstra
@ 2006-07-12 14:37 ` Peter Zijlstra
2006-07-12 14:37 ` [PATCH 5/39] mm: pgrep: add a use-once insertion hint Peter Zijlstra
` (35 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:37 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Abstract the insertion of pages into the page replacement lists.

API:

  give the page replace algorithm a hint as to the importance of the
  given page:

	void pgrep_hint_active(struct page *);

  insert the given page into a per-CPU pagevec:

	void fastcall pgrep_add(struct page *);

  flush the pagevec of the current CPU, a given CPU, or all CPUs:

	void pgrep_add_drain(void);
	void __pgrep_add_drain(unsigned int);
	int pgrep_add_drain_all(void);

  insert a pagevec worth of pages:

	void __pagevec_pgrep_add(struct pagevec *);
	void pagevec_pgrep_add(struct pagevec *);
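A typical caller-side conversion, as in the mm/memory.c hunks below:
the one-shot lru_cache_add_active() becomes a hint followed by a
policy-neutral add:

	/* before */
	lru_cache_add_active(new_page);

	/* after */
	pgrep_hint_active(new_page);
	pgrep_add(new_page);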
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
fs/cifs/file.c | 5 -
fs/exec.c | 4 -
fs/mpage.c | 5 -
fs/ntfs/file.c | 4 -
fs/ramfs/file-nommu.c | 2
fs/splice.c | 4 -
include/linux/mm_page_replace.h | 38 +++++++++++
include/linux/mm_use_once_policy.h | 21 ++++++
include/linux/pagevec.h | 8 --
include/linux/swap.h | 4 -
mm/filemap.c | 7 +-
mm/memory.c | 14 ++--
mm/mempolicy.c | 1
mm/migrate.c | 14 ----
mm/mmap.c | 5 -
mm/readahead.c | 9 +-
mm/shmem.c | 2
mm/swap.c | 120 ++-----------------------------------
mm/swap_state.c | 6 +
mm/useonce.c | 108 +++++++++++++++++++++++++++++++++
mm/vmscan.c | 5 -
21 files changed, 224 insertions(+), 162 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:57.000000000 +0200
@@ -0,0 +1,21 @@
+#ifndef _LINUX_MM_USEONCE_POLICY_H
+#define _LINUX_MM_USEONCE_POLICY_H
+
+#ifdef __KERNEL__
+
+static inline void pgrep_hint_active(struct page *page)
+{
+ SetPageActive(page);
+}
+
+static inline void
+__pgrep_add(struct zone *zone, struct page *page)
+{
+ if (PageActive(page))
+ add_page_to_active_list(zone, page);
+ else
+ add_page_to_inactive_list(zone, page);
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:57.000000000 +0200
@@ -0,0 +1,38 @@
+#ifndef _LINUX_MM_PAGE_REPLACE_H
+#define _LINUX_MM_PAGE_REPLACE_H
+
+#ifdef __KERNEL__
+
+#include <linux/mmzone.h>
+#include <linux/mm.h>
+#include <linux/pagevec.h>
+#include <linux/mm_inline.h>
+
+/* void pgrep_hint_active(struct page *); */
+extern void fastcall pgrep_add(struct page *);
+/* void __pgrep_add(struct zone *, struct page *); */
+/* void pgrep_add_drain(void); */
+extern void __pgrep_add_drain(unsigned int);
+extern int pgrep_add_drain_all(void);
+extern void __pagevec_pgrep_add(struct pagevec *);
+
+#ifdef CONFIG_MM_POLICY_USEONCE
+#include <linux/mm_use_once_policy.h>
+#else
+#error no mm policy
+#endif
+
+static inline void pagevec_pgrep_add(struct pagevec *pvec)
+{
+ if (pagevec_count(pvec))
+ __pagevec_pgrep_add(pvec);
+}
+
+static inline void pgrep_add_drain(void)
+{
+ __pgrep_add_drain(get_cpu());
+ put_cpu();
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_PAGE_REPLACE_H */
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/filemap.c 2006-07-12 16:11:56.000000000 +0200
@@ -30,6 +30,7 @@
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/cpuset.h>
+#include <linux/mm_page_replace.h>
#include "filemap.h"
#include "internal.h"
@@ -430,7 +431,7 @@ int add_to_page_cache_lru(struct page *p
{
int ret = add_to_page_cache(page, mapping, offset, gfp_mask);
if (ret == 0)
- lru_cache_add(page);
+ pgrep_add(page);
return ret;
}
@@ -1784,7 +1785,7 @@ repeat:
page = *cached_page;
page_cache_get(page);
if (!pagevec_add(lru_pvec, page))
- __pagevec_lru_add(lru_pvec);
+ __pagevec_pgrep_add(lru_pvec);
*cached_page = NULL;
}
}
@@ -2114,7 +2115,7 @@ generic_file_buffered_write(struct kiocb
if (unlikely(file->f_flags & O_DIRECT) && written)
status = filemap_write_and_wait(mapping);
- pagevec_lru_add(&lru_pvec);
+ pagevec_pgrep_add(&lru_pvec);
return written ? written : status;
}
EXPORT_SYMBOL(generic_file_buffered_write);
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:55.000000000 +0200
@@ -1,3 +1,111 @@
+#include <linux/mm_page_replace.h>
+#include <linux/mm_inline.h>
+#include <linux/swap.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+/**
+ * lru_cache_add: add a page to the page lists
+ * @page: the page to add
+ */
+static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
+static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
+/*
+ * Add the passed pages to the LRU, then drop the caller's refcount
+ * on them. Reinitialises the caller's pagevec.
+ */
+void __pagevec_pgrep_add(struct pagevec *pvec)
+{
+ int i;
+ struct zone *zone = NULL;
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+ struct zone *pagezone = page_zone(page);
+
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ add_page_to_inactive_list(zone, page);
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ release_pages(pvec->pages, pvec->nr, pvec->cold);
+ pagevec_reinit(pvec);
+}
+
+EXPORT_SYMBOL(__pagevec_pgrep_add);
+
+static void __pagevec_lru_add_active(struct pagevec *pvec)
+{
+ int i;
+ struct zone *zone = NULL;
+
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+ struct zone *pagezone = page_zone(page);
+
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ BUG_ON(PageActive(page));
+ SetPageActive(page);
+ add_page_to_active_list(zone, page);
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ release_pages(pvec->pages, pvec->nr, pvec->cold);
+ pagevec_reinit(pvec);
+}
+
+static inline void lru_cache_add(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ __pagevec_pgrep_add(pvec);
+ put_cpu_var(lru_add_pvecs);
+}
+
+static inline void lru_cache_add_active(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(lru_add_active_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ __pagevec_lru_add_active(pvec);
+ put_cpu_var(lru_add_active_pvecs);
+}
+
+void fastcall pgrep_add(struct page *page)
+{
+ if (PageActive(page)) {
+ ClearPageActive(page);
+ lru_cache_add_active(page);
+ } else {
+ lru_cache_add(page);
+ }
+}
+
+void __pgrep_add_drain(unsigned int cpu)
+{
+ struct pagevec *pvec = &per_cpu(lru_add_pvecs, cpu);
+
+ if (pagevec_count(pvec))
+ __pagevec_pgrep_add(pvec);
+ pvec = &per_cpu(lru_add_active_pvecs, cpu);
+ if (pagevec_count(pvec))
+ __pagevec_lru_add_active(pvec);
+}
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/memory.c 2006-07-12 16:11:36.000000000 +0200
@@ -48,6 +48,7 @@
#include <linux/rmap.h>
#include <linux/module.h>
#include <linux/init.h>
+#include <linux/mm_page_replace.h>
#include <asm/pgalloc.h>
#include <asm/uaccess.h>
@@ -870,7 +871,7 @@ unsigned long zap_page_range(struct vm_a
unsigned long end = address + size;
unsigned long nr_accounted = 0;
- lru_add_drain();
+ pgrep_add_drain();
tlb = tlb_gather_mmu(mm, 0);
update_hiwater_rss(mm);
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
@@ -1505,7 +1506,8 @@ gotten:
ptep_establish(vma, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
- lru_cache_add_active(new_page);
+ pgrep_hint_active(new_page);
+ pgrep_add(new_page);
page_add_new_anon_rmap(new_page, vma, address);
/* Free the old page.. */
@@ -1857,7 +1859,7 @@ void swapin_readahead(swp_entry_t entry,
}
#endif
}
- lru_add_drain(); /* Push any new pages onto the LRU now */
+ pgrep_add_drain(); /* Push any new pages onto the LRU now */
}
/*
@@ -1991,7 +1993,8 @@ static int do_anonymous_page(struct mm_s
if (!pte_none(*page_table))
goto release;
inc_mm_counter(mm, anon_rss);
- lru_cache_add_active(page);
+ pgrep_hint_active(page);
+ pgrep_add(page);
page_add_new_anon_rmap(page, vma, address);
} else {
/* Map the ZERO_PAGE - vm_page_prot is readonly */
@@ -2122,7 +2125,8 @@ retry:
set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
- lru_cache_add_active(new_page);
+ pgrep_hint_active(new_page);
+ pgrep_add(new_page);
page_add_new_anon_rmap(new_page, vma, address);
} else {
inc_mm_counter(mm, file_rss);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/mmap.c 2006-07-12 16:08:18.000000000 +0200
@@ -25,6 +25,7 @@
#include <linux/mount.h>
#include <linux/mempolicy.h>
#include <linux/rmap.h>
+#include <linux/mm_page_replace.h>
#include <asm/uaccess.h>
#include <asm/cacheflush.h>
@@ -1662,7 +1663,7 @@ static void unmap_region(struct mm_struc
struct mmu_gather *tlb;
unsigned long nr_accounted = 0;
- lru_add_drain();
+ pgrep_add_drain();
tlb = tlb_gather_mmu(mm, 0);
update_hiwater_rss(mm);
unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
@@ -1942,7 +1943,7 @@ void exit_mmap(struct mm_struct *mm)
unsigned long nr_accounted = 0;
unsigned long end;
- lru_add_drain();
+ pgrep_add_drain();
flush_cache_mm(mm);
tlb = tlb_gather_mmu(mm, 1);
/* Don't update_hiwater_rss(mm) here, do_exit already did */
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/shmem.c 2006-07-12 16:08:18.000000000 +0200
@@ -954,7 +954,7 @@ struct page *shmem_swapin(struct shmem_i
break;
page_cache_release(page);
}
- lru_add_drain(); /* Push any new pages onto the LRU now */
+ pgrep_add_drain(); /* Push any new pages onto the LRU now */
return shmem_swapin_async(p, entry, idx);
}
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/swap.c 2006-07-12 16:11:55.000000000 +0200
@@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/init.h>
+#include <linux/mm_page_replace.h>
/* How many pages do we try to swap or page in/out together? */
int page_cluster;
@@ -132,63 +133,18 @@ void fastcall mark_page_accessed(struct
EXPORT_SYMBOL(mark_page_accessed);
-/**
- * lru_cache_add: add a page to the page lists
- * @page: the page to add
- */
-static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
-static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
-
-void fastcall lru_cache_add(struct page *page)
-{
- struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
-
- page_cache_get(page);
- if (!pagevec_add(pvec, page))
- __pagevec_lru_add(pvec);
- put_cpu_var(lru_add_pvecs);
-}
-
-void fastcall lru_cache_add_active(struct page *page)
-{
- struct pagevec *pvec = &get_cpu_var(lru_add_active_pvecs);
-
- page_cache_get(page);
- if (!pagevec_add(pvec, page))
- __pagevec_lru_add_active(pvec);
- put_cpu_var(lru_add_active_pvecs);
-}
-
-static void __lru_add_drain(int cpu)
-{
- struct pagevec *pvec = &per_cpu(lru_add_pvecs, cpu);
-
- /* CPU is dead, so no locking needed. */
- if (pagevec_count(pvec))
- __pagevec_lru_add(pvec);
- pvec = &per_cpu(lru_add_active_pvecs, cpu);
- if (pagevec_count(pvec))
- __pagevec_lru_add_active(pvec);
-}
-
-void lru_add_drain(void)
-{
- __lru_add_drain(get_cpu());
- put_cpu();
-}
-
#ifdef CONFIG_NUMA
-static void lru_add_drain_per_cpu(void *dummy)
+static void drain_per_cpu(void *dummy)
{
- lru_add_drain();
+ pgrep_add_drain();
}
/*
* Returns 0 for success
*/
-int lru_add_drain_all(void)
+int pgrep_add_drain_all(void)
{
- return schedule_on_each_cpu(lru_add_drain_per_cpu, NULL);
+ return schedule_on_each_cpu(drain_per_cpu, NULL);
}
#else
@@ -196,9 +152,9 @@ int lru_add_drain_all(void)
/*
* Returns 0 for success
*/
-int lru_add_drain_all(void)
+int pgrep_add_drain_all(void)
{
- lru_add_drain();
+ pgrep_add_drain();
return 0;
}
#endif
@@ -297,7 +253,7 @@ void release_pages(struct page **pages,
*/
void __pagevec_release(struct pagevec *pvec)
{
- lru_add_drain();
+ pgrep_add_drain();
release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
pagevec_reinit(pvec);
}
@@ -327,64 +283,6 @@ void __pagevec_release_nonlru(struct pag
}
/*
- * Add the passed pages to the LRU, then drop the caller's refcount
- * on them. Reinitialises the caller's pagevec.
- */
-void __pagevec_lru_add(struct pagevec *pvec)
-{
- int i;
- struct zone *zone = NULL;
-
- for (i = 0; i < pagevec_count(pvec); i++) {
- struct page *page = pvec->pages[i];
- struct zone *pagezone = page_zone(page);
-
- if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
- }
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- add_page_to_inactive_list(zone, page);
- }
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- release_pages(pvec->pages, pvec->nr, pvec->cold);
- pagevec_reinit(pvec);
-}
-
-EXPORT_SYMBOL(__pagevec_lru_add);
-
-void __pagevec_lru_add_active(struct pagevec *pvec)
-{
- int i;
- struct zone *zone = NULL;
-
- for (i = 0; i < pagevec_count(pvec); i++) {
- struct page *page = pvec->pages[i];
- struct zone *pagezone = page_zone(page);
-
- if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
- }
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- BUG_ON(PageActive(page));
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- }
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- release_pages(pvec->pages, pvec->nr, pvec->cold);
- pagevec_reinit(pvec);
-}
-
-/*
* Try to drop buffers from the pages in a pagevec
*/
void pagevec_strip(struct pagevec *pvec)
@@ -473,7 +371,7 @@ static int cpu_swap_callback(struct noti
if (action == CPU_DEAD) {
atomic_add(*committed, &vm_committed_space);
*committed = 0;
- __lru_add_drain((long)hcpu);
+ __pgrep_add_drain((long)hcpu);
}
return NOTIFY_OK;
}
Index: linux-2.6/mm/swap_state.c
===================================================================
--- linux-2.6.orig/mm/swap_state.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/swap_state.c 2006-07-12 16:08:18.000000000 +0200
@@ -16,6 +16,7 @@
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
#include <linux/migrate.h>
+#include <linux/mm_page_replace.h>
#include <asm/pgtable.h>
@@ -276,7 +277,7 @@ void free_pages_and_swap_cache(struct pa
{
struct page **pagep = pages;
- lru_add_drain();
+ pgrep_add_drain();
while (nr) {
int todo = min(nr, PAGEVEC_SIZE);
int i;
@@ -354,7 +355,8 @@ struct page *read_swap_cache_async(swp_e
/*
* Initiate read into locked page and return.
*/
- lru_cache_add_active(new_page);
+ pgrep_hint_active(new_page);
+ pgrep_add(new_page);
swap_readpage(NULL, new_page);
return new_page;
}
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:54.000000000 +0200
@@ -34,6 +34,7 @@
#include <linux/notifier.h>
#include <linux/rwsem.h>
#include <linux/delay.h>
+#include <linux/mm_page_replace.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -630,7 +631,7 @@ static unsigned long shrink_inactive_lis
pagevec_init(&pvec, 1);
- lru_add_drain();
+ pgrep_add_drain();
spin_lock_irq(&zone->lru_lock);
do {
struct page *page;
@@ -757,7 +758,7 @@ static void shrink_active_list(unsigned
reclaim_mapped = 1;
}
- lru_add_drain();
+ pgrep_add_drain();
spin_lock_irq(&zone->lru_lock);
pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
&l_hold, &pgscanned);
Index: linux-2.6/fs/cifs/file.c
===================================================================
--- linux-2.6.orig/fs/cifs/file.c 2006-07-12 16:07:24.000000000 +0200
+++ linux-2.6/fs/cifs/file.c 2006-07-12 16:08:18.000000000 +0200
@@ -30,6 +30,7 @@
#include <linux/smp_lock.h>
#include <linux/writeback.h>
#include <linux/delay.h>
+#include <linux/mm_page_replace.h>
#include <asm/div64.h>
#include "cifsfs.h"
#include "cifspdu.h"
@@ -1654,7 +1655,7 @@ static void cifs_copy_cache_pages(struct
SetPageUptodate(page);
unlock_page(page);
if (!pagevec_add(plru_pvec, page))
- __pagevec_lru_add(plru_pvec);
+ __pagevec_pgrep_add(plru_pvec);
data += PAGE_CACHE_SIZE;
}
return;
@@ -1808,7 +1809,7 @@ static int cifs_readpages(struct file *f
bytes_read = 0;
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_pgrep_add(&lru_pvec);
/* need to free smb_read_data buf before exit */
if (smb_read_data) {
Index: linux-2.6/fs/mpage.c
===================================================================
--- linux-2.6.orig/fs/mpage.c 2006-07-12 16:07:25.000000000 +0200
+++ linux-2.6/fs/mpage.c 2006-07-12 16:08:18.000000000 +0200
@@ -26,6 +26,7 @@
#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
+#include <linux/mm_page_replace.h>
/*
* I/O completion handler for multipage BIOs.
@@ -408,12 +409,12 @@ mpage_readpages(struct address_space *ma
&first_logical_block,
get_block);
if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
+ __pagevec_pgrep_add(&lru_pvec);
} else {
page_cache_release(page);
}
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_pgrep_add(&lru_pvec);
BUG_ON(!list_empty(pages));
if (bio)
mpage_bio_submit(READ, bio);
Index: linux-2.6/fs/ntfs/file.c
===================================================================
--- linux-2.6.orig/fs/ntfs/file.c 2006-07-12 16:07:25.000000000 +0200
+++ linux-2.6/fs/ntfs/file.c 2006-07-12 16:08:18.000000000 +0200
@@ -441,7 +441,7 @@ static inline int __ntfs_grab_cache_page
pages[nr] = *cached_page;
page_cache_get(*cached_page);
if (unlikely(!pagevec_add(lru_pvec, *cached_page)))
- __pagevec_lru_add(lru_pvec);
+ __pagevec_pgrep_add(lru_pvec);
*cached_page = NULL;
}
index++;
@@ -2111,7 +2111,7 @@ err_out:
OSYNC_METADATA|OSYNC_DATA);
}
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_pgrep_add(&lru_pvec);
ntfs_debug("Done. Returning %s (written 0x%lx, status %li).",
written ? "written" : "status", (unsigned long)written,
(long)status);
Index: linux-2.6/mm/readahead.c
===================================================================
--- linux-2.6.orig/mm/readahead.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/readahead.c 2006-07-12 16:08:18.000000000 +0200
@@ -14,6 +14,7 @@
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
+#include <linux/mm_page_replace.h>
void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -146,7 +147,7 @@ int read_cache_pages(struct address_spac
}
ret = filler(data, page);
if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
+ __pagevec_pgrep_add(&lru_pvec);
if (ret) {
while (!list_empty(pages)) {
struct page *victim;
@@ -158,7 +159,7 @@ int read_cache_pages(struct address_spac
break;
}
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_pgrep_add(&lru_pvec);
return ret;
}
@@ -185,13 +186,13 @@ static int read_pages(struct address_spa
ret = mapping->a_ops->readpage(filp, page);
if (ret != AOP_TRUNCATED_PAGE) {
if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
+ __pagevec_pgrep_add(&lru_pvec);
continue;
} /* else fall through to release */
}
page_cache_release(page);
}
- pagevec_lru_add(&lru_pvec);
+ pagevec_pgrep_add(&lru_pvec);
ret = 0;
out:
return ret;
Index: linux-2.6/fs/exec.c
===================================================================
--- linux-2.6.orig/fs/exec.c 2006-07-12 16:07:24.000000000 +0200
+++ linux-2.6/fs/exec.c 2006-07-12 16:08:18.000000000 +0200
@@ -49,6 +49,7 @@
#include <linux/rmap.h>
#include <linux/acct.h>
#include <linux/cn_proc.h>
+#include <linux/mm_page_replace.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -321,7 +322,8 @@ void install_arg_page(struct vm_area_str
goto out;
}
inc_mm_counter(mm, anon_rss);
- lru_cache_add_active(page);
+ pgrep_hint_active(page);
+ pgrep_add(page);
set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
page, vma->vm_page_prot))));
page_add_new_anon_rmap(page, vma, address);
Index: linux-2.6/include/linux/pagevec.h
===================================================================
--- linux-2.6.orig/include/linux/pagevec.h 2006-06-12 06:51:15.000000000 +0200
+++ linux-2.6/include/linux/pagevec.h 2006-07-12 16:08:18.000000000 +0200
@@ -23,8 +23,6 @@ struct pagevec {
void __pagevec_release(struct pagevec *pvec);
void __pagevec_release_nonlru(struct pagevec *pvec);
void __pagevec_free(struct pagevec *pvec);
-void __pagevec_lru_add(struct pagevec *pvec);
-void __pagevec_lru_add_active(struct pagevec *pvec);
void pagevec_strip(struct pagevec *pvec);
unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
pgoff_t start, unsigned nr_pages);
@@ -81,10 +79,4 @@ static inline void pagevec_free(struct p
__pagevec_free(pvec);
}
-static inline void pagevec_lru_add(struct pagevec *pvec)
-{
- if (pagevec_count(pvec))
- __pagevec_lru_add(pvec);
-}
-
#endif /* _LINUX_PAGEVEC_H */
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/swap.h 2006-07-12 16:11:49.000000000 +0200
@@ -163,13 +163,11 @@ extern unsigned int nr_free_buffer_pages
extern unsigned int nr_free_pagecache_pages(void);
/* linux/mm/swap.c */
-extern void FASTCALL(lru_cache_add(struct page *));
-extern void FASTCALL(lru_cache_add_active(struct page *));
extern void FASTCALL(mark_page_accessed(struct page *));
-extern void lru_add_drain(void);
extern int lru_add_drain_all(void);
extern int rotate_reclaimable_page(struct page *page);
extern void swap_setup(void);
+extern void release_pages(struct page **, int, int);
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zone **, gfp_t);
Index: linux-2.6/fs/ramfs/file-nommu.c
===================================================================
--- linux-2.6.orig/fs/ramfs/file-nommu.c 2006-07-12 16:07:25.000000000 +0200
+++ linux-2.6/fs/ramfs/file-nommu.c 2006-07-12 16:08:18.000000000 +0200
@@ -108,7 +108,7 @@ static int ramfs_nommu_expand_for_mappin
goto add_error;
if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
+ __pagevec_pgrep_add(&lru_pvec);
unlock_page(page);
}
Index: linux-2.6/mm/mempolicy.c
===================================================================
--- linux-2.6.orig/mm/mempolicy.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/mempolicy.c 2006-07-12 16:11:46.000000000 +0200
@@ -87,6 +87,7 @@
#include <linux/seq_file.h>
#include <linux/proc_fs.h>
#include <linux/migrate.h>
+#include <linux/mm_page_replace.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c 2006-07-12 16:07:25.000000000 +0200
+++ linux-2.6/fs/splice.c 2006-07-12 16:08:18.000000000 +0200
@@ -21,13 +21,13 @@
#include <linux/file.h>
#include <linux/pagemap.h>
#include <linux/pipe_fs_i.h>
-#include <linux/mm_inline.h>
#include <linux/swap.h>
#include <linux/writeback.h>
#include <linux/buffer_head.h>
#include <linux/module.h>
#include <linux/syscalls.h>
#include <linux/uio.h>
+#include <linux/mm_page_replace.h>
struct partial_page {
unsigned int offset;
@@ -587,7 +587,7 @@ static int pipe_to_file(struct pipe_inod
page_cache_get(page);
if (!(buf->flags & PIPE_BUF_FLAG_LRU))
- lru_cache_add(page);
+ pgrep_add(page);
} else {
find_page:
page = find_lock_page(mapping, index);
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/migrate.c 2006-07-12 16:11:46.000000000 +0200
@@ -24,6 +24,7 @@
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/swapops.h>
+#include <linux/mm_page_replace.h>
#include "internal.h"
@@ -80,7 +81,7 @@ int migrate_prep(void)
* drained them. Those pages will fail to migrate like other
* pages that may be busy.
*/
- lru_add_drain_all();
+ pgrep_add_drain_all();
return 0;
}
@@ -88,16 +89,7 @@ int migrate_prep(void)
static inline void move_to_lru(struct page *page)
{
list_del(&page->lru);
- if (PageActive(page)) {
- /*
- * lru_cache_add_active checks that
- * the PG_active bit is off.
- */
- ClearPageActive(page);
- lru_cache_add_active(page);
- } else {
- lru_cache_add(page);
- }
+ pgrep_add(page);
put_page(page);
}
* [PATCH 5/39] mm: pgrep: add a use-once insertion hint
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (3 preceding siblings ...)
2006-07-12 14:37 ` [PATCH 4/39] mm: pgrep: convert insertion Peter Zijlstra
@ 2006-07-12 14:37 ` Peter Zijlstra
2006-07-12 14:38 ` [PATCH 6/39] mm: pgrep: genericize __pagevec_*_add Peter Zijlstra
` (34 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:37 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Allow for a use-once insertion hint.

API:

  give the page replace algorithm a use-once hint for the given page:

	void pgrep_hint_use_once(struct page *);
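For the use-once policy the hint is an empty function (see the
mm_use_once_policy.h hunk below), since new pages already start out on
the inactive list there. A policy whose default insertion is active
could honour the hint along these lines (a hypothetical sketch, not
part of this patch):

	static inline void pgrep_hint_use_once(struct page *page)
	{
		/* file this page on the inactive list instead */
		ClearPageActive(page);
	}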
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 1 +
include/linux/mm_use_once_policy.h | 4 ++++
mm/filemap.c | 12 ++++++++++++
3 files changed, 17 insertions(+)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/filemap.c 2006-07-12 16:08:18.000000000 +0200
@@ -412,6 +412,18 @@ int add_to_page_cache(struct page *page,
error = radix_tree_insert(&mapping->page_tree, offset, page);
if (!error) {
page_cache_get(page);
+ /*
+ * shmem_getpage()
+ * lookup_swap_cache()
+ * TestSetPageLocked()
+ * move_from_swap_cache()
+ * add_to_page_cache()
+ *
+ * That path calls us with a LRU page instead of a new
+ * page. Don't set the hint for LRU pages.
+ */
+ if (!PageLocked(page))
+ pgrep_hint_use_once(page);
SetPageLocked(page);
page->mapping = mapping;
page->index = offset;
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:54.000000000 +0200
@@ -9,6 +9,7 @@
#include <linux/mm_inline.h>
/* void pgrep_hint_active(struct page *); */
+/* void pgrep_hint_use_once(struct page *); */
extern void fastcall pgrep_add(struct page *);
/* void __pgrep_add(struct zone *, struct page *); */
/* void pgrep_add_drain(void); */
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:54.000000000 +0200
@@ -8,6 +8,10 @@ static inline void pgrep_hint_active(str
SetPageActive(page);
}
+static inline void pgrep_hint_use_once(struct page *page)
+{
+}
+
static inline void
__pgrep_add(struct zone *zone, struct page *page)
{
* [PATCH 6/39] mm: pgrep: genericize __pagevec_*_add
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (4 preceding siblings ...)
2006-07-12 14:37 ` [PATCH 5/39] mm: pgrep: add a use-once insertion hint Peter Zijlstra
@ 2006-07-12 14:38 ` Peter Zijlstra
2006-07-12 14:38 ` [PATCH 7/39] mm: pgrep: abstract the activation logic Peter Zijlstra
` (33 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Since PG_active is already used to discriminate between the active and
inactive lists, use it to collapse the two pagevec add functions into a
single generic helper.
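Both per-CPU pagevecs now drain through __pagevec_pgrep_add(), and the
PG_active bit carried on each page picks the target list via the
policy's __pgrep_add() helper introduced in patch 4:

	static inline void
	__pgrep_add(struct zone *zone, struct page *page)
	{
		if (PageActive(page))
			add_page_to_active_list(zone, page);
		else
			add_page_to_inactive_list(zone, page);
	}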
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_inline.h | 6 ++++
mm/swap.c | 31 ++++++++++++++++++++
mm/useonce.c | 68 ++--------------------------------------------
3 files changed, 41 insertions(+), 64 deletions(-)
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:49.000000000 +0200
@@ -11,64 +11,6 @@
static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
-/*
- * Add the passed pages to the LRU, then drop the caller's refcount
- * on them. Reinitialises the caller's pagevec.
- */
-void __pagevec_pgrep_add(struct pagevec *pvec)
-{
- int i;
- struct zone *zone = NULL;
-
- for (i = 0; i < pagevec_count(pvec); i++) {
- struct page *page = pvec->pages[i];
- struct zone *pagezone = page_zone(page);
-
- if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
- }
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- add_page_to_inactive_list(zone, page);
- }
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- release_pages(pvec->pages, pvec->nr, pvec->cold);
- pagevec_reinit(pvec);
-}
-
-EXPORT_SYMBOL(__pagevec_pgrep_add);
-
-static void __pagevec_lru_add_active(struct pagevec *pvec)
-{
- int i;
- struct zone *zone = NULL;
-
- for (i = 0; i < pagevec_count(pvec); i++) {
- struct page *page = pvec->pages[i];
- struct zone *pagezone = page_zone(page);
-
- if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
- }
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- BUG_ON(PageActive(page));
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- }
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- release_pages(pvec->pages, pvec->nr, pvec->cold);
- pagevec_reinit(pvec);
-}
-
static inline void lru_cache_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
@@ -85,18 +27,16 @@ static inline void lru_cache_add_active(
page_cache_get(page);
if (!pagevec_add(pvec, page))
- __pagevec_lru_add_active(pvec);
+ __pagevec_pgrep_add(pvec);
put_cpu_var(lru_add_active_pvecs);
}
void fastcall pgrep_add(struct page *page)
{
- if (PageActive(page)) {
- ClearPageActive(page);
+ if (PageActive(page))
lru_cache_add_active(page);
- } else {
+ else
lru_cache_add(page);
- }
}
void __pgrep_add_drain(unsigned int cpu)
@@ -107,5 +47,5 @@ void __pgrep_add_drain(unsigned int cpu)
__pagevec_pgrep_add(pvec);
pvec = &per_cpu(lru_add_active_pvecs, cpu);
if (pagevec_count(pvec))
- __pagevec_lru_add_active(pvec);
+ __pagevec_pgrep_add(pvec);
}
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/swap.c 2006-07-12 16:11:50.000000000 +0200
@@ -335,6 +335,37 @@ unsigned pagevec_lookup_tag(struct pagev
EXPORT_SYMBOL(pagevec_lookup_tag);
+/*
+ * Add the passed pages to the LRU, then drop the caller's refcount
+ * on them. Reinitialises the caller's pagevec.
+ */
+void __pagevec_pgrep_add(struct pagevec *pvec)
+{
+ int i;
+ struct zone *zone = NULL;
+
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+ struct zone *pagezone = page_zone(page);
+
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ __pgrep_add(zone, page);
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ release_pages(pvec->pages, pvec->nr, pvec->cold);
+ pagevec_reinit(pvec);
+}
+
+EXPORT_SYMBOL(__pagevec_pgrep_add);
+
#ifdef CONFIG_SMP
/*
* We tolerate a little inaccuracy to avoid ping-ponging the counter between
Index: linux-2.6/include/linux/mm_inline.h
===================================================================
--- linux-2.6.orig/include/linux/mm_inline.h 2006-06-23 21:47:11.000000000 +0200
+++ linux-2.6/include/linux/mm_inline.h 2006-07-12 16:11:44.000000000 +0200
@@ -1,3 +1,7 @@
+#ifndef _LINUX_MM_INLINE_H_
+#define _LINUX_MM_INLINE_H_
+
+#ifdef __KERNEL__
static inline void
add_page_to_active_list(struct zone *zone, struct page *page)
@@ -39,3 +43,5 @@ del_page_from_lru(struct zone *zone, str
}
}
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_INLINE_H_ */
* [PATCH 7/39] mm: pgrep: abstract the activation logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (5 preceding siblings ...)
2006-07-12 14:38 ` [PATCH 6/39] mm: pgrep: generice __pagevec_*_add Peter Zijlstra
@ 2006-07-12 14:38 ` Peter Zijlstra
2006-07-12 14:38 ` [PATCH 8/39] mm: pgrep: move useful macros around Peter Zijlstra
` (32 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Abstract page activation and the reclaimable condition.

API:

  whether the page is reclaimable:

	reclaim_t pgrep_reclaimable(struct page *);

	  RECLAIM_KEEP       - keep the page
	  RECLAIM_ACTIVATE   - keep the page and activate it
	  RECLAIM_REFERENCED - try to pageout even though referenced
	  RECLAIM_OK         - try to pageout

  activate the page:

	int pgrep_activate(struct page *page);
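On the caller side shrink_page_list() becomes a switch on the verdict
(condensed from the vmscan.c hunk below):

	switch (pgrep_reclaimable(page)) {
	case RECLAIM_KEEP:
		goto keep_locked;
	case RECLAIM_ACTIVATE:
		goto activate_locked;
	case RECLAIM_REFERENCED:
		referenced = 1;
		break;
	case RECLAIM_OK:
		break;
	}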
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 11 ++++++++
include/linux/mm_use_once_policy.h | 48 +++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 42 ++++++++++----------------------
3 files changed, 72 insertions(+), 29 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:50.000000000 +0200
@@ -3,6 +3,9 @@
#ifdef __KERNEL__
+#include <linux/fs.h>
+#include <linux/rmap.h>
+
static inline void pgrep_hint_active(struct page *page)
{
SetPageActive(page);
@@ -21,5 +24,50 @@ __pgrep_add(struct zone *zone, struct pa
add_page_to_inactive_list(zone, page);
}
+/* Called without lock on whether page is mapped, so answer is unstable */
+static inline int page_mapping_inuse(struct page *page)
+{
+ struct address_space *mapping;
+
+ /* Page is in somebody's page tables. */
+ if (page_mapped(page))
+ return 1;
+
+ /* Be more reluctant to reclaim swapcache than pagecache */
+ if (PageSwapCache(page))
+ return 1;
+
+ mapping = page_mapping(page);
+ if (!mapping)
+ return 0;
+
+ /* File is mmap'd by somebody? */
+ return mapping_mapped(mapping);
+}
+
+static inline reclaim_t pgrep_reclaimable(struct page *page)
+{
+ int referenced;
+
+ if (PageActive(page))
+ BUG();
+
+ referenced = page_referenced(page, 1);
+ /* In active use or really unfreeable? Activate it. */
+ if (referenced && page_mapping_inuse(page))
+ return RECLAIM_ACTIVATE;
+
+ if (referenced)
+ return RECLAIM_REFERENCED;
+
+ return RECLAIM_OK;
+}
+
+static inline int pgrep_activate(struct page *page)
+{
+ SetPageActive(page);
+ return 1;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:53.000000000 +0200
@@ -17,6 +17,17 @@ extern void __pgrep_add_drain(unsigned i
extern int pgrep_add_drain_all(void);
extern void __pagevec_pgrep_add(struct pagevec *);
+typedef enum {
+ RECLAIM_KEEP,
+ RECLAIM_ACTIVATE,
+ RECLAIM_REFERENCED,
+ RECLAIM_OK,
+} reclaim_t;
+
+/* reclaim_t pgrep_reclaimable(struct page *); */
+/* int pgrep_activate(struct page *page); */
+
+
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_policy.h>
#else
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:53.000000000 +0200
@@ -229,27 +229,6 @@ unsigned long shrink_slab(unsigned long
return ret;
}
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
- struct address_space *mapping;
-
- /* Page is in somebody's page tables. */
- if (page_mapped(page))
- return 1;
-
- /* Be more reluctant to reclaim swapcache than pagecache */
- if (PageSwapCache(page))
- return 1;
-
- mapping = page_mapping(page);
- if (!mapping)
- return 0;
-
- /* File is mmap'd by somebody? */
- return mapping_mapped(mapping);
-}
-
static inline int is_page_cache_freeable(struct page *page)
{
return page_count(page) - !!PagePrivate(page) == 2;
@@ -419,7 +398,7 @@ static unsigned long shrink_page_list(st
struct address_space *mapping;
struct page *page;
int may_enter_fs;
- int referenced;
+ int referenced = 0;
cond_resched();
@@ -429,8 +408,6 @@ static unsigned long shrink_page_list(st
if (TestSetPageLocked(page))
goto keep;
- BUG_ON(PageActive(page));
-
sc->nr_scanned++;
if (!sc->may_swap && page_mapped(page))
@@ -443,10 +420,17 @@ static unsigned long shrink_page_list(st
if (PageWriteback(page))
goto keep_locked;
- referenced = page_referenced(page, 1);
- /* In active use or really unfreeable? Activate it. */
- if (referenced && page_mapping_inuse(page))
+ switch (pgrep_reclaimable(page)) {
+ case RECLAIM_KEEP:
+ goto keep_locked;
+ case RECLAIM_ACTIVATE:
goto activate_locked;
+ case RECLAIM_REFERENCED:
+ referenced = 1;
+ break;
+ case RECLAIM_OK:
+ break;
+ }
#ifdef CONFIG_SWAP
/*
@@ -549,8 +533,8 @@ free_it:
continue;
activate_locked:
- SetPageActive(page);
- pgactivate++;
+ if (pgrep_activate(page))
+ pgactivate++;
keep_locked:
unlock_page(page);
keep:
* [PATCH 8/39] mm: pgrep: move useful macros around
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (6 preceding siblings ...)
2006-07-12 14:38 ` [PATCH 7/39] mm: pgrep: abstract the activation logic Peter Zijlstra
@ 2006-07-12 14:38 ` Peter Zijlstra
2006-07-12 14:38 ` [PATCH 9/39] mm: pgrep: move struct scan_control around Peter Zijlstra
` (31 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Move the macros out of vmscan.c and into the generic page replace
header so the rest of the world can use them too.
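They implement the usual tail-pull scan idiom, e.g. (a sketch of the
pattern isolate_lru_pages() already uses):

	struct page *page = lru_to_page(&zone->inactive_list);

	/* prefetch the flags word of the next-oldest page */
	prefetchw_prev_lru_page(page, &zone->inactive_list, flags);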
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 30 ++++++++++++++++++++++++++++++
mm/vmscan.c | 30 ------------------------------
2 files changed, 30 insertions(+), 30 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:52.000000000 +0200
@@ -8,6 +8,36 @@
#include <linux/pagevec.h>
#include <linux/mm_inline.h>
+#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
+
+#ifdef ARCH_HAS_PREFETCH
+#define prefetch_prev_lru_page(_page, _base, _field) \
+ do { \
+ if ((_page)->lru.prev != _base) { \
+ struct page *prev; \
+ \
+ prev = lru_to_page(&(_page->lru)); \
+ prefetch(&prev->_field); \
+ } \
+ } while (0)
+#else
+#define prefetch_prev_lru_page(_page, _base, _field) do { } while (0)
+#endif
+
+#ifdef ARCH_HAS_PREFETCHW
+#define prefetchw_prev_lru_page(_page, _base, _field) \
+ do { \
+ if ((_page)->lru.prev != _base) { \
+ struct page *prev; \
+ \
+ prev = lru_to_page(&(_page->lru)); \
+ prefetchw(&prev->_field); \
+ } \
+ } while (0)
+#else
+#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
+#endif
+
/* void pgrep_hint_active(struct page *); */
/* void pgrep_hint_use_once(struct page *); */
extern void fastcall pgrep_add(struct page *);
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:52.000000000 +0200
@@ -77,36 +77,6 @@ struct shrinker {
long nr; /* objs pending delete */
};
-#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
-
-#ifdef ARCH_HAS_PREFETCH
-#define prefetch_prev_lru_page(_page, _base, _field) \
- do { \
- if ((_page)->lru.prev != _base) { \
- struct page *prev; \
- \
- prev = lru_to_page(&(_page->lru)); \
- prefetch(&prev->_field); \
- } \
- } while (0)
-#else
-#define prefetch_prev_lru_page(_page, _base, _field) do { } while (0)
-#endif
-
-#ifdef ARCH_HAS_PREFETCHW
-#define prefetchw_prev_lru_page(_page, _base, _field) \
- do { \
- if ((_page)->lru.prev != _base) { \
- struct page *prev; \
- \
- prev = lru_to_page(&(_page->lru)); \
- prefetchw(&prev->_field); \
- } \
- } while (0)
-#else
-#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
-#endif
-
/*
* From 0 .. 100. Higher means more swappy.
*/
* [PATCH 9/39] mm: pgrep: move struct scan_control around
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (7 preceding siblings ...)
2006-07-12 14:38 ` [PATCH 8/39] mm: pgrep: move useful macros around Peter Zijlstra
@ 2006-07-12 14:38 ` Peter Zijlstra
2006-07-12 14:38 ` [PATCH 10/39] mm: pgrep: isolate the reclaim_mapped logic Peter Zijlstra
` (30 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Move struct scan_control to the general pgrep header so that all
policies can make use of it.
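A policy entry point then fills the structure in the same way the
existing callers do, e.g. modelled on try_to_free_pages():

	struct scan_control sc = {
		.gfp_mask	  = gfp_mask,
		.may_writepage	  = !laptop_mode,
		.may_swap	  = 1,
		.swap_cluster_max = SWAP_CLUSTER_MAX,
	};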
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 23 +++++++++++++++++++++++
mm/vmscan.c | 23 -----------------------
2 files changed, 23 insertions(+), 23 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:51.000000000 +0200
@@ -8,6 +8,29 @@
#include <linux/pagevec.h>
#include <linux/mm_inline.h>
+struct scan_control {
+ /* Incremented by the number of inactive pages that were scanned */
+ unsigned long nr_scanned;
+
+ unsigned long nr_mapped; /* From page_state */
+
+ /* This context's GFP mask */
+ gfp_t gfp_mask;
+
+ int may_writepage;
+
+ /* Can pages be swapped as part of reclaim? */
+ int may_swap;
+
+ /* This context's SWAP_CLUSTER_MAX. If freeing memory for
+ * suspend, we effectively ignore SWAP_CLUSTER_MAX.
+ * In this context, it doesn't matter that we scan the
+ * whole list at once. */
+ int swap_cluster_max;
+
+ unsigned long nr_writeout; /* pages for which writeout was started */
+};
+
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
#ifdef ARCH_HAS_PREFETCH
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:51.000000000 +0200
@@ -43,29 +43,6 @@
#include "internal.h"
-struct scan_control {
- /* Incremented by the number of inactive pages that were scanned */
- unsigned long nr_scanned;
-
- unsigned long nr_mapped; /* From page_state */
-
- /* This context's GFP mask */
- gfp_t gfp_mask;
-
- int may_writepage;
-
- /* Can pages be swapped as part of reclaim? */
- int may_swap;
-
- /* This context's SWAP_CLUSTER_MAX. If freeing memory for
- * suspend, we effectively ignore SWAP_CLUSTER_MAX.
- * In this context, it doesn't matter that we scan the
- * whole list at once. */
- int swap_cluster_max;
-
- unsigned long nr_writeout; /* pages for which writeout was started */
-};
-
/*
* The list of shrinker callbacks used by to apply pressure to
* ageable caches.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 10/39] mm: pgrep: isolate the reclaim_mapped logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (8 preceding siblings ...)
2006-07-12 14:38 ` [PATCH 9/39] mm: pgrep: move struct scan_control around Peter Zijlstra
@ 2006-07-12 14:38 ` Peter Zijlstra
2006-07-12 14:39 ` [PATCH 11/39] mm: pgrep: replace mark_page_accessed Peter Zijlstra
` (29 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:38 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Move the reclaim_mapped code over to its own function so that other
reclaim policies can make use of it.
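The heuristic itself is unchanged. A worked example, with numbers
picked purely for illustration: at zone->prev_priority == 6 with 60% of
memory mapped and vm_swappiness == 60, distress = 100 >> 6 = 1 and
swap_tendency = 60/2 + 1 + 60 = 91, which is below 100, so mapped pages
are left alone; raising vm_swappiness to 70 gives 101 and flips the
decision.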
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 2
mm/vmscan.c | 95 ++++++++++++++++++++--------------------
2 files changed, 49 insertions(+), 48 deletions(-)
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:49.000000000 +0200
@@ -618,6 +618,50 @@ done:
return nr_reclaimed;
}
+int should_reclaim_mapped(struct zone *zone)
+{
+ long mapped_ratio;
+ long distress;
+ long swap_tendency;
+
+ /*
+ * `distress' is a measure of how much trouble we're having
+ * reclaiming pages. 0 -> no problems. 100 -> great trouble.
+ */
+ distress = 100 >> zone->prev_priority;
+
+ /*
+ * The point of this algorithm is to decide when to start
+ * reclaiming mapped memory instead of just pagecache. Work out
+ * how much memory
+ * is mapped.
+ */
+ mapped_ratio = (read_page_state(nr_mapped) * 100) / total_memory;
+
+ /*
+ * Now decide how much we really want to unmap some pages. The
+ * mapped ratio is downgraded - just because there's a lot of
+ * mapped memory doesn't necessarily mean that page reclaim
+ * isn't succeeding.
+ *
+ * The distress ratio is important - we don't want to start
+ * going oom.
+ *
+ * A 100% value of vm_swappiness overrides this algorithm
+ * altogether.
+ */
+ swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
+
+ /*
+ * Now use this metric to decide whether to start moving mapped
+ * memory onto the inactive list.
+ */
+ if (swap_tendency >= 100)
+ return 1;
+
+ return 0;
+}
+
/*
* This moves pages from the active list to the inactive list.
*
@@ -636,7 +680,7 @@ done:
* But we had to alter page->flags anyway.
*/
static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
- struct scan_control *sc)
+ struct scan_control *sc, int reclaim_mapped)
{
unsigned long pgmoved;
int pgdeactivate = 0;
@@ -646,48 +690,9 @@ static void shrink_active_list(unsigned
LIST_HEAD(l_active); /* Pages to go onto the active_list */
struct page *page;
struct pagevec pvec;
- int reclaim_mapped = 0;
-
- if (sc->may_swap) {
- long mapped_ratio;
- long distress;
- long swap_tendency;
-
- /*
- * `distress' is a measure of how much trouble we're having
- * reclaiming pages. 0 -> no problems. 100 -> great trouble.
- */
- distress = 100 >> zone->prev_priority;
-
- /*
- * The point of this algorithm is to decide when to start
- * reclaiming mapped memory instead of just pagecache. Work out
- * how much memory
- * is mapped.
- */
- mapped_ratio = (sc->nr_mapped * 100) / total_memory;
- /*
- * Now decide how much we really want to unmap some pages. The
- * mapped ratio is downgraded - just because there's a lot of
- * mapped memory doesn't necessarily mean that page reclaim
- * isn't succeeding.
- *
- * The distress ratio is important - we don't want to start
- * going oom.
- *
- * A 100% value of vm_swappiness overrides this algorithm
- * altogether.
- */
- swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-
- /*
- * Now use this metric to decide whether to start moving mapped
- * memory onto the inactive list.
- */
- if (swap_tendency >= 100)
- reclaim_mapped = 1;
- }
+ if (!sc->may_swap)
+ reclaim_mapped = 0;
pgrep_add_drain();
spin_lock_irq(&zone->lru_lock);
@@ -781,6 +786,7 @@ static unsigned long shrink_zone(int pri
unsigned long nr_inactive;
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+ int reclaim_mapped = should_reclaim_mapped(zone);
atomic_inc(&zone->reclaim_in_progress);
@@ -807,7 +813,7 @@ static unsigned long shrink_zone(int pri
nr_to_scan = min(nr_active,
(unsigned long)sc->swap_cluster_max);
nr_active -= nr_to_scan;
- shrink_active_list(nr_to_scan, zone, sc);
+ shrink_active_list(nr_to_scan, zone, sc, reclaim_mapped);
}
if (nr_inactive) {
@@ -910,7 +916,6 @@ unsigned long try_to_free_pages(struct z
}
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- sc.nr_mapped = read_page_state(nr_mapped);
sc.nr_scanned = 0;
if (!priority)
disable_swap_token();
@@ -1000,7 +1005,6 @@ loop_again:
total_scanned = 0;
nr_reclaimed = 0;
sc.may_writepage = !laptop_mode;
- sc.nr_mapped = read_page_state(nr_mapped);
inc_page_state(pageoutrun);
@@ -1351,7 +1355,6 @@ static int __zone_reclaim(struct zone *z
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
.may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
- .nr_mapped = read_page_state(nr_mapped),
.swap_cluster_max = max_t(unsigned long, nr_pages,
SWAP_CLUSTER_MAX),
.gfp_mask = gfp_mask,
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:50.000000000 +0200
@@ -12,8 +12,6 @@ struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
unsigned long nr_scanned;
- unsigned long nr_mapped; /* From page_state */
-
/* This context's GFP mask */
gfp_t gfp_mask;
--
* [PATCH 11/39] mm: pgrep: replace mark_page_accessed
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (9 preceding siblings ...)
2006-07-12 14:38 ` [PATCH 10/39] mm: pgrep: isolate the reclaim_mapped logic Peter Zijlstra
@ 2006-07-12 14:39 ` Peter Zijlstra
2006-07-12 14:39 ` [PATCH 12/39] mm: pgrep: move the shrink logic Peter Zijlstra
` (28 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:39 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Abstract page activation.
API:
Mark a page as accessed.
void pgrep_mark_accessed(struct page *);
XXX: go through tree and rename mark_page_accessed() ?
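As a userspace sketch of the two-step promotion this hook implements
(locking and the LRU check are omitted; plain ints stand in for the page
flag bits):

#include <stdio.h>

struct page_model { int active; int referenced; };

static void mark_accessed_model(struct page_model *p)
{
        if (!p->active && p->referenced) {
                p->active = 1;          /* promote to the active list */
                p->referenced = 0;
        } else if (!p->referenced) {
                p->referenced = 1;      /* first access only sets the bit */
        }
}

int main(void)
{
        struct page_model p = { 0, 0 };
        int i;

        /* prints 0/1, then 1/0, then 1/1 - the transitions documented
           in the code below */
        for (i = 0; i < 3; i++) {
                mark_accessed_model(&p);
                printf("active=%d referenced=%d\n", p.active, p.referenced);
        }
        return 0;
}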
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 1 +
include/linux/mm_use_once_policy.h | 26 ++++++++++++++++++++++++++
mm/swap.c | 28 +---------------------------
3 files changed, 28 insertions(+), 27 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:47.000000000 +0200
@@ -24,6 +24,32 @@ __pgrep_add(struct zone *zone, struct pa
add_page_to_inactive_list(zone, page);
}
+/*
+ * Mark a page as having seen activity.
+ *
+ * inactive,unreferenced -> inactive,referenced
+ * inactive,referenced -> active,unreferenced
+ * active,unreferenced -> active,referenced
+ */
+static inline void pgrep_mark_accessed(struct page *page)
+{
+ if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
+ struct zone *zone = page_zone(page);
+
+ spin_lock_irq(&zone->lru_lock);
+ if (PageLRU(page) && !PageActive(page)) {
+ del_page_from_inactive_list(zone, page);
+ SetPageActive(page);
+ add_page_to_active_list(zone, page);
+ inc_page_state(pgactivate);
+ }
+ spin_unlock_irq(&zone->lru_lock);
+ ClearPageReferenced(page);
+ } else if (!PageReferenced(page)) {
+ SetPageReferenced(page);
+ }
+}
+
/* Called without lock on whether page is mapped, so answer is unstable */
static inline int page_mapping_inuse(struct page *page)
{
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:08:28.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:49.000000000 +0200
@@ -77,6 +77,7 @@ typedef enum {
/* reclaim_t pgrep_reclaimable(struct page *); */
/* int pgrep_activate(struct page *page); */
+/* void pgrep_mark_accessed(struct page *); */
#ifdef CONFIG_MM_POLICY_USEONCE
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/swap.c 2006-07-12 16:11:47.000000000 +0200
@@ -98,37 +98,11 @@ int rotate_reclaimable_page(struct page
}
/*
- * FIXME: speed this up?
- */
-void fastcall activate_page(struct page *page)
-{
- struct zone *zone = page_zone(page);
-
- spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page) && !PageActive(page)) {
- del_page_from_inactive_list(zone, page);
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- inc_page_state(pgactivate);
- }
- spin_unlock_irq(&zone->lru_lock);
-}
-
-/*
* Mark a page as having seen activity.
- *
- * inactive,unreferenced -> inactive,referenced
- * inactive,referenced -> active,unreferenced
- * active,unreferenced -> active,referenced
*/
void fastcall mark_page_accessed(struct page *page)
{
- if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
- activate_page(page);
- ClearPageReferenced(page);
- } else if (!PageReferenced(page)) {
- SetPageReferenced(page);
- }
+ pgrep_mark_accessed(page);
}
EXPORT_SYMBOL(mark_page_accessed);
--
* [PATCH 12/39] mm: pgrep: move the shrink logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (10 preceding siblings ...)
2006-07-12 14:39 ` [PATCH 11/39] mm: pgrep: replace mark_page_accessed Peter Zijlstra
@ 2006-07-12 14:39 ` Peter Zijlstra
2006-07-12 14:39 ` [PATCH 13/39] mm: pgrep: abstract rotate_reclaimable_page() Peter Zijlstra
` (27 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:39 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Move the whole per-zone shrinker into the policy files.
Share the shrink_list logic across policies, since it no longer knows about
policy internals and exclusively deals with pageout.
API:
Shrink the specified zone.
unsigned long pgrep_shrink_zone(int priority, struct zone *, struct scan_control *);
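The deferred scan budget that pgrep_shrink_zone() keeps per zone can be
modelled in userspace like this (DEF_PRIORITY is 12 and SWAP_CLUSTER_MAX
is 32 in this tree; the list length is made up):

#include <stdio.h>

int main(void)
{
        unsigned long nr_active = 1000;
        unsigned long nr_scan = 0;
        unsigned long swap_cluster_max = 32;
        int priority;

        for (priority = 12; priority >= 0; priority--) {
                /* the "+ 1" makes sure long lists are still sifted slowly */
                nr_scan += (nr_active >> priority) + 1;
                if (nr_scan >= swap_cluster_max) {
                        printf("priority %2d: scan %lu pages\n",
                               priority, nr_scan);
                        nr_scan = 0;
                }
        }
        return 0;
}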
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 2
include/linux/swap.h | 9 +
mm/useonce.c | 242 ++++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 251 ----------------------------------------
4 files changed, 257 insertions(+), 247 deletions(-)
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/include/linux/swap.h 2006-07-12 16:11:45.000000000 +0200
@@ -7,6 +7,7 @@
#include <linux/mmzone.h>
#include <linux/list.h>
#include <linux/sched.h>
+#include <linux/mm_page_replace.h>
#include <asm/atomic.h>
#include <asm/page.h>
@@ -170,10 +171,16 @@ extern void swap_setup(void);
extern void release_pages(struct page **, int, int);
/* linux/mm/vmscan.c */
+extern int remove_mapping(struct address_space *mapping, struct page *page);
+extern unsigned long shrink_page_list(struct list_head *page_list,
+ struct scan_control *sc);
+extern unsigned long isolate_lru_pages(unsigned long nr_to_scan,
+ struct list_head *src, struct list_head *dst,
+ unsigned long *scanned);
+extern int should_reclaim_mapped(struct zone *zone);
extern unsigned long try_to_free_pages(struct zone **, gfp_t);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
-extern int remove_mapping(struct address_space *mapping, struct page *page);
/* possible outcome of pageout() */
typedef enum {
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:45.000000000 +0200
@@ -1,8 +1,10 @@
#include <linux/mm_page_replace.h>
-#include <linux/mm_inline.h>
#include <linux/swap.h>
#include <linux/module.h>
#include <linux/pagemap.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h> /* for try_to_release_page(),
+ buffer_heads_over_limit */
/**
* lru_cache_add: add a page to the page lists
@@ -49,3 +51,241 @@ void __pgrep_add_drain(unsigned int cpu)
if (pagevec_count(pvec))
__pagevec_pgrep_add(pvec);
}
+
+/*
+ * shrink_inactive_list() is a helper for shrink_zone(). It returns the number
+ * of reclaimed pages
+ */
+static unsigned long shrink_inactive_list(unsigned long max_scan,
+ struct zone *zone, struct scan_control *sc)
+{
+ LIST_HEAD(page_list);
+ struct pagevec pvec;
+ unsigned long nr_scanned = 0;
+ unsigned long nr_reclaimed = 0;
+ pagevec_init(&pvec, 1);
+
+ pgrep_add_drain();
+ spin_lock_irq(&zone->lru_lock);
+ do {
+ struct page *page;
+ unsigned long nr_taken;
+ unsigned long nr_scan;
+ unsigned long nr_freed;
+
+ nr_taken = isolate_lru_pages(sc->swap_cluster_max,
+ &zone->inactive_list,
+ &page_list, &nr_scan);
+ zone->nr_inactive -= nr_taken;
+ zone->pages_scanned += nr_scan;
+ spin_unlock_irq(&zone->lru_lock);
+
+ nr_scanned += nr_scan;
+ nr_freed = shrink_page_list(&page_list, sc);
+ nr_reclaimed += nr_freed;
+ local_irq_disable();
+ if (current_is_kswapd()) {
+ __mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
+ __mod_page_state(kswapd_steal, nr_freed);
+ } else
+ __mod_page_state_zone(zone, pgscan_direct, nr_scan);
+ __mod_page_state_zone(zone, pgsteal, nr_freed);
+
+ if (nr_taken == 0)
+ goto done;
+
+ spin_lock(&zone->lru_lock);
+ /*
+ * Put back any unfreeable pages.
+ */
+ while (!list_empty(&page_list)) {
+ page = lru_to_page(&page_list);
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ list_del(&page->lru);
+ if (PageActive(page))
+ add_page_to_active_list(zone, page);
+ else
+ add_page_to_inactive_list(zone, page);
+ if (!pagevec_add(&pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ } while (nr_scanned < max_scan);
+ spin_unlock(&zone->lru_lock);
+done:
+ local_irq_enable();
+ pagevec_release(&pvec);
+ return nr_reclaimed;
+}
+
+/*
+ * This moves pages from the active list to the inactive list.
+ *
+ * We move them the other way if the page is referenced by one or more
+ * processes, from rmap.
+ *
+ * If the pages are mostly unmapped, the processing is fast and it is
+ * appropriate to hold zone->lru_lock across the whole operation. But if
+ * the pages are mapped, the processing is slow (page_referenced()) so we
+ * should drop zone->lru_lock around each page. It's impossible to balance
+ * this, so instead we remove the pages from the LRU while processing them.
+ * It is safe to rely on PG_active against the non-LRU pages in here because
+ * nobody will play with that bit on a non-LRU page.
+ *
+ * The downside is that we have to touch page->_count against each page.
+ * But we had to alter page->flags anyway.
+ */
+static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
+ struct scan_control *sc, int reclaim_mapped)
+{
+ unsigned long pgmoved;
+ int pgdeactivate = 0;
+ unsigned long pgscanned;
+ LIST_HEAD(l_hold); /* The pages which were snipped off */
+ LIST_HEAD(l_inactive); /* Pages to go onto the inactive_list */
+ LIST_HEAD(l_active); /* Pages to go onto the active_list */
+ struct page *page;
+ struct pagevec pvec;
+
+ if (!sc->may_swap)
+ reclaim_mapped = 0;
+
+ pgrep_add_drain();
+ spin_lock_irq(&zone->lru_lock);
+ pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
+ &l_hold, &pgscanned);
+ zone->pages_scanned += pgscanned;
+ zone->nr_active -= pgmoved;
+ spin_unlock_irq(&zone->lru_lock);
+
+ while (!list_empty(&l_hold)) {
+ cond_resched();
+ page = lru_to_page(&l_hold);
+ list_del(&page->lru);
+ if (page_mapped(page)) {
+ if (!reclaim_mapped ||
+ (total_swap_pages == 0 && PageAnon(page)) ||
+ page_referenced(page, 0)) {
+ list_add(&page->lru, &l_active);
+ continue;
+ }
+ }
+ list_add(&page->lru, &l_inactive);
+ }
+
+ pagevec_init(&pvec, 1);
+ pgmoved = 0;
+ spin_lock_irq(&zone->lru_lock);
+ while (!list_empty(&l_inactive)) {
+ page = lru_to_page(&l_inactive);
+ prefetchw_prev_lru_page(page, &l_inactive, flags);
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ BUG_ON(!PageActive(page));
+ ClearPageActive(page);
+
+ list_move(&page->lru, &zone->inactive_list);
+ pgmoved++;
+ if (!pagevec_add(&pvec, page)) {
+ zone->nr_inactive += pgmoved;
+ spin_unlock_irq(&zone->lru_lock);
+ pgdeactivate += pgmoved;
+ pgmoved = 0;
+ if (buffer_heads_over_limit)
+ pagevec_strip(&pvec);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ zone->nr_inactive += pgmoved;
+ pgdeactivate += pgmoved;
+ if (buffer_heads_over_limit) {
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_strip(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+
+ pgmoved = 0;
+ while (!list_empty(&l_active)) {
+ page = lru_to_page(&l_active);
+ prefetchw_prev_lru_page(page, &l_active, flags);
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ BUG_ON(!PageActive(page));
+ list_move(&page->lru, &zone->active_list);
+ pgmoved++;
+ if (!pagevec_add(&pvec, page)) {
+ zone->nr_active += pgmoved;
+ pgmoved = 0;
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ zone->nr_active += pgmoved;
+ spin_unlock(&zone->lru_lock);
+
+ __mod_page_state_zone(zone, pgrefill, pgscanned);
+ __mod_page_state(pgdeactivate, pgdeactivate);
+ local_irq_enable();
+
+ pagevec_release(&pvec);
+}
+
+/*
+ * This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
+ */
+unsigned long pgrep_shrink_zone(int priority, struct zone *zone,
+ struct scan_control *sc)
+{
+ unsigned long nr_active;
+ unsigned long nr_inactive;
+ unsigned long nr_to_scan;
+ unsigned long nr_reclaimed = 0;
+ int reclaim_mapped = should_reclaim_mapped(zone);
+
+ atomic_inc(&zone->reclaim_in_progress);
+
+ /*
+ * Add one to `nr_to_scan' just to make sure that the kernel will
+ * slowly sift through the active list.
+ */
+ zone->nr_scan_active += (zone->nr_active >> priority) + 1;
+ nr_active = zone->nr_scan_active;
+ if (nr_active >= sc->swap_cluster_max)
+ zone->nr_scan_active = 0;
+ else
+ nr_active = 0;
+
+ zone->nr_scan_inactive += (zone->nr_inactive >> priority) + 1;
+ nr_inactive = zone->nr_scan_inactive;
+ if (nr_inactive >= sc->swap_cluster_max)
+ zone->nr_scan_inactive = 0;
+ else
+ nr_inactive = 0;
+
+ while (nr_active || nr_inactive) {
+ if (nr_active) {
+ nr_to_scan = min(nr_active,
+ (unsigned long)sc->swap_cluster_max);
+ nr_active -= nr_to_scan;
+ shrink_active_list(nr_to_scan, zone, sc, reclaim_mapped);
+ }
+
+ if (nr_inactive) {
+ nr_to_scan = min(nr_inactive,
+ (unsigned long)sc->swap_cluster_max);
+ nr_inactive -= nr_to_scan;
+ nr_reclaimed += shrink_inactive_list(nr_to_scan, zone,
+ sc);
+ }
+ }
+
+ throttle_vm_writeout();
+
+ atomic_dec(&zone->reclaim_in_progress);
+ return nr_reclaimed;
+}
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:07.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:46.000000000 +0200
@@ -329,8 +329,8 @@ cannot_free:
/*
* shrink_page_list() returns the number of reclaimed pages
*/
-static unsigned long shrink_page_list(struct list_head *page_list,
- struct scan_control *sc)
+unsigned long shrink_page_list(struct list_head *page_list,
+ struct scan_control *sc)
{
LIST_HEAD(ret_pages);
struct pagevec freed_pvec;
@@ -513,7 +513,7 @@ keep:
*
* returns how many pages were moved onto *@dst.
*/
-static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
+unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
unsigned long *scanned)
{
@@ -548,76 +548,6 @@ static unsigned long isolate_lru_pages(u
return nr_taken;
}
-/*
- * shrink_inactive_list() is a helper for shrink_zone(). It returns the number
- * of reclaimed pages
- */
-static unsigned long shrink_inactive_list(unsigned long max_scan,
- struct zone *zone, struct scan_control *sc)
-{
- LIST_HEAD(page_list);
- struct pagevec pvec;
- unsigned long nr_scanned = 0;
- unsigned long nr_reclaimed = 0;
-
- pagevec_init(&pvec, 1);
-
- pgrep_add_drain();
- spin_lock_irq(&zone->lru_lock);
- do {
- struct page *page;
- unsigned long nr_taken;
- unsigned long nr_scan;
- unsigned long nr_freed;
-
- nr_taken = isolate_lru_pages(sc->swap_cluster_max,
- &zone->inactive_list,
- &page_list, &nr_scan);
- zone->nr_inactive -= nr_taken;
- zone->pages_scanned += nr_scan;
- spin_unlock_irq(&zone->lru_lock);
-
- nr_scanned += nr_scan;
- nr_freed = shrink_page_list(&page_list, sc);
- nr_reclaimed += nr_freed;
- local_irq_disable();
- if (current_is_kswapd()) {
- __mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
- __mod_page_state(kswapd_steal, nr_freed);
- } else
- __mod_page_state_zone(zone, pgscan_direct, nr_scan);
- __mod_page_state_zone(zone, pgsteal, nr_freed);
-
- if (nr_taken == 0)
- goto done;
-
- spin_lock(&zone->lru_lock);
- /*
- * Put back any unfreeable pages.
- */
- while (!list_empty(&page_list)) {
- page = lru_to_page(&page_list);
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- list_del(&page->lru);
- if (PageActive(page))
- add_page_to_active_list(zone, page);
- else
- add_page_to_inactive_list(zone, page);
- if (!pagevec_add(&pvec, page)) {
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- } while (nr_scanned < max_scan);
- spin_unlock(&zone->lru_lock);
-done:
- local_irq_enable();
- pagevec_release(&pvec);
- return nr_reclaimed;
-}
-
int should_reclaim_mapped(struct zone *zone)
{
long mapped_ratio;
@@ -663,175 +593,6 @@ int should_reclaim_mapped(struct zone *z
}
/*
- * This moves pages from the active list to the inactive list.
- *
- * We move them the other way if the page is referenced by one or more
- * processes, from rmap.
- *
- * If the pages are mostly unmapped, the processing is fast and it is
- * appropriate to hold zone->lru_lock across the whole operation. But if
- * the pages are mapped, the processing is slow (page_referenced()) so we
- * should drop zone->lru_lock around each page. It's impossible to balance
- * this, so instead we remove the pages from the LRU while processing them.
- * It is safe to rely on PG_active against the non-LRU pages in here because
- * nobody will play with that bit on a non-LRU page.
- *
- * The downside is that we have to touch page->_count against each page.
- * But we had to alter page->flags anyway.
- */
-static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
- struct scan_control *sc, int reclaim_mapped)
-{
- unsigned long pgmoved;
- int pgdeactivate = 0;
- unsigned long pgscanned;
- LIST_HEAD(l_hold); /* The pages which were snipped off */
- LIST_HEAD(l_inactive); /* Pages to go onto the inactive_list */
- LIST_HEAD(l_active); /* Pages to go onto the active_list */
- struct page *page;
- struct pagevec pvec;
-
- if (!sc->may_swap)
- reclaim_mapped = 0;
-
- pgrep_add_drain();
- spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
- &l_hold, &pgscanned);
- zone->pages_scanned += pgscanned;
- zone->nr_active -= pgmoved;
- spin_unlock_irq(&zone->lru_lock);
-
- while (!list_empty(&l_hold)) {
- cond_resched();
- page = lru_to_page(&l_hold);
- list_del(&page->lru);
- if (page_mapped(page)) {
- if (!reclaim_mapped ||
- (total_swap_pages == 0 && PageAnon(page)) ||
- page_referenced(page, 0)) {
- list_add(&page->lru, &l_active);
- continue;
- }
- }
- list_add(&page->lru, &l_inactive);
- }
-
- pagevec_init(&pvec, 1);
- pgmoved = 0;
- spin_lock_irq(&zone->lru_lock);
- while (!list_empty(&l_inactive)) {
- page = lru_to_page(&l_inactive);
- prefetchw_prev_lru_page(page, &l_inactive, flags);
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- BUG_ON(!PageActive(page));
- ClearPageActive(page);
-
- list_move(&page->lru, &zone->inactive_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
- zone->nr_inactive += pgmoved;
- spin_unlock_irq(&zone->lru_lock);
- pgdeactivate += pgmoved;
- pgmoved = 0;
- if (buffer_heads_over_limit)
- pagevec_strip(&pvec);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- zone->nr_inactive += pgmoved;
- pgdeactivate += pgmoved;
- if (buffer_heads_over_limit) {
- spin_unlock_irq(&zone->lru_lock);
- pagevec_strip(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
-
- pgmoved = 0;
- while (!list_empty(&l_active)) {
- page = lru_to_page(&l_active);
- prefetchw_prev_lru_page(page, &l_active, flags);
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->active_list);
- pgmoved++;
- if (!pagevec_add(&pvec, page)) {
- zone->nr_active += pgmoved;
- pgmoved = 0;
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- zone->nr_active += pgmoved;
- spin_unlock(&zone->lru_lock);
-
- __mod_page_state_zone(zone, pgrefill, pgscanned);
- __mod_page_state(pgdeactivate, pgdeactivate);
- local_irq_enable();
-
- pagevec_release(&pvec);
-}
-
-/*
- * This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
- */
-static unsigned long shrink_zone(int priority, struct zone *zone,
- struct scan_control *sc)
-{
- unsigned long nr_active;
- unsigned long nr_inactive;
- unsigned long nr_to_scan;
- unsigned long nr_reclaimed = 0;
- int reclaim_mapped = should_reclaim_mapped(zone);
-
- atomic_inc(&zone->reclaim_in_progress);
-
- /*
- * Add one to `nr_to_scan' just to make sure that the kernel will
- * slowly sift through the active list.
- */
- zone->nr_scan_active += (zone->nr_active >> priority) + 1;
- nr_active = zone->nr_scan_active;
- if (nr_active >= sc->swap_cluster_max)
- zone->nr_scan_active = 0;
- else
- nr_active = 0;
-
- zone->nr_scan_inactive += (zone->nr_inactive >> priority) + 1;
- nr_inactive = zone->nr_scan_inactive;
- if (nr_inactive >= sc->swap_cluster_max)
- zone->nr_scan_inactive = 0;
- else
- nr_inactive = 0;
-
- while (nr_active || nr_inactive) {
- if (nr_active) {
- nr_to_scan = min(nr_active,
- (unsigned long)sc->swap_cluster_max);
- nr_active -= nr_to_scan;
- shrink_active_list(nr_to_scan, zone, sc, reclaim_mapped);
- }
-
- if (nr_inactive) {
- nr_to_scan = min(nr_inactive,
- (unsigned long)sc->swap_cluster_max);
- nr_inactive -= nr_to_scan;
- nr_reclaimed += shrink_inactive_list(nr_to_scan, zone,
- sc);
- }
- }
-
- throttle_vm_writeout();
-
- atomic_dec(&zone->reclaim_in_progress);
- return nr_reclaimed;
-}
-
-/*
* This is the direct reclaim path, for page-allocating processes. We only
* try to reclaim pages from zones which will satisfy the caller's allocation
* request.
@@ -869,7 +630,7 @@ static unsigned long shrink_zones(int pr
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
- nr_reclaimed += shrink_zone(priority, zone, sc);
+ nr_reclaimed += pgrep_shrink_zone(priority, zone, sc);
}
return nr_reclaimed;
}
@@ -1085,7 +846,7 @@ scan:
if (zone->prev_priority > priority)
zone->prev_priority = priority;
sc.nr_scanned = 0;
- nr_reclaimed += shrink_zone(priority, zone, &sc);
+ nr_reclaimed += pgrep_shrink_zone(priority, zone, &sc);
reclaim_state->reclaimed_slab = 0;
nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
lru_pages);
@@ -1377,7 +1138,7 @@ static int __zone_reclaim(struct zone *z
*/
priority = ZONE_RECLAIM_PRIORITY;
do {
- nr_reclaimed += shrink_zone(priority, zone, &sc);
+ nr_reclaimed += pgrep_shrink_zone(priority, zone, &sc);
priority--;
} while (priority >= 0 && nr_reclaimed < nr_pages);
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:47.000000000 +0200
@@ -78,6 +78,8 @@ typedef enum {
/* reclaim_t pgrep_reclaimable(struct page *); */
/* int pgrep_activate(struct page *page); */
/* void pgrep_mark_accessed(struct page *); */
+extern unsigned long pgrep_shrink_zone(int priority, struct zone *zone,
+ struct scan_control *sc);
#ifdef CONFIG_MM_POLICY_USEONCE
--
* [PATCH 13/39] mm: pgrep: abstract rotate_reclaimable_page()
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (11 preceding siblings ...)
2006-07-12 14:39 ` [PATCH 12/39] mm: pgrep: move the shrink logic Peter Zijlstra
@ 2006-07-12 14:39 ` Peter Zijlstra
2006-07-12 14:39 ` [PATCH 14/39] mm: pgrep: manage page-state Peter Zijlstra
` (26 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:39 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Take the knowledge of the rotation itself out of the generic code.
API:
rotate the page to the candidate end of the page scanner
(when suitable for reclaim)
void __pgrep_rotate_reclaimable(struct zone *, struct page *);
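A self-contained model of the rotation (the list primitives re-implemented
in userspace; the scanner takes its next candidate from the list tail, as
lru_to_page() reads head->prev):

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_add_tail(struct list_head *n, struct list_head *head)
{
        n->prev = head->prev;
        n->next = head;
        head->prev->next = n;
        head->prev = n;
}

static void list_del(struct list_head *n)
{
        n->prev->next = n->next;
        n->next->prev = n->prev;
}

struct page_model { struct list_head lru; char name; };

int main(void)
{
        struct list_head inactive = { &inactive, &inactive };
        struct page_model p[3] = { {{0,0},'A'}, {{0,0},'B'}, {{0,0},'C'} };
        struct page_model *next;
        int i;

        for (i = 0; i < 3; i++)
                list_add_tail(&p[i].lru, &inactive);    /* A B C */

        /* writeback on A completed: rotate it to the candidate end */
        list_del(&p[0].lru);
        list_add_tail(&p[0].lru, &inactive);            /* B C A */

        next = (struct page_model *)inactive.prev;      /* lru is first */
        printf("next reclaim candidate: %c\n", next->name);
        return 0;
}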
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 1 +
include/linux/mm_use_once_policy.h | 8 ++++++++
mm/swap.c | 8 +-------
3 files changed, 10 insertions(+), 7 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:46.000000000 +0200
@@ -95,5 +95,13 @@ static inline int pgrep_activate(struct
return 1;
}
+static inline void __pgrep_rotate_reclaimable(struct zone *zone, struct page *page)
+{
+ if (PageLRU(page) && !PageActive(page)) {
+ list_move_tail(&page->lru, &zone->inactive_list);
+ inc_page_state(pgrotated);
+ }
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:46.000000000 +0200
@@ -80,6 +80,7 @@ typedef enum {
/* void pgrep_mark_accessed(struct page *); */
extern unsigned long pgrep_shrink_zone(int priority, struct zone *zone,
struct scan_control *sc);
+/* void __pgrep_rotate_reclaimable(struct zone *, struct page *); */
#ifdef CONFIG_MM_POLICY_USEONCE
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/swap.c 2006-07-12 16:11:45.000000000 +0200
@@ -79,18 +79,12 @@ int rotate_reclaimable_page(struct page
return 1;
if (PageDirty(page))
return 1;
- if (PageActive(page))
- return 1;
if (!PageLRU(page))
return 1;
zone = page_zone(page);
spin_lock_irqsave(&zone->lru_lock, flags);
- if (PageLRU(page) && !PageActive(page)) {
- list_del(&page->lru);
- list_add_tail(&page->lru, &zone->inactive_list);
- inc_page_state(pgrotated);
- }
+ __pgrep_rotate_reclaimable(zone, page);
if (!test_clear_page_writeback(page))
BUG();
spin_unlock_irqrestore(&zone->lru_lock, flags);
--
* [PATCH 14/39] mm: pgrep: manage page-state
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (12 preceding siblings ...)
2006-07-12 14:39 ` [PATCH 13/39] mm: pgrep: abstract rotate_reclaimable_page() Peter Zijlstra
@ 2006-07-12 14:39 ` Peter Zijlstra
2006-07-12 14:39 ` [PATCH 15/39] mm: pgrep: abstract page removal Peter Zijlstra
` (25 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:39 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
API:
Copy/Clear the reclaim page state:
void pgrep_copy_state(struct page *, struct page *);
void pgrep_clear_state(struct page *);
Query activeness of the page, where 'active' is taken to mean: not likely
to be in the next candidate group.
int pgrep_is_active(struct page *);
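A sketch of the state-carrying contract during migration (a flags word
models the single PG_active bit the use-once policy cares about; the bit
name here is made up):

#include <stdio.h>

#define PG_ACTIVE_MODEL (1u << 0)

struct page_model { unsigned int flags; };

static void copy_state_model(struct page_model *dst, struct page_model *src)
{
        if (src->flags & PG_ACTIVE_MODEL)
                dst->flags |= PG_ACTIVE_MODEL;
}

static void clear_state_model(struct page_model *page)
{
        page->flags &= ~PG_ACTIVE_MODEL;
}

int main(void)
{
        struct page_model oldp = { PG_ACTIVE_MODEL };
        struct page_model newp = { 0 };

        copy_state_model(&newp, &oldp); /* new page inherits activeness */
        clear_state_model(&oldp);       /* old page is freed state-clean */

        printf("new: active=%d, old: active=%d\n",
               !!(newp.flags & PG_ACTIVE_MODEL),
               !!(oldp.flags & PG_ACTIVE_MODEL));
        return 0;
}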
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
include/linux/mm_page_replace.h | 3 +++
include/linux/mm_use_once_policy.h | 17 +++++++++++++++++
mm/mempolicy.c | 2 +-
mm/migrate.c | 5 ++---
mm/vmscan.c | 1 +
5 files changed, 24 insertions(+), 4 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:45.000000000 +0200
@@ -81,6 +81,9 @@ typedef enum {
extern unsigned long pgrep_shrink_zone(int priority, struct zone *zone,
struct scan_control *sc);
/* void __pgrep_rotate_reclaimable(struct zone *, struct page *); */
+/* void pgrep_copy_state(struct page *, struct page *); */
+/* void pgrep_clear_state(struct page *); */
+/* int pgrep_is_active(struct page *); */
#ifdef CONFIG_MM_POLICY_USEONCE
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:45.000000000 +0200
@@ -103,5 +103,22 @@ static inline void __pgrep_rotate_reclai
}
}
+static inline void pgrep_copy_state(struct page *dpage, struct page *spage)
+{
+ if (PageActive(spage))
+ SetPageActive(dpage);
+}
+
+static inline void pgrep_clear_state(struct page *page)
+{
+ if (PageActive(page))
+ ClearPageActive(page);
+}
+
+static inline int pgrep_is_active(struct page *page)
+{
+ return PageActive(page);
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/mm/mempolicy.c
===================================================================
--- linux-2.6.orig/mm/mempolicy.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/mempolicy.c 2006-07-12 16:11:43.000000000 +0200
@@ -1749,7 +1749,7 @@ static void gather_stats(struct page *pa
if (PageSwapCache(page))
md->swapcache++;
- if (PageActive(page))
+ if (pgrep_is_active(page))
md->active++;
if (PageWriteback(page))
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/migrate.c 2006-07-12 16:11:45.000000000 +0200
@@ -262,12 +262,11 @@ void migrate_page_copy(struct page *newp
SetPageReferenced(newpage);
if (PageUptodate(page))
SetPageUptodate(newpage);
- if (PageActive(page))
- SetPageActive(newpage);
if (PageChecked(page))
SetPageChecked(newpage);
if (PageMappedToDisk(page))
SetPageMappedToDisk(newpage);
+ pgrep_copy_state(newpage, page);
if (PageDirty(page)) {
clear_page_dirty_for_io(page);
@@ -275,8 +274,8 @@ void migrate_page_copy(struct page *newp
}
ClearPageSwapCache(page);
- ClearPageActive(page);
ClearPagePrivate(page);
+ pgrep_clear_state(page);
set_page_private(page, 0);
page->mapping = NULL;
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:45.000000000 +0200
@@ -473,6 +473,7 @@ unsigned long shrink_page_list(struct li
goto keep_locked;
free_it:
+ pgrep_clear_state(page);
unlock_page(page);
nr_reclaimed++;
if (!pagevec_add(&freed_pvec, page))
--
* [PATCH 15/39] mm: pgrep: abstract page removal
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (13 preceding siblings ...)
2006-07-12 14:39 ` [PATCH 14/39] mm: pgrep: manage page-state Peter Zijlstra
@ 2006-07-12 14:39 ` Peter Zijlstra
2006-07-12 14:40 ` [PATCH 16/39] mm: pgrep: remove mm_inline.h Peter Zijlstra
` (24 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:39 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
API:
Remove the specified page from the page reclaim data structures.
void __pgrep_remove(struct zone *zone, struct page *page);
NOTE: isolate_lru_page{,s}() become generic functions.
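The bookkeeping the hook hides, as a trivial userspace model (each
isolated page must decrement the counter of whichever list it sat on):

#include <stdio.h>

struct zone_model { long nr_active; long nr_inactive; };

static void pgrep_remove_model(struct zone_model *zone, int page_active)
{
        /* list_del(&page->lru) would go here in the real thing */
        if (page_active)
                zone->nr_active--;
        else
                zone->nr_inactive--;
}

int main(void)
{
        struct zone_model zone = { 2, 3 };

        pgrep_remove_model(&zone, 1);   /* isolate an active page */
        pgrep_remove_model(&zone, 0);   /* isolate an inactive page */
        printf("active=%ld inactive=%ld\n",
               zone.nr_active, zone.nr_inactive);
        return 0;
}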
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
include/linux/mm_page_replace.h | 1 +
include/linux/mm_use_once_policy.h | 9 +++++++++
include/linux/swap.h | 6 +++---
mm/migrate.c | 5 +----
mm/swap.c | 6 ++++--
mm/useonce.c | 8 ++------
mm/vmscan.c | 12 +++++-------
7 files changed, 25 insertions(+), 22 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:44.000000000 +0200
@@ -84,6 +84,7 @@ extern unsigned long pgrep_shrink_zone(i
/* void pgrep_copy_state(struct page *, struct page *); */
/* void pgrep_clear_state(struct page *); */
/* int pgrep_is_active(struct page *); */
+/* void __pgrep_remove(struct zone *zone, struct page *page); */
#ifdef CONFIG_MM_POLICY_USEONCE
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:44.000000000 +0200
@@ -120,5 +120,14 @@ static inline int pgrep_is_active(struct
return PageActive(page);
}
+static inline void __pgrep_remove(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ if (PageActive(page))
+ zone->nr_active--;
+ else
+ zone->nr_inactive--;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/migrate.c 2006-07-12 16:11:44.000000000 +0200
@@ -53,10 +53,7 @@ int isolate_lru_page(struct page *page,
ret = 0;
get_page(page);
ClearPageLRU(page);
- if (PageActive(page))
- del_page_from_active_list(zone, page);
- else
- del_page_from_inactive_list(zone, page);
+ __pgrep_remove(zone, page);
list_add_tail(&page->lru, pagelist);
}
spin_unlock_irq(&zone->lru_lock);
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/swap.c 2006-07-12 16:11:44.000000000 +0200
@@ -140,7 +140,8 @@ void fastcall __page_cache_release(struc
spin_lock_irqsave(&zone->lru_lock, flags);
BUG_ON(!PageLRU(page));
__ClearPageLRU(page);
- del_page_from_lru(zone, page);
+ __pgrep_remove(zone, page);
+ pgrep_clear_state(page);
spin_unlock_irqrestore(&zone->lru_lock, flags);
}
free_hot_page(page);
@@ -191,7 +192,8 @@ void release_pages(struct page **pages,
}
BUG_ON(!PageLRU(page));
__ClearPageLRU(page);
- del_page_from_lru(zone, page);
+ __pgrep_remove(zone, page);
+ pgrep_clear_state(page);
}
if (!pagevec_add(&pages_to_free, page)) {
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:43.000000000 +0200
@@ -73,11 +73,9 @@ static unsigned long shrink_inactive_lis
unsigned long nr_scan;
unsigned long nr_freed;
- nr_taken = isolate_lru_pages(sc->swap_cluster_max,
+ nr_taken = isolate_lru_pages(zone, sc->swap_cluster_max,
&zone->inactive_list,
&page_list, &nr_scan);
- zone->nr_inactive -= nr_taken;
- zone->pages_scanned += nr_scan;
spin_unlock_irq(&zone->lru_lock);
nr_scanned += nr_scan;
@@ -155,10 +153,8 @@ static void shrink_active_list(unsigned
pgrep_add_drain();
spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
+ pgmoved = isolate_lru_pages(zone, nr_pages, &zone->active_list,
&l_hold, &pgscanned);
- zone->pages_scanned += pgscanned;
- zone->nr_active -= pgmoved;
spin_unlock_irq(&zone->lru_lock);
while (!list_empty(&l_hold)) {
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:44.000000000 +0200
@@ -514,7 +514,7 @@ keep:
*
* returns how many pages were moved onto *@dst.
*/
-unsigned long isolate_lru_pages(unsigned long nr_to_scan,
+unsigned long isolate_lru_pages(struct zone *zone, unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
unsigned long *scanned)
{
@@ -523,14 +523,11 @@ unsigned long isolate_lru_pages(unsigned
unsigned long scan;
for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
- struct list_head *target;
page = lru_to_page(src);
prefetchw_prev_lru_page(page, src, flags);
BUG_ON(!PageLRU(page));
- list_del(&page->lru);
- target = src;
if (likely(get_page_unless_zero(page))) {
/*
* Be careful not to clear PageLRU until after we're
@@ -538,14 +535,15 @@ unsigned long isolate_lru_pages(unsigned
* page release code relies on it.
*/
ClearPageLRU(page);
- target = dst;
+ __pgrep_remove(zone, page);
+ list_add(&page->lru, dst);
nr_taken++;
} /* else it is being freed elsewhere */
-
- list_add(&page->lru, target);
+ else list_move(&page->lru, src);
}
*scanned = scan;
+ zone->pages_scanned += scan;
return nr_taken;
}
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/swap.h 2006-07-12 16:09:18.000000000 +0200
@@ -174,9 +174,9 @@ extern void release_pages(struct page **
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern unsigned long shrink_page_list(struct list_head *page_list,
struct scan_control *sc);
-extern unsigned long isolate_lru_pages(unsigned long nr_to_scan,
- struct list_head *src, struct list_head *dst,
- unsigned long *scanned);
+extern unsigned long isolate_lru_pages(struct zone *zone,
+ unsigned long nr_to_scan, struct list_head *src,
+ struct list_head *dst, unsigned long *scanned);
extern int should_reclaim_mapped(struct zone *zone);
extern unsigned long try_to_free_pages(struct zone **, gfp_t);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
--
* [PATCH 16/39] mm: pgrep: remove mm_inline.h
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (14 preceding siblings ...)
2006-07-12 14:39 ` [PATCH 15/39] mm: pgrep: abstract page removal Peter Zijlstra
@ 2006-07-12 14:40 ` Peter Zijlstra
2006-07-12 14:40 ` [PATCH 17/39] mm: pgrep: re-insertion logic Peter Zijlstra
` (23 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:40 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Move whatever is needed from mm_inline.h into the use-once policy header.
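The per-list helpers now live behind the policy selector in
mm_page_replace.h; as a compilable sketch of that dispatch pattern (the
config macro normally comes from Kconfig and is defined by hand here):

#include <stdio.h>

#define CONFIG_MM_POLICY_USEONCE 1      /* normally set by Kconfig */

#ifdef CONFIG_MM_POLICY_USEONCE
#define POLICY_NAME "useonce"
#else
#error "no page replacement policy selected"
#endif

int main(void)
{
        printf("selected policy: %s\n", POLICY_NAME);
        return 0;
}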
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_inline.h | 47 -------------------------------------
include/linux/mm_page_replace.h | 1
include/linux/mm_use_once_policy.h | 28 ++++++++++++++++++++++
mm/migrate.c | 1
mm/swap.c | 1
mm/vmscan.c | 1
6 files changed, 28 insertions(+), 51 deletions(-)
Index: linux-2.6/include/linux/mm_inline.h
===================================================================
--- linux-2.6.orig/include/linux/mm_inline.h 2006-07-12 16:08:18.000000000 +0200
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,47 +0,0 @@
-#ifndef _LINUX_MM_INLINE_H_
-#define _LINUX_MM_INLINE_H_
-
-#ifdef __KERNEL__
-
-static inline void
-add_page_to_active_list(struct zone *zone, struct page *page)
-{
- list_add(&page->lru, &zone->active_list);
- zone->nr_active++;
-}
-
-static inline void
-add_page_to_inactive_list(struct zone *zone, struct page *page)
-{
- list_add(&page->lru, &zone->inactive_list);
- zone->nr_inactive++;
-}
-
-static inline void
-del_page_from_active_list(struct zone *zone, struct page *page)
-{
- list_del(&page->lru);
- zone->nr_active--;
-}
-
-static inline void
-del_page_from_inactive_list(struct zone *zone, struct page *page)
-{
- list_del(&page->lru);
- zone->nr_inactive--;
-}
-
-static inline void
-del_page_from_lru(struct zone *zone, struct page *page)
-{
- list_del(&page->lru);
- if (PageActive(page)) {
- __ClearPageActive(page);
- zone->nr_active--;
- } else {
- zone->nr_inactive--;
- }
-}
-
-#endif /* __KERNEL__ */
-#endif /* _LINUX_MM_INLINE_H_ */
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:43.000000000 +0200
@@ -6,7 +6,6 @@
#include <linux/mmzone.h>
#include <linux/mm.h>
#include <linux/pagevec.h>
-#include <linux/mm_inline.h>
struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:39.000000000 +0200
@@ -6,6 +6,34 @@
#include <linux/fs.h>
#include <linux/rmap.h>
+static inline void
+add_page_to_active_list(struct zone *zone, struct page *page)
+{
+ list_add(&page->lru, &zone->active_list);
+ zone->nr_active++;
+}
+
+static inline void
+add_page_to_inactive_list(struct zone *zone, struct page *page)
+{
+ list_add(&page->lru, &zone->inactive_list);
+ zone->nr_inactive++;
+}
+
+static inline void
+del_page_from_active_list(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ zone->nr_active--;
+}
+
+static inline void
+del_page_from_inactive_list(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ zone->nr_inactive--;
+}
+
static inline void pgrep_hint_active(struct page *page)
{
SetPageActive(page);
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/migrate.c 2006-07-12 16:11:43.000000000 +0200
@@ -17,7 +17,6 @@
#include <linux/swap.h>
#include <linux/pagemap.h>
#include <linux/buffer_head.h>
-#include <linux/mm_inline.h>
#include <linux/pagevec.h>
#include <linux/rmap.h>
#include <linux/topology.h>
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/swap.c 2006-07-12 16:09:18.000000000 +0200
@@ -22,7 +22,6 @@
#include <linux/pagevec.h>
#include <linux/init.h>
#include <linux/module.h>
-#include <linux/mm_inline.h>
#include <linux/buffer_head.h> /* for try_to_release_page() */
#include <linux/module.h>
#include <linux/percpu_counter.h>
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:39.000000000 +0200
@@ -24,7 +24,6 @@
#include <linux/blkdev.h>
#include <linux/buffer_head.h> /* for try_to_release_page(),
buffer_heads_over_limit */
-#include <linux/mm_inline.h>
#include <linux/pagevec.h>
#include <linux/backing-dev.h>
#include <linux/rmap.h>
--
* [PATCH 17/39] mm: pgrep: re-insertion logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (15 preceding siblings ...)
2006-07-12 14:40 ` [PATCH 16/39] mm: pgrep: remove mm_inline.h Peter Zijlstra
@ 2006-07-12 14:40 ` Peter Zijlstra
2006-07-12 14:40 ` [PATCH 18/39] mm: pgrep: initialisation hooks Peter Zijlstra
` (22 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:40 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
API:
reinserts pages taken with isolate_lru_page() - used by page migration.
void pgrep_reinsert(struct list_head*);
NOTE: these pages still have their reclaim page state and so can be
inserted at the proper place.
NOTE: this patch seems quite useless with the current use-once policy;
however, for other policies re-insertion (where the page state is conserved)
is quite different from regular insertion (where the page state is set by
insertion hints).
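A userspace model of the reinsertion walk (reference counts as plain ints;
pgrep_add() and put_page() reduced to field updates):

#include <stdio.h>

struct page_model { int refcount; int on_lru; };

static void pgrep_reinsert_model(struct page_model *pages, int n)
{
        int i;

        for (i = 0; i < n; i++) {
                pages[i].on_lru = 1;    /* pgrep_add() */
                pages[i].refcount--;    /* put_page(): drop the ref taken
                                           by isolate_lru_page() */
        }
}

int main(void)
{
        /* two pages that failed migration, each still holding the
           reference taken at isolation time */
        struct page_model failed[2] = { { 2, 0 }, { 2, 0 } };
        int i;

        pgrep_reinsert_model(failed, 2);
        for (i = 0; i < 2; i++)
                printf("page %d: refcount=%d on_lru=%d\n",
                       i, failed[i].refcount, failed[i].on_lru);
        return 0;
}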
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/migrate.h | 2 --
include/linux/mm_page_replace.h | 2 +-
mm/mempolicy.c | 4 ++--
mm/migrate.c | 29 +----------------------------
mm/useonce.c | 10 ++++++++++
5 files changed, 14 insertions(+), 33 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:41.000000000 +0200
@@ -84,7 +84,7 @@ extern unsigned long pgrep_shrink_zone(i
/* void pgrep_clear_state(struct page *); */
/* int pgrep_is_active(struct page *); */
/* void __pgrep_remove(struct zone *zone, struct page *page); */
-
+extern void pgrep_reinsert(struct list_head *);
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_policy.h>
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:41.000000000 +0200
@@ -52,6 +52,16 @@ void __pgrep_add_drain(unsigned int cpu)
__pagevec_pgrep_add(pvec);
}
+void pgrep_reinsert(struct list_head *page_list)
+{
+ struct page *page, *page2;
+
+ list_for_each_entry_safe(page, page2, page_list, lru) {
+ list_del(&page->lru);
+ pgrep_add(page);
+ put_page(page);
+ }
+}
/*
* shrink_inactive_list() is a helper for shrink_zone(). It returns the number
* of reclaimed pages
Index: linux-2.6/mm/mempolicy.c
===================================================================
--- linux-2.6.orig/mm/mempolicy.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/mempolicy.c 2006-07-12 16:09:18.000000000 +0200
@@ -607,7 +607,7 @@ int migrate_to_node(struct mm_struct *mm
if (!list_empty(&pagelist)) {
err = migrate_pages_to(&pagelist, NULL, dest);
if (!list_empty(&pagelist))
- putback_lru_pages(&pagelist);
+ pgrep_reinsert(&pagelist);
}
return err;
}
@@ -775,7 +775,7 @@ long do_mbind(unsigned long start, unsig
}
if (!list_empty(&pagelist))
- putback_lru_pages(&pagelist);
+ pgrep_reinsert(&pagelist);
up_write(&mm->mmap_sem);
mpol_free(new);
Index: linux-2.6/include/linux/migrate.h
===================================================================
--- linux-2.6.orig/include/linux/migrate.h 2006-07-12 16:07:29.000000000 +0200
+++ linux-2.6/include/linux/migrate.h 2006-07-12 16:09:18.000000000 +0200
@@ -6,7 +6,6 @@
#ifdef CONFIG_MIGRATION
extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
-extern int putback_lru_pages(struct list_head *l);
extern int migrate_page(struct page *, struct page *);
extern void migrate_page_copy(struct page *, struct page *);
extern int migrate_page_remove_references(struct page *, struct page *, int);
@@ -22,7 +21,6 @@ extern int migrate_prep(void);
static inline int isolate_lru_page(struct page *p, struct list_head *list)
{ return -ENOSYS; }
-static inline int putback_lru_pages(struct list_head *l) { return 0; }
static inline int migrate_pages(struct list_head *l, struct list_head *t,
struct list_head *moved, struct list_head *failed) { return -ENOSYS; }
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/migrate.c 2006-07-12 16:09:18.000000000 +0200
@@ -25,8 +25,6 @@
#include <linux/swapops.h>
#include <linux/mm_page_replace.h>
-#include "internal.h"
-
/* The maximum number of pages to take off the LRU for migration */
#define MIGRATE_CHUNK_SIZE 256
@@ -82,31 +80,6 @@ int migrate_prep(void)
return 0;
}
-static inline void move_to_lru(struct page *page)
-{
- list_del(&page->lru);
- pgrep_add(page);
- put_page(page);
-}
-
-/*
- * Add isolated pages on the list back to the LRU.
- *
- * returns the number of pages put back.
- */
-int putback_lru_pages(struct list_head *l)
-{
- struct page *page;
- struct page *page2;
- int count = 0;
-
- list_for_each_entry_safe(page, page2, l, lru) {
- move_to_lru(page);
- count++;
- }
- return count;
-}
-
/*
* Non migratable page
*/
@@ -626,7 +599,7 @@ redo:
}
err = migrate_pages(pagelist, &newlist, &moved, &failed);
- putback_lru_pages(&moved); /* Call release pages instead ?? */
+ pgrep_reinsert(&moved); /* Call release pages instead ?? */
if (err >= 0 && list_empty(&newlist) && !list_empty(pagelist))
goto redo;
--
* [PATCH 18/39] mm: pgrep: initialisation hooks
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (16 preceding siblings ...)
2006-07-12 14:40 ` [PATCH 17/39] mm: pgrep: re-insertion logic Peter Zijlstra
@ 2006-07-12 14:40 ` Peter Zijlstra
2006-07-12 14:40 ` [PATCH 19/39] mm: pgrep: info functions Peter Zijlstra
` (21 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:40 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Move initialization of the replacement policy's variables into the
implementation.
API:
initialize the policy:
void pgrep_init(void);
initialize the policy's per-zone data:
void pgrep_init_zone(struct zone *);
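The call ordering, as a runnable sketch (zone structures reduced to the
counters the use-once policy resets):

#include <stdio.h>

struct zone_model { long nr_active; long nr_inactive; };

static void pgrep_init_model(void)
{
        /* the use-once policy needs no global state: empty hook */
}

static void pgrep_init_zone_model(struct zone_model *zone)
{
        zone->nr_active = 0;
        zone->nr_inactive = 0;
}

int main(void)
{
        struct zone_model zones[2];
        int i;

        pgrep_init_model();                       /* start_kernel() */
        for (i = 0; i < 2; i++)
                pgrep_init_zone_model(&zones[i]); /* free_area_init_core() */

        printf("initialised %d zones\n", 2);
        return 0;
}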
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 2 ++
init/main.c | 2 ++
mm/page_alloc.c | 8 ++------
mm/useonce.c | 15 +++++++++++++++
4 files changed, 21 insertions(+), 6 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:40.000000000 +0200
@@ -58,6 +58,8 @@ struct scan_control {
#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
#endif
+extern void pgrep_init(void);
+extern void pgrep_init_zone(struct zone *);
/* void pgrep_hint_active(struct page *); */
/* void pgrep_hint_use_once(struct page *); */
extern void fastcall pgrep_add(struct page *);
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:40.000000000 +0200
@@ -6,6 +6,21 @@
#include <linux/buffer_head.h> /* for try_to_release_page(),
buffer_heads_over_limit */
+void __init pgrep_init(void)
+{
+ /* empty hook */
+}
+
+void __init pgrep_init_zone(struct zone *zone)
+{
+ INIT_LIST_HEAD(&zone->active_list);
+ INIT_LIST_HEAD(&zone->inactive_list);
+ zone->nr_scan_active = 0;
+ zone->nr_scan_inactive = 0;
+ zone->nr_active = 0;
+ zone->nr_inactive = 0;
+}
+
/**
* lru_cache_add: add a page to the page lists
* @page: the page to add
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/page_alloc.c 2006-07-12 16:11:40.000000000 +0200
@@ -37,6 +37,7 @@
#include <linux/nodemask.h>
#include <linux/vmalloc.h>
#include <linux/mempolicy.h>
+#include <linux/mm_page_replace.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -2100,12 +2101,7 @@ static void __init free_area_init_core(s
zone->temp_priority = zone->prev_priority = DEF_PRIORITY;
zone_pcp_init(zone);
- INIT_LIST_HEAD(&zone->active_list);
- INIT_LIST_HEAD(&zone->inactive_list);
- zone->nr_scan_active = 0;
- zone->nr_scan_inactive = 0;
- zone->nr_active = 0;
- zone->nr_inactive = 0;
+ pgrep_init_zone(zone);
atomic_set(&zone->reclaim_in_progress, 0);
if (!size)
continue;
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c 2006-07-12 16:07:31.000000000 +0200
+++ linux-2.6/init/main.c 2006-07-12 16:09:18.000000000 +0200
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/mm_page_replace.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -511,6 +512,7 @@ asmlinkage void __init start_kernel(void
#endif
vfs_caches_init_early();
cpuset_init_early();
+ pgrep_init();
mem_init();
kmem_cache_init();
setup_per_cpu_pageset();
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 19/39] mm: pgrep: info functions
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (17 preceding siblings ...)
2006-07-12 14:40 ` [PATCH 18/39] mm: pgrep: initialisation hooks Peter Zijlstra
@ 2006-07-12 14:40 ` Peter Zijlstra
2006-07-12 14:40 ` [PATCH 20/39] mm: pgrep: page count functions Peter Zijlstra
` (20 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:40 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Isolate the printing of various policy-related information.
API:
print the zone information for show_free_areas():
void pgrep_show(struct zone *);
print the zone information for zoneinfo_show():
void pgrep_zoneinfo(struct zone *, struct seq_file *);
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 3 ++
mm/page_alloc.c | 44 +--------------------------------
mm/useonce.c | 52 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 57 insertions(+), 42 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:39.000000000 +0200
@@ -6,6 +6,7 @@
#include <linux/mmzone.h>
#include <linux/mm.h>
#include <linux/pagevec.h>
+#include <linux/seq_file.h>
struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
@@ -87,6 +88,8 @@ extern unsigned long pgrep_shrink_zone(i
/* int pgrep_is_active(struct page *); */
/* void __pgrep_remove(struct zone *zone, struct page *page); */
extern void pgrep_reinsert(struct list_head *);
+extern void pgrep_show(struct zone *);
+extern void pgrep_zoneinfo(struct zone *, struct seq_file *);
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_policy.h>
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:39.000000000 +0200
@@ -310,3 +310,55 @@ unsigned long pgrep_shrink_zone(int prio
atomic_dec(&zone->reclaim_in_progress);
return nr_reclaimed;
}
+
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
+void pgrep_show(struct zone *zone)
+{
+ printk("%s"
+ " free:%lukB"
+ " min:%lukB"
+ " low:%lukB"
+ " high:%lukB"
+ " active:%lukB"
+ " inactive:%lukB"
+ " present:%lukB"
+ " pages_scanned:%lu"
+ " all_unreclaimable? %s"
+ "\n",
+ zone->name,
+ K(zone->free_pages),
+ K(zone->pages_min),
+ K(zone->pages_low),
+ K(zone->pages_high),
+ K(zone->nr_active),
+ K(zone->nr_inactive),
+ K(zone->present_pages),
+ zone->pages_scanned,
+ (zone->all_unreclaimable ? "yes" : "no")
+ );
+}
+
+void pgrep_zoneinfo(struct zone *zone, struct seq_file *m)
+{
+ seq_printf(m,
+ "\n pages free %lu"
+ "\n min %lu"
+ "\n low %lu"
+ "\n high %lu"
+ "\n active %lu"
+ "\n inactive %lu"
+ "\n scanned %lu (a: %lu i: %lu)"
+ "\n spanned %lu"
+ "\n present %lu",
+ zone->free_pages,
+ zone->pages_min,
+ zone->pages_low,
+ zone->pages_high,
+ zone->nr_active,
+ zone->nr_inactive,
+ zone->pages_scanned,
+ zone->nr_scan_active, zone->nr_scan_inactive,
+ zone->spanned_pages,
+ zone->present_pages);
+}
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/page_alloc.c 2006-07-12 16:11:39.000000000 +0200
@@ -1457,28 +1457,7 @@ void show_free_areas(void)
int i;
show_node(zone);
- printk("%s"
- " free:%lukB"
- " min:%lukB"
- " low:%lukB"
- " high:%lukB"
- " active:%lukB"
- " inactive:%lukB"
- " present:%lukB"
- " pages_scanned:%lu"
- " all_unreclaimable? %s"
- "\n",
- zone->name,
- K(zone->free_pages),
- K(zone->pages_min),
- K(zone->pages_low),
- K(zone->pages_high),
- K(zone->nr_active),
- K(zone->nr_inactive),
- K(zone->present_pages),
- zone->pages_scanned,
- (zone->all_unreclaimable ? "yes" : "no")
- );
+ pgrep_show(zone);
printk("lowmem_reserve[]:");
for (i = 0; i < MAX_NR_ZONES; i++)
printk(" %lu", zone->lowmem_reserve[i]);
@@ -2252,26 +2231,7 @@ static int zoneinfo_show(struct seq_file
spin_lock_irqsave(&zone->lock, flags);
seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
- seq_printf(m,
- "\n pages free %lu"
- "\n min %lu"
- "\n low %lu"
- "\n high %lu"
- "\n active %lu"
- "\n inactive %lu"
- "\n scanned %lu (a: %lu i: %lu)"
- "\n spanned %lu"
- "\n present %lu",
- zone->free_pages,
- zone->pages_min,
- zone->pages_low,
- zone->pages_high,
- zone->nr_active,
- zone->nr_inactive,
- zone->pages_scanned,
- zone->nr_scan_active, zone->nr_scan_inactive,
- zone->spanned_pages,
- zone->present_pages);
+ pgrep_zoneinfo(zone, m);
seq_printf(m,
"\n protection: (%lu",
zone->lowmem_reserve[0]);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 20/39] mm: pgrep: page count functions
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (18 preceding siblings ...)
2006-07-12 14:40 ` [PATCH 19/39] mm: pgrep: info functions Peter Zijlstra
@ 2006-07-12 14:40 ` Peter Zijlstra
2006-07-12 14:41 ` [PATCH 21/39] mm: pgrep: per policy data Peter Zijlstra
` (19 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:40 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Abstract the various page counts used to drive the scanner.
API:
give the 'active', 'inactive' and free counts for the selected pgdat
(the quoted terms are open to the policy's own interpretation):
void __pgrep_counts(unsigned long *, unsigned long *,
unsigned long *, struct zone *);
total number of pages under the policy's care:
unsigned long __pgrep_nr_pages(struct zone *);
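For illustration only (a hypothetical single-list policy, not this
patch): a policy with no active/inactive distinction could report all
of its pages as 'inactive':

void __pgrep_counts(unsigned long *active, unsigned long *inactive,
		    unsigned long *free, struct zone *zones)
{
	int i;

	*active = 0;		/* no notion of 'active' here */
	*inactive = 0;
	*free = 0;
	for (i = 0; i < MAX_NR_ZONES; i++) {
		*inactive += zones[i].nr_list_pages; /* illustrative field */
		*free += zones[i].free_pages;
	}
}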
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 3 +++
include/linux/mm_use_once_policy.h | 5 +++++
mm/page_alloc.c | 12 +-----------
mm/useonce.c | 15 +++++++++++++++
mm/vmscan.c | 6 +++---
5 files changed, 27 insertions(+), 14 deletions(-)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:36.000000000 +0200
@@ -90,6 +90,9 @@ extern unsigned long pgrep_shrink_zone(i
extern void pgrep_reinsert(struct list_head *);
extern void pgrep_show(struct zone *);
extern void pgrep_zoneinfo(struct zone *, struct seq_file *);
+extern void __pgrep_counts(unsigned long *, unsigned long *,
+ unsigned long *, struct zone *);
+/* unsigned long __pgrep_nr_pages(struct zone *); */
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_policy.h>
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:38.000000000 +0200
@@ -362,3 +362,18 @@ void pgrep_zoneinfo(struct zone *zone, s
zone->spanned_pages,
zone->present_pages);
}
+
+void __pgrep_counts(unsigned long *active, unsigned long *inactive,
+ unsigned long *free, struct zone *zones)
+{
+ int i;
+
+ *active = 0;
+ *inactive = 0;
+ *free = 0;
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ *active += zones[i].nr_active;
+ *inactive += zones[i].nr_inactive;
+ *free += zones[i].free_pages;
+ }
+}
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/page_alloc.c 2006-07-12 16:11:37.000000000 +0200
@@ -1332,17 +1332,7 @@ EXPORT_SYMBOL(mod_page_state_offset);
void __get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free, struct pglist_data *pgdat)
{
- struct zone *zones = pgdat->node_zones;
- int i;
-
- *active = 0;
- *inactive = 0;
- *free = 0;
- for (i = 0; i < MAX_NR_ZONES; i++) {
- *active += zones[i].nr_active;
- *inactive += zones[i].nr_inactive;
- *free += zones[i].free_pages;
- }
+ __pgrep_counts(active, inactive, free, pgdat->node_zones);
}
void get_zone_counts(unsigned long *active,
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:38.000000000 +0200
@@ -157,5 +157,10 @@ static inline void __pgrep_remove(struct
zone->nr_inactive--;
}
+static inline unsigned long __pgrep_nr_pages(struct zone *zone)
+{
+ return zone->nr_active + zone->nr_inactive;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:18.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:36.000000000 +0200
@@ -671,7 +671,7 @@ unsigned long try_to_free_pages(struct z
continue;
zone->temp_priority = DEF_PRIORITY;
- lru_pages += zone->nr_active + zone->nr_inactive;
+ lru_pages += __pgrep_nr_pages(zone);
}
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -812,7 +812,7 @@ scan:
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
- lru_pages += zone->nr_active + zone->nr_inactive;
+ lru_pages += __pgrep_nr_pages(zone);
}
/*
@@ -853,7 +853,7 @@ scan:
if (zone->all_unreclaimable)
continue;
if (nr_slab == 0 && zone->pages_scanned >=
- (zone->nr_active + zone->nr_inactive) * 4)
+ __pgrep_nr_pages(zone) * 4)
zone->all_unreclaimable = 1;
/*
* If we've done a decent amount of scanning and
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 21/39] mm: pgrep: per policy data
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (19 preceding siblings ...)
2006-07-12 14:40 ` [PATCH 20/39] mm: pgrep: page count functions Peter Zijlstra
@ 2006-07-12 14:41 ` Peter Zijlstra
2006-07-12 14:41 ` [PATCH 22/39] mm: pgrep: per policy PG_flags Peter Zijlstra
` (18 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:41 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Abstract the policy-specific variables out of struct zone.
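Each policy can now carry its own per-zone state. Purely as an
illustration (this struct is made up, not from this series), a
CLOCK-style policy might define:

struct pgrep_data {
	struct list_head clock_ring;	/* single circular list */
	unsigned long nr_resident;	/* pages on the ring */
	unsigned long nr_scan;		/* accumulated scan work */
};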
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace_data.h | 13 ++++++++
include/linux/mm_use_once_data.h | 16 ++++++++++
include/linux/mm_use_once_policy.h | 20 ++++++------
include/linux/mmzone.h | 8 +----
mm/useonce.c | 54 +++++++++++++++++------------------
5 files changed, 68 insertions(+), 43 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_data.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_use_once_data.h 2006-07-12 16:11:19.000000000 +0200
@@ -0,0 +1,16 @@
+#ifndef _LINUX_MM_USEONCE_DATA_H
+#define _LINUX_MM_USEONCE_DATA_H
+
+#ifdef __KERNEL__
+
+struct pgrep_data {
+ struct list_head active_list;
+ struct list_head inactive_list;
+ unsigned long nr_scan_active;
+ unsigned long nr_scan_inactive;
+ unsigned long nr_active;
+ unsigned long nr_inactive;
+};
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_USEONCE_DATA_H */
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:31.000000000 +0200
@@ -13,12 +13,12 @@ void __init pgrep_init(void)
void __init pgrep_init_zone(struct zone *zone)
{
- INIT_LIST_HEAD(&zone->active_list);
- INIT_LIST_HEAD(&zone->inactive_list);
- zone->nr_scan_active = 0;
- zone->nr_scan_inactive = 0;
- zone->nr_active = 0;
- zone->nr_inactive = 0;
+ INIT_LIST_HEAD(&zone->policy.active_list);
+ INIT_LIST_HEAD(&zone->policy.inactive_list);
+ zone->policy.nr_scan_active = 0;
+ zone->policy.nr_scan_inactive = 0;
+ zone->policy.nr_active = 0;
+ zone->policy.nr_inactive = 0;
}
/**
@@ -99,7 +99,7 @@ static unsigned long shrink_inactive_lis
unsigned long nr_freed;
nr_taken = isolate_lru_pages(zone, sc->swap_cluster_max,
- &zone->inactive_list,
+ &zone->policy.inactive_list,
&page_list, &nr_scan);
spin_unlock_irq(&zone->lru_lock);
@@ -178,7 +178,7 @@ static void shrink_active_list(unsigned
pgrep_add_drain();
spin_lock_irq(&zone->lru_lock);
- pgmoved = isolate_lru_pages(zone, nr_pages, &zone->active_list,
+ pgmoved = isolate_lru_pages(zone, nr_pages, &zone->policy.active_list,
&l_hold, &pgscanned);
spin_unlock_irq(&zone->lru_lock);
@@ -208,10 +208,10 @@ static void shrink_active_list(unsigned
BUG_ON(!PageActive(page));
ClearPageActive(page);
- list_move(&page->lru, &zone->inactive_list);
+ list_move(&page->lru, &zone->policy.inactive_list);
pgmoved++;
if (!pagevec_add(&pvec, page)) {
- zone->nr_inactive += pgmoved;
+ zone->policy.nr_inactive += pgmoved;
spin_unlock_irq(&zone->lru_lock);
pgdeactivate += pgmoved;
pgmoved = 0;
@@ -221,7 +221,7 @@ static void shrink_active_list(unsigned
spin_lock_irq(&zone->lru_lock);
}
}
- zone->nr_inactive += pgmoved;
+ zone->policy.nr_inactive += pgmoved;
pgdeactivate += pgmoved;
if (buffer_heads_over_limit) {
spin_unlock_irq(&zone->lru_lock);
@@ -236,17 +236,17 @@ static void shrink_active_list(unsigned
BUG_ON(PageLRU(page));
SetPageLRU(page);
BUG_ON(!PageActive(page));
- list_move(&page->lru, &zone->active_list);
+ list_move(&page->lru, &zone->policy.active_list);
pgmoved++;
if (!pagevec_add(&pvec, page)) {
- zone->nr_active += pgmoved;
+ zone->policy.nr_active += pgmoved;
pgmoved = 0;
spin_unlock_irq(&zone->lru_lock);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
}
- zone->nr_active += pgmoved;
+ zone->policy.nr_active += pgmoved;
spin_unlock(&zone->lru_lock);
__mod_page_state_zone(zone, pgrefill, pgscanned);
@@ -274,17 +274,17 @@ unsigned long pgrep_shrink_zone(int prio
* Add one to `nr_to_scan' just to make sure that the kernel will
* slowly sift through the active list.
*/
- zone->nr_scan_active += (zone->nr_active >> priority) + 1;
- nr_active = zone->nr_scan_active;
+ zone->policy.nr_scan_active += (zone->policy.nr_active >> priority) + 1;
+ nr_active = zone->policy.nr_scan_active;
if (nr_active >= sc->swap_cluster_max)
- zone->nr_scan_active = 0;
+ zone->policy.nr_scan_active = 0;
else
nr_active = 0;
- zone->nr_scan_inactive += (zone->nr_inactive >> priority) + 1;
- nr_inactive = zone->nr_scan_inactive;
+ zone->policy.nr_scan_inactive += (zone->policy.nr_inactive >> priority) + 1;
+ nr_inactive = zone->policy.nr_scan_inactive;
if (nr_inactive >= sc->swap_cluster_max)
- zone->nr_scan_inactive = 0;
+ zone->policy.nr_scan_inactive = 0;
else
nr_inactive = 0;
@@ -331,8 +331,8 @@ void pgrep_show(struct zone *zone)
K(zone->pages_min),
K(zone->pages_low),
K(zone->pages_high),
- K(zone->nr_active),
- K(zone->nr_inactive),
+ K(zone->policy.nr_active),
+ K(zone->policy.nr_inactive),
K(zone->present_pages),
zone->pages_scanned,
(zone->all_unreclaimable ? "yes" : "no")
@@ -355,10 +355,10 @@ void pgrep_zoneinfo(struct zone *zone, s
zone->pages_min,
zone->pages_low,
zone->pages_high,
- zone->nr_active,
- zone->nr_inactive,
+ zone->policy.nr_active,
+ zone->policy.nr_inactive,
zone->pages_scanned,
- zone->nr_scan_active, zone->nr_scan_inactive,
+ zone->policy.nr_scan_active, zone->policy.nr_scan_inactive,
zone->spanned_pages,
zone->present_pages);
}
@@ -372,8 +372,8 @@ void __pgrep_counts(unsigned long *activ
*inactive = 0;
*free = 0;
for (i = 0; i < MAX_NR_ZONES; i++) {
- *active += zones[i].nr_active;
- *inactive += zones[i].nr_inactive;
+ *active += zones[i].policy.nr_active;
+ *inactive += zones[i].policy.nr_inactive;
*free += zones[i].free_pages;
}
}
Index: linux-2.6/include/linux/mm_page_replace_data.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_page_replace_data.h 2006-07-12 16:11:29.000000000 +0200
@@ -0,0 +1,13 @@
+#ifndef _LINUX_MM_PAGE_REPLACE_DATA_H
+#define _LINUX_MM_PAGE_REPLACE_DATA_H
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_MM_POLICY_USEONCE
+#include <linux/mm_use_once_data.h>
+#else
+#error no mm policy
+#endif
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_PAGE_REPLACE_DATA_H */
Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h 2006-07-12 16:07:29.000000000 +0200
+++ linux-2.6/include/linux/mmzone.h 2006-07-12 16:09:19.000000000 +0200
@@ -14,6 +14,7 @@
#include <linux/init.h>
#include <linux/seqlock.h>
#include <linux/nodemask.h>
+#include <linux/mm_page_replace_data.h>
#include <asm/atomic.h>
#include <asm/page.h>
@@ -154,12 +155,7 @@ struct zone {
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
- struct list_head active_list;
- struct list_head inactive_list;
- unsigned long nr_scan_active;
- unsigned long nr_scan_inactive;
- unsigned long nr_active;
- unsigned long nr_inactive;
+ struct pgrep_data policy;
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:37.000000000 +0200
@@ -9,29 +9,29 @@
static inline void
add_page_to_active_list(struct zone *zone, struct page *page)
{
- list_add(&page->lru, &zone->active_list);
- zone->nr_active++;
+ list_add(&page->lru, &zone->policy.active_list);
+ zone->policy.nr_active++;
}
static inline void
add_page_to_inactive_list(struct zone *zone, struct page *page)
{
- list_add(&page->lru, &zone->inactive_list);
- zone->nr_inactive++;
+ list_add(&page->lru, &zone->policy.inactive_list);
+ zone->policy.nr_inactive++;
}
static inline void
del_page_from_active_list(struct zone *zone, struct page *page)
{
list_del(&page->lru);
- zone->nr_active--;
+ zone->policy.nr_active--;
}
static inline void
del_page_from_inactive_list(struct zone *zone, struct page *page)
{
list_del(&page->lru);
- zone->nr_inactive--;
+ zone->policy.nr_inactive--;
}
static inline void pgrep_hint_active(struct page *page)
@@ -126,7 +126,7 @@ static inline int pgrep_activate(struct
static inline void __pgrep_rotate_reclaimable(struct zone *zone, struct page *page)
{
if (PageLRU(page) && !PageActive(page)) {
- list_move_tail(&page->lru, &zone->inactive_list);
+ list_move_tail(&page->lru, &zone->policy.inactive_list);
inc_page_state(pgrotated);
}
}
@@ -152,14 +152,14 @@ static inline void __pgrep_remove(struct
{
list_del(&page->lru);
if (PageActive(page))
- zone->nr_active--;
+ zone->policy.nr_active--;
else
- zone->nr_inactive--;
+ zone->policy.nr_inactive--;
}
static inline unsigned long __pgrep_nr_pages(struct zone *zone)
{
- return zone->nr_active + zone->nr_inactive;
+ return zone->policy.nr_active + zone->policy.nr_inactive;
}
#endif /* __KERNEL__ */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 22/39] mm: pgrep: per policy PG_flags
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (20 preceding siblings ...)
2006-07-12 14:41 ` [PATCH 21/39] mm: pgrep: per policy data Peter Zijlstra
@ 2006-07-12 14:41 ` Peter Zijlstra
2006-07-12 14:41 ` [PATCH 23/39] mm: pgrep: nonresident page tracking hooks Peter Zijlstra
` (17 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:41 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Abstract the replacement policy specific pageflags.
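With PG_active gone from the generic flag names, each policy maps its
own names onto the reserved bit. A hypothetical sketch (the 'hot' bit
is invented for illustration):

#define PG_hot	PG_reclaim1

#define PageHot(page)		test_bit(PG_hot, &(page)->flags)
#define SetPageHot(page)	set_bit(PG_hot, &(page)->flags)
#define ClearPageHot(page)	clear_bit(PG_hot, &(page)->flags)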
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_use_once_policy.h | 8 ++++++++
include/linux/page-flags.h | 7 +------
mm/hugetlb.c | 2 +-
mm/page_alloc.c | 6 +++---
4 files changed, 13 insertions(+), 10 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:36.000000000 +0200
@@ -5,6 +5,14 @@
#include <linux/fs.h>
#include <linux/rmap.h>
+#include <linux/page-flags.h>
+
+#define PG_active PG_reclaim1
+
+#define PageActive(page) test_bit(PG_active, &(page)->flags)
+#define SetPageActive(page) set_bit(PG_active, &(page)->flags)
+#define ClearPageActive(page) clear_bit(PG_active, &(page)->flags)
+#define __ClearPageActive(page) __clear_bit(PG_active, &(page)->flags)
static inline void
add_page_to_active_list(struct zone *zone, struct page *page)
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h 2006-07-12 16:07:30.000000000 +0200
+++ linux-2.6/include/linux/page-flags.h 2006-07-12 16:11:30.000000000 +0200
@@ -70,7 +70,7 @@
#define PG_dirty 4
#define PG_lru 5
-#define PG_active 6
+#define PG_reclaim1 6 /* reserved by the mm reclaim code */
#define PG_slab 7 /* slab debug (Suparna wants this) */
#define PG_checked 8 /* kill me in 2.5.<early>. */
@@ -259,11 +259,6 @@ extern void __mod_page_state_offset(unsi
#define ClearPageLRU(page) clear_bit(PG_lru, &(page)->flags)
#define __ClearPageLRU(page) __clear_bit(PG_lru, &(page)->flags)
-#define PageActive(page) test_bit(PG_active, &(page)->flags)
-#define SetPageActive(page) set_bit(PG_active, &(page)->flags)
-#define ClearPageActive(page) clear_bit(PG_active, &(page)->flags)
-#define __ClearPageActive(page) __clear_bit(PG_active, &(page)->flags)
-
#define PageSlab(page) test_bit(PG_slab, &(page)->flags)
#define __SetPageSlab(page) __set_bit(PG_slab, &(page)->flags)
#define __ClearPageSlab(page) __clear_bit(PG_slab, &(page)->flags)
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/page_alloc.c 2006-07-12 16:11:30.000000000 +0200
@@ -149,7 +149,7 @@ static void bad_page(struct page *page)
page->flags &= ~(1 << PG_lru |
1 << PG_private |
1 << PG_locked |
- 1 << PG_active |
+ 1 << PG_reclaim1 |
1 << PG_dirty |
1 << PG_reclaim |
1 << PG_slab |
@@ -379,7 +379,7 @@ static inline int free_pages_check(struc
1 << PG_lru |
1 << PG_private |
1 << PG_locked |
- 1 << PG_active |
+ 1 << PG_reclaim1 |
1 << PG_reclaim |
1 << PG_slab |
1 << PG_swapcache |
@@ -527,7 +527,7 @@ static int prep_new_page(struct page *pa
1 << PG_lru |
1 << PG_private |
1 << PG_locked |
- 1 << PG_active |
+ 1 << PG_reclaim1 |
1 << PG_dirty |
1 << PG_reclaim |
1 << PG_slab |
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/hugetlb.c 2006-07-12 16:11:30.000000000 +0200
@@ -291,7 +291,7 @@ static void update_and_free_page(struct
nr_huge_pages_node[page_zone(page)->zone_pgdat->node_id]--;
for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
- 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
+ 1 << PG_dirty | 1 << PG_reclaim1 | 1 << PG_reserved |
1 << PG_private | 1<< PG_writeback);
}
page[1].lru.next = NULL;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 23/39] mm: pgrep: nonresident page tracking hooks
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (21 preceding siblings ...)
2006-07-12 14:41 ` [PATCH 22/39] mm: pgrep: per policy PG_flags Peter Zijlstra
@ 2006-07-12 14:41 ` Peter Zijlstra
2006-07-12 14:41 ` [PATCH 24/39] mm: pgrep: generic shrinker logic Peter Zijlstra
` (16 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:41 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Add hooks for nonresident page tracking.
The policy has to define MM_POLICY_HAS_NONRESIDENT when it makes
use of these.
API:
Remember a page - insert it into the nonresident page tracking.
void pgrep_remember(struct zone *, struct page *);
Forget about a page - remove it from the nonresident page tracking.
void pgrep_forget(struct address_space *, unsigned long);
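As a sketch of how a policy might wire these hooks up to the
nonresident hash introduced later in this series (patch 27) -
illustrative only, this is not what the useonce policy does:

#define MM_POLICY_HAS_NONRESIDENT

static inline void pgrep_remember(struct zone *zone, struct page *page)
{
	/* hash the evicted page so a later refault can be detected */
	nonresident_put(page_mapping(page), page->index);
}

static inline void pgrep_forget(struct address_space *mapping,
				unsigned long index)
{
	/* a successful lookup clears the entry as a side effect */
	nonresident_get(mapping, index);
}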
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 2 ++
include/linux/mm_use_once_policy.h | 3 +++
mm/memory.c | 28 ++++++++++++++++++++++++++++
mm/swapfile.c | 12 ++++++++++--
mm/vmscan.c | 2 ++
5 files changed, 45 insertions(+), 2 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:35.000000000 +0200
@@ -165,6 +165,9 @@ static inline void __pgrep_remove(struct
zone->policy.nr_inactive--;
}
+#define pgrep_remember(z, p) do { } while (0)
+#define pgrep_forget(m, i) do { } while (0)
+
static inline unsigned long __pgrep_nr_pages(struct zone *zone)
{
return zone->policy.nr_active + zone->policy.nr_inactive;
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:35.000000000 +0200
@@ -88,6 +88,8 @@ extern unsigned long pgrep_shrink_zone(i
/* int pgrep_is_active(struct page *); */
/* void __pgrep_remove(struct zone *zone, struct page *page); */
extern void pgrep_reinsert(struct list_head *);
+/* void pgrep_remember(struct zone *, struct page*); */
+/* void pgrep_forget(struct address_space *, unsigned long); */
extern void pgrep_show(struct zone *);
extern void pgrep_zoneinfo(struct zone *, struct seq_file *);
extern void __pgrep_counts(unsigned long *, unsigned long *,
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/memory.c 2006-07-12 16:09:19.000000000 +0200
@@ -603,6 +603,31 @@ int copy_page_range(struct mm_struct *ds
return 0;
}
+#if defined MM_POLICY_HAS_NONRESIDENT
+static void free_file(struct vm_area_struct *vma,
+ unsigned long offset)
+{
+ struct address_space *mapping;
+ struct page *page;
+
+ if (!vma ||
+ !vma->vm_file ||
+ !vma->vm_file->f_mapping)
+ return;
+
+ mapping = vma->vm_file->f_mapping;
+ page = find_get_page(mapping, offset);
+ if (page) {
+ page_cache_release(page);
+ return;
+ }
+
+ pgrep_forget(mapping, offset);
+}
+#else
+#define free_file(a,b) do { } while (0)
+#endif
+
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
@@ -618,6 +643,7 @@ static unsigned long zap_pte_range(struc
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
+ free_file(vma, pte_to_pgoff(ptent));
(*zap_work)--;
continue;
}
@@ -677,6 +703,8 @@ static unsigned long zap_pte_range(struc
continue;
if (!pte_file(ptent))
free_swap_and_cache(pte_to_swp_entry(ptent));
+ else
+ free_file(vma, pte_to_pgoff(ptent));
pte_clear_full(mm, addr, pte, tlb->fullmm);
} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/swapfile.c 2006-07-12 16:09:19.000000000 +0200
@@ -28,6 +28,7 @@
#include <linux/mutex.h>
#include <linux/capability.h>
#include <linux/syscalls.h>
+#include <linux/mm_page_replace.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -300,7 +301,8 @@ void swap_free(swp_entry_t entry)
p = swap_info_get(entry);
if (p) {
- swap_entry_free(p, swp_offset(entry));
+ if (!swap_entry_free(p, swp_offset(entry)))
+ pgrep_forget(&swapper_space, entry.val);
spin_unlock(&swap_lock);
}
}
@@ -397,12 +399,18 @@ void free_swap_and_cache(swp_entry_t ent
p = swap_info_get(entry);
if (p) {
- if (swap_entry_free(p, swp_offset(entry)) == 1) {
+ switch (swap_entry_free(p, swp_offset(entry))) {
+ case 1:
page = find_get_page(&swapper_space, entry.val);
if (page && unlikely(TestSetPageLocked(page))) {
page_cache_release(page);
page = NULL;
}
+ break;
+
+ case 0:
+ pgrep_forget(&swapper_space, entry.val);
+ break;
}
spin_unlock(&swap_lock);
}
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:11:35.000000000 +0200
@@ -308,6 +308,7 @@ int remove_mapping(struct address_space
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
+ pgrep_remember(page_zone(page), page);
__delete_from_swap_cache(page);
write_unlock_irq(&mapping->tree_lock);
swap_free(swap);
@@ -315,6 +316,7 @@ int remove_mapping(struct address_space
return 1;
}
+ pgrep_remember(page_zone(page), page);
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
__put_page(page);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 24/39] mm: pgrep: generic shrinker logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (22 preceding siblings ...)
2006-07-12 14:41 ` [PATCH 23/39] mm: pgrep: nonresident page tracking hooks Peter Zijlstra
@ 2006-07-12 14:41 ` Peter Zijlstra
2006-07-12 14:41 ` [PATCH 25/39] mm: pgrep: documentation Peter Zijlstra
` (15 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:41 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Add a general shrinker that policies can make use of.
The policy defines MM_POLICY_HAS_SHRINKER when it does _NOT_ want
to make use of this framework.
API:
Return the number of pages in the scanlist for this zone.
unsigned long __pgrep_nr_scan(struct zone *);
Fill the @list with at most @nr pages from @zone.
void pgrep_get_candidates(struct zone *, int, unsigned long,
struct list_head *, unsigned long *);
Reinsert the pages on @list that could not be freed back into @zone;
@nr is the number of pages that were freed.
void pgrep_put_candidates(struct zone *, struct list_head *,
unsigned long, int);
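For orientation, a hedged sketch of what a policy relying on this
generic shrinker would supply (the single 'policy.list' and its
'nr_pages' counter are illustrative):

static inline unsigned long __pgrep_nr_scan(struct zone *zone)
{
	/* drives the per-priority scan increment in pgrep_shrink_zone() */
	return zone->policy.nr_pages;
}

void __pgrep_get_candidates(struct zone *zone, int priority,
		unsigned long nr, struct list_head *list,
		unsigned long *nr_scan)
{
	/* called under zone->lru_lock; take at most @nr pages */
	isolate_lru_pages(zone, nr, &zone->policy.list, list, nr_scan);
}

void pgrep_put_candidates(struct zone *zone, struct list_head *list,
		unsigned long nr_freed, int may_swap)
{
	struct page *page, *page2;

	spin_lock_irq(&zone->lru_lock);
	list_for_each_entry_safe(page, page2, list, lru)
		list_move(&page->lru, &zone->policy.list);
	spin_unlock_irq(&zone->lru_lock);
	/* a real policy would also drop the reference taken at
	 * isolation time */
}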
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_page_replace.h | 7 ++++
include/linux/mm_use_once_policy.h | 2 +
mm/vmscan.c | 60 +++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+)
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:29.000000000 +0200
@@ -114,5 +114,12 @@ static inline void pgrep_add_drain(void)
put_cpu();
}
+#if ! defined MM_POLICY_HAS_SHRINKER
+/* unsigned long __pgrep_nr_scan(struct zone *); */
+void __pgrep_get_candidates(struct zone *, int, unsigned long, struct list_head *,
+ unsigned long *);
+void pgrep_put_candidates(struct zone *, struct list_head *, unsigned long, int);
+#endif
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_PAGE_REPLACE_H */
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/vmscan.c 2006-07-12 16:09:19.000000000 +0200
@@ -592,6 +592,66 @@ int should_reclaim_mapped(struct zone *z
return 0;
}
+#if ! defined MM_POLICY_HAS_SHRINKER
+unsigned long pgrep_shrink_zone(int priority, struct zone *zone,
+ struct scan_control *sc)
+{
+ unsigned long nr_reclaimed = 0;
+ unsigned long nr_scan = 0;
+
+ atomic_inc(&zone->reclaim_in_progress);
+
+ if (unlikely(sc->swap_cluster_max > SWAP_CLUSTER_MAX)) {
+ nr_scan = zone->policy.nr_scan;
+ zone->policy.nr_scan =
+ sc->swap_cluster_max + SWAP_CLUSTER_MAX - 1;
+ } else
+ zone->policy.nr_scan +=
+ (__pgrep_nr_scan(zone) >> priority) + 1;
+
+ while (zone->policy.nr_scan >= SWAP_CLUSTER_MAX) {
+ LIST_HEAD(page_list);
+ unsigned long nr_scan, nr_freed;
+
+ zone->policy.nr_scan -= SWAP_CLUSTER_MAX;
+
+ pgrep_add_drain();
+ spin_lock_irq(&zone->lru_lock);
+
+ __pgrep_get_candidates(zone, priority, SWAP_CLUSTER_MAX,
+ &page_list, &nr_scan);
+
+ spin_unlock(&zone->lru_lock);
+ if (current_is_kswapd())
+ __mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
+ else
+ __mod_page_state_zone(zone, pgscan_direct, nr_scan);
+ local_irq_enable();
+
+ if (list_empty(&page_list))
+ continue;
+
+ nr_freed = shrink_page_list(&page_list, sc);
+ nr_reclaimed += nr_freed;
+
+ local_irq_disable();
+ if (current_is_kswapd())
+ __mod_page_state(kswapd_steal, nr_freed);
+ __mod_page_state_zone(zone, pgsteal, nr_freed);
+ local_irq_enable();
+
+ pgrep_put_candidates(zone, &page_list, nr_freed, sc->may_swap);
+ }
+ if (nr_scan)
+ zone->policy.nr_scan = nr_scan;
+
+ atomic_dec(&zone->reclaim_in_progress);
+
+ throttle_vm_writeout();
+ return nr_reclaimed;
+}
+#endif
+
/*
* This is the direct reclaim path, for page-allocating processes. We only
* try to reclaim pages from zones which will satisfy the caller's allocation
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:31.000000000 +0200
@@ -173,5 +173,7 @@ static inline unsigned long __pgrep_nr_p
return zone->policy.nr_active + zone->policy.nr_inactive;
}
+#define MM_POLICY_HAS_SHRINKER
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 25/39] mm: pgrep: documentation
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (23 preceding siblings ...)
2006-07-12 14:41 ` [PATCH 24/39] mm: pgrep: generic shrinker logic Peter Zijlstra
@ 2006-07-12 14:41 ` Peter Zijlstra
2006-07-12 14:42 ` [PATCH 26/39] sum_cpu_var Peter Zijlstra
` (14 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:41 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Documentation for the page replace framework.
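As an aside, policy selection is a compile-time switch; a sketch of
the dispatch header (CONFIG_MM_POLICY_FOO and mm_foo_data.h are
hypothetical placeholders for a new policy):

/* include/linux/mm_page_replace_data.h */
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_data.h>
#elif defined(CONFIG_MM_POLICY_FOO)	/* hypothetical new policy */
#include <linux/mm_foo_data.h>
#else
#error no mm policy
#endif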
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Documentation/vm/pgrepment_api.txt | 216 +++++++++++++++++++++++++++++++++++++
1 file changed, 216 insertions(+)
Index: linux-2.6/Documentation/vm/pgrepment_api.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/vm/pgrepment_api.txt 2006-07-12 16:09:19.000000000 +0200
@@ -0,0 +1,216 @@
+ Page Replacement Policy Interface
+
+Introduction
+============
+
+This document describes the page replacement interfaces used by the
+virtual memory subsystem.
+
+When the system's free memory runs below a certain threshold, an action
+must be initiated to reclaim memory for future use. The decision of
+which memory pages to evict is called the replacement policy.
+
+There are several types of reclaimable objects which live in the
+system's memory:
+
+a) file cache pages
+b) anonymous process pages
+c) shared memory (shm) pages
+d) SLAB cache pages, used for internal kernel objects such as the inode
+and dentry caches.
+
+The policy API abstracts the replacement structure for pagecache objects
+(items a), b) and c)), separating it from the reclaim code path.
+
+This allows maintenance of different policies to deal with different
+workload requirements.
+
+Zoned VM
+========
+
+In Linux, physical memory is managed separately into zones, because
+certain types of allocations are address constrained.
+
+The operating system has to support types of hardware which cannot
+access full 32-bit addresses, but are limited to an address mask. For
+instance, ISA devices can only address the lower 24 bits (hence their
+visibility extends only up to 16MB).
+
+Additionally, pages used for internal kernel data must be restricted to
+the direct kernel mapping, which is approximately 1GB in current default
+configurations.
+
+Different zones must be managed separately from the perspective of the
+page reclaim path, because particular zones might suffer more pressure than
+others.
+
+This means that the page replacement structures have to be maintained
+separately for each zone.
+
+Description
+===========
+
+The page replacement policy interface consists of a set of operations
+which are invoked from the common VM code.
+
+As mentioned before, the policy specific data has to be maintained
+separately for each zone, therefore "struct zone" embeds the following
+data structure:
+
+ struct pgrep_data policy;
+
+This structure is to be defined by the policy in a separate header file.
+
+At the moment, this data structure is guarded by the "zone->lru_lock"
+spinlock, thus shared by all policies.
+
+Initialization (invoked during system bootup)
+--------------
+
+ * void __init pgrep_init(void)
+
+Policy private initialization.
+
+ * void __init pgrep_init_zone(struct zone *)
+
+Initialize zone specific policy data.
+
+
+Methods called by the VM
+------------------------
+
+ * void pgrep_hint_active(struct page *);
+ * void pgrep_hint_use_once(struct page *);
+
+Give the policy hints as to the importance of the page. These hints can
+be viewed as the initial priority of the page, where active is +1 and use_once -1.
+
+
+ * void fastcall pgrep_add(struct page *);
+
+Insert a page into the per-CPU list(s), used for batching groups of pages to
+relieve zone->lru_lock contention. Called during page instantiation.
+
+
+ * void pgrep_add_drain(void);
+ * void pgrep_add_drain_cpu(unsigned int);
+
+Drain the per-CPU list(s), pushing pages to the actual cache.
+Called in locations where it is important not to have stale data
+in the per-CPU lists.
+
+
+ * void pagevec_pgrep_add(struct pagevec *);
+ * void __pagevec_pgrep_add(struct pagevec *);
+
+Insert a whole pagevec worth of pages directly.
+
+
+ * void pgrep_get_candidates(struct zone *, int, struct list_head *);
+
+Select candidates for eviction from the specified zone.
+
+@zone: which memory zone to scan.
+@nr_to_scan: number of pages to scan.
+@page_list: list_head on which to add the selected pages
+
+Called by mm/vmscan.c::shrink_cache(), the main function used to
+evict pagecache pages from a specific zone.
+
+
+ * reclaim_t pgrep_reclaimable(struct page *);
+
+Determines whether a page is reclaimable, used by shrink_list().
+This function encapsulates the call to page_referenced.
+
+
+ * void pgrep_activate(struct page *);
+
+Callback used to let the policy know this page was referenced.
+
+
+ * void pgrep_put_candidates(struct zone *, struct list_head *);
+
+Put unfreeable pages back into the zone's cache management structures.
+
+@zone: memory zone which pages belong
+@page_list: list of pages to reinsert
+
+
+ * void pgrep_remove(struct zone *, struct page *);
+
+Remove page from cache. This function clears the page state.
+
+
+ * int pgrep_isolate(struct page *);
+
+Isolate a specified page, i.e. remove it from the cache management structures without
+clearing its page state (used for page migration).
+
+
+ * void pgrep_reinsert(struct list_head *);
+
+Reinsert a list of pages previously isolated by pgrep_isolate().
+Remember that these pages still have their page state; this property
+distinguishes this function from pgrep_add().
+NOTE: the pages on the list need not be in the same zone.
+
+
+ * void __pgrep_rotate_reclaimable(struct zone *, struct page *);
+
+Place this page so that it will be in the next candidate batch.
+
+
+ * void pgrep_remember(struct zone *, struct page*);
+ * void pgrep_forget(struct address_space *, unsigned long);
+
+Hooks for nonresident page management. Allows the policy to remember and
+forget about pages that are no longer resident.
+
+ * void pgrep_show(struct zone *);
+ * void pgrep_zoneinfo(struct zone *, struct seq_file *);
+
+Prints the zone information in the various formats.
+
+ * void __pgrep_counts(unsigned long *, unsigned long *,
+ unsigned long *, struct zone *);
+
+Gives the 'active', 'inactive' and free counts in pages for the selected pgdat,
+where 'active'/'inactive' are open to the policy's interpretation.
+
+ * unsigned long __pgrep_nr_pages(struct zone *);
+
+Gives the total number of pages currently managed by the page replacement
+policy.
+
+
+ * unsigned long __pgrep_nr_scan(struct zone *);
+
+Gives the number of pages needed to drive the scanning.
+
+Helpers
+-------
+
+Certain helpers are shared by all policies; a description of them follows:
+
+1) int should_reclaim_mapped(struct zone *);
+
+The point of this algorithm is to decide when to start reclaiming mapped
+memory instead of clean pagecache.
+
+Returns 1 if mapped pages should be candidates for reclaim, 0 otherwise.
+
+Page flags
+----------
+
+A number of bits in page->flags are reserved for the page replacement
+policies, they are:
+
+ PG_reclaim1 /* bit 6 */
+ PG_reclaim2 /* bit 20 */
+ PG_reclaim3 /* bit 21 */
+
+The policy-private semantics of these bits are to be defined in
+the policy implementation. These bits are internal to the policy and as such
+should not be interpreted in any way by external code.
+
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 26/39] sum_cpu_var
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (24 preceding siblings ...)
2006-07-12 14:41 ` [PATCH 25/39] mm: pgrep: documentation Peter Zijlstra
@ 2006-07-12 14:42 ` Peter Zijlstra
2006-07-12 14:42 ` [PATCH 27/39] mm: clockpro: nonresident page tracking for CLOCK-Pro Peter Zijlstra
` (13 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:42 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
A much-used per-CPU operation for the additional policies.
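A usage sketch (the per-CPU counter is made up for illustration):

static DEFINE_PER_CPU(unsigned long, nr_evicted);

static unsigned long total_evicted(void)
{
	/* expands to a for_each_cpu() loop over per_cpu(nr_evicted, cpu) */
	return __sum_cpu_var(unsigned long, nr_evicted);
}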
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/percpu.h | 5 +++++
1 file changed, 5 insertions(+)
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2006-06-12 06:51:08.000000000 +0200
+++ linux-2.6/include/linux/percpu.h 2006-07-12 16:09:19.000000000 +0200
@@ -15,6 +15,11 @@
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
+#define __sum_cpu_var(type, var) ({ __typeof__(type) sum = 0; \
+ int cpu; \
+ for_each_cpu(cpu) sum += per_cpu(var, cpu); \
+ sum; })
+
#ifdef CONFIG_SMP
struct percpu_data {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 27/39] mm: clockpro: nonresident page tracking for CLOCK-Pro
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (25 preceding siblings ...)
2006-07-12 14:42 ` [PATCH 26/39] sum_cpu_var Peter Zijlstra
@ 2006-07-12 14:42 ` Peter Zijlstra
2006-07-12 14:42 ` [PATCH 28/39] mm: clockpro: re-introduce page_referenced() ignore_token Peter Zijlstra
` (12 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:42 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Rik van Riel <riel@redhat.com>
Track non-resident pages through a simple hashing scheme. This limits
the space overhead to one u32 per page (roughly 0.1% of memory) and
makes a lookup cost a single cache miss.
Aside from seeing whether or not a page was recently evicted, we can
also take a reasonable guess at how many other pages were evicted since
this page was evicted.
NOTE: bucket space also contributes to the total size of the hash.
This way even 64-bit machines with more than 2^32 pages get a fair
chance.
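As a quick check of the overhead figure (assuming 4KB pages): one u32
is 4 bytes of tracking state per 4096-byte page, i.e. 4/4096, roughly
0.1% of memory.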
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/nonresident.h | 12 +++
mm/nonresident.c | 175 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 187 insertions(+)
Index: linux-2.6/mm/nonresident.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/nonresident.c 2006-07-12 16:11:22.000000000 +0200
@@ -0,0 +1,175 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling nonresident_put(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling nonresident_get()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ */
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+/* Number of non-resident pages per hash bucket. Never smaller than 15. */
+#if (L1_CACHE_BYTES < 64)
+#define NR_BUCKET_BYTES 64
+#else
+#define NR_BUCKET_BYTES L1_CACHE_BYTES
+#endif
+#define NUM_NR ((NR_BUCKET_BYTES - sizeof(atomic_t))/sizeof(u32))
+
+struct nr_bucket
+{
+ atomic_t hand;
+ u32 page[NUM_NR];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = hash_ptr(mapping, BITS_PER_LONG);
+ hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+ bucket = hash & nonres_mask;
+
+ return nonres_table + bucket;
+}
+
+static u32 nr_cookie(struct address_space * mapping, unsigned long index)
+{
+ /*
+ * Different hash magic from bucket selection to insure
+ * the combined bits extend hash-space.
+ */
+ unsigned long cookie = hash_long(index, BITS_PER_LONG);
+ cookie = 51 * cookie + hash_ptr(mapping, BITS_PER_LONG);
+
+ if (mapping && mapping->host) {
+ cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+ }
+
+ return (u32)(cookie >> (BITS_PER_LONG - 32));
+}
+
+unsigned long nonresident_get(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ int distance;
+ u32 wanted;
+ int i;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ prefetch(nr_bucket);
+ wanted = nr_cookie(mapping, index);
+
+ for (i = 0; i < NUM_NR; i++) {
+ if (nr_bucket->page[i] == wanted) {
+ nr_bucket->page[i] = 0;
+ /* Return the distance between entry and clock hand. */
+ distance = atomic_read(&nr_bucket->hand) + NUM_NR - i;
+ distance = (distance % NUM_NR) << nonres_shift;
+ /*
+ * Add some jitter to the lower nonres_shift bits.
+ */
+ distance += (nr_bucket - nonres_table);
+ return distance;
+ }
+ }
+
+ return ~0UL;
+}
+
+u32 nonresident_put(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 nrpage;
+ int i;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ prefetchw(nr_bucket);
+ nrpage = nr_cookie(mapping, index);
+
+ /* Atomically find the next array index. */
+ preempt_disable();
+retry:
+ i = atomic_inc_return(&nr_bucket->hand);
+ if (unlikely(i >= NUM_NR)) {
+ if (i == NUM_NR)
+ atomic_set(&nr_bucket->hand, -1);
+ goto retry;
+ }
+ preempt_enable();
+
+ /* Statistics may want to know whether the entry was in use. */
+ return xchg(&nr_bucket->page[i], nrpage);
+}
+
+unsigned long fastcall nonresident_total(void)
+{
+ return NUM_NR << nonres_shift;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init nonresident_init(void)
+{
+ int target;
+ int i;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++)
+ atomic_set(&nonres_table[i].hand, 0);
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
Index: linux-2.6/include/linux/nonresident.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/nonresident.h 2006-07-12 16:11:22.000000000 +0200
@@ -0,0 +1,12 @@
+#ifndef _LINUX_NONRESIDENT_H_
+#define _LINUX_NONRESIDENT_H_
+
+#ifdef __KERNEL__
+
+extern void nonresident_init(void);
+extern unsigned long nonresident_get(struct address_space *, unsigned long);
+extern u32 nonresident_put(struct address_space *, unsigned long);
+extern unsigned long fastcall nonresident_total(void);
+
+#endif /* __KERNEL */
+#endif /* _LINUX_NONRESIDENT_H_ */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread

* [PATCH 28/39] mm: clockpro: re-introduce page_referenced() ignore_token
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (26 preceding siblings ...)
2006-07-12 14:42 ` [PATCH 27/39] mm: clockpro: nonresident page tracking for CLOCK-Pro Peter Zijlstra
@ 2006-07-12 14:42 ` Peter Zijlstra
2006-07-12 14:42 ` [PATCH 29/39] mm: clockpro: second per policy PG_flag Peter Zijlstra
` (11 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:42 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Re-introduce the ignore_token argument to page_referenced(); hand-hot
rotation will make use of this feature.
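Illustrative call sites (the hand-hot rotation code itself appears
later in the series):

	/* normal reclaim: honour the swap token */
	referenced = page_referenced(page, 0, 0);

	/* hand-hot rotation: ignore the token so even the token
	 * holder's pages age */
	referenced = page_referenced(page, 0, 1);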
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_use_once_policy.h | 2 +-
include/linux/rmap.h | 4 ++--
mm/rmap.c | 26 ++++++++++++++++----------
mm/useonce.c | 2 +-
4 files changed, 20 insertions(+), 14 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h 2006-07-12 16:07:30.000000000 +0200
+++ linux-2.6/include/linux/rmap.h 2006-07-12 16:09:19.000000000 +0200
@@ -90,7 +90,7 @@ static inline void page_dup_rmap(struct
/*
* Called from mm/vmscan.c to handle paging out
*/
-int page_referenced(struct page *, int is_locked);
+int page_referenced(struct page *, int is_locked, int ignore_token);
int try_to_unmap(struct page *, int ignore_refs);
void remove_from_swap(struct page *page);
@@ -111,7 +111,7 @@ unsigned long page_address_in_vma(struct
#define anon_vma_prepare(vma) (0)
#define anon_vma_link(vma) do {} while (0)
-#define page_referenced(page,l) TestClearPageReferenced(page)
+#define page_referenced(page,l,i) TestClearPageReferenced(page)
#define try_to_unmap(page, refs) SWAP_FAIL
#endif /* CONFIG_MMU */
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c 2006-07-12 16:07:32.000000000 +0200
+++ linux-2.6/mm/rmap.c 2006-07-12 16:09:19.000000000 +0200
@@ -328,7 +328,7 @@ pte_t *page_check_address(struct page *p
* repeatedly from either page_referenced_anon or page_referenced_file.
*/
static int page_referenced_one(struct page *page,
- struct vm_area_struct *vma, unsigned int *mapcount)
+ struct vm_area_struct *vma, unsigned int *mapcount, int ignore_token)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -349,7 +349,7 @@ static int page_referenced_one(struct pa
/* Pretend the page is referenced if the task has the
swap token and is in the middle of a page fault. */
- if (mm != current->mm && has_swap_token(mm) &&
+ if (mm != current->mm && !ignore_token && has_swap_token(mm) &&
rwsem_is_locked(&mm->mmap_sem))
referenced++;
@@ -359,7 +359,7 @@ out:
return referenced;
}
-static int page_referenced_anon(struct page *page)
+static int page_referenced_anon(struct page *page, int ignore_token)
{
unsigned int mapcount;
struct anon_vma *anon_vma;
@@ -372,7 +372,8 @@ static int page_referenced_anon(struct p
mapcount = page_mapcount(page);
list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
- referenced += page_referenced_one(page, vma, &mapcount);
+ referenced += page_referenced_one(page, vma, &mapcount,
+ ignore_token);
if (!mapcount)
break;
}
@@ -391,7 +392,7 @@ static int page_referenced_anon(struct p
*
* This function is only called from page_referenced for object-based pages.
*/
-static int page_referenced_file(struct page *page)
+static int page_referenced_file(struct page *page, int ignore_token)
{
unsigned int mapcount;
struct address_space *mapping = page->mapping;
@@ -429,7 +430,8 @@ static int page_referenced_file(struct p
referenced++;
break;
}
- referenced += page_referenced_one(page, vma, &mapcount);
+ referenced += page_referenced_one(page, vma, &mapcount,
+ ignore_token);
if (!mapcount)
break;
}
@@ -446,10 +448,13 @@ static int page_referenced_file(struct p
* Quick test_and_clear_referenced for all mappings to a page,
* returns the number of ptes which referenced the page.
*/
-int page_referenced(struct page *page, int is_locked)
+int page_referenced(struct page *page, int is_locked, int ignore_token)
{
int referenced = 0;
+ if (!swap_token_default_timeout)
+ ignore_token = 1;
+
if (page_test_and_clear_young(page))
referenced++;
@@ -458,14 +463,15 @@ int page_referenced(struct page *page, i
if (page_mapped(page) && page->mapping) {
if (PageAnon(page))
- referenced += page_referenced_anon(page);
+ referenced += page_referenced_anon(page, ignore_token);
else if (is_locked)
- referenced += page_referenced_file(page);
+ referenced += page_referenced_file(page, ignore_token);
else if (TestSetPageLocked(page))
referenced++;
else {
if (page->mapping)
- referenced += page_referenced_file(page);
+ referenced += page_referenced_file(page,
+ ignore_token);
unlock_page(page);
}
}
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:20.000000000 +0200
@@ -189,7 +189,7 @@ static void shrink_active_list(unsigned
if (page_mapped(page)) {
if (!reclaim_mapped ||
(total_swap_pages == 0 && PageAnon(page)) ||
- page_referenced(page, 0)) {
+ page_referenced(page, 0, 0)) {
list_add(&page->lru, &l_active);
continue;
}
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:20.000000000 +0200
@@ -114,7 +114,7 @@ static inline reclaim_t pgrep_reclaimabl
if (PageActive(page))
BUG();
- referenced = page_referenced(page, 1);
+ referenced = page_referenced(page, 1, 0);
/* In active use or really unfreeable? Activate it. */
if (referenced && page_mapping_inuse(page))
return RECLAIM_ACTIVATE;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 29/39] mm: clockpro: second per policy PG_flag
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (27 preceding siblings ...)
2006-07-12 14:42 ` [PATCH 28/39] mm: clockpro: re-introduce page_referenced() ignore_token Peter Zijlstra
@ 2006-07-12 14:42 ` Peter Zijlstra
2006-07-12 14:42 ` [PATCH 30/39] mm: clockpro: CLOCK-Pro policy implementation Peter Zijlstra
` (10 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:42 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Add a second PG_flag to the page reclaim framework.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/page-flags.h | 1 +
mm/hugetlb.c | 4 ++--
mm/page_alloc.c | 3 +++
3 files changed, 6 insertions(+), 2 deletions(-)
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/page-flags.h 2006-07-12 16:11:26.000000000 +0200
@@ -89,6 +89,7 @@
#define PG_buddy 19 /* Page is free, on buddy lists */
#define PG_uncached 20 /* Page has been mapped as uncached */
+#define PG_reclaim2 21 /* reserved by the mm reclaim code */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/page_alloc.c 2006-07-12 16:11:26.000000000 +0200
@@ -150,6 +150,7 @@ static void bad_page(struct page *page)
1 << PG_private |
1 << PG_locked |
1 << PG_reclaim1 |
+ 1 << PG_reclaim2 |
1 << PG_dirty |
1 << PG_reclaim |
1 << PG_slab |
@@ -380,6 +381,7 @@ static inline int free_pages_check(struc
1 << PG_private |
1 << PG_locked |
1 << PG_reclaim1 |
+ 1 << PG_reclaim2 |
1 << PG_reclaim |
1 << PG_slab |
1 << PG_swapcache |
@@ -528,6 +530,7 @@ static int prep_new_page(struct page *pa
1 << PG_private |
1 << PG_locked |
1 << PG_reclaim1 |
+ 1 << PG_reclaim2 |
1 << PG_dirty |
1 << PG_reclaim |
1 << PG_slab |
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/hugetlb.c 2006-07-12 16:11:26.000000000 +0200
@@ -291,8 +291,8 @@ static void update_and_free_page(struct
nr_huge_pages_node[page_zone(page)->zone_pgdat->node_id]--;
for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
- 1 << PG_dirty | 1 << PG_reclaim1 | 1 << PG_reserved |
- 1 << PG_private | 1<< PG_writeback);
+ 1 << PG_dirty | 1 << PG_reclaim1 | 1 << PG_reclaim2 |
+ 1 << PG_reserved | 1 << PG_private | 1<< PG_writeback);
}
page[1].lru.next = NULL;
set_page_refcounted(page);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 30/39] mm: clockpro: CLOCK-Pro policy implementation
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (28 preceding siblings ...)
2006-07-12 14:42 ` [PATCH 29/39] mm: clockpro: second per policy PG_flag Peter Zijlstra
@ 2006-07-12 14:42 ` Peter Zijlstra
2006-07-12 14:43 ` [PATCH 31/39] mm: cart: nonresident page tracking for CART Peter Zijlstra
` (9 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:42 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
This patch implements an approximation to the CLOCK-Pro page replacement
algorithm presented in:
http://www.cs.wm.edu/hpcs/WWW/HTML/publications/abs05-3.html
<insert rant on coolness and some numbers that prove it/>
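As a condensed sketch of the resulting per-page state machine (illustrative only - the real logic is split between pgrep_activate(), pgrep_reclaimable() and pgrep_remember() below, and additionally keeps PG_test set on fresh promotions for accounting; keep()/evict() are stand-ins for the list handling):
	/* hand cold, for each page it passes */
	if (PageHot(page))
		keep(page);			/* hot pages are not reclaimable */
	else if (page_referenced(page, 1, 0)) {
		if (PageTest(page))
			SetPageHot(page);	/* 2nd hit in test period: promote */
		else
			SetPageTest(page);	/* 1st hit: start a test period */
		keep(page);
	} else
		evict(page);			/* remembered iff PG_test is set */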
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_clockpro_data.h | 21
include/linux/mm_clockpro_policy.h | 139 ++++++
include/linux/mm_page_replace.h | 2
include/linux/mm_page_replace_data.h | 2
mm/Kconfig | 5
mm/Makefile | 1
mm/clockpro.c | 759 +++++++++++++++++++++++++++++++++++
7 files changed, 929 insertions(+)
Index: linux-2.6/mm/clockpro.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/clockpro.c 2006-07-12 16:11:22.000000000 +0200
@@ -0,0 +1,759 @@
+/*
+ * mm/clockpro.c
+ *
+ * Written by Peter Zijlstra <a.p.zijlstra@chello.nl>
+ * Released under the GPLv2, see the file COPYING for details.
+ *
+ * This file implements an approximation to the CLOCK-Pro page replacement
+ * algorithm presented in:
+ * http://www.cs.wm.edu/hpcs/WWW/HTML/publications/abs05-3.html
+ *
+ * ===> The Algorithm <===
+ *
+ * This algorithm strives to separate the pages with a small reuse distance
+ * from those with a large reuse distance. Pages with a small reuse distance
+ * are called hot pages and are not available for reclaim. Cold pages are those
+ * that have a large reuse distance. In order to track the reuse distance a
+ * test period is started when a reference is detected. When another reference
+ * is detected during this test period the page has a small enough reuse
+ * distance to be classified as hot.
+ *
+ * The test period is terminated when the page would get a larger reuse
+ * distance than the current largest hot page. This is directly coupled to the
+ * cold page target - the target number of cold pages. More cold pages
+ * mean fewer hot pages and hence the test period will be shorter.
+ *
+ * The cold page target is adjusted when a test period expires (dec) or when
+ * a page is referenced during its test period (inc).
+ *
+ * If we faulted in a nonresident page that is still in the test period, the
+ * inter-reference distance of that page is by definition smaller than that of
+ * the coldest page on the hot list. Meaning the hot list contains pages that
+ * are colder than at least one page that got evicted from memory, and the hot
+ * list should be smaller - conversely, the cold list should be larger.
+ *
+ * Since it is very likely that pages that are about to be evicted are still in
+ * their test period, their state has to be kept around until it expires, or
+ * the total number of pages tracked is twice the total of resident pages.
+ *
+ * The data structure used is a single CLOCK with three hands: Hcold, Hhot and
+ * Htest. It works as follows: Hcold is rotated to look for unreferenced cold
+ * pages - those can be evicted. When Hcold encounters a referenced page it
+ * either starts a test period or promotes the page to hot if it already was in
+ * its test period. Then if there are fewer cold pages left than targeted, Hhot
+ * is rotated which will demote unreferenced hot pages. Hhot also terminates
+ * the test period of all cold pages it encounters. Then if after all this
+ * there are more nonresident pages tracked than there are resident pages,
+ * Htest will be rotated. Htest terminates all test periods it encounters,
+ * thereby removing nonresident pages. (Htest is pushed by Hhot - Hcold moves
+ * independently)
+ *
+ * res | h/c | tst | ref || Hcold | Hhot | Htest || Flt
+ * ----+-----+-----+-----++--------+--------+--------++-----
+ * 1 | 1 | 0 | 1 || = 1101 | 1100 | = 1101 ||
+ * 1 | 1 | 0 | 0 || = 1100 | 1000 | = 1100 ||
+ * ----+-----+-----+-----++--------+--------+--------++-----
+ * 1 | 0 | 1 | 1 || 1100 | 1001 | 1001 ||
+ * 1 | 0 | 1 | 0 || N 0010 | 1000 | 1000 ||
+ * 1 | 0 | 0 | 1 || 1010 | = 1001 | = 1001 ||
+ * 1 | 0 | 0 | 0 || X 0000 | = 1000 | = 1000 ||
+ * ----+-----+-----+-----++--------+--------+--------++-----
+ * ----+-----+-----+-----++--------+--------+--------++-----
+ * 0 | 0 | 1 | 1 || | | || 1100
+ * 0 | 0 | 1 | 0 || = 0010 | X 0000 | X 0000 ||
+ * 0 | 0 | 0 | 1 || | | || 1010
+ *
+ * The table gives the state transitions for each hand: '=' denotes no change,
+ * 'N' denotes becoming nonresident and 'X' denotes removal.
+ *
+ * (XXX: mention LIRS hot/cold page swapping which makes for the relocation on
+ * promotion/demotion)
+ *
+ * ===> The Approximation <===
+ *
+ * h/c -> PageHot()
+ * tst -> PageTest()
+ * ref -> page_referenced()
+ *
+ * Because pages can be evicted from one zone and paged back into another,
+ * nonresident page tracking needs to be inter-zone whereas resident page
+ * tracking is per definition per zone. Hence the resident and nonresident
+ * page tracking needs to be separated.
+ *
+ * This is accomplished by using two CLOCKs instead of one. One two handed
+ * CLOCK for the resident pages, and one single handed CLOCK for the
+ * nonresident pages. These CLOCKs are then coupled so that one can be seen
+ * as an overlay on the other - thereby approximating the relative order of
+ * the pages.
+ *
+ * The resident CLOCK has, as mentioned, two hands, one is Hcold (it does not
+ * affect nonresident pages) and the other is the resident part of Hhot.
+ *
+ * The nonresident CLOCK's single hand will be the nonresident part of Hhot.
+ * Htest is replaced by limiting the size of the nonresident CLOCK.
+ *
+ * The Hhot parts are coupled so that when all resident Hhot have made a full
+ * revolution so will the nonresident Hhot.
+ *
+ * (XXX: mention use-once, the two list/single list duality)
+ * TODO: numa
+ *
+ * All functions that are prefixed with '__' assume that zone->lru_lock is taken.
+ */
+
+#include <linux/mm_page_replace.h>
+#include <linux/rmap.h>
+#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/swap.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/writeback.h>
+
+#include <asm/div64.h>
+
+#include <linux/nonresident.h>
+
+/* The nonresident code can be seen as a single handed clock that
+ * lacks the ability to remove tail pages. However it can report the
+ * distance to the head.
+ *
+ * What is done is to set a threshold that cuts off the clock tail.
+ */
+static DEFINE_PER_CPU(unsigned long, nonres_cutoff) = 0;
+
+/* Keep track of the number of nonresident pages tracked.
+ * This is used to scale the hand hot vs nonres hand rotation.
+ */
+static DEFINE_PER_CPU(unsigned long, nonres_count) = 0;
+
+static inline unsigned long __nonres_cutoff(void)
+{
+ return __sum_cpu_var(unsigned long, nonres_cutoff);
+}
+
+static inline unsigned long __nonres_count(void)
+{
+ return __sum_cpu_var(unsigned long, nonres_count);
+}
+
+static inline unsigned long __nonres_threshold(void)
+{
+ unsigned long cutoff = __nonres_cutoff() / 2;
+ unsigned long count = __nonres_count();
+
+ if (cutoff > count)
+ return 0;
+
+ return count - cutoff;
+}
+
+static void __nonres_cutoff_inc(unsigned long dt)
+{
+ unsigned long count = __nonres_count() * 2;
+ unsigned long cutoff = __nonres_cutoff();
+ if (cutoff < count - dt)
+ __get_cpu_var(nonres_cutoff) += dt;
+ else
+ __get_cpu_var(nonres_cutoff) += count - cutoff;
+}
+
+static void __nonres_cutoff_dec(unsigned long dt)
+{
+ unsigned long cutoff = __nonres_cutoff();
+ if (cutoff > dt)
+ __get_cpu_var(nonres_cutoff) -= dt;
+ else
+ __get_cpu_var(nonres_cutoff) -= cutoff;
+}
+
+static int nonres_get(struct address_space *mapping, unsigned long index)
+{
+ int found = 0;
+ unsigned long distance = nonresident_get(mapping, index);
+ if (distance != ~0UL) { /* valid page */
+ --__get_cpu_var(nonres_count);
+
+ /* If the distance is below the threshold the test
+ * period is still valid. Otherwise a tail page
+ * was found and we can decrease the cutoff.
+ *
+ * Even if not found the hole introduced by the removal
+ * of the cookie increases the avg. distance by 1/2.
+ *
+ * NOTE: the cold target was adjusted when the threshold
+ * was decreased.
+ */
+ found = distance < __nonres_cutoff();
+ __nonres_cutoff_dec(1 + !!found);
+ }
+
+ return found;
+}
+
+static int nonres_put(struct address_space *mapping, unsigned long index)
+{
+ if (nonresident_put(mapping, index)) {
+ /* nonresident clock eats tail due to limited
+ * size; hand test equivalent.
+ */
+ __nonres_cutoff_dec(2);
+ return 1;
+ }
+
+ ++__get_cpu_var(nonres_count);
+ return 0;
+}
+
+static inline void nonres_rotate(unsigned long nr)
+{
+ __nonres_cutoff_inc(nr * 2);
+}
+
+static inline unsigned long nonres_count(void)
+{
+ return __nonres_threshold();
+}
+
+void __init pgrep_init(void)
+{
+ nonresident_init();
+}
+
+/* Called to initialize the clockpro parameters */
+void __init pgrep_init_zone(struct zone *zone)
+{
+ INIT_LIST_HEAD(&zone->policy.list_hand[0]);
+ INIT_LIST_HEAD(&zone->policy.list_hand[1]);
+ zone->policy.nr_resident = 0;
+ zone->policy.nr_cold = 0;
+ zone->policy.nr_cold_target = 2*zone->pages_high;
+ zone->policy.nr_nonresident_scale = 0;
+}
+
+/*
+ * Increase the cold pages target; limit it to the total number of resident
+ * pages present in the current zone.
+ *
+ * @zone: current zone
+ * @dct: intended increase
+ */
+static void __cold_target_inc(struct zone *zone, unsigned long dct)
+{
+ if (zone->policy.nr_cold_target < zone->policy.nr_resident - dct)
+ zone->policy.nr_cold_target += dct;
+ else
+ zone->policy.nr_cold_target = zone->policy.nr_resident;
+}
+
+/*
+ * Decrease the cold pages target; limit it to the high watermark in order
+ * to always have some pages available for quick reclaim.
+ *
+ * @zone: current zone
+ * @dct: intended decrease
+ */
+static void __cold_target_dec(struct zone *zone, unsigned long dct)
+{
+ if (zone->policy.nr_cold_target > (2*zone->pages_high) + dct)
+ zone->policy.nr_cold_target -= dct;
+ else
+ zone->policy.nr_cold_target = (2*zone->pages_high);
+}
+
+/*
+ * Instead of a single CLOCK with two hands, two lists are used.
+ * When the two lists are laid head to tail two junction points
+ * appear, these points are the hand positions.
+ *
+ * This approach has the advantage that there is no pointer magic
+ * associated with the hands. It is impossible to remove the page
+ * a hand is pointing to.
+ *
+ * To allow the hands to lap each other the lists are swappable; eg.
+ * when the hands point to the same position, one of the lists has to
+ * be empty - however it does not matter which list it is. Hence we make
+ * sure that the hand we are going to work on contains the pages.
+ */
+static inline
+void __select_list_hand(struct zone *zone, struct list_head *list)
+{
+ if (list_empty(list)) {
+ LIST_HEAD(tmp);
+ list_splice_init(&zone->policy.list_hand[0], &tmp);
+ list_splice_init(&zone->policy.list_hand[1],
+ &zone->policy.list_hand[0]);
+ list_splice(&tmp, &zone->policy.list_hand[1]);
+ }
+}
+
+static DEFINE_PER_CPU(struct pagevec, clockpro_add_pvecs) = { 0, };
+
+/*
+ * Insert page into @zones clock and update adaptive parameters.
+ *
+ * Several page flags are used for insertion hints:
+ * PG_test - use the use-once logic
+ *
+ * For now we will ignore the active hint; the use once logic is
+ * explained below.
+ *
+ * @zone: target zone.
+ * @page: new page.
+ */
+void __pgrep_add(struct zone *zone, struct page *page)
+{
+ int found = 0;
+ struct address_space *mapping = page_mapping(page);
+ int hand = HAND_HOT;
+
+ if (mapping)
+ found = nonres_get(mapping, page_index(page));
+
+#if 0
+ /* prefill the hot list */
+ if (zone->free_pages > zone->policy.nr_cold_target) {
+ SetPageHot(page);
+ hand = HAND_COLD;
+ } else
+#endif
+ /* abuse the PG_test flag for pagecache use-once */
+ if (PageTest(page)) {
+ /*
+ * Use-Once insert; we want to avoid activation on the first
+ * reference (which we know will come).
+ *
+ * This is accomplished by inserting the page one state lower
+ * than usual so the activation that does come ups it to the
+ * normal insert state. Also we insert right behind Hhot so
+ * 1) Hhot cannot interfere; and 2) we lose the first reference
+ * quicker.
+ *
+ * Insert (cold,test)/(cold) so the following activation will
+ * elevate the state to (hot)/(cold,test). (NOTE: the activation
+ * will take care of the cold target increment).
+ */
+ if (!found)
+ ClearPageTest(page);
+ ++zone->policy.nr_cold;
+ hand = HAND_COLD;
+ } else {
+ /*
+ * Insert (hot) when found in the nonresident list, otherwise
+ * insert as (cold,test). Insert at the head of the Hhot list,
+ * ie. right behind Hcold.
+ */
+ if (found) {
+ SetPageHot(page);
+ __cold_target_inc(zone, 1);
+ hand = HAND_COLD;
+ } else {
+ SetPageTest(page);
+ ++zone->policy.nr_cold;
+ }
+ }
+ ++zone->policy.nr_resident;
+ list_add(&page->lru, &zone->policy.list_hand[hand]);
+
+ BUG_ON(!PageLRU(page));
+}
+
+void fastcall pgrep_add(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(clockpro_add_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ __pagevec_pgrep_add(pvec);
+ put_cpu_var(clockpro_add_pvecs);
+}
+
+void __pgrep_add_drain(unsigned int cpu)
+{
+ struct pagevec *pvec = &per_cpu(clockpro_add_pvecs, cpu);
+
+ if (pagevec_count(pvec))
+ __pagevec_pgrep_add(pvec);
+}
+
+/*
+ * Add page to a release pagevec, temp. drop zone lock to release pagevec if full.
+ * Set PG_lru, update zone->policy.nr_cold and zone->policy.nr_resident.
+ *
+ * @zone: @pages zone.
+ * @page: page to be released.
+ * @pvec: pagevec to collect pages in.
+ */
+static void __page_release(struct zone *zone, struct page *page,
+ struct pagevec *pvec)
+{
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ if (!PageHot(page))
+ ++zone->policy.nr_cold;
+ ++zone->policy.nr_resident;
+
+ if (!pagevec_add(pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ if (buffer_heads_over_limit)
+ pagevec_strip(pvec);
+ __pagevec_release(pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+}
+
+void pgrep_reinsert(struct list_head *page_list)
+{
+ struct page *page, *page2;
+ struct zone *zone = NULL;
+ struct pagevec pvec;
+
+ pagevec_init(&pvec, 1);
+ list_for_each_entry_safe(page, page2, page_list, lru) {
+ struct list_head *list;
+ struct zone *pagezone = page_zone(page);
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ if (PageHot(page))
+ list = &zone->policy.list_hand[HAND_COLD];
+ else
+ list = &zone->policy.list_hand[HAND_HOT];
+ list_move(&page->lru, list);
+ __page_release(zone, page, &pvec);
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+/*
+ * Try to reclaim a specified number of pages.
+ *
+ * Reclaim candidates have:
+ * - PG_lru cleared
+ * - 1 extra ref
+ *
+ * NOTE: hot pages are also returned but will be spit back by try_pageout();
+ * this preserves CLOCK order.
+ *
+ * @zone: target zone to reclaim pages from.
+ * @nr_to_scan: nr of pages to try for reclaim.
+ * @page_list: list to put the pages on.
+ * @nr_scanned: number of pages scanned.
+ */
+void __pgrep_get_candidates(struct zone *zone, int priority,
+ unsigned long nr_to_scan, struct list_head *page_list,
+ unsigned long *nr_scanned)
+{
+ unsigned long nr_scan, nr_total_scan = 0;
+ unsigned long nr_cold_prio;
+ int nr_taken;
+
+ do {
+ __select_list_hand(zone, &zone->policy.list_hand[HAND_COLD]);
+ nr_taken = isolate_lru_pages(zone, nr_to_scan,
+ &zone->policy.list_hand[HAND_COLD],
+ page_list, &nr_scan);
+ nr_to_scan -= nr_scan;
+ nr_total_scan += nr_scan;
+ } while (nr_to_scan > 0 && nr_taken);
+
+ *nr_scanned = nr_total_scan;
+
+ /*
+ * Artificially increase the cold target as reclaim pressure rises
+ * (ie. the scan priority value drops) so we have enough pages to reclaim.
+ */
+ if (priority <= DEF_PRIORITY/2) {
+ nr_cold_prio =
+ (zone->policy.nr_resident - zone->policy.nr_cold) >>
+ priority;
+ __cold_target_inc(zone, nr_cold_prio);
+ }
+
+}
+
+static void rotate_hot(struct zone *, int, int, struct pagevec *);
+
+/*
+ * Reinsert those candidate pages that were not freed in shrink_list().
+ * Account pages that were promoted to hot by pgrep_activate().
+ * Rotate hand hot to balance the new hot and lost cold pages vs.
+ * the cold pages target.
+ *
+ * Candidate pages have:
+ * - PG_lru cleared
+ * - 1 extra ref
+ * undo that.
+ *
+ * @zone: zone we're working on.
+ * @page_list: the left over pages.
+ * @nr_freed: number of pages freed by shrink_list()
+ */
+void pgrep_put_candidates(struct zone *zone, struct list_head *page_list,
+ unsigned long nr_freed, int may_swap)
+{
+ struct pagevec pvec;
+ unsigned long dct = 0;
+
+ pagevec_init(&pvec, 1);
+ spin_lock_irq(&zone->lru_lock);
+ while (!list_empty(page_list)) {
+ int hand = HAND_HOT;
+ struct page *page = lru_to_page(page_list);
+ prefetchw_prev_lru_page(page, page_list, flags);
+
+ if (PageHot(page) && PageTest(page)) {
+ ClearPageTest(page);
+ ++dct;
+ hand = HAND_COLD; /* relocate promoted pages */
+ }
+
+ list_move(&page->lru, &zone->policy.list_hand[hand]);
+ __page_release(zone, page, &pvec);
+ }
+ __cold_target_inc(zone, dct);
+ spin_unlock_irq(&zone->lru_lock);
+
+ /*
+ * Limit the hot hand to half a revolution.
+ */
+ if (zone->policy.nr_cold < zone->policy.nr_cold_target) {
+ int i, nr = 1 + (zone->policy.nr_resident / (2*SWAP_CLUSTER_MAX));
+ int reclaim_mapped = 0; /* should_reclaim_mapped(zone); */
+ for (i = 0; zone->policy.nr_cold < zone->policy.nr_cold_target &&
+ i < nr; ++i)
+ rotate_hot(zone, SWAP_CLUSTER_MAX, reclaim_mapped, &pvec);
+ }
+
+ pagevec_release(&pvec);
+}
+
+/*
+ * Puts cold pages that have their test bit set on the non-resident lists.
+ *
+ * @zone: dead pages zone.
+ * @page: dead page.
+ */
+void pgrep_remember(struct zone *zone, struct page *page)
+{
+ if (PageTest(page) &&
+ nonres_put(page_mapping(page), page_index(page)))
+ __cold_target_dec(zone, 1);
+}
+
+void pgrep_forget(struct address_space *mapping, unsigned long index)
+{
+ nonres_get(mapping, index);
+}
+
+static unsigned long estimate_pageable_memory(void)
+{
+#if 0
+ static unsigned long next_check;
+ static unsigned long total = 0;
+
+ if (!total || time_after(jiffies, next_check)) {
+ struct zone *z;
+ total = 0;
+ for_each_zone(z)
+ total += z->nr_resident;
+ next_check = jiffies + HZ/10;
+ }
+
+ // gave 0 first time, SIGFPE in kernel sucks
+ // hence the !total
+#else
+ unsigned long total = 0;
+ struct zone *z;
+ for_each_zone(z)
+ total += z->policy.nr_resident;
+#endif
+ return total;
+}
+
+/*
+ * Rotate the non-resident hand; scale the rotation speed so that when all
+ * hot hands have made one full revolution the non-resident hand will have
+ * too.
+ *
+ * @zone: current zone
+ * @dh: number of pages the hot hand has moved
+ */
+static void __nonres_term(struct zone *zone, unsigned long dh)
+{
+ unsigned long long cycles;
+ unsigned long nr_count = nonres_count();
+
+ /*
+ *         |n1| Rhot     |N| Rhot
+ * Nhot = ----------- ~ ----------
+ *            |r1|          |R|
+ *
+ * NOTE depends on |N|, hence use the nonresident_forget() hook.
+ */
+ cycles = zone->policy.nr_nonresident_scale + 1ULL * dh * nr_count;
+ zone->policy.nr_nonresident_scale =
+ do_div(cycles, estimate_pageable_memory() + 1UL);
+ nonres_rotate(cycles);
+ __cold_target_dec(zone, cycles);
+}
+
+/*
+ * Rotate hand hot;
+ *
+ * @zone: current zone
+ * @nr_to_scan: batch quanta
+ * @reclaim_mapped: whether to demote mapped pages too
+ * @pvec: release pagevec
+ */
+static void rotate_hot(struct zone *zone, int nr_to_scan, int reclaim_mapped,
+ struct pagevec *pvec)
+{
+ LIST_HEAD(l_hold);
+ LIST_HEAD(l_tmp);
+ unsigned long dh = 0, dct = 0;
+ unsigned long pgscanned;
+ int pgdeactivate = 0;
+ int nr_taken;
+
+ spin_lock_irq(&zone->lru_lock);
+ __select_list_hand(zone, &zone->policy.list_hand[HAND_HOT]);
+ nr_taken = isolate_lru_pages(zone, nr_to_scan,
+ &zone->policy.list_hand[HAND_HOT],
+ &l_hold, &pgscanned);
+ spin_unlock_irq(&zone->lru_lock);
+
+ while (!list_empty(&l_hold)) {
+ struct page *page = lru_to_page(&l_hold);
+ prefetchw_prev_lru_page(page, &l_hold, flags);
+
+ if (PageHot(page)) {
+ BUG_ON(PageTest(page));
+
+ /*
+ * Ignore the swap token; this is not actual reclaim
+ * and it will give a better reflection of the actual
+ * hotness of pages.
+ *
+ * XXX do something with this reclaim_mapped stuff.
+ */
+ if (/*(((reclaim_mapped && mapped) || !mapped) ||
+ (total_swap_pages == 0 && PageAnon(page))) && */
+ !page_referenced(page, 0, 1)) {
+ SetPageTest(page);
+ ++pgdeactivate;
+ }
+
+ ++dh;
+ } else {
+ if (PageTest(page)) {
+ ClearPageTest(page);
+ ++dct;
+ }
+ }
+ list_move(&page->lru, &l_tmp);
+
+ cond_resched();
+ }
+
+ spin_lock_irq(&zone->lru_lock);
+ while (!list_empty(&l_tmp)) {
+ int hand = HAND_COLD;
+ struct page *page = lru_to_page(&l_tmp);
+ prefetchw_prev_lru_page(page, &l_tmp, flags);
+
+ if (PageHot(page) && PageTest(page)) {
+ ClearPageHot(page);
+ ClearPageTest(page);
+ hand = HAND_HOT; /* relocate demoted page */
+ }
+
+ list_move(&page->lru, &zone->policy.list_hand[hand]);
+ __page_release(zone, page, pvec);
+ }
+ __nonres_term(zone, nr_taken);
+ __cold_target_dec(zone, dct);
+ spin_unlock(&zone->lru_lock);
+
+ __mod_page_state_zone(zone, pgrefill, pgscanned);
+ __mod_page_state(pgdeactivate, pgdeactivate);
+
+ local_irq_enable();
+}
+
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
+void pgrep_show(struct zone *zone)
+{
+ printk("%s"
+ " free:%lukB"
+ " min:%lukB"
+ " low:%lukB"
+ " high:%lukB"
+ " resident:%lukB"
+ " cold:%lukB"
+ " present:%lukB"
+ " pages_scanned:%lu"
+ " all_unreclaimable? %s"
+ "\n",
+ zone->name,
+ K(zone->free_pages),
+ K(zone->pages_min),
+ K(zone->pages_low),
+ K(zone->pages_high),
+ K(zone->policy.nr_resident),
+ K(zone->policy.nr_cold),
+ K(zone->present_pages),
+ zone->pages_scanned,
+ (zone->all_unreclaimable ? "yes" : "no")
+ );
+}
+
+void pgrep_zoneinfo(struct zone *zone, struct seq_file *m)
+{
+ seq_printf(m,
+ "\n pages free %lu"
+ "\n min %lu"
+ "\n low %lu"
+ "\n high %lu"
+ "\n resident %lu"
+ "\n cold %lu"
+ "\n cold_tar %lu"
+ "\n nr_count %lu"
+ "\n scanned %lu"
+ "\n spanned %lu"
+ "\n present %lu",
+ zone->free_pages,
+ zone->pages_min,
+ zone->pages_low,
+ zone->pages_high,
+ zone->policy.nr_resident,
+ zone->policy.nr_cold,
+ zone->policy.nr_cold_target,
+ nonres_count(),
+ zone->pages_scanned,
+ zone->spanned_pages,
+ zone->present_pages);
+}
+
+void __pgrep_counts(unsigned long *active, unsigned long *inactive,
+ unsigned long *free, struct zone *zones)
+{
+ int i;
+
+ *active = 0;
+ *inactive = 0;
+ *free = 0;
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ *active += zones[i].policy.nr_resident - zones[i].policy.nr_cold;
+ *inactive += zones[i].policy.nr_cold;
+ *free += zones[i].free_pages;
+ }
+}
Index: linux-2.6/include/linux/mm_clockpro_data.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_clockpro_data.h 2006-07-12 16:09:19.000000000 +0200
@@ -0,0 +1,21 @@
+#ifndef _LINUX_CLOCKPRO_DATA_H_
+#define _LINUX_CLOCKPRO_DATA_H_
+
+#ifdef __KERNEL__
+
+enum {
+ HAND_HOT = 0,
+ HAND_COLD = 1
+};
+
+struct pgrep_data {
+ struct list_head list_hand[2];
+ unsigned long nr_scan;
+ unsigned long nr_resident;
+ unsigned long nr_cold;
+ unsigned long nr_cold_target;
+ unsigned long nr_nonresident_scale;
+};
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_CLOCKPRO_DATA_H_ */
Index: linux-2.6/include/linux/mm_clockpro_policy.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_clockpro_policy.h 2006-07-12 16:09:19.000000000 +0200
@@ -0,0 +1,139 @@
+#ifndef _LINUX_MM_CLOCKPRO_POLICY_H
+#define _LINUX_MM_CLOCKPRO_POLICY_H
+
+#ifdef __KERNEL__
+
+#include <linux/rmap.h>
+#include <linux/page-flags.h>
+
+#define PG_hot PG_reclaim1
+#define PG_test PG_reclaim2
+
+#define PageHot(page) test_bit(PG_hot, &(page)->flags)
+#define SetPageHot(page) set_bit(PG_hot, &(page)->flags)
+#define ClearPageHot(page) clear_bit(PG_hot, &(page)->flags)
+#define TestClearPageHot(page) test_and_clear_bit(PG_hot, &(page)->flags)
+#define TestSetPageHot(page) test_and_set_bit(PG_hot, &(page)->flags)
+
+#define PageTest(page) test_bit(PG_test, &(page)->flags)
+#define SetPageTest(page) set_bit(PG_test, &(page)->flags)
+#define ClearPageTest(page) clear_bit(PG_test, &(page)->flags)
+#define TestClearPageTest(page) test_and_clear_bit(PG_test, &(page)->flags)
+
+static inline void pgrep_hint_active(struct page *page)
+{
+}
+
+static inline void pgrep_hint_use_once(struct page *page)
+{
+ if (PageLRU(page))
+ BUG();
+ if (PageHot(page))
+ BUG();
+ SetPageTest(page);
+}
+
+extern void __pgrep_add(struct zone *, struct page *);
+
+/*
+ * Activate a cold page:
+ * cold, !test -> cold, test
+ * cold, test -> hot
+ *
+ * @page: page to activate
+ */
+static inline int fastcall pgrep_activate(struct page *page)
+{
+ int hot, test;
+
+ hot = PageHot(page);
+ test = PageTest(page);
+
+ if (hot) {
+ BUG_ON(test);
+ } else {
+ if (test) {
+ SetPageHot(page);
+ /*
+ * Leave PG_test set for new hot pages in order to
+ * recognise them in put_candidates() and do accounting.
+ */
+ return 1;
+ } else {
+ SetPageTest(page);
+ }
+ }
+
+ return 0;
+}
+
+static inline void pgrep_copy_state(struct page *dpage, struct page *spage)
+{
+ if (PageHot(spage))
+ SetPageHot(dpage);
+ if (PageTest(spage))
+ SetPageTest(dpage);
+}
+
+static inline void pgrep_clear_state(struct page *page)
+{
+ if (PageHot(page))
+ ClearPageHot(page);
+ if (PageTest(page))
+ ClearPageTest(page);
+}
+
+static inline int pgrep_is_active(struct page *page)
+{
+ return PageHot(page);
+}
+
+static inline void __pgrep_remove(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ --zone->policy.nr_resident;
+ if (!PageHot(page))
+ --zone->policy.nr_cold;
+}
+
+static inline reclaim_t pgrep_reclaimable(struct page *page)
+{
+ if (PageHot(page))
+ return RECLAIM_KEEP;
+
+ if (page_referenced(page, 1, 0))
+ return RECLAIM_ACTIVATE;
+
+ return RECLAIM_OK;
+}
+
+static inline void __pgrep_rotate_reclaimable(struct zone *zone, struct page *page)
+{
+ if (PageLRU(page) && !PageHot(page)) {
+ list_move_tail(&page->lru, &zone->policy.list_hand[HAND_COLD]);
+ inc_page_state(pgrotated);
+ }
+}
+
+static inline void pgrep_mark_accessed(struct page *page)
+{
+ SetPageReferenced(page);
+}
+
+#define MM_POLICY_HAS_NONRESIDENT
+
+extern void pgrep_remember(struct zone *, struct page *);
+extern void pgrep_forget(struct address_space *, unsigned long);
+
+static inline unsigned long __pgrep_nr_pages(struct zone *zone)
+{
+ return zone->policy.nr_resident;
+}
+
+static inline unsigned long __pgrep_nr_scan(struct zone *zone)
+{
+ return zone->policy.nr_resident;
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_CLOCKPRO_POLICY_H */
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:25.000000000 +0200
@@ -98,6 +98,8 @@ extern void __pgrep_counts(unsigned long
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_policy.h>
+#elif defined(CONFIG_MM_POLICY_CLOCKPRO)
+#include <linux/mm_clockpro_policy.h>
#else
#error no mm policy
#endif
Index: linux-2.6/include/linux/mm_page_replace_data.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace_data.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace_data.h 2006-07-12 16:11:25.000000000 +0200
@@ -5,6 +5,8 @@
#ifdef CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_data.h>
+#elif defined(CONFIG_MM_POLICY_CLOCKPRO)
+#include <linux/mm_clockpro_data.h>
#else
#error no mm policy
#endif
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/Kconfig 2006-07-12 16:11:25.000000000 +0200
@@ -142,6 +142,11 @@ config MM_POLICY_USEONCE
help
This option selects the standard multi-queue LRU policy.
+config MM_POLICY_CLOCKPRO
+ bool "CLOCK-Pro"
+ help
+ This option selects a CLOCK-Pro based policy
+
endchoice
#
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2006-07-12 16:08:18.000000000 +0200
+++ linux-2.6/mm/Makefile 2006-07-12 16:11:25.000000000 +0200
@@ -13,6 +13,7 @@ obj-y := bootmem.o filemap.o mempool.o
prio_tree.o util.o mmzone.o $(mmu-y)
obj-$(CONFIG_MM_POLICY_USEONCE) += useonce.o
+obj-$(CONFIG_MM_POLICY_CLOCKPRO) += nonresident.o clockpro.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 31/39] mm: cart: nonresident page tracking for CART
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (29 preceding siblings ...)
2006-07-12 14:42 ` [PATCH 30/39] mm: clockpro: CLOCK-Pro policy implementation Peter Zijlstra
@ 2006-07-12 14:43 ` Peter Zijlstra
2006-07-12 14:43 ` [PATCH 32/39] mm: cart: third per policy PG_flag Peter Zijlstra
` (8 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:43 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Nonresident code for ARC based algorithms that require balancing of multiple
lists of non-resident pages.
Based on the CLOCK-Pro nonresident code by Rik van Riel.
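Each bucket slot packs three fields into a u32: a 2-bit list id, a 6-bit ring index and a 24-bit hash cookie. A stand-alone sketch of the encoding (user-space rendering for illustration only; the layout matches the BUILD_MASK/SET_*/GET_* macros in the patch):
	#include <stdint.h>
	#include <stdio.h>
	#define LISTID_SHIFT	30			/* 32 - 2 */
	#define INDEX_SHIFT	24			/* 30 - 6 */
	#define COOKIE_MASK	((1u << 24) - 1)
	int main(void)
	{
		uint32_t slot = 0;
		slot |= 1u << LISTID_SHIFT;		/* list NR_b2 */
		slot |= 5u << INDEX_SHIFT;		/* next slot in the ring */
		slot |= 0xabcdefu & COOKIE_MASK;	/* hashed mapping+index */
		printf("listid=%u index=%u cookie=%06x\n",
		       slot >> LISTID_SHIFT,
		       (slot >> INDEX_SHIFT) & 63,
		       slot & COOKIE_MASK);
		return 0;
	}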
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/nonresident-cart.h | 34 +++
mm/nonresident-cart.c | 362 +++++++++++++++++++++++++++++++++++++++
2 files changed, 396 insertions(+)
Index: linux-2.6/mm/nonresident-cart.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/nonresident-cart.c 2006-07-12 16:11:22.000000000 +0200
@@ -0,0 +1,362 @@
+/*
+ * mm/nonresident-cart.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ * Adapted by Peter Zijlstra <a.p.zijlstra@chello.nl> for use by ARC
+ * like algorithms.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling nonresident_put(mapping/mm, index/vaddr)
+ * and can look it up in the cache by calling nonresident_get()
+ * with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after eg. truncate
+ * or exit; we let the pages fall out of the non-resident set through
+ * normal replacement.
+ *
+ *
+ * Modified to work with ARC-like algorithms that:
+ * - need to balance two FIFOs; |b1| + |b2| = c,
+ *
+ * The bucket contains four single linked cyclic lists (CLOCKS) and each
+ * clock has a tail hand. By selecting a victim clock upon insertion it
+ * is possible to balance them.
+ *
+ * The first two lists are used for B1/B2 and a third for a free slot list.
+ * The fourth list is unused.
+ *
+ * The slot looks like this:
+ * struct slot_t {
+ * u32 cookie : 24; // LSB
+ * u32 index : 6;
+ * u32 listid : 2;
+ * };
+ *
+ * The bucket is guarded by a spinlock.
+ */
+#include <linux/swap.h>
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+#include <linux/nonresident-cart.h>
+
+#define TARGET_SLOTS 64
+#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES)
+#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 4*sizeof(u8)) / sizeof(u32))
+#if 0
+#if NR_SLOTS < (TARGET_SLOTS / 2)
+#warning very small slot size
+#if NR_SLOTS <= 0
+#error no room for slots left
+#endif
+#endif
+#endif
+
+#define BUILD_MASK(bits, shift) (((1 << (bits)) - 1) << (shift))
+
+#define LISTID_BITS 2
+#define LISTID_SHIFT (sizeof(u32)*8 - LISTID_BITS)
+#define LISTID_MASK BUILD_MASK(LISTID_BITS, LISTID_SHIFT)
+
+#define SET_LISTID(x, flg) ((x) = ((x) & ~LISTID_MASK) | ((flg) << LISTID_SHIFT))
+#define GET_LISTID(x) (((x) & LISTID_MASK) >> LISTID_SHIFT)
+
+#define INDEX_BITS 6 /* ceil(log2(NR_SLOTS)) */
+#define INDEX_SHIFT (LISTID_SHIFT - INDEX_BITS)
+#define INDEX_MASK BUILD_MASK(INDEX_BITS, INDEX_SHIFT)
+
+#define SET_INDEX(x, idx) ((x) = ((x) & ~INDEX_MASK) | ((idx) << INDEX_SHIFT))
+#define GET_INDEX(x) (((x) & INDEX_MASK) >> INDEX_SHIFT)
+
+#define COOKIE_MASK BUILD_MASK(sizeof(u32)*8 - LISTID_BITS - INDEX_BITS, 0)
+
+struct nr_bucket
+{
+ spinlock_t lock;
+ u8 hand[4];
+ u32 slot[NR_SLOTS];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+/* hash the address into a bucket */
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+ unsigned long bucket;
+ unsigned long hash;
+
+ hash = (unsigned long)mapping + 37 * index;
+ bucket = hash_long(hash, nonres_shift);
+
+ return nonres_table + bucket;
+}
+
+/* hash the address and inode into a cookie */
+static u32 nr_cookie(struct address_space * mapping, unsigned long index)
+{
+ unsigned long hash;
+
+ hash = 37 * (unsigned long)mapping + index;
+
+ if (mapping && mapping->host)
+ hash = 37 * hash + mapping->host->i_ino;
+
+ return hash_long(hash, sizeof(u32)*8 - LISTID_BITS - INDEX_BITS);
+}
+
+DEFINE_PER_CPU(unsigned long[4], nonres_count);
+
+/*
+ * remove current (b from 'abc'):
+ *
+ * initial swap(2,3)
+ *
+ * 1: -> [2],a 1: -> [2],a
+ * * 2: -> [3],b 2: -> [1],c
+ * 3: -> [1],c * 3: -> [3],b
+ *
+ * 3 is now free for use.
+ *
+ * @nr_bucket: bucket to operate in
+ * @listid: list that the deletee belongs to
+ * @pos: slot position of deletee
+ * @slot: possible pointer to slot
+ *
+ * returns pointer to removed slot, NULL when list empty.
+ */
+static u32 * __nonresident_del(struct nr_bucket *nr_bucket, int listid, u8 pos, u32 *slot)
+{
+ int next_pos;
+ u32 *next;
+
+ if (slot == NULL) {
+ slot = &nr_bucket->slot[pos];
+ if (GET_LISTID(*slot) != listid)
+ return NULL;
+ }
+
+ --__get_cpu_var(nonres_count[listid]);
+
+ next_pos = GET_INDEX(*slot);
+ if (pos == next_pos) {
+ next = slot;
+ goto out;
+ }
+
+ next = &nr_bucket->slot[next_pos];
+ *next = xchg(slot, *next);
+
+ if (next_pos == nr_bucket->hand[listid])
+ nr_bucket->hand[listid] = pos;
+out:
+ BUG_ON(GET_INDEX(*next) != next_pos);
+ return next;
+}
+
+static inline u32 * __nonresident_pop(struct nr_bucket *nr_bucket, int listid)
+{
+ return __nonresident_del(nr_bucket, listid, nr_bucket->hand[listid], NULL);
+}
+
+/*
+ * insert before (d before b in 'abc')
+ *
+ * initial set 4 swap(2,4)
+ *
+ * 1: -> [2],a 1: -> [2],a 1: -> [2],a
+ * * 2: -> [3],b 2: -> [3],b 2: -> [4],d
+ * 3: -> [1],c 3: -> [1],c 3: -> [1],c
+ * 4: -> [4],nil 4: -> [4],d * 4: -> [3],b
+ *
+ * leaving us with 'adbc'.
+ *
+ * @nr_bucket: bucket to operator in
+ * @listid: list to insert into
+ * @pos: position to insert before
+ * @slot: slot to insert
+ */
+static void __nonresident_insert(struct nr_bucket *nr_bucket, int listid, u8 *pos, u32 *slot)
+{
+ u32 *head;
+
+ SET_LISTID(*slot, listid);
+
+ head = &nr_bucket->slot[*pos];
+
+ *pos = GET_INDEX(*slot);
+ if (GET_LISTID(*head) == listid)
+ *slot = xchg(head, *slot);
+
+ ++__get_cpu_var(nonres_count[listid]);
+}
+
+static inline void __nonresident_push(struct nr_bucket *nr_bucket, int listid, u32 *slot)
+{
+ __nonresident_insert(nr_bucket, listid, &nr_bucket->hand[listid], slot);
+}
+
+/*
+ * Remembers a page by putting a hash-cookie on the @listid list.
+ *
+ * @mapping: page_mapping()
+ * @index: page_index()
+ * @listid: list to put the page on (NR_b1, NR_b2 and NR_free).
+ * @listid_evict: list to get a free page from when NR_free is empty.
+ *
+ * returns the list an empty page was taken from.
+ */
+int nonresident_put(struct address_space * mapping, unsigned long index, int listid, int listid_evict)
+{
+ struct nr_bucket *nr_bucket;
+ u32 cookie;
+ unsigned long flags;
+ u32 *slot;
+ int evict = NR_free;
+
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetchw_range(nr_bucket, NR_CACHELINES);
+ cookie = nr_cookie(mapping, index);
+
+ spin_lock_irqsave(&nr_bucket->lock, flags);
+ slot = __nonresident_pop(nr_bucket, evict);
+ if (!slot) {
+ evict = listid_evict;
+ slot = __nonresident_pop(nr_bucket, evict);
+ if (!slot) {
+ evict ^= 1;
+ slot = __nonresident_pop(nr_bucket, evict);
+ }
+ }
+ BUG_ON(!slot);
+ SET_INDEX(cookie, GET_INDEX(*slot));
+ cookie = xchg(slot, cookie);
+ __nonresident_push(nr_bucket, listid, slot);
+ spin_unlock_irqrestore(&nr_bucket->lock, flags);
+
+ return evict;
+}
+
+/*
+ * Searches a page on the first two lists, and places it on the free list.
+ *
+ * @mapping: page_mapping()
+ * @index: page_index()
+ *
+ * returns listid of the list the item was found on with NR_found set if found.
+ */
+int nonresident_get(struct address_space * mapping, unsigned long index)
+{
+ struct nr_bucket * nr_bucket;
+ u32 wanted;
+ int j;
+ u8 i;
+ unsigned long flags;
+ int ret = 0;
+
+ if (mapping)
+ prefetch(mapping->host);
+ nr_bucket = nr_hash(mapping, index);
+
+ spin_lock_prefetch(nr_bucket); // prefetch_range(nr_bucket, NR_CACHELINES);
+ wanted = nr_cookie(mapping, index) & COOKIE_MASK;
+
+ spin_lock_irqsave(&nr_bucket->lock, flags);
+ for (i = 0; i < 2; ++i) {
+ j = nr_bucket->hand[i];
+ do {
+ u32 *slot = &nr_bucket->slot[j];
+ if (GET_LISTID(*slot) != i)
+ break;
+
+ if ((*slot & COOKIE_MASK) == wanted) {
+ slot = __nonresident_del(nr_bucket, i, j, slot);
+ __nonresident_push(nr_bucket, NR_free, slot);
+ ret = i | NR_found;
+ goto out;
+ }
+
+ j = GET_INDEX(*slot);
+ } while (j != nr_bucket->hand[i]);
+ }
+out:
+ spin_unlock_irqrestore(&nr_bucket->lock, flags);
+
+ return ret;
+}
+
+unsigned int nonresident_total(void)
+{
+ return (1 << nonres_shift) * NR_SLOTS;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init nonresident_init(void)
+{
+ int target;
+ int i, j;
+
+ /*
+ * Calculate the non-resident hash bucket target. Use a power of
+ * two for the division because alloc_large_system_hash rounds up.
+ */
+ target = nr_all_pages * nonresident_factor;
+ target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+ nonres_table = alloc_large_system_hash("Non-resident page tracking",
+ sizeof(struct nr_bucket),
+ target,
+ 0,
+ HASH_EARLY | HASH_HIGHMEM,
+ &nonres_shift,
+ &nonres_mask,
+ 0);
+
+ for (i = 0; i < (1 << nonres_shift); i++) {
+ spin_lock_init(&nonres_table[i].lock);
+ for (j = 0; j < 4; ++j)
+ nonres_table[i].hand[j] = 0;
+
+ for (j = 0; j < NR_SLOTS; ++j) {
+ nonres_table[i].slot[j] = 0;
+ SET_LISTID(nonres_table[i].slot[j], NR_free);
+ if (j < NR_SLOTS - 1)
+ SET_INDEX(nonres_table[i].slot[j], j+1);
+ else /* j == NR_SLOTS - 1 */
+ SET_INDEX(nonres_table[i].slot[j], 0);
+ }
+ }
+
+ for_each_cpu(i) {
+ for (j=0; j<4; ++j)
+ per_cpu(nonres_count[j], i) = 0;
+ }
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+ if (!str)
+ return 0;
+ nonresident_factor = simple_strtoul(str, &str, 0);
+ return 1;
+}
+
+__setup("nonresident_factor=", set_nonresident_factor);
Index: linux-2.6/include/linux/nonresident-cart.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/nonresident-cart.h 2006-07-12 16:11:22.000000000 +0200
@@ -0,0 +1,34 @@
+#ifndef _LINUX_NONRESIDENT_CART_H_
+#define _LINUX_NONRESIDENT_CART_H_
+
+#ifdef __KERNEL__
+
+#include <linux/fs.h>
+#include <linux/preempt.h>
+#include <linux/percpu.h>
+
+#define NR_b1 0
+#define NR_b2 1
+#define NR_free 2
+
+#define NR_listid 3
+#define NR_found 0x80000000
+
+extern int nonresident_put(struct address_space *, unsigned long, int, int);
+extern int nonresident_get(struct address_space *, unsigned long);
+extern unsigned int nonresident_total(void);
+extern void nonresident_init(void);
+
+DECLARE_PER_CPU(unsigned long[4], nonres_count);
+
+static inline unsigned long nonresident_count(int listid)
+{
+ unsigned long count;
+ preempt_disable();
+ count = __sum_cpu_var(unsigned long, nonres_count[listid]);
+ preempt_enable();
+ return count;
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_NONRESIDENT_CART_H_ */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 32/39] mm: cart: third per policy PG_flag
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (30 preceding siblings ...)
2006-07-12 14:43 ` [PATCH 31/39] mm: cart: nonresident page tracking for CART Peter Zijlstra
@ 2006-07-12 14:43 ` Peter Zijlstra
2006-07-12 14:43 ` [PATCH 33/39] mm: cart: CART policy implementation Peter Zijlstra
` (7 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:43 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Add a third PG_flag to the page reclaim framework.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/page-flags.h | 1 +
mm/hugetlb.c | 3 ++-
mm/page_alloc.c | 3 +++
3 files changed, 6 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/page-flags.h 2006-07-12 16:09:19.000000000 +0200
@@ -90,6 +90,7 @@
#define PG_uncached 20 /* Page has been mapped as uncached */
#define PG_reclaim2 21 /* reserved by the mm reclaim code */
+#define PG_reclaim3 22 /* reserved by the mm reclaim code */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/page_alloc.c 2006-07-12 16:09:19.000000000 +0200
@@ -151,6 +151,7 @@ static void bad_page(struct page *page)
1 << PG_locked |
1 << PG_reclaim1 |
1 << PG_reclaim2 |
+ 1 << PG_reclaim3 |
1 << PG_dirty |
1 << PG_reclaim |
1 << PG_slab |
@@ -382,6 +383,7 @@ static inline int free_pages_check(struc
1 << PG_locked |
1 << PG_reclaim1 |
1 << PG_reclaim2 |
+ 1 << PG_reclaim3 |
1 << PG_reclaim |
1 << PG_slab |
1 << PG_swapcache |
@@ -531,6 +533,7 @@ static int prep_new_page(struct page *pa
1 << PG_locked |
1 << PG_reclaim1 |
1 << PG_reclaim2 |
+ 1 << PG_reclaim3 |
1 << PG_dirty |
1 << PG_reclaim |
1 << PG_slab |
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/hugetlb.c 2006-07-12 16:09:19.000000000 +0200
@@ -292,7 +292,8 @@ static void update_and_free_page(struct
for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
1 << PG_dirty | 1 << PG_reclaim1 | 1 << PG_reclaim2 |
- 1 << PG_reserved | 1 << PG_private | 1<< PG_writeback);
+ 1 << PG_reclaim3 | 1 << PG_reserved | 1 << PG_private |
+ 1<< PG_writeback);
}
page[1].lru.next = NULL;
set_page_refcounted(page);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 33/39] mm: cart: CART policy implementation
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (31 preceding siblings ...)
2006-07-12 14:43 ` [PATCH 32/39] mm: cart: third per policy PG_flag Peter Zijlstra
@ 2006-07-12 14:43 ` Peter Zijlstra
2006-07-12 14:43 ` [PATCH 34/39] mm: cart: CART-r " Peter Zijlstra
` (6 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:43 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
This patch contains a Page Replacement Algorithm based on CART
Please refer to the CART paper here -
http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf
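The adaptive target p (zone->policy.nr_p below) moves in steps scaled by the short-term/B1 respectively long-term/B2 history ratios. A minimal sketch of the two update rules from __cart_p_inc()/__cart_p_dec(), with the zone plumbing elided (ns/nl are the short/long-term resident counts, b1/b2 the scaled history list sizes, c the cache size):
	/* B1 hit: p = min(p + max(1, ns/|B1|), c) */
	step = (ns / (b1 + 1)) ?: 1UL;
	p = (p + step > c) ? c : p + step;
	/* B2 hit: p = max(p - max(1, nl/|B2|), 0) */
	step = (nl / (b2 + 1)) ?: 1UL;
	p = (p >= step) ? p - step : 0;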
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_cart_data.h | 31 +
include/linux/mm_cart_policy.h | 132 ++++++++
include/linux/mm_page_replace.h | 6
include/linux/mm_page_replace_data.h | 6
mm/Kconfig | 5
mm/Makefile | 1
mm/cart.c | 555 +++++++++++++++++++++++++++++++++++
7 files changed, 732 insertions(+), 4 deletions(-)
Index: linux-2.6/mm/cart.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/cart.c 2006-07-12 16:11:24.000000000 +0200
@@ -0,0 +1,555 @@
+/*
+ * mm/cart.c
+ *
+ * Written by Peter Zijlstra <a.p.zijlstra@chello.nl>
+ * Released under the GPLv2, see the file COPYING for details.
+ *
+ * This file contains a Page Replacement Algorithm based on CART
+ * Please refer to the CART paper here -
+ * http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf
+ *
+ * T1 -> active_list |T1| -> nr_active
+ * T2 -> inactive_list |T2| -> nr_inactive
+ * filter bit -> PG_longterm
+ *
+ * The algorithm was adapted to work for Linux, which poses the following
+ * extra constraints:
+ * - multiple memory zones,
+ * - fault before reference,
+ * - expensive reference check.
+ *
+ * The multiple memory zones are handled by decoupling the T lists from the
+ * B lists, keeping T lists per zone while having global B lists. See
+ * mm/nonresident-cart.c for the B list implementation. List sizes are scaled on
+ * comparison.
+ *
+ * The paper seems to assume we insert after/on the first reference, we
+ * actually insert before the first reference. In order to give 'S' pages
+ * a chance we will not mark them 'L' on their first cycle (PG_new).
+ *
+ * Also for efficiency's sake the replace operation is batched. This is to
+ * avoid holding the much contended zone->lru_lock while calling the
+ * possibly slow page_referenced().
+ *
+ * All functions that are prefixed with '__' assume that zone->lru_lock is taken.
+ */
+
+#include <linux/mm_page_replace.h>
+#include <linux/rmap.h>
+#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/nonresident-cart.h>
+#include <linux/swap.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/writeback.h>
+
+#include <asm/div64.h>
+
+
+static DEFINE_PER_CPU(unsigned long, cart_nr_q);
+
+void __init pgrep_init(void)
+{
+ nonresident_init();
+}
+
+void __init pgrep_init_zone(struct zone *zone)
+{
+ INIT_LIST_HEAD(&zone->policy.list_T1);
+ INIT_LIST_HEAD(&zone->policy.list_T2);
+ zone->policy.nr_T1 = 0;
+ zone->policy.nr_T2 = 0;
+ zone->policy.nr_shortterm = 0;
+ zone->policy.nr_p = 0;
+ zone->policy.flags = 0;
+}
+
+static inline unsigned long cart_c(struct zone *zone)
+{
+ return zone->policy.nr_T1 + zone->policy.nr_T2 + zone->free_pages;
+}
+
+#define scale(x, y, z) ({ unsigned long long tmp = (x); \
+ tmp *= (y); \
+ do_div(tmp, (z)); \
+ (unsigned long)tmp; })
+
+#define B2T(x) scale((x), cart_c(zone), nonresident_total())
+#define T2B(x) scale((x), nonresident_total(), cart_c(zone))
+
+static inline unsigned long cart_longterm(struct zone *zone)
+{
+ return zone->policy.nr_T1 + zone->policy.nr_T2 - zone->policy.nr_shortterm;
+}
+
+static inline unsigned long __cart_q(void)
+{
+ return __sum_cpu_var(unsigned long, cart_nr_q);
+}
+
+static void __cart_q_inc(struct zone *zone, unsigned long dq)
+{
+ /* if (|T2| + |B2| + |T1| - ns >= c) q = min(q + 1, 2c - |T1|) */
+ /* |B2| + nl >= c */
+ if (B2T(nonresident_count(NR_b2)) + cart_longterm(zone) >=
+ cart_c(zone)) {
+ unsigned long target = 2*nonresident_total() - T2B(zone->policy.nr_T1);
+ unsigned long nr_q;
+
+ preempt_disable();
+
+ nr_q = __cart_q();
+ if (nr_q + dq > target)
+ dq = target - nr_q;
+ __get_cpu_var(cart_nr_q) += dq;
+
+ preempt_enable();
+ }
+}
+
+static void __cart_q_dec(struct zone *zone, unsigned long dq)
+{
+ /* q = max(q - 1, c - |T1|) */
+ unsigned long target = nonresident_total() - T2B(zone->policy.nr_T1);
+ unsigned long nr_q;
+
+ preempt_disable();
+
+ nr_q = __cart_q();
+ if (nr_q < target)
+ dq = nr_q - target;
+ else if (nr_q < dq)
+ dq = nr_q;
+ __get_cpu_var(cart_nr_q) -= dq;
+
+ preempt_enable();
+}
+
+static inline unsigned long cart_q(void)
+{
+ unsigned long q;
+ preempt_disable();
+ q = __cart_q();
+ preempt_enable();
+ return q;
+}
+
+static inline void __cart_p_inc(struct zone *zone)
+{
+ /* p = min(p + max(1, ns/|B1|), c) */
+ unsigned long ratio;
+ ratio = (zone->policy.nr_shortterm /
+ (B2T(nonresident_count(NR_b1)) + 1)) ?: 1UL;
+ zone->policy.nr_p += ratio;
+ if (unlikely(zone->policy.nr_p > cart_c(zone)))
+ zone->policy.nr_p = cart_c(zone);
+}
+
+static inline void __cart_p_dec(struct zone *zone)
+{
+ /* p = max(p - max(1, nl/|B2|), 0) */
+ unsigned long ratio;
+ ratio = (cart_longterm(zone) /
+ (B2T(nonresident_count(NR_b2)) + 1)) ?: 1UL;
+ if (zone->policy.nr_p >= ratio)
+ zone->policy.nr_p -= ratio;
+ else
+ zone->policy.nr_p = 0UL;
+}
+
+static unsigned long list_count(struct list_head *list, int PG_flag, int result)
+{
+ unsigned long nr = 0;
+ struct page *page;
+ list_for_each_entry(page, list, lru) {
+ if (!!test_bit(PG_flag, &(page)->flags) == result)
+ ++nr;
+ }
+ return nr;
+}
+
+static void __validate_zone(struct zone *zone)
+{
+#if 0
+ int bug = 0;
+ unsigned long cnt0 = list_count(&zone->policy.list_T1, PG_lru, 0);
+ unsigned long cnt1 = list_count(&zone->policy.list_T1, PG_lru, 1);
+ if (cnt1 != zone->policy.nr_T1) {
+ printk(KERN_ERR "__validate_zone: T1: %lu,%lu,%lu\n", cnt0, cnt1, zone->policy.nr_T1);
+ bug = 1;
+ }
+
+ cnt0 = list_count(&zone->policy.list_T2, PG_lru, 0);
+ cnt1 = list_count(&zone->policy.list_T2, PG_lru, 1);
+ if (cnt1 != zone->policy.nr_T2 || bug) {
+ printk(KERN_ERR "__validate_zone: T2: %lu,%lu,%lu\n", cnt0, cnt1, zone->policy.nr_T2);
+ bug = 1;
+ }
+
+ cnt0 = list_count(&zone->policy.list_T1, PG_longterm, 0) +
+ list_count(&zone->policy.list_T2, PG_longterm, 0);
+ cnt1 = list_count(&zone->policy.list_T1, PG_longterm, 1) +
+ list_count(&zone->policy.list_T2, PG_longterm, 1);
+ if (cnt0 != zone->policy.nr_shortterm || bug) {
+ printk(KERN_ERR "__validate_zone: shortterm: %lu,%lu,%lu\n", cnt0, cnt1, zone->policy.nr_shortterm);
+ bug = 1;
+ }
+
+ cnt0 = list_count(&zone->policy.list_T2, PG_longterm, 0);
+ cnt1 = list_count(&zone->policy.list_T2, PG_longterm, 1);
+ if (cnt1 != zone->policy.nr_T2 || bug) {
+ printk(KERN_ERR "__validate_zone: longterm: %lu,%lu,%lu\n", cnt0, cnt1, zone->policy.nr_T2);
+ bug = 1;
+ }
+
+ if (bug) {
+ BUG();
+ }
+#endif
+}
+
+/*
+ * Insert page into @zone's CART and update adaptive parameters.
+ *
+ * @zone: target zone.
+ * @page: new page.
+ */
+void __pgrep_add(struct zone *zone, struct page *page)
+{
+ unsigned int rflags;
+
+ /*
+ * Note: we could give hints to the insertion process using the
+ * LRU-specific PG_flags like PG_t1, PG_longterm and PG_referenced.
+ */
+
+ rflags = nonresident_get(page_mapping(page), page_index(page));
+
+ if (rflags & NR_found) {
+ SetPageLongTerm(page);
+ rflags &= NR_listid;
+ if (rflags == NR_b1) {
+ __cart_p_inc(zone);
+ } else if (rflags == NR_b2) {
+ __cart_p_dec(zone);
+ __cart_q_inc(zone, 1);
+ }
+ /* ++cart_longterm(zone); */
+ } else {
+ ClearPageLongTerm(page);
+ ++zone->policy.nr_shortterm;
+ }
+ SetPageT1(page);
+
+ list_add(&page->lru, &zone->policy.list_T1);
+
+ ++zone->policy.nr_T1;
+ BUG_ON(!PageLRU(page));
+
+ __validate_zone(zone);
+}
+
+static DEFINE_PER_CPU(struct pagevec, cart_add_pvecs) = { 0, };
+
+void fastcall pgrep_add(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(cart_add_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ __pagevec_pgrep_add(pvec);
+ put_cpu_var(cart_add_pvecs);
+}
+
+void __pgrep_add_drain(unsigned int cpu)
+{
+ struct pagevec *pvec = &per_cpu(cart_add_pvecs, cpu);
+
+ if (pagevec_count(pvec))
+ __pagevec_pgrep_add(pvec);
+}
+
+/*
+ * Add page to a release pagevec; temporarily drop the zone lock to
+ * release the pagevec when it fills up.
+ *
+ * @zone: @pages zone.
+ * @page: page to be released.
+ * @pvec: pagevec to collect pages in.
+ */
+static inline void __page_release(struct zone *zone, struct page *page,
+ struct pagevec *pvec)
+{
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ if (!PageLongTerm(page))
+ ++zone->policy.nr_shortterm;
+ if (PageT1(page))
+ ++zone->policy.nr_T1;
+ else
+ ++zone->policy.nr_T2;
+
+ if (!pagevec_add(pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ if (buffer_heads_over_limit)
+ pagevec_strip(pvec);
+ __pagevec_release(pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+}
+
+void pgrep_reinsert(struct list_head *page_list)
+{
+ struct page *page, *page2;
+ struct zone *zone = NULL;
+ struct pagevec pvec;
+
+ pagevec_init(&pvec, 1);
+ list_for_each_entry_safe(page, page2, page_list, lru) {
+ struct zone *pagezone = page_zone(page);
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ if (PageT1(page))
+ list_move(&page->lru, &zone->policy.list_T1);
+ else
+ list_move(&page->lru, &zone->policy.list_T2);
+
+ __page_release(zone, page, &pvec);
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+static inline int cart_reclaim_T1(struct zone *zone, unsigned long nr_to_scan)
+{
+ int t1 = zone->policy.nr_T1 > zone->policy.nr_p &&
+ (zone->policy.nr_T1 > nr_to_scan ||
+ zone->policy.nr_T1 > zone->policy.nr_T2);
+ int sat = TestClearZoneSaturated(zone);
+ int rec = ZoneReclaimedT1(zone);
+
+ if (t1) {
+ if (sat && rec)
+ return 0;
+ return 1;
+ }
+
+ if (sat && !rec)
+ return 1;
+ return 0;
+}
+
+
+void __pgrep_get_candidates(struct zone *zone, int priority,
+ unsigned long nr_to_scan, struct list_head *page_list,
+ unsigned long *nr_scanned)
+{
+ unsigned long nr_scan;
+ unsigned long nr_taken;
+ struct list_head *list;
+ int reclaim_t1;
+ int loop = 0;
+
+ reclaim_t1 = !!cart_reclaim_T1(zone, nr_to_scan);
+again:
+ if (reclaim_t1) {
+ list = &zone->policy.list_T1;
+ SetZoneReclaimedT1(zone);
+ } else {
+ list = &zone->policy.list_T2;
+ ClearZoneReclaimedT1(zone);
+ }
+
+ nr_taken =
+ isolate_lru_pages(zone, nr_to_scan, list, page_list, &nr_scan);
+
+ if (!nr_taken && !loop) {
+ reclaim_t1 ^= 1;
+ ++loop;
+ spin_unlock_irq(&zone->lru_lock);
+ cond_resched();
+ pgrep_add_drain();
+ spin_lock_irq(&zone->lru_lock);
+ goto again;
+ }
+
+ *nr_scanned = nr_scan;
+}
+
+void pgrep_put_candidates(struct zone *zone, struct list_head *page_list,
+ unsigned long nr_freed, int may_swap)
+{
+ struct pagevec pvec;
+ unsigned long dqi = 0;
+ unsigned long dqd = 0;
+ unsigned long dsl = 0;
+ unsigned long target;
+ unsigned long writeback = 0, count = 0;
+
+ pagevec_init(&pvec, 1);
+ spin_lock_irq(&zone->lru_lock);
+
+ target = min(zone->policy.nr_p + 1UL, B2T(nonresident_count(NR_b1)));
+
+ while (!list_empty(page_list)) {
+ struct page * page = lru_to_page(page_list);
+ prefetchw_prev_lru_page(page, page_list, flags);
+
+ if (PageT1(page)) { /* T1 */
+ if (TestClearPageReferenced(page)) {
+ if (!PageLongTerm(page) &&
+ (zone->policy.nr_T1 - dqd + dqi) >= target) {
+ SetPageLongTerm(page);
+ ++dsl;
+ }
+ list_move(&page->lru, &zone->policy.list_T1);
+ } else if (PageLongTerm(page)) {
+ ClearPageT1(page);
+ ++dqd;
+ list_move(&page->lru, &zone->policy.list_T2);
+ } else {
+ /* should have been reclaimed or was PG_new */
+ list_move(&page->lru, &zone->policy.list_T1);
+ }
+ } else { /* T2 */
+ if (TestClearPageReferenced(page)) {
+ SetPageT1(page);
+ ++dqi;
+ list_move(&page->lru, &zone->policy.list_T1);
+ } else {
+ /* should have been reclaimed */
+ list_move(&page->lru, &zone->policy.list_T2);
+ }
+ }
+ __page_release(zone, page, &pvec);
+ ++count;
+ if (PageWriteback(page))
+ ++writeback;
+ }
+
+ if (!nr_freed && writeback > count/2)
+ SetZoneSaturated(zone);
+
+ if (dqi > dqd)
+ __cart_q_inc(zone, dqi - dqd);
+ else
+ __cart_q_dec(zone, dqd - dqi);
+
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+void __pgrep_rotate_reclaimable(struct zone *zone, struct page *page)
+{
+ if (PageLRU(page)) {
+ if (PageLongTerm(page)) {
+ if (TestClearPageT1(page)) {
+ --zone->policy.nr_T1;
+ ++zone->policy.nr_T2;
+ __cart_q_dec(zone, 1);
+ }
+ list_move_tail(&page->lru, &zone->policy.list_T2);
+ } else {
+ if (!PageT1(page))
+ BUG();
+ list_move_tail(&page->lru, &zone->policy.list_T1);
+ }
+ }
+}
+
+void pgrep_remember(struct zone *zone, struct page *page)
+{
+ int target_list = PageT1(page) ? NR_b1 : NR_b2;
+ int evict_list = (nonresident_count(NR_b1) > cart_q())
+ ? NR_b1 : NR_b2;
+
+ nonresident_put(page_mapping(page), page_index(page),
+ target_list, evict_list);
+}
+
+void pgrep_forget(struct address_space *mapping, unsigned long index)
+{
+ nonresident_get(mapping, index);
+}
+
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
+void pgrep_show(struct zone *zone)
+{
+ printk("%s"
+ " free:%lukB"
+ " min:%lukB"
+ " low:%lukB"
+ " high:%lukB"
+ " T1:%lukB"
+ " T2:%lukB"
+ " shortterm:%lukB"
+ " present:%lukB"
+ " pages_scanned:%lu"
+ " all_unreclaimable? %s"
+ "\n",
+ zone->name,
+ K(zone->free_pages),
+ K(zone->pages_min),
+ K(zone->pages_low),
+ K(zone->pages_high),
+ K(zone->policy.nr_T1),
+ K(zone->policy.nr_T2),
+ K(zone->policy.nr_shortterm),
+ K(zone->present_pages),
+ zone->pages_scanned,
+ (zone->all_unreclaimable ? "yes" : "no")
+ );
+}
+
+void pgrep_zoneinfo(struct zone *zone, struct seq_file *m)
+{
+ seq_printf(m,
+ "\n pages free %lu"
+ "\n min %lu"
+ "\n low %lu"
+ "\n high %lu"
+ "\n T1 %lu"
+ "\n T2 %lu"
+ "\n shortterm %lu"
+ "\n p %lu"
+ "\n flags %lu"
+ "\n scanned %lu"
+ "\n spanned %lu"
+ "\n present %lu",
+ zone->free_pages,
+ zone->pages_min,
+ zone->pages_low,
+ zone->pages_high,
+ zone->policy.nr_T1,
+ zone->policy.nr_T2,
+ zone->policy.nr_shortterm,
+ zone->policy.nr_p,
+ zone->policy.flags,
+ zone->pages_scanned,
+ zone->spanned_pages,
+ zone->present_pages);
+}
+
+void __pgrep_counts(unsigned long *active, unsigned long *inactive,
+ unsigned long *free, struct zone *zones)
+{
+ int i;
+
+ *active = 0;
+ *inactive = 0;
+ *free = 0;
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ *active += zones[i].policy.nr_T1 + zones[i].policy.nr_T2 -
+ zones[i].policy.nr_shortterm;
+ *inactive += zones[i].policy.nr_shortterm;
+ *free += zones[i].free_pages;
+ }
+}
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:24.000000000 +0200
@@ -96,10 +96,12 @@ extern void __pgrep_counts(unsigned long
unsigned long *, struct zone *);
/* unsigned long __pgrep_nr_pages(struct zone *); */
-#ifdef CONFIG_MM_POLICY_USEONCE
+#if defined CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_policy.h>
-#elif CONFIG_MM_POLICY_CLOCKPRO
+#elif defined CONFIG_MM_POLICY_CLOCKPRO
#include <linux/mm_clockpro_policy.h>
+#elif defined CONFIG_MM_POLICY_CART
+#include <linux/mm_cart_policy.h>
#else
#error no mm policy
#endif
Index: linux-2.6/include/linux/mm_page_replace_data.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace_data.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace_data.h 2006-07-12 16:11:24.000000000 +0200
@@ -3,10 +3,12 @@
#ifdef __KERNEL__
-#ifdef CONFIG_MM_POLICY_USEONCE
+#if defined CONFIG_MM_POLICY_USEONCE
#include <linux/mm_use_once_data.h>
-#elif CONFIG_MM_POLICY_CLOCKPRO
+#elif defined CONFIG_MM_POLICY_CLOCKPRO
#include <linux/mm_clockpro_data.h>
+#elif defined CONFIG_MM_POLICY_CART
+#include <linux/mm_cart_data.h>
#else
#error no mm policy
#endif
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Kconfig 2006-07-12 16:11:24.000000000 +0200
@@ -147,6 +147,11 @@ config MM_POLICY_CLOCKPRO
help
This option selects a CLOCK-Pro based policy
+config MM_POLICY_CART
+ bool "CART"
+ help
+ This option selects a CART based policy
+
endchoice
#
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Makefile 2006-07-12 16:11:24.000000000 +0200
@@ -14,6 +14,7 @@ obj-y := bootmem.o filemap.o mempool.o
obj-$(CONFIG_MM_POLICY_USEONCE) += useonce.o
obj-$(CONFIG_MM_POLICY_CLOCKPRO) += nonresident.o clockpro.o
+obj-$(CONFIG_MM_POLICY_CART) += nonresident-cart.o cart.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
Index: linux-2.6/include/linux/mm_cart_data.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_cart_data.h 2006-07-12 16:11:24.000000000 +0200
@@ -0,0 +1,31 @@
+#ifndef _LINUX_CART_DATA_H_
+#define _LINUX_CART_DATA_H_
+
+#ifdef __KERNEL__
+
+#include <asm/bitops.h>
+
+struct pgrep_data {
+ struct list_head list_T1;
+ struct list_head list_T2;
+ unsigned long nr_scan;
+ unsigned long nr_T1;
+ unsigned long nr_T2;
+ unsigned long nr_shortterm;
+ unsigned long nr_p;
+ unsigned long flags;
+};
+
+#define CART_RECLAIMED_T1 0
+#define CART_SATURATED 1
+
+#define ZoneReclaimedT1(z) test_bit(CART_RECLAIMED_T1, &((z)->policy.flags))
+#define SetZoneReclaimedT1(z) __set_bit(CART_RECLAIMED_T1, &((z)->policy.flags))
+#define ClearZoneReclaimedT1(z) __clear_bit(CART_RECLAIMED_T1, &((z)->policy.flags))
+
+#define ZoneSaturated(z) test_bit(CART_SATURATED, &((z)->policy.flags))
+#define SetZoneSaturated(z) __set_bit(CART_SATURATED, &((z)->policy.flags))
+#define TestClearZoneSaturated(z) __test_and_clear_bit(CART_SATURATED, &((z)->policy.flags))
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_CART_DATA_H_ */
Index: linux-2.6/include/linux/mm_cart_policy.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_cart_policy.h 2006-07-12 16:11:24.000000000 +0200
@@ -0,0 +1,132 @@
+#ifndef _LINUX_MM_CART_POLICY_H
+#define _LINUX_MM_CART_POLICY_H
+
+#ifdef __KERNEL__
+
+#include <linux/rmap.h>
+#include <linux/page-flags.h>
+
+#define PG_t1 PG_reclaim1
+#define PG_longterm PG_reclaim2
+#define PG_new PG_reclaim3
+
+#define PageT1(page) test_bit(PG_t1, &(page)->flags)
+#define SetPageT1(page) set_bit(PG_t1, &(page)->flags)
+#define ClearPageT1(page) clear_bit(PG_t1, &(page)->flags)
+#define TestClearPageT1(page) test_and_clear_bit(PG_t1, &(page)->flags)
+#define TestSetPageT1(page) test_and_set_bit(PG_t1, &(page)->flags)
+
+#define PageLongTerm(page) test_bit(PG_longterm, &(page)->flags)
+#define SetPageLongTerm(page) set_bit(PG_longterm, &(page)->flags)
+#define TestSetPageLongTerm(page) test_and_set_bit(PG_longterm, &(page)->flags)
+#define ClearPageLongTerm(page) clear_bit(PG_longterm, &(page)->flags)
+#define TestClearPageLongTerm(page) test_and_clear_bit(PG_longterm, &(page)->flags)
+
+#define PageNew(page) test_bit(PG_new, &(page)->flags)
+#define SetPageNew(page) set_bit(PG_new, &(page)->flags)
+#define TestSetPageNew(page) test_and_set_bit(PG_new, &(page)->flags)
+#define ClearPageNew(page) clear_bit(PG_new, &(page)->flags)
+#define TestClearPageNew(page) test_and_clear_bit(PG_new, &(page)->flags)
+
+static inline void pgrep_hint_active(struct page *page)
+{
+}
+
+static inline void pgrep_hint_use_once(struct page *page)
+{
+ if (PageLRU(page))
+ BUG();
+ SetPageNew(page);
+}
+
+extern void __pgrep_add(struct zone *, struct page *);
+
+static inline void pgrep_copy_state(struct page *dpage, struct page *spage)
+{
+ if (PageT1(spage))
+ SetPageT1(dpage);
+ if (PageLongTerm(spage))
+ SetPageLongTerm(dpage);
+ if (PageNew(spage))
+ SetPageNew(dpage);
+}
+
+static inline void pgrep_clear_state(struct page *page)
+{
+ if (PageT1(page))
+ ClearPageT1(page);
+ if (PageLongTerm(page))
+ ClearPageLongTerm(page);
+ if (PageNew(page))
+ ClearPageNew(page);
+}
+
+static inline int pgrep_is_active(struct page *page)
+{
+ return PageLongTerm(page);
+}
+
+static inline void __pgrep_remove(struct zone *zone, struct page *page)
+{
+ list_del(&page->lru);
+ if (PageT1(page))
+ --zone->policy.nr_T1;
+ else
+ --zone->policy.nr_T2;
+
+ if (!PageLongTerm(page))
+ --zone->policy.nr_shortterm;
+}
+
+static inline int pgrep_reclaimable(struct page *page)
+{
+ if (page_referenced(page, 1, 0))
+ return RECLAIM_ACTIVATE;
+
+ if (PageNew(page))
+ ClearPageNew(page);
+
+ if ((PageT1(page) && PageLongTerm(page)) ||
+ (!PageT1(page) && !PageLongTerm(page)))
+ return RECLAIM_KEEP;
+
+ return RECLAIM_OK;
+}
+
+static inline int fastcall pgrep_activate(struct page *page)
+{
+ /* just set PG_referenced, handle the rest in
+ * pgrep_reinsert()
+ */
+ if (!TestClearPageNew(page)) {
+ SetPageReferenced(page);
+ return 1;
+ }
+
+ return 0;
+}
+
+extern void __pgrep_rotate_reclaimable(struct zone *, struct page *);
+
+static inline void pgrep_mark_accessed(struct page *page)
+{
+ SetPageReferenced(page);
+}
+
+#define MM_POLICY_HAS_NONRESIDENT
+
+extern void pgrep_remember(struct zone *, struct page *);
+extern void pgrep_forget(struct address_space *, unsigned long);
+
+static inline unsigned long __pgrep_nr_pages(struct zone *zone)
+{
+ return zone->policy.nr_T1 + zone->policy.nr_T2;
+}
+
+static inline unsigned long __pgrep_nr_scan(struct zone *zone)
+{
+ return zone->policy.nr_T1 + zone->policy.nr_T2;
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_CART_POLICY_H */
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 34/39] mm: cart: CART-r policy implementation
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (32 preceding siblings ...)
2006-07-12 14:43 ` [PATCH 33/39] mm: cart: CART policy implementation Peter Zijlstra
@ 2006-07-12 14:43 ` Peter Zijlstra
2006-07-12 14:43 ` [PATCH 35/39] mm: random: random page replacement policy Peter Zijlstra
` (5 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:43 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Another CART-based policy; this one extends CART to handle cyclic access patterns.
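The extension keeps one extra per-zone target, r: it is raised when
short-term pages enter the cache, lowered when remembered long-term
pages come back, and whenever r drops below the number of long-term
resident pages the zone is treated as cyclic and reclaim is steered at
T1. A rough userspace sketch of the adaptation, with the per-zone
counters stubbed into a plain struct (the names are stand-ins, not the
kernel's):

struct zone_policy {
        unsigned long nr_shortterm, nr_longterm, nr_r, c;
};

static unsigned long max1(unsigned long x) { return x ? x : 1; }

static void cart_r_inc(struct zone_policy *z)   /* short-term insert */
{
        unsigned long ratio = max1(z->nr_longterm / (z->nr_shortterm + 1));
        z->nr_r += ratio;
        if (z->nr_r > z->c)
                z->nr_r = z->c;
}

static void cart_r_dec(struct zone_policy *z)   /* long-term reinsert */
{
        unsigned long ratio = max1(z->nr_shortterm / (z->nr_longterm + 1));
        z->nr_r = (z->nr_r > ratio) ? z->nr_r - ratio : 0;
}

static int zone_cyclic(struct zone_policy *z)   /* forces T1 reclaim */
{
        return z->nr_r < z->nr_longterm;
}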
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
include/linux/mm_cart_data.h | 8 ++++
include/linux/mm_cart_policy.h | 10 ++++-
include/linux/mm_page_replace.h | 2 -
include/linux/mm_page_replace_data.h | 2 -
mm/Kconfig | 6 +++
mm/Makefile | 1
mm/cart.c | 66 +++++++++++++++++++++++++++++------
7 files changed, 82 insertions(+), 13 deletions(-)
Index: linux-2.6/include/linux/mm_cart_data.h
===================================================================
--- linux-2.6.orig/include/linux/mm_cart_data.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_cart_data.h 2006-07-12 16:09:19.000000000 +0200
@@ -13,11 +13,15 @@ struct pgrep_data {
unsigned long nr_T2;
unsigned long nr_shortterm;
unsigned long nr_p;
+#if defined CONFIG_MM_POLICY_CART_R
+ unsigned long nr_r;
+#endif
unsigned long flags;
};
#define CART_RECLAIMED_T1 0
#define CART_SATURATED 1
+#define CART_CYCLIC 2
#define ZoneReclaimedT1(z) test_bit(CART_RECLAIMED_T1, &((z)->policy.flags))
#define SetZoneReclaimedT1(z) __set_bit(CART_RECLAIMED_T1, &((z)->policy.flags))
@@ -27,5 +31,9 @@ struct pgrep_data {
#define SetZoneSaturated(z) __set_bit(CART_SATURATED, &((z)->policy.flags))
#define TestClearZoneSaturated(z) __test_and_clear_bit(CART_SATURATED, &((z)->policy.flags))
+#define ZoneCyclic(z) test_bit(CART_CYCLIC, &((z)->policy.flags))
+#define SetZoneCyclic(z) __set_bit(CART_CYCLIC, &((z)->policy.flags))
+#define ClearZoneCyclic(z) __clear_bit(CART_CYCLIC, &((z)->policy.flags))
+
#endif /* __KERNEL__ */
#endif /* _LINUX_CART_DATA_H_ */
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:11:23.000000000 +0200
@@ -100,7 +100,7 @@ extern void __pgrep_counts(unsigned long
#include <linux/mm_use_once_policy.h>
#elif defined CONFIG_MM_POLICY_CLOCKPRO
#include <linux/mm_clockpro_policy.h>
-#elif defined CONFIG_MM_POLICY_CART
+#elif defined CONFIG_MM_POLICY_CART || defined CONFIG_MM_POLICY_CART_R
#include <linux/mm_cart_policy.h>
#else
#error no mm policy
Index: linux-2.6/include/linux/mm_page_replace_data.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace_data.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace_data.h 2006-07-12 16:11:23.000000000 +0200
@@ -7,7 +7,7 @@
#include <linux/mm_use_once_data.h>
#elif defined CONFIG_MM_POLICY_CLOCKPRO
#include <linux/mm_clockpro_data.h>
-#elif defined CONFIG_MM_POLICY_CART
+#elif defined CONFIG_MM_POLICY_CART || defined CONFIG_MM_POLICY_CART_R
#include <linux/mm_cart_data.h>
#else
#error no mm policy
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Kconfig 2006-07-12 16:11:23.000000000 +0200
@@ -152,6 +152,12 @@ config MM_POLICY_CART
help
This option selects a CART based policy
+config MM_POLICY_CART_R
+ bool "CART-r"
+ help
+ This option selects a CART based policy modified to handle cyclic
+ access patterns.
+
endchoice
#
Index: linux-2.6/mm/cart.c
===================================================================
--- linux-2.6.orig/mm/cart.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/cart.c 2006-07-12 16:11:22.000000000 +0200
@@ -64,6 +64,9 @@ void __init pgrep_init_zone(struct zone
zone->policy.nr_T2 = 0;
zone->policy.nr_shortterm = 0;
zone->policy.nr_p = 0;
+#if defined CONFIG_MM_POLICY_CART_R
+ zone->policy.nr_r = 0;
+#endif
zone->policy.flags = 0;
}
@@ -160,6 +163,30 @@ static inline void __cart_p_dec(struct z
zone->policy.nr_p = 0UL;
}
+#if defined CONFIG_MM_POLICY_CART_R
+static inline void __cart_r_inc(struct zone *zone)
+{
+ unsigned long ratio;
+ ratio = (cart_longterm(zone) / (zone->policy.nr_shortterm + 1)) ?: 1;
+ zone->policy.nr_r += ratio;
+ if (zone->policy.nr_r > cart_c(zone))
+ zone->policy.nr_r = cart_c(zone);
+}
+
+static inline void __cart_r_dec(struct zone *zone)
+{
+ unsigned long ratio;
+ ratio = (zone->policy.nr_shortterm / (cart_longterm(zone) + 1)) ?: 1;
+ if (zone->policy.nr_r > ratio)
+ zone->policy.nr_r -= ratio;
+ else
+ zone->policy.nr_r = 0UL;
+}
+#else
+#define __cart_r_inc(z) do { } while (0)
+#define __cart_r_dec(z) do { } while (0)
+#endif
+
static unsigned long list_count(struct list_head *list, int PG_flag, int result)
{
unsigned long nr = 0;
@@ -230,6 +257,8 @@ void __pgrep_add(struct zone *zone, stru
if (rflags & NR_found) {
SetPageLongTerm(page);
+ __cart_r_dec(zone);
+
rflags &= NR_listid;
if (rflags == NR_b1) {
__cart_p_inc(zone);
@@ -240,6 +269,7 @@ void __pgrep_add(struct zone *zone, stru
/* ++cart_longterm(zone); */
} else {
ClearPageLongTerm(page);
+ __cart_r_inc(zone);
++zone->policy.nr_shortterm;
}
SetPageT1(page);
@@ -329,21 +359,30 @@ void pgrep_reinsert(struct list_head *pa
static inline int cart_reclaim_T1(struct zone *zone, unsigned long nr_to_scan)
{
+ int ret = 0;
int t1 = zone->policy.nr_T1 > zone->policy.nr_p &&
(zone->policy.nr_T1 > nr_to_scan ||
zone->policy.nr_T1 > zone->policy.nr_T2);
int sat = TestClearZoneSaturated(zone);
int rec = ZoneReclaimedT1(zone);
+#if defined CONFIG_MM_POLICY_CART_R
+ int cyc = zone->policy.nr_r < cart_longterm(zone);
- if (t1) {
- if (sat && rec)
- return 0;
- return 1;
- }
+ t1 |= cyc;
+#endif
+
+ if ((t1 && !(rec && sat)) ||
+ (!t1 && (!rec && sat)))
+ ret = 1;
+
+#if defined CONFIG_MM_POLICY_CART_R
+ if (ret && cyc)
+ SetZoneCyclic(zone);
+ else
+ ClearZoneCyclic(zone);
+#endif
- if (sat && !rec)
- return 1;
- return 0;
+ return ret;
}
@@ -450,7 +489,8 @@ void __pgrep_rotate_reclaimable(struct z
{
if (PageLRU(page)) {
if (PageLongTerm(page)) {
- if (TestClearPageT1(page)) {
+ if (PageT1(page)) {
+ ClearPageT1(page);
--zone->policy.nr_T1;
++zone->policy.nr_T2;
__cart_q_dec(zone, 1);
@@ -520,7 +560,10 @@ void pgrep_zoneinfo(struct zone *zone, s
"\n T2 %lu"
"\n shortterm %lu"
"\n p %lu"
- "\n flags %lu"
+#if defined CONFIG_MM_POLICY_CART_R
+ "\n r %lu"
+#endif
+ "\n flags %lx"
"\n scanned %lu"
"\n spanned %lu"
"\n present %lu",
@@ -532,6 +575,9 @@ void pgrep_zoneinfo(struct zone *zone, s
zone->policy.nr_T2,
zone->policy.nr_shortterm,
zone->policy.nr_p,
+#if defined CONFIG_MM_POLICY_CART_R
+ zone->policy.nr_r,
+#endif
zone->policy.flags,
zone->pages_scanned,
zone->spanned_pages,
Index: linux-2.6/include/linux/mm_cart_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_cart_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_cart_policy.h 2006-07-12 16:09:19.000000000 +0200
@@ -80,6 +80,13 @@ static inline void __pgrep_remove(struct
static inline int pgrep_reclaimable(struct page *page)
{
+#if defined CONFIG_MM_POLICY_CART_R
+ if (PageNew(page) && ZoneCyclic(page_zone(page))) {
+ ClearPageNew(page);
+ return RECLAIM_OK;
+ }
+#endif
+
if (page_referenced(page, 1, 0))
return RECLAIM_ACTIVATE;
@@ -98,10 +105,11 @@ static inline int fastcall pgrep_activat
/* just set PG_referenced, handle the rest in
* pgrep_reinsert()
*/
- if (!TestClearPageNew(page)) {
+ if (!PageNew(page)) {
SetPageReferenced(page);
return 1;
}
+ ClearPageNew(page);
return 0;
}
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Makefile 2006-07-12 16:11:23.000000000 +0200
@@ -15,6 +15,7 @@ obj-y := bootmem.o filemap.o mempool.o
obj-$(CONFIG_MM_POLICY_USEONCE) += useonce.o
obj-$(CONFIG_MM_POLICY_CLOCKPRO) += nonresident.o clockpro.o
obj-$(CONFIG_MM_POLICY_CART) += nonresident-cart.o cart.o
+obj-$(CONFIG_MM_POLICY_CART_R) += nonresident-cart.o cart.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 35/39] mm: random: random page replacement policy
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (33 preceding siblings ...)
2006-07-12 14:43 ` [PATCH 34/39] mm: cart: CART-r " Peter Zijlstra
@ 2006-07-12 14:43 ` Peter Zijlstra
2006-07-12 14:44 ` [PATCH 36/39] mm: refault histogram for non-resident policies Peter Zijlstra
` (4 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:43 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Random page replacement.
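Candidates are picked by hashing a per-zone seed into a page frame
number and retrying until the frame is on the LRU (see
pick_random_cache_page() below). A self-contained sketch of that loop;
a generic 64-bit mixer stands in for the kernel's hash_long(),
page_on_lru() is a stub, and the post-pick seed perturbation is left
out:

#include <stdint.h>
#include <stdio.h>

static uint64_t mix(uint64_t x)                 /* stand-in for hash_long() */
{
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return x;
}

struct fake_zone {
        uint64_t seed;
        unsigned long start_pfn, present_pages;
};

static int page_on_lru(unsigned long pfn)       /* stub for PageLRU() */
{
        return pfn % 3 != 0;            /* pretend 2/3 of frames qualify */
}

static unsigned long pick_random_lru_pfn(struct fake_zone *z)
{
        unsigned long pfn;
        do {
                /* same additive constant as the patch */
                z->seed = mix(z->seed) + 3147484177UL;
                pfn = z->start_pfn + z->seed % z->present_pages;
        } while (!page_on_lru(pfn));
        return pfn;
}

int main(void)
{
        struct fake_zone z = { 1, 4096, 65536 };
        int i;
        for (i = 0; i < 4; i++)
                printf("candidate pfn %lu\n", pick_random_lru_pfn(&z));
        return 0;
}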
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
include/linux/mm_page_replace.h | 2
include/linux/mm_page_replace_data.h | 2
include/linux/mm_random_data.h | 9 +
include/linux/mm_random_policy.h | 60 +++++++++
mm/Kconfig | 5
mm/Makefile | 1
mm/random_policy.c | 218 +++++++++++++++++++++++++++++++++++
7 files changed, 297 insertions(+)
Index: linux-2.6/mm/random_policy.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/random_policy.c 2006-07-12 16:09:19.000000000 +0200
@@ -0,0 +1,218 @@
+
+/* Random page replacement policy */
+
+#include <linux/module.h>
+#include <linux/mm_page_replace.h>
+#include <linux/swap.h>
+#include <linux/pagevec.h>
+#include <linux/init.h>
+#include <linux/rmap.h>
+#include <linux/hash.h>
+#include <linux/seq_file.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h> /* for try_to_release_page(),
+ buffer_heads_over_limit */
+#include <asm/sections.h>
+
+void __init pgrep_init(void)
+{
+ printk(KERN_ERR "Random page replacement policy init!\n");
+}
+
+void __init pgrep_init_zone(struct zone *zone)
+{
+ zone->policy.nr_pages = 0;
+}
+
+static DEFINE_PER_CPU(struct pagevec, add_pvecs) = { 0, };
+
+void fastcall pgrep_add(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(add_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ __pagevec_pgrep_add(pvec);
+ put_cpu_var(add_pvecs);
+}
+
+void __pgrep_add_drain(unsigned int cpu)
+{
+ struct pagevec *pvec = &per_cpu(add_pvecs, cpu);
+
+ if (pagevec_count(pvec))
+ __pagevec_pgrep_add(pvec);
+}
+
+static inline void __page_release(struct zone *zone, struct page *page,
+ struct pagevec *pvec)
+{
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ ++zone->policy.nr_pages;
+
+ if (!pagevec_add(pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ if (buffer_heads_over_limit)
+ pagevec_strip(pvec);
+ __pagevec_release(pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+}
+
+void pgrep_reinsert(struct list_head *page_list)
+{
+ struct page *page, *page2;
+ struct zone *zone = NULL;
+ struct pagevec pvec;
+
+ pagevec_init(&pvec, 1);
+ list_for_each_entry_safe(page, page2, page_list, lru) {
+ struct zone *pagezone = page_zone(page);
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ zone = pagezone;
+ spin_lock_irq(&zone->lru_lock);
+ }
+ __page_release(zone, page, &pvec);
+ }
+ if (zone)
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+/*
+ * Lehmer-style simple linear congruential PRNG:
+ *
+ * Xn+1 = (a * Xn + c) mod m
+ *
+ * where a, c and m are constants; here hash_long() provides the
+ * mixing step. Note that "m" is zone->present_pages, so in this
+ * case it's really not constant.
+ */
+
+static unsigned long get_random(struct zone *zone)
+{
+ zone->policy.seed =
+ hash_long(zone->policy.seed, BITS_PER_LONG) + 3147484177UL;
+ return zone->policy.seed;
+}
+
+static struct page *pick_random_cache_page(struct zone *zone)
+{
+ struct page *page;
+ unsigned long pfn;
+ do {
+ pfn = zone->zone_start_pfn +
+ get_random(zone) % zone->present_pages;
+ page = pfn_to_page(pfn);
+ } while (!PageLRU(page));
+ zone->policy.seed ^= page_index(page);
+ return page;
+}
+
+static unsigned long pick_candidates(struct zone *zone,
+ unsigned long nr_to_scan, struct list_head *pages)
+{
+ unsigned long nr_taken = 0;
+ for (;nr_to_scan && zone->policy.nr_pages; nr_to_scan--) {
+ struct page *page = pick_random_cache_page(zone);
+ if (!TestSetPageCandidate(page)) {
+ list_add(&page->lru, pages);
+ ++nr_taken;
+ }
+ }
+ return nr_taken;
+}
+
+void __pgrep_get_candidates(struct zone *zone, int priority,
+ unsigned long nr_to_scan, struct list_head *pages,
+ unsigned long *nr_scanned)
+{
+ LIST_HEAD(candidates);
+ nr_to_scan = pick_candidates(zone, nr_to_scan, &candidates);
+ isolate_lru_pages(zone, nr_to_scan, &candidates, pages, nr_scanned);
+ while (!list_empty(&candidates)) {
+ struct page *page = lru_to_page(&candidates);
+ list_del(&page->lru);
+ ClearPageCandidate(page);
+ }
+}
+
+void pgrep_put_candidates(struct zone *zone, struct list_head *pages,
+ unsigned long nr_freed, int may_swap)
+{
+ struct pagevec pvec;
+ pagevec_init(&pvec, 1);
+ spin_lock_irq(&zone->lru_lock);
+ while (!list_empty(pages)) {
+ struct page *page = lru_to_page(pages);
+ list_del(&page->lru);
+ __page_release(zone, page, &pvec);
+ }
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+#define K(x) ((x) << (PAGE_SHIFT-10))
+
+void pgrep_show(struct zone *zone)
+{
+ printk("%s"
+ " free:%lukB"
+ " min:%lukB"
+ " low:%lukB"
+ " high:%lukB"
+ " cached:%lukB"
+ " present:%lukB"
+ " pages_scanned:%lu"
+ " all_unreclaimable? %s"
+ "\n",
+ zone->name,
+ K(zone->free_pages),
+ K(zone->pages_min),
+ K(zone->pages_low),
+ K(zone->pages_high),
+ K(zone->policy.nr_pages),
+ K(zone->present_pages),
+ zone->pages_scanned,
+ (zone->all_unreclaimable ? "yes" : "no")
+ );
+}
+
+void pgrep_zoneinfo(struct zone *zone, struct seq_file *m)
+{
+ seq_printf(m,
+ "\n pages free %lu"
+ "\n min %lu"
+ "\n low %lu"
+ "\n high %lu"
+ "\n cached %lu"
+ "\n scanned %lu"
+ "\n spanned %lu"
+ "\n present %lu",
+ zone->free_pages,
+ zone->pages_min,
+ zone->pages_low,
+ zone->pages_high,
+ zone->policy.nr_pages,
+ zone->pages_scanned,
+ zone->spanned_pages,
+ zone->present_pages);
+}
+
+void __pgrep_counts(unsigned long *active, unsigned long *inactive,
+ unsigned long *free, struct zone *zones)
+{
+ int i;
+
+ *active = 0;
+ *inactive = 0;
+ *free = 0;
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ *free += zones[i].free_pages;
+ }
+}
+
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Kconfig 2006-07-12 16:11:22.000000000 +0200
@@ -158,6 +158,11 @@ config MM_POLICY_CART_R
This option selects a CART based policy modified to handle cyclic
access patterns.
+config MM_POLICY_RANDOM
+ bool "Random"
+ help
+ This option selects the random replacement policy.
+
endchoice
#
Index: linux-2.6/include/linux/mm_page_replace.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace.h 2006-07-12 16:09:19.000000000 +0200
@@ -102,6 +102,8 @@ extern void __pgrep_counts(unsigned long
#include <linux/mm_clockpro_policy.h>
#elif defined CONFIG_MM_POLICY_CART || defined CONFIG_MM_POLICY_CART_R
#include <linux/mm_cart_policy.h>
+#elif defined CONFIG_MM_POLICY_RANDOM
+#include <linux/mm_random_policy.h>
#else
#error no mm policy
#endif
Index: linux-2.6/include/linux/mm_page_replace_data.h
===================================================================
--- linux-2.6.orig/include/linux/mm_page_replace_data.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_page_replace_data.h 2006-07-12 16:09:19.000000000 +0200
@@ -9,6 +9,8 @@
#include <linux/mm_clockpro_data.h>
#elif defined CONFIG_MM_POLICY_CART || defined CONFIG_MM_POLICY_CART_R
#include <linux/mm_cart_data.h>
+#elif defined CONFIG_MM_POLICY_RANDOM
+#include <linux/mm_random_data.h>
#else
#error no mm policy
#endif
Index: linux-2.6/include/linux/mm_random_policy.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_random_policy.h 2006-07-12 16:09:19.000000000 +0200
@@ -0,0 +1,60 @@
+#ifndef _LINUX_MM_RANDOM_POLICY_H
+#define _LINUX_MM_RANDOM_POLICY_H
+
+#ifdef __KERNEL__
+
+#include <linux/page-flags.h>
+
+#define PG_candidate PG_reclaim1
+
+#define PageCandidate(page) test_bit(PG_candidate, &(page)->flags)
+#define TestSetPageCandidate(page) test_and_set_bit(PG_candidate, &(page)->flags)
+#define ClearPageCandidate(page) clear_bit(PG_candidate, &(page)->flags)
+
+#define pgrep_hint_active(p) do { } while (0)
+#define pgrep_hint_use_once(p) do { } while (0)
+
+static inline
+void __pgrep_add(struct zone *zone, struct page *page)
+{
+ zone->policy.nr_pages++;
+}
+
+#define pgrep_activate(p) 0
+#define pgrep_reclaimable(p) RECLAIM_OK
+#define pgrep_mark_accessed(p) do { } while (0)
+
+static inline
+void __pgrep_remove(struct zone *zone, struct page *page)
+{
+ if (PageCandidate(page)) {
+ ClearPageCandidate(page);
+ list_del(&page->lru);
+ }
+ zone->policy.nr_pages--;
+}
+
+static inline
+void __pgrep_rotate_reclaimable(struct zone *zone, struct page *page)
+{
+}
+
+#define pgrep_copy_state(d, s) do { } while (0)
+#define pgrep_clear_state(p) do { } while (0)
+#define pgrep_is_active(p) 0
+
+#define pgrep_remember(z, p) do { } while (0)
+#define pgrep_forget(m, i) do { } while (0)
+
+static inline unsigned long __pgrep_nr_pages(struct zone *zone)
+{
+ return zone->policy.nr_pages;
+}
+
+static inline unsigned long __pgrep_nr_scan(struct zone *zone)
+{
+ return zone->policy.nr_pages;
+}
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_MM_RANDOM_POLICY_H */
Index: linux-2.6/include/linux/mm_random_data.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/mm_random_data.h 2006-07-12 16:09:19.000000000 +0200
@@ -0,0 +1,9 @@
+#ifdef __KERNEL__
+
+struct pgrep_data {
+ unsigned long nr_scan;
+ unsigned long nr_pages;
+ unsigned long seed;
+};
+
+#endif /* __KERNEL__ */
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Makefile 2006-07-12 16:11:22.000000000 +0200
@@ -16,6 +16,7 @@ obj-$(CONFIG_MM_POLICY_USEONCE) += useon
obj-$(CONFIG_MM_POLICY_CLOCKPRO) += nonresident.o clockpro.o
obj-$(CONFIG_MM_POLICY_CART) += nonresident-cart.o cart.o
obj-$(CONFIG_MM_POLICY_CART_R) += nonresident-cart.o cart.o
+obj-$(CONFIG_MM_POLICY_RANDOM) += random_policy.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 36/39] mm: refault histogram for non-resident policies
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (34 preceding siblings ...)
2006-07-12 14:43 ` [PATCH 35/39] mm: random: random page replacement policy Peter Zijlstra
@ 2006-07-12 14:44 ` Peter Zijlstra
2006-07-12 14:44 ` [PATCH 37/39] mm: use-once: cleanup of the use-once logic Peter Zijlstra
` (3 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:44 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Adds a refault histogram for those policies that use nonresident page tracking.
Based on ideas and code from Rik van Riel.
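Each refault distance is binned into one of 64 equal-width buckets
spanning the non-resident tracking capacity, plus an overflow bucket
for pages that were never remembered (distance ~0UL) or refault from
beyond the horizon (see mm/refault.c below). A minimal sketch of the
binning, with the capacity as a made-up constant:

#include <stdio.h>

#define BUCKETS 64

static unsigned long histogram[BUCKETS + 1];    /* [BUCKETS] = new/beyond */

static unsigned long nonresident_total(void)    /* stub: tracking capacity */
{
        return 1UL << 20;
}

static void refault(unsigned long distance)
{
        unsigned long width = nonresident_total() / BUCKETS;
        unsigned long id = distance / width;

        if (id > BUCKETS)               /* ~0UL lands here: never seen */
                id = BUCKETS;
        histogram[id]++;
}

int main(void)
{
        refault(0);                     /* refault right at the LRU tail */
        refault(~0UL);                  /* page was never remembered */
        printf("bucket 0: %lu, new/beyond: %lu\n",
               histogram[0], histogram[BUCKETS]);
        return 0;
}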
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
fs/proc/proc_misc.c | 23 +++++++
include/linux/nonresident-cart.h | 2
include/linux/nonresident.h | 4 -
mm/Kconfig | 5 +
mm/Makefile | 1
mm/cart.c | 4 -
mm/clockpro.c | 8 +-
mm/nonresident-cart.c | 110 +++++++++++++++++++++++++++++++++----
mm/nonresident.c | 17 ++++-
mm/refault.c | 114 +++++++++++++++++++++++++++++++++++++++
10 files changed, 262 insertions(+), 26 deletions(-)
Index: linux-2.6/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_misc.c 2006-07-12 16:07:25.000000000 +0200
+++ linux-2.6/fs/proc/proc_misc.c 2006-07-12 16:09:24.000000000 +0200
@@ -220,6 +220,26 @@ static struct file_operations fragmentat
.release = seq_release,
};
+#ifdef CONFIG_MM_REFAULT
+extern struct seq_operations refault_op;
+static int refault_open(struct inode *inode, struct file *file)
+{
+ (void)inode;
+ return seq_open(file, &refault_op);
+}
+
+extern ssize_t refault_write(struct file *, const char __user *buf,
+ size_t count, loff_t *);
+
+static struct file_operations refault_file_operations = {
+ .open = refault_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+ .write = refault_write,
+};
+#endif
+
extern struct seq_operations zoneinfo_op;
static int zoneinfo_open(struct inode *inode, struct file *file)
{
@@ -692,6 +712,9 @@ void __init proc_misc_init(void)
#endif
#endif
create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+#ifdef CONFIG_MM_REFAULT
+ create_seq_entry("refault",S_IRUGO, &refault_file_operations);
+#endif
create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);
create_seq_entry("zoneinfo",S_IRUGO, &proc_zoneinfo_file_operations);
create_seq_entry("diskstats", 0, &proc_diskstats_operations);
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Kconfig 2006-07-12 16:09:24.000000000 +0200
@@ -165,6 +165,11 @@ config MM_POLICY_RANDOM
endchoice
+config MM_REFAULT
+ bool "Refault histogram"
+ def_bool y
+ depends on MM_POLICY_CLOCKPRO || MM_POLICY_CART || MM_POLICY_CART_R
+
#
# support for page migration
#
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/Makefile 2006-07-12 16:09:24.000000000 +0200
@@ -17,6 +17,7 @@ obj-$(CONFIG_MM_POLICY_CLOCKPRO) += nonr
obj-$(CONFIG_MM_POLICY_CART) += nonresident-cart.o cart.o
obj-$(CONFIG_MM_POLICY_CART_R) += nonresident-cart.o cart.o
obj-$(CONFIG_MM_POLICY_RANDOM) += random_policy.o
+obj-$(CONFIG_MM_REFAULT) += refault.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
Index: linux-2.6/mm/cart.c
===================================================================
--- linux-2.6.orig/mm/cart.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/cart.c 2006-07-12 16:09:24.000000000 +0200
@@ -253,7 +253,7 @@ void __pgrep_add(struct zone *zone, stru
* specific PG_flags like: PG_t1, PG_longterm and PG_referenced.
*/
- rflags = nonresident_get(page_mapping(page), page_index(page));
+ rflags = nonresident_get(page_mapping(page), page_index(page), 1);
if (rflags & NR_found) {
SetPageLongTerm(page);
@@ -516,7 +516,7 @@ void pgrep_remember(struct zone *zone, s
void pgrep_forget(struct address_space *mapping, unsigned long index)
{
- nonresident_get(mapping, index);
+ nonresident_get(mapping, index, 0);
}
#define K(x) ((x) << (PAGE_SHIFT-10))
Index: linux-2.6/mm/clockpro.c
===================================================================
--- linux-2.6.orig/mm/clockpro.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/clockpro.c 2006-07-12 16:09:24.000000000 +0200
@@ -169,10 +169,10 @@ static void __nonres_cutoff_dec(unsigned
__get_cpu_var(nonres_cutoff) -= cutoff;
}
-static int nonres_get(struct address_space *mapping, unsigned long index)
+static int nonres_get(struct address_space *mapping, unsigned long index, int is_fault)
{
int found = 0;
- unsigned long distance = nonresident_get(mapping, index);
+ unsigned long distance = nonresident_get(mapping, index, is_fault);
if (distance != ~0UL) { /* valid page */
--__get_cpu_var(nonres_count);
@@ -310,7 +310,7 @@ void __pgrep_add(struct zone *zone, stru
int hand = HAND_HOT;
if (mapping)
- found = nonres_get(mapping, page_index(page));
+ found = nonres_get(mapping, page_index(page), 1);
#if 0
/* prefill the hot list */
@@ -550,7 +550,7 @@ void pgrep_remember(struct zone *zone, s
void pgrep_forget(struct address_space *mapping, unsigned long index)
{
- nonres_get(mapping, index);
+ nonres_get(mapping, index, 0);
}
static unsigned long estimate_pageable_memory(void)
Index: linux-2.6/mm/nonresident-cart.c
===================================================================
--- linux-2.6.orig/mm/nonresident-cart.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/nonresident-cart.c 2006-07-12 16:09:24.000000000 +0200
@@ -49,6 +49,8 @@
#include <linux/kernel.h>
#include <linux/nonresident-cart.h>
+#include <asm/div64.h>
+
#define TARGET_SLOTS 64
#define NR_CACHELINES (TARGET_SLOTS*sizeof(u32) / L1_CACHE_BYTES)
#define NR_SLOTS (((NR_CACHELINES * L1_CACHE_BYTES) - sizeof(spinlock_t) - 4*sizeof(u8)) / sizeof(u32))
@@ -207,6 +209,52 @@ static inline void __nonresident_push(st
__nonresident_insert(nr_bucket, listid, &nr_bucket->hand[listid], slot);
}
+unsigned int nonresident_total(void)
+{
+ return NR_SLOTS << nonres_shift;
+}
+
+static DEFINE_PER_CPU(unsigned long, nonres_bal);
+
+static inline unsigned long __nonres_bal(void)
+{
+ return __sum_cpu_var(unsigned long, nonres_bal);
+}
+
+static void __nonres_bal_inc(unsigned long db)
+{
+ unsigned long nr_total;
+ unsigned long nr_bal;
+
+ preempt_disable();
+
+ nr_total = nonresident_total();
+ nr_bal = __nonres_bal();
+
+ if (nr_bal + db > nr_total)
+ db = nr_total - nr_bal;
+ __get_cpu_var(nonres_bal) += db;
+
+ preempt_enable();
+}
+
+static void __nonres_bal_dec(unsigned long db)
+{
+ unsigned long nr_total;
+ unsigned long nr_bal;
+
+ preempt_disable();
+
+ nr_total = nonresident_total();
+ nr_bal = __nonres_bal();
+
+ if (nr_bal < db)
+ db = nr_bal;
+ __get_cpu_var(nonres_bal) -= db;
+
+ preempt_enable();
+}
+
/*
* Remembers a page by putting a hash-cookie on the @listid list.
*
@@ -246,6 +294,10 @@ int nonresident_put(struct address_space
cookie = xchg(slot, cookie);
__nonresident_push(nr_bucket, listid, slot);
spin_unlock_irqrestore(&nr_bucket->lock, flags);
+ if (listid == NR_b1)
+ __nonres_bal_dec(1);
+ else
+ __nonres_bal_inc(1);
return evict;
}
@@ -258,12 +310,13 @@ int nonresident_put(struct address_space
*
* returns listid of the list the item was found on with NR_found set if found.
*/
-int nonresident_get(struct address_space * mapping, unsigned long index)
+int nonresident_get(struct address_space * mapping, unsigned long index, int is_fault)
{
struct nr_bucket * nr_bucket;
u32 wanted;
- int j;
- u8 i;
+ unsigned long tail_dist;
+ int pos;
+ int i;
unsigned long flags;
int ret = 0;
@@ -276,33 +329,64 @@ int nonresident_get(struct address_space
spin_lock_irqsave(&nr_bucket->lock, flags);
for (i = 0; i < 2; ++i) {
- j = nr_bucket->hand[i];
+ tail_dist = 0;
+ pos = nr_bucket->hand[i];
do {
- u32 *slot = &nr_bucket->slot[j];
+ u32 *slot = &nr_bucket->slot[pos];
if (GET_LISTID(*slot) != i)
break;
if ((*slot & COOKIE_MASK) == wanted) {
- slot = __nonresident_del(nr_bucket, i, j, slot);
+ slot = __nonresident_del(nr_bucket, i, pos, slot);
__nonresident_push(nr_bucket, NR_free, slot);
+
ret = i | NR_found;
goto out;
}
- j = GET_INDEX(*slot);
- } while (j != nr_bucket->hand[i]);
+ pos = GET_INDEX(*slot);
+ ++tail_dist;
+ } while (pos != nr_bucket->hand[i]);
}
out:
+#ifdef CONFIG_MM_REFAULT
+ if (is_fault) {
+ extern void nonresident_refault(unsigned long);
+ unsigned long distance = ~0UL;
+
+ if (i < 2) {
+ unsigned long long dist;
+ unsigned long dist_total;
+ unsigned long bal[2] = {
+ nonresident_total() - __nonres_bal(),
+ __nonres_bal(),
+ };
+
+ dist_total =
+ __sum_cpu_var(unsigned long, nonres_count[i]);
+
+ tail_dist <<= nonres_shift;
+ tail_dist += (nr_bucket - nonres_table);
+
+ if (dist_total < tail_dist)
+ dist = 0;
+ else
+ dist = dist_total - tail_dist;
+
+ dist *= nonresident_total();
+ do_div(dist, bal[i] ?: 1);
+ distance = dist;
+ }
+
+ nonresident_refault(distance);
+ }
+#endif /* CONFIG_MM_REFAULT */
+
spin_unlock_irqrestore(&nr_bucket->lock, flags);
return ret;
}
-unsigned int nonresident_total(void)
-{
- return (1 << nonres_shift) * NR_SLOTS;
-}
-
/*
* For interactive workloads, we remember about as many non-resident pages
* as we have actual memory pages. For server workloads with large inter-
Index: linux-2.6/mm/nonresident.c
===================================================================
--- linux-2.6.orig/mm/nonresident.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/nonresident.c 2006-07-12 16:10:49.000000000 +0200
@@ -72,7 +72,8 @@ static u32 nr_cookie(struct address_spac
return (u32)(cookie >> (BITS_PER_LONG - 32));
}
-unsigned long nonresident_get(struct address_space * mapping, unsigned long index)
+unsigned long nonresident_get(struct address_space * mapping, unsigned long index,
+ int is_fault)
{
struct nr_bucket * nr_bucket;
int distance;
@@ -95,11 +96,19 @@ unsigned long nonresident_get(struct add
* Add some jitter to the lower nonres_shift bits.
*/
distance += (nr_bucket - nonres_table);
- return distance;
+ goto out;
}
}
- return ~0UL;
+ distance = ~0UL;
+out:
+#ifdef CONFIG_MM_REFAULT
+ if (is_fault) {
+ extern void nonresident_refault(unsigned long);
+ nonresident_refault(distance);
+ }
+#endif /* CONFIG_MM_REFAULT */
+ return distance;
}
u32 nonresident_put(struct address_space * mapping, unsigned long index)
@@ -129,7 +138,7 @@ retry:
return xchg(&nr_bucket->page[i], nrpage);
}
-unsigned long fastcall nonresident_total(void)
+unsigned long nonresident_total(void)
{
return NUM_NR << nonres_shift;
}
Index: linux-2.6/mm/refault.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/refault.c 2006-07-12 16:09:24.000000000 +0200
@@ -0,0 +1,114 @@
+#include <linux/config.h>
+#include <linux/percpu.h>
+#include <linux/seq_file.h>
+#include <asm/uaccess.h>
+
+#define BUCKETS 64
+
+DEFINE_PER_CPU(unsigned long[BUCKETS+1], refault_histogram);
+
+extern unsigned long nonresident_total(void);
+
+void nonresident_refault(unsigned long distance)
+{
+ unsigned long nonres_bucket = nonresident_total() / BUCKETS;
+ unsigned long bucket_id = distance / nonres_bucket;
+
+ if (bucket_id > BUCKETS)
+ bucket_id = BUCKETS;
+
+ __get_cpu_var(refault_histogram)[bucket_id]++;
+}
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static void *frag_start(struct seq_file *m, loff_t *pos)
+{
+ if (*pos < 0 || *pos > BUCKETS)
+ return NULL;
+
+ m->private = (void *)(unsigned long)*pos;
+
+ return pos;
+}
+
+static void *frag_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ if (*pos < BUCKETS) {
+ (*pos)++;
+ m->private = (void *)((unsigned long)m->private + 1);
+ return pos;
+ }
+ return NULL;
+}
+
+static void frag_stop(struct seq_file *m, void *arg)
+{
+}
+
+unsigned long get_refault_stat(unsigned long index)
+{
+ unsigned long total = 0;
+ int cpu;
+
+ for_each_cpu(cpu) {
+ total += per_cpu(refault_histogram, cpu)[index];
+ }
+ return total;
+}
+
+static int frag_show(struct seq_file *m, void *arg)
+{
+ unsigned long index = (unsigned long)m->private;
+ unsigned long nonres_bucket = nonresident_total() / BUCKETS;
+ unsigned long upper = ((unsigned long)index + 1) * nonres_bucket;
+ unsigned long lower = (unsigned long)index * nonres_bucket;
+ unsigned long hits = get_refault_stat(index);
+
+ if (index == 0)
+ seq_printf(m, " Refault distance Hits\n");
+
+ if (index < BUCKETS)
+ seq_printf(m, "%9lu - %9lu %9lu\n", lower, upper, hits);
+ else
+ seq_printf(m, " New/Beyond %9lu %9lu\n", lower, hits);
+
+ return 0;
+}
+
+struct seq_operations refault_op = {
+ .start = frag_start,
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = frag_show,
+};
+
+static void refault_reset(void)
+{
+ int cpu;
+ int bucket_id;
+
+ for_each_cpu(cpu) {
+ for (bucket_id = 0; bucket_id <= BUCKETS; ++bucket_id)
+ per_cpu(refault_histogram, cpu)[bucket_id] = 0;
+ }
+}
+
+ssize_t refault_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ if (count) {
+ char c;
+
+ if (get_user(c, buf))
+ return -EFAULT;
+ if (c == '0')
+ refault_reset();
+ }
+ return count;
+}
+
+#endif /* CONFIG_PROC_FS */
+
Index: linux-2.6/include/linux/nonresident-cart.h
===================================================================
--- linux-2.6.orig/include/linux/nonresident-cart.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/nonresident-cart.h 2006-07-12 16:09:24.000000000 +0200
@@ -15,7 +15,7 @@
#define NR_found 0x80000000
extern int nonresident_put(struct address_space *, unsigned long, int, int);
-extern int nonresident_get(struct address_space *, unsigned long);
+extern int nonresident_get(struct address_space *, unsigned long, int);
extern unsigned int nonresident_total(void);
extern void nonresident_init(void);
Index: linux-2.6/include/linux/nonresident.h
===================================================================
--- linux-2.6.orig/include/linux/nonresident.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/nonresident.h 2006-07-12 16:09:24.000000000 +0200
@@ -4,9 +4,9 @@
#ifdef __KERNEL__
extern void nonresident_init(void);
-extern unsigned long nonresident_get(struct address_space *, unsigned long);
+extern unsigned long nonresident_get(struct address_space *, unsigned long, int);
extern u32 nonresident_put(struct address_space *, unsigned long);
-extern unsigned long fastcall nonresident_total(void);
+extern unsigned long nonresident_total(void);
#endif /* __KERNEL */
#endif /* _LINUX_NONRESIDENT_H_ */
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 37/39] mm: use-once: cleanup of the use-once logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (35 preceding siblings ...)
2006-07-12 14:44 ` [PATCH 36/39] mm: refault histogram for non-resident policies Peter Zijlstra
@ 2006-07-12 14:44 ` Peter Zijlstra
2006-07-12 14:44 ` [PATCH 38/39] mm: use-once: use the generic shrinker logic Peter Zijlstra
` (2 subsequent siblings)
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:44 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Explicit use-once implementation.
Based on ideas and code from Rik van Riel.
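With this change a single reference no longer promotes an inactive
page: pages inserted through the use-once hint carry PG_new, and the
first reference observed while PG_new is set merely keeps the page on
the inactive list; only a later reference activates it. Roughly,
pgrep_reclaimable() reduces to the following decision table (a sketch
only; the page-flag plumbing and zone bookkeeping are omitted):

enum reclaim { RECLAIM_OK, RECLAIM_KEEP, RECLAIM_ACTIVATE };

struct page_state {
        int new;        /* PG_new: inserted via the use-once hint */
        int referenced; /* result of page_referenced() */
};

static enum reclaim reclaimable(struct page_state *p)
{
        int keep = p->new;

        p->new = 0;             /* the first scan consumes PG_new */
        if (p->referenced)
                return keep ? RECLAIM_KEEP      /* one touch: stay inactive */
                            : RECLAIM_ACTIVATE; /* touched again: activate */
        return RECLAIM_OK;                      /* untouched: reclaim it */
}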
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
include/linux/mm_use_once_policy.h | 70 +++++++++++--------------------------
mm/useonce.c | 7 ++-
2 files changed, 26 insertions(+), 51 deletions(-)
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:11:19.000000000 +0200
@@ -8,12 +8,17 @@
#include <linux/page-flags.h>
#define PG_active PG_reclaim1
+#define PG_new PG_reclaim2
#define PageActive(page) test_bit(PG_active, &(page)->flags)
#define SetPageActive(page) set_bit(PG_active, &(page)->flags)
#define ClearPageActive(page) clear_bit(PG_active, &(page)->flags)
#define __ClearPageActive(page) __clear_bit(PG_active, &(page)->flags)
+#define PageNew(page) test_bit(PG_new, &(page)->flags)
+#define SetPageNew(page) set_bit(PG_new, &(page)->flags)
+#define ClearPageNew(page) clear_bit(PG_new, &(page)->flags)
+
static inline void
add_page_to_active_list(struct zone *zone, struct page *page)
{
@@ -49,6 +54,7 @@ static inline void pgrep_hint_active(str
static inline void pgrep_hint_use_once(struct page *page)
{
+ SetPageNew(page);
}
static inline void
@@ -60,67 +66,31 @@ __pgrep_add(struct zone *zone, struct pa
add_page_to_inactive_list(zone, page);
}
-/*
- * Mark a page as having seen activity.
- *
- * inactive,unreferenced -> inactive,referenced
- * inactive,referenced -> active,unreferenced
- * active,unreferenced -> active,referenced
- */
static inline void pgrep_mark_accessed(struct page *page)
{
- if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
- struct zone *zone = page_zone(page);
-
- spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page) && !PageActive(page)) {
- del_page_from_inactive_list(zone, page);
- SetPageActive(page);
- add_page_to_active_list(zone, page);
- inc_page_state(pgactivate);
- }
- spin_unlock_irq(&zone->lru_lock);
- ClearPageReferenced(page);
- } else if (!PageReferenced(page)) {
+ if (!PageReferenced(page))
SetPageReferenced(page);
- }
-}
-
-/* Called without lock on whether page is mapped, so answer is unstable */
-static inline int page_mapping_inuse(struct page *page)
-{
- struct address_space *mapping;
-
- /* Page is in somebody's page tables. */
- if (page_mapped(page))
- return 1;
-
- /* Be more reluctant to reclaim swapcache than pagecache */
- if (PageSwapCache(page))
- return 1;
-
- mapping = page_mapping(page);
- if (!mapping)
- return 0;
-
- /* File is mmap'd by somebody? */
- return mapping_mapped(mapping);
}
static inline reclaim_t pgrep_reclaimable(struct page *page)
{
- int referenced;
+ int referenced, keep;
if (PageActive(page))
BUG();
referenced = page_referenced(page, 1, 0);
- /* In active use or really unfreeable? Activate it. */
- if (referenced && page_mapping_inuse(page))
- return RECLAIM_ACTIVATE;
- if (referenced)
- return RECLAIM_REFERENCED;
+ keep = PageNew(page);
+ if (keep)
+ ClearPageNew(page);
+
+ if (referenced) {
+ if (keep)
+ return RECLAIM_KEEP;
+
+ return RECLAIM_ACTIVATE;
+ }
return RECLAIM_OK;
}
@@ -143,12 +113,16 @@ static inline void pgrep_copy_state(stru
{
if (PageActive(spage))
SetPageActive(dpage);
+ if (PageNew(spage))
+ SetPageNew(dpage);
}
static inline void pgrep_clear_state(struct page *page)
{
if (PageActive(page))
ClearPageActive(page);
+ if (PageNew(page))
+ ClearPageNew(page);
}
static inline int pgrep_is_active(struct page *page)
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:19.000000000 +0200
@@ -172,6 +172,7 @@ static void shrink_active_list(unsigned
LIST_HEAD(l_active); /* Pages to go onto the active_list */
struct page *page;
struct pagevec pvec;
+ int referenced;
if (!sc->may_swap)
reclaim_mapped = 0;
@@ -186,10 +187,10 @@ static void shrink_active_list(unsigned
cond_resched();
page = lru_to_page(&l_hold);
list_del(&page->lru);
+ referenced = page_referenced(page, 0, 0);
if (page_mapped(page)) {
- if (!reclaim_mapped ||
- (total_swap_pages == 0 && PageAnon(page)) ||
- page_referenced(page, 0, 0)) {
+ if (referenced || !reclaim_mapped ||
+ (total_swap_pages == 0 && PageAnon(page))) {
list_add(&page->lru, &l_active);
continue;
}
* [PATCH 38/39] mm: use-once: use the generic shrinker logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (36 preceding siblings ...)
2006-07-12 14:44 ` [PATCH 37/39] mm: use-once: cleanup of the use-once logic Peter Zijlstra
@ 2006-07-12 14:44 ` Peter Zijlstra
2006-07-12 14:44 ` [PATCH 39/39] mm: use-once: cleanup of the insertion logic Peter Zijlstra
2006-07-13 15:38 ` [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Christoph Lameter
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:44 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Makes the use-once policy use the generic shrinker.
Based on ideas from Wu Fengguang.
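The active list is now scanned in proportion to the inactive list scans
instead of on its own schedule. The pressure is accounted in 1/1024th page
units so the active/inactive ratio survives integer division (a sketch of
the arithmetic used below, with the do_div() spelled as plain division):

	/* credit earned per inactive-list scan of nr_to_scan pages */
	zone->policy.nr_scan_active +=
		nr_to_scan * zone->policy.nr_active * 1024ULL /
		(zone->policy.nr_inactive + nr_taken + 1);

	/* credit consumed in SWAP_CLUSTER_MAX sized batches */
	while (zone->policy.nr_scan_active >= SWAP_CLUSTER_MAX * 1024UL) {
		zone->policy.nr_scan_active -= SWAP_CLUSTER_MAX * 1024UL;
		shrink_active_list(SWAP_CLUSTER_MAX, zone, reclaim_mapped);
	}

With equally sized lists this earns roughly one SWAP_CLUSTER_MAX batch of
active scanning for every SWAP_CLUSTER_MAX inactive pages scanned.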
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
include/linux/mm_use_once_data.h | 2
include/linux/mm_use_once_policy.h | 5 -
mm/useonce.c | 156 +++++++++----------------------------
3 files changed, 45 insertions(+), 118 deletions(-)
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:10:56.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:11:07.000000000 +0200
@@ -16,7 +16,7 @@ void __init pgrep_init_zone(struct zone
INIT_LIST_HEAD(&zone->policy.active_list);
INIT_LIST_HEAD(&zone->policy.inactive_list);
zone->policy.nr_scan_active = 0;
- zone->policy.nr_scan_inactive = 0;
+ zone->policy.nr_scan = 0;
zone->policy.nr_active = 0;
zone->policy.nr_inactive = 0;
}
@@ -78,73 +78,6 @@ void pgrep_reinsert(struct list_head *pa
}
}
/*
- * shrink_inactive_list() is a helper for shrink_zone(). It returns the number
- * of reclaimed pages
- */
-static unsigned long shrink_inactive_list(unsigned long max_scan,
- struct zone *zone, struct scan_control *sc)
-{
- LIST_HEAD(page_list);
- struct pagevec pvec;
- unsigned long nr_scanned = 0;
- unsigned long nr_reclaimed = 0;
- pagevec_init(&pvec, 1);
-
- pgrep_add_drain();
- spin_lock_irq(&zone->lru_lock);
- do {
- struct page *page;
- unsigned long nr_taken;
- unsigned long nr_scan;
- unsigned long nr_freed;
-
- nr_taken = isolate_lru_pages(zone, sc->swap_cluster_max,
- &zone->policy.inactive_list,
- &page_list, &nr_scan);
- spin_unlock_irq(&zone->lru_lock);
-
- nr_scanned += nr_scan;
- nr_freed = shrink_page_list(&page_list, sc);
- nr_reclaimed += nr_freed;
- local_irq_disable();
- if (current_is_kswapd()) {
- __mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
- __mod_page_state(kswapd_steal, nr_freed);
- } else
- __mod_page_state_zone(zone, pgscan_direct, nr_scan);
- __mod_page_state_zone(zone, pgsteal, nr_freed);
-
- if (nr_taken == 0)
- goto done;
-
- spin_lock(&zone->lru_lock);
- /*
- * Put back any unfreeable pages.
- */
- while (!list_empty(&page_list)) {
- page = lru_to_page(&page_list);
- BUG_ON(PageLRU(page));
- SetPageLRU(page);
- list_del(&page->lru);
- if (PageActive(page))
- add_page_to_active_list(zone, page);
- else
- add_page_to_inactive_list(zone, page);
- if (!pagevec_add(&pvec, page)) {
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- } while (nr_scanned < max_scan);
- spin_unlock(&zone->lru_lock);
-done:
- local_irq_enable();
- pagevec_release(&pvec);
- return nr_reclaimed;
-}
-
-/*
* This moves pages from the active list to the inactive list.
*
* We move them the other way if the page is referenced by one or more
@@ -162,7 +95,7 @@ done:
* But we had to alter page->flags anyway.
*/
static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
- struct scan_control *sc, int reclaim_mapped)
+ int reclaim_mapped)
{
unsigned long pgmoved;
int pgdeactivate = 0;
@@ -174,9 +107,6 @@ static void shrink_active_list(unsigned
struct pagevec pvec;
int referenced;
- if (!sc->may_swap)
- reclaim_mapped = 0;
-
pgrep_add_drain();
spin_lock_irq(&zone->lru_lock);
pgmoved = isolate_lru_pages(zone, nr_pages, &zone->policy.active_list,
@@ -257,59 +187,53 @@ static void shrink_active_list(unsigned
pagevec_release(&pvec);
}
-/*
- * This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
- */
-unsigned long pgrep_shrink_zone(int priority, struct zone *zone,
- struct scan_control *sc)
+void __pgrep_get_candidates(struct zone *zone, int priority,
+ unsigned long nr_to_scan, struct list_head *pages,
+ unsigned long *nr_scanned)
{
- unsigned long nr_active;
- unsigned long nr_inactive;
- unsigned long nr_to_scan;
- unsigned long nr_reclaimed = 0;
- int reclaim_mapped = should_reclaim_mapped(zone);
+ unsigned long nr_taken;
+ unsigned long long nr_scan_active;
- atomic_inc(&zone->reclaim_in_progress);
+ nr_taken = isolate_lru_pages(zone, nr_to_scan,
+ &zone->policy.inactive_list, pages, nr_scanned);
- /*
- * Add one to `nr_to_scan' just to make sure that the kernel will
- * slowly sift through the active list.
- */
- zone->policy.nr_scan_active += (zone->policy.nr_active >> priority) + 1;
- nr_active = zone->policy.nr_scan_active;
- if (nr_active >= sc->swap_cluster_max)
- zone->policy.nr_scan_active = 0;
- else
- nr_active = 0;
+ nr_scan_active = nr_to_scan * zone->policy.nr_active * 1024ULL;
+ do_div(nr_scan_active, zone->policy.nr_inactive + nr_taken + 1UL);
+ zone->policy.nr_scan_active += nr_scan_active;
+}
- zone->policy.nr_scan_inactive += (zone->policy.nr_inactive >> priority) + 1;
- nr_inactive = zone->policy.nr_scan_inactive;
- if (nr_inactive >= sc->swap_cluster_max)
- zone->policy.nr_scan_inactive = 0;
- else
- nr_inactive = 0;
+void pgrep_put_candidates(struct zone *zone, struct list_head *pages,
+ unsigned long nr_freed, int may_swap)
+{
+ int reclaim_mapped = should_reclaim_mapped(zone);
+ struct pagevec pvec;
- while (nr_active || nr_inactive) {
- if (nr_active) {
- nr_to_scan = min(nr_active,
- (unsigned long)sc->swap_cluster_max);
- nr_active -= nr_to_scan;
- shrink_active_list(nr_to_scan, zone, sc, reclaim_mapped);
- }
+ pagevec_init(&pvec, 1);
- if (nr_inactive) {
- nr_to_scan = min(nr_inactive,
- (unsigned long)sc->swap_cluster_max);
- nr_inactive -= nr_to_scan;
- nr_reclaimed += shrink_inactive_list(nr_to_scan, zone,
- sc);
+ spin_lock_irq(&zone->lru_lock);
+ while (!list_empty(pages)) {
+ struct page *page = lru_to_page(pages);
+ BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ list_del(&page->lru);
+ if (PageActive(page))
+ add_page_to_active_list(zone, page);
+ else
+ add_page_to_inactive_list(zone, page);
+ if (!pagevec_add(&pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
}
}
+ spin_unlock_irq(&zone->lru_lock);
- throttle_vm_writeout();
+ pagevec_release(&pvec);
- atomic_dec(&zone->reclaim_in_progress);
- return nr_reclaimed;
+ while (zone->policy.nr_scan_active >= SWAP_CLUSTER_MAX * 1024UL) {
+ zone->policy.nr_scan_active -= SWAP_CLUSTER_MAX * 1024UL;
+ shrink_active_list(SWAP_CLUSTER_MAX, zone, reclaim_mapped);
+ }
}
#define K(x) ((x) << (PAGE_SHIFT-10))
@@ -359,7 +283,7 @@ void pgrep_zoneinfo(struct zone *zone, s
zone->policy.nr_active,
zone->policy.nr_inactive,
zone->pages_scanned,
- zone->policy.nr_scan_active, zone->policy.nr_scan_inactive,
+ zone->policy.nr_scan_active / 1024, zone->policy.nr_scan,
zone->spanned_pages,
zone->present_pages);
}
Index: linux-2.6/include/linux/mm_use_once_policy.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_policy.h 2006-07-12 16:10:56.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_policy.h 2006-07-12 16:10:56.000000000 +0200
@@ -147,7 +147,10 @@ static inline unsigned long __pgrep_nr_p
return zone->policy.nr_active + zone->policy.nr_inactive;
}
-#define MM_POLICY_HAS_SHRINKER
+static inline unsigned long __pgrep_nr_scan(struct zone *zone)
+{
+ return zone->policy.nr_inactive;
+}
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_USEONCE_POLICY_H */
Index: linux-2.6/include/linux/mm_use_once_data.h
===================================================================
--- linux-2.6.orig/include/linux/mm_use_once_data.h 2006-07-12 16:09:19.000000000 +0200
+++ linux-2.6/include/linux/mm_use_once_data.h 2006-07-12 16:10:56.000000000 +0200
@@ -7,7 +7,7 @@ struct pgrep_data {
struct list_head active_list;
struct list_head inactive_list;
unsigned long nr_scan_active;
- unsigned long nr_scan_inactive;
+ unsigned long nr_scan;
unsigned long nr_active;
unsigned long nr_inactive;
};
* [PATCH 39/39] mm: use-once: cleanup of the insertion logic
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (37 preceding siblings ...)
2006-07-12 14:44 ` [PATCH 38/39] mm: use-once: use the generic shrinker logic Peter Zijlstra
@ 2006-07-12 14:44 ` Peter Zijlstra
2006-07-13 15:38 ` [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Christoph Lameter
39 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-12 14:44 UTC (permalink / raw)
To: linux-mm; +Cc: Peter Zijlstra
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Make the use-once policy use only a single per-CPU pagevec for insertion.
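With the separate active pagevec gone, the whole insertion path reduces to
one batched per-CPU buffer (the resulting code, reconstructed from the
hunks below, with comments added):

	static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };

	void fastcall pgrep_add(struct page *page)
	{
		struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);

		page_cache_get(page);		/* hold a reference while parked */
		if (!pagevec_add(pvec, page))	/* pagevec full: spill to zone */
			__pagevec_pgrep_add(pvec);
		put_cpu_var(lru_add_pvecs);
	}

pgrep_add() no longer branches on PageActive; placement on the active vs.
inactive list is left to __pagevec_pgrep_add() and the policy's add hook.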
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
mm/useonce.c | 24 +-----------------------
1 file changed, 1 insertion(+), 23 deletions(-)
Index: linux-2.6/mm/useonce.c
===================================================================
--- linux-2.6.orig/mm/useonce.c 2006-07-12 16:10:56.000000000 +0200
+++ linux-2.6/mm/useonce.c 2006-07-12 16:10:56.000000000 +0200
@@ -26,9 +26,8 @@ void __init pgrep_init_zone(struct zone
* @page: the page to add
*/
static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };
-static DEFINE_PER_CPU(struct pagevec, lru_add_active_pvecs) = { 0, };
-static inline void lru_cache_add(struct page *page)
+void fastcall pgrep_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);
@@ -38,33 +37,12 @@ static inline void lru_cache_add(struct
put_cpu_var(lru_add_pvecs);
}
-static inline void lru_cache_add_active(struct page *page)
-{
- struct pagevec *pvec = &get_cpu_var(lru_add_active_pvecs);
-
- page_cache_get(page);
- if (!pagevec_add(pvec, page))
- __pagevec_pgrep_add(pvec);
- put_cpu_var(lru_add_active_pvecs);
-}
-
-void fastcall pgrep_add(struct page *page)
-{
- if (PageActive(page))
- lru_cache_add_active(page);
- else
- lru_cache_add(page);
-}
-
void __pgrep_add_drain(unsigned int cpu)
{
struct pagevec *pvec = &per_cpu(lru_add_pvecs, cpu);
if (pagevec_count(pvec))
__pagevec_pgrep_add(pvec);
- pvec = &per_cpu(lru_add_active_pvecs, cpu);
- if (pagevec_count(pvec))
- __pagevec_pgrep_add(pvec);
}
void pgrep_reinsert(struct list_head *page_list)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies
2006-07-12 14:36 [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Peter Zijlstra
` (38 preceding siblings ...)
2006-07-12 14:44 ` [PATCH 39/39] mm: use-once: cleanup of the insertion logic Peter Zijlstra
@ 2006-07-13 15:38 ` Christoph Lameter
2006-07-15 17:03 ` Peter Zijlstra
39 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2006-07-13 15:38 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm
On Wed, 12 Jul 2006, Peter Zijlstra wrote:
> with OLS around the corner, I thought I'd repost all my page-replacement work
> so people can get a quick peek at the current status.
> This should help discussion next week.
Ummm... Some high-level discussion on what you are doing here and why
would be helpful.
* Re: [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies
2006-07-13 15:38 ` [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies Christoph Lameter
@ 2006-07-15 17:03 ` Peter Zijlstra
2006-07-16 3:50 ` Christoph Lameter
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2006-07-15 17:03 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm
On Thu, 2006-07-13 at 08:38 -0700, Christoph Lameter wrote:
> On Wed, 12 Jul 2006, Peter Zijlstra wrote:
>
> > with OLS around the corner, I thought I'd repost all my page-replacement work
> > so people can get a quick peek at the current status.
> > This should help discussion next week.
>
> Ummm... Some high-level discussion on what you are doing here and why
> would be helpful.
Sorry for the late reply.
The page replacement framework takes away all knowledge of the page
replacement implementation from the rest of the kernel. That is, it
takes out all direct references to the active_list and inactive_list and
manipulations thereon, and replaces them with the following functions:
   pgrep_hint_*()
         |
         v
    pgrep_add()
         |
         +--> pgrep_get_candidates()
         |          |
         |          v
         |     pgrep_reclaimable() --[referenced]--> pgrep_reinsert()
         |          |
         |          v
         |     [pgrep_activate()]
         |          |
         |          v
         |     pgrep_put_candidates() --(back onto the lists)
         |
         '--> pgrep_remove()
                    |
                    v
              pgrep_clear_state()
                    |
                    v
              [pgrep_remember()]
(There are some more functions, but this shows the main flow)
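Concretely, the main hooks have the following shapes in the use-once
policy from this series (signatures taken from the patches above,
abbreviated; __pgrep_get_candidates() shown in place of its wrapper):

	void fastcall pgrep_add(struct page *page);
	void __pgrep_get_candidates(struct zone *zone, int priority,
			unsigned long nr_to_scan, struct list_head *pages,
			unsigned long *nr_scanned);
	reclaim_t pgrep_reclaimable(struct page *page);
	void pgrep_reinsert(struct list_head *page_list);
	void pgrep_put_candidates(struct zone *zone, struct list_head *pages,
			unsigned long nr_freed, int may_swap);
	void pgrep_clear_state(struct page *page);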
The patch-set then goes on to re-implement all this four more times.
(Admittedly this is a bit excessive, but it has been much fun to do, and
it has made sure the abstraction is powerful enough to cope with very
different approaches to page reclaim.)
Now on the why: I still believe one of the advanced page replacement
algorithms is better than the one currently implemented, if only because
they have access to more information, namely that provided by the
nonresident page tracking. (Which, as shown by Rik's OLS entry this
year, has more interesting uses as well.)
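(To make that concrete, a purely hypothetical sketch -- the helper names
below are invented for illustration, they are not the API these patches
add: on a refault, a policy could estimate how long ago the page was
evicted and activate it right away when the reuse distance says it would
have stayed resident on a somewhat larger inactive list.)

	/* hypothetical refault hook; nonresident_distance() and
	 * nr_resident_pages() are invented names */
	static void refault_hint(struct address_space *mapping,
				 unsigned long offset, struct page *page)
	{
		long distance = nonresident_distance(mapping, offset);

		if (distance >= 0 && distance < nr_resident_pages())
			pgrep_hint_active(page);
	}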
I hope this answers enough of your questions; don't hesitate to ask more.
* Re: [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies
2006-07-15 17:03 ` Peter Zijlstra
@ 2006-07-16 3:50 ` Christoph Lameter
2006-07-26 10:03 ` Marcelo Tosatti
0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2006-07-16 3:50 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm
On Sat, 15 Jul 2006, Peter Zijlstra wrote:
> Now on the why, I still believe one of the advanced page replacement
> algorithms are better than the currently implemented. If only because
> they have access to more information, namely that provided by the
> nonresident page tracking. (Which, as shown by Rik's OLS entry this
> year, provides more interresting uses)
Could you show us some workloads where this makes a significant
difference?
* Re: [PATCH 0/39] mm: 2.6.17-pr1 - generic page-replacement framework and 4 new policies
2006-07-16 3:50 ` Christoph Lameter
@ 2006-07-26 10:03 ` Marcelo Tosatti
0 siblings, 0 replies; 44+ messages in thread
From: Marcelo Tosatti @ 2006-07-26 10:03 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Peter Zijlstra, linux-mm
On Sat, Jul 15, 2006 at 08:50:06PM -0700, Christoph Lameter wrote:
> On Sat, 15 Jul 2006, Peter Zijlstra wrote:
>
> > Now on the why, I still believe one of the advanced page replacement
> > algorithms are better than the currently implemented. If only because
> > they have access to more information, namely that provided by the
> > nonresident page tracking. (Which, as shown by Rik's OLS entry this
> > year, provides more interresting uses)
>
> Could you show us some workloads where this makes a significant
> difference?
http://www.linux-mm.org/PageReplacementTesting for instance.
Check the CLOCK-Pro/ARC papers for more details.