linux-mm.kvack.org archive mirror
* [RFC][PATCH] using page aging to shrink caches
@ 2002-05-18  4:10 Ed Tomlinson
  2002-05-21 18:47 ` Benjamin LaHaise
  0 siblings, 1 reply; 8+ messages in thread
From: Ed Tomlinson @ 2002-05-18  4:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

Hi,

I have never been happy with the way slab cache shrinking worked.  This is an
attempt to make it better.  Working with the rmap vm on pre7-ac2, I have done
the following.

1. Moved reapable slab pages on to the active and inactive dirty lists.
2. When slab pages enter the inactive dirty list I count the number of pages
    seen on a per cache basis.
3. If slab pages manage to reach the front of the inactive dirty list I count
    the pages seen on a per cache basis.
4. After refill_inactive/refill_inactive_zone calls I scan the slab caches and,
    via a callback, shrink the caches using the number of pages from 2 & 3
    as a goal for each cache.
5. kmem_cache_reap is called as a last-ditch effort before declaring an oom.
6. Since slab pages are kernel-mapped via 8M pages on i386, the hardware
    page reference bit is pretty much useless for them.  When a slab is created its
    pages are marked as referenced.  I also mark pages in the lookup/get functions
    for inodes, dentries, dquots and buffer_heads.

A few comments.

#1 avoids the need to create mappings for the slab pages by intercepting them
in page_launder_zone and refill_inactive_zone before we start really playing
with the pages.

#2 was done to avoid having 500,000-plus dentries/inodes on a lightly loaded
system.  It seems that with low vm pressure rmap can supply enough free pages via
background aging.  This path avoids page_launder but does call refill_inactive...

#4 ends up using kmem_cache_shrink_nr to shrink caches.  For the
caches that require pruning this call gets wrapped.  The calling sequence
for icache, dcache and dquot shrinking now only prunes when it cannot
get enough pages via a simple shrink.
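
For example, the wrapped dcache shrinker ends up looking roughly like this (see the
fs/dcache.c hunk below); the icache and dquot shrinkers follow the same pattern:

    int shrink_dcache_memory(kmem_cache_t *cachep, int pages,
                             int priority, int gfp_mask)
    {
            /* first see how many pages a simple shrink gives back */
            int count = kmem_cache_shrink_nr(cachep, pages);

            if (!(gfp_mask & __GFP_FS))
                    return count;           /* deadlock avoidance */

            /* prune entries only if the simple shrink fell short */
            if (count < pages) {
                    prune_dcache(dentry_stat.nr_unused / priority);
                    count += kmem_cache_shrink_nr(cachep, pages - count);
            }
            return count;
    }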

Comments?
Ed Tomlinson

PS.  I may not see replies until Tuesday.

-----------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.422   -> 1.432  
#	         fs/buffer.c	1.63    -> 1.64   
#	         fs/dcache.c	1.18    -> 1.24   
#	          fs/dquot.c	1.18    -> 1.22   
#	         mm/vmscan.c	1.60    -> 1.68   
#	include/linux/slab.h	1.9     -> 1.14   
#	include/linux/dcache.h	1.11    -> 1.14   
#	           mm/slab.c	1.16    -> 1.22   
#	          fs/inode.c	1.35    -> 1.41   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/05/10	ed@oscar.et.ca	1.423
# Use the vm's page aging to tell us when we need to shrink the caches.
# The vm uses callbacks to tell the slab caches it's time to shrink.
# --------------------------------------------
# 02/05/10	ed@oscar.et.ca	1.424
# Change the way process_shrinks is called so refill_inactive does not
# need to be changed.
# --------------------------------------------
# 02/05/11	ed@oscar.et.ca	1.426
# Simplify the scheme.  Use per cache callbacks instead of per family.
# This lets us target specific caches instead of being generic.  We
# still include a generic call (kmem_cache_reap) as a failsafe
# before ooming.
# --------------------------------------------
# 02/05/11	ed@oscar.et.ca	1.428
# Change factoring, removing changes from background_aging and putting
# the kmem_call_shrinkers call in kswapd.
# --------------------------------------------
# 02/05/11	ed@oscar.et.ca	1.429
# Base the number of pages a cache is shrunk on the number of pages the
# vm sees in refill_inactive_zone instead of on a magic priority.
# --------------------------------------------
# 02/05/12	ed@oscar.et.ca	1.430
# improve shrink methods for dcache, dquota and icache
# --------------------------------------------
# 02/05/12	ed@oscar.et.ca	1.428.1.1
# The icache is a slave of the dcache.  We will not reuse the inodes so
# let's clean them all.
# --------------------------------------------
# 02/05/12	ed@oscar.et.ca	1.428.1.2
# Only call shrink callback if we have seen a slab's worth of pages
# --------------------------------------------
# 02/05/13	ed@oscar.et.ca	1.428.1.3
# Andrew Morton pointed out that kernel pages are big (8M) and the
# hardware reference bit works on these big pages.  This makes
# aging slabs on 4K pages a little more difficult.  Andrew suggested
# hooking into the kmem_cache_alloc path and setting the bit(s) there.
# This changeset does that.
# --------------------------------------------
# 02/05/16	ed@oscar.et.ca	1.428.1.6
# Improve aging for dcache, inode and dquota pages by setting the ref
# bits in the various lookup/get code.
# --------------------------------------------
# 02/05/16	ed@oscar.et.ca	1.428.1.7
# Add another kmem_touch_page in getblk for buffer_heads.  Convert
# kmem_touch_page to a macro.
# --------------------------------------------
# 02/05/17	ed@oscar.et.ca	1.432
# Use the number of slab pages that enter or are requeued in the 
# inactive dirty list as the goal for the number of pages to shrink 
# a slab cache.  Each cache can have its own shrink callback, though
# only caches that need 'pruning' require specialized functions.
# --------------------------------------------
#
diff -Nru a/fs/buffer.c b/fs/buffer.c
--- a/fs/buffer.c	Fri May 17 23:36:37 2002
+++ b/fs/buffer.c	Fri May 17 23:36:37 2002
@@ -1059,8 +1059,10 @@
 		struct buffer_head * bh;
 
 		bh = get_hash_table(dev, block, size);
-		if (bh)
+		if (bh) {
+			kmem_touch_page(bh);
 			return bh;
+		}
 
 		if (!grow_buffers(dev, block, size))
 			free_more_memory();
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c	Fri May 17 23:36:37 2002
+++ b/fs/dcache.c	Fri May 17 23:36:37 2002
@@ -538,18 +538,11 @@
 
 /*
  * This is called from kswapd when we think we need some
- * more memory, but aren't really sure how much. So we
- * carefully try to free a _bit_ of our dcache, but not
- * too much.
- *
- * Priority:
- *   0 - very urgent: shrink everything
- *  ...
- *   6 - base-level: try to shrink a bit.
+ * more memory. 
  */
-int shrink_dcache_memory(int priority, unsigned int gfp_mask)
+int shrink_dcache_memory(kmem_cache_t *cachep, int pages, int priority, int gfp_mask)
 {
-	int count = 0;
+	int count = kmem_cache_shrink_nr(cachep, pages);
 
 	/*
 	 * Nasty deadlock avoidance.
@@ -563,12 +556,13 @@
 	 * block allocations, but for now:
 	 */
 	if (!(gfp_mask & __GFP_FS))
-		return 0;
+		return count;
 
-	count = dentry_stat.nr_unused / priority;
-
-	prune_dcache(count);
-	return kmem_cache_shrink_nr(dentry_cache);
+	if (count < pages) {
+		prune_dcache(dentry_stat.nr_unused/priority);
+		count += kmem_cache_shrink_nr(cachep, pages-count);
+	}
+	return count;
 }
 
 #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
@@ -730,6 +724,7 @@
 		}
 		__dget_locked(dentry);
 		dentry->d_vfs_flags |= DCACHE_REFERENCED;
+		kmem_touch_page(dentry);
 		spin_unlock(&dcache_lock);
 		return dentry;
 	}
@@ -1186,6 +1181,8 @@
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
 
+	kmem_set_shrinker(dentry_cache, (shrinker_t)shrink_dcache_memory);
+
 #if PAGE_SHIFT < 13
 	mempages >>= (13 - PAGE_SHIFT);
 #endif
@@ -1278,6 +1275,9 @@
 			SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!dquot_cachep)
 		panic("Cannot create dquot SLAB cache");
+	
+	kmem_set_shrinker(dquot_cachep, (shrinker_t)shrink_dqcache_memory);
+	
 #endif
 
 	dcache_init(mempages);
diff -Nru a/fs/dquot.c b/fs/dquot.c
--- a/fs/dquot.c	Fri May 17 23:36:37 2002
+++ b/fs/dquot.c	Fri May 17 23:36:37 2002
@@ -1024,12 +1024,17 @@
 	}
 }
 
-int shrink_dqcache_memory(int priority, unsigned int gfp_mask)
+int shrink_dqcache_memory(kmem_cache_t *cachep, int pages, int priority, unsigned int gfp_mask)
 {
-	lock_kernel();
-	prune_dqcache(nr_free_dquots / (priority + 1));
-	unlock_kernel();
-	return kmem_cache_shrink_nr(dquot_cachep);
+	int count = kmem_cache_shrink_nr(cachep, pages);
+
+	if (count < pages) {
+		lock_kernel();
+		prune_dqcache(nr_free_dquots);
+		unlock_kernel();
+		count += kmem_cache_shrink_nr(cachep, pages-count);
+	}
+	return count;
 }
 
 /*
@@ -1148,6 +1153,7 @@
 #endif
 	dquot->dq_referenced++;
 	dqstats.lookups++;
+	kmem_touch_page(dquot);
 
 	return dquot;
 }
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Fri May 17 23:36:37 2002
+++ b/fs/inode.c	Fri May 17 23:36:37 2002
@@ -193,6 +193,7 @@
 
 static inline void __iget(struct inode * inode)
 {
+	kmem_touch_page(inode);
 	if (atomic_read(&inode->i_count)) {
 		atomic_inc(&inode->i_count);
 		return;
@@ -708,9 +709,9 @@
 		schedule_task(&unused_inodes_flush_task);
 }
 
-int shrink_icache_memory(int priority, int gfp_mask)
+int shrink_icache_memory(kmem_cache_t *cachep, int pages, int priority, int gfp_mask)
 {
-	int count = 0;
+	int count = kmem_cache_shrink_nr(cachep, pages);
 
 	/*
 	 * Nasty deadlock avoidance..
@@ -720,12 +721,13 @@
 	 * in clear_inode() and friends..
 	 */
 	if (!(gfp_mask & __GFP_FS))
-		return 0;
+		return count;
 
-	count = inodes_stat.nr_unused / priority;
-
-	prune_icache(count);
-	return kmem_cache_shrink_nr(inode_cachep);
+	if (count < pages) {
+		prune_icache(inodes_stat.nr_unused);
+		count += kmem_cache_shrink_nr(cachep, pages-count);
+	}
+	return count;
 }
 
 /*
@@ -1172,6 +1174,8 @@
 					 NULL);
 	if (!inode_cachep)
 		panic("cannot create inode slab cache");
+
+	kmem_set_shrinker(inode_cachep, (shrinker_t)shrink_icache_memory);
 
 	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
 }
diff -Nru a/include/linux/dcache.h b/include/linux/dcache.h
--- a/include/linux/dcache.h	Fri May 17 23:36:37 2002
+++ b/include/linux/dcache.h	Fri May 17 23:36:37 2002
@@ -171,15 +171,10 @@
 #define shrink_dcache() prune_dcache(0)
 struct zone_struct;
 /* dcache memory management */
-extern int shrink_dcache_memory(int, unsigned int);
 extern void prune_dcache(int);
 
 /* icache memory management (defined in linux/fs/inode.c) */
-extern int shrink_icache_memory(int, int);
 extern void prune_icache(int);
-
-/* quota cache memory management (defined in linux/fs/dquot.c) */
-extern int shrink_dqcache_memory(int, unsigned int);
 
 /* only used at mount-time */
 extern struct dentry * d_alloc_root(struct inode *);
diff -Nru a/include/linux/slab.h b/include/linux/slab.h
--- a/include/linux/slab.h	Fri May 17 23:36:37 2002
+++ b/include/linux/slab.h	Fri May 17 23:36:37 2002
@@ -55,7 +55,27 @@
 				       void (*)(void *, kmem_cache_t *, unsigned long));
 extern int kmem_cache_destroy(kmem_cache_t *);
 extern int kmem_cache_shrink(kmem_cache_t *);
-extern int kmem_cache_shrink_nr(kmem_cache_t *);
+
+typedef int (*shrinker_t)(kmem_cache_t *, int, int, int);
+
+extern void kmem_set_shrinker(kmem_cache_t *, shrinker_t);
+extern int kmem_call_shrinkers(int, int);
+extern void kmem_count_page(struct page *);
+#define kmem_touch_page(addr) 		SetPageReferenced(virt_to_page(addr));
+
+/* shrink drivers */
+extern int kmem_shrink_pages(kmem_cache_t *, int, int, int);
+
+/* dcache shrinker ( defined in linux/fs/dcache.c) */
+extern int shrink_dcache_memory(kmem_cache_t *, int, int, int);
+
+/* icache shrinker (defined in linux/fs/inode.c) */
+extern int shrink_icache_memory(kmem_cache_t *, int, int, int);
+
+/* quota cache shrinker (defined in linux/fs/dquot.c) */
+extern int shrink_dqcache_memory(kmem_cache_t *, int, int, int);
+
+extern int kmem_cache_shrink_nr(kmem_cache_t *, int);
 extern void *kmem_cache_alloc(kmem_cache_t *, int);
 extern void kmem_cache_free(kmem_cache_t *, void *);
 
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c	Fri May 17 23:36:37 2002
+++ b/mm/slab.c	Fri May 17 23:36:37 2002
@@ -213,6 +213,8 @@
 	kmem_cache_t		*slabp_cache;
 	unsigned int		growing;
 	unsigned int		dflags;		/* dynamic flags */
+	shrinker_t		shrinker;	/* shrink callback */
+	int 			count;		/* count used to trigger shrink */
 
 	/* constructor func */
 	void (*ctor)(void *, kmem_cache_t *, unsigned long);
@@ -382,6 +384,54 @@
 static void enable_cpucache (kmem_cache_t *cachep);
 static void enable_all_cpucaches (void);
 #endif
+ 
+/* set the shrink call back function */
+void kmem_set_shrinker(kmem_cache_t * cachep, shrinker_t theshrinker) 
+{
+	cachep->shrinker = theshrinker;
+}
+
+/* used by refill_inactive_zone to determine caches that need shrinking */
+void kmem_count_page(struct page *page)
+{
+	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+	cachep->count++;
+}
+
+/* call the shrink functions */
+int kmem_call_shrinkers(int priority, int gfp_mask) 
+{
+	int ret = 0;
+	struct list_head *p;
+
+        if (gfp_mask & __GFP_WAIT)
+                down(&cache_chain_sem);
+        else
+                if (down_trylock(&cache_chain_sem))
+                        return 0;
+
+        list_for_each(p,&cache_chain) {
+                kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
+		int pgs = (1<<cachep->gfporder);
+		if (cachep->count >= pgs) {
+			if (cachep->shrinker == NULL)
+				BUG();
+			pgs = pgs*(cachep->count+pgs-1)/pgs;
+			ret += (*cachep->shrinker)(cachep, pgs, priority, gfp_mask);
+			cachep->count = 0;
+		}		
+        }
+        up(&cache_chain_sem);
+	return ret;
+}
+
+
+/* Default shrink method - try to shrink the pages requested  */
+int kmem_shrink_pages(kmem_cache_t * cachep, int pages, int priority, int gfp_mask) 
+{
+	return kmem_cache_shrink_nr(cachep, pages);
+}
+
 
 /* Cal the num objs, wastage, and bytes left over for a given slab size. */
 static void kmem_cache_estimate (unsigned long gfporder, size_t size,
@@ -514,12 +564,31 @@
 	 * vm_scan(). Shouldn't be a worry.
 	 */
 	while (i--) {
+		if (!(cachep->flags & SLAB_NO_REAP))
+			lru_cache_del(page);
 		PageClearSlab(page);
 		page++;
 	}
 	free_pages((unsigned long)addr, cachep->gfporder);
 }
 
+/*
+ * kernel pages are 8M so 4k page ref bit is not set - we need to
+ * do it manually...
+ */
+void kmem_set_referenced(kmem_cache_t *cachep, slab_t *slabp)
+{
+        if (!(cachep->flags & SLAB_NO_REAP)) {
+        	unsigned long i = (1<<cachep->gfporder);
+        	struct page *page = virt_to_page(slabp->s_mem-slabp->colouroff);
+        	while (i--) {
+			SetPageReferenced(page);
+                	page++;
+		}
+        }
+}
+
+
 #if DEBUG
 static inline void kmem_poison_obj (kmem_cache_t *cachep, void *addr)
 {
@@ -781,6 +850,8 @@
 		flags |= CFLGS_OPTIMIZE;
 
 	cachep->flags = flags;
+	cachep->shrinker = ( shrinker_t)(kmem_shrink_pages);
+	cachep->count = 0;
 	cachep->gfpflags = 0;
 	if (flags & SLAB_CACHE_DMA)
 		cachep->gfpflags |= GFP_DMA;
@@ -912,8 +983,9 @@
 
 /**
  * Called with the &cachep->spinlock held, returns number of slabs released
+ * Use 0 to release all the slabs we can.
  */
-static int __kmem_cache_shrink_locked(kmem_cache_t *cachep)
+static int __kmem_cache_shrink_locked(kmem_cache_t *cachep, int slabs)
 {
         slab_t *slabp;
         int ret = 0;
@@ -935,8 +1007,10 @@
 
                 spin_unlock_irq(&cachep->spinlock);
                 kmem_slab_destroy(cachep, slabp);
-		ret++;
                 spin_lock_irq(&cachep->spinlock);
+
+		if (++ret == slabs)
+			break;
         }
         return ret;
 }
@@ -948,7 +1022,7 @@
 	drain_cpu_caches(cachep);
 
 	spin_lock_irq(&cachep->spinlock);
-	__kmem_cache_shrink_locked(cachep);
+	__kmem_cache_shrink_locked(cachep, 0);
 	ret = !list_empty(&cachep->slabs_full) || !list_empty(&cachep->slabs_partial);
 	spin_unlock_irq(&cachep->spinlock);
 	return ret;
@@ -972,7 +1046,7 @@
 /**
  * kmem_cache_shrink_nr - Shrink a cache returning pages released
  */
-int kmem_cache_shrink_nr(kmem_cache_t *cachep)
+int kmem_cache_shrink_nr(kmem_cache_t *cachep, int pages)
 {
         int ret;
 
@@ -982,7 +1056,7 @@
 	drain_cpu_caches(cachep);
 
 	spin_lock_irq(&cachep->spinlock);
-	ret = __kmem_cache_shrink_locked(cachep);
+	ret = __kmem_cache_shrink_locked(cachep, pages>>cachep->gfporder);
 	spin_unlock_irq(&cachep->spinlock);
 	return ret<<(cachep->gfporder);
 }
@@ -1184,6 +1258,8 @@
 		SET_PAGE_CACHE(page, cachep);
 		SET_PAGE_SLAB(page, slabp);
 		PageSetSlab(page);
+		if (!(cachep->flags & SLAB_NO_REAP))
+			lru_cache_add(page);
 		page++;
 	} while (--i);
 
@@ -1265,6 +1341,7 @@
 		list_del(&slabp->list);
 		list_add(&slabp->list, &cachep->slabs_full);
 	}
+	kmem_set_referenced(cachep, slabp);
 #if DEBUG
 	if (cachep->flags & SLAB_POISON)
 		if (kmem_check_poison_obj(cachep, objp))
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	Fri May 17 23:36:37 2002
+++ b/mm/vmscan.c	Fri May 17 23:36:37 2002
@@ -102,6 +102,9 @@
 			continue;
 		}
 
+		if (PageSlab(page))
+			BUG();
+
 		/* Page is being freed */
 		if (unlikely(page_count(page)) == 0) {
 			list_del(page_lru);
@@ -244,7 +247,8 @@
 		 * The page is in active use or really unfreeable. Move to
 		 * the active list and adjust the page age if needed.
 		 */
-		if (page_referenced(page) && page_mapping_inuse(page) &&
+		if (page_referenced(page) &&
+				(page_mapping_inuse(page) || PageSlab(page)) &&
 				!page_over_rsslimit(page)) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
@@ -253,6 +257,15 @@
 		}
 
 		/*
+		 * These pages are 'naked' - we do not want any other tests
+		 * done on them...
+		 */
+		if (PageSlab(page)) {
+			kmem_count_page(page);
+			continue;
+		}
+
+		/*
 		 * Page is being freed, don't worry about it.
 		 */
 		if (unlikely(page_count(page)) == 0)
@@ -446,6 +459,7 @@
  * This function will scan a portion of the active list of a zone to find
  * unused pages, those pages will then be moved to the inactive list.
  */
+
 int refill_inactive_zone(struct zone_struct * zone, int priority)
 {
 	int maxscan = zone->active_pages >> priority;
@@ -473,7 +487,7 @@
 		 * bother with page aging.  If the page is touched again
 		 * while on the inactive_clean list it'll be reactivated.
 		 */
-		if (!page_mapping_inuse(page)) {
+		if (!page_mapping_inuse(page) && !PageSlab(page)) {
 			drop_page(page);
 			continue;
 		}
@@ -497,8 +511,12 @@
 			list_add(page_lru, &zone->active_list);
 		} else {
 			deactivate_page_nolock(page);
-			if (++nr_deactivated > target)
+			if (PageSlab(page))
+				kmem_count_page(page);
+			else {
+				if (++nr_deactivated > target)
 				break;
+			}
 		}
 
 		/* Low latency reschedule point */
@@ -513,6 +531,7 @@
 	return nr_deactivated;
 }
 
+
 /**
  * refill_inactive - checks all zones and refills the inactive list as needed
  *
@@ -577,24 +596,15 @@
 
 	/*
 	 * Eat memory from filesystem page cache, buffer cache,
-	 * dentry, inode and filesystem quota caches.
 	 */
 	ret += page_launder(gfp_mask);
-	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
-	ret += shrink_icache_memory(1, gfp_mask);
-#ifdef CONFIG_QUOTA
-	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
 
 	/*
-	 * Move pages from the active list to the inactive list.
+	 * Move pages from the active list to the inactive list and
+	 * shrink caches return pages gained by shrink
 	 */
 	refill_inactive();
-
-	/* 	
-	 * Reclaim unused slab cache memory.
-	 */
-	ret += kmem_cache_reap(gfp_mask);
+	ret += kmem_call_shrinkers(DEF_PRIORITY, gfp_mask);
 
 	refill_freelist();
 
@@ -603,11 +613,14 @@
 		run_task_queue(&tq_disk);
 
 	/*
-	 * Hmm.. Cache shrink failed - time to kill something?
+	 * Hmm.. - time to kill something?
 	 * Mhwahahhaha! This is the part I really like. Giggle.
 	 */
-	if (!ret && free_min(ANY_ZONE) > 0)
-		out_of_memory();
+	if (!ret && free_min(ANY_ZONE) > 0) {
+		ret += kmem_cache_reap(gfp_mask);
+		if (!ret)
+			out_of_memory();
+	}
 
 	return ret;
 }
@@ -700,6 +713,7 @@
 
 			/* Do background page aging. */
 			background_aging(DEF_PRIORITY);
+			kmem_call_shrinkers(DEF_PRIORITY, GFP_KSWAPD);
 		}
 
 		wakeup_memwaiters();

-----------

* Re: [RFC][PATCH] using page aging to shrink caches
  2002-05-18  4:10 [RFC][PATCH] using page aging to shrink caches Ed Tomlinson
@ 2002-05-21 18:47 ` Benjamin LaHaise
  2002-05-24 11:28   ` [RFC][PATCH] using page aging to shrink caches (pre8-ac5) Ed Tomlinson
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin LaHaise @ 2002-05-21 18:47 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-kernel, linux-mm

On Sat, May 18, 2002 at 12:10:51AM -0400, Ed Tomlinson wrote:
> I have never been happy with the way slab cache shrinking worked.  This is an
> attempt to make it better.  Working with the rmap vm on pre7-ac2, I have done
> the following.

Thank you!  This should help greatly with some of the vm imbalances by 
making slab reclaim part of the self tuning dynamics instead of hard coded 
magic numbers.  Do you have any plans to port this patch to 2.5 for inclusion?  
It would be useful to get testing in the 2.5 before merging in 2.4.

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."

* Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)
  2002-05-21 18:47 ` Benjamin LaHaise
@ 2002-05-24 11:28   ` Ed Tomlinson
  2002-05-24 11:35     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ed Tomlinson @ 2002-05-24 11:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

On May 21, 2002 02:47 pm, Benjamin LaHaise wrote:
> On Sat, May 18, 2002 at 12:10:51AM -0400, Ed Tomlinson wrote:
> > I have never been happy with the way slab cache shrinking worked.  This
> > is an attempt to make it better.  Working with the rmap vm on pre7-ac2, I
> > have done the following.
>
> Thank you!  This should help greatly with some of the vm imbalances by
> making slab reclaim part of the self tuning dynamics instead of hard coded
> magic numbers.  Do you have any plans to port this patch to 2.5 for
> inclusion? It would be useful to get testing in the 2.5 before merging in
> 2.4.

Here is an improved version of the patch for pre8-ac5.  

This moves things towards having the vm do the work of freeing the pages.
I do wonder if it is worth the effort in that slab pages are a bit different from
other pages and get treated a little differently.  For instance, we sometimes
free slab pages in refill_inactive.  Without this the caches can grow and grow
without any possibility of shrinking when under low loads.  By allowing freeing
we avoid getting into a situation where slab pages cause an artificial shortage.

Finding a good method of handling the dcache/icache and dquot caches has
been fun...  What I do now is factor the pruning and shrinking into different
calls.  The pruning, in effect, ages entries in the above caches.  The rate I
prune is simply the rate I see entries for these slabs in refill_inactive_zone.
This seems fair and, in my testing, works better than anything else I have tried
(I have experimented quite a bit).  It also avoids using any magic numbers
and is self-tuning.
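
In outline (matching the mm/slab.c hunks below), the rate is accumulated per cache
while the vm scans slab pages, and is then handed to the prune callbacks in one pass
over the cache chain:

    /* refill_inactive_zone calls this for every slab page it scans */
    void kmem_count_page(struct page *page)
    {
            kmem_cache_t *cachep = GET_PAGE_CACHE(page);
            slab_t *slabp = GET_PAGE_SLAB(page);

            if (cachep->pruner != NULL)
                    cachep->count += (slabp->inuse >> cachep->gfporder);
    }

    /* later, kmem_do_prunes() walks the cache chain and hands each
     * pruneable cache its accumulated count as the aging goal */
    (*cachep->pruner)(cachep, cachep->count, gfp_mask);
    cachep->count = 0;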

The logic has also been improved to free specific slabs instead of just freeing <n>
freeable slabs when <n> were encountered by the vm.  Now we try to free the
slabs we encounter as we find them (see kmem_shrink_slab).

It works well on my UP box running pre8-ac5 without problems.  

I think this is ready for wider testing.  If any of you have test boxes that are having
vm problems, especially if they are slab-related, it would be interesting to see if this
helps.

Comments, questions and feedback very welcome,

Ed Tomlinson


Summary of changes.

fs/buffer.c
	touch the object's page when a buffer head is looked up

fs/dcache.c
	prune_dcache now ages <n> entries instead of freeing <n>
	shrink_dcache_memory becomes age_dcache_memory and is called from the vm
		using kmem_do_prunes.
	set the pruner callbacks for the dcache and dqcache
	touch the object's page when a dentry is looked up

fs/dquot.c
	shrink_dqcache_memory becomes age_dqcache_memory and is called from the vm.
	touch the object's page when a dquot is found

fs/inode.c
	touch the object's page when an inode is found
	prune_icache now ages <n> inodes instead of freeing <n>.
	shrink_icache_memory becomes age_icache_memory and is called from the vm.
	set the pruner call back for the icache to age_icache_memory.

include/linux/slab.h
	add the types, macros, and externs needed for the slab-pages-on-the-lru scheme.  To
	rationalise includes, the externs for the age_<x>_memory calls move here from
	dcache.h

mm/slab.c
	add pruner and count to kmem_cache_t and set them up when creating a cache
	add kmem_set_pruner to set the pruner callback
	add kmem_count_page to count the entries in a slab page in cachep->count
	add kmem_do_prunes to call the pruner callbacks to age the slab cache entries
	add and remove reapable slab pages to/from the lru
	add kmem_shrink_slab to remove a slab and free its memory if possible
	touch the page when allocating an entry in a slab

mm/vmscan.c
	BUG() if we hit a slab page in the wrong list
	handle slab pages in page_launder_zone.  We try to free the slab; if we cannot and
		the slab's cache is growing we requeue it on the inactive list, otherwise we
		move the page to the active list.
	handle slab pages in refill_inactive_zone (see the sketch after this list).  We count
		the entries on the page for kmem_do_prunes, which gets called later.  If the slab
		page is eligible to be moved to the inactive list we first try to free it; if this
		fails it is moved to the inactive list.  refill_inactive_zone now returns the
		number of pages freed.
	in do_try_to_free_pages account for the pages freed by refill_inactive and call
		kmem_do_prunes to age the slab caches in lock step with the vm scanning.
		A last ditch kmem_cache_reap is left before we conclude we are oom.
	in kswapd call kmem_do_prunes after each background_aging call.
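
A rough outline of the refill_inactive_zone handling described above (see the
mm/vmscan.c hunk below; locking details omitted):

    if (PageSlab(page))
            kmem_count_page(page);          /* feed the prune rate */

    if (page->age && !page_over_rsslimit(page)) {
            /* hot page: keep it on the active list */
            list_del(page_lru);
            list_add(page_lru, &zone->active_list);
    } else {
            if (PageSlab(page)) {
                    int pages = kmem_shrink_slab(page);
                    if (pages) {            /* empty slab: pages freed */
                            nr_freed += pages;
                            continue;
                    }
                    ClearPageReferenced(page);
            }
            deactivate_page_nolock(page);   /* otherwise age it normally */
    }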

--------------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.406   -> 1.408  
#	         fs/buffer.c	1.66    -> 1.67   
#	         fs/dcache.c	1.19    -> 1.20   
#	          fs/dquot.c	1.18    -> 1.19   
#	         mm/vmscan.c	1.60    -> 1.62   
#	           mm/slab.c	1.16    -> 1.18   
#	include/linux/slab.h	1.8     -> 1.9    
#	include/linux/dcache.h	1.11    -> 1.12   
#	          fs/inode.c	1.36    -> 1.37   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/05/23	ed@oscar.et.ca	1.407
# age_pressure_v7.diff
# --------------------------------------------
# 02/05/23	ed@oscar.et.ca	1.408
# Fix the locking in vmscan for slab pages in lru.  Improve the comments
# too.
# --------------------------------------------
#
diff -Nru a/fs/buffer.c b/fs/buffer.c
--- a/fs/buffer.c	Thu May 23 21:38:19 2002
+++ b/fs/buffer.c	Thu May 23 21:38:19 2002
@@ -951,8 +951,10 @@
 		struct buffer_head * bh;
 
 		bh = get_hash_table(dev, block, size);
-		if (bh)
+		if (bh) {
+			kmem_touch_page(bh);
 			return bh;
+		}
 
 		if (!grow_buffers(dev, block, size))
 			free_more_memory();
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c	Thu May 23 21:38:19 2002
+++ b/fs/dcache.c	Thu May 23 21:38:19 2002
@@ -321,7 +321,7 @@
 void prune_dcache(int count)
 {
 	spin_lock(&dcache_lock);
-	for (;;) {
+	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
 
@@ -345,8 +345,6 @@
 			BUG();
 
 		prune_one_dentry(dentry);
-		if (!--count)
-			break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -538,19 +536,10 @@
 
 /*
  * This is called from kswapd when we think we need some
- * more memory, but aren't really sure how much. So we
- * carefully try to free a _bit_ of our dcache, but not
- * too much.
- *
- * Priority:
- *   0 - very urgent: shrink everything
- *  ...
- *   6 - base-level: try to shrink a bit.
+ * more memory. 
  */
-int shrink_dcache_memory(int priority, unsigned int gfp_mask)
+int age_dcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
 {
-	int count = 0;
-
 	/*
 	 * Nasty deadlock avoidance.
 	 *
@@ -565,10 +554,11 @@
 	if (!(gfp_mask & __GFP_FS))
 		return 0;
 
-	count = dentry_stat.nr_unused / priority;
+	if (entries > dentry_stat.nr_unused)
+		entries = dentry_stat.nr_unused;
 
-	prune_dcache(count);
-	return kmem_cache_shrink(dentry_cache);
+	prune_dcache(entries);
+	return entries;
 }
 
 #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
@@ -730,6 +720,7 @@
 		}
 		__dget_locked(dentry);
 		dentry->d_vfs_flags |= DCACHE_REFERENCED;
+		kmem_touch_page(dentry);
 		spin_unlock(&dcache_lock);
 		return dentry;
 	}
@@ -1186,6 +1177,8 @@
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
 
+	kmem_set_pruner(dentry_cache, (pruner_t)age_dcache_memory);
+
 #if PAGE_SHIFT < 13
 	mempages >>= (13 - PAGE_SHIFT);
 #endif
@@ -1279,6 +1272,9 @@
 			SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!dquot_cachep)
 		panic("Cannot create dquot SLAB cache");
+	
+	kmem_set_pruner(dquot_cachep, (pruner_t)age_dqcache_memory);
+	
 #endif
 
 	dcache_init(mempages);
diff -Nru a/fs/dquot.c b/fs/dquot.c
--- a/fs/dquot.c	Thu May 23 21:38:19 2002
+++ b/fs/dquot.c	Thu May 23 21:38:19 2002
@@ -1026,10 +1026,13 @@
 
-int shrink_dqcache_memory(int priority, unsigned int gfp_mask)
+int age_dqcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
 {
+	if (entries > nr_free_dquots)
+		entries = nr_free_dquots;
+
 	lock_kernel();
-	prune_dqcache(nr_free_dquots / (priority + 1));
+	prune_dqcache(entries);
 	unlock_kernel();
-	return kmem_cache_shrink(dquot_cachep);
+	return entries;
 }
 
 /*
@@ -1148,6 +1151,7 @@
 #endif
 	dquot->dq_referenced++;
 	dqstats.lookups++;
+	kmem_touch_page(dquot);
 
 	return dquot;
 }
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Thu May 23 21:38:19 2002
+++ b/fs/inode.c	Thu May 23 21:38:19 2002
@@ -193,6 +193,7 @@
 
 static inline void __iget(struct inode * inode)
 {
+	kmem_touch_page(inode);
 	if (atomic_read(&inode->i_count)) {
 		atomic_inc(&inode->i_count);
 		return;
@@ -672,10 +673,11 @@
 
 	count = 0;
 	entry = inode_unused.prev;
-	while (entry != &inode_unused)
-	{
+	for(; goal; goal--) {
 		struct list_head *tmp = entry;
 
+		if (entry == &inode_unused)
+			break;
 		entry = entry->prev;
 		inode = INODE(tmp);
 		if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
@@ -690,8 +692,6 @@
 		list_add(tmp, freeable);
 		inode->i_state |= I_FREEING;
 		count++;
-		if (!--goal)
-			break;
 	}
 	inodes_stat.nr_unused -= count;
 	spin_unlock(&inode_lock);
@@ -708,10 +708,8 @@
 		schedule_task(&unused_inodes_flush_task);
 }
 
-int shrink_icache_memory(int priority, int gfp_mask)
+int age_icache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
 {
-	int count = 0;
-
 	/*
 	 * Nasty deadlock avoidance..
 	 *
@@ -722,10 +720,11 @@
 	if (!(gfp_mask & __GFP_FS))
 		return 0;
 
-	count = inodes_stat.nr_unused / priority;
+	if (entries > inodes_stat.nr_unused)
+		entries = inodes_stat.nr_unused;
 
-	prune_icache(count);
-	return kmem_cache_shrink(inode_cachep);
+	prune_icache(entries);
+	return entries;
 }
 
 /*
@@ -1172,6 +1171,8 @@
 					 NULL);
 	if (!inode_cachep)
 		panic("cannot create inode slab cache");
+
+	kmem_set_pruner(inode_cachep, (pruner_t)age_icache_memory);
 
 	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
 }
diff -Nru a/include/linux/dcache.h b/include/linux/dcache.h
--- a/include/linux/dcache.h	Thu May 23 21:38:19 2002
+++ b/include/linux/dcache.h	Thu May 23 21:38:19 2002
@@ -171,15 +171,10 @@
 #define shrink_dcache() prune_dcache(0)
 struct zone_struct;
 /* dcache memory management */
-extern int shrink_dcache_memory(int, unsigned int);
 extern void prune_dcache(int);
 
 /* icache memory management (defined in linux/fs/inode.c) */
-extern int shrink_icache_memory(int, int);
 extern void prune_icache(int);
-
-/* quota cache memory management (defined in linux/fs/dquot.c) */
-extern int shrink_dqcache_memory(int, unsigned int);
 
 /* only used at mount-time */
 extern struct dentry * d_alloc_root(struct inode *);
diff -Nru a/include/linux/slab.h b/include/linux/slab.h
--- a/include/linux/slab.h	Thu May 23 21:38:19 2002
+++ b/include/linux/slab.h	Thu May 23 21:38:19 2002
@@ -55,6 +55,26 @@
 				       void (*)(void *, kmem_cache_t *, unsigned long));
 extern int kmem_cache_destroy(kmem_cache_t *);
 extern int kmem_cache_shrink(kmem_cache_t *);
+
+typedef int (*pruner_t)(kmem_cache_t *, int, int);
+
+extern void kmem_set_pruner(kmem_cache_t *, pruner_t);
+extern int kmem_do_prunes(int);
+extern void kmem_count_page(struct page *);
+#define kmem_touch_page(addr)                 SetPageReferenced(virt_to_page(addr));
+
+/* shrink a slab */
+extern int kmem_shrink_slab(struct page *);
+
+/* dcache prune ( defined in linux/fs/dcache.c) */
+extern int age_dcache_memory(kmem_cache_t *, int, int);
+
+/* icache prune (defined in linux/fs/inode.c) */
+extern int age_icache_memory(kmem_cache_t *, int, int);
+
+/* quota cache prune (defined in linux/fs/dquot.c) */
+extern int age_dqcache_memory(kmem_cache_t *, int, int);
+
 extern void *kmem_cache_alloc(kmem_cache_t *, int);
 extern void kmem_cache_free(kmem_cache_t *, void *);
 
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c	Thu May 23 21:38:19 2002
+++ b/mm/slab.c	Thu May 23 21:38:19 2002
@@ -212,6 +212,8 @@
 	kmem_cache_t		*slabp_cache;
 	unsigned int		growing;
 	unsigned int		dflags;		/* dynamic flags */
+	pruner_t		pruner;	/* shrink callback */
+	int 			count;		/* count used to trigger shrink */
 
 	/* constructor func */
 	void (*ctor)(void *, kmem_cache_t *, unsigned long);
@@ -381,6 +383,51 @@
 static void enable_cpucache (kmem_cache_t *cachep);
 static void enable_all_cpucaches (void);
 #endif
+ 
+/* set the prune call back function */
+void kmem_set_pruner(kmem_cache_t * cachep, pruner_t thepruner) 
+{
+	cachep->pruner = thepruner;
+}
+
+/* used by refill_inactive_zone to determine caches that need pruning */
+void kmem_count_page(struct page *page)
+{
+	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+	slab_t *slabp = GET_PAGE_SLAB(page);
+	if (cachep->pruner != NULL)
+		cachep->count += (slabp->inuse >> cachep->gfporder);
+}
+
+/* call the prune functions to age pruneable caches */
+int kmem_do_prunes(int gfp_mask) 
+{
+	int ret = 0;
+	struct list_head *p;
+
+        if (gfp_mask & __GFP_WAIT)
+                down(&cache_chain_sem);
+        else
+                if (down_trylock(&cache_chain_sem))
+                        return 0;
+
+        list_for_each(p,&cache_chain) {
+                kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
+		if (cachep->pruner != NULL) {
+			if (cachep->count > 0) {
+				int nr = (*cachep->pruner)(cachep, cachep->count, gfp_mask);
+				cachep->count = 0;
+#ifdef DEBUG
+				printk("pruned %-17s %d\n",cachep->name, nr); 
+#endif
+
+			}
+		}
+        }
+        up(&cache_chain_sem);
+	return ret;
+}
+
 
 /* Cal the num objs, wastage, and bytes left over for a given slab size. */
 static void kmem_cache_estimate (unsigned long gfporder, size_t size,
@@ -513,6 +560,10 @@
 	 * vm_scan(). Shouldn't be a worry.
 	 */
 	while (i--) {
+		if (!(cachep->flags & SLAB_NO_REAP)) {
+			set_page_count(page, 0);
+			lru_cache_del(page);
+		}
 		PageClearSlab(page);
 		page++;
 	}
@@ -588,6 +639,34 @@
 		kmem_cache_free(cachep->slabp_cache, slabp);
 }
 
+/* 
+ * Used by page_launder_zone and refill_inactive_zone to 
+ * try to shrink a slab.  There are three possible results:
+ * - shrink works and we return the pages shrunk
+ * - shrink fails due to a growing cache, we return 0
+ * - shrink fails because the slab is in use, we return 0
+ *   and set the page reference bit.
+ */
+int kmem_shrink_slab(struct page *page)
+{
+	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+	slab_t *slabp = GET_PAGE_SLAB(page);
+
+	spin_lock_irq(&cachep->spinlock);
+	if (!slabp->inuse) {
+	 	if (!cachep->growing) {
+			list_del(&slabp->list);
+			spin_unlock_irq(&cachep->spinlock);
+			kmem_slab_destroy(cachep, slabp);
+			return 1<<cachep->gfporder;
+		}
+	} else 
+		SetPageReferenced(page);
+	spin_unlock_irq(&cachep->spinlock);
+	return 0; 
+}
+
+
 /**
  * kmem_cache_create - Create a cache.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
@@ -780,6 +859,8 @@
 		flags |= CFLGS_OPTIMIZE;
 
 	cachep->flags = flags;
+	cachep->pruner = NULL;
+	cachep->count = 0;
 	cachep->gfpflags = 0;
 	if (flags & SLAB_CACHE_DMA)
 		cachep->gfpflags |= GFP_DMA;
@@ -1174,6 +1255,10 @@
 		SET_PAGE_CACHE(page, cachep);
 		SET_PAGE_SLAB(page, slabp);
 		PageSetSlab(page);
+		if (!(cachep->flags & SLAB_NO_REAP)) {
+			set_page_count(page, 1);
+			lru_cache_add(page);
+		}
 		page++;
 	} while (--i);
 
@@ -1255,6 +1340,7 @@
 		list_del(&slabp->list);
 		list_add(&slabp->list, &cachep->slabs_full);
 	}
+	kmem_touch_page(objp);
 #if DEBUG
 	if (cachep->flags & SLAB_POISON)
 		if (kmem_check_poison_obj(cachep, objp))
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	Thu May 23 21:38:19 2002
+++ b/mm/vmscan.c	Thu May 23 21:38:19 2002
@@ -102,6 +102,9 @@
 			continue;
 		}
 
+		if (PageSlab(page))
+			BUG();
+
 		/* Page is being freed */
 		if (unlikely(page_count(page)) == 0) {
 			list_del(page_lru);
@@ -270,7 +273,8 @@
 		 * the active list and adjust the page age if needed.
 		 */
 		pte_chain_lock(page);
-		if (page_referenced(page) && page_mapping_inuse(page) &&
+		if (page_referenced(page) &&
+				(page_mapping_inuse(page) || PageSlab(page)) &&
 				!page_over_rsslimit(page)) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
@@ -282,6 +286,25 @@
 		pte_chain_unlock(page);
 
 		/*
+		 * These pages are 'naked' - we do not want any other tests
+		 * done on them...  If kmem_shrink_slab finds the slab has
+		 * entries it will return 0 and set the page reference bit,
+		 * in this case we want to activate the page.
+		 */
+		if (PageSlab(page)) {
+			int pages;
+			UnlockPage(page);
+			pages = kmem_shrink_slab(page);
+			if (!pages && PageTestandClearReferenced(page)) {
+	                        del_page_from_inactive_dirty_list(page);
+				add_page_to_active_list(page);
+				page->age = max((int)page->age, PAGE_AGE_START);
+			} else
+				cleaned_pages += pages;
+			continue;
+		}
+
+		/*
 		 * Anonymous process memory without backing store. Try to
 		 * allocate it some swap space here.
 		 *
@@ -470,12 +493,14 @@
  * This function will scan a portion of the active list of a zone to find
  * unused pages, those pages will then be moved to the inactive list.
  */
+
 int refill_inactive_zone(struct zone_struct * zone, int priority)
 {
 	int maxscan = zone->active_pages >> priority;
 	int target = inactive_high(zone);
 	struct list_head * page_lru;
 	int nr_deactivated = 0;
+	int nr_freed = 0;
 	struct page * page;
 
 	/* Take the lock while messing with the list... */
@@ -507,7 +532,7 @@
 		 * both PG_locked and the pte_chain_lock are held.
 		 */
 		pte_chain_lock(page);
-		if (!page_mapping_inuse(page)) {
+		if (!page_mapping_inuse(page) && !PageSlab(page)) {
 			pte_chain_unlock(page);
 			UnlockPage(page);
 			drop_page(page);
@@ -524,14 +549,32 @@
 		}
 
 		/* 
+		 * Count the entries on the page for pruning caches.
+		 */
+		if (PageSlab(page))
+			kmem_count_page(page);
+
+		/* 
 		 * If the page age is 'hot' and the process using the
 		 * page doesn't exceed its RSS limit we keep the page.
-		 * Otherwise we move it to the inactive_dirty list.
+		 * Otherwise we move it to the inactive_dirty list.  
+		 * For slab pages if its not a hot page, we try to 
+		 * free it, failing it goes to the inactive_dirty list. 
 		 */
 		if (page->age && !page_over_rsslimit(page)) {
 			list_del(page_lru);
 			list_add(page_lru, &zone->active_list);
 		} else {
+			if (PageSlab(page)) {
+				int pages = kmem_shrink_slab(page); 
+				if (pages) { 
+					nr_freed += pages;
+					pte_chain_unlock(page);
+					UnlockPage(page);
+					continue;
+				} else
+					ClearPageReferenced(page);
+			}		
 			deactivate_page_nolock(page);
 			if (++nr_deactivated > target) {
 				pte_chain_unlock(page);
@@ -553,9 +596,10 @@
 done:
 	spin_unlock(&pagemap_lru_lock);
 
-	return nr_deactivated;
+	return nr_freed;
 }
 
+
 /**
  * refill_inactive - checks all zones and refills the inactive list as needed
  *
@@ -620,24 +664,16 @@
 
 	/*
 	 * Eat memory from filesystem page cache, buffer cache,
-	 * dentry, inode and filesystem quota caches.
 	 */
 	ret += page_launder(gfp_mask);
-	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
-	ret += shrink_icache_memory(1, gfp_mask);
-#ifdef CONFIG_QUOTA
-	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
 
 	/*
-	 * Move pages from the active list to the inactive list.
-	 */
-	refill_inactive();
-
-	/* 	
-	 * Reclaim unused slab cache memory.
+	 * Move pages from the active list to the inactive list and 
+	 * return pages gained by shrink.  Then prune caches to age
+	 * them. 
 	 */
-	ret += kmem_cache_reap(gfp_mask);
+	ret += refill_inactive();
+	kmem_do_prunes(gfp_mask);
 
 	refill_freelist();
 
@@ -646,11 +682,14 @@
 		run_task_queue(&tq_disk);
 
 	/*
-	 * Hmm.. Cache shrink failed - time to kill something?
+	 * Hmm.. - time to kill something?
 	 * Mhwahahhaha! This is the part I really like. Giggle.
 	 */
-	if (!ret && free_min(ANY_ZONE) > 0)
-		out_of_memory();
+	if (!ret && free_min(ANY_ZONE) > 0) {
+		ret += kmem_cache_reap(gfp_mask);
+		if (!ret)
+			out_of_memory();
+	}
 
 	return ret;
 }
@@ -744,6 +783,7 @@
 
 			/* Do background page aging. */
 			background_aging(DEF_PRIORITY);
+			kmem_do_prunes(GFP_KSWAPD);
 		}
 
 		wakeup_memwaiters();

--------------

* Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)
  2002-05-24 11:28   ` [RFC][PATCH] using page aging to shrink caches (pre8-ac5) Ed Tomlinson
@ 2002-05-24 11:35     ` Christoph Hellwig
  2002-05-24 12:14       ` Ed Tomlinson
  2002-05-24 11:42     ` William Lee Irwin III
  2002-05-29 12:01     ` Ed Tomlinson
  2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2002-05-24 11:35 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-kernel, linux-mm

On Fri, May 24, 2002 at 07:28:45AM -0400, Ed Tomlinson wrote:
> Comments, questions and feedback very welcome,

Just from a short look:

What about doing mark_page_accessed in kmem_touch_page?
And please do a s/pruner_t/kmem_pruner_t/


* Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)
  2002-05-24 11:28   ` [RFC][PATCH] using page aging to shrink caches (pre8-ac5) Ed Tomlinson
  2002-05-24 11:35     ` Christoph Hellwig
@ 2002-05-24 11:42     ` William Lee Irwin III
  2002-05-29 12:01     ` Ed Tomlinson
  2 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2002-05-24 11:42 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-kernel, linux-mm

On Fri, May 24, 2002 at 07:28:45AM -0400, Ed Tomlinson wrote:
> This moves things towards having the vm do the work of freeing the
> pages. I do wonder if it is worth the effort in that slab pages are a
> bit different from other pages and get treated a little differently.
> For instance, we sometimes free slab pages in refill_inactive.
> Without this the caches can grow and grow without any possibility of
> shrinking when under low loads.  By allowing freeing we avoid getting
> into a situation where slab pages cause an artificial shortage.
> Finding a good method of handling the dcache/icache and dquota caches
> has been fun...  What I do now is factor the pruning and shrinking
> into different calls.  The pruning, in effect, ages entries in the
> above caches. The rate I prune is simply the rate I see entries for
> these slabs in refill_inactive_zone. This seems fair and, in my
> testing, works better than anything else I have tried (I have
> experimented quite a bit).  It also avoids using any magic numbers
> and is self tuning.

This kind of cache reclamation logic is so sorely needed it's
unimaginable. I'm quite grateful for your efforts in this direction,
and hope to be able to provide some assistance in testing soon.

Cheers,
Bill

* Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)
  2002-05-24 11:35     ` Christoph Hellwig
@ 2002-05-24 12:14       ` Ed Tomlinson
  2002-05-24 12:20         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Ed Tomlinson @ 2002-05-24 12:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, linux-mm

On May 24, 2002 07:35 am, Christoph Hellwig wrote:
> On Fri, May 24, 2002 at 07:28:45AM -0400, Ed Tomlinson wrote:
> > Comments, questions and feedback very welcome,
>
> Just from a short look:
>
> What about doing mark_page_accessed in kmem_touch_page?

mark_page_accessed expects a page struct.  kmem_touch_page takes an
address in the page, converts it to a kernel address and then marks the page.

> And please do a s/pruner_t/kmem_pruner_t/

Yes.  Done.

One other style question.  I am not completely happy with kmem_shrink_slab.
I think that instead of setting the reference bit I should probably do something
like return:

-1  - cache is growing
 0  - slab has inuse objects
 n  - pages were freed

Comments?
Ed Tomlinson


* Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)
  2002-05-24 12:14       ` Ed Tomlinson
@ 2002-05-24 12:20         ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2002-05-24 12:20 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-kernel, linux-mm

On Fri, May 24, 2002 at 08:14:25AM -0400, Ed Tomlinson wrote:
> On May 24, 2002 07:35 am, Christoph Hellwig wrote:
> > On Fri, May 24, 2002 at 07:28:45AM -0400, Ed Tomlinson wrote:
> > > Comments, questions and feedback very welcome,
> >
> > Just from a short look:
> >
> > What about doing mark_page_accessed in kmem_touch_page?
> 
> mark_page_accessed expects a page struct.  kmem_touch_page takes an
> address in the page, converts it to a kernel address and then marks the page.

Of course after the virt_to_page..


* Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)
  2002-05-24 11:28   ` [RFC][PATCH] using page aging to shrink caches (pre8-ac5) Ed Tomlinson
  2002-05-24 11:35     ` Christoph Hellwig
  2002-05-24 11:42     ` William Lee Irwin III
@ 2002-05-29 12:01     ` Ed Tomlinson
  2 siblings, 0 replies; 8+ messages in thread
From: Ed Tomlinson @ 2002-05-29 12:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, alan

Hi,

Here is an improved version of the patch.  It fixes a race in kmem_freepages (I do
not see why the race should not happen in straight ac) and makes the following 
changes:

Aging works a little differently for pruneable caches.  For these caches we use
pruning to do aging.  The rate we prune is simply the rate we see objects on
pages processed by vmscan.  For all other caches vm aging is used.  Without this
change two aging methods were being applied to the dcache/icache.  This favored
their pages and the vm was quite slow to trim them at times.
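
A minimal sketch of the change (see the mm/slab.c hunk below): kmem_count_page now also
reports whether the cache has a pruner, so the vmscan side can tell pruneable slab pages
apart and leave their aging to the prune callbacks:

    int kmem_count_page(struct page *page)
    {
            kmem_cache_t *cachep = GET_PAGE_CACHE(page);
            slab_t *slabp = GET_PAGE_SLAB(page);

            if (cachep->pruner != NULL)
                    cachep->count += (slabp->inuse >> cachep->gfporder);
            return (cachep->pruner != NULL);
    }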

Second, since there is almost no overhead (ie no disk access), when refill_inactive_zone
sees a slab page it wants to free, it releases the slab and moves the pages to the inactive
clean list.  
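
The mechanism is the kmem_freepages change (see the mm/slab.c hunk below): instead of
handing reapable slab pages straight back to the buddy allocator, they are parked on the
inactive clean list for the vm to reclaim:

    while (i--) {
            if (cachep->flags & SLAB_NO_REAP)
                    PageClearSlab(page);
            else {
                    ClearPageReferenced(page);
                    del_page_from_active_list(page);
                    add_page_to_inactive_clean_list(page);
            }
            page++;
    }
    if (cachep->flags & SLAB_NO_REAP)
            free_pages((unsigned long)addr, cachep->gfporder);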

An interesting note: if I directly free the pages in kmem_freepages I run into a race.
It seems that free pages can be lost...  To see the race I do the following:

find / -name "*" > /dev/null &
irman &
dbench 40 &

When irman starts its memory stress test, free pages are lost.  It's very easy to see this
using Zlatko Calusic's xmm utility.  With my patch the number of kernel pages should
be quite stable.  When the race occurs they jump and /proc/meminfo shows missing
pages.  This is on UP with no preempt.

Patch applies to pre8-ac5.

Comments?

Ed Tomlinson

-----------------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.406   -> 1.412  
#	         fs/buffer.c	1.66    -> 1.68   
#	         fs/dcache.c	1.19    -> 1.21   
#	          fs/dquot.c	1.18    -> 1.20   
#	         mm/vmscan.c	1.60    -> 1.65   
#	           mm/slab.c	1.16    -> 1.21   
#	include/linux/slab.h	1.8     -> 1.10   
#	include/linux/dcache.h	1.11    -> 1.12   
#	          fs/inode.c	1.36    -> 1.38   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/05/24	ed@oscar.et.ca	1.407
# age_pressure_v8.diff
# --------------------------------------------
# 02/05/24	ed@oscar.et.ca	1.408
# Remove side effect from kmem_shrink_slab and fix vmscan to use the new
# return codes.
# --------------------------------------------
# 02/05/24	ed@oscar.et.ca	1.409
# Simplify - try latest algorithm without touches in lookups.
# --------------------------------------------
# 02/05/24	ed@oscar.et.ca	1.410
# Use either vm aging or prune callback aging, not both.  Keep slab
# pages on the active list.
# --------------------------------------------
# 02/05/28	ed@oscar.et.ca	1.411
# fix locking in slab to use pagemap_lru_lock when shrinking or growing
# a cache.  Use the inactive clean list when freeing a slab's pages.
# This avoids a race so the vm does not lose track of pages.
# --------------------------------------------
# 02/05/29	ed@oscar.et.ca	1.412
# Prevent bug in page_launder from being hit due to a dangling 
# reference bit.  Improve accounting in refill_inactive.
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c	Wed May 29 07:35:03 2002
+++ b/fs/dcache.c	Wed May 29 07:35:03 2002
@@ -321,7 +321,7 @@
 void prune_dcache(int count)
 {
 	spin_lock(&dcache_lock);
-	for (;;) {
+	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
 
@@ -345,8 +345,6 @@
 			BUG();
 
 		prune_one_dentry(dentry);
-		if (!--count)
-			break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -538,19 +536,10 @@
 
 /*
  * This is called from kswapd when we think we need some
- * more memory, but aren't really sure how much. So we
- * carefully try to free a _bit_ of our dcache, but not
- * too much.
- *
- * Priority:
- *   0 - very urgent: shrink everything
- *  ...
- *   6 - base-level: try to shrink a bit.
+ * more memory. 
  */
-int shrink_dcache_memory(int priority, unsigned int gfp_mask)
+int age_dcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
 {
-	int count = 0;
-
 	/*
 	 * Nasty deadlock avoidance.
 	 *
@@ -565,10 +554,11 @@
 	if (!(gfp_mask & __GFP_FS))
 		return 0;
 
-	count = dentry_stat.nr_unused / priority;
+	if (entries > dentry_stat.nr_unused)
+		entries = dentry_stat.nr_unused;
 
-	prune_dcache(count);
-	return kmem_cache_shrink(dentry_cache);
+	prune_dcache(entries);
+	return entries;
 }
 
 #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
@@ -1186,6 +1176,8 @@
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
 
+	kmem_set_pruner(dentry_cache, (kmem_pruner_t)age_dcache_memory);
+
 #if PAGE_SHIFT < 13
 	mempages >>= (13 - PAGE_SHIFT);
 #endif
@@ -1279,6 +1271,9 @@
 			SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!dquot_cachep)
 		panic("Cannot create dquot SLAB cache");
+	
+	kmem_set_pruner(dquot_cachep, (kmem_pruner_t)age_dqcache_memory);
+	
 #endif
 
 	dcache_init(mempages);
diff -Nru a/fs/dquot.c b/fs/dquot.c
--- a/fs/dquot.c	Wed May 29 07:35:03 2002
+++ b/fs/dquot.c	Wed May 29 07:35:03 2002
@@ -1026,10 +1026,13 @@
 
-int shrink_dqcache_memory(int priority, unsigned int gfp_mask)
+int age_dqcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
 {
+	if (entries > nr_free_dquots)
+		entries = nr_free_dquots;
+
 	lock_kernel();
-	prune_dqcache(nr_free_dquots / (priority + 1));
+	prune_dqcache(entries);
 	unlock_kernel();
-	return kmem_cache_shrink(dquot_cachep);
+	return entries;
 }
 
 /*
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Wed May 29 07:35:03 2002
+++ b/fs/inode.c	Wed May 29 07:35:03 2002
@@ -672,10 +672,11 @@
 
 	count = 0;
 	entry = inode_unused.prev;
-	while (entry != &inode_unused)
-	{
+	for(; goal; goal--) {
 		struct list_head *tmp = entry;
 
+		if (entry == &inode_unused)
+			break;
 		entry = entry->prev;
 		inode = INODE(tmp);
 		if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
@@ -690,8 +691,6 @@
 		list_add(tmp, freeable);
 		inode->i_state |= I_FREEING;
 		count++;
-		if (!--goal)
-			break;
 	}
 	inodes_stat.nr_unused -= count;
 	spin_unlock(&inode_lock);
@@ -708,10 +707,8 @@
 		schedule_task(&unused_inodes_flush_task);
 }
 
-int shrink_icache_memory(int priority, int gfp_mask)
+int age_icache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
 {
-	int count = 0;
-
 	/*
 	 * Nasty deadlock avoidance..
 	 *
@@ -722,10 +719,11 @@
 	if (!(gfp_mask & __GFP_FS))
 		return 0;
 
-	count = inodes_stat.nr_unused / priority;
+	if (entries > inodes_stat.nr_unused)
+		entries = inodes_stat.nr_unused;
 
-	prune_icache(count);
-	return kmem_cache_shrink(inode_cachep);
+	prune_icache(entries);
+	return entries;
 }
 
 /*
@@ -1172,6 +1170,8 @@
 					 NULL);
 	if (!inode_cachep)
 		panic("cannot create inode slab cache");
+
+	kmem_set_pruner(inode_cachep, (kmem_pruner_t)age_icache_memory);
 
 	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
 }
diff -Nru a/include/linux/dcache.h b/include/linux/dcache.h
--- a/include/linux/dcache.h	Wed May 29 07:35:03 2002
+++ b/include/linux/dcache.h	Wed May 29 07:35:03 2002
@@ -171,15 +171,10 @@
 #define shrink_dcache() prune_dcache(0)
 struct zone_struct;
 /* dcache memory management */
-extern int shrink_dcache_memory(int, unsigned int);
 extern void prune_dcache(int);
 
 /* icache memory management (defined in linux/fs/inode.c) */
-extern int shrink_icache_memory(int, int);
 extern void prune_icache(int);
-
-/* quota cache memory management (defined in linux/fs/dquot.c) */
-extern int shrink_dqcache_memory(int, unsigned int);
 
 /* only used at mount-time */
 extern struct dentry * d_alloc_root(struct inode *);
diff -Nru a/include/linux/slab.h b/include/linux/slab.h
--- a/include/linux/slab.h	Wed May 29 07:35:03 2002
+++ b/include/linux/slab.h	Wed May 29 07:35:03 2002
@@ -55,6 +55,26 @@
 				       void (*)(void *, kmem_cache_t *, unsigned long));
 extern int kmem_cache_destroy(kmem_cache_t *);
 extern int kmem_cache_shrink(kmem_cache_t *);
+
+typedef int (*kmem_pruner_t)(kmem_cache_t *, int, int);
+
+extern void kmem_set_pruner(kmem_cache_t *, kmem_pruner_t);
+extern int kmem_do_prunes(int);
+extern int kmem_count_page(struct page *);
+#define kmem_touch_page(addr)                 SetPageReferenced(virt_to_page(addr));
+
+/* shrink a slab */
+extern int kmem_shrink_slab(struct page *);
+
+/* dcache prune ( defined in linux/fs/dcache.c) */
+extern int age_dcache_memory(kmem_cache_t *, int, int);
+
+/* icache prune (defined in linux/fs/inode.c) */
+extern int age_icache_memory(kmem_cache_t *, int, int);
+
+/* quota cache prune (defined in linux/fs/dquot.c) */
+extern int age_dqcache_memory(kmem_cache_t *, int, int);
+
 extern void *kmem_cache_alloc(kmem_cache_t *, int);
 extern void kmem_cache_free(kmem_cache_t *, void *);
 
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c	Wed May 29 07:35:03 2002
+++ b/mm/slab.c	Wed May 29 07:35:03 2002
@@ -72,6 +72,7 @@
 #include	<linux/slab.h>
 #include	<linux/interrupt.h>
 #include	<linux/init.h>
+#include	<linux/mm_inline.h>
 #include	<asm/uaccess.h>
 
 /*
@@ -212,6 +213,8 @@
 	kmem_cache_t		*slabp_cache;
 	unsigned int		growing;
 	unsigned int		dflags;		/* dynamic flags */
+	kmem_pruner_t		pruner;	/* shrink callback */
+	int 			count;		/* count used to trigger shrink */
 
 	/* constructor func */
 	void (*ctor)(void *, kmem_cache_t *, unsigned long);
@@ -381,6 +384,54 @@
 static void enable_cpucache (kmem_cache_t *cachep);
 static void enable_all_cpucaches (void);
 #endif
+ 
+/* set the prune call back function */
+void kmem_set_pruner(kmem_cache_t * cachep, kmem_pruner_t thepruner) 
+{
+	cachep->pruner = thepruner;
+}
+
+/* used by refill_inactive_zone to determine caches that need pruning */
+int kmem_count_page(struct page *page)
+{
+	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+	slab_t *slabp = GET_PAGE_SLAB(page);
+	if (cachep->pruner != NULL)
+		cachep->count += (slabp->inuse >> cachep->gfporder);
+	return (cachep->pruner != NULL);
+}
+
+/* call the prune functions to age pruneable caches */
+int kmem_do_prunes(int gfp_mask) 
+{
+	int ret = 0;
+	struct list_head *p;
+
+	if (gfp_mask & __GFP_WAIT)
+		down(&cache_chain_sem);
+	else
+		if (down_trylock(&cache_chain_sem))
+			return 0;
+
+	list_for_each(p, &cache_chain) {
+		kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
+		if (cachep->pruner != NULL) {
+			if (cachep->count > 0) {
+#ifdef DEBUGX
+				int nr = (*cachep->pruner)(cachep, cachep->count, gfp_mask);
+				printk("pruned %-17s %d\n", cachep->name, nr);
+#else
+				(*cachep->pruner)(cachep, cachep->count, gfp_mask);
+#endif
+				cachep->count = 0;
+
+			}
+		}
+	}
+	up(&cache_chain_sem);
+	return 1;
+}
+
 
 /* Cal the num objs, wastage, and bytes left over for a given slab size. */
 static void kmem_cache_estimate (unsigned long gfporder, size_t size,
@@ -479,7 +530,9 @@
 
 __initcall(kmem_cpucache_init);
 
-/* Interface to system's page allocator. No need to hold the cache-lock.
+/*
+ * Interface to system's page allocator. No need to hold the cache-lock.
+ * Call with pagemap_lru_lock held
  */
 static inline void * kmem_getpages (kmem_cache_t *cachep, unsigned long flags)
 {
@@ -513,10 +566,17 @@
 	 * vm_scan(). Shouldn't be a worry.
 	 */
 	while (i--) {
-		PageClearSlab(page);
+		if (cachep->flags & SLAB_NO_REAP) 
+			PageClearSlab(page);
+		else {
+			ClearPageReferenced(page);
+			del_page_from_active_list(page);
+			add_page_to_inactive_clean_list(page);
+		}
 		page++;
 	}
-	free_pages((unsigned long)addr, cachep->gfporder);
+	if (cachep->flags & SLAB_NO_REAP)
+		free_pages((unsigned long)addr, cachep->gfporder);
 }
 
 #if DEBUG
@@ -549,6 +609,7 @@
 /* Destroy all the objs in a slab, and release the mem back to the system.
  * Before calling the slab must have been unlinked from the cache.
  * The cache-lock is not held/needed.
+ * pagemap_lru_lock should be held for kmem_freepages
  */
 static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
 {
@@ -588,6 +649,32 @@
 		kmem_cache_free(cachep->slabp_cache, slabp);
 }
 
+/*
+ * Used by page_launder_zone and refill_inactive_zone to
+ * try to shrink a slab:
+ * - if the shrink succeeds we return the number of pages freed
+ * - if it fails because the slab is in use, we return 0
+ * Called with pagemap_lru_lock held.
+ */
+int kmem_shrink_slab(struct page *page)
+{
+	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+	slab_t *slabp = GET_PAGE_SLAB(page);
+
+	spin_lock_irq(&cachep->spinlock);
+	if (!slabp->inuse) {
+		if (!cachep->growing) {
+			list_del(&slabp->list);
+			spin_unlock_irq(&cachep->spinlock);
+			kmem_slab_destroy(cachep, slabp);
+			return 1<<cachep->gfporder;
+		}
+	}
+	spin_unlock_irq(&cachep->spinlock);
+	return 0;
+}
+
+
 /**
  * kmem_cache_create - Create a cache.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
@@ -780,6 +867,8 @@
 		flags |= CFLGS_OPTIMIZE;
 
 	cachep->flags = flags;
+	cachep->pruner = NULL;
+	cachep->count = 0;
 	cachep->gfpflags = 0;
 	if (flags & SLAB_CACHE_DMA)
 		cachep->gfpflags |= GFP_DMA;
@@ -946,11 +1035,13 @@
 
 	drain_cpu_caches(cachep);
 
+	spin_lock(&pagemap_lru_lock);
 	spin_lock_irq(&cachep->spinlock);
 	__kmem_cache_shrink_locked(cachep);
 	ret = !list_empty(&cachep->slabs_full) ||
 		!list_empty(&cachep->slabs_partial);
 	spin_unlock_irq(&cachep->spinlock);
+	spin_unlock(&pagemap_lru_lock);
 	return ret;
 }
 
@@ -969,10 +1060,12 @@
 		BUG();
 
 	drain_cpu_caches(cachep);
-  
+ 
+	spin_lock(&pagemap_lru_lock);
 	spin_lock_irq(&cachep->spinlock);
 	ret = __kmem_cache_shrink_locked(cachep);
 	spin_unlock_irq(&cachep->spinlock);
+	spin_unlock(&pagemap_lru_lock);
 
 	return ret << cachep->gfporder;
 }
@@ -1163,6 +1256,14 @@
 	if (!(objp = kmem_getpages(cachep, flags)))
 		goto failed;
 
+	/* 
+	 * We need the pagemap_lru_lock - is there a way to wait here 
+	 * or could we just spinlock without deadlocking ???
+	 */
+	if (!(cachep->flags & SLAB_NO_REAP))
+		if (!spin_trylock(&pagemap_lru_lock))
+			goto opps1;
+
 	/* Get slab management. */
 	if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags)))
 		goto opps1;
@@ -1174,9 +1275,16 @@
 		SET_PAGE_CACHE(page, cachep);
 		SET_PAGE_SLAB(page, slabp);
 		PageSetSlab(page);
+		if (!(cachep->flags & SLAB_NO_REAP)) {
+			set_page_count(page, 1);
+			add_page_to_active_list(page);
+		}
 		page++;
 	} while (--i);
 
+	if (!(cachep->flags & SLAB_NO_REAP))
+		spin_unlock(&pagemap_lru_lock);
+
 	kmem_cache_init_objs(cachep, slabp, ctor_flags);
 
 	spin_lock_irqsave(&cachep->spinlock, save_flags);
@@ -1190,7 +1298,8 @@
 	spin_unlock_irqrestore(&cachep->spinlock, save_flags);
 	return 1;
 opps1:
-	kmem_freepages(cachep, objp);
+	/* do not use kmem_freepages - we are not in the lru yet... */      
+	free_pages((unsigned long)objp, cachep->gfporder);
 failed:
 	spin_lock_irqsave(&cachep->spinlock, save_flags);
 	cachep->growing--;
@@ -1255,6 +1364,7 @@
 		list_del(&slabp->list);
 		list_add(&slabp->list, &cachep->slabs_full);
 	}
+	kmem_touch_page(objp);
 #if DEBUG
 	if (cachep->flags & SLAB_POISON)
 		if (kmem_check_poison_obj(cachep, objp))
@@ -1816,6 +1926,7 @@
 
 	spin_lock_irq(&best_cachep->spinlock);
 perfect:
+	spin_lock(&pagemap_lru_lock);
 	/* free only 50% of the free slabs */
 	best_len = (best_len + 1)/2;
 	for (scan = 0; scan < best_len; scan++) {
@@ -1841,6 +1952,7 @@
 		kmem_slab_destroy(best_cachep, slabp);
 		spin_lock_irq(&best_cachep->spinlock);
 	}
+	spin_unlock(&pagemap_lru_lock);
 	spin_unlock_irq(&best_cachep->spinlock);
 	ret = scan * (1 << best_cachep->gfporder);
 out:
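
Another illustrative sketch (again not part of the patch, reusing the
hypothetical foo cache from the earlier sketch): since the object
allocation path above now touches slab pages with kmem_touch_page(),
a cache that wants the pages backing its hot objects to stay on the
active list can do the same from its own lookup path.  foo_list and
foo_lookup() are made-up names.

#include <linux/list.h>
#include <linux/slab.h>

struct foo {
	struct list_head	list;
	int			key;
};

static LIST_HEAD(foo_list);

static struct foo *foo_lookup(int key)
{
	struct list_head *p;

	list_for_each(p, &foo_list) {
		struct foo *f = list_entry(p, struct foo, list);
		if (f->key == key) {
			/* mark the slab page backing this object referenced,
			 * so page aging keeps a busy cache's pages active
			 * instead of handing them to kmem_shrink_slab */
			kmem_touch_page(f);
			return f;
		}
	}
	return NULL;
}
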
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	Wed May 29 07:35:03 2002
+++ b/mm/vmscan.c	Wed May 29 07:35:03 2002
@@ -137,6 +137,12 @@
 			goto found_page;
 		}
 
+		/* the page just has the flag set; it's not in any cache/slab */
+		if (PageSlab(page)) {
+			PageClearSlab(page);
+			goto found_page;
+		}
+
 		/* We should never ever get here. */
 		printk(KERN_ERR "VM: reclaim_page, found unknown page\n");
 		list_del(page_lru);
@@ -265,6 +271,10 @@
 		if (unlikely(TryLockPage(page)))
 			continue;
 
+		/* Slab pages should never get here... */
+		if (PageSlab(page))
+			BUG();
+
 		/*
 		 * The page is in active use or really unfreeable. Move to
 		 * the active list and adjust the page age if needed.
@@ -470,12 +480,14 @@
  * This function will scan a portion of the active list of a zone to find
  * unused pages, those pages will then be moved to the inactive list.
  */
+
 int refill_inactive_zone(struct zone_struct * zone, int priority)
 {
 	int maxscan = zone->active_pages >> priority;
 	int target = inactive_high(zone);
 	struct list_head * page_lru;
 	int nr_deactivated = 0;
+	int nr_freed = 0;
 	struct page * page;
 
 	/* Take the lock while messing with the list... */
@@ -507,7 +519,7 @@
 		 * both PG_locked and the pte_chain_lock are held.
 		 */
 		pte_chain_lock(page);
-		if (!page_mapping_inuse(page)) {
+		if (!page_mapping_inuse(page) && !PageSlab(page)) {
 			pte_chain_unlock(page);
 			UnlockPage(page);
 			drop_page(page);
@@ -524,6 +536,31 @@
 		}
 
 		/* 
+		 * For slab pages we count entries for caches with their
+		 * own pruning/aging method.  If we can count a page or
+		 * it's cold, we try to free it.  We only use one aging
+		 * method, otherwise we end up with caches holding lots
+		 * of free pages...  kmem_shrink_slab frees the slab and
+		 * moves its pages to the inactive clean list.
+		 */
+		if (PageSlab(page)) {
+			pte_chain_unlock(page);
+			UnlockPage(page);
+			if (kmem_count_page(page) || !page->age) {
+				int pages = kmem_shrink_slab(page);
+				if (!pages) {
+					list_del(page_lru);
+					list_add(page_lru, &zone->active_list);
+				} else {
+					nr_freed += pages;
+					if (nr_deactivated+nr_freed > target)
+						goto done; 
+				}
+			}
+			continue;
+		}
+
+		/* 
 		 * If the page age is 'hot' and the process using the
 		 * page doesn't exceed its RSS limit we keep the page.
 		 * Otherwise we move it to the inactive_dirty list.
@@ -533,7 +570,7 @@
 			list_add(page_lru, &zone->active_list);
 		} else {
 			deactivate_page_nolock(page);
-			if (++nr_deactivated > target) {
+			if (++nr_deactivated+nr_freed > target) {
 				pte_chain_unlock(page);
 				UnlockPage(page);
 				goto done;
@@ -553,9 +590,10 @@
 done:
 	spin_unlock(&pagemap_lru_lock);
 
-	return nr_deactivated;
+	return nr_freed;
 }
 
+
 /**
  * refill_inactive - checks all zones and refills the inactive list as needed
  *
@@ -620,24 +658,15 @@
 
 	/*
 	 * Eat memory from filesystem page cache, buffer cache,
-	 * dentry, inode and filesystem quota caches.
 	 */
 	ret += page_launder(gfp_mask);
-	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
-	ret += shrink_icache_memory(1, gfp_mask);
-#ifdef CONFIG_QUOTA
-	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
 
 	/*
-	 * Move pages from the active list to the inactive list.
-	 */
-	refill_inactive();
-
-	/* 	
-	 * Reclaim unused slab cache memory.
+	 * Move pages from the active list to the inactive list,
+	 * then prune the prunable caches, aging them. 
 	 */
-	ret += kmem_cache_reap(gfp_mask);
+	ret += refill_inactive();
+	kmem_do_prunes(gfp_mask);
 
 	refill_freelist();
 
@@ -646,11 +675,13 @@
 		run_task_queue(&tq_disk);
 
 	/*
-	 * Hmm.. Cache shrink failed - time to kill something?
+	 * Hmm.. - time to kill something?
 	 * Mhwahahhaha! This is the part I really like. Giggle.
 	 */
-	if (!ret && free_min(ANY_ZONE) > 0)
-		out_of_memory();
+	if (!ret && free_min(ANY_ZONE) > 0) {
+		if (!kmem_cache_reap(gfp_mask))
+			out_of_memory();
+	}
 
 	return ret;
 }
@@ -744,6 +775,7 @@
 
 			/* Do background page aging. */
 			background_aging(DEF_PRIORITY);
+			kmem_do_prunes(GFP_KSWAPD);
 		}
 
 		wakeup_memwaiters();

-----------------


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


end of thread, other threads:[~2002-05-29 12:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-18  4:10 [RFC][PATCH] using page aging to shrink caches Ed Tomlinson
2002-05-21 18:47 ` Benjamin LaHaise
2002-05-24 11:28   ` [RFC][PATCH] using page aging to shrink caches (pre8-ac5) Ed Tomlinson
2002-05-24 11:35     ` Christoph Hellwig
2002-05-24 12:14       ` Ed Tomlinson
2002-05-24 12:20         ` Christoph Hellwig
2002-05-24 11:42     ` William Lee Irwin III
2002-05-29 12:01     ` Ed Tomlinson
