From: Ed Tomlinson <tomlins@cam.org>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@zip.com.au>
Subject: Re: [RFC][PATCH] cache shrinking via page age
Date: Mon, 13 May 2002 22:38:31 -0400
Message-ID: <200205132238.31589.tomlins@cam.org>
In-Reply-To: <200205120949.13081.tomlins@cam.org>
Hi,
Andrew Morton pointed out that the kernel works with big (8M) pages and
that the hardware sets reference bits on those pages, which makes aging
slabs on 4K pages difficult... He suggested (among other things -
thanks) that setting the bits in kmem_cache_alloc would be a good start
to making aging happen. This version of the patch implements his
suggestion.
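For those skimming, the core of the change is this helper (quoted from
the patch below), which marks all of a slab's pages referenced; the
allocation path calls it whenever an object is handed out:

	/*
	 * Mark every 4K page backing this slab as referenced, so
	 * the VM's page aging sees the cache as active.
	 */
	void kmem_set_referenced(kmem_cache_t *cachep, slab_t *slabp)
	{
		if (!(cachep->flags & SLAB_NO_REAP)) {
			unsigned long i = 1 << cachep->gfporder;
			struct page *page =
				virt_to_page(slabp->s_mem - slabp->colouroff);

			while (i--) {
				SetPageReferenced(page);
				page++;
			}
		}
	}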
Comments?
Ed Tomlinson
On May 11, 2002 04:14 pm, Ed Tomlinson wrote:
> > When running under low VM pressure rmap does not shrink caches. This
> > happens since we only call do_try_to_free_pages when we have a shortage.
> > On my box the combination of background_aging calling
> > refill_inactive_zone is able to supply the pages needed. The end result
> > is that the box acts sluggish, with about half my memory used by slab
> > pages (dcache/icache). This does correct itself under pressure, but it
> > should never get into this state in the first place.
> >
> > Ideally we want all pages to be about the same age. Having half the pages
> > in the system 'cold' in the slab cache is not good - it implies the other
> > pages are 'hotter' than they need to be.
> >
> > To fix the situation I move reapable slab pages into the active list.
> > When aging moves a page into the inactive dirty list, I watch for slab
> > pages and record the caches with old pages. After
> > refill_inactive/background_aging ends I call a new function,
> > kmem_call_shrinkers. This scans the list of slab caches and, via a
> > callback, shrinks caches with old pages. Note that we never swap out
> > slab pages; they just cycle through the active and inactive dirty lists.
> >
> > The end result is that slab caches are shrunk selectively when they have
> > old 'cold' pages. I avoid adding any magic numbers to the VM and create
> > a generic interface to allow creators of slab caches to supply the VM
> > with a unique method to shrink their caches.
> >
> > When testing this there is one side effect to remember. Using cat
> > /proc/slabinfo references pages - this will tend to keep the slab pages
> > warmer than they should be. As in quantum theory, observing (too often)
> > can change the results.
>
> One additional comment. I have tried modifying kmem_cache_shrink_nr to
> free only the number of pages seen by refill_inactive_zone. This scheme
> revives the original problem. I think the issue is that, in essence, the
> dentry/inode caches often work in read-once mode (that is, each object in
> a slab is used once...). Without the more aggressive shrink in this patch
> the 'read once' slab pages upset the VM balance.
>
> A data point: comparing this patch to my previous one, the inode/dentry
> caches stabilize at about twice the size here.
>
> > I have tested on UP only - I think the locking is ok though...
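Before the patch itself, here is the shape of the interface it adds,
gathered in one place (a sketch using the names from the patch):

	/* A shrink callback: (cache, priority, gfp_mask) -> pages freed. */
	typedef int (*shrinker_t)(kmem_cache_t *, int, int);

	/* Cache owners register a callback once, at init time, e.g.: */
	kmem_set_shrinker(dentry_cache, (shrinker_t)kmem_shrink_dcache);

	/*
	 * Caches that register nothing keep the default callback, which
	 * just calls kmem_cache_shrink_nr(cachep). The VM invokes the
	 * callback from kmem_call_shrinkers() once page aging has seen
	 * at least a slab's worth (1 << gfporder) of cold pages in that
	 * cache.
	 */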
Patch is against 2.4.19-pre7-ac2
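The control flow this sets up is roughly:

	refill_inactive_zone()
	  -> deactivate_page_nolock() on a cold slab page
	     -> kmem_count_page()        /* tally cold pages per cache */
	kswapd / do_try_to_free_pages()
	  -> kmem_call_shrinkers()       /* caches over the threshold */
	     -> cachep->shrinker()       /* e.g. kmem_shrink_dcache() */

kmem_cache_reap() is kept only as a last resort before out_of_memory().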
--------------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.422 -> 1.432
# fs/dcache.c 1.18 -> 1.20
# mm/vmscan.c 1.60 -> 1.66
# include/linux/slab.h 1.9 -> 1.11
# mm/slab.c 1.16 -> 1.22
# fs/inode.c 1.35 -> 1.38
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/05/10 ed@oscar.et.ca 1.423
# Use the VM's page aging to tell us when we need to shrink the caches.
# The VM uses callbacks to tell the slab caches it's time to shrink.
# --------------------------------------------
# 02/05/10 ed@oscar.et.ca 1.424
# Change the way process_shrinks is called so refill_inactive does not
# need to be changed.
# --------------------------------------------
# 02/05/10 ed@oscar.et.ca 1.425
# Remove debugging stuff
# --------------------------------------------
# 02/05/11 ed@oscar.et.ca 1.426
# Simplify the scheme. Use per-cache callbacks instead of per-family ones.
# This lets us target specific caches instead of being generic. We
# still include a generic call (kmem_cache_reap) as a failsafe
# before ooming.
# --------------------------------------------
# 02/05/11 ed@oscar.et.ca 1.427
# Remove debugging printk
# --------------------------------------------
# 02/05/11 ed@oscar.et.ca 1.428
# Change factoring, removing changes from background_aging and putting
# the kmem_call_shrinkers call in kswapd.
# --------------------------------------------
# 02/05/12 ed@oscar.et.ca 1.429
# The icache is a slave of the dcache. We will not reuse the inodes, so
# let's clean them all.
# --------------------------------------------
# 02/05/12 ed@oscar.et.ca 1.430
# Only call the shrink callback if we have seen a slab's worth of pages
# --------------------------------------------
# 02/05/13 ed@oscar.et.ca 1.431
# Andrew Morton pointed out that kernel pages are big (8M) and the
# hardware reference bit works with these big pages. This makes
# aging slabs on 4K pages a little more difficult. Andrew suggested
# hooking into the kmem_cache_alloc process and setting the bit(s)
# there. This changeset does that.
# --------------------------------------------
# 02/05/13 ed@oscar.et.ca 1.432
# Cleanup debug stuff
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Mon May 13 22:27:31 2002
+++ b/fs/dcache.c Mon May 13 22:27:31 2002
@@ -1186,6 +1186,8 @@
if (!dentry_cache)
panic("Cannot create dentry cache");
+ kmem_set_shrinker(dentry_cache, (shrinker_t)kmem_shrink_dcache);
+
#if PAGE_SHIFT < 13
mempages >>= (13 - PAGE_SHIFT);
#endif
@@ -1278,6 +1280,9 @@
SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!dquot_cachep)
panic("Cannot create dquot SLAB cache");
+
+ kmem_set_shrinker(dquot_cachep, (shrinker_t)kmem_shrink_dquota);
+
#endif
dcache_init(mempages);
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Mon May 13 22:27:31 2002
+++ b/fs/inode.c Mon May 13 22:27:31 2002
@@ -722,7 +722,7 @@
if (!(gfp_mask & __GFP_FS))
return 0;
- count = inodes_stat.nr_unused / priority;
+ count = inodes_stat.nr_unused;
prune_icache(count);
return kmem_cache_shrink_nr(inode_cachep);
@@ -1172,6 +1172,8 @@
NULL);
if (!inode_cachep)
panic("cannot create inode slab cache");
+
+ kmem_set_shrinker(inode_cachep, (shrinker_t)kmem_shrink_icache);
unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
}
diff -Nru a/include/linux/slab.h b/include/linux/slab.h
--- a/include/linux/slab.h Mon May 13 22:27:31 2002
+++ b/include/linux/slab.h Mon May 13 22:27:31 2002
@@ -55,6 +55,19 @@
void (*)(void *, kmem_cache_t *, unsigned long));
extern int kmem_cache_destroy(kmem_cache_t *);
extern int kmem_cache_shrink(kmem_cache_t *);
+
+typedef int (*shrinker_t)(kmem_cache_t *, int, int);
+
+extern void kmem_set_shrinker(kmem_cache_t *, shrinker_t);
+extern int kmem_call_shrinkers(int, int);
+extern void kmem_count_page(struct page *);
+
+/* shrink drivers */
+extern int kmem_shrink_default(kmem_cache_t *, int, int);
+extern int kmem_shrink_dcache(kmem_cache_t *, int, int);
+extern int kmem_shrink_icache(kmem_cache_t *, int, int);
+extern int kmem_shrink_dquota(kmem_cache_t *, int, int);
+
extern int kmem_cache_shrink_nr(kmem_cache_t *);
extern void *kmem_cache_alloc(kmem_cache_t *, int);
extern void kmem_cache_free(kmem_cache_t *, void *);
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c Mon May 13 22:27:31 2002
+++ b/mm/slab.c Mon May 13 22:27:31 2002
@@ -213,6 +213,8 @@
kmem_cache_t *slabp_cache;
unsigned int growing;
unsigned int dflags; /* dynamic flags */
+ shrinker_t shrinker; /* shrink callback */
+ int count; /* count used to trigger shrink */
/* constructor func */
void (*ctor)(void *, kmem_cache_t *, unsigned long);
@@ -382,6 +384,69 @@
static void enable_cpucache (kmem_cache_t *cachep);
static void enable_all_cpucaches (void);
#endif
+
+/* set the shrink family and function */
+void kmem_set_shrinker(kmem_cache_t * cachep, shrinker_t theshrinker)
+{
+ cachep->shrinker = theshrinker;
+}
+
+/* used by refill_inactive_zone to determine caches that need shrinking */
+void kmem_count_page(struct page *page)
+{
+ kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+ cachep->count++;
+}
+
+/* call the shrink family function */
+int kmem_call_shrinkers(int priority, int gfp_mask)
+{
+ int ret = 0;
+ struct list_head *p;
+
+ if (gfp_mask & __GFP_WAIT)
+ down(&cache_chain_sem);
+ else
+ if (down_trylock(&cache_chain_sem))
+ return 0;
+
+ list_for_each(p,&cache_chain) {
+ kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
+ if (cachep->count >= (1<<cachep->gfporder)) {
+ if (cachep->shrinker == NULL)
+ BUG();
+ ret += (*cachep->shrinker)(cachep, priority, gfp_mask);
+ cachep->count = 0;
+ }
+ }
+ up(&cache_chain_sem);
+ return ret;
+}
+
+/* shrink methods */
+int kmem_shrink_default(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+ return kmem_cache_shrink_nr(cachep);
+}
+
+int kmem_shrink_dcache(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+ return shrink_dcache_memory(priority, gfp_mask);
+}
+
+int kmem_shrink_icache(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+ return shrink_icache_memory(priority, gfp_mask);
+}
+
+#if defined (CONFIG_QUOTA)
+
+int kmem_shrink_dquota(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+ return shrink_dqcache_memory(priority, gfp_mask);
+}
+
+#endif
/* Cal the num objs, wastage, and bytes left over for a given slab size. */
static void kmem_cache_estimate (unsigned long gfporder, size_t size,
@@ -514,12 +579,31 @@
* vm_scan(). Shouldn't be a worry.
*/
while (i--) {
+ if (!(cachep->flags & SLAB_NO_REAP))
+ lru_cache_del(page);
PageClearSlab(page);
page++;
}
free_pages((unsigned long)addr, cachep->gfporder);
}
+/*
+ * kernel pages are 8M so the 4K page ref bit is not set - we need to
+ * do it manually...
+ */
+void kmem_set_referenced(kmem_cache_t *cachep, slab_t *slabp)
+{
+ if (!(cachep->flags & SLAB_NO_REAP)) {
+ unsigned long i = (1<<cachep->gfporder);
+ struct page *page = virt_to_page(slabp->s_mem-slabp->colouroff);
+ while (i--) {
+ SetPageReferenced(page);
+ page++;
+ }
+ }
+}
+
+
#if DEBUG
static inline void kmem_poison_obj (kmem_cache_t *cachep, void *addr)
{
@@ -781,6 +865,8 @@
flags |= CFLGS_OPTIMIZE;
cachep->flags = flags;
+ cachep->shrinker = (shrinker_t)kmem_shrink_default;
+ cachep->count = 0;
cachep->gfpflags = 0;
if (flags & SLAB_CACHE_DMA)
cachep->gfpflags |= GFP_DMA;
@@ -1184,6 +1270,8 @@
SET_PAGE_CACHE(page, cachep);
SET_PAGE_SLAB(page, slabp);
PageSetSlab(page);
+ if (!(cachep->flags & SLAB_NO_REAP))
+ lru_cache_add(page);
page++;
} while (--i);
@@ -1265,6 +1353,7 @@
list_del(&slabp->list);
list_add(&slabp->list, &cachep->slabs_full);
}
+ kmem_set_referenced(cachep, slabp);
#if DEBUG
if (cachep->flags & SLAB_POISON)
if (kmem_check_poison_obj(cachep, objp))
@@ -1903,6 +1992,7 @@
unsigned long num_objs;
unsigned long active_slabs = 0;
unsigned long num_slabs;
+ int ref;
cachep = list_entry(p, kmem_cache_t, next);
spin_lock_irq(&cachep->spinlock);
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c Mon May 13 22:27:31 2002
+++ b/mm/vmscan.c Mon May 13 22:27:31 2002
@@ -102,6 +102,9 @@
continue;
}
+ if (PageSlab(page))
+ BUG();
+
/* Page is being freed */
if (unlikely(page_count(page)) == 0) {
list_del(page_lru);
@@ -244,7 +247,8 @@
* The page is in active use or really unfreeable. Move to
* the active list and adjust the page age if needed.
*/
- if (page_referenced(page) && page_mapping_inuse(page) &&
+ if (page_referenced(page) &&
+ (page_mapping_inuse(page) || PageSlab(page)) &&
!page_over_rsslimit(page)) {
del_page_from_inactive_dirty_list(page);
add_page_to_active_list(page);
@@ -253,6 +257,13 @@
}
/*
+ * SlabPages get shrunk in refill_inactive_zone. These pages
+ * are 'naked' - we do not want any other tests done on them...
+ */
+ if (PageSlab(page))
+ continue;
+
+ /*
* Page is being freed, don't worry about it.
*/
if (unlikely(page_count(page)) == 0)
@@ -446,6 +457,7 @@
* This function will scan a portion of the active list of a zone to find
* unused pages, those pages will then be moved to the inactive list.
*/
+
int refill_inactive_zone(struct zone_struct * zone, int priority)
{
int maxscan = zone->active_pages >> priority;
@@ -473,7 +485,7 @@
* bother with page aging. If the page is touched again
* while on the inactive_clean list it'll be reactivated.
*/
- if (!page_mapping_inuse(page)) {
+ if (!page_mapping_inuse(page) && !PageSlab(page)) {
drop_page(page);
continue;
}
@@ -497,8 +509,12 @@
list_add(page_lru, &zone->active_list);
} else {
deactivate_page_nolock(page);
- if (++nr_deactivated > target)
+ if (PageSlab(page))
+ kmem_count_page(page);
+ else {
+ if (++nr_deactivated > target)
break;
+ }
}
/* Low latency reschedule point */
@@ -513,6 +529,7 @@
return nr_deactivated;
}
+
/**
* refill_inactive - checks all zones and refills the inactive list as needed
*
@@ -577,24 +594,15 @@
/*
* Eat memory from filesystem page cache, buffer cache,
- * dentry, inode and filesystem quota caches.
*/
ret += page_launder(gfp_mask);
- ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
- ret += shrink_icache_memory(1, gfp_mask);
-#ifdef CONFIG_QUOTA
- ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
/*
- * Move pages from the active list to the inactive list.
+ * Move pages from the active list to the inactive list and
+ * shrink the caches; return the pages gained by shrinking.
*/
refill_inactive();
-
- /*
- * Reclaim unused slab cache memory.
- */
- ret += kmem_cache_reap(gfp_mask);
+ ret += kmem_call_shrinkers(DEF_PRIORITY, gfp_mask);
refill_freelist();
@@ -603,11 +611,14 @@
run_task_queue(&tq_disk);
/*
- * Hmm.. Cache shrink failed - time to kill something?
+ * Hmm.. - time to kill something?
* Mhwahahhaha! This is the part I really like. Giggle.
*/
- if (!ret && free_min(ANY_ZONE) > 0)
- out_of_memory();
+ if (!ret && free_min(ANY_ZONE) > 0) {
+ ret += kmem_cache_reap(gfp_mask);
+ if (!ret)
+ out_of_memory();
+ }
return ret;
}
@@ -700,6 +711,7 @@
/* Do background page aging. */
background_aging(DEF_PRIORITY);
+ kmem_call_shrinkers(DEF_PRIORITY, GFP_KSWAPD);
}
wakeup_memwaiters();
--------------