From: Ed Tomlinson
To: linux-mm@kvack.org
Subject: Re: [RFC][PATCH] cache shrinking via page age
Date: Sun, 12 May 2002 09:49:12 -0400
Message-Id: <200205120949.13081.tomlins@cam.org>
In-Reply-To: <200205111614.29698.tomlins@cam.org>

On May 11, 2002 04:14 pm, Ed Tomlinson wrote:

One additional comment.  I have tried modifying kmem_cache_shrink_nr to
free only the number of pages seen by refill_inactive_zone.  That scheme
revives the original problem.  I think the issue is that the dentry and
inode caches often work in read-once mode (that is, each object in a
slab is used once...).  Without the more aggressive shrink in this patch,
the 'read once' slab pages upset the vm balance.

A data point: comparing this patch to my previous one, the inode/dentry
caches stabilize at about twice the size here.

Ed Tomlinson

> When running under low vm pressure, rmap does not shrink caches.  This
> happens since we only call do_try_to_free_pages when we have a shortage.
> On my box the combination of background_aging calling refill_inactive_zone
> is able to supply the pages needed.  The end result is that the box acts
> sluggish, with about half my memory used by slab pages (dcache/icache).
> This does correct itself under pressure, but it should never get into
> this state in the first place.
>
> Ideally we want all pages to be about the same age.  Having half the
> pages in the system 'cold' in the slab cache is not good - it implies
> the other pages are 'hotter' than they need to be.
>
> To fix the situation I move reapable slab pages onto the active list.
> When aging moves a page onto the inactive dirty list, I watch for slab
> pages and record the caches with old pages.  After
> refill_inactive/background_aging ends, I call a new function,
> kmem_call_shrinkers.  This scans the list of slab caches and, via a
> callback, shrinks caches with old pages.  Note that we never swap out
> slab pages; they just cycle through the active and inactive dirty lists.
>
> The end result is that slab caches are shrunk selectively when they
> have old 'cold' pages.  I avoid adding any magic numbers to the vm and
> create a generic interface that lets creators of slab caches supply the
> vm with a cache-specific method to shrink their caches.
>
> When testing this there is one side effect to remember: running
> cat /proc/slabinfo references slab pages, which tends to keep them
> warmer than they should be.  As in quantum theory, observing (too
> often) can change the results.
>
> I have tested on UP only - I think the locking is ok, though...
>
> Patch is against 2.4.19-pre7-ac2
>
> Comments?
> Ed Tomlinson
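To make the proposed interface concrete, here is a minimal sketch of what
the creator of a slab cache would write to plug in a cache-specific shrink
method.  All the my_* names are hypothetical illustrations; only
shrinker_t, kmem_set_shrinker() and kmem_cache_shrink_nr() come from the
patch below:

#include <linux/init.h>
#include <linux/slab.h>

static kmem_cache_t *my_cachep;

/* Hypothetical helper that frees unused objects so whole slabs can
 * become empty - roughly what prune_dcache() does for dentries. */
static void my_prune_objects(int priority);

/* Matches shrinker_t.  Called from kmem_call_shrinkers() only after the
 * vm has seen cold pages from this cache age off the active list. */
static int my_shrink(kmem_cache_t *cachep, int priority, int gfp_mask)
{
	my_prune_objects(priority);
	/* return the number of freed pages, as kmem_shrink_default() does */
	return kmem_cache_shrink_nr(cachep);
}

void __init my_cache_init(void)
{
	my_cachep = kmem_cache_create("my_cache", 64, 0,
				      SLAB_HWCACHE_ALIGN, NULL, NULL);
	if (!my_cachep)
		panic("cannot create my_cache");

	/* without this call the cache still gets kmem_shrink_default */
	kmem_set_shrinker(my_cachep, my_shrink);
}

The dcache, icache and dquot hooks in the patch follow exactly this shape,
with shrink_dcache_memory() and friends doing the pruning.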
> ------------
> # This is a BitKeeper generated patch for the following project:
> # Project Name: Linux kernel tree
> # This patch format is intended for GNU patch command version 2.5 or higher.
> # This patch includes the following deltas:
> #	ChangeSet		1.422 -> 1.428
> #	fs/dcache.c		1.18 -> 1.20
> #	mm/vmscan.c		1.60 -> 1.65
> #	include/linux/slab.h	1.9 -> 1.11
> #	mm/slab.c		1.16 -> 1.19
> #	fs/inode.c		1.35 -> 1.37
> #
> # The following is the BitKeeper ChangeSet Log
> # --------------------------------------------
> # 02/05/10	ed@oscar.et.ca	1.423
> # Use the vm's page aging to tell us when we need to shrink the caches.
> # The vm uses callbacks to tell the slab caches it's time to shrink.
> # --------------------------------------------
> # 02/05/10	ed@oscar.et.ca	1.424
> # Change the way process_shrinks is called so refill_invalid does not
> # need to be changed.
> # --------------------------------------------
> # 02/05/10	ed@oscar.et.ca	1.425
> # Remove debugging stuff
> # --------------------------------------------
> # 02/05/11	ed@oscar.et.ca	1.426
> # Simplify the scheme.  Use per-cache callbacks instead of per-family.
> # This lets us target specific caches instead of being generic.  We
> # still include a generic call (kmem_cache_reap) as a failsafe
> # before ooming.
> # --------------------------------------------
> # 02/05/11	ed@oscar.et.ca	1.427
> # Remove debugging printk
> # --------------------------------------------
> # 02/05/11	ed@oscar.et.ca	1.428
> # Change factoring, removing changes from background_aging and putting
> # the kmem_call_shrinkers call in kswapd.
> # --------------------------------------------
> #
> diff -Nru a/fs/dcache.c b/fs/dcache.c
> --- a/fs/dcache.c	Sat May 11 15:31:40 2002
> +++ b/fs/dcache.c	Sat May 11 15:31:40 2002
> @@ -1186,6 +1186,8 @@
>  	if (!dentry_cache)
>  		panic("Cannot create dentry cache");
>  
> +	kmem_set_shrinker(dentry_cache, (shrinker_t)kmem_shrink_dcache);
> +
>  #if PAGE_SHIFT < 13
>  	mempages >>= (13 - PAGE_SHIFT);
>  #endif
> @@ -1278,6 +1280,9 @@
>  			SLAB_HWCACHE_ALIGN, NULL, NULL);
>  	if (!dquot_cachep)
>  		panic("Cannot create dquot SLAB cache");
> +
> +	kmem_set_shrinker(dquot_cachep, (shrinker_t)kmem_shrink_dquota);
> +
>  #endif
>  
>  	dcache_init(mempages);
> diff -Nru a/fs/inode.c b/fs/inode.c
> --- a/fs/inode.c	Sat May 11 15:31:40 2002
> +++ b/fs/inode.c	Sat May 11 15:31:40 2002
> @@ -1173,6 +1173,8 @@
>  	if (!inode_cachep)
>  		panic("cannot create inode slab cache");
>  
> +	kmem_set_shrinker(inode_cachep, (shrinker_t)kmem_shrink_icache);
> +
>  	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
>  }
>  
> diff -Nru a/include/linux/slab.h b/include/linux/slab.h
> --- a/include/linux/slab.h	Sat May 11 15:31:40 2002
> +++ b/include/linux/slab.h	Sat May 11 15:31:40 2002
> @@ -55,6 +55,19 @@
>  			       void (*)(void *, kmem_cache_t *, unsigned long));
>  extern int kmem_cache_destroy(kmem_cache_t *);
>  extern int kmem_cache_shrink(kmem_cache_t *);
> +
> +typedef int (*shrinker_t)(kmem_cache_t *, int, int);
> +
> +extern void kmem_set_shrinker(kmem_cache_t *, shrinker_t);
> +extern int kmem_call_shrinkers(int, int);
> +extern void kmem_count_page(struct page *);
> +
> +/* shrink drivers */
> +extern int kmem_shrink_default(kmem_cache_t *, int, int);
> +extern int kmem_shrink_dcache(kmem_cache_t *, int, int);
> +extern int kmem_shrink_icache(kmem_cache_t *, int, int);
> +extern int kmem_shrink_dquota(kmem_cache_t *, int, int);
> +
>  extern int kmem_cache_shrink_nr(kmem_cache_t *);
>  extern void *kmem_cache_alloc(kmem_cache_t *, int);
>  extern void kmem_cache_free(kmem_cache_t *, void *);
> diff -Nru a/mm/slab.c b/mm/slab.c
> --- a/mm/slab.c	Sat May 11 15:31:40 2002
> +++ b/mm/slab.c	Sat May 11 15:31:40 2002
> @@ -213,6 +213,8 @@
>  	kmem_cache_t *slabp_cache;
>  	unsigned int growing;
>  	unsigned int dflags;	/* dynamic flags */
> +	shrinker_t shrinker;	/* shrink callback */
> +	int count;		/* count used to trigger shrink */
>  
>  	/* constructor func */
>  	void (*ctor)(void *, kmem_cache_t *, unsigned long);
> @@ -382,6 +384,69 @@
>  static void enable_cpucache (kmem_cache_t *cachep);
>  static void enable_all_cpucaches (void);
>  #endif
> +
> +/* set the shrink family and function */
> +void kmem_set_shrinker(kmem_cache_t * cachep, shrinker_t theshrinker)
> +{
> +	cachep->shrinker = theshrinker;
> +}
> +
> +/* used by refill_inactive_zone to determine caches that need shrinking */
> +void kmem_count_page(struct page *page)
> +{
> +	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
> +	cachep->count++;
> +}
> +
> +/* call the shrink family function */
> +int kmem_call_shrinkers(int priority, int gfp_mask)
> +{
> +	int ret = 0;
> +	struct list_head *p;
> +
> +	if (gfp_mask & __GFP_WAIT)
> +		down(&cache_chain_sem);
> +	else
> +		if (down_trylock(&cache_chain_sem))
> +			return 0;
> +
> +	list_for_each(p,&cache_chain) {
> +		kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
> +		if (cachep->count > 0) {
> +			if (cachep->shrinker == NULL)
> +				BUG();
> +			ret += (*cachep->shrinker)(cachep, priority, gfp_mask);
> +			cachep->count = 0;
> +		}
> +	}
> +	up(&cache_chain_sem);
> +	return ret;
> +}
> +
> +/* shrink methods */
> +int kmem_shrink_default(kmem_cache_t * cachep, int priority, int gfp_mask)
> +{
> +	return kmem_cache_shrink_nr(cachep);
> +}
> +
> +int kmem_shrink_dcache(kmem_cache_t * cachep, int priority, int gfp_mask)
> +{
> +	return shrink_dcache_memory(priority, gfp_mask);
> +}
> +
> +int kmem_shrink_icache(kmem_cache_t * cachep, int priority, int gfp_mask)
> +{
> +	return shrink_icache_memory(priority, gfp_mask);
> +}
> +
> +#if defined (CONFIG_QUOTA)
> +
> +int kmem_shrink_dquota(kmem_cache_t * cachep, int priority, int gfp_mask)
> +{
> +	return shrink_dqcache_memory(priority, gfp_mask);
> +}
> +
> +#endif
>  
>  /* Cal the num objs, wastage, and bytes left over for a given slab size. */
>  static void kmem_cache_estimate (unsigned long gfporder, size_t size,
> @@ -514,6 +579,8 @@
>  	 * vm_scan(). Shouldn't be a worry.
>  	 */
>  	while (i--) {
> +		if (!(cachep->flags & SLAB_NO_REAP))
> +			lru_cache_del(page);
>  		PageClearSlab(page);
>  		page++;
>  	}
> @@ -781,6 +848,8 @@
>  		flags |= CFLGS_OPTIMIZE;
>  
>  	cachep->flags = flags;
> +	cachep->shrinker = (shrinker_t)kmem_shrink_default;
> +	cachep->count = 0;
>  	cachep->gfpflags = 0;
>  	if (flags & SLAB_CACHE_DMA)
>  		cachep->gfpflags |= GFP_DMA;
> @@ -1184,6 +1253,8 @@
>  		SET_PAGE_CACHE(page, cachep);
>  		SET_PAGE_SLAB(page, slabp);
>  		PageSetSlab(page);
> +		if (!(cachep->flags & SLAB_NO_REAP))
> +			lru_cache_add(page);
>  		page++;
>  	} while (--i);
>  
> @@ -1903,6 +1974,7 @@
>  		unsigned long num_objs;
>  		unsigned long active_slabs = 0;
>  		unsigned long num_slabs;
> +		int ref;
>  		cachep = list_entry(p, kmem_cache_t, next);
>  
>  		spin_lock_irq(&cachep->spinlock);
> diff -Nru a/mm/vmscan.c b/mm/vmscan.c
> --- a/mm/vmscan.c	Sat May 11 15:31:40 2002
> +++ b/mm/vmscan.c	Sat May 11 15:31:40 2002
> @@ -102,6 +102,9 @@
>  			continue;
>  		}
>  
> +		if (PageSlab(page))
> +			BUG();
> +
>  		/* Page is being freed */
>  		if (unlikely(page_count(page)) == 0) {
>  			list_del(page_lru);
> @@ -244,7 +247,8 @@
>  		 * The page is in active use or really unfreeable. Move to
>  		 * the active list and adjust the page age if needed.
>  		 */
> -		if (page_referenced(page) && page_mapping_inuse(page) &&
> +		if (page_referenced(page) &&
> +		    (page_mapping_inuse(page) || PageSlab(page)) &&
>  		    !page_over_rsslimit(page)) {
>  			del_page_from_inactive_dirty_list(page);
>  			add_page_to_active_list(page);
> @@ -253,6 +257,12 @@
>  		}
>  
>  		/*
> +		 * Slab pages get shrunk in refill_inactive_zone
> +		 */
> +		if (PageSlab(page))
> +			continue;
> +
> +		/*
>  		 * Page is being freed, don't worry about it.
>  		 */
>  		if (unlikely(page_count(page)) == 0)
> @@ -446,6 +456,7 @@
>   * This function will scan a portion of the active list of a zone to find
>   * unused pages, those pages will then be moved to the inactive list.
>   */
> +
>  int refill_inactive_zone(struct zone_struct * zone, int priority)
>  {
>  	int maxscan = zone->active_pages >> priority;
> @@ -473,7 +484,7 @@
>  		 * bother with page aging. If the page is touched again
>  		 * while on the inactive_clean list it'll be reactivated.
>  		 */
> -		if (!page_mapping_inuse(page)) {
> +		if (!page_mapping_inuse(page) && !PageSlab(page)) {
>  			drop_page(page);
>  			continue;
>  		}
> @@ -497,8 +508,12 @@
>  			list_add(page_lru, &zone->active_list);
>  		} else {
>  			deactivate_page_nolock(page);
> -			if (++nr_deactivated > target)
> +			if (PageSlab(page))
> +				kmem_count_page(page);
> +			else {
> +				if (++nr_deactivated > target)
>  				break;
> +			}
>  		}
>  
>  		/* Low latency reschedule point */
> @@ -513,6 +528,7 @@
>  	return nr_deactivated;
>  }
>  
> +
> /**
>   * refill_inactive - checks all zones and refills the inactive list as needed
>   *
> @@ -577,24 +593,15 @@
>  
>  	/*
>  	 * Eat memory from filesystem page cache, buffer cache,
> -	 * dentry, inode and filesystem quota caches.
>  	 */
>  	ret += page_launder(gfp_mask);
> -	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
> -	ret += shrink_icache_memory(1, gfp_mask);
> -#ifdef CONFIG_QUOTA
> -	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
> -#endif
>  
>  	/*
> -	 * Move pages from the active list to the inactive list.
> +	 * Move pages from the active list to the inactive list and
> +	 * shrink caches, returning the pages gained by the shrink.
>  	 */
>  	refill_inactive();
> -
> -	/*
> -	 * Reclaim unused slab cache memory.
> -	 */
> -	ret += kmem_cache_reap(gfp_mask);
> +	ret += kmem_call_shrinkers(DEF_PRIORITY, gfp_mask);
>  
>  	refill_freelist();
>  
> @@ -603,11 +610,14 @@
>  	run_task_queue(&tq_disk);
>  
>  	/*
> -	 * Hmm.. Cache shrink failed - time to kill something?
> +	 * Hmm.. - time to kill something?
>  	 * Mhwahahhaha! This is the part I really like. Giggle.
>  	 */
> -	if (!ret && free_min(ANY_ZONE) > 0)
> -		out_of_memory();
> +	if (!ret && free_min(ANY_ZONE) > 0) {
> +		ret += kmem_cache_reap(gfp_mask);
> +		if (!ret)
> +			out_of_memory();
> +	}
>  
>  	return ret;
>  }
> @@ -700,6 +710,7 @@
>  
>  		/* Do background page aging. */
>  		background_aging(DEF_PRIORITY);
> +		kmem_call_shrinkers(DEF_PRIORITY, GFP_KSWAPD);
>  	}
>  
>  	wakeup_memwaiters();
> ------------
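For reference, the cycle the patch sets up, condensed into one annotated
fragment (a sketch of the control flow, not literal code from the diff):

/*
 * Life of a reapable slab page under this patch:
 *
 *  - kmem_cache_grow() adds the page with lru_cache_add(), so it ages
 *    like any other page.
 *  - refill_inactive_zone() deactivates it once it goes unreferenced;
 *    for slab pages this calls kmem_count_page(), which bumps the
 *    owning cache's count instead of the normal deactivation target.
 */
/* kswapd then does: */
	background_aging(DEF_PRIORITY);
	kmem_call_shrinkers(DEF_PRIORITY, GFP_KSWAPD);	/* only caches with
							   count > 0 shrink */
/*
 * Only when do_try_to_free_pages() reclaims nothing and a zone is still
 * below free_min does the blanket kmem_cache_reap() run, and only if
 * that also fails do we call out_of_memory().
 */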