From: Ed Tomlinson
To: linux-mm@kvack.org
Subject: [RFC][PATCH] cache shrinking via page age
Date: Sat, 11 May 2002 16:14:29 -0400
Message-Id: <200205111614.29698.tomlins@cam.org>

Hi,

When running under low VM pressure, rmap does not shrink caches.  This
happens because we only call do_try_to_free_pages when we have a
shortage.  On my box the combination of background_aging calling
refill_inactive_zone is able to supply the pages needed.  The end
result is that the box acts sluggish, with about half my memory used
by slab pages (dcache/icache).  This does correct itself under
pressure, but it should never get into this state in the first place.

Ideally we want all pages to be about the same age.  Having half the
pages in the system 'cold' in the slab cache is not good - it implies
the other pages are 'hotter' than they need to be.

To fix the situation I move reapable slab pages into the active list.
When aging moves a page into the inactive dirty list I watch for slab
pages and record the caches with old pages.  After
refill_inactive/background_aging ends I call a new function,
kmem_call_shrinkers.  This scans the list of slab caches and, via a
callback, shrinks caches with old pages.  Note that we never swap out
slab pages; they just cycle through the active and inactive dirty
lists.

The end result is that slab caches are shrunk selectively, when they
have old 'cold' pages.  This avoids adding any magic numbers to the VM
and creates a generic interface that lets the creators of slab caches
supply the VM with a cache-specific method to shrink their caches.

When testing this there is one side effect to remember: using
cat /proc/slabinfo references pages, which will tend to keep the slab
pages warmer than they would otherwise be.  As in quantum theory,
watching (too often) can change the results.

I have tested on UP only - I think the locking is OK, though...

The patch is against 2.4.19-pre7-ac2.

Comments?

Ed Tomlinson
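To make the new interface concrete, here is a rough sketch of how the
creator of a slab cache would opt in, modelled on the dcache/icache
hookups in the patch below.  The names my_cache, my_shrink and
struct my_obj are hypothetical, and error handling is trimmed:

/* Hypothetical example of the new interface; my_cache, my_shrink and
 * struct my_obj are made-up names, not part of the patch. */
static kmem_cache_t *my_cache;

/* Called from kmem_call_shrinkers() once aging has seen 'cold' pages
 * belonging to this cache; returns the number of pages freed. */
static int my_shrink(kmem_cache_t *cachep, int priority, int gfp_mask)
{
	/* A cache-specific policy would go here; the generic reaper is
	 * what kmem_cache_create() installs by default anyway. */
	return kmem_cache_shrink_nr(cachep);
}

void __init my_cache_init(void)
{
	my_cache = kmem_cache_create("my_cache", sizeof(struct my_obj),
				     0, SLAB_HWCACHE_ALIGN, NULL, NULL);
	if (!my_cache)
		panic("Cannot create my_cache");

	/* Opt in to age-driven shrinking with a cache-specific callback. */
	kmem_set_shrinker(my_cache, my_shrink);
}

Since kmem_cache_create installs kmem_shrink_default for every cache,
the kmem_set_shrinker call is only needed by caches that want their
own policy.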
------------

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	ChangeSet	1.422 -> 1.428
#	fs/dcache.c	1.18 -> 1.20
#	mm/vmscan.c	1.60 -> 1.65
#	include/linux/slab.h	1.9 -> 1.11
#	mm/slab.c	1.16 -> 1.19
#	fs/inode.c	1.35 -> 1.37
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/05/10	ed@oscar.et.ca	1.423
# Use the vm's page aging to tell us when we need to shrink the caches.
# The vm uses callbacks to tell the slabs caches its time to shrink.
# --------------------------------------------
# 02/05/10	ed@oscar.et.ca	1.424
# Change the way process_shrinks is called so refill_invalid does not
# need to be changed.
# --------------------------------------------
# 02/05/10	ed@oscar.et.ca	1.425
# Remove debuging stuff
# --------------------------------------------
# 02/05/11	ed@oscar.et.ca	1.426
# Simplify the scheme.  Use per cache callbacks instead of per family.
# This lets us target specific caches instead of being generic.  We
# still include a generic call (kmem_cache_reap) as a failsafe
# before ooming.
# --------------------------------------------
# 02/05/11	ed@oscar.et.ca	1.427
# Remove debugging printk
# --------------------------------------------
# 02/05/11	ed@oscar.et.ca	1.428
# Change factoring, removing changes from background_aging and putting
# the kmem_call_shrinkers call in kswapd.
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c	Sat May 11 15:31:40 2002
+++ b/fs/dcache.c	Sat May 11 15:31:40 2002
@@ -1186,6 +1186,8 @@
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
 
+	kmem_set_shrinker(dentry_cache, (shrinker_t)kmem_shrink_dcache);
+
 #if PAGE_SHIFT < 13
 	mempages >>= (13 - PAGE_SHIFT);
 #endif
@@ -1278,6 +1280,9 @@
 					SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!dquot_cachep)
 		panic("Cannot create dquot SLAB cache");
+
+	kmem_set_shrinker(dquot_cachep, (shrinker_t)kmem_shrink_dquota);
+
 #endif
 
 	dcache_init(mempages);
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c	Sat May 11 15:31:40 2002
+++ b/fs/inode.c	Sat May 11 15:31:40 2002
@@ -1173,6 +1173,8 @@
 	if (!inode_cachep)
 		panic("cannot create inode slab cache");
 
+	kmem_set_shrinker(inode_cachep, (shrinker_t)kmem_shrink_icache);
+
 	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
 }
diff -Nru a/include/linux/slab.h b/include/linux/slab.h
--- a/include/linux/slab.h	Sat May 11 15:31:40 2002
+++ b/include/linux/slab.h	Sat May 11 15:31:40 2002
@@ -55,6 +55,19 @@
 				     void (*)(void *, kmem_cache_t *, unsigned long));
 extern int kmem_cache_destroy(kmem_cache_t *);
 extern int kmem_cache_shrink(kmem_cache_t *);
+
+typedef int (*shrinker_t)(kmem_cache_t *, int, int);
+
+extern void kmem_set_shrinker(kmem_cache_t *, shrinker_t);
+extern int kmem_call_shrinkers(int, int);
+extern void kmem_count_page(struct page *);
+
+/* shrink drivers */
+extern int kmem_shrink_default(kmem_cache_t *, int, int);
+extern int kmem_shrink_dcache(kmem_cache_t *, int, int);
+extern int kmem_shrink_icache(kmem_cache_t *, int, int);
+extern int kmem_shrink_dquota(kmem_cache_t *, int, int);
+
 extern int kmem_cache_shrink_nr(kmem_cache_t *);
 extern void *kmem_cache_alloc(kmem_cache_t *, int);
 extern void kmem_cache_free(kmem_cache_t *, void *);
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c	Sat May 11 15:31:40 2002
+++ b/mm/slab.c	Sat May 11 15:31:40 2002
@@ -213,6 +213,8 @@
 	kmem_cache_t *slabp_cache;
 	unsigned int growing;
 	unsigned int dflags;		/* dynamic flags */
+	shrinker_t shrinker;		/* shrink callback */
+	int count;			/* count used to trigger shrink */
 
 	/* constructor func */
 	void (*ctor)(void *, kmem_cache_t *, unsigned long);
@@ -382,6 +384,69 @@
 static void enable_cpucache (kmem_cache_t *cachep);
 static void enable_all_cpucaches (void);
 #endif
+
+/* set the shrink family and function */
+void kmem_set_shrinker(kmem_cache_t * cachep, shrinker_t theshrinker)
+{
+	cachep->shrinker = theshrinker;
+}
+
+/* used by refill_inactive_zone to determine caches that need shrinking */
+void kmem_count_page(struct page *page)
+{
+	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
+	cachep->count++;
+}
+
+/* call the shrink family function */
+int kmem_call_shrinkers(int priority, int gfp_mask)
+{
+	int ret = 0;
+	struct list_head *p;
+
+	if (gfp_mask & __GFP_WAIT)
+		down(&cache_chain_sem);
+	else
+		if (down_trylock(&cache_chain_sem))
+			return 0;
+
+	list_for_each(p,&cache_chain) {
+		kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
+		if (cachep->count > 0) {
+			if (cachep->shrinker == NULL)
+				BUG();
+			ret += (*cachep->shrinker)(cachep, priority, gfp_mask);
+			cachep->count = 0;
+		}
+	}
+	up(&cache_chain_sem);
+	return ret;
+}
+
+/* shrink methods */
+int kmem_shrink_default(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+	return kmem_cache_shrink_nr(cachep);
+}
+
+int kmem_shrink_dcache(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+	return shrink_dcache_memory(priority, gfp_mask);
+}
+
+int kmem_shrink_icache(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+	return shrink_icache_memory(priority, gfp_mask);
+}
+
+#if defined (CONFIG_QUOTA)
+
+int kmem_shrink_dquota(kmem_cache_t * cachep, int priority, int gfp_mask)
+{
+	return shrink_dqcache_memory(priority, gfp_mask);
+}
+
+#endif
 
 /* Cal the num objs, wastage, and bytes left over for a given slab size. */
 static void kmem_cache_estimate (unsigned long gfporder, size_t size,
@@ -514,6 +579,8 @@
 	 * vm_scan(). Shouldn't be a worry.
 	 */
 	while (i--) {
+		if (!(cachep->flags & SLAB_NO_REAP))
+			lru_cache_del(page);
 		PageClearSlab(page);
 		page++;
 	}
@@ -781,6 +848,8 @@
 		flags |= CFLGS_OPTIMIZE;
 
 	cachep->flags = flags;
+	cachep->shrinker = (shrinker_t)(kmem_shrink_default);
+	cachep->count = 0;
 	cachep->gfpflags = 0;
 	if (flags & SLAB_CACHE_DMA)
 		cachep->gfpflags |= GFP_DMA;
@@ -1184,6 +1253,8 @@
 		SET_PAGE_CACHE(page, cachep);
 		SET_PAGE_SLAB(page, slabp);
 		PageSetSlab(page);
+		if (!(cachep->flags & SLAB_NO_REAP))
+			lru_cache_add(page);
 		page++;
 	} while (--i);
 
@@ -1903,6 +1974,7 @@
 		unsigned long num_objs;
 		unsigned long active_slabs = 0;
 		unsigned long num_slabs;
+		int ref;
 		cachep = list_entry(p, kmem_cache_t, next);
 
 		spin_lock_irq(&cachep->spinlock);
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	Sat May 11 15:31:40 2002
+++ b/mm/vmscan.c	Sat May 11 15:31:40 2002
@@ -102,6 +102,9 @@
 			continue;
 		}
 
+		if (PageSlab(page))
+			BUG();
+
 		/* Page is being freed */
 		if (unlikely(page_count(page)) == 0) {
 			list_del(page_lru);
@@ -244,7 +247,8 @@
 		 * The page is in active use or really unfreeable. Move to
 		 * the active list and adjust the page age if needed.
 		 */
-		if (page_referenced(page) && page_mapping_inuse(page) &&
+		if (page_referenced(page) &&
+		    (page_mapping_inuse(page) || PageSlab(page)) &&
 		    !page_over_rsslimit(page)) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
@@ -253,6 +257,12 @@
 		}
 
 		/*
+		 * SlabPages get shrunk in refill_inactive_zone
+		 */
+		if (PageSlab(page))
+			continue;
+
+		/*
 		 * Page is being freed, don't worry about it.
 		 */
 		if (unlikely(page_count(page)) == 0)
@@ -446,6 +456,7 @@
  * This function will scan a portion of the active list of a zone to find
  * unused pages, those pages will then be moved to the inactive list.
 */
+
 int refill_inactive_zone(struct zone_struct * zone, int priority)
 {
 	int maxscan = zone->active_pages >> priority;
@@ -473,7 +484,7 @@
 		 * bother with page aging. If the page is touched again
 		 * while on the inactive_clean list it'll be reactivated.
 		 */
-		if (!page_mapping_inuse(page)) {
+		if (!page_mapping_inuse(page) && !PageSlab(page)) {
 			drop_page(page);
 			continue;
 		}
@@ -497,8 +508,12 @@
 			list_add(page_lru, &zone->active_list);
 		} else {
 			deactivate_page_nolock(page);
-			if (++nr_deactivated > target)
+			if (PageSlab(page))
+				kmem_count_page(page);
+			else {
+				if (++nr_deactivated > target)
 				break;
+			}
 		}
 
 		/* Low latency reschedule point */
@@ -513,6 +528,7 @@
 	return nr_deactivated;
 }
 
+
 /**
  * refill_inactive - checks all zones and refills the inactive list as needed
  *
@@ -577,24 +593,15 @@
 
 	/*
 	 * Eat memory from filesystem page cache, buffer cache,
-	 * dentry, inode and filesystem quota caches.
 	 */
 	ret += page_launder(gfp_mask);
-	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
-	ret += shrink_icache_memory(1, gfp_mask);
-#ifdef CONFIG_QUOTA
-	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
 
 	/*
-	 * Move pages from the active list to the inactive list.
+	 * Move pages from the active list to the inactive list and
+	 * shrink caches; return pages gained by shrink
 	 */
 	refill_inactive();
-
-	/*
-	 * Reclaim unused slab cache memory.
-	 */
-	ret += kmem_cache_reap(gfp_mask);
+	ret += kmem_call_shrinkers(DEF_PRIORITY, gfp_mask);
 
 	refill_freelist();
 
@@ -603,11 +610,14 @@
 	run_task_queue(&tq_disk);
 
 	/*
-	 * Hmm.. Cache shrink failed - time to kill something?
+	 * Hmm.. - time to kill something?
 	 * Mhwahahhaha! This is the part I really like. Giggle.
 	 */
-	if (!ret && free_min(ANY_ZONE) > 0)
-		out_of_memory();
+	if (!ret && free_min(ANY_ZONE) > 0) {
+		ret += kmem_cache_reap(gfp_mask);
+		if (!ret)
+			out_of_memory();
+	}
 
 	return ret;
 }
@@ -700,6 +710,7 @@
 
 		/* Do background page aging. */
 		background_aging(DEF_PRIORITY);
+		kmem_call_shrinkers(DEF_PRIORITY, GFP_KSWAPD);
 	}
 
 	wakeup_memwaiters();
------------
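For anyone who wants to play with the idea outside the kernel, the
trigger logic reduces to a small pattern: aging counts cold pages per
cache, and a later pass shrinks only the caches whose count is
non-zero, resetting the count each time.  A toy userspace model
(illustration only, not kernel code - all names here are made up):

#include <stdio.h>

/* Toy model of the mechanism above.  age_pass() mirrors
 * kmem_count_page() (aging counts cold pages per cache) and
 * call_shrinkers() mirrors kmem_call_shrinkers() (shrink only the
 * caches with a non-zero count, then reset the count). */
struct cache {
	const char *name;
	int count;				/* cold pages seen by aging */
	int (*shrinker)(struct cache *);	/* per-cache shrink policy */
};

static int shrink_default(struct cache *c)
{
	printf("reaping %d cold pages from %s\n", c->count, c->name);
	return c->count;	/* pretend every cold page was freed */
}

static struct cache caches[] = {
	{ "dentry_cache", 0, shrink_default },
	{ "inode_cache",  0, shrink_default },
};
#define NR_CACHES (sizeof(caches) / sizeof(caches[0]))

/* refill_inactive_zone() analogue: aging notices cold slab pages. */
static void age_pass(void)
{
	caches[0].count += 3;	/* pretend aging saw 3 cold dentry pages */
	caches[1].count += 1;
}

/* kmem_call_shrinkers() analogue: callback fires, count is reset. */
static int call_shrinkers(void)
{
	unsigned int i;
	int freed = 0;

	for (i = 0; i < NR_CACHES; i++) {
		if (caches[i].count > 0) {
			freed += caches[i].shrinker(&caches[i]);
			caches[i].count = 0;
		}
	}
	return freed;
}

int main(void)
{
	age_pass();
	printf("freed %d pages total\n", call_shrinkers());
	return 0;
}

The point of the pattern is that no thresholds are tuned anywhere:
pressure on a cache is inferred purely from what aging observed.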
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/