On Tue, Apr 26, 2011 at 2:17 AM, KAMEZAWA Hiroyuki <
kamezawa.hiroyu@jp.fujitsu.com> wrote:

> At memory reclaim, we determine the number of pages to be scanned per
> zone as
>
>         (anon + file) >> priority.
>
> Assume
>
>         scan = (anon + file) >> priority.
>
> If scan < SWAP_CLUSTER_MAX, shrink_list() will be skipped for this
> priority, resulting in no scan. This has some problems.
>
>  1. This increases priority by 1 without any scan.
>     To be scanned at DEF_PRIORITY always, the amount of pages must be
>     larger than 512MB. If pages >> priority < SWAP_CLUSTER_MAX, the
>     amount is recorded and the scan is batched later. (But we lose 1
>     priority.) And if the amount of pages is smaller than 16MB, no
>     scan happens until priority == 0.
>
>  2. If zone->all_unreclaimable == true, the zone is scanned only when
>     priority == 0. So x86's ZONE_DMA will never be recovered until the
>     user of the pages frees memory by itself.
>
>  3. With memcg, the limit of memory can be small. A small memcg drops
>     to priority < DEF_PRIORITY - 2 very easily and then needs to call
>     wait_iff_congested(). To get any scan before priority == 9, 64MB
>     of memory has to be in use.
>
> This patch tries to scan SWAP_CLUSTER_MAX pages by force when
>
>  1. the target is small enough, and
>  2. it's kswapd or memcg reclaim.
>
> Then we can avoid the rapid priority drop and may be able to recover
> all_unreclaimable in small zones.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/vmscan.c |   31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> Index: memcg/mm/vmscan.c
> ===================================================================
> --- memcg.orig/mm/vmscan.c
> +++ memcg/mm/vmscan.c
> @@ -1737,6 +1737,16 @@ static void get_scan_count(struct zone *
>         u64 fraction[2], denominator;
>         enum lru_list l;
>         int noswap = 0;
> +       int may_noscan = 0;
> +
> +

extra line?

> +       anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
> +               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
> +       file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
> +               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
> +
> +       if (((anon + file) >> priority) < SWAP_CLUSTER_MAX)
> +               may_noscan = 1;
>
>         /* If we have no swap space, do not bother scanning anon pages. */
>         if (!sc->may_swap || (nr_swap_pages <= 0)) {
> @@ -1747,11 +1757,6 @@ static void get_scan_count(struct zone *
>                 goto out;
>         }
>
> -       anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
> -               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
> -       file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
> -               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
> -
>         if (scanning_global_lru(sc)) {
>                 free = zone_page_state(zone, NR_FREE_PAGES);
>                 /* If we have very few page cache pages,
> @@ -1814,10 +1819,26 @@ out:
>                 unsigned long scan;
>
>                 scan = zone_nr_lru_pages(zone, sc, l);
> +

extra line?

>                 if (priority || noswap) {
>                         scan >>= priority;
>                         scan = div64_u64(scan * fraction[file],
>                                          denominator);
>                 }
> +
> +               if (!scan &&
> +                   may_noscan &&
> +                   (current_is_kswapd() || !scanning_global_lru(sc))) {
> +                       /*
> +                        * If we do a target scan, the whole amount of
> +                        * memory can be too small to scan with a low
> +                        * priority value. That raises priority rapidly
> +                        * without any scan. Avoid that and give some scan.
> +                        */
> +                       if (file)
> +                               scan = SWAP_CLUSTER_MAX;
> +                       else if (!noswap &&
> +                                (fraction[anon] > fraction[file] * 16))
> +                               scan = SWAP_CLUSTER_MAX;
> +               }

OK, so we are changing global kswapd, and both per-memcg background and
direct reclaim. Just to be clear here.

Also, how did we calculate the "16" used as the anon vs. file fraction
threshold?
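To make the question concrete, here is a quick userspace sketch of when
the new branch fires, assuming fraction[] holds the anon/file reclaim
weights that get_scan_count() derives from swappiness and the
recent_scanned/recent_rotated ratios (my reading of the code; the
weights below are made up):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL

int main(void)
{
        /* Made-up file reclaim weight; real fraction[] values come
         * from swappiness and the recent_scanned/recent_rotated
         * ratios. */
        unsigned long long fraction_file = 50;

        for (unsigned long long mult = 8; mult <= 32; mult *= 2) {
                unsigned long long fraction_anon = fraction_file * mult;
                unsigned long scan = 0; /* already rounded down to 0 */

                /* the proposed rescue branch for anon with !noswap */
                if (!scan && fraction_anon > fraction_file * 16)
                        scan = SWAP_CLUSTER_MAX;

                printf("anon weight = %2llu x file -> scan = %lu\n",
                       mult, scan);
        }
        return 0;
}

If that reading is right, anon is only force-scanned when its weight is
strictly more than 16x the file weight (the exact 16x case still gets
scan == 0), and I would like to understand where that ratio comes from.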
>                 nr[l] = nr_scan_try_batch(scan,
>                                           &reclaim_stat->nr_saved_scan[l]);
>         }

Thank you

--Ying
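P.S. To sanity-check the size thresholds quoted in the changelog, here
is a quick userspace calculation (my own arithmetic, assuming 4KiB
pages and SWAP_CLUSTER_MAX == 32; not part of the patch):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL
#define DEF_PRIORITY     12

int main(void)
{
        unsigned long mbs[] = { 16, 64, 512 };

        for (int i = 0; i < 3; i++) {
                unsigned long pages = mbs[i] << 8; /* MB -> 4KiB pages */
                int prio;

                /* find the first priority, counting down from
                 * DEF_PRIORITY as reclaim does, where the plain shift
                 * reaches SWAP_CLUSTER_MAX */
                for (prio = DEF_PRIORITY; prio >= 0; prio--)
                        if ((pages >> prio) >= SWAP_CLUSTER_MAX)
                                break;
                printf("%4luMB (%6lu pages): first scanned at priority %d\n",
                       mbs[i], pages, prio);
        }
        return 0;
}

This prints priority 12 (DEF_PRIORITY) for 512MB and priority 9 for
64MB, which matches the changelog's numbers.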