From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 10 Oct 2008 15:33:46 -0700
From: Andrew Morton
Subject: Re: vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
Message-Id: <20081010153346.e25b90f7.akpm@linux-foundation.org>
In-Reply-To: <20081010152540.79ed64cb.akpm@linux-foundation.org>
References: <200810081655.06698.nickpiggin@yahoo.com.au>
	<20081008185401.D958.KOSAKI.MOTOHIRO@jp.fujitsu.com>
	<20081010151701.e9e50bdb.akpm@linux-foundation.org>
	<20081010152540.79ed64cb.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path: 
To: kosaki.motohiro@jp.fujitsu.com, nickpiggin@yahoo.com.au,
	linux-mm@kvack.org, riel@redhat.com, lee.schermerhorn@hp.com
List-ID: 

On Fri, 10 Oct 2008 15:25:40 -0700 Andrew Morton wrote:

> On Fri, 10 Oct 2008 15:17:01 -0700
> Andrew Morton wrote:
> 
> > On Wed, 8 Oct 2008 19:03:07 +0900 (JST)
> > KOSAKI Motohiro wrote:
> > 
> > > Hi
> > > 
> > > Nick, Andrew, many thanks for the good advice.
> > > Your help sped up my investigation considerably.
> > > 
> > > 
> > > > This patch, like I said when it was first merged, has the problem that
> > > > it can cause large stalls when reclaiming pages.
> > > > 
> > > > I actually tried a similar thing myself a long time ago. The problem is
> > > > that after a long period of no reclaiming, your file pages can all end
> > > > up being active and referenced. When the first guy wants to reclaim a
> > > > page, it might have to scan through gigabytes of file pages before being
> > > > able to reclaim a single one.
> > > 
> > > I fully agree with this opinion.
> > > All pages staying on the active list is awful.
> > > 
> > > In addition, my measurements tell me this patch causes a latency
> > > regression on really heavy I/O workloads.
> > > 
> > > 2.6.27-rc8: Throughput 13.4231 MB/sec 4000 clients 4000 procs max_latency=1421988.159 ms
> > > + patch   : Throughput 12.0953 MB/sec 4000 clients 4000 procs max_latency=1731244.847 ms
> > > 
> > > 
> > > > While it would be really nice to be able to just lazily set PageReferenced
> > > > and nothing else in mark_page_accessed, and then do file page aging based
> > > > on the referenced bit, the fact is that we virtually have O(1) reclaim
> > > > for file pages now, and this can make it much more like O(n) (especially
> > > > in the worst case).
> > > > 
> > > > I don't think it is right to say "we broke aging and this patch fixes it".
> > > > It's all a big crazy heuristic. Who's to say that the previous behaviour
> > > > wasn't better and this patch breaks it? :)
> > > > 
> > > > Anyway, I don't think it is exactly productive to keep patches like this
> > > > in the tree (ones that don't ever seem intended to be merged) while there
> > > > are other big changes to reclaim there.
> > 
> > Well yes. I've been hanging onto these in the hope that someone would
> > work out whether they are changes which we should make.
> > 
> > > > Same for vm-dont-run-touch_buffer-during-buffercache-lookups.patch
> > > 
> > > I measured it too:
> > > 
> > > 2.6.27-rc8: Throughput 13.4231 MB/sec 4000 clients 4000 procs max_latency=1421988.159 ms
> > > + patch   : Throughput 11.8494 MB/sec 4000 clients 4000 procs max_latency=3463217.227 ms
> > > 
> > > dbench latency increased by about 2.5x.
> > > 
> > > So, the patch description already describes this risk:
> > > dropping metadata can decrease performance significantly.
> > > That risk has just now shown up, imho.
> > 
> > Oh well, that'll suffice, thanks - I'll drop them.
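For context, judging from the patch title and Nick's description above, the
heuristic being discussed amounts to something like this in
shrink_active_list()'s deactivation loop (a paraphrase for illustration, not
the actual patch text):

	while (!list_empty(&l_hold)) {
		cond_resched();
		page = lru_to_page(&l_hold);
		list_del(&page->lru);
		/*
		 * Referenced, active, unmapped pages get a second trip
		 * around the LRU instead of being deactivated.
		 */
		if (!page_mapped(page) &&
		    page_referenced(page, 0, sc->mem_cgroup)) {
			list_add(&page->lru, &l_active);
			continue;
		}
		list_add(&page->lru, &l_inactive);
	}

When every active file page is referenced, that test keeps succeeding and the
loop rotates page after page back onto l_active, which is exactly the
scan-gigabytes-to-reclaim-one-page stall described above.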
> Which means that after vmscan-split-lru-lists-into-anon-file-sets.patch,
> shrink_active_list() simply does
> 
> 	while (!list_empty(&l_hold)) {
> 		cond_resched();
> 		page = lru_to_page(&l_hold);
> 		list_del(&page->lru);
> 		list_add(&page->lru, &l_inactive);
> 	}
> 
> yes?
> 
> We might even be able to list_splice those pages..

OK, that wasn't a particularly good time to drop those patches.  Here's how
shrink_active_list() ended up:

static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
			struct scan_control *sc, int priority, int file)
{
	unsigned long pgmoved;
	int pgdeactivate = 0;
	unsigned long pgscanned;
	LIST_HEAD(l_hold);	/* The pages which were snipped off */
	LIST_HEAD(l_active);
	LIST_HEAD(l_inactive);
	struct page *page;
	struct pagevec pvec;
	enum lru_list lru;

	lru_add_drain();
	spin_lock_irq(&zone->lru_lock);
	pgmoved = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order,
					ISOLATE_ACTIVE, zone,
					sc->mem_cgroup, 1, file);
	/*
	 * zone->pages_scanned is used to detect a zone's oom;
	 * mem_cgroup remembers nr_scan by itself.
	 */
	if (scan_global_lru(sc)) {
		zone->pages_scanned += pgscanned;
		zone->recent_scanned[!!file] += pgmoved;
	}

	if (file)
		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
	else
		__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
	spin_unlock_irq(&zone->lru_lock);

	pgmoved = 0;
	while (!list_empty(&l_hold)) {
		cond_resched();
		page = lru_to_page(&l_hold);
		list_del(&page->lru);

		if (unlikely(!page_evictable(page, NULL))) {
			putback_lru_page(page);
			continue;
		}

		list_add(&page->lru, &l_inactive);

		if (!page_mapping_inuse(page)) {
			/*
			 * Bypass use-once, make the next access count. See
			 * mark_page_accessed and shrink_page_list.
			 */
			SetPageReferenced(page);
		}
	}

	/*
	 * Count the referenced pages as rotated, even when they are moved
	 * to the inactive list. This helps balance scan pressure between
	 * file and anonymous pages in get_scan_ratio.
	 */
	zone->recent_rotated[!!file] += pgmoved;

	/*
	 * Now put the pages back on the appropriate [file or anon] inactive
	 * and active lists.
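	 *
	 * (Note: pgmoved was reset to zero before the l_hold loop above and
	 * is never incremented inside it, so the recent_rotated update just
	 * above always adds zero - see the remark at the end of this
	 * message.)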
	 */
	pagevec_init(&pvec, 1);
	pgmoved = 0;
	lru = LRU_BASE + file * LRU_FILE;
	spin_lock_irq(&zone->lru_lock);
	while (!list_empty(&l_inactive)) {
		page = lru_to_page(&l_inactive);
		prefetchw_prev_lru_page(page, &l_inactive, flags);
		VM_BUG_ON(PageLRU(page));
		SetPageLRU(page);
		VM_BUG_ON(!PageActive(page));
		ClearPageActive(page);

		list_move(&page->lru, &zone->lru[lru].list);
		mem_cgroup_move_lists(page, lru);
		pgmoved++;
		if (!pagevec_add(&pvec, page)) {
			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
			spin_unlock_irq(&zone->lru_lock);
			pgdeactivate += pgmoved;
			pgmoved = 0;
			if (buffer_heads_over_limit)
				pagevec_strip(&pvec);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}
	}
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
	pgdeactivate += pgmoved;
	if (buffer_heads_over_limit) {
		spin_unlock_irq(&zone->lru_lock);
		pagevec_strip(&pvec);
		spin_lock_irq(&zone->lru_lock);
	}

	pgmoved = 0;
	lru = LRU_ACTIVE + file * LRU_FILE;
	while (!list_empty(&l_active)) {
		page = lru_to_page(&l_active);
		prefetchw_prev_lru_page(page, &l_active, flags);
		VM_BUG_ON(PageLRU(page));
		SetPageLRU(page);
		VM_BUG_ON(!PageActive(page));

		list_move(&page->lru, &zone->lru[lru].list);
		mem_cgroup_move_lists(page, lru);
		pgmoved++;
		if (!pagevec_add(&pvec, page)) {
			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
			pgmoved = 0;
			spin_unlock_irq(&zone->lru_lock);
			if (vm_swap_full())
				pagevec_swap_free(&pvec);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}
	}
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);

	__count_zone_vm_events(PGREFILL, zone, pgscanned);
	__count_vm_events(PGDEACTIVATE, pgdeactivate);
	spin_unlock_irq(&zone->lru_lock);
	if (vm_swap_full())
		pagevec_swap_free(&pvec);

	pagevec_release(&pvec);
}

Note the first `pgmoved = 0' there: nothing in the l_hold loop increments
pgmoved any more, so the recent_rotated accounting it feeds no longer does
anything.  erk.
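A minimal sketch of one way the l_hold loop could restore that accounting,
going by the "Count the referenced pages as rotated" comment (illustrative
only, not necessarily the fix that gets merged):

	pgmoved = 0;
	while (!list_empty(&l_hold)) {
		cond_resched();
		page = lru_to_page(&l_hold);
		list_del(&page->lru);

		if (unlikely(!page_evictable(page, NULL))) {
			putback_lru_page(page);
			continue;
		}

		/* page_referenced() also clears PageReferenced */
		if (page_mapping_inuse(page) &&
		    page_referenced(page, 0, sc->mem_cgroup))
			pgmoved++;	/* give recent_rotated a real count */

		list_add(&page->lru, &l_inactive);
	}
	zone->recent_rotated[!!file] += pgmoved;

That trades the lazy SetPageReferenced() use-once bypass for an explicit
page_referenced() walk, so it is not free; it is only meant to show where
the missing increment would go.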