From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 24 Nov 2008 12:53:35 -0800
From: Andrew Morton
Subject: Re: [PATCH] vmscan: bail out of page reclaim after swap_cluster_max pages
Message-Id: <20081124125335.556c2a60.akpm@linux-foundation.org>
In-Reply-To: <20081124145057.4211bd46@bree.surriel.com>
References: <20081124145057.4211bd46@bree.surriel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Rik van Riel
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, mel@csn.ul.ie, kosaki.motohiro@jp.fujitsu.com
List-ID:

On Mon, 24 Nov 2008 14:50:57 -0500 Rik van Riel wrote:

> Sometimes the VM spends the first few priority rounds rotating back
> referenced pages and submitting IO. Once we get to a lower priority,
> sometimes the VM ends up freeing way too many pages.

It would help (a lot) if we had a much more specific and detailed
description of the problem which is being fixed. Nobody has noticed it
in half a decade, so it can't be very serious?

> The fix is relatively simple: in shrink_zone() we can check how many
> pages we have already freed, direct reclaim tasks break out of the
> scanning loop if they have already freed enough pages and have reached
> a lower priority level.

So in the common scenario where there's a lot of dirty highmem and
little dirty lowmem, the kernel will start reclaiming highmem at a
vastly higher rate than lowmem. iirc, this was the reason why this
change was tried then reverted.

Please demonstrate that this regression is not worse than the problem
which is being fixed!

> However, in order to do this we do need to know how many pages we already
> freed, so move nr_reclaimed into scan_control.

Thus carrying the state across the *entire* scanning pass: all zones.

So as soon as sc.nr_reclaimed exceeds swap_cluster_max, the scanner
will fall into a different mode for the remaining zones wherein it will
scan only swap_cluster_max pages from them, then will bale.

This will heavily bias scanning onto the zones at the start of the zone
list. In fact it probably means that the zone at the head of the
zonelist gets thrashed and the remaining zones will just sit there doing
almost nothing. Where's the sense in that?

Has any testing been done to demonstrate and quantify this effect?

> Signed-off-by: Rik van Riel
> ---
> Kosaki, this should address the zone scanning pressure issue.

What is the "zone scanning pressure issue"?

Please don't put "should" in a vmscan changelog :( Either it does, or
it does not?

This should look familiar:

  commit e468e46a9bea3297011d5918663ce6d19094cf87
  Author: akpm
  Date: Thu Jun 24 15:53:52 2004 +0000

    [PATCH] vmscan.c: dont reclaim too many pages

    The shrink_zone() logic can, under some circumstances, cause far too many
    pages to be reclaimed. Say, we're scanning at high priority and suddenly
    hit a large number of reclaimable pages on the LRU.

    Change things so we bale out when SWAP_CLUSTER_MAX pages have been
    reclaimed.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    BKrev: 40daf910sac4yN_aUhhJF2U8Upx1ww

And here is where it was reverted. Note that this was nearly two years
later! It takes that long for these things to be discovered, analysed
and fixed.

  commit 210fe530305ee50cd889fe9250168228b2994f32
  Author: Andrew Morton
  Date: Fri Jan 6 00:11:14 2006 -0800

    [PATCH] vmscan: balancing fix

    Revert a patch which went into 2.6.8-rc1. The changelog for that patch was:

      The shrink_zone() logic can, under some circumstances, cause far too many
      pages to be reclaimed. Say, we're scanning at high priority and suddenly
      hit a large number of reclaimable pages on the LRU.

      Change things so we bale out when SWAP_CLUSTER_MAX pages have been
      reclaimed.

    Problem is, this change caused significant imbalance in inter-zone scan
    balancing by truncating scans of larger zones. Suppose, for example,
    ZONE_HIGHMEM is 10x the size of ZONE_NORMAL. The zone balancing algorithm
    would require that if we're scanning 100 pages of ZONE_HIGHMEM, we should
    scan 10 pages of ZONE_NORMAL. But this logic will cause the scanning of
    ZONE_HIGHMEM to bale out after only 32 pages are reclaimed. Thus effectively
    causing smaller zones to be scanned relatively harder than large ones.

    Now I need to remember what the workload was which caused me to write this
    patch originally, then fix it up in a different way...

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
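
The arithmetic in the revert changelog above can be played out in a toy
model. This is only a sketch, not kernel code, and it models the 2004
per-zone bail-out rather than the cross-zone sc.nr_reclaimed behaviour
of the patch under review. The names scan_zone() and scan_pass(), the
zone sizes, the divisor of 100 and the assumption that every scanned
page is reclaimable are all hypothetical, chosen to reproduce the
100-versus-10 example. Only SWAP_CLUSTER_MAX = 32 comes from the
changelog.

```python
SWAP_CLUSTER_MAX = 32  # bail-out threshold quoted in the changelog


def scan_zone(target, bail_out):
    """Scan one zone in SWAP_CLUSTER_MAX-sized batches; return pages scanned.

    Toy assumption: every scanned page is reclaimable, i.e. the
    "suddenly hit a large number of reclaimable pages" case.
    """
    scanned = reclaimed = 0
    while scanned < target:
        batch = min(SWAP_CLUSTER_MAX, target - scanned)
        scanned += batch
        reclaimed += batch
        # The bail-out under discussion: stop once enough pages are freed.
        if bail_out and reclaimed >= SWAP_CLUSTER_MAX:
            break
    return scanned


def scan_pass(zone_sizes, divisor, bail_out):
    """Give each zone a scan target proportional to its size, then scan it."""
    return {
        name: scan_zone(size // divisor, bail_out)
        for name, size in zone_sizes.items()
    }


if __name__ == "__main__":
    # Hypothetical sizes: ZONE_HIGHMEM is 10x ZONE_NORMAL, as in the changelog.
    zones = {"ZONE_HIGHMEM": 10000, "ZONE_NORMAL": 1000}
    for bail_out in (False, True):
        scanned = scan_pass(zones, 100, bail_out)
        ratio = scanned["ZONE_HIGHMEM"] / scanned["ZONE_NORMAL"]
        print(f"bail_out={bail_out}: {scanned} highmem/normal={ratio:.1f}")
```

Under these assumptions the highmem:normal scan ratio is 100:10 without
the bail-out and 32:10 with it. The smaller zone is therefore scanned
roughly three times harder, relative to its size, than balancing would
call for.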