From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <464C48F1.3060903@shadowen.org> Date: Thu, 17 May 2007 13:22:09 +0100 From: Andy Whitcroft MIME-Version: 1.0 Subject: Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0 References: <464AC00E.10704@yahoo.com.au> <464ACA68.2040707@yahoo.com.au> <464AF8DB.9030000@yahoo.com.au> <20070516135039.GA7467@skynet.ie> <464B0F81.2090103@yahoo.com.au> <20070516153215.GB10225@skynet.ie> <464B26E8.3060404@yahoo.com.au> <20070516164631.GD10225@skynet.ie> <464BFF9D.809@yahoo.com.au> In-Reply-To: <464BFF9D.809@yahoo.com.au> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Nick Piggin Cc: Mel Gorman , Nicolas Mailhot , Christoph Lameter , akpm@linux-foundation.org, Linux Memory Management List List-ID: Nick Piggin wrote: > Mel Gorman wrote: >> On (17/05/07 01:44), Nick Piggin didst pronounce: > >>>> If the watermark was totally ignored with the second patch, I would >>>> understand >>>> but they are still obeyed. Even if it is an ALLOC_HIGH or ALLOC_HARDER >>>> allocation, the watermarks are obeyed for order-0 so memory does not >>>> get >>>> exhausted as that could cause a host of problems. The difference is >>>> if this >>>> is a HIGH or HARDER allocation and the memory can be granted without >>>> going >>>> belong the order-0 watermarks, it'll succeed. Would it be better if the >>>> lack of ALLOC_CPUSET was used to determine when only order-0 watermarks >>>> should be obeyed? >>> >>> But I don't know why you want to disobey higher order watermarks in the >>> first place. >> >> >> Because the original problem was bio_alloc() allocations failing and >> the OOM >> log showed that the higher-order pages were available. Patch 2 >> addressed it >> by succeeding these allocations if the min watermark was not breached >> with the >> knowledge that kswapd was awake and reclaiming at the relevant order. >> I think >> it may even have solved it without the kswapd change but the kswapd >> change >> seemed sensible. > > But that just breaks the watermarks. > > It could be that the actual values of the watermarks as they are now are > not very good ones, which is where the problem is coming from. > > >>> *Those* are exactly the things that are going to be helpful >>> to fix this problem of atomic higher order allocations failing or non >>> atomic ones going into direct reclaim. >>> >> >> >> And the intention was that non-atomic ones would go into direct reclaim >> after kicking kswapd but the atomic allocations would at least >> succeeed if >> the memory was there as long as they don't totally mess up watermarks. > > But we have 3 levels of watermarks, so you can keep a reserve for atomic > allocations _and_ a buffer between the reclaim watermark and the direct > reclaim watermark. > > >>>> Raising watermarks is no guarantee that a high-order allocation that >>>> can sleep >>>> will occur at the right time to kick kswapd awake and that it'll get >>>> back from >>>> whatever it's doing in time to spot the new order and start >>>> reclaiming again. >>> >>> You don't *need* a higher order allocation that can sleep in order >>> to kick kswapd. Crikey, I keep saying this. >>> >> >> >> Indeed, we seem to have got stuck in a loop of sorts. >> >> I understand that kswapd gets kicked awake either way but there must be a >> timing issue. Lets say we had a situations like >> >> order-0 alloc >> watermark hit => wake kswapd >> order-0 alloc kswapd reclaiming order 0 >> order-0 alloc kswapd reclaiming order 0 >> order-3 alloc => kick kswap for order 3 >> order-0 alloc kswapd reclaiming order 0 >> order-3 alloc kswapd reclaiming order 0 >> order-3 alloc kswapd reclaiming order 0 >> order-3 alloc => highorder mark hit, fail >> >> kswapd will keep reclaiming at order-0 until it completes a reclaim cycle >> and spots the new order and start over again. So there is a potentially >> sizable window there where problems can hit. Right? > > Take a look at the code. wakeup_kswapd and __alloc_pages. > > First, assume the zone is above high watermarks for order-0 and order-1. > order-0 allocs... > order-1 low watermark hit => don't care, not allocing order-1 > order-0 low watermark hit => wake kswapd reclaim order 0 > order-1 alloc => wakeup_kswapd raises kswapd_max_order to 1 > order-1 allocs continue to succeed until the min watermark is hit > order-1 *atomic* allocs continue until the atomic reserve is hit > order-1 memalloc allocs continue until no more order-1 pages left. This represents the ideal. However we never consider the reserves at order-1 unless we get an order-1 allocation. With lots of order-0 allocations (the norm) we can run the order-1 availability well below even the atomic reserve without anyone noticing, while the total reserve is above the order-0 low watermark. Here kswapd has been idle as there is only order-0 activity and we have sufficient of those. THEN an order-1 comes in, we are below the order-1 low watermarks, we wake kswapd, and retry and discover we are below the atomic threshold and _fail_ the allocation. > > There really is (or should be) a proper watermarking system in place that > provides the right buffering for higher order allocations. I think that this is should be, not is. >>> Working out why it apparently isn't working, first. Then maybe look at >>> raising watermarks (they get reduced fairly rapidly as the order >>> increases, >>> so it might just be that there is not enough at order-3). >>> >> >> >> I believe it failed to work due to a combination of kswapd reclaiming at >> the wrong order for a while and the fact that the watermarks are pretty >> agressive when it comes to higher orders. I'm trying to think of >> alternative fixes but keep coming back to the current fix using >> !(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if >> the memory is there and above min watermarks at order-0. > > kswapd reclaiming at the wrong order should be a bug. It should start > reclaiming at the right order as soon as an allocation (atomic or not) > goes through the "start reclaiming now" watermark. > > Now this is just looking at mainline code that has the kswapd_max_order, > and kswapd doesn't actually reclaim "at" any order -- it just uses the > kswapd_max_order to know when the required "stop reclaiming now" marks > have been hit. If lumpy reclaim is not reclaiming at the right order, > then it means it isn't refreshing from kswapd_max_order enough. Yes I believe all of this is working as designed. The problem is that we treat order-0 and order-1 allocations as independant. We do not take into account that we split order-1's to make order-0. We do not check the order-1 reserve for order 0 and so wake kswapd early enough. It is very hard given the interdependant nature if the current calculation to detect transitions at _other_ orders when we allocate at any specific order. Hmmmmmm. -apw -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org