From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <403FF15E.3040800@cyberone.com.au> Date: Sat, 28 Feb 2004 12:39:42 +1100 From: Nick Piggin MIME-Version: 1.0 Subject: Re: [RFC] VM batching patch problems? References: <403FDEAA.1000802@cyberone.com.au> <20040227165244.25648122.akpm@osdl.org> In-Reply-To: <20040227165244.25648122.akpm@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andrew Morton Cc: linux-mm@kvack.org List-ID: Andrew Morton wrote: >Nick Piggin wrote: > >>Hi, >>Here are a couple of things I think are wrong with balancing-batching >>patch. Comments? Maybe I'm wrong? >> >> >>1. kswapd always balances all zones. With your patch, balance_pgdat from >>kswapd will continue to balance all zones while any one of them is above >>the pages_high watermark. Strictly, this means it puts exactly equal >>pressure on all zones (disregarding problem #2). But it somewhat defeats >>the purpose of zones, which is to be able to selectively put more pressure >>on one area when needed. >> >>Example: 8MB cache in ZONE_DMA, 8000MB cache in ZONE_HIGHMEM and we'll >>assume pages are reclaimed from each zone at the same rate. Now suppose >>ZONE_DMA goes 10 pages under the watermark. kswapd will free 10000 pages >>from highmem while freeing the 10 ZONE_DMA pages. >> > >Maybe. It's still a relatively small amount of the zone. > > But that is with a relatively small amount of ZONE_DMA. Either way, kswapd will now bring ZONE_DMA above water 1000 times slower than it would if it didn't need to scan ZONE_HIGHMEM. Doing it this way has 0 benefits that I can see. >With an understanding of a problem it should be possible to construt a >testcase which demonstrates that problem, no? > > I can tell you that kswapd will be much less responsive in keeping small zones with free memory available in the presence of larger zones (eg ZONE_DMA vs ZONE_NORMAL, or ZONE_NORMAL vs ZONE_HIGHMEM). I guess this could be measured with a test case: keep track of total time each zone is under water. >>2. "batch pressure" is proportional to total size of the zone. It should >>be proportional to the size of the freeable LRU cache. >> >>Example: 1GB ZONE_NORMAL, 1GB ZONE_HIGHMEM. Both will have the same batch >>pressure, so the scanners will attempt to free the same amount of LRU cache >>per run from each zone. Now say ZONE_NORMAL is filled with 750MB of slab >>cache and pinned memory. The 250MB of LRU cache will be scanned at the >>same rate as the 1GB of highmem LRU cache. >> >> > >hmm, yes. But the increased ZONE_NORMAL scanning will cause increased slab >reclaim. Bear in mind that the batching does not balance the amount of >scanning scross the zones (maybe it should). It balances the amount of >reclaiming across the zones. So we can if we want incorporate the number >of reclaimed slab pages into the arithmetic. > No it doesn't increase ZONE_NORMAL scanning. The scanning is the same rate, but because ZONE_NORMAL is 1/4 the size, it has quadruple the pressure. If you don't like logic, just pretend it is pinned by mem_map and other things. Your batching patch is conceptually wrong and it adds complexity. And *reclaiming* across zones is a silly notion, it applies more pressure to zones with hotter pages. You want to balance *scanning*, and reclaim will take care of itself. This notion is embedded all through the scanner though, not just in your patch. I have ripped it out. > > > >>3. try_to_free_pages is now too lazy or sleepy. This seems to be what is >>causing the lowend kbuild problems. >> > >What makes you think that? > > Because if I make it sleep less and reclaim more, I drop kbuild time from 15 minutes with your scheme to 10 minutes. I don't know what more I can say. Your patch causes big regressions here in an area that I am trying to fix up. I am not making this up. >>I have a batch that addresses these problems and others. >> > >Sorry, I dropped most of your patches. I was reorganising things and it >just became impossible to work with them. One of them did eight unrelated >things! It's impossible to select different parts, to understand what the >different parts are doing, to instrument them, etc. > > OK most of them did one thing, but one did a number of pretty trivial things which I guess should have been broken out. >Some of the changes (such as in vm-tune-throttle) looked like random >empirical hacks. Without any description of what they're doing and of what >experimentation/testing/instrumentation went into them I'm not able to work >with them. > > I have been testing and posting results for the last month or two. vm-tune-throttle is actually one of those patches that does one thing, and it is documented exactly what it does at the top of the patch. I admit this is the one patch I wasn't able to measure an improvement from, but I like it anyway and I'll keep it. >So I picked out a couple of bits which were useful and obvious and dropped >the rest. Please, no more megahumongopatches. One concept per patch. > >I probably screwed up a few things and may have lost some bugfixes in the >process. > >The one patch which I don't particularly like is >vm-dont-rotate-active-list.patch. The theory makes sense but all that >mucking with false marker pages is a bit yuk. As I am unable to >demonstrate any improvement from that patch (slight regressions, in fact) >based on admittedly inexhaustive testing, I'd prefer that we be able to >demonstrate a good reason for bringing it back. > I already have, multiple times, which is why I sent it to you. I really have done quite a lot of testing. From the feedback I have got through your being in your tree I can say I'm on the right track with the patches, so I'll have to maintain my own rollup. I don't understand why your tests aren't showing any improvement. I know better than to question your testing methodology, but I don't believe the patches are so fragile that they only work for me. In fact, there has been quite a bit of feedback about how they help in real usage as well as benchmarks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: aart@kvack.org