From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Tue, 13 Jun 2000 23:49:19 +0200 (CEST) From: Andrea Arcangeli Subject: Re: [patch] improve streaming I/O [bug in shrink_mmap()] In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Rik van Riel Cc: "Stephen C. Tweedie" , Zlatko Calusic , alan@redhat.com, Linux MM List , Linux Kernel List List-ID: On Tue, 13 Jun 2000, Rik van Riel wrote: >On Tue, 13 Jun 2000, Andrea Arcangeli wrote: >> On Mon, 12 Jun 2000, Stephen C. Tweedie wrote: >> >> >Nice --- it might also explain some of the excessive kswap CPU >> >utilisation we've seen reported now and again. >> >> You have more kswapd load for sure due the strict zone approch. >> It maybe not noticeable but it's real. > >Theoretically it's real, but having a certain number of free pages >around in the normal zone so we can do eg. process struct allocations >and slab allocations from there is well worth it. You may want to >closely re-read Linus' response to your classzone proposal some >weeks ago. I read all Linus's reply and I'm not missing anything. >> I think Linus's argument about the above scenario is simply that >> the above isn't going to happen very often, but how can I ignore >> this broken behaviour? I hate code that works in the common case >> but that have drawbacks in the corner case. > >Let me summarise the drawbacks of classzone and the strict zone >approach: > >Strict zone approach: >- use slightly more memory, on the order of maybe 1 or 2% >- use slightly more kswapd cpu time since the free page goals > are stricter > >Classzone: >- can easily run out of 2- and 4-page contiguous areas of > free memory in the normal zone, leading to the need to > do allocation of task_structs and most slab caches from > the dma zone This is a very very red herring. Take 2.4.0-test1-aclatest and assume you do some I/O and you fill all the normal zone (except the latest pages_min of course) in the cache. Then you fork a task and you fallback in the DMA zone that is completly free and the memory for the task_struct got allocated from the DMA zone also with your design! Also the memory you take free from the normal zone for this purpose is at max pages_high-pages_min that is very low margin that you can trivially throw away if you are doing some I/O and you happen to allocate it from the cache. Then you won't have any margin anymore and you'll allocate all the persistent stuff from the zone DMA. >- this in turn will lead to the dma zone being less reliable > when we need to allocate dma pages, [..] Previous point was wrong and that can happen also with current kernel. The fact is that the kernel memory currently can't be relocated and thus you can't do anything to solve this problem except to avoid to allocate there anything that can't be relocated and then you could fail kernel allocations even if you still have 16mbyte of free ram. > [..] or to a fork() failing > with out of memory once we have a lot of processes on very > big systems What the fork have to do with this issue? classzone patch will take enough memory free in the classzone so that it's likely there are two contigous pages thus this point is completly irrelevant with regard to zone/classzone design. >Here you'll see that both systems have their advantages and >disadvantages. The zoned approach has a few (minimal) performance As far I can tell the only disavantage of classzone is that the spinlock have to be per-node and you have to keep collected the information about the classzone while allocating and freeing the pages. >disadvantages while classzone has a few stability disadvantages. IMHO it's the opposite. Classzone provides the correct behaviour but at a potentially major fixed cost during allocations/deallocations and the lock is not per-zone anymore. However this additional information that we collect we'll avoid us to waste CPU and memory so it's not obvious that classzone will decrease performance. >Personally I'd chose stability over performance any day, but that's >just me. I fully agree and that's why I developed classzone in first place. >The big gains in classzone are most likely from the _other_ changes >that are somewhere inside the classzone patch. If we focus on Indeed. >merging some of those (and maybe even improving some of the others >before merging), we can have a 2.4 which performs as good as or >better than the current classzone code but without the drawbacks. IMHO it's the current kernel that have the drawbacks. Classzone is the _fixes_ for the drawbacks. For the other improvments I agree they are completly orthogonal and I agree to split them and to discuss separately. I have not mentioned in these emails infact. I'm only concerned about the zone design at this moment. >Oh, btw, the classzone patch is vulnerable to the infinite-loop >in shrink_mmap too. Imagine a page shortage in the dma zone but >having only zone_normal pages on the lru queue ... >(and since the zone_normal classzone already has enough free pages, >shrink_mmap will find itself looping forever searching for freeable >zone_dma pages which aren't there) You're obviously wrong that can't happen with classzone! I intentionally always put the pages outside the memclass into a dispose list so I simply can't lockup there and the code there in classzone is rock solid. Here the code from 2.4.0-test1-ac7-classzone-31:filemap.c:shrink_mmap(): while (count > 0 && (page_lru = lru_head->prev) != lru_head) { page = list_entry(page_lru, struct page, lru); list_del(page_lru); dispose = &old; ^^^^^^^^^^^^^^ /* don't account passes over not DMA pages */ if (!memclass(page->zone, zone)) goto dispose_continue; count--; It doesn't matter at all if I do count-- after going to the dispose_continue and that's not a bug it's intentional and the count-- have to stay after the check for the memclass to provide shrink_mmap enough power for shrinking the interesting classzones. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/