* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-09 18:59 UTC
To: Stephen C. Tweedie
Cc: Benjamin C.R. LaHaise, Andrea Arcangeli, Stephen Tweedie, Linux Kernel, Linux MM

On Thu, 9 Jul 1998, Stephen C. Tweedie wrote:
> On Tue, 7 Jul 1998 13:50:02 -0400 (EDT), "Benjamin C.R. LaHaise"
> <blah@kvack.org> said:
>
> > Right. I'd rather see a multi-level lru like policy (ie on each cache hit
> > it gets moved up one level in the cache, with the lru'd pages from a given
>
> There's a fundamentally nice property about the multi-level cache
> which we _cannot_ easily emulate with page aging, and that is the
> ability to avoid aging any hot pages at all while we are just
> consuming cold pages.  For example, a large "find|xargs grep" can be
> satisfied without staling any of the existing hot cached pages.

Then I'd better incorporate a design for this in the zone allocator
(we could add this to the page_struct, but in the zone_struct we can
make a nice bitmap of it).

OTOH, is it really _that_ much different from an aging scheme with an
initial age of 1?

Rik.

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-09 23:37 UTC
To: Rik van Riel
Cc: Stephen C. Tweedie, Benjamin C.R. LaHaise, Andrea Arcangeli

Hi,

On Thu, 9 Jul 1998 20:59:57 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> On Thu, 9 Jul 1998, Stephen C. Tweedie wrote:
>>
>> There's a fundamentally nice property about the multi-level cache
>> which we _cannot_ easily emulate with page aging, and that is the
>> ability to avoid aging any hot pages at all while we are just
>> consuming cold pages.

> Then I'd better incorporate a design for this in the zone
> allocator (we could add this to the page_struct, but in
> the zone_struct we can make a nice bitmap of it).

It's nothing to do with the allocator per se; it's really a different
solution to a different problem.  That helps, actually, as it means
we're not forced to stick with one allocator if we want to use such a
scheme.

> OTOH, is it really _that_ much different from an aging
> scheme with an initial age of 1?

Yes, it is: the aging scheme pretty much forces us to age all pages on
an equal basis, so a lot of transient pages hitting the cache have the
side effect of prematurely aging and evicting a lot of existing,
potentially far more valuable pages.  A multilevel cache is pretty much
essential if you're going to let any cached data survive a grep flood.

Whether you _want_ that, or whether you'd rather just let the cache
drain and repopulate it after the IO has calmed down, is a different
question; there are situations where one or the other decision might
be best, so it's not a guaranteed win.  But the multilevel cache does
have some nice properties which aren't so easy to get with page aging.
It also tends to be faster at finding pages to evict, since we don't
require multiple passes to flush the transient page queue.

--Stephen.

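[A minimal sketch of the two-level scheme discussed above: new pages
enter an "inactive" list, and only pages that are referenced again get
promoted to the "active" list, so a streaming scan churns the inactive
list without displacing hot pages.  This is an illustration of the
idea only, not code from any kernel tree; all names are invented.]

#include <stdbool.h>
#include <stddef.h>

struct page {
        struct page *prev, *next;
        bool active;                    /* which list the page is on */
};

struct lru_list {
        struct page *head, *tail;       /* head = newest, tail = oldest */
};

static struct lru_list inactive_list, active_list;

static void lru_del(struct lru_list *l, struct page *p)
{
        if (p->prev) p->prev->next = p->next; else l->head = p->next;
        if (p->next) p->next->prev = p->prev; else l->tail = p->prev;
        p->prev = p->next = NULL;
}

static void lru_add_head(struct lru_list *l, struct page *p)
{
        p->prev = NULL;
        p->next = l->head;
        if (l->head) l->head->prev = p; else l->tail = p;
        l->head = p;
}

/* First touch: the page only makes it onto the inactive list. */
void cache_insert(struct page *p)
{
        p->active = false;
        lru_add_head(&inactive_list, p);
}

/* A later cache hit promotes the page to the active list. */
void cache_hit(struct page *p)
{
        lru_del(p->active ? &active_list : &inactive_list, p);
        p->active = true;
        lru_add_head(&active_list, p);
}

/* Reclaim always takes the oldest inactive page first, so hot pages
 * stay untouched while cold, once-read pages are being consumed. */
struct page *cache_evict(void)
{
        struct page *victim = inactive_list.tail;

        if (!victim)                    /* nothing transient left */
                victim = active_list.tail;
        if (!victim)
                return NULL;
        lru_del(victim->active ? &active_list : &inactive_list, victim);
        return victim;
}
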
* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-10 5:57 UTC
To: Stephen C. Tweedie
Cc: Benjamin C.R. LaHaise, Linux MM

On Fri, 10 Jul 1998, Stephen C. Tweedie wrote:
> potentially far more valuable pages.  A multilevel cache is pretty much
> essential if you're going to let any cached data survive a grep flood.
> Whether you _want_ that, or whether you'd rather just let the cache
> drain and repopulate it after the IO has calmed down, is a different
> question; there are situations where one or the other decision might be
> best, so it's not a guaranteed win.  But the multilevel cache does have
> some nice properties which aren't so easy to get with page aging.  It
> also tends to be faster at finding pages to evict, since we don't
> require multiple passes to flush the transient page queue.

Let's go with those nice properties.  Especially the last one (being
quicker at finding pages to evict) is essential in preventing memory
fragmentation (a 'lazy' list can be used to keep pressure on the last
few 'free' zones from building up).

Rik.

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-11 14:14 UTC
To: Stephen C. Tweedie
Cc: Benjamin C.R. LaHaise, Linux MM

On Thu, 9 Jul 1998, Stephen C. Tweedie wrote:
> There's a fundamentally nice property about the multi-level cache
> which we _cannot_ easily emulate with page aging, and that is the
> ability to avoid aging any hot pages at all while we are just
> consuming cold pages.  For example, a large "find|xargs grep" can be
> satisfied without staling any of the existing hot cached pages.

Thinking over this design, I wonder how many levels we'll need for
normal operation, and how many pages are allowed in each level.

I'd think we'll want 4 levels, with each 'lower' level having 30% to
70% more pages than the level above.  This should be enough to cater
to the needs of both rc5des-like programs and multi-megabyte tiled
image processing.

Then again, I could be completely wrong :)  Anyone?

Rik.

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-11 21:23 UTC
To: Rik van Riel
Cc: Stephen C. Tweedie, Benjamin C.R. LaHaise, Linux MM

Hi,

On Sat, 11 Jul 1998 16:14:26 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> I'd think we'll want 4 levels, with each 'lower'
> level having 30% to 70% more pages than the level
> above.  This should be enough to cater to the needs
> of both rc5des-like programs and multi-megabyte
> tiled image processing.

> Then again, I could be completely wrong :)  Anyone?

Maybe, maybe not --- we'd have to try it.  However, I'm always a bit
dubious about being overly clever about this kind of stuff, and two
levels may well work fine.  At worst, we can do ageing on the resident
level and LRU on the transient, and let the ageing take care of it.

Personally, I think just a two-level LRU ought to be adequate.  Yes, I
know this implies getting rid of some of the page ageing from 2.1
again, but frankly, that code seems to be more painful than it's
worth.  The "solution" of calling shrink_mmap multiple times just
makes the algorithm hideously expensive to execute.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-11 22:25 UTC
To: Stephen C. Tweedie
Cc: Benjamin C.R. LaHaise, Linux MM

On Sat, 11 Jul 1998, Stephen C. Tweedie wrote:
> On Sat, 11 Jul 1998 16:14:26 +0200 (CEST), Rik van Riel
> <H.H.vanRiel@phys.uu.nl> said:
>
> > I'd think we'll want 4 levels, with each 'lower'
> > level having 30% to 70% more pages than the level
>
> Personally, I think just a two-level LRU ought to be adequate.  Yes, I
> know this implies getting rid of some of the page ageing from 2.1 again,
> but frankly, that code seems to be more painful than it's worth.  The
> "solution" of calling shrink_mmap multiple times just makes the
> algorithm hideously expensive to execute.

This could be adequate, but then we will want to maintain an
active:inactive ratio of 1:2, in order to get a somewhat realistic
aging effect on the LRU inactive pages.

Or maybe we want to do a 3-level thingy, inactive in LRU order, and
active and hyperactive (wired?) with aging.  Then we only promote
pages to the highest level when they've reached the highest age in
the active level.

(OK, this is probably _far_ too complex, but I'm just exploring some
wild ideas here in the hope of triggering some ingenious idea)

Rik.

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-13 13:23 UTC
To: Rik van Riel
Cc: Stephen C. Tweedie, Benjamin C.R. LaHaise, Linux MM

Hi,

On Sun, 12 Jul 1998 00:25:20 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> On Sat, 11 Jul 1998, Stephen C. Tweedie wrote:
>> On Sat, 11 Jul 1998 16:14:26 +0200 (CEST), Rik van Riel
>> <H.H.vanRiel@phys.uu.nl> said:
>>
>> > I'd think we'll want 4 levels, with each 'lower'
>> > level having 30% to 70% more pages than the level
>>
>> Personally, I think just a two-level LRU ought to be adequate.  Yes, I
>> know this implies getting rid of some of the page ageing from 2.1 again,
>> but frankly, that code seems to be more painful than it's worth.  The
>> "solution" of calling shrink_mmap multiple times just makes the
>> algorithm hideously expensive to execute.

> This could be adequate, but then we will want to maintain
> an active:inactive ratio of 1:2, in order to get a somewhat
> realistic aging effect on the LRU inactive pages.

Aging is not a good thing in the cache, in general.  We _want_ to be
able to empty the cache at short notice.  LRU works for that.  The
existing physical scan is definitely suboptimal without ageing, but
that doesn't mean that aging is the right answer.  (I tried doing
buffer ageing in the original kswap.  It sucked.)

> Or maybe we want to do a 3-level thingy, inactive in LRU
> order, and active and hyperactive (wired?) with aging.

If we have more than 2 levels, then we definitely don't want ageing:
just let migration of pages between the levels do the ageing for us.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Benjamin C.R. LaHaise @ 1998-07-12 1:47 UTC
To: Stephen C. Tweedie
Cc: Rik van Riel, Linux MM

On Sat, 11 Jul 1998, Stephen C. Tweedie wrote:
> Personally, I think just a two-level LRU ought to be adequate.  Yes, I
> know this implies getting rid of some of the page ageing from 2.1 again,
> but frankly, that code seems to be more painful than it's worth.  The
> "solution" of calling shrink_mmap multiple times just makes the
> algorithm hideously expensive to execute.

Hmmm, is that a hint that I should sit down and work on the code
tomorrow whilst recovering?  =)

		-ben

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-13 13:42 UTC
To: Benjamin C.R. LaHaise
Cc: Stephen C. Tweedie, Rik van Riel, Linux MM

Hi,

On Sat, 11 Jul 1998 21:47:44 -0400 (EDT), "Benjamin C.R. LaHaise"
<blah@kvack.org> said:

> On Sat, 11 Jul 1998, Stephen C. Tweedie wrote:
>> Personally, I think just a two-level LRU ought to be adequate.  Yes, I
>> know this implies getting rid of some of the page ageing from 2.1 again,
>> but frankly, that code seems to be more painful than it's worth.  The
>> "solution" of calling shrink_mmap multiple times just makes the
>> algorithm hideously expensive to execute.

> Hmmm, is that a hint that I should sit down and work on the code
> tomorrow whilst recovering?  =)

I'm working on it right now.  Currently, the VM is so bad that it is
seriously getting in the way of my job.  Just trying to fix some odd
swapper bugs is impossible to test because I can't set up a ramdisk
for swap and do in-memory tests that way: things thrash incredibly.
The algorithms for aggressive cache pruning rely on fractions of
nr_physpages, and that simply doesn't work if you have large numbers
of pages dedicated to non-swappable things such as ramdisk,
bigphysarea DMA buffers or network buffers.

Rik, unfortunately I think we're just going to have to back out your
cache page ageing.  I've just done that on my local test box and the
results are *incredible*: it is going much more than an order of
magnitude faster on many things.  Fragmentation also seems drastically
improved: I've been doing builds of defrag in a 6MB box which were
impossible beforehand due to NFS stalls.

I'm going to do a bit more experimenting to see if we can keep some of
the good ageing behaviour by doing proper LRU in the cache, but
otherwise I think the cache ageing has either got to go or to be
drastically altered.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-18 22:10 UTC
To: Stephen C. Tweedie
Cc: Benjamin C.R. LaHaise, Linux MM

On Mon, 13 Jul 1998, Stephen C. Tweedie wrote:
> I'm working on it right now.  Currently, the VM is so bad that it is
> seriously getting in the way of my job.  Just trying to fix some odd
> swapper bugs is impossible to test because I can't set up a ramdisk for
> swap and do in-memory tests that way: things thrash incredibly.  The
> algorithms for aggressive cache pruning rely on fractions of
> nr_physpages, and that simply doesn't work if you have large numbers of
> pages dedicated to non-swappable things such as ramdisk, bigphysarea DMA
> buffers or network buffers.

This means we'll have to subtract those pages before determining the
used percentage.

> Rik, unfortunately I think we're just going to have to back out your
> cache page ageing.  I've just done that on my local test box and the
> results are *incredible*:

OK, I don't see many problems with that, except that the aging helps a
_lot_ with readahead.  For the rest, it's not much more than a kludge
anyway ;(

We really ought to do better than that anyway.  I'll give you guys the
URL of the Digital Unix manuals on this...  (they have some _very_
nice mechanisms for this)

> I'm going to do a bit more experimenting to see if we can keep some of
> the good ageing behaviour by doing proper LRU in the cache, but
> otherwise I think the cache ageing has either got to go or to be
> drastically altered.

A 2-level LRU on the page cache would be _very_ nice, but probably
just as disastrous wrt. fragmentation as aging...

Rik.

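[A small sketch of the adjustment Rik suggests here: measure the cache
against the pages that could in principle be reclaimed rather than
against all of physical memory, so that ramdisks, pinned DMA buffers
and network buffers do not skew the percentage.  The functions and
parameters are invented for illustration; this is not 2.1 code.]

/* Pages that the cache-pruning heuristics should be measured against. */
unsigned long reclaimable_pages(unsigned long num_physpages,
                                unsigned long ramdisk_pages,
                                unsigned long pinned_dma_pages,
                                unsigned long net_buffer_pages)
{
        return num_physpages - ramdisk_pages - pinned_dma_pages
                             - net_buffer_pages;
}

/* "Cache takes more than borrow_percent of usable memory", written
 * multiplicatively to stay in integer arithmetic. */
int cache_over_limit(unsigned long page_cache_pages,
                     unsigned long reclaimable,
                     unsigned int borrow_percent)
{
        return page_cache_pages * 100 > reclaimable * borrow_percent;
}
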
* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-20 16:04 UTC
To: Rik van Riel
Cc: Stephen C. Tweedie, Benjamin C.R. LaHaise, Linux MM

Hi,

On Sun, 19 Jul 1998 00:10:09 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> On Mon, 13 Jul 1998, Stephen C. Tweedie wrote:
>> I'm working on it right now.  Currently, the VM is so bad that it is
>> seriously getting in the way of my job.  Just trying to fix some odd
>> swapper bugs is impossible to test because I can't set up a ramdisk for
>> swap and do in-memory tests that way: things thrash incredibly.  The
>> algorithms for aggressive cache pruning rely on fractions of
>> nr_physpages, and that simply doesn't work if you have large numbers of
>> pages dedicated to non-swappable things such as ramdisk, bigphysarea DMA
>> buffers or network buffers.

> This means we'll have to subtract those pages before
> determining the used percentage.

Sure, but that's just admitting that the system is so inherently
incapable of balancing itself that we have to place fixed limits on
the cache size, and I'm not sure that's a good thing.

>> Rik, unfortunately I think we're just going to have to back out your
>> cache page ageing.  I've just done that on my local test box and the
>> results are *incredible*:

> OK, I don't see many problems with that, except that the
> aging helps a _lot_ with readahead.  For the rest, it's
> not much more than a kludge anyway ;(

This is something we need to sort out.  From my benchmarks so far, the
one thing that's certain is that you were benchmarking something
different from me when you found the ageing speedups.  That's not
good, because it implies that neither mechanism is doing the Right
Thing.

What sort of circumstances were you seeing big performance
improvements in for your original page ageing code?  That might help
us to identify what the core improvement in the ageing is, so that we
don't lose too much if we start changing the scheme again.

> We really ought to do better than that anyway.  I'll give
> you guys the URL of the Digital Unix manuals on this...
> (they have some _very_ nice mechanisms for this)

OK, thanks!

> A 2-level LRU on the page cache would be _very_ nice,
> but probably just as disastrous wrt. fragmentation as
> aging...

Actually, fragmentation is not the big issue wrt ageing.  The page
ageing code is simply keeping the cache too large; the time it takes
to age the cache means that far too much is getting swapped out, and
on low memory machines the cache grows too large altogether.

This means that there may be several ways forward.  A multi-level LRU
would not necessarily be any worse for fragmentation.  Keeping a (low)
ceiling on the page age in the cache might also be a way forward,
allowing us to give a priority boost to readahead pages, but letting
us then cap the age once the pages are read to prevent them from
staying too long in the cache.

I'm also experimenting right now with a number of new zoning and
ageing mechanisms which may address the fragmentation issue.  As far
as page ageing is concerned, it's really just the overall cache size,
and the self-tuning of the cache size, which are my main concerns at
the moment.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Zachary Amsden @ 1998-07-09 13:01 UTC
To: Rik van Riel, Stephen C. Tweedie
Cc: Andrea Arcangeli, Linux MM, Linux Kernel

-----Original Message-----
From: Rik van Riel <H.H.vanRiel@phys.uu.nl>
To: Stephen C. Tweedie <sct@redhat.com>
Cc: Andrea Arcangeli <arcangeli@mbox.queen.it>; Linux MM
<linux-mm@kvack.org>; Linux Kernel <linux-kernel@vger.rutgers.edu>
Date: Thursday, July 09, 1998 3:50 AM
Subject: Re: cp file /dev/zero <-> cache [was Re: increasing page size]
>On Wed, 8 Jul 1998, Stephen C. Tweedie wrote:
>> <H.H.vanRiel@phys.uu.nl> said:
>>
>> > When my zone allocator is finished, it'll be a piece of
>> > cake to implement lazy page reclamation.
>>
>> I've already got a working implementation. The issue of lazy
>> reclamation is pretty much independent of the allocator underneath; I
>don't see it being at all hard to run the lazy reclamation stuff on top
>of any form of zoned allocation.
>
>The problem with the current allocator is that it stores
>the pointers to available blocks in the blocks themselves.
>This means we can't wait till the last moment with lazy
>reclamation.
Presumably to reduce memory use, but at what cost? It prevents
lazy reclamation and makes locating available blocks a major
headache. It only takes 4k of memory to store a bitmap of free
blocks in a 128 Meg system. Storing the free list in free space is
an admirable hack, but maybe outdated.
Zach Amsden
amsden@andrew.cmu.edu
P.S. I'm new to this discussion, so please don't flay me if
everything I said is in gross violation of the truth.
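
[A rough illustration of the arithmetic behind Zach's point: with 4 KB
pages, a 128 MB machine has 32768 page frames, so one bit per frame is
exactly 4 KB of bitmap, and finding or reclaiming a free block never
has to touch the free pages themselves.  This is a toy sketch with
invented names, not the allocator that was actually in the kernel.]

#include <limits.h>

#define PAGE_SIZE     4096UL
#define MEM_BYTES     (128UL * 1024 * 1024)
#define NR_PAGES      (MEM_BYTES / PAGE_SIZE)    /* 32768 frames      */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long free_map[NR_PAGES / BITS_PER_LONG];    /* 4 KB   */

static void mark_free(unsigned long pfn)
{
        free_map[pfn / BITS_PER_LONG] |= 1UL << (pfn % BITS_PER_LONG);
}

static void mark_used(unsigned long pfn)
{
        free_map[pfn / BITS_PER_LONG] &= ~(1UL << (pfn % BITS_PER_LONG));
}

/* Find any free frame; returns NR_PAGES if none.  Lazy reclamation is
 * easier here because the bookkeeping lives outside the free pages. */
static unsigned long find_free_page(void)
{
        for (unsigned long i = 0; i < NR_PAGES / BITS_PER_LONG; i++) {
                if (!free_map[i])
                        continue;
                for (unsigned long b = 0; b < BITS_PER_LONG; b++)
                        if (free_map[i] & (1UL << b))
                                return i * BITS_PER_LONG + b;
        }
        return NR_PAGES;
}
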
* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Andrea Arcangeli @ 1998-07-05 11:32 UTC
To: Rik van Riel
Cc: linux-mm, Linux Kernel

On Sun, 5 Jul 1998, Rik van Riel wrote:
> The current allocator is often unable to keep fragmentation
> from happening when too many allocations are done.  When we

So I don't worry about fragmentation or about the zone allocator.

> I have a better idea.  The RSS for an inode shouldn't be
> allowed to grow larger than 50% of the size of the page
> cache when:
> - we are tight on memory; and
> - the page cache takes more than 25% of memory
>
> We can achieve this by switching off readahead when we
> reach the maximum RSS of the inode.  Then we should probably

I ran hdparm -a0 /dev/hda and nothing changed.  Now the cache takes
20Mbyte of memory running cp file /dev/null while memtest 10000000 is
running.

> instruct kswapd in some way to remove pages from that inode,
> but I'm not completely sure how to do that...

Where is the cache allocated?  Is it allocated in the inode?  If so,
kswapd should shrink the inode before it starts swapping out!

> For the buffer cache, we might be able to use the same
> kind of algorithm, but I'm not completely sure of that.

The buffer memory seems to be reduced better than the cache memory,
though.

> > I would ask people to really run the kernel with mem=30Mbyte and then
> > run a `cp /dev/zero file' and then a `cp file /dev/null' to really see
> > what happens.
>
> In the first case, the buffer cache will grow without
> bounds and without it being needed.  In the second case
> the page cache will grow a bit too much.

10Mbyte on 108 against 1Mbyte on 2.0.34 is not just a bit ;-).

> Both can be avoided by using (not yet implemented)
> balancing code.  It is on the priority list of the MM

I have to ask: does 2.0.34 have balancing code implemented and
running?  The current mm layer is not able to shrink the cache memory,
and I consider that a bug that must be fixed without adding other
code.

Is there a function call (such as shrink_mmap for mmap, or
kmem_cache_reap() for the slab, or shrink_dcache_memory() for the
dcache) that is able to shrink the cache allocated by cp file
/dev/zero?

I could also try to apply the memleak detector to my kernel to see
where the cache is really allocated...

> team, so we will be working on it some day.  There

Good!

> are some stability issues to be solved first, however.

I wasn't aware of these stability problems...

> Try the MM team first: linux-mm@kvack.org.
> Or read our TODO list: http://www.phys.uu.nl/~riel/mm-patch/todo.html

OK.

Andrea[s] Arcangeli

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-05 17:00 UTC
To: Andrea Arcangeli
Cc: Linux MM, Linux Kernel

On Sun, 5 Jul 1998, Andrea Arcangeli wrote:
> On Sun, 5 Jul 1998, Rik van Riel wrote:
>
> > We can achieve this by switching off readahead when we
> > reach the maximum RSS of the inode.  Then we should probably
>
> I ran hdparm -a0 /dev/hda and nothing changed.  Now the cache takes
> 20Mbyte of memory running cp file /dev/null while memtest 10000000
> is running.

Hdparm only affects _hardware_ readahead and has nothing to do with
software readahead.

> > instruct kswapd in some way to remove pages from that inode,
> > but I'm not completely sure how to do that...
>
> Where is the cache allocated?  Is it allocated in the inode?  If so,
> kswapd should shrink the inode before it starts swapping out!

The cache is also mapped into a process's address space.  Currently we
would have to walk all pagetables to find a specific page ;(

When Stephen and Ben have merged their PTE stuff, we can do the
freeing much more easily though...

> > For the buffer cache, we might be able to use the same
> > kind of algorithm, but I'm not completely sure of that.
>
> The buffer memory seems to be reduced better than the cache memory,
> though.

This is partly because buffer memory is not mapped in any pagetable
and because buffer memory generally isn't worth keeping around.
Because of that we can and do just throw it out at the next
opportunity.

> > Both can be avoided by using (not yet implemented)
> > balancing code.  It is on the priority list of the MM
>
> I have to ask: does 2.0.34 have balancing code implemented and
> running?

2.0 has no balancing code at all.  At least, not AFAIK...

> The current mm layer is not able to shrink the cache memory, and I
> consider that a bug that must be fixed without adding other code.

How do you propose we solve a bug without programming :)

> Is there a function call (such as shrink_mmap for mmap, or
> kmem_cache_reap() for the slab, or shrink_dcache_memory() for the
> dcache) that is able to shrink the cache allocated by cp file
> /dev/zero?

shrink_mmap() can only shrink unlocked and clean buffer pages and
unmapped cache pages.  We need to go through either bdflush (for
buffers) or try_to_swap_out() first, in order to make some easy
victims for shrink_mmap()...

Rik.

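[A sketch of the two-stage reclaim Rik describes: one pass only severs
page-table mappings, and a later pass in the shrink_mmap() role can
free a page once it is unmapped, clean and not under I/O.  Having to
survive both passes is the second 'timeout' mentioned a little further
down the thread.  Types and helpers here are illustrative only.]

#include <stdbool.h>

struct cached_page {
        int  map_count;         /* page tables still mapping the page */
        bool dirty;
        bool locked;            /* currently under I/O                */
};

/* Pass 1 (the try_to_swap_out() role): drop one page-table mapping. */
void unmap_one(struct cached_page *p)
{
        if (p->map_count > 0)
                p->map_count--;
}

/* Pass 2 (the shrink_mmap() role): only pages that pass 1 and bdflush
 * have already left unmapped, clean and unlocked may be freed. */
bool can_free(const struct cached_page *p)
{
        return p->map_count == 0 && !p->dirty && !p->locked;
}
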
* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Andrea Arcangeli @ 1998-07-05 18:38 UTC
To: Rik van Riel
Cc: Linux MM, Linux Kernel

On Sun, 5 Jul 1998, Rik van Riel wrote:
> Hdparm only affects _hardware_ readahead and has nothing
> to do with software readahead.

Wooops.

> The cache is also mapped into a process's address space.
> Currently we would have to walk all pagetables to find a
> specific page ;(
> When Stephen and Ben have merged their PTE stuff, we can
> do the freeing much more easily though...

I'm starting to think that the problem is kswapd.  Running cp file
/dev/null, the system remains fluid (when I press a key I see the
character on the _console_) as long as there is free (wasted, because
unused) memory.  While there is free memory, swap usage stays at 0.
When the free memory runs out, the system dies, and when I press a key
I don't see the character on the screen immediately.  I think it's
kswapd that is irritating me, so now I am going after kswapd (I am
starting to hate it, since I really hate swap ;-).

kswapd must swap _nothing_ while there is _freeable_ cache memory
allocated.  kswapd _must_ consider freeable cache memory as _free_,
not used, memory, and so it must not start swapping out useful code
and data to make space for allocating more cache.  With 2.0.34, when
the cache eats all free memory, nothing gets swapped out and
everything performs better.

> > > Both can be avoided by using (not yet implemented)
> > > balancing code.  It is on the priority list of the MM
> >
> > I have to ask: does 2.0.34 have balancing code implemented and
> > running?
>
> 2.0 has no balancing code at all.  At least, not AFAIK...

So 2.1.108 should be able to perform like 2.0.34.

> > The current mm layer is not able to shrink the cache memory, and I
> > consider that a bug that must be fixed without adding other code.
>
> How do you propose we solve a bug without programming :)

;-).  I meant "without adding new features or replacing most of the
code"...

> > Is there a function call (such as shrink_mmap for mmap, or
> > kmem_cache_reap() for the slab, or shrink_dcache_memory() for the
> > dcache) that is able to shrink the cache allocated by cp file
> > /dev/zero?
>
> shrink_mmap() can only shrink unlocked and clean buffer pages
> and unmapped cache pages.  We need to go through either bdflush

...unmapped cache pages.  Good.

> (for buffers) or try_to_swap_out() first, in order to make some

So try_to_swap_out() should unmap the cache pages?  And then I have
to call shrink_mmap() again?

> easy victims for shrink_mmap()...

Rik, reading vmscan.c I noticed that you are the one who worked on
kswapd (for example removing the hard page limits and checking
free_memory_available(nr) instead).  Could you tell me what you
changed (or in which kernel patch I can find the kswapd patches) that
forces kswapd to swap so much?

Andrea[s] Arcangeli

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-05 19:31 UTC
To: Andrea Arcangeli
Cc: Linux MM, Linux Kernel

On Sun, 5 Jul 1998, Andrea Arcangeli wrote:
> On Sun, 5 Jul 1998, Rik van Riel wrote:
>
> > The cache is also mapped into a process's address space.
> > Currently we would have to walk all pagetables to find a
> > specific page ;(
>
> I'm starting to think that the problem is kswapd.  Running cp file
> /dev/null, the system remains fluid (when I press a key I see the
> character on the _console_) as long as there is free (wasted, because
> unused) memory.  While there is free memory, swap usage stays at 0.
> When the free memory runs out, the system dies, and when I press a
> key I don't see the character on the screen immediately.  I think
> it's kswapd that is irritating me, so now I am going after kswapd (I
> am starting to hate it, since I really hate swap ;-).
>
> kswapd must swap _nothing_ while there is _freeable_ cache memory
> allocated.  kswapd _must_ consider freeable cache memory as _free_,
> not used, memory, and so it must not start swapping out useful code
> and data to make space for allocating more cache.  With 2.0.34, when
> the cache eats all free memory, nothing gets swapped out and
> everything performs better.

A few months ago someone (who?) posted a patch that modified kswapd's
internals to only unmap clean pages when told to.

If I can find the patch, I'll integrate it and let kswapd only swap
clean pages when:
- page_cache_size * 100 > num_physpages * page_cache.borrow_percent
or
- (buffer_mem >> PAGE_SHIFT) * 100 > num_physpages * buffermem.borrow_percent

> > shrink_mmap() can only shrink unlocked and clean buffer pages
> > and unmapped cache pages.  We need to go through either bdflush
>
> ...unmapped cache pages.  Good.

Not good; it means that kswapd needs to unmap the pages first, using
the try_to_swap_out() function.  [which really needs to be renamed to
try_to_unmap()]

> > (for buffers) or try_to_swap_out() first, in order to make some
>
> So try_to_swap_out() should unmap the cache pages?  And then I have
> to call shrink_mmap() again?

shrink_mmap() frees the pages that are already unmapped by
try_to_swap_out().  This means that the pages need to be handled by
both functions (which is good, because it gives us a second 'timeout'
for page aging).

> Rik, reading vmscan.c I noticed that you are the one who worked on
> kswapd (for example removing the hard page limits and checking
> free_memory_available(nr) instead).  Could you tell me what you
> changed (or in which kernel patch I can find the kswapd patches)
> that forces kswapd to swap so much?

Most of the patches are on my homepage; you can get and read them
there...

Rik.

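[A sketch of the policy described above: kswapd would be restricted to
reclaiming clean, unmapped pages (no swap-out of dirty or anonymous
data) whenever one of the caches is over its "borrow" percentage.  The
quantities mirror the names quoted in the mail (page_cache_size,
buffer_mem, num_physpages, borrow_percent), but the function itself is
illustrative and not taken from any kernel tree.]

#define PAGE_SHIFT 12

/* Returns nonzero when reclaim should be limited to clean pages. */
int only_reclaim_clean_pages(unsigned long page_cache_size,  /* pages */
                             unsigned long buffer_mem,       /* bytes */
                             unsigned long num_physpages,
                             unsigned int  cache_borrow_percent,
                             unsigned int  buffer_borrow_percent)
{
        /* cache / physpages > borrow_percent / 100, kept in integers. */
        if (page_cache_size * 100 >
            num_physpages * cache_borrow_percent)
                return 1;
        if ((buffer_mem >> PAGE_SHIFT) * 100 >
            num_physpages * buffer_borrow_percent)
                return 1;
        return 0;
}
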
* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-06 10:38 UTC
To: Rik van Riel
Cc: Andrea Arcangeli, Linux MM

Hi Rik,

On Sun, 5 Jul 1998 21:31:56 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> A few months ago someone (who?) posted a patch that modified
> kswapd's internals to only unmap clean pages when told to.

> If I can find the patch, I'll integrate it and let kswapd
> only swap clean pages when:
> - page_cache_size * 100 > num_physpages * page_cache.borrow_percent
> or
> - (buffer_mem >> PAGE_SHIFT) * 100 > num_physpages * buffermem.borrow_percent

I'm not sure what that is supposed to achieve, and I'm not sure how
well we expect such tinkering to work uniformly on 8MB and 512MB
machines.  Unmapping is not an issue with respect to cache sizes.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Rik van Riel @ 1998-07-06 11:42 UTC
To: Stephen C. Tweedie
Cc: Andrea Arcangeli, Linux MM

On Mon, 6 Jul 1998, Stephen C. Tweedie wrote:
> <H.H.vanRiel@phys.uu.nl> said:
>
> > A few months ago someone (who?) posted a patch that modified
> > kswapd's internals to only unmap clean pages when told to.
>
> > If I can find the patch, I'll integrate it and let kswapd
> > only swap clean pages when:
>
> I'm not sure what that is supposed to achieve, and I'm not sure how well
> we expect such tinkering to work uniformly on 8MB and 512MB machines.
> Unmapping is not an issue with respect to cache sizes.

When we use this, we can finally 'enforce' the borrow_percent stuff.
Yes, I know the borrow_percent isn't really a good thing, but we'll
need the framework anyway when your balancing code is implemented.

The 'only unmap clean pages' flag is a good way of implementing this
framework; maybe we want to combine it with a flag to shrink_mmap()
not to unmap swap cache pages...  Or maybe we want to do swap cache
LRU reclamation when free_memory_available(4) returns true.

Rik.

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Andrea Arcangeli @ 1998-07-06 14:20 UTC
To: Rik van Riel
Cc: Linux MM, Linux Kernel, Linus Torvalds

On Sun, 5 Jul 1998, Rik van Riel wrote:
> A few months ago someone (who?) posted a patch that modified
> kswapd's internals to only unmap clean pages when told to.
>
> If I can find the patch, I'll integrate it and let kswapd
> only swap clean pages when:
> - page_cache_size * 100 > num_physpages * page_cache.borrow_percent

I don't agree with swapping out if there are enough freeable pages in
the cache (or at least the aging should be much more clever than it
is now).

It seems that setting pagecache, buffers and freepages to 1 2 3 and
setting kswapd to 1 1 1 (so that kswapd can only swap one page at a
time) helps a lot to make the system _usable_ (when I press a key I
see it on the console) during `cp file /dev/null' (the cache got
reduced to 3Mbyte against the default 10Mbyte while memtest 10000000
is running at the same time).  Sometimes I get out-of-memory with
these settings while `cp file /dev/null' is running, since the cache
is full and the lower priority kswapd can't free much memory, I think.

Now I have a new question.  What would happen if kswapd were stopped
while `cp file /dev/null' is running?  Is the cache memory allocated
by cp reused, or is it always allocated from free memory?  And is it
possible to know how much memory is unmappable (and therefore
freeable) in the cache?  If so, we should use the swap_out() method in
do_try_to_free_page() only if there isn't enough freeable memory in
the cache.  If swap_out() is not used, will kswapd free memory from
the cache or buffers without swapping out, or not?

Think about a 128Mbyte system.  I think it makes no sense to swap out
3 or 4 Mbyte of RAM while having 40 or 50 Mbyte of cache and a lot of
buffers allocated.  If I buy memory, _I_ don't want to see the swap
used.  I _hate_ swap.  I would run with swapoff -a if the machine
would then return out-of-memory instead of deadlocking (with kswapd
loading the CPU at 100%).

And how is the aging of the pages handled?  i386 (and MIPS, if I
remember well) (don't tell me "and every other modern CPU", because I
can guess that ;-) provides a flag on every page that should be usable
to distinguish recently read/written pages from unused ones.  Is that
flag used for the aging, or is the aging done entirely in software
without taking advantage of the CPU facilities?  I ask this because it
seems that the aging doesn't work, since my bash is swapped out (or
removed from RAM) when read(2) allocates the cache, while in 2.0.34
everything is perfect.

Now I am using this simple program to test kswapd:

#include <unistd.h>

int main(void)
{
        char buf[4096];

        while (read(0, buf, sizeof(buf)) == sizeof(buf))
                ;
        return 0;
}

./a.out < /tmp/zero

where zero is a big file.  When there is no more free memory (because
it's all allocated in the cache), bash is no longer responsive to
keypresses and the swapping in/out starts.

Fortunately, at least the 2.0.34 mm algorithms seem to work
_perfectly_ under all kinds of conditions, so in the worst case I'll
try to port the interesting linux/mm/* bits from 2.0.34 to 108 for my
machine, and I'll start rejecting every other official kernel mm patch
(you can see that I am really irritated by too much swapping in the
last month ;-).  It will be hard work, but at least I will be sure of
a good result...  Somebody has really _screwed_ the really _perfect_
2.0.34 kswapd somewhere along the 2.1.x line.

As far as I know, nobody except me is working on fixing kswapd.  I
should also say that I have never used Linux on a machine with more
than 32Mbyte of RAM, so I don't know whether 2.1.108 works as
perfectly as 2.0.34 there.  So please either tell me to buy another
32Mbyte of memory, or help me fix kswapd instead of developing new
things such as memory defragmentation.

Andrea[s] Arcangeli

PS. Now I am running 2.0.34 and it's much more efficient than 2.1.108.
108 is surely faster at many things, but _here_ the "always swapping"
behaviour cancels all the other improvements and makes the system much
less fluid :-(.

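[A sketch of the kind of referenced-bit ageing Andrea is asking about:
on each pass of the scanner, a page whose hardware "accessed" bit was
set since the last pass gets its age bumped and the bit cleared, while
an untouched page has its age decayed; only pages whose age reaches
zero become candidates for unmapping.  The constants and structure are
invented for illustration and are not the 2.1 implementation.]

#define PAGE_AGE_MAX   64
#define PAGE_AGE_BUMP   3       /* added when the page was referenced */
#define PAGE_AGE_DECAY  1       /* subtracted when it was not         */

struct page_info {
        unsigned int age;
        unsigned int referenced;   /* stands in for the PTE accessed bit */
};

/* Returns 1 if the page may be unmapped (and later reclaimed). */
int age_page(struct page_info *p)
{
        if (p->referenced) {
                p->referenced = 0;              /* clear the accessed bit */
                p->age += PAGE_AGE_BUMP;
                if (p->age > PAGE_AGE_MAX)
                        p->age = PAGE_AGE_MAX;
        } else if (p->age >= PAGE_AGE_DECAY) {
                p->age -= PAGE_AGE_DECAY;
        }
        return p->age == 0;
}
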
* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-06 10:31 UTC
To: Andrea Arcangeli
Cc: Rik van Riel, Linux MM, Linux Kernel, Stephen Tweedie

Hi,

On Sun, 5 Jul 1998 20:38:57 +0200 (CEST), Andrea Arcangeli
<arcangeli@mbox.queen.it> said:

> kswapd must swap _nothing_ while there is _freeable_ cache memory
> allocated.  kswapd _must_ consider freeable cache memory as _free_,
> not used, memory, and so it must not start swapping out useful code
> and data to make space for allocating more cache.

You just can't make blanket statements like that!  If you're on an 8MB
or 16MB box doing compilations, then you desperately want unused
process data pages --- idle bits of inetd, lpd, sendmail, init, the
shell, the top-level make and so on --- to be swapped out to make room
for a few more header files in cache.  Throwing away all cache pages
will also destroy readahead and prevent you from caching pages of a
binary between successive invocations.

That's the problem with all rules of the form "memory management MUST
prioritise X over Y".  There are always cases where it is not true.
What we need is a balance, not arbitrary rules like that.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Andrea Arcangeli @ 1998-07-06 12:34 UTC
To: Stephen C. Tweedie
Cc: Rik van Riel, Linux MM, Linux Kernel

On Mon, 6 Jul 1998, Stephen C. Tweedie wrote:
> On Sun, 5 Jul 1998 20:38:57 +0200 (CEST), Andrea Arcangeli
> <arcangeli@mbox.queen.it> said:
>
> > kswapd must swap _nothing_ while there is _freeable_ cache memory
> > allocated.  kswapd _must_ consider freeable cache memory as _free_,
> > not used, memory, and so it must not start swapping out useful code
> > and data to make space for allocating more cache.
>
> You just can't make blanket statements like that!  If you're on an 8MB

I'd like not to have to make statements like that; in that case the
aging would work ;-).

> or 16MB box doing compilations, then you desperately want unused process
> data pages --- idle bits of inetd, lpd, sendmail, init, the shell, the

Now even the process that needs the memory gets swapped out.

> top-level make and so on --- to be swapped out to make room for a few
> more header files in cache.  Throwing away all cache pages will also
> destroy readahead and prevent you from caching pages of a binary between
> successive invocations.

I _really_ don't want cache and readahead when the system needs
memory.  The only important thing is to avoid the constant swapping
in/out and provide free memory to the process.

You don't run on a 32Mbyte box, I see ;-).

Andrea[s] Arcangeli

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Stephen C. Tweedie @ 1998-07-06 14:36 UTC
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Rik van Riel, Linux MM, Linux Kernel

Hi,

On Mon, 6 Jul 1998 14:34:02 +0200 (CEST), Andrea Arcangeli
<arcangeli@mbox.queen.it> said:

> On Mon, 6 Jul 1998, Stephen C. Tweedie wrote:
>> or 16MB box doing compilations, then you desperately want unused process
>> data pages --- idle bits of inetd, lpd, sendmail, init, the shell, the

> Now even the process that needs the memory gets swapped out.

No --- that's the whole point.  We have per-page process page aging
which lets us differentiate between processes which are active and
those which are idle, and between the used and unused pages within the
active processes.

If you are short on memory, then you don't want to keep around any
process pages which belong to idle tasks.  The only way to do that is
to invoke the swapper.  We need to make sure that we are just
aggressive enough to discard pages which are not in use, and not to
discard pages which have been touched recently.

If we simply prune the cache to zero before doing any swapping, then
we will be evicting potentially useful data from the cache instead of
throwing away pages to swap which may not have been used in the past
half an hour.

That's what the balancing issue is about: if there are swap pages
which are not being touched at all and files such as header files
which are being constantly accessed, then we need to do at least
_some_ swapping to eliminate the idle process pages.

> I _really_ don't want cache and readahead when the system needs
> memory.

You also don't want lpd sitting around, either.

> The only important thing is to avoid the constant swapping in/out
> and provide free memory to the process.

It's just wishful thinking to assume you can do this simply by
destroying the cache.  Oh, and you _do_ want readahead even with
little memory, otherwise you are doing 10 disk IOs to read a file
instead of one; and on a box which is starved of memory, that implies
you'll probably see a disk seek between each IO.  That's just going to
thrash your disk even harder.

> You don't run on a 32Mbyte box, I see ;-).

I run in 64MB, 16MB and 6MB for testing purposes.

--Stephen

* Re: cp file /dev/zero <-> cache [was Re: increasing page size]

From: Andrea Arcangeli @ 1998-07-06 19:28 UTC
To: Stephen C. Tweedie
Cc: Rik van Riel, Linux MM, Linux Kernel

On Mon, 6 Jul 1998, Stephen C. Tweedie wrote:
> No --- that's the whole point.  We have per-page process page aging
> which lets us differentiate between processes which are active and those
> which are idle, and between the used and unused pages within the active
> processes.

Nice!  The problem is that the kernel probably thinks that bash, and
everything else that is not a 100% CPU eater, is an idle process...

> If you are short on memory, then you don't want to keep around any
> process pages which belong to idle tasks.  The only way to do that is to

This is again more true for low memory machines (where the current
kswapd policy sucks).  I agree 100% with this; I don't agree with
swapping out to make space for the cache.  The cache is too dynamic,
so the swapping in/out continues forever.

> invoke the swapper.  We need to make sure that we are just aggressive
> enough to discard pages which are not in use, and not to discard pages
> which have been touched recently.

I think we are too aggressive.  Even my bash gets swapped out.  If I
run `cp file /dev/null' on 2.0.34, when I launch `free' from the shell
I don't see stalls.  It seems that `free' remains in the cache, while
on 2.1.108 I have to wait many seconds to see `free' executed (and the
characters printed to the console).

> If we simply prune the cache to zero before doing any swapping, then we
> will be evicting potentially useful data from the cache instead of
> throwing away pages to swap which may not have been used in the past
> half an hour.

It would be nice if _only_ pages that have not been used in the past
half hour were swapped out.  If kswapd ran that way I would thank you
a lot instead of being irritated ;-).

> That's what the balancing issue is about: if there are swap pages which
> are not being touched at all and files such as header files which are
> being constantly accessed, then we need to do at least _some_ swapping
> to eliminate the idle process pages.

100% agree.

> > I _really_ don't want cache and readahead when the system needs
> > memory.
>
> You also don't want lpd sitting around, either.

NO.  I want lpd sitting around if it has been used in the last 10
minutes, for example.  I don't want to swap out a process to make
space for the _cache_ if the process is not 100% idle.

> > The only important thing is to avoid the constant swapping in/out
> > and provide free memory to the process.
>
> It's just wishful thinking to assume you can do this simply by
> destroying the cache.  Oh, and you _do_ want readahead even with little

Yes, I think we can avoid it by destroying the cache, since the cache
is the only cause I can put my finger on that gives me problems when
nothing huge is running (when I have 20Mbyte of "not used by me"
memory).

2.0.34 destroys the cache completely (wooo, nice, I love it when I see
the cache destroyed ;-) and runs great.  I have a friend who keeps
2.0.34 on his 8Mbyte laptop only to compile the kernel in 30 minutes
instead of the N hours of 2.0.10x.

> memory, otherwise you are doing 10 disk IOs to read a file instead of
> one; and on a box which is starved of memory, that implies you'll
> probably see a disk seek between each IO.  That's just going to thrash
> your disk even harder.

I really don't care about readahead.  When the system is swapping, the
disk is so busy that there is really no difference between going at
1Km/h and 0.1Km/h ;-).  Readahead in that case is like running an
optimized O(2^n) algorithm against an unoptimized one (no readahead).

> > You don't run on a 32Mbyte box, I see ;-).
>
> I run in 64MB, 16MB and 6MB for testing purposes.

Maybe your tests are a bit light ;-).  And maybe you are not running
on a single IDE0 (UDMA) HD with the swap partition on the same HD, as
I am.

Please avoid swap every time you can.  Swap is the end of the life of
every machine.  Trash the cache instead.  Which functions do I have to
touch and use to destroy the cache instead of swapping out processes?
I am not asking for the nice page-aging feature you are describing; I
only need to avoid swap to run _fast_ (as 2.0.34 does).

BTW, I started this thread these days only because I booted 2.0.34 and
noticed the big improvement.

Andrea[s] Arcangeli

PS. Thanks anyway to all the mm guys who contributed to 2.1.x, since I
_guess_ that kswapd and the mm layer in general are OK for high memory
machines.  __Maybe__ we only need some tuning for low memory machines.
BTW, how many people tune the vm layer using the sysctls?

* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-06 19:28 ` Andrea Arcangeli @ 1998-07-07 12:01 ` Stephen C. Tweedie 1998-07-07 15:54 ` Rik van Riel 0 siblings, 1 reply; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-07 12:01 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Stephen C. Tweedie, Rik van Riel, Linux MM, Linux Kernel Hi, On Mon, 6 Jul 1998 21:28:42 +0200 (CEST), Andrea Arcangeli <arcangeli@mbox.queen.it> said: > On Mon, 6 Jul 1998, Stephen C. Tweedie wrote: >> No --- that's the whole point. We have per-page process page aging >> which lets us differentiate between processes which are active and those >> which are idle, and between the used and unused pages within the active >> processes. > Nice! The problem is that probably the kernel think that bash and every > not 100% CPU eater is an idle process... Not at all. :) A process only has to touch a page once per sweep of the vm scanner for that page to be marked in use. A shell which touches a few pages for every keystroke will get the same preservation of those pages as a process which is touching the same number of pages in a tight loop. >> If you are short on memory, then you don't want to keep around any >> process pages which belong to idle tasks. The only way to do that is to > This is again more true for low memory machines (where the current kswapd > policy sucks). I 100% agree with this, I don' t agree to swapout to make > space from the cache. I've just explained why we _do_ want to do this on low memory machines, to a certain extent. When memory is low, we don't want to keep around anything which we don't need, and so swapping out completely unused pages is a good thing. The thing we need to avoid is swapping anything touched recently; switching off swapout completely, even just to make room for the cache, is wrong. >> invoke the swapper. We need to make sure that we are just aggressive >> enough to discard pages which are not in use, and not to discard pages >> which have been touched recently. > I think that we are too much aggressive. Sure, in 2.1. > It would be nice if it would be swapped out _only_ pages that are not used > in the past half an hour. If kswapd would run in such way I would thank > you a lot instead of being irritate ;-). ?? Some people will want to keep anything used within the last half hour; in other cases, 5 minutes idle should qualify for a swapout. On the compilation benchmarks I run on 6MB machines, any page not used within the past 10 seconds or so should be history! >> You also don't want lpd sitting around, either. > NO. I want lpd sitting around if it' s been used in the last 10 minutes > for example. I don' t want to swapout process for make space for the > _cache_ if the process is not 100% idle instead. Not if your memory is full. You CANNOT say "I want this in memory, not that". You will always be able to find situations where it doesn't work. You need a balance. I'm quite sure that you don't want your kernel build to thrash simply because the vm system is afraid of swapping out the sendmail and lpd daemons you used 10 minutes ago. > 2.0.34 destroy (wooo nice I love when I see the cache destroyed ;-) > completly the cache and runs great. No it doesn't. It balances the cache better; that's a very different thing. The only difference between 2.0 and 2.1 in this regard is the tuning of that balance; the underlying code is more or less the same. --Stephen -- This is a majordomo managed list. 
To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
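As a concrete picture of the per-page aging discussed above: each sweep of the vm scanner samples the hardware referenced bit and adjusts a per-page age, and only pages whose age has decayed to zero become eviction candidates, which is why one touch per sweep is enough to keep a shell resident. The sketch below illustrates the idea only; the struct, helper name and constants are made up rather than taken from the 2.1 touch_page()/age_page() code.

    #define PAGE_INITIAL_AGE   3    /* age given to a freshly read page          */
    #define PAGE_AGE_MAX      64    /* ceiling, so hot pages cannot grow forever */
    #define PAGE_TOUCH_BONUS   3    /* gained when the referenced bit was set    */
    #define PAGE_AGE_DECLINE   1    /* lost on every sweep without a reference   */

    struct page_sketch {            /* illustrative only, not 2.1's struct page  */
        unsigned int age;           /* new pages would start at PAGE_INITIAL_AGE */
    };

    /* One scanner visit: a page touched at least once since the last sweep
     * gains age, an untouched page loses some, and only a page whose age
     * has decayed to zero is a candidate for eviction. */
    static void age_page_sketch(struct page_sketch *page, int referenced)
    {
        if (referenced) {
            page->age += PAGE_TOUCH_BONUS;
            if (page->age > PAGE_AGE_MAX)
                page->age = PAGE_AGE_MAX;
        } else if (page->age > PAGE_AGE_DECLINE) {
            page->age -= PAGE_AGE_DECLINE;
        } else {
            page->age = 0;          /* reclaimable */
        }
    }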
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-07 12:01 ` Stephen C. Tweedie @ 1998-07-07 15:54 ` Rik van Riel 1998-07-07 17:32 ` Benjamin C.R. LaHaise 1998-07-08 13:45 ` Stephen C. Tweedie 0 siblings, 2 replies; 40+ messages in thread From: Rik van Riel @ 1998-07-07 15:54 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrea Arcangeli, Linux MM, Linux Kernel On Tue, 7 Jul 1998, Stephen C. Tweedie wrote: > On Mon, 6 Jul 1998 21:28:42 +0200 (CEST), Andrea Arcangeli > <arcangeli@mbox.queen.it> said: > > > It would be nice if it would be swapped out _only_ pages that are not used > > in the past half an hour. If kswapd would run in such way I would thank > > you a lot instead of being irritate ;-). > > ?? Some people will want to keep anything used within the last half > hour; in other cases, 5 minutes idle should qualify for a swapout. On > the compilation benchmarks I run on 6MB machines, any page not used > within the past 10 seconds or so should be history! There's a good compromise between balancing per-page and per-process. We can simply declare the last X (say 8) pages of a process holy unless that process has slept for more than Y (say 5) seconds. As a temporary measure, you can tune swapctl to have an age_cluster_fract of 128 and an age_cluster_min of 0; this will leave the last 8 pages of an app in memory, whatever happens... Rik. +-------------------------------------------------------------------+ | Linux memory management tour guide. H.H.vanRiel@phys.uu.nl | | Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ | +-------------------------------------------------------------------+ -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
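A sketch of the compromise Rik describes, the last X pages of a task being spared unless the task has slept for more than Y seconds. Every name and field here is hypothetical (no such check exists in the 2.1 tree); it only illustrates where such a rule would sit in the swap-out scanner.

    #define HOLY_PAGES  8            /* X: recently used pages to protect */
    #define HOLY_SLEEP  (5 * HZ)     /* Y: protection lapses after 5 s    */

    struct task_sketch {             /* hypothetical fields, illustration only  */
        unsigned long last_run;                 /* jiffies when last scheduled  */
        unsigned long recent_vaddr[HOLY_PAGES]; /* ring of last touched pages   */
    };

    /* Asked by the swap-out scanner before it unmaps the page at vaddr:
     * the last HOLY_PAGES pages of a task are left alone unless the task
     * has been asleep for longer than HOLY_SLEEP. */
    static int page_is_holy(struct task_sketch *tsk, unsigned long vaddr)
    {
        int i;

        if (jiffies - tsk->last_run > HOLY_SLEEP)
            return 0;                /* long sleeper: normal aging applies */

        for (i = 0; i < HOLY_PAGES; i++)
            if (tsk->recent_vaddr[i] == (vaddr & PAGE_MASK))
                return 1;            /* one of the last few pages: spare it */
        return 0;
    }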
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-07 15:54 ` Rik van Riel @ 1998-07-07 17:32 ` Benjamin C.R. LaHaise 1998-07-08 13:54 ` Stephen C. Tweedie 1998-07-08 13:45 ` Stephen C. Tweedie 1 sibling, 1 reply; 40+ messages in thread From: Benjamin C.R. LaHaise @ 1998-07-07 17:32 UTC (permalink / raw) To: Rik van Riel; +Cc: Stephen C. Tweedie, Andrea Arcangeli, Linux MM, Linux Kernel On Tue, 7 Jul 1998, Rik van Riel wrote: > There's a good compromise between balancing per-page > and per-process. We can simply declare the last X > (say 8) pages of a process holy unless that process > has slept for more than Y (say 5) seconds. This is the wrong fix for the case that Andrea is complaining about - tossing out chunks of processes piecemeal, resulting in a lengthy page-in time when the process becomes active again. Two things that might help with this are: read-ahead on swapins, and *true* swapping. If the system has run out of RAM for the tasks at hand, should it not swap out a process that's inactive in one fell swoop? Likewise, when said process resumes, it's probably worth bringing that entire working set back into memory. That way the user will only experience a brief pause on the first keystroke issued to bash, not the 'pause on the first character typed, then pause as the line editing code faults back in...' -ben -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
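A rough sketch of the swap-in read-ahead idea. The helper names are deliberately made up, since the 2.1 tree has no such hook; the point is only that a fault on one swap slot also schedules asynchronous reads of its neighbours, so a returning process pages back in with fewer synchronous disk waits.

    #define SWAPIN_CLUSTER 4   /* illustrative read-ahead window */

    /* Called while a major fault is already reading swap slot 'slot'.
     * The neighbouring slots of the same swap device are queued as
     * asynchronous reads, on the theory that a process's pages were
     * written out to nearby slots, so when the process wakes up it
     * takes one long pause instead of a string of single-page waits. */
    static void swapin_readahead_sketch(unsigned long slot)
    {
        unsigned long next;

        for (next = slot + 1; next <= slot + SWAPIN_CLUSTER; next++) {
            if (!swap_slot_in_use(next))     /* hypothetical helper        */
                break;                       /* stop at the first free slot */
            queue_async_swap_read(next);     /* hypothetical helper        */
        }
    }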
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-07 17:32 ` Benjamin C.R. LaHaise @ 1998-07-08 13:54 ` Stephen C. Tweedie 1998-07-08 21:19 ` Andrea Arcangeli 0 siblings, 1 reply; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-08 13:54 UTC (permalink / raw) To: Benjamin C.R. LaHaise Cc: Rik van Riel, Stephen C. Tweedie, Andrea Arcangeli, Linux MM, Linux Kernel Hi, On Tue, 7 Jul 1998 13:32:34 -0400 (EDT), "Benjamin C.R. LaHaise" <blah@kvack.org> said: > This is the wrong fix for the case that Andrea is complaining about - > tossing out chunks of processes piecemeal, resulting in a lengthy page-in > time when the process becomes active again. Two things that might help > with this are: read-ahead on swapins, and *true* swapping. I'm unconvinced. It's pretty clear that the underlying problem is that the cache is far too aggressive when you are copying large amounts of data around. The fact that interactive performance is bad suggests not that the swapping algorithm is making bad decisions, but that it is being forced to work with far too little physical memory due to the cache size. There's no doubt that swap readahead and true full-process swapping can give us performance benefits, but Andrea is quite clearly seeing enormous resident cache sizes when copying large files to /dev/null, and that's a problem which we need to tackle independently of the swapper's own page selection algorithms. --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-08 13:54 ` Stephen C. Tweedie @ 1998-07-08 21:19 ` Andrea Arcangeli 1998-07-11 11:18 ` Rik van Riel 0 siblings, 1 reply; 40+ messages in thread From: Andrea Arcangeli @ 1998-07-08 21:19 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Benjamin C.R. LaHaise, Rik van Riel, Linux MM, Linux Kernel On Wed, 8 Jul 1998, Stephen C. Tweedie wrote: >I'm unconvinced. It's pretty clear that the underlying problem is that >the cache is far too agressive when you are copying large amounts of >data around. The fact that interactive performance is bad suggests not >that the swapping algorithm is making bad decisions, but that it is >being forced to work with far too little physical memory due to the >cache size. Yes, this is exactly what I think too. Andrea[s] Arcangeli -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-08 21:19 ` Andrea Arcangeli @ 1998-07-11 11:18 ` Rik van Riel 1998-07-11 21:11 ` Stephen C. Tweedie 0 siblings, 1 reply; 40+ messages in thread From: Rik van Riel @ 1998-07-11 11:18 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Linux MM, Stephen Tweedie, Linux Kernel On Wed, 8 Jul 1998, Andrea Arcangeli wrote: > On Wed, 8 Jul 1998, Stephen C. Tweedie wrote: > > >I'm unconvinced. It's pretty clear that the underlying problem is that > >the cache is far too agressive when you are copying large amounts of > >data around. The fact that interactive performance is bad suggests not > >that the swapping algorithm is making bad decisions, but that it is > >being forced to work with far too little physical memory due to the > >cache size. This morning I have posted a patch to Linux MM which can drastically improve this situation. For the low-mem linux-kernel users, you can get the patch from my homepage too. Rik. +-------------------------------------------------------------------+ | Linux memory management tour guide. H.H.vanRiel@phys.uu.nl | | Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ | +-------------------------------------------------------------------+ -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-11 11:18 ` Rik van Riel @ 1998-07-11 21:11 ` Stephen C. Tweedie 0 siblings, 0 replies; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-11 21:11 UTC (permalink / raw) To: Rik van Riel; +Cc: Andrea Arcangeli, Linux MM, Stephen Tweedie, Linux Kernel Hi, On Sat, 11 Jul 1998 13:18:35 +0200 (CEST), Rik van Riel <H.H.vanRiel@phys.uu.nl> said: > This morning I have posted a patch to Linux MM which can > drastically improve this situation. > For the low-mem linux-kernel users, you can get the patch > from my homepage too. I can't see it... --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-07 15:54 ` Rik van Riel 1998-07-07 17:32 ` Benjamin C.R. LaHaise @ 1998-07-08 13:45 ` Stephen C. Tweedie 1998-07-08 18:57 ` Rik van Riel 1 sibling, 1 reply; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-08 13:45 UTC (permalink / raw) To: Rik van Riel; +Cc: Stephen C. Tweedie, Andrea Arcangeli, Linux MM, Linux Kernel Hi, On Tue, 7 Jul 1998 17:54:46 +0200 (CEST), Rik van Riel <H.H.vanRiel@phys.uu.nl> said: > There's a good compromize between balancing per-page > and per-process. We can simply declare the last X > (say 8) pages of a process holy unless that process > has slept for more than Y (say 5) seconds. Yep --- this is per-process RSS management, and there is a _lot_ we can do once we start following this route. I've been talking with some folk about it already, and this is something we definitely want to look into for 2.3. For example, we can do both RSS limits (upper limits to RSS) plus RSS quotas (a guaranteed lower limit which we allocate to the process). Consider a machine where we have some very large processes thrashing away; placing an RSS limit on those excessive processes will prevent them from hogging all of physical memory, and giving interactive processes a small guaranteed RSS quota will ensure that those processes are allowed to make at least some progress even under severe VM load. The hard part is the self-tuning --- making sure that we don't give a resident quota to idle processes, so that they can be fully swapped out, and making sure that we don't overly trim back large processes for which there is actually sufficient physical memory. However, the principle of RSS management is a powerful one and we should most certainly be doing this for 2.3. --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
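A compressed sketch of what per-process RSS enforcement could look like in the page stealer. The rss_limit and rss_quota fields are hypothetical (2.1 tracks a process's rss but enforces no limit), so treat this as an illustration of the policy rather than of real code.

    struct mm_sketch {            /* hypothetical fields, illustration only    */
        unsigned long rss;        /* resident pages                            */
        unsigned long rss_limit;  /* hard ceiling: steal from here first       */
        unsigned long rss_quota;  /* guaranteed floor: never steal below it    */
    };

    /* Asked by the page stealer when it considers a victim process. */
    static int should_steal_from(struct mm_sketch *mm, int memory_tight)
    {
        if (mm->rss > mm->rss_limit)
            return 1;    /* over its ceiling: trim it even without pressure   */

        if (mm->rss <= mm->rss_quota)
            return 0;    /* under its guaranteed quota: leave it alone so
                          * interactive tasks keep making progress            */

        return memory_tight;   /* in between: only when memory is contended   */
    }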
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-08 13:45 ` Stephen C. Tweedie @ 1998-07-08 18:57 ` Rik van Riel 1998-07-08 22:11 ` Stephen C. Tweedie 0 siblings, 1 reply; 40+ messages in thread From: Rik van Riel @ 1998-07-08 18:57 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrea Arcangeli, Linux MM, Linux Kernel On Wed, 8 Jul 1998, Stephen C. Tweedie wrote: > On Tue, 7 Jul 1998 17:54:46 +0200 (CEST), Rik van Riel > <H.H.vanRiel@phys.uu.nl> said: > > > There's a good compromise between balancing per-page > > and per-process. We can simply declare the last X > > (say 8) pages of a process holy unless that process > > has slept for more than Y (say 5) seconds. > > Yep --- this is per-process RSS management, and there is a _lot_ we > can do once we start following this route. I've been talking with > some folk about it already, and this is something we definitely want > to look into for 2.3. > > The hard part is the self-tuning --- making sure that we don't give a When my zone allocator is finished, it'll be a piece of cake to implement lazy page reclamation. With lazy reclamation, we simply place an upper limit on the number of _active_ pages. A process that's really thrashing away will simply be moving its pages to/from the inactive list. And when memory pressure increases, other processes will start taking pages away from the inactive pages collection of our memory hog. That looks quite OK to me... Rik. +-------------------------------------------------------------------+ | Linux memory management tour guide. H.H.vanRiel@phys.uu.nl | | Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ | +-------------------------------------------------------------------+ -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-08 18:57 ` Rik van Riel @ 1998-07-08 22:11 ` Stephen C. Tweedie 1998-07-09 7:43 ` Rik van Riel 1998-07-09 20:39 ` Rik van Riel 0 siblings, 2 replies; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-08 22:11 UTC (permalink / raw) To: Rik van Riel; +Cc: Stephen C. Tweedie, Andrea Arcangeli, Linux MM, Linux Kernel Hi, On Wed, 8 Jul 1998 20:57:27 +0200 (CEST), Rik van Riel <H.H.vanRiel@phys.uu.nl> said: > When my zone allocator is finished, it'll be a piece of > cake to implement lazy page reclamation. I've already got a working implementation. The issue of lazy reclamation is pretty much independent of the allocator underneath; I don't see it being at all hard to run the lazy reclamation stuff on top of any form of zoned allocation. > With lazy reclamation, we simply place an upper limit > on the number of _active_ pages. A process that's really > thrashing away will simply be moving it's pages to/from > the inactive list. Exactly. We _do_ want to be able to increase the RSS limit dynamically to avoid moving too many pages in and out of the working set, but if the process's working set is _that_ large, then performance will be dominated so much by L2 cache trashing and CPU TLB misses that the extra minor page faults we'd get are unlikely to be a catastrophic performance problem. In short, if there's no contention on memory, there's no need to impose RSS limits at all: it's just an extra performance cost. But as soon as physical memory contention becomes important, the RSS management is an obvious way of restricting the performance impact of the large processes on the rest of the system. > And when memory pressure increases, other processes will > start taking pages away from the inactive pages collection > of our memory hog. Precisely. > That looks quite OK to me... Yep. That's one of the main motivations behind the swap cache work in 2.1: the way the swapper now works, we can unhook pages from the process's page tables and send them to swap once the RSS limit is exceeded, but keep a copy of those pages in the swap cache so that if the process wants a page back before we've got around to reusing the memory, it's just a minor fault to bring it back in. All of this code is already present in 2.1 now. The only thing missing is the maintenance of the LRU list of lazy pages for reuse. --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
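A sketch of the missing piece mentioned above, an LRU of lazy pages that are already unmapped but still valid in the swap cache. The list and all helper names are assumptions for illustration; 2.1 has the swap cache but no such list.

    /* Hypothetical global LRU of "lazy" pages: unmapped from their page
     * tables, still valid in the swap cache, reusable without disk I/O. */
    struct lazy_lru;                                       /* opaque, illustration */
    extern struct lazy_lru inactive_list;

    /* Trimmed past its RSS limit: the page is unhooked from the page
     * tables and parked on the lazy list instead of being freed. */
    void lazy_park(struct page *page)
    {
        lru_add_tail(&inactive_list, page);                /* hypothetical helper */
    }

    /* A minor fault can take it straight back, with no disk read needed. */
    struct page *lazy_reclaim(unsigned long swap_entry)
    {
        struct page *page = swap_cache_lookup(swap_entry); /* hypothetical helper */
        if (page)
            lru_remove(&inactive_list, page);
        return page;
    }

    /* Memory pressure reuses the oldest parked pages first. */
    struct page *lazy_steal(void)
    {
        return lru_take_head(&inactive_list);              /* hypothetical helper */
    }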
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-08 22:11 ` Stephen C. Tweedie @ 1998-07-09 7:43 ` Rik van Riel 1998-07-09 20:39 ` Rik van Riel 1 sibling, 0 replies; 40+ messages in thread From: Rik van Riel @ 1998-07-09 7:43 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrea Arcangeli, Linux MM, Linux Kernel On Wed, 8 Jul 1998, Stephen C. Tweedie wrote: > <H.H.vanRiel@phys.uu.nl> said: > > > When my zone allocator is finished, it'll be a piece of > > cake to implement lazy page reclamation. > > I've already got a working implementation. The issue of lazy > reclamation is pretty much independent of the allocator underneath; I > don't see it being at all hard to run the lazy reclamation stuff on top > of any form of zoned allocation. The problem with the current allocator is that it stores the pointers to available blocks in the blocks themselves. This means we can't wait till the last moment with lazy reclamation. > is already present in 2.1 now. The only thing missing is the > maintenance of the LRU list of lazy pages for reuse. That part will come for free with my zone allocator. Rik. +-------------------------------------------------------------------+ | Linux memory management tour guide. H.H.vanRiel@phys.uu.nl | | Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ | +-------------------------------------------------------------------+ -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-08 22:11 ` Stephen C. Tweedie 1998-07-09 7:43 ` Rik van Riel @ 1998-07-09 20:39 ` Rik van Riel 1998-07-13 11:54 ` Stephen C. Tweedie 1 sibling, 1 reply; 40+ messages in thread From: Rik van Riel @ 1998-07-09 20:39 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrea Arcangeli, Linux MM On Wed, 8 Jul 1998, Stephen C. Tweedie wrote: > <H.H.vanRiel@phys.uu.nl> said: > > > When my zone allocator is finished, it'll be a piece of > > cake to implement lazy page reclamation. > > I've already got a working implementation. The issue of lazy > reclamation is pretty much independent of the allocator underneath; I We really should integrate this _now_, with the twist that pages which could form a larger buddy should be immediately deallocated. This can give us a cheap way to: - create larger memory buddies - remove some of the pressure on the buddy allocator (no need to grab that last 64 kB area when 25% of user pages are lazy reclaim) Rik. +-------------------------------------------------------------------+ | Linux memory management tour guide. H.H.vanRiel@phys.uu.nl | | Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ | +-------------------------------------------------------------------+ -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-09 20:39 ` Rik van Riel @ 1998-07-13 11:54 ` Stephen C. Tweedie 0 siblings, 0 replies; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-13 11:54 UTC (permalink / raw) To: Rik van Riel; +Cc: Stephen C. Tweedie, Andrea Arcangeli, Linux MM Hi, On Thu, 9 Jul 1998 22:39:10 +0200 (CEST), Rik van Riel <H.H.vanRiel@phys.uu.nl> said: > On Wed, 8 Jul 1998, Stephen C. Tweedie wrote: >> <H.H.vanRiel@phys.uu.nl> said: >> >> > When my zone allocator is finished, it'll be a piece of >> > cake to implement lazy page reclamation. >> >> I've already got a working implementation. The issue of lazy >> reclamation is pretty much independent of the allocator underneath; I > We really should integrate this _now_, with the twist > that pages which could form a larger buddy should be > immediately deallocated. Perhaps, but I don't think Linus will take it. He's right, too, it's too near 2.2 for that. > This can give us a cheap way to: > - create larger memory buddies > - remove some of the pressure on the buddy allocator > (no need to grab that last 64 kB area when 25% of > user pages are lazy reclaim) All it can do is to reduce the pain of doing swapping too aggressively. It doesn't make it much easier to do true defragmentation; it just lets you hang on to the defragmented pages a bit longer, which is a different thing. If you end up with non-pagable pages allocated to kmalloc/slab/page tables all over memory, then lazy reclaim is powerless to help defrag the memory. We need something else for 2.2. --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-05 17:00 ` Rik van Riel 1998-07-05 18:38 ` Andrea Arcangeli @ 1998-07-05 18:57 ` MOLNAR Ingo 1998-07-06 10:24 ` Stephen C. Tweedie 2 siblings, 0 replies; 40+ messages in thread From: MOLNAR Ingo @ 1998-07-05 18:57 UTC (permalink / raw) To: Rik van Riel; +Cc: Andrea Arcangeli, Linux MM, Linux Kernel On Sun, 5 Jul 1998, Rik van Riel wrote: > > I run hdparm -a0 /dev/hda and nothing change. Now the cache take 20Mbyte > > of memory running cp file /dev/null while memtest 10000000 is running. > > Hdparm only affects _hardware_ readahead and has nothing > to do with software readahead. nope, -a0 turns off software readahead. -A controls hardware readahead. -- mingo -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-05 17:00 ` Rik van Riel 1998-07-05 18:38 ` Andrea Arcangeli 1998-07-05 18:57 ` MOLNAR Ingo @ 1998-07-06 10:24 ` Stephen C. Tweedie 1998-07-06 13:37 ` Eric W. Biederman 2 siblings, 1 reply; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-06 10:24 UTC (permalink / raw) To: Rik van Riel; +Cc: Andrea Arcangeli, Linux MM, Linux Kernel Hi, On Sun, 5 Jul 1998 19:00:04 +0200 (CEST), Rik van Riel <H.H.vanRiel@phys.uu.nl> said: > On Sun, 5 Jul 1998, Andrea Arcangeli wrote: >> Where does the cache is allocated? Is it allocated in the inode? If so >> kswapd should shrink the inode before start swapping out! > The cache is also mapped into a process'es address space. > Currently we would have to walk all pagetables to find a > specific page ;( Not in this case, where the file is just being copied. For a copy, the reads exist unmapped in the page cache; only mmap() creates mapped pages. > When Stephen and Ben have merged their PTE stuff, we can > do the freeing much easier though... In this case, it's not an issue, so we need to fix it for 2.2. >> I had to ask "2.0.34 has balancing code implemented and >> running?". The > 2.0 has no balancing code at all. At least, not AFAIK... It does: the Duff's device in try_to_free_page does it, and seems to work well enough. It was certainly tuned tightly enough: all of the hard part of getting the kswap stuff working well in try_to_swap_out() was to do with tuning the aggressiveness of swap relative to the buffer and cache reclaim mechanisms so that the try_to_free_page loop works well. That's why the recent policies of adding little rules here and there all over the mm layer have disturbed the balance so much, I think. >> Is there a function call (such us shrink_mmap for mmap or >> kmem_cache_reap() for slab or shrink_dcache_memory() for dcache) that >> is able to shrink the cache allocated by cp file /dev/zero? > shrink_mmap() can only shrink unlocked and clean buffer pages > and unmapped cache pages. We need to go through either bdflush > (for buffer) or try_to_swap_out() first, in order to make some > easy victims for shrink_mmap()... Only for mapped files, not files copied through the standard read/write calls. --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
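For readers without the 2.0 source at hand, the balancing referred to here is a rotation: try_to_free_page() remembers which reclaim source it tried last and keeps cycling through the page/buffer cache, SysV shm and process pages at increasing priority until one of them gives up a page. The following is a loose paraphrase with simplified, renamed helpers, not the literal 2.0 code (which uses a fall-through switch).

    /* paraphrase of the 2.0 balancing loop (helpers renamed, signatures simplified) */
    static int state = 0;   /* 0 = page/buffer cache, 1 = SysV shm, 2 = process pages */

    int try_to_free_page_sketch(int priority_levels)
    {
        int priority;

        for (priority = priority_levels; priority > 0; priority--) {
            int tries = 3;      /* one attempt per reclaim source per level */

            while (tries--) {
                switch (state) {
                case 0:
                    state = 1;
                    if (shrink_mmap_sketch(priority))     /* drop a cache page     */
                        return 1;
                    break;
                case 1:
                    state = 2;
                    if (shm_swap_sketch(priority))        /* free a SysV shm page  */
                        return 1;
                    break;
                default:
                    state = 0;
                    if (swap_out_sketch(priority))        /* unmap a process page  */
                        return 1;
                    break;
                }
            }
        }
        return 0;               /* nothing reclaimable at any priority */
    }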
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-06 10:24 ` Stephen C. Tweedie @ 1998-07-06 13:37 ` Eric W. Biederman 1998-07-07 12:35 ` Stephen C. Tweedie 0 siblings, 1 reply; 40+ messages in thread From: Eric W. Biederman @ 1998-07-06 13:37 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Rik van Riel, Andrea Arcangeli, Linux MM, Linux Kernel >>>>> "ST" == Stephen C Tweedie <sct@redhat.com> writes: ST> It does: the Duff's device in try_to_free_page does it, and seems to ST> work well enough. It was certainly tuned tightly enough: all of the ST> hard part of getting the kswap stuff working well in try_to_swap_out() ST> was to do with tuning the aggressiveness of swap relative to the buffer ST> and cache reclaim mechanisms so that the try_to_free_page loop works ST> well. That's why the recent policies of adding little rules here and ST> there all over the mm layer have disturbed the balance so much, I think. The use of touch_page and age_page appears to be the most likely reason the page cache is more persistent than it used to be. If I'm not mistaken, shrink_mmap must be called more often now to remove a given page. Eric -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cp file /dev/zero <-> cache [was Re: increasing page size] 1998-07-06 13:37 ` Eric W. Biederman @ 1998-07-07 12:35 ` Stephen C. Tweedie 0 siblings, 0 replies; 40+ messages in thread From: Stephen C. Tweedie @ 1998-07-07 12:35 UTC (permalink / raw) To: Eric W. Biederman Cc: Stephen C. Tweedie, Rik van Riel, Andrea Arcangeli, Linux MM, Linux Kernel Hi, On 06 Jul 1998 08:37:02 -0500, ebiederm+eric@npwt.net (Eric W. Biederman) said: > The use of touch_page and age_page appears to be the most likely > reason the page cache is more persistent than it used to be. Yes, very much so. > If I'm not mistaken, shrink_mmap must be called more often now to > remove a given page. Indeed. Three things I think we need to do are to lower the age ceiling for the page cache pages; perform page allocations for the page cache with a GFP_CACHE flag which forces us to look for other cache pages first in try_to_free_page; and try to eliminate several pages at a time from the page cache when we can. (There's no point in keeping only half the pages from a closed, sequentially accessed file in cache.) The first two of these are definitely small enough and clean enough changes to be appropriate for 2.1. --Stephen -- This is a majordomo managed list. To unsubscribe, send a message with the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org ^ permalink raw reply [flat|nested] 40+ messages in thread
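A sketch of the second suggestion, a cache-biased allocation flag: reclaim triggered by a page-cache allocation first tries to recycle the cache itself before touching process pages. The flag and helpers are hypothetical (they reuse the renamed sketches above), and the other two suggestions, the lower age ceiling and multi-page dropping, are not shown.

    #define __GFP_CACHE_SKETCH  0x80   /* hypothetical flag bit for cache allocations */

    /* Reclaim entry point biased as proposed above: an allocation made on
     * behalf of the page cache must first try to recycle the cache itself,
     * so a streaming "cp bigfile /dev/null" mostly cannibalises its own
     * pages instead of pushing process pages out to swap. */
    int try_to_free_page_biased(int gfp_mask)
    {
        if (gfp_mask & __GFP_CACHE_SKETCH) {
            if (shrink_mmap_sketch(1))       /* cache pages only, aggressively */
                return 1;
        }

        /* cache had nothing to give (or a non-cache allocation):
         * fall back to the normal balanced rotation */
        return try_to_free_page_sketch(6);
    }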