From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 7 Jun 2000 11:27:56 -0300 (BRST)
From: Rik van Riel
Subject: Re: journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code)
In-Reply-To: <20000607144102.F30951@redhat.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
Return-Path:
To: "Stephen C. Tweedie"
Cc: Hans Reiser, "Quintela Carreira Juan J.", bert hubert,
    linux-kernel@vger.rutgers.edu, Chris Mason, linux-mm@kvack.org,
    Alexander Zarochentcev
List-ID:

On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:
> On Wed, Jun 07, 2000 at 10:23:35AM -0300, Rik van Riel wrote:
>
> > > There is no need for subcaches at all if all of the pages can be
> > > represented on the page cache LRU lists. That would certainly
> > > make balancing between caches easier.
> >
> > Wouldn't this mean we could end up with an LRU cache full of
> > unfreeable pages?
>
> Rik, we need the VM to track dirty pages anyway, precisely so
> that we can obtain some degree of write throttling to avoid
> having the whole of memory full of dirty pages.

*nod*

> If we get short of memory, we really need to start flushing dirty
> pages to disk independently of the task of finding free pages.

Indeed, page replacement and page flushing need to be pretty much
independent of each other, with the only gotcha that page replacement
is able to trigger page flushing...

> > This could get particularly nasty when we have a VM with
> > active / inactive / scavenge lists... (like what I'm working
> > on now)
>
> Right, we definitely need a better distinction between different
> lists and different types of page activity before we can do this.

I think we'll want something like what FreeBSD has, mainly because
their feedback loop is really simple and has proven to be robust.

1) active list
   This list contains the active pages; we age the pages on this
   list and they can be mapped into processes.

2) inactive list
   All pages here are ready to be reclaimed. We are free to reclaim
   the clean inactive pages before the dirty ones (to delay/minimise
   IO) because no page ends up here unless we want to reclaim it
   anyway.

3) scavenge list (BSD calls this the cache list, -EOVERLOADEDWORD)
   All pages here are clean and can be reclaimed for all page
   allocations which have __GFP_WAIT set. We keep only a minimal
   number of free pages, and most times __alloc_pages() is called
   we'll take a scavenge page instead.

4) free list
   Not much of a list; this is the current free page structure. We
   use the pages here for atomic allocations and, when we have too
   many free pages, for normal allocations.

The filesystem callbacks would be made for pages on the inactive
list; the filesystem (or shm, or the swap subsystem) is free to
cluster any "eligible" pages with the page we requested to be freed.

So if, e.g., we request ext3 to flush page X, the filesystem can
make its own decision on whether it also wants to flush some other
inactive (or even active) pages which are contiguous on disk with
the block page X is written to.
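To make that concrete, here is a very rough sketch (all of the names
below are invented purely for illustration, nothing like this exists
in any tree today) of the four lists and the per-inactive-page flush
callback:

#include <linux/list.h>
#include <linux/fs.h>
#include <linux/mm.h>

/* the four LRU lists described above (sketch only) */
static LIST_HEAD(active_list);		/* aged, can be mapped in processes  */
static LIST_HEAD(inactive_list);	/* ready to reclaim, clean or dirty  */
static LIST_HEAD(scavenge_list);	/* clean, used for __GFP_WAIT allocs */
static LIST_HEAD(free_list);		/* small reserve for atomic allocs   */

/*
 * Hypothetical per-address_space callback the VM would invoke for a
 * dirty page on the inactive list.  The filesystem (or shm, or the
 * swap code) is free to also write out other inactive pages that are
 * contiguous on disk with this one.
 */
int (*flush_inactive_page)(struct address_space *mapping,
			   struct page *page);

The point of going through one callback like this is that only the
filesystem knows which other pages are cheap to write out in the
same IO.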
> > Question is, are the filesystems ready to play this game?
>
> With an address_space callback, yes --- ext3 can certainly find
> a transaction covering a given page.

This is what we need...

> I'd imagine reiserfs can do something similar, but even if not,
> it's not important if the filesystem can't do its lookup by
> page.

I don't necessarily agree on this point. What if our inactive list
is filled with pages the filesystem somehow regards as new, and the
filesystem will be busy flushing the "wrong" (in the eyes of the
page stealer) pages?

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/