Date: Thu, 3 Aug 2000 18:50:22 -0300 (BRST)
From: Rik van Riel
Subject: Re: RFC: design for new VM
In-Reply-To: <3989C752.DFA26462@norran.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: Roger Larsson
Cc: linux-mm@kvack.org, "Scott F. Kaplan"

On Thu, 3 Aug 2000, Roger Larsson wrote:

> > - data structures (page lists)
> >   - active list
> >     - per node/pgdat
> >     - contains pages with page->age > 0
> >     - pages may be mapped into processes
> >     - scanned and aged whenever we are short
> >       on free + inactive pages
> >     - maybe multiple lists for different ages,
> >       to be better resistant against streaming IO
> >       (and for lower overhead)
>
> Does this really need to be a list? Since most pages should
> be on this list can't it be virtual - pages on no other list
> are on active list. All pages are scanned all the time...

It doesn't have to be a list per se, but since we have the
list head in the page struct anyway we might as well make
it one.

> >   - inactive_dirty list
> >     - per zone
> >     - contains dirty, old pages (page->age == 0)
> >     - pages are not mapped in any process
> >   - inactive_clean list
> >     - per zone
> >     - contains clean, old pages
> >     - can be reused by __alloc_pages, like free pages
> >     - pages are not mapped in any process
>
> What will happen to pages on these lists if pages get referenced?
> * Move them back to the active list? Then it is hard to know how
>   many freeable pages there really are...

Indeed, we will move such a page back to the active list.
"Luckily" the inactive pages are not mapped, so we have to
locate them through find_page_nolock() and friends, which
allows us to move the page back to the active list, adjust
statistics and maybe even wake up kswapd as needed.
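
As an illustration of the list transitions discussed above, here is a rough userspace sketch in C. It is not kernel code. The struct and function names (`zone_counts`, `deactivate_page`, `reactivate_page`), the `start_age` parameter, and folding the per-node active count into the same struct as the per-zone counts are all assumptions made for brevity. Only `page->age`, the three list names and the `free + inactive_clean >= pages_high` goal come from the design itself.

```c
#include <assert.h>

/* Hypothetical stand-ins for the three page lists in the design. */
enum page_list { LIST_ACTIVE, LIST_INACTIVE_DIRTY, LIST_INACTIVE_CLEAN };

struct page {
    int age;              /* > 0 on the active list, 0 once inactive */
    int dirty;            /* nonzero if the page still needs writeback */
    enum page_list list;  /* which list the page currently sits on */
};

/* Simplified counters; the design keeps the active list per node
 * and the inactive lists per zone, folded together here. */
struct zone_counts {
    long free_pages;
    long active;
    long inactive_dirty;
    long inactive_clean;
    long pages_high;
};

/* An aged-out page (age == 0) leaves the active list and goes to the
 * dirty or clean inactive list depending on its dirty flag. */
static void deactivate_page(struct zone_counts *z, struct page *p)
{
    assert(p->list == LIST_ACTIVE && p->age == 0);
    z->active--;
    if (p->dirty) {
        p->list = LIST_INACTIVE_DIRTY;
        z->inactive_dirty++;
    } else {
        p->list = LIST_INACTIVE_CLEAN;
        z->inactive_clean++;
    }
}

/* A lookup found an inactive page: move it back to the active list,
 * adjust the statistics, and return nonzero if the zone has dropped
 * below its free + inactive_clean goal, i.e. the caller may want to
 * wake the reclaim thread. start_age is an assumed tunable. */
static int reactivate_page(struct zone_counts *z, struct page *p,
                           int start_age)
{
    if (p->list == LIST_INACTIVE_DIRTY)
        z->inactive_dirty--;
    else if (p->list == LIST_INACTIVE_CLEAN)
        z->inactive_clean--;
    else
        return 0;         /* already active, nothing to do */

    p->list = LIST_ACTIVE;
    p->age = start_age;
    z->active++;
    return z->free_pages + z->inactive_clean < z->pages_high;
}
```

Because the counters are updated at the moment of reactivation, the number of freeable pages stays accurate even though pages can leave the inactive lists at any time.
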

> > - other data structures
> >   - int memory_pressure
> >     - on page allocation or reclaim, memory_pressure++
> >     - on page freeing, memory_pressure-- (keep it >= 0, though)
> >     - decayed on a regular basis (eg. every second x -= x>>6)
> >     - used to determine inactive_target
> >   - inactive_target == one (two?) second(s) worth of memory_pressure,
> >     which is the amount of page reclaims we'll do in one second
> >     - free + inactive_clean >= zone->pages_high
> >     - free + inactive_clean + inactive_dirty >= zone->pages_high \
> >       + one_second_of_memory_pressure * (zone_size / memory_size)
>
> One of the most interesting aspects (IMHO) of Scott F. Kaplan's
> "Compressed Cache and Virtual Memory Simulation" was the use of
> VM time instead of wall time. One second could be too long of a
> reaction time - relative to X allocations/sec etc.

It's just the inactive target. Trying to keep one second of
unmapped pages with page->age==0 around is mainly done to:
- make sure we can flush all of them on time
- put an "appropriate" amount of pressure on the
  pages in the active list, so page aging is smoothed
  out a little bit

> >   - inactive_target will be limited to some sane maximum
> >     (like, num_physpages / 4)
>
> Question: Why is this needed?
> Answer: Due to high memory_pressure can only exist momentarily.
> And can pollute our statistics.

Indeed. Imagine Netscape starting on a 32MB machine. 10MB
allocated within the second, but there's no way we want
the inactive list to grow to that size...

> > The idea is that when we have enough old (inactive + free)
> > pages, we will NEVER move pages from the active list to the
> > inactive lists. We do that because we'd rather wait for some
> > IO completion than evict the wrong page.
>
> So, will the scanning stop then??? And referenced builds up.
> Or will there be pages with age == 0 on the active list?

Active scanning goes on only when we have a shortage of
inactive pages.
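
To make the bookkeeping above concrete, here is a minimal userspace sketch in C of the memory_pressure counter, the capped inactive target, and the per-zone shortage check that would gate active scanning. It is an illustration under assumptions, not the actual implementation. The names `vm_pressure`, `zone_state`, `pressure_*` and `zone_inactive_shortage` are made up, and feeding the capped target (rather than the raw one-second pressure) into the per-zone formula is also an assumption.

```c
#include <assert.h>

/* Hypothetical container for the global pressure counter. */
struct vm_pressure {
    long memory_pressure;
};

/* On page allocation or reclaim: memory_pressure++ */
static void pressure_on_alloc_or_reclaim(struct vm_pressure *vm)
{
    vm->memory_pressure++;
}

/* On page freeing: memory_pressure--, but never below zero. */
static void pressure_on_free(struct vm_pressure *vm)
{
    if (vm->memory_pressure > 0)
        vm->memory_pressure--;
}

/* Called on a regular basis (e.g. once per second): x -= x >> 6 */
static void pressure_decay(struct vm_pressure *vm)
{
    vm->memory_pressure -= vm->memory_pressure >> 6;
}

/* One second's worth of pressure, limited to num_physpages / 4. */
static long inactive_target(const struct vm_pressure *vm,
                            long num_physpages)
{
    long target = vm->memory_pressure;
    long max = num_physpages / 4;

    return target > max ? max : target;
}

/* Hypothetical per-zone counters, all in pages. */
struct zone_state {
    long free_pages;
    long inactive_clean;
    long inactive_dirty;
    long pages_high;
    long zone_size;
};

/* Nonzero if the zone misses either goal from the design:
 *   free + inactive_clean >= pages_high
 *   free + inactive_clean + inactive_dirty
 *       >= pages_high + target * (zone_size / memory_size)
 * The multiplication is done before the division so the zone's
 * share does not truncate to zero in integer arithmetic. */
static int zone_inactive_shortage(const struct zone_state *z,
                                  long target, long memory_size)
{
    long zone_share = target * z->zone_size / memory_size;
    long clean = z->free_pages + z->inactive_clean;

    if (clean < z->pages_high)
        return 1;
    return clean + z->inactive_dirty < z->pages_high + zone_share;
}
```

Assuming 4 KB pages, a 32MB machine has 8192 pages, so the cap works out to 2048 pages (8MB). A 10MB burst (2560 pages) within one second would therefore be clipped rather than dragging the inactive target up with it.
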
Also, when we aren't scanning, the page age of no page will
magically change to 0 ;)

> This contradicts "very light background ageing" earlier.

Nope. If the system does no scanning of pages for some time
(say 1 minute), we will simply scan some fraction of the
inactive list. That way we can guarantee that we'll not have
OLD referenced bits lingering around and messing up page
aging when we start running out of memory.

> > If memory_pressure is high and we're doing a lot of dirty
> > disk writes, the bdflush percentage will kick in and we'll
> > be doing extra-aggressive cleaning. In that case bdflush
> > will automatically become more aggressive the more page
> > replacement is going on, which is a good thing.
>
> I think that one of the omissions in Kaplan's report is the
> time it takes to clean dirty pages. (Or have I missed
> something... Need to select the pages earlier)

Page replacement (select which page to replace) should always
be independent from page flushing. You can make pretty decent
decisions on which page(s) to free and the last thing you want
is having them messed up by page flushing.

> > Misc
> >
> > Page aging and flushing are completely separated in this
> > scheme. We'll never end up aging and freeing a "wrong" clean
> > page because we're waiting for IO completion of old and
> > to-be-freed pages.
>
> Is page ageing modification of LRU enough?

It seems to work fine for FreeBSD. Also, we can always change
the "aging" of the active pages with something else. The
system is modular enough that we can do that.

> In many cases it will probably behave worse than plain LRU
> (slower phase adaptions).

We can change that by using exponential decay for the page
age, or by using some different aging technique...

> The access pattern diagrams in Kaplan's report are very
> enlightening...

They are very interesting indeed, but I miss one very
common workload in their report.
A lot of systems do (multimedia) streaming IO these days,
where a lot of data passes through the cache quickly, but
all of the data is only touched once (or maybe twice).

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/