From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from flinx.npwt.net (eric@flinx.npwt.net [208.236.161.237])
	by kvack.org (8.8.7/8.8.7) with ESMTP id XAA09922
	for ; Wed, 17 Jun 1998 23:52:47 -0400
Subject: Re: PTE chaining, kswapd and swapin readahead
References: 
From: ebiederm+eric@npwt.net (Eric W. Biederman)
Date: 17 Jun 1998 23:06:58 -0500
In-Reply-To: Rik van Riel's message of Wed, 17 Jun 1998 18:03:14 +0200 (CEST)
Message-ID: 
Sender: owner-linux-mm@kvack.org
To: Rik van Riel
Cc: Linux MM
List-ID: 

>>>>> "RR" == Rik van Riel writes:

RR> On 17 Jun 1998, Eric W. Biederman wrote:

>> If we get around to using a true LRU algorithm we aren't too likely
>> to swap out address space adjacent pages... Though I can see the
>> advantage for pages of the same age.

RR> True LRU swapping might actually be a disadvantage. The way
RR> we do things now (walking process address space) can result
RR> in a much larger I/O bandwidth to/from the swapping device.

The truly optimal algorithm is to read in the page/pages we are going
to use next, and remove the page we won't use for the longest period
of time. It's the same as LRU, but in reverse time order. Both of
these algorithms have the important property that they avoid Belady's
anomaly; that is, with more memory they won't cause more page faults
and more I/O.

The goal should be to reduce disk I/O, as disk bandwidth is way below
memory bandwidth. Using ``unused'' disk bandwidth for prepaging may
also be a help.

Note: much of the write I/O performance we achieve is because
get_swap_page() is very efficient at returning adjacent swap pages.
I don't see where location in memory makes a difference.

As for which pages should be put close to each other for read
performance, that is a different question. Files tend to be
read/written sequentially, so it is a safe bet to anticipate this one
usage pattern: if it is going on, capitalize on it; if not, don't do
any read ahead. We could probably add a few more likely cases to the
vm system. The only simple special cases I can think to add are
reverse sequential access, and stack access, where pages 1 2 3 4 are
accessed and then 4 3 2 1 are accessed in reverse order. But the
general case is quite difficult to predict, and a wrong prediction
could make things worse.

>> Also for swapin readahead the only effective strategy I know is to
>> implement a kernel system call, that says I'm going to be accessing

The point I was hoping to make is that for programs that find
themselves swapping frequently, a non-blocking read (for mmapped
areas) can be quite effective, because in certain circumstances a
program can in fact predict which pages it will be using next. This
would allow a very close approximation of the true optimal paging
algorithm, and it is fairly simple to implement.

For a really close approximation we might also want to have a system
call that says which pages we aren't likely to be using soon.
Perhaps:

	mtouch(void *addr, int len, int when);

where ``when'' can be SOON or LATER ...

Or, after looking at
http://cesdis.gsfc.nasa.gov/beowulf/software/pager.html:

	struct mtouch_tuple {
		caddr_t addr;
		size_t extent;
		int when;
	} prepage_list[NR_ELEMENTS];

	mtouch(&prepage_list);

where prepage_list is terminated with a special element. I don't like
the suggested terminator of a NULL address of 0, because there are
some programs that actually use that... The ``when'' parameter is my
idea...
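To make the intent concrete, here is a rough user-space sketch of how
the tuple interface might look from the caller's side. Everything in
it is hypothetical: the mtouch() prototype, the SOON/LATER values,
and the -1 terminator are just one possible spelling, not a
worked-out proposal.

	/* Hypothetical usage sketch -- nothing below exists in any
	 * kernel. A program that knows its access pattern hands the
	 * pager a list saying "read this in now" and "feel free to
	 * page that out".
	 */
	#include <stddef.h>
	#include <sys/types.h>

	#define SOON  1		/* will touch soon: start reading in */
	#define LATER 2		/* won't touch for a while: may evict */

	struct mtouch_tuple {
		caddr_t addr;		/* start of the region */
		size_t extent;		/* length in bytes */
		int when;		/* SOON or LATER */
	};

	extern int mtouch(struct mtouch_tuple *list);	/* proposed call */

	void advise_pager(char *done, char *next, size_t chunk)
	{
		struct mtouch_tuple list[3];

		list[0].addr   = (caddr_t) next;	/* about to process */
		list[0].extent = chunk;
		list[0].when   = SOON;

		list[1].addr   = (caddr_t) done;	/* finished with */
		list[1].extent = chunk;
		list[1].when   = LATER;

		/* Terminate with -1 rather than NULL, since address 0
		 * is actually in use in some programs. */
		list[2].addr   = (caddr_t) -1;
		list[2].extent = 0;
		list[2].when   = 0;

		mtouch(list);
	}

A program marching through a large mmapped file could call something
like advise_pager() once per chunk; that gives the kernel exactly the
future knowledge the optimal algorithm needs, but only where the
program really can predict it.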
RR> There are more possibilities. One of them is to use the
RR> same readahead tactic that is being used for mmap()
RR> readahead.

Actually, that sounds like a decent idea, but I doubt it will help
much. I will start on the vnodes fairly soon, after I get a kernel
pgflush daemon working.

Eric
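P.S. Since Belady's anomaly came up: a throwaway user-space
simulation (a sketch, not kernel code) shows it nicely. With the
classic reference string 1 2 3 4 1 2 5 1 2 3 4 5, FIFO replacement
takes 9 faults with 3 frames but 10 faults with 4, while LRU, being
a stack algorithm, never gets worse with more memory (10 and 8
faults here).

	/* Demonstrate Belady's anomaly: FIFO can fault MORE when
	 * given MORE frames; LRU cannot. User-space sketch only.
	 */
	#include <stdio.h>
	#include <limits.h>

	static int simulate(const int *refs, int n, int frames, int lru)
	{
		int page[8], stamp[8], faults = 0, i, j, time;

		for (j = 0; j < frames; j++) {
			page[j] = -1;		/* frame empty */
			stamp[j] = INT_MIN;	/* empty frames fill first */
		}
		for (i = 0, time = 0; i < n; i++, time++) {
			int victim = 0;

			for (j = 0; j < frames; j++)
				if (page[j] == refs[i])
					break;
			if (j < frames) {	/* hit */
				if (lru)
					stamp[j] = time; /* track last use */
				continue;	/* FIFO keeps load time */
			}
			faults++;
			for (j = 1; j < frames; j++)	/* evict the frame */
				if (stamp[j] < stamp[victim]) /* oldest stamp */
					victim = j;
			page[victim] = refs[i];
			stamp[victim] = time;
		}
		return faults;
	}

	int main(void)
	{
		static const int refs[] = { 1,2,3,4,1,2,5,1,2,3,4,5 };
		int n = sizeof(refs) / sizeof(refs[0]);

		printf("FIFO: 3 frames -> %d faults, 4 -> %d\n",
		       simulate(refs, n, 3, 0), simulate(refs, n, 4, 0));
		printf("LRU:  3 frames -> %d faults, 4 -> %d\n",
		       simulate(refs, n, 3, 1), simulate(refs, n, 4, 1));
		return 0;
	}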