On Tue, Apr 14, 2009 at 03:29:52PM +0800, Andi Kleen wrote:
> Wu Fengguang writes:
> >
> > The context readahead algorithm guarantees to discover the
> > sequentialness no matter how the streams are interleaved. For the
> > above example, it will start sequential readahead since page 2 and
> > 1002.
> >
> > The trick is to poke for page @offset-1 in the page cache when it
> > has no other clues on the sequentialness of request @offset: if the
> > current request belongs to a sequential stream, that stream must
> > have accessed page @offset-1 recently,
>
> Really? The page could be just randomly cached from some previous IO,
> right? I don't think you can detect that right? [maybe you could play
> some games with the LRU list, but that would be likely not O(1)]
>
> I wonder if this could cause the prefetcher to "switch gears" suddenly
> when it first walks a long stretch of non cached file data and then
> suddenly hits a cached page.
>
> One possible way around this would be to add a concept of "page
> generation". E.g. use a few bits of page flags, and keep a global
> generation count that increases slowly (let's say every few seconds).
> Mark the page with the current generation count when the prefetcher
> takes it. When doing your algorithm, check the count first and only
> use pages with a recent count.
>
> Not sure if it's a real problem or not.

Good catch, Andi! I'll list the possible situations below. I guess you
are referring to (2.3)?

1) page at @offset-1 is present, and it may be a trace of:

1.1) (interleaved) sequential reads: we catch it, bingo!

1.2) clustered random reads: readahead will be mostly favorable

If page @offset-1 exists, it means there were two references that came
close in both space (distance < one readahead window) and time
(distance < one LRU scan cycle). So it's a good indication that some
clustered random reads are occurring in the area around @offset.

I have done many experiments on the performance impact of readahead on
clustered random reads - the results are very encouraging. The tests
covered different random read densities and random read sizes, as well
as different thrashing conditions (by varying the dataset:memory
ratio). There is hardly any performance loss (2% for the worst case),
but the gain can be as large as 300%! Some numbers, curves and analysis
can be found in the attached slides and paper:

- readahead performance slides:
  Page 23-26: Clustered random reads
- readahead framework paper:
  Page 7, Section 4.3: Random reads

The recent readahead addition for mmap reads provides another vivid
example of the stated "readahead is good for clustered random reads"
principle. The side effects of readahead on the random references made
by executables are very good:

- major faults reduced by 1/3
- mmap IO numbers reduced by 1/4
- with no obvious overheads

But as always, one can fake a workload to totally defeat the readahead
heuristics ;-)
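To make the probing trick concrete: it is no more than a plain page
cache lookup. A minimal mock-up could look like the sketch below (not
the actual patch code - probe_history_page is a made-up helper name;
the real logic would live in mm/readahead.c):

	#include <linux/pagemap.h>

	/*
	 * Probe for a history page just before @offset.  A hit
	 * suggests that some stream accessed page @offset-1 recently,
	 * so starting sequential readahead at @offset is probably
	 * worthwhile.
	 */
	static int probe_history_page(struct address_space *mapping,
				      pgoff_t offset)
	{
		struct page *page;

		if (!offset)
			return 0;	/* start of file: nothing to probe */

		page = find_get_page(mapping, offset - 1);
		if (!page)
			return 0;	/* no trace: treat as random read */

		page_cache_release(page); /* drop the reference we took */
		return 1;		/* history hit: likely sequential */
	}

Note that this is a single page cache lookup per decision - no LRU list
games are needed, which keeps the heuristic cheap.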
2) page at @offset-1 is not present (for a sequential stream)

2.1) aggressive user space drop-behind: fixable nuisance

The user space could be doing fadvise(offset-1, DONTNEED), which will
drop the history hint required to enable context readahead. But I guess
when the administrator/developer notices its impact - performance
dropped instead of increased - he can easily fix it up to do
fadvise(offset-2, DONTNEED), or manage its own readahead via
fadvise(WILLNEED). So this is a nuisance, but a fixable one (see the
sketch at the end of this mail).

2.2) readahead thrashing: not handled for now

We don't handle this for now. The current behavior is to stop readahead
and possibly restart the readahead window ramp-up process.

2.3) readahead cache hits: rare case, and the impact is temporary

The page at @offset-1 does get referenced by this stream, but it was
created by someone else some time ago. By the time we reference page
@offset, the page at @offset-1 may have been lifted to the active LRU
by this second reference, or that happened too late and it got
reclaimed.

Normally it's a range of cached pages, and we are

a) either walking inside the range and enjoying the cache hits,
b) or walking out of it and restarting readahead by ourselves,
c) or the range of cached pages gets reclaimed while we are walking on
   them, and hence we cannot find page @offset-1.

Obviously (c) is rare and temporary, and it is the main cause of (2.3).
As soon as we go on to the next page at @offset+1, we'll find its
'previous' page at @offset to be cached (it was created by us!). So the
context readahead starts working again - it's merely delayed by one
page :-)

Thanks,
Fengguang
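PS: for (2.1), here is the fixed-up drop-behind loop I have in mind, as
a minimal user space sketch (illustrative only - the names and the
one-page history margin are my own choices; it just uses plain
posix_fadvise(2)):

	#include <fcntl.h>
	#include <unistd.h>

	/*
	 * Drop-behind reader that deliberately keeps the last page or
	 * two cached, so another interleaved stream probing for its
	 * page @offset-1 can still find the history hint.
	 */
	int main(int argc, char **argv)
	{
		char buf[4096];
		long pagesz = sysconf(_SC_PAGESIZE);
		off_t off = 0;
		ssize_t n;
		int fd;

		if (argc < 2)
			return 1;
		fd = open(argv[1], O_RDONLY);
		if (fd < 0)
			return 1;

		while ((n = read(fd, buf, sizeof(buf))) > 0) {
			off += n;
			/* drop [0, off - 2 pages), keep the rest cached */
			if (off > 2 * pagesz)
				posix_fadvise(fd, 0, off - 2 * pagesz,
					      POSIX_FADV_DONTNEED);
		}
		close(fd);
		return 0;
	}

The point is simply that the DONTNEED range ends one page earlier than
the aggressive fadvise(offset-1, DONTNEED) variant would - i.e. the
fadvise(offset-2, DONTNEED) fix mentioned in (2.1).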