From mboxrd@z Thu Jan  1 00:00:00 1970
Content-Type: text/plain; charset="iso-8859-1"
From: Daniel Phillips
Subject: Re: [PATCH] modified segq for 2.5
Date: Tue, 10 Sep 2002 02:17:20 +0200
References:
In-Reply-To:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Message-Id:
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Rik van Riel, Andrew Morton
Cc: William Lee Irwin III, sfkaplan@cs.amherst.edu, linux-mm@kvack.org
List-ID:

On Tuesday 10 September 2002 00:41, Rik van Riel wrote:
> On Mon, 9 Sep 2002, Andrew Morton wrote:
> > Do we remove the SetPageReferenced() in generic_file_write?
>
> Good question, I think we'll want to SetPageReferenced() when
> we do a partial write but ClearPageReferenced() when we've
> "written past the end" of the page.

There's no substitute for the real thing: a short delay queue where we
treat all references as a single reference.  In generic_file_write, a
page goes onto this list immediately on instantiation.  On exit from
the delay queue we unconditionally clear the referenced bit and use the
rmap list to discard the pte referenced bits, then move the page to the
inactive list.  From there, a second reference will rescue the page to
the hot end of the active list.

Faulted-in pages, including swapped-in pages, mmapped pages and zeroed
anon pages, take the same path as file IO pages.

A reminder of why we're going to all this effort in the first place:
it's to distinguish automatically between streaming IO and repeated use
of data.  With the improvements described here, we will additionally be
able to detect used-once anon pages, which would include execute-once.

Because of readahead, generic_file_read has to work a little
differently.  Ideally, we'd have a time-ordered readahead list, and when
the readahead heuristics accidentally get too aggressive, we can
cannibalize the future end of the list (and pour some cold water on the
readahead thingy).
A crude approximation of that behavior is just to have a readahead
FIFO, and an even cruder approximation is to use the inactive list for
this purpose.  Unfortunately, the latter is too crude, because
not-yet-used readahead pages have to have a higher priority than
just-used pages, otherwise the former will be recovered before the
latter, which is not what we want.  In any event, each page that passes
under the read head of generic_file_read goes to the hot end of the
delay queue, and from there behaves just like other kinds of pages.

Attention has to be paid to balancing the aggressiveness of readahead
against the refill_inactive scanning rate.  These move in opposite
directions in response to memory pressure.

One could argue that program text is inherently more valuable than
allocated data or file cache, in which case it may want its own inactive
list, so that we can reclaim program text vs other kinds of data at
different rates.  The relative rates could depend on the relative
instantiation rates (which include the faulting rate and the file IO
cache page creation rate).  However, I'd like to see how well the crude
presumption of equality works, and besides, it's less work that way.
(So ignore this paragraph, please.)

As far as zones go, the route of least resistance is to make both the
delay queue and the readahead list per-zone, and since that means
they're also per-node, NUMA people should like it.

On the testing front, one useful cross-check is to determine whether hot
spots in code are correctly detected.  After running a while under mixed
program activity and file IO, we should see that the hot spots as
determined by a profiler (or cooked by a test program) have in fact
moved to the active list, while initialization code has been evicted.

All of the above is O(1).

--
Daniel
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/
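
The page lifecycle described in this message (delay queue, then
inactive list, then rescue to the active list on a second reference)
can be illustrated with a toy simulation.  This is only a sketch in
Python, not kernel code: the names ToyLRU, instantiate and touch are
hypothetical, and sizing the delay queue by a fixed page count
(delay_len) is an assumption, since the message does not say how the
delay queue is aged.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # compare pages by identity, like distinct page structs
class Page:
    name: str
    referenced: bool = False
    where: str = "none"  # "delay", "inactive" or "active"


class ToyLRU:
    """Toy model only; list names follow the message, the rest is assumed."""

    def __init__(self, delay_len=2):
        self.delay_len = delay_len  # assumption: fixed-length delay queue
        self.delay = deque()        # right side is the hot end
        self.inactive = deque()
        self.active = deque()

    def instantiate(self, name):
        # A new page goes onto the delay queue immediately.
        page = Page(name, where="delay")
        self.delay.append(page)
        self._age_delay_queue()
        return page

    def touch(self, page):
        if page.where == "delay":
            # All references while on the delay queue count as one:
            # the bit is set here but cleared unconditionally on exit.
            page.referenced = True
        elif page.where == "inactive":
            # A second reference rescues the page to the hot end of the
            # active list.  (deque.remove is O(n); a kernel list_del on a
            # linked list would be O(1).)
            self.inactive.remove(page)
            page.where = "active"
            self.active.append(page)
        elif page.where == "active":
            page.referenced = True

    def _age_delay_queue(self):
        # Pages leaving the cold end lose their referenced bit and move
        # to the inactive list.
        while len(self.delay) > self.delay_len:
            page = self.delay.popleft()
            page.referenced = False
            page.where = "inactive"
            self.inactive.append(page)


lru = ToyLRU(delay_len=2)
streamed = lru.instantiate("streamed")  # used once, never touched again
reused = lru.instantiate("reused")
lru.touch(reused)                       # folded into the single reference
lru.instantiate("filler1")              # pushes "streamed" to inactive
lru.instantiate("filler2")              # pushes "reused" to inactive
lru.touch(reused)                       # second reference: rescued
print(streamed.where, reused.where)
```

In this run the page that is touched again after leaving the delay
queue ends up on the active list, while the streamed page stays on the
inactive list where it can be reclaimed first.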