* [RFC] start_aggressive_readahead
@ 2002-07-25 16:10 Christoph Hellwig
2002-07-25 16:44 ` Rik van Riel
2002-07-26 6:53 ` Daniel Phillips
0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2002-07-25 16:10 UTC (permalink / raw)
To: torvalds; +Cc: linux-mm
Here is another patch from the XFS tree; I'd be happy to get some
comments on this one as well.

This function (start_aggressive_readahead()) checks whether all zones
of the given gfp mask have lots of free pages. XFS needs this for its
own readahead code (used only deep in the directory code; normal file
readahead is handled by the generic pagecache code). We perform the
readahead only if it returns 1, i.e. there are enough free pages.

We could rip it out of XFS entirely without loss of functionality, but
it would cost directory handling performance.

I'm also open to a better name (I think the current one is very bad,
but I don't have a better idea :)). I'd also be interested in comments
on how to avoid the new function and use existing functionality
instead, but I've searched for a long time and didn't find anything
suitable.
--
The US Army issues lap-top computers now to squad-leaders on up. [...]
Believe me, there is nothing more lethal than a Power Point briefing
given by an Army person. -- Leon A. Goldstein
--- linux/include/linux/mm.h Wed, 29 May 2002 14:00:22
+++ linux/include/linux/mm.h Mon, 22 Jul 2002 12:06:09
@@ -460,6 +460,8 @@ extern void FASTCALL(free_pages(unsigned
#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr),0)
+extern int start_aggressive_readahead(int);
+
extern void show_free_areas(void);
extern void show_free_areas_node(pg_data_t *pgdat);
--- linux/kernel/ksyms.c Wed, 17 Jul 2002 12:08:06
+++ linux/kernel/ksyms.c Mon, 22 Jul 2002 12:06:09
@@ -90,6 +90,7 @@ EXPORT_SYMBOL(exit_fs);
EXPORT_SYMBOL(exit_sighand);
/* internal kernel memory management */
+EXPORT_SYMBOL(start_aggressive_readahead);
EXPORT_SYMBOL(_alloc_pages);
EXPORT_SYMBOL(__alloc_pages);
EXPORT_SYMBOL(alloc_pages_node);
--- linux/mm/page_alloc.c Tue, 25 Jun 2002 10:15:12
+++ linux/mm/page_alloc.c Mon, 22 Jul 2002 12:06:09
@@ -512,6 +512,37 @@ unsigned int nr_free_highpages (void)
#define K(x) ((x) << (PAGE_SHIFT-10))
/*
+ * If it returns non-zero it means there's lots of RAM "free"
+ * (note: not in cache!), so any caller will know that
+ * it can allocate some memory to do some more aggressive
+ * (possibly wasteful) readahead. The state of the memory
+ * should be rechecked after every few pages are allocated
+ * for this aggressive readahead.
+ *
+ * NOTE: caller passes in gfp_mask of zones to check
+ */
+int start_aggressive_readahead(int gfp_mask)
+{
+ pg_data_t *pgdat = pgdat_list;
+ zonelist_t *zonelist;
+ zone_t **zonep, *zone;
+ int ret = 0;
+
+ do {
+ zonelist = pgdat->node_zonelists + (gfp_mask & GFP_ZONEMASK);
+ zonep = zonelist->zones;
+
+ for (zone = *zonep++; zone; zone = *zonep++)
+ if (zone->free_pages > zone->pages_high * 2)
+ ret = 1;
+
+ pgdat = pgdat->node_next;
+ } while (pgdat);
+
+ return ret;
+}
+
+/*
* Show free area list (used inside shift_scroll-lock stuff)
* We also calculate the percentage fragmentation. We do this by counting the
* memory on each free list with the exception of the first item on the list.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
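The per-zone test in the patch above can be modeled in a few lines of
userspace C. This is a simplified sketch (the struct and function names
are made up for illustration; the real function walks every node's
zonelist for the given gfp_mask, not a flat array):

```c
#include <assert.h>

/* Simplified model of the patch's per-zone test: a zone counts as
 * having "lots of RAM free" when its free page count exceeds twice
 * the pages_high watermark. */
struct zone_model {
    unsigned long free_pages;
    unsigned long pages_high;
};

/* Note: like the patch, this sets ret = 1 as soon as ANY zone
 * qualifies -- it does not require every zone to qualify. */
static int start_aggressive_readahead_model(const struct zone_model *zones, int n)
{
    int i, ret = 0;

    for (i = 0; i < n; i++)
        if (zones[i].free_pages > zones[i].pages_high * 2)
            ret = 1;
    return ret;
}
```

A caller doing speculative directory readahead would, as the patch
comment says, recheck this every few allocated pages rather than
trusting one snapshot.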
* Re: [RFC] start_aggressive_readahead
From: Rik van Riel @ 2002-07-25 16:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: torvalds, linux-mm

On Thu, 25 Jul 2002, Christoph Hellwig wrote:

> This function (start_aggressive_readahead()) checks whether all zones
> of the given gfp mask have lots of free pages.

Seems a bit silly since ideally we wouldn't reclaim cache memory
until we're low on physical memory.

regards,

Rik
--
http://www.linuxsymposium.org/2002/
"You're one of those condescending OLS attendants"
"Here's a nickle kid. Go buy yourself a real t-shirt"
http://www.surriel.com/		http://distro.conectiva.com/
* Re: [RFC] start_aggressive_readahead
From: Andrew Morton @ 2002-07-25 19:40 UTC (permalink / raw)
To: Rik van Riel; +Cc: Christoph Hellwig, torvalds, linux-mm

Rik van Riel wrote:
>
> On Thu, 25 Jul 2002, Christoph Hellwig wrote:
>
> > This function (start_aggressive_readahead()) checks whether all zones
> > of the given gfp mask have lots of free pages.
>
> Seems a bit silly since ideally we wouldn't reclaim cache memory
> until we're low on physical memory.

Yes, I would question its worth also.

What it boils down to is: which pages are we, in the immediate future,
more likely to use?  Pages which are at the tail of the inactive list,
or pages which are in the file's readahead window?

I'd say the latter, so readahead should just go and do reclaim.
* Re: [RFC] start_aggressive_readahead
From: Scott Kaplan @ 2002-07-26 16:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, Christoph Hellwig, torvalds, linux-mm

On Thursday, July 25, 2002, at 03:40 PM, Andrew Morton wrote:

> What it boils down to is: which pages are we, in the immediate future,
> more likely to use?  Pages which are at the tail of the inactive list,
> or pages which are in the file's readahead window?

That is the right question to ask...

> I'd say the latter, so readahead should just go and do reclaim.

...but the answer's not that simple, I'm afraid.  You've got two groups
of logical pages competing for physical page frames.  Which is more
valuable depends entirely on the reference behavior of the workload.
I'll point you to a recent paper of mine on exactly this problem (in
two formats):

http://www.cs.amherst.edu/~sfkaplan/papers/prepaging.pdf
http://www.cs.amherst.edu/~sfkaplan/papers/prepaging.ps.bz2

The results presented are from uniprogrammed reference traces, but the
principle still applies: for some reference patterns, caching some
number of readahead pages is a great idea.  For other reference
patterns, the pages at the tail end of the inactive list are *still*
more valuable, and the readahead pages should be completely ignored.
There's also a lot of space in the middle: readahead pages should be
cached, but only for a limited time, lest they displace too many pages
at the tail end of the inactive list.

What you really want is some kind of adaptivity that allows you to
compare the rates at which these two pools of pages are referenced,
and then decides how many readahead pages (if any) to cache.

Scott
* Re: [RFC] start_aggressive_readahead
From: Andrew Morton @ 2002-07-26 19:38 UTC (permalink / raw)
To: Scott Kaplan; +Cc: Rik van Riel, Christoph Hellwig, torvalds, linux-mm

Scott Kaplan wrote:
>
> ..
> > What it boils down to is: which pages are we, in the immediate future,
> > more likely to use?  Pages which are at the tail of the inactive list,
> > or pages which are in the file's readahead window?
>
> That is the right question to ask...
>
> > I'd say the latter, so readahead should just go and do reclaim.
>
> ...but the answer's not that simple, I'm afraid.  You've got two groups
> of logical pages competing for physical page frames.  Which is more
> valuable depends entirely on the reference behavior of the workload.
> I'll point you to a recent paper of mine on exactly this problem:
>
> http://www.cs.amherst.edu/~sfkaplan/papers/prepaging.pdf

readahead was rewritten for 2.5.  I think it covers most of the things
you discuss there.

- It adaptively grows the window size in response to "hits": each time
  userspace requests a page, and that page is found to be inside the
  previously-requested readahead window, we grow the window by 2 pages
  (up to a configurable limit) because readahead is being beneficial.

- It shrinks the window size in response to "misses": if userspace
  requests a page which is *not* inside the previously-requested
  window, the future window size is shrunk by 25%.

- It detects eviction: if userspace requests a page which *should* have
  been inside the readahead window, but it's actually not there, then
  we know it was evicted prior to being used.  We shrink the window by
  3 pages.  (This almost never happens, in my testing.)

- It behaves differently for page faults: for read(2), readahead is
  strictly ahead of the requested page.  For mmap pagefaults, the
  readaround window is positioned 25% behind the requested page and
  75% ahead of it.

All these numbers were engineered by the time-honoured practice of
guess-and-giggle.

On IDE disks, you can fiddle extensively with readahead and make
virtually no difference at all, because the disk does it as well.  On
older SCSI disks, readahead makes a lot of difference.  Because,
presumably, the disk isn't being as smart.  To some extent, this
device-level caching makes the whole readahead thing of historical
interest only, I suspect.

- For CPU efficiency against an already-fully-cached file: if readahead
  finds that all pages inside a readahead request are already in core,
  it shrinks the readahead window by a page, and ultimately turns
  readahead off completely.  It is resumed when there is a miss.

- We no longer put readahead pages on the active list.  They are placed
  on the head of the inactive list.  If nobody subsequently uses the
  page, it proceeds to the tail of the inactive list and is evicted.

Sort of.  This code needs some checking.  When the readahead page is
accessed, we set PageReferenced and leave it on the inactive list.  It
will still be evicted when it reaches the tail of the inactive list.
It will only be moved to the active list if it is referenced (faulted
in or read() from) a second time.  I guess this is the "use-once"
feature, and it is designed to detect the common case of a once-off
streaming read.

I'd be interested in your assessment of the 2.5 readahead/readaround
implementation.

It still has one nasty problem, which is not VM-related.  It is to do
with the interaction with request merging.  When performing streaming
reads from two large files, we tend to seek between the two files at
the readahead window size granularity.  But we *should* be alternating
between the files at a coarser granularity: the request queue's read
latency.  2.4 does this - somehow it manages to get its new readahead
requests merged with its old ones, so this has the effect of
"capturing" the disk head until the request latency of a request from
the other file expires.

I still need to get down and fix this - it's a very subtle interaction
between readahead and request queueing and I suspect it'll need to be
formalised in some manner, rather than just fiddling the code so it
happens to work out right.
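The window adaptation Andrew describes (grow by 2 pages on a hit,
shrink by 25% on a miss, shrink by 3 pages on a detected eviction,
clamped to a minimum and a configurable maximum) can be sketched in a
few helper functions.  This is an illustrative standalone model, not
the actual 2.5 kernel code; the constants and names are assumptions
based only on the description above:

```c
#include <assert.h>

#define RA_MIN_PAGES  4
#define RA_MAX_PAGES 32   /* 128k default max with 4k pages */

/* Keep the window inside [RA_MIN_PAGES, RA_MAX_PAGES]. */
static long ra_clamp(long w)
{
    if (w < RA_MIN_PAGES)
        return RA_MIN_PAGES;
    if (w > RA_MAX_PAGES)
        return RA_MAX_PAGES;
    return w;
}

/* Requested page was inside the window: grow by 2 pages. */
static long ra_hit(long w)      { return ra_clamp(w + 2); }

/* Requested page was outside the window: shrink by 25%. */
static long ra_miss(long w)     { return ra_clamp(w - w / 4); }

/* Readahead page was evicted before use: shrink by 3 pages. */
static long ra_evicted(long w)  { return ra_clamp(w - 3); }
```

With these rules the window ratchets up under sequential access and
decays quickly once readahead stops paying off, which is the
stabilising behaviour Andrew is hoping for.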
* Re: [RFC] start_aggressive_readahead
From: Scott Kaplan @ 2002-07-28 23:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, Christoph Hellwig, torvalds, linux-mm

On Friday, July 26, 2002, at 03:38 PM, Andrew Morton wrote:

> readahead was rewritten for 2.5.

It is just darned difficult to keep up with all of the changes!

> I think it covers most of the things you discuss there.
>
> - It adaptively grows the window size in response to "hits"

Seems somewhat reasonable, although easy to be fooled.  If I reference
some of the most recently read-ahead blocks, I'll grow the read-ahead
window, keeping other unreferenced read-ahead blocks for longer, even
though there's no evidence that keeping them longer will result in more
hits.  In other words, it's not hits that should necessarily make you
grow the cache -- it's evidence that there will be an *increase* in
hits if you do.

> - It shrinks the window size in response to "misses": if userspace
>   requests a page which is *not* inside the previously-requested
>   window, the future window size is shrunk by 25%.

This one seems weird.  If I reference a page that could have been in a
larger read-ahead window, shouldn't I make the window *larger* so that
next time, it *will* be in the window?

> - It detects eviction: if userspace requests a page which *should*
>   have been inside the readahead window, but it's actually not there,
>   then we know it was evicted prior to being used.  We shrink the
>   window by 3 pages.  (This almost never happens, in my testing.)

Again, this seems backwards in the manner mentioned above.  It could
have been resident, but it was evicted, so if you want it to be a hit,
make the window *bigger*, no?

What should drive the reduction in the read-ahead window is the
observation that recent increases have not yielded higher hit rates --
more has not been better.

> - It behaves differently for page faults: for read(2), readahead is
>   strictly ahead of the requested page.  For mmap pagefaults, the
>   readaround window is positioned 25% behind the requested page and
>   75% ahead of it.

That seems sensible enough...

The entire adaptive mechanism you've described seems only to consider
one of the two competing pools, though, namely the read-ahead pool of
pages.  What about its competition -- the references to pages that are
near eviction at the end of the inactive list?  Adapting to one without
consideration of the other is working half-blind.  Why would you ever
want to shrink the read-ahead window if very, very few pages at the end
of the inactive list are being hit?  Similarly, you would want to be
very cautious about increasing the size of the read-ahead window if
many pages at the end of the inactive list are being re-used.

> To some extent, this device-level caching makes the whole readahead
> thing of historical interest only, I suspect.

To some extent, yes, but the scales are substantially different.  If
your disk has just a few MB of cache, but your RAM is hundreds of MB
(or larger), the VM system can choose to cache read-ahead pages for
much, much longer if it detects that it's of greater benefit than
caching very old, used pages.

> - We no longer put readahead pages on the active list.  They are
>   placed on the head of the inactive list.  If nobody subsequently
>   uses the page, it proceeds to the tail of the inactive list and is
>   evicted.

This seems a wise move, as placing them in the active list is only
going to be beneficial in some very unusual cases.  Still, the question
does remain as to *how long* a read-ahead page should be left unused
before it is prepared for eviction.

I'll admit that it's not necessarily clear how to do the cost/benefit
adaptivity that I'm describing.  I'm working on that right now, which
is why I'm suddenly so curious about the details of this VM and how to
play with it.

All in all, it sounds like you've made good changes, but perhaps you
can address the weaknesses that I've pointed out (or tell me why I'm
wrong about them).

Scott
* Re: [RFC] start_aggressive_readahead
From: Rik van Riel @ 2002-07-29 0:19 UTC (permalink / raw)
To: Scott Kaplan; +Cc: Andrew Morton, Christoph Hellwig, torvalds, linux-mm

On Sun, 28 Jul 2002, Scott Kaplan wrote:

> > - We no longer put readahead pages on the active list.  They are
> >   placed on the head of the inactive list.  If nobody subsequently
> >   uses the page, it proceeds to the tail of the inactive list and is
> >   evicted.
>
> This seems a wise move, as placing them in the active list is only
> going to be beneficial in some very unusual cases.

I'm not sure about that.  If we do linear IO we most likely want to
evict the pages we've already used as opposed to the pages we're about
to use.

This means that (1) we want to clear the accessed bit of the pages
we've already read, moving them to the inactive list if needed, and
(2) we'll want to keep the about-to-be-used pages separate from the
already-used pages.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
* Re: [RFC] start_aggressive_readahead
From: Scott Kaplan @ 2002-07-29 2:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, Christoph Hellwig, torvalds, linux-mm

On Sunday, July 28, 2002, at 08:19 PM, Rik van Riel wrote:

> I'm not sure about that.  If we do linear IO we most likely
> want to evict the pages we've already used as opposed to the
> pages we're about to use.

The situation is more subtle than that.  I agree that in a linear I/O
case, the read-ahead pages are extremely likely to be used very soon.
However, that does *not* imply that they should be promoted to the
active list -- in fact, quite the opposite when considering the
read-ahead situation.

Consider exactly the case you have raised -- strict, linear referencing
of blocks, such as a sequential file read.  When block `i' is
referenced, it is an excellent prediction that block `i+1' will be
referenced soon.  If block `i+1' is not referenced soon, then the
prediction was incorrect, *and there's little reason to keep the block
around any longer*.

In other words, the better the prediction, the closer to the end of the
LRU ordering the blocks can be placed.  The ones that *are* used soon
will be referenced and promoted to the front of the LRU ordering before
they are evicted, exactly because the soonness of use is so strong.
The read-ahead blocks that are not used soon are evicted before long.

In other words, the shorter a time you think you need to keep a block,
the closer to the end of the list it should go.  If your guess is
wrong, you've displaced fewer other blocks.  If your prediction is a
good one, such as with linear file reading, you will not need to cache
a block as a read-ahead block for long before it is actually used.

It is when you predict that a read-ahead will not pay off for some time
-- that the read-ahead blocks will not be used so soon -- that such
blocks need to be placed closer to the front of the LRU ordering (that
is, in the active list).  That way, they will be cached much longer so
that they will still be resident when they finally are used.  Of
course, such caching displaces more of the other pages, possibly
causing faults on those.  It is when your read-ahead prediction
indicates a weak soonness of use that you must compare the benefits of
caching those pages against the cost of displacing other pages.  Only
if few pages near the end of the LRU ordering -- non-read-ahead pages
-- are being referenced might it be worth caching read-ahead pages for
so long.

So, in the case of linear I/O, placing the read-ahead pages at the
front of the inactive list is likely to provide more than enough time
for those pages to be used and promoted to the active list.  By placing
them in the inactive list, you reduce the damage done when read-ahead
pages are *not* used soon.

Scott
* Re: [RFC] start_aggressive_readahead
From: Rik van Riel @ 2002-07-29 3:05 UTC (permalink / raw)
To: Scott Kaplan; +Cc: Andrew Morton, Christoph Hellwig, torvalds, linux-mm

On Sun, 28 Jul 2002, Scott Kaplan wrote:
> On Sunday, July 28, 2002, at 08:19 PM, Rik van Riel wrote:
>
> > I'm not sure about that.  If we do linear IO we most likely
> > want to evict the pages we've already used as opposed to the
> > pages we're about to use.
>
> The situation is more subtle than that.

> Consider exactly the case you have raised -- strict, linear referencing
> of blocks, such as a sequential file read.  When block `i' is
> referenced, it is an excellent prediction that block `i+1' will be
> referenced soon.  If block `i+1' is not referenced soon, then the
> prediction was incorrect, *and there's little reason to keep the block
> around any longer*.

My experience with 300 ftp clients pulling a collective 40 Mbit/s
suggests otherwise.

About 70% of the clients were on modem speed and the other 30% of
the clients were on widely variable higher speeds.

Since a disk seek + read is about 10ms, the absolute maximum number of
seeks that can be done is 100 a second and the minimum amount of time
between disk seeks for one stream should be about 3 seconds.

In reality the situation is worse because of the large speed difference
between the disk seeks and the fact that we want a reasonably low
latency for disk IO for the other tasks in the system.

This would put the conservative minimum time we should keep readahead
data in RAM at something like 10 seconds, to account for the speed
differences of fast and slow data streams and to not completely bog
down the IO subsystem with requests.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
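Rik's back-of-the-envelope numbers can be checked mechanically.  The
sketch below uses only the figures from his mail (300 streams, ~10ms
per seek + read); the function name is made up for illustration:

```c
#include <assert.h>

/* With seek + read costing ~10ms, a disk manages at most
 * 1000 / 10 = 100 seeks per second.  Spread across N concurrent
 * streams, each stream is serviced at most once every N / 100
 * seconds -- the minimum time its readahead data must survive in
 * RAM to be of any use. */
static int min_stream_interval_secs(int streams, int seek_ms)
{
    int seeks_per_sec = 1000 / seek_ms;
    return streams / seeks_per_sec;
}
```

For 300 streams at 10ms this gives the 3 seconds Rik quotes; his
conservative 10-second figure then adds headroom for slow clients and
for latency owed to other tasks.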
* Re: [RFC] start_aggressive_readahead
From: Scott Kaplan @ 2002-07-29 15:24 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, Christoph Hellwig, torvalds, linux-mm

On Sunday, July 28, 2002, at 11:05 PM, Rik van Riel wrote:

> My experience with 300 ftp clients pulling a collective 40 Mbit/s
> suggests otherwise.
>
> About 70% of the clients were on modem speed and the other 30% of
> the clients were on widely variable higher speeds.
>
> Since a disk seek + read is about 10ms, the absolute maximum
> number of seeks that can be done is 100 a second and the minimum
> amount of time between disk seeks for one stream should be about
> 3 seconds.

This is a very interesting example of some real (and important)
reference behavior that must be understood to be handled well.  In the
context of this thread of discussion, this case is substantially
different from your original comment on read-ahead for ``linear file
I/O''.

Just as a refresher for myself and anyone else that needs it: I claimed
that linear file I/O was a case in which read-ahead blocks should not
be cached for long before they would either be used or evicted from
lack of use.  (That is, they should be placed nearer to the end of the
LRU ordering.)  The claim was based on the observation that sequential
file traversal is a very good case for read-ahead, where the read-ahead
blocks are very likely to be used very soon.

What's important about this example is that, due to the whole system
workload and the disparate connection speeds of the ftp clients, it is
*NOT* a typical case of linear file I/O.  In fact, what's odd about it
is that block `i' of a file will be read, and for slower connections,
block `i+1' will *not* be used for some time, since reading block `i'
will take a while.

In other words, the interleaved reference behavior from all of these
ftp downloads makes the prediction that block `i+1' will be used soon a
weaker prediction.  It is very likely to be used, yes, but not so soon
in many cases due to the other files being read and referenced.
Because the soonness of use is weak, we do indeed want to cache the
read-ahead pages for longer.  (That is, I agree that for this example,
read-ahead pages should go into the active list.)

Caching read-ahead pages for longer, though, displaces more used pages,
forcing them to be evicted sooner than they would have been without the
aggressive read-ahead caching.  Critically, for *this* workload, that's
probably just fine.  Assuming that different files are being downloaded
by different ftp clients, after reading and referencing a block, it's
probably worth little to cache it for very long in case of re-use.  In
other words, among the referenced pages, those near the end of the LRU
ordering are referenced rarely.  The competition between read-ahead
pages and less recently used referenced pages is lopsided in favor of
the read-ahead pages.  But that is only a consequence of the reference
pattern for *this specific workload* -- it may not be true for other
workloads.

Incidentally, this is all just mental masturbation until someone
actually records and measures the reference behavior from this kind of
workload.  It all sounds about right, but that's neither good science
nor good engineering.

In short, I agree that for this case, inserting read-ahead pages into
the inactive list may not be aggressive enough.  I disagree that the
reason is ``linear file I/O'', as the reference pattern here is more
complex than that.

This is also a wonderful case for getting read-ahead caching adaptivity
right: a system that can weigh read-ahead caching allocations against
less recently used referenced-page allocations will detect and adjust
to this case quickly, while avoiding such aggressive read-ahead caching
for other workloads.

Scott
* Re: [RFC] start_aggressive_readahead 2002-07-28 23:32 ` Scott Kaplan 2002-07-29 0:19 ` Rik van Riel @ 2002-07-29 7:34 ` Andrew Morton 2002-07-29 7:37 ` Vladimir Dergachev ` (2 more replies) 1 sibling, 3 replies; 23+ messages in thread From: Andrew Morton @ 2002-07-29 7:34 UTC (permalink / raw) To: Scott Kaplan; +Cc: Rik van Riel, Christoph Hellwig, linux-mm [ snipped poor old Linus. he doesn't read 'em anyway ] Scott Kaplan wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Friday, July 26, 2002, at 03:38 PM, Andrew Morton wrote: > > > readahead was rewritten for 2.5. > > It is just darned difficult to keep up with all of the changes! > > > I think it covers most of the things you discuss there. > > > > - It adaptively grows the window size in response to "hits" > > Seems somewhat reasonable, although easy to be fooled. If I reference > some of the most recently read-ahead blocks, I'll grow the read-ahead > window, keeping other unreference, read-ahead blocks for longer, even > though there's no evidence that keeping them longer will result in more > hits. In other words, it's not hits that should necessarily make you grow > the cache -- it's the evidence that there will be an *increase* in hits if > you do. Ah, but if we're not getting hits in the readahead window then we're getting misses. And misses shrink the window. Add two pages for a hit, remove 25% for a miss. The window size should stabilise at a size which is larger if readahead is being useful. I hope. > > - It shrinks the window size in response to "misses" - if > > userspace requests a page which is *not* inside the previously-requested > > window, the future window size is shrunk by 25% > > This one seems wierd. If I reference a page that could have been in a > larger read-ahead window, shouldn't I make the window *larger* so that > next time, it *will* be in the window? That's true. 
If the application is walking across a file touching every fifth page, readahead will stabilise at its minimum window size, which is less than five pages and we lose bigtime. I'm not sure how to fix that while retaining some sanity in the code. > > - It detects eviction: if userspace requests a page which *should* > > have been inside the readahead window, but it's actually not there, > > then we know it was evicted prior to being used. We shrink the > > window by 3 pages. (This almost never happens, in my testing). > > Again, this seems backwards in the manner mentioned above. It could have > been resident, but it was evicted, so if you want it to be a hit, make the > window *bigger*, no? What should drive the reduction in the read-ahead > window is the observation that recent increases have not yielding higher > hit rates -- more has not been better. That's the thrashing situation which Rik mentioned. The application must be reading the file very slowly. We try to reduce the window size to a point at which all the slow readers in the system stabilise and stop thrashing each other's readahead. This works up to a point - I had a little artificial test - just a process which opens a great number of files and reads a page from each one, cycling around. The current code reduces the onset of thrashing in that test, and reduces its severity. It's significantly better than the old code. But there is still a dramatic dropoff in throughput once it happens. > > - It behaves differently for page faults: for read(2), readahead is > > strictly ahead of the requested page. For mmap pagefaults, > > the readaround window is positioned 25% behind the requested page and > > 75% ahead of it. > > That seems sensible enough... > > The entire adaptive mechanism you've described seems only to consider one > of the two competing pools, though, namely the read-ahead pool of pages. 
> What about its competition -- the references to pages that are near
> eviction at the end of the inactive list?  Adapting to one without
> consideration of the other is working half-blind.  Why would you ever
> want to shrink the read-ahead window if very, very few pages at the
> end of the inactive list are being hit?

hmm.  The default max window size is 128 kbytes at present.  For some
but not many tests, increasing it does help.  But mainly because of the
merging artifact which I mentioned earlier.

> Similarly, you would want to be very
> cautious about increasing the size of the read-ahead window if many
> pages at the end of the inactive list are being re-used.

I tend to think that if pages at the tail of the LRU are being
referenced with any frequency we've goofed anyway.  There are many
things apart from readahead which will allocate pages, yes?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org.  For more info on Linux MM, see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 23+ messages in thread
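[Editor's note: the controller Andrew describes (add two pages on a hit,
shrink 25% on a miss, drop three pages on a detected eviction) amounts to a
small additive-increase/multiplicative-decrease loop.  A userspace sketch
follows; the constants and names (RA_MIN_PAGES, RA_MAX_PAGES, ra_adjust)
are invented for illustration and are not the actual 2.5 mm/readahead.c
code.]

```c
#include <assert.h>

/* Illustrative limits; invented names, not the real 2.5 constants. */
#define RA_MIN_PAGES	4
#define RA_MAX_PAGES	32	/* 128k with 4k pages */

/*
 * One adjustment step of the controller described above: additive
 * increase on a hit, 25% multiplicative decrease on a miss, and a
 * flat 3-page penalty when a readahead page was evicted before use.
 */
static int ra_adjust(int window, int hit, int evicted)
{
	if (evicted)
		window -= 3;		/* read ahead, then reclaimed unused */
	else if (hit)
		window += 2;		/* request fell inside the window */
	else
		window -= window / 4;	/* miss: shrink by 25% */

	if (window < RA_MIN_PAGES)
		window = RA_MIN_PAGES;
	if (window > RA_MAX_PAGES)
		window = RA_MAX_PAGES;
	return window;
}
```

Under this rule a steady sequential reader climbs to the maximum window,
while a random reader collapses to the minimum, which is exactly the
stride-of-five failure mode discussed in this thread.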
* Re: [RFC] start_aggressive_readahead
  2002-07-29  7:34 ` Andrew Morton
@ 2002-07-29  7:37   ` Vladimir Dergachev
  2002-07-29  7:53     ` Andrew Morton
  2002-07-29  8:04   ` Rik van Riel
  2002-07-30 16:11   ` Scott Kaplan
  2 siblings, 1 reply; 23+ messages in thread
From: Vladimir Dergachev @ 2002-07-29 7:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm

> > > - It shrinks the window size in response to "misses" - if userspace
> > >   requests a page which is *not* inside the previously-requested
> > >   window, the future window size is shrunk by 25%
> >
> > This one seems weird.  If I reference a page that could have been in
> > a larger read-ahead window, shouldn't I make the window *larger* so
> > that next time, it *will* be in the window?
>
> That's true.  If the application is walking across a file
> touching every fifth page, readahead will stabilise at
> its minimum window size, which is less than five pages and
> we lose bigtime.  I'm not sure how to fix that while retaining
> some sanity in the code.

I am curious: which applications do you know of that actually do this ?

What about growing the window even if there is a miss, as long as
misses are sequential and not further than a fixed amount from the
window ?

                         Vladimir Dergachev
* Re: [RFC] start_aggressive_readahead
  2002-07-29  7:37 ` Vladimir Dergachev
@ 2002-07-29  7:53   ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2002-07-29 7:53 UTC (permalink / raw)
To: Vladimir Dergachev; +Cc: linux-mm

Vladimir Dergachev wrote:
>
> > > > - It shrinks the window size in response to "misses" - if
> > > >   userspace requests a page which is *not* inside the
> > > >   previously-requested window, the future window size is shrunk
> > > >   by 25%
> > >
> > > This one seems weird.  If I reference a page that could have been
> > > in a larger read-ahead window, shouldn't I make the window *larger*
> > > so that next time, it *will* be in the window?
> >
> > That's true.  If the application is walking across a file
> > touching every fifth page, readahead will stabilise at
> > its minimum window size, which is less than five pages and
> > we lose bigtime.  I'm not sure how to fix that while retaining
> > some sanity in the code.
>
> I am curious: which applications do you know of that actually do this ?

None.  Just a test program which I used for testing readahead!

> What about growing the window even if there is a miss as long as misses
> are sequential and not further than a fixed amount from the window ?

That would work.  If the window size is less than max, and the miss
occurred inside the max, increase the window to a size which would have
caught that page.  Or to the max.
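[Editor's note: the "near miss" rule Andrew agrees to here can be sketched
as a small helper.  The function name, signature, and the way the offsets
are measured are all invented for illustration.]

```c
#include <assert.h>

/*
 * Hypothetical helper for the idea above: if a miss at `offset` lands
 * beyond the current window but still within the maximum window
 * (measured from the window start), the access is nearly sequential,
 * so grow the window to the size that would have caught that page.
 * Otherwise leave the window alone.
 */
static long ra_window_for_near_miss(long window, long offset,
				    long window_start, long max)
{
	long distance = offset - window_start;

	if (window < max && distance >= window && distance < max)
		return distance + 1;	/* would have caught the page */
	return window;			/* hit, or a genuinely random miss */
}
```

With a 4-page window starting at page 100, a miss at page 110 would grow
the window to 11 pages, while a miss at page 200 (outside any plausible
maximum) would leave it untouched.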
* Re: [RFC] start_aggressive_readahead
  2002-07-29  7:34 ` Andrew Morton
  2002-07-29  7:37   ` Vladimir Dergachev
@ 2002-07-29  8:04 ` Rik van Riel
  2002-07-30 16:11 ` Scott Kaplan
  2 siblings, 0 replies; 23+ messages in thread
From: Rik van Riel @ 2002-07-29 8:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: Scott Kaplan, Christoph Hellwig, linux-mm

On Mon, 29 Jul 2002, Andrew Morton wrote:

> > Similarly, you would want to be very
> > cautious about increasing the size of the read-ahead window if many
> > pages at the end of the inactive list are being re-used.
>
> I tend to think that if pages at the tail of the LRU are being
> referenced with any frequency we've goofed anyway.  There are
> many things apart from readahead which will allocate pages, yes?

It would be a useful thing to measure, though.  We can use this
information to decide to:

1) reduce readahead and, if the situation continues
2) do load control

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
* Re: [RFC] start_aggressive_readahead
  2002-07-29  7:34 ` Andrew Morton
  2002-07-29  7:37   ` Vladimir Dergachev
  2002-07-29  8:04   ` Rik van Riel
@ 2002-07-30 16:11 ` Scott Kaplan
  2002-07-30 16:21   ` Martin J. Bligh
  2 siblings, 1 reply; 23+ messages in thread
From: Scott Kaplan @ 2002-07-30 16:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, Christoph Hellwig, linux-mm

On Monday, July 29, 2002, at 03:34 AM, Andrew Morton wrote:

> Scott Kaplan wrote:
>> In other words, it's not hits that should necessarily make you grow
>> the cache -- it's the evidence that there will be an *increase* in
>> hits if you do.
>
> Ah, but if we're not getting hits in the readahead window
> then we're getting misses.  And misses shrink the window.

Yes, and that's the wrong thing to do.  If you are getting hits, you
should try *shrinking* the window to see if there is a reduction in
hits.  If there is no reduction, you can capture just as many hits with
a smaller window -- the extra space was superfluous.  If you're getting
misses, you should try to *grow* the window (to commit an awful case of
verbing) in an attempt to turn such misses into hits.  If growing the
window doesn't decrease the misses, then you may need too large of an
increase to cache those pages successfully.  If growing the window does
decrease the misses, then keep growing until you don't see a decrease.

What I'm describing here has its own major pitfalls:

1) It considers only the read-ahead pool.  Shrinking or growing the
   window could also have an effect on the hits and misses to the used
   pool of pages.

2) You can get trapped in local minima.  Part of what makes memory
   allocation hard under any realistic on-line replacement policy is
   that changes in hits/misses are non-monotonic.
For example, if we are observing misses to evicted read-ahead pages and
try to grow the cache in response, we may not see any improvement unless
we grow the cache sufficiently, and then get diminishing returns if we
grow it beyond that point.  To avoid this kind of problem, you need more
than just hit and miss counts -- you need reference distributions.

> I tend to think that if pages at the tail of the LRU are being
> referenced with any frequency we've goofed anyway.

I disagree.  Referencing things at the tail of the LRU is the sign of
having done something *right*.  It means that for a workload with
substantial memory needs, the VM system is holding onto pages *just
long enough*, and no longer, to ensure that they are cached before
reuse.  It means that the workload is leaving some pages unused for
some time, but consistently revisiting those pages as part of a phase
change that is near the scale of the memory size.  It's a case where
LRU and its approximations perform about as well as possible.

Remember that the ordering of resident pages doesn't need to be very
exact.  A policy can have a completely goofed notion of which pages
will be used soon; if they're all resident, it doesn't matter that the
ordering among the resident pages was poor.  What counts is that they
were resident.  When you evict pages poorly, that's when the
mis-ordering is trouble: referencing pages that have just been
reclaimed is when we've really goofed.  Otherwise, it's fine.

This comment serves to highlight a point: memory pressure is not merely
defined by the amount of paging or the number of new page allocations.
It should also be defined by the number of references to pages that
*nearly* got evicted.  Those references represent behavior that is on
the scale of the memory size, where good and bad decisions make a
difference.  Therefore, those are events relevant to the VM and the
physical memory it is managing, and should contribute to the perception
that there is pressure on the memory resources.
Scott
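[Editor's note: the probe-and-revert search Scott sketches (shrink while
hits hold up, revert and grow when they drop) is a feedback loop rather
than a fixed rule.  A toy interpretation in C follows; the structure, the
one-page step size, and all names are invented, and a real implementation
would damp the oscillation and bound the window.]

```c
#include <assert.h>

struct ra_probe {
	int window;		/* current window size, in pages */
	int last_window;	/* size before the most recent change */
	double last_hit_rate;	/* hit rate observed at that size */
};

/*
 * One probing epoch: if the hit rate survived the last perturbation,
 * keep probing downward (a smaller window that captures the same hits
 * is pure savings); if the hit rate dropped, step back past the
 * previous size and probe upward instead.
 */
static void ra_probe_step(struct ra_probe *p, double hit_rate)
{
	int next;

	if (hit_rate >= p->last_hit_rate)
		next = p->window - 1;		/* shrinking was safe */
	else
		next = p->last_window + 1;	/* revert and grow */

	if (next < 0)
		next = 0;
	p->last_window = p->window;
	p->last_hit_rate = hit_rate;
	p->window = next;
}
```

Starting at 8 pages with a 50% hit rate, an epoch with an unchanged hit
rate shrinks the window to 7; if the following epoch then shows the hit
rate falling, the probe jumps back up past the old size to 9.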
* Re: [RFC] start_aggressive_readahead
  2002-07-30 16:11 ` Scott Kaplan
@ 2002-07-30 16:21   ` Martin J. Bligh
  2002-07-30 16:38     ` Scott Kaplan
  2002-07-30 17:13     ` William Lee Irwin III
  0 siblings, 2 replies; 23+ messages in thread
From: Martin J. Bligh @ 2002-07-30 16:21 UTC (permalink / raw)
To: Scott Kaplan, Andrew Morton; +Cc: Rik van Riel, Christoph Hellwig, linux-mm

>> Ah, but if we're not getting hits in the readahead window
>> then we're getting misses.  And misses shrink the window.
>
> Yes, and that's the wrong thing to do.  If you are getting hits,
> you should try *shrinking* the window to see if there is a
> reduction in hits.  If there is no reduction, you can capture
> just as many hits with a smaller window -- the extra space was
> superfluous.  If you're getting misses, you should try to *grow*
> the window (to commit an awful case of verbing) in an attempt to
> turn such misses into hits.  If growing the window doesn't decrease
> the misses, then you may need too large of an increase to cache
> those pages successfully.  If growing the window does decrease
> the misses, then keep growing until you don't see a decrease.

Would it not be easier to actually calculate (statistically) the
read-ahead window, rather than actually tweaking it empirically?
If we're getting misses, there could be at least two causes -

1. We're doing random, not sequential IO.  Shrinking the window
   would be most sensible.
2. We're reading ahead really fast, or skip-reading ahead.  Growing
   the window would probably be most sensible.

Thus I'd contend that either growing or shrinking in straight
response to just a hit/miss rate is not correct.  We need to actually
look at the access pattern of the application, surely?  Perhaps I'm
being naive, but I would have thought it would be possible to
calculate what the hit/miss rate with a given readahead window would
be without actually going to the pain of shrinking it up and down.

M.
* Re: [RFC] start_aggressive_readahead
  2002-07-30 16:21 ` Martin J. Bligh
@ 2002-07-30 16:38   ` Scott Kaplan
  2002-07-30 16:52     ` Martin J. Bligh
  2002-07-30 17:13   ` William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Scott Kaplan @ 2002-07-30 16:38 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Rik van Riel, Christoph Hellwig, linux-mm

On Tuesday, July 30, 2002, at 12:21 PM, Martin J. Bligh wrote:

> Thus I'd contend that either growing or shrinking in straight
> response to just a hit/miss rate is not correct.  We need to actually
> look at the access pattern of the application, surely?

I agree.  I probably should have made it clear that what I was
suggesting wasn't the right way to go about it, but rather an argument
against the heuristics that seemed backwards to me.

The causes for misses aren't necessarily as clear cut as you mentioned,
as there are a lot of behaviors that are neither fully random nor fully
sequential.  So, while it is ideal to have some foresight before
resizing the window -- some calculation that determines whether or not
growth will help or shrinkage will hurt -- it will require the VM
system to gather hit distributions.  I'm trying to make that happen
right now, although for all VM pages, and not for the specific purpose
of read-ahead calculations.

However, the paper for which I gave a pointer (in a shameless act of
self promotion) proposes exactly that: keeping reference distributions
for read-ahead and non-read-ahead pages, and then balancing the two
against each other in an attempt to determine what the best read-ahead
window size would be given recent reference behavior.  There may be
simpler, kruftier, and/or more effective versions of what I proposed,
but what you said above is, I think, the right idea.
Scott
* Re: [RFC] start_aggressive_readahead
  2002-07-30 16:38 ` Scott Kaplan
@ 2002-07-30 16:52   ` Martin J. Bligh
  2002-08-05 18:54     ` Scott Kaplan
  0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2002-07-30 16:52 UTC (permalink / raw)
To: Scott Kaplan; +Cc: Andrew Morton, Rik van Riel, Christoph Hellwig, linux-mm

>> Thus I'd contend that either growing or shrinking in straight
>> response to just a hit/miss rate is not correct.  We need to actually
>> look at the access pattern of the application, surely?
>
> I agree.  I probably should have made it clear that what I was
> suggesting wasn't the right way to go about it, but rather an
> argument against the heuristics that seemed backwards to me.

Both sets of heuristics seem backwards to me, depending on the
circumstances ;-)

> The causes for misses aren't necessarily as clear cut as you
> mentioned, as there are a lot of behaviors that are neither
> fully random nor fully sequential.

Indeed.  Sorry - all I was trying to point out was that if there exist
two identical sets of input data that can lead to two different correct
sets of output data, the calculation you're doing is insufficient.  Of
course, there are many more than two circumstances.

> So, while it is ideal to have some foresight before resizing the
> window -- some calculation that determines whether or not growth
> will help or shrinkage will hurt -- it will require the VM system
> to gather hit distributions.

Yup, but I think it's almost certainly worth that expense.

> However, the paper for which I gave a pointer (in a shameless act
> of self promotion) proposes exactly that: keeping reference

I should read that ;-)  We seem to be mostly in violent agreement ...
How you actually calculate the window is a matter for debate and
experimentation, but just growing and shrinking based on purely the
hit rate seems like a bad idea.

M.
* Re: [RFC] start_aggressive_readahead
  2002-07-30 16:52 ` Martin J. Bligh
@ 2002-08-05 18:54   ` Scott Kaplan
  0 siblings, 0 replies; 23+ messages in thread
From: Scott Kaplan @ 2002-08-05 18:54 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Rik van Riel, Christoph Hellwig, linux-mm

Martin,

Sorry for the slowness of the response, but just a thought or two...

> Both sets of heuristics seem backwards to me, depending on the
> circumstances ;-)

I don't agree, but more on that in a moment.  First, I'd like to point
out a minor difference between what I meant by my suggestion and your
interpretation of it.  The heuristic that I was suggesting -- grow in
response to read-ahead misses, shrink in response to hits -- was not
intended as a mere replacement.  It was meant as a ``blind'' approach
to discovering the reference distribution for read-ahead pages.  So,
the heuristic wouldn't be used simply as stated; instead, it would be
a first approach to changing the read-ahead window size until evidence
was gathered to make higher-level decisions.

For example, the VM system could shrink the window in response to
hits, but if that shrinking decreased the hit count ``significantly'',
it would return to the smallest window size that did not cause a hit
decrease.  Similarly, the VM system could increase the window size in
response to misses, but after reaching some limit of increase where
the misses do not decrease ``sufficiently'', it could return the
window to the smallest size at which miss decrease was observed.

Now back to my claim that the heuristic that I suggested is not just
the flip side of the original heuristics, where both are roughly
equivalent, and the success of one or the other is just a matter of
the reference behavior.
Assuming that an LRU-like replacement strategy is in place -- and I
believe that page aging is LRU-like in the vast majority of situations
-- the only way to turn a miss into a hit is to increase the window
size.  Thus, the original heuristic's approach of shrinking the window
in response to misses is a guarantee that future references that are
part of the same reference behavior will remain misses.

Put differently, the *only* case in which it makes sense to shrink the
read-ahead window in response to misses is one in which the misses are
the result of un-cache-able references -- ones that would have required
an absurdly large window, and so no window would be the best choice.
However, the heuristic that I described above will reach the same
conclusion, although more slowly.  After growing the cache in response
to the misses and observing no miss decrease, it would revert to a
zero-sized window.

Granted, this discussion is based only on the read-ahead references,
and not on the references to other, used pages.  However, even with
that consideration, there's almost no situation in which you want to
respond to read-ahead misses by shrinking the window -- and in those
cases where you do, it's because of other factors, such as the need
for a hopelessly large window, or a heavy demand on used pages that
are near eviction.  Read-ahead misses may not motivate larger
read-ahead windows, but alone they *never* motivate smaller read-ahead
windows.

>> So, while it is ideal to have some foresight before resizing the
>> window -- some calculation that determines whether or not growth
>> will help or shrinkage will hurt -- it will require the VM system
>> to gather hit distributions.
>
> Yup, but I think it's almost certainly worth that expense.

I'm happy that you think so, because I'm trying to do that now, and
it's going to create some overhead.
Much like current rmap implementations, it's going to be the most
intrusive for those cases where no paging is involved, and so the
gains of tracking such information cannot be realized.

> How you actually calculate the window is a matter for debate and
> experimentation, but just growing and shrinking based on purely the
> hit rate seems like a bad idea.

Here I do agree.  Rather than finding the hit distribution by blindly
setting allocations and observing the outcome, we can gather data to
indicate what the outcome *would* be for that allocation.  Note,
however, that VM systems have a long, long history of doing things
like just responding to blind data gathering, much like increasing or
decreasing allocation due to hit rate.  It's a matter of convincing
people that gathering data that shows you the search space on-line is
worth the complexity and the overhead.

Scott
* Re: [RFC] start_aggressive_readahead
  2002-07-30 16:21 ` Martin J. Bligh
  2002-07-30 16:38   ` Scott Kaplan
@ 2002-07-30 17:13   ` William Lee Irwin III
  1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2002-07-30 17:13 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Scott Kaplan, Andrew Morton, Rik van Riel, Christoph Hellwig, linux-mm

On Tue, Jul 30, 2002 at 09:21:57AM -0700, Martin J. Bligh wrote:
> Would it not be easier to actually calculate (statistically) the
> read-ahead window, rather than actually tweaking it empirically?
> If we're getting misses, there could be at least two causes -

I wonder where these stats should really be kept.  They seem to be in
the vma, which probably doesn't fly too well when 20K threads are
pounding on different chunks of the same thing.  Each could do locally
sequential reads and look random from the perspective of per-vma stats.
This probably gets worse if different threads are stomping in different
patterns, e.g. one sequential, one random.  They also seem to lack any
way to cooperate since the hints are kept per-vma.  It's also probably
easier to predict the behavior of a single task.

Cheers,
Bill
* Re: [RFC] start_aggressive_readahead
  2002-07-25 19:40 ` Andrew Morton
  2002-07-26 16:50   ` Scott Kaplan
@ 2002-07-26 20:14 ` Stephen Lord
  2002-07-26 20:29   ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: Stephen Lord @ 2002-07-26 20:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, Christoph Hellwig, torvalds, linux-mm

On Thu, 2002-07-25 at 14:40, Andrew Morton wrote:
> Rik van Riel wrote:
> >
> > On Thu, 25 Jul 2002, Christoph Hellwig wrote:
> >
> > > This function (start_aggressive_readahead()) checks whether all zones
> > > of the given gfp mask have lots of free pages.
> >
> > Seems a bit silly since ideally we wouldn't reclaim cache memory
> > until we're low on physical memory.
>
> Yes, I would question its worth also.
>
> What it boils down to is: which pages are we, in the immediate future,
> more likely to use?  Pages which are at the tail of the inactive list,
> or pages which are in the file's readahead window?
>
> I'd say the latter, so readahead should just go and do reclaim.

The interesting thing is that tuning metadata readahead using this
function does indeed improve performance under heavy memory load.  It
seems we end up pushing more useful things out of memory than the
metadata we read in.

Andrew, you talked about a GFP flag which would mean only return memory
if there was some available which was already free and clean.  The best
approach might be to use that flag in this scenario and skip the
readahead if no memory is returned.

For the record, this is not just used for directory readahead, but for
any btree structured metadata in xfs.

Steve
* Re: [RFC] start_aggressive_readahead
  2002-07-26 20:14 ` Stephen Lord
@ 2002-07-26 20:29   ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2002-07-26 20:29 UTC (permalink / raw)
To: Stephen Lord; +Cc: Rik van Riel, Christoph Hellwig, torvalds, linux-mm

Stephen Lord wrote:
>
> On Thu, 2002-07-25 at 14:40, Andrew Morton wrote:
> > Rik van Riel wrote:
> > >
> > > On Thu, 25 Jul 2002, Christoph Hellwig wrote:
> > >
> > > > This function (start_aggressive_readahead()) checks whether all zones
> > > > of the given gfp mask have lots of free pages.
> > >
> > > Seems a bit silly since ideally we wouldn't reclaim cache memory
> > > until we're low on physical memory.
> >
> > Yes, I would question its worth also.
> >
> > What it boils down to is: which pages are we, in the immediate future,
> > more likely to use?  Pages which are at the tail of the inactive list,
> > or pages which are in the file's readahead window?
> >
> > I'd say the latter, so readahead should just go and do reclaim.
>
> The interesting thing is that tuning metadata readahead using
> this function does indeed improve performance under heavy memory
> load.  It seems we end up pushing more useful things out of
> memory than the metadata we read in.

I'm surprised.  Could be that even when there is no memory pressure,
you're simply reading stuff which you're never using?

Ah.  Could be that the improvements which you saw are nothing to do
with leaving memory free, and everything to do with the extreme latency
which occurs in page reclaim when the system is under load.  (I'm
whining again).

> Andrew, you talked about
> a GFP flag which would mean only return memory if there was
> some available which was already free and clean.

Yes, you can do that now.  Just use GFP_ATOMIC & ~__GFP_HIGH and the
allocation will fail if it could not be satisfied from a zone which
has (free_pages > zone->pages_min).
Which will dip further into the page reserves than the
start_aggressive_readahead() approach would have, but it'll certainly
get around the page reclaim latency.

(You'll need to set PF_NOWARN around the call, else the page allocator
will spam you to death.  Sorry)
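[Editor's note: in userspace terms, the policy Andrew suggests is
"allocate only if no reclaim would be needed, otherwise skip the
speculative readahead".  A toy model of that decision follows; the
`zone_model` structure and both function names are invented, standing in
for the real allocator.  In the kernel this would be
alloc_page(GFP_ATOMIC & ~__GFP_HIGH) with PF_NOWARN set, as described
above.]

```c
#include <assert.h>

/* Invented stand-in for a memory zone and its low watermark. */
struct zone_model {
	long free_pages;
	long pages_min;
};

/*
 * Returns 1 if `pages_wanted` pages could be taken without dropping
 * the zone below its pages_min watermark (i.e. without forcing any
 * page reclaim), and 0 if the caller should skip the speculative
 * metadata readahead instead.
 */
static int can_readahead(const struct zone_model *z, long pages_wanted)
{
	return z->free_pages - pages_wanted > z->pages_min;
}
```

The point of the design is that a failed probe costs nothing: the caller
simply issues no readahead, rather than evicting possibly-useful pages to
make room for speculative metadata.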
* Re: [RFC] start_aggressive_readahead
  2002-07-25 16:10 [RFC] start_aggressive_readahead Christoph Hellwig
  2002-07-25 16:44 ` Rik van Riel
@ 2002-07-26  6:53 ` Daniel Phillips
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Phillips @ 2002-07-26 6:53 UTC (permalink / raw)
To: Christoph Hellwig, torvalds; +Cc: linux-mm

On Thursday 25 July 2002 18:10, Christoph Hellwig wrote:
> I'm also open for a better name (I think the current one is very bad,
> but don't have a better idea :)).  I'd also be interested in comments
> on how to avoid the new function and use existing functionality for
> it, but I've tried to find it for a long time and didn't find
> something suitable.

That's the right attitude imho.  Redoing readahead needs to be a
project all by itself, a fine thing to experiment with in the stable
series.  A bad idea that sort of works for now is better than what
we've got.

--
Daniel
end of thread, other threads:[~2002-08-05 18:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-25 16:10 [RFC] start_aggressive_readahead Christoph Hellwig
2002-07-25 16:44 ` Rik van Riel
2002-07-25 19:40   ` Andrew Morton
2002-07-26 16:50     ` Scott Kaplan
2002-07-26 19:38       ` Andrew Morton
2002-07-28 23:32         ` Scott Kaplan
2002-07-29  0:19           ` Rik van Riel
2002-07-29  2:12             ` Scott Kaplan
2002-07-29  3:05               ` Rik van Riel
2002-07-29 15:24                 ` Scott Kaplan
2002-07-29  7:34           ` Andrew Morton
2002-07-29  7:37             ` Vladimir Dergachev
2002-07-29  7:53               ` Andrew Morton
2002-07-29  8:04             ` Rik van Riel
2002-07-30 16:11             ` Scott Kaplan
2002-07-30 16:21               ` Martin J. Bligh
2002-07-30 16:38                 ` Scott Kaplan
2002-07-30 16:52                   ` Martin J. Bligh
2002-08-05 18:54                     ` Scott Kaplan
2002-07-30 17:13                 ` William Lee Irwin III
2002-07-26 20:14   ` Stephen Lord
2002-07-26 20:29     ` Andrew Morton
2002-07-26  6:53 ` Daniel Phillips