* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Peter Zijlstra @ 2008-07-16 12:14 UTC
  To: Eric Rannaud; +Cc: linux-kernel, linux-mm, riel

On Tue, 2008-07-15 at 23:03 +0000, Eric Rannaud wrote:
> mm/madvise.c and madvise(2) say:
>
>  *  MADV_SEQUENTIAL - pages in the given range will probably be accessed
>  *      once, so they can be aggressively read ahead, and
>  *      can be freed soon after they are accessed.
>
>
> But as the sample program at the end of this post shows, and as I
> understand the code in mm/filemap.c, MADV_SEQUENTIAL will only increase
> the amount of read ahead for the specified page range, but will not
> influence the rate at which the pages just read will be freed from
> memory.

Correct, various attempts have been made to actually implement this, but
none made it through.

My last attempt was:
  http://lkml.org/lkml/2007/7/21/219

Rik recently tried something else based on his split-lru series:
  http://lkml.org/lkml/2008/7/15/465

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Rik van Riel @ 2008-07-16 14:50 UTC
  To: Peter Zijlstra; +Cc: Eric Rannaud, linux-kernel, linux-mm

On Wed, 16 Jul 2008 14:14:55 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2008-07-15 at 23:03 +0000, Eric Rannaud wrote:
> > mm/madvise.c and madvise(2) say:
> >
> >  *  MADV_SEQUENTIAL - pages in the given range will probably be accessed
> >  *      once, so they can be aggressively read ahead, and
> >  *      can be freed soon after they are accessed.
> >
> >
> > But as the sample program at the end of this post shows, and as I
> > understand the code in mm/filemap.c, MADV_SEQUENTIAL will only increase
> > the amount of read ahead for the specified page range, but will not
> > influence the rate at which the pages just read will be freed from
> > memory.
>
> Correct, various attempts have been made to actually implement this, but
> none made it through.
>
> My last attempt was:
>   http://lkml.org/lkml/2007/7/21/219
>
> Rik recently tried something else based on his split-lru series:
>   http://lkml.org/lkml/2008/7/15/465

My patch is not going to help with mmap, though.

I believe that for mmap MADV_SEQUENTIAL, we will have to do
an unmap-behind from the fault path.  Not every time, but
maybe once per megabyte, unmapping the megabyte behind us.

That way the normal page cache policies (use once, etc) can
take care of page eviction, which should help if the file
is also in use by another process.

--
All Rights Reversed

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Chris Snook @ 2008-07-16 21:05 UTC
  To: Rik van Riel; +Cc: Peter Zijlstra, Eric Rannaud, linux-kernel, linux-mm

Rik van Riel wrote:
> On Wed, 16 Jul 2008 14:14:55 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Tue, 2008-07-15 at 23:03 +0000, Eric Rannaud wrote:
>>> mm/madvise.c and madvise(2) say:
>>>
>>>  *  MADV_SEQUENTIAL - pages in the given range will probably be accessed
>>>  *      once, so they can be aggressively read ahead, and
>>>  *      can be freed soon after they are accessed.
>>>
>>>
>>> But as the sample program at the end of this post shows, and as I
>>> understand the code in mm/filemap.c, MADV_SEQUENTIAL will only increase
>>> the amount of read ahead for the specified page range, but will not
>>> influence the rate at which the pages just read will be freed from
>>> memory.
>> Correct, various attempts have been made to actually implement this, but
>> none made it through.
>>
>> My last attempt was:
>>   http://lkml.org/lkml/2007/7/21/219
>>
>> Rik recently tried something else based on his split-lru series:
>>   http://lkml.org/lkml/2008/7/15/465
>
> My patch is not going to help with mmap, though.
>
> I believe that for mmap MADV_SEQUENTIAL, we will have to do
> an unmap-behind from the fault path.  Not every time, but
> maybe once per megabyte, unmapping the megabyte behind us.
>
> That way the normal page cache policies (use once, etc) can
> take care of page eviction, which should help if the file
> is also in use by another process.
>

Wouldn't it just be easier to not move pages to the active list when
they're referenced via an MADV_SEQUENTIAL mapping?  If we keep them on
the inactive list, they'll be candidates for reclaiming, but they'll
still be in pagecache when another task scans through, as long as we're
not under memory pressure.

-- Chris

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Eric Rannaud @ 2008-07-17 0:01 UTC
  To: Chris Snook
  Cc: Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm, Andrew Morton,
      Nick Piggin

On Wed, 2008-07-16 at 17:05 -0400, Chris Snook wrote:
> Rik van Riel wrote:
> > I believe that for mmap MADV_SEQUENTIAL, we will have to do
> > an unmap-behind from the fault path.  Not every time, but
> > maybe once per megabyte, unmapping the megabyte behind us.
>
> Wouldn't it just be easier to not move pages to the active list when
> they're referenced via an MADV_SEQUENTIAL mapping?  If we keep them on
> the inactive list, they'll be candidates for reclaiming, but they'll
> still be in pagecache when another task scans through, as long as we're
> not under memory pressure.

This approach, instead of invalidating the pages right away, would
provide a middle ground: a way to tell the kernel "these pages are not
too important".

Whereas if MADV_SEQUENTIAL just invalidates the pages once per megabyte
(say), then it's only doing what is already possible using MADV_DONTNEED
("drop these pages now"). It would automate the process, but it would not
provide a more subtle hint, which could be quite useful.

As I see it, there are two basic concepts here:
  - no_reuse (like FADV_NOREUSE)
  - more_ra (more readahead)
(DONTNEED being another different concept)

Then:
  MADV_SEQUENTIAL = more_ra | no_reuse
  FADV_SEQUENTIAL = more_ra | no_reuse
  FADV_NOREUSE = no_reuse

Right now, only the 'more_ra' part is implemented. 'no_reuse' could be
implemented as Chris suggests.

It looks like the disagreement a year ago around Peter's approach was
mostly around the question of whether using read ahead as a heuristic
for "drop behind" was safe for all workloads.

Would it be less controversial to remove the heuristic (ra->size ==
ra->ra_pages), and to do something only if the user asked for
_SEQUENTIAL or _NOREUSE?

It might encourage user space applications to start using
FADV_SEQUENTIAL or FADV_NOREUSE more often (as it would become
worthwhile to do so), and if they do (especially cron jobs), the problem
of the slow desktop in the morning would progressively solve itself.

Thanks.
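
For readers less familiar with the fadvise hints named in the message
above, here is a minimal userspace sketch of how an application on a
read() path would issue them, assuming Linux with glibc. The function
name stream_file_with_hints and the 64 KiB buffer size are illustrative
choices, not anything taken from the thread. Going by the discussion,
only the readahead half of these hints had an effect at the time, so the
sketch shows the calling convention rather than any guaranteed eviction
behavior.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read a whole file sequentially after hinting the access pattern.
 * Hypothetical helper: returns 0 and stores the byte count in
 * *bytes_read on success, -1 on error.
 */
int stream_file_with_hints(const char *path, uint64_t *bytes_read)
{
    assert(path != NULL && bytes_read != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /*
     * offset 0 with len 0 means "the whole file". posix_fadvise()
     * returns an error number directly; the hints are advisory, so a
     * failure here is deliberately ignored.
     */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL); /* "more_ra"  */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);    /* "no_reuse" */

    unsigned char buf[1 << 16];
    uint64_t total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += (uint64_t)n;

    close(fd);
    if (n < 0)
        return -1;

    *bytes_read = total;
    return 0;
}
```

A caller would pass a path and a uint64_t to receive the count, for
example stream_file_with_hints("/var/log/syslog", &n).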

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Nick Piggin @ 2008-07-17 6:14 UTC
  To: Eric Rannaud
  Cc: Chris Snook, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm,
      Andrew Morton

On Thursday 17 July 2008 10:01, Eric Rannaud wrote:
> On Wed, 2008-07-16 at 17:05 -0400, Chris Snook wrote:
> > Rik van Riel wrote:
> > > I believe that for mmap MADV_SEQUENTIAL, we will have to do
> > > an unmap-behind from the fault path.  Not every time, but
> > > maybe once per megabyte, unmapping the megabyte behind us.
> >
> > Wouldn't it just be easier to not move pages to the active list when
> > they're referenced via an MADV_SEQUENTIAL mapping?  If we keep them on
> > the inactive list, they'll be candidates for reclaiming, but they'll
> > still be in pagecache when another task scans through, as long as we're
> > not under memory pressure.
>
> This approach, instead of invalidating the pages right away, would
> provide a middle ground: a way to tell the kernel "these pages are not
> too important".
>
> Whereas if MADV_SEQUENTIAL just invalidates the pages once per megabyte
> (say), then it's only doing what is already possible using MADV_DONTNEED
> ("drop these pages now"). It would automate the process, but it would not
> provide a more subtle hint, which could be quite useful.
>
> As I see it, there are two basic concepts here:
>   - no_reuse (like FADV_NOREUSE)
>   - more_ra (more readahead)
> (DONTNEED being another different concept)
>
> Then:
>   MADV_SEQUENTIAL = more_ra | no_reuse
>   FADV_SEQUENTIAL = more_ra | no_reuse
>   FADV_NOREUSE = no_reuse
>
> Right now, only the 'more_ra' part is implemented. 'no_reuse' could be
> implemented as Chris suggests.
>
> It looks like the disagreement a year ago around Peter's approach was
> mostly around the question of whether using read ahead as a heuristic
> for "drop behind" was safe for all workloads.
>
> Would it be less controversial to remove the heuristic (ra->size ==
> ra->ra_pages), and to do something only if the user asked for
> _SEQUENTIAL or _NOREUSE?

It's far far easier to tell the kernel "I am no longer using these
pages" than to say "I will not use these pages sometime in the future
after I have used them". The former can be done synchronously and with
a much higher efficiency than it takes to scan through LRU lists to
figure this out.

We should be using the SEQUENTIAL to open up readahead windows, and ask
userspace applications to use DONTNEED to drop if it is important. IMO.

> It might encourage user space applications to start using
> FADV_SEQUENTIAL or FADV_NOREUSE more often (as it would become
> worthwhile to do so), and if they do (especially cron jobs), the problem
> of the slow desktop in the morning would progressively solve itself.

The slow desktop in the morning should not happen even without such a
call, because the kernel should not throw out frequently used data (even
if it is not quite so recent) in favour of streaming data.

OK, I figure it doesn't do such a good job now, which is sad, but making
all apps micromanage the pagecache to get reasonable performance on a
2GB+ desktop system is even more sad ;)
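
The split suggested in the message above (SEQUENTIAL to widen
readahead, an explicit DONTNEED from userspace to drop what has been
consumed) can be sketched as follows. This is an illustration under
assumptions, not code from the thread: the function name
scan_file_drop_behind is made up, and the 1 MiB chunk size is borrowed
from the "once per megabyte" figure mentioned earlier in the discussion.
On Linux, MADV_DONTNEED on a shared file mapping removes the pages from
the caller's mapping; whether the page cache then evicts them is left to
normal reclaim.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define DROP_CHUNK ((size_t)1 << 20) /* 1 MiB, assumed page-size multiple */

/*
 * Sum every byte of a file through a read-only shared mapping,
 * dropping each chunk from the mapping once it has been consumed.
 * Hypothetical helper: returns 0 and stores the sum in *sum, or -1.
 */
int scan_file_drop_behind(const char *path, uint64_t *sum)
{
    /* madvise() needs page-aligned addresses, so chunks must align. */
    assert(DROP_CHUNK % (size_t)sysconf(_SC_PAGESIZE) == 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {            /* nothing to map; mmap would fail */
        close(fd);
        *sum = 0;
        return 0;
    }

    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping keeps the file referenced */
    if (base == MAP_FAILED)
        return -1;

    /* Advisory only: asks for more aggressive readahead. */
    (void)madvise(base, len, MADV_SEQUENTIAL);

    uint64_t total = 0;
    for (size_t off = 0; off < len; off += DROP_CHUNK) {
        size_t n = len - off < DROP_CHUNK ? len - off : DROP_CHUNK;

        for (size_t i = 0; i < n; i++)
            total += base[off + i];

        /* Manual "drop behind": done with this chunk, say so now. */
        (void)madvise(base + off, n, MADV_DONTNEED);
    }

    munmap(base, len);
    *sum = total;
    return 0;
}
```

Dropping once per chunk rather than once per page keeps the number of
madvise() calls low; the right chunk size is workload dependent and the
value here is only a starting point.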

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Rik van Riel @ 2008-07-17 14:21 UTC
  To: Nick Piggin
  Cc: Eric Rannaud, Chris Snook, Peter Zijlstra, linux-kernel, linux-mm,
      Andrew Morton

On Thu, 17 Jul 2008 16:14:29 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > It might encourage user space applications to start using
> > FADV_SEQUENTIAL or FADV_NOREUSE more often (as it would become
> > worthwhile to do so), and if they do (especially cron jobs), the problem
> > of the slow desktop in the morning would progressively solve itself.
>
> The slow desktop in the morning should not happen even without such a
> call, because the kernel should not throw out frequently used data (even
> if it is not quite so recent) in favour of streaming data.
>
> OK, I figure it doesn't do such a good job now, which is sad,

Do you have any tests in mind that we could use to decide
whether the patch I posted Tuesday would do a decent job
at protecting frequently used data from streaming data?

http://lkml.org/lkml/2008/7/15/465

--
All Rights Reversed

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Chris Snook @ 2008-07-17 18:04 UTC
  To: Rik van Riel
  Cc: Nick Piggin, Eric Rannaud, Peter Zijlstra, linux-kernel, linux-mm,
      Andrew Morton

Rik van Riel wrote:
> On Thu, 17 Jul 2008 16:14:29 +1000
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>>> It might encourage user space applications to start using
>>> FADV_SEQUENTIAL or FADV_NOREUSE more often (as it would become
>>> worthwhile to do so), and if they do (especially cron jobs), the problem
>>> of the slow desktop in the morning would progressively solve itself.
>> The slow desktop in the morning should not happen even without such a
>> call, because the kernel should not throw out frequently used data (even
>> if it is not quite so recent) in favour of streaming data.
>>
>> OK, I figure it doesn't do such a good job now, which is sad,
>
> Do you have any tests in mind that we could use to decide
> whether the patch I posted Tuesday would do a decent job
> at protecting frequently used data from streaming data?
>
> http://lkml.org/lkml/2008/7/15/465
>

1) start up a memory-hogging Java app
2) run a full-system backup

If it works well, the Java app shouldn't slow down much.

-- Chris
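
One possible way to put a number on a test like the one above is to
count how many pages of a "frequently used" file are still resident in
the page cache before and after the streaming job runs. The sketch below
does that with mincore(2) on Linux. It is an assumption about how such a
measurement could be made, not the sample program mentioned at the start
of the thread, and the function name count_resident_pages is made up for
illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count how many pages of a file are currently resident in memory.
 * Hypothetical helper: returns 0 and fills *resident and *total
 * (both in pages) on success, -1 on error.
 */
int count_resident_pages(const char *path, size_t *resident, size_t *total)
{
    assert(path != NULL && resident != NULL && total != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;

    *resident = 0;
    *total = npages;
    if (npages == 0) {         /* empty file: nothing to map or count */
        close(fd);
        return 0;
    }

    /* Mapping alone does not fault pages in, so it does not skew the count. */
    void *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    if (vec == NULL) {
        munmap(base, len);
        return -1;
    }

    /* mincore() sets the low bit of vec[i] if page i is resident. */
    int rc = mincore(base, len, vec);
    if (rc == 0) {
        for (size_t i = 0; i < npages; i++)
            *resident += vec[i] & 1;
    }

    free(vec);
    munmap(base, len);
    return rc == 0 ? 0 : -1;
}
```

Sampling this for the application's working files before the backup
starts and again afterwards gives a rough residency ratio to compare
between kernels.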

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Peter Zijlstra @ 2008-07-17 18:09 UTC
  To: Chris Snook
  Cc: Rik van Riel, Nick Piggin, Eric Rannaud, linux-kernel, linux-mm,
      Andrew Morton

Sorry can't resist...

On Thu, 2008-07-17 at 14:04 -0400, Chris Snook wrote:

> 1) start up a memory-hogging Java app

Is there any other kind? :-)

* Re: madvise(2) MADV_SEQUENTIAL behavior
  From: Rik van Riel @ 2008-07-17 14:20 UTC
  To: Chris Snook; +Cc: Peter Zijlstra, Eric Rannaud, linux-kernel, linux-mm

On Wed, 16 Jul 2008 17:05:14 -0400
Chris Snook <csnook@redhat.com> wrote:

> > I believe that for mmap MADV_SEQUENTIAL, we will have to do
> > an unmap-behind from the fault path.  Not every time, but
> > maybe once per megabyte, unmapping the megabyte behind us.
> >
> > That way the normal page cache policies (use once, etc) can
> > take care of page eviction, which should help if the file
> > is also in use by another process.
>
> Wouldn't it just be easier to not move pages to the active list when
> they're referenced via an MADV_SEQUENTIAL mapping?

You want to check the MADV_SEQUENTIAL hint at pageout time and
discard the referenced bit from the pte?

> If we keep them on the inactive list, they'll be candidates for
> reclaiming

Only if we ignore the referenced bit.  Which I guess we can do.

--
All Rights Reversed

Thread overview: 9+ messages
[not found] <1216163022.3443.156.camel@zenigma>
2008-07-16 12:14 ` madvise(2) MADV_SEQUENTIAL behavior Peter Zijlstra
2008-07-16 14:50 ` Rik van Riel
2008-07-16 21:05 ` Chris Snook
2008-07-17 0:01 ` Eric Rannaud
2008-07-17 6:14 ` Nick Piggin
2008-07-17 14:21 ` Rik van Riel
2008-07-17 18:04 ` Chris Snook
2008-07-17 18:09 ` Peter Zijlstra
2008-07-17 14:20 ` Rik van Riel