linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: evict streaming IO cache first
@ 2008-11-15  9:38 KOSAKI Motohiro
  2008-11-16  0:58 ` Rik van Riel
  2008-11-16  5:00 ` Andrew Morton
  0 siblings, 2 replies; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-11-15  9:38 UTC (permalink / raw)
  To: LKML, linux-mm, Rik van Riel, Linus Torvalds, Andrew Morton,
	Gene Heskett
  Cc: kosaki.motohiro

Hi Andrew,

I think we need this patch at 2.6.28.
Can this thinking get acception?


--------------------------------------------------
From: Rik van Riel <riel@redhat.com>

Gene Heskett reported 2.6.28-rc3 often make unnecessary swap-out
on his system(4GB mem, 2GB swap).
and He has had to do a "swapoff -a; swapon -a" daily to clear the swap.


Actually, When there is a lot of streaming IO (or lite memory pressure workload)
going on, we do not want to scan or evict pages from the working set.  
The old VM used to skip any mapped page, but still evict indirect blocks and
other data that is useful to cache.

This patch adds logic to skip scanning the anon lists and
the active file list if most of the file pages are on the
inactive file list (where streaming IO pages live), while
at the lowest scanning priority.

If the system is not doing a lot of streaming IO, eg. the
system is running a database workload, then more often used
file pages will be on the active file list and this logic
is automatically disabled.


IOW, Large server apparently doesn't need this patch. but
desktop or small server need it.


Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Ackted-by: Gene Heskett <gene.heskett@gmail.com>
Tested-by: Gene Heskett <gene.heskett@gmail.com>
---
 include/linux/mmzone.h |    1 +
 mm/vmscan.c            |   18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h	2008-11-10 16:10:34.000000000 +0900
+++ b/include/linux/mmzone.h	2008-11-10 16:12:20.000000000 +0900
@@ -453,6 +453,7 @@ static inline int zone_is_oom_locked(con
  * queues ("queue_length >> 12") during an aging round.
  */
 #define DEF_PRIORITY 12
+#define PRIO_CACHE_ONLY (DEF_PRIORITY+1)
 
 /* Maximum number of zones on a zonelist */
 #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c	2008-11-10 16:10:34.000000000 +0900
+++ b/mm/vmscan.c	2008-11-10 16:11:30.000000000 +0900
@@ -1443,6 +1443,20 @@ static unsigned long shrink_zone(int pri
 		}
 	}
 
+	/*
+	 * If there is a lot of sequential IO going on, most of the
+	 * file pages will be on the inactive file list.  We start
+	 * out by reclaiming those pages, without putting pressure on
+	 * the working set.  We only do this if the bulk of the file pages
+	 * are not in the working set (on the active file list).
+	 */
+	if (priority == PRIO_CACHE_ONLY &&
+			(nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]))
+		for_each_evictable_lru(l)
+			/* Scan only the inactive_file list. */
+			if (l != LRU_INACTIVE_FILE)
+				nr[l] = 0;
+
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
@@ -1573,7 +1587,7 @@ static unsigned long do_try_to_free_page
 		}
 	}
 
-	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+	for (priority = PRIO_CACHE_ONLY; priority >= 0; priority--) {
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token();
@@ -1735,7 +1749,7 @@ loop_again:
 	for (i = 0; i < pgdat->nr_zones; i++)
 		temp_priority[i] = DEF_PRIORITY;
 
-	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+	for (priority = PRIO_CACHE_ONLY; priority >= 0; priority--) {
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
 





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-15  9:38 [PATCH] mm: evict streaming IO cache first KOSAKI Motohiro
@ 2008-11-16  0:58 ` Rik van Riel
  2008-11-16  5:00 ` Andrew Morton
  1 sibling, 0 replies; 27+ messages in thread
From: Rik van Riel @ 2008-11-16  0:58 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Linus Torvalds, Andrew Morton, Gene Heskett

KOSAKI Motohiro wrote:
> Hi Andrew,
> 
> I think we need this patch at 2.6.28.

I agree.  I thought we would need this patch from right before
the time I wrote it, but we had no good workload to demonstrate
it at the time.

Gene Heskett found that the problem happens on his system and
the patch fixes is.

> Can this thinking get acception?

One of the reasons that I could not find a justification for
the patch is that all the benchmarks that I tried were
unaffected by it.  This makes me believe that the risk of
performance regressions is low, while the patch does fix a
real performance bug for Gene's desktop.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-15  9:38 [PATCH] mm: evict streaming IO cache first KOSAKI Motohiro
  2008-11-16  0:58 ` Rik van Riel
@ 2008-11-16  5:00 ` Andrew Morton
  2008-11-16 18:15   ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-11-16  5:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Rik van Riel, Linus Torvalds, Gene Heskett

On Sat, 15 Nov 2008 18:38:59 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi Andrew,
> 
> I think we need this patch at 2.6.28.
> Can this thinking get acception?
> 
> 
> --------------------------------------------------
> From: Rik van Riel <riel@redhat.com>
> 
> Gene Heskett reported 2.6.28-rc3 often make unnecessary swap-out
> on his system(4GB mem, 2GB swap).
> and He has had to do a "swapoff -a; swapon -a" daily to clear the swap.
> 
> 
> Actually, When there is a lot of streaming IO (or lite memory pressure workload)
> going on, we do not want to scan or evict pages from the working set.  
> The old VM used to skip any mapped page, but still evict indirect blocks and
> other data that is useful to cache.

Well yes, that was to stop precisely this problem from happening.

> This patch adds logic to skip scanning the anon lists and
> the active file list if most of the file pages are on the
> inactive file list (where streaming IO pages live), while
> at the lowest scanning priority.
> 
> If the system is not doing a lot of streaming IO, eg. the
> system is running a database workload, then more often used
> file pages will be on the active file list and this logic
> is automatically disabled.
> 
> 
> IOW, Large server apparently doesn't need this patch. but
> desktop or small server need it.
> 
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Ackted-by: Gene Heskett <gene.heskett@gmail.com>
> Tested-by: Gene Heskett <gene.heskett@gmail.com>
> ---
>  include/linux/mmzone.h |    1 +
>  mm/vmscan.c            |   18 ++++++++++++++++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h	2008-11-10 16:10:34.000000000 +0900
> +++ b/include/linux/mmzone.h	2008-11-10 16:12:20.000000000 +0900
> @@ -453,6 +453,7 @@ static inline int zone_is_oom_locked(con
>   * queues ("queue_length >> 12") during an aging round.
>   */
>  #define DEF_PRIORITY 12
> +#define PRIO_CACHE_ONLY (DEF_PRIORITY+1)
>  
>  /* Maximum number of zones on a zonelist */
>  #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c	2008-11-10 16:10:34.000000000 +0900
> +++ b/mm/vmscan.c	2008-11-10 16:11:30.000000000 +0900
> @@ -1443,6 +1443,20 @@ static unsigned long shrink_zone(int pri
>  		}
>  	}
>  
> +	/*
> +	 * If there is a lot of sequential IO going on, most of the
> +	 * file pages will be on the inactive file list.  We start
> +	 * out by reclaiming those pages, without putting pressure on
> +	 * the working set.  We only do this if the bulk of the file pages
> +	 * are not in the working set (on the active file list).
> +	 */
> +	if (priority == PRIO_CACHE_ONLY &&
> +			(nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]))
> +		for_each_evictable_lru(l)
> +			/* Scan only the inactive_file list. */
> +			if (l != LRU_INACTIVE_FILE)
> +				nr[l] = 0;
> +
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>  					nr[LRU_INACTIVE_FILE]) {
>  		for_each_evictable_lru(l) {
> @@ -1573,7 +1587,7 @@ static unsigned long do_try_to_free_page
>  		}
>  	}
>  
> -	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> +	for (priority = PRIO_CACHE_ONLY; priority >= 0; priority--) {
>  		sc->nr_scanned = 0;
>  		if (!priority)
>  			disable_swap_token();
> @@ -1735,7 +1749,7 @@ loop_again:
>  	for (i = 0; i < pgdat->nr_zones; i++)
>  		temp_priority[i] = DEF_PRIORITY;
>  
> -	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> +	for (priority = PRIO_CACHE_ONLY; priority >= 0; priority--) {
>  		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
>  		unsigned long lru_pages = 0;

This is a twiddle, and not a terribly good one, IMO.

See, the old reclaim_mapped logic tried to work by *observing* the
behaviour of page reclaim.  The above tweak tries to predict a-priori
what will happen, and hence has far less information.  This is a key
difference!

The guesses which the above code makes can and will go wrong, I expect.

For a start, it has a sudden transition point at (nr[LRU_INACTIVE_FILE]
== nr[LRU_ACTIVE_FILE).  That is a completely arbitrary wet finger in
the air, and when it is crossed, there will be large changes in
behaviour.  This cannot be right!

Secondly, the above code is dependent upon the size of the zone.  We
scan 1/8192th of the zone's pages in "only reclaim file pages" mode,
then we flip into "scan anon pages as well" mode.

If the size of the zone is less than (SWAP_CLUSTER_MAX * 8192) (1GB
with 4k pages) then this code will _always_ fail to reclaim
SWAP_CLUSTER_MAX pages on that initial pass, and hence will always
decrement `priority' and will always fall into "scan anon pages as
well" mode.

So I suspect this code didn't fix the problem for small zones much at
all.


Really, I think that the old approach of observing the scanner
behaviour (rather than trying to predict it) was better.  It has more
information.  I see that bits of it are still left there - afaict
zone->prev_priority doesn't do anything any more?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-16  5:00 ` Andrew Morton
@ 2008-11-16 18:15   ` Linus Torvalds
  2008-11-16 21:20     ` Rik van Riel
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2008-11-16 18:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KOSAKI Motohiro, LKML, linux-mm, Rik van Riel, Gene Heskett


On Sat, 15 Nov 2008, Andrew Morton wrote:
> 
> Really, I think that the old approach of observing the scanner
> behaviour (rather than trying to predict it) was better.

That's generally true. Self-adjusting behaviour rather than a-priori rules 
would be much nicer. However, we apparently need to fix this some way. 
Anybody willing to re-introduce some of the old logic?

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-16 18:15   ` Linus Torvalds
@ 2008-11-16 21:20     ` Rik van Riel
  2008-11-16 21:28       ` Hugh Dickins
  2008-11-17  4:47       ` Andrew Morton
  0 siblings, 2 replies; 27+ messages in thread
From: Rik van Riel @ 2008-11-16 21:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, linux-mm, Gene Heskett

Linus Torvalds wrote:
> On Sat, 15 Nov 2008, Andrew Morton wrote:
>> Really, I think that the old approach of observing the scanner
>> behaviour (rather than trying to predict it) was better.
> 
> That's generally true. Self-adjusting behaviour rather than a-priori rules 
> would be much nicer. However, we apparently need to fix this some way. 
> Anybody willing to re-introduce some of the old logic?

The old behaviour has big problems, especially on large memory
systems.  If the old behaviour worked right, we would not have
been working on the split LRU code for the last year and a half.

Due to programs manipulating memory many pages at a time, the
LRU ends up getting mapped and cache pages on the list in bunches.

On large memory systems, after the scanner runs into a bunch
of mapped pages, it will switch to evicting mapped pages, even
if the next bunch of pages turns out to be cache pages.

I am not convinced that "reacting to what happened in the last
1/4096th of the LRU" is any better than "look at the list stats
and decide what to do".

Andrew's objection to how things behave on small memory systems
(the patch does not change anything) is valid, but going back
to the old behaviour does not seem like an option to me, either.

I will take a look at producing smoother self tuning behaviour
in get_scan_ratio(), with logic along these lines:
- the more file pages are inactive, the more eviction should
   focus on file pages, because we are not eating away at the
   working set yet
- the more file pages are active, the more there needs to be
   a balance between file and anon scanning, because we are
   starting to get to the working sets for both

I wonder if the "do not do mark_page_accessed at page fault time"
patch is triggering the current troublesome behaviour in the VM,
because actively used file pages are not moved out of the way of
the VM - which leads get_scan_ratio to believe that we are already
hitting the working set on the file side and should also start
scanning the anon LRUs.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-16 21:20     ` Rik van Riel
@ 2008-11-16 21:28       ` Hugh Dickins
  2008-11-17  4:47       ` Andrew Morton
  1 sibling, 0 replies; 27+ messages in thread
From: Hugh Dickins @ 2008-11-16 21:28 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Andrew Morton, KOSAKI Motohiro, LKML, linux-mm,
	Gene Heskett

On Sun, 16 Nov 2008, Rik van Riel wrote:
> 
> I wonder if the "do not do mark_page_accessed at page fault time"
> patch is triggering the current troublesome behaviour in the VM,
> because actively used file pages are not moved out of the way of
> the VM - which leads get_scan_ratio to believe that we are already
> hitting the working set on the file side and should also start
> scanning the anon LRUs.

That patch is only in the -mm tree, not in 2.6.28-rc.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-16 21:20     ` Rik van Riel
  2008-11-16 21:28       ` Hugh Dickins
@ 2008-11-17  4:47       ` Andrew Morton
  2008-11-17  6:19         ` Nick Piggin
  2008-11-17  6:30         ` KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 27+ messages in thread
From: Andrew Morton @ 2008-11-17  4:47 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, KOSAKI Motohiro, LKML, linux-mm, Gene Heskett

On Sun, 16 Nov 2008 16:20:26 -0500 Rik van Riel <riel@redhat.com> wrote:

> Linus Torvalds wrote:
> > On Sat, 15 Nov 2008, Andrew Morton wrote:
> >> Really, I think that the old approach of observing the scanner
> >> behaviour (rather than trying to predict it) was better.
> > 
> > That's generally true. Self-adjusting behaviour rather than a-priori rules 
> > would be much nicer. However, we apparently need to fix this some way. 
> > Anybody willing to re-introduce some of the old logic?
> 
> The old behaviour has big problems, especially on large memory
> systems.  If the old behaviour worked right, we would not have
> been working on the split LRU code for the last year and a half.

Split LRU is (in this aspect) worse than the old code.

> Due to programs manipulating memory many pages at a time, the
> LRU ends up getting mapped and cache pages on the list in bunches.
>
> On large memory systems, after the scanner runs into a bunch
> of mapped pages, it will switch to evicting mapped pages, even
> if the next bunch of pages turns out to be cache pages.

Sure.  But that sounds like theory to me.  I've never seen anyone even
vaguely get anywhere near the level of instrumentation and
investigation and testing to be in a position to demonstrate that this
is a problem in practice.

> I am not convinced that "reacting to what happened in the last
> 1/4096th of the LRU" is any better than "look at the list stats
> and decide what to do".

I bet it is.  The list stats are aggregated over the entire list and
aren't very useful for predicting the state of the few hundred pages at
the tail of the list.

> Andrew's objection to how things behave on small memory systems
> (the patch does not change anything) is valid, but going back
> to the old behaviour does not seem like an option to me, either.

There's also the behaviour change at the randomly-chosen
(nr[LRU_INACTIVE_FILE] == nr[LRU_ACTIVE_FILE) point..

> I will take a look at producing smoother self tuning behaviour
> in get_scan_ratio(), with logic along these lines:
> - the more file pages are inactive, the more eviction should
>    focus on file pages, because we are not eating away at the
>    working set yet
> - the more file pages are active, the more there needs to be
>    a balance between file and anon scanning, because we are
>    starting to get to the working sets for both

hm.  I wonder if it would be prohibitive to say "hey, we did the wrong
thing in that scanning pass - rewind and try it again".  Probably it
would be.

Anyway, we need to do something.

Shouldn't get_scan_ratio() be handling this case already?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  4:47       ` Andrew Morton
@ 2008-11-17  6:19         ` Nick Piggin
  2008-11-17  6:30         ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 27+ messages in thread
From: Nick Piggin @ 2008-11-17  6:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Linus Torvalds, KOSAKI Motohiro, LKML, linux-mm,
	Gene Heskett

On Monday 17 November 2008 15:47, Andrew Morton wrote:
> On Sun, 16 Nov 2008 16:20:26 -0500 Rik van Riel <riel@redhat.com> wrote:

> > I will take a look at producing smoother self tuning behaviour
> > in get_scan_ratio(), with logic along these lines:
> > - the more file pages are inactive, the more eviction should
> >    focus on file pages, because we are not eating away at the
> >    working set yet
> > - the more file pages are active, the more there needs to be
> >    a balance between file and anon scanning, because we are
> >    starting to get to the working sets for both
>
> hm.  I wonder if it would be prohibitive to say "hey, we did the wrong
> thing in that scanning pass - rewind and try it again".  Probably it
> would be.
>
> Anyway, we need to do something.
>
> Shouldn't get_scan_ratio() be handling this case already?

I have a patch that was actually for the old vmscan logic that I
found really helps prevent working set get paged out. Actually
the old vmscan behaviour wasn't any good either at preventing
unmapped pagecache from being evicted, which is the main thing I
was trying to fix (eg. keep git tree in cache while doing other
use-once IO).

It ended up working really well, but I suspect it would still fall
over in the case where you were trying to populate your caches with
the git tree *while* the streaming IO is happening (if those pages
don't have a chance to get touched again for a while, they'll look
like use-once IO to the vm)... but still no worse than current
behaviour.

I need to forward port it to the new system, however...

But this would be a pretty big change and I can't see how it could
be appropriate for -rc6. Aren't we allergic to even single-line
changes in vmscan without seemingly multi-year "testing" phases? :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  4:47       ` Andrew Morton
  2008-11-17  6:19         ` Nick Piggin
@ 2008-11-17  6:30         ` KAMEZAWA Hiroyuki
  2008-11-17  6:39           ` KOSAKI Motohiro
  1 sibling, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-17  6:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Linus Torvalds, KOSAKI Motohiro, LKML, linux-mm,
	Gene Heskett

On Sun, 16 Nov 2008 20:47:20 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sun, 16 Nov 2008 16:20:26 -0500 Rik van Riel <riel@redhat.com> wrote:
> Anyway, we need to do something.
> 
> Shouldn't get_scan_ratio() be handling this case already?
> 
Hmm, could I make a question ?

I think

  - recent_rolated[LRU_FILE] is incremented when file cache is moved from
    ACTIVE_FILE to INACTIVE_FILE.
  - recent_scanned[LRU_FILE] is sum of scanning numbers on INACTIVE/ACTIVE list
    of file.
  - file caches are added to INACITVE_FILE, at first.
  - get_scan_ratio() calculates %file to be

                         file        recent rotated.
   %file = IO_cost * ------------ / -------------
                      anon + file    recent scanned.

But when "files are used by streaming or some touch once application",
there is no rotation because they are in INACTIVE FILE at first add_to_lru().
But recent_rotated will not increase while recent_scanned goes bigger and bigger.

Then %file goes to 0 rapidly.

Hmm?

Thanks,
-kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  6:30         ` KAMEZAWA Hiroyuki
@ 2008-11-17  6:39           ` KOSAKI Motohiro
  2008-11-17  6:54             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-11-17  6:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Linus Torvalds, LKML, linux-mm,
	Gene Heskett

2008/11/17 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> On Sun, 16 Nov 2008 20:47:20 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> On Sun, 16 Nov 2008 16:20:26 -0500 Rik van Riel <riel@redhat.com> wrote:
>> Anyway, we need to do something.
>>
>> Shouldn't get_scan_ratio() be handling this case already?
>>
> Hmm, could I make a question ?
>
> I think
>
>  - recent_rolated[LRU_FILE] is incremented when file cache is moved from
>    ACTIVE_FILE to INACTIVE_FILE.
>  - recent_scanned[LRU_FILE] is sum of scanning numbers on INACTIVE/ACTIVE list
>    of file.
>  - file caches are added to INACITVE_FILE, at first.
>  - get_scan_ratio() calculates %file to be
>
>                         file        recent rotated.
>   %file = IO_cost * ------------ / -------------
>                      anon + file    recent scanned.

rewote by div to mul changing.


                        file               recent scanned.
  %file = IO_cost * ------------ * -------------
                     anon + file       recent rotated.


> But when "files are used by streaming or some touch once application",
> there is no rotation because they are in INACTIVE FILE at first add_to_lru().
> But recent_rotated will not increase while recent_scanned goes bigger and bigger.

Yup.

> Then %file goes to 0 rapidly.

I think reverse.

The problem is, when streaming access started right after, recent
scanned isn't so much.
then %file don't reach 100%.

then, few anon pages swaped out althouth memory pressure isn't so heavy.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  6:39           ` KOSAKI Motohiro
@ 2008-11-17  6:54             ` KAMEZAWA Hiroyuki
  2008-11-17  7:03               ` KOSAKI Motohiro
  2008-11-17 16:22               ` Linus Torvalds
  0 siblings, 2 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-17  6:54 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, Linus Torvalds, LKML, linux-mm,
	Gene Heskett

On Mon, 17 Nov 2008 15:39:20 +0900
"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote:

> rewote by div to mul changing.
> 
> 
>                         file               recent scanned.
>   %file = IO_cost * ------------ * -------------
>                      anon + file       recent rotated.
> 
> 
Ah, sorry.

> > But when "files are used by streaming or some touch once application",
> > there is no rotation because they are in INACTIVE FILE at first add_to_lru().
> > But recent_rotated will not increase while recent_scanned goes bigger and bigger.
> 
> Yup.
> 
> > Then %file goes to 0 rapidly.
> 
> I think reverse.
> 
> The problem is, when streaming access started right after, recent
> scanned isn't so much.
> then %file don't reach 100%.
> 
> then, few anon pages swaped out althouth memory pressure isn't so heavy.
> 
"few" ? 
85Mbytes of used swap while 1.2GBytes of free memory in Gene Heskett's report.
Hmm..

How about resetting zone->recent_scanned/rotated to be some value calculated from
INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?

Thanks,
-Kame



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  6:54             ` KAMEZAWA Hiroyuki
@ 2008-11-17  7:03               ` KOSAKI Motohiro
  2008-11-17  8:22                 ` KAMEZAWA Hiroyuki
  2008-11-17 16:22               ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-11-17  7:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Linus Torvalds, LKML, linux-mm,
	Gene Heskett

>> then, few anon pages swaped out althouth memory pressure isn't so heavy.
>>
> "few" ?
> 85Mbytes of used swap while 1.2GBytes of free memory in Gene Heskett's report.
> Hmm..

because accumuration of a days.


> How about resetting zone->recent_scanned/rotated to be some value calculated from
> INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?

in get_scan_ratio()

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  7:03               ` KOSAKI Motohiro
@ 2008-11-17  8:22                 ` KAMEZAWA Hiroyuki
  2008-11-17  8:32                   ` KOSAKI Motohiro
  2008-11-17 16:22                   ` Rik van Riel
  0 siblings, 2 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-11-17  8:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, Linus Torvalds, LKML, linux-mm,
	Gene Heskett

On Mon, 17 Nov 2008 16:03:48 +0900
"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote:
> > How about resetting zone->recent_scanned/rotated to be some value calculated from
> > INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?
> 
> in get_scan_ratio()
> 
But active/inactive ratio (and mapped_ratio) is not handled there.

Follwoing 2 will return the same scan ratio.
==case 1==
  active_anon = 480M
  inactive_anon = 32M
  active_file = 2M
  inactive_file = 510M

==case 2==
  active_anon = 480M
  inactive_anon = 32M
  active_file = 480M
  inactive_file = 32M
==



-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  8:22                 ` KAMEZAWA Hiroyuki
@ 2008-11-17  8:32                   ` KOSAKI Motohiro
  2008-11-17 16:22                   ` Rik van Riel
  1 sibling, 0 replies; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-11-17  8:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Linus Torvalds, LKML, linux-mm,
	Gene Heskett

>> > How about resetting zone->recent_scanned/rotated to be some value calculated from
>> > INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?
>>
>> in get_scan_ratio()
>>
> But active/inactive ratio (and mapped_ratio) is not handled there.

Yes.
I think akpm pointed out just this point.


> Follwoing 2 will return the same scan ratio.
> ==case 1==
>  active_anon = 480M
>  inactive_anon = 32M
>  active_file = 2M
>  inactive_file = 510M
>
> ==case 2==
>  active_anon = 480M
>  inactive_anon = 32M
>  active_file = 480M
>  inactive_file = 32M
> ==

Yes.
the patch handle this situation by special priority.


Umm..
Perhaps, I am missing your point?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  6:54             ` KAMEZAWA Hiroyuki
  2008-11-17  7:03               ` KOSAKI Motohiro
@ 2008-11-17 16:22               ` Linus Torvalds
  2008-11-17 16:27                 ` Rik van Riel
                                   ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Linus Torvalds @ 2008-11-17 16:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, LKML, linux-mm,
	Gene Heskett


On Mon, 17 Nov 2008, KAMEZAWA Hiroyuki wrote:
> 
> How about resetting zone->recent_scanned/rotated to be some value calculated from
> INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?

.. or how about just considering the act of adding a new page to the LRU 
to be a "scan" event? IOW, "scanning" is not necessarily just an act of 
the VM looking for pages to free, but would be a more general "activity" 
meter.

IOW, when we calculate the percentages of anon-vs-file in get_scan_ratio() 
we take into account how much anon-page activity vs how much file cache 
activity there has been.

So if we've seen a lot of filesystem activity ("streaming"), we would tend 
to prefer to scan the page cache. If we've seen a lot of anon page 
mapping, we'd tend to prefer to scan the anon side.

That would seem to be the right kind of thing to do: if we literally have 
a load that only does streaming and pages never get moved to the active 
LRU, it should basically keep the page cache close to constant size - 
which is just another way of saying that we should only be scanning page 
cache pages.

Hmm? 

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17  8:22                 ` KAMEZAWA Hiroyuki
  2008-11-17  8:32                   ` KOSAKI Motohiro
@ 2008-11-17 16:22                   ` Rik van Riel
  1 sibling, 0 replies; 27+ messages in thread
From: Rik van Riel @ 2008-11-17 16:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, Andrew Morton, Linus Torvalds, LKML, linux-mm,
	Gene Heskett

KAMEZAWA Hiroyuki wrote:
> On Mon, 17 Nov 2008 16:03:48 +0900
> "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote:
>>> How about resetting zone->recent_scanned/rotated to be some value calculated from
>>> INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?
>> in get_scan_ratio()
>>
> But active/inactive ratio (and mapped_ratio) is not handled there.
> 
> Follwoing 2 will return the same scan ratio.

get_scan_ratio does not look at the sizes of the lists, but
at the ratio between "pages scanned" and "pages rotated".

A page that is moved from the inactive to the active list
is always counted as rotated.

A page that is moved from the active to the inactive list
is counted as rotated if it was mapped and referenced.

> ==case 1==
>   active_anon = 480M
>   inactive_anon = 32M
>   active_file = 2M
>   inactive_file = 510M
> 
> ==case 2==
>   active_anon = 480M
>   inactive_anon = 32M
>   active_file = 480M
>   inactive_file = 32M
> ==
> 
> 
> 
> -Kame
> 


-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17 16:22               ` Linus Torvalds
@ 2008-11-17 16:27                 ` Rik van Riel
  2008-11-17 16:37                 ` Linus Torvalds
  2008-11-18  0:06                 ` [PATCH] vmscan: evict streaming IO first Rik van Riel
  2 siblings, 0 replies; 27+ messages in thread
From: Rik van Riel @ 2008-11-17 16:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett

Linus Torvalds wrote:
> 
> On Mon, 17 Nov 2008, KAMEZAWA Hiroyuki wrote:
>> How about resetting zone->recent_scanned/rotated to be some value calculated from
>> INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?
> 
> .. or how about just considering the act of adding a new page to the LRU 
> to be a "scan" event? IOW, "scanning" is not necessarily just an act of 
> the VM looking for pages to free, but would be a more general "activity" 
> meter.

That might work.

Adding a new page to the inactive file list would increment
zone->recent_scanned[file].

Adding a new anonymous page to the active anon list could
increment both zone->recent_scanned[anon] and
zone->recent_rotated[anon].

That way adding anonymous memory would move some pressure
to the file side, while doing lots of streaming IO would
result in the same.

> That would seem to be the right kind of thing to do: if we literally have 
> a load that only does streaming and pages never get moved to the active 
> LRU, it should basically keep the page cache close to constant size - 
> which is just another way of saying that we should only be scanning page 
> cache pages.

The only thing left at that point is the fact that the
streaming IO puts pressure on the working set in the
page cache, by always decreasing the active file list
too.  But that's a different thing entirely and can be
looked at later :)

Your idea looks like it should work.  I'll whip up a patch
for Gene Heskett.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17 16:22               ` Linus Torvalds
  2008-11-17 16:27                 ` Rik van Riel
@ 2008-11-17 16:37                 ` Linus Torvalds
  2008-11-17 16:54                   ` Rik van Riel
  2008-11-18  0:06                 ` [PATCH] vmscan: evict streaming IO first Rik van Riel
  2 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2008-11-17 16:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, LKML, linux-mm,
	Gene Heskett


On Mon, 17 Nov 2008, Linus Torvalds wrote:
> On Mon, 17 Nov 2008, KAMEZAWA Hiroyuki wrote:
> > 
> > How about resetting zone->recent_scanned/rotated to be some value calculated from
> > INACTIVE_ANON/INACTIVE_FILE at some time (when the system is enough idle) ?
> 
> .. or how about just considering the act of adding a new page to the LRU 
> to be a "scan" event? IOW, "scanning" is not necessarily just an act of 
> the VM looking for pages to free, but would be a more general "activity" 
> meter.

Another thing strikes me: it looks like the logic in "get_scan_ratio()" 
has a tendency to get unbalanced - if we end up deciding that we should 
scan a lot of anonymous pages, the scan numbers for anonymous pages will 
go up, and we get even _more_ eager to scan those. Of course, "rotate" 
events will then make us less likely again, but for streaming loads, you 
wouldn't expect to see those at all.

There seems to be another bug there wrt the "aging" - we age anon page 
events and file page events independently, which sounds like it would make 
the math totally nonsensical. We do that whole

	anon / (anon + file)

thing, but since anon and file counts are aged independently, that "math" 
is not math, it looks like a totally random number that has no meaning.

So instead of having two independent aging things, if we age one side, we 
should age the other. No?

But maybe I'm looking at it wrong. It doesn't seem sensible to me, but 
maybe there's some deeper truth in there somewhere that I'm missing..

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17 16:37                 ` Linus Torvalds
@ 2008-11-17 16:54                   ` Rik van Riel
  2008-11-17 17:06                     ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Rik van Riel @ 2008-11-17 16:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett

Linus Torvalds wrote:

> Another thing strikes me: it looks like the logic in "get_scan_ratio()" 
> has a tendency to get unbalanced - if we end up deciding that we should 
> scan a lot of anonymous pages, the scan numbers for anonymous pages will 
> go up, and we get even _more_ eager to scan those. Of course, "rotate" 
> events will then make us less likely again, but for streaming loads, you 
> wouldn't expect to see those at all.

True for streaming loads - if we scan the file list and find
mostly pages from streaming loads, we will become more eager
to scan the file list.

I do not expect streaming loads to ever hit the anon list in
the same way, because anonymous pages start out referenced and
on the active list, which means an anonymous deactivation will
always be counted as a rotate event.

> There seems to be another bug there wrt the "aging" - we age anon page 
> events and file page events independently, which sounds like it would make 
> the math totally nonsensical. We do that whole
> 
> 	anon / (anon + file)

That's an outdated comment.  Andrew had a patch to update that
comment, but it must have gotten lost somewhere.  I'll send you
a patch to update it.

If you look at the actual calculation, you'l see that the
scan percentages are keyed off just swappiness and the
rotated/scanned ratios for each page category.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: evict streaming IO cache first
  2008-11-17 16:54                   ` Rik van Riel
@ 2008-11-17 17:06                     ` Linus Torvalds
  2008-11-17 17:16                       ` [PATCH] vmscan: fix get_scan_ratio comment Rik van Riel
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2008-11-17 17:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett


On Mon, 17 Nov 2008, Rik van Riel wrote:

> Linus Torvalds wrote:
> 
> > Another thing strikes me: it looks like the logic in "get_scan_ratio()" has
> > a tendency to get unbalanced - if we end up deciding that we should scan a
> > lot of anonymous pages, the scan numbers for anonymous pages will go up, and
> > we get even _more_ eager to scan those. Of course, "rotate" events will then
> > make us less likely again, but for streaming loads, you wouldn't expect to
> > see those at all.
> 
> True for streaming loads - if we scan the file list and find
> mostly pages from streaming loads, we will become more eager
> to scan the file list.

The "count adding as activity" might hide that, but it does seem a big 
iffy.

> > There seems to be another bug there wrt the "aging" - we age anon page
> > events and file page events independently, which sounds like it would make
> > the math totally nonsensical. We do that whole
> > 
> > 	anon / (anon + file)
> 
> That's an outdated comment.  Andrew had a patch to update that
> comment, but it must have gotten lost somewhere.  I'll send you
> a patch to update it.
> 
> If you look at the actual calculation, you'l see that the
> scan percentages are keyed off just swappiness and the
> rotated/scanned ratios for each page category.

Ok, that makes sense. Yes, as ratios the math looks valid.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] vmscan: fix get_scan_ratio comment
  2008-11-17 17:06                     ` Linus Torvalds
@ 2008-11-17 17:16                       ` Rik van Riel
  2008-11-17 17:32                         ` KOSAKI Motohiro
  2008-11-17 17:35                         ` Linus Torvalds
  0 siblings, 2 replies; 27+ messages in thread
From: Rik van Riel @ 2008-11-17 17:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: get-scan-ratio-comment.patch --]
[-- Type: text/plain, Size: 910 bytes --]

Fix the old comment on the scan ratio calculations.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.28-rc5/mm/vmscan.c
===================================================================
--- linux-2.6.28-rc5.orig/mm/vmscan.c	2008-11-16 17:47:13.000000000 -0500
+++ linux-2.6.28-rc5/mm/vmscan.c	2008-11-17 12:05:03.000000000 -0500
@@ -1386,9 +1386,9 @@ static void get_scan_ratio(struct zone *
 	file_prio = 200 - sc->swappiness;
 
 	/*
-	 *                  anon       recent_rotated[0]
-	 * %anon = 100 * ----------- / ----------------- * IO cost
-	 *               anon + file      rotate_sum
+	 *         recent_scanned[anon]
+	 * %anon = -------------------- * sc->swappiness
+	 *         recent_rotated[anon]
 	 */
 	ap = (anon_prio + 1) * (zone->recent_scanned[0] + 1);
 	ap /= zone->recent_rotated[0] + 1;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] vmscan: fix get_scan_ratio comment
  2008-11-17 17:16                       ` [PATCH] vmscan: fix get_scan_ratio comment Rik van Riel
@ 2008-11-17 17:32                         ` KOSAKI Motohiro
  2008-11-17 17:35                         ` Linus Torvalds
  1 sibling, 0 replies; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-11-17 17:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, KAMEZAWA Hiroyuki, Andrew Morton, LKML, linux-mm,
	Gene Heskett

2008/11/18 Rik van Riel <riel@redhat.com>:
>
> Fix the old comment on the scan ratio calculations.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/vmscan.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.28-rc5/mm/vmscan.c
> ===================================================================
> --- linux-2.6.28-rc5.orig/mm/vmscan.c   2008-11-16 17:47:13.000000000 -0500
> +++ linux-2.6.28-rc5/mm/vmscan.c        2008-11-17 12:05:03.000000000 -0500
> @@ -1386,9 +1386,9 @@ static void get_scan_ratio(struct zone *
>        file_prio = 200 - sc->swappiness;
>
>        /*
> -        *                  anon       recent_rotated[0]
> -        * %anon = 100 * ----------- / ----------------- * IO cost
> -        *               anon + file      rotate_sum
> +        *         recent_scanned[anon]
> +        * %anon = -------------------- * sc->swappiness
> +        *         recent_rotated[anon]
>         */
>        ap = (anon_prio + 1) * (zone->recent_scanned[0] + 1);
>        ap /= zone->recent_rotated[0] + 1;

looks good to me, absolutely.
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] vmscan: fix get_scan_ratio comment
  2008-11-17 17:16                       ` [PATCH] vmscan: fix get_scan_ratio comment Rik van Riel
  2008-11-17 17:32                         ` KOSAKI Motohiro
@ 2008-11-17 17:35                         ` Linus Torvalds
  2008-11-17 18:53                           ` Rik van Riel
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2008-11-17 17:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett


Grr. Don't use attachments. Now I can't sanely reply and quote things 
with a sane default setting (yes, I could set "include attachements in 
reply", but that is _not_ a sane default setting, and anybody who claims 
it is is a moron).

Why don't people learn?

Anyway, without quoting, the thing is - your fix isn't any better. The 
more interesting part is how the fractions get combined, and that is 
indeed approximately "anon% = anon / (anon + file)".

So you in many ways made the comment worse. It wasn't good before, but 
it's still not good, and now it comments on the part that isn't even 
interesting (ie it comments the _trivial_ fractional part)

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] vmscan: fix get_scan_ratio comment
  2008-11-17 17:35                         ` Linus Torvalds
@ 2008-11-17 18:53                           ` Rik van Riel
  2008-11-17 20:23                             ` Rik van Riel
  0 siblings, 1 reply; 27+ messages in thread
From: Rik van Riel @ 2008-11-17 18:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett

Linus Torvalds wrote:

> Anyway, without quoting, the thing is - your fix isn't any better. The 
> more interesting part is how the fractions get combined, and that is 
> indeed approximately "anon% = anon / (anon + file)".

Well, the "anon" and "file" in that calculation are the
scanned/rotated ratios for anon and file pages, not the
sizes of the lists.

> So you in many ways made the comment worse. It wasn't good before, but 
> it's still not good, and now it comments on the part that isn't even 
> interesting (ie it comments the _trivial_ fractional part)

How about something like the following: ?

/*
  * The amount of pressure on anon vs file pages is inversely
  * proportional to the fraction of recently scanned pages on
  * each list that were recently referenced and in active use.
  */

(I'll mail the next patches out with claws-mail - silly thunderbird)

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] vmscan: fix get_scan_ratio comment
  2008-11-17 18:53                           ` Rik van Riel
@ 2008-11-17 20:23                             ` Rik van Riel
  0 siblings, 0 replies; 27+ messages in thread
From: Rik van Riel @ 2008-11-17 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett

Fix the old comment on the scan ratio calculations.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.28-rc5/mm/vmscan.c
===================================================================
--- linux-2.6.28-rc5.orig/mm/vmscan.c	2008-11-16 17:47:13.000000000 -0500
+++ linux-2.6.28-rc5/mm/vmscan.c	2008-11-17 15:22:22.000000000 -0500
@@ -1386,9 +1386,9 @@ static void get_scan_ratio(struct zone *
 	file_prio = 200 - sc->swappiness;
 
 	/*
-	 *                  anon       recent_rotated[0]
-	 * %anon = 100 * ----------- / ----------------- * IO cost
-	 *               anon + file      rotate_sum
+	 * The amount of pressure on anon vs file pages is inversely
+	 * proportional to the fraction of recently scanned pages on
+	 * each list that were recently referenced and in active use.
 	 */
 	ap = (anon_prio + 1) * (zone->recent_scanned[0] + 1);
 	ap /= zone->recent_rotated[0] + 1;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] vmscan: evict streaming IO first
  2008-11-17 16:22               ` Linus Torvalds
  2008-11-17 16:27                 ` Rik van Riel
  2008-11-17 16:37                 ` Linus Torvalds
@ 2008-11-18  0:06                 ` Rik van Riel
  2008-12-01 20:24                   ` Andrew Morton
  2 siblings, 1 reply; 27+ messages in thread
From: Rik van Riel @ 2008-11-18  0:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton, LKML,
	linux-mm, Gene Heskett

Count the insertion of new pages in the statistics used to drive the
pageout scanning code.  This should help the kernel quickly evict
streaming file IO.

We count on the fact that new file pages start on the inactive file
LRU and new anonymous pages start on the active anon list.  This
means streaming file IO will increment the recent scanned file
statistic, while leaving the recent rotated file statistic alone,
driving pageout scanning to the file LRUs.

Pageout activity does its own list manipulation.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 mm/swap.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

On Mon, 17 Nov 2008 08:22:13 -0800 (PST)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> .. or how about just considering the act of adding a new page to the LRU 
> to be a "scan" event? IOW, "scanning" is not necessarily just an act of 
> the VM looking for pages to free, but would be a more general "activity" 
> meter.

Linus, this should implement your idea.  

Gene, does this patch resolve the problem for you?

Index: linux-2.6.28-rc5/mm/swap.c
===================================================================
--- linux-2.6.28-rc5.orig/mm/swap.c	2008-11-16 17:47:13.000000000 -0500
+++ linux-2.6.28-rc5/mm/swap.c	2008-11-17 18:58:32.000000000 -0500
@@ -445,6 +445,7 @@ void ____pagevec_lru_add(struct pagevec 
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
 		struct zone *pagezone = page_zone(page);
+		int file;
 
 		if (pagezone != zone) {
 			if (zone)
@@ -456,8 +457,12 @@ void ____pagevec_lru_add(struct pagevec 
 		VM_BUG_ON(PageUnevictable(page));
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
-		if (is_active_lru(lru))
+		file = is_file_lru(lru);
+		zone->recent_scanned[file]++;
+		if (is_active_lru(lru)) {
 			SetPageActive(page);
+			zone->recent_rotated[file]++;
+		}
 		add_page_to_lru_list(zone, page, lru);
 	}
 	if (zone)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] vmscan: evict streaming IO first
  2008-11-18  0:06                 ` [PATCH] vmscan: evict streaming IO first Rik van Riel
@ 2008-12-01 20:24                   ` Andrew Morton
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2008-12-01 20:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: torvalds, kamezawa.hiroyu, kosaki.motohiro, linux-kernel,
	linux-mm, gene.heskett

On Mon, 17 Nov 2008 19:06:42 -0500
Rik van Riel <riel@redhat.com> wrote:

> Count the insertion of new pages in the statistics used to drive the
> pageout scanning code.  This should help the kernel quickly evict
> streaming file IO.
> 
> We count on the fact that new file pages start on the inactive file
> LRU and new anonymous pages start on the active anon list.  This
> means streaming file IO will increment the recent scanned file
> statistic, while leaving the recent rotated file statistic alone,
> driving pageout scanning to the file LRUs.
> 
> Pageout activity does its own list manipulation.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/swap.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> On Mon, 17 Nov 2008 08:22:13 -0800 (PST)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > .. or how about just considering the act of adding a new page to the LRU 
> > to be a "scan" event? IOW, "scanning" is not necessarily just an act of 
> > the VM looking for pages to free, but would be a more general "activity" 
> > meter.
> 
> Linus, this should implement your idea.  
> 
> Gene, does this patch resolve the problem for you?

Has Gene had a chance to confirm this yet?

> Index: linux-2.6.28-rc5/mm/swap.c
> ===================================================================
> --- linux-2.6.28-rc5.orig/mm/swap.c	2008-11-16 17:47:13.000000000 -0500
> +++ linux-2.6.28-rc5/mm/swap.c	2008-11-17 18:58:32.000000000 -0500
> @@ -445,6 +445,7 @@ void ____pagevec_lru_add(struct pagevec 
>  	for (i = 0; i < pagevec_count(pvec); i++) {
>  		struct page *page = pvec->pages[i];
>  		struct zone *pagezone = page_zone(page);
> +		int file;
>  
>  		if (pagezone != zone) {
>  			if (zone)
> @@ -456,8 +457,12 @@ void ____pagevec_lru_add(struct pagevec 
>  		VM_BUG_ON(PageUnevictable(page));
>  		VM_BUG_ON(PageLRU(page));
>  		SetPageLRU(page);
> -		if (is_active_lru(lru))
> +		file = is_file_lru(lru);
> +		zone->recent_scanned[file]++;
> +		if (is_active_lru(lru)) {
>  			SetPageActive(page);
> +			zone->recent_rotated[file]++;
> +		}
>  		add_page_to_lru_list(zone, page, lru);
>  	}
>  	if (zone)

Were you not able to reproduce the problem?  It looks like it'd be a
pretty simple test case to set up?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-12-01 20:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-15  9:38 [PATCH] mm: evict streaming IO cache first KOSAKI Motohiro
2008-11-16  0:58 ` Rik van Riel
2008-11-16  5:00 ` Andrew Morton
2008-11-16 18:15   ` Linus Torvalds
2008-11-16 21:20     ` Rik van Riel
2008-11-16 21:28       ` Hugh Dickins
2008-11-17  4:47       ` Andrew Morton
2008-11-17  6:19         ` Nick Piggin
2008-11-17  6:30         ` KAMEZAWA Hiroyuki
2008-11-17  6:39           ` KOSAKI Motohiro
2008-11-17  6:54             ` KAMEZAWA Hiroyuki
2008-11-17  7:03               ` KOSAKI Motohiro
2008-11-17  8:22                 ` KAMEZAWA Hiroyuki
2008-11-17  8:32                   ` KOSAKI Motohiro
2008-11-17 16:22                   ` Rik van Riel
2008-11-17 16:22               ` Linus Torvalds
2008-11-17 16:27                 ` Rik van Riel
2008-11-17 16:37                 ` Linus Torvalds
2008-11-17 16:54                   ` Rik van Riel
2008-11-17 17:06                     ` Linus Torvalds
2008-11-17 17:16                       ` [PATCH] vmscan: fix get_scan_ratio comment Rik van Riel
2008-11-17 17:32                         ` KOSAKI Motohiro
2008-11-17 17:35                         ` Linus Torvalds
2008-11-17 18:53                           ` Rik van Riel
2008-11-17 20:23                             ` Rik van Riel
2008-11-18  0:06                 ` [PATCH] vmscan: evict streaming IO first Rik van Riel
2008-12-01 20:24                   ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox