* [-mm][PATCH 1/5] fix overflow problem of do_try_to_free_page()
2008-05-04 12:53 [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
@ 2008-05-04 12:55 ` KOSAKI Motohiro
2008-05-05 8:12 ` Nishanth Aravamudan
2008-05-04 12:57 ` [-mm][PATCH 2/5] introduce get_vm_event() KOSAKI Motohiro
` (4 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 12:55 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Nishanth Aravamudan; +Cc: kosaki.motohiro
This patch is not part of the reclaim throttle series; it is merely a hotfix.
---------------------------------------
"Smarter retry of costly-order allocations" patch series change
behaver of do_try_to_free_pages().
but unfortunately ret variable type unchanged.
thus, overflow problem is possible.
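For illustration only (a stand-alone userspace sketch, not kernel code, and the
numbers are made up): storing a large unsigned long page count in a 32-bit int
silently truncates it, which is the class of bug the one-line type change below
avoids.

#include <stdio.h>

int main(void)
{
	/* hypothetical reclaim count that does not fit in 32 bits (LP64) */
	unsigned long nr_reclaimed = 0x100000001UL;

	int old_ret = nr_reclaimed;		/* old type: value truncated */
	unsigned long new_ret = nr_reclaimed;	/* patched type: preserved */

	printf("int ret           = %d\n", old_ret);
	printf("unsigned long ret = %lu\n", new_ret);
	return 0;
}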
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Nishanth Aravamudan <nacc@us.ibm.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-05-03 00:02:52.000000000 +0900
+++ b/mm/vmscan.c 2008-05-03 00:05:08.000000000 +0900
@@ -1317,7 +1317,7 @@ static unsigned long do_try_to_free_page
struct scan_control *sc)
{
int priority;
- int ret = 0;
+ unsigned long ret = 0;
unsigned long total_scanned = 0;
unsigned long nr_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
--
* Re: [-mm][PATCH 1/5] fix overflow problem of do_try_to_free_page()
2008-05-04 12:55 ` [-mm][PATCH 1/5] fix overflow problem of do_try_to_free_page() KOSAKI Motohiro
@ 2008-05-05 8:12 ` Nishanth Aravamudan
2008-05-06 3:29 ` KOSAKI Motohiro
0 siblings, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2008-05-05 8:12 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
On 04.05.2008 [21:55:57 +0900], KOSAKI Motohiro wrote:
> this patch is not part of reclaim throttle series.
> it is merely hotfixs.
>
> ---------------------------------------
> "Smarter retry of costly-order allocations" patch series change
> behaver of do_try_to_free_pages().
> but unfortunately ret variable type unchanged.
>
> thus, overflow problem is possible.
>
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Nishanth Aravamudan <nacc@us.ibm.com>
Eep, sorry -- my original version had used -EAGAIN to indicate a special
condition, but this was removed before the final patch. Thanks for the
catch.
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Should go upstream, as well.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
* Re: [-mm][PATCH 1/5] fix overflow problem of do_try_to_free_page()
2008-05-05 8:12 ` Nishanth Aravamudan
@ 2008-05-06 3:29 ` KOSAKI Motohiro
0 siblings, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-06 3:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: kosaki.motohiro, LKML, linux-mm, Nishanth Aravamudan
Hi Andrew,
I'll repost patches 2-5 after reflecting the reviewer comments, but I hope
patch [1/5] can be merged into -mm soon. Nishanth-san has already acked it.
Please pick it up.
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > CC: Nishanth Aravamudan <nacc@us.ibm.com>
>
> Eep, sorry -- my original version had used -EAGAIN to indicate a special
> condition, but this was removed before the final patch. Thanks for the
> catch.
>
> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
> Should go upstream, as well.
--
* [-mm][PATCH 2/5] introduce get_vm_event()
2008-05-04 12:53 [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
2008-05-04 12:55 ` [-mm][PATCH 1/5] fix overflow problem of do_try_to_free_page() KOSAKI Motohiro
@ 2008-05-04 12:57 ` KOSAKI Motohiro
2008-05-05 21:47 ` Rik van Riel
2008-05-04 12:58 ` [-mm][PATCH 3/5] change function prototype of shrink_zone() KOSAKI Motohiro
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 12:57 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
Introduce get_vm_event(), a new function that makes it easy to read a single VM event statistic.
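A hedged sketch of intended usage (kernel context, so not compilable on its
own; pages_freed_during() is a made-up helper, while get_vm_event() and PGFREE
come from the patch below and the existing vm_event_item enum). The reclaim
throttle patch later in this series uses the helper the same way:

#include <linux/vmstat.h>

/* how many pages were freed system-wide while 'work' ran */
static unsigned long pages_freed_during(void (*work)(void))
{
	unsigned long before = get_vm_event(PGFREE);

	work();

	/* event counters only grow, so the delta is the activity */
	return get_vm_event(PGFREE) - before;
}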
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/vmstat.h | 7 ++++++-
mm/vmstat.c | 14 ++++++++++++++
2 files changed, 20 insertions(+), 1 deletion(-)
Index: b/include/linux/vmstat.h
===================================================================
--- a/include/linux/vmstat.h 2008-05-02 23:40:39.000000000 +0900
+++ b/include/linux/vmstat.h 2008-05-03 00:23:17.000000000 +0900
@@ -92,6 +92,8 @@ static inline void vm_events_fold_cpu(in
}
#endif
+unsigned long get_vm_event(enum vm_event_item event_type);
+
#else
/* Disable counters */
@@ -113,7 +115,10 @@ static inline void all_vm_events(unsigne
static inline void vm_events_fold_cpu(int cpu)
{
}
-
+static inline unsigned long get_vm_event(enum vm_event_item event_type)
+{
+ return 0;
+}
#endif /* CONFIG_VM_EVENT_COUNTERS */
#define __count_zone_vm_events(item, zone, delta) \
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c 2008-05-02 23:40:39.000000000 +0900
+++ b/mm/vmstat.c 2008-05-02 23:43:28.000000000 +0900
@@ -46,6 +46,20 @@ void all_vm_events(unsigned long *ret)
}
EXPORT_SYMBOL_GPL(all_vm_events);
+unsigned long get_vm_event(enum vm_event_item event_type)
+{
+ int cpu;
+ unsigned long ret = 0;
+
+ for_each_cpu_mask(cpu, cpu_online_map) {
+ struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
+
+ ret += this->event[event_type];
+ }
+
+ return ret;
+}
+
#ifdef CONFIG_HOTPLUG
/*
* Fold the foreign cpu events into our own.
--
* Re: [-mm][PATCH 2/5] introduce get_vm_event()
2008-05-04 12:57 ` [-mm][PATCH 2/5] introduce get_vm_event() KOSAKI Motohiro
@ 2008-05-05 21:47 ` Rik van Riel
0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2008-05-05 21:47 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
On Sun, 04 May 2008 21:57:14 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> introduce get_vm_event() new function for easy use vm statics.
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
--
All Rights Reversed
--
* [-mm][PATCH 3/5] change function prototype of shrink_zone()
2008-05-04 12:53 [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
2008-05-04 12:55 ` [-mm][PATCH 1/5] fix overflow problem of do_try_to_free_page() KOSAKI Motohiro
2008-05-04 12:57 ` [-mm][PATCH 2/5] introduce get_vm_event() KOSAKI Motohiro
@ 2008-05-04 12:58 ` KOSAKI Motohiro
2008-05-05 4:42 ` minchan Kim
2008-05-04 12:59 ` [-mm][PATCH 4/5] core of reclaim throttle KOSAKI Motohiro
` (2 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 12:58 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
Change the function return type in preparation for the following enhancement.
This patch has no behavior change.
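A simplified sketch of the new calling convention (reclaim_once() is a
hypothetical wrapper, not part of the patch; the real caller is the
do_try_to_free_pages() hunk below):

/* after this patch, reclaimed pages accumulate in sc->nr_reclaimed and
 * shrink_zones()'s int return carries only an error/early-stop signal */
static unsigned long reclaim_once(int priority, struct zonelist *zonelist,
				  struct scan_control *sc)
{
	int err = shrink_zones(priority, zonelist, sc);

	if (err == -EAGAIN)	/* early-stop signal, used by patch [4/5] */
		return 1;

	return sc->nr_reclaimed;
}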
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/vmscan.c | 47 ++++++++++++++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 15 deletions(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-05-02 23:46:25.000000000 +0900
+++ b/mm/vmscan.c 2008-05-03 00:00:32.000000000 +0900
@@ -51,6 +51,9 @@ struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
unsigned long nr_scanned;
+ /* number of reclaimed pages by this scanning */
+ unsigned long nr_reclaimed;
+
/* This context's GFP mask */
gfp_t gfp_mask;
@@ -1177,8 +1180,8 @@ static void shrink_active_list(unsigned
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
-static unsigned long shrink_zone(int priority, struct zone *zone,
- struct scan_control *sc)
+static int shrink_zone(int priority, struct zone *zone,
+ struct scan_control *sc)
{
unsigned long nr_active;
unsigned long nr_inactive;
@@ -1236,8 +1239,9 @@ static unsigned long shrink_zone(int pri
}
}
+ sc->nr_reclaimed += nr_reclaimed;
throttle_vm_writeout(sc->gfp_mask);
- return nr_reclaimed;
+ return 0;
}
/*
@@ -1251,18 +1255,23 @@ static unsigned long shrink_zone(int pri
* b) The zones may be over pages_high but they must go *over* pages_high to
* satisfy the `incremental min' zone defense algorithm.
*
- * Returns the number of reclaimed pages.
+ * @priority: reclaim priority
+ * @zonelist: list of shrinking zones
+ * @sc: scan control context
+ * @ret_reclaimed: the number of reclaimed pages.
+ *
+ * Returns nonzero if an error happened.
*
* If a zone is deemed to be full of pinned pages then just give it a light
* scan then give up on it.
*/
-static unsigned long shrink_zones(int priority, struct zonelist *zonelist,
- struct scan_control *sc)
+static int shrink_zones(int priority, struct zonelist *zonelist,
+ struct scan_control *sc)
{
enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
- unsigned long nr_reclaimed = 0;
struct zoneref *z;
struct zone *zone;
+ int ret = 0;
sc->all_unreclaimable = 1;
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
@@ -1291,10 +1300,13 @@ static unsigned long shrink_zones(int pr
priority);
}
- nr_reclaimed += shrink_zone(priority, zone, sc);
+ ret = shrink_zone(priority, zone, sc);
+ if (ret)
+ goto out;
}
- return nr_reclaimed;
+out:
+ return ret;
}
/*
@@ -1319,12 +1331,12 @@ static unsigned long do_try_to_free_page
int priority;
int ret = 0;
unsigned long total_scanned = 0;
- unsigned long nr_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
unsigned long lru_pages = 0;
struct zoneref *z;
struct zone *zone;
enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
+ int err;
if (scan_global_lru(sc))
count_vm_event(ALLOCSTALL);
@@ -1346,7 +1358,12 @@ static unsigned long do_try_to_free_page
sc->nr_scanned = 0;
if (!priority)
disable_swap_token();
- nr_reclaimed += shrink_zones(priority, zonelist, sc);
+ err = shrink_zones(priority, zonelist, sc);
+ if (err == -EAGAIN) {
+ ret = 1;
+ goto out;
+ }
+
/*
* Don't shrink slabs when reclaiming memory from
* over limit cgroups
@@ -1354,13 +1371,13 @@ static unsigned long do_try_to_free_page
if (scan_global_lru(sc)) {
shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages);
if (reclaim_state) {
- nr_reclaimed += reclaim_state->reclaimed_slab;
+ sc->nr_reclaimed += reclaim_state->reclaimed_slab;
reclaim_state->reclaimed_slab = 0;
}
}
total_scanned += sc->nr_scanned;
- if (nr_reclaimed >= sc->swap_cluster_max) {
- ret = nr_reclaimed;
+ if (sc->nr_reclaimed >= sc->swap_cluster_max) {
+ ret = sc->nr_reclaimed;
goto out;
}
@@ -1383,7 +1400,7 @@ static unsigned long do_try_to_free_page
}
/* top priority shrink_caches still had more to do? don't OOM, then */
if (!sc->all_unreclaimable && scan_global_lru(sc))
- ret = nr_reclaimed;
+ ret = sc->nr_reclaimed;
out:
/*
* Now that we've scanned all the zones at this priority level, note
--
* Re: [-mm][PATCH 3/5] change function prototype of shrink_zone()
2008-05-04 12:58 ` [-mm][PATCH 3/5] change function prototype of shrink_zone() KOSAKI Motohiro
@ 2008-05-05 4:42 ` minchan Kim
2008-05-05 8:31 ` KOSAKI Motohiro
0 siblings, 1 reply; 21+ messages in thread
From: minchan Kim @ 2008-05-05 4:42 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
> -static unsigned long shrink_zone(int priority, struct zone *zone,
> - struct scan_control *sc)
> +static int shrink_zone(int priority, struct zone *zone,
> + struct scan_control *sc)
> {
> unsigned long nr_active;
> unsigned long nr_inactive;
> @@ -1236,8 +1239,9 @@ static unsigned long shrink_zone(int pri
> }
> }
>
> + sc->nr_reclaimed += nr_reclaimed;
> throttle_vm_writeout(sc->gfp_mask);
> - return nr_reclaimed;
> + return 0;
> }
I am not sure this is right; I might be wrong if this patch depends on
another patch.
As I see it, shrink_zone() always returns 0 in your patch. If that is the
case, the return value is useless, and it would be better to change the
return type to "void".
Also, we would have to fix up the callers of shrink_zone() accordingly,
e.g. balance_pgdat() and __zone_reclaim(); those functions still use the
number of pages reclaimed by shrink_zone().
--
Thanks,
barrios
--
* Re: [-mm][PATCH 3/5] change function prototype of shrink_zone()
2008-05-05 4:42 ` minchan Kim
@ 2008-05-05 8:31 ` KOSAKI Motohiro
2008-05-05 8:37 ` minchan Kim
0 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-05 8:31 UTC (permalink / raw)
To: minchan Kim; +Cc: LKML, linux-mm, Andrew Morton
Hi
> > + sc->nr_reclaimed += nr_reclaimed;
> > throttle_vm_writeout(sc->gfp_mask);
> > - return nr_reclaimed;
> > + return 0;
> > }
>
> I am not sure this is right.
> I might be wrong if this patch is depended on another patch.
>
> As I see, shrink_zone always return 0 in your patch.
Yeah, this patch is just a preparation change for [4/5];
I use -EAGAIN in [4/5].
> If it is right, I think that return value is useless. It is better
> that we change function return type to "void"
> Also, we have to change functions that call shrink_zone properly. ex)
> balance_pgdat, __zone_reclaim
> That functions still use number of shrink_zone's reclaim page
This patch is not intended to be used on its own; it is split out this way
only to be bisect friendly. The feature that actually uses the new return
value is implemented in the following patch.
Thanks!
--
* Re: [-mm][PATCH 3/5] change function prototype of shrink_zone()
2008-05-05 8:31 ` KOSAKI Motohiro
@ 2008-05-05 8:37 ` minchan Kim
0 siblings, 0 replies; 21+ messages in thread
From: minchan Kim @ 2008-05-05 8:37 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
OK, I see.
I was in a hurry and didn't look over the following patches.
On Mon, May 5, 2008 at 5:31 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi
>
> > > + sc->nr_reclaimed += nr_reclaimed;
> > > throttle_vm_writeout(sc->gfp_mask);
> > > - return nr_reclaimed;
> > > + return 0;
> > > }
> >
> > I am not sure this is right.
> > I might be wrong if this patch is depended on another patch.
> >
> > As I see, shrink_zone always return 0 in your patch.
>
> Yeah, this patch is just preparetion change of [4/5].
> I use EAGAIN at [4/5].
>
>
> > If it is right, I think that return value is useless. It is better
> > that we change function return type to "void"
> > Also, we have to change functions that call shrink_zone properly. ex)
> > balance_pgdat, __zone_reclaim
> > That functions still use number of shrink_zone's reclaim page
>
> this patch is not intent by solo usage.
> just intent to bisect friendly.
> thus, We need implement that following patch use freature only.
>
> Thanks!
>
--
Thanks,
barrios
--
* [-mm][PATCH 4/5] core of reclaim throttle
2008-05-04 12:53 [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
` (2 preceding siblings ...)
2008-05-04 12:58 ` [-mm][PATCH 3/5] change function prototype of shrink_zone() KOSAKI Motohiro
@ 2008-05-04 12:59 ` KOSAKI Motohiro
2008-05-04 13:12 ` KOSAKI Motohiro
2008-05-04 13:01 ` [-mm][PATCH 5/5] introduce sysctl parameter of max task of throttle KOSAKI Motohiro
2008-05-04 14:38 ` [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
5 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 12:59 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
Add a throttle to shrink_zone() to improve performance and prevent spurious OOM kills.
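A minimal sketch of the throttling idea, pulled out of the shrink_zone() hunk
below into two hypothetical helpers (the patch itself open-codes this; the
helper names are illustrative):

/* at most MAX_RECLAIM_TASKS tasks reclaim from a zone at once;
 * the rest sleep on the zone's waitqueue until a slot opens */
static void reclaim_throttle_enter(struct zone *zone)
{
	wait_event(zone->reclaim_throttle_waitq,
		   atomic_add_unless(&zone->nr_reclaimers, 1,
				     MAX_RECLAIM_TASKS));
}

static void reclaim_throttle_exit(struct zone *zone)
{
	atomic_dec(&zone->nr_reclaimers);
	wake_up(&zone->reclaim_throttle_waitq);	/* let a waiter in */
}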
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mmzone.h | 2 +
include/linux/sched.h | 1
mm/Kconfig | 10 +++++++++
mm/page_alloc.c | 4 +++
mm/vmscan.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 68 insertions(+), 1 deletion(-)
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h 2008-05-03 00:39:44.000000000 +0900
+++ b/include/linux/mmzone.h 2008-05-03 00:44:12.000000000 +0900
@@ -328,6 +328,8 @@ struct zone {
unsigned long spanned_pages; /* total size, including holes */
unsigned long present_pages; /* amount of memory (excluding holes) */
+ atomic_t nr_reclaimers;
+ wait_queue_head_t reclaim_throttle_waitq;
/*
* rarely used fields:
*/
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c 2008-05-03 00:39:44.000000000 +0900
+++ b/mm/page_alloc.c 2008-05-03 00:44:12.000000000 +0900
@@ -3502,6 +3502,10 @@ static void __paginginit free_area_init_
zone->nr_scan_inactive = 0;
zap_zone_vm_stats(zone);
zone->flags = 0;
+
+ zone->nr_reclaimers = ATOMIC_INIT(0);
+ init_waitqueue_head(&zone->reclaim_throttle_waitq);
+
if (!size)
continue;
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-05-03 00:43:48.000000000 +0900
+++ b/mm/vmscan.c 2008-05-04 20:56:01.000000000 +0900
@@ -74,6 +74,11 @@ struct scan_control {
int order;
+ /* Can shrinking be cut off early if another task freed enough pages? */
+ int may_cut_off;
+
+ unsigned long was_freed;
+
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;
@@ -120,6 +125,7 @@ struct scan_control {
int vm_swappiness = 60;
long vm_total_pages; /* The total number of pages which the VM controls */
+#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
@@ -1187,7 +1193,42 @@ static int shrink_zone(int priority, str
unsigned long nr_inactive;
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+ int ret = 0;
+ int throttle_on = 0;
+ unsigned long freed;
+
+ /* avoid recursive wait_event() */
+ if (current->flags & PF_RECLAIMING)
+ goto shrinking;
+
+ throttle_on = 1;
+ current->flags |= PF_RECLAIMING;
+ wait_event(zone->reclaim_throttle_waitq,
+ atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS));
+
+ /* in some situations (e.g. hibernation), shrink processing shouldn't be
+ cut off even though a lot of memory was freed. */
+ if (!sc->may_cut_off)
+ goto shrinking;
+
+ /* kswapd is no related for user latency experience. */
+ if (current->flags & PF_KSWAPD)
+ goto shrinking;
+
+ /* reclaim still necessary? */
+ freed = get_vm_event(PGFREE);
+ if (scan_global_lru(sc) &&
+ ((freed - sc->was_freed) >= (zone->pages_high*4))) {
+ sc->was_freed = freed;
+
+ if (zone_watermark_ok(zone, sc->order, zone->pages_high,
+ gfp_zone(sc->gfp_mask), 0)) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ }
+shrinking:
if (scan_global_lru(sc)) {
/*
* Add one to nr_to_scan just to make sure that the kernel
@@ -1239,9 +1280,16 @@ static int shrink_zone(int priority, str
}
}
+out:
+ if (throttle_on) {
+ current->flags &= ~PF_RECLAIMING;
+ atomic_dec(&zone->nr_reclaimers);
+ wake_up(&zone->reclaim_throttle_waitq);
+ }
+
sc->nr_reclaimed += nr_reclaimed;
throttle_vm_writeout(sc->gfp_mask);
- return 0;
+ return ret;
}
/*
@@ -1438,6 +1486,8 @@ unsigned long try_to_free_pages(struct z
.order = order,
.mem_cgroup = NULL,
.isolate_pages = isolate_pages_global,
+ .may_cut_off = 1,
+ .was_freed = get_vm_event(PGFREE),
};
return do_try_to_free_pages(zonelist, &sc);
Index: b/mm/Kconfig
===================================================================
--- a/mm/Kconfig 2008-05-03 00:39:44.000000000 +0900
+++ b/mm/Kconfig 2008-05-03 00:44:12.000000000 +0900
@@ -205,3 +205,13 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config NR_MAX_RECLAIM_TASKS_PER_ZONE
+ int "maximum number of reclaiming tasks at the same time"
+ default 3
+ help
+ This value determines the number of threads which can do page reclaim
+ in a zone simultaneously. If this is too big, performance under heavy memory
+ pressure will decrease.
+ If unsure, use default.
+
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h 2008-05-03 00:39:44.000000000 +0900
+++ b/include/linux/sched.h 2008-05-03 00:44:12.000000000 +0900
@@ -1484,6 +1484,7 @@ static inline void put_task_struct(struc
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezeable */
+#define PF_RECLAIMING 0x80000000 /* The task holds a page reclaim throttling ticket */
/*
* Only the _current_ task can read/write to tsk->flags, but other
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-04 12:59 ` [-mm][PATCH 4/5] core of reclaim throttle KOSAKI Motohiro
@ 2008-05-04 13:12 ` KOSAKI Motohiro
2008-05-05 5:21 ` minchan Kim
2008-05-05 21:51 ` Rik van Riel
0 siblings, 2 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 13:12 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
Agghh!
I attached the old version to my last mail; please drop it.
The right patch is below.
------------------------------------------------------------------------
Add a throttle to shrink_zone() to improve performance and prevent spurious OOM kills.
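As a rough worked example of the re-check threshold used below (stand-alone
userspace code with made-up numbers; the real values come from the zone and
the allocation order):

#include <stdio.h>

int main(void)
{
	unsigned int order = 3;			/* hypothetical allocation order */
	unsigned long pages_high = 1000;	/* hypothetical zone->pages_high */

	/* the x4 factor keeps the watermark re-check rare */
	unsigned long threshold = ((1UL << order) + pages_high) * 4;

	printf("re-check the watermark after ~%lu pages freed\n", threshold);
	return 0;
}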
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mmzone.h | 2 +
include/linux/sched.h | 1
mm/Kconfig | 10 ++++++++
mm/page_alloc.c | 4 +++
mm/vmscan.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 72 insertions(+), 1 deletion(-)
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h 2008-05-03 00:39:44.000000000 +0900
+++ b/include/linux/mmzone.h 2008-05-03 00:44:12.000000000 +0900
@@ -328,6 +328,8 @@ struct zone {
unsigned long spanned_pages; /* total size, including holes */
unsigned long present_pages; /* amount of memory (excluding holes) */
+ atomic_t nr_reclaimers;
+ wait_queue_head_t reclaim_throttle_waitq;
/*
* rarely used fields:
*/
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c 2008-05-03 00:39:44.000000000 +0900
+++ b/mm/page_alloc.c 2008-05-03 00:44:12.000000000 +0900
@@ -3502,6 +3502,10 @@ static void __paginginit free_area_init_
zone->nr_scan_inactive = 0;
zap_zone_vm_stats(zone);
zone->flags = 0;
+
+ zone->nr_reclaimers = ATOMIC_INIT(0);
+ init_waitqueue_head(&zone->reclaim_throttle_waitq);
+
if (!size)
continue;
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-05-03 00:43:48.000000000 +0900
+++ b/mm/vmscan.c 2008-05-04 22:47:49.000000000 +0900
@@ -74,6 +74,11 @@ struct scan_control {
int order;
+ /* Can shrinking be cut off early if another task freed enough pages? */
+ int may_cut_off;
+
+ unsigned long was_freed;
+
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;
@@ -120,6 +125,7 @@ struct scan_control {
int vm_swappiness = 60;
long vm_total_pages; /* The total number of pages which the VM controls */
+#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
@@ -1187,7 +1193,46 @@ static int shrink_zone(int priority, str
unsigned long nr_inactive;
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+ int ret = 0;
+ int throttle_on = 0;
+ unsigned long freed;
+ unsigned long threshold;
+
+ /* avoid recursive wait_event() */
+ if (current->flags & PF_RECLAIMING)
+ goto shrinking;
+
+ throttle_on = 1;
+ current->flags |= PF_RECLAIMING;
+ wait_event(zone->reclaim_throttle_waitq,
+ atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS));
+
+ /* in some situations (e.g. hibernation), shrink processing shouldn't be
+ cut off even though a lot of memory was freed. */
+ if (!sc->may_cut_off)
+ goto shrinking;
+
+ /* kswapd is not relevant to user-perceived latency. */
+ if (current->flags & PF_KSWAPD)
+ goto shrinking;
+
+ /* the x4 ratio means we check only rarely,
+ because checking frequently hurts performance. */
+ threshold = ((1 << sc->order) + zone->pages_high) * 4;
+ freed = get_vm_event(PGFREE);
+
+ /* reclaim still necessary? */
+ if (scan_global_lru(sc) &&
+ freed - sc->was_freed >= threshold) {
+ if (zone_watermark_ok(zone, sc->order, zone->pages_high,
+ gfp_zone(sc->gfp_mask), 0)) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ sc->was_freed = freed;
+ }
+shrinking:
if (scan_global_lru(sc)) {
/*
* Add one to nr_to_scan just to make sure that the kernel
@@ -1239,9 +1284,16 @@ static int shrink_zone(int priority, str
}
}
+out:
+ if (throttle_on) {
+ current->flags &= ~PF_RECLAIMING;
+ atomic_dec(&zone->nr_reclaimers);
+ wake_up(&zone->reclaim_throttle_waitq);
+ }
+
sc->nr_reclaimed += nr_reclaimed;
throttle_vm_writeout(sc->gfp_mask);
- return 0;
+ return ret;
}
/*
@@ -1438,6 +1490,8 @@ unsigned long try_to_free_pages(struct z
.order = order,
.mem_cgroup = NULL,
.isolate_pages = isolate_pages_global,
+ .may_cut_off = 1,
+ .was_freed = get_vm_event(PGFREE),
};
return do_try_to_free_pages(zonelist, &sc);
Index: b/mm/Kconfig
===================================================================
--- a/mm/Kconfig 2008-05-03 00:39:44.000000000 +0900
+++ b/mm/Kconfig 2008-05-03 00:44:12.000000000 +0900
@@ -205,3 +205,13 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config NR_MAX_RECLAIM_TASKS_PER_ZONE
+ int "maximum number of reclaiming tasks at the same time"
+ default 3
+ help
+ This value determines the number of threads which can do page reclaim
+ in a zone simultaneously. If this is too big, performance under heavy memory
+ pressure will decrease.
+ If unsure, use default.
+
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h 2008-05-03 00:39:44.000000000 +0900
+++ b/include/linux/sched.h 2008-05-03 00:44:12.000000000 +0900
@@ -1484,6 +1484,7 @@ static inline void put_task_struct(struc
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezeable */
+#define PF_RECLAIMING 0x80000000 /* The task holds a page reclaim throttling ticket */
/*
* Only the _current_ task can read/write to tsk->flags, but other
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-04 13:12 ` KOSAKI Motohiro
@ 2008-05-05 5:21 ` minchan Kim
2008-05-05 8:24 ` KOSAKI Motohiro
2008-05-05 21:51 ` Rik van Riel
1 sibling, 1 reply; 21+ messages in thread
From: minchan Kim @ 2008-05-05 5:21 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
> @@ -120,6 +125,7 @@ struct scan_control {
> int vm_swappiness = 60;
> long vm_total_pages; /* The total number of pages which the VM controls */
>
> +#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
> static LIST_HEAD(shrinker_list);
> static DECLARE_RWSEM(shrinker_rwsem);
>
> @@ -1187,7 +1193,46 @@ static int shrink_zone(int priority, str
>
> unsigned long nr_inactive;
> unsigned long nr_to_scan;
> unsigned long nr_reclaimed = 0;
> + int ret = 0;
> + int throttle_on = 0;
> + unsigned long freed;
> + unsigned long threshold;
>
> +
> + /* avoid recursing wait_evnet */
> + if (current->flags & PF_RECLAIMING)
> + goto shrinking;
> +
> + throttle_on = 1;
> + current->flags |= PF_RECLAIMING;
> + wait_event(zone->reclaim_throttle_waitq,
> + atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS));
> +
> + /* in some situation (e.g. hibernation), shrink processing shouldn't be
> + cut off even though large memory freeded. */
> + if (!sc->may_cut_off)
> + goto shrinking;
> +
Where do you initialize may_cut_off?
In the current implementation may_cut_off is always 0, so we always goto shrinking.
--
Thanks,
barrios
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-05 5:21 ` minchan Kim
@ 2008-05-05 8:24 ` KOSAKI Motohiro
2008-05-05 8:32 ` minchan Kim
0 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-05 8:24 UTC (permalink / raw)
To: minchan Kim; +Cc: LKML, linux-mm, Andrew Morton
> > + /* in some situation (e.g. hibernation), shrink processing shouldn't be
> > + cut off even though large memory freeded. */
> > + if (!sc->may_cut_off)
> > + goto shrinking;
> > +
>
> where do you initialize may_cut_off ?
> Current Implementation, may_cut_off is always "0" so always goto shrinking
please see try_to_free_pages :)
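For reference, the relevant part of the try_to_free_pages() hunk from the
patch, elided to the two fields in question:

	struct scan_control sc = {
		/* ... other fields unchanged ... */
		.may_cut_off = 1,
		.was_freed = get_vm_event(PGFREE),
	};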
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-05 8:24 ` KOSAKI Motohiro
@ 2008-05-05 8:32 ` minchan Kim
0 siblings, 0 replies; 21+ messages in thread
From: minchan Kim @ 2008-05-05 8:32 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
I see.
My gmail client hid that part of the mail.
I am sorry :-)
On Mon, May 5, 2008 at 5:24 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> > > + /* in some situation (e.g. hibernation), shrink processing shouldn't be
> > > + cut off even though large memory freeded. */
> > > + if (!sc->may_cut_off)
> > > + goto shrinking;
> > > +
> >
> > where do you initialize may_cut_off ?
> > Current Implementation, may_cut_off is always "0" so always goto shrinking
>
> please see try_to_free_pages :)
>
--
Thanks,
barrios
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-04 13:12 ` KOSAKI Motohiro
2008-05-05 5:21 ` minchan Kim
@ 2008-05-05 21:51 ` Rik van Riel
2008-05-05 22:23 ` KOSAKI Motohiro
1 sibling, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2008-05-05 21:51 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
On Sun, 04 May 2008 22:12:12 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> + throttle_on = 1;
> + current->flags |= PF_RECLAIMING;
> + wait_event(zone->reclaim_throttle_waitq,
> + atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS));
This is a problem. Processes without __GFP_FS or __GFP_IO cannot wait on
processes that have those flags set in their gfp_mask, and tasks that do
not have __GFP_IO set cannot wait for tasks with it. This is because the
tasks that have those flags set may grab locks that the tasks without the
flag are holding, causing a deadlock.
The easiest fix would be to only make tasks with both __GFP_FS and __GFP_IO
sleep. Tasks that call try_to_free_pages without those flags are relatively
rare and should hopefully not cause any issues.
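A hedged sketch of what that check might look like (this is not Rik's code;
may_throttle_reclaim() is a hypothetical helper and the field name follows the
patch under review):

/* only tasks that may do both FS and IO are allowed to sleep on the
 * reclaim throttle; more restricted allocations skip it entirely, so
 * they can never wait behind a task that may hold locks they need */
static bool may_throttle_reclaim(const struct scan_control *sc)
{
	const gfp_t required = __GFP_FS | __GFP_IO;

	return (sc->gfp_mask & required) == required;
}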
--
All Rights Reversed
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-05 21:51 ` Rik van Riel
@ 2008-05-05 22:23 ` KOSAKI Motohiro
2008-05-06 0:43 ` Rik van Riel
0 siblings, 1 reply; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-05 22:23 UTC (permalink / raw)
To: Rik van Riel; +Cc: LKML, linux-mm, Andrew Morton
> > + throttle_on = 1;
> > + current->flags |= PF_RECLAIMING;
> > + wait_event(zone->reclaim_throttle_waitq,
> > + atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS));
>
> This is a problem. Processes without __GFP_FS or __GFP_IO cannot wait on
> processes that have those flags set in their gfp_mask, and tasks that do
> not have __GFP_IO set cannot wait for tasks with it. This is because the
> tasks that have those flags set may grab locks that the tasks without the
> flag are holding, causing a deadlock.
Hmmm, AFAIK, on the current kernel a __GFP_IO task sometimes waits for a
non-__GFP_IO task via lock_page().
Is this wrong?
That is why my patch only worries about the recursive reclaim situation.
I don't object to your opinion, but I want to understand it exactly.
> The easiest fix would be to only make tasks with both __GFP_FS and __GFP_IO
> sleep. Tasks that call try_to_free_pages without those flags are relatively
> rare and should hopefully not cause any issues.
Agreed, it's easy.
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-05 22:23 ` KOSAKI Motohiro
@ 2008-05-06 0:43 ` Rik van Riel
2008-05-06 1:01 ` KOSAKI Motohiro
0 siblings, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2008-05-06 0:43 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton
On Tue, 6 May 2008 07:23:18 +0900
"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> wrote:
> hmmm, AFAIK,
> on current kernel, sometimes __GFP_IO task wait for non __GFP_IO task
> by lock_page().
> Is this wrong?
This is fine.
The problem is adding a code path that causes non __GFP_IO tasks to
wait on __GFP_IO tasks. Then you can have a deadlock.
> therefore my patch care only recursive reclaim situation.
> I don't object to your opinion. but I hope understand exactly your opinion.
I believe not all non __GFP_IO or non __GFP_FS calls are recursive
reclaim, but there are some other code paths too. For example from
fs/buffer.c
--
All rights reversed.
--
* Re: [-mm][PATCH 4/5] core of reclaim throttle
2008-05-06 0:43 ` Rik van Riel
@ 2008-05-06 1:01 ` KOSAKI Motohiro
0 siblings, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-06 1:01 UTC (permalink / raw)
To: Rik van Riel; +Cc: LKML, linux-mm, Andrew Morton
> > hmmm, AFAIK,
> > on current kernel, sometimes __GFP_IO task wait for non __GFP_IO task
> > by lock_page().
> > Is this wrong?
>
> This is fine.
>
> The problem is adding a code path that causes non __GFP_IO tasks to
> wait on __GFP_IO tasks. Then you can have a deadlock.
Ah, OK.
I'll add a __GFP_FS and __GFP_IO check in the next post.
Thanks!
> > therefore my patch care only recursive reclaim situation.
> > I don't object to your opinion. but I hope understand exactly your opinion.
>
> I believe not all non __GFP_IO or non __GFP_FS calls are recursive
> reclaim, but there are some other code paths too. For example from
> fs/buffer.c
Absolutely.
--
* [-mm][PATCH 5/5] introduce sysctl parameter of max task of throttle
2008-05-04 12:53 [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
` (3 preceding siblings ...)
2008-05-04 12:59 ` [-mm][PATCH 4/5] core of reclaim throttle KOSAKI Motohiro
@ 2008-05-04 13:01 ` KOSAKI Motohiro
2008-05-04 14:38 ` [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
5 siblings, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 13:01 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
Introduce a sysctl parameter for the maximum number of tasks that may reclaim simultaneously in a zone.
<usage>
# echo 5 > /proc/sys/vm/max_nr_task_per_zone
</usage>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/swap.h | 2 ++
kernel/sysctl.c | 9 +++++++++
mm/vmscan.c | 3 ++-
3 files changed, 13 insertions(+), 1 deletion(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c 2008-05-03 00:45:15.000000000 +0900
+++ b/mm/vmscan.c 2008-05-03 00:47:00.000000000 +0900
@@ -125,9 +125,10 @@ struct scan_control {
int vm_swappiness = 60;
long vm_total_pages; /* The total number of pages which the VM controls */
-#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
+#define MAX_RECLAIM_TASKS vm_max_nr_task_per_zone
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
+int vm_max_nr_task_per_zone = CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
#define scan_global_lru(sc) (!(sc)->mem_cgroup)
Index: b/include/linux/swap.h
===================================================================
--- a/include/linux/swap.h 2008-05-03 00:22:33.000000000 +0900
+++ b/include/linux/swap.h 2008-05-03 00:47:00.000000000 +0900
@@ -206,6 +206,8 @@ static inline int zone_reclaim(struct zo
extern int kswapd_run(int nid);
+extern int vm_max_nr_task_per_zone;
+
#ifdef CONFIG_MMU
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
Index: b/kernel/sysctl.c
===================================================================
--- a/kernel/sysctl.c 2008-05-03 00:22:33.000000000 +0900
+++ b/kernel/sysctl.c 2008-05-03 00:47:00.000000000 +0900
@@ -1150,6 +1150,15 @@ static struct ctl_table vm_table[] = {
.extra2 = &one,
},
#endif
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "max_nr_task_per_zone",
+ .data = &vm_max_nr_task_per_zone,
+ .maxlen = sizeof(vm_max_nr_task_per_zone),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ },
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
--
* Re: [-mm][PATCH 0/5] mm: page reclaim throttle v6
2008-05-04 12:53 [-mm][PATCH 0/5] mm: page reclaim throttle v6 KOSAKI Motohiro
` (4 preceding siblings ...)
2008-05-04 13:01 ` [-mm][PATCH 5/5] introduce sysctl parameter of max task of throttle KOSAKI Motohiro
@ 2008-05-04 14:38 ` KOSAKI Motohiro
5 siblings, 0 replies; 21+ messages in thread
From: KOSAKI Motohiro @ 2008-05-04 14:38 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Rik van Riel, Lee Schermerhorn
Cc: kosaki.motohiro
Performance of the page reclaim throttle combined with the split-LRU series
is below. I think this combination is best.
num_group vanilla with throttle throttle + split lru
-----------------------------------------------------------------
80 26.22 24.97 23.75
85 27.31 25.94 27.01
90 29.23 26.77 26.90
95 30.73 28.40 28.81
100 32.02 30.62 29.18
105 33.97 31.93 32.21
110 35.37 33.19 33.10
115 36.96 33.68 33.90
120 74.05 36.25 36.58
125 41.07 39.30 36.64
130 86.92 45.74 40.55
135 234.62 45.99 47.18
140 291.95 57.82 58.91
145 425.35 70.31 50.63
150 766.92 113.28 105.33
--