* Re: [PATCH 0/5] page reclaim throttle v7
2008-06-05 2:12 [PATCH 0/5] page reclaim throttle v7 kosaki.motohiro
@ 2008-06-05 2:06 ` KAMEZAWA Hiroyuki
2008-06-05 2:23 ` KOSAKI Motohiro
2008-06-05 2:12 ` [PATCH 1/5] fix incorrect variable type of do_try_to_free_pages() kosaki.motohiro
` (4 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-05 2:06 UTC (permalink / raw)
To: kosaki.motohiro; +Cc: LKML, linux-mm, Andrew Morton
On Thu, 05 Jun 2008 11:12:11 +0900
kosaki.motohiro@jp.fujitsu.com wrote:
> Hi
>
> I'm posting the latest version of the page reclaim patch series.
>
> This patch series has held up very well under a usex stress test
> for over 24 hours :)
>
>
> Against: 2.6.26-rc2-mm1
>
I like this series and I'd like to support it under memcg when
it goes to mainline. (It seems better to test this for a while
before adding memcg-related changes.)
Please give me your input:
what do you think I need to do to support this in memcg?
Is handling the scan_global_lru(sc) == false case enough?
Thanks,
-Kame
>
> changelog
> ========================================
> v6 -> v7
> o rebase to 2.6.26-rc2-mm1
> o get_vm_event(): make it safe against CPU unplug.
> o mark vm_max_nr_task_per_zone __read_mostly.
> o check __GFP_FS and __GFP_IO to avoid deadlock.
> o fixed compile error on x86_64.
>
> v5 -> v6
> o rebase to 2.6.25-mm1
> o use the PGFREE statistic instead of wall time.
> o separate the function-type-change patch from the throttle patch.
>
> v4 -> v5
> o rebase to 2.6.25-rc8-mm1
>
> v3 -> v4:
> o fixed recursive shrink_zone problem.
> o add last_checked variable in shrink_zone to
> prevent a corner case regression.
>
> v2 -> v3:
> o use wake_up() instead of wake_up_all()
> o the max number of reclaimers can be changed via a Kconfig option and a sysctl.
> o some cleanups
>
> v1 -> v2:
> o make per zone throttle
>
>
>
> background
> =====================================
> The current VM implementation has no limit on the number of parallel reclaimers.
> Under a heavy workload, this causes two bad things:
> - heavy lock contention
> - unnecessary swap out
>
> At the end of last year, KAMEZAWA Hiroyuki proposed a page reclaim
> throttle patch and explained that it improves reclaim time.
> http://marc.info/?l=linux-mm&m=119667465917215&w=2
>
> Unfortunately it only worked for memcgroup reclaim,
> so I implemented it again to support global reclaim and measured it.
>
>
> benefit
> =====================================
> <<1. fix a bug causing an incorrect OOM kill>>
>
> If you run the following command, the OOM killer sometimes triggers
> (OOM happened about 10% of the time):
>
> $ ./hackbench 125 process 1000
>
> This is because the following bad scenario happens:
>
> 1. a memory shortage happens.
> 2. many tasks call shrink_zone at the same time.
> 3. thus, all pages are isolated from the LRU at the same time.
> 4. the last task can't isolate any page from the LRU.
> 5. that causes a reclaim failure.
> 6. that causes the OOM killer to fire.
>
> My patch is a direct solution to that problem.
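The fix (patch 4 in this series) gates entry to shrink_zone() with a per-zone counter so that step 3 above cannot happen. The idea can be modeled in userspace C; this is an illustrative sketch only — the struct and function names here are made up, and the kernel uses atomic_add_unless() plus a waitqueue rather than a plain counter:

```c
#include <assert.h>

#define MAX_RECLAIM_TASKS 3	/* models CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE */

/* Userspace model of the per-zone throttle state. */
struct zone_model {
	int nr_reclaimers;
};

/* Models atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS):
 * take a reclaim slot if one is free; in the kernel, a task that fails
 * here sleeps on the zone's waitqueue instead of spinning. */
int try_enter_reclaim(struct zone_model *z)
{
	if (z->nr_reclaimers >= MAX_RECLAIM_TASKS)
		return 0;
	z->nr_reclaimers++;
	return 1;
}

void leave_reclaim(struct zone_model *z)
{
	z->nr_reclaimers--;	/* the kernel also does wake_up() here */
}

/* How many of n concurrent reclaim attempts get in immediately? */
int admitted(int n)
{
	struct zone_model z = { 0 };
	int i, got = 0;

	for (i = 0; i < n; i++)
		got += try_enter_reclaim(&z);
	return got;
}
```

With this model, even if 125 hackbench tasks hit the zone at once, only MAX_RECLAIM_TASKS of them isolate pages at a time, so the LRU is never fully drained.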
>
>
> <<2. performance improvement>>
> I measured the performance of the RvR split-LRU series plus the page reclaim throttle series with hackbench.
>
> The result numbers are seconds (i.e. smaller is better).
>
>
>                 (+ split_lru)                 improvement
> num_group    2.6.26-rc2-mm1    + throttle        ratio
> -----------------------------------------------------------------
> 100 28.383 28.247
> 110 31.237 30.83
> 120 33.282 33.473
> 130 36.530 37.356
> 140 101.041 44.873 >200%
> 150 795.020 96.265 >800%
>
>
> Why does this patch improve performance?
>
> The vanilla kernel gets unstable performance once swapping starts, because
> unnecessary swap-outs happen frequently.
> This patch doesn't improve the best case, but it prevents the worst case;
> thus, the average performance of hackbench increases greatly.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
* Re: [PATCH 0/5] page reclaim throttle v7
2008-06-05 2:06 ` KAMEZAWA Hiroyuki
@ 2008-06-05 2:23 ` KOSAKI Motohiro
0 siblings, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2008-06-05 2:23 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton
Hi kame-san,
> I like this series and I'd like to support it under memcg when
> it goes to mainline. (It seems better to test this for a while
> before adding memcg-related changes.)
>
> Please give me your input:
> what do you think I need to do to support this in memcg?
> Is handling the scan_global_lru(sc) == false case enough?
My patch has two improvements:
1. restrict the number of tasks reclaiming in parallel (throttle)
2. cut reclaiming short if other tasks have already freed enough memory.
We already do #1 for memcg and it works well,
but we don't support #2 for memcg because Balbir-san said
"memcg doesn't need it".
If you need improvement #2, please change the portion of my patch below.
> + /* reclaim still necessary? */
> + if (scan_global_lru(sc) &&
> + freed - sc->was_freed >= threshold) {
> + if (zone_watermark_ok(zone, sc->order, zone->pages_high,
> + gfp_zone(sc->gfp_mask), 0)) {
> + ret = -EAGAIN;
> + goto out;
> + }
> + sc->was_freed = freed;
> + }
>
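The quoted hunk stops reclaiming early when, since this task last checked, other tasks have freed at least a threshold worth of pages and the zone watermark is satisfied again. That decision can be sketched in userspace C (hypothetical names: the freed counts model the global PGFREE statistic, watermark_ok stands in for zone_watermark_ok(), and EAGAIN_MODEL stands in for the kernel's -EAGAIN):

```c
#include <assert.h>

#define EAGAIN_MODEL 11

/* Decide whether this task can stop reclaiming: if enough pages were
 * freed globally since our last check (freed - *was_freed >= threshold)
 * and the watermark is already satisfied, signal -EAGAIN_MODEL to bail
 * out; if the watermark is still not OK, remember the new baseline and
 * keep reclaiming (return 0). */
int reclaim_cut_off(unsigned long freed, unsigned long *was_freed,
		    unsigned long threshold, int watermark_ok)
{
	if (freed - *was_freed >= threshold) {
		if (watermark_ok)
			return -EAGAIN_MODEL;
		*was_freed = freed;
	}
	return 0;
}

/* Exercise the three cases; returns 0 if all behave as described. */
int demo(void)
{
	unsigned long was_freed = 100;

	/* only 50 pages freed since last check: keep reclaiming */
	if (reclaim_cut_off(150, &was_freed, 64, 1) != 0)
		return -1;
	/* 100 pages freed and watermark OK: stop early */
	if (reclaim_cut_off(200, &was_freed, 64, 1) != -EAGAIN_MODEL)
		return -2;
	/* 100 pages freed but watermark still low: rebase and continue */
	was_freed = 100;
	if (reclaim_cut_off(200, &was_freed, 64, 0) != 0 || was_freed != 200)
		return -3;
	return 0;
}
```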
* [PATCH 1/5] fix incorrect variable type of do_try_to_free_pages()
2008-06-05 2:12 [PATCH 0/5] page reclaim throttle v7 kosaki.motohiro
2008-06-05 2:06 ` KAMEZAWA Hiroyuki
@ 2008-06-05 2:12 ` kosaki.motohiro
2008-06-05 1:26 ` KOSAKI Motohiro
2008-06-05 2:12 ` [PATCH 2/5] introduce get_vm_event() kosaki.motohiro
` (3 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: kosaki.motohiro @ 2008-06-05 2:12 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
[-- Attachment #1: 01-fix-do_try_to_free_pages-ret.patch --]
[-- Type: text/plain, Size: 967 bytes --]
The "Smarter retry of costly-order allocations" patch series changed the behavior of do_try_to_free_pages(),
but unfortunately the type of the ret variable was left unchanged.
Thus, an overflow is possible.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1317,7 +1317,7 @@ static unsigned long do_try_to_free_page
struct scan_control *sc)
{
int priority;
- int ret = 0;
+ unsigned long ret = 0;
unsigned long total_scanned = 0;
unsigned long nr_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
--
* Re: [PATCH 1/5] fix incorrect variable type of do_try_to_free_pages()
2008-06-05 2:12 ` [PATCH 1/5] fix incorrect variable type of do_try_to_free_pages() kosaki.motohiro
@ 2008-06-05 1:26 ` KOSAKI Motohiro
0 siblings, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2008-06-05 1:26 UTC (permalink / raw)
To: kosaki.motohiro; +Cc: LKML, linux-mm, Andrew Morton, Nishanth Aravamudan
> The "Smarter retry of costly-order allocations" patch series changed the behavior of do_try_to_free_pages(),
> but unfortunately the type of the ret variable was left unchanged.
>
> Thus, an overflow is possible.
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Sorry, this patch has already received Nishanth-san's ACK.
I'll append the ACK and resend the patch in this mail.
----------------------------
fix incorrect variable type of do_try_to_free_pages()
The "Smarter retry of costly-order allocations" patch series changed the behavior of do_try_to_free_pages(),
but unfortunately the type of the ret variable was left unchanged.
Thus, an overflow is possible.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1317,7 +1317,7 @@ static unsigned long do_try_to_free_page
struct scan_control *sc)
{
int priority;
- int ret = 0;
+ unsigned long ret = 0;
unsigned long total_scanned = 0;
unsigned long nr_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
--
* [PATCH 2/5] introduce get_vm_event().
2008-06-05 2:12 [PATCH 0/5] page reclaim throttle v7 kosaki.motohiro
2008-06-05 2:06 ` KAMEZAWA Hiroyuki
2008-06-05 2:12 ` [PATCH 1/5] fix incorrect variable type of do_try_to_free_pages() kosaki.motohiro
@ 2008-06-05 2:12 ` kosaki.motohiro
2008-06-05 1:29 ` KOSAKI Motohiro
2008-06-05 2:12 ` [PATCH 3/5] change return type of shrink_zone() kosaki.motohiro
` (2 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: kosaki.motohiro @ 2008-06-05 2:12 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
[-- Attachment #1: 02-get_vm_event.patch --]
[-- Type: text/plain, Size: 1725 bytes --]
Introduce a new function, get_vm_event(), for easy access to VM event statistics.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/vmstat.h | 7 ++++++-
mm/vmstat.c | 16 ++++++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
Index: b/include/linux/vmstat.h
===================================================================
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -98,6 +98,8 @@ static inline void vm_events_fold_cpu(in
}
#endif
+unsigned long get_vm_event(enum vm_event_item event_type);
+
#else
/* Disable counters */
@@ -119,7 +121,10 @@ static inline void all_vm_events(unsigne
static inline void vm_events_fold_cpu(int cpu)
{
}
-
+static inline unsigned long get_vm_event(enum vm_event_item event_type)
+{
+ return 0;
+}
#endif /* CONFIG_VM_EVENT_COUNTERS */
#define __count_zone_vm_events(item, zone, delta) \
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -49,6 +49,22 @@ void all_vm_events(unsigned long *ret)
}
EXPORT_SYMBOL_GPL(all_vm_events);
+unsigned long get_vm_event(enum vm_event_item event_type)
+{
+ int cpu;
+ unsigned long ret = 0;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
+
+ ret += this->event[event_type];
+ }
+ put_online_cpus();
+
+ return ret;
+}
+
#ifdef CONFIG_HOTPLUG
/*
* Fold the foreign cpu events into our own.
--
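get_vm_event() simply folds the per-CPU counters for one event into a single total while holding the CPU hotplug lock. The summation can be modeled in plain C (an illustrative sketch only; NR_CPUS_MODEL and the flat array stand in for the kernel's per_cpu() machinery, and the enum names are made up):

```c
#include <assert.h>

enum vm_event_item_model { PGFREE_MODEL, PGFAULT_MODEL, NR_VM_EVENTS_MODEL };

#define NR_CPUS_MODEL 4

/* One counter set per CPU, as in struct vm_event_state.
 * Pre-filled sample data: CPU 0 and CPU 3 have recorded events. */
unsigned long event_model[NR_CPUS_MODEL][NR_VM_EVENTS_MODEL] = {
	[0] = { 5, 1 },
	[3] = { 7, 2 },
};

/* Model of get_vm_event(): sum one event across all online CPUs.
 * The kernel brackets this loop with get_online_cpus()/put_online_cpus()
 * so that a CPU cannot be unplugged mid-walk (the v7 changelog's
 * "safe against CPU unplug" item). */
unsigned long get_vm_event_model(enum vm_event_item_model event_type)
{
	unsigned long ret = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		ret += event_model[cpu][event_type];
	return ret;
}
```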
* Re: [PATCH 2/5] introduce get_vm_event().
2008-06-05 2:12 ` [PATCH 2/5] introduce get_vm_event() kosaki.motohiro
@ 2008-06-05 1:29 ` KOSAKI Motohiro
0 siblings, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2008-06-05 1:29 UTC (permalink / raw)
To: kosaki.motohiro; +Cc: LKML, linux-mm, Andrew Morton, Rik van Riel
> Introduce a new function, get_vm_event(), for easy access to VM event statistics.
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Sorry, this patch has already received Rik-san's ACK.
I'll append the ACK and resend the patch in this mail.
------------------------------------
Introduce a new function, get_vm_event(), for easy access to VM event statistics.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
---
include/linux/vmstat.h | 7 ++++++-
mm/vmstat.c | 16 ++++++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
Index: b/include/linux/vmstat.h
===================================================================
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -98,6 +98,8 @@ static inline void vm_events_fold_cpu(in
}
#endif
+unsigned long get_vm_event(enum vm_event_item event_type);
+
#else
/* Disable counters */
@@ -119,7 +121,10 @@ static inline void all_vm_events(unsigne
static inline void vm_events_fold_cpu(int cpu)
{
}
-
+static inline unsigned long get_vm_event(enum vm_event_item event_type)
+{
+ return 0;
+}
#endif /* CONFIG_VM_EVENT_COUNTERS */
#define __count_zone_vm_events(item, zone, delta) \
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -49,6 +49,22 @@ void all_vm_events(unsigned long *ret)
}
EXPORT_SYMBOL_GPL(all_vm_events);
+unsigned long get_vm_event(enum vm_event_item event_type)
+{
+ int cpu;
+ unsigned long ret = 0;
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
+
+ ret += this->event[event_type];
+ }
+ put_online_cpus();
+
+ return ret;
+}
+
#ifdef CONFIG_HOTPLUG
/*
* Fold the foreign cpu events into our own.
--
* [PATCH 3/5] change return type of shrink_zone()
2008-06-05 2:12 [PATCH 0/5] page reclaim throttle v7 kosaki.motohiro
` (2 preceding siblings ...)
2008-06-05 2:12 ` [PATCH 2/5] introduce get_vm_event() kosaki.motohiro
@ 2008-06-05 2:12 ` kosaki.motohiro
2008-06-05 2:12 ` [PATCH 4/5] add throttle to shrink_zone() kosaki.motohiro
2008-06-05 2:12 ` [PATCH 5/5] introduce sysctl of throttle kosaki.motohiro
5 siblings, 0 replies; 16+ messages in thread
From: kosaki.motohiro @ 2008-06-05 2:12 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
[-- Attachment #1: 03-change-return-type-of-shrink-function.patch --]
[-- Type: text/plain, Size: 7719 bytes --]
Change the function return types in preparation for the following enhancement.
This patch has no behavior change.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/vmscan.c | 71 +++++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 44 insertions(+), 27 deletions(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,6 +51,9 @@ struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
unsigned long nr_scanned;
+ /* number of reclaimed pages by this scanning */
+ unsigned long nr_reclaimed;
+
/* This context's GFP mask */
gfp_t gfp_mask;
@@ -1177,8 +1180,8 @@ static void shrink_active_list(unsigned
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
-static unsigned long shrink_zone(int priority, struct zone *zone,
- struct scan_control *sc)
+static int shrink_zone(int priority, struct zone *zone,
+ struct scan_control *sc)
{
unsigned long nr_active;
unsigned long nr_inactive;
@@ -1236,8 +1239,9 @@ static unsigned long shrink_zone(int pri
}
}
+ sc->nr_reclaimed += nr_reclaimed;
throttle_vm_writeout(sc->gfp_mask);
- return nr_reclaimed;
+ return 0;
}
/*
@@ -1251,18 +1255,23 @@ static unsigned long shrink_zone(int pri
* b) The zones may be over pages_high but they must go *over* pages_high to
* satisfy the `incremental min' zone defense algorithm.
*
- * Returns the number of reclaimed pages.
+ * @priority: reclaim priority
+ * @zonelist: list of shrinking zones
+ * @sc: scan control context
+ * @ret_reclaimed: the number of reclaimed pages.
+ *
+ * Returns nonzero if an error happened.
*
* If a zone is deemed to be full of pinned pages then just give it a light
* scan then give up on it.
*/
-static unsigned long shrink_zones(int priority, struct zonelist *zonelist,
- struct scan_control *sc)
+static int shrink_zones(int priority, struct zonelist *zonelist,
+ struct scan_control *sc)
{
enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
- unsigned long nr_reclaimed = 0;
struct zoneref *z;
struct zone *zone;
+ int ret = 0;
sc->all_unreclaimable = 1;
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
@@ -1291,10 +1300,13 @@ static unsigned long shrink_zones(int pr
priority);
}
- nr_reclaimed += shrink_zone(priority, zone, sc);
+ ret = shrink_zone(priority, zone, sc);
+ if (ret)
+ goto out;
}
- return nr_reclaimed;
+out:
+ return ret;
}
/*
@@ -1319,12 +1331,12 @@ static unsigned long do_try_to_free_page
int priority;
unsigned long ret = 0;
unsigned long total_scanned = 0;
- unsigned long nr_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
unsigned long lru_pages = 0;
struct zoneref *z;
struct zone *zone;
enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
+ int err;
if (scan_global_lru(sc))
count_vm_event(ALLOCSTALL);
@@ -1346,7 +1358,12 @@ static unsigned long do_try_to_free_page
sc->nr_scanned = 0;
if (!priority)
disable_swap_token();
- nr_reclaimed += shrink_zones(priority, zonelist, sc);
+ err = shrink_zones(priority, zonelist, sc);
+ if (err == -EAGAIN) {
+ ret = 1;
+ goto out;
+ }
+
/*
* Don't shrink slabs when reclaiming memory from
* over limit cgroups
@@ -1354,13 +1371,14 @@ static unsigned long do_try_to_free_page
if (scan_global_lru(sc)) {
shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages);
if (reclaim_state) {
- nr_reclaimed += reclaim_state->reclaimed_slab;
+ sc->nr_reclaimed +=
+ reclaim_state->reclaimed_slab;
reclaim_state->reclaimed_slab = 0;
}
}
total_scanned += sc->nr_scanned;
- if (nr_reclaimed >= sc->swap_cluster_max) {
- ret = nr_reclaimed;
+ if (sc->nr_reclaimed >= sc->swap_cluster_max) {
+ ret = sc->nr_reclaimed;
goto out;
}
@@ -1383,7 +1401,7 @@ static unsigned long do_try_to_free_page
}
/* top priority shrink_caches still had more to do? don't OOM, then */
if (!sc->all_unreclaimable && scan_global_lru(sc))
- ret = nr_reclaimed;
+ ret = sc->nr_reclaimed;
out:
/*
* Now that we've scanned all the zones at this priority level, note
@@ -1476,7 +1494,6 @@ static unsigned long balance_pgdat(pg_da
int priority;
int i;
unsigned long total_scanned;
- unsigned long nr_reclaimed;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
@@ -1495,7 +1512,6 @@ static unsigned long balance_pgdat(pg_da
loop_again:
total_scanned = 0;
- nr_reclaimed = 0;
sc.may_writepage = !laptop_mode;
count_vm_event(PAGEOUTRUN);
@@ -1554,6 +1570,7 @@ loop_again:
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;
int nr_slab;
+ unsigned long write_threshold;
if (!populated_zone(zone))
continue;
@@ -1574,11 +1591,11 @@ loop_again:
*/
if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
end_zone, 0))
- nr_reclaimed += shrink_zone(priority, zone, &sc);
+ shrink_zone(priority, zone, &sc);
reclaim_state->reclaimed_slab = 0;
nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
lru_pages);
- nr_reclaimed += reclaim_state->reclaimed_slab;
+ sc.nr_reclaimed += reclaim_state->reclaimed_slab;
total_scanned += sc.nr_scanned;
if (zone_is_all_unreclaimable(zone))
continue;
@@ -1592,8 +1609,9 @@ loop_again:
* the reclaim ratio is low, start doing writepage
* even in laptop mode
*/
+ write_threshold = sc.nr_reclaimed + sc.nr_reclaimed / 2;
if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
- total_scanned > nr_reclaimed + nr_reclaimed / 2)
+ total_scanned > write_threshold)
sc.may_writepage = 1;
}
if (all_zones_ok)
@@ -1611,7 +1629,7 @@ loop_again:
* matches the direct reclaim path behaviour in terms of impact
* on zone->*_priority.
*/
- if (nr_reclaimed >= SWAP_CLUSTER_MAX)
+ if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX)
break;
}
out:
@@ -1633,7 +1651,7 @@ out:
goto loop_again;
}
- return nr_reclaimed;
+ return sc.nr_reclaimed;
}
/*
@@ -1983,7 +2001,6 @@ static int __zone_reclaim(struct zone *z
struct task_struct *p = current;
struct reclaim_state reclaim_state;
int priority;
- unsigned long nr_reclaimed = 0;
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
.may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2016,9 +2033,9 @@ static int __zone_reclaim(struct zone *z
priority = ZONE_RECLAIM_PRIORITY;
do {
note_zone_scanning_priority(zone, priority);
- nr_reclaimed += shrink_zone(priority, zone, &sc);
+ shrink_zone(priority, zone, &sc);
priority--;
- } while (priority >= 0 && nr_reclaimed < nr_pages);
+ } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
}
slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
@@ -2042,13 +2059,13 @@ static int __zone_reclaim(struct zone *z
* Update nr_reclaimed by the number of slab pages we
* reclaimed from this zone.
*/
- nr_reclaimed += slab_reclaimable -
+ sc.nr_reclaimed += slab_reclaimable -
zone_page_state(zone, NR_SLAB_RECLAIMABLE);
}
p->reclaim_state = NULL;
current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
- return nr_reclaimed >= nr_pages;
+ return sc.nr_reclaimed >= nr_pages;
}
int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
--
--
* [PATCH 4/5] add throttle to shrink_zone()
2008-06-05 2:12 [PATCH 0/5] page reclaim throttle v7 kosaki.motohiro
` (3 preceding siblings ...)
2008-06-05 2:12 ` [PATCH 3/5] change return type of shrink_zone() kosaki.motohiro
@ 2008-06-05 2:12 ` kosaki.motohiro
2008-06-08 20:12 ` Andrew Morton
2008-06-05 2:12 ` [PATCH 5/5] introduce sysctl of throttle kosaki.motohiro
5 siblings, 1 reply; 16+ messages in thread
From: kosaki.motohiro @ 2008-06-05 2:12 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
[-- Attachment #1: 04-reclaim-throttle-v7.patch --]
[-- Type: text/plain, Size: 5529 bytes --]
Add a throttle to shrink_zone() to improve performance and prevent an incorrect OOM.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mmzone.h | 2 +
include/linux/sched.h | 1
mm/Kconfig | 10 +++++++
mm/page_alloc.c | 3 ++
mm/vmscan.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 77 insertions(+), 1 deletion(-)
Index: b/include/linux/mmzone.h
===================================================================
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -328,6 +328,8 @@ struct zone {
unsigned long spanned_pages; /* total size, including holes */
unsigned long present_pages; /* amount of memory (excluding holes) */
+ atomic_t nr_reclaimers;
+ wait_queue_head_t reclaim_throttle_waitq;
/*
* rarely used fields:
*/
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3500,6 +3500,9 @@ static void __paginginit free_area_init_
zone->nr_scan_inactive = 0;
zap_zone_vm_stats(zone);
zone->flags = 0;
+ atomic_set(&zone->nr_reclaimers, 0);
+ init_waitqueue_head(&zone->reclaim_throttle_waitq);
+
if (!size)
continue;
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -74,6 +74,11 @@ struct scan_control {
int order;
+ /* Can shrinking be cut off if another task freed enough pages? */
+ int may_cut_off;
+
+ unsigned long was_freed;
+
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;
@@ -120,6 +125,7 @@ struct scan_control {
int vm_swappiness = 60;
long vm_total_pages; /* The total number of pages which the VM controls */
+#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
@@ -1187,7 +1193,52 @@ static int shrink_zone(int priority, str
unsigned long nr_inactive;
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+ int ret = 0;
+ int throttle_on = 0;
+ unsigned long freed;
+ unsigned long threshold;
+
+ /* A !__GFP_IO and/or !__GFP_FS task may hold some locks.
+ thus, if such tasks wait on others, it may cause deadlock. */
+ if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) != (__GFP_IO | __GFP_FS))
+ goto shrinking;
+
+ /* avoid recursive wait_event to avoid deadlock. */
+ if (current->flags & PF_RECLAIMING)
+ goto shrinking;
+
+ throttle_on = 1;
+ current->flags |= PF_RECLAIMING;
+ wait_event(zone->reclaim_throttle_waitq,
+ atomic_add_unless(&zone->nr_reclaimers, 1, MAX_RECLAIM_TASKS));
+
+
+ /* in some situations (e.g. hibernation), shrink processing shouldn't be
+ cut off even though a lot of memory was freed. */
+ if (!sc->may_cut_off)
+ goto shrinking;
+
+ /* kswapd is not related to user-perceived latency. */
+ if (current->flags & PF_KSWAPD)
+ goto shrinking;
+
+ /* The x4 ratio means we check rarely,
+ because frequent checks decrease performance. */
+ threshold = ((1 << sc->order) + zone->pages_high) * 4;
+ freed = get_vm_event(PGFREE);
+
+ /* reclaim still necessary? */
+ if (scan_global_lru(sc) &&
+ freed - sc->was_freed >= threshold) {
+ if (zone_watermark_ok(zone, sc->order, zone->pages_high,
+ gfp_zone(sc->gfp_mask), 0)) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ sc->was_freed = freed;
+ }
+shrinking:
if (scan_global_lru(sc)) {
/*
* Add one to nr_to_scan just to make sure that the kernel
@@ -1239,9 +1290,16 @@ static int shrink_zone(int priority, str
}
}
+out:
+ if (throttle_on) {
+ current->flags &= ~PF_RECLAIMING;
+ atomic_dec(&zone->nr_reclaimers);
+ wake_up(&zone->reclaim_throttle_waitq);
+ }
+
sc->nr_reclaimed += nr_reclaimed;
throttle_vm_writeout(sc->gfp_mask);
- return 0;
+ return ret;
}
/*
@@ -1439,6 +1497,8 @@ unsigned long try_to_free_pages(struct z
.order = order,
.mem_cgroup = NULL,
.isolate_pages = isolate_pages_global,
+ .may_cut_off = 1,
+ .was_freed = get_vm_event(PGFREE),
};
return do_try_to_free_pages(zonelist, &sc);
Index: b/mm/Kconfig
===================================================================
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -205,3 +205,13 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config NR_MAX_RECLAIM_TASKS_PER_ZONE
+ int "maximum number of reclaiming tasks at the same time"
+ default 3
+ help
+ This value determines the number of threads which can do page reclaim
+ in a zone simultaneously. If this is too big, performance under heavy memory
+ pressure will decrease.
+ If unsure, use default.
+
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1510,6 +1510,7 @@ static inline void put_task_struct(struc
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezeable */
+#define PF_RECLAIMING 0x80000000 /* The task has a page reclaim throttling ticket */
/*
* Only the _current_ task can read/write to tsk->flags, but other
--
--
* Re: [PATCH 4/5] add throttle to shrink_zone()
2008-06-05 2:12 ` [PATCH 4/5] add throttle to shrink_zone() kosaki.motohiro
@ 2008-06-08 20:12 ` Andrew Morton
2008-06-09 0:38 ` KOSAKI Motohiro
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-06-08 20:12 UTC (permalink / raw)
To: kosaki.motohiro; +Cc: LKML, linux-mm
On Thu, 05 Jun 2008 11:12:15 +0900 kosaki.motohiro@jp.fujitsu.com wrote:
> Add a throttle to shrink_zone() to improve performance and prevent an incorrect OOM.
We should have a description of how all this works, please. I thought
that was present in earlier iterations of this patchset.
It's quite hard and quite unreliable to reverse engineer both the
design and your thinking from the implementation.
* Re: [PATCH 4/5] add throttle to shrink_zone()
2008-06-08 20:12 ` Andrew Morton
@ 2008-06-09 0:38 ` KOSAKI Motohiro
0 siblings, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2008-06-09 0:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: kosaki.motohiro, LKML, linux-mm
> > Add a throttle to shrink_zone() to improve performance and prevent an incorrect OOM.
>
> We should have a description of how all this works, please. I thought
> that was present in earlier iterations of this patchset.
>
> It's quite hard and quite unreliable to reverse engineer both the
> design and your thinking from the implementation.
Oh, sorry.
I'll write a proper description soon.
Thanks.
* [PATCH 5/5] introduce sysctl of throttle
2008-06-05 2:12 [PATCH 0/5] page reclaim throttle v7 kosaki.motohiro
` (4 preceding siblings ...)
2008-06-05 2:12 ` [PATCH 4/5] add throttle to shrink_zone() kosaki.motohiro
@ 2008-06-05 2:12 ` kosaki.motohiro
2008-06-08 20:09 ` Andrew Morton
2008-06-08 20:10 ` Andrew Morton
5 siblings, 2 replies; 16+ messages in thread
From: kosaki.motohiro @ 2008-06-05 2:12 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton; +Cc: kosaki.motohiro
[-- Attachment #1: 05-reclaim-throttle-sysctl-v7.patch --]
[-- Type: text/plain, Size: 2248 bytes --]
Introduce a sysctl parameter for the maximum number of reclaiming tasks per zone.
<usage>
# echo 5 > /proc/sys/vm/max_nr_task_per_zone
</usage>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/swap.h | 2 ++
kernel/sysctl.c | 9 +++++++++
mm/vmscan.c | 4 +++-
3 files changed, 14 insertions(+), 1 deletion(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -125,9 +125,11 @@ struct scan_control {
int vm_swappiness = 60;
long vm_total_pages; /* The total number of pages which the VM controls */
-#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
+#define MAX_RECLAIM_TASKS vm_max_nr_task_per_zone
static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
+int vm_max_nr_task_per_zone __read_mostly
+ = CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
#define scan_global_lru(sc) (!(sc)->mem_cgroup)
Index: b/include/linux/swap.h
===================================================================
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -206,6 +206,8 @@ static inline int zone_reclaim(struct zo
extern int kswapd_run(int nid);
+extern int vm_max_nr_task_per_zone;
+
#ifdef CONFIG_MMU
/* linux/mm/shmem.c */
extern int shmem_unuse(swp_entry_t entry, struct page *page);
Index: b/kernel/sysctl.c
===================================================================
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1151,6 +1151,15 @@ static struct ctl_table vm_table[] = {
.extra2 = &one,
},
#endif
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "max_nr_task_per_zone",
+ .data = &vm_max_nr_task_per_zone,
+ .maxlen = sizeof(vm_max_nr_task_per_zone),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ },
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* Re: [PATCH 5/5] introduce sysctl of throttle
2008-06-05 2:12 ` [PATCH 5/5] introduce sysctl of throttle kosaki.motohiro
@ 2008-06-08 20:09 ` Andrew Morton
2008-06-09 0:37 ` KOSAKI Motohiro
2008-06-08 20:10 ` Andrew Morton
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-06-08 20:09 UTC (permalink / raw)
To: kosaki.motohiro; +Cc: LKML, linux-mm
On Thu, 05 Jun 2008 11:12:16 +0900 kosaki.motohiro@jp.fujitsu.com wrote:
> introduce sysctl parameter of max task of throttle.
>
> <usage>
> # echo 5 > /proc/sys/vm/max_nr_task_per_zone
> </usage>
>
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
>
> ---
> include/linux/swap.h | 2 ++
> kernel/sysctl.c | 9 +++++++++
> mm/vmscan.c | 4 +++-
> 3 files changed, 14 insertions(+), 1 deletion(-)
>
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -125,9 +125,11 @@ struct scan_control {
> int vm_swappiness = 60;
> long vm_total_pages; /* The total number of pages which the VM controls */
>
> -#define MAX_RECLAIM_TASKS CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE
> +#define MAX_RECLAIM_TASKS vm_max_nr_task_per_zone
> static LIST_HEAD(shrinker_list);
> static DECLARE_RWSEM(shrinker_rwsem);
> +int vm_max_nr_task_per_zone __read_mostly
> + = CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE;
It would be nice if we could remove
CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE altogether. It's a pretty obscure
thing and we haven't provided people with any information which would
permit them to tune it anyway.
In which case this patch should be folded into [4/5].
* Re: [PATCH 5/5] introduce sysctl of throttle
2008-06-08 20:09 ` Andrew Morton
@ 2008-06-09 0:37 ` KOSAKI Motohiro
0 siblings, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2008-06-09 0:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: kosaki.motohiro, LKML, linux-mm
> > +int vm_max_nr_task_per_zone __read_mostly
> > + = CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE;
>
> It would be nice if we could remove
> CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE altogether. It's a pretty obscure
> thing and we haven't provided people with any information which would
> permit them to tune it anyway.
>
> In which case this patch should be folded into [4/5].
Sure.
I'll remove the CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE Kconfig option.
Thanks!
* Re: [PATCH 5/5] introduce sysctl of throttle
2008-06-05 2:12 ` [PATCH 5/5] introduce sysctl of throttle kosaki.motohiro
2008-06-08 20:09 ` Andrew Morton
@ 2008-06-08 20:10 ` Andrew Morton
2008-06-09 0:34 ` KOSAKI Motohiro
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-06-08 20:10 UTC (permalink / raw)
To: kosaki.motohiro; +Cc: LKML, linux-mm
On Thu, 05 Jun 2008 11:12:16 +0900 kosaki.motohiro@jp.fujitsu.com wrote:
> # echo 5 > /proc/sys/vm/max_nr_task_per_zone
Please document /proc/sys/vm tunables in Documentation/filesystems/proc.txt
* Re: [PATCH 5/5] introduce sysctl of throttle
2008-06-08 20:10 ` Andrew Morton
@ 2008-06-09 0:34 ` KOSAKI Motohiro
0 siblings, 0 replies; 16+ messages in thread
From: KOSAKI Motohiro @ 2008-06-09 0:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: kosaki.motohiro, LKML, linux-mm
> > # echo 5 > /proc/sys/vm/max_nr_task_per_zone
>
> Please document /proc/sys/vm tunables in Documentation/filesystems/proc.txt
Oh, that makes sense.
Thank you for the good advice!
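For reference, an entry along these lines could serve as a starting point for the
Documentation/filesystems/proc.txt addition Andrew asked for. The wording below is
only a sketch based on this thread's description of the throttle, not text from the
patch itself:

max_nr_task_per_zone
--------------------

Limits the number of tasks that may perform page reclaim concurrently
within a single zone.  When the limit is reached, additional tasks
entering reclaim on that zone wait until one of the running reclaimers
finishes.  The default value is the one previously selected via the
CONFIG_NR_MAX_RECLAIM_TASKS_PER_ZONE Kconfig option.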