[PATCH mmotm] vmscan: fix may_swap handling for memcg

linux-mm.kvack.org archive mirror

* [PATCH mmotm] vmscan: fix may_swap handling for memcg
@ 2009-06-08  3:02 Daisuke Nishimura
  2009-06-08  3:20 ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  3:02 UTC
  To: LKML, linux-mm
  Cc: Andrew Morton, Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, Daisuke Nishimura

From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Commit 2e2e425989080cc534fc0fca154cae515f971cf5 ("vmscan,memcg: reintroduce
sc->may_swap") added the may_swap flag and handled it in get_scan_ratio().

But the result of get_scan_ratio() is ignored when priority == 0, which
means that when a memcg hits its mem+swap limit, anon pages can be swapped
out in vain. In particular, when a memcg triggers OOM because of the mem+swap
limit, we can see a great many pages being swapped out.

Instead of skipping the anon LRU entirely when priority == 0, this patch adds
a hook in shrink_page_list() that honors the may_swap flag, to avoid useless
swap usage. It also calls try_to_free_swap() where appropriate, because that
can reduce both mem.usage and memsw.usage if the page (SwapCache) is no longer used.

Such unused SwapCache pages that are still charged to a memcg can be created
on some paths, for example when trylock_page() fails in free_swap_cache().

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/vmscan.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2ddcfc8..d9a3f54 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -640,6 +640,25 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 					referenced && page_mapping_inuse(page))
 			goto activate_locked;

+		if (!sc->may_swap && PageSwapBacked(page)) {
+			/* SwapCache already has a swap entry */
+			if (!PageSwapCache(page))
+				goto keep_locked;
+			/*
+			 * From the viewpoint of memcg, may_swap is false when
+			 * memsw.usage hits the limit.
+			 * But swapping out SwapCache to disk doesn't reduce the
+			 * memsw.usage, so it is a waste of time.
+			 * Call try_to_free_swap() if the page isn't used,
+			 * because it can reduce both mem.usage and memsw.usage.
+			 */
+			if (!scanning_global_lru(sc)) {
+				if (!page_mapped(page))
+					try_to_free_swap(page);
+				goto keep_locked;
+			}
+		}
+
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
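The decision logic of the hook above can be modeled in plain C as a quick sanity check. This is a minimal userspace sketch, not kernel code: `struct page_model`, `enum hook_action` and `may_swap_hook()` are hypothetical names, the three booleans stand in for PageSwapBacked(), PageSwapCache() and page_mapped(), and `global_lru` stands in for scanning_global_lru(sc).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state the hook inspects. */
struct page_model {
	bool swap_backed;	/* models PageSwapBacked(): anon/shmem page */
	bool swap_cache;	/* models PageSwapCache(): already has a swap entry */
	bool mapped;		/* models page_mapped(): still mapped somewhere */
};

enum hook_action {
	HOOK_CONTINUE,		/* fall through to the normal reclaim path */
	HOOK_KEEP,		/* models "goto keep_locked" */
	HOOK_FREE_SWAP_KEEP,	/* models try_to_free_swap() + keep_locked */
};

/* Mirrors the branch the patch adds to shrink_page_list(). */
static enum hook_action may_swap_hook(bool may_swap, bool global_lru,
				      const struct page_model *p)
{
	assert(p != NULL);
	if (may_swap || !p->swap_backed)
		return HOOK_CONTINUE;
	/* Would need a new swap entry, which is not allowed: keep the page. */
	if (!p->swap_cache)
		return HOOK_KEEP;
	/* memcg reclaim: writing SwapCache out does not lower memsw.usage. */
	if (!global_lru)
		return p->mapped ? HOOK_KEEP : HOOK_FREE_SWAP_KEEP;
	/* Global reclaim may still write out an existing swap entry. */
	return HOOK_CONTINUE;
}
```

For example, an unmapped SwapCache page seen during memcg reclaim with may_swap == 0 maps to HOOK_FREE_SWAP_KEEP, mirroring the try_to_free_swap() call in the patch, while an ordinary anon page without a swap entry is simply kept.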

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg
  2009-06-08  3:02 [PATCH mmotm] vmscan: fix may_swap handling for memcg Daisuke Nishimura
@ 2009-06-08  3:20 ` KOSAKI Motohiro
  2009-06-08  6:39   ` Daisuke Nishimura
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-06-08  3:20 UTC
  To: Daisuke Nishimura
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki

Hi

> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> Commit 2e2e425989080cc534fc0fca154cae515f971cf5 ("vmscan,memcg: reintroduce
> sc->may_swap") added the may_swap flag and handled it in get_scan_ratio().
> 
> But the result of get_scan_ratio() is ignored when priority == 0, which
> means that when a memcg hits its mem+swap limit, anon pages can be swapped
> out in vain. In particular, when a memcg triggers OOM because of the mem+swap
> limit, we can see a great many pages being swapped out.
> 
> Instead of skipping the anon LRU entirely when priority == 0, this patch adds
> a hook in shrink_page_list() that honors the may_swap flag, to avoid useless
> swap usage. It also calls try_to_free_swap() where appropriate, because that
> can reduce both mem.usage and memsw.usage if the page (SwapCache) is no longer used.
> 
> Such unused SwapCache pages that are still charged to a memcg can be created
> on some paths, for example when trylock_page() fails in free_swap_cache().
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

I think the root cause is the following branch, right?
If so, why can't we handle this issue in shrink_zone()?


---------------------------------------------------------------
static void shrink_zone(int priority, struct zone *zone,
                                struct scan_control *sc)
{
        get_scan_ratio(zone, sc, percent);

        for_each_evictable_lru(l) {
                int file = is_file_lru(l);
                unsigned long scan;

                scan = zone_nr_pages(zone, sc, l);
                if (priority) {				// !!here!!
                        scan >>= priority;
                        scan = (scan * percent[file]) / 100;
                }





* Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg
  2009-06-08  3:20 ` KOSAKI Motohiro
@ 2009-06-08  6:39   ` Daisuke Nishimura
  2009-06-08  6:53     ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  6:39 UTC
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, Daisuke Nishimura

On Mon,  8 Jun 2009 12:20:54 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi
> 
Hi, thank you for your comment.

> I think the root cause is the following branch, right?
Yes.

> If so, why can't we handle this issue in shrink_zone()?
> 
Simply because priority == 0 means OOM is about to happen, and I'd like
to avoid OOM if possible.
So I thought it would be better to reclaim as many pages (memsw.usage) as
possible in this case.



* Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg
  2009-06-08  6:39   ` Daisuke Nishimura
@ 2009-06-08  6:53     ` KOSAKI Motohiro
  2009-06-08  7:54       ` Daisuke Nishimura
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-06-08  6:53 UTC
  To: Daisuke Nishimura
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki

> On Mon,  8 Jun 2009 12:20:54 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > Hi
> > 
> Hi, thank you for your comment.
> 
> > I think the root cause is the following branch, right?
> Yes.
> 
> > If so, why can't we handle this issue in shrink_zone()?
> > 
> Simply because priority == 0 means OOM is about to happen, and I'd like
> to avoid OOM if possible.
> So I thought it would be better to reclaim as many pages (memsw.usage) as
> possible in this case.

hmmm..

In general, adding a new branch to shrink_page_list() is not a good idea;
it can cause a performance regression.

Plus, this is not a big problem at all: it happens only when priority == 0,
and priority == 0 definitely doesn't occur in normal operation.
And reclaiming too many pages is not only a memcg issue, so I don't think
this patch provides a generic solution.


Why does your test environment hit OOM so frequently?





* Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg
  2009-06-08  6:53     ` KOSAKI Motohiro
@ 2009-06-08  7:54       ` Daisuke Nishimura
  2009-06-09  7:13         ` [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg) Daisuke Nishimura
  0 siblings, 1 reply; 14+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  7:54 UTC
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, Daisuke Nishimura

On Mon,  8 Jun 2009 15:53:50 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > On Mon,  8 Jun 2009 12:20:54 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > Hi
> > > 
> > Hi, thank you for your comment.
> > 
> > > I think the root cause is the following branch, right?
> > Yes.
> > 
> > > If so, why can't we handle this issue in shrink_zone()?
> > > 
> > Simply because priority == 0 means OOM is about to happen, and I'd like
> > to avoid OOM if possible.
> > So I thought it would be better to reclaim as many pages (memsw.usage) as
> > possible in this case.
> 
> hmmm..
> 
> In general, adding a new branch to shrink_page_list() is not a good idea;
> it can cause a performance regression.
> 
> Plus, this is not a big problem at all: it happens only when priority == 0,
> and priority == 0 definitely doesn't occur in normal operation.
But it does happen under high memory pressure...

> And reclaiming too many pages is not only a memcg issue, so I don't think
> this patch provides a generic solution.
> 
Ah, you're right. It's not only a memcg issue.

> 
> Why does your test environment hit OOM so frequently?
> 
Not so frequently :)
But I can see almost all pages being swapped out when a memcg hits OOM
because of memsw.limit (which is a waste of CPU time).
And even after Kamezawa-san's memcg-fix-behavior-under-memorylimit-equals-to-memswlimit.patch,
I can sometimes see swap usage when mem.limit == memsw.limit (also a waste of CPU time).


Thanks,
Daisuke Nishimura.


* [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-08  7:54       ` Daisuke Nishimura
@ 2009-06-09  7:13         ` Daisuke Nishimura
  2009-06-09  7:20           ` KOSAKI Motohiro
  2009-06-09  7:28           ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 14+ messages in thread
From: Daisuke Nishimura @ 2009-06-09  7:13 UTC
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Balbir Singh,
	KAMEZAWA Hiroyuki, Daisuke Nishimura

> > And reclaiming too many pages is not only a memcg issue, so I don't think
> > this patch provides a generic solution.
> > 
> Ah, you're right. It's not only a memcg issue.
> 
How about this one?

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Commit 2e2e425989080cc534fc0fca154cae515f971cf5 ("vmscan,memcg: reintroduce
sc->may_swap") added the may_swap flag and handled it in get_scan_ratio().

But the result of get_scan_ratio() is ignored when priority == 0,
so the anon LRU is scanned even if may_swap == 0 or nr_swap_pages == 0.
IMHO, this is not the expected behavior.

For memcg in particular, this behavior causes a great many pages to be
swapped out in vain when OOM is triggered by the mem+swap limit.

This patch makes the handling of the may_swap flag stricter.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/vmscan.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2ddcfc8..bacb092 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1407,13 +1407,6 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	unsigned long ap, fp;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
-	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || (nr_swap_pages <= 0)) {
-		percent[0] = 0;
-		percent[1] = 100;
-		return;
-	}
-
 	anon  = zone_nr_pages(zone, sc, LRU_ACTIVE_ANON) +
 		zone_nr_pages(zone, sc, LRU_INACTIVE_ANON);
 	file  = zone_nr_pages(zone, sc, LRU_ACTIVE_FILE) +
@@ -1511,15 +1504,22 @@ static void shrink_zone(int priority, struct zone *zone,
 	enum lru_list l;
 	unsigned long nr_reclaimed = sc->nr_reclaimed;
 	unsigned long swap_cluster_max = sc->swap_cluster_max;
+	int noswap = 0;
 
-	get_scan_ratio(zone, sc, percent);
+	/* If we have no swap space, do not bother scanning anon pages. */
+	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+		noswap = 1;
+		percent[0] = 0;
+		percent[1] = 100;
+	} else
+		get_scan_ratio(zone, sc, percent);
 
 	for_each_evictable_lru(l) {
 		int file = is_file_lru(l);
 		unsigned long scan;
 
 		scan = zone_nr_pages(zone, sc, l);
-		if (priority) {
+		if (priority || noswap) {
 			scan >>= priority;
 			scan = (scan * percent[file]) / 100;
 		}
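A userspace model of the patched computation shows the effect of the new noswap flag. This is a sketch only: `scan_targets_new()` is a hypothetical helper, the real loop walks all evictable LRUs (active and inactive, anon and file) rather than the two lists modeled here, and `ratio` stands in for whatever get_scan_ratio() would return. Index 0 is anon and index 1 is file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched scan-target logic in shrink_zone(). */
static void scan_targets_new(int priority, bool may_swap, long nr_swap_pages,
			     const unsigned long lru_pages[2],
			     const unsigned long ratio[2],
			     unsigned long scan_out[2])
{
	unsigned long percent[2];
	bool noswap = false;

	assert(priority >= 0);
	/* If we have no swap space, do not bother scanning anon pages. */
	if (!may_swap || nr_swap_pages <= 0) {
		noswap = true;
		percent[0] = 0;
		percent[1] = 100;
	} else {
		percent[0] = ratio[0];	/* models get_scan_ratio() output */
		percent[1] = ratio[1];
	}

	for (int file = 0; file < 2; file++) {
		unsigned long scan = lru_pages[file];

		/* noswap now forces the percentages even at priority 0. */
		if (priority || noswap) {
			scan >>= priority;
			scan = (scan * percent[file]) / 100;
		}
		scan_out[file] = scan;
	}
}
```

With 10000 pages on each list and may_swap == 0 at priority 0, the model scans 0 anon pages and all 10000 file pages. With swap allowed, the priority-0 path still ignores the ratio and scans both lists in full, as before the patch.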


* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  7:13         ` [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg) Daisuke Nishimura
@ 2009-06-09  7:20           ` KOSAKI Motohiro
  2009-06-09  7:48             ` Minchan Kim
  2009-06-09  7:28           ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-06-09  7:20 UTC
  To: Daisuke Nishimura
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Balbir Singh, KAMEZAWA Hiroyuki

> > > And reclaiming too many pages is not only a memcg issue, so I don't think
> > > this patch provides a generic solution.
> > > 
> > Ah, you're right. It's not only a memcg issue.
> > 
> How about this one?
> 
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> Commit 2e2e425989080cc534fc0fca154cae515f971cf5 ("vmscan,memcg: reintroduce
> sc->may_swap") added the may_swap flag and handled it in get_scan_ratio().
> 
> But the result of get_scan_ratio() is ignored when priority == 0,
> so the anon LRU is scanned even if may_swap == 0 or nr_swap_pages == 0.
> IMHO, this is not the expected behavior.
> 
> For memcg in particular, this behavior causes a great many pages to be
> swapped out in vain when OOM is triggered by the mem+swap limit.
> 
> This patch makes the handling of the may_swap flag stricter.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Looks great.
Your patch improves not only memcg but also systems without swap.

Thanks.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>







* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  7:13         ` [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg) Daisuke Nishimura
  2009-06-09  7:20           ` KOSAKI Motohiro
@ 2009-06-09  7:28           ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-09  7:28 UTC
  To: Daisuke Nishimura
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Balbir Singh

On Tue, 9 Jun 2009 16:13:30 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> > > And reclaiming too many pages is not only a memcg issue, so I don't think
> > > this patch provides a generic solution.
> > > 
> > Ah, you're right. It's not only a memcg issue.
> > 
> How about this one?
> 
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> Commit 2e2e425989080cc534fc0fca154cae515f971cf5 ("vmscan,memcg: reintroduce
> sc->may_swap") added the may_swap flag and handled it in get_scan_ratio().
> 
> But the result of get_scan_ratio() is ignored when priority == 0,
> so the anon LRU is scanned even if may_swap == 0 or nr_swap_pages == 0.
> IMHO, this is not the expected behavior.
> 
> For memcg in particular, this behavior causes a great many pages to be
> swapped out in vain when OOM is triggered by the mem+swap limit.
> 
> This patch makes the handling of the may_swap flag stricter.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Thanks,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  7:20           ` KOSAKI Motohiro
@ 2009-06-09  7:48             ` Minchan Kim
  2009-06-09  7:58               ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2009-06-09  7:48 UTC
  To: KOSAKI Motohiro
  Cc: Daisuke Nishimura, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki

Hi, KOSAKI.

As you know, this problem is caused by the if (priority) condition in
shrink_zone(). Let me ask a question.

Why do we skip the scan value calculation when the priority is zero?
As far as I know, we didn't do that before split-LRU.

Is there any specific issue when the priority is zero?




-- 
Kind regards,
Minchan Kim


* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH  mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  7:48             ` Minchan Kim
@ 2009-06-09  7:58               ` KOSAKI Motohiro
  2009-06-09  8:19                 ` Minchan Kim
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-06-09  7:58 UTC
  To: Minchan Kim
  Cc: kosaki.motohiro, Daisuke Nishimura, LKML, linux-mm,
	Andrew Morton, Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki

> Hi, KOSAKI.
> 
> As you know, this problem is caused by the if (priority) condition in
> shrink_zone(). Let me ask a question.
> 
> Why do we skip the scan value calculation when the priority is zero?
> As far as I know, we didn't do that before split-LRU.
> 
> Is there any specific issue when the priority is zero?

Yes.

Example:

Suppose get_scan_ratio() returns anon: 80%, file: 20%, and the system has
10000 anon pages and 10000 file pages.

Then shrink_zone() would pick up 8000 anon pages and 2000 file pages,
which means 8000 file pages would not be scanned at all.

Oops, that can invoke the OOM killer even though the system has droppable file cache.
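The arithmetic in this example can be checked with a few lines of C. The helper name is hypothetical; it models what would happen if the get_scan_ratio() percentages were applied at priority 0 (that is, without the `if (priority)` guard), with index 0 for anon and index 1 for file.

```c
#include <assert.h>

/*
 * Hypothetical helper: per-list scan counts at priority 0 if the
 * get_scan_ratio() percentages were applied. At priority 0 the shift
 * (scan >>= priority) is a no-op, so only the percentage matters.
 */
static void scan_at_prio0_with_ratio(const unsigned long pages[2],
				     const unsigned long percent[2],
				     unsigned long scan[2])
{
	assert(percent[0] + percent[1] == 100);
	for (int i = 0; i < 2; i++)
		scan[i] = pages[i] * percent[i] / 100;
}
```

With pages = {10000, 10000} and percent = {80, 20}, this yields scan = {8000, 2000}, leaving 10000 - 2000 = 8000 file pages unscanned.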




* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  7:58               ` KOSAKI Motohiro
@ 2009-06-09  8:19                 ` Minchan Kim
  2009-06-09  8:24                   ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2009-06-09  8:19 UTC
  To: KOSAKI Motohiro
  Cc: Daisuke Nishimura, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki

On Tue, Jun 9, 2009 at 4:58 PM, KOSAKI
Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote:
>> Hi, KOSAKI.
>>
>> As you know, this problem is caused by the if (priority) condition in
>> shrink_zone(). Let me ask a question.
>>
>> Why do we skip the scan value calculation when the priority is zero?
>> As far as I know, we didn't do that before split-LRU.
>>
>> Is there any specific issue when the priority is zero?
>
> Yes.
>
> Example:
>
> Suppose get_scan_ratio() returns anon: 80%, file: 20%, and the system has
> 10000 anon pages and 10000 file pages.
>
> Then shrink_zone() would pick up 8000 anon pages and 2000 file pages,
> which means 8000 file pages would not be scanned at all.
>
> Oops, that can invoke the OOM killer even though the system has droppable file cache.
>
Hmm.. Can that problem happen in a real system?
A big file ratio means that the file LRU list is scanned a lot but
rotated little.
It means the file LRU has few reclaimable pages.

Isn't it? I am confused.
Could you elaborate, if you don't mind?

-- 
Kind regards,
Minchan Kim
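For reference, the anon/file percentages discussed here are derived from recent scan and rotation counts. The sketch below reconstructs the core balancing arithmetic of get_scan_ratio() in the split-LRU code of this period from memory, so treat the exact formula as an assumption; it omits other adjustments the real function makes, and `struct reclaim_stat_model` and `scan_ratio()` are hypothetical names. A list that is scanned often but rarely rotated gets a larger share.

```c
#include <assert.h>

/* Hypothetical model of the reclaim statistics; 0 = anon, 1 = file. */
struct reclaim_stat_model {
	unsigned long recent_scanned[2];
	unsigned long recent_rotated[2];
};

/* Assumed form of the anon/file balancing in get_scan_ratio(). */
static void scan_ratio(int swappiness, const struct reclaim_stat_model *rs,
		       unsigned long percent[2])
{
	assert(swappiness >= 0 && swappiness <= 100);

	unsigned long anon_prio = swappiness;
	unsigned long file_prio = 200 - swappiness;
	unsigned long ap, fp;

	/* Many scans with few rotations make a list look cheap to reclaim. */
	ap = (anon_prio + 1) * (rs->recent_scanned[0] + 1);
	ap /= rs->recent_rotated[0] + 1;
	fp = (file_prio + 1) * (rs->recent_scanned[1] + 1);
	fp /= rs->recent_rotated[1] + 1;

	/* Normalize to percentages. */
	percent[0] = 100 * ap / (ap + fp + 1);
	percent[1] = 100 - percent[0];
}
```

With swappiness 60 and equal, rotation-free history on both lists (100 scanned, 0 rotated each), this gives anon 30% and file 70%; if nearly every scanned anon page was rotated, the anon share drops to 0%.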


* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH  mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  8:19                 ` Minchan Kim
@ 2009-06-09  8:24                   ` KOSAKI Motohiro
  2009-06-09  8:35                     ` Minchan Kim
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-06-09  8:24 UTC
  To: Minchan Kim
  Cc: kosaki.motohiro, Daisuke Nishimura, LKML, linux-mm,
	Andrew Morton, Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki

> On Tue, Jun 9, 2009 at 4:58 PM, KOSAKI
> Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote:
> >> Hi, KOSAKI.
> >>
> >> As you know, this problem is caused by the if (priority) condition in
> >> shrink_zone(). Let me ask a question.
> >>
> >> Why do we skip the scan value calculation when the priority is zero?
> >> As far as I know, we didn't do that before split-LRU.
> >>
> >> Is there any specific issue when the priority is zero?
> >
> > Yes.
> >
> > Example:
> >
> > Suppose get_scan_ratio() returns anon: 80%, file: 20%, and the system has
> > 10000 anon pages and 10000 file pages.
> >
> > Then shrink_zone() would pick up 8000 anon pages and 2000 file pages,
> > which means 8000 file pages would not be scanned at all.
> >
> > Oops, that can invoke the OOM killer even though the system has droppable file cache.
> >
> Hmm.. Can that problem happen in a real system?
> A big file ratio means that the file LRU list is scanned a lot but
> rotated little.
> It means the file LRU has few reclaimable pages.
> 
> Isn't it? I am confused.
> Could you elaborate, if you don't mind?

Hm, OK, my example was wrong.
My intention is: if there are droppable file-backed pages (even just one page),
the OOM killer shouldn't be invoked.

Whether there are many or few is irrelevant.





* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  8:24                   ` KOSAKI Motohiro
@ 2009-06-09  8:35                     ` Minchan Kim
  2009-06-09  8:37                       ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2009-06-09  8:35 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Daisuke Nishimura, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki, Rik van Riel

On Tue, Jun 9, 2009 at 5:24 PM, KOSAKI
Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote:
>> On Tue, Jun 9, 2009 at 4:58 PM, KOSAKI
>> Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote:
>> >> Hi, KOSAKI.
>> >>
>> >> As you know, this problem is caused by the if condition (priority) in shrink_zone().
>> >> Let me ask a question.
>> >>
>> >> Why do we have to skip the scan value calculation when the priority is zero?
>> >> As far as I know, we didn't do that before split-LRU.
>> >>
>> >> Is there a specific issue when the priority is zero?
>> >
>> > Yes.
>> >
>> > Example:
>> >
>> > get_scan_ratio() returns anon 80%, file 20%, and the system has
>> > 10000 anon pages and 10000 file pages.
>> >
>> > shrink_zone() picks up 8000 anon pages and 2000 file pages.
>> > This means 8000 file pages aren't scanned at all.
>> >
>> > Oops, this can invoke the OOM killer even though the system has droppable file cache.
>> >
>> Hmm... can that problem happen on a real system?
>> A big file ratio means that the file LRU list is scanned a lot but
>> rotated little.
>> That means the file LRU has few reclaimable pages.
>>
>> Isn't that right? I am confused.
>> Could you elaborate, if you don't mind?
>
> Hmm, OK, my example was wrong.
> My intention is: if there are droppable file-backed pages (even just one page),
> the OOM killer shouldn't be invoked.
>
> Whether there are many or few is unrelated.
>

I am not sure that is effective.
Have you ever hit this problem in a real situation?

BTW, I have to dive into the code. :)
Thanks for spending your valuable time commenting.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH  mmotm] vmscan: fix may_swap handling for memcg)
  2009-06-09  8:35                     ` Minchan Kim
@ 2009-06-09  8:37                       ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-06-09  8:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Daisuke Nishimura, LKML, linux-mm,
	Andrew Morton, Johannes Weiner, Balbir Singh, KAMEZAWA Hiroyuki,
	Rik van Riel

> On Tue, Jun 9, 2009 at 5:24 PM, KOSAKI
> Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote:
> >> On Tue, Jun 9, 2009 at 4:58 PM, KOSAKI
> >> Motohiro<kosaki.motohiro@jp.fujitsu.com> wrote:
> >> >> Hi, KOSAKI.
> >> >>
> >> >> As you know, this problem is caused by the if condition (priority) in shrink_zone().
> >> >> Let me ask a question.
> >> >>
> >> >> Why do we have to skip the scan value calculation when the priority is zero?
> >> >> As far as I know, we didn't do that before split-LRU.
> >> >>
> >> >> Is there a specific issue when the priority is zero?
> >> >
> >> > Yes.
> >> >
> >> > Example:
> >> >
> >> > get_scan_ratio() returns anon 80%, file 20%, and the system has
> >> > 10000 anon pages and 10000 file pages.
> >> >
> >> > shrink_zone() picks up 8000 anon pages and 2000 file pages.
> >> > This means 8000 file pages aren't scanned at all.
> >> >
> >> > Oops, this can invoke the OOM killer even though the system has droppable file cache.
> >> >
> >> Hmm... can that problem happen on a real system?
> >> A big file ratio means that the file LRU list is scanned a lot but
> >> rotated little.
> >> That means the file LRU has few reclaimable pages.
> >>
> >> Isn't that right? I am confused.
> >> Could you elaborate, if you don't mind?
> >
> > Hmm, OK, my example was wrong.
> > My intention is: if there are droppable file-backed pages (even just one page),
> > the OOM killer shouldn't be invoked.
> >
> > Whether there are many or few is unrelated.
> >
> 
> I am not sure that is effective.
> Have you ever hit this problem in a real situation?

No.
It's only a stress-workload issue, but I think the VM subsystem should
work under stress workloads too.


> BTW, I have to dive into the code. :)
> Thanks for spending your valuable time commenting.






end of thread

Thread overview: 14+ messages
2009-06-08  3:02 [PATCH mmotm] vmscan: fix may_swap handling for memcg Daisuke Nishimura
2009-06-08  3:20 ` KOSAKI Motohiro
2009-06-08  6:39   ` Daisuke Nishimura
2009-06-08  6:53     ` KOSAKI Motohiro
2009-06-08  7:54       ` Daisuke Nishimura
2009-06-09  7:13         ` [PATCH mmotm] vmscan: handle may_swap more strictly (Re: [PATCH mmotm] vmscan: fix may_swap handling for memcg) Daisuke Nishimura
2009-06-09  7:20           ` KOSAKI Motohiro
2009-06-09  7:48             ` Minchan Kim
2009-06-09  7:58               ` KOSAKI Motohiro
2009-06-09  8:19                 ` Minchan Kim
2009-06-09  8:24                   ` KOSAKI Motohiro
2009-06-09  8:35                     ` Minchan Kim
2009-06-09  8:37                       ` KOSAKI Motohiro
2009-06-09  7:28           ` KAMEZAWA Hiroyuki
