linux-mm.kvack.org archive mirror
* [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
@ 2011-06-09  0:30 KAMEZAWA Hiroyuki
  2011-06-09  1:30 ` Daisuke Nishimura
  2011-06-10  8:12 ` Michal Hocko
  0 siblings, 2 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-09  0:30 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, nishimura, Michal Hocko, bsingharora, Ying Han



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-09  0:30 [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining KAMEZAWA Hiroyuki
@ 2011-06-09  1:30 ` Daisuke Nishimura
  2011-06-10  8:12 ` Michal Hocko
  1 sibling, 0 replies; 9+ messages in thread
From: Daisuke Nishimura @ 2011-06-09  1:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, Michal Hocko, bsingharora,
	Ying Han, Daisuke Nishimura

On Thu, 9 Jun 2011 09:30:45 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> From 0ebd8a90a91d50c512e7c63e5529a22e44e84c42 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Wed, 8 Jun 2011 13:51:11 +0900
> Subject: [PATCH] Fix behavior of per-cpu charge cache draining in memcg.
> 
> For performance, memory cgroup caches some "charge" from res_counter
> into a per-cpu cache. This works well, but because it's a cache,
> it needs to be flushed in some cases. Typical cases are
> 	1. when someone hits the limit.
> 	2. when rmdir() is called and charges need to be 0.
> 
> But "1" has a problem.
> 
> Recently, on large SMP machines, we see many kworker runs because
> of flushing memcg's cache. The bad things in the implementation are
> 
> a) it's called before calling try_to_free_mem_cgroup_pages(),
>    so it's called immediately when a task hits the limit.
>    (I thought it was better to avoid running into memory reclaim,
>     but that was a wrong decision.)
> 
> b) Even if a cpu contains a cache for a memcg not related to
>    the memcg which hits the limit, the drain code is called.
> 
> This patch fixes a) and b) by
> 
> A) delay the flush until after one run of try_to_free...
>    Then, the number of calls is decreased.
> B) check whether the percpu cache contains useful data or not.
> plus
> C) check that asynchronous percpu draining isn't already running.
> 
> BTW, the reason this patch replaces the atomic_t counter with a mutex
> is to guarantee that the memcg pointed to by stock->cached is
> not destroyed while we check css_id.
> 
> Reported-by: Ying Han <yinghan@google.com>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
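
As a rough sketch, the b)/B) and C) logic described above amounts to
something like the following (reconstructed from the changelog and the
hunks quoted later in this thread, not the literal patch; css_is_ancestor(),
use_hierarchy and the per-cpu memcg_stock fields are existing memcg/cgroup
facilities of that era, but details may differ):

	static void drain_all_stock_async(struct mem_cgroup *root_mem)
	{
		int cpu;

		get_online_cpus();
		for_each_online_cpu(cpu) {
			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
			/* (locking/rcu details omitted in this sketch) */
			struct mem_cgroup *mem = stock->cached;

			/* B) skip cpus whose cache holds no charge useful to us */
			if (!mem || !stock->nr_pages)
				continue;
			if (mem != root_mem) {
				if (!root_mem->use_hierarchy)
					continue;
				/* is "mem" under the tree of "root_mem"? */
				if (!css_is_ancestor(&mem->css, &root_mem->css))
					continue;
			}
			/* C) don't schedule a second drain while one is running */
			if (!test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
				schedule_work_on(cpu, &stock->work);
		}
		put_online_cpus();
	}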

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-09  0:30 [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining KAMEZAWA Hiroyuki
  2011-06-09  1:30 ` Daisuke Nishimura
@ 2011-06-10  8:12 ` Michal Hocko
  2011-06-10  8:39   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-06-10  8:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, nishimura, bsingharora, Ying Han

On Thu 09-06-11 09:30:45, KAMEZAWA Hiroyuki wrote:
> From 0ebd8a90a91d50c512e7c63e5529a22e44e84c42 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Wed, 8 Jun 2011 13:51:11 +0900
> Subject: [PATCH] Fix behavior of per-cpu charge cache draining in memcg.
> 
> For performance, memory cgroup caches some "charge" from res_counter
> into a per-cpu cache. This works well, but because it's a cache,
> it needs to be flushed in some cases. Typical cases are
> 	1. when someone hits the limit.
> 	2. when rmdir() is called and charges need to be 0.
> 
> But "1" has a problem.
> 
> Recently, on large SMP machines, we see many kworker runs because
> of flushing memcg's cache. The bad things in the implementation are
> 
> a) it's called before calling try_to_free_mem_cgroup_pages(),
>    so it's called immediately when a task hits the limit.
>    (I thought it was better to avoid running into memory reclaim,
>     but that was a wrong decision.)
> 
> b) Even if a cpu contains a cache for a memcg not related to
>    the memcg which hits the limit, the drain code is called.
> 
> This patch fixes a) and b) by
> 
> A) delay the flush until after one run of try_to_free...
>    Then, the number of calls is decreased.
> B) check whether the percpu cache contains useful data or not.
> plus
> C) check that asynchronous percpu draining isn't already running.
> 
> BTW, the reason this patch replaces the atomic_t counter with a mutex
> is to guarantee that the memcg pointed to by stock->cached is
> not destroyed while we check css_id.
> 
> Reported-by: Ying Han <yinghan@google.com>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Changelog:
>  - fixed a typo.
>  - fixed rcu_read_lock() and added strict mutual exclusion between
>    asynchronous and synchronous flushing. It's required for the
>    validity of the cached pointer.
>  - added a root_mem->use_hierarchy check.
> ---
>  mm/memcontrol.c |   54 +++++++++++++++++++++++++++++++++++-------------------
>  1 files changed, 35 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bd9052a..3baddcb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
>  static struct mem_cgroup_per_zone *
>  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>  		victim = mem_cgroup_select_victim(root_mem);
>  		if (victim == root_mem) {
>  			loop++;
> -			if (loop >= 1)
> -				drain_all_stock_async();
>  			if (loop >= 2) {
>  				/*
>  				 * If we have not been able to reclaim
> @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>  				return total;
>  		} else if (mem_cgroup_margin(root_mem))
>  			return total;
> +		drain_all_stock_async(root_mem);
>  	}
>  	return total;
>  }

I still think that we pointlessly reclaim even though we could have a
lot of pages pre-charged in the cache (the more CPUs we have the more
significant this might be).
Now that drain_all_stock_async is more targeted with your patch, doesn't
it make sense to call it before we start whatever reclaim and call
mem_cgroup_margin right after?
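
Roughly, the suggested ordering would be (an illustrative sketch of the
suggestion, not actual patch code):

	/* flush pcp caches first; reclaim only if that is not enough */
	drain_all_stock_async(root_mem);
	if (mem_cgroup_margin(root_mem))
		return total;	/* the cached charges alone made room */
	/* ... only then fall into try_to_free_mem_cgroup_pages() ... */
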
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-10  8:12 ` Michal Hocko
@ 2011-06-10  8:39   ` KAMEZAWA Hiroyuki
  2011-06-10  9:08     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-10  8:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, akpm, nishimura, bsingharora, Ying Han

On Fri, 10 Jun 2011 10:12:19 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 09-06-11 09:30:45, KAMEZAWA Hiroyuki wrote:
> > From 0ebd8a90a91d50c512e7c63e5529a22e44e84c42 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Wed, 8 Jun 2011 13:51:11 +0900
> > Subject: [PATCH] Fix behavior of per-cpu charge cache draining in memcg.
> > 
> > For performance, memory cgroup caches some "charge" from res_counter
> > into a per-cpu cache. This works well, but because it's a cache,
> > it needs to be flushed in some cases. Typical cases are
> > 	1. when someone hits the limit.
> > 	2. when rmdir() is called and charges need to be 0.
> > 
> > But "1" has a problem.
> > 
> > Recently, on large SMP machines, we see many kworker runs because
> > of flushing memcg's cache. The bad things in the implementation are
> > 
> > a) it's called before calling try_to_free_mem_cgroup_pages(),
> >    so it's called immediately when a task hits the limit.
> >    (I thought it was better to avoid running into memory reclaim,
> >     but that was a wrong decision.)
> > 
> > b) Even if a cpu contains a cache for a memcg not related to
> >    the memcg which hits the limit, the drain code is called.
> > 
> > This patch fixes a) and b) by
> > 
> > A) delay the flush until after one run of try_to_free...
> >    Then, the number of calls is decreased.
> > B) check whether the percpu cache contains useful data or not.
> > plus
> > C) check that asynchronous percpu draining isn't already running.
> > 
> > BTW, the reason this patch replaces the atomic_t counter with a mutex
> > is to guarantee that the memcg pointed to by stock->cached is
> > not destroyed while we check css_id.
> > 
> > Reported-by: Ying Han <yinghan@google.com>
> > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Changelog:
> >  - fixed a typo.
> >  - fixed rcu_read_lock() and added strict mutual exclusion between
> >    asynchronous and synchronous flushing. It's required for the
> >    validity of the cached pointer.
> >  - added a root_mem->use_hierarchy check.
> > ---
> >  mm/memcontrol.c |   54 +++++++++++++++++++++++++++++++++++-------------------
> >  1 files changed, 35 insertions(+), 19 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bd9052a..3baddcb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> [...]
> >  static struct mem_cgroup_per_zone *
> >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> >  		victim = mem_cgroup_select_victim(root_mem);
> >  		if (victim == root_mem) {
> >  			loop++;
> > -			if (loop >= 1)
> > -				drain_all_stock_async();
> >  			if (loop >= 2) {
> >  				/*
> >  				 * If we have not been able to reclaim
> > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> >  				return total;
> >  		} else if (mem_cgroup_margin(root_mem))
> >  			return total;
> > +		drain_all_stock_async(root_mem);
> >  	}
> >  	return total;
> >  }
> 
> I still think that we pointlessly reclaim even though we could have a
> lot of pages pre-charged in the cache (the more CPUs we have the more
> significant this might be).

The more CPUs, the more scan cost for each per-cpu memory, which causes
cache misses.

I know the placement of drain_all_stock_async() is not a big problem on my
host, which has 2-socket/8-core cpus. But assuming a 1000+ cpu host,
"when you hit the limit, you'll see 1000*128 bytes of cache misses and need
to call test_and_set for 1000+ cpus in the bad case" doesn't seem like much
of a win.

If we implement "call-drain-only-nearby-cpus", I think we can call it before
calling try_to_free_mem_cgroup_pages(). I'll add it to my TO-DO-LIST.
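
As an illustration, such a nearby-cpu filter might look roughly like this
(hypothetical sketch; cpu_to_node() and numa_node_id() are the standard
kernel topology helpers, but none of this is in the posted patch):

	/* hypothetical: drain only cpus on the local node/socket */
	for_each_online_cpu(cpu) {
		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

		if (cpu_to_node(cpu) != numa_node_id())
			continue;	/* skip remote cpus entirely */
		if (!test_and_set_bit(ASYNC_FLUSHING, &stock->flags))
			schedule_work_on(cpu, &stock->work);
	}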

What do you think?

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-10  8:39   ` KAMEZAWA Hiroyuki
@ 2011-06-10  9:08     ` Michal Hocko
  2011-06-10  9:59       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-06-10  9:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, nishimura, bsingharora, Ying Han

On Fri 10-06-11 17:39:58, KAMEZAWA Hiroyuki wrote:
> On Fri, 10 Jun 2011 10:12:19 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Thu 09-06-11 09:30:45, KAMEZAWA Hiroyuki wrote:
[...]
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index bd9052a..3baddcb 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > [...]
> > >  static struct mem_cgroup_per_zone *
> > >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > >  		victim = mem_cgroup_select_victim(root_mem);
> > >  		if (victim == root_mem) {
> > >  			loop++;
> > > -			if (loop >= 1)
> > > -				drain_all_stock_async();
> > >  			if (loop >= 2) {
> > >  				/*
> > >  				 * If we have not been able to reclaim
> > > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > >  				return total;
> > >  		} else if (mem_cgroup_margin(root_mem))
> > >  			return total;
> > > +		drain_all_stock_async(root_mem);
> > >  	}
> > >  	return total;
> > >  }
> > 
> > I still think that we pointlessly reclaim even though we could have a
> > lot of pages pre-charged in the cache (the more CPUs we have the more
> > significant this might be).
> 
> The more CPUs, the more scan cost for each per-cpu memory, which causes
> cache misses.
> 
> I know the placement of drain_all_stock_async() is not a big problem on my
> host, which has 2-socket/8-core cpus. But assuming a 1000+ cpu host,

Hmm, it really depends on what you want to optimize for. The reclaim path
is already a slow path, and cache misses, while not good, are not the most
significant issue, I guess.
What I would see as a much bigger problem is that there might be a lot
of memory pre-charged in those per-cpu caches. Falling into reclaim
costs us much more IMO, and we can evict something that could be useful
for no good reason.

> "when you hit limit, you'll see 1000*128bytes cache miss and need to
> call test_and_set for 1000+ cpus in bad case." doesn't seem much win.
> 
> If we implement "call-drain-only-nearby-cpus", I think we can call it before
> calling try_to_free_mem_cgroup_pages(). I'll add it to my TO-DO-LIST.

Would it just consider cpus on the same node?

> How do you think ?

I am afraid we would need two versions then: one for complete draining
(rmdir and company) and the other for reclaim purposes. That sounds
like more code complexity.
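
One way to keep that complexity down might be a single worker with a sync
flag, along these lines (hypothetical sketch, not posted code):

	/* one implementation, two entry points */
	static void drain_all_stock(struct mem_cgroup *root_mem, bool sync);

	/* reclaim path: fire-and-forget per-cpu works */
	#define drain_all_stock_async(mem)	drain_all_stock((mem), false)
	/* rmdir path: additionally flush_work() each scheduled work */
	#define drain_all_stock_sync(mem)	drain_all_stock((mem), true)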

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-10  9:08     ` Michal Hocko
@ 2011-06-10  9:59       ` KAMEZAWA Hiroyuki
  2011-06-10 11:04         ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-06-10  9:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, akpm, nishimura, bsingharora, Ying Han

On Fri, 10 Jun 2011 11:08:02 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> On Fri 10-06-11 17:39:58, KAMEZAWA Hiroyuki wrote:
> > On Fri, 10 Jun 2011 10:12:19 +0200
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > On Thu 09-06-11 09:30:45, KAMEZAWA Hiroyuki wrote:
> [...]
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index bd9052a..3baddcb 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > [...]
> > > >  static struct mem_cgroup_per_zone *
> > > >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > > > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > > >  		victim = mem_cgroup_select_victim(root_mem);
> > > >  		if (victim == root_mem) {
> > > >  			loop++;
> > > > -			if (loop >= 1)
> > > > -				drain_all_stock_async();
> > > >  			if (loop >= 2) {
> > > >  				/*
> > > >  				 * If we have not been able to reclaim
> > > > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > > >  				return total;
> > > >  		} else if (mem_cgroup_margin(root_mem))
> > > >  			return total;
> > > > +		drain_all_stock_async(root_mem);
> > > >  	}
> > > >  	return total;
> > > >  }
> > > 
> > > I still think that we pointlessly reclaim even though we could have a
> > > lot of pages pre-charged in the cache (the more CPUs we have the more
> > > significant this might be).
> > 
> > The more CPUs, the more scan cost for each per-cpu memory, which causes
> > cache misses.
> > 
> > I know the placement of drain_all_stock_async() is not a big problem on
> > my host, which has 2-socket/8-core cpus. But assuming a 1000+ cpu host,
> 
> Hmm, it really depends on what you want to optimize for. The reclaim path
> is already a slow path, and cache misses, while not good, are not the most
> significant issue, I guess.
> What I would see as a much bigger problem is that there might be a lot
> of memory pre-charged in those per-cpu caches. Falling into reclaim
> costs us much more IMO, and we can evict something that could be useful
> for no good reason.
> 

It's a waste of time to talk about this kind of thing without numbers.

OK, I won't change the caller's logic. Let's discuss this when someone gets
numbers from a LARGE smp box. The updated patch is attached. It was tested
on my host without problems, and kworker runs seem much reduced in my test
with "cat".

When I run "cat 1Gfile > /dev/null" under a 300M-limit memcg:

[Before]
13767 kamezawa  20   0 98.6m  424  416 D 10.0  0.0   0:00.61 cat
   58 root      20   0     0    0    0 S  0.6  0.0   0:00.09 kworker/2:1
   60 root      20   0     0    0    0 S  0.6  0.0   0:00.08 kworker/4:1
    4 root      20   0     0    0    0 S  0.3  0.0   0:00.02 kworker/0:0
   57 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/1:1
   61 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/5:1
   62 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/6:1
   63 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/7:1

[After]
 2676 root      20   0 98.6m  416  416 D  9.3  0.0   0:00.87 cat
 2626 kamezawa  20   0 15192 1312  920 R  0.3  0.0   0:00.28 top
    1 root      20   0 19384 1496 1204 S  0.0  0.0   0:00.66 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0



> > "when you hit limit, you'll see 1000*128bytes cache miss and need to
> > call test_and_set for 1000+ cpus in bad case." doesn't seem much win.
> > 
> > If we implement "call-drain-only-nearby-cpus", I think we can call it before
> > calling try_to_free_mem_cgroup_pages(). I'll add it to my TO-DO-LIST.
> 
> Would it just consider cpus on the same node?
> 
Just on the same socket. Anyway, keep-margin-in-background will be the
final help for large SMP.


Please test/ack if OK.
==

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-10  9:59       ` KAMEZAWA Hiroyuki
@ 2011-06-10 11:04         ` Michal Hocko
  2011-06-10 12:24           ` Hiroyuki Kamezawa
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-06-10 11:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, akpm, nishimura, bsingharora, Ying Han

On Fri 10-06-11 18:59:52, KAMEZAWA Hiroyuki wrote:
> On Fri, 10 Jun 2011 11:08:02 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Fri 10-06-11 17:39:58, KAMEZAWA Hiroyuki wrote:
> > > On Fri, 10 Jun 2011 10:12:19 +0200
> > > Michal Hocko <mhocko@suse.cz> wrote:
> > > 
> > > > On Thu 09-06-11 09:30:45, KAMEZAWA Hiroyuki wrote:
> > [...]
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index bd9052a..3baddcb 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > [...]
> > > > >  static struct mem_cgroup_per_zone *
> > > > >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > > > > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > > > >  		victim = mem_cgroup_select_victim(root_mem);
> > > > >  		if (victim == root_mem) {
> > > > >  			loop++;
> > > > > -			if (loop >= 1)
> > > > > -				drain_all_stock_async();
> > > > >  			if (loop >= 2) {
> > > > >  				/*
> > > > >  				 * If we have not been able to reclaim
> > > > > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> > > > >  				return total;
> > > > >  		} else if (mem_cgroup_margin(root_mem))
> > > > >  			return total;
> > > > > +		drain_all_stock_async(root_mem);
> > > > >  	}
> > > > >  	return total;
> > > > >  }
> > > > 
> > > > I still think that we pointlessly reclaim even though we could have a
> > > > lot of pages pre-charged in the cache (the more CPUs we have the more
> > > > significant this might be).
> > > 
> > > The more CPUs, the more scan cost for each per-cpu memory, which makes
> > > cache-miss.
> > > 
> > > I know placement of drain_all_stock_async() is not big problem on my host,
> > > which has 2socket/8core cpus. But, assuming 1000+ cpu host, 
> > 
> > Hmm, it really depends what you want to optimize for. Reclaim path is
> > already slow path and cache misses, while not good, are not the most
> > significant issue, I guess.
> > What I would see as a much bigger problem is that there might be a lot
> > of memory pre-charged at those per-cpu caches. Falling into a reclaim
> > costs us much more IMO and we can evict something that could be useful
> > for no good reason.
> > 
> 
> It's a waste of time to talk about this kind of thing without numbers.
> 
> OK, I won't change the caller's logic. Let's discuss this when someone gets
> numbers from a LARGE smp box.

Sounds reasonable.

[...]
> Please test/ack if OK.

See comment below.
Reviewed-by: Michal Hocko <mhocko@suse.cz>

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bd9052a..75713cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -359,7 +359,7 @@ enum charge_type {
>  static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> -static void drain_all_stock_async(void);
> +static void drain_all_stock_async(struct mem_cgroup *mem);
>  
>  static struct mem_cgroup_per_zone *
>  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> @@ -1670,8 +1670,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>  		victim = mem_cgroup_select_victim(root_mem);
>  		if (victim == root_mem) {
>  			loop++;
> -			if (loop >= 1)
> -				drain_all_stock_async();
> +			drain_all_stock_async(root_mem);
>  			if (loop >= 2) {
>  				/*
>  				 * If we have not been able to reclaim

This still doesn't prevent direct reclaim even though we have freed
enough pages from the pcp caches. Should I post it as a separate patch?

> @@ -1934,9 +1933,12 @@ struct memcg_stock_pcp {
>  	struct mem_cgroup *cached; /* this never be root cgroup */
>  	unsigned int nr_pages;
>  	struct work_struct work;
> +	unsigned long flags;
> +#define ASYNC_FLUSHING	(0)

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-10 11:04         ` Michal Hocko
@ 2011-06-10 12:24           ` Hiroyuki Kamezawa
  2011-06-10 13:31             ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Hiroyuki Kamezawa @ 2011-06-10 12:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, akpm, nishimura,
	bsingharora, Ying Han

2011/6/10 Michal Hocko <mhocko@suse.cz>:
> On Fri 10-06-11 18:59:52, KAMEZAWA Hiroyuki wrote:
>> On Fri, 10 Jun 2011 11:08:02 +0200
>> Michal Hocko <mhocko@suse.cz> wrote:
>>
>> > On Fri 10-06-11 17:39:58, KAMEZAWA Hiroyuki wrote:
>> > > On Fri, 10 Jun 2011 10:12:19 +0200
>> > > Michal Hocko <mhocko@suse.cz> wrote:
>> > >
>> > > > On Thu 09-06-11 09:30:45, KAMEZAWA Hiroyuki wrote:
>> > [...]
>> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> > > > > index bd9052a..3baddcb 100644
>> > > > > --- a/mm/memcontrol.c
>> > > > > +++ b/mm/memcontrol.c
>> > > > [...]
>> > > > >  static struct mem_cgroup_per_zone *
>> > > > >  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
>> > > > > @@ -1670,8 +1670,6 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>> > > > >               victim = mem_cgroup_select_victim(root_mem);
>> > > > >               if (victim == root_mem) {
>> > > > >                       loop++;
>> > > > > -                     if (loop >= 1)
>> > > > > -                             drain_all_stock_async();
>> > > > >                       if (loop >= 2) {
>> > > > >                               /*
>> > > > >                                * If we have not been able to reclaim
>> > > > > @@ -1723,6 +1721,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>> > > > >                               return total;
>> > > > >               } else if (mem_cgroup_margin(root_mem))
>> > > > >                       return total;
>> > > > > +             drain_all_stock_async(root_mem);
>> > > > >       }
>> > > > >       return total;
>> > > > >  }
>> > > >
>> > > > I still think that we pointlessly reclaim even though we could have a
>> > > > lot of pages pre-charged in the cache (the more CPUs we have the more
>> > > > significant this might be).
>> > >
>> > > The more CPUs, the more scan cost for each per-cpu memory, which causes
>> > > cache misses.
>> > >
>> > > I know the placement of drain_all_stock_async() is not a big problem on
>> > > my host, which has 2-socket/8-core cpus. But assuming a 1000+ cpu host,
>> >
>> > Hmm, it really depends on what you want to optimize for. The reclaim path
>> > is already a slow path, and cache misses, while not good, are not the most
>> > significant issue, I guess.
>> > What I would see as a much bigger problem is that there might be a lot
>> > of memory pre-charged in those per-cpu caches. Falling into reclaim
>> > costs us much more IMO, and we can evict something that could be useful
>> > for no good reason.
>> >
>>
>> It's a waste of time to talk about this kind of thing without numbers.
>>
>> OK, I won't change the caller's logic. Let's discuss this when someone gets
>> numbers from a LARGE smp box.
>
> Sounds reasonable.
>
> [..,]
>> please test/ack if ok.
>
> see comment bellow.
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index bd9052a..75713cb 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -359,7 +359,7 @@ enum charge_type {
>>  static void mem_cgroup_get(struct mem_cgroup *mem);
>>  static void mem_cgroup_put(struct mem_cgroup *mem);
>>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>> -static void drain_all_stock_async(void);
>> +static void drain_all_stock_async(struct mem_cgroup *mem);
>>
>>  static struct mem_cgroup_per_zone *
>>  mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
>> @@ -1670,8 +1670,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>>               victim = mem_cgroup_select_victim(root_mem);
>>               if (victim == root_mem) {
>>                       loop++;
>> -                     if (loop >= 1)
>> -                             drain_all_stock_async();
>> +                     drain_all_stock_async(root_mem);
>>                       if (loop >= 2) {
>>                               /*
>>                                * If we have not been able to reclaim
>
> This still doesn't prevent direct reclaim even though we have freed
> enough pages from the pcp caches. Should I post it as a separate patch?
>

Yes, please, in a different thread. Maybe moving this out of the loop will
make sense. (And I have a cleanup patch for this loop; I'll do that
when I post it later, anyway.)

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining.
  2011-06-10 12:24           ` Hiroyuki Kamezawa
@ 2011-06-10 13:31             ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2011-06-10 13:31 UTC (permalink / raw)
  To: Hiroyuki Kamezawa
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, akpm, nishimura,
	bsingharora, Ying Han

On Fri 10-06-11 21:24:51, Hiroyuki Kamezawa wrote:
> 2011/6/10 Michal Hocko <mhocko@suse.cz>:
> > On Fri 10-06-11 18:59:52, KAMEZAWA Hiroyuki wrote:
[...]
> >> @@ -1670,8 +1670,7 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> >>               victim = mem_cgroup_select_victim(root_mem);
> >>               if (victim == root_mem) {
> >>                       loop++;
> >> -                     if (loop >= 1)
> >> -                             drain_all_stock_async();
> >> +                     drain_all_stock_async(root_mem);
> >>                       if (loop >= 2) {
> >>                               /*
> >>                                * If we have not been able to reclaim
> >
> > This still doesn't prevent direct reclaim even though we have freed
> > enough pages from the pcp caches. Should I post it as a separate patch?
> >
> 
> Yes, please, in a different thread. Maybe moving this out of the loop will
> make sense. (And I have a cleanup patch for this loop; I'll do that
> when I post it later, anyway.)

OK, I will wait for your cleanup then.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-06-10 13:31 UTC | newest]

Thread overview: 9+ messages
-- links below jump to the message on this page --
2011-06-09  0:30 [BUGFIX][PATCH v3] memcg: fix behavior of per cpu charge cache draining KAMEZAWA Hiroyuki
2011-06-09  1:30 ` Daisuke Nishimura
2011-06-10  8:12 ` Michal Hocko
2011-06-10  8:39   ` KAMEZAWA Hiroyuki
2011-06-10  9:08     ` Michal Hocko
2011-06-10  9:59       ` KAMEZAWA Hiroyuki
2011-06-10 11:04         ` Michal Hocko
2011-06-10 12:24           ` Hiroyuki Kamezawa
2011-06-10 13:31             ` Michal Hocko
