[RFC][preview] memcg: reduce lock contention at uncharge by batching

* [RFC][preview] memcg: reduce lock contention at uncharge by batching
@ 2009-08-25  2:25 KAMEZAWA Hiroyuki
  2009-08-25  2:29 ` [RFC][preview] [patch 1/2] memcg: batched uncharge base KAMEZAWA Hiroyuki
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-25  2:25 UTC (permalink / raw)
  To: linux-mm; +Cc: balbir, nishimura

Hi,

This is a preview of a patch set to reduce lock contention on memcg->res_counter.
It performs a series of uncharges as one batch, which reduces contention on the
res_counter lock. It is still under development and is based on 2.6.31-rc7.
I'll rebase it onto mmotm when it is ready.

I only have an 8-CPU (4-core/2-socket) system at the moment. There is no significant speedup, but lock_stat looks good.

Result of a kernel build (time make -j 8):
[Before]
real    2m46.491s
user    4m47.008s
sys     3m32.954s


lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                          &counter->lock:       1167034        1196935           0.52       16291.34      829793.69       18742433       45050576           0.42       30788.81     9490908.36
                          --------------
                          &counter->lock         638151          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
                          &counter->lock         558784          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60
                          --------------
                          &counter->lock         679567          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
                          &counter->lock         517368          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60

[After]
real    2m45.423s
user    4m48.522s
sys     3m29.183s
lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                          &counter->lock:        494955         500859           0.53        9601.11      293501.54       16311201       27502048           0.43       25483.56     6934715.75
                          --------------
                          &counter->lock         427024          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
                          &counter->lock          73835          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
                          --------------
                          &counter->lock         435369          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
                          &counter->lock          65490          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [RFC][preview] [patch 1/2] memcg: batched uncharge base
  2009-08-25  2:25 [RFC][preview] memcg: reduce lock contention at uncharge by batching KAMEZAWA Hiroyuki
@ 2009-08-25  2:29 ` KAMEZAWA Hiroyuki
  2009-08-25  8:07   ` Daisuke Nishimura
  2009-08-25  2:31 ` [RFC][preview][patch 2/2] memcg: uncharge at truncate/unmap in batched manner KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-25  2:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, nishimura


In a massively parallel environment, res_counter can be a performance bottleneck.
This patch is an attempt to reduce lock contention in memcg.

One effective technique for reducing lock contention is to reduce the number of
calls themselves by folding several calls into one, as a batch.

Considering the characteristics of charge and uncharge:
	- charge is done one page at a time via demand paging.
	- uncharge is done either
		- in a continuous run of calls at munmap, truncate, exit, execve...
		- one page at a time via vmscan/paging.

So there seems to be an opportunity to batch uncharges.
This is the base patch for batched uncharge. To avoid passing memcg's
structure around as an argument, it adds the memcg batch-uncharge
information to the task. Please see the start/end usage in the next patch.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |   12 ++++++++++
 include/linux/sched.h      |    8 +++++++
 mm/memcontrol.c            |   51 ++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 68 insertions(+), 3 deletions(-)

Index: linux-2.6.31-rc7/include/linux/memcontrol.h
===================================================================
--- linux-2.6.31-rc7.orig/include/linux/memcontrol.h
+++ linux-2.6.31-rc7/include/linux/memcontrol.h
@@ -54,6 +54,10 @@ extern void mem_cgroup_rotate_lru_list(s
 extern void mem_cgroup_del_lru(struct page *page);
 extern void mem_cgroup_move_lists(struct page *page,
 				  enum lru_list from, enum lru_list to);
+
+extern void mem_cgroup_uncharge_batch_start(void);
+extern void mem_cgroup_uncharge_batch_end(void);
+
 extern void mem_cgroup_uncharge_page(struct page *page);
 extern void mem_cgroup_uncharge_cache_page(struct page *page);
 extern int mem_cgroup_shmem_charge_fallback(struct page *page,
@@ -148,6 +152,14 @@ static inline void mem_cgroup_cancel_cha
 {
 }
 
+static inline void mem_cgroup_uncharge_batch_start(void)
+{
+}
+
+static inline void mem_cgroup_uncharge_batch_end(void)
+{
+}
+
 static inline void mem_cgroup_uncharge_page(struct page *page)
 {
 }
Index: linux-2.6.31-rc7/mm/memcontrol.c
===================================================================
--- linux-2.6.31-rc7.orig/mm/memcontrol.c
+++ linux-2.6.31-rc7/mm/memcontrol.c
@@ -1500,6 +1500,7 @@ __mem_cgroup_uncharge_common(struct page
 	struct page_cgroup *pc;
 	struct mem_cgroup *mem = NULL;
 	struct mem_cgroup_per_zone *mz;
+	struct memcg_batch_info *batch = NULL;
 
 	if (mem_cgroup_disabled())
 		return NULL;
@@ -1537,10 +1538,25 @@ __mem_cgroup_uncharge_common(struct page
 	default:
 		break;
 	}
+	if (current->batch_memcg.batch_mode)
+		batch = &current->batch_memcg;
 
-	res_counter_uncharge(&mem->res, PAGE_SIZE);
-	if (do_swap_account && (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
-		res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+	if (!batch || batch->memcg != mem) {
+		res_counter_uncharge(&mem->res, PAGE_SIZE);
+		if (do_swap_account &&
+		    (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
+			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+		if (batch) {
+			batch->memcg = mem;
+			css_get(&mem->css);
+		}
+	} else {
+		/* instead of modifying res_counter, remember it */
+		batch->nr_pages += PAGE_SIZE;
+		if (do_swap_account &&
+		    (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
+			batch->nr_memsw += PAGE_SIZE;
+	}
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
@@ -1582,6 +1598,35 @@ void mem_cgroup_uncharge_cache_page(stru
 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
 }
 
+void mem_cgroup_uncharge_batch_start(void)
+{
+	VM_BUG_ON(current->batch_memcg.batch_mode);
+	current->batch_memcg.batch_mode = 1;
+	current->batch_memcg.memcg = NULL;
+	current->batch_memcg.nr_pages = 0;
+	current->batch_memcg.nr_memsw = 0;
+}
+
+void mem_cgroup_uncharge_batch_end(void)
+{
+	struct mem_cgroup *mem;
+
+	VM_BUG_ON(!current->batch_memcg.batch_mode);
+	current->batch_memcg.batch_mode = 0;
+
+	mem = current->batch_memcg.memcg;
+	if (!mem)
+		return;
+	if (current->batch_memcg.nr_pages)
+		res_counter_uncharge(&mem->res,
+				     current->batch_memcg.nr_pages);
+	if (current->batch_memcg.nr_memsw)
+		res_counter_uncharge(&mem->memsw,
+				     current->batch_memcg.nr_memsw);
+	/* we got css's refcnt */
+	cgroup_release_and_wakeup_rmdir(&mem->css);
+}
+
 #ifdef CONFIG_SWAP
 /*
  * called after __delete_from_swap_cache() and drop "page" account.
Index: linux-2.6.31-rc7/include/linux/sched.h
===================================================================
--- linux-2.6.31-rc7.orig/include/linux/sched.h
+++ linux-2.6.31-rc7/include/linux/sched.h
@@ -1480,6 +1480,14 @@ struct task_struct {
 	/* bitmask of trace recursion */
 	unsigned long trace_recursion;
 #endif /* CONFIG_TRACING */
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+	/* For implicit argument for batched uncharge */
+	struct memcg_batch_info {
+		struct mem_cgroup *memcg;
+		int batch_mode;
+		unsigned long nr_pages, nr_memsw;
+	} batch_memcg;
+#endif
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */


* Re: [RFC][preview] [patch 1/2] memcg: batched uncharge base
  2009-08-25  2:29 ` [RFC][preview] [patch 1/2] memcg: batched uncharge base KAMEZAWA Hiroyuki
@ 2009-08-25  8:07   ` Daisuke Nishimura
  2009-08-25  8:37     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 10+ messages in thread
From: Daisuke Nishimura @ 2009-08-25  8:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, Daisuke Nishimura

First of all, I think these patches are a good optimization.

I have a few comments for now.

On Tue, 25 Aug 2009 11:29:19 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> In a massively parallel environment, res_counter can be a performance bottleneck.
> This patch is an attempt to reduce lock contention in memcg.
> 
> One effective technique for reducing lock contention is to reduce the number of
> calls themselves by folding several calls into one, as a batch.
> 
> Considering the characteristics of charge and uncharge:
> 	- charge is done one page at a time via demand paging.
> 	- uncharge is done either
> 		- in a continuous run of calls at munmap, truncate, exit, execve...
> 		- one page at a time via vmscan/paging.
> 
> So there seems to be an opportunity to batch uncharges.
> This is the base patch for batched uncharge. To avoid passing memcg's
> structure around as an argument, it adds the memcg batch-uncharge
> information to the task. Please see the start/end usage in the next patch.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/memcontrol.h |   12 ++++++++++
>  include/linux/sched.h      |    8 +++++++
>  mm/memcontrol.c            |   51 ++++++++++++++++++++++++++++++++++++++++++---
>  3 files changed, 68 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6.31-rc7/include/linux/memcontrol.h
> ===================================================================
> --- linux-2.6.31-rc7.orig/include/linux/memcontrol.h
> +++ linux-2.6.31-rc7/include/linux/memcontrol.h
> @@ -54,6 +54,10 @@ extern void mem_cgroup_rotate_lru_list(s
>  extern void mem_cgroup_del_lru(struct page *page);
>  extern void mem_cgroup_move_lists(struct page *page,
>  				  enum lru_list from, enum lru_list to);
> +
> +extern void mem_cgroup_uncharge_batch_start(void);
> +extern void mem_cgroup_uncharge_batch_end(void);
> +
>  extern void mem_cgroup_uncharge_page(struct page *page);
>  extern void mem_cgroup_uncharge_cache_page(struct page *page);
>  extern int mem_cgroup_shmem_charge_fallback(struct page *page,
> @@ -148,6 +152,14 @@ static inline void mem_cgroup_cancel_cha
>  {
>  }
>  
> +static inline void mem_cgroup_uncharge_batch_start(void)
> +{
> +}
> +
> +static inline void mem_cgroup_uncharge_batch_end(void)
> +{
> +}
> +
>  static inline void mem_cgroup_uncharge_page(struct page *page)
>  {
>  }
> Index: linux-2.6.31-rc7/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.31-rc7.orig/mm/memcontrol.c
> +++ linux-2.6.31-rc7/mm/memcontrol.c
> @@ -1500,6 +1500,7 @@ __mem_cgroup_uncharge_common(struct page
>  	struct page_cgroup *pc;
>  	struct mem_cgroup *mem = NULL;
>  	struct mem_cgroup_per_zone *mz;
> +	struct memcg_batch_info *batch = NULL;
>  
>  	if (mem_cgroup_disabled())
>  		return NULL;
> @@ -1537,10 +1538,25 @@ __mem_cgroup_uncharge_common(struct page
>  	default:
>  		break;
>  	}
> +	if (current->batch_memcg.batch_mode)
> +		batch = &current->batch_memcg;
>  
> -	res_counter_uncharge(&mem->res, PAGE_SIZE);
> -	if (do_swap_account && (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> -		res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> +	if (!batch || batch->memcg != mem) {
> +		res_counter_uncharge(&mem->res, PAGE_SIZE);
> +		if (do_swap_account &&
> +		    (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> +			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> +		if (batch) {
> +			batch->memcg = mem;
What if batch->memcg is already set to a different memcg and it has some
batch->nr_pages (nr_memsw) pending? Shouldn't we flush them first?

Also, though it might be overkill, how about flushing all the batched uncharges
before invoking OOM in __mem_cgroup_try_charge()?


Thanks,
Daisuke Nishimura.

> +			css_get(&mem->css);
> +		}
> +	} else {
> +		/* instead of modifying res_counter, remember it */
> +		batch->nr_pages += PAGE_SIZE;
> +		if (do_swap_account &&
> +		    (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> +			batch->nr_memsw += PAGE_SIZE;
> +	}
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> @@ -1582,6 +1598,35 @@ void mem_cgroup_uncharge_cache_page(stru
>  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
>  }
>  
> +void mem_cgroup_uncharge_batch_start(void)
> +{
> +	VM_BUG_ON(current->batch_memcg.batch_mode);
> +	current->batch_memcg.batch_mode = 1;
> +	current->batch_memcg.memcg = NULL;
> +	current->batch_memcg.nr_pages = 0;
> +	current->batch_memcg.nr_memsw = 0;
> +}
> +
> +void mem_cgroup_uncharge_batch_end(void)
> +{
> +	struct mem_cgroup *mem;
> +
> +	VM_BUG_ON(!current->batch_memcg.batch_mode);
> +	current->batch_memcg.batch_mode = 0;
> +
> +	mem = current->batch_memcg.memcg;
> +	if (!mem)
> +		return;
> +	if (current->batch_memcg.nr_pages)
> +		res_counter_uncharge(&mem->res,
> +				     current->batch_memcg.nr_pages);
> +	if (current->batch_memcg.nr_memsw)
> +		res_counter_uncharge(&mem->memsw,
> +				     current->batch_memcg.nr_memsw);
> +	/* we got css's refcnt */
> +	cgroup_release_and_wakeup_rmdir(&mem->css);
> +}
> +
>  #ifdef CONFIG_SWAP
>  /*
>   * called after __delete_from_swap_cache() and drop "page" account.
> Index: linux-2.6.31-rc7/include/linux/sched.h
> ===================================================================
> --- linux-2.6.31-rc7.orig/include/linux/sched.h
> +++ linux-2.6.31-rc7/include/linux/sched.h
> @@ -1480,6 +1480,14 @@ struct task_struct {
>  	/* bitmask of trace recursion */
>  	unsigned long trace_recursion;
>  #endif /* CONFIG_TRACING */
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +	/* For implicit argument for batched uncharge */
> +	struct memcg_batch_info {
> +		struct mem_cgroup *memcg;
> +		int batch_mode;
> +		unsigned long nr_pages, nr_memsw;
> +	} batch_memcg;
> +#endif
>  };
>  
>  /* Future-safe accessor for struct task_struct's cpus_allowed. */
> 


* Re: [RFC][preview] [patch 1/2] memcg: batched uncharge base
  2009-08-25  8:07   ` Daisuke Nishimura
@ 2009-08-25  8:37     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-25  8:37 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: linux-mm, balbir

On Tue, 25 Aug 2009 17:07:35 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> First of all, I think these patches are a good optimization.
> 
> I have a few comments for now.
> 
> On Tue, 25 Aug 2009 11:29:19 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > In a massively parallel environment, res_counter can be a performance bottleneck.
> > This patch is an attempt to reduce lock contention in memcg.
> > 
> > One effective technique for reducing lock contention is to reduce the number of
> > calls themselves by folding several calls into one, as a batch.
> > 
> > Considering the characteristics of charge and uncharge:
> > 	- charge is done one page at a time via demand paging.
> > 	- uncharge is done either
> > 		- in a continuous run of calls at munmap, truncate, exit, execve...
> > 		- one page at a time via vmscan/paging.
> > 
> > So there seems to be an opportunity to batch uncharges.
> > This is the base patch for batched uncharge. To avoid passing memcg's
> > structure around as an argument, it adds the memcg batch-uncharge
> > information to the task. Please see the start/end usage in the next patch.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  include/linux/memcontrol.h |   12 ++++++++++
> >  include/linux/sched.h      |    8 +++++++
> >  mm/memcontrol.c            |   51 ++++++++++++++++++++++++++++++++++++++++++---
> >  3 files changed, 68 insertions(+), 3 deletions(-)
> > 
> > Index: linux-2.6.31-rc7/include/linux/memcontrol.h
> > ===================================================================
> > --- linux-2.6.31-rc7.orig/include/linux/memcontrol.h
> > +++ linux-2.6.31-rc7/include/linux/memcontrol.h
> > @@ -54,6 +54,10 @@ extern void mem_cgroup_rotate_lru_list(s
> >  extern void mem_cgroup_del_lru(struct page *page);
> >  extern void mem_cgroup_move_lists(struct page *page,
> >  				  enum lru_list from, enum lru_list to);
> > +
> > +extern void mem_cgroup_uncharge_batch_start(void);
> > +extern void mem_cgroup_uncharge_batch_end(void);
> > +
> >  extern void mem_cgroup_uncharge_page(struct page *page);
> >  extern void mem_cgroup_uncharge_cache_page(struct page *page);
> >  extern int mem_cgroup_shmem_charge_fallback(struct page *page,
> > @@ -148,6 +152,14 @@ static inline void mem_cgroup_cancel_cha
> >  {
> >  }
> >  
> > +static inline void mem_cgroup_uncharge_batch_start(void)
> > +{
> > +}
> > +
> > +static inline void mem_cgroup_uncharge_batch_end(void)
> > +{
> > +}
> > +
> >  static inline void mem_cgroup_uncharge_page(struct page *page)
> >  {
> >  }
> > Index: linux-2.6.31-rc7/mm/memcontrol.c
> > ===================================================================
> > --- linux-2.6.31-rc7.orig/mm/memcontrol.c
> > +++ linux-2.6.31-rc7/mm/memcontrol.c
> > @@ -1500,6 +1500,7 @@ __mem_cgroup_uncharge_common(struct page
> >  	struct page_cgroup *pc;
> >  	struct mem_cgroup *mem = NULL;
> >  	struct mem_cgroup_per_zone *mz;
> > +	struct memcg_batch_info *batch = NULL;
> >  
> >  	if (mem_cgroup_disabled())
> >  		return NULL;
> > @@ -1537,10 +1538,25 @@ __mem_cgroup_uncharge_common(struct page
> >  	default:
> >  		break;
> >  	}
> > +	if (current->batch_memcg.batch_mode)
> > +		batch = &current->batch_memcg;
> >  
> > -	res_counter_uncharge(&mem->res, PAGE_SIZE);
> > -	if (do_swap_account && (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > -		res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > +	if (!batch || batch->memcg != mem) {
> > +		res_counter_uncharge(&mem->res, PAGE_SIZE);
> > +		if (do_swap_account &&
> > +		    (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > +			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > +		if (batch) {
> > +			batch->memcg = mem;
> What if batch->memcg is already set to a different memcg and it has some
> batch->nr_pages (nr_memsw) pending? Shouldn't we flush them first?
> 
Ah, this is a bug. This should be
==
  if (batch && !batch->memcg)
==
(My current code does this.) Thank you for pointing it out.

I don't think a flush is necessary; a page from a different memcg can simply
be handled as the non-batched case.
This batched uncharge is done
	- at truncate/invalidate of file cache, per 14 pages (PAGEVEC_SIZE).
	- per VMA at unmap.

So whether we flush-and-switch or just do a synchronous uncharge here
should not matter much, I think.

> Also, though it might be overkill, how about flushing all the batched uncharges
> before invoking OOM in __mem_cgroup_try_charge()?
> 
Hmm. I think I chose the batched-uncharge regions to be small enough, so
adding synchronize_rcu() or congestion_wait() or something similar before
retrying the next loop of reclaim should be enough.

Or, disabling batched uncharge while someone is running reclaim would be a smart choice.
That would be an easy modification to mem_cgroup_uncharge_batch_start(void).

Thanks,
-Kame


> 
> Thanks,
> Daisuke Nishimura.
> 
> > +			css_get(&mem->css);
> > +		}
> > +	} else {
> > +		/* instead of modifying res_counter, remember it */
> > +		batch->nr_pages += PAGE_SIZE;
> > +		if (do_swap_account &&
> > +		    (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > +			batch->nr_memsw += PAGE_SIZE;
> > +	}
> >  	mem_cgroup_charge_statistics(mem, pc, false);
> >  
> >  	ClearPageCgroupUsed(pc);
> > @@ -1582,6 +1598,35 @@ void mem_cgroup_uncharge_cache_page(stru
> >  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
> >  }
> >  
> > +void mem_cgroup_uncharge_batch_start(void)
> > +{
> > +	VM_BUG_ON(current->batch_memcg.batch_mode);
> > +	current->batch_memcg.batch_mode = 1;
> > +	current->batch_memcg.memcg = NULL;
> > +	current->batch_memcg.nr_pages = 0;
> > +	current->batch_memcg.nr_memsw = 0;
> > +}
> > +
> > +void mem_cgroup_uncharge_batch_end(void)
> > +{
> > +	struct mem_cgroup *mem;
> > +
> > +	VM_BUG_ON(!current->batch_memcg.batch_mode);
> > +	current->batch_memcg.batch_mode = 0;
> > +
> > +	mem = current->batch_memcg.memcg;
> > +	if (!mem)
> > +		return;
> > +	if (current->batch_memcg.nr_pages)
> > +		res_counter_uncharge(&mem->res,
> > +				     current->batch_memcg.nr_pages);
> > +	if (current->batch_memcg.nr_memsw)
> > +		res_counter_uncharge(&mem->memsw,
> > +				     current->batch_memcg.nr_memsw);
> > +	/* we got css's refcnt */
> > +	cgroup_release_and_wakeup_rmdir(&mem->css);
> > +}
> > +
> >  #ifdef CONFIG_SWAP
> >  /*
> >   * called after __delete_from_swap_cache() and drop "page" account.
> > Index: linux-2.6.31-rc7/include/linux/sched.h
> > ===================================================================
> > --- linux-2.6.31-rc7.orig/include/linux/sched.h
> > +++ linux-2.6.31-rc7/include/linux/sched.h
> > @@ -1480,6 +1480,14 @@ struct task_struct {
> >  	/* bitmask of trace recursion */
> >  	unsigned long trace_recursion;
> >  #endif /* CONFIG_TRACING */
> > +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> > +	/* For implicit argument for batched uncharge */
> > +	struct memcg_batch_info {
> > +		struct mem_cgroup *memcg;
> > +		int batch_mode;
> > +		unsigned long nr_pages, nr_memsw;
> > +	} batch_memcg;
> > +#endif
> >  };
> >  
> >  /* Future-safe accessor for struct task_struct's cpus_allowed. */
> > 
> 


* [RFC][preview][patch 2/2] memcg: uncharge at truncate/unmap in batched manner
  2009-08-25  2:25 [RFC][preview] memcg: reduce lock contention at uncharge by batching KAMEZAWA Hiroyuki
  2009-08-25  2:29 ` [RFC][preview] [patch 1/2] memcg: batched uncharge base KAMEZAWA Hiroyuki
@ 2009-08-25  2:31 ` KAMEZAWA Hiroyuki
  2009-08-25  8:25 ` [RFC][preview] memcg: reduce lock contention at uncharge by batching Balbir Singh
  2009-08-26  1:02 ` KAMEZAWA Hiroyuki
  3 siblings, 0 replies; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-25  2:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, nishimura

This patch adds hooks that start and end batched uncharge in
  - unmap
  - truncate

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
Index: linux-2.6.31-rc7/mm/memory.c
===================================================================
--- linux-2.6.31-rc7.orig/mm/memory.c
+++ linux-2.6.31-rc7/mm/memory.c
@@ -907,6 +907,7 @@ static unsigned long unmap_page_range(st
 		details = NULL;
 
 	BUG_ON(addr >= end);
+	mem_cgroup_uncharge_batch_start();
 	tlb_start_vma(tlb, vma);
 	pgd = pgd_offset(vma->vm_mm, addr);
 	do {
@@ -919,6 +920,7 @@ static unsigned long unmap_page_range(st
 						zap_work, details);
 	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
 	tlb_end_vma(tlb, vma);
+	mem_cgroup_uncharge_batch_end();
 
 	return addr;
 }
Index: linux-2.6.31-rc7/mm/truncate.c
===================================================================
--- linux-2.6.31-rc7.orig/mm/truncate.c
+++ linux-2.6.31-rc7/mm/truncate.c
@@ -231,6 +231,7 @@ void truncate_inode_pages_range(struct a
 			pagevec_release(&pvec);
 			break;
 		}
+		mem_cgroup_uncharge_batch_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -250,6 +251,7 @@ void truncate_inode_pages_range(struct a
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
+		mem_cgroup_uncharge_batch_end();
 	}
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
@@ -291,6 +293,7 @@ unsigned long invalidate_mapping_pages(s
 	pagevec_init(&pvec, 0);
 	while (next <= end &&
 			pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+		mem_cgroup_uncharge_batch_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 			pgoff_t index;
@@ -322,6 +325,7 @@ unlock:
 				break;
 		}
 		pagevec_release(&pvec);
+		mem_cgroup_uncharge_batch_end();
 		cond_resched();
 	}
 	return ret;
@@ -396,6 +400,7 @@ int invalidate_inode_pages2_range(struct
 	while (next <= end && !wrapped &&
 		pagevec_lookup(&pvec, mapping, next,
 			min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+		mem_cgroup_uncharge_batch_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 			pgoff_t page_index;
@@ -445,6 +450,7 @@ int invalidate_inode_pages2_range(struct
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
+		mem_cgroup_uncharge_batch_end();
 		cond_resched();
 	}
 	return ret;


* Re: [RFC][preview] memcg: reduce lock contention at uncharge by batching
  2009-08-25  2:25 [RFC][preview] memcg: reduce lock contention at uncharge by batching KAMEZAWA Hiroyuki
  2009-08-25  2:29 ` [RFC][preview] [patch 1/2] memcg: batched uncharge base KAMEZAWA Hiroyuki
  2009-08-25  2:31 ` [RFC][preview][patch 2/2] memcg: uncharge at truncate/unmap in batched manner KAMEZAWA Hiroyuki
@ 2009-08-25  8:25 ` Balbir Singh
  2009-08-25  8:42   ` KAMEZAWA Hiroyuki
  2009-08-26  1:02 ` KAMEZAWA Hiroyuki
  3 siblings, 1 reply; 10+ messages in thread
From: Balbir Singh @ 2009-08-25  8:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, nishimura

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-08-25 11:25:47]:

> Hi,
> 
> This is a preview of a patch set to reduce lock contention on memcg->res_counter.
> It performs a series of uncharges as one batch, which reduces contention on the
> res_counter lock. It is still under development and is based on 2.6.31-rc7.
> I'll rebase it onto mmotm when it is ready.
> 
> I only have an 8-CPU (4-core/2-socket) system at the moment. There is no significant speedup, but lock_stat looks good.
>


I'll test this on a 24-way machine that I have and check. I think these
patches plus per-cpu locking for the resource counter should give good results.
 
-- 
	Balbir


* Re: [RFC][preview] memcg: reduce lock contention at uncharge by batching
  2009-08-25  8:25 ` [RFC][preview] memcg: reduce lock contention at uncharge by batching Balbir Singh
@ 2009-08-25  8:42   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-25  8:42 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, nishimura

On Tue, 25 Aug 2009 13:55:26 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-08-25 11:25:47]:
> 
> > Hi,
> > 
> > This is a preview of a patch set to reduce lock contention on memcg->res_counter.
> > It performs a series of uncharges as one batch, which reduces contention on the
> > res_counter lock. It is still under development and is based on 2.6.31-rc7.
> > I'll rebase it onto mmotm when it is ready.
> > 
> > I only have an 8-CPU (4-core/2-socket) system at the moment. There is no significant speedup, but lock_stat looks good.
> >
> 
> 
> I'll test this on a 24-way machine that I have and check. I think these
> patches plus per-cpu locking for the resource counter should give good results.
>  
Thank you.

Yes. I'm reconsidering a per-cpu res_counter, too.
But, hmm, the trade-off in counter accuracy will be our final problem if we choose it.

Thanks,
-Kame


* Re: [RFC][preview] memcg: reduce lock contention at uncharge by batching
  2009-08-25  2:25 [RFC][preview] memcg: reduce lock contention at uncharge by batching KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2009-08-25  8:25 ` [RFC][preview] memcg: reduce lock contention at uncharge by batching Balbir Singh
@ 2009-08-26  1:02 ` KAMEZAWA Hiroyuki
  2009-08-26  5:25   ` Daisuke Nishimura
  3 siblings, 1 reply; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-26  1:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, nishimura

With the patch attached below (per-cpu precharge), I got these numbers:

[Before] linux-2.6.31-rc7
real    2m46.491s
user    4m47.008s
sys     3m32.954s


lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                          &counter->lock:       1167034        1196935           0.52       16291.34      829793.69       18742433       45050576           0.42       30788.81     9490908.36
                          --------------
                          &counter->lock         638151          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
                          &counter->lock         558784          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60
                          --------------
                          &counter->lock         679567          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
                          &counter->lock         517368          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60

[After] precharge+batched uncharge
real    2m46.799s
user    4m49.523s
sys     3m18.916s
                          &counter->lock:         12785          12984           0.71          34.87        6768.24         967813        4937090           0.47       20257.57      953289.67
                          --------------
                          &counter->lock          11117          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
                          &counter->lock           1867          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
                          --------------
                          &counter->lock          10691          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
                          &counter->lock           2293          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
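
The before/after figures above can be compared directly. What follows is only an illustrative userspace C sketch: the struct and helper names are hypothetical and not part of the patch, and the numbers are copied from the lock_stat rows and the `time` results in this mail.

```c
#include <assert.h>

/* One measurement: &counter->lock contentions and acquisitions, plus sys time. */
struct sketch_sample {
	double contentions;
	double acquisitions;
	double sys_seconds;
};

/* Values copied from the [Before] and [After] results in this mail. */
static const struct sketch_sample sample_before = {
	1196935.0, 45050576.0, 3 * 60 + 32.954
};
static const struct sketch_sample sample_after = {
	12984.0, 4937090.0, 3 * 60 + 18.916
};

/* How many times smaller "after" is than "before". */
static double reduction_factor(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* Percentage of "before" that was saved. */
static double percent_saved(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

With these inputs, contentions on &counter->lock drop by a factor of roughly 92 and acquisitions by a factor of roughly 9, while sys time shrinks by about 6.6%. Real time is essentially unchanged.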

I think the patch below is simple enough (though I still need to support flush and cpu-hotplug).
I'd like to rebase this onto mmotm.
The main difference from percpu_counter is that this is a pre-charge scheme, so usage never goes over the limit.
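
The pre-charge idea can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the kernel code: the `sketch_*` names, the 4096-byte page size and the 8-CPU count are all hypothetical, locking is replaced by a counter of would-be lock acquisitions, and a failed batch charge simply returns false instead of entering reclaim as the real charge path does.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */
#define SKETCH_NR_CPUS   8	/* assumed CPU count */
/* Same shape as CHARGE_SIZE in the patch. */
#define SKETCH_CHARGE_SIZE \
	(4UL * ((SKETCH_NR_CPUS >> 5) + 1) * SKETCH_PAGE_SIZE)

/* Stand-in for res_counter: usage is checked against limit on every charge. */
struct sketch_counter {
	unsigned long usage;
	unsigned long limit;
	unsigned long lock_acquisitions;	/* times the lock would be taken */
};

struct sketch_memcg {
	struct sketch_counter res;
	unsigned long precharge[SKETCH_NR_CPUS];	/* bytes already charged, per cpu */
};

/* Charge val bytes to the shared counter; refuse if that would exceed the limit. */
static bool sketch_counter_charge(struct sketch_counter *c, unsigned long val)
{
	c->lock_acquisitions++;
	if (c->usage + val > c->limit)
		return false;
	c->usage += val;
	return true;
}

/* Charge one page on the given cpu, preferring that cpu's precharge. */
static bool sketch_charge_page(struct sketch_memcg *mem, int cpu)
{
	if (mem->precharge[cpu] >= SKETCH_PAGE_SIZE) {
		/* Fast path: no shared counter access at all. */
		mem->precharge[cpu] -= SKETCH_PAGE_SIZE;
		return true;
	}
	/* Slow path: charge a whole batch up front, keep the rest for later. */
	if (!sketch_counter_charge(&mem->res, SKETCH_CHARGE_SIZE))
		return false;
	mem->precharge[cpu] += SKETCH_CHARGE_SIZE - SKETCH_PAGE_SIZE;
	return true;
}
```

Because every byte held in a per-cpu precharge has already been accounted in the shared counter, usage cannot exceed the limit. The cost is that a batch charge can fail on one CPU while other CPUs still hold unused precharge, which is where flush support comes in.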

--
Index: linux-2.6.31-rc7/mm/memcontrol.c
===================================================================
--- linux-2.6.31-rc7.orig/mm/memcontrol.c	2009-08-26 09:11:57.000000000 +0900
+++ linux-2.6.31-rc7/mm/memcontrol.c	2009-08-26 09:46:51.000000000 +0900
@@ -67,6 +67,7 @@
 	MEM_CGROUP_STAT_PGPGIN_COUNT,	/* # of pages paged in */
 	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
 
+	MEM_CGROUP_STAT_PRECHARGE, /* # of charges pre-allocated for future */
 	MEM_CGROUP_STAT_NSTATS,
 };
 
@@ -959,6 +960,32 @@
 	unlock_page_cgroup(pc);
 }
 
+#define CHARGE_SIZE	(4 * ((NR_CPUS >> 5)+1) * PAGE_SIZE)
+
+bool use_precharge(struct mem_cgroup *mem)
+{
+	struct mem_cgroup_stat_cpu *cstat;
+	int cpu = get_cpu();
+	bool ret = true;
+
+	cstat = &mem->stat.cpustat[cpu];
+	if (cstat->count[MEM_CGROUP_STAT_PRECHARGE])
+		cstat->count[MEM_CGROUP_STAT_PRECHARGE] -= PAGE_SIZE;
+	else
+		ret = false;
+	put_cpu();
+	return ret;
+}
+
+void do_precharge(struct mem_cgroup *mem, int val)
+{
+	struct mem_cgroup_stat_cpu *cstat;
+	int cpu = get_cpu();
+	cstat = &mem->stat.cpustat[cpu];
+	__mem_cgroup_stat_add_safe(cstat, MEM_CGROUP_STAT_PRECHARGE, val);
+	put_cpu();
+}
+
 /*
  * Unlike exported interface, "oom" parameter is added. if oom==true,
  * oom-killer can be invoked.
@@ -995,20 +1022,24 @@
 
 	VM_BUG_ON(css_is_removed(&mem->css));
 
+	/* can we use precharge ? */
+	if (use_precharge(mem))
+		goto got;
+
 	while (1) {
 		int ret;
 		bool noswap = false;
 
-		ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
+		ret = res_counter_charge(&mem->res, CHARGE_SIZE, &fail_res);
 		if (likely(!ret)) {
 			if (!do_swap_account)
 				break;
-			ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
+			ret = res_counter_charge(&mem->memsw, CHARGE_SIZE,
 							&fail_res);
 			if (likely(!ret))
 				break;
 			/* mem+swap counter fails */
-			res_counter_uncharge(&mem->res, PAGE_SIZE);
+			res_counter_uncharge(&mem->res, CHARGE_SIZE);
 			noswap = true;
 			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
 									memsw);
@@ -1046,6 +1077,8 @@
 			goto nomem;
 		}
 	}
+	do_precharge(mem, CHARGE_SIZE-PAGE_SIZE);
+got:
 	return 0;
 nomem:
 	css_put(&mem->css);
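
The CHARGE_SIZE formula in the patch scales the batch with NR_CPUS. As a worked example only, the helpers below (hypothetical names, not part of the patch) evaluate the same expression in pages for a given CPU count and estimate how many pages can sit in per-cpu precharge at once, assuming each CPU refills only when its precharge is empty and ignoring task migration between the check and the refill.

```c
#include <assert.h>

/* Pages charged per batch: 4 * ((nr_cpus >> 5) + 1), as in CHARGE_SIZE. */
static unsigned long sketch_batch_pages(unsigned int nr_cpus)
{
	return 4UL * ((nr_cpus >> 5) + 1);
}

/*
 * Rough upper bound on pages parked in precharge across all CPUs:
 * each CPU keeps at most one batch minus the page it just consumed.
 */
static unsigned long sketch_max_parked_pages(unsigned int nr_cpus)
{
	unsigned long batch = sketch_batch_pages(nr_cpus);

	assert(batch >= 1);
	return (unsigned long)nr_cpus * (batch - 1);
}
```

For NR_CPUS = 8 this gives a batch of 4 pages and at most 24 parked pages; for NR_CPUS = 64 the batch grows to 12 pages. Note that NR_CPUS is the configured maximum, not the number of online CPUs.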



* Re: [RFC][preview] memcg: reduce lock contention at uncharge by batching
  2009-08-26  1:02 ` KAMEZAWA Hiroyuki
@ 2009-08-26  5:25   ` Daisuke Nishimura
  2009-08-26  6:48     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 10+ messages in thread
From: Daisuke Nishimura @ 2009-08-26  5:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, Daisuke Nishimura

On Wed, 26 Aug 2009 10:02:56 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> With the patch attached below (per-cpu precharge), I got these numbers:
> 
> [Before] linux-2.6.31-rc7
> real    2m46.491s
> user    4m47.008s
> sys     3m32.954s
> 
> 
> lock_stat version 0.3
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>                           &counter->lock:       1167034        1196935           0.52       16291.34      829793.69       18742433       45050576           0.42       30788.81     9490908.36
>                           --------------
>                           &counter->lock         638151          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
>                           &counter->lock         558784          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60
>                           --------------
>                           &counter->lock         679567          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
>                           &counter->lock         517368          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60
> 
> [After] precharge+batched uncharge
> real    2m46.799s
> user    4m49.523s
> sys     3m18.916s
>                           &counter->lock:         12785          12984           0.71          34.87        6768.24         967813        4937090           0.47       20257.57      953289.67
>                           --------------
>                           &counter->lock          11117          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
>                           &counter->lock           1867          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
>                           --------------
>                           &counter->lock          10691          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
>                           &counter->lock           2293          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
> 
> I think the patch below is simple enough (though I still need to support flush and cpu-hotplug).
> I'd like to rebase this onto mmotm.
> The main difference from percpu_counter is that this is a pre-charge scheme, so usage never goes over the limit.
> 
I basically agree with this direction, but I have one question.

What do you mean by "flush"? I suppose it means "discard precharges when hitting the limit", right?

Thanks,
Daisuke Nishimura.




* Re: [RFC][preview] memcg: reduce lock contention at uncharge by batching
  2009-08-26  5:25   ` Daisuke Nishimura
@ 2009-08-26  6:48     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 10+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-08-26  6:48 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: linux-mm, balbir

On Wed, 26 Aug 2009 14:25:20 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Wed, 26 Aug 2009 10:02:56 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > With the patch attached below (per-cpu precharge), I got these numbers:
> > 
> > [Before] linux-2.6.31-rc7
> > real    2m46.491s
> > user    4m47.008s
> > sys     3m32.954s
> > 
> > 
> > lock_stat version 0.3
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > 
> >                           &counter->lock:       1167034        1196935           0.52       16291.34      829793.69       18742433       45050576           0.42       30788.81     9490908.36
> >                           --------------
> >                           &counter->lock         638151          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
> >                           &counter->lock         558784          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60
> >                           --------------
> >                           &counter->lock         679567          [<ffffffff81090fd5>] res_counter_charge+0x45/0xe0
> >                           &counter->lock         517368          [<ffffffff81090f5d>] res_counter_uncharge+0x2d/0x60
> > 
> > [After] precharge+batched uncharge
> > real    2m46.799s
> > user    4m49.523s
> > sys     3m18.916s
> >                           &counter->lock:         12785          12984           0.71          34.87        6768.24         967813        4937090           0.47       20257.57      953289.67
> >                           --------------
> >                           &counter->lock          11117          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
> >                           &counter->lock           1867          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
> >                           --------------
> >                           &counter->lock          10691          [<ffffffff81090f3d>] res_counter_uncharge+0x2d/0x60
> >                           &counter->lock           2293          [<ffffffff81090fb5>] res_counter_charge+0x45/0xe0
> > 
> > I think the patch below is simple enough (though I still need to support flush and cpu-hotplug).
> > I'd like to rebase this onto mmotm.
> > The main difference from percpu_counter is that this is a pre-charge scheme, so usage never goes over the limit.
> > 
> I basically agree with this direction, but I have one question.
> 
> What do you mean by "flush"? I suppose it means "discard precharges when hitting the limit", right?
> 

Yes.

Thanks,
-Kame
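
The "flush" confirmed here, discarding precharges when the limit is hit, could look roughly like the following. This is a minimal userspace sketch under stated assumptions, not the eventual kernel implementation: `sketch_flush_precharge` and the struct layout are hypothetical, the CPU count is arbitrary, and the per-cpu synchronisation a real flush would need is left out entirely.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4	/* assumed CPU count */

struct sketch_memcg {
	unsigned long usage;	/* bytes charged to the shared counter */
	unsigned long limit;
	unsigned long precharge[SKETCH_NR_CPUS];	/* bytes charged but unused */
};

/*
 * Return every CPU's unused precharge to the shared counter.
 * Returns the number of bytes given back, so the caller can tell
 * whether retrying the failed charge is worthwhile.
 */
static unsigned long sketch_flush_precharge(struct sketch_memcg *mem)
{
	unsigned long returned = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		returned += mem->precharge[cpu];
		mem->precharge[cpu] = 0;
	}
	/* Precharged bytes were accounted in usage when they were taken. */
	assert(returned <= mem->usage);
	mem->usage -= returned;
	return returned;
}
```

A charge path that fails against the limit could call such a helper once and retry if it returns a non-zero value, before falling back to reclaim.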

> Thanks,
> Daisuke Nishimura.
> 




Thread overview: 10+ messages
2009-08-25  2:25 [RFC][preview] memcg: reduce lock contention at uncharge by batching KAMEZAWA Hiroyuki
2009-08-25  2:29 ` [RFC][preview] [patch 1/2] memcg: batched uncharge base KAMEZAWA Hiroyuki
2009-08-25  8:07   ` Daisuke Nishimura
2009-08-25  8:37     ` KAMEZAWA Hiroyuki
2009-08-25  2:31 ` [RFC][preview][patch 2/2] memcg: uncharge at truncate/unmap in batched manner KAMEZAWA Hiroyuki
2009-08-25  8:25 ` [RFC][preview] memcg: reduce lock contention at uncharge by batching Balbir Singh
2009-08-25  8:42   ` KAMEZAWA Hiroyuki
2009-08-26  1:02 ` KAMEZAWA Hiroyuki
2009-08-26  5:25   ` Daisuke Nishimura
2009-08-26  6:48     ` KAMEZAWA Hiroyuki
