linux-mm.kvack.org archive mirror
* [PATCH 1/2] memcg: coalescing uncharge at unmap/truncate (Oct/9)
@ 2009-10-09  7:58 KAMEZAWA Hiroyuki
  2009-10-09  8:01 ` [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9) KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-09  7:58 UTC (permalink / raw)
  To: linux-mm; +Cc: balbir, nishimura, akpm, h-shimamoto, linux-kernel

Thank you for all the reviews of the previous version (Oct/2).

This patch is against mmotm + the softlimit fix patches
(which are now in the -rc git tree).
==
In a massively parallel environment, res_counter can be a performance bottleneck.
One strong technique for reducing lock contention is to reduce the number of calls
by coalescing several calls into one.

Considering charge/uncharge characteristics,
	- charge is done one by one via demand paging.
	- uncharge is done
		- in chunks at munmap, truncate, exit, execve...
		- one by one via vmscan/paging.

It seems we have a chance to coalesce uncharges at unmap/truncation to
improve scalability.

This patch implements coalesced uncharge. To avoid scattering memcg's
structures into functions under mm/, it adds memcg batch-uncharge
information to the task. The reason for per-task batching is to make use
of the caller's context information: we do batched (delayed) uncharge
when truncation/unmap occurs, but do direct uncharge when uncharge is
called by memory reclaim (vmscan.c).
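
For reference, the caller-side pattern added by the mm/truncate.c and
mm/memory.c hunks below boils down to the sketch here (illustrative only;
the pagevec loop is modeled on truncate, not copied from it):
==
	mem_cgroup_uncharge_start();	/* current->memcg_batch.do_batch++ */
	for (i = 0; i < pagevec_count(&pvec); i++)
		/* each free ends up in __mem_cgroup_uncharge_common(), which
		 * only accumulates PAGE_SIZE into current->memcg_batch */
		truncate_complete_page(mapping, pvec.pages[i]);
	mem_cgroup_uncharge_end();	/* one res_counter_uncharge(batch->bytes) */
==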

The degree of coalescing depends on the caller:
  - at invalidate/truncate... pagevec size
  - at unmap ... ZAP_BLOCK_SIZE
(memory itself is freed in units of this size), so we will not coalesce
too much.

On an x86-64 8-cpu server, I tested the overhead of memcg at page fault by
running a program which does map/fault/unmap in a loop, running one task
per cpu with taskset and summing the number of page faults over 60 seconds.
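
(The benchmark program itself is not attached; a minimal sketch of such a
map/fault/unmap loop, run one instance per cpu via taskset with the
page-fault counts taken from perf, might look like the following.)
==
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	time_t stop = time(NULL) + 60;	/* measure for 60 seconds */
	unsigned long iters = 0;

	while (time(NULL) < stop) {
		/* map one anonymous page, touch it to fault it in, unmap it */
		char *p = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;
		p[0] = 1;
		munmap(p, pagesize);
		iters++;
	}
	printf("%lu iterations\n", iters);
	return 0;
}
==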

[without memcg config] 
  40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
  27.67 cache-miss/faults
[root cgroup]
  36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
  31.58 miss/faults
[in a child cgroup]
  18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
  69.96 miss/faults
[child with this patch]
  27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
  47.16 miss/faults

We can see some improvement.
(The root cgroup is not affected by this patch.)
Another patch for "charge" will follow this one, and the numbers above will
improve further.

Changelog (since 2009/10/02):
 - renamed fields of memcg_batch (pages to bytes, memsw to memsw_bytes)
 - some cleanup and comment/description updates.
 - added initialization code to copy_process() (possible bug fix).

Changelog (old):
 - fixed the !CONFIG_MEM_CGROUP case.
 - rebased onto the latest mmotm + softlimit fix patches.
 - unified the patch for callers.
 - added comments.
 - made ->do_batch a bool.
 - removed css_get() et al. We don't need it.

Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |   13 ++++++
 include/linux/sched.h      |    8 +++
 kernel/fork.c              |    4 +
 mm/memcontrol.c            |   96 ++++++++++++++++++++++++++++++++++++++++++---
 mm/memory.c                |    2 
 mm/truncate.c              |    6 ++
 6 files changed, 123 insertions(+), 6 deletions(-)

Index: mmotm-2.6.31-Sep28/include/linux/memcontrol.h
===================================================================
--- mmotm-2.6.31-Sep28.orig/include/linux/memcontrol.h
+++ mmotm-2.6.31-Sep28/include/linux/memcontrol.h
@@ -54,6 +54,11 @@ extern void mem_cgroup_rotate_lru_list(s
 extern void mem_cgroup_del_lru(struct page *page);
 extern void mem_cgroup_move_lists(struct page *page,
 				  enum lru_list from, enum lru_list to);
+
+/* For coalescing uncharges to reduce memcg overhead */
+extern void mem_cgroup_uncharge_start(void);
+extern void mem_cgroup_uncharge_end(void);
+
 extern void mem_cgroup_uncharge_page(struct page *page);
 extern void mem_cgroup_uncharge_cache_page(struct page *page);
 extern int mem_cgroup_shmem_charge_fallback(struct page *page,
@@ -151,6 +156,14 @@ static inline void mem_cgroup_cancel_cha
 {
 }
 
+static inline void mem_cgroup_uncharge_start(void)
+{
+}
+
+static inline void mem_cgroup_uncharge_end(void)
+{
+}
+
 static inline void mem_cgroup_uncharge_page(struct page *page)
 {
 }
Index: mmotm-2.6.31-Sep28/mm/memcontrol.c
===================================================================
--- mmotm-2.6.31-Sep28.orig/mm/memcontrol.c
+++ mmotm-2.6.31-Sep28/mm/memcontrol.c
@@ -1826,6 +1826,50 @@ void mem_cgroup_cancel_charge_swapin(str
 	css_put(&mem->css);
 }
 
+static void
+__do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype)
+{
+	struct memcg_batch_info *batch = NULL;
+	bool uncharge_memsw = true;
+	/* If swapout, usage of swap doesn't decrease */
+	if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
+		uncharge_memsw = false;
+	/*
+	 * do_batch > 0 when unmapping pages or during inode invalidate/truncate.
+	 * In those cases, all pages freed continuously can be expected to be in
+	 * the same cgroup, and we have a chance to coalesce uncharges.
+	 * But we uncharge one by one if the task is being killed by
+	 * OOM (TIF_MEMDIE), because we want to uncharge as soon as possible.
+	 */
+	if (!current->memcg_batch.do_batch || test_thread_flag(TIF_MEMDIE))
+		goto direct_uncharge;
+
+	batch = &current->memcg_batch;
+	/*
+	 * Usually, we do css_get() when we remember a memcg pointer.
+	 * But in this case, we keep res->usage until the end of a series of
+	 * uncharges, so it's ok to ignore memcg's refcnt.
+	 */
+	if (!batch->memcg)
+		batch->memcg = mem;
+	/*
+	 * In the typical case, batch->memcg == mem. This means we can
+	 * merge a series of uncharges into one uncharge of res_counter.
+	 * If not, we uncharge res_counter one by one.
+	 */
+	if (batch->memcg != mem)
+		goto direct_uncharge;
+	/* remember freed charge and uncharge it later */
+	batch->bytes += PAGE_SIZE;
+	if (uncharge_memsw)
+		batch->memsw_bytes += PAGE_SIZE;
+	return;
+direct_uncharge:
+	res_counter_uncharge(&mem->res, PAGE_SIZE);
+	if (uncharge_memsw)
+		res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+	return;
+}
 
 /*
  * uncharge if !page_mapped(page)
@@ -1874,12 +1918,8 @@ __mem_cgroup_uncharge_common(struct page
 		break;
 	}
 
-	if (!mem_cgroup_is_root(mem)) {
-		res_counter_uncharge(&mem->res, PAGE_SIZE);
-		if (do_swap_account &&
-				(ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
-			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
-	}
+	if (!mem_cgroup_is_root(mem))
+		__do_uncharge(mem, ctype);
 	if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
 		mem_cgroup_swap_statistics(mem, true);
 	mem_cgroup_charge_statistics(mem, pc, false);
@@ -1925,6 +1965,50 @@ void mem_cgroup_uncharge_cache_page(stru
 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
 }
 
+/*
+ * Batch_start/batch_end is called in unmap_page_range/invalidate/truncate.
+ * In those cases, pages are freed continuously and we can expect them to
+ * be in the same memcg. Each of these callers itself limits the number of
+ * pages freed at once, so uncharge_start/end() is called properly.
+ * This may be called several (nested) times in one context.
+ */
+
+void mem_cgroup_uncharge_start(void)
+{
+	current->memcg_batch.do_batch++;
+	/* Nesting is allowed; initialize only at the outermost level. */
+	if (current->memcg_batch.do_batch == 1) {
+		current->memcg_batch.memcg = NULL;
+		current->memcg_batch.bytes = 0;
+		current->memcg_batch.memsw_bytes = 0;
+	}
+}
+
+void mem_cgroup_uncharge_end(void)
+{
+	struct memcg_batch_info *batch = &current->memcg_batch;
+
+	if (!batch->do_batch)
+		return;
+
+	batch->do_batch--;
+	if (batch->do_batch) /* If stacked, do nothing. */
+		return;
+
+	if (!batch->memcg)
+		return;
+	/*
+	 * This "batch->memcg" is valid without any css_get/put, etc.,
+	 * because we hide charges behind us.
+	 */
+	if (batch->bytes)
+		res_counter_uncharge(&batch->memcg->res, batch->bytes);
+	if (batch->memsw_bytes)
+		res_counter_uncharge(&batch->memcg->memsw, batch->memsw_bytes);
+	/* forget this pointer (for sanity check) */
+	batch->memcg = NULL;
+}
+
 #ifdef CONFIG_SWAP
 /*
  * called after __delete_from_swap_cache() and drop "page" account.
Index: mmotm-2.6.31-Sep28/include/linux/sched.h
===================================================================
--- mmotm-2.6.31-Sep28.orig/include/linux/sched.h
+++ mmotm-2.6.31-Sep28/include/linux/sched.h
@@ -1549,6 +1549,14 @@ struct task_struct {
 	unsigned long trace_recursion;
 #endif /* CONFIG_TRACING */
 	unsigned long stack_start;
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR /* memcg uses this to do batch job */
+	struct memcg_batch_info {
+		int do_batch;	/* incremented when batch uncharge started */
+		struct mem_cgroup *memcg; /* target memcg of uncharge */
+		unsigned long bytes; 		/* uncharged usage */
+		unsigned long memsw_bytes; /* uncharged mem+swap usage */
+	} memcg_batch;
+#endif
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
Index: mmotm-2.6.31-Sep28/mm/memory.c
===================================================================
--- mmotm-2.6.31-Sep28.orig/mm/memory.c
+++ mmotm-2.6.31-Sep28/mm/memory.c
@@ -940,6 +940,7 @@ static unsigned long unmap_page_range(st
 		details = NULL;
 
 	BUG_ON(addr >= end);
+	mem_cgroup_uncharge_start();
 	tlb_start_vma(tlb, vma);
 	pgd = pgd_offset(vma->vm_mm, addr);
 	do {
@@ -952,6 +953,7 @@ static unsigned long unmap_page_range(st
 						zap_work, details);
 	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
 	tlb_end_vma(tlb, vma);
+	mem_cgroup_uncharge_end();
 
 	return addr;
 }
Index: mmotm-2.6.31-Sep28/mm/truncate.c
===================================================================
--- mmotm-2.6.31-Sep28.orig/mm/truncate.c
+++ mmotm-2.6.31-Sep28/mm/truncate.c
@@ -272,6 +272,7 @@ void truncate_inode_pages_range(struct a
 			pagevec_release(&pvec);
 			break;
 		}
+		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -286,6 +287,7 @@ void truncate_inode_pages_range(struct a
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
+		mem_cgroup_uncharge_end();
 	}
 }
 EXPORT_SYMBOL(truncate_inode_pages_range);
@@ -327,6 +329,7 @@ unsigned long invalidate_mapping_pages(s
 	pagevec_init(&pvec, 0);
 	while (next <= end &&
 			pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 			pgoff_t index;
@@ -354,6 +357,7 @@ unsigned long invalidate_mapping_pages(s
 				break;
 		}
 		pagevec_release(&pvec);
+		mem_cgroup_uncharge_end();
 		cond_resched();
 	}
 	return ret;
@@ -428,6 +432,7 @@ int invalidate_inode_pages2_range(struct
 	while (next <= end && !wrapped &&
 		pagevec_lookup(&pvec, mapping, next,
 			min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 			pgoff_t page_index;
@@ -477,6 +482,7 @@ int invalidate_inode_pages2_range(struct
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
+		mem_cgroup_uncharge_end();
 		cond_resched();
 	}
 	return ret;
Index: mmotm-2.6.31-Sep28/kernel/fork.c
===================================================================
--- mmotm-2.6.31-Sep28.orig/kernel/fork.c
+++ mmotm-2.6.31-Sep28/kernel/fork.c
@@ -1114,6 +1114,10 @@ static struct task_struct *copy_process(
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+	p->memcg_batch.do_batch = 0;
+	p->memcg_batch.memcg = NULL;
+#endif
 
 	p->bts = NULL;
 


* [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-09  7:58 [PATCH 1/2] memcg: coalescing uncharge at unmap/truncate (Oct/9) KAMEZAWA Hiroyuki
@ 2009-10-09  8:01 ` KAMEZAWA Hiroyuki
  2009-10-09 23:50   ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-09  8:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, nishimura, akpm, h-shimamoto, linux-kernel

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This patch coalesces access to res_counter at charge time by percpu
caching. At charge, memcg charges 32 pages (CHARGE_SIZE) at once and
remembers the remainder in a percpu cache. Because it's a cache, it is
drained/flushed when necessary.

This version uses a public percpu area.
 There are 2 benefits of using a public percpu area:
 1. The sum of stocked charges in the system is limited by the number of cpus,
    not by the number of memcgs. This gives better synchronization behavior.
 2. The drain code for flush/cpu hotplug is very easy (and quick).

The most important point of this patch is that we never touch res_counter
in the fast path. The res_counter is a system-wide shared counter which is
modified very frequently, so we should avoid touching it as much as we can
to avoid false sharing.
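
Schematically, the charge path after this patch works as in the sketch
below (a simplified view of the __mem_cgroup_try_charge() change in the
diff; memsw accounting, reclaim retries and error handling are omitted):
==
	if (consume_stock(mem))		/* fast path: touches only the percpu stock */
		goto charged;
	/* slow path: charge CHARGE_SIZE (32 pages) from res_counter at once... */
	if (!res_counter_charge(&mem->res, CHARGE_SIZE, &fail_res)) {
		/* ...and park the unused part in this cpu's stock */
		refill_stock(mem, CHARGE_SIZE - PAGE_SIZE);
		goto charged;
	}
	/* on failure, fall back to csize == PAGE_SIZE and reclaim as before */
==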

On an x86-64 8-cpu server, I tested the overhead of memcg at page fault by
running a program which does map/fault/unmap in a loop, running one task
per cpu with taskset and summing the number of page faults over 60 seconds.

[without memcg config]
  40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
  27.67 cache-miss/faults

[root cgroup]
  36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
  31.58 cache miss/faults

[in a child cgroup]
  18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
  69.96 cache miss/faults

[ + coalescing uncharge patch]
  27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
  47.16 cache miss/faults

[ + coalescing uncharge patch + this patch ]
  34224709  page-faults              #      0.072 M/sec   ( +-   0.173% )
  34.69 cache miss/faults

Changelog (since Oct/2):
  - updated comments.
  - replaced get_cpu_var() with __get_cpu_var() where possible.
  - removed the mutex for system-wide drain; added a counter instead.
  - removed CONFIG_HOTPLUG_CPU.

Changelog (old):
  - rebased onto the latest mmotm.
  - moved the charge size check before the __GFP_WAIT check for avoiding unnecessary ...
  - added an asynchronous flush routine.
  - fixed bugs pointed out by Nishimura-san.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |  162 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 156 insertions(+), 6 deletions(-)

Index: mmotm-2.6.31-Sep28/mm/memcontrol.c
===================================================================
--- mmotm-2.6.31-Sep28.orig/mm/memcontrol.c
+++ mmotm-2.6.31-Sep28/mm/memcontrol.c
@@ -38,6 +38,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mm_inline.h>
 #include <linux/page_cgroup.h>
+#include <linux/cpu.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -275,6 +276,7 @@ enum charge_type {
 static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
+static void drain_all_stock_async(void);
 
 static struct mem_cgroup_per_zone *
 mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
@@ -1137,6 +1139,8 @@ static int mem_cgroup_hierarchical_recla
 		victim = mem_cgroup_select_victim(root_mem);
 		if (victim == root_mem) {
 			loop++;
+			if (loop >= 1)
+				drain_all_stock_async();
 			if (loop >= 2) {
 				/*
 				 * If we have not been able to reclaim
@@ -1259,6 +1263,139 @@ done:
 }
 
 /*
+ * Size of the first charge trial. "32" comes from vmscan.c's magic value.
+ * TODO: it may be necessary to use bigger numbers on big iron.
+ */
+#define CHARGE_SIZE	(32 * PAGE_SIZE)
+struct memcg_stock_pcp {
+	struct mem_cgroup *cached; /* this is never the root cgroup */
+	int charge;
+	struct work_struct work;
+};
+static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
+static atomic_t memcg_drain_count;
+
+/*
+ * Try to consume stocked charge on this cpu. On success, PAGE_SIZE is consumed
+ * from the local stock and true is returned. If the stock is 0 or holds charges
+ * from a cgroup which is not the current target, false is returned and the
+ * stock will be refilled later.
+ */
+static bool consume_stock(struct mem_cgroup *mem)
+{
+	struct memcg_stock_pcp *stock;
+	bool ret = true;
+
+	stock = &get_cpu_var(memcg_stock);
+	if (mem == stock->cached && stock->charge)
+		stock->charge -= PAGE_SIZE;
+	else /* need to call res_counter_charge */
+		ret = false;
+	put_cpu_var(memcg_stock);
+	return ret;
+}
+
+/*
+ * Returns charges cached in the percpu stock to res_counter and resets
+ * the cached information.
+ */
+static void drain_stock(struct memcg_stock_pcp *stock)
+{
+	struct mem_cgroup *old = stock->cached;
+
+	if (stock->charge) {
+		res_counter_uncharge(&old->res, stock->charge);
+		if (do_swap_account)
+			res_counter_uncharge(&old->memsw, stock->charge);
+	}
+	stock->cached = NULL;
+	stock->charge = 0;
+}
+
+/*
+ * This must be called with preemption disabled or by a thread which
+ * is pinned to the local cpu.
+ */
+static void drain_local_stock(struct work_struct *dummy)
+{
+	struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock);
+	drain_stock(stock);
+}
+
+/*
+ * Cache charges (val), taken from res_counter, in the local per-cpu area.
+ * They will be consumed by consume_stock() later.
+ */
+
+static void refill_stock(struct mem_cgroup *mem, int val)
+{
+	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
+
+	if (stock->cached != mem) { /* reset if necessary */
+		drain_stock(stock);
+		stock->cached = mem;
+	}
+	stock->charge += val;
+	put_cpu_var(memcg_stock);
+}
+
+/*
+ * Tries to drain stocked charges on other cpus. This function is asynchronous
+ * and just schedules one work item per cpu to drain locally on each cpu. The
+ * caller can expect that some charges will be returned to res_counter later,
+ * but cannot wait for it.
+ */
+
+static void drain_all_stock_async(void)
+{
+	int cpu;
+	/* This function is for scheduling "drain" in asynchronous way.
+	 * The result of "drain" is not directly handled by callers. Then,
+	 * if someone is calling drain, we don't have to call drain more.
+	 * Anyway, work_pending() will catch if there is a race. We just do
+	 * loose check here.
+	 */
+	if (atomic_read(&memcg_drain_count))
+		return;
+	/* Notify other cpus that system-wide "drain" is running */
+	atomic_inc(&memcg_drain_count);
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
+		if (work_pending(&stock->work))
+			continue;
+		INIT_WORK(&stock->work, drain_local_stock);
+		schedule_work_on(cpu, &stock->work);
+	}
+ 	put_online_cpus();
+	atomic_dec(&memcg_drain_count);
+	/* We don't wait for flush_work */
+}
+
+/* This is a synchronous drain interface. */
+static void drain_all_stock_sync(void)
+{
+	/* called when force_empty is called */
+	atomic_inc(&memcg_drain_count);
+	schedule_on_each_cpu(drain_local_stock);
+	atomic_dec(&memcg_drain_count);
+}
+
+static int __cpuinit memcg_stock_cpu_callback(struct notifier_block *nb,
+					unsigned long action,
+					void *hcpu)
+{
+	int cpu = (unsigned long)hcpu;
+	struct memcg_stock_pcp *stock;
+
+	if (action != CPU_DEAD)
+		return NOTIFY_OK;
+	stock = &per_cpu(memcg_stock, cpu);
+	drain_stock(stock);
+	return NOTIFY_OK;
+}
+
+/*
  * Unlike exported interface, "oom" parameter is added. if oom==true,
  * oom-killer can be invoked.
  */
@@ -1269,6 +1406,7 @@ static int __mem_cgroup_try_charge(struc
 	struct mem_cgroup *mem, *mem_over_limit;
 	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
 	struct res_counter *fail_res;
+	int csize = CHARGE_SIZE;
 
 	if (unlikely(test_thread_flag(TIF_MEMDIE))) {
 		/* Don't account this! */
@@ -1293,23 +1431,25 @@ static int __mem_cgroup_try_charge(struc
 		return 0;
 
 	VM_BUG_ON(css_is_removed(&mem->css));
+	if (mem_cgroup_is_root(mem))
+		goto done;
 
 	while (1) {
 		int ret = 0;
 		unsigned long flags = 0;
 
-		if (mem_cgroup_is_root(mem))
-			goto done;
-		ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
+		if (consume_stock(mem))
+			goto charged;
+
+		ret = res_counter_charge(&mem->res, csize, &fail_res);
 		if (likely(!ret)) {
 			if (!do_swap_account)
 				break;
-			ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
-							&fail_res);
+			ret = res_counter_charge(&mem->memsw, csize, &fail_res);
 			if (likely(!ret))
 				break;
 			/* mem+swap counter fails */
-			res_counter_uncharge(&mem->res, PAGE_SIZE);
+			res_counter_uncharge(&mem->res, csize);
 			flags |= MEM_CGROUP_RECLAIM_NOSWAP;
 			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
 									memsw);
@@ -1318,6 +1458,11 @@ static int __mem_cgroup_try_charge(struc
 			mem_over_limit = mem_cgroup_from_res_counter(fail_res,
 									res);
 
+		/* reduce request size and retry */
+		if (csize > PAGE_SIZE) {
+			csize = PAGE_SIZE;
+			continue;
+		}
 		if (!(gfp_mask & __GFP_WAIT))
 			goto nomem;
 
@@ -1347,6 +1492,9 @@ static int __mem_cgroup_try_charge(struc
 			goto nomem;
 		}
 	}
+	if (csize > PAGE_SIZE)
+		refill_stock(mem, csize - PAGE_SIZE);
+charged:
 	/*
 	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
 	 * if they exceeds softlimit.
@@ -2468,6 +2616,7 @@ move_account:
 			goto out;
 		/* This is for making all *used* pages to be on LRU. */
 		lru_add_drain_all();
+		drain_all_stock_sync();
 		ret = 0;
 		for_each_node_state(node, N_HIGH_MEMORY) {
 			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
@@ -3186,6 +3335,7 @@ mem_cgroup_create(struct cgroup_subsys *
 		root_mem_cgroup = mem;
 		if (mem_cgroup_soft_limit_tree_init())
 			goto free_out;
+		hotcpu_notifier(memcg_stock_cpu_callback, 0);
 
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);


* Re: [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-09  8:01 ` [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9) KAMEZAWA Hiroyuki
@ 2009-10-09 23:50   ` Andrew Morton
  2009-10-11  2:37     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2009-10-09 23:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, nishimura, h-shimamoto, linux-kernel

On Fri, 9 Oct 2009 17:01:05 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> +static void drain_all_stock_async(void)
> +{
> +	int cpu;
> +	/* This function is for scheduling "drain" in asynchronous way.
> +	 * The result of "drain" is not directly handled by callers. Then,
> +	 * if someone is calling drain, we don't have to call drain more.
> +	 * Anyway, work_pending() will catch if there is a race. We just do
> +	 * loose check here.
> +	 */
> +	if (atomic_read(&memcg_drain_count))
> +		return;
> +	/* Notify other cpus that system-wide "drain" is running */
> +	atomic_inc(&memcg_drain_count);
> +	get_online_cpus();
> +	for_each_online_cpu(cpu) {
> +		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> +		if (work_pending(&stock->work))
> +			continue;
> +		INIT_WORK(&stock->work, drain_local_stock);
> +		schedule_work_on(cpu, &stock->work);
> +	}
> + 	put_online_cpus();
> +	atomic_dec(&memcg_drain_count);
> +	/* We don't wait for flush_work */
> +}

It's unusual to run INIT_WORK() each time we use a work_struct. 
Usually we will run INIT_WORK a single time, then just repeatedly use
that structure.  Because after the work has completed, it is still in a
ready-to-use state.

Running INIT_WORK() repeatedly against the same work_struct adds a risk
that we'll scribble on an in-use work_struct, which would make a big
mess.
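
Something like the sketch below is the usual arrangement (illustrative
only; the two helper names are made up, the rest uses the names from your
patch):

/* initialize each per-cpu work item exactly once, e.g. at memcg init ... */
static void memcg_stock_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		INIT_WORK(&per_cpu(memcg_stock, cpu).work, drain_local_stock);
}

/* ... and afterwards only queue it.  queue_work_on() itself refuses to
 * queue a work item whose pending bit is already set, so no re-init (and
 * no work_pending() check) is needed at the call site.
 */
static void memcg_stock_schedule(int cpu)
{
	schedule_work_on(cpu, &per_cpu(memcg_stock, cpu).work);
}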


* Re: [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-09 23:50   ` Andrew Morton
@ 2009-10-11  2:37     ` KAMEZAWA Hiroyuki
  2009-10-13  7:57       ` Daisuke Nishimura
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-11  2:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, linux-mm, balbir, nishimura, h-shimamoto,
	linux-kernel

Andrew Morton wrote:
> On Fri, 9 Oct 2009 17:01:05 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>> +static void drain_all_stock_async(void)
>> +{
>> +	int cpu;
>> +	/* This function is for scheduling "drain" in asynchronous way.
>> +	 * The result of "drain" is not directly handled by callers. Then,
>> +	 * if someone is calling drain, we don't have to call drain more.
>> +	 * Anyway, work_pending() will catch if there is a race. We just do
>> +	 * loose check here.
>> +	 */
>> +	if (atomic_read(&memcg_drain_count))
>> +		return;
>> +	/* Notify other cpus that system-wide "drain" is running */
>> +	atomic_inc(&memcg_drain_count);
>> +	get_online_cpus();
>> +	for_each_online_cpu(cpu) {
>> +		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>> +		if (work_pending(&stock->work))
>> +			continue;
>> +		INIT_WORK(&stock->work, drain_local_stock);
>> +		schedule_work_on(cpu, &stock->work);
>> +	}
>> + 	put_online_cpus();
>> +	atomic_dec(&memcg_drain_count);
>> +	/* We don't wait for flush_work */
>> +}
>
> It's unusual to run INIT_WORK() each time we use a work_struct.
> Usually we will run INIT_WORK a single time, then just repeatedly use
> that structure.  Because after the work has completed, it is still in a
> ready-to-use state.
>
> Running INIT_WORK() repeatedly against the same work_struct adds a risk
> that we'll scribble on an in-use work_struct, which would make a big
> mess.
>
Ah, ok. I'll prepare a fix. (And I think atomic_dec/inc placement is not
very good....I'll do total review, again.)

Thank you for review.

Regards,
-Kame



* Re: [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-11  2:37     ` KAMEZAWA Hiroyuki
@ 2009-10-13  7:57       ` Daisuke Nishimura
  2009-10-13  8:05         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Daisuke Nishimura @ 2009-10-13  7:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, balbir, h-shimamoto, linux-kernel,
	Daisuke Nishimura

On Sun, 11 Oct 2009 11:37:35 +0900 (JST), "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Andrew Morton wrote:
> > On Fri, 9 Oct 2009 17:01:05 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> >> +static void drain_all_stock_async(void)
> >> +{
> >> +	int cpu;
> >> +	/* This function is for scheduling "drain" in asynchronous way.
> >> +	 * The result of "drain" is not directly handled by callers. Then,
> >> +	 * if someone is calling drain, we don't have to call drain more.
> >> +	 * Anyway, work_pending() will catch if there is a race. We just do
> >> +	 * loose check here.
> >> +	 */
> >> +	if (atomic_read(&memcg_drain_count))
> >> +		return;
> >> +	/* Notify other cpus that system-wide "drain" is running */
> >> +	atomic_inc(&memcg_drain_count);
Shouldn't we use atomic_inc_not_zero() ?
(Do you mean this problem by "is not very good" below ?)


Thanks,
Daisuke Nishimura.

> >> +	get_online_cpus();
> >> +	for_each_online_cpu(cpu) {
> >> +		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> >> +		if (work_pending(&stock->work))
> >> +			continue;
> >> +		INIT_WORK(&stock->work, drain_local_stock);
> >> +		schedule_work_on(cpu, &stock->work);
> >> +	}
> >> + 	put_online_cpus();
> >> +	atomic_dec(&memcg_drain_count);
> >> +	/* We don't wait for flush_work */
> >> +}
> >
> > It's unusual to run INIT_WORK() each time we use a work_struct.
> > Usually we will run INIT_WORK a single time, then just repeatedly use
> > that structure.  Because after the work has completed, it is still in a
> > ready-to-use state.
> >
> > Running INIT_WORK() repeatedly against the same work_struct adds a risk
> > that we'll scribble on an in-use work_struct, which would make a big
> > mess.
> >
> Ah, ok. I'll prepare a fix. (And I think atomic_dec/inc placement is not
> very good....I'll do total review, again.)
> 
> Thank you for review.
> 
> Regards,
> -Kame
> 
> 


* Re: [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-13  7:57       ` Daisuke Nishimura
@ 2009-10-13  8:05         ` KAMEZAWA Hiroyuki
  2009-10-14  6:42           ` Daisuke Nishimura
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-13  8:05 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, linux-mm, balbir, h-shimamoto, linux-kernel

On Tue, 13 Oct 2009 16:57:19 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Sun, 11 Oct 2009 11:37:35 +0900 (JST), "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Andrew Morton wrote:
> > > On Fri, 9 Oct 2009 17:01:05 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > >
> > >> +static void drain_all_stock_async(void)
> > >> +{
> > >> +	int cpu;
> > >> +	/* This function is for scheduling "drain" in asynchronous way.
> > >> +	 * The result of "drain" is not directly handled by callers. Then,
> > >> +	 * if someone is calling drain, we don't have to call drain more.
> > >> +	 * Anyway, work_pending() will catch if there is a race. We just do
> > >> +	 * loose check here.
> > >> +	 */
> > >> +	if (atomic_read(&memcg_drain_count))
> > >> +		return;
> > >> +	/* Notify other cpus that system-wide "drain" is running */
> > >> +	atomic_inc(&memcg_drain_count);
> Shouldn't we use atomic_inc_not_zero() ?
> (Do you mean this problem by "is not very good" below ?)
> 
As comment says, "we just do loose check". There is no terrible race except
for wasting cpu time.

I'm now thinking about the following:

==
for_each_online_cpu(cpu) {
	struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
	if (work_pending(&stock->work))
		continue;
+       atomic_inc(&memcg_drain_count);
	INIT_WORK(&stock->work, drain_local_stock);
	schedule_work_on(cpu, &stock->work);
}
==
Or use a cpumask to avoid scheduling twice.

atomic_dec will be added to worker routine, after drain.

I'm now prepareing slides for JLS (ah, yes, deadline has gone.), so plz give me time..
If you want to review it, plz let me know.

Thanks,
-Kame


> 
> Thanks,
> Daisuke Nishimura.
> 
> > >> +	get_online_cpus();
> > >> +	for_each_online_cpu(cpu) {
> > >> +		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > >> +		if (work_pending(&stock->work))
> > >> +			continue;
> > >> +		INIT_WORK(&stock->work, drain_local_stock);
> > >> +		schedule_work_on(cpu, &stock->work);
> > >> +	}
> > >> + 	put_online_cpus();
> > >> +	atomic_dec(&memcg_drain_count);
> > >> +	/* We don't wait for flush_work */
> > >> +}
> > >
> > > It's unusual to run INIT_WORK() each time we use a work_struct.
> > > Usually we will run INIT_WORK a single time, then just repeatedly use
> > > that structure.  Because after the work has completed, it is still in a
> > > ready-to-use state.
> > >
> > > Running INIT_WORK() repeatedly against the same work_struct adds a risk
> > > that we'll scribble on an in-use work_struct, which would make a big
> > > mess.
> > >
> > Ah, ok. I'll prepare a fix. (And I think atomic_dec/inc placement is not
> > very good....I'll do total review, again.)
> > 
> > Thank you for review.
> > 
> > Regards,
> > -Kame
> > 
> > 
> 


* Re: [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-13  8:05         ` KAMEZAWA Hiroyuki
@ 2009-10-14  6:42           ` Daisuke Nishimura
  2009-10-14  7:02             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Daisuke Nishimura @ 2009-10-14  6:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, balbir, h-shimamoto, linux-kernel,
	Daisuke Nishimura

> I'm now prepareing slides for JLS (ah, yes, deadline has gone.), so plz give me time..
I see. I think there is still time left till next merge window, so I'm in no hurry.
I will go to JLS and am looking forward to attending your session :)


Some comments and a fix about this patch are inlined.

On Tue, 13 Oct 2009 17:05:45 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 13 Oct 2009 16:57:19 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > On Sun, 11 Oct 2009 11:37:35 +0900 (JST), "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > Andrew Morton wrote:
> > > > On Fri, 9 Oct 2009 17:01:05 +0900
> > > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > >
> > > >> +static void drain_all_stock_async(void)
> > > >> +{
> > > >> +	int cpu;
> > > >> +	/* This function is for scheduling "drain" in asynchronous way.
> > > >> +	 * The result of "drain" is not directly handled by callers. Then,
> > > >> +	 * if someone is calling drain, we don't have to call drain more.
> > > >> +	 * Anyway, work_pending() will catch if there is a race. We just do
> > > >> +	 * loose check here.
> > > >> +	 */
> > > >> +	if (atomic_read(&memcg_drain_count))
> > > >> +		return;
> > > >> +	/* Notify other cpus that system-wide "drain" is running */
> > > >> +	atomic_inc(&memcg_drain_count);
> > Shouldn't we use atomic_inc_not_zero() ?
> > (Do you mean this problem by "is not very good" below ?)
> > 
> As comment says, "we just do loose check". There is no terrible race except
> for wasting cpu time.
> 
Thinking about it more, I think the loose check itself is not bad, but using
INIT_WORK() here is not good behavior, as Andrew said.

It clears WORK_STRUCT_PENDING bit and initializes the list_head of the work_struct.
So I think a following race can happen.

    work_pending() ->false
    INIT_WORK()
    schedule_work_on()    
      queue_work_on()        
                                  work_pending() -> false
        test_and_seet_bit()
          -> sets WORK_STRUCT_PENDING
                                  INIT_WORK() -> clears WORK_STRUCT_PENDING
                             

And actually, I've seen BUG several times in testing mmotm-2009-10-09-01-07 + these 2 patches.
This seems not to happen on plain mmotm-2009-10-09-01-07.


[ 2264.213803] ------------[ cut here ]------------
[ 2264.214058] WARNING: at lib/list_debug.c:30 __list_add+0x6e/0x87()
[ 2264.214058] Hardware name: Express5800/140Rd-4 [N8100-1065]
[ 2264.214058] list_add corruption. prev->next should be next (ffff880029fd8e80), but was
ffff880029fd36b8. (prev=ffff880029fd36b8).
[ 2264.214058] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables b
ridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_
cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_t
cp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc bat
tery ac lp sg ide_cd_mod cdrom serio_raw acpi_memhotplug button parport_pc parport rtc_cmo
s rtc_core rtc_lib e1000 i2c_i801 i2c_core pata_acpi ata_generic pcspkr dm_region_hash dm_
log dm_mod ata_piix libata shpchp megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd uhci_
hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
[ 2264.214058] Pid: 14398, comm: shmem_test_02 Tainted: G      D W  2.6.32-rc3-mmotm-2009-
10-09-01-07-00249-g1112566 #2
[ 2264.214058] Call Trace:
[ 2264.214058]  [<ffffffff811e0f43>] ? __list_add+0x6e/0x87
[ 2264.214058]  [<ffffffff8104af21>] warn_slowpath_common+0x7c/0x94
[ 2264.214058]  [<ffffffff8104afb8>] warn_slowpath_fmt+0x69/0x6b
[ 2264.214058]  [<ffffffff81077354>] ? print_lock_contention_bug+0x1b/0xe0
[ 2264.214058]  [<ffffffff810b819d>] ? probe_workqueue_insertion+0x40/0x9f
[ 2264.214058]  [<ffffffff81076720>] ? trace_hardirqs_off+0xd/0xf
[ 2264.214058]  [<ffffffff81386bef>] ? _spin_unlock_irqrestore+0x3d/0x4c
[ 2264.214058]  [<ffffffff811e0f43>] __list_add+0x6e/0x87
[ 2264.214058]  [<ffffffff81065230>] insert_work+0x78/0x9a
[ 2264.214058]  [<ffffffff81065c47>] __queue_work+0x2f/0x43
[ 2264.214058]  [<ffffffff81065cdb>] queue_work_on+0x44/0x50
[ 2264.214058]  [<ffffffff81065d02>] schedule_work_on+0x1b/0x1d
[ 2264.214058]  [<ffffffff81104623>] mem_cgroup_hierarchical_reclaim+0x232/0x40e
[ 2264.214058]  [<ffffffff81077eeb>] ? trace_hardirqs_on+0xd/0xf
[ 2264.214058]  [<ffffffff81104ec9>] __mem_cgroup_try_charge+0x14c/0x25b
[ 2264.214058]  [<ffffffff811058df>] mem_cgroup_charge_common+0x55/0x82
[ 2264.214058]  [<ffffffff810d5f9d>] ? shmem_swp_entry+0x134/0x140
[ 2264.214058]  [<ffffffff81106a44>] mem_cgroup_cache_charge+0xef/0x118
[ 2264.214058]  [<ffffffff810f766e>] ? alloc_page_vma+0xe0/0xef
[ 2264.214058]  [<ffffffff810d689a>] shmem_getpage+0x6a7/0x86b
[ 2264.214058]  [<ffffffff81077eeb>] ? trace_hardirqs_on+0xd/0xf
[ 2264.214058]  [<ffffffff810ce1d4>] ? get_page_from_freelist+0x4fc/0x6ce
[ 2264.214058]  [<ffffffff81386794>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2264.214058]  [<ffffffff8100e385>] ? do_softirq+0x77/0x86
[ 2264.214058]  [<ffffffff8100c5bc>] ? restore_args+0x0/0x30
[ 2264.214058]  [<ffffffff810502bf>] ? current_fs_time+0x27/0x2e
[ 2264.214058]  [<ffffffff81077354>] ? print_lock_contention_bug+0x1b/0xe0
[ 2264.214058]  [<ffffffff811236c2>] ? file_update_time+0x44/0x144
[ 2264.214058]  [<ffffffff810d6b11>] shmem_fault+0x42/0x63
[ 2264.214058]  [<ffffffff810e07db>] __do_fault+0x55/0x484
[ 2264.214058]  [<ffffffff81077354>] ? print_lock_contention_bug+0x1b/0xe0
[ 2264.214058]  [<ffffffff810e28e1>] handle_mm_fault+0x1ea/0x813
[ 2264.214058]  [<ffffffff81389770>] do_page_fault+0x25a/0x2ea
[ 2264.214058]  [<ffffffff8138757f>] page_fault+0x1f/0x30
[ 2264.214058] ---[ end trace d435092dd749cff5 ]---
[ 2264.538186] ------------[ cut here ]------------
[ 2264.538219] ------------[ cut here ]------------
[ 2264.538231] kernel BUG at kernel/workqueue.c:287!
[ 2264.538240] invalid opcode: 0000 [#2] SMP
[ 2264.538251] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
[ 2264.538259] CPU 0
[ 2264.538270] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables b
ridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_
cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_t
cp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc bat
tery ac lp sg ide_cd_mod cdrom serio_raw acpi_memhotplug button parport_pc parport rtc_cmo
s rtc_core rtc_lib e1000 i2c_i801 i2c_core pata_acpi ata_generic pcspkr dm_region_hash dm_
log dm_mod ata_piix libata shpchp megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd uhci_
hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
[ 2264.538627] Pid: 51, comm: events/0 Tainted: G      D W  2.6.32-rc3-mmotm-2009-10-09-01
-07-00249-g1112566 #2 Express5800/140Rd-4 [N8100-1065]
[ 2264.538638] RIP: 0010:[<ffffffff81065f56>]  [<ffffffff81065f56>] worker_thread+0x157/0x
2b3
[ 2264.538663] RSP: 0000:ffff8803a068fe10  EFLAGS: 00010207
[ 2264.538670] RAX: 0000000000000000 RBX: ffff8803a045a478 RCX: ffff880029fd36b8
[ 2264.538679] RDX: 0000000000001918 RSI: 00000001a068fdd0 RDI: ffffffff81386bad
[ 2264.538684] RBP: ffff8803a068feb0 R08: 0000000000000002 R09: 0000000000000001
[ 2264.538694] R10: ffffffff81389860 R11: ffff880029fd6518 R12: ffff880029fd36b0
[ 2264.538701] R13: ffff880029fd8e40 R14: ffff8803a067e590 R15: ffffffff811049f3
[ 2264.538719] FS:  0000000000000000(0000) GS:ffff880029e00000(0000) knlGS:000000000000000
0
[ 2264.538734] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 2264.538742] CR2: 00007ffcc30a0001 CR3: 0000000001001000 CR4: 00000000000006f0
[ 2264.538755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2264.538766] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2264.538773] Process events/0 (pid: 51, threadinfo ffff8803a068e000, task ffff8803a067e5
90)
[ 2264.538783] Stack:
[ 2264.538788]  ffffffff81065f5a ffffffff813847ec ffffffff827d6a20 0000000000000000
[ 2264.538807] <0> ffffffff814f5a42 0000000000000002 0000000000000000 ffff8803a067e948
[ 2264.538835] <0> 0000000000000000 ffff8803a067e590 ffffffff8106a61b ffff8803a068fe68
[ 2264.538857] Call Trace:
[ 2264.538868]  [<ffffffff81065f5a>] ? worker_thread+0x15b/0x2b3
[ 2264.538883]  [<ffffffff813847ec>] ? thread_return+0x3e/0xee
[ 2264.538887]  [<ffffffff8106a61b>] ? autoremove_wake_function+0x0/0x3d
[ 2264.538887]  [<ffffffff81065dff>] ? worker_thread+0x0/0x2b3
[ 2264.538887]  [<ffffffff8106a510>] kthread+0x82/0x8a
[ 2264.538887]  [<ffffffff8100cc1a>] child_rip+0xa/0x20
[ 2264.538887]  [<ffffffff8100c5bc>] ? restore_args+0x0/0x30
[ 2264.538887]  [<ffffffff8106a46d>] ? kthreadd+0xd5/0xf6
[ 2264.538887]  [<ffffffff8106a48e>] ? kthread+0x0/0x8a
[ 2264.538887]  [<ffffffff8100cc10>] ? child_rip+0x0/0x20
[ 2264.538887] Code: 00 4c 89 ef 48 8b 08 48 8b 50 08 48 89 51 08 48 89 0a 48 89 40 08 48
89 00 e8 34 0c 32 00 49 8b 04 24 48 83 e0 fc 4c 39 e8 74 04 <0f> 0b eb fe f0 41 80 24 24 f
e 49 8b bd a8 00 00 00 48 8d 9d 70
[ 2264.538887] RIP  [<ffffffff81065f56>] worker_thread+0x157/0x2b3
[ 2264.538887]  RSP <ffff8803a068fe10>
[ 2264.539304] ---[ end trace d435092dd749cff6 ]---


How about doing INIT_WORK() once in initialization like this ?
After this patch, it has survived the same test for more than 6 hours w/o causing the BUG,
while it survived for only 1 hour at most before.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Don't do INIT_WORK() repeatedly against the same work_struct.
Just do it once in initialization.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bc57916..fd8f65f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1349,8 +1349,8 @@ static void drain_all_stock_async(void)
 	/* This function is for scheduling "drain" in asynchronous way.
 	 * The result of "drain" is not directly handled by callers. Then,
 	 * if someone is calling drain, we don't have to call drain more.
-	 * Anyway, work_pending() will catch if there is a race. We just do
-	 * loose check here.
+	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
+	 * there is a race. We just do loose check here.
 	 */
 	if (atomic_read(&memcg_drain_count))
 		return;
@@ -1359,9 +1359,6 @@ static void drain_all_stock_async(void)
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		if (work_pending(&stock->work))
-			continue;
-		INIT_WORK(&stock->work, drain_local_stock);
 		schedule_work_on(cpu, &stock->work);
 	}
  	put_online_cpus();
@@ -3327,11 +3324,17 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 
 	/* root ? */
 	if (cont->parent == NULL) {
+		int cpu;
 		enable_swap_cgroup();
 		parent = NULL;
 		root_mem_cgroup = mem;
 		if (mem_cgroup_soft_limit_tree_init())
 			goto free_out;
+		for_each_possible_cpu(cpu) {
+			struct memcg_stock_pcp *stock =
+						&per_cpu(memcg_stock, cpu);
+			INIT_WORK(&stock->work, drain_local_stock);
+		}
 		hotcpu_notifier(memcg_stock_cpu_callback, 0);
 
 	} else {
-- 
1.5.6.1


* Re: [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9)
  2009-10-14  6:42           ` Daisuke Nishimura
@ 2009-10-14  7:02             ` KAMEZAWA Hiroyuki
  2009-10-16  0:32               ` [BUGFIX][PATCH -mmotm] memcg: don't do INIT_WORK() repeatedly against the same work_struct Daisuke Nishimura
  0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-14  7:02 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, linux-mm, balbir, h-shimamoto, linux-kernel

On Wed, 14 Oct 2009 15:42:11 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> It clears WORK_STRUCT_PENDING bit and initializes the list_head of the work_struct.
> So I think a following race can happen.
> 
>     work_pending() ->false
>     INIT_WORK()
>     schedule_work_on()    
>       queue_work_on()        
>                                   work_pending() -> false
>         test_and_seet_bit()
>           -> sets WORK_STRUCT_PENDING
>                                   INIT_WORK() -> clears WORK_STRUCT_PENDING
>                              
> 
> And actually, I've seen BUG several times in testing mmotm-2009-10-09-01-07 + these 2 patches.
> This seems not to happen on plain mmotm-2009-10-09-01-07.
> 
> 
> [ 2264.213803] ------------[ cut here ]------------
> [ 2264.214058] WARNING: at lib/list_debug.c:30 __list_add+0x6e/0x87()
> [ 2264.214058] Hardware name: Express5800/140Rd-4 [N8100-1065]
> [ 2264.214058] list_add corruption. prev->next should be next (ffff880029fd8e80), but was
> ffff880029fd36b8. (prev=ffff880029fd36b8).
> [ 2264.214058] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables b
> ridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_
> cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_t
> cp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc bat
> tery ac lp sg ide_cd_mod cdrom serio_raw acpi_memhotplug button parport_pc parport rtc_cmo
> s rtc_core rtc_lib e1000 i2c_i801 i2c_core pata_acpi ata_generic pcspkr dm_region_hash dm_
> log dm_mod ata_piix libata shpchp megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd uhci_
> hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
> [ 2264.214058] Pid: 14398, comm: shmem_test_02 Tainted: G      D W  2.6.32-rc3-mmotm-2009-
> 10-09-01-07-00249-g1112566 #2
> [ 2264.214058] Call Trace:
> [ 2264.214058]  [<ffffffff811e0f43>] ? __list_add+0x6e/0x87
> [ 2264.214058]  [<ffffffff8104af21>] warn_slowpath_common+0x7c/0x94
> [ 2264.214058]  [<ffffffff8104afb8>] warn_slowpath_fmt+0x69/0x6b
> [ 2264.214058]  [<ffffffff81077354>] ? print_lock_contention_bug+0x1b/0xe0
> [ 2264.214058]  [<ffffffff810b819d>] ? probe_workqueue_insertion+0x40/0x9f
> [ 2264.214058]  [<ffffffff81076720>] ? trace_hardirqs_off+0xd/0xf
> [ 2264.214058]  [<ffffffff81386bef>] ? _spin_unlock_irqrestore+0x3d/0x4c
> [ 2264.214058]  [<ffffffff811e0f43>] __list_add+0x6e/0x87
> [ 2264.214058]  [<ffffffff81065230>] insert_work+0x78/0x9a
> [ 2264.214058]  [<ffffffff81065c47>] __queue_work+0x2f/0x43
> [ 2264.214058]  [<ffffffff81065cdb>] queue_work_on+0x44/0x50
> [ 2264.214058]  [<ffffffff81065d02>] schedule_work_on+0x1b/0x1d
> [ 2264.214058]  [<ffffffff81104623>] mem_cgroup_hierarchical_reclaim+0x232/0x40e
> [ 2264.214058]  [<ffffffff81077eeb>] ? trace_hardirqs_on+0xd/0xf
> [ 2264.214058]  [<ffffffff81104ec9>] __mem_cgroup_try_charge+0x14c/0x25b
> [ 2264.214058]  [<ffffffff811058df>] mem_cgroup_charge_common+0x55/0x82
> [ 2264.214058]  [<ffffffff810d5f9d>] ? shmem_swp_entry+0x134/0x140
> [ 2264.214058]  [<ffffffff81106a44>] mem_cgroup_cache_charge+0xef/0x118
> [ 2264.214058]  [<ffffffff810f766e>] ? alloc_page_vma+0xe0/0xef
> [ 2264.214058]  [<ffffffff810d689a>] shmem_getpage+0x6a7/0x86b
> [ 2264.214058]  [<ffffffff81077eeb>] ? trace_hardirqs_on+0xd/0xf
> [ 2264.214058]  [<ffffffff810ce1d4>] ? get_page_from_freelist+0x4fc/0x6ce
> [ 2264.214058]  [<ffffffff81386794>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 2264.214058]  [<ffffffff8100e385>] ? do_softirq+0x77/0x86
> [ 2264.214058]  [<ffffffff8100c5bc>] ? restore_args+0x0/0x30
> [ 2264.214058]  [<ffffffff810502bf>] ? current_fs_time+0x27/0x2e
> [ 2264.214058]  [<ffffffff81077354>] ? print_lock_contention_bug+0x1b/0xe0
> [ 2264.214058]  [<ffffffff811236c2>] ? file_update_time+0x44/0x144
> [ 2264.214058]  [<ffffffff810d6b11>] shmem_fault+0x42/0x63
> [ 2264.214058]  [<ffffffff810e07db>] __do_fault+0x55/0x484
> [ 2264.214058]  [<ffffffff81077354>] ? print_lock_contention_bug+0x1b/0xe0
> [ 2264.214058]  [<ffffffff810e28e1>] handle_mm_fault+0x1ea/0x813
> [ 2264.214058]  [<ffffffff81389770>] do_page_fault+0x25a/0x2ea
> [ 2264.214058]  [<ffffffff8138757f>] page_fault+0x1f/0x30
> [ 2264.214058] ---[ end trace d435092dd749cff5 ]---
> [ 2264.538186] ------------[ cut here ]------------
> [ 2264.538219] ------------[ cut here ]------------
> [ 2264.538231] kernel BUG at kernel/workqueue.c:287!
> [ 2264.538240] invalid opcode: 0000 [#2] SMP
> [ 2264.538251] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
> [ 2264.538259] CPU 0
> [ 2264.538270] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables b
> ridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_
> cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_t
> cp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc bat
> tery ac lp sg ide_cd_mod cdrom serio_raw acpi_memhotplug button parport_pc parport rtc_cmo
> s rtc_core rtc_lib e1000 i2c_i801 i2c_core pata_acpi ata_generic pcspkr dm_region_hash dm_
> log dm_mod ata_piix libata shpchp megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd uhci_
> hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
> [ 2264.538627] Pid: 51, comm: events/0 Tainted: G      D W  2.6.32-rc3-mmotm-2009-10-09-01
> -07-00249-g1112566 #2 Express5800/140Rd-4 [N8100-1065]
> [ 2264.538638] RIP: 0010:[<ffffffff81065f56>]  [<ffffffff81065f56>] worker_thread+0x157/0x
> 2b3
> [ 2264.538663] RSP: 0000:ffff8803a068fe10  EFLAGS: 00010207
> [ 2264.538670] RAX: 0000000000000000 RBX: ffff8803a045a478 RCX: ffff880029fd36b8
> [ 2264.538679] RDX: 0000000000001918 RSI: 00000001a068fdd0 RDI: ffffffff81386bad
> [ 2264.538684] RBP: ffff8803a068feb0 R08: 0000000000000002 R09: 0000000000000001
> [ 2264.538694] R10: ffffffff81389860 R11: ffff880029fd6518 R12: ffff880029fd36b0
> [ 2264.538701] R13: ffff880029fd8e40 R14: ffff8803a067e590 R15: ffffffff811049f3
> [ 2264.538719] FS:  0000000000000000(0000) GS:ffff880029e00000(0000) knlGS:000000000000000
> 0
> [ 2264.538734] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 2264.538742] CR2: 00007ffcc30a0001 CR3: 0000000001001000 CR4: 00000000000006f0
> [ 2264.538755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2264.538766] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2264.538773] Process events/0 (pid: 51, threadinfo ffff8803a068e000, task ffff8803a067e5
> 90)
> [ 2264.538783] Stack:
> [ 2264.538788]  ffffffff81065f5a ffffffff813847ec ffffffff827d6a20 0000000000000000
> [ 2264.538807] <0> ffffffff814f5a42 0000000000000002 0000000000000000 ffff8803a067e948
> [ 2264.538835] <0> 0000000000000000 ffff8803a067e590 ffffffff8106a61b ffff8803a068fe68
> [ 2264.538857] Call Trace:
> [ 2264.538868]  [<ffffffff81065f5a>] ? worker_thread+0x15b/0x2b3
> [ 2264.538883]  [<ffffffff813847ec>] ? thread_return+0x3e/0xee
> [ 2264.538887]  [<ffffffff8106a61b>] ? autoremove_wake_function+0x0/0x3d
> [ 2264.538887]  [<ffffffff81065dff>] ? worker_thread+0x0/0x2b3
> [ 2264.538887]  [<ffffffff8106a510>] kthread+0x82/0x8a
> [ 2264.538887]  [<ffffffff8100cc1a>] child_rip+0xa/0x20
> [ 2264.538887]  [<ffffffff8100c5bc>] ? restore_args+0x0/0x30
> [ 2264.538887]  [<ffffffff8106a46d>] ? kthreadd+0xd5/0xf6
> [ 2264.538887]  [<ffffffff8106a48e>] ? kthread+0x0/0x8a
> [ 2264.538887]  [<ffffffff8100cc10>] ? child_rip+0x0/0x20
> [ 2264.538887] Code: 00 4c 89 ef 48 8b 08 48 8b 50 08 48 89 51 08 48 89 0a 48 89 40 08 48
> 89 00 e8 34 0c 32 00 49 8b 04 24 48 83 e0 fc 4c 39 e8 74 04 <0f> 0b eb fe f0 41 80 24 24 f
> e 49 8b bd a8 00 00 00 48 8d 9d 70
> [ 2264.538887] RIP  [<ffffffff81065f56>] worker_thread+0x157/0x2b3
> [ 2264.538887]  RSP <ffff8803a068fe10>
> [ 2264.539304] ---[ end trace d435092dd749cff6 ]---
> 
> 
> How about doing INIT_WORK() once in initialization like this ?
> After this patch, it has survived the same test for more than 6 hours w/o causing the BUG,
> while it survived for only 1 hour at most before.
> 
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> Don't do INIT_WORK() repeatedly against the same work_struct.
> Just do it once in initialization.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks. You're much quicker than me.

Regards,
-Kame

> ---
>  mm/memcontrol.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc57916..fd8f65f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1349,8 +1349,8 @@ static void drain_all_stock_async(void)
>  	/* This function is for scheduling "drain" in asynchronous way.
>  	 * The result of "drain" is not directly handled by callers. Then,
>  	 * if someone is calling drain, we don't have to call drain more.
> -	 * Anyway, work_pending() will catch if there is a race. We just do
> -	 * loose check here.
> +	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
> +	 * there is a race. We just do loose check here.
>  	 */
>  	if (atomic_read(&memcg_drain_count))
>  		return;
> @@ -1359,9 +1359,6 @@ static void drain_all_stock_async(void)
>  	get_online_cpus();
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> -		if (work_pending(&stock->work))
> -			continue;
> -		INIT_WORK(&stock->work, drain_local_stock);
>  		schedule_work_on(cpu, &stock->work);
>  	}
>   	put_online_cpus();
> @@ -3327,11 +3324,17 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  
>  	/* root ? */
>  	if (cont->parent == NULL) {
> +		int cpu;
>  		enable_swap_cgroup();
>  		parent = NULL;
>  		root_mem_cgroup = mem;
>  		if (mem_cgroup_soft_limit_tree_init())
>  			goto free_out;
> +		for_each_possible_cpu(cpu) {
> +			struct memcg_stock_pcp *stock =
> +						&per_cpu(memcg_stock, cpu);
> +			INIT_WORK(&stock->work, drain_local_stock);
> +		}
>  		hotcpu_notifier(memcg_stock_cpu_callback, 0);
>  
>  	} else {
> -- 
> 1.5.6.1
> 
> 


* [BUGFIX][PATCH -mmotm] memcg: don't do INIT_WORK() repeatedly against the same work_struct
  2009-10-14  7:02             ` KAMEZAWA Hiroyuki
@ 2009-10-16  0:32               ` Daisuke Nishimura
  0 siblings, 0 replies; 9+ messages in thread
From: Daisuke Nishimura @ 2009-10-16  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, linux-mm, balbir, h-shimamoto, linux-kernel,
	Daisuke Nishimura

This is a fix for memcg-coalesce-charging-via-percpu-storage.patch,
and can be applied after memcg-coalesce-charging-via-percpu-storage-fix.patch.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Don't do INIT_WORK() repeatedly against the same work_struct.
It can actually lead to a BUG.

Just do it once in initialization.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f850941..bf02bea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1349,8 +1349,8 @@ static void drain_all_stock_async(void)
 	/* This function is for scheduling "drain" in asynchronous way.
 	 * The result of "drain" is not directly handled by callers. Then,
 	 * if someone is calling drain, we don't have to call drain more.
-	 * Anyway, work_pending() will catch if there is a race. We just do
-	 * loose check here.
+	 * Anyway, WORK_STRUCT_PENDING check in queue_work_on() will catch if
+	 * there is a race. We just do loose check here.
 	 */
 	if (atomic_read(&memcg_drain_count))
 		return;
@@ -1359,9 +1359,6 @@ static void drain_all_stock_async(void)
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
-		if (work_pending(&stock->work))
-			continue;
-		INIT_WORK(&stock->work, drain_local_stock);
 		schedule_work_on(cpu, &stock->work);
 	}
  	put_online_cpus();
@@ -3327,11 +3324,17 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 
 	/* root ? */
 	if (cont->parent == NULL) {
+		int cpu;
 		enable_swap_cgroup();
 		parent = NULL;
 		root_mem_cgroup = mem;
 		if (mem_cgroup_soft_limit_tree_init())
 			goto free_out;
+		for_each_possible_cpu(cpu) {
+			struct memcg_stock_pcp *stock =
+						&per_cpu(memcg_stock, cpu);
+			INIT_WORK(&stock->work, drain_local_stock);
+		}
 		hotcpu_notifier(memcg_stock_cpu_callback, 0);
 
 	} else {
-- 
1.5.6.1


Thread overview: 9+ messages
2009-10-09  7:58 [PATCH 1/2] memcg: coalescing uncharge at unmap/truncate (Oct/9) KAMEZAWA Hiroyuki
2009-10-09  8:01 ` [PATCH 2/2] memcg: coalescing charge by percpu (Oct/9) KAMEZAWA Hiroyuki
2009-10-09 23:50   ` Andrew Morton
2009-10-11  2:37     ` KAMEZAWA Hiroyuki
2009-10-13  7:57       ` Daisuke Nishimura
2009-10-13  8:05         ` KAMEZAWA Hiroyuki
2009-10-14  6:42           ` Daisuke Nishimura
2009-10-14  7:02             ` KAMEZAWA Hiroyuki
2009-10-16  0:32               ` [BUGFIX][PATCH -mmotm] memcg: don't do INIT_WORK() repeatedly against the same work_struct Daisuke Nishimura
