* [RFC][PATCH] memcg remove css_get/put per pages
@ 2010-06-08 3:19 KAMEZAWA Hiroyuki
2010-06-08 5:40 ` Balbir Singh
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-08 3:19 UTC (permalink / raw)
To: linux-mm; +Cc: balbir, nishimura
Now, I think pre_destroy->force_empty() works very well and we can get rid of
css_put/get per page. This has a very big effect in some special cases.
This is a test result with a multi-threaded page fault program
(the one I used in the rwsem discussion).
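(The benchmark itself isn't posted here; below is a minimal sketch of what
such a program might look like. The 8 threads and the 2MB-per-thread mapping
released via madvise(MADV_DONTNEED) follow the description in the v2 posting
later in this thread; everything else is an assumption.)

/*
 * Hypothetical sketch of the multi-threaded page-fault benchmark
 * (not the original program).  Each thread repeatedly touches a 2MB
 * private anonymous mapping and then drops it with MADV_DONTNEED, so
 * every pass generates fresh anonymous-page faults (and therefore
 * memcg charge/uncharge activity) when run inside a memory cgroup.
 */
#include <pthread.h>
#include <sys/mman.h>

#define NR_THREADS	8		/* assumed: one thread per CPU */
#define MAP_SIZE	(2UL << 20)	/* 2MB per thread, per the v2 description */
#define PAGE_SZ		4096UL
#define ITERATIONS	10000

static void *fault_loop(void *arg)
{
	char *buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	for (int i = 0; i < ITERATIONS; i++) {
		/* touch every page -> one minor fault per page */
		for (unsigned long off = 0; off < MAP_SIZE; off += PAGE_SZ)
			buf[off] = 1;
		/* release the pages so the next pass faults them in again */
		madvise(buf, MAP_SIZE, MADV_DONTNEED);
	}
	munmap(buf, MAP_SIZE);
	return NULL;
}

int main(void)
{
	pthread_t th[NR_THREADS];

	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&th[i], NULL, fault_loop, NULL);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(th[i], NULL);
	return 0;
}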
[Before patch]
25.72% multi-fault-all [kernel.kallsyms] [k] clear_page_c
8.18% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm
8.17% multi-fault-all [kernel.kallsyms] [k] down_read_trylock
8.03% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave
5.46% multi-fault-all [kernel.kallsyms] [k] __css_put
5.45% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask
4.36% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq
4.35% multi-fault-all [kernel.kallsyms] [k] up_read
3.59% multi-fault-all [kernel.kallsyms] [k] css_put
2.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock
1.80% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list
1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue
1.65% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault
try_get_mem_cgroup_from_mm() is one of the heavy operations because of
false sharing (cacheline ping-pong among CPUs) on the css's counter
updated by css_get/put.
I removed that.
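(To make the point concrete: css_get()/css_put() are atomic RMW operations
on a single per-cgroup counter, so every CPU charging pages of the same
memcg keeps pulling that counter's cacheline exclusive. A simplified,
made-up illustration follows; this is not the real struct
cgroup_subsys_state layout.)

#include <linux/atomic.h>

/* toy model of the shared css reference counter (names are invented) */
struct toy_css {
	atomic_t refcnt;		/* one counter shared by all CPUs */
};

static inline void toy_css_get(struct toy_css *css)
{
	atomic_inc(&css->refcnt);	/* exclusive cacheline access per charge */
}

static inline void toy_css_put(struct toy_css *css)
{
	atomic_dec(&css->refcnt);	/* and again per uncharge */
}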
[After]
26.16% multi-fault-all [kernel.kallsyms] [k] clear_page_c
11.73% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock
9.23% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave
9.07% multi-fault-all [kernel.kallsyms] [k] down_read_trylock
6.09% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq
5.57% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask
4.86% multi-fault-all [kernel.kallsyms] [k] up_read
2.54% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge
2.29% multi-fault-all [kernel.kallsyms] [k] _cond_resched
2.04% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list
1.82% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault
Hmm, seems nice. But I'm not convinced my patch has no race.
I'll continue testing, but your help is welcome.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Now, memory cgroup increments the css (cgroup subsys state)'s reference
count for each charged page, and the reference count is kept until
the page is uncharged. But this has 2 bad effects.
1. Because css_get/put call atomic_inc()/dec(), heavy use of them
on large SMP systems will not scale well.
2. Because the css's refcnt can never drop to a "ready-to-release"
state while pages remain charged, cgroup's notify_on_release handler
can't work with memcg.
This is a trial to remove the css refcnt per page. Even if we remove
the refcnt, pre_destroy() does enough synchronization.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 66 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 46 insertions(+), 20 deletions(-)
Index: mmotm-2.6.34-Jun6/mm/memcontrol.c
===================================================================
--- mmotm-2.6.34-Jun6.orig/mm/memcontrol.c
+++ mmotm-2.6.34-Jun6/mm/memcontrol.c
@@ -1717,25 +1717,49 @@ static int __mem_cgroup_try_charge(struc
* thread group leader migrates. It's possible that mm is not
* set, if so charge the init_mm (happens for pagecache usage).
*/
- if (*memcg) {
+ if (!*memcg && !mm)
+ goto bypass;
+again:
+ if (*memcg) { /* css should be a valid one */
mem = *memcg;
+ VM_BUG_ON(css_is_removed(&mem->css));
+ if (mem_cgroup_is_root(mem))
+ goto done;
+ if (consume_stock(mem))
+ goto done;
css_get(&mem->css);
} else {
- mem = try_get_mem_cgroup_from_mm(mm);
- if (unlikely(!mem))
- return 0;
- *memcg = mem;
- }
+ struct task_struct *p;
- VM_BUG_ON(css_is_removed(&mem->css));
- if (mem_cgroup_is_root(mem))
- goto done;
+ rcu_read_lock();
+ p = rcu_dereference(mm->owner);
+ VM_BUG_ON(!p);
+ /*
+ * While we hold task_lock, this task cannot be disconnected from
+ * the cgroup we see.
+ */
+ task_lock(p);
+ mem = mem_cgroup_from_task(p);
+ VM_BUG_ON(!mem);
+ if (mem_cgroup_is_root(mem)) {
+ task_unlock(p);
+ rcu_read_unlock();
+ goto done;
+ }
+ if (consume_stock(mem)) {
+ *memcg = mem;
+ task_unlock(p);
+ rcu_read_unlock();
+ goto done;
+ }
+ css_get(&mem->css);
+ task_unlock(p);
+ rcu_read_unlock();
+ }
do {
bool oom_check;
- if (consume_stock(mem))
- goto done; /* don't need to fill stock */
/* If killed, bypass charge */
if (fatal_signal_pending(current))
goto bypass;
@@ -1750,10 +1774,13 @@ static int __mem_cgroup_try_charge(struc
switch (ret) {
case CHARGE_OK:
+ *memcg = mem;
break;
case CHARGE_RETRY: /* not in OOM situation but retry */
csize = PAGE_SIZE;
- break;
+ css_put(&mem->css);
+ mem = NULL;
+ goto again;
case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
goto nomem;
case CHARGE_NOMEM: /* OOM routine works */
@@ -1769,6 +1796,7 @@ static int __mem_cgroup_try_charge(struc
if (csize > PAGE_SIZE)
refill_stock(mem, csize - PAGE_SIZE);
+ css_put(&mem->css);
done:
return 0;
nomem:
@@ -1795,7 +1823,6 @@ static void __mem_cgroup_cancel_charge(s
res_counter_uncharge(&mem->memsw, PAGE_SIZE * count);
VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
WARN_ON_ONCE(count > INT_MAX);
- __css_put(&mem->css, (int)count);
}
/* we don't need css_put for root */
}
@@ -2158,7 +2185,6 @@ int mem_cgroup_try_charge_swapin(struct
goto charge_cur_mm;
*ptr = mem;
ret = __mem_cgroup_try_charge(NULL, mask, ptr, true);
- /* drop extra refcnt from tryget */
css_put(&mem->css);
return ret;
charge_cur_mm:
@@ -2345,9 +2371,6 @@ __mem_cgroup_uncharge_common(struct page
unlock_page_cgroup(pc);
memcg_check_events(mem, page);
- /* at swapout, this memcg will be accessed to record to swap */
- if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
- css_put(&mem->css);
return mem;
@@ -2432,14 +2455,18 @@ mem_cgroup_uncharge_swapcache(struct pag
if (!swapout) /* this was a swap cache but the swap is unused ! */
ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
- memcg = __mem_cgroup_uncharge_common(page, ctype);
+ memcg = try_get_mem_cgroup_from_page(page);
+ if (!memcg)
+ return;
+
+ __mem_cgroup_uncharge_common(page, ctype);
/* record memcg information */
if (do_swap_account && swapout && memcg) {
swap_cgroup_record(ent, css_id(&memcg->css));
mem_cgroup_get(memcg);
}
- if (swapout && memcg)
+ if (memcg)
css_put(&memcg->css);
}
#endif
@@ -4219,7 +4246,6 @@ static int mem_cgroup_do_precharge(unsig
mc.precharge += count;
VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
WARN_ON_ONCE(count > INT_MAX);
- __css_get(&mem->css, (int)count);
return ret;
}
one_by_one:
--
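(For reviewers: the rewritten __mem_cgroup_try_charge() fast path boils down
to roughly the sketch below. It is an illustration, not the literal diff
above: the OOM/retry loop, the root-cgroup bypass and the case where *memcg
is already set are dropped, and charge_res_counter() is a placeholder for
the real __mem_cgroup_do_charge()/reclaim loop.)

/*
 * Simplified sketch of the charge path after this patch (illustration
 * only, kernel-style pseudocode).
 */

/* placeholder for the real __mem_cgroup_do_charge()/reclaim/retry loop */
static int charge_res_counter(struct mem_cgroup *mem);

static int try_charge_sketch(struct mm_struct *mm, struct mem_cgroup **memcg)
{
	struct mem_cgroup *mem;
	struct task_struct *p;

	rcu_read_lock();
	p = rcu_dereference(mm->owner);
	task_lock(p);
	mem = mem_cgroup_from_task(p);

	if (consume_stock(mem)) {
		/*
		 * Fast path: some charges of this memcg are already cached
		 * on this cpu, so take one without touching the shared
		 * res_counter and without any css_get()/css_put().
		 */
		task_unlock(p);
		rcu_read_unlock();
		goto done;
	}

	/* Slow path: pin the css only while we charge the res_counter. */
	css_get(&mem->css);
	task_unlock(p);
	rcu_read_unlock();

	if (charge_res_counter(mem)) {
		css_put(&mem->css);
		return -ENOMEM;
	}
	css_put(&mem->css);
done:
	*memcg = mem;
	return 0;
}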
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [RFC][PATCH] memcg remove css_get/put per pages 2010-06-08 3:19 [RFC][PATCH] memcg remove css_get/put per pages KAMEZAWA Hiroyuki @ 2010-06-08 5:40 ` Balbir Singh 2010-06-09 0:47 ` KAMEZAWA Hiroyuki 2010-06-08 7:31 ` Daisuke Nishimura 2010-06-09 6:59 ` [RFC][PATCH] memcg remove css_get/put per pages v2 KAMEZAWA Hiroyuki 2 siblings, 1 reply; 15+ messages in thread From: Balbir Singh @ 2010-06-08 5:40 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, nishimura * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-08 12:19:01]: > Now, I think pre_destroy->force_empty() works very well and we can get rid of > css_put/get per pages. This has very big effect in some special case. > > This is a test result with a multi-thread page fault program > (I used at rwsem discussion.) > > [Before patch] > 25.72% multi-fault-all [kernel.kallsyms] [k] clear_page_c > 8.18% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm > 8.17% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > 8.03% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > 5.46% multi-fault-all [kernel.kallsyms] [k] __css_put > 5.45% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > 4.36% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > 4.35% multi-fault-all [kernel.kallsyms] [k] up_read > 3.59% multi-fault-all [kernel.kallsyms] [k] css_put > 2.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > 1.80% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > 1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue > 1.65% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > try_get_mem_cgroup_from_mm() is a one of heavy ops because of false-sharing in > css's counter for css_get/put. > > I removed that. > > [After] > 26.16% multi-fault-all [kernel.kallsyms] [k] clear_page_c > 11.73% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > 9.23% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > 9.07% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > 6.09% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > 5.57% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > 4.86% multi-fault-all [kernel.kallsyms] [k] up_read > 2.54% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge > 2.29% multi-fault-all [kernel.kallsyms] [k] _cond_resched > 2.04% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > 1.82% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > Hmm. seems nice. But I don't convince my patch has no race. > I'll continue test but your help is welcome. > Looks nice, Kamezawa-San could you please confirm the source of raw_spin_lock_irqsave and trylock from /proc/lock_stat? > == > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > Now, memory cgroup increments css(cgroup subsys state)'s reference > count per a charged page. And the reference count is kept until > the page is uncharged. But this has 2 bad effect. > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > on large smp will not scale well. > 2. Because css's refcnt cannot be in a state as "ready-to-release", > cgroup's notify_on_release handler can't work with memcg. > > This is a trial to remove css's refcnt per a page. Even if we remove ^^ (per page) > refcnt, pre_destroy() does enough synchronization. Could you also document what the rules for css_get/put now become? 
I like the idea, but I am not sure if I understand the new rules correctly by looking at the code. -- Three Cheers, Balbir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages 2010-06-08 5:40 ` Balbir Singh @ 2010-06-09 0:47 ` KAMEZAWA Hiroyuki 2010-06-09 5:14 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-06-09 0:47 UTC (permalink / raw) To: balbir; +Cc: linux-mm, nishimura On Tue, 8 Jun 2010 11:10:04 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-08 12:19:01]: > > > Now, I think pre_destroy->force_empty() works very well and we can get rid of > > css_put/get per pages. This has very big effect in some special case. > > > > This is a test result with a multi-thread page fault program > > (I used at rwsem discussion.) > > > > [Before patch] > > 25.72% multi-fault-all [kernel.kallsyms] [k] clear_page_c > > 8.18% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm > > 8.17% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > > 8.03% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > > 5.46% multi-fault-all [kernel.kallsyms] [k] __css_put > > 5.45% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > > 4.36% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > > 4.35% multi-fault-all [kernel.kallsyms] [k] up_read > > 3.59% multi-fault-all [kernel.kallsyms] [k] css_put > > 2.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > > 1.80% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > > 1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue > > 1.65% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > > > try_get_mem_cgroup_from_mm() is a one of heavy ops because of false-sharing in > > css's counter for css_get/put. > > > > I removed that. > > > > [After] > > 26.16% multi-fault-all [kernel.kallsyms] [k] clear_page_c > > 11.73% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > > 9.23% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > > 9.07% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > > 6.09% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > > 5.57% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > > 4.86% multi-fault-all [kernel.kallsyms] [k] up_read > > 2.54% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge > > 2.29% multi-fault-all [kernel.kallsyms] [k] _cond_resched > > 2.04% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > > 1.82% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > > > Hmm. seems nice. But I don't convince my patch has no race. > > I'll continue test but your help is welcome. > > > > Looks nice, Kamezawa-San could you please confirm the source of > raw_spin_lock_irqsave and trylock from /proc/lock_stat? > Sure. But above result can be got when lockdep etc..are off. (it increase lock overhead) But yes, new _raw_spin_lock seems strange. > > == > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > > Now, memory cgroup increments css(cgroup subsys state)'s reference > > count per a charged page. And the reference count is kept until > > the page is uncharged. But this has 2 bad effect. > > > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > > on large smp will not scale well. > > 2. Because css's refcnt cannot be in a state as "ready-to-release", > > cgroup's notify_on_release handler can't work with memcg. > > > > This is a trial to remove css's refcnt per a page. Even if we remove > ^^ (per page) > > refcnt, pre_destroy() does enough synchronization. 
> > Could you also document what the rules for css_get/put now become? I > like the idea, but I am not sure if I understand the new rules > correctly by looking at the code. Hm. I'll try....but this just removes css_get/put per pages.. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages 2010-06-09 0:47 ` KAMEZAWA Hiroyuki @ 2010-06-09 5:14 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-06-09 5:14 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: balbir, linux-mm, nishimura On Wed, 9 Jun 2010 09:47:34 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > Looks nice, Kamezawa-San could you please confirm the source of > > raw_spin_lock_irqsave and trylock from /proc/lock_stat? > > > Sure. But above result can be got when lockdep etc..are off. > (it increase lock overhead) > > But yes, new _raw_spin_lock seems strange. > Here. == ------------------------------ &(&mm->page_table_lock)->rlock 20812995 [<ffffffff81124019>] handle_mm_fault+0x7a9/0x9b0 &(&mm->page_table_lock)->rlock 9 [<ffffffff81120c5b>] __pte_alloc+0x4b/0xf0 &(&mm->page_table_lock)->rlock 4 [<ffffffff8112c70d>] anon_vma_prepare+0xad/0x180 &(&mm->page_table_lock)->rlock 83395 [<ffffffff811204b4>] unmap_vmas+0x3c4/0xa60 ------------------------------ &(&mm->page_table_lock)->rlock 7 [<ffffffff81120c5b>] __pte_alloc+0x4b/0xf0 &(&mm->page_table_lock)->rlock 20812987 [<ffffffff81124019>] handle_mm_fault+0x7a9/0x9b0 &(&mm->page_table_lock)->rlock 2 [<ffffffff8112c70d>] anon_vma_prepare+0xad/0x180 &(&mm->page_table_lock)->rlock 83408 [<ffffffff811204b4>] unmap_vmas+0x3c4/0xa60 &(&p->alloc_lock)->rlock: 6304532 6308276 0.14 1772.97 7098177.74 23165904 23222238 0.00 1980.76 12445023.62 ------------------------ &(&p->alloc_lock)->rlock 6308277 [<ffffffff81153e17>] __mem_cgroup_try_charge+0x327/0x590 ------------------------ &(&p->alloc_lock)->rlock 6308277 [<ffffffff81153e17>] __mem_cgroup_try_charge+0x327/0x590 == Then, new raw_spin_lock is task_lock(). This is because task_lock(mm->owner) makes cacheline ping pong ;( So, this is not very good patch for multi-threaded programs, Sigh... I'll consider how I can get safe access without locks again.. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages 2010-06-08 3:19 [RFC][PATCH] memcg remove css_get/put per pages KAMEZAWA Hiroyuki 2010-06-08 5:40 ` Balbir Singh @ 2010-06-08 7:31 ` Daisuke Nishimura 2010-06-09 0:54 ` KAMEZAWA Hiroyuki 2010-06-09 6:59 ` [RFC][PATCH] memcg remove css_get/put per pages v2 KAMEZAWA Hiroyuki 2 siblings, 1 reply; 15+ messages in thread From: Daisuke Nishimura @ 2010-06-08 7:31 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, Daisuke Nishimura On Tue, 8 Jun 2010 12:19:01 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > Now, I think pre_destroy->force_empty() works very well and we can get rid of > css_put/get per pages. This has very big effect in some special case. > > This is a test result with a multi-thread page fault program > (I used at rwsem discussion.) > > [Before patch] > 25.72% multi-fault-all [kernel.kallsyms] [k] clear_page_c > 8.18% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm > 8.17% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > 8.03% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > 5.46% multi-fault-all [kernel.kallsyms] [k] __css_put > 5.45% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > 4.36% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > 4.35% multi-fault-all [kernel.kallsyms] [k] up_read > 3.59% multi-fault-all [kernel.kallsyms] [k] css_put > 2.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > 1.80% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > 1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue > 1.65% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > try_get_mem_cgroup_from_mm() is a one of heavy ops because of false-sharing in > css's counter for css_get/put. > I'm sorry, what do you mean by "false-sharing" ? And I think it would be better to add these performance data to commit log. > I removed that. > > [After] > 26.16% multi-fault-all [kernel.kallsyms] [k] clear_page_c > 11.73% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > 9.23% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > 9.07% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > 6.09% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > 5.57% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > 4.86% multi-fault-all [kernel.kallsyms] [k] up_read > 2.54% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge > 2.29% multi-fault-all [kernel.kallsyms] [k] _cond_resched > 2.04% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > 1.82% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > Hmm. seems nice. But I don't convince my patch has no race. > I'll continue test but your help is welcome. > > == > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > Now, memory cgroup increments css(cgroup subsys state)'s reference > count per a charged page. And the reference count is kept until > the page is uncharged. But this has 2 bad effect. > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > on large smp will not scale well. I'm sorry if I'm asking a stupid question, the number of css_get/put would be: before: get:1 in charge put:1 in uncharge after: get:1, put:1 in charge no get/put in uncharge right ? Then, isn't there any change as a whole ? > 2. Because css's refcnt cannot be in a state as "ready-to-release", > cgroup's notify_on_release handler can't work with memcg. > Yes, 2 is one of weak point of memcg, IMHO. 
> This is a trial to remove css's refcnt per a page. Even if we remove > refcnt, pre_destroy() does enough synchronization. > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > mm/memcontrol.c | 66 +++++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 46 insertions(+), 20 deletions(-) > > Index: mmotm-2.6.34-Jun6/mm/memcontrol.c > =================================================================== > --- mmotm-2.6.34-Jun6.orig/mm/memcontrol.c > +++ mmotm-2.6.34-Jun6/mm/memcontrol.c > @@ -1717,25 +1717,49 @@ static int __mem_cgroup_try_charge(struc > * thread group leader migrates. It's possible that mm is not > * set, if so charge the init_mm (happens for pagecache usage). > */ > - if (*memcg) { > + if (!*memcg && !mm) > + goto bypass; Shouldn't it be VM_BUG_ON(!*memcg && !mm) ? > +again: > + if (*memcg) { /* css should be a valid one */ > mem = *memcg; > + VM_BUG_ON(css_is_removed(mem)); > + if (mem_cgroup_is_root(mem)) > + goto done; > + if (consume_stock(mem)) > + goto done; > css_get(&mem->css); > } else { > - mem = try_get_mem_cgroup_from_mm(mm); > - if (unlikely(!mem)) > - return 0; > - *memcg = mem; > - } > + struct task_struct *p; > > - VM_BUG_ON(css_is_removed(&mem->css)); > - if (mem_cgroup_is_root(mem)) > - goto done; > + rcu_read_lock(); > + p = rcu_dereference(mm->owner); > + VM_BUG_ON(!p); > + /* > + * while task_lock, this task cannot be disconnected with > + * the cgroup we see. > + */ > + task_lock(p); > + mem = mem_cgroup_from_task(p); > + VM_BUG_ON(!mem); > + if (mem_cgroup_is_root(mem)) { Shoudn't we do "*memcg = mem" here ? hmm, how about doing: done: *memcg = mem; return 0; instead of doing "*memcg = mem" in some places ? > + task_unlock(p); > + rcu_read_unlock(); > + goto done; > + } > + if (consume_stock(mem)) { > + *memcg = mem; > + task_unlock(p); > + rcu_read_unlock(); > + goto done; > + } > + css_get(&mem->css); > + task_unlock(p); > + rcu_read_unlock(); > + } > > do { > bool oom_check; > > - if (consume_stock(mem)) > - goto done; /* don't need to fill stock */ > /* If killed, bypass charge */ > if (fatal_signal_pending(current)) > goto bypass; > @@ -1750,10 +1774,13 @@ static int __mem_cgroup_try_charge(struc > > switch (ret) { > case CHARGE_OK: > + *memcg = mem; > break; > case CHARGE_RETRY: /* not in OOM situation but retry */ > csize = PAGE_SIZE; > - break; > + css_put(&mem->css); > + mem = NULL; > + goto again; > case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */ > goto nomem; > case CHARGE_NOMEM: /* OOM routine works */ > @@ -1769,6 +1796,7 @@ static int __mem_cgroup_try_charge(struc > > if (csize > PAGE_SIZE) > refill_stock(mem, csize - PAGE_SIZE); > + css_put(&mem->css); > done: > return 0; > nomem: > @@ -1795,7 +1823,6 @@ static void __mem_cgroup_cancel_charge(s > res_counter_uncharge(&mem->memsw, PAGE_SIZE * count); > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); > WARN_ON_ONCE(count > INT_MAX); > - __css_put(&mem->css, (int)count); > } > /* we don't need css_put for root */ > } These VM_BUG_ON() and WARN_ON_ONCE() will be unnecessary, too. 
> @@ -2158,7 +2185,6 @@ int mem_cgroup_try_charge_swapin(struct > goto charge_cur_mm; > *ptr = mem; > ret = __mem_cgroup_try_charge(NULL, mask, ptr, true); > - /* drop extra refcnt from tryget */ > css_put(&mem->css); > return ret; > charge_cur_mm: > @@ -2345,9 +2371,6 @@ __mem_cgroup_uncharge_common(struct page > unlock_page_cgroup(pc); > > memcg_check_events(mem, page); > - /* at swapout, this memcg will be accessed to record to swap */ > - if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT) > - css_put(&mem->css); > > return mem; > > @@ -2432,14 +2455,18 @@ mem_cgroup_uncharge_swapcache(struct pag > if (!swapout) /* this was a swap cache but the swap is unused ! */ > ctype = MEM_CGROUP_CHARGE_TYPE_DROP; > > - memcg = __mem_cgroup_uncharge_common(page, ctype); > + memcg = try_get_mem_cgroup_from_page(page); > + if (!memcg) > + return; > + > + __mem_cgroup_uncharge_common(page, ctype); > > /* record memcg information */ > if (do_swap_account && swapout && memcg) { > swap_cgroup_record(ent, css_id(&memcg->css)); > mem_cgroup_get(memcg); > } > - if (swapout && memcg) > + if (memcg) > css_put(&memcg->css); > } > #endif "if (memcg)" is unnecessary(it's checked above). > @@ -4219,7 +4246,6 @@ static int mem_cgroup_do_precharge(unsig > mc.precharge += count; > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); > WARN_ON_ONCE(count > INT_MAX); > - __css_get(&mem->css, (int)count); > return ret; > } > one_by_one: > ditto. IIUC this patch, we should remove css_put() in mem_cgroup_move_swap_account() and __css_put() in mem_cgroup_clear_mc() too, and modify some comments. Anyway, we must test these changes carefully. Thanks, Daisuke Nishimura. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages 2010-06-08 7:31 ` Daisuke Nishimura @ 2010-06-09 0:54 ` KAMEZAWA Hiroyuki 2010-06-09 2:05 ` Daisuke Nishimura 0 siblings, 1 reply; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-06-09 0:54 UTC (permalink / raw) To: Daisuke Nishimura; +Cc: linux-mm, balbir On Tue, 8 Jun 2010 16:31:29 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > On Tue, 8 Jun 2010 12:19:01 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > Now, I think pre_destroy->force_empty() works very well and we can get rid of > > css_put/get per pages. This has very big effect in some special case. > > > > This is a test result with a multi-thread page fault program > > (I used at rwsem discussion.) > > > > [Before patch] > > 25.72% multi-fault-all [kernel.kallsyms] [k] clear_page_c > > 8.18% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm > > 8.17% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > > 8.03% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > > 5.46% multi-fault-all [kernel.kallsyms] [k] __css_put > > 5.45% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > > 4.36% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > > 4.35% multi-fault-all [kernel.kallsyms] [k] up_read > > 3.59% multi-fault-all [kernel.kallsyms] [k] css_put > > 2.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > > 1.80% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > > 1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue > > 1.65% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > > > try_get_mem_cgroup_from_mm() is a one of heavy ops because of false-sharing in > > css's counter for css_get/put. > > > I'm sorry, what do you mean by "false-sharing" ? cacheline pingpong among cpus arount css' counter. > And I think it would be better to add these performance data to commit log. > yes, I'll move this when I remove RFC. > > I removed that. > > > > [After] > > 26.16% multi-fault-all [kernel.kallsyms] [k] clear_page_c > > 11.73% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock > > 9.23% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave > > 9.07% multi-fault-all [kernel.kallsyms] [k] down_read_trylock > > 6.09% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq > > 5.57% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask > > 4.86% multi-fault-all [kernel.kallsyms] [k] up_read > > 2.54% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge > > 2.29% multi-fault-all [kernel.kallsyms] [k] _cond_resched > > 2.04% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list > > 1.82% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault > > > > Hmm. seems nice. But I don't convince my patch has no race. > > I'll continue test but your help is welcome. > > > > == > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > > Now, memory cgroup increments css(cgroup subsys state)'s reference > > count per a charged page. And the reference count is kept until > > the page is uncharged. But this has 2 bad effect. > > > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > > on large smp will not scale well. > I'm sorry if I'm asking a stupid question, the number of css_get/put > would be: > > before: > get:1 in charge > put:1 in uncharge > after: > get:1, put:1 in charge > no get/put in uncharge > > right ? No. before: get 1 in charge. put 1 at charge after: no get at charge in fast path (cunsume_stcok hits.) 
get 1 at accssing res_counter and reclaim, put 1 after it. no get/put in uncharge. > Then, isn't there any change as a whole ? > We get much benefit when consume_stock() works. > > 2. Because css's refcnt cannot be in a state as "ready-to-release", > > cgroup's notify_on_release handler can't work with memcg. > > > Yes, 2 is one of weak point of memcg, IMHO. > > > This is a trial to remove css's refcnt per a page. Even if we remove > > refcnt, pre_destroy() does enough synchronization. > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > --- > > mm/memcontrol.c | 66 +++++++++++++++++++++++++++++++++++++++----------------- > > 1 file changed, 46 insertions(+), 20 deletions(-) > > > > Index: mmotm-2.6.34-Jun6/mm/memcontrol.c > > =================================================================== > > --- mmotm-2.6.34-Jun6.orig/mm/memcontrol.c > > +++ mmotm-2.6.34-Jun6/mm/memcontrol.c > > @@ -1717,25 +1717,49 @@ static int __mem_cgroup_try_charge(struc > > * thread group leader migrates. It's possible that mm is not > > * set, if so charge the init_mm (happens for pagecache usage). > > */ > > - if (*memcg) { > > + if (!*memcg && !mm) > > + goto bypass; > Shouldn't it be VM_BUG_ON(!*memcg && !mm) ? > IIUC, we were afraid of a thread with no mm comes here...at boot, etc.. I don't want to change the minor behavior in this set. > > +again: > > + if (*memcg) { /* css should be a valid one */ > > mem = *memcg; > > + VM_BUG_ON(css_is_removed(mem)); > > + if (mem_cgroup_is_root(mem)) > > + goto done; > > + if (consume_stock(mem)) > > + goto done; > > css_get(&mem->css); > > } else { > > - mem = try_get_mem_cgroup_from_mm(mm); > > - if (unlikely(!mem)) > > - return 0; > > - *memcg = mem; > > - } > > + struct task_struct *p; > > > > - VM_BUG_ON(css_is_removed(&mem->css)); > > - if (mem_cgroup_is_root(mem)) > > - goto done; > > + rcu_read_lock(); > > + p = rcu_dereference(mm->owner); > > + VM_BUG_ON(!p); > > + /* > > + * while task_lock, this task cannot be disconnected with > > + * the cgroup we see. > > + */ > > + task_lock(p); > > + mem = mem_cgroup_from_task(p); > > + VM_BUG_ON(!mem); > > + if (mem_cgroup_is_root(mem)) { > Shoudn't we do "*memcg = mem" here ? > hmm, how about doing: > > done: > *memcg = mem; > return 0; > > instead of doing "*memcg = mem" in some places ? > Ok, I'll consider about that. 
> > + task_unlock(p); > > + rcu_read_unlock(); > > + goto done; > > + } > > + if (consume_stock(mem)) { > > + *memcg = mem; > > + task_unlock(p); > > + rcu_read_unlock(); > > + goto done; > > + } > > + css_get(&mem->css); > > + task_unlock(p); > > + rcu_read_unlock(); > > + } > > > > do { > > bool oom_check; > > > > - if (consume_stock(mem)) > > - goto done; /* don't need to fill stock */ > > /* If killed, bypass charge */ > > if (fatal_signal_pending(current)) > > goto bypass; > > @@ -1750,10 +1774,13 @@ static int __mem_cgroup_try_charge(struc > > > > switch (ret) { > > case CHARGE_OK: > > + *memcg = mem; > > break; > > case CHARGE_RETRY: /* not in OOM situation but retry */ > > csize = PAGE_SIZE; > > - break; > > + css_put(&mem->css); > > + mem = NULL; > > + goto again; > > case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */ > > goto nomem; > > case CHARGE_NOMEM: /* OOM routine works */ > > @@ -1769,6 +1796,7 @@ static int __mem_cgroup_try_charge(struc > > > > if (csize > PAGE_SIZE) > > refill_stock(mem, csize - PAGE_SIZE); > > + css_put(&mem->css); > > done: > > return 0; > > nomem: > > @@ -1795,7 +1823,6 @@ static void __mem_cgroup_cancel_charge(s > > res_counter_uncharge(&mem->memsw, PAGE_SIZE * count); > > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); > > WARN_ON_ONCE(count > INT_MAX); > > - __css_put(&mem->css, (int)count); > > } > > /* we don't need css_put for root */ > > } > These VM_BUG_ON() and WARN_ON_ONCE() will be unnecessary, too. > ok. > > @@ -2158,7 +2185,6 @@ int mem_cgroup_try_charge_swapin(struct > > goto charge_cur_mm; > > *ptr = mem; > > ret = __mem_cgroup_try_charge(NULL, mask, ptr, true); > > - /* drop extra refcnt from tryget */ > > css_put(&mem->css); > > return ret; > > charge_cur_mm: > > @@ -2345,9 +2371,6 @@ __mem_cgroup_uncharge_common(struct page > > unlock_page_cgroup(pc); > > > > memcg_check_events(mem, page); > > - /* at swapout, this memcg will be accessed to record to swap */ > > - if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT) > > - css_put(&mem->css); > > > > return mem; > > > > @@ -2432,14 +2455,18 @@ mem_cgroup_uncharge_swapcache(struct pag > > if (!swapout) /* this was a swap cache but the swap is unused ! */ > > ctype = MEM_CGROUP_CHARGE_TYPE_DROP; > > > > - memcg = __mem_cgroup_uncharge_common(page, ctype); > > + memcg = try_get_mem_cgroup_from_page(page); > > + if (!memcg) > > + return; > > + > > + __mem_cgroup_uncharge_common(page, ctype); > > > > /* record memcg information */ > > if (do_swap_account && swapout && memcg) { > > swap_cgroup_record(ent, css_id(&memcg->css)); > > mem_cgroup_get(memcg); > > } > > - if (swapout && memcg) > > + if (memcg) > > css_put(&memcg->css); > > } > > #endif > "if (memcg)" is unnecessary(it's checked above). > Sure. > > @@ -4219,7 +4246,6 @@ static int mem_cgroup_do_precharge(unsig > > mc.precharge += count; > > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); > > WARN_ON_ONCE(count > INT_MAX); > > - __css_get(&mem->css, (int)count); > > return ret; > > } > > one_by_one: > > > ditto. > ok. > IIUC this patch, we should remove css_put() in mem_cgroup_move_swap_account() > and __css_put() in mem_cgroup_clear_mc() too, and modify some comments. > Anyway, we must test these changes carefully. > will do in the next version. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages 2010-06-09 0:54 ` KAMEZAWA Hiroyuki @ 2010-06-09 2:05 ` Daisuke Nishimura 0 siblings, 0 replies; 15+ messages in thread From: Daisuke Nishimura @ 2010-06-09 2:05 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, Daisuke Nishimura On Wed, 9 Jun 2010 09:54:48 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Tue, 8 Jun 2010 16:31:29 +0900 > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > > > On Tue, 8 Jun 2010 12:19:01 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: (snip) > > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > > > on large smp will not scale well. > > I'm sorry if I'm asking a stupid question, the number of css_get/put > > would be: > > > > before: > > get:1 in charge > > put:1 in uncharge > > after: > > get:1, put:1 in charge > > no get/put in uncharge > > > > right ? > > No. > > before: get 1 in charge. > put 1 at charge > > after: > no get at charge in fast path (cunsume_stcok hits.) > get 1 at accssing res_counter and reclaim, put 1 after it. > no get/put in uncharge. > > > Then, isn't there any change as a whole ? > > > We get much benefit when consume_stock() works. > Ah, I missed comsume_stock(). The number of get/put would be decreased very much. Thank you for your explanation. Thanks, Daisuke Nishimura. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC][PATCH] memcg remove css_get/put per pages v2 2010-06-08 3:19 [RFC][PATCH] memcg remove css_get/put per pages KAMEZAWA Hiroyuki 2010-06-08 5:40 ` Balbir Singh 2010-06-08 7:31 ` Daisuke Nishimura @ 2010-06-09 6:59 ` KAMEZAWA Hiroyuki 2010-06-10 2:34 ` Daisuke Nishimura ` (2 more replies) 2 siblings, 3 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-06-09 6:59 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, nishimura, linux-kernel Still RFC, added lkml to CC: list. == From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Now, memory cgroup increments css(cgroup subsys state)'s reference count per a charged page. And the reference count is kept until the page is uncharged. But this has 2 bad effect. 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them on large smp will not scale well. 2. Because css's refcnt cannot be in a state as "ready-to-release", cgroup's notify_on_release handler can't work with memcg. This is a trial to remove css's refcnt per a page. Even if we remove refcnt, pre_destroy() does enough synchronization. After this patch, it seems css_get() is still called in try_charge(). But the logic is. 1. task_lock(mm->owner) 2. If a memcg of mm->owner is cached one, consume_stock() will work. If it returns true, task_unlock and return immediately. 3. If a memcg doesn't hit cached one, css_get() and access res_counter and do charge. So, in the fast path, we don't call css_get() and can avoid access to shared counter. The point is that while there are used resource, memcg can't be freed. So, if it's cached, we don't have to take css_get(). Brief test result on mmotm-2.6.32-Jul-06 + Nishimura-san's cleanup. test with multi-threaded page fault program on 8cpu system. (Each threads touch 2M of pages and release them by madvice().) 
[Before] 39522457 page-faults 549387860 cache-misses (cache-miss/page-fault 13.90) 60.003715887 seconds time elapsed 25.75% multi-fault-all [kernel.kallsyms] [k] clear_page_c 8.59% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave 8.22% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm 8.01% multi-fault-all [kernel.kallsyms] [k] down_read_trylock 5.41% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask 5.41% multi-fault-all [kernel.kallsyms] [k] __css_put 4.48% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq 4.41% multi-fault-all [kernel.kallsyms] [k] up_read 3.60% multi-fault-all [kernel.kallsyms] [k] css_put 1.93% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault 1.91% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock 1.89% multi-fault-all [kernel.kallsyms] [k] _cond_resched 1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue 1.69% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list 1.63% multi-fault-all [kernel.kallsyms] [k] page_fault 1.45% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge 1.34% multi-fault-all [kernel.kallsyms] [k] find_vma [After] 43253862 page-faults 505750203 cache-misses (cache-miss/page-fault 11.69) 60.004123555 seconds time elapsed 27.98% multi-fault-all [kernel.kallsyms] [k] clear_page_c 9.89% multi-fault-all [kernel.kallsyms] [k] down_read_trylock 9.88% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq 9.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave 5.91% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask 5.69% multi-fault-all [kernel.kallsyms] [k] up_read 2.94% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock 2.70% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault 2.66% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list 2.25% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge 1.90% multi-fault-all [kernel.kallsyms] [k] __rmqueue 1.74% multi-fault-all [kernel.kallsyms] [k] page_fault 1.52% multi-fault-all [kernel.kallsyms] [k] find_vma Then, overhead is reduced. Note: Because this is an extreme, not-realistic, test, I'm not sure how this change will be benefits for usual applications. Changelog 20100609: - clean up try_charge(). - fixed mem_cgroup_clear_mc - removed unnecessary warnings. - removed task_lock, added more comments. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 96 ++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 63 insertions(+), 33 deletions(-) Index: mmotm-2.6.34-Jun6/mm/memcontrol.c =================================================================== --- mmotm-2.6.34-Jun6.orig/mm/memcontrol.c +++ mmotm-2.6.34-Jun6/mm/memcontrol.c @@ -1717,28 +1717,61 @@ static int __mem_cgroup_try_charge(struc * thread group leader migrates. It's possible that mm is not * set, if so charge the init_mm (happens for pagecache usage). */ - if (*memcg) { + if (!*memcg && !mm) + goto bypass; +again: + if (*memcg) { /* css should be a valid one */ mem = *memcg; + VM_BUG_ON(css_is_removed(&mem->css)); + if (mem_cgroup_is_root(mem)) + goto done; + if (consume_stock(mem)) + goto done; css_get(&mem->css); } else { - mem = try_get_mem_cgroup_from_mm(mm); - if (unlikely(!mem)) - return 0; - *memcg = mem; - } + struct task_struct *p; - VM_BUG_ON(css_is_removed(&mem->css)); - if (mem_cgroup_is_root(mem)) - goto done; + rcu_read_lock(); + p = rcu_dereference(mm->owner); + VM_BUG_ON(!p); + /* + * It seems there are some races when 'p' exits. 
But, at exit(), + * p->cgroups will be reset to init_css_set. So, we'll just + * find the root cgroup even if 'p' gets to be obsolete. + */ + mem = mem_cgroup_from_task(p); + VM_BUG_ON(!mem); + if (mem_cgroup_is_root(mem)) { + rcu_read_unlock(); + goto done; + } + if (consume_stock(mem)) { + /* + * It seems dagerous to access memcg without css_get(). + * But considering how consume_stok works, it's not + * necessary. If consume_stock success, some charges + * from this memcg are cached on this cpu. So, we + * don't need to call css_get()/css_tryget() before + * calling consume_stock(). + */ + rcu_read_unlock(); + goto done; + } + if (!css_tryget(&mem->css)) { + rcu_read_unlock(); + goto again; + } + rcu_read_unlock(); + } do { bool oom_check; - if (consume_stock(mem)) - goto done; /* don't need to fill stock */ /* If killed, bypass charge */ - if (fatal_signal_pending(current)) + if (fatal_signal_pending(current)) { + css_put(&mem->css); goto bypass; + } oom_check = false; if (oom && !nr_oom_retries) { @@ -1753,30 +1786,36 @@ static int __mem_cgroup_try_charge(struc break; case CHARGE_RETRY: /* not in OOM situation but retry */ csize = PAGE_SIZE; - break; + css_put(&mem->css); + mem = NULL; + goto again; case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */ + css_put(&mem->css); goto nomem; case CHARGE_NOMEM: /* OOM routine works */ - if (!oom) + if (!oom) { + css_put(&mem->css); goto nomem; + } /* If oom, we never return -ENOMEM */ nr_oom_retries--; break; case CHARGE_OOM_DIE: /* Killed by OOM Killer */ + css_put(&mem->css); goto bypass; } } while (ret != CHARGE_OK); if (csize > PAGE_SIZE) refill_stock(mem, csize - PAGE_SIZE); + css_put(&mem->css); done: + *memcg = mem; return 0; nomem: - css_put(&mem->css); + *memcg = NULL; return -ENOMEM; bypass: - if (mem) - css_put(&mem->css); *memcg = NULL; return 0; } @@ -1793,11 +1832,7 @@ static void __mem_cgroup_cancel_charge(s res_counter_uncharge(&mem->res, PAGE_SIZE * count); if (do_swap_account) res_counter_uncharge(&mem->memsw, PAGE_SIZE * count); - VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); - WARN_ON_ONCE(count > INT_MAX); - __css_put(&mem->css, (int)count); } - /* we don't need css_put for root */ } static void mem_cgroup_cancel_charge(struct mem_cgroup *mem) @@ -2158,7 +2193,6 @@ int mem_cgroup_try_charge_swapin(struct goto charge_cur_mm; *ptr = mem; ret = __mem_cgroup_try_charge(NULL, mask, ptr, true); - /* drop extra refcnt from tryget */ css_put(&mem->css); return ret; charge_cur_mm: @@ -2345,9 +2379,6 @@ __mem_cgroup_uncharge_common(struct page unlock_page_cgroup(pc); memcg_check_events(mem, page); - /* at swapout, this memcg will be accessed to record to swap */ - if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT) - css_put(&mem->css); return mem; @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag if (!swapout) /* this was a swap cache but the swap is unused ! 
*/ ctype = MEM_CGROUP_CHARGE_TYPE_DROP; - memcg = __mem_cgroup_uncharge_common(page, ctype); + memcg = try_get_mem_cgroup_from_page(page); + if (!memcg) + return; + + __mem_cgroup_uncharge_common(page, ctype); /* record memcg information */ - if (do_swap_account && swapout && memcg) { + if (do_swap_account && swapout) { swap_cgroup_record(ent, css_id(&memcg->css)); mem_cgroup_get(memcg); } - if (swapout && memcg) - css_put(&memcg->css); + css_put(&memcg->css); } #endif @@ -2518,7 +2552,6 @@ static int mem_cgroup_move_swap_account( */ if (!mem_cgroup_is_root(to)) res_counter_uncharge(&to->res, PAGE_SIZE); - css_put(&to->css); } return 0; } @@ -4219,7 +4252,6 @@ static int mem_cgroup_do_precharge(unsig mc.precharge += count; VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); WARN_ON_ONCE(count > INT_MAX); - __css_get(&mem->css, (int)count); return ret; } one_by_one: @@ -4469,8 +4501,6 @@ static void mem_cgroup_clear_mc(void) */ res_counter_uncharge(&mc.to->res, PAGE_SIZE * mc.moved_swap); - VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags)); - __css_put(&mc.to->css, mc.moved_swap); } /* we've already done mem_cgroup_get(mc.to) */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2 2010-06-09 6:59 ` [RFC][PATCH] memcg remove css_get/put per pages v2 KAMEZAWA Hiroyuki @ 2010-06-10 2:34 ` Daisuke Nishimura 2010-06-10 2:49 ` KAMEZAWA Hiroyuki 2010-06-11 4:37 ` Daisuke Nishimura 2010-06-11 6:11 ` Balbir Singh 2 siblings, 1 reply; 15+ messages in thread From: Daisuke Nishimura @ 2010-06-10 2:34 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, linux-kernel, Daisuke Nishimura I can't find any trivial bugs from my review at the moment. I'll do some tests. Some minor commens. On Wed, 9 Jun 2010 15:59:40 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > Still RFC, added lkml to CC: list. > == > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > Now, memory cgroup increments css(cgroup subsys state)'s reference > count per a charged page. And the reference count is kept until > the page is uncharged. But this has 2 bad effect. > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > on large smp will not scale well. > 2. Because css's refcnt cannot be in a state as "ready-to-release", > cgroup's notify_on_release handler can't work with memcg. > > This is a trial to remove css's refcnt per a page. Even if we remove > refcnt, pre_destroy() does enough synchronization. > > After this patch, it seems css_get() is still called in try_charge(). > But the logic is. > > 1. task_lock(mm->owner) There is no task_lock() in this version :) (snip) > @@ -4219,7 +4252,6 @@ static int mem_cgroup_do_precharge(unsig > mc.precharge += count; > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); > WARN_ON_ONCE(count > INT_MAX); > - __css_get(&mem->css, (int)count); > return ret; > } > one_by_one: You can remove VM_BUG_ON() and WARN_ON_ONCE() here, too. > @@ -4469,8 +4501,6 @@ static void mem_cgroup_clear_mc(void) > */ > res_counter_uncharge(&mc.to->res, > PAGE_SIZE * mc.moved_swap); > - VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags)); > - __css_put(&mc.to->css, mc.moved_swap); > } > /* we've already done mem_cgroup_get(mc.to) */ > > And, you can remove "WARN_ON_ONCE(mc.moved_swap > INT_MAX)" at the beginning of this block, too. Thanks, Daisuke Nishimura. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2 2010-06-10 2:34 ` Daisuke Nishimura @ 2010-06-10 2:49 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 15+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-06-10 2:49 UTC (permalink / raw) To: Daisuke Nishimura; +Cc: linux-mm, balbir, linux-kernel On Thu, 10 Jun 2010 11:34:24 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > I can't find any trivial bugs from my review at the moment. > I'll do some tests. > Thank you for review. > Some minor commens. > > On Wed, 9 Jun 2010 15:59:40 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > Still RFC, added lkml to CC: list. > > == > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > > Now, memory cgroup increments css(cgroup subsys state)'s reference > > count per a charged page. And the reference count is kept until > > the page is uncharged. But this has 2 bad effect. > > > > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them > > on large smp will not scale well. > > 2. Because css's refcnt cannot be in a state as "ready-to-release", > > cgroup's notify_on_release handler can't work with memcg. > > > > This is a trial to remove css's refcnt per a page. Even if we remove > > refcnt, pre_destroy() does enough synchronization. > > > > After this patch, it seems css_get() is still called in try_charge(). > > But the logic is. > > > > 1. task_lock(mm->owner) > There is no task_lock() in this version :) > yes. > (snip) > > @@ -4219,7 +4252,6 @@ static int mem_cgroup_do_precharge(unsig > > mc.precharge += count; > > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags)); > > WARN_ON_ONCE(count > INT_MAX); > > - __css_get(&mem->css, (int)count); > > return ret; > > } > > one_by_one: > You can remove VM_BUG_ON() and WARN_ON_ONCE() here, too. > ok. > > @@ -4469,8 +4501,6 @@ static void mem_cgroup_clear_mc(void) > > */ > > res_counter_uncharge(&mc.to->res, > > PAGE_SIZE * mc.moved_swap); > > - VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags)); > > - __css_put(&mc.to->css, mc.moved_swap); > > } > > /* we've already done mem_cgroup_get(mc.to) */ > > > > > And, you can remove "WARN_ON_ONCE(mc.moved_swap > INT_MAX)" at the beginning > of this block, too. > Hmm. ok. Ah...them, this patch fixes the limitation by "css's refcnt is int" problem. sounds nice. Thanks, -Kmae -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2 2010-06-09 6:59 ` [RFC][PATCH] memcg remove css_get/put per pages v2 KAMEZAWA Hiroyuki 2010-06-10 2:34 ` Daisuke Nishimura @ 2010-06-11 4:37 ` Daisuke Nishimura 2010-06-11 4:52 ` KAMEZAWA Hiroyuki 2010-06-11 6:11 ` Balbir Singh 2 siblings, 1 reply; 15+ messages in thread From: Daisuke Nishimura @ 2010-06-11 4:37 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, linux-kernel, Daisuke Nishimura > @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag > if (!swapout) /* this was a swap cache but the swap is unused ! */ > ctype = MEM_CGROUP_CHARGE_TYPE_DROP; > > - memcg = __mem_cgroup_uncharge_common(page, ctype); > + memcg = try_get_mem_cgroup_from_page(page); > + if (!memcg) > + return; > + > + __mem_cgroup_uncharge_common(page, ctype); > > /* record memcg information */ > - if (do_swap_account && swapout && memcg) { > + if (do_swap_account && swapout) { > swap_cgroup_record(ent, css_id(&memcg->css)); > mem_cgroup_get(memcg); > } > - if (swapout && memcg) > - css_put(&memcg->css); > + css_put(&memcg->css); > } > #endif > hmm, this change seems to cause a problem. I can see under flow of mem->memsw and "swap" field in memory.stat. I think doing swap_cgroup_record() against mem_cgroup which is not returned by __mem_cgroup_uncharge_common() is a bad behavior. How about doing like this ? We can safely access mem_cgroup while it has memory.usage, iow, before we call res_counter_uncharge(). After this change, it seems to work well. --- mm/memcontrol.c | 22 +++++++++------------- 1 files changed, 9 insertions(+), 13 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6e7c1c9..2fae26f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2362,10 +2362,6 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) break; } - if (!mem_cgroup_is_root(mem)) - __do_uncharge(mem, ctype); - if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) - mem_cgroup_swap_statistics(mem, true); mem_cgroup_charge_statistics(mem, pc, false); ClearPageCgroupUsed(pc); @@ -2379,6 +2375,12 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype) unlock_page_cgroup(pc); memcg_check_events(mem, page); + if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) { + mem_cgroup_swap_statistics(mem, true); + mem_cgroup_get(mem); + } + if (!mem_cgroup_is_root(mem)) + __do_uncharge(mem, ctype); return mem; @@ -2463,18 +2465,12 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout) if (!swapout) /* this was a swap cache but the swap is unused ! */ ctype = MEM_CGROUP_CHARGE_TYPE_DROP; - memcg = try_get_mem_cgroup_from_page(page); - if (!memcg) - return; - - __mem_cgroup_uncharge_common(page, ctype); + memcg = __mem_cgroup_uncharge_common(page, ctype); /* record memcg information */ - if (do_swap_account && swapout) { + if (do_swap_account && swapout && memcg) + /* We've already done mem_cgroup_get() in uncharge_common(). */ swap_cgroup_record(ent, css_id(&memcg->css)); - mem_cgroup_get(memcg); - } - css_put(&memcg->css); } #endif -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2
2010-06-11 4:37 ` Daisuke Nishimura
@ 2010-06-11 4:52 ` KAMEZAWA Hiroyuki
2010-06-11 4:59 ` Daisuke Nishimura
0 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-11 4:52 UTC (permalink / raw)
To: Daisuke Nishimura; +Cc: linux-mm, balbir, linux-kernel

On Fri, 11 Jun 2010 13:37:44 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> > @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag
> >  	if (!swapout) /* this was a swap cache but the swap is unused ! */
> >  		ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
> >
> > -	memcg = __mem_cgroup_uncharge_common(page, ctype);
> > +	memcg = try_get_mem_cgroup_from_page(page);
> > +	if (!memcg)
> > +		return;
> > +
> > +	__mem_cgroup_uncharge_common(page, ctype);
> >
> >  	/* record memcg information */
> > -	if (do_swap_account && swapout && memcg) {
> > +	if (do_swap_account && swapout) {
> >  		swap_cgroup_record(ent, css_id(&memcg->css));
> >  		mem_cgroup_get(memcg);
> >  	}
> > -	if (swapout && memcg)
> > -		css_put(&memcg->css);
> > +	css_put(&memcg->css);
> >  }
> >  #endif
> >
> Hmm, this change seems to cause a problem: I can see an underflow of
> mem->memsw and of the "swap" field in memory.stat.
>
> I think doing swap_cgroup_record() against a mem_cgroup that was not
> returned by __mem_cgroup_uncharge_common() is bad behavior.
>
> How about doing it like this? We can safely access the mem_cgroup while
> it still holds memory.usage, in other words before we call
> res_counter_uncharge(). After this change, it seems to work well.
>
Thank you! It seems to work. I'll merge your change.
Can I add your Signed-off-by?

Thanks,
-Kame

> ---
>  mm/memcontrol.c |   22 +++++++++-------------
>  1 files changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6e7c1c9..2fae26f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2362,10 +2362,6 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  		break;
>  	}
>
> -	if (!mem_cgroup_is_root(mem))
> -		__do_uncharge(mem, ctype);
> -	if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> -		mem_cgroup_swap_statistics(mem, true);
>  	mem_cgroup_charge_statistics(mem, pc, false);
>
>  	ClearPageCgroupUsed(pc);
> @@ -2379,6 +2375,12 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	unlock_page_cgroup(pc);
>
>  	memcg_check_events(mem, page);
> +	if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) {
> +		mem_cgroup_swap_statistics(mem, true);
> +		mem_cgroup_get(mem);
> +	}
> +	if (!mem_cgroup_is_root(mem))
> +		__do_uncharge(mem, ctype);
>
>  	return mem;
>
> @@ -2463,18 +2465,12 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
>  	if (!swapout) /* this was a swap cache but the swap is unused ! */
>  		ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
>
> -	memcg = try_get_mem_cgroup_from_page(page);
> -	if (!memcg)
> -		return;
> -
> -	__mem_cgroup_uncharge_common(page, ctype);
> +	memcg = __mem_cgroup_uncharge_common(page, ctype);
>
>  	/* record memcg information */
> -	if (do_swap_account && swapout) {
> +	if (do_swap_account && swapout && memcg)
> +		/* We've already done mem_cgroup_get() in uncharge_common(). */
>  		swap_cgroup_record(ent, css_id(&memcg->css));
> -		mem_cgroup_get(memcg);
> -	}
> -	css_put(&memcg->css);
>  }
>  #endif
>
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2
2010-06-11 4:52 ` KAMEZAWA Hiroyuki
@ 2010-06-11 4:59 ` Daisuke Nishimura
0 siblings, 0 replies; 15+ messages in thread
From: Daisuke Nishimura @ 2010-06-11 4:59 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, balbir, linux-kernel, Daisuke Nishimura

On Fri, 11 Jun 2010 13:52:02 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 11 Jun 2010 13:37:44 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > > @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag
> > >  	if (!swapout) /* this was a swap cache but the swap is unused ! */
> > >  		ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
> > >
> > > -	memcg = __mem_cgroup_uncharge_common(page, ctype);
> > > +	memcg = try_get_mem_cgroup_from_page(page);
> > > +	if (!memcg)
> > > +		return;
> > > +
> > > +	__mem_cgroup_uncharge_common(page, ctype);
> > >
> > >  	/* record memcg information */
> > > -	if (do_swap_account && swapout && memcg) {
> > > +	if (do_swap_account && swapout) {
> > >  		swap_cgroup_record(ent, css_id(&memcg->css));
> > >  		mem_cgroup_get(memcg);
> > >  	}
> > > -	if (swapout && memcg)
> > > -		css_put(&memcg->css);
> > > +	css_put(&memcg->css);
> > >  }
> > >  #endif
> > >
> > Hmm, this change seems to cause a problem: I can see an underflow of
> > mem->memsw and of the "swap" field in memory.stat.
> >
> > I think doing swap_cgroup_record() against a mem_cgroup that was not
> > returned by __mem_cgroup_uncharge_common() is bad behavior.
> >
> > How about doing it like this? We can safely access the mem_cgroup while
> > it still holds memory.usage, in other words before we call
> > res_counter_uncharge(). After this change, it seems to work well.
> >
> Thank you! It seems to work. I'll merge your change.
> Can I add your Signed-off-by?
>
Sure.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2
2010-06-09 6:59 ` [RFC][PATCH] memcg remove css_get/put per pages v2 KAMEZAWA Hiroyuki
2010-06-10 2:34 ` Daisuke Nishimura
2010-06-11 4:37 ` Daisuke Nishimura
@ 2010-06-11 6:11 ` Balbir Singh
2010-06-11 6:21 ` KAMEZAWA Hiroyuki
2 siblings, 1 reply; 15+ messages in thread
From: Balbir Singh @ 2010-06-11 6:11 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, nishimura, linux-kernel

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-09 15:59:40]:

> +	if (consume_stock(mem)) {
> +		/*
> +		 * It seems dangerous to access the memcg without css_get().
> +		 * But considering how consume_stock works, it's not
> +		 * necessary. If consume_stock succeeds, some charges
> +		 * from this memcg are cached on this cpu. So, we
> +		 * don't need to call css_get()/css_tryget() before
> +		 * calling consume_stock().
> +		 */
> +		rcu_read_unlock();
> +		goto done;
> +	}
> +	if (!css_tryget(&mem->css)) {

If tryget fails, can one assume that this is due to a race and that the mem
is about to be freed?

--
	Three Cheers,
	Balbir
* Re: [RFC][PATCH] memcg remove css_get/put per pages v2
2010-06-11 6:11 ` Balbir Singh
@ 2010-06-11 6:21 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-11 6:21 UTC (permalink / raw)
To: balbir; +Cc: linux-mm, nishimura, linux-kernel

On Fri, 11 Jun 2010 11:41:02 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-06-09 15:59:40]:
>
> > +	if (consume_stock(mem)) {
> > +		/*
> > +		 * It seems dangerous to access the memcg without css_get().
> > +		 * But considering how consume_stock works, it's not
> > +		 * necessary. If consume_stock succeeds, some charges
> > +		 * from this memcg are cached on this cpu. So, we
> > +		 * don't need to call css_get()/css_tryget() before
> > +		 * calling consume_stock().
> > +		 */
> > +		rcu_read_unlock();
> > +		goto done;
> > +	}
> > +	if (!css_tryget(&mem->css)) {
>
> If tryget fails, can one assume that this is due to a race and that the mem
> is about to be freed?
>
Yes. It's due to a race, and "mem" will no longer be used.
This does the same thing that try_get_mem_cgroup_from_mm() does now.

Thanks,
-Kame
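The comparison KAMEZAWA draws here is with the existing
try_get_mem_cgroup_from_mm() lookup, which already treats a css_tryget()
failure as "this memcg is going away, look it up again". The sketch below is
written from memory of the memcg code of that era, so treat its exact shape
(the mm->owner dereference, helper names) as an assumption rather than a quote
of the tree being patched:

	static struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
	{
		struct mem_cgroup *mem = NULL;

		if (!mm)
			return NULL;
		rcu_read_lock();
		do {
			mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
			if (unlikely(!mem))
				break;
			/*
			 * A failed css_tryget() means this css is being
			 * destroyed; the owner task may have been moved to
			 * another cgroup, so redo the lookup.
			 */
		} while (!css_tryget(&mem->css));
		rcu_read_unlock();
		return mem;
	}

So a failed css_tryget() in the v2 charge path is handled the same way: the
memcg that lost the race is never dereferenced again, and the lookup is simply
retried.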
end of thread, other threads:[~2010-06-11 7:03 UTC | newest]

Thread overview: 15+ messages
2010-06-08 3:19 [RFC][PATCH] memcg remove css_get/put per pages KAMEZAWA Hiroyuki
2010-06-08 5:40 ` Balbir Singh
2010-06-09 0:47   ` KAMEZAWA Hiroyuki
2010-06-09 5:14     ` KAMEZAWA Hiroyuki
2010-06-08 7:31 ` Daisuke Nishimura
2010-06-09 0:54   ` KAMEZAWA Hiroyuki
2010-06-09 2:05     ` Daisuke Nishimura
2010-06-09 6:59 ` [RFC][PATCH] memcg remove css_get/put per pages v2 KAMEZAWA Hiroyuki
2010-06-10 2:34   ` Daisuke Nishimura
2010-06-10 2:49     ` KAMEZAWA Hiroyuki
2010-06-11 4:37   ` Daisuke Nishimura
2010-06-11 4:52     ` KAMEZAWA Hiroyuki
2010-06-11 4:59       ` Daisuke Nishimura
2010-06-11 6:11   ` Balbir Singh
2010-06-11 6:21     ` KAMEZAWA Hiroyuki