From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Li Zefan <lizf@cn.fujitsu.com>, Paul Menage <menage@google.com>,
linux-mm <linux-mm@kvack.org>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH -mmotm 4/7] memcg: improbe performance in moving charge
Date: Fri, 4 Dec 2009 16:29:18 +0900 [thread overview]
Message-ID: <20091204162918.98aed8c8.nishimura@mxp.nes.nec.co.jp> (raw)
In-Reply-To: <20091204161004.146ae715.kamezawa.hiroyu@jp.fujitsu.com>
On Fri, 4 Dec 2009 16:10:04 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 4 Dec 2009 14:50:49 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > This patch tries to reduce the overhead of moving charge by:
> >
> > - Instead of calling res_counter_uncharge against the old cgroup in
> > __mem_cgroup_move_account every time, call res_counter_uncharge once at the
> > end of task migration.
> > - Instead of calling res_counter_charge (via __mem_cgroup_try_charge) repeatedly,
> > call res_counter_charge(PAGE_SIZE * count) in can_attach() if possible.
> > - Add a new arg (count) to __css_put and make it decrement css->refcnt by
> > "count", not by 1.
> > - Add a new function (__css_get), which takes "count" as an arg and increments
> > css->refcnt by "count".
> > - Instead of calling css_get/css_put repeatedly, call the new
> > __css_get/__css_put if possible.
> > - Remove css_get(&to->css) from __mem_cgroup_move_account (callers should have
> > already called css_get), and also remove the css_put(&to->css) that callers of
> > move_account used to do on success.
> >
> > These changes reduce the overhead of moving the charges of 1GB of anonymous
> > memory from 1.7 sec to 0.6 sec in my test environment.
> >
> > Changelog: 2009/12/04
> > - new patch
> >
> seems nice in general.
>
Thank you.
>
> > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > ---
> > include/linux/cgroup.h | 12 +++-
> > kernel/cgroup.c | 5 +-
> > mm/memcontrol.c | 151 +++++++++++++++++++++++++++++++-----------------
> > 3 files changed, 109 insertions(+), 59 deletions(-)
> >
> > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> > index d4cc200..61f75ae 100644
> > --- a/include/linux/cgroup.h
> > +++ b/include/linux/cgroup.h
> > @@ -75,6 +75,12 @@ enum {
> > CSS_REMOVED, /* This CSS is dead */
> > };
> >
> > +/* Caller must verify that the css is not for root cgroup */
> > +static inline void __css_get(struct cgroup_subsys_state *css, int count)
> > +{
> > + atomic_add(count, &css->refcnt);
> > +}
> > +
> > /*
> > * Call css_get() to hold a reference on the css; it can be used
> > * for a reference obtained via:
> > @@ -86,7 +92,7 @@ static inline void css_get(struct cgroup_subsys_state *css)
> > {
> > /* We don't need to reference count the root state */
> > if (!test_bit(CSS_ROOT, &css->flags))
> > - atomic_inc(&css->refcnt);
> > + __css_get(css, 1);
> > }
> >
> > static inline bool css_is_removed(struct cgroup_subsys_state *css)
> > @@ -117,11 +123,11 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
> > * css_get() or css_tryget()
> > */
> >
> > -extern void __css_put(struct cgroup_subsys_state *css);
> > +extern void __css_put(struct cgroup_subsys_state *css, int count);
> > static inline void css_put(struct cgroup_subsys_state *css)
> > {
> > if (!test_bit(CSS_ROOT, &css->flags))
> > - __css_put(css);
> > + __css_put(css, 1);
> > }
> >
>
> Maybe it's better to split the cgroup part out into separate patches. Li or Paul has to review it.
>
>
I agree.
> > /* bits in struct cgroup flags field */
> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> > index d67d471..44f5924 100644
> > --- a/kernel/cgroup.c
> > +++ b/kernel/cgroup.c
> > @@ -3729,12 +3729,13 @@ static void check_for_release(struct cgroup *cgrp)
> > }
> > }
> >
> > -void __css_put(struct cgroup_subsys_state *css)
> > +/* Caller must verify that the css is not for root cgroup */
> > +void __css_put(struct cgroup_subsys_state *css, int count)
> > {
> > struct cgroup *cgrp = css->cgroup;
> > int val;
> > rcu_read_lock();
> > - val = atomic_dec_return(&css->refcnt);
> > + val = atomic_sub_return(count, &css->refcnt);
> > if (val == 1) {
> > if (notify_on_release(cgrp)) {
> > set_bit(CGRP_RELEASABLE, &cgrp->flags);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index e38f211..769b85a 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -252,6 +252,7 @@ struct move_charge_struct {
> > struct mem_cgroup *from;
> > struct mem_cgroup *to;
> > unsigned long precharge;
> > + unsigned long moved_charge;
> > };
> > static struct move_charge_struct mc;
> >
> > @@ -1532,14 +1533,23 @@ nomem:
> > * This function is for that and do uncharge, put css's refcnt.
> > * gotten by try_charge().
> > */
> > -static void mem_cgroup_cancel_charge(struct mem_cgroup *mem)
> > +static void __mem_cgroup_cancel_charge(struct mem_cgroup *mem,
> > + unsigned long count)
> > {
> > if (!mem_cgroup_is_root(mem)) {
> > - res_counter_uncharge(&mem->res, PAGE_SIZE);
> > + res_counter_uncharge(&mem->res, PAGE_SIZE * count);
> > if (do_swap_account)
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE * count);
> > + VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
> > + WARN_ON_ONCE(count > INT_MAX);
>
> Hmm, is this WARN_ON necessary? ...maybe res_counter_uncharge() will catch
> this anyway.
>
The argument of atomic_add/dec is an "int", so IMHO it is better to check here
(I don't think we will ever hit this in real use).
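For reference, the atomic API this ends up in is roughly (signatures as I
remember them from the generic/arch headers):

	void atomic_add(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

so "count" has to fit in an int before the cast, and the WARN_ON_ONCE is a
cheap way to notice if that ever stops being true.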
> > + __css_put(&mem->css, (int)count);
> > }
> > - css_put(&mem->css);
> > + /* we don't need css_put for root */
> > +}
> > +
> > +static void mem_cgroup_cancel_charge(struct mem_cgroup *mem)
> > +{
> > + __mem_cgroup_cancel_charge(mem, 1);
> > }
> >
> > /*
> > @@ -1645,17 +1655,20 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> > * @pc: page_cgroup of the page.
> > * @from: mem_cgroup which the page is moved from.
> > * @to: mem_cgroup which the page is moved to. @from != @to.
> > + * @uncharge: whether we should call uncharge and css_put against @from.
> > *
> > * The caller must confirm following.
> > * - page is not on LRU (isolate_page() is useful.)
> > * - the pc is locked, used, and ->mem_cgroup points to @from.
> > *
> > - * This function does "uncharge" from old cgroup but doesn't do "charge" to
> > - * new cgroup. It should be done by a caller.
> > + * This function doesn't do "charge" nor css_get to new cgroup. It should be
> > + * done by a caller(__mem_cgroup_try_charge would be usefull). If @uncharge is
> > + * true, this function does "uncharge" from old cgroup, but it doesn't if
> > + * @uncharge is false, so a caller should do "uncharge".
> > */
> >
> > static void __mem_cgroup_move_account(struct page_cgroup *pc,
> > - struct mem_cgroup *from, struct mem_cgroup *to)
> > + struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
> > {
> > struct page *page;
> > int cpu;
> > @@ -1668,10 +1681,6 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
> > VM_BUG_ON(!PageCgroupUsed(pc));
> > VM_BUG_ON(pc->mem_cgroup != from);
> >
> > - if (!mem_cgroup_is_root(from))
> > - res_counter_uncharge(&from->res, PAGE_SIZE);
> > - mem_cgroup_charge_statistics(from, pc, false);
> > -
> > page = pc->page;
> > if (page_mapped(page) && !PageAnon(page)) {
> > cpu = smp_processor_id();
> > @@ -1687,12 +1696,12 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
> > __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED,
> > 1);
> > }
> > + mem_cgroup_charge_statistics(from, pc, false);
> > + if (uncharge)
> > + /* This is not "cancel", but cancel_charge does all we need. */
> > + mem_cgroup_cancel_charge(from);
> >
> > - if (do_swap_account && !mem_cgroup_is_root(from))
> > - res_counter_uncharge(&from->memsw, PAGE_SIZE);
> > - css_put(&from->css);
> > -
> > - css_get(&to->css);
> > + /* caller should have done css_get */
> > pc->mem_cgroup = to;
> > mem_cgroup_charge_statistics(to, pc, true);
> > /*
> > @@ -1709,12 +1718,12 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
> > * __mem_cgroup_move_account()
> > */
> > static int mem_cgroup_move_account(struct page_cgroup *pc,
> > - struct mem_cgroup *from, struct mem_cgroup *to)
> > + struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
> > {
> > int ret = -EINVAL;
> > lock_page_cgroup(pc);
> > if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
> > - __mem_cgroup_move_account(pc, from, to);
> > + __mem_cgroup_move_account(pc, from, to, uncharge);
> > ret = 0;
> > }
> > unlock_page_cgroup(pc);
> > @@ -1750,11 +1759,9 @@ static int mem_cgroup_move_parent(struct page_cgroup *pc,
> > if (ret || !parent)
> > goto put_back;
> >
> > - ret = mem_cgroup_move_account(pc, child, parent);
> > - if (!ret)
> > - css_put(&parent->css); /* drop extra refcnt by try_charge() */
> > - else
> > - mem_cgroup_cancel_charge(parent); /* does css_put */
> > + ret = mem_cgroup_move_account(pc, child, parent, true);
> > + if (ret)
> > + mem_cgroup_cancel_charge(parent);
> > put_back:
> > putback_lru_page(page);
> > put:
> > @@ -3441,16 +3448,57 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
> > }
> >
> > /* Handlers for move charge at task migration. */
> > -static int mem_cgroup_do_precharge(void)
> > +#define PRECHARGE_COUNT_AT_ONCE 256
> > +static int mem_cgroup_do_precharge(unsigned long count)
> > {
> > - int ret = -ENOMEM;
> > + int ret = 0;
> > + int batch_count = PRECHARGE_COUNT_AT_ONCE;
> > struct mem_cgroup *mem = mc.to;
> >
> > - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, NULL);
> > - if (ret || !mem)
> > - return -ENOMEM;
> > -
> > - mc.precharge++;
> > + if (mem_cgroup_is_root(mem)) {
> > + mc.precharge += count;
> > + /* we don't need css_get for root */
> > + return ret;
> > + }
> > + /* try to charge at once */
> > + if (count > 1) {
> > + struct res_counter *dummy;
> > + /*
> > + * "mem" cannot be under rmdir() because we've already checked
> > + * by cgroup_lock_live_cgroup() that it is not removed and we
> > + * are still under the same cgroup_mutex. So we can postpone
> > + * css_get().
> > + */
> > + if (res_counter_charge(&mem->res, PAGE_SIZE * count, &dummy))
> > + goto one_by_one;
>
> if (do_swap_account) here.
>
Ah, yes, you're right.
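Concretely, I plan to change the batch-charge path like this in the next
version (just a sketch of the change, based on the hunk quoted here):

	if (res_counter_charge(&mem->res, PAGE_SIZE * count, &dummy))
		goto one_by_one;
	if (do_swap_account &&
	    res_counter_charge(&mem->memsw, PAGE_SIZE * count, &dummy)) {
		/* undo the res charge and fall back to one-by-one */
		res_counter_uncharge(&mem->res, PAGE_SIZE * count);
		goto one_by_one;
	}
	mc.precharge += count;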
> > + if (res_counter_charge(&mem->memsw,
> > + PAGE_SIZE * count, &dummy)) {
> > + res_counter_uncharge(&mem->res, PAGE_SIZE * count);
> > + goto one_by_one;
> > + }
> > + mc.precharge += count;
> > + VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
> > + WARN_ON_ONCE(count > INT_MAX);
> > + __css_get(&mem->css, (int)count);
> > + return ret;
> > + }
> > +one_by_one:
> > + /* fall back to one by one charge */
> > + while (!ret && count--) {
>
> !ret check seems unnecessary.
Yes, I will remove it.
> > + if (signal_pending(current)) {
> > + ret = -EINTR;
> > + break;
> > + }
> > + if (!batch_count--) {
> > + batch_count = PRECHARGE_COUNT_AT_ONCE;
> > + cond_resched();
> > + }
> > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem,
> > + false, NULL);
> > + if (ret || !mem)
> > + return -ENOMEM;
>
> returning without uncharge here ?
>
If we return here with -ENOMEM, mem_cgroup_can_attach() calls mem_cgroup_clear_mc(),
which does the uncharge, so returning without an uncharge here is safe. I will add
a comment to make that clear.
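The one-by-one path would then look something like this (a sketch for the next
version; the "!ret" check is dropped as you pointed out above, and the
"mc.precharge++"/"return ret" tail is not visible in the quote, it is just what
I intend):

	/* fall back to one by one charge */
	while (count--) {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}
		if (!batch_count--) {
			batch_count = PRECHARGE_COUNT_AT_ONCE;
			cond_resched();
		}
		ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem,
						false, NULL);
		if (ret || !mem)
			/* mem_cgroup_clear_mc() will do the uncharge later */
			return -ENOMEM;
		mc.precharge++;
	}
	return ret;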
Thank you for your careful review.
Regards,
Daisuke Nishimura.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>