From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [RFC][PATCH 1/4][mmotm] memcg: soft limit clean up
Date: Thu, 10 Sep 2009 09:10:10 +0900	[thread overview]
Message-ID: <20090910091010.e1365df3.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <661de9470909090410t160454a2k658c980b92d11612@mail.gmail.com>

On Wed, 9 Sep 2009 16:40:03 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> On Wed, Sep 9, 2009 at 2:11 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > This patch cleans up and fixes memcg's uncharge soft limit path.
> > It is also a preparation for batched uncharge.
> >
> > Problems:
> >  Now, res_counter_charge()/uncharge() handles softlimit information at
> >  charge/uncharge, and the softlimit check is done when the per-memcg event
> >  counter goes over its limit. But the per-memcg event counter is updated
> >  only when the memcg is over its soft limit, and ancestors are handled in
> >  the charge path but not in the uncharge path.
> >  For batched charge/uncharge, the event counter check should be stricter.
> >
> >  Prolems:
> >
> 
> typo, should be Problems
> 
yes..


> 
> >  1. memcg's event counter is incremented only when the softlimit is hit.
> >     That's bad: it makes the event counter hard to reuse for other purposes.
> >
> 
> I don't understand the context; are these existing problems?
> 

"event" counter is useful for other purposes as
  - memory usage threshold notifier or something fancy.

Then, I think "event" of charge/uncharge should be counted at every charge/uncharge.
Now, charge/uncharge event is counted only when soft_fail_res != NULL.
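
For illustration, a rough sketch of the direction I mean. (Note: the helper
names and the event_counter field below are hypothetical, just for this
mail -- not the actual mmotm code.)

/* hypothetical sketch: count an event at *every* charge/uncharge */
static void mem_cgroup_count_event(struct mem_cgroup *mem)
{
        atomic_inc(&mem->event_counter);
}

/*
 * check-and-reset: returns true about once per EVENT_COUNTER_THRESH events.
 * Races here are harmless; a spurious "true" only causes a redundant
 * softlimit-tree update.
 */
static bool mem_cgroup_event_check(struct mem_cgroup *mem)
{
        if (atomic_read(&mem->event_counter) < EVENT_COUNTER_THRESH)
                return false;
        atomic_set(&mem->event_counter, 0);
        return true;
}

Then, charge/uncharge would call mem_cgroup_count_event() unconditionally,
and mem_cgroup_update_tree() would run only when mem_cgroup_event_check()
returns true. A memory usage threshold notifier could hook the same counter.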

> >  2. At uncharge, only the lowest-level res_counter is handled. This is a
> >     bug: because an ancestor's event counter is not incremented, the
> >     children should take care of them.
> >  3. res_counter_uncharge()'s 3rd argument is NULL in most cases.
> >     Ops under res_counter->lock should be small; having no "if" statement
> >     there is better.
> >
> > Fixes:
> >  * Removed soft_limit_xx poitner and checsk from charge and uncharge.
> >
> 
> typo, should be soft_limit_xxx_pointer and check
> 
will fix.

Thank you for review. I'll brush up.

Thanks,
-Kame

> 
> >    The do-check-only-when-necessary scheme works well enough without them.
> >
> >  * Make the event counter of memcg checked at every charge/uncharge.
> >    (The per-cpu area will be accessed soon anyway.)
> >
> >  * All ancestors are checked at the soft-limit check. This is necessary
> >    because an ancestor's event counter may never be modified, so they
> >    should all be checked at the same time.
> >
> > Todo:
> >  We may need to modify EVENT_COUNTER_THRESH of a parent with many children.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > ---
> >  include/linux/res_counter.h |    6 --
> >  kernel/res_counter.c        |   18 ------
> >  mm/memcontrol.c             |  115 +++++++++++++++++++-------------
> >  3 files changed, 55 insertions(+), 84 deletions(-)
> >
> > Index: mmotm-2.6.31-Sep3/kernel/res_counter.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep3.orig/kernel/res_counter.c
> > +++ mmotm-2.6.31-Sep3/kernel/res_counter.c
> > @@ -37,27 +37,17 @@ int res_counter_charge_locked(struct res
> >  }
> >
> >  int res_counter_charge(struct res_counter *counter, unsigned long val,
> > -                       struct res_counter **limit_fail_at,
> > -                       struct res_counter **soft_limit_fail_at)
> > +                       struct res_counter **limit_fail_at)
> >  {
> >        int ret;
> >        unsigned long flags;
> >        struct res_counter *c, *u;
> >
> >        *limit_fail_at = NULL;
> > -       if (soft_limit_fail_at)
> > -               *soft_limit_fail_at = NULL;
> >        local_irq_save(flags);
> >        for (c = counter; c != NULL; c = c->parent) {
> >                spin_lock(&c->lock);
> >                ret = res_counter_charge_locked(c, val);
> > -               /*
> > -                * With soft limits, we return the highest ancestor
> > -                * that exceeds its soft limit
> > -                */
> > -               if (soft_limit_fail_at &&
> > -                       !res_counter_soft_limit_check_locked(c))
> > -                       *soft_limit_fail_at = c;
> >                spin_unlock(&c->lock);
> >                if (ret < 0) {
> >                        *limit_fail_at = c;
> > @@ -85,8 +75,7 @@ void res_counter_uncharge_locked(struct
> >        counter->usage -= val;
> >  }
> >
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > -                               bool *was_soft_limit_excess)
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> >  {
> >        unsigned long flags;
> >        struct res_counter *c;
> > @@ -94,9 +83,6 @@ void res_counter_uncharge(struct res_cou
> >        local_irq_save(flags);
> >        for (c = counter; c != NULL; c = c->parent) {
> >                spin_lock(&c->lock);
> > -               if (was_soft_limit_excess)
> > -                       *was_soft_limit_excess =
> > -                               !res_counter_soft_limit_check_locked(c);
> >                res_counter_uncharge_locked(c, val);
> >                spin_unlock(&c->lock);
> >        }
> > Index: mmotm-2.6.31-Sep3/include/linux/res_counter.h
> > ===================================================================
> > --- mmotm-2.6.31-Sep3.orig/include/linux/res_counter.h
> > +++ mmotm-2.6.31-Sep3/include/linux/res_counter.h
> > @@ -114,8 +114,7 @@ void res_counter_init(struct res_counter
> >  int __must_check res_counter_charge_locked(struct res_counter *counter,
> >                unsigned long val);
> >  int __must_check res_counter_charge(struct res_counter *counter,
> > -               unsigned long val, struct res_counter **limit_fail_at,
> > -               struct res_counter **soft_limit_at);
> > +               unsigned long val, struct res_counter **limit_fail_at);
> >
> >  /*
> >  * uncharge - tell that some portion of the resource is released
> > @@ -128,8 +127,7 @@ int __must_check res_counter_charge(stru
> >  */
> >
> >  void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > -                               bool *was_soft_limit_excess);
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> >
> >  static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> >  {
> > Index: mmotm-2.6.31-Sep3/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep3.orig/mm/memcontrol.c
> > +++ mmotm-2.6.31-Sep3/mm/memcontrol.c
> > @@ -353,16 +353,6 @@ __mem_cgroup_remove_exceeded(struct mem_
> >  }
> >
> >  static void
> > -mem_cgroup_insert_exceeded(struct mem_cgroup *mem,
> > -                               struct mem_cgroup_per_zone *mz,
> > -                               struct mem_cgroup_tree_per_zone *mctz)
> > -{
> > -       spin_lock(&mctz->lock);
> > -       __mem_cgroup_insert_exceeded(mem, mz, mctz);
> > -       spin_unlock(&mctz->lock);
> > -}
> > -
> > -static void
> >  mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
> >                                struct mem_cgroup_per_zone *mz,
> >                                struct mem_cgroup_tree_per_zone *mctz)
> > @@ -392,34 +382,40 @@ static bool mem_cgroup_soft_limit_check(
> >
> >  static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> >  {
> > -       unsigned long long prev_usage_in_excess, new_usage_in_excess;
> > -       bool updated_tree = false;
> > +       unsigned long long new_usage_in_excess;
> >        struct mem_cgroup_per_zone *mz;
> >        struct mem_cgroup_tree_per_zone *mctz;
> > -
> > -       mz = mem_cgroup_zoneinfo(mem, page_to_nid(page), page_zonenum(page));
> > +       int nid = page_to_nid(page);
> > +       int zid = page_zonenum(page);
> >        mctz = soft_limit_tree_from_page(page);
> >
> >        /*
> > -        * We do updates in lazy mode, mem's are removed
> > -        * lazily from the per-zone, per-node rb tree
> > +        * Necessary to update all ancestors when hierarchy is used,
> > +        * because their event counter is not touched.
> >         */
> > -       prev_usage_in_excess = mz->usage_in_excess;
> > -
> > -       new_usage_in_excess = res_counter_soft_limit_excess(&mem->res);
> > -       if (prev_usage_in_excess) {
> > -               mem_cgroup_remove_exceeded(mem, mz, mctz);
> > -               updated_tree = true;
> > -       }
> > -       if (!new_usage_in_excess)
> > -               goto done;
> > -       mem_cgroup_insert_exceeded(mem, mz, mctz);
> > -
> > -done:
> > -       if (updated_tree) {
> > -               spin_lock(&mctz->lock);
> > -               mz->usage_in_excess = new_usage_in_excess;
> > -               spin_unlock(&mctz->lock);
> > +       for (; mem; mem = parent_mem_cgroup(mem)) {
> > +               mz = mem_cgroup_zoneinfo(mem, nid, zid);
> > +               new_usage_in_excess =
> > +                       res_counter_soft_limit_excess(&mem->res);
> > +               /*
> > +                * We have to update the tree if mz is on the RB-tree or
> > +                * mem is over its softlimit.
> > +                */
> > +               if (new_usage_in_excess || mz->on_tree) {
> > +                       spin_lock(&mctz->lock);
> > +                       /* if on-tree, remove it */
> > +                       if (mz->on_tree)
> > +                               __mem_cgroup_remove_exceeded(mem, mz, mctz);
> > +                       /*
> > +                        * if over soft limit, insert again. mz->usage_in_excess
> > +                        * will be updated properly.
> > +                        */
> > +                       if (new_usage_in_excess)
> > +                               __mem_cgroup_insert_exceeded(mem, mz, mctz);
> > +                       else
> > +                               mz->usage_in_excess = 0;
> > +                       spin_unlock(&mctz->lock);
> > +               }
> >        }
> >  }
> >
> > @@ -1270,9 +1266,9 @@ static int __mem_cgroup_try_charge(struc
> >                        gfp_t gfp_mask, struct mem_cgroup **memcg,
> >                        bool oom, struct page *page)
> >  {
> > -       struct mem_cgroup *mem, *mem_over_limit, *mem_over_soft_limit;
> > +       struct mem_cgroup *mem, *mem_over_limit;
> >        int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
> > -       struct res_counter *fail_res, *soft_fail_res = NULL;
> > +       struct res_counter *fail_res;
> >
> >        if (unlikely(test_thread_flag(TIF_MEMDIE))) {
> >                /* Don't account this! */
> > @@ -1304,17 +1300,16 @@ static int __mem_cgroup_try_charge(struc
> >
> >                if (mem_cgroup_is_root(mem))
> >                        goto done;
> > -               ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res,
> > -                                               &soft_fail_res);
> > +               ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
> >                if (likely(!ret)) {
> >                        if (!do_swap_account)
> >                                break;
> >                        ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
> > -                                                       &fail_res, NULL);
> > +                                                       &fail_res);
> >                        if (likely(!ret))
> >                                break;
> >                        /* mem+swap counter fails */
> > -                       res_counter_uncharge(&mem->res, PAGE_SIZE, NULL);
> > +                       res_counter_uncharge(&mem->res, PAGE_SIZE);
> >                        flags |= MEM_CGROUP_RECLAIM_NOSWAP;
> >                        mem_over_limit = mem_cgroup_from_res_counter(fail_res,
> >                                                                        memsw);
> > @@ -1353,16 +1348,11 @@ static int __mem_cgroup_try_charge(struc
> >                }
> >        }
> >        /*
> > -        * Insert just the ancestor, we should trickle down to the correct
> > -        * cgroup for reclaim, since the other nodes will be below their
> > -        * soft limit
> > -        */
> > -       if (soft_fail_res) {
> > -               mem_over_soft_limit =
> > -                       mem_cgroup_from_res_counter(soft_fail_res, res);
> > -               if (mem_cgroup_soft_limit_check(mem_over_soft_limit))
> > -                       mem_cgroup_update_tree(mem_over_soft_limit, page);
> > -       }
> > +        * Insert the ancestor (and the ancestor's ancestors) into the
> > +        * softlimit RB-tree if they exceed their softlimit.
> > +        */
> > +       if (mem_cgroup_soft_limit_check(mem))
> > +               mem_cgroup_update_tree(mem, page);
> >  done:
> >        return 0;
> >  nomem:
> > @@ -1437,10 +1427,9 @@ static void __mem_cgroup_commit_charge(s
> >        if (unlikely(PageCgroupUsed(pc))) {
> >                unlock_page_cgroup(pc);
> >                if (!mem_cgroup_is_root(mem)) {
> > -                       res_counter_uncharge(&mem->res, PAGE_SIZE, NULL);
> > +                       res_counter_uncharge(&mem->res, PAGE_SIZE);
> >                        if (do_swap_account)
> > -                               res_counter_uncharge(&mem->memsw, PAGE_SIZE,
> > -                                                       NULL);
> > +                               res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> >                }
> >                css_put(&mem->css);
> >                return;
> > @@ -1519,7 +1508,7 @@ static int mem_cgroup_move_account(struc
> >                goto out;
> >
> >        if (!mem_cgroup_is_root(from))
> > -               res_counter_uncharge(&from->res, PAGE_SIZE, NULL);
> > +               res_counter_uncharge(&from->res, PAGE_SIZE);
> >        mem_cgroup_charge_statistics(from, pc, false);
> >
> >        page = pc->page;
> > @@ -1539,7 +1528,7 @@ static int mem_cgroup_move_account(struc
> >        }
> >
> >        if (do_swap_account && !mem_cgroup_is_root(from))
> > -               res_counter_uncharge(&from->memsw, PAGE_SIZE, NULL);
> > +               res_counter_uncharge(&from->memsw, PAGE_SIZE);
> >        css_put(&from->css);
> >
> >        css_get(&to->css);
> > @@ -1610,9 +1599,9 @@ uncharge:
> >        css_put(&parent->css);
> >        /* uncharge if move fails */
> >        if (!mem_cgroup_is_root(parent)) {
> > -               res_counter_uncharge(&parent->res, PAGE_SIZE, NULL);
> > +               res_counter_uncharge(&parent->res, PAGE_SIZE);
> >                if (do_swap_account)
> > -                       res_counter_uncharge(&parent->memsw, PAGE_SIZE, NULL);
> > +                       res_counter_uncharge(&parent->memsw, PAGE_SIZE);
> >        }
> >        return ret;
> >  }
> > @@ -1803,8 +1792,7 @@ __mem_cgroup_commit_charge_swapin(struct
> >                         * calling css_tryget
> >                         */
> >                        if (!mem_cgroup_is_root(memcg))
> > -                               res_counter_uncharge(&memcg->memsw, PAGE_SIZE,
> > -                                                       NULL);
> > +                               res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> >                        mem_cgroup_swap_statistics(memcg, false);
> >                        mem_cgroup_put(memcg);
> >                }
> > @@ -1831,9 +1819,9 @@ void mem_cgroup_cancel_charge_swapin(str
> >        if (!mem)
> >                return;
> >        if (!mem_cgroup_is_root(mem)) {
> > -               res_counter_uncharge(&mem->res, PAGE_SIZE, NULL);
> > +               res_counter_uncharge(&mem->res, PAGE_SIZE);
> >                if (do_swap_account)
> > -                       res_counter_uncharge(&mem->memsw, PAGE_SIZE, NULL);
> > +                       res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> >        }
> >        css_put(&mem->css);
> >  }
> > @@ -1848,7 +1836,6 @@ __mem_cgroup_uncharge_common(struct page
> >        struct page_cgroup *pc;
> >        struct mem_cgroup *mem = NULL;
> >        struct mem_cgroup_per_zone *mz;
> > -       bool soft_limit_excess = false;
> >
> >        if (mem_cgroup_disabled())
> >                return NULL;
> > @@ -1888,10 +1875,10 @@ __mem_cgroup_uncharge_common(struct page
> >        }
> >
> >        if (!mem_cgroup_is_root(mem)) {
> > -               res_counter_uncharge(&mem->res, PAGE_SIZE, &soft_limit_excess);
> > +               res_counter_uncharge(&mem->res, PAGE_SIZE);
> >                if (do_swap_account &&
> >                                (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > -                       res_counter_uncharge(&mem->memsw, PAGE_SIZE, NULL);
> > +                       res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> >        }
> >        if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> >                mem_cgroup_swap_statistics(mem, true);
> > @@ -1908,7 +1895,7 @@ __mem_cgroup_uncharge_common(struct page
> >        mz = page_cgroup_zoneinfo(pc);
> >        unlock_page_cgroup(pc);
> >
> > -       if (soft_limit_excess && mem_cgroup_soft_limit_check(mem))
> > +       if (mem_cgroup_soft_limit_check(mem))
> >                mem_cgroup_update_tree(mem, page);
> >        /* at swapout, this memcg will be accessed to record to swap */
> >        if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> > @@ -1986,7 +1973,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
> >                 * This memcg can be obsolete one. We avoid calling
> > css_tryget
> >                 */
> >                if (!mem_cgroup_is_root(memcg))
> > -                       res_counter_uncharge(&memcg->memsw, PAGE_SIZE, NULL);
> > +                       res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> >                mem_cgroup_swap_statistics(memcg, false);
> >                mem_cgroup_put(memcg);
> >        }
> >
> >
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


Thread overview: 24+ messages
2009-09-09  8:39 [RFC][PATCH 0/4][mmotm] memcg: reduce lock contention v3 KAMEZAWA Hiroyuki
2009-09-09  8:41 ` [RFC][PATCH 1/4][mmotm] memcg: soft limit clean up KAMEZAWA Hiroyuki
     [not found]   ` <661de9470909090410t160454a2k658c980b92d11612@mail.gmail.com>
2009-09-10  0:10     ` KAMEZAWA Hiroyuki [this message]
2009-09-09  8:41 ` [RFC][PATCH 2/4][mmotm] clean up charge path of softlimit KAMEZAWA Hiroyuki
2009-09-09  8:44 ` [RFC][PATCH 3/4][mmotm] memcg: batched uncharge KAMEZAWA Hiroyuki
2009-09-09  8:45 ` [RFC][PATCH 4/4][mmotm] memcg: coalescing charge KAMEZAWA Hiroyuki
2009-09-12  4:58   ` Daisuke Nishimura
2009-09-15  0:09     ` KAMEZAWA Hiroyuki
2009-09-09 20:30 ` [RFC][PATCH 0/4][mmotm] memcg: reduce lock contention v3 Balbir Singh
2009-09-10  0:20   ` KAMEZAWA Hiroyuki
2009-09-10  5:18     ` Balbir Singh
2009-09-18  8:47 ` [RFC][PATCH 0/11][mmotm] memcg: patch dump (Sep/18) KAMEZAWA Hiroyuki
2009-09-18  8:50   ` [RFC][PATCH 1/11] memcg: clean up softlimit uncharge KAMEZAWA Hiroyuki
2009-09-18  8:52   ` [RFC][PATCH 2/11]memcg: reduce res_counter_soft_limit_excess KAMEZAWA Hiroyuki
2009-09-18  8:53   ` [RFC][PATCH 3/11] memcg: coalescing uncharge KAMEZAWA Hiroyuki
2009-09-18  8:54   ` [RFC][PATCH 4/11] memcg: coalescing charge KAMEZAWA Hiroyuki
2009-09-18  8:55   ` [RFC][PATCH 5/11] memcg: clean up cancel charge KAMEZAWA Hiroyuki
2009-09-18  8:57   ` [RFC][PATCH 6/11] memcg: cleaun up percpu statistics KAMEZAWA Hiroyuki
2009-09-18  8:58   ` [RFC][PATCH 7/11] memcg: rename from_cont to from_cgroup KAMEZAWA Hiroyuki
2009-09-18  9:00   ` [RFC][PATCH 8/11]memcg: remove unused macro and adds commentary KAMEZAWA Hiroyuki
2009-09-18  9:01   ` [RFC][PATCH 9/11]memcg: clean up zonestat funcs KAMEZAWA Hiroyuki
2009-09-18  9:04   ` [RFC][PATCH 10/11][mmotm] memcg: clean up percpu and more commentary for soft limit KAMEZAWA Hiroyuki
2009-09-18  9:06   ` [RFC][PATCH 11/11][mmotm] memcg: more commentary and clean up KAMEZAWA Hiroyuki
2009-09-18 10:37   ` [RFC][PATCH 0/11][mmotm] memcg: patch dump (Sep/18) Daisuke Nishimura
