From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [RFC][PATCH 1/4][mmotm] memcg: soft limit clean up
Date: Thu, 10 Sep 2009 09:10:10 +0900
Message-ID: <20090910091010.e1365df3.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <661de9470909090410t160454a2k658c980b92d11612@mail.gmail.com>
On Wed, 9 Sep 2009 16:40:03 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> On Wed, Sep 9, 2009 at 2:11 PM, KAMEZAWA Hiroyuki <
> kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > This patch contains cleanups/fixes for memcg's uncharge soft limit path.
> > It is also preparation for batched uncharge.
> >
> > Problems:
> > Now, res_counter_charge()/uncharge() handles softlimit information at
> > charge/uncharge, and the softlimit check is done when the event counter
> > per memcg goes over the limit. But the event counter per memcg is updated
> > only when the memcg is over its soft limit, and ancestors are handled in
> > the charge path but not in the uncharge path.
> > For batched charge/uncharge, the event counter check should be more strict.
> >
> > Prolems:
> >
>
> typo, should be Problems
>
yes..
>
> > 1. memcg's event counter is incremented only when the softlimit hits. That's bad:
> > it makes the event counter hard to reuse for other purposes.
> >
>
> I don't understand the context, are these existing problems?
>
"event" counter is useful for other purposes as
- memory usage threshold notifier or something fancy.
Then, I think "event" of charge/uncharge should be counted at every charge/uncharge.
Now, charge/uncharge event is counted only when soft_fail_res != NULL.
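To illustrate, here is a rough sketch of what I mean (not tested; the
helper name mem_cgroup_event_check() is illustrative, not from this
patch). The per-cpu event counter is bumped unconditionally, and the
caller only checks the threshold:

	/*
	 * Sketch: count every charge/uncharge event on the per-cpu
	 * counter and report whether the threshold was crossed.
	 */
	static bool mem_cgroup_event_check(struct mem_cgroup *mem)
	{
		bool ret = false;
		int cpu = get_cpu();
		struct mem_cgroup_stat_cpu *cpustat = &mem->stat.cpustat[cpu];

		/* counted at every event, not only when over soft limit */
		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_EVENTS, 1);
		if (__mem_cgroup_stat_read_local(cpustat, MEM_CGROUP_STAT_EVENTS)
		    > SOFTLIMIT_EVENTS_THRESH) {
			__mem_cgroup_stat_reset_safe(cpustat,
						MEM_CGROUP_STAT_EVENTS);
			ret = true;
		}
		put_cpu();
		return ret;
	}

Then both the charge and uncharge paths can just do
"if (mem_cgroup_event_check(mem)) mem_cgroup_update_tree(mem, page);"
and nothing depends on soft_fail_res anymore.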
> > 2. At uncharge, only the lowest-level res_counter is handled. This is a bug:
> > because ancestors' event counters are not incremented, children should
> > take care of them.
> > 3. res_counter_uncharge()'s 3rd argument is NULL in most cases.
> > Ops under res_counter->lock should be small, and having no "if"
> > statement there is better.
> >
> > Fixes:
> > * Removed soft_limit_xx poitner and checsk from charge and uncharge.
> >
>
> typo should be soft_limit_xxx_pointer and check
>
will fix.
Thank you for review. I'll brush up.
Thanks,
-Kame
>
> > The do-check-only-when-necessary scheme works well enough without them.
> >
> > * Make the event counter of memcg checked at every charge/uncharge
> > (the per-cpu area will be accessed soon anyway).
> >
> > * All ancestors are checked at the soft-limit check. This is necessary
> > because an ancestor's event counter may never be modified; then, they
> > should be checked at the same time.
> >
> > Todo:
> > We may need to modify EVENT_COUNTER_THRESH for a parent with many children.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > ---
> > include/linux/res_counter.h | 6 --
> > kernel/res_counter.c | 18 ------
> > mm/memcontrol.c | 115 +++++++++++++++++++-------------------------
> > 3 files changed, 55 insertions(+), 84 deletions(-)
> >
> > Index: mmotm-2.6.31-Sep3/kernel/res_counter.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep3.orig/kernel/res_counter.c
> > +++ mmotm-2.6.31-Sep3/kernel/res_counter.c
> > @@ -37,27 +37,17 @@ int res_counter_charge_locked(struct res
> > }
> >
> > int res_counter_charge(struct res_counter *counter, unsigned long val,
> > - struct res_counter **limit_fail_at,
> > - struct res_counter **soft_limit_fail_at)
> > + struct res_counter **limit_fail_at)
> > {
> > int ret;
> > unsigned long flags;
> > struct res_counter *c, *u;
> >
> > *limit_fail_at = NULL;
> > - if (soft_limit_fail_at)
> > - *soft_limit_fail_at = NULL;
> > local_irq_save(flags);
> > for (c = counter; c != NULL; c = c->parent) {
> > spin_lock(&c->lock);
> > ret = res_counter_charge_locked(c, val);
> > - /*
> > - * With soft limits, we return the highest ancestor
> > - * that exceeds its soft limit
> > - */
> > - if (soft_limit_fail_at &&
> > - !res_counter_soft_limit_check_locked(c))
> > - *soft_limit_fail_at = c;
> > spin_unlock(&c->lock);
> > if (ret < 0) {
> > *limit_fail_at = c;
> > @@ -85,8 +75,7 @@ void res_counter_uncharge_locked(struct
> > counter->usage -= val;
> > }
> >
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > - bool *was_soft_limit_excess)
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val)
> > {
> > unsigned long flags;
> > struct res_counter *c;
> > @@ -94,9 +83,6 @@ void res_counter_uncharge(struct res_cou
> > local_irq_save(flags);
> > for (c = counter; c != NULL; c = c->parent) {
> > spin_lock(&c->lock);
> > - if (was_soft_limit_excess)
> > - *was_soft_limit_excess =
> > - !res_counter_soft_limit_check_locked(c);
> > res_counter_uncharge_locked(c, val);
> > spin_unlock(&c->lock);
> > }
> > Index: mmotm-2.6.31-Sep3/include/linux/res_counter.h
> > ===================================================================
> > --- mmotm-2.6.31-Sep3.orig/include/linux/res_counter.h
> > +++ mmotm-2.6.31-Sep3/include/linux/res_counter.h
> > @@ -114,8 +114,7 @@ void res_counter_init(struct res_counter
> > int __must_check res_counter_charge_locked(struct res_counter *counter,
> > unsigned long val);
> > int __must_check res_counter_charge(struct res_counter *counter,
> > - unsigned long val, struct res_counter **limit_fail_at,
> > - struct res_counter **soft_limit_at);
> > + unsigned long val, struct res_counter **limit_fail_at);
> >
> > /*
> > * uncharge - tell that some portion of the resource is released
> > @@ -128,8 +127,7 @@ int __must_check res_counter_charge(stru
> > */
> >
> > void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
> > -void res_counter_uncharge(struct res_counter *counter, unsigned long val,
> > - bool *was_soft_limit_excess);
> > +void res_counter_uncharge(struct res_counter *counter, unsigned long val);
> >
> > static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> > {
> > Index: mmotm-2.6.31-Sep3/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.31-Sep3.orig/mm/memcontrol.c
> > +++ mmotm-2.6.31-Sep3/mm/memcontrol.c
> > @@ -353,16 +353,6 @@ __mem_cgroup_remove_exceeded(struct mem_
> > }
> >
> > static void
> > -mem_cgroup_insert_exceeded(struct mem_cgroup *mem,
> > - struct mem_cgroup_per_zone *mz,
> > - struct mem_cgroup_tree_per_zone *mctz)
> > -{
> > - spin_lock(&mctz->lock);
> > - __mem_cgroup_insert_exceeded(mem, mz, mctz);
> > - spin_unlock(&mctz->lock);
> > -}
> > -
> > -static void
> > mem_cgroup_remove_exceeded(struct mem_cgroup *mem,
> > struct mem_cgroup_per_zone *mz,
> > struct mem_cgroup_tree_per_zone *mctz)
> > @@ -392,34 +382,40 @@ static bool mem_cgroup_soft_limit_check(
> >
> > static void mem_cgroup_update_tree(struct mem_cgroup *mem, struct page *page)
> > {
> > - unsigned long long prev_usage_in_excess, new_usage_in_excess;
> > - bool updated_tree = false;
> > + unsigned long long new_usage_in_excess;
> > struct mem_cgroup_per_zone *mz;
> > struct mem_cgroup_tree_per_zone *mctz;
> > -
> > - mz = mem_cgroup_zoneinfo(mem, page_to_nid(page), page_zonenum(page));
> > + int nid = page_to_nid(page);
> > + int zid = page_zonenum(page);
> > mctz = soft_limit_tree_from_page(page);
> >
> > /*
> > - * We do updates in lazy mode, mem's are removed
> > - * lazily from the per-zone, per-node rb tree
> > + * Necessary to update all ancestors when hierarchy is used,
> > + * because their event counters are not touched.
> > */
> > - prev_usage_in_excess = mz->usage_in_excess;
> > -
> > - new_usage_in_excess = res_counter_soft_limit_excess(&mem->res);
> > - if (prev_usage_in_excess) {
> > - mem_cgroup_remove_exceeded(mem, mz, mctz);
> > - updated_tree = true;
> > - }
> > - if (!new_usage_in_excess)
> > - goto done;
> > - mem_cgroup_insert_exceeded(mem, mz, mctz);
> > -
> > -done:
> > - if (updated_tree) {
> > - spin_lock(&mctz->lock);
> > - mz->usage_in_excess = new_usage_in_excess;
> > - spin_unlock(&mctz->lock);
> > + for (; mem; mem = parent_mem_cgroup(mem)) {
> > + mz = mem_cgroup_zoneinfo(mem, nid, zid);
> > + new_usage_in_excess =
> > + res_counter_soft_limit_excess(&mem->res);
> > + /*
> > + * We have to update the tree if mz is on RB-tree or
> > + * mem is over its softlimit.
> > + */
> > + if (new_usage_in_excess || mz->on_tree) {
> > + spin_lock(&mctz->lock);
> > + /* if on-tree, remove it */
> > + if (mz->on_tree)
> > + __mem_cgroup_remove_exceeded(mem, mz, mctz);
> > + /*
> > + * if over soft limit, insert again. mz->usage_in_excess
> > + * will be updated properly.
> > + */
> > + if (new_usage_in_excess)
> > + __mem_cgroup_insert_exceeded(mem, mz, mctz);
> > + else
> > + mz->usage_in_excess = 0;
> > + spin_unlock(&mctz->lock);
> > + }
> > }
> > }
> >
> > @@ -1270,9 +1266,9 @@ static int __mem_cgroup_try_charge(struc
> > gfp_t gfp_mask, struct mem_cgroup **memcg,
> > bool oom, struct page *page)
> > {
> > - struct mem_cgroup *mem, *mem_over_limit, *mem_over_soft_limit;
> > + struct mem_cgroup *mem, *mem_over_limit;
> > int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
> > - struct res_counter *fail_res, *soft_fail_res = NULL;
> > + struct res_counter *fail_res;
> >
> > if (unlikely(test_thread_flag(TIF_MEMDIE))) {
> > /* Don't account this! */
> > @@ -1304,17 +1300,16 @@ static int __mem_cgroup_try_charge(struc
> >
> > if (mem_cgroup_is_root(mem))
> > goto done;
> > - ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res,
> > - &soft_fail_res);
> > + ret = res_counter_charge(&mem->res, PAGE_SIZE, &fail_res);
> > if (likely(!ret)) {
> > if (!do_swap_account)
> > break;
> > ret = res_counter_charge(&mem->memsw, PAGE_SIZE,
> > - &fail_res, NULL);
> > + &fail_res);
> > if (likely(!ret))
> > break;
> > /* mem+swap counter fails */
> > - res_counter_uncharge(&mem->res, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > flags |= MEM_CGROUP_RECLAIM_NOSWAP;
> > mem_over_limit = mem_cgroup_from_res_counter(fail_res, memsw);
> > @@ -1353,16 +1348,11 @@ static int __mem_cgroup_try_charge(struc
> > }
> > }
> > /*
> > - * Insert just the ancestor, we should trickle down to the correct
> > - * cgroup for reclaim, since the other nodes will be below their
> > - * soft limit
> > - */
> > - if (soft_fail_res) {
> > - mem_over_soft_limit =
> > - mem_cgroup_from_res_counter(soft_fail_res, res);
> > - if (mem_cgroup_soft_limit_check(mem_over_soft_limit))
> > - mem_cgroup_update_tree(mem_over_soft_limit, page);
> > - }
> > + * Insert ancestor (and ancestor's ancestors) into the softlimit RB-tree
> > + * if they exceed the softlimit.
> > + */
> > + if (mem_cgroup_soft_limit_check(mem))
> > + mem_cgroup_update_tree(mem, page);
> > done:
> > return 0;
> > nomem:
> > @@ -1437,10 +1427,9 @@ static void __mem_cgroup_commit_charge(s
> > if (unlikely(PageCgroupUsed(pc))) {
> > unlock_page_cgroup(pc);
> > if (!mem_cgroup_is_root(mem)) {
> > - res_counter_uncharge(&mem->res, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > if (do_swap_account)
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE,
> > - NULL);
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > }
> > css_put(&mem->css);
> > return;
> > @@ -1519,7 +1508,7 @@ static int mem_cgroup_move_account(struc
> > goto out;
> >
> > if (!mem_cgroup_is_root(from))
> > - res_counter_uncharge(&from->res, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&from->res, PAGE_SIZE);
> > mem_cgroup_charge_statistics(from, pc, false);
> >
> > page = pc->page;
> > @@ -1539,7 +1528,7 @@ static int mem_cgroup_move_account(struc
> > }
> >
> > if (do_swap_account && !mem_cgroup_is_root(from))
> > - res_counter_uncharge(&from->memsw, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&from->memsw, PAGE_SIZE);
> > css_put(&from->css);
> >
> > css_get(&to->css);
> > @@ -1610,9 +1599,9 @@ uncharge:
> > css_put(&parent->css);
> > /* uncharge if move fails */
> > if (!mem_cgroup_is_root(parent)) {
> > - res_counter_uncharge(&parent->res, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&parent->res, PAGE_SIZE);
> > if (do_swap_account)
> > - res_counter_uncharge(&parent->memsw, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&parent->memsw, PAGE_SIZE);
> > }
> > return ret;
> > }
> > @@ -1803,8 +1792,7 @@ __mem_cgroup_commit_charge_swapin(struct
> > * calling css_tryget
> > */
> > if (!mem_cgroup_is_root(memcg))
> > - res_counter_uncharge(&memcg->memsw, PAGE_SIZE,
> > - NULL);
> > + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > mem_cgroup_swap_statistics(memcg, false);
> > mem_cgroup_put(memcg);
> > }
> > @@ -1831,9 +1819,9 @@ void mem_cgroup_cancel_charge_swapin(str
> > if (!mem)
> > return;
> > if (!mem_cgroup_is_root(mem)) {
> > - res_counter_uncharge(&mem->res, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > if (do_swap_account)
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > }
> > css_put(&mem->css);
> > }
> > @@ -1848,7 +1836,6 @@ __mem_cgroup_uncharge_common(struct page
> > struct page_cgroup *pc;
> > struct mem_cgroup *mem = NULL;
> > struct mem_cgroup_per_zone *mz;
> > - bool soft_limit_excess = false;
> >
> > if (mem_cgroup_disabled())
> > return NULL;
> > @@ -1888,10 +1875,10 @@ __mem_cgroup_uncharge_common(struct page
> > }
> >
> > if (!mem_cgroup_is_root(mem)) {
> > - res_counter_uncharge(&mem->res, PAGE_SIZE, &soft_limit_excess);
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > if (do_swap_account &&
> > (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > }
> > if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> > mem_cgroup_swap_statistics(mem, true);
> > @@ -1908,7 +1895,7 @@ __mem_cgroup_uncharge_common(struct page
> > mz = page_cgroup_zoneinfo(pc);
> > unlock_page_cgroup(pc);
> >
> > - if (soft_limit_excess && mem_cgroup_soft_limit_check(mem))
> > + if (mem_cgroup_soft_limit_check(mem))
> > mem_cgroup_update_tree(mem, page);
> > /* at swapout, this memcg will be accessed to record to swap */
> > if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> > @@ -1986,7 +1973,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
> > * This memcg can be obsolete one. We avoid calling css_tryget
> > */
> > if (!mem_cgroup_is_root(memcg))
> > - res_counter_uncharge(&memcg->memsw, PAGE_SIZE, NULL);
> > + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > mem_cgroup_swap_statistics(memcg, false);
> > mem_cgroup_put(memcg);
> > }
> >
> >
>