From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>
Subject: Re: [RFC][preview] [patch 1/2] memcg: batched uncharge base
Date: Tue, 25 Aug 2009 17:37:43 +0900 [thread overview]
Message-ID: <20090825173743.61f737ee.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20090825170735.6acd3ace.nishimura@mxp.nes.nec.co.jp>
On Tue, 25 Aug 2009 17:07:35 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> First of all, I think these patches are good optimization.
>
> I have a few comments for now.
>
> On Tue, 25 Aug 2009 11:29:19 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > In massive parallel enviroment, res_counter can be a performance bottleneck.
> > This patch is a trial for reducing lock contention in memcg.
> >
> > One strong techinque to reduce lock contention is reducing calls themselves by
> > do some amount of calls into a call, in batch.
> >
> > Considering charge/uncharge chatacteristic,
> > - charge is done one by one via demand-paging.
> > - uncharge is done by
> > - in continuous call at munmap, truncate, exit, execve...
> > - one by one via vmscan/paging.
> >
> > It seems we have a chance to batched-uncharge.
> > This patch is a base patch for batched uncharge. For avoiding
> > scattering memcg's structure as argument, this patch adds memcg batch uncharge
> > information to the task. please see start/end usage in next patch.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > include/linux/memcontrol.h | 12 ++++++++++
> > include/linux/sched.h | 8 +++++++
> > mm/memcontrol.c | 51 ++++++++++++++++++++++++++++++++++++++++++---
> > 3 files changed, 68 insertions(+), 3 deletions(-)
> >
> > Index: linux-2.6.31-rc7/include/linux/memcontrol.h
> > ===================================================================
> > --- linux-2.6.31-rc7.orig/include/linux/memcontrol.h
> > +++ linux-2.6.31-rc7/include/linux/memcontrol.h
> > @@ -54,6 +54,10 @@ extern void mem_cgroup_rotate_lru_list(s
> > extern void mem_cgroup_del_lru(struct page *page);
> > extern void mem_cgroup_move_lists(struct page *page,
> > enum lru_list from, enum lru_list to);
> > +
> > +extern void mem_cgroup_uncharge_batch_start(void);
> > +extern void mem_cgroup_uncharge_batch_end(void);
> > +
> > extern void mem_cgroup_uncharge_page(struct page *page);
> > extern void mem_cgroup_uncharge_cache_page(struct page *page);
> > extern int mem_cgroup_shmem_charge_fallback(struct page *page,
> > @@ -148,6 +152,14 @@ static inline void mem_cgroup_cancel_cha
> > {
> > }
> >
> > +static inline void mem_cgroup_uncharge_batch_start(void)
> > +{
> > +}
> > +
> > +static inline void mem_cgroup_uncharge_batch_start(void)
> > +{
> > +}
> > +
> > static inline void mem_cgroup_uncharge_page(struct page *page)
> > {
> > }
> > Index: linux-2.6.31-rc7/mm/memcontrol.c
> > ===================================================================
> > --- linux-2.6.31-rc7.orig/mm/memcontrol.c
> > +++ linux-2.6.31-rc7/mm/memcontrol.c
> > @@ -1500,6 +1500,7 @@ __mem_cgroup_uncharge_common(struct page
> > struct page_cgroup *pc;
> > struct mem_cgroup *mem = NULL;
> > struct mem_cgroup_per_zone *mz;
> > + struct memcg_batch_info *batch = NULL;
> >
> > if (mem_cgroup_disabled())
> > return NULL;
> > @@ -1537,10 +1538,25 @@ __mem_cgroup_uncharge_common(struct page
> > default:
> > break;
> > }
> > + if (current->batch_memcg.batch_mode)
> > + batch = ¤t->batch_memcg;
> >
> > - res_counter_uncharge(&mem->res, PAGE_SIZE);
> > - if (do_swap_account && (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + if (!batch || batch->memcg != mem) {
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > + if (do_swap_account &&
> > + (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + if (batch) {
> > + batch->memcg = mem;
> What if we have set batch->memcg to a different memcg and it has some batch->nr_pages(nr_memsw) ?
> Shouldn't we flush them first ?
>
Ah, this is bug. this should be
==
if (batch && !batch->memcg)
==
(my current code does this.) thank you for pointing out.
I wonder it's not necessary to flush. just ignore it as no-batch.
This batched uncharge is done at
- truncate/invalidate file cache per 14pages.(PAGEVECSIZE)
- per vma unmapping.
Then, flush-and-exchange or just-do-synchronous-uncharge here or not
will not be important, I think.
> And, it might be a overkill, how about flushing all the batched-uncharges
> before invoking oom at __mem_cgroup_try_charge() ?
>
Hmm. Maybe, I selected region of batched-uncharge to be enough small...
then, adding synchronize_rcu() or congestion_wait() or some before
retrying next-loop of reclaim will be enough.
Or, prevent batched-uncharge if someone runs into reclaim will be a smart choice.
It will be easy midification to mem_cgroup_uncharge_batch_start(void).
Thanks,
-Kame
>
> Thanks,
> Daisuke Nishimura.
>
> > + css_get(&mem->css);
> > + }
> > + } else {
> > + /* instead of modifing res_counter, remember it */
> > + batch->nr_pages += PAGE_SIZE;
> > + if (do_swap_account &&
> > + (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT))
> > + batch->nr_memsw += PAGE_SIZE;
> > + }
> > mem_cgroup_charge_statistics(mem, pc, false);
> >
> > ClearPageCgroupUsed(pc);
> > @@ -1582,6 +1598,35 @@ void mem_cgroup_uncharge_cache_page(stru
> > __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
> > }
> >
> > +void mem_cgroup_uncharge_batch_start(void)
> > +{
> > + VM_BUG_ON(current->batch_memcg.batch_mode);
> > + current->batch_memcg.batch_mode = 1;
> > + current->batch_memcg.memcg = NULL;
> > + current->batch_memcg.nr_pages = 0;
> > + current->batch_memcg.nr_memsw = 0;
> > +}
> > +
> > +void mem_cgroup_uncharge_batch_end(void)
> > +{
> > + struct mem_cgroup *mem;
> > +
> > + VM_BUG_ON(!current->batch_memcg.batch_mode);
> > + current->batch_memcg.batch_mode = 0;
> > +
> > + mem = current->batch_memcg.memcg;
> > + if (!mem)
> > + return;
> > + if (current->batch_memcg.nr_pages)
> > + res_counter_uncharge(&mem->res,
> > + current->batch_memcg.nr_pages);
> > + if (current->batch_memcg.nr_memsw)
> > + res_counter_uncharge(&mem->memsw,
> > + current->batch_memcg.nr_memsw);
> > + /* we got css's refcnt */
> > + cgroup_release_and_wakeup_rmdir(&mem->css);
> > +}
> > +
> > #ifdef CONFIG_SWAP
> > /*
> > * called after __delete_from_swap_cache() and drop "page" account.
> > Index: linux-2.6.31-rc7/include/linux/sched.h
> > ===================================================================
> > --- linux-2.6.31-rc7.orig/include/linux/sched.h
> > +++ linux-2.6.31-rc7/include/linux/sched.h
> > @@ -1480,6 +1480,14 @@ struct task_struct {
> > /* bitmask of trace recursion */
> > unsigned long trace_recursion;
> > #endif /* CONFIG_TRACING */
> > +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> > + /* For implicit argument for batched uncharge */
> > + struct memcg_batch_info {
> > + struct mem_cgroup *memcg;
> > + int batch_mode;
> > + unsigned long nr_pages, nr_memsw;
> > + } batch_memcg;
> > +#endif
> > };
> >
> > /* Future-safe accessor for struct task_struct's cpus_allowed. */
> >
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-08-25 23:49 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-25 2:25 [RFC][preview] memcg: reduce lock contention at uncharge by batching KAMEZAWA Hiroyuki
2009-08-25 2:29 ` [RFC][preview] [patch 1/2] memcg: batched uncharge base KAMEZAWA Hiroyuki
2009-08-25 8:07 ` Daisuke Nishimura
2009-08-25 8:37 ` KAMEZAWA Hiroyuki [this message]
2009-08-25 2:31 ` [RFC][preview][patch 2/2] memcg: uncharge at truncate/unmap in batched manner KAMEZAWA Hiroyuki
2009-08-25 8:25 ` [RFC][preview] memcg: reduce lock contention at uncharge by batching Balbir Singh
2009-08-25 8:42 ` KAMEZAWA Hiroyuki
2009-08-26 1:02 ` KAMEZAWA Hiroyuki
2009-08-26 5:25 ` Daisuke Nishimura
2009-08-26 6:48 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090825173743.61f737ee.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=balbir@linux.vnet.ibm.com \
--cc=linux-mm@kvack.org \
--cc=nishimura@mxp.nes.nec.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox