From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Paul Menage <menage@google.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"containers@lists.osdl.org" <containers@lists.osdl.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"xemul@openvz.org" <xemul@openvz.org>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
"yamamoto@valinux.co.jp" <yamamoto@valinux.co.jp>
Subject: Re: [RFD][PATCH] memcg: Move Usage at Task Move
Date: Wed, 11 Jun 2008 16:45:44 +0900
Message-ID: <20080611164544.94047336.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <6599ad830806110017t5ebeda78id1914d179a018422@mail.gmail.com>
Hi,
On Wed, 11 Jun 2008 00:17:31 -0700
"Paul Menage" <menage@google.com> wrote:
> On Thu, Jun 5, 2008 at 6:52 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Move Usage at Task Move (just an experimental patch for discussion)
> > I tested this but don't think it is bug-free.
> >
> > In current memcg, when task moves to a new cg, the usage remains in the old cg.
> > This is considered to be not good.
>
> Is it really such a big deal if we don't transfer the page ownerships
> to the new cgroup? As this thread has shown, it's a fairly painful
> operation to support. It would be good to have some concrete examples
> of cases where this is needed.
>
When we move a process with XXXG bytes of memory, we obviously need a "move".
I think there are cases where a system administrator decides to create a _new_
cgroup to isolate some swappy job in order to maintain the system.
(I can't say that this never happens.)
This kind of resource resizing can also happen under automatic control by
middleware, I think. But as you say, this should be implemented in a simple way.
I'm now trying to make this simple (i.e. looking for a no-rollback approach).
> >
> > This is a trial to move "usage" from old cg to new cg at task move.
> > Finally, you'll see the problems we have to handle are failure and rollback.
> >
> > This one's Basic algorithm is
> >
> > 0. can_attach() is called.
> > 1. count movable pages by scanning page table. isolate all pages from LRU.
> > 2. try to create enough room in new memory cgroup
> > 3. start moving page accounting
> > 4. putback pages to LRU.
> > 5. can_attach() for other cgroups are called.
> >
> > A case study.
> >
> > group_A -> limit=1G, task_X's usage= 800M.
> > group_B -> limit=1G, usage=500M.
> >
> > For moving task_X from group_A to group_B.
> > - group_B should be reclaimed or have enough room.
> >
> > While moving task_X from group_A to group_B.
> > - group_B's memory usage can be changed
> > - group_A's memory usage can be changed
> >
> > We account resources on a per-page basis, so we can't move all resource
> > usage at once.
> >
> > If group_B has no more room when we've moved 700M of task_X to group_B,
> > we have to move 700M of task_X back to group_A. So I implemented roll-back.
> > But other process may use up group_A's available resource at that point.
> >
> > For avoiding that, preserve 800M in group_B before moving task_X means that
> > task_X can occupy 1600M of resource at moving. (So I don't do in this patch.)
>
> I think that pre-reserving in B would be the cleanest solution, and
> would save the need to provide rollback.
>
Yes. My next version will try to pre-reserve, with no rollbacks.
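To make the pre-reserve idea concrete, here is a minimal userspace sketch in C. It is only a model under stated assumptions, not kernel code: struct mem_counter, counter_charge(), counter_uncharge() and move_with_prereserve() are hypothetical names, units are MB, and reclaim is not modelled. With the case study's numbers (group_B at 500M of a 1G limit, task_X at 800M), the reservation fails up front until about 276M is freed in group_B (taking 1G as 1024M), and because nothing has been moved at that point there is nothing to roll back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a memory cgroup counter (units: MB). */
struct mem_counter {
    size_t usage;
    size_t limit;
};

/* Charge `amount` to `c`; fail with no side effects if it would exceed the limit. */
static bool counter_charge(struct mem_counter *c, size_t amount)
{
    if (c->usage > c->limit || amount > c->limit - c->usage)
        return false;
    c->usage += amount;
    return true;
}

/* Drop `amount` from `c`, clamping at zero. */
static void counter_uncharge(struct mem_counter *c, size_t amount)
{
    c->usage -= (amount > c->usage) ? c->usage : amount;
}

/*
 * Move `task_usage` from `from` to `to` by reserving the whole amount in
 * `to` first. On failure both counters are left untouched.
 */
static bool move_with_prereserve(struct mem_counter *from,
                                 struct mem_counter *to,
                                 size_t task_usage)
{
    assert(task_usage <= from->usage);

    if (!counter_charge(to, task_usage))
        return false;   /* nothing moved yet, so nothing to roll back */

    /*
     * Room in `to` is already reserved, so the per-page moves cannot fail.
     * Until the uncharge below, the usage is counted in both groups; this
     * is the double-occupancy cost mentioned in the case study.
     */
    counter_uncharge(from, task_usage);
    return true;
}
```

A caller that gets false back would reclaim from the destination group and retry, or simply refuse the move; which of the two is right is a policy question this sketch does not answer.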
> > 2. Don't move any usage at task move. (current implementation.)
> > Pros.
> > - no complication in the code.
> > Cons.
> > - A task's usage is charged to the wrong cgroup.
> > - Not sure, but I believe the users don't want this.
>
> I'd say stick with this unless there are strong arguments in favour of
> changing, based on concrete needs.
>
People around me say "this logic is buggy" ;)
> >
> > One reason is that I think a typical usage of memory controller is
> > fork()->move->exec(). (by libcg ?) and exec() will flush all the usage.
>
> Exactly - this is a good reason *not* to implement move - because then
> you drag all the usage of the middleware daemon into the new cgroup.
>
Yes, but this is one of the uses of cgroups. In general, a system admin can
use this to limit memory at his own discretion.
> > Index: temp-2.6.26-rc2-mm1/include/linux/cgroup.h
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/include/linux/cgroup.h
> > +++ temp-2.6.26-rc2-mm1/include/linux/cgroup.h
> > @@ -299,6 +299,8 @@ struct cgroup_subsys {
> > struct cgroup *cgrp, struct task_struct *tsk);
> > void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> > struct cgroup *old_cgrp, struct task_struct *tsk);
> > + void (*attach_rollback)(struct cgroup_subsys *ss,
> > + struct task_struct *tsk);
> > void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
> > void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
> > int (*populate)(struct cgroup_subsys *ss,
> > Index: temp-2.6.26-rc2-mm1/kernel/cgroup.c
> > ===================================================================
> > --- temp-2.6.26-rc2-mm1.orig/kernel/cgroup.c
> > +++ temp-2.6.26-rc2-mm1/kernel/cgroup.c
> > @@ -1241,7 +1241,7 @@ int cgroup_attach_task(struct cgroup *cg
> > if (ss->can_attach) {
> > retval = ss->can_attach(ss, cgrp, tsk);
> > if (retval)
> > - return retval;
> > + goto rollback;
> > }
> > }
> >
> > @@ -1278,6 +1278,13 @@ int cgroup_attach_task(struct cgroup *cg
> > synchronize_rcu();
> > put_css_set(cg);
> > return 0;
> > +
> > +rollback:
> > + for_each_subsys(root, ss) {
> > + if (ss->attach_rollback)
> > + ss->attach_rollback(ss, tsk);
> > + }
> > + return retval;
> > }
> >
>
> I really need to get round to my plan for implementing transactional
> attach - I've just been swamped by internal stuff recently.
> Essentially, I think that we need the ability for a subsystem to
> request either a commit or a rollback following an attach. The big
> difference to what we have now is that the each subsystem will be able
> to synchronize itself with the updates to its state pointer in the
> task's css_set. Also, we need to not be calling attach_rollback on
> subsystems that didn't get an attach() call.
>
Yes, but first I'll try the no-rollback approach.
And can I move the memory resource controller's subsys_id to the last position for now?
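For reference, the point about not calling attach_rollback on subsystems that were never reached can be sketched as a small userspace model in C. Everything here is hypothetical (struct model_subsys, model_attach() and the helper callbacks are not the kernel's cgroup API). It assumes, as the algorithm listing suggests, that the work to be undone happens in can_attach(), and that a subsystem whose can_attach() fails cleans up after itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a subsystem; not the kernel's struct cgroup_subsys. */
struct model_subsys {
    int  (*can_attach)(struct model_subsys *ss);      /* 0 on success */
    void (*attach_rollback)(struct model_subsys *ss);
    int  rollbacks;                                   /* illustration only */
};

/* Illustrative callbacks: one that succeeds, one that fails, one undo hook. */
static int ok_can_attach(struct model_subsys *ss)   { (void)ss; return 0; }
static int fail_can_attach(struct model_subsys *ss) { (void)ss; return -1; }
static void undo_attach(struct model_subsys *ss)    { ss->rollbacks++; }

/*
 * Call can_attach() on each subsystem in order. On the first failure, roll
 * back only the subsystems whose can_attach() already succeeded, in reverse
 * order, and return the error. Subsystems after the failing one are never
 * touched.
 */
static int model_attach(struct model_subsys **subsys, size_t n)
{
    size_t i;
    int ret = 0;

    assert(subsys != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (subsys[i]->can_attach) {
            ret = subsys[i]->can_attach(subsys[i]);
            if (ret)
                break;
        }
    }
    if (!ret)
        return 0;

    /* i indexes the failing subsystem; undo [0, i) only. */
    while (i-- > 0) {
        if (subsys[i]->attach_rollback)
            subsys[i]->attach_rollback(subsys[i]);
    }
    return ret;
}
```

With three subsystems where the second one fails, only the first sees a rollback call. In this model, a subsystem placed last in the order never needs a rollback for its own can_attach() work, since no later subsystem can fail after it.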
Thanks,
-Kame