From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: linux-mm <linux-mm@kvack.org>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Li Zefan <lizf@cn.fujitsu.com>
Subject: Re: [RFC][BUGFIX] memcg: rmdir doesn't return
Date: Tue, 16 Jun 2009 15:48:20 +0900 [thread overview]
Message-ID: <20090616154820.c9065809.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20090616153810.fd710c5b.nishimura@mxp.nes.nec.co.jp>
On Tue, 16 Jun 2009 15:38:10 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On Tue, 16 Jun 2009 14:00:50 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 16 Jun 2009 11:47:35 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> >
> > > On Mon, 15 Jun 2009 17:17:15 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > On Mon, 15 Jun 2009 12:02:13 +0900
> > > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > > I don't like implicit resource moves. I'll try something today. Please see it.
> > > > > _But_ this case only happens when swap is shared between cgroups and _very_ heavy
> > > > > swap-in continues for a very long time. I don't think this is fatal or a BUG.
> > > > >
> > > > > But OK, maybe the wake-up path is not enough.
> > > > >
> > > > Here.
> > > > Anyway, there is an unfortunate complexity in cgroup's rmdir() path.
> > > > I think this will remove all concerns in the
> > > > pre_destroy -> check -> start rmdir path
> > > > if the subsys is aware of what it does.
> > > > A usual subsys just considers "tasks" and holds no extra references, I hope.
> > > > If your test result is good, I'll post again (after the merge window?).
> > > >
> > > Thank you for your patch.
> > >
> > > At first, I thought this problem could be solved in this direction, but
> > > there is still a race window.
> > >
> > > The root cause of this problem is that mem.usage can be incremented
> > > by memcg's swap-in behavior even after it has once become 0.
> > > So, mem.usage can also be incremented between cgroup_need_restart_rmdir()
> > > and schedule().
> > > I can actually see rmdir getting locked up in my test.
> > >
> > > Hmm, sleeping until being woken up might not be good if we don't change
> > > the swap-in behavior of memcg in some way.
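
(For illustration: the window described here, between re-checking the condition and calling schedule(), is the classic check-then-sleep race. The usual kernel idiom for the lost-wakeup side of it is to join the wait queue before re-checking, as in the minimal sketch below. All names are placeholders rather than the real memcg/cgroup symbols, and this ordering alone does not address the swap-in charging problem, only the missed-wakeup ordering.)

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/types.h>

/* Placeholder wait queue standing in for whatever the rmdir path waits on. */
static DECLARE_WAIT_QUEUE_HEAD(fake_rmdir_waitq);

/* Stub predicate standing in for "does the css still hold extra references?". */
static bool extra_refs_remain(void)
{
        return false;
}

static void wait_for_refs_to_drop(void)
{
        DEFINE_WAIT(wait);

        for (;;) {
                /*
                 * Join the wait queue *before* re-checking the condition,
                 * so a wake_up() that fires between the check and
                 * schedule() is not lost.
                 */
                prepare_to_wait(&fake_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
                if (!extra_refs_remain())
                        break;
                schedule();
        }
        finish_wait(&fake_rmdir_waitq, &wait);
}
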
> > >
> > Or, invalidate all refs from swap_cgroup in force_empty().
> > A fixed one is attached.
> >
> > The reason I don't like "charge to current process" at swap-in is that a user cannot
> > predict how the resource usage will change. It will be random.
> >
> > In this sense, I wanted to set an "owner" for file caches. But file caches are
> > used in a more explicit way than swap, and the user can be aware of the usage
> > more easily than with swap cache (and files are expected to be shared by nature).
> >
> > The patch itself will require some more work.
> > What I find difficult in cgroup's rmdir() is:
> > ==
> > pre_destroy(); => pre_destroy() reduces css's refcnt to 0.
> > CGROUP_WAIT_ON_RMDIR is set
> > if (check css's refcnt again)
> > {
> > sleep and retry
> > }
> > ==
> > css_tryget() checks CSS_IS_REMOVED, but CSS_IS_REMOVED is set only when
> > css->refcnt goes down to 0. Hmm.
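
(To make that window concrete, here is a simplified, purely illustrative tryget; struct fake_css and FAKE_CSS_REMOVED are made-up names, not the real cgroup_subsys_state or its flags. Because the "removed" bit is only set once the refcount has already reached 0, a caller racing with rmdir can still take a reference, and with it a new charge, after the rmdir side has observed 0.)

#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/types.h>

/* Made-up stand-in for a css; illustration only. */
struct fake_css {
        atomic_t refcnt;
        unsigned long flags;
};
#define FAKE_CSS_REMOVED        0       /* set only after refcnt has hit 0 */

/*
 * Refuses only once the REMOVED bit is visible.  In the window after
 * pre_destroy() has driven the count to 0 but before the bit is set,
 * this still succeeds and pushes the count back up.
 */
static bool fake_css_tryget(struct fake_css *css)
{
        if (test_bit(FAKE_CSS_REMOVED, &css->flags))
                return false;
        atomic_inc(&css->refcnt);
        return true;
}
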
> >
> > I think my patch itself is not so bad. But the scheme is dirty in general.
> >
> > Thanks,
> > -Kame
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Looks good except for:
>
> > @@ -374,6 +385,7 @@ struct cgroup_subsys {
> > struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
> > struct cgroup *cgrp);
> > int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> > + int (*rmdir_retry)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> > void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> > int (*can_attach)(struct cgroup_subsys *ss,
> > struct cgroup *cgrp, struct task_struct *tsk);
> s/rmdir_retry/retry_rmdir
>
> It has been working well so far, but I will continue to test for a longer time.
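
(As a purely illustrative sketch of how a subsystem might wire up such a hook: the hunk above only shows the prototype, so the callback's exact contract is an assumption here, namely that a nonzero return asks the core to run the pre_destroy/wait loop again. mem_cgroup_from_cont() and res_counter_read_u64() existed in the tree at the time; the function below is not part of the posted patch.)

/* Would live in mm/memcontrol.c; illustration only, not the posted patch. */
static int mem_cgroup_retry_rmdir(struct cgroup_subsys *ss,
                                  struct cgroup *cgrp)
{
        struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);

        /* Ask the core to retry while charges are still outstanding. */
        return res_counter_read_u64(&mem->res, RES_USAGE) != 0;
}

The corresponding .retry_rmdir (or .rmdir_retry) member would then be set in mem_cgroup_subsys alongside .pre_destroy.
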
>
Thank you. I'd like to find a cleaner fix, keeping this as an option.
Thanks,
-Kame