linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Paul Menage <menage@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"lizf@cn.fujitsu.com" <lizf@cn.fujitsu.com>,
	"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH 1/6] memcg: fix pre_destory handler
Date: Wed, 10 Dec 2008 02:40:13 -0800	[thread overview]
Message-ID: <6599ad830812100240g5e549a5cqe29cbea736788865@mail.gmail.com> (raw)
In-Reply-To: <20081209200647.a1fa76a9.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, Dec 9, 2008 at 3:06 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> better name for new flag is welcome.
>
> ==
> Because pre_destroy() handler is moved out to cgroup_lock() for
> avoiding dead-lock, now, cgroup's rmdir() does following sequence.
>
>        cgroup_lock()
>        check children and tasks.
>        (A)
>        cgroup_unlock()
>        (B)
>        pre_destroy() for subsys;-----(1)
>        (C)
>        cgroup_lock();
>        (D)
>        Second check:check for -EBUSY again because we released the lock.
>        (E)
>        mark cgroup as removed.
>        (F)
>        unlink from lists.
>        cgroup_unlock();
>        dput()
>        => when dentry's refcnt goes down to 0
>                destroy() handers for subsys

The reason for needing this patch is because of the non-atomic locking
in cgroup_rmdir() that was introduced due to the circular locking
dependency between the hotplug lock and the cgroup_mutex.

But rather than adding a whole bunch more complexity, this looks like
another case that could be solved by the hierarchy_mutex patches that
I posted a while ago.

Those allow the cpuset hotplug notifier (and any other subsystem that
wants a stable hierarchy) to take ss->hierarchy_mutex, to prevent
mkdir/rmdir/bind in its hierarchy, which helps to remove the deadlock
that the above dropping of cgroup_mutex was introduced to work around.

>
> Considering above sequence, new tasks can be added while
>        (B) and (C)
> swap-in recored can be charged back to a cgroup after pre_destroy()
>        at (C) and (D), (E)
> (means cgrp's refcnt not comes from task but from other persistent objects.)

Which "persistent object" are you getting the css refcount from?

Is the problem that swap references aren't refcounted because you want
to avoid swap from keeping a cgroup alive? But you still want to be
able to do css_get() on the mem_cgroup* obtained from a swap
reference, and be safely synchronized with concurrent rmdir operations
without having to take a heavy lock?

The solution that I originally tried to use for this in an early
version of cgroups (but dropped as I thought it was not needed) was to
treat css->refcount as follows:

 0 => trying to remove or removed
 1 => normal state with no additional refs

So to get a reference on a possibly removed css we'd have:

int css_tryget(css) {
  while (!atomic_inc_not_zero(&css->refcount)) {
    if (test_bit(CSS_REMOVED, &css->flags)) {
      return 0;
    }
  }
  return 1;
}

and cgroup_rmdir would do:

for_each_subsys() {
  if (cmpxchg(&css->refcount, 0, -1)) {
    // busy, roll back -1s to 0s, give error
    ...
  }
}
// success
for_each_subsys() {
  set_bit(CSS_REMOVED, &css->flags);
}

This makes it easy to have weak references to a css that can be
dropped in a destroy() callback. Would this maybe even remove the need
for mem_cgroup_pre_destroy()?

Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2008-12-10 10:40 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-09 11:02 [RFC][PATCH 0/6] cgroup id and mix fixes (2008/12/09) KAMEZAWA Hiroyuki
2008-12-09 11:04 ` [RFC][PATCH 1/6] memcg: Documentation for internal implementation KAMEZAWA Hiroyuki
2008-12-10  0:27   ` KAMEZAWA Hiroyuki
2008-12-10  1:02     ` Li Zefan
2008-12-10  1:07       ` KAMEZAWA Hiroyuki
2008-12-09 11:06 ` [RFC][PATCH 1/6] memcg: fix pre_destory handler KAMEZAWA Hiroyuki
2008-12-10  2:08   ` KAMEZAWA Hiroyuki
2008-12-10  2:19   ` Li Zefan
2008-12-10  2:23     ` KAMEZAWA Hiroyuki
2008-12-10  2:28   ` Daisuke Nishimura
2008-12-10  2:58     ` KAMEZAWA Hiroyuki
2008-12-10  3:03       ` Daisuke Nishimura
2008-12-10  4:17         ` KAMEZAWA Hiroyuki
2008-12-10 10:40   ` Paul Menage [this message]
2008-12-10 11:29     ` KAMEZAWA Hiroyuki
2008-12-10 13:25       ` Balbir Singh
2008-12-10 13:47         ` Daisuke Nishimura
2008-12-10 18:26           ` Paul Menage
2008-12-10 18:25         ` Paul Menage
2008-12-10 18:35       ` Paul Menage
2008-12-10 19:00         ` Paul Menage
2008-12-11  0:21           ` KAMEZAWA Hiroyuki
2008-12-11  0:24             ` Paul Menage
2008-12-11  1:06               ` KAMEZAWA Hiroyuki
2008-12-11 12:43               ` KAMEZAWA Hiroyuki
2008-12-11  0:25         ` KAMEZAWA Hiroyuki
2008-12-11  0:28           ` Paul Menage
2008-12-11  1:09             ` KAMEZAWA Hiroyuki
2008-12-09 11:08 ` [RFC][PATCH 2/6] cgroup id KAMEZAWA Hiroyuki
2008-12-09 11:09 ` [RFC][PATCH 4/6] Flat hierarchical reclaim by ID KAMEZAWA Hiroyuki
2008-12-09 12:27   ` Balbir Singh
2008-12-09 14:28     ` KAMEZAWA Hiroyuki
2008-12-09 15:46       ` Balbir Singh
2008-12-09 16:34         ` KAMEZAWA Hiroyuki
2008-12-10  2:49           ` Balbir Singh
2008-12-10  3:03             ` KAMEZAWA Hiroyuki
2008-12-09 11:10 ` [RFC][PATCH 5/6] fix inactive_ratio under hierarchy KAMEZAWA Hiroyuki
2008-12-11  3:14   ` KOSAKI Motohiro
2008-12-11  3:19     ` KAMEZAWA Hiroyuki
2008-12-09 11:12 ` [RFC][PATCH 6/6] fix oom " KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6599ad830812100240g5e549a5cqe29cbea736788865@mail.gmail.com \
    --to=menage@google.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox