linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Wed, 13 Feb 2013 10:51:13 +0100	[thread overview]
Message-ID: <20130213095113.GA23562@dhcp22.suse.cz> (raw)
In-Reply-To: <20130212195358.GE25235@cmpxchg.org>

On Tue 12-02-13 14:53:58, Johannes Weiner wrote:
[...]
> iteration:
> rcu_read_lock()
> dead_count = atomic_read(&hierarchy->dead_count)
> smp_rmb()
> previous = iterator->position
> if (iterator->dead_count != dead_count)
>    /* A cgroup in our hierarchy was killed, pointer might be dangling */
>    don't use iterator
> if (!tryget(&previous))
>    /* The cgroup is marked dead, don't use it */
>    don't use iterator
> next = find_next_and_tryget(hierarchy, &previous)
> /* what happens if destruction of next starts NOW? */

OK, I thought that this depends on the ordering of CSS_DEACT_BIAS and
dead_count writes - because there is no memory ordering enforced between
those two. But it shouldn't matter because we are checking both. If the
increment is seen sooner then we do not care about css_tryget and if css
is deactivated before dead_count++ then the css_tryget would shout.

More interesting ordering, however, is dead_count++ vs. css_put from
cgroup core. Say we have the following:

	CPU0			CPU1			CPU2

iter->position = A;
iter->dead_count = dead_count;
rcu_read_unlock()
return A

mem_cgroup_iter_break
  css_put(A)					bias(A)
  						css_offline()
  						css_put(A) // in cgroup_destroy_locked
							   // last ref and A will be freed
  			rcu_read_lock()
			read parent->dead_count
						parent->dead_count++ // got reordered from css_offline
			css_tryget(A) // kaboom

The reordering window is really huge and I think it is impossible
to trigger in real life. And mem_cgroup_reparent_charges calls
mem_cgroup_start_move unconditionally which in turn calls
synchronize_rcu() which is a full barrier AFAIU so dead_count++ cannot
be reordered ATM.
But should we rely on that? Shouldn't we add smp_wmb
after dead_count++ as I had in an earlier version of the patch?

> css_put(previous)
> iterator->position = next
> smp_wmb()
> iterator->dead_count = dead_count /* my suggestion, instead of a second atomic_read() */
> rcu_read_unlock()
> return next /* caller drops ref eventually, iterator->cgroup becomes weak */
> 
> destruction:
> bias(cgroup->refcount) /* disables future tryget */
> //synchronize_rcu() /* Michal's suggestion */
> atomic_inc(&cgroup->hierarchy->dead_count)
> synchronize_rcu()
> free(cgroup)

Other than that this should work. I will update the patch accordingly.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-02-13  9:51 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-03 17:54 [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
2013-01-03 17:54 ` [PATCH v3 1/7] memcg: synchronize per-zone iterator access by a spinlock Michal Hocko
2013-01-03 17:54 ` [PATCH v3 2/7] memcg: keep prev's css alive for the whole mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` [PATCH v3 3/7] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko
2013-01-03 17:54 ` [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Michal Hocko
2013-01-07  6:18   ` Kamezawa Hiroyuki
2013-02-08 19:33   ` Johannes Weiner
2013-02-11 15:16     ` Michal Hocko
2013-02-11 17:56       ` Johannes Weiner
2013-02-11 19:29         ` Michal Hocko
2013-02-11 19:58           ` Johannes Weiner
2013-02-11 21:27             ` Michal Hocko
2013-02-11 22:07               ` Michal Hocko
2013-02-11 22:39               ` Johannes Weiner
2013-02-12  9:54                 ` Michal Hocko
2013-02-12 15:10                   ` Johannes Weiner
2013-02-12 15:43                     ` Michal Hocko
2013-02-12 16:10                       ` Paul E. McKenney
2013-02-12 17:25                         ` Johannes Weiner
2013-02-12 18:31                           ` Paul E. McKenney
2013-02-12 19:53                             ` Johannes Weiner
2013-02-13  9:51                               ` Michal Hocko [this message]
2013-02-12 17:56                         ` Michal Hocko
2013-02-12 16:13                       ` Michal Hocko
2013-02-12 16:24                         ` Michal Hocko
2013-02-12 16:37                           ` Michal Hocko
2013-02-12 16:41                           ` Johannes Weiner
2013-02-12 17:12                             ` Michal Hocko
2013-02-12 17:37                               ` Johannes Weiner
2013-02-13  8:11                                 ` Glauber Costa
2013-02-13 10:38                                   ` Michal Hocko
2013-02-13 10:34                                 ` Michal Hocko
2013-02-13 12:56                                   ` Michal Hocko
2013-02-12 16:33                       ` Johannes Weiner
2013-01-03 17:54 ` [PATCH v3 5/7] memcg: simplify mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` [PATCH v3 6/7] memcg: further " Michal Hocko
2013-01-03 17:54 ` [PATCH v3 7/7] cgroup: remove css_get_next Michal Hocko
2013-01-04  3:42   ` Li Zefan
2013-01-23 12:52 ` [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130213095113.GA23562@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=glommer@parallels.com \
    --cc=hannes@cmpxchg.org \
    --cc=htejun@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox