linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>,
	bsingharora@gmail.com, cgroups@vger.kernel.org,
	linux-mm@kvack.org, lizefan@huawei.com
Subject: Re: [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter()
Date: Tue, 4 Jun 2013 09:58:40 -0400	[thread overview]
Message-ID: <20130604135840.GN15576@cmpxchg.org> (raw)
In-Reply-To: <20130604130336.GE31242@dhcp22.suse.cz>

On Tue, Jun 04, 2013 at 03:03:36PM +0200, Michal Hocko wrote:
> On Mon 03-06-13 17:44:37, Tejun Heo wrote:
> [...]
> > @@ -1218,9 +1218,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> >  			 * is alive.
> >  			 */
> >  			dead_count = atomic_read(&root->dead_count);
> > -			smp_rmb();
> > +
> >  			last_visited = iter->last_visited;
> >  			if (last_visited) {
> > +				/*
> > +				 * Paired with smp_wmb() below in this
> > +				 * function.  The pair guarantee that
> > +				 * last_visited is more current than
> > +				 * last_dead_count, which may lead to
> > +				 * spurious iteration resets but guarantees
> > +				 * reliable detection of dead condition.
> > +				 */
> > +				smp_rmb();
> >  				if ((dead_count != iter->last_dead_count) ||
> >  					!css_tryget(&last_visited->css)) {
> >  					last_visited = NULL;
> 
> I originally had the barrier this way but Johannes pointed out it is not
> correct https://lkml.org/lkml/2013/2/11/411
> "
> !> +			/*
> !> +			 * last_visited might be invalid if some of the group
> !> +			 * downwards was removed. As we do not know which one
> !> +			 * disappeared we have to start all over again from the
> !> +			 * root.
> !> +			 * css ref count then makes sure that css won't
> !> +			 * disappear while we iterate to the next memcg
> !> +			 */
> !> +			last_visited = iter->last_visited;
> !> +			dead_count = atomic_read(&root->dead_count);
> !> +			smp_rmb();
> !
> !Confused about this barrier, see below.
> !
> !As per above, if you remove the iter lock, those lines are mixed up.
> !You need to read the dead count first because the writer updates the
> !dead count after it sets the new position.  That way, if the dead
> !count gives the go-ahead, you KNOW that the position cache is valid,
> !because it has been updated first.  If either the two reads or the two
> !writes get reordered, you risk seeing a matching dead count while the
> !position cache is stale.
> "

The original prototype code I sent looked like this:

mem_cgroup_iter:
rcu_read_lock()
if atomic_read(&root->dead_count) == iter->dead_count:
  smp_rmb()
  if tryget(iter->position):
    position = iter->position
memcg = find_next(postion)
css_put(position)
iter->position = memcg
smp_wmb() /* Write position cache BEFORE marking it uptodate */
iter->dead_count = atomic_read(&root->dead_count)
rcu_read_unlock()

iter->last_position is written, THEN iter->last_dead_count is written.

So, yes, you "need to read the dead count" first to be sure
iter->last_position is uptodate.  But iter->last_dead_count, not
root->dead_count.  I should have caught this in the final submission
of your patch :(

Tejun's patch is not correct, either.  Something like this?

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 010d6c1..92830fa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 
 			mz = mem_cgroup_zoneinfo(root, nid, zid);
 			iter = &mz->reclaim_iter[reclaim->priority];
-			last_visited = iter->last_visited;
 			if (prev && reclaim->generation != iter->generation) {
 				iter->last_visited = NULL;
 				goto out_unlock;
@@ -1217,14 +1216,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 			 * css_tryget() verifies the cgroup pointed to
 			 * is alive.
 			 */
+			last_visited = NULL;
 			dead_count = atomic_read(&root->dead_count);
-			smp_rmb();
-			last_visited = iter->last_visited;
-			if (last_visited) {
-				if ((dead_count != iter->last_dead_count) ||
-					!css_tryget(&last_visited->css)) {
+			if (dead_count == iter->last_dead_count) {
+				/*
+				 * The writer below sets the position
+				 * pointer, then the dead count.
+				 * Ensure we read the updated position
+				 * when the dead count matches.
+				 */
+				smp_rmb();
+				last_visited = iter->last_visited;
+				if (last_visited &&
+				    !css_tryget(&last_visited->css))
 					last_visited = NULL;
-				}
 			}
 		}
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-06-04 13:59 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-04  0:44 [PATCHSET] memcg: fix and reimplement iterator Tejun Heo
2013-06-04  0:44 ` [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter() Tejun Heo
2013-06-04 13:03   ` Michal Hocko
2013-06-04 13:58     ` Johannes Weiner [this message]
2013-06-04 15:29       ` Michal Hocko
2013-06-04  0:44 ` [PATCH 2/3] memcg: restructure mem_cgroup_iter() Tejun Heo
2013-06-04 13:21   ` Michal Hocko
2013-06-04 20:51     ` Tejun Heo
2013-06-04  0:44 ` [PATCH 3/3] memcg: simplify mem_cgroup_reclaim_iter Tejun Heo
2013-06-04 13:18   ` Michal Hocko
2013-06-04 20:50     ` Tejun Heo
2013-06-04 21:28       ` Michal Hocko
2013-06-04 21:55         ` Tejun Heo
2013-06-05  7:30           ` Michal Hocko
2013-06-05  8:20             ` Tejun Heo
2013-06-05  8:36               ` Michal Hocko
2013-06-05  8:44                 ` Tejun Heo
2013-06-05  8:55                   ` Michal Hocko
2013-06-05  9:03                     ` Tejun Heo
2013-06-05 14:39               ` Johannes Weiner
2013-06-05 14:50                 ` Johannes Weiner
2013-06-05 14:56                 ` Michal Hocko
2013-06-05 17:22                 ` Tejun Heo
2013-06-05 19:45                   ` Johannes Weiner
2013-06-05 20:06                     ` Tejun Heo
2013-06-05 21:17                       ` Johannes Weiner
2013-06-05 22:20                         ` Tejun Heo
2013-06-05 22:27                           ` Tejun Heo
2013-06-06 11:50                             ` Michal Hocko
2013-06-07  0:52                               ` Tejun Heo
2013-06-07  7:37                                 ` Michal Hocko
2013-06-07 23:25                                   ` Tejun Heo
2013-06-10  8:02                                     ` Michal Hocko
2013-06-10 19:54                                       ` Tejun Heo
2013-06-10 20:48                                         ` Michal Hocko
2013-06-10 23:13                                           ` Tejun Heo
2013-06-11  7:27                                             ` Michal Hocko
2013-06-11  7:44                                               ` Tejun Heo
2013-06-11  7:55                                                 ` Michal Hocko
2013-06-11  8:00                                                   ` Tejun Heo
2013-06-04 21:40       ` Johannes Weiner
2013-06-04 21:49         ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130604135840.GN15576@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox