From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>,
bsingharora@gmail.com, cgroups@vger.kernel.org,
linux-mm@kvack.org, lizefan@huawei.com
Subject: Re: [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter()
Date: Tue, 4 Jun 2013 17:29:59 +0200 [thread overview]
Message-ID: <20130604152959.GB6356@dhcp22.suse.cz> (raw)
In-Reply-To: <20130604135840.GN15576@cmpxchg.org>
On Tue 04-06-13 09:58:40, Johannes Weiner wrote:
> On Tue, Jun 04, 2013 at 03:03:36PM +0200, Michal Hocko wrote:
> > On Mon 03-06-13 17:44:37, Tejun Heo wrote:
> > [...]
> > > @@ -1218,9 +1218,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> > > * is alive.
> > > */
> > > dead_count = atomic_read(&root->dead_count);
> > > - smp_rmb();
> > > +
> > > last_visited = iter->last_visited;
> > > if (last_visited) {
> > > + /*
> > > + * Paired with smp_wmb() below in this
> > > + * function. The pair guarantee that
> > > + * last_visited is more current than
> > > + * last_dead_count, which may lead to
> > > + * spurious iteration resets but guarantees
> > > + * reliable detection of dead condition.
> > > + */
> > > + smp_rmb();
> > > if ((dead_count != iter->last_dead_count) ||
> > > !css_tryget(&last_visited->css)) {
> > > last_visited = NULL;
> >
> > I originally had the barrier this way but Johannes pointed out it is not
> > correct https://lkml.org/lkml/2013/2/11/411
> > "
> > !> + /*
> > !> + * last_visited might be invalid if some of the group
> > !> + * downwards was removed. As we do not know which one
> > !> + * disappeared we have to start all over again from the
> > !> + * root.
> > !> + * css ref count then makes sure that css won't
> > !> + * disappear while we iterate to the next memcg
> > !> + */
> > !> + last_visited = iter->last_visited;
> > !> + dead_count = atomic_read(&root->dead_count);
> > !> + smp_rmb();
> > !
> > !Confused about this barrier, see below.
> > !
> > !As per above, if you remove the iter lock, those lines are mixed up.
> > !You need to read the dead count first because the writer updates the
> > !dead count after it sets the new position. That way, if the dead
> > !count gives the go-ahead, you KNOW that the position cache is valid,
> > !because it has been updated first. If either the two reads or the two
> > !writes get reordered, you risk seeing a matching dead count while the
> > !position cache is stale.
> > "
>
> The original prototype code I sent looked like this:
>
> mem_cgroup_iter:
> rcu_read_lock()
> if atomic_read(&root->dead_count) == iter->dead_count:
> smp_rmb()
> if tryget(iter->position):
> position = iter->position
> memcg = find_next(postion)
> css_put(position)
> iter->position = memcg
> smp_wmb() /* Write position cache BEFORE marking it uptodate */
> iter->dead_count = atomic_read(&root->dead_count)
> rcu_read_unlock()
>
> iter->last_position is written, THEN iter->last_dead_count is written.
>
> So, yes, you "need to read the dead count" first to be sure
> iter->last_position is uptodate. But iter->last_dead_count, not
> root->dead_count. I should have caught this in the final submission
> of your patch :(
OK, right you are. I managed to confuse myself by the three dependencies
here. dead_count -> last_visited -> last_dead_count. The first one is
invalid because last_visited doesn't care about dead_count and that
makes it much more clear now.
> Tejun's patch is not correct, either. Something like this?
Yes this looks saner and correct. Care to send a full patch?
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 010d6c1..92830fa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>
> mz = mem_cgroup_zoneinfo(root, nid, zid);
> iter = &mz->reclaim_iter[reclaim->priority];
> - last_visited = iter->last_visited;
> if (prev && reclaim->generation != iter->generation) {
> iter->last_visited = NULL;
> goto out_unlock;
> @@ -1217,14 +1216,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> * css_tryget() verifies the cgroup pointed to
> * is alive.
> */
> + last_visited = NULL;
> dead_count = atomic_read(&root->dead_count);
> - smp_rmb();
> - last_visited = iter->last_visited;
> - if (last_visited) {
> - if ((dead_count != iter->last_dead_count) ||
> - !css_tryget(&last_visited->css)) {
> + if (dead_count == iter->last_dead_count) {
> + /*
> + * The writer below sets the position
> + * pointer, then the dead count.
> + * Ensure we read the updated position
> + * when the dead count matches.
> + */
> + smp_rmb();
> + last_visited = iter->last_visited;
> + if (last_visited &&
> + !css_tryget(&last_visited->css))
> last_visited = NULL;
> - }
> }
> }
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-06-04 15:30 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-04 0:44 [PATCHSET] memcg: fix and reimplement iterator Tejun Heo
2013-06-04 0:44 ` [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter() Tejun Heo
2013-06-04 13:03 ` Michal Hocko
2013-06-04 13:58 ` Johannes Weiner
2013-06-04 15:29 ` Michal Hocko [this message]
2013-06-04 0:44 ` [PATCH 2/3] memcg: restructure mem_cgroup_iter() Tejun Heo
2013-06-04 13:21 ` Michal Hocko
2013-06-04 20:51 ` Tejun Heo
2013-06-04 0:44 ` [PATCH 3/3] memcg: simplify mem_cgroup_reclaim_iter Tejun Heo
2013-06-04 13:18 ` Michal Hocko
2013-06-04 20:50 ` Tejun Heo
2013-06-04 21:28 ` Michal Hocko
2013-06-04 21:55 ` Tejun Heo
2013-06-05 7:30 ` Michal Hocko
2013-06-05 8:20 ` Tejun Heo
2013-06-05 8:36 ` Michal Hocko
2013-06-05 8:44 ` Tejun Heo
2013-06-05 8:55 ` Michal Hocko
2013-06-05 9:03 ` Tejun Heo
2013-06-05 14:39 ` Johannes Weiner
2013-06-05 14:50 ` Johannes Weiner
2013-06-05 14:56 ` Michal Hocko
2013-06-05 17:22 ` Tejun Heo
2013-06-05 19:45 ` Johannes Weiner
2013-06-05 20:06 ` Tejun Heo
2013-06-05 21:17 ` Johannes Weiner
2013-06-05 22:20 ` Tejun Heo
2013-06-05 22:27 ` Tejun Heo
2013-06-06 11:50 ` Michal Hocko
2013-06-07 0:52 ` Tejun Heo
2013-06-07 7:37 ` Michal Hocko
2013-06-07 23:25 ` Tejun Heo
2013-06-10 8:02 ` Michal Hocko
2013-06-10 19:54 ` Tejun Heo
2013-06-10 20:48 ` Michal Hocko
2013-06-10 23:13 ` Tejun Heo
2013-06-11 7:27 ` Michal Hocko
2013-06-11 7:44 ` Tejun Heo
2013-06-11 7:55 ` Michal Hocko
2013-06-11 8:00 ` Tejun Heo
2013-06-04 21:40 ` Johannes Weiner
2013-06-04 21:49 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130604152959.GB6356@dhcp22.suse.cz \
--to=mhocko@suse.cz \
--cc=bsingharora@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=lizefan@huawei.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox