From: Michal Hocko <mhocko@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Greg Thelen <gthelen@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] mm/memcg: fix endless iteration in reclaim
Date: Tue, 21 Jan 2014 09:34:54 +0100 [thread overview]
Message-ID: <20140121083454.GA1894@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.1401201958330.1155@eggly.anvils>
On Mon 20-01-14 21:16:36, Hugh Dickins wrote:
> On Fri, 17 Jan 2014, Michal Hocko wrote:
> > On Thu 16-01-14 11:15:36, Hugh Dickins wrote:
> >
> > > I don't believe 19f39402864e was responsible for a reference leak,
> > > that came later. But I think it was responsible for the original
> > > endless iteration (shrink_zone going around and around getting root
> > > again and again from mem_cgroup_iter).
> >
> > So your hang is not within mem_cgroup_iter but you are getting root all
> > the time without any way out?
>
> In the 3.10 and 3.11 cases, yes.
OK, that makes sense.
> > [3.10 code base]
> > shrink_zone
> > [rmdir root]
> > mem_cgroup_iter(root, NULL, reclaim)
> > // prev = NULL
> > rcu_read_lock()
> > last_visited = iter->last_visited // gets root || NULL
> > css_tryget(last_visited) // failed
> > last_visited = NULL [1]
> > memcg = root = __mem_cgroup_iter_next(root, NULL)
> > iter->last_visited = root;
> > reclaim->generation = iter->generation
> >
> > mem_cgroup_iter(root, root, reclaim)
> > // prev = root
> > rcu_read_lock
> > last_visited = iter->last_visited // gets root
> > css_tryget(last_visited) // failed
> > [1]
> >
> > So we indeed can loop here without any progress. I just fail
> > to see how my patch could help. We even do not get down to
> > cgroup_next_descendant_pre.
> >
> > Or am I missing something?
>
> Your patch to 3.12 and 3.13 mem_cgroup_iter_next() doesn't help
> in 3.10 and 3.11, correct. That's why I appended a different patch,
> to mem_cgroup_iter(), for the 3.10 and 3.11 versions of the hang.
>
> >
> > The following should fix this kind of endless loop:
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 194721839cf5..168e5abcca92 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1221,7 +1221,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> > smp_rmb();
> > last_visited = iter->last_visited;
> > if (last_visited &&
> > - !css_tryget(&last_visited->css))
> > + last_visited != root &&
> > + !css_tryget(&last_visited->css))
> > last_visited = NULL;
> > }
> > }
> > @@ -1229,7 +1230,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> > memcg = __mem_cgroup_iter_next(root, last_visited);
> >
> > if (reclaim) {
> > - if (last_visited)
> > + if (last_visited && last_visited != root)
> > css_put(&last_visited->css);
> >
> > iter->last_visited = memcg;
>
> Right, that appears to fix 3.10, and seems a better alternative to the
> patch I suggested. I say "appears" because my success in reproducing
> the hang is variable, so when I see that it's "fixed" I cannot be
> quite sure.
Understood.
> I say "seems" because I think yours respects the intention
> of the iterator better than mine, but I've never been convinced that
> the iterator is as sensible as it intends in the face of races.
>
> At the bottom I've appended the version of yours that I've been trying
> on 3.11. I did succeed in reproducing the hang twice on 3.11.10.3
> (which I don't think differs in any essential from 3.11.0 for this issue,
> but after my lack of success with 3.11.0 I tried harder with that.)
git log points only at 3 patches in mm/memcontrol.c and all of them seem
unrelated.
> More so than in the 3.10 case, I haven't really given it long enough
> with the patch to really assert that it's good; and Greg Thelen came
> across a different reproduction case that I've yet to remind myself
> of and try, I'll have to report back to you later in the week when
> I've run that with your fix.
Great, thanks a lot for your testing. It is really appreciated
especially now that I am quite busy with other internal stuff.
> > Not that I like it much :/
>
> Well, I'm not in love with it, but I do think it's more appropriate
> than mine, if it really does fix the issues.
It fixes a potential endless loop. It is a question it is the one you
are seeing.
> It was only under questioning from you that we arrived at the belief
> that the problem is with the css_tryget of a root being removed: my
> patch was vaguer than that, not identifying the root cause.
>
> I suspect that the underlying problem is actually the "do {} while ()"
> nature of the iteration loops, instead of "while () {}"s.
I think the outside caller shouldn't care much. The iterator code has to
make sure that it doesn't loop itself. Doing while () {} has some issues
as well. Having a reason to reclaim but hen do not reclaim anything
might pop out as an issue upper in the calling stack.
> That places us (not for the first time) in the awkward position of
> having to supply something once (and once only) even when it doesn't
> really fit.
>
> (I have wondered whether making mem_cgroup_invalidate_reclaim_iterators
> visit the memcg as well as its parents, might provide another fix; nice
> if it did, but I doubt it, and have spent so much time fiddling around
> here that I've lost the will to try anything else.)
I do not see it as an easier alternative.
[...]
> > > > Cc: stable@vger.kernel.org # 3.10+
> > >
> > > Well, I'm okay with that, if we use that as a way to shoehorn in the
> > > patch at the bottom instead for 3.10 and 3.11 stables.
> >
> > So far I do not see how it would make a change for those two kernels as
> > they have the special handling for root.
>
> That was my point: that patch does not fix 3.10 and 3.11 at all,
> but they suffer from the same problem (manifesting in a slightly
> different way, the hang revisiting mem_cgroup_iter repeatedly instead
> of being trapped inside it); so it may not be inappropriate to say 3.10+
> even though that particular patch will not apply and would not fix them.
OK, understood now. I will repost that patch with updated changelog
later.
> > [...]
> > > "Equivalent" patch for 3.10 or 3.11: fixing similar hangs but no leakage.
> > >
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > >
> > > --- v3.10/mm/memcontrol.c 2013-06-30 15:13:29.000000000 -0700
> > > +++ linux/mm/memcontrol.c 2014-01-15 18:18:24.476566659 -0800
> > > @@ -1226,7 +1226,8 @@ struct mem_cgroup *mem_cgroup_iter(struc
> > > }
> > > }
> > >
> > > - memcg = __mem_cgroup_iter_next(root, last_visited);
> > > + if (!prev || last_visited)
> > > + memcg = __mem_cgroup_iter_next(root, last_visited);
> >
> > I am confused. What would change between those two calls to change the
> > outcome? The function doesn't have any internal state.
>
> I don't understand your question (what two calls?).
OK, it was my selective blindness that stroke again here. Sorry about
the confusion.
With fresh eyes. Yes it would work as well.
> The 3.10 or 3.11
> __mem_cgroup_iter_next() begins with "if (!last_visited) return root;",
> which was problematic because again and again it would return root.
> Originally I passed in prev, and returned NULL instead of root if prev
> but !last_visited; but I've an aversion to passing a function an extra
> argument to say it shouldn't have been called, so in this version I'm
> testing !prev || last_visited before calling it. Perhaps your "two
> calls" are the first with prev == NULL and the second with prev == root.
>
> But I say I prefer your fix because mine above says nothing about root,
> which we now believe is the only problematic case. Mine would leave
> memcg NULL whenever a change resets last_visited to NULL (once one memcg
> has been delivered): which is simple, but not what the iterator intends
> (if I read it right, it wants to start again from the beginning, whereas
> I'm hastening it to the end). In practice mine works well, and I haven't
> seen the premature OOMs that you might suppose it leads to; but let's go
> for yours as more in keeping with the spirit of the iterator.
OK, let's keep it consistently ugly.
> "The spirit of the iterator", now that's a fine phrase.
:)
> Here's my 3.11 version of your 3.10, in case you spot something silly.
> I'll give it a try on Greg's testcase in coming days and report back.
> (Greg did suggest a different fix from mine back when he hit the issue,
> I'll also look that one out again in case it offers something better.)
>
> --- v3.11/mm/memcontrol.c 2014-01-19 14:16:38.656701990 -0800
> +++ linux/mm/memcontrol.c 2014-01-20 19:04:50.635637615 -0800
> @@ -1148,19 +1148,17 @@ mem_cgroup_iter_load(struct mem_cgroup_r
> if (iter->last_dead_count == *sequence) {
> smp_rmb();
> position = iter->last_visited;
> - if (position && !css_tryget(&position->css))
> + if (position && position != root &&
> + !css_tryget(&position->css))
> position = NULL;
> }
> return position;
> }
>
> static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter,
> - struct mem_cgroup *last_visited,
> struct mem_cgroup *new_position,
> int sequence)
> {
> - if (last_visited)
> - css_put(&last_visited->css);
> /*
> * We store the sequence count from the time @last_visited was
> * loaded successfully instead of rereading it here so that we
> @@ -1234,7 +1232,10 @@ struct mem_cgroup *mem_cgroup_iter(struc
> memcg = __mem_cgroup_iter_next(root, last_visited);
>
> if (reclaim) {
> - mem_cgroup_iter_update(iter, last_visited, memcg, seq);
> + if (last_visited && last_visited != root)
> + css_put(&last_visited->css);
> +
> + mem_cgroup_iter_update(iter, memcg, seq);
>
> if (!memcg)
> iter->generation++;
Yes it looks good. Although I would probably go and add root into
mem_cgroup_iter_update and do the check and css_put there to have
it symmetric with mem_cgroup_iter_load. I will cook up a changelog for
this one as well (for both 3.10 and 3.11 because they share fail on root
case).
Thanks a lot!
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-01-21 8:34 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-14 1:50 [PATCH 1/3] mm/memcg: fix last_dead_count memory wastage Hugh Dickins
2014-01-14 1:52 ` [PATCH 2/3] mm/memcg: fix endless iteration in reclaim Hugh Dickins
2014-01-14 13:27 ` Michal Hocko
2014-01-14 13:34 ` Michal Hocko
2014-01-14 14:26 ` Michal Hocko
2014-01-14 20:42 ` Hugh Dickins
2014-01-15 9:58 ` Michal Hocko
2014-01-15 12:17 ` Michal Hocko
2014-01-15 21:24 ` Hugh Dickins
2014-01-16 8:17 ` Michal Hocko
2014-01-16 15:22 ` Michal Hocko
2014-01-16 19:15 ` Hugh Dickins
2014-01-17 15:41 ` Michal Hocko
2014-01-21 5:16 ` Hugh Dickins
2014-01-21 8:34 ` Michal Hocko [this message]
2014-01-21 10:45 ` [PATCH -mm 1/2] memcg: fix endless loop caused by mem_cgroup_iter Michal Hocko
2014-01-21 10:45 ` [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter Michal Hocko
2014-01-21 19:42 ` Andrew Morton
2014-01-21 21:18 ` Hugh Dickins
2014-01-22 8:27 ` Michal Hocko
2014-01-23 10:42 ` Hugh Dickins
2014-01-23 11:09 ` Michal Hocko
2014-01-23 12:53 ` Hugh Dickins
2014-01-22 8:12 ` Michal Hocko
2014-01-14 1:54 ` [PATCH 3/3] mm/memcg: iteration skip memcgs not yet fully initialized Hugh Dickins
2014-01-14 13:30 ` Michal Hocko
2014-01-14 14:29 ` Tejun Heo
2014-01-15 8:20 ` Michal Hocko
2014-01-15 8:21 ` Michal Hocko
2014-01-14 13:03 ` [PATCH 1/3] mm/memcg: fix last_dead_count memory wastage Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140121083454.GA1894@dhcp22.suse.cz \
--to=mhocko@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox