From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [PATCH -v2 4/6] memcg: make sure that memcg is not offline when charging
Date: Wed, 5 Feb 2014 10:28:21 -0500
Message-ID: <20140205152821.GY6963@cmpxchg.org>
In-Reply-To: <20140205133834.GB2425@dhcp22.suse.cz>

On Wed, Feb 05, 2014 at 02:38:34PM +0100, Michal Hocko wrote:
> On Tue 04-02-14 11:29:39, Johannes Weiner wrote:
> [...]
> > Maybe we should remove the XXX if it makes you think we should change
> > the current situation by any means necessary. This patch is not an
> > improvement.
> >
> > I put the XXX there so that we one day maybe refactor the code in a
> > clean fashion where try_get_mem_cgroup_from_whatever() is in the same
> > rcu section as the first charge attempt. On failure, reclaim, and do
> > the lookup again.
>
> I wouldn't be opposed to such a cleanup. It is not that simple, though.
>
> > Also, this problem only exists on swapin, where the memcg is looked up
> > from an auxiliary data structure and not the current task, so maybe
> > that would be an angle to look for a clean solution.
>
> I am not so sure about that. Task could have been moved to a different
> group basically anytime it was outside of rcu_read_lock section (which
> means most of the time). And so the group might get removed and we are
> in the very same situation.
>
> > Either way, the problem is currently fixed
>
> OK, my understanding (and my ack was based on that) was that we needed
> a simple and safe fix for the stable trees and we would have something
> more appropriate later on. Preventing the race sounds like a more
> appropriate and better technical solution to me. So I would rather ask
> why we should keep a workaround in place. Does it add any risk?
> Especially when we basically abuse the 2 stage cgroup removal. All the
> charges should be cleared out after css_offline.

I thought more about this and talked to Tejun as well. He told me
that the rcu grace period between disabling tryget and calling
css_offline() is currently an implementation detail of the refcounter
that css uses, but it's not a guarantee. So my initial idea of
reworking memcg to do css_tryget() and res_counter_charge() in the
same rcu section is no longer enough to synchronize against offlining.
We can forget about that.

On the other hand, memcg holds a css reference only while an actual
controller reference is being established (res_counter_charge), then
drops it. This means that once css_tryget() is disabled, we only need
to wait for the css refcounter to hit 0 to know for sure that no new
charges can show up and reparent_charges() is safe to run, right?
Well, css_free() is the callback invoked when the ref counter hits 0,
and that is a guarantee. From a memcg perspective, it's the right
place to do reparenting, not css_offline().

Here is the only exception to the above: swapout records maintain
permanent css references, so they prevent css_free() from running.
For that reason alone we should run one optimistic reparenting in
css_offline() to make sure one swap record does not pin gigabytes of
pages in an offlined cgroup, which is unreachable for reclaim. But
the reparenting for *correctness* is in css_free(), not css_offline().
We should be changing the comments. The code is already correct.
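
To make the ordering concrete, here is a toy userspace model of the
lifetime rules described above. It is only a sketch under stated
assumptions: every toy_* name is made up for illustration and none of
this is kernel API, the refcounter is a plain int with no concurrency,
and the racing charger is simulated by taking its reference before
offlining and charging afterwards.

```c
#include <stdbool.h>

/* Toy stand-in for a css plus its memcg state. Hypothetical, NOT kernel code. */
struct toy_css {
	int refs;        /* base ref + transient charge refs + swap record refs */
	bool tryget_ok;  /* cleared once offlining starts */
	long charges;    /* pages currently charged to this group */
	long reparented; /* pages moved to the parent so far */
	bool freed;      /* set when the "css_free" step has run */
};

/* Move whatever is charged right now to the parent. */
static void toy_reparent(struct toy_css *css)
{
	css->reparented += css->charges;
	css->charges = 0;
}

/* Models css_tryget(): fails once offlining has disabled it. */
static bool toy_tryget(struct toy_css *css)
{
	if (!css->tryget_ok)
		return false;
	css->refs++;
	return true;
}

/* Models css_free(): runs only when the refcount hits 0. */
static void toy_free(struct toy_css *css)
{
	toy_reparent(css); /* the reparenting that matters for correctness */
	css->freed = true;
}

static void toy_put(struct toy_css *css)
{
	if (--css->refs == 0)
		toy_free(css);
}

/* A charge holds a reference only while the charge is being established. */
static bool toy_charge(struct toy_css *css, long pages)
{
	if (!toy_tryget(css))
		return false;
	css->charges += pages;
	toy_put(css);
	return true;
}

/* Models css_offline(): disable tryget, reparent optimistically, drop base ref. */
static void toy_offline(struct toy_css *css)
{
	css->tryget_ok = false;
	toy_reparent(css);
	toy_put(css);
}

/* Replays the race and returns the final state for inspection. */
static struct toy_css toy_race_demo(void)
{
	struct toy_css css = { .refs = 1, .tryget_ok = true };

	toy_charge(&css, 100); /* ordinary charge before offlining */
	toy_tryget(&css);      /* a swap record pins the css */
	toy_tryget(&css);      /* a racing charger got its ref in time */
	toy_offline(&css);     /* optimistic pass moves the 100 pages */
	css.charges += 8;      /* the racing charge lands after offline */
	toy_put(&css);         /* charger drops its ref, swap record still pins */
	toy_put(&css);         /* swap record goes away: refs hit 0, free runs */
	toy_charge(&css, 1);   /* refused, tryget is disabled */
	return css;
}
```

In this model the optimistic pass in toy_offline() moves the bulk of
the pages early, while the 8 pages charged by the racing task are only
picked up by the pass in toy_free(), which cannot run before the last
reference is gone.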

> > Unless the alternative solution is inherent in a clean rework of the
> > code to match cgroup core lifetime management, I don't see any reason
> > to move away from the status quo.
>
> To be honest this sounds like weak reasoning for refusing a real fix
> which replaces a workaround.
>
> This is a second attempt to fix the actual race that you are dismissing
> which is really surprising to me. Especially when the workaround is an
> ugly hack.

IMO it was always functionally correct, just something that could have
been done cleaner from a design POV. That's why I refused every
alternative solution that made the code worse instead of better.

But it looks like the current code also makes perfect sense from a
design POV, so
it's all moot now.