From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, Oleg Nesterov <oleg@redhat.com>,
Tejun Heo <tj@kernel.org>,
Vladimir Davydov <vdavydov@parallels.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 3/3] memcg: get rid of mm_struct::owner
Date: Tue, 26 May 2015 10:10:11 -0400 [thread overview]
Message-ID: <20150526141011.GA11065@cmpxchg.org> (raw)
In-Reply-To: <1432641006-8025-4-git-send-email-mhocko@suse.cz>
On Tue, May 26, 2015 at 01:50:06PM +0200, Michal Hocko wrote:
> Please note that this patch introduces a USER VISIBLE CHANGE OF BEHAVIOR.
> Without mm->owner _all_ tasks associated with the mm_struct would
> initiate memcg migration while previously only owner of the mm_struct
> could do that. The original behavior was awkward though because the user
> task didn't have any means to find out the current owner (esp. after
> mm_update_next_owner) so the migration behavior was not well defined
> in general.
> New cgroup API (unified hierarchy) will discontinue tasks file which
> means that migrating threads will no longer be possible. In such a case
> having CLONE_VM without CLONE_THREAD could emulate the thread behavior
> but this patch prevents from isolating memcg controllers from others.
> Nevertheless I am not convinced such a use case would really deserve
> complications on the memcg code side.
I think such a change is okay. The memcg semantics of moving threads
with the same mm into separate groups have always been arbitrary. No
reasonable behavior can be expected of this, so what sane real life
usecase would rely on it?
> @@ -104,7 +105,12 @@ static inline bool mm_match_cgroup(struct mm_struct *mm,
> bool match = false;
>
> rcu_read_lock();
> - task_memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> + /*
> + * rcu_dereference would be better but mem_cgroup is not a complete
> + * type here
> + */
> + task_memcg = READ_ONCE(mm->memcg);
> + smp_read_barrier_depends();
> if (task_memcg)
> match = mem_cgroup_is_descendant(task_memcg, memcg);
> rcu_read_unlock();
This function has only one user in rmap. If you inline it there, you
can use rcu_dereference() and get rid of the specialness & comment.
> @@ -195,6 +201,10 @@ void mem_cgroup_split_huge_fixup(struct page *head);
> #else /* CONFIG_MEMCG */
> struct mem_cgroup;
>
> +void mm_drop_memcg(struct mm_struct *mm)
> +{}
> +void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> +{}
static inline?
> @@ -292,94 +292,6 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent)
> }
> }
>
> -#ifdef CONFIG_MEMCG
> -/*
> - * A task is exiting. If it owned this mm, find a new owner for the mm.
> - */
> -void mm_update_next_owner(struct mm_struct *mm)
> -{
> - struct task_struct *c, *g, *p = current;
> -
> -retry:
> - /*
> - * If the exiting or execing task is not the owner, it's
> - * someone else's problem.
> - */
> - if (mm->owner != p)
> - return;
> - /*
> - * The current owner is exiting/execing and there are no other
> - * candidates. Do not leave the mm pointing to a possibly
> - * freed task structure.
> - */
> - if (atomic_read(&mm->mm_users) <= 1) {
> - mm->owner = NULL;
> - return;
> - }
> -
> - read_lock(&tasklist_lock);
> - /*
> - * Search in the children
> - */
> - list_for_each_entry(c, &p->children, sibling) {
> - if (c->mm == mm)
> - goto assign_new_owner;
> - }
> -
> - /*
> - * Search in the siblings
> - */
> - list_for_each_entry(c, &p->real_parent->children, sibling) {
> - if (c->mm == mm)
> - goto assign_new_owner;
> - }
> -
> - /*
> - * Search through everything else, we should not get here often.
> - */
> - for_each_process(g) {
> - if (g->flags & PF_KTHREAD)
> - continue;
> - for_each_thread(g, c) {
> - if (c->mm == mm)
> - goto assign_new_owner;
> - if (c->mm)
> - break;
> - }
> - }
> - read_unlock(&tasklist_lock);
> - /*
> - * We found no owner yet mm_users > 1: this implies that we are
> - * most likely racing with swapoff (try_to_unuse()) or /proc or
> - * ptrace or page migration (get_task_mm()). Mark owner as NULL.
> - */
> - mm->owner = NULL;
> - return;
> -
> -assign_new_owner:
> - BUG_ON(c == p);
> - get_task_struct(c);
> - /*
> - * The task_lock protects c->mm from changing.
> - * We always want mm->owner->mm == mm
> - */
> - task_lock(c);
> - /*
> - * Delay read_unlock() till we have the task_lock()
> - * to ensure that c does not slip away underneath us
> - */
> - read_unlock(&tasklist_lock);
> - if (c->mm != mm) {
> - task_unlock(c);
> - put_task_struct(c);
> - goto retry;
> - }
> - mm->owner = c;
> - task_unlock(c);
> - put_task_struct(c);
> -}
> -#endif /* CONFIG_MEMCG */
w00t!
> @@ -469,6 +469,46 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> return mem_cgroup_from_css(css);
> }
>
> +static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
> +{
> + if (!p->mm)
> + return NULL;
> + return rcu_dereference(p->mm->memcg);
> +}
> +
> +void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> +{
> + if (memcg)
> + css_get(&memcg->css);
> + rcu_assign_pointer(mm->memcg, memcg);
> +}
> +
> +void mm_drop_memcg(struct mm_struct *mm)
> +{
> + /*
> + * This is the last reference to mm so nobody can see
> + * this memcg
> + */
> + if (mm->memcg)
> + css_put(&mm->memcg->css);
> +}
This is really simple and obvious and has only one caller, it would be
better to inline this into mmput(). The comment would also be easier
to understand in conjunction with the mmdrop() in the callsite:
if (mm->memcg)
css_put(&mm->memcg->css);
/* We could reset mm->memcg, but this will free the mm: */
mmdrop(mm);
The same goes for mm_set_memcg, there is no real need for obscuring a
simple get-and-store.
> +static void mm_move_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> +{
> + struct mem_cgroup *old_memcg;
> +
> + mm_set_memcg(mm, memcg);
> +
> + /*
> + * wait for all current users of the old memcg before we
> + * release the reference.
> + */
> + old_memcg = mm->memcg;
> + synchronize_rcu();
> + if (old_memcg)
> + css_put(&old_memcg->css);
> +}
I'm not sure why we need that synchronize_rcu() in here, the css is
itself protected by RCU and a failing tryget will prevent you from
taking it outside a RCU-locked region.
Aside from that, there is again exactly one place that performs this
operation. Please inline it into mem_cgroup_move_task().
> @@ -5204,6 +5251,12 @@ static void mem_cgroup_move_task(struct cgroup_subsys_state *css,
> struct mm_struct *mm = get_task_mm(p);
>
> if (mm) {
> + /*
> + * Commit to a new memcg. mc.to points to the destination
> + * memcg even when the current charges are not moved.
> + */
> + mm_move_memcg(mm, mc.to);
> +
> if (mc_move_charge())
> mem_cgroup_move_charge(mm);
> mmput(mm);
It's a little weird to use mc.to when not moving charges, as "mc"
stands for "move charge". Why not derive the destination from @css,
just like can_attach does? It's a mere cast. That also makes patch
#2 in your series unnecessary.
Otherwise, the patch looks great to me.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-05-26 14:10 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-26 11:50 [RFC 0/3] " Michal Hocko
2015-05-26 11:50 ` [RFC 1/3] memcg: restructure mem_cgroup_can_attach() Michal Hocko
2015-05-26 11:50 ` [RFC 2/3] memcg: Use mc.moving_task as the indication for charge moving Michal Hocko
2015-05-26 11:50 ` [RFC 3/3] memcg: get rid of mm_struct::owner Michal Hocko
2015-05-26 14:10 ` Johannes Weiner [this message]
2015-05-26 15:11 ` Michal Hocko
2015-05-26 17:20 ` Johannes Weiner
2015-05-27 14:48 ` Michal Hocko
2015-05-28 21:07 ` Tejun Heo
2015-05-29 12:08 ` Michal Hocko
2015-05-29 13:10 ` Tejun Heo
2015-05-29 13:45 ` Michal Hocko
2015-05-29 14:07 ` Tejun Heo
2015-05-29 14:57 ` Michal Hocko
2015-05-29 15:23 ` Tejun Heo
2015-05-29 15:26 ` Michal Hocko
2015-05-26 16:36 ` Oleg Nesterov
2015-05-26 17:22 ` Michal Hocko
2015-05-26 17:38 ` Oleg Nesterov
2015-05-27 9:43 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150526141011.GA11065@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=oleg@redhat.com \
--cc=tj@kernel.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox