Date: Fri, 28 Mar 2008 20:22:17 +0530
Message-ID: <47ED0621.4050304@linux.vnet.ibm.com>
From: Balbir Singh
Reply-To: balbir@linux.vnet.ibm.com
Subject: Re: [-mm] Add an owner to the mm_struct (v2)
In-Reply-To: <6599ad830803280705o4213c448r991cbf9da6ffe2f1@mail.gmail.com>
To: Paul Menage
Cc: Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
    lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, taka@valinux.co.jp,
    linux-mm@kvack.org, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Fri, Mar 28, 2008 at 5:36 AM, Balbir Singh wrote:
>> > - you hold task_lock() for the new owner (which is necessary anyway to
>> >   ensure that the new owner's mm doesn't change while you're updating
>> >   mm->owner)
>>
>> tsk->mm should not change unless the task is exiting or when a kernel thread
>> does use_mm() (PF_BORROWED_MM).
>
> Or the task calls execve().
>
>> I see mm->owner changing when
>>
>> 1. The mm->owner exits
>> 2. At fork time for clone calls with CLONE_VM

This was supposed to be deleted, ignore it.

> Why would a clone() call with CLONE_VM change mm->owner? We're sharing
> with the existing owner, who is still using the mm, so we should leave
> it as it is.
>
> I don't see what invariant you're trying to protect with
> mm->owner_lock. Can you explain it?

mm->owner_lock is there to protect the mm->owner field from changing
simultaneously as tasks fork/exit.

>> >> @@ -1357,6 +1379,10 @@ static struct task_struct *copy_process(
>> >>  	write_unlock_irq(&tasklist_lock);
>> >>  	proc_fork_connector(p);
>> >>  	cgroup_post_fork(p);
>> >> +
>> >> +	if (!(clone_flags & CLONE_VM) && (p != p->group_leader))
>> >> +		mm_fork_init_owner(p);
>> >> +
>> >
>> > I'm not sure I understand what this is doing.
>> >
>> > I read it as "if p has its own mm and p is a child thread, set
>> > p->mm->owner to p->group_leader". But by definition if p has its own
>> > mm, then p->group_leader->mm will be different to p->mm, therefore
>> > we'd end up with mm->owner->mm != mm, which seems very bad.
>> >
>> > What's the intention of this bit of code?
>>
>> The intention is to handle the case when clone is called without CLONE_VM and
>> with CLONE_THREAD. This means that p can have its own mm and a shared
>> group_leader.
>
> In which case p should be the owner of its new mm, not
> p->group_leader. mm_fork_init_owner() does:

Oh! yes, my bad again. The check should have been p == p->thread_group, but
that is not required either. The check should now ideally be
if (!(clone_flags & CLONE_VM)).
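That is, something like this hunk in copy_process() (an untested sketch; it
assumes mm_fork_init_owner() is reworked to simply point p->mm->owner at p
rather than at p->group_leader):

	/* Sketch: in copy_process(), after cgroup_post_fork(p) */
	if (!(clone_flags & CLONE_VM)) {
		/*
		 * copy_mm() gave p a fresh mm, so p is its only user
		 * and should become its owner.
		 */
		mm_fork_init_owner(p);
	}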
>> +	if (mm->owner != p)
>> +		rcu_assign_pointer(mm->owner, p->group_leader);
>
> Since we just created a fresh mm for p in copy_mm(), and set mm->owner
> = p, I don't see how this test can succeed. And if it does, then we
> end up setting p->mm->owner to p->group_leader, which has to be bad
> since we know that:
>
> - p is the only user of its mm
> - p != p->group_leader
>
> therefore we're setting p->mm->owner to point to a process that
> doesn't use p->mm. What am I missing?
>
>> >> +/*
>> >> + * Task p is exiting and it owned p, so lets find a new owner for it
>> >> + */
>> >> +static inline int
>> >> +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
>> >> +{
>> >> +	int ret;
>> >> +
>> >> +	rcu_read_lock();
>> >> +	ret = (mm && (rcu_dereference(mm->owner) == p) &&
>> >> +		(atomic_read(&mm->mm_users) > 1));
>> >> +	rcu_read_unlock();
>> >> +	return ret;
>> >
>> > The only way that rcu_read_lock() helps here is if mm freeing is
>> > protected by RCU, which I don't think is the case.
>>
>> rcu_read_lock() also ensures that preemption does not cause us to see
>> incorrect values.
>
> The only thing that RCU is protecting us against here is that if
> mm->owner is non-NULL, the task it points to won't be freed until the
> end of our RCU read section. But since we never dereference mm->owner
> in the RCU read section, that buys us nothing and is just confusing.
>
>> > But as long as p==current, there's no race, since no other process
>> > will re-point mm->owner at themselves, so mm can't go away anyway
>> > since we have a reference to it that we're going to be dropping soon.
>>
>> mm cannot go away, but mm->owner can be different from current and could be
>> going away.
>
> But that's a problem for mm->owner, not for current.
>
> By this point we've already cleared current->mm, so we're no longer a
> candidate for becoming mm->owner if we're not already the owner.
> Therefore the truth or falsity of (mm->owner == current) can't be
> changed by races - either we are already the owner and no one but us
> can change mm->owner, or we're not the owner and no one can point
> mm->owner at us.
>
>> >> +
>> >> +	if (!mm_need_new_owner(mm, p))
>> >> +		return;
>> >> +
>> >> +	/*
>> >> +	 * Search in the children
>> >> +	 */
>> >> +	list_for_each_entry(c, &p->children, sibling) {
>> >> +		if (c->mm == p->mm)
>
> Oh, and at the point in exit_mm() when you call
> mm_update_next_owner(), p->mm is NULL, so this will only match against
> tasks that have no mm.

Yes, I think we need to call it earlier.

>> >> +			goto assign_new_owner;
>> >> +	}
>> >
>> > We need to keep checking mm_need_new_owner() since it can become false
>> > if the only other user of the mm exits at the same time that we do.
>> > (In which case there's nothing to do).
>>
>> I would rather deal with the case where mm->owner is NULL, rather than keep
>> checking
>
> I mean that it's possible that there's one other task using mm at the
> point when we enter mm_update_next_owner(), but while we're trying to
> find it, it does an exit() or execve() and stops being a user of mm.
> If that happens, we should bail from the search since we no longer
> need to find another user to pass it to (and there is no other user to
> find).
>
>> (since even with constant checking we cannot guarantee that mm->owner
>> will not become NULL)
>>
>> >> +	 * Search through everything else. We should not get
>> >> +	 * here often
>> >> +	 */
>> >> +	for_each_process(c) {
>> >> +		g = c;
>> >> +		do {
>> >> +			if (c->mm && (c->mm == p->mm))
>> >> +				goto assign_new_owner;
>> >> +		} while ((c = next_thread(c)) != g);
>> >> +	}
>> >
>> > Is there a reason to not code this as for_each_thread?
>>
>> Is there a for_each_thread()?
>
> Sorry, I meant do_each_thread()/while_each_thread(). Isn't that
> basically the same thing as what you've explicitly coded here?

Yes, it is.

>> >> +assign_new_owner:
>> >> +	spin_lock(&mm->owner_lock);
>> >> +	rcu_assign_pointer(mm->owner, c);
>> >> +	spin_unlock(&mm->owner_lock);
>> >> +}
>> >
>> > This can break if c is also exiting and has passed the call to
>> > mm_update_next_owner() by the time we assign mm->owner. That's why my
>> > original suggested version had a function like:
>>
>> Won't it be better to check for c->flags & PF_EXITING?
>
> Or if c is execing and gives up the mm that way.
>
>> > static inline void try_give_mm_ownership(struct task_struct *task,
>> >                                          struct mm_struct *mm) {
>> > 	if (task->mm != mm) return;
>> > 	task_lock(task);
>> > 	if (task->mm == mm) {
>> > 		mm->owner = task;
>> > 	}
>> > 	task_unlock(task);
>> > }
>> >
>> > i.e. determining that a task is a valid candidate and updating the
>> > owner pointer has to be done in the same critical section.
>>
>> Let me try and revamp the locking rules and see what that leads to. But I
>> don't like protecting an mm_struct's member with a task_struct's lock.
>
> We're not protecting mm->owner with task_lock(new_owner). As I said,
> the locking rule for mm->owner should be that it can only be changed if
> (current == mm->owner). What we're protecting is new_owner->mm - i.e.
> we're ensuring that we don't break the invariant that (mm->owner->mm
> == mm)

But there is no way to guarantee that: what if the new_owner execs after
we've done the check and assigned the owner? Won't we end up breaking the
invariant? How about we have a mm_update_next_owner() call in exec_mmap()
as well? That way, we can still use owner_lock and keep the invariant.
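Roughly, the hook I have in mind would sit at the tail of exec_mmap(), once
the task has switched over to the new mm (an untested sketch; it assumes the
old_mm local that exec_mmap() uses for the mm being dropped, and the
mm_update_next_owner() helper from this patch):

	/* Sketch: in exec_mmap(), after tsk->mm points at the new mm */
	if (old_mm) {
		/*
		 * current has stopped using old_mm; if current was its
		 * owner, hand ownership to another user of old_mm
		 * before we drop our reference.
		 */
		mm_update_next_owner(old_mm);
		mmput(old_mm);
	}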
>> >>  config CGROUP_MEM_RES_CTLR
>> >>  	bool "Memory Resource Controller for Control Groups"
>> >> -	depends on CGROUPS && RESOURCE_COUNTERS
>> >> +	depends on CGROUPS && RESOURCE_COUNTERS && MM_OWNER
>> >
>> > Maybe this should select MM_OWNER rather than depending on it?
>>
>> I thought of it, but wondered if the user should make an informed choice about
>> MM_OWNER and the overhead it brings along.
>
> I would vote the other way on that - make it clear in the memory
> controller config help that the memory controller can introduce
> overhead, and have it implicitly enable MM_OWNER.

That's a possibility.

Thanks for your patience and review.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/
Don't email: email@kvack.org