From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d28relay02.in.ibm.com (d28relay02.in.ibm.com [9.184.220.59]) by e28smtp06.in.ibm.com (8.13.1/8.13.1) with ESMTP id m2SIESQ9032650 for ; Fri, 28 Mar 2008 23:44:28 +0530 Received: from d28av01.in.ibm.com (d28av01.in.ibm.com [9.184.220.63]) by d28relay02.in.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m2SIERSm1147018 for ; Fri, 28 Mar 2008 23:44:27 +0530 Received: from d28av01.in.ibm.com (loopback [127.0.0.1]) by d28av01.in.ibm.com (8.13.1/8.13.3) with ESMTP id m2SIEYEG001796 for ; Fri, 28 Mar 2008 18:14:35 GMT Message-ID: <47ED34A4.70604@linux.vnet.ibm.com> Date: Fri, 28 Mar 2008 23:40:44 +0530 From: Balbir Singh Reply-To: balbir@linux.vnet.ibm.com MIME-Version: 1.0 Subject: Re: [-mm] Add an owner to the mm_struct (v2) References: <20080328082316.6961.29044.sendpatchset@localhost.localdomain> <6599ad830803280401r68d30e91waaea8eb1de36eb52@mail.gmail.com> <47ECE662.3060506@linux.vnet.ibm.com> <6599ad830803280705o4213c448r991cbf9da6ffe2f1@mail.gmail.com> <47ED0621.4050304@linux.vnet.ibm.com> <6599ad830803280838s19ffc366w1a950ebb12e2907b@mail.gmail.com> In-Reply-To: <6599ad830803280838s19ffc366w1a950ebb12e2907b@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Paul Menage Cc: Pavel Emelianov , Hugh Dickins , Sudhir Kumar , YAMAMOTO Takashi , lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, taka@valinux.co.jp, linux-mm@kvack.org, David Rientjes , Andrew Morton , KAMEZAWA Hiroyuki List-ID: Paul Menage wrote: > On Fri, Mar 28, 2008 at 7:52 AM, Balbir Singh wrote: >> mm->owner_lock is there to protect mm->owner field from changing simultaneously >> as tasks fork/exit. >> > > But the *hardware* already does that for you - individual writes to > pointers are already atomic operations and so will be serialized. > Using a lock to guard something only does anything useful if at least > one of the critical regions that takes the lock consists of more than > a single atomic operation, or if you have a mixture of read sections > and write sections. Now it's true that your critical region in > mm_fork_init_owner() is more than a single atomic op, but I'm arguing > below that it's a no-op. So that just leaves the single region > > spin_lock(&mm->owner_lock); > mm->owner = new_owner; > spin_unlock(&mm->owner_lock); > > which isn't observably different if you remove the spinlock. > At fork time, we can have do_fork() run in parallel and we need to protect mm->owner, if several threads are created at the same time. We don't want to overwrite mm->owner for each thread that is created. >> Oh! yes.. my bad again. The check should have been p == p->thread_group, but >> that is not required either. The check should now ideally be >> >> if (!(clone_flags & CLONE_VM)) >> > > OK, so if the new thread has its own mm (and hence will already have > mm->owner set up to point to p in mm_init()) then we do: > >> + if (mm->owner != p) >> + rcu_assign_pointer(mm->owner, p->group_leader); > > which is a no-op since we know mm->owner == p. > >> Yes.. I think we need to call it earlier. >> > > No, I think we need to call it later - after we've cleared current->mm > (from within task_lock(current)) - so we can't rely on p->mm in this > function, we have to pass it in. If we call it before while > current->mm == mm, then we risk a race where the (new or existing) > owner exits and passes it back to us *after* we've done a check to see > if we need to find a new owner. If we ensure that current->mm != mm > before we call mm_update_next_owner(), then we know we're not a > candidate for receiving the ownership if we don't have it already. > Yes and we could also check for flags & PF_EXITING >> But there is no way to guarantee that, what is the new_owner exec's after we've >> done the check and assigned. Won't we end up breaking the invariant? How about >> we have mm_update_new_owner() call in exec_mmap() as well? That way, we can >> still use owner_lock and keep the invariant. >> > > Oops, I thought that exit_mm() already got called in the execve() > path, but you're right, it doesn't. > > Yes, exit_mmap() should call mm_update_next_owner() after the call to > task_unlock(), i.e. after it's set its new mm. > > So I need to express the invariant more carefully. > > What we need to preserve is that, for every mm at all times, mm->owner > points to a valid task. So either: > > 1) mm->owner->mm == mm AND mm->owner will check to see whether it > needs to pass ownership before it exits or execs. > > OR > > 2) mm->owner is the last user of mm and is about to free mm. > > OR > > 3) mm->owner is currently searching for another user of mm to pass the > ownership to. > > In order to get from state 3 to state 1 safely we have to hold > task_lock(new_owner). Otherwise we can race with an exit or exec in > new_owner, resulting in a process that has already passed the point of > checking current->mm->owner. > No.. like you said if we do it after current->mm has changed and is different from mm, then it's safe to find a new owner. I still don't see why we need task_lock(new_owner). Even if we have task_lock(new_owner), it can still exit or exec later. > I don't see why we need mm->owner_lock to maintain this invariant. > (But am quite prepared to be proven wrong). > Why mix task_lock() to protect mm->owner? owner_lock can provide the protection you are talking about. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org