From mboxrd@z Thu Jan 1 00:00:00 1970
From: Balbir Singh
Subject: Re: [-mm] Add an owner to the mm_struct (v8)
Date: Fri, 04 Apr 2008 14:55:14 +0530
Message-ID: <47F5F3FA.7060709@linux.vnet.ibm.com>
References: <20080404080544.26313.38199.sendpatchset@localhost.localdomain>
 <6599ad830804040112q3dd5333aodf6a170c78e61dc8@mail.gmail.com>
 <47F5E69C.9@linux.vnet.ibm.com>
 <6599ad830804040150j4946cf92h886bb26000319f3b@mail.gmail.com>
Reply-To: balbir@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <6599ad830804040150j4946cf92h886bb26000319f3b@mail.gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
To: Paul Menage
Cc: Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi,
 lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, taka@valinux.co.jp,
 linux-mm@kvack.org, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki
List-Id: linux-mm.kvack.org

Paul Menage wrote:
> On Fri, Apr 4, 2008 at 1:28 AM, Balbir Singh wrote:
>> It won't uncharge for the memory controller from the root cgroup since each page
>> has the mem_cgroup information associated with it.
>
> Right, I realise that the memory controller is OK because of the ref counts.
>
>> For other controllers,
>> they'll need to monitor exit() callbacks to know when the leader is dead :( (sigh).
>
> That sounds like a nightmare ...
>

Yes, it would be, but worth the trouble. Is it really critical to move a dead
cgroup leader to init_css_set in cgroup_exit()?

>> Not having the group leader optimization can introduce big overheads (consider
>> thousands of tasks, with the group leader being the first one to exit).
>
> Can you test the overhead?
>

I can probably write a program and see what the overhead looks like.

> As long as we find someone to pass the mm to quickly, it shouldn't be
> too bad - I think we're already optimized for that case. Generally the
> group leader's first child will be the new owner, and any subsequent
> times the owner exits, they're unlikely to have any children so
> they'll go straight to the sibling check and pass the mm to the
> parent's first child.
>
> Unless they all exit in strict sibling order and hence pass the mm
> along the chain one by one, we should be fine. And if that exit
> ordering does turn out to be common, then simply walking the child and
> sibling lists in reverse order to find a victim will minimize the
> amount of passing.
>

Finding the next owner for the mm might not be all that bad, but doing it each
time a task exits can add overhead, especially for large multi-threaded
programs. It can get severe if the new mm->owner belongs to a different cgroup,
in which case we need to invoke callbacks as well. If half the threads belonged
to a different cgroup and the new mm->owner kept switching between cgroups, the
overhead would be really high, with the callbacks firing and the mm->owner
changing frequently.

> One other thing occurred to me - what lock protects the child and
> sibling links? I don't see any documentation anywhere, but from the
> code it looks as though it's tasklist_lock rather than RCU - so maybe
> we should be holding that with a read_lock(), at least for the first
> two parts of the search? (The full thread search is RCU-safe).
>

You are right about the read_lock().

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
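
The search order discussed above (children of the exiting owner first, then
its siblings, then every thread) can be sketched in userspace C. This is only
an illustration under stated assumptions: struct task, mm_id, exiting and
pick_new_owner() are hypothetical stand-ins, not the kernel's task_struct or
any function from the patch, and the locking appears only as comments.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for a task; not the kernel's task_struct. */
struct task {
    int pid;
    int mm_id;                 /* stand-in for the mm_struct pointer */
    int exiting;               /* non-zero once the task is exiting */
    struct task *parent;
    struct task *first_child;
    struct task *next_sibling;
};

/* A candidate must be alive, share the mm, and not be the old owner. */
static int usable(const struct task *t, const struct task *old)
{
    return t != old && !t->exiting && t->mm_id == old->mm_id;
}

/*
 * Pick a new owner for the address space of `old`.
 * Per the thread, the first two steps walk child/sibling links and would
 * run under read_lock(&tasklist_lock) in the kernel; the last step is the
 * full thread search, which the thread describes as RCU-safe.
 */
struct task *pick_new_owner(struct task *old, struct task *all, size_t n)
{
    struct task *t;
    size_t i;

    /* Step 1: children of the exiting owner. */
    for (t = old->first_child; t; t = t->next_sibling)
        if (usable(t, old))
            return t;

    /* Step 2: siblings, reached through the parent's child list. */
    if (old->parent)
        for (t = old->parent->first_child; t; t = t->next_sibling)
            if (usable(t, old))
                return t;

    /* Step 3: fall back to scanning every task. */
    for (i = 0; i < n; i++)
        if (usable(&all[i], old))
            return &all[i];

    return NULL; /* nobody else shares this mm */
}
```

When the first or second step succeeds the handoff is cheap, which is the case
Paul describes as already optimized; the cost raised in the reply comes from
repeating the handoff on every exit and from any callbacks needed when the new
owner sits in a different cgroup, neither of which this sketch models.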