Date: Tue, 11 Nov 2008 16:28:12 +0900
From: KAMEZAWA Hiroyuki
Subject: Re: [RFC][PATCH]Per-cgroup OOM handler
Message-Id: <20081111162812.492218fc.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <604427e00811102042x202906ecq2a10eb5e404e2ec9@mail.gmail.com>
References: <604427e00811031340k56634773g6e260d79e6cb51e7@mail.gmail.com>
 <604427e00811031419k2e990061kdb03f4b715b51fb9@mail.gmail.com>
 <20081106143438.5557b87c.kamezawa.hiroyu@jp.fujitsu.com>
 <604427e00811102042x202906ecq2a10eb5e404e2ec9@mail.gmail.com>
To: Ying Han
Cc: linux-mm@kvack.org, Rohit Seth, Paul Menage, David Rientjes

On Mon, 10 Nov 2008 20:42:23 -0800
Ying Han wrote:

> Thank you for your comments.
> On Wed, Nov 5, 2008 at 9:34 PM, KAMEZAWA Hiroyuki <
> kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Here is how we do the one-tick-wait in cgroup_should_oom() in oom_kill.c
>         if (!ret) {
>                 /* If we're not going to OOM, we should sleep for a
>                  * bit to give userspace a chance to respond before we
>                  * go back and try to reclaim again */
>                 schedule_timeout_uninterruptible(1);
>         }
> and it works well in-house so far, as I mentioned earlier. What's
> important here is not "sleeping for one tick"; the idea here is to
> reschedule the ooming thread so the oom handler can take action (like
> adding a memory node to the cpuset) and the subsequent page allocator in
> get_page_from_freelist() can use it.
>
> Can't we avoid this kind of magical one-tick wait ?
> >
> > (Before OOM, the system tends to wait in congestion_wait() or some.)
>
> I am not sure how the call to congestion_wait() is relevant to the
> "one-tick-wait"? We are simply trying to reschedule the ooming task,
> so that the oom handler that has been woken up has a chance to do something.
>
> if lucky.
> >
> >
> > OOM-handler should be in another cpuset or mlocked in this case
>
> The oom-handler is in the same cgroup as the ooming task. That is why it's
> called per-cgroup oom-handler. However, there's probably a livelock if the
> userspace oom handler is the one that triggers the oom and detaches/reattaches
> without ever freeing or adding memory. For this case, either we can detect
> it in the kernel by doing something like if(current == pid) or just leave the
> problem up to userspace (the oom handler shouldn't detach itself after
> getting the ooming notification; is it considered to be a user bug?).
>

Hmm, from the discussion of the mem_notify handler in Feb/March of this year,
an oom-handler cannot work well in general when memory is near OOM.
So mlockall was recommended for the handler (and it must not do file access).
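As a rough illustration of the mlockall advice, a userspace handler might pin
its memory before it starts waiting for notifications. This is only a sketch
under my own assumptions: oom_handler_prepare(), scratch and SCRATCH_SIZE are
made-up names for illustration and are not part of the patch; only mlockall()
and its MCL_CURRENT/MCL_FUTURE flags are the real interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Scratch space reserved up front so the handler does not have to
 * allocate memory once the cgroup is already short of it.
 * The size is arbitrary and chosen only for this sketch. */
#define SCRATCH_SIZE (64 * 1024)
static char scratch[SCRATCH_SIZE];

/* Hypothetical helper (not from the patch): pin the handler's memory
 * before it blocks waiting for an OOM notification.
 * Returns 0 on success, or a negative errno value on failure
 * (for example when RLIMIT_MEMLOCK is too low). */
static int oom_handler_prepare(void)
{
	/* Write to every byte so the pages are really backed by memory. */
	memset(scratch, 0, sizeof(scratch));

	/* Lock everything mapped now and everything mapped later. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -errno;

	return 0;
}
```

The handler would call this once at startup and, after that, work only from
already-locked memory. How the notification itself is delivered is outside
this sketch.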
I wonder whether creating a small cpuset (and an isolated node) for the
oom-handler might be another help.

> >
> > I'm wondering
> >  - freeze-all-threads-in-group-at-oom
> >  - free emergency memory to page allocator which was pooled at cgroup
> >    creation
> > rather than 1-tick wait
> >
> > BTW, it seems this patch allows task detach/attach always. it's safe(and
> > sane) ?
>
> Yes, we allow task detach/attach. So far we don't see any race condition
> except the livelock I mentioned above. Any particular scenario you can
> think of now? Thanks
>

I haven't found one ;)
BTW, shouldn't we disable preempt (or irq) before taking spinlocks?

> > +static int cgroup_should_oom(void)
> > +{
> > +	int ret = 1; /* OOM by default */
> > +	struct oom_cgroup *cs;
> > +
> > +	task_lock(current);
> > +	cs = oom_cgroup_from_task(current);
> > +

Thanks,
-Kame