From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 9DD616B0071 for ; Wed, 9 Jun 2010 20:43:11 -0400 (EDT) Received: from m4.gw.fujitsu.co.jp ([10.0.50.74]) by fgwmail5.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id o5A0h8gN003746 for (envelope-from kamezawa.hiroyu@jp.fujitsu.com); Thu, 10 Jun 2010 09:43:08 +0900 Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id 1187145DE7D for ; Thu, 10 Jun 2010 09:43:08 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id A5D0D45DE6F for ; Thu, 10 Jun 2010 09:43:07 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 7E19AE3800B for ; Thu, 10 Jun 2010 09:43:07 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.249.87.103]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id E350AE38006 for ; Thu, 10 Jun 2010 09:43:06 +0900 (JST) Date: Thu, 10 Jun 2010 09:38:42 +0900 From: KAMEZAWA Hiroyuki Subject: Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible Message-Id: <20100610093842.6a038ab0.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: <20100609095200.GA5650@csn.ul.ie> References: <1275987745-21708-1-git-send-email-mel@csn.ul.ie> <20100609115211.435a45f7.kamezawa.hiroyu@jp.fujitsu.com> <20100609095200.GA5650@csn.ul.ie> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Mel Gorman Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Dave Chinner , Chris Mason , Nick Piggin , Rik van Riel List-ID: On Wed, 9 Jun 2010 10:52:00 +0100 Mel Gorman wrote: > On Wed, Jun 09, 2010 at 11:52:11AM +0900, KAMEZAWA Hiroyuki wrote: > > > > > > > My concern is how memcg should work. IOW, what changes will be necessary for > > memcg to work with the new vmscan logic as no-direct-writeback. > > > > At worst, memcg waits on background flushers to clean their pages but > obviously this could lead to stalls in containers if it happened to be full > of dirty pages. > yes. > Do you have test scenarios already setup for functional and performance > regression testing of containers? If so, can you run tests with this series > and see what sort of impact you find? I haven't done performance testing > with containers to date so I don't know what the expected values are. > Maybe kernbench is enough. I think it does enough write and malloc. 'Limit' size for test depends on your host. I sometimes does this on 8cpu SMP box. # mount -t cgroup none /cgroups -o memory # mkdir /cgroups/A # echo $$ > /cgroups/A # echo 300M > /cgroups/memory.limit_in_bytes # make -j 8 or make -j 16 Comparing size of swap and speed will be interesting. (Above 300M is enough small because my test machine has 24G memory.) Or # mount -t cgroup none /cgroups -o memory # mkdir /cgroups/A # echo $$ > /cgroups/A # echo 50M > /cgroups/memory.limit_in_bytes # dd if=/dev/zero of=./tmpfile bs=65536 count=100000 or some. When I tested the original patch for "avoiding writeback" by Dave Chinner, I saw 2 ooms in 10 tests. If not patched, I never see OOM. > > Maybe an ideal solution will be > > - support buffered I/O tracking in I/O cgroup. > > - flusher threads should work with I/O cgroup. > > - memcg itself should support dirty ratio. and add a trigger to kick flusher > > threads for dirty pages in a memcg. > > But I know it's a long way. > > > > I'm not very familiar with memcg I'm afraid or its requirements so I am > having trouble guessing which of these would behave the best. You could take > a gamble on having memcg doing writeback in direct reclaim but you may run > into the same problem of overflowing stacks. > maybe. > I'm not sure how a flusher thread would work just within a cgroup. It > would have to do a lot of searching to find the pages it needs > considering that it's looking at inodes rather than pages. > yes. So, I(we) need some way for coloring inode for selectable writeback. But people in this area are very nervous about performance (me too ;), I've not found the answer yet. > One possibility I guess would be to create a flusher-like thread if a direct > reclaimer finds that the dirty pages in the container are above the dirty > ratio. It would scan and clean all dirty pages in the container LRU on behalf > of dirty reclaimers. > Yes, that's possible. But Andrew recommends not to do add more threads. So, I'll use workqueue if necessary. > Another possibility would be to have kswapd work in containers. > Specifically, if wakeup_kswapd() is called with a cgroup that it's added > to a list. kswapd gives priority to global reclaim but would > occasionally check if there is a container that needs kswapd on a > pending list and if so, work within the container. Is there a good > reason why kswapd does not work within container groups? > One reason is node v.s. memcg. Because memcg doesn't limit memory placement, a container can contain pages from the all nodes. So,it's a bit problem which node's kswapd we should run . (but yes, maybe small problem.) Another is memory-reclaim-prioirty between memcg. (I don't want to add such a knob...) Maybe it's time to consider about that. Now, we're using kswapd for softlimit. I think similar hints for kswapd should work. yes. > Finally, you could just allow reclaim within a memcg do writeback. Right > now, the check is based on current_is_kswapd() but I could create a helper > function that also checked for sc->mem_cgroup. Direct reclaim from the > page allocator never appears to work within a container group (which > raises questions in itself such as why a process in a container would > reclaim pages outside the container?) so it would remain safe. > isolate_lru_pages() for memcg finds only pages in a memcg ;) > > How the new logic works with memcg ? Because memcg doesn't trigger kswapd, > > memcg has to wait for a flusher thread make pages clean ? > > Right now, memcg has to wait for a flusher thread to make pages clean. > ok. > > Or memcg should have kswapd-for-memcg ? > > > > Is it okay to call writeback directly when !scanning_global_lru() ? > > memcg's reclaim routine is only called from specific positions, so, I guess > > no stack problem. > > It's a judgement call from you really. I see that direct reclaimers do > not set mem_cgroup so it's down to - are you reasonably sure that all > the paths that reclaim based on a container are not deep? One concerns is add_to_page_cache(). If it's called in deep stack, my assumption is wrong. > I looked > around for a while and the bulk appeared to be in the fault path so I > would guess "yes" but as I'm not familiar with the memcg implementation > I'll have missed a lot. > > > But we just have I/O pattern problem. > > True. > Okay, I'll consider about how to kick kswapd via memcg or flusher-for-memcg. Please go ahead as you want. I love good I/O pattern, too. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org