From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Dave Chinner <david@fromorbit.com>
Cc: kosaki.motohiro@jp.fujitsu.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Chris Mason
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 15:52:10 +0900 (JST)
Message-Id: <20100414155201.D14A.A69D9226@jp.fujitsu.com>
In-Reply-To: <20100413143659.GA2493@dastard>
References: <20100413201635.D119.A69D9226@jp.fujitsu.com> <20100413143659.GA2493@dastard>

> On Tue, Apr 13, 2010 at 08:39:29PM +0900, KOSAKI Motohiro wrote:
> > Hi
> >
> > > > Pros:
> > > >   1) prevent XFS stack overflow
> > > >   2) improve io workload performance
> > > >
> > > > Cons:
> > > >   3) TOTALLY kill lumpy reclaim (i.e. high order allocation)
> > > >
> > > > So, if we only needed to consider io workloads there would be no
> > > > downside, but we can't.
> > > >
> > > > I think (1) is an XFS issue. XFS should take care of it itself.
> > >
> > > The filesystem is irrelevant, IMO.
> > >
> > > The traces from the reporter showed that we've got close to a 2k
> > > stack footprint for memory allocation to direct reclaim and then we
> > > can put the entire writeback path on top of that. This is roughly
> > > 3.5k for XFS, and then depending on the storage subsystem
> > > configuration and transport can be another 2k of stack needed below
> > > XFS.
> > >
> > > IOWs, if we completely ignore the filesystem stack usage, there's
> > > still up to 4k of stack needed in the direct reclaim path. Given
> > > that one of the stack traces supplied shows direct reclaim being
> > > entered with over 3k of stack already used, pretty much any
> > > filesystem is capable of blowing an 8k stack.
> > >
> > > So, this is not an XFS issue, even though XFS is the first to
> > > uncover it. Don't shoot the messenger....
> >
> > Thanks for the explanation. I hadn't noticed that direct reclaim
> > consumes 2k of stack. I'll investigate it and try to put it on a diet.
> > But XFS's 3.5k stack consumption is too large as well; please put that
> > on a diet too.
>
> It hasn't grown in the last 2 years after the last major diet where
> all the fat was trimmed from it in the last round of the i386 4k
> stack vs XFS saga. It seems that everything else around XFS has
> grown in that time, and now we are blowing stacks again....
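As an aside, one way to see where the stack actually goes on these paths is the ftrace stack tracer. The commands below are a minimal sketch, assuming a kernel built with CONFIG_STACK_TRACER=y and debugfs mounted at /sys/kernel/debug; they report the deepest stack observed so far and its per-function breakdown:

  # track worst-case kernel stack depth (assumes CONFIG_STACK_TRACER=y)
  echo 1 > /proc/sys/kernel/stack_tracer_enabled

  # ... run the reclaim/writeback-heavy workload here ...

  # deepest stack seen so far, in bytes
  cat /sys/kernel/debug/tracing/stack_max_size
  # the call chain that produced it, with per-frame usage
  cat /sys/kernel/debug/tracing/stack_trace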
I have a dumb question: if XFS hasn't bloated its stack usage, how did 3.5k of stack usage ever work on a 4k-stack kernel? It seems impossible. Please don't think I'm blaming you; I don't know what the "4k stack vs XFS saga" was. I merely want to understand what you said.

> > > Hence I think that direct reclaim should be deferring to the
> > > background flusher threads for cleaning memory and not trying to be
> > > doing it itself.
> >
> > Well, you seem to still be discussing the io workload case. I don't
> > disagree on that point.
> >
> > For example, if only order-0 reclaim skips pageout(), we will get the
> > above benefit too.
>
> But it won't prevent stack blowups...
>
> > > > but we never kill pageout() completely because we can't
> > > > assume users don't run high order allocation workloads.
> > >
> > > I think that lumpy reclaim will still work just fine.
> > >
> > > Lumpy reclaim appears to be using IO as a method of slowing
> > > down the reclaim cycle - the congestion_wait() call will still
> > > function as it does now if the background flusher threads are active
> > > and causing congestion. I don't see why lumpy reclaim specifically
> > > needs to be issuing IO to make it work - if the congestion_wait() is
> > > not waiting long enough then wait longer - don't issue IO to extend
> > > the wait time.
> >
> > Lumpy reclaim is for allocating high order pages. It doesn't only
> > reclaim the page at the head of the LRU, it also reclaims that page's
> > PFN neighbours. The PFN neighbours are often young pages that are
> > still dirty, so we force pageout() to clean them and then discard them.
>
> Ok, I see that now - I missed the second call to __isolate_lru_pages()
> in isolate_lru_pages().

No problem. It's one of the VM's messier corners; most developers don't
know about it :-)

> > When a high order allocation occurs, we don't only need to free a
> > sufficient amount of memory, we also need to free a sufficiently large
> > contiguous memory block.
>
> Agreed, that was why I was kind of surprised not to find it was
> doing that. But, as you have pointed out, that was my mistake.
>
> > If we needed to consider _only_ io throughput, waiting for the flusher
> > thread might well be faster, but we also need to consider reclaim
> > latency. I worry about that point too.
>
> True, but without knowing how to test and measure such things I can't
> really comment...

Agreed. I know that making a VM measurement benchmark is very difficult,
but it is probably necessary.... I'm sorry, I can't give you a good,
convenient benchmark right now.

> > > Of course, the code is a maze of twisty passages, so I probably
> > > missed something important. Hopefully someone can tell me what. ;)
> > >
> > > FWIW, the biggest problem here is that I have absolutely no clue on
> > > how to test what the impact on lumpy reclaim really is. Does anyone
> > > have a relatively simple test that can be run to determine what the
> > > impact is?
> >
> > So, can you please run two workloads concurrently?
> >  - Normal IO workload (fio, iozone, etc..)
> >  - echo $NUM > /proc/sys/vm/nr_hugepages
>
> What do I measure/observe/record that is meaningful?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
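For reference, a minimal sketch of that combined test is below. It assumes fio is installed and that /mnt/test is a scratch filesystem; the quantities it records (the elapsed time of the nr_hugepages write and the resulting HugePages_Total from /proc/meminfo) are only one plausible answer to the question above, not something settled in this thread:

  #!/bin/sh
  # Sketch only: run an IO workload while forcing high-order allocations
  # through the hugepage pool, then record how long the allocation took
  # and how many huge pages were actually obtained.

  NUM=500                        # illustrative hugepage target

  # 1) background IO workload that keeps dirty pages flowing
  #    (/mnt/test is an assumed scratch directory)
  fio --name=writer --rw=write --size=4g --numjobs=4 \
      --directory=/mnt/test &
  FIO_PID=$!

  # 2) force high-order allocations via the hugepage pool; the write
  #    returns once the kernel has tried to allocate the requested pages
  time sh -c "echo $NUM > /proc/sys/vm/nr_hugepages"

  # 3) how many huge pages did we actually get?
  grep HugePages_Total /proc/meminfo

  kill $FIO_PID
  wait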