Date: Fri, 8 Feb 2008 09:11:32 -0800
From: Nishanth Aravamudan
Subject: Re: [RFC][PATCH 2/2] Explicitly retry hugepage allocations
Message-ID: <20080208171132.GE15903@us.ibm.com>
References: <20080206230726.GF3477@us.ibm.com> <20080206231243.GG3477@us.ibm.com>
Sender: owner-linux-mm@kvack.org
To: Christoph Lameter
Cc: melgor@ie.ibm.com, apw@shadowen.org, agl@us.ibm.com, wli@holomorphy.com, linux-mm@kvack.org

On 06.02.2008 [15:30:53 -0800], Christoph Lameter wrote:
> On Wed, 6 Feb 2008, Nishanth Aravamudan wrote:
>
> > Add __GFP_REPEAT to hugepage allocations. Do so to not necessitate
> > userspace putting pressure on the VM by repeated echo's into
> > /proc/sys/vm/nr_hugepages to grow the pool. With the previous patch
> > to allow for large-order __GFP_REPEAT attempts to loop for a bit (as
> > opposed to indefinitely), this increases the likelihood of getting
> > hugepages when the system experiences (or recently experienced)
> > load.
> >
> > On a 2-way x86_64, this doubles the number of hugepages (from 10 to
> > 20) obtained while compiling a kernel at the same time. On a 4-way
> > ppc64, a similar scale increase is seen (from 3 to 5 hugepages).
> > Finally, on a 2-way x86, this leads to a 5-fold increase in the
> > hugepages allocatable under load (90 to 554).
>
> Hmmm... How about defaulting to __GFP_REPEAT by default for larger
> page allocations? There are other users of larger allocs that would
> also benefit from the same measure. I think it would be fine as long
> as we are sure to fail at some point.

In thinking about this more, one of the harder parts for me to get my
head around was the implicit promotion of small-order allocations to
__GFP_REPEAT (and thus to __GFP_NOFAIL). I would prefer that large-order
allocations stay explicit about when they want the VM to try harder to
succeed. As far as I understand it, hugepages are currently the only
code in the kernel that would really leverage this?

I also feel that, even if __GFP_REPEAT becomes the default behavior,
it's better to use it as documentation of intent from the caller -- and
perhaps it would also point us to call sites that are over-stressing the
VM unnecessarily by regularly forcing reclaim?

I am also not 100% sure how I would test the result of such a change,
since there are not that many large-order allocations in the kernel...
Did you have any thoughts on that?

Thanks,
Nish