Date: Tue, 27 Oct 2009 20:30:07 +0100
From: Andrea Arcangeli
Subject: Re: RFC: Transparent Hugepage support
Message-ID: <20091027193007.GA6043@random.random>
References: <20091026185130.GC4868@random.random> <87ljiwk8el.fsf@basil.nowhere.org>
In-Reply-To: <87ljiwk8el.fsf@basil.nowhere.org>
To: Andi Kleen
Cc: linux-mm@kvack.org, Marcelo Tosatti, Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton

On Tue, Oct 27, 2009 at 07:18:26PM +0100, Andi Kleen wrote:
> In general the best would be to just merge hugetlbfs into
> the normal VM. It has been growing for far too long as a separate
> "second VM" by now. This seems like a reasonable first step,
> but some comments below.

Problem is hugetlbfs as it stands now can't be merged... it deliberately
takes its own paths and tries to stay as far away from the VM as
possible. But as you said, as people try to make hugetlbfs vmas "more
similar to the regular vmas", hugetlbfs slowly spreads into the VM code,
defeating the whole reason the hugetlbfs magic exists (i.e. to be out of
the way of the VM as much as possible). Trying to make hugetlbfs more
similar to regular vmas makes the VM more complex while still not
achieving full features for hugetlbfs (notably overcommit and paging).

> The problem is that this will interact badly with 1GB pages -- once
> you split them up you'll never get them back, because they
> can't be allocated at runtime.

1GB pages can't be handled by this code, and it's clearly not practical
to hope for 1G pages to materialize in the buddy allocator (even if we
were to enlarge the buddy that much, slowing down regular page
allocation).
Let's forget 1G pages here... we're only focused on sizes that can be
allocated dynamically. The main problem is the 64k pages or suchlike
that don't fit into a pmd...

> Even for 2MB pages it can be a problem.
>
> You'll likely need to fix the page table code.

In terms of fragmentation, split_huge_page itself won't create any...
unless it swaps (but then CPU performance is lost on the mapping
anyway). True, we need to teach mprotect/mremap not to call
split_huge_page, but not in order to avoid fragmentation.

btw, thinking about the fragmentation generated by munmap(last 4k), I
also think I found a minor bug in munmap when a partial part of the 2M
page is unmapped (currently I'm afraid I'm dropping the whole 2M in
that case ;), but it's trivial to fix... clearly not many apps are
truncating 4k off a 2M mapping.