From: Andi Kleen <andi@firstfloor.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>,
linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: RFC: Transparent Hugepage support
Date: Wed, 28 Oct 2009 05:28:05 +0100
Message-ID: <20091028042805.GJ7744@basil.fritz.box>
In-Reply-To: <20091027193007.GA6043@random.random>
On Tue, Oct 27, 2009 at 08:30:07PM +0100, Andrea Arcangeli wrote:
Hi Andrea,
> On Tue, Oct 27, 2009 at 07:18:26PM +0100, Andi Kleen wrote:
> > In general the best would be to just merge hugetlbfs into
> > the normal VM. It has been growing for far too long as a separate
> > "second VM" by now. This seems like a reasonable first step,
> > but some comments below.
>
> Problem is hugetlbfs as it stands now can't be merged... it
> deliberately takes its own paths and it tries to be as far away from
> the VM as possible. But as you said, as people tries to make hugetlbfs
I think longer term the standard VM just needs to understand
huge pages properly. Originally when huge pages were only
considered an "Oracle hack" the separation made sense, but now
that they see more and more use, that is really no longer true.
Also hugetlbfs is gaining more and more functionality all the time.
Maintaining two VMs in parallel forever seems like the wrong
thing to do.
Also the fragmentation avoidance heuristics have gotten a lot better
in the last few years, so it's much more practical than it used to be
(at least for 2MB).
> > The problem is that this will interact badly with 1GB pages -- once
> > you split them up you'll never get them back, because they
> > can't be allocated at runtime.
>
> 1GB pages can't be handled by this code, and clearly it's not
> practical to hope 1G pages to materialize in the buddy (even if we
That seems short-sighted. You do this because 2MB pages give you an
x% performance advantage, but then it's likely that 1GB pages will give
another y% improvement, and why should people stop at the smaller
improvement?
Ignoring the gigantic pages now would just mean that this
would need to be revised again later, or that users would still
need to use hacks like libhugetlbfs.
Granted, 1GB pages will for a time be harder to use at the system
administrator level, but for applications the interfaces
should at least be similar.
> were to increase the buddy so much slowing it down regular page
> allocation). Let's forget 1G pages here... we're only focused on sizes
> that can be allocated dynamically. Main problem are the 64k pages or
> such that don't fit into a pmd...
What 64k pages? Are you talking about soft pages or non-x86?
>
> > Even for 2MB pages it can be a problem.
> >
> > You'll likely need to fix the page table code.
>
> In terms of fragmentation split_huge_page itself won't create
> it.. unless it swaps (but then CPU performance is lost on the mapping
> anyway).
The problem is that the performance will be lost forever. So if
you ever do something that only does a little temporary
swapping (like a backup run) you would need a reboot to get it back.
Not good.
> We need to teach mprotect/mremap not to call split_huge_page
> true, but not to avoid fragmentation. btw, thinking at fragmentation
I think they just have to be fixed properly.
My suspicion is btw that there's some more code sharing possible
in all that VMA handling code of the different system calls
(I remember thinking that when I wrote mbind() :-). Then perhaps
variable page size support would be easier anyway, because less code
needs to be changed.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.