From: Andrea Arcangeli <aarcange@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: RFC: Transparent Hugepage support
Date: Tue, 3 Nov 2009 12:18:29 +0100
Message-ID: <20091103111829.GJ11981@random.random>
In-Reply-To: <1257024567.7907.17.camel@pasglop>

On Sun, Nov 01, 2009 at 08:29:27AM +1100, Benjamin Herrenschmidt wrote:
> This isn't possible on all architectures. Some archs have "segment"
> constraints which mean only one page size per such "segment". Server
> ppc's for example (segment size being either 256M or 1T depending on the
> CPU).

Hmm, 256M is already too large for a transparent allocation. It will
require reservation, and hugetlbfs actually seems a perfect fit for
this hardware limitation. The software limits of hugetlbfs match the
hardware limit perfectly, and it already provides all the permission
and reservation features needed to deal with extremely huge page
sizes that probabilistically would never be found in the buddy
allocator (even if we extended it to make them not impossible). Those
are hugely expensive to defrag dynamically even if we could [and we
can't hope to defrag many of those because of slab]. Just in case
it's not obvious: the probability that we can defrag degrades
exponentially as the hugepage size increases (which also means 256M
is already orders of magnitude more realistic to function than 1G).
Clearly if we change slab to allocate through a front allocator in
256M chunks our probability increases substantially, but for
something realistic there has to be at minimum a factor of ~10000
between the hugepage size and the total RAM size. I.e. if a 2M page
makes some probabilistic sense with slab front-allocating 2M pages on
a 64G system, then for 256M pages to make equivalent sense the system
would require a minimum of 8 terabytes of RAM, and 1G pages would
require 32 terabytes (with bigger overhead and trouble on top,
considering some allocations would still happen in 4k ptes, and the
fixed overhead of relocating those 4k ranges would be much bigger
when the hugepage size is far larger than 2M while the regular page
size is still 4k).
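
(To make the arithmetic above concrete, here is a quick userspace
sketch, nothing kernel-specific; the 32768 ratio is simply 64G/2M:)

  /*
   * Back-of-the-envelope check of the scaling argued above: keep the
   * same hugepagesize-to-RAM ratio as 2M pages on a 64G system
   * (64G / 2M = 32768, i.e. "order of 10000").
   */
  #include <stdio.h>

  int main(void)
  {
          const unsigned long long ratio = (64ULL << 30) / (2ULL << 20);
          const unsigned long long hpage[] = {
                  2ULL << 20, 256ULL << 20, 1ULL << 30,
          };

          for (int i = 0; i < 3; i++)
                  printf("%4lluM hugepages -> need >= %llu GiB of RAM\n",
                         hpage[i] >> 20, hpage[i] * ratio >> 30);
          return 0;
  }

This prints 64 GiB for 2M pages, 8192 GiB (8T) for 256M pages and
32768 GiB (32T) for 1G pages, matching the numbers above.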

> > The most important design choice is: always fallback to 4k allocation
> > if the hugepage allocation fails! This is the _very_ opposite of some
> > large pagecache patches that failed with -EIO back then if a 64k (or
> > similar) allocation failed...
> 
> Precisely because the approach cannot work on all architectures ?

I thought the main reason for those patches was to allow a fs
blocksize bigger than PAGE_SIZE; a PAGE_CACHE_SIZE of 64k would allow
a 64k fs blocksize without many fs changes. But yes, if the mmu can't
fall back, then software can't fall back either, and that impedes the
transparent design on those architectures... To me hugetlbfs looks
like the best you can get on those mmus.
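
(As an aside, the fallback rule itself is trivial to sketch in plain
C; all helper names below are made up for illustration, this is not
the actual fault path:)

  #include <stdio.h>
  #include <stdlib.h>

  #define HPAGE_SIZE     (2UL << 20)
  #define BASE_PAGE_SIZE (4UL << 10)

  /* Stand-in for a hugepage allocator that fails under fragmentation. */
  static void *alloc_hugepage(void)
  {
          return NULL; /* pretend the buddy has no free 2M block */
  }

  /*
   * The design choice from the RFC: try the hugepage first, and if
   * that allocation fails, transparently fall back to a base page
   * instead of failing the fault with an error like -EIO.
   */
  static void *fault_in(size_t *mapped)
  {
          void *p = alloc_hugepage();

          if (p) {
                  *mapped = HPAGE_SIZE;
                  return p;
          }
          *mapped = BASE_PAGE_SIZE; /* fallback, never an error */
          return malloc(BASE_PAGE_SIZE);
  }

  int main(void)
  {
          size_t mapped;
          void *p = fault_in(&mapped);

          printf("mapped %zu bytes at %p\n", mapped, p);
          free(p);
          return 0;
  }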



Thread overview: 37+ messages
2009-10-26 18:51 Andrea Arcangeli
2009-10-27 15:41 ` Rik van Riel
2009-10-27 18:18 ` Andi Kleen
2009-10-27 19:30   ` Andrea Arcangeli
2009-10-28  4:28     ` Andi Kleen
2009-10-28 12:00       ` Andrea Arcangeli
2009-10-28 14:18         ` Andi Kleen
2009-10-28 14:54           ` Adam Litke
2009-10-28 15:13             ` Andi Kleen
2009-10-28 15:30               ` Andrea Arcangeli
2009-10-29 15:59             ` Dave Hansen
2009-10-31 21:32             ` Benjamin Herrenschmidt
2009-10-28 15:48           ` Andrea Arcangeli
2009-10-28 16:03             ` Andi Kleen
2009-10-28 16:22               ` Andrea Arcangeli
2009-10-28 16:34                 ` Andi Kleen
2009-10-28 16:56                   ` Adam Litke
2009-10-28 17:18                     ` Andi Kleen
2009-10-28 19:04                   ` Andrea Arcangeli
2009-10-28 19:22                     ` Andrea Arcangeli
2009-10-29  9:43       ` Ingo Molnar
2009-10-29 10:36         ` Andrea Arcangeli
2009-10-29 16:50           ` Mike Travis
2009-10-30  0:40           ` KAMEZAWA Hiroyuki
2009-11-03 10:55             ` Andrea Arcangeli
2009-11-04  0:36               ` KAMEZAWA Hiroyuki
2009-10-29 12:54     ` Andrea Arcangeli
2009-10-27 20:42 ` Christoph Lameter
2009-10-27 18:21   ` Andrea Arcangeli
2009-10-27 20:25     ` Chris Wright
2009-10-29 18:51       ` Christoph Lameter
2009-11-01 10:56         ` Andrea Arcangeli
2009-10-29 18:55     ` Christoph Lameter
2009-10-31 21:29 ` Benjamin Herrenschmidt
2009-11-03 11:18   ` Andrea Arcangeli [this message]
2009-11-03 19:10     ` Dave Hansen
2009-11-04  4:10     ` Benjamin Herrenschmidt
