From: Andrea Arcangeli <aarcange@redhat.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: RFC: Transparent Hugepage support
Date: Wed, 28 Oct 2009 17:22:06 +0100
Message-ID: <20091028162206.GG9640@random.random>
In-Reply-To: <20091028160352.GS7744@basil.fritz.box>

On Wed, Oct 28, 2009 at 05:03:52PM +0100, Andi Kleen wrote:
> It's still a big step between just needing reservation and also
> hacking the application to use new interfaces.

The word "transparent" is all about "no need of hacking the
application" because "there is no new interface".

I want to keep it as transparent as possible and to defer adding
user-visible interfaces initially (with the exception of MADV_HUGEPAGE,
the equivalent of MADV_MERGEABLE for the scan daemon). Even
MADV_HUGEPAGE might not be necessary, and even the global
disable/enable flag may not be necessary, but that is the absolute
minimum tuning that seems useful, so there's not much risk of it
becoming obsolete.

> doesn't fully initially. That is why I objected earlier -- the design
> doesn't seem to support them.

I think it supports them once you solve the reservation and your
hinting, and you add pud_trans_huge.

> A global sysctl seems like a quite clumsy way to do that. I hope
> it would be possible to do better even with relatively simple code.

btw, the sysctl has to be moved to sysfs. The same sysfs directory
will also control the background collapse_huge_page daemon.
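
As a rough illustration of what a sysfs knob looks like from userland,
here is a minimal sketch. It assumes the path and the "always [madvise]
never" bracket format that current mainline kernels expose; neither is
fixed by anything in this thread, and the helper names are made up for
the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: location used by current mainline kernels, not by this RFC. */
#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"

/*
 * Copy the bracketed choice from a line such as "always [madvise] never"
 * into out. Returns 0 on success, -1 if there is no bracketed token or it
 * does not fit in outlen bytes (including the terminating NUL).
 */
int thp_selected(const char *line, char *out, size_t outlen)
{
	assert(line != NULL && out != NULL && outlen > 0);

	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	if (!close)
		return -1;

	size_t n = (size_t)(close - open - 1);
	if (n + 1 > outlen)
		return -1;

	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the active mode from sysfs; -1 if the file is missing or unreadable. */
int thp_current_mode(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen(THP_ENABLED_PATH, "r");
	if (!f)
		return -1;

	char *got = fgets(line, sizeof(line), f);
	fclose(f);
	return got ? thp_selected(line, out, outlen) : -1;
}
```

A caller would pass a small buffer to thp_current_mode() and treat -1 as
"no transparent hugepage support visible on this kernel".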

> e.g. a per process flag + prctl wouldn't seem to be particularly complicated.

You realize we can add those _interfaces_ later, _after_ adding
pud_trans_huge? I don't even want to add pud_trans_huge right
now. Adding the interfaces now would force us to be sure we get them
right. I don't even want to think about it.

Let's defer any user-visible interface that isn't strictly necessary
until _later_. Anything 1G pages need can be deferred too.

> If there's a per process "use pre-reservation" policy that logic
> could well be shared for 2MB and 1GB.

We don't want to have to reserve. Yes, we could reserve, but we don't
want to. We want to tell the kernel, with madvise, which regions have
to be scanned to recreate 2M pages, but that's about it.
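
To make the "tell the kernel which regions" part concrete, here is a
minimal userland sketch. It assumes MADV_HUGEPAGE with the value that
mainline Linux headers carry today and a 2M huge page size as on
x86-64; the function names are hypothetical and nothing here is taken
from the patch itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14 /* value in mainline Linux headers */
#endif

#define HPAGE_SIZE ((size_t)2 << 20) /* assumed 2M huge page size */

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/*
 * Map len bytes (rounded up to a multiple of 2M) of anonymous memory at a
 * 2M-aligned address and hint that the range is worth backing with huge
 * pages. Returns the aligned start, or NULL if mmap() failed. *hint_rc
 * receives madvise()'s return value: the hint is advisory, so a failure
 * (for instance on a kernel without the feature) leaves the mapping usable.
 * The unaligned head and tail of the mapping are simply left in place.
 */
void *map_hinted_region(size_t len, int *hint_rc)
{
	size_t span = align_up(len, HPAGE_SIZE);

	/* Over-allocate by one huge page so an aligned window always fits. */
	char *raw = mmap(NULL, span + HPAGE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	char *start = (char *)align_up((uintptr_t)raw, HPAGE_SIZE);
	*hint_rc = madvise(start, span, MADV_HUGEPAGE);
	return start;
}
```

The application otherwise uses the memory exactly as before, which is
the point: the hint is one call on a region, not a new allocation
interface.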

Nothing prevents us from adding an interface to reserve later, which
obviously will be mandatory for 1G pages to ever be allocated. It's
not something we need to solve now, I think.

> I don't think there's much (anything?) in 1GB support that's absolutely
> useless for 2M. e.g. a flexible reservation policy is certainly not.

I don't see KVM ever using this reservation hint, nor glibc. So
yes, you may have a corner case, but for the actual users of
transparent hugepages it seems entirely useless to me in the long
run. I may be wrong, but because this is a new interface, and
transparent hugepages is all about _not_ having to modify the app at
all, we had better focus on ensuring that MADV_HUGEPAGE fits 1G
collapse_huge_page collapsing later (yeah, assuming 1G pages become
available and that you can hang all apps using that data for as long
as copy_page(1g) takes).
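
To put the copy_page(1g) remark in numbers, here is a small arithmetic
sketch. It assumes 4K base pages with 2M and 1G huge pages as on
x86-64; the macro and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SHIFT 12 /* 4K base pages, assumed */
#define PMD_HPAGE_SHIFT 21 /* 2M huge page */
#define PUD_HPAGE_SHIFT 30 /* 1G huge page */

/*
 * Number of base pages that have to be gathered and copied to build one
 * huge page of size (1 << hpage_shift).
 */
unsigned long base_pages_per_hpage(unsigned int hpage_shift)
{
	assert(hpage_shift >= BASE_PAGE_SHIFT &&
	       hpage_shift < 8 * sizeof(unsigned long));
	return 1UL << (hpage_shift - BASE_PAGE_SHIFT);
}
```

Under these assumptions a 2M collapse copies 512 base pages and a 1G
collapse copies 262144, so the window during which the data is
unavailable grows by a factor of 512.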

The whole point of ignoring 1G pages is that we know adding
pud_trans_huge later is no problem, and that it'll require userland
changes that we want to defer, as it's an orthogonal problem, even if
it might remotely help some corner case using transparent hugepages.

> When the performance improvement is visible enough people will
> feel the need to reboot and the practical effect will be that
> Linux requires reboots for full performance.

So you think the collapse_huge_page daemon will not be enough? How
could it not be enough? If it's not enough, it simply means the defrag
logic isn't smart enough. So nothing we do in this patch can make a
difference to whether a reboot is needed or not. In short, your
worry about "needing to reboot" has nothing to do with the code we're
discussing but with the ability of the VM to generate hugepages. The
collapse_huge_page daemon will do the necessary things if hugepages
are made available without a reboot. Yes, defrag is another thing to
solve, but it can be addressed separately and in parallel with this.
