From: Christoph Lameter <cl@linux-foundation.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: RFC: Transparent Hugepage support
Date: Thu, 29 Oct 2009 14:55:08 -0400 (EDT)
Message-ID: <alpine.DEB.1.10.0910291451240.18197@V090114053VZO-1>
In-Reply-To: <20091027182109.GA5753@random.random>
On Tue, 27 Oct 2009, Andrea Arcangeli wrote:
> Agreed, migration is important on numa systems as much as swapping is
> important on regular hosts, and this patch allows both in the very
> same way with a few liner addition (that is a noop and doesn't modify
> the kernel binary when CONFIG_TRANSPARENT_HUGEPAGE=N). The hugepages
> in this patch should already relocatable just fine with move_pages (I
> say "should" because I didn't test move_pages yet ;).
Another NUMA issue is how MPOL_INTERLEAVE would work with this.
MPOL_INTERLEAVE spreads a sequence of pages over a series of nodes. If you
coalesce those pages into one huge page then that spreading can no longer
be done.
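To make the interleave concern concrete, here is a rough userspace sketch,
not kernel code. It assumes a simplified round-robin placement (node = page
index modulo node count), 4 KiB base pages, and a 2 MiB huge page. The
names interleave_node() and nodes_touched() are hypothetical helpers
invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SIZE (4UL * 1024)         /* assumed 4 KiB base page */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)  /* assumed 2 MiB huge page */
#define MAX_NODES 64                        /* arbitrary limit for this sketch */

/* Hypothetical, simplified interleave: pick the node round-robin by page index. */
static unsigned int interleave_node(size_t page_index, unsigned int nr_nodes)
{
    assert(nr_nodes > 0);
    return (unsigned int)(page_index % nr_nodes);
}

/*
 * Count the distinct nodes that back a range of `len` bytes when the range
 * is populated with pages of `page_size` bytes under the policy above.
 */
static unsigned int nodes_touched(size_t len, size_t page_size,
                                  unsigned int nr_nodes)
{
    unsigned char seen[MAX_NODES] = { 0 };
    unsigned int distinct = 0;
    size_t nr_pages;

    assert(page_size > 0 && nr_nodes > 0 && nr_nodes <= MAX_NODES);
    nr_pages = (len + page_size - 1) / page_size;

    for (size_t i = 0; i < nr_pages; i++) {
        unsigned int node = interleave_node(i, nr_nodes);

        if (!seen[node]) {
            seen[node] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With four nodes, nodes_touched(HUGE_PAGE_SIZE, BASE_PAGE_SIZE, 4) covers all
four nodes, while nodes_touched(HUGE_PAGE_SIZE, HUGE_PAGE_SIZE, 4) lands on a
single node, which is the loss of spreading described above.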
> > Wont you be running into issues with page dirtying on that level?
>
> Not sure I follow what the problem should be. At the moment when
> pmd_trans_huge is true, the dirty bit is meaningless (hugepages at the
> moment are splitted in place into regular pages before they can be
> converted to swapcache, only after an hugepage becomes swapcache its
> dirty bit on the pte becomes meaningful to handle the case of an
> exclusive swapcache mapped writeable into a single pte and marked
> clean to be able to swap it out at zerocost if memory pressure returns
> and to avoid a cow if the page is written to before it is paged out
> again), but the accessed bit is already handled just fine at the pmd
> level.
That may not be a problem as long as you don't allow fs operations on
these pages.
> > Those also had fall back logic to 4k. Does this scheme also allow I/O with
>
> Well maybe I remember your patches wrong, or I might have not followed
> later developments but I was quite sure to remember when we discussed
> it, the reason of the -EIO failure was the fs had softblocksize bigger
> than 4k... and in general fs can't handle blocksize bigger than the
> PAGE_CACHE_SIZE... In effect the core trouble wasnt' the large
> pagecache but the fact the fs wanted a blocksize larger than
> PAGE_SIZE, despite not being able to handle it, if the block was
> splitted in multiple 4k not contiguous areas.
The patches modified the page cache logic to determine the page size from
the page structs.
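As a rough illustration of deriving the size from the page struct, the
sketch below uses a stand-in structure that records an allocation order and
computes the size as the base page size shifted by that order. struct
fake_page, fake_page_size() and the 4 KiB base page are assumptions made
for this example only. They are not the code from those patches.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SHIFT 12  /* assumed 4 KiB base page */

/* Hypothetical stand-in for a page struct that records its order. */
struct fake_page {
    unsigned int order;     /* 0 = one base page, 9 = 512 base pages */
};

/* Size in bytes of the (possibly compound) page described by `page`. */
static size_t fake_page_size(const struct fake_page *page)
{
    assert(page != NULL);
    return (size_t)1 << (BASE_PAGE_SHIFT + page->order);
}
```

Code written this way asks each page for its size instead of assuming a
fixed constant, so an order-9 page reports 2 MiB and an order-0 page
reports 4 KiB.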
> > I dont get the point of this. What do you mean by "an operation that
> > cannot fail"? Atomic section?
>
> In short I mean it cannot return -ENOMEM (and an additional bonus is
> that I managed it not to require scheduling or blocking
> operations). The idea is that you can plug it anywhere with a one
> liner and your code becomes hugepage compatible (sure it would run
> faster if you were to teach to your code to handle pmd_trans_huge
> natively but we can't do it all at once :).
We need to know some more detail about the conversion.