From: Christoph Lameter <cl@linux-foundation.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Chris Wright <chrisw@sous-sol.org>,
bpicco@redhat.com,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Arnd Bergmann <arnd@arndb.de>,
"Michael S. Tsirkin" <mst@redhat.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 00 of 34] Transparent Hugepage support #14
Date: Tue, 23 Mar 2010 12:06:59 -0500 (CDT)
Message-ID: <alpine.DEB.2.00.1003231200430.10178@router.home>
In-Reply-To: <20100322170619.GQ29874@random.random>

On Mon, 22 Mar 2010, Andrea Arcangeli wrote:
> > > The problem with O_DIRECT is that I couldn't use the mmu notifier to
> > > prevent it from taking the pin on the page, because there is no way to
> > > interrupt DMA synchronously before mmu_notifier_invalidate_* returns. So
> > > I had to add compound_lock, keep the gup API backwards compatible, and
> > > have the proper serialization happen _only_ for PageTail inside put_page.
> >
> > If you take a refcount *before* breaking up a 2M page, then you don't
> > have to fear the put_page.
>
> If you take it _before_, it will go into the head page regardless of
> which subpage gup returned. We need to know which subpages are under
> DMA: the pin has to go to the tail page or the head page depending on
> the physical address that was requested through gup. To fix this we
> would at the very least need to change the gup API to ask for
> hugepages, which it can't do right now because that would break all
> drivers.

A 2M page needs to be treated as a single page. "Under DMA" then has to
mean that the whole 2M page is considered under DMA. Sectioning off the
2M page causes all sorts of problems; the page state needs to be
complete in a single page struct.

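A toy user-space model may make the disagreement above concrete. It is a sketch with hypothetical names (toy_huge_*, not the kernel's struct page, get_page() or put_page()), assuming 4K subpages in a 2M compound page. While the page is compound, every pin is accounted in the head page; the per-subpage tail_pins counter stands in for the information Andrea says the split needs in order to hand each pin to the right 4K page. Under Christoph's proposal that counter would not exist and the whole 2M page would simply count as under DMA.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HPAGE_NR 512 /* 4K subpages per 2M page (x86-64 sizes assumed) */

/* Hypothetical toy model of one subpage; not the kernel's struct page. */
struct toy_page {
	int count;     /* references accounted to this page */
	int tail_pins; /* pins taken through this tail while still compound */
};

/* Hypothetical 2M compound page; sub[0] plays the head page. */
struct toy_huge {
	bool compound; /* true while still one 2M unit */
	struct toy_page sub[TOY_HPAGE_NR];
};

static void toy_huge_init(struct toy_huge *h)
{
	h->compound = true;
	for (int i = 0; i < TOY_HPAGE_NR; i++)
		h->sub[i] = (struct toy_page){ 0, 0 };
	h->sub[0].count = 1; /* the mapping's own reference */
}

/* Pin subpage idx, as a gup caller doing O_DIRECT would. */
static void toy_huge_pin(struct toy_huge *h, int idx)
{
	if (h->compound) {
		h->sub[0].count++;               /* accounted in the head page */
		if (idx != 0)
			h->sub[idx].tail_pins++; /* remember which subpage */
	} else {
		h->sub[idx].count++;
	}
}

/*
 * Drop a pin. In the patchset under discussion, compound_lock is what
 * serializes put_page() on a tail page against a concurrent split;
 * this single-threaded toy needs no lock.
 */
static void toy_huge_unpin(struct toy_huge *h, int idx)
{
	if (h->compound) {
		h->sub[0].count--;
		if (idx != 0)
			h->sub[idx].tail_pins--;
	} else {
		h->sub[idx].count--;
	}
}

/* Break into 512 independent 4K pages, moving each pin to its subpage. */
static void toy_huge_split(struct toy_huge *h)
{
	for (int i = 1; i < TOY_HPAGE_NR; i++) {
		h->sub[i].count += h->sub[i].tail_pins;
		h->sub[0].count -= h->sub[i].tail_pins;
		h->sub[i].tail_pins = 0;
	}
	h->compound = false;
	assert(h->sub[0].count >= 1); /* head keeps the mapping's reference */
}
```

For example, pinning subpage 7 and then splitting leaves the pin on sub[7] and only the mapping's reference on sub[0]. Without tail_pins the pin would have stayed on sub[0], and nothing would keep subpage 7 alive while the DMA is in flight.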
> Besides, even if we add an error retval, we can't have mprotect/mremap
> fail. At most, swapout could be deferred because "page cannot be broken
> up", but even that is risky, and I've been extra careful not to require
> any memory allocation or sleeping lock in split_huge_page, so that it is
> ideal for use in the swap path without risking any functional regression
> whatsoever.

We can sleep in mprotect and mremap and wait until the page has been
successfully broken up.

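The two positions differ in whether a split is allowed to fail. A minimal sketch, assuming a hypothetical toy_try_split() that refuses with -EBUSY while gup pins are outstanding, and a caller that may sleep (the mprotect/mremap case) and retries. The wait_fn hook stands in for sleeping until DMA completes; it is an assumption for illustration, not a kernel interface.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state of one 2M page; not a kernel structure. */
struct toy_thp {
	int pins;  /* outstanding gup pins, e.g. O_DIRECT in flight */
	int split; /* 1 once broken into 4K pages */
};

/* A split that is allowed to fail while DMA pins are outstanding. */
static int toy_try_split(struct toy_thp *p)
{
	if (p->pins > 0)
		return -EBUSY;
	p->split = 1;
	return 0;
}

/* Hypothetical stand-in for one DMA completion releasing a pin. */
static void toy_dma_complete(struct toy_thp *p)
{
	if (p->pins > 0)
		p->pins--;
}

/*
 * Caller that may sleep (the mprotect/mremap case): wait and retry
 * until the split succeeds. wait_fn models sleeping until some pin is
 * dropped. Returns how many times the caller had to wait.
 */
static int toy_split_sleeping(struct toy_thp *p,
			      void (*wait_fn)(struct toy_thp *))
{
	int waits = 0;

	while (toy_try_split(p) == -EBUSY) {
		wait_fn(p);
		waits++;
	}
	return waits;
}
```

If the pins are never dropped, the retry loop never finishes, so this shape only fits callers that can afford to block for as long as the DMA lasts. The swap path Andrea describes cannot, which is why his split_huge_page never fails and takes no sleeping lock.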
> Allowing it to fail would result in a mess. Obviously I wasn't clear
> enough in the last sentence of my previous mail, so I'll have to
> repeat: any effort spent handling the failure (which in some cases
> can't be handled at all, as syscalls can't fail just because a page is
> 2M) should instead be spent to _remove_ the split_huge_page call.

It's not advisable to do this. Splitting the huge page may surprise
another kernel function that is operating on the assumption that this
is a 2M page. If you do this, new synchronization methods are required.

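Christoph's point is that anything holding a "this is a 2M page" assumption must notice when the page is split underneath it. One possible shape for such a synchronization method is a sequence counter in the style of the kernel's seqcount_t. The sketch below is a user-space model with hypothetical names, using C11 atomics, and it assumes splits of a given page are serialized by some outer lock (single writer).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-huge-page state; not a kernel structure. */
struct toy_hpage {
	atomic_uint seq;  /* odd while a split is in progress */
	atomic_bool huge; /* still mapped as one 2M unit */
};

/* Start a read section, waiting out any split in progress. */
static unsigned toy_seq_begin(struct toy_hpage *h)
{
	unsigned s;

	while ((s = atomic_load_explicit(&h->seq, memory_order_acquire)) & 1)
		; /* splitter active: spin */
	return s;
}

/* True if a split ran since toy_seq_begin(): the 2M view is stale. */
static bool toy_seq_retry(struct toy_hpage *h, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&h->seq, memory_order_relaxed) != start;
}

/* Splitter; assumes splits of one page are serialized by an outer lock. */
static void toy_seq_split(struct toy_hpage *h)
{
	unsigned s = atomic_load_explicit(&h->seq, memory_order_relaxed);

	atomic_store_explicit(&h->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&h->huge, false, memory_order_relaxed);
	atomic_store_explicit(&h->seq, s + 2, memory_order_release);
}

/* Reader that must not return a stale "this is 2M" answer. */
static bool toy_seq_is_huge(struct toy_hpage *h)
{
	unsigned s;
	bool huge;

	do {
		s = toy_seq_begin(h);
		huge = atomic_load_explicit(&h->huge, memory_order_relaxed);
	} while (toy_seq_retry(h, s));
	return huge;
}
```

A caller that has to act on the 2M assumption, rather than just read it, would have to redo its work (or hold off the split entirely) whenever toy_seq_retry() reports a race; that extra machinery is the cost being argued about here.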
> > We already have 2M pmd handling in the kernel and can consider huge pmd
> > entries while walking the page tables! Go incrementally and use what
> > is there.
>
> There's no such thing unless you're talking about the hugetlbfs paths. In

Indeed, I am.

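Roughly, a hugetlbfs-style walk checks at the pmd level whether the entry maps a 2M page directly and, if so, handles it as one unit instead of descending into a pte table. A toy two-level walker sketching that idea, with hypothetical types (the kernel's real pgd/pud/pmd/pte helpers are not modeled) and page-aligned addresses assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_PTRS      512                         /* entries per table */
#define TOY_PMD_SIZE  (TOY_PAGE_SIZE * TOY_PTRS)  /* 2 MiB */

/* Hypothetical pmd entry: either a huge leaf or a pointer to ptes. */
struct toy_pmd {
	int huge;             /* 1: maps TOY_PMD_SIZE directly */
	const int *pte_table; /* TOY_PTRS present flags when !huge */
};

/*
 * Walk [start, end) over an array of pmds mapping from address 0 and
 * return how many leaf entries were touched. A huge pmd counts as one
 * entry; a regular pmd costs one visit per present pte in range.
 */
static size_t toy_walk(const struct toy_pmd *pmds, uintptr_t start,
		       uintptr_t end)
{
	size_t visited = 0;

	for (uintptr_t addr = start; addr < end; ) {
		const struct toy_pmd *pmd = &pmds[addr / TOY_PMD_SIZE];
		uintptr_t next = (addr / TOY_PMD_SIZE + 1) * TOY_PMD_SIZE;

		if (next > end)
			next = end;
		if (pmd->huge) {
			visited++; /* whole 2M unit in one step */
		} else if (pmd->pte_table) {
			for (uintptr_t a = addr; a < next; a += TOY_PAGE_SIZE)
				visited += pmd->pte_table[(a / TOY_PAGE_SIZE) % TOY_PTRS] != 0;
		}
		addr = next;
	}
	return visited;
}
```

With one huge pmd followed by one fully populated pte table, a walk over the 4M range touches 1 + 512 entries.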
> Best of all, I had to add zero atomic ops and just 1 branch in the
> already-hot L1 cache (and no writes to the L1 cache either, just 1 more
> read) in order to add the pagefault slow path for huge pmds. So unless
> you actively take advantage of hugepages, the page_table_lock locking
> will be zero cost, and in the future nothing prevents us from adding a
> more scalable PMD lock like the one that exists for the pte (but keep in
> mind it's 512 times less important for the PMD than it is for the PTE).
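Where the factor of 512 comes from, assuming x86-64's 4K base pages and 2M huge pages: one PMD entry maps as much memory as a full table of PTEs.

```latex
\frac{\text{memory mapped by one PMD entry}}{\text{memory mapped by one PTE}}
  = \frac{2\,\mathrm{MiB}}{4\,\mathrm{KiB}}
  = \frac{2^{21}}{2^{12}} = 2^{9} = 512
```

So populating a 2M region through a huge pmd takes one fault, and one lock round trip, where 4K pages would take up to 512; that is the sense in which a finer-grained PMD lock matters far less than the one for PTEs.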

It's not much use to have fake 2M pages that can splinter beneath you at
any time. To take full advantage of huge pages, you need to be able to
do VM operations on them, and for that they need to be treated as a
single unit.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .