From: Andrea Arcangeli <aarcange@redhat.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: RFC: Transparent Hugepage support
Date: Thu, 29 Oct 2009 13:54:35 +0100
Message-ID: <20091029125435.GM9640@random.random>
In-Reply-To: <20091027193007.GA6043@random.random>
On Tue, Oct 27, 2009 at 08:30:07PM +0100, Andrea Arcangeli wrote:
> generated by munmap(last4k) I also think I found a minor bug in munmap
> if a partial part of the 2M page is unmapped (currently I'm afraid I'm
Here is the incremental fix in case somebody is running the patch (this
also frees the page after the TLB flush; during development I had to
call put_page to debug something and forgot to change it back to
tlb_remove_page ;). In the meantime I'll try to split the patch into
more self-contained pieces for easier review (including the KVM patch
to build 3-level NPT/EPT).

Then I will continue working on the daemon that will collapse_huge_page
in the madvise(MADV_HUGEPAGE) regions.
The test program below printed 0 before the fix; now it prints 0xff,
as it does with transparent hugepage support disabled.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SIZE (2*1024*1024)

int main()
{
	/* over-allocate so the pointer can be rounded up to a 2M boundary */
	char *p = malloc(SIZE*2-1);
	p = (char *)((unsigned long)(p + SIZE - 1) & ~(SIZE-1));
	*p = 0xff;
	/* unmap only the last 4k of the 2M region backing the huge page */
	munmap(p+SIZE-4096, 4096);
	/* must still read back 0xff; the buggy code printed 0 here */
	printf("%x\n", *(unsigned char *)p);
	return 0;
}
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -850,6 +850,9 @@ extern struct page *follow_trans_huge_pm
unsigned long addr,
pmd_t *pmd,
unsigned int flags);
+extern int zap_pmd_trans_huge(struct mmu_gather *tlb,
+ struct vm_area_struct *vma,
+ pmd_t *pmd);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd_t *dst_pmd, pmd_t *src_pmd,
diff --git a/mm/Kconfig b/mm/Kconfig
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -292,8 +292,9 @@ config NOMMU_INITIAL_TRIM_EXCESS
See Documentation/nommu-mmap.txt for more information.
config TRANSPARENT_HUGEPAGE
- bool "Transparent Hugepage support"
+ bool "Transparent Hugepage support" if EMBEDDED
depends on X86_64
+ default y
help
Transparent Hugepages allows the kernel to use huge pages and
huge tlb transparently to the applications whenever possible.
@@ -302,4 +303,4 @@ config TRANSPARENT_HUGEPAGE
allocation, by reducing the number of tlb misses and by speeding
up the pagetable walking.
- If unsure, say N.
+ If your system is memory constrained (e.g. embedded), you may want to say N.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -12,6 +12,7 @@
#include <linux/mmu_notifier.h>
#include <linux/rmap.h>
#include <linux/swap.h>
+#include <asm/tlb.h>
#include <asm/pgalloc.h>
#include "internal.h"
@@ -374,3 +375,37 @@ struct page *follow_trans_huge_pmd(struc
out:
return page;
}
+
+int zap_pmd_trans_huge(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ pmd_t *pmd)
+{
+ int ret = 0;
+
+ spin_lock(&tlb->mm->page_table_lock);
+ if (likely(pmd_trans_huge(*pmd))) {
+ if (unlikely(pmd_trans_frozen(*pmd))) {
+ spin_unlock(&tlb->mm->page_table_lock);
+ wait_split_huge_page(vma->anon_vma,
+ pmd);
+ } else {
+ struct page *page;
+ pgtable_t pgtable;
+ pgtable = get_pmd_huge_pte(tlb->mm);
+ page = pfn_to_page(pmd_pfn(*pmd));
+ VM_BUG_ON(!PageCompound(page));
+ pmd_clear(pmd);
+ spin_unlock(&tlb->mm->page_table_lock);
+ page_remove_rmap(page);
+ VM_BUG_ON(page_mapcount(page) < 0);
+ add_mm_counter(tlb->mm, anon_rss,
+ -1<<(HPAGE_SHIFT-
+ PAGE_SHIFT));
+ tlb_remove_page(tlb, page);
+ pte_free(tlb->mm, pgtable);
+ ret = 1;
+ }
+ } else
+ spin_unlock(&tlb->mm->page_table_lock);
+
+ return ret;
+}
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -913,32 +913,12 @@ static inline unsigned long zap_pmd_rang
do {
next = pmd_addr_end(addr, end);
if (pmd_trans_huge(*pmd)) {
- spin_lock(&tlb->mm->page_table_lock);
- if (likely(pmd_trans_huge(*pmd))) {
- if (unlikely(pmd_trans_frozen(*pmd))) {
- spin_unlock(&tlb->mm->page_table_lock);
- wait_split_huge_page(vma->anon_vma,
- pmd);
- } else {
- struct page *page;
- pgtable_t pgtable;
- pgtable = get_pmd_huge_pte(tlb->mm);
- page = pfn_to_page(pmd_pfn(*pmd));
- VM_BUG_ON(!PageCompound(page));
- pmd_clear(pmd);
- spin_unlock(&tlb->mm->page_table_lock);
- page_remove_rmap(page);
- VM_BUG_ON(page_mapcount(page) < 0);
- add_mm_counter(tlb->mm, anon_rss,
- -1<<(HPAGE_SHIFT-
- PAGE_SHIFT));
- put_page(page);
- pte_free(tlb->mm, pgtable);
- (*zap_work)--;
- continue;
- }
- } else
- spin_unlock(&tlb->mm->page_table_lock);
+ if (next-addr != HPAGE_SIZE)
+ split_huge_page_vma(vma, pmd);
+ else if (zap_pmd_trans_huge(tlb, vma, pmd)) {
+ (*zap_work)--;
+ continue;
+ }
/* fall through */
}
if (pmd_none_or_clear_bad(pmd)) {
--