Date: Thu, 29 Oct 2009 13:54:35 +0100
From: Andrea Arcangeli
Subject: Re: RFC: Transparent Hugepage support
Message-ID: <20091029125435.GM9640@random.random>
References: <20091026185130.GC4868@random.random>
 <87ljiwk8el.fsf@basil.nowhere.org>
 <20091027193007.GA6043@random.random>
In-Reply-To: <20091027193007.GA6043@random.random>
To: Andi Kleen
Cc: linux-mm@kvack.org, Marcelo Tosatti, Adam Litke, Avi Kivity,
 Izik Eidus, Hugh Dickins, Nick Piggin, Andrew Morton

On Tue, Oct 27, 2009 at 08:30:07PM +0100, Andrea Arcangeli wrote:
> generated by mnmap(last4k) I also think I found a minor bug in munmap
> if a partial part of the 2M page is unmapped (currently I'm afraid I'm

Here is the incremental fix, in case somebody is running the patch
(this also frees the page after the tlb flush; during development I had
to call put_page to debug something and I forgot to switch it back to
tlb_remove_page ;).

In the meantime I'll try to split the patch into more self-contained
pieces for easier review (including the kvm patch to build 3-level
NPT/EPT). Then I will continue working on the daemon that will
collapse_huge_page in the madvise(MADV_HUGEPAGE) regions (a small
userspace sketch of flagging such a region follows after the patch).

The test program below printed 0 before; now it prints 0xff, as it does
with transparent hugepage disabled.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define SIZE (2*1024*1024)

int main()
{
	char *p = malloc(SIZE*2-1);
	p = (char *)((unsigned long)(p + SIZE - 1) & ~(SIZE-1));
	*p = 0xff;
	munmap(p+SIZE-4096, 4096);
	printf("%x\n", *(unsigned char *)p);
	return 0;
}

Signed-off-by: Andrea Arcangeli
---
diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -850,6 +850,9 @@ extern struct page *follow_trans_huge_pm
 					   unsigned long addr,
 					   pmd_t *pmd,
 					   unsigned int flags);
+extern int zap_pmd_trans_huge(struct mmu_gather *tlb,
+			      struct vm_area_struct *vma,
+			      pmd_t *pmd);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			  pmd_t *dst_pmd, pmd_t *src_pmd,
diff --git a/mm/Kconfig b/mm/Kconfig
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -292,8 +292,9 @@ config NOMMU_INITIAL_TRIM_EXCESS
 	  See Documentation/nommu-mmap.txt for more information.
 
 config TRANSPARENT_HUGEPAGE
-	bool "Transparent Hugepage support"
+	bool "Transparent Hugepage support" if EMBEDDED
 	depends on X86_64
+	default y
 	help
 	  Transparent Hugepages allows the kernel to use huge pages and
 	  huge tlb transparently to the applications whenever possible.
@@ -302,4 +303,4 @@ config TRANSPARENT_HUGEPAGE
 	  allocation, by reducing the number of tlb misses and by speeding
 	  up the pagetable walking.
 
-	  If unsure, say N.
+	  If memory constrained on embedded, you may want to say N.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
@@ -374,3 +375,37 @@ struct page *follow_trans_huge_pmd(struc
 out:
 	return page;
 }
+
+int zap_pmd_trans_huge(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		       pmd_t *pmd)
+{
+	int ret = 0;
+
+	spin_lock(&tlb->mm->page_table_lock);
+	if (likely(pmd_trans_huge(*pmd))) {
+		if (unlikely(pmd_trans_frozen(*pmd))) {
+			spin_unlock(&tlb->mm->page_table_lock);
+			wait_split_huge_page(vma->anon_vma,
+					     pmd);
+		} else {
+			struct page *page;
+			pgtable_t pgtable;
+			pgtable = get_pmd_huge_pte(tlb->mm);
+			page = pfn_to_page(pmd_pfn(*pmd));
+			VM_BUG_ON(!PageCompound(page));
+			pmd_clear(pmd);
+			spin_unlock(&tlb->mm->page_table_lock);
+			page_remove_rmap(page);
+			VM_BUG_ON(page_mapcount(page) < 0);
+			add_mm_counter(tlb->mm, anon_rss,
+				       -1<<(HPAGE_SHIFT-
+					    PAGE_SHIFT));
+			tlb_remove_page(tlb, page);
+			pte_free(tlb->mm, pgtable);
+			ret = 1;
+		}
+	} else
+		spin_unlock(&tlb->mm->page_table_lock);
+
+	return ret;
+}
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -913,32 +913,12 @@ static inline unsigned long zap_pmd_rang
 	do {
 		next = pmd_addr_end(addr, end);
 		if (pmd_trans_huge(*pmd)) {
-			spin_lock(&tlb->mm->page_table_lock);
-			if (likely(pmd_trans_huge(*pmd))) {
-				if (unlikely(pmd_trans_frozen(*pmd))) {
-					spin_unlock(&tlb->mm->page_table_lock);
-					wait_split_huge_page(vma->anon_vma,
-							     pmd);
-				} else {
-					struct page *page;
-					pgtable_t pgtable;
-					pgtable = get_pmd_huge_pte(tlb->mm);
-					page = pfn_to_page(pmd_pfn(*pmd));
-					VM_BUG_ON(!PageCompound(page));
-					pmd_clear(pmd);
-					spin_unlock(&tlb->mm->page_table_lock);
-					page_remove_rmap(page);
-					VM_BUG_ON(page_mapcount(page) < 0);
-					add_mm_counter(tlb->mm, anon_rss,
-						       -1<<(HPAGE_SHIFT-
-							    PAGE_SHIFT));
-					put_page(page);
-					pte_free(tlb->mm, pgtable);
-					(*zap_work)--;
-					continue;
-				}
-			} else
-				spin_unlock(&tlb->mm->page_table_lock);
+			if (next-addr != HPAGE_SIZE)
+				split_huge_page_vma(vma, pmd);
+			else if (zap_pmd_trans_huge(tlb, vma, pmd)) {
+				(*zap_work)--;
+				continue;
+			}
 			/* fall through */
 		}
 		if (pmd_none_or_clear_bad(pmd)) {
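For reference, here is a minimal userspace sketch (not part of the patch
above) of how an application could flag a 2M-aligned anonymous region
for the planned collapse daemon via madvise(MADV_HUGEPAGE). It assumes
the MADV_HUGEPAGE advice value gets exported to userspace headers; the
fallback definition of 14 below is an assumption, matching the value
mainline later adopted.

/*
 * Sketch only: flag a 2M-aligned anonymous region as a hugepage
 * candidate. MADV_HUGEPAGE fallback value is an assumption.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14	/* assumed advice value, as later used in mainline */
#endif

#define SIZE (2*1024*1024)

int main(void)
{
	void *p;

	/* get a 2M-aligned anonymous allocation so a whole pmd can map it */
	if (posix_memalign(&p, SIZE, SIZE)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}

	/* advise the kernel that this range is a hugepage candidate */
	if (madvise(p, SIZE, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* touch the region so it is actually faulted in */
	memset(p, 0xff, SIZE);
	printf("%x\n", *(unsigned char *)p);
	return 0;
}

Built with a plain gcc invocation this prints ff like the test above;
whether the region actually ends up backed by a huge page depends on
the collapse daemon and on the hugepage allocation succeeding.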