Subject: Re: [PATCH] hugetlb: powerpc: Actively close unused htlb regions on vma close
From: Adam Litke
To: Christoph Lameter
Cc: linuxppc-dev@ozlabs.org, linux-mm@kvack.org, David Gibson, linux-kernel@vger.kernel.org
Date: Fri, 02 Jun 2006 15:57:20 -0500
Message-Id: <1149281841.9693.39.camel@localhost.localdomain>
In-Reply-To:
References: <1149257287.9693.6.camel@localhost.localdomain>

On Fri, 2006-06-02 at 13:06 -0700, Christoph Lameter wrote:
> On Fri, 2 Jun 2006, Adam Litke wrote:
>
> > The following patch introduces an architecture-specific vm_ops.close()
> > hook.  For all architectures besides powerpc, this is a no-op.  On
> > powerpc, the low and high segments are scanned to locate empty hugetlb
> > segments which can be made available for normal mappings.  Comments?
>
> IA64 has similar issues and uses the hook suggested by Hugh.  However, we
> have a permanently reserved memory area.  I am a bit surprised about the
> need to make address space available for normal mappings.  Is this for
> 32-bit powerpc support?

I now have a working implementation using Hugh's suggestion and
incorporating some suggestions from David Hansen... (attaching for
reference).

The real reason I want to "close" hugetlb regions (even on 64-bit
platforms) is so a process can replace a previous hugetlb mapping with
normal pages when huge pages become scarce.  An example would be the
hugetlb morecore (malloc) feature in libhugetlbfs :)

[PATCH] powerpc: Close hugetlb regions when unmapping VMAs

On powerpc, each segment can contain pages of only one size.  When a
hugetlb mapping is requested, a segment is located and marked for use
with huge pages.  This is a uni-directional operation -- hugetlb
segments are never marked for use with normal pages again.  For
long-running processes that use a combination of normal and hugetlb
mappings, this behavior can unduly constrain the virtual address space.
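For illustration only (not part of the patch), here is a minimal
userspace sketch of the fallback scenario described above.  It assumes
hugetlbfs is mounted at /mnt/huge and a 16MB huge page size; both the
mount path and the page size are assumptions, so adjust them for your
system.

/* Sketch: replace a hugetlb mapping with normal pages when huge pages
 * become scarce.  /mnt/huge and the 16MB page size are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE	(16UL * 1024 * 1024)	/* assumed huge page size */

int main(void)
{
	void *p, *q;
	int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Map one huge page; on powerpc this marks the containing
	 * segment for huge pages. */
	p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap (huge)");
		return 1;
	}

	/* Huge pages become scarce: give the mapping back... */
	munmap(p, HPAGE_SIZE);

	/* ...and try to reuse the same address range with normal pages.
	 * Without something like close_hugetlb_areas() below, the segment
	 * stays reserved for huge pages and this request is expected to
	 * be refused. */
	q = mmap(p, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	printf("remap with normal pages: %s\n",
	       q == MAP_FAILED ? "failed" : "ok");

	close(fd);
	return 0;
}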
Changes since V1:
 * Modifications limited to arch-specific code (hugetlb_free_pgd_range)
 * Only scan segments covered by the range to be unmapped

Signed-off-by: Adam Litke

---
 hugetlbpage.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 49 insertions(+)

diff -upN reference/arch/powerpc/mm/hugetlbpage.c current/arch/powerpc/mm/hugetlbpage.c
--- reference/arch/powerpc/mm/hugetlbpage.c
+++ current/arch/powerpc/mm/hugetlbpage.c
@@ -52,6 +52,7 @@ typedef struct { unsigned long pd; } hugepd_t;
 
 #define hugepd_none(hpd)	((hpd).pd == 0)
 
+void close_hugetlb_areas(struct mm_struct *mm, unsigned long start, unsigned long end);
 
 static inline pte_t *hugepd_page(hugepd_t hpd)
 {
@@ -303,6 +304,8 @@ void hugetlb_free_pgd_range(struct mmu_g
 			continue;
 		hugetlb_free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
 	} while (pgd++, addr = next, addr != end);
+
+	close_hugetlb_areas((*tlb)->mm, start, end);
 }
 
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
@@ -518,6 +521,52 @@ int prepare_hugepage_range(unsigned long
 	return 0;
 }
 
+void close_hugetlb_areas(struct mm_struct *mm, unsigned long start,
+			 unsigned long end)
+{
+	unsigned long i;
+	struct slb_flush_info fi;
+	u16 inuse, hiflush = 0, loflush = 0, mask;
+
+	if (!mm)
+		return;
+
+	if (start < 0x100000000UL) {
+		mask = LOW_ESID_MASK(start, end - start);
+		inuse = mm->context.low_htlb_areas;
+		for (i = 0; i < NUM_LOW_AREAS; i++) {
+			if (!(mask & (1 << i)))
+				continue;
+			if (prepare_low_area_for_htlb(mm, i) == 0)
+				inuse &= ~(1 << i);
+		}
+		loflush = inuse ^ mm->context.low_htlb_areas;
+		mm->context.low_htlb_areas = inuse;
+	}
+
+	if (end > 0x100000000UL) {
+		mask = HTLB_AREA_MASK(start, end - start);
+		inuse = mm->context.high_htlb_areas;
+		for (i = 0; i < NUM_HIGH_AREAS; i++) {
+			if (!(mask & (1 << i)))
+				continue;
+			if (prepare_high_area_for_htlb(mm, i) == 0)
+				inuse &= ~(1 << i);
+		}
+		hiflush = inuse ^ mm->context.high_htlb_areas;
+		mm->context.high_htlb_areas = inuse;
+	}
+
+	/* The context changes must make it to memory before the flush,
+	 * so that further SLB misses do the right thing. */
+	mb();
+	fi.mm = mm;
+	if ((fi.newareas = loflush))
+		on_each_cpu(flush_low_segments, &fi, 0, 1);
+	if ((fi.newareas = hiflush))
+		on_each_cpu(flush_high_segments, &fi, 0, 1);
+}
+
 struct page *
 follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 {

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center