From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 13 Jul 2007 21:39:18 +0100 (BST)
From: Hugh Dickins
Subject: Re: mmu_gather changes & generalization
In-Reply-To: <1184287915.6059.163.camel@localhost.localdomain>
Message-ID: 
References: <1184046405.6059.17.camel@localhost.localdomain>
 <1184195933.6059.111.camel@localhost.localdomain>
 <1184287915.6059.163.camel@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
Return-Path: 
To: Benjamin Herrenschmidt
Cc: linux-mm@kvack.org, Nick Piggin
List-ID: 

On Fri, 13 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> I don't care about the small macros that just set/test bits like
> pte_exec.  I want to remove the ones that do more than that and are
> unused (ptep_test_and_clear_dirty() was a good example: there were
> some semantic subtleties vs. flushing or not flushing, etc...).
> Those things need to go if they aren't used.

Yes, David Rientjes and Zach Amsden and I kept going back and forth
over its sister ptep_test_and_clear_young(): it is hard to work out
where to place what kind of flush, particularly when it has no users.
Martin eliminating ptep_test_and_clear_dirty looked like a good answer.

> I'll have a look after the next -mm to see what's left.  There may be
> nothing left to clean up :-)

It sounds like I misunderstood how far your cleanup was to reach.
Maybe there isn't such a big multi-arch-build deal as I implied.

Here's the 2.6.22 version of what I worked on just after 2.6.16.
As I said before, if you find it useful to build upon, do so; but if
not, not.  From something you said earlier, I've a feeling we'll be
fighting over where to place the TLB flushes, inside or outside the
page table lock.

A few notes:

Keep in mind that it is hard to get low preemption latency with decent
throughput in zap_pte_range.  That is easier than it once was, now that
the ptl is taken lower down; but it remains a big problem when
truncation/invalidation holds i_mmap_lock to scan the vma prio_tree:
drop that lock and it has to restart.  Not satisfactorily solved yet
(sometimes I think we should collapse the prio_tree into a list for the
duration of the unmapping: no problem putting a marker in the list).
The mmu_gather of pages to be freed after TLB flush represents a
significant quantity of deferred work, particularly when those pages
are in swapcache: we do want preemption enabled while freeing them,
but we don't want to lose our place in the prio_tree very often.

Don't be misled by the inclusion of patches to ia64 and powerpc
hugetlbpage.c: that's just to replace **tlb by *tlb in one function;
the real mmu_gather conversion is yet to be done there.

Only i386 and x86_64 have been converted, built and (inadequately)
tested so far: but most arches shouldn't need more than removing their
DEFINE_PER_CPU, with arm and arm26 probably just wanting to use more
of the generic code.

sparc64 uses a flush_tlb_pending technique which defers a lot of work
until context switch, when it cannot be preempted: I've given little
thought to it.  powerpc appeared similar to sparc64, but you've
changed it since 2.6.16.

I've removed the start,end args to tlb_finish_mmu, and several levels
above it: the tlb_start_valid business in unmap_vmas always seemed
ugly to me; only ia64 has made use of them, and I cannot see why it
shouldn't just record first and last addr when its
tlb_remove_tlb_entry is called.  But since ia64 isn't done yet, that
end of it isn't seen in the patch.
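
For illustration only, not part of the patch below: ia64's mmu_gather
already carries start_addr and end_addr fields, so the sort of thing I
mean is an untested sketch along these lines, recording the range as
entries are removed instead of having callers pass start,end down:

	/* Hypothetical ia64 variant: track the flush range itself */
	#define tlb_remove_tlb_entry(tlb, ptep, address)	\
	do {							\
		(tlb)->need_flush = 1;				\
		if ((tlb)->start_addr == ~0UL)			\
			(tlb)->start_addr = (address);		\
		(tlb)->end_addr = (address) + PAGE_SIZE;	\
	} while (0)

Then tlb_flush_mmu can flush start_addr..end_addr without being told.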
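
And for quick orientation before wading into the patch, the
caller-visible shape of the conversion (a sketch only: see the real
thing in unmap_region() in mm/mmap.c below):

	/* Before: borrow the per-CPU gather, preemption disabled */
	struct mmu_gather *tlb = tlb_gather_mmu(mm, 0);	/* fullmm flag */
	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
	tlb_finish_mmu(tlb, start, end);

	/* After: the gather lives on the caller's stack, so it survives
	 * preemption; a mode argument replaces the fullmm flag */
	struct mmu_gather tlb;
	tlb_gather_mmu(&tlb, mm, TLB_UNMAP);	/* or TLB_TRUNC, TLB_EXIT */
	unmap_vmas(&tlb, vma);
	tlb_finish_mmu(&tlb);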
Hugh

---
 arch/i386/mm/init.c           |    1 
 arch/ia64/mm/hugetlbpage.c    |    2 
 arch/powerpc/mm/hugetlbpage.c |    8 -
 arch/x86_64/mm/init.c         |    2 
 include/asm-generic/pgtable.h |   12 --
 include/asm-generic/tlb.h     |  109 +++++++++++----
 include/asm-x86_64/tlbflush.h |    4 
 include/linux/hugetlb.h       |    2 
 include/linux/mm.h            |   11 --
 include/linux/swap.h          |    5 -
 mm/fremap.c                   |    2 
 mm/memory.c                   |  209 ++++++++++++++++--------------------
 mm/mmap.c                     |   34 ++----
 mm/swap_state.c               |   12 --
 14 files changed, 163 insertions(+), 250 deletions(-)

--- 2.6.22/arch/i386/mm/init.c	2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/i386/mm/init.c	2007-07-12 19:47:28.000000000 +0100
@@ -47,7 +47,6 @@
 
 unsigned int __VMALLOC_RESERVE = 128 << 20;
 
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 unsigned long highstart_pfn, highend_pfn;
 
 static int noinline do_test_wp_bit(void);
--- 2.6.22/arch/ia64/mm/hugetlbpage.c	2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/ia64/mm/hugetlbpage.c	2007-07-12 19:47:28.000000000 +0100
@@ -114,7 +114,7 @@ follow_huge_pmd(struct mm_struct *mm, un
 	return NULL;
 }
 
-void hugetlb_free_pgd_range(struct mmu_gather **tlb,
+void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 			unsigned long addr, unsigned long end,
 			unsigned long floor, unsigned long ceiling)
 {
--- 2.6.22/arch/powerpc/mm/hugetlbpage.c	2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/powerpc/mm/hugetlbpage.c	2007-07-12 19:47:28.000000000 +0100
@@ -240,7 +240,7 @@ static void hugetlb_free_pud_range(struc
  *
  * Must be called with pagetable lock held.
  */
-void hugetlb_free_pgd_range(struct mmu_gather **tlb,
+void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 			unsigned long addr, unsigned long end,
 			unsigned long floor, unsigned long ceiling)
 {
@@ -300,13 +300,13 @@ void hugetlb_free_pgd_range(struct mmu_g
 		return;
 
 	start = addr;
-	pgd = pgd_offset((*tlb)->mm, addr);
+	pgd = pgd_offset(tlb->mm, addr);
 	do {
-		BUG_ON(get_slice_psize((*tlb)->mm, addr) != mmu_huge_psize);
+		BUG_ON(get_slice_psize(tlb->mm, addr) != mmu_huge_psize);
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd))
 			continue;
-		hugetlb_free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
+		hugetlb_free_pud_range(tlb, pgd, addr, next, floor, ceiling);
 	} while (pgd++, addr = next, addr != end);
 }
--- 2.6.22/arch/x86_64/mm/init.c	2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/x86_64/mm/init.c	2007-07-12 19:47:28.000000000 +0100
@@ -53,8 +53,6 @@ EXPORT_SYMBOL(dma_ops);
 
 static unsigned long dma_reserve __initdata;
 
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
  * physical space so we can cache the place of the first one and move
--- 2.6.22/include/asm-generic/pgtable.h	2007-07-09 00:32:17.000000000 +0100
+++ linux/include/asm-generic/pgtable.h	2007-07-12 19:47:28.000000000 +0100
@@ -111,18 +111,6 @@ do {									\
 })
 #endif
 
-/*
- * Some architectures may be able to avoid expensive synchronization
- * primitives when modifications are made to PTE's which are already
- * not present, or in the process of an address space destruction.
- */
-#ifndef __HAVE_ARCH_PTE_CLEAR_NOT_PRESENT_FULL
-#define pte_clear_not_present_full(__mm, __address, __ptep, __full)	\
-do {									\
-	pte_clear((__mm), (__address), (__ptep));			\
-} while (0)
-#endif
-
 #ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
 #define ptep_clear_flush(__vma, __address, __ptep)			\
 ({									\
--- 2.6.22/include/asm-generic/tlb.h	2006-11-29 21:57:37.000000000 +0000
+++ linux/include/asm-generic/tlb.h	2007-07-12 19:47:28.000000000 +0100
@@ -17,65 +17,77 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-/*
- * For UP we don't need to worry about TLB flush
- * and page free order so much..
- */
-#ifdef CONFIG_SMP
-  #ifdef ARCH_FREE_PTR_NR
-  #define FREE_PTR_NR	ARCH_FREE_PTR_NR
-  #else
-  #define FREE_PTE_NR	506
-  #endif
-  #define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
-#else
-  #define FREE_PTE_NR	1
-  #define tlb_fast_mode(tlb) 1
-#endif
+#define TLB_TRUNC	0	/* i_mmap_lock is held */
+#define TLB_UNMAP	1	/* normal munmap or zap */
+#define TLB_EXIT	2	/* tearing down whole mm */
+
+#define TLB_FALLBACK_PAGES 8	/* a few entries on the stack */
 
 /* struct mmu_gather is an opaque type used by the mm code for passing around
  * any data needed by arch specific code for tlb_remove_page.
  */
 struct mmu_gather {
-	struct mm_struct	*mm;
-	unsigned int		nr;	/* set to ~0U means fast mode */
-	unsigned int		need_flush;/* Really unmapped some ptes? */
-	unsigned int		fullmm; /* non-zero means full mm flush */
-	struct page *		pages[FREE_PTE_NR];
+	struct mm_struct *mm;
+	short		nr;
+	short		max;
+	short		need_flush;	/* Really unmapped some ptes? */
+	short		mode;
+	struct page	**pages;
+	struct page	*fallback_pages[TLB_FALLBACK_PAGES];
 };
 
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
 /* tlb_gather_mmu
- *	Return a pointer to an initialized struct mmu_gather.
+ *	Initialize struct mmu_gather.
  */
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, int mode)
 {
-	struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
 	tlb->mm = mm;
-
-	/* Use fast mode if only one CPU is online */
-	tlb->nr = num_online_cpus() > 1 ? 0U : ~0U;
-
-	tlb->fullmm = full_mm_flush;
-
-	return tlb;
+	tlb->nr = 0;
+	tlb->max = TLB_FALLBACK_PAGES;
+	tlb->need_flush = 0;
+	tlb->mode = mode;
+	tlb->pages = tlb->fallback_pages;
+	/* temporarily erase fallback_pages for clearer debug traces */
+	memset(tlb->fallback_pages, 0, sizeof(tlb->fallback_pages));
 }
 
 static inline void
-tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_flush_mmu(struct mmu_gather *tlb)
 {
 	if (!tlb->need_flush)
 		return;
 	tlb->need_flush = 0;
 	tlb_flush(tlb);
-	if (!tlb_fast_mode(tlb)) {
-		free_pages_and_swap_cache(tlb->pages, tlb->nr);
-		tlb->nr = 0;
+	free_pages_and_swap_cache(tlb->pages, tlb->nr);
+	tlb->nr = 0;
+}
+
+static inline int
+tlb_is_extensible(struct mmu_gather *tlb)
+{
+#ifdef CONFIG_PREEMPT
+	return tlb->mode != TLB_TRUNC;
+#else
+	return 1;
+#endif
+}
+
+static inline int
+tlb_is_full(struct mmu_gather *tlb)
+{
+	if (tlb->nr < tlb->max)
+		return 0;
+	if (tlb->pages == tlb->fallback_pages && tlb_is_extensible(tlb)) {
+		struct page **pages = (void *)__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, 0);
+		if (pages) {
+			memcpy(pages, tlb->pages, sizeof(tlb->fallback_pages));
+			tlb->pages = pages;
+			tlb->max = PAGE_SIZE / sizeof(struct page *);
+			return 0;
+		}
 	}
+	return 1;
 }
 
 /* tlb_finish_mmu
@@ -83,14 +95,11 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
  *	that were required.
  */
 static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb)
 {
-	tlb_flush_mmu(tlb, start, end);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	put_cpu_var(mmu_gathers);
+	tlb_flush_mmu(tlb);
+	if (tlb->pages != tlb->fallback_pages)
+		free_pages((unsigned long)tlb->pages, 0);
 }
 
 /* tlb_remove_page
@@ -100,14 +109,10 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
  */
 static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
 {
+	if (tlb->nr >= tlb->max)
+		tlb_flush_mmu(tlb);
 	tlb->need_flush = 1;
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return;
-	}
 	tlb->pages[tlb->nr++] = page;
-	if (tlb->nr >= FREE_PTE_NR)
-		tlb_flush_mmu(tlb, 0, 0);
 }
 
 /**
--- 2.6.22/include/asm-x86_64/tlbflush.h	2007-07-09 00:32:17.000000000 +0100
+++ linux/include/asm-x86_64/tlbflush.h	2007-07-12 19:47:28.000000000 +0100
@@ -86,10 +86,6 @@ static inline void flush_tlb_range(struc
 #define TLBSTATE_OK	1
 #define TLBSTATE_LAZY	2
 
-/* Roughly an IPI every 20MB with 4k pages for freeing page table
-   ranges. Cost is about 42k of memory for each CPU. */
-#define ARCH_FREE_PTE_NR 5350
-
 #endif
 
 #define flush_tlb_kernel_range(start, end) flush_tlb_all()
--- 2.6.22/include/linux/hugetlb.h	2007-07-09 00:32:17.000000000 +0100
+++ linux/include/linux/hugetlb.h	2007-07-12 19:47:28.000000000 +0100
@@ -52,7 +52,7 @@ void hugetlb_change_protection(struct vm
 #ifndef ARCH_HAS_HUGETLB_FREE_PGD_RANGE
 #define hugetlb_free_pgd_range	free_pgd_range
 #else
-void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
+void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 			unsigned long end, unsigned long floor,
 			unsigned long ceiling);
 #endif
--- 2.6.22/include/linux/mm.h	2007-07-09 00:32:17.000000000 +0100
+++ linux/include/linux/mm.h	2007-07-12 19:47:28.000000000 +0100
@@ -738,15 +738,12 @@ struct zap_details {
 };
 
 struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
-unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
-		struct vm_area_struct *start_vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted,
-		struct zap_details *);
-void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
+void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma);
+void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
+void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
--- 2.6.22/include/linux/swap.h	2007-04-26 04:08:32.000000000 +0100
+++ linux/include/linux/swap.h	2007-07-12 19:47:28.000000000 +0100
@@ -232,7 +232,6 @@ extern void delete_from_swap_cache(struc
 extern int move_to_swap_cache(struct page *, swp_entry_t);
 extern int move_from_swap_cache(struct page *, unsigned long,
 		struct address_space *);
-extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page * lookup_swap_cache(swp_entry_t);
 extern struct page * read_swap_cache_async(swp_entry_t, struct vm_area_struct *vma,
@@ -287,9 +286,7 @@ static inline void disable_swap_token(vo
 #define si_swapinfo(val) \
 	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
 /* only sparc can not include linux/pagemap.h in this file
- * so leave page_cache_release and release_pages undeclared... */
-#define free_page_and_swap_cache(page) \
-	page_cache_release(page)
+ * so leave release_pages undeclared... */
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr), 0);
--- 2.6.22/mm/fremap.c	2007-02-04 18:44:54.000000000 +0000
+++ linux/mm/fremap.c	2007-07-12 19:47:28.000000000 +0100
@@ -39,7 +39,7 @@ static int zap_pte(struct mm_struct *mm,
 	} else {
 		if (!pte_file(pte))
 			free_swap_and_cache(pte_to_swp_entry(pte));
-		pte_clear_not_present_full(mm, addr, ptep, 0);
+		pte_clear(mm, addr, ptep);
 	}
 	return !!page;
 }
--- 2.6.22/mm/memory.c	2007-07-09 00:32:17.000000000 +0100
+++ linux/mm/memory.c	2007-07-12 19:47:28.000000000 +0100
@@ -203,7 +203,7 @@ static inline void free_pud_range(struct
  *
  * Must be called with pagetable lock held.
  */
-void free_pgd_range(struct mmu_gather **tlb,
+void free_pgd_range(struct mmu_gather *tlb,
 			unsigned long addr, unsigned long end,
 			unsigned long floor, unsigned long ceiling)
 {
@@ -254,19 +254,19 @@ void free_pgd_range(struct mmu_gather **
 		return;
 
 	start = addr;
-	pgd = pgd_offset((*tlb)->mm, addr);
+	pgd = pgd_offset(tlb->mm, addr);
 	do {
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd))
 			continue;
-		free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
+		free_pud_range(tlb, pgd, addr, next, floor, ceiling);
 	} while (pgd++, addr = next, addr != end);
 
-	if (!(*tlb)->fullmm)
-		flush_tlb_pgtables((*tlb)->mm, start, end);
+	if (tlb->mode != TLB_EXIT)
+		flush_tlb_pgtables(tlb->mm, start, end);
 }
 
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
+void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		unsigned long floor, unsigned long ceiling)
 {
 	while (vma) {
@@ -298,6 +298,9 @@ void free_pgtables(struct mmu_gather **t
 		}
 		vma = next;
 	}
+
+	/* keep the page table cache within bounds */
+	check_pgt_cache();
 }
 
 int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
@@ -621,24 +624,36 @@ int copy_page_range(struct mm_struct *ds
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				struct zap_details *details)
 {
+	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
 	struct mm_struct *mm = tlb->mm;
 	pte_t *pte;
 	spinlock_t *ptl;
 	int file_rss = 0;
 	int anon_rss = 0;
+	int progress;
 
+again:
+	progress = 0;
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	arch_enter_lazy_mmu_mode();
 	do {
-		pte_t ptent = *pte;
+		pte_t ptent;
+
+		if (progress >= 64) {
+			progress = 0;
+			if (need_resched() ||
+			    need_lockbreak(ptl) ||
+			    (i_mmap_lock && need_lockbreak(i_mmap_lock)))
+				break;
+		}
+		ptent = *pte;
 		if (pte_none(ptent)) {
-			(*zap_work)--;
+			progress++;
 			continue;
 		}
-
-		(*zap_work) -= PAGE_SIZE;
+		progress += 8;
 
 		if (pte_present(ptent)) {
 			struct page *page;
@@ -662,8 +677,10 @@ static unsigned long zap_pte_range(struc
 				    page->index > details->last_index))
 					continue;
 			}
+			if (tlb_is_full(tlb))
+				break;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
+						tlb->mode == TLB_EXIT);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
@@ -693,20 +710,27 @@ static unsigned long zap_pte_range(struc
 			continue;
 		if (!pte_file(ptent))
 			free_swap_and_cache(pte_to_swp_entry(ptent));
-		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
-	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+		pte_clear(mm, addr, pte);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss(mm, file_rss, anon_rss);
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
 
+	if (!i_mmap_lock) {
+		cond_resched();
+		if (tlb_is_full(tlb))
+			tlb_flush_mmu(tlb);
+		if (addr != end)
+			goto again;
+	}
 	return addr;
 }
 
 static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pud_t *pud,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				struct zap_details *details)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -715,20 +739,18 @@ static inline unsigned long zap_pmd_rang
 	do {
 		next = pmd_addr_end(addr, end);
 		if (pmd_none_or_clear_bad(pmd)) {
-			(*zap_work)--;
+			addr = next;
 			continue;
 		}
-		next = zap_pte_range(tlb, vma, pmd, addr, next,
-						zap_work, details);
-	} while (pmd++, addr = next, (addr != end && *zap_work > 0));
-
+		addr = zap_pte_range(tlb, vma, pmd, addr, next, details);
+	} while (pmd++, addr == next && addr != end);
 	return addr;
 }
 
 static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pgd_t *pgd,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				struct zap_details *details)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -737,20 +759,18 @@ static inline unsigned long zap_pud_rang
 	do {
 		next = pud_addr_end(addr, end);
 		if (pud_none_or_clear_bad(pud)) {
-			(*zap_work)--;
+			addr = next;
 			continue;
 		}
-		next = zap_pmd_range(tlb, vma, pud, addr, next,
-						zap_work, details);
-	} while (pud++, addr = next, (addr != end && *zap_work > 0));
-
+		addr = zap_pmd_range(tlb, vma, pud, addr, next, details);
+	} while (pud++, addr == next && addr != end);
 	return addr;
 }
 
 static unsigned long unmap_page_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma,
 				unsigned long addr, unsigned long end,
-				long *zap_work, struct zap_details *details)
+				struct zap_details *details)
 {
 	pgd_t *pgd;
 	unsigned long next;
@@ -764,137 +784,62 @@ static unsigned long unmap_page_range(st
 	do {
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd)) {
-			(*zap_work)--;
+			addr = next;
 			continue;
 		}
-		next = zap_pud_range(tlb, vma, pgd, addr, next,
-						zap_work, details);
-	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
+		addr = zap_pud_range(tlb, vma, pgd, addr, next, details);
+	} while (pgd++, addr == next && addr != end);
 	tlb_end_vma(tlb, vma);
-
 	return addr;
 }
 
-#ifdef CONFIG_PREEMPT
-# define ZAP_BLOCK_SIZE	(8 * PAGE_SIZE)
-#else
-/* No preempt: go for improved straight-line efficiency */
-# define ZAP_BLOCK_SIZE	(1024 * PAGE_SIZE)
-#endif
-
 /**
  * unmap_vmas - unmap a range of memory covered by a list of vma's
- * @tlbp: address of the caller's struct mmu_gather
+ * @tlb: address of the caller's struct mmu_gather
  * @vma: the starting vma
- * @start_addr: virtual address at which to start unmapping
- * @end_addr: virtual address at which to end unmapping
- * @nr_accounted: Place number of unmapped pages in vm-accountable vma's here
- * @details: details of nonlinear truncation or shared cache invalidation
- *
- * Returns the end address of the unmapping (restart addr if interrupted).
  *
  * Unmap all pages in the vma list.
- *
- * We aim to not hold locks for too long (for scheduling latency reasons).
- * So zap pages in ZAP_BLOCK_SIZE bytecounts.  This means we need to
- * return the ending mmu_gather to the caller.
- *
- * Only addresses between `start' and `end' will be unmapped.
- *
 * The VMA list must be sorted in ascending virtual address order.
- *
- * unmap_vmas() assumes that the caller will flush the whole unmapped address
- * range after unmap_vmas() returns.  So the only responsibility here is to
- * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
- * drops the lock and schedules.
- */
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
-		struct vm_area_struct *vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted,
-		struct zap_details *details)
+ */
+void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *vma)
 {
-	long zap_work = ZAP_BLOCK_SIZE;
-	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
-	int tlb_start_valid = 0;
-	unsigned long start = start_addr;
-	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
-	int fullmm = (*tlbp)->fullmm;
-
-	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
-		unsigned long end;
-
-		start = max(vma->vm_start, start_addr);
-		if (start >= vma->vm_end)
-			continue;
-		end = min(vma->vm_end, end_addr);
-		if (end <= vma->vm_start)
-			continue;
+	unsigned long nr_accounted = 0;
 
+	while (vma) {
 		if (vma->vm_flags & VM_ACCOUNT)
-			*nr_accounted += (end - start) >> PAGE_SHIFT;
-
-		while (start != end) {
-			if (!tlb_start_valid) {
-				tlb_start = start;
-				tlb_start_valid = 1;
-			}
-
-			if (unlikely(is_vm_hugetlb_page(vma))) {
-				unmap_hugepage_range(vma, start, end);
-				zap_work -= (end - start) /
-						(HPAGE_SIZE / PAGE_SIZE);
-				start = end;
-			} else
-				start = unmap_page_range(*tlbp, vma,
-						start, end, &zap_work, details);
-
-			if (zap_work > 0) {
-				BUG_ON(start != end);
-				break;
-			}
+			nr_accounted += vma_pages(vma);
 
-			tlb_finish_mmu(*tlbp, tlb_start, start);
-
-			if (need_resched() ||
-				(i_mmap_lock && need_lockbreak(i_mmap_lock))) {
-				if (i_mmap_lock) {
-					*tlbp = NULL;
-					goto out;
-				}
-				cond_resched();
-			}
-
-			*tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
-			tlb_start_valid = 0;
-			zap_work = ZAP_BLOCK_SIZE;
-		}
+		if (unlikely(is_vm_hugetlb_page(vma)))
+			unmap_hugepage_range(vma, vma->vm_start, vma->vm_end);
+		else
+			unmap_page_range(tlb, vma, vma->vm_start, vma->vm_end, NULL);
+		vma = vma->vm_next;
 	}
-out:
-	return start;	/* which is now the end (or restart) address */
+
+	vm_unacct_memory(nr_accounted);
 }
 
 /**
  * zap_page_range - remove user pages in a given range
  * @vma: vm_area_struct holding the applicable pages
  * @address: starting address of pages to zap
- * @size: number of bytes to zap
+ * @end: ending address of pages to zap
  * @details: details of nonlinear truncation or shared cache invalidation
  */
-unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size, struct zap_details *details)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct mmu_gather *tlb;
+	struct mmu_gather tlb;
 	unsigned long end = address + size;
-	unsigned long nr_accounted = 0;
 
-	lru_add_drain();
-	tlb = tlb_gather_mmu(mm, 0);
+	BUG_ON(is_vm_hugetlb_page(vma));
+	BUG_ON(address < vma->vm_start || end > vma->vm_end);
+
+	tlb_gather_mmu(&tlb, mm, TLB_UNMAP);
 	update_hiwater_rss(mm);
-	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
-	if (tlb)
-		tlb_finish_mmu(tlb, address, end);
-	return end;
+	unmap_page_range(&tlb, vma, address, end, details);
+	tlb_finish_mmu(&tlb);
 }
 
 /*
@@ -1822,6 +1767,8 @@ static int unmap_mapping_range_vma(struc
 		unsigned long start_addr, unsigned long end_addr,
 		struct zap_details *details)
 {
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
 	unsigned long restart_addr;
 	int need_break;
 
@@ -1836,8 +1783,12 @@ again:
 		}
 	}
 
-	restart_addr = zap_page_range(vma, start_addr,
-				end_addr - start_addr, details);
+	tlb_gather_mmu(&tlb, mm, TLB_TRUNC);
+	update_hiwater_rss(mm);
+	restart_addr = unmap_page_range(&tlb, vma,
+				start_addr, end_addr, details);
+	tlb_finish_mmu(&tlb);
+
 	need_break = need_resched() ||
 			need_lockbreak(details->i_mmap_lock);
--- 2.6.22/mm/mmap.c	2007-07-09 00:32:17.000000000 +0100
+++ linux/mm/mmap.c	2007-07-12 19:47:28.000000000 +0100
@@ -36,8 +36,7 @@
 #endif
 
 static void unmap_region(struct mm_struct *mm,
-		struct vm_area_struct *vma, struct vm_area_struct *prev,
-		unsigned long start, unsigned long end);
+		struct vm_area_struct *vma, struct vm_area_struct *prev);
 
 /*
  * WARNING: the debugging will use recursive algorithms so never enable this
@@ -1165,7 +1164,7 @@ unmap_and_free_vma:
 	fput(file);
 
 	/* Undo any partial mapping done by a device driver. */
-	unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
+	unmap_region(mm, vma, prev);
 	charged = 0;
 free_vma:
 	kmem_cache_free(vm_area_cachep, vma);
@@ -1677,21 +1676,17 @@ static void remove_vma_list(struct mm_st
  * Called with the mm semaphore held.
  */
 static void unmap_region(struct mm_struct *mm,
-		struct vm_area_struct *vma, struct vm_area_struct *prev,
-		unsigned long start, unsigned long end)
+		struct vm_area_struct *vma, struct vm_area_struct *prev)
 {
 	struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
-	struct mmu_gather *tlb;
-	unsigned long nr_accounted = 0;
+	struct mmu_gather tlb;
 
-	lru_add_drain();
-	tlb = tlb_gather_mmu(mm, 0);
+	tlb_gather_mmu(&tlb, mm, TLB_UNMAP);
 	update_hiwater_rss(mm);
-	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
-	vm_unacct_memory(nr_accounted);
+	unmap_vmas(&tlb, vma);
 	free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 				 next? next->vm_start: 0);
-	tlb_finish_mmu(tlb, start, end);
+	tlb_finish_mmu(&tlb);
 }
 
 /*
@@ -1829,7 +1824,7 @@ int do_munmap(struct mm_struct *mm, unsi
 	 * Remove the vma's, and unmap the actual pages
 	 */
 	detach_vmas_to_be_unmapped(mm, vma, prev, end);
-	unmap_region(mm, vma, prev, start, end);
+	unmap_region(mm, vma, prev);
 
 	/* Fix up all other VM information */
 	remove_vma_list(mm, vma);
@@ -1968,23 +1963,18 @@ EXPORT_SYMBOL(do_brk);
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct *mm)
 {
-	struct mmu_gather *tlb;
+	struct mmu_gather tlb;
 	struct vm_area_struct *vma = mm->mmap;
-	unsigned long nr_accounted = 0;
-	unsigned long end;
 
 	/* mm's last user has gone, and its about to be pulled down */
 	arch_exit_mmap(mm);
 
-	lru_add_drain();
 	flush_cache_mm(mm);
-	tlb = tlb_gather_mmu(mm, 1);
+	tlb_gather_mmu(&tlb, mm, TLB_EXIT);
 	/* Don't update_hiwater_rss(mm) here, do_exit already did */
-	/* Use -1 here to ensure all VMAs in the mm are unmapped */
-	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
-	vm_unacct_memory(nr_accounted);
+	unmap_vmas(&tlb, vma);
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
-	tlb_finish_mmu(tlb, 0, end);
+	tlb_finish_mmu(&tlb);
 
 	/*
 	 * Walk the list again, actually closing and freeing it,
--- 2.6.22/mm/swap_state.c	2006-09-20 04:42:06.000000000 +0100
+++ linux/mm/swap_state.c	2007-07-12 19:47:28.000000000 +0100
@@ -258,16 +258,6 @@ static inline void free_swap_cache(struc
 	}
 }
 
-/*
- * Perform a free_page(), also freeing any swap cache associated with
- * this page if it is the last user of the page.
- */
-void free_page_and_swap_cache(struct page *page)
-{
-	free_swap_cache(page);
-	page_cache_release(page);
-}
-
 /*
  * Passed an array of pages, drop them all from swapcache and then release
  * them.  They are removed from the LRU and freed if this is their last use.
@@ -286,6 +276,8 @@ void free_pages_and_swap_cache(struct pa
 		release_pages(pagep, todo, 0);
 		pagep += todo;
 		nr -= todo;
+		if (nr && !preempt_count())
+			cond_resched();
 	}
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org