From: Hugh Dickins <hugh@veritas.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-mm@kvack.org, Nick Piggin <nickpiggin@yahoo.com.au>
Subject: Re: mmu_gather changes & generalization
Date: Fri, 13 Jul 2007 21:39:18 +0100 (BST)
Message-ID: <Pine.LNX.4.64.0707132126001.5377@blonde.wat.veritas.com>
In-Reply-To: <1184287915.6059.163.camel@localhost.localdomain>
On Fri, 13 Jul 2007, Benjamin Herrenschmidt wrote:
>
> I don't care about the small macros that just set/test bits like
> pte_exec. I want to remove the ones that do more than that and are
> unused (ptep_test_and_clear_dirty() was a good example, there were some
> semantic subtleties vs. flushing or not flushing, etc...). Those things
> need to go if they aren't used.
Yes, David Rientjes and Zach Amsden and I kept going back and forth
over its sister ptep_test_and_clear_young(): it is hard to work out
where to place what kind of flush, particularly when it has no users.
Martin's elimination of ptep_test_and_clear_dirty() looked like a good answer.
> I'll have a look after the next -mm to see what's left. There may be
> nothing left to cleanup :-)
It sounds like I misunderstood how far your cleanup was to reach.
Maybe there isn't such a big multi-arch-build deal as I implied.
Here's the 2.6.22 version of what I worked on just after 2.6.16.
As I said before, if you find it useful to build upon, do so;
but if not, not. From something you said earlier, I've a
feeling we'll be fighting over where to place the TLB flushes,
inside or outside the page table lock.
A few notes:
Keep in mind: it's hard to get low preemption latency with decent throughput
in zap_pte_range - easier than it once was, now that the ptl is taken lower
down, but a big problem when truncation/invalidation holds i_mmap_lock to scan
the vma prio_tree - drop that lock and it has to restart. Not satisfactorily
solved yet (sometimes I think we should collapse the prio_tree into a list
for the duration of the unmapping: no problem putting a marker in the list).
The mmu_gather of pages to be freed after TLB flush represents a significant
quantity of deferred work, particularly when those pages are in swapcache:
we do want preemption enabled while freeing them, but we don't want to lose
our place in the prio_tree very often.
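For reference, the caller-side pattern after this patch - just a simplified
sketch of zap_page_range() and unmap_region() below, no API beyond what's
in the patch - is roughly:

	struct mmu_gather tlb;			/* now on the caller's stack */

	tlb_gather_mmu(&tlb, mm, TLB_UNMAP);	/* or TLB_TRUNC, TLB_EXIT */
	update_hiwater_rss(mm);
	unmap_page_range(&tlb, vma, start, end, details);
		/* queues pages via tlb_remove_page(), and calls
		 * tlb_flush_mmu() itself whenever tlb_is_full() */
	tlb_finish_mmu(&tlb);			/* final flush, free the pages */

The point being that pages are only queued up between tlb_remove_page() and
the next tlb_flush_mmu(): that's where the deferred work piles up.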
Don't be misled by inclusion of patches to ia64 and powerpc hugetlbpage.c,
that's just to replace **tlb by *tlb in one function: the real mmu_gather
conversion is yet to be done there.
Only i386 and x86_64 have been converted, built and (inadequately) tested so
far: but most arches shouldn't need more than removing their DEFINE_PER_CPU,
with arm and arm26 probably just wanting to use more of the generic code.
sparc64 uses a flush_tlb_pending technique which defers a lot of work until
context switch, when it cannot be preempted: I've given little thought to it.
powerpc appeared similar to sparc64, but you've changed it since 2.6.16.
I've removed the start,end args to tlb_finish_mmu, and several levels above
it: the tlb_start_valid business in unmap_vmas always seemed ugly to me;
only ia64 has made use of them, and I cannot see why it shouldn't just
record the first and last addr when its tlb_remove_tlb_entry is called.
But since ia64 isn't done yet, that end of it isn't seen in the patch.
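(For illustration only, not part of this patch: I'd expect ia64's
tlb_remove_tlb_entry to track the range itself, something like

	/* sketch only - field names illustrative, not the real ia64 code */
	#define tlb_remove_tlb_entry(tlb, ptep, address)		\
	do {								\
		(tlb)->need_flush = 1;					\
		if ((tlb)->start_addr > (address))			\
			(tlb)->start_addr = (address);			\
		if ((tlb)->end_addr < (address) + PAGE_SIZE)		\
			(tlb)->end_addr = (address) + PAGE_SIZE;	\
	} while (0)

with start_addr initialized to ~0UL and end_addr to 0 in tlb_gather_mmu, so
its tlb_flush_mmu can flush just that range without start,end being passed
down from the callers.)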
Hugh
---
arch/i386/mm/init.c | 1
arch/ia64/mm/hugetlbpage.c | 2
arch/powerpc/mm/hugetlbpage.c | 8 -
arch/x86_64/mm/init.c | 2
include/asm-generic/pgtable.h | 12 --
include/asm-generic/tlb.h | 109 +++++++++++----------
include/asm-x86_64/tlbflush.h | 4
include/linux/hugetlb.h | 2
include/linux/mm.h | 11 --
include/linux/swap.h | 5 -
mm/fremap.c | 2
mm/memory.c | 209 ++++++++++++++++--------------------------
mm/mmap.c | 34 ++----
mm/swap_state.c | 12 --
14 files changed, 163 insertions(+), 250 deletions(-)
--- 2.6.22/arch/i386/mm/init.c 2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/i386/mm/init.c 2007-07-12 19:47:28.000000000 +0100
@@ -47,7 +47,6 @@
unsigned int __VMALLOC_RESERVE = 128 << 20;
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
unsigned long highstart_pfn, highend_pfn;
static int noinline do_test_wp_bit(void);
--- 2.6.22/arch/ia64/mm/hugetlbpage.c 2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/ia64/mm/hugetlbpage.c 2007-07-12 19:47:28.000000000 +0100
@@ -114,7 +114,7 @@ follow_huge_pmd(struct mm_struct *mm, un
return NULL;
}
-void hugetlb_free_pgd_range(struct mmu_gather **tlb,
+void hugetlb_free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
{
--- 2.6.22/arch/powerpc/mm/hugetlbpage.c 2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/powerpc/mm/hugetlbpage.c 2007-07-12 19:47:28.000000000 +0100
@@ -240,7 +240,7 @@ static void hugetlb_free_pud_range(struc
*
* Must be called with pagetable lock held.
*/
-void hugetlb_free_pgd_range(struct mmu_gather **tlb,
+void hugetlb_free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
{
@@ -300,13 +300,13 @@ void hugetlb_free_pgd_range(struct mmu_g
return;
start = addr;
- pgd = pgd_offset((*tlb)->mm, addr);
+ pgd = pgd_offset(tlb->mm, addr);
do {
- BUG_ON(get_slice_psize((*tlb)->mm, addr) != mmu_huge_psize);
+ BUG_ON(get_slice_psize(tlb->mm, addr) != mmu_huge_psize);
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- hugetlb_free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
+ hugetlb_free_pud_range(tlb, pgd, addr, next, floor, ceiling);
} while (pgd++, addr = next, addr != end);
}
--- 2.6.22/arch/x86_64/mm/init.c 2007-07-09 00:32:17.000000000 +0100
+++ linux/arch/x86_64/mm/init.c 2007-07-12 19:47:28.000000000 +0100
@@ -53,8 +53,6 @@ EXPORT_SYMBOL(dma_ops);
static unsigned long dma_reserve __initdata;
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/*
* NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
* physical space so we can cache the place of the first one and move
--- 2.6.22/include/asm-generic/pgtable.h 2007-07-09 00:32:17.000000000 +0100
+++ linux/include/asm-generic/pgtable.h 2007-07-12 19:47:28.000000000 +0100
@@ -111,18 +111,6 @@ do { \
})
#endif
-/*
- * Some architectures may be able to avoid expensive synchronization
- * primitives when modifications are made to PTE's which are already
- * not present, or in the process of an address space destruction.
- */
-#ifndef __HAVE_ARCH_PTE_CLEAR_NOT_PRESENT_FULL
-#define pte_clear_not_present_full(__mm, __address, __ptep, __full) \
-do { \
- pte_clear((__mm), (__address), (__ptep)); \
-} while (0)
-#endif
-
#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
#define ptep_clear_flush(__vma, __address, __ptep) \
({ \
--- 2.6.22/include/asm-generic/tlb.h 2006-11-29 21:57:37.000000000 +0000
+++ linux/include/asm-generic/tlb.h 2007-07-12 19:47:28.000000000 +0100
@@ -17,65 +17,77 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
-/*
- * For UP we don't need to worry about TLB flush
- * and page free order so much..
- */
-#ifdef CONFIG_SMP
- #ifdef ARCH_FREE_PTR_NR
- #define FREE_PTR_NR ARCH_FREE_PTR_NR
- #else
- #define FREE_PTE_NR 506
- #endif
- #define tlb_fast_mode(tlb) ((tlb)->nr == ~0U)
-#else
- #define FREE_PTE_NR 1
- #define tlb_fast_mode(tlb) 1
-#endif
+#define TLB_TRUNC 0 /* i_mmap_lock is held */
+#define TLB_UNMAP 1 /* normal munmap or zap */
+#define TLB_EXIT 2 /* tearing down whole mm */
+
+#define TLB_FALLBACK_PAGES 8 /* a few entries on the stack */
/* struct mmu_gather is an opaque type used by the mm code for passing around
* any data needed by arch specific code for tlb_remove_page.
*/
struct mmu_gather {
- struct mm_struct *mm;
- unsigned int nr; /* set to ~0U means fast mode */
- unsigned int need_flush;/* Really unmapped some ptes? */
- unsigned int fullmm; /* non-zero means full mm flush */
- struct page * pages[FREE_PTE_NR];
+ struct mm_struct *mm;
+ short nr;
+ short max;
+ short need_flush; /* Really unmapped some ptes? */
+ short mode;
+ struct page ** pages;
+ struct page * fallback_pages[TLB_FALLBACK_PAGES];
};
-/* Users of the generic TLB shootdown code must declare this storage space. */
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/* tlb_gather_mmu
- * Return a pointer to an initialized struct mmu_gather.
+ * Initialize struct mmu_gather.
*/
-static inline struct mmu_gather *
-tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, int mode)
{
- struct mmu_gather *tlb = &get_cpu_var(mmu_gathers);
-
tlb->mm = mm;
-
- /* Use fast mode if only one CPU is online */
- tlb->nr = num_online_cpus() > 1 ? 0U : ~0U;
-
- tlb->fullmm = full_mm_flush;
-
- return tlb;
+ tlb->nr = 0;
+ tlb->max = TLB_FALLBACK_PAGES;
+ tlb->need_flush = 0;
+ tlb->mode = mode;
+ tlb->pages = tlb->fallback_pages;
+ /* temporarily erase fallback_pages for clearer debug traces */
+ memset(tlb->fallback_pages, 0, sizeof(tlb->fallback_pages));
}
static inline void
-tlb_flush_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_flush_mmu(struct mmu_gather *tlb)
{
if (!tlb->need_flush)
return;
tlb->need_flush = 0;
tlb_flush(tlb);
- if (!tlb_fast_mode(tlb)) {
- free_pages_and_swap_cache(tlb->pages, tlb->nr);
- tlb->nr = 0;
+ free_pages_and_swap_cache(tlb->pages, tlb->nr);
+ tlb->nr = 0;
+}
+
+static inline int
+tlb_is_extensible(struct mmu_gather *tlb)
+{
+#ifdef CONFIG_PREEMPT
+ return tlb->mode != TLB_TRUNC;
+#else
+ return 1;
+#endif
+}
+
+static inline int
+tlb_is_full(struct mmu_gather *tlb)
+{
+ if (tlb->nr < tlb->max)
+ return 0;
+ if (tlb->pages == tlb->fallback_pages && tlb_is_extensible(tlb)) {
+ struct page **pages = (void *)__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, 0);
+ if (pages) {
+ memcpy(pages, tlb->pages, sizeof(tlb->fallback_pages));
+ tlb->pages = pages;
+ tlb->max = PAGE_SIZE / sizeof(struct page *);
+ return 0;
+ }
}
+ return 1;
}
/* tlb_finish_mmu
@@ -83,14 +95,11 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
* that were required.
*/
static inline void
-tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+tlb_finish_mmu(struct mmu_gather *tlb)
{
- tlb_flush_mmu(tlb, start, end);
-
- /* keep the page table cache within bounds */
- check_pgt_cache();
-
- put_cpu_var(mmu_gathers);
+ tlb_flush_mmu(tlb);
+ if (tlb->pages != tlb->fallback_pages)
+ free_pages((unsigned long)tlb->pages, 0);
}
/* tlb_remove_page
@@ -100,14 +109,10 @@ tlb_finish_mmu(struct mmu_gather *tlb, u
*/
static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
{
+ if (tlb->nr >= tlb->max)
+ tlb_flush_mmu(tlb);
tlb->need_flush = 1;
- if (tlb_fast_mode(tlb)) {
- free_page_and_swap_cache(page);
- return;
- }
tlb->pages[tlb->nr++] = page;
- if (tlb->nr >= FREE_PTE_NR)
- tlb_flush_mmu(tlb, 0, 0);
}
/**
--- 2.6.22/include/asm-x86_64/tlbflush.h 2007-07-09 00:32:17.000000000 +0100
+++ linux/include/asm-x86_64/tlbflush.h 2007-07-12 19:47:28.000000000 +0100
@@ -86,10 +86,6 @@ static inline void flush_tlb_range(struc
#define TLBSTATE_OK 1
#define TLBSTATE_LAZY 2
-/* Roughly an IPI every 20MB with 4k pages for freeing page table
- ranges. Cost is about 42k of memory for each CPU. */
-#define ARCH_FREE_PTE_NR 5350
-
#endif
#define flush_tlb_kernel_range(start, end) flush_tlb_all()
--- 2.6.22/include/linux/hugetlb.h 2007-07-09 00:32:17.000000000 +0100
+++ linux/include/linux/hugetlb.h 2007-07-12 19:47:28.000000000 +0100
@@ -52,7 +52,7 @@ void hugetlb_change_protection(struct vm
#ifndef ARCH_HAS_HUGETLB_FREE_PGD_RANGE
#define hugetlb_free_pgd_range free_pgd_range
#else
-void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
+void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
#endif
--- 2.6.22/include/linux/mm.h 2007-07-09 00:32:17.000000000 +0100
+++ linux/include/linux/mm.h 2007-07-12 19:47:28.000000000 +0100
@@ -738,15 +738,12 @@ struct zap_details {
};
struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
-unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
- struct vm_area_struct *start_vma, unsigned long start_addr,
- unsigned long end_addr, unsigned long *nr_accounted,
- struct zap_details *);
-void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
+void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma);
+void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor, unsigned long ceiling);
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
+void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
unsigned long floor, unsigned long ceiling);
int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma);
--- 2.6.22/include/linux/swap.h 2007-04-26 04:08:32.000000000 +0100
+++ linux/include/linux/swap.h 2007-07-12 19:47:28.000000000 +0100
@@ -232,7 +232,6 @@ extern void delete_from_swap_cache(struc
extern int move_to_swap_cache(struct page *, swp_entry_t);
extern int move_from_swap_cache(struct page *, unsigned long,
struct address_space *);
-extern void free_page_and_swap_cache(struct page *);
extern void free_pages_and_swap_cache(struct page **, int);
extern struct page * lookup_swap_cache(swp_entry_t);
extern struct page * read_swap_cache_async(swp_entry_t, struct vm_area_struct *vma,
@@ -287,9 +286,7 @@ static inline void disable_swap_token(vo
#define si_swapinfo(val) \
do { (val)->freeswap = (val)->totalswap = 0; } while (0)
/* only sparc can not include linux/pagemap.h in this file
- * so leave page_cache_release and release_pages undeclared... */
-#define free_page_and_swap_cache(page) \
- page_cache_release(page)
+ * so leave release_pages undeclared... */
#define free_pages_and_swap_cache(pages, nr) \
release_pages((pages), (nr), 0);
--- 2.6.22/mm/fremap.c 2007-02-04 18:44:54.000000000 +0000
+++ linux/mm/fremap.c 2007-07-12 19:47:28.000000000 +0100
@@ -39,7 +39,7 @@ static int zap_pte(struct mm_struct *mm,
} else {
if (!pte_file(pte))
free_swap_and_cache(pte_to_swp_entry(pte));
- pte_clear_not_present_full(mm, addr, ptep, 0);
+ pte_clear(mm, addr, ptep);
}
return !!page;
}
--- 2.6.22/mm/memory.c 2007-07-09 00:32:17.000000000 +0100
+++ linux/mm/memory.c 2007-07-12 19:47:28.000000000 +0100
@@ -203,7 +203,7 @@ static inline void free_pud_range(struct
*
* Must be called with pagetable lock held.
*/
-void free_pgd_range(struct mmu_gather **tlb,
+void free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
{
@@ -254,19 +254,19 @@ void free_pgd_range(struct mmu_gather **
return;
start = addr;
- pgd = pgd_offset((*tlb)->mm, addr);
+ pgd = pgd_offset(tlb->mm, addr);
do {
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
+ free_pud_range(tlb, pgd, addr, next, floor, ceiling);
} while (pgd++, addr = next, addr != end);
- if (!(*tlb)->fullmm)
- flush_tlb_pgtables((*tlb)->mm, start, end);
+ if (tlb->mode != TLB_EXIT)
+ flush_tlb_pgtables(tlb->mm, start, end);
}
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
+void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long floor, unsigned long ceiling)
{
while (vma) {
@@ -298,6 +298,9 @@ void free_pgtables(struct mmu_gather **t
}
vma = next;
}
+
+ /* keep the page table cache within bounds */
+ check_pgt_cache();
}
int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
@@ -621,24 +624,36 @@ int copy_page_range(struct mm_struct *ds
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
+ spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
struct mm_struct *mm = tlb->mm;
pte_t *pte;
spinlock_t *ptl;
int file_rss = 0;
int anon_rss = 0;
+ int progress;
+again:
+ progress = 0;
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
arch_enter_lazy_mmu_mode();
do {
- pte_t ptent = *pte;
+ pte_t ptent;
+
+ if (progress >= 64) {
+ progress = 0;
+ if (need_resched() ||
+ need_lockbreak(ptl) ||
+ (i_mmap_lock && need_lockbreak(i_mmap_lock)))
+ break;
+ }
+ ptent = *pte;
if (pte_none(ptent)) {
- (*zap_work)--;
+ progress++;
continue;
}
-
- (*zap_work) -= PAGE_SIZE;
+ progress += 8;
if (pte_present(ptent)) {
struct page *page;
@@ -662,8 +677,10 @@ static unsigned long zap_pte_range(struc
page->index > details->last_index))
continue;
}
+ if (tlb_is_full(tlb))
+ break;
ptent = ptep_get_and_clear_full(mm, addr, pte,
- tlb->fullmm);
+ tlb->mode == TLB_EXIT);
tlb_remove_tlb_entry(tlb, pte, addr);
if (unlikely(!page))
continue;
@@ -693,20 +710,27 @@ static unsigned long zap_pte_range(struc
continue;
if (!pte_file(ptent))
free_swap_and_cache(pte_to_swp_entry(ptent));
- pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+ pte_clear(mm, addr, pte);
+ } while (pte++, addr += PAGE_SIZE, addr != end);
add_mm_rss(mm, file_rss, anon_rss);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
+ if (!i_mmap_lock) {
+ cond_resched();
+ if (tlb_is_full(tlb))
+ tlb_flush_mmu(tlb);
+ if (addr != end)
+ goto again;
+ }
return addr;
}
static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pmd_t *pmd;
unsigned long next;
@@ -715,20 +739,18 @@ static inline unsigned long zap_pmd_rang
do {
next = pmd_addr_end(addr, end);
if (pmd_none_or_clear_bad(pmd)) {
- (*zap_work)--;
+ addr = next;
continue;
}
- next = zap_pte_range(tlb, vma, pmd, addr, next,
- zap_work, details);
- } while (pmd++, addr = next, (addr != end && *zap_work > 0));
-
+ addr = zap_pte_range(tlb, vma, pmd, addr, next, details);
+ } while (pmd++, addr == next && addr != end);
return addr;
}
static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pud_t *pud;
unsigned long next;
@@ -737,20 +759,18 @@ static inline unsigned long zap_pud_rang
do {
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud)) {
- (*zap_work)--;
+ addr = next;
continue;
}
- next = zap_pmd_range(tlb, vma, pud, addr, next,
- zap_work, details);
- } while (pud++, addr = next, (addr != end && *zap_work > 0));
-
+ addr = zap_pmd_range(tlb, vma, pud, addr, next, details);
+ } while (pud++, addr == next && addr != end);
return addr;
}
static unsigned long unmap_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pgd_t *pgd;
unsigned long next;
@@ -764,137 +784,62 @@ static unsigned long unmap_page_range(st
do {
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd)) {
- (*zap_work)--;
+ addr = next;
continue;
}
- next = zap_pud_range(tlb, vma, pgd, addr, next,
- zap_work, details);
- } while (pgd++, addr = next, (addr != end && *zap_work > 0));
+ addr = zap_pud_range(tlb, vma, pgd, addr, next, details);
+ } while (pgd++, addr == next && addr != end);
tlb_end_vma(tlb, vma);
-
return addr;
}
-#ifdef CONFIG_PREEMPT
-# define ZAP_BLOCK_SIZE (8 * PAGE_SIZE)
-#else
-/* No preempt: go for improved straight-line efficiency */
-# define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE)
-#endif
-
/**
* unmap_vmas - unmap a range of memory covered by a list of vma's
- * @tlbp: address of the caller's struct mmu_gather
+ * @tlb: address of the caller's struct mmu_gather
* @vma: the starting vma
- * @start_addr: virtual address at which to start unmapping
- * @end_addr: virtual address at which to end unmapping
- * @nr_accounted: Place number of unmapped pages in vm-accountable vma's here
- * @details: details of nonlinear truncation or shared cache invalidation
- *
- * Returns the end address of the unmapping (restart addr if interrupted).
*
* Unmap all pages in the vma list.
- *
- * We aim to not hold locks for too long (for scheduling latency reasons).
- * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to
- * return the ending mmu_gather to the caller.
- *
- * Only addresses between `start' and `end' will be unmapped.
- *
* The VMA list must be sorted in ascending virtual address order.
- *
- * unmap_vmas() assumes that the caller will flush the whole unmapped address
- * range after unmap_vmas() returns. So the only responsibility here is to
- * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
- * drops the lock and schedules.
- */
-unsigned long unmap_vmas(struct mmu_gather **tlbp,
- struct vm_area_struct *vma, unsigned long start_addr,
- unsigned long end_addr, unsigned long *nr_accounted,
- struct zap_details *details)
+ */
+void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
- long zap_work = ZAP_BLOCK_SIZE;
- unsigned long tlb_start = 0; /* For tlb_finish_mmu */
- int tlb_start_valid = 0;
- unsigned long start = start_addr;
- spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
- int fullmm = (*tlbp)->fullmm;
-
- for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
- unsigned long end;
-
- start = max(vma->vm_start, start_addr);
- if (start >= vma->vm_end)
- continue;
- end = min(vma->vm_end, end_addr);
- if (end <= vma->vm_start)
- continue;
+ unsigned long nr_accounted = 0;
+ while (vma) {
if (vma->vm_flags & VM_ACCOUNT)
- *nr_accounted += (end - start) >> PAGE_SHIFT;
-
- while (start != end) {
- if (!tlb_start_valid) {
- tlb_start = start;
- tlb_start_valid = 1;
- }
-
- if (unlikely(is_vm_hugetlb_page(vma))) {
- unmap_hugepage_range(vma, start, end);
- zap_work -= (end - start) /
- (HPAGE_SIZE / PAGE_SIZE);
- start = end;
- } else
- start = unmap_page_range(*tlbp, vma,
- start, end, &zap_work, details);
-
- if (zap_work > 0) {
- BUG_ON(start != end);
- break;
- }
+ nr_accounted += vma_pages(vma);
- tlb_finish_mmu(*tlbp, tlb_start, start);
-
- if (need_resched() ||
- (i_mmap_lock && need_lockbreak(i_mmap_lock))) {
- if (i_mmap_lock) {
- *tlbp = NULL;
- goto out;
- }
- cond_resched();
- }
-
- *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
- tlb_start_valid = 0;
- zap_work = ZAP_BLOCK_SIZE;
- }
+ if (unlikely(is_vm_hugetlb_page(vma)))
+ unmap_hugepage_range(vma, vma->vm_start, vma->vm_end);
+ else
+ unmap_page_range(tlb, vma, vma->vm_start, vma->vm_end, NULL);
+ vma = vma->vm_next;
}
-out:
- return start; /* which is now the end (or restart) address */
+
+ vm_unacct_memory(nr_accounted);
}
/**
* zap_page_range - remove user pages in a given range
* @vma: vm_area_struct holding the applicable pages
* @address: starting address of pages to zap
- * @size: number of bytes to zap
+ * @end: ending address of pages to zap
* @details: details of nonlinear truncation or shared cache invalidation
*/
-unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
{
struct mm_struct *mm = vma->vm_mm;
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
unsigned long end = address + size;
- unsigned long nr_accounted = 0;
- lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ BUG_ON(is_vm_hugetlb_page(vma));
+ BUG_ON(address < vma->vm_start || end > vma->vm_end);
+
+ tlb_gather_mmu(&tlb, mm, TLB_UNMAP);
update_hiwater_rss(mm);
- end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
- if (tlb)
- tlb_finish_mmu(tlb, address, end);
- return end;
+ unmap_page_range(&tlb, vma, address, end, details);
+ tlb_finish_mmu(&tlb);
}
/*
@@ -1822,6 +1767,8 @@ static int unmap_mapping_range_vma(struc
unsigned long start_addr, unsigned long end_addr,
struct zap_details *details)
{
+ struct mm_struct *mm = vma->vm_mm;
+ struct mmu_gather tlb;
unsigned long restart_addr;
int need_break;
@@ -1836,8 +1783,12 @@ again:
}
}
- restart_addr = zap_page_range(vma, start_addr,
- end_addr - start_addr, details);
+ tlb_gather_mmu(&tlb, mm, TLB_TRUNC);
+ update_hiwater_rss(mm);
+ restart_addr = unmap_page_range(&tlb, vma,
+ start_addr, end_addr, details);
+ tlb_finish_mmu(&tlb);
+
need_break = need_resched() ||
need_lockbreak(details->i_mmap_lock);
--- 2.6.22/mm/mmap.c 2007-07-09 00:32:17.000000000 +0100
+++ linux/mm/mmap.c 2007-07-12 19:47:28.000000000 +0100
@@ -36,8 +36,7 @@
#endif
static void unmap_region(struct mm_struct *mm,
- struct vm_area_struct *vma, struct vm_area_struct *prev,
- unsigned long start, unsigned long end);
+ struct vm_area_struct *vma, struct vm_area_struct *prev);
/*
* WARNING: the debugging will use recursive algorithms so never enable this
@@ -1165,7 +1164,7 @@ unmap_and_free_vma:
fput(file);
/* Undo any partial mapping done by a device driver. */
- unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
+ unmap_region(mm, vma, prev);
charged = 0;
free_vma:
kmem_cache_free(vm_area_cachep, vma);
@@ -1677,21 +1676,17 @@ static void remove_vma_list(struct mm_st
* Called with the mm semaphore held.
*/
static void unmap_region(struct mm_struct *mm,
- struct vm_area_struct *vma, struct vm_area_struct *prev,
- unsigned long start, unsigned long end)
+ struct vm_area_struct *vma, struct vm_area_struct *prev)
{
struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
- struct mmu_gather *tlb;
- unsigned long nr_accounted = 0;
+ struct mmu_gather tlb;
- lru_add_drain();
- tlb = tlb_gather_mmu(mm, 0);
+ tlb_gather_mmu(&tlb, mm, TLB_UNMAP);
update_hiwater_rss(mm);
- unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
- vm_unacct_memory(nr_accounted);
+ unmap_vmas(&tlb, vma);
free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
next? next->vm_start: 0);
- tlb_finish_mmu(tlb, start, end);
+ tlb_finish_mmu(&tlb);
}
/*
@@ -1829,7 +1824,7 @@ int do_munmap(struct mm_struct *mm, unsi
* Remove the vma's, and unmap the actual pages
*/
detach_vmas_to_be_unmapped(mm, vma, prev, end);
- unmap_region(mm, vma, prev, start, end);
+ unmap_region(mm, vma, prev);
/* Fix up all other VM information */
remove_vma_list(mm, vma);
@@ -1968,23 +1963,18 @@ EXPORT_SYMBOL(do_brk);
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
- struct mmu_gather *tlb;
+ struct mmu_gather tlb;
struct vm_area_struct *vma = mm->mmap;
- unsigned long nr_accounted = 0;
- unsigned long end;
/* mm's last user has gone, and its about to be pulled down */
arch_exit_mmap(mm);
- lru_add_drain();
flush_cache_mm(mm);
- tlb = tlb_gather_mmu(mm, 1);
+ tlb_gather_mmu(&tlb, mm, TLB_EXIT);
/* Don't update_hiwater_rss(mm) here, do_exit already did */
- /* Use -1 here to ensure all VMAs in the mm are unmapped */
- end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
- vm_unacct_memory(nr_accounted);
+ unmap_vmas(&tlb, vma);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
- tlb_finish_mmu(tlb, 0, end);
+ tlb_finish_mmu(&tlb);
/*
* Walk the list again, actually closing and freeing it,
--- 2.6.22/mm/swap_state.c 2006-09-20 04:42:06.000000000 +0100
+++ linux/mm/swap_state.c 2007-07-12 19:47:28.000000000 +0100
@@ -258,16 +258,6 @@ static inline void free_swap_cache(struc
}
}
-/*
- * Perform a free_page(), also freeing any swap cache associated with
- * this page if it is the last user of the page.
- */
-void free_page_and_swap_cache(struct page *page)
-{
- free_swap_cache(page);
- page_cache_release(page);
-}
-
/*
* Passed an array of pages, drop them all from swapcache and then release
* them. They are removed from the LRU and freed if this is their last use.
@@ -286,6 +276,8 @@ void free_pages_and_swap_cache(struct pa
release_pages(pagep, todo, 0);
pagep += todo;
nr -= todo;
+ if (nr && !preempt_count())
+ cond_resched();
}
}