* [RFC] Possible alternate 4 level pagetables?
@ 2004-11-14 5:46 Nick Piggin
2004-11-14 5:46 ` Nick Piggin
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Nick Piggin @ 2004-11-14 5:46 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Memory Management
[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]
Hi Andi,
Just looking at your 4 level page tables patch, I wondered why the extra
level isn't inserted between pgd and pmd, as that would appear to be the
least intrusive (conceptually, in the generic code). Also it maybe matches
more closely the way that the 2->3 level conversion was done.
I've been toying with it a little bit. It is mainly just starting with
your code and doing straight conversions, although I also attempted to
implement a better compatibility layer that does the pagetable "folding"
for you if you don't need to use the full range of them.
Caveats are that there is still something slightly broken with it on i386,
and so I haven't looked at x86-64 yet. I don't see why this wouldn't work
though.
I've called the new level 'pud'. u for upper or something.
Sorry the patch isn't in very good shape at the moment - I won't have time
to work on it for a week, so I thought this would be a good point just to
solicit initial comments.
Patches against recent -bk. patch 1/4 attached.
Thanks,
Nick
[-- Attachment #2: 3level-cleanup.patch --]
[-- Type: text/x-patch, Size: 9965 bytes --]
---
linux-2.6-npiggin/include/asm-generic/pgtable-nopmd.h | 47 +++++++++++++++
linux-2.6-npiggin/include/asm-i386/mmzone.h | 1
linux-2.6-npiggin/include/asm-i386/page.h | 6 -
linux-2.6-npiggin/include/asm-i386/pgalloc.h | 7 --
linux-2.6-npiggin/include/asm-i386/pgtable-2level-defs.h | 2
linux-2.6-npiggin/include/asm-i386/pgtable-2level.h | 27 +-------
linux-2.6-npiggin/include/asm-i386/pgtable-3level.h | 7 ++
linux-2.6-npiggin/include/asm-i386/pgtable.h | 14 +---
8 files changed, 68 insertions(+), 43 deletions(-)
diff -puN /dev/null include/asm-generic/pgtable-nopmd.h
--- /dev/null 2004-09-06 19:38:39.000000000 +1000
+++ linux-2.6-npiggin/include/asm-generic/pgtable-nopmd.h 2004-11-14 12:31:01.000000000 +1100
@@ -0,0 +1,47 @@
+#ifndef _PGTABLE_NOPMD_H
+#define _PGTABLE_NOPMD_H
+
+#define PMD_SHIFT PGDIR_SHIFT
+#define PTRS_PER_PMD 1
+#define PMD_SIZE (1UL << PMD_SHIFT)
+#define PMD_MASK (~(PMD_SIZE-1))
+
+/*
+ * The "pgd_xxx()" functions here are trivial for a folded two-level
+ * setup: the pmd is never bad, and a pmd always exists (as it's folded
+ * into the pgd entry)
+ */
+#define pgd_none(pmd) 0
+#define pgd_bad(pmd) 0
+#define pgd_present(pmd) 1
+#define pgd_clear(xp) do { } while (0)
+#define pgd_ERROR(pmd) do { } while (0)
+
+#define pgd_populate(mm, pmd, pte) do { } while (0)
+#define pgd_populate_kernel(mm, pmd, pte) do { } while (0)
+
+/*
+ * (pmds are folded into pgds so this doesn't get actually called,
+ * but the define is needed for a generic inline function.)
+ */
+#define set_pmd(pmdptr, pmdval) set_pgd(((pgd_t *)pmdptr), __pgd(pmd_val(pmdval)))
+
+#define pmd_offset(pgd, address) ((pmd_t *)(pgd))
+
+#define pmd_val(x) (pgd_val((x).pmd))
+#define __pmd(x) ((pmd_t) { (x) } )
+
+#define pmd_page(pmd) (pgd_page(*(pgd_t *)&(pmd)))
+#define pmd_page_kernel(pmd) (pgd_page_kernel(*(pgd_t *)&(pmd)))
+
+/*
+ * allocating and freeing a pmd is trivial: the 1-entry pmd is
+ * inside the pgd, so has no extra memory associated with it.
+ */
+#define pmd_alloc_one(mm, address) NULL
+#define pmd_free(x) do { } while (0)
+#define __pmd_free_tlb(tlb, x) do { } while (0)
+
+typedef struct { pgd_t pmd; } pmd_t;
+
+#endif /* _PGTABLE_NOPMD_H */
diff -puN include/asm-i386/pgtable-2level.h~3level-cleanup include/asm-i386/pgtable-2level.h
--- linux-2.6/include/asm-i386/pgtable-2level.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgtable-2level.h 2004-11-13 18:44:06.000000000 +1100
@@ -1,22 +1,12 @@
#ifndef _I386_PGTABLE_2LEVEL_H
#define _I386_PGTABLE_2LEVEL_H
+#include <asm-generic/pgtable-nopmd.h>
+
#define pte_ERROR(e) \
printk("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, (e).pte_low)
#define pmd_ERROR(e) \
printk("%s:%d: bad pmd %08lx.\n", __FILE__, __LINE__, pmd_val(e))
-#define pgd_ERROR(e) \
- printk("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
-
-/*
- * The "pgd_xxx()" functions here are trivial for a folded two-level
- * setup: the pgd is never bad, and a pmd always exists (as it's folded
- * into the pgd entry)
- */
-static inline int pgd_none(pgd_t pgd) { return 0; }
-static inline int pgd_bad(pgd_t pgd) { return 0; }
-static inline int pgd_present(pgd_t pgd) { return 1; }
-#define pgd_clear(xp) do { } while (0)
/*
* Certain architectures need to do special things when PTEs
@@ -25,20 +15,11 @@ static inline int pgd_present(pgd_t pgd)
*/
#define set_pte(pteptr, pteval) (*(pteptr) = pteval)
#define set_pte_atomic(pteptr, pteval) set_pte(pteptr,pteval)
-/*
- * (pmds are folded into pgds so this doesn't get actually called,
- * but the define is needed for a generic inline function.)
- */
-#define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
+
#define set_pgd(pgdptr, pgdval) (*(pgdptr) = pgdval)
-#define pgd_page(pgd) \
-((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define pgd_page(pgd) (pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT))
-static inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
-{
- return (pmd_t *) dir;
-}
#define ptep_get_and_clear(xp) __pte(xchg(&(xp)->pte_low, 0))
#define pte_same(a, b) ((a).pte_low == (b).pte_low)
#define pte_page(x) pfn_to_page(pte_pfn(x))
diff -puN include/asm-i386/page.h~3level-cleanup include/asm-i386/page.h
--- linux-2.6/include/asm-i386/page.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/page.h 2004-11-13 18:44:06.000000000 +1100
@@ -46,11 +46,12 @@ typedef struct { unsigned long pte_low,
typedef struct { unsigned long long pmd; } pmd_t;
typedef struct { unsigned long long pgd; } pgd_t;
typedef struct { unsigned long long pgprot; } pgprot_t;
+#define pmd_val(x) ((x).pmd)
#define pte_val(x) ((x).pte_low | ((unsigned long long)(x).pte_high << 32))
+#define __pmd(x) ((pmd_t) { (x) } )
#define HPAGE_SHIFT 21
#else
typedef struct { unsigned long pte_low; } pte_t;
-typedef struct { unsigned long pmd; } pmd_t;
typedef struct { unsigned long pgd; } pgd_t;
typedef struct { unsigned long pgprot; } pgprot_t;
#define boot_pte_t pte_t /* or would you rather have a typedef */
@@ -66,13 +67,10 @@ typedef struct { unsigned long pgprot; }
#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
#endif
-
-#define pmd_val(x) ((x).pmd)
#define pgd_val(x) ((x).pgd)
#define pgprot_val(x) ((x).pgprot)
#define __pte(x) ((pte_t) { (x) } )
-#define __pmd(x) ((pmd_t) { (x) } )
#define __pgd(x) ((pgd_t) { (x) } )
#define __pgprot(x) ((pgprot_t) { (x) } )
diff -puN include/asm-i386/pgtable-2level-defs.h~3level-cleanup include/asm-i386/pgtable-2level-defs.h
--- linux-2.6/include/asm-i386/pgtable-2level-defs.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgtable-2level-defs.h 2004-11-13 18:44:06.000000000 +1100
@@ -12,8 +12,6 @@
* the i386 is two-level, so we don't really have any
* PMD directory physically.
*/
-#define PMD_SHIFT 22
-#define PTRS_PER_PMD 1
#define PTRS_PER_PTE 1024
diff -puN include/asm-generic/pgtable.h~3level-cleanup include/asm-generic/pgtable.h
diff -puN include/asm-i386/pgtable-3level.h~3level-cleanup include/asm-i386/pgtable-3level.h
--- linux-2.6/include/asm-i386/pgtable-3level.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgtable-3level.h 2004-11-14 12:31:01.000000000 +1100
@@ -73,6 +73,11 @@ static inline void pgd_clear (pgd_t * pg
#define pgd_page(pgd) \
((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
+
+#define pmd_page_kernel(pmd) \
+((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+
/* Find an entry in the second-level page table.. */
#define pmd_offset(dir, address) ((pmd_t *) pgd_page(*(dir)) + \
pmd_index(address))
@@ -142,4 +147,6 @@ static inline pmd_t pfn_pmd(unsigned lon
#define __pte_to_swp_entry(pte) ((swp_entry_t){ (pte).pte_high })
#define __swp_entry_to_pte(x) ((pte_t){ 0, (x).val })
+#define __pmd_free_tlb(tlb, x) do { } while (0)
+
#endif /* _I386_PGTABLE_3LEVEL_H */
diff -puN include/asm-i386/pgalloc.h~3level-cleanup include/asm-i386/pgalloc.h
--- linux-2.6/include/asm-i386/pgalloc.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgalloc.h 2004-11-14 12:31:01.000000000 +1100
@@ -39,16 +39,15 @@ static inline void pte_free(struct page
#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
+#ifdef CONFIG_X86_PAE
/*
- * allocating and freeing a pmd is trivial: the 1-entry pmd is
- * inside the pgd, so has no extra memory associated with it.
- * (In the PAE case we free the pmds as part of the pgd.)
+ * In the PAE case we free the pmds as part of the pgd.
*/
-
#define pmd_alloc_one(mm, addr) ({ BUG(); ((pmd_t *)2); })
#define pmd_free(x) do { } while (0)
#define __pmd_free_tlb(tlb,x) do { } while (0)
#define pgd_populate(mm, pmd, pte) BUG()
+#endif
#define check_pgt_cache() do { } while (0)
diff -puN include/asm-i386/pgtable.h~3level-cleanup include/asm-i386/pgtable.h
--- linux-2.6/include/asm-i386/pgtable.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgtable.h 2004-11-14 12:30:02.000000000 +1100
@@ -50,12 +50,12 @@ void paging_init(void);
*/
#ifdef CONFIG_X86_PAE
# include <asm/pgtable-3level-defs.h>
+# define PMD_SIZE (1UL << PMD_SHIFT)
+# define PMD_MASK (~(PMD_SIZE-1))
#else
# include <asm/pgtable-2level-defs.h>
#endif
-#define PMD_SIZE (1UL << PMD_SHIFT)
-#define PMD_MASK (~(PMD_SIZE-1))
#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE-1))
@@ -293,15 +293,11 @@ static inline pte_t pte_modify(pte_t pte
#define page_pte(page) page_pte_prot(page, __pgprot(0))
-#define pmd_page_kernel(pmd) \
-((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
-
-#ifndef CONFIG_DISCONTIGMEM
-#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
-#endif /* !CONFIG_DISCONTIGMEM */
+#define pgd_page_kernel(pgd) \
+((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
#define pmd_large(pmd) \
- ((pmd_val(pmd) & (_PAGE_PSE|_PAGE_PRESENT)) == (_PAGE_PSE|_PAGE_PRESENT))
+((pmd_val(pmd) & (_PAGE_PSE|_PAGE_PRESENT)) == (_PAGE_PSE|_PAGE_PRESENT))
/*
* the pgd page can be thought of an array like this: pgd_t[PTRS_PER_PGD]
diff -puN include/asm-i386/mmzone.h~3level-cleanup include/asm-i386/mmzone.h
--- linux-2.6/include/asm-i386/mmzone.h~3level-cleanup 2004-11-13 18:44:06.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/mmzone.h 2004-11-13 18:44:06.000000000 +1100
@@ -116,7 +116,6 @@ static inline struct pglist_data *pfn_to
(unsigned long)(__page - __zone->zone_mem_map) \
+ __zone->zone_start_pfn; \
})
-#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
#ifdef CONFIG_X86_NUMAQ /* we have contiguous memory on NUMA-Q */
#define pfn_valid(pfn) ((pfn) < num_physpages)
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 5:46 [RFC] Possible alternate 4 level pagetables? Nick Piggin
@ 2004-11-14 5:46 ` Nick Piggin
2004-11-14 5:47 ` Nick Piggin
2004-11-14 8:25 ` Andi Kleen
2004-11-15 21:37 ` Hugh Dickins
2 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2004-11-14 5:46 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Memory Management
[-- Attachment #1: Type: text/plain, Size: 4 bytes --]
2/4
[-- Attachment #2: 4level-compat.patch --]
[-- Type: text/x-patch, Size: 8102 bytes --]
---
linux-2.6-npiggin/include/asm-generic/pgtable-nopmd.h | 44 ++++++++--------
linux-2.6-npiggin/include/asm-generic/pgtable-nopud.h | 47 ++++++++++++++++++
linux-2.6-npiggin/include/asm-generic/tlb.h | 6 ++
linux-2.6-npiggin/include/asm-i386/pgalloc.h | 2
linux-2.6-npiggin/include/asm-i386/pgtable-3level.h | 16 +++---
5 files changed, 86 insertions(+), 29 deletions(-)
diff -puN /dev/null include/asm-generic/pgtable-nopud.h
--- /dev/null 2004-09-06 19:38:39.000000000 +1000
+++ linux-2.6-npiggin/include/asm-generic/pgtable-nopud.h 2004-11-14 12:33:19.000000000 +1100
@@ -0,0 +1,47 @@
+#ifndef _PGTABLE_NOPUD_H
+#define _PGTABLE_NOPUD_H
+
+#define PUD_SHIFT PGDIR_SHIFT
+#define PTRS_PER_PUD 1
+#define PUD_SIZE (1UL << PUD_SHIFT)
+#define PUD_MASK (~(PUD_SIZE-1))
+
+/*
+ * The "pgd_xxx()" functions here are trivial for a folded two or three-level
+ * setup: the pgd is never bad, and a pud always exists (as it's folded
+ * into the pgd entry)
+ */
+#define pgd_none(pgd) 0
+#define pgd_bad(pgd) 0
+#define pgd_present(pgd) 1
+#define pgd_clear(xp) do { } while (0)
+#define pgd_ERROR(pgd) do { } while (0)
+
+#define pgd_populate(mm, pud, pmd) do { } while (0)
+
+/*
+ * (puds are folded into pgds so this doesn't get actually called,
+ * but the define is needed for a generic inline function.)
+ */
+#define set_pud(pudptr, pudval) set_pgd(((pgd_t *)pudptr), *(pgd_t *)&(pudval))
+
+#define pud_offset(pgd, address) ((pud_t *)(pgd))
+#define pud_offset_k(pgd, address) ((pud_t *)(pgd))
+
+#define pud_val(x) (pgd_val((x).pud))
+#define __pud(x) ((pud_t) { (x) } )
+
+#define pud_page(pud) (pgd_page(*(pgd_t *)&(pud)))
+#define pud_page_kernel(pud) (pgd_page_kernel(*(pgd_t *)&(pud)))
+
+/*
+ * allocating and freeing a pud is trivial: the 1-entry pud is
+ * inside the pgd, so has no extra memory associated with it.
+ */
+#define pud_alloc_one(mm, address) NULL
+#define pud_free(x) do { } while (0)
+#define __pud_free_tlb(tlb, x) do { } while (0)
+
+typedef struct { pgd_t pud; } pud_t;
+
+#endif /* _PGTABLE_NOPUD_H */
diff -puN include/asm-generic/pgtable-nopmd.h~4level-compat include/asm-generic/pgtable-nopmd.h
--- linux-2.6/include/asm-generic/pgtable-nopmd.h~4level-compat 2004-11-14 12:32:51.000000000 +1100
+++ linux-2.6-npiggin/include/asm-generic/pgtable-nopmd.h 2004-11-14 12:32:51.000000000 +1100
@@ -1,47 +1,49 @@
#ifndef _PGTABLE_NOPMD_H
#define _PGTABLE_NOPMD_H
-#define PMD_SHIFT PGDIR_SHIFT
+#include <asm-generic/pgtable-nopud.h>
+
+#define PMD_SHIFT PUD_SHIFT
#define PTRS_PER_PMD 1
#define PMD_SIZE (1UL << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE-1))
/*
- * The "pgd_xxx()" functions here are trivial for a folded two-level
+ * The "pud_xxx()" functions here are trivial for a folded two-level
* setup: the pmd is never bad, and a pmd always exists (as it's folded
- * into the pgd entry)
+ * into the pud entry)
*/
-#define pgd_none(pmd) 0
-#define pgd_bad(pmd) 0
-#define pgd_present(pmd) 1
-#define pgd_clear(xp) do { } while (0)
-#define pgd_ERROR(pmd) do { } while (0)
+#define pud_none(pmd) 0
+#define pud_bad(pmd) 0
+#define pud_present(pmd) 1
+#define pud_clear(xp) do { } while (0)
+#define pud_ERROR(pmd) do { } while (0)
-#define pgd_populate(mm, pmd, pte) do { } while (0)
-#define pgd_populate_kernel(mm, pmd, pte) do { } while (0)
+#define pud_populate(mm, pmd, pte) do { } while (0)
+#define pud_populate_kernel(mm, pmd, pte) do { } while (0)
/*
- * (pmds are folded into pgds so this doesn't get actually called,
+ * (pmds are folded into puds so this doesn't get actually called,
* but the define is needed for a generic inline function.)
*/
-#define set_pmd(pmdptr, pmdval) set_pgd(((pgd_t *)pmdptr), __pgd(pmd_val(pmdval)))
+#define set_pmd(pmdptr, pmdval) set_pud(((pud_t *)pmdptr), *(pud_t *)&(pmdval))
-#define pmd_offset(pgd, address) ((pmd_t *)(pgd))
+#define pmd_offset(pud, address) ((pmd_t *)(pud))
-#define pmd_val(x) (pgd_val((x).pmd))
+#define pmd_val(x) (pud_val((x).pmd))
#define __pmd(x) ((pmd_t) { (x) } )
-#define pmd_page(pmd) (pgd_page(*(pgd_t *)&(pmd)))
-#define pmd_page_kernel(pmd) (pgd_page_kernel(*(pgd_t *)&(pmd)))
+#define pmd_page(pmd) (pud_page(*(pud_t *)&(pmd)))
+#define pmd_page_kernel(pmd) (pud_page_kernel(*(pud_t *)&(pmd)))
/*
* allocating and freeing a pmd is trivial: the 1-entry pmd is
- * inside the pgd, so has no extra memory associated with it.
+ * inside the pud, so has no extra memory associated with it.
*/
-#define pmd_alloc_one(mm, address) NULL
-#define pmd_free(x) do { } while (0)
-#define __pmd_free_tlb(tlb, x) do { } while (0)
+#define pmd_alloc_one(mm, address) NULL
+#define pmd_free(x) do { } while (0)
+#define __pmd_free_tlb(tlb, x) do { } while (0)
-typedef struct { pgd_t pmd; } pmd_t;
+typedef struct { pud_t pmd; } pmd_t;
#endif /* _PGTABLE_NOPMD_H */
diff -puN include/asm-i386/pgtable.h~4level-compat include/asm-i386/pgtable.h
diff -puN include/asm-generic/tlb.h~4level-compat include/asm-generic/tlb.h
--- linux-2.6/include/asm-generic/tlb.h~4level-compat 2004-11-14 12:32:51.000000000 +1100
+++ linux-2.6-npiggin/include/asm-generic/tlb.h 2004-11-14 12:32:51.000000000 +1100
@@ -141,6 +141,12 @@ static inline void tlb_remove_page(struc
__pte_free_tlb(tlb, ptep); \
} while (0)
+#define pud_free_tlb(tlb, pudp) \
+ do { \
+ tlb->need_flush = 1; \
+ __pud_free_tlb(tlb, pudp); \
+ } while (0)
+
#define pmd_free_tlb(tlb, pmdp) \
do { \
tlb->need_flush = 1; \
diff -puN include/asm-i386/pgtable-3level.h~4level-compat include/asm-i386/pgtable-3level.h
--- linux-2.6/include/asm-i386/pgtable-3level.h~4level-compat 2004-11-14 12:32:51.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgtable-3level.h 2004-11-14 12:32:51.000000000 +1100
@@ -1,6 +1,8 @@
#ifndef _I386_PGTABLE_3LEVEL_H
#define _I386_PGTABLE_3LEVEL_H
+#include <asm-generic/pgtable-nopud.h>
+
/*
* Intel Physical Address Extension (PAE) Mode - three-level page
* tables on PPro+ CPUs.
@@ -12,12 +14,12 @@
printk("%s:%d: bad pte %p(%08lx%08lx).\n", __FILE__, __LINE__, &(e), (e).pte_high, (e).pte_low)
#define pmd_ERROR(e) \
printk("%s:%d: bad pmd %p(%016Lx).\n", __FILE__, __LINE__, &(e), pmd_val(e))
-#define pgd_ERROR(e) \
- printk("%s:%d: bad pgd %p(%016Lx).\n", __FILE__, __LINE__, &(e), pgd_val(e))
+#define pud_ERROR(e) \
+ printk("%s:%d: bad pud %p(%016Lx).\n", __FILE__, __LINE__, &(e), pud_val(e))
-static inline int pgd_none(pgd_t pgd) { return 0; }
-static inline int pgd_bad(pgd_t pgd) { return 0; }
-static inline int pgd_present(pgd_t pgd) { return 1; }
+#define pud_none(pud) 0
+#define pud_bad(pud) 0
+#define pud_present(pud) 1
/*
* Is the pte executable?
@@ -68,7 +70,7 @@ static inline void set_pte(pte_t *ptep,
* We do not let the generic code free and clear pgd entries due to
* this erratum.
*/
-static inline void pgd_clear (pgd_t * pgd) { }
+#define pud_clear(pud) do { } while (0)
#define pgd_page(pgd) \
((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
@@ -79,7 +81,7 @@ static inline void pgd_clear (pgd_t * pg
((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
/* Find an entry in the second-level page table.. */
-#define pmd_offset(dir, address) ((pmd_t *) pgd_page(*(dir)) + \
+#define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \
pmd_index(address))
static inline pte_t ptep_get_and_clear(pte_t *ptep)
diff -puN include/asm-i386/pgalloc.h~4level-compat include/asm-i386/pgalloc.h
--- linux-2.6/include/asm-i386/pgalloc.h~4level-compat 2004-11-14 12:32:51.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgalloc.h 2004-11-14 12:32:51.000000000 +1100
@@ -46,7 +46,7 @@ static inline void pte_free(struct page
#define pmd_alloc_one(mm, addr) ({ BUG(); ((pmd_t *)2); })
#define pmd_free(x) do { } while (0)
#define __pmd_free_tlb(tlb,x) do { } while (0)
-#define pgd_populate(mm, pmd, pte) BUG()
+#define pud_populate(mm, pmd, pte) BUG()
#endif
#define check_pgt_cache() do { } while (0)
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 5:46 ` Nick Piggin
@ 2004-11-14 5:47 ` Nick Piggin
2004-11-14 5:48 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2004-11-14 5:47 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Memory Management
[-- Attachment #1: Type: text/plain, Size: 17 bytes --]
3/4 - core patch
[-- Attachment #2: 4level-core-patch.patch --]
[-- Type: text/x-patch, Size: 53731 bytes --]
Extend the Linux MM to 4level page tables.
This is the core patch for mm/*, fs/*, include/linux/*
It breaks all architectures, which will be fixed in separate patches.
The conversion is quite straight forward. All the functions walking the page
table hierarchy have been changed to deal with another level at the top. The
additional level is called pml4.
mm/memory.c has changed a lot because it did most of the heavy lifting here.
Most of the changes here are extensions of the previous code.
One exception is copy_page_range() which I rewrote into the standard multiple
inlines style. The previous single function goto heaven was not really
scalable to 4level.
pgd_offset() and pgd_offset_k() have been renamed to make sure all unconverted
code causes an compile error. They are replaced with
pml4_offset()/pml4_offset_k() and pml4_pgd_offset()/pml4_pgd_offset_k() Please
note than when you use pml4_offset_k() you also have to use
pml4_pgd_offset_k() because IA64 requires a special case here.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
25-akpm/fs/exec.c | 6
25-akpm/include/linux/init_task.h | 2
25-akpm/include/linux/sched.h | 2
25-akpm/kernel/fork.c | 6
25-akpm/mm/fremap.c | 11
---
linux-2.6-npiggin/fs/exec.c | 6
linux-2.6-npiggin/include/linux/init_task.h | 2
linux-2.6-npiggin/include/linux/mm.h | 22
linux-2.6-npiggin/mm/fremap.c | 18
linux-2.6-npiggin/mm/memory.c | 687 ++++++++++++++++++----------
linux-2.6-npiggin/mm/mempolicy.c | 22
linux-2.6-npiggin/mm/mmap.c | 25 -
linux-2.6-npiggin/mm/mprotect.c | 65 +-
linux-2.6-npiggin/mm/mremap.c | 29 -
linux-2.6-npiggin/mm/msync.c | 55 +-
linux-2.6-npiggin/mm/rmap.c | 21
linux-2.6-npiggin/mm/swapfile.c | 81 ++-
linux-2.6-npiggin/mm/vmalloc.c | 113 +++-
13 files changed, 794 insertions(+), 352 deletions(-)
diff -puN fs/exec.c~4level-core-patch fs/exec.c
--- linux-2.6/fs/exec.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/fs/exec.c 2004-11-14 12:33:54.000000000 +1100
@@ -300,6 +300,7 @@ void install_arg_page(struct vm_area_str
{
struct mm_struct *mm = vma->vm_mm;
pgd_t * pgd;
+ pud_t * pud;
pmd_t * pmd;
pte_t * pte;
@@ -310,7 +311,10 @@ void install_arg_page(struct vm_area_str
pgd = pgd_offset(mm, address);
spin_lock(&mm->page_table_lock);
- pmd = pmd_alloc(mm, pgd, address);
+ pud = pud_alloc(mm, pgd, address);
+ if (!pud)
+ goto out;
+ pmd = pmd_alloc(mm, pud, address);
if (!pmd)
goto out;
pte = pte_alloc_map(mm, pmd, address);
diff -puN include/linux/init_task.h~4level-core-patch include/linux/init_task.h
--- linux-2.6/include/linux/init_task.h~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/include/linux/init_task.h 2004-11-14 12:33:54.000000000 +1100
@@ -34,7 +34,7 @@
#define INIT_MM(name) \
{ \
.mm_rb = RB_ROOT, \
- .pgd = swapper_pg_dir, \
+ .pgd = swapper_pg_dir, \
.mm_users = ATOMIC_INIT(2), \
.mm_count = ATOMIC_INIT(1), \
.mmap_sem = __RWSEM_INITIALIZER(name.mmap_sem), \
diff -puN include/linux/mm.h~4level-core-patch include/linux/mm.h
--- linux-2.6/include/linux/mm.h~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h 2004-11-14 15:04:02.000000000 +1100
@@ -566,7 +566,7 @@ int unmap_vmas(struct mmu_gather **tlbp,
struct vm_area_struct *start_vma, unsigned long start_addr,
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *);
-void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
+void clear_page_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end);
int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma);
int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
@@ -581,7 +581,8 @@ static inline void unmap_shared_mapping_
}
extern int vmtruncate(struct inode * inode, loff_t offset);
-extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
+extern pud_t *FASTCALL(__pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
+extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address));
extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
extern int install_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot);
@@ -626,15 +627,22 @@ extern struct shrinker *set_shrinker(int
extern void remove_shrinker(struct shrinker *shrinker);
/*
- * On a two-level page table, this ends up being trivial. Thus the
- * inlining and the symmetry break with pte_alloc_map() that does all
+ * On a two-level or three-level page table, this ends up being trivial. Thus
+ * the inlining and the symmetry break with pte_alloc_map() that does all
* of this out-of-line.
*/
-static inline pmd_t *pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
{
if (pgd_none(*pgd))
- return __pmd_alloc(mm, pgd, address);
- return pmd_offset(pgd, address);
+ return __pud_alloc(mm, pgd, address);
+ return pud_offset(pgd, address);
+}
+
+static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
+{
+ if (pud_none(*pud))
+ return __pmd_alloc(mm, pud, address);
+ return pmd_offset(pud, address);
}
extern void free_area_init(unsigned long * zones_size);
diff -puN include/linux/sched.h~4level-core-patch include/linux/sched.h
diff -puN kernel/fork.c~4level-core-patch kernel/fork.c
diff -puN mm/fremap.c~4level-core-patch mm/fremap.c
--- linux-2.6/mm/fremap.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/fremap.c 2004-11-14 12:33:54.000000000 +1100
@@ -60,14 +60,19 @@ int install_page(struct mm_struct *mm, s
pgoff_t size;
int err = -ENOMEM;
pte_t *pte;
- pgd_t *pgd;
pmd_t *pmd;
+ pud_t *pud;
+ pgd_t *pgd;
pte_t pte_val;
pgd = pgd_offset(mm, addr);
spin_lock(&mm->page_table_lock);
+
+ pud = pud_alloc(mm, pgd, addr);
+ if (!pud)
+ goto err_unlock;
- pmd = pmd_alloc(mm, pgd, addr);
+ pmd = pmd_alloc(mm, pud, addr);
if (!pmd)
goto err_unlock;
@@ -112,14 +117,19 @@ int install_file_pte(struct mm_struct *m
{
int err = -ENOMEM;
pte_t *pte;
- pgd_t *pgd;
pmd_t *pmd;
+ pud_t *pud;
+ pgd_t *pgd;
pte_t pte_val;
pgd = pgd_offset(mm, addr);
spin_lock(&mm->page_table_lock);
+
+ pud = pud_alloc(mm, pgd, addr);
+ if (!pud)
+ goto err_unlock;
- pmd = pmd_alloc(mm, pgd, addr);
+ pmd = pmd_alloc(mm, pud, addr);
if (!pmd)
goto err_unlock;
diff -puN mm/memory.c~4level-core-patch mm/memory.c
--- linux-2.6/mm/memory.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/memory.c 2004-11-14 16:32:50.000000000 +1100
@@ -34,6 +34,8 @@
*
* 16.07.99 - Support of BIGMEM added by Gerhard Wichert, Siemens AG
* (Gerhard.Wichert@pdb.siemens.de)
+ *
+ * Aug/Sep 2004 Changed to four level page tables (Andi Kleen)
*/
#include <linux/kernel_stat.h>
@@ -118,10 +120,30 @@ static inline void free_one_pmd(struct m
pte_free_tlb(tlb, page);
}
+static inline void free_one_pud(struct mmu_gather *tlb, pud_t * dir)
+{
+ int j;
+ pmd_t *pmd;
+
+ if (pud_none(*dir))
+ return;
+ if (unlikely(pud_bad(*dir))) {
+ pud_ERROR(*dir);
+ pud_clear(dir);
+ return;
+ }
+ pmd = pmd_offset(dir, 0);
+ pud_clear(dir);
+ for (j = 0; j < PTRS_PER_PMD ; j++)
+ free_one_pmd(tlb, pmd+j);
+ pmd_free_tlb(tlb, pmd);
+}
+
+
static inline void free_one_pgd(struct mmu_gather *tlb, pgd_t * dir)
{
int j;
- pmd_t * pmd;
+ pud_t * pud;
if (pgd_none(*dir))
return;
@@ -130,28 +152,31 @@ static inline void free_one_pgd(struct m
pgd_clear(dir);
return;
}
- pmd = pmd_offset(dir, 0);
+ pud = pud_offset(dir, 0);
pgd_clear(dir);
- for (j = 0; j < PTRS_PER_PMD ; j++)
- free_one_pmd(tlb, pmd+j);
- pmd_free_tlb(tlb, pmd);
+ for (j = 0; j < PTRS_PER_PUD ; j++)
+ free_one_pud(tlb, pud+j);
+ pud_free_tlb(tlb, pud);
}
/*
- * This function clears all user-level page tables of a process - this
- * is needed by execve(), so that old pages aren't in the way.
+ * This function clears user-level page tables of a process.
*
* Must be called with pagetable lock held.
+ *
+ * This function is not exact and may clear less than the range if
+ * addr and end are not suitably aligned.
*/
-void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr)
+void clear_page_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end)
{
pgd_t * page_dir = tlb->mm->pgd;
- page_dir += first;
+ page_dir += addr;
do {
free_one_pgd(tlb, page_dir);
page_dir++;
- } while (--nr);
+ addr = (addr + PGDIR_SIZE) & PGDIR_MASK;
+ } while (addr && (addr < end));
}
pte_t fastcall * pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
@@ -204,165 +229,210 @@ pte_t fastcall * pte_alloc_kernel(struct
out:
return pte_offset_kernel(pmd, address);
}
-#define PTE_TABLE_MASK ((PTRS_PER_PTE-1) * sizeof(pte_t))
-#define PMD_TABLE_MASK ((PTRS_PER_PMD-1) * sizeof(pmd_t))
/*
* copy one vm_area from one task to the other. Assumes the page tables
* already present in the new task to be cleared in the whole range
* covered by this vma.
*
- * 08Jan98 Merged into one routine from several inline routines to reduce
- * variable count and make things faster. -jj
- *
* dst->page_table_lock is held on entry and exit,
- * but may be dropped within pmd_alloc() and pte_alloc_map().
+ * but may be dropped within p[mg]d_alloc() and pte_alloc_map().
*/
-int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
- struct vm_area_struct *vma)
+
+static inline void
+copy_swap_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte_t pte)
{
- pgd_t * src_pgd, * dst_pgd;
- unsigned long address = vma->vm_start;
- unsigned long end = vma->vm_end;
- unsigned long cow;
+ if (pte_file(pte))
+ return;
+ swap_duplicate(pte_to_swp_entry(pte));
+ if (list_empty(&dst_mm->mmlist)) {
+ spin_lock(&mmlist_lock);
+ list_add(&dst_mm->mmlist, &src_mm->mmlist);
+ spin_unlock(&mmlist_lock);
+ }
+}
- if (is_vm_hugetlb_page(vma))
- return copy_hugetlb_page_range(dst, src, vma);
+static inline void
+copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+ pte_t *dst_pte, pte_t *src_pte, unsigned long vm_flags,
+ unsigned long addr)
+{
+ pte_t pte = *src_pte;
+ struct page *page;
+ unsigned long pfn;
- cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
- src_pgd = pgd_offset(src, address)-1;
- dst_pgd = pgd_offset(dst, address)-1;
+ /* pte contains position in swap, so copy. */
+ if (!pte_present(pte)) {
+ copy_swap_pte(dst_mm, src_mm, pte);
+ set_pte(dst_pte, pte);
+ return;
+ }
+ pfn = pte_pfn(pte);
+ /* the pte points outside of valid memory, the
+ * mapping is assumed to be good, meaningful
+ * and not mapped via rmap - duplicate the
+ * mapping as is.
+ */
+ page = NULL;
+ if (pfn_valid(pfn))
+ page = pfn_to_page(pfn);
- for (;;) {
- pmd_t * src_pmd, * dst_pmd;
+ if (!page || PageReserved(page)) {
+ set_pte(dst_pte, pte);
+ return;
+ }
- src_pgd++; dst_pgd++;
-
- /* copy_pmd_range */
-
- if (pgd_none(*src_pgd))
- goto skip_copy_pmd_range;
- if (unlikely(pgd_bad(*src_pgd))) {
- pgd_ERROR(*src_pgd);
- pgd_clear(src_pgd);
-skip_copy_pmd_range: address = (address + PGDIR_SIZE) & PGDIR_MASK;
- if (!address || (address >= end))
- goto out;
+ /*
+ * If it's a COW mapping, write protect it both
+ * in the parent and the child
+ */
+ if ((vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE) {
+ ptep_set_wrprotect(src_pte);
+ pte = *src_pte;
+ }
+
+ /*
+ * If it's a shared mapping, mark it clean in
+ * the child
+ */
+ if (vm_flags & VM_SHARED)
+ pte = pte_mkclean(pte);
+ pte = pte_mkold(pte);
+ get_page(page);
+ dst_mm->rss++;
+ if (PageAnon(page))
+ dst_mm->anon_rss++;
+ set_pte(dst_pte, pte);
+ page_dup_rmap(page);
+}
+
+static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+ pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
+{
+ pte_t *src_pte, *dst_pte;
+ pte_t *s, *d;
+ unsigned long vm_flags = vma->vm_flags;
+
+ d = dst_pte = pte_alloc_map(dst_mm, dst_pmd, addr);
+ if (!dst_pte)
+ return -ENOMEM;
+
+ spin_lock(&src_mm->page_table_lock);
+ s = src_pte = pte_offset_map_nested(src_pmd, addr);
+ for (; addr < end; addr += PAGE_SIZE, s++, d++) {
+ if (pte_none(*s))
continue;
- }
+ copy_one_pte(dst_mm, src_mm, d, s, vm_flags, addr);
+ }
+ pte_unmap_nested(src_pte);
+ pte_unmap(dst_pte);
+ spin_unlock(&src_mm->page_table_lock);
+ cond_resched_lock(&dst_mm->page_table_lock);
+ return 0;
+}
- src_pmd = pmd_offset(src_pgd, address);
- dst_pmd = pmd_alloc(dst, dst_pgd, address);
- if (!dst_pmd)
- goto nomem;
+static int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+ pud_t *dst_pud, pud_t *src_pud, struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
+{
+ pmd_t *src_pmd, *dst_pmd;
+ int err = 0;
+ unsigned long next;
+
+ src_pmd = pmd_offset(src_pud, addr);
+ dst_pmd = pmd_alloc(dst_mm, dst_pud, addr);
+ if (!dst_pmd)
+ return -ENOMEM;
+
+ for (; addr < end; addr = next, src_pmd++, dst_pmd++) {
+ next = (addr + PMD_SIZE) & PMD_MASK;
+ if (next > end)
+ next = end;
+ if (pmd_none(*src_pmd))
+ continue;
+ if (pmd_bad(*src_pmd)) {
+ pmd_ERROR(*src_pmd);
+ pmd_clear(src_pmd);
+ continue;
+ }
+ err = copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
+ vma, addr, next);
+ if (err)
+ break;
+ }
+ return err;
+}
- do {
- pte_t * src_pte, * dst_pte;
-
- /* copy_pte_range */
-
- if (pmd_none(*src_pmd))
- goto skip_copy_pte_range;
- if (unlikely(pmd_bad(*src_pmd))) {
- pmd_ERROR(*src_pmd);
- pmd_clear(src_pmd);
-skip_copy_pte_range:
- address = (address + PMD_SIZE) & PMD_MASK;
- if (address >= end)
- goto out;
- goto cont_copy_pmd_range;
- }
+static int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+ pgd_t *dst_pgd, pgd_t *src_pgd, struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
+{
+ pud_t *src_pud, *dst_pud;
+ int err = 0;
+ unsigned long next;
+
+ src_pud = pud_offset(src_pgd, addr);
+ dst_pud = pud_alloc(dst_mm, dst_pgd, addr);
+ if (!dst_pud)
+ return -ENOMEM;
+
+ for (; addr < end; addr = next, src_pud++, dst_pud++) {
+ next = (addr + PUD_SIZE) & PUD_MASK;
+ if (next > end)
+ next = end;
+ if (pud_none(*src_pud))
+ continue;
+ if (pud_bad(*src_pud)) {
+ pud_ERROR(*src_pud);
+ pud_clear(src_pud);
+ continue;
+ }
+ err = copy_pmd_range(dst_mm, src_mm, dst_pud, src_pud,
+ vma, addr, next);
+ if (err)
+ break;
+ }
+ return err;
+}
- dst_pte = pte_alloc_map(dst, dst_pmd, address);
- if (!dst_pte)
- goto nomem;
- spin_lock(&src->page_table_lock);
- src_pte = pte_offset_map_nested(src_pmd, address);
- do {
- pte_t pte = *src_pte;
- struct page *page;
- unsigned long pfn;
-
- /* copy_one_pte */
-
- if (pte_none(pte))
- goto cont_copy_pte_range_noset;
- /* pte contains position in swap, so copy. */
- if (!pte_present(pte)) {
- if (!pte_file(pte)) {
- swap_duplicate(pte_to_swp_entry(pte));
- if (list_empty(&dst->mmlist)) {
- spin_lock(&mmlist_lock);
- list_add(&dst->mmlist,
- &src->mmlist);
- spin_unlock(&mmlist_lock);
- }
- }
- set_pte(dst_pte, pte);
- goto cont_copy_pte_range_noset;
- }
- pfn = pte_pfn(pte);
- /* the pte points outside of valid memory, the
- * mapping is assumed to be good, meaningful
- * and not mapped via rmap - duplicate the
- * mapping as is.
- */
- page = NULL;
- if (pfn_valid(pfn))
- page = pfn_to_page(pfn);
-
- if (!page || PageReserved(page)) {
- set_pte(dst_pte, pte);
- goto cont_copy_pte_range_noset;
- }
+int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
+ struct vm_area_struct *vma)
+{
+ pgd_t *src_pgd, *dst_pgd;
+ unsigned long addr, start, end, next;
+ int err = 0;
+ int i;
- /*
- * If it's a COW mapping, write protect it both
- * in the parent and the child
- */
- if (cow) {
- ptep_set_wrprotect(src_pte);
- pte = *src_pte;
- }
+ if (is_vm_hugetlb_page(vma))
+ return copy_hugetlb_page_range(dst, src, vma);
- /*
- * If it's a shared mapping, mark it clean in
- * the child
- */
- if (vma->vm_flags & VM_SHARED)
- pte = pte_mkclean(pte);
- pte = pte_mkold(pte);
- get_page(page);
- dst->rss++;
- if (PageAnon(page))
- dst->anon_rss++;
- set_pte(dst_pte, pte);
- page_dup_rmap(page);
-cont_copy_pte_range_noset:
- address += PAGE_SIZE;
- if (address >= end) {
- pte_unmap_nested(src_pte);
- pte_unmap(dst_pte);
- goto out_unlock;
- }
- src_pte++;
- dst_pte++;
- } while ((unsigned long)src_pte & PTE_TABLE_MASK);
- pte_unmap_nested(src_pte-1);
- pte_unmap(dst_pte-1);
- spin_unlock(&src->page_table_lock);
- cond_resched_lock(&dst->page_table_lock);
-cont_copy_pmd_range:
- src_pmd++;
- dst_pmd++;
- } while ((unsigned long)src_pmd & PMD_TABLE_MASK);
+ start = vma->vm_start;
+ src_pgd = pgd_offset(src, start);
+ dst_pgd = pgd_offset(dst, start);
+
+ end = vma->vm_end;
+ addr = start;
+ for (i = pgd_index(addr);
+ i <= pgd_index(end-1);
+ i++, addr = next, src_pgd++, dst_pgd++) {
+ next = (addr + PGDIR_SIZE) & PGDIR_MASK;
+ if (next > end || next <= addr)
+ next = end;
+ if (pgd_none(*src_pgd))
+ continue;
+ if (pgd_bad(*src_pgd)) {
+ pgd_ERROR(*src_pgd);
+ pgd_clear(src_pgd);
+ continue;
+ }
+ err = copy_pud_range(dst, src, dst_pgd, src_pgd,
+ vma, addr, next);
+ if (err)
+ break;
}
-out_unlock:
- spin_unlock(&src->page_table_lock);
-out:
- return 0;
-nomem:
- return -ENOMEM;
+
+ return err;
}
static void zap_pte_range(struct mmu_gather *tlb,
@@ -449,23 +519,23 @@ static void zap_pte_range(struct mmu_gat
}
static void zap_pmd_range(struct mmu_gather *tlb,
- pgd_t * dir, unsigned long address,
+ pud_t *pud, unsigned long address,
unsigned long size, struct zap_details *details)
{
pmd_t * pmd;
unsigned long end;
- if (pgd_none(*dir))
+ if (pud_none(*pud))
return;
- if (unlikely(pgd_bad(*dir))) {
- pgd_ERROR(*dir);
- pgd_clear(dir);
+ if (unlikely(pud_bad(*pud))) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
return;
}
- pmd = pmd_offset(dir, address);
+ pmd = pmd_offset(pud, address);
end = address + size;
- if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
- end = ((address + PGDIR_SIZE) & PGDIR_MASK);
+ if (end > ((address + PUD_SIZE) & PUD_MASK))
+ end = ((address + PUD_SIZE) & PUD_MASK);
do {
zap_pte_range(tlb, pmd, address, end - address, details);
address = (address + PMD_SIZE) & PMD_MASK;
@@ -473,20 +543,46 @@ static void zap_pmd_range(struct mmu_gat
} while (address && (address < end));
}
+static void zap_pud_range(struct mmu_gather *tlb,
+ pgd_t * pgd, unsigned long address,
+ unsigned long end, struct zap_details *details)
+{
+ pud_t * pud;
+
+ if (pgd_none(*pgd))
+ return;
+ if (unlikely(pgd_bad(*pgd))) {
+ pgd_ERROR(*pgd);
+ pgd_clear(pgd);
+ return;
+ }
+ pud = pud_offset(pgd, address);
+ do {
+ zap_pmd_range(tlb, pud, address, end - address, details);
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
+ } while (address && (address < end));
+}
+
static void unmap_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long address,
unsigned long end, struct zap_details *details)
{
- pgd_t * dir;
+ unsigned long next;
+ pgd_t *pgd;
+ int i;
BUG_ON(address >= end);
- dir = pgd_offset(vma->vm_mm, address);
+ pgd = pgd_offset(vma->vm_mm, address);
tlb_start_vma(tlb, vma);
- do {
- zap_pmd_range(tlb, dir, address, end - address, details);
- address = (address + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (address && (address < end));
+ for (i = pgd_index(address); i <= pgd_index(end-1); i++) {
+ next = (address + PGDIR_SIZE) & PGDIR_MASK;
+ if (next <= address || next > end)
+ next = end;
+ zap_pud_range(tlb, pgd, address, next, details);
+ address = next;
+ pgd++;
+ }
tlb_end_vma(tlb, vma);
}
@@ -628,6 +724,7 @@ struct page *
follow_page(struct mm_struct *mm, unsigned long address, int write)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *ptep, pte;
unsigned long pfn;
@@ -641,13 +738,15 @@ follow_page(struct mm_struct *mm, unsign
if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
goto out;
- pmd = pmd_offset(pgd, address);
- if (pmd_none(*pmd))
+ pud = pud_offset(pgd, address);
+ if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+ goto out;
+
+ pmd = pmd_offset(pud, address);
+ if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
goto out;
if (pmd_huge(*pmd))
return follow_huge_pmd(mm, address, pmd, write);
- if (unlikely(pmd_bad(*pmd)))
- goto out;
ptep = pte_offset_map(pmd, address);
if (!ptep)
@@ -691,6 +790,7 @@ untouched_anonymous_page(struct mm_struc
unsigned long address)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
/* Check if the vma is for an anonymous mapping. */
@@ -702,8 +802,12 @@ untouched_anonymous_page(struct mm_struc
if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
return 1;
+ pud = pud_offset(pgd, address);
+ if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+ return 1;
+
/* Check if page middle directory entry exists. */
- pmd = pmd_offset(pgd, address);
+ pmd = pmd_offset(pud, address);
if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
return 1;
@@ -735,19 +839,21 @@ int get_user_pages(struct task_struct *t
unsigned long pg = start & PAGE_MASK;
struct vm_area_struct *gate_vma = get_gate_vma(tsk);
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;
if (write) /* user gate pages are read-only */
return i ? : -EFAULT;
pgd = pgd_offset_gate(mm, pg);
- if (!pgd)
+ if (pgd_none(*pgd))
return i ? : -EFAULT;
- pmd = pmd_offset(pgd, pg);
- if (!pmd)
+ pud = pud_offset(pgd, pg);
+ if (pud_none(*pud))
return i ? : -EFAULT;
- pte = pte_offset_map(pmd, pg);
- if (!pte)
+ pmd = pmd_offset(pud, pg);
+ if (pmd_none(*pmd))
return i ? : -EFAULT;
+ pte = pte_offset_map(pmd, pg);
if (!pte_present(*pte)) {
pte_unmap(pte);
return i ? : -EFAULT;
@@ -861,16 +967,16 @@ static void zeromap_pte_range(pte_t * pt
} while (address && (address < end));
}
-static inline int zeromap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address,
- unsigned long size, pgprot_t prot)
+static inline int zeromap_pmd_range(struct mm_struct *mm, pmd_t * pmd,
+ unsigned long address, unsigned long size, pgprot_t prot)
{
unsigned long base, end;
- base = address & PGDIR_MASK;
- address &= ~PGDIR_MASK;
+ base = address & PUD_MASK;
+ address &= ~PUD_MASK;
end = address + size;
- if (end > PGDIR_SIZE)
- end = PGDIR_SIZE;
+ if (end > PUD_SIZE)
+ end = PUD_SIZE;
do {
pte_t * pte = pte_alloc_map(mm, pmd, base + address);
if (!pte)
@@ -883,31 +989,64 @@ static inline int zeromap_pmd_range(stru
return 0;
}
-int zeromap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, pgprot_t prot)
+static inline int zeromap_pud_range(struct mm_struct *mm, pud_t * pud,
+ unsigned long address,
+ unsigned long size, pgprot_t prot)
{
+ unsigned long base, end;
int error = 0;
- pgd_t * dir;
+
+ base = address & PGDIR_MASK;
+ address &= ~PGDIR_MASK;
+ end = address + size;
+ if (end > PGDIR_SIZE)
+ end = PGDIR_SIZE;
+ do {
+ pmd_t * pmd = pmd_alloc(mm, pud, base + address);
+ error = -ENOMEM;
+ if (!pmd)
+ break;
+ error = zeromap_pmd_range(mm, pmd, address, end - address, prot);
+ if (error)
+ break;
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
+ } while (address && (address < end));
+ return 0;
+}
+
+int zeromap_page_range(struct vm_area_struct *vma, unsigned long address,
+ unsigned long size, pgprot_t prot)
+{
+ int i;
+ int error = 0;
+ pgd_t * pgd;
unsigned long beg = address;
unsigned long end = address + size;
+ unsigned long next;
struct mm_struct *mm = vma->vm_mm;
- dir = pgd_offset(mm, address);
+ pgd = pgd_offset(mm, address);
flush_cache_range(vma, beg, end);
- if (address >= end)
- BUG();
+ BUG_ON(address >= end);
+ BUG_ON(end > vma->vm_end);
spin_lock(&mm->page_table_lock);
- do {
- pmd_t *pmd = pmd_alloc(mm, dir, address);
+ for (i = pgd_index(address); i <= pgd_index(end-1); i++) {
+ pud_t *pud = pud_alloc(mm, pgd, address);
error = -ENOMEM;
- if (!pmd)
+ if (!pud)
break;
- error = zeromap_pmd_range(mm, pmd, address, end - address, prot);
+ next = (address + PGDIR_SIZE) & PGDIR_MASK;
+ if (next <= beg || next > end)
+ next = end;
+ error = zeromap_pud_range(mm, pud, address,
+ next - address, prot);
if (error)
break;
- address = (address + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (address && (address < end));
+ address = next;
+ pgd++;
+ }
/*
* Why flush? zeromap_pte_range has a BUG_ON for !pte_none()
*/
@@ -921,8 +1060,9 @@ int zeromap_page_range(struct vm_area_st
* mappings are removed. any references to nonexistent pages results
* in null mappings (currently treated as "copy-on-access")
*/
-static inline void remap_pte_range(pte_t * pte, unsigned long address, unsigned long size,
- unsigned long pfn, pgprot_t prot)
+static inline void
+remap_pte_range(pte_t * pte, unsigned long address, unsigned long size,
+ unsigned long pfn, pgprot_t prot)
{
unsigned long end;
@@ -940,22 +1080,24 @@ static inline void remap_pte_range(pte_t
} while (address && (address < end));
}
-static inline int remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address, unsigned long size,
- unsigned long pfn, pgprot_t prot)
+static inline int
+remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address,
+ unsigned long size, unsigned long pfn, pgprot_t prot)
{
unsigned long base, end;
- base = address & PGDIR_MASK;
- address &= ~PGDIR_MASK;
+ base = address & PUD_MASK;
+ address &= ~PUD_MASK;
end = address + size;
- if (end > PGDIR_SIZE)
- end = PGDIR_SIZE;
- pfn -= address >> PAGE_SHIFT;
+ if (end > PUD_SIZE)
+ end = PUD_SIZE;
+ pfn -= (address >> PAGE_SHIFT);
do {
pte_t * pte = pte_alloc_map(mm, pmd, base + address);
if (!pte)
return -ENOMEM;
- remap_pte_range(pte, base + address, end - address, pfn + (address >> PAGE_SHIFT), prot);
+ remap_pte_range(pte, base + address, end - address,
+ (address >> PAGE_SHIFT) + pfn, prot);
pte_unmap(pte);
address = (address + PMD_SIZE) & PMD_MASK;
pmd++;
@@ -963,20 +1105,46 @@ static inline int remap_pmd_range(struct
return 0;
}
+static inline int remap_pud_range(struct mm_struct *mm, pud_t * pud,
+ unsigned long address, unsigned long size,
+ unsigned long pfn, pgprot_t prot)
+{
+ unsigned long base, end;
+
+ base = address & PGDIR_MASK;
+ address &= ~PGDIR_MASK;
+ end = address + size;
+ if (end > PGDIR_SIZE)
+ end = PGDIR_SIZE;
+ pfn -= address >> PAGE_SHIFT;
+ do {
+ pmd_t *pmd = pmd_alloc(mm, pud, base+address);
+ if (!pmd)
+ return -ENOMEM;
+ remap_pmd_range(mm, pmd, base + address, end - address,
+ (address >> PAGE_SHIFT) + pfn, prot);
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
+ } while (address && (address < end));
+ return 0;
+}
+
/* Note: this is only safe if the mm semaphore is held when called. */
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long from, unsigned long pfn, unsigned long size, pgprot_t prot)
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long from,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
{
int error = 0;
- pgd_t * dir;
+ pgd_t *pgd;
unsigned long beg = from;
unsigned long end = from + size;
+ unsigned long next;
struct mm_struct *mm = vma->vm_mm;
+ int i;
pfn -= from >> PAGE_SHIFT;
- dir = pgd_offset(mm, from);
+ pgd = pgd_offset(mm, from);
flush_cache_range(vma, beg, end);
- if (from >= end)
- BUG();
+ BUG_ON(from >= end);
/*
* Physically remapped pages are special. Tell the
@@ -987,25 +1155,32 @@ int remap_pfn_range(struct vm_area_struc
* this region.
*/
vma->vm_flags |= VM_IO | VM_RESERVED;
+
spin_lock(&mm->page_table_lock);
- do {
- pmd_t *pmd = pmd_alloc(mm, dir, from);
+ for (i = pgd_index(beg); i <= pgd_index(end-1); i++) {
+ pud_t *pud = pud_alloc(mm, pgd, from);
error = -ENOMEM;
- if (!pmd)
+ if (!pgd)
break;
- error = remap_pmd_range(mm, pmd, from, end - from, pfn + (from >> PAGE_SHIFT), prot);
+ next = (from + PGDIR_SIZE) & PGDIR_MASK;
+ if (next > end || next <= from)
+ next = end;
+ error = remap_pud_range(mm, pud, from, end - from,
+ pfn + (from >> PAGE_SHIFT), prot);
if (error)
break;
- from = (from + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (from && (from < end));
+ from = next;
+ pgd++;
+ }
/*
* Why flush? remap_pte_range has a BUG_ON for !pte_none()
*/
flush_tlb_range(vma, beg, end);
spin_unlock(&mm->page_table_lock);
+
return error;
}
+
EXPORT_SYMBOL(remap_pfn_range);
/*
@@ -1697,13 +1872,14 @@ static inline int handle_pte_fault(struc
* By the time we get here, we already hold the mm semaphore
*/
int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
- unsigned long address, int write_access)
+ unsigned long address, int write_access)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
+ pte_t *pte;
__set_current_state(TASK_RUNNING);
- pgd = pgd_offset(mm, address);
inc_page_state(pgfault);
@@ -1714,19 +1890,62 @@ int handle_mm_fault(struct mm_struct *mm
* We need the page table lock to synchronize with kswapd
* and the SMP-safe atomic PTE updates.
*/
+ pgd = pgd_offset(mm, address);
spin_lock(&mm->page_table_lock);
- pmd = pmd_alloc(mm, pgd, address);
- if (pmd) {
- pte_t * pte = pte_alloc_map(mm, pmd, address);
- if (pte)
- return handle_pte_fault(mm, vma, address, write_access, pte, pmd);
- }
+ pud = pud_alloc(mm, pgd, address);
+ if (!pud)
+ goto oom;
+
+ pmd = pmd_alloc(mm, pud, address);
+ if (!pmd)
+ goto oom;
+
+ pte = pte_alloc_map(mm, pmd, address);
+ if (!pte)
+ goto oom;
+
+ return handle_pte_fault(mm, vma, address, write_access, pte, pmd);
+
+ oom:
spin_unlock(&mm->page_table_lock);
return VM_FAULT_OOM;
}
/*
+ * Allocate page upper directory.
+ *
+ * We've already handled the fast-path in-line, and we own the
+ * page table lock.
+ *
+ * On a two-level or three-level page table, this ends up actually being
+ * entirely optimized away.
+ */
+pud_t fastcall *__pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+{
+ pud_t *new;
+
+ spin_unlock(&mm->page_table_lock);
+ new = pud_alloc_one(mm, address);
+ spin_lock(&mm->page_table_lock);
+ if (!new)
+ return NULL;
+
+ /*
+ * Because we dropped the lock, we should re-check the
+ * entry, as somebody else could have populated it..
+ */
+ if (pgd_present(*pgd)) {
+ pud_free(new);
+ goto out;
+ }
+ pgd_populate(mm, pgd, new);
+out:
+ return pud_offset(pgd, address);
+}
+
+
+/*
* Allocate page middle directory.
*
* We've already handled the fast-path in-line, and we own the
@@ -1735,7 +1954,7 @@ int handle_mm_fault(struct mm_struct *mm
* On a two-level page table, this ends up actually being entirely
* optimized away.
*/
-pmd_t fastcall *__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+pmd_t fastcall *__pmd_alloc(struct mm_struct *mm, pud_t *pgd, unsigned long address)
{
pmd_t *new;
@@ -1749,11 +1968,11 @@ pmd_t fastcall *__pmd_alloc(struct mm_st
* Because we dropped the lock, we should re-check the
* entry, as somebody else could have populated it..
*/
- if (pgd_present(*pgd)) {
+ if (pud_present(*pud)) {
pmd_free(new);
goto out;
}
- pgd_populate(mm, pgd, new);
+ pud_populate(mm, pud, new);
out:
return pmd_offset(pgd, address);
}
@@ -1787,17 +2006,21 @@ struct page * vmalloc_to_page(void * vma
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
pgd_t *pgd = pgd_offset_k(addr);
+ pud_t *pud;
pmd_t *pmd;
pte_t *ptep, pte;
if (!pgd_none(*pgd)) {
- pmd = pmd_offset(pgd, addr);
- if (!pmd_none(*pmd)) {
- ptep = pte_offset_map(pmd, addr);
- pte = *ptep;
- if (pte_present(pte))
- page = pte_page(pte);
- pte_unmap(ptep);
+ pud = pud_offset_k(pgd, addr);
+ if (!pud_none(*pud)) {
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_none(*pmd)) {
+ ptep = pte_offset_map(pmd, addr);
+ pte = *ptep;
+ if (pte_present(pte))
+ page = pte_page(pte);
+ pte_unmap(ptep);
+ }
}
}
return page;
diff -puN mm/mempolicy.c~4level-core-patch mm/mempolicy.c
--- linux-2.6/mm/mempolicy.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/mempolicy.c 2004-11-14 12:33:54.000000000 +1100
@@ -233,18 +233,29 @@ static struct mempolicy *mpol_new(int mo
/* Ensure all existing pages follow the policy. */
static int
-verify_pages(unsigned long addr, unsigned long end, unsigned long *nodes)
+verify_pages(struct mm_struct *mm,
+ unsigned long addr, unsigned long end, unsigned long *nodes)
{
while (addr < end) {
struct page *p;
pte_t *pte;
pmd_t *pmd;
- pgd_t *pgd = pgd_offset_k(addr);
+ pud_t *pud;
+ pgd_t *pgd;
+ pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd)) {
- addr = (addr + PGDIR_SIZE) & PGDIR_MASK;
+ unsigned long next = (addr + PGDIR_SIZE) & PGDIR_MASK;
+ if (next > addr)
+ break;
+ addr = next;
+ continue;
+ }
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud)) {
+ addr = (addr + PUD_SIZE) & PUD_MASK;
continue;
}
- pmd = pmd_offset(pgd, addr);
+ pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd)) {
addr = (addr + PMD_SIZE) & PMD_MASK;
continue;
@@ -282,7 +293,8 @@ check_range(struct mm_struct *mm, unsign
if (prev && prev->vm_end < vma->vm_start)
return ERR_PTR(-EFAULT);
if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) {
- err = verify_pages(vma->vm_start, vma->vm_end, nodes);
+ err = verify_pages(vma->vm_mm,
+ vma->vm_start, vma->vm_end, nodes);
if (err) {
first = ERR_PTR(err);
break;
diff -puN mm/mmap.c~4level-core-patch mm/mmap.c
--- linux-2.6/mm/mmap.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/mmap.c 2004-11-14 13:01:07.000000000 +1100
@@ -1474,7 +1474,7 @@ static void free_pgtables(struct mmu_gat
{
unsigned long first = start & PGDIR_MASK;
unsigned long last = end + PGDIR_SIZE - 1;
- unsigned long start_index, end_index;
+ unsigned long start_index;
struct mm_struct *mm = tlb->mm;
if (!prev) {
@@ -1498,23 +1498,23 @@ static void free_pgtables(struct mmu_gat
if (last > next->vm_start)
last = next->vm_start;
}
- if (prev->vm_end > first)
+ if (prev->vm_end > first) {
first = prev->vm_end + PGDIR_SIZE - 1;
+ if (first < prev->vm_end)
+ first = TASK_SIZE;
+ }
break;
}
no_mmaps:
- if (last < first) /* for arches with discontiguous pgd indices */
+ if (last < first) /* for arches with discontiguous indices */
return;
- /*
- * If the PGD bits are not consecutive in the virtual address, the
- * old method of shifting the VA >> by PGDIR_SHIFT doesn't work.
- */
start_index = pgd_index(first);
- if (start_index < FIRST_USER_PGD_NR)
+ if (start_index < FIRST_USER_PGD_NR) {
start_index = FIRST_USER_PGD_NR;
- end_index = pgd_index(last);
- if (end_index > start_index) {
- clear_page_tables(tlb, start_index, end_index - start_index);
+ first = start_index * PGDIR_SIZE;
+ }
+ if (last > first) {
+ clear_page_range(tlb, first, last);
flush_tlb_pgtables(mm, first & PGDIR_MASK, last & PGDIR_MASK);
}
}
@@ -1844,7 +1844,8 @@ void exit_mmap(struct mm_struct *mm)
~0UL, &nr_accounted, NULL);
vm_unacct_memory(nr_accounted);
BUG_ON(mm->map_count); /* This is just debugging */
- clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
+ clear_page_range(tlb, FIRST_USER_PGD_NR * PGDIR_SIZE,
+ (TASK_SIZE + PGDIR_SIZE - 1) & ~(PGDIR_SIZE - 1));
tlb_finish_mmu(tlb, 0, MM_VM_SIZE(mm));
vma = mm->mmap;
diff -puN mm/mprotect.c~4level-core-patch mm/mprotect.c
--- linux-2.6/mm/mprotect.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/mprotect.c 2004-11-14 12:33:54.000000000 +1100
@@ -62,12 +62,38 @@ change_pte_range(pmd_t *pmd, unsigned lo
}
static inline void
-change_pmd_range(pgd_t *pgd, unsigned long address,
+change_pmd_range(pud_t *pud, unsigned long address,
unsigned long size, pgprot_t newprot)
{
pmd_t * pmd;
unsigned long end;
+ if (pud_none(*pud))
+ return;
+ if (pud_bad(*pud)) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
+ return;
+ }
+ pmd = pmd_offset(pud, address);
+ address &= ~PUD_MASK;
+ end = address + size;
+ if (end > PUD_SIZE)
+ end = PUD_SIZE;
+ do {
+ change_pte_range(pmd, address, end - address, newprot);
+ address = (address + PMD_SIZE) & PMD_MASK;
+ pmd++;
+ } while (address && (address < end));
+}
+
+static inline void
+change_pud_range(pgd_t *pgd, unsigned long address,
+ unsigned long size, pgprot_t newprot)
+{
+ pud_t * pud;
+ unsigned long end;
+
if (pgd_none(*pgd))
return;
if (pgd_bad(*pgd)) {
@@ -75,15 +101,15 @@ change_pmd_range(pgd_t *pgd, unsigned lo
pgd_clear(pgd);
return;
}
- pmd = pmd_offset(pgd, address);
+ pud = pud_offset(pgd, address);
address &= ~PGDIR_MASK;
end = address + size;
if (end > PGDIR_SIZE)
end = PGDIR_SIZE;
do {
- change_pte_range(pmd, address, end - address, newprot);
- address = (address + PMD_SIZE) & PMD_MASK;
- pmd++;
+ change_pmd_range(pud, address, end - address, newprot);
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
} while (address && (address < end));
}
@@ -91,22 +117,25 @@ static void
change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot)
{
- pgd_t *dir;
- unsigned long beg = start;
+ struct mm_struct *mm = current->mm;
+ pgd_t *pgd;
+ unsigned long beg = start, next;
+ int i;
- dir = pgd_offset(current->mm, start);
+ pgd = pgd_offset(mm, start);
flush_cache_range(vma, beg, end);
- if (start >= end)
- BUG();
- spin_lock(¤t->mm->page_table_lock);
- do {
- change_pmd_range(dir, start, end - start, newprot);
- start = (start + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (start && (start < end));
+ BUG_ON(start >= end);
+ spin_lock(&mm->page_table_lock);
+ for (i = pgd_index(start); i <= pgd_index(end-1); i++) {
+ next = (start + PGDIR_SIZE) & PGDIR_MASK;
+ if (next <= start || next > end)
+ next = end;
+ change_pud_range(pgd, start, next - start, newprot);
+ start = next;
+ pgd++;
+ }
flush_tlb_range(vma, beg, end);
- spin_unlock(¤t->mm->page_table_lock);
- return;
+ spin_unlock(&mm->page_table_lock);
}
static int
diff -puN mm/mremap.c~4level-core-patch mm/mremap.c
--- linux-2.6/mm/mremap.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/mremap.c 2004-11-14 12:33:54.000000000 +1100
@@ -25,19 +25,24 @@
static pte_t *get_one_pte_map_nested(struct mm_struct *mm, unsigned long addr)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd))
goto end;
- if (pgd_bad(*pgd)) {
- pgd_ERROR(*pgd);
- pgd_clear(pgd);
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud))
+ goto end;
+ if (pud_bad(*pud)) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
goto end;
}
- pmd = pmd_offset(pgd, addr);
+ pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
goto end;
if (pmd_bad(*pmd)) {
@@ -58,12 +63,17 @@ end:
static pte_t *get_one_pte_map(struct mm_struct *mm, unsigned long addr)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd))
return NULL;
- pmd = pmd_offset(pgd, addr);
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud))
+ return NULL;
+ pmd = pmd_offset(pud, addr);
if (!pmd_present(*pmd))
return NULL;
return pte_offset_map(pmd, addr);
@@ -71,10 +81,17 @@ static pte_t *get_one_pte_map(struct mm_
static inline pte_t *alloc_one_pte_map(struct mm_struct *mm, unsigned long addr)
{
+ pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte = NULL;
- pmd = pmd_alloc(mm, pgd_offset(mm, addr), addr);
+ pgd = pgd_offset(mm, addr);
+
+ pud = pud_alloc(mm, pgd, addr);
+ if (!pud)
+ return NULL;
+ pmd = pmd_alloc(mm, pud, addr);
if (pmd)
pte = pte_alloc_map(mm, pmd, addr);
return pte;
diff -puN mm/msync.c~4level-core-patch mm/msync.c
--- linux-2.6/mm/msync.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/msync.c 2004-11-14 16:31:51.000000000 +1100
@@ -67,13 +67,39 @@ static int filemap_sync_pte_range(pmd_t
return error;
}
-static inline int filemap_sync_pmd_range(pgd_t * pgd,
+static inline int filemap_sync_pmd_range(pud_t * pud,
unsigned long address, unsigned long end,
struct vm_area_struct *vma, unsigned int flags)
{
pmd_t * pmd;
int error;
+ if (pud_none(*pud))
+ return 0;
+ if (pud_bad(*pud)) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
+ return 0;
+ }
+ pmd = pmd_offset(pud, address);
+ if ((address & PUD_MASK) != (end & PUD_MASK))
+ end = (address & PUD_MASK) + PUD_SIZE;
+ error = 0;
+ do {
+ error |= filemap_sync_pte_range(pmd, address, end, vma, flags);
+ address = (address + PMD_SIZE) & PMD_MASK;
+ pmd++;
+ } while (address && (address < end));
+ return error;
+}
+
+static inline int filemap_sync_pud_range(pgd_t *pgd,
+ unsigned long address, unsigned long end,
+ struct vm_area_struct *vma, unsigned int flags)
+{
+ pud_t *pud;
+ int error;
+
if (pgd_none(*pgd))
return 0;
if (pgd_bad(*pgd)) {
@@ -81,14 +107,14 @@ static inline int filemap_sync_pmd_range
pgd_clear(pgd);
return 0;
}
- pmd = pmd_offset(pgd, address);
+ pud = pud_offset(pgd, address);
if ((address & PGDIR_MASK) != (end & PGDIR_MASK))
end = (address & PGDIR_MASK) + PGDIR_SIZE;
error = 0;
do {
- error |= filemap_sync_pte_range(pmd, address, end, vma, flags);
- address = (address + PMD_SIZE) & PMD_MASK;
- pmd++;
+ error |= filemap_sync_pmd_range(pud, address, end, vma, flags);
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
} while (address && (address < end));
return error;
}
@@ -96,8 +122,10 @@ static inline int filemap_sync_pmd_range
static int filemap_sync(struct vm_area_struct * vma, unsigned long address,
size_t size, unsigned int flags)
{
- pgd_t * dir;
+ pgd_t *pgd;
unsigned long end = address + size;
+ unsigned long next;
+ int i;
int error = 0;
/* Aquire the lock early; it may be possible to avoid dropping
@@ -105,7 +133,7 @@ static int filemap_sync(struct vm_area_s
*/
spin_lock(&vma->vm_mm->page_table_lock);
- dir = pgd_offset(vma->vm_mm, address);
+ pgd = pgd_offset(vma->vm_mm, address);
flush_cache_range(vma, address, end);
/* For hugepages we can't go walking the page table normally,
@@ -116,11 +144,14 @@ static int filemap_sync(struct vm_area_s
if (address >= end)
BUG();
- do {
- error |= filemap_sync_pmd_range(dir, address, end, vma, flags);
- address = (address + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (address && (address < end));
+ for (i = pgd_index(address); i <= pgd_index(end-1); i++) {
+ next = (address + PGDIR_SIZE) & PGDIR_MASK;
+ if (next <= address || next > end)
+ next = end;
+ error |= filemap_sync_pud_range(pgd, address, next, vma, flags);
+ address = next;
+ pgd++;
+ }
/*
* Why flush ? filemap_sync_pte already flushed the tlbs with the
* dirty bits.
diff -puN mm/rmap.c~4level-core-patch mm/rmap.c
--- linux-2.6/mm/rmap.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/rmap.c 2004-11-14 12:33:54.000000000 +1100
@@ -259,6 +259,7 @@ static int page_referenced_one(struct pa
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;
int referenced = 0;
@@ -275,7 +276,11 @@ static int page_referenced_one(struct pa
if (!pgd_present(*pgd))
goto out_unlock;
- pmd = pmd_offset(pgd, address);
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ goto out_unlock;
+
+ pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
goto out_unlock;
@@ -499,6 +504,7 @@ static int try_to_unmap_one(struct page
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;
pte_t pteval;
@@ -520,7 +526,11 @@ static int try_to_unmap_one(struct page
if (!pgd_present(*pgd))
goto out_unlock;
- pmd = pmd_offset(pgd, address);
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ goto out_unlock;
+
+ pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
goto out_unlock;
@@ -628,6 +638,7 @@ static void try_to_unmap_cluster(unsigne
{
struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;
pte_t pteval;
@@ -653,7 +664,11 @@ static void try_to_unmap_cluster(unsigne
if (!pgd_present(*pgd))
goto out_unlock;
- pmd = pmd_offset(pgd, address);
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ goto out_unlock;
+
+ pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
goto out_unlock;
diff -puN mm/swapfile.c~4level-core-patch mm/swapfile.c
--- linux-2.6/mm/swapfile.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/swapfile.c 2004-11-14 16:31:51.000000000 +1100
@@ -486,27 +486,27 @@ static unsigned long unuse_pmd(struct vm
}
/* vma->vm_mm->page_table_lock is held */
-static unsigned long unuse_pgd(struct vm_area_struct * vma, pgd_t *dir,
- unsigned long address, unsigned long size,
+static unsigned long unuse_pud(struct vm_area_struct * vma, pud_t *pud,
+ unsigned long address, unsigned long size, unsigned long offset,
swp_entry_t entry, struct page *page)
{
pmd_t * pmd;
- unsigned long offset, end;
+ unsigned long end;
unsigned long foundaddr;
- if (pgd_none(*dir))
+ if (pud_none(*pud))
return 0;
- if (pgd_bad(*dir)) {
- pgd_ERROR(*dir);
- pgd_clear(dir);
+ if (pud_bad(*pud)) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
return 0;
}
- pmd = pmd_offset(dir, address);
- offset = address & PGDIR_MASK;
- address &= ~PGDIR_MASK;
+ pmd = pmd_offset(pud, address);
+ offset += address & PUD_MASK;
+ address &= ~PUD_MASK;
end = address + size;
- if (end > PGDIR_SIZE)
- end = PGDIR_SIZE;
+ if (end > PUD_SIZE)
+ end = PUD_SIZE;
if (address >= end)
BUG();
do {
@@ -521,12 +521,48 @@ static unsigned long unuse_pgd(struct vm
}
/* vma->vm_mm->page_table_lock is held */
+static unsigned long unuse_pgd(struct vm_area_struct * vma, pgd_t *pgd,
+ unsigned long address, unsigned long size,
+ swp_entry_t entry, struct page *page)
+{
+ pud_t * pud;
+ unsigned long offset;
+ unsigned long foundaddr;
+ unsigned long end;
+
+ if (pgd_none(*pgd))
+ return 0;
+ if (pgd_bad(*pgd)) {
+ pgd_ERROR(*pgd);
+ pgd_clear(pgd);
+ return 0;
+ }
+ pud = pud_offset(pgd, address);
+ offset = address & PGDIR_MASK;
+ address &= ~PGDIR_MASK;
+ end = address + size;
+ if (end > PGDIR_SIZE)
+ end = PGDIR_SIZE;
+ BUG_ON (address >= end);
+ do {
+ foundaddr = unuse_pud(vma, pud, address, end - address,
+ offset, entry, page);
+ if (foundaddr)
+ return foundaddr;
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
+ } while (address && (address < end));
+ return 0;
+}
+
+/* vma->vm_mm->page_table_lock is held */
static unsigned long unuse_vma(struct vm_area_struct * vma,
swp_entry_t entry, struct page *page)
{
- pgd_t *pgdir;
- unsigned long start, end;
+ pgd_t *pgd;
+ unsigned long start, end, next;
unsigned long foundaddr;
+ int i;
if (page->mapping) {
start = page_address_in_vma(page, vma);
@@ -538,15 +574,18 @@ static unsigned long unuse_vma(struct vm
start = vma->vm_start;
end = vma->vm_end;
}
- pgdir = pgd_offset(vma->vm_mm, start);
- do {
- foundaddr = unuse_pgd(vma, pgdir, start, end - start,
- entry, page);
+ pgd = pgd_offset(vma->vm_mm, start);
+ for (i = pgd_index(start); i <= pgd_index(end-1); i++) {
+ next = (start + PGDIR_SIZE) & PGDIR_MASK;
+ if (next > end || next <= start)
+ next = end;
+ foundaddr = unuse_pgd(vma, pgd, start, next - start, entry, page);
if (foundaddr)
return foundaddr;
- start = (start + PGDIR_SIZE) & PGDIR_MASK;
- pgdir++;
- } while (start && (start < end));
+ start = next;
+ i++;
+ pgd++;
+ }
return 0;
}
diff -puN mm/vmalloc.c~4level-core-patch mm/vmalloc.c
--- linux-2.6/mm/vmalloc.c~4level-core-patch 2004-11-14 12:33:54.000000000 +1100
+++ linux-2.6-npiggin/mm/vmalloc.c 2004-11-14 16:31:51.000000000 +1100
@@ -56,25 +56,25 @@ static void unmap_area_pte(pmd_t *pmd, u
} while (address < end);
}
-static void unmap_area_pmd(pgd_t *dir, unsigned long address,
+static void unmap_area_pmd(pud_t *pud, unsigned long address,
unsigned long size)
{
unsigned long end;
pmd_t *pmd;
- if (pgd_none(*dir))
+ if (pud_none(*pud))
return;
- if (pgd_bad(*dir)) {
- pgd_ERROR(*dir);
- pgd_clear(dir);
+ if (pud_bad(*pud)) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
return;
}
- pmd = pmd_offset(dir, address);
- address &= ~PGDIR_MASK;
+ pmd = pmd_offset(pud, address);
+ address &= ~PUD_MASK;
end = address + size;
- if (end > PGDIR_SIZE)
- end = PGDIR_SIZE;
+ if (end > PUD_SIZE)
+ end = PUD_SIZE;
do {
unmap_area_pte(pmd, address, end - address);
@@ -83,6 +83,33 @@ static void unmap_area_pmd(pgd_t *dir, u
} while (address < end);
}
+static void unmap_area_pud(pgd_t *pgd, unsigned long address,
+ unsigned long size)
+{
+ pud_t *pud;
+ unsigned long end;
+
+ if (pgd_none(*pgd))
+ return;
+ if (pgd_bad(*pgd)) {
+ pgd_ERROR(*pgd);
+ pgd_clear(pgd);
+ return;
+ }
+
+ pud = pud_offset_k(pgd, address);
+ address &= ~PGDIR_MASK;
+ end = address + size;
+ if (end > PGDIR_SIZE)
+ end = PGDIR_SIZE;
+
+ do {
+ unmap_area_pmd(pud, address, end - address);
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
+ } while (address && (address < end));
+}
+
static int map_area_pte(pte_t *pte, unsigned long address,
unsigned long size, pgprot_t prot,
struct page ***pages)
@@ -96,7 +123,6 @@ static int map_area_pte(pte_t *pte, unsi
do {
struct page *page = **pages;
-
WARN_ON(!pte_none(*pte));
if (!page)
return -ENOMEM;
@@ -115,11 +141,11 @@ static int map_area_pmd(pmd_t *pmd, unsi
{
unsigned long base, end;
- base = address & PGDIR_MASK;
- address &= ~PGDIR_MASK;
+ base = address & PUD_MASK;
+ address &= ~PUD_MASK;
end = address + size;
- if (end > PGDIR_SIZE)
- end = PGDIR_SIZE;
+ if (end > PUD_SIZE)
+ end = PUD_SIZE;
do {
pte_t * pte = pte_alloc_kernel(&init_mm, pmd, base + address);
@@ -134,19 +160,41 @@ static int map_area_pmd(pmd_t *pmd, unsi
return 0;
}
+static int map_area_pud(pud_t *pud, unsigned long address,
+ unsigned long end, pgprot_t prot,
+ struct page ***pages)
+{
+ do {
+ pmd_t *pmd = pmd_alloc(&init_mm, pud, address);
+ if (!pmd)
+ return -ENOMEM;
+ if (map_area_pmd(pmd, address, end - address, prot, pages))
+ return -ENOMEM;
+ address = (address + PUD_SIZE) & PUD_MASK;
+ pud++;
+ } while (address && address < end);
+
+ return 0;
+}
+
void unmap_vm_area(struct vm_struct *area)
{
unsigned long address = (unsigned long) area->addr;
unsigned long end = (address + area->size);
- pgd_t *dir;
+ unsigned long next;
+ pgd_t *pgd;
+ int i;
- dir = pgd_offset_k(address);
+ pgd = pgd_offset_k(address);
flush_cache_vunmap(address, end);
- do {
- unmap_area_pmd(dir, address, end - address);
- address = (address + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (address && (address < end));
+ for (i = pgd_index(address); i <= pgd_index(end-1); i++) {
+ next = (address + PGDIR_SIZE) & PGDIR_MASK;
+ if (next <= address || next > end)
+ next = end;
+ unmap_area_pud(pgd, address, next - address);
+ address = next;
+ pgd++;
+ }
flush_tlb_kernel_range((unsigned long) area->addr, end);
}
@@ -154,25 +202,30 @@ int map_vm_area(struct vm_struct *area,
{
unsigned long address = (unsigned long) area->addr;
unsigned long end = address + (area->size-PAGE_SIZE);
- pgd_t *dir;
+ unsigned long next;
+ pgd_t *pgd;
int err = 0;
+ int i;
- dir = pgd_offset_k(address);
+ pgd = pgd_offset_k(address);
spin_lock(&init_mm.page_table_lock);
- do {
- pmd_t *pmd = pmd_alloc(&init_mm, dir, address);
- if (!pmd) {
+ for (i = pgd_index(address); i <= pgd_index(end-1); i++) {
+ pud_t *pud = pud_alloc(&init_mm, pgd, address);
+ if (!pud) {
err = -ENOMEM;
break;
}
- if (map_area_pmd(pmd, address, end - address, prot, pages)) {
+ next = (address + PGDIR_SIZE) & PGDIR_MASK;
+ if (next < address || next > end)
+ next = end;
+ if (map_area_pud(pud, address, next, prot, pages)) {
err = -ENOMEM;
break;
}
- address = (address + PGDIR_SIZE) & PGDIR_MASK;
- dir++;
- } while (address && (address < end));
+ address = next;
+ pgd++;
+ }
spin_unlock(&init_mm.page_table_lock);
flush_cache_vmap((unsigned long) area->addr, end);
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 5:47 ` Nick Piggin
@ 2004-11-14 5:48 ` Nick Piggin
2004-11-14 5:50 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2004-11-14 5:48 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Memory Management
[-- Attachment #1: Type: text/plain, Size: 35 bytes --]
4/4 - 4level arch changes for i386
[-- Attachment #2: 4level-architecture-changes-for-i386.patch --]
[-- Type: text/x-patch, Size: 8073 bytes --]
i386 works with 2 and 3 levels
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
---
linux-2.6-npiggin/arch/i386/kernel/acpi/sleep.c | 3 ++-
linux-2.6-npiggin/arch/i386/kernel/vm86.c | 11 ++++++++++-
linux-2.6-npiggin/arch/i386/mm/fault.c | 13 ++++++++++---
linux-2.6-npiggin/arch/i386/mm/hugetlbpage.c | 8 ++++++--
linux-2.6-npiggin/arch/i386/mm/init.c | 2 +-
linux-2.6-npiggin/arch/i386/mm/ioremap.c | 7 ++++++-
linux-2.6-npiggin/arch/i386/mm/pageattr.c | 10 +++++++---
linux-2.6-npiggin/include/asm-i386/pgalloc.h | 1 -
linux-2.6-npiggin/include/asm-i386/pgtable.h | 1 +
9 files changed, 43 insertions(+), 13 deletions(-)
diff -puN arch/i386/kernel/acpi/sleep.c~4level-architecture-changes-for-i386 arch/i386/kernel/acpi/sleep.c
--- linux-2.6/arch/i386/kernel/acpi/sleep.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/kernel/acpi/sleep.c 2004-11-14 12:33:58.000000000 +1100
@@ -7,6 +7,7 @@
#include <linux/acpi.h>
#include <linux/bootmem.h>
+#include <asm/current.h> /* XXX remove me */
#include <asm/smp.h>
@@ -24,7 +25,7 @@ static void init_low_mapping(pgd_t *pgd,
int pgd_ofs = 0;
while ((pgd_ofs < pgd_limit) && (pgd_ofs + USER_PTRS_PER_PGD < PTRS_PER_PGD)) {
- set_pgd(pgd, *(pgd+USER_PTRS_PER_PGD));
+ set_pgd(pgd, (*(pgd+USER_PTRS_PER_PGD)));
pgd_ofs++, pgd++;
}
}
diff -puN arch/i386/kernel/vm86.c~4level-architecture-changes-for-i386 arch/i386/kernel/vm86.c
--- linux-2.6/arch/i386/kernel/vm86.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/kernel/vm86.c 2004-11-14 12:33:58.000000000 +1100
@@ -137,6 +137,7 @@ struct pt_regs * fastcall save_v86_state
static void mark_screen_rdonly(struct task_struct * tsk)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte, *mapped;
int i;
@@ -151,7 +152,15 @@ static void mark_screen_rdonly(struct ta
pgd_clear(pgd);
goto out;
}
- pmd = pmd_offset(pgd, 0xA0000);
+ pud = pud_offset(pgd, 0xA0000);
+ if (pud_none(*pud))
+ goto out;
+ if (pud_bad(*pud)) {
+ pud_ERROR(*pud);
+ pud_clear(pud);
+ goto out;
+ }
+ pmd = pmd_offset(pud, 0xA0000);
if (pmd_none(*pmd))
goto out;
if (pmd_bad(*pmd)) {
diff -puN arch/i386/mm/fault.c~4level-architecture-changes-for-i386 arch/i386/mm/fault.c
--- linux-2.6/arch/i386/mm/fault.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/mm/fault.c 2004-11-14 12:33:58.000000000 +1100
@@ -518,6 +518,7 @@ vmalloc_fault:
int index = pgd_index(address);
unsigned long pgd_paddr;
pgd_t *pgd, *pgd_k;
+ pud_t *pud, *pud_k;
pmd_t *pmd, *pmd_k;
pte_t *pte_k;
@@ -530,11 +531,17 @@ vmalloc_fault:
/*
* set_pgd(pgd, *pgd_k); here would be useless on PAE
- * and redundant with the set_pmd() on non-PAE.
+ * and redundant with the set_pmd() on non-PAE. As would
+ * set_pud.
*/
- pmd = pmd_offset(pgd, address);
- pmd_k = pmd_offset(pgd_k, address);
+ pud = pud_offset(pgd, address);
+ pud_k = pud_offset(pgd_k, address);
+ if (!pud_present(*pud_k))
+ goto no_context;
+
+ pmd = pmd_offset(pud, address);
+ pmd_k = pmd_offset(pud_k, address);
if (!pmd_present(*pmd_k))
goto no_context;
set_pmd(pmd, *pmd_k);
diff -puN arch/i386/mm/hugetlbpage.c~4level-architecture-changes-for-i386 arch/i386/mm/hugetlbpage.c
--- linux-2.6/arch/i386/mm/hugetlbpage.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/mm/hugetlbpage.c 2004-11-14 12:33:58.000000000 +1100
@@ -21,20 +21,24 @@
static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd = NULL;
pgd = pgd_offset(mm, addr);
- pmd = pmd_alloc(mm, pgd, addr);
+ pud = pud_alloc(mm, pgd, addr);
+ pmd = pmd_alloc(mm, pud, addr);
return (pte_t *) pmd;
}
static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
{
pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd = NULL;
pgd = pgd_offset(mm, addr);
- pmd = pmd_offset(pgd, addr);
+ pud = pud_offset(pgd, addr);
+ pmd = pmd_offset(pud, addr);
return (pte_t *) pmd;
}
diff -puN arch/i386/mm/init.c~4level-architecture-changes-for-i386 arch/i386/mm/init.c
--- linux-2.6/arch/i386/mm/init.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/mm/init.c 2004-11-14 12:33:58.000000000 +1100
@@ -233,7 +233,7 @@ EXPORT_SYMBOL(kmap_prot);
EXPORT_SYMBOL(kmap_pte);
#define kmap_get_fixmap_pte(vaddr) \
- pte_offset_kernel(pmd_offset(pgd_offset_k(vaddr), (vaddr)), (vaddr))
+ pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(vaddr), vaddr), (vaddr)), (vaddr))
void __init kmap_init(void)
{
diff -puN arch/i386/mm/ioremap.c~4level-architecture-changes-for-i386 arch/i386/mm/ioremap.c
--- linux-2.6/arch/i386/mm/ioremap.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/mm/ioremap.c 2004-11-14 12:33:58.000000000 +1100
@@ -80,9 +80,14 @@ static int remap_area_pages(unsigned lon
BUG();
spin_lock(&init_mm.page_table_lock);
do {
+ pud_t *pud;
pmd_t *pmd;
- pmd = pmd_alloc(&init_mm, dir, address);
+
error = -ENOMEM;
+ pud = pud_alloc(&init_mm, dir, address);
+ if (!pud)
+ break;
+ pmd = pmd_alloc(&init_mm, pud, address);
if (!pmd)
break;
if (remap_area_pmd(pmd, address, end - address,
diff -puN arch/i386/mm/pageattr.c~4level-architecture-changes-for-i386 arch/i386/mm/pageattr.c
--- linux-2.6/arch/i386/mm/pageattr.c~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/arch/i386/mm/pageattr.c 2004-11-14 12:33:58.000000000 +1100
@@ -19,11 +19,15 @@ static struct list_head df_list = LIST_H
pte_t *lookup_address(unsigned long address)
{
- pgd_t *pgd = pgd_offset_k(address);
+ pgd_t *pgd = pgd_offset_k(address);
+ pud_t *pud;
pmd_t *pmd;
if (pgd_none(*pgd))
return NULL;
- pmd = pmd_offset(pgd, address);
+ pud = pud_offset(pgd, address);
+ if (pud_none(*pud))
+ return NULL;
+ pmd = pmd_offset(pud, address);
if (pmd_none(*pmd))
return NULL;
if (pmd_large(*pmd))
@@ -92,7 +96,7 @@ static void set_pmd_pte(pte_t *kpte, uns
static inline void revert_page(struct page *kpte_page, unsigned long address)
{
pte_t *linear = (pte_t *)
- pmd_offset(pgd_offset(&init_mm, address), address);
+ pmd_offset(pud_offset(pgd_offset_k(address), address), address);
set_pmd_pte(linear, address,
pfn_pte((__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT,
PAGE_KERNEL_LARGE));
diff -puN arch/i386/mm/pgtable.c~4level-architecture-changes-for-i386 arch/i386/mm/pgtable.c
diff -puN include/asm-i386/mmu_context.h~4level-architecture-changes-for-i386 include/asm-i386/mmu_context.h
diff -puN include/asm-i386/page.h~4level-architecture-changes-for-i386 include/asm-i386/page.h
diff -puN include/asm-i386/pgalloc.h~4level-architecture-changes-for-i386 include/asm-i386/pgalloc.h
--- linux-2.6/include/asm-i386/pgalloc.h~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgalloc.h 2004-11-14 12:33:58.000000000 +1100
@@ -19,7 +19,6 @@ static inline void pmd_populate(struct m
/*
* Allocate and free page tables.
*/
-
extern pgd_t *pgd_alloc(struct mm_struct *);
extern void pgd_free(pgd_t *pgd);
diff -puN include/asm-i386/pgtable.h~4level-architecture-changes-for-i386 include/asm-i386/pgtable.h
--- linux-2.6/include/asm-i386/pgtable.h~4level-architecture-changes-for-i386 2004-11-14 12:33:58.000000000 +1100
+++ linux-2.6-npiggin/include/asm-i386/pgtable.h 2004-11-14 12:33:58.000000000 +1100
@@ -306,6 +306,7 @@ static inline pte_t pte_modify(pte_t pte
* control the given virtual address
*/
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
+#define pgd_index_k(addr) pgd_index(addr)
/*
* pgd_offset() returns a (pgd_t *)
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 5:48 ` Nick Piggin
@ 2004-11-14 5:50 ` Nick Piggin
0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2004-11-14 5:50 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Memory Management
Nick Piggin wrote:
> 4/4 - 4level arch changes for i386
>
>
> ------------------------------------------------------------------------
>
>
>
> i386 works with 2 and 3 levels
>
> Signed-off-by: Andi Kleen <ak@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
^^^^^
Sorry, ignore these of course. I forgot they were still attached.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 5:46 [RFC] Possible alternate 4 level pagetables? Nick Piggin
2004-11-14 5:46 ` Nick Piggin
@ 2004-11-14 8:25 ` Andi Kleen
2004-11-15 1:19 ` Nick Piggin
2004-11-15 21:37 ` Hugh Dickins
2 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2004-11-14 8:25 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andi Kleen, Linux Memory Management
On Sun, Nov 14, 2004 at 04:46:21PM +1100, Nick Piggin wrote:
> Just looking at your 4 level page tables patch, I wondered why the extra
> level isn't inserted between pgd and pmd, as that would appear to be the
> least intrusive (conceptually, in the generic code). Also it maybe matches
> more closely the way that the 2->3 level conversion was done.
I did it the way I did to keep i386 and other archs obviously correct
because their logic doesn't change at all for the three lower levels,
and the highest level just hands a pointer through.
Regarding intrusiveness in common code: you pretty much have to change
most of of mm/memory.c, no matter what you do. Also there are overall
only 7 or 8 users that really need the full scale changes, so
it's not as bad as it looks. Ok there is ioremap in each architecture,
but usually you can cheat for these because you know the architecture
will never support 4levels.
I'm sorry, but I don't see much advantage of your patches over mine.
-Andi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 8:25 ` Andi Kleen
@ 2004-11-15 1:19 ` Nick Piggin
0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2004-11-15 1:19 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Memory Management
Andi Kleen wrote:
> On Sun, Nov 14, 2004 at 04:46:21PM +1100, Nick Piggin wrote:
>
>>Just looking at your 4 level page tables patch, I wondered why the extra
>>level isn't inserted between pgd and pmd, as that would appear to be the
>>least intrusive (conceptually, in the generic code). Also it maybe matches
>>more closely the way that the 2->3 level conversion was done.
>
>
> I did it the way I did to keep i386 and other archs obviously correct
> because their logic doesn't change at all for the three lower levels,
> and the highest level just hands a pointer through.
>
Yeah true. Although a pointer to a pud is essentially just a pointer
to pgd in the case where you've only got three levels instead of four.
So it is really a matter of where you make the "folds" I guess.
> Regarding intrusiveness in common code: you pretty much have to change
> most of of mm/memory.c, no matter what you do. Also there are overall
> only 7 or 8 users that really need the full scale changes, so
> it's not as bad as it looks. Ok there is ioremap in each architecture,
> but usually you can cheat for these because you know the architecture
> will never support 4levels.
>
Yeah - technically you can ignore the "pud" type in this system as well
if you're only using three levels, so architectures should be able to just
work. Although really they should just be converted over for cleanliness.
> I'm sorry, but I don't see much advantage of your patches over mine.
>
Well no there isn't much I guess - if mine were bug free it would nearly
compile into the same object code.
The main thing I see is that in my scheme, you have the business ends
of the page table - pgd and pte which are always there, and you fold away
the middle, "transient" levels when they're not in use. It also allows
you to use the same system for both pmd and pud.
But on the other hand yours is maybe as you say a bit less intrusive
code-wise, if not logically. And maybe has other advantages as well.
Anyway, this was just a suggestion - as I said I won't have much time
for it for the next week, but I might try to flesh it out a bit more
after that.
Thanks
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-14 5:46 [RFC] Possible alternate 4 level pagetables? Nick Piggin
2004-11-14 5:46 ` Nick Piggin
2004-11-14 8:25 ` Andi Kleen
@ 2004-11-15 21:37 ` Hugh Dickins
2004-11-16 1:11 ` Nick Piggin
2 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2004-11-15 21:37 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andi Kleen, Linux Memory Management
On Sun, 14 Nov 2004, Nick Piggin wrote:
>
> Just looking at your 4 level page tables patch, I wondered why the extra
> level isn't inserted between pgd and pmd, as that would appear to be the
> least intrusive (conceptually, in the generic code). Also it maybe matches
> more closely the way that the 2->3 level conversion was done.
I thought the same, when I finally took a look a week or so ago.
I've scarcely looked at your patches, but notice they change i386.
For me, the attraction of putting the new level in between pgd and pmd
was that it seemed that only common code and x86_64 (and whatever else
comes to use all four levels in future) would need changing (beyond,
perhaps, #including some asm-generic headers). Some casting to combine
the two levels into pmd in unchanged arch code, or rename pmd to pld in
the changed common code. Andi's arch patches seemed (all?) to spring
from replacing mm->pgd by mm->pml4.
But I could well be mistaken, I wasn't so industrious as to actually
try it.
> I've been toying with it a little bit. It is mainly just starting with
> your code and doing straight conversions, although I also attempted to
> implement a better compatibility layer that does the pagetable "folding"
> for you if you don't need to use the full range of them.
>
> Caveats are that there is still something slightly broken with it on i386,
> and so I haven't looked at x86-64 yet. I don't see why this wouldn't work
> though.
>
> I've called the new level 'pud'. u for upper or something.
Well, yes, your base appetites have led you to the name "pud",
where my refined intellect led me to "phd", with h for higher ;)
> Sorry the patch isn't in very good shape at the moment - I won't have time
> to work on it for a week, so I thought this would be a good point just to
> solicit initial comments.
I doubt it's worthwhile now, particularly if you do have to patch arches.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] Possible alternate 4 level pagetables?
2004-11-15 21:37 ` Hugh Dickins
@ 2004-11-16 1:11 ` Nick Piggin
0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2004-11-16 1:11 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andi Kleen, Linux Memory Management
Hugh Dickins wrote:
> On Sun, 14 Nov 2004, Nick Piggin wrote:
>
>>Just looking at your 4 level page tables patch, I wondered why the extra
>>level isn't inserted between pgd and pmd, as that would appear to be the
>>least intrusive (conceptually, in the generic code). Also it maybe matches
>>more closely the way that the 2->3 level conversion was done.
>
>
> I thought the same, when I finally took a look a week or so ago.
>
> I've scarcely looked at your patches, but notice they change i386.
>
Yep. The problem is that I was making an asm-generic/ header file
to handle pmd folding before even doing anything with 4 levels. Then
this meant that pud folding could be just done with a straight copy
of the pgtable-nopmd.h file.
Technically, I think, arch code would still run without changing a
single line if you didn't want to include those new "folding" headers.
(Maybe aside from a few type warnings / errors).
Andi's is much the same *except* that yes, being the top level means
names have to be changed.
> For me, the attraction of putting the new level in between pgd and pmd
> was that it seemed that only common code and x86_64 (and whatever else
> comes to use all four levels in future) would need changing (beyond,
> perhaps, #including some asm-generic headers). Some casting to combine
> the two levels into pmd in unchanged arch code, or rename pmd to pld in
> the changed common code. Andi's arch patches seemed (all?) to spring
> from replacing mm->pgd by mm->pml4.
>
My first attempt was to insert a pld below pmd actually :) It is the
most appropriately named!
The problem is that some architectures are actually using this level,
it is used in hugepages code, etc. So I decided a 'pud' was the least
intrusive.
> But I could well be mistaken, I wasn't so industrious as to actually
> try it.
>
>
>>I've been toying with it a little bit. It is mainly just starting with
>>your code and doing straight conversions, although I also attempted to
>>implement a better compatibility layer that does the pagetable "folding"
>>for you if you don't need to use the full range of them.
>>
>>Caveats are that there is still something slightly broken with it on i386,
>>and so I haven't looked at x86-64 yet. I don't see why this wouldn't work
>>though.
>>
>>I've called the new level 'pud'. u for upper or something.
>
>
> Well, yes, your base appetites have led you to the name "pud",
> where my refined intellect led me to "phd", with h for higher ;)
>
Oh now I think that is going to cause you all sorts of problems ;)
>
>>Sorry the patch isn't in very good shape at the moment - I won't have time
>>to work on it for a week, so I thought this would be a good point just to
>>solicit initial comments.
>
>
> I doubt it's worthwhile now, particularly if you do have to patch arches.
>
When I get time I'll do another cut with minimal possible architecture
changes first, and put 'improvements' on top of that.
I think I'll hopefully have time to get something more productive to add
to the debate before 2.6.10 comes out. I'm not completely adverse to Andi's
system, but this is going to be around for the next n-years, so I figure
an alternate perspective can't hurt.
Thanks
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2004-11-16 1:11 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-14 5:46 [RFC] Possible alternate 4 level pagetables? Nick Piggin
2004-11-14 5:46 ` Nick Piggin
2004-11-14 5:47 ` Nick Piggin
2004-11-14 5:48 ` Nick Piggin
2004-11-14 5:50 ` Nick Piggin
2004-11-14 8:25 ` Andi Kleen
2004-11-15 1:19 ` Nick Piggin
2004-11-15 21:37 ` Hugh Dickins
2004-11-16 1:11 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox