* [PATCH 1/4] Generic Virtual Memmap support for SPARSEMEM V3
From: Christoph Lameter @ 2007-04-04 23:06 UTC
To: akpm
Cc: linux-ia64, linux-kernel, Martin Bligh, linux-mm, Andi Kleen,
Dave Hansen, KAMEZAWA Hiroyuki, Christoph Lameter
Sparse Virtual: Virtual Memmap support for SPARSEMEM V4
V1->V3
- Add IA64 16M vmemmap size support (reduces TLB pressure)
- Add function to test for eventual node/node vmemmap overlaps
- Upper / Lower boundary fix.
V1->V2
- Support for PAGE_SIZE vmemmap which allows the general use of
virtual memmap on any MMU capable platform (enables IA64
support).
- Fix various issues as suggested by Dave Hansen.
- Add comments and error handling.
SPARSEMEM is a pretty nice framework that unifies quite a bit of
code over all the arches. It would be great if it could be the default
so that we can get rid of various forms of DISCONTIG and other variations
on memory maps. So far what has hindered this are the additional lookups
that SPARSEMEM introduces for virt_to_page and page_address. Those lookups
are complex enough that the code has to be kept in a separate function
and cannot be inlined.
This patch introduces virtual memmap support for sparsemem. virt_to_page,
page_address and friends become simple shift/add operations. No page flag
fields, no table lookups, nothing involving memory accesses is required.
The two key operations pfn_to_page and page_to_pfn become:
#define pfn_to_page(pfn) (vmemmap + (pfn))
#define page_to_pfn(page) ((page) - vmemmap)
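The derived operations collapse the same way. As a sketch (not part of
this patch; non-highmem case assumed, __pa/__va are the usual kernel
helpers):

#define virt_to_page(kaddr)	(vmemmap + (__pa(kaddr) >> PAGE_SHIFT))
#define page_address(page)	__va(((page) - vmemmap) << PAGE_SHIFT)

Both are pure arithmetic on a constant base, so the compiler can inline
them anywhere.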
In order for this to work we have to use a virtual mapping. This is
usually nearly free since kernel memory is already mapped via a 1-1
mapping that requires a page table anyway. The virtual mapping must
be big enough to span all of the memory that an arch can support, which
may make a virtual memmap difficult to use on 32 bit platforms that
support 36 physical address bits.
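As a rough sizing sketch (assuming a 56 byte struct page):

	vmemmap span = (1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT))
			* sizeof(struct page)

With 36 physical bits and 4K pages that is 2^24 * 56 bytes = 896MB of
virtual space, a big bite out of a 32 bit address space. With 50
physical bits and 16K pages it is about 3.5TB, which a 64 bit arch can
spare.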
However, if there is enough virtual space available and the arch
already maps its 1-1 kernel space using TLBs (e.g. true of IA64
and x86_64) then this technique makes sparsemem lookups even more
efficient than CONFIG_FLATMEM: FLATMEM still needs to read the contents
of the mem_map variable, whereas the vmemmap base is a constant.
Maybe this patch will allow us to make SPARSEMEM the default
configuration that will work on UP, SMP and NUMA on most platforms?
Then we may hopefully be able to remove the various forms of support
for FLATMEM, DISCONTIG etc etc.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc5-mm4/include/asm-generic/memory_model.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/asm-generic/memory_model.h 2007-04-04 15:45:48.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/asm-generic/memory_model.h 2007-04-04 15:45:52.000000000 -0700
@@ -46,6 +46,14 @@
__pgdat->node_start_pfn; \
})
+#elif defined(CONFIG_SPARSE_VIRTUAL)
+
+/*
+ * We have a virtual memmap that makes lookups very simple
+ */
+#define __pfn_to_page(pfn) (vmemmap + (pfn))
+#define __page_to_pfn(page) ((page) - vmemmap)
+
#elif defined(CONFIG_SPARSEMEM)
/*
* Note: section's mem_map is encorded to reflect its start_pfn.
Index: linux-2.6.21-rc5-mm4/mm/sparse.c
===================================================================
--- linux-2.6.21-rc5-mm4.orig/mm/sparse.c 2007-04-04 15:45:48.000000000 -0700
+++ linux-2.6.21-rc5-mm4/mm/sparse.c 2007-04-04 15:48:11.000000000 -0700
@@ -9,6 +9,8 @@
#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <asm/dma.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
/*
* Permanent SPARSEMEM data:
@@ -101,7 +103,7 @@ static inline int sparse_index_init(unsi
/*
* Although written for the SPARSEMEM_EXTREME case, this happens
- * to also work for the flat array case becase
+ * to also work for the flat array case because
* NR_SECTION_ROOTS==NR_MEM_SECTIONS.
*/
int __section_nr(struct mem_section* ms)
@@ -211,6 +213,253 @@ static int sparse_init_one_section(struc
return 1;
}
+#ifdef CONFIG_SPARSE_VIRTUAL
+/*
+ * Virtual Memory Map support
+ *
+ * (C) 2007 sgi. Christoph Lameter <clameter@sgi.com>.
+ *
+ * Virtual memory maps allow VM primitives pfn_to_page, page_to_pfn,
+ * virt_to_page, page_address() etc that involve no memory accesses at all.
+ *
+ * However, virtual mappings need a page table and TLBs. Many Linux
+ * architectures already map their physical space using 1-1 mappings
+ * via TLBs. For those arches the virtual memory map is essentially
+ * for free if we use the same page size as the 1-1 mappings. In that
+ * case the overhead consists of a few additional pages that are
+ * allocated to create a view of memory for vmemmap.
+ *
+ * Special Kconfig settings:
+ *
+ * CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP
+ *
+ * The architecture has its own functions to populate the memory
+ * map and provides a vmemmap_populate function.
+ *
+ * CONFIG_ARCH_SUPPORTS_PMD_MAPPING
+ *
+ * If not set then PAGE_SIZE mappings are generated which
+ * require one PTE/TLB per PAGE_SIZE chunk of the virtual memory map.
+ *
+ * If set then PMD_SIZE mappings are generated which are much
+ * lighter on the TLB. On some platforms these generate
+ * the same overhead as the 1-1 mappings.
+ */
+
+/*
+ * Allocate a block of memory to be used for the virtual memory map
+ * or the page tables that are used to create the mapping.
+ */
+void *vmemmap_alloc_block(unsigned long size, int node)
+{
+ if (slab_is_available()) {
+ struct page *page =
+ alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
+ get_order(size));
+
+ if (page)
+ return page_address(page);
+ return NULL;
+ } else {
+ void *p = __alloc_bootmem_node(NODE_DATA(node), size, size,
+ __pa(MAX_DMA_ADDRESS));
+ if (p)
+ memset(p, 0, size);
+ return p;
+ }
+}
+
+#ifndef CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP
+
+static int vmemmap_verify(pte_t *pte, int node,
+ unsigned long start, unsigned long end)
+{
+ unsigned long pfn = pte_pfn(*pte);
+ int actual_node = early_pfn_to_nid(pfn);
+
+ if (actual_node != node)
+ printk(KERN_WARNING "[%lx-%lx] potential offnode page_structs\n",
+ start, end - 1);
+ return 0;
+}
+
+#ifndef CONFIG_ARCH_SUPPORTS_PMD_MAPPING
+
+#define VIRTUAL_MEMMAP_SIZE PAGE_SIZE
+#define VIRTUAL_MEMMAP_MASK PAGE_MASK
+
+static int vmemmap_pte_setup(pte_t *pte, int node, unsigned long addr)
+{
+ void *block;
+ pte_t entry;
+
+ block = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!block)
+ return -ENOMEM;
+
+ entry = pfn_pte(__pa(block) >> PAGE_SHIFT, PAGE_KERNEL);
+ set_pte(pte, entry);
+ addr &= ~(PAGE_SIZE - 1);
+ printk(KERN_INFO "[%lx-%lx] PTE ->%p on node %d\n",
+ addr, addr + PAGE_SIZE -1, block, node);
+ return 0;
+}
+
+static int vmemmap_pop_pte(pmd_t *pmd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pte_t *pte;
+ int error = 0;
+
+ for (pte = pte_offset_map(pmd, addr); addr < end && !error;
+ pte++, addr += PAGE_SIZE)
+ if (pte_none(*pte))
+ error = vmemmap_pte_setup(pte, node, addr);
+ else
+ error = vmemmap_verify(pte, node,
+ addr + PAGE_SIZE, end);
+ return error;
+}
+
+static int vmemmap_pmd_setup(pmd_t *pmd, int node)
+{
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+
+ if (!p)
+ return -ENOMEM;
+
+ pmd_populate_kernel(&init_mm, pmd, p);
+ return 0;
+}
+
+#else /* CONFIG_ARCH_SUPPORTS_PMD_MAPPING */
+
+#define VIRTUAL_MEMMAP_SIZE PMD_SIZE
+#define VIRTUAL_MEMMAP_MASK PMD_MASK
+
+static int vmemmap_pop_pte(pmd_t *pmd, unsigned long addr,
+ unsigned long end, int node)
+{
+ return 0;
+}
+
+static int vmemmap_pmd_setup(pmd_t *pmd, int node, unsigned long addr)
+{
+ void *block;
+ pte_t entry;
+
+ block = vmemmap_alloc_block(PMD_SIZE, node);
+ if (!block)
+ return -ENOMEM;
+
+ entry = pfn_pte(__pa(block) >> PAGE_SHIFT, PAGE_KERNEL);
+ mk_pte_huge(entry);
+ set_pmd(pmd, __pmd(pte_val(entry)));
+ addr &= ~(PMD_SIZE - 1);
+ printk(KERN_INFO " [%lx-%lx] PMD ->%p on node %d\n",
+ addr, addr + PMD_SIZE - 1, block, node);
+ return 0;
+}
+
+#endif /* CONFIG_ARCH_SUPPORTS_PMD_MAPPING */
+
+static int vmemmap_pop_pmd(pud_t *pud, unsigned long addr,
+ unsigned long end, int node)
+{
+ pmd_t *pmd;
+ int error = 0;
+
+ for (pmd = pmd_offset(pud, addr); addr < end && !error;
+ pmd++, addr += PMD_SIZE) {
+ if (pmd_none(*pmd))
+ error = vmemmap_pmd_setup(pmd, node, addr);
+ else
+ error = vmemmap_verify((pte_t *)pmd, node,
+ pmd_addr_end(addr, end), end);
+
+ if (!error)
+ error = vmemmap_pop_pte(pmd, addr,
+ pmd_addr_end(addr, end), node);
+ }
+ return error;
+}
+
+static int vmemmap_pop_pud(pgd_t *pgd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pud_t *pud;
+ int error = 0;
+
+ for (pud = pud_offset(pgd, addr); addr < end && !error;
+ pud++, addr += PUD_SIZE) {
+
+ if (pud_none(*pud)) {
+ void *p =
+ vmemmap_alloc_block(PAGE_SIZE, node);
+
+ if (!p)
+ return -ENOMEM;
+
+ pud_populate(&init_mm, pud, p);
+ }
+ error = vmemmap_pop_pmd(pud, addr,
+ pud_addr_end(addr, end), node);
+ }
+ return error;
+}
+
+int vmemmap_populate(struct page *start_page, unsigned long nr,
+ int node)
+{
+ pgd_t *pgd;
+ unsigned long addr = (unsigned long)start_page & VIRTUAL_MEMMAP_MASK;
+ unsigned long end =
+ ((unsigned long)(start_page + nr) & VIRTUAL_MEMMAP_MASK)
+ + VIRTUAL_MEMMAP_SIZE;
+ int error = 0;
+
+ printk(KERN_INFO "[%lx-%lx] Virtual memory section"
+ " (%ld pages) node %d\n",
+ (unsigned long)start_page,
+ (unsigned long)(start_page + nr) - 1, nr, node);
+
+ for (pgd = pgd_offset_k(addr); addr < end && !error;
+ pgd++, addr += PGDIR_SIZE) {
+
+ if (pgd_none(*pgd)) {
+ void *p =
+ vmemmap_alloc_block(PAGE_SIZE, node);
+
+ if (!p)
+ return -ENOMEM;
+
+ pgd_populate(&init_mm, pgd, p);
+ }
+ error = vmemmap_pop_pud(pgd, addr,
+ pgd_addr_end(addr, end), node);
+ }
+ return error;
+}
+#endif /* !CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP */
+
+static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
+{
+ struct page *map;
+ struct mem_section *ms = __nr_to_section(pnum);
+ int nid = sparse_early_nid(ms);
+ int error;
+
+ map = pfn_to_page(pnum * PAGES_PER_SECTION);
+ error = vmemmap_populate(map, PAGES_PER_SECTION, nid);
+
+ if (error) {
+ printk(KERN_ERR "%s: allocation failed. Error=%d\n",
+ __FUNCTION__, error);
+ ms->section_mem_map = 0;
+ return NULL;
+ }
+ return map;
+}
+
+#else /* CONFIG_SPARSE_VIRTUAL */
+
static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
{
struct page *map;
@@ -231,6 +480,8 @@ static struct page *sparse_early_mem_map
return NULL;
}
+#endif /* !CONFIG_SPARSE_VIRTUAL */
+
static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
{
struct page *page, *ret;
Index: linux-2.6.21-rc5-mm4/include/linux/mmzone.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/linux/mmzone.h 2007-04-04 15:45:48.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/mmzone.h 2007-04-04 15:45:52.000000000 -0700
@@ -836,6 +836,8 @@ void sparse_init(void);
void memory_present(int nid, unsigned long start, unsigned long end);
unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
+int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
+void *vmemmap_alloc_block(unsigned long size, int node);
/*
* If it is possible to have holes within a MAX_ORDER_NR_PAGES, then we
* [PATCH 2/4] x86_64: SPARSE_VIRTUAL 2M page size support
From: Christoph Lameter @ 2007-04-04 23:06 UTC
To: akpm
Cc: linux-ia64, linux-kernel, Martin Bligh, linux-mm, Andi Kleen,
Christoph Lameter, Dave Hansen, KAMEZAWA Hiroyuki
x86_64: implement SPARSE_VIRTUAL
x86_64 is using 2M page table entries to map its 1-1 kernel space.
We implement the virtual memmap using 2M page table entries as well.
So there is no difference at all compared to FLATMEM: both schemes
require a page table entry and a TLB entry for each 2MB. FLATMEM still
references memory since the mem_map pointer itself is a variable.
SPARSE_VIRTUAL uses a constant for vmemmap, so no memory reference is
needed. SPARSE_VIRTUAL should therefore be superior even to FLATMEM.
With this SPARSEMEM becomes the most efficient way of handling
virt_to_page, pfn_to_page and friends for UP, SMP and NUMA on
x86_64.
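To illustrate the difference (a sketch, not part of the patch; the two
definitions are alternatives, not simultaneous):

	/* FLATMEM: mem_map is a variable that must be loaded first */
	extern struct page *mem_map;
	#define pfn_to_page(pfn)	(mem_map + (pfn))	/* load + add */

	/* SPARSE_VIRTUAL: the base is the constant from this patch */
	#define vmemmap ((struct page *)0xffffe20000000000UL)
	#define pfn_to_page(pfn)	(vmemmap + (pfn))	/* add only */

The FLATMEM variant has to fetch mem_map from memory; with the constant
vmemmap base the address is computed entirely in registers.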
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc5-mm4/include/asm-x86_64/page.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/asm-x86_64/page.h 2007-04-03 18:41:06.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/asm-x86_64/page.h 2007-04-03 18:41:59.000000000 -0700
@@ -128,6 +128,7 @@ extern unsigned long phys_base;
VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
#define __HAVE_ARCH_GATE_AREA 1
+#define vmemmap ((struct page *)0xffffe20000000000UL)
#include <asm-generic/memory_model.h>
#include <asm-generic/page.h>
Index: linux-2.6.21-rc5-mm4/Documentation/x86_64/mm.txt
===================================================================
--- linux-2.6.21-rc5-mm4.orig/Documentation/x86_64/mm.txt 2007-04-03 18:41:06.000000000 -0700
+++ linux-2.6.21-rc5-mm4/Documentation/x86_64/mm.txt 2007-04-03 18:41:59.000000000 -0700
@@ -9,6 +9,7 @@ ffff800000000000 - ffff80ffffffffff (=40
ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
+ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map
... unused hole ...
ffffffff80000000 - ffffffff82800000 (=40 MB) kernel text mapping, from phys 0
... unused hole ...
Index: linux-2.6.21-rc5-mm4/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.21-rc5-mm4.orig/arch/x86_64/Kconfig 2007-04-03 18:41:06.000000000 -0700
+++ linux-2.6.21-rc5-mm4/arch/x86_64/Kconfig 2007-04-03 18:41:59.000000000 -0700
@@ -392,6 +392,12 @@ config ARCH_SPARSEMEM_ENABLE
def_bool y
depends on (NUMA || EXPERIMENTAL)
+config SPARSE_VIRTUAL
+ def_bool y
+
+config ARCH_SUPPORTS_PMD_MAPPING
+ def_bool y
+
config ARCH_MEMORY_PROBE
def_bool y
depends on MEMORY_HOTPLUG
* [PATCH 3/4] IA64: SPARSE_VIRTUAL 16K page size support
From: Christoph Lameter @ 2007-04-04 23:06 UTC
To: akpm
Cc: linux-ia64, linux-kernel, Martin Bligh, linux-mm,
Christoph Lameter, Dave Hansen, KAMEZAWA Hiroyuki, Andi Kleen
[IA64] Sparse virtual implementation
Equip IA64 sparsemem with a virtual memmap. This is similar to the
existing CONFIG_VIRTUAL_MEM_MAP functionality for discontig. It uses
PAGE_SIZE mappings and is provided as a minimally intrusive solution.
We split the 128TB VMALLOC area into two 64TB areas and use one for the
virtual memmap.
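With the default 16K pages (PAGE_SHIFT == 14) the arithmetic of the
split is:

	1UL << (4*PAGE_SHIFT - 9)  = 1UL << 47 = 128TB	/* old VMALLOC_END */
	1UL << (4*PAGE_SHIFT - 10) = 1UL << 46 =  64TB	/* new VMALLOC_END,
							   vmemmap starts here */

Other IA64 page sizes scale these offsets accordingly.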
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc5-mm2/arch/ia64/Kconfig
===================================================================
--- linux-2.6.21-rc5-mm2.orig/arch/ia64/Kconfig 2007-04-02 16:15:29.000000000 -0700
+++ linux-2.6.21-rc5-mm2/arch/ia64/Kconfig 2007-04-02 16:15:50.000000000 -0700
@@ -350,6 +350,10 @@ config ARCH_SPARSEMEM_ENABLE
def_bool y
depends on ARCH_DISCONTIGMEM_ENABLE
+config SPARSE_VIRTUAL
+ def_bool y
+ depends on ARCH_SPARSEMEM_ENABLE
+
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB)
depends on ARCH_DISCONTIGMEM_ENABLE
Index: linux-2.6.21-rc5-mm2/include/asm-ia64/page.h
===================================================================
--- linux-2.6.21-rc5-mm2.orig/include/asm-ia64/page.h 2007-04-02 16:15:29.000000000 -0700
+++ linux-2.6.21-rc5-mm2/include/asm-ia64/page.h 2007-04-02 16:15:50.000000000 -0700
@@ -106,6 +106,9 @@ extern int ia64_pfn_valid (unsigned long
# define ia64_pfn_valid(pfn) 1
#endif
+#define vmemmap ((struct page *)(RGN_BASE(RGN_GATE) + \
+ (1UL << (4*PAGE_SHIFT - 10))))
+
#ifdef CONFIG_VIRTUAL_MEM_MAP
extern struct page *vmem_map;
#ifdef CONFIG_DISCONTIGMEM
Index: linux-2.6.21-rc5-mm2/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.21-rc5-mm2.orig/include/asm-ia64/pgtable.h 2007-04-02 16:15:29.000000000 -0700
+++ linux-2.6.21-rc5-mm2/include/asm-ia64/pgtable.h 2007-04-02 16:15:50.000000000 -0700
@@ -236,8 +236,13 @@ ia64_phys_addr_valid (unsigned long addr
# define VMALLOC_END vmalloc_end
extern unsigned long vmalloc_end;
#else
+#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_SPARSE_VIRTUAL)
+/* SPARSE_VIRTUAL uses half of vmalloc... */
+# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
+#else
# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
#endif
+#endif
/* fs/proc/kcore.c */
#define kc_vaddr_to_offset(v) ((v) - RGN_BASE(RGN_GATE))
* [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support
From: Christoph Lameter @ 2007-04-04 23:06 UTC
To: akpm
Cc: linux-ia64, linux-kernel, Martin Bligh, linux-mm, Andi Kleen,
Christoph Lameter, Dave Hansen, KAMEZAWA Hiroyuki
[IA64] Large vmemmap support
This implements granule page sized vmemmap support for IA64. This is
important because the traditional vmemmap on IA64 maps the memmap with
PAGE_SIZE TLB entries. For a typical 8GB node on IA64 we need about
2^(33 - 14 + 6) = 2^25 bytes = 32 MB of page structs.
Using PAGE_SIZE mappings we end up with 2^(25 - 14) = 2^11 = 2048 page
table entries. This patch reduces that to two 16MB TLB entries, roughly
a factor of 1000 fewer TLB entries for the virtual memory map.
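Spelled out (assuming 16K pages and a 64 byte, i.e. 2^6, struct page):

	8GB node		= 2^33 bytes
	page structs needed	= 2^(33-14) = 2^19
	memmap size		= 2^19 * 2^6 = 2^25 bytes = 32MB
	16K mappings needed	= 2^(25-14) = 2^11 = 2048
	16M mappings needed	= 32MB / 16MB = 2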
We modify the alt_dtlb_miss handler to branch to a vmemmap TLB lookup
function if bit 60 is set. The vmemmap will start with 0xF000xxx so it
is going to be very distinctive in dumps and can be distinguished easily
from 0xE000xxx (kernel 1-1 area) and 0xA000xxx (kernel text, data and
vmalloc).
We use a one-level page table to do lookups for the vmemmap TLBs. Since
we need to cover 1 petabyte we have to reserve 1 megabyte just for the
table, but we can statically allocate it in the data segment. This
simplifies lookups and handling: the fault handler only has to do a
single lookup, in contrast to four for the current vmalloc/vmemmap
implementation.
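In C, the vmemmap branch of the handler does roughly the following (a
sketch of the ivt.S code below; fault_addr is the faulting address kept
in r16, and tlb_insert()/pte_control_bits stand in for the itc.d
instruction and the control bits kept in r17):

	u32 desc = vmemmap_table[fault_addr >> IA64_GRANULE_SHIFT];

	if (!(desc & VMEMMAP_PRESENT))
		goto page_fault;			/* not populated */
	tlb_insert(((u64)(desc & ~VMEMMAP_PRESENT) << IA64_GRANULE_SHIFT)
			| pte_control_bits);		/* 16M granule entry */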
Problems with this patchset are:
1. A large 1M array is required to cover all possible memory (1 petabyte).
Maybe reduce this to actually supported HW sizes? 16TB or 64TB?
2. For systems with small nodes there is a significant chance of
large overlaps. We could dynamically determine the TLB size
but that would make the code more complex.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.21-rc5-mm4/arch/ia64/kernel/ivt.S
===================================================================
--- linux-2.6.21-rc5-mm4.orig/arch/ia64/kernel/ivt.S 2007-04-04 15:45:47.000000000 -0700
+++ linux-2.6.21-rc5-mm4/arch/ia64/kernel/ivt.S 2007-04-04 15:49:24.000000000 -0700
@@ -391,9 +391,11 @@ ENTRY(alt_dtlb_miss)
tbit.z p12,p0=r16,61 // access to region 6?
mov r25=PERCPU_PAGE_SHIFT << 2
mov r26=PERCPU_PAGE_SIZE
- nop.m 0
- nop.b 0
+ tbit.nz p6,p0=r16,60 // Access to VMEMMAP?
+(p6) br.cond.dptk vmemmap
;;
+dtlb_continue:
+ .pred.rel "mutex", p11, p10
(p10) mov r19=IA64_KR(PER_CPU_DATA)
(p11) and r19=r19,r16 // clear non-ppn fields
extr.u r23=r21,IA64_PSR_CPL0_BIT,2 // extract psr.cpl
@@ -416,6 +418,37 @@ ENTRY(alt_dtlb_miss)
(p7) itc.d r19 // insert the TLB entry
mov pr=r31,-1
rfi
+
+vmemmap:
+ //
+ // Granule lookup via vmemmap_table for
+ // the virtual memory map.
+ //
+ tbit.nz p6,p0=r16,59 // more top bits set?
+(p6) br.cond.spnt dtlb_continue // then it's an mmu bootstrap address
+ ;;
+ rsm psr.dt // switch to using physical data addressing
+ extr.u r25=r16, IA64_GRANULE_SHIFT, 32
+ ;;
+ srlz.d
+ LOAD_PHYSICAL(p0, r26, vmemmap_table)
+ shl r25=r25,2
+ ;;
+ add r26=r26,r25 // Index into vmemmap table
+ ;;
+ ld4 r25=[r26] // Get 32 bit descriptor
+ ;;
+ dep.z r19=r25, 0, 31 // Isolate ppn
+ tbit.z p6,p0=r25, 31 // Present bit set?
+(p6) br.cond.spnt page_fault // Page not present
+ ;;
+ shl r19=r19, IA64_GRANULE_SHIFT // Shift ppn in place
+ ;;
+ or r19=r19,r17 // insert PTE control bits into r19
+ ;;
+ itc.d r19 // insert the TLB entry
+ mov pr=r31,-1
+ rfi
END(alt_dtlb_miss)
.org ia64_ivt+0x1400
Index: linux-2.6.21-rc5-mm4/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.21-rc5-mm4.orig/arch/ia64/mm/discontig.c 2007-04-04 15:45:47.000000000 -0700
+++ linux-2.6.21-rc5-mm4/arch/ia64/mm/discontig.c 2007-04-04 15:53:02.000000000 -0700
@@ -8,6 +8,8 @@
* Russ Anderson <rja@sgi.com>
* Jesse Barnes <jbarnes@sgi.com>
* Jack Steiner <steiner@sgi.com>
+ * Copyright (C) 2007 sgi
+ * Christoph Lameter <clameter@sgi.com>
*/
/*
@@ -44,6 +46,79 @@ struct early_node_data {
unsigned long max_pfn;
};
+#ifdef CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP
+
+/*
+ * The vmemmap_table contains the number of the granule used to map
+ * that section of the virtual memmap.
+ *
+ * We support 50 address bits; 14 bits are used for the offset within a
+ * page. This leaves 36 bits (64G) for the pfn. Using page structs the memmap
+ * is going to take up a bit less than 4TB of virtual space.
+ *
+ * We are mapping these 4TB using 16M granule size which makes us end up
+ * with a bit less than 256k entries.
+ *
+ * Thus the common size of the needed vmemmap_table will be less than 1M.
+ */
+
+#define VMEMMAP_SIZE GRANULEROUNDUP((1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT)) \
+ * sizeof(struct page))
+
+/*
+ * Each vmemmap_table entry describes a 16M block of memory. We have
+ * 32 bits here and use one bit to indicate that a granule is present.
+ * A 31 bit ppn + a 24 bit offset within the granule = 55 bits, which is
+ * larger than the current maximum of memory (1 petabyte) supported by IA64.
+ */
+
+#define VMEMMAP_PRESENT (1UL << 31)
+
+u32 vmemmap_table[VMEMMAP_SIZE >> IA64_GRANULE_SHIFT];
+
+int vmemmap_populate(struct page *start, unsigned long nr, int node)
+{
+ unsigned long phys_start = __pa(start) & ~VMEMMAP_FLAG;
+ unsigned long phys_end = __pa(start + nr) & ~VMEMMAP_FLAG;
+ unsigned long addr = GRANULEROUNDDOWN(phys_start);
+ unsigned long end = GRANULEROUNDUP(phys_end);
+
+ for(; addr < end; addr += IA64_GRANULE_SIZE) {
+ u32 *vmem_pp =
+ vmemmap_table + (addr >> IA64_GRANULE_SHIFT);
+ void *block;
+
+ if (*vmem_pp & VMEMMAP_PRESENT) {
+ unsigned long addr = *vmem_pp & ~VMEMMAP_PRESENT;
+ int actual_node;
+
+ actual_node = early_pfn_to_nid(addr >> PAGE_SHIFT);
+ if (actual_node != node)
+ printk(KERN_WARNING "Virtual memory segments on node %d instead "
+ "of %d\n", actual_node, node);
+
+ } else {
+
+ block = vmemmap_alloc_block(IA64_GRANULE_SIZE, node);
+ if (!block)
+ return -ENOMEM;
+
+ *vmem_pp = VMEMMAP_PRESENT |
+ (__pa(block) >> IA64_GRANULE_SHIFT);
+
+ printk(KERN_INFO "[%p-%p] page_structs=%lu "
+ "node=%d entry=%lu/%lu\n", start, block, nr, node,
+ addr >> IA64_GRANULE_SHIFT,
+ VMEMMAP_SIZE >> IA64_GRANULE_SHIFT);
+ }
+ }
+ return 0;
+}
+#else
+/* Satisfy reference in arch/ia64/kernel/ivt.S */
+u32 vmemmap_table[0];
+#endif
+
static struct early_node_data mem_data[MAX_NUMNODES] __initdata;
static nodemask_t memory_less_mask __initdata;
Index: linux-2.6.21-rc5-mm4/include/asm-ia64/page.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/asm-ia64/page.h 2007-04-04 15:49:18.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/asm-ia64/page.h 2007-04-04 15:49:24.000000000 -0700
@@ -106,8 +106,13 @@ extern int ia64_pfn_valid (unsigned long
# define ia64_pfn_valid(pfn) 1
#endif
+#ifdef CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP
+#define VMEMMAP_FLAG (1UL << 60)
+#define vmemmap ((struct page *)(RGN_BASE(RGN_KERNEL) | VMEMMAP_FLAG))
+#else
#define vmemmap ((struct page *)(RGN_BASE(RGN_GATE) + \
(1UL << (4*PAGE_SHIFT - 10))))
+#endif
#ifdef CONFIG_VIRTUAL_MEM_MAP
extern struct page *vmem_map;
Index: linux-2.6.21-rc5-mm4/include/asm-ia64/pgtable.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/asm-ia64/pgtable.h 2007-04-04 15:49:18.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/asm-ia64/pgtable.h 2007-04-04 15:49:24.000000000 -0700
@@ -236,8 +236,9 @@ ia64_phys_addr_valid (unsigned long addr
# define VMALLOC_END vmalloc_end
extern unsigned long vmalloc_end;
#else
-#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_SPARSE_VIRTUAL)
-/* SPARSE_VIRTUAL uses half of vmalloc... */
+#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_SPARSE_VIRTUAL) && \
+ !defined(CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP)
+/* Standard SPARSE_VIRTUAL uses half of vmalloc... */
# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
#else
# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
Index: linux-2.6.21-rc5-mm4/arch/ia64/Kconfig
===================================================================
--- linux-2.6.21-rc5-mm4.orig/arch/ia64/Kconfig 2007-04-04 15:49:18.000000000 -0700
+++ linux-2.6.21-rc5-mm4/arch/ia64/Kconfig 2007-04-04 15:49:24.000000000 -0700
@@ -330,6 +330,17 @@ config PREEMPT
Say Y here if you are building a kernel for a desktop, embedded
or real-time system. Say N if you are unsure.
+config ARCH_POPULATES_VIRTUAL_MEMMAP
+ bool "Use 16M TLB for virtual memory map"
+ depends on SPARSE_VIRTUAL
+ help
+ Enables large page virtual memmap support. Each virtual memmap
+ page will be 16MB in size. That size of vmemmap can cover 4GB
+ of memory. We only use a single TLB per node. However, if nodes
+ are small and the distance between the memory of the nodes is
+ < 4GB then the page struct for some of the early pages in the
+ node may end up on the prior node.
+
source "mm/Kconfig"
config ARCH_SELECT_MEMORY_MODEL
* RE: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support
From: Luck, Tony @ 2007-04-05 22:50 UTC
To: Christoph Lameter, akpm
Cc: linux-ia64, linux-kernel, Martin Bligh, linux-mm, Andi Kleen,
Dave Hansen, KAMEZAWA Hiroyuki
> This implements granule page sized vmemmap support for IA64.
Christoph,
Your calculations here are all based on a granule size of 16M, but
it is possible to configure 64M granules.
With current sizeof(struct page) == 56, a 16M page will hold enough
page structures for about 4.5G of physical space (assuming 16K pages),
so a 64M page would cover 18G.
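In round numbers:

	16M / 56	~= 300000 page structs per granule
	300000 * 16K	~= 4.5G covered by a 16M granule
	4 * 4.5G	 = 18G covered by a 64M granule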
4.5G is possibly a bit wasteful (for a system with only a handful
of GBytes per node, and nodes that are not physically contiguous).
18G is definitely going to result in lots of wasted page structs
(that refer to non-existent physical memory around the edges of
each node).
Maybe a granule is not the right unit of allocation ... perhaps 4M
would work better (4M/56 ~= 75000 pages ~= 1.1G)? But if this is
too small, then a hard-coded 16M would be better than a granule,
because 64M is (IMHO) too big.
-Tony
P.S. This patch breaks the build for tiger_defconfig, zx1_defconfig
etc. But you may have hit on the "grand-unified theory" of mem_map
management ... so if the benchmarks come in favourably we could
drop all the other CONFIG options.
* Re: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support
From: David Miller @ 2007-04-05 23:10 UTC
To: tony.luck
Cc: clameter, akpm, linux-ia64, linux-kernel, mbligh, linux-mm, ak,
hansendc, kamezawa.hiroyu
> Maybe a granule is not the right unit of allocation ... perhaps 4M
> would work better (4M/56 ~= 75000 pages ~= 1.1G)? But if this is
> too small, then a hard-coded 16M would be better than a granule,
> because 64M is (IMHO) too big.
A 4MB chunk of page structs covers about 512MB of ram (I'm rounding up
to 64 bytes in my calculations and using an 8K page size, sorry :-).
So I think that is too small, although on the sparc64 side that is the
biggest I have available on most processor models.
But I do agree that 64MB is way too big and 16MB is a good compromise
chunk size for this stuff. That covers about 2GB of ram with the
above parameters, which should be about right.
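In the same rough numbers (64 byte struct page, 8K pages):

	4M / 64		= 65536 page structs per chunk
	65536 * 8K	= 512MB covered by a 4MB chunk
	4 * 512MB	= 2GB covered by a 16MB chunk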
* RE: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support
From: Christoph Lameter @ 2007-04-06 17:16 UTC
To: Luck, Tony
Cc: akpm, linux-ia64, linux-kernel, Martin Bligh, linux-mm,
Andi Kleen, Dave Hansen, KAMEZAWA Hiroyuki
On Thu, 5 Apr 2007, Luck, Tony wrote:
> > This implements granule page sized vmemmap support for IA64.
>
> Christoph,
>
> Your calculations here are all based on a granule size of 16M, but
> it is possible to configure 64M granules.
Hmm... Maybe we need a separate page size for the vmemmap?
> With current sizeof(struct page) == 56, a 16M page will hold enough
> page structures for about 4.5G of physical space (assuming 16K pages),
> so a 64M page would cover 18G.
Yes that is far too much.
> Maybe a granule is not the right unit of allocation ... perhaps 4M
> would work better (4M/56 ~= 75000 pages ~= 1.1G)? But if this is
> too small, then a hard-coded 16M would be better than a granule,
> because 64M is (IMHO) too big.
I have some measurements 1M vs. 16M that I took last year when I first
developed the approach:
1. 16k vmm page size
Tasks jobs/min jti jobs/min/task real cpu
1 2434.08 100 2434.0771 2.46 0.02 Thu Oct 12 03:22:20 2006
100 178784.27 93 1787.8427 3.36 7.14 Thu Oct 12 03:22:34 2006
200 279199.63 94 1395.9981 4.30 14.70 Thu Oct 12 03:22:52 2006
300 340909.09 92 1136.3636 5.28 22.55 Thu Oct 12 03:23:14 2006
400 381133.87 90 952.8347 6.30 30.64 Thu Oct 12 03:23:40 2006
500 408942.20 93 817.8844 7.34 38.90 Thu Oct 12 03:24:10 2006
600 430673.53 89 717.7892 8.36 47.15 Thu Oct 12 03:24:45 2006
700 445859.87 92 636.9427 9.42 55.59 Thu Oct 12 03:25:23 2006
800 460564.19 94 575.7052 10.42 63.57 Thu Oct 12 03:26:06 2006
2. 1M vmm page size
Tasks jobs/min jti jobs/min/task real cpu
1 2435.06 100 2435.0649 2.46 0.02 Thu Oct 12 03:08:25 2006
100 178041.54 93 1780.4154 3.37 7.18 Thu Oct 12 03:08:39 2006
200 278035.22 96 1390.1761 4.32 14.85 Thu Oct 12 03:08:57 2006
300 338536.77 96 1128.4559 5.32 22.90 Thu Oct 12 03:09:19 2006
400 377180.58 89 942.9514 6.36 31.19 Thu Oct 12 03:09:46 2006
500 407000.41 96 814.0008 7.37 39.21 Thu Oct 12 03:10:16 2006
600 428979.98 91 714.9666 8.39 47.43 Thu Oct 12 03:10:51 2006
700 444209.41 94 634.5849 9.46 55.86 Thu Oct 12 03:11:30 2006
800 455753.89 93 569.6924 10.53 64.59 Thu Oct 12 03:12:13 2006
4M would be right in the middle and maybe not so bad.
Note that these numbers were based on a more complex TLB handler.
See http://marc.info/?l=linux-ia64&m=116069969308257&w=2 (variable
kernel page size handler).
The problem with a different page size is that it would require a
redesign of the TLB lookup logic. We could go back to my variable kernel
page size patch quoted above, but then we walk the complete page table.
The one-level lookup, as far as I can tell, only works well with 16M.
If we tried to use a one-level lookup with a 4M page size then we would
have a linear lookup table that takes up 4MB to support 1 petabyte.
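For reference, the table sizes (assuming 50 physical address bits, 16K
pages, a 56 byte struct page and 4 byte table entries as in the patch):

	memmap size		~= 2^(50-14) * 56 bytes	~= 3.5TB
	with 16M mappings	3.5TB / 16M ~= 230k entries * 4	~= 1MB table
	with  4M mappings	3.5TB /  4M ~= 920k entries * 4	~= 4MB table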
* Re: [PATCH 1/4] Generic Virtual Memmap support for SPARSEMEM V3
From: David Miller @ 2007-04-05 21:29 UTC
To: clameter
Cc: akpm, linux-ia64, linux-kernel, mbligh, linux-mm, ak, hansendc,
kamezawa.hiroyu
> Sparse Virtual: Virtual Memmap support for SPARSEMEM V4
>
> V1->V3
> - Add IA64 16M vmemmap size support (reduces TLB pressure)
> - Add function to test for eventual node/node vmemmap overlaps
> - Upper / Lower boundary fix.
Hey Christoph, here is sparc64 support for this stuff.
After implementing this and seeing more and more how it works, I
really like it :-)
Thanks a lot for doing this work Christoph!
diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig
index 1a6348b..4da8012 100644
--- a/arch/sparc64/Kconfig
+++ b/arch/sparc64/Kconfig
@@ -215,6 +215,12 @@ config ARCH_SPARSEMEM_ENABLE
config ARCH_SPARSEMEM_DEFAULT
def_bool y
+config SPARSE_VIRTUAL
+ def_bool y
+
+config ARCH_POPULATES_VIRTUAL_MEMMAP
+ def_bool y
+
config LARGE_ALLOCS
def_bool y
diff --git a/arch/sparc64/kernel/ktlb.S b/arch/sparc64/kernel/ktlb.S
index d4024ac..964527d 100644
--- a/arch/sparc64/kernel/ktlb.S
+++ b/arch/sparc64/kernel/ktlb.S
@@ -226,6 +226,15 @@ kvmap_dtlb_load:
ba,pt %xcc, sun4v_dtlb_load
mov %g5, %g3
+kvmap_vmemmap:
+ sub %g4, %g5, %g5
+ srlx %g5, 22, %g5
+ sethi %hi(vmemmap_table), %g1
+ sllx %g5, 3, %g5
+ or %g1, %lo(vmemmap_table), %g1
+ ba,pt %xcc, kvmap_dtlb_load
+ ldx [%g1 + %g5], %g5
+
kvmap_dtlb_nonlinear:
/* Catch kernel NULL pointer derefs. */
sethi %hi(PAGE_SIZE), %g5
@@ -233,6 +242,13 @@ kvmap_dtlb_nonlinear:
bleu,pn %xcc, kvmap_dtlb_longpath
nop
+ /* Do not use the TSB for vmemmap. */
+ mov (VMEMMAP_BASE >> 24), %g5
+ sllx %g5, 24, %g5
+ cmp %g4,%g5
+ bgeu,pn %xcc, kvmap_vmemmap
+ nop
+
KERN_TSB_LOOKUP_TL1(%g4, %g6, %g5, %g1, %g2, %g3, kvmap_dtlb_load)
kvmap_dtlb_tsbmiss:
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index f146071..9b73933 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -1687,6 +1687,56 @@ EXPORT_SYMBOL(_PAGE_E);
unsigned long _PAGE_CACHE __read_mostly;
EXPORT_SYMBOL(_PAGE_CACHE);
+#define VMEMMAP_CHUNK_SHIFT 22
+#define VMEMMAP_CHUNK (1UL << VMEMMAP_CHUNK_SHIFT)
+#define VMEMMAP_CHUNK_MASK ~(VMEMMAP_CHUNK - 1UL)
+#define VMEMMAP_ALIGN(x) (((x)+VMEMMAP_CHUNK-1UL)&VMEMMAP_CHUNK_MASK)
+
+#define VMEMMAP_SIZE ((((1UL << MAX_PHYSADDR_BITS) >> PAGE_SHIFT) * \
+ sizeof(struct page *)) >> VMEMMAP_CHUNK_SHIFT)
+unsigned long vmemmap_table[VMEMMAP_SIZE];
+
+int vmemmap_populate(struct page *start, unsigned long nr, int node)
+{
+ unsigned long vstart = (unsigned long) start;
+ unsigned long vend = (unsigned long) (start + nr);
+ unsigned long phys_start = (vstart - VMEMMAP_BASE);
+ unsigned long phys_end = (vend - VMEMMAP_BASE);
+ unsigned long addr = phys_start & VMEMMAP_CHUNK_MASK;
+ unsigned long end = VMEMMAP_ALIGN(phys_end);
+ unsigned long pte_base;
+
+ pte_base = (_PAGE_VALID | _PAGE_SZ4MB_4U |
+ _PAGE_CP_4U | _PAGE_CV_4U |
+ _PAGE_P_4U | _PAGE_W_4U);
+ if (tlb_type == hypervisor)
+ pte_base = (_PAGE_VALID | _PAGE_SZ4MB_4V |
+ _PAGE_CP_4V | _PAGE_CV_4V |
+ _PAGE_P_4V | _PAGE_W_4V);
+
+ for(; addr < end; addr += VMEMMAP_CHUNK) {
+ unsigned long *vmem_pp =
+ vmemmap_table + (addr >> VMEMMAP_CHUNK_SHIFT);
+ void *block;
+
+ if (!(*vmem_pp & _PAGE_VALID)) {
+ block = vmemmap_alloc_block(1UL << 22, node);
+ if (!block)
+ return -ENOMEM;
+
+ *vmem_pp = pte_base | __pa(block);
+
+ printk(KERN_INFO "[%p-%p] page_structs=%lu "
+ "node=%d entry=%lu/%lu\n", start, block, nr,
+ node,
+ addr >> VMEMMAP_CHUNK_SHIFT,
+ VMEMMAP_SIZE >> VMEMMAP_CHUNK_SHIFT);
+ }
+ }
+ return 0;
+}
+
+
static void prot_init_common(unsigned long page_none,
unsigned long page_shared,
unsigned long page_copy,
diff --git a/include/asm-sparc64/page.h b/include/asm-sparc64/page.h
index ff736ea..f1f1a58 100644
--- a/include/asm-sparc64/page.h
+++ b/include/asm-sparc64/page.h
@@ -22,6 +22,9 @@
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
+#define VMEMMAP_BASE _AC(0x0000000200000000,UL)
+#define vmemmap ((struct page *)VMEMMAP_BASE)
+
/* Flushing for D-cache alias handling is only needed if
* the page size is smaller than 16K.
*/
diff --git a/include/asm-sparc64/pgtable.h b/include/asm-sparc64/pgtable.h
index b12be7a..9cbd149 100644
--- a/include/asm-sparc64/pgtable.h
+++ b/include/asm-sparc64/pgtable.h
@@ -42,6 +42,9 @@
#define HI_OBP_ADDRESS _AC(0x0000000100000000,UL)
#define VMALLOC_START _AC(0x0000000100000000,UL)
#define VMALLOC_END _AC(0x0000000200000000,UL)
+/* see asm-sparc64/page.h for VMEMMAP_BASE which sits right
+ * at VMALLOC_END
+ */
/* XXX All of this needs to be rethought so we can take advantage
* XXX cheetah's full 64-bit virtual address space, ie. no more hole
* Re: [PATCH 1/4] Generic Virtual Memmap support for SPARSEMEM V3
From: Christoph Lameter @ 2007-04-05 22:27 UTC
To: David Miller
Cc: Andy Whitcroft, akpm, linux-ia64, linux-kernel, mbligh, linux-mm,
ak, hansendc, kamezawa.hiroyu
On Thu, 5 Apr 2007, David Miller wrote:
> Hey Christoph, here is sparc64 support for this stuff.
Great!
> After implementing this and seeing more and more how it works, I
> really like it :-)
>
> Thanks a lot for doing this work Christoph!
Thanks for the appreciation. CCing Andy Whitcroft who will hopefully
merge all of this together into sparsemem including the S/390
implementation.
* Re: [PATCH 1/4] Generic Virtual Memmap support for SPARSEMEM V3
From: Andy Whitcroft @ 2007-04-10 10:43 UTC
To: Christoph Lameter
Cc: David Miller, akpm, linux-ia64, linux-kernel, mbligh, linux-mm,
ak, hansendc, kamezawa.hiroyu
Christoph Lameter wrote:
> On Thu, 5 Apr 2007, David Miller wrote:
>
>> Hey Christoph, here is sparc64 support for this stuff.
>
> Great!
>
>> After implementing this and seeing more and more how it works, I
>> really like it :-)
>>
>> Thanks a lot for doing this work Christoph!
>
> Thanks for the appreciation. CCing Andy Whitcroft who will hopefully
> merge this all of this together into sparsemem including the S/390
> implementation.
Yep grabbed this one and added it to the stack. Now to find a sparc to
test it with!
-apw