linux-mm.kvack.org archive mirror
* CONFIG_NONLINEAR for small systems
@ 2004-10-18 14:24 Andy Whitcroft
  2004-10-18 14:32 ` 050 bootmem use NODE_DATA Andy Whitcroft
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:24 UTC (permalink / raw)
  To: lhms-devel, linux-mm; +Cc: Andy Whitcroft

Following this email will be a series of patches which provide a
sample implementation of a simplified CONFIG_NONLINEAR memory model.
The first two clean up general infrastructure to minimise code
duplication.  The third introduces an allocator for the NUMA remap
space on i386.  The fourth generalises the page flags code to allow
reuse of the NODEZONE bits.  The final three are the actual meat of
the implementation, for both i386 and ppc64.

050-bootmem-use-NODE_DATA
060-refactor-setup_memory-i386
080-alloc_remap-i386
100-cleanup-node-zone
150-nonlinear
160-nonlinear-i386
170-nonlinear-ppc64

As has been observed, the CONFIG_DISCONTIGMEM implementation is
space-inefficient where a system has a sparse intra-node memory
configuration.  For example, we have systems where node 0 has a
1GB hole within it.  Under CONFIG_DISCONTIGMEM this results in the
struct pages for this area being allocated from ZONE_NORMAL and
never used; this is particularly problematic on these 32-bit systems
as we are already under severe pressure in this zone.
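
To put rough numbers on it (assuming the usual i386 figures of 4KB
pages and about 32 bytes per struct page): a 1GB hole spans 262144
page frames, so its never-used mem_map costs on the order of 8MB of
ZONE_NORMAL, out of a zone of at most 896MB.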

The generalised CONFIG_NONLINEAR memory model described at OLS
seemed to provide more than enough descriptive power to address this
issue, but provides far more functionality than is required.  In
particular it breaks the identity V=P+c to allow compression of
the kernel address space, which is not required on these smaller systems.
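
(V=P+c here is the direct-mapping identity between kernel virtual and
physical addresses; on i386, for instance, c is PAGE_OFFSET:

#define __pa(x)		((unsigned long)(x) - PAGE_OFFSET)
#define __va(x)		((void *)((unsigned long)(x) + PAGE_OFFSET))

Breaking that identity buys address-space compression at the price of
a table lookup in each translation.)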

This patch set is implemented as a proof of concept to show
that a simplified CONFIG_NONLINEAR-based implementation can provide
sufficient flexibility to solve the problems for these systems.

In the longer term I'd like to see a single CONFIG_NONLINEAR
implementation which allows these various features to be stacked in
combination as required.

Thoughts?

-apw

* 050 bootmem use NODE_DATA
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
@ 2004-10-18 14:32 ` Andy Whitcroft
  2004-10-26 18:16   ` Dave Hansen
  2004-10-18 14:33 ` 060 refactor setup_memory i386 Andy Whitcroft
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:32 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm

Convert the default non-node-based bootmem routines to use
NODE_DATA(0).  This is semantically and functionally identical in
any non-node configuration, as NODE_DATA(nid) is defined as below.

#define NODE_DATA(nid)          (&contig_page_data)

For the node cases (CONFIG_NUMA and CONFIG_DISCONTIGMEM) we can
use these non-node forms where all boot memory is defined on node 0.

Revision: $Rev$

Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diffstat 050-bootmem-use-NODE_DATA
---
 bootmem.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff -upN reference/mm/bootmem.c current/mm/bootmem.c
--- reference/mm/bootmem.c
+++ current/mm/bootmem.c
@@ -343,31 +343,29 @@ unsigned long __init free_all_bootmem_no
 	return(free_all_bootmem_core(pgdat));
 }
 
-#ifndef CONFIG_DISCONTIGMEM
 unsigned long __init init_bootmem (unsigned long start, unsigned long pages)
 {
 	max_low_pfn = pages;
 	min_low_pfn = start;
-	return(init_bootmem_core(&contig_page_data, start, 0, pages));
+	return(init_bootmem_core(NODE_DATA(0), start, 0, pages));
 }
 
 #ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
 void __init reserve_bootmem (unsigned long addr, unsigned long size)
 {
-	reserve_bootmem_core(contig_page_data.bdata, addr, size);
+	reserve_bootmem_core(NODE_DATA(0)->bdata, addr, size);
 }
 #endif /* !CONFIG_HAVE_ARCH_BOOTMEM_NODE */
 
 void __init free_bootmem (unsigned long addr, unsigned long size)
 {
-	free_bootmem_core(contig_page_data.bdata, addr, size);
+	free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
 }
 
 unsigned long __init free_all_bootmem (void)
 {
-	return(free_all_bootmem_core(&contig_page_data));
+	return(free_all_bootmem_core(NODE_DATA(0)));
 }
-#endif /* !CONFIG_DISCONTIGMEM */
 
 void * __init __alloc_bootmem (unsigned long size, unsigned long align, unsigned long goal)
 {

* 060 refactor setup_memory i386
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
  2004-10-18 14:32 ` 050 bootmem use NODE_DATA Andy Whitcroft
@ 2004-10-18 14:33 ` Andy Whitcroft
  2004-10-18 14:34 ` 080 alloc_remap i386 Andy Whitcroft
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:33 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm

Refactor the i386 default and CONFIG_DISCONTIGMEM setup_memory()
functions to share the common bootmem initialisation code.  This code
is intended to be identical, but currently some fixes have been
applied to one and not the other.  This patch extracts the common
initialisation code into a shared setup_bootmem_allocator().
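
After the patch both setup_memory() variants keep their model-specific
pfn discovery and end in the shared routine; schematically:

static unsigned long __init setup_memory(void)
{
	/* model-specific min/max pfn discovery ... */
	setup_bootmem_allocator();	/* the shared bootmem init */
	return max_low_pfn;
}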

Revision: $Rev$

Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diffstat 060-refactor-setup_memory-i386
---
 kernel/setup.c |   25 ++++++------
 mm/discontig.c |  117 +--------------------------------------------------------
 2 files changed, 18 insertions(+), 124 deletions(-)

diff -upN reference/arch/i386/kernel/setup.c current/arch/i386/kernel/setup.c
--- reference/arch/i386/kernel/setup.c
+++ current/arch/i386/kernel/setup.c
@@ -941,8 +941,6 @@ unsigned long __init find_max_low_pfn(vo
 	return max_low_pfn;
 }
 
-#ifndef CONFIG_DISCONTIGMEM
-
 /*
  * Free all available memory for boot time allocation.  Used
  * as a callback function by efi_memory_walk()
@@ -1016,15 +1014,15 @@ static void __init reserve_ebda_region(v
 		reserve_bootmem(addr, PAGE_SIZE);	
 }
 
+#ifndef CONFIG_DISCONTIGMEM
+void __init setup_bootmem_allocator(void);
 static unsigned long __init setup_memory(void)
 {
-	unsigned long bootmap_size, start_pfn, max_low_pfn;
-
 	/*
 	 * partially used pages are not usable - thus
 	 * we are rounding upwards:
 	 */
-	start_pfn = PFN_UP(init_pg_tables_end);
+	min_low_pfn = PFN_UP(init_pg_tables_end);
 
 	find_max_pfn();
 
@@ -1040,10 +1038,19 @@ static unsigned long __init setup_memory
 #endif
 	printk(KERN_NOTICE "%ldMB LOWMEM available.\n",
 			pages_to_mb(max_low_pfn));
+
+	setup_bootmem_allocator();
+	return max_low_pfn;
+}
+#endif /* !CONFIG_DISCONTIGMEM */
+
+void __init setup_bootmem_allocator(void)
+{
+	unsigned long bootmap_size;
 	/*
 	 * Initialize the boot-time allocator (with low memory only):
 	 */
-	bootmap_size = init_bootmem(start_pfn, max_low_pfn);
+	bootmap_size = init_bootmem(min_low_pfn, max_low_pfn);
 
 	register_bootmem_low_pages(max_low_pfn);
 
@@ -1053,7 +1060,7 @@ static unsigned long __init setup_memory
 	 * the (very unlikely) case of us accidentally initializing the
 	 * bootmem allocator with an invalid RAM area.
 	 */
-	reserve_bootmem(HIGH_MEMORY, (PFN_PHYS(start_pfn) +
+	reserve_bootmem(HIGH_MEMORY, (PFN_PHYS(min_low_pfn) +
 			 bootmap_size + PAGE_SIZE-1) - (HIGH_MEMORY));
 
 	/*
@@ -1110,11 +1117,7 @@ static unsigned long __init setup_memory
 		}
 	}
 #endif
-	return max_low_pfn;
 }
-#else
-extern unsigned long setup_memory(void);
-#endif /* !CONFIG_DISCONTIGMEM */
 
 /*
  * Request address space for all standard RAM and ROM resources
diff -upN reference/arch/i386/mm/discontig.c current/arch/i386/mm/discontig.c
--- reference/arch/i386/mm/discontig.c
+++ current/arch/i386/mm/discontig.c
@@ -136,46 +136,6 @@ static void __init allocate_pgdat(int ni
 	}
 }
 
-/*
- * Register fully available low RAM pages with the bootmem allocator.
- */
-static void __init register_bootmem_low_pages(unsigned long system_max_low_pfn)
-{
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		unsigned long curr_pfn, last_pfn, size;
-		/*
-		 * Reserve usable low memory
-		 */
-		if (e820.map[i].type != E820_RAM)
-			continue;
-		/*
-		 * We are rounding up the start address of usable memory:
-		 */
-		curr_pfn = PFN_UP(e820.map[i].addr);
-		if (curr_pfn >= system_max_low_pfn)
-			continue;
-		/*
-		 * ... and at the end of the usable range downwards:
-		 */
-		last_pfn = PFN_DOWN(e820.map[i].addr + e820.map[i].size);
-
-		if (last_pfn > system_max_low_pfn)
-			last_pfn = system_max_low_pfn;
-
-		/*
-		 * .. finally, did all the rounding and playing
-		 * around just make the area go away?
-		 */
-		if (last_pfn <= curr_pfn)
-			continue;
-
-		size = last_pfn - curr_pfn;
-		free_bootmem_node(NODE_DATA(0), PFN_PHYS(curr_pfn), PFN_PHYS(size));
-	}
-}
-
 void __init remap_numa_kva(void)
 {
 	void *vaddr;
@@ -220,21 +180,11 @@ static unsigned long calculate_numa_rema
 	return reserve_pages;
 }
 
-/*
- * workaround for Dell systems that neglect to reserve EBDA
- */
-static void __init reserve_ebda_region_node(void)
-{
-	unsigned int addr;
-	addr = get_bios_ebda();
-	if (addr)
-		reserve_bootmem_node(NODE_DATA(0), addr, PAGE_SIZE);
-}
-
+extern void setup_bootmem_allocator(void);
 unsigned long __init setup_memory(void)
 {
 	int nid;
-	unsigned long bootmap_size, system_start_pfn, system_max_low_pfn;
+	unsigned long system_start_pfn, system_max_low_pfn;
 	unsigned long reserve_pages, pfn;
 
 	/*
@@ -301,68 +251,9 @@ unsigned long __init setup_memory(void)
 
 	NODE_DATA(0)->bdata = &node0_bdata;
 
-	/*
-	 * Initialize the boot-time allocator (with low memory only):
-	 */
-	bootmap_size = init_bootmem_node(NODE_DATA(0), min_low_pfn, 0, system_max_low_pfn);
+	setup_bootmem_allocator();
 
-	register_bootmem_low_pages(system_max_low_pfn);
-
-	/*
-	 * Reserve the bootmem bitmap itself as well. We do this in two
-	 * steps (first step was init_bootmem()) because this catches
-	 * the (very unlikely) case of us accidentally initializing the
-	 * bootmem allocator with an invalid RAM area.
-	 */
-	reserve_bootmem_node(NODE_DATA(0), HIGH_MEMORY, (PFN_PHYS(min_low_pfn) +
-		 bootmap_size + PAGE_SIZE-1) - (HIGH_MEMORY));
-
-	/*
-	 * reserve physical page 0 - it's a special BIOS page on many boxes,
-	 * enabling clean reboots, SMP operation, laptop functions.
-	 */
-	reserve_bootmem_node(NODE_DATA(0), 0, PAGE_SIZE);
-
-	/*
-	 * But first pinch a few for the stack/trampoline stuff
-	 * FIXME: Don't need the extra page at 4K, but need to fix
-	 * trampoline before removing it. (see the GDT stuff)
-	 */
-	reserve_bootmem_node(NODE_DATA(0), PAGE_SIZE, PAGE_SIZE);
-
-	/* reserve EBDA region, it's a 4K region */
-	reserve_ebda_region_node();
-
-#ifdef CONFIG_ACPI_SLEEP
-	/*
-	 * Reserve low memory region for sleep support.
-	 */
-	acpi_reserve_bootmem();
-#endif
-
-	/*
-	 * Find and reserve possible boot-time SMP configuration:
-	 */
-	find_smp_config();
-
-#ifdef CONFIG_BLK_DEV_INITRD
-	if (LOADER_TYPE && INITRD_START) {
-		if (INITRD_START + INITRD_SIZE <= (system_max_low_pfn << PAGE_SHIFT)) {
-			reserve_bootmem_node(NODE_DATA(0), INITRD_START, INITRD_SIZE);
-			initrd_start =
-				INITRD_START ? INITRD_START + PAGE_OFFSET : 0;
-			initrd_end = initrd_start+INITRD_SIZE;
-		}
-		else {
-			printk(KERN_ERR "initrd extends beyond end of memory "
-			    "(0x%08lx > 0x%08lx)\ndisabling initrd\n",
-			    INITRD_START + INITRD_SIZE,
-			    system_max_low_pfn << PAGE_SHIFT);
-			initrd_start = 0;
-		}
-	}
-#endif
-	return system_max_low_pfn;
+	return max_low_pfn;
 }
 
 void __init zone_sizes_init(void)

* 080 alloc_remap i386
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
  2004-10-18 14:32 ` 050 bootmem use NODE_DATA Andy Whitcroft
  2004-10-18 14:33 ` 060 refactor setup_memory i386 Andy Whitcroft
@ 2004-10-18 14:34 ` Andy Whitcroft
  2004-10-18 14:35 ` 100 cleanup node zone Andy Whitcroft
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:34 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm

Introduce a new allocator for the NUMA remap space on i386.
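
As used later in the series (zone_init_free_lists and
node_alloc_mem_map in mm/page_alloc.c), callers try the per-node
remap area first and fall back to bootmem when it is exhausted or
not configured; a minimal sketch of the pattern (nid and size here
are illustrative):

	void *map;
#ifdef HAVE_ARCH_ALLOC_REMAP
	map = alloc_remap(nid, size);	/* per-node remap space */
	if (!map)
#endif
		map = alloc_bootmem_node(NODE_DATA(nid), size);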

Revision: $Rev$

Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diffstat 080-alloc_remap-i386
---
 arch/i386/mm/discontig.c  |   55 ++++++++++++++++++++++++++++++++++++++++------
 include/asm-i386/mmzone.h |    2 +
 mm/page_alloc.c           |   35 ++++++++++++++++++++++++++---
 3 files changed, 83 insertions(+), 9 deletions(-)

diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/mm/discontig.c current/arch/i386/mm/discontig.c
--- reference/arch/i386/mm/discontig.c
+++ current/arch/i386/mm/discontig.c
@@ -81,6 +81,9 @@ unsigned long node_remap_offset[MAX_NUMN
 void *node_remap_start_vaddr[MAX_NUMNODES];
 void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
+void *node_remap_end_vaddr[MAX_NUMNODES];
+void *node_remap_alloc_vaddr[MAX_NUMNODES];
+
 /*
  * FLAT - support for basic PC memory model with discontig enabled, essentially
  *        a single node with all available processors in it with a flat
@@ -136,13 +139,36 @@ static void __init allocate_pgdat(int ni
 	}
 }
 
+void *alloc_remap(int nid, unsigned long size)
+{
+	void *allocation = node_remap_alloc_vaddr[nid];
+
+	printk(KERN_WARNING "APW: alloc_remap(%d, %08lx)\n", nid, size);
+
+	size = ALIGN(size, L1_CACHE_BYTES);
+
+	if (!allocation)
+		return 0;
+	if ((allocation + size) >= node_remap_end_vaddr[nid])
+		return 0;
+
+	node_remap_alloc_vaddr[nid] += size;
+
+	memset(allocation, 0, size);
+
+	printk(KERN_WARNING "APW: alloc_remap(%d, %08lx) = %p\n", nid, size,
+			allocation);
+
+	return allocation;
+}
+
 void __init remap_numa_kva(void)
 {
 	void *vaddr;
 	unsigned long pfn;
 	int node;
 
-	for (node = 1; node < numnodes; ++node) {
+	for (node = 0; node < numnodes; ++node) {
 		for (pfn=0; pfn < node_remap_size[node]; pfn += PTRS_PER_PTE) {
 			vaddr = node_remap_start_vaddr[node]+(pfn<<PAGE_SHIFT);
 			set_pmd_pfn((ulong) vaddr, 
@@ -152,15 +178,21 @@ void __init remap_numa_kva(void)
 	}
 }
 
+/* APW/XXX: not here .. */
+unsigned long zone_bitmap_calculate(unsigned long nr_pages);
 static unsigned long calculate_numa_remap_pages(void)
 {
 	int nid;
 	unsigned long size, reserve_pages = 0;
 
-	for (nid = 1; nid < numnodes; nid++) {
+	for (nid = 0; nid < numnodes; nid++) {
 		/* calculate the size of the mem_map needed in bytes */
 		size = (node_end_pfn[nid] - node_start_pfn[nid] + 1) 
 			* sizeof(struct page) + sizeof(pg_data_t);
+
+		/* Allow for the bitmaps. */
+		size += zone_bitmap_calculate(node_end_pfn[nid] - node_start_pfn[nid] + 1);
+
 		/* convert size to large (pmd size) pages, rounding up */
 		size = (size + LARGE_PAGE_BYTES - 1) / LARGE_PAGE_BYTES;
 		/* now the roundup is correct, convert to PAGE_SIZE pages */
@@ -168,8 +200,8 @@ static unsigned long calculate_numa_rema
 		printk("Reserving %ld pages of KVA for lmem_map of node %d\n",
 				size, nid);
 		node_remap_size[nid] = size;
-		reserve_pages += size;
 		node_remap_offset[nid] = reserve_pages;
+		reserve_pages += size;
 		printk("Shrinking node %d from %ld pages to %ld pages\n",
 			nid, node_end_pfn[nid], node_end_pfn[nid] - size);
 		node_end_pfn[nid] -= size;
@@ -236,12 +268,18 @@ unsigned long __init setup_memory(void)
 			(ulong) pfn_to_kaddr(max_low_pfn));
 	for (nid = 0; nid < numnodes; nid++) {
 		node_remap_start_vaddr[nid] = pfn_to_kaddr(
-			(highstart_pfn + reserve_pages) - node_remap_offset[nid]);
+			highstart_pfn + node_remap_offset[nid]);
+		/* Init the node remap allocator */
+		node_remap_end_vaddr[nid] = node_remap_start_vaddr[nid] +
+			(node_remap_size[nid] * PAGE_SIZE);
+		node_remap_alloc_vaddr[nid] = node_remap_start_vaddr[nid] +
+			ALIGN(sizeof(pg_data_t), PAGE_SIZE);
+
 		allocate_pgdat(nid);
 		printk ("node %d will remap to vaddr %08lx - %08lx\n", nid,
 			(ulong) node_remap_start_vaddr[nid],
-			(ulong) pfn_to_kaddr(highstart_pfn + reserve_pages
-			    - node_remap_offset[nid] + node_remap_size[nid]));
+			(ulong) pfn_to_kaddr(highstart_pfn 
+			    + node_remap_offset[nid] + node_remap_size[nid]));
 	}
 	printk("High memory starts at vaddr %08lx\n",
 			(ulong) pfn_to_kaddr(highstart_pfn));
@@ -307,6 +345,10 @@ void __init zone_sizes_init(void)
 		 * normal bootmem allocator, but other nodes come from the
 		 * remapped KVA area - mbligh
 		 */
+			free_area_init_node(nid, NODE_DATA(nid),
+					zones_size, start, zholes_size);
+
+#if 0
 		if (!nid)
 			free_area_init_node(nid, NODE_DATA(nid),
 					zones_size, start, zholes_size);
@@ -319,6 +361,7 @@ void __init zone_sizes_init(void)
 			free_area_init_node(nid, NODE_DATA(nid), zones_size,
 				start, zholes_size);
 		}
+#endif
 	}
 	return;
 }
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/include/asm-i386/mmzone.h current/include/asm-i386/mmzone.h
--- reference/include/asm-i386/mmzone.h
+++ current/include/asm-i386/mmzone.h
@@ -16,6 +16,8 @@
 	#else	/* summit or generic arch */
 		#include <asm/srat.h>
 	#endif
+	#define HAVE_ARCH_ALLOC_REMAP	1
+
 #else /* !CONFIG_NUMA */
 	#define get_memcfg_numa get_memcfg_numa_flat
 	#define get_zholes_size(n) (0)
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/mm/page_alloc.c current/mm/page_alloc.c
--- reference/mm/page_alloc.c
+++ current/mm/page_alloc.c
@@ -94,6 +94,9 @@ static void bad_page(const char *functio
 	page->mapping = NULL;
 }
 
+/* APW/XXX: not here. */
+void *alloc_remap(int nid, unsigned long size);
+
 #ifndef CONFIG_HUGETLB_PAGE
 #define prep_compound_page(page, order) do { } while (0)
 #define destroy_compound_page(page, order) do { } while (0)
@@ -1442,11 +1445,23 @@ unsigned long pages_to_bitmap_size(unsig
 	return bitmap_size;
 }
 
+unsigned long zone_bitmap_calculate(unsigned long nr_pages)
+{
+	unsigned long overall_size = 0;
+	int order;
+
+	for (order = 0; order < MAX_ORDER - 1; order++)
+		overall_size += pages_to_bitmap_size(order, nr_pages);
+	
+	return overall_size;
+}
+
 void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone, unsigned long size)
 {
 	int order;
 	for (order = 0; ; order++) {
 		unsigned long bitmap_size;
+		unsigned long *map;
 
 		INIT_LIST_HEAD(&zone->free_area[order].free_list);
 		if (order == MAX_ORDER-1) {
@@ -1455,8 +1470,15 @@ void zone_init_free_lists(struct pglist_
 		}
 
 		bitmap_size = pages_to_bitmap_size(order, size);
-		zone->free_area[order].map =
-		  (unsigned long *) alloc_bootmem_node(pgdat, bitmap_size);
+
+#ifdef HAVE_ARCH_ALLOC_REMAP
+		map = (unsigned long *) alloc_remap(pgdat->node_id,
+			bitmap_size);
+		if (!map) 
+#endif
+			map = (unsigned long *) alloc_bootmem_node(pgdat,
+				bitmap_size);
+		zone->free_area[order].map = map;
 	}
 }
 
@@ -1581,9 +1603,16 @@ static void __init free_area_init_core(s
 void __init node_alloc_mem_map(struct pglist_data *pgdat)
 {
 	unsigned long size;
+	void *map;
 
 	size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
-	pgdat->node_mem_map = alloc_bootmem_node(pgdat, size);
+
+#ifdef HAVE_ARCH_ALLOC_REMAP
+	map = (unsigned long *) alloc_remap(pgdat->node_id, size);
+	if (!map)
+#endif
+		map = alloc_bootmem_node(pgdat, size);
+	pgdat->node_mem_map = map;
 #ifndef CONFIG_DISCONTIGMEM
 	mem_map = contig_page_data.node_mem_map;
 #endif


* 100 cleanup node zone
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
                   ` (2 preceding siblings ...)
  2004-10-18 14:34 ` 080 alloc_remap i386 Andy Whitcroft
@ 2004-10-18 14:35 ` Andy Whitcroft
  2004-10-18 14:35 ` 150 nonlinear Andy Whitcroft
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:35 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm
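
Generalise the page flags node and zone handling to allow reuse of
the NODEZONE bits (see the cover letter): the node and zone fields
gain their own shifts and masks, and the separate MAX_NODES_SHIFT and
MAX_ZONES_SHIFT limits collapse into a single FLAGS_TOTAL_SHIFT
budget.  Schematically, the 32-bit page->flags layout is:

/*
 *  | NODE: 6 bits | ZONE: 2 bits | ... remaining page flags ... |
 * MSB                                                        LSB
 */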

diffstat 100-cleanup-node-zone
---
 mm.h     |   43 ++++++++++++++++++++++++++++++++++++-------
 mmzone.h |   16 +++-------------
 2 files changed, 39 insertions(+), 20 deletions(-)

diff -upN reference/include/linux/mm.h current/include/linux/mm.h
--- reference/include/linux/mm.h
+++ current/include/linux/mm.h
@@ -376,16 +376,41 @@ static inline void put_page(struct page 
  * We'll have up to (MAX_NUMNODES * MAX_NR_ZONES) zones total,
  * so we use (MAX_NODES_SHIFT + MAX_ZONES_SHIFT) here to get enough bits.
  */
-#define NODEZONE_SHIFT (sizeof(page_flags_t)*8 - MAX_NODES_SHIFT - MAX_ZONES_SHIFT)
-#define NODEZONE(node, zone)	((node << ZONES_SHIFT) | zone)
+
+#define FLAGS_SHIFT	(sizeof(page_flags_t)*8)
+
+/* 32bit: NODE:ZONE */
+#define PGFLAGS_NODES_SHIFT	(FLAGS_SHIFT - NODES_SHIFT)
+#define PGFLAGS_ZONES_SHIFT	(PGFLAGS_NODES_SHIFT - ZONES_SHIFT)
+
+#define ZONETABLE_SHIFT		(NODES_SHIFT + ZONES_SHIFT)
+#define PGFLAGS_ZONETABLE_SHIFT	(FLAGS_SHIFT - ZONETABLE_SHIFT)
+
+#if NODES_SHIFT+ZONES_SHIFT > FLAGS_TOTAL_SHIFT
+#error NODES_SHIFT+ZONES_SHIFT > FLAGS_TOTAL_SHIFT
+#endif
+
+#define NODEZONE(node, zone)		((node << ZONES_SHIFT) | zone)
+
+#define ZONES_MASK		(~((~0UL) << ZONES_SHIFT))
+#define NODES_MASK		(~((~0UL) << NODES_SHIFT))
+#define ZONETABLE_MASK		(~((~0UL) << ZONETABLE_SHIFT))
+
+#define PGFLAGS_MASK		(~((~0UL) << PGFLAGS_ZONETABLE_SHIFT)
 
 static inline unsigned long page_zonenum(struct page *page)
 {
-	return (page->flags >> NODEZONE_SHIFT) & (~(~0UL << ZONES_SHIFT));
+	if (FLAGS_SHIFT == (PGFLAGS_ZONES_SHIFT + ZONES_SHIFT))
+ 		return (page->flags >> PGFLAGS_ZONES_SHIFT);
+ 	else
+ 		return (page->flags >> PGFLAGS_ZONES_SHIFT) & ZONES_MASK;
 }
 static inline unsigned long page_to_nid(struct page *page)
 {
-	return (page->flags >> (NODEZONE_SHIFT + ZONES_SHIFT));
+	if (FLAGS_SHIFT == (PGFLAGS_NODES_SHIFT + NODES_SHIFT))
+		return (page->flags >> PGFLAGS_NODES_SHIFT);
+	else
+		return (page->flags >> PGFLAGS_NODES_SHIFT) & NODES_MASK;
 }
 
 struct zone;
@@ -393,13 +418,17 @@ extern struct zone *zone_table[];
 
 static inline struct zone *page_zone(struct page *page)
 {
-	return zone_table[page->flags >> NODEZONE_SHIFT];
+	if (FLAGS_SHIFT == (PGFLAGS_ZONETABLE_SHIFT + ZONETABLE_SHIFT))
+		return zone_table[page->flags >> PGFLAGS_ZONETABLE_SHIFT];
+	else
+		return zone_table[page->flags >> PGFLAGS_ZONETABLE_SHIFT &
+			ZONETABLE_MASK];
 }
 
 static inline void set_page_zone(struct page *page, unsigned long nodezone_num)
 {
-	page->flags &= ~(~0UL << NODEZONE_SHIFT);
-	page->flags |= nodezone_num << NODEZONE_SHIFT;
+	page->flags &= PGFLAGS_MASK;
+	page->flags |= nodezone_num << PGFLAGS_ZONETABLE_SHIFT;
 }
 
 #ifndef CONFIG_DISCONTIGMEM
diff -upN reference/include/linux/mmzone.h current/include/linux/mmzone.h
--- reference/include/linux/mmzone.h
+++ current/include/linux/mmzone.h
@@ -389,27 +389,17 @@ extern struct pglist_data contig_page_da
  * with 32 bit page->flags field, we reserve 8 bits for node/zone info.
  * there are 3 zones (2 bits) and this leaves 8-2=6 bits for nodes.
  */
-#define MAX_NODES_SHIFT		6
+#define FLAGS_TOTAL_SHIFT	8
+
 #elif BITS_PER_LONG == 64
 /*
  * with 64 bit flags field, there's plenty of room.
  */
-#define MAX_NODES_SHIFT		10
+#define FLAGS_TOTAL_SHIFT	12
 #endif
 
 #endif /* !CONFIG_DISCONTIGMEM */
 
-#if NODES_SHIFT > MAX_NODES_SHIFT
-#error NODES_SHIFT > MAX_NODES_SHIFT
-#endif
-
-/* There are currently 3 zones: DMA, Normal & Highmem, thus we need 2 bits */
-#define MAX_ZONES_SHIFT		2
-
-#if ZONES_SHIFT > MAX_ZONES_SHIFT
-#error ZONES_SHIFT > MAX_ZONES_SHIFT
-#endif
-
 extern DECLARE_BITMAP(node_online_map, MAX_NUMNODES);
 
 #if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_NUMA)


* 150 nonlinear
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
                   ` (3 preceding siblings ...)
  2004-10-18 14:35 ` 100 cleanup node zone Andy Whitcroft
@ 2004-10-18 14:35 ` Andy Whitcroft
  2004-10-26 18:36   ` Dave Hansen
  2004-10-18 14:36 ` 160 nonlinear i386 Andy Whitcroft
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:35 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm

CONFIG_NONLINEAR memory model.
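
The core of the model is an array of memory sections: a pfn selects
a section, and that section's section_mem_map (stored biased by the
section's base pfn) yields the struct page directly.  A minimal
sketch of the lookup as defined in mmzone.h below:

	struct mem_section *ms = __pfn_to_section(pfn);
	struct page *page = ms->section_mem_map + pfn;	/* pfn_to_page() */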

Revision: $Rev$

Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diffstat 150-nonlinear
---
 include/linux/mm.h     |  103 +++++++++++++++++++++++++++++++++---
 include/linux/mmzone.h |  140 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/numa.h   |    2 
 init/main.c            |    1 
 mm/Makefile            |    2 
 mm/bootmem.c           |   15 ++++-
 mm/memory.c            |    2 
 mm/nonlinear.c         |  137 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c        |   87 +++++++++++++++++++++++++++++-
 9 files changed, 469 insertions(+), 20 deletions(-)

diff -upN reference/include/linux/mm.h current/include/linux/mm.h
--- reference/include/linux/mm.h
+++ current/include/linux/mm.h
@@ -379,24 +379,76 @@ static inline void put_page(struct page 
 
 #define FLAGS_SHIFT	(sizeof(page_flags_t)*8)
 
-/* 32bit: NODE:ZONE */
+/*
+ * CONFIG_NONLINEAR:
+ *   If there is room for SECTIONS, NODES and ZONES then:
+ *     NODE:ZONE:SECTION
+ *   else:
+ *     SECTION:ZONE
+ *
+ * Otherwise:
+ *   NODE:ZONE
+ */
+#ifdef CONFIG_NONLINEAR
+
+#if FLAGS_TOTAL_SHIFT >= SECTIONS_SHIFT + NODES_SHIFT + ZONES_SHIFT
+
+/* NODE:ZONE:SECTION */
 #define PGFLAGS_NODES_SHIFT	(FLAGS_SHIFT - NODES_SHIFT)
 #define PGFLAGS_ZONES_SHIFT	(PGFLAGS_NODES_SHIFT - ZONES_SHIFT)
+#define PGFLAGS_SECTIONS_SHIFT	(PGFLAGS_ZONES_SHIFT - SECTIONS_SHIFT)
+
+#define FLAGS_USED_SHIFT	(NODES_SHIFT + ZONES_SHIFT + SECTIONS_SHIFT)
 
 #define ZONETABLE_SHIFT		(NODES_SHIFT + ZONES_SHIFT)
 #define PGFLAGS_ZONETABLE_SHIFT	(FLAGS_SHIFT - ZONETABLE_SHIFT)
 
-#if NODES_SHIFT+ZONES_SHIFT > FLAGS_TOTAL_SHIFT
-#error NODES_SHIFT+ZONES_SHIFT > FLAGS_TOTAL_SHIFT
+#define ZONETABLE(section, node, zone) \
+			((node << ZONES_SHIFT) | zone)
+
+#else
+
+/* SECTION:ZONE */
+#define PGFLAGS_SECTIONS_SHIFT	(FLAGS_SHIFT - SECTIONS_SHIFT)
+#define PGFLAGS_ZONES_SHIFT	(PGFLAGS_SECTIONS_SHIFT - ZONES_SHIFT)
+
+#define FLAGS_USED_SHIFT	(SECTIONS_SHIFT + ZONES_SHIFT)
+
+#define ZONETABLE_SHIFT		(SECTIONS_SHIFT + ZONES_SHIFT)
+#define PGFLAGS_ZONETABLE_SHIFT	(FLAGS_SHIFT - ZONETABLE_SHIFT)
+
+#define ZONETABLE(section, node, zone) \
+			((section << ZONES_SHIFT) | zone)
+
+#endif
+
+#else /* !CONFIG_NONLINEAR */
+
+/* NODE:ZONE */
+#define PGFLAGS_NODES_SHIFT	(FLAGS_SHIFT - NODES_SHIFT)
+#define PGFLAGS_ZONES_SHIFT	(PGFLAGS_NODES_SHIFT - ZONES_SHIFT)
+
+#define ZONETABLE_SHIFT		(NODES_SHIFT + ZONES_SHIFT)
+#define PGFLAGS_ZONETABLE_SHIFT	(FLAGS_SHIFT - ZONETABLE_SHIFT)
+
+#define FLAGS_USED_SHIFT	(NODES_SHIFT + ZONES_SHIFT)
+
+#endif /* !CONFIG_NONLINEAR */
+
+#if FLAGS_USED_SHIFT > FLAGS_TOTAL_SHIFT
+#error SECTIONS_SHIFT+NODES_SHIFT+ZONES_SHIFT > FLAGS_TOTAL_SHIFT
 #endif
 
 #define NODEZONE(node, zone)		((node << ZONES_SHIFT) | zone)
 
 #define ZONES_MASK		(~((~0UL) << ZONES_SHIFT))
 #define NODES_MASK		(~((~0UL) << NODES_SHIFT))
+#define SECTIONS_MASK		(~((~0UL) << SECTIONS_SHIFT))
 #define ZONETABLE_MASK		(~((~0UL) << ZONETABLE_SHIFT))
 
-#define PGFLAGS_MASK		(~((~0UL) << PGFLAGS_ZONETABLE_SHIFT)
+#define ZONETABLE_SIZE  	(1 << ZONETABLE_SHIFT)
+
+#define PGFLAGS_MASK		(~((~0UL) << PGFLAGS_ZONETABLE_SHIFT))
 
 static inline unsigned long page_zonenum(struct page *page)
 {
@@ -405,13 +457,34 @@ static inline unsigned long page_zonenum
  	else
  		return (page->flags >> PGFLAGS_ZONES_SHIFT) & ZONES_MASK;
 }
+#ifdef PGFLAGS_NODES_SHIFT
 static inline unsigned long page_to_nid(struct page *page)
 {
+#if NODES_SHIFT == 0
+	return 0;
+#else 
 	if (FLAGS_SHIFT == (PGFLAGS_NODES_SHIFT + NODES_SHIFT))
 		return (page->flags >> PGFLAGS_NODES_SHIFT);
 	else
 		return (page->flags >> PGFLAGS_NODES_SHIFT) & NODES_MASK;
+#endif
 }
+#else
+static inline struct zone *page_zone(struct page *page);
+static inline unsigned long page_to_nid(struct page *page)
+{
+	return page_zone(page)->zone_pgdat->node_id;
+}
+#endif
+#ifdef PGFLAGS_SECTIONS_SHIFT
+static inline unsigned long page_to_section(struct page *page)
+{
+	if (FLAGS_SHIFT == (PGFLAGS_SECTIONS_SHIFT + SECTIONS_SHIFT))
+ 		return (page->flags >> PGFLAGS_SECTIONS_SHIFT);
+ 	else
+ 		return (page->flags >> PGFLAGS_SECTIONS_SHIFT) & SECTIONS_MASK;
+}
+#endif
 
 struct zone;
 extern struct zone *zone_table[];
@@ -425,13 +498,27 @@ static inline struct zone *page_zone(str
 			ZONETABLE_MASK];
 }
 
-static inline void set_page_zone(struct page *page, unsigned long nodezone_num)
+static inline void set_page_zone(struct page *page, unsigned long zone)
+{
+	page->flags &= ~(ZONES_MASK << PGFLAGS_ZONES_SHIFT);
+	page->flags |= zone << PGFLAGS_ZONES_SHIFT;
+}
+static inline void set_page_node(struct page *page, unsigned long node)
 {
-	page->flags &= PGFLAGS_MASK;
-	page->flags |= nodezone_num << PGFLAGS_ZONETABLE_SHIFT;
+#if defined(PGFLAGS_NODES_SHIFT) && NODES_SHIFT != 0
+	page->flags &= ~(NODES_MASK << PGFLAGS_NODES_SHIFT);
+	page->flags |= node << PGFLAGS_NODES_SHIFT;
+#endif
+}
+static inline void set_page_section(struct page *page, unsigned long section)
+{
+#ifdef PGFLAGS_SECTIONS_SHIFT
+	page->flags &= ~(SECTIONS_MASK << PGFLAGS_SECTIONS_SHIFT);
+	page->flags |= section << PGFLAGS_SECTIONS_SHIFT;
+#endif
 }
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 /* The array of struct pages - for discontigmem use pgdat->lmem_map */
 extern struct page *mem_map;
 #endif
diff -upN reference/include/linux/mmzone.h current/include/linux/mmzone.h
--- reference/include/linux/mmzone.h
+++ current/include/linux/mmzone.h
@@ -372,7 +372,7 @@ int lower_zone_protection_sysctl_handler
 /* Returns the number of the current Node. */
 #define numa_node_id()		(cpu_to_node(smp_processor_id()))
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 
 extern struct pglist_data contig_page_data;
 #define NODE_DATA(nid)		(&contig_page_data)
@@ -384,6 +384,8 @@ extern struct pglist_data contig_page_da
 
 #include <asm/mmzone.h>
 
+#endif /* CONFIG_FLATMEM */
+
 #if BITS_PER_LONG == 32 || defined(ARCH_HAS_ATOMIC_UNSIGNED)
 /*
  * with 32 bit page->flags field, we reserve 8 bits for node/zone info.
@@ -395,10 +397,13 @@ extern struct pglist_data contig_page_da
 /*
  * with 64 bit flags field, there's plenty of room.
  */
-#define FLAGS_TOTAL_SHIFT	12
-#endif
+#define FLAGS_TOTAL_SHIFT	32
+
+#else
 
-#endif /* !CONFIG_DISCONTIGMEM */
+#error BITS_PER_LONG not set
+
+#endif
 
 extern DECLARE_BITMAP(node_online_map, MAX_NUMNODES);
 
@@ -429,6 +434,133 @@ static inline unsigned int num_online_no
 #define num_online_nodes()	1
 
 #endif /* CONFIG_DISCONTIGMEM || CONFIG_NUMA */
+
+#ifdef CONFIG_NONLINEAR
+
+/*
+ * SECTION_SHIFT                #bits space required to store a section #
+ * PHYS_SECTION_SHIFT           #bits required to store a physical section #
+ *
+ * PA_SECTION_SHIFT             physical address to/from section number
+ * PFN_SECTION_SHIFT            pfn to/from section number
+ */
+#define SECTIONS_SHIFT          (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
+#define PHYS_SECTION_SHIFT      (MAX_PHYSADDR_BITS - SECTION_SIZE_BITS)
+
+#define PA_SECTION_SHIFT        (SECTION_SIZE_BITS)
+#define PFN_SECTION_SHIFT       (SECTION_SIZE_BITS - PAGE_SHIFT)
+
+#define NR_MEM_SECTIONS        	(1 << SECTIONS_SHIFT)
+#define NR_PHYS_SECTIONS        (1 << PHYS_SECTION_SHIFT)
+
+#define PAGES_PER_SECTION       (1 << PFN_SECTION_SHIFT)
+#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION-1))
+
+#if NR_MEM_SECTIONS == NR_PHYS_SECTIONS
+#define NONLINEAR_OPTIMISE 1
+#endif
+
+struct page;
+struct mem_section {
+	short section_nid;
+	struct page *section_mem_map;
+};
+
+#ifndef NONLINEAR_OPTIMISE
+extern short phys_section[NR_PHYS_SECTIONS];
+#endif
+extern struct mem_section mem_section[NR_MEM_SECTIONS];
+
+/*
+ * Given a kernel address, find the home node of the underlying memory.
+ */
+#define kvaddr_to_nid(kaddr)	pfn_to_nid(__pa(kaddr) >> PAGE_SHIFT)
+
+#if 0
+#define node_mem_map(nid)	(NODE_DATA(nid)->node_mem_map)
+
+#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
+#define node_end_pfn(nid)						\
+({									\
+	pg_data_t *__pgdat = NODE_DATA(nid);				\
+	__pgdat->node_start_pfn + __pgdat->node_spanned_pages;		\
+})
+
+#define local_mapnr(kvaddr)						\
+({									\
+	unsigned long __pfn = __pa(kvaddr) >> PAGE_SHIFT;		\
+	(__pfn - node_start_pfn(pfn_to_nid(__pfn)));			\
+})
+#endif
+
+#if 0
+/* XXX: FIXME -- wli */
+#define kern_addr_valid(kaddr)	(0)
+#endif
+
+static inline struct mem_section *__pfn_to_section(unsigned long pfn)
+{
+#ifdef NONLINEAR_OPTIMISE
+	return &mem_section[pfn >> PFN_SECTION_SHIFT];
+#else
+	return &mem_section[phys_section[pfn >> PFN_SECTION_SHIFT]];
+#endif
+}
+
+#define pfn_to_page(pfn) 						\
+({ 									\
+	unsigned long __pfn = (pfn);					\
+	__pfn_to_section(__pfn)->section_mem_map + __pfn;		\
+})
+#define page_to_pfn(page)						\
+({									\
+	page - mem_section[page_to_section(page)].section_mem_map;	\
+})
+
+/* APW/XXX: this is not generic??? */
+#if 0
+#define pmd_page(pmd)		(pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
+#endif
+
+static inline int pfn_valid(unsigned long pfn)
+{
+	if ((pfn >> PFN_SECTION_SHIFT) >= NR_PHYS_SECTIONS) 
+		return 0;
+#ifdef NONLINEAR_OPTIMISE
+	return mem_section[pfn >> PFN_SECTION_SHIFT].section_mem_map != 0;
+#else
+	return phys_section[pfn >> PFN_SECTION_SHIFT] != -1;
+#endif
+}
+
+/*
+ * APW/XXX: these are _only_ used during initialisation, therefore they
+ * can use __initdata ... they should have names to indicate this
+ * restriction.
+ */
+#ifdef CONFIG_NUMA
+extern unsigned long phys_section_nid[NR_PHYS_SECTIONS];
+#define pfn_to_nid(pfn)							\
+({									\
+	unsigned long __pfn = (pfn);					\
+	phys_section_nid[__pfn >> PFN_SECTION_SHIFT];			\
+})
+#else
+#define pfn_to_nid(pfn) 0
+#endif
+
+#define pfn_to_pgdat(pfn)						\
+({									\
+	NODE_DATA(pfn_to_nid(pfn));					\
+})
+
+int nonlinear_add(int nid, unsigned long start, unsigned long end);
+int nonlinear_calculate(int nid);
+void nonlinear_allocate(void);
+
+#endif /* CONFIG_NONLINEAR */
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MMZONE_H */
diff -upN reference/include/linux/numa.h current/include/linux/numa.h
--- reference/include/linux/numa.h
+++ current/include/linux/numa.h
@@ -3,7 +3,7 @@
 
 #include <linux/config.h>
 
-#ifdef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_FLATMEM
 #include <asm/numnodes.h>
 #endif
 
diff -upN reference/init/main.c current/init/main.c
--- reference/init/main.c
+++ current/init/main.c
@@ -480,6 +480,7 @@ asmlinkage void __init start_kernel(void
 {
 	char * command_line;
 	extern struct kernel_param __start___param[], __stop___param[];
+
 /*
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
diff -upN reference/mm/bootmem.c current/mm/bootmem.c
--- reference/mm/bootmem.c
+++ current/mm/bootmem.c
@@ -255,6 +255,7 @@ found:
 static unsigned long __init free_all_bootmem_core(pg_data_t *pgdat)
 {
 	struct page *page;
+	unsigned long pfn;
 	bootmem_data_t *bdata = pgdat->bdata;
 	unsigned long i, count, total = 0;
 	unsigned long idx;
@@ -265,15 +266,26 @@ static unsigned long __init free_all_boo
 
 	count = 0;
 	/* first extant page of the node */
-	page = virt_to_page(phys_to_virt(bdata->node_boot_start));
+	pfn = bdata->node_boot_start >> PAGE_SHIFT;
 	idx = bdata->node_low_pfn - (bdata->node_boot_start >> PAGE_SHIFT);
 	map = bdata->node_bootmem_map;
 	/* Check physaddr is O(LOG2(BITS_PER_LONG)) page aligned */
 	if (bdata->node_boot_start == 0 ||
 	    ffs(bdata->node_boot_start) - PAGE_SHIFT > ffs(BITS_PER_LONG))
 		gofast = 1;
+	page = pfn_to_page(pfn);
 	for (i = 0; i < idx; ) {
 		unsigned long v = ~map[i / BITS_PER_LONG];
+
+		/*
+		 * Makes use of the guarantee that *_mem_map will be
+		 * contiguous in sections aligned at MAX_ORDER.
+		 * APW/XXX: we are making an assumption that our node_boot_start
+		 * is aligned to BITS_PER_LONG ... is this valid/enforced?
+		 */
+		if ((pfn & ((1 << MAX_ORDER) - 1)) == 0)
+			page = pfn_to_page(pfn);
+
 		if (gofast && v == ~0UL) {
 			int j;
 
@@ -302,6 +314,7 @@ static unsigned long __init free_all_boo
 			i+=BITS_PER_LONG;
 			page += BITS_PER_LONG;
 		}
+		pfn += BITS_PER_LONG;
 	}
 	total += count;
 
diff -upN reference/mm/Makefile current/mm/Makefile
--- reference/mm/Makefile
+++ current/mm/Makefile
@@ -15,6 +15,6 @@ obj-y			:= bootmem.o filemap.o mempool.o
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
+obj-$(CONFIG_NONLINEAR)       += nonlinear.o
 obj-$(CONFIG_SHMEM) += shmem.o
 obj-$(CONFIG_TINY_SHMEM) += tiny-shmem.o
-
diff -upN reference/mm/memory.c current/mm/memory.c
--- reference/mm/memory.c
+++ current/mm/memory.c
@@ -56,7 +56,7 @@
 #include <linux/swapops.h>
 #include <linux/elf.h>
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 /* use the per-pgdat data instead for discontigmem - mbligh */
 unsigned long max_mapnr;
 struct page *mem_map;
diff -upN /dev/null current/mm/nonlinear.c
--- /dev/null
+++ current/mm/nonlinear.c
@@ -0,0 +1,137 @@
+/*
+ * Non-linear memory mappings.
+ */
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/bootmem.h>
+#include <linux/module.h>
+#include <asm/dma.h>
+
+/*
+ * Permanent non-linear data:
+ *
+ * 1) phys_section	- valid physical memory sections (in mem_section)
+ * 2) mem_section	- memory sections, mem_map's for valid memory
+ */
+#ifndef NONLINEAR_OPTIMISE
+short phys_section[NR_PHYS_SECTIONS] = { [ 0 ... NR_PHYS_SECTIONS-1] = -1 };
+EXPORT_SYMBOL(phys_section);
+#endif
+struct mem_section mem_section[NR_MEM_SECTIONS];
+EXPORT_SYMBOL(mem_section);
+
+
+/*
+ * Initialisation time data:
+ *
+ * 1) phys_section_nid  - physical section node id
+ * 2) phys_section_pfn  - physical section base page frame
+ */
+unsigned long phys_section_nid[NR_PHYS_SECTIONS] __initdata =
+	{ [ 0 ... NR_PHYS_SECTIONS-1] = -1 };
+static unsigned long phys_section_pfn[NR_PHYS_SECTIONS] __initdata;
+
+/* Record a non-linear memory area for a node. */
+int nonlinear_add(int nid, unsigned long start, unsigned long end)
+{
+	unsigned long pfn = start;
+
+printk(KERN_WARNING "APW: nonlinear_add: nid<%d> start<%08lx:%ld> end<%08lx:%ld>\n",
+		nid, start, start >> PFN_SECTION_SHIFT, end, end >> PFN_SECTION_SHIFT);
+	start &= PAGE_SECTION_MASK;
+	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+/*printk(KERN_WARNING "  APW: nonlinear_add: section<%d> pfn<%08lx>\n", 
+	pfn >> PFN_SECTION_SHIFT, pfn);*/
+		phys_section_nid[pfn >> PFN_SECTION_SHIFT] = nid;
+		phys_section_pfn[pfn >> PFN_SECTION_SHIFT] = pfn;
+	}
+
+	return 1;
+}
+
+/*
+ * Calculate the space required on a per node basis for the mmap.
+ */
+int nonlinear_calculate(int nid)
+{
+	int pnum;
+	int sections = 0;
+
+	for (pnum = 0; pnum < NR_PHYS_SECTIONS; pnum++) {
+		if (phys_section_nid[pnum] == nid)
+			sections++;
+	}
+
+	return (sections * PAGES_PER_SECTION * sizeof(struct page));
+}
+
+
+/* XXX/APW: NO! */
+void *alloc_remap(int nid, unsigned long size);
+
+/*
+ * Allocate the accumulated non-linear sections, allocate a mem_map
+ * for each and record the physical to section mapping.
+ */
+void nonlinear_allocate(void)
+{
+	int snum = 0;
+	int pnum;
+	struct page *map;
+
+	for (pnum = 0; pnum < NR_PHYS_SECTIONS; pnum++) {
+		if (phys_section_nid[pnum] == -1)
+			continue;
+
+		/* APW/XXX: this is a dumbo name for this feature, should
+		 * be something like alloc_really_really_early. */
+#ifdef HAVE_ARCH_ALLOC_REMAP
+		map = alloc_remap(phys_section_nid[pnum],
+				sizeof(struct page) * PAGES_PER_SECTION);
+#else
+		map = 0;
+#endif
+		if (!map)
+			map = alloc_bootmem_node(NODE_DATA(phys_section_nid[pnum]),
+				sizeof(struct page) * PAGES_PER_SECTION);
+		if (!map)
+			continue;
+
+		/*
+		 * Subtle, we encode the real pfn into the mem_map such that
+		 * the identity page - section_mem_map will return the actual
+		 * physical page frame number.
+		 */
+#ifdef NONLINEAR_OPTIMISE
+		snum = pnum;
+#else
+		phys_section[pnum] = snum;
+#endif
+		mem_section[snum].section_mem_map = map -
+			phys_section_pfn[pnum];
+
+if ((pnum % 32) == 0)
+printk(KERN_WARNING "APW: nonlinear_allocate: section<%d> map<%p> ms<%p> pfn<%08lx>\n", pnum, map, mem_section[snum].section_mem_map,  phys_section_pfn[pnum]);
+
+
+		snum++;
+	}
+
+#if 0
+#define X(x)	printk(KERN_WARNING "APW: " #x "<%08lx>\n", x)
+	X(FLAGS_SHIFT);
+	X(SECTIONS_SHIFT);
+	X(ZONES_SHIFT);
+	X(PGFLAGS_SECTIONS_SHIFT);
+	X(PGFLAGS_ZONES_SHIFT);
+	X(ZONETABLE_SHIFT);
+	X(PGFLAGS_ZONETABLE_SHIFT);
+	X(FLAGS_USED_SHIFT);
+	X(ZONES_MASK);
+	X(NODES_MASK);
+	X(SECTIONS_MASK);
+	X(ZONETABLE_MASK);
+	X(ZONETABLE_SIZE);
+	X(PGFLAGS_MASK);
+#endif
+}
diff -upN reference/mm/page_alloc.c current/mm/page_alloc.c
--- reference/mm/page_alloc.c
+++ current/mm/page_alloc.c
@@ -49,7 +49,7 @@ EXPORT_SYMBOL(nr_swap_pages);
  * Used by page_zone() to look up the address of the struct zone whose
  * id is encoded in the upper bits of page->flags
  */
-struct zone *zone_table[1 << (ZONES_SHIFT + NODES_SHIFT)];
+struct zone *zone_table[ZONETABLE_SIZE];
 EXPORT_SYMBOL(zone_table);
 
 static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
@@ -63,6 +63,7 @@ unsigned long __initdata nr_all_pages;
  */
 static int bad_range(struct zone *zone, struct page *page)
 {
+	/* printk(KERN_WARNING "bad_range: page<%p> pfn<%08lx> s<%08lx> e<%08lx> zone<%p><%p>\n", page, page_to_pfn(page), zone->zone_start_pfn,  zone->zone_start_pfn + zone->spanned_pages, zone, page_zone(page)); */
 	if (page_to_pfn(page) >= zone->zone_start_pfn + zone->spanned_pages)
 		return 1;
 	if (page_to_pfn(page) < zone->zone_start_pfn)
@@ -187,7 +188,11 @@ static inline void __free_pages_bulk (st
 	if (order)
 		destroy_compound_page(page, order);
 	mask = (~0UL) << order;
+#ifdef CONFIG_NONLINEAR
+	page_idx = page_to_pfn(page) - zone->zone_start_pfn;
+#else
 	page_idx = page - base;
+#endif
 	if (page_idx & ~mask)
 		BUG();
 	index = page_idx >> (1 + order);
@@ -204,8 +209,35 @@ static inline void __free_pages_bulk (st
 			break;
 
 		/* Move the buddy up one level. */
+#ifdef CONFIG_NONLINEAR
+		/*
+		 * Locate the struct page for both the matching buddy in our
+		 * pair (buddy1) and the combined O(n+1) page they form (page).
+		 * 
+		 * 1) Any buddy B1 will have an order O twin B2 which satisfies
+		 * the following equasion:
+		 * the following equation:
+		 * For example, if the starting buddy (buddy2) is #8 its order
+		 * 1 buddy is #10:
+		 *     B2 = 8 ^ (1 << 1) = 8 ^ 2 = 10
+		 *
+		 * 2) Any buddy B will have an order O+1 parent P which
+		 * satisfies the following equasion:
+		 * satisfies the following equation:
+		 *
+		 * Assumption: *_mem_map is contigious at least up to MAX_ORDER
+		 * Assumption: *_mem_map is contiguous at least up to MAX_ORDER
+		buddy1 = page + ((page_idx ^ (1 << order)) - page_idx);
+		buddy2 = page;
+
+		page = page - (page_idx - (page_idx & ~(1 << order)));
+
+		if (bad_range(zone, buddy1))
+		printk(KERN_WARNING "__free_pages_bulk: buddy1<%p> buddy2<%p> page<%p> page_idx<%ld> off<%ld>\n", buddy1, buddy2, page, page_idx, (page_idx - (page_idx & ~(1 << order)))); 
+#else
 		buddy1 = base + (page_idx ^ (1 << order));
 		buddy2 = base + page_idx;
+#endif
 		BUG_ON(bad_range(zone, buddy1));
 		BUG_ON(bad_range(zone, buddy2));
 		list_del(&buddy1->lru);
@@ -215,7 +247,11 @@ static inline void __free_pages_bulk (st
 		index >>= 1;
 		page_idx &= mask;
 	}
+#ifdef CONFIG_NONLINEAR
+	list_add(&page->lru, &area->free_list);
+#else
 	list_add(&(base + page_idx)->lru, &area->free_list);
+#endif
 }
 
 static inline void free_pages_check(const char *function, struct page *page)
@@ -380,7 +416,11 @@ static struct page *__rmqueue(struct zon
 
 		page = list_entry(area->free_list.next, struct page, lru);
 		list_del(&page->lru);
+#ifdef CONFIG_NONLINEAR
+		index = page_to_pfn(page) - zone->zone_start_pfn;
+#else
 		index = page - zone->zone_mem_map;
+#endif
 		if (current_order != MAX_ORDER-1)
 			MARK_USED(index, current_order, area);
 		zone->free_pages -= 1UL << order;
@@ -1401,9 +1441,39 @@ void __init memmap_init_zone(unsigned lo
 {
 	struct page *start = pfn_to_page(start_pfn);
 	struct page *page;
+	struct zone *zonep = &NODE_DATA(nid)->node_zones[zone];
+#ifdef CONFIG_NONLINEAR
+	int pfn;
+#endif
+
+	/* APW/XXX: this is the place to both allocate the memory for the
+	 * section; scan the range offered relative to the zone and
+	 * instantiate the page's.
+	 */
+	printk(KERN_WARNING "APW: zone<%p> start<%08lx> pgdat<%p>\n",
+			zonep, start_pfn, zonep->zone_pgdat);
 
+#ifdef CONFIG_NONLINEAR
+	for (pfn = start_pfn; pfn < (start_pfn + size); pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+
+		/*
+		 * Record the CHUNKZONE for this page and install the
+		 * zone_table link for it also.
+		 */
+		set_page_node(page, nid);
+		set_page_zone(page, zone);
+		set_page_section(page, pfn >> PFN_SECTION_SHIFT);
+		zone_table[ZONETABLE(pfn >> PFN_SECTION_SHIFT, nid, zone)] =
+			zonep;
+#else
 	for (page = start; page < (start + size); page++) {
-		set_page_zone(page, NODEZONE(nid, zone));
+		set_page_node(page, nid);
+		set_page_zone(page, zone);
+#endif
+
 		set_page_count(page, 0);
 		reset_page_mapcount(page);
 		SetPageReserved(page);
@@ -1413,8 +1483,15 @@ void __init memmap_init_zone(unsigned lo
 		if (!is_highmem_idx(zone))
 			set_page_address(page, __va(start_pfn << PAGE_SHIFT));
 #endif
+		
+#ifdef CONFIG_NONLINEAR
+	}
+#else
 		start_pfn++;
 	}
+#endif
+	printk(KERN_WARNING "APW: zone<%p> start<%08lx> pgdat<%p>\n",
+			zonep, start_pfn, zonep->zone_pgdat);
 }
 
 /*
@@ -1509,7 +1586,9 @@ static void __init free_area_init_core(s
 		unsigned long size, realsize;
 		unsigned long batch;
 
+#ifndef CONFIG_NONLINEAR
 		zone_table[NODEZONE(nid, j)] = zone;
+#endif
 		realsize = size = zones_size[j];
 		if (zholes_size)
 			realsize -= zholes_size[j];
@@ -1613,7 +1692,7 @@ void __init node_alloc_mem_map(struct pg
 #endif
 		map = alloc_bootmem_node(pgdat, size);
 	pgdat->node_mem_map = map;
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 	mem_map = contig_page_data.node_mem_map;
 #endif
 }
@@ -1632,7 +1711,7 @@ void __init free_area_init_node(int nid,
 	free_area_init_core(pgdat, zones_size, zholes_size);
 }
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 static bootmem_data_t contig_bootmem_data;
 struct pglist_data contig_page_data = { .bdata = &contig_bootmem_data };
 

* 160 nonlinear i386
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
                   ` (4 preceding siblings ...)
  2004-10-18 14:35 ` 150 nonlinear Andy Whitcroft
@ 2004-10-18 14:36 ` Andy Whitcroft
  2004-10-18 14:36 ` 170 nonlinear ppc64 Andy Whitcroft
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:36 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm

CONFIG_NONLINEAR for i386
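
The arch hooks are small: each NUMA memory detection path registers
its physical ranges as they are discovered, and paging_init() later
instantiates the per-section mem_maps; schematically:

	/* per detected memory chunk (numaq.c, srat.c) */
	nonlinear_add(nid, start_pfn, end_pfn);

	/* once, from paging_init() (arch/i386/mm/init.c) */
	nonlinear_allocate();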

Revision: $Rev$

Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diffstat 160-nonlinear-i386
---
 arch/i386/Kconfig          |   22 ++++++---
 arch/i386/kernel/numaq.c   |    5 ++
 arch/i386/kernel/setup.c   |    7 ++
 arch/i386/kernel/srat.c    |    5 ++
 arch/i386/mm/Makefile      |    2 
 arch/i386/mm/discontig.c   |   97 +++++++++++++++++++++++----------------
 arch/i386/mm/init.c        |   19 ++++---
 include/asm-i386/mmzone.h  |  110 ++++++++++++++++++++++++++++++++++-----------
 include/asm-i386/page.h    |    4 -
 include/asm-i386/pgtable.h |    4 -
 10 files changed, 189 insertions(+), 86 deletions(-)

diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/Kconfig current/arch/i386/Kconfig
--- reference/arch/i386/Kconfig
+++ current/arch/i386/Kconfig
@@ -68,7 +68,7 @@ config X86_VOYAGER
 
 config X86_NUMAQ
 	bool "NUMAQ (IBM/Sequent)"
-	select DISCONTIGMEM
+	#select DISCONTIGMEM
 	select NUMA
 	help
 	  This option is used for getting Linux to run on a (IBM/Sequent) NUMA
@@ -738,16 +738,28 @@ comment "NUMA (NUMA-Q) requires SMP, 64G
 comment "NUMA (Summit) requires SMP, 64GB highmem support, ACPI"
 	depends on X86_SUMMIT && (!HIGHMEM64G || !ACPI)
 
-config DISCONTIGMEM
-	bool
-	depends on NUMA
-	default y
 
 config HAVE_ARCH_BOOTMEM_NODE
 	bool
 	depends on NUMA
 	default y
 
+choice
+	prompt "Memory model"
+	default NONLINEAR if (X86_NUMAQ || X86_SUMMIT)
+	default FLATMEM
+
+config DISCONTIGMEM
+	bool "Discontiguous Memory"
+
+config NONLINEAR
+	bool "Nonlinear Memory"
+
+config FLATMEM
+	bool "Flat Memory"
+
+endchoice
+
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
 	depends on HIGHMEM4G || HIGHMEM64G
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/kernel/numaq.c current/arch/i386/kernel/numaq.c
--- reference/arch/i386/kernel/numaq.c
+++ current/arch/i386/kernel/numaq.c
@@ -60,6 +60,11 @@ static void __init smp_dump_qct(void)
 				eq->hi_shrd_mem_start - eq->priv_mem_size);
 			node_end_pfn[node] = MB_TO_PAGES(
 				eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
+#ifdef CONFIG_NONLINEAR
+			nonlinear_add(node, node_start_pfn[node],
+				node_end_pfn[node]);
+#endif
+
 		}
 	}
 }
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/kernel/setup.c current/arch/i386/kernel/setup.c
--- reference/arch/i386/kernel/setup.c
+++ current/arch/i386/kernel/setup.c
@@ -39,6 +39,7 @@
 #include <linux/efi.h>
 #include <linux/init.h>
 #include <linux/edd.h>
+#include <linux/mmzone.h>
 #include <video/edid.h>
 #include <asm/e820.h>
 #include <asm/mpspec.h>
@@ -1014,7 +1015,7 @@ static void __init reserve_ebda_region(v
 		reserve_bootmem(addr, PAGE_SIZE);	
 }
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 void __init setup_bootmem_allocator(void);
 static unsigned long __init setup_memory(void)
 {
@@ -1042,7 +1043,9 @@ static unsigned long __init setup_memory
 	setup_bootmem_allocator();
 	return max_low_pfn;
 }
-#endif /* !CONFIG_DISCONTIGMEM */
+#else
+unsigned long __init setup_memory(void);
+#endif /* CONFIG_FLATMEM */
 
 void __init setup_bootmem_allocator(void)
 {
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/kernel/srat.c current/arch/i386/kernel/srat.c
--- reference/arch/i386/kernel/srat.c
+++ current/arch/i386/kernel/srat.c
@@ -261,6 +261,11 @@ static int __init acpi20_parse_srat(stru
 		       j, node_memory_chunk[j].nid,
 		       node_memory_chunk[j].start_pfn,
 		       node_memory_chunk[j].end_pfn);
+#ifdef CONFIG_NONLINEAR
+		 nonlinear_add(node_memory_chunk[j].nid,
+				 node_memory_chunk[j].start_pfn,
+				 node_memory_chunk[j].end_pfn);
+#endif /* CONFIG_NONLINEAR */
 	}
  
 	/*calculate node_start_pfn/node_end_pfn arrays*/
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/mm/discontig.c current/arch/i386/mm/discontig.c
--- reference/arch/i386/mm/discontig.c
+++ current/arch/i386/mm/discontig.c
@@ -33,20 +33,19 @@
 #include <asm/mmzone.h>
 #include <bios_ebda.h>
 
-struct pglist_data *node_data[MAX_NUMNODES];
-bootmem_data_t node0_bdata;
-
 /*
  * numa interface - we expect the numa architecture specific code to have
  *                  populated the following initialisation.
  *
  * 1) numnodes         - the total number of nodes configured in the system
- * 2) physnode_map     - the mapping between a pfn and owning node
- * 3) node_start_pfn   - the starting page frame number for a node
+ * 2) node_start_pfn   - the starting page frame number for a node
  * 3) node_end_pfn     - the ending page frame number for a node
  */
 
+#ifdef CONFIG_DISCONTIGMEM
 /*
+ * 4) physnode_map     - the mapping between a pfn and owning node
+ *
  * physnode_map keeps track of the physical memory layout of a generic
  * numa node on a 256Mb break (each element of the array will
  * represent 256Mb of memory and will be marked by the node id.  so,
@@ -58,6 +57,10 @@ bootmem_data_t node0_bdata;
  *     physnode_map[8- ] = -1;
  */
 s8 physnode_map[MAX_ELEMENTS] = { [0 ... (MAX_ELEMENTS - 1)] = -1};
+#endif
+
+struct pglist_data *node_data[MAX_NUMNODES];
+bootmem_data_t node0_bdata;
 
 unsigned long node_start_pfn[MAX_NUMNODES];
 unsigned long node_end_pfn[MAX_NUMNODES];
@@ -186,9 +189,14 @@ static unsigned long calculate_numa_rema
 	unsigned long size, reserve_pages = 0;
 
 	for (nid = 0; nid < numnodes; nid++) {
+#ifdef CONFIG_DISCONTIGMEM
 		/* calculate the size of the mem_map needed in bytes */
 		size = (node_end_pfn[nid] - node_start_pfn[nid] + 1) 
 			* sizeof(struct page) + sizeof(pg_data_t);
+#endif
+#ifdef CONFIG_NONLINEAR
+		size = nonlinear_calculate(nid) + sizeof(pg_data_t);
+#endif
 
 		/* Allow for the bitmaps. */
 		size += zone_bitmap_calculate(node_end_pfn[nid] - node_start_pfn[nid] + 1);
@@ -217,7 +225,7 @@ unsigned long __init setup_memory(void)
 {
 	int nid;
 	unsigned long system_start_pfn, system_max_low_pfn;
-	unsigned long reserve_pages, pfn;
+	unsigned long reserve_pages;
 
 	/*
 	 * When mapping a NUMA machine we allocate the node_mem_map arrays
@@ -228,8 +236,11 @@ unsigned long __init setup_memory(void)
 	 */
 	get_memcfg_numa();
 
+#ifdef CONFIG_DISCONTIGMEM
 	/* Fill in the physnode_map */
 	for (nid = 0; nid < numnodes; nid++) {
+		unsigned long pfn;
+
 		printk("Node: %d, start_pfn: %ld, end_pfn: %ld\n",
 				nid, node_start_pfn[nid], node_end_pfn[nid]);
 		printk("  Setting physnode_map array to node %d for pfns:\n  ",
@@ -241,6 +252,7 @@ unsigned long __init setup_memory(void)
 		}
 		printk("\n");
 	}
+#endif
 
 	reserve_pages = calculate_numa_remap_pages();
 
@@ -340,13 +352,9 @@ void __init zone_sizes_init(void)
 			}
 		}
 		zholes_size = get_zholes_size(nid);
-		/*
-		 * We let the lmem_map for node 0 be allocated from the
-		 * normal bootmem allocator, but other nodes come from the
-		 * remapped KVA area - mbligh
-		 */
-			free_area_init_node(nid, NODE_DATA(nid),
-					zones_size, start, zholes_size);
+
+		free_area_init_node(nid, NODE_DATA(nid),
+				zones_size, start, zholes_size);
 
 #if 0
 		if (!nid)
@@ -369,39 +377,48 @@ void __init zone_sizes_init(void)
 void __init set_highmem_pages_init(int bad_ppro) 
 {
 #ifdef CONFIG_HIGHMEM
-	struct zone *zone;
-
-	for_each_zone(zone) {
-		unsigned long node_pfn, node_high_size, zone_start_pfn;
-		struct page * zone_mem_map;
-		
-		if (!is_highmem(zone))
-			continue;
-
-		printk("Initializing %s for node %d\n", zone->name,
-			zone->zone_pgdat->node_id);
-
-		node_high_size = zone->spanned_pages;
-		zone_mem_map = zone->zone_mem_map;
-		zone_start_pfn = zone->zone_start_pfn;
-
-		for (node_pfn = 0; node_pfn < node_high_size; node_pfn++) {
-			one_highpage_init((struct page *)(zone_mem_map + node_pfn),
-					  zone_start_pfn + node_pfn, bad_ppro);
-		}
-	}
-	totalram_pages += totalhigh_pages;
+  	struct zone *zone;
+	struct page *page;
+  
+  	for_each_zone(zone) {
+		unsigned long node_pfn, zone_start_pfn, zone_end_pfn;
+
+  		if (!is_highmem(zone))
+  			continue;
+  
+  		zone_start_pfn = zone->zone_start_pfn;
+		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
+
+		printk("Initializing %s for node %d (%08lx:%08lx)\n",
+				zone->name, zone->zone_pgdat->node_id,
+				zone_start_pfn, zone_end_pfn);
+  
+		/*
+		 * Makes use of the guarantee that *_mem_map will be
+		 * contiguous in sections aligned at MAX_ORDER.
+		 */
+		page = pfn_to_page(zone_start_pfn);
+		/* APW/XXX: pfn_valid!!!! */
+		for (node_pfn = zone_start_pfn; node_pfn < zone_end_pfn; node_pfn++, page++) {
+			if ((node_pfn & ((1 << MAX_ORDER) - 1)) == 0) {
+				if (!pfn_valid(node_pfn)) {
+					node_pfn += (1 << MAX_ORDER) - 1;
+					continue;
+ 				}
+				page = pfn_to_page(node_pfn);
+			}
+			one_highpage_init(page, node_pfn, bad_ppro);
+  		}
+  	}
+  	totalram_pages += totalhigh_pages;
 #endif
 }
 
 void __init set_max_mapnr_init(void)
 {
 #ifdef CONFIG_HIGHMEM
-	struct zone *high0 = &NODE_DATA(0)->node_zones[ZONE_HIGHMEM];
-	if (high0->spanned_pages > 0)
-	      	highmem_start_page = high0->zone_mem_map;
-	else
-		highmem_start_page = pfn_to_page(max_low_pfn+1); 
+	highmem_start_page = pfn_to_page(highstart_pfn);
+	/* highmem_start_page = pfn_to_page(max_low_pfn+1); XXX/APW */
 	num_physpages = highend_pfn;
 #else
 	num_physpages = max_low_pfn;
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/mm/init.c current/arch/i386/mm/init.c
--- reference/arch/i386/mm/init.c
+++ current/arch/i386/mm/init.c
@@ -274,7 +274,7 @@ void __init one_highpage_init(struct pag
 		SetPageReserved(page);
 }
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 void __init set_highmem_pages_init(int bad_ppro) 
 {
 	int pfn;
@@ -284,7 +284,7 @@ void __init set_highmem_pages_init(int b
 }
 #else
 extern void set_highmem_pages_init(int);
-#endif /* !CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_FLATMEM */
 
 #else
 #define kmap_init() do { } while (0)
@@ -295,7 +295,7 @@ extern void set_highmem_pages_init(int);
 unsigned long long __PAGE_KERNEL = _PAGE_KERNEL;
 unsigned long long __PAGE_KERNEL_EXEC = _PAGE_KERNEL_EXEC;
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 #define remap_numa_kva() do {} while (0)
 #else
 extern void __init remap_numa_kva(void);
@@ -388,7 +388,7 @@ void zap_low_mappings (void)
 	flush_tlb_all();
 }
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 void __init zone_sizes_init(void)
 {
 	unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
@@ -411,7 +411,7 @@ void __init zone_sizes_init(void)
 }
 #else
 extern void zone_sizes_init(void);
-#endif /* !CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_FLATMEM */
 
 static int disable_nx __initdata = 0;
 u64 __supported_pte_mask = ~_PAGE_NX;
@@ -516,6 +516,9 @@ void __init paging_init(void)
 	__flush_tlb_all();
 
 	kmap_init();
+#ifdef CONFIG_NONLINEAR
+	nonlinear_allocate();
+#endif
 	zone_sizes_init();
 }
 
@@ -545,7 +548,7 @@ void __init test_wp_bit(void)
 	}
 }
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 static void __init set_max_mapnr_init(void)
 {
 #ifdef CONFIG_HIGHMEM
@@ -559,7 +562,7 @@ static void __init set_max_mapnr_init(vo
 #else
 #define __free_all_bootmem() free_all_bootmem_node(NODE_DATA(0))
 extern void set_max_mapnr_init(void);
-#endif /* !CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_FLATMEM */
 
 static struct kcore_list kcore_mem, kcore_vmalloc; 
 
@@ -570,7 +573,7 @@ void __init mem_init(void)
 	int tmp;
 	int bad_ppro;
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 	if (!mem_map)
 		BUG();
 #endif
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/arch/i386/mm/Makefile current/arch/i386/mm/Makefile
--- reference/arch/i386/mm/Makefile
+++ current/arch/i386/mm/Makefile
@@ -4,7 +4,7 @@
 
 obj-y	:= init.o pgtable.o fault.o ioremap.o extable.o pageattr.o mmap.o
 
-obj-$(CONFIG_DISCONTIGMEM)	+= discontig.o
+obj-$(CONFIG_NUMA) += discontig.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_HIGHMEM) += highmem.o
 obj-$(CONFIG_BOOT_IOREMAP) += boot_ioremap.o
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/include/asm-i386/mmzone.h current/include/asm-i386/mmzone.h
--- reference/include/asm-i386/mmzone.h
+++ current/include/asm-i386/mmzone.h
@@ -8,6 +8,34 @@
 
 #include <asm/smp.h>
 
+#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_NONLINEAR)
+extern struct pglist_data *node_data[];
+#define NODE_DATA(nid)          (node_data[nid])
+
+/*
+ * Following are macros that are specific to this numa platform.
+ */
+#define reserve_bootmem(addr, size) \
+	reserve_bootmem_node(NODE_DATA(0), (addr), (size))
+#define alloc_bootmem(x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_low(x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, 0)
+#define alloc_bootmem_pages(x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_low_pages(x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
+#define alloc_bootmem_node(ignore, x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_pages_node(ignore, x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_low_pages_node(ignore, x) \
+	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
+
+#define node_localnr(pfn, nid)		((pfn) - node_data[nid]->node_start_pfn)
+
+#endif /* CONFIG_DISCONTIGMEM || CONFIG_NONLINEAR */
+
 #ifdef CONFIG_DISCONTIGMEM
 
 #ifdef CONFIG_NUMA
@@ -23,9 +51,6 @@
 	#define get_zholes_size(n) (0)
 #endif /* CONFIG_NUMA */
 
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid)		(node_data[nid])
-
 /*
  * generic node memory support, the following assumptions apply:
  *
@@ -57,28 +82,6 @@ static inline struct pglist_data *pfn_to
 
 
 /*
- * Following are macros that are specific to this numa platform.
- */
-#define reserve_bootmem(addr, size) \
-	reserve_bootmem_node(NODE_DATA(0), (addr), (size))
-#define alloc_bootmem(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, 0)
-#define alloc_bootmem_pages(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low_pages(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
-#define alloc_bootmem_node(ignore, x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_pages_node(ignore, x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low_pages_node(ignore, x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
-
-#define node_localnr(pfn, nid)		((pfn) - node_data[nid]->node_start_pfn)
-
-/*
  * Following are macros that each numa implmentation must define.
  */
 
@@ -91,7 +94,7 @@ static inline struct pglist_data *pfn_to
 #define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
 #define node_end_pfn(nid)						\
 ({									\
-	pg_data_t *__pgdat = NODE_DATA(nid);				\
+	struct pglist_data *__pgdat = NODE_DATA(nid);			\
 	__pgdat->node_start_pfn + __pgdat->node_spanned_pages;		\
 })
 
@@ -153,4 +156,59 @@ static inline void get_memcfg_numa(void)
 }
 
 #endif /* CONFIG_DISCONTIGMEM */
+
+
+#ifdef CONFIG_NONLINEAR
+
+#ifdef CONFIG_NUMA
+	#ifdef CONFIG_X86_NUMAQ
+		#include <asm/numaq.h>
+	#else	/* summit or generic arch */
+		#include <asm/srat.h>
+	#endif
+#else /* !CONFIG_NUMA */
+	#define get_memcfg_numa get_memcfg_numa_flat
+	#define get_zholes_size(n) (0)
+#endif /* CONFIG_NUMA */
+
+
+/* generic non-linear memory support:
+ *
+ * 1) we will not split memory into more chunks than will fit into the
+ *    flags field of the struct page
+ */
+
+/*
+ * SECTION_SIZE_BITS            2^N: how big each section will be
+ * MAX_PHYSADDR_BITS            2^N: how much physical address space we have
+ * MAX_PHYSMEM_BITS             2^N: how much memory we can have in that space
+ */
+#define SECTION_SIZE_BITS       30
+#define MAX_PHYSADDR_BITS       36
+#define MAX_PHYSMEM_BITS        36
+
+extern int get_memcfg_numa_flat(void);
+/*
+ * This allows any one NUMA architecture to be compiled
+ * for, and still fall back to the flat function if it
+ * fails.
+ */
+static inline void get_memcfg_numa(void)
+{
+#ifdef CONFIG_X86_NUMAQ
+	if (get_memcfg_numaq())
+		return;
+#elif defined(CONFIG_ACPI_SRAT)
+	if (get_memcfg_from_srat())
+		return;
+#endif
+
+	get_memcfg_numa_flat();
+}
+
+/* XXX: FIXME -- wli */
+#define kern_addr_valid(kaddr)  (0)
+
+#endif /* CONFIG_NONLINEAR */
+
 #endif /* _ASM_MMZONE_H_ */
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/include/asm-i386/page.h current/include/asm-i386/page.h
--- reference/include/asm-i386/page.h
+++ current/include/asm-i386/page.h
@@ -133,11 +133,11 @@ extern int sysctl_legacy_va_layout;
 #define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
 #define pfn_to_kaddr(pfn)      __va((pfn) << PAGE_SHIFT)
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 #define pfn_to_page(pfn)	(mem_map + (pfn))
 #define page_to_pfn(page)	((unsigned long)((page) - mem_map))
 #define pfn_valid(pfn)		((pfn) < max_mapnr)
-#endif /* !CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_FLATMEM */
 #define virt_to_page(kaddr)	pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
 
 #define virt_addr_valid(kaddr)	pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/include/asm-i386/pgtable.h current/include/asm-i386/pgtable.h
--- reference/include/asm-i386/pgtable.h
+++ current/include/asm-i386/pgtable.h
@@ -400,9 +400,9 @@ extern pte_t *lookup_address(unsigned lo
 
 #endif /* !__ASSEMBLY__ */
 
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 #define kern_addr_valid(addr)	(1)
-#endif /* !CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_FLATMEM */
 
 #define io_remap_page_range remap_page_range
 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* 170 nonlinear ppc64
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
                   ` (5 preceding siblings ...)
  2004-10-18 14:36 ` 160 nonlinear i386 Andy Whitcroft
@ 2004-10-18 14:36 ` Andy Whitcroft
  2004-10-18 15:17 ` [Lhms-devel] CONFIG_NONLINEAR for small systems Hirokazu Takahashi
  2004-10-19  4:30 ` Hiroyuki KAMEZAWA
  8 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 14:36 UTC (permalink / raw)
  To: apw, lhms-devel, linux-mm

CONFIG_NONLINEAR for ppc64.

Revision: $Rev$

Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diffstat 170-nonlinear-ppc64
---
 arch/ppc64/Kconfig         |   19 +++++++++++++++++--
 arch/ppc64/mm/Makefile     |    2 +-
 arch/ppc64/mm/init.c       |    8 ++++----
 arch/ppc64/mm/numa.c       |   13 +++++++++++++
 include/asm-ppc64/mmzone.h |   40 ++++++++++++++++++++++++++++++++++------
 include/asm-ppc64/page.h   |    4 +++-
 6 files changed, 72 insertions(+), 14 deletions(-)

diff -upN reference/arch/ppc64/Kconfig current/arch/ppc64/Kconfig
--- reference/arch/ppc64/Kconfig
+++ current/arch/ppc64/Kconfig
@@ -180,13 +180,28 @@ config HMT
 	bool "Hardware multithreading"
 	depends on SMP && PPC_PSERIES
 
+
+choice
+	prompt "Memory model"
+	default NONLINEAR if (PPC_PSERIES)
+	default FLATMEM
+
 config DISCONTIGMEM
-	bool "Discontiguous Memory Support"
+	bool "Discontiguous Memory"
+	depends on SMP && PPC_PSERIES
+
+config NONLINEAR
+	bool "Nonlinear Memory"
 	depends on SMP && PPC_PSERIES
 
+config FLATMEM
+	bool "Flat Memory"
+
+endchoice
+
 config NUMA
 	bool "NUMA support"
-	depends on DISCONTIGMEM
+	#depends on DISCONTIGMEM
 
 config SCHED_SMT
 	bool "SMT (Hyperthreading) scheduler support"
diff -upN reference/arch/ppc64/mm/init.c current/arch/ppc64/mm/init.c
--- reference/arch/ppc64/mm/init.c
+++ current/arch/ppc64/mm/init.c
@@ -597,7 +597,7 @@ EXPORT_SYMBOL(page_is_ram);
  * Initialize the bootmem system and give it all the memory we
  * have available.
  */
-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 void __init do_init_bootmem(void)
 {
 	unsigned long i;
@@ -695,7 +695,7 @@ module_init(setup_kcore);
 
 void __init mem_init(void)
 {
-#ifdef CONFIG_DISCONTIGMEM
+#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_NONLINEAR)
 	int nid;
 #endif
 	pg_data_t *pgdat;
@@ -706,7 +706,7 @@ void __init mem_init(void)
 	num_physpages = max_low_pfn;	/* RAM is assumed contiguous */
 	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
 
-#ifdef CONFIG_DISCONTIGMEM
+#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_NONLINEAR)
         for (nid = 0; nid < numnodes; nid++) {
 		if (NODE_DATA(nid)->node_spanned_pages != 0) {
 			printk("freeing bootmem node %x\n", nid);
@@ -721,7 +721,7 @@ void __init mem_init(void)
 
 	for_each_pgdat(pgdat) {
 		for (i = 0; i < pgdat->node_spanned_pages; i++) {
-			page = pgdat->node_mem_map + i;
+			page = pfn_to_page(pgdat->node_start_pfn + i);
 			if (PageReserved(page))
 				reservedpages++;
 		}
diff -upN reference/arch/ppc64/mm/Makefile current/arch/ppc64/mm/Makefile
--- reference/arch/ppc64/mm/Makefile
+++ current/arch/ppc64/mm/Makefile
@@ -6,6 +6,6 @@ EXTRA_CFLAGS += -mno-minimal-toc
 
 obj-y := fault.o init.o imalloc.o hash_utils.o hash_low.o tlb.o \
 	slb_low.o slb.o stab.o mmap.o
-obj-$(CONFIG_DISCONTIGMEM) += numa.o
+obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_PPC_MULTIPLATFORM) += hash_native.o
diff -upN reference/arch/ppc64/mm/numa.c current/arch/ppc64/mm/numa.c
--- reference/arch/ppc64/mm/numa.c
+++ current/arch/ppc64/mm/numa.c
@@ -304,9 +304,13 @@ new_range:
 				size / PAGE_SIZE;
 		}
 
+		/* XXX: think this is discontig ... */
 		for (i = start ; i < (start+size); i += MEMORY_INCREMENT)
 			numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] =
 				numa_domain;
+#ifdef CONFIG_NONLINEAR
+		nonlinear_add(numa_domain, start, start + size);
+#endif
 
 		ranges--;
 		if (ranges)
@@ -346,10 +350,15 @@ static void __init setup_nonnuma(void)
 	init_node_data[0].node_start_pfn = 0;
 	init_node_data[0].node_spanned_pages = lmb_end_of_DRAM() / PAGE_SIZE;
 
+	/* APW: this is discontig? */
 	for (i = 0 ; i < top_of_ram; i += MEMORY_INCREMENT)
 		numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = 0;
 
 	node0_io_hole_size = top_of_ram - total_ram;
+
+#ifdef CONFIG_NONLINEAR
+	nonlinear_add(0, 0, init_node_data[0].node_spanned_pages);
+#endif
 }
 
 static void __init dump_numa_topology(void)
@@ -567,6 +576,10 @@ void __init paging_init(void)
 	memset(zones_size, 0, sizeof(zones_size));
 	memset(zholes_size, 0, sizeof(zholes_size));
 
+#ifdef CONFIG_NONLINEAR
+	nonlinear_allocate();
+#endif
+
 	for (nid = 0; nid < numnodes; nid++) {
 		unsigned long start_pfn;
 		unsigned long end_pfn;
diff -upN reference/include/asm-ppc64/mmzone.h current/include/asm-ppc64/mmzone.h
--- reference/include/asm-ppc64/mmzone.h
+++ current/include/asm-ppc64/mmzone.h
@@ -10,9 +10,13 @@
 #include <linux/config.h>
 #include <asm/smp.h>
 
-#ifdef CONFIG_DISCONTIGMEM
+#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_NONLINEAR)
 
 extern struct pglist_data *node_data[];
+/*
+ * Return a pointer to the node data for node n.
+ */
+#define NODE_DATA(nid)		(node_data[nid])
 
 /*
  * Following are specific to this numa platform.
@@ -27,6 +31,10 @@ extern int nr_cpus_in_node[];
 #define MEMORY_INCREMENT_SHIFT 24
 #define MEMORY_INCREMENT (1UL << MEMORY_INCREMENT_SHIFT)
 
+#endif /* CONFIG_DISCONTIGMEM || CONFIG_NONLINEAR */
+
+#ifdef CONFIG_DISCONTIGMEM
+
 /* NUMA debugging, will not work on a DLPAR machine */
 #undef DEBUG_NUMA
 
@@ -49,11 +57,6 @@ static inline int pa_to_nid(unsigned lon
 
 #define pfn_to_nid(pfn)		pa_to_nid((pfn) << PAGE_SHIFT)
 
-/*
- * Return a pointer to the node data for node n.
- */
-#define NODE_DATA(nid)		(node_data[nid])
-
 #define node_localnr(pfn, nid)	((pfn) - NODE_DATA(nid)->node_start_pfn)
 
 /*
@@ -91,4 +94,29 @@ static inline int pa_to_nid(unsigned lon
 #define discontigmem_pfn_valid(pfn)		((pfn) < num_physpages)
 
 #endif /* CONFIG_DISCONTIGMEM */
+
+#ifdef CONFIG_NONLINEAR
+
+/* generic non-linear memory support:
+ *
+ * 1) we will not split memory into more chunks than will fit into the
+ *    flags field of the struct page
+ */
+
+/*
+ * SECTION_SIZE_BITS            2^N: how big each section will be
+ * MAX_PHYSADDR_BITS            2^N: how much physical address space we have
+ * MAX_PHYSMEM_BITS             2^N: how much memory we can have in that space
+ */
+#define SECTION_SIZE_BITS       24
+#define MAX_PHYSADDR_BITS       38
+#define MAX_PHYSMEM_BITS        36
+
+#define pa_to_nid(pa)							\
+({									\
+	pfn_to_nid((pa) >> PAGE_SHIFT);					\
+})
+
+#endif /* CONFIG_NONLINEAR */
+
 #endif /* _ASM_MMZONE_H_ */
diff -upN reference/include/asm-ppc64/page.h current/include/asm-ppc64/page.h
--- reference/include/asm-ppc64/page.h
+++ current/include/asm-ppc64/page.h
@@ -222,7 +222,9 @@ extern int page_is_ram(unsigned long pfn
 #define page_to_pfn(page)	discontigmem_page_to_pfn(page)
 #define pfn_to_page(pfn)	discontigmem_pfn_to_page(pfn)
 #define pfn_valid(pfn)		discontigmem_pfn_valid(pfn)
-#else
+#endif
+/* XXX/APW: why is NONLINEAR not here */
+#ifdef CONFIG_FLATMEM
 #define pfn_to_page(pfn)	(mem_map + (pfn))
 #define page_to_pfn(page)	((unsigned long)((page) - mem_map))
 #define pfn_valid(pfn)		((pfn) < max_mapnr)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] CONFIG_NONLINEAR for small systems
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
                   ` (6 preceding siblings ...)
  2004-10-18 14:36 ` 170 nonlinear ppc64 Andy Whitcroft
@ 2004-10-18 15:17 ` Hirokazu Takahashi
  2004-10-18 15:29   ` Andy Whitcroft
  2004-10-19  4:30 ` Hiroyuki KAMEZAWA
  8 siblings, 1 reply; 26+ messages in thread
From: Hirokazu Takahashi @ 2004-10-18 15:17 UTC (permalink / raw)
  To: apw; +Cc: lhms-devel, linux-mm

Hello, Andy,

What version of the kernel are you using?
I recommend linux-2.6.9-rc4-mm1 for your purpose, as it has eliminated
the bitmaps for free pages to simplify management of the buddy allocator.
This may help you.

> Following this email will be a series of patches which provide a
> sample implementation of a simplified CONFIG_NONLINEAR memory model. 
> The first two cleanup general infrastructure to minimise code 
> duplication.  The third introduces an allocator for the numa remap space 
> on i386.  The fourth generalises the page flags code to allow the reuse 
> of the NODEZONE bits.  The final three are the actual meat of the 
> implementation for both i386 and ppc64.
> 
> 050-bootmem-use-NODE_DATA
> 060-refactor-setup_memory-i386
> 080-alloc_remap-i386
> 100-cleanup-node-zone
> 150-nonlinear
> 160-nonlinear-i386
> 170-nonlinear-ppc64
> 
> As has been observed the CONFIG_DISCONTIGMEM implementation
> is inefficient space-wise where a system has a sparse intra-node memory
> configuration. For example we have systems where node 0 has a
> 1GB hole within it. Under CONFIG_DISCONTIGMEM this results in the
> struct page's for this area being allocated from ZONE_NORMAL and
> never used; this is particularly problematic on these 32bit systems
> as we are already under severe pressure in this zone.
> 
> The generalised CONFIG_NONLINEAR memory model described at OLS
> seemed provide more than enough decriptive power to address this
> issue but provided far more functionality that was required.
> Particularly it breaks the identity V=P+c to allow compression of
> the kernel address space, which is not required on these smaller systems.
> 
> This patch set is implemented as a proof-of-concept to show
> that a simplified CONFIG_NONLINEAR based implementation could provide
> sufficient flexibility to solve the problems for these systems.
> 
> In the longer term I'd like to see a single CONFIG_NONLINEAR
> implementation which allowed these various features to be stacked in
> combination as required.
> 
> Thoughts?
> 
> -apw
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
> Use IT products in your business? Tell us what you think of them. Give us
> Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
> http://productguide.itmanagersjournal.com/guidepromo.tmpl
> _______________________________________________
> Lhms-devel mailing list
> Lhms-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/lhms-devel
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] CONFIG_NONLINEAR for small systems
  2004-10-18 15:17 ` [Lhms-devel] CONFIG_NONLINEAR for small systems Hirokazu Takahashi
@ 2004-10-18 15:29   ` Andy Whitcroft
  0 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-18 15:29 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: lhms-devel, linux-mm

Hirokazu Takahashi wrote:

> What version of kernel are you using?
> I recommend linux-2.6.9-rc4-mm1 for your purpose, as it has eliminated
> bitmaps for free pages to simplify managing buddy allocator.
> This may help you.

Doh, 2.6.9-rc4.  It was the removal of the bitmaps which stopped me 
porting to it.  I didn't want to do the extra work until it was 
decided whether they are gone for good :).

-apw
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] CONFIG_NONLINEAR for small systems
  2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
                   ` (7 preceding siblings ...)
  2004-10-18 15:17 ` [Lhms-devel] CONFIG_NONLINEAR for small systems Hirokazu Takahashi
@ 2004-10-19  4:30 ` Hiroyuki KAMEZAWA
  2004-10-19  8:16   ` Andy Whitcroft
  8 siblings, 1 reply; 26+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-19  4:30 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: lhms-devel, linux-mm

Hi,
Andy Whitcroft wrote:

> The generalised CONFIG_NONLINEAR memory model described at OLS
> seemed provide more than enough decriptive power to address this
> issue but provided far more functionality that was required.
> Particularly it breaks the identity V=P+c to allow compression of
> the kernel address space, which is not required on these smaller systems.
> 
We have a *future* issue with hotplugging kernel memory, and renaming of
the kernel's virtual addresses will be used for it.
As you say, if kernel memory is not remapped, keeping V=P+c looks good.
But our current direction is to enable kernel-memory hotplug, which
needs renaming of the kernel's virtual memory, I think.

NONLINEAR_OPTIMISED looks a bit complicated.
Can you replace it with some other name?  Hmm... NONLINEAR_NOREMAP?


> This patch set is implemented as a proof-of-concept to show
> that a simplified CONFIG_NONLINEAR based implementation could provide
> sufficient flexibility to solve the problems for these systems.
> 
Very interesting.  But I'm not sure whether we can use more page->flags bits :[.
I recommend you not to use more page->flags bits.


Kame <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] CONFIG_NONLINEAR for small systems
  2004-10-19  4:30 ` Hiroyuki KAMEZAWA
@ 2004-10-19  8:16   ` Andy Whitcroft
  0 siblings, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-19  8:16 UTC (permalink / raw)
  To: Hiroyuki KAMEZAWA; +Cc: lhms-devel, linux-mm

Hiroyuki KAMEZAWA wrote:

> We have a *future* issue with hotplugging kernel memory, and renaming of
> the kernel's virtual addresses will be used for it.
> As you say, if kernel memory is not remapped, keeping V=P+c looks good.
> But our current direction is to enable kernel-memory hotplug, which
> needs renaming of the kernel's virtual memory, I think.

Yes, I think it's very likely that memory hot-plug requires us to break 
V=P+c in a lot of cases - though perhaps not all.  Indeed it was that 
work that started me thinking about using a simplified form to solve 
other problems for my 'crippled' 32bit systems.

What I am trying to say in my comments to this patch is that although 
generalised NONLINEAR will need and should provide this remap, there 
is a class of systems and problems which doesn't need it (and the 
costs associated with it).  I'd like to see them supported as a 
sub-option to NONLINEAR... i.e. as a nonlinear option to maintain 
V=P+c.  That is, this style of layout would be one of those that 
nonlinear offers.

> NONLINEAR_OPTIMISED looks a bit complicated.
> Can you replace it with some other name?  Hmm... NONLINEAR_NOREMAP?

Yes, that is a dumb name, as I would later also see the option to keep 
V=P+c as an optimisation too.  I'll rename it.

>> This patch set is implemented as a proof-of-concept to show
>> that a simplified CONFIG_NONLINEAR based implementation could provide
>> sufficient flexibility to solve the problems for these systems.
>>
>> Very interesting.  But I'm not sure whether we can use more page->flags 
>> bits :[.
>> I recommend you not to use more page->flags bits.

It should not use any more flags bits.  You probably got that 
impression because I replace the MAX_NODES_SHIFT (at 6) with a 
FLAGS_TOTAL_SHIFT (at 8) in 100-cleanup-node-zone.  What this is doing 
is replacing the MAX_NODES_SHIFT and MAX_ZONE_SHIFT (at 2) as an upper 
bound on the number of bits available, 8 in total.  When the nonlinear 
patch is layered on top we then have NODES, ZONES and SECTIONS competing 
for space in flags, but they cannot consume more than these 8 bits.  I 
then choose to drop the NODE and replace it with SECTION to maintain the 
size constraint.  Obviously on the 64bit systems there is almost no 
limit and all three are stored.
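
To make that concrete, below is a minimal userspace sketch of the
packing.  The shift values and helper names are illustrative
assumptions, not the actual macros from 100-cleanup-node-zone:

#include <assert.h>
#include <stdio.h>

/*
 * Assumed, illustrative split of the 8 bits reserved at the top of
 * page->flags on 32bit: 6 bits of SECTION plus 2 bits of ZONE replace
 * the old 6 bits of NODE plus 2 bits of ZONE.
 */
#define FLAGS_TOTAL_SHIFT	8
#define SECTIONS_SHIFT		6
#define ZONES_SHIFT		2
#define FLAGS_BITS		32

#define SECTIONS_PGOFF		(FLAGS_BITS - SECTIONS_SHIFT)
#define ZONES_PGOFF		(SECTIONS_PGOFF - ZONES_SHIFT)

/* pack section and zone above the ordinary flag bits */
static unsigned long pack_flags(unsigned long flags,
				unsigned long section, unsigned long zone)
{
	return flags | (section << SECTIONS_PGOFF) | (zone << ZONES_PGOFF);
}

static unsigned long flags_section(unsigned long flags)
{
	return (flags >> SECTIONS_PGOFF) & ((1UL << SECTIONS_SHIFT) - 1);
}

static unsigned long flags_zone(unsigned long flags)
{
	return (flags >> ZONES_PGOFF) & ((1UL << ZONES_SHIFT) - 1);
}

int main(void)
{
	unsigned long f = pack_flags(0x1UL, 37, 2);

	assert(SECTIONS_SHIFT + ZONES_SHIFT <= FLAGS_TOTAL_SHIFT);
	assert(flags_section(f) == 37 && flags_zone(f) == 2);
	printf("section=%lu zone=%lu low flags=%#lx\n",
	       flags_section(f), flags_zone(f),
	       f & ((1UL << ZONES_PGOFF) - 1));
	return 0;
}

The point being that SECTION simply takes over the budget NODE used to
have; nothing grows past the FLAGS_TOTAL_SHIFT bound.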

Thanks for looking.

-apw
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: 050 bootmem use NODE_DATA
  2004-10-18 14:32 ` 050 bootmem use NODE_DATA Andy Whitcroft
@ 2004-10-26 18:16   ` Dave Hansen
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 18:16 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: lhms, linux-mm

On Mon, 2004-10-18 at 07:32, Andy Whitcroft wrote:
> Convert the default non-node based bootmem routines to use
> NODE_DATA(0).  This is semantically and functionally identical in
> any non-node configuration as NODE_DATA(x) is defined as below.
> 
> #define NODE_DATA(nid)          (&contig_page_data)
> 
> For the node cases (CONFIG_NUMA and CONFIG_DISCONTIG_MEM) we can
> use these non-node forms where all boot memory is defined on node 0.

Andy, this patch looks like good stuff, even outside of the context of
nonlinear.  Care to forward it on?

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: 150 nonlinear
  2004-10-18 14:35 ` 150 nonlinear Andy Whitcroft
@ 2004-10-26 18:36   ` Dave Hansen
  2004-10-26 19:07     ` [Lhms-devel] " Mika Penttilä
  2004-10-28 11:07     ` Andy Whitcroft
  0 siblings, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 18:36 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: lhms, linux-mm

Hi Andy,

I've been thinking about how we're going to merge up the code that uses
Dave M's nonlinear with your new implementation.

There are two problems that are being solved: having a sparse layout
requiring splitting up mem_map (solved by discontigmem and your
nonlinear), and supporting non-linear phys to virt relationships (Dave
M's implementation which does the mem_map split as well).

I think both Dave M. and I agree that your implementation is the way to
go, mostly because it properly starts the separation of these two
distinct problems.

So, I propose the following: your code should be referred to as
something like CONFIG_SPARSEMEM.  The code supporting non-linear p::v
retains the CONFIG_NONLINEAR name.

Do you think your code is in a place where it's ready for wider testing
on a few more architectures?  In which case, would you like it held in
the -mhp tree while it's waiting to get merged?  

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 18:36   ` Dave Hansen
@ 2004-10-26 19:07     ` Mika Penttilä
  2004-10-26 19:42       ` Dave Hansen
  2004-10-28 11:07     ` Andy Whitcroft
  1 sibling, 1 reply; 26+ messages in thread
From: Mika Penttilä @ 2004-10-26 19:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andy Whitcroft, lhms, linux-mm

Dave Hansen wrote:

>Hi Andy,
>
>I've been thinking about how we're going to merge up the code that uses
>Dave M's nonlinear with your new implementation.
>
>There are two problems that are being solved: having a sparse layout
>requiring splitting up mem_map (solved by discontigmem and your
>nonlinear), and supporting non-linear phys to virt relationships (Dave
>M's implementation which does the mem_map split as well).
>
>I think both Dave M. and I agree that your implementation is the way to
>go, mostly because it properly starts the separation of these two
>distinct problems.
>
>So, I propose the following: your code should be referred to as
>something like CONFIG_SPARSEMEM.  The code supporting non-linear p::v
>retains the CONFIG_NONLINEAR name.
>
>Do you think your code is in a place where it's ready for wider testing
>on a few more architectures?  In which case, would you like it held in
>the -mhp tree while it's waiting to get merged?  
>
>-- Dave
>
>  
>
What do you consider to be Dave M's nonlinear?

--Mika

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 19:07     ` [Lhms-devel] " Mika Penttilä
@ 2004-10-26 19:42       ` Dave Hansen
  2004-10-26 20:41         ` Mika Penttilä
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 19:42 UTC (permalink / raw)
  To: Mika Penttilä; +Cc: Andy Whitcroft, lhms, linux-mm

On Tue, 2004-10-26 at 12:07, Mika Penttila wrote:
> What do you consider to be Dave M's nonlinear?

This, basically:

http://sprucegoose.sr71.net/patches/2.6.9-rc3-mm3-mhp1/C-nonlinear-base.patch

There's a little there that isn't Dave M's direct work, but it's all in
the spirit of his implementation.

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 19:42       ` Dave Hansen
@ 2004-10-26 20:41         ` Mika Penttilä
  2004-10-26 20:55           ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Mika Penttilä @ 2004-10-26 20:41 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andy Whitcroft, lhms, linux-mm

Dave Hansen wrote:

>On Tue, 2004-10-26 at 12:07, Mika Penttila wrote:
>  
>
>>What do you consider as Dave M's nonlinear?
>>    
>>
>
>This, basically:
>
>http://sprucegoose.sr71.net/patches/2.6.9-rc3-mm3-mhp1/C-nonlinear-base.patch
>
>There's a little there that isn't Dave M's direct work, but it's all in
>the spirit of his implementation.
>
>-- Dave
>
>
>  
>
Ah, you mean Daniel Phillips's initial patch for nonlinear...

Ok, so what's the mem_map split?  I see Andy renamed it section_mem_map 
and added NONLINEAR_OPTIMISE; how is that making a difference?

--Mika


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 20:41         ` Mika Penttilä
@ 2004-10-26 20:55           ` Dave Hansen
  2004-10-26 21:20             ` Mika Penttilä
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 20:55 UTC (permalink / raw)
  To: Mika Penttilä; +Cc: Andy Whitcroft, lhms, linux-mm

On Tue, 2004-10-26 at 13:41, Mika Penttila wrote:
> Ah, you mean Daniel Phillips's initial patch for nonlinear...

No.  Dan had a lovely idea, and a decent implementation, but Dave M
completely reimplemented it as far as I know.  That's why I've been
referring to them as "implementations".

> Ok, so what's the mem_map split? I see Andy renamed it section_mem_map 
> and added NONLINEAR_OPTIMISE, how's that making a difference?

I don't understand the question.  Why do we need to split up mem_map?

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 20:55           ` Dave Hansen
@ 2004-10-26 21:20             ` Mika Penttilä
  2004-10-26 21:27               ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Mika Penttilä @ 2004-10-26 21:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andy Whitcroft, lhms, linux-mm

Dave Hansen wrote:

>On Tue, 2004-10-26 at 13:41, Mika Penttila wrote:
>  
>
>>Ah, you mean Daniel Phillips's initial patch for nonlinear...
>>    
>>
>
>No.  Dan had a lovely idea, and a decent implementation, but Dave M
>completely reimplemented it as far as I know.  That's why I've been
>referring to them as "implementations".
>
>  
>
I see... ok.

>>Ok, so what's the mem_map split?  I see Andy renamed it section_mem_map 
>>and added NONLINEAR_OPTIMISE; how is that making a difference?
>>    
>>
>
>I don't understand the question.  Why do we need to split up mem_map?
>
>  
>
I do not understand the split either... but you said:

"There are two problems that are being solved: having a sparse layout
requiring splitting up mem_map (solved by discontigmem and your
nonlinear), and supporting non-linear phys to virt relationships (Dave
M's implementation which does the mem_map split as well)."


so what's the split?

--Mika






--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 21:20             ` Mika Penttilä
@ 2004-10-26 21:27               ` Dave Hansen
  2004-10-26 21:38                 ` Mika Penttilä
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 21:27 UTC (permalink / raw)
  To: Mika Penttilä; +Cc: Andy Whitcroft, lhms, linux-mm

On Tue, 2004-10-26 at 14:20, Mika Penttila wrote:
> "There are two problems that are being solved: having a sparse layout
> requiring splitting up mem_map (solved by discontigmem and your
> nonlinear), and supporting non-linear phys to virt relationships (Dave
> M's implentation which does the mem_map split as well)."
> 
> 
> so what's the split?

So, mem_map is normally laid out so that, if you have 1GB of memory, the
struct page for physical address 0x00000000 is at mem_map[0], and the one
for the last page (at 1GB - 1 page) is at mem_map[1<<30 / PAGE_SIZE - 1].

That's fine and dandy for most systems.  But, imagine that you have a
funky machine with 2GB of memory, laid out like this:

    0-1 GB - first 1 GB
  1-100 GB - empty
100-101 GB - second 1 GB

Then, you'd need to have mem_map sized the same as a 101GB system on
your dinky 2GB system (disregard the ia64 implementation).

The split I'm referring to is cutting mem_map[] up into pieces for each
contiguous section of memory.  
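
As a rough userspace sketch of that split (the names and sizes here are
made-up assumptions for illustration on a 64bit host, not the actual
patch code), each populated section gets its own little mem_map, and
pfn_to_page() indexes through a table:

#include <stdio.h>
#include <stdlib.h>

struct page { unsigned long flags; };

/* assumed: 4K pages, 2^14 pages (64MB) per section, table sized for
 * 101GB of physical address space */
#define PFN_SECTION_SHIFT	14
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)
#define NR_MEM_SECTIONS		(((101UL << 30) >> 12) >> PFN_SECTION_SHIFT)

/* one small mem_map per populated section, NULL over the holes */
static struct page *section_mem_map[NR_MEM_SECTIONS];

static void sketch_add_section(unsigned long start_pfn)
{
	section_mem_map[start_pfn >> PFN_SECTION_SHIFT] =
		calloc(PAGES_PER_SECTION, sizeof(struct page));
}

static struct page *sketch_pfn_to_page(unsigned long pfn)
{
	struct page *map = section_mem_map[pfn >> PFN_SECTION_SHIFT];

	return map ? map + (pfn & (PAGES_PER_SECTION - 1)) : NULL;
}

int main(void)
{
	unsigned long pfn;

	/* populate 0-1GB and 100-101GB, leaving the 99GB hole empty */
	for (pfn = 0; pfn < (1UL << 30) >> 12; pfn += PAGES_PER_SECTION)
		sketch_add_section(pfn);
	for (pfn = (100UL << 30) >> 12; pfn < (101UL << 30) >> 12;
	     pfn += PAGES_PER_SECTION)
		sketch_add_section(pfn);

	printf("pfn 0x100       -> %p\n", (void *)sketch_pfn_to_page(0x100));
	printf("pfn in the hole -> %p\n",
	       (void *)sketch_pfn_to_page((50UL << 30) >> 12));
	return 0;
}

Only the 32 populated sections cost us struct page storage; the 99GB
hole costs one NULL pointer per section.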

Make sense?

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 21:27               ` Dave Hansen
@ 2004-10-26 21:38                 ` Mika Penttilä
  2004-10-26 21:41                   ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Mika Penttilä @ 2004-10-26 21:38 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andy Whitcroft, lhms, linux-mm

Dave Hansen wrote:

>On Tue, 2004-10-26 at 14:20, Mika Penttila wrote:
>  
>
>>"There are two problems that are being solved: having a sparse layout
>>requiring splitting up mem_map (solved by discontigmem and your
>>nonlinear), and supporting non-linear phys to virt relationships (Dave
>>M's implentation which does the mem_map split as well)."
>>
>>
>>so what's the split?
>>    
>>
>
>So, mem_map is normally laid out so that, if you have 1GB of memory, the
>struct page for physical address 0x00000000 is at mem_map[0], and the one
>for the last page (at 1GB - 1 page) is at mem_map[1<<30 / PAGE_SIZE - 1].
>
>That's fine and dandy for most systems.  But, imagine that you have a
>funky machine with 2GB of memory, laid out like this:
>
>    0-1 GB - first 1 GB
>  1-100 GB - empty
>100-101 GB - second 1 GB
>
>Then, you'd need to have mem_map sized the same as a 101GB system on
>your dinky 2GB system (disregard the ia64 implementation).
>
>The split I'm referring to is cutting mem_map[] up into pieces for each
>contiguous section of memory.  
>
>Make sense?
>
>-- Dave
>
>
>  
>
Yes, I see Dave M's approach is doing this, but isn't Andy's as well? 
What are the key differences between the two?

--Mika


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 21:38                 ` Mika Penttilä
@ 2004-10-26 21:41                   ` Dave Hansen
  2004-10-26 21:55                     ` Mika Penttilä
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 21:41 UTC (permalink / raw)
  To: Mika Penttilä; +Cc: lhms, linux-mm

Taking poor Andy off the cc...

On Tue, 2004-10-26 at 14:38, Mika Penttila wrote:
> Yes, I see Dave M's approach is doing this, but isn't Andy's as well? 
> What are the key differences between the two?

Back to my first message:
>There are two problems that are being solved: having a sparse layout
>requiring splitting up mem_map (solved by discontigmem and your
>nonlinear), and supporting non-linear phys to virt relationships (Dave
>M's implementation which does the mem_map split as well).

Andy: split
Dave M: split + non-linear phys to virt

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 21:55                     ` Mika Penttilä
@ 2004-10-26 21:53                       ` Dave Hansen
  2004-10-26 22:01                         ` Mika Penttilä
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2004-10-26 21:53 UTC (permalink / raw)
  To: Mika Penttilä; +Cc: lhms, linux-mm

On Tue, 2004-10-26 at 14:55, Mika Penttila wrote:
> Andy:         __pa and __va as before, nonlinear page_to_pfn and pfn_to_page
> Dave M :     new nonlinear __pa and __va implementations and nonlinear 
> page_to_pfn and pfn_to_page

Yes, basically.  Those are the most visible high-level-API functions
that get changed.
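
As a rough illustration of that difference (the constants and table
shape below are assumptions for the sketch, not code from either
patch):

#include <stdio.h>

#define PAGE_OFFSET	0xc0000000UL

/* Andy's scheme keeps the identity V = P + c... */
static void *va_linear(unsigned long pa)
{
	return (void *)(pa + PAGE_OFFSET);
}

/*
 * ...while Dave M's nonlinear looks the virtual start up per section,
 * so a section of physical memory can land anywhere virtually.
 */
#define SECTION_SHIFT	26	/* assumed: 64MB sections */
static unsigned long section_vstart[] = {
	0xc0000000UL, 0xc4000000UL, 0xd0000000UL, 0xc8000000UL,
};

static void *va_nonlinear(unsigned long pa)
{
	unsigned long off = pa & ((1UL << SECTION_SHIFT) - 1);

	return (void *)(section_vstart[pa >> SECTION_SHIFT] + off);
}

int main(void)
{
	unsigned long pa = (2UL << SECTION_SHIFT) + 0x1234;

	printf("linear    __va -> %p\n", va_linear(pa));
	printf("nonlinear __va -> %p\n", va_nonlinear(pa));
	return 0;
}

page_to_pfn()/pfn_to_page() go through a section table in both cases;
the extra virtual-start table above is the part only Dave M's
implementation needs.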

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 21:41                   ` Dave Hansen
@ 2004-10-26 21:55                     ` Mika Penttilä
  2004-10-26 21:53                       ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Mika Penttilä @ 2004-10-26 21:55 UTC (permalink / raw)
  To: Dave Hansen; +Cc: lhms, linux-mm

Dave Hansen wrote:

>Taking poor Andy off the cc...
>
>On Tue, 2004-10-26 at 14:38, Mika Penttila wrote:
>  
>
>>Yes, I see Dave M's approach is doing this, but isn't Andy's as well? 
>>What are the key differences between the two?
>>    
>>
>
>Back to my first message:
>  
>
>>There are two problems that are being solved: having a sparse layout
>>requiring splitting up mem_map (solved by discontigmem and your
>>nonlinear), and supporting non-linear phys to virt relationships (Dave
>>M's implementation which does the mem_map split as well).
>>    
>>
>
>Andy: split
>Dave M: split + non-linear phys to virt
>
>-- Dave
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
>
>  
>
So is it like this?

Andy:         __pa and __va as before, nonlinear page_to_pfn and pfn_to_page
Dave M :     new nonlinear __pa and __va implementations and nonlinear 
page_to_pfn and pfn_to_page

--Mika


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Lhms-devel] Re: 150 nonlinear
  2004-10-26 21:53                       ` Dave Hansen
@ 2004-10-26 22:01                         ` Mika Penttilä
  0 siblings, 0 replies; 26+ messages in thread
From: Mika Penttilä @ 2004-10-26 22:01 UTC (permalink / raw)
  To: Dave Hansen; +Cc: lhms, linux-mm

Dave Hansen wrote:

>On Tue, 2004-10-26 at 14:55, Mika Penttila wrote:
>  
>
>>Andy:         __pa and __va as before, nonlinear page_to_pfn and pfn_to_page
>>Dave M :     new nonlinear __pa and __va implementations and nonlinear 
>>page_to_pfn and pfn_to_page
>>    
>>
>
>Yes, basically.  Those are the most visible high-level-API functions
>that get changed.
>
>-- Dave
>
>  
>
Great... thanks!

--Mika


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: 150 nonlinear
  2004-10-26 18:36   ` Dave Hansen
  2004-10-26 19:07     ` [Lhms-devel] " Mika Penttilä
@ 2004-10-28 11:07     ` Andy Whitcroft
  1 sibling, 0 replies; 26+ messages in thread
From: Andy Whitcroft @ 2004-10-28 11:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: lhms, linux-mm

Dave Hansen wrote:

> I've been thinking about how we're going to merge up the code that uses
> Dave M's nonlinear with your new implementation.
> 
> There are two problems that are being solved: having a sparse layout
> requiring splitting up mem_map (solved by discontigmem and your
> nonlinear), and supporting non-linear phys to virt relationships (Dave
> M's implentation which does the mem_map split as well).
> 
> I think both Dave M. and I agree that your implementation is the way to
> go, mostly because it properly starts the separation of these two
> distinct problems.
> 
> So, I propose the following: your code should be referred to as
> something like CONFIG_SPARSEMEM.  The code supporting non-linear p::v
> retains the CONFIG_NONLINEAR name.
> 
> Do you think your code is in a place where it's ready for wider testing
> on a few more architectures?  In which case, would you like it held in
> the -mhp tree while it's waiting to get merged?  

Ok.  I meant to get back to you sooner, but had trouble getting test 
runs through on the new version.  Anyhow, yes that's fine with me.  I'll 
send out a new version here today renamed to CONFIG_SPARSEMEM.  This 
also has a few fixes as a result of further testing.  -mhp seems as good 
a place as any for the moment.

-apw
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2004-10-28 11:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-18 14:24 CONFIG_NONLINEAR for small systems Andy Whitcroft
2004-10-18 14:32 ` 050 bootmem use NODE_DATA Andy Whitcroft
2004-10-26 18:16   ` Dave Hansen
2004-10-18 14:33 ` 060 refactor setup_memory i386 Andy Whitcroft
2004-10-18 14:34 ` 080 alloc_remap i386 Andy Whitcroft
2004-10-18 14:35 ` 100 cleanup node zone Andy Whitcroft
2004-10-18 14:35 ` 150 nonlinear Andy Whitcroft
2004-10-26 18:36   ` Dave Hansen
2004-10-26 19:07     ` [Lhms-devel] " Mika Penttilä
2004-10-26 19:42       ` Dave Hansen
2004-10-26 20:41         ` Mika Penttilä
2004-10-26 20:55           ` Dave Hansen
2004-10-26 21:20             ` Mika Penttilä
2004-10-26 21:27               ` Dave Hansen
2004-10-26 21:38                 ` Mika Penttilä
2004-10-26 21:41                   ` Dave Hansen
2004-10-26 21:55                     ` Mika Penttilä
2004-10-26 21:53                       ` Dave Hansen
2004-10-26 22:01                         ` Mika Penttilä
2004-10-28 11:07     ` Andy Whitcroft
2004-10-18 14:36 ` 160 nonlinear i386 Andy Whitcroft
2004-10-18 14:36 ` 170 nonlinear ppc64 Andy Whitcroft
2004-10-18 15:17 ` [Lhms-devel] CONFIG_NONLINEAR for small systems Hirokazu Takahashi
2004-10-18 15:29   ` Andy Whitcroft
2004-10-19  4:30 ` Hiroyuki KAMEZAWA
2004-10-19  8:16   ` Andy Whitcroft

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox