From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106])
	by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j2PKi2Lg589886
	for ; Fri, 25 Mar 2005 15:44:02 -0500
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168])
	by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j2PKi2Mu196570
	for ; Fri, 25 Mar 2005 13:44:02 -0700
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1])
	by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j2PKi1sH017203
	for ; Fri, 25 Mar 2005 13:44:01 -0700
Subject: resubmit - [PATCH 2/4] sparsemem base: simple NUMA remap space allocator
From: Dave Hansen
Date: Fri, 25 Mar 2005 12:44:00 -0800
Message-Id:
Sender: owner-linux-mm@kvack.org
Return-Path:
To: akpm@osdl.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Dave Hansen , apw@shadowen.org
List-ID:

Introduce a simple allocator for the NUMA remap space.  This space is
very scarce, used for structures which are best allocated node local.

This mechanism is also used on non-NUMA ia64 systems with a vmem_map
to keep the pgdat->node_mem_map initialized in a consistent place for
all architectures.

Issues:
 o alloc_remap takes a node_id where we might expect a pgdat which was
   intended to allow us to allocate the pgdat's using this mechanism;
   which we do not yet do.  Could have alloc_remap_node() and
   alloc_remap_nid() for this purpose.

Signed-off-by: Andy Whitcroft
Signed-off-by: Dave Hansen
---

 memhotplug-dave/arch/i386/Kconfig        |    5 ++
 memhotplug-dave/arch/i386/mm/discontig.c |   59 ++++++++++++++++---------------
 memhotplug-dave/include/linux/bootmem.h  |    9 ++++
 memhotplug-dave/mm/page_alloc.c          |    6 ++-
 4 files changed, 50 insertions(+), 29 deletions(-)

diff -puN arch/i386/Kconfig~FROM-MM-alloc_remap-i386 arch/i386/Kconfig
--- memhotplug/arch/i386/Kconfig~FROM-MM-alloc_remap-i386	2005-03-25 08:17:11.000000000 -0800
+++ memhotplug-dave/arch/i386/Kconfig	2005-03-25 08:17:11.000000000 -0800
@@ -787,6 +787,11 @@ config NEED_NODE_MEMMAP_SIZE
 	depends on DISCONTIGMEM
 	default y
 
+config HAVE_ARCH_ALLOC_REMAP
+	bool
+	depends on NUMA
+	default y
+
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
 	depends on HIGHMEM4G || HIGHMEM64G
diff -puN arch/i386/mm/discontig.c~FROM-MM-alloc_remap-i386 arch/i386/mm/discontig.c
--- memhotplug/arch/i386/mm/discontig.c~FROM-MM-alloc_remap-i386	2005-03-25 08:17:11.000000000 -0800
+++ memhotplug-dave/arch/i386/mm/discontig.c	2005-03-25 08:17:11.000000000 -0800
@@ -108,6 +108,9 @@ unsigned long node_remap_offset[MAX_NUMN
 void *node_remap_start_vaddr[MAX_NUMNODES];
 void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
+void *node_remap_end_vaddr[MAX_NUMNODES];
+void *node_remap_alloc_vaddr[MAX_NUMNODES];
+
 /*
  * FLAT - support for basic PC memory model with discontig enabled, essentially
  *        a single node with all available processors in it with a flat
@@ -163,6 +166,21 @@ static void __init allocate_pgdat(int ni
 	}
 }
 
+void *alloc_remap(int nid, unsigned long size)
+{
+	void *allocation = node_remap_alloc_vaddr[nid];
+
+	size = ALIGN(size, L1_CACHE_BYTES);
+
+	if (!allocation || (allocation + size) >= node_remap_end_vaddr[nid])
+		return 0;
+
+	node_remap_alloc_vaddr[nid] += size;
+	memset(allocation, 0, size);
+
+	return allocation;
+}
+
 void __init remap_numa_kva(void)
 {
 	void *vaddr;
@@ -170,8 +188,6 @@ void __init remap_numa_kva(void)
 	int node;
 
 	for_each_online_node(node) {
-		if (node == 0)
-			continue;
 		for (pfn=0; pfn < node_remap_size[node]; pfn += PTRS_PER_PTE) {
 			vaddr = node_remap_start_vaddr[node]+(pfn<node_mem_map = (struct page *)lmem_map;
-			free_area_init_node(nid, NODE_DATA(nid), zones_size,
-				start, zholes_size);
-		}
+
+		free_area_init_node(nid, NODE_DATA(nid), zones_size, start,
+				zholes_size);
 	}
 	return;
 }
diff -puN include/linux/bootmem.h~FROM-MM-alloc_remap-i386 include/linux/bootmem.h
--- memhotplug/include/linux/bootmem.h~FROM-MM-alloc_remap-i386	2005-03-25 08:17:11.000000000 -0800
+++ memhotplug-dave/include/linux/bootmem.h	2005-03-25 08:17:11.000000000 -0800
@@ -67,6 +67,15 @@ extern void * __init __alloc_bootmem_nod
 	__alloc_bootmem_node((pgdat), (x), PAGE_SIZE, 0)
 #endif /* !CONFIG_HAVE_ARCH_BOOTMEM_NODE */
 
+#ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP
+extern void *alloc_remap(int nid, unsigned long size);
+#else
+static inline void *alloc_remap(int nid, unsigned long size)
+{
+	return NULL;
+}
+#endif
+
 extern unsigned long __initdata nr_kernel_pages;
 extern unsigned long __initdata nr_all_pages;
 
diff -puN mm/page_alloc.c~FROM-MM-alloc_remap-i386 mm/page_alloc.c
--- memhotplug/mm/page_alloc.c~FROM-MM-alloc_remap-i386	2005-03-25 08:17:11.000000000 -0800
+++ memhotplug-dave/mm/page_alloc.c	2005-03-25 08:17:11.000000000 -0800
@@ -1729,6 +1729,7 @@ static void __init free_area_init_core(s
 static void __init alloc_node_mem_map(struct pglist_data *pgdat)
 {
 	unsigned long size;
+	struct page *map;
 
 	/* Skip empty nodes */
 	if (!pgdat->node_spanned_pages)
@@ -1737,7 +1738,10 @@ static void __init alloc_node_mem_map(st
 	/* ia64 gets its own node_mem_map, before this, without bootmem */
 	if (!pgdat->node_mem_map) {
 		size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
-		pgdat->node_mem_map = alloc_bootmem_node(pgdat, size);
+		map = alloc_remap(pgdat->node_id, size);
+		if (!map)
+			map = alloc_bootmem_node(pgdat, size);
+		pgdat->node_mem_map = map;
 	}
 #ifndef CONFIG_DISCONTIGMEM
 	/*
_
--
To unsubscribe, send
a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: aart@kvack.org
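
For readers who want to try the allocation scheme outside the kernel, here
is a rough userspace sketch of the bump allocator that alloc_remap()
implements in the patch above, together with the "try remap space, then
fall back" pattern used in alloc_node_mem_map().  It is not kernel code and
not part of the patch.  Every sketch_* and SKETCH_* name is hypothetical:
SKETCH_CACHE_LINE stands in for L1_CACHE_BYTES, SKETCH_MAX_NODES for
MAX_NUMNODES, and calloc() stands in for alloc_bootmem_node().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NODES   4   /* hypothetical, stands in for MAX_NUMNODES */
#define SKETCH_CACHE_LINE  64  /* hypothetical, stands in for L1_CACHE_BYTES */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Per-node remap window with a bump pointer into it. */
struct sketch_remap {
	unsigned char *alloc;  /* next free byte, NULL if the node has no window */
	unsigned char *end;    /* one past the last byte of the window */
};

static struct sketch_remap sketch_nodes[SKETCH_MAX_NODES];

/* Give node `nid` a window of `size` bytes backed by caller-owned memory. */
static void sketch_remap_init(int nid, void *base, size_t size)
{
	sketch_nodes[nid].alloc = base;
	sketch_nodes[nid].end = (unsigned char *)base + size;
}

/* Bump-allocate zeroed space, rounded up to a cache line, from the window. */
static void *sketch_alloc_remap(int nid, size_t size)
{
	struct sketch_remap *r = &sketch_nodes[nid];
	unsigned char *allocation = r->alloc;

	size = SKETCH_ALIGN(size, SKETCH_CACHE_LINE);

	/*
	 * Same condition as the patch, (allocation + size) >= end, written
	 * as a comparison of sizes so no out-of-range pointer is formed.
	 * A request that would reach exactly `end` is refused.
	 */
	if (!allocation || (size_t)(r->end - allocation) <= size)
		return NULL;

	r->alloc += size;
	memset(allocation, 0, size);
	return allocation;
}

/* Caller pattern: prefer the node-local window, else a generic allocation. */
static void *sketch_alloc_node_map(int nid, size_t size)
{
	void *map = sketch_alloc_remap(nid, size);

	if (!map)
		map = calloc(1, size);  /* stands in for alloc_bootmem_node() */
	return map;
}
```

Note that there is no free operation: as in the patch, space is only ever
handed out, which keeps the allocator to a pointer bump and a bounds check.
A node without a window (alloc == NULL) always takes the fallback path,
which mirrors the !CONFIG_HAVE_ARCH_ALLOC_REMAP stub returning NULL.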