linux-mm.kvack.org archive mirror
* [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2
@ 2008-01-15  2:17 travis
  2008-01-15  2:17 ` [PATCH 01/10] x86: Change size of APICIDs from u8 to u16 V2 travis
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

This patchset addresses the kernel bloat that occurs when NR_CPUS is increased.
The memory numbers below are with NR_CPUS = 1024, which I've been testing (on 4
and 32 real processors, with the rest made "possible" via the additional_cpus
boot option).  These changes are all specific to the x86 architecture;
non-arch-specific changes will follow.

Based on 2.6.24-rc6-mm1

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - Remove extraneous casts
    - Add comment about node memory < NODE_MIN_SIZE
    - Changed pxm_to_node_map to u16
    - Changed memnode map entries to u16
    - Fix !NUMA builds with '#ifdef CONFIG_NUMA'
    - Add slight optimization to apic_is_clustered_box()
---

The columns below use the default x86_64 config with no modules.
32cpus is the default NR_CPUS, 1kcpus-before has NR_CPUS = 1024, and
1kcpus-after is the same config after applying this patchset.  The
second and third columns show the change relative to the column before.

As can be seen below, there's still plenty of room for improvement... ;-)

32cpus                    1kcpus-before             1kcpus-after
      228 .altinstr_repl         +0 .altinstr_repl         +0 .altinstr_repl
     1219 .altinstructio         +0 .altinstructio         +0 .altinstructio
   717512 .bss             +1542784 .bss              -147456 .bss
    61374 .comment               +0 .comment               +0 .comment
       16 .con_initcall.         +0 .con_initcall.         +0 .con_initcall.
   425256 .data              +20224 .data               -1024 .data
   178688 .data.cachelin  +12898304 .data.cachelin         +0 .data.cachelin
     8192 .data.init_tas         +0 .data.init_tas         +0 .data.init_tas
     4096 .data.page_ali         +0 .data.page_ali         +0 .data.page_ali
    27008 .data.percpu      +128768 .data.percpu         +128 .data.percpu
    43904 .data.read_mos   +8707872 .data.read_mos      -4096 .data.read_mos
        4 .data_nosave           +0 .data_nosave           +0 .data_nosave
     5141 .exit.text             +9 .exit.text             -1 .exit.text
   138480 .init.data           +992 .init.data          +3616 .init.data
      133 .init.ramfs            +0 .init.ramfs            +1 .init.ramfs
     3192 .init.setup            +0 .init.setup            +0 .init.setup
   159754 .init.text           +891 .init.text            +13 .init.text
     2288 .initcall.init         +0 .initcall.init         +0 .initcall.init
        8 .jiffies               +0 .jiffies               +0 .jiffies
     4512 .pci_fixup             +0 .pci_fixup             +0 .pci_fixup
  1314438 .rodata             +1312 .rodata              -552 .rodata
    36552 .smp_locks           +256 .smp_locks             +0 .smp_locks
  3971848 .text              +12992 .text               +1781 .text
     3368 .vdso                  +0 .vdso                  +0 .vdso
        4 .vgetcpu_mode          +0 .vgetcpu_mode          +0 .vgetcpu_mode
      218 .vsyscall_0            +0 .vsyscall_0            +0 .vsyscall_0
       52 .vsyscall_1            +0 .vsyscall_1            +0 .vsyscall_1
       91 .vsyscall_2            +0 .vsyscall_2            +0 .vsyscall_2
        8 .vsyscall_3            +0 .vsyscall_3            +0 .vsyscall_3
       54 .vsyscall_fn           +0 .vsyscall_fn           +0 .vsyscall_fn
       80 .vsyscall_gtod         +0 .vsyscall_gtod         +0 .vsyscall_gtod
    39480 __bug_table            +0 __bug_table            +0 __bug_table
    16320 __ex_table             +0 __ex_table             +0 __ex_table
     9160 __param                +0 __param                +0 __param
  7172678 Total           +23314404 Total             -147590 Total

-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 01/10] x86: Change size of APICIDs from u8 to u16 V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 02/10] x86: Change size of node ids " travis
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: big_apicids --]
[-- Type: text/plain, Size: 6959 bytes --]

Change the size of APICIDs from u8 to u16.  This partially
supports the new x2apic mode that will be present on future
processor chips.  (Those chips actually support 32-bit APICIDs, but
that change is more intrusive; supporting 16 bits is sufficient for now.)
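
To illustrate why an 8-bit field is too small, here is a minimal
user-space sketch (not kernel code).  The demo_ names and the sentinel
values are made up for this example.  It only shows that an ID above 255
wraps when stored in 8 bits, and can even alias an all-ones "bad ID"
marker, while a 16-bit field keeps it intact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinels for this sketch; not the kernel's definitions. */
#define DEMO_BAD_ID8	((uint8_t)0xFF)
#define DEMO_BAD_ID16	((uint16_t)0xFFFF)

/* Store an ID in an 8-bit slot: anything above 255 wraps modulo 256. */
static uint8_t demo_store_id8(unsigned int id)
{
	return (uint8_t)id;
}

/* Store an ID in a 16-bit slot: IDs up to 65535 survive intact. */
static uint16_t demo_store_id16(unsigned int id)
{
	return (uint16_t)id;
}

/* Self-check of the behaviour described above. */
static void demo_apicid_width_check(void)
{
	assert(demo_store_id8(300) == 44);		/* 300 % 256 */
	assert(demo_store_id8(0x1FF) == DEMO_BAD_ID8);	/* aliases the sentinel */
	assert(demo_store_id16(300) == 300);
	assert(demo_store_id16(0x1FF) != DEMO_BAD_ID16);
}
```

The aliasing case is the more dangerous one: a valid but truncated ID
would be silently skipped by any "== bad ID" test.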

Signed-off-by: Jack Steiner <steiner@sgi.com>

I've included just the partial change from u8 to u16 apicids.  The
remaining x2apic changes will be in a separate patch.

In addition, the fake_node_to_pxm_map[] and fake_apicid_to_node[]
tables have been moved from local (on-stack) data to the __initdata
section, reducing stack pressure when MAX_NUMNODES and MAX_LOCAL_APIC
are increased in size.
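
A rough user-space sketch of the stack-pressure point, with made-up
sizes and demo_ names (the real MAX_NUMNODES and MAX_LOCAL_APIC values
depend on the config).  Tables of this size are better kept in static
storage than in a function's stack frame, since kernel stacks are small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes and sentinels, chosen only for illustration. */
#define DEMO_MAX_NODES	512
#define DEMO_MAX_APIC	32768
#define DEMO_PXM_INVAL	(-1)
#define DEMO_NO_NODE	0xFF

/* File-scope tables live in static storage, not in any stack frame. */
static int demo_node_to_pxm[DEMO_MAX_NODES];
static unsigned char demo_apicid_to_node[DEMO_MAX_APIC];

/* Bytes that would otherwise have to fit in one stack frame. */
static size_t demo_tables_bytes(void)
{
	return sizeof(demo_node_to_pxm) + sizeof(demo_apicid_to_node);
}

/* Fill both tables with their "invalid" markers. */
static void demo_tables_reset(void)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_NODES; i++)
		demo_node_to_pxm[i] = DEMO_PXM_INVAL;
	for (i = 0; i < DEMO_MAX_APIC; i++)
		demo_apicid_to_node[i] = DEMO_NO_NODE;
}

static void demo_tables_check(void)
{
	demo_tables_reset();
	assert(demo_node_to_pxm[0] == DEMO_PXM_INVAL);
	assert(demo_apicid_to_node[0] == DEMO_NO_NODE);
}
```

With a 4-byte int these two demo tables add up to 34816 bytes, which
would not fit on a stack of a few pages.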

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - Remove extraneous casts
    - Add comment about node memory < NODE_MIN_SIZE
---
 arch/x86/kernel/genapic_64.c |    4 ++--
 arch/x86/kernel/mpparse_64.c |    4 ++--
 arch/x86/kernel/smpboot_64.c |    2 +-
 arch/x86/mm/numa_64.c        |    2 +-
 arch/x86/mm/srat_64.c        |   26 +++++++++++++++++---------
 include/asm-x86/processor.h  |   14 +++++++-------
 include/asm-x86/smp_64.h     |    8 ++++----
 7 files changed, 34 insertions(+), 26 deletions(-)

--- a/arch/x86/kernel/genapic_64.c
+++ b/arch/x86/kernel/genapic_64.c
@@ -32,10 +32,10 @@
  * array during this time.  Is it zeroed when the per_cpu
  * data area is removed.
  */
-u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata
+u16 x86_cpu_to_apicid_init[NR_CPUS] __initdata
 					= { [0 ... NR_CPUS-1] = BAD_APICID };
 void *x86_cpu_to_apicid_ptr;
-DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
+DEFINE_PER_CPU(u16, x86_cpu_to_apicid) = BAD_APICID;
 EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 
 struct genapic __read_mostly *genapic = &apic_flat;
--- a/arch/x86/kernel/mpparse_64.c
+++ b/arch/x86/kernel/mpparse_64.c
@@ -67,7 +67,7 @@ unsigned disabled_cpus __cpuinitdata;
 /* Bitmask of physically existing CPUs */
 physid_mask_t phys_cpu_present_map = PHYSID_MASK_NONE;
 
-u8 bios_cpu_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
+u16 bios_cpu_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
 
 
 /*
@@ -132,7 +132,7 @@ static void __cpuinit MP_processor_info(
 	 * area is created.
 	 */
 	if (x86_cpu_to_apicid_ptr) {
-		u8 *x86_cpu_to_apicid = (u8 *)x86_cpu_to_apicid_ptr;
+		u16 *x86_cpu_to_apicid = x86_cpu_to_apicid_ptr;
 		x86_cpu_to_apicid[cpu] = m->mpc_apicid;
 	} else {
 		per_cpu(x86_cpu_to_apicid, cpu) = m->mpc_apicid;
--- a/arch/x86/kernel/smpboot_64.c
+++ b/arch/x86/kernel/smpboot_64.c
@@ -65,7 +65,7 @@ int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
 
 /* Last level cache ID of each logical CPU */
-DEFINE_PER_CPU(u8, cpu_llc_id) = BAD_APICID;
+DEFINE_PER_CPU(u16, cpu_llc_id) = BAD_APICID;
 
 /* Bitmask of currently online CPUs */
 cpumask_t cpu_online_map __read_mostly;
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -627,7 +627,7 @@ void __init init_cpu_to_node(void)
 	int i;
 
 	for (i = 0; i < NR_CPUS; i++) {
-		u8 apicid = x86_cpu_to_apicid_init[i];
+		u16 apicid = x86_cpu_to_apicid_init[i];
 
 		if (apicid == BAD_APICID)
 			continue;
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -130,6 +130,9 @@ void __init
 acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 {
 	int pxm, node;
+	int apic_id;
+
+	apic_id = pa->apic_id;
 	if (srat_disabled())
 		return;
 	if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) {
@@ -145,10 +148,10 @@ acpi_numa_processor_affinity_init(struct
 		bad_srat();
 		return;
 	}
-	apicid_to_node[pa->apic_id] = node;
+	apicid_to_node[apic_id] = node;
 	acpi_numa = 1;
 	printk(KERN_INFO "SRAT: PXM %u -> APIC %u -> Node %u\n",
-	       pxm, pa->apic_id, node);
+	       pxm, apic_id, node);
 }
 
 int update_end_of_memory(unsigned long end) {return -1;}
@@ -343,7 +346,12 @@ int __init acpi_scan_nodes(unsigned long
 	/* First clean up the node list */
 	for (i = 0; i < MAX_NUMNODES; i++) {
 		cutoff_node(i, start, end);
-		if ((nodes[i].end - nodes[i].start) < NODE_MIN_SIZE) {
+		/*
+		 * don't confuse VM with a node that doesn't have the
+		 * minimum memory.
+		 */
+		if (nodes[i].end &&
+			(nodes[i].end - nodes[i].start) < NODE_MIN_SIZE) {
 			unparse_node(i);
 			node_set_offline(i);
 		}
@@ -384,6 +392,12 @@ int __init acpi_scan_nodes(unsigned long
 }
 
 #ifdef CONFIG_NUMA_EMU
+static int fake_node_to_pxm_map[MAX_NUMNODES] __initdata = {
+	[0 ... MAX_NUMNODES-1] = PXM_INVAL
+};
+static unsigned char fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
+	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
+};
 static int __init find_node_by_addr(unsigned long addr)
 {
 	int ret = NUMA_NO_NODE;
@@ -414,12 +428,6 @@ static int __init find_node_by_addr(unsi
 void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 {
 	int i, j;
-	int fake_node_to_pxm_map[MAX_NUMNODES] = {
-		[0 ... MAX_NUMNODES-1] = PXM_INVAL
-	};
-	unsigned char fake_apicid_to_node[MAX_LOCAL_APIC] = {
-		[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
-	};
 
 	printk(KERN_INFO "Faking PXM affinity for fake nodes on real "
 			 "topology.\n");
--- a/include/asm-x86/processor.h
+++ b/include/asm-x86/processor.h
@@ -86,14 +86,14 @@ struct cpuinfo_x86 {
 #ifdef CONFIG_SMP
 	cpumask_t llc_shared_map;	/* cpus sharing the last level cache */
 #endif
-	unsigned char x86_max_cores;	/* cpuid returned max cores value */
-	unsigned char apicid;
-	unsigned short x86_clflush_size;
+	u16 x86_max_cores;		/* cpuid returned max cores value */
+	u16 apicid;
+	u16 x86_clflush_size;
 #ifdef CONFIG_SMP
-	unsigned char booted_cores;	/* number of cores as seen by OS */
-	__u8 phys_proc_id; 		/* Physical processor id. */
-	__u8 cpu_core_id;  		/* Core id */
-	__u8 cpu_index;			/* index into per_cpu list */
+	u16 booted_cores;		/* number of cores as seen by OS */
+	u16 phys_proc_id; 		/* Physical processor id. */
+	u16 cpu_core_id;  		/* Core id */
+	u16 cpu_index;			/* index into per_cpu list */
 #endif
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
--- a/include/asm-x86/smp_64.h
+++ b/include/asm-x86/smp_64.h
@@ -26,14 +26,14 @@ extern void unlock_ipi_call_lock(void);
 extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
 				  void *info, int wait);
 
-extern u8 __initdata x86_cpu_to_apicid_init[];
+extern u16 __initdata x86_cpu_to_apicid_init[];
 extern void *x86_cpu_to_apicid_ptr;
-extern u8 bios_cpu_apicid[];
+extern u16 bios_cpu_apicid[];
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
-DECLARE_PER_CPU(u8, cpu_llc_id);
-DECLARE_PER_CPU(u8, x86_cpu_to_apicid);
+DECLARE_PER_CPU(u16, cpu_llc_id);
+DECLARE_PER_CPU(u16, x86_cpu_to_apicid);
 
 static inline int cpu_present_to_apicid(int mps_cpu)
 {


* [PATCH 02/10] x86: Change size of node ids from u8 to u16 V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
  2008-01-15  2:17 ` [PATCH 01/10] x86: Change size of APICIDs from u8 to u16 V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  5:59   ` Eric Dumazet
  2008-01-15  2:17 ` [PATCH 03/10] x86: Change NR_CPUS arrays in powernow-k8 V2 travis
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: big_nodeids --]
[-- Type: text/plain, Size: 4140 bytes --]

Change the size of node ids from 8 bits to 16 bits to
accommodate more than 256 nodes.
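
One side effect worth checking is that widening the entries doubles
every table built from them.  The user-space sketch below uses
simplified copies of struct memnode (the demo_ names are made up, and
the cacheline alignment attribute is left out) to compare the embedded
map before and after the change.  It suggests that the u16 variant no
longer matches the "total size = 64 bytes" comment unless the element
count is adjusted; treat that as something to verify, not as a finding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of the layout with 8-bit map entries. */
struct demo_memnode_u8 {
	int shift;
	unsigned int mapsize;
	uint8_t *map;
	uint8_t embedded_map[64 - 16];
};

/* Same layout with the entries widened to 16 bits. */
struct demo_memnode_u16 {
	int shift;
	unsigned int mapsize;
	uint16_t *map;
	uint16_t embedded_map[64 - 16];
};

/* Size in bytes of the embedded map for each variant. */
static size_t demo_embedded_bytes_u8(void)
{
	return sizeof(((struct demo_memnode_u8 *)0)->embedded_map);
}

static size_t demo_embedded_bytes_u16(void)
{
	return sizeof(((struct demo_memnode_u16 *)0)->embedded_map);
}

static void demo_memnode_check(void)
{
	/* Same element count, twice the bytes. */
	assert(demo_embedded_bytes_u16() == 2 * demo_embedded_bytes_u8());
}
```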

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - Changed pxm_to_node_map to u16
    - Changed memnode map entries to u16
---
 arch/x86/mm/numa_64.c       |    9 ++++++---
 arch/x86/mm/srat_64.c       |    2 +-
 drivers/acpi/numa.c         |    2 +-
 include/asm-x86/mmzone_64.h |    4 ++--
 include/asm-x86/numa_64.h   |    4 ++--
 include/asm-x86/topology.h  |    2 +-
 6 files changed, 13 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -11,6 +11,7 @@
 #include <linux/ctype.h>
 #include <linux/module.h>
 #include <linux/nodemask.h>
+#include <linux/sched.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -30,12 +31,12 @@ bootmem_data_t plat_node_bdata[MAX_NUMNO
 
 struct memnode memnode;
 
-int cpu_to_node_map[NR_CPUS] __read_mostly = {
+u16 cpu_to_node_map[NR_CPUS] __read_mostly = {
 	[0 ... NR_CPUS-1] = NUMA_NO_NODE
 };
 EXPORT_SYMBOL(cpu_to_node_map);
 
-unsigned char apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
+u16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
@@ -544,7 +545,9 @@ void __init numa_initmem_init(unsigned l
 	node_set(0, node_possible_map);
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
-	node_to_cpumask_map[0] = cpumask_of_cpu(0);
+	/* we can't use cpumask_of_cpu() yet */
+	memset(&node_to_cpumask_map[0], 0, sizeof(node_to_cpumask_map[0]));
+	cpu_set(0, node_to_cpumask_map[0]);
 	e820_register_active_regions(0, start_pfn, end_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
 }
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -395,7 +395,7 @@ int __init acpi_scan_nodes(unsigned long
 static int fake_node_to_pxm_map[MAX_NUMNODES] __initdata = {
 	[0 ... MAX_NUMNODES-1] = PXM_INVAL
 };
-static unsigned char fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
+static u16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 static int __init find_node_by_addr(unsigned long addr)
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -38,7 +38,7 @@ ACPI_MODULE_NAME("numa");
 static nodemask_t nodes_found_map = NODE_MASK_NONE;
 
 /* maps to convert between proximity domain and logical node ID */
-static int pxm_to_node_map[MAX_PXM_DOMAINS]
+static u16 pxm_to_node_map[MAX_PXM_DOMAINS]
 				= { [0 ... MAX_PXM_DOMAINS - 1] = NID_INVAL };
 static int node_to_pxm_map[MAX_NUMNODES]
 				= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
--- a/include/asm-x86/mmzone_64.h
+++ b/include/asm-x86/mmzone_64.h
@@ -15,8 +15,8 @@
 struct memnode {
 	int shift;
 	unsigned int mapsize;
-	u8 *map;
-	u8 embedded_map[64-16];
+	u16 *map;
+	u16 embedded_map[64-16];
 } ____cacheline_aligned; /* total size = 64 bytes */
 extern struct memnode memnode;
 #define memnode_shift memnode.shift
--- a/include/asm-x86/numa_64.h
+++ b/include/asm-x86/numa_64.h
@@ -20,7 +20,7 @@ extern void numa_set_node(int cpu, int n
 extern void srat_reserve_add_area(int nodeid);
 extern int hotadd_percent;
 
-extern unsigned char apicid_to_node[MAX_LOCAL_APIC];
+extern u16 apicid_to_node[MAX_LOCAL_APIC];
 
 extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
 extern unsigned long numa_free_all_bootmem(void);
@@ -40,6 +40,6 @@ static inline void clear_node_cpumask(in
 #define clear_node_cpumask(cpu) do {} while (0)
 #endif
 
-#define NUMA_NO_NODE 0xff
+#define NUMA_NO_NODE 0xffff
 
 #endif
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -30,7 +30,7 @@
 #include <asm/mpspec.h>
 
 /* Mappings between logical cpu number and node number */
-extern int cpu_to_node_map[];
+extern u16 cpu_to_node_map[];
 extern cpumask_t node_to_cpumask_map[];
 
 /* Returns the number of the node containing CPU 'cpu' */


* [PATCH 03/10] x86: Change NR_CPUS arrays in powernow-k8 V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
  2008-01-15  2:17 ` [PATCH 01/10] x86: Change size of APICIDs from u8 to u16 V2 travis
  2008-01-15  2:17 ` [PATCH 02/10] x86: Change size of node ids " travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 04/10] x86: Change NR_CPUS arrays in intel_cacheinfo V2 travis
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: NR_CPUS-arrays-in-powernow-k8 --]
[-- Type: text/plain, Size: 2469 bytes --]

Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	powernow_k8_data *powernow_data[NR_CPUS];
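
As a rough user-space model of the idea (not the kernel's per_cpu
implementation), the sketch below keeps the pointer in a per-CPU block
that is allocated only for CPUs that can exist.  All demo_ names and the
DEMO_NR_CPUS value are hypothetical.  The single lookup table stands in
for the one offset table that all per-CPU variables share.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NR_CPUS 1024	/* compile-time limit (hypothetical value) */

struct demo_data {
	unsigned int currfid;
};

/* One block per possible CPU, standing in for a per-CPU area. */
struct demo_percpu_area {
	struct demo_data *powernow_data;
};

/* Shared lookup table; entries stay NULL for CPUs that cannot exist. */
static struct demo_percpu_area *demo_area[DEMO_NR_CPUS];

/* Allocate areas only for possible CPUs; returns 0 on success. */
static int demo_setup_areas(unsigned int possible_cpus)
{
	unsigned int cpu;

	for (cpu = 0; cpu < possible_cpus && cpu < DEMO_NR_CPUS; cpu++) {
		if (demo_area[cpu])
			continue;
		demo_area[cpu] = calloc(1, sizeof(*demo_area[cpu]));
		if (!demo_area[cpu])
			return -1;
	}
	return 0;
}

/* Rough analogue of per_cpu(powernow_data, cpu): yields the CPU's slot. */
static struct demo_data **demo_powernow_slot(unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS && demo_area[cpu] != NULL);
	return &demo_area[cpu]->powernow_data;
}
```

In this model the per-variable cost scales with the number of possible
CPUs, and each slot sits next to that CPU's other per-CPU data.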


Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - (none)
---
 arch/x86/kernel/cpu/cpufreq/powernow-k8.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -53,7 +53,7 @@
 /* serialize freq changes  */
 static DEFINE_MUTEX(fidvid_mutex);
 
-static struct powernow_k8_data *powernow_data[NR_CPUS];
+static DEFINE_PER_CPU(struct powernow_k8_data *, powernow_data);
 
 static int cpu_family = CPU_OPTERON;
 
@@ -1052,7 +1052,7 @@ static int transition_frequency_pstate(s
 static int powernowk8_target(struct cpufreq_policy *pol, unsigned targfreq, unsigned relation)
 {
 	cpumask_t oldmask = CPU_MASK_ALL;
-	struct powernow_k8_data *data = powernow_data[pol->cpu];
+	struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
 	u32 checkfid;
 	u32 checkvid;
 	unsigned int newstate;
@@ -1128,7 +1128,7 @@ err_out:
 /* Driver entry point to verify the policy and range of frequencies */
 static int powernowk8_verify(struct cpufreq_policy *pol)
 {
-	struct powernow_k8_data *data = powernow_data[pol->cpu];
+	struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
 
 	if (!data)
 		return -EINVAL;
@@ -1233,7 +1233,7 @@ static int __cpuinit powernowk8_cpu_init
 		dprintk("cpu_init done, current fid 0x%x, vid 0x%x\n",
 			data->currfid, data->currvid);
 
-	powernow_data[pol->cpu] = data;
+	per_cpu(powernow_data, pol->cpu) = data;
 
 	return 0;
 
@@ -1247,7 +1247,7 @@ err_out:
 
 static int __devexit powernowk8_cpu_exit (struct cpufreq_policy *pol)
 {
-	struct powernow_k8_data *data = powernow_data[pol->cpu];
+	struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
 
 	if (!data)
 		return -EINVAL;
@@ -1268,7 +1268,7 @@ static unsigned int powernowk8_get (unsi
 	cpumask_t oldmask = current->cpus_allowed;
 	unsigned int khz = 0;
 
-	data = powernow_data[first_cpu(per_cpu(cpu_core_map, cpu))];
+	data = per_cpu(powernow_data, first_cpu(per_cpu(cpu_core_map, cpu)));
 
 	if (!data)
 		return -EINVAL;


* [PATCH 04/10] x86: Change NR_CPUS arrays in intel_cacheinfo V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (2 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 03/10] x86: Change NR_CPUS arrays in powernow-k8 V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 05/10] x86: Change NR_CPUS arrays in smpboot_64 V2 travis
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: NR_CPUS-arrays-in-intel_cacheinfo --]
[-- Type: text/plain, Size: 6092 bytes --]

Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	_cpuid4_info *cpuid4_info[NR_CPUS];
	_index_kobject *index_kobject[NR_CPUS];
	kobject * cache_kobject[NR_CPUS];

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - (none)
---
 arch/x86/kernel/cpu/intel_cacheinfo.c |   55 +++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 26 deletions(-)

--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -451,8 +451,8 @@ unsigned int __cpuinit init_intel_cachei
 }
 
 /* pointer to _cpuid4_info array (for each cache leaf) */
-static struct _cpuid4_info *cpuid4_info[NR_CPUS];
-#define CPUID4_INFO_IDX(x,y)    (&((cpuid4_info[x])[y]))
+static DEFINE_PER_CPU(struct _cpuid4_info *, cpuid4_info);
+#define CPUID4_INFO_IDX(x,y)    (&((per_cpu(cpuid4_info, x))[y]))
 
 #ifdef CONFIG_SMP
 static void __cpuinit cache_shared_cpu_map_setup(unsigned int cpu, int index)
@@ -474,7 +474,7 @@ static void __cpuinit cache_shared_cpu_m
 			if (cpu_data(i).apicid >> index_msb ==
 			    c->apicid >> index_msb) {
 				cpu_set(i, this_leaf->shared_cpu_map);
-				if (i != cpu && cpuid4_info[i])  {
+				if (i != cpu && per_cpu(cpuid4_info, i))  {
 					sibling_leaf = CPUID4_INFO_IDX(i, index);
 					cpu_set(cpu, sibling_leaf->shared_cpu_map);
 				}
@@ -505,8 +505,8 @@ static void __cpuinit free_cache_attribu
 	for (i = 0; i < num_cache_leaves; i++)
 		cache_remove_shared_cpu_map(cpu, i);
 
-	kfree(cpuid4_info[cpu]);
-	cpuid4_info[cpu] = NULL;
+	kfree(per_cpu(cpuid4_info, cpu));
+	per_cpu(cpuid4_info, cpu) = NULL;
 }
 
 static int __cpuinit detect_cache_attributes(unsigned int cpu)
@@ -519,9 +519,9 @@ static int __cpuinit detect_cache_attrib
 	if (num_cache_leaves == 0)
 		return -ENOENT;
 
-	cpuid4_info[cpu] = kzalloc(
+	per_cpu(cpuid4_info, cpu) = kzalloc(
 	    sizeof(struct _cpuid4_info) * num_cache_leaves, GFP_KERNEL);
-	if (cpuid4_info[cpu] == NULL)
+	if (per_cpu(cpuid4_info, cpu) == NULL)
 		return -ENOMEM;
 
 	oldmask = current->cpus_allowed;
@@ -546,8 +546,8 @@ static int __cpuinit detect_cache_attrib
 
 out:
 	if (retval) {
-		kfree(cpuid4_info[cpu]);
-		cpuid4_info[cpu] = NULL;
+		kfree(per_cpu(cpuid4_info, cpu));
+		per_cpu(cpuid4_info, cpu) = NULL;
 	}
 
 	return retval;
@@ -561,7 +561,7 @@ out:
 extern struct sysdev_class cpu_sysdev_class; /* from drivers/base/cpu.c */
 
 /* pointer to kobject for cpuX/cache */
-static struct kobject * cache_kobject[NR_CPUS];
+static DEFINE_PER_CPU(struct kobject *, cache_kobject);
 
 struct _index_kobject {
 	struct kobject kobj;
@@ -570,8 +570,8 @@ struct _index_kobject {
 };
 
 /* pointer to array of kobjects for cpuX/cache/indexY */
-static struct _index_kobject *index_kobject[NR_CPUS];
-#define INDEX_KOBJECT_PTR(x,y)    (&((index_kobject[x])[y]))
+static DEFINE_PER_CPU(struct _index_kobject *, index_kobject);
+#define INDEX_KOBJECT_PTR(x,y)    (&((per_cpu(index_kobject, x))[y]))
 
 #define show_one_plus(file_name, object, val)				\
 static ssize_t show_##file_name						\
@@ -684,10 +684,10 @@ static struct kobj_type ktype_percpu_ent
 
 static void __cpuinit cpuid4_cache_sysfs_exit(unsigned int cpu)
 {
-	kfree(cache_kobject[cpu]);
-	kfree(index_kobject[cpu]);
-	cache_kobject[cpu] = NULL;
-	index_kobject[cpu] = NULL;
+	kfree(per_cpu(cache_kobject, cpu));
+	kfree(per_cpu(index_kobject, cpu));
+	per_cpu(cache_kobject, cpu) = NULL;
+	per_cpu(index_kobject, cpu) = NULL;
 	free_cache_attributes(cpu);
 }
 
@@ -703,13 +703,14 @@ static int __cpuinit cpuid4_cache_sysfs_
 		return err;
 
 	/* Allocate all required memory */
-	cache_kobject[cpu] = kzalloc(sizeof(struct kobject), GFP_KERNEL);
-	if (unlikely(cache_kobject[cpu] == NULL))
+	per_cpu(cache_kobject, cpu) =
+		kzalloc(sizeof(struct kobject), GFP_KERNEL);
+	if (unlikely(per_cpu(cache_kobject, cpu) == NULL))
 		goto err_out;
 
-	index_kobject[cpu] = kzalloc(
+	per_cpu(index_kobject, cpu) = kzalloc(
 	    sizeof(struct _index_kobject ) * num_cache_leaves, GFP_KERNEL);
-	if (unlikely(index_kobject[cpu] == NULL))
+	if (unlikely(per_cpu(index_kobject, cpu) == NULL))
 		goto err_out;
 
 	return 0;
@@ -733,7 +734,8 @@ static int __cpuinit cache_add_dev(struc
 	if (unlikely(retval < 0))
 		return retval;
 
-	retval = kobject_init_and_add(cache_kobject[cpu], &ktype_percpu_entry,
+	retval = kobject_init_and_add(per_cpu(cache_kobject, cpu),
+				      &ktype_percpu_entry,
 				      &sys_dev->kobj, "%s", "cache");
 	if (retval < 0) {
 		cpuid4_cache_sysfs_exit(cpu);
@@ -745,13 +747,14 @@ static int __cpuinit cache_add_dev(struc
 		this_object->cpu = cpu;
 		this_object->index = i;
 		retval = kobject_init_and_add(&(this_object->kobj),
-					      &ktype_cache, cache_kobject[cpu],
+					      &ktype_cache,
+					      per_cpu(cache_kobject, cpu),
 					      "index%1lu", i);
 		if (unlikely(retval)) {
 			for (j = 0; j < i; j++) {
 				kobject_put(&(INDEX_KOBJECT_PTR(cpu,j)->kobj));
 			}
-			kobject_put(cache_kobject[cpu]);
+			kobject_put(per_cpu(cache_kobject, cpu));
 			cpuid4_cache_sysfs_exit(cpu);
 			break;
 		}
@@ -760,7 +763,7 @@ static int __cpuinit cache_add_dev(struc
 	if (!retval)
 		cpu_set(cpu, cache_dev_map);
 
-	kobject_uevent(cache_kobject[cpu], KOBJ_ADD);
+	kobject_uevent(per_cpu(cache_kobject, cpu), KOBJ_ADD);
 	return retval;
 }
 
@@ -769,7 +772,7 @@ static void __cpuinit cache_remove_dev(s
 	unsigned int cpu = sys_dev->id;
 	unsigned long i;
 
-	if (cpuid4_info[cpu] == NULL)
+	if (per_cpu(cpuid4_info, cpu) == NULL)
 		return;
 	if (!cpu_isset(cpu, cache_dev_map))
 		return;
@@ -777,7 +780,7 @@ static void __cpuinit cache_remove_dev(s
 
 	for (i = 0; i < num_cache_leaves; i++)
 		kobject_put(&(INDEX_KOBJECT_PTR(cpu,i)->kobj));
-	kobject_put(cache_kobject[cpu]);
+	kobject_put(per_cpu(cache_kobject, cpu));
 	cpuid4_cache_sysfs_exit(cpu);
 }
 


* [PATCH 05/10] x86: Change NR_CPUS arrays in smpboot_64 V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (3 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 04/10] x86: Change NR_CPUS arrays in intel_cacheinfo V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 06/10] x86: Change NR_CPUS arrays in topology V2 travis
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: NR_CPUS-arrays-in-smpboot_64 --]
[-- Type: text/plain, Size: 1538 bytes --]

Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	task_struct *idle_thread_array[NR_CPUS];

This is done only if CONFIG_HOTPLUG_CPU is defined; otherwise
the array is discarded after initialization anyway.
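
The pattern is a pair of accessor macros whose backing storage is
picked at compile time.  Below is a small user-space sketch of it; the
DEMO_HOTPLUG switch and all demo_ names are invented for the example,
and the "per-CPU area" is simulated with a plain struct array.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4		/* hypothetical CPU limit */

struct demo_task {
	int pid;
};

#ifdef DEMO_HOTPLUG
/* Storage must persist: one slot inside each (simulated) per-CPU area. */
struct demo_percpu_area {
	struct demo_task *idle;
};
static struct demo_percpu_area demo_area[DEMO_NR_CPUS];
#define demo_get_idle(cpu)	(demo_area[(cpu)].idle)
#define demo_set_idle(cpu, p)	(demo_area[(cpu)].idle = (p))
#else
/* Storage is only needed during boot: a plain array is enough. */
static struct demo_task *demo_idle_array[DEMO_NR_CPUS];
#define demo_get_idle(cpu)	(demo_idle_array[(cpu)])
#define demo_set_idle(cpu, p)	(demo_idle_array[(cpu)] = (p))
#endif

/* Callers use the accessors and never see which layout was chosen. */
static int demo_idle_pid(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return demo_get_idle(cpu) ? demo_get_idle(cpu)->pid : -1;
}
```

Building with or without -DDEMO_HOTPLUG changes only where the pointers
live; code that goes through the two macros stays the same.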

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - (none)
---
 arch/x86/kernel/smpboot_64.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/smpboot_64.c
+++ b/arch/x86/kernel/smpboot_64.c
@@ -111,10 +111,20 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
  * a new thread. Also avoids complicated thread destroy functionality
  * for idle threads.
  */
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Needed only for CONFIG_HOTPLUG_CPU because __cpuinitdata is
+ * removed after init for !CONFIG_HOTPLUG_CPU.
+ */
+static DEFINE_PER_CPU(struct task_struct *, idle_thread_array);
+#define get_idle_for_cpu(x)     (per_cpu(idle_thread_array, x))
+#define set_idle_for_cpu(x,p)   (per_cpu(idle_thread_array, x) = (p))
+#else
 struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
-
 #define get_idle_for_cpu(x)     (idle_thread_array[(x)])
 #define set_idle_for_cpu(x,p)   (idle_thread_array[(x)] = (p))
+#endif
+
 
 /*
  * Currently trivial. Write the real->protected mode


* [PATCH 06/10] x86: Change NR_CPUS arrays in topology V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (4 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 05/10] x86: Change NR_CPUS arrays in smpboot_64 V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 07/10] x86: Cleanup x86_cpu_to_apicid references V2 travis
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: NR_CPUS-arrays-in-topology --]
[-- Type: text/plain, Size: 1721 bytes --]

Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	i386_cpu cpu_devices[NR_CPUS];

(And change the struct name to x86_cpu.)

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - (none)
---
 arch/x86/kernel/topology.c |    8 ++++----
 include/asm-x86/cpu.h      |    2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -31,7 +31,7 @@
 #include <linux/mmzone.h>
 #include <asm/cpu.h>
 
-static struct i386_cpu cpu_devices[NR_CPUS];
+static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
 int __cpuinit arch_register_cpu(int num)
 {
@@ -46,16 +46,16 @@ int __cpuinit arch_register_cpu(int num)
 	 */
 #ifdef CONFIG_HOTPLUG_CPU
 	if (num)
-		cpu_devices[num].cpu.hotpluggable = 1;
+		per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
 #endif
 
-	return register_cpu(&cpu_devices[num].cpu, num);
+	return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
 void arch_unregister_cpu(int num)
 {
-	return unregister_cpu(&cpu_devices[num].cpu);
+	return unregister_cpu(&per_cpu(cpu_devices, num).cpu);
 }
 EXPORT_SYMBOL(arch_register_cpu);
 EXPORT_SYMBOL(arch_unregister_cpu);
--- a/include/asm-x86/cpu.h
+++ b/include/asm-x86/cpu.h
@@ -7,7 +7,7 @@
 #include <linux/nodemask.h>
 #include <linux/percpu.h>
 
-struct i386_cpu {
+struct x86_cpu {
 	struct cpu cpu;
 };
 extern int arch_register_cpu(int num);


* [PATCH 07/10] x86: Cleanup x86_cpu_to_apicid references V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (5 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 06/10] x86: Change NR_CPUS arrays in topology V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2 travis
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: cleanup-x86_cpu_to_apicid --]
[-- Type: text/plain, Size: 5448 bytes --]

Clean up references to x86_cpu_to_apicid.  Removes extraneous
comments and standardizes on "x86_*_early_ptr" for the early
kernel init references.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - Removed extraneous casts
---
 arch/x86/kernel/genapic_64.c |   11 ++---------
 arch/x86/kernel/mpparse_64.c |   14 +++++---------
 arch/x86/kernel/setup_64.c   |    2 +-
 arch/x86/kernel/smpboot_32.c |    9 ++-------
 arch/x86/kernel/smpboot_64.c |   16 +++++++++-------
 include/asm-x86/smp_32.h     |    2 +-
 include/asm-x86/smp_64.h     |    2 +-
 7 files changed, 21 insertions(+), 35 deletions(-)

--- a/arch/x86/kernel/genapic_64.c
+++ b/arch/x86/kernel/genapic_64.c
@@ -24,17 +24,10 @@
 #include <acpi/acpi_bus.h>
 #endif
 
-/*
- * which logical CPU number maps to which CPU (physical APIC ID)
- *
- * The following static array is used during kernel startup
- * and the x86_cpu_to_apicid_ptr contains the address of the
- * array during this time.  Is it zeroed when the per_cpu
- * data area is removed.
- */
+/* which logical CPU number maps to which CPU (physical APIC ID) */
 u16 x86_cpu_to_apicid_init[NR_CPUS] __initdata
 					= { [0 ... NR_CPUS-1] = BAD_APICID };
-void *x86_cpu_to_apicid_ptr;
+void *x86_cpu_to_apicid_early_ptr;
 DEFINE_PER_CPU(u16, x86_cpu_to_apicid) = BAD_APICID;
 EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 
--- a/arch/x86/kernel/mpparse_64.c
+++ b/arch/x86/kernel/mpparse_64.c
@@ -125,15 +125,11 @@ static void __cpuinit MP_processor_info(
 		cpu = 0;
  	}
 	bios_cpu_apicid[cpu] = m->mpc_apicid;
-	/*
-	 * We get called early in the the start_kernel initialization
-	 * process when the per_cpu data area is not yet setup, so we
-	 * use a static array that is removed after the per_cpu data
-	 * area is created.
-	 */
-	if (x86_cpu_to_apicid_ptr) {
-		u16 *x86_cpu_to_apicid = x86_cpu_to_apicid_ptr;
-		x86_cpu_to_apicid[cpu] = m->mpc_apicid;
+	/* are we being called early in kernel startup? */
+	if (x86_cpu_to_apicid_early_ptr) {
+		u16 *cpu_to_apicid = x86_cpu_to_apicid_early_ptr;
+
+		cpu_to_apicid[cpu] = m->mpc_apicid;
 	} else {
 		per_cpu(x86_cpu_to_apicid, cpu) = m->mpc_apicid;
 	}
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -373,7 +373,7 @@ void __init setup_arch(char **cmdline_p)
 
 #ifdef CONFIG_SMP
 	/* setup to use the static apicid table during kernel startup */
-	x86_cpu_to_apicid_ptr = (void *)&x86_cpu_to_apicid_init;
+	x86_cpu_to_apicid_early_ptr = (void *)&x86_cpu_to_apicid_init;
 #endif
 
 #ifdef CONFIG_ACPI
--- a/arch/x86/kernel/smpboot_32.c
+++ b/arch/x86/kernel/smpboot_32.c
@@ -91,15 +91,10 @@ static cpumask_t smp_commenced_mask;
 DEFINE_PER_CPU_SHARED_ALIGNED(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
-/*
- * The following static array is used during kernel startup
- * and the x86_cpu_to_apicid_ptr contains the address of the
- * array during this time.  Is it zeroed when the per_cpu
- * data area is removed.
- */
+/* which logical CPU number maps to which CPU (physical APIC ID) */
 u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata =
 			{ [0 ... NR_CPUS-1] = BAD_APICID };
-void *x86_cpu_to_apicid_ptr;
+void *x86_cpu_to_apicid_early_ptr;
 DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
 EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 
--- a/arch/x86/kernel/smpboot_64.c
+++ b/arch/x86/kernel/smpboot_64.c
@@ -852,23 +852,25 @@ static int __init smp_sanity_check(unsig
 }
 
 /*
- * Copy apicid's found by MP_processor_info from initial array to the per cpu
- * data area.  The x86_cpu_to_apicid_init array is then expendable and the
- * x86_cpu_to_apicid_ptr is zeroed indicating that the static array is no
- * longer available.
+ * Copy data used in early init routines from the initial arrays to the
+ * per cpu data areas.  These arrays then become expendable and the
+ * *_ptrs are zeroed indicating that the static arrays are gone.
  */
 void __init smp_set_apicids(void)
 {
 	int cpu;
 
-	for_each_cpu_mask(cpu, cpu_possible_map) {
+	for_each_possible_cpu(cpu) {
 		if (per_cpu_offset(cpu))
 			per_cpu(x86_cpu_to_apicid, cpu) =
 						x86_cpu_to_apicid_init[cpu];
+		else
+			printk(KERN_NOTICE "per_cpu_offset zero for cpu %d\n",
+									cpu);
 	}
 
-	/* indicate the static array will be going away soon */
-	x86_cpu_to_apicid_ptr = NULL;
+	/* indicate the early static arrays are gone */
+	x86_cpu_to_apicid_early_ptr = NULL;
 }
 
 static void __init smp_cpu_index_default(void)
--- a/include/asm-x86/smp_32.h
+++ b/include/asm-x86/smp_32.h
@@ -30,7 +30,7 @@ extern void (*mtrr_hook) (void);
 extern void zap_low_mappings (void);
 
 extern u8 __initdata x86_cpu_to_apicid_init[];
-extern void *x86_cpu_to_apicid_ptr;
+extern void *x86_cpu_to_apicid_early_ptr;
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
--- a/include/asm-x86/smp_64.h
+++ b/include/asm-x86/smp_64.h
@@ -27,7 +27,7 @@ extern int smp_call_function_mask(cpumas
 				  void *info, int wait);
 
 extern u16 __initdata x86_cpu_to_apicid_init[];
-extern void *x86_cpu_to_apicid_ptr;
+extern void *x86_cpu_to_apicid_early_ptr;
 extern u16 bios_cpu_apicid[];
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);

-- 


* [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (6 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 07/10] x86: Cleanup x86_cpu_to_apicid references V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15 10:54   ` Andi Kleen
  2008-01-15  2:17 ` [PATCH 09/10] x86: Change NR_CPUS arrays in acpi-cpufreq V2 travis
  2008-01-15  2:17 ` [PATCH 10/10] x86: Change bios_cpu_apicid to percpu data variable V2 travis
  9 siblings, 1 reply; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: NR_CPUS-arrays-in-numa_64 --]
[-- Type: text/plain, Size: 5109 bytes --]

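Change the following static array sized by NR_CPUS to a
per_cpu data variable:

	u16 cpu_to_node_map[NR_CPUS];

Until the per_cpu areas are set up, lookups go through the static
x86_cpu_to_node_map_init[] table via x86_cpu_to_node_map_early_ptr.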
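
The lookup order used by numa_set_node() and cpu_to_node() in this patch
can be modelled in userspace C.  This is only a sketch under stated
assumptions, not the kernel code: percpu_node[], percpu_ready[],
model_early_init(), model_set_node() and model_cpu_to_node() are
hypothetical stand-ins for the per_cpu variable, the per_cpu_offset(cpu)
check and the real accessors, and NR_CPUS is shrunk to 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS      8			/* small on purpose for the sketch */
#define NUMA_NO_NODE ((uint16_t)~0)

static uint16_t node_map_init[NR_CPUS];	/* models x86_cpu_to_node_map_init[] */
static uint16_t *node_map_early_ptr;	/* non-NULL only during "early boot" */
static uint16_t percpu_node[NR_CPUS];	/* models the per_cpu variable */
static bool percpu_ready[NR_CPUS];	/* models per_cpu_offset(cpu) != 0 */

/* Put the model into its early-boot state. */
static void model_early_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		node_map_init[cpu] = NUMA_NO_NODE;
		percpu_node[cpu] = NUMA_NO_NODE;
		percpu_ready[cpu] = false;
	}
	node_map_early_ptr = node_map_init;
}

/* Write path: early table first, then per-cpu storage if it exists. */
static void model_set_node(int cpu, uint16_t node)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (node_map_early_ptr)
		node_map_early_ptr[cpu] = node;
	else if (percpu_ready[cpu])
		percpu_node[cpu] = node;
	/* else: no storage for this cpu yet, the write is dropped */
}

/* Read path: same order, falling back to NUMA_NO_NODE. */
static uint16_t model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (node_map_early_ptr)
		return node_map_early_ptr[cpu];
	if (percpu_ready[cpu])
		return percpu_node[cpu];
	return NUMA_NO_NODE;
}
```

In this model, once node_map_early_ptr is cleared a cpu without per-cpu
storage reads back NUMA_NO_NODE, which matches the last branch of
cpu_to_node() in the diff below.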


Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - Removed extraneous casts
    - Fix !NUMA builds with '#ifdef CONFIG_NUMA'
---
 arch/x86/kernel/setup_64.c   |    6 +++++-
 arch/x86/kernel/smpboot_64.c |   10 +++++++++-
 arch/x86/mm/numa_64.c        |   20 ++++++++++++++++----
 include/asm-x86/numa_64.h    |    2 --
 include/asm-x86/topology.h   |   15 +++++++++++++--
 net/sunrpc/svc.c             |    1 +
 6 files changed, 44 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -63,6 +63,7 @@
 #include <asm/cacheflush.h>
 #include <asm/mce.h>
 #include <asm/ds.h>
+#include <asm/topology.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
@@ -372,8 +373,11 @@ void __init setup_arch(char **cmdline_p)
 	io_delay_init();
 
 #ifdef CONFIG_SMP
-	/* setup to use the static apicid table during kernel startup */
+	/* setup to use the early static init tables during kernel startup */
 	x86_cpu_to_apicid_early_ptr = (void *)&x86_cpu_to_apicid_init;
+#ifdef CONFIG_NUMA
+	x86_cpu_to_node_map_early_ptr = (void *)&x86_cpu_to_node_map_init;
+#endif
 #endif
 
 #ifdef CONFIG_ACPI
--- a/arch/x86/kernel/smpboot_64.c
+++ b/arch/x86/kernel/smpboot_64.c
@@ -861,9 +861,14 @@ void __init smp_set_apicids(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		if (per_cpu_offset(cpu))
+		if (per_cpu_offset(cpu)) {
 			per_cpu(x86_cpu_to_apicid, cpu) =
 						x86_cpu_to_apicid_init[cpu];
+#ifdef CONFIG_NUMA
+			per_cpu(x86_cpu_to_node_map, cpu) =
+						x86_cpu_to_node_map_init[cpu];
+#endif
+		}
 		else
 			printk(KERN_NOTICE "per_cpu_offset zero for cpu %d\n",
 									cpu);
@@ -871,6 +876,9 @@ void __init smp_set_apicids(void)
 
 	/* indicate the early static arrays are gone */
 	x86_cpu_to_apicid_early_ptr = NULL;
+#ifdef CONFIG_NUMA
+	x86_cpu_to_node_map_early_ptr = NULL;
+#endif
 }
 
 static void __init smp_cpu_index_default(void)
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -31,10 +31,14 @@ bootmem_data_t plat_node_bdata[MAX_NUMNO
 
 struct memnode memnode;
 
-u16 cpu_to_node_map[NR_CPUS] __read_mostly = {
+u16 x86_cpu_to_node_map_init[NR_CPUS] __initdata = {
 	[0 ... NR_CPUS-1] = NUMA_NO_NODE
 };
-EXPORT_SYMBOL(cpu_to_node_map);
+void *x86_cpu_to_node_map_early_ptr;
+EXPORT_SYMBOL(x86_cpu_to_node_map_init);
+EXPORT_SYMBOL(x86_cpu_to_node_map_early_ptr);
+DEFINE_PER_CPU(u16, x86_cpu_to_node_map) = NUMA_NO_NODE;
+EXPORT_PER_CPU_SYMBOL(x86_cpu_to_node_map);
 
 u16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
@@ -545,7 +549,7 @@ void __init numa_initmem_init(unsigned l
 	node_set(0, node_possible_map);
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
-	/* we can't use cpumask_of_cpu() yet */
+	/* cpumask_of_cpu() may not be available during early startup */
 	memset(&node_to_cpumask_map[0], 0, sizeof(node_to_cpumask_map[0]));
 	cpu_set(0, node_to_cpumask_map[0]);
 	e820_register_active_regions(0, start_pfn, end_pfn);
@@ -559,8 +563,16 @@ __cpuinit void numa_add_cpu(int cpu)
 
 void __cpuinit numa_set_node(int cpu, int node)
 {
+	u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
+
 	cpu_pda(cpu)->nodenumber = node;
-	cpu_to_node_map[cpu] = node;
+
+	if(cpu_to_node_map)
+		cpu_to_node_map[cpu] = node;
+	else if(per_cpu_offset(cpu))
+		per_cpu(x86_cpu_to_node_map, cpu) = node;
+	else
+		Dprintk(KERN_INFO "Setting node for non-present cpu %d\n", cpu);
 }
 
 unsigned long __init numa_free_all_bootmem(void)
--- a/include/asm-x86/numa_64.h
+++ b/include/asm-x86/numa_64.h
@@ -40,6 +40,4 @@ static inline void clear_node_cpumask(in
 #define clear_node_cpumask(cpu) do {} while (0)
 #endif
 
-#define NUMA_NO_NODE 0xffff
-
 #endif
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -30,13 +30,24 @@
 #include <asm/mpspec.h>
 
 /* Mappings between logical cpu number and node number */
-extern u16 cpu_to_node_map[];
+DECLARE_PER_CPU(u16, x86_cpu_to_node_map);
+extern u16 __initdata x86_cpu_to_node_map_init[];
+extern void *x86_cpu_to_node_map_early_ptr;
 extern cpumask_t node_to_cpumask_map[];
 
+#define NUMA_NO_NODE	((u16)(~0))
+
 /* Returns the number of the node containing CPU 'cpu' */
 static inline int cpu_to_node(int cpu)
 {
-	return cpu_to_node_map[cpu];
+	u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
+
+	if (cpu_to_node_map)
+		return cpu_to_node_map[cpu];
+	else if(per_cpu_offset(cpu))
+		return per_cpu(x86_cpu_to_node_map, cpu);
+	else
+		return NUMA_NO_NODE;
 }
 
 /*
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/sched.h>
 
 #include <linux/sunrpc/types.h>
 #include <linux/sunrpc/xdr.h>

-- 


* [PATCH 09/10] x86: Change NR_CPUS arrays in acpi-cpufreq V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (7 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2 travis
@ 2008-01-15  2:17 ` travis
  2008-01-15  2:17 ` [PATCH 10/10] x86: Change bios_cpu_apicid to percpu data variable V2 travis
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: NR_CPUS-arrays-in-acpi-cpufreq --]
[-- Type: text/plain, Size: 4165 bytes --]

Change the following static array sized by NR_CPUS to a
per_cpu data variable:

	struct acpi_cpufreq_data *drv_data[NR_CPUS];

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - (none)
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -67,7 +67,8 @@ struct acpi_cpufreq_data {
 	unsigned int cpu_feature;
 };
 
-static struct acpi_cpufreq_data *drv_data[NR_CPUS];
+static DEFINE_PER_CPU(struct acpi_cpufreq_data *, drv_data);
+
 /* acpi_perf_data is a pointer to percpu data. */
 static struct acpi_processor_performance *acpi_perf_data;
 
@@ -218,14 +219,14 @@ static u32 get_cur_val(cpumask_t mask)
 	if (unlikely(cpus_empty(mask)))
 		return 0;
 
-	switch (drv_data[first_cpu(mask)]->cpu_feature) {
+	switch (per_cpu(drv_data, first_cpu(mask))->cpu_feature) {
 	case SYSTEM_INTEL_MSR_CAPABLE:
 		cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
 		cmd.addr.msr.reg = MSR_IA32_PERF_STATUS;
 		break;
 	case SYSTEM_IO_CAPABLE:
 		cmd.type = SYSTEM_IO_CAPABLE;
-		perf = drv_data[first_cpu(mask)]->acpi_data;
+		perf = per_cpu(drv_data, first_cpu(mask))->acpi_data;
 		cmd.addr.io.port = perf->control_register.address;
 		cmd.addr.io.bit_width = perf->control_register.bit_width;
 		break;
@@ -325,7 +326,7 @@ static unsigned int get_measured_perf(un
 
 #endif
 
-	retval = drv_data[cpu]->max_freq * perf_percent / 100;
+	retval = per_cpu(drv_data, cpu)->max_freq * perf_percent / 100;
 
 	put_cpu();
 	set_cpus_allowed(current, saved_mask);
@@ -336,7 +337,7 @@ static unsigned int get_measured_perf(un
 
 static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
 {
-	struct acpi_cpufreq_data *data = drv_data[cpu];
+	struct acpi_cpufreq_data *data = per_cpu(drv_data, cpu);
 	unsigned int freq;
 
 	dprintk("get_cur_freq_on_cpu (%d)\n", cpu);
@@ -370,7 +371,7 @@ static unsigned int check_freqs(cpumask_
 static int acpi_cpufreq_target(struct cpufreq_policy *policy,
 			       unsigned int target_freq, unsigned int relation)
 {
-	struct acpi_cpufreq_data *data = drv_data[policy->cpu];
+	struct acpi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
 	struct acpi_processor_performance *perf;
 	struct cpufreq_freqs freqs;
 	cpumask_t online_policy_cpus;
@@ -466,7 +467,7 @@ static int acpi_cpufreq_target(struct cp
 
 static int acpi_cpufreq_verify(struct cpufreq_policy *policy)
 {
-	struct acpi_cpufreq_data *data = drv_data[policy->cpu];
+	struct acpi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
 
 	dprintk("acpi_cpufreq_verify\n");
 
@@ -570,7 +571,7 @@ static int acpi_cpufreq_cpu_init(struct 
 		return -ENOMEM;
 
 	data->acpi_data = percpu_ptr(acpi_perf_data, cpu);
-	drv_data[cpu] = data;
+	per_cpu(drv_data, cpu) = data;
 
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
 		acpi_cpufreq_driver.flags |= CPUFREQ_CONST_LOOPS;
@@ -714,20 +715,20 @@ err_unreg:
 	acpi_processor_unregister_performance(perf, cpu);
 err_free:
 	kfree(data);
-	drv_data[cpu] = NULL;
+	per_cpu(drv_data, cpu) = NULL;
 
 	return result;
 }
 
 static int acpi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
-	struct acpi_cpufreq_data *data = drv_data[policy->cpu];
+	struct acpi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
 
 	dprintk("acpi_cpufreq_cpu_exit\n");
 
 	if (data) {
 		cpufreq_frequency_table_put_attr(policy->cpu);
-		drv_data[policy->cpu] = NULL;
+		per_cpu(drv_data, policy->cpu) = NULL;
 		acpi_processor_unregister_performance(data->acpi_data,
 						      policy->cpu);
 		kfree(data);
@@ -738,7 +739,7 @@ static int acpi_cpufreq_cpu_exit(struct 
 
 static int acpi_cpufreq_resume(struct cpufreq_policy *policy)
 {
-	struct acpi_cpufreq_data *data = drv_data[policy->cpu];
+	struct acpi_cpufreq_data *data = per_cpu(drv_data, policy->cpu);
 
 	dprintk("acpi_cpufreq_resume\n");
 

-- 


* [PATCH 10/10] x86: Change bios_cpu_apicid to percpu data variable V2
  2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
                   ` (8 preceding siblings ...)
  2008-01-15  2:17 ` [PATCH 09/10] x86: Change NR_CPUS arrays in acpi-cpufreq V2 travis
@ 2008-01-15  2:17 ` travis
  9 siblings, 0 replies; 15+ messages in thread
From: travis @ 2008-01-15  2:17 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen, mingo
  Cc: Christoph Lameter, Jack Steiner, linux-mm, linux-kernel

[-- Attachment #1: change-bios_cpu_apicid-to-percpu --]
[-- Type: text/plain, Size: 5632 bytes --]

Change the static bios_cpu_apicid array to a per_cpu data variable.
A static array is still used during early initialization, in the same
way x86_cpu_to_apicid[] is handled.

There is one early use of bios_cpu_apicid, in apic_is_clustered_box().
The other reference, in cpu_present_to_apicid(), is called after
smp_set_apicids() has set up the percpu version of bios_cpu_apicid.
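
The handoff from the early static array to the percpu copies can be
sketched in userspace C.  This is a simplified model and not the kernel
implementation: the model_* functions, percpu_bios_apicid[] and
percpu_ready[] are hypothetical names, percpu_ready[] stands in for both
per_cpu_offset(cpu) and cpu_present(cpu), and the BAD_APICID value is
assumed for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS    8			/* small on purpose for the sketch */
#define BAD_APICID ((uint16_t)0xFFFF)	/* assumed sentinel value */

static uint16_t bios_apicid_init[NR_CPUS];	/* models x86_bios_cpu_apicid_init[] */
static uint16_t *bios_apicid_early_ptr;		/* models x86_bios_cpu_apicid_early_ptr */
static uint16_t percpu_bios_apicid[NR_CPUS];	/* models the per_cpu variable */
static bool percpu_ready[NR_CPUS];		/* cpu has per-cpu storage */

/* setup_arch() step: point the early pointer at the static table. */
static void model_setup_arch(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		bios_apicid_init[cpu] = BAD_APICID;
		percpu_bios_apicid[cpu] = BAD_APICID;
		percpu_ready[cpu] = false;
	}
	bios_apicid_early_ptr = bios_apicid_init;
}

/*
 * smp_set_apicids() step: copy the early values into per-cpu storage,
 * then clear the early pointer.  Returns the number of cpus copied.
 */
static int model_smp_set_apicids(void)
{
	int copied = 0;

	/* the handoff is expected to run exactly once */
	assert(bios_apicid_early_ptr == bios_apicid_init);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!percpu_ready[cpu])
			continue;	/* the patch prints a notice here */
		percpu_bios_apicid[cpu] = bios_apicid_init[cpu];
		copied++;
	}
	bios_apicid_early_ptr = NULL;
	return copied;
}

/* Late lookup: only meaningful after the handoff has run. */
static uint16_t model_cpu_present_to_apicid(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS || !percpu_ready[cpu])
		return BAD_APICID;
	return percpu_bios_apicid[cpu];
}
```

A late lookup made before model_smp_set_apicids() runs sees only
BAD_APICID in this model, which is why the ordering described above
matters.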


Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
V1->V2:
    - Removed extraneous casts
    - Add slight optimization to apic_is_clustered_box()
      [don't reference x86_bios_cpu_apicid_early_ptr each pass.]
---
 arch/x86/kernel/apic_64.c    |   17 +++++++++++++++--
 arch/x86/kernel/mpparse_64.c |   12 +++++++++---
 arch/x86/kernel/setup_64.c   |    1 +
 arch/x86/kernel/smpboot_64.c |    3 +++
 include/asm-x86/smp_64.h     |    8 +++++---
 5 files changed, 33 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -1150,19 +1150,32 @@ __cpuinit int apic_is_clustered_box(void
 {
 	int i, clusters, zeros;
 	unsigned id;
+	u16 *bios_cpu_apicid = x86_bios_cpu_apicid_early_ptr;
 	DECLARE_BITMAP(clustermap, NUM_APIC_CLUSTERS);
 
 	bitmap_zero(clustermap, NUM_APIC_CLUSTERS);
 
 	for (i = 0; i < NR_CPUS; i++) {
-		id = bios_cpu_apicid[i];
+		/* are we being called early in kernel startup? */
+		if (bios_cpu_apicid) {
+			id = bios_cpu_apicid[i];
+		}
+		else if (i < nr_cpu_ids) {
+			if (cpu_present(i))
+				id = per_cpu(x86_bios_cpu_apicid, i);
+			else
+				continue;
+		}
+		else
+			break;
+
 		if (id != BAD_APICID)
 			__set_bit(APIC_CLUSTERID(id), clustermap);
 	}
 
 	/* Problem:  Partially populated chassis may not have CPUs in some of
 	 * the APIC clusters they have been allocated.  Only present CPUs have
-	 * bios_cpu_apicid entries, thus causing zeroes in the bitmap.  Since
+	 * x86_bios_cpu_apicid entries, thus causing zeroes in the bitmap.  Since
 	 * clusters are allocated sequentially, count zeros only if they are
 	 * bounded by ones.
 	 */
--- a/arch/x86/kernel/mpparse_64.c
+++ b/arch/x86/kernel/mpparse_64.c
@@ -67,7 +67,11 @@ unsigned disabled_cpus __cpuinitdata;
 /* Bitmask of physically existing CPUs */
 physid_mask_t phys_cpu_present_map = PHYSID_MASK_NONE;
 
-u16 bios_cpu_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
+u16 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
+				= { [0 ... NR_CPUS-1] = BAD_APICID };
+void *x86_bios_cpu_apicid_early_ptr;
+DEFINE_PER_CPU(u16, x86_bios_cpu_apicid) = BAD_APICID;
+EXPORT_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
 
 
 /*
@@ -118,20 +122,22 @@ static void __cpuinit MP_processor_info(
 	physid_set(m->mpc_apicid, phys_cpu_present_map);
  	if (m->mpc_cpuflag & CPU_BOOTPROCESSOR) {
  		/*
- 		 * bios_cpu_apicid is required to have processors listed
+		 * x86_bios_cpu_apicid is required to have processors listed
  		 * in same order as logical cpu numbers. Hence the first
  		 * entry is BSP, and so on.
  		 */
 		cpu = 0;
  	}
-	bios_cpu_apicid[cpu] = m->mpc_apicid;
 	/* are we being called early in kernel startup? */
 	if (x86_cpu_to_apicid_early_ptr) {
 		u16 *cpu_to_apicid = x86_cpu_to_apicid_early_ptr;
+		u16 *bios_cpu_apicid = x86_bios_cpu_apicid_early_ptr;
 
 		cpu_to_apicid[cpu] = m->mpc_apicid;
+		bios_cpu_apicid[cpu] = m->mpc_apicid;
 	} else {
 		per_cpu(x86_cpu_to_apicid, cpu) = m->mpc_apicid;
+		per_cpu(x86_bios_cpu_apicid, cpu) = m->mpc_apicid;
 	}
 
 	cpu_set(cpu, cpu_possible_map);
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -375,6 +375,7 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_SMP
 	/* setup to use the early static init tables during kernel startup */
 	x86_cpu_to_apicid_early_ptr = (void *)&x86_cpu_to_apicid_init;
+	x86_bios_cpu_apicid_early_ptr = (void *)&x86_bios_cpu_apicid_init;
 #ifdef CONFIG_NUMA
 	x86_cpu_to_node_map_early_ptr = (void *)&x86_cpu_to_node_map_init;
 #endif
--- a/arch/x86/kernel/smpboot_64.c
+++ b/arch/x86/kernel/smpboot_64.c
@@ -864,6 +864,8 @@ void __init smp_set_apicids(void)
 		if (per_cpu_offset(cpu)) {
 			per_cpu(x86_cpu_to_apicid, cpu) =
 						x86_cpu_to_apicid_init[cpu];
+			per_cpu(x86_bios_cpu_apicid, cpu) =
+						x86_bios_cpu_apicid_init[cpu];
 #ifdef CONFIG_NUMA
 			per_cpu(x86_cpu_to_node_map, cpu) =
 						x86_cpu_to_node_map_init[cpu];
@@ -876,6 +878,7 @@ void __init smp_set_apicids(void)
 
 	/* indicate the early static arrays are gone */
 	x86_cpu_to_apicid_early_ptr = NULL;
+	x86_bios_cpu_apicid_early_ptr = NULL;
 #ifdef CONFIG_NUMA
 	x86_cpu_to_node_map_early_ptr = NULL;
 #endif
--- a/include/asm-x86/smp_64.h
+++ b/include/asm-x86/smp_64.h
@@ -27,18 +27,20 @@ extern int smp_call_function_mask(cpumas
 				  void *info, int wait);
 
 extern u16 __initdata x86_cpu_to_apicid_init[];
+extern u16 __initdata x86_bios_cpu_apicid_init[];
 extern void *x86_cpu_to_apicid_early_ptr;
-extern u16 bios_cpu_apicid[];
+extern void *x86_bios_cpu_apicid_early_ptr;
 
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
 DECLARE_PER_CPU(u16, cpu_llc_id);
 DECLARE_PER_CPU(u16, x86_cpu_to_apicid);
+DECLARE_PER_CPU(u16, x86_bios_cpu_apicid);
 
 static inline int cpu_present_to_apicid(int mps_cpu)
 {
-	if (mps_cpu < NR_CPUS)
-		return (int)bios_cpu_apicid[mps_cpu];
+	if (cpu_present(mps_cpu))
+		return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
 	else
 		return BAD_APICID;
 }

-- 


* Re: [PATCH 02/10] x86: Change size of node ids from u8 to u16 V2
  2008-01-15  2:17 ` [PATCH 02/10] x86: Change size of node ids " travis
@ 2008-01-15  5:59   ` Eric Dumazet
  2008-01-15 15:51     ` Mike Travis
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2008-01-15  5:59 UTC (permalink / raw)
  To: travis
  Cc: Andrew Morton, Andi Kleen, mingo, Christoph Lameter,
	Jack Steiner, linux-mm, linux-kernel

travis@sgi.com wrote:
> Change the size of node ids from 8 bits to 16 bits to
> accommodate more than 256 nodes.
> 
> Signed-off-by: Mike Travis <travis@sgi.com>
> Reviewed-by: Christoph Lameter <clameter@sgi.com>
> ---
> V1->V2:
>     - changed pxm_to_node_map to u16
>     - changed memnode map entries to u16
> ---
>  arch/x86/mm/numa_64.c       |    9 ++++++---
>  arch/x86/mm/srat_64.c       |    2 +-
>  drivers/acpi/numa.c         |    2 +-
>  include/asm-x86/mmzone_64.h |    4 ++--
>  include/asm-x86/numa_64.h   |    4 ++--
>  include/asm-x86/topology.h  |    2 +-
>  6 files changed, 13 insertions(+), 10 deletions(-)
> 
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -11,6 +11,7 @@
>  #include <linux/ctype.h>
>  #include <linux/module.h>
>  #include <linux/nodemask.h>
> +#include <linux/sched.h>
>  
>  #include <asm/e820.h>
>  #include <asm/proto.h>
> @@ -30,12 +31,12 @@ bootmem_data_t plat_node_bdata[MAX_NUMNO
>  
>  struct memnode memnode;
>  
> -int cpu_to_node_map[NR_CPUS] __read_mostly = {
> +u16 cpu_to_node_map[NR_CPUS] __read_mostly = {
>  	[0 ... NR_CPUS-1] = NUMA_NO_NODE
>  };
>  EXPORT_SYMBOL(cpu_to_node_map);
>  
> -unsigned char apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
> +u16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
>  	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
>  };
>  
> @@ -544,7 +545,9 @@ void __init numa_initmem_init(unsigned l
>  	node_set(0, node_possible_map);
>  	for (i = 0; i < NR_CPUS; i++)
>  		numa_set_node(i, 0);
> -	node_to_cpumask_map[0] = cpumask_of_cpu(0);
> +	/* we can't use cpumask_of_cpu() yet */
> +	memset(&node_to_cpumask_map[0], 0, sizeof(node_to_cpumask_map[0]));
> +	cpu_set(0, node_to_cpumask_map[0]);
>  	e820_register_active_regions(0, start_pfn, end_pfn);
>  	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
>  }
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -395,7 +395,7 @@ int __init acpi_scan_nodes(unsigned long
>  static int fake_node_to_pxm_map[MAX_NUMNODES] __initdata = {
>  	[0 ... MAX_NUMNODES-1] = PXM_INVAL
>  };
> -static unsigned char fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
> +static u16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
>  	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
>  };
>  static int __init find_node_by_addr(unsigned long addr)
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -38,7 +38,7 @@ ACPI_MODULE_NAME("numa");
>  static nodemask_t nodes_found_map = NODE_MASK_NONE;
>  
>  /* maps to convert between proximity domain and logical node ID */
> -static int pxm_to_node_map[MAX_PXM_DOMAINS]
> +static u16 pxm_to_node_map[MAX_PXM_DOMAINS]
>  				= { [0 ... MAX_PXM_DOMAINS - 1] = NID_INVAL };
>  static int node_to_pxm_map[MAX_NUMNODES]
>  				= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
> --- a/include/asm-x86/mmzone_64.h
> +++ b/include/asm-x86/mmzone_64.h
> @@ -15,8 +15,8 @@
>  struct memnode {
>  	int shift;
>  	unsigned int mapsize;
> -	u8 *map;
> -	u8 embedded_map[64-16];
> +	u16 *map;
> +	u16 embedded_map[64-16];

Must change to 32-8 here, or 64-8 and change the comment (total size = 128 
bytes). If you change to 32-8, check how .map is set to embedded_map.
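
The arithmetic can be checked with a small userspace sketch.  It is only
a model: it assumes an LP64 target and a 64-byte cache line, uses C11
_Alignas in place of ____cacheline_aligned, and the three struct names
and embedded_entries_for() are made up for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed cache line size */

/* As posted: u16 entries, array length left at 64-16. */
struct memnode_posted {
	_Alignas(CACHELINE) int shift;
	unsigned int mapsize;
	uint16_t *map;
	uint16_t embedded_map[64 - 16];
};

/* First suggestion: 32-8 entries. */
struct memnode_32_8 {
	_Alignas(CACHELINE) int shift;
	unsigned int mapsize;
	uint16_t *map;
	uint16_t embedded_map[32 - 8];
};

/* Second suggestion: 64-8 entries. */
struct memnode_64_8 {
	_Alignas(CACHELINE) int shift;
	unsigned int mapsize;
	uint16_t *map;
	uint16_t embedded_map[64 - 8];
};

/* Number of u16 entries that fit if the struct may use `total` bytes. */
static size_t embedded_entries_for(size_t total)
{
	size_t header = offsetof(struct memnode_posted, embedded_map);

	assert(total >= header);
	return (total - header) / sizeof(uint16_t);
}
```

Under those assumptions the fields in front of embedded_map[] take 16
bytes, so 32-8 entries give 16 + 24*2 = 64 bytes and 64-8 entries give
16 + 56*2 = 128 bytes.  The as-posted layout comes to 16 + 48*2 = 112
bytes, which the alignment pads up to 128, so the "total size = 64 bytes"
comment no longer holds for it.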

>  } ____cacheline_aligned; /* total size = 64 bytes */
>  extern struct memnode memnode;
>  #define memnode_shift memnode.shift
> --- a/include/asm-x86/numa_64.h
> +++ b/include/asm-x86/numa_64.h
> @@ -20,7 +20,7 @@ extern void numa_set_node(int cpu, int n
>  extern void srat_reserve_add_area(int nodeid);
>  extern int hotadd_percent;
>  
> -extern unsigned char apicid_to_node[MAX_LOCAL_APIC];
> +extern u16 apicid_to_node[MAX_LOCAL_APIC];
>  
>  extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
>  extern unsigned long numa_free_all_bootmem(void);
> @@ -40,6 +40,6 @@ static inline void clear_node_cpumask(in
>  #define clear_node_cpumask(cpu) do {} while (0)
>  #endif
>  
> -#define NUMA_NO_NODE 0xff
> +#define NUMA_NO_NODE 0xffff
>  
>  #endif
> --- a/include/asm-x86/topology.h
> +++ b/include/asm-x86/topology.h
> @@ -30,7 +30,7 @@
>  #include <asm/mpspec.h>
>  
>  /* Mappings between logical cpu number and node number */
> -extern int cpu_to_node_map[];
> +extern u16 cpu_to_node_map[];
>  extern cpumask_t node_to_cpumask_map[];
>  
>  /* Returns the number of the node containing CPU 'cpu' */
> 


* Re: [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2
  2008-01-15  2:17 ` [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2 travis
@ 2008-01-15 10:54   ` Andi Kleen
  2008-01-15 22:23     ` Mike Travis
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-01-15 10:54 UTC (permalink / raw)
  To: travis
  Cc: Andrew Morton, mingo, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

travis@sgi.com writes:
> +
>  /* Returns the number of the node containing CPU 'cpu' */
>  static inline int cpu_to_node(int cpu)
>  {
> -	return cpu_to_node_map[cpu];
> +	u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
> +
> +	if (cpu_to_node_map)
> +		return cpu_to_node_map[cpu];
> +	else if(per_cpu_offset(cpu))
> +		return per_cpu(x86_cpu_to_node_map, cpu);
> +	else
> +		return NUMA_NO_NODE;

Seems a little big now to be still inlined.

Also I wonder if there are really so many early callers that it
isn't feasible to just convert them to an early_cpu_to_node(). Also
early_cpu_to_node() should really not be speed critical, so just
linearly searching some other table instead of setting up an explicit
array should be fine for that.
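
A rough userspace sketch of that split, to make the idea concrete.
Everything here is hypothetical: struct early_cpu_node, early_table[] and
the model_* functions are invented for illustration, and which existing
kernel table would actually be searched is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS      8			/* small on purpose for the sketch */
#define NUMA_NO_NODE ((uint16_t)~0)

/* Hypothetical boot-time record of a cpu -> node mapping. */
struct early_cpu_node {
	int cpu;
	uint16_t node;
};

static struct early_cpu_node early_table[NR_CPUS];
static size_t early_count;

static uint16_t percpu_node[NR_CPUS];	/* models the per_cpu variable */

/* Record a mapping found during early boot; -1 if the table is full. */
static int model_early_record_node(int cpu, uint16_t node)
{
	if (early_count >= NR_CPUS)
		return -1;
	early_table[early_count].cpu = cpu;
	early_table[early_count].node = node;
	early_count++;
	return 0;
}

/* Slow path for early callers only: linear search, no per-cpu access. */
static uint16_t model_early_cpu_to_node(int cpu)
{
	for (size_t i = 0; i < early_count; i++)
		if (early_table[i].cpu == cpu)
			return early_table[i].node;
	return NUMA_NO_NODE;
}

/* Fast path for everyone else: one read, small enough to stay inline. */
static inline uint16_t model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return percpu_node[cpu];
}
```

The trade-off in this sketch is that the fast path loses both branches,
while every early caller has to be found and converted by hand.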

-Andi


* Re: [PATCH 02/10] x86: Change size of node ids from u8 to u16 V2
  2008-01-15  5:59   ` Eric Dumazet
@ 2008-01-15 15:51     ` Mike Travis
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Travis @ 2008-01-15 15:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Andi Kleen, mingo, Christoph Lameter,
	Jack Steiner, linux-mm, linux-kernel

Eric Dumazet wrote:
>
>> --- a/include/asm-x86/mmzone_64.h
>> +++ b/include/asm-x86/mmzone_64.h
>> @@ -15,8 +15,8 @@
>>  struct memnode {
>>      int shift;
>>      unsigned int mapsize;
>> -    u8 *map;
>> -    u8 embedded_map[64-16];
>> +    u16 *map;
>> +    u16 embedded_map[64-16];
> 
> Must change to 32-8 here, or 64-8 and change the comment (total size =
> 128 bytes). If you change to 32-8, check how .map is set to embedded_map.
> 
>>  } ____cacheline_aligned; /* total size = 64 bytes */
>>  extern struct memnode memnode;
>>  #define memnode_shift memnode.shift

Thanks! 


* Re: [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2
  2008-01-15 10:54   ` Andi Kleen
@ 2008-01-15 22:23     ` Mike Travis
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Travis @ 2008-01-15 22:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, mingo, Christoph Lameter, Jack Steiner, linux-mm,
	linux-kernel

Andi Kleen wrote:
> travis@sgi.com writes:
>> +
>>  /* Returns the number of the node containing CPU 'cpu' */
>>  static inline int cpu_to_node(int cpu)
>>  {
>> -	return cpu_to_node_map[cpu];
>> +	u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
>> +
>> +	if (cpu_to_node_map)
>> +		return cpu_to_node_map[cpu];
>> +	else if(per_cpu_offset(cpu))
>> +		return per_cpu(x86_cpu_to_node_map, cpu);
>> +	else
>> +		return NUMA_NO_NODE;
> 
> Seems a little big now to be still inlined.
> 
> Also I wonder if there are really so many early callers that it
> isn't feasible to just convert them to an early_cpu_to_node(). Also
> early_cpu_to_node() should really not be speed critical, so just
> linearly searching some other table instead of setting up an explicit
> array should be fine for that.
> 
> -Andi


Is this what you had in mind?  (It's still panicking early in kernel startup,
so it's not quite done.)

There are a fair number of early callers of cpu_to_node(), particularly when
HOTPLUG_CPU is enabled.

One note: I plan to optimize the check for "earliness" with a flag that can
be checked on the local node instead of always going back to node 0 to check
for a non-null "early_ptr".  That should shorten the inline functions quite a
bit, which may make these mods unnecessary.
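
To make the split concrete, here is a minimal user-space mock of the two
lookups. It is a sketch only, not kernel code: the arrays and the mock_*
helpers are hypothetical stand-ins for x86_cpu_to_node_map_early_ptr,
per_cpu(x86_cpu_to_node_map, cpu) and per_cpu_offset(cpu), and NR_CPUS is
kept small for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS		8			/* small value for the mock */
#define NUMA_NO_NODE	((uint16_t)(~0))

/* Hypothetical stand-ins for the kernel objects named in the patch. */
static uint16_t early_map[NR_CPUS];	/* boot-time table */
static uint16_t *early_ptr;		/* x86_cpu_to_node_map_early_ptr */
static uint16_t percpu_node[NR_CPUS];	/* per_cpu(x86_cpu_to_node_map, cpu) */
static int percpu_ready[NR_CPUS];	/* per_cpu_offset(cpu) != 0 */

/* Start of boot: no node known, only the early table is usable. */
void mock_boot_init(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		early_map[i] = NUMA_NO_NODE;
		percpu_node[i] = NUMA_NO_NODE;
		percpu_ready[i] = 0;
	}
	early_ptr = early_map;
}

/* Record a node in the early table (what early NUMA setup would do). */
void mock_set_early_node(int cpu, uint16_t node)
{
	early_map[cpu] = node;
}

/* Copy the early table into the per-cpu slots, then retire the pointer. */
void mock_percpu_setup(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		percpu_node[i] = early_map[i];
		percpu_ready[i] = 1;
	}
	early_ptr = NULL;
}

/* Usable at any time: early table first, then per-cpu data. */
int early_cpu_to_node(int cpu)
{
	if (early_ptr)
		return early_ptr[cpu];
	else if (percpu_ready[cpu])
		return percpu_node[cpu];
	else
		return NUMA_NO_NODE;
}

/* Shorter variant: only valid once the per-cpu area exists. */
int cpu_to_node(int cpu)
{
	if (percpu_ready[cpu])
		return percpu_node[cpu];
	else
		return NUMA_NO_NODE;
}
```

In this mock, a caller that runs before mock_percpu_setup() gets NUMA_NO_NODE
from cpu_to_node() even when the early table already holds a node, which is
the reason the early callers in the diff below are switched over to
early_cpu_to_node().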

Thanks,
Mike
---

diff -pur V2/arch/x86/kernel/setup64.c V3/arch/x86/kernel/setup64.c
--- V2/arch/x86/kernel/setup64.c	2008-01-15 14:06:03.000000000 -0800
+++ V3/arch/x86/kernel/setup64.c	2008-01-15 14:01:09.000000000 -0800
@@ -100,12 +100,12 @@ void __init setup_per_cpu_areas(void)
 	for_each_cpu_mask (i, cpu_possible_map) {
 		char *ptr;
 
-		if (!NODE_DATA(cpu_to_node(i))) {
+		if (!NODE_DATA(early_cpu_to_node(i))) {
 			printk("cpu with no node %d, num_online_nodes %d\n",
 			       i, num_online_nodes());
 			ptr = alloc_bootmem_pages(size);
 		} else { 
-			ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
+			ptr = alloc_bootmem_pages_node(NODE_DATA(early_cpu_to_node(i)), size);
 		}
 		if (!ptr)
 			panic("Cannot allocate cpu data for CPU %d\n", i);
diff -pur V2/arch/x86/kernel/smpboot_64.c V3/arch/x86/kernel/smpboot_64.c
--- V2/arch/x86/kernel/smpboot_64.c	2008-01-15 14:06:19.000000000 -0800
+++ V3/arch/x86/kernel/smpboot_64.c	2008-01-15 14:01:09.000000000 -0800
@@ -569,7 +569,7 @@ static int __cpuinit do_boot_cpu(int cpu
 	/* Allocate node local memory for AP pdas */
 	if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
 		struct x8664_pda *newpda, *pda;
-		int node = cpu_to_node(cpu);
+		int node = early_cpu_to_node(cpu);
 		pda = cpu_pda(cpu);
 		newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
 				      node);
@@ -702,7 +702,7 @@ do_rest:
 	if (boot_error) {
 		cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
 		clear_bit(cpu, (unsigned long *)&cpu_initialized); /* was set by cpu_init() */
-		clear_node_cpumask(cpu); /* was set by numa_add_cpu */
+		clear_bit(cpu, (unsigned long *)&node_to_cpumask_map[early_cpu_to_node(cpu)]);
 		cpu_clear(cpu, cpu_present_map);
 		cpu_clear(cpu, cpu_possible_map);
 		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
@@ -1060,7 +1060,7 @@ void remove_cpu_from_maps(void)
 	cpu_clear(cpu, cpu_callout_map);
 	cpu_clear(cpu, cpu_callin_map);
 	clear_bit(cpu, (unsigned long *)&cpu_initialized); /* was set by cpu_init() */
-	clear_node_cpumask(cpu);
+	clear_bit(cpu, (unsigned long *)&node_to_cpumask_map[cpu_to_node(cpu)]);
 }
 
 int __cpu_disable(void)
diff -pur V2/arch/x86/kernel/vsyscall_64.c V3/arch/x86/kernel/vsyscall_64.c
--- V2/arch/x86/kernel/vsyscall_64.c	2008-01-15 14:06:03.000000000 -0800
+++ V3/arch/x86/kernel/vsyscall_64.c	2008-01-15 14:01:09.000000000 -0800
@@ -289,7 +289,7 @@ static void __cpuinit vsyscall_set_cpu(i
 	unsigned long *d;
 	unsigned long node = 0;
 #ifdef CONFIG_NUMA
-	node = cpu_to_node(cpu);
+	node = early_cpu_to_node(cpu);
 #endif
 	if (cpu_has(&cpu_data(cpu), X86_FEATURE_RDTSCP))
 		write_rdtscp_aux((node << 12) | cpu);
diff -pur V2/arch/x86/mm/numa_64.c V3/arch/x86/mm/numa_64.c
--- V2/arch/x86/mm/numa_64.c	2008-01-15 14:06:19.000000000 -0800
+++ V3/arch/x86/mm/numa_64.c	2008-01-15 14:01:09.000000000 -0800
@@ -281,7 +281,7 @@ void __init numa_init_array(void)
 
 	rr = first_node(node_online_map);
 	for (i = 0; i < NR_CPUS; i++) {
-		if (cpu_to_node(i) != NUMA_NO_NODE)
+		if (early_cpu_to_node(i) != NUMA_NO_NODE)
 			continue;
 		numa_set_node(i, rr);
 		rr = next_node(rr, node_online_map);
@@ -558,7 +558,7 @@ void __init numa_initmem_init(unsigned l
 
 __cpuinit void numa_add_cpu(int cpu)
 {
-	set_bit(cpu, (unsigned long *)&node_to_cpumask_map[cpu_to_node(cpu)]);
+	set_bit(cpu, (unsigned long *)&node_to_cpumask_map[early_cpu_to_node(cpu)]);
 }
 
 void __cpuinit numa_set_node(int cpu, int node)
diff -pur V2/arch/x86/mm/srat_64.c V3/arch/x86/mm/srat_64.c
--- V2/arch/x86/mm/srat_64.c	2008-01-15 14:06:18.000000000 -0800
+++ V3/arch/x86/mm/srat_64.c	2008-01-15 14:01:09.000000000 -0800
@@ -382,9 +382,10 @@ int __init acpi_scan_nodes(unsigned long
 			setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 
 	for (i = 0; i < NR_CPUS; i++) {
-		if (cpu_to_node(i) == NUMA_NO_NODE)
+		int node = early_cpu_to_node(i);
+		if (node == NUMA_NO_NODE)
 			continue;
-		if (!node_isset(cpu_to_node(i), node_possible_map))
+		if (!node_isset(node, node_possible_map))
 			numa_set_node(i, NUMA_NO_NODE);
 	}
 	numa_init_array();
diff -pur V2/include/asm-x86/numa_64.h V3/include/asm-x86/numa_64.h
--- V2/include/asm-x86/numa_64.h	2008-01-15 14:06:19.000000000 -0800
+++ V3/include/asm-x86/numa_64.h	2008-01-15 14:01:09.000000000 -0800
@@ -29,15 +29,8 @@ extern void setup_node_bootmem(int nodei
 
 #ifdef CONFIG_NUMA
 extern void __init init_cpu_to_node(void);
-
-static inline void clear_node_cpumask(int cpu)
-{
-	clear_bit(cpu, (unsigned long *)&node_to_cpumask_map[cpu_to_node(cpu)]);
-}
-
 #else
 #define init_cpu_to_node() do {} while (0)
-#define clear_node_cpumask(cpu) do {} while (0)
 #endif
 
 #endif
diff -pur V2/include/asm-x86/topology.h V3/include/asm-x86/topology.h
--- V2/include/asm-x86/topology.h	2008-01-15 14:06:19.000000000 -0800
+++ V3/include/asm-x86/topology.h	2008-01-15 14:01:09.000000000 -0800
@@ -38,7 +38,8 @@ extern cpumask_t node_to_cpumask_map[];
 #define NUMA_NO_NODE	((u16)(~0))
 
 /* Returns the number of the node containing CPU 'cpu' */
-static inline int cpu_to_node(int cpu)
+static inline int early_cpu_to_node(int cpu)
 {
 	u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
 
@@ -50,6 +51,15 @@ static inline int cpu_to_node(int cpu)
 		return NUMA_NO_NODE;
 }
 
+static inline int cpu_to_node(int cpu)
+{
+	if(per_cpu_offset(cpu))
+		return per_cpu(x86_cpu_to_node_map, cpu);
+	else
+		return NUMA_NO_NODE;
+}
+
 /*
  * Returns the number of the node containing Node 'node'. This
  * architecture is flat, so it is a pretty simple function!

diff -pur V2/include/linux/mmzone.h V3/include/linux/mmzone.h
--- V2/include/linux/mmzone.h	2008-01-15 14:06:03.000000000 -0800
+++ V3/include/linux/mmzone.h	2008-01-15 14:01:09.000000000 -0800
@@ -692,6 +692,7 @@ extern char numa_zonelist_order[];
 /* Returns the number of the current Node. */
 #ifndef numa_node_id
 #define numa_node_id()		(cpu_to_node(raw_smp_processor_id()))
+#define early_numa_node_id()	(early_cpu_to_node(raw_smp_processor_id()))
 #endif
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-01-15 22:23 UTC | newest]

Thread overview: 15+ messages
2008-01-15  2:17 [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs V2 travis
2008-01-15  2:17 ` [PATCH 01/10] x86: Change size of APICIDs from u8 to u16 V2 travis
2008-01-15  2:17 ` [PATCH 02/10] x86: Change size of node ids " travis
2008-01-15  5:59   ` Eric Dumazet
2008-01-15 15:51     ` Mike Travis
2008-01-15  2:17 ` [PATCH 03/10] x86: Change NR_CPUS arrays in powernow-k8 V2 travis
2008-01-15  2:17 ` [PATCH 04/10] x86: Change NR_CPUS arrays in intel_cacheinfo V2 travis
2008-01-15  2:17 ` [PATCH 05/10] x86: Change NR_CPUS arrays in smpboot_64 V2 travis
2008-01-15  2:17 ` [PATCH 06/10] x86: Change NR_CPUS arrays in topology V2 travis
2008-01-15  2:17 ` [PATCH 07/10] x86: Cleanup x86_cpu_to_apicid references V2 travis
2008-01-15  2:17 ` [PATCH 08/10] x86: Change NR_CPUS arrays in numa_64 V2 travis
2008-01-15 10:54   ` Andi Kleen
2008-01-15 22:23     ` Mike Travis
2008-01-15  2:17 ` [PATCH 09/10] x86: Change NR_CPUS arrays in acpi-cpufreq V2 travis
2008-01-15  2:17 ` [PATCH 10/10] x86: Change bios_cpu_apicid to percpu data variable V2 travis
