* [PATCH 01/07] i386: srat non acpi
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
@ 2005-09-30 7:33 ` Magnus Damm, Magnus Damm
2005-09-30 7:33 ` [PATCH 02/07] i386: numa on non-smp Magnus Damm, Magnus Damm
From: Magnus Damm, Magnus Damm @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
This patch adds code to check the return value of acpi_find_root_pointer().
Without this patch, systems without ACPI support, such as QEMU, crash when
booting a NUMA kernel configured with CONFIG_ACPI_SRAT=y.
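To illustrate the failure mode, here is a minimal standalone sketch (not the kernel code). A lookup reports failure and leaves its output untouched, and the caller checks the status before it uses that output. find_root_pointer(), get_memcfg_sketch(), the STATUS_* macros and the address value are hypothetical stand-ins, not the ACPICA API.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are NOT the ACPICA types or macros. */
typedef int status_t;
#define STATUS_OK        0
#define STATUS_NOT_FOUND 1
#define STATUS_FAILED(s) ((s) != STATUS_OK)
#define PHYSICAL_POINTER 1

struct root_pointer {
	int pointer_type;
	unsigned long address;
};

/* Hypothetical lookup: without ACPI (as on QEMU) it fails and leaves
 * *out untouched, so the caller must not trust *out afterwards. */
static status_t find_root_pointer(int have_acpi, struct root_pointer *out)
{
	if (!have_acpi)
		return STATUS_NOT_FOUND;
	out->pointer_type = PHYSICAL_POINTER;
	out->address = 0xf0000UL;
	return STATUS_OK;
}

/* Returns 1 when tables were found, 0 so the caller can fall back to
 * another memory configuration method (the patch's "goto out_err"). */
static int get_memcfg_sketch(int have_acpi, struct root_pointer *rp)
{
	if (STATUS_FAILED(find_root_pointer(have_acpi, rp)))
		return 0;
	assert(rp->pointer_type == PHYSICAL_POINTER);
	return 1;
}
```

In the failing case the output structure is never written; that is the situation in which the unpatched code went on to use rsdp_address anyway.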
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
srat.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletion(-)
--- from-0002/arch/i386/kernel/srat.c
+++ to-work/arch/i386/kernel/srat.c 2005-09-28 15:59:13.000000000 +0900
@@ -327,7 +327,12 @@ int __init get_memcfg_from_srat(void)
int tables = 0;
int i = 0;
- acpi_find_root_pointer(ACPI_PHYSICAL_ADDRESSING, rsdp_address);
+ if (ACPI_FAILURE(acpi_find_root_pointer(ACPI_PHYSICAL_ADDRESSING,
+ rsdp_address))) {
+ printk("%s: System description tables not found\n",
+ __FUNCTION__);
+ goto out_err;
+ }
if (rsdp_address->pointer_type == ACPI_PHYSICAL_POINTER) {
printk("%s: assigning address to rsdp\n", __FUNCTION__);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 02/07] i386: numa on non-smp
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
2005-09-30 7:33 ` [PATCH 01/07] i386: srat non acpi Magnus Damm, Magnus Damm
@ 2005-09-30 7:33 ` Magnus Damm, Magnus Damm
2005-09-30 7:33 ` [PATCH 03/07] cpuset: smp or numa Magnus Damm, Magnus Damm
From: Magnus Damm, Magnus Damm @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
This patch makes it possible to compile and use CONFIG_NUMA without CONFIG_SMP.
This is useful for NUMA emulation on real or emulated UP hardware.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
asm-i386/topology.h | 7 ++++++-
linux/topology.h | 2 +-
2 files changed, 7 insertions(+), 2 deletions(-)
--- from-0002/include/asm-i386/topology.h
+++ to-work/include/asm-i386/topology.h 2005-09-28 16:26:20.000000000 +0900
@@ -29,8 +29,9 @@
#ifdef CONFIG_NUMA
-#include <asm/mpspec.h>
+#ifdef CONFIG_SMP
+#include <asm/mpspec.h>
#include <linux/cpumask.h>
/* Mappings between logical cpu number and node number */
@@ -88,6 +89,10 @@ static inline int node_to_first_cpu(int
.nr_balance_failed = 0, \
}
+#else
+#include <asm-generic/topology.h>
+#endif
+
extern unsigned long node_start_pfn[];
extern unsigned long node_end_pfn[];
extern unsigned long node_remap_size[];
--- from-0002/include/linux/topology.h
+++ to-work/include/linux/topology.h 2005-09-28 16:26:20.000000000 +0900
@@ -158,7 +158,7 @@
.nr_balance_failed = 0, \
}
-#ifdef CONFIG_NUMA
+#if defined(CONFIG_NUMA) && defined(CONFIG_SMP)
#ifndef SD_NODE_INIT
#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
#endif
* [PATCH 03/07] cpuset: smp or numa
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
2005-09-30 7:33 ` [PATCH 01/07] i386: srat non acpi Magnus Damm, Magnus Damm
2005-09-30 7:33 ` [PATCH 02/07] i386: numa on non-smp Magnus Damm, Magnus Damm
@ 2005-09-30 7:33 ` Magnus Damm, Magnus Damm
2005-09-30 7:33 ` [PATCH 04/07] i386: numa warning fix Magnus Damm, Isaku Yamahata
From: Magnus Damm, Magnus Damm @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
This patch makes it possible to compile and use CONFIG_CPUSETS without
CONFIG_SMP. This is useful for NUMA emulation on real or emulated UP hardware.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
init/Kconfig | 2 +-
kernel/cpuset.c | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
--- from-0002/init/Kconfig
+++ to-work/init/Kconfig 2005-09-28 17:07:31.000000000 +0900
@@ -245,7 +245,7 @@ config IKCONFIG_PROC
config CPUSETS
bool "Cpuset support"
- depends on SMP
+ depends on SMP || NUMA
help
This option will let you create and manage CPUSETs which
allow dynamically partitioning a system into sets of CPUs and
--- from-0002/kernel/cpuset.c
+++ to-work/kernel/cpuset.c 2005-09-28 17:07:31.000000000 +0900
@@ -657,6 +657,7 @@ static int validate_change(const struct
static void update_cpu_domains(struct cpuset *cur)
{
+#ifdef CONFIG_SMP
struct cpuset *c, *par = cur->parent;
cpumask_t pspan, cspan;
@@ -694,6 +695,7 @@ static void update_cpu_domains(struct cp
lock_cpu_hotplug();
partition_sched_domains(&pspan, &cspan);
unlock_cpu_hotplug();
+#endif
}
static int update_cpumask(struct cpuset *cs, char *buf)
* [PATCH 04/07] i386: numa warning fix
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
2005-09-30 7:33 ` [PATCH 03/07] cpuset: smp or numa Magnus Damm, Magnus Damm
@ 2005-09-30 7:33 ` Magnus Damm, Isaku Yamahata
2005-09-30 7:33 ` [PATCH 05/07] i386: sparsemem on pc Magnus Damm, Magnus Damm
From: Magnus Damm, Isaku Yamahata @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
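This patch fixes a build warning seen with the NUMA patch written by Dave Hansen,
which was posted to lkml and linux-mm on September 13th, 2005. The set_pmd_pfn()
prototype was declared only in pgtable-3level.h, which is used only in PAE
configurations, so this patch moves it to pgtable.h.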
[snip]
CC arch/i386/mm/numa.o
arch/i386/mm/numa.c: In function `remap_numa_kva':
arch/i386/mm/numa.c:85: warning: implicit declaration of function `set_pmd_pfn'
LD arch/i386/mm/built-in.o
[snip]
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
pgtable-3level.h | 1 -
pgtable.h | 2 ++
2 files changed, 2 insertions(+), 1 deletion(-)
--- from-0006/include/asm-i386/pgtable-3level.h
+++ to-work/include/asm-i386/pgtable-3level.h 2005-09-28 16:30:09.000000000 +0900
@@ -65,7 +65,6 @@ static inline void set_pte(pte_t *ptep,
set_64bit((unsigned long long *)(pmdptr),pmd_val(pmdval))
#define set_pud(pudptr,pudval) \
(*(pudptr) = (pudval))
-extern void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
/*
* Pentium-II erratum A13: in PAE mode we explicitly have to flush
--- from-0002/include/asm-i386/pgtable.h
+++ to-work/include/asm-i386/pgtable.h 2005-09-28 16:30:09.000000000 +0900
@@ -327,6 +327,8 @@ static inline pte_t pte_modify(pte_t pte
#define pmd_large(pmd) \
((pmd_val(pmd) & (_PAGE_PSE|_PAGE_PRESENT)) == (_PAGE_PSE|_PAGE_PRESENT))
+extern void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
+
/*
* the pgd page can be thought of an array like this: pgd_t[PTRS_PER_PGD]
*
* [PATCH 05/07] i386: sparsemem on pc
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
2005-09-30 7:33 ` [PATCH 04/07] i386: numa warning fix Magnus Damm, Isaku Yamahata
@ 2005-09-30 7:33 ` Magnus Damm, Magnus Damm
2005-09-30 15:25 ` Dave Hansen
2005-09-30 7:33 ` [PATCH 06/07] i386: discontigmem " Magnus Damm, Magnus Damm
From: Magnus Damm, Magnus Damm @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
This patch enables and fixes sparsemem support on i386. It is the same
patch that was sent to linux-kernel on September 6th, 2005, but forward-ported
to apply on top of the patches written by Dave Hansen.
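For reference, a rough sketch of what the added memory_present(0, 0, max_pfn) call is for: with sparsemem, every section that overlaps a node's PFN range has to be marked present before the memory map is set up. The constants (4 KiB pages, 64 MiB sections) and the memory_present_sketch() helper are assumptions for illustration; the real mm/sparse.c code differs in detail.

```c
#include <assert.h>

/* Assumed values for illustration: 4 KiB pages, 64 MiB sections. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_SECTION_SIZE_BITS 26
#define PFN_SECTION_SHIFT (SKETCH_SECTION_SIZE_BITS - SKETCH_PAGE_SHIFT)
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define NR_SECTIONS_SKETCH 64	/* 64 x 64 MiB = 4 GiB of physical space */

static int section_present[NR_SECTIONS_SKETCH];

/* Hypothetical analogue of memory_present(nid, start, end): mark every
 * section overlapping [start, end) as present. */
static void memory_present_sketch(int nid, unsigned long start,
				  unsigned long end)
{
	unsigned long pfn;

	(void)nid;	/* node bookkeeping omitted in this sketch */
	/* round start down to a section boundary, then step by sections */
	for (pfn = start & ~(PAGES_PER_SECTION - 1); pfn < end;
	     pfn += PAGES_PER_SECTION) {
		unsigned long nr = pfn >> PFN_SECTION_SHIFT;

		assert(nr < NR_SECTIONS_SKETCH);
		section_present[nr] = 1;
	}
}

static int count_present_sections(void)
{
	int i, n = 0;

	for (i = 0; i < NR_SECTIONS_SKETCH; i++)
		n += section_present[i];
	return n;
}
```

With 256 MiB of RAM (max_pfn = 65536) and the assumed 64 MiB sections, the flat-node call marks sections 0 to 3; without it, no section backing the single node would be known to sparsemem.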
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
Kconfig | 4 ++--
kernel/setup.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
--- from-0002/arch/i386/Kconfig
+++ to-work/arch/i386/Kconfig 2005-09-28 16:32:47.000000000 +0900
@@ -762,7 +762,6 @@ config NUMA
depends on SMP && HIGHMEM64G && (X86_NUMAQ || X86_GENERICARCH || (X86_SUMMIT && ACPI))
default n if X86_PC
default y if (X86_NUMAQ || X86_SUMMIT)
- select SPARSEMEM_STATIC
# Need comments to help the hapless user trying to turn on NUMA support
comment "NUMA (NUMA-Q) requires SMP, 64GB highmem support"
@@ -801,7 +800,8 @@ config ARCH_DISCONTIGMEM_DEFAULT
config ARCH_SPARSEMEM_ENABLE
def_bool y
- depends on NUMA
+ depends on NUMA || (X86_PC && EXPERIMENTAL)
+ select SPARSEMEM_STATIC
config ARCH_SELECT_MEMORY_MODEL
def_bool y
--- from-0006/arch/i386/kernel/setup.c
+++ to-work/arch/i386/kernel/setup.c 2005-09-28 16:32:47.000000000 +0900
@@ -390,6 +390,7 @@ int __init get_memcfg_numa_flat(void)
/* Run the memory configuration and find the top of memory. */
node_start_pfn[0] = 0;
node_end_pfn[0] = max_pfn;
+ memory_present(0, 0, max_pfn);
/* Indicate there is one node available. */
nodes_clear(node_online_map);
* Re: [PATCH 05/07] i386: sparsemem on pc
2005-09-30 7:33 ` [PATCH 05/07] i386: sparsemem on pc Magnus Damm, Magnus Damm
@ 2005-09-30 15:25 ` Dave Hansen
2005-10-01 0:32 ` Magnus Damm
From: Dave Hansen @ 2005-09-30 15:25 UTC
To: Magnus Damm; +Cc: linux-mm, Linux Kernel Mailing List
On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> This patch for enables and fixes sparsemem support on i386. This is the
> same patch that was sent to linux-kernel on September 6:th 2005, but this
> patch includes up-porting to fit on top of the patches written by Dave Hansen.
I'll post a more comprehensive way to do this in just a moment.
Subject: memhotplug testing: hack for flat systems
-- Dave
* Re: [PATCH 05/07] i386: sparsemem on pc
2005-09-30 15:25 ` Dave Hansen
@ 2005-10-01 0:32 ` Magnus Damm
From: Magnus Damm @ 2005-10-01 0:32 UTC
To: Dave Hansen; +Cc: Magnus Damm, linux-mm, Linux Kernel Mailing List
On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> > This patch for enables and fixes sparsemem support on i386. This is the
> > same patch that was sent to linux-kernel on September 6:th 2005, but this
> > patch includes up-porting to fit on top of the patches written by Dave Hansen.
>
> I'll post a more comprehensive way to do this in just a moment.
>
> Subject: memhotplug testing: hack for flat systems
Looks much better, will compile and test on Monday. Thanks.
/ magnus
* [PATCH 06/07] i386: discontigmem on pc
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
2005-09-30 7:33 ` [PATCH 05/07] i386: sparsemem on pc Magnus Damm, Magnus Damm
@ 2005-09-30 7:33 ` Magnus Damm, Magnus Damm
2005-09-30 7:33 ` [PATCH 07/07] i386: numa emulation " Magnus Damm, Isaku Yamahata
2005-09-30 15:23 ` [PATCH 00/07][RFC] i386: NUMA emulation Dave Hansen
From: Magnus Damm, Magnus Damm @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
This patch enables and fixes discontigmem support for i386.
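A simplified sketch of the PFN-to-page lookup that discontigmem relies on, using the same arithmetic as node_localnr() and the pfn_to_page() macro in the mmzone.h hunk: find the node, subtract its start PFN, and index that node's mem_map. The reduced structures, add_node_sketch() and the linear pfn_to_nid_sketch() search are hypothetical simplifications, not the i386 implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 4

struct page_sketch {
	unsigned long flags;
};

/* Hypothetical, heavily reduced stand-in for pg_data_t. */
struct pgdat_sketch {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
	struct page_sketch *node_mem_map;
};

static struct pgdat_sketch node_data_sketch[SKETCH_MAX_NODES];
static int nr_nodes_sketch;

static void add_node_sketch(unsigned long start_pfn, unsigned long pages,
			    struct page_sketch *mem_map)
{
	struct pgdat_sketch *pgdat;

	assert(nr_nodes_sketch < SKETCH_MAX_NODES);
	pgdat = &node_data_sketch[nr_nodes_sketch++];
	pgdat->node_start_pfn = start_pfn;
	pgdat->node_spanned_pages = pages;
	pgdat->node_mem_map = mem_map;
}

/* Linear search standing in for the arch-specific pfn_to_nid(). */
static int pfn_to_nid_sketch(unsigned long pfn)
{
	int nid;

	for (nid = 0; nid < nr_nodes_sketch; nid++) {
		const struct pgdat_sketch *pgdat = &node_data_sketch[nid];

		if (pfn >= pgdat->node_start_pfn &&
		    pfn - pgdat->node_start_pfn < pgdat->node_spanned_pages)
			return nid;
	}
	return -1;
}

/* Same arithmetic as node_localnr() plus the pfn_to_page() macro. */
static struct page_sketch *pfn_to_page_sketch(unsigned long pfn)
{
	int nid = pfn_to_nid_sketch(pfn);
	struct pgdat_sketch *pgdat;

	if (nid < 0)
		return NULL;
	pgdat = &node_data_sketch[nid];
	return &pgdat->node_mem_map[pfn - pgdat->node_start_pfn];
}
```

With two 16-page nodes, PFN 20 resolves to entry 4 of the second node's map, which is what node_localnr() computes as 20 - 16.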
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
arch/i386/Kconfig | 8 ++++++--
include/asm-i386/mmzone.h | 3 ++-
include/linux/mmzone.h | 5 +++++
include/linux/numa.h | 2 +-
mm/Kconfig | 2 +-
5 files changed, 15 insertions(+), 5 deletions(-)
--- from-0008/arch/i386/Kconfig
+++ to-work/arch/i386/Kconfig 2005-09-28 16:33:21.000000000 +0900
@@ -790,9 +790,13 @@ config HAVE_ARCH_ALLOC_REMAP
depends on NUMA
default y
+config ARCH_FLATMEM_ENABLE
+ def_bool y
+ depends on X86_PC
+
config ARCH_DISCONTIGMEM_ENABLE
def_bool y
- depends on NUMA
+ depends on NUMA || (X86_PC && EXPERIMENTAL)
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y
@@ -812,7 +816,7 @@ source "mm/Kconfig"
config HAVE_ARCH_EARLY_PFN_TO_NID
bool
default y
- depends on NUMA
+ depends on NUMA || DISCONTIGMEM
config HIGHPTE
bool "Allocate 3rd-level pagetables from highmem"
--- from-0006/include/asm-i386/mmzone.h
+++ to-work/include/asm-i386/mmzone.h 2005-09-28 16:33:21.000000000 +0900
@@ -75,7 +75,7 @@ static inline int pfn_to_nid(unsigned lo
#endif
}
-#define node_localnr(pfn, nid) ((pfn) - node_data[nid]->node_start_pfn)
+#define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn)
/*
* Following are macros that each numa implmentation must define.
@@ -106,6 +106,7 @@ static inline int pfn_to_nid(unsigned lo
({ \
unsigned long __pfn = pfn; \
int __node = pfn_to_nid(__pfn); \
+ int foo = (&foo == &__node); /* disable unused warning */ \
&NODE_DATA(__node)->node_mem_map[node_localnr(__pfn,__node)]; \
})
--- from-0002/include/linux/mmzone.h
+++ to-work/include/linux/mmzone.h 2005-09-28 16:33:21.000000000 +0900
@@ -414,7 +414,12 @@ extern struct pglist_data contig_page_da
#define NODE_DATA(nid) (&contig_page_data)
#define NODE_MEM_MAP(nid) mem_map
#define MAX_NODES_SHIFT 1
+
+#ifdef CONFIG_DISCONTIGMEM
+#include <asm/mmzone.h>
+#else
#define pfn_to_nid(pfn) (0)
+#endif
#else /* CONFIG_NEED_MULTIPLE_NODES */
--- from-0001/include/linux/numa.h
+++ to-work/include/linux/numa.h 2005-09-28 16:33:21.000000000 +0900
@@ -3,7 +3,7 @@
#include <linux/config.h>
-#ifndef CONFIG_FLATMEM
+#ifdef CONFIG_NUMA
#include <asm/numnodes.h>
#endif
--- from-0002/mm/Kconfig
+++ to-work/mm/Kconfig 2005-09-28 16:33:21.000000000 +0900
@@ -84,7 +84,7 @@ config FLAT_NODE_MEM_MAP
#
config NEED_MULTIPLE_NODES
def_bool y
- depends on DISCONTIGMEM || NUMA
+ depends on NUMA
config HAVE_MEMORY_PRESENT
def_bool y
* [PATCH 07/07] i386: numa emulation on pc
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
2005-09-30 7:33 ` [PATCH 06/07] i386: discontigmem " Magnus Damm, Magnus Damm
@ 2005-09-30 7:33 ` Magnus Damm, Isaku Yamahata
2005-09-30 18:55 ` Dave Hansen
2005-10-04 7:52 ` Hirokazu Takahashi
2005-09-30 15:23 ` [PATCH 00/07][RFC] i386: NUMA emulation Dave Hansen
From: Magnus Damm, Isaku Yamahata @ 2005-09-30 7:33 UTC
To: linux-mm, linux-kernel; +Cc: Magnus Damm
This patch adds NUMA emulation for i386 on top of the fixes for sparsemem and
discontigmem. NUMA emulation already exists for x86_64, and this patch adds
the same feature using the same config option CONFIG_NUMA_EMU. The kernel
command line option used is also the same as for x86_64.
Pass "numa=fake=N" to the kernel where N is the number of nodes to emulate.
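A minimal userspace sketch of how memory is split for numa=fake=N, modeled on get_memcfg_numa_emu() and numa_setup() in the patch. The per-node size is max_pfn / N rounded up to a power of two, so fewer than N nodes may be created. The helper names are hypothetical, the 8-node limit assumes NODES_SHIFT 3, and the remap-space and page_is_ram() checks of the real code are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NUMNODES 8	/* assumes NODES_SHIFT 3, as in numnodes.h */

struct fake_node {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Parses the text after "numa=", e.g. "fake=4"; 0 means no emulation. */
static int parse_numa_fake(const char *opt)
{
	unsigned long n;

	if (strncmp(opt, "fake=", 5) != 0 || opt[5] == '\0')
		return 0;
	n = strtoul(opt + 5, NULL, 0);
	return n > SKETCH_MAX_NUMNODES ? SKETCH_MAX_NUMNODES : (int)n;
}

/* Smallest power of two >= x, starting at 2 like the patch's loop. */
static unsigned long round_up_pow2(unsigned long x)
{
	unsigned long shift = 1;

	while ((1UL << shift) < x)
		shift++;
	return 1UL << shift;
}

/* Split [0, max_pfn) into at most nr_fake equal power-of-two nodes and
 * return how many were created. */
static int split_fake_nodes(unsigned long max_pfn, int nr_fake,
			    struct fake_node *nodes)
{
	unsigned long node_size;
	int i;

	if (nr_fake <= 0)
		return 0;
	node_size = max_pfn / nr_fake;
	if (node_size == 0)
		return 0;
	node_size = round_up_pow2(node_size);

	for (i = 0; i < nr_fake; i++) {
		unsigned long start = node_size * i;

		if (start >= max_pfn)
			break;	/* rounding up left nothing for this node */
		nodes[i].start_pfn = start;
		nodes[i].end_pfn = start + node_size < max_pfn ?
				   start + node_size : max_pfn;
	}
	return i;
}
```

In this sketch, numa=fake=3 on a 256 MiB machine (max_pfn = 65536) rounds the node size up to 32768 pages and yields only two nodes, while numa=fake=4 yields four 64 MiB nodes. The caller is expected to pass an array of at least SKETCH_MAX_NUMNODES entries.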
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
---
arch/i386/Kconfig | 20 +++++++-
arch/i386/kernel/setup.c | 34 +++++++++-----
arch/i386/mm/numa.c | 100 ++++++++++++++++++++++++++++++++++++++++++++
include/asm-i386/mmzone.h | 7 +++
include/asm-i386/numnodes.h | 2
5 files changed, 145 insertions(+), 18 deletions(-)
--- from-0009/arch/i386/Kconfig
+++ to-work/arch/i386/Kconfig 2005-09-30 13:31:13.000000000 +0900
@@ -134,7 +134,7 @@ endchoice
config ACPI_SRAT
bool
default y
- depends on NUMA && (X86_SUMMIT || X86_GENERICARCH)
+ depends on NUMA && (X86_SUMMIT || X86_GENERICARCH || NUMA_EMU)
config X86_SUMMIT_NUMA
bool
@@ -756,12 +756,21 @@ config X86_PAE
depends on HIGHMEM64G
default y
+config NUMA_EMU
+ bool "Numa Memory Nodes Emulation"
+ depends on X86_PC
+ default n
+ help
+ Enable NUMA emulation. A regular single-node PC machine will be
+ split into virtual nodes when booted with "numa=fake=N", where
+ N is the number of nodes.
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
- depends on SMP && HIGHMEM64G && (X86_NUMAQ || X86_GENERICARCH || (X86_SUMMIT && ACPI))
+ depends on (NUMA_EMU && ACPI && HIGHMEM) || (SMP && HIGHMEM64G && (X86_NUMAQ || X86_GENERICARCH || (X86_SUMMIT && ACPI)))
default n if X86_PC
- default y if (X86_NUMAQ || X86_SUMMIT)
+ default y if (X86_NUMAQ || X86_SUMMIT || NUMA_EMU)
# Need comments to help the hapless user trying to turn on NUMA support
comment "NUMA (NUMA-Q) requires SMP, 64GB highmem support"
@@ -770,6 +779,9 @@ comment "NUMA (NUMA-Q) requires SMP, 64G
comment "NUMA (Summit) requires SMP, 64GB highmem support, ACPI"
depends on X86_SUMMIT && (!HIGHMEM64G || !ACPI)
+comment "NUMA (Emulation on PC) requires highmem support and ACPI"
+ depends on X86_PC && (!HIGHMEM || !ACPI)
+
config HAVE_ARCH_BOOTMEM_NODE
bool
depends on NUMA
@@ -916,7 +928,7 @@ config IRQBALANCE
# Summit needs it only when NUMA is on
config BOOT_IOREMAP
bool
- depends on (((X86_SUMMIT || X86_GENERICARCH) && NUMA) || (X86 && EFI))
+ depends on (((X86_SUMMIT || X86_GENERICARCH || NUMA_EMU) && NUMA) || (X86 && EFI))
default y
config REGPARM
--- from-0008/arch/i386/kernel/setup.c
+++ to-work/arch/i386/kernel/setup.c 2005-09-28 17:49:53.000000000 +0900
@@ -931,6 +931,13 @@ static void __init parse_cmdline_early (
elfcorehdr_addr = memparse(from+11, &from);
#endif
+#ifdef CONFIG_NUMA_EMU
+ // virtual numa setup
+ else if (!memcmp(from, "numa=", 5)) {
+ extern void numa_setup(char*, char**);
+ numa_setup(from+5, &from);
+ }
+#endif
/*
* highmem=size forces highmem to be exactly 'size' bytes.
* This works even on boxes that have no highmem otherwise.
@@ -1211,26 +1218,22 @@ static inline unsigned long nid_size_pa
{
return node_end_pfn[nid] - node_start_pfn[nid];
}
-static inline int nid_starts_in_highmem(int nid)
-{
- return node_start_pfn[nid] >= max_low_pfn;
-}
-
void __init nid_zone_sizes_init(int nid)
{
unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
- unsigned long max_dma;
+ unsigned long max_dma = min(max_hardware_dma_pfn(), max_low_pfn);
unsigned long start = node_start_pfn[nid];
unsigned long end = node_end_pfn[nid];
if (node_has_online_mem(nid)){
- if (nid_starts_in_highmem(nid)) {
- zones_size[ZONE_HIGHMEM] = nid_size_pages(nid);
- } else {
- max_dma = min(max_hardware_dma_pfn(), max_low_pfn);
- zones_size[ZONE_DMA] = max_dma;
- zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
- zones_size[ZONE_HIGHMEM] = end - max_low_pfn;
+ if (start < max_dma) {
+ zones_size[ZONE_DMA] = min(end, max_dma) - start;
+ }
+ if (start < max_low_pfn && max_dma < end) {
+ zones_size[ZONE_NORMAL] = min(end, max_low_pfn) - max(start, max_dma);
+ }
+ if (max_low_pfn <= end) {
+ zones_size[ZONE_HIGHMEM] = end - max(start, max_low_pfn);
}
}
@@ -1270,7 +1273,12 @@ void __init setup_bootmem_allocator(void
/*
* Initialize the boot-time allocator (with low memory only):
*/
+#ifdef CONFIG_NUMA_EMU
+ bootmap_size = init_bootmem(max(min_low_pfn, node_start_pfn[0]),
+ min(max_low_pfn, node_end_pfn[0]));
+#else
bootmap_size = init_bootmem(min_low_pfn, max_low_pfn);
+#endif
register_bootmem_low_pages(max_low_pfn);
--- from-0006/arch/i386/mm/numa.c
+++ to-work/arch/i386/mm/numa.c 2005-09-28 17:49:53.000000000 +0900
@@ -165,3 +165,103 @@ int early_pfn_to_nid(unsigned long pfn)
return 0;
}
+
+#ifdef CONFIG_NUMA_EMU
+int numa_fake __initdata = 0;
+
+extern unsigned long node_start_pfn[MAX_NUMNODES] __read_mostly;
+extern unsigned long node_end_pfn[MAX_NUMNODES] __read_mostly;
+
+int
+get_memcfg_numa_emu(void)
+{
+ unsigned long node_size;
+ unsigned long shift;
+ int i;
+
+ if (numa_fake == 0)
+ return 0;
+ node_size = max_pfn / numa_fake;
+ if (node_size == 0)
+ return 0;
+
+ printk("NUMA - single node, flat memory mode, broken into %d nodes\n",
+ numa_fake);
+ shift = 1;
+ while ((1 << shift) < node_size) {
+ shift++;
+ }
+ node_size = 1 << shift;
+ if (node_size * PAGE_SIZE < (1UL << SECTION_SIZE_BITS)) {
+ printk("node_size %ld is too small.(it must be >= %ld)\n",
+ node_size * PAGE_SIZE, (1UL << SECTION_SIZE_BITS));
+ printk("consider descreas # of nodes "
+ "(or decreas SECTIONS_SIZE_BITS %d)\n",
+ SECTION_SIZE_BITS);
+ printk("kernel will panic!\n");
+ // Don't panic here.
+ // Here even early printk is not enabled so that
+ // this message won't be showed if we panic right here.
+ // Let the kernel go, print this message and then panic.
+ }
+ printk("block size %ld shift %ld\n", node_size, shift);
+
+ nodes_clear(node_online_map);
+ for (i = 0; i < numa_fake; i++) {
+ unsigned long size;
+ unsigned long pfn;
+ node_start_pfn[i] = node_size * i;
+ node_end_pfn[i] = min(node_start_pfn[i] + node_size, max_pfn);
+
+ node_remap_size[i] = node_memmap_size_bytes(i,
+ node_start_pfn[i],
+ node_end_pfn[i]);
+
+ //XXX see calculate_numa_remap_pages()
+ size = node_remap_size[i] + sizeof(pg_data_t);
+ size = (size + PMD_SIZE - 1) / PMD_SIZE;
+ size = size * PTRS_PER_PTE;
+ for (pfn = node_end_pfn[i] - size;
+ pfn < node_end_pfn[i]; pfn++)
+ if (!page_is_ram(pfn))
+ break;
+ if (pfn != node_end_pfn[i])
+ size = 0;
+ if (node_end_pfn[i] & (PTRS_PER_PTE - 1)) {
+ size += node_end_pfn[i] & (PTRS_PER_PTE - 1);
+ }
+
+ if (node_start_pfn[i] + size >= node_end_pfn[i]) {
+ printk("last memory segment %d has too few pages "
+ "%ld = %ld - %ld\n",
+ i,
+ node_end_pfn[i] - node_start_pfn[i],
+ node_start_pfn[i],
+ node_end_pfn[i]);
+ node_start_pfn[i] = 0;
+ node_end_pfn[i] = 0;
+ node_remap_size[i] = 0;
+ break;
+ } else {
+ node_set_online(i);
+ memory_present(i, node_start_pfn[i], node_end_pfn[i]);
+ }
+ }
+ printk("total %d blocks, max %ld\n", i, max_pfn);
+ return 1;
+}
+#endif
+
+void __init
+numa_setup(char* opt, char** retptr)
+{
+#ifdef CONFIG_NUMA_EMU
+ if (!memcmp(opt, "fake=", 5) && (*(opt + 5))) {
+ numa_fake = simple_strtoul(opt + 5, retptr, 0);
+ numa_fake = min(numa_fake, MAX_NUMNODES);
+ printk("fake numa nodes = %d/%d\n", numa_fake, MAX_NUMNODES);
+ } else {
+ *retptr = opt;
+ }
+#endif
+}
--- from-0009/include/asm-i386/mmzone.h
+++ to-work/include/asm-i386/mmzone.h 2005-09-30 13:53:35.000000000 +0900
@@ -18,6 +18,9 @@ extern struct pglist_data *node_data[];
#include <asm/srat.h>
#endif
+#ifdef CONFIG_NUMA_EMU
+extern int get_memcfg_numa_emu(void);
+#endif
extern int get_memcfg_numa_flat(void );
/*
* This allows any one NUMA architecture to be compiled
@@ -33,6 +36,10 @@ static inline void get_memcfg_numa(void)
if (get_memcfg_from_srat())
return;
#endif
+#ifdef CONFIG_NUMA_EMU
+ if (get_memcfg_numa_emu())
+ return;
+#endif
get_memcfg_numa_flat();
}
--- from-0001/include/asm-i386/numnodes.h
+++ to-work/include/asm-i386/numnodes.h 2005-09-28 17:49:53.000000000 +0900
@@ -8,7 +8,7 @@
/* Max 16 Nodes */
#define NODES_SHIFT 4
-#elif defined(CONFIG_ACPI_SRAT)
+#elif defined(CONFIG_ACPI_SRAT) || defined(CONFIG_NUMA_EMU)
/* Max 8 Nodes */
#define NODES_SHIFT 3
* Re: [PATCH 07/07] i386: numa emulation on pc
2005-09-30 7:33 ` [PATCH 07/07] i386: numa emulation " Magnus Damm, Isaku Yamahata
@ 2005-09-30 18:55 ` Dave Hansen
2005-10-03 9:59 ` Magnus Damm
2005-10-04 7:52 ` Hirokazu Takahashi
From: Dave Hansen @ 2005-09-30 18:55 UTC
To: Magnus Damm, Isaku Yamahata; +Cc: linux-mm, Linux Kernel Mailing List
On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> void __init nid_zone_sizes_init(int nid)
> {
> unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
> - unsigned long max_dma;
> + unsigned long max_dma = min(max_hardware_dma_pfn(), max_low_pfn);
> unsigned long start = node_start_pfn[nid];
> unsigned long end = node_end_pfn[nid];
>
> if (node_has_online_mem(nid)){
> - if (nid_starts_in_highmem(nid)) {
> - zones_size[ZONE_HIGHMEM] = nid_size_pages(nid);
> - } else {
> - max_dma = min(max_hardware_dma_pfn(), max_low_pfn);
> - zones_size[ZONE_DMA] = max_dma;
> - zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
> - zones_size[ZONE_HIGHMEM] = end - max_low_pfn;
> + if (start < max_dma) {
> + zones_size[ZONE_DMA] = min(end, max_dma) - start;
> + }
> + if (start < max_low_pfn && max_dma < end) {
> + zones_size[ZONE_NORMAL] = min(end, max_low_pfn) - max(start, max_dma);
> + }
> + if (max_low_pfn <= end) {
> + zones_size[ZONE_HIGHMEM] = end - max(start, max_low_pfn);
> }
> }
That is a decent cleanup all by itself. You might want to break it out.
Take a look at the patches I just sent out. They do some similar things
to the same code.
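The rewritten nid_zone_sizes_init() logic amounts to intersecting the node's PFN range [start, end) with the DMA, NORMAL and HIGHMEM ranges, which is what lets an emulated node start anywhere. A standalone sketch of that idea, with hypothetical helper names (overlap(), zone_sizes_sketch()) and zone indices that only mirror the kernel's names:

```c
#include <assert.h>

/* Mirrors the kernel's zone names of that era, for illustration only. */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES_SKETCH };

/* Number of PFNs shared by [s1, e1) and [s2, e2); 0 if they are disjoint. */
static unsigned long overlap(unsigned long s1, unsigned long e1,
			     unsigned long s2, unsigned long e2)
{
	unsigned long lo = s1 > s2 ? s1 : s2;
	unsigned long hi = e1 < e2 ? e1 : e2;

	return hi > lo ? hi - lo : 0;
}

/* Zone sizes for a node spanning [start, end), given the DMA limit and
 * the top of low memory, both as PFNs. */
static void zone_sizes_sketch(unsigned long start, unsigned long end,
			      unsigned long max_dma, unsigned long max_low_pfn,
			      unsigned long zones[NR_ZONES_SKETCH])
{
	zones[ZONE_DMA] = overlap(start, end, 0, max_dma);
	zones[ZONE_NORMAL] = overlap(start, end, max_dma, max_low_pfn);
	zones[ZONE_HIGHMEM] = overlap(start, end, max_low_pfn, ~0UL);
}
```

With an assumed 16 MiB DMA limit (4096 pages) and 896 MiB of low memory (229376 pages), a fake node covering 512 MiB to 1 GiB gets 98304 NORMAL pages and 32768 HIGHMEM pages, whereas the removed branch would have sized its DMA and NORMAL zones as if the node started at PFN 0.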
> @@ -1270,7 +1273,12 @@ void __init setup_bootmem_allocator(void
> /*
> * Initialize the boot-time allocator (with low memory only):
> */
> +#ifdef CONFIG_NUMA_EMU
> + bootmap_size = init_bootmem(max(min_low_pfn, node_start_pfn[0]),
> + min(max_low_pfn, node_end_pfn[0]));
> +#else
> bootmap_size = init_bootmem(min_low_pfn, max_low_pfn);
> +#endif
This shouldn't be necessary. Again, take a look at my discontig
separation patches and see if what I did works for you here.
> register_bootmem_low_pages(max_low_pfn);
>
> --- from-0006/arch/i386/mm/numa.c
> +++ to-work/arch/i386/mm/numa.c 2005-09-28 17:49:53.000000000 +0900
> @@ -165,3 +165,103 @@ int early_pfn_to_nid(unsigned long pfn)
>
> return 0;
> }
> +
> +#ifdef CONFIG_NUMA_EMU
...
> +#endif
Ewwwwww :) No real need to put new function in a big #ifdef like that.
Can you just create a new file for NUMA emulation?
> --- from-0001/include/asm-i386/numnodes.h
> +++ to-work/include/asm-i386/numnodes.h 2005-09-28 17:49:53.000000000 +0900
> @@ -8,7 +8,7 @@
> /* Max 16 Nodes */
> #define NODES_SHIFT 4
>
> -#elif defined(CONFIG_ACPI_SRAT)
> +#elif defined(CONFIG_ACPI_SRAT) || defined(CONFIG_NUMA_EMU)
>
> /* Max 8 Nodes */
> #define NODES_SHIFT 3
Geez. We should probably just do those in the Kconfig files. Would
look much simpler. But, that's a patch for another day. This is fine
by itself.
-- Dave
* Re: [PATCH 07/07] i386: numa emulation on pc
2005-09-30 18:55 ` Dave Hansen
@ 2005-10-03 9:59 ` Magnus Damm
2005-10-03 16:16 ` Dave Hansen
From: Magnus Damm @ 2005-10-03 9:59 UTC
To: Dave Hansen
Cc: Magnus Damm, Isaku Yamahata, linux-mm, Linux Kernel Mailing List
Hi again Dave,
On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> > void __init nid_zone_sizes_init(int nid)
> > {
> > unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
> > - unsigned long max_dma;
> > + unsigned long max_dma = min(max_hardware_dma_pfn(), max_low_pfn);
> > unsigned long start = node_start_pfn[nid];
> > unsigned long end = node_end_pfn[nid];
> >
> > if (node_has_online_mem(nid)){
> > - if (nid_starts_in_highmem(nid)) {
> > - zones_size[ZONE_HIGHMEM] = nid_size_pages(nid);
> > - } else {
> > - max_dma = min(max_hardware_dma_pfn(), max_low_pfn);
> > - zones_size[ZONE_DMA] = max_dma;
> > - zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
> > - zones_size[ZONE_HIGHMEM] = end - max_low_pfn;
> > + if (start < max_dma) {
> > + zones_size[ZONE_DMA] = min(end, max_dma) - start;
> > + }
> > + if (start < max_low_pfn && max_dma < end) {
> > + zones_size[ZONE_NORMAL] = min(end, max_low_pfn) - max(start, max_dma);
> > + }
> > + if (max_low_pfn <= end) {
> > + zones_size[ZONE_HIGHMEM] = end - max(start, max_low_pfn);
> > }
> > }
>
> That is a decent cleanup all by itself. You might want to break it out.
> Take a look at the patches I just sent out. They do some similar things
> to the same code.
Break it out, sure! I'm not sure which patch to look at, though.
> > @@ -1270,7 +1273,12 @@ void __init setup_bootmem_allocator(void
> > /*
> > * Initialize the boot-time allocator (with low memory only):
> > */
> > +#ifdef CONFIG_NUMA_EMU
> > + bootmap_size = init_bootmem(max(min_low_pfn, node_start_pfn[0]),
> > + min(max_low_pfn, node_end_pfn[0]));
> > +#else
> > bootmap_size = init_bootmem(min_low_pfn, max_low_pfn);
> > +#endif
>
> This shouldn't be necessary. Again, take a look at my discontig
> separation patches and see if what I did works for you here.
Do you mean "discontig-consolidate0.patch"? Maybe I'm misunderstanding.
> > +#ifdef CONFIG_NUMA_EMU
> ...
> > +#endif
>
> Ewwwwww :) No real need to put new function in a big #ifdef like that.
> Can you just create a new file for NUMA emulation?
Hehe, what is this, a beauty contest? =) I agree, but I guess the
reason for this code to be here is that a similar arrangement is done
by x86_64...
I will create a new file. Is arch/i386/mm/numa_emu.c good?
> > --- from-0001/include/asm-i386/numnodes.h
> > +++ to-work/include/asm-i386/numnodes.h 2005-09-28 17:49:53.000000000 +0900
> > @@ -8,7 +8,7 @@
> > /* Max 16 Nodes */
> > #define NODES_SHIFT 4
> >
> > -#elif defined(CONFIG_ACPI_SRAT)
> > +#elif defined(CONFIG_ACPI_SRAT) || defined(CONFIG_NUMA_EMU)
> >
> > /* Max 8 Nodes */
> > #define NODES_SHIFT 3
>
> Geez. We should probably just do those in the Kconfig files. Would
> look much simpler. But, that's a patch for another day. This is fine
> by itself.
No biggie, I will add a config option.
But first, you have written lots and lots of patches, and I am
confused. Could you please tell me on which patches I should base my
code to make things as easy as possible?
Many thanks,
/ magnus
* Re: [PATCH 07/07] i386: numa emulation on pc
2005-10-03 9:59 ` Magnus Damm
@ 2005-10-03 16:16 ` Dave Hansen
2005-10-04 5:06 ` Magnus Damm
From: Dave Hansen @ 2005-10-03 16:16 UTC
To: Magnus Damm
Cc: Magnus Damm, Isaku Yamahata, linux-mm, Linux Kernel Mailing List
On Mon, 2005-10-03 at 18:59 +0900, Magnus Damm wrote:
> > > +#ifdef CONFIG_NUMA_EMU
> > > + bootmap_size = init_bootmem(max(min_low_pfn, node_start_pfn[0]),
> > > + min(max_low_pfn, node_end_pfn[0]));
> > > +#else
> > > bootmap_size = init_bootmem(min_low_pfn, max_low_pfn);
> > > +#endif
> >
> > This shouldn't be necessary. Again, take a look at my discontig
> > separation patches and see if what I did works for you here.
>
> Do you mean "discontig-consolidate0.patch"? Maybe I'm misunderstanding.
This one, I believe:
http://sr71.net/patches/2.6.14/2.6.14-rc2-git8-mhp1/broken-out/B2.1-i386-discontig-consolidation.patch
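The max()/min() pair in the quoted hunk intersects the lowmem page-frame window with emulated node 0. Below is a minimal standalone sketch of that intersection. struct pfn_range and clamp_to_node() are hypothetical names, and the ranges are assumed to be half-open. The empty-intersection check is an addition that the quoted code does not make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical half-open page-frame range [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Intersect the lowmem window with a node's range, as the max()/min()
 * pair in the quoted hunk does. Returns false if the ranges do not
 * overlap, a case the quoted code does not check for.
 */
bool clamp_to_node(struct pfn_range lowmem, struct pfn_range node,
		   struct pfn_range *out)
{
	unsigned long start = lowmem.start > node.start ? lowmem.start : node.start;
	unsigned long end = lowmem.end < node.end ? lowmem.end : node.end;

	assert(out != NULL);
	if (start >= end)
		return false;
	out->start = start;
	out->end = end;
	return true;
}
```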
> > > +#ifdef CONFIG_NUMA_EMU
> > ...
> > > +#endif
> >
> > Ewwwwww :) No real need to put new functions in a big #ifdef like that.
> > Can you just create a new file for NUMA emulation?
>
> Hehe, what is this, a beauty contest? =) I agree, but I guess the
> reason for this code to be here is that a similar arrangement is done
> by x86_64...
If that's really the case, can they _actually_ share code? Maybe we can
do this NUMA emulation thing in non-arch code. Just guessing...
> I will create a new file. Is arch/i386/mm/numa_emu.c good?
> But first, you have written lots and lots of patches, and I am
> confused. Could you please tell me on which patches I should base my
> code to make things as easy as possible?
This is the staging ground for my memory hotplug work. But, it contains
all of my work on other stuff, too. If you build on top of this, it
would be great:
http://sr71.net/patches/2.6.14/2.6.14-rc2-git8-mhp1/
-- Dave
* Re: [PATCH 07/07] i386: numa emulation on pc
2005-10-03 16:16 ` Dave Hansen
@ 2005-10-04 5:06 ` Magnus Damm
0 siblings, 0 replies; 38+ messages in thread
From: Magnus Damm @ 2005-10-04 5:06 UTC (permalink / raw)
To: Dave Hansen
Cc: Magnus Damm, Isaku Yamahata, linux-mm, Linux Kernel Mailing List
On 10/4/05, Dave Hansen <haveblue@us.ibm.com> wrote:
> On Mon, 2005-10-03 at 18:59 +0900, Magnus Damm wrote:
> > > > +#ifdef CONFIG_NUMA_EMU
> > > ...
> > > > +#endif
> > >
> > > Ewwwwww :) No real need to put new functions in a big #ifdef like that.
> > > Can you just create a new file for NUMA emulation?
> >
> > Hehe, what is this, a beauty contest? =) I agree, but I guess the
> > reason for this code to be here is that a similar arrangement is done
> > by x86_64...
>
> If that's really the case, can they _actually_ share code? Maybe we can
> do this NUMA emulation thing in non-arch code. Just guessing...
I'd like to avoid duplication as much as you, but at a quick glance
the x86_64 and i386 architecture code looked pretty different. But I will
see what I can do.
> > I will create a new file. Is arch/i386/mm/numa_emu.c good?
>
> > But first, you have written lots and lots of patches, and I am
> > confused. Could you please tell me on which patches I should base my
> > code to make things as easy as possible?
>
> This is the staging ground for my memory hotplug work. But, it contains
> all of my work on other stuff, too. If you build on top of this, it
> would be great:
>
> http://sr71.net/patches/2.6.14/2.6.14-rc2-git8-mhp1/
I will build on top of that then.
Thanks,
/ magnus
* Re: [PATCH 07/07] i386: numa emulation on pc
2005-09-30 7:33 ` [PATCH 07/07] i386: numa emulation " Magnus Damm, Isaku Yamahata
2005-09-30 18:55 ` Dave Hansen
@ 2005-10-04 7:52 ` Hirokazu Takahashi
2005-10-04 9:49 ` Magnus Damm
1 sibling, 1 reply; 38+ messages in thread
From: Hirokazu Takahashi @ 2005-10-04 7:52 UTC (permalink / raw)
To: magnus; +Cc: linux-mm, linux-kernel
Hi,
> This patch adds NUMA emulation for i386 on top of the fixes for sparsemem and
> discontigmem. NUMA emulation already exists for x86_64, and this patch adds
> the same feature using the same config option CONFIG_NUMA_EMU. The kernel
> command line option used is also the same as for x86_64.
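On x86_64 the boot option is numa=fake=N. One simple way to carve memory into N emulated nodes is to split the page-frame range into equal, aligned chunks and let the last node take the remainder. The following is a hypothetical sketch of that idea, not the x86_64 or i386 code. split_fake_nodes() and its alignment parameter are our own names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node range in page frames, half-open [start, end). */
struct fake_node {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/*
 * Split [start_pfn, end_pfn) into nr_nodes roughly equal fake nodes.
 * Each node size is rounded down to a multiple of align_pages (assumed
 * to be a power of two); the last node absorbs the remainder. If the
 * rounded size is zero, a single node covers everything. Returns the
 * number of nodes written to nodes[], which must have room for nr_nodes.
 */
int split_fake_nodes(unsigned long start_pfn, unsigned long end_pfn,
		     int nr_nodes, unsigned long align_pages,
		     struct fake_node *nodes)
{
	unsigned long size, pfn = start_pfn;
	int i;

	assert(nodes != NULL);
	if (nr_nodes < 1 || end_pfn <= start_pfn)
		return 0;

	size = (end_pfn - start_pfn) / nr_nodes;
	size &= ~(align_pages - 1);
	if (size == 0)		/* too little memory: fall back to one node */
		nr_nodes = 1;

	for (i = 0; i < nr_nodes; i++) {
		nodes[i].start_pfn = pfn;
		nodes[i].end_pfn = (i == nr_nodes - 1) ? end_pfn : pfn + size;
		pfn = nodes[i].end_pfn;
	}
	return nr_nodes;
}
```

For example, take 512 MB of 4 KB pages (131072 frames), split three ways with 64 MB (16384-frame) alignment. This gives nodes of 32768, 32768 and 65536 frames.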
It seems like you've forgotten to bind CPUs to the emulated nodes, as Linux
for x86_64 does. I don't think that was your intention.
Thanks.
* Re: [PATCH 07/07] i386: numa emulation on pc
2005-10-04 7:52 ` Hirokazu Takahashi
@ 2005-10-04 9:49 ` Magnus Damm
0 siblings, 0 replies; 38+ messages in thread
From: Magnus Damm @ 2005-10-04 9:49 UTC (permalink / raw)
To: Hirokazu Takahashi; +Cc: magnus, linux-mm, linux-kernel
On 10/4/05, Hirokazu Takahashi <taka@valinux.co.jp> wrote:
> It seems like you've forgotten to bind CPUs to the emulated nodes, as Linux
> for x86_64 does. I don't think that was your intention.
True, not my intention. I will have a look at that. Thanks.
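Here is a sketch of one way to attach CPUs to emulated nodes. Each CPU without a node gets one round-robin, and CPUs that already have a node keep it. We believe x86_64 falls back to a round-robin assignment of this kind, but this is only an illustration under that assumption. bind_cpus_round_robin() and NO_NODE are hypothetical names.

```c
#include <assert.h>

#define NO_NODE (-1)	/* hypothetical "not yet assigned" marker */

/*
 * Give every CPU that has no node yet one of the nr_nodes emulated
 * nodes, cycling round-robin. CPUs that already have a node (e.g. from
 * firmware tables) keep it.
 */
void bind_cpus_round_robin(int cpu_to_node[], int nr_cpus, int nr_nodes)
{
	int cpu, node = 0;

	if (nr_nodes < 1)	/* nothing to bind to */
		return;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_to_node[cpu] != NO_NODE)
			continue;
		cpu_to_node[cpu] = node;
		assert(cpu_to_node[cpu] >= 0 && cpu_to_node[cpu] < nr_nodes);
		node = (node + 1) % nr_nodes;
	}
}
```

For a four-CPU map {NO_NODE, 1, NO_NODE, NO_NODE} and two nodes, the result is {0, 1, 1, 0}.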
/ magnus
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-09-30 7:33 [PATCH 00/07][RFC] i386: NUMA emulation Magnus Damm
` (6 preceding siblings ...)
2005-09-30 7:33 ` [PATCH 07/07] i386: numa emulation " Magnus Damm, Isaku Yamahata
@ 2005-09-30 15:23 ` Dave Hansen
2005-10-03 2:08 ` Magnus Damm
2005-10-03 3:21 ` Paul Jackson
7 siblings, 2 replies; 38+ messages in thread
From: Dave Hansen @ 2005-09-30 15:23 UTC (permalink / raw)
To: Magnus Damm; +Cc: linux-mm, Linux Kernel Mailing List
On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> These patches implement NUMA memory node emulation for regular i386 PCs.
>
> NUMA emulation could be used to provide coarse-grained memory resource control
> using CPUSETS. Another use is as a test environment for NUMA memory code or
> CPUSETS using an i386 emulator such as QEMU.
This patch set basically allows the "NUMA depends on SMP" dependency to
be removed. I'm not sure this is the right approach. There will likely
never be a real-world NUMA system without SMP. So, this set would seem
to include some increased (#ifdef) complexity for supporting NUMA &&
!SMP, which will likely never happen in the real world.
Also, I worry that simply #ifdef'ing things out like CPUsets' update
means that CPUsets lacks some kind of abstraction that it should have
been using in the first place. An #ifdef just papers over the real
problem.
I think it would likely be cleaner if the approach was to emulate an SMP
NUMA system where each NUMA node simply doesn't have all of its CPUs
online.
-- Dave
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-09-30 15:23 ` [PATCH 00/07][RFC] i386: NUMA emulation Dave Hansen
@ 2005-10-03 2:08 ` Magnus Damm
2005-10-03 7:34 ` David Lang
2005-10-03 3:21 ` Paul Jackson
1 sibling, 1 reply; 38+ messages in thread
From: Magnus Damm @ 2005-10-03 2:08 UTC (permalink / raw)
To: Dave Hansen; +Cc: Magnus Damm, linux-mm, Linux Kernel Mailing List
On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> > These patches implement NUMA memory node emulation for regular i386 PCs.
> >
> > NUMA emulation could be used to provide coarse-grained memory resource control
> > using CPUSETS. Another use is as a test environment for NUMA memory code or
> > CPUSETS using an i386 emulator such as QEMU.
>
> This patch set basically allows the "NUMA depends on SMP" dependency to
> be removed. I'm not sure this is the right approach. There will likely
> never be a real-world NUMA system without SMP. So, this set would seem
> to include some increased (#ifdef) complexity for supporting NUMA &&
> !SMP, which will likely never happen in the real world.
Yes, this patch set removes "NUMA depends on SMP". It also adds some
simple NUMA emulation code, but I am sure you are aware of that!
=)
I agree that it is very unlikely to find a single-processor NUMA
system in the real world. So yes, "[PATCH 02/07] i386: numa on
non-smp" adds _some_ extra complexity. But because SMP is set when
supporting more than one cpu, and NUMA is set when supporting more
than one memory node, I see no reason why they should be dependent on
each other. Except that they depend on each other today and breaking
them loose will increase complexity a bit.
> Also, I worry that simply #ifdef'ing things out like CPUsets' update
> means that CPUsets lacks some kind of abstraction that it should have
> been using in the first place. An #ifdef just papers over the real
> problem.
Maybe. CPUSETS has two bitmaps, one for cpus and one for mems. So
depending on SMP or NUMA seems logical to me. Regarding the #ifdef, it
was added because partition_sched_domain() is only implemented for
SMP. That symbol has no prototype or implementation when CONFIG_SMP is
not set. Maybe it is better to add an empty inline function in
linux/sched.h for !SMP?
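The empty-inline idea is the usual kernel pattern for !SMP builds. The real function is declared under CONFIG_SMP and a no-op static inline is provided otherwise, so callers such as the cpuset code need no #ifdef. This is a minimal sketch. It assumes the function mentioned above is partition_sched_domains() taking two cpumask_t pointers, as in kernel/sched.c of this era (please check the actual tree), and it uses a plain unsigned long as a stand-in for cpumask_t. update_cpu_domains_sketch() is a hypothetical caller.

```c
#include <assert.h>

/* Stand-in for the kernel's cpumask_t (assumption for this sketch). */
typedef unsigned long cpumask_t;

/* CONFIG_SMP is left undefined here to model a UP build. */
#ifdef CONFIG_SMP
void partition_sched_domains(cpumask_t *partition1, cpumask_t *partition2);
#else
/* No-op stub: a UP kernel has no scheduler domains to repartition. */
static inline void partition_sched_domains(cpumask_t *partition1,
					   cpumask_t *partition2)
{
	(void)partition1;
	(void)partition2;
}
#endif

/* Hypothetical caller in the style of the cpuset code: no #ifdef needed. */
int update_cpu_domains_sketch(cpumask_t *cur, cpumask_t *rest)
{
	partition_sched_domains(cur, rest);
	return 0;
}
```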
> I think it would likely be cleaner if the approach was to emulate an SMP
> NUMA system where each NUMA node simply doesn't have all of its CPUs
> online.
Absolutely. And that removes the need for some of my patches. QEMU
runs SMP kernels. It is possible to run SMP kernels on UP hardware.
But there is of course a certain performance loss introduced by all
the SMP locks. I'd rather not force !SMP users to run SMP kernels if
they want coarse-grained memory resource control.
Thanks for your input!
/ magnus
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 2:08 ` Magnus Damm
@ 2005-10-03 7:34 ` David Lang
2005-10-03 10:02 ` Magnus Damm
2005-10-03 14:45 ` Martin J. Bligh
0 siblings, 2 replies; 38+ messages in thread
From: David Lang @ 2005-10-03 7:34 UTC (permalink / raw)
To: Magnus Damm; +Cc: Dave Hansen, Magnus Damm, linux-mm, Linux Kernel Mailing List
On Mon, 3 Oct 2005, Magnus Damm wrote:
> On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
>> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
>>> These patches implement NUMA memory node emulation for regular i386 PCs.
>>>
>>> NUMA emulation could be used to provide coarse-grained memory resource control
>>> using CPUSETS. Another use is as a test environment for NUMA memory code or
>>> CPUSETS using an i386 emulator such as QEMU.
>>
>> This patch set basically allows the "NUMA depends on SMP" dependency to
>> be removed. I'm not sure this is the right approach. There will likely
>> never be a real-world NUMA system without SMP. So, this set would seem
>> to include some increased (#ifdef) complexity for supporting NUMA &&
>> !SMP, which will likely never happen in the real world.
>
> Yes, this patch set removes "NUMA depends on SMP". It also adds some
> simple NUMA emulation code, but I am sure you are aware of that!
> =)
>
> I agree that it is very unlikely to find a single-processor NUMA
> system in the real world. So yes, "[PATCH 02/07] i386: numa on
> non-smp" adds _some_ extra complexity. But because SMP is set when
> supporting more than one cpu, and NUMA is set when supporting more
> than one memory node, I see no reason why they should be dependent on
> each other. Except that they depend on each other today and breaking
> them loose will increase complexity a bit.
Hmm, an observation from the peanut gallery: would it make sense to look at
using the NUMA code on single-processor machines that use PAE to access more
than 4G of RAM on a 32-bit system?
David Lang
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 7:34 ` David Lang
@ 2005-10-03 10:02 ` Magnus Damm
2005-10-03 13:33 ` David Lang
2005-10-03 14:45 ` Martin J. Bligh
1 sibling, 1 reply; 38+ messages in thread
From: Magnus Damm @ 2005-10-03 10:02 UTC (permalink / raw)
To: David Lang; +Cc: Dave Hansen, Magnus Damm, linux-mm, Linux Kernel Mailing List
On 10/3/05, David Lang <david.lang@digitalinsight.com> wrote:
> On Mon, 3 Oct 2005, Magnus Damm wrote:
>
> > On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
> >> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> >>> These patches implement NUMA memory node emulation for regular i386 PCs.
> >>>
> >>> NUMA emulation could be used to provide coarse-grained memory resource control
> >>> using CPUSETS. Another use is as a test environment for NUMA memory code or
> >>> CPUSETS using an i386 emulator such as QEMU.
> >>
> >> This patch set basically allows the "NUMA depends on SMP" dependency to
> >> be removed. I'm not sure this is the right approach. There will likely
> >> never be a real-world NUMA system without SMP. So, this set would seem
> >> to include some increased (#ifdef) complexity for supporting NUMA &&
> >> !SMP, which will likely never happen in the real world.
> >
> > Yes, this patch set removes "NUMA depends on SMP". It also adds some
> > simple NUMA emulation code, but I am sure you are aware of that!
> > =)
> >
> > I agree that it is very unlikely to find a single-processor NUMA
> > system in the real world. So yes, "[PATCH 02/07] i386: numa on
> > non-smp" adds _some_ extra complexity. But because SMP is set when
> > supporting more than one cpu, and NUMA is set when supporting more
> > than one memory node, I see no reason why they should be dependent on
> > each other. Except that they depend on each other today and breaking
> > them loose will increase complexity a bit.
>
> Hmm, an observation from the peanut gallery: would it make sense to look at
> using the NUMA code on single-processor machines that use PAE to access more
> than 4G of RAM on a 32-bit system?
Hm, maybe? =) What would you like to accomplish by that?
/ magnus
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 10:02 ` Magnus Damm
@ 2005-10-03 13:33 ` David Lang
2005-10-03 14:59 ` Martin J. Bligh
0 siblings, 1 reply; 38+ messages in thread
From: David Lang @ 2005-10-03 13:33 UTC (permalink / raw)
To: Magnus Damm; +Cc: Dave Hansen, Magnus Damm, linux-mm, Linux Kernel Mailing List
On Mon, 3 Oct 2005, Magnus Damm wrote:
> Date: Mon, 3 Oct 2005 19:02:08 +0900
> From: Magnus Damm <magnus.damm@gmail.com>
> To: David Lang <david.lang@digitalinsight.com>
> Cc: Dave Hansen <haveblue@us.ibm.com>, Magnus Damm <magnus@valinux.co.jp>,
> linux-mm <linux-mm@kvack.org>,
> Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation
>
> On 10/3/05, David Lang <david.lang@digitalinsight.com> wrote:
>> On Mon, 3 Oct 2005, Magnus Damm wrote:
>>
>>> On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
>>>> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
>>>>> These patches implement NUMA memory node emulation for regular i386 PCs.
>>>>>
>>>>> NUMA emulation could be used to provide coarse-grained memory resource control
>>>>> using CPUSETS. Another use is as a test environment for NUMA memory code or
>>>>> CPUSETS using an i386 emulator such as QEMU.
>>>>
>>>> This patch set basically allows the "NUMA depends on SMP" dependency to
>>>> be removed. I'm not sure this is the right approach. There will likely
>>>> never be a real-world NUMA system without SMP. So, this set would seem
>>>> to include some increased (#ifdef) complexity for supporting NUMA &&
>>>> !SMP, which will likely never happen in the real world.
>>>
>>> Yes, this patch set removes "NUMA depends on SMP". It also adds some
>>> simple NUMA emulation code, but I am sure you are aware of that!
>>> =)
>>>
>>> I agree that it is very unlikely to find a single-processor NUMA
>>> system in the real world. So yes, "[PATCH 02/07] i386: numa on
>>> non-smp" adds _some_ extra complexity. But because SMP is set when
>>> supporting more than one cpu, and NUMA is set when supporting more
>>> than one memory node, I see no reason why they should be dependent on
>>> each other. Except that they depend on each other today and breaking
>>> them loose will increase complexity a bit.
>>
>> Hmm, an observation from the peanut gallery: would it make sense to look at
>> using the NUMA code on single-processor machines that use PAE to access more
>> than 4G of RAM on a 32-bit system?
>
> Hm, maybe? =) What would you like to accomplish by that?
If nothing else, preferential use of 'local' (non-PAE) memory over 'remote'
(PAE) memory for programs, while still using it all as needed.
This may be done already, but this type of difference between the access
speed of different chunks of ram seems to be exactly the type of thing
that the NUMA code solves the general case for. I'm thinking that it may
end up simplifying things if the same general-purpose logic will work for
the specific case of PAE instead of it being hard-coded as a special case.
It also just struck me as the most obvious example of where a UP box could
have a NUMA-like memory arrangement (and therefore a case to justify
decoupling the SMP and NUMA options).
David Lang
> / magnus
>
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 13:33 ` David Lang
@ 2005-10-03 14:59 ` Martin J. Bligh
2005-10-03 15:03 ` David Lang
0 siblings, 1 reply; 38+ messages in thread
From: Martin J. Bligh @ 2005-10-03 14:59 UTC (permalink / raw)
To: David Lang, Magnus Damm
Cc: Dave Hansen, Magnus Damm, linux-mm, Linux Kernel Mailing List
> If nothing else, preferential use of 'local' (non-PAE) memory over
> 'remote' (PAE) memory for programs, while still using it all as needed.
Why would you want to do that? ;-)
> this may be done already, but this type of difference between the access
> speed of different chunks of ram seems to be exactly the type of thing
> that the NUMA code solves the general case for.
It is!
> I'm thinking that it
> may end up simplifying things if the same general-purpose logic will
> work for the specific case of PAE instead of it being hard coded as
> a special case.
But that's not the same at all! ;-) PAE memory is the same speed as
the other stuff. You just have a 3rd level of pagetables for everything.
One could (correctly) argue it made *all* memory slower, but it does so
in a uniform fashion.
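Martin's point can be illustrated by how a 32-bit virtual address is split for the page-table walk. With PAE and 4 KB pages, every address goes through three levels: 2 + 9 + 9 bits of index plus a 12-bit offset. This holds whether the backing physical page sits above or below 4 GB. Without PAE the walk has two levels (10 + 10 + 12). The sketch below only shows the index arithmetic. The struct and function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Indices for a 3-level i386 PAE walk with 4 KB pages. */
struct pae_indices {
	unsigned pdpt;		/* bits 31:30, 4 entries */
	unsigned pd;		/* bits 29:21, 512 entries */
	unsigned pt;		/* bits 20:12, 512 entries */
	unsigned offset;	/* bits 11:0 */
};

/* Indices for the classic 2-level walk without PAE. */
struct nonpae_indices {
	unsigned pd;		/* bits 31:22, 1024 entries */
	unsigned pt;		/* bits 21:12, 1024 entries */
	unsigned offset;	/* bits 11:0 */
};

struct pae_indices pae_split(uint32_t vaddr)
{
	struct pae_indices ix;

	ix.pdpt = (vaddr >> 30) & 0x3;
	ix.pd = (vaddr >> 21) & 0x1ff;
	ix.pt = (vaddr >> 12) & 0x1ff;
	ix.offset = vaddr & 0xfff;
	return ix;
}

struct nonpae_indices nonpae_split(uint32_t vaddr)
{
	struct nonpae_indices ix;

	ix.pd = vaddr >> 22;
	ix.pt = (vaddr >> 12) & 0x3ff;
	ix.offset = vaddr & 0xfff;
	return ix;
}
```

Note that the split depends only on the virtual address. The extra level costs the same for every mapping, which is the "uniform" slowdown described above.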
M.
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 14:59 ` Martin J. Bligh
@ 2005-10-03 15:03 ` David Lang
2005-10-03 15:08 ` Martin J. Bligh
0 siblings, 1 reply; 38+ messages in thread
From: David Lang @ 2005-10-03 15:03 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
On Mon, 3 Oct 2005, Martin J. Bligh wrote:
> But that's not the same at all! ;-) PAE memory is the same speed as
> the other stuff. You just have a 3rd level of pagetables for everything.
> One could (correctly) argue it made *all* memory slower, but it does so
> in a uniform fashion.
Is it? During the memory self-test at boot, I've seen machines slow down
noticeably as they pass the 4G mark.
David Lang
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 15:03 ` David Lang
@ 2005-10-03 15:08 ` Martin J. Bligh
2005-10-03 15:13 ` David Lang
0 siblings, 1 reply; 38+ messages in thread
From: Martin J. Bligh @ 2005-10-03 15:08 UTC (permalink / raw)
To: David Lang
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
--David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:03:44 -0700):
> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>
>> But that's not the same at all! ;-) PAE memory is the same speed as
>> the other stuff. You just have a 3rd level of pagetables for everything.
>> One could (correctly) argue it made *all* memory slower, but it does so
>> in a uniform fashion.
>
> Is it? During the memory self-test at boot, I've seen machines slow down noticeably as they pass the 4G mark.
Not noticed that, and I can't see why it should be the case in general,
though I suppose some machines might be odd. Got any numbers?
M.
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 15:08 ` Martin J. Bligh
@ 2005-10-03 15:13 ` David Lang
2005-10-03 15:25 ` Martin J. Bligh
0 siblings, 1 reply; 38+ messages in thread
From: David Lang @ 2005-10-03 15:13 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
On Mon, 3 Oct 2005, Martin J. Bligh wrote:
> --David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:03:44 -0700):
>
>> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>
>>> But that's not the same at all! ;-) PAE memory is the same speed as
>>> the other stuff. You just have a 3rd level of pagetables for everything.
>>> One could (correctly) argue it made *all* memory slower, but it does so
>>> in a uniform fashion.
>>
>> Is it? During the memory self-test at boot, I've seen machines slow down noticeably as they pass the 4G mark.
>
> Not noticed that, and I can't see why it should be the case in general,
> though I suppose some machines might be odd. Got any numbers?
Just the fact that the system boot memory test takes 3-4 times as long
with 8G of RAM as with 4G of RAM. I then boot a 64-bit kernel on the
system and never use PAE mode again :-)
If you can point me at a utility that will test the speed of the memory in
different chunks, I'll do some testing on the Opteron systems I have
available. Unfortunately, I don't have any Xeon systems to test this on.
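The thread names no specific utility. As a rough starting point only, the sketch below times one sequential read over each fixed-size chunk of a user-space buffer. On its own it cannot tell which chunks are backed by physical memory above 4 GB, because the kernel chooses the physical pages. It would need to be paired with some way of controlling or inspecting placement. time_chunks() is a hypothetical helper, not an existing tool.

```c
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Sum every word so the compiler cannot discard the reads. */
static uint64_t sum_words(const uint64_t *buf, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}

/*
 * Time one sequential read pass over each of nr_chunks chunks of
 * words_per_chunk 64-bit words. ns_out[c] receives the elapsed time
 * in nanoseconds for chunk c. Returns the checksum of all words read.
 */
uint64_t time_chunks(const uint64_t *buf, size_t words_per_chunk,
		     size_t nr_chunks, long long *ns_out)
{
	uint64_t total = 0;

	if (buf == NULL || ns_out == NULL)
		return 0;

	for (size_t c = 0; c < nr_chunks; c++) {
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		total += sum_words(buf + c * words_per_chunk, words_per_chunk);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		ns_out[c] = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
			    + (t1.tv_nsec - t0.tv_nsec);
	}
	return total;
}
```

A caller would malloc() and fill a large buffer, then compare ns_out across chunks. Treat single-pass numbers as rough, and repeat the runs before drawing conclusions.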
David Lang
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 15:13 ` David Lang
@ 2005-10-03 15:25 ` Martin J. Bligh
2005-10-03 15:32 ` David Lang
0 siblings, 1 reply; 38+ messages in thread
From: Martin J. Bligh @ 2005-10-03 15:25 UTC (permalink / raw)
To: David Lang
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
--David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:13:09 -0700):
> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>
>> --David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:03:44 -0700):
>>
>>> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>>
>>>> But that's not the same at all! ;-) PAE memory is the same speed as
>>>> the other stuff. You just have a 3rd level of pagetables for everything.
>>>> One could (correctly) argue it made *all* memory slower, but it does so
>>>> in a uniform fashion.
>>>
>>> Is it? During the memory self-test at boot, I've seen machines slow down noticeably as they pass the 4G mark.
>>
>> Not noticed that, and I can't see why it should be the case in general,
>> though I suppose some machines might be odd. Got any numbers?
>
> Just the fact that the system boot memory test takes 3-4 times as long with 8G of RAM as with 4G of RAM. I then boot a 64-bit kernel on the system and never use PAE mode again :-)
>
> If you can point me at a utility that will test the speed of the memory in different chunks, I'll do some testing on the Opteron systems I have available. Unfortunately, I don't have any Xeon systems to test this on.
Mmm. 64-bit uniproc systems, with > 4GB of RAM, running a 32 bit kernel
don't really strike me as a huge market segment ;-)
M.
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 15:25 ` Martin J. Bligh
@ 2005-10-03 15:32 ` David Lang
2005-10-03 15:54 ` Martin J. Bligh
0 siblings, 1 reply; 38+ messages in thread
From: David Lang @ 2005-10-03 15:32 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
On Mon, 3 Oct 2005, Martin J. Bligh wrote:
> --David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:13:09 -0700):
>
>> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>
>>> --David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:03:44 -0700):
>>>
>>>> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>>>
>>>>> But that's not the same at all! ;-) PAE memory is the same speed as
>>>>> the other stuff. You just have a 3rd level of pagetables for everything.
>>>>> One could (correctly) argue it made *all* memory slower, but it does so
>>>>> in a uniform fashion.
>>>>
>>>> Is it? During the memory self-test at boot, I've seen machines slow down noticeably as they pass the 4G mark.
>>>
>>> Not noticed that, and I can't see why it should be the case in general,
>>> though I suppose some machines might be odd. Got any numbers?
>>
>> Just the fact that the system boot memory test takes 3-4 times as long with 8G of RAM as with 4G of RAM. I then boot a 64-bit kernel on the system and never use PAE mode again :-)
>>
>> If you can point me at a utility that will test the speed of the memory in different chunks, I'll do some testing on the Opteron systems I have available. Unfortunately, I don't have any Xeon systems to test this on.
>
> Mmm. 64-bit uniproc systems, with > 4GB of RAM, running a 32 bit kernel
> don't really strike me as a huge market segment ;-)
True, but there are a lot of 32-bit uniproc systems sold by Intel that
have (or can have) more than 4G of RAM. These are the machines I was
thinking of.
David Lang
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 15:32 ` David Lang
@ 2005-10-03 15:54 ` Martin J. Bligh
2005-10-03 16:44 ` David Lang
0 siblings, 1 reply; 38+ messages in thread
From: Martin J. Bligh @ 2005-10-03 15:54 UTC (permalink / raw)
To: David Lang
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
--David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:32:47 -0700):
> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>
>> --David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:13:09 -0700):
>>
>>> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>>
>>>> --David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 08:03:44 -0700):
>>>>
>>>>> On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>>>>
>>>>>> But that's not the same at all! ;-) PAE memory is the same speed as
>>>>>> the other stuff. You just have a 3rd level of pagetables for everything.
>>>>>> One could (correctly) argue it made *all* memory slower, but it does so
>>>>>> in a uniform fashion.
>>>>>
>>>>> Is it? During the memory self-test at boot, I've seen machines slow down noticeably as they pass the 4G mark.
>>>>
>>>> Not noticed that, and I can't see why it should be the case in general,
>>>> though I suppose some machines might be odd. Got any numbers?
>>>
>>> Just the fact that the system boot memory test takes 3-4 times as long with 8G of RAM as with 4G of RAM. I then boot a 64-bit kernel on the system and never use PAE mode again :-)
>>>
>>> If you can point me at a utility that will test the speed of the memory in different chunks, I'll do some testing on the Opteron systems I have available. Unfortunately, I don't have any Xeon systems to test this on.
>>
>> Mmm. 64-bit uniproc systems, with > 4GB of RAM, running a 32 bit kernel
>> don't really strike me as a huge market segment ;-)
>
> True, but there are a lot of 32-bit uniproc systems sold by Intel that have (or can have) more than 4G of RAM. These are the machines I was thinking of.
Does your Opteron box have more than 1 socket? That'd explain it.
Anyway, it shouldn't happen on any normal platform. Until we get
numbers that prove that it does (and understand why), I don't think
we need NUMA for PAE.
M.
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 15:54 ` Martin J. Bligh
@ 2005-10-03 16:44 ` David Lang
0 siblings, 0 replies; 38+ messages in thread
From: David Lang @ 2005-10-03 16:44 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>>>>
>>>>> Not noticed that, and I can't see why it should be the case in general,
>>>>> though I suppose some machines might be odd. Got any numbers?
>>>>
>>>> Just the fact that the system boot memory test takes 3-4 times as long with 8G of RAM as with 4G of RAM. I then boot a 64-bit kernel on the system and never use PAE mode again :-)
>>>>
>>>> If you can point me at a utility that will test the speed of the memory in different chunks, I'll do some testing on the Opteron systems I have available. Unfortunately, I don't have any Xeon systems to test this on.
>>>
>>> Mmm. 64-bit uniproc systems, with > 4GB of RAM, running a 32 bit kernel
>>> don't really strike me as a huge market segment ;-)
>>
>> True, but there are a lot of 32-bit uniproc systems sold by Intel that have (or can have) more than 4G of RAM. These are the machines I was thinking of.
>
> Does your Opteron box have more than one socket? That'd explain it.
Yes, but I see the same 4G breakpoint no matter what the memory configuration,
including one dual-processor machine with 16G. If it were a matter of hitting
memory connected to the other socket, I would expect the slowdown at 8G,
not at 4G.
> Anyway, it shouldn't happen on any normal platform. Until we get
> numbers that prove that it does (and understand why), I don't think
> we need NUMA for PAE.
Ok, if nobody else is seeing any slowdown.
David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 7:34 ` David Lang
2005-10-03 10:02 ` Magnus Damm
@ 2005-10-03 14:45 ` Martin J. Bligh
2005-10-03 14:49 ` David Lang
1 sibling, 1 reply; 38+ messages in thread
From: Martin J. Bligh @ 2005-10-03 14:45 UTC (permalink / raw)
To: David Lang, Magnus Damm
Cc: Dave Hansen, Magnus Damm, linux-mm, Linux Kernel Mailing List
--David Lang <david.lang@digitalinsight.com> wrote (on Monday, October 03, 2005 00:34:40 -0700):
> On Mon, 3 Oct 2005, Magnus Damm wrote:
>
>> On 10/1/05, Dave Hansen <haveblue@us.ibm.com> wrote:
>>> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
>>>> These patches implement NUMA memory node emulation for regular i386 PCs.
>>>>
>>>> NUMA emulation could be used to provide coarse-grained memory resource control
>>>> using CPUSETS. Another use is as a test environment for NUMA memory code or
>>>> CPUSETS using an i386 emulator such as QEMU.
>>>
>>> This patch set basically allows the "NUMA depends on SMP" dependency to
>>> be removed. I'm not sure this is the right approach. There will likely
>>> never be a real-world NUMA system without SMP. So, this set would seem
>>> to include some increased (#ifdef) complexity for supporting SMP && !
>>> NUMA, which will likely never happen in the real world.
>>
>> Yes, this patch set removes "NUMA depends on SMP". It also adds some
>> simple NUMA emulation code too, but I am sure you are aware of that!
>> =)
>>
>> I agree that it is very unlikely to find a single-processor NUMA
>> system in the real world. So yes, "[PATCH 02/07] i386: numa on
>> non-smp" adds _some_ extra complexity. But because SMP is set when
>> supporting more than one cpu, and NUMA is set when supporting more
>> than one memory node, I see no reason why they should be dependent on
>> each other. Except that they depend on each other today and breaking
>> them loose will increase complexity a bit.
>
> Hmm, an observation from the peanut gallery: would it make sense to look at
> using the NUMA code on single-processor machines that use PAE to access
> more than 4G of RAM on a 32-bit system?
2 problems:
1) there aren't any ;-)
2) The memory is not physically separated from the CPUs any differently, so
it's not NUMA.
M.
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 14:45 ` Martin J. Bligh
@ 2005-10-03 14:49 ` David Lang
0 siblings, 0 replies; 38+ messages in thread
From: David Lang @ 2005-10-03 14:49 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Magnus Damm, Dave Hansen, Magnus Damm, linux-mm,
Linux Kernel Mailing List
On Mon, 3 Oct 2005, Martin J. Bligh wrote:
>>> I agree that it is very unlikely to find a single-processor NUMA
>>> system in the real world. So yes, "[PATCH 02/07] i386: numa on
>>> non-smp" adds _some_ extra complexity. But because SMP is set when
>>> supporting more than one cpu, and NUMA is set when supporting more
>>> than one memory node, I see no reason why they should be dependent on
>>> each other. Except that they depend on each other today and breaking
>>> them loose will increase complexity a bit.
>>
>> Hmm, an observation from the peanut gallery: would it make sense to look at
>> using the NUMA code on single-processor machines that use PAE to access
>> more than 4G of RAM on a 32-bit system?
>
> 2 problems:
>
> 1) there aren't any ;-)
> 2) The memory is not physically separated from the CPUs any differently, so
> it's not NUMA.
Even though it's not physically separated from the CPU(s) any differently,
doesn't its differing performance amount to the same thing?
David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-09-30 15:23 ` [PATCH 00/07][RFC] i386: NUMA emulation Dave Hansen
2005-10-03 2:08 ` Magnus Damm
@ 2005-10-03 3:21 ` Paul Jackson
2005-10-03 5:05 ` Magnus Damm
1 sibling, 1 reply; 38+ messages in thread
From: Paul Jackson @ 2005-10-03 3:21 UTC (permalink / raw)
To: Dave Hansen; +Cc: magnus, linux-mm, linux-kernel
Dave wrote:
> Also, I worry that simply #ifdef'ing things out like CPUsets' update
> means that CPUsets lacks some kind of abstraction that it should have
> been using in the first place.
In the abstract, cpusets should just assume that the system has one or
more CPUs, and one or more Memory Nodes. Ideally, it should require
neither SMP nor NUMA. Indeed, if you (Magnus) can get it
to compile with just one or the other of those two:
config CPUSETS
bool "Cpuset support"
- depends on SMP
+ depends on SMP || NUMA
then I would hope that it would compile with neither. The cpuset
hierarchy on such a system would be rather boring, with all cpusets
having the same one CPU and one Memory Node, but it should work ... in
theory of course.
In practice of course, there may be details on the edges that depend on
the current SMP/NUMA limitations, such as:
Magnus wrote:
> Regarding the #ifdef, it
> was added because partition_sched_domain() is only implemented for
> SMP. That symbol has no prototype or implementation when CONFIG_SMP is
> not set. Maybe it is better to add an empty inline function in
> linux/sched.h for !SMP?
An empty inline partition_sched_domain() would be better than ifdef's
in cpuset.c, yes. Or at least, that's usually the case. Probably here
too.
In theory at least, I applaud Magnus's work here. The asymmetry of the
SMP/NUMA define structure has always annoyed me slightly, and only been
explainable in my view as a consequence of the historical order of
development. I had a PC with a second memory board in an ISA slot,
which would qualify as a one CPU, two Memory Node system.
Or, what might bite us in the future (that PC was a long time ago): the kinks
in the current setup might become a thorn in our side as we extend to
increasingly interesting architectures.
Aside - for those reading this thread on lkml, it originated
on linux-mm. It looks like Dave added lkml to the cc list.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 3:21 ` Paul Jackson
@ 2005-10-03 5:05 ` Magnus Damm
2005-10-03 5:26 ` Hirokazu Takahashi
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: Magnus Damm @ 2005-10-03 5:05 UTC (permalink / raw)
To: Paul Jackson; +Cc: Dave Hansen, magnus, linux-mm, linux-kernel
On 10/3/05, Paul Jackson <pj@sgi.com> wrote:
> Dave wrote:
> > Also, I worry that simply #ifdef'ing things out like CPUsets' update
> > means that CPUsets lacks some kind of abstraction that it should have
> > been using in the first place.
>
> In the abstract, cpusets should just assume that the system has one or
> more CPUs, and one or more Memory Nodes. Ideally, it should require
> neither SMP nor NUMA. Indeed, if you (Magnus) can get it
> to compile with just one or the other of those two:
>
> config CPUSETS
> bool "Cpuset support"
> - depends on SMP
> + depends on SMP || NUMA
>
> then I would hope that it would compile with neither. The cpuset
> hierarchy on such a system would be rather boring, with all cpusets
> having the same one CPU and one Memory Node, but it should work ... in
> theory of course.
I just tested this on top of my patches:
@@ -245,7 +245,6 @@ config IKCONFIG_PROC
config CPUSETS
bool "Cpuset support"
- depends on SMP || NUMA
help
and it seems to work OK in practice too, on a regular !SMP !NUMA PC
anyway. As you note, the hierarchy is not that exciting. =) So either
"SMP || NUMA" or no dependency at all seems to work, once
partition_sched_domain() gets fixed, that is.
> In practice of course, there may be details on the edges that depend on
> the current SMP/NUMA limitations, such as:
>
> Magnus wrote:
> > Regarding the #ifdef, it
> > was added because partition_sched_domain() is only implemented for
> > SMP. That symbol has no prototype or implementation when CONFIG_SMP is
> > not set. Maybe it is better to add an empty inline function in
> > linux/sched.h for !SMP?
>
> An empty inline partition_sched_domain() would be better than ifdef's
> in cpuset.c, yes. Or at least, that's usually the case. Probably here
> too.
I agree.
> In theory at least, I applaud Magnus's work here. The asymmetry of the
> SMP/NUMA define structure has always annoyed me slightly, and only been
> explainable in my view as a consequence of the historical order of
> development. I had a PC with a second memory board in an ISA slot,
> which would qualify as a one CPU, two Memory Node system.
>
> Or, what might bite us in the future (that PC was a long time ago): the kinks
> in the current setup might become a thorn in our side as we extend to
> increasingly interesting architectures.
Nice to hear that you like the idea.
Maybe I should have broken down my patches into three smaller sets:
1) i386: NUMA without SMP
2) CPUSETS: NUMA || SMP
3) i386: NUMA emulation
If people like 1) then it's probably a good idea to convert other
architectures too. Both 2) and 3) above are separate but related
issues. And now seems like a good time to solve 2).
So, Paul, please let me know if you prefer SMP || NUMA or no
dependencies in the Kconfig. When I know that, I will create a new
patch that hopefully can get into -mm later on.
> Aside - for those reading this thread on lkml, it originated
> on linux-mm. It looks like Dave added lkml to the cc list.
Huh? I sent my patches both to lkml and linux-mm...
Thank you for the feedback!
/ magnus
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 5:05 ` Magnus Damm
@ 2005-10-03 5:26 ` Hirokazu Takahashi
2005-10-03 5:33 ` Paul Jackson
2005-10-03 5:34 ` Paul Jackson
2 siblings, 0 replies; 38+ messages in thread
From: Hirokazu Takahashi @ 2005-10-03 5:26 UTC (permalink / raw)
To: pj; +Cc: magnus.damm, haveblue, magnus, linux-mm, linux-kernel
Hi,
> > In theory at least, I applaud Magnus's work here. The asymmetry of the
> > SMP/NUMA define structure has always annoyed me slightly, and only been
> > explainable in my view as a consequence of the historical order of
> > development. I had a PC with a second memory board in an ISA slot,
> > which would qualify as a one CPU, two Memory Node system.
> >
> > Or, what might bite us in the future (that PC was a long time ago): the kinks
> > in the current setup might become a thorn in our side as we extend to
> > increasingly interesting architectures.
>
> Nice to hear that you like the idea.
>
> Maybe I should have broken down my patches into three smaller sets:
>
> 1) i386: NUMA without SMP
> 2) CPUSETS: NUMA || SMP
> 3) i386: NUMA emulation
>
> If people like 1) then it's probably a good idea to convert other
> architectures too. Both 2) and 3) above are separate but related
> issues. And now seems like a good time to solve 2).
>
> So, Paul, please let me know if you prefer SMP || NUMA or no
> dependencies in the Kconfig. When I know that, I will create a new
> patch that hopefully can get into -mm later on.
The latter seems like a good idea to me if you're going to extend CPUSETS
to accommodate CPUMETER or something like that.
Thanks.
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 5:05 ` Magnus Damm
2005-10-03 5:26 ` Hirokazu Takahashi
@ 2005-10-03 5:33 ` Paul Jackson
2005-10-03 5:59 ` Magnus Damm
2005-10-03 5:34 ` Paul Jackson
2 siblings, 1 reply; 38+ messages in thread
From: Paul Jackson @ 2005-10-03 5:33 UTC (permalink / raw)
To: Magnus Damm; +Cc: haveblue, magnus, linux-mm, linux-kernel
Magnus wrote:
> So, Paul, please let me know if you prefer SMP || NUMA or no
> dependencies in the Kconfig.
In theory, I prefer none. But the devil is in the details here,
and I really don't care that much.
So pick whichever you prefer, or whichever provides the nicest
looking code or patch, or flip a coin ;).
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 5:33 ` Paul Jackson
@ 2005-10-03 5:59 ` Magnus Damm
2005-10-03 7:26 ` Paul Jackson
0 siblings, 1 reply; 38+ messages in thread
From: Magnus Damm @ 2005-10-03 5:59 UTC (permalink / raw)
To: Paul Jackson; +Cc: haveblue, magnus, linux-mm, linux-kernel
On 10/3/05, Paul Jackson <pj@sgi.com> wrote:
> Magnus wrote:
> > So, Paul, please let me know if you prefer SMP || NUMA or no
> > dependencies in the Kconfig.
>
> In theory, I prefer none. But the devil is in the details here,
> and I really don't care that much.
>
> So pick whichever you prefer, or whichever provides the nicest
> looking code or patch, or flip a coin ;).
I'm tempted to consult the magic eight-ball, but I think I will stick
with the advice from Takahashi-san instead. =) So, the dependency will
be removed.
/ magnus
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 5:59 ` Magnus Damm
@ 2005-10-03 7:26 ` Paul Jackson
0 siblings, 0 replies; 38+ messages in thread
From: Paul Jackson @ 2005-10-03 7:26 UTC (permalink / raw)
To: Magnus Damm; +Cc: haveblue, magnus, linux-mm, linux-kernel
Magnus wrote:
> I think I will stick with the advice from Takahashi-san
Yes - Takahashi-san gives much better advice than an eight-ball.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 00/07][RFC] i386: NUMA emulation
2005-10-03 5:05 ` Magnus Damm
2005-10-03 5:26 ` Hirokazu Takahashi
2005-10-03 5:33 ` Paul Jackson
@ 2005-10-03 5:34 ` Paul Jackson
2 siblings, 0 replies; 38+ messages in thread
From: Paul Jackson @ 2005-10-03 5:34 UTC (permalink / raw)
To: Magnus Damm; +Cc: haveblue, magnus, linux-mm, linux-kernel
Magnus wrote:
> I sent my patches both to lkml and linux-mm...
Must be confusion on my end then. Sorry.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401