* [patch 1/4] Make the per cpu reserve configurable
2008-09-19 14:58 [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c Christoph Lameter
@ 2008-09-19 14:59 ` Christoph Lameter
2008-09-20 3:55 ` KAMEZAWA Hiroyuki
2008-09-19 14:59 ` [patch 2/4] percpu: Rename variables PERCPU_ENOUGH_ROOM -> PERCPU_AREA_SIZE Christoph Lameter
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 14:59 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, Christoph Lameter, linux-mm, jeremy, ebiederm,
travis, herbert, xemul, penberg
[-- Attachment #1: cpu_alloc_configurable_percpu --]
[-- Type: text/plain, Size: 3335 bytes --]
The per cpu reserve from which loadable modules allocate their percpu
sections is currently fixed at 8192 bytes (PERCPU_MODULE_RESERVE).
Add a new kernel parameter:
percpu=<dynamically allocatable percpu bytes>
The per cpu reserve area will be used in the following patches by the
per cpu allocator.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
arch/ia64/include/asm/percpu.h | 1 +
include/linux/percpu.h | 7 ++++++-
init/main.c | 13 +++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2008-09-16 18:14:58.000000000 -0700
+++ linux-2.6/include/linux/percpu.h 2008-09-16 18:21:01.000000000 -0700
@@ -34,6 +34,7 @@
#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
+extern unsigned int percpu_reserve;
/* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
#ifndef PERCPU_ENOUGH_ROOM
#ifdef CONFIG_MODULES
@@ -43,7 +44,7 @@
#endif
#define PERCPU_ENOUGH_ROOM \
- (__per_cpu_end - __per_cpu_start + PERCPU_MODULE_RESERVE)
+ (__per_cpu_end - __per_cpu_start + percpu_reserve)
#endif /* PERCPU_ENOUGH_ROOM */
/*
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c 2008-09-16 18:14:59.000000000 -0700
+++ linux-2.6/init/main.c 2008-09-16 18:24:12.000000000 -0700
@@ -253,6 +253,16 @@ static int __init loglevel(char *str)
early_param("loglevel", loglevel);
+unsigned int percpu_reserve = PERCPU_MODULE_RESERVE;
+
+static int __init init_percpu_reserve(char *str)
+{
+ get_option(&str, &percpu_reserve);
+ return 0;
+}
+
+early_param("percpu=", init_percpu_reserve);
+
/*
* Unknown boot options get handed to init, unless they look like
* failed parameters
@@ -397,6 +407,9 @@ static void __init setup_per_cpu_areas(v
/* Copy section for each CPU (we discard the original) */
size = ALIGN(PERCPU_ENOUGH_ROOM, PAGE_SIZE);
+ printk(KERN_INFO "percpu area: %d bytes total, %d available.\n",
+ size, size - (__per_cpu_end - __per_cpu_start));
+
ptr = alloc_bootmem_pages(size * nr_possible_cpus);
for_each_possible_cpu(i) {
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt 2008-09-16 18:14:59.000000000 -0700
+++ linux-2.6/Documentation/kernel-parameters.txt 2008-09-16 18:20:08.000000000 -0700
@@ -1643,6 +1643,13 @@ and is between 256 and 4096 characters.
Format: { 0 | 1 }
See arch/parisc/kernel/pdc_chassis.c
+ percpu= Configure the number of percpu bytes that can be
+ dynamically allocated. This is used for per cpu
+ variables of modules and other dynamic per cpu data
+ structures. Creation of per cpu structures after boot
+ may fail if this is set too low.
+ Default is 8192 bytes.
+
pf. [PARIDE]
See Documentation/paride.txt.
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [patch 1/4] Make the per cpu reserve configurable
2008-09-19 14:59 ` [patch 1/4] Make the per cpu reserve configurable Christoph Lameter
@ 2008-09-20 3:55 ` KAMEZAWA Hiroyuki
2008-09-20 23:15 ` Christoph Lameter
0 siblings, 1 reply; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-20 3:55 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
On Fri, 19 Sep 2008 07:59:00 -0700
Christoph Lameter <cl@linux-foundation.org> wrote:
> +unsigned int percpu_reserve = PERCPU_MODULE_RESERVE;
> +
Is the PERCPU_MODULE_RESERVE default size fixed to 8192 bytes
on both 32-bit and 64-bit arches?
How about enlarging it to twice that on 64-bit arches now?
Sorry for the noise.
Thanks,
-Kame
> +static int __init init_percpu_reserve(char *str)
> +{
> + get_option(&str, &percpu_reserve);
> + return 0;
> +}
> +
> +early_param("percpu=", init_percpu_reserve);
> +
> /*
> * Unknown boot options get handed to init, unless they look like
> * failed parameters
> @@ -397,6 +407,9 @@ static void __init setup_per_cpu_areas(v
>
> /* Copy section for each CPU (we discard the original) */
> size = ALIGN(PERCPU_ENOUGH_ROOM, PAGE_SIZE);
> + printk(KERN_INFO "percpu area: %d bytes total, %d available.\n",
> + size, size - (__per_cpu_end - __per_cpu_start));
> +
> ptr = alloc_bootmem_pages(size * nr_possible_cpus);
>
> for_each_possible_cpu(i) {
> Index: linux-2.6/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/kernel-parameters.txt 2008-09-16 18:14:59.000000000 -0700
> +++ linux-2.6/Documentation/kernel-parameters.txt 2008-09-16 18:20:08.000000000 -0700
> @@ -1643,6 +1643,13 @@ and is between 256 and 4096 characters.
> Format: { 0 | 1 }
> See arch/parisc/kernel/pdc_chassis.c
>
> + percpu= Configure the number of percpu bytes that can be
> + dynamically allocated. This is used for per cpu
> + variables of modules and other dynamic per cpu data
> + structures. Creation of per cpu structures after boot
> + may fail if this is set too low.
> + Default is 8192 bytes.
> +
> pf. [PARIDE]
> See Documentation/paride.txt.
>
>
* Re: [patch 1/4] Make the per cpu reserve configurable
2008-09-20 3:55 ` KAMEZAWA Hiroyuki
@ 2008-09-20 23:15 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-20 23:15 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
KAMEZAWA Hiroyuki wrote:
> Is the PERCPU_MODULE_RESERVE default size fixed to 8192 bytes
> on both 32-bit and 64-bit arches?
>
Yes.
> How about enlarging it to twice that on 64-bit arches now?
>
> Sorry for the noise.
No, it is actually a good idea to discuss the limit here. Maybe use
10000 bytes for 32-bit and 15000 for 64-bit? Many percpu variables are
counters that may be integers.
* [patch 2/4] percpu: Rename variables PERCPU_ENOUGH_ROOM -> PERCPU_AREA_SIZE
2008-09-19 14:58 [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c Christoph Lameter
2008-09-19 14:59 ` [patch 1/4] Make the per cpu reserve configurable Christoph Lameter
@ 2008-09-19 14:59 ` Christoph Lameter
2008-09-19 14:59 ` [patch 3/4] cpu alloc: The allocator Christoph Lameter
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 14:59 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, Christoph Lameter, linux-mm, jeremy, ebiederm,
travis, herbert, xemul, penberg
[-- Attachment #1: cpu_alloc_rename --]
[-- Type: text/plain, Size: 6131 bytes --]
Rename PERCPU_ENOUGH_ROOM to PERCPU_AREA_SIZE since it really specifies
the size of the percpu area.
Rename PERCPU_MODULE_RESERVE to PERCPU_RESERVE_SIZE in anticipation of more
general use of that reserve.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
arch/ia64/include/asm/percpu.h | 2 +-
arch/powerpc/kernel/setup_64.c | 4 ++--
arch/sparc64/kernel/smp.c | 2 +-
arch/x86/kernel/setup_percpu.c | 3 +--
include/linux/percpu.h | 10 +++++-----
init/main.c | 4 ++--
kernel/lockdep.c | 2 +-
kernel/module.c | 2 +-
8 files changed, 14 insertions(+), 15 deletions(-)
Index: linux-2.6/arch/ia64/include/asm/percpu.h
===================================================================
--- linux-2.6.orig/arch/ia64/include/asm/percpu.h 2008-09-16 18:20:19.000000000 -0700
+++ linux-2.6/arch/ia64/include/asm/percpu.h 2008-09-16 18:27:10.000000000 -0700
@@ -6,7 +6,7 @@
* David Mosberger-Tang <davidm@hpl.hp.com>
*/
-#define PERCPU_ENOUGH_ROOM PERCPU_PAGE_SIZE
+#define PERCPU_AREA_SIZE PERCPU_PAGE_SIZE
#ifdef __ASSEMBLY__
# define THIS_CPU(var) (per_cpu__##var) /* use this to mark accesses to per-CPU variables... */
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2008-09-16 18:25:38.000000000 -0700
+++ linux-2.6/include/linux/percpu.h 2008-09-16 18:28:55.000000000 -0700
@@ -36,16 +36,16 @@
extern unsigned int percpu_reserve;
/* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
-#ifndef PERCPU_ENOUGH_ROOM
+#ifndef PERCPU_AREA_SIZE
#ifdef CONFIG_MODULES
-#define PERCPU_MODULE_RESERVE 8192
+#define PERCPU_RESERVE_SIZE 8192
#else
-#define PERCPU_MODULE_RESERVE 0
+#define PERCPU_RESERVE_SIZE 0
#endif
-#define PERCPU_ENOUGH_ROOM \
+#define PERCPU_AREA_SIZE \
(__per_cpu_end - __per_cpu_start + percpu_reserve)
-#endif /* PERCPU_ENOUGH_ROOM */
+#endif /* PERCPU_AREA_SIZE */
/*
* Must be an lvalue. Since @var must be a simple identifier,
Index: linux-2.6/arch/powerpc/kernel/setup_64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/setup_64.c 2008-09-16 18:13:45.000000000 -0700
+++ linux-2.6/arch/powerpc/kernel/setup_64.c 2008-09-16 18:25:43.000000000 -0700
@@ -599,8 +599,8 @@ void __init setup_per_cpu_areas(void)
/* Copy section for each CPU (we discard the original) */
size = ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE);
#ifdef CONFIG_MODULES
- if (size < PERCPU_ENOUGH_ROOM)
- size = PERCPU_ENOUGH_ROOM;
+ if (size < PERCPU_AREA_SIZE)
+ size = PERCPU_AREA_SIZE;
#endif
for_each_possible_cpu(i) {
Index: linux-2.6/arch/sparc64/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/sparc64/kernel/smp.c 2008-09-16 18:13:45.000000000 -0700
+++ linux-2.6/arch/sparc64/kernel/smp.c 2008-09-16 18:25:43.000000000 -0700
@@ -1386,7 +1386,7 @@ void __init real_setup_per_cpu_areas(voi
char *ptr;
/* Copy section for each CPU (we discard the original) */
- goal = PERCPU_ENOUGH_ROOM;
+ goal = PERCPU_AREA_SIZE;
__per_cpu_shift = PAGE_SHIFT;
for (size = PAGE_SIZE; size < goal; size <<= 1UL)
Index: linux-2.6/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_percpu.c 2008-09-16 18:13:45.000000000 -0700
+++ linux-2.6/arch/x86/kernel/setup_percpu.c 2008-09-16 18:25:43.000000000 -0700
@@ -140,7 +140,7 @@ static void __init setup_cpu_pda_map(voi
*/
void __init setup_per_cpu_areas(void)
{
- ssize_t size = PERCPU_ENOUGH_ROOM;
+ ssize_t size = PERCPU_AREA_SIZE;
char *ptr;
int cpu;
@@ -148,7 +148,6 @@ void __init setup_per_cpu_areas(void)
setup_cpu_pda_map();
/* Copy section for each CPU (we discard the original) */
- size = PERCPU_ENOUGH_ROOM;
printk(KERN_INFO "PERCPU: Allocating %zd bytes of per cpu data\n",
size);
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c 2008-09-16 18:25:38.000000000 -0700
+++ linux-2.6/init/main.c 2008-09-16 18:29:40.000000000 -0700
@@ -253,7 +253,7 @@ static int __init loglevel(char *str)
early_param("loglevel", loglevel);
-unsigned int percpu_reserve = PERCPU_MODULE_RESERVE;
+unsigned int percpu_reserve = PERCPU_RESERVE_SIZE;
static int __init init_percpu_reserve(char *str)
{
@@ -406,7 +406,7 @@ static void __init setup_per_cpu_areas(v
unsigned long nr_possible_cpus = num_possible_cpus();
/* Copy section for each CPU (we discard the original) */
- size = ALIGN(PERCPU_ENOUGH_ROOM, PAGE_SIZE);
+ size = ALIGN(PERCPU_AREA_SIZE, PAGE_SIZE);
printk(KERN_INFO "percpu area: %d bytes total, %d available.\n",
size, size - (__per_cpu_end - __per_cpu_start));
Index: linux-2.6/kernel/lockdep.c
===================================================================
--- linux-2.6.orig/kernel/lockdep.c 2008-09-16 18:13:45.000000000 -0700
+++ linux-2.6/kernel/lockdep.c 2008-09-16 18:25:43.000000000 -0700
@@ -639,7 +639,7 @@ static int static_obj(void *obj)
*/
for_each_possible_cpu(i) {
start = (unsigned long) &__per_cpu_start + per_cpu_offset(i);
- end = (unsigned long) &__per_cpu_start + PERCPU_ENOUGH_ROOM
+ end = (unsigned long) &__per_cpu_start + PERCPU_AREA_SIZE
+ per_cpu_offset(i);
if ((addr >= start) && (addr < end))
Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c 2008-09-16 18:13:45.000000000 -0700
+++ linux-2.6/kernel/module.c 2008-09-16 18:25:43.000000000 -0700
@@ -476,7 +476,7 @@ static int percpu_modinit(void)
/* Static in-kernel percpu data (used). */
pcpu_size[0] = -(__per_cpu_end-__per_cpu_start);
/* Free room. */
- pcpu_size[1] = PERCPU_ENOUGH_ROOM + pcpu_size[0];
+ pcpu_size[1] = PERCPU_AREA_SIZE + pcpu_size[0];
if (pcpu_size[1] < 0) {
printk(KERN_ERR "No per-cpu room for modules.\n");
pcpu_num_used = 1;
--
* [patch 3/4] cpu alloc: The allocator
2008-09-19 14:58 [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c Christoph Lameter
2008-09-19 14:59 ` [patch 1/4] Make the per cpu reserve configurable Christoph Lameter
2008-09-19 14:59 ` [patch 2/4] percpu: Rename variables PERCPU_ENOUGH_ROOM -> PERCPU_AREA_SIZE Christoph Lameter
@ 2008-09-19 14:59 ` Christoph Lameter
2008-09-19 15:23 ` KOSAKI Motohiro
` (2 more replies)
2008-09-19 14:59 ` [patch 4/4] cpu alloc: Use cpu allocator instead of the builtin modules per cpu allocator Christoph Lameter
2008-09-19 15:28 ` [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c KOSAKI Motohiro
4 siblings, 3 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 14:59 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, Christoph Lameter, linux-mm, jeremy, ebiederm,
travis, herbert, xemul, penberg
[-- Attachment #1: cpu_alloc_base --]
[-- Type: text/plain, Size: 11973 bytes --]
The per cpu allocator allows dynamic allocation of memory on all
processors simultaneously. A bitmap is used to track used areas.
The allocator implements tight packing to reduce the cache footprint
and increase speed since cacheline contention is typically not a concern
for memory mainly used by a single cpu. Small objects will fill up gaps
left by larger allocations that required alignments.
The size of the cpu_alloc area can be changed via the percpu=xxx
kernel parameter.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
include/linux/percpu.h | 46 ++++++++++++
include/linux/vmstat.h | 2
mm/Makefile | 2
mm/cpu_alloc.c | 181 +++++++++++++++++++++++++++++++++++++++++++++++++
mm/vmstat.c | 1
5 files changed, 230 insertions(+), 2 deletions(-)
create mode 100644 include/linux/cpu_alloc.h
create mode 100644 mm/cpu_alloc.c
Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h 2008-09-19 09:45:02.000000000 -0500
+++ linux-2.6/include/linux/vmstat.h 2008-09-19 09:49:05.000000000 -0500
@@ -37,7 +37,7 @@
FOR_ALL_ZONES(PGSCAN_KSWAPD),
FOR_ALL_ZONES(PGSCAN_DIRECT),
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
- PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ PAGEOUTRUN, ALLOCSTALL, PGROTATED, CPU_BYTES,
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
#endif
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile 2008-09-19 09:45:02.000000000 -0500
+++ linux-2.6/mm/Makefile 2008-09-19 09:49:05.000000000 -0500
@@ -11,7 +11,7 @@
maccess.o page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- page_isolation.o mm_init.o $(mmu-y)
+ page_isolation.o mm_init.o cpu_alloc.o $(mmu-y)
obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
obj-$(CONFIG_BOUNCE) += bounce.o
Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/cpu_alloc.c 2008-09-19 09:49:59.000000000 -0500
@@ -0,0 +1,182 @@
+/*
+ * Cpu allocator - Manage objects allocated for each processor
+ *
+ * (C) 2008 SGI, Christoph Lameter <cl@linux-foundation.org>
+ * Basic implementation with allocation and free from a dedicated per
+ * cpu area.
+ *
+ * The per cpu allocator allows a dynamic allocation of a piece of memory on
+ * every processor. A bitmap is used to track used areas.
+ * The allocator implements tight packing to reduce the cache footprint
+ * and increase speed since cacheline contention is typically not a concern
+ * for memory mainly used by a single cpu. Small objects will fill up gaps
+ * left by larger allocations that required alignments.
+ */
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/bitmap.h>
+#include <asm/sections.h>
+#include <linux/bootmem.h>
+
+/*
+ * Basic allocation unit. A bit map is created to track the use of each
+ * UNIT_SIZE element in the cpu area.
+ */
+#define UNIT_TYPE int
+#define UNIT_SIZE sizeof(UNIT_TYPE)
+
+int units; /* Actual available units */
+
+/*
+ * How many units are needed for an object of a given size
+ */
+static int size_to_units(unsigned long size)
+{
+ return DIV_ROUND_UP(size, UNIT_SIZE);
+}
+
+/*
+ * Lock to protect the bitmap and the meta data for the cpu allocator.
+ */
+static DEFINE_SPINLOCK(cpu_alloc_map_lock);
+static unsigned long *cpu_alloc_map;
+static int nr_units; /* Number of available units */
+static int first_free; /* First known free unit */
+
+/*
+ * Mark an object as used in the cpu_alloc_map
+ *
+ * Must hold cpu_alloc_map_lock
+ */
+static void set_map(int start, int length)
+{
+ while (length-- > 0)
+ __set_bit(start++, cpu_alloc_map);
+}
+
+/*
+ * Mark an area as freed.
+ *
+ * Must hold cpu_alloc_map_lock
+ */
+static void clear_map(int start, int length)
+{
+ while (length-- > 0)
+ __clear_bit(start++, cpu_alloc_map);
+}
+
+/*
+ * Allocate an object of a certain size
+ *
+ * Returns a special pointer that can be used with CPU_PTR to find the
+ * address of the object for a certain cpu.
+ */
+void *cpu_alloc(unsigned long size, gfp_t gfpflags, unsigned long align)
+{
+ unsigned long start;
+ int units = size_to_units(size);
+ void *ptr;
+ int first;
+ unsigned long flags;
+
+ if (!size)
+ return ZERO_SIZE_PTR;
+
+ WARN_ON(align > PAGE_SIZE);
+
+ spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+
+ first = 1;
+ start = first_free;
+
+ for ( ; ; ) {
+
+ start = find_next_zero_bit(cpu_alloc_map, nr_units, start);
+ if (start >= nr_units)
+ goto out_of_memory;
+
+ if (first)
+ first_free = start;
+
+ /*
+ * Check alignment and that there is enough space after
+ * the starting unit.
+ */
+ if (start % (align / UNIT_SIZE) == 0 &&
+ find_next_bit(cpu_alloc_map, nr_units, start + 1)
+ >= start + units)
+ break;
+ start++;
+ first = 0;
+ }
+
+ if (first)
+ first_free = start + units;
+
+ if (start + units > nr_units)
+ goto out_of_memory;
+
+ set_map(start, units);
+ __count_vm_events(CPU_BYTES, units * UNIT_SIZE);
+
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+
+ ptr = __per_cpu_end + start;
+
+ if (gfpflags & __GFP_ZERO) {
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ memset(CPU_PTR(ptr, cpu), 0, size);
+ }
+
+ return ptr;
+
+out_of_memory:
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+ return NULL;
+}
+EXPORT_SYMBOL(cpu_alloc);
+
+/*
+ * Free an object. The pointer must be a cpu pointer allocated
+ * via cpu_alloc.
+ */
+void cpu_free(void *start, unsigned long size)
+{
+ unsigned long units = size_to_units(size);
+ unsigned long index = (int *)start - (int *)__per_cpu_end;
+ unsigned long flags;
+
+ if (!start || start == ZERO_SIZE_PTR)
+ return;
+
+ if (WARN_ON(index >= nr_units))
+ return;
+
+ if (WARN_ON(!test_bit(index, cpu_alloc_map) ||
+ !test_bit(index + units - 1, cpu_alloc_map)))
+ return;
+
+ spin_lock_irqsave(&cpu_alloc_map_lock, flags);
+
+ clear_map(index, units);
+ __count_vm_events(CPU_BYTES, -units * UNIT_SIZE);
+
+ if (index < first_free)
+ first_free = index;
+
+ spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
+}
+EXPORT_SYMBOL(cpu_free);
+
+
+void cpu_alloc_init(void)
+{
+ nr_units = percpu_reserve / UNIT_SIZE;
+
+ cpu_alloc_map = alloc_bootmem(BITS_TO_LONGS(nr_units));
+}
+
Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c 2008-09-19 09:45:02.000000000 -0500
+++ linux-2.6/mm/vmstat.c 2008-09-19 09:49:05.000000000 -0500
@@ -671,6 +671,7 @@
"allocstall",
"pgrotated",
+ "cpu_bytes",
#ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
"htlb_buddy_alloc_fail",
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2008-09-19 09:49:04.000000000 -0500
+++ linux-2.6/include/linux/percpu.h 2008-09-19 09:49:05.000000000 -0500
@@ -107,4 +107,52 @@
#define free_percpu(ptr) percpu_free((ptr))
#define per_cpu_ptr(ptr, cpu) percpu_ptr((ptr), (cpu))
+
+/*
+ * cpu allocator definitions
+ *
+ * The cpu allocator allows allocating an instance of an object for each
+ * processor and the use of a single pointer to access all instances
+ * of the object. cpu_alloc provides optimized means for accessing the
+ * instance of the object belonging to the currently executing processor
+ * as well as special atomic operations on fields of objects of the
+ * currently executing processor.
+ *
+ * Cpu objects are typically small. The allocator packs them tightly
+ * to increase the chance on each access that a per cpu object is already
+ * cached. Alignments may be specified but the intent is to align the data
+ * properly due to cpu alignment constraints and not to avoid cacheline
+ * contention. Any holes left by aligning objects are filled up with smaller
+ * objects that are allocated later.
+ *
+ * Cpu data can be allocated using CPU_ALLOC. The resulting pointer is
+ * pointing to the instance of the variable in the per cpu area provided
+ * by the loader. It is generally an error to use the pointer directly
+ * unless we are booting the system.
+ *
+ * __GFP_ZERO may be passed as a flag to zero the allocated memory.
+ */
+
+/* Return a pointer to the instance of a object for a particular processor */
+#define CPU_PTR(__p, __cpu) SHIFT_PERCPU_PTR((__p), per_cpu_offset(__cpu))
+
+/*
+ * Return a pointer to the instance of the object belonging to the processor
+ * running the current code.
+ */
+#define THIS_CPU(__p) SHIFT_PERCPU_PTR((__p), my_cpu_offset)
+#define __THIS_CPU(__p) SHIFT_PERCPU_PTR((__p), __my_cpu_offset)
+
+#define CPU_ALLOC(type, flags) ((typeof(type) *)cpu_alloc(sizeof(type), (flags), \
+ __alignof__(type)))
+#define CPU_FREE(pointer) cpu_free((pointer), sizeof(*(pointer)))
+
+/*
+ * Raw calls
+ */
+void *cpu_alloc(unsigned long size, gfp_t flags, unsigned long align);
+void cpu_free(void *cpu_pointer, unsigned long size);
+
+void cpu_alloc_init(void);
+
#endif /* __LINUX_PERCPU_H */
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c 2008-09-19 09:49:04.000000000 -0500
+++ linux-2.6/init/main.c 2008-09-19 09:49:05.000000000 -0500
@@ -261,7 +261,7 @@
return 0;
}
-early_param("percpu=", init_percpu_reserve);
+early_param("percpu", init_percpu_reserve);
/*
* Unknown boot options get handed to init, unless they look like
@@ -368,7 +368,11 @@
#define smp_init() do { } while (0)
#endif
-static inline void setup_per_cpu_areas(void) { }
+static inline void setup_per_cpu_areas(void)
+{
+ cpu_alloc_init();
+}
+
static inline void setup_nr_cpu_ids(void) { }
static inline void smp_prepare_cpus(unsigned int maxcpus) { }
@@ -405,6 +409,7 @@
char *ptr;
unsigned long nr_possible_cpus = num_possible_cpus();
+ cpu_alloc_init();
/* Copy section for each CPU (we discard the original) */
size = ALIGN(PERCPU_AREA_SIZE, PAGE_SIZE);
printk(KERN_INFO "percpu area: %d bytes total, %d available.\n",
Index: linux-2.6/arch/x86/kernel/setup_percpu.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_percpu.c 2008-09-19 09:49:04.000000000 -0500
+++ linux-2.6/arch/x86/kernel/setup_percpu.c 2008-09-19 09:49:05.000000000 -0500
@@ -144,6 +144,7 @@
char *ptr;
int cpu;
+ cpu_alloc_init();
/* Setup cpu_pda map */
setup_cpu_pda_map();
Index: linux-2.6/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/setup.c 2008-09-19 09:45:02.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/setup.c 2008-09-19 09:49:05.000000000 -0500
@@ -842,6 +842,7 @@
#ifdef CONFIG_ACPI_HOTPLUG_CPU
prefill_possible_map();
#endif
+ cpu_alloc_init();
}
/*
Index: linux-2.6/arch/powerpc/kernel/setup_64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/setup_64.c 2008-09-19 09:49:04.000000000 -0500
+++ linux-2.6/arch/powerpc/kernel/setup_64.c 2008-09-19 09:49:05.000000000 -0500
@@ -611,6 +611,7 @@
paca[i].data_offset = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
}
+ cpu_alloc_init();
}
#endif
Index: linux-2.6/arch/sparc64/mm/init.c
===================================================================
--- linux-2.6.orig/arch/sparc64/mm/init.c 2008-09-19 09:45:03.000000000 -0500
+++ linux-2.6/arch/sparc64/mm/init.c 2008-09-19 09:49:05.000000000 -0500
@@ -1644,6 +1644,7 @@
/* Dummy function */
void __init setup_per_cpu_areas(void)
{
+ cpu_alloc_init();
}
void __init paging_init(void)
--
* Re: [patch 3/4] cpu alloc: The allocator
2008-09-19 14:59 ` [patch 3/4] cpu alloc: The allocator Christoph Lameter
@ 2008-09-19 15:23 ` KOSAKI Motohiro
2008-09-19 16:27 ` Eric Dumazet
2008-09-19 20:32 ` Christoph Lameter
2 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-09-19 15:23 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
> +/*
> + * Allocate an object of a certain size
> + *
> + * Returns a special pointer that can be used with CPU_PTR to find the
> + * address of the object for a certain cpu.
> + */
> +void *cpu_alloc(unsigned long size, gfp_t gfpflags, unsigned long align)
Is cpu_alloc a good name?
I think some people may suspect it is a cpu-hotplug related function.
Would per_cpu_alloc() or cpu_mem_alloc() be better?
* Re: [patch 3/4] cpu alloc: The allocator
2008-09-19 14:59 ` [patch 3/4] cpu alloc: The allocator Christoph Lameter
2008-09-19 15:23 ` KOSAKI Motohiro
@ 2008-09-19 16:27 ` Eric Dumazet
2008-09-19 16:49 ` Christoph Lameter
2008-09-19 20:32 ` Christoph Lameter
2 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2008-09-19 16:27 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
Christoph Lameter wrote:
> The per cpu allocator allows dynamic allocation of memory on all
> processors simultaneously. A bitmap is used to track used areas.
> The allocator implements tight packing to reduce the cache footprint
> and increase speed since cacheline contention is typically not a concern
> for memory mainly used by a single cpu. Small objects will fill up gaps
> left by larger allocations that required alignments.
>
> The size of the cpu_alloc area can be changed via the percpu=xxx
> kernel parameter.
>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
>
> ---
> include/linux/percpu.h | 46 ++++++++++++
> include/linux/vmstat.h | 2
> mm/Makefile | 2
> mm/cpu_alloc.c | 181 +++++++++++++++++++++++++++++++++++++++++++++++++
> mm/vmstat.c | 1
> 5 files changed, 230 insertions(+), 2 deletions(-)
> create mode 100644 include/linux/cpu_alloc.h
> create mode 100644 mm/cpu_alloc.c
>
> Index: linux-2.6/include/linux/vmstat.h
> ===================================================================
> --- linux-2.6.orig/include/linux/vmstat.h 2008-09-19 09:45:02.000000000 -0500
> +++ linux-2.6/include/linux/vmstat.h 2008-09-19 09:49:05.000000000 -0500
> @@ -37,7 +37,7 @@
> FOR_ALL_ZONES(PGSCAN_KSWAPD),
> FOR_ALL_ZONES(PGSCAN_DIRECT),
> PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
> - PAGEOUTRUN, ALLOCSTALL, PGROTATED,
> + PAGEOUTRUN, ALLOCSTALL, PGROTATED, CPU_BYTES,
> #ifdef CONFIG_HUGETLB_PAGE
> HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
> #endif
> Index: linux-2.6/mm/Makefile
> ===================================================================
> --- linux-2.6.orig/mm/Makefile 2008-09-19 09:45:02.000000000 -0500
> +++ linux-2.6/mm/Makefile 2008-09-19 09:49:05.000000000 -0500
> @@ -11,7 +11,7 @@
> maccess.o page_alloc.o page-writeback.o pdflush.o \
> readahead.o swap.o truncate.o vmscan.o \
> prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
> - page_isolation.o mm_init.o $(mmu-y)
> + page_isolation.o mm_init.o cpu_alloc.o $(mmu-y)
>
> obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
> obj-$(CONFIG_BOUNCE) += bounce.o
> Index: linux-2.6/mm/cpu_alloc.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6/mm/cpu_alloc.c 2008-09-19 09:49:59.000000000 -0500
> @@ -0,0 +1,182 @@
> +/*
> + * Cpu allocator - Manage objects allocated for each processor
> + *
> + * (C) 2008 SGI, Christoph Lameter <cl@linux-foundation.org>
> + * Basic implementation with allocation and free from a dedicated per
> + * cpu area.
> + *
> + * The per cpu allocator allows a dynamic allocation of a piece of memory on
> + * every processor. A bitmap is used to track used areas.
> + * The allocator implements tight packing to reduce the cache footprint
> + * and increase speed since cacheline contention is typically not a concern
> + * for memory mainly used by a single cpu. Small objects will fill up gaps
> + * left by larger allocations that required alignments.
> + */
> +#include <linux/mm.h>
> +#include <linux/mmzone.h>
> +#include <linux/module.h>
> +#include <linux/percpu.h>
> +#include <linux/bitmap.h>
> +#include <asm/sections.h>
> +#include <linux/bootmem.h>
> +
> +/*
> + * Basic allocation unit. A bit map is created to track the use of each
> + * UNIT_SIZE element in the cpu area.
> + */
> +#define UNIT_TYPE int
> +#define UNIT_SIZE sizeof(UNIT_TYPE)
> +
> +int units; /* Actual available units */
> +
> +/*
> + * How many units are needed for an object of a given size
> + */
> +static int size_to_units(unsigned long size)
> +{
> + return DIV_ROUND_UP(size, UNIT_SIZE);
> +}
> +
> +/*
> + * Lock to protect the bitmap and the meta data for the cpu allocator.
> + */
> +static DEFINE_SPINLOCK(cpu_alloc_map_lock);
> +static unsigned long *cpu_alloc_map;
> +static int nr_units; /* Number of available units */
> +static int first_free; /* First known free unit */
> +
> +/*
> + * Mark an object as used in the cpu_alloc_map
> + *
> + * Must hold cpu_alloc_map_lock
> + */
> +static void set_map(int start, int length)
> +{
> + while (length-- > 0)
> + __set_bit(start++, cpu_alloc_map);
> +}
> +
> +/*
> + * Mark an area as freed.
> + *
> + * Must hold cpu_alloc_map_lock
> + */
> +static void clear_map(int start, int length)
> +{
> + while (length-- > 0)
> + __clear_bit(start++, cpu_alloc_map);
> +}
> +
> +/*
> + * Allocate an object of a certain size
> + *
> + * Returns a special pointer that can be used with CPU_PTR to find the
> + * address of the object for a certain cpu.
> + */
> +void *cpu_alloc(unsigned long size, gfp_t gfpflags, unsigned long align)
> +{
> + unsigned long start;
> + int units = size_to_units(size);
> + void *ptr;
> + int first;
> + unsigned long flags;
> +
> + if (!size)
> + return ZERO_SIZE_PTR;
> +
> + WARN_ON(align > PAGE_SIZE);
if (align < UNIT_SIZE)
align = UNIT_SIZE;
> +
> + spin_lock_irqsave(&cpu_alloc_map_lock, flags);
> +
> + first = 1;
> + start = first_free;
> +
> + for ( ; ; ) {
> +
> + start = find_next_zero_bit(cpu_alloc_map, nr_units, start);
> + if (start >= nr_units)
> + goto out_of_memory;
> +
> + if (first)
> + first_free = start;
> +
> + /*
> + * Check alignment and that there is enough space after
> + * the starting unit.
> + */
> + if (start % (align / UNIT_SIZE) == 0 &&
or else... divide by 0?
> + find_next_bit(cpu_alloc_map, nr_units, start + 1)
> + >= start + units)
> + break;
> + start++;
> + first = 0;
> + }
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [patch 3/4] cpu alloc: The allocator
2008-09-19 16:27 ` Eric Dumazet
@ 2008-09-19 16:49 ` Christoph Lameter
2008-09-19 17:00 ` Christoph Lameter
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 16:49 UTC (permalink / raw)
To: Eric Dumazet
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
Eric Dumazet wrote:
>> + unsigned long start;
>> + int units = size_to_units(size);
>> + void *ptr;
>> + int first;
>> + unsigned long flags;
>> +
>> + if (!size)
>> + return ZERO_SIZE_PTR;
>> +
>> + WARN_ON(align > PAGE_SIZE);
>
> if (align < UNIT_SIZE)
> align = UNIT_SIZE;
size_to_units() does round up:
/*
* How many units are needed for an object of a given size
*/
static int size_to_units(unsigned long size)
{
return DIV_ROUND_UP(size, UNIT_SIZE);
}
* Re: [patch 3/4] cpu alloc: The allocator
2008-09-19 16:49 ` Christoph Lameter
@ 2008-09-19 17:00 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 17:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
Completely wrong. We need this patch that Eric suggested:
Subject: cpu_alloc: Allow alignment < UNIT_SIZE
Limit the minimum alignment to UNIT_SIZE.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- linux-2.6.orig/mm/cpu_alloc.c 2008-09-19 11:47:30.000000000 -0500
+++ linux-2.6/mm/cpu_alloc.c 2008-09-19 11:56:47.000000000 -0500
@@ -86,6 +86,9 @@
WARN_ON(align > PAGE_SIZE);
+ if (align < UNIT_SIZE)
+ align = UNIT_SIZE;
+
spin_lock_irqsave(&cpu_alloc_map_lock, flags);
first = 1;
* Re: [patch 3/4] cpu alloc: The allocator
2008-09-19 14:59 ` [patch 3/4] cpu alloc: The allocator Christoph Lameter
2008-09-19 15:23 ` KOSAKI Motohiro
2008-09-19 16:27 ` Eric Dumazet
@ 2008-09-19 20:32 ` Christoph Lameter
2 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 20:32 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert, xemul,
penberg
Duh. A cast went missing, which results in a pointer calculation going haywire.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Index: linux-2.6/mm/cpu_alloc.c
===================================================================
--- linux-2.6.orig/mm/cpu_alloc.c 2008-09-19 14:57:25.000000000 -0500
+++ linux-2.6/mm/cpu_alloc.c 2008-09-19 14:57:33.000000000 -0500
@@ -126,7 +126,7 @@
spin_unlock_irqrestore(&cpu_alloc_map_lock, flags);
- ptr = __per_cpu_end + start;
+ ptr = (int *)__per_cpu_end + start;
printk(KERN_INFO "%d per cpu units allocated at offset %lx address %p\n",
units, start, ptr);
* [patch 4/4] cpu alloc: Use cpu allocator instead of the builtin modules per cpu allocator
2008-09-19 14:58 [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c Christoph Lameter
` (2 preceding siblings ...)
2008-09-19 14:59 ` [patch 3/4] cpu alloc: The allocator Christoph Lameter
@ 2008-09-19 14:59 ` Christoph Lameter
2008-09-19 15:28 ` [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c KOSAKI Motohiro
4 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 14:59 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, Christoph Lameter, linux-mm, jeremy, ebiederm,
travis, herbert, xemul, penberg
[-- Attachment #1: cpu_alloc_replace_modules_per_cpu_allocator --]
[-- Type: text/plain, Size: 6672 bytes --]
Remove the builtin per cpu allocator from modules.c and use cpu_alloc instead.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
include/linux/module.h | 1
kernel/module.c | 178 ++++---------------------------------------------
2 files changed, 17 insertions(+), 162 deletions(-)
Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c 2008-09-19 08:12:10.000000000 -0500
+++ linux-2.6/kernel/module.c 2008-09-19 08:16:04.000000000 -0500
@@ -337,121 +337,6 @@
return NULL;
}
-#ifdef CONFIG_SMP
-/* Number of blocks used and allocated. */
-static unsigned int pcpu_num_used, pcpu_num_allocated;
-/* Size of each block. -ve means used. */
-static int *pcpu_size;
-
-static int split_block(unsigned int i, unsigned short size)
-{
- /* Reallocation required? */
- if (pcpu_num_used + 1 > pcpu_num_allocated) {
- int *new;
-
- new = krealloc(pcpu_size, sizeof(new[0])*pcpu_num_allocated*2,
- GFP_KERNEL);
- if (!new)
- return 0;
-
- pcpu_num_allocated *= 2;
- pcpu_size = new;
- }
-
- /* Insert a new subblock */
- memmove(&pcpu_size[i+1], &pcpu_size[i],
- sizeof(pcpu_size[0]) * (pcpu_num_used - i));
- pcpu_num_used++;
-
- pcpu_size[i+1] -= size;
- pcpu_size[i] = size;
- return 1;
-}
-
-static inline unsigned int block_size(int val)
-{
- if (val < 0)
- return -val;
- return val;
-}
-
-static void *percpu_modalloc(unsigned long size, unsigned long align,
- const char *name)
-{
- unsigned long extra;
- unsigned int i;
- void *ptr;
-
- if (align > PAGE_SIZE) {
- printk(KERN_WARNING "%s: per-cpu alignment %li > %li\n",
- name, align, PAGE_SIZE);
- align = PAGE_SIZE;
- }
-
- ptr = __per_cpu_start;
- for (i = 0; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
- /* Extra for alignment requirement. */
- extra = ALIGN((unsigned long)ptr, align) - (unsigned long)ptr;
- BUG_ON(i == 0 && extra != 0);
-
- if (pcpu_size[i] < 0 || pcpu_size[i] < extra + size)
- continue;
-
- /* Transfer extra to previous block. */
- if (pcpu_size[i-1] < 0)
- pcpu_size[i-1] -= extra;
- else
- pcpu_size[i-1] += extra;
- pcpu_size[i] -= extra;
- ptr += extra;
-
- /* Split block if warranted */
- if (pcpu_size[i] - size > sizeof(unsigned long))
- if (!split_block(i, size))
- return NULL;
-
- /* Mark allocated */
- pcpu_size[i] = -pcpu_size[i];
- return ptr;
- }
-
- printk(KERN_WARNING "Could not allocate %lu bytes percpu data\n",
- size);
- return NULL;
-}
-
-static void percpu_modfree(void *freeme)
-{
- unsigned int i;
- void *ptr = __per_cpu_start + block_size(pcpu_size[0]);
-
- /* First entry is core kernel percpu data. */
- for (i = 1; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
- if (ptr == freeme) {
- pcpu_size[i] = -pcpu_size[i];
- goto free;
- }
- }
- BUG();
-
- free:
- /* Merge with previous? */
- if (pcpu_size[i-1] >= 0) {
- pcpu_size[i-1] += pcpu_size[i];
- pcpu_num_used--;
- memmove(&pcpu_size[i], &pcpu_size[i+1],
- (pcpu_num_used - i) * sizeof(pcpu_size[0]));
- i--;
- }
- /* Merge with next? */
- if (i+1 < pcpu_num_used && pcpu_size[i+1] >= 0) {
- pcpu_size[i] += pcpu_size[i+1];
- pcpu_num_used--;
- memmove(&pcpu_size[i+1], &pcpu_size[i+2],
- (pcpu_num_used - (i+1)) * sizeof(pcpu_size[0]));
- }
-}
-
static unsigned int find_pcpusec(Elf_Ehdr *hdr,
Elf_Shdr *sechdrs,
const char *secstrings)
@@ -467,48 +352,6 @@
memcpy(pcpudest + per_cpu_offset(cpu), from, size);
}
-static int percpu_modinit(void)
-{
- pcpu_num_used = 2;
- pcpu_num_allocated = 2;
- pcpu_size = kmalloc(sizeof(pcpu_size[0]) * pcpu_num_allocated,
- GFP_KERNEL);
- /* Static in-kernel percpu data (used). */
- pcpu_size[0] = -(__per_cpu_end-__per_cpu_start);
- /* Free room. */
- pcpu_size[1] = PERCPU_AREA_SIZE + pcpu_size[0];
- if (pcpu_size[1] < 0) {
- printk(KERN_ERR "No per-cpu room for modules.\n");
- pcpu_num_used = 1;
- }
-
- return 0;
-}
-__initcall(percpu_modinit);
-#else /* ... !CONFIG_SMP */
-static inline void *percpu_modalloc(unsigned long size, unsigned long align,
- const char *name)
-{
- return NULL;
-}
-static inline void percpu_modfree(void *pcpuptr)
-{
- BUG();
-}
-static inline unsigned int find_pcpusec(Elf_Ehdr *hdr,
- Elf_Shdr *sechdrs,
- const char *secstrings)
-{
- return 0;
-}
-static inline void percpu_modcopy(void *pcpudst, const void *src,
- unsigned long size)
-{
- /* pcpusec should be 0, and size of that section should be 0. */
- BUG_ON(size != 0);
-}
-#endif /* CONFIG_SMP */
-
#define MODINFO_ATTR(field) \
static void setup_modinfo_##field(struct module *mod, const char *s) \
{ \
@@ -1433,7 +1276,7 @@
module_free(mod, mod->module_init);
kfree(mod->args);
if (mod->percpu)
- percpu_modfree(mod->percpu);
+ cpu_free(mod->percpu, mod->percpu_size);
/* Free lock-classes: */
lockdep_free_key_range(mod->module_core, mod->core_size);
@@ -1833,6 +1676,7 @@
unsigned int markersstringsindex;
struct module *mod;
long err = 0;
+ unsigned long percpu_size = 0;
void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
struct exception_table_entry *extable;
mm_segment_t old_fs;
@@ -1981,15 +1825,20 @@
if (pcpuindex) {
/* We have a special allocation for this section. */
- percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
- sechdrs[pcpuindex].sh_addralign,
- mod->name);
+ unsigned long align = sechdrs[pcpuindex].sh_addralign;
+
+ percpu_size = sechdrs[pcpuindex].sh_size;
+ percpu = cpu_alloc(percpu_size, GFP_KERNEL|__GFP_ZERO, align);
+ if (!percpu)
+ printk(KERN_WARNING "Could not allocate %lu bytes percpu data\n",
+ percpu_size);
if (!percpu) {
err = -ENOMEM;
goto free_mod;
}
sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
mod->percpu = percpu;
+ mod->percpu_size = percpu_size;
}
/* Determine total sizes, and put offsets in sh_entsize. For now
@@ -2243,7 +2092,7 @@
module_free(mod, mod->module_core);
free_percpu:
if (percpu)
- percpu_modfree(percpu);
+ cpu_free(percpu, percpu_size);
free_mod:
kfree(args);
free_hdr:
Index: linux-2.6/include/linux/module.h
===================================================================
--- linux-2.6.orig/include/linux/module.h 2008-09-19 08:12:07.000000000 -0500
+++ linux-2.6/include/linux/module.h 2008-09-19 08:12:10.000000000 -0500
@@ -323,6 +323,7 @@
/* Per-cpu data. */
void *percpu;
+ int percpu_size;
/* The command line arguments (may be mangled). People like
keeping pointers to this stuff */
--
* Re: [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c
2008-09-19 14:58 [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c Christoph Lameter
` (3 preceding siblings ...)
2008-09-19 14:59 ` [patch 4/4] cpu alloc: Use cpu allocator instead of the builtin modules per cpu allocator Christoph Lameter
@ 2008-09-19 15:28 ` KOSAKI Motohiro
2008-09-19 15:50 ` Christoph Lameter
4 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-09-19 15:28 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
Hi Christoph,
> Just do the bare mininum to establish a per cpu allocator. Later patchsets
> will gradually build out the functionality.
>
> The most critical issue that came up on the last round is how to configure
> the size of the percpu area. Here we simply use a kernel parameter and use
> the static size of the existing percpu allocator for modules as a default.
>
> The effect of this patchset is to make the size of percpu data for modules
> configurable. It's no longer fixed at 8000 bytes.
I don't know this area so well.
Could you please explain what problems you are thinking about?
Performance?
Or does the fixed size cause per-cpu starvation by a huge user?
* Re: [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c
2008-09-19 15:28 ` [patch 0/4] Cpu alloc V5: Replace percpu allocator in modules.c KOSAKI Motohiro
@ 2008-09-19 15:50 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2008-09-19 15:50 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: akpm, linux-kernel, linux-mm, jeremy, ebiederm, travis, herbert,
xemul, penberg
KOSAKI Motohiro wrote:
>
> I don't know so much this area.
> Could you please what are the problem that you think about?
E.g. someone is loading lots of modules with lots of percpu data. This will
currently fail once more than 8000 bytes are allocated. With the percpu
option the size can be configured at boot time. A minor thing, but the allocator
can later be used for other things. See the full cpu alloc patchsets that have
been posted before (the last one in May).
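Per the description of patch 1/4, the reserve is set with the new percpu= kernel parameter. An illustrative boot-loader fragment (device names and the 32000-byte value are hypothetical, chosen only to show the syntax):

```shell
# GRUB kernel command line: reserve 32000 bytes of dynamically
# allocatable per-cpu space instead of the 8000-byte default
linux /boot/vmlinuz-2.6.27 root=/dev/sda1 percpu=32000
```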