* [MODSLAB 0/5] Modular slab allocator V3
@ 2006-09-01 22:33 Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 1/5] Generic Allocator Framework Christoph Lameter
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:33 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Manfred Spraul, Dave Chinner,
Andi Kleen
Modular Slab Allocator:
Why would one use this?
1. Reduced memory requirements.
Savings range from a few hundred kbytes on i386 to 5GB on a 1024p 4TB
Altix NUMA system.
The slabifier has no caches in the sense of the slab allocator. No storage
is allocated for per cpu, shared or alien caches. A slab in itself functions
as the cache. Objects are served directly from a per cpu slab (an "active"
slab). The management overhead for caches is gone.
Slabs do not contain metadata but only the payload. Metadata is kept
in the associated page struct. This means that objects can begin at the
start of a slab and are always properly aligned.
2. No cache reaper
The current slab allocator needs to periodically check its slab caches and
move objects back into the slabs. Every 2 seconds on every cpu all slab caches
are scanned and objects are moved around the system. The system cannot really
enter a quiescent state.
The slabifier needs no such mechanism in the single processor case. In the
SMP case we have a per slab flusher that is active as long as processors
have active slabs. After a timeout it flushes the active slabs back into
the slab lists. If no active slabs exist then the flusher is deactivated.
The cache_reaper has been a consistent trouble spot for interrupt holdoffs
and scheduling latencies in the SLES9 and SLES10 development cycles. I
would be grateful if we did not have to deal with that again.
3. Can use the NUMA policies of the page allocator.
The current slab allocator implements NUMA support through per node lists of
slabs. If a memory policy or cpuset restricts access to certain nodes then the
slab allocator itself must enforce these policies. This is only partially
implemented, and as a result cpusets, memory policies and the NUMA slab do
not mix too well. This leads to per node lists containing slabs that are
actually located on different nodes, which causes latencies during
cache draining (in the cache reaper and elsewhere).
The Slabifier does not implement per node slabs. Instead it uses a single
global pool of partial pages. Memory policies only come into play
when the active slab is empty and new pages are allocated from the
page allocator. In that case the page allocator allocates the page according
to the current cpuset and memory policies, and the slabifier then
serves objects from that slab to the application. The Slabifier does not
attempt to guarantee that allocations by kmalloc() are node local. It
only makes a best effort. Only kmalloc_node() guarantees that an
object is allocated on a particular node. kmalloc_node() accomplishes that
by searching the partial list for a fitting page and allocating a page
from the requested node if none can be found.
4. Reduced number of partial slabs.
The current NUMA slab allocator contains per node partial lists. This
means that fragmentation of slabs occurs on a per node basis. The more
nodes are in the system, the more potentially partially allocated slabs.
The slabifier maintains a single global partial list. Allocations on other
nodes can cause the partial list to be shrunk. The existing slab pages
of a slab cache therefore have a higher usage rate.
The locking of the partial list is a potential scalability problem that is
addressed in the following ways:
A. The partial list is only modified (and the lock taken) when necessary.
Locking is only necessary when a page enters the partial list (it was
full and the first object was deleted), or it becomes completely
depleted (slab has to be freed) or it is retrieved to become an
active slab for allocations of a particular cpu.
B. A "min_slab_order=" kernel boot option is added. This allows to increase
the size of the slab pages. Bigger slabs mean less lock taking and larger
per cpu caches. It also reduces slab fragmentation but comes with the
danger that the kernel cannot satisfy higher order allocations. However,
order 1,2,3 allocations should usually be fine. In my tests up to 32p
I have not yet seen a need to use this to reduce lock contention.
5. Ability to compactify partial lists.
The slab shrinker of the slabifier can take a function argument that is
capable of moving an object. With that the slabifier is able to reduce
the number of partial slabs (a sketch of such a callback follows this list).
6. Maintainability.
The Modular Slab is made up of components that can be replaced
individually. Modifications are easy and it is easier to add new features.
7. Performance
The performance of the Modular Slab is roughly comparable with the existing
slab allocator since both rely on managing a per cpu cache of objects.
The slab allocator does that by explicitly managing object lists and the
slabifier does it by reserving a slab per cpu for allocations.
TODO:
- More performance tests than just with AIM7.... Higher CPU
counts than 32.
Changes V2->V3:
- Tested on i386 (UP + SMP) , x86_64(up), IA64(NUMA up to 32p)
- Overload struct page in mm.h with slab definitions. That
reduces the macros significantly and makes code more
readable.
- Debug and optimize functions, Reduce cacheline footprint.
- Add support for specifying slab_min_order= at bootup in order
to be able to influence fragmentation and lock scaling.
Changes V1->V2:
- Drop pageslab and numaslab. Drop support for VMALLOC allocations.
- Enhance slabifier with some numa capability. Bypass
free list management for slabs with a single object.
Drop slab full lists and minimize lock taking
for partial lists.
- Optimize code: Generate general slab array immediately
and pass the address of the slab cache in kmalloc(). DMA
caches remain dynamic.
- Add support for non power of 2 general caches.
- Tested on i386, x86_64 and ia64.
The main intent of this patchset is to modularize
the slab allocator so that developing additions
or modifications to the allocator layer becomes easier.
The framework enables the use of multiple slab allocators
and allows the generation of additional underlying
page allocators (as f.e. needed for mempools and other
specialized things).
The modularization is accomplished through the use of a few
concepts from object oriented programming. Allocators are
described by methods and functions can produce new allocators
based on existing ones by modifying their methods.
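
As a rough sketch (using the generators declared in include/linux/allocator.h
from patch 1/5), one could stack two modifiers on top of the standard page
allocator to get an allocator whose pages are DMA memory and are freed via RCU:

	struct page_allocator *a =
		dmaify_page_allocator(rcuify_page_allocator(&page_allocator));
	struct page *page = a->allocate(a, 0, GFP_KERNEL, -1); /* order 0, any node */

	a->free(a, page, 0);	/* page goes back through call_rcu() */
	a->destructor(a);	/* tears down both derived allocators */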
So what the patches provide here is:
1. A framework for page allocators and slab allocators
2. Various methods to derive new allocators from old ones
(add rcu support, destructors, constructors, dma etc)
3. A layer that emulates the existing slab interface (the slabulator).
4. A layer that provides kmalloc functionality.
5. The Slabifier. This is conceptually the Simple Slab (see my RFC
from last week), but with the additional allocator modifications
possible it grows like it is on steroids and can then supply most of
the functionality of the existing slab allocator and even go
beyond it. My tests with AIM7 seem to indicate that it is
equal in performance to the existing slab allocator.
Some new features:
1. The slabifier can flag double frees when they occur
and will attempt to continue.
2. Ability to merge slabs of the same type.
Notably missing features:
- Slab Debugging
(This should be implemented by deriving a new slab allocator from
existing ones and adding the necessary processing in alloc and free).
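
A minimal sketch of how such a derived debugging allocator might look,
following the pattern of the dma/rcu generators in patch 1/5 (debug_alloc and
debugify_slab_allocator are hypothetical names; the actual checks are omitted):

static void *debug_alloc(struct slab_cache *sc, gfp_t flags)
{
	struct derived_slab_allocator *d = (void *)sc->slab_alloc;
	void *object = d->base->alloc(sc, flags);

	/* poisoning / red zone checks would go here */
	return object;
}

struct slab_allocator *debugify_slab_allocator
	(const struct slab_allocator *base)
{
	struct derived_slab_allocator *d = derive_slab_allocator(base, "debug");

	d->a.alloc = debug_alloc;
	return &d->a;
}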
Performance tests on 8p and 32p machines consistently show that the
performance is equal to the standard slab allocator. Memory use is much
lower since there is no metadata overhead per slab.
This patchset should just leave the existing slab allocator unharmed. It
adds a hook to include/linux/slab.h to redirect includes to the definitions
for the allocation framework by the slabulator.
Deactivate the "Traditional Slab allocator" in order to activate the modular
slab allocator.
More details may be found in the header of each of the following 4 patches.
* [MODSLAB 1/5] Generic Allocator Framework
2006-09-01 22:33 [MODSLAB 0/5] Modular slab allocator V3 Christoph Lameter
@ 2006-09-01 22:34 ` Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 2/5] Slabifier Christoph Lameter
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:34 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Andi Kleen, Dave Chinner,
Manfred Spraul
Add allocator abstraction
The allocator abstraction layer provides sources of pages for the slabifier
and it provides ways to customize the slabifier to one's needs (one can
put dmaification, rcuification and so on of slab frees on top of the
standard page allocator).
The allocator framework also provides a means for deriving new slab
allocators from old ones. That way features can be added in a generic way.
It would be possible to add rcu for slab objects or debugging in that
fashion.
The object-oriented style of deconstructing the allocators has the
advantage that we can deal with small pieces of code that add special
functionality. The overall framework makes it easy to replace pieces
and evolve the whole allocator system in a faster way.
It also provides a generic way to operate on different allocators.
It is no problem to define a new allocator that allocates from
memory pools and then use the slab allocator on that memory pool.
The code in mm/allocator.c provides some examples of what can be done
with derived allocators.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc5-mm1/mm/Makefile
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/Makefile 2006-09-01 10:13:42.824597049 -0700
+++ linux-2.6.18-rc5-mm1/mm/Makefile 2006-09-01 11:47:50.231748544 -0700
@@ -28,3 +28,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o
Index: linux-2.6.18-rc5-mm1/include/linux/allocator.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/include/linux/allocator.h 2006-09-01 11:47:23.782211182 -0700
@@ -0,0 +1,221 @@
+#ifndef _LINUX_ALLOCATOR_H
+#define _LINUX_ALLOCATOR_H
+
+/*
+ * Generic API to memory allocators.
+ * (C) 2006 Silicon Graphics, Inc,
+ * Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/gfp.h>
+
+/*
+ * Page allocators
+ *
+ * Page allocators are sources of memory in pages. They basically only
+ * support allocation and freeing of pages. The interesting thing
+ * is how these pages are obtained. With these methods we can also
+ * add things in between the page allocator and a user of the
+ * page allocator without too much effort. This allows us to
+ * encapsulate new features.
+ *
+ * New allocators could be added f.e. for specialized memory pools.
+ */
+
+struct page_allocator {
+ struct page *(*allocate)(const struct page_allocator *, int order,
+ gfp_t mask, int node);
+ void (*free)(const struct page_allocator *, struct page *, int order);
+ void (*destructor) (struct page_allocator *);
+ const char *name;
+};
+
+/* Standard page allocators */
+extern const struct page_allocator page_allocator;
+
+/*
+ * Generators for new allocators based on known allocators
+ *
+ * These behave like modifiers to already generated or
+ * existing allocators. May be combined at will.
+ */
+
+/*
+ * A way to free all pages via RCU. The RCU head is placed in the
+ * struct page so this is fully transparent and does not require any
+ * allocation and freeing via the slab.
+ */
+struct page_allocator *rcuify_page_allocator
+ (const struct page_allocator *base);
+
+/*
+ * Make an allocation via a specific allocator always return
+ * DMA memory.
+ */
+struct page_allocator *dmaify_page_allocator
+ (const struct page_allocator *base);
+
+
+/*
+ * Allocation and freeing is tracked with slab_reclaim_pages
+ */
+struct page_allocator *reclaimable_slab
+ (const struct page_allocator *base);
+
+struct page_allocator *unreclaimable_slab
+ (const struct page_allocator *base);
+
+/*
+ * This provides a constructor and a destructor call for each object
+ * on a page. The constructors and destructors calling conventions
+ * are compatible with the existing slab implementation. However,
+ * this implementation assumes that the objects always start at offset 0.
+ *
+ * The main use of these is to provide a generic form of constructors
+ * and destructors. These run after a page was allocated and before
+ * a page is freed.
+ */
+struct page_allocator *ctor_and_dtor_for_page_allocator
+ (const struct page_allocator *, unsigned int size, void *private,
+ void (*ctor)(void *, void *, unsigned long),
+ void (*dtor)(void *, void *, unsigned long));
+
+#ifdef CONFIG_NUMA
+/*
+ * Allocator that allows the customization of the NUMA behavior of an
+ * allocator. If a node is specified then the allocator will always try
+ * to allocate on that node. Flags set are ORed for every allocation.
+ * F.e. one can set GFP_THISNODE to force an allocation on a particular node
+ * or on a local node.
+ */
+struct page_allocator *numactl_allocator(const struct page_allocator *,
+ int node, gfp_t flags);
+#endif
+
+/* Tools to make your own */
+struct derived_page_allocator {
+ struct page_allocator a;
+ const struct page_allocator *base;
+};
+
+void derived_destructor(struct page_allocator *a);
+
+struct derived_page_allocator *derive_page_allocator
+ (const struct page_allocator *base,
+ const char *name);
+
+/*
+ * Slab allocators
+ */
+
+
+/*
+ * A slab cache structure must be generated and be populated in order to
+ * create a working slab cache.
+ */
+struct slab_cache {
+ const struct slab_allocator *slab_alloc;
+ const struct page_allocator *page_alloc;
+ short int node; /* Node passed to page allocator */
+ short int align; /* Alignment requirements */
+ int size; /* The size of a chunk on a slab */
+ int objsize; /* The size of an object that is in a chunk */
+ int inuse; /* Used portion of the chunk */
+ int offset; /* Offset to the freelist pointer */
+ unsigned int order; /* Size of the slab page */
+ const char *name; /* Name (only for display!) */
+ struct list_head list; /* slabinfo data */
+};
+
+/*
+ * Generic structure for opaque per slab data for slab allocators
+ */
+struct slab_control {
+ struct slab_cache sc; /* Common information */
+ void *data[50]; /* Some data */
+ void *percpu[NR_CPUS]; /* Some per cpu information. */
+};
+
+struct slab_allocator {
+ /* Allocation functions */
+ void *(*alloc)(struct slab_cache *, gfp_t);
+ void *(*alloc_node)(struct slab_cache *, gfp_t, int);
+ void (*free)(struct slab_cache *, const void *);
+
+ /* Entry point from kfree */
+ void (*__free)(struct page *, const void *);
+
+ /* Object checks */
+ int (*valid_pointer)(struct slab_cache *, const void *object);
+ unsigned long (*object_size)(struct slab_cache *, const void *);
+
+ /*
+ * Determine slab statistics in units of slabs. Returns the
+ * number of total pages used by the slab cache.
+ * active is the number of pages under allocation or empty;
+ * partial is the number of partial slabs.
+ */
+ unsigned long (*get_objects)(struct slab_cache *, unsigned long *total,
+ unsigned long *active, unsigned long *partial);
+
+ /*
+ * Create an actually usable slab cache from a slab allocator
+ */
+ struct slab_cache *(*create)(struct slab_control *,
+ const struct slab_cache *);
+
+ /*
+ * shrink defragments a slab cache by moving objects from sparsely
+ * populated slabs to others. slab shrink will terminate when there
+ * is only one fragmented slab left.
+ *
+ * The move_object function must be supplied, otherwise shrink can only
+ * free pages that are completely empty.
+ *
+ * move_object gets a slab_cache pointer and an object pointer. The
+ * function must reallocate another object and move the contents
+ * from this object into the new object. Then the function should
+ * return 1 for success. If it returns 0 then the object is pinned and
+ * the slab that the object resides on will not be freed.
+ */
+ int (*shrink)(struct slab_cache *,
+ int (*move_object)(struct slab_cache *, void *));
+
+ /*
+ * Establish a new reference so that destroy does not
+ * unnecessarily destroy the slab_cache
+ */
+ struct slab_cache * (*dup)(struct slab_cache *);
+ int (*destroy)(struct slab_cache *);
+ void (*destructor)(struct slab_allocator *);
+ const char *name;
+};
+
+/* Standard slab allocator */
+extern const struct slab_allocator slabifier_allocator;
+
+/* Access kmalloc's fixed slabs without creating new ones. */
+extern struct slab_allocator kmalloc_slab_allocator;
+
+#ifdef CONFIG_NUMA
+extern const struct slab_allocator numa_slab_allocator;
+#endif
+
+/* Generate new slab allocators based on old ones */
+struct slab_allocator *rcuify_slab_allocator(const struct slab_allocator *base);
+struct slab_allocator *dmaify_slab_allocator(const struct slab_allocator *base);
+
+/* Indestructible static allocators use this. */
+void null_slab_allocator_destructor(struct slab_allocator *);
+
+struct derived_slab_allocator {
+ struct slab_allocator a;
+ const struct slab_allocator *base;
+};
+
+void derived_slab_destructor(struct slab_allocator *a);
+
+struct derived_slab_allocator *derive_slab_allocator
+ (const struct slab_allocator *base,
+ const char *name);
+
+#endif /* _LINUX_ALLOCATOR_H */
Index: linux-2.6.18-rc5-mm1/mm/allocator.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/mm/allocator.c 2006-09-01 11:50:41.934026026 -0700
@@ -0,0 +1,451 @@
+/*
+ * Generic allocator and modifiers for allocators (slab and page allocators)
+ *
+ * (C) 2006 Silicon Graphics, Inc. Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/allocator.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+
+/*
+ * Section One: Page Allocators
+ */
+
+static char *alloc_str_combine(const char *new, const char *base)
+{
+ char *s;
+
+ s = kmalloc(strlen(new) + strlen(base) + 2, GFP_KERNEL);
+ strcpy(s, new);
+ strcat(s, ":");
+ strcat(s, base);
+ return s;
+}
+
+/* For static allocators */
+static void null_destructor(struct page_allocator *a) {}
+
+/*
+ * A general page allocator that can allocate all of memory
+ */
+static struct page *gen_alloc(const struct page_allocator *a, int order,
+ gfp_t flags, int node)
+{
+ if (order)
+ flags |= __GFP_COMP;
+#ifdef CONFIG_NUMA
+ if (node >= 0)
+ return alloc_pages_node(node, flags, order);
+#endif
+ return alloc_pages(flags, order);
+}
+
+static void gen_free(const struct page_allocator *a, struct page *page,
+ int order)
+{
+ __free_pages(page, order);
+}
+
+const struct page_allocator page_allocator = {
+ .allocate = gen_alloc,
+ .free = gen_free,
+ .destructor = null_destructor,
+ .name = "page_allocator"
+};
+
+/*
+ * Functions to deal with dynamically generating allocators.
+ */
+void derived_destructor(struct page_allocator *a)
+{
+ struct derived_page_allocator *d = (void *)a;
+
+ d->base->destructor((struct page_allocator *)d->base);
+ kfree(a->name);
+ kfree(a);
+}
+
+/*
+ * Create a new allocator based on another one. All functionality
+ * is duplicated except for the destructor. The caller needs to do
+ * modifications to some of the methods in the copy.
+ */
+struct derived_page_allocator *derive_page_allocator
+ (const struct page_allocator *base, const char *name)
+{
+ struct derived_page_allocator *d =
+ kmalloc(sizeof(struct derived_page_allocator), GFP_KERNEL);
+
+ d->base = base;
+ d->a.allocate = base->allocate;
+ d->a.free = base->free;
+ d->a.name = alloc_str_combine(name, base->name);
+ d->a.destructor = derived_destructor;
+ return d;
+};
+
+/*
+ * RCU allocator generator (this is used f.e. in the slabifier
+ * to realize SLAB_DESTROY_BY_RCU; see the slabulator on how to do this).
+ *
+ * We overload struct page once more for the RCU data
+ * lru = RCU head
+ * index = order
+ * mapping = base allocator
+ */
+static void page_free_rcu(struct rcu_head *h)
+{
+ struct page *page;
+ struct page_allocator *base;
+ int order;
+
+ page = container_of((struct list_head *)h, struct page, lru);
+ base = (void *)page->mapping;
+ order = page->index;
+ page->index = 0;
+ page->mapping = NULL;
+ base->free(base, page, order);
+}
+
+/*
+ * Use page struct as intermediate rcu storage.
+ */
+static void rcu_free(const struct page_allocator *a, struct page *page,
+ int order)
+{
+ struct rcu_head *head = (void *)&page->lru;
+ struct derived_page_allocator *d = (void *)a;
+
+ page->index = order;
+ page->mapping = (void *)d->base;
+ call_rcu(head, page_free_rcu);
+}
+
+struct page_allocator *rcuify_page_allocator
+ (const struct page_allocator *base)
+{
+ struct derived_page_allocator *d = derive_page_allocator(base,"rcu");
+
+ d->a.free = rcu_free;
+ return &d->a;
+};
+
+/*
+ * Restrict memory allocations to DMA
+ */
+static struct page *dma_alloc(const struct page_allocator *a, int order,
+ gfp_t flags, int node)
+{
+ struct derived_page_allocator *d = (void *)a;
+
+ return d->base->allocate(d->base, order, flags | __GFP_DMA, node);
+}
+
+struct page_allocator *dmaify_page_allocator
+ (const struct page_allocator *base)
+{
+ struct derived_page_allocator *d = derive_page_allocator(base, "dma");
+
+ d->a.allocate = dma_alloc;
+ return &d->a;
+}
+
+/*
+ * Allocator with constructor and destructor calls for fixed size objects
+ * in the page (used by the slabifier to realize slab destructors
+ * and constructors).
+ */
+struct deconstructor {
+ struct page_allocator a;
+ const struct page_allocator *base;
+ unsigned int size;
+ void *private;
+ void (*ctor)(void *, void *, unsigned long);
+ void (*dtor)(void *, void *, unsigned long);
+};
+
+static struct page *ctor_alloc(const struct page_allocator *a,
+ int order, gfp_t flags, int node)
+{
+ struct deconstructor *d = (void *)a;
+ struct page * page = d->base->allocate(d->base, order, flags, node);
+
+ if (d->ctor) {
+ void *start = page_address(page);
+ void *end = start + (PAGE_SIZE << order);
+ void *p;
+ int mode = 1;
+
+ /* Setup a mode compatible with slab usage */
+ if (!(flags & __GFP_WAIT))
+ mode |= 2;
+
+ for (p = start; p <= end - d->size; p += d->size)
+ d->ctor(p, d->private, mode);
+ }
+ return page;
+}
+
+static void dtor_free(const struct page_allocator *a,
+ struct page *page, int order)
+{
+ struct deconstructor *d = (void *)a;
+
+ if (d->dtor) {
+ void *start = page_address(page);
+ void *end = start + (PAGE_SIZE << order);
+ void *p;
+
+ for (p = start; p <= end - d->size; p += d->size)
+ d->dtor(p, d->private, 0);
+ }
+ d->base->free(d->base, page, order);
+}
+
+struct page_allocator *ctor_and_dtor_for_page_allocator
+ (const struct page_allocator *base,
+ unsigned int size, void *private,
+ void (*ctor)(void *, void *, unsigned long),
+ void (*dtor)(void *, void *, unsigned long))
+{
+ struct deconstructor *d =
+ kmalloc(sizeof(struct deconstructor), GFP_KERNEL);
+
+ d->a.allocate = ctor ? ctor_alloc : base->allocate;
+ d->a.free = dtor ? dtor_free : base->free;
+ d->a.destructor = derived_destructor;
+ d->a.name = alloc_str_combine("ctor_dtor", base->name);
+ d->base = base;
+ d->ctor = ctor;
+ d->dtor = dtor;
+ d->size = size;
+ d->private = private;
+ return &d->a;
+}
+
+/*
+ * Track reclaimable pages. This is used by the slabulator
+ * to mark allocations of certain slab caches.
+ */
+static struct page *rac_alloc(const struct page_allocator *a, int order,
+ gfp_t flags, int node)
+{
+ struct derived_page_allocator *d = (void *)a;
+ struct page *page = d->base->allocate(d->base, order, flags, node);
+
+ mod_zone_page_state(page_zone(page), NR_SLAB_RECLAIMABLE, 1 << order);
+ return page;
+}
+
+static void rac_free(const struct page_allocator *a, struct page *page,
+ int order)
+{
+ struct derived_page_allocator *d = (void *)a;
+
+ mod_zone_page_state(page_zone(page),
+ NR_SLAB_RECLAIMABLE, -(1 << order));
+ d->base->free(d->base, page, order);
+}
+
+struct page_allocator *reclaimable_slab(const struct page_allocator *base)
+{
+ struct derived_page_allocator *d =
+ derive_page_allocator(&page_allocator,"reclaimable");
+
+ d->a.allocate = rac_alloc;
+ d->a.free = rac_free;
+ return &d->a;
+}
+
+/*
+ * Track unreclaimable pages. This is used by the slabulator
+ * to mark allocations of certain slab caches.
+ */
+static struct page *urac_alloc(const struct page_allocator *a, int order,
+ gfp_t flags, int node)
+{
+ struct derived_page_allocator *d = (void *)a;
+ struct page *page = d->base->allocate(d->base, order, flags, node);
+
+ mod_zone_page_state(page_zone(page),
+ NR_SLAB_UNRECLAIMABLE, 1 << order);
+ return page;
+}
+
+static void urac_free(const struct page_allocator *a, struct page *page,
+ int order)
+{
+ struct derived_page_allocator *d = (void *)a;
+
+ mod_zone_page_state(page_zone(page),
+ NR_SLAB_UNRECLAIMABLE, -(1 << order));
+ d->base->free(d->base, page, order);
+}
+
+struct page_allocator *unreclaimable_slab(const struct page_allocator *base)
+{
+ struct derived_page_allocator *d =
+ derive_page_allocator(&page_allocator,"unreclaimable");
+
+ d->a.allocate = urac_alloc;
+ d->a.free = urac_free;
+ return &d->a;
+}
+
+/*
+ * Numacontrol for allocators
+ */
+struct numactl {
+ struct page_allocator a;
+ const struct page_allocator *base;
+ int node;
+ gfp_t flags;
+};
+
+static struct page *numactl_alloc(const struct page_allocator *a,
+ int order, gfp_t flags, int node)
+{
+ struct numactl *d = (void *)a;
+
+ if (d->node >= 0)
+ node = d->node;
+
+ return d->base->allocate(d->base, order, flags | d->flags, node);
+}
+
+
+struct page_allocator *numactl_allocator(const struct page_allocator *base,
+ int node, gfp_t flags)
+{
+ struct numactl *d =
+ kmalloc(sizeof(struct numactl), GFP_KERNEL);
+
+ d->a.allocate = numactl_alloc;
+ d->a.free = base->free;
+ d->a.destructor = derived_destructor;
+ d->a.name = alloc_str_combine("numa", base->name);
+ d->base = base;
+ d->node = node;
+ d->flags = flags;
+ return &d->a;
+}
+
+/*
+ * Slab allocators
+ */
+
+/* Tools to make your own */
+void null_slab_allocator_destructor(struct slab_allocator *a) {}
+
+void derived_slab_destructor(struct slab_allocator *a) {
+ struct derived_slab_allocator *d = (void *)a;
+
+ d->base->destructor((struct slab_allocator *)d->base);
+ kfree(d);
+}
+
+struct derived_slab_allocator *derive_slab_allocator
+ (const struct slab_allocator *base,
+ const char *name) {
+ struct derived_slab_allocator *d =
+ kmalloc(sizeof(struct derived_slab_allocator), GFP_KERNEL);
+
+ memcpy(&d->a, base, sizeof(struct slab_allocator));
+ d->base = base;
+ d->a.name = alloc_str_combine(name, base->name);
+ d->a.destructor = derived_slab_destructor;
+ return d;
+}
+
+/* Generate new slab allocators based on old ones */
+
+/*
+ * First a generic method to rcuify any slab. We add the rcuhead
+ * to the end of the object and use that on free.
+ */
+
+struct rcuified_slab {
+ struct slab_allocator *a;
+ const struct slab_allocator *base;
+ unsigned int rcu_offset;
+};
+
+/*
+ * Information that is added to the end of the slab
+ */
+struct slabr {
+ struct rcu_head r;
+ struct slab_cache *s;
+};
+
+struct slab_cache *rcuify_slab_create(struct slab_control *c,
+ const struct slab_cache *sc)
+{
+ struct rcuified_slab *d = (void *)sc->slab_alloc;
+ struct slab_cache i;
+
+ memcpy(&i, sc, sizeof(struct slab_cache));
+
+ i.inuse = d->rcu_offset = ALIGN(sc->inuse, sizeof(void *));
+ i.inuse += sizeof(struct slabr) + sizeof(void *);
+ while (i.inuse > i.size)
+ i.size += i.align;
+
+ i.slab_alloc = d->base;
+
+ return d->base->create(c, &i);
+}
+
+void rcu_slab_free(struct rcu_head *rcu)
+{
+ struct slabr *r = (void *) rcu;
+ struct slab_cache *s = r->s;
+ struct rcuified_slab *d = (void *)s->slab_alloc;
+ void *object = (void *) rcu - d->rcu_offset;
+
+ d->base->free(s, object);
+}
+
+void rcuify_slab_free(struct slab_cache *s, const void *object)
+{
+ struct rcuified_slab *r = (struct rcuified_slab *)(s->slab_alloc);
+
+ call_rcu((struct rcu_head *)(object + r->rcu_offset), rcu_slab_free);
+}
+
+struct slab_allocator *rcuify_slab_allocator
+ (const struct slab_allocator *base)
+{
+ struct derived_slab_allocator *d = derive_slab_allocator(base,"rcu");
+
+ d->a.create = rcuify_slab_create;
+ d->a.free = rcuify_slab_free;
+ return &d->a;
+}
+
+/*
+ * dmaification of slab allocation. This is done by dmaifying the
+ * underlying page allocator.
+ */
+struct slab_cache *dmaify_slab_create(struct slab_control *c,
+ const struct slab_cache *sc)
+{
+ struct derived_slab_allocator *d = (void *)sc->slab_alloc;
+ struct slab_cache i;
+
+ memcpy(&i, sc, sizeof(struct slab_cache));
+
+ i.page_alloc = dmaify_page_allocator(sc->page_alloc);
+
+ return d->base->create(c, &i);
+}
+
+struct slab_allocator *dmaify_slab_allocator
+ (const struct slab_allocator *base)
+{
+ struct derived_slab_allocator *d = derive_slab_allocator(base, "dma");
+
+ d->a.create = dmaify_slab_create;
+ return &d->a;
+}
+
* [MODSLAB 2/5] Slabifier
2006-09-01 22:33 [MODSLAB 0/5] Modular slab allocator V3 Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 1/5] Generic Allocator Framework Christoph Lameter
@ 2006-09-01 22:34 ` Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 3/5] /proc/slabinfo display Christoph Lameter
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:34 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Manfred Spraul, Dave Chinner,
Andi Kleen
V2->V3:
- Overload struct page
- Add new PageSlabsingle flag
Lately I have started tinkering around with the slab, in particular after
Matt Mackall mentioned at the KS that the slab should be more modular.
One particular design issue with the current slab is that it is built on the
basic notion of shifting object references from list to list. Without NUMA this
is wild enough with the per cpu caches and the shared cache, but with NUMA we
now have per node shared arrays, per node lists and per node alien caches for
every other node. Somehow this all works, but one wonders whether it has to be
that way. On very large systems the number of these entities grows to
unbelievable numbers. On our 1k cpu/node system each slab needs 128M for
alien caches alone.
So I thought it may be best to try to develop another basic slab layer
that does not have all the object queues and that does not have to carry
so much state information. I have also had concerns for a while about the way
locking is handled. We could increase parallelism with finer grained locking.
This in turn may avoid the need for object queues.
One of the problems of the NUMA slab allocator is that per node partial
slab lists are used. Partial slabs cannot be filled up from other nodes.
So what I have tried to do here is to have minimal metainformation combined
with one centralized list of partially allocated slabs. The list_lock
is only taken if list modifications become necessary. The need for those
has been drastically reduced with a few measures. See below.
After toying around for a while I came to the realization that the page struct
contains all the information necessary to manage a slab block. One can put
all the management information there, and that is also advantageous
for performance since we constantly have to use the page struct anyway for
reverse object lookups and during slab creation. So this also reduces the
cache footprint of the slab. The alignment is naturally the best since the
first object starts right at the page boundary. This reduces the complexity
of alignment calculations.
Also we have a page lock in the page struct that is used
for locking each slab during modifications. Taking the lock per slab
is the finest grained locking available and this is fundamental
to the slabifier. The slab lock is taken if the slab contains
multiple objects in order to protect the freelist.
The freelists of objects per page are managed as a chained list.
The struct page contains a pointer to the first element. The first word of
a free element contains a pointer to the next free element and so on until the
chain ends with NULL. If the object cannot be overwritten after free (RCU
and constructors etc) then we can shift the pointer to the next free element
behind the object.
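
A condensed sketch of what walking this chain looks like (this is essentially
what on_freelist() in the patch below does; page->offset is the link offset
in words):

	void **object = page->freelist;		/* first free object, or NULL */

	while (object)
		object = object[page->offset];	/* follow link to next free object */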
The slabifier removes the need for a list of free slabs and a list
of used slabs. Free slabs are immediately returned to the page allocator.
The page allocator has its own per cpu queues that will manage these
free pages. Used slabs are simply not tracked because we never have
a need to find out where the used slabs are. The only important thing
is that the slabs come back to the partial list when an object in them
is deleted. The metadata in the corresponding page struct will allow
us to do that easily.
Per cpu caches exist in the sense that each processor has a per processor
"cpuslab". Objects in this "active" slab will only be allocated from this
processor. This naturally makes all allocations from a single processor
come from the same slab page which reduces fragmentation.
The page state is likely going to stay in the cache. Allocation will be
very fast since we only need the page struct reference for all our needs
which is likely not contended at all. Fetching the next free pointer from
the location of the object nicely prefetches the object.
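
Condensed, the per cpu fast path looks roughly like this (see __slab_alloc()
below; locking and the slow paths are omitted):

	struct page *page = s->active[smp_processor_id()];
	void **object = page->freelist;

	page->freelist = object[page->offset];
	page->inuse++;
	return object;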
The list_lock is used only in very rare cases. Let's discuss one example
of multiple processors allocating from the same cache. The first thing that
happens when the slab cache is empty is that every processor gets its own
slab (the "active" slab). This does not require the list_lock because we
get a page from the page allocator and that immediately becomes the
active slab. Now all processors allocate from their slabs. This also
does not require any access to the partial lists so no list_lock is taken.
If a slab becomes full then each processor simply forgets about
the slab and gets a new one from the page allocator.
Therefore, as long as all processors are just allocating, no list_lock is
needed at all.
If a free now happens then things get a bit more complicated. If the free
occurs on an active page then again no list_lock needs to be taken.
Only the slab lock may be contended since the page may currently be used for
allocations by another processor.
If the free occurs on a fully allocated page then we make a partially
allocated page from a full page. Now the list_lock will be taken and
the page is put on the partial list.
If further frees occur on a partially allocated page then no list_lock
needs to be taken either because it is still a partially allocated
page. This works until the page has no objects left. At that point
we take the page off the list of partial slabs to free it, and that
requires the list_lock again.
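
The free path decisions above condense to roughly the following (see
slab_free() below; debugging and interrupt handling omitted):

	slab_lock(page);
	prior = object[page->offset] = page->freelist;
	page->freelist = object;
	page->inuse--;

	if (PageActive(page) || (page->inuse && prior)) {
		slab_unlock(page);		/* common case: no list_lock */
	} else if (!prior) {
		add_partial(s, page);		/* slab was full, now partial */
		slab_unlock(page);
	} else {
		remove_partial(s, page);	/* last object freed */
		slab_unlock(page);
		discard_slab(s, page);
	}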
If a processor has filled up its active slab and needs a new one then
it will first check if there are partially allocated slabs available.
If so then it will take a partially allocated slab and begin to fill
it up. That also requires taking the list_lock.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc5-mm1/mm/slabifier.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/mm/slabifier.c 2006-09-01 14:25:55.907938735 -0700
@@ -0,0 +1,936 @@
+/*
+ * Generic Slabifier for the allocator abstraction framework.
+ *
+ * The allocator synchronizes using slab based locks and only
+ * uses a centralized list lock to manage the pool of partial slabs.
+ *
+ * (C) 2006 Silicon Graphics Inc., Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/allocator.h>
+#include <linux/bit_spinlock.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+/*
+ * Enabling SLABIFIER_DEBUG will check various things. Among others it will
+ * flag double frees.
+ */
+/* #define SLABIFIER_DEBUG */
+
+struct slab {
+ struct slab_cache sc;
+#ifdef CONFIG_SMP
+ int flusher_active;
+ struct work_struct flush;
+#endif
+ atomic_t refcount; /* Refcount for destroy */
+ atomic_long_t nr_slabs; /* Total slabs used */
+ /* Performance critical items follow */
+ int size; /* Total size of an object */
+ int offset; /* Free pointer offset. */
+ int objects; /* Number of objects in slab */
+ spinlock_t list_lock;
+ struct list_head partial;
+ unsigned long nr_partial;
+ struct page *active[NR_CPUS];
+};
+
+/*
+ * The page struct is used to keep necessary information about a slab.
+ * For a compound page the first page keeps the slab state.
+ *
+ * Lock order:
+ * 1. slab_lock(page)
+ * 2. slab->list_lock
+ *
+ * The slabifier assigns one slab for allocation to each processor.
+ * Allocations only occur from these active slabs.
+ *
+ * If a cpu slab is active then a workqueue thread checks every 2
+ * seconds if the cpu slab is still in use. The cpu slab is pushed back
+ * to the list if inactive [only needed for SMP].
+ *
+ * Leftover slabs with free elements are kept on a partial list.
+ * There is no list for full slabs. If an object in a full slab is
+ * freed then the slab will show up again on the partial lists.
+ * Otherwise there is no need to track filled up slabs.
+ *
+ * Slabs are freed when they become empty. Teardown and setup is
+ * minimal so we rely on the page allocators per cpu caches for
+ * fast frees and allocations.
+ */
+
+/*
+ * Locking for each individual slab using the pagelock
+ */
+static __always_inline void slab_lock(struct page *page)
+{
+ bit_spin_lock(PG_locked, &page->flags);
+}
+
+static __always_inline void slab_unlock(struct page *page)
+{
+ bit_spin_unlock(PG_locked, &page->flags);
+}
+
+/*
+ * Management of partially allocated slabs
+ */
+static void __always_inline add_partial(struct slab *s, struct page *page)
+{
+ spin_lock(&s->list_lock);
+ s->nr_partial++;
+ list_add_tail(&page->lru, &s->partial);
+ spin_unlock(&s->list_lock);
+}
+
+static void __always_inline remove_partial(struct slab *s,
+ struct page *page)
+{
+ spin_lock(&s->list_lock);
+ list_del(&page->lru);
+ s->nr_partial--;
+ spin_unlock(&s->list_lock);
+}
+
+/*
+ * Lock page and remove it from the partial list
+ *
+ * Must hold list_lock
+ */
+static __always_inline int lock_and_del_slab(struct slab *s,
+ struct page *page)
+{
+ if (bit_spin_trylock(PG_locked, &page->flags)) {
+ list_del(&page->lru);
+ s->nr_partial--;
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Get a partial page, lock it and return it.
+ */
+#ifdef CONFIG_NUMA
+static struct page *get_partial(struct slab *s, int node)
+{
+ struct page *page;
+ int searchnode = (node == -1) ? numa_node_id() : node;
+
+ spin_lock(&s->list_lock);
+ /*
+ * Search for slab on the right node
+ */
+ list_for_each_entry(page, &s->partial, lru)
+ if (likely(page_to_nid(page) == searchnode) &&
+ lock_and_del_slab(s, page))
+ goto out;
+
+ if (likely(node == -1)) {
+ /*
+ * We can fall back to any other node in order to
+ * reduce the size of the partial list.
+ */
+ list_for_each_entry(page, &s->partial, lru)
+ if (likely(lock_and_del_slab(s, page)))
+ goto out;
+ }
+
+ /* Nothing found */
+ page = NULL;
+out:
+ spin_unlock(&s->list_lock);
+ return page;
+}
+#else
+static struct page *get_partial(struct slab *s, int node)
+{
+ struct page *page;
+
+ spin_lock(&s->list_lock);
+ list_for_each_entry(page, &s->partial, lru)
+ if (likely(lock_and_del_slab(s, page)))
+ goto out;
+
+ /* No slab or all slabs busy */
+ page = NULL;
+out:
+ spin_unlock(&s->list_lock);
+ return page;
+}
+#endif
+
+
+/*
+ * Debugging checks
+ */
+static void check_slab(struct page *page)
+{
+#ifdef SLABIFIER_DEBUG
+ if (!PageSlab(page)) {
+ printk(KERN_CRIT "Not a valid slab page @%p flags=%lx"
+ " mapping=%p count=%d \n",
+ page, page->flags, page->mapping, page_count(page));
+ BUG();
+ }
+#endif
+}
+
+static int check_valid_pointer(struct slab *s, struct page *page,
+ void *object, void *origin)
+{
+#ifdef SLABIFIER_DEBUG
+ void *base = page_address(page);
+
+ if (object < base || object >= base + s->objects * s->size) {
+ printk(KERN_CRIT "slab %s size %d: pointer %p->%p\nnot in"
+ " range (%p-%p) in page %p\n", s->sc.name, s->size,
+ origin, object, base, base + s->objects * s->size,
+ page);
+ return 0;
+ }
+
+ if ((object - base) % s->size) {
+ printk(KERN_CRIT "slab %s size %d: pointer %p->%p\n"
+ "does not properly point"
+ "to an object in page %p\n",
+ s->sc.name, s->size, origin, object, page);
+ return 0;
+ }
+#endif
+ return 1;
+}
+
+/*
+ * Determine if a certain object on a page is on the freelist and
+ * therefore free. Must hold the slab lock for active slabs to
+ * guarantee that the chains are consistent.
+ */
+static int on_freelist(struct slab *s, struct page *page, void *search)
+{
+ int nr = 0;
+ void **object = page->freelist;
+ void *origin = &page->lru;
+
+ if (PageSlabsingle(page))
+ return 0;
+
+ check_slab(page);
+
+ while (object && nr <= s->objects) {
+ if (object == search)
+ return 1;
+ if (!check_valid_pointer(s, page, object, origin))
+ goto try_recover;
+ origin = object;
+ object = object[s->offset];
+ nr++;
+ }
+
+ if (page->inuse != s->objects - nr) {
+ printk(KERN_CRIT "slab %s: page %p wrong object count."
+ " counter is %d but counted were %d\n",
+ s->sc.name, page, page->inuse,
+ s->objects - nr);
+try_recover:
+ printk(KERN_CRIT "****** Trying to continue by marking "
+ "all objects in the slab used (memory leak!)\n");
+ page->inuse = s->objects;
+ page->freelist = NULL;
+ }
+ return 0;
+}
+
+void check_free_chain(struct slab *s, struct page *page)
+{
+#ifdef SLABIFIER_DEBUG
+ on_freelist(s, page, NULL);
+#endif
+}
+
+/*
+ * Operations on slabs
+ */
+static void discard_slab(struct slab *s, struct page *page)
+{
+ atomic_long_dec(&s->nr_slabs);
+
+ page->mapping = NULL;
+ reset_page_mapcount(page);
+ __ClearPageSlab(page);
+ __ClearPageSlabsingle(page);
+
+ s->sc.page_alloc->free(s->sc.page_alloc, page, s->sc.order);
+}
+
+/*
+ * Allocate a new slab and prepare an empty freelist and the basic struct
+ * page settings.
+ */
+static struct page *new_slab(struct slab *s, gfp_t flags, int node)
+{
+ struct page *page;
+
+ if (flags & __GFP_WAIT)
+ local_irq_enable();
+
+ page = s->sc.page_alloc->allocate(s->sc.page_alloc, s->sc.order,
+ flags, node == -1 ? s->sc.node : node);
+ if (!page)
+ goto out;
+
+ page->offset = s->offset;
+
+ atomic_long_inc(&s->nr_slabs);
+
+ page->slab = (struct slab_cache *)s;
+ __SetPageSlab(page);
+
+ if (s->objects > 1) {
+ void *start = page_address(page);
+ void *end = start + s->objects * s->size;
+ void **last = start;
+ void *p = start + s->size;
+
+ while (p < end) {
+ last[s->offset] = p;
+ last = p;
+ p += s->size;
+ }
+ last[s->offset] = NULL;
+ page->freelist = start;
+ page->inuse = 0;
+ check_free_chain(s, page);
+ } else
+ __SetPageSlabsingle(page);
+
+out:
+ if (flags & __GFP_WAIT)
+ local_irq_disable();
+ return page;
+}
+
+/*
+ * Move a page back to the lists.
+ *
+ * Must be called with the slab lock held.
+ *
+ * On exit the slab lock will have been dropped.
+ */
+static void __always_inline putback_slab(struct slab *s, struct page *page)
+{
+ if (page->inuse) {
+ if (page->inuse < s->objects)
+ add_partial(s, page);
+ slab_unlock(page);
+ } else {
+ slab_unlock(page);
+ discard_slab(s, page);
+ }
+}
+
+/*
+ * Remove the currently active slab
+ */
+static void __always_inline deactivate_slab(struct slab *s,
+ struct page *page, int cpu)
+{
+ s->active[cpu] = NULL;
+ __ClearPageActive(page);
+ __ClearPageReferenced(page);
+
+ putback_slab(s, page);
+}
+
+/*
+ * Deactivate slab if we have an active slab.
+ */
+static void flush_active(struct slab *s, int cpu)
+{
+ struct page *page;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ page = s->active[cpu];
+ if (likely(page)) {
+ slab_lock(page);
+ deactivate_slab(s, page, cpu);
+ }
+ local_irq_restore(flags);
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Check active per cpu slabs and flush them if they are not in use.
+ */
+void flusher(void *d)
+{
+ struct slab *s = d;
+ int cpu = smp_processor_id();
+ struct page *page;
+ int nr_active = 0;
+
+ for_each_online_cpu(cpu) {
+
+ page = s->active[cpu];
+ if (!page)
+ continue;
+
+ if (PageReferenced(page)) {
+ ClearPageReferenced(page);
+ nr_active++;
+ } else
+ flush_active(s, cpu);
+ }
+ if (nr_active)
+ schedule_delayed_work(&s->flush, 2 * HZ);
+ else
+ s->flusher_active = 0;
+}
+
+static void drain_all(struct slab *s)
+{
+ int cpu;
+
+ if (s->flusher_active) {
+ cancel_delayed_work(&s->flush);
+ for_each_possible_cpu(cpu)
+ flush_active(s, cpu);
+ s->flusher_active = 0;
+ }
+}
+#else
+static void drain_all(struct slab *s)
+{
+ flush_active(s, 0);
+}
+#endif
+
+static __always_inline void *__slab_alloc(struct slab_cache *sc,
+ gfp_t gfpflags, int node)
+{
+ struct slab *s = (void *)sc;
+ struct page *page;
+ void **object;
+ void *next_object;
+ unsigned long flags;
+ int cpu;
+
+ local_irq_save(flags);
+ cpu = smp_processor_id();
+ page = s->active[cpu];
+ if (!page)
+ goto new_slab;
+
+ slab_lock(page);
+ check_free_chain(s, page);
+ if (unlikely(!page->freelist))
+ goto another_slab;
+
+ if (unlikely(node != -1 && page_to_nid(page) != node))
+ goto another_slab;
+redo:
+ page->inuse++;
+ object = page->freelist;
+ page->freelist = next_object = object[page->offset];
+ __SetPageReferenced(page);
+ slab_unlock(page);
+ local_irq_restore(flags);
+ return object;
+
+another_slab:
+ deactivate_slab(s, page, cpu);
+
+new_slab:
+ /*
+ * This was moved out of line since it dereferences s and thus
+ * potentially touches an extra cacheline
+ */
+ if (unlikely(s->objects == 1)) {
+ page = new_slab(s, gfpflags, node);
+ local_irq_restore(flags);
+ if (page)
+ return page_address(page);
+ else
+ return NULL;
+ }
+
+ /* Racy check. If we mistakenly see no partial slabs then we
+ * just allocate an empty slab. If we mistakenly try to get a
+ * partial slab then get_partial() will return NULL.
+ */
+ if (s->nr_partial) {
+ page = get_partial(s, node);
+ if (page)
+ goto gotpage;
+ }
+
+ page = new_slab(s, gfpflags, node);
+
+ if (!page) {
+ local_irq_restore(flags);
+ return NULL;
+ }
+
+ slab_lock(page);
+
+gotpage:
+ if (s->active[cpu]) {
+ slab_unlock(page);
+ discard_slab(s, page);
+ page = s->active[cpu];
+ slab_lock(page);
+ } else
+ s->active[cpu] = page;
+
+ __SetPageActive(page);
+ check_free_chain(s, page);
+
+#ifdef CONFIG_SMP
+ if (keventd_up() && !s->flusher_active) {
+ s->flusher_active = 1;
+ schedule_delayed_work(&s->flush, 2 * HZ);
+ }
+#endif
+ goto redo;
+}
+
+static void *slab_alloc(struct slab_cache *sc, gfp_t gfpflags)
+{
+ return __slab_alloc(sc, gfpflags, -1);
+}
+
+static void *slab_alloc_node(struct slab_cache *sc, gfp_t gfpflags,
+ int node)
+{
+#ifdef CONFIG_NUMA
+ return __slab_alloc(sc, gfpflags, node);
+#else
+ return slab_alloc(sc, gfpflags);
+#endif
+}
+
+static void slab_free(struct slab_cache *sc, const void *x)
+{
+ struct slab *s = (void *)sc;
+ struct page * page;
+ void *prior;
+ void **object = (void *)x;
+ unsigned long flags;
+
+ if (!object)
+ return;
+
+ page = virt_to_page(x);
+
+ if (unlikely(PageCompound(page)))
+ page = page->first_page;
+
+ if (!s)
+ s = (void *)page->slab;
+
+ if (unlikely(PageSlabsingle(page)))
+ goto single_object_slab;
+
+#ifdef SLABIFIER_DEBUG
+ if (unlikely(s != (void *)page->slab))
+ goto slab_mismatch;
+ if (unlikely(!check_valid_pointer(s, page, object, NULL)))
+ goto dumpret;
+#endif
+
+ local_irq_save(flags);
+ slab_lock(page);
+
+#ifdef SLABIFIER_DEBUG
+ if (on_freelist(s, page, object))
+ goto double_free;
+#endif
+
+ prior = object[page->offset] = page->freelist;
+ page->freelist = object;
+ page->inuse--;
+
+ if (likely(PageActive(page) || (page->inuse && prior))) {
+out_unlock:
+ slab_unlock(page);
+ local_irq_restore(flags);
+ return;
+ }
+
+ if (!prior) {
+ /*
+ * Page was fully used before. It will have one free
+ * object now. So move to the partial list.
+ */
+ add_partial(s, page);
+ goto out_unlock;
+ }
+
+ /*
+ * All object have been freed.
+ */
+ remove_partial(s, page);
+ slab_unlock(page);
+ discard_slab(s, page);
+ local_irq_restore(flags);
+ return;
+
+single_object_slab:
+ discard_slab(s, page);
+ return;
+
+#ifdef SLABIFIER_DEBUG
+double_free:
+ printk(KERN_CRIT "slab_free %s: object %p already free.\n",
+ s->sc.name, object);
+ dump_stack();
+ goto out_unlock;
+
+slab_mismatch:
+ if (!PageSlab(page)) {
+ printk(KERN_CRIT "slab_free %s size %d: attempt to free "
+ "object(%p) outside of slab.\n",
+ s->sc.name, s->size, object);
+ goto dumpret;
+ }
+
+ if (!page->slab) {
+ printk(KERN_CRIT
+ "slab_free : no slab(NULL) for object %p.\n",
+ object);
+ goto dumpret;
+ }
+
+ printk(KERN_CRIT "slab_free %s(%d): object at %p"
+ " belongs to slab %s(%d)\n",
+ s->sc.name, s->sc.size, object,
+ page->slab->name, page->slab->size);
+
+dumpret:
+ dump_stack();
+ printk(KERN_CRIT "***** Trying to continue by not "
+ "freeing object.\n");
+ return;
+#endif
+}
+
+/* Figure out on which slab object the object resides */
+static __always_inline struct page *get_object_page(const void *x)
+{
+ struct page * page = virt_to_page(x);
+
+ if (unlikely(PageCompound(page)))
+ page = page->first_page;
+
+ if (!PageSlab(page))
+ return NULL;
+
+ return page;
+}
+
+/*
+ * slab_create produces objects aligned at size and the first object
+ * is placed at offset 0 in the slab (We have no metainformation on the
+ * slab, all slabs are in essence off slab).
+ *
+ * In order to get the desired alignment one just needs to align the
+ * size.
+ *
+ * Notice that the allocation order determines the sizes of the per cpu
+ * caches. Each processor has always one slab available for allocations.
+ * Increasing the allocation order reduces the number of times that slabs
+ * must be moved on and off the partial lists and therefore may influence
+ * locking overhead.
+ *
+ * The offset is used to relocate the free list link in each object. It is
+ * therefore possible to move the free list link behind the object. This
+ * is necessary for RCU to work properly and also useful for debugging.
+ *
+ * However no freelists are necessary if there is only one element per
+ * slab.
+ */
+static struct slab_cache *slab_create(struct slab_control *x,
+ const struct slab_cache *sc)
+{
+ struct slab *s = (void *)x;
+ int cpu;
+
+ /* Verify that the generic structure is big enough for our data */
+ BUG_ON(sizeof(struct slab_control) < sizeof(struct slab));
+
+ memcpy(&x->sc, sc, sizeof(struct slab_cache));
+
+ s->size = ALIGN(sc->size, sizeof(void *));
+
+ if (sc->offset > s->size - sizeof(void *) ||
+ (sc->offset % sizeof(void*)))
+ return NULL;
+
+ s->offset = sc->offset / sizeof(void *);
+ BUG_ON(s->offset > 65535);
+ s->objects = (PAGE_SIZE << sc->order) / s->size;
+ BUG_ON(s->objects > 65535);
+ atomic_long_set(&s->nr_slabs, 0);
+ s->nr_partial = 0;
+#ifdef CONFIG_SMP
+ s->flusher_active = 0;
+ INIT_WORK(&s->flush, &flusher, s);
+#endif
+ if (!s->objects)
+ return NULL;
+
+ INIT_LIST_HEAD(&s->partial);
+
+ atomic_set(&s->refcount, 1);
+ spin_lock_init(&s->list_lock);
+
+ for_each_possible_cpu(cpu)
+ s->active[cpu] = NULL;
+ return &s->sc;
+}
+
+/*
+ * Check if a given pointer is valid
+ */
+static int slab_pointer_valid(struct slab_cache *sc, const void *object)
+{
+ struct slab *s = (void *)sc;
+ struct page * page;
+ void *addr;
+
+ page = get_object_page(object);
+
+ if (!page || sc != page->slab)
+ return 0;
+
+ addr = page_address(page);
+ if (object < addr || object >= addr + s->objects * s->size)
+ return 0;
+
+ if ((object - addr) % s->size)
+ return 0;
+
+ return 1;
+}
+
+/*
+ * Determine the size of a slab object
+ */
+static unsigned long slab_object_size(struct slab_cache *sc,
+ const void *object)
+{
+ struct page *page;
+ struct slab_cache *s;
+
+
+ page = get_object_page(object);
+ if (page) {
+ s = page->slab;
+ BUG_ON(sc && s != sc);
+ if (s)
+ return sc->size;
+ }
+ BUG();
+ return 0; /* Satisfy compiler */
+}
+
+/*
+ * Move slab objects in a given slab by calling the move_objects function.
+ *
+ * Must be called with the slab lock held but will drop and reacquire the
+ * slab lock.
+ */
+static int move_slab_objects(struct slab *s, struct page *page,
+ int (*move_objects)(struct slab_cache *, void *))
+{
+ int unfreeable = 0;
+ void *addr = page_address(page);
+
+ while (page->inuse - unfreeable > 0) {
+ void *p;
+
+ for (p = addr; p < addr + s->objects * s->size; p += s->size) {
+ if (!on_freelist(s, page, p)) {
+ /*
+ * Drop the lock here to allow the
+ * move_object function to do things
+ * with the slab_cache and maybe this
+ * page.
+ */
+ slab_unlock(page);
+ local_irq_enable();
+ if (move_objects((struct slab_cache *)s, p))
+ slab_free(&s->sc, p);
+ else
+ unfreeable++;
+ local_irq_disable();
+ slab_lock(page);
+ }
+ }
+ }
+ return unfreeable;
+}
+
+/*
+ * Shrinking drops the active per cpu slabs and also reaps all empty
+ * slabs off the partial list. Returns the number of slabs freed.
+ *
+ * The move_object function will be called for each objects in partially
+ * allocated slabs. move_object() needs to perform a new allocation for
+ * the object and move the contents of the object to the new location.
+ *
+ * If move_object() returns 1 for success then the object is going to be
+ * removed. If 0 then the object cannot be freed at all. As a result the
+ * slab containing the object will also not be freeable.
+ *
+ * Returns the number of slabs freed.
+ */
+static int slab_shrink(struct slab_cache *sc,
+ int (*move_object)(struct slab_cache *, void *))
+{
+ struct slab *s = (void *)sc;
+ unsigned long flags;
+ int slabs_freed = 0;
+ int i;
+
+ drain_all(s);
+
+ local_irq_save(flags);
+ for (i = 0; s->nr_partial > 1 && i < s->nr_partial - 1; i++) {
+ struct page * page;
+
+ page = get_partial(s, -1);
+ if (!page)
+ break;
+
+ /*
+ * Pin page so that slab_free will not free even if we
+ * drop the slab lock.
+ */
+ __SetPageActive(page);
+
+ if (page->inuse < s->objects && move_object)
+ if (move_slab_objects(s,
+ page, move_object) == 0)
+ slabs_freed++;
+
+ /*
+ * This will put the slab on the front of the partial
+ * list, the used list or free it.
+ */
+ __ClearPageActive(page);
+ putback_slab(s, page);
+ }
+ local_irq_restore(flags);
+ return slabs_freed;
+
+}
+
+static struct slab_cache *slab_dup(struct slab_cache *sc)
+{
+ struct slab *s = (void *)sc;
+
+ atomic_inc(&s->refcount);
+ return &s->sc;
+}
+
+static int free_list(struct slab *s, struct list_head *list)
+{
+ int slabs_inuse = 0;
+ unsigned long flags;
+ struct page *page, *h;
+
+ spin_lock_irqsave(&s->list_lock, flags);
+ list_for_each_entry_safe(page, h, list, lru)
+ if (!page->inuse) {
+ list_del(&page->lru);
+ discard_slab(s, page);
+ } else
+ slabs_inuse++;
+ spin_unlock_irqrestore(&s->list_lock, flags);
+ return slabs_inuse;
+}
+
+static int slab_destroy(struct slab_cache *sc)
+{
+ struct slab *s = (void *)sc;
+
+ if (!atomic_dec_and_test(&s->refcount))
+ return 0;
+
+ drain_all(s);
+ free_list(s, &s->partial);
+
+ if (atomic_long_read(&s->nr_slabs))
+ return 1;
+
+ /* Just to make sure that no one uses this again */
+ s->size = 0;
+ return 0;
+}
+
+static unsigned long count_objects(struct slab *s, struct list_head *list)
+{
+ int count = 0;
+ struct page *page;
+ unsigned long flags;
+
+ spin_lock_irqsave(&s->list_lock, flags);
+ list_for_each_entry(page, list, lru)
+ count += page->inuse;
+ spin_unlock_irqrestore(&s->list_lock, flags);
+ return count;
+}
+
+static unsigned long slab_objects(struct slab_cache *sc,
+ unsigned long *p_total, unsigned long *p_active,
+ unsigned long *p_partial)
+{
+ struct slab *s = (void *)sc;
+ int partial = count_objects(s, &s->partial);
+ int nr_slabs = atomic_long_read(&s->nr_slabs);
+ int active = 0; /* Active slabs */
+ int nr_active = 0; /* Objects in active slabs */
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct page *page = s->active[cpu];
+
+ if (page) {
+ nr_active++;
+ active += page->inuse;
+ }
+ }
+
+ if (p_partial)
+ *p_partial = s->nr_partial;
+
+ if (p_active)
+ *p_active = nr_active;
+
+ if (p_total)
+ *p_total = nr_slabs;
+
+ return partial + active +
+ (nr_slabs - s->nr_partial - nr_active) * s->objects;
+}
+
+const struct slab_allocator slabifier_allocator = {
+ .name = "Slabifier",
+ .create = slab_create,
+ .alloc = slab_alloc,
+ .alloc_node = slab_alloc_node,
+ .free = slab_free,
+ .valid_pointer = slab_pointer_valid,
+ .object_size = slab_object_size,
+ .get_objects = slab_objects,
+ .shrink = slab_shrink,
+ .dup = slab_dup,
+ .destroy = slab_destroy,
+ .destructor = null_slab_allocator_destructor,
+};
+EXPORT_SYMBOL(slabifier_allocator);
Index: linux-2.6.18-rc5-mm1/mm/Makefile
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/Makefile 2006-09-01 14:10:50.404299038 -0700
+++ linux-2.6.18-rc5-mm1/mm/Makefile 2006-09-01 14:10:50.510737778 -0700
@@ -28,4 +28,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o
Index: linux-2.6.18-rc5-mm1/include/linux/mm.h
===================================================================
--- linux-2.6.18-rc5-mm1.orig/include/linux/mm.h 2006-09-01 10:13:35.890454083 -0700
+++ linux-2.6.18-rc5-mm1/include/linux/mm.h 2006-09-01 14:10:50.530267822 -0700
@@ -226,10 +226,16 @@ struct page {
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
atomic_t _count; /* Usage count, see below. */
- atomic_t _mapcount; /* Count of ptes mapped in mms,
+ union {
+ atomic_t _mapcount; /* Count of ptes mapped in mms,
* to show when page is mapped
* & limit reverse map searches.
*/
+ struct { /* Slabifier */
+ short unsigned int inuse;
+ short unsigned int offset;
+ };
+ };
union {
struct {
unsigned long private; /* Mapping-private opaque data:
@@ -250,8 +256,15 @@ struct page {
#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
spinlock_t ptl;
#endif
+ struct { /* Slabifier */
+ struct page *first_page; /* Compound pages */
+ struct slab_cache *slab; /* Pointer to slab */
+ };
+ };
+ union {
+ pgoff_t index; /* Our offset within mapping. */
+ void *freelist; /* Slabifier: free object */
};
- pgoff_t index; /* Our offset within mapping. */
struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
*/
Index: linux-2.6.18-rc5-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.18-rc5-mm1.orig/include/linux/page-flags.h 2006-09-01 10:13:36.121885132 -0700
+++ linux-2.6.18-rc5-mm1/include/linux/page-flags.h 2006-09-01 14:10:50.530267822 -0700
@@ -92,7 +92,7 @@
#define PG_buddy 19 /* Page is free, on buddy lists */
#define PG_readahead 20 /* Reminder to do readahead */
-
+#define PG_slabsingle 21 /* Slab with a single object */
#if (BITS_PER_LONG > 32)
/*
@@ -126,6 +126,8 @@
#define PageReferenced(page) test_bit(PG_referenced, &(page)->flags)
#define SetPageReferenced(page) set_bit(PG_referenced, &(page)->flags)
#define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags)
+#define __SetPageReferenced(page) __set_bit(PG_referenced, &(page)->flags)
+#define __ClearPageReferenced(page) __clear_bit(PG_referenced, &(page)->flags)
#define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)
#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags)
@@ -155,6 +157,7 @@
#define PageActive(page) test_bit(PG_active, &(page)->flags)
#define SetPageActive(page) set_bit(PG_active, &(page)->flags)
+#define __SetPageActive(page) __set_bit(PG_active, &(page)->flags)
#define ClearPageActive(page) clear_bit(PG_active, &(page)->flags)
#define __ClearPageActive(page) __clear_bit(PG_active, &(page)->flags)
@@ -254,6 +257,10 @@
#define SetPageReadahead(page) set_bit(PG_readahead, &(page)->flags)
#define TestClearPageReadahead(page) test_and_clear_bit(PG_readahead, &(page)->flags)
+#define PageSlabsingle(page) test_bit(PG_slabsingle, &(page)->flags)
+#define __SetPageSlabsingle(page) __set_bit(PG_slabsingle, &(page)->flags)
+#define __ClearPageSlabsingle(page) __clear_bit(PG_slabsingle, &(page)->flags)
+
struct page; /* forward declaration */
int test_clear_page_dirty(struct page *page);
* [MODSLAB 3/5] /proc/slabinfo display
2006-09-01 22:33 [MODSLAB 0/5] Modular slab allocator V3 Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 1/5] Generic Allocator Framework Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 2/5] Slabifier Christoph Lameter
@ 2006-09-01 22:34 ` Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 4/5] Kmalloc subsystem Christoph Lameter
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:34 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Andi Kleen, Dave Chinner,
Manfred Spraul
Generic Slab statistics module
A statistics module for the generic slab allocator framework.
The creator of a cache must register the slab cache with register_slab()
in order for it to show up in /proc/slabinfo.
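For illustration, a minimal sketch of what such a registration could look
like, modeled on the kmalloc patch later in this series (struct foo,
foo_control and foo_cache_init() are hypothetical names, not part of this
patch):

        #include <linux/allocator.h>
        #include <linux/slabstat.h>

        static struct slab_control foo_control;
        static struct slab_cache *foo_cache;

        void __init foo_cache_init(void)
        {
                struct slab_cache s;

                s.page_alloc = &page_allocator;
                s.slab_alloc = &slabifier_allocator;
                s.name = "foo_cache";
                s.size = s.objsize = s.inuse = sizeof(struct foo);
                s.align = sizeof(void *);
                s.offset = 0;
                s.order = 0;
                s.node = -1;

                foo_cache = slabifier_allocator.create(&foo_control, &s);
                if (!foo_cache)
                        panic("Cannot create foo_cache\n");

                /* Without this call the cache never appears in /proc/slabinfo */
                register_slab(foo_cache);
        }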
Here is a sample of slabinfo output:
slabinfo - version: 3.0
# name <objects> <objsize> <num_slabs> <partial_slabs> <active_slabs> <order> <allocator>
nfs_direct_cache 0 136 0 0 0 0 reclaimable:page_allocator
nfs_write_data 36 896 2 0 0 0 unreclaimable:page_allocator
nfs_read_data 21 768 2 1 0 0 unreclaimable:page_allocator
nfs_inode_cache 17 1032 3 3 0 0 ctor_dtor:reclaimable:page_allocator
rpc_tasks 0 384 1 1 0 0 unreclaimable:page_allocator
rpc_inode_cache 0 896 1 1 0 0 ctor_dtor:reclaimable:page_allocator
ip6_dst_cache 0 384 1 1 0 0 unreclaimable:page_allocator
TCPv6 1 1792 2 2 0 0 unreclaimable:page_allocator
UNIX 112 768 11 7 0 0 unreclaimable:page_allocator
dm_tio 0 24 0 0 0 0 unreclaimable:page_allocator
dm_io 0 40 0 0 0 0 unreclaimable:page_allocator
kmalloc 0 64 0 0 0 0 dma:unreclaimable:page_allocator
cfq_ioc_pool 0 160 0 0 0 0 unreclaimable:page_allocator
cfq_pool 0 160 0 0 0 0 unreclaimable:page_allocator
mqueue_inode_cache 0 896 1 1 0 0 ctor_dtor:unreclaimable:page_allocator
xfs_chashlist 822 40 9 9 0 0 unreclaimable:page_allocator
xfs_ili 183 192 9 8 1 0 unreclaimable:page_allocator
xfs_inode 7890 640 319 6 1 0 reclaimable:page_allocator
xfs_efi_item 0 352 0 0 0 0 unreclaimable:page_allocator
xfs_efd_item 0 360 0 0 0 0 unreclaimable:page_allocator
xfs_buf_item 4 184 1 0 1 0 unreclaimable:page_allocator
xfs_acl 0 304 0 0 0 0 unreclaimable:page_allocator
xfs_dabuf 0 24 1 0 1 0 unreclaimable:page_allocator
xfs_da_state 0 488 0 0 0 0 unreclaimable:page_allocator
xfs_trans 1 832 1 0 1 0 unreclaimable:page_allocator
xfs_btree_cur 0 192 1 0 1 0 unreclaimable:page_allocator
xfs_bmap_free_item 0 24 0 0 0 0 unreclaimable:page_allocator
xfs_ioend 128 160 2 0 1 0 unreclaimable:page_allocator
xfs_vnode 7891 768 380 5 1 0 ctor_dtor:reclaimable:page_allocator
isofs_inode_cache 0 656 0 0 0 0 ctor_dtor:reclaimable:page_allocator
fat_inode_cache 0 688 1 1 0 0 ctor_dtor:reclaimable:page_allocator
fat_cache 0 40 0 0 0 0 ctor_dtor:reclaimable:page_allocator
hugetlbfs_inode_cache 0 624 1 1 0 0 ctor_dtor:unreclaimable:page_allocator
ext2_inode_cache 0 776 0 0 0 0 ctor_dtor:reclaimable:page_allocator
ext2_xattr 0 88 0 0 0 0 reclaimable:page_allocator
journal_handle 0 24 0 0 0 0 unreclaimable:page_allocator
journal_head 0 96 0 0 0 0 unreclaimable:page_allocator
ext3_inode_cache 0 824 0 0 0 0 ctor_dtor:reclaimable:page_allocator
ext3_xattr 0 88 0 0 0 0 reclaimable:page_allocator
reiser_inode_cache 0 736 0 0 0 0 ctor_dtor:reclaimable:page_allocator
dnotify_cache 0 40 0 0 0 0 unreclaimable:page_allocator
dquot 0 256 0 0 0 0 reclaimable:page_allocator
eventpoll_pwq 0 72 1 1 0 0 unreclaimable:page_allocator
inotify_event_cache 0 40 0 0 0 0 unreclaimable:page_allocator
inotify_watch_cache 0 72 1 1 0 0 unreclaimable:page_allocator
kioctx 0 384 0 0 0 0 unreclaimable:page_allocator
fasync_cache 0 24 0 0 0 0 unreclaimable:page_allocator
shmem_inode_cache 794 816 45 12 0 0 ctor_dtor:unreclaimable:page_allocator
posix_timers_cache 0 136 0 0 0 0 unreclaimable:page_allocator
partial_page_cache 0 48 0 0 0 0 unreclaimable:page_allocator
xfrm_dst_cache 0 384 0 0 0 0 unreclaimable:page_allocator
ip_dst_cache 21 384 2 2 0 0 unreclaimable:page_allocator
RAW 0 896 1 1 0 0 unreclaimable:page_allocator
UDP 3 896 3 2 1 0 unreclaimable:page_allocator
TCP 12 1664 4 4 0 0 unreclaimable:page_allocator
scsi_io_context 0 112 0 0 0 0 unreclaimable:page_allocator
blkdev_ioc 26 56 7 7 0 0 unreclaimable:page_allocator
blkdev_queue 24 1616 4 2 0 0 unreclaimable:page_allocator
blkdev_requests 12 280 2 0 2 0 unreclaimable:page_allocator
sock_inode_cache 167 768 12 6 1 0 ctor_dtor:reclaimable:page_allocator
file_lock_cache 1 184 2 2 0 0 ctor_dtor:unreclaimable:page_allocator
Acpi-Parse 0 40 0 0 0 0 unreclaimable:page_allocator
Acpi-State 0 80 0 0 0 0 unreclaimable:page_allocator
proc_inode_cache 696 640 36 16 1 0 ctor_dtor:reclaimable:page_allocator
sigqueue 0 160 4 0 4 0 unreclaimable:page_allocator
radix_tree_node 2068 560 75 5 0 0 ctor_dtor:unreclaimable:page_allocator
bdev_cache 42 896 5 4 0 0 ctor_dtor:reclaimable:page_allocator
sysfs_dir_cache 4283 80 24 4 0 0 unreclaimable:page_allocator
inode_cache 2571 608 103 8 1 0 ctor_dtor:reclaimable:page_allocator
dentry_cache 13014 200 166 7 3 0 reclaimable:page_allocator
idr_layer_cache 76 536 4 2 0 0 ctor_dtor:unreclaimable:page_allocator
buffer_head 4417 104 33 9 0 0 ctor_dtor:reclaimable:page_allocator
vm_area_struct 1503 176 24 19 3 0 unreclaimable:page_allocator
files_cache 47 768 7 6 1 0 unreclaimable:page_allocator
signal_cache 136 640 10 6 1 0 unreclaimable:page_allocator
sighand_cache 136 1664 19 6 1 0 ctor_dtor:rcu:unreclaimable:page_allocator
anon_vma 264 32 9 8 1 0 ctor_dtor:rcu:unreclaimable:page_allocator
shared_policy_node 0 48 0 0 0 0 unreclaimable:page_allocator
numa_policy 85 264 4 3 0 0 unreclaimable:page_allocator
kmalloc 0 262144 0 0 0 4 unreclaimable:page_allocator
kmalloc 2 131072 2 0 0 3 unreclaimable:page_allocator
kmalloc 1 65536 1 0 0 2 unreclaimable:page_allocator
kmalloc 10 32768 10 0 0 1 unreclaimable:page_allocator
kmalloc 93 16384 93 0 0 0 unreclaimable:page_allocator
kmalloc 98 8192 49 0 0 0 unreclaimable:page_allocator
kmalloc 99 4096 31 8 4 0 unreclaimable:page_allocator
kmalloc 345 2048 47 14 2 0 unreclaimable:page_allocator
kmalloc 228 1024 21 12 2 0 unreclaimable:page_allocator
kmalloc 183 512 14 9 3 0 unreclaimable:page_allocator
kmalloc 3892 256 78 31 3 0 unreclaimable:page_allocator
kmalloc 1244 128 18 9 4 0 unreclaimable:page_allocator
kmalloc 1619 64 12 8 1 0 unreclaimable:page_allocator
kmalloc 121 32 8 5 3 0 unreclaimable:page_allocator
kmalloc 1644 16 5 4 0 0 unreclaimable:page_allocator
kmalloc 128 8 4 4 0 0 unreclaimable:page_allocator
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc5-mm1/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.18-rc5-mm1.orig/fs/proc/proc_misc.c 2006-09-01 10:13:26.836324579 -0700
+++ linux-2.6.18-rc5-mm1/fs/proc/proc_misc.c 2006-09-01 11:48:20.608777567 -0700
@@ -399,9 +399,11 @@ static struct file_operations proc_modul
};
#endif
-#ifdef CONFIG_SLAB
+#if defined(CONFIG_SLAB) || defined(CONFIG_MODULAR_SLAB)
extern struct seq_operations slabinfo_op;
+#ifdef CONFIG_SLAB
extern ssize_t slabinfo_write(struct file *, const char __user *, size_t, loff_t *);
+#endif
static int slabinfo_open(struct inode *inode, struct file *file)
{
return seq_open(file, &slabinfo_op);
@@ -409,12 +411,14 @@ static int slabinfo_open(struct inode *i
static struct file_operations proc_slabinfo_operations = {
.open = slabinfo_open,
.read = seq_read,
+#ifdef CONFIG_SLAB
.write = slabinfo_write,
+#endif
.llseek = seq_lseek,
.release = seq_release,
};
-#ifdef CONFIG_DEBUG_SLAB_LEAK
+#if defined(CONFIG_DEBUG_SLAB_LEAK) && defined(CONFIG_SLAB)
extern struct seq_operations slabstats_op;
static int slabstats_open(struct inode *inode, struct file *file)
{
@@ -787,9 +791,9 @@ void __init proc_misc_init(void)
#endif
create_seq_entry("stat", 0, &proc_stat_operations);
create_seq_entry("interrupts", 0, &proc_interrupts_operations);
-#ifdef CONFIG_SLAB
+#if defined(CONFIG_SLAB) || defined(CONFIG_MODULAR_SLAB)
create_seq_entry("slabinfo",S_IWUSR|S_IRUGO,&proc_slabinfo_operations);
-#ifdef CONFIG_DEBUG_SLAB_LEAK
+#if defined(CONFIG_DEBUG_SLAB_LEAK) && defined(CONFIG_SLAB)
create_seq_entry("slab_allocators", 0 ,&proc_slabstats_operations);
#endif
#endif
Index: linux-2.6.18-rc5-mm1/mm/Makefile
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/Makefile 2006-09-01 11:48:08.244307287 -0700
+++ linux-2.6.18-rc5-mm1/mm/Makefile 2006-09-01 11:48:20.608777567 -0700
@@ -28,4 +28,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o
Index: linux-2.6.18-rc5-mm1/mm/slabstat.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/mm/slabstat.c 2006-09-01 11:48:20.609754070 -0700
@@ -0,0 +1,96 @@
+/*
+ * linux/mm/slabstat.c
+ */
+
+#include <linux/mm.h>
+#include <linux/seq_file.h>
+
+static DECLARE_RWSEM(slabstat_sem);
+
+LIST_HEAD(slab_caches);
+
+void register_slab(struct slab_cache *s)
+{
+ down_write(&slabstat_sem);
+ list_add(&s->list, &slab_caches);
+ up_write(&slabstat_sem);
+}
+
+void unregister_slab(struct slab_cache *s)
+{
+ down_write(&slabstat_sem);
+ list_del(&s->list);
+ up_write(&slabstat_sem);
+}
+
+static void print_slabinfo_header(struct seq_file *m)
+{
+ /*
+ * Output format version, so at least we can change it
+ * without _too_ many complaints.
+ */
+ seq_puts(m, "slabinfo - version: 3.0\n");
+ seq_puts(m, "# name <objects> <objsize> <num_slabs> "
+ "<partial_slabs> <active_slabs> <order> <allocator>");
+ seq_putc(m, '\n');
+}
+
+static void *s_start(struct seq_file *m, loff_t *pos)
+{
+ loff_t n = *pos;
+ struct list_head *p;
+
+ down_read(&slabstat_sem);
+ if (!n)
+ print_slabinfo_header(m);
+ p = slab_caches.next;
+ while (n--) {
+ p = p->next;
+ if (p == &slab_caches)
+ return NULL;
+ }
+ return list_entry(p, struct slab_cache, list);
+}
+
+static void *s_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ struct slab_cache *s = p;
+ ++*pos;
+ return s->list.next == &slab_caches ?
+ NULL : list_entry(s->list.next, struct slab_cache, list);
+}
+
+static void s_stop(struct seq_file *m, void *p)
+{
+ up_read(&slabstat_sem);
+}
+
+static int s_show(struct seq_file *m, void *p)
+{
+ struct slab_cache *s = p;
+ unsigned long total_slabs;
+ unsigned long active_slabs;
+ unsigned long partial_slabs;
+ unsigned long objects;
+
+ objects = s->slab_alloc->get_objects(s, &total_slabs,
+ &active_slabs, &partial_slabs);
+
+ seq_printf(m, "%-21s %7lu %7u %7lu %7lu %7lu %2d %s",
+ s->name, objects, s->size, total_slabs, partial_slabs,
+ active_slabs, s->order, s->page_alloc->name);
+
+ seq_putc(m, '\n');
+ return 0;
+}
+
+/*
+ * slabinfo_op - iterator that generates /proc/slabinfo
+ */
+struct seq_operations slabinfo_op = {
+ .start = s_start,
+ .next = s_next,
+ .stop = s_stop,
+ .show = s_show,
+};
+
Index: linux-2.6.18-rc5-mm1/include/linux/slabstat.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/include/linux/slabstat.h 2006-09-01 11:48:20.611707074 -0700
@@ -0,0 +1,9 @@
+#ifndef _LINUX_SLABSTAT_H
+#define _LINUX_SLABSTAT_H
+#include <linux/allocator.h>
+
+void register_slab(struct slab_cache *s);
+void unregister_slab(struct slab_cache *s);
+
+#endif /* _LINUX_SLABSTAT_H */
+
* [MODSLAB 4/5] Kmalloc subsystem
2006-09-01 22:33 [MODSLAB 0/5] Modular slab allocator V3 Christoph Lameter
` (2 preceding siblings ...)
2006-09-01 22:34 ` [MODSLAB 3/5] /proc/slabinfo display Christoph Lameter
@ 2006-09-01 22:34 ` Christoph Lameter
2006-09-01 22:34 ` [MODSLAB 5/5] Slabulator: Emulate the existing Slab Layer Christoph Lameter
2006-09-01 22:34 ` [MODSLAB] Bypass indirections [for performance testing only] Christoph Lameter
5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:34 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Manfred Spraul, Dave Chinner,
Andi Kleen
A generic kmalloc layer for the modular slab allocator.
Regular kmalloc allocations are optimized; DMA kmalloc slabs are
created on demand.
The kmalloc array is also re-exported as a new slab_allocator that
other code can tie into (the slabulator uses this to avoid creating
new slab caches for sizes that the generic kmalloc caches already
serve).
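As a rough usage sketch of the optimized path (struct foo is just a
stand-in), a compile-time constant size lets the compiler reduce kmalloc()
to a direct lookup in the kmalloc array:

        struct foo *f = kmalloc(sizeof(struct foo), GFP_KERNEL);

        /*
         * With a constant size this expands to roughly
         *
         *      KMALLOC_ALLOCATOR.alloc(
         *              &kmalloc_caches[kmalloc_index(sizeof(struct foo))
         *                              - KMALLOC_SHIFT_LOW].sc, GFP_KERNEL);
         *
         * e.g. a 100 byte object is served from the 128 byte cache (or the
         * 96 byte extra cache when KMALLOC_EXTRA is defined). A variable
         * size or a __GFP_DMA request falls back to __kmalloc() instead.
         */

        kfree(f);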
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc5-mm1/include/linux/kmalloc.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/include/linux/kmalloc.h 2006-09-01 11:54:43.745232343 -0700
@@ -0,0 +1,136 @@
+#ifndef _LINUX_KMALLOC_H
+#define _LINUX_KMALLOC_H
+/*
+ * In kernel dynamic memory allocator.
+ *
+ * (C) 2006 Silicon Graphics, Inc,
+ * Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/allocator.h>
+#include <linux/config.h>
+#include <linux/types.h>
+
+#ifndef KMALLOC_ALLOCATOR
+#define KMALLOC_ALLOCATOR slabifier_allocator
+#endif
+
+#define KMALLOC_SHIFT_LOW 3
+
+#define KMALLOC_SHIFT_HIGH 18
+
+#if L1_CACHE_BYTES <= 64
+#define KMALLOC_EXTRAS 2
+#define KMALLOC_EXTRA
+#else
+#define KMALLOC_EXTRAS 0
+#endif
+
+#define KMALLOC_NR_CACHES (KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW \
+ + 1 + KMALLOC_EXTRAS)
+/*
+ * We keep the general caches in an array of slab caches that are used for
+ * 2^x bytes of allocations. For each size we generate a DMA and a
+ * non DMA cache (DMA simply means memory for legacy I/O. The regular
+ * caches can be used for devices that can DMA to all of memory).
+ */
+extern struct slab_control kmalloc_caches[KMALLOC_NR_CACHES];
+
+/*
+ * Sorry that the following has to be that ugly but GCC has trouble
+ * with constant propagation and loops.
+ */
+static inline int kmalloc_index(int size)
+{
+ if (size <= 8) return 3;
+ if (size <= 16) return 4;
+ if (size <= 32) return 5;
+ if (size <= 64) return 6;
+#ifdef KMALLOC_EXTRA
+ if (size <= 96) return KMALLOC_SHIFT_HIGH + 1;
+#endif
+ if (size <= 128) return 7;
+#ifdef KMALLOC_EXTRA
+ if (size <= 192) return KMALLOC_SHIFT_HIGH + 2;
+#endif
+ if (size <= 256) return 8;
+ if (size <= 512) return 9;
+ if (size <= 1024) return 10;
+ if (size <= 2048) return 11;
+ if (size <= 4096) return 12;
+ if (size <= 8 * 1024) return 13;
+ if (size <= 16 * 1024) return 14;
+ if (size <= 32 * 1024) return 15;
+ if (size <= 64 * 1024) return 16;
+ if (size <= 128 * 1024) return 17;
+ if (size <= 256 * 1024) return 18;
+ return -1;
+}
+
+/*
+ * Find the slab cache for a given combination of allocation flags and size.
+ *
+ * This ought to end up with a global pointer to the right cache
+ * in kmalloc_caches.
+ */
+static inline struct slab_cache *kmalloc_slab(size_t size)
+{
+ int index = kmalloc_index(size) - KMALLOC_SHIFT_LOW;
+
+ if (index < 0) {
+ /*
+ * Generate a link failure. Would be great if we could
+ * do something to stop the compile here.
+ */
+ extern void __kmalloc_size_too_large(void);
+ __kmalloc_size_too_large();
+ }
+ return &kmalloc_caches[index].sc;
+}
+
+extern void *__kmalloc(size_t, gfp_t);
+#define ____kmalloc __kmalloc
+
+static inline void *kmalloc(size_t size, gfp_t flags)
+{
+ if (__builtin_constant_p(size) && !(flags & __GFP_DMA)) {
+ struct slab_cache *s = kmalloc_slab(size);
+
+ return KMALLOC_ALLOCATOR.alloc(s, flags);
+ } else
+ return __kmalloc(size, flags);
+}
+
+#ifdef CONFIG_NUMA
+extern void *__kmalloc_node(size_t, gfp_t, int);
+static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
+{
+ if (__builtin_constant_p(size) && !(flags & __GFP_DMA)) {
+ struct slab_cache *s = kmalloc_slab(size);
+
+ return KMALLOC_ALLOCATOR.alloc_node(s, flags, node);
+ } else
+ return __kmalloc_node(size, flags, node);
+}
+#else
+#define kmalloc_node(__size, __flags, __node) kmalloc((__size), (__flags))
+#endif
+
+/* Free an object */
+static inline void kfree(const void *x)
+{
+ return KMALLOC_ALLOCATOR.free(NULL, x);
+}
+
+/* Allocate and zero the specified number of bytes */
+extern void *kzalloc(size_t, gfp_t);
+
+/* Figure out what size the chunk is */
+extern size_t ksize(const void *);
+
+extern struct page_allocator *reclaimable_allocator;
+extern struct page_allocator *unreclaimable_allocator;
+
+extern int slab_min_order;
+
+#endif /* _LINUX_KMALLOC_H */
Index: linux-2.6.18-rc5-mm1/mm/kmalloc.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/mm/kmalloc.c 2006-09-01 11:56:08.877659139 -0700
@@ -0,0 +1,226 @@
+/*
+ * Create generic slab caches for memory allocation.
+ *
+ * (C) 2006 Silicon Graphics. Inc. Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/allocator.h>
+#include <linux/module.h>
+#include <linux/kmalloc.h>
+#include <linux/slabstat.h>
+
+#ifndef ARCH_KMALLOC_MINALIGN
+#define ARCH_KMALLOC_MINALIGN sizeof(void *)
+#endif
+
+struct slab_control kmalloc_caches[KMALLOC_NR_CACHES] __cacheline_aligned;
+EXPORT_SYMBOL(kmalloc_caches);
+
+static struct page_allocator *dma_allocator;
+struct page_allocator *reclaimable_allocator;
+struct page_allocator *unreclaimable_allocator;
+
+/*
+ * Minimum order of slab pages. This influences locking overhead and slab
+ * fragmentation. A higher order reduces the number of partial slabs
+ * and increases the number of allocations possible without having to
+ * take the list_lock.
+ */
+int slab_min_order = 0;
+EXPORT_SYMBOL(slab_min_order);
+
+static struct slab_cache *kmalloc_caches_dma[KMALLOC_NR_CACHES];
+
+static int __init setup_slab_min_order(char *str)
+{
+ get_option (&str, &slab_min_order);
+
+ return 1;
+}
+
+__setup("slab_min_order=", setup_slab_min_order);
+
+/*
+ * Given a slab size find the correct order to use.
+ * We only support powers of two so there is really
+ * no need for anything special. Objects will always
+ * fit exactly into the slabs with no overhead.
+ */
+static int order(size_t size)
+{
+ unsigned long base_size = PAGE_SIZE << slab_min_order;
+
+ if (size >= base_size)
+ /* One object per slab */
+ return fls(size -1) - PAGE_SHIFT;
+
+ return slab_min_order;
+}
+
+static struct slab_cache *create_kmalloc_cache(struct slab_control *x,
+ const char *name,
+ const struct page_allocator *p,
+ int size)
+{
+ struct slab_cache s;
+ struct slab_cache *rs;
+
+ s.page_alloc = p;
+ s.slab_alloc = &KMALLOC_ALLOCATOR;
+ s.size = size;
+ s.align = ARCH_KMALLOC_MINALIGN;
+ s.offset = 0;
+ s.objsize = size;
+ s.inuse = size;
+ s.node = -1;
+ s.order = order(size);
+ s.name = "kmalloc";
+ rs = KMALLOC_ALLOCATOR.create(x, &s);
+ if (!rs)
+ panic("Creation of kmalloc slab %s size=%d failed.\n",
+ name, size);
+ register_slab(rs);
+ return rs;
+}
+
+static struct slab_cache *get_slab(size_t size, gfp_t flags)
+{
+ int index = kmalloc_index(size) - KMALLOC_SHIFT_LOW;
+ struct slab_cache *s;
+ struct slab_control *x;
+ size_t realsize;
+
+ BUG_ON(index < 0);
+
+ if (!(flags & __GFP_DMA))
+ return &kmalloc_caches[index].sc;
+
+ s = kmalloc_caches_dma[index];
+ if (s)
+ return s;
+
+ /* Dynamically create dma cache */
+ x = kmalloc(sizeof(struct slab_control), flags & ~(__GFP_DMA));
+
+ if (!x)
+ panic("Unable to allocate memory for dma cache\n");
+
+#ifdef KMALLOC_EXTRA
+ if (index <= KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW)
+#endif
+ realsize = 1 << (index + KMALLOC_SHIFT_LOW);
+#ifdef KMALLOC_EXTRA
+ else if (index == KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1)
+ realsize = 96;
+ else
+ realsize = 192;
+#endif
+
+ s = create_kmalloc_cache(x, "kmalloc_dma", dma_allocator, realsize);
+ kmalloc_caches_dma[index] = s;
+ return s;
+}
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+ return KMALLOC_ALLOCATOR.alloc(get_slab(size, flags), flags);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+#ifdef CONFIG_NUMA
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+ return KMALLOC_ALLOCATOR.alloc_node(get_slab(size, flags),
+ flags, node);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+#endif
+
+void *kzalloc(size_t size, gfp_t flags)
+{
+ void *x = __kmalloc(size, flags);
+
+ if (x)
+ memset(x, 0, size);
+ return x;
+}
+EXPORT_SYMBOL(kzalloc);
+
+size_t ksize(const void *object)
+{
+ return KMALLOC_ALLOCATOR.object_size(NULL, object);
+};
+EXPORT_SYMBOL(ksize);
+
+/*
+ * Provide the kmalloc array as regular slab allocator for the
+ * generic allocator framework.
+ */
+struct slab_allocator kmalloc_slab_allocator;
+
+static struct slab_cache *kmalloc_create(struct slab_control *x,
+ const struct slab_cache *s)
+{
+ struct slab_cache *km;
+
+ int index = max(0, fls(s->size - 1) - KMALLOC_SHIFT_LOW);
+
+ if (index > KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1
+ || s->offset)
+ return NULL;
+
+ km = &kmalloc_caches[index].sc;
+
+ BUG_ON(s->size > km->size);
+
+ return KMALLOC_ALLOCATOR.dup(km);
+}
+
+static void null_destructor(struct page_allocator *x) {}
+
+void __init kmalloc_init(void)
+{
+ int i;
+
+ for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
+ create_kmalloc_cache(
+ &kmalloc_caches[i - KMALLOC_SHIFT_LOW],
+ "kmalloc", &page_allocator, 1 << i);
+ }
+#ifdef KMALLOC_EXTRA
+ /* Non-power of two caches */
+ create_kmalloc_cache(&kmalloc_caches
+ [KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1], "kmalloc",
+ &page_allocator, 96);
+ create_kmalloc_cache(&kmalloc_caches
+ [KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 2], "kmalloc",
+ &page_allocator, 192);
+#endif
+
+ /*
+ * The above must be done first. Deriving a page allocator requires
+ * a working (normal) kmalloc array.
+ */
+ unreclaimable_allocator = unreclaimable_slab(&page_allocator);
+ unreclaimable_allocator->destructor = null_destructor;
+
+ /*
+ * Fix up the initial arrays. Because of the preceding uses
+ * we likely have consumed a couple of pages that we cannot account
+ * for.
+ */
+ for(i = 0; i < KMALLOC_NR_CACHES; i++)
+ kmalloc_caches[i].sc.page_alloc = unreclaimable_allocator;
+
+ reclaimable_allocator = reclaimable_slab(&page_allocator);
+ reclaimable_allocator->destructor = null_destructor;
+ dma_allocator = dmaify_page_allocator(unreclaimable_allocator);
+
+ /* And deal with the kmalloc_cache_allocator */
+ memcpy(&kmalloc_slab_allocator, &KMALLOC_ALLOCATOR,
+ sizeof(struct slab_allocator));
+ kmalloc_slab_allocator.create = kmalloc_create;
+ kmalloc_slab_allocator.destructor = null_slab_allocator_destructor;
+}
+
Index: linux-2.6.18-rc5-mm1/mm/Makefile
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/Makefile 2006-09-01 11:54:04.735927776 -0700
+++ linux-2.6.18-rc5-mm1/mm/Makefile 2006-09-01 11:54:06.508279026 -0700
@@ -28,4 +28,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o kmalloc.o
Index: linux-2.6.18-rc5-mm1/include/asm-i386/page.h
===================================================================
--- linux-2.6.18-rc5-mm1.orig/include/asm-i386/page.h 2006-08-27 20:41:48.000000000 -0700
+++ linux-2.6.18-rc5-mm1/include/asm-i386/page.h 2006-09-01 11:54:06.508279026 -0700
@@ -37,6 +37,7 @@
#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vaddr)
#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
+#define ARCH_NEEDS_SMALL_SLABS
/*
* These are used to make use of C type-checking..
* [MODSLAB 5/5] Slabulator: Emulate the existing Slab Layer
2006-09-01 22:33 [MODSLAB 0/5] Modular slab allocator V3 Christoph Lameter
` (3 preceding siblings ...)
2006-09-01 22:34 ` [MODSLAB 4/5] Kmalloc subsystem Christoph Lameter
@ 2006-09-01 22:34 ` Christoph Lameter
2006-09-01 22:34 ` [MODSLAB] Bypass indirections [for performance testing only] Christoph Lameter
5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:34 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Andi Kleen, Dave Chinner,
Manfred Spraul
The slab emulation layer.
This provides a layer that implements the existing slab API.
We try to keep the definitions copied from slab.h to an absolute
minimum. If things break, more (otherwise useless) definitions from
slab.h may have to be carried over.
A hook in slab.h redirects includes of slab.h to slabulator.h when
CONFIG_MODULAR_SLAB is enabled.
The slabulator also contains the remnants of the slab reaper, since the
page allocator still relies on it in the CONFIG_NUMA case. The slabifier
itself no longer needs a reaper because it is not object cache based.
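For illustration, a typical existing slab user should compile unchanged
once slab.h redirects to slabulator.h; the names below (struct foo,
foo_cache, foo_init, foo_do_something) are hypothetical:

        #include <linux/slab.h>

        static kmem_cache_t *foo_cache;

        void __init foo_init(void)
        {
                foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                              0, SLAB_HWCACHE_ALIGN,
                                              NULL, NULL);
        }

        void foo_do_something(void)
        {
                struct foo *f = kmem_cache_alloc(foo_cache, GFP_KERNEL);

                if (!f)
                        return;
                /* ... use f ... */
                kmem_cache_free(foo_cache, f);
        }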
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.18-rc5-mm1/mm/slabulator.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/mm/slabulator.c 2006-09-01 14:10:50.808570950 -0700
@@ -0,0 +1,299 @@
+/*
+ * Slabulator = Emulate the Slab API.
+ *
+ * (C) 2006 Silicon Graphics, Inc. Christoph Lameter <clameter@sgi.com>
+ *
+ */
+#include <linux/mm.h>
+#include <linux/kmalloc.h>
+#include <linux/module.h>
+#include <linux/allocator.h>
+#include <linux/bitops.h>
+#include <linux/slabulator.h>
+#include <linux/slabstat.h>
+
+#define SLAB_MAX_ORDER 4
+
+#define SLABULATOR_MERGE
+
+#ifndef ARCH_SLAB_MINALIGN
+#define ARCH_SLAB_MINALIGN sizeof(void *)
+#endif
+
+static int calculate_order(int size)
+{
+ int order;
+ int rem;
+
+ if ((size & (size -1)) == 0) {
+ /*
+ * We can use the page allocator if the requested size
+ * is compatible with the page sizes supported.
+ */
+ int order = fls(size) -1 - PAGE_SHIFT;
+
+ if (order >= 0)
+ return 0;
+ }
+
+ for(order = max(slab_min_order, fls(size - 1) - PAGE_SHIFT);
+ order < MAX_ORDER; order++) {
+ unsigned long slab_size = PAGE_SIZE << order;
+
+ if (slab_size < size)
+ continue;
+
+ rem = slab_size % size;
+
+ if (rem * 8 <= PAGE_SIZE << order)
+ break;
+
+ }
+ if (order >= MAX_ORDER)
+ return -E2BIG;
+ return order;
+}
+
+/*
+ * We can actually operate slabs any time after the page allocator is up.
+ * slab_is_available() merely means that the kmalloc array is available.
+ *
+ * However, be aware that deriving allocators depends on kmalloc being
+ * functional.
+ */
+int slabulator_up = 0;
+
+int slab_is_available(void) {
+ return slabulator_up;
+}
+
+void kmem_cache_init(void)
+{
+ extern void kmalloc_init(void);
+
+ kmalloc_init();
+ slabulator_up = 1;
+}
+
+struct slab_cache *kmem_cache_create(const char *name, size_t size,
+ size_t align, unsigned long flags,
+ void (*ctor)(void *, struct slab_cache *, unsigned long),
+ void (*dtor)(void *, struct slab_cache *, unsigned long))
+{
+ const struct page_allocator *a;
+ struct slab_cache s;
+ struct slab_cache *rs;
+ struct slab_control *x;
+ int one_object_slab;
+
+ s.offset = 0;
+ s.align = max(ARCH_SLAB_MINALIGN, ALIGN(align, sizeof(void *)));
+
+ if (flags & (SLAB_MUST_HWCACHE_ALIGN|SLAB_HWCACHE_ALIGN))
+ s.align = L1_CACHE_BYTES;
+
+ s.inuse = size;
+ s.objsize = size;
+ s.size = ALIGN(size, s.align);
+
+ /* Pick the right allocator for our purposes */
+ if (flags & SLAB_RECLAIM_ACCOUNT)
+ a = reclaimable_allocator;
+ else
+ a = unreclaimable_allocator;
+
+ if (flags & SLAB_CACHE_DMA)
+ a = dmaify_page_allocator(a);
+
+ if (flags & SLAB_DESTROY_BY_RCU)
+ a = rcuify_page_allocator(a);
+
+ one_object_slab = s.size > ((PAGE_SIZE / 2) << calculate_order(s.size));
+
+ if (!one_object_slab && ((flags & SLAB_DESTROY_BY_RCU) || ctor || dtor)) {
+ /*
+ * For RCU processing and constructors / destructors:
+ * The object must remain intact even if it is free.
+ * The free pointer would hurt us there.
+ * Relocate the free object pointer out of
+ * the space used by the object.
+ *
+ * Slabs with a single object do not need this since
+ * those do not have to deal with free pointers.
+ */
+ s.offset = s.size - sizeof(void *);
+ if (s.offset < s.objsize) {
+ /*
+ * Would overlap the object. We need to waste some
+ * more space to make the object safe from the
+ * free pointer.
+ */
+ s.offset = s.size;
+ s.size += s.align;
+ }
+ s.inuse = s.size;
+ }
+
+ s.order = calculate_order(s.size);
+
+ if (s.order < 0)
+ goto error;
+
+ s.name = name;
+ s.node = -1;
+
+ x = kmalloc(sizeof(struct slab_control), GFP_KERNEL);
+
+ if (!x)
+ return NULL;
+ s.page_alloc = a;
+ s.slab_alloc = &SLABULATOR_ALLOCATOR;
+#ifdef SLABULATOR_MERGE
+ /*
+ * This works but is this really something we want?
+ */
+ if (((s.size & (s.size - 1))==0) && !ctor && !dtor &&
+ !(flags & (SLAB_DESTROY_BY_RCU|SLAB_RECLAIM_ACCOUNT))) {
+
+ printk(KERN_INFO "Merging slab_cache %s size %d into"
+ " kmalloc array\n", name, s.size);
+ rs = kmalloc_slab_allocator.create(x, &s);
+ kfree(x);
+ x = NULL;
+ } else
+#endif
+ rs = SLABULATOR_ALLOCATOR.create(x, &s);
+ if (!rs)
+ goto error;
+
+ /*
+ * Now deal with constructors and destructors. We need to know the
+ * slab_cache address in order to be able to pass the slab_cache
+ * address down the chain.
+ */
+ if (ctor || dtor)
+ rs->page_alloc =
+ ctor_and_dtor_for_page_allocator(rs->page_alloc,
+ rs->size, rs,
+ (void *)ctor, (void *)dtor);
+
+ if (x)
+ register_slab(rs);
+ return rs;
+
+error:
+ a->destructor((struct page_allocator *)a);
+ if (flags & SLAB_PANIC)
+ panic("Cannot create slab %s size=%d realsize=%d "
+ "order=%d offset=%d flags=%lx\n",
+ s.name, size, s.size, s.order, s.offset, flags);
+
+
+ return NULL;
+}
+EXPORT_SYMBOL(kmem_cache_create);
+
+int kmem_cache_destroy(struct slab_cache *s)
+{
+ SLABULATOR_ALLOCATOR.destroy(s);
+ unregister_slab(s);
+ kfree(s);
+ return 0;
+}
+EXPORT_SYMBOL(kmem_cache_destroy);
+
+void *kmem_cache_zalloc(struct slab_cache *s, gfp_t flags)
+{
+ void *x;
+
+ x = kmem_cache_alloc(s, flags);
+ if (x)
+ memset(x, 0, s->objsize);
+ return x;
+}
+
+/*
+ * Generic reaper (the slabifier has its own way of reaping)
+ */
+#ifdef CONFIG_NUMA
+/*
+ * Special reaping functions for NUMA systems called from cache_reap().
+ */
+static DEFINE_PER_CPU(unsigned long, reap_node);
+
+static void init_reap_node(int cpu)
+{
+ int node;
+
+ node = next_node(cpu_to_node(cpu), node_online_map);
+ if (node == MAX_NUMNODES)
+ node = first_node(node_online_map);
+
+ __get_cpu_var(reap_node) = node;
+}
+
+static void next_reap_node(void)
+{
+ int node = __get_cpu_var(reap_node);
+
+ /*
+ * Also drain per cpu pages on remote zones
+ */
+ if (node != numa_node_id())
+ drain_node_pages(node);
+
+ node = next_node(node, node_online_map);
+ if (unlikely(node >= MAX_NUMNODES))
+ node = first_node(node_online_map);
+ __get_cpu_var(reap_node) = node;
+}
+#else
+#define init_reap_node(cpu) do { } while (0)
+#define next_reap_node(void) do { } while (0)
+#endif
+
+#define REAPTIMEOUT_CPUC (2*HZ)
+
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct work_struct, reap_work);
+
+static void cache_reap(void *unused)
+{
+ next_reap_node();
+ refresh_cpu_vm_stats(smp_processor_id());
+
+ schedule_delayed_work(&__get_cpu_var(reap_work),
+ REAPTIMEOUT_CPUC);
+}
+
+static void __devinit start_cpu_timer(int cpu)
+{
+ struct work_struct *reap_work = &per_cpu(reap_work, cpu);
+
+ /*
+ * When this gets called from do_initcalls via cpucache_init(),
+ * init_workqueues() has already run, so keventd will be setup
+ * at that time.
+ */
+ if (keventd_up() && reap_work->func == NULL) {
+ init_reap_node(cpu);
+ INIT_WORK(reap_work, cache_reap, NULL);
+ schedule_delayed_work_on(cpu, reap_work, HZ + 3 * cpu);
+ }
+}
+
+static int __init cpucache_init(void)
+{
+ int cpu;
+
+ /*
+ * Register the timers that drain pcp pages and update vm statistics
+ */
+ for_each_online_cpu(cpu)
+ start_cpu_timer(cpu);
+ return 0;
+}
+__initcall(cpucache_init);
+#endif
+
+
Index: linux-2.6.18-rc5-mm1/include/linux/slabulator.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc5-mm1/include/linux/slabulator.h 2006-09-01 14:10:50.835913012 -0700
@@ -0,0 +1,123 @@
+#ifndef _LINUX_SLABULATOR_H
+#define _LINUX_SLABULATOR_H
+/*
+ * Slabulator: Emulate the existing Slab API.
+ *
+ * (C) 2006 Silicon Graphics, Inc.
+ * Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/allocator.h>
+#include <linux/kmalloc.h>
+
+#define kmem_cache_t struct slab_cache
+#define kmem_cache slab_cache
+
+#ifndef SLABULATOR_ALLOCATOR
+#define SLABULATOR_ALLOCATOR slabifier_allocator
+#endif
+
+/*
+ * We really should be getting rid of these. This is only
+ * a select list/
+ */
+#define SLAB_KERNEL GFP_KERNEL
+#define SLAB_ATOMIC GFP_ATOMIC
+#define SLAB_NOFS GFP_NOFS
+#define SLAB_NOIO GFP_NOIO
+
+/* No debug features for now */
+#define SLAB_HWCACHE_ALIGN 0x00002000UL
+#define SLAB_CACHE_DMA 0x00004000UL
+#define SLAB_MUST_HWCACHE_ALIGN 0x00008000UL
+#define SLAB_RECLAIM_ACCOUNT 0x00020000UL
+#define SLAB_PANIC 0x00040000UL
+#define SLAB_DESTROY_BY_RCU 0x00080000UL
+#define SLAB_MEM_SPREAD 0x00100000UL
+
+/* flags passed to a constructor func */
+#define SLAB_CTOR_CONSTRUCTOR 0x001UL
+#define SLAB_CTOR_ATOMIC 0x002UL
+#define SLAB_CTOR_VERIFY 0x004UL
+
+/*
+ * slab_allocators are always available after the page allocator
+ * has been brought up. kmem_cache_init creates the kmalloc array:
+ */
+extern int slab_is_available(void);
+extern void kmem_cache_init(void);
+
+/* System wide caches (Should these really be here?) */
+extern struct slab_cache *vm_area_cachep;
+extern struct slab_cache *names_cachep;
+extern struct slab_cache *files_cachep;
+extern struct slab_cache *filp_cachep;
+extern struct slab_cache *fs_cachep;
+extern struct slab_cache *sighand_cachep;
+extern struct slab_cache *bio_cachep;
+
+extern struct slab_cache *kmem_cache_create(const char *name, size_t size,
+ size_t align, unsigned long flags,
+ void (*ctor)(void *, struct slab_cache *, unsigned long),
+ void (*dtor)(void *, struct slab_cache *, unsigned long));
+
+static inline unsigned int kmem_cache_size(struct slab_cache *s)
+{
+ return s->objsize;
+}
+
+static inline const char *kmem_cache_name(struct slab_cache *s)
+{
+ return s->name;
+}
+
+static inline void *kmem_cache_alloc(struct slab_cache *s, gfp_t flags)
+{
+ return SLABULATOR_ALLOCATOR.alloc(s, flags);
+}
+
+static inline void *kmem_cache_alloc_node(struct slab_cache *s,
+ gfp_t flags, int node)
+{
+ return SLABULATOR_ALLOCATOR.alloc_node(s, flags, node);
+}
+
+extern void *kmem_cache_zalloc(struct slab_cache *s, gfp_t flags);
+
+static inline void kmem_cache_free(struct slab_cache *s, const void *x)
+{
+ SLABULATOR_ALLOCATOR.free(s, x);
+}
+
+static inline int kmem_ptr_validate(struct slab_cache *s, void *x)
+{
+ return SLABULATOR_ALLOCATOR.valid_pointer(s, x);
+}
+
+extern int kmem_cache_destroy(struct slab_cache *s);
+
+static inline int kmem_cache_shrink(struct slab_cache *s)
+{
+ return SLABULATOR_ALLOCATOR.shrink(s, NULL);
+}
+
+/**
+ * kcalloc - allocate memory for an array. The memory is set to zero.
+ * @n: number of elements.
+ * @size: element size.
+ * @flags: the type of memory to allocate.
+ */
+static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
+{
+ if (n != 0 && size > ULONG_MAX / n)
+ return NULL;
+ return kzalloc(n * size, flags);
+}
+
+/* No current shrink statistics */
+struct shrinker;
+static inline void kmem_set_shrinker(kmem_cache_t *cachep,
+ struct shrinker *shrinker)
+{}
+#endif /* _LINUX_SLABULATOR_H */
+
Index: linux-2.6.18-rc5-mm1/mm/Makefile
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/Makefile 2006-09-01 14:10:50.744121805 -0700
+++ linux-2.6.18-rc5-mm1/mm/Makefile 2006-09-01 14:10:50.836889514 -0700
@@ -28,4 +28,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o kmalloc.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o \
+ kmalloc.o slabulator.o
Index: linux-2.6.18-rc5-mm1/init/Kconfig
===================================================================
--- linux-2.6.18-rc5-mm1.orig/init/Kconfig 2006-09-01 10:13:37.382549626 -0700
+++ linux-2.6.18-rc5-mm1/init/Kconfig 2006-09-01 14:10:50.837866016 -0700
@@ -332,6 +332,26 @@ config CC_OPTIMIZE_FOR_SIZE
If unsure, say N.
+config SLAB
+ default y
+ bool "Traditional SLAB allocator"
+ help
+ Disabling this allows the use of alternate slab allocators
+ with less overhead, such as SLOB (very simple) or the
+ slabifier via the modular allocator framework.
+ Note that alternate slab allocators may not provide
+ the complete functionality of the existing slab allocator.
+
+config MODULAR_SLAB
+ default y
+ bool "Use the modular allocator framework"
+ depends on EXPERIMENTAL && !SLAB
+ help
+ The modular allocator framework allows the flexible use
+ of different slab allocators and page allocators for memory
+ allocation. This will completely replace the existing
+ slab allocator. Beware this is experimental code.
+
menuconfig EMBEDDED
bool "Configure standard kernel features (for small systems)"
help
@@ -370,7 +390,6 @@ config KALLSYMS_EXTRA_PASS
reported. KALLSYMS_EXTRA_PASS is only a temporary workaround while
you wait for kallsyms to be fixed.
-
config HOTPLUG
bool "Support for hot-pluggable devices" if EMBEDDED
default y
@@ -445,15 +464,6 @@ config SHMEM
option replaces shmem and tmpfs with the much simpler ramfs code,
which may be appropriate on small systems without swap.
-config SLAB
- default y
- bool "Use full SLAB allocator" if EMBEDDED
- help
- Disabling this replaces the advanced SLAB allocator and
- kmalloc support with the drastically simpler SLOB allocator.
- SLOB is more space efficient but does not scale well and is
- more susceptible to fragmentation.
-
config VM_EVENT_COUNTERS
default y
bool "Enable VM event counters for /proc/vmstat" if EMBEDDED
@@ -475,7 +485,7 @@ config BASE_SMALL
default 1 if !BASE_FULL
config SLOB
- default !SLAB
+ default !SLAB && !MODULAR_SLAB
bool
menu "Loadable module support"
Index: linux-2.6.18-rc5-mm1/include/linux/slab.h
===================================================================
--- linux-2.6.18-rc5-mm1.orig/include/linux/slab.h 2006-09-01 10:13:36.505650544 -0700
+++ linux-2.6.18-rc5-mm1/include/linux/slab.h 2006-09-01 14:10:50.837866016 -0700
@@ -9,6 +9,10 @@
#if defined(__KERNEL__)
+#ifdef CONFIG_MODULAR_SLAB
+#include <linux/slabulator.h>
+#else
+
typedef struct kmem_cache kmem_cache_t;
#include <linux/gfp.h>
@@ -291,6 +295,8 @@ extern kmem_cache_t *fs_cachep;
extern kmem_cache_t *sighand_cachep;
extern kmem_cache_t *bio_cachep;
+#endif /* CONFIG_SLABULATOR */
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SLAB_H */
* [MODSLAB] Bypass indirections [for performance testing only]
2006-09-01 22:33 [MODSLAB 0/5] Modular slab allocator V3 Christoph Lameter
` (4 preceding siblings ...)
2006-09-01 22:34 ` [MODSLAB 5/5] Slabulator: Emulate the existing Slab Layer Christoph Lameter
@ 2006-09-01 22:34 ` Christoph Lameter
5 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-09-01 22:34 UTC (permalink / raw)
To: akpm
Cc: Pekka Enberg, Marcelo Tosatti, linux-kernel, Nick Piggin,
linux-mm, Christoph Lameter, mpm, Manfred Spraul, Dave Chinner,
Andi Kleen
Bypass indirections.
This patch bypasses the allocator indirections so that one can measure
the performance impact of the indirect calls.
Only use this for testing.
Index: linux-2.6.18-rc5-mm1/mm/slabifier.c
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/slabifier.c 2006-09-01 14:25:55.907938735 -0700
+++ linux-2.6.18-rc5-mm1/mm/slabifier.c 2006-09-01 15:20:51.915297648 -0700
@@ -498,12 +498,13 @@ gotpage:
goto redo;
}
-static void *slab_alloc(struct slab_cache *sc, gfp_t gfpflags)
+void *slab_alloc(struct slab_cache *sc, gfp_t gfpflags)
{
return __slab_alloc(sc, gfpflags, -1);
}
+EXPORT_SYMBOL(slab_alloc);
-static void *slab_alloc_node(struct slab_cache *sc, gfp_t gfpflags,
+void *slab_alloc_node(struct slab_cache *sc, gfp_t gfpflags,
int node)
{
#ifdef CONFIG_NUMA
@@ -512,8 +513,9 @@ static void *slab_alloc_node(struct slab
return slab_alloc(sc, gfpflags);
#endif
}
+EXPORT_SYMBOL(slab_alloc_node);
-static void slab_free(struct slab_cache *sc, const void *x)
+void slab_free(struct slab_cache *sc, const void *x)
{
struct slab *s = (void *)sc;
struct page * page;
@@ -617,6 +619,7 @@ dumpret:
return;
#endif
}
+EXPORT_SYMBOL(slab_free);
/* Figure out on which slab object the object resides */
static __always_inline struct page *get_object_page(const void *x)
Index: linux-2.6.18-rc5-mm1/include/linux/kmalloc.h
===================================================================
--- linux-2.6.18-rc5-mm1.orig/include/linux/kmalloc.h 2006-09-01 14:26:06.439513840 -0700
+++ linux-2.6.18-rc5-mm1/include/linux/kmalloc.h 2006-09-01 15:20:51.917250652 -0700
@@ -67,6 +67,10 @@ static inline int kmalloc_index(int size
return -1;
}
+extern void *slab_alloc(struct slab_cache *, gfp_t flags);
+extern void *slab_alloc_node(struct slab_cache *, gfp_t, int);
+extern void slab_free(struct slab_cache *, const void *);
+
/*
* Find the slab cache for a given combination of allocation flags and size.
*
@@ -96,7 +100,7 @@ static inline void *kmalloc(size_t size,
if (__builtin_constant_p(size) && !(flags & __GFP_DMA)) {
struct slab_cache *s = kmalloc_slab(size);
- return KMALLOC_ALLOCATOR.alloc(s, flags);
+ return slab_alloc(s, flags);
} else
return __kmalloc(size, flags);
}
@@ -108,7 +112,7 @@ static inline void *kmalloc_node(size_t
if (__builtin_constant_p(size) && !(flags & __GFP_DMA)) {
struct slab_cache *s = kmalloc_slab(size);
- return KMALLOC_ALLOCATOR.alloc_node(s, flags, node);
+ return slab_alloc_node(s, flags, node);
} else
return __kmalloc_node(size, flags, node);
}
@@ -119,7 +123,7 @@ static inline void *kmalloc_node(size_t
/* Free an object */
static inline void kfree(const void *x)
{
- return KMALLOC_ALLOCATOR.free(NULL, x);
+ slab_free(NULL, x);
}
/* Allocate and zero the specified number of bytes */
Index: linux-2.6.18-rc5-mm1/mm/kmalloc.c
===================================================================
--- linux-2.6.18-rc5-mm1.orig/mm/kmalloc.c 2006-09-01 14:26:06.440490342 -0700
+++ linux-2.6.18-rc5-mm1/mm/kmalloc.c 2006-09-01 15:20:51.917250652 -0700
@@ -124,15 +124,14 @@ static struct slab_cache *get_slab(size_
void *__kmalloc(size_t size, gfp_t flags)
{
- return KMALLOC_ALLOCATOR.alloc(get_slab(size, flags), flags);
+ return slab_alloc(get_slab(size, flags), flags);
}
EXPORT_SYMBOL(__kmalloc);
#ifdef CONFIG_NUMA
void *__kmalloc_node(size_t size, gfp_t flags, int node)
{
- return KMALLOC_ALLOCATOR.alloc_node(get_slab(size, flags),
- flags, node);
+ return slab_alloc_node(get_slab(size, flags), flags, node);
}
EXPORT_SYMBOL(__kmalloc_node);
#endif
Index: linux-2.6.18-rc5-mm1/include/linux/slabulator.h
===================================================================
--- linux-2.6.18-rc5-mm1.orig/include/linux/slabulator.h 2006-09-01 14:26:06.861362745 -0700
+++ linux-2.6.18-rc5-mm1/include/linux/slabulator.h 2006-09-01 15:20:51.918227155 -0700
@@ -73,20 +73,23 @@ static inline const char *kmem_cache_nam
static inline void *kmem_cache_alloc(struct slab_cache *s, gfp_t flags)
{
- return SLABULATOR_ALLOCATOR.alloc(s, flags);
+ return slab_alloc(s, flags);
+ //return SLABULATOR_ALLOCATOR.alloc(s, flags);
}
static inline void *kmem_cache_alloc_node(struct slab_cache *s,
gfp_t flags, int node)
{
- return SLABULATOR_ALLOCATOR.alloc_node(s, flags, node);
+ return slab_alloc_node(s, flags, node);
+// return SLABULATOR_ALLOCATOR.alloc_node(s, flags, node);
}
extern void *kmem_cache_zalloc(struct slab_cache *s, gfp_t flags);
static inline void kmem_cache_free(struct slab_cache *s, const void *x)
{
- SLABULATOR_ALLOCATOR.free(s, x);
+ slab_free(s, x);
+// SLABULATOR_ALLOCATOR.free(s, x);
}
static inline int kmem_ptr_validate(struct slab_cache *s, void *x)