[MODSLAB 0/4] A modular slab allocator V2

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [MODSLAB 0/4] A modular slab allocator V2
@ 2006-08-27  2:32 Christoph Lameter
  2006-08-27  2:32 ` [MODSLAB 1/4] Generic Allocator Framework Christoph Lameter
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-27  2:32 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen, mpm,
	Manfred Spraul, Dave Chinner, Christoph Lameter

Changes V1-V2:
- Drop pageslab and numaslab. Drop support for VMALLOC allocations.

- Enhance slabifier with some numa capability. Bypass
  free list management for slabs with a single object.
  Drop slab full lists and minimize lock taking
  for partial lists.

- Optimize code: Generate general slab array immediately
  and pass the address of the slab cache in kmalloc(). DMA
  caches remain dynamic.

- Add support for non power of 2 general caches.

- Tested on i386, x86_64 and ia64.

The main intend of this patchset is to modularize
the slab allocator so that development of additions
or modification to the allocator layer become easier.
The framework enables the use of multiple slab allocator
and allows the generation of additional underlying
page allocators (as f.e. needed for mempools and other
specialized things).

The modularization is accomplished by trying to use a few
concepts from object oriented programming. Allocators are
described by methods and functions can produce new allocators
based on existing ones by modifying their methods.

So what the patches provide here is:

1. A framework for page allocators and slab allocators

2. Various methods to derive new allocators from old ones
   (add rcu support, destructors, constructors, dma etc)

3. A layer that emulates the exist slab interface (the slabulator).

4. A layer that provides kmalloc functionality.

5. The Slabifier. This is conceptually the Simple Slab (See my RFC
   from last week) but with the additional allocator modifications
   possible it grows like on steroids and then can supply most of
   the functionality of the existing slab allocator and can go even
   beyond it. My tests with AIM7 seem to indicate that it is
   equal in performance to the existing slab allocator. However,
   I am sure that there are specific situtions in which it will
   not behave optimially. Hopefully the modularization will
   lead to a fast way to mature this component.

Some of the other issues in the slab layer are also addressed here:

1. shrink_slab takes a function to move object. Using that
   function slabs can be defragmented to ease slab reclaim
   (This is something discussed at the VM summit).

2. New slabs that are created can be merged into the kmalloc array
   if it is detected that they match. This decreases the number of caches
   and benefits cache use.

3. The slabifier can flag double frees when the act occurs
   and will attempt to continue.

4. There is no 2 second slab reaper tick anymore. Each slab has a 10
   second flusher attached (not needed for UP). The flusher is inactive if
   slab is inactive. System can get to a quiescent state. Currently
   we constantly have a 2 second scan through all slabs in the
   system with expensive processing that causes delays
   that may be significant for real time processing.

Notably missing features:

- Slab Debugging
  (This should be implemented by deriving a new slab allocator from
  existing ones and adding the necessary processing in alloc and free).

Performance tests on an 8p machine show consistently that the performance
is equal to the standard slab allocator. Memory use is much less with
this since there is no meta data overhead per slab.

This patchset should just leave the existing slab allocator unharmed. It only
adds a hook to include/linux/slab.h to redirect includes to the definitions
for the allocation framework by the slabulator.

Deactivate the "Traditional Slab allocator" in order to activate the modular
slab allocator.

More details may be found in the header of each of the following 4 patches.

I am not sure how something like this could be merged. Maybe put in a
special directory?

In the current form the modular framework is definitely fitting the
requirements of the embedded folks since the allocation efficiency
is comparable if not better than SLOB. It is running to my satisfaction
on desktops and small servers. However, a lot of tuning is still
needed. Perhaps a caching framework will be needed to be equal in
speed to slab in some situations. That may perhaps be added as another
derived slab allocator.

Christoph Lameter, August 26, 2006.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [MODSLAB 1/4] Generic Allocator Framework
  2006-08-27  2:32 [MODSLAB 0/4] A modular slab allocator V2 Christoph Lameter
@ 2006-08-27  2:32 ` Christoph Lameter
  2006-08-27  2:32 ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-27  2:32 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen,
	Christoph Lameter, mpm, Dave Chinner, Manfred Spraul

Add allocator abstraction

The allocator abstraction layer provides sources of pages for the slabifier
and it provides ways to customize the slabifier to ones needs (one can
put dmaificiation, rcuification and so on of slab frees etc on top of the
standard page allocator).

The allocator framework also provides a means for deriving new slab
allocators from old ones. That way features can be added in a generic way.
It would be possible to add rcu for slab objects or debugging in that
fashion.

The allocator framework introduces the requirement to do indirect
function calls. This could cause some slowdown of the allocators.
However, I have not seen this in my tests with AIM7 on an 8p NUMA
machine. Maybe different tests will show a slowdown. However, this
object-oriented style of deconstructing the allocators has the
advantage that we can deal with small pieces of code that add special
functionality. The overall framework makes it easy to replace pieces
and evolve the whole allocator systems in a faster way.

It also provides a generic way to operate on different allocators.
It is no problem to define a new allocator that allocates from
memory pools and then use the slab allocator on that memory pool.

The code in mm/allocators.c provides some examples what could be
done with derived allocators.

Signed-off-by: Christoph Lameter <clameter>.

Index: linux-2.6.18-rc4-mm3/mm/Makefile
===================================================================
--- linux-2.6.18-rc4-mm3.orig/mm/Makefile	2006-08-26 16:38:04.813597388 -0700
+++ linux-2.6.18-rc4-mm3/mm/Makefile	2006-08-26 16:38:18.581301135 -0700
@@ -25,4 +25,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
-
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o
Index: linux-2.6.18-rc4-mm3/include/linux/allocator.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/include/linux/allocator.h	2006-08-26 18:26:25.240956238 -0700
@@ -0,0 +1,221 @@
+#ifndef _LINUX_ALLOCATOR_H
+#define _LINUX_ALLOCATOR_H
+
+/*
+ * Generic API to memory allocators.
+ * (C) 2006 Silicon Graphics, Inc,
+ *	Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/gfp.h>
+
+/*
+ * Page allocators
+ *
+ * Page allocators are sources of memory in pages. They basically only
+ * support allocation and freeing of pages. The interesting thing
+ * is how these pages are obtained. Plus with these methods
+ * we can add things in between the page allocator a use
+ * of the page allocator to add things without too much
+ * effort. This allows us to encapsulate new features.
+ *
+ * New allocators could be added f.e. for specialized memory pools.
+ */
+
+struct page_allocator {
+	struct page *(*allocate)(const struct page_allocator *, int order,
+		gfp_t mask, int node);
+	void (*free)(const struct page_allocator *, struct page *, int order);
+	void (*destructor) (struct page_allocator *);
+	const char *name;
+};
+
+/* Standard page allocators*/
+extern const struct page_allocator page_allocator;
+
+/*
+ * Generators for new allocators based on known allocators
+ *
+ * These behave like modifiers to already generated or
+ * existing allocators. May be combined at will.
+ */
+
+/*
+ * A way to free all pages via RCU. The RCU head is placed in the
+ * struct page so this fully transparent and does not require any
+ * allocation and freeing via the slab.
+ */
+struct page_allocator *rcuify_page_allocator
+			(const struct page_allocator *base);
+
+/*
+ * Make an allocation via a specific allocator always return
+ * DMA memory.
+ */
+struct page_allocator *dmaify_page_allocator
+			(const struct page_allocator *base);
+
+
+/*
+ * Allocation and freeing is tracked with slab_reclaim_pages
+ */
+struct page_allocator *reclaimable_slab
+			(const struct page_allocator *base);
+
+struct page_allocator *unreclaimable_slab
+			(const struct page_allocator *base);
+
+/*
+ * This provides a constructor and a destructor call for each object
+ * on a page. The constructors and destructors calling conventions
+ * are compatible with the existing slab implementation. However,
+ * this implementation assumes that the objects always start at offset 0.
+ *
+ * The main use of these is to provide a generic forrm of constructors
+ * and destructors. These run after a page was allocated and before
+ * a page is freed.
+ */
+struct page_allocator *ctor_and_dtor_for_page_allocator
+	(const struct page_allocator *, unsigned int size, void *private,
+		void (*ctor)(void *, void *, unsigned long),
+                void (*dtor)(void *, void *, unsigned long));
+
+#ifdef CONFIG_NUMA
+/*
+ * Allocator that allows the customization of the NUMA behavior of an
+ * allocator. If a node is specified then the allocator will always try
+ * to allocate on that node. Flags set are ORed for every allocation.
+ * F.e. one can set GFP_THISNODE to force an allocation on a particular node
+ * or on a local node.
+ */
+struct page_allocator *numactl_allocator(const struct page_allocator *,
+						int node, gfp_t flags);
+#endif
+
+/* Tools to make your own */
+struct derived_page_allocator {
+	struct page_allocator a;
+	const struct page_allocator *base;
+};
+
+void derived_destructor(struct page_allocator *a);
+
+struct derived_page_allocator *derive_page_allocator
+				(const struct page_allocator *base,
+				const char *name);
+
+/*
+ * Slab allocators
+ */
+
+
+/*
+ * A slab cache structure must be generated and be populated in order to
+ * create a working slab cache.
+ */
+struct slab_cache {
+	const struct slab_allocator *slab_alloc;
+	const struct page_allocator *page_alloc;
+	short int node;		/* Node passed to page allocator */
+	short int align;	/* Alignment requirements */
+	int size;		/* The size of a chunk on a slab */
+	int objsize;		/* The size of an object that is in a chunk */
+	int inuse;		/* Used portion of the chunk */
+	int offset;		/* Offset to the freelist pointer */
+	unsigned int order;	/* Size of the slab page */
+	const char *name;	/* Name (only for display!) */
+	struct list_head list;	/* slabinfo data */
+};
+
+/*
+ * Generic structure for opaque per slab data for slab allocators
+ */
+struct slab_control {
+	struct slab_cache sc;	/* Common information */
+	void *data[50];		/* Some data */
+	void *percpu[NR_CPUS];	/* Some per cpu information. */
+};
+
+struct slab_allocator {
+	/* Allocation functions */
+	void *(*alloc)(struct slab_cache *, gfp_t);
+	void *(*alloc_node)(struct slab_cache *, gfp_t, int);
+	void (*free)(struct slab_cache *, const void *);
+
+	/* Entry point from kfree */
+	void (*__free)(struct page *, const void *);
+
+	/* Object checks */
+	int (*valid_pointer)(struct slab_cache *, const void *object);
+	unsigned long (*object_size)(struct slab_cache *, const void *);
+
+	/*
+	 * Determine slab statistics in units of slabs. Returns the
+	 * number of total pages used by the slab cache.
+	 * active are the pages under allocation or empty
+	 * partial are the number of partial slabs.
+	 */
+	unsigned long (*get_objects)(struct slab_cache *, unsigned long *total,
+			unsigned long *active, unsigned long *partial);
+
+	/*
+	 * Create an actually usable slab cache from a slab allocator
+	 */
+	struct slab_cache *(*create)(struct slab_control *,
+		const struct slab_cache *);
+
+	/*
+	 * shrink defragments a slab cache by moving objects from sparsely
+	 * populated slabs to others. slab shrink will terminate when there
+	 * is only one fragmented slab left.
+	 *
+	 * The move_object function must be supplied otherwise shrink can only
+	 * free pages that are competely empty.
+	 *
+	 * move_object gets a slab_cache pointer and an object pointer. The
+	 * function must reallocate another object and move the contents
+	 * from this object into the new object. Then the function should
+	 * return 1 for success. If it return 0 then the object is pinned.
+	 * the slab that the object resides on will not be freed.
+	 */
+	int (*shrink)(struct slab_cache *,
+			int (*move_object)(struct slab_cache *, void *));
+
+	/*
+	 * Establish a new reference so that destroy does not
+	 * unecessarily destroy the slab_cache
+	 */
+	struct slab_cache * (*dup)(struct slab_cache *);
+	int (*destroy)(struct slab_cache *);
+	void (*destructor)(struct slab_allocator *);
+	const char *name;
+};
+
+/* Standard slab allocator */
+extern const struct slab_allocator slabifier_allocator;
+
+/* Access kmalloc's fixed slabs without creating new ones. */
+extern struct slab_allocator kmalloc_slab_allocator;
+
+#ifdef CONFIG_NUMA
+extern const struct slab_allocator numa_slab_allocator;
+#endif
+
+/* Generate new slab allocators based on old ones */
+struct slab_allocator *rcuify_slab(struct slab_allocator *base);
+struct slab_allocator *dmaify_slab(struct slab_allocator *base);
+
+/* Indestructible static allocators use this. */
+void null_slab_allocator_destructor(struct slab_allocator *);
+
+struct derived_slab_allocator {
+	struct slab_allocator a;
+	const struct slab_allocator *base;
+};
+
+void derived_slab_destructor(struct slab_allocator *a);
+
+struct derived_slab_allocator *derive_slab_allocator
+			(const struct slab_allocator *base,
+			const char *name);
+
+#endif /* _LINUX_ALLOCATOR_H */
Index: linux-2.6.18-rc4-mm3/mm/allocator.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/mm/allocator.c	2006-08-26 18:26:25.239979736 -0700
@@ -0,0 +1,447 @@
+/*
+ * Generic allocator and modifiers for allocators (slab and page allocators)
+ *
+ * (C) 2006 Silicon Graphics, Inc. Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/allocator.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+
+/*
+ * Section One: Page Allocators
+ */
+
+static char *alloc_str_combine(const char *new, const char *base)
+{
+	char *s;
+
+	s = kmalloc(strlen(new) + strlen(base) + 2, GFP_KERNEL);
+	strcpy(s, new);
+	strcat(s, ":");
+	strcat(s, base);
+	return s;
+}
+
+/* For static allocators */
+static void null_destructor(struct page_allocator *a) {}
+
+/*
+ * A general page allocator that can allocate all of memory
+ */
+static struct page *gen_alloc(const struct page_allocator *a, int order,
+		gfp_t flags, int node)
+{
+	if (order)
+		flags |= __GFP_COMP;
+#ifdef CONFIG_NUMA
+	if (node >=0)
+		return alloc_pages_node(node, flags, order);
+#endif
+	return alloc_pages(flags, order);
+}
+
+static void gen_free(const struct page_allocator *a, struct page *page,
+							int order)
+{
+	__free_pages(page, order);
+}
+
+const struct page_allocator page_allocator = {
+	.allocate = gen_alloc,
+	.free = gen_free,
+	.destructor = null_destructor,
+	.name = "page_allocator"
+};
+
+/*
+ * Functions to deal with dynamically generating allocators.
+ */
+void derived_destructor(struct page_allocator *a)
+{
+	struct derived_page_allocator *d = (void *)a;
+
+	d->base->destructor((struct page_allocator *)d->base);
+	kfree(a->name);
+	kfree(a);
+}
+
+/*
+ * Create a new allocator based on another one. All functionality
+ * is duplicated except for the destructor. The caller needs to do
+ * modifications to some of the methods in the copy.
+ */
+struct derived_page_allocator *derive_page_allocator
+		(const struct page_allocator *base, const char *name)
+{
+	struct derived_page_allocator *d =
+		kmalloc(sizeof(struct derived_page_allocator), GFP_KERNEL);
+
+	d->base = base;
+	d->a.allocate = base->allocate;
+	d->a.free = base->free;
+	d->a.name = alloc_str_combine(name, base->name);
+	d->a.destructor = derived_destructor;
+	return d;
+};
+
+/*
+ * RCU allocator generator (this is used f.e in the slabifier
+ * to realize SLAB_DESTROY_BY_RCU see the slabulator on how to do this).
+ *
+ * We overload struct page once more for the RCU data
+ * lru = RCU head
+ * index = order
+ * mapping = base allocator
+ */
+static void page_free_rcu(struct rcu_head *h)
+{
+	struct page *page;
+	struct page_allocator *base;
+	int order;
+
+ 	page = container_of((struct list_head *)h, struct page, lru);
+	base = (void *)page->mapping;
+	order = page->index;
+	page->index = 0;
+	page->mapping = NULL;
+	base->free(base, page, order);
+}
+
+/*
+ * Use page struct as intermediate rcu storage.
+ */
+static void rcu_free(const struct page_allocator *a, struct page *page,
+							 int order)
+{
+	struct rcu_head *head = (void *)&page->lru;
+	struct derived_page_allocator *d = (void *)a;
+
+	page->index = order;
+	page->mapping = (void *)d->base;
+	call_rcu(head, page_free_rcu);
+}
+
+struct page_allocator *rcuify_page_allocator
+			(const struct page_allocator *base)
+{
+	struct derived_page_allocator *d = derive_page_allocator(base,"rcu");
+
+	d->a.free = rcu_free;
+	return &d->a;
+};
+
+/*
+ * Restrict memory allocations to DMA
+ */
+static struct page *dma_alloc(const struct page_allocator *a, int order,
+						gfp_t flags, int node)
+{
+	struct derived_page_allocator *d = (void *)a;
+
+	return d->base->allocate(d->base, order, flags | __GFP_DMA, node);
+}
+
+struct page_allocator *dmaify_page_allocator
+			(const struct page_allocator *base)
+{
+	struct derived_page_allocator *d = derive_page_allocator(base, "dma");
+
+	d->a.allocate = dma_alloc;
+	return &d->a;
+}
+
+/*
+ * Allocator with constructur and destructor in fixed sized length
+ * in the page (used by the slabifier to realize slab destructors
+ * and constructors).
+ */
+struct deconstructor {
+	struct page_allocator a;
+	const struct page_allocator *base;
+	unsigned int size;
+	void *private;
+	void (*ctor)(void *, void *, unsigned long);
+        void (*dtor)(void *, void *, unsigned long);
+};
+
+static struct page *ctor_alloc(const struct page_allocator *a,
+				int order, gfp_t flags, int node)
+{
+	struct deconstructor *d = (void *)a;
+	struct page * page = d->base->allocate(d->base, order, flags, node);
+
+	if (d->ctor) {
+		void *start = page_address(page);
+		void *end = start + (PAGE_SIZE << order);
+		void *p;
+
+		for (p = start; p <= end - d->size; p += d->size)
+		/* Make the flags of the constructor compatible with SLAB use */
+			d->ctor(p, d->private, 1 + !(flags & __GFP_WAIT));
+	}
+	return page;
+}
+
+static void dtor_free(const struct page_allocator *a,
+				struct page *page, int order)
+{
+	struct deconstructor *d = (void *)a;
+
+	if (d->dtor) {
+		void *start = page_address(page);
+		void *end = start + (PAGE_SIZE << order);
+		void *p;
+
+		for (p = start; p <= end - d->size; p += d->size)
+			d->dtor(p, d->private, 0);
+	}
+	d->base->free(d->base, page, order);
+}
+
+struct page_allocator *ctor_and_dtor_for_page_allocator
+	(const struct page_allocator *base,
+		unsigned int size, void *private,
+		void (*ctor)(void *, void *, unsigned long),
+		void (*dtor)(void *, void *, unsigned long))
+{
+	struct deconstructor *d =
+		kmalloc(sizeof(struct deconstructor), GFP_KERNEL);
+
+	d->a.allocate = ctor ? ctor_alloc : base->allocate;
+	d->a.free = dtor ? dtor_free : base->free;
+	d->a.destructor = derived_destructor;
+	d->a.name = alloc_str_combine("ctor_dtor", base->name);
+	d->base = base;
+	d->ctor = ctor;
+	d->dtor = dtor;
+	d->size = size;
+	d->private = private;
+	return &d->a;
+}
+
+/*
+ * Track reclaimable pages. This is used by the slabulator
+ * to mark allocations of certain slab caches.
+ */
+static struct page *rac_alloc(const struct page_allocator *a, int order,
+			gfp_t flags, int node)
+{
+	struct derived_page_allocator *d = (void *)a;
+	struct page *page = d->base->allocate(d->base, order, flags, node);
+
+	mod_zone_page_state(page_zone(page), NR_SLAB_RECLAIMABLE, 1 << order);
+	return page;
+}
+
+static void rac_free(const struct page_allocator *a, struct page *page,
+							int order)
+{
+	struct derived_page_allocator *d = (void *)a;
+
+	mod_zone_page_state(page_zone(page),
+					NR_SLAB_RECLAIMABLE, -(1 << order));
+	d->base->free(d->base, page, order);
+}
+
+struct page_allocator *reclaimable_slab(const struct page_allocator *base)
+{
+	struct derived_page_allocator *d =
+		derive_page_allocator(&page_allocator,"reclaimable");
+
+	d->a.allocate = rac_alloc;
+	d->a.free = rac_free;
+	return &d->a;
+}
+
+/*
+ * Track unreclaimable pages. This is used by the slabulator
+ * to mark allocations of certain slab caches.
+ */
+static struct page *urac_alloc(const struct page_allocator *a, int order,
+			gfp_t flags, int node)
+{
+	struct derived_page_allocator *d = (void *)a;
+	struct page *page = d->base->allocate(d->base, order, flags, node);
+
+	mod_zone_page_state(page_zone(page),
+			NR_SLAB_UNRECLAIMABLE, 1 << order);
+	return page;
+}
+
+static void urac_free(const struct page_allocator *a, struct page *page,
+							int order)
+{
+	struct derived_page_allocator *d = (void *)a;
+
+	mod_zone_page_state(page_zone(page),
+					NR_SLAB_UNRECLAIMABLE, -(1 << order));
+	d->base->free(d->base, page, order);
+}
+
+struct page_allocator *unreclaimable_slab(const struct page_allocator *base)
+{
+	struct derived_page_allocator *d =
+		derive_page_allocator(&page_allocator,"unreclaimable");
+
+	d->a.allocate = urac_alloc;
+	d->a.free = urac_free;
+	return &d->a;
+}
+
+/*
+ * Numacontrol for allocators
+ */
+struct numactl {
+	struct page_allocator a;
+	const struct page_allocator *base;
+	int node;
+	gfp_t flags;
+};
+
+static struct page *numactl_alloc(const struct page_allocator *a,
+				int order, gfp_t flags, int node)
+{
+	struct numactl *d = (void *)a;
+
+	if (d->node >= 0)
+		node = d->node;
+
+	return d->base->allocate(d->base, order, flags | d->flags, node);
+}
+
+
+struct page_allocator *numactl_allocator(const struct page_allocator *base,
+	int node, gfp_t flags)
+{
+	struct numactl *d =
+		kmalloc(sizeof(struct numactl), GFP_KERNEL);
+
+	d->a.allocate = numactl_alloc;
+	d->a.destructor = derived_destructor;
+	d->a.name = alloc_str_combine("numa", base->name);
+	d->base = base;
+	d->node = node;
+	d->flags = flags;
+	return &d->a;
+}
+
+/*
+ * Slab allocators
+ */
+
+/* Tools to make your own */
+void null_slab_allocator_destructor(struct slab_allocator *a) {}
+
+void derived_slab_destructor(struct slab_allocator *a) {
+	struct derived_slab_allocator *d = (void *)a;
+
+	d->base->destructor((struct slab_allocator *)d->base);
+	kfree(d);
+}
+
+struct derived_slab_allocator *derive_slab_allocator
+		(const struct slab_allocator *base,
+			const char *name) {
+	struct derived_slab_allocator *d =
+		 kmalloc(sizeof(struct derived_slab_allocator), GFP_KERNEL);
+
+	memcpy(&d->a, base, sizeof(struct slab_allocator));
+	d->base = base;
+	d->a.name = alloc_str_combine("name", base->name);
+	d->a.destructor = derived_slab_destructor;
+	return d;
+}
+
+/* Generate new slab allocators based on old ones */
+
+/*
+ * First a generic method to rcuify any slab. We add the rcuhead
+ * to the end of the object and use that on free.
+ */
+
+struct rcuified_slab {
+	struct slab_allocator *a;
+	const struct slab_allocator *base;
+	unsigned int rcu_offset;
+};
+
+/*
+ * Information that is added to the end of the slab
+ */
+struct slabr {
+	struct rcu_head r;
+	struct slab_cache *s;
+};
+
+struct slab_cache *rcuify_slab_create(struct slab_control *c,
+	const struct slab_cache *sc)
+{
+	struct rcuified_slab *d = (void *)sc->slab_alloc;
+	struct slab_cache i;
+
+	memcpy(&i, sc, sizeof(struct slab_cache));
+
+	i.inuse = d->rcu_offset = ALIGN(sc->inuse, sizeof(void *));
+	i.inuse += sizeof(struct slabr) + sizeof(void *);
+	while (i.inuse > i.size)
+		i.size += i.align;
+
+	i.slab_alloc = d->base;
+
+	return d->base->create(c, &i);
+}
+
+void rcu_slab_free(struct rcu_head *rcu)
+{
+	struct slabr *r = (void *) rcu;
+	struct slab_cache *s = r->s;
+	struct rcuified_slab *d = (void *)s->slab_alloc;
+	void *object = (void *) rcu - d->rcu_offset;
+
+	d->base->free(s, object);
+}
+
+void rcuify_slab_free(struct slab_cache *s, const void *object)
+{
+	struct rcuified_slab *r = (struct rcuified_slab *)(s->slab_alloc);
+
+	call_rcu((struct rcu_head *)(object + r->rcu_offset), rcu_slab_free);
+}
+
+struct slab_allocator *rcuify_slab_allocator
+			(const struct slab_allocator *base)
+{
+	struct derived_slab_allocator *d = derive_slab_allocator(base,"rcu");
+
+	d->a.create = rcuify_slab_create;
+	d->a.free = rcuify_slab_free;
+	return &d->a;
+}
+
+/*
+ * dmaification of slab allocation. This is done by dmaifying the
+ * underlying page allocator.
+ */
+struct slab_cache *dmaify_slab_create(struct slab_control *c,
+		const struct slab_cache *sc)
+{
+	struct derived_slab_allocator *d = (void *)sc->slab_alloc;
+	struct slab_cache i;
+
+	memcpy(&i, sc, sizeof(struct slab_cache));
+
+	i.page_alloc = dmaify_page_allocator(sc->page_alloc);
+
+	return d->base->create(c, &i);
+}
+
+struct slab_allocator *dmaify_slab_allocator
+			(const struct slab_allocator *base)
+{
+	struct derived_slab_allocator *d = derive_slab_allocator(base, "dma");
+
+	d->a.create = dmaify_slab_create;
+	return &d->a;
+}
+

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [MODSLAB 2/4] A slab allocator: SLABIFIER
  2006-08-27  2:32 [MODSLAB 0/4] A modular slab allocator V2 Christoph Lameter
  2006-08-27  2:32 ` [MODSLAB 1/4] Generic Allocator Framework Christoph Lameter
@ 2006-08-27  2:32 ` Christoph Lameter
  2006-08-27  7:18   ` [MODSLAB 2.5/4] A slab statistics module Christoph Lameter
  2006-08-28  5:33   ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
  2006-08-27  2:33 ` [MODSLAB 3/4] A Kmalloc subsystem Christoph Lameter
  2006-08-27  2:33 ` [MODSLAB 4/4] Slabulator: Emulate the existing Slab Layer Christoph Lameter
  3 siblings, 2 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-27  2:32 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen, mpm,
	Manfred Spraul, Dave Chinner, Christoph Lameter

Lately I have started tinkering around with the slab in particular after
Matt Mackal mentioned that the slab should be more modular at the KS.
One particular design issue with the current slab is that it is build on the
basic notion of shifting object references from list to list. Without NUMA this
is wild enough with the per cpu caches and the shared cache but with NUMA we now
have per node shared arrays, per node list and per node per node alien caches.
Somehow this all works but one wonders does it have to be that way? On very
large systems the number of these entities grows to unbelievable numbers.
On our 1k cpu/node system each slab need 128M for alien caches alone.

So I thought it may be best to try to develop another basic slab layer
that does not have all the object queues and that does not have to carry
so much state information. I also have had concerns about the way locking
is handled for awhile. We could increase parallelism by finer grained locking.
This in turn may avoid the need for object queues.

One of the problems of the NUMA slab allocator is that per node partial
slab lists are used. Partial slabs cannot be filled up from other nodes.
So what I have tried to do here is to have minimal metainformation combined
with one centralized list of partially allocated slabs. The list_lock
is only taken if list modifications become necessary. The need for those
has been drastically reduced with a few measures. See below.

After toying around for awhile I came to the realization that the page struct
contains all the information necessary to manage a slab block. One can put
all the management information there and that is also advantageous
for performance since we constantly have to use the page struct anyways for
reverse object lookups and during slab creation. So this also reduces the
cache footprint of the slab. The alignment is naturally the best since the
first object starts right at the page boundary. This reduces the complexity
of alignment calculations.


struct page overloading:

- _mapcout	=> Used to count the objects in use in a slab
- mapping	=> Reference to the slab structure
- index		=> Pointer to the first free element in a slab
- lru		=> Used for list management.

Also we have a page lock in the page struct that is used
for locking each slab during modifications. Taking the lock per slab
is the finest grained locking available and this is fundamental
to the slabifier. The slab lock is taken if the slab contains
multiple objects in order to protect the freelist.

The freelists of objects per page are managed as a chained list.
The struct page contains a pointer to the first element. The first 4 bytes of
the free element contains a pointer to the next free element etc until the
chain ends with NULL. If the object cannot be overwritten after free (RCU
and constructors etc) then we can shift the pointer to the next free element
behind the object.

Flag overloading:

PageReferenced	=> Used to control per cpu slab freeing.
PageActive	=> slab is under active allocation.
PageLocked	=> per slab locking

The slabifier does remove the need for a list of free slabs and a list
of used slabs. Free slabs are immediately returned to the page allocator.
The page allocator has its own per cpu queues that will manage these
free pages. Used slabs are simply not tracked because we never have
a need to find out where the used slabs are. The only important thing
is that the slabs come back to the partial list when an object in them
is deleted. The metadata in the corresponding page struct will allow
us to do that easily.

Per cpu caches exist in the sense that each processor has a per processor
"cpuslab". Objects in this "active" slab will only be allocated from this
processor. This naturally makes all allocations from a single processor
come from the same slab page which reduces fragmentation.
The page state is likely going to stay in the cache. Allocation will be
very fast since we only need the page struct reference for all our needs
which is likely not contended at all. Fetching the next free pointer from
the location of the object nicely prefetches the object.

The list_lock is used only in very rare cases. Let discuss one example
of multiple processors allocating from the same cache. The first thing that
happens when the slab is empty is that every processors gets its own slab
(the "active" slab). This does not require the list_lock because we
get a page from the page allocator and that immediately becomes the
active slab. Now all processors allocate from their slabs. This also
does not require any access to the partial lists so no list_lock is taken.

If a slab becomes full then each processor is simply forgetting about
the slab and gets a new one from the page allocator.

As long as all processors are just allocating no list_lock is needed at all.

If a free now happens then things get a bit more complicated. If the free
occurs on an active page then again no list_lock needs to be taken.
The slab lock may be contended since it is under current allocation by
a processor.

If the free occurs on a fully allocated page then we make a partially
allocated page from a full page. Now the list_lock will be taken and
the page is put on the partial list.

If further frees occur on a partially allocated page then also no
list_lock needs to be taken because it is still a partially allocated
page. This works until the page has no objects left. At that point
we take the page of the list of partial slabs to free it and that
requires the list_lock again.

IF a processors has filled up its active slab and needs a new one then
it will first check if there are partially allocated slabs available.
If so then it will take a partially allocated slab and begin to fill
it up. That also requires taking the list lock.

So the contention of the list_lock has been minimized. I have not tested
this on larger systems that 8p yet but I expect the contention to be
manageable.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc4-mm3/mm/slabifier.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/mm/slabifier.c	2006-08-26 18:27:04.730702744 -0700
@@ -0,0 +1,1022 @@
+/*
+ * Generic Slabifier for the allocato abstraction framework.
+ * The allocator synchronizes using slab based locks and only
+ * uses a centralized list lock to manage the pool of partial slabs.
+ *
+ * (C) 2006 Silicon Graphics Inc., Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/allocator.h>
+#include <linux/bit_spinlock.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#ifdef SLABIFIER_DEBUG
+#define	DBUG_ON(_x) BUG_ON(_x)
+#else
+#define DBUG_ON(_x)
+#endif
+
+struct slab {
+	struct slab_cache sc;
+#ifdef CONFIG_SMP
+	int flusher_active;
+	struct work_struct flush;
+#endif
+	atomic_t refcount;		/* Refcount for destroy */
+	atomic_long_t nr_slabs;		/* Total slabs used */
+	/* Performance critical items follow */
+	int size;			/* Total size of an object */
+	int offset;			/* Free pointer offset. */
+	int objects;			/* Number of objects in slab */
+	spinlock_t list_lock;
+	struct list_head partial;
+	unsigned long nr_partial;
+	struct page *active[NR_CPUS];
+};
+
+/*
+ * The page struct is used to keep necessary information about a slab.
+ * For a compound page the first page keeps the slab state.
+ *
+ * Overloaded fields in struct page:
+ *
+ * 	lru	 -> used to a slab on the lists
+ *	mapping	 -> pointer to struct slab
+ *	index	 -> pointer to next free object
+ *	_mapcount -> count number of elements in use
+ *
+ * Lock order:
+ *   1. slab_lock(page)
+ *   2. slab->list_lock
+ *
+ * The slabifier assigns one slab for allocation to each processor.
+ * Allocators only occur from these active slabs.
+ * If a cpu slab is active thena workqueue thread checks every 10
+ * seconds if the cpu slab is still in use. The cpu slab is pushed back
+ * to the list if inactive.
+ *
+ * Leftover slabs with free elements are kept on the partial list.
+ * There is no list for full slabs. If an object in a full slab is
+ * freed then the slab will show up again on the partial lists.
+ * Otherwise we have no way of tracking used slabs.
+ *
+ * Slabs are freed when they become empty. Teardown and setup is
+ * minimal so we rely on the page allocators per cpu caches for
+ * fast frees and allocations.
+ */
+
+#define lru_to_last_page(_head) (list_entry((_head)->next, struct page, lru))
+#define lru_to_first_page(_head) (list_entry((_head)->next, struct page, lru))
+
+/*
+ * Some definitions to overload fields in struct page
+ */
+static __always_inline void *get_object_pointer(struct page *page)
+{
+	return (void *)page->index;
+}
+
+static __always_inline void set_object_pointer(struct page *page,
+						void *object)
+{
+	page->index = (unsigned long)object;
+}
+
+static __always_inline struct slab *get_slab(struct page *page)
+{
+	return (struct slab *)page->mapping;
+}
+
+static __always_inline void set_slab(struct page *page, struct slab *s)
+{
+	page->mapping = (void *)s;
+}
+
+static __always_inline int *object_counter(struct page *page)
+{
+	return (int *)&page->_mapcount;
+}
+
+static __always_inline void inc_object_counter(struct page *page)
+{
+	(*object_counter(page))++;
+}
+
+static __always_inline void dec_object_counter(struct page *page)
+{
+	(*object_counter(page))--;
+}
+
+static __always_inline void set_object_counter(struct page *page,
+							int counter)
+{
+	(*object_counter(page))= counter;
+}
+
+static __always_inline int get_object_counter(struct page *page)
+{
+	return (*object_counter(page));
+}
+
+/*
+ * Locking for each individual slab using the pagelock
+ */
+static __always_inline void slab_lock(struct page *page)
+{
+	bit_spin_lock(PG_locked, &page->flags);
+}
+
+static __always_inline void slab_unlock(struct page *page)
+{
+	bit_spin_unlock(PG_locked, &page->flags);
+}
+
+/*
+ * Management of partially allocated slabs
+ */
+static void __always_inline add_partial(struct slab *s, struct page *page)
+{
+	spin_lock(&s->list_lock);
+	s->nr_partial++;
+	list_add_tail(&page->lru, &s->partial);
+	spin_unlock(&s->list_lock);
+}
+
+static void __always_inline remove_partial(struct slab *s,
+						struct page *page)
+{
+	spin_lock(&s->list_lock);
+	list_del(&page->lru);
+	s->nr_partial--;
+	spin_unlock(&s->list_lock);
+}
+
+/*
+ * Get a page and remove it from the partial list
+ * Must hold list_lock
+ */
+static __always_inline int lock_and_del_slab(struct slab *s,
+						struct page *page)
+{
+	if (bit_spin_trylock(PG_locked, &page->flags)) {
+		list_del(&page->lru);
+		s->nr_partial--;
+		return 1;
+	}
+	return 0;
+}
+
+struct page *numa_search(struct slab *s, int node)
+{
+#ifdef CONFIG_NUMA
+	struct list_head *h;
+	struct page *page;
+
+	/*
+	 * Search for slab on the right node
+	 */
+
+	if (node == -1)
+		node =  numa_node_id();
+
+	list_for_each(h, &s->partial) {
+		page = container_of(h, struct page, lru);
+
+		if (likely(page_to_nid(page) == node) &&
+			lock_and_del_slab(s, page))
+				return page;
+	}
+#endif
+	return NULL;
+}
+
+/*
+ * Get a partial page, lock it and return it.
+ */
+static struct page *get_partial(struct slab *s, int node)
+{
+	struct page *page;
+	struct list_head *h;
+
+	spin_lock(&s->list_lock);
+
+	page = numa_search(s, node);
+	if (page)
+		goto out;
+#ifdef CONFIG_NUMA
+	if (node >= 0)
+		goto fail;
+#endif
+
+	list_for_each(h, &s->partial) {
+		page = container_of(h, struct page, lru);
+
+		if (likely(lock_and_del_slab(s, page)))
+			goto out;
+	}
+fail:
+	page = NULL;
+out:
+	spin_unlock(&s->list_lock);
+	return page;
+}
+
+/*
+ * Debugging checks
+ */
+static void check_slab(struct page *page)
+{
+#ifdef SLABIFIER_DEBUG
+	if (!PageSlab(page)) {
+		printk(KERN_CRIT "Not a valid slab page @%p flags=%lx"
+			" mapping=%p count=%d \n",
+			page, page->flags, page->mapping, page_count(page));
+		BUG();
+	}
+#endif
+}
+
+static void check_active_slab(struct page *page)
+{
+#ifdef SLABIFIER_DEBUG
+	if (!PageActive(page)) {
+		printk(KERN_CRIT "Not an active slab page @%p flags=%lx"
+			" mapping=%p count=%d \n",
+			page, page->flags, page->mapping, page_count(page));
+		BUG();
+	}
+#endif
+}
+
+static int check_valid_pointer(struct slab *s, struct page *page,
+					 void *object, void *origin)
+{
+#ifdef SLABIFIER_DEBUG
+	void *base = page_address(page);
+
+	if (object < base || object >= base + s->objects * s->size) {
+		printk(KERN_CRIT "slab %s size %d: pointer %p->%p\nnot in"
+			" range (%p-%p) in page %p\n", s->sc.name, s->size,
+			origin, object, base, base + s->objects * s->size,
+			page);
+		return 0;
+	}
+
+	if ((object - base) % s->size) {
+		printk(KERN_CRIT "slab %s size %d: pointer %p->%p\n"
+			"does not properly point"
+			"to an object in page %p\n",
+			s->sc.name, s->size, origin, object, page);
+		return 0;
+	}
+#endif
+	return 1;
+}
+
+/*
+ * Determine if a certain object on a page is on the freelist and
+ * therefore free. Must hold the slab lock for active slabs to
+ * guarantee that the chains are consistent.
+ */
+static int on_freelist(struct slab *s, struct page *page, void *search)
+{
+	int nr = 0;
+	void **object = get_object_pointer(page);
+	void *origin = &page->lru;
+
+	if (s->objects == 1)
+		return 0;
+
+	check_slab(page);
+
+	while (object && nr <= s->objects) {
+		if (object == search)
+			return 1;
+		if (!check_valid_pointer(s, page, object, origin))
+			goto try_recover;
+		origin = object;
+		object = object[s->offset];
+		nr++;
+	}
+
+	if (get_object_counter(page) != s->objects - nr) {
+		printk(KERN_CRIT "slab %s: page %p wrong object count."
+			" counter is %d but counted were %d\n",
+			s->sc.name, page, get_object_counter(page),
+			s->objects - nr);
+try_recover:
+		printk(KERN_CRIT "****** Trying to continue by marking "
+			"all objects used (memory leak!)\n");
+		set_object_counter(page, s->objects);
+		set_object_pointer(page, NULL);
+	}
+	return 0;
+}
+
+void check_free_chain(struct slab *s, struct page *page)
+{
+#ifdef SLABIFIER_DEBUG
+	on_freelist(s, page, NULL);
+#endif
+}
+
+/*
+ * Operations on slabs
+ */
+static void discard_slab(struct slab *s, struct page *page)
+{
+	DBUG_ON(PageActive(page));
+	DBUG_ON(PageLocked(page));
+	atomic_long_dec(&s->nr_slabs);
+
+	/* Restore page state */
+	page->mapping = NULL;
+	reset_page_mapcount(page);
+	__ClearPageSlab(page);
+
+	s->sc.page_alloc->free(s->sc.page_alloc, page, s->sc.order);
+}
+
+/*
+ * Allocate a new slab and prepare an empty freelist and the basic struct
+ * page settings.
+ */
+static struct page *new_slab(struct slab *s, gfp_t flags, int node)
+{
+	struct page *page;
+
+	page = s->sc.page_alloc->allocate(s->sc.page_alloc, s->sc.order,
+			flags, node < 0 ? s->sc.node : node);
+	if (!page)
+		return NULL;
+
+	set_slab(page, s);
+	__SetPageSlab(page);
+	atomic_long_inc(&s->nr_slabs);
+	return page;
+}
+
+/*
+ * Move a page back to the lists.
+ *
+ * Must be called with the slab lock held.
+ * On exit the slab lock will have been dropped.
+ */
+static void __always_inline putback_slab(struct slab *s, struct page *page)
+{
+	int inuse;
+
+	inuse = get_object_counter(page);
+
+	if (inuse) {
+		if (inuse < s->objects)
+			add_partial(s, page);
+		slab_unlock(page);
+	} else {
+		slab_unlock(page);
+		discard_slab(s, page);
+	}
+}
+
+static void deactivate_slab(struct slab *s, struct page *page, int cpu)
+{
+	s->active[cpu] = NULL;
+	smp_wmb();
+	ClearPageActive(page);
+	ClearPageReferenced(page);
+
+	putback_slab(s, page);
+}
+
+/*
+ * Acquire the slab lock from the active array. If there is no active
+ * slab for this processor then return NULL;
+ */
+static __always_inline struct page *get_and_lock_active(struct slab *s,
+							 int cpu)
+{
+	struct page *page;
+
+redo:
+	page = s->active[cpu];
+	if (unlikely(!page))
+		return NULL;
+	slab_lock(page);
+	if (unlikely(s->active[cpu] != page)) {
+		slab_unlock(page);
+		goto redo;
+	}
+	check_active_slab(page);
+	check_free_chain(s, page);
+	return page;
+}
+
+/*
+ * Flush an active slab back to the lists.
+ */
+static void flush_active(struct slab *s, int cpu)
+{
+	struct page *page;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	page = get_and_lock_active(s, cpu);
+	if (likely(page))
+		deactivate_slab(s, page, cpu);
+	local_irq_restore(flags);
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Flush per cpu slabs if they are not in use.
+ */
+void flusher(void *d)
+{
+	struct slab *s = d;
+	int cpu = smp_processor_id();
+	struct page *page;
+	int nr_active = 0;
+
+	for_each_online_cpu(cpu) {
+
+		page = s->active[cpu];
+		if (!page)
+			continue;
+
+		if (PageReferenced(page)) {
+			ClearPageReferenced(page);
+			nr_active++;
+		} else
+			flush_active(s, cpu);
+	}
+	if (nr_active)
+		schedule_delayed_work(&s->flush, 10 * HZ);
+	else
+		s->flusher_active = 0;
+}
+
+static void drain_all(struct slab *s)
+{
+	int cpu;
+
+	if (s->flusher_active) {
+		cancel_delayed_work(&s->flush);
+		for_each_possible_cpu(cpu)
+			flush_active(s, cpu);
+		s->flusher_active = 0;
+	}
+}
+#else
+static void drain_all(struct slab *s)
+{
+	flush_active(s, 0);
+}
+#endif
+
+/*
+ * slab_create produces objects aligned at size and the first object
+ * is placed at offset 0 in the slab (We have no metainformation on the
+ * slab, all slabs are in essence off slab).
+ *
+ * In order to get the desired alignment one just needs to align the
+ * size.
+ *
+ * Notice that the allocation order determines the sizes of the per cpu
+ * caches. Each processor has always one slab available for allocations.
+ * Increasing the allocation order reduces the number of times that slabs
+ * must be moved on and off the partial lists and therefore may influence
+ * locking overhead.
+ *
+ * The offset is used to relocate the free list link in each object. It is
+ * therefore possible to move the free list link behind the object. This
+ * is necessary for RCU to work properly and also useful for debugging.
+ * However no freelists are necessary if there is only one element per
+ * slab.
+ */
+static struct slab_cache *slab_create(struct slab_control *x,
+	const struct slab_cache *sc)
+{
+	struct slab *s = (void *)x;
+	int cpu;
+
+	/* Verify that the generic structure is big enough for our data */
+	BUG_ON(sizeof(struct slab_control) < sizeof(struct slab));
+
+	memcpy(&x->sc, sc, sizeof(struct slab_cache));
+
+	s->size = ALIGN(sc->size, sizeof(void *));
+
+	if (sc->offset > s->size - sizeof(void *) ||
+			(sc->offset % sizeof(void*)))
+		return NULL;
+
+	s->offset = sc->offset / sizeof(void *);
+	s->objects = (PAGE_SIZE << sc->order) / s->size;
+	atomic_long_set(&s->nr_slabs, 0);
+	s->nr_partial = 0;
+#ifdef CONFIG_SMP
+	s->flusher_active = 0;
+	INIT_WORK(&s->flush, &flusher, s);
+#endif
+	if (!s->objects)
+		return NULL;
+
+	INIT_LIST_HEAD(&s->partial);
+
+	atomic_set(&s->refcount, 1);
+	spin_lock_init(&s->list_lock);
+
+	for_each_possible_cpu(cpu)
+		s->active[cpu] = NULL;
+	return &s->sc;
+}
+
+/*
+ * Reload the per cpu slab
+ *
+ * If we have reloaded successfully then we exit with holding the slab lock
+ * and return the pointer to the new page.
+ *
+ * Return NULL if we cannot reload.
+ */
+static struct page *reload(struct slab *s, unsigned long cpu, gfp_t flags,
+							int node)
+{
+	void *p, *start, *end;
+	void **last;
+	struct page *page;
+
+redo:
+	/* Racy check. If we mistakenly see no partial slabs then we just
+	 * expand the partial list. If we mistakenly try to get a partial
+	 * slab then get_partials will return NULL.
+	 */
+	if (s->nr_partial) {
+		page = get_partial(s, node);
+		if (page)
+			goto gotpage;
+	}
+
+	if ((flags & __GFP_WAIT)) {
+		local_irq_enable();
+		page = new_slab(s, flags, node);
+		local_irq_disable();
+	} else
+		page = new_slab(s, flags, node);
+
+	if (!page)
+		return NULL;
+
+	start = page_address(page);
+	set_object_pointer(page, start);
+
+	end = start + s->objects * s->size;
+	last = start;
+	for (p = start +  s->size; p < end; p += s->size) {
+		last[s->offset] = p;
+		last = p;
+	}
+	last[s->offset] = NULL;
+	set_object_counter(page, 0);
+	slab_lock(page);
+	check_free_chain(s, page);
+
+gotpage:
+	/*
+	 * Now we have a page that is isolated from the lists and locked,
+	 */
+	SetPageActive(page);
+	ClearPageReferenced(page);
+
+	if (cmpxchg(&s->active[cpu], NULL, page) != NULL) {
+
+		ClearPageActive(page);
+		add_partial(s, page);
+		slab_unlock(page);
+
+		page = get_and_lock_active(s, cpu);
+		if (page)
+			return page;
+		goto redo;
+	}
+
+	check_free_chain(s, page);
+
+#ifdef CONFIG_SMP
+	if (keventd_up() && !s->flusher_active) {
+		s->flusher_active = 1;
+		schedule_delayed_work(&s->flush, 10 * HZ);
+	}
+#endif
+
+	return page;
+}
+
+static __always_inline void *__slab_alloc(struct slab_cache *sc,
+					gfp_t gfpflags, int node)
+{
+	struct slab *s = (void *)sc;
+	struct page *page;
+	void **object;
+	void *next_object;
+	unsigned long flags;
+	int cpu;
+
+	if (unlikely(s->objects == 1)) {
+		struct page *page = new_slab(s, gfpflags, node);
+
+		if (page)
+			return page_address(page);
+		else
+			return NULL;
+	}
+
+	local_irq_save(flags);
+	cpu = smp_processor_id();
+	page = get_and_lock_active(s, cpu);
+	if (unlikely(!page))
+		goto load;
+
+	while (unlikely(!get_object_pointer(page) ||
+		(node > 0 && page_to_nid(page) != node))) {
+
+		deactivate_slab(s, page, cpu);
+load:
+		page = reload(s, cpu, gfpflags, node);
+		if (unlikely(!page)) {
+			local_irq_restore(flags);
+			return NULL;
+		}
+	}
+
+	inc_object_counter(page);
+	object = get_object_pointer(page);
+	next_object = object[s->offset];
+	set_object_pointer(page, next_object);
+	check_free_chain(s, page);
+	SetPageReferenced(page);
+	slab_unlock(page);
+	local_irq_restore(flags);
+	return object;
+}
+
+static void *slab_alloc(struct slab_cache *sc, gfp_t gfpflags)
+{
+	return __slab_alloc(sc, gfpflags, -1);
+}
+
+static void *slab_alloc_node(struct slab_cache *sc, gfp_t gfpflags,
+							int node)
+{
+	return __slab_alloc(sc, gfpflags, node);
+}
+
+/* Figure out on which slab object the object resides */
+static __always_inline struct page *get_object_page(const void *x)
+{
+	struct page * page = virt_to_page(x);
+
+	if (unlikely(PageCompound(page)))
+		page = (struct page *)page_private(page);
+
+	if (!PageSlab(page))
+		return NULL;
+
+	return page;
+}
+
+static void slab_free(struct slab_cache *sc, const void *x)
+{
+	struct slab *s = (void *)sc;
+	struct page * page;
+	void *prior;
+	void **object = (void *)x;
+	unsigned long flags;
+
+	if (!object)
+		return;
+
+	page = get_object_page(object);
+	if (unlikely(!page)) {
+		printk(KERN_CRIT "slab_free %s size %d: attempt to free object"
+			"(%p) outside of slab.\n", s->sc.name, s->size, object);
+		goto dumpret;
+	}
+
+	if (!s) {
+		s = get_slab(page);
+
+		if (unlikely(!s)) {
+			printk(KERN_CRIT
+				"slab_free : no slab(NULL) for object %p.\n",
+						object);
+			goto dumpret;
+		}
+	} else
+	if (unlikely(s != get_slab(page))) {
+		printk(KERN_CRIT "slab_free %s: object at %p"
+				" belongs to slab %p\n",
+				s->sc.name, object, get_slab(page));
+		dump_stack();
+		s = get_slab(page);
+	}
+
+	if (unlikely(!check_valid_pointer(s, page, object, NULL))) {
+dumpret:
+		dump_stack();
+		printk(KERN_CRIT "***** Trying to continue by not"
+				"freeing object.\n");
+		return;
+	}
+
+	if (unlikely(s->objects == 1)) {
+		discard_slab(s, page);
+		return;
+	}
+
+	local_irq_save(flags);
+	slab_lock(page);
+
+#ifdef SLABIFIER_DEBUG
+	if (on_freelist(s, page, object)) {
+		printk(KERN_CRIT "slab_free %s: object %p already free.\n",
+						s->sc.name, object);
+		dump_stack();
+		goto out_unlock;
+	}
+#endif
+
+	prior = get_object_pointer(page);
+	object[s->offset] = prior;
+
+	set_object_pointer(page, object);
+	dec_object_counter(page);
+
+	if (unlikely(PageActive(page)))
+		goto out_unlock;
+
+	if (unlikely(get_object_counter(page) == 0)) {
+		if (s->objects > 1)
+			remove_partial(s, page);
+		check_free_chain(s, page);
+		slab_unlock(page);
+		discard_slab(s, page);
+		goto out;
+	}
+
+	if (unlikely(!prior))
+		/*
+		 * Page was fully used before. It will only have one free
+		 * object now. So move to the partial list.
+		 */
+		add_partial(s, page);
+
+out_unlock:
+	slab_unlock(page);
+out:
+	local_irq_restore(flags);
+}
+
+/*
+ * Check if a given pointer is valid
+ */
+static int slab_pointer_valid(struct slab_cache *sc, const void *object)
+{
+	struct slab *s = (void *)sc;
+	struct page * page;
+	void *addr;
+
+	page = get_object_page(object);
+
+	if (!page || s != get_slab(page))
+		return 0;
+
+	addr = page_address(page);
+	if (object < addr || object >= addr + s->objects * s->size)
+		return 0;
+
+	if ((object - addr) & s->size)
+		return 0;
+
+	return 1;
+}
+
+/*
+ * Determine the size of a slab object
+ */
+static unsigned long slab_object_size(struct slab_cache *sc,
+						const void *object)
+{
+	struct page *page;
+	struct slab *s;
+
+	page = get_object_page(object);
+	if (page) {
+		s = get_slab(page);
+		BUG_ON(sc && s != (void *)sc);
+		if (s)
+			return s->size;
+	}
+	BUG();
+	return 0;	/* Satisfy compiler */
+}
+
+/*
+ * Move slab objects in a given slab by calling the move_objects function.
+ *
+ * Must be called with the slab lock held but will drop and reacquire the
+ * slab lock.
+ */
+static int move_slab_objects(struct slab *s, struct page *page,
+			 int (*move_objects)(struct slab_cache *, void *))
+{
+	int unfreeable = 0;
+	void *addr = page_address(page);
+
+	while (get_object_counter(page) - unfreeable > 0) {
+		void *p;
+
+		for (p = addr; p < addr + s->objects; p+= s->size) {
+			if (!on_freelist(s, page, p)) {
+				/*
+				 * Drop the lock here to allow the
+				 * move_object function to do things
+				 * with the slab_cache and maybe this
+				 * page.
+				 */
+				slab_unlock(page);
+				local_irq_enable();
+				if (move_objects((struct slab_cache *)s, p))
+					slab_free(&s->sc, p);
+				else
+					unfreeable++;
+				local_irq_disable();
+				slab_lock(page);
+			}
+		}
+	}
+	return unfreeable;
+}
+
+/*
+ * Shrinking drops all the active per cpu slabs and also reaps all empty
+ * slabs off the partial list. Returns the number of slabs freed.
+ *
+ * If a move_object function is specified then the partial list is going
+ * to be compacted by calling the function on all slabs.
+ * The move_object function will be called for each objects in partially
+ * allocated slabs. move_object() needs to perform a new allocation for
+ * the object and move the contents of the object to the new location.
+ * If move_object() returns 1 for success then the object is going to be
+ * removed. If 0 then the object cannot be freed at all. As a result the
+ * slab containing the object will also not be freeable.
+ *
+ * Returns the number of slabs freed.
+ */
+static int slab_shrink(struct slab_cache *sc,
+			int (*move_object)(struct slab_cache *, void *))
+{
+	struct slab *s = (void *)sc;
+	unsigned long flags;
+	int slabs_freed = 0;
+	int i;
+
+	drain_all(s);
+
+	local_irq_save(flags);
+	for(i = 0; s->nr_partial > 1 && i < s->nr_partial - 1; i++ ) {
+		struct page * page;
+
+		page = get_partial(s, -1);
+		if (!page)
+			break;
+
+		/*
+		 * Pin page so that slab_free will not free even if we
+		 * drop the slab lock.
+		 */
+		SetPageActive(page);
+
+		if (get_object_counter(page) < s->objects && move_object)
+			if (move_slab_objects(s,
+					page, move_object) == 0)
+				slabs_freed++;
+
+		/*
+		 * This will put the slab on the front of the partial
+		 * list, the used list or free it.
+		 */
+		putback_slab(s, page);
+	}
+	local_irq_restore(flags);
+
+	return slabs_freed;
+
+}
+
+static struct slab_cache *slab_dup(struct slab_cache *sc)
+{
+	struct slab *s = (void *)sc;
+
+	atomic_inc(&s->refcount);
+	return &s->sc;
+}
+
+static void free_list(struct slab *s, struct list_head *list)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&s->list_lock, flags);
+	while (!list_empty(list))
+		discard_slab(s, lru_to_last_page(list));
+
+	spin_unlock_irqrestore(&s->list_lock, flags);
+}
+
+static int slab_destroy(struct slab_cache *sc)
+{
+	struct slab * s = (void *)sc;
+
+	if (!atomic_dec_and_test(&s->refcount))
+		return 0;
+
+	drain_all(s);
+	free_list(s, &s->partial);
+
+	if (atomic_long_read(&s->nr_slabs))
+		return 1;
+
+	/* Just to make sure that no one uses this again */
+	s->size = 0;
+	return 0;
+
+}
+
+static unsigned long count_objects(struct slab *s, struct list_head *list)
+{
+	int count = 0;
+	struct list_head *h;
+	unsigned long flags;
+
+	spin_lock_irqsave(&s->list_lock, flags);
+	list_for_each(h, list) {
+		struct page *page = lru_to_first_page(h);
+
+		count += get_object_counter(page);
+	}
+	spin_unlock_irqrestore(&s->list_lock, flags);
+	return count;
+}
+
+static unsigned long slab_objects(struct slab_cache *sc,
+	unsigned long *p_total, unsigned long *p_active,
+	unsigned long *p_partial)
+{
+	struct slab *s = (void *)sc;
+	int partial;
+	int active = 0;		/* Active slabs */
+	int nr_active = 0;	/* Objects in active slabs */
+	int cpu;
+	int nr_slabs = atomic_read(&s->nr_slabs);
+
+	for_each_possible_cpu(cpu) {
+		struct page *page = s->active[cpu];
+
+		if (s->active[cpu]) {
+			nr_active++;
+			active += get_object_counter(page);
+		}
+	}
+
+	partial = count_objects(s, &s->partial);
+
+	if (p_partial)
+		*p_partial = s->nr_partial;
+
+	if (p_active)
+		*p_active = nr_active;
+
+	if (p_total)
+		*p_total = nr_slabs;
+
+	return partial + active +
+		(nr_slabs - s->nr_partial - nr_active) * s->objects;
+}
+
+const struct slab_allocator slabifier_allocator = {
+	.name = "Slabifier",
+	.create = slab_create,
+	.alloc = slab_alloc,
+	.alloc_node = slab_alloc_node,
+	.free = slab_free,
+	.valid_pointer = slab_pointer_valid,
+	.object_size = slab_object_size,
+	.get_objects = slab_objects,
+	.shrink = slab_shrink,
+	.dup = slab_dup,
+	.destroy = slab_destroy,
+	.destructor = null_slab_allocator_destructor,
+};
+EXPORT_SYMBOL(slabifier_allocator);
Index: linux-2.6.18-rc4-mm3/mm/Makefile
===================================================================
--- linux-2.6.18-rc4-mm3.orig/mm/Makefile	2006-08-26 16:38:18.581301135 -0700
+++ linux-2.6.18-rc4-mm3/mm/Makefile	2006-08-26 18:26:28.542509970 -0700
@@ -25,4 +25,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [MODSLAB 2.5/4] A slab statistics module
  2006-08-27  2:32 ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
@ 2006-08-27  7:18   ` Christoph Lameter
  2006-08-28  5:33   ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-27  7:18 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen, mpm,
	Manfred Spraul, Dave Chinner

I missed a patch. Insert this patch into the patchset after the slabifier 
patch

Generic Slab statistics module

A statistic module for generic slab allocator framework.

The creator of a cache must register the slab cache with

register_slab()

in order for something to show up in slabinfo.

Here is a sample of slabinfo output:

slabinfo - version: 3.0
# name            <objects> <objsize> <num_slabs> <partial_slabs> <active_slabs> <order> <allocator>
nfs_direct_cache           0     136       0       0       0  0 reclaimable:page_allocator
nfs_write_data            36     896       2       0       0  0 unreclaimable:page_allocator
nfs_read_data             21     768       2       1       0  0 unreclaimable:page_allocator
nfs_inode_cache           17    1032       3       3       0  0 ctor_dtor:reclaimable:page_allocator
rpc_tasks                  0     384       1       1       0  0 unreclaimable:page_allocator
rpc_inode_cache            0     896       1       1       0  0 ctor_dtor:reclaimable:page_allocator
ip6_dst_cache              0     384       1       1       0  0 unreclaimable:page_allocator
TCPv6                      1    1792       2       2       0  0 unreclaimable:page_allocator
UNIX                     112     768      11       7       0  0 unreclaimable:page_allocator
dm_tio                     0      24       0       0       0  0 unreclaimable:page_allocator
dm_io                      0      40       0       0       0  0 unreclaimable:page_allocator
kmalloc                    0      64       0       0       0  0 dma:unreclaimable:page_allocator
cfq_ioc_pool               0     160       0       0       0  0 unreclaimable:page_allocator
cfq_pool                   0     160       0       0       0  0 unreclaimable:page_allocator
mqueue_inode_cache         0     896       1       1       0  0 ctor_dtor:unreclaimable:page_allocator
xfs_chashlist            822      40       9       9       0  0 unreclaimable:page_allocator
xfs_ili                  183     192       9       8       1  0 unreclaimable:page_allocator
xfs_inode               7890     640     319       6       1  0 reclaimable:page_allocator
xfs_efi_item               0     352       0       0       0  0 unreclaimable:page_allocator
xfs_efd_item               0     360       0       0       0  0 unreclaimable:page_allocator
xfs_buf_item               4     184       1       0       1  0 unreclaimable:page_allocator
xfs_acl                    0     304       0       0       0  0 unreclaimable:page_allocator
xfs_dabuf                  0      24       1       0       1  0 unreclaimable:page_allocator
xfs_da_state               0     488       0       0       0  0 unreclaimable:page_allocator
xfs_trans                  1     832       1       0       1  0 unreclaimable:page_allocator
xfs_btree_cur              0     192       1       0       1  0 unreclaimable:page_allocator
xfs_bmap_free_item         0      24       0       0       0  0 unreclaimable:page_allocator
xfs_ioend                128     160       2       0       1  0 unreclaimable:page_allocator
xfs_vnode               7891     768     380       5       1  0 ctor_dtor:reclaimable:page_allocator
isofs_inode_cache          0     656       0       0       0  0 ctor_dtor:reclaimable:page_allocator
fat_inode_cache            0     688       1       1       0  0 ctor_dtor:reclaimable:page_allocator
fat_cache                  0      40       0       0       0  0 ctor_dtor:reclaimable:page_allocator
hugetlbfs_inode_cache      0     624       1       1       0  0 ctor_dtor:unreclaimable:page_allocator
ext2_inode_cache           0     776       0       0       0  0 ctor_dtor:reclaimable:page_allocator
ext2_xattr                 0      88       0       0       0  0 reclaimable:page_allocator
journal_handle             0      24       0       0       0  0 unreclaimable:page_allocator
journal_head               0      96       0       0       0  0 unreclaimable:page_allocator
ext3_inode_cache           0     824       0       0       0  0 ctor_dtor:reclaimable:page_allocator
ext3_xattr                 0      88       0       0       0  0 reclaimable:page_allocator
reiser_inode_cache         0     736       0       0       0  0 ctor_dtor:reclaimable:page_allocator
dnotify_cache              0      40       0       0       0  0 unreclaimable:page_allocator
dquot                      0     256       0       0       0  0 reclaimable:page_allocator
eventpoll_pwq              0      72       1       1       0  0 unreclaimable:page_allocator
inotify_event_cache        0      40       0       0       0  0 unreclaimable:page_allocator
inotify_watch_cache        0      72       1       1       0  0 unreclaimable:page_allocator
kioctx                     0     384       0       0       0  0 unreclaimable:page_allocator
fasync_cache               0      24       0       0       0  0 unreclaimable:page_allocator
shmem_inode_cache        794     816      45      12       0  0 ctor_dtor:unreclaimable:page_allocator
posix_timers_cache         0     136       0       0       0  0 unreclaimable:page_allocator
partial_page_cache         0      48       0       0       0  0 unreclaimable:page_allocator
xfrm_dst_cache             0     384       0       0       0  0 unreclaimable:page_allocator
ip_dst_cache              21     384       2       2       0  0 unreclaimable:page_allocator
RAW                        0     896       1       1       0  0 unreclaimable:page_allocator
UDP                        3     896       3       2       1  0 unreclaimable:page_allocator
TCP                       12    1664       4       4       0  0 unreclaimable:page_allocator
scsi_io_context            0     112       0       0       0  0 unreclaimable:page_allocator
blkdev_ioc                26      56       7       7       0  0 unreclaimable:page_allocator
blkdev_queue              24    1616       4       2       0  0 unreclaimable:page_allocator
blkdev_requests           12     280       2       0       2  0 unreclaimable:page_allocator
sock_inode_cache         167     768      12       6       1  0 ctor_dtor:reclaimable:page_allocator
file_lock_cache            1     184       2       2       0  0 ctor_dtor:unreclaimable:page_allocator
Acpi-Parse                 0      40       0       0       0  0 unreclaimable:page_allocator
Acpi-State                 0      80       0       0       0  0 unreclaimable:page_allocator
proc_inode_cache         696     640      36      16       1  0 ctor_dtor:reclaimable:page_allocator
sigqueue                   0     160       4       0       4  0 unreclaimable:page_allocator
radix_tree_node         2068     560      75       5       0  0 ctor_dtor:unreclaimable:page_allocator
bdev_cache                42     896       5       4       0  0 ctor_dtor:reclaimable:page_allocator
sysfs_dir_cache         4283      80      24       4       0  0 unreclaimable:page_allocator
inode_cache             2571     608     103       8       1  0 ctor_dtor:reclaimable:page_allocator
dentry_cache           13014     200     166       7       3  0 reclaimable:page_allocator
idr_layer_cache           76     536       4       2       0  0 ctor_dtor:unreclaimable:page_allocator
buffer_head             4417     104      33       9       0  0 ctor_dtor:reclaimable:page_allocator
vm_area_struct          1503     176      24      19       3  0 unreclaimable:page_allocator
files_cache               47     768       7       6       1  0 unreclaimable:page_allocator
signal_cache             136     640      10       6       1  0 unreclaimable:page_allocator
sighand_cache            136    1664      19       6       1  0 ctor_dtor:rcu:unreclaimable:page_allocator
anon_vma                 264      32       9       8       1  0 ctor_dtor:rcu:unreclaimable:page_allocator
shared_policy_node         0      48       0       0       0  0 unreclaimable:page_allocator
numa_policy               85     264       4       3       0  0 unreclaimable:page_allocator
kmalloc                    0  262144       0       0       0  4 unreclaimable:page_allocator
kmalloc                    2  131072       2       0       0  3 unreclaimable:page_allocator
kmalloc                    1   65536       1       0       0  2 unreclaimable:page_allocator
kmalloc                   10   32768      10       0       0  1 unreclaimable:page_allocator
kmalloc                   93   16384      93       0       0  0 unreclaimable:page_allocator
kmalloc                   98    8192      49       0       0  0 unreclaimable:page_allocator
kmalloc                   99    4096      31       8       4  0 unreclaimable:page_allocator
kmalloc                  345    2048      47      14       2  0 unreclaimable:page_allocator
kmalloc                  228    1024      21      12       2  0 unreclaimable:page_allocator
kmalloc                  183     512      14       9       3  0 unreclaimable:page_allocator
kmalloc                 3892     256      78      31       3  0 unreclaimable:page_allocator
kmalloc                 1244     128      18       9       4  0 unreclaimable:page_allocator
kmalloc                 1619      64      12       8       1  0 unreclaimable:page_allocator
kmalloc                  121      32       8       5       3  0 unreclaimable:page_allocator
kmalloc                 1644      16       5       4       0  0 unreclaimable:page_allocator
kmalloc                  128       8       4       4       0  0 unreclaimable:page_allocator

Signed-off-by: Christoph Lameter <clameter@sgi.com>


Index: linux-2.6.18-rc4-mm3/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.18-rc4-mm3.orig/fs/proc/proc_misc.c	2006-08-26 16:38:02.961172819 -0700
+++ linux-2.6.18-rc4-mm3/fs/proc/proc_misc.c	2006-08-26 16:38:21.425851884 -0700
@@ -394,9 +394,11 @@ static struct file_operations proc_modul
 };
 #endif
 
-#ifdef CONFIG_SLAB
+#if defined(CONFIG_SLAB) || defined(CONFIG_MODULAR_SLAB)
 extern struct seq_operations slabinfo_op;
+#ifdef CONFIG_SLAB
 extern ssize_t slabinfo_write(struct file *, const char __user *, size_t, loff_t *);
+#endif
 static int slabinfo_open(struct inode *inode, struct file *file)
 {
 	return seq_open(file, &slabinfo_op);
@@ -404,12 +406,14 @@ static int slabinfo_open(struct inode *i
 static struct file_operations proc_slabinfo_operations = {
 	.open		= slabinfo_open,
 	.read		= seq_read,
+#ifdef CONFIG_SLAB
 	.write		= slabinfo_write,
+#endif
 	.llseek		= seq_lseek,
 	.release	= seq_release,
 };
 
-#ifdef CONFIG_DEBUG_SLAB_LEAK
+#if defined(CONFIG_DEBUG_SLAB_LEAK) && defined(CONFIG_SLAB)
 extern struct seq_operations slabstats_op;
 static int slabstats_open(struct inode *inode, struct file *file)
 {
@@ -780,9 +784,9 @@ void __init proc_misc_init(void)
 	create_seq_entry("partitions", 0, &proc_partitions_operations);
 	create_seq_entry("stat", 0, &proc_stat_operations);
 	create_seq_entry("interrupts", 0, &proc_interrupts_operations);
-#ifdef CONFIG_SLAB
+#if defined(CONFIG_SLAB) || defined(CONFIG_MODULAR_SLAB)
 	create_seq_entry("slabinfo",S_IWUSR|S_IRUGO,&proc_slabinfo_operations);
-#ifdef CONFIG_DEBUG_SLAB_LEAK
+#if defined(CONFIG_DEBUG_SLAB_LEAK) && defined(CONFIG_SLAB)
 	create_seq_entry("slab_allocators", 0 ,&proc_slabstats_operations);
 #endif
 #endif
Index: linux-2.6.18-rc4-mm3/mm/Makefile
===================================================================
--- linux-2.6.18-rc4-mm3.orig/mm/Makefile	2006-08-26 16:38:20.822373558 -0700
+++ linux-2.6.18-rc4-mm3/mm/Makefile	2006-08-26 16:38:21.426828386 -0700
@@ -25,4 +25,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o
Index: linux-2.6.18-rc4-mm3/mm/slabstat.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/mm/slabstat.c	2006-08-26 18:24:19.659851704 -0700
@@ -0,0 +1,96 @@
+/*
+ * linux/mm/slabstat.c
+ */
+
+#include <linux/mm.h>
+#include <linux/seq_file.h>
+
+static DECLARE_RWSEM(slabstat_sem);
+
+LIST_HEAD(slab_caches);
+
+void register_slab(struct slab_cache *s)
+{
+	down_write(&slabstat_sem);
+	list_add(&s->list, &slab_caches);
+	up_write(&slabstat_sem);
+}
+
+void unregister_slab(struct slab_cache *s)
+{
+	down_write(&slabstat_sem);
+	list_add(&s->list, &slab_caches);
+	up_write(&slabstat_sem);
+}
+
+static void print_slabinfo_header(struct seq_file *m)
+{
+	/*
+	 * Output format version, so at least we can change it
+	 * without _too_ many complaints.
+	 */
+	seq_puts(m, "slabinfo - version: 3.0\n");
+	seq_puts(m, "# name            <objects> <objsize> <num_slabs> "
+		"<partial_slabs> <active_slabs> <order> <allocator>");
+	seq_putc(m, '\n');
+}
+
+static void *s_start(struct seq_file *m, loff_t *pos)
+{
+	loff_t n = *pos;
+	struct list_head *p;
+
+	down_read(&slabstat_sem);
+	if (!n)
+		print_slabinfo_header(m);
+	p = slab_caches.next;
+	while (n--) {
+		p = p->next;
+		if (p == &slab_caches)
+			return NULL;
+	}
+	return list_entry(p, struct slab_cache, list);
+}
+
+static void *s_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct slab_cache *s = p;
+	++*pos;
+	return s->list.next == &slab_caches ?
+		NULL : list_entry(s->list.next, struct slab_cache, list);
+}
+
+static void s_stop(struct seq_file *m, void *p)
+{
+	up_read(&slabstat_sem);
+}
+
+static int s_show(struct seq_file *m, void *p)
+{
+	struct slab_cache *s = p;
+	unsigned long total_slabs;
+	unsigned long active_slabs;
+	unsigned long partial_slabs;
+	unsigned long objects;
+
+	objects = s->slab_alloc->get_objects(s, &total_slabs,
+					&active_slabs, &partial_slabs);
+
+	seq_printf(m, "%-21s %7lu %7u %7lu %7lu %7lu %2d %s",
+		   s->name, objects, s->size, total_slabs, partial_slabs,
+		   active_slabs, s->order, s->page_alloc->name);
+
+	seq_putc(m, '\n');
+	return 0;
+}
+
+/*
+ * slabinfo_op - iterator that generates /proc/slabinfo
+ */
+struct seq_operations slabinfo_op = {
+	.start = s_start,
+	.next = s_next,
+	.stop = s_stop,
+	.show = s_show,
+};
+
Index: linux-2.6.18-rc4-mm3/include/linux/slabstat.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/include/linux/slabstat.h	2006-08-26 16:38:21.427804888 -0700
@@ -0,0 +1,9 @@
+#ifndef _LINUX_SLABSTAT_H
+#define _LINUX_SLABSTAT_H
+#include <linux/allocator.h>
+
+void register_slab(struct slab_cache *s);
+void unregister_slab(struct slab_cache *s);
+
+#endif /* _LINUX_SLABSTAT_H */
+

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [MODSLAB 2/4] A slab allocator: SLABIFIER
  2006-08-27  2:32 ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
  2006-08-27  7:18   ` [MODSLAB 2.5/4] A slab statistics module Christoph Lameter
@ 2006-08-28  5:33   ` Christoph Lameter
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-28  5:33 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen, mpm,
	Manfred Spraul, Dave Chinner

Some fixups. Clean up #ifdefs and use the right list function to go 
through the slabs:

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc4-mm3/mm/slabifier.c
===================================================================
--- linux-2.6.18-rc4-mm3.orig/mm/slabifier.c	2006-08-26 19:10:49.594764694 -0700
+++ linux-2.6.18-rc4-mm3/mm/slabifier.c	2006-08-27 22:31:24.188711553 -0700
@@ -112,12 +112,12 @@ static __always_inline void dec_object_c
 static __always_inline void set_object_counter(struct page *page,
 							int counter)
 {
-	(*object_counter(page))= counter;
+	*object_counter(page) = counter;
 }
 
 static __always_inline int get_object_counter(struct page *page)
 {
-	return (*object_counter(page));
+	return *object_counter(page);
 }
 
 /*
@@ -168,60 +168,58 @@ static __always_inline int lock_and_del_
 	return 0;
 }
 
-struct page *numa_search(struct slab *s, int node)
-{
+/*
+ * Get a partial page, lock it and return it.
+ */
 #ifdef CONFIG_NUMA
-	struct list_head *h;
+static struct page *get_partial(struct slab *s, int node)
+{
 	struct page *page;
+	int searchnode = (node == -1) ? numa_node_id() : node;
 
+	spin_lock(&s->list_lock);
 	/*
 	 * Search for slab on the right node
 	 */
-
-	if (node == -1)
-		node =  numa_node_id();
-
-	list_for_each(h, &s->partial) {
-		page = container_of(h, struct page, lru);
-
-		if (likely(page_to_nid(page) == node) &&
+	list_for_each_entry(page, &s->partial, lru)
+		if (likely(page_to_nid(page) == searchnode) &&
 			lock_and_del_slab(s, page))
-				return page;
+				goto out;
+
+	if (likely(node == -1)) {
+		/*
+		 * We can fall back to any other node in order to
+		 * reduce the size of the partial list.
+		 */
+		list_for_each_entry(page, &s->partial, lru)
+			if (likely(lock_and_del_slab(s, page)))
+				goto out;
 	}
-#endif
-	return NULL;
-}
 
-/*
- * Get a partial page, lock it and return it.
- */
+	/* Nothing found */
+	page = NULL;
+out:
+	spin_unlock(&s->list_lock);
+	return page;
+}
+#else
 static struct page *get_partial(struct slab *s, int node)
 {
 	struct page *page;
-	struct list_head *h;
 
 	spin_lock(&s->list_lock);
-
-	page = numa_search(s, node);
-	if (page)
-		goto out;
-#ifdef CONFIG_NUMA
-	if (node >= 0)
-		goto fail;
-#endif
-
-	list_for_each(h, &s->partial) {
-		page = container_of(h, struct page, lru);
-
+	list_for_each_entry(page, &s->partial, lru)
 		if (likely(lock_and_del_slab(s, page)))
 			goto out;
-	}
-fail:
+
+	/* No slab or all slabs busy */
 	page = NULL;
 out:
 	spin_unlock(&s->list_lock);
 	return page;
 }
+#endif
+
 
 /*
  * Debugging checks
@@ -758,8 +756,7 @@ dumpret:
 		goto out_unlock;
 
 	if (unlikely(get_object_counter(page) == 0)) {
-		if (s->objects > 1)
-			remove_partial(s, page);
+		remove_partial(s, page);
 		check_free_chain(s, page);
 		slab_unlock(page);
 		discard_slab(s, page);
@@ -908,6 +905,7 @@ static int slab_shrink(struct slab_cache
 		 * This will put the slab on the front of the partial
 		 * list, the used list or free it.
 		 */
+		ClearPageActive(page);
 		putback_slab(s, page);
 	}
 	local_irq_restore(flags);
@@ -957,15 +955,12 @@ static int slab_destroy(struct slab_cach
 static unsigned long count_objects(struct slab *s, struct list_head *list)
 {
 	int count = 0;
-	struct list_head *h;
+	struct page *page;
 	unsigned long flags;
 
 	spin_lock_irqsave(&s->list_lock, flags);
-	list_for_each(h, list) {
-		struct page *page = lru_to_first_page(h);
-
+	list_for_each_entry(page, list, lru)
 		count += get_object_counter(page);
-	}
 	spin_unlock_irqrestore(&s->list_lock, flags);
 	return count;
 }
@@ -975,23 +970,21 @@ static unsigned long slab_objects(struct
 	unsigned long *p_partial)
 {
 	struct slab *s = (void *)sc;
-	int partial;
+	int partial = count_objects(s, &s->partial);
+	int nr_slabs = atomic_read(&s->nr_slabs);
 	int active = 0;		/* Active slabs */
 	int nr_active = 0;	/* Objects in active slabs */
 	int cpu;
-	int nr_slabs = atomic_read(&s->nr_slabs);
 
 	for_each_possible_cpu(cpu) {
 		struct page *page = s->active[cpu];
 
-		if (s->active[cpu]) {
+		if (page) {
 			nr_active++;
 			active += get_object_counter(page);
 		}
 	}
 
-	partial = count_objects(s, &s->partial);
-
 	if (p_partial)
 		*p_partial = s->nr_partial;
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [MODSLAB 3/4] A Kmalloc subsystem
  2006-08-27  2:32 [MODSLAB 0/4] A modular slab allocator V2 Christoph Lameter
  2006-08-27  2:32 ` [MODSLAB 1/4] Generic Allocator Framework Christoph Lameter
  2006-08-27  2:32 ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
@ 2006-08-27  2:33 ` Christoph Lameter
  2006-08-27  2:33 ` [MODSLAB 4/4] Slabulator: Emulate the existing Slab Layer Christoph Lameter
  3 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-27  2:33 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen,
	Christoph Lameter, mpm, Dave Chinner, Manfred Spraul

A generic kmalloc layer for the modular slab

Regular kmalloc allocations are optimized. DMA kmalloc slabs are
created on demand.

Also re exports the kmalloc array as a new slab_allocator that
can be used to tie into the kmalloc array (the slabulator
uses that to avoid creating new slabs that are compatible
with generic kmalloc caches).

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc4-mm3/include/linux/kmalloc.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/include/linux/kmalloc.h	2006-08-26 18:25:33.360374104 -0700
@@ -0,0 +1,134 @@
+#ifndef _LINUX_KMALLOC_H
+#define _LINUX_KMALLOC_H
+/*
+ * In kernel dynamic memory allocator.
+ *
+ * (C) 2006 Silicon Graphics, Inc,
+ * 		Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/allocator.h>
+#include <linux/config.h>
+#include <linux/types.h>
+
+#ifndef KMALLOC_ALLOCATOR
+#define KMALLOC_ALLOCATOR slabifier_allocator
+#endif
+
+#define KMALLOC_SHIFT_LOW 3
+
+#define KMALLOC_SHIFT_HIGH 18
+
+#if L1_CACHE_BYTES <= 64
+#define KMALLOC_EXTRAS 2
+#define KMALLOC_EXTRA
+#else
+#define KMALLOC_EXTRAS 0
+#endif
+
+#define KMALLOC_NR_CACHES (KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW \
+			 + 1 + KMALLOC_EXTRAS)
+/*
+ * We keep the general caches in an array of slab caches that are used for
+ * 2^x bytes of allocations. For each size we generate a DMA and a
+ * non DMA cache (DMA simply means memory for legacy I/O. The regular
+ * caches can be used for devices that can DMA to all of memory).
+ */
+extern struct slab_control kmalloc_caches[KMALLOC_NR_CACHES];
+
+/*
+ * Sorry that the following has to be that ugly but GCC has trouble
+ * with constant propagation and loops.
+ */
+static inline int kmalloc_index(int size)
+{
+	if (size <=    8) return 3;
+	if (size <=   16) return 4;
+	if (size <=   32) return 5;
+	if (size <=   64) return 6;
+#ifdef KMALLOC_EXTRA
+	if (size <=   96) return KMALLOC_SHIFT_HIGH + 1;
+#endif
+	if (size <=  128) return 7;
+#ifdef KMALLOC_EXTRA
+	if (size <=  192) return KMALLOC_SHIFT_HIGH + 2;
+#endif
+	if (size <=  256) return 8;
+	if (size <=  512) return 9;
+	if (size <= 1024) return 10;
+	if (size <= 2048) return 11;
+	if (size <= 4096) return 12;
+	if (size <=   8 * 1024) return 13;
+	if (size <=  16 * 1024) return 14;
+	if (size <=  32 * 1024) return 15;
+	if (size <=  64 * 1024) return 16;
+	if (size <= 128 * 1024) return 17;
+	if (size <= 256 * 1024) return 18;
+	return -1;
+}
+
+/*
+ * Find the slab cache for a given combination of allocation flags and size.
+ *
+ * This ought to end up with a global pointer to the right cache
+ * in kmalloc_caches.
+ */
+static inline struct slab_cache *kmalloc_slab(size_t size)
+{
+	int index = kmalloc_index(size) - KMALLOC_SHIFT_LOW;
+
+	if (index < 0) {
+		/*
+		 * Generate a link failure. Would be great if we could
+		 * do something to stop the compile here.
+		 */
+		extern void __kmalloc_size_too_large(void);
+		__kmalloc_size_too_large();
+	}
+	return &kmalloc_caches[index].sc;
+}
+
+extern void *__kmalloc(size_t, gfp_t);
+#define ____kmalloc __kmalloc
+
+static inline void *kmalloc(size_t size, gfp_t flags)
+{
+	if (__builtin_constant_p(size) && !(flags & __GFP_DMA)) {
+		struct slab_cache *s = kmalloc_slab(size);
+
+		return KMALLOC_ALLOCATOR.alloc(s, flags);
+	} else
+		return __kmalloc(size, flags);
+}
+
+#ifdef CONFIG_NUMA
+extern void *__kmalloc_node(size_t, gfp_t, int);
+static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	if (__builtin_constant_p(size) && !(flags & __GFP_DMA)) {
+		struct slab_cache *s = kmalloc_slab(size);
+
+		return KMALLOC_ALLOCATOR.alloc_node(s, flags, node);
+	} else
+		return __kmalloc_node(size, flags, node);
+}
+#else
+#define kmalloc_node(__size, __flags, __node) kmalloc((__size), (__flags))
+#endif
+
+/* Free an object */
+static inline void kfree(const void *x)
+{
+	return KMALLOC_ALLOCATOR.free(NULL, x);
+}
+
+/* Allocate and zero the specified number of bytes */
+extern void *kzalloc(size_t, gfp_t);
+
+/* Figure out what size the chunk is */
+extern size_t ksize(const void *);
+
+extern struct page_allocator *reclaimable_allocator;
+extern struct page_allocator *unreclaimable_allocator;
+
+#endif	/* _LINUX_KMALLOC_H */
Index: linux-2.6.18-rc4-mm3/mm/kmalloc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/mm/kmalloc.c	2006-08-26 18:25:33.362327108 -0700
@@ -0,0 +1,205 @@
+/*
+ * Create generic slab caches for memory allocation.
+ *
+ * (C) 2006 Silicon Graphics. Inc. Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/allocator.h>
+#include <linux/module.h>
+#include <linux/kmalloc.h>
+#include <linux/slabstat.h>
+
+#ifndef ARCH_KMALLOC_MINALIGN
+#define ARCH_KMALLOC_MINALIGN sizeof(void *)
+#endif
+
+struct slab_control kmalloc_caches[KMALLOC_NR_CACHES] __cacheline_aligned;
+EXPORT_SYMBOL(kmalloc_caches);
+
+static struct page_allocator *dma_allocator;
+struct page_allocator *reclaimable_allocator;
+struct page_allocator *unreclaimable_allocator;
+
+static struct slab_cache *kmalloc_caches_dma[KMALLOC_NR_CACHES];
+
+/*
+ * Given a slab size find the correct order to use.
+ * We only support powers of two so there is really
+ * no need for anything special. Objects will always
+ * fit exactly into the slabs with no overhead.
+ */
+static __init int order(size_t size)
+{
+	if (size >= PAGE_SIZE)
+		/* One object per slab */
+		return fls(size -1) - PAGE_SHIFT;
+
+	/* Multiple objects per page which will fit neatly */
+	return 0;
+}
+
+static struct slab_cache *create_kmalloc_cache(struct slab_control *x,
+		const char *name,
+		const struct page_allocator *p,
+		int size)
+{
+	struct slab_cache s;
+	struct slab_cache *rs;
+
+	s.page_alloc = p;
+	s.slab_alloc = &KMALLOC_ALLOCATOR;
+	s.size = size;
+	s.align = ARCH_KMALLOC_MINALIGN;
+	s.offset = 0;
+	s.objsize = size;
+	s.inuse = size;
+	s.node = -1;
+	s.order = order(size);
+	s.name = "kmalloc";
+	rs = KMALLOC_ALLOCATOR.create(x, &s);
+	if (!rs)
+		panic("Creation of kmalloc slab %s size=%d failed.\n",
+			name, size);
+	register_slab(rs);
+	return rs;
+}
+
+static struct slab_cache *get_slab(size_t size, gfp_t flags)
+{
+	int index = kmalloc_index(size) - KMALLOC_SHIFT_LOW;
+	struct slab_cache *s;
+	struct slab_control *x;
+	size_t realsize;
+
+	BUG_ON(size < 0);
+
+	if (!(flags & __GFP_DMA))
+		return &kmalloc_caches[index].sc;
+
+	s = kmalloc_caches_dma[index];
+	if (s)
+		return s;
+
+	/* Dynamically create dma cache */
+	x = kmalloc(sizeof(struct slab_control), flags & ~(__GFP_DMA));
+
+	if (!x)
+		panic("Unable to allocate memory for dma cache\n");
+
+#ifdef KMALLOC_EXTRA
+	if (index <= KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW)
+#endif
+		realsize = 1 << index;
+#ifdef KMALLOC_EXTRA
+	else if (index = KMALLOC_EXTRA)
+		realsize = 96;
+	else
+		realsize = 192;
+#endif
+
+	s = create_kmalloc_cache(x, "kmalloc_dma", dma_allocator, realsize);
+	kmalloc_caches_dma[index] = s;
+	return s;
+}
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+	return KMALLOC_ALLOCATOR.alloc(get_slab(size, flags), flags);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+#ifdef CONFIG_NUMA
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	return KMALLOC_ALLOCATOR.alloc_node(get_slab(size, flags),
+							flags, node);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+#endif
+
+void *kzalloc(size_t size, gfp_t flags)
+{
+	void *x = __kmalloc(size, flags);
+
+	if (x)
+		memset(x, 0, size);
+	return x;
+}
+EXPORT_SYMBOL(kzalloc);
+
+size_t ksize(const void *object)
+{
+	return KMALLOC_ALLOCATOR.object_size(NULL, object);
+};
+EXPORT_SYMBOL(ksize);
+
+/*
+ * Provide the kmalloc array as regular slab allocator for the
+ * generic allocator framework.
+ */
+struct slab_allocator kmalloc_slab_allocator;
+
+static struct slab_cache *kmalloc_create(struct slab_control *x,
+	const struct slab_cache *s)
+{
+	struct slab_cache *km;
+
+	int index = max(0, fls(s->size - 1) - KMALLOC_SHIFT_LOW);
+
+	if (index > KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1
+			|| s->offset)
+		return NULL;
+
+	km = &kmalloc_caches[index].sc;
+
+	BUG_ON(s->size > km->size);
+
+	return KMALLOC_ALLOCATOR.dup(km);
+}
+
+static void null_destructor(struct page_allocator *x) {}
+
+void __init kmalloc_init(void)
+{
+	int i;
+
+	for (i =  KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
+		create_kmalloc_cache(
+			&kmalloc_caches[i - KMALLOC_SHIFT_LOW],
+			"kmalloc", &page_allocator, 1 << i);
+	}
+#ifdef KMALLOC_EXTRA
+	/* Non-power of two caches */
+	create_kmalloc_cache(&kmalloc_caches
+		[KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1], name, pa, 96);
+	create_kmalloc_cache(&kmalloc_caches
+		[KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 2], name, pa, 192);
+#endif
+
+	/*
+	 * The above must be done first. Deriving a page allocator requires
+	 * a working (normal) kmalloc array.
+	 */
+	unreclaimable_allocator = unreclaimable_slab(&page_allocator);
+	unreclaimable_allocator->destructor = null_destructor;
+
+	/*
+	 * Fix up the initial arrays. Because of the precending uses
+	 * we likely have consumed a couple of pages that we cannot account
+	 * for.
+	 */
+	for(i = 0; i < KMALLOC_NR_CACHES; i++)
+		kmalloc_caches[i].sc.page_alloc = unreclaimable_allocator;
+
+	reclaimable_allocator = reclaimable_slab(&page_allocator);
+	reclaimable_allocator->destructor = null_destructor;
+	dma_allocator = dmaify_page_allocator(unreclaimable_allocator);
+
+	/* And deal with the kmalloc_cache_allocator */
+	memcpy(&kmalloc_slab_allocator, &KMALLOC_ALLOCATOR,
+			sizeof(struct slab_allocator));
+	kmalloc_slab_allocator.create = kmalloc_create;
+	kmalloc_slab_allocator.destructor = null_slab_allocator_destructor;
+}
+
Index: linux-2.6.18-rc4-mm3/mm/Makefile
===================================================================
--- linux-2.6.18-rc4-mm3.orig/mm/Makefile	2006-08-26 16:38:21.426828386 -0700
+++ linux-2.6.18-rc4-mm3/mm/Makefile	2006-08-26 16:38:22.103544372 -0700
@@ -25,4 +25,4 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o kmalloc.o

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [MODSLAB 4/4] Slabulator: Emulate the existing Slab Layer
  2006-08-27  2:32 [MODSLAB 0/4] A modular slab allocator V2 Christoph Lameter
                   ` (2 preceding siblings ...)
  2006-08-27  2:33 ` [MODSLAB 3/4] A Kmalloc subsystem Christoph Lameter
@ 2006-08-27  2:33 ` Christoph Lameter
  3 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2006-08-27  2:33 UTC (permalink / raw)
  To: akpm
  Cc: Marcelo Tosatti, linux-kernel, linux-mm, Andi Kleen, mpm,
	Manfred Spraul, Dave Chinner, Christoph Lameter

The slab emulation layer.

This provides a layer that implements the existing slab API.
We try to keep the definitions that we copy from slab.h
to an absolute minimum. If things break then more
(useless) definitions from slab.h may be needed.

We put a hook into slab.h to redirect includes for slab.h to
slabulator.h.

The slabulator also contains the slab reaper since it is
used by the page allocator. However, the slabifier does not
need any of this since it is not per cpu cache based. The slabifier
does never reap active slabs in single processor configuration.
For SMP and NUMA a slab specific reaper is used.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc4-mm3/mm/slabulator.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/mm/slabulator.c	2006-08-26 18:27:52.666216263 -0700
@@ -0,0 +1,287 @@
+/*
+ * Slabulator = Emulate the Slab API.
+ *
+ * (C) 2006 Silicon Graphics, Inc. Christoph Lameter <clameter@sgi.com>
+ *
+ */
+#include <linux/mm.h>
+#include <linux/kmalloc.h>
+#include <linux/module.h>
+#include <linux/allocator.h>
+#include <linux/bitops.h>
+#include <linux/slabulator.h>
+#include <linux/slabstat.h>
+
+#define SLAB_MAX_ORDER 4
+
+#define SLABULATOR_MERGE
+
+#ifndef ARCH_SLAB_MINALIGN
+#define ARCH_SLAB_MINALIGN sizeof(void *)
+#endif
+
+static int calculate_order(int size)
+{
+	int order;
+	int rem;
+
+	for(order = max(0, fls(size - 1) - PAGE_SHIFT);
+			order < MAX_ORDER; order++) {
+		unsigned long slab_size = PAGE_SIZE << order;
+
+		if (slab_size < size)
+			continue;
+
+		rem = slab_size % size;
+
+		if (rem * 8 <= PAGE_SIZE << order)
+			break;
+
+	}
+	if (order >= MAX_ORDER)
+		return -E2BIG;
+	return order;
+}
+
+/*
+ * We can actually operate slabs any time after the page allocator is up.
+ * slab_is_available() merely means that the kmalloc array is available.
+ *
+ * However, be aware that deriving allocators depends on kmalloc being
+ * functional.
+ */
+int slabulator_up = 0;
+
+int slab_is_available(void) {
+	return slabulator_up;
+}
+
+void kmem_cache_init(void)
+{
+	extern void kmalloc_init(void);
+
+	kmalloc_init();
+	slabulator_up = 1;
+}
+
+struct slab_cache *kmem_cache_create(const char *name, size_t size,
+		size_t align, unsigned long flags,
+		void (*ctor)(void *, struct slab_cache *, unsigned long),
+		void (*dtor)(void *, struct slab_cache *, unsigned long))
+{
+	const struct page_allocator *a;
+	struct slab_cache s;
+	struct slab_cache *rs;
+	struct slab_control *x;
+	int page_size_slab;
+
+	s.offset = 0;
+	s.align = max(ARCH_SLAB_MINALIGN, ALIGN(align, sizeof(void *)));
+
+	if (flags & (SLAB_MUST_HWCACHE_ALIGN|SLAB_HWCACHE_ALIGN))
+		s.align = L1_CACHE_BYTES;
+
+	s.inuse = size;
+	s.objsize = size;
+	s.size = ALIGN(size, s.align);
+
+	/* Pick the right allocator for our purposes */
+	if (flags & SLAB_RECLAIM_ACCOUNT)
+		a = reclaimable_allocator;
+	else
+		a = unreclaimable_allocator;
+
+	if (flags & SLAB_CACHE_DMA)
+		a = dmaify_page_allocator(a);
+
+	if (flags & SLAB_DESTROY_BY_RCU)
+		a = rcuify_page_allocator(a);
+
+	page_size_slab = (PAGE_SIZE << calculate_order(s.size)) > (s.size << 1);
+
+	if (page_size_slab && ((flags & SLAB_DESTROY_BY_RCU) || ctor || dtor)) {
+		/*
+		 * For RCU processing and constructors / destructors:
+		 * The object must remain intact even if it is free.
+		 * The free pointer would hurt us there.
+		 * Relocate the free object pointer out of
+		 * the space used by the object.
+		 *
+		 * Slabs with a single object do not need this since
+		 * those do not have to deal with free pointers.
+		 */
+		s.offset = s.size - sizeof(void *);
+		if (s.offset < s.objsize) {
+			/*
+			 * Would overlap the object. We need to waste some
+			 * more space to make the object RCU safe
+			 */
+			s.offset = s.size;
+			s.size += s.align;
+		}
+		s.inuse = s.size;
+	}
+
+	s.order = calculate_order(s.size);
+
+	if (s.order < 0)
+		goto error;
+
+	s.name = name;
+	s.node = -1;
+
+	x = kmalloc(sizeof(struct slab_control), GFP_KERNEL);
+
+	if (!x)
+		return NULL;
+	s.page_alloc = a;
+	s.slab_alloc = &SLABULATOR_ALLOCATOR;
+#ifdef SLABULATOR_MERGE
+	/*
+	 * This works but is this really something we want?
+	 */
+	if (((s.size & (s.size - 1))==0) && !ctor && !dtor &&
+		   !(flags & (SLAB_DESTROY_BY_RCU|SLAB_RECLAIM_ACCOUNT))) {
+
+		printk(KERN_INFO "Merging slab_cache %s size %d into"
+			" kmalloc array\n", name, s.size);
+		rs = kmalloc_slab_allocator.create(x, &s);
+		kfree(x);
+		x = NULL;
+	} else
+#endif
+	rs = SLABULATOR_ALLOCATOR.create(x, &s);
+	if (!rs)
+		goto error;
+
+	/*
+	 * Now deal with constuctors and destructors. We need to know the
+	 * slab_cache address in order to be able to pass the slab_cache
+	 * address down the chain.
+	 */
+	if (ctor || dtor)
+		rs->page_alloc =
+			ctor_and_dtor_for_page_allocator(rs->page_alloc,
+				rs->size, rs,
+				(void *)ctor, (void *)dtor);
+
+	if (x)
+		register_slab(rs);
+	return rs;
+
+error:
+	a->destructor((struct page_allocator *)a);
+	if (flags & SLAB_PANIC)
+		panic("Cannot create slab %s size=%ld realsize=%d "
+			"order=%d offset=%d flags=%lx\n",
+			s.name, size, s.size, s.order, s.offset, flags);
+
+
+	return NULL;
+}
+EXPORT_SYMBOL(kmem_cache_create);
+
+int kmem_cache_destroy(struct slab_cache *s)
+{
+	SLABULATOR_ALLOCATOR.destroy(s);
+	unregister_slab(s);
+	kfree(s);
+	return 0;
+}
+EXPORT_SYMBOL(kmem_cache_destroy);
+
+void *kmem_cache_zalloc(struct slab_cache *s, gfp_t flags)
+{
+	void *x;
+
+	x = kmem_cache_alloc(s, flags);
+	if (x)
+		memset(x, 0, s->objsize);
+	return x;
+}
+
+/*
+ * Generic reaper (the slabifier has its own way of reaping)
+ */
+#ifdef CONFIG_NUMA
+/*
+ * Special reaping functions for NUMA systems called from cache_reap().
+ */
+static DEFINE_PER_CPU(unsigned long, reap_node);
+
+static void init_reap_node(int cpu)
+{
+	int node;
+
+	node = next_node(cpu_to_node(cpu), node_online_map);
+	if (node == MAX_NUMNODES)
+		node = first_node(node_online_map);
+
+	__get_cpu_var(reap_node) = node;
+}
+
+static void next_reap_node(void)
+{
+	int node = __get_cpu_var(reap_node);
+
+	/*
+	 * Also drain per cpu pages on remote zones
+	 */
+	if (node != numa_node_id())
+		drain_node_pages(node);
+
+	node = next_node(node, node_online_map);
+	if (unlikely(node >= MAX_NUMNODES))
+		node = first_node(node_online_map);
+	__get_cpu_var(reap_node) = node;
+}
+#else
+#define init_reap_node(cpu) do { } while (0)
+#define next_reap_node(void) do { } while (0)
+#endif
+
+#define REAPTIMEOUT_CPUC	(2*HZ)
+
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct work_struct, reap_work);
+
+static void cache_reap(void *unused)
+{
+	next_reap_node();
+	refresh_cpu_vm_stats(smp_processor_id());
+
+	schedule_delayed_work(&__get_cpu_var(reap_work),
+				      REAPTIMEOUT_CPUC);
+}
+
+static void __devinit start_cpu_timer(int cpu)
+{
+	struct work_struct *reap_work = &per_cpu(reap_work, cpu);
+
+	/*
+	 * When this gets called from do_initcalls via cpucache_init(),
+	 * init_workqueues() has already run, so keventd will be setup
+	 * at that time.
+	 */
+	if (keventd_up() && reap_work->func == NULL) {
+		init_reap_node(cpu);
+		INIT_WORK(reap_work, cache_reap, NULL);
+		schedule_delayed_work_on(cpu, reap_work, HZ + 3 * cpu);
+	}
+}
+
+static int __init cpucache_init(void)
+{
+	int cpu;
+
+	/*
+	 * Register the timers that drain pcp pages and update vm statistics
+	 */
+	for_each_online_cpu(cpu)
+		start_cpu_timer(cpu);
+	return 0;
+}
+__initcall(cpucache_init);
+#endif
+
+
Index: linux-2.6.18-rc4-mm3/include/linux/slabulator.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.18-rc4-mm3/include/linux/slabulator.h	2006-08-26 18:27:12.869848074 -0700
@@ -0,0 +1,123 @@
+#ifndef _LINUX_SLABULATOR_H
+#define _LINUX_SLABULATOR_H
+/*
+ * Slabulator: Emulate the existing Slab API.
+ *
+ * (C) 2006 Silicon Graphics, Inc.
+ *		Christoph Lameter <clameter@sgi.com>
+ */
+
+#include <linux/allocator.h>
+#include <linux/kmalloc.h>
+
+#define kmem_cache_t	struct slab_cache
+#define kmem_cache	slab_cache
+
+#ifndef SLABULATOR_ALLOCATOR
+#define SLABULATOR_ALLOCATOR slabifier_allocator
+#endif
+
+/*
+ * We really should be getting rid of these. This is only
+ * a select list/
+ */
+#define	SLAB_KERNEL		GFP_KERNEL
+#define	SLAB_ATOMIC		GFP_ATOMIC
+#define	SLAB_NOFS		GFP_NOFS
+#define SLAB_NOIO		GFP_NOIO
+
+/* No debug features for now */
+#define	SLAB_HWCACHE_ALIGN	0x00002000UL
+#define SLAB_CACHE_DMA		0x00004000UL
+#define SLAB_MUST_HWCACHE_ALIGN	0x00008000UL
+#define SLAB_RECLAIM_ACCOUNT	0x00020000UL
+#define SLAB_PANIC		0x00040000UL
+#define SLAB_DESTROY_BY_RCU	0x00080000UL
+#define SLAB_MEM_SPREAD		0x00100000UL
+
+/* flags passed to a constructor func */
+#define	SLAB_CTOR_CONSTRUCTOR	0x001UL
+#define SLAB_CTOR_ATOMIC	0x002UL
+#define	SLAB_CTOR_VERIFY	0x004UL
+
+/*
+ * slab_allocators are always available after the page allocator
+ * has been brought up. kmem_cache_init creates the kmalloc array:
+ */
+extern int slab_is_available(void);
+extern void kmem_cache_init(void);
+
+/* System wide caches (Should these be really here?) */
+extern struct slab_cache *vm_area_cachep;
+extern struct slab_cache *names_cachep;
+extern struct slab_cache *files_cachep;
+extern struct slab_cache *filp_cachep;
+extern struct slab_cache *fs_cachep;
+extern struct slab_cache *sighand_cachep;
+extern struct slab_cache *bio_cachep;
+
+extern struct slab_cache *kmem_cache_create(const char *name, size_t size,
+	size_t align, unsigned long flags,
+	void (*ctor)(void *, struct slab_cache *, unsigned long),
+	void (*dtor)(void *, struct slab_cache *, unsigned long));
+
+static inline unsigned int kmem_cache_size(struct slab_cache *s)
+{
+	return s->objsize;
+}
+
+static inline const char *kmem_cache_name(struct slab_cache *s)
+{
+	return s->name;
+}
+
+static inline void *kmem_cache_alloc(struct slab_cache *s, gfp_t flags)
+{
+	return SLABULATOR_ALLOCATOR.alloc(s, flags);
+}
+
+static inline void *kmem_cache_alloc_node(struct slab_cache *s,
+					gfp_t flags, int node)
+{
+	return SLABULATOR_ALLOCATOR.alloc_node(s, flags, node);
+}
+
+extern void *kmem_cache_zalloc(struct slab_cache *s, gfp_t flags);
+
+static inline void kmem_cache_free(struct slab_cache *s, const void *x)
+{
+	SLABULATOR_ALLOCATOR.free(s, x);
+}
+
+static inline int kmem_ptr_validate(struct slab_cache *s, void *x)
+{
+	return SLABULATOR_ALLOCATOR.valid_pointer(s, x);
+}
+
+extern int kmem_cache_destroy(struct slab_cache *s);
+
+static inline int kmem_cache_shrink(struct slab_cache *s)
+{
+	return SLABULATOR_ALLOCATOR.shrink(s, NULL);
+}
+
+/**
+ * kcalloc - allocate memory for an array. The memory is set to zero.
+ * @n: number of elements.
+ * @size: element size.
+ * @flags: the type of memory to allocate.
+ */
+static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
+{
+	if (n != 0 && size > ULONG_MAX / n)
+		return NULL;
+	return kzalloc(n * size, flags);
+}
+
+/* No current shrink statistics */
+struct shrinker;
+static inline void kmem_set_shrinker(kmem_cache_t *cachep,
+		struct shrinker *shrinker)
+{}
+#endif /* _LINUX_SLABULATOR_H */
+
Index: linux-2.6.18-rc4-mm3/mm/Makefile
===================================================================
--- linux-2.6.18-rc4-mm3.orig/mm/Makefile	2006-08-26 18:27:10.289929422 -0700
+++ linux-2.6.18-rc4-mm3/mm/Makefile	2006-08-26 18:27:12.870824576 -0700
@@ -25,4 +25,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
-obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o kmalloc.o
+obj-$(CONFIG_MODULAR_SLAB) += allocator.o slabifier.o slabstat.o \
+				kmalloc.o slabulator.o
Index: linux-2.6.18-rc4-mm3/init/Kconfig
===================================================================
--- linux-2.6.18-rc4-mm3.orig/init/Kconfig	2006-08-26 16:38:04.676887088 -0700
+++ linux-2.6.18-rc4-mm3/init/Kconfig	2006-08-26 18:27:12.871801078 -0700
@@ -332,6 +332,26 @@ config CC_OPTIMIZE_FOR_SIZE
 
 	  If unsure, say N.
 
+config SLAB
+	default y
+	bool "Traditional SLAB allocator"
+	help
+	  Disabling this allows the use of alternate slab allocators
+	  with less overhead such as SLOB (very simple) or the
+	  use the slabifier with the module allocator framework.
+	  Note that alternate slab allocators may not provide
+	  the complete functionality for slab.
+
+config MODULAR_SLAB
+	default y
+	bool "Use the modular allocator framework"
+	depends on EXPERIMENTAL && !SLAB
+	help
+	 The modular  allocator framework allows the flexible use
+	 of different slab allocators and page allocators for memory
+	 allocation. This will completely replace the existing
+	 slab allocator. Beware this is experimental code.
+
 menuconfig EMBEDDED
 	bool "Configure standard kernel features (for small systems)"
 	help
@@ -370,7 +390,6 @@ config KALLSYMS_EXTRA_PASS
 	   reported.  KALLSYMS_EXTRA_PASS is only a temporary workaround while
 	   you wait for kallsyms to be fixed.
 
-
 config HOTPLUG
 	bool "Support for hot-pluggable devices" if EMBEDDED
 	default y
@@ -445,15 +464,6 @@ config SHMEM
 	  option replaces shmem and tmpfs with the much simpler ramfs code,
 	  which may be appropriate on small systems without swap.
 
-config SLAB
-	default y
-	bool "Use full SLAB allocator" if EMBEDDED
-	help
-	  Disabling this replaces the advanced SLAB allocator and
-	  kmalloc support with the drastically simpler SLOB allocator.
-	  SLOB is more space efficient but does not scale well and is
-	  more susceptible to fragmentation.
-
 config VM_EVENT_COUNTERS
 	default y
 	bool "Enable VM event counters for /proc/vmstat" if EMBEDDED
@@ -475,7 +485,7 @@ config BASE_SMALL
 	default 1 if !BASE_FULL
 
 config SLOB
-	default !SLAB
+	default !SLAB && !MODULAR_SLAB
 	bool
 
 menu "Loadable module support"
Index: linux-2.6.18-rc4-mm3/include/linux/slab.h
===================================================================
--- linux-2.6.18-rc4-mm3.orig/include/linux/slab.h	2006-08-26 16:38:04.426902539 -0700
+++ linux-2.6.18-rc4-mm3/include/linux/slab.h	2006-08-26 18:27:12.871801078 -0700
@@ -9,6 +9,10 @@
 
 #if	defined(__KERNEL__)
 
+#ifdef CONFIG_MODULAR_SLAB
+#include <linux/slabulator.h>
+#else
+
 typedef struct kmem_cache kmem_cache_t;
 
 #include	<linux/gfp.h>
@@ -291,6 +295,8 @@ extern kmem_cache_t	*fs_cachep;
 extern kmem_cache_t	*sighand_cachep;
 extern kmem_cache_t	*bio_cachep;
 
+#endif /* CONFIG_SLABULATOR */
+
 #endif	/* __KERNEL__ */
 
 #endif	/* _LINUX_SLAB_H */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-08-28  5:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-27  2:32 [MODSLAB 0/4] A modular slab allocator V2 Christoph Lameter
2006-08-27  2:32 ` [MODSLAB 1/4] Generic Allocator Framework Christoph Lameter
2006-08-27  2:32 ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
2006-08-27  7:18   ` [MODSLAB 2.5/4] A slab statistics module Christoph Lameter
2006-08-28  5:33   ` [MODSLAB 2/4] A slab allocator: SLABIFIER Christoph Lameter
2006-08-27  2:33 ` [MODSLAB 3/4] A Kmalloc subsystem Christoph Lameter
2006-08-27  2:33 ` [MODSLAB 4/4] Slabulator: Emulate the existing Slab Layer Christoph Lameter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox