linux-mm.kvack.org archive mirror
From: Christoph Lameter <clameter@sgi.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mel@skynet.ie>
Subject: [patch 22/23] SLUB: Slab reclaim through Lumpy reclaim
Date: Tue, 06 Nov 2007 17:11:52 -0800	[thread overview]
Message-ID: <20071107011231.907368704@sgi.com> (raw)
In-Reply-To: <20071107011130.382244340@sgi.com>

[-- Attachment #1: 0012-slab_defrag_lumpy_reclaim.patch --]
[-- Type: text/plain, Size: 7961 bytes --]

Create the functions kmem_cache_isolate_slab() and kmem_cache_reclaim()
to support lumpy reclaim.

In order to isolate pages we have to handle slab page allocations in
such a way that we can determine whether a slab is valid whenever we access
it, regardless of where it is in its lifetime.

A valid slab that can be freed has PageSlab(page) set and page->inuse > 0.
So we need to make sure in allocate_slab() that page->inuse is zero before
PageSlab is set.

kmem_cache_isolate_slab() is called from lumpy reclaim to isolate pages
neighboring a page cache page that is being reclaimed. Lumpy reclaim
gathers the slabs and calls kmem_cache_reclaim() on the resulting list.

This means that we can remove a slab in order to coalesce a higher
order page.
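The gather-then-reclaim flow can be modelled in user space as follows.
toy_page, toy_isolate() and toy_gather() are invented stand-ins for this
sketch; the real code chains isolated pages through page->lru and finishes
with a single kmem_cache_reclaim() call on the list:

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	int is_slab;		/* stands in for PageSlab() */
	int inuse;		/* objects allocated in this slab */
	struct toy_page *next;	/* stands in for the page->lru linkage */
};

/* Isolation succeeds only for an in-use slab page, loosely mirroring
 * the checks in kmem_cache_isolate_slab(). Returns 0 on success. */
static int toy_isolate(struct toy_page *page)
{
	if (!page->is_slab || page->inuse == 0)
		return -1;
	return 0;
}

/* Gather isolatable neighbours onto a zaplist; return how many we took.
 * The caller would then reclaim the whole zaplist in one pass. */
static int toy_gather(struct toy_page *pages, size_t n,
		      struct toy_page **zaplist)
{
	int taken = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_isolate(&pages[i]) == 0) {
			pages[i].next = *zaplist;	/* chain onto the list */
			*zaplist = &pages[i];
			taken++;
		}
	}
	return taken;
}
```

Batching the isolated slabs and reclaiming them together is what lets a
higher order contiguous region be assembled from mixed page cache and
slab pages.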

Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slab.h |    2 +
 mm/slab.c            |   13 ++++++
 mm/slub.c            |  102 ++++++++++++++++++++++++++++++++++++++++++++++++---
 mm/vmscan.c          |   13 +++++-
 4 files changed, 123 insertions(+), 7 deletions(-)

Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h	2007-11-06 13:50:47.000000000 -0800
+++ linux-2.6/include/linux/slab.h	2007-11-06 13:50:54.000000000 -0800
@@ -64,6 +64,8 @@ unsigned int kmem_cache_size(struct kmem
 const char *kmem_cache_name(struct kmem_cache *);
 int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
 int kmem_cache_defrag(int node);
+int kmem_cache_isolate_slab(struct page *);
+int kmem_cache_reclaim(struct list_head *);
 
 /*
  * Please use this macro to create slab caches. Simply specify the
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2007-11-06 13:50:33.000000000 -0800
+++ linux-2.6/mm/slab.c	2007-11-06 13:50:54.000000000 -0800
@@ -2559,6 +2559,19 @@ int kmem_cache_defrag(int node)
 	return 0;
 }
 
+/*
+ * SLAB does not support slab defragmentation
+ */
+int kmem_cache_isolate_slab(struct page *page)
+{
+	return -ENOSYS;
+}
+
+int kmem_cache_reclaim(struct list_head *zaplist)
+{
+	return 0;
+}
+
 /**
  * kmem_cache_destroy - delete a cache
  * @cachep: the cache to destroy
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-11-06 13:50:40.000000000 -0800
+++ linux-2.6/mm/slub.c	2007-11-06 13:50:54.000000000 -0800
@@ -1088,18 +1088,19 @@ static noinline struct page *new_slab(st
 	page = allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
 	if (!page)
-		goto out;
+		return NULL;
 
 	n = get_node(s, page_to_nid(page));
 	if (n)
 		atomic_long_inc(&n->nr_slabs);
+
+	page->inuse = 0;
 	page->slab = s;
-	state = 1 << PG_slab;
+	state = page->flags | (1 << PG_slab);
 	if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
 			SLAB_STORE_USER | SLAB_TRACE))
 		state |= SLABDEBUG;
 
-	page->flags |= state;
 	start = page_address(page);
 	page->end = start + 1;
 
@@ -1116,8 +1117,13 @@ static noinline struct page *new_slab(st
 	set_freepointer(s, last, page->end);
 
 	page->freelist = start;
-	page->inuse = 0;
-out:
+
+	/*
+	 * page->inuse must be 0 when PageSlab(page) becomes
+	 * true so that defrag knows that this slab is not in use.
+	 */
+	smp_wmb();
+	page->flags = state;
 	return page;
 }
 
@@ -2622,6 +2628,92 @@ out:
 }
 #endif
 
+
+/*
+ * Check if the given state is that of a reclaimable slab page.
+ *
+ * This is only true if this is indeed a slab page and if
+ * the page has not been frozen.
+ */
+static inline int reclaimable_slab(unsigned long state)
+{
+	if (!(state & (1 << PG_slab)))
+		return 0;
+
+	if (state & FROZEN)
+		return 0;
+
+	return 1;
+}
+
+/*
+ * Isolate a page from the slab partial lists. Return 0 if successful.
+ *
+ * After isolation the LRU field can be used to put the page onto
+ * a reclaim list.
+ */
+int kmem_cache_isolate_slab(struct page *page)
+{
+	unsigned long flags;
+	struct kmem_cache *s;
+	int rc = -ENOENT;
+	unsigned long state;
+
+	/*
+	 * Avoid attempting to isolate the slab pages if there are
+	 * indications that this will not be successful.
+	 */
+	if (!reclaimable_slab(page->flags) || page_count(page) == 1)
+		return rc;
+
+	/*
+	 * Get a reference to the page. Return if it is freed or being freed.
+	 * This is necessary to make sure that the page does not vanish
+	 * from under us before we are able to check the result.
+	 */
+	if (!get_page_unless_zero(page))
+		return rc;
+
+	local_irq_save(flags);
+	state = slab_lock(page);
+
+	/*
+	 * Check the slab state again now that we hold the lock.
+	 */
+	if (!reclaimable_slab(state) || !page->inuse) {
+		slab_unlock(page, state);
+		put_page(page);
+		goto out;
+	}
+
+	/*
+	 * Drop the reference count. There are objects remaining and therefore
+	 * the slab lock will have to be taken before the last object can
+	 * be removed. We hold the slab lock, so no one can free this slab
+	 * now.
+	 *
+	 * We set the slab frozen before releasing the lock. This means
+	 * that no slab free action will be performed. If all objects are
+	 * removed then the slab will be freed during kmem_cache_reclaim().
+	 */
+	BUG_ON(page_count(page) <= 1);
+	put_page(page);
+
+	/*
+	 * Remove the slab from the lists and mark it frozen
+	 */
+	s = page->slab;
+	if (page->inuse < s->objects)
+		remove_partial(s, page);
+	else if (s->flags & SLAB_STORE_USER)
+		remove_full(s, page);
+	slab_unlock(page, state | FROZEN);
+	rc = 0;
+out:
+	local_irq_restore(flags);
+	return rc;
+}
+
 /*
  * Conversion table for small slabs sizes / 8 to the index in the
  * kmalloc array. This is necessary for slabs < 192 since we have non power
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c	2007-11-06 13:50:47.000000000 -0800
+++ linux-2.6/mm/vmscan.c	2007-11-06 13:50:54.000000000 -0800
@@ -687,6 +687,7 @@ static int __isolate_lru_page(struct pag
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		struct list_head *src, struct list_head *dst,
+		struct list_head *slab_pages,
 		unsigned long *scanned, int order, int mode)
 {
 	unsigned long nr_taken = 0;
@@ -760,7 +761,13 @@ static unsigned long isolate_lru_pages(u
 			case -EBUSY:
 				/* else it is being freed elsewhere */
 				list_move(&cursor_page->lru, src);
+				break;
+
 			default:
+				if (slab_pages &&
+				    kmem_cache_isolate_slab(cursor_page) == 0)
+						list_add(&cursor_page->lru,
+							slab_pages);
 				break;
 			}
 		}
@@ -796,6 +803,7 @@ static unsigned long shrink_inactive_lis
 				struct zone *zone, struct scan_control *sc)
 {
 	LIST_HEAD(page_list);
+	LIST_HEAD(slab_list);
 	struct pagevec pvec;
 	unsigned long nr_scanned = 0;
 	unsigned long nr_reclaimed = 0;
@@ -813,7 +821,7 @@ static unsigned long shrink_inactive_lis
 
 		nr_taken = isolate_lru_pages(sc->swap_cluster_max,
 			     &zone->inactive_list,
-			     &page_list, &nr_scan, sc->order,
+			     &page_list, &slab_list, &nr_scan, sc->order,
 			     (sc->order > PAGE_ALLOC_COSTLY_ORDER)?
 					     ISOLATE_BOTH : ISOLATE_INACTIVE);
 		nr_active = clear_active_flags(&page_list);
@@ -824,6 +832,7 @@ static unsigned long shrink_inactive_lis
 						-(nr_taken - nr_active));
 		zone->pages_scanned += nr_scan;
 		spin_unlock_irq(&zone->lru_lock);
+		kmem_cache_reclaim(&slab_list);
 
 		nr_scanned += nr_scan;
 		nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
@@ -1029,7 +1038,7 @@ force_reclaim_mapped:
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
-			    &l_hold, &pgscanned, sc->order, ISOLATE_ACTIVE);
+			&l_hold, NULL, &pgscanned, sc->order, ISOLATE_ACTIVE);
 	zone->pages_scanned += pgscanned;
 	__mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
 	spin_unlock_irq(&zone->lru_lock);

-- 
