linux-mm.kvack.org archive mirror
From: Joel Schopp <jschopp@austin.ibm.com>
To: Joel Schopp <jschopp@austin.ibm.com>
Cc: Andrew Morton <akpm@osdl.org>,
	lhms <lhms-devel@lists.sourceforge.net>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org, Mel Gorman <mel@csn.ul.ie>,
	Mike Kravetz <kravetz@us.ibm.com>
Subject: [PATCH 8/9] defrag fallback
Date: Mon, 26 Sep 2005 15:16:14 -0500	[thread overview]
Message-ID: <4338570E.6050101@austin.ibm.com> (raw)
In-Reply-To: <4338537E.8070603@austin.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 1263 bytes --]

When we can't allocate from the preferred allocation type we need to fall back
to other allocation types.  This patch determines which allocation types we try
to fall back to, and in which order.  It also adds a special fallback type
designed to minimize the fragmentation caused by falling back between the other
types.

There is an implicit tradeoff being made here between avoiding fragmentation
and satisfying allocations.  This patch aims to keep the existing behavior of
satisfying an allocation whenever there is free memory of any type to satisfy
it.  It does a reasonable job of minimizing fragmentation, and certainly does
better than a stock kernel in all situations.

However, it is not hard to imagine scenarios where a different fallback
algorithm that fails more allocations would keep fragmentation down much
better, and on some systems (those doing memory hotplug remove, for example)
the decreased fragmentation might be worth the cost of the failed allocations.
This patch is designed so that the static function fallback_alloc() can easily
be replaced with an alternate implementation (perhaps under a config option)
in the future.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

[-- Attachment #2: 8_defrag_fallback --]
[-- Type: text/plain, Size: 3586 bytes --]

Index: 2.6.13-joel2/mm/page_alloc.c
===================================================================
--- 2.6.13-joel2.orig/mm/page_alloc.c	2005-09-21 11:14:49.%N -0500
+++ 2.6.13-joel2/mm/page_alloc.c	2005-09-21 11:17:23.%N -0500
@@ -39,6 +39,17 @@
 #include "internal.h"
 
 /*
+ * fallback_allocs contains the fallback types for low memory conditions
+ * where the preferred allocation type is not available.
+ */
+int fallback_allocs[RCLM_TYPES][RCLM_TYPES+1] = {
+	{RCLM_NORCLM,   RCLM_FALLBACK, RCLM_KERN,   RCLM_USER, -1},
+	{RCLM_KERN,     RCLM_FALLBACK, RCLM_NORCLM, RCLM_USER, -1},
+	{RCLM_USER,     RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, -1},
+	{RCLM_FALLBACK, RCLM_NORCLM,   RCLM_KERN,   RCLM_USER, -1}
+};
+
+/*
  * MCD - HACK: Find somewhere to initialize this EARLY, or make this
  * initializer cleaner
  */
@@ -576,13 +587,86 @@ static inline struct page
 }
 
 
+/*
+ * If we are falling back and the original allocation was
+ * RCLM_NORCLM, reserve any buddies for the RCLM_NORCLM pool.
+ * These allocations fragment the worst, so this helps keep
+ * them in one place
+ */
+static inline void
+fallback_buddy_reserve(int start_alloctype, struct zone *zone,
+		       unsigned int current_order, struct page *page)
+{
+	int reserve_type = RCLM_NORCLM;
+	struct free_area *area;
+
+	if (start_alloctype == RCLM_NORCLM) {
+		area = zone->free_area_lists[RCLM_NORCLM] + current_order;
+
+		/* Reserve the whole block if this is a large split */
+		if (current_order >= MAX_ORDER / 2) {
+			dec_reserve_count(zone, get_pageblock_type(zone,page));
+
+			/*
+			 * Use this block for fallbacks if the
+			 * minimum reserve is not being met
+			 */
+			if (!is_min_fallback_reserved(zone))
+				reserve_type = RCLM_FALLBACK;
+
+			set_pageblock_type(zone, page, reserve_type);
+			inc_reserve_count(zone, reserve_type);
+		}
+
+	}
+
+}
+
 static struct page *
 fallback_alloc(int alloctype, struct zone *zone, unsigned int order)
 {
-	/* Stub out for seperate review, NULL equates to no fallback*/
+	int *fallback_list;
+	int start_alloctype;
+	unsigned int current_order;
+	struct free_area *area;
+	struct page *page;
+
+	/* Ok, pick the fallback order based on the type */
+	fallback_list = fallback_allocs[alloctype];
+	start_alloctype = alloctype;
+
+
+	/*
+	 * Here, the lists for this alloc type have been depleted, as has
+	 * the global pool, so fall back.  When falling back, take the
+	 * largest possible block to keep the fallbacks clustered
+	 */
+	while ((alloctype = *(++fallback_list)) != -1) {
+
+		/* Find a block to allocate */
+		area = zone->free_area_lists[alloctype] + MAX_ORDER;
+		current_order = MAX_ORDER;
+		do {
+			current_order--;
+			area--;
+			if (!list_empty(&area->free_list)) {
+				page = list_entry(area->free_list.next,
+						  struct page, lru);
+				area->nr_free--;
+				fallback_buddy_reserve(start_alloctype, zone,
+						       current_order, page);
+				return remove_page(zone, page, order,
+						   current_order, area);
+			}
+
+		} while (current_order != order);
+
+	}
+
 	return NULL;
 
 }
+
 /* 
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -2101,6 +2185,11 @@ static void __init free_area_init_core(s
 		spin_lock_init(&zone->lru_lock);
 		zone->zone_pgdat = pgdat;
 		zone->free_pages = 0;
+		zone->fallback_reserve = 0;
+
+		/* Set the balance so about 12.5% will be used for fallbacks */
+		zone->fallback_balance = (realsize >> (MAX_ORDER-1)) -
+					 (realsize >> (MAX_ORDER+2));
 
 		zone->temp_priority = zone->prev_priority = DEF_PRIORITY;
 


Thread overview: 28+ messages
2005-09-26 20:01 [PATCH 0/9] fragmentation avoidance Joel Schopp
2005-09-26 20:03 ` [PATCH 1/9] add defrag flags Joel Schopp
2005-09-27  0:16   ` Kyle Moffett
2005-09-27  0:24     ` Dave Hansen
2005-09-27  0:43       ` Kyle Moffett
2005-09-27  5:44       ` Paul Jackson
2005-09-27 13:34         ` Mel Gorman
2005-09-27 16:26           ` [Lhms-devel] " Paul Jackson
2005-09-27 18:38         ` Joel Schopp
2005-09-27 19:30           ` Paul Jackson
2005-09-27 21:00             ` [Lhms-devel] " Joel Schopp
2005-09-27 21:23               ` Paul Jackson
2005-09-27 22:03                 ` Joel Schopp
2005-09-27 22:45                   ` Paul Jackson
2005-09-26 20:05 ` [PATCH 2/9] declare defrag structs Joel Schopp
2005-09-26 20:06 ` [PATCH 3/9] initialize defrag Joel Schopp
2005-09-26 20:09 ` [PATCH 4/9] defrag helper functions Joel Schopp
2005-09-26 22:29   ` Alex Bligh - linux-kernel
2005-09-27 16:08     ` Joel Schopp
2005-09-26 20:11 ` [PATCH 5/9] propagate defrag alloc types Joel Schopp
2005-09-26 20:13 ` [PATCH 6/9] fragmentation avoidance core Joel Schopp
2005-09-26 20:14 ` [PATCH 7/9] try harder on large allocations Joel Schopp
2005-09-27  7:21   ` Coywolf Qi Hunt
2005-09-27 16:17     ` Joel Schopp
2005-09-26 20:16 ` Joel Schopp [this message]
2005-09-26 20:17 ` [PATCH 9/9] free memory is user reclaimable Joel Schopp
2005-09-26 20:19 ` [PATCH 10/9] percpu splitout Joel Schopp
2005-09-26 21:49 ` [Lhms-devel] [PATCH 0/9] fragmentation avoidance Joel Schopp
