[PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations

* [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations
@ 2007-05-14 17:32 Mel Gorman
  2007-05-14 17:32 ` [PATCH 1/2] Have kswapd keep a minimum order free other than order-0 Mel Gorman
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Mel Gorman @ 2007-05-14 17:32 UTC (permalink / raw)
  To: nicolas.mailhot, apw, clameter; +Cc: Mel Gorman, akpm, linux-mm

The following two patches should address a problem reported at
http://lkml.org/lkml/2007/5/10/550 . The issue was that atomic high-order
allocations were failing even though free memory was available at the
requested order.

The first patch addresses an observation in the logs that the majority of
free memory was at lower orders even though it was known that high-order
allocations were regularly required. This patch informs kswapd that there
is a known high order that allocations will regularly request, triggering
watermark reclaim at that order. Arguably, this minimum value that kswapd
reclaims at should be PAGE_ALLOC_COSTLY_ORDER.

The second patch addresses an issue where the caller's ability to enter
direct reclaim is not taken into account when checking watermarks. The
patch alters zone_watermark_ok() so that it only checks the watermarks at
order-0 when the caller is flagged ALLOC_HIGH or ALLOC_HARDER.

Nicolas, I would appreciate it if you would test 2.6.21-mm2 with both of these
patches applied. They have changed in a number of respects from what I
sent you over the weekend and I'd like to be sure the fix still works. Thanks.
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 17:32 [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Mel Gorman
@ 2007-05-14 17:32 ` Mel Gorman
  2007-05-14 18:01   ` Christoph Lameter
  2007-05-14 17:32 ` [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations Mel Gorman
  2007-05-14 18:13 ` [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Nicolas Mailhot
  2 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-14 17:32 UTC (permalink / raw)
  To: apw, clameter, nicolas.mailhot; +Cc: Mel Gorman, akpm, linux-mm

kswapd normally reclaims at order 0 unless there is a higher-order allocation
currently being serviced. However, in some cases it is known that there is a
minimum order size that is generally required such as when SLUB is configured
to use higher orders for performance reasons.  This patch allows a minimum
order to be set, such that min_free_kbytes pages are kept at higher orders.
This depends on lumpy-reclaim to work.
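
As a rough illustration of the mechanism described above, the following
user-space sketch models a raise-only minimum order. It is not the kernel
code: MAX_ORDER is assumed to be 11 (a common default), and the example
function kswapd_min_order_example() is hypothetical.

```c
#include <assert.h>

/* Assumption: 11 is a common default for MAX_ORDER; real kernels may differ. */
#define MAX_ORDER 11

/* Minimum order kswapd reclaims at; 0 means the stock behaviour. */
static unsigned int kswapd_min_order;

/* Order kswapd would actually use for a request at 'order'. */
unsigned int kswapd_order(unsigned int order)
{
	return order > kswapd_min_order ? order : kswapd_min_order;
}

/* Raise (never lower) the minimum order; out-of-range values are ignored. */
void raise_kswapd_order(unsigned int order)
{
	if (order >= MAX_ORDER)
		return;
	if (order > kswapd_min_order)
		kswapd_min_order = order;
}

/* Hypothetical walk through the cases the description relies on. */
void kswapd_min_order_example(void)
{
	assert(kswapd_order(0) == 0);	/* nothing raised yet: order-0 reclaim */
	raise_kswapd_order(3);
	assert(kswapd_order(0) == 3);	/* idle wakeups now reclaim at order 3 */
	assert(kswapd_order(5) == 5);	/* a larger request still wins */
	raise_kswapd_order(1);
	assert(kswapd_order(0) == 3);	/* the minimum is never lowered */
	raise_kswapd_order(MAX_ORDER);
	assert(kswapd_order(0) == 3);	/* invalid orders are dropped */
}
```

The value only ever grows, so once any caller has raised it, kswapd keeps
reclaiming at that order or higher.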

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 include/linux/mmzone.h |    1 +
 mm/slub.c              |    1 +
 mm/vmscan.c            |   34 +++++++++++++++++++++++++++++++---
 3 files changed, 33 insertions(+), 3 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-001_kswapd_minorder/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h	2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-001_kswapd_minorder/include/linux/mmzone.h	2007-05-14 17:09:39.000000000 +0100
@@ -499,6 +499,7 @@ typedef struct pglist_data {
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
+void raise_kswapd_order(unsigned int order);
 void wakeup_kswapd(struct zone *zone, int order);
 int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 		int classzone_idx, int alloc_flags);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/slub.c linux-2.6.21-mm2-001_kswapd_minorder/mm/slub.c
--- linux-2.6.21-mm2-clean/mm/slub.c	2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-001_kswapd_minorder/mm/slub.c	2007-05-14 17:09:39.000000000 +0100
@@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
 static int __init setup_slub_min_order(char *str)
 {
 	get_option (&str, &slub_min_order);
+	raise_kswapd_order(slub_min_order);
 	user_override = 1;
 	return 1;
 }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/vmscan.c linux-2.6.21-mm2-001_kswapd_minorder/mm/vmscan.c
--- linux-2.6.21-mm2-clean/mm/vmscan.c	2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-001_kswapd_minorder/mm/vmscan.c	2007-05-14 17:09:39.000000000 +0100
@@ -1407,6 +1407,34 @@ out:
 	return nr_reclaimed;
 }
 
+static unsigned int kswapd_min_order __read_mostly;
+
+static inline int kswapd_order(unsigned int order)
+{
+	return max(kswapd_min_order, order);
+}
+
+/**
+ * raise_kswapd_order - Raise the minimum order that kswapd reclaims
+ * @order: The minimum order kswapd should reclaim at
+ *
+ * kswapd normally reclaims at order 0 unless there is a higher-order
+ * allocation being serviced. This function is used to set the minimum
+ * order that kswapd reclaims at when it is known there will be regular
+ * high-order allocations at a given order.
+ */
+void raise_kswapd_order(unsigned int order)
+{
+	if (order >= MAX_ORDER)
+		return;
+
+	/* Update order if necessary and inform if changed */
+	if (order > kswapd_min_order) {
+		kswapd_min_order = order;
+		printk(KERN_INFO "kswapd reclaim order set to %d\n", order);
+	}
+}
+
 /*
  * The background pageout daemon, started as a kernel thread
  * from the init process. 
@@ -1450,12 +1478,12 @@ static int kswapd(void *p)
 	 */
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 
-	order = 0;
+	order = kswapd_order(0);
 	for ( ; ; ) {
 		unsigned long new_order;
 
 		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
-		new_order = pgdat->kswapd_max_order;
+		new_order = kswapd_order(pgdat->kswapd_max_order);
 		pgdat->kswapd_max_order = 0;
 		if (order < new_order) {
 			/*
@@ -1467,7 +1495,7 @@ static int kswapd(void *p)
 			if (!freezing(current))
 				schedule();
 
-			order = pgdat->kswapd_max_order;
+			order = kswapd_order(pgdat->kswapd_max_order);
 		}
 		finish_wait(&pgdat->kswapd_wait, &wait);
 


* [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-14 17:32 [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Mel Gorman
  2007-05-14 17:32 ` [PATCH 1/2] Have kswapd keep a minimum order free other than order-0 Mel Gorman
@ 2007-05-14 17:32 ` Mel Gorman
  2007-05-16 12:14   ` Nick Piggin
  2007-05-14 18:13 ` [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Nicolas Mailhot
  2 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-14 17:32 UTC (permalink / raw)
  To: nicolas.mailhot, clameter, apw; +Cc: Mel Gorman, akpm, linux-mm

zone_watermark_ok() checks if there are enough free pages including a reserve.
High-order allocations additionally check if there are enough free high-order
pages in relation to the watermark adjusted based on the requested size. If
there are not enough free high-order pages available, 0 is returned so that
the caller enters direct reclaim.

ALLOC_HIGH and ALLOC_HARDER allocations are allowed to dip further into
the reserves, but the check still takes into account whether the number of
free high-order pages meets the adjusted watermarks. As these allocations
cannot sleep, they cannot enter direct reclaim, so the allocation can fail
even though the pages are available and the number of free pages is well
above the watermark for order-0.
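
To make the reserve arithmetic concrete, here is a small sketch of how the
two flags shrink the order-0 watermark. The flag values and the helper names
adjusted_min() and adjusted_min_example() are assumptions for illustration;
only the halving and quartering steps are taken from zone_watermark_ok().

```c
#include <assert.h>

/* Assumed flag values for this sketch; the kernel defines its own. */
#define ALLOC_HARDER 0x10
#define ALLOC_HIGH   0x20

/* Watermark left after the flags are allowed to dip into the reserve. */
long adjusted_min(long min, int alloc_flags)
{
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	return min;
}

/* Hypothetical worked example with a watermark of 1024 pages. */
void adjusted_min_example(void)
{
	assert(adjusted_min(1024, 0) == 1024);
	/* ALLOC_HIGH alone: reduced by a half. */
	assert(adjusted_min(1024, ALLOC_HIGH) == 512);
	/* ALLOC_HARDER alone: reduced by a quarter. */
	assert(adjusted_min(1024, ALLOC_HARDER) == 768);
	/* Both: a half, then a quarter of the remainder, i.e. 5/8ths in total. */
	assert(adjusted_min(1024, ALLOC_HIGH | ALLOC_HARDER) == 384);
}
```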

This patch alters the behaviour of zone_watermark_ok() slightly. Watermarks
are still obeyed but when an allocation is flagged ALLOC_HIGH or ALLOC_HARDER,
we only check that there is sufficient memory over the reserve to satisfy
the allocation; the allocation size is ignored.  This patch also better
documents what zone_watermark_ok() is doing.
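
The change in behaviour can be sketched with a toy model of the check. This
is a simplified user-space stand-in, not the kernel function: struct toy_zone,
toy_watermark_ok(), the 'patched' switch and the flag values are all
hypothetical, and the tail of the per-order loop (halving min and comparing)
is an assumption about code that the hunk below does not show.

```c
#include <assert.h>

#define MAX_ORDER    11		/* assumption: common default */
#define ALLOC_HARDER 0x10	/* assumed flag values */
#define ALLOC_HIGH   0x20

/* Toy zone: nr_free[o] counts free blocks of 2^o pages. */
struct toy_zone {
	unsigned long nr_free[MAX_ORDER];
	long lowmem_reserve;
};

static long total_free_pages(const struct toy_zone *z)
{
	long pages = 0;

	for (int o = 0; o < MAX_ORDER; o++)
		pages += (long)(z->nr_free[o] << o);
	return pages;
}

/*
 * Returns 1 if the allocation may proceed. 'patched' selects whether the
 * early return for ALLOC_HIGH/ALLOC_HARDER callers is applied.
 */
int toy_watermark_ok(const struct toy_zone *z, int order, long mark,
		     int alloc_flags, int patched)
{
	long min = mark;
	long free_pages = total_free_pages(z) - (1L << order) + 1;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* Order-0 check: total free pages must exceed the reserve. */
	if (free_pages <= min + z->lowmem_reserve)
		return 0;

	/* Callers that cannot sleep skip the per-order check when patched. */
	if (patched && (alloc_flags & (ALLOC_HARDER | ALLOC_HIGH)))
		return 1;

	for (int o = 0; o < order; o++) {
		/* Pages at lower orders cannot satisfy this request. */
		free_pages -= (long)(z->nr_free[o] << o);
		min >>= 1;	/* assumed: smaller reserve at higher orders */
		if (free_pages <= min)
			return 0;
	}
	return 1;
}

/* Hypothetical scenario: plenty of order-0 pages, one order-3 block. */
void watermark_example(void)
{
	struct toy_zone z = { .nr_free = { [0] = 1000, [3] = 1 } };
	int atomic = ALLOC_HIGH | ALLOC_HARDER;

	/* Unpatched: fails although an order-3 block is free. */
	assert(toy_watermark_ok(&z, 3, 128, atomic, 0) == 0);
	/* Patched: the order-0 check passes, so the caller may try. */
	assert(toy_watermark_ok(&z, 3, 128, atomic, 1) == 1);
	/* Callers that can sleep still get the per-order check. */
	assert(toy_watermark_ok(&z, 3, 128, 0, 1) == 0);
}
```

In this model a sleeping caller still sees a failure and would go on to
direct reclaim, while an atomic caller gets to attempt the allocation as long
as the absolute reserve is respected.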

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 page_alloc.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-001_kswapd_minorder/mm/page_alloc.c linux-2.6.21-mm2-005_nowait_nowatermark/mm/page_alloc.c
--- linux-2.6.21-mm2-001_kswapd_minorder/mm/page_alloc.c	2007-05-14 17:11:37.000000000 +0100
+++ linux-2.6.21-mm2-005_nowait_nowatermark/mm/page_alloc.c	2007-05-14 17:12:40.000000000 +0100
@@ -1280,13 +1280,34 @@ int zone_watermark_ok(struct zone *z, in
 	long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
 	int o;
 
+	/*
+	 * Allow ALLOC_HIGH and ALLOC_HARDER to dip further into reserves
+	 * ALLOC_HIGH              => Reduce the required reserve by a half
+	 * ALLOC_HARDER            => Reduce the required reserve by a quarter
+	 * ALLOC_HIGH|ALLOC_HARDER => Reduce the required reserve by 5/8ths
+	 */
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
 	if (alloc_flags & ALLOC_HARDER)
 		min -= min / 4;
 
+	/* Ensure there are sufficient total pages less the reserve. */
 	if (free_pages <= min + z->lowmem_reserve[classzone_idx])
 		return 0;
+	
+	/*
+	 * If the allocation is flagged ALLOC_HARDER or ALLOC_HIGH, the
+	 * caller cannot enter direct reclaim, so allow them to take a page
+	 * if one exists as the absolute reserves have been met.
+	 */
+	if (alloc_flags & (ALLOC_HARDER | ALLOC_HIGH))
+		return 1;
+
+	/*
+	 * For higher order allocations that can sleep, check that there
+	 * are enough free high-order pages above a reserve adjusted
+	 * based on the requested order.
+	 */
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
 		free_pages -= z->free_area[o].nr_free << o;


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 17:32 ` [PATCH 1/2] Have kswapd keep a minimum order free other than order-0 Mel Gorman
@ 2007-05-14 18:01   ` Christoph Lameter
  2007-05-14 18:13     ` Christoph Lameter
  2007-05-14 18:19     ` Mel Gorman
  0 siblings, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2007-05-14 18:01 UTC (permalink / raw)
  To: Mel Gorman; +Cc: apw, nicolas.mailhot, akpm, linux-mm

On Mon, 14 May 2007, Mel Gorman wrote:

> +++ linux-2.6.21-mm2-001_kswapd_minorder/mm/slub.c	2007-05-14 17:09:39.000000000 +0100
> @@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
>  static int __init setup_slub_min_order(char *str)
>  {
>  	get_option (&str, &slub_min_order);
> +	raise_kswapd_order(slub_min_order);
>  	user_override = 1;
>  	return 1;
>  }

You need to do this for slub_max_order not for slub_min_order. Also the
slub_max_order may not necessarily be used. It is just the maximum allowed 
order. I could maintain a slub_max_used_order variable. When that is 
increased I could call raise_kswapd_order?

The same call needs to be put into kmem_cache_init? Or is this only for 
orders > 3?


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 18:01   ` Christoph Lameter
@ 2007-05-14 18:13     ` Christoph Lameter
  2007-05-14 18:24       ` Mel Gorman
  2007-05-15  4:39       ` Christoph Lameter
  2007-05-14 18:19     ` Mel Gorman
  1 sibling, 2 replies; 39+ messages in thread
From: Christoph Lameter @ 2007-05-14 18:13 UTC (permalink / raw)
  To: Mel Gorman; +Cc: apw, nicolas.mailhot, akpm, linux-mm

I think the slub fragment may have to be this way? This calls 
raise_kswapd_order on each kmem_cache_create with the order of the cache 
that was created, thus ensuring that the min_order is set correctly.
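
A minimal sketch of the idea, assuming a raise-only minimum as in Mel's
patch: calling the hook once per cache yields the largest order among all
caches, whatever order they are created in. struct toy_cache,
toy_cache_open(), min_order_for() and per_cache_example() are hypothetical
stand-ins, and MAX_ORDER is assumed to be 11.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 11	/* assumption: common default */

static unsigned int kswapd_min_order;

static void raise_kswapd_order(unsigned int order)
{
	if (order < MAX_ORDER && order > kswapd_min_order)
		kswapd_min_order = order;
}

/* Toy stand-in for struct kmem_cache; only the field used here. */
struct toy_cache {
	unsigned int order;
};

/* Stand-in for the point in kmem_cache_open() where the call is added. */
static void toy_cache_open(const struct toy_cache *s)
{
	raise_kswapd_order(s->order);
}

/* Open every cache in turn and report the resulting kswapd minimum. */
unsigned int min_order_for(const struct toy_cache *caches, size_t n)
{
	kswapd_min_order = 0;
	for (size_t i = 0; i < n; i++)
		toy_cache_open(&caches[i]);
	return kswapd_min_order;
}

/* Hypothetical example: the result does not depend on creation order. */
void per_cache_example(void)
{
	const struct toy_cache a[] = { { 0 }, { 3 }, { 1 } };
	const struct toy_cache b[] = { { 3 }, { 1 }, { 0 } };

	assert(min_order_for(a, 3) == 3);
	assert(min_order_for(b, 3) == 3);
}
```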

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/slub.c |    1 +
 1 file changed, 1 insertion(+)

Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c	2007-05-14 11:10:37.000000000 -0700
+++ slub/mm/slub.c	2007-05-14 11:10:55.000000000 -0700
@@ -1996,6 +1996,7 @@ static int kmem_cache_open(struct kmem_c
 #ifdef CONFIG_NUMA
 	s->defrag_ratio = 100;
 #endif
+	raise_kswapd_order(s->order);
 
 	if (init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
 		return 1;


* Re: [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations
  2007-05-14 17:32 [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Mel Gorman
  2007-05-14 17:32 ` [PATCH 1/2] Have kswapd keep a minimum order free other than order-0 Mel Gorman
  2007-05-14 17:32 ` [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations Mel Gorman
@ 2007-05-14 18:13 ` Nicolas Mailhot
  2 siblings, 0 replies; 39+ messages in thread
From: Nicolas Mailhot @ 2007-05-14 18:13 UTC (permalink / raw)
  To: Mel Gorman; +Cc: apw, clameter, akpm, linux-mm

On Monday 14 May 2007 at 18:32 +0100, Mel Gorman wrote:

> Nicolas, I would appreciate it if you would test 2.6.21-mm2 with both of these
> patches applied. They have changed in a number of respects from what I
> sent you over the weekend and I'd like to be sure the fix still works.

I can test, but probably not as thoroughly as these past days. I can't
have my system maxed out for days in a row, you know :)

-- 
Nicolas Mailhot


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 18:01   ` Christoph Lameter
  2007-05-14 18:13     ` Christoph Lameter
@ 2007-05-14 18:19     ` Mel Gorman
  1 sibling, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2007-05-14 18:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: apw, nicolas.mailhot, akpm, linux-mm

On Mon, 14 May 2007, Christoph Lameter wrote:

> On Mon, 14 May 2007, Mel Gorman wrote:
>
>> +++ linux-2.6.21-mm2-001_kswapd_minorder/mm/slub.c	2007-05-14 17:09:39.000000000 +0100
>> @@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
>>  static int __init setup_slub_min_order(char *str)
>>  {
>>  	get_option (&str, &slub_min_order);
>> +	raise_kswapd_order(slub_min_order);
>>  	user_override = 1;
>>  	return 1;
>>  }
>
> You need to do this for slub_max_order not for slub_min_order.

The intention is to have kswapd keep high-order pages free of an order
that is known to be of interest. Hence I used slub_min_order because it's
known to be used regularly. By default, the value is 0, but if
slub_min_order is set higher, then the kswapd order gets raised.

> Also the slub_max_order may not necessarily be used. It is just the 
> maximum allowed order. I could maintain a slub_max_used_order variable. 
> When that is increased I could call raise_kswapd_order?
>

A slub_max_used_order variable may have been useful but your suggestion
in relation to kmem_cache_open() makes more sense.

> The same call needs to be put into kmem_cache_init? Or is this only for
> orders > 3?
>

With kmem_cache_open(), altering kmem_cache_init seems unnecessary. 
Similarly, calling raise_kswapd_order() when parsing slub_min_order= is 
unnecessary.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 18:13     ` Christoph Lameter
@ 2007-05-14 18:24       ` Mel Gorman
  2007-05-14 18:52         ` Christoph Lameter
  2007-05-15  8:42         ` Nicolas Mailhot
  2007-05-15  4:39       ` Christoph Lameter
  1 sibling, 2 replies; 39+ messages in thread
From: Mel Gorman @ 2007-05-14 18:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: apw, nicolas.mailhot, akpm, linux-mm

On (14/05/07 11:13), Christoph Lameter didst pronounce:
> I think the slub fragment may have to be this way? This calls
> raise_kswapd_order on each kmem_cache_create with the order of the cache
> that was created, thus ensuring that the min_order is set correctly.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 

Good plan. Revised patch as follows;


kswapd normally reclaims at order 0 unless there is a higher-order allocation
currently being serviced. However, in some cases it is known that there is a
minimum order size that is generally required such as when SLUB is configured
to use higher orders for performance reasons.  This patch allows a minimum
order to be set, such that min_free_kbytes pages are kept at higher orders.
This depends on lumpy-reclaim to work.

[clameter@sgi.com: Call raise_kswapd_order() on kmem_cache_open()]
Acked-by: Andy Whitcroft <apw@shadowen.org>

---
 include/linux/mmzone.h |    1 +
 mm/slub.c              |    1 +
 mm/vmscan.c            |   34 +++++++++++++++++++++++++++++++---
 3 files changed, 33 insertions(+), 3 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-001_kswapd_minorder/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h	2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-001_kswapd_minorder/include/linux/mmzone.h	2007-05-14 19:04:48.000000000 +0100
@@ -499,6 +499,7 @@ typedef struct pglist_data {
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
+void raise_kswapd_order(unsigned int order);
 void wakeup_kswapd(struct zone *zone, int order);
 int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 		int classzone_idx, int alloc_flags);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/slub.c linux-2.6.21-mm2-001_kswapd_minorder/mm/slub.c
--- linux-2.6.21-mm2-clean/mm/slub.c	2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-001_kswapd_minorder/mm/slub.c	2007-05-14 19:20:23.000000000 +0100
@@ -2001,6 +2001,7 @@ static int kmem_cache_open(struct kmem_c
 #ifdef CONFIG_NUMA
 	s->defrag_ratio = 100;
 #endif
+	raise_kswapd_order(s->order);
 
 	if (init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
 		return 1;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/vmscan.c linux-2.6.21-mm2-001_kswapd_minorder/mm/vmscan.c
--- linux-2.6.21-mm2-clean/mm/vmscan.c	2007-05-11 21:16:11.000000000 +0100
+++ linux-2.6.21-mm2-001_kswapd_minorder/mm/vmscan.c	2007-05-14 19:04:48.000000000 +0100
@@ -1407,6 +1407,34 @@ out:
 	return nr_reclaimed;
 }
 
+static unsigned int kswapd_min_order __read_mostly;
+
+static inline int kswapd_order(unsigned int order)
+{
+	return max(kswapd_min_order, order);
+}
+
+/**
+ * raise_kswapd_order - Raise the minimum order that kswapd reclaims
+ * @order: The minimum order kswapd should reclaim at
+ *
+ * kswapd normally reclaims at order 0 unless there is a higher-order
+ * allocation being serviced. This function is used to set the minimum
+ * order that kswapd reclaims at when it is known there will be regular
+ * high-order allocations at a given order.
+ */
+void raise_kswapd_order(unsigned int order)
+{
+	if (order >= MAX_ORDER)
+		return;
+
+	/* Update order if necessary and inform if changed */
+	if (order > kswapd_min_order) {
+		kswapd_min_order = order;
+		printk(KERN_INFO "kswapd reclaim order set to %d\n", order);
+	}
+}
+
 /*
  * The background pageout daemon, started as a kernel thread
  * from the init process. 
@@ -1450,12 +1478,12 @@ static int kswapd(void *p)
 	 */
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 
-	order = 0;
+	order = kswapd_order(0);
 	for ( ; ; ) {
 		unsigned long new_order;
 
 		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
-		new_order = pgdat->kswapd_max_order;
+		new_order = kswapd_order(pgdat->kswapd_max_order);
 		pgdat->kswapd_max_order = 0;
 		if (order < new_order) {
 			/*
@@ -1467,7 +1495,7 @@ static int kswapd(void *p)
 			if (!freezing(current))
 				schedule();
 
-			order = pgdat->kswapd_max_order;
+			order = kswapd_order(pgdat->kswapd_max_order);
 		}
 		finish_wait(&pgdat->kswapd_wait, &wait);
 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 18:24       ` Mel Gorman
@ 2007-05-14 18:52         ` Christoph Lameter
  2007-05-15  8:42         ` Nicolas Mailhot
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2007-05-14 18:52 UTC (permalink / raw)
  To: Mel Gorman; +Cc: apw, nicolas.mailhot, akpm, linux-mm

On Mon, 14 May 2007, Mel Gorman wrote:

> On (14/05/07 11:13), Christoph Lameter didst pronounce:
> > I think the slub fragment may have to be this way? This calls
> > raise_kswapd_order on each kmem_cache_create with the order of the cache
> > that was created, thus ensuring that the min_order is set correctly.
> > 
> > Signed-off-by: Christoph Lameter <clameter@sgi.com>
> > 
> 
> Good plan. Revised patch as follows;

Acked-by: Christoph Lameter <clameter@sgi.com>


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 18:13     ` Christoph Lameter
  2007-05-14 18:24       ` Mel Gorman
@ 2007-05-15  4:39       ` Christoph Lameter
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2007-05-15  4:39 UTC (permalink / raw)
  To: Mel Gorman; +Cc: apw, nicolas.mailhot, akpm, linux-mm

On third thought: The trouble with this solution is that we will now set 
the order to that used by the largest kmalloc cache. Bad... this could be 
6 on i386 to 13 if CONFIG_LARGE_ALLOCs is set. The large kmalloc caches 
are rarely used and we are used to OOMing if those are utilized too
frequently.

I guess we should only set this for non-kmalloc caches then.
So move the call into kmem_cache_create? Would make the min order 3 on
most of my mm machines.
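
The effect of moving the call can be sketched as follows. This is only a
model of the reasoning in this message: it assumes, as the text implies, that
the kmalloc caches are set up through the common open path without going
through kmem_cache_create(). The cache names, the orders (6 for a large
kmalloc cache, 3 for a named cache), the 'enum hook' switch and MAX_ORDER
of 11 are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 11	/* assumption: common default */

static unsigned int kswapd_min_order;

static void raise_kswapd_order(unsigned int order)
{
	if (order < MAX_ORDER && order > kswapd_min_order)
		kswapd_min_order = order;
}

struct toy_cache {
	const char *name;
	unsigned int order;
};

/* Where the raise_kswapd_order() call is placed in this model. */
enum hook { HOOK_IN_OPEN, HOOK_IN_CREATE };

/* Common setup used for every cache, including the kmalloc array. */
static void toy_cache_open(const struct toy_cache *s, enum hook where)
{
	if (where == HOOK_IN_OPEN)
		raise_kswapd_order(s->order);
}

/* Public creation path; the kmalloc caches are assumed to bypass it. */
static void toy_cache_create(const struct toy_cache *s, enum hook where)
{
	toy_cache_open(s, where);
	if (where == HOOK_IN_CREATE)
		raise_kswapd_order(s->order);
}

/* Set up one large kmalloc cache and two named caches, report the minimum. */
unsigned int min_order_after_boot(enum hook where)
{
	const struct toy_cache kmalloc_big = { "kmalloc-big", 6 };
	const struct toy_cache named[] = {
		{ "example_cache_a", 3 },
		{ "example_cache_b", 0 },
	};

	kswapd_min_order = 0;
	toy_cache_open(&kmalloc_big, where);
	for (size_t i = 0; i < sizeof(named) / sizeof(named[0]); i++)
		toy_cache_create(&named[i], where);
	return kswapd_min_order;
}

/* Hypothetical comparison of the two hook placements. */
void hook_placement_example(void)
{
	/* Hook in the open path: the rarely used large cache dominates. */
	assert(min_order_after_boot(HOOK_IN_OPEN) == 6);
	/* Hook in the create path: only named caches count. */
	assert(min_order_after_boot(HOOK_IN_CREATE) == 3);
}
```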

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/slub.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c	2007-05-14 21:33:48.000000000 -0700
+++ slub/mm/slub.c	2007-05-14 21:35:40.000000000 -0700
@@ -1996,8 +1996,6 @@ static int kmem_cache_open(struct kmem_c
 #ifdef CONFIG_NUMA
 	s->defrag_ratio = 100;
 #endif
-	raise_kswapd_order(s->order);
-
 	if (init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
 		return 1;
 error:
@@ -2560,6 +2558,7 @@ struct kmem_cache *kmem_cache_create(con
 				goto err;
 			}
 			list_add(&s->list, &slab_caches);
+			raise_kswapd_order(s->order);
 		} else
 			kfree(s);
 	}


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-14 18:24       ` Mel Gorman
  2007-05-14 18:52         ` Christoph Lameter
@ 2007-05-15  8:42         ` Nicolas Mailhot
  2007-05-15  9:16           ` Mel Gorman
  2007-05-15 17:09           ` Christoph Lameter
  1 sibling, 2 replies; 39+ messages in thread
From: Nicolas Mailhot @ 2007-05-15  8:42 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Christoph Lameter, apw, akpm, linux-mm

On Monday 14 May 2007 at 19:24 +0100, Mel Gorman wrote:
> On (14/05/07 11:13), Christoph Lameter didst pronounce:
> > I think the slub fragment may have to be this way? This calls
> > raise_kswapd_order on each kmem_cache_create with the order of the cache
> > that was created, thus ensuring that the min_order is set correctly.
> > 
> > Signed-off-by: Christoph Lameter <clameter@sgi.com>
> > 
> 
> Good plan. Revised patch as follows;

Kernel with this patch and the other one survives testing. I'll stop
heavy testing now and consider the issue closed.

Thanks for looking at my bug report.

-- 
Nicolas Mailhot


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-15  8:42         ` Nicolas Mailhot
@ 2007-05-15  9:16           ` Mel Gorman
  2007-05-16  8:25             ` Nick Piggin
  2007-05-15 17:09           ` Christoph Lameter
  1 sibling, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-15  9:16 UTC (permalink / raw)
  To: Nicolas Mailhot
  Cc: Christoph Lameter, Andy Whitcroft, akpm, Linux Memory Management List

On Tue, 15 May 2007, Nicolas Mailhot wrote:

> On Monday 14 May 2007 at 19:24 +0100, Mel Gorman wrote:
>> On (14/05/07 11:13), Christoph Lameter didst pronounce:
>>> I think the slub fragment may have to be this way? This calls
>>> raise_kswapd_order on each kmem_cache_create with the order of the cache
>>> that was created, thus ensuring that the min_order is set correctly.
>>>
>>> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>>>
>>
>> Good plan. Revised patch as follows;
>
> Kernel with this patch and the other one survives testing. I'll stop
> heavy testing now and consider the issue closed.
>

That is good news, thanks for the report.

> Thanks for looking at my bug report.
>

Thank you very much for your testing. I know it was a lot to ask to tie a 
machine up for a few days.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-15  8:42         ` Nicolas Mailhot
  2007-05-15  9:16           ` Mel Gorman
@ 2007-05-15 17:09           ` Christoph Lameter
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2007-05-15 17:09 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Mel Gorman, apw, akpm, linux-mm

On Tue, 15 May 2007, Nicolas Mailhot wrote:

> Kernel with this patch and the other one survives testing. I'll stop
> heavy testing now and consider the issue closed.
> 
> Thanks for looking at my bug report.

Wow! This really works Mel! So I can start the work on merging the large 
buffer size / variable order page cache next? This is going to put some 
more pressure on the antifrag patchset.


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-15  9:16           ` Mel Gorman
@ 2007-05-16  8:25             ` Nick Piggin
  2007-05-16  9:03               ` Mel Gorman
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16  8:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:
> On Tue, 15 May 2007, Nicolas Mailhot wrote:
> 
>> On Monday 14 May 2007 at 19:24 +0100, Mel Gorman wrote:
>>
>>> On (14/05/07 11:13), Christoph Lameter didst pronounce:
>>>
>>>> I think the slub fragment may have to be this way? This calls
>>>> raise_kswapd_order on each kmem_cache_create with the order of the cache
>>>> that was created, thus ensuring that the min_order is set correctly.
>>>>
>>>> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>>>>
>>>
>>> Good plan. Revised patch as follows;
>>
>>
>> Kernel with this patch and the other one survives testing. I'll stop
>> heavy testing now and consider the issue closed.
>>
> 
> That is good news, thanks for the report.
> 
>> Thanks for looking at my bug report.
>>
> 
> Thank you very much for your testing. I know it was a lot to ask to tie 
> a machine up for a few days.

Hmm, so we require higher order pages be kept free even if nothing is
using them? That's not very nice :(

-- 
SUSE Labs, Novell Inc.


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16  8:25             ` Nick Piggin
@ 2007-05-16  9:03               ` Mel Gorman
  2007-05-16  9:10                 ` Nick Piggin
  0 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-16  9:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On Wed, 16 May 2007, Nick Piggin wrote:

> Mel Gorman wrote:
>> On Tue, 15 May 2007, Nicolas Mailhot wrote:
>> 
>>> Le lundi 14 mai 2007 à 19:24 +0100, Mel Gorman a écrit :
>>> 
>>>> On (14/05/07 11:13), Christoph Lameter didst pronounce:
>>>> 
>>>>> I think the slub fragment may have to be this way? This calls
>>>>> raise_kswapd_order on each kmem_cache_create with the order of the cache
>>>>> that was created, thus ensuring that the min_order is set correctly.
>>>>> 
>>>>> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>>>>> 
>>>> 
>>>> Good plan. Revised patch as follows;
>>> 
>>> 
>>> Kernel with this patch and the other one survives testing. I'll stop
>>> heavy testing now and consider the issue closed.
>>> 
>> 
>> That is good news, thanks for the report.
>> 
>>> Thanks for looking at my bug report.
>>> 
>> 
>> Thank you very much for your testing. I know it was a lot to ask to tie a 
>> machine up for a few days.
>
> Hmm, so we require higher order pages be kept free even if nothing is
> using them? That's not very nice :(
>

Not quite. We are already required to keep a minimum number of pages free 
even though nothing is using them. The difference is that if it is known 
high-order allocations are frequently required, the freed pages will be 
contiguous. If no one calls raise_kswapd_order(), kswapd will continue 
reclaiming at order-0. Arguably, e1000 should also be calling 
raise_kswapd_order() when it is using jumbo frames.
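
A minimal userspace sketch of the behaviour described above, not the code
from the patch. The variable kswapd_min_order, the helper
kswapd_reclaim_order(), the signatures and the MAX_ORDER value are all
assumptions made for illustration. The minimum order only ever goes up, and
a wakeup that asks for a higher order still gets that higher order.

```c
#include <assert.h>

#define MAX_ORDER 11	/* assumption: orders 0..10 */

/* Hypothetical stand-in for the minimum order kswapd reclaims at. */
static int kswapd_min_order = 0;

/* Model of raise_kswapd_order(): the minimum is raised, never lowered. */
static void raise_kswapd_order(int order)
{
	assert(order >= 0 && order < MAX_ORDER);
	if (order > kswapd_min_order)
		kswapd_min_order = order;
}

/* Order kswapd would reclaim at for a wakeup requesting req_order. */
static int kswapd_reclaim_order(int req_order)
{
	assert(req_order >= 0 && req_order < MAX_ORDER);
	return req_order > kswapd_min_order ? req_order : kswapd_min_order;
}
```

If nothing ever calls raise_kswapd_order() in this model, an order-0 wakeup
is reclaimed at order-0 exactly as before.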

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16  9:03               ` Mel Gorman
@ 2007-05-16  9:10                 ` Nick Piggin
  2007-05-16  9:45                   ` Mel Gorman
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16  9:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:
> On Wed, 16 May 2007, Nick Piggin wrote:

>> Hmm, so we require higher order pages be kept free even if nothing is
>> using them? That's not very nice :(
>>
> 
> Not quite. We are already required to keep a minimum number of pages 
> free even though nothing is using them. The difference is that if it is 
> known high-order allocations are frequently required, the freed pages 
> will be contiguous. If no one calls raise_kswapd_order(), kswapd will 
> continue reclaiming at order-0.

And after they are stopped being used, it falls back to order-0? Why
can't this use the infrastructure that is already in place for that?


> Arguably, e1000 should also be calling 
> raise_kswapd_order() when it is using jumbo frames.

It should be able to handle higher order page allocation failures
gracefully. kswapd will be notified of the attempts and go on and try
to free up some higher order pages for it for next time. What is wrong
with this process? Are the higher order watermarks insufficient?

(I would also add that non-arguably, e1000 should also be able to do
scatter gather with jumbo frames too.)

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16  9:10                 ` Nick Piggin
@ 2007-05-16  9:45                   ` Mel Gorman
  2007-05-16 12:28                     ` Nick Piggin
  0 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-16  9:45 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On Wed, 16 May 2007, Nick Piggin wrote:

> Mel Gorman wrote:
>> On Wed, 16 May 2007, Nick Piggin wrote:
>
>>> Hmm, so we require higher order pages be kept free even if nothing is
>>> using them? That's not very nice :(
>>> 
>> 
>> Not quite. We are already required to keep a minimum number of pages free 
>> even though nothing is using them. The difference is that if it is known 
>> high-order allocations are frequently required, the freed pages will be 
>> contiguous. If no one calls raise_kswapd_order(), kswapd will continue 
>> reclaiming at order-0.
>
> And after they are stopped being used, it falls back to order-0?

No, raise_kswapd_order() is used when it is known there are many 
high-order allocations of a particular value. It becomes the minimum value 
kswapd reclaims at. SLUB does not *require* high order allocations but can 
be configured to use them so it makes sense to keep min_free_kbytes at 
that order to reduce stalls due to direct reclaim.

> Why
> can't this use the infrastructure that is already in place for that?
>

The infrastructure there currently deals nicely with the situation where 
there are rarely allocations of a high order. This change is for when it 
is known there are frequent high-order (e.g. orders 1-4) allocations. 
While the callers often can direct reclaim, kswapd should help them avoid 
stalls because reducing stalls is one of its functions. With this patch, 
kswapd still reclaims the same number of pages, just tries to reclaim 
contiguous ones.

>> Arguably, e1000 should also be calling raise_kswapd_order() when it is 
>> using jumbo frames.
>
> It should be able to handle higher order page allocation failures
> gracefully.

Has something changed recently that it can handle failures? It might have 
because it has been hinted that it's possible, just not very fast.

> kswapd will be notified of the attempts and go on and try
> to free up some higher order pages for it for next time. What is wrong
> with this process?

It's reactive, it only occurs when a process has already entered direct 
reclaim.

> Are the higher order watermarks insufficient?
>

The high-order watermarks are still used to make a process that can sleep 
enter direct reclaim when the higher order watermarks are not being met.

> (I would also add that non-arguably, e1000 should also be able to do
> scatter gather with jumbo frames too.)
>

That's another football that has done the laps.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-14 17:32 ` [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations Mel Gorman
@ 2007-05-16 12:14   ` Nick Piggin
  2007-05-16 13:24     ` Mel Gorman
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 12:14 UTC (permalink / raw)
  To: Mel Gorman; +Cc: nicolas.mailhot, clameter, apw, akpm, linux-mm

Mel Gorman wrote:
> zone_watermark_ok() checks if there are enough free pages including a reserve.
> High-order allocations additionally check if there are enough free high-order
> pages in relation to the watermark adjusted based on the requested size. If
> there are not enough free high-order pages available, 0 is returned so that
> the caller enters direct reclaim.
> 
> ALLOC_HIGH and ALLOC_HARDER allocations are allowed to dip further into
> the reserves but also take into account if the number of free high-order
> pages meet the adjusted watermarks. As these allocations cannot sleep,

Why can't ALLOC_HIGH or ALLOC_HARDER sleep? This patch seems wrong to
me.

> they cannot enter direct reclaim so the allocation can fail even though
> the pages are available and the number of free pages is well above the
> watermark for order-0.
> 
> This patch alters the behaviour of zone_watermark_ok() slightly. Watermarks
> are still obeyed but when an allocator is flagged ALLOC_HIGH or ALLOC_HARDER,
> we only check that there is sufficient memory over the reserve to satisfy
> the allocation, allocation size is ignored.  This patch also documents
> better what zone_watermark_ok() is doing.

This is wrong because now you lose the buffering of higher order pages
for more urgent allocation classes against less urgent ones.

Think of how the order-0 allocation buffering works with the watermarks
and consider that we're trying to do the same exact thing for higher order
allocations here.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16  9:45                   ` Mel Gorman
@ 2007-05-16 12:28                     ` Nick Piggin
  2007-05-16 13:50                       ` Mel Gorman
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 12:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:
> On Wed, 16 May 2007, Nick Piggin wrote:
> 
>> Mel Gorman wrote:
>>
>>> On Wed, 16 May 2007, Nick Piggin wrote:
>>
>>
>>>> Hmm, so we require higher order pages be kept free even if nothing is
>>>> using them? That's not very nice :(
>>>>
>>>
>>> Not quite. We are already required to keep a minimum number of pages 
>>> free even though nothing is using them. The difference is that if it 
>>> is known high-order allocations are frequently required, the freed 
>>> pages will be contiguous. If no one calls raise_kswapd_order(), 
>>> kswapd will continue reclaiming at order-0.
>>
>>
>> And after they are stopped being used, it falls back to order-0?
> 
> 
> No, raise_kswapd_order() is used when it is known there are many 
> high-order allocations of a particular value. It becomes the minimum 
> value kswapd reclaims at. SLUB does not *require* high order allocations 
> but can be configured to use them so it makes sense to keep 
> min_free_kbytes at that order to reduce stalls due to direct reclaim.

The point is you still might not have anything performing those
allocations from those higher order caches. Or you might have things
that are doing higher order allocations, but not via slab.

Basically this is dumbing down the existing higher order watermarking
already there in favour of a worse special case AFAIKS.


>> Why
>> can't this use the infrastructure that is already in place for that?
>>
> 
> The infrastructure there currently deals nicely with the situation where 
> there are rarely allocations of a high order. This change is for when it 
> is known there are frequent high-order (e.g. orders 1-4) allocations. 
> While the callers often can direct reclaim, kswapd should help them 
> avoid stalls because reducing stalls is one of its functions. With this 
> patch, kswapd still reclaims the same number of pages, just tries to 
> reclaim contiguous ones.

kswapd already does reclaim on behalf of non-sleeping higher order
allocations (or at least it does in mainline).


>>> Arguably, e1000 should also be calling raise_kswapd_order() when it 
>>> is using jumbo frames.
>>
>>
>> It should be able to handle higher order page allocation failures
>> gracefully.
> 
> 
> Has something changed recently that it can handle failures? It might 
> have because it has been hinted that it's possible, just not very fast.

I don't know, but it is stupid if it can't.
It should not be too hard to keep it fast where it is fast today, and have
it at least work where it would otherwise fail... just by reserving some
memory pages in case none can be allocated.


>> kswapd will be notified of the attempts and go on and try
>> to free up some higher order pages for it for next time. What is wrong
>> with this process?
> 
> 
> It's reactive, it only occurs when a process has already entered direct 
> reclaim.

No it should not be. It should be proactive even for higher order allocations.
All this stuff used to work properly :(


>> Are the higher order watermarks insufficient?
>>
> 
> The high-order watermarks are still used to make a process that can 
> sleep enter direct reclaim when the higher order watermarks are not 
> being met.
>
>> (I would also add that non-arguably, e1000 should also be able to do
>> scatter gather with jumbo frames too.)
>>
> 
> That's another football that has done the laps.

I think the hardware can do it.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 12:14   ` Nick Piggin
@ 2007-05-16 13:24     ` Mel Gorman
  2007-05-16 13:35       ` Nick Piggin
  0 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 13:24 UTC (permalink / raw)
  To: Nick Piggin; +Cc: nicolas.mailhot, clameter, apw, akpm, linux-mm

On (16/05/07 22:14), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >zone_watermark_ok() checks if there are enough free pages including a 
> >reserve.
> >High-order allocations additionally check if there are enough free 
> >high-order
> >pages in relation to the watermark adjusted based on the requested size. If
> >there are not enough free high-order pages available, 0 is returned so that
> >the caller enters direct reclaim.
> >
> >ALLOC_HIGH and ALLOC_HARDER allocations are allowed to dip further into
> >the reserves but also take into account if the number of free high-order
> >pages meet the adjusted watermarks. As these allocations cannot sleep,
> 
> Why can't ALLOC_HIGH or ALLOC_HARDER sleep? This patch seems wrong to
> me.
> 

In page_alloc.c

        if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
                alloc_flags |= ALLOC_HARDER;

See the !wait part.

The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
dip into emergency pools and go below the reserve.
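
A minimal userspace sketch of how the two flags are derived, built from the
condition quoted above and the __GFP_HIGH rule. The struct alloc_ctx, the
function name and the flag values are assumptions made for illustration; the
kernel reads these bits from the task and the gfp mask directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's actual values may differ. */
#define ALLOC_HARDER 0x1
#define ALLOC_HIGH   0x2

/* Hypothetical summary of the caller's context. */
struct alloc_ctx {
	bool rt_task;		/* caller is a real-time task */
	bool in_interrupt;	/* running in interrupt context */
	bool wait;		/* caller is allowed to sleep */
	bool gfp_high;		/* __GFP_HIGH was passed */
};

static int derive_alloc_flags(const struct alloc_ctx *c)
{
	int flags = 0;

	assert(c != NULL);
	/* Same shape as the page_alloc.c condition quoted above. */
	if ((c->rt_task && !c->in_interrupt) || !c->wait)
		flags |= ALLOC_HARDER;
	if (c->gfp_high)
		flags |= ALLOC_HIGH;
	return flags;
}
```

In this model a caller that cannot sleep always gets ALLOC_HARDER, but so
does a real-time task outside interrupt context that is allowed to sleep.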

> >they cannot enter direct reclaim so the allocation can fail even though
> >the pages are available and the number of free pages is well above the
> >watermark for order-0.
> >
> >This patch alters the behaviour of zone_watermark_ok() slightly. Watermarks
> >are still obeyed but when an allocator is flagged ALLOC_HIGH or 
> >ALLOC_HARDER,
> >we only check that there is sufficient memory over the reserve to satisfy
> >the allocation, allocation size is ignored.  This patch also documents
> >better what zone_watermark_ok() is doing.
> 
> This is wrong because now you lose the buffering of higher order pages
> for more urgent allocation classes against less urgent ones.
> 

ALLOC_HARDER is an urgent allocation class.

> Think of how the order-0 allocation buffering works with the watermarks
> and consider that we're trying to do the same exact thing for higher order
> allocations here.
> 

What actually happens is that high-order allocations fail even though
the watermarks are met because they cannot enter direct reclaim.

> -- 
> SUSE Labs, Novell Inc.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 13:24     ` Mel Gorman
@ 2007-05-16 13:35       ` Nick Piggin
  2007-05-16 14:00         ` Mel Gorman
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 13:35 UTC (permalink / raw)
  To: Mel Gorman; +Cc: nicolas.mailhot, clameter, apw, akpm, linux-mm

Mel Gorman wrote:
> On (16/05/07 22:14), Nick Piggin didst pronounce:
> 
>>Mel Gorman wrote:
>>
>>>zone_watermark_ok() checks if there are enough free pages including a 
>>>reserve.
>>>High-order allocations additionally check if there are enough free 
>>>high-order
>>>pages in relation to the watermark adjusted based on the requested size. If
>>>there are not enough free high-order pages available, 0 is returned so that
>>>the caller enters direct reclaim.
>>>
>>>ALLOC_HIGH and ALLOC_HARDER allocations are allowed to dip further into
>>>the reserves but also take into account if the number of free high-order
>>>pages meet the adjusted watermarks. As these allocations cannot sleep,
>>
>>Why can't ALLOC_HIGH or ALLOC_HARDER sleep? This patch seems wrong to
>>me.
>>
> 
> 
> In page_alloc.c
> 
>         if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
>                 alloc_flags |= ALLOC_HARDER;
> 
> See the !wait part.

And the || part.


> The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
> dip into emergency pools and go below the reserve.

And some of them can sleep too.


>>>they cannot enter direct reclaim so the allocation can fail even though
>>>the pages are available and the number of free pages is well above the
>>>watermark for order-0.
>>>
>>>This patch alters the behaviour of zone_watermark_ok() slightly. Watermarks
>>>are still obeyed but when an allocator is flagged ALLOC_HIGH or 
>>>ALLOC_HARDER,
>>>we only check that there is sufficient memory over the reserve to satisfy
>>>the allocation, allocation size is ignored.  This patch also documents
>>>better what zone_watermark_ok() is doing.
>>
>>This is wrong because now you lose the buffering of higher order pages
>>for more urgent allocation classes against less urgent ones.
>>
> 
> 
> ALLOC_HARDER is an urgent allocation class.

And HIGH is even more, and MEMALLOC even more again.


>>Think of how the order-0 allocation buffering works with the watermarks
>>and consider that we're trying to do the same exact thing for higher order
>>allocations here.
>>
> 
> 
> What actually happens is that high-order allocations fail even though
> the watermarks are met because they cannot enter direct reclaim.

Yeah, they fail leaving some spare for more urgent allocations. Like
how the order-0 allocations work.

They should also kick kswapd to start freeing pages _before_ they start
failing too.
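
The buffering between allocation classes can be sketched as below, assuming
the adjustments zone_watermark_ok() applied in kernels of this era (half off
for ALLOC_HIGH, a further quarter off for ALLOC_HARDER). The function name
and flag values are illustrative, not the kernel's.

```c
#include <assert.h>

/* Illustrative flag values; the kernel's actual values may differ. */
#define ALLOC_HARDER 0x1
#define ALLOC_HIGH   0x2

/* Threshold of free pages a caller must stay above for a given watermark. */
static long effective_min(long mark, int alloc_flags)
{
	long min = mark;

	assert(mark >= 0);
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	return min;
}
```

With a watermark of 128 pages, a normal caller stops at 128, ALLOC_HARDER at
96, ALLOC_HIGH at 64 and both together at 48, so each more urgent class
still finds pages the less urgent ones were refused.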

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 12:28                     ` Nick Piggin
@ 2007-05-16 13:50                       ` Mel Gorman
  2007-05-16 14:04                         ` Nick Piggin
  2007-05-16 14:20                         ` Nick Piggin
  0 siblings, 2 replies; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 13:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On (16/05/07 22:28), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >On Wed, 16 May 2007, Nick Piggin wrote:
> >
> >>Mel Gorman wrote:
> >>
> >>>On Wed, 16 May 2007, Nick Piggin wrote:
> >>
> >>
> >>>>Hmm, so we require higher order pages be kept free even if nothing is
> >>>>using them? That's not very nice :(
> >>>>
> >>>
> >>>Not quite. We are already required to keep a minimum number of pages 
> >>>free even though nothing is using them. The difference is that if it 
> >>>is known high-order allocations are frequently required, the freed 
> >>>pages will be contiguous. If no one calls raise_kswapd_order(), 
> >>>kswapd will continue reclaiming at order-0.
> >>
> >>
> >>And after they are stopped being used, it falls back to order-0?
> >
> >
> >No, raise_kswapd_order() is used when it is known there are many 
> >high-order allocations of a particular value. It becomes the minimum 
> >value kswapd reclaims at. SLUB does not *require* high order allocations 
> >but can be configured to use them so it makes sense to keep 
> >min_free_kbytes at that order to reduce stalls due to direct reclaim.
> 
> The point is you still might not have anything performing those
> allocations from those higher order caches. Or you might have things
> that are doing higher order allocations, but not via slab.
> 

On the contrary, raise_kswapd_order() is called when you *know* things will
be performing those allocations. However, I think what you are saying is
that kswapd could end up reclaiming at the highest-order cache even though
it might be very rarely used. Christoph identified the same problem and sent
a follow-up patch; this is the leader:

======

On third thought: The trouble with this solution is that we will now set
the order to that used by the largest kmalloc cache. Bad... this could be
6 on i386 to 13 if CONFIG_LARGE_ALLOCS is set. The large kmalloc caches are
rarely used and we are used to OOMing if those are utilized too frequently.

I guess we should only set this for non kmalloc caches then. 
So move the call into kmem_cache_create? Would make the min order 3 on
most of my mm machines.
======

The second part of what you say is that there could be a non-slab user of
high order allocs. That is true and expected. In that case, the existing
mechanism informs kswapd of the higher order as it does today so it can
reclaim at the higher order for a bit and enter direct reclaim if necessary.

> Basically this is dumbing down the existing higher order watermarking
> already there in favour of a worse special case AFAIKS.
> 

It's not being replaced. That existing watermarking is still used. If it
was being replaced, the for loop in zone_watermark_ok() would have been
taken out.
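
For reference, a simplified userspace model of that check, written on the
assumption that it matches the zone_watermark_ok() of this era. The struct
zone_model, the helper names and the flag values are illustrative only, and
lowmem_reserve is reduced to a single number.

```c
#include <assert.h>

#define MAX_ORDER    11	/* assumption: orders 0..10 */
#define ALLOC_HARDER 0x1	/* illustrative values */
#define ALLOC_HIGH   0x2

/* Simplified zone: nr_free[o] counts free blocks of 2^o pages. */
struct zone_model {
	unsigned long nr_free[MAX_ORDER];
	long lowmem_reserve;
};

static long total_free_pages(const struct zone_model *z)
{
	long pages = 0;

	for (int o = 0; o < MAX_ORDER; o++)
		pages += (long)(z->nr_free[o] << o);
	return pages;
}

static int watermark_ok(const struct zone_model *z, int order, long mark,
			int alloc_flags)
{
	long min = mark;
	long free_pages;

	assert(order >= 0 && order < MAX_ORDER);
	free_pages = total_free_pages(z) - (1L << order) + 1;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* Order-0 style check against the adjusted watermark. */
	if (free_pages <= min + z->lowmem_reserve)
		return 0;

	for (int o = 0; o < order; o++) {
		/* Blocks of order o cannot serve a request above order o. */
		free_pages -= (long)(z->nr_free[o] << o);
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

With 1000 free order-0 blocks, 100 order-1 blocks and two order-3 blocks
against a watermark of 128, the model accepts an order-0 request but rejects
an order-3 one, with or without ALLOC_HIGH and ALLOC_HARDER, even though
order-3 blocks exist.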

> 
> >>Why
> >>can't this use the infrastructure that is already in place for that?
> >>
> >
> >The infrastructure there currently deals nicely with the situation where 
> >there are rarely allocations of a high order. This change is for when it 
> >is known there are frequent high-order (e.g. orders 1-4) allocations. 
> >While the callers often can direct reclaim, kswapd should help them 
> >avoid stalls because reducing stalls is one of its functions. With this 
> >patch, kswapd still reclaims the same number of pages, just tries to 
> >reclaim contiguous ones.
> 
> kswapd already does reclaim on behalf of non-sleeping higher order
> allocations (or at least it does in mainline).
> 

My point is that when it does, a caller is still likely to enter direct
reclaim, and kswapd can help prevent stalls if it pre-emptively reclaims at
an order known to be commonly used when free pages are below the watermarks.

> 
> >>>Arguably, e1000 should also be calling raise_kswapd_order() when it 
> >>>is using jumbo frames.
> >>
> >>
> >>It should be able to handle higher order page allocation failures
> >>gracefully.
> >
> >
> >Has something changed recently that it can handle failures? It might 
> >have because it has been hinted that it's possible, just not very fast.
> 
> I don't know, but it is stupid if it can't.

Well, if it could, order:3 allocation failure reports wouldn't occur
periodically.

> It should not be too hard to keep it fast where it is fast today, and have
> it at least work where it would otherwise fail... just by reserving some
> memory pages in case none can be allocated.
> 

It already reserves and still occasionally hits the problem.

> 
> >>kswapd will be notified of the attempts and go on and try
> >>to free up some higher order pages for it for next time. What is wrong
> >>with this process?
> >
> >
> >It's reactive, it only occurs when a process has already entered direct 
> >reclaim.
> 
> No it should not be. It should be proactive even for higher order 
> allocations.

I don't see why it would be. kswapd is only told to wake up when the
first allocation attempt obeying watermarks fails.
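
A rough order-0 model of that sequence, on the assumption that the first
attempt is checked against the low watermark and the retry against the min
watermark. The enum, the function name and the parameters are illustrative,
and the ALLOC_HIGH and ALLOC_HARDER adjustments are left out.

```c
#include <assert.h>
#include <stdbool.h>

enum alloc_path {
	PATH_FIRST_ATTEMPT,	/* satisfied above the low watermark */
	PATH_SECOND_ATTEMPT,	/* satisfied between the min and low watermarks */
	PATH_SLOW		/* direct reclaim if the caller can sleep, else failure */
};

static enum alloc_path model_alloc(long free_pages, long wmark_min,
				   long wmark_low, bool *kswapd_woken)
{
	assert(wmark_min <= wmark_low);
	*kswapd_woken = false;

	if (free_pages > wmark_low)
		return PATH_FIRST_ATTEMPT;

	/* The first attempt failed: this is the point kswapd is woken. */
	*kswapd_woken = true;

	if (free_pages > wmark_min)
		return PATH_SECOND_ATTEMPT;

	return PATH_SLOW;
}
```

In this model kswapd is woken only after the first attempt fails, but that
failure is against the low watermark, so the retry can still succeed while
kswapd works in the background.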

> All this stuff used to work properly :(
> 

It only came to light recently that there might be issues.

> 
> >>Are the higher order watermarks insufficient?
> >>
> >
> >The high-order watermarks are still used to make a process that can 
> >sleep enter direct reclaim when the higher order watermarks are not 
> >being met.
> >
> >>(I would also add that non-arguably, e1000 should also be able to do
> >>scatter gather with jumbo frames too.)
> >>
> >
> >That's another football that has done the laps.
> 
> I think the hardware can do it.
> 

e1000 cards come in such a variety of capabilities that it's difficult to
tell.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 13:35       ` Nick Piggin
@ 2007-05-16 14:00         ` Mel Gorman
  2007-05-16 14:11           ` Nick Piggin
  0 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 14:00 UTC (permalink / raw)
  To: Nick Piggin; +Cc: nicolas.mailhot, clameter, apw, akpm, linux-mm

On (16/05/07 23:35), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >On (16/05/07 22:14), Nick Piggin didst pronounce:
> >
> >>Mel Gorman wrote:
> >>
> >>>zone_watermark_ok() checks if there are enough free pages including a 
> >>>reserve.
> >>>High-order allocations additionally check if there are enough free 
> >>>high-order
> >>>pages in relation to the watermark adjusted based on the requested size. 
> >>>If
> >>>there are not enough free high-order pages available, 0 is returned so 
> >>>that
> >>>the caller enters direct reclaim.
> >>>
> >>>ALLOC_HIGH and ALLOC_HARDER allocations are allowed to dip further into
> >>>the reserves but also take into account if the number of free high-order
> >>>pages meet the adjusted watermarks. As these allocations cannot sleep,
> >>
> >>Why can't ALLOC_HIGH or ALLOC_HARDER sleep? This patch seems wrong to
> >>me.
> >>
> >
> >
> >In page_alloc.c
> >
> >        if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
> >                alloc_flags |= ALLOC_HARDER;
> >
> >See the !wait part.
> 
> And the || part.
> 

I doubt a rt_task is thrilled to be entering direct reclaim.

> 
> >The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
> >dip into emergency pools and go below the reserve.
> 
> And some of them can sleep too.
> 

If you feel very strongly about it, I can back out the ALLOC_HIGH part for
__GFP_HIGH allocations, but at a glance it looks like users of __GFP_HIGH
are not too keen on sleeping:

drivers/block/rd.c;
	Comment
	Deep badness.  rd_blkdev_pagecache_IO() needs to allocate
	pagecache pages within a request_fn.  We cannot recur back
	into the filesytem which is mounted atop the ramdisk

fs/ext4/writeback.c;
	Using __GFP_HIGH when allocating bios

kernel/power/swap.c;
	Using __GFP_HIGH when allocating bios

The change is still obeying watermarks, just at order-0 instead of
strictly observing the higher orders.
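
A hedged userspace sketch of the altered check as described here, not the
patch itself. It assumes the per-order loop is simply skipped when
ALLOC_HIGH or ALLOC_HARDER is set; the function name, the array argument and
the flag values are illustrative, and lowmem_reserve is ignored.

```c
#include <assert.h>

#define MAX_ORDER    11	/* assumption: orders 0..10 */
#define ALLOC_HARDER 0x1	/* illustrative values */
#define ALLOC_HIGH   0x2

/* nr_free[o] counts free blocks of 2^o pages in a hypothetical zone. */
static int watermark_ok_patched(const unsigned long nr_free[MAX_ORDER],
				int order, long mark, int alloc_flags)
{
	long min = mark;
	long free_pages = 0;

	assert(order >= 0 && order < MAX_ORDER);
	for (int o = 0; o < MAX_ORDER; o++)
		free_pages += (long)(nr_free[o] << o);
	free_pages -= (1L << order) - 1;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* The order-0 watermark is obeyed by every caller. */
	if (free_pages <= min)
		return 0;

	/* Assumed change: urgent callers skip the per-order checks. */
	if (alloc_flags & (ALLOC_HIGH | ALLOC_HARDER))
		return 1;

	for (int o = 0; o < order; o++) {
		free_pages -= (long)(nr_free[o] << o);
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

Passing this check only means the watermark does not block the request; the
allocator still has to find a free block of the requested order.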

> 
> >>>they cannot enter direct reclaim so the allocation can fail even though
> >>>the pages are available and the number of free pages is well above the
> >>>watermark for order-0.
> >>>
> >>>This patch alters the behaviour of zone_watermark_ok() slightly. 
> >>>Watermarks
> >>>are still obeyed but when an allocator is flagged ALLOC_HIGH or 
> >>>ALLOC_HARDER,
> >>>we only check that there is sufficient memory over the reserve to satisfy
> >>>the allocation, allocation size is ignored.  This patch also documents
> >>>better what zone_watermark_ok() is doing.
> >>
> >>This is wrong because now you lose the buffering of higher order pages
> >>for more urgent allocation classes against less urgent ones.
> >>
> >
> >
> >ALLOC_HARDER is an urgent allocation class.
> 
> And HIGH is even more, and MEMALLOC even more again.
> 

HIGH => ALLOC_HIGH => obey watermarks at order-0

Somewhat counter-intuitively, with the current code, if the allocation is
a really high priority but can sleep, it can actually allocate without any
watermarks at all.

> 
> >>Think of how the order-0 allocation buffering works with the watermarks
> >>and consider that we're trying to do the same exact thing for higher order
> >>allocations here.
> >>
> >
> >
> >What actually happens is that high-order allocations fail even though
> >the watermarks are met because they cannot enter direct reclaim.
> 
> Yeah, they fail leaving some spare for more urgent allocations. Like
> how the order-0 allocations work.

order-0 watermarks are still in place. After the patch, it is still not
possible for the allocations to break the watermarks there.

> They should also kick kswapd to start freeing pages _before_ they start
> failing too.
> 

Perhaps they should, but from what I read kswapd is only kicked into action
when the first allocation attempt has already failed.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 13:50                       ` Mel Gorman
@ 2007-05-16 14:04                         ` Nick Piggin
  2007-05-16 15:32                           ` Mel Gorman
  2007-05-16 14:20                         ` Nick Piggin
  1 sibling, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 14:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:
> On (16/05/07 22:28), Nick Piggin didst pronounce:
> 
>>Mel Gorman wrote:
>>
>>>On Wed, 16 May 2007, Nick Piggin wrote:
>>>

>>>No, raise_kswapd_order() is used when it is known there are many 
>>>high-order allocations of a particular value. It becomes the minimum 
>>>value kswapd reclaims at. SLUB does not *require* high order allocations 
>>>but can be configured to use them so it makes sense to keep 
>>>min_free_kbytes at that order to reduce stalls due to direct reclaim.
>>
>>The point is you still might not have anything performing those
>>allocations from those higher order caches. Or you might have things
>>that are doing higher order allocations, but not via slab.
>>
> 
> 
> On the contrary, raise_kswapd_order() is called when you *know* things will
> be performing those allocations. However, I think what you are saying is
> that kswapd could end up reclaiming at the highest-order cache even though
> it might be very rarely used. Christoph identified the same problem and sent
> a follow-up patch; this is the leader:
> 
> ======
> 
> On third thought: The trouble with this solution is that we will now set
> the order to that used by the largest kmalloc cache. Bad... this could be
> 6 on i386 to 13 if CONFIG_LARGE_ALLOCS is set. The large kmalloc caches are
> rarely used and we are used to OOMing if those are utilized too frequently.
> 
> I guess we should only set this for non kmalloc caches then. 
> So move the call into kmem_cache_create? Would make the min order 3 on
> most of my mm machines.
> ===

You do not *know* if the slab is going to be allocated from. Or maybe it
is a few times at bootup, or once every 10 minutes.


> The second part of what you say is that there could be a non-slab user of
> high order allocs. That is true and expected. In that case, the existing
> mechanism informs kswapd of the higher order as it does today so it can
> reclaim at the higher order for a bit and enter direct reclaim if necessary.

You seem to have broken the existing mechanism though.


>>Basically this is dumbing down the existing higher order watermarking
>>already there in favour of a worse special case AFAIKS.
>>
> 
> 
> It's not being replaced. That existing watermarking is still used. If it
> was being replaced, the for loop in zone_watermark_ok() would have been
> taken out.

Patch 2 sure doesn't make it any better.


>>kswapd already does reclaim on behalf of non-sleeping higher order
>>allocations (or at least it does in mainline).
>>
> 
> 
> My point is that when it does, a caller is still likely to enter direct
> reclaim and kswapd can help prevent stalls if it pre-emptively reclaims at
> an order known to be commonly used when free pages is below watermarks

So we should increase the watermarks, and keep the existing, working
code there and it will work for everyone, not just for slab, and it
will not keep higher orders free if they are not needed.


>>>>>Arguably, e1000 should also be calling raise_kswapd_order() when it 
>>>>>is using jumbo frames.
>>>>
>>>>
>>>>It should be able to handle higher order page allocation failures
>>>>gracefully.
>>>
>>>
>>>Has something changed recently that it can handle failures? It might 
>>>have because it has been hinted that it's possible, just not very fast.
>>
>>I don't know, but it is stupid if it can't.
> 
> 
> Well, if it could, order:3 allocation failure reports wouldn't occur
> periodically.

They are reports of failures, not failure to handle the failures.


>>It should not be too hard to keep it fast where it is fast today, and have
>>it at least work where it would otherwise fail... just by reserving some
>>memory pages in case none can be allocated.
>>
> 
> 
> It already reserves and still occasionally hits the problem.

e1000 reserves page? It would have to use them in a manner that guaranteed
timely return to the reserve pool like mempools. If it did that then it
would not have a problem.
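
A minimal userspace sketch of the reserve idea, loosely following the mempool
pattern: try the normal allocator first, fall back to a private reserve, and
refill the reserve before handing memory back to the allocator. All names
here (reserve_pool, pool_alloc, the allocator_ok switch that simulates an
allocation failure) are hypothetical and only illustrate the pattern; this is
not e1000 code and not the kernel mempool implementation.

```c
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_SLOTS 4

/* Hypothetical reserve pool: a small stash of pre-allocated buffers. */
struct reserve_pool {
	void *slot[RESERVE_SLOTS];
	size_t nr_reserved;	/* buffers currently held in the reserve */
	size_t buf_size;
};

/* Release every buffer still held in the reserve. */
void pool_destroy(struct reserve_pool *p)
{
	while (p->nr_reserved > 0)
		free(p->slot[--p->nr_reserved]);
}

/* Pre-fill the reserve; returns 0 on success, -1 if it cannot be filled. */
int pool_init(struct reserve_pool *p, size_t buf_size)
{
	p->nr_reserved = 0;
	p->buf_size = buf_size;
	while (p->nr_reserved < RESERVE_SLOTS) {
		void *buf = malloc(buf_size);
		if (!buf) {
			pool_destroy(p);
			return -1;
		}
		p->slot[p->nr_reserved++] = buf;
	}
	return 0;
}

/*
 * Try the normal allocator first and dip into the reserve only when it
 * fails. allocator_ok == 0 simulates an allocation failure.
 */
void *pool_alloc(struct reserve_pool *p, int allocator_ok)
{
	void *buf = allocator_ok ? malloc(p->buf_size) : NULL;

	if (buf)
		return buf;
	if (p->nr_reserved > 0)
		return p->slot[--p->nr_reserved];
	return NULL;
}

/* Returned buffers top up the reserve before going back to the allocator. */
void pool_free(struct reserve_pool *p, void *buf)
{
	if (p->nr_reserved < RESERVE_SLOTS)
		p->slot[p->nr_reserved++] = buf;
	else
		free(buf);
}
```

The guarantee only holds if buffers taken from the reserve come back promptly
through pool_free(); a ring that is refilled straight from the allocator gets
no such protection.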


>>>>kswapd will be notified of the attempts and go on and try
>>>>to free up some higher order pages for it for next time. What is wrong
>>>>with this process?
>>>
>>>
>>>It's reactive, it only occurs when a process has already entered direct 
>>>reclaim.
>>
>>No it should not be. It should be proactive even for higher order 
>>allocations.
> 
> 
> I don't see why it would be. kswapd is only told to wake up when the
> first allocation attempt obeying watermarks fails.

That first watermark is the reclaim watermark, not the allocation
watermark.


>>All this stuff used to work properly :(
>>
> 
> 
> It only came to light recently that there might be issues.

I mean kswapd asynchronously freeing higher order pages proactively. We
should get that working again first.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 14:00         ` Mel Gorman
@ 2007-05-16 14:11           ` Nick Piggin
  2007-05-16 18:28             ` Andy Whitcroft
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 14:11 UTC (permalink / raw)
  To: Mel Gorman; +Cc: nicolas.mailhot, clameter, apw, akpm, linux-mm

Mel Gorman wrote:
> On (16/05/07 23:35), Nick Piggin didst pronounce:
> 
>>Mel Gorman wrote:

>>>In page_alloc.c
>>>
>>>       if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
>>>               alloc_flags |= ALLOC_HARDER;
>>>
>>>See the !wait part.
>>
>>And the || part.
>>
> 
> 
> I doubt a rt_task is thrilled to be entering direct reclaim.

Doesn't mean you should break the watermarks. !wait allocations don't
always happen from interrupt context either, and it is possible to see
code doing

if (!alloc(GFP_KERNEL&~__GFP_WAIT)) {
     spin_unlock()
     alloc(GFP_KERNEL)
     spin_lock()
}


>>>The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
>>>dip into emergency pools and go below the reserve.
>>
>>And some of them can sleep too.
>>
> 
> 
> If you feel very strongly about it, I can back out the ALLOC_HIGH part for
> __GFP_HIGH allocations but it looks like at a glance that users of __GFP_HIGH
> are not too keen on sleeping;

I feel strongly about not breaking these things which are specifically there
for a reason and that are being changed seemingly because of the false
impression that kswapd doesn't proactively free pages for them.


>>>ALLOC_HARDER is an urgent allocation class.
>>
>>And HIGH is even more, and MEMALLOC even more again.
>>
> 
> 
> HIGH => ALLOC_HIGH => obey watermarks at order-0
> 
> Somewhat counter-intuitively, with the current code if the allocation is
> a really high priority but can sleep, it can actually allocate without any
> watermarks at all

I didn't understand what you meant?


>>>What actually happens is that high-order allocations fail even though
>>>the watermarks are met because they cannot enter direct reclaim.
>>
>>Yeah, they fail leaving some spare for more urgent allocations. Like
>>how the order-0 allocations work.
> 
> 
> order-0 watermarks are still in place. After the patch, it is still not
> possible for the allocations to break the watermarks there.

The watermarks for higher order pages you could say are implicit but
still there. They are scaled down from the order-0 watermarks, so they
should behave in the same way. I just can't understand why you're
bypassing these if you think the order-0 behaviour is OK.
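
A minimal userspace sketch of how the per-order check scales down from the
order-0 mark, modelled on the loop in zone_watermark_ok(). The zone_model
struct, the helper names and the flag values are assumptions made for
illustration; this is a simplified model, not the kernel source.

```c
#include <stdbool.h>

#define MAX_ORDER    11
#define ALLOC_HARDER 0x10	/* illustrative flag values */
#define ALLOC_HIGH   0x20

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	unsigned long nr_free[MAX_ORDER];	/* free blocks at each order */
	long lowmem_reserve;
};

/* Total free pages across all orders. */
static long total_free_pages(const struct zone_model *z)
{
	long total = 0;

	for (int o = 0; o < MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

bool watermark_ok(const struct zone_model *z, int order, long mark,
		  int alloc_flags)
{
	long min = mark;
	long free_pages = total_free_pages(z) - (1L << order) + 1;

	/* More urgent allocation classes may dip further into the reserve. */
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* Order-0 check. */
	if (free_pages <= min + z->lowmem_reserve)
		return false;

	for (int o = 0; o < order; o++) {
		/* Pages in blocks of order o cannot satisfy a larger request. */
		free_pages -= (long)(z->nr_free[o] << o);

		/* Require fewer free pages at each higher order. */
		min >>= 1;

		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With 1000 free order-0 pages and nothing larger, an order-3 request against a
mark of 128 is refused; add ten free order-3 blocks and the same request
passes.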


>>They should also kick kswapd to start freeing pages _before_ they start
>>failing too.
>>
> 
> 
> Should perhaps, but from what I read kswapd is only kicked into action
> when the first allocation attempt has already failed.

Well that's wrong unless you are allocating with GFP_THISNODE, in which
case that is specifically the behaviour that is asked for.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 13:50                       ` Mel Gorman
  2007-05-16 14:04                         ` Nick Piggin
@ 2007-05-16 14:20                         ` Nick Piggin
  2007-05-16 15:06                           ` Nicolas Mailhot
  1 sibling, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 14:20 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:

> ======
> 
> On third thought: The trouble with this solution is that we will now set
> the order to that used by the largest kmalloc cache. Bad... this could be
> 6 on i386 to 13 if CONFIG_LARGE_ALLOCs is set. The large kmalloc caches are
> rarely used and we are used to OOMing if those are utilized too frequently.
> 
> I guess we should only set this for non kmalloc caches then. 
> So move the call into kmem_cache_create? Would make the min order 3 on
> most of my mm machines.
> ===

Also, I might add that the e1000 page allocations failures usually come
from kmalloc, so doing this means they might just be protected by chance
if someone happens to create a kmem cache of order 3.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 14:20                         ` Nick Piggin
@ 2007-05-16 15:06                           ` Nicolas Mailhot
  2007-05-16 15:33                             ` Mel Gorman
  0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Mailhot @ 2007-05-16 15:06 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mel Gorman, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On Thursday 17 May 2007 at 00:20 +1000, Nick Piggin wrote:
> Mel Gorman wrote:
> 
> > ======
> > 
> > On third thought: The trouble with this solution is that we will now set
> > the order to that used by the largest kmalloc cache. Bad... this could be
> > 6 on i386 to 13 if CONFIG_LARGE_ALLOCs is set. The large kmalloc caches are
> > rarely used and we are used to OOMing if those are utilized too frequently.
> > 
> > I guess we should only set this for non kmalloc caches then. 
> > So move the call into kmem_cache_create? Would make the min order 3 on
> > most of my mm machines.
> > ===
> 
> Also, I might add that the e1000 page allocations failures usually come
> from kmalloc, so doing this means they might just be protected by chance
> if someone happens to create a kmem cache of order 3.

The system on which the patches were tested does not include an e1000
card

-- 
Nicolas Mailhot

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 14:04                         ` Nick Piggin
@ 2007-05-16 15:32                           ` Mel Gorman
  2007-05-16 15:44                             ` Nick Piggin
  2007-05-16 15:46                             ` Nick Piggin
  0 siblings, 2 replies; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 15:32 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On (17/05/07 00:04), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >On (16/05/07 22:28), Nick Piggin didst pronounce:
> >
> >>Mel Gorman wrote:
> >>
> >>>On Wed, 16 May 2007, Nick Piggin wrote:
> >>>
> 
> >>>No, raise_kswapd_order() is used when it is known there are many 
> >>>high-order allocations of a particular value. It becomes the minimum 
> >>>value kswapd reclaims at. SLUB does not *require* high order allocations 
> >>>but can be configured to use them so it makes sense to keep 
> >>>min_free_kbytes at that order to reduce stalls due to direct reclaim.
> >>
> >>The point is you still might not have anything performing those
> >>allocations from those higher order caches. Or you might have things
> >>that are doing higher order allocations, but not via slab.
> >>
> >
> >
> >On the contrary, raise_kswapd_order() is called when you *know* things will
> >be performing those allocations. However, I think what you are saying is
> >that kswapd could end up reclaiming at the highest-order cache even though
> >it might be very rarely used. Christoph identified the same problem and 
> >sent
> >a follow-up patch, this is the leader
> >
> >======
> >
> >On third thought: The trouble with this solution is that we will now set
> >the order to that used by the largest kmalloc cache. Bad... this could be
> >6 on i386 to 13 if CONFIG_LARGE_ALLOCs is set. The large kmalloc caches are
> >rarely used and we are used to OOMing if those are utilized too frequently.
> >
> >I guess we should only set this for non kmalloc caches then. 
> >So move the call into kmem_cache_create? Would make the min order 3 on
> >most of my mm machines.
> >===
> 
> You do not *know* if the slab is going to be allocated from. Or maybe it
> is a few times at bootup, or once every 10 minutes.
> 

So is your primary issue with raise_kswapd_order() being called at the
time a cache is opened for use and instead it should be more selective?

> 
> >The second part of what you say is that there could be a non-slab user of
> >high order allocs. That is true and expected. In that case, the existing
> >mechanism informs kswapd of the higher order as it does today so it can
> >reclaim at the higher order for a bit and enter direct reclaim if 
> >necessary.
> 
> You seem to have broken the existing mechanism though.
> 

How is it broken exactly? What has changed in this patch is that there
may be a minimum order that kswapd reclaims at. The same minimum number
of pages are kept free.

If the watermark was totally ignored with the second patch, I would understand
but they are still obeyed. Even if it is an ALLOC_HIGH or ALLOC_HARDER
allocation, the watermarks are obeyed for order-0 so memory does not get
exhausted as that could cause a host of problems. The difference is if this
is a HIGH or HARDER allocation and the memory can be granted without going
below the order-0 watermarks, it'll succeed. Would it be better if the
lack of ALLOC_CPUSET was used to determine when only order-0 watermarks
should be obeyed?
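
A sketch of the behaviour described above, assuming patch 2 simply skips the
per-order loop for ALLOC_HIGH and ALLOC_HARDER callers once the order-0 check
has passed. The patch itself is not reproduced in this message, so the
function below, its name and its simplified arguments (no lowmem_reserve,
free counts passed as a plain array, illustrative flag values) are
assumptions rather than the actual change.

```c
#include <stdbool.h>

#define MAX_ORDER    11
#define ALLOC_HARDER 0x10	/* illustrative flag values */
#define ALLOC_HIGH   0x20

/* nr_free[o] is the number of free blocks of order o (hypothetical model). */
bool watermark_ok_order0_only(const unsigned long nr_free[MAX_ORDER],
			      int order, long mark, int alloc_flags)
{
	long min = mark;
	long free_pages = 0;

	for (int o = 0; o < MAX_ORDER; o++)
		free_pages += (long)(nr_free[o] << o);
	free_pages -= (1L << order) - 1;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* The order-0 watermark is always enforced. */
	if (free_pages <= min)
		return false;

	/* Assumed patch 2 behaviour: urgent callers stop at the order-0 check. */
	if (alloc_flags & (ALLOC_HIGH | ALLOC_HARDER))
		return true;

	/* Everyone else still goes through the scaled per-order checks. */
	for (int o = 0; o < order; o++) {
		free_pages -= (long)(nr_free[o] << o);
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With 1000 free order-0 pages and two free order-3 blocks, an ordinary order-3
request against a mark of 128 is refused by the per-order loop, while the
same request flagged ALLOC_HARDER passes because only the order-0 mark is
consulted.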

> >>Basically this is dumbing down the existing higher order watermarking
> >>already there in favour of a worse special case AFAIKS.
> >>
> >
> >
> >It's not being replaced. That existing watermarking is still used. If it
> >was being replaced, the for loop in zone_watermark_ok() would have been
> >taken out.
> 
> Patch 2 sure doesn't make it any better.
> 

The second patch is simply saying "If you can satisfy the allocation without
going below the watermarks for order-0, then do it". Again, if it used
!(alloc_flags & ALLOC_CPUSET), would you be happier?

> 
> >>kswapd already does reclaim on behalf of non-sleeping higher order
> >>allocations (or at least it does in mainline).
> >>
> >
> >
> >My point is that when it does, a caller is still likely to enter direct
> >reclaim and kswapd can help prevent stalls if it pre-emptively reclaims at
> >an order known to be commonly used when free pages is below watermarks
> 
> So we should increase the watermarks, and keep the existing, working
> code there and it will work for everyone, not just for slab, and it
> will not keep higher orders free if they are not needed.
> 

Raising watermarks is no guarantee that a high-order allocation that can sleep
will occur at the right time to kick kswapd awake and that it'll get back from
whatever it's doing in time to spot the new order and start reclaiming again.

> >>>>>Arguably, e1000 should also be calling raise_kswapd_order() when it 
> >>>>>is using jumbo frames.
> >>>>
> >>>>
> >>>>It should be able to handle higher order page allocation failures
> >>>>gracefully.
> >>>
> >>>
> >>>Has something changed recently that it can handle failures? It might 
> >>>have because it has been hinted that it's possible, just not very fast.
> >>
> >>I don't know, but it is stupid if it can't.
> >
> >
> >Well, if it could, order:3 allocation failure reports wouldn't occur
> >periodically.
> 
> They are reports of failures, not failure to handle the failures.
> 

If the failures were being handled correctly, why would it be logging at
all? They would have set __GFP_NOWARN and recovered silently.

> 
> >>It should not be too hard to keep it fast where it is fast today, and have
> >>it at least work where it would otherwise fail... just by reserving some
> >>memory pages in case none can be allocated.
> >>
> >
> >
> >It already reserves and still occasionally hits the problem.
> 
> e1000 reserves page? It would have to use them in a manner that guaranteed
> timely return to the reserve pool like mempools. If it did that then it
> would not have a problem.
> 

When I last looked, they kept a series of buffers in a ring buffer. My
understanding at the time was that this buffer regularly gets depleted
and refilled.

Ultimately, the allocations are done via kmalloc() but with jumbo frames, the
kmalloc() is for 32K. As it happens, this means that if jumbo frames are in
use, then that kmalloc slab is opened and the minimum kswapd order is raised
so that min_free_kbytes is kept contiguous for those atomic allocations.

> >>>>kswapd will be notified of the attempts and go on and try
> >>>>to free up some higher order pages for it for next time. What is wrong
> >>>>with this process?
> >>>
> >>>
> >>>It's reactive, it only occurs when a process has already entered direct 
> >>>reclaim.
> >>
> >>No it should not be. It should be proactive even for higher order 
> >>allocations.
> >
> >
> >I don't see why it would be. kswapd is only told to wake up when the
> >first allocation attempt obeying watermarks fails.
> 
> That first watermark is the reclaim watermark, not the allocation
> watermark.
> 
> 
> >>All this stuff used to work properly :(
> >>
> >
> >
> >It only came to light recently that there might be issues.
> 
> I mean kswapd asynchronously freeing higher order pages proactively. We
> should get that working again first.
> 

What do you suggest then?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 15:06                           ` Nicolas Mailhot
@ 2007-05-16 15:33                             ` Mel Gorman
  0 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 15:33 UTC (permalink / raw)
  To: Nicolas Mailhot
  Cc: Nick Piggin, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On (16/05/07 17:06), Nicolas Mailhot didst pronounce:
> On Thursday 17 May 2007 at 00:20 +1000, Nick Piggin wrote:
> > Mel Gorman wrote:
> > 
> > > ======
> > > 
> > > On third thought: The trouble with this solution is that we will now set
> > > the order to that used by the largest kmalloc cache. Bad... this could be
> > > 6 on i386 to 13 if CONFIG_LARGE_ALLOCs is set. The large kmalloc caches are
> > > rarely used and we are used to OOMing if those are utilized too frequently.
> > > 
> > > I guess we should only set this for non kmalloc caches then. 
> > > So move the call into kmem_cache_create? Would make the min order 3 on
> > > most of my mm machines.
> > > ===
> > 
> > Also, I might add that the e1000 page allocations failures usually come
> > from kmalloc, so doing this means they might just be protected by chance
> > if someone happens to create a kmem cache of order 3.
> 
> The system on which the patches were tested does not include an e1000
> card
> 

We know. It's simply that in the past, e1000 failing to allocate pages
was the reason for reports like yours. There are some similarities between
the problems.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 15:32                           ` Mel Gorman
@ 2007-05-16 15:44                             ` Nick Piggin
  2007-05-16 16:46                               ` Mel Gorman
  2007-05-16 15:46                             ` Nick Piggin
  1 sibling, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 15:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:
> On (17/05/07 00:04), Nick Piggin didst pronounce:
> 
>>Mel Gorman wrote:

>>>I guess we should only set this for non kmalloc caches then. 
>>>So move the call into kmem_cache_create? Would make the min order 3 on
>>>most of my mm machines.
>>>===
>>
>>You do not *know* if the slab is going to be allocated from. Or maybe it
>>is a few times at bootup, or once every 10 minutes.
>>
> 
> 
> So is your primary issue with raise_kswapd_order() being called at the
> time a cache is opened for use and instead it should be more selective?
> 
> 
>>>The second part of what you say is that there could be a non-slab user of
>>>high order allocs. That is true and expected. In that case, the existing
>>>mechanism informs kswapd of the higher order as it does today so it can
>>>reclaim at the higher order for a bit and enter direct reclaim if 
>>>necessary.
>>
>>You seem to have broken the existing mechanism though.
>>
> 
> 
> How is it broken exactly? What has changed in this patch is that there
> may be a minimum order that kswapd reclaims at. The same minimum number
> of pages are kept free.

I mean with patch 2.


> If the watermark was totally ignored with the second patch, I would understand
> but they are still obeyed. Even if it is an ALLOC_HIGH or ALLOC_HARDER
> allocation, the watermarks are obeyed for order-0 so memory does not get
> exhausted as that could cause a host of problems. The difference is if this
> is a HIGH or HARDER allocation and the memory can be granted without going
> below the order-0 watermarks, it'll succeed. Would it be better if the
> lack of ALLOC_CPUSET was used to determine when only order-0 watermarks
> should be obeyed?

But I don't know why you want to disobey higher order watermarks in the
first place. *Those* are exactly the things that are going to be helpful
to fix this problem of atomic higher order allocations failing or non
atomic ones going into direct reclaim.


>>>It's not being replaced. That existing watermarking is still used. If it
>>>was being replaced, the for loop in zone_watermark_ok() would have been
>>>taken out.
>>
>>Patch 2 sure doesn't make it any better.
>>
> 
> 
> The second patch is simply saying "If you can satisfy the allocation without
> going below the watermarks for order-0, then do it". Again, if it used
> !(alloc_flags & ALLOC_CPUSET), would you be happier?

No ;)


>>>My point is that when it does, a caller is still likely to enter direct
>>>reclaim and kswapd can help prevent stalls if it pre-emptively reclaims at
>>>an order known to be commonly used when free pages is below watermarks
>>
>>So we should increase the watermarks, and keep the existing, working
>>code there and it will work for everyone, not just for slab, and it
>>will not keep higher orders free if they are not needed.
>>
> 
> 
> Raising watermarks is no guarantee that a high-order allocation that can sleep
> will occur at the right time to kick kswapd awake and that it'll get back from
> whatever it's doing in time to spot the new order and start reclaiming again.

You don't *need* a higher order allocation that can sleep in order
to kick kswapd. Crikey, I keep saying this.


>>>Well, if it could, order:3 allocation failure reports wouldn't occur
>>>periodically.
>>
>>They are reports of failures, not failure to handle the failures.
>>
> 
> 
> If the failures were being handled correctly, why would it be logging at
> all? They would have set __GFP_NOWARN and recovered silently.

Lots of places don't set __GFP_NOWARN but handle failures. Generally
you want to keep the warning even for atomic allocations if it is
a reasonably small order (0 or 1 or even 2).

The failures I have seen are not "networking stops working". They are
"e1000 gives page allocation failures", and the replies have always
been "that's not unexpected". Have you seen *any* of the former type?


>>>It already reserves and still occasionally hits the problem.
>>
>>e1000 reserves page? It would have to use them in a manner that guaranteed
>>timely return to the reserve pool like mempools. If it did that then it
>>would not have a problem.
>>
> 
> 
> When I last looked, they kept a series of buffers in a ring buffer. My
> understanding at the time was that this buffer regularly gets depleted
> and refilled.

But refilled via the allocator, right? One which does not revert to a
private stash if it cannot get a page.


>>>>All this stuff used to work properly :(
>>>>
>>>
>>>
>>>It only came to light recently that there might be issues.
>>
>>I mean kswapd asynchronously freeing higher order pages proactively. We
>>should get that working again first.
>>
> 
> 
> What do you suggest then?

Working out why it apparently isn't working, first. Then maybe look at
raising watermarks (they get reduced fairly rapidly as the order increases,
so it might just be that there is not enough at order-3).
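
As a worked example of how quickly the requirement shrinks, here is a small
helper that applies the same halving per order as the watermark loop. The
function name, the flag values and the mark of 1024 pages used below are
assumptions chosen for illustration, not measured values from the report.

```c
#define ALLOC_HARDER 0x10	/* illustrative flag values */
#define ALLOC_HIGH   0x20

/*
 * Pages that must remain free in blocks of at least `order` for the
 * watermark check to pass, given an order-0 mark (hypothetical helper).
 */
long min_free_at_order(long mark, int order, int alloc_flags)
{
	long min = mark;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;

	/* The requirement is halved for every order above 0. */
	for (int o = 0; o < order; o++)
		min >>= 1;
	return min;
}
```

For a mark of 1024 pages the requirement at order-3 is 128 pages, which is
only sixteen order-3 blocks; an ALLOC_HARDER caller needs 96 pages, and an
ALLOC_HIGH plus ALLOC_HARDER caller needs 48.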

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 15:32                           ` Mel Gorman
  2007-05-16 15:44                             ` Nick Piggin
@ 2007-05-16 15:46                             ` Nick Piggin
  1 sibling, 0 replies; 39+ messages in thread
From: Nick Piggin @ 2007-05-16 15:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:

> Ultimately, the allocations are done kmalloc() but with jumbo frames, the
> kmalloc() is for 32K. As it happens, this means that if jumbo frames are in
> use, then that kmalloc slab is opened and the minimum kswapd order is raised
> so that min_free_kbytes is kept contiguous for those atomic allocations.

Oh, and I didn't realise Christoph's patch raised the max order if a higher
order kmalloc slab is used. Still, that complaint was basically the least
of my troubles...

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 15:44                             ` Nick Piggin
@ 2007-05-16 16:46                               ` Mel Gorman
  2007-05-17  7:09                                 ` Nick Piggin
  0 siblings, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 16:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

On (17/05/07 01:44), Nick Piggin didst pronounce:
> Mel Gorman wrote:
> >On (17/05/07 00:04), Nick Piggin didst pronounce:
> >
> >>Mel Gorman wrote:
> 
> >>>I guess we should only set this for non kmalloc caches then. 
> >>>So move the call into kmem_cache_create? Would make the min order 3 on
> >>>most of my mm machines.
> >>>===
> >>
> >>You do not *know* if the slab is going to be allocated from. Or maybe it
> >>is a few times at bootup, or once every 10 minutes.
> >>
> >
> >
> >So is your primary issue with raise_kswapd_order() being called at the
> >time a cache is opened for use and instead it should be more selective?
> >
> >
> >>>The second part of what you say is that there could be a non-slab user of
> >>>high order allocs. That is true and expected. In that case, the existing
> >>>mechanism informs kswapd of the higher order as it does today so it can
> >>>reclaim at the higher order for a bit and enter direct reclaim if 
> >>>necessary.
> >>
> >>You seem to have broken the existing mechanism though.
> >>
> >
> >
> >How is it broken exactly? What has changed in this patch is that there
> >may be a minimum order that kswapd reclaims at. The same minimum number
> >of pages are kept free.
> 
> I mean with patch 2.
> 

Ok.

> 
> >If the watermark was totally ignored with the second patch, I would 
> >understand
> >but they are still obeyed. Even if it is an ALLOC_HIGH or ALLOC_HARDER
> >allocation, the watermarks are obeyed for order-0 so memory does not get
> >exhausted as that could cause a host of problems. The difference is if this
> >is a HIGH or HARDER allocation and the memory can be granted without going
> >below the order-0 watermarks, it'll succeed. Would it be better if the
> >lack of ALLOC_CPUSET was used to determine when only order-0 watermarks
> >should be obeyed?
> 
> But I don't know why you want to disobey higher order watermarks in the
> first place.

Because the original problem was bio_alloc() allocations failing and the OOM
log showed that the higher-order pages were available. Patch 2 addressed it
by succeeding these allocations if the min watermark was not breached with the
knowledge that kswapd was awake and reclaiming at the relevant order. I think
it may even have solved it without the kswapd change but the kswapd change
seemed sensible.

> *Those* are exactly the things that are going to be helpful
> to fix this problem of atomic higher order allocations failing or non
> atomic ones going into direct reclaim.
> 

And the intention was that non-atomic ones would go into direct reclaim
after kicking kswapd but the atomic allocations would at least succeed if
the memory was there as long as they don't totally mess up watermarks.

> 
> >>>It's not being replaced. That existing watermarking is still used. If it
> >>>was being replaced, the for loop in zone_watermark_ok() would have been
> >>>taken out.
> >>
> >>Patch 2 sure doesn't make it any better.
> >>
> >
> >
> >The second patch is simply saying "If you can satisfy the allocation 
> >without
> >going below the watermarks for order-0, then do it". Again, if it used
> >!(alloc_flags & ALLOC_CPUSET), would you be happier?
> 
> No ;)
> 

heh.

> 
> >>>My point is that when it does, a caller is still likely to enter direct
> >>>reclaim and kswapd can help prevent stalls if it pre-emptively reclaims 
> >>>at
> >>>an order known to be commonly used when free pages is below watermarks
> >>
> >>So we should increase the watermarks, and keep the existing, working
> >>code there and it will work for everyone, not just for slab, and it
> >>will not keep higher orders free if they are not needed.
> >>
> >
> >
> >Raising watermarks is no guarantee that a high-order allocation that can 
> >sleep
> >will occur at the right time to kick kswapd awake and that it'll get back 
> >from
> >whatever it's doing in time to spot the new order and start reclaiming 
> >again.
> 
> You don't *need* a higher order allocation that can sleep in order
> to kick kswapd. Crikey, I keep saying this.
> 

Indeed, we seem to have got stuck in a loop of sorts.

I understand that kswapd gets kicked awake either way but there must be a
timing issue. Let's say we had a situation like

order-0 alloc
watermark hit => wake kswapd
order-0 alloc			kswapd reclaiming order 0
order-0 alloc			kswapd reclaiming order 0
order-3 alloc => kick kswapd for order 3
order-0 alloc			kswapd reclaiming order 0
order-3 alloc			kswapd reclaiming order 0
order-3 alloc			kswapd reclaiming order 0
order-3 alloc => highorder mark hit, fail

kswapd will keep reclaiming at order-0 until it completes a reclaim cycle,
spots the new order and starts over again. So there is a potentially
sizable window there where problems can hit. Right?
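
The window described above can be sketched with a small userspace model.
This is not kernel code: the struct, the function names and the fixed
pass length are all hypothetical, and the only behaviour taken from the
discussion is that a newly requested order is recorded at once but is
only acted on when the pass in progress finishes.

```c
#include <assert.h>

/* Hypothetical userspace model of the timing window; not kernel code. */
struct kswapd_model {
	int max_order;     /* highest order requested since the last pass began */
	int current_order; /* order the pass in progress is balancing for */
	int steps_left;    /* units of work remaining in the pass in progress */
};

/* An allocation that missed its watermark records the order it wanted. */
void model_wakeup(struct kswapd_model *k, int order)
{
	if (k->max_order < order)
		k->max_order = order;
}

/*
 * Advance the model by one unit of reclaim work. A new order is picked
 * up only when the pass in progress has finished.
 * Returns the order that this step reclaimed for.
 */
int model_step(struct kswapd_model *k, int pass_length)
{
	assert(pass_length > 0);
	if (k->steps_left == 0) {
		k->current_order = k->max_order;
		k->max_order = 0;
		k->steps_left = pass_length;
	}
	k->steps_left--;
	return k->current_order;
}
```

With a pass length of 4, an order-3 request that arrives one step into
an order-0 pass is not reclaimed for until three further steps have run.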

> >>>Well, if it could, order:3 allocation failure reports wouldn't occur
> >>>periodically.
> >>
> >>They are reports of failures, not failure to handle the failures.
> >>
> >
> >
> >If the failures were being handled correctly, why would it be logging at
> >all? They would have set __GFP_NOWARN and recovered silently.
> 
> Lots of places don't set __GFP_NOWARN but handle failures. Generally
> you want to keep the warning even for atomic allocations if it is
> a reasonably small order (0 or 1 or even 2).
> 

Fair enough

> The failures I have seen are not "networking stops working". They are
> "e1000 gives page allocation failures", and the replies have always
> been "that's not unexpected". Have you seen *any* of the former type?
> 

Admittedly, I don't recall complaints that networking totally failed. The
result should be that packets drop until such time that the allocations
start succeeding again.

> >>>It already reserves and still occasionally hits the problem.
> >>
> >>e1000 reserves page? It would have to use them in a manner that guaranteed
> >>timely return to the reserve pool like mempools. If it did that then it
> >>would not have a problem.
> >>
> >
> >
> >When I last looked, they kept a series of buffers in a ring buffer. My
> >understanding at the time was that this buffer regularly gets depleted
> >and refilled.
> 
> But refilled via the allocator, right? One which does not revert to a
> private stash if it cannot get a page.
> 

True.

> 
> >>>>All this stuff used to work properly :(
> >>>>
> >>>
> >>>
> >>>It only came to light recently that there might be issues.
> >>
> >>I mean kswapd asynchronously freeing higher order pages proactively. We
> >>should get that working again first.
> >>
> >
> >
> >What do you suggest then?
> 
> Working out why it apparently isn't working, first. Then maybe look at
> raising watermarks (they get reduced fairly rapidly as the order increases,
> so it might just be that there is not enough at order-3).
> 

I believe it failed to work due to a combination of kswapd reclaiming at
the wrong order for a while and the fact that the watermarks are pretty
aggressive when it comes to higher orders. I'm trying to think of
alternative fixes but keep coming back to the current fix using 
!(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
the memory is there and above min watermarks at order-0.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 14:11           ` Nick Piggin
@ 2007-05-16 18:28             ` Andy Whitcroft
  2007-05-16 18:48               ` Mel Gorman
  2007-05-17  7:34               ` Nick Piggin
  0 siblings, 2 replies; 39+ messages in thread
From: Andy Whitcroft @ 2007-05-16 18:28 UTC (permalink / raw)
  To: Nick Piggin, clameter; +Cc: Mel Gorman, nicolas.mailhot, akpm, linux-mm

Nick Piggin wrote:
> Mel Gorman wrote:
>> On (16/05/07 23:35), Nick Piggin didst pronounce:
>>
>>> Mel Gorman wrote:
> 
>>>> In page_alloc.c
>>>>
>>>>       if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
>>>>               alloc_flags |= ALLOC_HARDER;
>>>>
>>>> See the !wait part.
>>>
>>> And the || part.
>>>
>>
>>
>> I doubt a rt_task is thrilled to be entering direct reclaim.
> 
> Doesn't mean you should break the watermarks. !wait allocations don't
> always happen from interrupt context either, and it is possible to see
> code doing

The problem perhaps here is that we are not able to allocate at all
despite having large amounts of memory free.  In the original problem
report we had a failing order-2 allocation when order-7 pages were free,
and we were over the reserve.

Indeed in experiments with this algorithm I am finding that it is common
to fail an order 2 allocation when more than 2* the reserve is
available.  More on this at the end ...

> if (!alloc(GFP_KERNEL&~__GFP_WAIT)) {
>     spin_unlock()
>     alloc(GFP_KERNEL)
>     spin_lock()
> }
> 
> 
>>>> The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
>>>> dip into emergency pools and go below the reserve.
>>>
>>> And some of them can sleep too.
>>>
>>
>>
>> If you feel very strongly about it, I can back out the ALLOC_HIGH part
>> for
>> __GFP_HIGH allocations but it looks like at a glance that users of
>> __GFP_HIGH
>> are not too keen on sleeping;
> 
> I feel strongly about not breaking these things which are specifically
> there
> for a reason and that are being changed seemingly because of the false
> impression that kswapd doesn't proactively free pages for them.

The interaction with kswapd is not instantaneous.  When an allocation at
high order fails to allocate at the low watermarks it will indeed wake
up kswapd and that will work to release memory at the order specified.
However, if it is already reclaiming at another order it will not switch
up until it next completes a pass.  For an allocator who cannot sleep
this is very likely to be too late.  This is never going to help a
bursty allocator.

>>>> ALLOC_HARDER is an urgent allocation class.
>>>
>>> And HIGH is even more, and MEMALLOC even more again.
>>>
>>
>>
>> HIGH => ALLOC_HIGH => obey watermarks at order-0
>>
>> Somewhat counter-intuitively, with the current code if the allocation is
>> a really high priority but can sleep, it can actually allocate without
>> any
>> watermarks at all
> 
> I didn't understand what you meant?
> 
> 
>>>> What actually happens is that high-order allocations fail even though
>>>> the watermarks are met because they cannot enter direct reclaim.
>>>
>>> Yeah, they fail leaving some spare for more urgent allocations. Like
>>> how the order-0 allocations work.
>>
>>
>> order-0 watermarks are still in place. After the patch, it is still not
>> possible for the allocations to break the watermarks there.
> 
> The watermarks for higher order pages you could say are implicit but
> still there. They are scaled down from the order-0 watermarks, so they
> should behave in the same way. I just can't understand why you're
> bypassing these if you think the order-0 behaviour is OK.

The problem is the watermarks for the higher orders are actually much
stricter than for low orders.  This is a by-product of the way in which
the algorithm calculates the current free at each iteration, taking away
the pages at smaller order.  The effective free pages at each order is
scaled by the ratio of the free pages at that order to the sum of
all higher orders.  The effective min at each order is halved.

Due to the nature of the reclaim strategy we will always expect to see
exponentially more order-0 pages than order-1 and so on, making it
hugely more difficult to allocate a page at these higher orders.

>>> They should also kick kswapd to start freeing pages _before_ they start
>>> failing too.
>>>
>>
>>
>> Should perhaps, but from what I read kswapd is only kicked into action
>> when the first allocation attempt has already failed.
> 
> Well that's wrong unless you are allocating with GFP_THISNODE, in which
> case that is specifically the behaviour that is asked for.

kswapd is kicked when we cannot allocate at the normal low water mark,
we will then attempt a further allocation at min/2 etc.  However we are
as likely to fail the second as the effective low water mark for higher
order pages is significantly higher than for order-0.  So kswapd will be
woken, but it has a huge job on its hands to get us from order-0 low
water to order-N low water.  As we cannot sleep we are very likely to fail.

I did some testing with the current algorithm in a test harness.  That
testing seems to show that the effect of the reserve can be majorly
higher than the real reserve.  If we look at the OOM from the original
report we can see there was some 7700 pages free at the time of the
allocation.  The effective reserve for an ALLOC_HARD allocation is only
711 pages, and yet we cannot allocate any pages over order-0.

total free: 7768
reserve   : 1423
   0    1    2    3    4    5    6    7    8    9   10
7560    0    8    0    1    1    0    1    0    0    0

allocation order : 0
effective free   : 7768
effective reserve: 711.5

allocation order : 1
effective free   : 7767
effective reserve: 711.5
0 207 355
FAIL
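
The harness output above can be approximated with a userspace model of
the per-order walk. This is only a sketch under stated assumptions: the
function and flag names are made up for the model, lowmem_reserve is
left out, and the "halve for a high-priority caller, then halve again
for each order" arithmetic is how the check is described in this thread
rather than a copy of the kernel source.

```c
#include <assert.h>

#define MODEL_MAX_ORDER    11
#define MODEL_ALLOC_HARDER 0x1 /* hypothetical flag values for the model */
#define MODEL_ALLOC_HIGH   0x2

/*
 * Userspace model of the per-order watermark walk.
 * nr_free[o] is the number of free blocks of order o.
 * Returns 1 if an allocation of 'order' may proceed, 0 otherwise.
 */
int model_watermark_ok(const unsigned long nr_free[MODEL_MAX_ORDER],
		       int order, long mark, int alloc_flags)
{
	long min = mark;
	long free_pages = 0;
	int o;

	assert(order >= 0 && order < MODEL_MAX_ORDER);

	/* Total free pages, less the pages this request would take. */
	for (o = 0; o < MODEL_MAX_ORDER; o++)
		free_pages += (long)(nr_free[o] << o);
	free_pages -= (1L << order) - 1;

	/* Urgent callers are allowed to dig further into the reserve. */
	if (alloc_flags & MODEL_ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & MODEL_ALLOC_HARDER)
		min -= min / 4;

	if (free_pages <= min)
		return 0;

	for (o = 0; o < order; o++) {
		/* Blocks smaller than the request cannot satisfy it. */
		free_pages -= (long)(nr_free[o] << o);
		/* Each step up in order halves the required reserve. */
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

Fed the free lists above with a mark of 1423 and the halving flag set,
an order-1 request is left with 207 pages once the 7560 order-0 pages
are discounted, against a halved reserve of 356 in integer arithmetic
(the harness prints 355), so the check fails although 7768 pages are
free.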

Looking at the figures above dispassionately it is hard to fault the
logic of the allocator denying this allocation.  There are indeed very
few pages at those orders and some (where possible) are reserved for
PF_MEM tasks, for reclaim itself.  However, the reservation system takes
no account of higher orders, so we can always end up in a situation
where there are only order-0 pages free; all higher orders have been split.
This gives us a constraint on all reclaim processing: it must only
involve order-0 pages else it could deadlock.  BUT if that is true and
reclaim only uses order-0 pages then there is actually no point in
retaining any PF_MEM reserve at higher order as it would never be used.

What does this mean:

1) any slab which is used from the reclaim path _must_ use order-0
allocations,
2) any slab which is allocated from atomically _should_ use order-0
allocations.

My understanding is all slabs within a slub slab cache have to be the
same order.  So we need to ensure that any slab that might be used from
the reclaim path must only use order-0 pages.  Also it seems that any
slab that is allocated from atomically will have to use order-0 pages in
order to remain reliable.  Christoph, do we have any facility to tag
caches to use a specific allocation order?

I think that probably means that the second of our patches here is not
the right approach to this problem long term.  A patch which sets the
critical slabs to order-0 is probably the way forward.  Having said
that, as I mentioned before if all of PF_MEM processing is (and it must
be to be safe) order-0 then actually some relaxing of the watermarks
above order-0 may well be in order.

It is interesting to note the state of the system with the kswapd patch
to target reclaim at SLUB order, figures below.  We get significantly
closer to real OOM before failing the order-2 allocations.  To my mind
that indicates that this change is pretty beneficial under memory
pressure.  Especially for Andrew's e1000 workload.  We might consider
using that patch with the minimum order set to PAGE_ALLOC_COSTLY_ORDER.

total free: 2889
reserve   : 1423
   0    1    2    3    4    5    6    7    8    9
2619   27    6    0    0    2    0    1    0    0

allocation order : 0
effective free   : 2889
effective reserve: 711.5

allocation order : 1
effective free   : 2888
effective reserve: 711.5
0 269 355
FAIL

-apw


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 18:28             ` Andy Whitcroft
@ 2007-05-16 18:48               ` Mel Gorman
  2007-05-16 19:00                 ` Christoph Lameter
  2007-05-17  7:34               ` Nick Piggin
  1 sibling, 1 reply; 39+ messages in thread
From: Mel Gorman @ 2007-05-16 18:48 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Nick Piggin, clameter, nicolas.mailhot, akpm, linux-mm

On (16/05/07 19:28), Andy Whitcroft didst pronounce:
> Nick Piggin wrote:
> > Mel Gorman wrote:
> >> On (16/05/07 23:35), Nick Piggin didst pronounce:
> >>
> >>> Mel Gorman wrote:
> > 
> >>>> In page_alloc.c
> >>>>
> >>>>       if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
> >>>>               alloc_flags |= ALLOC_HARDER;
> >>>>
> >>>> See the !wait part.
> >>>
> >>> And the || part.
> >>>
> >>
> >>
> >> I doubt a rt_task is thrilled to be entering direct reclaim.
> > 
> > Doesn't mean you should break the watermarks. !wait allocations don't
> > always happen from interrupt context either, and it is possible to see
> > code doing
> 
> The problem perhaps here is that we are not able to allocate at all
> despite having large amounts of memory free.  In the original problem
> report we had a failing order-2 allocation when order-7 pages were free,
> and we were over the reserve.
> 
> Indeed in experiments with this algorithm I am finding that it is common
> to fail an order 2 allocation when more than 2* the reserve is
> available.  More on this at the end ...
> 
> > if (!alloc(GFP_KERNEL&~__GFP_WAIT)) {
> >     spin_unlock()
> >     alloc(GFP_KERNEL)
> >     spin_lock()
> > }
> > 
> > 
> >>>> The ALLOC_HIGH applies to __GFP_HIGH allocations which are allowed to
> >>>> dip into emergency pools and go below the reserve.
> >>>
> >>> And some of them can sleep too.
> >>>
> >>
> >>
> >> If you feel very strongly about it, I can back out the ALLOC_HIGH part
> >> for
> >> __GFP_HIGH allocations but it looks like at a glance that users of
> >> __GFP_HIGH
> >> are not too keen on sleeping;
> > 
> > I feel strongly about not breaking these things which are specifically
> > there
> > for a reason and that are being changed seemingly because of the false
> > impression that kswapd doesn't proactively free pages for them.
> 
> The interaction with kswapd is not instantaneous.  When an allocation at
> high order fails to allocate at the low watermarks it will indeed wake
> up kswapd and that will work to release memory at the order specified.
> However, if it is already reclaiming at another order it will not switch
> up until it next completes a pass.  For an allocator who cannot sleep
> this is very likely to be too late.  This is never going to help a
> bursty allocator.
> 
> >>>> ALLOC_HARDER is an urgent allocation class.
> >>>
> >>> And HIGH is even more, and MEMALLOC even more again.
> >>>
> >>
> >>
> >> HIGH => ALLOC_HIGH => obey watermarks at order-0
> >>
> >> Somewhat counter-intuitively, with the current code if the allocation is
> >> a really high priority but can sleep, it can actually allocate without
> >> any
> >> watermarks at all
> > 
> > I didn't understand what you meant?
> > 
> > 
> >>>> What actually happens is that high-order allocations fail even though
> >>>> the watermarks are met because they cannot enter direct reclaim.
> >>>
> >>> Yeah, they fail leaving some spare for more urgent allocations. Like
> >>> how the order-0 allocations work.
> >>
> >>
> >> order-0 watermarks are still in place. After the patch, it is still not
> >> possible for the allocations to break the watermarks there.
> > 
> > The watermarks for higher order pages you could say are implicit but
> > still there. They are scaled down from the order-0 watermarks, so they
> > should behave in the same way. I just can't understand why you're
> > bypassing these if you think the order-0 behaviour is OK.
> 
> The problem is the watermarks for the higher orders are actually much
> stricter than for low orders.  This is a by-product of the way in which
> the algorithm calculates the current free at each iteration, taking away
> the pages at smaller order.  The effective free pages at each order is
> scaled by the ratio of the free pages at that order to the sum of
> all higher orders.  The effective min at each order is halved.
> 
> Due to the nature of the reclaim strategy we will always expect to see
> exponentially more order-0 pages than order-1 and so on, making it
> hugely more difficult to allocate a page at these higher orders.
> 
> >>> They should also kick kswapd to start freeing pages _before_ they start
> >>> failing too.
> >>>
> >>
> >>
> >> Should perhaps, but from what I read kswapd is only kicked into action
> >> when the first allocation attempt has already failed.
> > 
> > Well that's wrong unless you are allocating with GFP_THISNODE, in which
> > case that is specifically the behaviour that is asked for.
> 
> kswapd is kicked when we cannot allocate at the normal low water mark,
> we will then attempt a further allocation at min/2 etc.  However we are
> as likely to fail the second as the effective low water mark for higher
> order pages is significantly higher than for order-0.  So kswapd will be
> woken, but it has a huge job on its hands to get us from order-0 low
> water to order-N low water.  As we cannot sleep we are very likely to fail.
> 
> 
> I did some testing with the current algorithm in a test harness.  That
> testing seems to show that the effect of the reserve can be majorly
> higher than the real reserve.  If we look at the OOM from the original
> report we can see there was some 7700 pages free at the time of the
> allocation.  The effective reserve for an ALLOC_HARD allocation is only
> 711 pages, and yet we cannot allocate any pages over order-0.
> 
> total free: 7768
> reserve   : 1423
>    0    1    2    3    4    5    6    7    8    9   10
> 7560    0    8    0    1    1    0    1    0    0    0
> 
> allocation order : 0
> effective free   : 7768
> effective reserve: 711.5
> 
> allocation order : 1
> effective free   : 7767
> effective reserve: 711.5
> 0 207 355
> FAIL
> 
> 
> Looking at the figures above dispassionately it is hard to fault the
> logic of the allocator denying this allocation.  There are indeed very
> few pages at those orders and some (where possible) are reserved for
> PF_MEM tasks, for reclaim itself.  However, the reservation system takes
> no account of higher orders, so we can always end up in a situation
> where there are only order-0 pages free; all higher orders have been split.
> This gives us a constraint on all reclaim processing: it must only
> involve order-0 pages else it could deadlock.  BUT if that is true and
> reclaim only uses order-0 pages then there is actually no point in
> retaining any PF_MEM reserve at higher order as it would never be used.
> 
> What does this mean:
> 
> 1) any slab which is used from the reclaim path _must_ use order-0
> allocations,
> 2) any slab which is allocated from atomically _should_ use order-0
> allocations.
> 
> My understanding is all slabs within a slub slab cache have to be the
> same order.  So we need to ensure that any slab that might be used from
> the reclaim path must only use order-0 pages.  Also it seems that any
> slab that is allocated from atomically will have to use order-0 pages in
> order to remain reliable.  Christoph, do we have any facility to tag
> caches to use a specific allocation order?

It may be possible with something like this (probably incomplete and definitely
untested) patch. Only the caches that are known offhand to be involved
in the reclaim path are marked up here, there probably are others that
should use the flag.

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc1-mm1-clean/fs/bio.c linux-2.6.22-rc1-mm1-slubatomic/fs/bio.c
--- linux-2.6.22-rc1-mm1-clean/fs/bio.c	2007-05-13 02:45:56.000000000 +0100
+++ linux-2.6.22-rc1-mm1-slubatomic/fs/bio.c	2007-05-16 19:37:22.000000000 +0100
@@ -1187,13 +1187,15 @@ static void __init biovec_init_slabs(voi
 
 		size = bvs->nr_vecs * sizeof(struct bio_vec);
 		bvs->slab = kmem_cache_create(bvs->name, size, 0,
-                                SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
+                                SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_USES_ATOMIC,
+				NULL, NULL);
 	}
 }
 
 static int __init init_bio(void)
 {
-	bio_slab = KMEM_CACHE(bio, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
+	bio_slab = KMEM_CACHE(bio,
+				SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_USES_ATOMIC);
 
 	biovec_init_slabs();
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc1-mm1-clean/include/linux/slab.h linux-2.6.22-rc1-mm1-slubatomic/include/linux/slab.h
--- linux-2.6.22-rc1-mm1-clean/include/linux/slab.h	2007-05-16 10:54:18.000000000 +0100
+++ linux-2.6.22-rc1-mm1-slubatomic/include/linux/slab.h	2007-05-16 19:31:24.000000000 +0100
@@ -23,6 +23,7 @@ typedef struct kmem_cache kmem_cache_t _
 #define SLAB_DEBUG_FREE		0x00000100UL	/* DEBUG: Perform (expensive) checks on free */
 #define SLAB_RED_ZONE		0x00000400UL	/* DEBUG: Red zone objs in a cache */
 #define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
+#define SLAB_USES_ATOMIC	0x00001000UL	/* Slub uses atomic, must be order-0 */
 #define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
 #define SLAB_CACHE_DMA		0x00004000UL	/* Use GFP_DMA memory */
 #define SLAB_STORE_USER		0x00010000UL	/* DEBUG: Store the last owner for bug hunting */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc1-mm1-clean/lib/radix-tree.c linux-2.6.22-rc1-mm1-slubatomic/lib/radix-tree.c
--- linux-2.6.22-rc1-mm1-clean/lib/radix-tree.c	2007-05-16 10:54:18.000000000 +0100
+++ linux-2.6.22-rc1-mm1-slubatomic/lib/radix-tree.c	2007-05-16 19:36:18.000000000 +0100
@@ -1023,7 +1023,8 @@ void __init radix_tree_init(void)
 {
 	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
 			sizeof(struct radix_tree_node), 0,
-			SLAB_PANIC, radix_tree_node_ctor, NULL);
+			SLAB_PANIC|SLAB_USES_ATOMIC,
+			radix_tree_node_ctor, NULL);
 	radix_tree_init_maxindex();
 	hotcpu_notifier(radix_tree_callback, 0);
 }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc1-mm1-clean/mm/slub.c linux-2.6.22-rc1-mm1-slubatomic/mm/slub.c
--- linux-2.6.22-rc1-mm1-clean/mm/slub.c	2007-05-16 10:54:19.000000000 +0100
+++ linux-2.6.22-rc1-mm1-slubatomic/mm/slub.c	2007-05-16 19:41:51.000000000 +0100
@@ -1978,7 +1978,11 @@ static int calculate_sizes(struct kmem_c
 	size = ALIGN(size, align);
 	s->size = size;
 
-	s->order = calculate_order(size);
+	if (flags & SLAB_USES_ATOMIC) {
+		BUG_ON(size > PAGE_SIZE);
+		s->order = 0;
+	} else
+		s->order = calculate_order(size);
 	if (s->order < 0)
 		return 0;


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 18:48               ` Mel Gorman
@ 2007-05-16 19:00                 ` Christoph Lameter
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Lameter @ 2007-05-16 19:00 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andy Whitcroft, Nick Piggin, nicolas.mailhot, akpm, linux-mm

> > few pages at those orders and some (where possible) are reserved for
> > PF_MEM tasks, for reclaim itself.  However, the reservation system takes
> > no account of higher orders, so we can always end up in a situation

Could we change the reservation system to take account of higher orders?

> > My understanding is all slabs within a slub slab cache have to be the
> > same order.  So we need to ensure that any slab that might be used from
> > the reclaim path must only use order-0 pages.  Also it seems that any
> > slab that is allocated from atomically will have to use order-0 pages in
> > order to remain reliable.  Christoph, do we have any facility to tag
> > caches to use a specific allocation order?

I would like to avoid adding such a flag. Forcing low orders on a 
slab limits its scalability. Higher orders mean less frequent taking of 
locks. We would be adding more special casing to the VM. It's better if we could 
handle the reserves in such a way that higher allocs are possible.

Another solution may be to make sure that we can tolerate failures of 
atomic allocs? GFP_ATOMIC has always had the stigma of being able to fail 
after all.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-16 16:46                               ` Mel Gorman
@ 2007-05-17  7:09                                 ` Nick Piggin
  2007-05-17 12:22                                   ` Andy Whitcroft
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Piggin @ 2007-05-17  7:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nicolas Mailhot, Christoph Lameter, Andy Whitcroft, akpm,
	Linux Memory Management List

Mel Gorman wrote:
> On (17/05/07 01:44), Nick Piggin didst pronounce:

>>>If the watermark was totally ignored with the second patch, I would 
>>>understand
>>>but they are still obeyed. Even if it is an ALLOC_HIGH or ALLOC_HARDER
>>>allocation, the watermarks are obeyed for order-0 so memory does not get
>>>exhausted as that could cause a host of problems. The difference is if this
>>>is a HIGH or HARDER allocation and the memory can be granted without going
>>>below the order-0 watermarks, it'll succeed. Would it be better if the
>>>lack of ALLOC_CPUSET was used to determine when only order-0 watermarks
>>>should be obeyed?
>>
>>But I don't know why you want to disobey higher order watermarks in the
>>first place.
> 
> 
> Because the original problem was bio_alloc() allocations failing and the OOM
> log showed that the higher-order pages were available. Patch 2 addressed it
> by succeeding these allocations if the min watermark was not breached with the
> knowledge that kswapd was awake and reclaiming at the relevant order. I think
> it may even have solved it without the kswapd change but the kswapd change
> seemed sensible.

But that just breaks the watermarks.

It could be that the actual values of the watermarks as they are now are
not very good ones, which is where the problem is coming from.


>>*Those* are exactly the things that are going to be helpful
>>to fix this problem of atomic higher order allocations failing or non
>>atomic ones going into direct reclaim.
>>
> 
> 
> And the intention was that non-atomic ones would go into direct reclaim
> after kicking kswapd but the atomic allocations would at least succeed if
> the memory was there as long as they don't totally mess up watermarks.

But we have 3 levels of watermarks, so you can keep a reserve for atomic
allocations _and_ a buffer between the reclaim watermark and the direct
reclaim watermark.


>>>Raising watermarks is no guarantee that a high-order allocation that can 
>>>sleep
>>>will occur at the right time to kick kswapd awake and that it'll get back 
>>>from
>>>whatever it's doing in time to spot the new order and start reclaiming 
>>>again.
>>
>>You don't *need* a higher order allocation that can sleep in order
>>to kick kswapd. Crikey, I keep saying this.
>>
> 
> 
> Indeed, we seem to have got stuck in a loop of sorts.
> 
> I understand that kswapd gets kicked awake either way but there must be a
> timing issue. Let's say we had a situation like
> 
> order-0 alloc
> watermark hit => wake kswapd
> order-0 alloc			kswapd reclaiming order 0
> order-0 alloc			kswapd reclaiming order 0
> order-3 alloc => kick kswapd for order 3
> order-0 alloc			kswapd reclaiming order 0
> order-3 alloc			kswapd reclaiming order 0
> order-3 alloc			kswapd reclaiming order 0
> order-3 alloc => highorder mark hit, fail
> 
> kswapd will keep reclaiming at order-0 until it completes a reclaim cycle,
> spots the new order and starts over again. So there is a potentially
> sizable window there where problems can hit. Right?

Take a look at the code. wakeup_kswapd and __alloc_pages.

First, assume the zone is above high watermarks for order-0 and order-1.
order-0 allocs...
order-1 low watermark hit => don't care, not allocing order-1
order-0 low watermark hit => wake kswapd reclaim order 0
order-1 alloc => wakeup_kswapd raises kswapd_max_order to 1
order-1 allocs continue to succeed until the min watermark is hit
order-1 *atomic* allocs continue until the atomic reserve is hit
order-1 memalloc allocs continue until no more order-1 pages left.

There really is (or should be) a proper watermarking system in place that
provides the right buffering for higher order allocations.
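
The layering in the sequence above can be put as a small order-0-only
model. Everything here is a hypothetical sketch: the names are invented,
and the size of the atomic reserve (half of min) is an assumption chosen
for illustration, not a value taken from the kernel.

```c
#include <assert.h>

/* Hypothetical allocation classes, from least to most urgent. */
enum model_class { MODEL_NORMAL, MODEL_ATOMIC, MODEL_MEMALLOC };

struct model_marks {
	long high; /* kswapd may stop reclaiming above this */
	long low;  /* kswapd is woken at or below this */
	long min;  /* ordinary allocations stop at or below this */
};

/* Background reclaim starts before any class begins to fail. */
int model_should_wake_kswapd(long free_pages, const struct model_marks *m)
{
	return free_pages <= m->low;
}

/* Lowest free-page count at which a class may still allocate. */
long model_floor(enum model_class c, const struct model_marks *m)
{
	switch (c) {
	case MODEL_ATOMIC:
		return m->min / 2; /* assumed size of the atomic reserve */
	case MODEL_MEMALLOC:
		return 0;          /* may take the last free page */
	case MODEL_NORMAL:
	default:
		return m->min;
	}
}

int model_may_allocate(long free_pages, enum model_class c,
		       const struct model_marks *m)
{
	return free_pages > model_floor(c, m);
}
```

The gap between low and min is the buffer in which kswapd runs while
every class still succeeds; the gap between min and the atomic floor is
what is held back for callers that cannot sleep.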


>>Working out why it apparently isn't working, first. Then maybe look at
>>raising watermarks (they get reduced fairly rapidly as the order increases,
>>so it might just be that there is not enough at order-3).
>>
> 
> 
> I believe it failed to work due to a combination of kswapd reclaiming at
> the wrong order for a while and the fact that the watermarks are pretty
> aggressive when it comes to higher orders. I'm trying to think of
> alternative fixes but keep coming back to the current fix using 
> !(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
> the memory is there and above min watermarks at order-0.

kswapd reclaiming at the wrong order should be a bug. It should start
reclaiming at the right order as soon as an allocation (atomic or not)
goes through the "start reclaiming now" watermark.

Now this is just looking at mainline code that has the kswapd_max_order,
and kswapd doesn't actually reclaim "at" any order -- it just uses the
kswapd_max_order to know when the required "stop reclaiming now" marks
have been hit. If lumpy reclaim is not reclaiming at the right order,
then it means it isn't refreshing from kswapd_max_order enough.

-- 
SUSE Labs, Novell Inc.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations
  2007-05-16 18:28             ` Andy Whitcroft
  2007-05-16 18:48               ` Mel Gorman
@ 2007-05-17  7:34               ` Nick Piggin
  1 sibling, 0 replies; 39+ messages in thread
From: Nick Piggin @ 2007-05-17  7:34 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: clameter, Mel Gorman, nicolas.mailhot, akpm, linux-mm

Andy Whitcroft wrote:
> Nick Piggin wrote:

>>Doesn't mean you should break the watermarks. !wait allocations don't
>>always happen from interrupt context either, and it is possible to see
>>code doing
> 
> 
> The problem perhaps here is that we are not able to allocate at all
> despite having large amounts of memory free.  In the original problem
> report we had a failing order-2 allocation when order-7 pages were free,
> and we were over the reserve.

Look, the watermarks for higher order pages are exactly the same as those
for order-0 allocations, simply scaled. And the reason for subtracting
lower order pages is utterly logical.

Firstly, some background. If you have 100 order-0 pages, 100 order-1 pages,
and 100 order-2 pages in the buddy lists, then you have 700 order-0 pages
available, or 300 order-1, or 100 order-2. Right?
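
The arithmetic above can be checked with a short helper. It is a sketch
only; the function name and the array layout (nr_free[o] holding the
number of free blocks of order o) are assumptions made for the example.

```c
#include <assert.h>

/*
 * Number of order-'order' allocations the free lists could satisfy,
 * counting every free block of that order or larger.
 * nr_free[o] is the number of free blocks of order o, for o < nr_orders.
 */
unsigned long model_available(const unsigned long *nr_free, int nr_orders,
			      int order)
{
	unsigned long total = 0;
	int o;

	assert(order >= 0 && order < nr_orders);
	for (o = order; o < nr_orders; o++)
		total += nr_free[o] << (o - order);
	return total;
}
```

Blocks below the requested order are skipped entirely, which is the same
reason the watermark check subtracts them before comparing.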

Then if the order-0 watermarks are (in KB):
high 256
low 128
min 64

The order-1 watermarks will be 128, 64, 32.
And order-2 will be 64, 32, 16.
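
As a sketch of the scaling just listed, each step up in order halves the
order-0 mark. The helper name is hypothetical and the function only
restates the numbers above.

```c
#include <assert.h>

/* Watermark applied at 'order', derived by halving the order-0 mark. */
long model_scaled_mark(long order0_mark, int order)
{
	assert(order0_mark >= 0 && order >= 0);
	return order0_mark >> order;
}
```

For the example marks this gives 128, 64, 32 at order-1 and 64, 32, 16
at order-2.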

So the higher order watermarks are _less_ aggressive than order-0 watermarks.
I made them like that because previously there were _no_ higher order
watermarks, so I didn't want to start too strongly.

There is no reason why they couldn't be changed (eg. there would be a valid
argument to say at least the top watermark should be the same for all orders).

>>I feel strongly about not breaking these things which are specifically
>>there
>>for a reason and that are being changed seemingly because of the false
>>impression that kswapd doesn't proactively free pages for them.
> 
> 
> The interaction with kswapd is not instantaneous.  When an allocation at
> high order fails to allocate at the low watermarks it will indeed wake
> up kswapd and that will work to release memory at the order specified.
> However, if it is already reclaiming at another order it will not switch
> up until it next completes a pass.  For an allocator who cannot sleep
> this is very likely to be too late.  This is never going to help a
> bursty allocator.

If that is how lumpy reclaim works, then shouldn't that be improved?
But in general (and especially mainline) it is definitely harder to reclaim
higher orders, so there might even be an argument to say the watermarks
should be _higher_ for higher order allocations. (I would argue that we
should convert the allocator to lower order allocations ;)).

>>The watermarks for higher order pages you could say are implicit but
>>still there. They are scaled down from the order-0 watermarks, so they
>>should behave in the same way. I just can't understand why you're
>>bypassing these if you think the order-0 behaviour is OK.
> 
> 
> The problem is the watermarks for the higher orders are actually much
> stricter than for low orders.  This is a by-product of the way in which
> the algorithm calculates the current free at each iteration, taking away
> the pages at smaller order.  The effective free pages at each order is
> scaled by the ratio of the free pages at that order to the sum of
> all higher orders.  The effective min at each order is halved.
> 
> Due to the nature of the reclaim strategy we will always expect to see
> exponentially more order-0 pages than order-1 etc and so on, making it
> hugely more difficult to allocate a page at these higher orders.

I don't think it is valid to say the higher watermarks are more strict. They
are less strict in terms of both numbers of pages of that order, and of
total bytes.
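
For reference, the check both sides are describing can be modelled in a few
lines. This is a simplified userspace sketch based on the description in
this thread (lower-order pages are subtracted and the minimum is halved at
each step). The function name watermark_ok, the list layout and the flag
values are assumptions for illustration, and the ALLOC_HIGH/ALLOC_HARDER
reductions are recalled from mainline of the time rather than quoted from
it. All quantities are in pages.

```python
ALLOC_HARDER = 0x10  # illustrative flag values, not copied from kernel headers
ALLOC_HIGH = 0x20


def watermark_ok(nr_free, order, mark, alloc_flags=0, lowmem_reserve=0):
    """Simplified model of the per-order watermark check.

    nr_free[o] is the number of free blocks of order o; mark is the
    order-0 watermark the caller is being held to.
    """
    total_free = sum(count << o for o, count in enumerate(nr_free))
    # Account for the pages this allocation itself would consume.
    free_pages = total_free - (1 << order) + 1
    minimum = mark
    if alloc_flags & ALLOC_HIGH:
        minimum -= minimum // 2
    if alloc_flags & ALLOC_HARDER:
        minimum -= minimum // 4
    if free_pages <= minimum + lowmem_reserve:
        return False
    for o in range(order):
        # Pages held in blocks smaller than the request cannot satisfy it.
        free_pages -= nr_free[o] << o
        # Require fewer free pages as the order rises.
        minimum >>= 1
        if free_pages <= minimum:
            return False
    return True


fragmented = [500, 10, 2]
print(watermark_ok(fragmented, 0, 64))
print(watermark_ok(fragmented, 2, 64))
```

With 500 free order-0 pages, 10 order-1 blocks and 2 order-2 blocks against
a 64-page mark, the order-0 check passes but the order-2 check does not,
although two order-2 blocks are on the free list. Whether that counts as
"too strict" or as the reserve doing its job is exactly the disagreement
here.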

>>Well that's wrong unless you are allocating with GFP_THISNODE, in which
>>case that is specifically the behaviour that is asked for.
> 
> 
> kswapd is kicked when we cannot allocate at the normal low water mark,
> we will then attempt a further allocation at min/2 etc.  However we are
> as likely to fail the second as the effective low water mark for higher
> order pages is significantly higher than for order-0.  So kswapd will be
> woken, but it has a huge job on its hands to get us from order-0 low
> order to order-N low water.  As we cannot sleep we are very likely to fail.

No. The problem is not that the watermarks are too high!! Firstly, as I
explained, they are lower. Secondly, you gain _more_ buffering if they
are higher. You're not looking at the whole picture, you see the watermark
that we eventually hit and say "that's too high, let's just allow some
allocations into it", but actually this just screws the guy below you.

The reality is that kswapd isn't getting kicked early enough, so the
watermarks should be increased.

> Looking at the figures above dispassionately it is hard to fault the
> logic of the allocator denying this allocation.  There are indeed very
> few pages at those orders and some (where possible) are reserved for
> PF_MEM tasks, for reclaim itself.

So if you lower the watermarks, then everybody has proportionately less
buffering for themselves, and everything falls apart more easily.

>  However, the reservation system takes
> no account of higher orders, so we can always end up in a situation
> where there are only order-0 pages free; all higher orders have been split.

What do you mean the reservation system takes no account of higher
orders?

The lowmem_reserve thing is *not* the PF_MEMALLOC reserve, it is a
mechanism which makes eg. a tiny ZONE_DMA not be allocated from when
doing GFP_HIGHMEM allocations on a 4GB system.

>  This gives us a constraint on all reclaim processing, it must only
> involve order-0 pages else it could deadlock.  BUT if that is true and
> reclaim only uses order-0 pages then there is actually no point in
> retaining any PF_MEM reserve at higher order as it would never be used.

There is nothing to say reclaim only uses order-0 pages... but I would
buy the argument that says we don't need the PF_MEMALLOC reserves for
eg. order > X allocations (where X is maybe 3).

 > I think that probably means that the second of our patches here is not
 > the right approach to this problem long term.

I don't think either of the patches are right. Changes to this code
really have to be based on a solid understanding of how it works firstly,
then what the problem is, then how the change is going to go about fixing
the problem without breaking things.

 > A patch which sets the
 > critical slabs to order-0 is probably the way forward.

Right. Critical allocations should always be order-very-low.

-- 
SUSE Labs, Novell Inc.


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-17  7:09                                 ` Nick Piggin
@ 2007-05-17 12:22                                   ` Andy Whitcroft
  2007-05-18  2:25                                     ` Nick Piggin
  0 siblings, 1 reply; 39+ messages in thread
From: Andy Whitcroft @ 2007-05-17 12:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mel Gorman, Nicolas Mailhot, Christoph Lameter, akpm,
	Linux Memory Management List

Nick Piggin wrote:
> Mel Gorman wrote:
>> On (17/05/07 01:44), Nick Piggin didst pronounce:
> 
>>>> If the watermark was totally ignored with the second patch, I would
>>>> understand but they are still obeyed. Even if it is an ALLOC_HIGH or
>>>> ALLOC_HARDER allocation, the watermarks are obeyed for order-0 so memory
>>>> does not get exhausted as that could cause a host of problems. The
>>>> difference is if this is a HIGH or HARDER allocation and the memory can
>>>> be granted without going below the order-0 watermarks, it'll succeed.
>>>> Would it be better if the lack of ALLOC_CPUSET was used to determine
>>>> when only order-0 watermarks should be obeyed?
>>>
>>> But I don't know why you want to disobey higher order watermarks in the
>>> first place.
>>
>>
>> Because the original problem was bio_alloc() allocations failing and the
>> OOM log showed that the higher-order pages were available. Patch 2
>> addressed it by succeeding these allocations if the min watermark was not
>> breached with the knowledge that kswapd was awake and reclaiming at the
>> relevant order. I think it may even have solved it without the kswapd
>> change but the kswapd change seemed sensible.
> 
> But that just breaks the watermarks.
> 
> It could be that the actual values of the watermarks as they are now are
> not very good ones, which is where the problem is coming from.
> 
> 
>>> *Those* are exactly the things that are going to be helpful
>>> to fix this problem of atomic higher order allocations failing or non
>>> atomic ones going into direct reclaim.
>>>
>>
>>
>> And the intention was that non-atomic ones would go into direct reclaim
>> after kicking kswapd but the atomic allocations would at least succeed if
>> the memory was there as long as they don't totally mess up watermarks.
> 
> But we have 3 levels of watermarks, so you can keep a reserve for atomic
> allocations _and_ a buffer between the reclaim watermark and the direct
> reclaim watermark.
> 
> 
>>>> Raising watermarks is no guarantee that a high-order allocation that can
>>>> sleep will occur at the right time to kick kswapd awake and that it'll
>>>> get back from whatever it's doing in time to spot the new order and
>>>> start reclaiming again.
>>>
>>> You don't *need* a higher order allocation that can sleep in order
>>> to kick kswapd. Crikey, I keep saying this.
>>>
>>
>>
>> Indeed, we seem to have got stuck in a loop of sorts.
>>
>> I understand that kswapd gets kicked awake either way but there must be a
>> timing issue. Let's say we had a situation like
>>
>> order-0 alloc
>> watermark hit => wake kswapd
>> order-0 alloc            kswapd reclaiming order 0
>> order-0 alloc            kswapd reclaiming order 0
>> order-3 alloc => kick kswapd for order 3
>> order-0 alloc            kswapd reclaiming order 0
>> order-3 alloc            kswapd reclaiming order 0
>> order-3 alloc            kswapd reclaiming order 0
>> order-3 alloc => highorder mark hit, fail
>>
>> kswapd will keep reclaiming at order-0 until it completes a reclaim cycle
>> and spots the new order and starts over again. So there is a potentially
>> sizable window there where problems can hit. Right?
> 
> Take a look at the code. wakeup_kswapd and __alloc_pages.
> 
> First, assume the zone is above high watermarks for order-0 and order-1.
> order-0 allocs...
> order-1 low watermark hit => don't care, not allocing order-1
> order-0 low watermark hit => wake kswapd reclaim order 0
> order-1 alloc => wakeup_kswapd raises kswapd_max_order to 1
> order-1 allocs continue to succeed until the min watermark is hit
> order-1 *atomic* allocs continue until the atomic reserve is hit
> order-1 memalloc allocs continue until no more order-1 pages left.

This represents the ideal.  However we never consider the reserves at
order-1 unless we get an order-1 allocation.  With lots of order-0
allocations (the norm) we can run the order-1 availability well below
even the atomic reserve without anyone noticing, while the total reserve
is above the order-0 low watermark.  Here kswapd has been idle as there
is only order-0 activity and we have sufficient of those.  THEN an
order-1 comes in, we are below the order-1 low watermarks, we wake
kswapd, and retry and discover we are below the atomic threshold and
_fail_ the allocation.
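
The failure path described above can be sketched as follows. This is a
simplified userspace model, not the allocator itself: the names
watermark_ok and atomic_alloc_attempt, the flag values and the example
numbers are all assumptions made for illustration, and the retry at the
min watermark with ALLOC_HIGH and ALLOC_HARDER reflects my understanding
of how a non-sleeping GFP_ATOMIC request was treated at the time.

```python
ALLOC_HARDER = 0x10  # illustrative flag values, not taken from kernel headers
ALLOC_HIGH = 0x20


def watermark_ok(nr_free, order, mark, alloc_flags=0):
    """Simplified watermark check: nr_free[o] is the number of free order-o blocks."""
    free_pages = sum(count << o for o, count in enumerate(nr_free))
    free_pages -= (1 << order) - 1
    minimum = mark
    if alloc_flags & ALLOC_HIGH:
        minimum -= minimum // 2
    if alloc_flags & ALLOC_HARDER:
        minimum -= minimum // 4
    if free_pages <= minimum:
        return False
    for o in range(order):
        free_pages -= nr_free[o] << o  # lower-order pages cannot satisfy this request
        minimum >>= 1                  # require fewer pages at each higher order
        if free_pages <= minimum:
            return False
    return True


def atomic_alloc_attempt(nr_free, order, low, minimum):
    """Return the sequence of events for one non-sleeping allocation attempt."""
    events = []
    if watermark_ok(nr_free, order, low):
        events.append("allocated above low watermark")
        return events
    events.append("below low watermark: wake kswapd")
    if watermark_ok(nr_free, order, minimum, ALLOC_HIGH | ALLOC_HARDER):
        events.append("allocated from atomic reserve")
    else:
        events.append("below atomic reserve: allocation fails")
    return events


# Plenty of order-0 pages, very few order-1 blocks left after splitting.
zone = [400, 5]
print(watermark_ok(zone, 0, 128))              # the order-0 low watermark check
print(atomic_alloc_attempt(zone, 1, 128, 64))  # the first order-1 atomic request
```

In this state the order-0 check against the low watermark still passes, so
order-0 traffic alone gives kswapd no reason to run, yet the first order-1
atomic request both wakes kswapd and fails.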

> 
> There really is (or should be) a proper watermarking system in place that
> provides the right buffering for higher order allocations.

I think that this is "should be", not "is".

>>> Working out why it apparently isn't working, first. Then maybe look at
>>> raising watermarks (they get reduced fairly rapidly as the order
>>> increases, so it might just be that there is not enough at order-3).
>>>
>>
>>
>> I believe it failed to work due to a combination of kswapd reclaiming at
>> the wrong order for a while and the fact that the watermarks are pretty
>> aggressive when it comes to higher orders. I'm trying to think of
>> alternative fixes but keep coming back to the current fix using
>> !(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
>> the memory is there and above min watermarks at order-0.
> 
> kswapd reclaiming at the wrong order should be a bug. It should start
> reclaiming at the right order as soon as an allocation (atomic or not)
> goes through the "start reclaiming now" watermark.
> 
> Now this is just looking at mainline code that has the kswapd_max_order,
> and kswapd doesn't actually reclaim "at" any order -- it just uses the
> kswapd_max_order to know when the required "stop reclaiming now" marks
> have been hit. If lumpy reclaim is not reclaiming at the right order,
> then it means it isn't refreshing from kswapd_max_order enough.

Yes I believe all of this is working as designed.  The problem is that
we treat order-0 and order-1 allocations as independent.  We do not take
into account that we split order-1's to make order-0.  We do not check
the order-1 reserve for order 0 and so do not wake kswapd early enough.
It is very hard given the interdependent nature of the current calculation
to detect transitions at _other_ orders when we allocate at any specific
order.

Hmmmmmm.

-apw


* Re: [PATCH 1/2] Have kswapd keep a minimum order free other than order-0
  2007-05-17 12:22                                   ` Andy Whitcroft
@ 2007-05-18  2:25                                     ` Nick Piggin
  0 siblings, 0 replies; 39+ messages in thread
From: Nick Piggin @ 2007-05-18  2:25 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Mel Gorman, Nicolas Mailhot, Christoph Lameter, akpm,
	Linux Memory Management List

Andy Whitcroft wrote:
> Nick Piggin wrote:

>>>order-0 alloc
>>>watermark hit => wake kswapd
>>>order-0 alloc            kswapd reclaiming order 0
>>>order-0 alloc            kswapd reclaiming order 0
>>>order-3 alloc => kick kswapd for order 3
>>>order-0 alloc            kswapd reclaiming order 0
>>>order-3 alloc            kswapd reclaiming order 0
>>>order-3 alloc            kswapd reclaiming order 0
>>>order-3 alloc => highorder mark hit, fail
>>>
>>>kswapd will keep reclaiming at order-0 until it completes a reclaim cycle
>>>and spots the new order and starts over again. So there is a potentially
>>>sizable window there where problems can hit. Right?
>>
>>Take a look at the code. wakeup_kswapd and __alloc_pages.
>>
>>First, assume the zone is above high watermarks for order-0 and order-1.
>>order-0 allocs...
>>order-1 low watermark hit => don't care, not allocing order-1
>>order-0 low watermark hit => wake kswapd reclaim order 0
>>order-1 alloc => wakeup_kswapd raises kswapd_max_order to 1
>>order-1 allocs continue to succeed until the min watermark is hit
>>order-1 *atomic* allocs continue until the atomic reserve is hit
>>order-1 memalloc allocs continue until no more order-1 pages left.
> 
> 
> This represents the ideal.  However we never consider the reserves at
> order-1 unless we get an order-1 allocation.  With lots of order-0
> allocations (the norm) we can run the order-1 availability well below
> even the atomic reserve without anyone noticing, while the total reserve
> is above the order-0 low watermark.

Yes, but my reply was addressing the misconception that kswapd never
has its reclaim-order updated while it is reclaiming for a lower order.
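
A minimal sketch of that update, assuming the behaviour described earlier
in the thread (wakeup_kswapd raises kswapd_max_order when a request finds
the zone below its low watermark). The Node class and the boolean argument
are hypothetical simplifications; the real function works on zone and node
structures and performs further checks that are left out here.

```python
class Node:
    """Hypothetical stand-in for the per-node state kswapd looks at."""

    def __init__(self):
        self.kswapd_max_order = 0
        self.kswapd_awake = False


def wakeup_kswapd(node, below_low_watermark, order):
    """Record the highest order seen and wake kswapd if the zone is under pressure."""
    if not below_low_watermark:
        return
    # The order is only ever raised here, even while kswapd is already
    # running on behalf of a lower order.
    if node.kswapd_max_order < order:
        node.kswapd_max_order = order
    node.kswapd_awake = True


node = Node()
wakeup_kswapd(node, True, 0)   # order-0 low watermark hit
wakeup_kswapd(node, True, 1)   # order-1 request arrives while kswapd runs
wakeup_kswapd(node, True, 0)   # later order-0 requests do not lower it
print(node.kswapd_max_order, node.kswapd_awake)
```

The point of the model is only that a non-sleeping request can raise the
recorded order; how quickly kswapd acts on the new value is a separate
question.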

It is by design that we don't make order-0 allocations notice order-1
watermarks, so if there is some problem with that, then that is what
should be changed. Not randomly break the watermarking code.


>  Here kswapd has been idle as there
> is only order-0 activity and we have sufficient of those.  THEN an
> order-1 comes in, we are below the order-1 low watermarks, we wake
> kswapd, and retry and discover we are below the atomic threshold and
> _fail_ the allocation.

And that is by design because we don't want to have order-1 pages free
if there are only order-0 allocations.

Anyway, atomic allocations are able to fail gracefully, in which case
kswapd will be kicked for next time. Non-atomic allocations can enter
direct reclaim, so it isn't the end of the world.


>>There really is (or should be) a proper watermarking system in place that
>>provides the right buffering for higher order allocations.
> 
> 
> I think that this is "should be", not "is".

Well you also said earlier that our problems are due to higher order
watermarks being too aggressive. So I think what is needed is to
actually work out what the real problem is first.


>>>I believe it failed to work due to a combination of kswapd reclaiming at
>>>the wrong order for a while and the fact that the watermarks are pretty
>>>aggressive when it comes to higher orders. I'm trying to think of
>>>alternative fixes but keep coming back to the current fix using
>>>!(alloc_flags & ALLOC_CPUSET) to allow !wait allocations to succeed if
>>>the memory is there and above min watermarks at order-0.
>>
>>kswapd reclaiming at the wrong order should be a bug. It should start
>>reclaiming at the right order as soon as an allocation (atomic or not)
>>goes through the "start reclaiming now" watermark.
>>
>>Now this is just looking at mainline code that has the kswapd_max_order,
>>and kswapd doesn't actually reclaim "at" any order -- it just uses the
>>kswapd_max_order to know when the required "stop reclaiming now" marks
>>have been hit. If lumpy reclaim is not reclaiming at the right order,
>>then it means it isn't refreshing from kswapd_max_order enough.
> 
> 
> Yes I believe all of this is working as designed.  The problem is that
> we treat order-0 and order-1 allocations as independent.  We do not take
> into account that we split order-1's to make order-0.  We do not check
> the order-1 reserve for order 0 and so do not wake kswapd early enough.
> It is very hard given the interdependent nature of the current calculation
> to detect transitions at _other_ orders when we allocate at any specific
> order.

Breaking the watermark code then adding a ridiculous hack to pin the
reclaim order to the highest created kmem cache is the wrong way to
go about this.

There are a number of right ways to help with this problem you describe.
One would be to *raise* higher order watermarks. Another would be to
have some decaying check-this-order-watermark-on-alloc counter in the
zone.

All this higher order allocation stuff had better _really_ be worth it...

-- 
SUSE Labs, Novell Inc.


end of thread, newest message 2007-05-18  2:25 UTC

Thread overview: 39+ messages
2007-05-14 17:32 [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Mel Gorman
2007-05-14 17:32 ` [PATCH 1/2] Have kswapd keep a minimum order free other than order-0 Mel Gorman
2007-05-14 18:01   ` Christoph Lameter
2007-05-14 18:13     ` Christoph Lameter
2007-05-14 18:24       ` Mel Gorman
2007-05-14 18:52         ` Christoph Lameter
2007-05-15  8:42         ` Nicolas Mailhot
2007-05-15  9:16           ` Mel Gorman
2007-05-16  8:25             ` Nick Piggin
2007-05-16  9:03               ` Mel Gorman
2007-05-16  9:10                 ` Nick Piggin
2007-05-16  9:45                   ` Mel Gorman
2007-05-16 12:28                     ` Nick Piggin
2007-05-16 13:50                       ` Mel Gorman
2007-05-16 14:04                         ` Nick Piggin
2007-05-16 15:32                           ` Mel Gorman
2007-05-16 15:44                             ` Nick Piggin
2007-05-16 16:46                               ` Mel Gorman
2007-05-17  7:09                                 ` Nick Piggin
2007-05-17 12:22                                   ` Andy Whitcroft
2007-05-18  2:25                                     ` Nick Piggin
2007-05-16 15:46                             ` Nick Piggin
2007-05-16 14:20                         ` Nick Piggin
2007-05-16 15:06                           ` Nicolas Mailhot
2007-05-16 15:33                             ` Mel Gorman
2007-05-15 17:09           ` Christoph Lameter
2007-05-15  4:39       ` Christoph Lameter
2007-05-14 18:19     ` Mel Gorman
2007-05-14 17:32 ` [PATCH 2/2] Only check absolute watermarks for ALLOC_HIGH and ALLOC_HARDER allocations Mel Gorman
2007-05-16 12:14   ` Nick Piggin
2007-05-16 13:24     ` Mel Gorman
2007-05-16 13:35       ` Nick Piggin
2007-05-16 14:00         ` Mel Gorman
2007-05-16 14:11           ` Nick Piggin
2007-05-16 18:28             ` Andy Whitcroft
2007-05-16 18:48               ` Mel Gorman
2007-05-16 19:00                 ` Christoph Lameter
2007-05-17  7:34               ` Nick Piggin
2007-05-14 18:13 ` [PATCH 0/2] Two patches to address bug report in relation to high-order atomic allocations Nicolas Mailhot
