[RFC 0/5] Memory controller soft limit introduction (v3)


* [RFC 0/5] Memory controller soft limit introduction (v3)
@ 2008-06-27 15:18 Balbir Singh
  2008-06-27 15:18 ` [RFC 1/5] Memory controller soft limit documentation Balbir Singh
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-27 15:18 UTC
  To: Andrew Morton
  Cc: YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm,
	Balbir Singh, KAMEZAWA Hiroyuki

This patchset implements the basic changes required to support soft limits
in the memory controller. A soft limit is a variation of the currently
supported hard limit feature: a memory cgroup can exceed its soft limit
as long as there is no contention for memory.
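The following minimal user-space sketch makes the distinction concrete. It assumes a simplified group that has only usage, hard limit and soft limit fields. The names group_limits, charge_allowed() and reclaim_target() are hypothetical and are not part of the memory controller. A charge is refused only at the hard limit. The soft limit only decides how much a group should give back once memory is contended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a memory cgroup's limits; sizes are in bytes. */
struct group_limits {
	uint64_t usage;		/* bytes currently charged to the group */
	uint64_t limit;		/* hard limit: usage may never exceed it */
	uint64_t soft_limit;	/* soft limit: may be exceeded without contention */
};

/* A new charge fails only if it would take usage past the hard limit. */
static bool charge_allowed(const struct group_limits *g, uint64_t bytes)
{
	assert(g->usage <= g->limit);	/* the hard limit is an invariant */
	return bytes <= g->limit - g->usage;
}

/*
 * Under memory contention, a group is asked to give back only what it
 * holds above its soft limit; groups at or below it are left alone.
 */
static uint64_t reclaim_target(const struct group_limits *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}
```

If soft_limit is above limit, usage can never exceed the soft limit, so reclaim_target() always returns 0. This is why such a soft limit has no effect.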

These patches were tested on an x86_64 box by running programs in parallel
and checking their behaviour for various soft limit values.

These patches were developed on top of 2.6.26-rc5-mm3. Comments, suggestions,
and criticism are all welcome!

A previous version of the patch can be found at

http://kerneltrap.org/mailarchive/linux-kernel/2008/2/19/904114

TODOs:

1. Distribute the excess (uncontended) resources between groups
   in proportion to their soft limits
2. In the long term, merge with KAMEZAWA's and YAMAMOTO's watermark and
   background reclaim patches

series
------
memory-controller-soft-limit-add-documentation.patch
prio_heap_delete_max.patch
prio_heap_replace_leaf.patch
memory-controller-soft-limit-res-counter-updates.patch
memory-controller-soft-limit-reclaim-on-contention.patch

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [RFC 1/5] Memory controller soft limit documentation
  2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
@ 2008-06-27 15:18 ` Balbir Singh
  2008-06-27 15:18 ` [RFC 2/5] Add delete max to prio heap Balbir Singh
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-27 15:18 UTC
  To: Andrew Morton
  Cc: YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm,
	Balbir Singh, KAMEZAWA Hiroyuki


Add documentation for the soft limit feature.

Changelog v2 (Thanks to the review by Randy Dunlap)
1. Change several misuses of it's to its
2. Fix spelling errors and punctuation

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 Documentation/controllers/memory.txt |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff -puN Documentation/controllers/memory.txt~memory-controller-soft-limit-add-documentation Documentation/controllers/memory.txt
--- linux-2.6.26-rc5/Documentation/controllers/memory.txt~memory-controller-soft-limit-add-documentation	2008-06-27 20:43:04.000000000 +0530
+++ linux-2.6.26-rc5-balbir/Documentation/controllers/memory.txt	2008-06-27 20:43:04.000000000 +0530
@@ -205,6 +205,22 @@ The memory.force_empty gives an interfac
 
 will drop all charges in cgroup. Currently, this is maintained for test.
 
+The file memory.soft_limit_in_bytes allows users to set soft limits. A soft
+limit is set in the same way as the hard limit. The limit feature described
+earlier is a hard limit: a group can never exceed its hard limit. A soft
+limit, on the other hand, can be exceeded. A group will be shrunk back
+towards its soft limit when there is memory pressure/contention.
+
+Ideally the soft limit should always be set to a value smaller than the
+hard limit. However, the code does not force the user to do so. If the
+soft limit is greater than the hard limit, it has no meaning in that
+setup, since the group will always be restrained to its hard limit
+first.
+
+Example of setting a soft limit:
+
+# echo 100M > memory.soft_limit_in_bytes
+
 4. Testing
 
 Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11].
_
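The example above depends on the written value accepting a size suffix. The sketch below assumes the file parses its input the way the kernel's memparse() does: the base is auto-detected as with strtoull(..., 0), and K, M or G mean powers of 1024. Under that assumption, the conversion can be done in user space as follows. parse_size() is a hypothetical helper, not a kernel or libc function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert "100M"-style strings into bytes, treating
 * K, M and G (either case) as powers of 1024. Returns 0 on success.
 */
static int parse_size(const char *s, uint64_t *bytes)
{
	char *end;
	unsigned long long val;

	assert(s != NULL && bytes != NULL);
	val = strtoull(s, &end, 0);
	if (end == s)
		return -1;		/* no digits at all */

	switch (toupper((unsigned char)*end)) {
	case 'G':
		val <<= 10;
		/* fall through */
	case 'M':
		val <<= 10;
		/* fall through */
	case 'K':
		val <<= 10;
		end++;			/* consume the suffix */
		break;
	default:
		break;
	}
	if (*end == '\n')		/* echo appends a newline */
		end++;
	if (*end != '\0')
		return -1;		/* trailing garbage */
	*bytes = val;
	return 0;
}
```

With this helper, "100M" becomes 104857600 bytes.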


* [RFC 2/5] Add delete max to prio heap
  2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
  2008-06-27 15:18 ` [RFC 1/5] Memory controller soft limit documentation Balbir Singh
@ 2008-06-27 15:18 ` Balbir Singh
  2008-06-27 15:18 ` [RFC 3/5] Replacement policy on heap overfull Balbir Singh
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-27 15:18 UTC
  To: Andrew Morton
  Cc: YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm,
	Balbir Singh, KAMEZAWA Hiroyuki

This patch enhances the priority heap infrastructure by adding a
heap_delete_max() routine. These routines are helpful because

1. They allow nodes to be deleted from the prio_heap (a max heap), which is
   currently not possible
2. This infrastructure will be useful for the soft limit patches I am working
   on for the memory controller

Some of the common code has been factored into heap_adjust() (a.k.a. heapify
in data structures terminology).

I tested the routines by porting the code to user space (very easy to do) and
writing a simple test routine that checks that elements are removed from the
heap in descending order.
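A sketch like the following approximates the user-space test described above. It copies struct ptr_heap and the heap_adjust()/heap_delete_max() logic from the patch. It assumes a simplified insert, the hypothetical heap_push(), which only handles a heap that still has free room. It also uses a hypothetical int comparator, int_gt(). This is not the author's actual test program.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copy of the kernel's struct ptr_heap (a max-heap of pointers). */
struct ptr_heap {
	void **ptrs;
	int max;
	int size;
	int (*gt)(void *, void *);
};

/* Sift ptrs[0] down until neither child is greater (heap_adjust()). */
static void heap_adjust(struct ptr_heap *heap)
{
	int pos = 0;
	void **ptrs = heap->ptrs;
	void *p = ptrs[pos];

	for (;;) {
		int left = 2 * pos + 1;
		int right = 2 * pos + 2;
		int largest = pos;

		if (left < heap->size && heap->gt(ptrs[left], p))
			largest = left;
		if (right < heap->size && heap->gt(ptrs[right], ptrs[largest]))
			largest = right;
		if (largest == pos)
			break;
		/* Bump the larger child up and push p down one level */
		ptrs[pos] = ptrs[largest];
		ptrs[largest] = p;
		pos = largest;
	}
}

/* Hypothetical simplified insert: the heap must still have room. */
static void heap_push(struct ptr_heap *heap, void *p)
{
	int pos;

	assert(heap->size < heap->max);
	pos = heap->size++;
	while (pos > 0 && heap->gt(p, heap->ptrs[(pos - 1) / 2])) {
		heap->ptrs[pos] = heap->ptrs[(pos - 1) / 2];
		pos = (pos - 1) / 2;
	}
	heap->ptrs[pos] = p;
}

/* Remove and return the largest element, or NULL if the heap is empty. */
static void *heap_delete_max(struct ptr_heap *heap)
{
	void *res;

	if (heap->size == 0)
		return NULL;
	res = heap->ptrs[0];
	heap->size--;
	heap->ptrs[0] = heap->ptrs[heap->size];	/* put the last leaf on top */
	heap_adjust(heap);
	return res;
}

/* Comparator for int pointers: "greater than" yields a max-heap. */
static int int_gt(void *a, void *b)
{
	return *(int *)a > *(int *)b;
}
```

If you push 5, 1, 9, 3, 7, 9 and 2 and then repeatedly call heap_delete_max(), you get 9, 9, 7, 5, 3, 2, 1, followed by NULL.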

Reviewed-by: Paul Menage <menage@google.com>

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/prio_heap.h |   10 +++++++-
 lib/prio_heap.c           |   56 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 48 insertions(+), 18 deletions(-)

diff -puN include/linux/prio_heap.h~prio_heap_delete_max include/linux/prio_heap.h
--- linux-2.6.26-rc5/include/linux/prio_heap.h~prio_heap_delete_max	2008-06-27 20:43:08.000000000 +0530
+++ linux-2.6.26-rc5-balbir/include/linux/prio_heap.h	2008-06-27 20:43:08.000000000 +0530
@@ -53,6 +53,14 @@ void heap_free(struct ptr_heap *heap);
  */
 extern void *heap_insert(struct ptr_heap *heap, void *p);
 
-
+/**
+ * heap_delete_max - delete the maximum element from the top of the heap
+ * @heap: The heap to be operated upon
+ *
+ * Removes and returns the top (largest) element, or NULL if the heap is
+ * empty. The last element is moved to the top and sifted down so that the
+ * largest remaining element ends up at the top again.
+ */
+extern void *heap_delete_max(struct ptr_heap *heap);
 
 #endif /* _LINUX_PRIO_HEAP_H */
diff -puN lib/prio_heap.c~prio_heap_delete_max lib/prio_heap.c
--- linux-2.6.26-rc5/lib/prio_heap.c~prio_heap_delete_max	2008-06-27 20:43:08.000000000 +0530
+++ linux-2.6.26-rc5-balbir/lib/prio_heap.c	2008-06-27 20:43:08.000000000 +0530
@@ -23,11 +23,33 @@ void heap_free(struct ptr_heap *heap)
 	kfree(heap->ptrs);
 }
 
+static void heap_adjust(struct ptr_heap *heap)
+{
+	int pos = 0;
+	void **ptrs = heap->ptrs;
+	void *p = ptrs[pos];
+
+	while (1) {
+		int left = 2 * pos + 1;
+		int right = 2 * pos + 2;
+		int largest = pos;
+		if (left < heap->size && heap->gt(ptrs[left], p))
+			largest = left;
+		if (right < heap->size && heap->gt(ptrs[right], ptrs[largest]))
+			largest = right;
+		if (largest == pos)
+			break;
+		/* Push p down the heap one level and bump one up */
+		ptrs[pos] = ptrs[largest];
+		ptrs[largest] = p;
+		pos = largest;
+	}
+}
+
 void *heap_insert(struct ptr_heap *heap, void *p)
 {
 	void *res;
 	void **ptrs = heap->ptrs;
-	int pos;
 
 	if (heap->size < heap->max) {
 		/* Heap insertion */
@@ -49,22 +71,22 @@ void *heap_insert(struct ptr_heap *heap,
 	/* Replace the current max and heapify */
 	res = ptrs[0];
 	ptrs[0] = p;
-	pos = 0;
+	heap_adjust(heap);
+	return res;
+}
+
+void *heap_delete_max(struct ptr_heap *heap)
+{
+	void **ptrs = heap->ptrs;
+	void *res;
+
+	if (heap->size == 0)
+		return NULL;		/* The heap is empty */
+
+	res = ptrs[0];
+	heap->size--;
+	ptrs[0] = ptrs[heap->size];	/* Put a leaf on top */
+	heap_adjust(heap);
 
-	while (1) {
-		int left = 2 * pos + 1;
-		int right = 2 * pos + 2;
-		int largest = pos;
-		if (left < heap->size && heap->gt(ptrs[left], p))
-			largest = left;
-		if (right < heap->size && heap->gt(ptrs[right], ptrs[largest]))
-			largest = right;
-		if (largest == pos)
-			break;
-		/* Push p down the heap one level and bump one up */
-		ptrs[pos] = ptrs[largest];
-		ptrs[largest] = p;
-		pos = largest;
-	}
 	return res;
 }
_


* [RFC 3/5] Replacement policy on heap overfull
  2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
  2008-06-27 15:18 ` [RFC 1/5] Memory controller soft limit documentation Balbir Singh
  2008-06-27 15:18 ` [RFC 2/5] Add delete max to prio heap Balbir Singh
@ 2008-06-27 15:18 ` Balbir Singh
  2008-06-27 15:37   ` Paul Menage
  2008-06-27 15:18 ` [RFC 4/5] Memory controller soft limit resource counter additions Balbir Singh
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2008-06-27 15:18 UTC
  To: Andrew Morton
  Cc: YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm,
	Balbir Singh, KAMEZAWA Hiroyuki


This patch adds a policy parameter to heap_insert(). When an element is
inserted into a full heap, the policy determines which element is dropped.
The previous default behaviour is obtained by passing HEAP_REP_TOP; the new
HEAP_REP_LEAF policy replaces a leaf node (the last element of the array).

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/prio_heap.h |    9 ++++++++-
 kernel/cgroup.c           |    2 +-
 lib/prio_heap.c           |   31 +++++++++++++++++++++++--------
 3 files changed, 32 insertions(+), 10 deletions(-)

diff -puN include/linux/prio_heap.h~prio_heap_replace_leaf include/linux/prio_heap.h
--- linux-2.6.26-rc5/include/linux/prio_heap.h~prio_heap_replace_leaf	2008-06-27 20:43:09.000000000 +0530
+++ linux-2.6.26-rc5-balbir/include/linux/prio_heap.h	2008-06-27 20:43:09.000000000 +0530
@@ -22,6 +22,11 @@ struct ptr_heap {
 	int (*gt)(void *, void *);
 };
 
+enum heap_replacement_policy {
+	HEAP_REP_LEAF,
+	HEAP_REP_TOP,
+};
+
 /**
  * heap_init - initialize an empty heap with a given memory size
  * @heap: the heap structure to be initialized
@@ -42,6 +47,8 @@ void heap_free(struct ptr_heap *heap);
  * heap_insert - insert a value into the heap and return any overflowed value
  * @heap: the heap to be operated on
  * @p: the pointer to be inserted
+ * @policy: which element to drop when the heap is full: the top of the
+ * heap (HEAP_REP_TOP) or the leaf at the end of the array (HEAP_REP_LEAF)
  *
  * Attempts to insert the given value into the priority heap. If the
  * heap is full prior to the insertion, then the resulting heap will
@@ -51,7 +58,7 @@ void heap_free(struct ptr_heap *heap);
  * (i.e. no change to the heap) if the new element is greater than all
  * elements currently in the heap.
  */
-extern void *heap_insert(struct ptr_heap *heap, void *p);
+extern void *heap_insert(struct ptr_heap *heap, void *p, int policy);
 
 /**
  * heap_delete_max - delete the maximum element from the top of the heap
diff -puN lib/prio_heap.c~prio_heap_replace_leaf lib/prio_heap.c
--- linux-2.6.26-rc5/lib/prio_heap.c~prio_heap_replace_leaf	2008-06-27 20:43:09.000000000 +0530
+++ linux-2.6.26-rc5-balbir/lib/prio_heap.c	2008-06-27 20:43:09.000000000 +0530
@@ -46,7 +46,17 @@ static void heap_adjust(struct ptr_heap 
 	}
 }
 
-void *heap_insert(struct ptr_heap *heap, void *p)
+static void heap_insert_at(struct ptr_heap *heap, void *p, int pos)
+{
+	void **ptrs = heap->ptrs;
+	while (pos > 0 && heap->gt(p, ptrs[(pos-1)/2])) {
+		ptrs[pos] = ptrs[(pos-1)/2];
+		pos = (pos-1)/2;
+	}
+	ptrs[pos] = p;
+}
+
+void *heap_insert(struct ptr_heap *heap, void *p, int policy)
 {
 	void *res;
 	void **ptrs = heap->ptrs;
@@ -54,19 +64,24 @@ void *heap_insert(struct ptr_heap *heap,
 	if (heap->size < heap->max) {
 		/* Heap insertion */
 		int pos = heap->size++;
-		while (pos > 0 && heap->gt(p, ptrs[(pos-1)/2])) {
-			ptrs[pos] = ptrs[(pos-1)/2];
-			pos = (pos-1)/2;
-		}
-		ptrs[pos] = p;
+		heap_insert_at(heap, p, pos);
 		return NULL;
 	}
 
 	/* The heap is full, so something will have to be dropped */
 
 	/* If the new pointer is greater than the current max, drop it */
-	if (heap->gt(p, ptrs[0]))
-		return p;
+	if (policy == HEAP_REP_TOP)
+		if (heap->gt(p, ptrs[0]))
+			return p;
+
+	if (policy == HEAP_REP_LEAF) {
+		/* Heap insertion */
+		int pos = heap->size - 1;
+		res = ptrs[pos];
+		heap_insert_at(heap, p, pos);
+		return res;
+	}
 
 	/* Replace the current max and heapify */
 	res = ptrs[0];
diff -puN kernel/cgroup.c~prio_heap_replace_leaf kernel/cgroup.c
--- linux-2.6.26-rc5/kernel/cgroup.c~prio_heap_replace_leaf	2008-06-27 20:43:09.000000000 +0530
+++ linux-2.6.26-rc5-balbir/kernel/cgroup.c	2008-06-27 20:43:09.000000000 +0530
@@ -1976,7 +1976,7 @@ int cgroup_scan_tasks(struct cgroup_scan
 		 */
 		if (!started_after_time(p, &latest_time, latest_task))
 			continue;
-		dropped = heap_insert(heap, p);
+		dropped = heap_insert(heap, p, HEAP_REP_TOP);
 		if (dropped == NULL) {
 			/*
 			 * The new task was inserted; the heap wasn't
_


* Re: [RFC 3/5] Replacement policy on heap overfull
  2008-06-27 15:18 ` [RFC 3/5] Replacement policy on heap overfull Balbir Singh
@ 2008-06-27 15:37   ` Paul Menage
  2008-06-30  3:46     ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Menage @ 2008-06-27 15:37 UTC
  To: Balbir Singh
  Cc: Andrew Morton, YAMAMOTO Takashi, linux-kernel, linux-mm,
	KAMEZAWA Hiroyuki

On Fri, Jun 27, 2008 at 8:18 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>
> This patch adds a policy parameter to heap_insert. While inserting an element
> if the heap is full, the policy determines which element to replace.
> The default earlier is now obtained by passing the policy as HEAP_REP_TOP.
> The new HEAP_REP_LEAF policy, replaces a leaf node (the last element).
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
>  include/linux/prio_heap.h |    9 ++++++++-
>  kernel/cgroup.c           |    2 +-
>  lib/prio_heap.c           |   31 +++++++++++++++++++++++--------
>  3 files changed, 32 insertions(+), 10 deletions(-)
>
> diff -puN include/linux/prio_heap.h~prio_heap_replace_leaf include/linux/prio_heap.h
> --- linux-2.6.26-rc5/include/linux/prio_heap.h~prio_heap_replace_leaf   2008-06-27 20:43:09.000000000 +0530
> +++ linux-2.6.26-rc5-balbir/include/linux/prio_heap.h   2008-06-27 20:43:09.000000000 +0530
> @@ -22,6 +22,11 @@ struct ptr_heap {
>        int (*gt)(void *, void *);
>  };
>
> +enum heap_replacement_policy {
> +       HEAP_REP_LEAF,
> +       HEAP_REP_TOP,
> +};

Maybe "drop" rather than "replace"? HEAP_REP_TOP doesn't replace the
top element if you insert a new higher element, it drops the top.

How about HEAP_DROP_LEAF and HEAP_DROP_MAX? You could also provide a
HEAP_DROP_MIN with the caveat that it would take linear time.

Add comments here about what these mean?

> +       if (policy == HEAP_REP_TOP)

switch() here?

> +               if (heap->gt(p, ptrs[0]))
> +                       return p;
> +
> +       if (policy == HEAP_REP_LEAF) {
> +               /* Heap insertion */
> +               int pos = heap->size - 1;
> +               res = ptrs[pos];
> +               heap_insert_at(heap, p, pos);
> +               return res;
> +       }
>
>        /* Replace the current max and heapify */
>        res = ptrs[0];

This should probably be in the arm dealing with
HEAP_REP_TOP/HEAP_DROP_MAX since we only get here in that case.

Paul


* Re: [RFC 3/5] Replacement policy on heap overfull
  2008-06-27 15:37   ` Paul Menage
@ 2008-06-30  3:46     ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  3:46 UTC
  To: Paul Menage
  Cc: Andrew Morton, YAMAMOTO Takashi, linux-kernel, linux-mm,
	KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Fri, Jun 27, 2008 at 8:18 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>
>> This patch adds a policy parameter to heap_insert. While inserting an element
>> if the heap is full, the policy determines which element to replace.
>> The default earlier is now obtained by passing the policy as HEAP_REP_TOP.
>> The new HEAP_REP_LEAF policy, replaces a leaf node (the last element).
>>
>> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>> ---
>>
>>  include/linux/prio_heap.h |    9 ++++++++-
>>  kernel/cgroup.c           |    2 +-
>>  lib/prio_heap.c           |   31 +++++++++++++++++++++++--------
>>  3 files changed, 32 insertions(+), 10 deletions(-)
>>
>> diff -puN include/linux/prio_heap.h~prio_heap_replace_leaf include/linux/prio_heap.h
>> --- linux-2.6.26-rc5/include/linux/prio_heap.h~prio_heap_replace_leaf   2008-06-27 20:43:09.000000000 +0530
>> +++ linux-2.6.26-rc5-balbir/include/linux/prio_heap.h   2008-06-27 20:43:09.000000000 +0530
>> @@ -22,6 +22,11 @@ struct ptr_heap {
>>        int (*gt)(void *, void *);
>>  };
>>
>> +enum heap_replacement_policy {
>> +       HEAP_REP_LEAF,
>> +       HEAP_REP_TOP,
>> +};
> 
> Maybe "drop" rather than "replace"? HEAP_REP_TOP doesn't replace the
> top element if you insert a new higher element, it drops the top.
> 
> How about HEAP_DROP_LEAF and HEAP_DROP_MAX? You could also provide a
> HEAP_DROP_MIN with the caveat that it would take linear time.
> 
> Add comments here about what these mean?
> 

Sure, will do

>> +       if (policy == HEAP_REP_TOP)
> 
> switch() here?
> 

Can switch over

>> +               if (heap->gt(p, ptrs[0]))
>> +                       return p;
>> +
>> +       if (policy == HEAP_REP_LEAF) {
>> +               /* Heap insertion */
>> +               int pos = heap->size - 1;
>> +               res = ptrs[pos];
>> +               heap_insert_at(heap, p, pos);
>> +               return res;
>> +       }
>>
>>        /* Replace the current max and heapify */
>>        res = ptrs[0];
> 
> This should probably be in the arm dealing with
> HEAP_REP_TOP/HEAP_DROP_MAX since we only get here in that case.

I can do that; I'll need to rearrange the code and merge the condition above
with the ->gt check into the HEAP_DROP_MAX case.
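Below is one possible shape for the rearranged heap_insert() that follows the review comments. It is sketched on a plain int max-heap so that it stands alone. The HEAP_DROP_LEAF, HEAP_DROP_MAX and HEAP_DROP_MIN names come from the review and are not confirmed kernel API. struct int_heap and int_heap_insert() are hypothetical. HEAP_DROP_MIN relies on the fact that the minimum of a max-heap is always a leaf, so it costs a linear scan over the second half of the array.

```c
#include <assert.h>

/* Policy names follow the review's suggestion, not a merged kernel API. */
enum heap_drop_policy {
	HEAP_DROP_LEAF,	/* patch's HEAP_REP_LEAF: drop the last array slot */
	HEAP_DROP_MAX,	/* patch's HEAP_REP_TOP: drop the largest value */
	HEAP_DROP_MIN,	/* drop the smallest value; O(n) scan of the leaves */
};

/* Hypothetical int-valued max-heap standing in for struct ptr_heap. */
struct int_heap {
	int *v;
	int size, max;
};

/* Place val at slot pos and sift it up (heap_insert_at() in the patch). */
static void sift_up(struct int_heap *h, int val, int pos)
{
	while (pos > 0 && val > h->v[(pos - 1) / 2]) {
		h->v[pos] = h->v[(pos - 1) / 2];
		pos = (pos - 1) / 2;
	}
	h->v[pos] = val;
}

/* Sift v[0] down until neither child is larger (heap_adjust()). */
static void sift_down(struct int_heap *h)
{
	int pos = 0, val = h->v[0];

	for (;;) {
		int l = 2 * pos + 1, r = l + 1, big = pos;

		if (l < h->size && h->v[l] > val)
			big = l;
		if (r < h->size && h->v[r] > h->v[big])
			big = r;
		if (big == pos)
			break;
		h->v[pos] = h->v[big];
		h->v[big] = val;
		pos = big;
	}
}

/* Insert val; returns 1 and sets *dropped if an element had to go. */
static int int_heap_insert(struct int_heap *h, int val,
			   enum heap_drop_policy policy, int *dropped)
{
	int i, min;

	assert(h->max > 0);
	if (h->size < h->max) {
		sift_up(h, val, h->size++);
		return 0;
	}

	switch (policy) {
	case HEAP_DROP_LEAF:
		/* The last slot never has children, so overwrite it. */
		*dropped = h->v[h->size - 1];
		sift_up(h, val, h->size - 1);
		break;
	case HEAP_DROP_MAX:
		if (val > h->v[0]) {	/* the new value is the max: drop it */
			*dropped = val;
			break;
		}
		*dropped = h->v[0];
		h->v[0] = val;
		sift_down(h);
		break;
	case HEAP_DROP_MIN:
		/* In a max-heap the minimum is one of the leaves. */
		min = h->size / 2;
		for (i = min + 1; i < h->size; i++)
			if (h->v[i] < h->v[min])
				min = i;
		if (val <= h->v[min]) {	/* the new value is the min: drop it */
			*dropped = val;
			break;
		}
		*dropped = h->v[min];
		sift_up(h, val, min);	/* a leaf has no children to check */
		break;
	default:
		assert(0 && "unknown heap_drop_policy");
	}
	return 1;
}
```

With a switch, each policy owns its whole full-heap path. The "replace the top and heapify" code then lives only in the HEAP_DROP_MAX arm, as suggested in the review.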


* [RFC 4/5] Memory controller soft limit resource counter additions
  2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
                   ` (2 preceding siblings ...)
  2008-06-27 15:18 ` [RFC 3/5] Replacement policy on heap overfull Balbir Singh
@ 2008-06-27 15:18 ` Balbir Singh
  2008-06-27 15:19 ` [RFC 5/5] Memory controller soft limit reclaim on contention Balbir Singh
  2008-06-28  4:36 ` [RFC 0/5] Memory controller soft limit introduction (v3) KAMEZAWA Hiroyuki
  5 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-27 15:18 UTC
  To: Andrew Morton
  Cc: YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm,
	Balbir Singh, KAMEZAWA Hiroyuki


Add soft_limit as a parameter to the resource counter infrastructure.
Helper routines are also added to detect soft limit overflow.
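Here is a minimal user-space sketch of the two helpers. It assumes a res_counter reduced to usage, limit and soft_limit; the kernel structure also carries max_usage, failcnt and a spinlock, which are omitted here. under_limit() and soft_limit_excess() are hypothetical names. Each case returns directly, so a RES_LIMIT query cannot fall through into the soft limit test. The excess is computed only when usage is above the soft limit, because the fields are unsigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the member order in the patch's enum; illustrative only. */
enum { RES_USAGE, RES_MAX_USAGE, RES_LIMIT, RES_SOFT_LIMIT, RES_FAILCNT };

/* Hypothetical reduced res_counter (no lock, max_usage or failcnt). */
struct res_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long soft_limit;
};

/* Is usage strictly below the limit selected by 'member'? */
static bool under_limit(const struct res_counter *cnt, int member)
{
	switch (member) {
	case RES_LIMIT:
		return cnt->usage < cnt->limit;
	case RES_SOFT_LIMIT:
		return cnt->usage < cnt->soft_limit;
	default:
		assert(0 && "not a limit member");
		return false;
	}
}

/* Excess over the soft limit, clamped at 0 (the subtraction is unsigned). */
static unsigned long long soft_limit_excess(const struct res_counter *cnt)
{
	return cnt->usage > cnt->soft_limit ? cnt->usage - cnt->soft_limit : 0;
}
```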

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/res_counter.h |   35 +++++++++++++++++++++++++++++++----
 kernel/res_counter.c        |    2 ++
 mm/memcontrol.c             |    6 ++++++
 3 files changed, 39 insertions(+), 4 deletions(-)

diff -puN include/linux/res_counter.h~memory-controller-soft-limit-res-counter-updates include/linux/res_counter.h
--- linux-2.6.26-rc5/include/linux/res_counter.h~memory-controller-soft-limit-res-counter-updates	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/include/linux/res_counter.h	2008-06-27 20:43:10.000000000 +0530
@@ -35,6 +35,12 @@ struct res_counter {
 	 */
 	unsigned long long limit;
 	/*
+	 * a limit that usage may exceed. When resource contention is
+	 * detected, the controller will try to pull back resources from
+	 * counters that have exceeded their soft limit.
+	 */
+	unsigned long long soft_limit;
+	/*
 	 * the number of unsuccessful attempts to consume the resource
 	 */
 	unsigned long long failcnt;
@@ -75,6 +81,7 @@ enum {
 	RES_USAGE,
 	RES_MAX_USAGE,
 	RES_LIMIT,
+	RES_SOFT_LIMIT,
 	RES_FAILCNT,
 };
 
@@ -113,11 +120,17 @@ int __must_check res_counter_charge(stru
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
-static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
+static inline bool res_counter_limit_check_locked(struct res_counter *cnt,
+							int member)
 {
-	if (cnt->usage < cnt->limit)
-		return true;
-
+	switch (member) {
+	case RES_LIMIT:
+		return cnt->usage < cnt->limit;
+	case RES_SOFT_LIMIT:
+		return cnt->usage < cnt->soft_limit;
+	default:
+		break;
+	}
 	return false;
 }
 
@@ -131,7 +144,21 @@ static inline bool res_counter_check_und
 	unsigned long flags;
 
 	spin_lock_irqsave(&cnt->lock, flags);
-	ret = res_counter_limit_check_locked(cnt);
+	ret = res_counter_limit_check_locked(cnt, RES_LIMIT);
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
+/*
+ * Helper function to detect if the cgroup is within its soft limit
+ */
+static inline bool res_counter_check_under_soft_limit(struct res_counter *cnt)
+{
+	bool ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	ret = res_counter_limit_check_locked(cnt, RES_SOFT_LIMIT);
 	spin_unlock_irqrestore(&cnt->lock, flags);
 	return ret;
 }
diff -puN kernel/res_counter.c~memory-controller-soft-limit-res-counter-updates kernel/res_counter.c
--- linux-2.6.26-rc5/kernel/res_counter.c~memory-controller-soft-limit-res-counter-updates	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/kernel/res_counter.c	2008-06-27 20:43:10.000000000 +0530
@@ -72,6 +72,8 @@ res_counter_member(struct res_counter *c
 		return &counter->max_usage;
 	case RES_LIMIT:
 		return &counter->limit;
+	case RES_SOFT_LIMIT:
+		return &counter->soft_limit;
 	case RES_FAILCNT:
 		return &counter->failcnt;
 	};
diff -puN mm/memcontrol.c~memory-controller-soft-limit-res-counter-updates mm/memcontrol.c
--- linux-2.6.26-rc5/mm/memcontrol.c~memory-controller-soft-limit-res-counter-updates	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/mm/memcontrol.c	2008-06-27 20:43:10.000000000 +0530
@@ -972,6 +972,12 @@ static struct cftype mem_cgroup_files[] 
 		.read_u64 = mem_cgroup_read,
 	},
 	{
+		.name = "soft_limit_in_bytes",
+		.private = RES_SOFT_LIMIT,
+		.write = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read,
+	},
+	{
 		.name = "failcnt",
 		.private = RES_FAILCNT,
 		.trigger = mem_cgroup_reset,
diff -puN include/linux/memcontrol.h~memory-controller-soft-limit-res-counter-updates include/linux/memcontrol.h
_


* [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
                   ` (3 preceding siblings ...)
  2008-06-27 15:18 ` [RFC 4/5] Memory controller soft limit resource counter additions Balbir Singh
@ 2008-06-27 15:19 ` Balbir Singh
  2008-06-27 16:09   ` Paul Menage
                     ` (2 more replies)
  2008-06-28  4:36 ` [RFC 0/5] Memory controller soft limit introduction (v3) KAMEZAWA Hiroyuki
  5 siblings, 3 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-27 15:19 UTC
  To: Andrew Morton
  Cc: YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm,
	Balbir Singh, KAMEZAWA Hiroyuki


Set up the soft limit control data structures in the memory controller,
using the prio_heap data structure. Memory contention is detected in
__alloc_pages_internal(). Before calling try_to_free_pages(), the code
now tries to free memory from memory groups that are above their soft limit.
This happens in the mem_cgroup_reclaim_on_contention() routine, which
pulls cgroups out of a max heap (ordered by the amount by which they exceed
their soft limit) and reclaims from them.
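The reclaim order described above can be sketched in user space as follows. The sketch assumes groups are reduced to a usage and a soft limit counted in pages, and that reclaiming simply lowers usage. struct group, struct group_heap and reclaim_over_soft_limit() are hypothetical. The sketch takes from the group furthest over its soft limit first and never pushes a group below its soft limit. That is a simplification: the patch instead calls try_to_free_mem_cgroup_pages() once per group it pulls off the heap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; sizes are in pages. */
struct group {
	unsigned long usage;
	unsigned long soft_limit;
};

/* How far a group is over its soft limit (0 if it is not). */
static unsigned long excess(const struct group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/* Max-heap of groups keyed by excess(); keys must not change while queued. */
struct group_heap {
	struct group **g;
	int size, max;
};

static void swap_groups(struct group **a, struct group **b)
{
	struct group *t = *a;

	*a = *b;
	*b = t;
}

static void group_heap_push(struct group_heap *h, struct group *grp)
{
	int pos;

	assert(h->size < h->max);
	pos = h->size++;
	h->g[pos] = grp;
	while (pos > 0 && excess(h->g[pos]) > excess(h->g[(pos - 1) / 2])) {
		swap_groups(&h->g[pos], &h->g[(pos - 1) / 2]);
		pos = (pos - 1) / 2;
	}
}

static struct group *group_heap_pop_max(struct group_heap *h)
{
	struct group *top;
	int pos = 0;

	if (h->size == 0)
		return NULL;
	top = h->g[0];
	h->g[0] = h->g[--h->size];	/* move the last leaf to the top */
	for (;;) {
		int l = 2 * pos + 1, r = l + 1, big = pos;

		if (l < h->size && excess(h->g[l]) > excess(h->g[big]))
			big = l;
		if (r < h->size && excess(h->g[r]) > excess(h->g[big]))
			big = r;
		if (big == pos)
			break;
		swap_groups(&h->g[pos], &h->g[big]);
		pos = big;
	}
	return top;
}

/*
 * Reclaim up to 'want' pages, largest excess first. Each group popped from
 * the heap gives back at most its excess and is not re-queued.
 */
static unsigned long reclaim_over_soft_limit(struct group_heap *h,
					     unsigned long want)
{
	unsigned long got = 0;
	struct group *grp;

	while (got < want && (grp = group_heap_pop_max(h)) != NULL) {
		unsigned long take = excess(grp);

		if (take > want - got)
			take = want - got;
		grp->usage -= take;
		got += take;
	}
	return got;
}
```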

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/memcontrol.h  |    7 +++
 include/linux/res_counter.h |   16 ++++++
 mm/memcontrol.c             |  101 ++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             |    9 +++
 4 files changed, 132 insertions(+), 1 deletion(-)

diff -puN mm/memcontrol.c~memory-controller-soft-limit-reclaim-on-contention mm/memcontrol.c
--- linux-2.6.26-rc5/mm/memcontrol.c~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/mm/memcontrol.c	2008-06-27 20:43:10.000000000 +0530
@@ -25,6 +25,7 @@
 #include <linux/page-flags.h>
 #include <linux/backing-dev.h>
 #include <linux/bit_spinlock.h>
+#include <linux/prio_heap.h>
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
@@ -39,6 +40,18 @@
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 static struct kmem_cache *page_cgroup_cache __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+#define MEM_CGROUP_HEAP_SHIFT		7
+#define MEM_CGROUP_HEAP_SIZE		(1 << MEM_CGROUP_HEAP_SHIFT)
+
+/*
+ * Create a heap of memory controller structures. The heap is a max heap
+ * ordered by how far each group exceeds its soft limit, and is used to
+ * implement soft limits. Our current heap implementation does not allow
+ * dynamic heap updates, but eventually the costliest controller (the one
+ * furthest over its soft limit) should be on top of the heap.
+ */
+struct ptr_heap mem_cgroup_heap;
+spinlock_t mem_cgroup_heap_lock;	/* One more lock for a global heap */
 
 /*
  * Statistics for memory cgroup.
@@ -129,6 +142,7 @@ struct mem_cgroup {
 	struct mem_cgroup_lru_info info;
 
 	int	prev_priority;	/* for recording reclaim priority */
+	int	on_heap;	/* Are we on the soft limit heap? */
 	/*
 	 * statistics.
 	 */
@@ -590,6 +604,20 @@ static int mem_cgroup_charge_common(stru
 	}
 	page_assign_page_cgroup(page, pc);
 
+	if (!res_counter_check_under_soft_limit(&mem->res)) {
+		spin_lock_irqsave(&mem_cgroup_heap_lock, flags);
+		if (!mem->on_heap) {
+			struct mem_cgroup *old_mem;
+
+			old_mem = heap_insert(&mem_cgroup_heap, mem,
+						HEAP_REP_LEAF);
+			mem->on_heap = 1;
+			if (old_mem)
+				old_mem->on_heap = 0;
+		}
+		spin_unlock_irqrestore(&mem_cgroup_heap_lock, flags);
+	}
+
 	mz = page_cgroup_zoneinfo(pc);
 	spin_lock_irqsave(&mz->lru_lock, flags);
 	__mem_cgroup_add_list(mz, pc);
@@ -715,6 +743,65 @@ void mem_cgroup_uncharge_cache_page(stru
 }
 
 /*
+ * Under memory contention, walk the heap and reclaim from the groups
+ * that are over their soft limit, largest excess first
+ */
+unsigned long mem_cgroup_reclaim_on_contention(gfp_t gfp_mask)
+{
+	unsigned long nr_reclaimed = 0;
+	struct mem_cgroup *mem;
+	unsigned long flags;
+	int i, count;
+
+
+	for (i = MEM_CGROUP_HEAP_SHIFT; i >= 0; i--) {
+		count = MEM_CGROUP_HEAP_SIZE >> i;
+		mem = mem_cgroup_from_task(current);
+
+		spin_lock_irqsave(&mem_cgroup_heap_lock, flags);
+		if (!res_counter_check_under_soft_limit(&mem->res)) {
+			/*
+			 * The current task might already be over its soft
+			 * limit and trying to grow aggressively. Check whether
+			 * the memory group associated with the current task
+			 * is on the heap whenever that group is over its soft
+			 * limit, and add it if it is not.
+			 */
+			if (!mem->on_heap) {
+				struct mem_cgroup *old_mem;
+
+				old_mem = heap_insert(&mem_cgroup_heap, mem,
+							HEAP_REP_LEAF);
+				mem->on_heap = 1;
+				if (old_mem)
+					old_mem->on_heap = 0;
+			}
+		}
+
+		while (count-- &&
+			((mem = heap_delete_max(&mem_cgroup_heap)) != NULL)) {
+			BUG_ON(!mem->on_heap);
+			spin_unlock_irqrestore(&mem_cgroup_heap_lock, flags);
+			nr_reclaimed += try_to_free_mem_cgroup_pages(mem,
+								gfp_mask);
+			cond_resched();
+			spin_lock_irqsave(&mem_cgroup_heap_lock, flags);
+			mem->on_heap = 0;
+			/*
+			 * What should be the basis of breaking out?
+			 */
+			if (nr_reclaimed)
+				goto done;
+		}
+done:
+		spin_unlock_irqrestore(&mem_cgroup_heap_lock, flags);
+		if (!mem)
+			break;
+	}
+	return nr_reclaimed;
+}
+
+/*
  * Before starting migration, account against new page.
  */
 int mem_cgroup_prepare_migration(struct page *page, struct page *newpage)
@@ -1052,6 +1139,17 @@ static void mem_cgroup_free(struct mem_c
 		vfree(mem);
 }
 
+static int mem_cgroup_compare_soft_limits(void *p1, void *p2)
+{
+	struct mem_cgroup *mem1 = (struct mem_cgroup *)p1;
+	struct mem_cgroup *mem2 = (struct mem_cgroup *)p2;
+	unsigned long long delta1, delta2;
+
+	delta1 = res_counter_soft_limit_delta(&mem1->res);
+	delta2 = res_counter_soft_limit_delta(&mem2->res);
+
+	return delta1 > delta2;
+}
 
 static struct cgroup_subsys_state *
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
@@ -1062,6 +1160,9 @@ mem_cgroup_create(struct cgroup_subsys *
 	if (unlikely((cont->parent) == NULL)) {
 		mem = &init_mem_cgroup;
 		page_cgroup_cache = KMEM_CACHE(page_cgroup, SLAB_PANIC);
+		heap_init(&mem_cgroup_heap, MEM_CGROUP_HEAP_SIZE, GFP_KERNEL,
+				mem_cgroup_compare_soft_limits);
+		spin_lock_init(&mem_cgroup_heap_lock);
 	} else {
 		mem = mem_cgroup_alloc();
 		if (!mem)
diff -puN include/linux/memcontrol.h~memory-controller-soft-limit-reclaim-on-contention include/linux/memcontrol.h
--- linux-2.6.26-rc5/include/linux/memcontrol.h~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/include/linux/memcontrol.h	2008-06-27 20:43:10.000000000 +0530
@@ -38,6 +38,7 @@ extern void mem_cgroup_move_lists(struct
 extern void mem_cgroup_uncharge_page(struct page *page);
 extern void mem_cgroup_uncharge_cache_page(struct page *page);
 extern int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask);
+extern unsigned long mem_cgroup_reclaim_on_contention(gfp_t gfp_mask);
 
 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 					struct list_head *dst,
@@ -163,6 +164,12 @@ static inline long mem_cgroup_calc_recla
 {
 	return 0;
 }
+
+static inline unsigned long mem_cgroup_reclaim_on_contention(gfp_t gfp_mask)
+{
+	return 0;
+}
+
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN mm/vmscan.c~memory-controller-soft-limit-reclaim-on-contention mm/vmscan.c
diff -puN mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention mm/page_alloc.c
--- linux-2.6.26-rc5/mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/mm/page_alloc.c	2008-06-27 20:43:10.000000000 +0530
@@ -1669,7 +1669,14 @@ nofail_alloc:
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
+	/*
+	 * First try to reclaim from memory control groups that have
+	 * exceeded their soft limit
+	 */
+	did_some_progress = mem_cgroup_reclaim_on_contention(gfp_mask);
+	if (!did_some_progress)
+		did_some_progress = try_to_free_pages(zonelist, order,
+							gfp_mask);
 
 	p->reclaim_state = NULL;
 	p->flags &= ~PF_MEMALLOC;
diff -puN kernel/res_counter.c~memory-controller-soft-limit-reclaim-on-contention kernel/res_counter.c
diff -puN include/linux/res_counter.h~memory-controller-soft-limit-reclaim-on-contention include/linux/res_counter.h
--- linux-2.6.26-rc5/include/linux/res_counter.h~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
+++ linux-2.6.26-rc5-balbir/include/linux/res_counter.h	2008-06-27 20:43:10.000000000 +0530
@@ -163,6 +163,22 @@ static inline bool res_counter_check_und
 	return ret;
 }
 
+/*
+ * Return the delta between soft_limit and usage
+ */
+static inline
+unsigned long long res_counter_soft_limit_delta(struct res_counter *cnt)
+{
+	unsigned long long ret, delta;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	delta = cnt->usage - cnt->soft_limit;
+	ret = delta > 0 ? delta : 0;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-27 15:19 ` [RFC 5/5] Memory controller soft limit reclaim on contention Balbir Singh
@ 2008-06-27 16:09   ` Paul Menage
  2008-06-29  4:48     ` Balbir Singh
  2008-06-30  3:42     ` Balbir Singh
  2008-06-28  4:22   ` KAMEZAWA Hiroyuki
  2008-06-30  7:33   ` KOSAKI Motohiro
  2 siblings, 2 replies; 27+ messages in thread
From: Paul Menage @ 2008-06-27 16:09 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, YAMAMOTO Takashi, linux-kernel, linux-mm,
	KAMEZAWA Hiroyuki

On Fri, Jun 27, 2008 at 8:19 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> +/*
> + * Create a heap of memory controller structures. The heap is reverse
> + * sorted by size. This heap is used for implementing soft limits. Our
> + * current heap implementation does not allow dynamic heap updates, but
> + * eventually, the costliest controller (over it's soft limit should

it's -> its

> +                       old_mem = heap_insert(&mem_cgroup_heap, mem,
> +                                               HEAP_REP_LEAF);
> +                       mem->on_heap = 1;
> +                       if (old_mem)
> +                               old_mem->on_heap = 0;

Maybe a comment here that mem might == old_mem?

> + * When the soft limit is exceeded, look through the heap and start
> + * reclaiming from all groups over thier soft limit

thier -> their

> +               if (!res_counter_check_under_soft_limit(&mem->res)) {
> +                       /*
> +                        * The current task might already be over it's soft
> +                        * limit and trying to aggressively grow. We check to
> +                        * see if it the memory group associated with the
> +                        * current task is on the heap when the current group
> +                        * is over it's soft limit. If not, we add it
> +                        */
> +                       if (!mem->on_heap) {
> +                               struct mem_cgroup *old_mem;
> +
> +                               old_mem = heap_insert(&mem_cgroup_heap, mem,
> +                                                       HEAP_REP_LEAF);
> +                               mem->on_heap = 1;
> +                               if (old_mem)
> +                                       old_mem->on_heap = 0;
> +                       }
> +               }

This and the other similar code for adding to the heap should be
refactored into a separate function.

>
> +static int mem_cgroup_compare_soft_limits(void *p1, void *p2)
> +{
> +       struct mem_cgroup *mem1 = (struct mem_cgroup *)p1;
> +       struct mem_cgroup *mem2 = (struct mem_cgroup *)p2;
> +       unsigned long long delta1, delta2;
> +
> +       delta1 = res_counter_soft_limit_delta(&mem1->res);
> +       delta2 = res_counter_soft_limit_delta(&mem2->res);
> +
> +       return delta1 > delta2;
> +}

This isn't a valid comparator, since it isn't a constant function of
its two input pointers - calling mem_cgroup_compare_soft_limits(m1,
m2) can give different results at different times. So your heap
invariant will become invalid over time.

I think if you want to do this, you're going to need to periodically
take a snapshot of each cgroup's excess and use that snapshot in the
comparator; whenever you update the snapshots, you'll need to restore
the heap invariant.

Paul


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-27 16:09   ` Paul Menage
@ 2008-06-29  4:48     ` Balbir Singh
  2008-06-30  3:42     ` Balbir Singh
  1 sibling, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-29  4:48 UTC (permalink / raw)
  To: Paul Menage
  Cc: Andrew Morton, YAMAMOTO Takashi, linux-kernel, linux-mm,
	KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Fri, Jun 27, 2008 at 8:19 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> +/*
>> + * Create a heap of memory controller structures. The heap is reverse
>> + * sorted by size. This heap is used for implementing soft limits. Our
>> + * current heap implementation does not allow dynamic heap updates, but
>> + * eventually, the costliest controller (over it's soft limit should
> 
> it's -> its
> 
>> +                       old_mem = heap_insert(&mem_cgroup_heap, mem,
>> +                                               HEAP_REP_LEAF);
>> +                       mem->on_heap = 1;
>> +                       if (old_mem)
>> +                               old_mem->on_heap = 0;
> 
> Maybe a comment here that mem might == old_mem?
> 
>> + * When the soft limit is exceeded, look through the heap and start
>> + * reclaiming from all groups over thier soft limit
> 
> thier -> their
> 
>> +               if (!res_counter_check_under_soft_limit(&mem->res)) {
>> +                       /*
>> +                        * The current task might already be over it's soft
>> +                        * limit and trying to aggressively grow. We check to
>> +                        * see if it the memory group associated with the
>> +                        * current task is on the heap when the current group
>> +                        * is over it's soft limit. If not, we add it
>> +                        */
>> +                       if (!mem->on_heap) {
>> +                               struct mem_cgroup *old_mem;
>> +
>> +                               old_mem = heap_insert(&mem_cgroup_heap, mem,
>> +                                                       HEAP_REP_LEAF);
>> +                               mem->on_heap = 1;
>> +                               if (old_mem)
>> +                                       old_mem->on_heap = 0;
>> +                       }
>> +               }
> 
> This and the other similar code for adding to the heap should be
> refactored into a separate function.
> 
>> +static int mem_cgroup_compare_soft_limits(void *p1, void *p2)
>> +{
>> +       struct mem_cgroup *mem1 = (struct mem_cgroup *)p1;
>> +       struct mem_cgroup *mem2 = (struct mem_cgroup *)p2;
>> +       unsigned long long delta1, delta2;
>> +
>> +       delta1 = res_counter_soft_limit_delta(&mem1->res);
>> +       delta2 = res_counter_soft_limit_delta(&mem2->res);
>> +
>> +       return delta1 > delta2;
>> +}
> 
> This isn't a valid comparator, since it isn't a constant function of
> its two input pointers - calling mem_cgroup_compare_soft_limits(m1,
> m2) can give different results at different times. So your heap
> invariant will become invalid over time.
> 
> I think if you want to do this, you're going to need to periodically
> take a snapshot of each cgroup's excess and use that snapshot in the
> comparator; whenever you update the snapshots, you'll need to restore
> the heap invariant.

I see your point. Keeping snapshots sounds OK, but updating the heap each time
is expensive, since it's hard to find a node in the heap. If we could find it,
we could call heap_adjust frequently (whenever the delta changes) and keep the
heap correctly formed. I wonder if keeping two snapshots will help: one (called
old_snapshot) for use by the ->gt callback, and a new one that we switch over
to when we reinsert the element after it has been deleted from the heap.

Thinking further, snapshotting might work, provided we take snapshots at the
time of insertion only. When an element is deleted and re-inserted we update the
snapshot. That way the invariant is not broken.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-27 16:09   ` Paul Menage
  2008-06-29  4:48     ` Balbir Singh
@ 2008-06-30  3:42     ` Balbir Singh
  1 sibling, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  3:42 UTC (permalink / raw)
  To: Paul Menage
  Cc: Andrew Morton, YAMAMOTO Takashi, linux-kernel, linux-mm,
	KAMEZAWA Hiroyuki

Paul Menage wrote:
> On Fri, Jun 27, 2008 at 8:19 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> +/*
>> + * Create a heap of memory controller structures. The heap is reverse
>> + * sorted by size. This heap is used for implementing soft limits. Our
>> + * current heap implementation does not allow dynamic heap updates, but
>> + * eventually, the costliest controller (over it's soft limit should
> 
> it's -> its
> 

Yes

>> +                       old_mem = heap_insert(&mem_cgroup_heap, mem,
>> +                                               HEAP_REP_LEAF);
>> +                       mem->on_heap = 1;
>> +                       if (old_mem)
>> +                               old_mem->on_heap = 0;
> 
> Maybe a comment here that mem might == old_mem?
> 
>> + * When the soft limit is exceeded, look through the heap and start
>> + * reclaiming from all groups over thier soft limit
> 
> thier -> their
> 

Will fix

>> +               if (!res_counter_check_under_soft_limit(&mem->res)) {
>> +                       /*
>> +                        * The current task might already be over it's soft
>> +                        * limit and trying to aggressively grow. We check to
>> +                        * see if it the memory group associated with the
>> +                        * current task is on the heap when the current group
>> +                        * is over it's soft limit. If not, we add it
>> +                        */
>> +                       if (!mem->on_heap) {
>> +                               struct mem_cgroup *old_mem;
>> +
>> +                               old_mem = heap_insert(&mem_cgroup_heap, mem,
>> +                                                       HEAP_REP_LEAF);
>> +                               mem->on_heap = 1;
>> +                               if (old_mem)
>> +                                       old_mem->on_heap = 0;
>> +                       }
>> +               }
> 
> This and the other similar code for adding to the heap should be
> refactored into a separate function.
> 

OK, I can look into that.

>> +static int mem_cgroup_compare_soft_limits(void *p1, void *p2)
>> +{
>> +       struct mem_cgroup *mem1 = (struct mem_cgroup *)p1;
>> +       struct mem_cgroup *mem2 = (struct mem_cgroup *)p2;
>> +       unsigned long long delta1, delta2;
>> +
>> +       delta1 = res_counter_soft_limit_delta(&mem1->res);
>> +       delta2 = res_counter_soft_limit_delta(&mem2->res);
>> +
>> +       return delta1 > delta2;
>> +}
> 
> This isn't a valid comparator, since it isn't a constant function of
> its two input pointers - calling mem_cgroup_compare_soft_limits(m1,
> m2) can give different results at different times. So your heap
> invariant will become invalid over time.
> 
> I think if you want to do this, you're going to need to periodically
> take a snapshot of each cgroup's excess and use that snapshot in the
> comparator; whenever you update the snapshots, you'll need to restore
> the heap invariant.
> 

I'll fix it by taking snapshots only before inserting an element into the heap
(I think I responded to this one in another email, but missed out on the typos).

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-27 15:19 ` [RFC 5/5] Memory controller soft limit reclaim on contention Balbir Singh
  2008-06-27 16:09   ` Paul Menage
@ 2008-06-28  4:22   ` KAMEZAWA Hiroyuki
  2008-06-30  7:33   ` KOSAKI Motohiro
  2 siblings, 0 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-28  4:22 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

On Fri, 27 Jun 2008 20:49:06 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> +
> +		while (count-- &&
> +			((mem = heap_delete_max(&mem_cgroup_heap)) != NULL)) {
> +			BUG_ON(!mem->on_heap);
> +			spin_unlock_irqrestore(&mem_cgroup_heap_lock, flags);
> +			nr_reclaimed += try_to_free_mem_cgroup_pages(mem,
> +								gfp_mask);
> +			cond_resched();
> +			spin_lock_irqsave(&mem_cgroup_heap_lock, flags);
> +			mem->on_heap = 0;
It seems *mem* is no longer on the heap after heap_delete_max(), right?
If so, I think on_heap should be cleared right after heap_delete_max().


> +			/*
> +			 * What should be the basis of breaking out?
> +			 */
> +			if (nr_reclaimed)
> +				goto done;

Why stop here?

Thanks,
-Kame


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-27 15:19 ` [RFC 5/5] Memory controller soft limit reclaim on contention Balbir Singh
  2008-06-27 16:09   ` Paul Menage
  2008-06-28  4:22   ` KAMEZAWA Hiroyuki
@ 2008-06-30  7:33   ` KOSAKI Motohiro
  2008-06-30  7:48     ` Balbir Singh
  2 siblings, 1 reply; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-06-30  7:33 UTC (permalink / raw)
  To: Balbir Singh
  Cc: kosaki.motohiro, Andrew Morton, YAMAMOTO Takashi, Paul Menage,
	linux-kernel, linux-mm, KAMEZAWA Hiroyuki

Hi

Does this code survive stress testing?


> +		while (count-- &&
> +			((mem = heap_delete_max(&mem_cgroup_heap)) != NULL)) {
> +			BUG_ON(!mem->on_heap);
> +			spin_unlock_irqrestore(&mem_cgroup_heap_lock, flags);
> +			nr_reclaimed += try_to_free_mem_cgroup_pages(mem,
> +								gfp_mask);
> +			cond_resched();
> +			spin_lock_irqsave(&mem_cgroup_heap_lock, flags);
> +			mem->on_heap = 0;
> +			/*
> +			 * What should be the basis of breaking out?
> +			 */
> +			if (nr_reclaimed)
> +				goto done;

This is a doubtful shortcut.
We shouldn't assume we need only one page.



>  #endif /* _LINUX_MEMCONTROL_H */
> diff -puN mm/vmscan.c~memory-controller-soft-limit-reclaim-on-contention mm/vmscan.c
> diff -puN mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention mm/page_alloc.c
> --- linux-2.6.26-rc5/mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
> +++ linux-2.6.26-rc5-balbir/mm/page_alloc.c	2008-06-27 20:43:10.000000000 +0530
> @@ -1669,7 +1669,14 @@ nofail_alloc:
>  	reclaim_state.reclaimed_slab = 0;
>  	p->reclaim_state = &reclaim_state;
>  
> -	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
> +	/*
> +	 * First try to reclaim from memory control groups that have
> +	 * exceeded their soft limit
> +	 */
> +	did_some_progress = mem_cgroup_reclaim_on_contention(gfp_mask);
> +	if (!did_some_progress)
> +		did_some_progress = try_to_free_pages(zonelist, order,
> +							gfp_mask);

try_to_free_mem_cgroup_pages() assumes memcg needs only one page,
but this code breaks that assumption.

If anyone needs several contiguous pages, mem_cgroup_reclaim_on_contention()
reclaims one or a very few pages and returns >0, which can then cause a page
allocation failure.

Shouldn't we extend the arguments of try_to_free_mem_cgroup_pages()?


In addition, if we don't assume try_to_free_mem_cgroup_pages() needs only one
page, we should implement lumpy reclaim in mem_cgroup_isolate_pages().
Otherwise, the CPU time wasted will increase significantly.





* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-30  7:33   ` KOSAKI Motohiro
@ 2008-06-30  7:48     ` Balbir Singh
  2008-06-30  7:56       ` KOSAKI Motohiro
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  7:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel,
	linux-mm, KAMEZAWA Hiroyuki

KOSAKI Motohiro wrote:
> Hi
> 
> Does this code survive stress testing?
> 
> 
>> +		while (count-- &&
>> +			((mem = heap_delete_max(&mem_cgroup_heap)) != NULL)) {
>> +			BUG_ON(!mem->on_heap);
>> +			spin_unlock_irqrestore(&mem_cgroup_heap_lock, flags);
>> +			nr_reclaimed += try_to_free_mem_cgroup_pages(mem,
>> +								gfp_mask);
>> +			cond_resched();
>> +			spin_lock_irqsave(&mem_cgroup_heap_lock, flags);
>> +			mem->on_heap = 0;
>> +			/*
>> +			 * What should be the basis of breaking out?
>> +			 */
>> +			if (nr_reclaimed)
>> +				goto done;
> 
> This is a doubtful shortcut.
> We shouldn't assume we need only one page.
> 

There's a comment on top: "What should be the basis of breaking out?" It
definitely needs refinement; the current solution seemed to work, so I
kept it.

> 
> 
>>  #endif /* _LINUX_MEMCONTROL_H */
>> diff -puN mm/vmscan.c~memory-controller-soft-limit-reclaim-on-contention mm/vmscan.c
>> diff -puN mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention mm/page_alloc.c
>> --- linux-2.6.26-rc5/mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
>> +++ linux-2.6.26-rc5-balbir/mm/page_alloc.c	2008-06-27 20:43:10.000000000 +0530
>> @@ -1669,7 +1669,14 @@ nofail_alloc:
>>  	reclaim_state.reclaimed_slab = 0;
>>  	p->reclaim_state = &reclaim_state;
>>  
>> -	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
>> +	/*
>> +	 * First try to reclaim from memory control groups that have
>> +	 * exceeded their soft limit
>> +	 */
>> +	did_some_progress = mem_cgroup_reclaim_on_contention(gfp_mask);
>> +	if (!did_some_progress)
>> +		did_some_progress = try_to_free_pages(zonelist, order,
>> +							gfp_mask);
> 
> try_to_free_mem_cgroup_pages() assumes memcg needs only one page,
> but this code breaks that assumption.
> 
> If anyone needs several contiguous pages, mem_cgroup_reclaim_on_contention()
> reclaims one or a very few pages and returns >0, which can then cause a page
> allocation failure.
> 
> Shouldn't we extend the arguments of try_to_free_mem_cgroup_pages()?
> 
> 
> In addition, if we don't assume try_to_free_mem_cgroup_pages() needs only one
> page, we should implement lumpy reclaim in mem_cgroup_isolate_pages().
> Otherwise, the CPU time wasted will increase significantly.

The memory controller currently controls just *user* pages, which are all
order-0 (single) pages. Since pages are faulted in at different times, lumpy
reclaim was not the highest priority for the memory controller. NOTE: the pages
are duplicated on the per-zone LRU, so lumpy reclaim from there should work just
fine.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-30  7:48     ` Balbir Singh
@ 2008-06-30  7:56       ` KOSAKI Motohiro
  2008-06-30  8:11         ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-06-30  7:56 UTC (permalink / raw)
  To: balbir
  Cc: kosaki.motohiro, Andrew Morton, YAMAMOTO Takashi, Paul Menage,
	linux-kernel, linux-mm, KAMEZAWA Hiroyuki

> >>  #endif /* _LINUX_MEMCONTROL_H */
> >> diff -puN mm/vmscan.c~memory-controller-soft-limit-reclaim-on-contention mm/vmscan.c
> >> diff -puN mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention mm/page_alloc.c
> >> --- linux-2.6.26-rc5/mm/page_alloc.c~memory-controller-soft-limit-reclaim-on-contention	2008-06-27 20:43:10.000000000 +0530
> >> +++ linux-2.6.26-rc5-balbir/mm/page_alloc.c	2008-06-27 20:43:10.000000000 +0530
> >> @@ -1669,7 +1669,14 @@ nofail_alloc:
> >>  	reclaim_state.reclaimed_slab = 0;
> >>  	p->reclaim_state = &reclaim_state;
> >>  
> >> -	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
> >> +	/*
> >> +	 * First try to reclaim from memory control groups that have
> >> +	 * exceeded their soft limit
> >> +	 */
> >> +	did_some_progress = mem_cgroup_reclaim_on_contention(gfp_mask);
> >> +	if (!did_some_progress)
> >> +		did_some_progress = try_to_free_pages(zonelist, order,
> >> +							gfp_mask);
> > 
> > try_to_free_mem_cgroup_pages() assumes memcg needs only one page,
> > but this code breaks that assumption.
> > 
> > If anyone needs several contiguous pages, mem_cgroup_reclaim_on_contention()
> > reclaims one or a very few pages and returns >0, which can then cause a page
> > allocation failure.
> > 
> > Shouldn't we extend the arguments of try_to_free_mem_cgroup_pages()?
> > 
> > 
> > In addition, if we don't assume try_to_free_mem_cgroup_pages() needs only one
> > page, we should implement lumpy reclaim in mem_cgroup_isolate_pages().
> > Otherwise, the CPU time wasted will increase significantly.
> 
> The memory controller currently controls just *user* pages, which are all
> order-0 (single) pages. Since pages are faulted in at different times, lumpy
> reclaim was not the highest priority for the memory controller. NOTE: the pages
> are duplicated on the per-zone LRU, so lumpy reclaim from there should work just
> fine.

Yes, memcg uses only single pages.
But mem_cgroup_reclaim_on_contention() reclaims on behalf of generic
alloc_pages(), not on behalf of memcg, so we can't assume memcg's usage
pattern. Isn't that right?





* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-30  7:56       ` KOSAKI Motohiro
@ 2008-06-30  8:11         ` Balbir Singh
  2008-06-30  8:17           ` KOSAKI Motohiro
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  8:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel,
	linux-mm, KAMEZAWA Hiroyuki

KOSAKI Motohiro wrote:
> Yes, memcg uses only single pages.
> But mem_cgroup_reclaim_on_contention() reclaims on behalf of generic
> alloc_pages(), not on behalf of memcg, so we can't assume memcg's usage
> pattern. Isn't that right?

Yes, but the reclaim is from memcg pages (memcg groups that are over their soft
limit). I am not sure I understand your point. If your claim is that we don't
free up pages of at least the requested order (as __alloc_pages_internal()
needs), that is correct. We can ensure that we do a pass over both the memcg
and the generic zone LRU.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC 5/5] Memory controller soft limit reclaim on contention
  2008-06-30  8:11         ` Balbir Singh
@ 2008-06-30  8:17           ` KOSAKI Motohiro
  0 siblings, 0 replies; 27+ messages in thread
From: KOSAKI Motohiro @ 2008-06-30  8:17 UTC (permalink / raw)
  To: balbir
  Cc: kosaki.motohiro, Andrew Morton, YAMAMOTO Takashi, Paul Menage,
	linux-kernel, linux-mm, KAMEZAWA Hiroyuki

> > Yes, memcg uses only single pages.
> > But mem_cgroup_reclaim_on_contention() reclaims on behalf of generic
> > alloc_pages(), not on behalf of memcg, so we can't assume memcg's usage
> > pattern. Isn't that right?
> 
> Yes, but the reclaim is from memcg pages (memcg groups that are over their soft
> limit). I am not sure I understand your point. If your claim is that we don't
> free up pages of at least the requested order (as __alloc_pages_internal()
> needs), that is correct. We can ensure that we do a pass over both the memcg
> and the generic zone LRU.

Exactly.
Thank you.



* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
                   ` (4 preceding siblings ...)
  2008-06-27 15:19 ` [RFC 5/5] Memory controller soft limit reclaim on contention Balbir Singh
@ 2008-06-28  4:36 ` KAMEZAWA Hiroyuki
  2008-06-29  5:02   ` Balbir Singh
  5 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-28  4:36 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

On Fri, 27 Jun 2008 20:48:08 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> This patchset implements the basic changes required to implement soft limits
> in the memory controller. A soft limit is a variation of the currently
> supported hard limit feature. A memory cgroup can exceed its soft limit
> provided there is no contention for memory.
> 
> These patches were tested on an x86_64 box, by running programs in parallel,
> and checking their behaviour for various soft limit values.
> 
> These patches were developed on top of 2.6.26-rc5-mm3. Comments, suggestions,
> criticism are all welcome!
> 
> A previous version of the patch can be found at
> 
> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/19/904114
> 
I have a couple of comments.

1. Why do you add soft_limit to res_counter?
   Is there any other controller which uses a soft limit?
   I'll move watermark handling from res_counter to memcg because it's
   required only by memcg.

2. *please* handle NUMA
   There is a fundamental difference between global VMM and memcg.
     global VMM - reclaims memory at memory shortage.
     memcg      - reclaims memory at the memory limit.
   So memcg didn't need to care where memory was placed when it hit the
   limit; *just reducing the usage* was enough.
   In this set, you try to handle memory shortage.
   So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
   If not,
    - memory placement of apps can be terrible.
    - it cannot work well with cpuset (I think).

3. I think it is unclear when "mem_cgroup_reclaim_on_contention" exits.
   Please add an explanation of the algorithm. Does it return as soon as
   some pages are reclaimed?

4. When a swap-full cgroup, which tends to contain tons of memory, is at
   the top of the heap, a lot of CPU time will be wasted.
   Can we add an "ignore me" flag?

Maybe "2" is the most important thing to implement.
I think this feature itself is interesting, so please handle NUMA.

"4" includes the user's (middleware's) memcg handling problem. But maybe
that problem should be fixed in the future.

Thanks,
-Kame



* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-28  4:36 ` [RFC 0/5] Memory controller soft limit introduction (v3) KAMEZAWA Hiroyuki
@ 2008-06-29  5:02   ` Balbir Singh
  2008-06-30  1:20     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2008-06-29  5:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

KAMEZAWA Hiroyuki wrote:
> On Fri, 27 Jun 2008 20:48:08 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> This patchset implements the basic changes required to implement soft limits
>> in the memory controller. A soft limit is a variation of the currently
>> supported hard limit feature. A memory cgroup can exceed its soft limit
>> provided there is no contention for memory.
>>
>> These patches were tested on an x86_64 box, by running programs in parallel,
>> and checking their behaviour for various soft limit values.
>>
>> These patches were developed on top of 2.6.26-rc5-mm3. Comments, suggestions,
>> criticism are all welcome!
>>
>> A previous version of the patch can be found at
>>
>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/19/904114
>>
> I have a couple of comments.
> 
> 1. Why do you add soft_limit to res_counter?
>    Is there any other controller which uses a soft limit?
>    I'll move watermark handling from res_counter to memcg because it's
>    required only by memcg.
> 

I expect soft_limits to be controller independent. The same thing can be applied
to an io-controller for example, right?

> 2. *please* handle NUMA
>    There is a fundamental difference between global VMM and memcg.
>      global VMM - reclaims memory at memory shortage.
>      memcg      - reclaims memory at the memory limit.
>    So memcg didn't need to care where memory was placed when it hit the
>    limit; *just reducing the usage* was enough.
>    In this set, you try to handle memory shortage.
>    So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
>    If not,
>     - memory placement of apps can be terrible.
>     - it cannot work well with cpuset (I think).
> 

try_to_free_mem_cgroup_pages() handles NUMA, right? We start with the
node_zonelists of the node we are currently executing on. I can pass the
zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
there anything else you had in mind?


> 3. I think it is unclear when "mem_cgroup_reclaim_on_contention" exits.
>    Please add an explanation of the algorithm. Does it return when some pages are reclaimed?
> 

Sure, I will do that.
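One possible answer to the exit question, written as a hypothetical userspace sketch rather than the patch's actual code: a linear scan stands in for the prio heap, and "reclaim" just decrements a counter. The loop keeps taking pages from the group furthest over its soft limit until a target is met or no group is over its soft limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group state, in pages. */
struct group_sketch {
	unsigned long usage;
	unsigned long soft_limit;
};

static unsigned long soft_excess(const struct group_sketch *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/*
 * Reclaim up to 'target' pages, always from the group furthest over its
 * soft limit. Returns the number of pages reclaimed, which is less than
 * 'target' only when no group is over its soft limit any more.
 */
static unsigned long reclaim_on_contention_sketch(struct group_sketch *g,
						  size_t n,
						  unsigned long target)
{
	unsigned long reclaimed = 0;

	assert(g != NULL || n == 0);
	while (reclaimed < target) {
		struct group_sketch *victim = NULL;

		for (size_t i = 0; i < n; i++)
			if (soft_excess(&g[i]) > (victim ? soft_excess(victim) : 0))
				victim = &g[i];
		if (!victim)
			break;  /* exit: nobody is over its soft limit */

		unsigned long want = target - reclaimed;
		unsigned long ex = soft_excess(victim);
		unsigned long got = ex < want ? ex : want;

		/* Pretend reclaim always succeeds; real reclaim can fail, so a
		 * real loop also needs a bail-out when no progress is made. */
		victim->usage -= got;
		reclaimed += got;
	}
	return reclaimed;
}
```

With this shape there are exactly two exits (target met, or heap exhausted), which is the kind of statement the requested explanation could make.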

> 4. When a swap-full cgroup, which tends to contain tons of memory, is at
>    the top of the heap, a lot of CPU time will be wasted.
>    Can we add an "ignore me" flag?
> 

Could you please elaborate on what you mean by a swap-full cgroup? Are you
referring to changes introduced by the memcg-handle-swap-cache patch? I don't
mind adding an "ignore me" flag, but I guess we need to figure out when a cgroup
is swap-full.

> Maybe "2" is the most important point for implementing this.
> I think this feature itself is interesting, so please handle NUMA.
> 

Thanks, I'll definitely fix whatever is needed to make the functionality more
correct and useful.

> "4" also involves the user's (middleware's) handling of memcg. But maybe that
> problem should be fixed in the future.

Thanks for the review!

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-29  5:02   ` Balbir Singh
@ 2008-06-30  1:20     ` KAMEZAWA Hiroyuki
  2008-06-30  1:50       ` KAMEZAWA Hiroyuki
  2008-06-30  3:41       ` Balbir Singh
  0 siblings, 2 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-30  1:20 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

On Sun, 29 Jun 2008 10:32:03 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > I have a couple of comments.
> > 
> > 1. Why do you add soft_limit to res_counter?
> >    Is there any other controller that uses a soft limit?
> >    I'll move watermark handling from res_counter to memcg because it's
> >    required only by memcg.
> > 
> 
> I expect soft limits to be controller-independent. The same thing can be applied
> to an I/O controller, for example, right?
> 

I can't imagine how a soft limit would work on an I/O controller. Could you explain?


> > 2. *please* handle NUMA
> >    There is a fundamental difference between the global VMM and memcg.
> >      global VMM - reclaims memory on memory shortage.
> >      memcg      - reclaims memory on hitting the memory limit.
> >    So memcg didn't need to care where memory was placed when hitting the limit;
> >    *just reducing the usage* was enough.
> >    In this set, you try to handle memory shortage.
> >    So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
> >    If not,
> >     - memory placement of apps can be terrible.
> >     - it cannot work well with cpuset. (I think)
> > 
> 
> try_to_free_mem_cgroup_pages() handles NUMA, right? We start with the
> node_zonelists of the node we are currently executing on. I can pass the
> zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
> there anything else you had in mind?
> 
Assume the following case: a host with 2 nodes, and the following mount style.

mount -t cgroup -o memory,cpuset none /opt/cgroup/

  /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
  /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
  ....
  /Groupxxxx

Assume this state after some workload:

  /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
  /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M

*And* node "1" is short of memory, so the kernel has to reclaim
memory from node "1".

Your routine tries to reclaim memory from the group that exceeds its soft
limit, i.e. Group1. But that is no help, because Group1 doesn't have any
memory on node 1. To make it worse, your routine doesn't call
try_to_free_pages() on the global LRU when soft-limit reclaim frees some
memory. So, if a task in Group1 keeps allocating memory at some rate, the
memory shortage in Group2 will not be resolved easily.

This causes two kinds of trouble:
 - Group1's memory is reclaimed, which is wrong.
 - Group2's try_to_free_pages() may take a very long time.

(Current page shrinking under cpuset seems to scan all nodes;
 this doesn't seem to be quick, but it works because it scans everything.
 That is another problem, anyway ;).
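The scenario can be reduced to a small userspace model (hypothetical names; usage in MB, split per node as in the example above). A node-blind pick by soft-limit excess chooses Group1, which has nothing on node 1. A node-aware pick finds no suitable group at all, which is the signal that the caller should fall back to global LRU reclaim.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES_SKETCH 2

/* Hypothetical per-group accounting in MB, split by NUMA node. */
struct numa_group_sketch {
	unsigned long soft_limit;
	unsigned long node_usage[NR_NODES_SKETCH];
};

static unsigned long excess_mb(const struct numa_group_sketch *g)
{
	unsigned long total = 0;

	for (int nid = 0; nid < NR_NODES_SKETCH; nid++)
		total += g->node_usage[nid];
	return total > g->soft_limit ? total - g->soft_limit : 0;
}

/* Node-blind: the group furthest over its soft limit, or NULL. */
static const struct numa_group_sketch *
pick_by_excess(const struct numa_group_sketch *g, size_t n)
{
	const struct numa_group_sketch *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (excess_mb(&g[i]) > (best ? excess_mb(best) : 0))
			best = &g[i];
	return best;
}

/*
 * Node-aware: among groups over their soft limit, the one holding the most
 * memory on 'nid'. NULL means soft-limit reclaim cannot relieve this node,
 * so the caller should fall back to global LRU reclaim.
 */
static const struct numa_group_sketch *
pick_for_node(const struct numa_group_sketch *g, size_t n, int nid)
{
	const struct numa_group_sketch *best = NULL;

	assert(nid >= 0 && nid < NR_NODES_SKETCH);
	for (size_t i = 0; i < n; i++) {
		if (!excess_mb(&g[i]) || !g[i].node_usage[nid])
			continue;
		if (!best || g[i].node_usage[nid] > best->node_usage[nid])
			best = &g[i];
	}
	return best;
}
```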


BTW, try_to_free_mem_cgroup_pages() currently always assumes
GFP_HIGHUSER_MOVABLE.
==
unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
                                                gfp_t gfp_mask)
{
        struct scan_control sc = {
                .may_writepage = !laptop_mode,
                .may_swap = 1,
                .swap_cluster_max = SWAP_CLUSTER_MAX,
                .swappiness = vm_swappiness,
                .order = 0,
                .mem_cgroup = mem_cont,
                .isolate_pages = mem_cgroup_isolate_pages,
        };
        struct zonelist *zonelist;

        sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
                        (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
        zonelist = NODE_DATA(numa_node_id())->node_zonelists;
        return do_try_to_free_pages(zonelist, &sc);
}
==
Please select an appropriate zonelist here.
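The effect of the sc.gfp_mask line above can be seen with a toy model: only the reclaim-behaviour bits come from the caller, while the zone-modifier bits always come from GFP_HIGHUSER_MOVABLE. The bit values and names below are hypothetical, not the kernel's.

```c
#include <assert.h>

typedef unsigned int gfp_sketch_t;

/* Hypothetical bit values for illustration only. */
#define SK_GFP_HIGHMEM  0x01u   /* zone modifier */
#define SK_GFP_MOVABLE  0x02u   /* zone modifier */
#define SK_GFP_WAIT     0x10u   /* reclaim behaviour */
#define SK_GFP_IO       0x20u   /* reclaim behaviour */
#define SK_GFP_FS       0x40u   /* reclaim behaviour */

#define SK_GFP_RECLAIM_MASK  (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_HIGHUSER_MOVABLE \
	(SK_GFP_RECLAIM_MASK | SK_GFP_HIGHMEM | SK_GFP_MOVABLE)

static_assert((SK_GFP_RECLAIM_MASK & (SK_GFP_HIGHMEM | SK_GFP_MOVABLE)) == 0,
	      "zone modifiers must not be part of the reclaim mask");

/* Same shape as: (gfp_mask & GFP_RECLAIM_MASK) |
 *                (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK) */
static gfp_sketch_t sketch_reclaim_mask(gfp_sketch_t caller)
{
	return (caller & SK_GFP_RECLAIM_MASK) |
	       (SK_GFP_HIGHUSER_MOVABLE & ~SK_GFP_RECLAIM_MASK);
}
```

In this model, a caller that did not ask for highmem or movable memory still ends up with those zone bits set; only its reclaim-behaviour bits survive.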


> 
> > 3. I think it is unclear when "mem_cgroup_reclaim_on_contention" exits.
> >    Please add an explanation of the algorithm. Does it return when some pages are reclaimed?
> > 
> 
> Sure, I will do that.
> 
> > 4. When a swap-full cgroup, which tends to contain tons of memory, is at
> >    the top of the heap, a lot of CPU time will be wasted.
> >    Can we add an "ignore me" flag?
> > 
> 
> Could you please elaborate on what you mean by a swap-full cgroup? Are you
> referring to changes introduced by the memcg-handle-swap-cache patch? I don't
> mind adding an "ignore me" flag, but I guess we need to figure out when a cgroup
> is swap-full.
> 
No. I mean the situation where no swap is available, or all swap is used.

This situation will happen very easily once the swap controller comes in.
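A hypothetical sketch of the requested "ignore me" flag: victim selection simply skips groups marked as not worth scanning (for instance, because no swap is left for their anonymous pages). The names are illustrative and not from the patchset; when and how the flag gets set is exactly the open question in this exchange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical group state, in pages. */
struct sl_group_sketch {
	unsigned long usage;
	unsigned long soft_limit;
	bool ignore;  /* hypothetical "ignore me": e.g. no swap left */
};

/* Group furthest over its soft limit, skipping ignored ones; NULL if none. */
static struct sl_group_sketch *
pick_victim_sketch(struct sl_group_sketch *g, size_t n)
{
	struct sl_group_sketch *best = NULL;
	unsigned long best_excess = 0;

	assert(g != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned long ex;

		if (g[i].ignore || g[i].usage <= g[i].soft_limit)
			continue;
		ex = g[i].usage - g[i].soft_limit;
		if (ex > best_excess) {
			best = &g[i];
			best_excess = ex;
		}
	}
	return best;
}
```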

Thanks,
-Kame

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  1:20     ` KAMEZAWA Hiroyuki
@ 2008-06-30  1:50       ` KAMEZAWA Hiroyuki
  2008-06-30  2:02         ` KAMEZAWA Hiroyuki
  2008-06-30  3:41       ` Balbir Singh
  1 sibling, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-30  1:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: balbir, Andrew Morton, YAMAMOTO Takashi, Paul Menage,
	linux-kernel, linux-mm

On Mon, 30 Jun 2008 10:20:54 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > 2. *please* handle NUMA
> > >    There is a fundamental difference between the global VMM and memcg.
> > >      global VMM - reclaims memory on memory shortage.
> > >      memcg      - reclaims memory on hitting the memory limit.
> > >    So memcg didn't need to care where memory was placed when hitting the limit;
> > >    *just reducing the usage* was enough.
> > >    In this set, you try to handle memory shortage.
> > >    So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
> > >    If not,
> > >     - memory placement of apps can be terrible.
> > >     - it cannot work well with cpuset. (I think)
> > > 
> > 
> > try_to_free_mem_cgroup_pages() handles NUMA, right? We start with the
> > node_zonelists of the node we are currently executing on. I can pass the
> > zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
> > there anything else you had in mind?
> > 
> Assume the following case: a host with 2 nodes, and the following mount style.
>
> mount -t cgroup -o memory,cpuset none /opt/cgroup/
>
>   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
>   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
>   ....
>   /Groupxxxx
>
> Assume this state after some workload:
>
>   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
>   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M
>
> *And* node "1" is short of memory, so the kernel has to reclaim
> memory from node "1".
>
> Your routine tries to reclaim memory from the group that exceeds its soft
> limit, i.e. Group1. But that is no help, because Group1 doesn't have any
> memory on node 1. To make it worse, your routine doesn't call
> try_to_free_pages() on the global LRU when soft-limit reclaim frees some
> memory. So, if a task in Group1 keeps allocating memory at some rate, the
> memory shortage in Group2 will not be resolved easily.
>
> This causes two kinds of trouble:
>  - Group1's memory is reclaimed, which is wrong.
>  - Group2's try_to_free_pages() may take a very long time.
> 
A bit more information: to be honest, I don't understand this perfectly.

But I am convinced there is a difference between hitting the limit and memory shortage.

In 2.6.26-rc5-mm3, shrink_zones() supports cpuset like this:

==
                if (scan_global_lru(sc)) {
                        if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
                                continue;
                        note_zone_scanning_priority(zone, priority);

                        if (zone_is_all_unreclaimable(zone) &&
                                                priority != DEF_PRIORITY)
                                continue;       /* Let kswapd poll it */
                        sc->all_unreclaimable = 0;
                } else {
                        /*
                         * Ignore cpuset limitation here. We just want to reduce
                         * # of used pages by us regardless of memory shortage.
                         */
                        sc->all_unreclaimable = 0;
                        mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
                                                        priority);
                }
==

The first point is (maybe) my mistake. We have to add the cpuset hardwall check
to the memcg branch as well. (I will write a patch soon.)

The second point is when the memory shortage is caused by some routine that is
not in the cpuset. In this case, Group1's memory can be reclaimed without any
benefit. Not a big problem?
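A userspace model of the proposed change to shrink_zones(), with hypothetical names: a node bitmask stands in for the cpuset, and zone_allowed_sketch() for cpuset_zone_allowed_hardwall(). Today only the global-LRU branch skips zones outside the cpuset; the fix would apply the same check in the memcg branch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

struct zone_sketch {
	int nid;  /* node this zone belongs to */
};

/* Stand-in for cpuset_zone_allowed_hardwall(): 'allowed' is a node bitmask. */
static bool zone_allowed_sketch(const struct zone_sketch *z,
				unsigned long allowed)
{
	assert(z->nid >= 0 && (size_t)z->nid < CHAR_BIT * sizeof(allowed));
	return (allowed >> z->nid) & 1UL;
}

/*
 * Count the zones shrink_zones() would scan. 'memcg_checks_cpuset' selects
 * between the current memcg behaviour (false: ignore the cpuset) and the
 * proposed one (true: apply the hardwall check too).
 */
static size_t zones_scanned_sketch(const struct zone_sketch *zones, size_t n,
				   unsigned long allowed, bool global_lru,
				   bool memcg_checks_cpuset)
{
	size_t scanned = 0;

	for (size_t i = 0; i < n; i++) {
		bool check = global_lru || memcg_checks_cpuset;

		if (check && !zone_allowed_sketch(&zones[i], allowed))
			continue;
		scanned++;
	}
	return scanned;
}
```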


Thanks,
-Kame

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  1:50       ` KAMEZAWA Hiroyuki
@ 2008-06-30  2:02         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-30  2:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: balbir, Andrew Morton, YAMAMOTO Takashi, Paul Menage,
	linux-kernel, linux-mm

On Mon, 30 Jun 2008 10:50:06 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:


> ==
>                 if (scan_global_lru(sc)) {
>                         if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>                                 continue;
>                         note_zone_scanning_priority(zone, priority);
> 
>                         if (zone_is_all_unreclaimable(zone) &&
>                                                 priority != DEF_PRIORITY)
>                                 continue;       /* Let kswapd poll it */
>                         sc->all_unreclaimable = 0;
>                 } else {
>                         /*
>                          * Ignore cpuset limitation here. We just want to reduce
>                          * # of used pages by us regardless of memory shortage.
>                          */
>                         sc->all_unreclaimable = 0;
>                         mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
>                                                         priority);
>                 }
> ==
> 
> The first point is (maybe) my mistake. We have to add the cpuset hardwall check
> to the memcg branch as well. (I will write a patch soon.)
> 

I found that my comment seems to say something correct...
==
 /*
  * Ignore cpuset limitation here. We just want to reduce
  * # of used pages by us regardless of memory shortage.
  */
==
When we handle memory shortage, we'll have to change this way of thinking.

But I can think of another example easily...
==
  MemcgA: limit=1G
  CpusetX: mem=0
  CpusetY: mem=1
  taskP = MemcgA+CpusetX
  taskQ = MemcgA+CpusetY
==
In this case, we just want to reduce memory usage... nonsense?

Hmm... I should refresh my brain and revisit this later.
Any inputs are welcome.

Thanks,
-Kame

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  1:20     ` KAMEZAWA Hiroyuki
  2008-06-30  1:50       ` KAMEZAWA Hiroyuki
@ 2008-06-30  3:41       ` Balbir Singh
  2008-06-30  3:57         ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  3:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

KAMEZAWA Hiroyuki wrote:
> On Sun, 29 Jun 2008 10:32:03 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>> I have a couple of comments.
>>>
>>> 1. Why do you add soft_limit to res_counter?
>>>    Is there any other controller that uses a soft limit?
>>>    I'll move watermark handling from res_counter to memcg because it's
>>>    required only by memcg.
>>>
>> I expect soft limits to be controller-independent. The same thing can be applied
>> to an I/O controller, for example, right?
>>
> 
> I can't imagine how a soft limit would work on an I/O controller. Could you explain?
> 

An I/O controller could have the same concept: a hard limit on the bandwidth,
and a soft limit that a group may exceed provided there is no I/O bandwidth
congestion.

> 
>>> 2. *please* handle NUMA
>>>    There is a fundamental difference between the global VMM and memcg.
>>>      global VMM - reclaims memory on memory shortage.
>>>      memcg      - reclaims memory on hitting the memory limit.
>>>    So memcg didn't need to care where memory was placed when hitting the limit;
>>>    *just reducing the usage* was enough.
>>>    In this set, you try to handle memory shortage.
>>>    So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
>>>    If not,
>>>     - memory placement of apps can be terrible.
>>>     - it cannot work well with cpuset. (I think)
>>>
>> try_to_free_mem_cgroup_pages() handles NUMA, right? We start with the
>> node_zonelists of the node we are currently executing on. I can pass the
>> zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
>> there anything else you had in mind?
>>
> Assume the following case: a host with 2 nodes, and the following mount style.
>
> mount -t cgroup -o memory,cpuset none /opt/cgroup/
>
>   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
>   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
>   ....
>   /Groupxxxx
>
> Assume this state after some workload:
>
>   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
>   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M
>
> *And* node "1" is short of memory, so the kernel has to reclaim
> memory from node "1".
>
> Your routine tries to reclaim memory from the group that exceeds its soft
> limit, i.e. Group1. But that is no help, because Group1 doesn't have any
> memory on node 1. To make it worse, your routine doesn't call
> try_to_free_pages() on the global LRU when soft-limit reclaim frees some
> memory. So, if a task in Group1 keeps allocating memory at some rate, the
> memory shortage in Group2 will not be resolved easily.
>
> This causes two kinds of trouble:
>  - Group1's memory is reclaimed, which is wrong.
>  - Group2's try_to_free_pages() may take a very long time.
>
> (Current page shrinking under cpuset seems to scan all nodes;
>  this doesn't seem to be quick, but it works because it scans everything.
>  That is another problem, anyway ;).
> 
> 
> BTW, try_to_free_mem_cgroup_pages() currently always assumes
> GFP_HIGHUSER_MOVABLE.
> ==
> unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>                                                 gfp_t gfp_mask)
> {
>         struct scan_control sc = {
>                 .may_writepage = !laptop_mode,
>                 .may_swap = 1,
>                 .swap_cluster_max = SWAP_CLUSTER_MAX,
>                 .swappiness = vm_swappiness,
>                 .order = 0,
>                 .mem_cgroup = mem_cont,
>                 .isolate_pages = mem_cgroup_isolate_pages,
>         };
>         struct zonelist *zonelist;
> 
>         sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>                         (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>         zonelist = NODE_DATA(numa_node_id())->node_zonelists;
>         return do_try_to_free_pages(zonelist, &sc);
> }
> ==
> Please select an appropriate zonelist here.
> 

We do have zonelist information in __alloc_pages_internal(); it should be easy
to pass the zonelist down, or fall back to a good default (the current node's)
if no zonelist is provided to the routine.
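A minimal sketch of that calling convention, with hypothetical names standing in for struct zonelist, NODE_DATA(nid)->node_zonelists and numa_node_id(): the caller may hand down the zonelist it already has (for example from __alloc_pages_internal()), and NULL means "use the current node's zonelist", which is what the code does today.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES_SKETCH 2

/* Hypothetical stand-in for struct zonelist. */
struct zonelist_sketch {
	int first_nid;  /* node whose zones are tried first */
};

/* Hypothetical stand-in for NODE_DATA(nid)->node_zonelists. */
static struct zonelist_sketch node_zonelists_sketch[NR_NODES_SKETCH] = {
	{ .first_nid = 0 }, { .first_nid = 1 },
};

/* Hypothetical stand-in for numa_node_id(). */
static int numa_node_id_sketch(void)
{
	return 0;
}

/* Prefer the caller's zonelist; otherwise default to the local node's. */
static struct zonelist_sketch *
select_zonelist_sketch(struct zonelist_sketch *from_caller)
{
	int nid;

	if (from_caller)
		return from_caller;
	nid = numa_node_id_sketch();
	assert(nid >= 0 && nid < NR_NODES_SKETCH);
	return &node_zonelists_sketch[nid];
}
```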


> 
>>> 3. I think it is unclear when "mem_cgroup_reclaim_on_contention" exits.
>>>    Please add an explanation of the algorithm. Does it return when some pages are reclaimed?
>>>
>> Sure, I will do that.
>>
>>> 4. When a swap-full cgroup, which tends to contain tons of memory, is at
>>>    the top of the heap, a lot of CPU time will be wasted.
>>>    Can we add an "ignore me" flag?
>>>
>> Could you please elaborate on what you mean by a swap-full cgroup? Are you
>> referring to changes introduced by the memcg-handle-swap-cache patch? I don't
>> mind adding an "ignore me" flag, but I guess we need to figure out when a cgroup
>> is swap-full.
>>
> No. I mean the situation where no swap is available, or all swap is used.
> 
> This situation will happen very easily once the swap controller comes in.

We'll definitely deal with it when the swap controller comes in.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  3:41       ` Balbir Singh
@ 2008-06-30  3:57         ` KAMEZAWA Hiroyuki
  2008-06-30  4:00           ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-30  3:57 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

On Mon, 30 Jun 2008 09:11:19 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Sun, 29 Jun 2008 10:32:03 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >>> I have a couple of comments.
> >>>
> >>> 1. Why do you add soft_limit to res_counter?
> >>>    Is there any other controller that uses a soft limit?
> >>>    I'll move watermark handling from res_counter to memcg because it's
> >>>    required only by memcg.
> >>>
> >> I expect soft limits to be controller-independent. The same thing can be applied
> >> to an I/O controller, for example, right?
> >>
> > 
> > I can't imagine how a soft limit would work on an I/O controller. Could you explain?
> > 
> 
> An I/O controller could have the same concept: a hard limit on the bandwidth,
> and a soft limit that a group may exceed provided there is no I/O bandwidth
> congestion.
> 
Hmm, that is the case where "share" works well. Why a soft limit?
Doesn't the I/O controller support shares? (I don't know, sorry.)



> > 
> >>> 2. *please* handle NUMA
> >>>    There is a fundamental difference between the global VMM and memcg.
> >>>      global VMM - reclaims memory on memory shortage.
> >>>      memcg      - reclaims memory on hitting the memory limit.
> >>>    So memcg didn't need to care where memory was placed when hitting the limit;
> >>>    *just reducing the usage* was enough.
> >>>    In this set, you try to handle memory shortage.
> >>>    So, please handle NUMA, i.e. "which node do you want to reclaim memory from?"
> >>>    If not,
> >>>     - memory placement of apps can be terrible.
> >>>     - it cannot work well with cpuset. (I think)
> >>>
> >> try_to_free_mem_cgroup_pages() handles NUMA, right? We start with the
> >> node_zonelists of the node we are currently executing on. I can pass the
> >> zonelist from __alloc_pages_internal() to try_to_free_mem_cgroup_pages(). Is
> >> there anything else you had in mind?
> >>
> > Assume the following case: a host with 2 nodes, and the following mount style.
> >
> > mount -t cgroup -o memory,cpuset none /opt/cgroup/
> >
> >   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M
> >   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M
> >   ....
> >   /Groupxxxx
> >
> > Assume this state after some workload:
> >
> >   /Group1: cpu 0-1, mem=0, limit=1G, soft-limit=700M, usage=990M
> >   /Group2: cpu 2-3, mem=1, limit=1G, soft-limit=700M, usage=400M
> >
> > *And* node "1" is short of memory, so the kernel has to reclaim
> > memory from node "1".
> >
> > Your routine tries to reclaim memory from the group that exceeds its soft
> > limit, i.e. Group1. But that is no help, because Group1 doesn't have any
> > memory on node 1. To make it worse, your routine doesn't call
> > try_to_free_pages() on the global LRU when soft-limit reclaim frees some
> > memory. So, if a task in Group1 keeps allocating memory at some rate, the
> > memory shortage in Group2 will not be resolved easily.
> >
> > This causes two kinds of trouble:
> >  - Group1's memory is reclaimed, which is wrong.
> >  - Group2's try_to_free_pages() may take a very long time.
> >
> > (Current page shrinking under cpuset seems to scan all nodes;
> >  this doesn't seem to be quick, but it works because it scans everything.
> >  That is another problem, anyway ;).
> > 
> > 
> > BTW, try_to_free_mem_cgroup_pages() currently always assumes
> > GFP_HIGHUSER_MOVABLE.
> > ==
> > unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> >                                                 gfp_t gfp_mask)
> > {
> >         struct scan_control sc = {
> >                 .may_writepage = !laptop_mode,
> >                 .may_swap = 1,
> >                 .swap_cluster_max = SWAP_CLUSTER_MAX,
> >                 .swappiness = vm_swappiness,
> >                 .order = 0,
> >                 .mem_cgroup = mem_cont,
> >                 .isolate_pages = mem_cgroup_isolate_pages,
> >         };
> >         struct zonelist *zonelist;
> > 
> >         sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> >                         (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> >         zonelist = NODE_DATA(numa_node_id())->node_zonelists;
> >         return do_try_to_free_pages(zonelist, &sc);
> > }
> > ==
> > Please select an appropriate zonelist here.
> > 
> 
> We do have zonelist information in __alloc_pages_internal(); it should be easy
> to pass the zonelist down, or fall back to a good default (the current node's)
> if no zonelist is provided to the routine.
> 
Yes. What I want to say is that you should take care of this.

Anyway, I think you should revisit the whole memory reclaim path and fix the
small bugs that don't fit with soft limits.

Thanks,
-Kame

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  3:57         ` KAMEZAWA Hiroyuki
@ 2008-06-30  4:00           ` Balbir Singh
  2008-06-30  4:19             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  4:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

KAMEZAWA Hiroyuki wrote:
> Hmm, that is the case where "share" works well. Why a soft limit?
> Doesn't the I/O controller support shares? (I don't know, sorry.)
> 

A share is a proportional allocation of a resource. Typically that allocation
is expressed as a soft limit, but not necessarily. If we reuse resource
counters, my expectation is that a share implementation would use soft limits
underneath.
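To illustrate the expectation that shares could sit on top of soft limits, one hypothetical mapping derives each group's soft limit as its proportion of a total. The function name is invented for illustration, integer division simply rounds down, and this is not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * soft_limits[i] = total * shares[i] / sum(shares), rounded down.
 * Assumes total * shares[i] does not overflow 64 bits.
 */
static void shares_to_soft_limits_sketch(const unsigned int *shares, size_t n,
					 uint64_t total, uint64_t *soft_limits)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += shares[i];
	assert(sum > 0);
	for (size_t i = 0; i < n; i++)
		soft_limits[i] = total * shares[i] / sum;
}
```

For example, shares of 1 and 3 over 2048 units give soft limits of 512 and 1536.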

> Yes. What I want to say is that you should take care of this.
> 

Yes, I will.

> Anyway, I think you should revisit the whole memory reclaim path and fix the
> small bugs that don't fit with soft limits.
> 

I'll revisit the whole thing; I am already revisiting parts of it as I write
the soft limit feature.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  4:00           ` Balbir Singh
@ 2008-06-30  4:19             ` KAMEZAWA Hiroyuki
  2008-06-30  4:40               ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-30  4:19 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

On Mon, 30 Jun 2008 09:30:50 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Hmm, that is the case where "share" works well. Why a soft limit?
> > Doesn't the I/O controller support shares? (I don't know, sorry.)
> > 
> 
> A share is a proportional allocation of a resource. Typically that allocation
> is expressed as a soft limit, but not necessarily. If we reuse resource
> counters, my expectation is that a share implementation would use soft limits
> underneath.
> 
Hmm... I'm not convinced at this point (because it's a future problem).
At least, please find a lock-less approach to checking the soft limit.

Thanks,
-Kame

* Re: [RFC 0/5] Memory controller soft limit introduction (v3)
  2008-06-30  4:19             ` KAMEZAWA Hiroyuki
@ 2008-06-30  4:40               ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2008-06-30  4:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, YAMAMOTO Takashi, Paul Menage, linux-kernel, linux-mm

KAMEZAWA Hiroyuki wrote:
> On Mon, 30 Jun 2008 09:30:50 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> KAMEZAWA Hiroyuki wrote:
>>> Hmm, that is the case where "share" works well. Why a soft limit?
>>> Doesn't the I/O controller support shares? (I don't know, sorry.)
>>>
>> A share is a proportional allocation of a resource. Typically that allocation
>> is expressed as a soft limit, but not necessarily. If we reuse resource
>> counters, my expectation is that a share implementation would use soft limits
>> underneath.
>>
> Hmm... I'm not convinced at this point (because it's a future problem).
> At least, please find a lock-less approach to checking the soft limit.

I've been looking at improving res_counter scalability. One simple approach is
to convert the spinlock to an rw spinlock so that reads can happen in
parallel. The next step would be to explore RCU for resource counters.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

Thread overview: 27+ messages
2008-06-27 15:18 [RFC 0/5] Memory controller soft limit introduction (v3) Balbir Singh
2008-06-27 15:18 ` [RFC 1/5] Memory controller soft limit documentation Balbir Singh
2008-06-27 15:18 ` [RFC 2/5] Add delete max to prio heap Balbir Singh
2008-06-27 15:18 ` [RFC 3/5] Replacement policy on heap overfull Balbir Singh
2008-06-27 15:37   ` Paul Menage
2008-06-30  3:46     ` Balbir Singh
2008-06-27 15:18 ` [RFC 4/5] Memory controller soft limit resource counter additions Balbir Singh
2008-06-27 15:19 ` [RFC 5/5] Memory controller soft limit reclaim on contention Balbir Singh
2008-06-27 16:09   ` Paul Menage
2008-06-29  4:48     ` Balbir Singh
2008-06-30  3:42     ` Balbir Singh
2008-06-28  4:22   ` KAMEZAWA Hiroyuki
2008-06-30  7:33   ` KOSAKI Motohiro
2008-06-30  7:48     ` Balbir Singh
2008-06-30  7:56       ` KOSAKI Motohiro
2008-06-30  8:11         ` Balbir Singh
2008-06-30  8:17           ` KOSAKI Motohiro
2008-06-28  4:36 ` [RFC 0/5] Memory controller soft limit introduction (v3) KAMEZAWA Hiroyuki
2008-06-29  5:02   ` Balbir Singh
2008-06-30  1:20     ` KAMEZAWA Hiroyuki
2008-06-30  1:50       ` KAMEZAWA Hiroyuki
2008-06-30  2:02         ` KAMEZAWA Hiroyuki
2008-06-30  3:41       ` Balbir Singh
2008-06-30  3:57         ` KAMEZAWA Hiroyuki
2008-06-30  4:00           ` Balbir Singh
2008-06-30  4:19             ` KAMEZAWA Hiroyuki
2008-06-30  4:40               ` Balbir Singh
