linux-mm.kvack.org archive mirror
* [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy
@ 2012-08-16 11:01 Glauber Costa
  2012-08-17  4:51 ` Greg Thelen
  0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-08-16 11:01 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, kamezawa.hiroyu, Andrew Morton, Johannes Weiner,
	Glauber Costa, Michal Hocko

A lot of the initialization we do in mem_cgroup_create() is done with
softirqs enabled. This includes grabbing a css id, which takes
&ss->id_lock->rlock, and initializing the per-zone trees, which takes
rtpz->lock->rlock. All of this signals to lockdep that
those locks can be used in a SOFTIRQ-ON-W context. This means that the
freeing of the memcg structure must happen in a compatible context,
otherwise we'll get a deadlock.

The reference counting mechanism we use allows the memcg structure to be
freed later and outlive the actual memcg destruction from the
filesystem. However, we have little, if any, control over the context
in which the last memcg_put will happen. The best we can do is test it
and try to make sure no invalid-context releases are happening. But as
we add more code to memcg, the possible interactions grow in number and
expose more ways to get context conflicts.

Greg Thelen reported a bug with that patchset applied that would trigger
if a task held a reference to a memcg through its kmem counter.
Killing that task would then eventually get us to
__mem_cgroup_free() after dropping the last kernel page reference, in an
invalid IN-SOFTIRQ-W context.

Besides that, he raised the quite valid concern that keeping the full
memcg around for an unbounded period of time can eventually exhaust the
css_id space and pin a lot of unneeded memory. For instance,
O(nr_cpus) percpu data for the stats is kept around, and we don't expect
to use it after the memcg is gone.

Both those problems can be avoided by freeing as much as we can in
mem_cgroup_destroy(), leaving only the memcg structure and the
static branches to be removed later. That freeing runs in a predictable
context, getting rid of the softirq problem, and also reduces pressure
on both the css_id space and total dangling memory.

I consider this safe because all the page_cgroup references to user
pages are reparented to the immediate parent, so late uncharges won't
trigger the common uncharge paths with a destroyed memcg.

Although we don't migrate kernel pages to the parent, we also don't call
the common uncharge paths for those pages, uncharging the res_counters
directly instead. So we are safe on this side of the wall as well.
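[Editor's sketch, not part of the original mail: the split this patch
introduces — tear down the bulky auxiliary state eagerly at destroy
time, and leave only the final free of the structure to the last
reference drop — can be modeled in plain userspace C. All names below
are invented for the sketch; this is not kernel code.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in for struct mem_cgroup: a refcounted object
 * whose bulky auxiliary state can be freed before the object itself. */
struct obj {
	atomic_int refcnt;
	int *stats;		/* stands in for the O(nr_cpus) percpu data */
	bool stats_freed;
};

static struct obj *obj_create(void)
{
	struct obj *o = calloc(1, sizeof(*o));
	atomic_init(&o->refcnt, 1);
	o->stats = calloc(64, sizeof(int));
	return o;
}

/* Analogue of calling __mem_cgroup_free() from mem_cgroup_destroy():
 * release the auxiliary state eagerly, from a known-safe context. */
static void obj_partial_free(struct obj *o)
{
	free(o->stats);
	o->stats = NULL;
	o->stats_freed = true;
}

/* Analogue of __mem_cgroup_put(): only the structure itself is freed
 * when the last reference goes away, whatever context that is in. */
static void obj_put(struct obj *o, struct obj **slot)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		free(o);
		*slot = NULL;
	}
}
```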

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Greg Thelen <gthelen@google.com>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Michal Hocko <mhocko@suse.cz>
CC: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9a82965..78cb394 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5169,18 +5169,9 @@ static void free_work(struct work_struct *work)
 		vfree(memcg);
 }
 
-static void free_rcu(struct rcu_head *rcu_head)
-{
-	struct mem_cgroup *memcg;
-
-	memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
-	INIT_WORK(&memcg->work_freeing, free_work);
-	schedule_work(&memcg->work_freeing);
-}
-
 /*
- * At destroying mem_cgroup, references from swap_cgroup can remain.
- * (scanning all at force_empty is too costly...)
+ * At destroying mem_cgroup, references from swap_cgroup and other places can
+ * remain.  (scanning all at force_empty is too costly...)
  *
  * Instead of clearing all references at force_empty, we remember
  * the number of reference from swap_cgroup and free mem_cgroup when
@@ -5188,6 +5179,14 @@ static void free_rcu(struct rcu_head *rcu_head)
  *
  * Removal of cgroup itself succeeds regardless of refs from swap.
  */
+static void free_rcu(struct rcu_head *rcu_head)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
+	INIT_WORK(&memcg->work_freeing, free_work);
+	schedule_work(&memcg->work_freeing);
+}
 
 static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
@@ -5200,7 +5199,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 		free_mem_cgroup_per_zone_info(memcg, node);
 
 	free_percpu(memcg->stat);
-	call_rcu(&memcg->rcu_freeing, free_rcu);
 }
 
 static void mem_cgroup_get(struct mem_cgroup *memcg)
@@ -5212,7 +5210,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
 {
 	if (atomic_sub_and_test(count, &memcg->refcnt)) {
 		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
-		__mem_cgroup_free(memcg);
+		call_rcu(&memcg->rcu_freeing, free_rcu);
 		if (parent)
 			mem_cgroup_put(parent);
 	}
@@ -5377,6 +5375,7 @@ static void mem_cgroup_destroy(struct cgroup *cont)
 
 	kmem_cgroup_destroy(memcg);
 
+	__mem_cgroup_free(memcg);
 	mem_cgroup_put(memcg);
 }
 
-- 
1.7.11.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* Re: [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy
  2012-08-17  4:51 ` Greg Thelen
@ 2012-08-17  4:51   ` Glauber Costa
  2012-08-17  6:37     ` Greg Thelen
  0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-08-17  4:51 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, cgroups, kamezawa.hiroyu, Andrew Morton,
	Johannes Weiner, Michal Hocko

On 08/17/2012 08:51 AM, Greg Thelen wrote:
> On Thu, Aug 16 2012, Glauber Costa wrote:
> 
>> A lot of the initialization we do in mem_cgroup_create() is done with
>> softirqs enabled. This includes grabbing a css id, which takes
>> &ss->id_lock->rlock, and initializing the per-zone trees, which takes
>> rtpz->lock->rlock. All of this signals to lockdep that
>> those locks can be used in a SOFTIRQ-ON-W context. This means that the
>> freeing of the memcg structure must happen in a compatible context,
>> otherwise we'll get a deadlock.
>>
>> The reference counting mechanism we use allows the memcg structure to be
>> freed later and outlive the actual memcg destruction from the
>> filesystem. However, we have little, if any, control over the context
>> in which the last memcg_put will happen. The best we can do is test it
>> and try to make sure no invalid-context releases are happening. But as
>> we add more code to memcg, the possible interactions grow in number and
>> expose more ways to get context conflicts.
>>
>> Greg Thelen reported a bug with that patchset applied that would trigger
>> if a task held a reference to a memcg through its kmem counter.
>> Killing that task would then eventually get us to
>> __mem_cgroup_free() after dropping the last kernel page reference, in an
>> invalid IN-SOFTIRQ-W context.
>>
>> Besides that, he raised the quite valid concern that keeping the full
>> memcg around for an unbounded period of time can eventually exhaust the
>> css_id space and pin a lot of unneeded memory. For instance,
>> O(nr_cpus) percpu data for the stats is kept around, and we don't expect
>> to use it after the memcg is gone.
>>
>> Both those problems can be avoided by freeing as much as we can in
>> mem_cgroup_destroy(), leaving only the memcg structure and the
>> static branches to be removed later. That freeing runs in a predictable
>> context, getting rid of the softirq problem, and also reduces pressure
>> on both the css_id space and total dangling memory.
> 
> Thank you for the patch.  I think it is a step in the right direction,
> but I suspect a problem (noted below).
> 
>> I consider this safe because all the page_cgroup references to user
>> pages are reparented to the immediate parent, so late uncharges won't
>> trigger the common uncharge paths with a destroyed memcg.
>>
>> Although we don't migrate kernel pages to the parent, we also don't call
>> the common uncharge paths for those pages, uncharging the res_counters
>> directly instead. So we are safe on this side of the wall as well.
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> Reported-by: Greg Thelen <gthelen@google.com>
>> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>>  mm/memcontrol.c | 25 ++++++++++++-------------
>>  1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 9a82965..78cb394 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5169,18 +5169,9 @@ static void free_work(struct work_struct *work)
>>  		vfree(memcg);
>>  }
>>  
>> -static void free_rcu(struct rcu_head *rcu_head)
>> -{
>> -	struct mem_cgroup *memcg;
>> -
>> -	memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
>> -	INIT_WORK(&memcg->work_freeing, free_work);
>> -	schedule_work(&memcg->work_freeing);
>> -}
>> -
>>  /*
>> - * At destroying mem_cgroup, references from swap_cgroup can remain.
>> - * (scanning all at force_empty is too costly...)
>> + * At destroying mem_cgroup, references from swap_cgroup and other places can
>> + * remain.  (scanning all at force_empty is too costly...)
>>   *
>>   * Instead of clearing all references at force_empty, we remember
>>   * the number of reference from swap_cgroup and free mem_cgroup when
>> @@ -5188,6 +5179,14 @@ static void free_rcu(struct rcu_head *rcu_head)
>>   *
>>   * Removal of cgroup itself succeeds regardless of refs from swap.
>>   */
>> +static void free_rcu(struct rcu_head *rcu_head)
>> +{
>> +	struct mem_cgroup *memcg;
>> +
>> +	memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
>> +	INIT_WORK(&memcg->work_freeing, free_work);
>> +	schedule_work(&memcg->work_freeing);
>> +}
>>  
>>  static void __mem_cgroup_free(struct mem_cgroup *memcg)
>>  {
>> @@ -5200,7 +5199,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
>>  		free_mem_cgroup_per_zone_info(memcg, node);
>>  
>>  	free_percpu(memcg->stat);
>> -	call_rcu(&memcg->rcu_freeing, free_rcu);
>>  }
>>  
>>  static void mem_cgroup_get(struct mem_cgroup *memcg)
>> @@ -5212,7 +5210,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
>>  {
>>  	if (atomic_sub_and_test(count, &memcg->refcnt)) {
>>  		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
>> -		__mem_cgroup_free(memcg);
>> +		call_rcu(&memcg->rcu_freeing, free_rcu);
>>  		if (parent)
>>  			mem_cgroup_put(parent);
>>  	}
>> @@ -5377,6 +5375,7 @@ static void mem_cgroup_destroy(struct cgroup *cont)
>>  
>>  	kmem_cgroup_destroy(memcg);
>>  
>> +	__mem_cgroup_free(memcg);
> 
> I suspect that this will free the css_id before all swap entries have
> dropped their references with mem_cgroup_uncharge_swap()?  I think we
> only want to call __mem_cgroup_free() once all non-kernel-page
> references have been released.  This would include mem_cgroup_destroy()
> and any pending calls to mem_cgroup_uncharge_swap().  I'm not sure, but
> maybe a second refcount or some optimization with the kmem_accounted
> bitmask can efficiently handle this.
> 

Can we demonstrate that? I agree there might be a potential problem, and
that is why I sent this separately. But the impression I got after
testing and reading the code was that the memcg information in
pc->mem_cgroup would be updated to the parent.

This means that any later call to uncharge or uncharge_swap would just
uncharge from the parent memcg, and we'd have no problem.
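[Editor's sketch, not part of the original mail: the reparenting
argument above can be modeled with a toy hierarchy, under the
assumption that charges propagate to every ancestor and that destroy
retargets surviving references at the parent. All names are invented.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of hierarchical charging: each group charges itself and
 * every ancestor, the way res_counter_charge() walks ->parent. */
struct group {
	long usage;
	struct group *parent;
};

struct page_ref {		/* stands in for pc->mem_cgroup */
	struct group *owner;
};

static void charge(struct group *g, long n)
{
	for (; g; g = g->parent)
		g->usage += n;
}

static void uncharge(struct group *g, long n)
{
	for (; g; g = g->parent)
		g->usage -= n;
}

/* On destroy, surviving references are pointed at the parent, so a
 * late uncharge never touches the dead group.  The child's own counter
 * is emptied (its pages were moved up, as force_empty does); the
 * parent's hierarchical usage already accounts for them. */
static void reparent(struct page_ref *ref)
{
	ref->owner->usage = 0;
	ref->owner = ref->owner->parent;
}
```

In this model a charge taken against a child and uncharged only after
the child is gone still balances out at the parent, which is the
behavior claimed above for pc->mem_cgroup.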



* Re: [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy
  2012-08-16 11:01 [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy Glauber Costa
@ 2012-08-17  4:51 ` Greg Thelen
  2012-08-17  4:51   ` Glauber Costa
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Thelen @ 2012-08-17  4:51 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, kamezawa.hiroyu, Andrew Morton,
	Johannes Weiner, Michal Hocko

On Thu, Aug 16 2012, Glauber Costa wrote:

> A lot of the initialization we do in mem_cgroup_create() is done with
> softirqs enabled. This includes grabbing a css id, which takes
> &ss->id_lock->rlock, and initializing the per-zone trees, which takes
> rtpz->lock->rlock. All of this signals to lockdep that
> those locks can be used in a SOFTIRQ-ON-W context. This means that the
> freeing of the memcg structure must happen in a compatible context,
> otherwise we'll get a deadlock.
>
> The reference counting mechanism we use allows the memcg structure to be
> freed later and outlive the actual memcg destruction from the
> filesystem. However, we have little, if any, control over the context
> in which the last memcg_put will happen. The best we can do is test it
> and try to make sure no invalid-context releases are happening. But as
> we add more code to memcg, the possible interactions grow in number and
> expose more ways to get context conflicts.
>
> Greg Thelen reported a bug with that patchset applied that would trigger
> if a task held a reference to a memcg through its kmem counter.
> Killing that task would then eventually get us to
> __mem_cgroup_free() after dropping the last kernel page reference, in an
> invalid IN-SOFTIRQ-W context.
>
> Besides that, he raised the quite valid concern that keeping the full
> memcg around for an unbounded period of time can eventually exhaust the
> css_id space and pin a lot of unneeded memory. For instance,
> O(nr_cpus) percpu data for the stats is kept around, and we don't expect
> to use it after the memcg is gone.
>
> Both those problems can be avoided by freeing as much as we can in
> mem_cgroup_destroy(), leaving only the memcg structure and the
> static branches to be removed later. That freeing runs in a predictable
> context, getting rid of the softirq problem, and also reduces pressure
> on both the css_id space and total dangling memory.

Thank you for the patch.  I think it is a step in the right direction,
but I suspect a problem (noted below).

> I consider this safe because all the page_cgroup references to user
> pages are reparented to the immediate parent, so late uncharges won't
> trigger the common uncharge paths with a destroyed memcg.
>
> Although we don't migrate kernel pages to the parent, we also don't call
> the common uncharge paths for those pages, uncharging the res_counters
> directly instead. So we are safe on this side of the wall as well.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Reported-by: Greg Thelen <gthelen@google.com>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9a82965..78cb394 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5169,18 +5169,9 @@ static void free_work(struct work_struct *work)
>  		vfree(memcg);
>  }
>  
> -static void free_rcu(struct rcu_head *rcu_head)
> -{
> -	struct mem_cgroup *memcg;
> -
> -	memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
> -	INIT_WORK(&memcg->work_freeing, free_work);
> -	schedule_work(&memcg->work_freeing);
> -}
> -
>  /*
> - * At destroying mem_cgroup, references from swap_cgroup can remain.
> - * (scanning all at force_empty is too costly...)
> + * At destroying mem_cgroup, references from swap_cgroup and other places can
> + * remain.  (scanning all at force_empty is too costly...)
>   *
>   * Instead of clearing all references at force_empty, we remember
>   * the number of reference from swap_cgroup and free mem_cgroup when
> @@ -5188,6 +5179,14 @@ static void free_rcu(struct rcu_head *rcu_head)
>   *
>   * Removal of cgroup itself succeeds regardless of refs from swap.
>   */
> +static void free_rcu(struct rcu_head *rcu_head)
> +{
> +	struct mem_cgroup *memcg;
> +
> +	memcg = container_of(rcu_head, struct mem_cgroup, rcu_freeing);
> +	INIT_WORK(&memcg->work_freeing, free_work);
> +	schedule_work(&memcg->work_freeing);
> +}
>  
>  static void __mem_cgroup_free(struct mem_cgroup *memcg)
>  {
> @@ -5200,7 +5199,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
>  		free_mem_cgroup_per_zone_info(memcg, node);
>  
>  	free_percpu(memcg->stat);
> -	call_rcu(&memcg->rcu_freeing, free_rcu);
>  }
>  
>  static void mem_cgroup_get(struct mem_cgroup *memcg)
> @@ -5212,7 +5210,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
>  {
>  	if (atomic_sub_and_test(count, &memcg->refcnt)) {
>  		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
> -		__mem_cgroup_free(memcg);
> +		call_rcu(&memcg->rcu_freeing, free_rcu);
>  		if (parent)
>  			mem_cgroup_put(parent);
>  	}
> @@ -5377,6 +5375,7 @@ static void mem_cgroup_destroy(struct cgroup *cont)
>  
>  	kmem_cgroup_destroy(memcg);
>  
> +	__mem_cgroup_free(memcg);

I suspect that this will free the css_id before all swap entries have
dropped their references with mem_cgroup_uncharge_swap()?  I think we
only want to call __mem_cgroup_free() once all non-kernel-page
references have been released.  This would include mem_cgroup_destroy()
and any pending calls to mem_cgroup_uncharge_swap().  I'm not sure, but
maybe a second refcount or some optimization with the kmem_accounted
bitmask can efficiently handle this.

>  	mem_cgroup_put(memcg);
>  }



* Re: [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy
  2012-08-17  6:37     ` Greg Thelen
@ 2012-08-17  6:36       ` Glauber Costa
  2012-08-21 13:34       ` Glauber Costa
  1 sibling, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-08-17  6:36 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, cgroups, kamezawa.hiroyu, Andrew Morton,
	Johannes Weiner, Michal Hocko

On 08/17/2012 10:37 AM, Greg Thelen wrote:
> I am by no means a swap expert, so I may be heading in the weeds.  But I
> think that a swapped out page is not necessarily in any memcg lru.  So
> the mem_cgroup_pre_destroy() call to mem_cgroup_force_empty() will not
> necessarily see swapped out pages.
> 
> I think this demonstrates the problem.
hummm, hummm... all right.

I'll take a look at that today.

Thanks, Greg



* Re: [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy
  2012-08-17  4:51   ` Glauber Costa
@ 2012-08-17  6:37     ` Greg Thelen
  2012-08-17  6:36       ` Glauber Costa
  2012-08-21 13:34       ` Glauber Costa
  0 siblings, 2 replies; 6+ messages in thread
From: Greg Thelen @ 2012-08-17  6:37 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, kamezawa.hiroyu, Andrew Morton,
	Johannes Weiner, Michal Hocko

On Thu, Aug 16 2012, Glauber Costa wrote:

> On 08/17/2012 08:51 AM, Greg Thelen wrote:
>> On Thu, Aug 16 2012, Glauber Costa wrote:
>> 
>>> I consider this safe because all the page_cgroup references to user
>>> pages are reparented to the immediate parent, so late uncharges won't
>>> trigger the common uncharge paths with a destroyed memcg.
>>>
>>> Although we don't migrate kernel pages to the parent, we also don't call
>>> the common uncharge paths for those pages, uncharging the res_counters
>>> directly instead. So we are safe on this side of the wall as well.
>>>
<snip>
>>>
>>> @@ -5377,6 +5375,7 @@ static void mem_cgroup_destroy(struct cgroup *cont)
>>>  
>>>  	kmem_cgroup_destroy(memcg);
>>>  
>>> +	__mem_cgroup_free(memcg);
>> 
>> I suspect that this will free the css_id before all swap entries have
>> dropped their references with mem_cgroup_uncharge_swap()?  I think we
>> only want to call __mem_cgroup_free() once all non-kernel-page
>> references have been released.  This would include mem_cgroup_destroy()
>> and any pending calls to mem_cgroup_uncharge_swap().  I'm not sure, but
>> maybe a second refcount or some optimization with the kmem_accounted
>> bitmask can efficiently handle this.
>> 
>
> Can we demonstrate that? I agree there might be a potential problem, and
> that is why I sent this separately. But the impression I got after
> testing and reading the code was that the memcg information in
> pc->mem_cgroup would be updated to the parent.
>
> This means that any later call to uncharge or uncharge_swap would just
> uncharge from the parent memcg, and we'd have no problem.

I am by no means a swap expert, so I may be heading in the weeds.  But I
think that a swapped out page is not necessarily in any memcg lru.  So
the mem_cgroup_pre_destroy() call to mem_cgroup_force_empty() will not
necessarily see swapped out pages.

I think this demonstrates the problem.

Start with a 500MB machine running a CONFIG_MEMCG_SWAP=y kernel.
Enable a 1G swap device
% cd /dev/cgroup/memory
% echo 1 > memory.use_hierarchy
% mkdir x

# LOOP_START

% mkdir x/y
% (echo $BASHPID > x/y/tasks && exec ~/memtoy)
memtoy>anon a 600m
memtoy>touch a write
^Z
% cat x/y/tasks > tasks
% cat x/memory.memsw.usage_in_bytes
630484992
% rmdir x/y   # this now deletes css_id

# This should free swapents and uncharge memsw res_counter.  But the
# swap ents use deleted css_id in mem_cgroup_uncharge_swap() so the
# memcg cannot be located.  So memsw.usage_in_bytes is not uncharged,
# even for surviving parent memcg (e.g. x).
% kill %1

% cat x/memory.memsw.usage_in_bytes
168202240

# for fun, goto LOOP_START here and see x/memory.memsw.usage_in_bytes
# grow due to this leak.



* Re: [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy
  2012-08-17  6:37     ` Greg Thelen
  2012-08-17  6:36       ` Glauber Costa
@ 2012-08-21 13:34       ` Glauber Costa
  1 sibling, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-08-21 13:34 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, cgroups, kamezawa.hiroyu, Andrew Morton,
	Johannes Weiner, Michal Hocko

On 08/17/2012 10:37 AM, Greg Thelen wrote:
>> >
>> > Can we demonstrate that? I agree there might be a potential problem, and
>> > that is why I sent this separately. But the impression I got after
>> > testing and reading the code was that the memcg information in
>> > pc->mem_cgroup would be updated to the parent.
>> >
>> > This means that any later call to uncharge or uncharge_swap would just
>> > uncharge from the parent memcg, and we'd have no problem.
> I am by no means a swap expert, so I may be heading in the weeds.  But I
> think that a swapped out page is not necessarily in any memcg lru.  So
> the mem_cgroup_pre_destroy() call to mem_cgroup_force_empty() will not
> necessarily see swapped out pages.
> 
> I think this demonstrates the problem.

Ok, thanks Greg.
This seems to happen solely because we use the css_id in the swap_cgroup
structure, and that needs to stay around. If we stored the memcg address
instead, we'd have no such problem. (I believe so atm, still need to go
dig deeper.)

But this would lead to a 4 times bigger memory usage for that. Quite a
waste, considering this tends to be sparsely used most of the time.
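[Editor's sketch, not part of the original mail: the size argument is
simply the css_id width versus a pointer — swap_cgroup records a
2-byte id per swap entry, while a memcg address would be 8 bytes on a
64-bit build, hence the "4 times bigger" figure. The type names below
are invented.]

```c
#include <assert.h>
#include <stddef.h>

struct mem_cgroup;	/* opaque: only the pointer size matters here */

/* What swap_cgroup effectively stores per swap entry today. */
struct swap_cgroup_id  { unsigned short id; };		/* 2 bytes */

/* What it would store if it kept the memcg address instead. */
struct swap_cgroup_ptr { struct mem_cgroup *memcg; };	/* 8 on LP64 */

/* Per-entry growth factor of switching from css_id to pointer. */
static size_t per_entry_growth(void)
{
	return sizeof(struct swap_cgroup_ptr) /
	       sizeof(struct swap_cgroup_id);
}
```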


end of thread, other threads:[~2012-08-21 13:37 UTC | newest]

Thread overview: 6+ messages
2012-08-16 11:01 [PATCH v2] memcg: execute partial memcg freeing in mem_cgroup_destroy Glauber Costa
2012-08-17  4:51 ` Greg Thelen
2012-08-17  4:51   ` Glauber Costa
2012-08-17  6:37     ` Greg Thelen
2012-08-17  6:36       ` Glauber Costa
2012-08-21 13:34       ` Glauber Costa
