* [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration
2008-12-05 12:22 [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
@ 2008-12-05 12:23 ` Daisuke Nishimura
2008-12-05 13:22 ` KAMEZAWA Hiroyuki
2008-12-05 13:39 ` Balbir Singh
2008-12-05 12:24 ` [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge Daisuke Nishimura
` (3 subsequent siblings)
4 siblings, 2 replies; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-05 12:23 UTC (permalink / raw)
To: LKML, linux-mm
Cc: Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage, nishimura
I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
overkill.
Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
The caller would handle the case anyway.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
mm/memcontrol.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4dbce1d..50ee1be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
unlock_page_cgroup(pc);
if (mem) {
- ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
css_put(&mem->css);
}
*ptr = mem;
--
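For quick reference, an abridged sketch of mem_cgroup_prepare_migration() with this change applied (reconstructed from the hunk above; the elided lookup and the exact locals are assumptions, not verbatim -mmotm source):

int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
{
	struct mem_cgroup *mem = NULL;
	int ret = 0;
	/* ... look up the page's page_cgroup under lock_page_cgroup() and,
	 * if the page is charged, take a reference on its mem_cgroup ... */
	if (mem) {
		/* oom == false: if the pre-charge for the new page fails,
		 * return -ENOMEM instead of invoking the memcg OOM killer;
		 * the migration caller just fails this particular migration. */
		ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
		css_put(&mem->css);
	}
	*ptr = mem;
	return ret;
}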
* Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration
2008-12-05 12:23 ` [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration Daisuke Nishimura
@ 2008-12-05 13:22 ` KAMEZAWA Hiroyuki
2008-12-05 13:39 ` Balbir Singh
1 sibling, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-05 13:22 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov,
Li Zefan, Paul Menage
Daisuke Nishimura said:
> I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> overkill.
> Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> The caller would handle the case anyway.
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memcontrol.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4dbce1d..50ee1be 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page,
> struct mem_cgroup **ptr)
> unlock_page_cgroup(pc);
>
> if (mem) {
> - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> css_put(&mem->css);
> }
> *ptr = mem;
> --
--
* Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration
2008-12-05 12:23 ` [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration Daisuke Nishimura
2008-12-05 13:22 ` KAMEZAWA Hiroyuki
@ 2008-12-05 13:39 ` Balbir Singh
2008-12-06 2:47 ` Daisuke Nishimura
1 sibling, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2008-12-05 13:39 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage
* Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> [2008-12-05 21:23:04]:
> I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> overkill.
> Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> The caller would handle the case anyway.
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
> mm/memcontrol.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4dbce1d..50ee1be 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> unlock_page_cgroup(pc);
>
> if (mem) {
> - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> css_put(&mem->css);
> }
> *ptr = mem;
>
Seems reasonable to me. It would be nice to have a comment, or a noreclaim
wrapper around __mem_cgroup_try_charge(), indicating that no reclaim will
take place.
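A minimal sketch of the kind of wrapper being suggested (hypothetical name; not part of the patch):

/* Hypothetical helper; the name is illustrative only. Passing false as the
 * last argument asks __mem_cgroup_try_charge() not to fall back to the
 * memcg OOM killer (see the clarification later in this thread). */
static int mem_cgroup_try_charge_no_oom(struct mm_struct *mm, gfp_t gfp_mask,
					struct mem_cgroup **memcg)
{
	return __mem_cgroup_try_charge(mm, gfp_mask, memcg, false);
}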
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
--
Balbir
--
* Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration
2008-12-05 13:39 ` Balbir Singh
@ 2008-12-06 2:47 ` Daisuke Nishimura
2008-12-06 8:18 ` Balbir Singh
0 siblings, 1 reply; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-06 2:47 UTC (permalink / raw)
To: balbir
Cc: LKML, linux-mm, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage, d-nishimura, Daisuke Nishimura
On Fri, 5 Dec 2008 19:09:26 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> [2008-12-05 21:23:04]:
>
> > I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> > overkill.
> > Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> > The caller would handle the case anyway.
> >
> > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > ---
> > mm/memcontrol.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4dbce1d..50ee1be 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> > unlock_page_cgroup(pc);
> >
> > if (mem) {
> > - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > css_put(&mem->css);
> > }
> > *ptr = mem;
> >
>
> Seems reasonable to me. A comment indicating or adding a noreclaim
> wrapper around __mem_cgroup_try_charge to indicate that no reclaim
> will take place will be nice.
>
Ah.. this flag to __mem_cgroup_try_charge() doesn't mean "don't reclaim";
it means "don't cause an oom kill when it has tried to free memory but
couldn't free enough after all".
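For reference, an abridged sketch of the retry loop in __mem_cgroup_try_charge() that this flag controls (reconstructed from the hunk quoted in patch 3/4; the loop condition and the reclaim step are simplified placeholders):

	for (;;) {
		/* ... attempt the res_counter charge; break out on success,
		 * otherwise try to reclaim from mem_over_limit ... */
		if (!nr_retries--) {
			if (oom) {
				/* oom == true: kill a task in the cgroup */
				mem_cgroup_out_of_memory(mem_over_limit, gfp_mask);
				mem_over_limit->last_oom_jiffies = jiffies;
			}
			/* oom == false (e.g. page migration): just give up */
			goto nomem;
		}
	}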
Thanks,
Daisuke Nishimura.
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> --
> Balbir
>
> --
--
* Re: [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration
2008-12-06 2:47 ` Daisuke Nishimura
@ 2008-12-06 8:18 ` Balbir Singh
0 siblings, 0 replies; 16+ messages in thread
From: Balbir Singh @ 2008-12-06 8:18 UTC (permalink / raw)
To: nishimura
Cc: LKML, linux-mm, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage, d-nishimura
* Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> [2008-12-06 11:47:57]:
> On Fri, 5 Dec 2008 19:09:26 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > * Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> [2008-12-05 21:23:04]:
> >
> > > I think triggering OOM at mem_cgroup_prepare_migration would be just a bit
> > > overkill.
> > > Returning -ENOMEM would be enough for mem_cgroup_prepare_migration.
> > > The caller would handle the case anyway.
> > >
> > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > > ---
> > > mm/memcontrol.c | 2 +-
> > > 1 files changed, 1 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 4dbce1d..50ee1be 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1330,7 +1330,7 @@ int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> > > unlock_page_cgroup(pc);
> > >
> > > if (mem) {
> > > - ret = mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem);
> > > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > > css_put(&mem->css);
> > > }
> > > *ptr = mem;
> > >
> >
> > Seems reasonable to me. A comment indicating or adding a noreclaim
> > wrapper around __mem_cgroup_try_charge to indicate that no reclaim
> > will take place will be nice.
> >
> Ah.. this flag to __mem_cgroup_try_charge doesn't mean "don't reclaim"
> but "don't cause oom after it tried to free memory but couldn't
> free enough memory after all".
Ah, I misread the parameter. Thanks for clarifying!
>
> Thanks,
> Daisuke Nishimura.
>
> > Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > --
> > Balbir
> >
> > --
--
Balbir
--
* [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge
2008-12-05 12:22 [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
2008-12-05 12:23 ` [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration Daisuke Nishimura
@ 2008-12-05 12:24 ` Daisuke Nishimura
2008-12-05 13:24 ` KAMEZAWA Hiroyuki
2008-12-05 12:24 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach Daisuke Nishimura
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-05 12:24 UTC (permalink / raw)
To: LKML, linux-mm
Cc: Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage, nishimura
Now, mem_cgroup_try_charge is not used by anyone, so we can remove it.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
include/linux/memcontrol.h | 8 --------
mm/memcontrol.c | 21 +--------------------
2 files changed, 1 insertions(+), 28 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index fe82b58..4b35739 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -40,8 +40,6 @@ struct mm_struct;
extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask);
/* for swap handling */
-extern int mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **ptr);
extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
struct page *page, gfp_t mask, struct mem_cgroup **ptr);
extern void mem_cgroup_commit_charge_swapin(struct page *page,
@@ -135,12 +133,6 @@ static inline int mem_cgroup_cache_charge(struct page *page,
return 0;
}
-static inline int mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **ptr)
-{
- return 0;
-}
-
static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50ee1be..9c5856b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -808,27 +808,8 @@ nomem:
return -ENOMEM;
}
-/**
- * mem_cgroup_try_charge - get charge of PAGE_SIZE.
- * @mm: an mm_struct which is charged against. (when *memcg is NULL)
- * @gfp_mask: gfp_mask for reclaim.
- * @memcg: a pointer to memory cgroup which is charged against.
- *
- * charge against memory cgroup pointed by *memcg. if *memcg == NULL, estimated
- * memory cgroup from @mm is got and stored in *memcg.
- *
- * Returns 0 if success. -ENOMEM at failure.
- * This call can invoke OOM-Killer.
- */
-
-int mem_cgroup_try_charge(struct mm_struct *mm,
- gfp_t mask, struct mem_cgroup **memcg)
-{
- return __mem_cgroup_try_charge(mm, mask, memcg, true);
-}
-
/*
- * commit a charge got by mem_cgroup_try_charge() and makes page_cgroup to be
+ * commit a charge got by __mem_cgroup_try_charge() and makes page_cgroup to be
* USED state. If already USED, uncharge and return.
*/
--
* Re: [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge
2008-12-05 12:24 ` [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge Daisuke Nishimura
@ 2008-12-05 13:24 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-05 13:24 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov,
Li Zefan, Paul Menage
Daisuke Nishimura said:
> Now, mem_cgroup_try_charge is not used by anyone, so we can remove it.
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Not Now, but after patch [1/4] ;)
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/memcontrol.h | 8 --------
> mm/memcontrol.c | 21 +--------------------
> 2 files changed, 1 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index fe82b58..4b35739 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -40,8 +40,6 @@ struct mm_struct;
> extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct
> *mm,
> gfp_t gfp_mask);
> /* for swap handling */
> -extern int mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t gfp_mask, struct mem_cgroup **ptr);
> extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> struct page *page, gfp_t mask, struct mem_cgroup **ptr);
> extern void mem_cgroup_commit_charge_swapin(struct page *page,
> @@ -135,12 +133,6 @@ static inline int mem_cgroup_cache_charge(struct page
> *page,
> return 0;
> }
>
> -static inline int mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t gfp_mask, struct mem_cgroup **ptr)
> -{
> - return 0;
> -}
> -
> static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
> {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 50ee1be..9c5856b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -808,27 +808,8 @@ nomem:
> return -ENOMEM;
> }
>
> -/**
> - * mem_cgroup_try_charge - get charge of PAGE_SIZE.
> - * @mm: an mm_struct which is charged against. (when *memcg is NULL)
> - * @gfp_mask: gfp_mask for reclaim.
> - * @memcg: a pointer to memory cgroup which is charged against.
> - *
> - * charge against memory cgroup pointed by *memcg. if *memcg == NULL,
> estimated
> - * memory cgroup from @mm is got and stored in *memcg.
> - *
> - * Returns 0 if success. -ENOMEM at failure.
> - * This call can invoke OOM-Killer.
> - */
> -
> -int mem_cgroup_try_charge(struct mm_struct *mm,
> - gfp_t mask, struct mem_cgroup **memcg)
> -{
> - return __mem_cgroup_try_charge(mm, mask, memcg, true);
> -}
> -
> /*
> - * commit a charge got by mem_cgroup_try_charge() and makes page_cgroup
> to be
> + * commit a charge got by __mem_cgroup_try_charge() and makes page_cgroup
> to be
> * USED state. If already USED, uncharge and return.
> */
>
> --
--
* [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach
2008-12-05 12:22 [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
2008-12-05 12:23 ` [RFC][PATCH -mmotm 1/4] memcg: don't trigger oom at page migration Daisuke Nishimura
2008-12-05 12:24 ` [RFC][PATCH -mmotm 2/4] memcg: remove mem_cgroup_try_charge Daisuke Nishimura
@ 2008-12-05 12:24 ` Daisuke Nishimura
2008-12-05 13:27 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between " KAMEZAWA Hiroyuki
2008-12-05 13:52 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between " Balbir Singh
2008-12-05 12:25 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim Daisuke Nishimura
2008-12-06 3:42 ` [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
4 siblings, 2 replies; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-05 12:24 UTC (permalink / raw)
To: LKML, linux-mm
Cc: Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage, nishimura
mpol_rebind_mm(), which can be called from cpuset_attach(), does down_write(mm->mmap_sem).
This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
OTOH, the page fault path does down_read(mm->mmap_sem) and calls mem_cgroup_try_charge_xxx(),
which may eventually call mem_cgroup_out_of_memory(), and mem_cgroup_out_of_memory()
calls cgroup_lock().
This means cgroup_lock() can be called under down_read(mm->mmap_sem).
If those two paths race, a deadlock can happen.
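Schematically (illustrative only):

  Task A: cgroup attach path           Task B: page fault path
  --------------------------           -----------------------
  cgroup_mutex held                    down_read(&mm->mmap_sem)
  cpuset_attach()                      mem_cgroup_try_charge_xxx()
    mpol_rebind_mm()                     mem_cgroup_out_of_memory()
      down_write(&mm->mmap_sem)            cgroup_lock()
        -> waits for B                       -> waits for A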
This patch avoids the deadlock by:
- removing cgroup_lock() from mem_cgroup_out_of_memory().
- defining a new mutex (memcg_tasklist) and serializing mem_cgroup_move_task()
(the ->attach handler of the memory cgroup) and mem_cgroup_out_of_memory().
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
mm/memcontrol.c | 5 +++++
mm/oom_kill.c | 2 --
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c5856b..ab04725 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -51,6 +51,7 @@ static int really_do_swap_account __initdata = 1; /* for remember boot option*/
#define do_swap_account (0)
#endif
+static DEFINE_MUTEX(memcg_tasklist); /* can be held under cgroup_mutex */
/*
* Statistics for memory cgroup.
@@ -796,7 +797,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
if (!nr_retries--) {
if (oom) {
+ mutex_lock(&memcg_tasklist);
mem_cgroup_out_of_memory(mem_over_limit, gfp_mask);
+ mutex_unlock(&memcg_tasklist);
mem_over_limit->last_oom_jiffies = jiffies;
}
goto nomem;
@@ -2172,10 +2175,12 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
struct cgroup *old_cont,
struct task_struct *p)
{
+ mutex_lock(&memcg_tasklist);
/*
* FIXME: It's better to move charges of this process from old
* memcg to new memcg. But it's just on TODO-List now.
*/
+ mutex_unlock(&memcg_tasklist);
}
struct cgroup_subsys mem_cgroup_subsys = {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index fd150e3..40ba050 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -429,7 +429,6 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
unsigned long points = 0;
struct task_struct *p;
- cgroup_lock();
read_lock(&tasklist_lock);
retry:
p = select_bad_process(&points, mem);
@@ -444,7 +443,6 @@ retry:
goto retry;
out:
read_unlock(&tasklist_lock);
- cgroup_unlock();
}
#endif
--
* Re: [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach
2008-12-05 12:24 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach Daisuke Nishimura
@ 2008-12-05 13:27 ` KAMEZAWA Hiroyuki
2008-12-06 3:35 ` Daisuke Nishimura
2008-12-05 13:52 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between " Balbir Singh
1 sibling, 1 reply; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-05 13:27 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov,
Li Zefan, Paul Menage
Daisuke Nishimura said:
> mpol_rebind_mm(), which can be called from cpuset_attach(), does
> down_write(mm->mmap_sem).
> This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
>
> OTOH, page fault path does down_read(mm->mmap_sem) and calls
> mem_cgroup_try_charge_xxx(),
> which may eventually calls mem_cgroup_out_of_memory(). And
> mem_cgroup_out_of_memory()
> calls cgroup_lock().
> This means cgroup_lock() can be called under down_read(mm->mmap_sem).
>
good catch.
> If those two paths race, dead lock can happen.
>
> This patch avoid this dead lock by:
> - remove cgroup_lock() from mem_cgroup_out_of_memory().
agree to this.
> - define new mutex (memcg_tasklist) and serialize mem_cgroup_move_task()
> (->attach handler of memory cgroup) and mem_cgroup_out_of_memory.
>
Hmm... this seems like a temporary fix (and adds a new global lock...)
But ok, we need a fix. Let's revisit this later.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
* Re: [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach
2008-12-05 13:27 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between " KAMEZAWA Hiroyuki
@ 2008-12-06 3:35 ` Daisuke Nishimura
0 siblings, 0 replies; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-06 3:35 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: LKML, linux-mm, Balbir Singh, Pavel Emelyanov, Li Zefan,
Paul Menage, d-nishimura, Daisuke Nishimura
On Fri, 5 Dec 2008 22:27:10 +0900 (JST)
"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Daisuke Nishimura said:
> > mpol_rebind_mm(), which can be called from cpuset_attach(), does
> > down_write(mm->mmap_sem).
> > This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
> >
> > OTOH, page fault path does down_read(mm->mmap_sem) and calls
> > mem_cgroup_try_charge_xxx(),
> > which may eventually calls mem_cgroup_out_of_memory(). And
> > mem_cgroup_out_of_memory()
> > calls cgroup_lock().
> > This means cgroup_lock() can be called under down_read(mm->mmap_sem).
> >
> good catch.
>
Thanks.
> > If those two paths race, dead lock can happen.
> >
> > This patch avoid this dead lock by:
> > - remove cgroup_lock() from mem_cgroup_out_of_memory().
> agree to this.
>
> > - define new mutex (memcg_tasklist) and serialize mem_cgroup_move_task()
> > (->attach handler of memory cgroup) and mem_cgroup_out_of_memory.
> >
> Hmm...seems temporal fix (and adding new global lock...)
> But ok, we need fix. revist this later.
>
I agree.
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu,com>
>
Thank you.
Daisuke Nishimura.
--
* Re: [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach
2008-12-05 12:24 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach Daisuke Nishimura
2008-12-05 13:27 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between " KAMEZAWA Hiroyuki
@ 2008-12-05 13:52 ` Balbir Singh
1 sibling, 0 replies; 16+ messages in thread
From: Balbir Singh @ 2008-12-05 13:52 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage
* Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> [2008-12-05 21:24:50]:
> mpol_rebind_mm(), which can be called from cpuset_attach(), does down_write(mm->mmap_sem).
> This means down_write(mm->mmap_sem) can be called under cgroup_mutex.
>
> OTOH, page fault path does down_read(mm->mmap_sem) and calls mem_cgroup_try_charge_xxx(),
> which may eventually calls mem_cgroup_out_of_memory(). And mem_cgroup_out_of_memory()
> calls cgroup_lock().
> This means cgroup_lock() can be called under down_read(mm->mmap_sem).
>
> If those two paths race, dead lock can happen.
>
> This patch avoid this dead lock by:
> - remove cgroup_lock() from mem_cgroup_out_of_memory().
> - define new mutex (memcg_tasklist) and serialize mem_cgroup_move_task()
> (->attach handler of memory cgroup) and mem_cgroup_out_of_memory.
A similar race has been reported for cpuset_migrate_mm(), which is
called holding the cgroup_mutex and further calls do_migrate_pages,
which can call reclaim and thus try to acquire cgroup_lock. If we
avoid reclaiming pages with cpuset_migrate_mm(), as the first patch
did, it also solves the reported race.
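Roughly, the chain in question (illustrative; intermediate calls elided):

  cgroup_mutex held (cpuset attach path)
    cpuset_migrate_mm()
      do_migrate_pages()
        ... allocation/charge of the destination pages enters memcg reclaim ...
          cgroup_lock()   <-- tries to take cgroup_mutex again: deadlock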
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Looks good to me
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
--
Balbir
--
* [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim
2008-12-05 12:22 [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
` (2 preceding siblings ...)
2008-12-05 12:24 ` [RFC][PATCH -mmotm 3/4] memcg: avoid dead lock caused by race between oom and cpuset_attach Daisuke Nishimura
@ 2008-12-05 12:25 ` Daisuke Nishimura
2008-12-05 13:30 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim KAMEZAWA Hiroyuki
2008-12-05 13:54 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim Balbir Singh
2008-12-06 3:42 ` [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
4 siblings, 2 replies; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-05 12:25 UTC (permalink / raw)
To: LKML, linux-mm
Cc: Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage, nishimura
mem_cgroup_hierarchical_reclaim() now works properly even when !use_hierarchy,
so it should be used in many cases instead of try_to_free_mem_cgroup_pages().
The only exception is force_empty, because the group has no children in that case.
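Conceptually, the substitution is safe because the hierarchical walk degenerates to plain reclaim when !use_hierarchy; a sketch (illustrative only; the locals, the child walk, and the use_hierarchy access are assumptions, not the actual -mmotm code):

/* Conceptual sketch, not the real mem_cgroup_hierarchical_reclaim(). */
static int hierarchical_reclaim_sketch(struct mem_cgroup *root_mem,
				       gfp_t gfp_mask, bool noswap)
{
	int progress;
	/* Reclaim from the group itself first. */
	progress = try_to_free_mem_cgroup_pages(root_mem, gfp_mask, noswap,
						get_swappiness(root_mem));
	if (mem_cgroup_check_under_limit(root_mem) || !root_mem->use_hierarchy)
		return progress;	/* !use_hierarchy: same as the old call */
	/* ... otherwise walk the children and reclaim from each in turn ... */
	return progress;
}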
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
mm/memcontrol.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab04725..c0b4f37 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1399,8 +1399,7 @@ int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
rcu_read_unlock();
do {
- progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true,
- get_swappiness(mem));
+ progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true);
progress += mem_cgroup_check_under_limit(mem);
} while (!progress && --retry);
@@ -1467,10 +1466,8 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
if (!ret)
break;
- progress = try_to_free_mem_cgroup_pages(memcg,
- GFP_KERNEL,
- false,
- get_swappiness(memcg));
+ progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
+ false);
if (!progress) retry_count--;
}
@@ -1514,8 +1511,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
break;
oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
- try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL, true,
- get_swappiness(memcg));
+ mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
if (curusage >= oldusage)
retry_count--;
--
* Re: [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim
2008-12-05 12:25 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim Daisuke Nishimura
@ 2008-12-05 13:30 ` KAMEZAWA Hiroyuki
2008-12-05 13:54 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim Balbir Singh
1 sibling, 0 replies; 16+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-05 13:30 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Pavel Emelyanov,
Li Zefan, Paul Menage
Daisuke Nishimura said:
> mem_cgroup_hierarchicl_reclaim() works properly even when !use_hierarchy
> now,
> so, instead of try_to_free_mem_cgroup_pages(), it should be used in many
> cases.
>
> The only exception is force_empty. The group has no children in this case.
>
Hmm... I had postponed this until cgroup_lock was removed from reclaim.
But ok, this is a way to go.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
--
* Re: [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim
2008-12-05 12:25 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim Daisuke Nishimura
2008-12-05 13:30 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim KAMEZAWA Hiroyuki
@ 2008-12-05 13:54 ` Balbir Singh
1 sibling, 0 replies; 16+ messages in thread
From: Balbir Singh @ 2008-12-05 13:54 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: LKML, linux-mm, KAMEZAWA Hiroyuki, Pavel Emelyanov, Li Zefan,
Paul Menage
* Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> [2008-12-05 21:25:29]:
> mem_cgroup_hierarchicl_reclaim() works properly even when !use_hierarchy now,
> so, instead of try_to_free_mem_cgroup_pages(), it should be used in many cases.
>
Yes, that was by design. The design is such that use_hierarchy is set
for all children when the parent has it set and the resource counters
are also linked, such that the charge propagates to the root of the
current hierarchy and not any further.
> The only exception is force_empty. The group has no children in this case.
>
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
> mm/memcontrol.c | 12 ++++--------
> 1 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ab04725..c0b4f37 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1399,8 +1399,7 @@ int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
> rcu_read_unlock();
>
> do {
> - progress = try_to_free_mem_cgroup_pages(mem, gfp_mask, true,
> - get_swappiness(mem));
> + progress = mem_cgroup_hierarchical_reclaim(mem, gfp_mask, true);
> progress += mem_cgroup_check_under_limit(mem);
> } while (!progress && --retry);
>
> @@ -1467,10 +1466,8 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
> if (!ret)
> break;
>
> - progress = try_to_free_mem_cgroup_pages(memcg,
> - GFP_KERNEL,
> - false,
> - get_swappiness(memcg));
> + progress = mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL,
> + false);
> if (!progress) retry_count--;
> }
>
> @@ -1514,8 +1511,7 @@ int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
> break;
>
> oldusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> - try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL, true,
> - get_swappiness(memcg));
> + mem_cgroup_hierarchical_reclaim(memcg, GFP_KERNEL, true);
> curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> if (curusage >= oldusage)
> retry_count--;
>
Looks good to me
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
--
Balbir
--
* Re: [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205)
2008-12-05 12:22 [RFC][PATCH -mmotm 0/4] patches for memory cgroup (20081205) Daisuke Nishimura
` (3 preceding siblings ...)
2008-12-05 12:25 ` [RFC][PATCH -mmotm 4/4] memcg: change try_to_free_pages to hierarchical_reclaim Daisuke Nishimura
@ 2008-12-06 3:42 ` Daisuke Nishimura
4 siblings, 0 replies; 16+ messages in thread
From: Daisuke Nishimura @ 2008-12-06 3:42 UTC (permalink / raw)
To: Balbir Singh, KAMEZAWA Hiroyuki
Cc: LKML, linux-mm, Pavel Emelyanov, Li Zefan, Paul Menage,
Daisuke Nishimura, d-nishimura
On Fri, 5 Dec 2008 21:22:08 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> Hi.
>
> These are patches that I have now.
>
> Patches:
> [1/4] memcg: don't trigger oom at page migration
> [2/4] memcg: remove mem_cgroup_try_charge
> [3/4] memcg: avoid deadlock caused by race between oom and cpuset_attach
> [4/4] memcg: change try_to_free_pages to hierarchical_reclaim
>
> There is no special meaning in patch order except for 1 and 2.
>
> Any comments would be welcome.
>
Thank you for all your reviews and acks.
I'll send them to Andrew after I get back to my office on Monday.
Thanks,
Daisuke Nishimura.
--