From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: balbir@linux.vnet.ibm.com
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	linux-mm@kvack.org, YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Paul Menage <menage@google.com>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelianov <xemul@openvz.org>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [mm] [PATCH 3/4] Memory cgroup hierarchical reclaim (v4)
Date: Tue, 9 Dec 2008 12:57:13 +0900
Message-ID: <20081209125713.e868c43a.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20081209034828.GU13333@balbir.in.ibm.com>

On Tue, 9 Dec 2008 09:18:28 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-12-09 11:59:43]:
> 
> > On Wed, 26 Nov 2008 11:14:47 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > On Tue, 25 Nov 2008 20:31:25 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > Daisuke Nishimura wrote:
> > > > > Hi.
> > > > > 
> > > > > Unfortunately, trying to hold cgroup_mutex during reclaim causes a deadlock.
> > > > > 
> > > > > For example, when attaching a task to some cpuset directory (memory_migrate=on),
> > > > > 
> > > > >     cgroup_tasks_write (hold cgroup_mutex)
> > > > >         attach_task_by_pid
> > > > >             cgroup_attach_task
> > > > >                 cpuset_attach
> > > > >                     cpuset_migrate_mm
> > > > >                         :
> > > > >                         unmap_and_move
> > > > >                             mem_cgroup_prepare_migration
> > > > >                                 mem_cgroup_try_charge
> > > > >                                     mem_cgroup_hierarchical_reclaim
> > > > > 
> > > > 
> > > > Did lockdep complain about it?
> > > > 
> > > I don't understand lockdep very well, but I got logs like this:
> > > 
> > > ===
> > > INFO: task move.sh:17710 blocked for more than 480 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > move.sh       D ffff88010e1c76c0     0 17710  17597
> > >  ffff8800bd9edf00 0000000000000046 0000000000000000 0000000000000000
> > >  ffff8803afbc0000 ffff8800bd9ee270 0000000e00000000 000000010a54459c
> > >  ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
> > > Call Trace:
> > >  [<ffffffff802ae9f0>] mem_cgroup_get_first_node+0x29/0x8a
> > >  [<ffffffff804cb357>] mutex_lock_nested+0x180/0x2a2
> > >  [<ffffffff802ae9f0>] mem_cgroup_get_first_node+0x29/0x8a
> > >  [<ffffffff802ae9f0>] mem_cgroup_get_first_node+0x29/0x8a
> > >  [<ffffffff802aed9c>] __mem_cgroup_try_charge+0x27a/0x2de
> > >  [<ffffffff802afdfd>] mem_cgroup_prepare_migration+0x6c/0xa5
> > >  [<ffffffff802ad97f>] migrate_pages+0x10c/0x4a0
> > >  [<ffffffff802ad9c8>] migrate_pages+0x155/0x4a0
> > >  [<ffffffff802a14cb>] new_node_page+0x0/0x2f
> > >  [<ffffffff802a1adb>] check_range+0x300/0x325
> > >  [<ffffffff802a2374>] do_migrate_pages+0x1a5/0x1f1
> > >  [<ffffffff8026d272>] cpuset_migrate_mm+0x30/0x93
> > >  [<ffffffff8026d29c>] cpuset_migrate_mm+0x5a/0x93
> > >  [<ffffffff8026df41>] cpuset_attach+0x93/0xa6
> > >  [<ffffffff8026ae1b>] cgroup_attach_task+0x395/0x3e1
> > >  [<ffffffff8026af61>] cgroup_tasks_write+0xfa/0x11d
> > >  [<ffffffff8026aea0>] cgroup_tasks_write+0x39/0x11d
> > >  [<ffffffff8026b5aa>] cgroup_file_write+0xef/0x216
> > >  [<ffffffff802b2968>] vfs_write+0xad/0x136
> > >  [<ffffffff802b2dfe>] sys_write+0x45/0x6e
> > >  [<ffffffff8020bdab>] system_call_fastpath+0x16/0x1b
> > > INFO: lockdep is turned off.
> > > ===
> > > 
> > > And other processes trying to hold cgroup_mutex are also stuck.
> > > 
> > > > 1. We could probably move away from cgroup_mutex to a memory controller specific
> > > > mutex.
> > > > 2. We could give up cgroup_mutex before migrate_mm, since it seems like we'll
> > > > hold the cgroup lock for a long time, and holding it during reclaim will
> > > > definitely be visible to users trying to create/delete nodes.
> > > > 
> > > > I prefer to do (2). I'll look at the code more closely.
> > > > 
> > > I basically agree, but I think we should also consider mpol_rebind_mm.
> > > 
> > > mpol_rebind_mm, which can be called from cpuset_attach, does down_write(mm->mmap_sem),
> > > which means down_write(mm->mmap_sem) can be called under cgroup_mutex.
> > > OTOH, page fault path does down_read(mm->mmap_sem) and can call mem_cgroup_try_charge,
> > > which means mutex_lock(cgroup_mutex) can be called under down_read(mm->mmap_sem).
> > > 
> > 
> > What's the status of this problem? Fixed or not yet?
> > Sorry for failing to track the patches.
> >
> 
> Kamezawa-San,
> 
> We are looking at the two approaches that I had mentioned earlier:
> 
> 1) rely on the new cgroup_tasklist mutex introduced to close the race

Hmm? What you're talking about is
==memcg-avoid-dead-lock-caused-by-race-between-oom-and-cpuset_attach.patch
+static DEFINE_MUTEX(memcg_tasklist);   /* can be hold under cgroup_mutex */

this?

I think there are no Acks from Paul Menage (i.e. the cgroup maintainer)
for this, and I am afraid that it will add new complexity.

Hmm... It seems that my cgroup-ID scan patch will need more time to fix this
race. But I'd like to push fixes for this kind of race out to the cgroup layer
rather than adding more special cases... we already have tons of special
operations which cannot be handled in the cgroup layer and which break its
assumptions.

-Kame

> 2) Removing cgroup lock dependency with cgroup_tasks_write. I worry
> that it can lead to long latencies with cgroup_lock held
> 
> I can send a patch for (1) today. I want to fix (2), and I spent
> a lot of time staring at that code, but could not find
> any easy way to fix it.
>  
> 
> -- 
> 	Balbir
> 
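
The lock-order inversion described in the quoted text (cpuset_attach takes
mmap_sem while holding cgroup_mutex, and the page fault path can take
cgroup_mutex while holding mmap_sem) can be illustrated with a small
user-space sketch. This is not kernel code and not how lockdep is
implemented: enum lock_id, struct lock_graph, record_order(), reaches(),
has_inversion() and attach_vs_page_fault_inverts() are hypothetical names
invented for illustration. The sketch only records "inner lock taken while
outer lock is held" edges and reports whether they form a cycle.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for this sketch; not kernel definitions. */
enum lock_id { CGROUP_MUTEX, MMAP_SEM, NR_LOCKS };

/* order[a][b] is true when some path acquires b while already holding a. */
struct lock_graph {
	bool order[NR_LOCKS][NR_LOCKS];
};

/* Record that lock `inner` is acquired while `outer` is held. */
static void record_order(struct lock_graph *g, enum lock_id outer,
			 enum lock_id inner)
{
	g->order[outer][inner] = true;
}

/* Depth-first search: is `target` reachable from `from` via recorded edges? */
static bool reaches(const struct lock_graph *g, enum lock_id from,
		    enum lock_id target, bool *seen)
{
	if (from == target)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (g->order[from][next] && !seen[next] &&
		    reaches(g, (enum lock_id)next, target, seen))
			return true;
	}
	return false;
}

/* An edge a -> b is part of a cycle if a is reachable again from b. */
static bool has_inversion(const struct lock_graph *g)
{
	for (int a = 0; a < NR_LOCKS; a++) {
		for (int b = 0; b < NR_LOCKS; b++) {
			bool seen[NR_LOCKS] = { false };

			if (g->order[a][b] &&
			    reaches(g, (enum lock_id)b, (enum lock_id)a, seen))
				return true;
		}
	}
	return false;
}

/* The two orderings from the quoted discussion, fed into the graph. */
static bool attach_vs_page_fault_inverts(void)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	/* cpuset_attach -> mpol_rebind_mm: mmap_sem under cgroup_mutex */
	record_order(&g, CGROUP_MUTEX, MMAP_SEM);
	/* page fault -> mem_cgroup_try_charge: cgroup_mutex under mmap_sem */
	record_order(&g, MMAP_SEM, CGROUP_MUTEX);
	return has_inversion(&g);
}
```

With only the first ordering recorded, has_inversion() returns false; adding
the second ordering closes the cycle, which is why taking cgroup_mutex from
the reclaim path is a problem regardless of which approach is chosen.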


Thread overview: 19+ messages
2008-11-16  8:10 [mm][PATCH 0/4] Memory cgroup hierarchy introduction (v4) Balbir Singh
2008-11-16  8:10 ` [mm] [PATCH 1/4] Memory cgroup hierarchy documentation (v4) Balbir Singh
2008-11-17  1:06   ` Li Zefan
2008-11-17  3:37     ` Balbir Singh
2008-11-16  8:10 ` [mm] [PATCH 2/4] Memory cgroup resource counters for hierarchy (v4) Balbir Singh
2008-11-16  8:10 ` [mm] [PATCH 3/4] Memory cgroup hierarchical reclaim (v4) Balbir Singh
2008-11-25 11:58   ` Daisuke Nishimura
2008-11-25 15:01     ` Balbir Singh
2008-11-26  2:14       ` Daisuke Nishimura
2008-12-09  2:59         ` KAMEZAWA Hiroyuki
2008-12-09  3:48           ` Balbir Singh
2008-12-09  3:57             ` KAMEZAWA Hiroyuki [this message]
2008-12-09  3:53           ` Daisuke Nishimura
2008-12-09  3:58             ` KAMEZAWA Hiroyuki
2008-11-16  8:11 ` [mm] [PATCH 4/4] Memory cgroup hierarchy feature selector (v4) Balbir Singh
2008-11-17  4:46   ` Li Zefan
2008-11-17  4:49     ` Balbir Singh
2008-11-18 23:28   ` Andrew Morton
2008-11-19  5:04     ` Balbir Singh
