From: David Rientjes <rientjes@google.com>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Miao Xie <miaox@cn.fujitsu.com>,
Lee Schermerhorn <lee.schermerhorn@hp.com>
Subject: Re: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2)
Date: Fri, 19 Feb 2010 02:06:45 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1002190143040.6293@chino.kir.corp.google.com>
In-Reply-To: <20100219033126.GI9738@laptop>
On Fri, 19 Feb 2010, Nick Piggin wrote:
> > guarantee_online_cpus() truly does require callback_mutex; the
> > cgroup_scan_tasks() iterator locking can protect changes in the cgroup
> > hierarchy, but it doesn't protect against a store to cs->cpus_allowed
> > or against hotplug.
>
> Right, but the callback_mutex was being removed by this patch.
>
I was making the case for it to be readded :)
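To make the locking point concrete, here is a small userspace sketch loosely modelled on guarantee_online_cpus(): climb the hierarchy until a cpuset has an online CPU, then mask with the online set. Everything in it is hypothetical (struct model_cpuset, single 64-bit masks, a pthread mutex standing in for callback_mutex); the point is only that the reader of cs->cpus_allowed takes the same lock as the writer, which the cgroup iterator locking alone does not give you.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct cpuset: one 64-bit word per mask. */
struct model_cpuset {
	uint64_t cpus_allowed;
	struct model_cpuset *parent;
};

/* Stands in for callback_mutex: both readers and writers of cpus_allowed take it. */
static pthread_mutex_t model_callback_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Loosely follows guarantee_online_cpus(): walk up until a cpuset has at
 * least one online CPU and return its mask restricted to online CPUs;
 * fall back to all online CPUs if no ancestor qualifies.
 * Caller must hold model_callback_mutex.
 */
static uint64_t model_guarantee_online_cpus(const struct model_cpuset *cs,
					    uint64_t online)
{
	while (cs && !(cs->cpus_allowed & online))
		cs = cs->parent;
	return cs ? (cs->cpus_allowed & online) : online;
}

/* Writer side: a store to cpus_allowed happens under the lock. */
static void model_set_cpus(struct model_cpuset *cs, uint64_t mask)
{
	pthread_mutex_lock(&model_callback_mutex);
	cs->cpus_allowed = mask;
	pthread_mutex_unlock(&model_callback_mutex);
}

/* Reader side: the hierarchy walk happens under the same lock. */
static uint64_t model_attach_cpus(const struct model_cpuset *cs, uint64_t online)
{
	uint64_t mask;

	pthread_mutex_lock(&model_callback_mutex);
	mask = model_guarantee_online_cpus(cs, online);
	pthread_mutex_unlock(&model_callback_mutex);
	return mask;
}
```

For example, a child allowed only CPUs 4-5 while just CPUs 0-3 are online falls back to its parent's online CPUs.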
> > top_cpuset.cpus_allowed will always need to track cpu_active_map since
> > those are the schedulable cpus, it looks like that's initialized for SMP
> > and the cpu hotplug notifier does that correctly.
> >
> > I'm not sure what the logic is doing in cpuset_attach() where cs is the
> > cpuset to attach to:
> >
> > 	if (cs == &top_cpuset) {
> > 		cpumask_copy(cpus_attach, cpu_possible_mask);
> > 		to = node_possible_map;
> > 	}
> >
> > cpus_attach is properly protected by cgroup_lock, but using
> > node_possible_map here will set task->mems_allowed to node_possible_map
> > when the cpuset does not have memory_migrate enabled. This is the source
> > of your oops, I think.
>
> Could be, yes.
>
I'd be interested to see if you still get the same oops with the patch at
the end of this email that fixes this logic.
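A userspace sketch of the two ways cpuset_attach() picks the target nodemask for the root cpuset, before and after the patch. It assumes single-word masks and that top_cpuset.mems_allowed tracks only the nodes that actually have memory; model_cpuset and the helper names are hypothetical, and model_guarantee_online_mems() only loosely follows guarantee_online_mems().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-word stand-in for a cpuset's memory nodes. */
struct model_cpuset {
	uint64_t mems_allowed;
	const struct model_cpuset *parent;
};

/* Climb until some allowed node has memory, then restrict to those nodes. */
static uint64_t model_guarantee_online_mems(const struct model_cpuset *cs,
					    uint64_t mem_nodes)
{
	while (cs && !(cs->mems_allowed & mem_nodes))
		cs = cs->parent;
	return cs ? (cs->mems_allowed & mem_nodes) : mem_nodes;
}

/* Pre-patch: the root gets every possible node, even ones without memory. */
static uint64_t model_attach_mems_old(const struct model_cpuset *cs,
				      const struct model_cpuset *top,
				      uint64_t possible, uint64_t mem_nodes)
{
	if (cs == top)
		return possible;
	return model_guarantee_online_mems(cs, mem_nodes);
}

/* Patched: the root uses its own mems_allowed. */
static uint64_t model_attach_mems_new(const struct model_cpuset *cs,
				      const struct model_cpuset *top,
				      uint64_t mem_nodes)
{
	if (cs == top)
		return top->mems_allowed;
	return model_guarantee_online_mems(cs, mem_nodes);
}
```

With nodes 0-3 possible but only nodes 0-1 populated, the old logic hands a task attached to the root nodes 2-3 as well; the patched logic does not.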
> But it doesn't matter if stores are done under lock if the loads are
> not. Masks can be multiple words, so nothing orders the loads against
> the stores: a reader can see half an old mask and half a new one, which
> can result in an invalid state, AFAIKS.
>
It doesn't matter for MAX_NUMNODES > BITS_PER_LONG because
task->mems_allowed only gets updated via cpuset_change_task_nodemask(),
where the added nodes are set first and then the removed nodes are
cleared. The side effect of this lockless access to task->mems_allowed is
that we may have a small race between

	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);

and

	tsk->mems_allowed = *newmems;

but the penalty is that we may get an allocation on a removed node, which
isn't a big deal, especially since it was previously allowed.
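The ordering argument can be checked with a small brute-force model. It assumes a two-word nodemask, treats an empty mask as the "invalid state", and lets a lockless reader take each word from either of two consecutive stores. It does not model a reader slow enough to straddle both stores. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_WORDS 2	/* hypothetical: the nodemask spans two 64-bit words */

struct model_nodemask {
	uint64_t w[MODEL_WORDS];
};

/* Bit i of 'pick' says whether word i is loaded from snapshot b instead of a. */
static struct model_nodemask torn_read(const struct model_nodemask *a,
				       const struct model_nodemask *b,
				       unsigned int pick)
{
	struct model_nodemask r;

	for (int i = 0; i < MODEL_WORDS; i++)
		r.w[i] = (pick & (1u << i)) ? b->w[i] : a->w[i];
	return r;
}

static bool is_empty(const struct model_nodemask *m)
{
	for (int i = 0; i < MODEL_WORDS; i++)
		if (m->w[i])
			return false;
	return true;
}

/* Can any mix of words from two consecutive states in seq[] be empty? */
static bool can_read_empty(const struct model_nodemask *seq, int nstates)
{
	for (int s = 0; s + 1 < nstates; s++)
		for (unsigned int pick = 0; pick < (1u << MODEL_WORDS); pick++) {
			struct model_nodemask r = torn_read(&seq[s], &seq[s + 1], pick);

			if (is_empty(&r))
				return true;
		}
	return false;
}

/* A single direct store: old -> new. */
static bool direct_update_can_tear(struct model_nodemask oldm,
				   struct model_nodemask newm)
{
	struct model_nodemask seq[2] = { oldm, newm };

	return can_read_empty(seq, 2);
}

/* Set added nodes, then clear removed ones: old -> (old | new) -> new. */
static bool two_step_update_can_tear(struct model_nodemask oldm,
				     struct model_nodemask newm)
{
	struct model_nodemask mid;

	for (int i = 0; i < MODEL_WORDS; i++)
		mid.w[i] = oldm.w[i] | newm.w[i];

	struct model_nodemask seq[3] = { oldm, mid, newm };

	return can_read_empty(seq, 3);
}
```

With old = {node 64} and new = {node 0}, a single direct store can be read as empty (low word from the old mask, high word from the new one), while the set-then-clear sequence cannot under this model.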
> Well it is exported as cpuset_lock(). And the scheduler has it covered
> in all cases by the looks except for select_task_rq, which is called
> by wakeup code. We should stick WARN_ONs through the cpuset code for
> mutexes not held when they should be.
>
A lot of the reliance on callback_mutex was removed because the strict
hierarchy walking and task membership are now guarded by cgroup_mutex
instead. Some of the comments in kernel/cpuset.c weren't updated, so they
still say callback_mutex when they really mean cgroup_mutex.
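In the spirit of the WARN_ON suggestion, a userspace sketch of a lock-held check. A per-thread flag records whether the current thread holds the lock, so the check itself is race-free and also fires when some other thread holds it (an in-kernel WARN_ON(!mutex_is_locked(...)) only tells you that somebody holds it). All names here are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t model_cgroup_mutex = PTHREAD_MUTEX_INITIALIZER;
/* True while the current thread holds model_cgroup_mutex. */
static _Thread_local bool model_cgroup_mutex_held;

static void model_cgroup_lock(void)
{
	pthread_mutex_lock(&model_cgroup_mutex);
	model_cgroup_mutex_held = true;
}

static void model_cgroup_unlock(void)
{
	model_cgroup_mutex_held = false;
	pthread_mutex_unlock(&model_cgroup_mutex);
}

/* WARN_ON()-style: complain but keep going; returns whether it fired. */
static bool model_warn_if_not_held(const char *func)
{
	if (!model_cgroup_mutex_held) {
		fprintf(stderr, "WARNING: %s called without model_cgroup_mutex\n", func);
		return true;
	}
	return false;
}

static int model_cpus_attach;

/* A helper whose locking rule is checked rather than only commented. */
static bool model_update_cpus_attach(int v)
{
	bool warned = model_warn_if_not_held(__func__);

	model_cpus_attach = v;
	return warned;
}
```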
---
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1319,7 +1319,7 @@ static int fmeter_getrate(struct fmeter *fmp)
 	return val;
 }
 
-/* Protected by cgroup_lock */
+/* Protected by cgroup_mutex held on cpuset_attach() */
 static cpumask_var_t cpus_attach;
 
 /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
@@ -1390,8 +1390,12 @@ static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
 	struct cpuset *oldcs = cgroup_cs(oldcont);
 
 	if (cs == &top_cpuset) {
-		cpumask_copy(cpus_attach, cpu_possible_mask);
-		to = node_possible_map;
+		/*
+		 * top_cpuset.cpus_allowed and top_cpuset.mems_allowed are
+		 * protected by cgroup_lock which is already held here.
+		 */
+		cpumask_copy(cpus_attach, top_cpuset.cpus_allowed);
+		to = top_cpuset.mems_allowed;
 	} else {
 		guarantee_online_cpus(cs, cpus_attach);
 		guarantee_online_mems(cs, &to);