Re: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2)

From: David Rientjes <rientjes@google.com>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Miao Xie <miaox@cn.fujitsu.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>
Subject: Re: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2)
Date: Fri, 19 Feb 2010 02:06:45 -0800 (PST)
Message-ID: <alpine.DEB.2.00.1002190143040.6293@chino.kir.corp.google.com>
In-Reply-To: <20100219033126.GI9738@laptop>

On Fri, 19 Feb 2010, Nick Piggin wrote:

> > guarantee_online_cpus() truly does require callback_mutex, the 
> > cgroup_scan_tasks() iterator locking can protect changes in the cgroup 
> > hierarchy but it doesn't protect a store to cs->cpus_allowed or for 
> > hotplug.
> 
> Right, but the callback_mutex was being removed by this patch.
> 

I was making the case for it to be re-added :)

> > top_cpuset.cpus_allowed will always need to track cpu_active_map since 
> > those are the schedulable cpus, it looks like that's initialized for SMP 
> > and the cpu hotplug notifier does that correctly.
> > 
> > I'm not sure what the logic is doing in cpuset_attach() where cs is the 
> > cpuset to attach to:
> > 
> > 	if (cs == &top_cpuset) {
> > 		cpumask_copy(cpus_attach, cpu_possible_mask);
> > 		to = node_possible_map;
> > 	}
> > 
> > cpus_attach is properly protected by cgroup_lock, but using 
> > node_possible_map here will set task->mems_allowed to node_possible_map 
> > when the cpuset does not have memory_migrate enabled.  This is the source 
> > of your oops, I think.
> 
> Could be, yes.
> 

I'd be interested to see if you still get the same oops with the patch at 
the end of this email that fixes this logic.

> But it doesn't matter if stores are done under lock, if the loads are
> not. masks can be multiple words, so there isn't any ordering between
> reading half an old mask and half a new one that results in an invalid
> state. AFAIKS.
> 

It doesn't matter for MAX_NUMNODES > BITS_PER_LONG because 
task->mems_allowed only gets updated via cpuset_change_task_nodemask(), 
where the added nodes are set first and the removed nodes are cleared 
afterwards.  The side effect of this lockless access to 
task->mems_allowed is that we may have a small race between

	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);

		and

	tsk->mems_allowed = *newmems;

but the penalty is that we get an allocation on a removed node, which 
isn't a big deal, especially since it was previously allowed.

> Well it is exported as cpuset_lock(). And the scheduler has it covered
> in all cases by the looks except for select_task_rq, which is called
> by wakeup code. We should stick WARN_ONs through the cpuset code for
> mutexes not held when they should be.
> 

A lot of the reliance on callback_mutex was removed because the strict 
hierarchy walking and task membership are now guarded by cgroup_mutex 
instead.  Some of the comments in kernel/cpuset.c weren't updated, so they 
still say callback_mutex when in reality they mean cgroup_mutex.
---
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1319,7 +1319,7 @@ static int fmeter_getrate(struct fmeter *fmp)
 	return val;
 }
 
-/* Protected by cgroup_lock */
+/* Protected by cgroup_mutex held on cpuset_attach() */
 static cpumask_var_t cpus_attach;
 
 /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
@@ -1390,8 +1390,12 @@ static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
 	struct cpuset *oldcs = cgroup_cs(oldcont);
 
 	if (cs == &top_cpuset) {
-		cpumask_copy(cpus_attach, cpu_possible_mask);
-		to = node_possible_map;
+		/*
+		 * top_cpuset.cpus_allowed and top_cpuset.mems_allowed are
+		 * protected by cgroup_lock which is already held here.
+		 */
+		cpumask_copy(cpus_attach, top_cpuset.cpus_allowed);
+		to = top_cpuset.mems_allowed;
 	} else {
 		guarantee_online_cpus(cs, cpus_attach);
 		guarantee_online_mems(cs, &to);
