From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jiri Kosina <jkosina@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Christoph Lameter <cl@linux-foundation.org>,
Pekka Enberg <penberg@kernel.org>,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
Josh Triplett <josh@joshtriplett.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
Date: Fri, 5 Oct 2012 12:24:17 +0900
Message-ID: <506E52E1.3090609@jp.fujitsu.com>
In-Reply-To: <506D29A7.1000805@linux.vnet.ibm.com>
2012/10/04 15:16, Srivatsa S. Bhat wrote:
> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>> On Wed, 03 Oct 2012 18:23:09 +0530
>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>
>>> The synchronization between CPU hotplug readers and writers is achieved by
>>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>>
>>> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
>>> it. If we ever hit an imbalance between the two, we end up compromising the
>>> guarantees of the hotplug synchronization. For example, an extra call to
>>> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
>>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
>>> where the refcount can go negative.
>>>
>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>> ---
>>>
>>> kernel/cpu.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>> index f560598..00d29bc 100644
>>> --- a/kernel/cpu.c
>>> +++ b/kernel/cpu.c
>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>  	if (cpu_hotplug.active_writer == current)
>>>  		return;
>>>  	mutex_lock(&cpu_hotplug.lock);
>>> +	BUG_ON(cpu_hotplug.refcount == 0);
>>>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>>  		wake_up_process(cpu_hotplug.active_writer);
>>>  	mutex_unlock(&cpu_hotplug.lock);
>>
>> I think calling BUG() here is a bit harsh. We should only do that if
>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>> analyse the underlying bug, etc.
>>
>> But a cpu-hotplug locking imbalance is a really really really minor
>> problem! So how about we emit a warning then try to fix things up?
>
> That would be better indeed, thanks!
>
>> This should increase the chance that the machine will keep running and
>> so will increase the chance that a user will be able to report the bug
>> to us.
>>
>
> Yep, sounds good.
>
>>
>> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>> +++ a/kernel/cpu.c
>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
> >>  	if (cpu_hotplug.active_writer == current)
> >>  		return;
> >>  	mutex_lock(&cpu_hotplug.lock);
> >> -	BUG_ON(cpu_hotplug.refcount == 0);
> >> -	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> >> -		wake_up_process(cpu_hotplug.active_writer);
> >> +	if (!--cpu_hotplug.refcount) {
>
> This won't catch it. We enter this 'if' block only when cpu_hotplug.refcount has just been
> decremented to exactly zero. We miss the case where it went negative (which is what we intended to detect).
>
> >> +		if (WARN_ON(cpu_hotplug.refcount == -1))
> >> +			cpu_hotplug.refcount++; /* try to fix things up */
> >> +		if (unlikely(cpu_hotplug.active_writer))
> >> +			wake_up_process(cpu_hotplug.active_writer);
> >> +	}
> >>  	mutex_unlock(&cpu_hotplug.lock);
>>
>> }
>
> So how about something like below:
>
> ------------------------------------------------------>
>
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
>
> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization. For example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative, and also attempt to fix it up, so that we can
> continue to run.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
Looks good to me.
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> kernel/cpu.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..42bd331 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,10 @@ void put_online_cpus(void)
>  	if (cpu_hotplug.active_writer == current)
>  		return;
>  	mutex_lock(&cpu_hotplug.lock);
> +
> +	if (WARN_ON(!cpu_hotplug.refcount))
> +		cpu_hotplug.refcount++; /* try to fix things up */
> +
>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>  		wake_up_process(cpu_hotplug.active_writer);
>  	mutex_unlock(&cpu_hotplug.lock);
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>