[patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying

linux-mm.kvack.org archive mirror

* [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes
@ 2023-04-14 16:30 Thomas Gleixner
  2023-04-14 16:30 ` [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Thomas Gleixner @ 2023-04-14 16:30 UTC
  To: LKML
  Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
	Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ye Bin, linux-mm

Hi!

The cpu_dying_mask is not only undocumented but also to some extent a
misnomer. Its purpose is to capture the direction of the last cpu_up() or
cpu_down() operation, taking possible rollback operations into account.

The cpu_dying_mask is not really useful for general consumption because its
bits are sticky, i.e. they keep their value even after cpu_up() or
cpu_down() completes.

A recent fix to plug a race in the per CPU counter code picked
cpu_dying_mask to cure it. Unfortunately this does not work as the author
probably expected and the behaviour of cpu_dying_mask is not easy to change
without breaking the only other and initial user, the scheduler.

This series addresses this by:

   1) Reworking the per CPU counter hotplug mechanism so the race is fully
      plugged without using cpu_dying_mask

   2) Replacing the cpu_dying_mask logic with hotplug core internal state
      which is exposed to the scheduler with a properly documented
      function.

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git smp/dying_mask

Thanks

	tglx
---
 include/linux/cpuhotplug.h |    2 -
 include/linux/cpumask.h    |   21 ----------------
 kernel/cpu.c               |   45 +++++++++++++++++++++++++++++------
 kernel/sched/core.c        |    4 +--
 kernel/smpboot.h           |    2 +
 lib/percpu_counter.c       |   57 +++++++++++++++++++--------------------------
 6 files changed, 67 insertions(+), 64 deletions(-)


* [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling
  2023-04-14 16:30 [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Thomas Gleixner
@ 2023-04-14 16:30 ` Thomas Gleixner
  2023-04-15  5:20   ` Dennis Zhou
  2023-04-17  2:09   ` Dave Chinner
  2023-04-14 16:30 ` [patch 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2023-04-14 16:30 UTC
  To: LKML
  Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
	Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ye Bin, linux-mm

Commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") tried to
address a race condition between percpu_counter_sum() and a concurrent CPU
hotplug operation.

The race window lies between the point where a CPU which is being unplugged
removes itself from cpu_online_mask and the hotplug state callback which
folds the per CPU counters of the now dead CPU into the global count.

percpu_counter_sum() used for_each_online_cpu() to accumulate the per CPU
local counts, so during the race window it failed to account for the not
yet folded back local count of the offlined CPU.
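To make the race window concrete, here is a deliberately simplified,
single-threaded userspace model. The names (model_counter,
model_sum_online(), model_fold()) are hypothetical and not kernel API: a
bool array stands in for cpu_online_mask and model_fold() plays the role of
the folding callback which used to run in the DEAD section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define NR_CPUS_MODEL 4

struct model_counter {
	int64_t count;                  /* global part, like fbc->count */
	int32_t local[NR_CPUS_MODEL];   /* per CPU parts, like fbc->counters */
};

/* Stand-in for cpu_online_mask */
static bool model_online[NR_CPUS_MODEL] = { true, true, true, true };

/* Equivalent of summing with for_each_online_cpu() */
static int64_t model_sum_online(const struct model_counter *c)
{
	int64_t ret = c->count;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (model_online[cpu])
			ret += c->local[cpu];
	return ret;
}

/* Fold the local count of an offline CPU into the global count */
static void model_fold(struct model_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL && !model_online[cpu]);
	c->count += c->local[cpu];
	c->local[cpu] = 0;
}
```

With a global count of 100 and local counts 1..4 the sum is 110. Once CPU 3
has cleared its online bit but before model_fold() runs, the online-only
walk reports 106; after folding it is back to 110.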

The attempt to address this used the admittedly undocumented and
pointlessly public cpu_dying_mask by changing the loop iterator to take
both the cpu_online_mask and the cpu_dying_mask into account.

That works to some extent, but it is incorrect.

The cpu_dying_mask bits are sticky even after cpu_up()/cpu_down()
completes. That means that all offlined CPUs are always taken into
account. When SMT is disabled at boot time or runtime, this results in
evaluating the counters of _all_ offlined SMT siblings forever. Depending
on the system size, that's a massive number of cache lines touched on
every summation.
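A rough userspace model of the cost argument, assuming the sticky semantics
described above: cpu_down() sets the dying bit and nothing clears it until
the next cpu_up(). All names, and the choice of odd-numbered CPUs as the SMT
siblings which get offlined, are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of online/dying bits, not kernel code. */
#define NR_CPUS_SMT 8

static bool online_bits[NR_CPUS_SMT];
static bool dying_bits[NR_CPUS_SMT];

static void model_cpu_up(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SMT);
	dying_bits[cpu] = false;     /* direction of last operation: up */
	online_bits[cpu] = true;
}

static void model_cpu_down(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SMT);
	dying_bits[cpu] = true;      /* sticky: stays set after completion */
	online_bits[cpu] = false;
}

/* Per CPU slots touched by a walk over (online | dying) */
static int slots_touched_or(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS_SMT; cpu++)
		if (online_bits[cpu] || dying_bits[cpu])
			n++;
	return n;
}

/* Per CPU slots touched by a walk over online CPUs only */
static int slots_touched_online(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS_SMT; cpu++)
		n += online_bits[cpu];
	return n;
}
```

After bringing all eight CPUs up and then offlining the four odd-numbered
ones, the online-only walk touches four slots while the online|dying walk
still touches all eight, on every call.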

It might be argued that the cpu_dying_mask bit could be cleared when
cpu_down() completes, but that's not possible under all circumstances.

Especially with partial hotplug, the bit must be sticky in order to keep
the initial user, i.e. the scheduler, correct. Partial hotplug, which
allows explicit state transitions, can also recreate the race window:

       cpu_down(target = CPUHP_PERCPU_CNT_DEAD + 1)

brings a CPU down to one state before the per CPU counter folding
callback. As this did not reach CPUHP_OFFLINE state the bit would stay set.
Now the next partial operation:

       cpu_up(target = CPUHP_PERCPU_CNT_DEAD + 2)

has to clear the bit and the race window is open again.

There are two ways to solve this:

  1) Maintain a local CPU mask in the per CPU counter code, in which a
     CPU's bit is set when it comes online and cleared in the
     CPUHP_PERCPU_CNT_DEAD state after folding.

     This adds more code and complexity.

  2) Move the folding hotplug state into the DYING callback section, which
     runs on the outgoing CPU immediately after it has cleared its online
     bit.

     There is no concurrency vs. percpu_counter_sum() on another CPU
     because all still online CPUs are waiting in stop_machine() for the
     outgoing CPU to complete its shutdown. The raw spinlock held around
     the CPU mask iteration prevents an online CPU from reaching the stop
     machine thread while iterating, which implicitly prevents the
     outgoing CPU from clearing its online bit.

     This is way simpler than #1 and makes the hotplug calls symmetric at
     the price of a slightly longer wait time in stop_machine(), which is
     not the end of the world as CPU unplug is already slow. The overall
     time for a cpu_down() operation stays exactly the same.
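The ordering argument for option #2 can be sketched as a deterministic
model: each function applies the teardown steps to a copy of the counter
state and counts the points, where another CPU could run a summation, at
which the observed total is wrong. This abstracts stop_machine() and the
raw spinlocks into "no observer runs between these two steps"; it is an
illustration under that assumption, not a proof of the kernel's locking,
and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state: global count, per CPU counts, online bits. */
#define NCPU 4

struct snap {
	int64_t count;
	int32_t local[NCPU];
	bool online[NCPU];
};

static int64_t sum_online(const struct snap *s)
{
	int64_t ret = s->count;

	for (int cpu = 0; cpu < NCPU; cpu++)
		if (s->online[cpu])
			ret += s->local[cpu];
	return ret;
}

static void clear_online(struct snap *s, int cpu)
{
	s->online[cpu] = false;
}

static void fold(struct snap *s, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	s->count += s->local[cpu];
	s->local[cpu] = 0;
}

/* Old scheme: the DEAD callback folds later, an observer can run in between. */
static int misses_dead_scheme(struct snap s, int cpu, int64_t expected)
{
	int misses = 0;

	clear_online(&s, cpu);
	misses += sum_online(&s) != expected;  /* observer inside the window */
	fold(&s, cpu);
	misses += sum_online(&s) != expected;
	return misses;
}

/* DYING scheme: both steps happen while all other CPUs wait. */
static int misses_dying_scheme(struct snap s, int cpu, int64_t expected)
{
	clear_online(&s, cpu);
	fold(&s, cpu);
	return sum_online(&s) != expected;     /* first possible observation */
}
```

Starting from a total of 20, unplugging CPU 2 under the old scheme exposes
exactly one wrong observation (17), while the DYING scheme exposes none.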

Implement #2 and plug the race completely.

percpu_counter_sum() is still inherently racy against a concurrent
percpu_counter_add_batch() fastpath unless externally serialized.  That's
completely independent of CPU hotplug though.
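For reference, the fastpath/slowpath split that makes the summation racy can
be modelled as below. This is a simplified sketch assuming the behaviour
described above: small deltas accumulate in the per CPU slot without taking
fbc->lock, and only when the local value reaches the batch is it folded into
the global count under the lock. The names pcc_model and pcc_add_batch() are
hypothetical; percpu_counter_add_batch() in lib/percpu_counter.c is the real
code and additionally deals with preemption and uses this_cpu operations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a per CPU counter, not kernel code. */
struct pcc_model {
	int64_t count;     /* global count, updated under fbc->lock in the kernel */
	int32_t local[4];  /* per CPU deltas, updated locklessly on the fastpath */
};

/* Returns 1 if the slowpath (fold into the global count) was taken. */
static int pcc_add_batch(struct pcc_model *c, int cpu, int64_t amount,
			 int32_t batch)
{
	int64_t count;

	assert(cpu >= 0 && cpu < 4);
	count = c->local[cpu] + amount;
	if (llabs(count) >= batch) {
		/* Slowpath: the kernel holds fbc->lock here. */
		c->count += count;
		c->local[cpu] = 0;
		return 1;
	}
	/* Fastpath: no lock, so a concurrent sum can read a stale slot. */
	c->local[cpu] = (int32_t)count;
	return 0;
}
```

With a batch of 32, adding 10 stays in the local slot; adding another 25
pushes the local value to 35, which is folded into the global count.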

Fixes: 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Ye Bin <yebin10@huawei.com>
Cc: linux-mm@kvack.org
---
 include/linux/cpuhotplug.h |    2 -
 lib/percpu_counter.c       |   57 +++++++++++++++++++--------------------------
 2 files changed, 26 insertions(+), 33 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -91,7 +91,6 @@ enum cpuhp_state {
 	CPUHP_PRINTK_DEAD,
 	CPUHP_MM_MEMCQ_DEAD,
 	CPUHP_XFS_DEAD,
-	CPUHP_PERCPU_CNT_DEAD,
 	CPUHP_RADIX_DEAD,
 	CPUHP_PAGE_ALLOC,
 	CPUHP_NET_DEV_DEAD,
@@ -196,6 +195,7 @@ enum cpuhp_state {
 	CPUHP_AP_SMPCFD_DYING,
 	CPUHP_AP_X86_TBOOT_DYING,
 	CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
+	CPUHP_AP_PERCPU_COUNTER_STARTING,
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
 
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -12,7 +12,7 @@
 
 #ifdef CONFIG_HOTPLUG_CPU
 static LIST_HEAD(percpu_counters);
-static DEFINE_SPINLOCK(percpu_counters_lock);
+static DEFINE_RAW_SPINLOCK(percpu_counters_lock);
 #endif
 
 #ifdef CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER
@@ -126,13 +126,8 @@ EXPORT_SYMBOL(percpu_counter_sync);
  * Add up all the per-cpu counts, return the result.  This is a more accurate
  * but much slower version of percpu_counter_read_positive().
  *
- * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
- * from CPUs that are in the process of being taken offline. Dying cpus have
- * been removed from the online mask, but may not have had the hotplug dead
- * notifier called to fold the percpu count back into the global counter sum.
- * By including dying CPUs in the iteration mask, we avoid this race condition
- * so __percpu_counter_sum() just does the right thing when CPUs are being taken
- * offline.
+ * Note: This function is inherently racy against the lockless fastpath of
+ * percpu_counter_add_batch() unless externally serialized.
  */
 s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
@@ -142,10 +137,8 @@ s64 __percpu_counter_sum(struct percpu_c
 
 	raw_spin_lock_irqsave(&fbc->lock, flags);
 	ret = fbc->count;
-	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
+	for_each_online_cpu(cpu)
+		ret += *per_cpu_ptr(fbc->counters, cpu);
 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
 	return ret;
 }
@@ -167,9 +160,9 @@ int __percpu_counter_init(struct percpu_
 
 #ifdef CONFIG_HOTPLUG_CPU
 	INIT_LIST_HEAD(&fbc->list);
-	spin_lock_irqsave(&percpu_counters_lock, flags);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_add(&fbc->list, &percpu_counters);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
@@ -185,9 +178,9 @@ void percpu_counter_destroy(struct percp
 	debug_percpu_counter_deactivate(fbc);
 
 #ifdef CONFIG_HOTPLUG_CPU
-	spin_lock_irqsave(&percpu_counters_lock, flags);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_del(&fbc->list);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	free_percpu(fbc->counters);
 	fbc->counters = NULL;
@@ -197,22 +190,29 @@ EXPORT_SYMBOL(percpu_counter_destroy);
 int percpu_counter_batch __read_mostly = 32;
 EXPORT_SYMBOL(percpu_counter_batch);
 
-static int compute_batch_value(unsigned int cpu)
+static void compute_batch_value(int offs)
 {
-	int nr = num_online_cpus();
+	int nr = num_online_cpus() + offs;
+
+	percpu_counter_batch = max(32, nr * 2);
+}
 
-	percpu_counter_batch = max(32, nr*2);
+static int percpu_counter_cpu_starting(unsigned int cpu)
+{
+	/* If invoked during hotplug @cpu is not yet marked online. */
+	compute_batch_value(cpu_online(cpu) ? 0 : 1);
 	return 0;
 }
 
-static int percpu_counter_cpu_dead(unsigned int cpu)
+static int percpu_counter_cpu_dying(unsigned int cpu)
 {
 #ifdef CONFIG_HOTPLUG_CPU
 	struct percpu_counter *fbc;
+	unsigned long flags;
 
-	compute_batch_value(cpu);
+	compute_batch_value(0);
 
-	spin_lock_irq(&percpu_counters_lock);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_for_each_entry(fbc, &percpu_counters, list) {
 		s32 *pcount;
 
@@ -222,7 +222,7 @@ static int percpu_counter_cpu_dead(unsig
 		*pcount = 0;
 		raw_spin_unlock(&fbc->lock);
 	}
-	spin_unlock_irq(&percpu_counters_lock);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
@@ -256,15 +256,8 @@ EXPORT_SYMBOL(__percpu_counter_compare);
 
 static int __init percpu_counter_startup(void)
 {
-	int ret;
-
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "lib/percpu_cnt:online",
-				compute_batch_value, NULL);
-	WARN_ON(ret < 0);
-	ret = cpuhp_setup_state_nocalls(CPUHP_PERCPU_CNT_DEAD,
-					"lib/percpu_cnt:dead", NULL,
-					percpu_counter_cpu_dead);
-	WARN_ON(ret < 0);
+	WARN_ON(cpuhp_setup_state(CPUHP_AP_PERCPU_COUNTER_STARTING, "lib/percpu_counter:starting",
+				  percpu_counter_cpu_starting, percpu_counter_cpu_dying));
 	return 0;
 }
 module_init(percpu_counter_startup);
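The batch computation in the new STARTING callback can be checked in
isolation. A minimal userspace sketch with hypothetical helper names,
mirroring max(32, nr * 2) and the +1 offset applied when the incoming CPU
is not yet marked online:

```c
#include <assert.h>

/* Mirrors compute_batch_value(): max(32, 2 * (online CPUs + offs)). */
static int model_batch(int nr_online, int offs)
{
	int nr = nr_online + offs;

	assert(nr > 0);
	return nr * 2 > 32 ? nr * 2 : 32;
}

/* STARTING callback: during hotplug @cpu is not yet online, so count it. */
static int model_batch_starting(int nr_online, int cpu_is_online)
{
	return model_batch(nr_online, cpu_is_online ? 0 : 1);
}
```

For example, with 16 CPUs online and a 17th coming up, the batch becomes
34; when the callback runs for an already online CPU (setup time), the
offset is 0 and the batch stays at 32.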




* Re: [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling
  2023-04-14 16:30 ` [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling Thomas Gleixner
@ 2023-04-15  5:20   ` Dennis Zhou
  2023-04-17  2:09   ` Dave Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: Dennis Zhou @ 2023-04-15  5:20 UTC
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Valentin Schneider, Tejun Heo,
	Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ye Bin, linux-mm

Hello,

On Fri, Apr 14, 2023 at 06:30:43PM +0200, Thomas Gleixner wrote:
> [ snip ]

Thanks for this work. This is a much more complete solution.

Acked-by: Dennis Zhou <dennis@kernel.org>

Thanks,
Dennis



* Re: [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling
  2023-04-14 16:30 ` [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling Thomas Gleixner
  2023-04-15  5:20   ` Dennis Zhou
@ 2023-04-17  2:09   ` Dave Chinner
  2023-04-17  8:09     ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2023-04-17  2:09 UTC
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
	Christoph Lameter, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Ye Bin, linux-mm

On Fri, Apr 14, 2023 at 06:30:43PM +0200, Thomas Gleixner wrote:
> [ snip ]
> -static int compute_batch_value(unsigned int cpu)
> +static void compute_batch_value(int offs)
>  {
> -	int nr = num_online_cpus();
> +	int nr = num_online_cpus() + offs;
> +
> +	percpu_counter_batch = max(32, nr * 2);
> +}
>  
> -	percpu_counter_batch = max(32, nr*2);
> +static int percpu_counter_cpu_starting(unsigned int cpu)
> +{
> +	/* If invoked during hotplug @cpu is not yet marked online. */
> +	compute_batch_value(cpu_online(cpu) ? 0 : 1);
>  	return 0;
>  }

So this changes the batch size based on whether the CPU is starting
or dying to try to get _compare() to fall into the slow path correctly?

How is this supposed to work with counters that have caller supplied
custom batch sizes? i.e. use percpu_counter_add_batch() and
__percpu_counter_compare() with their own batch sizes directly?
Do they now need to add their own cpu hotplug hooks to
screw around with their batch sizes as well?

-Dave.
-- 
Dave Chinner
dchinner@redhat.com




* Re: [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling
  2023-04-17  2:09   ` Dave Chinner
@ 2023-04-17  8:09     ` Thomas Gleixner
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2023-04-17  8:09 UTC
  To: Dave Chinner
  Cc: LKML, Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
	Christoph Lameter, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Ye Bin, linux-mm

On Mon, Apr 17 2023 at 12:09, Dave Chinner wrote:
> On Fri, Apr 14, 2023 at 06:30:43PM +0200, Thomas Gleixner wrote:
>> -	percpu_counter_batch = max(32, nr*2);
>> +static int percpu_counter_cpu_starting(unsigned int cpu)
>> +{
>> +	/* If invoked during hotplug @cpu is not yet marked online. */
>> +	compute_batch_value(cpu_online(cpu) ? 0 : 1);
>>  	return 0;
>>  }
>
> So this changes the batch size based on whether the CPU is starting
> or dying to try to get _compare() to fall into the slow path
> correctly?

Right. That's not new. The original code did the same.

> How is this supposed to work with counters that have caller supplied
> custom batch sizes? i.e. use percpu_counter_add_batch() and
> __percpu_counter_compare() with their own batch sizes directly?
> Do they now need to add their own cpu hotplug hooks to
> screw around with their batch sizes as well?

Now? Nothing has changed here. Just the point where the batch size
computation is called is different. The original code did it in the
dynamic online callback late on hotplug and in the dead (cleanup)
callback late on unplug.

The external batch sizes always have been independent of this.

Thanks,

        tglx
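To illustrate why caller-supplied batch sizes are unaffected: in
__percpu_counter_compare() the batch only enters the error bound
batch * num_online_cpus(), which decides whether the approximate
percpu_counter_read() value is good enough or the precise sum is needed.
A hedged userspace model of that decision, with the approximate and precise
values passed in directly and a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model of the compare decision: trust the approximate
 * count only if it is further from @rhs than batch * nr_online,
 * otherwise use the precise sum. *used_sum reports which path ran.
 */
static int model_compare(int64_t approx, int64_t precise, int64_t rhs,
			 int32_t batch, int nr_online, int *used_sum)
{
	int64_t count = approx;

	assert(used_sum != NULL);
	*used_sum = 0;
	if (llabs(count - rhs) > (int64_t)batch * nr_online)
		return count > rhs ? 1 : -1;

	*used_sum = 1;
	count = precise;
	if (count > rhs)
		return 1;
	return count < rhs ? -1 : 0;
}
```

With a batch of 32 and 4 online CPUs the bound is 128: an approximate value
of 1000 against 0 decides on its own, while 100 against 95 falls back to
the precise value.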



* [patch 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask
  2023-04-14 16:30 [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Thomas Gleixner
  2023-04-14 16:30 ` [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling Thomas Gleixner
@ 2023-04-14 16:30 ` Thomas Gleixner
  2023-04-14 16:30 ` [patch 3/3] cpu/hotplug: Get rid of cpu_dying_mask Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2023-04-14 16:30 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
	Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ye Bin, linux-mm

There are no module users, and no module should ever care.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c |    2 --
 1 file changed, 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2643,10 +2643,8 @@ struct cpumask __cpu_present_mask __read
 EXPORT_SYMBOL(__cpu_present_mask);
 
 struct cpumask __cpu_active_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_active_mask);
 
 struct cpumask __cpu_dying_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_dying_mask);
 
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);



* [patch 3/3] cpu/hotplug: Get rid of cpu_dying_mask
  2023-04-14 16:30 [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Thomas Gleixner
  2023-04-14 16:30 ` [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling Thomas Gleixner
  2023-04-14 16:30 ` [patch 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask Thomas Gleixner
@ 2023-04-14 16:30 ` Thomas Gleixner
  2023-05-03 11:50 ` [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Valentin Schneider
  2023-12-30 22:39 ` Dennis Zhou
  4 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2023-04-14 16:30 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Valentin Schneider, Dennis Zhou, Tejun Heo,
	Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ye Bin, linux-mm

The cpu_dying_mask is not only undocumented but also to some extent a
misnomer. Its purpose is to capture the last direction of a cpu_up() or
cpu_down() operation, taking possible rollback operations into account.
The name and the lack of documentation have already lured someone into
using it the wrong way.

The initial user is the scheduler, which needs to decide correctly whether
to schedule tasks on a CPU that is between the CPUHP_ONLINE and
CPUHP_ACTIVE states and has the balance_push() hook installed.

cpu_dying_mask is not really useful for general consumption. Its bits
are sticky even after cpu_up() or cpu_down() completes.

It might be argued that the cpu_dying_mask bit could be cleared when
cpu_down() completes, but that's not possible under all circumstances,
especially not with partial hotplug operations. In that case the bit must
stay sticky in order to keep the initial user, i.e. the scheduler, correct.

Replace the cpumask completely by:

  - recording the direction internally in the CPU hotplug core state

  - exposing that state via a documented function to the scheduler

After that, cpu_dying_mask is no longer in use and is removed before the
next user trips over it.
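The stickiness described above can be sketched with a small userspace model. It is hypothetical and keeps only what the description needs: starting an operation sets the indicator from the direction (as cpuhp_set_state() does in the patch), a rollback flips the direction (as cpuhp_reset_state() does), and successful completion leaves the indicator untouched. struct model_cpuhp_state and the model_*() helpers are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the per-CPU hotplug direction indicator; only
 * the fields needed to show the stickiness are kept. This is not the
 * kernel's struct cpuhp_cpu_state.
 */
struct model_cpuhp_state {
	int  state;	/* current hotplug state, as an abstract integer */
	bool bringup;	/* direction of the running operation */
	bool goes_down;	/* last direction, including rollbacks */
};

/* Start an operation towards @target, mirroring cpuhp_set_state(). */
static void model_set_state(struct model_cpuhp_state *st, int target)
{
	/* The model only starts real transitions. */
	assert(target != st->state);
	st->bringup = st->state < target;
	st->goes_down = !st->bringup;
}

/* Reverse the direction after a failure, mirroring cpuhp_reset_state(). */
static void model_rollback(struct model_cpuhp_state *st)
{
	st->bringup = !st->bringup;
	st->goes_down = !st->bringup;
}

/* Complete the operation: the state moves, the indicator is untouched. */
static void model_complete(struct model_cpuhp_state *st, int reached)
{
	st->state = reached;
}
```

For a partial cpu_down() to an intermediate state, goes_down stays true after the operation completes; a failed cpu_up() that rolls back also leaves it true.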

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpumask.h |   21 ---------------------
 kernel/cpu.c            |   43 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/core.c     |    4 ++--
 kernel/smpboot.h        |    2 ++
 4 files changed, 41 insertions(+), 29 deletions(-)
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -126,12 +126,10 @@ extern struct cpumask __cpu_possible_mas
 extern struct cpumask __cpu_online_mask;
 extern struct cpumask __cpu_present_mask;
 extern struct cpumask __cpu_active_mask;
-extern struct cpumask __cpu_dying_mask;
 #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
 #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
 #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
 #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
-#define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
 
 extern atomic_t __num_online_cpus;
 
@@ -1015,15 +1013,6 @@ set_cpu_active(unsigned int cpu, bool ac
 		cpumask_clear_cpu(cpu, &__cpu_active_mask);
 }
 
-static inline void
-set_cpu_dying(unsigned int cpu, bool dying)
-{
-	if (dying)
-		cpumask_set_cpu(cpu, &__cpu_dying_mask);
-	else
-		cpumask_clear_cpu(cpu, &__cpu_dying_mask);
-}
-
 /**
  * to_cpumask - convert an NR_CPUS bitmap to a struct cpumask *
  * @bitmap: the bitmap
@@ -1097,11 +1086,6 @@ static inline bool cpu_active(unsigned i
 	return cpumask_test_cpu(cpu, cpu_active_mask);
 }
 
-static inline bool cpu_dying(unsigned int cpu)
-{
-	return cpumask_test_cpu(cpu, cpu_dying_mask);
-}
-
 #else
 
 #define num_online_cpus()	1U
@@ -1129,11 +1113,6 @@ static inline bool cpu_active(unsigned i
 	return cpu == 0;
 }
 
-static inline bool cpu_dying(unsigned int cpu)
-{
-	return false;
-}
-
 #endif /* NR_CPUS > 1 */
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -53,6 +53,9 @@
  * @rollback:	Perform a rollback
  * @single:	Single callback invocation
  * @bringup:	Single callback bringup or teardown selector
+ * @goes_down:	Indicator for direction of cpu_up()/cpu_down() operations
+ *		including eventual rollbacks. Not affected by state or
+ *		instance add/remove operations. See cpuhp_cpu_goes_down().
  * @cpu:	CPU number
  * @node:	Remote CPU node; for multi-instance, do a
  *		single entry callback for install/remove
@@ -72,6 +75,7 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
+	bool			goes_down;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -295,6 +299,37 @@ void cpu_maps_update_done(void)
 	mutex_unlock(&cpu_add_remove_lock);
 }
 
+/**
+ * cpuhp_cpu_goes_down - Query the current/last CPU hotplug direction of a CPU
+ * @cpu:	The CPU to query
+ *
+ * The direction indicator is modified by the hotplug core on
+ * cpu_up()/cpu_down() operations including eventual rollback operations.
+ * The indicator is not affected by state or instance install/remove
+ * operations.
+ *
+ * The indicator is sticky after the hotplug operation completes, whether
+ * the operation was a full up/down or just a partial bringup/teardown.
+ *
+ *				goes_down
+ *   cpu_up(target) enter	-> False
+ *	rollback on fail	-> True
+ *   cpu_up(target) exit	Last state
+ *
+ *   cpu_down(target) enter	-> True
+ *	rollback on fail	-> False
+ *   cpu_down(target) exit	Last state
+ *
+ * The return value is a racy snapshot and not protected against concurrent
+ * CPU hotplug operations which modify the indicator.
+ *
+ * Returns: True if cached direction is down, false otherwise
+ */
+bool cpuhp_cpu_goes_down(unsigned int cpu)
+{
+	return data_race(per_cpu(cpuhp_state.goes_down, cpu));
+}
+
 /*
  * If set, cpu_up and cpu_down will return -EBUSY and do nothing.
  * Should always be manipulated under cpu_add_remove_lock
@@ -486,8 +521,7 @@ cpuhp_set_state(int cpu, struct cpuhp_cp
 	st->target = target;
 	st->single = false;
 	st->bringup = bringup;
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
+	st->goes_down = !bringup;
 
 	return prev_state;
 }
@@ -521,8 +555,7 @@ cpuhp_reset_state(int cpu, struct cpuhp_
 	}
 
 	st->bringup = bringup;
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
+	st->goes_down = !bringup;
 }
 
 /* Regular hotplug invocation of the AP hotplug thread */
@@ -2644,8 +2677,6 @@ EXPORT_SYMBOL(__cpu_present_mask);
 
 struct cpumask __cpu_active_mask __read_mostly;
 
-struct cpumask __cpu_dying_mask __read_mostly;
-
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2297,7 +2297,7 @@ static inline bool is_cpu_allowed(struct
 		return cpu_online(cpu);
 
 	/* Regular kernel threads don't get to stay during offline. */
-	if (cpu_dying(cpu))
+	if (cpuhp_cpu_goes_down(cpu))
 		return false;
 
 	/* But are allowed during online. */
@@ -9344,7 +9344,7 @@ static void balance_push(struct rq *rq)
 	 * Only active while going offline and when invoked on the outgoing
 	 * CPU.
 	 */
-	if (!cpu_dying(rq->cpu) || rq != this_rq())
+	if (!cpuhp_cpu_goes_down(rq->cpu) || rq != this_rq())
 		return;
 
 	/*
--- a/kernel/smpboot.h
+++ b/kernel/smpboot.h
@@ -20,4 +20,6 @@ int smpboot_unpark_threads(unsigned int
 
 void __init cpuhp_threads_init(void);
 
+bool cpuhp_cpu_goes_down(unsigned int cpu);
+
 #endif
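The two scheduler call sites changed above reduce to simple predicates on the direction indicator. The sketch below is a simplified, hypothetical model: it covers only the regular-kthread tail of is_cpu_allowed(), once the earlier checks have not decided, and the early-return test in balance_push(). The booleans stand in for cpu_online(), cpuhp_cpu_goes_down() and the rq == this_rq() comparison; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Regular (non per-CPU) kernel thread, once the earlier checks in
 * is_cpu_allowed() have not already decided.
 */
static bool model_kthread_allowed(bool online, bool goes_down)
{
	/* Regular kernel threads don't get to stay during offline. */
	if (goes_down)
		return false;

	/* But are allowed during online. */
	return online;
}

/*
 * balance_push() only does work while the CPU is going down and when it
 * runs on the outgoing CPU's own runqueue.
 */
static bool model_balance_push_active(bool goes_down, bool on_this_rq)
{
	return goes_down && on_this_rq;
}
```

Because the indicator is sticky, a CPU left in a partially torn down state keeps rejecting regular kernel threads until a later cpu_up() flips the direction.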



* Re: [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes
  2023-04-14 16:30 [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Thomas Gleixner
                   ` (2 preceding siblings ...)
  2023-04-14 16:30 ` [patch 3/3] cpu/hotplug: Get rid of cpu_dying_mask Thomas Gleixner
@ 2023-05-03 11:50 ` Valentin Schneider
  2023-12-30 22:39 ` Dennis Zhou
  4 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2023-05-03 11:50 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Dave Chinner, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Ye Bin, linux-mm

On 14/04/23 18:30, Thomas Gleixner wrote:
> Hi!
>
> The cpu_dying_mask is not only undocumented but also to some extent a
> misnomer. Its purpose is to capture the last direction of a cpu_up() or
> cpu_down() operation taking eventual rollback operations into account.
>
> cpu_dying_mask is not really useful for general consumption. The
> cpu_dying_mask bits are sticky even after cpu_up() or cpu_down() completes.
>
> A recent fix to plug a race in the per CPU counter code picked
> cpu_dying_mask to cure it. Unfortunately this does not work as the author
> probably expected and the behaviour of cpu_dying_mask is not easy to change
> without breaking the only other and initial user, the scheduler.
>
> This series addresses this by:
>
>    1) Reworking the per CPU counter hotplug mechanism so the race is fully
>       plugged without using cpu_dying_mask
>
>    2) Replacing the cpu_dying_mask logic with hotplug core internal state
>       which is exposed to the scheduler with a properly documented
>       function.
>

For patches 2-3:

Reviewed-by: Valentin Schneider <vschneid@redhat.com>


* Re: [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes
  2023-04-14 16:30 [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Thomas Gleixner
                   ` (3 preceding siblings ...)
  2023-05-03 11:50 ` [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Valentin Schneider
@ 2023-12-30 22:39 ` Dennis Zhou
  4 siblings, 0 replies; 10+ messages in thread
From: Dennis Zhou @ 2023-12-30 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Valentin Schneider, Tejun Heo,
	Christoph Lameter, Dave Chinner, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ye Bin, linux-mm

Hello,

On Fri, Apr 14, 2023 at 06:30:42PM +0200, Thomas Gleixner wrote:
> Hi!
> 
> The cpu_dying_mask is not only undocumented but also to some extent a
> misnomer. Its purpose is to capture the last direction of a cpu_up() or
> cpu_down() operation taking eventual rollback operations into account.
> 
> cpu_dying_mask is not really useful for general consumption. The
> cpu_dying_mask bits are sticky even after cpu_up() or cpu_down() completes.
> 
> A recent fix to plug a race in the per CPU counter code picked
> cpu_dying_mask to cure it. Unfortunately this does not work as the author
> probably expected and the behaviour of cpu_dying_mask is not easy to change
> without breaking the only other and initial user, the scheduler.
> 
> This series addresses this by:
> 
>    1) Reworking the per CPU counter hotplug mechanism so the race is fully
>       plugged without using cpu_dying_mask
> 
>    2) Replacing the cpu_dying_mask logic with hotplug core internal state
>       which is exposed to the scheduler with a properly documented
>       function.
> 
> The series is also available from git:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git smp/dying_mask
> 
> Thanks
> 
> 	tglx
> ---
>  include/linux/cpuhotplug.h |    2 -
>  include/linux/cpumask.h    |   21 ----------------
>  kernel/cpu.c               |   45 +++++++++++++++++++++++++++++------
>  kernel/sched/core.c        |    4 +--
>  kernel/smpboot.h           |    2 +
>  lib/percpu_counter.c       |   57 +++++++++++++++++++--------------------------
>  6 files changed, 67 insertions(+), 64 deletions(-)

This has been on my mind and regretfully it's been a busy year for me.

I know the merge window is around the corner, but I rebased this series
onto percpu#for-6.8 [1]. I had to massage percpu_counter slightly due
to some changes, but other than that it is largely intact. I need to do a
more thorough pass and resend it, but I think it is still correct to
merge. I can then pull it, give it a few days to soak in for-next, and
then send it to Linus either in a follow-up PR or in the 2nd week of the
merge window.

Thomas, how does this sound to you?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git/log/?h=percpu-hotplug

Thanks,
Dennis


* [PATCH 0/3 v2] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes
@ 2024-01-12 23:36 Dennis Zhou
  2024-01-12 23:36 ` [PATCH 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask Dennis Zhou
  0 siblings, 1 reply; 10+ messages in thread
From: Dennis Zhou @ 2024-01-12 23:36 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter, Thomas Gleixner
  Cc: Peter Zijlstra, Valentin Schneider, Dave Chinner, Yury Norov,
	Andy Shevchenko, Rasmus Villemoes, Ye Bin, linux-mm,
	linux-kernel, Dennis Zhou

Hi everyone,

This is a respin of Thomas' series [1] against v6.7-rc4. It is largely
the same, apart from a slight change in percpu_counter.c for batch
percpu_counters and an update to __percpu_counter_limited_add().

I don't think we reached an alternative resolution here, so I can queue
this up and give it some soak time in for-next.

[1] https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/

Thanks,
Dennis

Dennis Zhou (2):
  lib/percpu_counter: Fix CPU hotplug handling
  cpu/hotplug: Get rid of cpu_dying_mask

Thomas Gleixner (1):
  cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask

 include/linux/cpuhotplug.h |  2 +-
 include/linux/cpumask.h    | 21 ------------
 kernel/cpu.c               | 45 +++++++++++++++++++++-----
 kernel/sched/core.c        |  4 +--
 kernel/smpboot.h           |  2 ++
 lib/percpu_counter.c       | 65 ++++++++++++++++----------------------
 6 files changed, 70 insertions(+), 69 deletions(-)

-- 
2.39.1



* [PATCH 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask
  2024-01-12 23:36 [PATCH 0/3 v2] " Dennis Zhou
@ 2024-01-12 23:36 ` Dennis Zhou
  0 siblings, 0 replies; 10+ messages in thread
From: Dennis Zhou @ 2024-01-12 23:36 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter, Thomas Gleixner
  Cc: Peter Zijlstra, Valentin Schneider, Dave Chinner, Yury Norov,
	Andy Shevchenko, Rasmus Villemoes, Ye Bin, linux-mm,
	linux-kernel, Dennis Zhou

From: Thomas Gleixner <tglx@linutronix.de>

There are no module users, and no module should ever care.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
[Dennis: applied cleanly]
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 kernel/cpu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a86972a91991..c4929e9cd9be 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3126,10 +3126,8 @@ struct cpumask __cpu_present_mask __read_mostly;
 EXPORT_SYMBOL(__cpu_present_mask);
 
 struct cpumask __cpu_active_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_active_mask);
 
 struct cpumask __cpu_dying_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_dying_mask);
 
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);
-- 
2.39.1



Thread overview: 10+ messages
2023-04-14 16:30 [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Thomas Gleixner
2023-04-14 16:30 ` [patch 1/3] lib/percpu_counter: Fix CPU hotplug handling Thomas Gleixner
2023-04-15  5:20   ` Dennis Zhou
2023-04-17  2:09   ` Dave Chinner
2023-04-17  8:09     ` Thomas Gleixner
2023-04-14 16:30 ` [patch 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask Thomas Gleixner
2023-04-14 16:30 ` [patch 3/3] cpu/hotplug: Get rid of cpu_dying_mask Thomas Gleixner
2023-05-03 11:50 ` [patch 0/3] lib/percpu_counter, cpu/hotplug: Cure the cpu_dying_mask woes Valentin Schneider
2023-12-30 22:39 ` Dennis Zhou
2024-01-12 23:36 [PATCH 0/3 v2] " Dennis Zhou
2024-01-12 23:36 ` [PATCH 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask Dennis Zhou
