Hi Jiebin,

On 9/13/22 21:25, Jiebin Sun wrote:
>
> +/*
> + * With percpu_counter_add_local() and percpu_counter_sub_local(), counts
> + * are accumulated in local per cpu counter and not in fbc->count until
> + * local count overflows PERCPU_COUNTER_LOCAL_BATCH. This makes counter
> + * write efficient.
> + * But percpu_counter_sum(), instead of percpu_counter_read(), needs to be
> + * used to add up the counts from each CPU to account for all the local
> + * counts. So percpu_counter_add_local() and percpu_counter_sub_local()
> + * should be used when a counter is updated frequently and read rarely.
> + */
> +static inline void
> +percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
> +{
> +	percpu_counter_add_batch(fbc, amount, PERCPU_COUNTER_LOCAL_BATCH);
> +}
> +

Unrelated to your patch, and not relevant for ipc/msg as the functions are
not called from interrupts, but: aren't there races with interrupts?

> *
> * This function is both preempt and irq safe. The former is due to explicit
> * preemption disable. The latter is guaranteed by the fact that the slow path
> * is explicitly protected by an irq-safe spinlock whereas the fast patch uses
> * this_cpu_add which is irq-safe by definition. Hence there is no need muck
> * with irq state before calling this one
> */
> void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> {
>	s64 count;
>
>	preempt_disable();
>	count = __this_cpu_read(*fbc->counters) + amount;
>	if (abs(count) >= batch) {
>		unsigned long flags;
>
>		raw_spin_lock_irqsave(&fbc->lock, flags);
>		fbc->count += count;
>		__this_cpu_sub(*fbc->counters, count - amount);
>		raw_spin_unlock_irqrestore(&fbc->lock, flags);
>	} else {
>		this_cpu_add(*fbc->counters, amount);
>	}
>	preempt_enable();
> }
> EXPORT_SYMBOL(percpu_counter_add_batch);

Race 1:

start: __this_cpu_read(*fbc->counters) = INT_MAX-1.

Call: percpu_counter_add_batch(fbc, 1, INT_MAX);

Result:
	count = INT_MAX;
	if (abs(count) >= batch) {	// branch taken

Before the raw_spin_lock_irqsave(): interrupt.

Within the interrupt:
	percpu_counter_add_batch(fbc, -2*(INT_MAX-1), INT_MAX)
	count = -(INT_MAX-1);
	branch not taken
	this_cpu_add() updates *fbc->counters, new value is -(INT_MAX-1)
	exit interrupt

raw_spin_lock_irqsave()
__this_cpu_sub(*fbc->counters, count - amount) will subtract INT_MAX-1 from
*fbc->counters. But the value is already -(INT_MAX-1) -> underflow.

Race 2 (much simpler):

start: __this_cpu_read(*fbc->counters) = 0.

Call: percpu_counter_add_batch(fbc, INT_MAX-1, INT_MAX);

amount = INT_MAX-1;
- branch not taken.

Before the this_cpu_add(): interrupt.

Within the interrupt:
	call percpu_counter_add_batch(fbc, INT_MAX-1, INT_MAX)
	new value of *fbc->counters: INT_MAX-1.
	exit interrupt

Outside the interrupt:
	this_cpu_add(*fbc->counters, amount);	<<< overflow.

Attached is an incomplete patch (untested). If needed, I could check the
whole file and add/move the required local_irq_save() calls.

--
    Manfred
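
[For illustration only; this is not the patch that was attached to the mail.]
A minimal, untested sketch of the kind of change described above: disable
interrupts across the whole read-modify-write of the per-cpu counter instead
of only disabling preemption. local_irq_save() also excludes preemption, so
the preempt_disable()/preempt_enable() pair can go, and with interrupts
already off the slow path can take a plain raw_spin_lock().

/*
 * Illustrative sketch of percpu_counter_add_batch() from lib/percpu_counter.c
 * with the two races closed by blocking interrupts for the whole update.
 */
void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count;
	unsigned long flags;

	/*
	 * Interrupts off (which also implies no preemption) makes the
	 * read-modify-write of *fbc->counters atomic with respect to both
	 * other tasks and interrupt handlers on this CPU.
	 */
	local_irq_save(flags);
	count = __this_cpu_read(*fbc->counters) + amount;
	if (abs(count) >= batch) {
		/* interrupts are already disabled, no irqsave variant needed */
		raw_spin_lock(&fbc->lock);
		fbc->count += count;
		__this_cpu_sub(*fbc->counters, count - amount);
		raw_spin_unlock(&fbc->lock);
	} else {
		this_cpu_add(*fbc->counters, amount);
	}
	local_irq_restore(flags);
}

Whether the cost of disabling interrupts on the fast path is acceptable would
have to be measured; the sketch only shows where the irq protection has to
sit in order to close the two races.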