On Sat, 28 Feb 2026 at 14:19, Andrew Morton wrote:
>
> Well it's nice to see the performance benefits from Kiryl's ill-fated
> patch
> (https://lore.kernel.org/linux-mm/20251017141536.577466-1-kirill@shutemov.name/)
>
> And this approach looks far simpler.

This attempt does something completely different, in that it doesn't
actually remove any atomics at all.

Quite the opposite, in fact. It adds *new* atomics - just in a different
place.

But if it helps performance, that is certainly still interesting. It's
basically saying that it's not the atomic op itself that is so expensive,
it's literally just the "read + cmpxchg" in atomic_add_unless() that
accounts for most of the expense.

And that, in turn, is probably due to the fact that the read in that loop
first brings the cacheline in shared, and then the cmpxchg has to turn the
shared cacheline exclusive, so you have two cache operations - and possibly
then many more because of the bouncing this all causes.

Fine, I can believe that.

But if it's purely about the cacheline shared/exclusive behavior, I think
there's a much simpler patch.

That much simpler patch is something we've done before: do *not* read the
old value before the cmpxchg loop. Do the cmpxchg with a default value,
and if we guessed wrong, just do the extra loop iteration.

The attached patch is ENTIRELY UNTESTED. I might have gotten something
wrong. A quick look at the assembler seems to say it generates the
expected code (gcc is not great at this), with the loop being

        mov    $0x1,%eax
        lea    0x34(%rdi),%rdx
        lea    0x1(%rax),%ecx
        lock cmpxchg %ecx,(%rdx)
        ...

ie the first time through we just assume the count is one. And yes, that
assumption may be wrong, but at least we don't go through the shared
state, and since we got the cacheline exclusive the first time around the
loop, the second time around we will get it right.

What do the numbers look like with this much simpler patch?
(All assuming I didn't screw some logic up and get some conditional the
wrong way around - please check me).

               Linus