From: Linus Torvalds <torvalds@linuxfoundation.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>,
	David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Yu Zhao <yuzhao@google.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	 Gorbunov Ivan <gorbunov.ivan@h-partners.com>,
	Muchun Song <muchun.song@linux.dev>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	Kiryl Shutsemau <kirill@shutemov.name>,
	 Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 0/1] mm: improve folio refcount scalability
Date: Sat, 28 Feb 2026 19:27:28 -0800	[thread overview]
Message-ID: <CAHk-=wgfs7KqS8pD8F9F9yC8jwSgQtmifytbmmXVfz9xXrQzuw@mail.gmail.com> (raw)
In-Reply-To: <20260228141941.f6fec687aae9d80a161387f4@linux-foundation.org>


On Sat, 28 Feb 2026 at 14:19, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Well it's nice to see the performance benefits from Kiryl's ill-fated
> patch
> (https://lore.kernel.org/linux-mm/20251017141536.577466-1-kirill@shutemov.name/)
>
> And this approach looks far simpler.

This attempt does something completely different, in that it doesn't
actually remove any atomics at all.

Quite the opposite, in fact. It adds *new* atomics - just in a different place.

But if it helps performance, that is certainly still interesting.

It's basically saying that it's not the atomic op itself that is so
expensive, it's literally just the "read + cmpxchg" in
atomic_add_unless() that makes for most of the expense.

And that, in turn, is probably due to the fact that the read in that loop
first brings the cacheline in shared, and then the cmpxchg has to
turn the shared cacheline exclusive, so you have two cache ops - and
possibly many more because of the cacheline bouncing all this causes.

Fine, I can believe that.
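
[Editor's note: the conventional pattern being criticized above can be
sketched in portable C11 atomics. This is a hypothetical userspace
model of the read-then-cmpxchg shape of atomic_add_unless(), not the
kernel's actual implementation; the function name is made up.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the conventional "read + cmpxchg" pattern: the initial
 * plain load pulls the cacheline in shared state, and the following
 * compare-exchange must then upgrade it to exclusive - two coherence
 * transactions instead of one.
 *
 * Adds 'nr' to *v unless *v == u; returns true if the add happened. */
static bool add_unless_read_first(atomic_int *v, int nr, int u)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);
	do {
		if (old == u)
			return false;
		/* on failure, 'old' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(v, &old, old + nr));
	return true;
}
```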

But if it's purely about the cacheline shared/exclusive behavior, I
think there's a much simpler patch.

That much simpler patch is something we've done before: do *not*
read the old value before the cmpxchg loop. Do the cmpxchg with a
default value, and if we guessed wrong, just do the extra loop
iteration.
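
[Editor's note: the guess-first variant described above can likewise be
sketched in C11 atomics. Again a hypothetical userspace model, not the
attached kernel patch itself; the function name is made up.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the guess-first variant: skip the initial read and start
 * the cmpxchg loop from an assumed old value of 1 (the common case for
 * a page refcount).  If the guess is wrong, compare-exchange writes the
 * real value back into 'old' and we retry - but by then the cacheline
 * is already held exclusive, so the retry is cheap.
 *
 * Adds 'nr' to *v unless *v == u; assumes the guess (1) differs from u. */
static bool add_unless_guess_first(atomic_int *v, int nr, int u)
{
	int old = 1;	/* assumed, not read */
	for (;;) {
		if (atomic_compare_exchange_weak(v, &old, old + nr))
			return true;
		if (old == u)
			return false;
	}
}
```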

The attached patch is ENTIRELY UNTESTED. I might have gotten
something wrong. A quick look at the assembly suggests it generates
the expected code (gcc is not great at this), with the loop
being

        mov    $0x1,%eax
        lea    0x34(%rdi),%rdx
        lea    0x1(%rax),%ecx
        lock cmpxchg %ecx,(%rdx)
        ...

ie the first time through we just assume the count is one.

And yes, that assumption may be wrong, but at least we don't go
through the shared state, and since we got the cacheline for exclusive
the first time around the loop, the second time around we will get it
right.

What do the numbers look like with this much simpler patch? (All
assuming I didn't screw some logic up and get some conditional the
wrong way around - please check me).

                        Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 855 bytes --]

 include/linux/page_ref.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 544150d1d5fd..ed3f262aa7f1 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,8 +234,18 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 
 	rcu_read_lock();
 	/* avoid writing to the vmemmap area being remapped */
-	if (page_count_writable(page, u))
-		ret = atomic_add_unless(&page->_refcount, nr, u);
+	if (page_count_writable(page, u)) {
+		/* Assume count == 1, don't read it! */
+		int old = 1;
+		for (;;) {
+			if (atomic_try_cmpxchg(&page->_refcount, &old, old + nr)) {
+				ret = true;
+				break;
+			}
+			if (unlikely(old == u))
+				break;
+		}
+	}
 	rcu_read_unlock();
 
 	if (page_ref_tracepoint_active(page_ref_mod_unless))

Thread overview: 4+ messages
2026-02-26 16:27 Gladyshev Ilya
2026-02-26 16:27 ` [PATCH 1/1] mm: implement page refcount locking via dedicated bit Gladyshev Ilya
2026-02-28 22:19 ` [PATCH 0/1] mm: improve folio refcount scalability Andrew Morton
2026-03-01  3:27   ` Linus Torvalds [this message]
