From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	Harry Yoo <harry.yoo@oracle.com>,
	Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	Gorbunov Ivan <gorbunov.ivan@h-partners.com>,
	Muchun Song <muchun.song@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Kiryl Shutsemau <kirill@shutemov.name>,
	Linus Torvalds <torvalds@linuxfoundation.org>
Subject: Re: [PATCH 1/1] mm: implement page refcount locking via dedicated bit
Date: Thu, 5 Mar 2026 09:10:16 +0100
Message-ID: <a3361902-75bf-4e9e-a8c5-1959f9e72915@kernel.org>
In-Reply-To: <f3c411e1-062e-4494-b7e9-8056f346effb@kernel.org>

>>  	if (page_ref_tracepoint_active(page_ref_mod_and_test))
>>  		__page_ref_mod_and_test(page, -nr, ret);
>>  	return ret;
>> @@ -204,6 +212,9 @@ static inline int page_ref_dec_and_test(struct page *page)
>>  {
>>  	int ret = atomic_dec_and_test(&page->_refcount);
>>  
>> +	if (ret)
>> +		ret = !atomic_cmpxchg_relaxed(&page->_refcount, 0, PAGEREF_LOCKED_BIT);
>> +
>>  	if (page_ref_tracepoint_active(page_ref_mod_and_test))
>>  		__page_ref_mod_and_test(page, -1, ret);
>>  	return ret;
>> @@ -228,14 +239,23 @@ static inline int folio_ref_dec_return(struct folio *folio)
>>  	return page_ref_dec_return(&folio->page);
>>  }
>>  
>> +#define _PAGEREF_LOCKED_LIMIT	((1 << 30) | PAGEREF_LOCKED_BIT)
>> +
>>  static inline bool page_ref_add_unless_zero(struct page *page, int nr)
>>  {
>>  	bool ret = false;
>> +	int val;
>>  
>>  	rcu_read_lock();
>>  	/* avoid writing to the vmemmap area being remapped */
>> -	if (page_count_writable(page))
>> -		ret = atomic_add_unless(&page->_refcount, nr, 0);
>> +	if (page_count_writable(page)) {
>> +		val = atomic_add_return(nr, &page->_refcount);
>> +		ret = !(val & PAGEREF_LOCKED_BIT);
>> +
>> +		/* Undo atomic_add() if counter is locked and scary big */
>> +		while (unlikely((unsigned int)val >= _PAGEREF_LOCKED_LIMIT))
>> +			val = atomic_cmpxchg_relaxed(&page->_refcount, val, PAGEREF_LOCKED_BIT);

It's still early here, but I think there is a problem.

Please bear with me :)

	val = atomic_add_return(nr, &page->_refcount);
	ret = !(val & PAGEREF_LOCKED_BIT);

Implies that we can grab a reference whenever the locked bit is not set.

Including when the refcount is 0.
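
To make the difference concrete, here is a minimal userspace sketch using C11 atomics instead of the kernel's atomic_t API. The names model_get_old() and model_get_new() are made up for illustration, and PAGEREF_LOCKED_BIT is assumed to be bit 31, since the quoted hunk does not show its value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the locked bit is bit 31; the quoted hunk does not show it. */
#define PAGEREF_LOCKED_BIT (1u << 31)

/* Models the old atomic_add_unless(&page->_refcount, nr, 0). */
static bool model_get_old(atomic_uint *ref, unsigned int nr)
{
	unsigned int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;	/* never resurrect a zero refcount */
	} while (!atomic_compare_exchange_weak(ref, &old, old + nr));

	return true;
}

/* Models the patched path: add unconditionally, then test the locked bit. */
static bool model_get_new(atomic_uint *ref, unsigned int nr)
{
	unsigned int val = atomic_fetch_add(ref, nr) + nr;

	return !(val & PAGEREF_LOCKED_BIT);
}
```

With a counter that is plain 0, model_get_old() fails and leaves the counter untouched, while model_get_new() reports success and leaves the counter at 1. Only a counter that already carries PAGEREF_LOCKED_BIT makes model_get_new() fail.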

Now, that works fine when racing with concurrent freeing, where we were
just able to decrement the refcount to 0, but have yet to set
PAGEREF_LOCKED_BIT.

But, what about any pages that don't have the PAGEREF_LOCKED_BIT set,
but have the refcount at 0 permanently?

That's, for example, the case for any pages where we do an explicit
set_page_count(page, 0);

For example, all pages we add to the page allocator through
__free_pages_core().

That means that someone could easily grab a reference to such pages,
including tail pages of allocated compound pages where the refcount is
still 0 -- or pages allocated with a frozen refcount where we don't ever
do the set_page_refcounted() in the buddy.

Bad things will happen when that reference, wrongly obtained through
page_ref_add_unless_zero(), is dropped again and the page gets freed.
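
A rough userspace model of that sequence, again with C11 atomics rather than kernel code. All model_*() names are hypothetical, and PAGEREF_LOCKED_BIT is assumed to be bit 31 because the quoted hunk does not define it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the locked bit is bit 31; the quoted hunk does not show it. */
#define PAGEREF_LOCKED_BIT (1u << 31)

/* Models the patched page_ref_add_unless_zero() with nr == 1. */
static bool model_get(atomic_uint *ref)
{
	unsigned int val = atomic_fetch_add(ref, 1) + 1;

	return !(val & PAGEREF_LOCKED_BIT);
}

/* Models the patched page_ref_dec_and_test(): lock the counter at 0. */
static bool model_put_and_test(atomic_uint *ref)
{
	unsigned int expected = 0;

	if (atomic_fetch_sub(ref, 1) != 1)
		return false;	/* other references remain */
	return atomic_compare_exchange_strong(ref, &expected,
					      PAGEREF_LOCKED_BIT);
}

/*
 * A page whose refcount is plain 0 (for example after
 * set_page_count(page, 0)) and that is not being freed by anyone.
 * Returns true if a speculative get/put pair ends up "freeing" it.
 */
static bool model_bogus_free(void)
{
	atomic_uint ref;

	atomic_init(&ref, 0);
	if (!model_get(&ref))
		return false;
	return model_put_and_test(&ref);
}
```

In this model, model_bogus_free() returns true: the get succeeds because no locked bit is set, and the matching put then looks like the final reference drop, so the caller would go on to free a page it never owned.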


You'd have to make sure that there is no way we can achieve refcount ==
0 without going through page_ref_dec_and_test(), when actually freeing a
page.

One piece of the puzzle is handling set_page_count(p, 0) I think. But I
suspect that there might be other places where we don't even have the
set_page_count().

See vmemmap_get_tail() in
https://lore.kernel.org/r/20260227194302.274384-13-kas@kernel.org for
example, where we know the refcount is 0, because we allocated the page
holding memmap with __GFP_ZERO.

For example, I think you'd have to make sure that *any* pages in the
buddy have their refcount set to PAGEREF_LOCKED_BIT, not 0.
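
As a sketch of what that would look like, assuming PAGEREF_LOCKED_BIT is bit 31 (the quoted hunk does not show it) and using a made-up helper model_mark_unreferenced() as a stand-in for whatever would replace set_page_count(page, 0):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the locked bit is bit 31; the quoted hunk does not show it. */
#define PAGEREF_LOCKED_BIT (1u << 31)

/* Hypothetical: store the locked bit where the counter used to be set to 0. */
static void model_mark_unreferenced(atomic_uint *ref)
{
	atomic_store(ref, PAGEREF_LOCKED_BIT);
}

/* Models the patched page_ref_add_unless_zero() with nr == 1. */
static bool model_get(atomic_uint *ref)
{
	unsigned int val = atomic_fetch_add(ref, 1) + 1;

	return !(val & PAGEREF_LOCKED_BIT);
}
```

Here model_get() fails on a counter prepared with model_mark_unreferenced(). Note that the failed attempt leaves the counter at PAGEREF_LOCKED_BIT + 1 in this model; the quoted hunk only undoes the add once the value reaches _PAGEREF_LOCKED_LIMIT. Memory that is merely zero-filled still reads as plain 0 and would need the same treatment.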

So unless I am missing something, this is broken and requires a lot of
care to make sure that refcount==0 is handled everywhere accordingly.

-- 
Cheers,

David

