From: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
Harry Yoo <harry.yoo@oracle.com>,
Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Alistair Popple <apopple@nvidia.com>,
Gorbunov Ivan <gorbunov.ivan@h-partners.com>,
Muchun Song <muchun.song@linux.dev>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>,
Kiryl Shutsemau <kirill@shutemov.name>,
Linus Torvalds <torvalds@linuxfoundation.org>
Subject: Re: [PATCH 1/1] mm: implement page refcount locking via dedicated bit
Date: Fri, 6 Mar 2026 17:29:15 +0300 [thread overview]
Message-ID: <75aa8f34-4d91-4758-847d-248a5be3db62@h-partners.com> (raw)
In-Reply-To: <54cc6e6d-bda7-4c5b-a998-df854ead1985@kernel.org>
>>> For example, all pages we add to the page allocator through
>>> __free_pages_core().
>>
>> You are right that refcount = 0 is tricky. However, for a bad outcome
>> you will need:
>> 1. Some external reference to this page, through which you try to
>> increment the refcount;
>> 2. set_page_count(0) somewhere between freeing and "it is safe to alloc"
>> state.
>
> That is way, way, way too dodgy.
>
> What you should likely do is
>
> (a) Make set_page_count(0) set it to PAGEREF_LOCKED_BIT
> (b) Make any places that might skip set_page_count(0) to use it
> (c) Document this all extremely thoroughly.
Okay, I agree. My reasoning was more about "it's not _that_ bad if we
miss something outside the allocator" than "everything is fine as it
is", but maybe it is really that bad...
> Alternatively, disallow set_page_count(0) ccompletely and add something
> like "initialize_page_as_frozen()" or sth. like that.
(Below are my thoughts, not active proposal)
Actually, we can keep "zero is locked" property if we change scheme a
little:
If we invert locked bit so that 0 means locked, then we don't care about
memset() / GFP_ZERO / set_page(0). But then there is extra bit on each
active refcount, and (un)masking will be required on each refcount
read/set/etc operation.
We can place locked bit in the highest bit like now, or move it to the
lowest. This doesn't make any difference except for how masking is
performed (and or shift). I can't really tell which approach is faster
(and more optimizable after inline).
This approach doesn't look appealing to me, mostly because of redundant
masking. So for now I will try to add proper initialization /
documentation in all risky places.
>> So adding new pages with zeroed refcount to allocator is safe, because
>> there are no external references. Zeroing tail page's refcount is safe,
>> unless someone actually tries to increment its refcount (and this is bug).
>>
>> Generally, the only unsafe set_page_count() (or any other zeroing) will
>> be in allocator itself between freeing and allocating. Or maybe I missed
>> something, and this approach is indeed incorrect
>>
>> Probably we can think of some debug checks to prevent bugs in "safe"
>> scenarious
>
> Just take a look at do_migrate_range(), where we do a
>
> page = pfn_to_page(pfn);
> folio = page_folio(page);
>
> if (!folio_try_get(folio))
> continue;
>
> If that is a page that was added to the buddy (set_page_count(0)) but
> either (a) not allocated yet or (b) allocated as frozen that's a problem.
>
> We really rely on
>
> (1) Frozen pages
> (2) Buddy pages
> (3) Tail pages (compound or non-compoud)
>
> To have a refcount of 0 that *reliably* makes folio_try_get() etc. fail.
>
> Anything else is just playing with fire.
>
next prev parent reply other threads:[~2026-03-06 14:29 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-26 16:27 [PATCH 0/1] mm: improve folio refcount scalability Gladyshev Ilya
2026-02-26 16:27 ` [PATCH 1/1] mm: implement page refcount locking via dedicated bit Gladyshev Ilya
2026-03-04 19:16 ` David Hildenbrand (Arm)
2026-03-05 8:10 ` David Hildenbrand (Arm)
2026-03-06 11:50 ` Gladyshev Ilya
2026-03-06 13:10 ` David Hildenbrand (Arm)
2026-03-06 14:29 ` Gladyshev Ilya [this message]
2026-02-28 22:19 ` [PATCH 0/1] mm: improve folio refcount scalability Andrew Morton
2026-03-01 3:27 ` Linus Torvalds
2026-03-01 18:52 ` Linus Torvalds
2026-03-01 20:26 ` Pedro Falcato
2026-03-01 21:16 ` Linus Torvalds
2026-03-04 17:34 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=75aa8f34-4d91-4758-847d-248a5be3db62@h-partners.com \
--to=gladyshev.ilya1@h-partners.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@kernel.org \
--cc=gorbunov.ivan@h-partners.com \
--cc=harry.yoo@oracle.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=torvalds@linuxfoundation.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox