linux-mm.kvack.org archive mirror
From: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
To: <gorbunov.ivan@h-partners.com>
Cc: <david@kernel.org>, <Liam.Howlett@oracle.com>,
	<akpm@linux-foundation.org>, <apopple@nvidia.com>,
	<baolin.wang@linux.alibaba.com>, <gladyshev.ilya1@h-partners.com>,
	<harry.yoo@oracle.com>, <kirill@shutemov.name>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<lorenzo.stoakes@oracle.com>, <mhocko@suse.com>,
	<muchun.song@linux.dev>, <rppt@kernel.org>, <surenb@google.com>,
	<torvalds@linuxfoundation.org>, <vbabka@suse.cz>,
	<willy@infradead.org>, <yuzhao@google.com>, <ziy@nvidia.com>,
	<artem.kuzin@huawei.com>
Subject: [PATCH v2 1/2] mm: drop page refcount zero state semantics
Date: Mon, 20 Apr 2026 08:01:18 +0000	[thread overview]
Message-ID: <9fd8ebbc0f4f45be611bae0d03dd25dd994233c0.1776350895.git.gorbunov.ivan@h-partners.com> (raw)
In-Reply-To: <cover.1776350895.git.gorbunov.ivan@h-partners.com>

Right now the 'zero' refcount state can be interpreted in two ways:
1) An unfrozen page which currently has no explicit owner
2) A frozen page

These states can only be 'logically' distinguished by operations such as
page_ref_add(), page_ref_inc(), etc. In the first state we want the
counter to increase.

For example, one can write

page = alloc_frozen_page(...);
page_ref_inc(page);

but in the second state increasing the counter of a frozen page should
not be valid at all.

Another reason for the change is the next patch in this series
(mm: implement page refcount locking via dedicated bit), in which
frozen pages no longer store 0 in their refcount.

This patch makes two changes:
1) Drop the invariant that the value stored in the reference count of a
   frozen page is 0 (the getters folio_ref_count()/page_ref_count() must
   still return 0 for frozen pages)
2) Allow modification operations such as page_ref_add() to be used only
   on pages with owners

We have audited the places where pages are allocated, and they are
always initialized via functions such as set_page_count(page, 1).
Nevertheless, for extra safety we added debug VM_BUG_ON() checks inside
the modification functions to ensure that they are called only on pages
with owners. In the future those checks could be strengthened by
replacing the operations with variants that return a result, if needed.

Co-developed-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
---
 drivers/pci/p2pdma.c               |  2 +-
 include/linux/page_ref.h           | 17 +++++++++++++++++
 kernel/liveupdate/kexec_handover.c |  2 +-
 mm/hugetlb.c                       |  2 +-
 mm/mm_init.c                       |  6 +++---
 mm/page_alloc.c                    |  4 ++--
 6 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index e0f546166eb8..e060ae7e1644 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -158,7 +158,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
 			 * because we don't want to trigger the
 			 * p2pdma_folio_free() path.
 			 */
-			set_page_count(page, 0);
+			set_page_count_as_frozen(page);
 			percpu_ref_put(ref);
 			return ret;
 		}
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 94d3f0e71c06..a7a07b61d2ae 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -62,6 +62,11 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
 
 #endif
 
+static inline bool __page_count_is_frozen(int count)
+{
+	return count == 0;
+}
+
 static inline int page_ref_count(const struct page *page)
 {
 	return atomic_read(&page->_refcount);
@@ -115,8 +120,14 @@ static inline void init_page_count(struct page *page)
 	set_page_count(page, 1);
 }
 
+static inline void set_page_count_as_frozen(struct page *page)
+{
+	set_page_count(page, 0);
+}
+
 static inline void page_ref_add(struct page *page, int nr)
 {
+	VM_BUG_ON(__page_count_is_frozen(page_count(page)));
 	atomic_add(nr, &page->_refcount);
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, nr);
@@ -129,6 +140,7 @@ static inline void folio_ref_add(struct folio *folio, int nr)
 
 static inline void page_ref_sub(struct page *page, int nr)
 {
+	VM_BUG_ON(__page_count_is_frozen(page_count(page)));
 	atomic_sub(nr, &page->_refcount);
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, -nr);
@@ -142,6 +154,7 @@ static inline void folio_ref_sub(struct folio *folio, int nr)
 static inline int folio_ref_sub_return(struct folio *folio, int nr)
 {
 	int ret = atomic_sub_return(nr, &folio->_refcount);
+	VM_BUG_ON(__page_count_is_frozen(ret + nr));
 
 	if (page_ref_tracepoint_active(page_ref_mod_and_return))
 		__page_ref_mod_and_return(&folio->page, -nr, ret);
@@ -150,6 +163,7 @@ static inline int folio_ref_sub_return(struct folio *folio, int nr)
 
 static inline void page_ref_inc(struct page *page)
 {
+	VM_BUG_ON(__page_count_is_frozen(page_count(page)));
 	atomic_inc(&page->_refcount);
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, 1);
@@ -162,6 +176,7 @@ static inline void folio_ref_inc(struct folio *folio)
 
 static inline void page_ref_dec(struct page *page)
 {
+	VM_BUG_ON(__page_count_is_frozen(page_count(page)));
 	atomic_dec(&page->_refcount);
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, -1);
@@ -189,6 +204,7 @@ static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
 static inline int page_ref_inc_return(struct page *page)
 {
 	int ret = atomic_inc_return(&page->_refcount);
+	VM_BUG_ON(__page_count_is_frozen(ret - 1));
 
 	if (page_ref_tracepoint_active(page_ref_mod_and_return))
 		__page_ref_mod_and_return(page, 1, ret);
@@ -217,6 +233,7 @@ static inline int folio_ref_dec_and_test(struct folio *folio)
 static inline int page_ref_dec_return(struct page *page)
 {
 	int ret = atomic_dec_return(&page->_refcount);
+	VM_BUG_ON(__page_count_is_frozen(ret + 1));
 
 	if (page_ref_tracepoint_active(page_ref_mod_and_return))
 		__page_ref_mod_and_return(page, -1, ret);
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index b64f36a45296..36c21f3d8250 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -390,7 +390,7 @@ static void kho_init_folio(struct page *page, unsigned int order)
 
 	/* For higher order folios, tail pages get a page count of zero. */
 	for (unsigned long i = 1; i < nr_pages; i++)
-		set_page_count(page + i, 0);
+		set_page_count_as_frozen(page + i);
 
 	if (order > 0)
 		prep_compound_page(page, order);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1d41fa3dd43e..b364fda29111 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3186,7 +3186,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 	for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
 		__init_single_page(page, pfn, zone, nid);
 		prep_compound_tail(page, &folio->page, order);
-		set_page_count(page, 0);
+		set_page_count_as_frozen(page);
 	}
 }
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index cec7bb758bdd..e4ec672a9f51 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1066,7 +1066,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	case MEMORY_DEVICE_PRIVATE:
 	case MEMORY_DEVICE_COHERENT:
 	case MEMORY_DEVICE_PCI_P2PDMA:
-		set_page_count(page, 0);
+		set_page_count_as_frozen(page);
 		break;
 
 	case MEMORY_DEVICE_GENERIC:
@@ -1112,7 +1112,7 @@ static void __ref memmap_init_compound(struct page *head,
 
 		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
 		prep_compound_tail(page, head, order);
-		set_page_count(page, 0);
+		set_page_count_as_frozen(page);
 	}
 	prep_compound_head(head, order);
 }
@@ -2250,7 +2250,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 
 	do {
 		__ClearPageReserved(p);
-		set_page_count(p, 0);
+		set_page_count_as_frozen(p);
 	} while (++p, --i);
 
 	init_pageblock_migratetype(page, MIGRATE_CMA, false);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 65e702fade61..27734cf795da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1639,14 +1639,14 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
 		for (loop = 0; loop < nr_pages; loop++, p++) {
 			VM_WARN_ON_ONCE(PageReserved(p));
 			__ClearPageOffline(p);
-			set_page_count(p, 0);
+			set_page_count_as_frozen(p);
 		}
 
 		adjust_managed_page_count(page, nr_pages);
 	} else {
 		for (loop = 0; loop < nr_pages; loop++, p++) {
 			__ClearPageReserved(p);
-			set_page_count(p, 0);
+			set_page_count_as_frozen(p);
 		}
 
 		/* memblock adjusts totalram_pages() manually. */
-- 
2.43.0




Thread overview: 6+ messages
2026-04-20  8:01 [PATCH v2 0/2] mm: improve folio refcount scalability Gorbunov Ivan
2026-04-20  8:01 ` Gorbunov Ivan [this message]
2026-04-20  8:01 ` [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit Gorbunov Ivan
2026-04-20 10:07 ` [syzbot ci] Re: mm: improve folio refcount scalability syzbot ci
2026-04-20 12:29   ` Gorbunov Ivan
2026-04-20 13:21     ` syzbot ci
