linux-mm.kvack.org archive mirror
* [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings
@ 2026-04-12 18:59 David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 01/13] mm/rmap: remove folio->_nr_pages_mapped David Hildenbrand (Arm)
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

This series is related to my LSF/MM/BPF topic:

	[LSF/MM/BPF TOPIC] Towards removing CONFIG_PAGE_MAPCOUNT [1]

It does the following things:

(a) Gets rid of CONFIG_PAGE_MAPCOUNT, so that rmap-related code no
    longer uses page->_mapcount.

(b) Converts the entire mapcount to a "total mapped pages" counter, which
    can trivially be used to calculate the per-page average mapcount in
    a folio.

(c) Cleans up the code heavily.

(d) Teaches RMAP code to support arbitrary folio mappings: For example,
    supporting PMD-mapping of folios that span multiple PMDs.

Initially, I wanted to use a PMD + PUD mapcount, but once I realized that
we can do the same thing much more easily with a "total mapped pages"
counter, I tried that, and was surprised by how clean it looks.
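As a userspace illustration of that idea (a sketch only, not kernel code;
the round-up rounding is an assumption here, chosen so that any mapped
folio reports at least 1):

```c
#include <assert.h>

/*
 * Sketch: with a single "total mapped pages" counter per folio, the
 * average per-page mapcount is a plain division. Rounding up is an
 * assumption, chosen so that any mapped folio reports at least 1; the
 * kernel's exact rounding may differ.
 */
static int average_page_mapcount(int total_mapped_pages, int nr_pages)
{
	if (!total_mapped_pages)
		return 0;
	return (total_mapped_pages + nr_pages - 1) / nr_pages;
}
```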

More details in the last patch.

Functional Changes
------------------

The kernel now always behaves like CONFIG_NO_PAGE_MAPCOUNT currently
does, in particular:

(1) System/node/memcg stats account large folios as fully mapped as soon
    as a single page is mapped, instead of tracking the precise number of
    pages a partially-mapped folio has mapped. For example, this affects
    "AnonPages:", "Mapped:" and "Shmem:" in /proc/meminfo.

(2) The "mapmax" field of /proc/$PID/numa_maps uses the average page
    mapcount in a folio instead of the effective page mapcount.

(3) Determining the PM_MMAP_EXCLUSIVE flag for /proc/$PID/pagemap is based on
    folio_maybe_mapped_shared() instead of the effective page mapcount.

(4) /proc/kpagecount exposes the average page mapcount in a folio
    instead of the effective page mapcount.

(5) Calculating the Pss for /proc/$PID/smaps and /proc/$PID/smaps_rollup
    uses the average page mapcount in a folio instead of the effective
    page mapcount.

(6) Calculating the Uss for /proc/$PID/smaps and /proc/$PID/smaps_rollup
    uses folio_maybe_mapped_shared() instead of the effective page
    mapcount.

(7) Detecting partially-mapped anonymous folios uses the average
    per-page mapcount. This implies that we cannot detect partial
    mappings of shared anonymous folios in all cases.
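Point (7) boils down to a simple comparison; here is a simplified
userspace sketch of the heuristic (plain ints instead of the kernel's
atomics, and with the biasing of the counters ignored):

```c
#include <stdbool.h>

/*
 * If the folio is mapped, not entirely mapped via a single PMD/PUD
 * entry, and the total mapcount is smaller than the number of pages,
 * then the average per-page mapcount is below 1, so at least one page
 * must be unmapped -> certainly partially mapped.
 */
static bool certainly_partially_mapped(int mapcount, int entire_mapcount,
				       int nr_pages)
{
	return mapcount && !entire_mapcount && mapcount < nr_pages;
}
```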

TODOs
-----

Partially-mapped folios:

If deemed relevant, we could detect more partially-mapped shared
anonymous folios on the memory reclaim path (e.g., during access-bit
harvesting) and flag them accordingly, so they can get deferred-split.
We might also just let the deferred splitting logic perform more such
scanning of possible candidates.

Mapcount overflows:

It may already be possible to overflow a large folio's mapcount
(+refcount). With this series, it may become possible to overflow
"total mapped pages" on 32bit, and I'd like to avoid making it an
unsigned long long on 32bit.

In a distant future, we may want a 64bit mapcount value, but for
the time being (no relevant use cases), we should likely reject new
folio mappings early if there is the possibility of a mapcount +
"total mapped pages" overflow. I assume doing some basic checks
during fork() + file folio mapping should be good enough (e.g., stop
once the counter would turn negative).
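A basic check of that sort could look like the following userspace
sketch (hypothetical, not part of this series):

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical overflow check: before adding nr_new_pages new mappings,
 * reject the operation if the "total mapped pages" counter would wrap
 * past INT_MAX into negative territory.
 */
static bool mapcount_would_overflow(int total_mapped_pages, int nr_new_pages)
{
	return total_mapped_pages > INT_MAX - nr_new_pages;
}
```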

This series saw only very basic testing on 64bit and no performance
fine-tuning yet.

[1] https://lore.kernel.org/all/fe6afcc3-7539-4650-863b-04d971e89cfb@kernel.org/

---
David Hildenbrand (Arm) (13):
      mm/rmap: remove folio->_nr_pages_mapped
      fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for "mapmax"
      fs/proc/page: remove CONFIG_PAGE_MAPCOUNT handling for kpagecount
      fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for PM_MMAP_EXCLUSIVE
      fs/proc/task_mmu: remove mapcount comment in smaps_account()
      fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling in smaps_account()
      mm/rmap: remove CONFIG_PAGE_MAPCOUNT
      mm: re-consolidate folio->_entire_mapcount
      mm: move _large_mapcount to _mapcount in page[1] of a large folio
      mm: re-consolidate folio->_pincount
      mm/rmap: stop using the entire mapcount for hugetlb folios
      mm/rmap: large mapcount interface cleanups
      mm/rmap: support arbitrary folio mappings

 Documentation/admin-guide/cgroup-v1/memory.rst |   6 +-
 Documentation/admin-guide/cgroup-v2.rst        |  13 +-
 Documentation/admin-guide/mm/pagemap.rst       |  30 ++-
 Documentation/filesystems/proc.rst             |  41 ++--
 Documentation/mm/transhuge.rst                 |  29 +--
 fs/proc/internal.h                             |  58 +----
 fs/proc/page.c                                 |  10 +-
 fs/proc/task_mmu.c                             |  69 ++----
 include/linux/mm.h                             |  37 +--
 include/linux/mm_types.h                       |  22 +-
 include/linux/pgtable.h                        |  22 ++
 include/linux/rmap.h                           | 221 ++++++++----------
 mm/Kconfig                                     |  17 --
 mm/debug.c                                     |  10 +-
 mm/internal.h                                  |  30 +--
 mm/memory.c                                    |   3 +-
 mm/page_alloc.c                                |  31 +--
 mm/rmap.c                                      | 302 ++++++++-----------------
 18 files changed, 325 insertions(+), 626 deletions(-)
---
base-commit: 196ab4af58d724f24335fed3da62920c3cea945f
change-id: 20260330-mapcount-32066c687010

Best regards,
-- 
David Hildenbrand (Arm) <david@kernel.org>




* [PATCH RFC 01/13] mm/rmap: remove folio->_nr_pages_mapped
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 02/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for "mapmax" David Hildenbrand (Arm)
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

In preparation for removing CONFIG_PAGE_MAPCOUNT, let's stop updating
a folio's _nr_pages_mapped and remove it.

This will make CONFIG_PAGE_MAPCOUNT behave just like
CONFIG_NO_PAGE_MAPCOUNT, in particular:

(1) We account folios as fully mapped as soon as a single page is
    mapped. That is visible through:

    (1) Memcg stats (e.g., "anon" and "file_mapped" in cgroup v2)

    (2) System stats (e.g., "AnonPages:", "Mapped:" and "Shmem"
        in /proc/meminfo)

    (3) Per-node stats (e.g., "AnonPages:", "Mapped:" and "Shmem")
        in /sys/devices/system/node/nodeX/meminfo

Especially for anonymous memory, that memory consumption remains visible
for partially-mapped folios until the folio is actually split and the
unmapped pages are reclaimed.

(2) We do not detect partially-mapped anonymous folios in all cases

We now detect partial mappings based on the average per-page mapcount in a
folio: if it is < 1, at least one page is not mapped.

In the most common case (exclusive anonymous folios), we always detect
partial mappings this way reliably.

Example scenarios where we will not detect partial mappings:

(A) Allocate a THP and fork a child process. Then, unmap up to half of the
    THP in the parent *and* the child. Once the child quits, we detect
    the partial mapping.

    The folio mapcount will be >= 512 -> Average >= 1.

(B) Allocate a THP and fork 511 child processes. Then, unmap all but one
    page in *all* processes.

    The folio mapcount will be 512 -> Average == 1.
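The arithmetic of both scenarios can be spelled out directly; a sketch
assuming a 512-page (2 MiB) THP and the average-based detection:

```c
#include <stdbool.h>

#define THP_NR_PAGES	512

/*
 * With average-based detection, a partial mapping of a 512-page THP is
 * only noticed while the folio's total mapcount is below 512 (i.e.,
 * the average per-page mapcount is below 1).
 */
static bool partial_mapping_detected(int total_mapcount)
{
	return total_mapcount > 0 && total_mapcount < THP_NR_PAGES;
}
```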

There are two main ideas on how to detect these cases as well, if we
ever get a real indication that this is problematic:

* Let memory reclaim scan candidates (shared anonymous folios) to detect
  partial mappings.

* Add candidate folios to the deferred split queue and let the deferred
  split shrinker detect partial mappings.

More code cleanups are possible, but we'll defer that and focus on the
core change here.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/admin-guide/cgroup-v1/memory.rst |   6 +-
 Documentation/admin-guide/cgroup-v2.rst        |  13 +-
 Documentation/mm/transhuge.rst                 |  23 ++--
 include/linux/mm_types.h                       |   4 +-
 include/linux/rmap.h                           |   4 +-
 mm/debug.c                                     |   3 +-
 mm/internal.h                                  |  24 ----
 mm/page_alloc.c                                |   5 -
 mm/rmap.c                                      | 159 ++++++++-----------------
 9 files changed, 69 insertions(+), 172 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 7db63c002922..ddb5ff5cee15 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -609,9 +609,9 @@ memory.stat file includes following statistics:
 
 	'rss + mapped_file" will give you resident set size of cgroup.
 
-	Note that some kernel configurations might account complete larger
-	allocations (e.g., THP) towards 'rss' and 'mapped_file', even if
-	only some, but not all that memory is mapped.
+	Note that the kernel accounts entire larger allocations (e.g., THP)
+	towards 'rss' and 'mapped_file' if any part of such an allocation
+	is mapped.
 
 	(Note: file and shmem may be shared among other cgroups. In that case,
 	mapped_file is accounted only when the memory cgroup is owner of page
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8ad0b2781317..aa703ec89e29 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1538,10 +1538,9 @@ The following nested keys are defined.
 
 	  anon
 		Amount of memory used in anonymous mappings such as
-		brk(), sbrk(), and mmap(MAP_ANONYMOUS). Note that
-		some kernel configurations might account complete larger
-		allocations (e.g., THP) if only some, but not all the
-		memory of such an allocation is mapped anymore.
+		brk(), sbrk(), and mmap(MAP_ANONYMOUS). Note that the
+		kernel accounts entire larger allocations (e.g., THP) towards
+		"anon" if any part of such an allocation is mapped.
 
 	  file
 		Amount of memory used to cache filesystem data,
@@ -1585,9 +1584,9 @@ The following nested keys are defined.
 
 	  file_mapped
 		Amount of cached filesystem data mapped with mmap(). Note
-		that some kernel configurations might account complete
-		larger allocations (e.g., THP) if only some, but not
-		not all the memory of such an allocation is mapped.
+		that the kernel accounts entire larger allocations
+		(e.g., THP) towards "file_mapped" if any part of such an
+		allocation is mapped.
 
 	  file_dirty
 		Amount of cached filesystem data that was modified but
diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 0e7f8e4cd2e3..f200c1ac19cb 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -122,10 +122,6 @@ pages:
     corresponding mapcount), and the current status ("maybe mapped shared" vs.
     "mapped exclusively").
 
-    With CONFIG_PAGE_MAPCOUNT, we also increment/decrement
-    folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount goes
-    from -1 to 0 or 0 to -1.
-
   - map/unmap of individual pages with PTE entry increment/decrement
     folio->_large_mapcount.
 
@@ -134,9 +130,7 @@ pages:
     "mapped exclusively").
 
     With CONFIG_PAGE_MAPCOUNT, we also increment/decrement
-    page->_mapcount and increment/decrement folio->_nr_pages_mapped when
-    page->_mapcount goes from -1 to 0 or 0 to -1 as this counts the number
-    of pages mapped by PTE.
+    page->_mapcount.
 
 split_huge_page internally has to distribute the refcounts in the head
 page to the tail pages before clearing all PG_head/tail bits from the page
@@ -181,12 +175,9 @@ The function deferred_split_folio() is used to queue a folio for splitting.
 The splitting itself will happen when we get memory pressure via shrinker
 interface.
 
-With CONFIG_PAGE_MAPCOUNT, we reliably detect partial mappings based on
-folio->_nr_pages_mapped.
-
-With CONFIG_NO_PAGE_MAPCOUNT, we detect partial mappings based on the
-average per-page mapcount in a THP: if the average is < 1, an anon THP is
-certainly partially mapped. As long as only a single process maps a THP,
-this detection is reliable. With long-running child processes, there can
-be scenarios where partial mappings can currently not be detected, and
-might need asynchronous detection during memory reclaim in the future.
+We detect partial mappings based on the average per-page mapcount in a THP: if
+the average is < 1, an anon THP is certainly partially mapped. As long as
+only a single process maps a THP, this detection is reliable. With
+long-running child processes, there can be scenarios where partial mappings
+can currently not be detected, and might need asynchronous detection during
+memory reclaim in the future.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c23b82..47b2c3d05f41 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -377,7 +377,7 @@ typedef unsigned short mm_id_t;
  * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
  * @_large_mapcount: Do not use directly, call folio_mapcount().
- * @_nr_pages_mapped: Do not use outside of rmap and debug code.
+ * @_unused_1: Temporary placeholder.
  * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
  * @_nr_pages: Do not use directly, call folio_nr_pages().
  * @_mm_id: Do not use outside of rmap code.
@@ -452,7 +452,7 @@ struct folio {
 				struct {
 	/* public: */
 					atomic_t _large_mapcount;
-					atomic_t _nr_pages_mapped;
+					unsigned int _unused_1;
 #ifdef CONFIG_64BIT
 					atomic_t _entire_mapcount;
 					atomic_t _pincount;
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8dc0871e5f00..e5569f5fdaec 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -291,7 +291,7 @@ static inline void folio_add_large_mapcount(struct folio *folio,
 static inline int folio_add_return_large_mapcount(struct folio *folio,
 		int diff, struct vm_area_struct *vma)
 {
-	BUILD_BUG();
+	return atomic_add_return(diff, &folio->_large_mapcount) + 1;
 }
 
 static inline void folio_sub_large_mapcount(struct folio *folio,
@@ -303,7 +303,7 @@ static inline void folio_sub_large_mapcount(struct folio *folio,
 static inline int folio_sub_return_large_mapcount(struct folio *folio,
 		int diff, struct vm_area_struct *vma)
 {
-	BUILD_BUG();
+	return atomic_sub_return(diff, &folio->_large_mapcount) + 1;
 }
 #endif /* CONFIG_MM_ID */
 
diff --git a/mm/debug.c b/mm/debug.c
index 77fa8fe1d641..bfb41ef17a5e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -86,11 +86,10 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
 		if (folio_has_pincount(folio))
 			pincount = atomic_read(&folio->_pincount);
 
-		pr_warn("head: order:%u mapcount:%d entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
+		pr_warn("head: order:%u mapcount:%d entire_mapcount:%d pincount:%d\n",
 				folio_order(folio),
 				folio_mapcount(folio),
 				folio_entire_mapcount(folio),
-				folio_nr_pages_mapped(folio),
 				pincount);
 	}
 
diff --git a/mm/internal.h b/mm/internal.h
index c693646e5b3f..30e48f39d2de 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -103,34 +103,12 @@ struct pagetable_move_control {
 
 void page_writeback_init(void);
 
-/*
- * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
- * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
- * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE).  Hugetlb currently
- * leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
- */
-#define ENTIRELY_MAPPED		0x800000
-#define FOLIO_PAGES_MAPPED	(ENTIRELY_MAPPED - 1)
-
 /*
  * Flags passed to __show_mem() and show_free_areas() to suppress output in
  * various contexts.
  */
 #define SHOW_MEM_FILTER_NODES		(0x0001u)	/* disallowed nodes */
 
-/*
- * How many individual pages have an elevated _mapcount.  Excludes
- * the folio's entire_mapcount.
- *
- * Don't use this function outside of debugging code.
- */
-static inline int folio_nr_pages_mapped(const struct folio *folio)
-{
-	if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT))
-		return -1;
-	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
-}
-
 /*
  * Retrieve the first entry of a folio based on a provided entry within the
  * folio. We cannot rely on folio->swap as there is no guarantee that it has
@@ -885,8 +863,6 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 
 	folio_set_order(folio, order);
 	atomic_set(&folio->_large_mapcount, -1);
-	if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-		atomic_set(&folio->_nr_pages_mapped, 0);
 	if (IS_ENABLED(CONFIG_MM_ID)) {
 		folio->_mm_ids = 0;
 		folio->_mm_id_mapcount[0] = -1;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b1c5430cad4e..8888f31aca49 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1111,11 +1111,6 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			bad_page(page, "nonzero large_mapcount");
 			goto out;
 		}
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT) &&
-		    unlikely(atomic_read(&folio->_nr_pages_mapped))) {
-			bad_page(page, "nonzero nr_pages_mapped");
-			goto out;
-		}
 		if (IS_ENABLED(CONFIG_MM_ID)) {
 			if (unlikely(folio->_mm_id_mapcount[0] != -1)) {
 				bad_page(page, "nonzero mm mapcount 0");
diff --git a/mm/rmap.c b/mm/rmap.c
index 78b7fb5f367c..df42c38fe387 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1353,9 +1353,8 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
 		struct page *page, int nr_pages, struct vm_area_struct *vma,
 		enum pgtable_level level)
 {
-	atomic_t *mapped = &folio->_nr_pages_mapped;
+	int nr = 0, nr_pmdmapped = 0, mapcount;
 	const int orig_nr_pages = nr_pages;
-	int first = 0, nr = 0, nr_pmdmapped = 0;
 
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
@@ -1366,61 +1365,25 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
 			break;
 		}
 
-		if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) {
-			nr = folio_add_return_large_mapcount(folio, orig_nr_pages, vma);
-			if (nr == orig_nr_pages)
-				/* Was completely unmapped. */
-				nr = folio_large_nr_pages(folio);
-			else
-				nr = 0;
-			break;
+		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
+			do {
+				atomic_inc(&page->_mapcount);
+			} while (page++, --nr_pages > 0);
 		}
 
-		do {
-			first += atomic_inc_and_test(&page->_mapcount);
-		} while (page++, --nr_pages > 0);
-
-		if (first &&
-		    atomic_add_return_relaxed(first, mapped) < ENTIRELY_MAPPED)
-			nr = first;
-
-		folio_add_large_mapcount(folio, orig_nr_pages, vma);
+		mapcount = folio_add_return_large_mapcount(folio, orig_nr_pages, vma);
+		if (mapcount == orig_nr_pages)
+			nr = folio_large_nr_pages(folio);
 		break;
 	case PGTABLE_LEVEL_PMD:
 	case PGTABLE_LEVEL_PUD:
-		first = atomic_inc_and_test(&folio->_entire_mapcount);
-		if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) {
-			if (level == PGTABLE_LEVEL_PMD && first)
-				nr_pmdmapped = folio_large_nr_pages(folio);
-			nr = folio_inc_return_large_mapcount(folio, vma);
-			if (nr == 1)
-				/* Was completely unmapped. */
-				nr = folio_large_nr_pages(folio);
-			else
-				nr = 0;
-			break;
-		}
+		if (atomic_inc_and_test(&folio->_entire_mapcount) &&
+		    level == PGTABLE_LEVEL_PMD)
+			nr_pmdmapped = HPAGE_PMD_NR;
 
-		if (first) {
-			nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped);
-			if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) {
-				nr_pages = folio_large_nr_pages(folio);
-				/*
-				 * We only track PMD mappings of PMD-sized
-				 * folios separately.
-				 */
-				if (level == PGTABLE_LEVEL_PMD)
-					nr_pmdmapped = nr_pages;
-				nr = nr_pages - (nr & FOLIO_PAGES_MAPPED);
-				/* Raced ahead of a remove and another add? */
-				if (unlikely(nr < 0))
-					nr = 0;
-			} else {
-				/* Raced ahead of a remove of ENTIRELY_MAPPED */
-				nr = 0;
-			}
-		}
-		folio_inc_large_mapcount(folio, vma);
+		mapcount = folio_inc_return_large_mapcount(folio, vma);
+		if (mapcount == 1)
+			nr = folio_large_nr_pages(folio);
 		break;
 	default:
 		BUILD_BUG();
@@ -1676,15 +1639,11 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		}
 
 		folio_set_large_mapcount(folio, nr, vma);
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-			atomic_set(&folio->_nr_pages_mapped, nr);
 	} else {
 		nr = folio_large_nr_pages(folio);
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		folio_set_large_mapcount(folio, 1, vma);
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-			atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED);
 		if (exclusive)
 			SetPageAnonExclusive(&folio->page);
 		nr_pmdmapped = nr;
@@ -1773,12 +1732,28 @@ void folio_add_file_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }
 
+static bool __folio_certainly_partially_mapped(struct folio *folio, int mapcount)
+{
+	/*
+	 * This is a best-effort check only: if the average per-page
+	 * mapcount in the folio is smaller than 1, at least one page is not
+	 * mapped -> partially mapped. This is always reliable for exclusive
+	 * folios.
+	 *
+	 * We will not detect partial mappings in all scenarios:
+	 * when a folio becomes partially mapped while shared and the
+	 * average per-page mapcount is >= 1. However, we will detect the
+	 * partial mapping once it becomes exclusively mapped again.
+	 */
+	return mapcount && !folio_entire_mapcount(folio) &&
+	       mapcount < folio_large_nr_pages(folio);
+}
+
 static __always_inline void __folio_remove_rmap(struct folio *folio,
 		struct page *page, int nr_pages, struct vm_area_struct *vma,
 		enum pgtable_level level)
 {
-	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int last = 0, nr = 0, nr_pmdmapped = 0;
+	int nr = 0, nr_pmdmapped = 0, mapcount;
 	bool partially_mapped = false;
 
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
@@ -1790,67 +1765,29 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 			break;
 		}
 
-		if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) {
-			nr = folio_sub_return_large_mapcount(folio, nr_pages, vma);
-			if (!nr) {
-				/* Now completely unmapped. */
-				nr = folio_large_nr_pages(folio);
-			} else {
-				partially_mapped = nr < folio_large_nr_pages(folio) &&
-						   !folio_entire_mapcount(folio);
-				nr = 0;
-			}
-			break;
-		}
-
-		folio_sub_large_mapcount(folio, nr_pages, vma);
-		do {
-			last += atomic_add_negative(-1, &page->_mapcount);
-		} while (page++, --nr_pages > 0);
+		mapcount = folio_sub_return_large_mapcount(folio, nr_pages, vma);
+		if (!mapcount)
+			nr = folio_large_nr_pages(folio);
 
-		if (last &&
-		    atomic_sub_return_relaxed(last, mapped) < ENTIRELY_MAPPED)
-			nr = last;
+		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
+			do {
+				atomic_dec(&page->_mapcount);
+			} while (page++, --nr_pages > 0);
+		}
 
-		partially_mapped = nr && atomic_read(mapped);
+		partially_mapped = __folio_certainly_partially_mapped(folio, mapcount);
 		break;
 	case PGTABLE_LEVEL_PMD:
 	case PGTABLE_LEVEL_PUD:
-		if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) {
-			last = atomic_add_negative(-1, &folio->_entire_mapcount);
-			if (level == PGTABLE_LEVEL_PMD && last)
-				nr_pmdmapped = folio_large_nr_pages(folio);
-			nr = folio_dec_return_large_mapcount(folio, vma);
-			if (!nr) {
-				/* Now completely unmapped. */
-				nr = folio_large_nr_pages(folio);
-			} else {
-				partially_mapped = last &&
-						   nr < folio_large_nr_pages(folio);
-				nr = 0;
-			}
-			break;
-		}
+		mapcount = folio_dec_return_large_mapcount(folio, vma);
+		if (!mapcount)
+			nr = folio_large_nr_pages(folio);
 
-		folio_dec_large_mapcount(folio, vma);
-		last = atomic_add_negative(-1, &folio->_entire_mapcount);
-		if (last) {
-			nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped);
-			if (likely(nr < ENTIRELY_MAPPED)) {
-				nr_pages = folio_large_nr_pages(folio);
-				if (level == PGTABLE_LEVEL_PMD)
-					nr_pmdmapped = nr_pages;
-				nr = nr_pages - nr;
-				/* Raced ahead of another remove and an add? */
-				if (unlikely(nr < 0))
-					nr = 0;
-			} else {
-				/* An add of ENTIRELY_MAPPED raced ahead */
-				nr = 0;
-			}
-		}
+		if (atomic_add_negative(-1, &folio->_entire_mapcount) &&
+		    level == PGTABLE_LEVEL_PMD)
+			nr_pmdmapped = HPAGE_PMD_NR;
 
-		partially_mapped = nr && nr < nr_pmdmapped;
+		partially_mapped = __folio_certainly_partially_mapped(folio, mapcount);
 		break;
 	default:
 		BUILD_BUG();

-- 
2.43.0




* [PATCH RFC 02/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for "mapmax"
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 01/13] mm/rmap: remove folio->_nr_pages_mapped David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 03/13] fs/proc/page: remove CONFIG_PAGE_MAPCOUNT handling for kpagecount David Hildenbrand (Arm)
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

In preparation for removing CONFIG_PAGE_MAPCOUNT, let's always use a
folio's average page mapcount instead of the precise page mapcount when
calculating "mapmax".

Update the doc to state that this behavior no longer depends on the
kernel config. While at it, make it clearer what "mapmax" actually
expresses.

For small folios, or large folios that are mostly fully-mapped, there is
no change at all.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/filesystems/proc.rst |  8 ++++----
 fs/proc/task_mmu.c                 | 11 +++--------
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 628364b0f69f..1224dc73e089 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -699,10 +699,10 @@ Where:
 node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
 size, in KB, that is backing the mapping up.
 
-Note that some kernel configurations do not track the precise number of times
-a page part of a larger allocation (e.g., THP) is mapped. In these
-configurations, "mapmax" might corresponds to the average number of mappings
-per page in such a larger allocation instead.
+"mapmax" is the maximum page mapcount of any page in the mapping, i.e.,
+the highest sharing level observed. For pages that are part of larger
+allocations (e.g., THP), it is derived from the average mapcount per page
+in the allocation, since precise per-page mapcounts are not available.
 
 1.2 Kernel data
 ---------------
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e091931d7ca1..ad0989d101ab 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -3137,12 +3137,7 @@ static void gather_stats(struct page *page, struct numa_maps *md, int pte_dirty,
 			unsigned long nr_pages)
 {
 	struct folio *folio = page_folio(page);
-	int count;
-
-	if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-		count = folio_precise_page_mapcount(folio, page);
-	else
-		count = folio_average_page_mapcount(folio);
+	const int mapcount = folio_average_page_mapcount(folio);
 
 	md->pages += nr_pages;
 	if (pte_dirty || folio_test_dirty(folio))
@@ -3160,8 +3155,8 @@ static void gather_stats(struct page *page, struct numa_maps *md, int pte_dirty,
 	if (folio_test_anon(folio))
 		md->anon += nr_pages;
 
-	if (count > md->mapcount_max)
-		md->mapcount_max = count;
+	if (mapcount > md->mapcount_max)
+		md->mapcount_max = mapcount;
 
 	md->node[folio_nid(folio)] += nr_pages;
 }

-- 
2.43.0




* [PATCH RFC 03/13] fs/proc/page: remove CONFIG_PAGE_MAPCOUNT handling for kpagecount
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 01/13] mm/rmap: remove folio->_nr_pages_mapped David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 02/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for "mapmax" David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 04/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for PM_MMAP_EXCLUSIVE David Hildenbrand (Arm)
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

In preparation for removing CONFIG_PAGE_MAPCOUNT, let's always use a
folio's average page mapcount instead of the precise page mapcount when
calculating the kpagecount value, like we do with CONFIG_NO_PAGE_MAPCOUNT.

Update the doc to state that this behavior no longer depends on the
kernel config. While at it, improve the documentation a bit. "pagecount"
was really misnamed back in the day ...

Should we mention that the value is not really expressive in many
cases, such as for the shared zeropage or pages with a PFNMAP mapping?
Let's keep it simple; the hope is that this interface is not used at
all anymore, except for some weird debugging scenarios.

For small folios, or large folios that are fully-mapped everywhere, there
is no change at all.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/admin-guide/mm/pagemap.rst | 13 ++++++-------
 fs/proc/page.c                           | 10 +---------
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index c57e61b5d8aa..f9478bcbb6a9 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -53,13 +53,12 @@ There are four components to pagemap:
    determine which areas of memory are actually mapped and llseek to
    skip over unmapped regions.
 
- * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
-   times each page is mapped, indexed by PFN. Some kernel configurations do
-   not track the precise number of times a page part of a larger allocation
-   (e.g., THP) is mapped. In these configurations, the average number of
-   mappings per page in this larger allocation is returned instead. However,
-   if any page of the large allocation is mapped, the returned value will
-   be at least 1.
+ * ``/proc/kpagecount``.  This file contains a 64-bit value for each page,
+   indexed by PFN, representing its mapcount, i.e., the number of times it
+   is mapped into page tables.  For pages that are part of larger allocations
+   (e.g., THP), the average mapcount per page in the allocation is used, since
+   precise per-page mapcounts are not available.  If any page in such an
+   allocation is mapped, the returned value will be at least 1.
 
 The page-types tool in the tools/mm directory can be used to query the
 number of times a page is mapped.
diff --git a/fs/proc/page.c b/fs/proc/page.c
index f9b2c2c906cd..bc4d7c3751de 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -45,17 +45,9 @@ static inline unsigned long get_max_dump_pfn(void)
 static u64 get_kpage_count(const struct page *page)
 {
 	struct page_snapshot ps;
-	u64 ret;
 
 	snapshot_page(&ps, page);
-
-	if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-		ret = folio_precise_page_mapcount(&ps.folio_snapshot,
-						  &ps.page_snapshot);
-	else
-		ret = folio_average_page_mapcount(&ps.folio_snapshot);
-
-	return ret;
+	return folio_average_page_mapcount(&ps.folio_snapshot);
 }
 
 static ssize_t kpage_read(struct file *file, char __user *buf,

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 04/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for PM_MMAP_EXCLUSIVE
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (2 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 03/13] fs/proc/page: remove CONFIG_PAGE_MAPCOUNT handling for kpagecount David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 05/13] fs/proc/task_mmu: remove mapcount comment in smaps_account() David Hildenbrand (Arm)
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

In preparation for removing CONFIG_PAGE_MAPCOUNT, let's always use
folio_maybe_mapped_shared() to detect possible page sharing, like we do
with CONFIG_NO_PAGE_MAPCOUNT.

Update the doc to state that this behavior no longer depends on the
kernel config, and simplify the doc a bit to mention fewer details
that are hard to follow.

For small folios, and for large folios that were never mapped into
multiple processes at the same time, there is no change at all. For
large folios, there might be a change if

(1) The folio was once mapped at the same time into more than two
    address spaces, and is now only mapped in a single address space. We
    might detect it as shared.
(2) A folio page is only mapped into a single address space, but other
    folio pages are mapped into other address spaces. We will detect it
    as shared.
(3) A folio page is mapped multiple times into the same address space. We
    will detect it as exclusive.

We can now remove __folio_page_mapped_exclusively().
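The affected bit can be decoded from a raw pagemap entry. Only the bit
positions below come from the documented pagemap ABI; the helper names are
for this sketch only:

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of a 64-bit /proc/pid/pagemap entry (see pagemap.rst). */
#define PM_PFN_MASK		((UINT64_C(1) << 55) - 1)	/* bits 0-54 */
#define PM_MMAP_EXCLUSIVE	(UINT64_C(1) << 56)
#define PM_SWAP			(UINT64_C(1) << 62)
#define PM_PRESENT		(UINT64_C(1) << 63)

static inline int pm_present(uint64_t pme)
{
	return !!(pme & PM_PRESENT);
}

/*
 * With this change, bit 56 clear on a present page means the folio
 * *might be* mapped into multiple processes; bit 56 set means it is
 * certainly exclusive to this process.
 */
static inline int pm_maybe_shared(uint64_t pme)
{
	return pm_present(pme) && !(pme & PM_MMAP_EXCLUSIVE);
}

static inline uint64_t pm_pfn(uint64_t pme)
{
	return pme & PM_PFN_MASK;
}
```

(Note that the PFN field is only populated for privileged readers.)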

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/admin-guide/mm/pagemap.rst | 17 +++++++----------
 fs/proc/task_mmu.c                       | 12 ++----------
 2 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index f9478bcbb6a9..67eb04b1e246 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -38,16 +38,13 @@ There are four components to pagemap:
    precisely which pages are mapped (or in swap) and comparing mapped
    pages between processes.
 
-   Traditionally, bit 56 indicates that a page is mapped exactly once and bit
-   56 is clear when a page is mapped multiple times, even when mapped in the
-   same process multiple times. In some kernel configurations, the semantics
-   for pages part of a larger allocation (e.g., THP) can differ: bit 56 is set
-   if all pages part of the corresponding large allocation are *certainly*
-   mapped in the same process, even if the page is mapped multiple times in that
-   process. Bit 56 is clear when any page page of the larger allocation
-   is *maybe* mapped in a different process. In some cases, a large allocation
-   might be treated as "maybe mapped by multiple processes" even though this
-   is no longer the case.
+   Bit 56 set indicates that the page is currently *certainly* exclusively
+   mapped in this process, and bit 56 clear indicates that the page *might be*
+   mapped into multiple processes ("shared").  Note that in the past, the bit
+   precisely indicated that a page was mapped exactly once, and the bit was
+   clear also if mapped multiple times in the same process.  As this precise
+   information is not available for pages that are part of large allocations
+   (e.g., THP), the semantics have been slightly adjusted.
 
    Efficient users of this interface will use ``/proc/pid/maps`` to
    determine which areas of memory are actually mapped and llseek to
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ad0989d101ab..1e1572849fed 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1884,13 +1884,6 @@ static int add_to_pagemap(pagemap_entry_t *pme, struct pagemapread *pm)
 	return 0;
 }
 
-static bool __folio_page_mapped_exclusively(struct folio *folio, struct page *page)
-{
-	if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-		return folio_precise_page_mapcount(folio, page) == 1;
-	return !folio_maybe_mapped_shared(folio);
-}
-
 static int pagemap_pte_hole(unsigned long start, unsigned long end,
 			    __always_unused int depth, struct mm_walk *walk)
 {
@@ -1985,8 +1978,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		folio = page_folio(page);
 		if (!folio_test_anon(folio))
 			flags |= PM_FILE;
-		if ((flags & PM_PRESENT) &&
-		    __folio_page_mapped_exclusively(folio, page))
+		if ((flags & PM_PRESENT) && !folio_maybe_mapped_shared(folio))
 			flags |= PM_MMAP_EXCLUSIVE;
 	}
 
@@ -2058,7 +2050,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr,
 		pagemap_entry_t pme;
 
 		if (folio && (flags & PM_PRESENT) &&
-		    __folio_page_mapped_exclusively(folio, page))
+		    !folio_maybe_mapped_shared(folio))
 			cur_flags |= PM_MMAP_EXCLUSIVE;
 
 		pme = make_pme(frame, cur_flags);

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 05/13] fs/proc/task_mmu: remove mapcount comment in smaps_account()
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (3 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 04/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for PM_MMAP_EXCLUSIVE David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 06/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling " David Hildenbrand (Arm)
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

Reading the mapcount usually yields a snapshot that can change
immediately afterwards, except when the folio is locked and unmapped.

For example, nothing stops other folio/page mappings that are not protected
through the same PTL from going away; the folio lock cannot prevent that
situation.

Let's just drop the comment.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 fs/proc/task_mmu.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1e1572849fed..55b037768c60 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -968,11 +968,6 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 		exclusive = !folio_maybe_mapped_shared(folio);
 	}
 
-	/*
-	 * We obtain a snapshot of the mapcount. Without holding the folio lock
-	 * this snapshot can be slightly wrong as we cannot always read the
-	 * mapcount atomically.
-	 */
 	for (i = 0; i < nr; i++, page++) {
 		unsigned long pss = PAGE_SIZE << PSS_SHIFT;
 

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 06/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling in smaps_account()
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (4 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 05/13] fs/proc/task_mmu: remove mapcount comment in smaps_account() David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 07/13] mm/rmap: remove CONFIG_PAGE_MAPCOUNT David Hildenbrand (Arm)
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

In preparation for removing CONFIG_PAGE_MAPCOUNT, let's always use
folio_maybe_mapped_shared() to detect possible page sharing for
calculating the USS, and use folio_average_page_mapcount() to calculate
the PSS, like we do with CONFIG_NO_PAGE_MAPCOUNT.

We can now stop looping over all pages. We could also get rid of the
"folio_ref_count(folio) == 1" handling that used to avoid the loop. But
it still looks like a nice and simple micro-optimization, given that
many (small) folios only have a single mapping.

Rename "exclusive" to "private" such that it directly matches the
parameter name in smaps_page_accumulate(), and cleanup the code to
only have a single smaps_page_accumulate() call.

Update the doc to state that this behavior no longer depends on the
kernel config, and simplify the doc a bit to mention fewer details
that are hard to follow.

We can now remove folio_precise_page_mapcount().
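The resulting accounting flow is simple enough to model directly. A
minimal sketch, assuming PSS_SHIFT of 12 as in fs/proc and taking the
folio size and average mapcount as plain parameters rather than reading
them from a real folio:

```c
#include <assert.h>
#include <stdint.h>

#define PSS_SHIFT 12	/* fixed-point shift used for PSS accounting */

/*
 * Simplified model of the new smaps_account() math for a present folio:
 * PSS is the folio size divided by the average per-page mapcount.
 */
static uint64_t pss_for_folio(uint64_t size, int average_mapcount)
{
	uint64_t pss = size << PSS_SHIFT;

	if (average_mapcount >= 2)
		pss /= average_mapcount;
	return pss;
}
```

This reproduces the documented example: 1000 pages mapped only here plus
1000 pages shared with one other process yield a PSS of 1500 pages.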

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/filesystems/proc.rst | 33 +++++++++++-------------------
 fs/proc/internal.h                 | 39 ------------------------------------
 fs/proc/task_mmu.c                 | 41 ++++++++++----------------------------
 3 files changed, 22 insertions(+), 91 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 1224dc73e089..d2264240e43f 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -490,27 +490,18 @@ in memory, where each page is divided by the number of processes sharing it.
 So if a process has 1000 pages all to itself, and 1000 shared with one other
 process, its PSS will be 1500.  "Pss_Dirty" is the portion of PSS which
 consists of dirty pages.  ("Pss_Clean" is not included, but it can be
-calculated by subtracting "Pss_Dirty" from "Pss".)
-
-Traditionally, a page is accounted as "private" if it is mapped exactly once,
-and a page is accounted as "shared" when mapped multiple times, even when
-mapped in the same process multiple times. Note that this accounting is
-independent of MAP_SHARED.
-
-In some kernel configurations, the semantics of pages part of a larger
-allocation (e.g., THP) can differ: a page is accounted as "private" if all
-pages part of the corresponding large allocation are *certainly* mapped in the
-same process, even if the page is mapped multiple times in that process. A
-page is accounted as "shared" if any page page of the larger allocation
-is *maybe* mapped in a different process. In some cases, a large allocation
-might be treated as "maybe mapped by multiple processes" even though this
-is no longer the case.
-
-Some kernel configurations do not track the precise number of times a page part
-of a larger allocation is mapped. In this case, when calculating the PSS, the
-average number of mappings per page in this larger allocation might be used
-as an approximation for the number of mappings of a page. The PSS calculation
-will be imprecise in this case.
+calculated by subtracting "Pss_Dirty" from "Pss".)  In some scenarios where
+larger allocations (e.g., THP) are used, the PSS can be slightly imprecise,
+as precise information about how many processes share a page is not available
+for individual pages in such allocations.
+
+A page is accounted as "private" if it is currently *certainly* exclusively
+mapped in this process, and as "shared" if the page *might be* mapped into
+multiple processes.  Note that this accounting is independent of MAP_SHARED.
+In the past, pages that were mapped exactly once were accounted as "private",
+and pages with multiple mappings, even if in the same process, as "shared".
+As this precise information is not available for pages that are part of large
+allocations (e.g., THP), the semantics have been slightly adjusted.
 
 "Referenced" indicates the amount of memory currently marked as referenced or
 accessed.
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index c1e8eb984da8..a5908167ce2d 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -161,45 +161,6 @@ unsigned name_to_int(const struct qstr *qstr);
 /* Worst case buffer size needed for holding an integer. */
 #define PROC_NUMBUF 13
 
-#ifdef CONFIG_PAGE_MAPCOUNT
-/**
- * folio_precise_page_mapcount() - Number of mappings of this folio page.
- * @folio: The folio.
- * @page: The page.
- *
- * The number of present user page table entries that reference this page
- * as tracked via the RMAP: either referenced directly (PTE) or as part of
- * a larger area that covers this page (e.g., PMD).
- *
- * Use this function only for the calculation of existing statistics
- * (USS, PSS, mapcount_max) and for debugging purposes (/proc/kpagecount).
- *
- * Do not add new users.
- *
- * Returns: The number of mappings of this folio page. 0 for
- * folios that are not mapped to user space or are not tracked via the RMAP
- * (e.g., shared zeropage).
- */
-static inline int folio_precise_page_mapcount(struct folio *folio,
-		struct page *page)
-{
-	int mapcount = atomic_read(&page->_mapcount) + 1;
-
-	if (page_mapcount_is_type(mapcount))
-		mapcount = 0;
-	if (folio_test_large(folio))
-		mapcount += folio_entire_mapcount(folio);
-
-	return mapcount;
-}
-#else /* !CONFIG_PAGE_MAPCOUNT */
-static inline int folio_precise_page_mapcount(struct folio *folio,
-		struct page *page)
-{
-	BUILD_BUG();
-}
-#endif /* CONFIG_PAGE_MAPCOUNT */
-
 /**
  * folio_average_page_mapcount() - Average number of mappings per page in this
  *				   folio
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 55b037768c60..7b212fb6ae6c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -918,10 +918,9 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 		bool present)
 {
 	struct folio *folio = page_folio(page);
-	int i, nr = compound ? compound_nr(page) : 1;
-	unsigned long size = nr * PAGE_SIZE;
-	bool exclusive;
-	int mapcount;
+	const unsigned long size = compound ? folio_size(folio) : PAGE_SIZE;
+	unsigned long pss = size << PSS_SHIFT;
+	bool private = false;
 
 	/*
 	 * First accumulate quantities that depend only on |size| and the type
@@ -943,13 +942,6 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 		mss->referenced += size;
 
 	/*
-	 * Then accumulate quantities that may depend on sharing, or that may
-	 * differ page-by-page.
-	 *
-	 * refcount == 1 for present entries guarantees that the folio is mapped
-	 * exactly once. For large folios this implies that exactly one
-	 * PTE/PMD/... maps (a part of) this folio.
-	 *
 	 * Treat all non-present entries (where relying on the mapcount and
 	 * refcount doesn't make sense) as "maybe shared, but not sure how
 	 * often". We treat device private entries as being fake-present.
@@ -957,30 +949,17 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	 * Note that it would not be safe to read the mapcount especially for
 	 * pages referenced by migration entries, even with the PTL held.
 	 */
-	if (folio_ref_count(folio) == 1 || !present) {
-		smaps_page_accumulate(mss, folio, size, size << PSS_SHIFT,
-				      dirty, locked, present);
-		return;
-	}
-
-	if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT)) {
-		mapcount = folio_average_page_mapcount(folio);
-		exclusive = !folio_maybe_mapped_shared(folio);
-	}
-
-	for (i = 0; i < nr; i++, page++) {
-		unsigned long pss = PAGE_SIZE << PSS_SHIFT;
-
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
-			mapcount = folio_precise_page_mapcount(folio, page);
-			exclusive = mapcount < 2;
-		}
+	if (present && folio_ref_count(folio) == 1) {
+		/* Single mapping, no need to mess with mapcounts. */
+		private = true;
+	} else if (present) {
+		const int mapcount = folio_average_page_mapcount(folio);
 
 		if (mapcount >= 2)
 			pss /= mapcount;
-		smaps_page_accumulate(mss, folio, PAGE_SIZE, pss,
-				dirty, locked, exclusive);
+		private = !folio_maybe_mapped_shared(folio);
 	}
+	smaps_page_accumulate(mss, folio, size, pss, dirty, locked, private);
 }
 
 #ifdef CONFIG_SHMEM

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 07/13] mm/rmap: remove CONFIG_PAGE_MAPCOUNT
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (5 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 06/13] fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling " David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 08/13] mm: re-consolidate folio->_entire_mapcount David Hildenbrand (Arm)
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

page->_mapcount is still updated but essentially unused. So let's
remove CONFIG_PAGE_MAPCOUNT. Given that CONFIG_NO_PAGE_MAPCOUNT is the
only remaining variant, that Kconfig option can go as well.

We can replace some instances of "orig_nr_pages" with "nr_pages", as
the latter is no longer modified.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/mm/transhuge.rst |  3 ---
 include/linux/rmap.h           | 11 +----------
 mm/Kconfig                     | 17 -----------------
 mm/rmap.c                      | 36 ++++++------------------------------
 4 files changed, 7 insertions(+), 60 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index f200c1ac19cb..eb5ac076e4c6 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -129,9 +129,6 @@ pages:
     corresponding mapcount), and the current status ("maybe mapped shared" vs.
     "mapped exclusively").
 
-    With CONFIG_PAGE_MAPCOUNT, we also increment/decrement
-    page->_mapcount.
-
 split_huge_page internally has to distribute the refcounts in the head
 page to the tail pages before clearing all PG_head/tail bits from the page
 structures. It can be done easily for refcounts taken by page table
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e5569f5fdaec..4894e43e5f52 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -493,8 +493,6 @@ static __always_inline void __folio_dup_file_rmap(struct folio *folio,
 		struct page *page, int nr_pages, struct vm_area_struct *dst_vma,
 		enum pgtable_level level)
 {
-	const int orig_nr_pages = nr_pages;
-
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
 	switch (level) {
@@ -504,12 +502,7 @@ static __always_inline void __folio_dup_file_rmap(struct folio *folio,
 			break;
 		}
 
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
-			do {
-				atomic_inc(&page->_mapcount);
-			} while (page++, --nr_pages > 0);
-		}
-		folio_add_large_mapcount(folio, orig_nr_pages, dst_vma);
+		folio_add_large_mapcount(folio, nr_pages, dst_vma);
 		break;
 	case PGTABLE_LEVEL_PMD:
 	case PGTABLE_LEVEL_PUD:
@@ -608,8 +601,6 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
 		do {
 			if (PageAnonExclusive(page))
 				ClearPageAnonExclusive(page);
-			if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-				atomic_inc(&page->_mapcount);
 		} while (page++, --nr_pages > 0);
 		folio_add_large_mapcount(folio, orig_nr_pages, dst_vma);
 		break;
diff --git a/mm/Kconfig b/mm/Kconfig
index bd283958d675..576db4fdf16e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -948,25 +948,8 @@ config READ_ONLY_THP_FOR_FS
 	  support of file THPs will be developed in the next few release
 	  cycles.
 
-config NO_PAGE_MAPCOUNT
-	bool "No per-page mapcount (EXPERIMENTAL)"
-	help
-	  Do not maintain per-page mapcounts for pages part of larger
-	  allocations, such as transparent huge pages.
-
-	  When this config option is enabled, some interfaces that relied on
-	  this information will rely on less-precise per-allocation information
-	  instead: for example, using the average per-page mapcount in such
-	  a large allocation instead of the per-page mapcount.
-
-	  EXPERIMENTAL because the impact of some changes is still unclear.
-
 endif # TRANSPARENT_HUGEPAGE
 
-# simple helper to make the code a bit easier to read
-config PAGE_MAPCOUNT
-	def_bool !NO_PAGE_MAPCOUNT
-
 #
 # The architecture supports pgtable leaves that is larger than PAGE_SIZE
 #
diff --git a/mm/rmap.c b/mm/rmap.c
index df42c38fe387..27488183448b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1354,7 +1354,6 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
 		enum pgtable_level level)
 {
 	int nr = 0, nr_pmdmapped = 0, mapcount;
-	const int orig_nr_pages = nr_pages;
 
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
@@ -1365,14 +1364,8 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
 			break;
 		}
 
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
-			do {
-				atomic_inc(&page->_mapcount);
-			} while (page++, --nr_pages > 0);
-		}
-
-		mapcount = folio_add_return_large_mapcount(folio, orig_nr_pages, vma);
-		if (mapcount == orig_nr_pages)
+		mapcount = folio_add_return_large_mapcount(folio, nr_pages, vma);
+		if (mapcount == nr_pages)
 			nr = folio_large_nr_pages(folio);
 		break;
 	case PGTABLE_LEVEL_PMD:
@@ -1518,15 +1511,6 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
 		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
 				 folio_entire_mapcount(folio) > 1 &&
 				 PageAnonExclusive(cur_page), folio);
-		if (IS_ENABLED(CONFIG_NO_PAGE_MAPCOUNT))
-			continue;
-
-		/*
-		 * While PTE-mapping a THP we have a PMD and a PTE
-		 * mapping.
-		 */
-		VM_WARN_ON_FOLIO(atomic_read(&cur_page->_mapcount) > 0 &&
-				 PageAnonExclusive(cur_page), folio);
 	}
 
 	/*
@@ -1628,14 +1612,12 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		int i;
 
 		nr = folio_large_nr_pages(folio);
-		for (i = 0; i < nr; i++) {
-			struct page *page = folio_page(folio, i);
+		if (exclusive) {
+			for (i = 0; i < nr; i++) {
+				struct page *page = folio_page(folio, i);
 
-			if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT))
-				/* increment count (starts at -1) */
-				atomic_set(&page->_mapcount, 0);
-			if (exclusive)
 				SetPageAnonExclusive(page);
+			}
 		}
 
 		folio_set_large_mapcount(folio, nr, vma);
@@ -1769,12 +1751,6 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 		if (!mapcount)
 			nr = folio_large_nr_pages(folio);
 
-		if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
-			do {
-				atomic_dec(&page->_mapcount);
-			} while (page++, --nr_pages > 0);
-		}
-
 		partially_mapped = __folio_certainly_partially_mapped(folio, mapcount);
 		break;
 	case PGTABLE_LEVEL_PMD:

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 08/13] mm: re-consolidate folio->_entire_mapcount
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (6 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 07/13] mm/rmap: remove CONFIG_PAGE_MAPCOUNT David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 09/13] mm: move _large_mapcount to _mapcount in page[1] of a large folio David Hildenbrand (Arm)
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

Now that we have some space left in page[1] of a large folio on 32bit,
we can re-consolidate folio->_entire_mapcount.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mm.h       |  4 +---
 include/linux/mm_types.h |  5 ++---
 mm/internal.h            |  5 ++---
 mm/page_alloc.c          | 12 ++++--------
 4 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 633bbf9a184a..1715c6ed14d4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1832,9 +1832,7 @@ static inline int is_vmalloc_or_module_addr(const void *x)
  */
 static inline int folio_entire_mapcount(const struct folio *folio)
 {
-	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
-	if (!IS_ENABLED(CONFIG_64BIT) && unlikely(folio_large_order(folio) == 1))
-		return 0;
+	VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
 	return atomic_read(&folio->_entire_mapcount) + 1;
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 47b2c3d05f41..1e1befe7d418 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -452,9 +452,9 @@ struct folio {
 				struct {
 	/* public: */
 					atomic_t _large_mapcount;
-					unsigned int _unused_1;
-#ifdef CONFIG_64BIT
 					atomic_t _entire_mapcount;
+#ifdef CONFIG_64BIT
+					unsigned int _unused_1;
 					atomic_t _pincount;
 #endif /* CONFIG_64BIT */
 					mm_id_mapcount_t _mm_id_mapcount[2];
@@ -483,7 +483,6 @@ struct folio {
 	/* public: */
 			struct list_head _deferred_list;
 #ifndef CONFIG_64BIT
-			atomic_t _entire_mapcount;
 			atomic_t _pincount;
 #endif /* !CONFIG_64BIT */
 	/* private: the union with struct page is transitional */
diff --git a/mm/internal.h b/mm/internal.h
index 30e48f39d2de..53b20de141b9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -868,10 +868,9 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 		folio->_mm_id_mapcount[0] = -1;
 		folio->_mm_id_mapcount[1] = -1;
 	}
-	if (IS_ENABLED(CONFIG_64BIT) || order > 1) {
+	atomic_set(&folio->_entire_mapcount, -1);
+	if (IS_ENABLED(CONFIG_64BIT) || order > 1)
 		atomic_set(&folio->_pincount, 0);
-		atomic_set(&folio->_entire_mapcount, -1);
-	}
 	if (order > 1)
 		INIT_LIST_HEAD(&folio->_deferred_list);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8888f31aca49..1c09d79cade3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1121,11 +1121,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 				goto out;
 			}
 		}
+		if (folio_entire_mapcount(folio)) {
+			bad_page(page, "nonzero entire_mapcount");
+			goto out;
+		}
 		if (IS_ENABLED(CONFIG_64BIT)) {
-			if (unlikely(atomic_read(&folio->_entire_mapcount) + 1)) {
-				bad_page(page, "nonzero entire_mapcount");
-				goto out;
-			}
 			if (unlikely(atomic_read(&folio->_pincount))) {
 				bad_page(page, "nonzero pincount");
 				goto out;
@@ -1139,10 +1139,6 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			goto out;
 		}
 		if (!IS_ENABLED(CONFIG_64BIT)) {
-			if (unlikely(atomic_read(&folio->_entire_mapcount) + 1)) {
-				bad_page(page, "nonzero entire_mapcount");
-				goto out;
-			}
 			if (unlikely(atomic_read(&folio->_pincount))) {
 				bad_page(page, "nonzero pincount");
 				goto out;

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 09/13] mm: move _large_mapcount to _mapcount in page[1] of a large folio
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (7 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 08/13] mm: re-consolidate folio->_entire_mapcount David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 10/13] mm: re-consolidate folio->_pincount David Hildenbrand (Arm)
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

Now that the _mapcount in tail pages is completely unused, we can
re-purpose it to ... store another mapcount.

In theory, it should now unnecessary to initialize the large mapcount to -1
in prep_compound_head(), but let's keep doing that for now.
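The overlay can be illustrated with a FOLIO_MATCH-style compile-time
check. This is a toy layout, not the real struct page / struct folio
definitions; only the idea that a folio field must land exactly on the
tail-page field it aliases carries over:

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	unsigned long flags;
	int _mapcount;
	int _refcount;
};

/* A "folio" view spanning two toy pages; page[1]._mapcount is re-used. */
struct toy_folio {
	/* overlays page[0] */
	unsigned long flags;
	int _mapcount;
	int _refcount;
	/* overlays page[1] */
	unsigned long _flags_1;
	int _large_mapcount;	/* aliases page[1]._mapcount */
	int _refcount_1;
};

/* Pin each folio field to the tail-page field it aliases, at compile time. */
#define TOY_FOLIO_MATCH(pg, fl)						\
	_Static_assert(offsetof(struct toy_folio, fl) ==		\
		       sizeof(struct toy_page) +			\
		       offsetof(struct toy_page, pg),			\
		       "folio field must alias the tail-page field")

TOY_FOLIO_MATCH(flags, _flags_1);
TOY_FOLIO_MATCH(_mapcount, _large_mapcount);
TOY_FOLIO_MATCH(_refcount, _refcount_1);
```

If a field is moved without updating its partner, the build fails instead
of silently corrupting the aliased tail-page field.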

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mm_types.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1e1befe7d418..e59571d2f81d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -155,8 +155,7 @@ struct page {
 		/*
 		 * For head pages of typed folios, the value stored here
 		 * allows for determining what this page is used for. The
-		 * tail pages of typed folios will not store a type
-		 * (page_type == _mapcount == -1).
+		 * tail pages of typed folios will not store a type.
 		 *
 		 * See page-flags.h for a list of page types which are currently
 		 * stored here.
@@ -378,6 +377,7 @@ typedef unsigned short mm_id_t;
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
  * @_large_mapcount: Do not use directly, call folio_mapcount().
  * @_unused_1: Temporary placeholder.
+ * @_unused_2: Temporary placeholder.
  * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
  * @_nr_pages: Do not use directly, call folio_nr_pages().
  * @_mm_id: Do not use outside of rmap code.
@@ -451,7 +451,7 @@ struct folio {
 			union {
 				struct {
 	/* public: */
-					atomic_t _large_mapcount;
+					unsigned int _unused_2;
 					atomic_t _entire_mapcount;
 #ifdef CONFIG_64BIT
 					unsigned int _unused_1;
@@ -466,7 +466,7 @@ struct folio {
 				};
 				unsigned long _usable_1[4];
 			};
-			atomic_t _mapcount_1;
+			atomic_t _large_mapcount;
 			atomic_t _refcount_1;
 	/* public: */
 #ifdef NR_PAGES_IN_LARGE_FOLIO
@@ -529,7 +529,7 @@ FOLIO_MATCH(_last_cpupid, _last_cpupid);
 			offsetof(struct page, pg) + sizeof(struct page))
 FOLIO_MATCH(flags, _flags_1);
 FOLIO_MATCH(compound_info, _head_1);
-FOLIO_MATCH(_mapcount, _mapcount_1);
+FOLIO_MATCH(_mapcount, _large_mapcount);
 FOLIO_MATCH(_refcount, _refcount_1);
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 10/13] mm: re-consolidate folio->_pincount
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (8 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 09/13] mm: move _large_mapcount to _mapcount in page[1] of a large folio David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 11/13] mm/rmap: stop using the entire mapcount for hugetlb folios David Hildenbrand (Arm)
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

Now that we have some space left in page[1] of a large folio on 32bit,
we can re-consolidate folio->_pincount.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/mm.h       |  4 +---
 include/linux/mm_types.h |  7 ++-----
 mm/debug.c               |  5 +----
 mm/internal.h            |  3 +--
 mm/page_alloc.c          | 14 +++-----------
 5 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1715c6ed14d4..6dd906585420 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2593,9 +2593,7 @@ static inline pud_t folio_mk_pud(const struct folio *folio, pgprot_t pgprot)
 
 static inline bool folio_has_pincount(const struct folio *folio)
 {
-	if (IS_ENABLED(CONFIG_64BIT))
-		return folio_test_large(folio);
-	return folio_order(folio) > 1;
+	return folio_test_large(folio);
 }
 
 /**
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e59571d2f81d..450f61cad678 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -451,11 +451,11 @@ struct folio {
 			union {
 				struct {
 	/* public: */
-					unsigned int _unused_2;
+					atomic_t _pincount;
 					atomic_t _entire_mapcount;
 #ifdef CONFIG_64BIT
 					unsigned int _unused_1;
-					atomic_t _pincount;
+					unsigned int _unused_2;
 #endif /* CONFIG_64BIT */
 					mm_id_mapcount_t _mm_id_mapcount[2];
 					union {
@@ -482,9 +482,6 @@ struct folio {
 			unsigned long _head_2;
 	/* public: */
 			struct list_head _deferred_list;
-#ifndef CONFIG_64BIT
-			atomic_t _pincount;
-#endif /* !CONFIG_64BIT */
 	/* private: the union with struct page is transitional */
 		};
 		struct page __page_2;
diff --git a/mm/debug.c b/mm/debug.c
index bfb41ef17a5e..80e050bf29ba 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -81,10 +81,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
 			folio_ref_count(folio), mapcount, mapping,
 			folio->index + idx, pfn);
 	if (folio_test_large(folio)) {
-		int pincount = 0;
-
-		if (folio_has_pincount(folio))
-			pincount = atomic_read(&folio->_pincount);
+		int pincount = atomic_read(&folio->_pincount);
 
 		pr_warn("head: order:%u mapcount:%d entire_mapcount:%d pincount:%d\n",
 				folio_order(folio),
diff --git a/mm/internal.h b/mm/internal.h
index 53b20de141b9..aa1206495bc6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -869,8 +869,7 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 		folio->_mm_id_mapcount[1] = -1;
 	}
 	atomic_set(&folio->_entire_mapcount, -1);
-	if (IS_ENABLED(CONFIG_64BIT) || order > 1)
-		atomic_set(&folio->_pincount, 0);
+	atomic_set(&folio->_pincount, 0);
 	if (order > 1)
 		INIT_LIST_HEAD(&folio->_deferred_list);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1c09d79cade3..8ed4c73fdba4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1125,11 +1125,9 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			bad_page(page, "nonzero entire_mapcount");
 			goto out;
 		}
-		if (IS_ENABLED(CONFIG_64BIT)) {
-			if (unlikely(atomic_read(&folio->_pincount))) {
-				bad_page(page, "nonzero pincount");
-				goto out;
-			}
+		if (unlikely(atomic_read(&folio->_pincount))) {
+			bad_page(page, "nonzero pincount");
+			goto out;
 		}
 		break;
 	case 2:
@@ -1138,12 +1136,6 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			bad_page(page, "on deferred list");
 			goto out;
 		}
-		if (!IS_ENABLED(CONFIG_64BIT)) {
-			if (unlikely(atomic_read(&folio->_pincount))) {
-				bad_page(page, "nonzero pincount");
-				goto out;
-			}
-		}
 		break;
 	case 3:
 		/* the third tail page: hugetlb specifics overlap ->mappings */

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 11/13] mm/rmap: stop using the entire mapcount for hugetlb folios
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (9 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 10/13] mm: re-consolidate folio->_pincount David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 12/13] mm/rmap: large mapcount interface cleanups David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 13/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

There is no real reason why hugetlb still updates the entire mapcount:
the value always corresponds to folio_mapcount().

As we want to change the semantics of the entire mapcount in a way
incompatible with hugetlb, let's just stop using the entire mapcount
for hugetlb folios entirely.

We only have to teach folio_average_page_mapcount() about the change.
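The remaining hugetlb bookkeeping can be sketched in plain C (an int stands in for the atomic counter; all names here are illustrative, not the kernel's):

```c
#include <assert.h>

/* An int stands in for the atomic folio->_large_mapcount. */
struct hugetlb_folio_sketch {
	int large_mapcount;
};

/* Mapcounts are stored off-by-one: -1 means "not mapped at all". */
static void folio_init(struct hugetlb_folio_sketch *f)
{
	f->large_mapcount = -1;
}

static void hugetlb_add_rmap_sketch(struct hugetlb_folio_sketch *f)
{
	f->large_mapcount++;	/* the only counter updated after this patch */
}

static void hugetlb_remove_rmap_sketch(struct hugetlb_folio_sketch *f)
{
	f->large_mapcount--;
}

static int folio_mapcount_sketch(const struct hugetlb_folio_sketch *f)
{
	return f->large_mapcount + 1;	/* undo the -1 bias */
}
```

Since hugetlb folios are only ever mapped as a single unit, this one counter already equals what the entire mapcount used to report.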

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 fs/proc/internal.h   | 3 +++
 include/linux/mm.h   | 2 ++
 include/linux/rmap.h | 3 ---
 mm/debug.c           | 2 +-
 mm/rmap.c            | 4 +---
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index a5908167ce2d..1dd46e55c850 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -186,6 +186,9 @@ static inline int folio_average_page_mapcount(struct folio *folio)
 	mapcount = folio_large_mapcount(folio);
 	if (unlikely(mapcount <= 0))
 		return 0;
+	if (folio_test_hugetlb(folio))
+		return mapcount;
+
 	entire_mapcount = folio_entire_mapcount(folio);
 	if (mapcount <= entire_mapcount)
 		return entire_mapcount;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6dd906585420..3092db64a009 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1829,6 +1829,8 @@ static inline int is_vmalloc_or_module_addr(const void *x)
  * How many times the entire folio is mapped as a single unit (eg by a
  * PMD or PUD entry).  This is probably not what you want, except for
  * debugging purposes or implementation of other core folio_*() primitives.
+ *
+ * Always 0 for hugetlb folios.
  */
 static inline int folio_entire_mapcount(const struct folio *folio)
 {
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 4894e43e5f52..b81b1d9e1eaa 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -443,7 +443,6 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
 			return -EBUSY;
 		ClearPageAnonExclusive(&folio->page);
 	}
-	atomic_inc(&folio->_entire_mapcount);
 	atomic_inc(&folio->_large_mapcount);
 	return 0;
 }
@@ -477,7 +476,6 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)
 	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
 	VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
 
-	atomic_inc(&folio->_entire_mapcount);
 	atomic_inc(&folio->_large_mapcount);
 }
 
@@ -485,7 +483,6 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
 {
 	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
 
-	atomic_dec(&folio->_entire_mapcount);
 	atomic_dec(&folio->_large_mapcount);
 }
 
diff --git a/mm/debug.c b/mm/debug.c
index 80e050bf29ba..82baaf87ef3d 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -86,7 +86,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
 		pr_warn("head: order:%u mapcount:%d entire_mapcount:%d pincount:%d\n",
 				folio_order(folio),
 				folio_mapcount(folio),
-				folio_entire_mapcount(folio),
+				folio_entire_mapcount(folio),
 				pincount);
 	}
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 27488183448b..d08927949284 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -3042,11 +3042,10 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
 	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
 
-	atomic_inc(&folio->_entire_mapcount);
 	atomic_inc(&folio->_large_mapcount);
 	if (flags & RMAP_EXCLUSIVE)
 		SetPageAnonExclusive(&folio->page);
-	VM_WARN_ON_FOLIO(folio_entire_mapcount(folio) > 1 &&
+	VM_WARN_ON_FOLIO(folio_large_mapcount(folio) > 1 &&
 			 PageAnonExclusive(&folio->page), folio);
 }
 
@@ -3057,7 +3056,6 @@ void hugetlb_add_new_anon_rmap(struct folio *folio,
 
 	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	/* increment count (starts at -1) */
-	atomic_set(&folio->_entire_mapcount, 0);
 	atomic_set(&folio->_large_mapcount, 0);
 	folio_clear_hugetlb_restore_reserve(folio);
 	__folio_set_anon(folio, vma, address, true);

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 12/13] mm/rmap: large mapcount interface cleanups
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (10 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 11/13] mm/rmap: stop using the entire mapcount for hugetlb folios David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  2026-04-12 18:59 ` [PATCH RFC 13/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

Let's prepare for passing another counter by renaming the "diff"/"mapcount"
parameters to "nr_mappings" and switching them to "unsigned int".

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/rmap.h | 61 ++++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b81b1d9e1eaa..5a02ffd3744a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -133,10 +133,10 @@ static inline void folio_set_mm_id(struct folio *folio, int idx, mm_id_t id)
 }
 
 static inline void __folio_large_mapcount_sanity_checks(const struct folio *folio,
-		int diff, mm_id_t mm_id)
+		unsigned int nr_mappings, mm_id_t mm_id)
 {
 	VM_WARN_ON_ONCE(!folio_test_large(folio) || folio_test_hugetlb(folio));
-	VM_WARN_ON_ONCE(diff <= 0);
+	VM_WARN_ON_ONCE(nr_mappings == 0);
 	VM_WARN_ON_ONCE(mm_id < MM_ID_MIN || mm_id > MM_ID_MAX);
 
 	/*
@@ -145,7 +145,7 @@ static inline void __folio_large_mapcount_sanity_checks(const struct folio *foli
 	 * a check on 32bit, where we currently reduce the size of the per-MM
 	 * mapcount to a short.
 	 */
-	VM_WARN_ON_ONCE(diff > folio_large_nr_pages(folio));
+	VM_WARN_ON_ONCE(nr_mappings > folio_large_nr_pages(folio));
 	VM_WARN_ON_ONCE(folio_large_nr_pages(folio) - 1 > MM_ID_MAPCOUNT_MAX);
 
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) == MM_ID_DUMMY &&
@@ -161,29 +161,29 @@ static inline void __folio_large_mapcount_sanity_checks(const struct folio *foli
 }
 
 static __always_inline void folio_set_large_mapcount(struct folio *folio,
-		int mapcount, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
-	__folio_large_mapcount_sanity_checks(folio, mapcount, vma->vm_mm->mm_id);
+	__folio_large_mapcount_sanity_checks(folio, nr_mappings, vma->vm_mm->mm_id);
 
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != MM_ID_DUMMY);
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 1) != MM_ID_DUMMY);
 
 	/* Note: mapcounts start at -1. */
-	atomic_set(&folio->_large_mapcount, mapcount - 1);
-	folio->_mm_id_mapcount[0] = mapcount - 1;
+	atomic_set(&folio->_large_mapcount, nr_mappings - 1);
+	folio->_mm_id_mapcount[0] = nr_mappings - 1;
 	folio_set_mm_id(folio, 0, vma->vm_mm->mm_id);
 }
 
 static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
-		int diff, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
 	const mm_id_t mm_id = vma->vm_mm->mm_id;
 	int new_mapcount_val;
 
 	folio_lock_large_mapcount(folio);
-	__folio_large_mapcount_sanity_checks(folio, diff, mm_id);
+	__folio_large_mapcount_sanity_checks(folio, nr_mappings, mm_id);
 
-	new_mapcount_val = atomic_read(&folio->_large_mapcount) + diff;
+	new_mapcount_val = atomic_read(&folio->_large_mapcount) + nr_mappings;
 	atomic_set(&folio->_large_mapcount, new_mapcount_val);
 
 	/*
@@ -194,14 +194,14 @@ static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
 	 * we might be in trouble when unmapping pages later.
 	 */
 	if (folio_mm_id(folio, 0) == mm_id) {
-		folio->_mm_id_mapcount[0] += diff;
+		folio->_mm_id_mapcount[0] += nr_mappings;
 		if (!IS_ENABLED(CONFIG_64BIT) && unlikely(folio->_mm_id_mapcount[0] < 0)) {
 			folio->_mm_id_mapcount[0] = -1;
 			folio_set_mm_id(folio, 0, MM_ID_DUMMY);
 			folio->_mm_ids |= FOLIO_MM_IDS_SHARED_BIT;
 		}
 	} else if (folio_mm_id(folio, 1) == mm_id) {
-		folio->_mm_id_mapcount[1] += diff;
+		folio->_mm_id_mapcount[1] += nr_mappings;
 		if (!IS_ENABLED(CONFIG_64BIT) && unlikely(folio->_mm_id_mapcount[1] < 0)) {
 			folio->_mm_id_mapcount[1] = -1;
 			folio_set_mm_id(folio, 1, MM_ID_DUMMY);
@@ -209,13 +209,13 @@ static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
 		}
 	} else if (folio_mm_id(folio, 0) == MM_ID_DUMMY) {
 		folio_set_mm_id(folio, 0, mm_id);
-		folio->_mm_id_mapcount[0] = diff - 1;
+		folio->_mm_id_mapcount[0] = nr_mappings - 1;
 		/* We might have other mappings already. */
-		if (new_mapcount_val != diff - 1)
+		if (new_mapcount_val != nr_mappings - 1)
 			folio->_mm_ids |= FOLIO_MM_IDS_SHARED_BIT;
 	} else if (folio_mm_id(folio, 1) == MM_ID_DUMMY) {
 		folio_set_mm_id(folio, 1, mm_id);
-		folio->_mm_id_mapcount[1] = diff - 1;
+		folio->_mm_id_mapcount[1] = nr_mappings - 1;
 		/* Slot 0 certainly has mappings as well. */
 		folio->_mm_ids |= FOLIO_MM_IDS_SHARED_BIT;
 	}
@@ -225,15 +225,15 @@ static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
 #define folio_add_large_mapcount folio_add_return_large_mapcount
 
 static __always_inline int folio_sub_return_large_mapcount(struct folio *folio,
-		int diff, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
 	const mm_id_t mm_id = vma->vm_mm->mm_id;
 	int new_mapcount_val;
 
 	folio_lock_large_mapcount(folio);
-	__folio_large_mapcount_sanity_checks(folio, diff, mm_id);
+	__folio_large_mapcount_sanity_checks(folio, nr_mappings, mm_id);
 
-	new_mapcount_val = atomic_read(&folio->_large_mapcount) - diff;
+	new_mapcount_val = atomic_read(&folio->_large_mapcount) - nr_mappings;
 	atomic_set(&folio->_large_mapcount, new_mapcount_val);
 
 	/*
@@ -243,13 +243,13 @@ static __always_inline int folio_sub_return_large_mapcount(struct folio *folio,
 	 * negative.
 	 */
 	if (folio_mm_id(folio, 0) == mm_id) {
-		folio->_mm_id_mapcount[0] -= diff;
+		folio->_mm_id_mapcount[0] -= nr_mappings;
 		if (folio->_mm_id_mapcount[0] >= 0)
 			goto out;
 		folio->_mm_id_mapcount[0] = -1;
 		folio_set_mm_id(folio, 0, MM_ID_DUMMY);
 	} else if (folio_mm_id(folio, 1) == mm_id) {
-		folio->_mm_id_mapcount[1] -= diff;
+		folio->_mm_id_mapcount[1] -= nr_mappings;
 		if (folio->_mm_id_mapcount[1] >= 0)
 			goto out;
 		folio->_mm_id_mapcount[1] = -1;
@@ -275,35 +275,36 @@ static __always_inline int folio_sub_return_large_mapcount(struct folio *folio,
  * See __folio_rmap_sanity_checks(), we might map large folios even without
  * CONFIG_TRANSPARENT_HUGEPAGE. We'll keep that working for now.
  */
-static inline void folio_set_large_mapcount(struct folio *folio, int mapcount,
+static inline void folio_set_large_mapcount(struct folio *folio,
+		unsigned int nr_mappings,
 		struct vm_area_struct *vma)
 {
 	/* Note: mapcounts start at -1. */
-	atomic_set(&folio->_large_mapcount, mapcount - 1);
+	atomic_set(&folio->_large_mapcount, nr_mappings - 1);
 }
 
 static inline void folio_add_large_mapcount(struct folio *folio,
-		int diff, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
-	atomic_add(diff, &folio->_large_mapcount);
+	atomic_add(nr_mappings, &folio->_large_mapcount);
 }
 
 static inline int folio_add_return_large_mapcount(struct folio *folio,
-		int diff, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
-	return atomic_add_return(diff, &folio->_large_mapcount) + 1;
+	return atomic_add_return(nr_mappings, &folio->_large_mapcount) + 1;
 }
 
 static inline void folio_sub_large_mapcount(struct folio *folio,
-		int diff, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
-	atomic_sub(diff, &folio->_large_mapcount);
+	atomic_sub(nr_mappings, &folio->_large_mapcount);
 }
 
 static inline int folio_sub_return_large_mapcount(struct folio *folio,
-		int diff, struct vm_area_struct *vma)
+		unsigned int nr_mappings, struct vm_area_struct *vma)
 {
-	return atomic_sub_return(diff, &folio->_large_mapcount) + 1;
+	return atomic_sub_return(nr_mappings, &folio->_large_mapcount) + 1;
 }
 #endif /* CONFIG_MM_ID */
 

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 13/13] mm/rmap: support arbitrary folio mappings
  2026-04-12 18:59 [PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings David Hildenbrand (Arm)
                   ` (11 preceding siblings ...)
  2026-04-12 18:59 ` [PATCH RFC 12/13] mm/rmap: large mapcount interface cleanups David Hildenbrand (Arm)
@ 2026-04-12 18:59 ` David Hildenbrand (Arm)
  12 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-12 18:59 UTC (permalink / raw)
  To: Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Andrew Morton, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Rik van Riel, Harry Yoo,
	Jann Horn, Brendan Jackman, Zi Yan, Pedro Falcato,
	Matthew Wilcox
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, linux-fsdevel,
	David Hildenbrand (Arm)

Let's replace the entire mapcount by the sum of mapped pages ("total mapped
pages"), which we update alongside the mapcount under the large mapcount
lock.

This allows for teaching all rmap code to support arbitrary folio
mappings: a PUD-sized folio being mapped by PMDs and PTEs, or mappings of
folios that span multiple PMDs/PUDs. Note that calling code still has to
be updated to support that.

For example, a PMD-sized large folio with 512 pages that is mapped
through 2 PMDs and a single PTE has mapcount == 3 and 1025 total mapped
pages.
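The arithmetic in that example can be sketched in plain C (simple counters stand in for the kernel's atomics; names are illustrative):

```c
#include <assert.h>

/* Plain counters stand in for the kernel's atomic folio counters. */
struct folio_counts {
	int mapcount;				/* number of mappings */
	unsigned long total_mapped_pages;	/* pages summed over all mappings */
};

/* One mapping of nr_pages pages: a PTE maps 1 page, a PMD maps 512. */
static void map_pages(struct folio_counts *f, unsigned int nr_pages)
{
	f->mapcount += 1;
	f->total_mapped_pages += nr_pages;
}

static void unmap_pages(struct folio_counts *f, unsigned int nr_pages)
{
	f->mapcount -= 1;
	f->total_mapped_pages -= nr_pages;
}
```

Both counters are updated together (in the kernel, under the large mapcount lock), so a reader always sees a consistent pair.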

Calculating folio_average_page_mapcount() is now trivial. Provide a
new helper folio_total_mapped_pages() for that purpose. Similarly,
detecting certainly partially mapped folios in
__folio_certainly_partially_mapped() when unmapping is now trivial.
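The simplified averaging matches the fs/proc/internal.h hunk in this patch; a minimal standalone sketch (not the kernel function itself):

```c
#include <assert.h>

/*
 * Divide total_mapped_pages by 2^order, rounding to the nearest integer,
 * but never report 0 while anything is mapped.
 */
static unsigned long avg_page_mapcount(unsigned long total_mapped_pages,
				       unsigned int order)
{
	unsigned long avg;

	if (!total_mapped_pages || !order)
		return total_mapped_pages;

	/* Round to the closest integer ... */
	avg = (total_mapped_pages + (1ul << (order - 1))) >> order;
	/* ... but return at least 1. */
	return avg ? avg : 1;
}
```

For the example above (order 9, 1025 total mapped pages) this yields 2, while a folio with only a single PTE mapped still reports 1 rather than 0.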

Pass another parameter ("nr_pages") to the large mapcount helpers, so that
they update the new folio->_total_mapped_pages counter atomically with the
mapcount and return the new value alongside the new mapcount.

We can keep maintaining the PMD statistics for PMD-sized THPs
(e.g., AnonHugePages) based on the new mapcount and the new total mapped
pages quite neatly, without the need for an additional pmd mapcount.

This all cleans up the code nicely. Introduce pgtable_level_to_order() to
easily convert from a pgtable_level to the mapping order, so we can derive
the number of mapped pages for a given page table level.

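The level-to-order conversion can be sketched in userspace, assuming common x86-64 4K-page constants (PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30); real kernels define these per-architecture:

```c
#include <assert.h>

enum pgtable_level {
	PGTABLE_LEVEL_PTE,
	PGTABLE_LEVEL_PMD,
	PGTABLE_LEVEL_PUD,
};

/* Assumed x86-64 4K-page constants; per-architecture in the kernel. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30

/* The order of a mapping is its level's shift minus PAGE_SHIFT. */
static unsigned int pgtable_level_to_order(enum pgtable_level level)
{
	switch (level) {
	case PGTABLE_LEVEL_PTE:
		return 0;
	case PGTABLE_LEVEL_PMD:
		return PMD_SHIFT - PAGE_SHIFT;	/* 9: maps 512 pages */
	case PGTABLE_LEVEL_PUD:
		return PUD_SHIFT - PAGE_SHIFT;	/* 18: maps 262144 pages */
	}
	return 0;
}
```

With this, rmap code can compute the nr_pages contribution of a mapping as 1 << pgtable_level_to_order(level).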
Is an unsigned long for "total mapped pages" sufficient on 32bit? Maybe
not, but it is a similar problem to an "int" being insufficient to store
the mapcount on 64bit (and likely on 32bit) when triggering many PTE
mappings. Likely, for the time being, we might just want to prevent
overflowing both of these counters by teaching rmap code to fail early, or
letting calling code do some opportunistic checks: we don't expect current
reasonable use cases to overflow these counters.

Note that the !CONFIG_MM_ID implementation only exists for cases where
rmap code is called with a large folio even though THPs are not
supported by the kernel config: PMD/PUD mappings are impossible in such
configurations, and proper large folios are not possible. In the future,
we will remove this code entirely, as these pages are not actual folios,
and we can just enable CONFIG_MM_ID in all configurations.

No functional change intended.

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/mm/transhuge.rst |   5 +-
 fs/proc/internal.h             |  22 ++---
 include/linux/mm.h             |  33 +++++---
 include/linux/mm_types.h       |   6 +-
 include/linux/pgtable.h        |  22 +++++
 include/linux/rmap.h           | 184 +++++++++++++++++++----------------------
 mm/debug.c                     |   4 +-
 mm/internal.h                  |   2 +-
 mm/memory.c                    |   3 +-
 mm/page_alloc.c                |   4 +-
 mm/rmap.c                      | 165 ++++++++++++++++--------------------
 11 files changed, 214 insertions(+), 236 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index eb5ac076e4c6..76d3413a5b6b 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -116,14 +116,15 @@ pages:
     succeeds on tail pages.
 
   - map/unmap of a PMD entry for the whole THP increment/decrement
-    folio->_entire_mapcount and folio->_large_mapcount.
+    folio->_large_mapcount and add/remove HPAGE_PMD_NR to
+    folio->_total_mapped_pages.
 
     We also maintain the two slots for tracking MM owners (MM ID and
     corresponding mapcount), and the current status ("maybe mapped shared" vs.
     "mapped exclusively").
 
   - map/unmap of individual pages with PTE entry increment/decrement
-    folio->_large_mapcount.
+    folio->_total_mapped_pages and folio->_large_mapcount.
 
     We also maintain the two slots for tracking MM owners (MM ID and
     corresponding mapcount), and the current status ("maybe mapped shared" vs.
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 1dd46e55c850..fae901769529 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -178,26 +178,16 @@ unsigned name_to_int(const struct qstr *qstr);
  */
 static inline int folio_average_page_mapcount(struct folio *folio)
 {
-	int mapcount, entire_mapcount, avg;
+	unsigned long total_mapped_pages = folio_total_mapped_pages(folio);
+	const unsigned int order = folio_order(folio);
 
-	if (!folio_test_large(folio))
-		return atomic_read(&folio->_mapcount) + 1;
-
-	mapcount = folio_large_mapcount(folio);
-	if (unlikely(mapcount <= 0))
-		return 0;
-	if (folio_test_hugetlb(folio))
-		return mapcount;
-
-	entire_mapcount = folio_entire_mapcount(folio);
-	if (mapcount <= entire_mapcount)
-		return entire_mapcount;
-	mapcount -= entire_mapcount;
+	if (!total_mapped_pages || !order)
+		return total_mapped_pages;
 
 	/* Round to closest integer ... */
-	avg = ((unsigned int)mapcount + folio_large_nr_pages(folio) / 2) >> folio_large_order(folio);
+	total_mapped_pages += 1ul << (order - 1);
 	/* ... but return at least 1. */
-	return max_t(int, avg + entire_mapcount, 1);
+	return max(total_mapped_pages >> order, 1ul);
 }
 /*
  * array.c
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3092db64a009..b1c55e0cd317 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1825,19 +1825,6 @@ static inline int is_vmalloc_or_module_addr(const void *x)
 }
 #endif
 
-/*
- * How many times the entire folio is mapped as a single unit (eg by a
- * PMD or PUD entry).  This is probably not what you want, except for
- * debugging purposes or implementation of other core folio_*() primitives.
- *
- * Always 0 for hugetlb folios.
- */
-static inline int folio_entire_mapcount(const struct folio *folio)
-{
-	VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
-	return atomic_read(&folio->_entire_mapcount) + 1;
-}
-
 static inline int folio_large_mapcount(const struct folio *folio)
 {
 	VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
@@ -1888,6 +1875,26 @@ static inline bool folio_mapped(const struct folio *folio)
 	return folio_mapcount(folio) >= 1;
 }
 
+/**
+ * folio_total_mapped_pages - total mapped pages across all mappings
+ * @folio: The folio.
+ *
+ * Return the total number of pages mapped by all mappings of this folio.
+ * A page mapped multiple times is counted multiple times.
+ *
+ * For example, a single folio mapped through two PMD-sized mappings will
+ * contribute 1024 pages to the total on systems where a PMD maps 512 pages.
+ *
+ * Return: Total number of mapped pages across all mappings of @folio.
+ */
+static inline unsigned long folio_total_mapped_pages(const struct folio *folio)
+{
+	if (!folio_test_large(folio) || folio_test_hugetlb(folio) ||
+	    !IS_ENABLED(CONFIG_MM_ID))
+		return folio_mapcount(folio);
+	return atomic_long_read(&folio->_total_mapped_pages);
+}
+
 /*
  * Return true if this page is mapped into pagetables.
  * For compound page it returns true if any sub-page of compound page is mapped,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 450f61cad678..93e05c4fd7b3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -374,10 +374,9 @@ typedef unsigned short mm_id_t;
  * @pgmap: Metadata for ZONE_DEVICE mappings
  * @virtual: Virtual address in the kernel direct map.
  * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
- * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
+ * @_total_mapped_pages: Do not use directly, call folio_total_mapped_pages().
  * @_large_mapcount: Do not use directly, call folio_mapcount().
  * @_unused_1: Temporary placeholder.
- * @_unused_2: Temporary placeholder.
  * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
  * @_nr_pages: Do not use directly, call folio_nr_pages().
  * @_mm_id: Do not use outside of rmap code.
@@ -452,11 +451,10 @@ struct folio {
 				struct {
 	/* public: */
 					atomic_t _pincount;
-					atomic_t _entire_mapcount;
 #ifdef CONFIG_64BIT
 					unsigned int _unused_1;
-					unsigned int _unused_2;
 #endif /* CONFIG_64BIT */
+					atomic_long_t _total_mapped_pages;
 					mm_id_mapcount_t _mm_id_mapcount[2];
 					union {
 						mm_id_t _mm_id[2];
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index cdd68ed3ae1a..2351205d9076 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -7,6 +7,8 @@
 
 #define PMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
 #define PUD_ORDER	(PUD_SHIFT - PAGE_SHIFT)
+#define P4D_ORDER	(P4D_SHIFT - PAGE_SHIFT)
+#define PGDIR_ORDER	(PGDIR_SHIFT - PAGE_SHIFT)
 
 #ifndef __ASSEMBLY__
 #ifdef CONFIG_MMU
@@ -2243,6 +2245,26 @@ static inline const char *pgtable_level_to_str(enum pgtable_level level)
 	}
 }
 
+#ifdef CONFIG_MMU
+static __always_inline unsigned int pgtable_level_to_order(enum pgtable_level level)
+{
+	switch (level) {
+	case PGTABLE_LEVEL_PTE:
+		return 0;
+	case PGTABLE_LEVEL_PMD:
+		return PMD_ORDER;
+	case PGTABLE_LEVEL_PUD:
+		return PUD_ORDER;
+	case PGTABLE_LEVEL_P4D:
+		return P4D_ORDER;
+	case PGTABLE_LEVEL_PGD:
+		return PGDIR_ORDER;
+	default:
+		BUILD_BUG();
+	}
+}
+#endif /* CONFIG_MMU */
+
 #endif /* !__ASSEMBLY__ */
 
 #if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 5a02ffd3744a..a71cdd706c7e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -133,10 +133,10 @@ static inline void folio_set_mm_id(struct folio *folio, int idx, mm_id_t id)
 }
 
 static inline void __folio_large_mapcount_sanity_checks(const struct folio *folio,
-		unsigned int nr_mappings, mm_id_t mm_id)
+		unsigned int nr_mappings, unsigned int nr_pages, mm_id_t mm_id)
 {
 	VM_WARN_ON_ONCE(!folio_test_large(folio) || folio_test_hugetlb(folio));
-	VM_WARN_ON_ONCE(nr_mappings == 0);
+	VM_WARN_ON_ONCE(nr_mappings == 0 || nr_pages == 0 || nr_mappings > nr_pages);
 	VM_WARN_ON_ONCE(mm_id < MM_ID_MIN || mm_id > MM_ID_MAX);
 
 	/*
@@ -145,7 +145,7 @@ static inline void __folio_large_mapcount_sanity_checks(const struct folio *foli
 	 * a check on 32bit, where we currently reduce the size of the per-MM
 	 * mapcount to a short.
 	 */
-	VM_WARN_ON_ONCE(nr_mappings > folio_large_nr_pages(folio));
+	VM_WARN_ON_ONCE(nr_pages > folio_large_nr_pages(folio));
 	VM_WARN_ON_ONCE(folio_large_nr_pages(folio) - 1 > MM_ID_MAPCOUNT_MAX);
 
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) == MM_ID_DUMMY &&
@@ -161,31 +161,38 @@ static inline void __folio_large_mapcount_sanity_checks(const struct folio *foli
 }
 
 static __always_inline void folio_set_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
+		unsigned int nr_mappings, unsigned int nr_pages, struct vm_area_struct *vma)
 {
-	__folio_large_mapcount_sanity_checks(folio, nr_mappings, vma->vm_mm->mm_id);
+	__folio_large_mapcount_sanity_checks(folio, nr_mappings, nr_pages,
+					     vma->vm_mm->mm_id);
 
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != MM_ID_DUMMY);
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 1) != MM_ID_DUMMY);
 
 	/* Note: mapcounts start at -1. */
 	atomic_set(&folio->_large_mapcount, nr_mappings - 1);
+	atomic_long_set(&folio->_total_mapped_pages, nr_pages);
 	folio->_mm_id_mapcount[0] = nr_mappings - 1;
 	folio_set_mm_id(folio, 0, vma->vm_mm->mm_id);
 }
 
 static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
+		unsigned int nr_mappings, unsigned int nr_pages,
+		struct vm_area_struct *vma, unsigned long *nr_mapped_pages)
 {
 	const mm_id_t mm_id = vma->vm_mm->mm_id;
+	unsigned long new_mapped_pages;
 	int new_mapcount_val;
 
 	folio_lock_large_mapcount(folio);
-	__folio_large_mapcount_sanity_checks(folio, nr_mappings, mm_id);
+	__folio_large_mapcount_sanity_checks(folio, nr_mappings, nr_pages, mm_id);
 
 	new_mapcount_val = atomic_read(&folio->_large_mapcount) + nr_mappings;
 	atomic_set(&folio->_large_mapcount, new_mapcount_val);
 
+	new_mapped_pages = atomic_long_read(&folio->_total_mapped_pages) + nr_pages;
+	atomic_long_set(&folio->_total_mapped_pages, new_mapped_pages);
+
 	/*
 	 * If a folio is mapped more than once into an MM on 32bit, we
 	 * can in theory overflow the per-MM mapcount (although only for
@@ -220,22 +227,38 @@ static __always_inline int folio_add_return_large_mapcount(struct folio *folio,
 		folio->_mm_ids |= FOLIO_MM_IDS_SHARED_BIT;
 	}
 	folio_unlock_large_mapcount(folio);
+
+	*nr_mapped_pages = new_mapped_pages;
 	return new_mapcount_val + 1;
 }
-#define folio_add_large_mapcount folio_add_return_large_mapcount
+
+static __always_inline void folio_add_large_mapcount(struct folio *folio,
+		unsigned int nr_mappings, unsigned int nr_pages,
+		struct vm_area_struct *vma)
+{
+	unsigned long nr_mapped_pages;
+
+	folio_add_return_large_mapcount(folio, nr_mappings, nr_pages, vma,
+					&nr_mapped_pages);
+}
 
 static __always_inline int folio_sub_return_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
+		unsigned int nr_mappings, unsigned int nr_pages,
+		struct vm_area_struct *vma, unsigned long *nr_mapped_pages)
 {
 	const mm_id_t mm_id = vma->vm_mm->mm_id;
+	unsigned long new_mapped_pages;
 	int new_mapcount_val;
 
 	folio_lock_large_mapcount(folio);
-	__folio_large_mapcount_sanity_checks(folio, nr_mappings, mm_id);
+	__folio_large_mapcount_sanity_checks(folio, nr_mappings, nr_pages, mm_id);
 
 	new_mapcount_val = atomic_read(&folio->_large_mapcount) - nr_mappings;
 	atomic_set(&folio->_large_mapcount, new_mapcount_val);
 
+	new_mapped_pages = atomic_long_read(&folio->_total_mapped_pages) - nr_pages;
+	atomic_long_set(&folio->_total_mapped_pages, new_mapped_pages);
+
 	/*
 	 * There are valid corner cases where we might underflow a per-MM
 	 * mapcount (some mappings added when no slot was free, some mappings
@@ -267,56 +290,59 @@ static __always_inline int folio_sub_return_large_mapcount(struct folio *folio,
 		folio->_mm_ids &= ~FOLIO_MM_IDS_SHARED_BIT;
 out:
 	folio_unlock_large_mapcount(folio);
+
+	*nr_mapped_pages = new_mapped_pages;
 	return new_mapcount_val + 1;
 }
-#define folio_sub_large_mapcount folio_sub_return_large_mapcount
 #else /* !CONFIG_MM_ID */
 /*
  * See __folio_rmap_sanity_checks(), we might map large folios even without
  * CONFIG_TRANSPARENT_HUGEPAGE. We'll keep that working for now.
  */
 static inline void folio_set_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings,
+		unsigned int nr_mappings, unsigned int nr_pages,
 		struct vm_area_struct *vma)
 {
+	/* No support for large mappings. */
+	VM_WARN_ON_ONCE(nr_mappings != nr_pages);
 	/* Note: mapcounts start at -1. */
 	atomic_set(&folio->_large_mapcount, nr_mappings - 1);
 }
 
 static inline void folio_add_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
+		unsigned int nr_mappings, unsigned int nr_pages,
+		struct vm_area_struct *vma)
 {
+	/* No support for large mappings. */
+	VM_WARN_ON_ONCE(nr_mappings != nr_pages);
 	atomic_add(nr_mappings, &folio->_large_mapcount);
 }
 
 static inline int folio_add_return_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
+		unsigned int nr_mappings, unsigned int nr_pages,
+		struct vm_area_struct *vma, unsigned long *nr_mapped_pages)
 {
-	return atomic_add_return(nr_mappings, &folio->_large_mapcount) + 1;
-}
+	int new_mapcount = atomic_add_return(nr_mappings, &folio->_large_mapcount) + 1;
 
-static inline void folio_sub_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
-{
-	atomic_sub(nr_mappings, &folio->_large_mapcount);
+	/* No support for large mappings. */
+	VM_WARN_ON_ONCE(nr_mappings != nr_pages);
+	*nr_mapped_pages = new_mapcount;
+	return new_mapcount;
 }
 
 static inline int folio_sub_return_large_mapcount(struct folio *folio,
-		unsigned int nr_mappings, struct vm_area_struct *vma)
+		unsigned int nr_mappings, unsigned int nr_pages,
+		struct vm_area_struct *vma, unsigned long *nr_mapped_pages)
 {
-	return atomic_sub_return(nr_mappings, &folio->_large_mapcount) + 1;
+	int new_mapcount = atomic_sub_return(nr_mappings, &folio->_large_mapcount) + 1;
+
+	/* No support for large mappings. */
+	VM_WARN_ON_ONCE(nr_mappings != nr_pages);
+	*nr_mapped_pages = new_mapcount;
+	return new_mapcount;
 }
 #endif /* CONFIG_MM_ID */
 
-#define folio_inc_large_mapcount(folio, vma) \
-	folio_add_large_mapcount(folio, 1, vma)
-#define folio_inc_return_large_mapcount(folio, vma) \
-	folio_add_return_large_mapcount(folio, 1, vma)
-#define folio_dec_large_mapcount(folio, vma) \
-	folio_sub_large_mapcount(folio, 1, vma)
-#define folio_dec_return_large_mapcount(folio, vma) \
-	folio_sub_return_large_mapcount(folio, 1, vma)
-
 /* RMAP flags, currently only relevant for some anon rmap operations. */
 typedef int __bitwise rmap_t;
 
@@ -332,6 +358,8 @@ typedef int __bitwise rmap_t;
 static __always_inline void __folio_rmap_sanity_checks(const struct folio *folio,
 		const struct page *page, int nr_pages, enum pgtable_level level)
 {
+	const unsigned int mapping_order = pgtable_level_to_order(level);
+
 	/* hugetlb folios are handled separately. */
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 
@@ -351,29 +379,8 @@ static __always_inline void __folio_rmap_sanity_checks(const struct folio *folio
 	VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
 	VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
 
-	switch (level) {
-	case PGTABLE_LEVEL_PTE:
-		break;
-	case PGTABLE_LEVEL_PMD:
-		/*
-		 * We don't support folios larger than a single PMD yet. So
-		 * when PGTABLE_LEVEL_PMD is set, we assume that we are creating
-		 * a single "entire" mapping of the folio.
-		 */
-		VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
-		VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
-		break;
-	case PGTABLE_LEVEL_PUD:
-		/*
-		 * Assume that we are creating a single "entire" mapping of the
-		 * folio.
-		 */
-		VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PUD_NR, folio);
-		VM_WARN_ON_FOLIO(nr_pages != HPAGE_PUD_NR, folio);
-		break;
-	default:
-		BUILD_BUG();
-	}
+	VM_WARN_ON_FOLIO(!IS_ALIGNED(nr_pages, 1u << mapping_order), folio);
+	VM_WARN_ON_FOLIO(!IS_ALIGNED(folio_page_idx(folio, page), 1u << mapping_order), folio);
 
 	/*
 	 * Anon folios must have an associated live anon_vma as long as they're
@@ -491,25 +498,14 @@ static __always_inline void __folio_dup_file_rmap(struct folio *folio,
 		struct page *page, int nr_pages, struct vm_area_struct *dst_vma,
 		enum pgtable_level level)
 {
-	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
+	const unsigned int nr_mappings = nr_pages >> pgtable_level_to_order(level);
 
-	switch (level) {
-	case PGTABLE_LEVEL_PTE:
-		if (!folio_test_large(folio)) {
-			atomic_inc(&folio->_mapcount);
-			break;
-		}
+	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
-		folio_add_large_mapcount(folio, nr_pages, dst_vma);
-		break;
-	case PGTABLE_LEVEL_PMD:
-	case PGTABLE_LEVEL_PUD:
-		atomic_inc(&folio->_entire_mapcount);
-		folio_inc_large_mapcount(folio, dst_vma);
-		break;
-	default:
-		BUILD_BUG();
-	}
+	if (level == PGTABLE_LEVEL_PTE && !folio_test_large(folio))
+		atomic_inc(&folio->_mapcount);
+	else
+		folio_add_large_mapcount(folio, nr_mappings, nr_pages, dst_vma);
 }
 
 /**
@@ -559,7 +555,6 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
 		struct page *page, int nr_pages, struct vm_area_struct *dst_vma,
 		struct vm_area_struct *src_vma, enum pgtable_level level)
 {
-	const int orig_nr_pages = nr_pages;
 	bool maybe_pinned;
 	int i;
 
@@ -581,39 +576,28 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
 	 * folio. But if any page is PageAnonExclusive, we must fallback to
 	 * copying if the folio maybe pinned.
 	 */
-	switch (level) {
-	case PGTABLE_LEVEL_PTE:
-		if (unlikely(maybe_pinned)) {
-			for (i = 0; i < nr_pages; i++)
-				if (PageAnonExclusive(page + i))
-					return -EBUSY;
-		}
-
-		if (!folio_test_large(folio)) {
-			if (PageAnonExclusive(page))
-				ClearPageAnonExclusive(page);
-			atomic_inc(&folio->_mapcount);
-			break;
-		}
-
-		do {
-			if (PageAnonExclusive(page))
-				ClearPageAnonExclusive(page);
-		} while (page++, --nr_pages > 0);
-		folio_add_large_mapcount(folio, orig_nr_pages, dst_vma);
-		break;
-	case PGTABLE_LEVEL_PMD:
-	case PGTABLE_LEVEL_PUD:
+	if (level == PGTABLE_LEVEL_PTE && !folio_test_large(folio)) {
 		if (PageAnonExclusive(page)) {
 			if (unlikely(maybe_pinned))
 				return -EBUSY;
 			ClearPageAnonExclusive(page);
 		}
-		atomic_inc(&folio->_entire_mapcount);
-		folio_inc_large_mapcount(folio, dst_vma);
-		break;
-	default:
-		BUILD_BUG();
-	}
+		atomic_inc(&folio->_mapcount);
+	} else {
+		const unsigned int mapping_order = pgtable_level_to_order(level);
+		const unsigned int nr_mappings = nr_pages >> mapping_order;
+
+		if (unlikely(maybe_pinned)) {
+			for (i = 0; i < nr_pages; i += 1u << mapping_order)
+				if (PageAnonExclusive(page + i))
+					return -EBUSY;
+		} else {
+			for (i = 0; i < nr_pages; i += 1u << mapping_order) {
+				if (PageAnonExclusive(page + i))
+					ClearPageAnonExclusive(page + i);
+			}
+		}
+		folio_add_large_mapcount(folio, nr_mappings, nr_pages, dst_vma);
 	}
 	return 0;
 }
diff --git a/mm/debug.c b/mm/debug.c
index 82baaf87ef3d..15d3cb9c1cb0 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -83,10 +83,10 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
 	if (folio_test_large(folio)) {
 		int pincount = atomic_read(&folio->_pincount);
 
-		pr_warn("head: order:%u mapcount:%d entire_mapcount:%d pincount:%d\n",
+		pr_warn("head: order:%u mapcount:%d total_mapped_pages:%lu pincount:%d\n",
 				folio_order(folio),
 				folio_mapcount(folio),
-				folio_entire_mapcount(folio),
+				folio_total_mapped_pages(folio),
 				pincount);
 	}
 
diff --git a/mm/internal.h b/mm/internal.h
index aa1206495bc6..d4d74f614e7f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -868,7 +868,7 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 		folio->_mm_id_mapcount[0] = -1;
 		folio->_mm_id_mapcount[1] = -1;
 	}
-	atomic_set(&folio->_entire_mapcount, -1);
+	atomic_long_set(&folio->_total_mapped_pages, 0);
 	atomic_set(&folio->_pincount, 0);
 	if (order > 1)
 		INIT_LIST_HEAD(&folio->_deferred_list);
diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..6a3e0eed29cc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4139,8 +4139,7 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
 	if (folio_large_mapcount(folio) != folio_ref_count(folio))
 		goto unlock;
 
-	VM_WARN_ON_ONCE_FOLIO(folio_large_mapcount(folio) > folio_nr_pages(folio), folio);
-	VM_WARN_ON_ONCE_FOLIO(folio_entire_mapcount(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(folio_total_mapped_pages(folio) > folio_nr_pages(folio), folio);
 	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != vma->vm_mm->mm_id &&
 			folio_mm_id(folio, 1) != vma->vm_mm->mm_id);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ed4c73fdba4..43000d869215 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1121,8 +1121,8 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 				goto out;
 			}
 		}
-		if (folio_entire_mapcount(folio)) {
-			bad_page(page, "nonzero entire_mapcount");
+		if (unlikely(atomic_long_read(&folio->_total_mapped_pages))) {
+			bad_page(page, "nonzero total_mapped_pages");
 			goto out;
 		}
 		if (unlikely(atomic_read(&folio->_pincount))) {
diff --git a/mm/rmap.c b/mm/rmap.c
index d08927949284..47b144f6d3c7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1357,30 +1357,28 @@ static __always_inline void __folio_add_rmap(struct folio *folio,
 
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
-	switch (level) {
-	case PGTABLE_LEVEL_PTE:
-		if (!folio_test_large(folio)) {
-			nr = atomic_inc_and_test(&folio->_mapcount);
-			break;
-		}
+	if (level == PGTABLE_LEVEL_PTE && !folio_test_large(folio)) {
+		nr = atomic_inc_and_test(&folio->_mapcount);
+	} else {
+		const unsigned int nr_mappings = nr_pages >> pgtable_level_to_order(level);
+		unsigned long nr_mapped_pages;
 
-		mapcount = folio_add_return_large_mapcount(folio, nr_pages, vma);
-		if (mapcount == nr_pages)
+		mapcount = folio_add_return_large_mapcount(folio, nr_mappings,
+							   nr_pages, vma,
+							   &nr_mapped_pages);
+		if (mapcount == nr_mappings)
 			nr = folio_large_nr_pages(folio);
-		break;
-	case PGTABLE_LEVEL_PMD:
-	case PGTABLE_LEVEL_PUD:
-		if (atomic_inc_and_test(&folio->_entire_mapcount) &&
-		    level == PGTABLE_LEVEL_PMD)
-			nr_pmdmapped = HPAGE_PMD_NR;
 
-		mapcount = folio_inc_return_large_mapcount(folio, vma);
-		if (mapcount == 1)
-			nr = folio_large_nr_pages(folio);
-		break;
-	default:
-		BUILD_BUG();
+		/*
+		 * For PMD-sized THPs, we'll adjust the counter once the
+		 * first PMD mapping is added.
+		 */
+		if (level == PGTABLE_LEVEL_PMD &&
+		    folio_large_nr_pages(folio) == HPAGE_PMD_NR &&
+		    nr_mapped_pages - mapcount == nr_pages - nr_mappings)
+			nr_pmdmapped = HPAGE_PMD_NR;
 	}
+
 	__folio_mod_stat(folio, nr, nr_pmdmapped);
 }
 
@@ -1483,35 +1481,14 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
 		__page_check_anon_rmap(folio, page, vma, address);
 
 	if (flags & RMAP_EXCLUSIVE) {
-		switch (level) {
-		case PGTABLE_LEVEL_PTE:
-			for (i = 0; i < nr_pages; i++)
-				SetPageAnonExclusive(page + i);
-			break;
-		case PGTABLE_LEVEL_PMD:
-			SetPageAnonExclusive(page);
-			break;
-		case PGTABLE_LEVEL_PUD:
-			/*
-			 * Keep the compiler happy, we don't support anonymous
-			 * PUD mappings.
-			 */
-			WARN_ON_ONCE(1);
-			break;
-		default:
-			BUILD_BUG();
-		}
+		const unsigned int mapping_order = pgtable_level_to_order(level);
+
+		for (i = 0; i < nr_pages; i += 1u << mapping_order)
+			SetPageAnonExclusive(page + i);
 	}
 
 	VM_WARN_ON_FOLIO(!folio_test_large(folio) && PageAnonExclusive(page) &&
 			 atomic_read(&folio->_mapcount) > 0, folio);
-	for (i = 0; i < nr_pages; i++) {
-		struct page *cur_page = page + i;
-
-		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
-				 folio_entire_mapcount(folio) > 1 &&
-				 PageAnonExclusive(cur_page), folio);
-	}
 
 	/*
 	 * Only mlock it if the folio is fully mapped to the VMA.
@@ -1608,27 +1585,34 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		atomic_set(&folio->_mapcount, 0);
 		if (exclusive)
 			SetPageAnonExclusive(&folio->page);
-	} else if (!folio_test_pmd_mappable(folio)) {
+	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		const unsigned int order = folio_large_order(folio);
+		unsigned int nr_mappings, mapping_order;
 		int i;
 
+		if (order >= PUD_ORDER)
+			mapping_order = PUD_ORDER;
+		else if (order >= PMD_ORDER)
+			mapping_order = PMD_ORDER;
+		else
+			mapping_order = 0;
+
 		nr = folio_large_nr_pages(folio);
+		if (order == PMD_ORDER)
+			nr_pmdmapped = 1u << order;
+
 		if (exclusive) {
-			for (i = 0; i < nr; i++) {
+			for (i = 0; i < nr; i += 1u << mapping_order) {
 				struct page *page = folio_page(folio, i);
 
 				SetPageAnonExclusive(page);
 			}
 		}
 
-		folio_set_large_mapcount(folio, nr, vma);
+		nr_mappings = nr >> mapping_order;
+		folio_set_large_mapcount(folio, nr_mappings, nr, vma);
 	} else {
-		nr = folio_large_nr_pages(folio);
-		/* increment count (starts at -1) */
-		atomic_set(&folio->_entire_mapcount, 0);
-		folio_set_large_mapcount(folio, 1, vma);
-		if (exclusive)
-			SetPageAnonExclusive(&folio->page);
-		nr_pmdmapped = nr;
+		WARN_ON_ONCE(1);
 	}
 
 	VM_WARN_ON_ONCE(address < vma->vm_start ||
@@ -1714,8 +1698,10 @@ void folio_add_file_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }
 
-static bool __folio_certainly_partially_mapped(struct folio *folio, int mapcount)
+static bool __folio_certainly_partially_mapped(struct folio *folio)
 {
+	unsigned long total_mapped_pages = atomic_long_read(&folio->_total_mapped_pages);
+
 	/*
 	 * This is a best-effort check only: if the average per-page
 	 * mapcount in the folio is smaller than 1, at least one page is not
@@ -1727,8 +1713,7 @@ static bool __folio_certainly_partially_mapped(struct folio *folio, int mapcount
 	 * average per-page mapcount is >= 1. However, we will detect the
 	 * partial mapping once it becomes exclusively mapped again.
 	 */
-	return mapcount && !folio_entire_mapcount(folio) &&
-	       mapcount < folio_large_nr_pages(folio);
+	return total_mapped_pages && total_mapped_pages < folio_large_nr_pages(folio);
 }
 
 static __always_inline void __folio_remove_rmap(struct folio *folio,
@@ -1736,53 +1721,45 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 		enum pgtable_level level)
 {
 	int nr = 0, nr_pmdmapped = 0, mapcount;
-	bool partially_mapped = false;
 
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
-	switch (level) {
-	case PGTABLE_LEVEL_PTE:
-		if (!folio_test_large(folio)) {
-			nr = atomic_add_negative(-1, &folio->_mapcount);
-			break;
-		}
+	if (level == PGTABLE_LEVEL_PTE && !folio_test_large(folio)) {
+		nr = atomic_add_negative(-1, &folio->_mapcount);
+	} else {
+		const unsigned int nr_mappings = nr_pages >> pgtable_level_to_order(level);
+		unsigned long nr_mapped_pages;
 
-		mapcount = folio_sub_return_large_mapcount(folio, nr_pages, vma);
+		mapcount = folio_sub_return_large_mapcount(folio, nr_mappings,
+							   nr_pages, vma,
+							   &nr_mapped_pages);
 		if (!mapcount)
 			nr = folio_large_nr_pages(folio);
 
-		partially_mapped = __folio_certainly_partially_mapped(folio, mapcount);
-		break;
-	case PGTABLE_LEVEL_PMD:
-	case PGTABLE_LEVEL_PUD:
-		mapcount = folio_dec_return_large_mapcount(folio, vma);
-		if (!mapcount)
-			nr = folio_large_nr_pages(folio);
-
-		if (atomic_add_negative(-1, &folio->_entire_mapcount) &&
-		    level == PGTABLE_LEVEL_PMD)
+		/*
+		 * For PMD-sized THPs, we'll adjust the counter once the
+		 * last PMD mapping is removed.
+		 */
+		if (level == PGTABLE_LEVEL_PMD &&
+		    folio_large_nr_pages(folio) == HPAGE_PMD_NR &&
+		    nr_mapped_pages - mapcount == 0)
 			nr_pmdmapped = HPAGE_PMD_NR;
 
-		partially_mapped = __folio_certainly_partially_mapped(folio, mapcount);
-		break;
-	default:
-		BUILD_BUG();
+		/*
+		 * Queue anon large folio for deferred split if at least one
+		 * page of the folio is unmapped and at least one page is still
+		 * mapped.
+		 *
+		 * Device private folios do not support deferred splitting and
+		 * shrinker based scanning of the folios to free.
+		 */
+		if (folio_test_anon(folio) &&
+		    __folio_certainly_partially_mapped(folio) &&
+		    !folio_test_partially_mapped(folio) &&
+		    !folio_is_device_private(folio))
+			deferred_split_folio(folio, true);
 	}
 
-	/*
-	 * Queue anon large folio for deferred split if at least one page of
-	 * the folio is unmapped and at least one page is still mapped.
-	 *
-	 * Check partially_mapped first to ensure it is a large folio.
-	 *
-	 * Device private folios do not support deferred splitting and
-	 * shrinker based scanning of the folios to free.
-	 */
-	if (partially_mapped && folio_test_anon(folio) &&
-	    !folio_test_partially_mapped(folio) &&
-	    !folio_is_device_private(folio))
-		deferred_split_folio(folio, true);
-
 	__folio_mod_stat(folio, -nr, -nr_pmdmapped);
 
 	/*

-- 
2.43.0


